From: Brian Foster <bfoster@redhat.com>
To: Chris Mason <clm@fb.com>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: don't zero partial page cache pages during O_DIRECT
Date: Fri, 8 Aug 2014 16:39:28 -0400
Message-ID: <20140808203928.GA64279@bfoster.bfoster>
In-Reply-To: <53E4E03A.7050101@fb.com>

On Fri, Aug 08, 2014 at 10:35:38AM -0400, Chris Mason wrote:
> 
> xfs is using truncate_pagecache_range to invalidate the page cache
> during DIO reads.  This is different from the other filesystems, which
> only invalidate pages during DIO writes.
> 
> truncate_pagecache_range is meant to be used when we are freeing the
> underlying data structs from disk, so it will zero any partial ranges
> in the page.  This means a DIO read can zero out part of the page cache
> page, and it is possible the page will stay in cache.
> 
> buffered reads will find an up to date page with zeros instead of the
> data actually on disk.
> 
> This patch fixes things by leaving the page cache alone during DIO
> reads.
> 
> We discovered this when our buffered IO program for distributing
> database indexes was finding zero filled blocks.  I think writes
> are broken too, but I'll leave that for a separate patch because I don't
> fully understand what XFS needs to happen during a DIO write.
> 
> Test program:
> 
...
> 
> Signed-off-by: Chris Mason <clm@fb.com>
> cc: stable@vger.kernel.org
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1f66779..8d25d98 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -295,7 +295,11 @@ xfs_file_read_iter(
>  				xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
>  				return ret;
>  			}
> -			truncate_pagecache_range(VFS_I(ip), pos, -1);
> +
> +			/* we don't remove any pages here.  A direct read
> +			 * does not invalidate any contents of the page
> +			 * cache
> +			 */
>  		}

That seems sane to me at first glance. I don't know why we would need to
completely kill the cache on a DIO read. I'm not a fan of the additional
comment, though. We should probably just fix up the existing comment
instead. It also seems like we might be able to kill the XFS_IOLOCK_EXCL
dance here if the truncate goes away. Dave?
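
For illustration only, the failure mode described in the patch can be
modeled in userspace. This is a toy sketch, not kernel code: struct
toy_page and every toy_* function are hypothetical names, and the model
assumes only what the commit message states, namely that a partially
covered page stays cached and up to date with the covered bytes zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for one cached page; not a kernel structure. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	int uptodate;	/* nonzero: buffered reads trust data[] */
};

/*
 * Models the partial-page behaviour attributed to
 * truncate_pagecache_range(): the page stays in cache, still marked
 * up to date, with everything from offset_in_page onward zeroed.
 */
void toy_truncate_partial(struct toy_page *pg, size_t offset_in_page)
{
	assert(offset_in_page < TOY_PAGE_SIZE);
	memset(pg->data + offset_in_page, 0, TOY_PAGE_SIZE - offset_in_page);
}

/* Buffered read: served from the cached page if it is up to date. */
unsigned char toy_buffered_read(const struct toy_page *pg,
				const unsigned char *disk, size_t off)
{
	assert(off < TOY_PAGE_SIZE);
	return pg->uptodate ? pg->data[off] : disk[off];
}

/*
 * Returns 1 if, after a DIO read starting at dio_pos, a buffered read
 * at off sees a zero even though the disk still holds 0xAB there.
 */
int toy_dio_read_corrupts_cache(size_t dio_pos, size_t off)
{
	unsigned char disk[TOY_PAGE_SIZE];
	struct toy_page pg;

	memset(disk, 0xAB, sizeof(disk));
	memcpy(pg.data, disk, sizeof(disk));	/* cached by an earlier read */
	pg.uptodate = 1;

	toy_truncate_partial(&pg, dio_pos);	/* the call the patch removes */

	return toy_buffered_read(&pg, disk, off) == 0 && disk[off] == 0xAB;
}
```

In this model, toy_dio_read_corrupts_cache(512, 1024) reports the stale
zero, while an offset below dio_pos still reads back the on-disk value.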

FWIW, I had to go back to the following commit to see where this
originates from:

commit 9cea236492ebabb9545564eb039aa0f477a05c96
Author: Nathan Scott <nathans@sgi.com>
Date:   Fri Mar 17 17:26:41 2006 +1100

    [XFS] Flush and invalidate dirty pages at the start of a direct read also,
    else we can hit a delalloc-extents-via-direct-io BUG.
    
    SGI-PV: 949916
    SGI-Modid: xfs-linux-melb:xfs-kern:25483a
    
    Signed-off-by: Nathan Scott <nathans@sgi.com>
    ...

That adds a VOP_FLUSHINVAL_PAGES() call that looks like it's some kind
of portability API. I would expect the flush, rather than the
invalidation, to deal with any delalloc conversion issues, so perhaps
the invalidation part is a historical artifact of the API. Then again,
there's also a straight 'flushpages' call, so perhaps there's more to it
than that.
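
As a rough sketch of that reasoning, flush and invalidate can be modeled
as two separate steps. This is a userspace toy under stated assumptions,
not XFS code: struct toy_file and the toy_* functions are hypothetical,
and the model assumes that writeback is what converts a delalloc extent
into an allocated one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

enum toy_extent_state { TOY_DELALLOC, TOY_ALLOCATED };

/* Hypothetical one-page file: cached copy, on-disk copy, extent state. */
struct toy_file {
	unsigned char cache[TOY_PAGE_SIZE];
	unsigned char disk[TOY_PAGE_SIZE];
	int cached;
	int dirty;
	enum toy_extent_state extent;
};

/* Buffered write: data lands in cache; space is reserved, not allocated. */
void toy_buffered_write(struct toy_file *f, unsigned char byte)
{
	memset(f->cache, byte, TOY_PAGE_SIZE);
	f->cached = 1;
	f->dirty = 1;
	f->extent = TOY_DELALLOC;
}

/* Flush: write the dirty page and convert delalloc; the page stays cached. */
void toy_flush(struct toy_file *f)
{
	if (f->dirty) {
		memcpy(f->disk, f->cache, TOY_PAGE_SIZE);
		f->dirty = 0;
		f->extent = TOY_ALLOCATED;
	}
}

/* Invalidate: drop the cached page, but only if it is clean. */
void toy_invalidate(struct toy_file *f)
{
	if (!f->dirty)
		f->cached = 0;
}

/* Direct read: bypasses the cache; fails (-1) on a delalloc extent. */
int toy_direct_read(const struct toy_file *f, size_t off, unsigned char *out)
{
	assert(off < TOY_PAGE_SIZE);
	if (f->extent == TOY_DELALLOC)
		return -1;
	*out = f->disk[off];
	return 0;
}
```

In this model toy_flush() alone is enough for toy_direct_read() to
succeed and return current data; toy_invalidate() only decides whether
the clean page remains cached afterwards.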

Brian

>  		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
>  	}
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
