From: Ben Myers <bpm@sgi.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes
Date: Mon, 20 May 2013 14:18:13 -0500
Message-ID: <20130520191813.GA20028@sgi.com>
In-Reply-To: <519A6553.4090801@redhat.com>

On Mon, May 20, 2013 at 02:02:59PM -0400, Brian Foster wrote:
> On 05/19/2013 07:51 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > FSX on 512 byte block size filesystems has been failing for some
> > time with corrupted data. The fault dates back to the change in
> > the writeback data integrity algorithm that uses a mark-and-sweep
> > approach to avoid data writeback livelocks.
> >
> > Unfortunately, a side effect of this mark-and-sweep approach is that
> > each page will only be written once for a data integrity sync, and
> > there is a condition in writeback in XFS where a page may require
> > two writeback attempts to be fully written. As a result of the high
> > level change, we now only get a partial page writeback during the
> > integrity sync because the first pass through writeback clears the
> > mark left on the page index to tell writeback that the page needs
> > writeback....
> >
> > The cause is writing a partial page in the clustering code. This can
> > happen when a mapping boundary falls in the middle of a page - we
> > end up writing back the first part of the page that the mapping
> > covers, but then never revisit the page to have the remainder mapped
> > and written.
> >
> > The fix is simple - if the mapping boundary falls inside a page,
> > then simply abort clustering without touching the page. This means
> > that the next ->writepage entry that write_cache_pages() will make
> > is the page we aborted on, and xfs_vm_writepage() will map all
> > sections of the page correctly. This behaviour is also optimal for
> > non-data integrity writes, as it results in contiguous sequential
> > writeback of the file rather than missing small holes and having to
> > write them as "random" writes in a future pass.
> >
> > With this fix, all the fsx tests in xfstests now pass on a 512 byte
> > block size filesystem on a 4k page machine.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
>
> Looks good to me.
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
>
> > fs/xfs/xfs_aops.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 2b2691b..f04eceb 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -725,6 +725,25 @@ xfs_convert_page(
> > (xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
> > i_size_read(inode));
> >
> > + /*
> > + * If the current map does not span the entire page we are about to try
> > + * to write, then give up. The only way we can write a page that spans
> > + * multiple mappings in a single writeback iteration is via the
> > + * xfs_vm_writepage() function. Data integrity writeback requires the
> > + * entire page to be written in a single attempt, otherwise the part of
> > + * the page we don't write here doesn't get written as part of the data
> > + * integrity sync.
> > + *
> > + * For normal writeback, we also don't attempt to write partial pages
> > + * here as it simply means that write_cache_pages() will see it under
> > + * writeback and ignore the page until some pointin the future, at which
> > + * time this will be the only page inteh file that needs writeback.
> > + * Hence for more optimal IO patterns, we should always avoid partial
> > + * page writeback due to multiple mappings on a page here.
> > + */
Applying this with a couple of spelling fixes in this comment.

Thanks for the reviews, Brian.

-Ben
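
For anyone reading along without the tree handy: the quoted hunk is trimmed before the actual test, so here is a rough user-space sketch of the condition the comment describes. It is not the code from the patch. The struct, the function names and the fixed 4k page shift are all hypothetical stand-ins, and byte offsets are used instead of filesystem blocks to keep it short. The clamp to the file size mirrors the min of the page end and i_size_read(inode) visible in the hunk context.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4k pages; the kernel code uses PAGE_CACHE_SHIFT instead. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical mapping in bytes; the real code works on block mappings. */
struct sketch_map {
	uint64_t start;	/* first byte covered by the mapping */
	uint64_t len;	/* number of bytes covered */
};

/* End of the writable part of a page: the page end, clamped to EOF. */
static uint64_t sketch_page_end(uint64_t page_index, uint64_t isize)
{
	uint64_t end = (page_index + 1) << SKETCH_PAGE_SHIFT;

	return end < isize ? end : isize;
}

/*
 * True if the mapping covers the page from its first byte up to the
 * clamped end. When this is false, a clustering pass following the
 * approach in the commit message would stop without touching the page
 * and leave it for the next ->writepage call.
 */
static bool sketch_map_spans_page(const struct sketch_map *map,
				  uint64_t page_index, uint64_t isize)
{
	uint64_t page_start = page_index << SKETCH_PAGE_SHIFT;
	uint64_t page_end = sketch_page_end(page_index, isize);

	return map->start <= page_start &&
	       map->start + map->len >= page_end;
}
```

With 512 byte blocks, a mapping covering bytes 0 to 4607 spans page 0 but ends one block into page 1, so the check fails for page 1 unless the file itself ends at byte 4608.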