From: Christoph Hellwig <hch@infradead.org>
To: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: do_sync() and XFSQA test 182 failures....
Date: Thu, 30 Oct 2008 18:46:25 -0400
Message-ID: <20081030224625.GA18690@infradead.org>
In-Reply-To: <20081030085020.GP17077@disturbed>
On Thu, Oct 30, 2008 at 07:50:20PM +1100, Dave Chinner wrote:
> [*] Starting pdflush to sync data in the background when we are
> about to start flushing ourselves is self-defeating. Instead of
> having a single thread doing optimal writeout patterns, we now
> have two threads trying to sync the same filesystems and
> competing with each other to write out dirty inodes. This
> actually causes bugs in sync because pdflush is doing async
> flushes. Hence if pdflush races and wins during the sync flush
> part of the sync process, sync_inodes(1) will return before all
> the data/metadata is on disk because it can't be found to be
> waited on.
Note that this is an XFS special. Every other filesystem (at least the
major ones) relies on the VFS to write out data.
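
For anyone who doesn't have fs/sync.c in their head, the ordering Dave
is describing looks roughly like this (paraphrased from memory, so
treat it as a sketch rather than a verbatim quote of the code):

static void do_sync(unsigned long wait)
{
	wakeup_pdflush(0);	/* kick pdflush - the racing writer above */
	sync_inodes(0);		/* async pass over all superblocks */
	DQUOT_SYNC(NULL);	/* quotas */
	sync_supers();		/* write dirty superblocks */
	sync_filesystems(0);	/* ->sync_fs callouts, not waiting */
	sync_filesystems(wait);	/* ->sync_fs callouts, waiting */
	sync_inodes(wait);	/* second inode pass, this time waiting */
}

Every phase iterates over all superblocks before the next phase starts,
so there is no point at which a single filesystem gets to say "I am
completely clean now".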
> Now the sync is _supposedly_ complete. But we still have a dirty
> log and superblock thanks to delayed allocation that may have
> occurred after the sync_supers() call. Hence we can immediately
> see that we cannot *ever* do a proper sync of an XFS filesystem
> in Linux without modifying do_sync() to do more callouts.
>
> Worse, XFS can also still have *dirty inodes* because sync_inodes(1)
> will remove inodes from the dirty list in the async pass, but they
> can get dirtied a short time later (if they had dirty data) when the
> data I/O completes. Hence if the second sync pass completes before
> the inode is dirtied again we'll miss flushing it. This will mean we
> don't write inode size updates during sync. This is the same race
> that pdflush running in the background can trigger.
This is a common problem that would hit any filesystem trying to have
some sort of ordered data or journaled data mode.
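
To spell the window out - the names below are made up for illustration,
this is not actual XFS code - think of a data I/O completion handler
that has to update the on-disk file size once the data is stable:

/*
 * Illustrative only: completion of an appending write updates the
 * on-disk size and re-dirties the inode.
 */
static void example_end_io(struct bio *bio, int error)
{
	struct inode *inode = bio->bi_private;

	/* ... update the on-disk i_size for the completed write ... */

	/*
	 * If sync_inodes(1) has already processed this inode, nothing
	 * in the current sync will come back for the size update - it
	 * only goes out with the next periodic writeback or sync.
	 */
	mark_inode_dirty(inode);
}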
> However, I have a problem - I'm an expert in XFS, not the other tens
> of Linux filesystems so I can't begin to guess what the impact of
> changing do_sync() would be on those many filesystems. How many
> filesystems would such a change break? Indeed - how many are broken
> right now by having dirty inodes and superblocks slip through
> sync(1)?
I would guess more are broken now than a change in order would break.
Then again, purely a change in order would still leave this code
fragile as hell.
> What are the alternatives? do_sync() operates above any particular
> filesystem, so it's hard to provide a filesystem specific ->do_sync
> method to avoid changing sync order for all filesystems. Do we
> change do_sync() to completely sync a superblock at a time instead
> of doing each operation across all superblocks before moving onto
> the next operation? Is there any particular reason (e.g. performance,
> locking) for the current method that would prevent changing to a
> completely-sync-a-superblock iteration algorithm so we can provide a
> custom ->do_sync method?
Locking can't be the reason, as we should never hold locks while
returning from one of the VFS operations. I think it's performance,
or rather alleged performance, as I don't think it really matters.
If it does matter, there is an easy way to make it perform just as
well with a proper callout - just spawn a thread for every filesystem
and sync them in parallel.
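
Something along these lines, say - all of the below is a hypothetical
sketch just to show the shape, not existing code:

/* hypothetical: sync one superblock completely, in its own thread */
static int sync_one_sb(void *data)
{
	struct super_block *sb = data;

	sync_inodes_sb(sb, 0);			/* async pass */
	sync_inodes_sb(sb, 1);			/* waiting pass */
	if (sb->s_op->sync_fs)
		sb->s_op->sync_fs(sb, 1);	/* fs-specific: log, superblock, ... */
	return 0;
}

The caller would walk the superblock list, kthread_run() one of these
per filesystem and wait for all of them before returning from
sys_sync().  A filesystem that wants full control could then replace
the body of sync_one_sb() with its own ->do_sync-style method.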
> Are there any other ways that we can get a custom ->do_sync
> method for XFS? I'd prefer a custom method so we don't have to
> revalidate every linux filesystem, especially as XFS already has
> everything it needs to provide its own sync method (used for
> freezing) and a test suite to validate it is working correctly.....
I think having a method for this would be useful. And given that
a proper sync should be exactly the same as a filesystem freeze,
maybe we should use one method for both and take the chance to
give filesystems better control over the freeze process?
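
Purely as a strawman - the name and the extra argument are made up
here, this is not an existing interface:

/*
 * Write back and wait for all dirty data and metadata of this
 * filesystem.  If @freeze is set, also block new modifications until
 * the filesystem is thawed, i.e. this is the same entry point the
 * freeze path would use.
 */
int (*do_sync)(struct super_block *sb, int wait, int freeze);

That would let XFS wire up the code it already uses for freezing, and
give other filesystems one well-defined place to get sync right.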