From: Christoph Hellwig <hch@infradead.org>
To: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, Curt Wohlgemuth <curtw@google.com>,
Al Viro <viro@ZenIV.linux.org.uk>
Subject: Re: [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)
Date: Thu, 28 Jul 2011 16:37:56 -0400
Message-ID: <20110728203756.GA28549@infradead.org>
In-Reply-To: <20110728174203.GF5044@quack.suse.cz>
On Thu, Jul 28, 2011 at 07:42:03PM +0200, Jan Kara wrote:
> > be do a blocking sync_inodes_sb, but then a non-blocking ->sync_fs?
> I tried to explain this in the changelog but it seems I failed.
> wakeup_flusher_threads(0) starts completely asynchronous writeback of all
> bdis. We could do an iteration of ->sync_fs with wait == 0 over all
> superblocks just after wakeup_flusher_threads(0) returns, but at that
> moment almost nothing has been written yet because the flusher threads
> have only just started their work. So what useful work can ->sync_fs(0)
> do in such a case? E.g. ocfs2 could start processing its orphan list,
> but starting quota writeback as XFS does would be really fruitless,
> since I expect quota will change heavily as a result of the writeback.
> Similarly ext3 or ext4 cannot really do useful work while most inodes
> are not yet written.
The way xfs currently starts quota writeback actually seems pretty
useless to me already. I don't think removing it will have a
performance impact, but I will have to benchmark it carefully. Remember
that even right now the chances that all inodes have actually been
written are fairly low on a busy system, so we might just be creating a
lot of duplicate work with the two sync_fs passes.
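For reference, a filesystem's ->sync_fs typically keys off the wait flag
roughly like the skeleton below. This is only an illustrative shape, not
any real filesystem's code, and all the example_* names are invented:

        #include <linux/fs.h>

        /* hypothetical per-fs helpers and types, purely for illustration */
        struct example_fs_info;
        void example_start_async_commit(struct example_fs_info *sbi);
        int example_commit_and_wait(struct example_fs_info *sbi);

        /*
         * wait == 0: opportunistically start background work and return.
         * wait == 1: flush metadata and block until it is stable on disk.
         */
        static int example_sync_fs(struct super_block *sb, int wait)
        {
                struct example_fs_info *sbi = sb->s_fs_info;

                if (!wait) {
                        /* just kick off an async commit, don't wait for it */
                        example_start_async_commit(sbi);
                        return 0;
                }

                /* blocking case: commit and wait for completion */
                return example_commit_and_wait(sbi);
        }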
>
> It would be nice to call ->sync_fs with wait == 0 when the flusher thread
> is done with the bdi, but there's no easy way to do that. We could use a
> completion for this, but we'd have to track one for every bdi, which isn't
> very nice. That's why I chose to synchronize with the flusher threads using
> the synchronous inode writeback - when that is finished we know the flusher
> threads are done, so calling ->sync_fs() can be useful. So do you think
> calling ->sync_fs() with wait == 0 that late is a problem?
That's far too many subtle changes for one patch. The current sync
sequence in pseudo-code is:
wakeup_flusher_threads()
for_each_sb {
        writeback_inodes_sb();
        sync_fs(nowait);
        __sync_blockdev(nowait);
}

for_each_sb {
        sync_inodes_sb();
        sync_fs(wait);
        __sync_blockdev(wait);
}
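In actual code that corresponds roughly to what fs/sync.c does today; a
simplified sketch, not verbatim (the ->quota_sync call, error handling
and laptop_mode handling are left out):

        #include <linux/fs.h>
        #include <linux/writeback.h>
        #include <linux/syscalls.h>

        /* simplified sketch of the current fs/sync.c behaviour */
        static void __sync_filesystem(struct super_block *sb, int wait)
        {
                if (wait)
                        sync_inodes_sb(sb);             /* write and wait on inodes */
                else
                        writeback_inodes_sb(sb);        /* just start writeback */

                if (sb->s_op->sync_fs)
                        sb->s_op->sync_fs(sb, wait);
                __sync_blockdev(sb->s_bdev, wait);
        }

        static void sync_one_sb(struct super_block *sb, void *arg)
        {
                if (!(sb->s_flags & MS_RDONLY))
                        __sync_filesystem(sb, *(int *)arg);
        }

        SYSCALL_DEFINE0(sync)
        {
                int nowait = 0, wait = 1;

                wakeup_flusher_threads(0);
                iterate_supers(sync_one_sb, &nowait);   /* first for_each_sb pass */
                iterate_supers(sync_one_sb, &wait);     /* second for_each_sb pass */
                return 0;
        }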
and currently your patch changes it to:
wakeup_flusher_threads()
for_each_sb {
        sync_inodes_sb();
        sync_fs(nowait);
}

for_each_bdev
        __sync_blockdev(nowait);
for_each_sb
        sync_fs(wait);
for_each_bdev
        __sync_blockdev(wait);
Let's keep this bisectable, with one behaviour change and one goal at a
time. That is, the first patch in the sequence splits the loops and
moves to:
wakeup_flusher_threads()
for_each_sb
        writeback_inodes_sb();
for_each_sb
        sync_fs(nowait);
for_each_sb
        __sync_blockdev(nowait);

for_each_sb
        sync_inodes_sb();
for_each_sb
        sync_fs(wait);
for_each_sb
        __sync_blockdev(wait);
Then, as next steps, add one patch that moves to iterating over the
block devices for the __sync_blockdev passes, and one that drops the
writeback_inodes_sb call, at which point we'll have:
wakeup_flusher_threads()
for_each_sb
        sync_fs(nowait);
for_each_bdev
        __sync_blockdev(nowait);

for_each_sb
        sync_inodes_sb();
for_each_sb
        sync_fs(wait);
for_each_bdev
        __sync_blockdev(wait);
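In code that end state could look roughly like the sketch below. Note
that iterate_bdevs() and the small *_one_* helpers are illustrative
names only; no such block device iterator exists in the tree today, so
treat this as a sketch of the shape rather than the actual patch:

        #include <linux/fs.h>
        #include <linux/writeback.h>
        #include <linux/syscalls.h>

        /* sketch only: the *_one_* helpers and iterate_bdevs() are hypothetical */
        static void sync_inodes_one_sb(struct super_block *sb, void *arg)
        {
                sync_inodes_sb(sb);
        }

        static void sync_fs_one_sb(struct super_block *sb, void *arg)
        {
                if (sb->s_op->sync_fs)
                        sb->s_op->sync_fs(sb, *(int *)arg);
        }

        static void sync_blockdev_one(struct block_device *bdev, void *arg)
        {
                __sync_blockdev(bdev, *(int *)arg);
        }

        SYSCALL_DEFINE0(sync)
        {
                int nowait = 0, wait = 1;

                wakeup_flusher_threads(0);
                iterate_supers(sync_fs_one_sb, &nowait);
                iterate_bdevs(sync_blockdev_one, &nowait);      /* hypothetical iterator */
                iterate_supers(sync_inodes_one_sb, NULL);
                iterate_supers(sync_fs_one_sb, &wait);
                iterate_bdevs(sync_blockdev_one, &wait);        /* hypothetical iterator */
                return 0;
        }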
And now we can think about optimizations. Randomly changing the order
as done right now doesn't seem too helpful. My theory would be that we
could simply drop the nowait versions of sync_fs entirely, as they
don't do much useful work anyway (most of what they do gets invalidated
by the inode writeback later on), but we'd really have to validate that
theory with benchmark runs on the filesystems where people care about
performance.
Thread overview: 21+ messages
2011-07-26 22:38 [PATCH 0/5 v2] Improve sync(2) handling Jan Kara
2011-07-26 22:38 ` [PATCH 1/5] vfs: Make sync(1) writeout also block device inodes Jan Kara
2011-07-27 9:44 ` Christoph Hellwig
2011-07-28 20:13 ` Jan Kara
2011-07-26 22:38 ` [PATCH 2/5] vfs: Move noop_backing_dev_info check from sync into writeback Jan Kara
2011-07-27 9:44 ` Christoph Hellwig
2011-07-26 22:38 ` [PATCH 3/5] quota: Split dquot_quota_sync() to writeback and cache flushing part Jan Kara
2011-07-27 8:26 ` Steven Whitehouse
2011-07-27 9:45 ` Christoph Hellwig
2011-07-26 22:38 ` [PATCH 4/5] quota: Move quota syncing to ->sync_fs method Jan Kara
2011-07-27 8:32 ` Steven Whitehouse
2011-07-27 9:46 ` Christoph Hellwig
2011-07-27 13:44 ` Dave Kleikamp
2011-07-26 22:38 ` [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1) Jan Kara
2011-07-27 9:52 ` Christoph Hellwig
2011-07-28 17:42 ` Jan Kara
2011-07-28 20:37 ` Christoph Hellwig [this message]
2011-07-28 21:20 ` Curt Wohlgemuth
2011-07-29 11:09 ` Christoph Hellwig
2011-09-28 15:00 ` [PATCH 0/5 v2] Improve sync(2) handling Christoph Hellwig
2011-10-06 22:20 ` Jan Kara