From: Christoph Hellwig <hch@lst.de>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Hellwig <hch@lst.de>,
Jan Engelhardt <jengelh@medozas.de>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/5] writeback: stop background/kupdate works from livelocking other works
Date: Thu, 11 Nov 2010 14:32:11 +0100 [thread overview]
Message-ID: <20101111133211.GC12940@lst.de> (raw)
In-Reply-To: <20101111004047.GA7879@localhost>
On Thu, Nov 11, 2010 at 08:40:47AM +0800, Wu Fengguang wrote:
> Seriously, I also doubt the value of doing sync() in the flusher thread.
> sync() is by definition inefficient. In the block layer, it's served
> with less emphasis on throughput. In the VFS layer, it may sleep in
> inode_wait_for_writeback() and filemap_fdatawait(). In various FS,
> pages won't be skipped at the cost of more lock waiting.
>
> So when a flusher thread is serving sync(), it has difficulties
> saturating the storage device.
>
> btw, it seems current sync() does not take advantage of the flusher
> threads to sync multiple disks in parallel.
sys_sync does a wakeup_flusher_threads(0), which kicks all flusher
threads to write back all data. We then do another serialized pass
of kicking per-sb writeback, and then do synchronous per-sb writeback,
intertwined with non-blocking and blocking quota sync, ->sync_fs
and __sync_blockdev calls.
The way sync ended up looking like it does is rather historic:
first Jan and I fixed sync to actually get data correctly to disk;
the way it was done traditionally had a lot of issues with the
ordering of its steps. For that we changed it to just loop around
sync_filesystem to have one common place to define the proper order.
That caused a large performance regression, which Yanmin Zhang found
and fixed, which added back the wakeup_pdflush(0) (which later became
wakeup_flusher_threads).
The introduction of the per-bdi writeback threads by Jens changed
writeback_inodes_sb and sync_inodes_sb to offload the I/O submission
to the I/O thread.
I'm not overly happy with the current situation. Part of that is
the rather complex callchain in __sync_filesystem. If we moved the
quota sync and the sync_blockdev into ->sync_fs we'd already be down
to a much more manageable level, and we could optimize sync down to:
	wakeup_flusher_threads(0);

	for_each_sb
		sb->sync_fs(sb, 0);

	for_each_sb {
		sync_inodes_sb(sb);
		sb->sync_fs(sb, 1);
	}
We would still try to do most of the I/O from the flusher thread,
but only if we can do it non-blocking. If we have to block we'll
get less optimal I/O patterns, but at least we don't block other
writeback while waiting for it.
I suspect a big problem for the starving workloads is that we only
do the livelock avoidance tagging inside sync_inodes_sb, so any
page written by wakeup_flusher_threads, or by the writeback_inodes_sb
in the first two passes, that gets redirtied is synced out again.
But I'd feel very wary about doing this without a lot of performance
testing. dpkg package install workloads, the ffsb create_4k test
Yanmin used, or fs_mark in one of its sync-using configurations would
be a good benchmark.
Btw, where in the block I/O code do we penalize sync?
I don't think moving the I/O submission into the caller is going to
gain us anything. What we should look into instead is making as
much of the I/O submission as possible non-blocking, even inside sync.
> And I guess (concurrent) sync/fsync/msync calls will be rare,
> especially for really performance demanding workloads (which will
> optimize sync away in the first place).
There is no way to optimize a sync away if you want your data on disk.
The most common case is fsync, followed by O_SYNC, but due to the
massive fsync suckage on ext3, dpkg for example has switched to
sync instead, which is quite nasty if you have other work going on.
Offloading fsync to the flusher thread is an interesting idea, but I
wonder how well it works. Both fsync and sync are calls the application
waits on to make progress, so naturally we gave them some preference
to decrease their latency while penalizing background writeback. By
first trying an asynchronous pass via the flusher thread we could
optimize the I/O pattern, but at a huge cost in latency for
the caller.
Thread overview: 23+ messages
2010-11-08 23:09 [PATCH 0/5] writeback livelock fixes Wu Fengguang
2010-11-08 23:09 ` [PATCH 1/5] writeback: integrated background writeback work Wu Fengguang
2010-11-08 23:09 ` [PATCH 2/5] writeback: trace wakeup event for background writeback Wu Fengguang
2010-11-08 23:09 ` [PATCH 3/5] writeback: stop background/kupdate works from livelocking other works Wu Fengguang
2010-11-09 21:13 ` Andrew Morton
2010-11-09 22:28 ` Jan Kara
2010-11-09 23:00 ` Andrew Morton
2010-11-09 23:56 ` Jan Kara
2010-11-10 23:37 ` Andrew Morton
2010-11-11 0:40 ` Wu Fengguang
2010-11-11 13:32 ` Christoph Hellwig [this message]
2010-11-11 16:44 ` Jan Kara
2010-11-08 23:09 ` [PATCH 4/5] writeback: avoid livelocking WB_SYNC_ALL writeback Wu Fengguang
2010-11-09 22:43 ` Andrew Morton
2010-11-09 23:18 ` Jan Kara
2010-11-10 2:26 ` Wu Fengguang
2010-11-08 23:09 ` [PATCH 5/5] writeback: check skipped pages on WB_SYNC_ALL Wu Fengguang
2010-11-09 22:47 ` Andrew Morton
2010-11-09 23:16 ` Wu Fengguang
2010-11-08 23:23 ` [PATCH 0/5] writeback livelock fixes Wu Fengguang
-- strict thread matches above, loose matches on Subject: below --
2010-11-10 2:35 [PATCH 0/5] writeback livelock fixes v2 Wu Fengguang
2010-11-10 2:35 ` [PATCH 3/5] writeback: stop background/kupdate works from livelocking other works Wu Fengguang
2010-11-10 3:55 ` Wu Fengguang
2010-11-10 16:26 ` Jan Kara