From: Martin Bligh <mbligh@google.com>
To: Chad Talbott <ctalbott@google.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
wfg@mail.ustc.edu.cn, Michael Rubin <mrubin@google.com>,
Andrew Morton <akpm@google.com>,
sandeen@redhat.com, Michael Davidson <md@google.com>
Subject: Re: Bug in kernel 2.6.31, Slow wb_kupdate writeout
Date: Wed, 29 Jul 2009 00:15:48 -0700
Message-ID: <33307c790907290015m1e6b5666x9c0014cdaf5ed08@mail.gmail.com>
In-Reply-To: <33307c790907281449k5e8d4f6cib2c93848f5ec2661@mail.gmail.com>
On Tue, Jul 28, 2009 at 2:49 PM, Martin Bligh<mbligh@google.com> wrote:
>> An interesting recent-ish change is "writeback: speed up writeback of
>> big dirty files." When I revert the change to __sync_single_inode the
>> problem appears to go away and background writeout proceeds at disk
>> speed. Interestingly, that code is in the git commit [2], but not in
>> the post to LKML. [3] This may not be the fix, but it makes this
>> test behave better.
>
> I'm fairly sure this is not fixing the root cause - but putting it at the head
> rather than the tail of the queue causes the error not to starve wb_kupdate
> for nearly so long - as long as we keep the queue full, the bug is hidden.
OK, it seems this is the root cause after all. I wasn't clear why all the
pages weren't being written back, and thought there was another bug. What
happens is that we go into write_cache_pages, stuff the disk queue with as
much as we can put into it, and then inevitably hit the congestion limit.
Then we back out to __sync_single_inode, which says "huh, you didn't manage
to write your whole slice", and penalizes the poor blameless inode in question
by calling redirty_tail, which puts it back into the penalty box for 30s.
This results in very lumpy I/O writeback at 5s intervals, and very
poor throughput.
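To make the failure mode concrete, here is a minimal sketch of the requeue decision as a toy userspace model, not kernel code. The names toy_wbc, requeue_action, old_policy and new_policy are made up for illustration; only for_kupdate, nr_to_write, requeue_io and redirty_tail correspond to names in fs/fs-writeback.c. It assumes we are already in the branch where the inode still has dirty pages after writeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel's types. */
enum requeue_action {
    ACTION_REQUEUE_IO,   /* models requeue_io(): move to s_more_io, retry soon */
    ACTION_REDIRTY_TAIL  /* models redirty_tail(): back onto s_dirty */
};

struct toy_wbc {
    bool for_kupdate;    /* models wbc->for_kupdate */
    long nr_to_write;    /* models wbc->nr_to_write: pages left in the slice */
};

/* Decision the unpatched code makes when dirty pages remain. */
enum requeue_action old_policy(const struct toy_wbc *wbc)
{
    assert(wbc != NULL);
    if (wbc->for_kupdate && wbc->nr_to_write <= 0)
        return ACTION_REQUEUE_IO;   /* slice used up: queue for next turn */
    /* Slice not used up (e.g. congestion), or not kupdate: redirty. */
    return ACTION_REDIRTY_TAIL;
}

/* Decision after the proposed patch: always requeue for more I/O. */
enum requeue_action new_policy(const struct toy_wbc *wbc)
{
    assert(wbc != NULL);
    (void)wbc;                      /* the outcome no longer depends on wbc */
    return ACTION_REQUEUE_IO;
}
```

With for_kupdate set and nr_to_write still positive, which is the state we are left in after hitting congestion, old_policy picks the redirty_tail path and new_policy picks requeue_io.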
Patch below is inline and probably text munged, but is for RFC only. I'll
test it more thoroughly tomorrow. As for the comment about starving other
writes, I believe requeue_io moves the inode from s_io to s_more_io, which
should at least allow other files to make some progress.
--- linux-2.6.30/fs/fs-writeback.c.old 2009-07-29 00:08:29.000000000 -0700
+++ linux-2.6.30/fs/fs-writeback.c 2009-07-29 00:11:28.000000000 -0700
@@ -322,46 +322,11 @@ __sync_single_inode(struct inode *inode,
/*
* We didn't write back all the pages. nfs_writepages()
* sometimes bales out without doing anything. Redirty
- * the inode; Move it from s_io onto s_more_io/s_dirty.
+ * the inode; Move it from s_io onto s_more_io. It
+ * may well have just encountered congestion
*/
- /*
- * akpm: if the caller was the kupdate function we put
- * this inode at the head of s_dirty so it gets first
- * consideration. Otherwise, move it to the tail, for
- * the reasons described there. I'm not really sure
- * how much sense this makes. Presumably I had a good
- * reasons for doing it this way, and I'd rather not
- * muck with it at present.
- */
- if (wbc->for_kupdate) {
- /*
- * For the kupdate function we move the inode
- * to s_more_io so it will get more writeout as
- * soon as the queue becomes uncongested.
- */
- inode->i_state |= I_DIRTY_PAGES;
- if (wbc->nr_to_write <= 0) {
- /*
- * slice used up: queue for next turn
- */
- requeue_io(inode);
- } else {
- /*
- * somehow blocked: retry later
- */
- redirty_tail(inode);
- }
- } else {
- /*
- * Otherwise fully redirty the inode so that
- * other inodes on this superblock will get some
- * writeout. Otherwise heavy writing to one
- * file would indefinitely suspend writeout of
- * all the other files.
- */
- inode->i_state |= I_DIRTY_PAGES;
- redirty_tail(inode);
- }
+ inode->i_state |= I_DIRTY_PAGES;
+ requeue_io(inode);
} else if (inode->i_state & I_DIRTY) {
/*
* Someone redirtied the inode while were writing back