From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
Denys Fedorysychenko <nuclearcat@nuclearcat.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: endless sync on bdi_sched_wait()? 2.6.33.1
Date: Wed, 21 Apr 2010 15:27:18 +0200 [thread overview]
Message-ID: <20100421132718.GA3327@quack.suse.cz> (raw)
In-Reply-To: <20100421015428.GC23541@dastard>
On Wed 21-04-10 11:54:28, Dave Chinner wrote:
> On Wed, Apr 21, 2010 at 02:33:09AM +0200, Jan Kara wrote:
> > On Mon 19-04-10 17:04:58, Dave Chinner wrote:
> > > The first async flush does:
> > > vvvv
> > > flush-253:16-2497 [000] 616072.897747: writeback_exec: work=13c0 pages=13818, sb=0, kupdate=0, range_cyclic=0 for_background=0
> > > flush-253:16-2497 [000] 616072.897748: writeback_clear: work=ffff88003d8813c0, refs=1
> > > flush-253:16-2497 [000] 616072.897753: wbc_writeback_start: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
> > > flush-253:16-2497 [000] 616072.897768: wbc_writeback_written: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
> > >
> > > Nothing - it does not write any pages towrt (nr_to_write) is
> > > unchanged by the attempted flush.
> > This looks a bit strange. Surly there are plenty of dirty pages. I guess
> > we never get to ->writepages for XFS. But then I wonder how does it
> > happen that we return without more_io set. Strange.
>
> more_io not being set implies that we aren't calling requeue_io().
> So that means it's not caused by I_SYNC being set. If we get down to
> write_cache_pages, it implies that there are no dirty pages
> remaining we can write back.
>
> Given a background flush just completed before this queued async
> flush was executed (didn't seem relevant, so I didn't include
> it), it is entirely possible that there were no dirty pages to
> write in the followup async flushes.
Ah, OK.
> > > The third flush - the sync one - does:
> > > vvvv
> > > flush-253:16-2497 [000] 616072.897784: writeback_exec: work=3e58 pages=9223372036854775807, sb=1, kupdate=0, range_cyclic=0 for_background=0
> > > flush-253:16-2497 [000] 616072.897785: wbc_writeback_start: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=1 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
> > >
> > > some 75 seconds later having written only 1024 pages. In the mean
> > > time, the traces show dd blocked in balance_dirty_pages():
> > >
> > > dd-2498 [000] 616072.908675: wbc_balance_dirty_start: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > > dd-2498 [000] 616072.908679: wbc_balance_dirty_wait: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > > dd-2498 [000] 616073.238785: wbc_balance_dirty_start: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > > dd-2498 [000] 616073.238788: wbc_balance_dirty_wait: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > > And it appears to stay blocked there without doing any writeback at
> > > all - there are no wbc_balance_dirty_pages_written traces at all.
> > > That is, it is blocking until the number of dirty pages is dropping
> > > below the dirty threshold, then continuing to write and dirty more
> > > pages.
> > I think this happens because sync writeback is running so I_SYNC is set
> > and thus we cannot do any writeout for the inode from balance_dirty_pages.
>
> It's not even calling into writeback so the I_SYNC flag is way out of
> scope ;)
Are you sure? The tracepoints are in wb_writeback() but
writeback_inodes_wbc() calls directly into writeback_inodes_wb() so you
won't see any of the tracepoints to trigger. So how do you know we didn't
get to writeback_single_inode?
> > > <git blame>
> > >
> > > commit 89e12190 - fix bug in nr_to_write introduced by dcf6a79d
> > > commit dcf6a79d - fix bug in nr_to_write introduced by 05fe478d
> > > commit 05fe478d - data integrity write fix: ignore nr_to_write for
> > > WB_SYNC_ALL writes.
> > > "This change does indeed make the possibility of
> > > long stalls la[r]ger, and that's not a good thing,
> > > but lying about data integrity is even worse."
> > >
> > > IOWs, the observed sync behaviour is as intended - if you keep
> > > dirtying the file, sync will keep cleaning it because it defaults to
> > > being safe. I'd say "not a bug" then. I agree it's not ideal, but
> > > until Jan's inode sync sweep code is accepted I don't think there's
> > > much that can be done about it.
> > Yes, my writeback sweeping patch was aimed exactly to reliably address
> > this issue. Anyway, if we could get the async stuff working properly then I
> > think livelocks should happen much less often... Need to really find some
> > time for this.
>
> I think the async writeback is working correctly. It's just that if
> we queue async writeback, and it runs directly after a previous
> async writeback command was executed, it's possible it has nothing
> to do. The problem is that the sync writeback will wait on pages
> under writeback as it finds them, so it's likely to be running while
> more pages get dirtied and that's when the the tail-chase in
> write_cache_pages() starts.
OK, probably you are right...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2010-04-21 13:27 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-31 16:07 endless sync on bdi_sched_wait()? 2.6.33.1 Denys Fedorysychenko
2010-03-31 22:12 ` Dave Chinner
2010-04-01 10:42 ` Denys Fedorysychenko
2010-04-01 11:13 ` Dave Chinner
2010-04-01 20:14 ` Jeff Moyer
2010-04-08 9:28 ` Jan Kara
2010-04-08 10:12 ` Denys Fedorysychenko
2010-04-12 0:47 ` Dave Chinner
2010-04-19 1:37 ` Dave Chinner
2010-04-19 7:04 ` Dave Chinner
2010-04-19 7:23 ` Dave Chinner
2010-04-21 0:33 ` Jan Kara
2010-04-21 1:54 ` Dave Chinner
2010-04-21 13:27 ` Jan Kara [this message]
2010-04-22 0:06 ` Dave Chinner
2010-04-22 12:48 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100421132718.GA3327@quack.suse.cz \
--to=jack@suse.cz \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nuclearcat@nuclearcat.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).