Re: endless sync on bdi_sched_wait()? 2.6.33.1

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	Denys Fedorysychenko <nuclearcat@nuclearcat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: endless sync on bdi_sched_wait()? 2.6.33.1
Date: Wed, 21 Apr 2010 15:27:18 +0200	[thread overview]
Message-ID: <20100421132718.GA3327@quack.suse.cz> (raw)
In-Reply-To: <20100421015428.GC23541@dastard>

On Wed 21-04-10 11:54:28, Dave Chinner wrote:
> On Wed, Apr 21, 2010 at 02:33:09AM +0200, Jan Kara wrote:
> > On Mon 19-04-10 17:04:58, Dave Chinner wrote:
> > > The first async flush does:
> > >                                                                  vvvv
> > >     flush-253:16-2497  [000] 616072.897747: writeback_exec: work=13c0 pages=13818, sb=0, kupdate=0, range_cyclic=0 for_background=0
> > >     flush-253:16-2497  [000] 616072.897748: writeback_clear: work=ffff88003d8813c0, refs=1
> > >     flush-253:16-2497  [000] 616072.897753: wbc_writeback_start: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
> > >     flush-253:16-2497  [000] 616072.897768: wbc_writeback_written: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
> > > 
> > > Nothing - it does not write any pages towrt (nr_to_write) is
> > > unchanged by the attempted flush.
> >   This looks a bit strange. Surly there are plenty of dirty pages. I guess
> > we never get to ->writepages for XFS.  But then I wonder how does it
> > happen that we return without more_io set. Strange.
> 
> more_io not being set implies that we aren't calling requeue_io().
> So that means it's not caused by I_SYNC being set. If we get down to
> write_cache_pages, it implies that there are no dirty pages
> remaining we can write back.
> 
> Given a background flush just completed before this queued async
> flush was executed (didn't seem relevant, so I didn't include
> it), it is entirely possible that there were no dirty pages to
> write in the followup async flushes.
  Ah, OK.

> > > The third flush - the sync one - does:
> > >                                                                  vvvv
> > >     flush-253:16-2497  [000] 616072.897784: writeback_exec: work=3e58 pages=9223372036854775807, sb=1, kupdate=0, range_cyclic=0 for_background=0
> > >     flush-253:16-2497  [000] 616072.897785: wbc_writeback_start: dev 253:16 wbc=9d20 towrt=1024 skip=0 sb=1 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
> > > 
> > > some 75 seconds later having written only 1024 pages. In the mean
> > > time, the traces show dd blocked in balance_dirty_pages():
> > > 
> > >               dd-2498  [000] 616072.908675: wbc_balance_dirty_start: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > >               dd-2498  [000] 616072.908679: wbc_balance_dirty_wait: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > >               dd-2498  [000] 616073.238785: wbc_balance_dirty_start: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > >               dd-2498  [000] 616073.238788: wbc_balance_dirty_wait: dev 253:16 wbc=fb68 towrt=1536 skip=0 sb=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
> > > And it appears to stay blocked there without doing any writeback at
> > > all - there are no wbc_balance_dirty_pages_written traces at all.
> > > That is, it is blocking until the number of dirty pages is dropping
> > > below the dirty threshold, then continuing to write and dirty more
> > > pages.
> >   I think this happens because sync writeback is running so I_SYNC is set
> > and thus we cannot do any writeout for the inode from balance_dirty_pages.
> 
> It's not even calling into writeback so the I_SYNC flag is way out of
> scope ;)
  Are you sure? The tracepoints are in wb_writeback() but
writeback_inodes_wbc() calls directly into writeback_inodes_wb() so you
won't see any of the tracepoints to trigger. So how do you know we didn't
get to writeback_single_inode?

> > > <git blame>
> > > 
> > > commit 89e12190 - fix bug in nr_to_write introduced by dcf6a79d
> > > commit dcf6a79d - fix bug in nr_to_write introduced by 05fe478d
> > > commit 05fe478d - data integrity write fix: ignore nr_to_write for
> > > 			WB_SYNC_ALL writes.
> > > 		"This change does indeed make the possibility of
> > > 		long stalls la[r]ger, and that's not a good thing,
> > > 		but lying about data integrity is even worse."
> > > 
> > > IOWs, the observed sync behaviour is as intended - if you keep
> > > dirtying the file, sync will keep cleaning it because it defaults to
> > > being safe. I'd say "not a bug" then. I agree it's not ideal, but
> > > until Jan's inode sync sweep code is accepted I don't think there's
> > > much that can be done about it.
> >   Yes, my writeback sweeping patch was aimed exactly to reliably address
> > this issue. Anyway, if we could get the async stuff working properly then I
> > think livelocks should happen much less often... Need to really find some
> > time for this.
> 
> I think the async writeback is working correctly. It's just that if
> we queue async writeback, and it runs directly after a previous
> async writeback command was executed, it's possible it has nothing
> to do. The problem is that the sync writeback will wait on pages
> under writeback as it finds them, so it's likely to be running while
> more pages get dirtied and that's when the the tail-chase in
> write_cache_pages() starts.
  OK, probably you are right...
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

next prev parent reply	other threads:[~2010-04-21 13:27 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-31 16:07 endless sync on bdi_sched_wait()? 2.6.33.1 Denys Fedorysychenko
2010-03-31 22:12 ` Dave Chinner
2010-04-01 10:42   ` Denys Fedorysychenko
2010-04-01 11:13     ` Dave Chinner
2010-04-01 20:14       ` Jeff Moyer
2010-04-08  9:28 ` Jan Kara
2010-04-08 10:12   ` Denys Fedorysychenko
2010-04-12  0:47   ` Dave Chinner
2010-04-19  1:37   ` Dave Chinner
2010-04-19  7:04     ` Dave Chinner
2010-04-19  7:23       ` Dave Chinner
2010-04-21  0:33       ` Jan Kara
2010-04-21  1:54         ` Dave Chinner
2010-04-21 13:27           ` Jan Kara [this message]
2010-04-22  0:06             ` Dave Chinner
2010-04-22 12:48               ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100421132718.GA3327@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nuclearcat@nuclearcat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).