linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Tso <tytso@mit.edu>, Jens Axboe <jens.axboe@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"chris.mason@oracle.com" <chris.mason@oracle.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"jack@suse.cz" <jack@suse.cz>
Subject: Re: [PATCH 0/7] Per-bdi writeback flusher threads v20
Date: Sun, 20 Sep 2009 21:00:06 +0200
Message-ID: <20090920190006.GD16919@duck.suse.cz>
In-Reply-To: <20090919150351.GA19880@localhost>

On Sat 19-09-09 23:03:51, Wu Fengguang wrote:
> On Sat, Sep 19, 2009 at 12:26:07PM +0800, Wu Fengguang wrote:
> > On Sat, Sep 19, 2009 at 12:00:51PM +0800, Wu Fengguang wrote:
> > > On Sat, Sep 19, 2009 at 11:58:35AM +0800, Wu Fengguang wrote:
> > > > On Sat, Sep 19, 2009 at 01:52:52AM +0800, Theodore Tso wrote:
> > > > > On Fri, Sep 11, 2009 at 10:39:29PM +0800, Wu Fengguang wrote:
> > > > > > 
> > > > > > That would be good. Sorry for the late work. I'll allocate some time
> > > > > > in mid next week to help review and benchmark recent writeback works,
> > > > > > and hope to get things done in this merge window.
> > > > > 
> > > > > Did you have some chance to get more work done on the your writeback
> > > > > patches?
> > > > 
> > > > Sorry for the delay, I'm now testing the patches with commands
> > > > 
> > > >  cp /dev/zero /mnt/test/zero0 &
> > > >  dd if=/dev/zero of=/mnt/test/zero1 &
> > > > 
> > > > and the attached debug patch.
> > > > 
> > > > One problem I found with ext3/4 is, redirty_tail() is called repeatedly
> > > > in the traces, which could slow down the inode writeback significantly.
> > > 
> > > FYI, it's this redirty_tail() called in writeback_single_inode():
> > > 
> > >                         /*
> > >                          * Someone redirtied the inode while were writing back
> > >                          * the pages.
> > >                          */
> > >                         redirty_tail(inode);
> > 
> > Hmm, this looks like an old fashioned problem get blew up by the
> > 128MB MAX_WRITEBACK_PAGES.
> > 
> > The inode was redirtied by the busy cp/dd processes. Now it takes much
> > more time to sync 128MB, so that a heavy dirtier can easily redirty
> > the inode in that time window.
> > 
> > One single invocation of redirty_tail() could hold up the writeback of
> > current inode for up to 30 seconds.
> 
> It seems that this patch helps. However I'm afraid it's too late to
> risk merging such kind of patches now..
  Fengguang, could we maybe write down what the logic should look like
and then look at the code and modify it as needed to fit the logic?
Because I couldn't find a compact description of the logic anywhere
in the code.
  Here is how I'd imagine the writeout logic should work:
We would have just two lists - b_dirty and b_more_io. Both would be
ordered by dirtied_when.
  A thread doing WB_SYNC_ALL writeback will just walk the list and clean up
everything (we should be resistant to livelocks because we stop at the
first inode that was dirtied after the sync started).
  A thread doing WB_SYNC_NONE writeback will start walking the list. If the
inode has I_SYNC set, it puts it on b_more_io. Otherwise it takes I_SYNC
and writes as much as it finds necessary from the first inode. If it
stopped before it wrote everything, it puts the inode at the end of
b_more_io.  If it wrote everything (writeback_index cycled or scanned the
whole range) but the inode is still dirty, it puts the inode at the end of
b_dirty and resets dirtied_when to the current time. Then it continues with
the next inode.
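  As a rough illustration of the WB_SYNC_NONE rules above, the per-inode
decision can be written as a small pure function in userspace C. This is
only a sketch: the enum, the function name and the three boolean inputs
are hypothetical and do not exist in the kernel; they just restate the
cases listed in the paragraph.

```c
#include <stdbool.h>

/* Hypothetical outcome names for this sketch; they are not kernel symbols. */
enum wb_requeue {
	WB_REQUEUE_NONE,	/* inode is clean, it leaves both lists */
	WB_REQUEUE_MORE_IO,	/* inode goes on b_more_io */
	WB_REQUEUE_DIRTY,	/* tail of b_dirty, dirtied_when reset to now */
};

/*
 * Decide where a WB_SYNC_NONE pass leaves an inode.
 * i_sync_held:      someone else already holds I_SYNC on the inode
 * wrote_everything: writeback_index cycled or the whole range was scanned
 * still_dirty:      the inode is dirty after our pass
 */
enum wb_requeue wb_sync_none_requeue(bool i_sync_held, bool wrote_everything,
				     bool still_dirty)
{
	if (i_sync_held)
		return WB_REQUEUE_MORE_IO;	/* skip it, retry later */
	if (!wrote_everything)
		return WB_REQUEUE_MORE_IO;	/* more pages left to write */
	if (still_dirty)
		return WB_REQUEUE_DIRTY;	/* redirtied during writeback */
	return WB_REQUEUE_NONE;
}
```

For example, an inode that was fully written but redirtied by a busy dd
maps to WB_REQUEUE_DIRTY, while one that hit its write quota first maps to
WB_REQUEUE_MORE_IO.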
  kupdate style writeback stops scanning the dirty list when dirtied_when is
new enough. Then if b_more_io is nonempty, it splices it into the beginning
of the dirty list and restarts.
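  The kupdate cutoff can be sketched the same way. Assuming the dirty list
is modelled as an array of dirtied_when timestamps ordered oldest first,
the scan visits only the leading entries that are old enough. The function
name and the older_than parameter are made up for this example, and plain
integer comparison is used, so timestamp wraparound is ignored.

```c
#include <stddef.h>

/*
 * Count the inodes at the head of a list ordered by dirtied_when (oldest
 * first) that a kupdate-style pass would visit. older_than is a
 * hypothetical cutoff timestamp: entries dirtied at or before it are old
 * enough to be written.
 */
size_t kupdate_scan_len(const unsigned long *dirtied_when, size_t n,
			unsigned long older_than)
{
	size_t i = 0;

	/* Because the list is ordered, the first too-new entry ends the scan. */
	while (i < n && dirtied_when[i] <= older_than)
		i++;
	return i;	/* entries [i, n) are too new and stay queued */
}
```

The early stop is only valid because the list is kept ordered by
dirtied_when; that is why the ordering of b_dirty matters in this scheme.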
  Other types of writeback splice b_more_io to b_dirty when b_dirty gets
empty. pdflush style writeback writes until we drop below background dirty
limit. Other kinds of writeback (throttled threads, writeback submitted by
filesystem itself) write while nr_to_write > 0.
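  The two stop conditions in the paragraph above could be expressed as one
predicate. Again this is only a sketch under assumptions: the enum labels,
the function name and the idea of passing the dirty page count and the
background limit as plain numbers are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical labels for the writeback flavours named above. */
enum wb_kind {
	WB_KIND_BACKGROUND,	/* pdflush style writeback */
	WB_KIND_OTHER,		/* throttled threads, filesystem-submitted */
};

/* Return true while the given kind of writeback should keep going. */
bool wb_keep_writing(enum wb_kind kind, unsigned long nr_dirty,
		     unsigned long background_thresh, long nr_to_write)
{
	if (kind == WB_KIND_BACKGROUND)
		/* write until we drop below the background dirty limit */
		return nr_dirty >= background_thresh;
	/* everyone else is bounded by its remaining page budget */
	return nr_to_write > 0;
}
```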
  If we didn't write anything during the b_dirty scan, we wait until I_SYNC
of the first inode on b_more_io gets cleared before starting the next scan.
  Does this look reasonably complete and cover all the cases?

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
