From: Fengguang Wu <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jeff Moyer <jmoyer@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	Shaohua Li <shli@fusionio.com>
Subject: Re: [PATCH] btrfs: lower metadata writeback threshold on low dirty threshold
Date: Thu, 3 May 2012 18:02:49 +0800
Message-ID: <20120503100249.GA18819@localhost>
In-Reply-To: <20120503092528.GA1104@quack.suse.cz>

On Thu, May 03, 2012 at 11:25:28AM +0200, Jan Kara wrote:
> On Thu 03-05-12 11:43:11, Wu Fengguang wrote:
> > This helps write performance when setting the dirty threshold to tiny numbers.
> > 
> >      3.4.0-rc2         3.4.0-rc2-btrfs4+
> >   ------------  ------------------------
> >          96.92        -0.4%        96.54  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
> >          98.47        +0.0%        98.50  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
> >          99.38        -0.3%        99.06  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
> >          98.04        -0.0%        98.02  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
> >          98.68        +0.3%        98.98  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
> >          99.34        -0.0%        99.31  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
> >   ==>    88.98        +9.6%        97.53  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
> >   ==>    86.99       +13.1%        98.39  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
> >   ==>     2.75     +2442.4%        69.88  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
> >   ==>     3.31     +2634.1%        90.54  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
> > 
> > Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
> > ---
> >  fs/btrfs/disk-io.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > --- linux-next.orig/fs/btrfs/disk-io.c	2012-05-02 14:04:00.989262395 +0800
> > +++ linux-next/fs/btrfs/disk-io.c	2012-05-02 14:04:01.773262414 +0800
> > @@ -930,7 +930,8 @@ static int btree_writepages(struct addre
> >  
> >  		/* this is a bit racy, but that's ok */
> >  		num_dirty = root->fs_info->dirty_metadata_bytes;
> > -		if (num_dirty < thresh)
> > +		if (num_dirty < min(thresh,
> > +				    global_dirty_limit << (PAGE_CACHE_SHIFT-2)))
> >  			return 0;
> >  	}
> >  	return btree_write_cache_pages(mapping, wbc);
>   Frankly, that whole condition on WB_SYNC_NONE in btree_writepages() looks
> like a hack. I think we also had problems with this condition when we tried
> to change b_more_io list handling. I found a rather terse commit message
> explaining the code:
> Btrfs: Limit btree writeback to prevent seeks
> 
>   Which I kind of understand, but is it that bad? Also, I think the last time
> we stumbled over this code we were discussing that this dirty metadata would
> simply be hidden from mm, which would solve the problem of the flusher thread
> trying to outsmart the filesystem... But I guess no one had time to
> implement this for btrfs.

Yeah, I have the same uneasy feeling. Actually, my first attempt was to
remove the heuristic in btree_writepages() altogether. The result was
a performance degradation, small or large, in most of the normal cases:

wfg@bee /export/writeback% ./compare bay/*/*-{3.4.0-rc2,3.4.0-rc2-btrfs+} 
               3.4.0-rc2          3.4.0-rc2-btrfs+  
------------------------  ------------------------  
                  190.81        -6.8%       177.82  bay/JBOD-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                  195.86        -3.3%       189.31  bay/JBOD-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                  196.68        -1.7%       193.30  bay/JBOD-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                  194.83       -24.4%       147.27  bay/JBOD-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
                  196.60        -2.5%       191.61  bay/JBOD-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
                  197.09        -0.7%       195.69  bay/JBOD-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
                  181.64        -8.7%       165.80  bay/RAID0-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                  186.14        -2.8%       180.85  bay/RAID0-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                  191.10        -1.5%       188.23  bay/RAID0-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                  191.30       -20.7%       151.63  bay/RAID0-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
                  186.03        -2.4%       181.54  bay/RAID0-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
                  170.18        -2.5%       165.97  bay/RAID0-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   96.18        -1.9%        94.32  bay/RAID1-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                   97.71        -1.4%        96.36  bay/RAID1-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                   97.57        -0.4%        97.23  bay/RAID1-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                   97.68        -6.0%        91.79  bay/RAID1-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
                   97.76        -0.7%        97.07  bay/RAID1-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
                   97.53        -0.3%        97.19  bay/RAID1-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   96.92        -3.0%        94.03  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                   98.47        -1.4%        97.08  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                   99.38        -0.7%        98.66  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                   98.04        -8.2%        89.99  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
                   98.68        -0.6%        98.09  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
                   99.34        -0.7%        98.62  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   88.98        -0.5%        88.51  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
                   86.99       +14.5%        99.60  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
                    2.75     +1871.2%        54.18  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
                    3.31     +2035.0%        70.70  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
                 3635.55        -1.2%      3592.46  TOTAL write_bw

So I ended up with the conservative fix in this patch.

FYI, I also experimented with "global_dirty_limit << PAGE_CACHE_SHIFT",
without the further "/4" used in this patch; however, the results were not good:

               3.4.0-rc2         3.4.0-rc2-btrfs3+
------------------------  ------------------------
                   96.92        -0.3%        96.62  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                   98.47        +0.1%        98.56  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                   99.38        -0.2%        99.23  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                   98.04        +0.1%        98.15  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
                   98.68        +0.3%        98.96  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
                   99.34        -0.1%        99.20  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   88.98        -0.3%        88.73  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
                   86.99        +1.4%        88.23  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
                    2.75      +232.0%         9.13  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
                    3.31        +1.5%         3.36  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2

So this patch is kind of based on "experiment" rather than "reasoning".
And I took the easy way of using the global dirty threshold. Ideally
it should be based upon the per-bdi dirty threshold, but anyway...

Thanks,
Fengguang
