From: Fengguang Wu <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jeff Moyer <jmoyer@redhat.com>, Jens Axboe <axboe@kernel.dk>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>,
Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@infradead.org>,
Shaohua Li <shli@fusionio.com>
Subject: Re: [PATCH] btrfs: lower metadata writeback threshold on low dirty threshold
Date: Thu, 3 May 2012 18:02:49 +0800
Message-ID: <20120503100249.GA18819@localhost>
In-Reply-To: <20120503092528.GA1104@quack.suse.cz>
On Thu, May 03, 2012 at 11:25:28AM +0200, Jan Kara wrote:
> On Thu 03-05-12 11:43:11, Wu Fengguang wrote:
> > This helps write performance when setting the dirty threshold to tiny numbers.
> >
> >  3.4.0-rc2   3.4.0-rc2-btrfs4+
> > ----------  ------------------
> >      96.92     -0.4%     96.54  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
> >      98.47     +0.0%     98.50  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
> >      99.38     -0.3%     99.06  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
> >      98.04     -0.0%     98.02  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
> >      98.68     +0.3%     98.98  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
> >      99.34     -0.0%     99.31  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
> > ==>  88.98     +9.6%     97.53  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
> > ==>  86.99    +13.1%     98.39  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
> > ==>   2.75  +2442.4%     69.88  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
> > ==>   3.31  +2634.1%     90.54  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
> >
> > Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
> > ---
> > fs/btrfs/disk-io.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > --- linux-next.orig/fs/btrfs/disk-io.c 2012-05-02 14:04:00.989262395 +0800
> > +++ linux-next/fs/btrfs/disk-io.c 2012-05-02 14:04:01.773262414 +0800
> > @@ -930,7 +930,8 @@ static int btree_writepages(struct addre
> >
> >  		/* this is a bit racy, but that's ok */
> >  		num_dirty = root->fs_info->dirty_metadata_bytes;
> > -		if (num_dirty < thresh)
> > +		if (num_dirty < min(thresh,
> > +				    global_dirty_limit << (PAGE_CACHE_SHIFT-2)))
> >  			return 0;
> >  	}
> >  	return btree_write_cache_pages(mapping, wbc);
> Frankly, that whole condition on WB_SYNC_NONE in btree_writepages() looks
> like a hack. I think we also had problems with this condition when we tried
> to change b_more_io list handling. I found a rather terse commit message
> explaining the code:
>   Btrfs: Limit btree writeback to prevent seeks
>
> Which I kind of understand, but is it that bad? Also, I think the last time we
> stumbled over this code we were discussing that this dirty metadata would
> simply be hidden from mm, which would solve the problem of the flusher thread
> trying to outsmart the filesystem... But I guess no one had time to
> implement this for btrfs.

Yeah, I have the same uneasy feeling. Actually, my first attempt was to
remove the heuristics in btree_writepages() altogether. The result is
performance degradation, to varying degrees, in the normal cases:

wfg@bee /export/writeback% ./compare bay/*/*-{3.4.0-rc2,3.4.0-rc2-btrfs+}
 3.4.0-rc2    3.4.0-rc2-btrfs+
----------  ------------------
    190.81     -6.8%    177.82  bay/JBOD-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
    195.86     -3.3%    189.31  bay/JBOD-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
    196.68     -1.7%    193.30  bay/JBOD-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
    194.83    -24.4%    147.27  bay/JBOD-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
    196.60     -2.5%    191.61  bay/JBOD-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
    197.09     -0.7%    195.69  bay/JBOD-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
    181.64     -8.7%    165.80  bay/RAID0-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
    186.14     -2.8%    180.85  bay/RAID0-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
    191.10     -1.5%    188.23  bay/RAID0-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
    191.30    -20.7%    151.63  bay/RAID0-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
    186.03     -2.4%    181.54  bay/RAID0-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
    170.18     -2.5%    165.97  bay/RAID0-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
     96.18     -1.9%     94.32  bay/RAID1-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
     97.71     -1.4%     96.36  bay/RAID1-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
     97.57     -0.4%     97.23  bay/RAID1-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
     97.68     -6.0%     91.79  bay/RAID1-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
     97.76     -0.7%     97.07  bay/RAID1-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
     97.53     -0.3%     97.19  bay/RAID1-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
     96.92     -3.0%     94.03  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
     98.47     -1.4%     97.08  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
     99.38     -0.7%     98.66  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
     98.04     -8.2%     89.99  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
     98.68     -0.6%     98.09  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
     99.34     -0.7%     98.62  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
     88.98     -0.5%     88.51  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
     86.99    +14.5%     99.60  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
      2.75  +1871.2%     54.18  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
      3.31  +2035.0%     70.70  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
   3635.55     -1.2%   3592.46  TOTAL write_bw

So I ended up with the conservative fix in this patch.

FYI, I also experimented with "global_dirty_limit << PAGE_CACHE_SHIFT",
without the further "/4" in this patch; however, the results are not good:

 3.4.0-rc2   3.4.0-rc2-btrfs3+
----------  ------------------
     96.92     -0.3%     96.62  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
     98.47     +0.1%     98.56  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
     99.38     -0.2%     99.23  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
     98.04     +0.1%     98.15  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
     98.68     +0.3%     98.96  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
     99.34     -0.1%     99.20  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
     88.98     -0.3%     88.73  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
     86.99     +1.4%     88.23  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
      2.75   +232.0%      9.13  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
      3.31     +1.5%      3.36  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2

So this patch is based on "experiment" rather than "reasoning".
And I took the easy way of using the global dirty threshold. Ideally
it should be based on the per-bdi dirty threshold, but anyway...
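
To make the arithmetic of the patched check concrete, here is a minimal
user-space sketch of it, not the kernel code itself. The names
SKETCH_PAGE_SHIFT, effective_thresh() and skip_background_writeback() are
made up for illustration. It assumes 4 KiB pages (PAGE_CACHE_SHIFT == 12)
and that global_dirty_limit is counted in pages, so shifting by
(PAGE_CACHE_SHIFT-2) converts pages to bytes and divides by 4 in one step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. PAGE_CACHE_SHIFT == 12. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Byte threshold below which background metadata writeback is skipped.
 * limit_pages stands in for global_dirty_limit (assumed to be in pages).
 * The shift by (SKETCH_PAGE_SHIFT - 2) turns pages into bytes and
 * divides by 4 at the same time.
 */
static uint64_t effective_thresh(uint64_t fixed_thresh, uint64_t limit_pages)
{
	uint64_t quarter_limit = limit_pages << (SKETCH_PAGE_SHIFT - 2);

	return fixed_thresh < quarter_limit ? fixed_thresh : quarter_limit;
}

/* Mirrors the patched condition: true means "return 0 without writing". */
static bool skip_background_writeback(uint64_t num_dirty,
				      uint64_t fixed_thresh,
				      uint64_t limit_pages)
{
	return num_dirty < effective_thresh(fixed_thresh, limit_pages);
}
```

For example, if the fixed thresh were 32 MiB (an assumed value, the diff
above does not show it) and a 1M dirty threshold corresponded to a
global_dirty_limit of 256 pages (also an assumption), the effective
threshold would drop to 256 KiB, while with a limit of 256000 pages the
fixed thresh would still win.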
Thanks,
Fengguang