From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To: Shaohua Li <shli@fusionio.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>, NeilBrown <neilb@suse.de>,
linux RAID <linux-raid@vger.kernel.org>
Subject: Re: ext4 write performance regression in 3.6-rc1 on RAID0/5
Date: Wed, 22 Aug 2012 13:39:36 +0800
Message-ID: <20120822053936.GG2570@yliu-dev.sh.intel.com>
In-Reply-To: <50345AFE.1070700@fusionio.com>
On Wed, Aug 22, 2012 at 12:07:26PM +0800, Shaohua Li wrote:
> On 8/22/12 11:57 AM, Yuanhan Liu wrote:
> > On Fri, Aug 17, 2012 at 10:25:26PM +0800, Fengguang Wu wrote:
> >> [CC md list]
> >>
> >> On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> >>> On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> >>>> Ted,
> >>>>
> >>>> I find ext4 write performance dropped by 3.3% on average in the
> >>>> 3.6-rc1 merge window. xfs and btrfs are fine.
> >>>>
> >>>> Two machines are tested. The performance regression happens in the
> >>>> lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02 does
> >>>> not see regression, which is equipped with HDD drives. I'll continue
> >>>> to repeat the tests and report variations.
> >>>
> >>> Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> >>> fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> >>> cause that. The are the lock elimination changes for Direct I/O
> >>> overwrites, but that shouldn't matter for your tests which are
> >>> measuring buffered writes, correct?
> >>>
> >>> Is there any chance you could do me a favor and do a git bisect
> >>> restricted to commits involving fs/ext4 and fs/jbd2?
> >>
> >> I noticed that the regressions all happen in the RAID0/RAID5 cases.
> >> So it may be some interactions between the RAID/ext4 code?
> >>
> >> I'll try to get some ext2/3 numbers, which should have less changes
> >> on the fs side.
> >>
> >> wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
> >> 3.5.0 3.6.0-rc1+
> >> ------------------------ ------------------------
> >> 720.62 -1.5% 710.16 lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
> >> 706.04 -0.0% 705.86 lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
> >> 702.86 -0.2% 701.74 lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
> >> 702.41 -0.0% 702.06 lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
> >> 779.52 +6.5% 830.11 lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
> >> 646.70 +4.9% 678.59 lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
> >> 704.49 +2.6% 723.00 lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
> >> 704.21 +1.2% 712.47 lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
> >> 705.26 -1.2% 696.61 lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
> >> 703.37 +0.1% 703.76 lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
> >> 701.66 -0.1% 700.83 lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
> >> 701.17 +0.0% 701.36 lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
> >> 675.08 -10.5% 604.29 lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
> >> 676.52 -2.7% 658.38 lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
> >> 512.70 +4.0% 533.22 lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
> >> 524.61 -0.3% 522.90 lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
> >> 709.76 -15.7% 598.44 lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
> >> 681.39 -2.1% 667.25 lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
> >> 524.16 +0.8% 528.25 lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
> >> 699.77 -19.2% 565.54 lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
> >> 675.79 -1.9% 663.17 lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
> >> 484.84 -7.4% 448.83 lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
> >> 470.40 -3.2% 455.31 lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
> >> 167.97 -38.7% 103.03 lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
> >> 243.67 -9.1% 221.41 lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
> >> 248.98 +12.2% 279.33 lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
> >> 208.45 +14.1% 237.86 lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
> >> 71.18 -34.2% 46.82 lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
> >> 145.84 -7.3% 135.25 lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
> >> 255.22 +6.7% 272.35 lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
> >> 243.09 +20.7% 293.30 lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-2-3.5.0
> >> 209.24 -23.6% 159.96 lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
> >> 243.73 -10.9% 217.28 lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0
> >
> > Hi,
> >
> > About this issue, I did some investigation. And found we are blocked at
> > get_active_stripes() in most times. It's reasonable, since max_nr_stripes
> > is set to 256 now. It's a kind of small value, thus I tried with
> > different value. Please see the following patch for detailed numbers.
> >
> > The test machine is same as above.
> >
> > From 85c27fca12b770da5bc8ec9f26a22cb414e84c68 Mon Sep 17 00:00:00 2001
> > From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > Date: Wed, 22 Aug 2012 10:51:48 +0800
> > Subject: [RFC PATCH] md/raid5: increase NR_STRIPES to 1024
> >
> > Stripe head is a must held resource before doing any IO. And it's
> > limited to 256 by default. With 10dd case, we found that it is
> > blocked at get_active_stripes() in most times(please see the ps
> > output attached).
> >
> > Thus I did some tries with different value set to NR_STRIPS, and
> > here are some numbers(EXT4 only) I got with different NR_STRIPS set:
> >
> > write bandwidth:
> > ================
> > 3.5.0-rc1-256+: (Here 256 means with max strip head set to 256)
> > write bandwidth: 280
> > 3.5.0-rc1-1024+:
> > write bandwidth: 421 (+50.4%)
> > 3.5.0-rc1-4096+:
> > write bandwidth: 506 (+80.7%)
> > 3.5.0-rc1-32768+:
> > write bandwidth: 615 (+119.6%)
> >
> > (Here 'sh' means with Shaohua's "multiple threads to handle strips"
> > patch [0])
> > 3.5.0-rc3-strip-sh+-256:
> > write bandwidth: 465
> >
> > 3.5.0-rc3-strip-sh+-1024:
> > write bandwidth: 599
> >
> > 3.5.0-rc3-strip-sh+-32768:
> > write bandwidth: 615
> >
> > The kernel maybe a bit older but I found that the data are still kind of
> > valid. Though, I haven't tried Shaohua's latest patch.
> >
> > As you can see from those data above: the write bandwidth is increased
> > (a lot) as we increase NR_STRIPES. Thus the bigger NR_STRIPES set, the
> > better write bandwidth we get. But we can't set NR_STRIPES with a too
> > large number, especially by default, or it need lots of memory. Due to
> > the number I got with Shaohua's patch applied, I guess 1024 would be
> > nice value; it's not too big but we gain above 110% performance.
> >
> > Comments? BTW, I have a more flexible(more stupid, in the meantime) way:
> > change the max_nr_stripes dynamically based on need?
> >
> > Here I also attached more data: the script I used to get those number,
> > ps output, and iostat -kx 3 output.
> >
> > The script does it's job in a straight way: start NR dd in background,
> > trace the writeback/global_dirty_state event in background to count the
> > write bandwidth, sample the ps out regularly.
> >
> > ---
> > [0]: patch: http://lwn.net/Articles/500200/
> >
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > ---
> > drivers/md/raid5.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index adda94d..82dca53 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -62,7 +62,7 @@
> > * Stripe cache
> > */
> >
> > -#define NR_STRIPES 256
> > +#define NR_STRIPES 1024
> > #define STRIPE_SIZE PAGE_SIZE
> > #define STRIPE_SHIFT (PAGE_SHIFT - 9)
> > #define STRIPE_SECTORS (STRIPE_SIZE>>9)
>
> does revert commit 8811b5968f6216e fix the problem?

Hi Shaohua,

Quoting those numbers again:

write bandwidth:
================
3.5.0-rc1-256+:
write bandwidth: 280
3.5.0-rc1-1024+:
write bandwidth: 421 (+50.4%)
3.5.0-rc1-4096+:
write bandwidth: 506 (+80.7%)
3.5.0-rc1-32768+:
write bandwidth: 615 (+119.6%)

The kernel above does not include commit 8811b5968f6216e; it is a
somewhat old kernel.

The following kernel does include it; I applied your patch series
(http://thread.gmane.org/gmane.linux.raid/38711) on top:

3.5.0-rc3-strip-sh+-256:
write bandwidth: 465
3.5.0-rc3-strip-sh+-1024:
write bandwidth: 599
3.5.0-rc3-strip-sh+-32768:
write bandwidth: 615
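
A quick sanity check on the figures above: the relative gains can be recomputed from the raw bandwidth numbers. This is only a sketch. The names `unpatched`, `patched`, `gain_pct` and `gains` are made up for illustration, and the percentages for the patched series do not appear in the message; they are simply derived the same way, relative to the 256-stripe run of the same kernel.

```python
# Write bandwidth figures copied from the message, keyed by stripe-cache size.
unpatched = {256: 280, 1024: 421, 4096: 506, 32768: 615}  # 3.5.0-rc1
patched = {256: 465, 1024: 599, 32768: 615}               # 3.5.0-rc3 + patch series


def gain_pct(value, baseline):
    """Relative change of value over baseline, in percent, rounded to 0.1."""
    return round((value - baseline) / baseline * 100, 1)


def gains(series):
    """Gain of every run in a series over that series' 256-stripe run."""
    base = series[256]
    return {n: gain_pct(bw, base) for n, bw in series.items() if n != 256}


unpatched_gains = gains(unpatched)
patched_gains = gains(patched)

for label, result in (("unpatched", unpatched_gains), ("patched", patched_gains)):
    for nr_stripes, pct in result.items():
        print(f"{label:9} NR_STRIPES={nr_stripes:>5}: {pct:+.1f}%")
```

Read this way, the unpatched kernel gains far more from a larger stripe cache than the patched one does, and both end up at the same 615 with 32768 stripes.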

And yes, the kernel is old. But judging from Fengguang's data, a newer
kernel does not seem to make much difference.
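
The memory concern raised in the quoted patch description can be put into rough numbers. The sketch below uses the rule of thumb from the kernel's md documentation (page size times number of member devices times number of stripes). It assumes 4 KiB pages and that all 12 drives of the test machine are array members; both are assumptions, and the per-stripe bookkeeping overhead is ignored. The helper name `stripe_cache_bytes` is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
NR_DISKS = 12     # assumption: all 12 drives are members of the array


def stripe_cache_bytes(nr_stripes, nr_disks=NR_DISKS, page_size=PAGE_SIZE):
    """Approximate page memory held by the stripe cache, in bytes."""
    return nr_stripes * nr_disks * page_size


# Stripe counts tried in the measurements above.
cache_mib = {n: stripe_cache_bytes(n) / 2**20 for n in (256, 1024, 4096, 32768)}

for nr_stripes, mib in cache_mib.items():
    print(f"NR_STRIPES={nr_stripes:>5}: ~{mib:.0f} MiB")
```

Under these assumptions, 1024 stripes cost tens of MiB while 32768 stripes cost well over a GiB. The value can also be changed per array at runtime through the `stripe_cache_size` attribute under `/sys/block/mdX/md/`, so a modest default does not prevent tuning it higher on large machines.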
Thanks,
Yuanhan Liu