From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Benny Halevy <bhalevy@panasas.com>,
LKML <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
aneesh.kumar@linux.vnet.ibm.com
Subject: Re: iozone write 50% regression in kernel 2.6.24-rc1
Date: Tue, 13 Nov 2007 10:19:22 +0800 [thread overview]
Message-ID: <1194920362.20251.161.camel@ymzhang> (raw)
In-Reply-To: <1194886100.9713.13.camel@twins>
On Mon, 2007-11-12 at 17:48 +0100, Peter Zijlstra wrote:
> On Mon, 2007-11-12 at 17:05 +0200, Benny Halevy wrote:
> > On Nov. 12, 2007, 15:26 +0200, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > Single socket, dual core opteron, 2GB memory
> > > Single SATA disk, ext3
> > >
> > > x86_64 kernel and userland
> > >
> > > (dirty_background_ratio, dirty_ratio) tunables
> > >
> > > ---- (5,10) - default
> > >
> > > 2.6.23.1-42.fc8 #1 SMP
> > >
> > > 524288 4 59580 60356
> > > 524288 4 59247 61101
> > > 524288 4 61030 62831
> > >
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > >
> > > 524288 4 49277 56582
> > > 524288 4 50728 61056
> > > 524288 4 52027 59758
> > > 524288 4 51520 62426
> > >
> > >
> > > ---- (20,40) - similar to your 8GB
> > >
> > > 2.6.23.1-42.fc8 #1 SMP
> > >
> > > 524288 4 225977 447461
> > > 524288 4 232595 496848
> > > 524288 4 220608 478076
> > > 524288 4 203080 445230
> > >
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > >
> > > 524288 4 54043 83585
> > > 524288 4 69949 516253
> > > 524288 4 72343 491416
> > > 524288 4 71775 492653
> > >
> > > ---- (60,80) - overkill
> > >
> > > 2.6.23.1-42.fc8 #1 SMP
> > >
> > > 524288 4 208450 491892
> > > 524288 4 216262 481135
> > > 524288 4 221892 543608
> > > 524288 4 202209 574725
> > > 524288 4 231730 452482
> > >
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > >
> > > 524288 4 49091 86471
> > > 524288 4 65071 217566
> > > 524288 4 72238 492172
> > > 524288 4 71818 492433
> > > 524288 4 71327 493954
> > >
> > >
> > > While I see that the write speed as reported under .24 ~70MB/s is much
> > > lower than the one reported under .23 ~200MB/s, I find it very hard to
> > > believe my poor single SATA disk could actually do the 200MB/s for
> > > longer than its cache 8/16 MB (not sure).
> > >
> > > vmstat shows that actual IO is done, even though the whole 512MB could
> > > fit in cache, hence my suspicion that the ~70MB/s is the most realistic
> > > of the two.
> >
> > Even 70 MB/s seems too high. What throughput do you see for the
> > raw disk partition/
> >
> > Also, are the numbers above for successive runs?
> > It seems like you're seeing some caching effects so
> > I'd recommend using a file larger than your cache size and
> > the -e and -c options (to include fsync and close in timings)
> > to try to eliminate them.
>
> ------ iozone -i 0 -r 4k -s 512m -e -c
>
> .23 (20,40)
>
> 524288 4 31750 33560
> 524288 4 29786 32114
> 524288 4 29115 31476
>
> .24 (20,40)
>
> 524288 4 25022 32411
> 524288 4 25375 31662
> 524288 4 26407 33871
>
>
> ------ iozone -i 0 -r 4k -s 4g -e -c
>
> .23 (20,40)
>
> 4194304 4 39699 35550
> 4194304 4 40225 36099
>
>
> .24 (20,40)
>
> 4194304 4 39961 41656
> 4194304 4 39244 39673
>
>
> Yanmin, for that benchmark you ran, what was it meant to measure?
> From what I can make of it its just write cache benching.
Yeah. It's quite related to cache. I did more testing on my stoakley machine (8 cores,
8GB mem). If I reduce the memory to 4GB, the speed will be far slower.
>
> One thing I don't understand is how the write numbers are so much lower
> than the rewrite numbers. The iozone code (which gives me headaches,
> damn what a mess) seems to suggest that the only thing that is different
> is the lack of block allocation.
It might be a good direction.
>
> Linus posted a patch yesterday fixing up a regression in the ext3 bitmap
> block allocator, /me goes apply that patch and rerun the tests.
>
> > > ---- (20,40) - similar to your 8GB
> > >
> > > 2.6.23.1-42.fc8 #1 SMP
> > >
> > > 524288 4 225977 447461
> > > 524288 4 232595 496848
> > > 524288 4 220608 478076
> > > 524288 4 203080 445230
> > >
> > > 2.6.24-rc2 #28 SMP PREEMPT
> > >
> > > 524288 4 54043 83585
> > > 524288 4 69949 516253
> > > 524288 4 72343 491416
> > > 524288 4 71775 492653
>
> 2.6.24-rc2 +
> patches/wu-reiser.patch
> patches/writeback-early.patch
> patches/bdi-task-dirty.patch
> patches/bdi-sysfs.patch
> patches/sched-hrtick.patch
> patches/sched-rt-entity.patch
> patches/sched-watchdog.patch
> patches/linus-ext3-blockalloc.patch
>
> 524288 4 179657 487676
> 524288 4 173989 465682
> 524288 4 175842 489800
>
>
> Linus' patch is the one that makes the difference here. So I'm unsure
> how you bisected it down to:
>
> 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
Originally, my test suite is just to pick up the result of first run. Your prior
patch(speed up writeback ramp-up on clean systems) fixed an issue about first
run result regression. So my bisect captured it.
However, late on, I found following run have different results. A moment ago,
I retested 04fbfdc14e5f48463820d6b9807daa5e9c92c51f by:
#git checkout 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
#make
Then, reverse your patch. It looks like 04fbfdc14e5f48463820d6b9807daa5e9c92c51f
is not the root cause of following run regression. I will change my test suite to
make it run for many times and do a new bisect.
> These results seem to point to
>
> 7c9e69faa28027913ee059c285a5ea8382e24b5d
I tested 2.6.24-rc2 which already includes above patch. 2.6.24-rc2 has the same
regression like 2.6.24-rc1.
-yanmin
next prev parent reply other threads:[~2007-11-13 2:20 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-11-09 9:47 iozone write 50% regression in kernel 2.6.24-rc1 Zhang, Yanmin
2007-11-09 9:54 ` Peter Zijlstra
2007-11-12 2:14 ` Zhang, Yanmin
2007-11-12 9:45 ` Peter Zijlstra
2007-11-12 9:51 ` Zhang, Yanmin
2007-11-12 13:26 ` Peter Zijlstra
[not found] ` <47386BC4.3050403@panasas.com>
2007-11-12 16:48 ` Peter Zijlstra
2007-11-13 2:19 ` Zhang, Yanmin [this message]
2007-11-13 8:34 ` Zhang, Yanmin
2007-11-13 18:32 ` Peter Zijlstra
2007-11-12 17:25 ` Mark Lord
2007-11-13 1:49 ` Zhang, Yanmin
-- strict thread matches above, loose matches on Subject: below --
2007-11-09 12:36 Martin Knoblauch
2007-11-12 0:45 ` Zhang, Yanmin
2007-11-12 12:58 Martin Knoblauch
2007-11-13 2:04 ` Zhang, Yanmin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1194920362.20251.161.camel@ymzhang \
--to=yanmin_zhang@linux.intel.com \
--cc=a.p.zijlstra@chello.nl \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=bhalevy@panasas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox