public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.6.24-rc1: First impressions
@ 2007-10-26 14:18 Martin Knoblauch
  2007-10-26 15:22 ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Knoblauch @ 2007-10-26 14:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter zijlstra, Fengguang Wu

Hi,

 just to give some feedback on 2.6.24-rc1. For some time I have been tracking IO/writeback problems that hurt system responsiveness big-time. I tested Peter's stuff together with Fengguang's additions and it looked promising. Therefore I was very happy to see Peter's work go into 2.6.24 and waited eagerly for rc1. In short, I am impressed. This really looks good: IO throughput is great, and so far I could not reproduce the responsiveness problems.

 Below are some numbers from my brute-force I/O tests that I can use to bring responsiveness down. My platform is an HP DL380g4: dual CPUs, HT enabled, 8 GB memory, a SmartArray6i controller with 4x72GB SCSI disks as RAID5 (battery-protected writeback cache enabled) and gigabit networking (tg3). User space is 64-bit RHEL4.3.

 I am basically doing copies using "dd" with 1 MB blocksize. The local filesystem is ext2 (noatime). The IO scheduler is deadline, as it tends to give the best results. The NFS3 server is a Sun T2000 running Solaris 10. The tests are:

dd1 - copy 16 GB from /dev/zero to local FS
dd1-dir - same, but using O_DIRECT for output
dd2/dd2-dir - copy 2x7.6 GB in parallel from /dev/zero to local FS
dd3/dd3-dir - copy 3x5.2 GB in parallel from /dev/zero to local FS
net1 - copy 5.2 GB from NFS3 share to local FS
mix3 - copy 3x5.2 GB from /dev/zero to local disk and two NFS3 shares
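The local dd tests above can be sketched roughly as follows. This is a minimal sketch, not the exact commands used: paths are illustrative, and the sizes are scaled down to 16 MB so it runs quickly (the real runs wrote 16 GB / 2x7.6 GB files to an ext2 filesystem mounted noatime).

```shell
#!/bin/sh
# Rough sketch of the dd-based tests (sizes scaled down, paths illustrative).
OUT=$(mktemp -d)

# dd1: one sequential write stream, 1 MB blocks
dd if=/dev/zero of="$OUT/dd1" bs=1M count=16 2>/dev/null

# dd1-dir: same, but O_DIRECT bypasses the page cache
# (oflag=direct needs GNU dd and a filesystem that supports O_DIRECT,
#  so it may fail e.g. on tmpfs -- hence the || true)
dd if=/dev/zero of="$OUT/dd1dir" bs=1M count=16 oflag=direct 2>/dev/null || true

# dd2: two streams in parallel, then wait for both to finish
dd if=/dev/zero of="$OUT/dd2a" bs=1M count=8 2>/dev/null &
dd if=/dev/zero of="$OUT/dd2b" bs=1M count=8 2>/dev/null &
wait
```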

 I did the numbers for 2.6.19.2, 2.6.22.6 and 2.6.24-rc1. All units are MB/sec.

test         2.6.19.2    2.6.22.6    2.6.24-rc1
------------------------------------------------
dd1                28          50            96
dd1-dir            88          88            86
dd2            2x16.5        2x11        2x44.5
dd2-dir          2x44        2x44          2x43
dd3             3x9.8       3x8.7          3x30
dd3-dir        3x29.5      3x29.5        3x28.5
net1            30-33       50-55         37-52
mix3            17/32       25/50         96/35   (disk/combined-network)


 Some observations:

- single-threaded disk speed really went up with 2.6.24-rc1. It is now even better than O_DIRECT
- O_DIRECT took a slight hit compared to the older kernels. Not an issue for me, but maybe others care
- multi-threaded non-O_DIRECT scales for the first time ever! Almost no loss compared to single-threaded!
- network throughput took a hit compared to 2.6.22.6 and is not as repeatable. Still better than 2.6.19.2 though

 What actually surprises me most is the big performance win in the single-threaded non-O_DIRECT dd test. I did not expect that :-) What I had hoped for was, of course, the scalability.

 So, this looks great, and most likely I will push 2.6.24 (maybe .X) into my environment.

Happy weekend
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: 2.6.24-rc1: First impressions
@ 2007-10-29  8:29 Martin Knoblauch
  0 siblings, 0 replies; 13+ messages in thread
From: Martin Knoblauch @ 2007-10-29  8:29 UTC (permalink / raw)
  To: Andrew Morton, Arjan van de Ven
  Cc: Ingo Molnar, linux-kernel, a.p.zijlstra, wfg, torvalds, riel

----- Original Message ----
> From: Andrew Morton <akpm@linux-foundation.org>
> To: Arjan van de Ven <arjan@infradead.org>
> Cc: Ingo Molnar <mingo@elte.hu>; spamtrap@knobisoft.de; linux-kernel@vger.kernel.org; a.p.zijlstra@chello.nl; wfg@mail.ustc.edu.cn; torvalds@linux-foundation.org; riel@redhat.com
> Sent: Saturday, October 27, 2007 7:59:51 AM
> Subject: Re: 2.6.24-rc1: First impressions
> 
> On Fri, 26 Oct 2007 22:46:57 -0700 Arjan van de Ven wrote:
> 
> > > > > dd1 - copy 16 GB from /dev/zero to local FS
> > > > > dd1-dir - same, but using O_DIRECT for output
> > > > > dd2/dd2-dir - copy 2x7.6 GB in parallel from /dev/zero to local FS
> > > > > dd3/dd3-dir - copy 3x5.2 GB in parallel from /dev/zero to local FS
> > > > > net1 - copy 5.2 GB from NFS3 share to local FS
> > > > > mix3 - copy 3x5.2 GB from /dev/zero to local disk and two NFS3
> > > > > shares
> > > > > 
> > > > >  I did the numbers for 2.6.19.2, 2.6.22.6 and 2.6.24-rc1. All
> > > > > units are MB/sec.
> > > > > 
> > > > > test           2.6.19.2     2.6.22.6    2.6.24-rc1
> > > > > ----------------------------------------------------------------
> > > > > dd1                  28           50             96
> > > > > dd1-dir              88           88             86
> > > > > dd2              2x16.5         2x11         2x44.5
> > > > > dd2-dir            2x44         2x44           2x43
> > > > > dd3               3x9.8        3x8.7           3x30
> > > > > dd3-dir          3x29.5       3x29.5         3x28.5
> > > > > net1              30-33        50-55          37-52
> > > > > mix3              17/32        25/50          96/35
> > > > > (disk/combined-network)
> > > > 
> > > > wow, really nice results!
> > > 
> > > Those changes seem suspiciously large to me.  I wonder if there's less
> > > physical IO happening during the timed run, and correspondingly more
> > > afterwards.
> > > 
> > 
> > another option... this is ext2.. didn't the ext2 reservation stuff get
> > merged into -rc1? for ext3 that gave a 4x or so speed boost (much
> > better sequential allocation pattern)
> > 
> 
> Yes, one would expect that to make a large difference in dd2/dd2-dir and
> dd3/dd3-dir - but only on SMP.  On UP there's not enough concurrency
> in the fs block allocator for any damage to occur.
>

 Just for the record, the tests are done on SMP.
 
> Reservations won't affect dd1 though, and that went faster too.
>

 This is the one result that surprised me most, as I did not really expect any big moves here. I am not complaining :-), but it would definitely be nice to understand why.

Cheers
Martin
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: 2.6.24-rc1: First impressions
@ 2007-10-29 11:09 Martin Knoblauch
  2007-10-29 11:40 ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Knoblauch @ 2007-10-29 11:09 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton
  Cc: linux-kernel, a.p.zijlstra, wfg, torvalds, riel

----- Original Message ----
> From: Ingo Molnar <mingo@elte.hu>
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: spamtrap@knobisoft.de; linux-kernel@vger.kernel.org; a.p.zijlstra@chello.nl; wfg@mail.ustc.edu.cn; torvalds@linux-foundation.org; riel@redhat.com
> Sent: Friday, October 26, 2007 9:33:40 PM
> Subject: Re: 2.6.24-rc1: First impressions
> 
> 
> * Andrew Morton  wrote:
> 
> > > > dd1 - copy 16 GB from /dev/zero to local FS
> > > > dd1-dir - same, but using O_DIRECT for output
> > > > dd2/dd2-dir - copy 2x7.6 GB in parallel from /dev/zero to local FS
> > > > dd3/dd3-dir - copy 3x5.2 GB in parallel from /dev/zero to local FS
> > > > net1 - copy 5.2 GB from NFS3 share to local FS
> > > > mix3 - copy 3x5.2 GB from /dev/zero to local disk and two NFS3 shares
> > > > 
> > > >  I did the numbers for 2.6.19.2, 2.6.22.6 and 2.6.24-rc1. All
> > > >  units are MB/sec.
> > > > 
> > > > test           2.6.19.2     2.6.22.6    2.6.24-rc1
> > > > ----------------------------------------------------------------
> > > > dd1                  28           50             96
> > > > dd1-dir              88           88             86
> > > > dd2              2x16.5         2x11         2x44.5
> > > > dd2-dir            2x44         2x44           2x43
> > > > dd3               3x9.8        3x8.7           3x30
> > > > dd3-dir          3x29.5       3x29.5         3x28.5
> > > > net1              30-33        50-55          37-52
> > > > mix3              17/32        25/50          96/35  (disk/combined-network)
> > > 
> > > wow, really nice results!
> > 
> > Those changes seem suspiciously large to me.  I wonder if there's less
> > physical IO happening during the timed run, and correspondingly more 
> > afterwards.
> 
> so a final 'sync' should be added to the test too, and the time it takes
> factored into the bandwidth numbers?
> 

 One of the reasons I do 15 GB transfers is to make sure that I am well above the possible page cache size. And of course I am doing a final sync to finish the runs :-) The sync is also running faster in 2.6.24-rc1.

 If I factor it in, the results for dd1/dd3 are:

test         2.6.19.2    2.6.22.6    2.6.24-rc1
------------------------------------------------
sync time       18sec       19sec          6sec
dd1              27.5        47.5            92
dd3             3x9.1       3x8.5          3x29

So basically, including the sync time makes 2.6.24-rc1 look even more promising. Now, I know that my benchmark numbers are crude and show only a very small aspect of system performance. But it is an aspect I care about a lot, and those benchmarks match my use-case pretty well.
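Folding the sync into the measurement is just a matter of timing dd and the flush as one unit. A minimal sketch of that bookkeeping (file path illustrative, and scaled down to 64 MB here; the real runs wrote 15+ GB to stay well above the 8 GB page cache):

```shell
#!/bin/sh
# Time dd plus the final sync together and report effective MB/s.
SIZE_MB=64
FILE=$(mktemp)

START=$(date +%s)
dd if=/dev/zero of="$FILE" bs=1M count="$SIZE_MB" 2>/dev/null
sync                                  # stop the clock only after the flush
END=$(date +%s)

ELAPSED=$((END - START))
[ "$ELAPSED" -lt 1 ] && ELAPSED=1     # guard against sub-second runs
echo "effective throughput: $((SIZE_MB / ELAPSED)) MB/s"
rm -f "$FILE"
```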

Cheers
Martin






^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2007-10-29 11:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-26 14:18 2.6.24-rc1: First impressions Martin Knoblauch
2007-10-26 15:22 ` Ingo Molnar
2007-10-26 15:29   ` Peter Zijlstra
2007-10-26 15:49     ` Rik van Riel
2007-10-26 19:21   ` Andrew Morton
2007-10-26 19:33     ` Ingo Molnar
2007-10-26 19:42       ` Andrew Morton
2007-10-27 19:14         ` Bill Davidsen
2007-10-27  5:46     ` Arjan van de Ven
2007-10-27  5:59       ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2007-10-29  8:29 Martin Knoblauch
2007-10-29 11:09 Martin Knoblauch
2007-10-29 11:40 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox