* 2.6.8-rc2-mm2 performance improvements (scheduler?)
@ 2004-08-04 15:10 Martin J. Bligh
2004-08-04 15:12 ` Martin J. Bligh
0 siblings, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2004-08-04 15:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, linux-kernel
2.6.8-rc2-mm2 has some significant improvements over 2.6.8-rc2,
particularly at low to mid loads ... at the high loads, it's still
slightly improved, but less significant. Numbers from 16x NUMA-Q.
Kernbench sees most improvement in sys time, but also some elapsed
time ... SDET sees up to 18% more throughput.
I'm also amused to see that the process scalability is now pretty
damned good ... a full -j kernel compile (using up to about 1300
tasks) goes as fast as the -j 256 (the middle one) ... and elapsed
is faster than -j16, even if system is a little higher.
I *think* this is the scheduler changes ... fits in with profile diffs
I've seen before ... diffprofiles at the end. In my experience, higher
copy_to/from_user and finish_task_switch stuff tends to indicate task
thrashing. Note also .text.lock.semaphore numbers in kernbench profile.
The SDET one looks like it's load-balancing better (mainly less idle
time).
Great stuff.
M.
Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
                    Elapsed   System     User       CPU
     2.6.7            45.37    90.91   579.75   1479.33
     2.6.8-rc2        45.05    88.53   577.87   1485.67
     2.6.8-rc2-mm2    44.09    78.84   577.01   1486.33
Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                    Elapsed   System     User       CPU
     2.6.7            44.77    97.96   576.59   1507.00
     2.6.8-rc2        44.83    96.00   575.50   1497.33
     2.6.8-rc2-mm2    43.43    86.04   576.26   1524.33
Kernbench: (make -j vmlinux, maximal tasks)
                    Elapsed   System     User       CPU
     2.6.7            44.25    88.95   575.63   1501.33
     2.6.8-rc2        44.03    87.74   573.82   1503.67
     2.6.8-rc2-mm2    43.75    86.68   576.98   1518.00
DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This
benchmarking was performed for research purposes only, and the run results
are non-compliant and not comparable with any published results.
Results are shown as percentages of the first set displayed
SDET 1 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 1.0%
2.6.8-rc2 95.9% 2.3%
2.6.8-rc2-mm2 111.5% 3.3%
SDET 2 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.0%
2.6.8-rc2 100.5% 1.4%
2.6.8-rc2-mm2 115.1% 4.0%
SDET 4 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 1.0%
2.6.8-rc2 99.2% 1.1%
2.6.8-rc2-mm2 111.9% 0.5%
SDET 8 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.2%
2.6.8-rc2 100.2% 1.0%
2.6.8-rc2-mm2 117.4% 0.9%
SDET 16 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.3%
2.6.8-rc2 99.5% 0.3%
2.6.8-rc2-mm2 118.5% 0.6%
SDET 32 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.3%
2.6.8-rc2 99.7% 0.4%
2.6.8-rc2-mm2 102.1% 0.8%
SDET 64 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.2%
2.6.8-rc2 101.6% 0.4%
2.6.8-rc2-mm2 103.2% 0.0%
SDET 128 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.2%
2.6.8-rc2 100.2% 0.1%
2.6.8-rc2-mm2 103.0% 0.3%
Diffprofile for kernbench -j32 (-ve numbers better with mm2)
2135 4.3% default_idle
233 44.2% pte_alloc_one
220 11.9% buffered_rmqueue
164 264.5% schedule
135 5.9% do_page_fault
84 10.7% clear_page_tables
62 62.6% __wake_up_sync
51 60.7% set_page_address
...
-50 -43.9% sys_close
-56 -10.7% __fput
-58 -13.7% set_page_dirty
-61 -10.0% copy_process
-70 -41.4% pipe_writev
-77 -8.1% file_move
-85 -100.0% wake_up_forked_thread
-87 -50.9% pipe_wait
-90 -5.7% path_lookup
-93 -21.2% page_add_anon_rmap
-105 -28.5% release_task
-113 -11.9% do_wp_page
-116 -7.9% link_path_walk
-116 -43.1% pipe_readv
-121 -7.5% atomic_dec_and_lock
-138 -15.4% strnlen_user
-159 -9.4% do_no_page
-167 -2.7% __d_lookup
-214 -59.4% find_idlest_cpu
-230 -6.2% find_trylock_page
-237 -1.6% do_anonymous_page
-255 -7.8% zap_pte_range
-444 -97.6% .text.lock.semaphore
-532 -43.4% Letext
-632 -54.2% __wake_up
-1086 -52.2% finish_task_switch
-1436 -24.6% __copy_to_user_ll
-3079 -46.3% __copy_from_user_ll
-7468 -5.4% total
sdetbench 8 (-ve numbers better with mm2)
...
-58 -33.5% clear_page_tables
-61 -19.3% __d_lookup
-70 -15.2% page_remove_rmap
-71 -71.0% finish_task_switch
-71 -46.4% fput
-72 -56.7% buffered_rmqueue
-73 -53.7% pte_alloc_one
-74 -22.9% __copy_to_user_ll
-75 -31.0% do_no_page
-85 -68.0% free_hot_cold_page
-95 -66.0% __copy_user_intel
-118 -21.1% find_trylock_page
-126 -43.8% do_anonymous_page
-171 -21.6% copy_page_range
-368 -38.8% zap_pte_range
-392 -62.1% do_wp_page
-6262 -11.9% default_idle
-9294 -14.4% total
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 15:10 Martin J. Bligh
@ 2004-08-04 15:12 ` Martin J. Bligh
2004-08-04 19:24 ` Andrew Morton
0 siblings, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2004-08-04 15:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, linux-kernel
Doh. Clipped the paste button on the mouse against the PC just as I
hit send ... pasting a bunch of data in the wrong place ;-) Should've
looked like this:
----------------------------------------------
2.6.8-rc2-mm2 has some significant improvements over 2.6.8-rc2,
particularly at low to mid loads ... at the high loads, it's still
slightly improved, but less significant. Numbers from 16x NUMA-Q.
Kernbench sees most improvement in sys time, but also some elapsed
time ... SDET sees up to 18% more throughput.
I'm also amused to see that the process scalability is now pretty
damned good ... a full -j kernel compile (using up to about 1300
tasks) goes as fast as the -j 256 (the middle one) ... and elapsed
is faster than -j16, even if system is a little higher.
I *think* this is the scheduler changes ... fits in with profile diffs
I've seen before ... diffprofiles at the end. In my experience, higher
copy_to/from_user and finish_task_switch stuff tends to indicate task
thrashing. Note also .text.lock.semaphore numbers in kernbench profile.
The SDET one looks like it's load-balancing better (mainly less idle
time).
Great stuff.
M.
Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
                    Elapsed   System     User       CPU
     2.6.7            45.37    90.91   579.75   1479.33
     2.6.8-rc2        45.05    88.53   577.87   1485.67
     2.6.8-rc2-mm2    44.09    78.84   577.01   1486.33
Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                    Elapsed   System     User       CPU
     2.6.7            44.77    97.96   576.59   1507.00
     2.6.8-rc2        44.83    96.00   575.50   1497.33
     2.6.8-rc2-mm2    43.43    86.04   576.26   1524.33
Kernbench: (make -j vmlinux, maximal tasks)
                    Elapsed   System     User       CPU
     2.6.7            44.25    88.95   575.63   1501.33
     2.6.8-rc2        44.03    87.74   573.82   1503.67
     2.6.8-rc2-mm2    43.75    86.68   576.98   1518.00
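For reference, the CPU column in these tables is approximately the utilization figure, (System + User) / Elapsed x 100. A minimal sketch of the per-kernel averaging a kernbench-style harness would do; the input format (one elapsed/system/user triple per run) is an assumed simplification, not kernbench's actual output:

```python
# Average repeated kernel-compile timings the way a kernbench-style
# harness would, deriving the CPU (utilization) column as
# (system + user) / elapsed * 100.
def average_runs(runs):
    """runs: list of (elapsed, system, user) tuples, one per run."""
    n = len(runs)
    elapsed = sum(r[0] for r in runs) / n
    system = sum(r[1] for r in runs) / n
    user = sum(r[2] for r in runs) / n
    cpu = (system + user) / elapsed * 100
    return (round(elapsed, 2), round(system, 2),
            round(user, 2), round(cpu, 2))
```

E.g. the 2.6.7 row at N = 2 x num_cpus works out to (90.91 + 579.75) / 45.37 * 100 ~= 1478, close to the reported 1479.33 (a real harness averages the per-run ratios rather than recomputing from the averages).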
DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This
benchmarking was performed for research purposes only, and the run results
are non-compliant and not comparable with any published results.
Results are shown as percentages of the first set displayed
SDET 1 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 1.0%
2.6.8-rc2 95.9% 2.3%
2.6.8-rc2-mm2 111.5% 3.3%
SDET 2 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.0%
2.6.8-rc2 100.5% 1.4%
2.6.8-rc2-mm2 115.1% 4.0%
SDET 4 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 1.0%
2.6.8-rc2 99.2% 1.1%
2.6.8-rc2-mm2 111.9% 0.5%
SDET 8 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.2%
2.6.8-rc2 100.2% 1.0%
2.6.8-rc2-mm2 117.4% 0.9%
SDET 16 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.3%
2.6.8-rc2 99.5% 0.3%
2.6.8-rc2-mm2 118.5% 0.6%
SDET 32 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.3%
2.6.8-rc2 99.7% 0.4%
2.6.8-rc2-mm2 102.1% 0.8%
SDET 64 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.2%
2.6.8-rc2 101.6% 0.4%
2.6.8-rc2-mm2 103.2% 0.0%
SDET 128 (see disclaimer)
Throughput Std. Dev
2.6.7 100.0% 0.2%
2.6.8-rc2 100.2% 0.1%
2.6.8-rc2-mm2 103.0% 0.3%
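The normalization used above ("percentages of the first set displayed") is straightforward; a sketch, assuming raw throughput samples per kernel (the sample values in the usage check are illustrative, not the real measurements):

```python
# Express each kernel's mean throughput as a percentage of the first
# (baseline) kernel, with the run-to-run standard deviation expressed
# the same way -- the normalization used in the SDET tables.
from statistics import mean, pstdev

def normalize(results):
    """results: dict of kernel -> list of raw throughput samples;
    the first entry is the 100% baseline."""
    kernels = list(results)            # insertion order: first = baseline
    base = mean(results[kernels[0]])
    return {k: (round(mean(v) / base * 100, 1),
                round(pstdev(v) / base * 100, 1))
            for k, v in results.items()}
```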
Diffprofile for kernbench -j32 (-ve numbers better with mm2)
2135 4.3% default_idle
233 44.2% pte_alloc_one
220 11.9% buffered_rmqueue
164 264.5% schedule
135 5.9% do_page_fault
84 10.7% clear_page_tables
62 62.6% __wake_up_sync
51 60.7% set_page_address
...
-50 -43.9% sys_close
-56 -10.7% __fput
-58 -13.7% set_page_dirty
-61 -10.0% copy_process
-70 -41.4% pipe_writev
-77 -8.1% file_move
-85 -100.0% wake_up_forked_thread
-87 -50.9% pipe_wait
-90 -5.7% path_lookup
-93 -21.2% page_add_anon_rmap
-105 -28.5% release_task
-113 -11.9% do_wp_page
-116 -7.9% link_path_walk
-116 -43.1% pipe_readv
-121 -7.5% atomic_dec_and_lock
-138 -15.4% strnlen_user
-159 -9.4% do_no_page
-167 -2.7% __d_lookup
-214 -59.4% find_idlest_cpu
-230 -6.2% find_trylock_page
-237 -1.6% do_anonymous_page
-255 -7.8% zap_pte_range
-444 -97.6% .text.lock.semaphore
-532 -43.4% Letext
-632 -54.2% __wake_up
-1086 -52.2% finish_task_switch
-1436 -24.6% __copy_to_user_ll
-3079 -46.3% __copy_from_user_ll
-7468 -5.4% total
sdetbench 8 (-ve numbers better with mm2)
...
-58 -33.5% clear_page_tables
-61 -19.3% __d_lookup
-70 -15.2% page_remove_rmap
-71 -71.0% finish_task_switch
-71 -46.4% fput
-72 -56.7% buffered_rmqueue
-73 -53.7% pte_alloc_one
-74 -22.9% __copy_to_user_ll
-75 -31.0% do_no_page
-85 -68.0% free_hot_cold_page
-95 -66.0% __copy_user_intel
-118 -21.1% find_trylock_page
-126 -43.8% do_anonymous_page
-171 -21.6% copy_page_range
-368 -38.8% zap_pte_range
-392 -62.1% do_wp_page
-6262 -11.9% default_idle
-9294 -14.4% total
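For anyone unfamiliar with the diffprofile listings above: they are simply the per-symbol difference between two kernel profiles (sample counts per symbol), with the percentage change relative to the baseline, sorted so increases come first and the biggest decreases last. A rough sketch of the idea; Martin's actual script may differ in detail:

```python
# Minimal diffprofile: subtract two profiles (symbol -> sample count)
# and report absolute and percentage change per symbol.
def diffprofile(before, after):
    rows = []
    for sym in set(before) | set(after):
        b, a = before.get(sym, 0), after.get(sym, 0)
        if a == b:
            continue
        # Symbols absent from the baseline are arbitrarily called +100%.
        pct = (a - b) / b * 100 if b else 100.0
        rows.append((a - b, pct, sym))
    rows.sort(key=lambda r: -r[0])     # increases first, decreases last
    return ["%7d %6.1f%% %s" % (d, p, s) for d, p, s in rows]
```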
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 15:12 ` Martin J. Bligh
@ 2004-08-04 19:24 ` Andrew Morton
2004-08-04 19:34 ` Martin J. Bligh
2004-08-04 23:44 ` Peter Williams
0 siblings, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2004-08-04 19:24 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: kernel, linux-kernel, Ingo Molnar
"Martin J. Bligh" <mbligh@aracnet.com> wrote:
>
> SDET 8 (see disclaimer)
> Throughput Std. Dev
> 2.6.7 100.0% 0.2%
> 2.6.8-rc2 100.2% 1.0%
> 2.6.8-rc2-mm2 117.4% 0.9%
>
> SDET 16 (see disclaimer)
> Throughput Std. Dev
> 2.6.7 100.0% 0.3%
> 2.6.8-rc2 99.5% 0.3%
> 2.6.8-rc2-mm2 118.5% 0.6%
hum, interesting. Can Con's changes affect the inter-node and inter-cpu
balancing decisions, or is this all due to caching effects, reduced context
switching etc?
I don't expect we'll be merging a new CPU scheduler into mainline any time
soon, but we should work to understand where this improvement came from,
and see if we can get the mainline scheduler to catch up.
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 19:24 ` Andrew Morton
@ 2004-08-04 19:34 ` Martin J. Bligh
2004-08-04 19:50 ` Andrew Morton
2004-08-04 20:10 ` Ingo Molnar
2004-08-04 23:44 ` Peter Williams
1 sibling, 2 replies; 20+ messages in thread
From: Martin J. Bligh @ 2004-08-04 19:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: kernel, linux-kernel, Ingo Molnar, Rick Lindsley
--On Wednesday, August 04, 2004 12:24:14 -0700 Andrew Morton <akpm@osdl.org> wrote:
> "Martin J. Bligh" <mbligh@aracnet.com> wrote:
>>
>> SDET 8 (see disclaimer)
>> Throughput Std. Dev
>> 2.6.7 100.0% 0.2%
>> 2.6.8-rc2 100.2% 1.0%
>> 2.6.8-rc2-mm2 117.4% 0.9%
>>
>> SDET 16 (see disclaimer)
>> Throughput Std. Dev
>> 2.6.7 100.0% 0.3%
>> 2.6.8-rc2 99.5% 0.3%
>> 2.6.8-rc2-mm2 118.5% 0.6%
>
> hum, interesting. Can Con's changes affect the inter-node and inter-cpu
> balancing decisions, or is this all due to caching effects, reduced context
> switching etc?
>
> I don't expect we'll be merging a new CPU scheduler into mainline any time
> soon, but we should work to understand where this improvement came from,
> and see if we can get the mainline scheduler to catch up.
Dunno ... really need to take schedstats profiles before and afterwards to
get a better picture what it's doing. Rick was working on a port.
M.
PS. schedstats is great for this kind of thing. Very useful, minimally
invasive, no impact unless configed in, and nothing measurable even then.
Hint. Hint ;-)
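For context, profiling with schedstats amounts to snapshotting /proc/schedstat before and after a run and diffing the counters. A rough sketch of that diffing step; the meaning of each column depends on the schedstats version, so the fields are treated as opaque counters here:

```python
# Diff two /proc/schedstat-style snapshots.  Each snapshot is the raw
# file text; "cpu..." lines carry per-CPU counters.  Only the deltas
# are reported, since column semantics vary by schedstats version.
def schedstat_delta(before, after):
    def parse(text):
        return {line.split()[0]: [int(f) for f in line.split()[1:]]
                for line in text.splitlines() if line.startswith("cpu")}
    a, b = parse(before), parse(after)
    return {cpu: [y - x for x, y in zip(a[cpu], b[cpu])]
            for cpu in a if cpu in b}
```

In practice you would read /proc/schedstat once before and once after the benchmark and feed both snapshots in.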
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 19:34 ` Martin J. Bligh
@ 2004-08-04 19:50 ` Andrew Morton
2004-08-04 20:07 ` Rick Lindsley
2004-08-04 20:10 ` Ingo Molnar
1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-08-04 19:50 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: kernel, linux-kernel, mingo, ricklind
"Martin J. Bligh" <mbligh@aracnet.com> wrote:
>
> PS. schedstats is great for this kind of thing. Very useful, minimally
> invasive, no impact unless configed in, and nothing measurable even then.
> Hint. Hint ;-)
Ho hum. It's up to the hordes of scheduler hackers really. If they want,
and can agree upon a patch then go wild. It should be against -mm minus
staircase, as there's a fair amount of scheduler stuff banked up for
post-2.6.8.
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 19:50 ` Andrew Morton
@ 2004-08-04 20:07 ` Rick Lindsley
0 siblings, 0 replies; 20+ messages in thread
From: Rick Lindsley @ 2004-08-04 20:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: Martin J. Bligh, kernel, linux-kernel, mingo
Ho hum. It's up to the hordes of scheduler hackers really. If they
want, and can agree upon a patch then go wild. It should be against
-mm minus staircase, as there's a fair amount of scheduler stuff
banked up for post-2.6.8.
The patch exists for both -mm2 and -mm1, but I've been holding off
posting it until I get a chance to do more than simply compile it.
Our lab machines are back up now so I'll test a (-mm2 - staircase)
patch this afternoon.
Rick
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 19:34 ` Martin J. Bligh
2004-08-04 19:50 ` Andrew Morton
@ 2004-08-04 20:10 ` Ingo Molnar
2004-08-04 20:36 ` Martin J. Bligh
1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2004-08-04 20:10 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, kernel, linux-kernel, Rick Lindsley
* Martin J. Bligh <mbligh@aracnet.com> wrote:
> >> SDET 16 (see disclaimer)
> >> Throughput Std. Dev
> >> 2.6.7 100.0% 0.3%
> >> 2.6.8-rc2 99.5% 0.3%
> >> 2.6.8-rc2-mm2 118.5% 0.6%
> >
> > hum, interesting. Can Con's changes affect the inter-node and inter-cpu
> > balancing decisions, or is this all due to caching effects, reduced context
> > switching etc?
Martin, could you try 2.6.8-rc2-mm2 with staircase-cpu-scheduler
unapplied and re-run at least part of your tests?
there are a number of NUMA improvements queued up on -mm, and it would
be nice to know what effect these cause, and what effect the staircase
scheduler has.
Ingo
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 20:10 ` Ingo Molnar
@ 2004-08-04 20:36 ` Martin J. Bligh
2004-08-04 21:31 ` Ingo Molnar
0 siblings, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2004-08-04 20:36 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, kernel, linux-kernel, Rick Lindsley
--On Wednesday, August 04, 2004 22:10:19 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> * Martin J. Bligh <mbligh@aracnet.com> wrote:
>
>> >> SDET 16 (see disclaimer)
>> >> Throughput Std. Dev
>> >> 2.6.7 100.0% 0.3%
>> >> 2.6.8-rc2 99.5% 0.3%
>> >> 2.6.8-rc2-mm2 118.5% 0.6%
>> >
>> > hum, interesting. Can Con's changes affect the inter-node and inter-cpu
>> > balancing decisions, or is this all due to caching effects, reduced context
>> > switching etc?
>
> Martin, could you try 2.6.8-rc2-mm2 with staircase-cpu-scheduler
> unapplied a re-run at least part of your tests?
>
> there are a number of NUMA improvements queued up on -mm, and it would
> be nice to know what effect these cause, and what effect the staircase
> scheduler has.
Sure. I presume it's just the one patch:
staircase-cpu-scheduler-268-rc2-mm1.patch
which seemed to back out clean and is building now. Scream if that's not
all of it ...
M.
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 20:36 ` Martin J. Bligh
@ 2004-08-04 21:31 ` Ingo Molnar
2004-08-04 23:34 ` Martin J. Bligh
0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2004-08-04 21:31 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, kernel, linux-kernel, Rick Lindsley
* Martin J. Bligh <mbligh@aracnet.com> wrote:
> > Martin, could you try 2.6.8-rc2-mm2 with staircase-cpu-scheduler
> > unapplied and re-run at least part of your tests?
> >
> > there are a number of NUMA improvements queued up on -mm, and it would
> > be nice to know what effect these cause, and what effect the staircase
> > scheduler has.
>
> Sure. I presume it's just the one patch:
>
> staircase-cpu-scheduler-268-rc2-mm1.patch
>
> which seemed to back out clean and is building now. Scream if that's
> not all of it ...
correct, that's the end of the scheduler patch-queue and it works fine
if unapplied. (The schedstats patch i just sent applies cleanly to that
base, in case you need one.)
Ingo
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 21:31 ` Ingo Molnar
@ 2004-08-04 23:34 ` Martin J. Bligh
0 siblings, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2004-08-04 23:34 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, kernel, linux-kernel, Rick Lindsley
--On Wednesday, August 04, 2004 23:31:13 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> * Martin J. Bligh <mbligh@aracnet.com> wrote:
>
>> > Martin, could you try 2.6.8-rc2-mm2 with staircase-cpu-scheduler
>> > unapplied and re-run at least part of your tests?
>> >
>> > there are a number of NUMA improvements queued up on -mm, and it would
>> > be nice to know what effect these cause, and what effect the staircase
>> > scheduler has.
>>
>> Sure. I presume it's just the one patch:
>>
>> staircase-cpu-scheduler-268-rc2-mm1.patch
>>
>> which seemed to back out clean and is building now. Scream if that's
>> not all of it ...
>
> correct, that's the end of the scheduler patch-queue and it works fine
> if unapplied. (The schedstats patch i just sent applies cleanly to that
> base, in case you need one.)
OK, the perf of 2.6.8-rc2-mm2 with the new sched code backed out is exactly
the same as 2.6.8-rc2 ... ie it's definitely the new sched code that makes
the improvement.
M.
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 19:24 ` Andrew Morton
2004-08-04 19:34 ` Martin J. Bligh
@ 2004-08-04 23:44 ` Peter Williams
2004-08-04 23:59 ` Martin J. Bligh
1 sibling, 1 reply; 20+ messages in thread
From: Peter Williams @ 2004-08-04 23:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: Martin J. Bligh, kernel, linux-kernel, Ingo Molnar
Andrew Morton wrote:
> "Martin J. Bligh" <mbligh@aracnet.com> wrote:
>
>>SDET 8 (see disclaimer)
>> Throughput Std. Dev
>> 2.6.7 100.0% 0.2%
>> 2.6.8-rc2 100.2% 1.0%
>> 2.6.8-rc2-mm2 117.4% 0.9%
>>
>> SDET 16 (see disclaimer)
>> Throughput Std. Dev
>> 2.6.7 100.0% 0.3%
>> 2.6.8-rc2 99.5% 0.3%
>> 2.6.8-rc2-mm2 118.5% 0.6%
>
>
> hum, interesting. Can Con's changes affect the inter-node and inter-cpu
> balancing decisions, or is this all due to caching effects, reduced context
> switching etc?
One candidate for the cause of this improvement is the replacement of
the active/expired array mechanism with a single array. I believe that
one of the shortcomings of the active/expired array mechanism is that
it can lead to excessive queuing (possibly even starvation) of tasks
that aren't considered "interactive".
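Peter's starvation point can be illustrated with a toy model: if "interactive" tasks are re-inserted into the active array when their slice ends, the array swap that would let an expired task run again can be postponed indefinitely. This is only an illustrative simulation, not the real 2.6 scheduler code:

```python
# Toy model of active/expired starvation: count ticks until the active
# array empties (the array swap, when expired tasks finally run).
# "Interactive" tasks re-enter the active array after their slice,
# postponing the swap.
def ticks_until_swap(active, interactive, slice_ticks, max_ticks):
    t, queue = 0, list(active)
    while queue and t < max_ticks:
        task = queue.pop(0)
        t += slice_ticks          # task burns its full timeslice
        if task in interactive:
            queue.append(task)    # re-enters active: swap postponed
    return t
```

With no interactive tasks the swap happens after one round of timeslices; with even one, expired tasks wait until the max_ticks cutoff, i.e. starve. A single-queue design re-inserts every task by priority instead, so there is no swap to wait for.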
>
> I don't expect we'll be merging a new CPU scheduler into mainline any time
> soon, but we should work to understand where this improvement came from,
> and see if we can get the mainline scheduler to catch up.
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 23:44 ` Peter Williams
@ 2004-08-04 23:59 ` Martin J. Bligh
2004-08-05 5:20 ` Rick Lindsley
0 siblings, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2004-08-04 23:59 UTC (permalink / raw)
To: Peter Williams, Andrew Morton, Rick Lindsley
Cc: kernel, linux-kernel, Ingo Molnar
--On Thursday, August 05, 2004 09:44:06 +1000 Peter Williams <pwil3058@bigpond.net.au> wrote:
> Andrew Morton wrote:
>> "Martin J. Bligh" <mbligh@aracnet.com> wrote:
>>
>>> SDET 8 (see disclaimer)
>>> Throughput Std. Dev
>>> 2.6.7 100.0% 0.2%
>>> 2.6.8-rc2 100.2% 1.0%
>>> 2.6.8-rc2-mm2 117.4% 0.9%
>>>
>>> SDET 16 (see disclaimer)
>>> Throughput Std. Dev
>>> 2.6.7 100.0% 0.3%
>>> 2.6.8-rc2 99.5% 0.3%
>>> 2.6.8-rc2-mm2 118.5% 0.6%
>>
>>
>> hum, interesting. Can Con's changes affect the inter-node and inter-cpu
>> balancing decisions, or is this all due to caching effects, reduced context
>> switching etc?
>
> One candidate for the cause of this improvement is the replacement of
> the active/expired array mechanism with a single array. I believe that
> one of the shortcomings of the active/expired array mechanism is that
> it can lead to excessive queuing (possibly even starvation) of tasks
> that aren't considered "interactive".
Rick showed me schedstats graphs of the two ... it seems to have lower
latency, does less rebalancing, fewer pull_tasks, etc, etc. Everything
looks better ... he'll send them out soon, I think (hint, hint).
M.
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-04 23:59 ` Martin J. Bligh
@ 2004-08-05 5:20 ` Rick Lindsley
2004-08-05 10:45 ` Ingo Molnar
0 siblings, 1 reply; 20+ messages in thread
From: Rick Lindsley @ 2004-08-05 5:20 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Peter Williams, Andrew Morton, kernel, linux-kernel, Ingo Molnar
Rick showed me schedstats graphs of the two ... it seems to have lower
latency, does less rebalancing, fewer pull_tasks, etc, etc. Everything
looks better ... he'll send them out soon, I think (hint, hint).
Okay, they're done. Here's the URL of the graphs:
http://eaglet.rain.com/rick/linux/staircase/scase-vs-noscase.html
General summary: as Martin reported, we're seeing improvements in a number
of areas, at least with sdet. The graphs as listed there represent stats
from four separate sdet runs run sequentially with an increasing load.
(We're trying to see if we can get the information from each run separately,
rather than the aggregate -- one of the hazards of an automated test
harness :)
Rick
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-05 5:20 ` Rick Lindsley
@ 2004-08-05 10:45 ` Ingo Molnar
0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2004-08-05 10:45 UTC (permalink / raw)
To: Rick Lindsley
Cc: Martin J. Bligh, Peter Williams, Andrew Morton, kernel,
linux-kernel
* Rick Lindsley <ricklind@us.ibm.com> wrote:
> Okay, they're done. Here's the URL of the graphs:
>
> http://eaglet.rain.com/rick/linux/staircase/scase-vs-noscase.html
>
> General summary: as Martin reported, we're seeing improvements in a
> number of areas, at least with sdet. The graphs as listed there
> represent stats from four separate sdet runs run sequentially with an
> increasing load. (We're trying to see if we can get the information
> from each run separately, rather than the aggregate -- one of the
> hazards of an automated test harness :)
really nice results! Would be interesting to see the effect of Con's
patch on other SMP/NUMA workloads as well - i'd expect to see an
improvement there too. The test was done with the default interactive=1
compute=0 setting, right?
Ingo
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
[not found] <200408092240.05287.habanero@us.ibm.com>
@ 2004-08-10 4:08 ` Andrew Theurer
2004-08-10 4:37 ` Con Kolivas
2004-08-10 7:40 ` Rick Lindsley
0 siblings, 2 replies; 20+ messages in thread
From: Andrew Theurer @ 2004-08-10 4:08 UTC (permalink / raw)
To: linux-kernel; +Cc: ricklind, mbligh, mingo, akpm
On Monday 09 August 2004 22:40, you wrote:
> Rick showed me schedstats graphs of the two ... it seems to have lower
> latency, does less rebalancing, fewer pull_tasks, etc, etc. Everything
> looks better ... he'll send them out soon, I think (hint, hint).
>
> Okay, they're done. Here's the URL of the graphs:
>
> http://eaglet.rain.com/rick/linux/staircase/scase-vs-noscase.html
>
> General summary: as Martin reported, we're seeing improvements in a number
> of areas, at least with sdet. The graphs as listed there represent stats
> from four separate sdet runs run sequentially with an increasing load.
> (We're trying to see if we can get the information from each run
> separately, rather than the aggregate -- one of the hazards of an automated
> test harness :)
What's quite interesting is that there is a very noticeable surge in
load_balance with staircase in the early stage of the test, but there appear
to be -no- direct policy changes to load balancing at all in Con's patch (or
at least I didn't notice any - please tell me if you did!). You can see it in
busy load_balance, sched_balance_exec, and pull_task. The runslice and
latency stats confirm this: no-staircase does not balance early on, and the
tasks suffer, waiting on a cpu that's already loaded up. I do not have an
explanation for this; perhaps it has something to do with eliminating the
expired queue.
It would be nice to have per-cpu runqueue lengths logged to see how this
plays out - do the cpus on staircase obtain a runqueue length close to
nr_running()/nr_online_cpus sooner than no-staircase?
Also, one big change apparent to me, the elimination of TIMESLICE_GRANULARITY.
Do you have cswitch data? I would not be surprised if it's a lot higher on
-no-staircase, and cache is thrashed a lot more. This may be something you
can pull out of the -no-staircase kernel quite easily.
-Andrew Theurer
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-10 4:08 ` 2.6.8-rc2-mm2 performance improvements (scheduler?) Andrew Theurer
@ 2004-08-10 4:37 ` Con Kolivas
2004-08-10 15:05 ` Andrew Theurer
2004-08-10 7:40 ` Rick Lindsley
1 sibling, 1 reply; 20+ messages in thread
From: Con Kolivas @ 2004-08-10 4:37 UTC (permalink / raw)
To: Andrew Theurer; +Cc: linux-kernel, ricklind, mbligh, mingo, akpm
Andrew Theurer writes:
> On Monday 09 August 2004 22:40, you wrote:
>> Rick showed me schedstats graphs of the two ... it seems to have lower
>> latency, does less rebalancing, fewer pull_tasks, etc, etc. Everything
>> looks better ... he'll send them out soon, I think (hint, hint).
>>
>> Okay, they're done. Here's the URL of the graphs:
>>
>> http://eaglet.rain.com/rick/linux/staircase/scase-vs-noscase.html
>>
>> General summary: as Martin reported, we're seeing improvements in a number
>> of areas, at least with sdet. The graphs as listed there represent stats
>> from four separate sdet runs run sequentially with an increasing load.
>> (We're trying to see if we can get the information from each run
>> separately, rather than the aggregate -- one of the hazards of an automated
>> test harness :)
>
> What's quite interesting is that there is a very noticeable surge in
> load_balance with staircase in the early stage of the test, but there appears
> to be -no- direct policy changes to load-balance at all in Con's patch (or at
> least I didn't notice it -please tell me if you did!). You can see it in
> busy load_balance, sched_balance_exec, and pull_task. The runslice and
> latency stats confirm this -no-staircase does not balance early on, and the
> tasks suffer, waiting on a cpu already loaded up. I do not have an
> explanation for this; perhaps it has something to do with eliminating expired
> queue.
To be honest I have no idea why that's the case. One of the first things I
did was eliminate the expired array and in my testing (up to 8x at osdl) I
did not really notice this in and of itself made any big difference - of
course this could be because the removal of the expired array was not done
in a way which entitled starved tasks to run in reasonable timeframes.
> It would be nice to have per cpu runqueue lengths logged to see how this plays
> out -do the cpus on staircase obtain a runqueue length close to
> nr_running()/nr_online_cpus sooner than no-staircase?
/me looks in the schedstats people's way
> Also, one big change apparent to me, the elimination of TIMESLICE_GRANULARITY.
Ah well I tuned the timeslice granularity and I can tell you it isn't quite
what most people think. The granularity when you get to greater than 4 cpus
is effectively _disabled_. So in fact, the timeslices are shorter in
staircase (in normal interactive=1, compute=0 mode which is how martin
would have tested it), not longer. But this is not the reason either since
in "compute" mode they are ten times longer and this also improves
throughput further.
> Do you have cswitch data? I would not be surprised if it's a lot higher on
> -no-staircase, and cache is thrashed a lot more. This may be something you
> can pull out of the -no-staircase kernel quite easily.
Well, from what I got on 8x, the optimal load (-j 4 x cpus) and maximal load
(-j) on kernbench give surprisingly similar context switch rates. It's only
when I enable compute mode that the context switches drop compared to
default staircase mode and mainline. You'd have to ask Martin and Rick about
what they got.
> -Andrew Theurer
Cheers,
Con
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-10 4:08 ` 2.6.8-rc2-mm2 performance improvements (scheduler?) Andrew Theurer
2004-08-10 4:37 ` Con Kolivas
@ 2004-08-10 7:40 ` Rick Lindsley
2004-08-10 15:19 ` Andrew Theurer
1 sibling, 1 reply; 20+ messages in thread
From: Rick Lindsley @ 2004-08-10 7:40 UTC (permalink / raw)
To: Andrew Theurer; +Cc: linux-kernel, mbligh, mingo, akpm
    What's quite interesting is that there is a very noticeable surge in
    load_balance with staircase in the early stage of the test, but there
    appears to be -no- direct policy changes to load-balance at all in
    Con's patch (or at least I didn't notice it - please tell me if you
    did!). You can see it in busy load_balance, sched_balance_exec, and
    pull_task. The runslice and latency stats confirm this: -no-staircase
    does not balance early on, and the tasks suffer, waiting on a cpu
    already loaded up. I do not have an explanation for this; perhaps
    it has something to do with eliminating the expired queue.

Possibly. The other factor thrown in here is that this was on an SMT
machine, so it's possible that the balancing is no different but we are
seeing tasks initially assigned more poorly. Or, perhaps we're drawing
too much from one data point.

    It would be nice to have per cpu runqueue lengths logged to see how
    this plays out - do the cpus on staircase obtain a runqueue length
    close to nr_running()/nr_online_cpus sooner than no-staircase?

The only difficulty there is: do we know how long it normally takes for
this to balance out? We're taking samples every five seconds; might this
not work itself out between one snapshot and the next? Shrug. It would
be easy enough to add another field to report nr_running at the moment
the statistics snapshot was taken, but on anything but compute-intensive
benchmarks I'm afraid we might miss all the interesting data.

    Also, one big change apparent to me, the elimination of
    TIMESLICE_GRANULARITY. Do you have cswitch data? I would not
    be surprised if it's a lot higher on -no-staircase, and cache is
    thrashed a lot more. This may be something you can pull out of the
    -no-staircase kernel quite easily.

Yes, sar data was collected every five seconds so I do have context switch
data. The bad news is that it was collected for each of 10 runs times
four different loads, and I don't have any handy dandy scripts to pretty
it up :) (Pause.) A quick exercise with a calculator, though, suggests
you are right. cswitches were 10%-20% higher on the no staircase runs.
Rick
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-10 4:37 ` Con Kolivas
@ 2004-08-10 15:05 ` Andrew Theurer
2004-08-10 20:57 ` Con Kolivas
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Theurer @ 2004-08-10 15:05 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, ricklind, mbligh, mingo, akpm
> > Also, one big change apparent to me, the elimination of
> > TIMESLICE_GRANULARITY.
>
> Ah well I tuned the timeslice granularity and I can tell you it isn't quite
> what most people think. The granularity when you get to greater than 4 cpus
> is effectively _disabled_. So in fact, the timeslices are shorter in
> staircase (in normal interactive=1, compute=0 mode which is how martin
> would have tested it), not longer. But this is not the reason either since
> in "compute" mode they are ten times longer and this also improves
> throughput further.
Interesting, I forgot about the "* nr_cpus" that was in the granularity
calculation. That does make me wonder: maybe the timeslices you are
calculating could use something similar, but more appropriate.
Since the number of runnable tasks on a cpu should play a part in latency (the
more tasks, potentially the longer the latency), I wonder if the timeslice
would benefit from a modifier like " / task_cpu(p)->nr_running ". With this
the base timeslice could be quite a bit larger to start for better cache
warmth, and as we add more tasks to that cpu, the timeslices get smaller, so
an acceptable latency is preserved.
> > Do you have cswitch data? I would not be surprised if it's a lot higher
> > on -no-staircase, and cache is thrashed a lot more. This may be
> > something you can pull out of the -no-staircase kernel quite easily.
>
> Well from what I got on 8x the optimal load (-j x4cpus) and maximal load
> (-j) on kernbench gives surprisingly similar context switch rates. It's
> only when I enable compute mode that the context switches drop compared to
> default staircase mode and mainline. You'd have to ask Martin and Rick
> about what they got.
OK, thanks!
-Andrew Theurer
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-10 7:40 ` Rick Lindsley
@ 2004-08-10 15:19 ` Andrew Theurer
0 siblings, 0 replies; 20+ messages in thread
From: Andrew Theurer @ 2004-08-10 15:19 UTC (permalink / raw)
To: Rick Lindsley; +Cc: Con Kolivas, linux-kernel, mbligh, mingo, akpm
On Tuesday 10 August 2004 02:40, Rick Lindsley wrote:
> What's quite interesting is that there is a very noticeable surge in
> load_balance with staircase in the early stage of the test, but there
> appears to be -no- direct policy changes to load-balance at all in
> Con's patch (or at least I didn't notice it - please tell me if you
> did!). You can see it in busy load_balance, sched_balance_exec, and
> pull_task. The runslice and latency stats confirm this: -no-staircase
> does not balance early on, and the tasks suffer, waiting on a cpu
> already loaded up. I do not have an explanation for this; perhaps
> it has something to do with eliminating expired queue.
>
> Possibly. The other factor thrown in here is that this was on an SMT
> machine, so it's possible that the balancing is no different but we are
> seeing tasks initially assigned more poorly. Or, perhaps we're drawing
> too much from one data point.
Yes, my first guess was that sched_balance_exec was changed, and I guess it
was, but earlier than Con's patch. The first conditional there used to be:

	if (this_rq()->nr_running <= 2)
		goto out;

but the 2 is now a 1 for both -rc2 and -rc2-mm2, so we tend to find the best
cpu in the system more often now.
>
> It would be nice to have per cpu runqueue lengths logged to see how
> this plays out - do the cpus on staircase obtain a runqueue length
> close to nr_running()/nr_online_cpus sooner than no-staircase?
>
> The only difficulty there is: do we know how long it normally takes for
> this to balance out? We're taking samples every five seconds; might this
> not work itself out between one snapshot and the next? Shrug. It would
> be easy enough to add another field to report nr_running at the moment
> the statistics snapshot was taken, but on anything but compute-intensive
> benchmarks I'm afraid we might miss all the interesting data.
Actually if you have sar cpu util data, we might be able to extract this. For
example, if we have balance issues on 16 user sdet, we may see that very
early on the staircase cpu util was near 100%, where the no-staircase may
have been much lower for the first portion of the test (showing that some
cpus were idle while others may have had more than one task). If we can see
this in sar, IMO that would confirm some sort of indirect load balance
improvement in staircase.
> Also, one big change apparent to me, the elimination of
> TIMESLICE_GRANULARITY. Do you have cswitch data? I would not
> be surprised if it's a lot higher on -no-staircase, and cache is
> thrashed a lot more. This may be something you can pull out of the
> -no-staircase kernel quite easily.
>
> Yes, sar data was collected every five seconds so I do have context switch
> data. The bad news is that it was collected for each of 10 runs times
> four different loads, and I don't have any handy dandy scripts to pretty
> it up :) (Pause.) A quick exercise with a calculator, though, suggests
> you are right. cswitches were 10%-20% higher on the no staircase runs.
Interesting. I wouldn't expect it to account for up to 20% performance, but
maybe 1-2%.
>
> Rick
* Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)
2004-08-10 15:05 ` Andrew Theurer
@ 2004-08-10 20:57 ` Con Kolivas
0 siblings, 0 replies; 20+ messages in thread
From: Con Kolivas @ 2004-08-10 20:57 UTC (permalink / raw)
To: Andrew Theurer; +Cc: linux-kernel, ricklind, mbligh, mingo, akpm
Andrew Theurer wrote:
>>>Also, one big change apparent to me, the elimination of
>>>TIMESLICE_GRANULARITY.
>>
>>Ah well I tuned the timeslice granularity and I can tell you it isn't quite
>>what most people think. The granularity when you get to greater than 4 cpus
>>is effectively _disabled_. So in fact, the timeslices are shorter in
>>staircase (in normal interactive=1, compute=0 mode which is how martin
>>would have tested it), not longer. But this is not the reason either since
>>in "compute" mode they are ten times longer and this also improves
>>throughput further.
>
>
> Interesting, I forgot about the "* nr_cpus" that was in the granularity
> calculation. That does make me wonder, maybe the timeslices you are
> calculating could have something similar, but more appropriate.
>
> Since the number of runnable tasks on a cpu should play a part in latency (the
> more tasks, potentially the longer the latency), I wonder if the timeslice
> would benefit from a modifier like " / task_cpu(p)->nr_running ". With this
> the base timeslice could be quite a bit larger to start for better cache
> warmth, and as we add more tasks to that cpu, the timeslices get smaller, so
> an acceptable latency is preserved.
I had a problem with fairness once I made the timeslices too long, since
timeslice length also determines priority demotion in the staircase design.
That's why I keep "compute" mode as quite a separate entity: the longer
timeslices on their own weren't of any special benefit (in my up-to-8x
testing, at least; it could be different elsewhere) unless I also added the
delayed preemption, which is probably where the main extra cache warmth in
"compute" mode comes from. Of course this comes at a cost, which is higher
latencies... because normal priority preemption is delayed.
>>>Do you have cswitch data? I would not be surprised if it's a lot higher
>>>on -no-staircase, and cache is thrashed a lot more. This may be
>>>something you can pull out of the -no-staircase kernel quite easily.
>>
>>Well from what I got on 8x the optimal load (-j x4cpus) and maximal load
>>(-j) on kernbench gives surprisingly similar context switch rates. It's
>>only when I enable compute mode that the context switches drop compared to
>>default staircase mode and mainline. You'd have to ask Martin and Rick
>>about what they got.
>
>
> OK, thanks!
>
> -Andrew Theurer
Cheers,
Con
end of thread, other threads:[~2004-08-10 20:58 UTC | newest]
Thread overview: 20+ messages
-- links below jump to the message on this page --
[not found] <200408092240.05287.habanero@us.ibm.com>
2004-08-10 4:08 ` 2.6.8-rc2-mm2 performance improvements (scheduler?) Andrew Theurer
2004-08-10 4:37 ` Con Kolivas
2004-08-10 15:05 ` Andrew Theurer
2004-08-10 20:57 ` Con Kolivas
2004-08-10 7:40 ` Rick Lindsley
2004-08-10 15:19 ` Andrew Theurer
2004-08-04 15:10 Martin J. Bligh
2004-08-04 15:12 ` Martin J. Bligh
2004-08-04 19:24 ` Andrew Morton
2004-08-04 19:34 ` Martin J. Bligh
2004-08-04 19:50 ` Andrew Morton
2004-08-04 20:07 ` Rick Lindsley
2004-08-04 20:10 ` Ingo Molnar
2004-08-04 20:36 ` Martin J. Bligh
2004-08-04 21:31 ` Ingo Molnar
2004-08-04 23:34 ` Martin J. Bligh
2004-08-04 23:44 ` Peter Williams
2004-08-04 23:59 ` Martin J. Bligh
2004-08-05 5:20 ` Rick Lindsley
2004-08-05 10:45 ` Ingo Molnar