* Performance of -mm2 and -mm4
@ 2004-08-23 16:58 Martin J. Bligh
2004-08-23 21:31 ` Jesse Barnes
2004-08-24 3:23 ` Nick Piggin
0 siblings, 2 replies; 7+ messages in thread
From: Martin J. Bligh @ 2004-08-23 16:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Nick Piggin
Kernbench: (make -j vmlinux, maximal tasks)
                 Elapsed   System     User       CPU
2.6.8.1            43.90    87.76   572.94   1505.67
2.6.8.1-mm1        44.26    87.71   574.73   1496.33
2.6.8.1-mm2        44.27    90.27   574.84   1502.33
2.6.8.1-mm4        45.87    97.60   595.23   1510.00
mm2 seems to take slightly (but consistently) more systime than mm1, and
mm4 is significantly worse still ;-(
diffprofile from mm1 to mm2:
 5469  32170.6%  find_get_page
  785     13.0%  __d_lookup
  476      0.4%  total
  128     21.9%  generic_file_open
   93     22.5%  __alloc_pages
   62     26.1%  file_ra_state_init
   58      0.0%  put_page
   54      9.9%  dput
...
  -51     -2.8%  finish_task_switch
  -55     -4.3%  __wake_up
  -67     -6.6%  file_move
 -128     -1.7%  __copy_to_user_ll
 -156     -1.1%  do_anonymous_page
-2189     -4.7%  default_idle
-3632   -100.0%  find_trylock_page
and -mm1 to -mm4
 5841  34358.8%  find_get_page
 5394      4.0%  total
 1459     24.2%  __d_lookup
  740      9.2%  __copy_from_user_ll
  718      9.5%  __copy_to_user_ll
  304     24.0%  __wake_up
  253     20.2%  free_hot_cold_page
  248     19.8%  atomic_dec_and_lock
  229     42.2%  dput
  228     13.7%  path_lookup
  226     43.0%  Letext
  202     34.6%  generic_file_open
  197     23.6%  pte_alloc_one
  194     77.0%  pgd_ctor
  180     22.6%  kmem_cache_free
  173      6.5%  zap_pte_range
  170      9.2%  buffered_rmqueue
  146     16.1%  in_group_p
  123     22.3%  __fput
...
  -56    -23.5%  file_ra_state_init
  -72    -31.7%  page_add_anon_rmap
 -104     -5.6%  finish_task_switch
 -124     -0.9%  do_anonymous_page
-3633   -100.0%  find_trylock_page
-4636    -10.0%  default_idle
The -mm4 looks more like sched stuff to me (copy_to/from_user, etc),
but the -mm2 stuff looks like something else. Buggered if I know what.
-mm3 didn't compile cleanly, so I didn't bother, but I prob can if you
like.
m.
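[For context on how listings like the above are produced: diffprofile compares two per-symbol kernel profile snapshots (e.g. from readprofile on a kernel booted with profiling enabled) and prints the count delta and percent change per symbol. A rough, self-contained sketch of that comparison follows; the snapshot files and their counts are invented here purely to illustrate the output format, chosen so the deltas match a few entries above.]

```shell
# diffprofile-style comparison sketch. old.prof/new.prof stand in for two
# readprofile snapshots ("count symbol" per line); their contents are made up.
cat > old.prof <<'EOF'
17 find_get_page
46523 default_idle
3632 find_trylock_page
EOF
cat > new.prof <<'EOF'
5486 find_get_page
44334 default_idle
EOF
result=$(awk '
    NR == FNR { old[$2] = $1; next }      # first pass: remember old counts
    {
        d = $1 - old[$2]
        pct = old[$2] ? 100 * d / old[$2] : 0
        printf "%8d %8.1f%% %s\n", d, pct, $2
        seen[$2] = 1
    }
    END {                                  # symbols gone from the new profile
        for (s in old)
            if (!(s in seen))
                printf "%8d %8.1f%% %s\n", -old[s], -100.0, s
    }' old.prof new.prof | sort -rn)
printf '%s\n' "$result"
rm -f old.prof new.prof
```

The find_get_page line of this sketch reproduces the +5469 / 32170.6% entry at the top of the first listing, and find_trylock_page disappearing shows up as -100.0%, as in the listings above.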
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: Performance of -mm2 and -mm4
2004-08-23 16:58 Performance of -mm2 and -mm4 Martin J. Bligh
@ 2004-08-23 21:31 ` Jesse Barnes
2004-08-24 0:41 ` Nick Piggin
2004-08-24 3:23 ` Nick Piggin
1 sibling, 1 reply; 7+ messages in thread
From: Jesse Barnes @ 2004-08-23 21:31 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, Nick Piggin
On Monday, August 23, 2004 9:58 am, Martin J. Bligh wrote:
> The -mm4 looks more like sched stuff to me (copy_to/from_user, etc),
> but the -mm2 stuff looks like something else. Buggered if I know what.
> -mm3 didn't compile cleanly, so I didn't bother, but I prob can if you
> like.
If you suspect the scheduler, you could try bumping SD_NODES_PER_DOMAIN in
kernel/sched.c to a larger value (e.g. the number of nodes in your system).
That'll make the scheduler balance more aggressively across the whole system.
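[A hedged illustration of that edit, shown against a scratch file rather than a real kernel tree so it is safe to run anywhere; both the stand-in original value 6 and the new value 16 are placeholders for "the number of nodes in your system".]

```shell
# Sketch of bumping SD_NODES_PER_DOMAIN in kernel/sched.c, applied to a
# scratch copy. Values here are stand-ins, not the real 2004 defaults.
cat > sched-scratch.c <<'EOF'
#define SD_NODES_PER_DOMAIN 6
EOF
nodes=16    # assumption: substitute your machine's node count
sed "s/\(#define SD_NODES_PER_DOMAIN\) .*/\1 $nodes/" sched-scratch.c > sched-scratch.c.new
mv sched-scratch.c.new sched-scratch.c
line=$(grep '#define SD_NODES_PER_DOMAIN' sched-scratch.c)
echo "$line"
rm -f sched-scratch.c
```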
Jesse
* Re: Performance of -mm2 and -mm4
2004-08-23 21:31 ` Jesse Barnes
@ 2004-08-24 0:41 ` Nick Piggin
2004-08-24 1:29 ` Con Kolivas
0 siblings, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2004-08-24 0:41 UTC (permalink / raw)
To: Jesse Barnes; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel
Jesse Barnes wrote:
>On Monday, August 23, 2004 9:58 am, Martin J. Bligh wrote:
>
>>The -mm4 looks more like sched stuff to me (copy_to/from_user, etc),
>>but the -mm2 stuff looks like something else. Buggered if I know what.
>>-mm3 didn't compile cleanly, so I didn't bother, but I prob can if you
>>like.
>>
>
>If you suspect the scheduler, you could try bumping SD_NODES_PER_DOMAIN in
>kernel/sched.c to a larger value (e.g. the number of nodes in your system).
>That'll make the scheduler balance more aggressively across the whole system.
>
>
Try increasing /proc/sys/kernel/base_timeslice as well.
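[A guarded sketch of poking that knob; it exists only on kernels carrying nicksched, so check before writing, and writing needs root. 256 versus the default 64 is the comparison Martin reports later in the thread.]

```shell
# base_timeslice appears under /proc/sys/kernel only on nicksched kernels,
# so guard before touching it
knob=/proc/sys/kernel/base_timeslice
if [ -w "$knob" ]; then
    old=$(cat "$knob")
    echo 256 > "$knob"      # 64 vs 256 is compared later in this thread
    echo "base_timeslice: $old -> $(cat "$knob")"
else
    echo "base_timeslice: not available/writable on this kernel"
fi
```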
* Re: Performance of -mm2 and -mm4
2004-08-24 0:41 ` Nick Piggin
@ 2004-08-24 1:29 ` Con Kolivas
2004-08-25 22:29 ` Martin J. Bligh
0 siblings, 1 reply; 7+ messages in thread
From: Con Kolivas @ 2004-08-24 1:29 UTC (permalink / raw)
To: Nick Piggin; +Cc: Jesse Barnes, Martin J. Bligh, Andrew Morton, linux-kernel
Nick Piggin writes:
>
>
> Jesse Barnes wrote:
>
>>On Monday, August 23, 2004 9:58 am, Martin J. Bligh wrote:
>>
>>>The -mm4 looks more like sched stuff to me (copy_to/from_user, etc),
>>>but the -mm2 stuff looks like something else. Buggered if I know what.
>>>-mm3 didn't compile cleanly, so I didn't bother, but I prob can if you
>>>like.
>>>
>>
>>If you suspect the scheduler, you could try bumping SD_NODES_PER_DOMAIN in
>>kernel/sched.c to a larger value (e.g. the number of nodes in your system).
>>That'll make the scheduler balance more aggressively across the whole system.
>>
>>
>
> Try increasing /proc/sys/kernel/base_timeslice as well.
Or back out nicksched.patch
* Re: Performance of -mm2 and -mm4
2004-08-24 1:29 ` Con Kolivas
@ 2004-08-25 22:29 ` Martin J. Bligh
0 siblings, 0 replies; 7+ messages in thread
From: Martin J. Bligh @ 2004-08-25 22:29 UTC (permalink / raw)
To: Con Kolivas, Nick Piggin; +Cc: Jesse Barnes, Andrew Morton, linux-kernel
>>>> The -mm4 looks more like sched stuff to me (copy_to/from_user, etc),
>>>> but the -mm2 stuff looks like something else. Buggered if I know what.
>>>> -mm3 didn't compile cleanly, so I didn't bother, but I prob can if you
>>>> like.
>>>>
>>>
>>> If you suspect the scheduler, you could try bumping SD_NODES_PER_DOMAIN in
>>> kernel/sched.c to a larger value (e.g. the number of nodes in your system).
>>> That'll make the scheduler balance more aggressively across the whole system.
>>>
>>>
>>
>> Try increasing /proc/sys/kernel/base_timeslice as well.
>
> Or back out nicksched.patch
Yeah, that mostly fixed it.
Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                 Elapsed   System     User       CPU
2.6.8.1            44.82    97.19   574.55   1497.33
2.6.8.1-mm4        46.82   107.47   594.15   1497.33
2.6.8.1-mm4-nn     44.93    96.33   576.44   1496.33
Kernbench: (make -j vmlinux, maximal tasks)
                 Elapsed   System     User       CPU
2.6.8.1            43.90    87.76   572.94   1505.67
2.6.8.1-mm4        45.87    97.60   595.23   1510.00
2.6.8.1-mm4-nn     44.53    90.71   575.68   1495.67
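[For reference, the N = 16 x num_cpus job count used in the first table can be derived as below; querying the CPU count via getconf is an assumption about the environment, since kernbench computes this internally.]

```shell
# Compute the "-j N" value from the first table: 16 jobs per online CPU
cpus=$(getconf _NPROCESSORS_ONLN)
jobs=$((16 * cpus))
echo "make -j $jobs vmlinux"
```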
* Re: Performance of -mm2 and -mm4
2004-08-23 16:58 Performance of -mm2 and -mm4 Martin J. Bligh
2004-08-23 21:31 ` Jesse Barnes
@ 2004-08-24 3:23 ` Nick Piggin
2004-08-24 3:26 ` Martin J. Bligh
1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2004-08-24 3:23 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel
Martin J. Bligh wrote:
>Kernbench: (make -j vmlinux, maximal tasks)
> Elapsed System User CPU
> 2.6.8.1 43.90 87.76 572.94 1505.67
> 2.6.8.1-mm1 44.26 87.71 574.73 1496.33
> 2.6.8.1-mm2 44.27 90.27 574.84 1502.33
> 2.6.8.1-mm4 45.87 97.60 595.23 1510.00
>
>mm2 seems to take slightly (but consistently) more systime than mm1, and
>mm4 is significantly worse still ;-(
>
>
Increasing base_timeslice here takes about 10s off the user time,
and maybe 1-2 off elapsed. You may see a better improvement because
the machine I'm testing on has very small caches; I assume you are
using a 32-way NUMAQ with 1-2MB caches?
* Re: Performance of -mm2 and -mm4
2004-08-24 3:23 ` Nick Piggin
@ 2004-08-24 3:26 ` Martin J. Bligh
0 siblings, 0 replies; 7+ messages in thread
From: Martin J. Bligh @ 2004-08-24 3:26 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel
> Martin J. Bligh wrote:
>
>> Kernbench: (make -j vmlinux, maximal tasks)
>> Elapsed System User CPU
>> 2.6.8.1 43.90 87.76 572.94 1505.67
>> 2.6.8.1-mm1 44.26 87.71 574.73 1496.33
>> 2.6.8.1-mm2 44.27 90.27 574.84 1502.33
>> 2.6.8.1-mm4 45.87 97.60 595.23 1510.00
>>
>> mm2 seems to take slightly (but consistently) more systime than mm1, and
>> mm4 is significantly worse still ;-(
>>
>>
>
> Increasing base_timeslice here takes about 10s off the user time,
> and maybe 1-2 off elapsed. You may see a better improvement because
> the machine I'm testing on has very small caches; I assume you are
> using a 32-way NUMAQ with 1-2MB caches?
16-way with 2MB caches. Doing 256 as opposed to 64 gives a little less
user time, more systime at the low end, and a wash with more tasks.
Not much affects elapsed though. I'll try 16, then backing out the
sched patch, and what Jesse suggested as well.
M.
end of thread, other threads: [~2004-08-25 22:37 UTC | newest]
Thread overview: 7+ messages
2004-08-23 16:58 Performance of -mm2 and -mm4 Martin J. Bligh
2004-08-23 21:31 ` Jesse Barnes
2004-08-24 0:41 ` Nick Piggin
2004-08-24 1:29 ` Con Kolivas
2004-08-25 22:29 ` Martin J. Bligh
2004-08-24 3:23 ` Nick Piggin
2004-08-24 3:26 ` Martin J. Bligh