Re: kernbench-16 on 2.5.59 vs 2.5.59-mm6

* Re: kernbench-16 on 2.5.59 vs 2.5.59-mm6
       [not found] <20030127174015$5cfa@gated-at.bofh.it>
@ 2003-01-27 18:48 ` Dipankar Sarma
  0 siblings, 0 replies; 3+ messages in thread
From: Dipankar Sarma @ 2003-01-27 18:48 UTC
  To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton

On Mon, Jan 27, 2003 at 06:40:15PM +0100, Martin J. Bligh wrote:
> Going from 59 to 59-mm6, I get:
> 
> Kernbench-16:
>                                    Elapsed        User      System         CPU
>                         2.5.59       47.45      568.02      143.17     1498.17
>                     2.5.59-mm6       47.18      567.15      138.62     1495.50
> 
> Summary: Scheduler stuff seems like a wash (schedule -> do_schedule). 
> Seems to be some sort of rearrangement of the dcache stuff which 
> appears to be mildly beneficial (what's going on in there?).
> 
> diffprofile (+ gets worse, - gets better).
> 
> 2023 do_schedule
> 485 dentry_open
> 289 .text.lock.file_table

Looks like you are getting hit by contention on files_lock. I have
been messing around with some code to split up the files_lock, but
I can't seem to get the locking in the tty layer right.

Hmm... .text.lock.namei is probably dcache_lock. -mm no longer has
dcache_rcu, so I'm not quite sure what helped you here.


Thanks
Dipankar


* kernbench-16 on 2.5.59 vs 2.5.59-mm6
@ 2003-01-27 17:36 Martin J. Bligh
  2003-01-28  0:22 ` William Lee Irwin III
  0 siblings, 1 reply; 3+ messages in thread
From: Martin J. Bligh @ 2003-01-27 17:36 UTC
  To: Andrew Morton; +Cc: linux-kernel

This test runs make -j X vmlinux, compiling a 2.4.17 kernel with a
very large config set, where X is 16 times the number of CPUs. This
is on a 16-way NUMA-Q, so we end up with make -j256. (The compile is
fastest with about 1.5 * num_cpus jobs, but this test puts more
stress on the kernel.)

None of the other tests I ran showed anything very interesting
(the new NUMA sched stuff from Ingo seems to give mild degradations
in -mjb ... probably needs some more tuning).

Going from 59 to 59-mm6, I get:

Kernbench-16:
                                   Elapsed        User      System         CPU
                        2.5.59       47.45      568.02      143.17     1498.17
                    2.5.59-mm6       47.18      567.15      138.62     1495.50
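
To put the table in relative terms, here is a small sketch that computes the percentage change per column. The numbers are copied from the table; the dictionary layout and function names are only illustrative, and the units (seconds for Elapsed/User/System, percent for CPU) are an assumption, since the table does not state them.

```python
# Kernbench-16 figures copied from the table above.
# Assumption: Elapsed/User/System are seconds and CPU is a percentage.
BASE = {"Elapsed": 47.45, "User": 568.02, "System": 143.17, "CPU": 1498.17}
MM6 = {"Elapsed": 47.18, "User": 567.15, "System": 138.62, "CPU": 1495.50}


def percent_change(before, after):
    """Relative change from before to after, in percent (negative = lower)."""
    return (after - before) / before * 100.0


def relative_changes(base, new):
    """Percent change for every column present in base."""
    return {name: percent_change(base[name], new[name]) for name in base}


if __name__ == "__main__":
    for name, pct in relative_changes(BASE, MM6).items():
        print(f"{name:8s} {pct:+.2f}%")
```

On these figures system time drops by roughly 3.2%, while elapsed, user and CPU each move by well under 1%.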

Summary: Scheduler stuff seems like a wash (schedule -> do_schedule).
There seems to be some sort of rearrangement of the dcache stuff which
appears to be mildly beneficial (what's going on in there?).
current_kernel_time now costs less than half what it did; I'm assuming
the new frlock kernel time stuff is doing that. This workload doesn't
stress that very much, so I'll find a better test for that one ...

2.5.59: 1657 current_kernel_time
2.5.59-mm6: 747 current_kernel_time

diffprofile (+ gets worse, - gets better).

2023 do_schedule
485 dentry_open
289 .text.lock.file_table
132 clear_page_tables
131 pgd_ctor
113 vma_merge
75 kmap_atomic
62 get_empty_filp
51 can_vma_merge_after
-52 dget_locked
-54 vfs_follow_link
-55 kmem_cache_free
-66 buffered_rmqueue
-74 __copy_to_user_ll
-94 page_add_rmap
-102 fd_install
-110 __copy_from_user_ll
-117 __d_lookup
-157 do_generic_mapping_read
-188 path_lookup
-273 .text.lock.dec_and_lock
-275 file_ra_state_init
-283 do_anonymous_page
-331 pfn_to_nid
-405 page_remove_rmap
-413 pgd_alloc
-427 vm_enough_memory
-910 current_kernel_time
-1222 .text.lock.namei
-2076 total
-2133 schedule
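
The list above can be read as a per-symbol subtraction of two profiles. The sketch below is not the actual diffprofile script, which is not shown in this thread; the function name and the dict-of-ticks input format are hypothetical. It uses the only absolute counts given in the message (current_kernel_time) and shows why a rename such as schedule -> do_schedule appears as two large entries of opposite sign.

```python
def diffprofile(old, new):
    """Return (delta, symbol) pairs, where delta = new ticks - old ticks.

    Positive deltas cost more in the new kernel, negative ones cost less.
    A symbol missing from one profile counts as 0 ticks there, which is
    why a rename shows up as one large positive and one large negative row.
    """
    symbols = set(old) | set(new)
    rows = [(new.get(s, 0) - old.get(s, 0), s) for s in symbols]
    # Drop unchanged symbols and list the worst regressions first.
    return sorted((r for r in rows if r[0] != 0), reverse=True)


# The only absolute tick counts quoted in the message.
old_profile = {"current_kernel_time": 1657}
new_profile = {"current_kernel_time": 747}
print(diffprofile(old_profile, new_profile))

# schedule was renamed to do_schedule, so the two rows are read together.
scheduler_net = 2023 + (-2133)
print(scheduler_net)
```

Summing the do_schedule and schedule rows gives a net of -110 ticks, which is small next to the -2076 total and fits the "wash" reading.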


* Re: kernbench-16 on 2.5.59 vs 2.5.59-mm6
  2003-01-27 17:36 Martin J. Bligh
@ 2003-01-28  0:22 ` William Lee Irwin III
  0 siblings, 0 replies; 3+ messages in thread
From: William Lee Irwin III @ 2003-01-28  0:22 UTC
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel

On Mon, Jan 27, 2003 at 09:36:52AM -0800, Martin J. Bligh wrote:
> 132 clear_page_tables
> 131 pgd_ctor
> -413 pgd_alloc

The pagetable preconstruction cache hit is spread across
clear_page_tables() and pgd_ctor() with the pgd_ctor patches.
This is the equivalent of the explicit zeroing in pgd_alloc().

Your result appears to imply the overhead has been reduced by 36%,
which is useful evidence for the PAE case. Before this the pgd_alloc()
overhead had only been observed on non-PAE systems.

Now, YTF hadn't I seen this before if all it took to bring it out was
a kernel compile? Perhaps diffprof (I prefer the multiplicative flavor
but nm that) of some flavor was lacking.

-- wli

