public inbox for linux-kernel@vger.kernel.org
* unusual scheduling performance
@ 2002-11-18  8:18 William Lee Irwin III
  2002-11-18 16:34 ` Martin J. Bligh
  2002-11-20 14:12 ` Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: William Lee Irwin III @ 2002-11-18  8:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, rml, riel, akpm

On 16x, 2.5.47 kernel compiles take about 26s when the machine is
otherwise idle.

On 32x, 2.5.47 kernel compiles take about 48s when the machine is 
otherwise idle.

When a single-threaded task consumes an entire cpu, kernel compiles
take 36s on 32x when the machine is idle aside from the task consuming
that cpu and the kernel compile itself.

I suspect the scheduler, because cpu reporting in top(1) shows that
two or more cpu-intensive tasks are concentrated on the same cpu, and
some long-lived tasks appear to be "bouncing" across cpus. If someone
with knowledge and/or expertise with respect to scheduling semantics
could look into this, I would be much obliged. Resolving this would
likely address many SMP and/or NUMA scheduling performance issues.


Thanks,
Bill

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: unusual scheduling performance
  2002-11-18  8:18 unusual scheduling performance William Lee Irwin III
@ 2002-11-18 16:34 ` Martin J. Bligh
  2002-11-18 16:53   ` William Lee Irwin III
  2002-11-20 14:12 ` Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Martin J. Bligh @ 2002-11-18 16:34 UTC (permalink / raw)
  To: William Lee Irwin III, linux-kernel; +Cc: mingo, rml, riel, akpm

> On 16x, 2.5.47 kernel compiles take about 26s when the machine is
> otherwise idle.
> 
> On 32x, 2.5.47 kernel compiles take about 48s when the machine is 
> otherwise idle.
> 
> When a single-threaded task consumes an entire cpu, kernel compiles
> take 36s on 32x when the machine is idle aside from the task consuming
> that cpu and the kernel compile itself.
> 
> I suspect the scheduler, because cpu reporting in top(1) shows that
> two or more cpu-intensive tasks are concentrated on the same cpu, and
> some long-lived tasks appear to be "bouncing" across cpus. If someone
> with knowledge and/or expertise with respect to scheduling semantics
> could look into this, I would be much obliged. Resolving this would
> likely address many SMP and/or NUMA scheduling performance issues.

1. make -j <what?>

2. profiles?

3. Can you try the latest set of NUMA sched patches posted by Erich Focht?

M.



* Re: unusual scheduling performance
  2002-11-18 16:34 ` Martin J. Bligh
@ 2002-11-18 16:53   ` William Lee Irwin III
  2002-11-18 17:53     ` Dave Hansen
  0 siblings, 1 reply; 19+ messages in thread
From: William Lee Irwin III @ 2002-11-18 16:53 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, mingo, rml, riel, akpm

On Mon, Nov 18, 2002 at 08:34:34AM -0800, Martin J. Bligh wrote:
> 1. make -j <what?>
> 2. profiles?
> 3. Can you try the latest set of NUMA sched patches posted by Erich Focht?

(1) make -j64 bzImage
(2) doesn't sound useful for load balancing
(3) sure


Bill


* Re: unusual scheduling performance
  2002-11-18 16:53   ` William Lee Irwin III
@ 2002-11-18 17:53     ` Dave Hansen
  2002-11-18 18:16       ` Andrew Morton
  2002-11-18 20:17       ` William Lee Irwin III
  0 siblings, 2 replies; 19+ messages in thread
From: Dave Hansen @ 2002-11-18 17:53 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Martin J. Bligh, linux-kernel, mingo, rml, riel, akpm

William Lee Irwin III wrote:
> On Mon, Nov 18, 2002 at 08:34:34AM -0800, Martin J. Bligh wrote:
> 
>>1. make -j <what?>
>>2. profiles?
>>3. Can you try the latest set of NUMA sched patches posted by Erich Focht?
> 
> (1) make -j64 bzImage
> (2) doesn't sound useful for load balancing
> (3) sure

I'm seeing the same thing.  In my pagecache warmup test, I do 20 greps 
to pull in a 10-gig fileset.  Each grep works on 1/20th of the files.

For a long, long time, once the file set was warmed up, the time to do 
the test was ~14 seconds:
Average Real: 14.0824
Average User: 0.94055
Average Sys:  5.20875
Full profile here:
http://www.sr71.net/prof/grep/run-grep-warm-2.5.47-11-15-2002-15.21.31/

As of 2.5.47, it looks like this:
Average Real: 18.9168
Average User: 1.0073
Average Sys:  4.9918
Full profile here: 
http://www.sr71.net/prof/grep/run-grep-warm-2.5.47-11-15-2002-15.58.02/

                                readprofile ticks
                                ------------------
                                fast   slow   diff
        page_cache_readahead:     93     82    -11
     __generic_file_aio_read:     73     83     10
                   file_move:     52     86     34
                 dget_locked:     57     87     30
               proc_pid_stat:    149     88    -61
        ep_notify_file_close:     59     89     30
                get_pid_list:     21     91     70
                update_atime:    100     93     -7
               get_unused_fd:     23    105     82
                        fget:    120    113     -7
                        dput:    100    120     20
              get_empty_filp:    105    121     16
                 system_call:    113    129     16
     rwsem_down_write_failed:           133    133
             vfs_follow_link:    116    164     48
             file_read_actor:    198    227     29
                      __fput:    178    241     63
           radix_tree_lookup:    324    293    -31
         atomic_dec_and_lock:    229    307     78
     .text.lock.dec_and_lock:    111    331    220
              try_to_wake_up:           374    374
                 kmap_atomic:    346    398     52
               kunmap_atomic:    379    409     30
                    vfs_read:    440    431     -9
            .text.lock.namei:    149    482    333
                  __d_lookup:    456    518     62
              link_path_walk:    533    710    177
                    schedule:      1   1060   1059
     do_generic_mapping_read:   1880   1846    -34
                   poll_idle:   2059  33416  31357
              __copy_to_user:  94208  87678  -6530
                       total: 104173 132206  28033

So, schedule() is being called a _lot_ more.  But, for some reason, 
the slower one wasn't caught doing __copy_to_user() as much.
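(Aside: a diff column like the one above can be generated by joining two per-symbol tick lists with awk. The sketch below uses made-up file names and a handful of sample numbers, not the real profiles:)

```shell
# Join two "symbol ticks" lists (fast.txt, slow.txt -- sample data only)
# into a fast/slow/diff table like the one above.
cat > fast.txt <<'EOF'
schedule 1
poll_idle 2059
dput 100
EOF
cat > slow.txt <<'EOF'
schedule 1060
poll_idle 33416
dput 120
EOF
# First pass (NR==FNR) loads the fast counts; second pass prints the join.
awk 'NR==FNR { fast[$1] = $2; next }
     { printf "%20s: %7d %7d %7d\n", $1, fast[$1], $2, $2 - fast[$1] }' \
    fast.txt slow.txt
```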

Bill, does this look like what you're seeing?
-- 
Dave Hansen
haveblue@us.ibm.com



* Re: unusual scheduling performance
  2002-11-18 17:53     ` Dave Hansen
@ 2002-11-18 18:16       ` Andrew Morton
  2002-11-18 18:34         ` Davide Libenzi
  2002-11-18 20:17       ` William Lee Irwin III
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2002-11-18 18:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: William Lee Irwin III, Martin J. Bligh, linux-kernel, mingo, rml,
	riel, akpm

Dave Hansen wrote:
> 
> ...
>      rwsem_down_write_failed:           133    133

Possible culprit.

Please stick a dump_stack() in rwsem_down_write_failed(), and add the below.
Suggest you stick with 2.5.47 to diagnose this.  The loss of kksymoops
is a pain.


 fs/eventpoll.c |    2 ++
 1 files changed, 2 insertions(+)

--- 25/fs/eventpoll.c~hey	Mon Nov 18 10:13:40 2002
+++ 25-akpm/fs/eventpoll.c	Mon Nov 18 10:14:01 2002
@@ -328,6 +328,8 @@ void eventpoll_release(struct file *file
 	if (list_empty(lsthead))
 		return;
 
+	printk("hey!\n");
+
 	/*
 	 * We don't want to get "file->f_ep_lock" because it is not
 	 * necessary. It is not necessary because we're in the "struct file"

_


* Re: unusual scheduling performance
  2002-11-18 18:16       ` Andrew Morton
@ 2002-11-18 18:34         ` Davide Libenzi
  2002-11-18 18:52           ` Andrew Morton
  2002-11-18 18:56           ` Dave Hansen
  0 siblings, 2 replies; 19+ messages in thread
From: Davide Libenzi @ 2002-11-18 18:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, William Lee Irwin III, Martin J. Bligh,
	Linux Kernel Mailing List, Ingo Molnar, Robert Love, riel, akpm

On Mon, 18 Nov 2002, Andrew Morton wrote:

> Dave Hansen wrote:
> >
> > ...
> >      rwsem_down_write_failed:           133    133
>
> Possible culprit.
>
> Please stick a dump_stack() in rwsem_down_write_failed(), and add the below.
> Suggest you stick with 2.5.47 to diagnose this.  The loss of kksymoops
> is a pain.
>
>
>  fs/eventpoll.c |    2 ++
>  1 files changed, 2 insertions(+)
>
> --- 25/fs/eventpoll.c~hey	Mon Nov 18 10:13:40 2002
> +++ 25-akpm/fs/eventpoll.c	Mon Nov 18 10:14:01 2002
> @@ -328,6 +328,8 @@ void eventpoll_release(struct file *file
>  	if (list_empty(lsthead))
>  		return;
>
> +	printk("hey!\n");
> +

Andrew, if you don't use epoll there's no way you get there. The function
eventpoll_file_init() initializes the list at each file* init in
fs/file_table.c.
If you're not using epoll and you get there, someone is screwing up the
data inside the struct file.



- Davide




* Re: unusual scheduling performance
  2002-11-18 18:34         ` Davide Libenzi
@ 2002-11-18 18:52           ` Andrew Morton
  2002-11-18 18:58             ` Davide Libenzi
  2002-11-18 18:56           ` Dave Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2002-11-18 18:52 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Dave Hansen, William Lee Irwin III, Martin J. Bligh,
	Linux Kernel Mailing List, Ingo Molnar, Robert Love, riel

Davide Libenzi wrote:
> 
> On Mon, 18 Nov 2002, Andrew Morton wrote:
> 
> > Dave Hansen wrote:
> > >
> > > ...
> > >      rwsem_down_write_failed:           133    133
> >
> > Possible culprit.
> >
> > Please stick a dump_stack() in rwsem_down_write_failed(), and add the below.
> > Suggest you stick with 2.5.47 to diagnose this.  The loss of kksymoops
> > is a pain.
> >
> >
> >  fs/eventpoll.c |    2 ++
> >  1 files changed, 2 insertions(+)
> >
> > --- 25/fs/eventpoll.c~hey     Mon Nov 18 10:13:40 2002
> > +++ 25-akpm/fs/eventpoll.c    Mon Nov 18 10:14:01 2002
> > @@ -328,6 +328,8 @@ void eventpoll_release(struct file *file
> >       if (list_empty(lsthead))
> >               return;
> >
> > +     printk("hey!\n");
> > +
> 
> Andrew, if you don't use epoll there's no way you get there.

Yup.  That was a random stab based on recently-added down_write()
calls.

However the down_write isn't there in 2.5.47 so that's a false
lead.  We'll need that dump_stack() output.

Here's Dave's profile.  ep_notify_file_close() makes a small appearance.
The change you made to 2.5.48 will wipe that out.  Neat.


  0.058%       78 locks_remove_flock
  0.062%       82 page_cache_readahead
  0.062%       83 __generic_file_aio_read
  0.065%       86 file_move
  0.065%       87 dget_locked
  0.066%       88 proc_pid_stat
  0.067%       89 ep_notify_file_close
  0.068%       91 get_pid_list
  0.070%       93 update_atime
  0.079%      105 get_unused_fd
  0.085%      113 fget
  0.090%      120 dput
  0.091%      121 get_empty_filp
  0.097%      129 system_call
  0.100%      133 rwsem_down_write_failed
  0.124%      164 vfs_follow_link
  0.171%      227 file_read_actor
  0.182%      241 __fput
  0.221%      293 radix_tree_lookup
  0.232%      307 atomic_dec_and_lock
  0.250%      331 .text.lock.dec_and_lock
  0.282%      374 try_to_wake_up
  0.301%      398 kmap_atomic
  0.309%      409 kunmap_atomic
  0.326%      431 vfs_read
  0.364%      482 .text.lock.namei
  0.391%      518 __d_lookup
  0.537%      710 link_path_walk
  0.801%     1060 schedule
  1.396%     1846 do_generic_mapping_read
 25.275%    33416 poll_idle
 66.319%    87678 __copy_to_user
100.000%   132206 total


* Re: unusual scheduling performance
  2002-11-18 18:34         ` Davide Libenzi
  2002-11-18 18:52           ` Andrew Morton
@ 2002-11-18 18:56           ` Dave Hansen
  2002-11-18 18:59             ` Davide Libenzi
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2002-11-18 18:56 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andrew Morton, William Lee Irwin III, Martin J. Bligh,
	Linux Kernel Mailing List, Ingo Molnar, Robert Love, riel, akpm

Davide Libenzi wrote:
> On Mon, 18 Nov 2002, Andrew Morton wrote:
>> fs/eventpoll.c |    2 ++
>> 1 files changed, 2 insertions(+)
>>
>>--- 25/fs/eventpoll.c~hey	Mon Nov 18 10:13:40 2002
>>+++ 25-akpm/fs/eventpoll.c	Mon Nov 18 10:14:01 2002
>>@@ -328,6 +328,8 @@ void eventpoll_release(struct file *file
>> 	if (list_empty(lsthead))
>> 		return;
>>
>>+	printk("hey!\n");
>>+
> 
> Andrew, if you don't use epoll there's no way you get there. The function
> eventpoll_file_init() initializes the list at each file* init in
> fs/file_table.c.
> If you're not using epoll and you get there, someone is screwing up the
> data inside the struct file.

That little tidbit isn't even in .47.  Is that patch against one of 
the 2.5.47-mm's?

-- 
Dave Hansen
haveblue@us.ibm.com



* Re: unusual scheduling performance
  2002-11-18 18:52           ` Andrew Morton
@ 2002-11-18 18:58             ` Davide Libenzi
  0 siblings, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2002-11-18 18:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, William Lee Irwin III, Martin J. Bligh,
	Linux Kernel Mailing List, Ingo Molnar, Robert Love, riel

On Mon, 18 Nov 2002, Andrew Morton wrote:

> Here's Dave's profile.  ep_notify_file_close() makes a small appearance.
> The change you made to 2.5.48 will wipe that out.  Neat.

It was per Linus' suggestion, actually :)



- Davide




* Re: unusual scheduling performance
  2002-11-18 18:56           ` Dave Hansen
@ 2002-11-18 18:59             ` Davide Libenzi
  0 siblings, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2002-11-18 18:59 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Linux Kernel Mailing List

On Mon, 18 Nov 2002, Dave Hansen wrote:

> > If you're not using epoll and you get there, someone is screwing up the
> > data inside the struct file
>
> That little tidbit isn't even in .47.  Is that patch against one of
> the 2.5.47-mm's?

No, Andrew took it from 2.5.48 ...



- Davide




* Re: unusual scheduling performance
  2002-11-18 17:53     ` Dave Hansen
  2002-11-18 18:16       ` Andrew Morton
@ 2002-11-18 20:17       ` William Lee Irwin III
  2002-11-18 22:51         ` Dave Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: William Lee Irwin III @ 2002-11-18 20:17 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Martin J. Bligh, linux-kernel, mingo, rml, riel, akpm

On Mon, Nov 18, 2002 at 09:53:24AM -0800, Dave Hansen wrote:
> I'm seeing the same thing.  In my pagecache warmup test, I do 20 greps 
> to pull in a 10-gig fileset.  Each grep works on 1/20th of the files.
[...]
> So, schedule() is being called a _lot_ more.  But, for some reason, 
> the slower one wasn't caught doing __copy_to_user() as much.
> Bill, does this look like what you're seeing?

No, I'm seeing strange load balancing behavior. But you seem to have
tripped over a somewhat more severe anomaly.


Bill


* Re: unusual scheduling performance
  2002-11-18 20:17       ` William Lee Irwin III
@ 2002-11-18 22:51         ` Dave Hansen
  2002-11-18 23:09           ` Andrew Morton
  2002-11-18 23:33           ` Davide Libenzi
  0 siblings, 2 replies; 19+ messages in thread
From: Dave Hansen @ 2002-11-18 22:51 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Martin J. Bligh, linux-kernel, mingo, rml, riel, akpm

As Andrew suggested, I put a dump_stack() in rwsem_down_write_failed().

This was actually in a 2.5.47 bk snapshot, so it has eventpoll in it.
kksymoops is broken, so:
dmesg | tail -20 | sort | uniq | ksymoops -m /boot/System.map

Trace; c01c5757 <rwsem_down_write_failed+27/170>
Trace; c01220c6 <update_wall_time+16/50>
Trace; c01223ee <do_timer+2e/c0>
Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
Trace; c0146568 <__fput+18/c0>
Trace; c010ae9a <handle_IRQ_event+2a/60>
Trace; c0144a05 <filp_close+85/b0>
Trace; c0144a8d <sys_close+5d/70>
Trace; c0108fab <syscall_call+7/b>

Trace; c01c5757 <rwsem_down_write_failed+27/170>
Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
Trace; c0146568 <__fput+18/c0>
Trace; c011e90b <do_softirq+6b/d0>
Trace; c0144a05 <filp_close+85/b0>
Trace; c0144a8d <sys_close+5d/70>
Trace; c0108fab <syscall_call+7/b>

Trace; c01c5757 <rwsem_down_write_failed+27/170>
Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
Trace; c0146568 <__fput+18/c0>
Trace; c0144c2d <generic_file_llseek+2d/e0>
Trace; c0144a05 <filp_close+85/b0>
Trace; c0144a8d <sys_close+5d/70>
Trace; c0108fab <syscall_call+7/b>

Trace; c01c5757 <rwsem_down_write_failed+27/170>
Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
Trace; c0146568 <__fput+18/c0>
Trace; c01553fa <sys_getdents64+4a/98>
Trace; c0144a05 <filp_close+85/b0>
Trace; c0144a8d <sys_close+5d/70>
Trace; c0108fab <syscall_call+7/b>

Mystery solved?

-- 
Dave Hansen
haveblue@us.ibm.com



* Re: unusual scheduling performance
  2002-11-18 22:51         ` Dave Hansen
@ 2002-11-18 23:09           ` Andrew Morton
  2002-11-18 23:20             ` Davide Libenzi
  2002-11-18 23:26             ` Dave Hansen
  2002-11-18 23:33           ` Davide Libenzi
  1 sibling, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2002-11-18 23:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: William Lee Irwin III, Martin J. Bligh, linux-kernel, mingo, rml,
	riel, Davide Libenzi

Dave Hansen wrote:
> 
> As Andrew suggested, I put a dump_stack() in rwsem_down_write_failed().
> 
> This was actually in a 2.5.47 bk snapshot, so it has eventpoll in it.

So printk("hey!\n") would have worked.  Looks like it would have
talked to you, too...

> kksymoops is broken, so:
> dmesg | tail -20 | sort | uniq | ksymoops -m /boot/System.map
> 
> Trace; c01c5757 <rwsem_down_write_failed+27/170>
> Trace; c01220c6 <update_wall_time+16/50>
> Trace; c01223ee <do_timer+2e/c0>
> Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
> Trace; c0146568 <__fput+18/c0>
> Trace; c010ae9a <handle_IRQ_event+2a/60>
> Trace; c0144a05 <filp_close+85/b0>
> Trace; c0144a8d <sys_close+5d/70>
> Trace; c0108fab <syscall_call+7/b>
> 

So it would appear that eventpoll_release() is the problem.
How odd.  You're not actually _using_ epoll there, are you?


* Re: unusual scheduling performance
  2002-11-18 23:09           ` Andrew Morton
@ 2002-11-18 23:20             ` Davide Libenzi
  2002-11-18 23:26             ` Dave Hansen
  1 sibling, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2002-11-18 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Hansen, William Lee Irwin III, Martin J. Bligh,
	Linux Kernel Mailing List, Ingo Molnar, Robert Love, riel

On Mon, 18 Nov 2002, Andrew Morton wrote:

> Dave Hansen wrote:
> >
> > As Andrew suggested, I put a dump_stack() in rwsem_down_write_failed().
> >
> > This was actually in a 2.5.47 bk snapshot, so it has eventpoll in it.
>
> So printk("hey!\n") would have worked.  Looks like it would have
> talked to you, too...
>
> > kksymoops is broken, so:
> > dmesg | tail -20 | sort | uniq | ksymoops -m /boot/System.map
> >
> > Trace; c01c5757 <rwsem_down_write_failed+27/170>
> > Trace; c01220c6 <update_wall_time+16/50>
> > Trace; c01223ee <do_timer+2e/c0>
> > Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
> > Trace; c0146568 <__fput+18/c0>
> > Trace; c010ae9a <handle_IRQ_event+2a/60>
> > Trace; c0144a05 <filp_close+85/b0>
> > Trace; c0144a8d <sys_close+5d/70>
> > Trace; c0108fab <syscall_call+7/b>
> >
>
> So it would appear that eventpoll_release() is the problem.
> How odd.  You're not actually _using_ epoll there, are you?

Could you please use 2.5.48 ...
This is weird, the code is straightforward.



- Davide




* Re: unusual scheduling performance
  2002-11-18 23:09           ` Andrew Morton
  2002-11-18 23:20             ` Davide Libenzi
@ 2002-11-18 23:26             ` Dave Hansen
  2002-11-18 23:30               ` Davide Libenzi
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2002-11-18 23:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: William Lee Irwin III, Martin J. Bligh, linux-kernel, mingo, rml,
	riel, Davide Libenzi

Andrew Morton wrote:
> Dave Hansen wrote:
>>kksymoops is broken, so:
>>dmesg | tail -20 | sort | uniq | ksymoops -m /boot/System.map
>>
>>Trace; c01c5757 <rwsem_down_write_failed+27/170>
>>Trace; c01220c6 <update_wall_time+16/50>
>>Trace; c01223ee <do_timer+2e/c0>
>>Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
>>Trace; c0146568 <__fput+18/c0>
>>Trace; c010ae9a <handle_IRQ_event+2a/60>
>>Trace; c0144a05 <filp_close+85/b0>
>>Trace; c0144a8d <sys_close+5d/70>
>>Trace; c0108fab <syscall_call+7/b>
>>
> 
> So it would appear that eventpoll_release() is the problem.
> How odd.  You're not actually _using_ epoll there, are you?

Not unless grep uses epoll.

-- 
Dave Hansen
haveblue@us.ibm.com



* Re: unusual scheduling performance
  2002-11-18 23:26             ` Dave Hansen
@ 2002-11-18 23:30               ` Davide Libenzi
  0 siblings, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2002-11-18 23:30 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Linux Kernel Mailing List

On Mon, 18 Nov 2002, Dave Hansen wrote:

> > So it would appear that eventpoll_release() is the problem.
> > How odd.  You're not actually _using_ epoll there, are you?
>
> Not unless grep uses epoll.

I'd be surprised if it would :)



- Davide




* Re: unusual scheduling performance
  2002-11-18 22:51         ` Dave Hansen
  2002-11-18 23:09           ` Andrew Morton
@ 2002-11-18 23:33           ` Davide Libenzi
  1 sibling, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2002-11-18 23:33 UTC (permalink / raw)
  To: Dave Hansen
  Cc: William Lee Irwin III, Martin J. Bligh, linux-kernel, mingo, rml,
	riel, akpm

On Mon, 18 Nov 2002, Dave Hansen wrote:

> As Andrew suggested, I put a dump_stack() in rwsem_down_write_failed().
>
> This was actually in a 2.5.47 bk snapshot, so it has eventpoll in it.
> kksymoops is broken, so:
> dmesg | tail -20 | sort | uniq | ksymoops -m /boot/System.map
>
> Trace; c01c5757 <rwsem_down_write_failed+27/170>
> Trace; c01220c6 <update_wall_time+16/50>
> Trace; c01223ee <do_timer+2e/c0>
> Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
> Trace; c0146568 <__fput+18/c0>
> Trace; c010ae9a <handle_IRQ_event+2a/60>
> Trace; c0144a05 <filp_close+85/b0>
> Trace; c0144a8d <sys_close+5d/70>
> Trace; c0108fab <syscall_call+7/b>
>
> Trace; c01c5757 <rwsem_down_write_failed+27/170>
> Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
> Trace; c0146568 <__fput+18/c0>
> Trace; c011e90b <do_softirq+6b/d0>
> Trace; c0144a05 <filp_close+85/b0>
> Trace; c0144a8d <sys_close+5d/70>
> Trace; c0108fab <syscall_call+7/b>
>
> Trace; c01c5757 <rwsem_down_write_failed+27/170>
> Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
> Trace; c0146568 <__fput+18/c0>
> Trace; c0144c2d <generic_file_llseek+2d/e0>
> Trace; c0144a05 <filp_close+85/b0>
> Trace; c0144a8d <sys_close+5d/70>
> Trace; c0108fab <syscall_call+7/b>
>
> Trace; c01c5757 <rwsem_down_write_failed+27/170>
> Trace; c0166bd3 <.text.lock.eventpoll+6/f3>
> Trace; c0146568 <__fput+18/c0>
> Trace; c01553fa <sys_getdents64+4a/98>
> Trace; c0144a05 <filp_close+85/b0>
> Trace; c0144a8d <sys_close+5d/70>
> Trace; c0108fab <syscall_call+7/b>
>
> Mystery solved?

Could you please put this in eventpoll_release():


        if (list_empty(lsthead))
                return;

	printk("[%p] head=%p prev=%p next=%p\n", current, lsthead,
		lsthead->prev, lsthead->next);





- Davide




* Re: unusual scheduling performance
  2002-11-18  8:18 unusual scheduling performance William Lee Irwin III
  2002-11-18 16:34 ` Martin J. Bligh
@ 2002-11-20 14:12 ` Ingo Molnar
  2002-11-20 22:19   ` William Lee Irwin III
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2002-11-20 14:12 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel, Robert Love, riel, Andrew Morton


On Mon, 18 Nov 2002, William Lee Irwin III wrote:

> On 16x, 2.5.47 kernel compiles take about 26s when the machine is
> otherwise idle.
> 
> On 32x, 2.5.47 kernel compiles take about 48s when the machine is
> otherwise idle.

one thing to note is that the kernel's compilation is not something that
parallelizes well to above 8 CPUs. Our make architecture creates many link
points which serialize 'threads of compilation'.

i'd try two things:

 1) try Erich Focht's NUMA enhancements to the load balancer.

 2) remove the -pipe flag from arch/i386/Makefile

the latter will reduce the number of processes and make compilation
more localized to a single CPU - which might (or might not) help NUMA
architectures.
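(Step (2) is mechanical; a throwaway sketch against a scratch file -- the CFLAGS line below is invented for the demo, not the actual 2.5.47 arch/i386/Makefile contents:)

```shell
# Demo: strip the -pipe flag from a Makefile's CFLAGS with sed.
mkdir -p demo/arch/i386
printf 'CFLAGS += -O2 -pipe -fomit-frame-pointer\n' > demo/arch/i386/Makefile
sed -i 's/ -pipe//' demo/arch/i386/Makefile
cat demo/arch/i386/Makefile    # -pipe is gone; gcc will use temp files instead
```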

	Ingo



* Re: unusual scheduling performance
  2002-11-20 14:12 ` Ingo Molnar
@ 2002-11-20 22:19   ` William Lee Irwin III
  0 siblings, 0 replies; 19+ messages in thread
From: William Lee Irwin III @ 2002-11-20 22:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Robert Love, riel, Andrew Morton

On Mon, 18 Nov 2002, William Lee Irwin III wrote:
>> On 16x, 2.5.47 kernel compiles take about 26s when the machine is
>> otherwise idle.
>> On 32x, 2.5.47 kernel compiles take about 48s when the machine is
>> otherwise idle.

On Wed, Nov 20, 2002 at 03:12:57PM +0100, Ingo Molnar wrote:
> one thing to note is that the kernel's compilation is not something that
> parallelizes well to above 8 CPUs. Our make architecture creates many link
> points which serialize 'threads of compilation'.

Well, I was only at -j64. That's 2 processes per cpu... something unusual
seems to happen with low process/cpu density. Some fiddling around with
prior kernels seemed to show that both -j64 and -j256 were previously
near-equivalent sweet spots for 32x.


On Wed, Nov 20, 2002 at 03:12:57PM +0100, Ingo Molnar wrote:
> i'd try two things:
>  1) try Erich Focht's NUMA enhancements to the load balancer.
>  2) remove the -pipe flag from arch/i386/Makefile
> the latter will reduce the number of processes and make compilation
> more localized to a single CPU - which might (or might not) help NUMA
> architectures.

The unusual bit that neither of those can really address was that
eating a single cpu with something completely unrelated sped the whole
process up from 48s to 36s on 32x (this is all nicely repeatable). No
good explanations for this have surfaced yet. I'll have to get a good
way of logging what processes are chewing how much cpu and what cpus
they're running on before I can send comprehensible traces of this.
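(One rough way to collect such a log -- a sketch, not what was used here; it assumes a kernel whose /proc/PID/stat exports the last-run processor as field 39, and it naively takes $2 as the command name, which breaks on names containing spaces:)

```shell
# One-shot sample: timestamp, pid, cpu the task last ran on, comm.
# Run in a loop and diff the cpu column over time to see tasks "bouncing".
now=$(date +%s)
for stat in /proc/[0-9]*/stat; do
    # field 1 = pid, field 2 = (comm), field 39 = last-run processor
    awk -v t="$now" '{ print t, $1, $39, $2 }' "$stat" 2>/dev/null
done | head -5
```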

OTOH Focht's fork() and/or exec() -time load balancing should
significantly help the low process/cpu density case by creating an
opportunity for load balancing before the lifetime of short-lived
processes expires, with the added bonus of keeping things within
nodes most of the time.


Thanks,
Bill


end of thread, other threads:[~2002-11-20 22:15 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-11-18  8:18 unusual scheduling performance William Lee Irwin III
2002-11-18 16:34 ` Martin J. Bligh
2002-11-18 16:53   ` William Lee Irwin III
2002-11-18 17:53     ` Dave Hansen
2002-11-18 18:16       ` Andrew Morton
2002-11-18 18:34         ` Davide Libenzi
2002-11-18 18:52           ` Andrew Morton
2002-11-18 18:58             ` Davide Libenzi
2002-11-18 18:56           ` Dave Hansen
2002-11-18 18:59             ` Davide Libenzi
2002-11-18 20:17       ` William Lee Irwin III
2002-11-18 22:51         ` Dave Hansen
2002-11-18 23:09           ` Andrew Morton
2002-11-18 23:20             ` Davide Libenzi
2002-11-18 23:26             ` Dave Hansen
2002-11-18 23:30               ` Davide Libenzi
2002-11-18 23:33           ` Davide Libenzi
2002-11-20 14:12 ` Ingo Molnar
2002-11-20 22:19   ` William Lee Irwin III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox