* 2.4.17 absurd number of context switches
@ 2001-12-28 16:35 Jeffrey W. Baker
  2001-12-28 17:02 ` Alan Cox
  2001-12-29  3:06 ` Anton Tinchev
  0 siblings, 2 replies; 14+ messages in thread
From: Jeffrey W. Baker @ 2001-12-28 16:35 UTC (permalink / raw)
  To: linux-kernel


Here's a slice of vmstat 1 on my 2-way x86, 2GB main memory machine
running Postgres 7.2beta4 on Linux 2.4.17:
   procs                        memory    swap       io      system         cpu
 r  b  w   swpd   free   buff    cache  si  so  bi   bo   in     cs  us  sy  id
 7  0  0 371612  58272  18576  1568896   0   0   0  168  414  33113  49  38  13
 9  0  0 371612  59168  18576  1568900   0   0   0   64  215  32143  56  36   8
 5  0  0 371612  58532  18576  1568924   0   0   0  696  363  33553  52  41   7
 8  0  0 371612  59344  18576  1568956   0   0  16  240  374  34237  52  38   9
 3  0  0 371612  58860  18576  1568996   0   0   0  128  254  31848  51  38  11
 6  0  0 371612  59172  18576  1568996   0   0   0   64  234  36340  56  30  14
 3  0  0 371612  59092  18576  1569004   0   0   0  232  204  32065  48  42  11
                                                              ^^^^^
Check out those figures for context switches!  30,000 switches per second
with only three runnable processes and practically no block I/O seems
quite high to me.  You can also see that the system is spending half its
time in the kernel, presumably in the scheduler.  Postgres is barely
getting any CPU time at all, and the performance suffers noticeably.

Is this a scheduler worst-case, something to be expected, or something I
can work around?

Please CC me since vger's majordomo is an impossible chunk of shit.

-jwb
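
For anyone who wants to cross-check the cs column without vmstat: on Linux the
kernel exports a running total of context switches as the "ctxt" line in
/proc/stat, and sampling it twice gives the same per-second rate. The sketch
below is not from the thread; the helper names parse_ctxt() and
ctxt_per_second() are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find the "ctxt <total>" line in a buffer holding the contents of
 * /proc/stat and store the total in *ctxt.
 * Returns 0 on success, -1 if no such line is present.
 */
int parse_ctxt(const char *stat_text, unsigned long long *ctxt)
{
    const char *line = stat_text;

    while (line && *line) {
        /* sscanf only succeeds when this line starts with "ctxt ". */
        if (sscanf(line, "ctxt %llu", ctxt) == 1)
            return 0;
        line = strchr(line, '\n');   /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/*
 * Context switches per second between two samples of the ctxt total.
 * Returns -1.0 for a non-positive interval or a counter that went backwards.
 */
double ctxt_per_second(unsigned long long before,
                       unsigned long long after,
                       double seconds)
{
    if (seconds <= 0.0 || after < before)
        return -1.0;
    return (double)(after - before) / seconds;
}
```

Read /proc/stat into a buffer, call parse_ctxt(), sleep for a second, repeat,
and feed both totals to ctxt_per_second(). A difference of 33113 over one
second corresponds to the first vmstat row above.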


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.17 absurd number of context switches
  2001-12-28 16:35 2.4.17 absurd number of context switches Jeffrey W. Baker
@ 2001-12-28 17:02 ` Alan Cox
  2001-12-28 17:07   ` Davide Libenzi
                     ` (2 more replies)
  2001-12-29  3:06 ` Anton Tinchev
  1 sibling, 3 replies; 14+ messages in thread
From: Alan Cox @ 2001-12-28 17:02 UTC (permalink / raw)
  To: Jeffrey W. Baker; +Cc: linux-kernel

> Check out those figures for context switches!  30,000 switches per second
> with only three runnable processes and practically no block I/O seems
> quite high to me.  You can also see that the system is spending half its
..
> Is this a scheduler worst-case, something to be expected, or something I
> can work around?

The scheduler is _good_ at the three process case. Run some straces it looks
more like postgres is doing wacky yield based locks.

Alan

* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:02 ` Alan Cox
@ 2001-12-28 17:07   ` Davide Libenzi
  2001-12-28 17:14     ` Stephan von Krawczynski
  2001-12-28 17:37     ` Alan Cox
  2001-12-28 17:44   ` Jeffrey W. Baker
  2001-12-29 20:49   ` Jeffrey W. Baker
  2 siblings, 2 replies; 14+ messages in thread
From: Davide Libenzi @ 2001-12-28 17:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeffrey W. Baker, linux-kernel

On Fri, 28 Dec 2001, Alan Cox wrote:

> > Check out those figures for context switches!  30,000 switches per second
> > with only three runnable processes and practically no block I/O seems
> > quite high to me.  You can also see that the system is spending half its
> ..
> > Is this a scheduler worst-case, something to be expected, or something I
> > can work around?
>
> The scheduler is _good_ at the three process case. Run some straces it looks
> more like postgres is doing wacky yield based locks.

The scheduler that Linus merged in 2.5.2-pre3 will solve the problem.



- Davide



* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:07   ` Davide Libenzi
@ 2001-12-28 17:14     ` Stephan von Krawczynski
  2001-12-28 17:22       ` Davide Libenzi
  2001-12-28 17:37     ` Alan Cox
  1 sibling, 1 reply; 14+ messages in thread
From: Stephan von Krawczynski @ 2001-12-28 17:14 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: alan, jwb, linux-kernel

On Fri, 28 Dec 2001 09:07:48 -0800 (PST)
Davide Libenzi <davidel@xmailserver.org> wrote:

> The scheduler that Linus merged in 2.5.2-pre3 will solve the problem.

Could you kindly provide a patch for 2.4.17 for Jeffrey to test and give
comparison results to the list. This could be interesting for 2.4 too (which
the world uses nowadays (and in the near future))?

Thanks for your help ;-)

Stephan


* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:14     ` Stephan von Krawczynski
@ 2001-12-28 17:22       ` Davide Libenzi
  2001-12-28 17:43         ` Alan Cox
  0 siblings, 1 reply; 14+ messages in thread
From: Davide Libenzi @ 2001-12-28 17:22 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Alan Cox, jwb, lkml

On Fri, 28 Dec 2001, Stephan von Krawczynski wrote:

> On Fri, 28 Dec 2001 09:07:48 -0800 (PST)
> Davide Libenzi <davidel@xmailserver.org> wrote:
>
> > The scheduler that Linus merged in 2.5.2-pre3 will solve the problem.
>
> Could you kindly provide a patch for 2.4.17 for Jeffrey to test and give
> comparison results to the list. This could be interesting for 2.4 too (which
> the world uses nowadays (and in the near future))?
>
> Thanks for your help ;-)

You can't, the scheduler is changed.
Try this, in sys_sched_yield() remove :

        spin_lock_irq(&runqueue_lock);
        move_last_runqueue(current);
        spin_unlock_irq(&runqueue_lock);

and replace it with :

        local_irq_disable();
        if (current->counter > 0)
            --current->counter;
        local_irq_enable();



- Davide



* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:37     ` Alan Cox
@ 2001-12-28 17:33       ` Davide Libenzi
  2001-12-28 17:45         ` Alan Cox
  0 siblings, 1 reply; 14+ messages in thread
From: Davide Libenzi @ 2001-12-28 17:33 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeffrey W. Baker, lkml

On Fri, 28 Dec 2001, Alan Cox wrote:

> > > The scheduler is _good_ at the three process case. Run some straces it looks
> > > more like postgres is doing wacky yield based locks.
> >
> > The scheduler that Linus merged in 2.5.2-pre3 will solve the problem.
>
> Looking at the postgres traces here it won't make any difference at all. Not
> one iota. If I am reading it right I have processes each going
> yield, yield, yield... so the kernel does just that (and indeed posix
> semantics require that behaviour).

task A holds the lock, counter = 2
tasks B (counter = 5) and C (counter = 4) are woken up
try to look at a switch dump.
the scheduler will spend 3 entire time slices switching between B and C
before A will get back the CPU and will free the lock.




- Davide
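
The A/B/C scenario above can be played through in a toy user-space model. This
is only a sketch under loud assumptions, not kernel code: the next task is
simply the one with the largest counter, the task that just yielded is skipped
for that one pick, ties go to the lock holder, and a timer tick lands once
every yields_per_tick yields. The names yield_policy and
switches_until_holder_runs() are invented for the example.

```c
#include <stddef.h>

enum yield_policy {
    YIELD_REQUEUE,   /* stock 2.4 path: back of the run queue, counter kept */
    YIELD_DECREMENT  /* suggested change: each yield costs one counter tick */
};

#define NTASKS 3     /* index 0 = lock holder A, 1 = B, 2 = C */

/*
 * Toy selection rule (an assumption, not the real goodness() function):
 * largest counter wins, the task that just yielded is skipped for this
 * pick, and ties go to the lowest index.
 */
static int pick_next(const int counter[NTASKS], int yielder)
{
    int best = -1;

    for (int i = 0; i < NTASKS; i++) {
        if (i == yielder)
            continue;
        if (best < 0 || counter[i] > counter[best])
            best = i;
    }
    return best;
}

/*
 * Count the context switches that happen before lock holder A is picked
 * while B and C keep yielding. Returns -1 if max_switches is reached first.
 */
long switches_until_holder_runs(enum yield_policy policy,
                                int a, int b, int c,
                                long yields_per_tick, long max_switches)
{
    int counter[NTASKS] = { a, b, c };
    int current = (b >= c) ? 1 : 2;   /* spinner with the larger counter */
    long switches = 0;

    while (switches < max_switches) {
        /* current fails to get the lock and yields */
        if (policy == YIELD_DECREMENT && counter[current] > 0)
            counter[current]--;

        /* assumed timer tick: charged to whoever is running right now */
        if (yields_per_tick > 0 &&
            (switches + 1) % yields_per_tick == 0 &&
            counter[current] > 0)
            counter[current]--;

        current = pick_next(counter, current);
        switches++;
        if (current == 0)
            return switches;
    }
    return -1;
}
```

With A = 2, B = 5, C = 4 and an assumed 300 yields per tick, this model needs
601 switches before A runs under YIELD_REQUEUE and 5 under YIELD_DECREMENT.
The absolute numbers depend entirely on the assumptions; only the gap between
the two policies is the point.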



* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:07   ` Davide Libenzi
  2001-12-28 17:14     ` Stephan von Krawczynski
@ 2001-12-28 17:37     ` Alan Cox
  2001-12-28 17:33       ` Davide Libenzi
  1 sibling, 1 reply; 14+ messages in thread
From: Alan Cox @ 2001-12-28 17:37 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Alan Cox, Jeffrey W. Baker, linux-kernel

> > The scheduler is _good_ at the three process case. Run some straces it looks
> > more like postgres is doing wacky yield based locks.
> 
> The scheduler that Linus merged in 2.5.2-pre3 will solve the problem.

Looking at the postgres traces here it won't make any difference at all. Not
one iota. If I am reading it right I have processes each going
yield, yield, yield... so the kernel does just that (and indeed posix
semantics require that behaviour).

Alan

* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:43         ` Alan Cox
@ 2001-12-28 17:43           ` Davide Libenzi
  0 siblings, 0 replies; 14+ messages in thread
From: Davide Libenzi @ 2001-12-28 17:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: Stephan von Krawczynski, jwb, lkml

On Fri, 28 Dec 2001, Alan Cox wrote:

> >         local_irq_disable();
> >         if (current->counter > 0)
> >             --current->counter;
> >         local_irq_enable();
>
> Umm: SuS sayeth..
>
>    DESCRIPTION
>
>      The sched_yield() function forces the running thread to relinquish the
>      processor until it again becomes the head of its thread list. It takes
>      no arguments.
>
> Which doesn't seem to be what you are doing.

1) the scheduler in 2.5.2-pre3 does it in a different way because the
	dynamic priority is split from the time slice.

2) the current scheduler does not permit you doing such a thing in a smart
	way so, if i've to choose i prefer an implementation that solves
	real world cases. i challenge you to measure the counter tick loss
	during the yield() call anyway. it's very easy indeed to measure
	the current behavior, like we're currently seeing




- Davide



* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:22       ` Davide Libenzi
@ 2001-12-28 17:43         ` Alan Cox
  2001-12-28 17:43           ` Davide Libenzi
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Cox @ 2001-12-28 17:43 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Stephan von Krawczynski, Alan Cox, jwb, lkml

>         local_irq_disable();
>         if (current->counter > 0)
>             --current->counter;
>         local_irq_enable();

Umm: SuS sayeth..

   DESCRIPTION

     The sched_yield() function forces the running thread to relinquish the
     processor until it again becomes the head of its thread list. It takes
     no arguments.

Which doesn't seem to be what you are doing.

* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:02 ` Alan Cox
  2001-12-28 17:07   ` Davide Libenzi
@ 2001-12-28 17:44   ` Jeffrey W. Baker
  2001-12-28 18:06     ` Davide Libenzi
  2001-12-29 20:49   ` Jeffrey W. Baker
  2 siblings, 1 reply; 14+ messages in thread
From: Jeffrey W. Baker @ 2001-12-28 17:44 UTC (permalink / raw)
  To: linux-kernel



On Fri, 28 Dec 2001, Alan Cox wrote:

> > Check out those figures for context switches!  30,000 switches per second
> > with only three runnable processes and practically no block I/O seems
> > quite high to me.  You can also see that the system is spending half its
> ..
> > Is this a scheduler worst-case, something to be expected, or something I
> > can work around?
>
> The scheduler is _good_ at the three process case. Run some straces it looks
> more like postgres is doing wacky yield based locks.

All I see in strace is semop forever

[pid 10076]      0.000054 semop(1179648, 0xbfffe6e8, 1 <unfinished ...>
[pid 10077]      0.000224 <... semop resumed> ) = 0
[pid 10077]      0.000077 semop(1179648, 0xbfffe1e8, 1) = 0
[pid 10077]      0.000057 semop(1179648, 0xbfffe0f8, 1 <unfinished ...>
[pid 10076]      0.000128 <... semop resumed> ) = 0
[pid 10076]      0.000035 semop(1179648, 0xbfffe6a8, 1) = 0
[pid 10076]      0.000127 semop(1179648, 0xbfffe758, 1 <unfinished ...>
[pid 10077]      0.000085 <... semop resumed> ) = 0
[pid 10077]      0.000075 semop(1179648, 0xbfffe0f8, 1) = 0
[pid 10077]      0.000155 semop(1179648, 0xbfffdfb8, 1 <unfinished ...>
[pid 10076]      0.000401 <... semop resumed> ) = 0
[pid 10076]      0.000034 semop(1179648, 0xbfffe758, 1) = 0
[pid 10076]      0.000046 semop(1179648, 0xbfffe758, 1 <unfinished ...>
[pid 10077]      0.000113 <... semop resumed> ) = 0
[pid 10077]      0.000040 semop(1179648, 0xbfffdf78, 1) = 0
[pid 10077]      0.000051 semop(1179648, 0xbfffdfc8, 1 <unfinished ...>
[pid 10076]      0.000317 <... semop resumed> ) = 0
[pid 10076]      0.000055 semop(1179648, 0xbfffe718, 1) = 0
[pid 10076]      0.000083 semop(1179648, 0xbfffe8d8, 1 <unfinished ...>
[pid 10077]      0.000217 <... semop resumed> ) = 0
[pid 10077]      0.000091 semop(1179648, 0xbfffdfc8, 1) = 0
[pid 10077]      0.000057 semop(1179648, 0xbfffdfa8, 1 <unfinished ...>
[pid 10076]      0.000191 <... semop resumed> ) = 0
[pid 10076]      0.000037 semop(1179648, 0xbfffe898, 1) = 0
[pid 10076]      0.000054 semop(1179648, 0xbfffe928, 1 <unfinished ...>
[pid 10077]      0.000056 <... semop resumed> ) = 0
[pid 10077]      0.000034 semop(1179648, 0xbfffdf68, 1) = 0

-jwb
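
For readers unfamiliar with the call in the trace: semop() applies an array of
operations to a System V semaphore set. A negative sem_op blocks until the
semaphore value is large enough, unless IPC_NOWAIT is set, in which case the
call fails with EAGAIN instead. The sketch below shows that pattern on a
private one-semaphore set. It says nothing about how Postgres uses its
semaphores; the helper names sem_create_private(), sem_adjust() and
sem_remove() are made up for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Linux leaves it to the caller to declare the semctl() argument union. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private set holding one semaphore with the given initial value. */
int sem_create_private(int initial)
{
    union semun_arg arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid < 0)
        return -1;
    arg.val = initial;
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);   /* do not leak the set on failure */
        return -1;
    }
    return semid;
}

/*
 * Add delta to semaphore 0. A negative delta is a "wait": it blocks until
 * the value is large enough, or fails with EAGAIN when flags has IPC_NOWAIT.
 * The call is retried if a signal interrupts it.
 */
int sem_adjust(int semid, short delta, short flags)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = flags };
    int rc;

    do {
        rc = semop(semid, &op, 1);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Remove the semaphore set from the system. */
int sem_remove(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

Two processes that hand a semaphore back and forth with blocking waits go to
sleep and get woken on every exchange, so each hand-off costs context switches.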


* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:33       ` Davide Libenzi
@ 2001-12-28 17:45         ` Alan Cox
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Cox @ 2001-12-28 17:45 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Alan Cox, Jeffrey W. Baker, lkml

> the scheduler will spend 3 entire time slices switching between B and C
> before A will get back the CPU and will free the lock.

Ugggh. That sounds horribly plausible. 

* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:44   ` Jeffrey W. Baker
@ 2001-12-28 18:06     ` Davide Libenzi
  0 siblings, 0 replies; 14+ messages in thread
From: Davide Libenzi @ 2001-12-28 18:06 UTC (permalink / raw)
  To: Jeffrey W. Baker; +Cc: lkml

On Fri, 28 Dec 2001, Jeffrey W. Baker wrote:

>
>
> On Fri, 28 Dec 2001, Alan Cox wrote:
>
> > > Check out those figures for context switches!  30,000 switches per second
> > > with only three runnable processes and practically no block I/O seems
> > > quite high to me.  You can also see that the system is spending half its
> > ..
> > > Is this a scheduler worst-case, something to be expected, or something I
> > > can work around?
> >
> > The scheduler is _good_ at the three process case. Run some straces it looks
> > more like postgres is doing wacky yield based locks.
>
> All I see in strace is semop forever
>
> [pid 10076]      0.000054 semop(1179648, 0xbfffe6e8, 1 <unfinished ...>
> [pid 10077]      0.000224 <... semop resumed> ) = 0
> [pid 10077]      0.000077 semop(1179648, 0xbfffe1e8, 1) = 0
> [pid 10077]      0.000057 semop(1179648, 0xbfffe0f8, 1 <unfinished ...>
> [pid 10076]      0.000128 <... semop resumed> ) = 0
> [pid 10076]      0.000035 semop(1179648, 0xbfffe6a8, 1) = 0
> [pid 10076]      0.000127 semop(1179648, 0xbfffe758, 1 <unfinished ...>
> [pid 10077]      0.000085 <... semop resumed> ) = 0
> [pid 10077]      0.000075 semop(1179648, 0xbfffe0f8, 1) = 0
> [pid 10077]      0.000155 semop(1179648, 0xbfffdfb8, 1 <unfinished ...>
> [pid 10076]      0.000401 <... semop resumed> ) = 0
> [pid 10076]      0.000034 semop(1179648, 0xbfffe758, 1) = 0
> [pid 10076]      0.000046 semop(1179648, 0xbfffe758, 1 <unfinished ...>
> [pid 10077]      0.000113 <... semop resumed> ) = 0
> [pid 10077]      0.000040 semop(1179648, 0xbfffdf78, 1) = 0
> [pid 10077]      0.000051 semop(1179648, 0xbfffdfc8, 1 <unfinished ...>
> [pid 10076]      0.000317 <... semop resumed> ) = 0
> [pid 10076]      0.000055 semop(1179648, 0xbfffe718, 1) = 0
> [pid 10076]      0.000083 semop(1179648, 0xbfffe8d8, 1 <unfinished ...>
> [pid 10077]      0.000217 <... semop resumed> ) = 0
> [pid 10077]      0.000091 semop(1179648, 0xbfffdfc8, 1) = 0
> [pid 10077]      0.000057 semop(1179648, 0xbfffdfa8, 1 <unfinished ...>
> [pid 10076]      0.000191 <... semop resumed> ) = 0
> [pid 10076]      0.000037 semop(1179648, 0xbfffe898, 1) = 0
> [pid 10076]      0.000054 semop(1179648, 0xbfffe928, 1 <unfinished ...>
> [pid 10077]      0.000056 <... semop resumed> ) = 0
> [pid 10077]      0.000034 semop(1179648, 0xbfffdf68, 1) = 0

It's not a sys_sched_yield() problem. Probably an IPC_NOWAIT issue.




- Davide



* Re: 2.4.17 absurd number of context switches
  2001-12-28 16:35 2.4.17 absurd number of context switches Jeffrey W. Baker
  2001-12-28 17:02 ` Alan Cox
@ 2001-12-29  3:06 ` Anton Tinchev
  1 sibling, 0 replies; 14+ messages in thread
From: Anton Tinchev @ 2001-12-29  3:06 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Jeffrey W. Baker

Is your system heavily loaded?
I'm running Postgres 7.1.2 on a 2-way x86 too and noticed that context switches
are high!
Here is a slice at about 50% idle CPU time; when the load goes up to 80-90%,
context switches climb to 10000-15000:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0  87364  15448   2540 1218324   1   1    20    13    7    17  16   4  18
 3  0  0  87364  12596   2552 1219336  36   0   964     0  504  1385  30  16  53
 4  1  0  87364   7684   2556 1220568   0   0  1196   768  687  1645  28  16  56
 2  1  0  87364   6172   2568 1218940   0   0  1156     0  604  1485  31  11  58
10  0  0  87364   5648   2572 1218764   0   0   972     0  366  1012  23  18  60
 5  0  0  87364   6216   2580 1218208   0   0   604     0  411  1154  28  20  52
12  0  0  87364   5832   2580 1216952   0   0   272     0  317  1567  31  23  46
12  0  0  87364   6048   2592 1216100   0   0   168   384  448  2011  47  13  40
11  0  0  87364   5920   2596 1216116   0   0    20     0  284  2307  58  12  30
 9  0  0  87364   6556   2600 1216136   0   0    12     0  284  2164  53   9  38
11  0  0  87364   6536   2608 1216220   0   0    68     0  324  2140  50  12  38
23  0  0  87364   7176   2612 1216232   0   0     4     0  317  2238  46   9  46
12  0  0  87364   6880   2616 1216360  16   0   140   256  440  2367  33  18  49
 9  0  0  87364   6380   2644 1216692   0   0   336     0  454  2874  28  17  55
 7  0  0  87364   5956   2648 1216772   0   0    52     0  550  2659  28  17  55
 1  0  0  87364   5420   2628 1215544   0   0    32     0  538   648  13   2  85
 0  0  0  87364   6180   2632 1215636   0   0    72     0  421   430  11   3  86
 0  0  0  87364   6152   2632 1215664   0   0    20  1152  328   225   6   1  92
 0  0  0  87364   6116   2632 1215696   0   0    36     0  278   242   7   1  91
 0  0  0  87364   5512   2580 1214524   0   0   152     0  433   426  10   4  87
 0  0  0  87364   9072   2580 1214536   0   0     8     0  303   290   8   1  91


"Jeffrey W. Baker" wrote:

> Here's a slice of vmstat 1 on my 2-way x86, 2GB main memory machine
> running Postgres 7.2beta4 on Linux 2.4.17:
>                                                            cpu
> r b w                                bi  bo  in    cs us sy id
> 7 0 0 371612 58272 18576 1568896 0 0  0 168 414 33113 49 38 13
> 9 0 0 371612 59168 18576 1568900 0 0  0  64 215 32143 56 36  8
> 5 0 0 371612 58532 18576 1568924 0 0  0 696 363 33553 52 41  7
> 8 0 0 371612 59344 18576 1568956 0 0 16 240 374 34237 52 38  9
> 3 0 0 371612 58860 18576 1568996 0 0  0 128 254 31848 51 38 11
> 6 0 0 371612 59172 18576 1568996 0 0  0  64 234 36340 56 30 14
> 3 0 0 371612 59092 18576 1569004 0 0  0 232 204 32065 48 42 11
>                                                 ^^^^^
> Check out those figures for context switches!  30,000 switches per second
> with only three runnable processes and practically no block I/O seems
> quite high to me.  You can also see that the system is spending half its
> time in the kernel, presumably in the scheduler.  Postgres is barely
> getting any CPU time at all, and the performance suffers noticeably.
>
> Is this a scheduler worst-case, something to be expected, or something I
> can work around?
>
> Please CC me since vger's majordomo is an impossible chunk of shit.
>
> -jwb


* Re: 2.4.17 absurd number of context switches
  2001-12-28 17:02 ` Alan Cox
  2001-12-28 17:07   ` Davide Libenzi
  2001-12-28 17:44   ` Jeffrey W. Baker
@ 2001-12-29 20:49   ` Jeffrey W. Baker
  2 siblings, 0 replies; 14+ messages in thread
From: Jeffrey W. Baker @ 2001-12-29 20:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel



On Fri, 28 Dec 2001, Alan Cox wrote:

> > Check out those figures for context switches!  30,000 switches per second
> > with only three runnable processes and practically no block I/O seems
> > quite high to me.  You can also see that the system is spending half its
> ..
> > Is this a scheduler worst-case, something to be expected, or something I
> > can work around?
>
> The scheduler is _good_ at the three process case. Run some straces it looks
> more like postgres is doing wacky yield based locks.

As it turns out, Postgres 7.2 is forcing all this context switching, and
it really hasn't got anything to do with the kernel.

-jwb

