Interesting interaction between lguest and CFS

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Interesting interaction between lguest and CFS
@ 2007-06-04 17:37 Matt Mackall
  2007-06-04 17:54 ` Ingo Molnar
                   ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Matt Mackall @ 2007-06-04 17:37 UTC (permalink / raw)
  To: Ingo Molnar, Rusty Russell, akpm; +Cc: Linux-kernel

With 2.6.22-rc3-mm1, I've got a long-running video transcoding going
on. In other windows, I'm compiling, reading email, etc. with no
noticeable problems.

If I fire up lguest and leave it sitting at a shell prompt for a
couple moments, when I return to type something at the prompt, it can
take 2-3 seconds for my input to echo. Once it starts responding,
typing latency disappears. Suspend the other app and the latency
disappears.

The system isn't swapping and has basically no I/O load.

Ok, my transcoding just finished as I was writing this. So I've
reproduced the problem with this Python script that I had handy:

memload.py:
#!/usr/bin/python
a = "a" * 16 * 1024 * 1024
while 1:
    b = a[1:] + "b"
    a = b[1:] + "c"

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 17:37 Interesting interaction between lguest and CFS Matt Mackall
@ 2007-06-04 17:54 ` Ingo Molnar
  2007-06-04 18:41   ` Matt Mackall
  2007-06-04 18:11 ` Ingo Molnar
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-04 17:54 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> With 2.6.22-rc3-mm1, I've got a long-running video transcoding going 
> on. In other windows, I'm compiling, reading email, etc. with no 
> noticeable problems.
> 
> If I fire up lguest and leave it sitting at a shell prompt for a 
> couple moments, when I return to type something at the prompt, it can 
> take 2-3 seconds for my input to echo. Once it starts responding, 
> typing latency disappears. Suspend the other app and the latency 
> disappears.
> 
> The system isn't swapping and has basically no I/O load.

could you send me the /proc/PID/sched file of that task which shows this 
2-3 seconds delay? (if it's a shell and a graphical terminal then please 
send the sched-debug file of both.) I suspect the latency goes away if 
you strace -f the shell task(s)? If not then please send the strace 
output too.

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 17:54 ` Ingo Molnar
@ 2007-06-04 18:41   ` Matt Mackall
  2007-06-05  7:19     ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2007-06-04 18:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, akpm, Linux-kernel

On Mon, Jun 04, 2007 at 07:54:36PM +0200, Ingo Molnar wrote:
> 
> * Matt Mackall <mpm@selenic.com> wrote:
> 
> > With 2.6.22-rc3-mm1, I've got a long-running video transcoding going 
> > on. In other windows, I'm compiling, reading email, etc. with no 
> > noticeable problems.
> > 
> > If I fire up lguest and leave it sitting at a shell prompt for a 
> > couple moments, when I return to type something at the prompt, it can 
> > take 2-3 seconds for my input to echo. Once it starts responding, 
> > typing latency disappears. Suspend the other app and the latency 
> > disappears.
> > 
> > The system isn't swapping and has basically no I/O load.
> 
> could you send me the /proc/PID/sched file of that task which shows this 
> 2-3 seconds delay? (if it's a shell and a graphical terminal then please 
> send the sched-debug file of both.) I suspect the latency goes away if 
> you strace -f the shell task(s)? If not then please send the strace 
> output too.

Not sure why the shell task (bash) is interesting here?

gnome-terminal(4177)->bash(4409)->lg(24934)->lguest(24936)->lguest(24944)

(lg is a shell script to launch lguest)

/proc/4177/sched 
wait_start               :                   0
wait_start_fair          :                   0
exec_start               :                   0
sleep_start              :   53594838106004537
sleep_start_fair         :   53575815099165186
block_start              :                   0
sleep_max                :         57476665627
block_max                :   18014060106626075
exec_max                 :             4021419
wait_max                 :           557161172
last_ran                 :                   0
wait_runtime             :            23552233
wait_runtime_overruns    :             1864716
wait_runtime_underruns   :                 310
sum_exec_runtime         :        678350336100
clock-delta              :                  94

cat /proc/4409/sched 
wait_start               :                   0
wait_start_fair          :                   0
exec_start               :                   0
sleep_start              :   53593602868366272
sleep_start_fair         :   53575146154577161
block_start              :                   0
sleep_max                :       2006065218683
block_max                :           457256091
exec_max                 :             3995741
wait_max                 :           107832482
last_ran                 :                   0
wait_runtime             :            23634554
wait_runtime_overruns    :                 316
wait_runtime_underruns   :                   9
sum_exec_runtime         :           289263702
clock-delta              :                  83

/proc/24936/sched 
wait_start               :                   0
wait_start_fair          :                   0
exec_start               :                   0
sleep_start              :   53594765858891273
sleep_start_fair         :   53575748785282138
block_start              :                   0
sleep_max                :            21492554
block_max                :            27044576
exec_max                 :             4008057
wait_max                 :          1253670288
last_ran                 :                   0
wait_runtime             :            23955448
wait_runtime_overruns    :              270678
wait_runtime_underruns   :                   1
sum_exec_runtime         :         36978464446
clock-delta              :                  81

/proc/24944/sched 
wait_start               :                   0
wait_start_fair          :                   0
exec_start               :                   0
sleep_start              :   53594628265104224
sleep_start_fair         :   53575623092295018
block_start              :                   0
sleep_max                :        223179035512
block_max                :                   0
exec_max                 :             3960054
wait_max                 :          4334838886
last_ran                 :                   0
wait_runtime             :            23959470
wait_runtime_overruns    :                1243
wait_runtime_underruns   :                  15
sum_exec_runtime         :            17872418
clock-delta              :                  81

strace -f doesn't kill the latency, here's an strace:

24936 13:39:44.469329 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:44.469572 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:44.469677 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:44.469765 read(7, 0xbfe6c010, 8) = -1 EINTR (Interrupted system call)
24936 13:39:48.699490 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
24936 13:39:48.699655 sigreturn()       = ? (mask now [])
24936 13:39:48.699842 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.699939 select(4, [0 3], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
24936 13:39:48.700063 write(7, "\1\0\0\0\0\0\0\0", 8) = 5675552
24936 13:39:48.700168 readv(0, [{"sdfsdfsdfsdfsdon\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256}], 1) = 14
24936 13:39:48.700360 write(7, "\2\0\0\0\3\0\0\0", 8) = 0
24936 13:39:48.700451 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.700547 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.700635 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.700872 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.700962 writev(1, [{"s", 1}], 1) = 1
24936 13:39:48.701200 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.701297 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.701385 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.701495 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.701584 writev(1, [{"d", 1}], 1) = 1
24936 13:39:48.701773 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.701870 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.701957 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.702061 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.702149 writev(1, [{"f", 1}], 1) = 1
24936 13:39:48.702336 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.702431 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.702519 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.702622 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.702710 writev(1, [{"s", 1}], 1) = 1
24936 13:39:48.702898 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.702993 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.703157 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.703264 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.703352 writev(1, [{"d", 1}], 1) = 1
24936 13:39:48.703541 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.703636 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.703724 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.703828 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.703916 writev(1, [{"f", 1}], 1) = 1
24936 13:39:48.704103 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.704198 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.704285 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.704389 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.704476 writev(1, [{"s", 1}], 1) = 1
24936 13:39:48.704662 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.704758 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.704845 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.704948 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.705036 writev(1, [{"d", 1}], 1) = 1
24936 13:39:48.705222 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.705317 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.705405 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.705508 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.705595 writev(1, [{"f", 1}], 1) = 1
24936 13:39:48.705781 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.705877 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.705964 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.706068 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.706155 writev(1, [{"s", 1}], 1) = 1
24936 13:39:48.706341 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.706483 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.706571 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.706675 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.706763 writev(1, [{"d", 1}], 1) = 1
24936 13:39:48.706950 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.707045 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.707221 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.707325 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.707414 writev(1, [{"f", 1}], 1) = 1
24936 13:39:48.707600 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.707695 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.707800 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.707904 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.707992 writev(1, [{"s", 1}], 1) = 1
24936 13:39:48.708177 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.708273 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.708360 read(7, "P\216\304\2\0\0\0\0", 8) = 8
24936 13:39:48.708463 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.708551 writev(1, [{"d", 1}], 1) = 1
24936 13:39:48.708736 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.708832 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.708919 read(7, 0xbfe6c010, 8) = -1 EINTR (Interrupted system call)
24936 13:39:48.951555 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
24936 13:39:48.951697 sigreturn()       = ? (mask now [])
24936 13:39:48.951884 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.951983 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
24936 13:39:48.952092 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
24936 13:39:48.952184 read(7,  <unfinished ...>



-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 18:41   ` Matt Mackall
@ 2007-06-05  7:19     ` Ingo Molnar
  2007-06-05 14:03       ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-05  7:19 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> sleep_max                :         57476665627
> block_max                :   18014060106626075

hm, this block_max looks a bit suspect, it's 003fffb1359e341b. Does your 
box make any use of cpufreq? (what CPU is it?)

> strace -f doesn't kill the latency, here's an strace:
> 
> 24936 13:39:44.469329 rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
> 24936 13:39:44.469572 select(4, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
> 24936 13:39:44.469677 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
> 24936 13:39:44.469765 read(7, 0xbfe6c010, 8) = -1 EINTR (Interrupted system call)
> 24936 13:39:48.699490 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---

hm, this indeed implicates some wakeup problem. lguest task 24936's 
relevant stats are:

 block_start              :                   0
 sleep_max                :            21492554
 block_max                :            27044576
 exec_max                 :             4008057
 wait_max                 :          1253670288

the wait_max means it was delayed on the runqueue for 1.2 seconds. Could 
you try to get a /proc/sched_debug snapshot done exactly during the 
'delay' window - it's 4 seconds so you should in theory be able to 
trigger it by doing something like:

  sleep 3; cat /proc/sched_debug > sched_debug.txt

Click into the lguest window and trigger the delay.

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  7:19     ` Ingo Molnar
@ 2007-06-05 14:03       ` Matt Mackall
  2007-06-05 15:14         ` Dmitry Adamushko
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2007-06-05 14:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, akpm, Linux-kernel

On Tue, Jun 05, 2007 at 09:19:04AM +0200, Ingo Molnar wrote:
> 
> * Matt Mackall <mpm@selenic.com> wrote:
> 
> > sleep_max                :         57476665627
> > block_max                :   18014060106626075
> 
> hm, this block_max looks a bit suspect, it's 003fffb1359e341b. Does your 
> box make any use of cpufreq? (what CPU is it?)

Yes, I use the conservative governor. As I posted yesterday, setting
the governor to performance doesn't help: the CPU hog drives the CPU
to full speed.

> > 24936 13:39:44.469677 rt_sigprocmask(SIG_UNBLOCK, [USR1], NULL, 8) = 0
> > 24936 13:39:44.469765 read(7, 0xbfe6c010, 8) = -1 EINTR (Interrupted system call)
> > 24936 13:39:48.699490 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
> 
> hm, this indeed implicates some wakeup problem. lguest task 24936's 
> relevant stats are:
> 
>  block_start              :                   0
>  sleep_max                :            21492554
>  block_max                :            27044576
>  exec_max                 :             4008057
>  wait_max                 :          1253670288
> 
> the wait_max means it was delayed on the runqueue for 1.2 seconds. Could 
> you try to get a /proc/sched_debug snapshot done exactly during the 
> 'delay' window - it's 4 seconds so you should in theory be able to 
> trigger it by doing something like:
> 
>   sleep 3; cat /proc/sched_debug > sched_debug.txt
> 
> Click into the lguest window and trigger the delay.

I did:

while true; do sleep 1; cat /proc/sched_debug > sched_debug.txt; done

and got this, hopefully inside the window:

Sched Debug Version: v0.02
now at 257428593818894 nsecs

cpu: 0
  .nr_running            : 3
  .raw_weighted_load     : 2063
  .nr_switches           : 242830075
  .nr_load_updates       : 30172063
  .nr_uninterruptible    : 0
  .jiffies               : 64282148
  .next_balance          : 0
  .curr->pid             : 27182
  .clock                 : 125650217819008823
  .prev_clock_raw        : 257428516403535
  .clock_warps           : 9
  .clock_unstable_events : 41133344
  .clock_max_delta       : 3954619
  .fair_clock            : 125631632368075562
  .prev_fair_clock       : 0
  .exec_clock            : 125649753800946713
  .prev_exec_clock       : 0
  .wait_runtime          : -12865352643
  .wait_runtime_overruns : 41121996
  .wait_runtime_underruns: 4107250
  .cpu_load[0]           : 1039
  .cpu_load[1]           : 1039
  .cpu_load[2]           : 1039
  .cpu_load[3]           : 1041
  .cpu_load[4]           : 1053
  .wait_runtime_rq_sum   : -48091562

runnable tasks:
            task   PID        tree-key         delta       waiting  switches  prio        sum-exec        sum-wait       sum-sleep    wait-overrun   wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
R            cat 27182 125631632367998747        -76815        -91562         6   120          250189          -91562           75163               0               0
          python 26973 125631632404141566      36066004     -24000000     25486   120     93635486661     -1940103390          140903              49            2069
          lguest 26986 125631633015454796     647379234     -24000000      1481   139         7638721     37069331905     90071079196            1461               2


-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05 14:03       ` Matt Mackall
@ 2007-06-05 15:14         ` Dmitry Adamushko
  2007-06-05 15:41           ` Dmitry Adamushko
  0 siblings, 1 reply; 38+ messages in thread
From: Dmitry Adamushko @ 2007-06-05 15:14 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Molnar, Rusty Russell, akpm, Linux-kernel

On 05/06/07, Matt Mackall <mpm@selenic.com> wrote:
> > [...]
> > Click into the lguest window and trigger the delay.
>
> I did:
>
> while true; do sleep 1; cat /proc/sched_debug > sched_debug.txt; done
>
> and got this, hopefully inside the window:
>
> Sched Debug Version: v0.02
> now at 257428593818894 nsecs
>
> cpu: 0
>   .nr_running            : 3
>   .raw_weighted_load     : 2063
>   .nr_switches           : 242830075
>   .nr_load_updates       : 30172063
>   .nr_uninterruptible    : 0
>   .jiffies               : 64282148
>   .next_balance          : 0
>   .curr->pid             : 27182
>   .clock                 : 125650217819008823
>   .prev_clock_raw        : 257428516403535

The delta (clock - prev_clock_raw) looks insane.

The current time (which doesn't depend on rq_clock() --> sched_clock() is
" now at 257428593818894 nsecs " (right at the beginning of the output).

'prev_clock_raw' is updated any time rq_clock() is called - basically
upon any scheduling operation (e.g. enqueue/dequeue)

now - prev_clock_raw == 257428593818894 - 257428516403535 == ~ 80 ms.

while 'clock' reports something crazy.. that would explain why there
wes a huge "block_max" reported earlier.. I guess, sched_clock() is
tsc-based in your case?

Any way to get it switched to jiffies-based one and repeat the test?

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05 15:14         ` Dmitry Adamushko
@ 2007-06-05 15:41           ` Dmitry Adamushko
  2007-06-05 19:50             ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Dmitry Adamushko @ 2007-06-05 15:41 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Molnar, Linux Kernel, Rusty Russell, Andrew Morton

On 05/06/07, Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> > now at 257428593818894 nsecs
> >
> > cpu: 0
> >   .nr_running            : 3
> >   .raw_weighted_load     : 2063
> >   .nr_switches           : 242830075
> >   .nr_load_updates       : 30172063
> >   .nr_uninterruptible    : 0
> >   .jiffies               : 64282148
> >   .next_balance          : 0
> >   .curr->pid             : 27182
> >   .clock                 : 125650217819008823
> >   .prev_clock_raw        : 257428516403535
>
> The delta (clock - prev_clock_raw) looks insane.
>

err.. please ignore it. 'clock' and 'prev_clock_raw' follow each other
only as long as the 'clock_warps' == 0.. otherwise 'delta' just
doesn't make any sense. And in your case:

  .clock_warps           : 9


-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05 15:41           ` Dmitry Adamushko
@ 2007-06-05 19:50             ` Ingo Molnar
  2007-06-06 20:23               ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-05 19:50 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Matt Mackall, Linux Kernel, Rusty Russell, Andrew Morton

* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> On 05/06/07, Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:
> >> now at 257428593818894 nsecs
> >>
> >> cpu: 0
> >>   .nr_running            : 3
> >>   .raw_weighted_load     : 2063
> >>   .nr_switches           : 242830075
> >>   .nr_load_updates       : 30172063
> >>   .nr_uninterruptible    : 0
> >>   .jiffies               : 64282148
> >>   .next_balance          : 0
> >>   .curr->pid             : 27182
> >>   .clock                 : 125650217819008823
> >>   .prev_clock_raw        : 257428516403535
> >
> >The delta (clock - prev_clock_raw) looks insane.
> >
> 
> err.. please ignore it. 'clock' and 'prev_clock_raw' follow each other 
> only as long as the 'clock_warps' == 0.. otherwise 'delta' just 
> doesn't make any sense. And in your case:
> 
>  .clock_warps           : 9

yeah. Look at the sched_clock() reliability counts:

  .clock_warps           : 9
  .clock_unstable_events : 41133344
  .clock_max_delta       : 3954619

i.e. a quite large 'tsc unstable events' counter and with a handful 
negative warps detected, the actual maximum TSC delta was 3954 usecs 
(clock_max_delta), so i think rq_clock() behaved sane and sched_clock() 
has good quality on Matt's box. So the large block_max is still 
mysterious to me:

  > > sleep_max                :         57476665627
  > > block_max                :   18014060106626075

because the whole rq_clock() logic is about avoiding large jumps. So i 
dont see how 'now' could _ever_ have ended up being so very large. Even 
if the new sched-clock code in arch/i386/kernel/sched-clock.c frequently 
switches between sc->unstable == 0 or 1, (i.e. frequently switches 
between jiffies based and TSC based clock), the rq_clock() logic should 
cushion off any jumpiness or small warps from that. I'm wondering 
whether this line:

                r = (jiffies_64 - sc->sync_base) * (1000000000 / HZ);

could produce funnies. But even if it does, it could at most produce a 
very large _positive_ jump - which they would have to show up in 
clock_max_delta. Negative jumps are completely ignored by rq_clock(). 

although ... the 'raw' rq_clock() values are very large:

  .clock                 : 125650217819008823

  $ printf "%016Lx\n" 125650217819008823
  01be6635170d6f37

which could be due to some TSC corruption? So perhaps somewhere we 
calculated block_max with 'now' zero, hence resulting in that very large 
value? Still, i dont see how that could happen. Weird.

it would be nice to monitor how 'clock' changes in time on that box. It 
should be advancing monotonically and at most with the speed of the CPU 
clock. (but it can move slower, over C3/unstable/cpufreq periods.) This 
could be done the following way:

   while sleep 1; do cat /proc/sched_debug >> sched_debug.txt; done

Matt, could you run this for 1-2 minutes and send us the sched_debug.txt 
output?

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05 19:50             ` Ingo Molnar
@ 2007-06-06 20:23               ` Matt Mackall
  2007-06-08  9:34                 ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2007-06-06 20:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Dmitry Adamushko, Linux Kernel, Rusty Russell, Andrew Morton

On Tue, Jun 05, 2007 at 09:50:15PM +0200, Ingo Molnar wrote:
> Matt, could you run this for 1-2 minutes and send us the sched_debug.txt 
> output?

http://selenic.com/sched_debug.txt.gz

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-06 20:23               ` Matt Mackall
@ 2007-06-08  9:34                 ` Ingo Molnar
  2007-06-08 10:18                   ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-08  9:34 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Dmitry Adamushko, Linux Kernel, Rusty Russell, Andrew Morton

* Matt Mackall <mpm@selenic.com> wrote:

> On Tue, Jun 05, 2007 at 09:50:15PM +0200, Ingo Molnar wrote:
> > Matt, could you run this for 1-2 minutes and send us the sched_debug.txt 
> > output?
> 
> http://selenic.com/sched_debug.txt.gz

thanks. It shows the anomaly in action:

  now at 365294215491977 nsecs

  .jiffies               : 91248553
  .next_balance          : 0
  .curr->pid             : 18589
  .clock                 : 125652924079659272
  .prev_clock_raw        : 365201238127457
  .clock_warps           : 9
  .clock_unstable_events : 61896358
  .clock_max_delta       : 3997813

next one is:

  now at 365295219388142 nsecs

  .jiffies               : 91248804
  .next_balance          : 0
  .curr->pid             : 18591
  .clock                 : 125653018059166371
  .prev_clock_raw        : 365295217642619
  .clock_warps           : 9
  .clock_unstable_events : 61896359
  .clock_max_delta       : 92976502936

251 jiffies passed, at 250 Hz that's 1 second - this proves that the 
sample is indeed an accurate once-per-second sample according to the 
timer interrupt. The 'now' timestamp (ktime_get() based) shows 
1003896165 nanosecs passed - this too is showing a precise 1 second 
sample, according to GTOD.

So all the time references we have show that (no surprise here) 1 second 
passed between the two samples. But sched_clock() shows a _large_ jump:

  .clock                 : 125652924079659272
  .clock                 : 125653018059166371

also reflected in .clock_max_delta:

  .clock_max_delta       : 92976502936

that's a 93 seconds jump (!) in a single 1-second sample. We also had a 
single sched-clock-unstable event:

  .clock_unstable_events : 61896358
  .clock_unstable_events : 61896359

could you please try a test-boot with 'notsc' - do the scheduling 
weirdnesses go away? Also, 

There are two reasons why the sched_clock() in -mm could behave like 
that - either the sched-clock-share patches in it are buggy and we do 
not smoothly switch over from sc->unstable == 1 to sc->unstable == 0, or 
the TSC itself is unstable. To test the latter theory, could you run a 1 
minute tsc-dump on your box:

	./tsc-dump > tsc-dump.txt

You can pick tsc-dump up from:

	http://redhat.com/~mingo/time-warp-test/

(please run this on a recent -mm kernel so that we have the same 
ACPI-idle characteristics as on the buggy kernel.)

to test the former theory, could you boot with 'notsc' - do the 
weirdnesses go away? (please create another sched-debug.txt as well)

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-08  9:34                 ` Ingo Molnar
@ 2007-06-08 10:18                   ` Ingo Molnar
  0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2007-06-08 10:18 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Dmitry Adamushko, Linux Kernel, Rusty Russell, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> thanks. It shows the anomaly in action:

> So all the time references we have show that (no surprise here) 1 
> second passed between the two samples. But sched_clock() shows a 
> _large_ jump:
> 
>   .clock                 : 125652924079659272
>   .clock                 : 125653018059166371
> 
> also reflected in .clock_max_delta:
> 
>   .clock_max_delta       : 92976502936
> 
> that's a 93 seconds jump (!) in a single 1-second sample. [...]

we also had a similar jump in exec_clock, which suggests that the 
sched_clock() "jump" occured while there was a task running. (i.e. not 
during ACPI-C2/C3) Could you try the patch below? It should catch such 
large jumps. The 'clock_overflows' counter in /proc/sched_debug will 
show whether this new protection triggered on your box.

	Ingo

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -149,12 +149,12 @@ struct rq {
 
 	u64 clock, prev_clock_raw;
 	s64 clock_max_delta;
-	u64 fair_clock, prev_fair_clock;
-	u64 exec_clock, prev_exec_clock;
+	u64 fair_clock, prev_fair_clock;
+	u64 exec_clock, prev_exec_clock;
 	s64 wait_runtime;
 	unsigned long wait_runtime_overruns, wait_runtime_underruns;
 
-	unsigned int clock_warps;
+	unsigned int clock_warps, clock_overflows;
 	unsigned int clock_unstable_events;
 
 	struct sched_class *load_balance_class;
@@ -245,9 +245,17 @@ static inline unsigned long long __rq_cl
 		clock++;
 		rq->clock_warps++;
 	} else {
-		if (unlikely(delta > rq->clock_max_delta))
-			rq->clock_max_delta = delta;
-		clock += delta;
+		/*
+		 * Catch too large forward jumps too:
+		 */
+		if (delta > 2*TICK_NSEC) {
+			clock++;
+			rq->clock_overflows++;
+		} else {
+			if (unlikely(delta > rq->clock_max_delta))
+				rq->clock_max_delta = delta;
+			clock += delta;
+		}
 	}
 
 	rq->prev_clock_raw = now;
Index: linux/kernel/sched_debug.c
===================================================================
--- linux.orig/kernel/sched_debug.c
+++ linux/kernel/sched_debug.c
@@ -117,13 +119,13 @@ static void print_cpu(struct seq_file *m
 	P(clock);
 	P(prev_clock_raw);
 	P(clock_warps);
+	P(clock_overflows);
 	P(clock_unstable_events);
 	P(clock_max_delta);
-	rq->clock_max_delta = 0;
 	P(fair_clock);
-	P(prev_fair_clock);
+	P(prev_fair_clock);
 	P(exec_clock);
-	P(prev_exec_clock);
+	P(prev_exec_clock);
 	P(wait_runtime);
 	P(wait_runtime_overruns);
 	P(wait_runtime_underruns);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 17:37 Interesting interaction between lguest and CFS Matt Mackall
  2007-06-04 17:54 ` Ingo Molnar
@ 2007-06-04 18:11 ` Ingo Molnar
  2007-06-04 18:42   ` Matt Mackall
  2007-06-04 20:15   ` James Morris
  2007-06-04 22:58 ` Olof Johansson
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2007-06-04 18:11 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> With 2.6.22-rc3-mm1, I've got a long-running video transcoding going 
> on. In other windows, I'm compiling, reading email, etc. with no 
> noticeable problems.
> 
> If I fire up lguest and leave it sitting at a shell prompt for a 
> couple moments, when I return to type something at the prompt, it can 
> take 2-3 seconds for my input to echo. Once it starts responding, 
> typing latency disappears. Suspend the other app and the latency 
> disappears.

btw., does this only happen with lguest, or with other idle shells too?

perhaps lguest does something unusual when it idles? Does it fiddle 
sched_clock() in any way?

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 18:11 ` Ingo Molnar
@ 2007-06-04 18:42   ` Matt Mackall
  2007-06-04 19:12     ` Ingo Molnar
  2007-06-04 20:15   ` James Morris
  1 sibling, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2007-06-04 18:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, akpm, Linux-kernel

On Mon, Jun 04, 2007 at 08:11:51PM +0200, Ingo Molnar wrote:
> 
> * Matt Mackall <mpm@selenic.com> wrote:
> 
> > With 2.6.22-rc3-mm1, I've got a long-running video transcoding going 
> > on. In other windows, I'm compiling, reading email, etc. with no 
> > noticeable problems.
> > 
> > If I fire up lguest and leave it sitting at a shell prompt for a 
> > couple moments, when I return to type something at the prompt, it can 
> > take 2-3 seconds for my input to echo. Once it starts responding, 
> > typing latency disappears. Suspend the other app and the latency 
> > disappears.
> 
> btw., does this only happen with lguest, or with other idle shells too?

Only noticed with lguest.
 
> perhaps lguest does something unusual when it idles? Does it fiddle 
> sched_clock() in any way?

In the host? That'd be weird.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 18:42   ` Matt Mackall
@ 2007-06-04 19:12     ` Ingo Molnar
  2007-06-04 19:27       ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-04 19:12 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> > btw., does this only happen with lguest, or with other idle shells 
> > too?
> 
> Only noticed with lguest.

ah, so both the shell and the 'competing' CPU hog was running within the 
same lguest instance?

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 19:12     ` Ingo Molnar
@ 2007-06-04 19:27       ` Matt Mackall
  2007-06-04 19:40         ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2007-06-04 19:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, akpm, Linux-kernel

On Mon, Jun 04, 2007 at 09:12:29PM +0200, Ingo Molnar wrote:
> 
> * Matt Mackall <mpm@selenic.com> wrote:
> 
> > > btw., does this only happen with lguest, or with other idle shells 
> > > too?
> > 
> > Only noticed with lguest.
> 
> ah, so both the shell and the 'competing' CPU hog was running within the 
> same lguest instance?

No.

The CPU hog is running in the host. A single instance of lguest using
busybox is started in another shell. When it reaches a shell prompt,
-that- shell is occassionally unresponsive for long stretches.

For the purposes of CFS, of course, the shell running inside lguest is
invisible.

Also, for comparison, running the same kernel and disk image inside
qemu doesn't show any lag, despite being substantially slower (no
kernel/VT acceleration).

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 19:27       ` Matt Mackall
@ 2007-06-04 19:40         ` Ingo Molnar
  2007-06-04 19:59           ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-04 19:40 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> > ah, so both the shell and the 'competing' CPU hog was running within 
> > the same lguest instance?
> 
> No.
> 
> The CPU hog is running in the host. A single instance of lguest using 
> busybox is started in another shell. When it reaches a shell prompt, 
> -that- shell is occassionally unresponsive for long stretches.

so the shell within lguest is affected by the CPU hog outside? Could you 
send me the sched-debug stats of the CPU hog and of the lguest host 
process as well? (or were those amongst the ones you already sent?)

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 19:40         ` Ingo Molnar
@ 2007-06-04 19:59           ` Matt Mackall
  2007-06-04 20:13             ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2007-06-04 19:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, akpm, Linux-kernel

On Mon, Jun 04, 2007 at 09:40:38PM +0200, Ingo Molnar wrote:
> 
> * Matt Mackall <mpm@selenic.com> wrote:
> 
> > > ah, so both the shell and the 'competing' CPU hog was running within 
> > > the same lguest instance?
> > 
> > No.
> > 
> > The CPU hog is running in the host. A single instance of lguest using 
> > busybox is started in another shell. When it reaches a shell prompt, 
> > -that- shell is occassionally unresponsive for long stretches.
> 
> so the shell within lguest is affected by the CPU hog outside? Could you 
> send me the sched-debug stats of the CPU hog and of the lguest host 
> process as well? (or were those amongst the ones you already sent?)

I sent you the stats for:

- both lguest processes
- the bash which started lguest (which I suspect you don't actually
  want)
- the gnome-terminal that lguest was running inside of
  (keystrokes to the shell in lguest go through that terminal)

The last is probably not very interesting either, as gnome-terminal is
a single process with multiple windows and none of the other windows
were affected.

So I sent you everything but the stats for the CPU hog.

I suggest you try this yourself - lguest is incredibly easy to get up
and running. It's also quite useful: I can test-boot kernels with it
in less than a second, or about 10x faster than basic qemu, and 100x
faster than a real boot. And as it uses a pty as console, you can do
things like pipe it through grep.

You'll need the following tab-damaged patch for rc3-mm1:

--- mm.orig/arch/i386/kernel/sched-clock.c      2007-06-04 12:15:51.000000000 -0500
+++ mm/arch/i386/kernel/sched-clock.c   2007-06-04 12:15:57.000000000
-0500
@@ -133,7 +133,7 @@ static void resync_freq(void *arg)
        struct sc_data *sc = &__get_cpu_var(sc_data);
 
        sc->sync_base = jiffies;
-       if (!cpu_has_tsc) {
+       if (!cpu_has_tsc || tsc_disable) {
                sc->unstable = 1;
                return;
        }

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 19:59           ` Matt Mackall
@ 2007-06-04 20:13             ` Ingo Molnar
  2007-06-04 20:20               ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-04 20:13 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> So I sent you everything but the stats for the CPU hog.

ok, thx.

> I suggest you try this yourself [...]

yep, i know how to use lguest and i'm already running it :) But i cannot 
trigger the problem, that's why i was curious about your host-side 
stats.

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 20:13             ` Ingo Molnar
@ 2007-06-04 20:20               ` Ingo Molnar
  0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2007-06-04 20:20 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> > I suggest you try this yourself [...]
> 
> yep, i know how to use lguest and i'm already running it :) But i 
> cannot trigger the problem, that's why i was curious about your 
> host-side stats.

i can run a hundred CPU hogs on the host and the lguest instance is 
still usable under CFS:

 @%@%> time ls
 bin         home        lost+found  proc        sys         var
 dev         lib         mnt         root        tmp
 etc         linuxrc     opt         sbin        usr
 real    0m 0.33s
 user    0m 0.00s
 sys     0m 0.00s

i'm wondering whether the delay goes away if you boot with maxcpus=1? 
(if you have multiple cpus/cores on your box)

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 18:11 ` Ingo Molnar
  2007-06-04 18:42   ` Matt Mackall
@ 2007-06-04 20:15   ` James Morris
  2007-06-04 20:24     ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: James Morris @ 2007-06-04 20:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Matt Mackall, Rusty Russell, akpm, Linux-kernel

On Mon, 4 Jun 2007, Ingo Molnar wrote:

> perhaps lguest does something unusual when it idles? Does it fiddle 
> sched_clock() in any way?

One of the lguest processes calls nice(2), which seems like a long shot, 
but also not impossibly related.  (Note that newer patches in the lguest 
queue remove this, so it'll depend on exactly which patches you have 
applied).

- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 20:15   ` James Morris
@ 2007-06-04 20:24     ` Ingo Molnar
  0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2007-06-04 20:24 UTC (permalink / raw)
  To: James Morris; +Cc: Matt Mackall, Rusty Russell, akpm, Linux-kernel


* James Morris <jmorris@namei.org> wrote:

> On Mon, 4 Jun 2007, Ingo Molnar wrote:
> 
> > perhaps lguest does something unusual when it idles? Does it fiddle 
> > sched_clock() in any way?
> 
> One of the lguest processes calls nice(2), which seems like a long 
> shot, but also not impossibly related.  (Note that newer patches in 
> the lguest queue remove this, so it'll depend on exactly which patches 
> you have applied).

i've got 2.6.22-rc3-mm1's lguest (with a trivial boot-time crash fix in 
sched_clock that i ran into). But nice(2) should only cause a gentle, 
20% smaller CPU bandwidth to be allocated by CFS to that thread, nothing 
more drastic like user-visible delays.

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 17:37 Interesting interaction between lguest and CFS Matt Mackall
  2007-06-04 17:54 ` Ingo Molnar
  2007-06-04 18:11 ` Ingo Molnar
@ 2007-06-04 22:58 ` Olof Johansson
  2007-06-05  1:02   ` Rusty Russell
  2007-06-05  0:18 ` Rusty Russell
  2007-06-05  9:23 ` Ingo Molnar
  4 siblings, 1 reply; 38+ messages in thread
From: Olof Johansson @ 2007-06-04 22:58 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Molnar, Rusty Russell, akpm, Linux-kernel

On Mon, Jun 04, 2007 at 12:37:10PM -0500, Matt Mackall wrote:
> With 2.6.22-rc3-mm1, I've got a long-running video transcoding going
> on. In other windows, I'm compiling, reading email, etc. with no
> noticeable problems.
> 
> If I fire up lguest and leave it sitting at a shell prompt for a
> couple moments, when I return to type something at the prompt, it can
> take 2-3 seconds for my input to echo. Once it starts responding,
> typing latency disappears. Suspend the other app and the latency
> disappears.
> 
> The system isn't swapping and has basically no I/O load.

Is this on hvc console? The hvc driver was recently modified to back off
it's timing interval when there's no input, the max value is 2 seconds.

See b791072ba3c3b29bab1890963bde52eb944a8451 for the changes in question.


-Olof


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 22:58 ` Olof Johansson
@ 2007-06-05  1:02   ` Rusty Russell
  2007-06-05  1:24     ` Stephen Rothwell
  2007-06-05  2:23     ` Olof Johansson
  0 siblings, 2 replies; 38+ messages in thread
From: Rusty Russell @ 2007-06-05  1:02 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Matt Mackall, Ingo Molnar, akpm, Linux-kernel, Stephen Rothwell

On Mon, 2007-06-04 at 17:58 -0500, Olof Johansson wrote:
> On Mon, Jun 04, 2007 at 12:37:10PM -0500, Matt Mackall wrote:
> > With 2.6.22-rc3-mm1, I've got a long-running video transcoding going
> > on. In other windows, I'm compiling, reading email, etc. with no
> > noticeable problems.
> > 
> > If I fire up lguest and leave it sitting at a shell prompt for a
> > couple moments, when I return to type something at the prompt, it can
> > take 2-3 seconds for my input to echo. Once it starts responding,
> > typing latency disappears. Suspend the other app and the latency
> > disappears.
> > 
> > The system isn't swapping and has basically no I/O load.
> 
> Is this on hvc console? The hvc driver was recently modified to back off
> it's timing interval when there's no input, the max value is 2 seconds.
> 
> See b791072ba3c3b29bab1890963bde52eb944a8451 for the changes in question.

Hmm, I thought that was only supposed to happen for "interruptless"
consoles?

Rusty.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  1:02   ` Rusty Russell
@ 2007-06-05  1:24     ` Stephen Rothwell
  2007-06-05  2:23     ` Olof Johansson
  1 sibling, 0 replies; 38+ messages in thread
From: Stephen Rothwell @ 2007-06-05  1:24 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Olof Johansson, Matt Mackall, Ingo Molnar, akpm, Linux-kernel

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

On Tue, 05 Jun 2007 11:02:00 +1000 Rusty Russell <rusty@rustcorp.com.au> wrote:
>
> > See b791072ba3c3b29bab1890963bde52eb944a8451 for the changes in question.
> 
> Hmm, I thought that was only supposed to happen for "interruptless"
> consoles?

The comment certainly says it only applies to polled mode.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  1:02   ` Rusty Russell
  2007-06-05  1:24     ` Stephen Rothwell
@ 2007-06-05  2:23     ` Olof Johansson
  1 sibling, 0 replies; 38+ messages in thread
From: Olof Johansson @ 2007-06-05  2:23 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Matt Mackall, Ingo Molnar, akpm, Linux-kernel, Stephen Rothwell

On Tue, Jun 05, 2007 at 11:02:00AM +1000, Rusty Russell wrote:
> On Mon, 2007-06-04 at 17:58 -0500, Olof Johansson wrote:
> > 
> > Is this on hvc console? The hvc driver was recently modified to back off
> > it's timing interval when there's no input, the max value is 2 seconds.
> > 
> > See b791072ba3c3b29bab1890963bde52eb944a8451 for the changes in question.
> 
> Hmm, I thought that was only supposed to happen for "interruptless"
> consoles?

Ok, my bad. I thought lguest was polled as well.


-Olof

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 17:37 Interesting interaction between lguest and CFS Matt Mackall
                   ` (2 preceding siblings ...)
  2007-06-04 22:58 ` Olof Johansson
@ 2007-06-05  0:18 ` Rusty Russell
  2007-06-05  2:31   ` Rusty Russell
                     ` (2 more replies)
  2007-06-05  9:23 ` Ingo Molnar
  4 siblings, 3 replies; 38+ messages in thread
From: Rusty Russell @ 2007-06-05  0:18 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Molnar, akpm, Linux-kernel

On Mon, 2007-06-04 at 12:37 -0500, Matt Mackall wrote:
> With 2.6.22-rc3-mm1, I've got a long-running video transcoding going
> on. In other windows, I'm compiling, reading email, etc. with no
> noticeable problems.
> 
> If I fire up lguest and leave it sitting at a shell prompt for a
> couple moments, when I return to type something at the prompt, it can
> take 2-3 seconds for my input to echo. Once it starts responding,
> typing latency disappears. Suspend the other app and the latency
> disappears.

This sounds like the waker process (nice 19) not getting a chance to
run.  You can hack around it for the moment by changing "nice(19)" in
Documentation/lguest/lguest.c to something less aggressive.

The real solution is to switch to an fd.  Fortunately, I have just such
a patch.  I am shuffling it forward in my queue now, and will send when
testing is complete...

Thanks!
Rusty.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  0:18 ` Rusty Russell
@ 2007-06-05  2:31   ` Rusty Russell
  2007-06-05  4:18     ` Matt Mackall
  2007-06-05  2:34   ` Matt Mackall
  2007-06-05  7:10   ` Ingo Molnar
  2 siblings, 1 reply; 38+ messages in thread
From: Rusty Russell @ 2007-06-05  2:31 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Molnar, akpm, Linux-kernel

On Tue, 2007-06-05 at 10:19 +1000, Rusty Russell wrote:
> On Mon, 2007-06-04 at 12:37 -0500, Matt Mackall wrote:
> > With 2.6.22-rc3-mm1, I've got a long-running video transcoding going
> > on. In other windows, I'm compiling, reading email, etc. with no
> > noticeable problems.
> > 
> > If I fire up lguest and leave it sitting at a shell prompt for a
> > couple moments, when I return to type something at the prompt, it can
> > take 2-3 seconds for my input to echo. Once it starts responding,
> > typing latency disappears. Suspend the other app and the latency
> > disappears.
> 
> This sounds like the waker process (nice 19) not getting a chance to
> run.  You can hack around it for the moment by changing "nice(19)" in
> Documentation/lguest/lguest.c to something less aggressive.
> 
> The real solution is to switch to an fd.  Fortunately, I have just such
> a patch.  I am shuffling it forward in my queue now, and will send when
> testing is complete...

Does this solve it for you?

lguest: Replace signal with fd operation

We currently use a "waker" process: a child of the launcher which
selects() on the incoming file descriptors.  It sends a SIGUSR1 to the
launcher whenever select() returns to kick the launcher out of the
kernel.

This has nasty side-effects: the waker needs to keep sending signals
to avoid the race, so we nice it to try to make sure the launcher runs
soon.  Also the launcher blocks SIGUSR1 when it's not running the
guest, so it doesn't have to deal with other interrupted reads...

It's better to explicitly tell the kernel to break out of the guest,
and this is what we do, with a new LHREQ_BREAK command.  This makes
the launcher return -EAGAIN from reading /dev/lguest, and blocks the
waker until the launcher calls LHREQ_BREAK, avoiding the race.

We also take precautions against simultaneous writes or reads on the
/dev/lguest fd.  As only root can have these file descriptors, it's
not much of a problem, but we want to relax that restriction
eventually.

The main improvement is in consistency, rather than raw benchmark
results:

Before:
Time for one context switch via pipe: 9265 (4534 - 9495)
Time for one Copy-on-Write fault: 67687 (14898 - 159125)
Time to exec client once: 1102812 (795843 - 1128250)
Time for one fork/exit/wait: 712000 (400625 - 723156)
Time for gettimeofday(): 16681 (16378 - 35835)
Time to send 4 MB from host: 141317343 (140165578 - 141469500)
Time for one int-0x80 syscall: 272 (272 - 575)
Time for one syscall via libc: 275 (274 - 904)
Time for two PTE updates: 16232 (6430 - 16316)
Time to read from disk (256 kB): 16786750 (16597500 - 31493250)
Time for one disk read: 192656 (189312 - 958687)
Time for inter-guest pingpong: 110453 (104492 - 316429)

After:
Time for one context switch via pipe: 4687 (4563 - 4857)
Time for one Copy-on-Write fault: 44523 (11628 - 77855)
Time to exec client once: 814765 (805796 - 829875)
Time for one fork/exit/wait: 405875 (400562 - 434750)
Time for gettimeofday(): 16644 (16203 - 16931)
Time to send 4 MB from host: 136530000 (121522250 - 151629000)
Time for one int-0x80 syscall: 273 (272 - 274)
Time for one syscall via libc: 279 (277 - 279)
Time for two PTE updates: 6439 (6395 - 6528)
Time to read from disk (256 kB): 16787000 (16641250 - 16861250)
Time for one disk read: 192187 (190515 - 193843)
Time for inter-guest pingpong: 111093 (109203 - 136554)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

---
 Documentation/lguest/lguest.c   |  105 +++++++++------------------------------
 drivers/lguest/core.c           |    9 ++-
 drivers/lguest/lg.h             |    5 +
 drivers/lguest/lguest_user.c    |   62 +++++++++++++++++++----
 include/linux/lguest_launcher.h |    1
 5 files changed, 90 insertions(+), 92 deletions(-)

diff -r 7d068abb5a41 Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c	Tue Jun 05 09:51:59 2007 +1000
+++ b/Documentation/lguest/lguest.c	Tue Jun 05 10:32:23 2007 +1000
@@ -16,7 +16,6 @@
 #include <fcntl.h>
 #include <stdbool.h>
 #include <errno.h>
-#include <signal.h>
 #include <ctype.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
@@ -341,15 +340,14 @@ static void set_fd(int fd, struct device
 		devices->max_infd = fd;
 }
 
-/* We send lguest_add signals while input is pending: avoids races. */
-static void wake_parent(int pipefd, struct device_list *devices)
-{
-	nice(19);
-
+/* When input arrives, we tell the kernel to kick lguest out with -EAGAIN. */
+static void wake_parent(int pipefd, int lguest_fd, struct device_list *devices)
+{
 	set_fd(pipefd, devices);
 
 	for (;;) {
 		fd_set rfds = devices->infds;
+		u32 args[] = { LHREQ_BREAK, 1 };
 
 		select(devices->max_infd+1, &rfds, NULL, NULL, NULL);
 		if (FD_ISSET(pipefd, &rfds)) {
@@ -357,25 +355,14 @@ static void wake_parent(int pipefd, stru
 			if (read(pipefd, &ignorefd, sizeof(ignorefd)) == 0)
 				exit(0);
 			FD_CLR(ignorefd, &devices->infds);
-		}
-		kill(getppid(), SIGUSR1);
-	}
-}
-
-/* We don't want signal to kill us, just jerk us out of kernel. */
-static void wakeup(int signo)
-{
-}
-
-static int setup_waker(struct device_list *device_list)
+		} else
+			write(lguest_fd, args, sizeof(args));
+	}
+}
+
+static int setup_waker(int lguest_fd, struct device_list *device_list)
 {
 	int pipefd[2], child;
-	struct sigaction act;
-
-	act.sa_handler = wakeup;
-	sigemptyset(&act.sa_mask);
-	act.sa_flags = 0;
-	sigaction(SIGUSR1, &act, NULL);
 
 	pipe(pipefd);
 	child = fork();
@@ -384,7 +371,7 @@ static int setup_waker(struct device_lis
 
 	if (child == 0) {
 		close(pipefd[1]);
-		wake_parent(pipefd[0], device_list);
+		wake_parent(pipefd[0], lguest_fd, device_list);
 	}
 	close(pipefd[0]);
 
@@ -620,6 +607,7 @@ static void handle_input(int fd, int chi
 	for (;;) {
 		struct device *i;
 		fd_set fds = devices->infds;
+		unsigned int num = 0;
 
 		if (select(devices->max_infd+1, &fds, NULL, NULL, &poll) == 0)
 			break;
@@ -632,6 +620,7 @@ static void handle_input(int fd, int chi
 					write(childfd, &i->fd, sizeof(i->fd));
 				}
 			}
+			num++;
 		}
 	}
 }
@@ -900,27 +889,26 @@ static void __attribute__((noreturn))
 static void __attribute__((noreturn))
 run_guest(int lguest_fd, int waker_fd, struct device_list *device_list)
 {
-	sigset_t sigset;
-
-	sigemptyset(&sigset);
-	sigaddset(&sigset, SIGUSR1);
 	for (;;) {
+		u32 args[] = { LHREQ_BREAK, 0 };
 		unsigned long arr[2];
 		int readval;
 
-		sigprocmask(SIG_UNBLOCK, &sigset, NULL);
+		/* We read from the /dev/lguest device to run the Guest. */
 		readval = read(lguest_fd, arr, sizeof(arr));
-		sigprocmask(SIG_BLOCK, &sigset, NULL);
-
-		if (readval == sizeof(arr))
+
+		if (readval == sizeof(arr)) {
 			handle_output(lguest_fd, arr[0], arr[1], device_list);
-		else if (errno == ENOENT) {
+			continue;
+		} else if (errno == ENOENT) {
 			char reason[1024] = { 0 };
 			read(lguest_fd, reason, sizeof(reason)-1);
 			errx(1, "%s", reason);
-		} else if (errno != EINTR)
+		} else if (errno != EAGAIN)
 			err(1, "Running guest failed");
 		handle_input(lguest_fd, waker_fd, device_list);
+		if (write(lguest_fd, args, sizeof(args)) < 0)
+			err(1, "Resetting break");
 	}
 }
 
@@ -1014,7 +1002,7 @@ int main(int argc, char *argv[])
 	*(int *)(boot + 0x23c) = 1;
 
 	lguest_fd = tell_kernel(pgdir, start, page_offset);
-	waker_fd = setup_waker(&device_list);
+	waker_fd = setup_waker(lguest_fd, &device_list);
 
 	run_guest(lguest_fd, waker_fd, &device_list);
 }
diff -r 7d068abb5a41 drivers/lguest/core.c
--- a/drivers/lguest/core.c	Tue Jun 05 09:51:59 2007 +1000
+++ b/drivers/lguest/core.c	Tue Jun 05 09:56:32 2007 +1000
@@ -313,7 +313,12 @@ int run_guest(struct lguest *lg, unsigne
 		}
 
 		if (signal_pending(current))
-			return -EINTR;
+			return -ERESTARTSYS;
+
+		/* If Waker set break_out, return to Launcher. */
+		if (lg->break_out)
+			return -EAGAIN;
+
 		maybe_do_interrupt(lg);
 
 		try_to_freeze();
diff -r 7d068abb5a41 drivers/lguest/lg.h
--- a/drivers/lguest/lg.h	Tue Jun 05 09:51:59 2007 +1000
+++ b/drivers/lguest/lg.h	Tue Jun 05 09:53:28 2007 +1000
@@ -16,6 +16,7 @@
 #include <linux/futex.h>
 #include <linux/lguest.h>
 #include <linux/lguest_launcher.h>
+#include <linux/wait.h>
 #include <linux/err.h>
 #include <asm/semaphore.h>
 #include "irq_vectors.h"
@@ -138,6 +139,10 @@ struct lguest
 	u32 esp1;
 	u8 ss1;
 
+	/* Do we need to stop what we're doing and return to userspace? */
+	int break_out;
+	wait_queue_head_t break_wq;
+
 	/* Bitmap of what has changed: see CHANGED_* above. */
 	int changed;
 	struct lguest_pages *last_pages;
diff -r 7d068abb5a41 drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c	Tue Jun 05 09:51:59 2007 +1000
+++ b/drivers/lguest/lguest_user.c	Tue Jun 05 10:36:58 2007 +1000
@@ -30,6 +30,30 @@ static long user_get_dma(struct lguest *
 	return udma;
 }
 
+/* To force the Guest to stop running and return to the Launcher, the
+ * Waker sets writes LHREQ_BREAK and the value "1" to /dev/lguest.  The
+ * Launcher then writes LHREQ_BREAK and "0" to release the Waker. */
+static int break_guest_out(struct lguest *lg, const u32 __user *input)
+{
+	unsigned long on;
+
+	/* Fetch whether they're turning break on or off.. */
+	if (get_user(on, input) != 0)
+		return -EFAULT;
+
+	if (on) {
+		lg->break_out = 1;
+		/* Pop it out (may be running on different CPU) */
+		wake_up_process(lg->tsk);
+		/* Wait for them to reset it */
+		return wait_event_interruptible(lg->break_wq, !lg->break_out);
+	} else {
+		lg->break_out = 0;
+		wake_up(&lg->break_wq);
+		return 0;
+	}
+}
+
 /* + irq */
 static int user_send_irq(struct lguest *lg, const u32 __user *input)
 {
@@ -49,6 +73,10 @@ static ssize_t read(struct file *file, c
 
 	if (!lg)
 		return -EINVAL;
+
+	/* If you're not the task which owns the guest, go away. */
+	if (current != lg->tsk)
+		return -EPERM;
 
 	if (lg->dead) {
 		size_t len;
@@ -75,13 +103,20 @@ static int initialize(struct file *file,
 	int err, i;
 	u32 args[4];
 
-	if (file->private_data)
-		return -EBUSY;
-
-	if (copy_from_user(args, input, sizeof(args)) != 0)
-		return -EFAULT;
-
+	/* We grab the Big Lguest lock, which protects the global array
+	 * "lguests" and multiple simultaneous initializations. */
 	mutex_lock(&lguest_lock);
+
+	if (file->private_data) {
+		err = -EBUSY;
+		goto unlock;
+	}
+
+	if (copy_from_user(args, input, sizeof(args)) != 0) {
+		err = -EFAULT;
+		goto unlock;
+	}
+
 	i = find_free_guest();
 	if (i < 0) {
 		err = -ENOSPC;
@@ -107,10 +142,12 @@ static int initialize(struct file *file,
 	lg->tsk = current;
 	get_task_struct(lg->tsk);
 	lg->mm = get_task_mm(lg->tsk);
+	init_waitqueue_head(&lg->break_wq);
 	lg->last_pages = NULL;
-	mutex_unlock(&lguest_lock);
-
 	file->private_data = lg;
+
+	mutex_unlock(&lguest_lock);
+
 	return sizeof(args);
 
 free_regs:
@@ -136,6 +173,10 @@ static ssize_t write(struct file *file, 
 		return -EINVAL;
 	if (lg && lg->dead)
 		return -ENOENT;
+
+	/* If you're not the task which owns the Guest, you can only break */
+	if (lg && current != lg->tsk && req != LHREQ_BREAK)
+		return -EPERM;
 
 	switch (req) {
 	case LHREQ_INITIALIZE:
@@ -144,6 +185,8 @@ static ssize_t write(struct file *file, 
 		return user_get_dma(lg, (const u32 __user *)input);
 	case LHREQ_IRQ:
 		return user_send_irq(lg, (const u32 __user *)input);
+	case LHREQ_BREAK:
+		return break_guest_out(lg, (const u32 __user *)input);
 	default:
 		return -EINVAL;
 	}
diff -r 7d068abb5a41 include/linux/lguest_launcher.h
--- a/include/linux/lguest_launcher.h	Tue Jun 05 09:51:59 2007 +1000
+++ b/include/linux/lguest_launcher.h	Tue Jun 05 09:53:28 2007 +1000
@@ -68,5 +68,6 @@ enum lguest_req
 	LHREQ_INITIALIZE, /* + pfnlimit, pgdir, start, pageoffset */
 	LHREQ_GETDMA, /* + addr (returns &lguest_dma, irq in ->used_len) */
 	LHREQ_IRQ, /* + irq */
+	LHREQ_BREAK, /* + on/off flag (on blocks until someone does off) */
 };
 #endif /* _ASM_LGUEST_USER */



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  2:31   ` Rusty Russell
@ 2007-06-05  4:18     ` Matt Mackall
  2007-06-05  4:37       ` Rusty Russell
  2007-06-05 10:43       ` Ingo Molnar
  0 siblings, 2 replies; 38+ messages in thread
From: Matt Mackall @ 2007-06-05  4:18 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, akpm, Linux-kernel

On Tue, Jun 05, 2007 at 12:31:15PM +1000, Rusty Russell wrote:
> Does this solve it for you?

Nope. Doesn't accept input and hogs the CPU with lots of system time.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  4:18     ` Matt Mackall
@ 2007-06-05  4:37       ` Rusty Russell
  2007-06-05 13:49         ` Matt Mackall
  2007-06-05 10:43       ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Rusty Russell @ 2007-06-05  4:37 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Molnar, akpm, Linux-kernel

On Mon, 2007-06-04 at 23:18 -0500, Matt Mackall wrote:
> On Tue, Jun 05, 2007 at 12:31:15PM +1000, Rusty Russell wrote:
> > Does this solve it for you?
> 
> Nope. Doesn't accept input and hogs the CPU with lots of system time.

Dumb question: did you replace both the module and the launcher?

Rusty.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  4:37       ` Rusty Russell
@ 2007-06-05 13:49         ` Matt Mackall
  2007-06-05 19:38           ` Ingo Molnar
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Mackall @ 2007-06-05 13:49 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, akpm, Linux-kernel

On Tue, Jun 05, 2007 at 02:37:38PM +1000, Rusty Russell wrote:
> On Mon, 2007-06-04 at 23:18 -0500, Matt Mackall wrote:
> > On Tue, Jun 05, 2007 at 12:31:15PM +1000, Rusty Russell wrote:
> > > Does this solve it for you?
> > 
> > Nope. Doesn't accept input and hogs the CPU with lots of system time.
> 
> Dumb question: did you replace both the module and the launcher?

Oh, probably not. Yep, actually loading the new module appears to fix it.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05 13:49         ` Matt Mackall
@ 2007-06-05 19:38           ` Ingo Molnar
  0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2007-06-05 19:38 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> On Tue, Jun 05, 2007 at 02:37:38PM +1000, Rusty Russell wrote:
> > On Mon, 2007-06-04 at 23:18 -0500, Matt Mackall wrote:
> > > On Tue, Jun 05, 2007 at 12:31:15PM +1000, Rusty Russell wrote:
> > > > Does this solve it for you?
> > > 
> > > Nope. Doesn't accept input and hogs the CPU with lots of system time.
> > 
> > Dumb question: did you replace both the module and the launcher?
> 
> Oh, probably not. Yep, actually loading the new module appears to fix it.

oh - so Rusty's fix did the trick?

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  4:18     ` Matt Mackall
  2007-06-05  4:37       ` Rusty Russell
@ 2007-06-05 10:43       ` Ingo Molnar
  2007-06-05 19:08         ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-05 10:43 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


Matt, could you send me your .config too? Maybe i can reproduce it 
with your config.

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05 10:43       ` Ingo Molnar
@ 2007-06-05 19:08         ` Ingo Molnar
  2007-06-06 20:19           ` Matt Mackall
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-05 19:08 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel

* Ingo Molnar <mingo@elte.hu> wrote:

> Matt, could you send me your .config too? Maybe i can reproduce it 
> with your config.

ok, got it (off-list) and you seem to have the default HZ=250. Could you 
change it to HZ=1000 - does the delay get alot smaller?

if yes then this could be roughly the expected worst-case behavior for a 
nice+19 'signalling' task - i'd suggest to try the dont-use-renice patch 
Rusty sent (with the hang it causes fixed), or to change the nice(19) to 
nice(10).

(the rq_clock behavior still looks a bit anomalous, that perhaps 
contributes to the problem.)

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05 19:08         ` Ingo Molnar
@ 2007-06-06 20:19           ` Matt Mackall
  0 siblings, 0 replies; 38+ messages in thread
From: Matt Mackall @ 2007-06-06 20:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, akpm, Linux-kernel

On Tue, Jun 05, 2007 at 09:08:06PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > Matt, could you send me your .config too? Maybe i can reproduce it 
> > with your config.
> 
> ok, got it (off-list) and you seem to have the default HZ=250. Could you 
> change it to HZ=1000 - does the delay get alot smaller?

Well I'm too busy/lazy to reboot at the moment, but I've reproduced
this behavior with the following:

watchdelta.py:
import time

last = time.time()
while 1:
    t = time.time()
    delta = t - last
    if delta > .5:
        print delta
    last = t

$ nice -n 20 python watchdelta.py 
0.696138858795
0.772134065628
0.852141141891
2.22020101547
0.600075960159
0.528147935867
0.556170940399
0.536160945892
0.572169065475
0.672204971313
0.576174974442
0.640056848526
0.548166036606
0.568286895752
0.544052124023
0.572114944458
0.534852027893
0.596146821976
0.552155971527
0.552098035812
0.564115047455
0.546761989594
0.63211107254
0.876178026199
0.584072828293
0.544088125229
0.760088920593
1.07612800598
0.552155017853
...

At load=1 (the first several lines), this prints out messages
sporadically. At higher loads, every iteration is a >.5s delay. So
it's probably not particular to lguest at all.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  0:18 ` Rusty Russell
  2007-06-05  2:31   ` Rusty Russell
@ 2007-06-05  2:34   ` Matt Mackall
  2007-06-05  7:10   ` Ingo Molnar
  2 siblings, 0 replies; 38+ messages in thread
From: Matt Mackall @ 2007-06-05  2:34 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Ingo Molnar, akpm, Linux-kernel

On Tue, Jun 05, 2007 at 10:18:54AM +1000, Rusty Russell wrote:
> On Mon, 2007-06-04 at 12:37 -0500, Matt Mackall wrote:
> > With 2.6.22-rc3-mm1, I've got a long-running video transcoding going
> > on. In other windows, I'm compiling, reading email, etc. with no
> > noticeable problems.
> > 
> > If I fire up lguest and leave it sitting at a shell prompt for a
> > couple moments, when I return to type something at the prompt, it can
> > take 2-3 seconds for my input to echo. Once it starts responding,
> > typing latency disappears. Suspend the other app and the latency
> > disappears.
> 
> This sounds like the waker process (nice 19) not getting a chance to
> run.  You can hack around it for the moment by changing "nice(19)" in
> Documentation/lguest/lguest.c to something less aggressive.

Manual renicing seems to have done the trick.

This is still a little suspicious though. Why does it take two seconds
to wake up and then behave responsively for a long interval after that?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  0:18 ` Rusty Russell
  2007-06-05  2:31   ` Rusty Russell
  2007-06-05  2:34   ` Matt Mackall
@ 2007-06-05  7:10   ` Ingo Molnar
  2 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2007-06-05  7:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Matt Mackall, akpm, Linux-kernel


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> This sounds like the waker process (nice 19) not getting a chance to 
> run.  You can hack around it for the moment by changing "nice(19)" in 
> Documentation/lguest/lguest.c to something less aggressive.

even if the waker runs at nice+19, under CFS (which schedules 
deterministically) it should not be delayed by a single nice-0 task by 
more than ~70 msecs. That is a far cry from the 'seconds' that Matt saw 
- and the sched stats he sent look anomalous too. Also, i'm not able to 
reproduce it at all - while any nice+19 related problem should be 
readily reproducable on my box too. So my current take is that this is 
some sort of host-side CFS related problem and that the lguest host 
threads somehow were the first ones that managed to trigger it in some 
really bad way.

OTOH i do agree with your patch, that nice+19 thing is quite hacky :-)

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-04 17:37 Interesting interaction between lguest and CFS Matt Mackall
                   ` (3 preceding siblings ...)
  2007-06-05  0:18 ` Rusty Russell
@ 2007-06-05  9:23 ` Ingo Molnar
  2007-06-05 14:06   ` Matt Mackall
  4 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2007-06-05  9:23 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Rusty Russell, akpm, Linux-kernel


* Matt Mackall <mpm@selenic.com> wrote:

> Ok, my transcoding just finished as I was writing this. So I've 
> reproduced the problem with this Python script that I had handy:
> 
> memload.py:
> #!/usr/bin/python
> a = "a" * 16 * 1024 * 1024
> while 1:
>     b = a[1:] + "b"
>     a = b[1:] + "c"

i suspect the same latency is triggered by a simple infinite loop too:

	while : ; do : ; done &

correct?

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Interesting interaction between lguest and CFS
  2007-06-05  9:23 ` Ingo Molnar
@ 2007-06-05 14:06   ` Matt Mackall
  0 siblings, 0 replies; 38+ messages in thread
From: Matt Mackall @ 2007-06-05 14:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rusty Russell, akpm, Linux-kernel

On Tue, Jun 05, 2007 at 11:23:44AM +0200, Ingo Molnar wrote:
> 
> * Matt Mackall <mpm@selenic.com> wrote:
> 
> > Ok, my transcoding just finished as I was writing this. So I've 
> > reproduced the problem with this Python script that I had handy:
> > 
> > memload.py:
> > #!/usr/bin/python
> > a = "a" * 16 * 1024 * 1024
> > while 1:
> >     b = a[1:] + "b"
> >     a = b[1:] + "c"
> 
> i suspect the same latency is triggered by a simple infinite loop too:
> 
> 	while : ; do : ; done &
> 
> correct?

Yes.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2007-06-08 10:19 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-04 17:37 Interesting interaction between lguest and CFS Matt Mackall
2007-06-04 17:54 ` Ingo Molnar
2007-06-04 18:41   ` Matt Mackall
2007-06-05  7:19     ` Ingo Molnar
2007-06-05 14:03       ` Matt Mackall
2007-06-05 15:14         ` Dmitry Adamushko
2007-06-05 15:41           ` Dmitry Adamushko
2007-06-05 19:50             ` Ingo Molnar
2007-06-06 20:23               ` Matt Mackall
2007-06-08  9:34                 ` Ingo Molnar
2007-06-08 10:18                   ` Ingo Molnar
2007-06-04 18:11 ` Ingo Molnar
2007-06-04 18:42   ` Matt Mackall
2007-06-04 19:12     ` Ingo Molnar
2007-06-04 19:27       ` Matt Mackall
2007-06-04 19:40         ` Ingo Molnar
2007-06-04 19:59           ` Matt Mackall
2007-06-04 20:13             ` Ingo Molnar
2007-06-04 20:20               ` Ingo Molnar
2007-06-04 20:15   ` James Morris
2007-06-04 20:24     ` Ingo Molnar
2007-06-04 22:58 ` Olof Johansson
2007-06-05  1:02   ` Rusty Russell
2007-06-05  1:24     ` Stephen Rothwell
2007-06-05  2:23     ` Olof Johansson
2007-06-05  0:18 ` Rusty Russell
2007-06-05  2:31   ` Rusty Russell
2007-06-05  4:18     ` Matt Mackall
2007-06-05  4:37       ` Rusty Russell
2007-06-05 13:49         ` Matt Mackall
2007-06-05 19:38           ` Ingo Molnar
2007-06-05 10:43       ` Ingo Molnar
2007-06-05 19:08         ` Ingo Molnar
2007-06-06 20:19           ` Matt Mackall
2007-06-05  2:34   ` Matt Mackall
2007-06-05  7:10   ` Ingo Molnar
2007-06-05  9:23 ` Ingo Molnar
2007-06-05 14:06   ` Matt Mackall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox