Strange interaction between latencytop and the scheduler

All of lore.kernel.org
 help / color / mirror / Atom feed

* Strange interaction between latencytop and the scheduler
@ 2008-01-22 17:54 Török Edwin
  2008-01-22 19:04 ` Arjan van de Ven
  0 siblings, 1 reply; 5+ messages in thread
From: Török Edwin @ 2008-01-22 17:54 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, Linux Kernel

Hi Arjan, Ingo,

I am using latest latencytop from git (commit
92b6ca9d40998697866689f64b95647eca3200cb), and I'm seeing some strange
things:
[for minor latencytop userspace specific issues see at end of mail]

* if I compile with CONFIG_SCHED_DEBUG I see a 30-40msec latency from
the scheduler, always (Scheduler: waiting for cpu).
There is also a constant ~5% user, ~2% sys CPU usage on an idle system,
regardless  if latencytop sysctl is on or off.
Is this normal? (is overhead really 40msec?)
I was also seeing an unusually high number of context switches (as shown
by vmstat), I usually have 400-800 with non-patched kernels (while
running mplayer too), but I was getting steadily over 1100 with the
patch (on idle system).

CPU usage with latencytop on:
Cpu(s):  7.3%us,  7.0%sy,  0.3%ni, 85.3%id,  0.0%wa,  0.0%hi,  0.0%si, 
0.0%st

CPU usage with latencytop sysctl off:
Cpu(s):  5.0%us,  1.7%sy,  0.0%ni, 93.3%id,  0.0%wa,  0.0%hi,  0.0%si, 
0.0%st

However top won't tell me who's using 5% CPU!
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 10388  648  548 S  0.0  0.0   0:01.70 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      15  -5     0    0    0 S  0.0  0.0   0:00.13 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0

High number of context switches (>1100):
procs -----------memory---------- ---swap-- -----io---- -system--
----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
id wa
 2  0    520  60828  92276 832212    0    0   281    55  330 1215 16  3
73  8
 0  0    520  60828  92276 832212    0    0     0     2  242 1158  5  1
94  0
 0  0    520  60828  92288 832212    0    0     0     3  290 1261  8  1
91  0
 0  0    520  60836  92288 832216    0    0     1     0  227 1407 10  2
89  0

Scheduler latency:
Cause                                                Maximum     Percentage
put_device elv_insert blk_plug_device default_wake152.4 msec          4.3 %
Creating block layer request                       68.1 msec          2.5 %
Scheduler: waiting for cpu                         26.6 msec          6.5 %
Writing a page to disk                             15.5 msec          1.8 %
Checking for media change                          10.3 msec          7.8 %
SCSI cdrom ioctl                                    9.4 msec          1.3 %
Checking CDROM media present                        7.1 msec          0.3 %
Waiting for event (poll)                            5.0 msec         59.4 %
Application requested delay                         4.9 msec         11.6 %

Process hddtemp (3772)
put_device elv_insert blk_plug_device default_wake152.4 msec        
61.5 %blk_rq_bio_prep blk_rq_append_bio blk_rq_map_user sg_io scsi_
Creating block layer request                       68.1 msec         36.3 %
Scheduler: waiting for cpu                          6.5 msec          2.2 %


* I compile without CONFIG_SCHED_DEBUG, I no longer get *any* latency
from the scheduler, even if I run multi-threaded programs, etc. Is this
to be expected? (i.e. is this feature available only when enabling
CONFIG_SCHED_DEBUG?)


I was trying to trigger some scheduler latencies artifically, so I wrote
a program that creates 10 threads, each doing an usleep(1) in an infloop.
Still I get no scheduler latency (I was getting scheduler latency with
version 0.1).
If latencytop sysctl is on, when running this test, after a short while
X becomes unusable, and almost freezes; yet I still can't see latencies.
Before it freezes I captured top outputs (see below), notice how
otherwise idle programs get a high CPU usage!
If latencytop sysctl is off no freeze occurs, and CPU usage% is as
normal (a.out getting 99%)

Top output when latencytop sysctl is on:

top - 19:33:31 up  1:28,  2 users,  load average: 3.45, 1.21, 1.05
Tasks: 120 total,   9 running, 110 sleeping,   1 stopped,   0 zombie
Cpu(s):  3.9%us, 95.5%sy,  0.6%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 
0.0%st
Mem:   2060592k total,  1979756k used,    80836k free,    69028k buffers
Swap:  3903724k total,        0k used,  3903724k free,  1473896k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7008 edwin     20   0 38764  472  356 R 58.4  0.0   0:05.94 a.out
 5613 edwin     20   0  179m  10m 7452 R 17.3  0.5   1:45.19 gkrellm
 5920 edwin     20   0  565m  90m  25m R 17.3  4.5   1:41.82 icedove-bin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an idle program!
 5610 edwin     20   0  152m  21m  13m R  1.9  1.1   1:28.88 konsole
 6749 edwin     20   0  254m 7800 5496 S  1.9  0.4   0:00.87 mplayer
 5209 root      20   0  110m  35m 9016 S  1.3  1.8   1:36.31 Xorg
  908 root      15  -5     0    0    0 S  0.6  0.0   0:01.60 scsi_eh_0
 5132 postgres  20   0 77216 1284  736 S  0.6  0.1   0:00.35 postgres
 5151 root      35  15 41068  29m  648 S  0.6  1.5   0:27.96 preload
 7005 edwin     20   0 18976 1184  880 R  0.6  0.1   0:00.02 top
    1 root      20   0 10388  712  588 S  0.0  0.0   0:01.68 init

with latencytop sysctl off:

top - 19:37:03 up  1:32,  2 users,  load average: 0.76, 0.95, 0.98
Tasks: 120 total,   6 running, 113 sleeping,   1 stopped,   0 zombie
Cpu(s): 10.6%us, 89.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 
0.0%st
Mem:   2060592k total,  1984368k used,    76224k free,    69476k buffers
Swap:  3903724k total,        0k used,  3903724k free,  1474340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7107 edwin     20   0 87940  504  368 S 94.1  0.0   0:05.52 a.out
 5610 edwin     20   0  152m  21m  13m R  3.0  1.1   1:31.26 konsole
 5209 root      20   0  110m  35m 9016 S  1.5  1.8   1:41.19 Xorg
 5613 edwin     20   0  179m  10m 7452 S  1.5  0.5   1:48.59 gkrellm
 6749 edwin     20   0  254m 7808 5496 S  1.5  0.4   0:02.31 mplayer
 7111 edwin     20   0 18976 1180  880 R  1.5  0.1   0:00.01 top
    1 root      20   0 10388  712  588 S  0.0  0.0   0:01.68 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd

Some minor latencytop userspace issues:
* percentages: new feature (nice!), but the values seem all wrong
(global perc. are always wrong, per app perc. are usually ok, but see
second example below)
* I miss the Average latency column. If it is too costly to keep account
of an overall average, can we have last N second average?
Or just cumulative latency so far (unless it overflows)
This is especially useful for lock contention latencies: if I see max
latency a few miliseconds I don't worry, if I see average latency a few
miliseconds I'm starting to worry.
* unknown reasons show a backtrace, but backtrace doesn't have enough
room on screen
* on startup there are lots of latencies with process_timeout as first
in backtrace
* /proc/latency_stats still says it is v0.1

Example output to illustrate some of the issues:

   LatencyTOP version 0.3       (C) 2008 Intel Corporation

Cause                                                Maximum     Percentage
put_device elv_insert blk_plug_device default_wake 29.3 msec          0.2 %
Creating block layer request                       20.5 msec          0.2 %
Waiting for event (poll)                            5.0 msec         43.1 %
Application requested delay                         5.0 msec         36.5 %
Waiting for TTY data                                5.0 msec         14.2 %
Page fault                                          4.9 msec          3.0 %
Waiting for data on unix socket                     4.0 msec          0.1 %
Reading from a pipe                                 3.4 msec          0.0 %
Userspace lock contention                           1.6 msec          0.0 %

Process hddtemp (3760)
put_device elv_insert blk_plug_device default_wake 29.3 msec        
51.1 %blk_rq_bio_prep blk_rq_append_bio blk_rq_map_user sg_io scs
Creating block layer request                       20.5 msec         48.9 %

 reiserfs/0  udevd  kpsmoused  kgameportd  mdadm  dhclient3  mdadm 
dirmngr  lighttpd  syslogd  acpid  cupsd  dbus-daemon  hddtemp


   LatencyTOP version 0.3       (C) 2008 Intel Corporation

Cause                                                Maximum     Percentage
Waiting for TTY data                                5.0 msec         13.5 %
Waiting for event (poll)                            5.0 msec         44.2 %
Application requested delay                         5.0 msec         36.7 %
synchronous write                                   5.0 msec          1.9 %
Waiting for data on unix socket                     4.9 msec          0.1 %
Page fault                                          4.9 msec          2.4 %
Writing buffer to disk (synchronous)                4.8 msec          0.0 %
Reading from a pipe                                 3.7 msec          0.1 %
Closing a file                                      1.9 msec          0.1 %

Process hald-addon-stor (4645)
SCSI cdrom ioctl                                    1.1 msec         17.7 %
Checking for media change                           0.6 msec         77.9 %
Executing raw SCSI command                          0.2 msec          4.3 %


Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Strange interaction between latencytop and the scheduler
  2008-01-22 17:54 Strange interaction between latencytop and the scheduler Török Edwin
@ 2008-01-22 19:04 ` Arjan van de Ven
  2008-01-22 19:35   ` Török Edwin
  0 siblings, 1 reply; 5+ messages in thread
From: Arjan van de Ven @ 2008-01-22 19:04 UTC (permalink / raw)
  To: Török Edwin; +Cc: Ingo Molnar, Linux Kernel

Török Edwin wrote:
> Hi Arjan, Ingo,

> [for minor latencytop userspace specific issues see at end of mail]
> 
> * if I compile with CONFIG_SCHED_DEBUG I see a 30-40msec latency from
> the scheduler, always (Scheduler: waiting for cpu).
> There is also a constant ~5% user, ~2% sys CPU usage on an idle system,
> regardless  if latencytop sysctl is on or off.
> Is this normal? (is overhead really 40msec?)

I'm seeing similar, I would not rule out that this is actual scheduler behavior ;(
at least it seems consistent.
I also made a patch (to lkml yesterday) that counts the total and count of scheduler latencies
to also show the cumulative effect of these latencies

> I was also seeing an unusually high number of context switches (as shown
> by vmstat), I usually have 400-800 with non-patched kernels (while
> running mplayer too), but I was getting steadily over 1100 with the
> patch (on idle system).

that's weird; there's nothing blocking in the entire patch at all.
(the only lock is a spinlock)

The performance aspect... collecting the data isn't cheap (which is why it's made a sysctl),
I still plan to look at optimizing it but it won't ever be free.

> * I compile without CONFIG_SCHED_DEBUG, I no longer get *any* latency
> from the scheduler, even if I run multi-threaded programs, etc. Is this
> to be expected? (i.e. is this feature available only when enabling
> CONFIG_SCHED_DEBUG?)

afaik yes.

> Some minor latencytop userspace issues:
> * percentages: new feature (nice!), but the values seem all wrong
> (global perc. are always wrong, per app perc. are usually ok, but see
> second example below)

note that the percentages are percentage this entry had compared the total sum of latencies.
The maximum entry is only loosely related to that; say you have 1 latency in "foo" for 100ms
but 900 of 1ms in "bar", "foo" will show 10% and "bar" will show 90%

> * I miss the Average latency column. If it is too costly to keep account
> of an overall average, can we have last N second average?

it's not costly to calculate, it's the screen space versus the value of the information :(

> * unknown reasons show a backtrace, but backtrace doesn't have enough
> room on screen

still working on that; you can pass the --unknown option to dump these so that I can add them
to the translation table.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Strange interaction between latencytop and the scheduler
  2008-01-22 19:04 ` Arjan van de Ven
@ 2008-01-22 19:35   ` Török Edwin
  2008-01-24  9:35     ` Török Edwin
  0 siblings, 1 reply; 5+ messages in thread
From: Török Edwin @ 2008-01-22 19:35 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, Linux Kernel

Arjan van de Ven wrote:
> Török Edwin wrote:
>> Is this normal? (is overhead really 40msec?)
>
> I'm seeing similar, I would not rule out that this is actual scheduler
> behavior ;(
> at least it seems consistent.

Ok, it is good that we are seeing same behaviour.

> I also made a patch (to lkml yesterday) that counts the total and
> count of scheduler latencies
> to also show the cumulative effect of these latencies
Found it: http://lkml.org/lkml/2008/1/21/223.
I'll try it the next days. Do you want me to report results? [and
especially, should I Cc: Ingo Molnar on the report?]
>
>> I was also seeing an unusually high number of context switches (as shown
>> by vmstat), I usually have 400-800 with non-patched kernels (while
>> running mplayer too), but I was getting steadily over 1100 with the
>> patch (on idle system).
>
> that's weird; there's nothing blocking in the entire patch at all.
> (the only lock is a spinlock)

Maybe the overhead of latencytop is triggerring other behaviour in the
scheduler? Just a guess.
Are there any tests I can run to see why there are more context switches?

>
> The performance aspect... collecting the data isn't cheap (which is
> why it's made a sysctl),
> I still plan to look at optimizing it but it won't ever be free.

Yes, I understand that. Is there a way latencytop could track its own
overhead? I suppose it would lead to more accurate results
(not that there would be anything wrong with the current ones).
>
>> * I compile without CONFIG_SCHED_DEBUG, I no longer get *any* latency
>> from the scheduler, even if I run multi-threaded programs, etc. Is this
>> to be expected? (i.e. is this feature available only when enabling
>> CONFIG_SCHED_DEBUG?)
>
> afaik yes.

Ok.

>> * percentages: new feature (nice!), but the values seem all wrong
>> (global perc. are always wrong, per app perc. are usually ok, but see
>> second example below)
> Some minor latencytop userspace issues:
>
> note that the percentages are percentage this entry had compared the
> total sum of latencies.
> The maximum entry is only loosely related to that; say you have 1
> latency in "foo" for 100ms
> but 900 of 1ms in "bar", "foo" will show 10% and "bar" will show 90%

Thanks, that explains it. Looks like a good substitute for the average
column.
This is the functionality I was missing, when that column went away.
Still, I wouldn't mind to see the average column too (maybe activated
via a hotkey, or shown only if enough screenspace is available?).

>
>> * I miss the Average latency column. If it is too costly to keep account
>> of an overall average, can we have last N second average?
>
> it's not costly to calculate, it's the screen space versus the value
> of the information :(

If I stretch my terminal window there's room for 2 or 3 more columns :)

>
>> * unknown reasons show a backtrace, but backtrace doesn't have enough
>> room on screen
>
> still working on that; you can pass the --unknown option to dump these
> so that I can add them
> to the translation table.

I gathered these while writing this reply:

Unknown: put_device elv_insert blk_plug_device default_wake_function
blk_execute_rq blk_rq_bio_prep blk_rq_append_bio blk_rq_map_user sg_io
scsi_cmd_ioctl ip_queue_xmit tcp_transmit_skb
Unknown: md_thread autoremove_wake_function md_thread md_thread kthread
child_rip kthread child_rip
Unknown: kswapd autoremove_wake_function kswapd kswapd kthread child_rip
kthread child_rip


Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Strange interaction between latencytop and the scheduler
  2008-01-22 19:35   ` Török Edwin
@ 2008-01-24  9:35     ` Török Edwin
  2008-01-24 15:41       ` Arjan van de Ven
  0 siblings, 1 reply; 5+ messages in thread
From: Török Edwin @ 2008-01-24  9:35 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, Linux Kernel

Török Edwin wrote:
>   
>> The performance aspect... collecting the data isn't cheap (which is
>> why it's made a sysctl),
>> I still plan to look at optimizing it but it won't ever be free.
>>     
>
> Yes, I understand that. Is there a way latencytop could track its own
> overhead? I suppose it would lead to more accurate results
> (not that there would be anything wrong with the current ones).
>   

Latencytop userspace tool shows latencies > 0.1 msec, thus capturing
backtraces for latencies <0.1msec could be avoided.
If I apply the patch below, then enabling latencytop doesn't freeze X
when running the "10-threads doing infloop usleep(1)" test.
Still, I don't want to loose track of the latencies we didn't collect
backtraces for, so I added a special "untraced" category, reported as
first line in /proc/latency_stats. If needed, instead of hardcoding the
threshold, it could be made a sysctl, or set via writing to
/proc/latency_stats,...

While I am running the test-program:
$ grep untraced /proc/latency_stats
4875605 5120414 49 untraced

On an idle system:
$ grep untraced /proc/latency_stats
532 3287 47 untraced
$ grep untraced /proc/latency_stats
853 5778 47 untraced
$ grep untraced /proc/latency_stats
950 6788 47 untraced
$ grep untraced /proc/latency_stats
1343 9977 49 untraced
$ grep untraced /proc/latency_stats
1448 11075 49 untraced

Best regards,
--Edwin

---
latencytop.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
--- /tmp/linux-2.6.24-rc8/kernel/latencytop.c    2008-01-24
11:27:32.727487585 +0200
+++ kernel/latencytop.c    2008-01-24 10:42:25.000000000 +0200
@@ -24,8 +24,11 @@
 static DEFINE_SPINLOCK(latency_lock);
 
 #define MAXLR 128
+/* we are not interested in latencies less than 0.1msec, so
+ * don't get backtraces for latencies <0.05msec.*/
+#define LATENCY_TRACE_THRESHOLD 50
 static struct latency_record latency_record[MAXLR];
-
+static struct latency_record untraced;
 int latencytop_enabled;
 
 void clear_all_latency_tracing(struct task_struct *p)
@@ -47,6 +50,7 @@
 
     spin_lock_irqsave(&latency_lock, flags);
     memset(&latency_record, 0, sizeof(latency_record));
+    memset(&untraced, 0, sizeof(untraced));
     spin_unlock_irqrestore(&latency_lock, flags);
 }
 
@@ -124,6 +128,15 @@
     if (inter && usecs > 5000)
         return;
 
+    if(usecs < LATENCY_TRACE_THRESHOLD) {
+        /* don't get stacktrace for very low latencies */
+        untraced.time += usecs;
+        if(usecs > untraced.max)
+            untraced.max = usecs;
+        untraced.count++;
+        return;
+    }
+
     memset(&lat, 0, sizeof(lat));
     lat.count = 1;
     lat.time = usecs;
@@ -177,6 +190,10 @@
 
     seq_puts(m, "Latency Top version : v0.1\n");
 
+    seq_printf(m, "%i %li %li untraced \n",
+            untraced.count,
+            untraced.time,
+            untraced.max);
     for (i = 0; i < MAXLR; i++) {
         if (latency_record[i].backtrace[0]) {
             int q;


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Strange interaction between latencytop and the scheduler
  2008-01-24  9:35     ` Török Edwin
@ 2008-01-24 15:41       ` Arjan van de Ven
  0 siblings, 0 replies; 5+ messages in thread
From: Arjan van de Ven @ 2008-01-24 15:41 UTC (permalink / raw)
  To: Török Edwin; +Cc: Ingo Molnar, Linux Kernel

Török Edwin wrote:
> 
> Latencytop userspace tool shows latencies > 0.1 msec, thus capturing
> backtraces for latencies <0.1msec could be avoided.
> If I apply the patch below, then enabling latencytop doesn't freeze X
> when running the "10-threads doing infloop usleep(1)" test.

ok I like the idea; I would propose though to make the 0.1 msec a sysctl or something,
so that people who really care about latencies lower than that can just set that lower.
(and the tool can then read the value and use that)

> Still, I don't want to loose track of the latencies we didn't collect
> backtraces for, so I added a special "untraced" category, reported as
> first line in /proc/latency_stats. If needed, instead of hardcoding the
> threshold, it could be made a sysctl, or set via writing to
> /proc/latency_stats,...

yeah for the total it makes sense; I'll do the sysctl thing for the threshold
and integrate this idea as well. Thanks a lot!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-01-24 15:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-22 17:54 Strange interaction between latencytop and the scheduler Török Edwin
2008-01-22 19:04 ` Arjan van de Ven
2008-01-22 19:35   ` Török Edwin
2008-01-24  9:35     ` Török Edwin
2008-01-24 15:41       ` Arjan van de Ven

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.