[REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers

public inbox for linux-kernel@vger.kernel.org

* [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
@ 2011-11-24 10:30 Artem S. Tashkinov
  2011-11-24 20:05 ` Tino Keitel
  2011-11-29 21:16 ` Maciej Rutecki
  0 siblings, 2 replies; 15+ messages in thread
From: Artem S. Tashkinov @ 2011-11-24 10:30 UTC (permalink / raw)
  To: linux-kernel

Hello,

I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
under this kernel:

Here are two text snapshots of htop:

  1  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Tasks: 135 total, 1 running
  2  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load average: 0.00 0.01 0.05
  3  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Uptime: 00:27:49
  4  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load: 0.00
  Mem[||||||||||||                                              544/8093MB]     Avg[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]

 IORR  IOWR    IO   PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
    0     0     0  3529 root      20   0 70052 52244 19324 S 300.  0.6  0:25.93 /usr/bin/X -br
    0     0     0  3678 user      20   0 33952 10220  8448 S 100.  0.1  0:03.58 gkrellm
    0     0     0  3772 user      20   0 32144 13960  9532 S 100.  0.2  0:07.46 konsole [kdeinit] --noxft
    0     0     0  6061 user      20   0  2780  1276   960 R 100.  0.0  0:00.01 htop
    0     0     0     1 root      20   0  2884  1376  1168 S  0.0  0.0  0:01.00 /sbin/init
    0     0     0     2 root      20   0     0     0     0 S  0.0  0.0  0:00.00 kthreadd


  1  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Tasks: 135 total, 1 running
  2  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load average: 0.00 0.01 0.05
  3  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Uptime: 00:28:43
  4  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load: 0.00
  Mem[||||||||||||                                              545/8093MB]     Avg[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]

 IORR  IOWR    IO   PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
    0     0     0  6061 user      20   0  2780  1288   972 R 57.0  0.0  0:00.11 htop
    0     0     0  5243 user      20   0  559M  318M 30928 S 57.0  3.9  0:26.07 /opt/firefox/firefox
    0     0     0  3687 user      20   0  108M 39124 23564 S 57.0  0.5  0:06.50 kedit
    0     0     0  3637 user      20   0 26960  5572  4288 S 57.0  0.1  0:00.02 klauncher [kdeinit] --new-startup
    0     0     0  3529 root      20   0 70052 52244 19324 S  0.0  0.6  0:26.71 /usr/bin/X -br

Interestingly, with this madness going on, the kernel's internal load average counter works properly:

[root@localhost ~]# uptime
 16:20:38 up 44 min,  2 users,  load average: 0.02, 0.04, 0.05

Right at this moment all process viewers report 400% CPU load (I have 4 CPU cores), either 400% loaded by user processes
or 200% by system and 200% by user processes.

Graphically it looks this way: http://img717.imageshack.us/img717/6495/top2b.png

I'm not running any CPU intensive applications at the moment at all.

My .config file can be downloaded here: http://ompldr.org/iYmZneg

My distro is Fedora 14 i686.

Best wishes,

Artem

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-24 10:30 [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers Artem S. Tashkinov
@ 2011-11-24 20:05 ` Tino Keitel
  2011-11-27 11:04   ` Tino Keitel
  2011-11-29 21:16 ` Maciej Rutecki
  1 sibling, 1 reply; 15+ messages in thread
From: Tino Keitel @ 2011-11-24 20:05 UTC (permalink / raw)
  To: linux-kernel

On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> Hello,
> 
> I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> under this kernel:

I get the same using top, htop and the gnome system monitor with kernel
3.2 on a Sandy Bridge quad core box, running Debian unstable.

Regards,
Tino


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-24 20:05 ` Tino Keitel
@ 2011-11-27 11:04   ` Tino Keitel
  2011-11-27 11:45     ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Tino Keitel @ 2011-11-27 11:04 UTC (permalink / raw)
  To: linux-kernel

On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > Hello,
> > 
> > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > under this kernel:
> 
> I get the same using top, htop and the gnome system monitor with kernel
> 3.2 on a Sandy Bridge quad core box, running Debian unstable.

I just tested 3.2-rc2, and see the same bug.

Regards,
Tino


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-27 11:04   ` Tino Keitel
@ 2011-11-27 11:45     ` Rafael J. Wysocki
  2011-11-27 11:45       ` Tino Keitel
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2011-11-27 11:45 UTC (permalink / raw)
  To: Tino Keitel; +Cc: linux-kernel, Artem S. Tashkinov

On Sunday, November 27, 2011, Tino Keitel wrote:
> On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > Hello,
> > > 
> > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > under this kernel:
> > 
> > I get the same using top, htop and the gnome system monitor with kernel
> > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> 
> I just tested 3.2-rc2, and see the same bug.

I'm seeing that too on one of my test boxes, but not all the time
(i.e. there are periods in which the readings are correct).  The other boxes
I've tested with 3.2-rc are fine in that respect.

Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
like there's an overflow somewhere in the CPU load measuring code, at least
on some CPUs.
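
(A rough sketch of the arithmetic, for anyone following along.  It assumes
that top-like tools derive the busy percentage from the deltas of two
successive cpuN lines in /proc/stat; the helper name busy_percent and the
sample numbers are made up, and real tools also fold in the irq, softirq and
steal columns.  It only shows how strongly the displayed figure depends on
the idle counter advancing, not what the actual bug is.)

```shell
#!/bin/sh
# busy_percent PREV CUR
# PREV and CUR are "cpuN user nice system idle iowait ..." lines in the
# /proc/stat layout. Hypothetical helper, not code from top or htop.
busy_percent() {
	# Split each line into its first five counters; extra columns land in "rest".
	read -r cpu pu pn ps pi pw rest <<EOF
$1
EOF
	read -r cpu cu cn cs ci cw rest <<EOF
$2
EOF
	busy=$(( (cu - pu) + (cn - pn) + (cs - ps) ))
	idle=$(( (ci - pi) + (cw - pw) ))
	total=$(( busy + idle ))
	# Guard against a zero or negative total (e.g. counters going backwards).
	if [ "$total" -le 0 ]; then
		echo 0
		return 0
	fi
	echo $(( 100 * busy / total ))
}

# Idle box: 3 busy ticks out of 100.
busy_percent "cpu0 100 0 50 1000 10" "cpu0 101 0 52 1097 10"   # prints 3
# Same busy ticks, but the idle counter did not move between samples.
busy_percent "cpu0 100 0 50 1000 10" "cpu0 101 0 52 1000 10"   # prints 100
```

With an idle counter that stops advancing, the few ticks of real work become
the whole total, hence the 100% reading in the second call.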

What's your CPU, BTW?

Thanks,
Rafael


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-27 11:45     ` Rafael J. Wysocki
@ 2011-11-27 11:45       ` Tino Keitel
  2011-11-27 11:56         ` Rafael J. Wysocki
  2011-11-27 11:57       ` Artem S. Tashkinov
  2011-11-28 19:55       ` Tino Keitel
  2 siblings, 1 reply; 15+ messages in thread
From: Tino Keitel @ 2011-11-27 11:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Artem S. Tashkinov

On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> On Sunday, November 27, 2011, Tino Keitel wrote:
> > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > Hello,
> > > > 
> > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > under this kernel:
> > > 
> > > I get the same using top, htop and the gnome system monitor with kernel
> > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > 
> > I just tested 3.2-rc2, and see the same bug.
> 
> I'm seeing that too on one of my test boxes, but not all the time
> (i.e. there are periods in which the readings are correct).  The other boxes
> I've tested with 3.2-rc are fine in that respect.
> 
> Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> like there's an overflow somewhere in the CPU load measuring code, at least
> on some CPUs.
> 
> What's your CPU, BTW?

Intel Core i5-2400
 
Regards,
Tino


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-27 11:45       ` Tino Keitel
@ 2011-11-27 11:56         ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2011-11-27 11:56 UTC (permalink / raw)
  To: Tino Keitel; +Cc: linux-kernel, Artem S. Tashkinov

On Sunday, November 27, 2011, Tino Keitel wrote:
> On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> > On Sunday, November 27, 2011, Tino Keitel wrote:
> > > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > > Hello,
> > > > > 
> > > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > > under this kernel:
> > > > 
> > > > I get the same using top, htop and the gnome system monitor with kernel
> > > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > > 
> > > I just tested 3.2-rc2, and see the same bug.
> > 
> > I'm seeing that too on one of my test boxes, but not all the time
> > (i.e. there are periods in which the readings are correct).  The other boxes
> > I've tested with 3.2-rc are fine in that respect.
> > 
> > Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> > like there's an overflow somewhere in the CPU load measuring code, at least
> > on some CPUs.
> > 
> > What's your CPU, BTW?
> 
> Intel Core i5-2400

The CPU I'm seeing the problem with is AMD Athlon X2 3800+.

Thanks,
Rafael


* Re: Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-27 11:45     ` Rafael J. Wysocki
  2011-11-27 11:45       ` Tino Keitel
@ 2011-11-27 11:57       ` Artem S. Tashkinov
  2011-11-28 19:55       ` Tino Keitel
  2 siblings, 0 replies; 15+ messages in thread
From: Artem S. Tashkinov @ 2011-11-27 11:57 UTC (permalink / raw)
  To: rjw; +Cc: tino.keitel, linux-kernel

> On Nov 27, 2011, Rafael J. Wysocki <rjw@sisk.pl> wrote: 
> 
> On Sunday, November 27, 2011, Tino Keitel wrote:
> > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > Hello,
> > > > 
> > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > under this kernel:
> > > 
> > > I get the same using top, htop and the gnome system monitor with kernel
> > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > 
> > I just tested 3.2-rc2, and see the same bug.
> 
> I'm seeing that too on one of my test boxes, but not all the time
> (i.e. there are periods in which the readings are correct).  The other boxes
> I've tested with 3.2-rc are fine in that respect.
> 
> Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> like there's an overflow somewhere in the CPU load measuring code, at least
> on some CPUs.
> 
> What's your CPU, BTW?

Intel Core i5 2500, i686

Best wishes,

Artem


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-27 11:45     ` Rafael J. Wysocki
  2011-11-27 11:45       ` Tino Keitel
  2011-11-27 11:57       ` Artem S. Tashkinov
@ 2011-11-28 19:55       ` Tino Keitel
  2011-11-28 20:19         ` Rafael J. Wysocki
  2011-11-29 21:25         ` Tino Keitel
  2 siblings, 2 replies; 15+ messages in thread
From: Tino Keitel @ 2011-11-28 19:55 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Artem S. Tashkinov

On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> On Sunday, November 27, 2011, Tino Keitel wrote:
> > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > Hello,
> > > > 
> > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > under this kernel:
> > > 
> > > I get the same using top, htop and the gnome system monitor with kernel
> > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > 
> > I just tested 3.2-rc2, and see the same bug.
> 
> I'm seeing that too on one of my test boxes, but not all the time
> (i.e. there are periods in which the readings are correct).  The other boxes
> I've tested with 3.2-rc are fine in that respect.
> 
> Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> like there's an overflow somewhere in the CPU load measuring code, at least
> on some CPUs.

Hi,

I reverted this commit and so far it looks good:

commit a25cac5198d4ff2842ccca63b423962848ad24b2
Author: Michal Hocko <mhocko@suse.cz>
Date:   Wed Aug 24 09:40:25 2011 +0200

    proc: Consider NO_HZ when printing idle and iowait times

I'll report back tomorrow how the kernel behaves.

Regards,
Tino


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-28 19:55       ` Tino Keitel
@ 2011-11-28 20:19         ` Rafael J. Wysocki
  2011-11-28 21:41           ` Michal Hocko
  2011-11-29 21:25         ` Tino Keitel
  1 sibling, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2011-11-28 20:19 UTC (permalink / raw)
  To: Tino Keitel, Michal Hocko; +Cc: linux-kernel, Artem S. Tashkinov

On Monday, November 28, 2011, Tino Keitel wrote:
> On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> > On Sunday, November 27, 2011, Tino Keitel wrote:
> > > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > > Hello,
> > > > > 
> > > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > > under this kernel:
> > > > 
> > > > I get the same using top, htop and the gnome system monitor with kernel
> > > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > > 
> > > I just tested 3.2-rc2, and see the same bug.
> > 
> > I'm seeing that too on one of my test boxes, but not all the time
> > (i.e. there are periods in which the readings are correct).  The other boxes
> > I've tested with 3.2-rc are fine in that respect.
> > 
> > Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> > like there's an overflow somewhere in the CPU load measuring code, at least
> > on some CPUs.
> 
> Hi,
> 
> I reverted this commit and so far it looks good:
> 
> commit a25cac5198d4ff2842ccca63b423962848ad24b2
> Author: Michal Hocko <mhocko@suse.cz>
> Date:   Wed Aug 24 09:40:25 2011 +0200
> 
>     proc: Consider NO_HZ when printing idle and iowait times
> 
> I'll report back tomorrow how the kernel behaves.

Hmm.  Michal, can you have a look at that, please?

Rafael


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-28 20:19         ` Rafael J. Wysocki
@ 2011-11-28 21:41           ` Michal Hocko
  2011-11-28 21:43             ` Michal Hocko
                               ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Michal Hocko @ 2011-11-28 21:41 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Tino Keitel, linux-kernel, Artem S. Tashkinov

Hi,

On Mon 28-11-11 21:19:26, Rafael J. Wysocki wrote:
> On Monday, November 28, 2011, Tino Keitel wrote:
> > On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> > > On Sunday, November 27, 2011, Tino Keitel wrote:
> > > > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > > > under this kernel:
> > > > > 
> > > > > I get the same using top, htop and the gnome system monitor with kernel
> > > > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > > > 
> > > > I just tested 3.2-rc2, and see the same bug.
> > > 
> > > I'm seeing that too on one of my test boxes, but not all the time
> > > (i.e. there are periods in which the readings are correct).  The other boxes
> > > I've tested with 3.2-rc are fine in that respect.
> > > 
> > > Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> > > like there's an overflow somewhere in the CPU load measuring code, at least
> > > on some CPUs.
> > 
> > Hi,
> > 
> > I reverted this commit and so far it looks good:
> > 
> > commit a25cac5198d4ff2842ccca63b423962848ad24b2
> > Author: Michal Hocko <mhocko@suse.cz>
> > Date:   Wed Aug 24 09:40:25 2011 +0200
> > 
> >     proc: Consider NO_HZ when printing idle and iowait times
> > 
> > I'll report back tomorrow how the kernel behaves.
> 
> Hmm.  Michal, can you have a look at that, please?

Hmm, my testing didn't show anything like that. Could you post
cat /proc/stat collected every second during 30s or so?

Here is the output of my run with 3.2.0-rc3-00004-gdd38d29 and the attached config:
for i in `seq 30`; 
do 
	cat /proc/stat > `date +'%s'`
	sleep 1
done
export old_user=0 old_nice=0 old_sys=0 old_idle=0 old_iowait=0; 
grep cpu0 * | while read cpu user nice sys idle iowait rest; 
do 
	echo $cpu $(($user-$old_user)) $(($nice-$old_nice)) $(($sys-$old_sys)) $(($idle-$old_idle)) $(($iowait-$old_iowait))
	old_user=$user old_nice=$nice old_sys=$sys old_idle=$idle old_iowait=$iowait
done

Mostly no workload (idle desktop) - a few seconds of busy loop:
1322516060:cpu0 621150 1978 148367 299773 196163
1322516061:cpu0 4 0 3 92 0
1322516062:cpu0 16 0 9 79 0
1322516063:cpu0 0 0 0 97 0
1322516064:cpu0 70 0 2 28 0    << Busy loop started
1322516065:cpu0 100 0 0 0 0
1322516066:cpu0 100 0 0 0 0
1322516067:cpu0 41 0 1 58 0    << Busy loop finished
1322516068:cpu0 0 0 2 96 0
1322516069:cpu0 1 0 2 97 0
1322516070:cpu0 100 0 0 0 0
1322516071:cpu0 42 0 1 58 0
1322516072:cpu0 0 0 2 97 0
1322516073:cpu0 1 0 2 97 0
1322516074:cpu0 1 0 1 98 0
1322516075:cpu0 2 0 1 97 0
1322516076:cpu0 1 0 1 91 7
1322516077:cpu0 1 0 0 97 0
1322516078:cpu0 0 0 0 97 0
1322516079:cpu0 2 0 1 97 0
1322516080:cpu0 0 0 1 97 1
1322516081:cpu0 1 0 4 90 4
1322516082:cpu0 2 0 0 97 0
1322516083:cpu0 1 0 2 98 0
1322516084:cpu0 2 0 1 96 0
1322516085:cpu0 0 0 2 98 0
1322516086:cpu0 1 0 1 91 7
1322516087:cpu0 0 0 0 97 0
1322516088:cpu0 1 0 0 97 0
1322516089:cpu0 1 0 1 100 0

Which looks correct (matches USER_HZ 100) to me.
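
(One way to double-check that reading: with samples taken one second apart,
the deltas in each row should add up to roughly USER_HZ, and none of them
should be negative.  A small sketch, assuming delta lines of the
"timestamp:cpuN user nice sys idle iowait" shape printed above; the name
check_deltas and the default slack of 5 ticks are arbitrary choices for
illustration, and the first line, which holds the absolute counters, has to
be dropped before feeding the rest in.)

```shell
#!/bin/sh
# check_deltas [EXPECTED] [TOLERANCE]
# Reads "label user nice sys idle iowait" delta lines on stdin and prints
# the rows whose sum strays from EXPECTED ticks or that hold a negative delta.
# Hypothetical helper; EXPECTED defaults to 100 (USER_HZ assumed to be 100).
check_deltas() {
	expected=${1:-100}
	tolerance=${2:-5}
	low=$(( expected - tolerance ))
	high=$(( expected + tolerance ))
	while read -r label user nice sys idle iowait rest; do
		sum=$(( user + nice + sys + idle + iowait ))
		if [ "$sum" -lt "$low" ] || [ "$sum" -gt "$high" ]; then
			echo "suspect (sum=$sum): $label"
			continue
		fi
		# The sum can look fine while one counter went backwards.
		for v in "$user" "$nice" "$sys" "$idle" "$iowait"; do
			if [ "$v" -lt 0 ]; then
				echo "suspect (negative delta): $label"
				break
			fi
		done
	done
}
```

For example, with the delta lines saved in a (hypothetical) file deltas.txt,
"tail -n +2 deltas.txt | check_deltas" prints nothing when every row is
plausible.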
Governors are updating those values, and the idle driver might be relevant as well.
Here is my setting:
$ grep . -r /sys/devices/system/cpu/cpuidle/
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:menu
$ grep . -r /sys/devices/system/cpu/cpufreq/
/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate_min:10000
/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate:10000
/sys/devices/system/cpu/cpufreq/ondemand/up_threshold:95
/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor:1
/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load:0
/sys/devices/system/cpu/cpufreq/ondemand/powersave_bias:0
/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy:1

> 
> Rafael

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-28 21:41           ` Michal Hocko
@ 2011-11-28 21:43             ` Michal Hocko
  2011-11-28 21:48             ` Michal Hocko
  2011-11-29  8:14             ` Michal Hocko
  2 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2011-11-28 21:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Tino Keitel, linux-kernel, Artem S. Tashkinov

On Mon 28-11-11 22:41:25, Michal Hocko wrote:
> Hi,
> 
> On Mon 28-11-11 21:19:26, Rafael J. Wysocki wrote:
> > On Monday, November 28, 2011, Tino Keitel wrote:
> > > On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, November 27, 2011, Tino Keitel wrote:
> > > > > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > > > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > > > > under this kernel:
> > > > > > 
> > > > > > I get the same using top, htop and the gnome system monitor with kernel
> > > > > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > > > > 
> > > > > I just tested 3.2-rc2, and see the same bug.
> > > > 
> > > > I'm seeing that too on one of my test boxes, but not all the time
> > > > (i.e. there are periods in which the readings are correct).  The other boxes
> > > > I've tested with 3.2-rc are fine in that respect.
> > > > 
> > > > Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> > > > like there's an overflow somewhere in the CPU load measuring code, at least
> > > > on some CPUs.
> > > 
> > > Hi,
> > > 
> > > I reverted this commit and so far it looks good:
> > > 
> > > commit a25cac5198d4ff2842ccca63b423962848ad24b2
> > > Author: Michal Hocko <mhocko@suse.cz>
> > > Date:   Wed Aug 24 09:40:25 2011 +0200
> > > 
> > >     proc: Consider NO_HZ when printing idle and iowait times
> > > 
> > > I'll report back tomorrow how the kernel behaves.
> > 
> > Hmm.  Michal, can you have a look at that, please?
> 
> Hmm, my testing didn't show anything like that. Could you post
> cat /proc/stat collected every second during 30s or so?
> 
> Here is the output of my run with 3.2.0-rc3-00004-gdd38d29 and the attached config:
> for i in `seq 30`; 
> do 
> 	cat /proc/stat > `date +'%s'`
> 	sleep 1
> done
> export old_user=0 old_nice=0 old_sys=0 old_idle=0 old_iowait=0; 
> grep cpu0 * | while read cpu user nice sys idle iowait rest; 
> do 
> 	echo $cpu $(($user-$old_user)) $(($nice-$old_nice)) $(($sys-$old_sys)) $(($idle-$old_idle)) $(($iowait-$old_iowait))
> 	old_user=$user old_nice=$nice old_sys=$sys old_idle=$idle old_iowait=$iowait
> done
> 
> Mostly no workload (idle desktop) - a few seconds of busy loop:
> 1322516060:cpu0 621150 1978 148367 299773 196163
> 1322516061:cpu0 4 0 3 92 0
> 1322516062:cpu0 16 0 9 79 0
> 1322516063:cpu0 0 0 0 97 0
[...]

Forgot to add, but cpu1 looks similar
1322516060:cpu1 641344 832 137307 132871 44144
1322516061:cpu1 4 0 4 92 0
1322516062:cpu1 19 0 11 74 0
1322516063:cpu1 2 0 2 96 0
1322516064:cpu1 7 0 4 89 0
1322516065:cpu1 0 0 0 97 0
1322516066:cpu1 2 0 3 88 6
1322516067:cpu1 59 0 1 40 0
1322516068:cpu1 101 0 0 0 0
1322516069:cpu1 100 0 0 0 0
1322516070:cpu1 1 0 1 96 0
1322516071:cpu1 1 0 3 90 7
1322516072:cpu1 2 0 0 97 0
1322516073:cpu1 1 0 1 98 0
1322516074:cpu1 1 0 3 97 0
1322516075:cpu1 0 0 0 98 0
1322516076:cpu1 2 0 1 98 0
1322516077:cpu1 1 0 2 98 0
1322516078:cpu1 0 0 1 99 0
1322516079:cpu1 1 0 2 99 0
1322516080:cpu1 0 0 1 98 0
1322516081:cpu1 1 0 1 98 0
1322516082:cpu1 1 0 2 98 0
1322516083:cpu1 0 0 1 99 0
1322516084:cpu1 2 0 1 98 0
1322516085:cpu1 1 0 2 97 0
1322516086:cpu1 1 0 0 99 0
1322516087:cpu1 0 0 2 97 0
1322516088:cpu1 2 0 2 98 0
1322516089:cpu1 1 0 1 97 0


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-28 21:41           ` Michal Hocko
  2011-11-28 21:43             ` Michal Hocko
@ 2011-11-28 21:48             ` Michal Hocko
  2011-11-29  8:14             ` Michal Hocko
  2 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2011-11-28 21:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Tino Keitel, linux-kernel, Artem S. Tashkinov

[-- Attachment #1: Type: text/plain, Size: 133 bytes --]

And the forgotten config. Sorry...
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

[-- Attachment #2: config.gz --]
[-- Type: application/octet-stream, Size: 17157 bytes --]


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-28 21:41           ` Michal Hocko
  2011-11-28 21:43             ` Michal Hocko
  2011-11-28 21:48             ` Michal Hocko
@ 2011-11-29  8:14             ` Michal Hocko
  2 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2011-11-29  8:14 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Tino Keitel, linux-kernel, Artem S. Tashkinov

[-- Attachment #1: Type: text/plain, Size: 3182 bytes --]

On Mon 28-11-11 22:41:25, Michal Hocko wrote:
> Hi,
> 
> On Mon 28-11-11 21:19:26, Rafael J. Wysocki wrote:
> > On Monday, November 28, 2011, Tino Keitel wrote:
> > > On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, November 27, 2011, Tino Keitel wrote:
> > > > > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > > > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > > > > under this kernel:
> > > > > > 
> > > > > > I get the same using top, htop and the gnome system monitor with kernel
> > > > > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > > > > 
> > > > > I just tested 3.2-rc2, and see the same bug.
> > > > 
> > > > I'm seeing that too on one of my test boxes, but not all the time
> > > > (i.e. there are periods in which the readings are correct).  The other boxes
> > > > I've tested with 3.2-rc are fine in that respect.
> > > > 
> > > > Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> > > > like there's an overflow somewhere in the CPU load measuring code, at least
> > > > on some CPUs.
> > > 
> > > Hi,
> > > 
> > > I reverted this commit and so far it looks good:
> > > 
> > > commit a25cac5198d4ff2842ccca63b423962848ad24b2
> > > Author: Michal Hocko <mhocko@suse.cz>
> > > Date:   Wed Aug 24 09:40:25 2011 +0200
> > > 
> > >     proc: Consider NO_HZ when printing idle and iowait times
> > > 
> > > I'll report back tomorrow how the kernel behaves.
> > 
> > Hmm.  Michal, can you have a look at that, please?
> 
> Hmm, my testing didn't show anything like that. Could you post
> cat /proc/stat collected every second during 30s or so?
> 
> Here is the output of my run with 3.2.0-rc3-00004-gdd38d29 and the attached config:
> for i in `seq 30`; 
> do 
> 	cat /proc/stat > `date +'%s'`
> 	sleep 1
> done
> export old_user=0 old_nice=0 old_sys=0 old_idle=0 old_iowait=0; 
> grep cpu0 * | while read cpu user nice sys idle iowait rest; 
> do 
> 	echo $cpu $(($user-$old_user)) $(($nice-$old_nice)) $(($sys-$old_sys)) $(($idle-$old_idle)) $(($iowait-$old_iowait))
> 	old_user=$user old_nice=$nice old_sys=$sys old_idle=$idle old_iowait=$iowait
> done
> 

Same results (attached) on an x86_64 box with 16 AMD CPUs in my lab, with a
different cpuidle driver:
$ grep . -r /sys/devices/system/cpu/cpuidle/
/sys/devices/system/cpu/cpuidle/current_driver:none
/sys/devices/system/cpu/cpuidle/current_governor_ro:menu
$ grep . -r /sys/devices/system/cpu/cpufreq/
/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate_min:10000
/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate:38000
/sys/devices/system/cpu/cpufreq/ondemand/up_threshold:40
/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor:1
/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load:0
/sys/devices/system/cpu/cpufreq/ondemand/powersave_bias:0
/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy:0

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

[-- Attachment #2: config.gz --]
[-- Type: application/octet-stream, Size: 33394 bytes --]

[-- Attachment #3: amd_16cpus.tar.bz2 --]
[-- Type: application/octet-stream, Size: 3958 bytes --]


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-28 19:55       ` Tino Keitel
  2011-11-28 20:19         ` Rafael J. Wysocki
@ 2011-11-29 21:25         ` Tino Keitel
  1 sibling, 0 replies; 15+ messages in thread
From: Tino Keitel @ 2011-11-29 21:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Rafael J. Wysocki, Artem S. Tashkinov

On Mon, Nov 28, 2011 at 20:55:34 +0100, Tino Keitel wrote:
> On Sun, Nov 27, 2011 at 12:45:57 +0100, Rafael J. Wysocki wrote:
> > On Sunday, November 27, 2011, Tino Keitel wrote:
> > > On Thu, Nov 24, 2011 at 21:05:53 +0100, Tino Keitel wrote:
> > > > On Thu, Nov 24, 2011 at 10:30:15 +0000, Artem S. Tashkinov wrote:
> > > > > Hello,
> > > > > 
> > > > > I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all CPU metering applications have gone terribly mad
> > > > > under this kernel:
> > > > 
> > > > I get the same using top, htop and the gnome system monitor with kernel
> > > > 3.2 on a Sandy Bridge quad core box, running Debian unstable.
> > > 
> > > I just tested 3.2-rc2, and see the same bug.
> > 
> > I'm seeing that too on one of my test boxes, but not all the time
> > (i.e. there are periods in which the readings are correct).  The other boxes
> > I've tested with 3.2-rc are fine in that respect.
> > 
> > Also, it seems that it shows 100%-(real load) when it is wrong.  So, it looks
> > like there's an overflow somewhere in the CPU load measuring code, at least
> > on some CPUs.
> 
> Hi,
> 
> I reverted this commit and so far it looks good:
> 
> commit a25cac5198d4ff2842ccca63b423962848ad24b2
> Author: Michal Hocko <mhocko@suse.cz>
> Date:   Wed Aug 24 09:40:25 2011 +0200
> 
>     proc: Consider NO_HZ when printing idle and iowait times
> 
> I'll report back tomorrow how the kernel behaves.

Hi,

looks fine so far with the git tree from commit
401d0069cb344f401bc9d264c31db55876ff78c0 and
a25cac5198d4ff2842ccca63b423962848ad24b2 reverted.

Regards,
Tino


* Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers
  2011-11-24 10:30 [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers Artem S. Tashkinov
  2011-11-24 20:05 ` Tino Keitel
@ 2011-11-29 21:16 ` Maciej Rutecki
  1 sibling, 0 replies; 15+ messages in thread
From: Maciej Rutecki @ 2011-11-29 21:16 UTC (permalink / raw)
  To: Artem S. Tashkinov; +Cc: linux-kernel

On Thursday, 24 November 2011 at 11:30:15 Artem S. Tashkinov wrote:
> Hello,
> 
> I'd like to report a weird regression in Linux 3.2 (running rc3 now) - all
> CPU metering applications have gone terribly mad under this kernel:
> 
> Here are two text snapshots of htop:
> 
>   1  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Tasks: 135 total, 1 running
>   2  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load average: 0.00 0.01 0.05
>   3  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Uptime: 00:27:49
>   4  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load: 0.00
>   Mem[||||||||||||                                              544/8093MB]     Avg[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
> 
>  IORR  IOWR    IO   PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
>     0     0     0  3529 root      20   0 70052 52244 19324 S 300.  0.6  0:25.93 /usr/bin/X -br
>     0     0     0  3678 user      20   0 33952 10220  8448 S 100.  0.1  0:03.58 gkrellm
>     0     0     0  3772 user      20   0 32144 13960  9532 S 100.  0.2  0:07.46 konsole [kdeinit] --noxft
>     0     0     0  6061 user      20   0  2780  1276   960 R 100.  0.0  0:00.01 htop
>     0     0     0     1 root      20   0  2884  1376  1168 S  0.0  0.0  0:01.00 /sbin/init
>     0     0     0     2 root      20   0     0     0     0 S  0.0  0.0  0:00.00 kthreadd
> 
> 
>   1  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Tasks: 135 total, 1 running
>   2  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load average: 0.00 0.01 0.05
>   3  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Uptime: 00:28:43
>   4  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]     Load: 0.00
>   Mem[||||||||||||                                              545/8093MB]     Avg[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
> 
>  IORR  IOWR    IO   PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
>     0     0     0  6061 user      20   0  2780  1288   972 R 57.0  0.0  0:00.11 htop
>     0     0     0  5243 user      20   0  559M  318M 30928 S 57.0  3.9  0:26.07 /opt/firefox/firefox
>     0     0     0  3687 user      20   0  108M 39124 23564 S 57.0  0.5  0:06.50 kedit
>     0     0     0  3637 user      20   0 26960  5572  4288 S 57.0  0.1  0:00.02 klauncher [kdeinit] --new-startup
>     0     0     0  3529 root      20   0 70052 52244 19324 S  0.0  0.6  0:26.71 /usr/bin/X -br
> 
> Interestingly with this madness going on, the internal kernel average load
> counter works properly:
> 
> [root@localhost ~]# uptime
>  16:20:38 up 44 min,  2 users,  load average: 0.02, 0.04, 0.05
> 
> Right at this moment all process viewers report 400% CPU load (I have 4 CPU
> cores), either 400% loaded by user processes or 200% by system and 200% by
> user processes.
> 
> Graphically it looks this way:
> http://img717.imageshack.us/img717/6495/top2b.png
> 
> I'm not running any CPU intensive applications at the moment at all.
> 
> My .config file can be downloaded here: http://ompldr.org/iYmZneg
> 
> My distro is Fedora 14 i686.
> 
> Best wishes,
> 
> Artem

Seems to be similar to:
http://marc.info/?l=linux-kernel&m=132156832124314&w=2
http://marc.info/?l=linux-kernel&m=132164909313594&w=2

Regards
-- 
Maciej Rutecki
http://www.mrutecki.pl


end of thread, other threads:[~2011-11-29 21:25 UTC | newest]

Thread overview: 15+ messages
2011-11-24 10:30 [REGRESSION] [Linux 3.2] top/htop and all other CPU usage metering applications has gone crackers Artem S. Tashkinov
2011-11-24 20:05 ` Tino Keitel
2011-11-27 11:04   ` Tino Keitel
2011-11-27 11:45     ` Rafael J. Wysocki
2011-11-27 11:45       ` Tino Keitel
2011-11-27 11:56         ` Rafael J. Wysocki
2011-11-27 11:57       ` Artem S. Tashkinov
2011-11-28 19:55       ` Tino Keitel
2011-11-28 20:19         ` Rafael J. Wysocki
2011-11-28 21:41           ` Michal Hocko
2011-11-28 21:43             ` Michal Hocko
2011-11-28 21:48             ` Michal Hocko
2011-11-29  8:14             ` Michal Hocko
2011-11-29 21:25         ` Tino Keitel
2011-11-29 21:16 ` Maciej Rutecki
