* wrong count of CPU usage
From: Azat Khuzhin @ 2012-05-16 14:33 UTC
To: Linux Kernel Mailing List
Hi all,
I have a machine with an uptime of 212 days, kernel version: Linux
2.6.35.14-97.fc14.x86_64 #1 SMP Sat Sep 17 00:15:37 UTC 2011 x86_64
x86_64 x86_64 GNU/Linux
Previously, top(1) showed per-process activity, as it should
(sorted by the %CPU column).
But since some point, top(1) shows no per-process activity at all:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 19420 1548 1240 S  0.0  0.0  17:48.09 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:09.72 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0  20:34.76 ksoftirqd/0
But the load average is > 10, and top(1) does show activity on the cores
(I do not mean the processes created by the kernel):
top - 09:58:26 up 212 days, 7:06, 8 users, load average: 13.27, 12.20, 11.37
Tasks: 815 total, 9 running, 803 sleeping, 2 stopped, 1 zombie
Cpu0 : 47.5%us, 8.0%sy, 0.3%ni, 26.4%id, 16.9%wa, 0.0%hi, 0.9%si, 0.0%st
Cpu1 : 39.7%us, 5.2%sy, 0.0%ni, 48.0%id, 6.8%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 29.8%us, 3.1%sy, 0.0%ni, 59.7%id, 7.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 15.9%us, 2.8%sy, 0.0%ni, 77.3%id, 3.7%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 39.3%us, 5.2%sy, 0.0%ni, 37.4%id, 17.8%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 29.6%us, 4.0%sy, 0.0%ni, 62.3%id, 3.7%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu6 : 20.4%us, 5.0%sy, 0.0%ni, 74.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu7 : 17.3%us, 2.8%sy, 0.0%ni, 76.8%id, 3.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 32.8%us, 6.5%sy, 0.0%ni, 31.6%id, 29.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 27.6%us, 3.1%sy, 0.0%ni, 69.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 6.8%us, 2.5%sy, 0.0%ni, 87.9%id, 2.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 8.7%us, 1.9%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 27.3%us, 2.8%sy, 0.0%ni, 52.8%id, 17.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 18.2%us, 3.7%sy, 0.0%ni, 77.5%id, 0.6%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 21.4%us, 3.8%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 10.0%us, 1.9%sy, 0.0%ni, 87.9%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
So it seems that after some time the kernel stops updating /proc/PID/stat properly.
I wrote a shell script (attached) that computes CPU usage, and it also shows 0.
The script is based on columns 14 and 15 of /proc/PID/stat (according to
man proc, these are the utime and stime clocks) and on the /proc/stat
summary, but for newly created processes
Summary:
/proc/PID/stat: the utime and stime counters do not seem to change
/proc/stat: is updated
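A minimal sketch of reading those two columns, assuming the field layout described in proc(5). The function name stat_cpu_ticks is hypothetical and is not part of the attached script. It strips everything up to the closing parenthesis of the command name first, so a command name containing spaces does not shift the column numbers.

```shell
#!/bin/bash
# stat_cpu_ticks: read one line of /proc/PID/stat on stdin and print
# "utime stime" (fields 14 and 15 in proc(5)).
# Field 2 (the command name) is wrapped in parentheses and may contain
# spaces, so everything up to the last ") " is removed before counting.
stat_cpu_ticks() {
    local line rest
    IFS= read -r line || return 1
    rest=${line##*') '}     # rest now starts at field 3 (state)
    set -- $rest            # split the remaining fields into $1, $2, ...
    # Field 14 is the 12th remaining word, field 15 the 13th.
    echo "${12} ${13}"
}
```

It would be called as `stat_cpu_ticks < /proc/$PID/stat`; sampling it twice and comparing the output shows whether the counters advance.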
--
Azat Khuzhin
[-- Attachment #2: cpu_usage --]
#!/bin/bash
# Usage: cpu_usage PID
# Prints how many clock ticks PID spent in user mode (utime) and in
# kernel mode (stime) during one second, and that count as a percentage
# of all ticks on the aggregate "cpu" line of /proc/stat.
PID=$1
if [ -z "$PID" ] ; then
    echo "usage: $0 PID" >&2
    exit 1
fi
# Sum of every counter on the first ("cpu") line of /proc/stat.
all_ticks() {
    head -n1 /proc/stat | awk '{ for (i = 2; i <= NF; ++i) sum += $i } END { printf "%.0f\n", sum }'
}
# Sample column $1 of /proc/PID/stat over one second and report the delta.
# Plain column counting assumes the command name contains no spaces.
measure() {
    local col=$1 start end all_start all_end
    start=$(awk -v c="$col" '{ print $c }' "/proc/$PID/stat")
    all_start=$(all_ticks)
    sleep 1
    end=$(awk -v c="$col" '{ print $c }' "/proc/$PID/stat")
    all_end=$(all_ticks)
    echo "[process] jiffies for 1 sec: $(( end - start ))"
    echo "[    all] jiffies for 1 sec: $(( all_end - all_start ))"
    echo "[   cpu%] $(echo "scale=5; 100 * ($end - $start) / ($all_end - $all_start)" | bc) %"
}
measure 14   # utime
measure 15   # stime
* Re: wrong count of CPU usage
From: Azat Khuzhin @ 2012-05-21 20:20 UTC
To: Linux Kernel Mailing List
Anybody?
BTW, I restarted the server and CPU usage is counted correctly now, as I expected.
But maybe this is a kernel bug, where after a while CPU usage is counted wrong?
--
Azat Khuzhin
* Re: [utime/stime times have stalled]
From: Walter Doekes @ 2012-06-13 9:24 UTC
To: linux-kernel; +Cc: wjdoekes, dohardgopro
Hi,
(I hope this lands in the right thread, since I had to set the
In-Reply-To by hand. Further, I'm not subscribed, so I'd like replies to
go to me too.)
Azat Khuzhin writes that his process-specific utime/stime times are
zero. So are mine.
On a normal system, a basic loop of 1 billion increments gives:
$ time ./manyops
real 0m2.343s
user 0m2.340s
sys 0m0.000s
On the system where I have issues, it looks like this:
# time ./manyops
real 0m2.936s
user 0m0.000s <-- 0 ??
sys 0m0.000s
Like Azat also writes, the CPU times in /proc/stat are ok. There is CPU
usage. And when I run an infinite loop, I see 100% cpu on one of the cores.
But the process-specific utime/stime/cutime/cstime values found in
/proc/<pid>/stat have *all* *stalled*.
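For comparison, the per-core figures that still work can be derived from two samples of a "cpu" line in /proc/stat. The following is only a sketch under the column order given in proc(5) (user nice system idle iowait irq softirq steal); the function name cpu_busy_pct is hypothetical, and any guest columns after steal are ignored.

```shell
#!/bin/bash
# cpu_busy_pct BEFORE AFTER
# BEFORE and AFTER are one "cpu..." line of /proc/stat each, taken at two
# points in time. Prints the integer percentage of non-idle ticks between
# the samples; idle and iowait both count as "not busy".
cpu_busy_pct() {
    local u1 n1 s1 i1 w1 q1 sq1 st1 u2 n2 s2 i2 w2 q2 sq2 st2 rest busy idle total
    # First word is the label (cpu, cpu0, ...); trailing columns land in rest.
    read -r rest u1 n1 s1 i1 w1 q1 sq1 st1 rest <<EOF
$1
EOF
    read -r rest u2 n2 s2 i2 w2 q2 sq2 st2 rest <<EOF
$2
EOF
    busy=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) + (q2 - q1) + (sq2 - sq1) + (${st2:-0} - ${st1:-0}) ))
    idle=$(( (i2 - i1) + (w2 - w1) ))
    total=$(( busy + idle ))
    if [ "$total" -le 0 ]; then echo 0; return; fi
    echo $(( 100 * busy / total ))
}
```

A call could look like `a=$(grep '^cpu0 ' /proc/stat); sleep 1; b=$(grep '^cpu0 ' /proc/stat); cpu_busy_pct "$a" "$b"`. A non-zero result here alongside unchanged utime/stime in /proc/<pid>/stat would match the symptom described above.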
Today is 13th of June. The latest processes that still have any utime at
all were started a month ago. No times are increasing anywhere.
# for x in /proc/[0-9]* ; do
    l=`cat $x/stat`
    num=`echo $l|cut -d' ' -f14`
    [ $num == 0 ] && continue
    pid=`echo $l|cut -d' ' -f1`
    nam=`echo $l|cut -d' ' -f2`
    jif=`echo $l|cut -d' ' -f22`
    printf "%-15s %d (pid=%s)\n" $nam $jif $pid
  done | sort
...
(cron) 37188 (pid=3932)
(events/0) 145 (pid=51)
(events/11) 145 (pid=62)
...
(events/9) 145 (pid=60)
(init) 2 (pid=1)
(kblockd/0) 146 (pid=90)
...
(ntpd) 1779238631 (pid=18489)
(rsyslogd) 36487 (pid=3436)
(saslauthd) 36520 (pid=3474)
...
(sshd) 1778081256 (pid=15197)
(supervisord) 37404 (pid=4088)
...
# ps faxu | egrep ' (18489|15197) ' | awk '{print $9 " " $11}'
May14 /usr/sbin/sshd
May14 /usr/sbin/ntpd
The next-oldest process is from May21, and that one has the times set to
zero:
# cat /proc/14396/stat
14396 (nrpe) S 1 14396 14396 0 -1 4202816 110426 2933382 0 8663 0 0 0
0 20 0 1 0 1839481161 25292800 165 18446744073709551615 4194304 4228372
140735909835936 140735909828360 139786175811971 0 0 0 16389 0 0 0 17 4 0
0 2008199 0 0
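The starttime values (field 22) can be turned into "days after boot" to see when the affected processes were started. This is a small sketch assuming 100 clock ticks per second, which is what `getconf CLK_TCK` typically reports on x86 but should be checked on the system in question; the function name ticks_to_days is hypothetical.

```shell
#!/bin/bash
# ticks_to_days TICKS [HZ]
# Convert a starttime value (field 22 of /proc/PID/stat, in clock ticks
# since boot) to whole days since boot. HZ defaults to 100, which is an
# assumption; `getconf CLK_TCK` gives the real value.
ticks_to_days() {
    local ticks=$1 hz=${2:-100}
    echo $(( ticks / hz / 86400 ))
}
```

Under that assumption, `ticks_to_days 1839481161` (nrpe) gives 212 and `ticks_to_days 1779238631` (ntpd) gives 205. Against the uptime of 235 days that is roughly 23 and 30 days before this report, which agrees with the May21 and May14 start dates shown by ps.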
I haven't tested what happens when we reboot. The customer hasn't given
us permission to do so yet. But it wouldn't surprise me if the
counters started to work again, as they did for Azat.
Can anyone think of any reason why the utime/stime counters fail to work
after a while?
System specs:
CPU: 16 core(?) "Intel(R) Xeon(R) CPU E5620 @ 2.40GHz"
Mem: 16GB
Kernel: 2.6.32-5-amd64 (debian squeeze 2.6.32-45)
Uptime: 11:14:35 up 235 days, 20:54
Are there any other relevant details you might need?
Kind regards,
Walter Doekes
OSSO B.V.
P.S. The cluster of applications on the system is suffering from
unexplained slowness. It is probably unrelated to the problem, but
without the utime/stime it is hard to track down where the bottleneck is.