* IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4
@ 2009-04-23 14:19 Andrew Gallatin
2009-04-28 23:30 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Gallatin @ 2009-04-23 14:19 UTC (permalink / raw)
To: linux-kernel; +Cc: Rick Jones, Brice Goglin
When running netperf for some 10GbE tests, I noticed
that IRQ and SOFTIRQ CPU time is no longer reported for an
otherwise idle CPU on recent kernels, at least for x86_64.
If I take a 2-CPU system, and bind the NIC IRQ to CPU0, and
bind the user-space netserver daemon to CPU1, the problem
is obvious when blasting 10Gb/s of traffic at it. I see
no CPU used for irq or softirq on CPU0, even though it is
handling 13K interrupts/sec:
% mpstat -P 0 1
Linux 2.6.30-rc1 (venice) 04/22/09
11:25:25 CPU %user %nice %system %iowait %irq %soft %idle
intr/s
11:25:26 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00
13248.00
11:25:27 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00
13280.00
Common sense tells me that is wrong, and oprofile verifies there is
a lot happening on CPU0. Further, when I run a cpu-soaker in
usermode bound to CPU0, I start to see irq, softirq, etc,
being correctly identified:
11:28:02 CPU %user %nice %system %iowait %irq %soft %idle
intr/s
11:28:03 0 45.10 0.00 0.00 0.00 1.96 52.94 0.00
13019.61
11:28:04 0 46.46 0.00 0.00 0.00 2.02 51.52 0.00
13414.14
The problem is observable, but much less obvious when using a more
common, e1000 1GbE NIC (15% softirq is missing, rather than 50%).
I spent a few hours git-bisecting until I finally got here:
% git-bisect bad
Bisecting: 0 revisions left to test after this
[457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4] fix scaled & unscaled cputime
accounting
I have neither CONFIG_NO_HZ, CONFIG_VIRT_CPU_ACCOUNTING, or XEN configured.
Drew
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 2009-04-23 14:19 IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 Andrew Gallatin @ 2009-04-28 23:30 ` Andrew Morton 2009-04-29 5:01 ` Eric Dumazet 2009-04-29 8:02 ` IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 Martin Schwidefsky 0 siblings, 2 replies; 14+ messages in thread From: Andrew Morton @ 2009-04-28 23:30 UTC (permalink / raw) To: Andrew Gallatin Cc: linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar (cc's added) On Thu, 23 Apr 2009 10:19:38 -0400 Andrew Gallatin <gallatin@myri.com> wrote: > When running netperf for some 10GbE tests, I noticed > that IRQ and SOFTIRQ CPU time is no longer reported for an > otherwise idle CPU on recent kernels, at least for x86_64. > > If I take a 2-CPU system, and bind the NIC IRQ to CPU0, and > bind the user-space netserver daemon to CPU1, the problem > is obvious when blasting 10Gb/s of traffic at it. I see > no CPU used for irq or softirq on CPU0, even though it is > handling 13K interrupts/sec: > > > > % mpstat -P 0 1 > Linux 2.6.30-rc1 (venice) 04/22/09 > > 11:25:25 CPU %user %nice %system %iowait %irq %soft %idle > intr/s > 11:25:26 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 13248.00 > 11:25:27 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 13280.00 > > Common sense tells me that is wrong, and oprofile verifies there is > a lot happening on CPU0. Further, when I run a cpu-soaker in > usermode bound to CPU0, I start to see irq, softirq, etc, > being correctly identified: > > 11:28:02 CPU %user %nice %system %iowait %irq %soft %idle > intr/s > 11:28:03 0 45.10 0.00 0.00 0.00 1.96 52.94 0.00 > 13019.61 > 11:28:04 0 46.46 0.00 0.00 0.00 2.02 51.52 0.00 > 13414.14 > > The problem is observable, but much less obvious when using a more > common, e1000 1GbE NIC (15% softirq is missing, rather than 50%). > > I spent a few hours git-bisecting until I finally got here: > > % git-bisect bad > Bisecting: 0 revisions left to test after this > [457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4] fix scaled & unscaled cputime > accounting > > I have neither CONFIG_NO_HZ, CONFIG_VIRT_CPU_ACCOUNTING, or XEN configured. > Thanks for doing the bisection. 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 was merged late last year, so this regression has been around for a while. We might have fixed it in more recent kernels - stranger things have happened ;) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 2009-04-28 23:30 ` Andrew Morton @ 2009-04-29 5:01 ` Eric Dumazet 2009-04-29 7:46 ` [PATCH] sched: account system time properly Eric Dumazet 2009-04-29 8:02 ` IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 Martin Schwidefsky 1 sibling, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2009-04-29 5:01 UTC (permalink / raw) To: Andrew Morton Cc: Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar Andrew Morton a écrit : > (cc's added) > > On Thu, 23 Apr 2009 10:19:38 -0400 > Andrew Gallatin <gallatin@myri.com> wrote: > >> When running netperf for some 10GbE tests, I noticed >> that IRQ and SOFTIRQ CPU time is no longer reported for an >> otherwise idle CPU on recent kernels, at least for x86_64. >> >> If I take a 2-CPU system, and bind the NIC IRQ to CPU0, and >> bind the user-space netserver daemon to CPU1, the problem >> is obvious when blasting 10Gb/s of traffic at it. I see >> no CPU used for irq or softirq on CPU0, even though it is >> handling 13K interrupts/sec: >> >> >> >> % mpstat -P 0 1 >> Linux 2.6.30-rc1 (venice) 04/22/09 >> >> 11:25:25 CPU %user %nice %system %iowait %irq %soft %idle >> intr/s >> 11:25:26 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >> 13248.00 >> 11:25:27 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >> 13280.00 >> >> Common sense tells me that is wrong, and oprofile verifies there is >> a lot happening on CPU0. Further, when I run a cpu-soaker in >> usermode bound to CPU0, I start to see irq, softirq, etc, >> being correctly identified: >> >> 11:28:02 CPU %user %nice %system %iowait %irq %soft %idle >> intr/s >> 11:28:03 0 45.10 0.00 0.00 0.00 1.96 52.94 0.00 >> 13019.61 >> 11:28:04 0 46.46 0.00 0.00 0.00 2.02 51.52 0.00 >> 13414.14 >> >> The problem is observable, but much less obvious when using a more >> common, e1000 1GbE NIC (15% softirq is missing, rather than 50%). >> >> I spent a few hours git-bisecting until I finally got here: >> >> % git-bisect bad >> Bisecting: 0 revisions left to test after this >> [457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4] fix scaled & unscaled cputime >> accounting >> >> I have neither CONFIG_NO_HZ, CONFIG_VIRT_CPU_ACCOUNTING, or XEN configured. >> > > Thanks for doing the bisection. > > 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 was merged late last year, so > this regression has been around for a while. > > We might have fixed it in more recent kernels - stranger things have > happened ;) > Same problem here on 32 bit current git kernel, 8 cpus machine. Easily reproductible with a ping flood, NIC interrupts bonded to CPU0 Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 396260 474568 2881560 0 0 0 4 43642 2859 0 0 100 0 0 0 0 396260 474568 2881560 0 0 0 0 55802 2840 0 0 100 0 0 0 0 396260 474576 2881560 0 0 0 4 51239 2861 0 0 100 0 0 0 0 396260 474576 2881560 0 0 0 0 47848 2847 0 0 100 0 0 0 0 396260 474576 2881560 0 0 0 4 54245 2841 0 0 100 0 0 0 0 396260 474576 2881560 0 0 0 0 46500 2851 0 0 100 0 0 0 0 396260 474576 2881560 0 0 0 4 48035 2839 0 0 100 0 0 0 0 396136 474580 2881560 0 0 0 0 41994 2855 0 0 100 0 0 0 0 396136 474580 2881560 0 0 0 4 46701 2842 0 0 100 0 If load is large enough to trigger ksoftirqd, then softirq time is correctly reported. Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st CONFIG_GENERIC_TIME=y # CONFIG_GENERIC_TIME_VSYSCALL is not set CONFIG_KTIME_SCALAR=y CONFIG_TIMERFD=y # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y CONFIG_HPET_TIMER=y # CONFIG_HZ_100 is not set # CONFIG_HZ_250 is not set # CONFIG_HZ_300 is not set CONFIG_HZ_1000=y CONFIG_HZ=1000 CONFIG_X86_PM_TIMER=y # CONFIG_NETFILTER_XT_MATCH_TIME is not set CONFIG_SERIAL_8250_RUNTIME_UARTS=4 # CONFIG_HW_RANDOM_TIMERIOMEM is not set # CONFIG_HANGCHECK_TIMER is not set # CONFIG_MACHZ_WDT is not set CONFIG_PRINTK_TIME=y CONFIG_TIMER_STATS=y ping flood received on eth0 / eth1 interfaces (PCI-MSI-edge IRQS) $ cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 343 0 0 0 0 0 0 0 IO-APIC-edge timer 1: 1 0 1 0 0 0 0 0 IO-APIC-edge i8042 9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi 12: 0 0 0 1 1 1 1 1 IO-APIC-edge i8042 14: 0 0 0 0 0 0 0 0 IO-APIC-edge ide0 15: 0 0 0 0 0 0 0 0 IO-APIC-edge ide1 16: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb1 17: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2 18: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3 19: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4 22: 12 10 13 9 14 14 14 8 IO-APIC-fasteoi uhci_hcd:usb5 33: 109191 165558 162520 156506 160469 161549 162655 161780 PCI-MSI-edge cciss0 34: 91628623 0 0 0 0 0 0 0 PCI-MSI-edge eth1 36: 18517783 0 0 0 0 0 0 0 PCI-MSI-edge eth0 NMI: 227238100 165960221 154895362 165425712 111786333 116776935 114633666 111315114 Non-maskable interrupts LOC: 88434578 80673549 73277767 78134420 77183920 71046805 68905716 70869751 Local timer interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts RES: 16425 216454 73793 347021 125963 30655 47623 31300 Rescheduling interrupts CAL: 13164 13182 13184 13175 100 13169 13151 13163 Function call interrupts TLB: 45488 52740 50498 53806 57548 54933 58091 56249 TLB shootdowns TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts commit 79741dd35713ff4f6fd0eafd59fa94e8a4ba922d did : void account_process_tick(struct task_struct *p, int user_tick) { cputime_t one_jiffy = jiffies_to_cputime(1); cputime_t one_jiffy_scaled = cputime_to_scaled(one_jiffy); struct rq *rq = this_rq(); if (user_tick) account_user_time(p, one_jiffy, one_jiffy_scaled); else if (p != rq->idle) account_system_time(p, HARDIRQ_OFFSET, one_jiffy, one_jiffy_scaled); else account_idle_time(one_jiffy); } So, if IRQs are interrupting idle task, I guess if (p != rq->idle) will be false. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] sched: account system time properly 2009-04-29 5:01 ` Eric Dumazet @ 2009-04-29 7:46 ` Eric Dumazet 2009-04-29 8:08 ` Martin Schwidefsky 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2009-04-29 7:46 UTC (permalink / raw) To: Andrew Morton Cc: Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar Eric Dumazet a écrit : > Andrew Morton a écrit : >> (cc's added) >> >> On Thu, 23 Apr 2009 10:19:38 -0400 >> Andrew Gallatin <gallatin@myri.com> wrote: >> >>> When running netperf for some 10GbE tests, I noticed >>> that IRQ and SOFTIRQ CPU time is no longer reported for an >>> otherwise idle CPU on recent kernels, at least for x86_64. >>> >>> If I take a 2-CPU system, and bind the NIC IRQ to CPU0, and >>> bind the user-space netserver daemon to CPU1, the problem >>> is obvious when blasting 10Gb/s of traffic at it. I see >>> no CPU used for irq or softirq on CPU0, even though it is >>> handling 13K interrupts/sec: >>> >>> >>> >>> % mpstat -P 0 1 >>> Linux 2.6.30-rc1 (venice) 04/22/09 >>> >>> 11:25:25 CPU %user %nice %system %iowait %irq %soft %idle >>> intr/s >>> 11:25:26 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 13248.00 >>> 11:25:27 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 13280.00 >>> >>> Common sense tells me that is wrong, and oprofile verifies there is >>> a lot happening on CPU0. Further, when I run a cpu-soaker in >>> usermode bound to CPU0, I start to see irq, softirq, etc, >>> being correctly identified: >>> >>> 11:28:02 CPU %user %nice %system %iowait %irq %soft %idle >>> intr/s >>> 11:28:03 0 45.10 0.00 0.00 0.00 1.96 52.94 0.00 >>> 13019.61 >>> 11:28:04 0 46.46 0.00 0.00 0.00 2.02 51.52 0.00 >>> 13414.14 >>> >>> The problem is observable, but much less obvious when using a more >>> common, e1000 1GbE NIC (15% softirq is missing, rather than 50%). >>> >>> I spent a few hours git-bisecting until I finally got here: >>> >>> % git-bisect bad >>> Bisecting: 0 revisions left to test after this >>> [457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4] fix scaled & unscaled cputime >>> accounting >>> >>> I have neither CONFIG_NO_HZ, CONFIG_VIRT_CPU_ACCOUNTING, or XEN configured. >>> >> Thanks for doing the bisection. >> >> 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 was merged late last year, so >> this regression has been around for a while. >> >> We might have fixed it in more recent kernels - stranger things have >> happened ;) >> > > Same problem here on 32 bit current git kernel, 8 cpus machine. > Easily reproductible with a ping flood, NIC interrupts bonded to CPU0 > Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st > > vmstat 1 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 396260 474568 2881560 0 0 0 4 43642 2859 0 0 100 0 > 0 0 0 396260 474568 2881560 0 0 0 0 55802 2840 0 0 100 0 > 0 0 0 396260 474576 2881560 0 0 0 4 51239 2861 0 0 100 0 > 0 0 0 396260 474576 2881560 0 0 0 0 47848 2847 0 0 100 0 > 0 0 0 396260 474576 2881560 0 0 0 4 54245 2841 0 0 100 0 > 0 0 0 396260 474576 2881560 0 0 0 0 46500 2851 0 0 100 0 > 0 0 0 396260 474576 2881560 0 0 0 4 48035 2839 0 0 100 0 > 0 0 0 396136 474580 2881560 0 0 0 0 41994 2855 0 0 100 0 > 0 0 0 396136 474580 2881560 0 0 0 4 46701 2842 0 0 100 0 > > If load is large enough to trigger ksoftirqd, then softirq time is correctly reported. > > Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st > > CONFIG_GENERIC_TIME=y > # CONFIG_GENERIC_TIME_VSYSCALL is not set > CONFIG_KTIME_SCALAR=y > CONFIG_TIMERFD=y > # CONFIG_NO_HZ is not set > CONFIG_HIGH_RES_TIMERS=y > CONFIG_HPET_TIMER=y > # CONFIG_HZ_100 is not set > # CONFIG_HZ_250 is not set > # CONFIG_HZ_300 is not set > CONFIG_HZ_1000=y > CONFIG_HZ=1000 > CONFIG_X86_PM_TIMER=y > # CONFIG_NETFILTER_XT_MATCH_TIME is not set > CONFIG_SERIAL_8250_RUNTIME_UARTS=4 > # CONFIG_HW_RANDOM_TIMERIOMEM is not set > # CONFIG_HANGCHECK_TIMER is not set > # CONFIG_MACHZ_WDT is not set > CONFIG_PRINTK_TIME=y > CONFIG_TIMER_STATS=y > > > ping flood received on eth0 / eth1 interfaces (PCI-MSI-edge IRQS) > > $ cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > 0: 343 0 0 0 0 0 0 0 IO-APIC-edge timer > 1: 1 0 1 0 0 0 0 0 IO-APIC-edge i8042 > 9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi > 12: 0 0 0 1 1 1 1 1 IO-APIC-edge i8042 > 14: 0 0 0 0 0 0 0 0 IO-APIC-edge ide0 > 15: 0 0 0 0 0 0 0 0 IO-APIC-edge ide1 > 16: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb1 > 17: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2 > 18: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3 > 19: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4 > 22: 12 10 13 9 14 14 14 8 IO-APIC-fasteoi uhci_hcd:usb5 > 33: 109191 165558 162520 156506 160469 161549 162655 161780 PCI-MSI-edge cciss0 > 34: 91628623 0 0 0 0 0 0 0 PCI-MSI-edge eth1 > 36: 18517783 0 0 0 0 0 0 0 PCI-MSI-edge eth0 > NMI: 227238100 165960221 154895362 165425712 111786333 116776935 114633666 111315114 Non-maskable interrupts > LOC: 88434578 80673549 73277767 78134420 77183920 71046805 68905716 70869751 Local timer interrupts > SPU: 0 0 0 0 0 0 0 0 Spurious interrupts > RES: 16425 216454 73793 347021 125963 30655 47623 31300 Rescheduling interrupts > CAL: 13164 13182 13184 13175 100 13169 13151 13163 Function call interrupts > TLB: 45488 52740 50498 53806 57548 54933 58091 56249 TLB shootdowns > TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts > > > > commit 79741dd35713ff4f6fd0eafd59fa94e8a4ba922d did : > > void account_process_tick(struct task_struct *p, int user_tick) > { > cputime_t one_jiffy = jiffies_to_cputime(1); > cputime_t one_jiffy_scaled = cputime_to_scaled(one_jiffy); > struct rq *rq = this_rq(); > > if (user_tick) > account_user_time(p, one_jiffy, one_jiffy_scaled); > else if (p != rq->idle) > account_system_time(p, HARDIRQ_OFFSET, one_jiffy, > one_jiffy_scaled); > else > account_idle_time(one_jiffy); > } > > So, if IRQs are interrupting idle task, I guess if (p != rq->idle) will be false. > > Maybe following patch is needed ? [PATCH] sched: account system time properly When idle task is interrupted by an IRQ, time accounting considers CPU is idle, even while it should account for hard or softirq. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> diff --git a/kernel/sched.c b/kernel/sched.c index b902e58..26efa47 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) if (user_tick) account_user_time(p, one_jiffy, one_jiffy_scaled); - else if (p != rq->idle) + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) account_system_time(p, HARDIRQ_OFFSET, one_jiffy, one_jiffy_scaled); else ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] sched: account system time properly 2009-04-29 7:46 ` [PATCH] sched: account system time properly Eric Dumazet @ 2009-04-29 8:08 ` Martin Schwidefsky 2009-04-29 9:20 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Martin Schwidefsky @ 2009-04-29 8:08 UTC (permalink / raw) To: Eric Dumazet Cc: Andrew Morton, Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt, Ingo Molnar On Wed, 29 Apr 2009 09:46:17 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote: > Eric Dumazet a écrit : > > Andrew Morton a écrit : > > > > So, if IRQs are interrupting idle task, I guess if (p != rq->idle) will be false. > > If an IRQ interrupts the idle task the tick is supposed to be accounted as an idle tick. Only if the IRQ interrupted the system while it has been in hardirq or softirq processing then it should be accounted as system tick. > > Maybe following patch is needed ? > > [PATCH] sched: account system time properly > > When idle task is interrupted by an IRQ, time accounting considers CPU is idle, > even while it should account for hard or softirq. > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > > diff --git a/kernel/sched.c b/kernel/sched.c > index b902e58..26efa47 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) > > if (user_tick) > account_user_time(p, one_jiffy, one_jiffy_scaled); > - else if (p != rq->idle) > + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) > account_system_time(p, HARDIRQ_OFFSET, one_jiffy, > one_jiffy_scaled); > else That patch makes a lot of sense to me. Does it fix the problem? -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sched: account system time properly 2009-04-29 8:08 ` Martin Schwidefsky @ 2009-04-29 9:20 ` Eric Dumazet 2009-04-29 9:48 ` Martin Schwidefsky 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2009-04-29 9:20 UTC (permalink / raw) To: Martin Schwidefsky Cc: Andrew Morton, Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt, Ingo Molnar Martin Schwidefsky a écrit : > On Wed, 29 Apr 2009 09:46:17 +0200 > Eric Dumazet <dada1@cosmosbay.com> wrote: > >> Eric Dumazet a écrit : >>> Andrew Morton a écrit : >>> >>> So, if IRQs are interrupting idle task, I guess if (p != rq->idle) will be false. >>> > > If an IRQ interrupts the idle task the tick is supposed to be accounted > as an idle tick. Only if the IRQ interrupted the system while it has > been in hardirq or softirq processing then it should be accounted as > system tick. > >> Maybe following patch is needed ? >> >> [PATCH] sched: account system time properly >> >> When idle task is interrupted by an IRQ, time accounting considers CPU is idle, >> even while it should account for hard or softirq. >> >> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> >> >> diff --git a/kernel/sched.c b/kernel/sched.c >> index b902e58..26efa47 100644 >> --- a/kernel/sched.c >> +++ b/kernel/sched.c >> @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) >> >> if (user_tick) >> account_user_time(p, one_jiffy, one_jiffy_scaled); >> - else if (p != rq->idle) >> + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) >> account_system_time(p, HARDIRQ_OFFSET, one_jiffy, >> one_jiffy_scaled); >> else > > That patch makes a lot of sense to me. Does it fix the problem? > Yes it does, on my machine at least : 11:18:48 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle 11:18:58 AM all 0.00 0.00 0.00 0.00 0.21 0.69 0.00 0.00 99.10 11:18:58 AM 0 0.00 0.00 0.00 0.00 1.70 5.50 0.00 0.00 92.80 << HERE >> 11:18:58 AM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 11:18:58 AM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 11:18:58 AM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 11:18:58 AM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 11:18:58 AM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 11:18:58 AM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 11:18:58 AM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sched: account system time properly 2009-04-29 9:20 ` Eric Dumazet @ 2009-04-29 9:48 ` Martin Schwidefsky 2009-04-29 10:24 ` Ingo Molnar 0 siblings, 1 reply; 14+ messages in thread From: Martin Schwidefsky @ 2009-04-29 9:48 UTC (permalink / raw) To: Eric Dumazet Cc: Andrew Morton, Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt, Ingo Molnar On Wed, 29 Apr 2009 11:20:03 +0200 Eric Dumazet <dada1@cosmosbay.com> wrote: > Martin Schwidefsky a écrit : > > On Wed, 29 Apr 2009 09:46:17 +0200 > > Eric Dumazet <dada1@cosmosbay.com> wrote: > > > >> Eric Dumazet a écrit : > >>> Andrew Morton a écrit : > >>> > >>> So, if IRQs are interrupting idle task, I guess if (p != rq->idle) will be false. > >>> > > > > If an IRQ interrupts the idle task the tick is supposed to be accounted > > as an idle tick. Only if the IRQ interrupted the system while it has > > been in hardirq or softirq processing then it should be accounted as > > system tick. > > > >> Maybe following patch is needed ? > >> > >> [PATCH] sched: account system time properly > >> > >> When idle task is interrupted by an IRQ, time accounting considers CPU is idle, > >> even while it should account for hard or softirq. > >> > >> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > >> > >> diff --git a/kernel/sched.c b/kernel/sched.c > >> index b902e58..26efa47 100644 > >> --- a/kernel/sched.c > >> +++ b/kernel/sched.c > >> @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) > >> > >> if (user_tick) > >> account_user_time(p, one_jiffy, one_jiffy_scaled); > >> - else if (p != rq->idle) > >> + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) > >> account_system_time(p, HARDIRQ_OFFSET, one_jiffy, > >> one_jiffy_scaled); > >> else > > > > That patch makes a lot of sense to me. Does it fix the problem? > > > > Yes it does, on my machine at least : > > 11:18:48 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle > 11:18:58 AM all 0.00 0.00 0.00 0.00 0.21 0.69 0.00 0.00 99.10 > 11:18:58 AM 0 0.00 0.00 0.00 0.00 1.70 5.50 0.00 0.00 92.80 << HERE >> > 11:18:58 AM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 11:18:58 AM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 11:18:58 AM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 11:18:58 AM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 11:18:58 AM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 11:18:58 AM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > 11:18:58 AM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Very good. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sched: account system time properly 2009-04-29 9:48 ` Martin Schwidefsky @ 2009-04-29 10:24 ` Ingo Molnar 2009-04-29 12:44 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Ingo Molnar @ 2009-04-29 10:24 UTC (permalink / raw) To: Martin Schwidefsky Cc: Eric Dumazet, Andrew Morton, Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt * Martin Schwidefsky <schwidefsky@de.ibm.com> wrote: > On Wed, 29 Apr 2009 11:20:03 +0200 > Eric Dumazet <dada1@cosmosbay.com> wrote: > > > Martin Schwidefsky a écrit : > > > On Wed, 29 Apr 2009 09:46:17 +0200 > > > Eric Dumazet <dada1@cosmosbay.com> wrote: > > > > > >> Eric Dumazet a écrit : > > >>> Andrew Morton a écrit : > > >>> > > >>> So, if IRQs are interrupting idle task, I guess if (p != rq->idle) will be false. > > >>> > > > > > > If an IRQ interrupts the idle task the tick is supposed to be accounted > > > as an idle tick. Only if the IRQ interrupted the system while it has > > > been in hardirq or softirq processing then it should be accounted as > > > system tick. > > > > > >> Maybe following patch is needed ? > > >> > > >> [PATCH] sched: account system time properly > > >> > > >> When idle task is interrupted by an IRQ, time accounting considers CPU is idle, > > >> even while it should account for hard or softirq. > > >> > > >> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > > >> > > >> diff --git a/kernel/sched.c b/kernel/sched.c > > >> index b902e58..26efa47 100644 > > >> --- a/kernel/sched.c > > >> +++ b/kernel/sched.c > > >> @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) > > >> > > >> if (user_tick) > > >> account_user_time(p, one_jiffy, one_jiffy_scaled); > > >> - else if (p != rq->idle) > > >> + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) > > >> account_system_time(p, HARDIRQ_OFFSET, one_jiffy, > > >> one_jiffy_scaled); > > >> else > > > > > > That patch makes a lot of sense to me. Does it fix the problem? > > > > > > > Yes it does, on my machine at least : > > > > 11:18:48 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle > > 11:18:58 AM all 0.00 0.00 0.00 0.00 0.21 0.69 0.00 0.00 99.10 > > 11:18:58 AM 0 0.00 0.00 0.00 0.00 1.70 5.50 0.00 0.00 92.80 << HERE >> > > 11:18:58 AM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > > 11:18:58 AM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > > 11:18:58 AM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > > 11:18:58 AM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > > 11:18:58 AM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > > 11:18:58 AM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > > 11:18:58 AM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 > > Very good. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Thanks. Eric, mind (re-)sending the patch with Martin's ack included, and with either a suitable impact-line footer or an extra paragraph that describes the bug you found (and how it shows up in practice) and how the patch fixed that problem. Thanks, Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sched: account system time properly 2009-04-29 10:24 ` Ingo Molnar @ 2009-04-29 12:44 ` Eric Dumazet 2009-04-29 13:09 ` [tip:sched/urgent] " tip-bot for Eric Dumazet 2009-04-29 13:23 ` [PATCH] " Andrew Gallatin 0 siblings, 2 replies; 14+ messages in thread From: Eric Dumazet @ 2009-04-29 12:44 UTC (permalink / raw) To: Ingo Molnar Cc: Martin Schwidefsky, Andrew Morton, Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt Ingo Molnar a écrit : > * Martin Schwidefsky <schwidefsky@de.ibm.com> wrote: > >> On Wed, 29 Apr 2009 11:20:03 +0200 >> Eric Dumazet <dada1@cosmosbay.com> wrote: >> >>> Martin Schwidefsky a écrit : >>>> On Wed, 29 Apr 2009 09:46:17 +0200 >>>> Eric Dumazet <dada1@cosmosbay.com> wrote: >>>> >>>>> Eric Dumazet a écrit : >>>>>> Andrew Morton a écrit : >>>>>> >>>>>> So, if IRQs are interrupting idle task, I guess if (p != rq->idle) will be false. >>>>>> >>>> If an IRQ interrupts the idle task the tick is supposed to be accounted >>>> as an idle tick. Only if the IRQ interrupted the system while it has >>>> been in hardirq or softirq processing then it should be accounted as >>>> system tick. >>>> >>>>> Maybe following patch is needed ? >>>>> >>>>> [PATCH] sched: account system time properly >>>>> >>>>> When idle task is interrupted by an IRQ, time accounting considers CPU is idle, >>>>> even while it should account for hard or softirq. >>>>> >>>>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> >>>>> >>>>> diff --git a/kernel/sched.c b/kernel/sched.c >>>>> index b902e58..26efa47 100644 >>>>> --- a/kernel/sched.c >>>>> +++ b/kernel/sched.c >>>>> @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) >>>>> >>>>> if (user_tick) >>>>> account_user_time(p, one_jiffy, one_jiffy_scaled); >>>>> - else if (p != rq->idle) >>>>> + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) >>>>> account_system_time(p, HARDIRQ_OFFSET, one_jiffy, >>>>> one_jiffy_scaled); >>>>> else >>>> That patch makes a lot of sense to me. Does it fix the problem? >>>> >>> Yes it does, on my machine at least : >>> >>> 11:18:48 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle >>> 11:18:58 AM all 0.00 0.00 0.00 0.00 0.21 0.69 0.00 0.00 99.10 >>> 11:18:58 AM 0 0.00 0.00 0.00 0.00 1.70 5.50 0.00 0.00 92.80 << HERE >> >>> 11:18:58 AM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 11:18:58 AM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 11:18:58 AM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 11:18:58 AM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 11:18:58 AM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 11:18:58 AM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >>> 11:18:58 AM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 >> Very good. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> > > Thanks. > > Eric, mind (re-)sending the patch with Martin's ack included, and > with either a suitable impact-line footer or an extra paragraph that > describes the bug you found (and how it shows up in practice) and > how the patch fixed that problem. > No problem, here it is : [PATCH] sched: account system time properly Andrew Gallatin reported that IRQ and SOFTIRQ times were sometime not reported correctly on recent kernels, and even bisected to commit 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 ([PATCH] fix scaled & unscaled cputime accounting) first bad commit. Further analysis pointed that commit 79741dd35713ff4f6fd0eafd59fa94e8a4ba922d ([PATCH] idle cputime accounting) was the real cause of the problem. account_process_tick() was not taking into account timer IRQ interrupting the idle task servicing a hard or soft irq. On mostly idle cpu, irqs were thus not accounted and top or mpstat could tell user/admin that cpu was 100 % idle, 0.00 % irq, 0.00 % softirq, while it was not. Reported-by: Andrew Gallatin <gallatin@myri.com> Re-reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Tested-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> diff --git a/kernel/sched.c b/kernel/sched.c index b902e58..26efa47 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) if (user_tick) account_user_time(p, one_jiffy, one_jiffy_scaled); - else if (p != rq->idle) + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) account_system_time(p, HARDIRQ_OFFSET, one_jiffy, one_jiffy_scaled); else ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [tip:sched/urgent] sched: account system time properly 2009-04-29 12:44 ` Eric Dumazet @ 2009-04-29 13:09 ` tip-bot for Eric Dumazet 2009-04-29 14:58 ` Andrew Morton 2009-04-29 13:23 ` [PATCH] " Andrew Gallatin 1 sibling, 1 reply; 14+ messages in thread From: tip-bot for Eric Dumazet @ 2009-04-29 13:09 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, paulus, hpa, mingo, schwidefsky, dada1, benh, gallatin, akpm, tglx, mingo Commit-ID: f5f293a4e3d0a0c52cec31de6762c95050156516 Gitweb: http://git.kernel.org/tip/f5f293a4e3d0a0c52cec31de6762c95050156516 Author: Eric Dumazet <dada1@cosmosbay.com> AuthorDate: Wed, 29 Apr 2009 14:44:49 +0200 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Wed, 29 Apr 2009 15:02:28 +0200 sched: account system time properly Andrew Gallatin reported that IRQ and SOFTIRQ times were sometime not reported correctly on recent kernels, and even bisected to commit 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 ([PATCH] fix scaled & unscaled cputime accounting) as the first bad commit. Further analysis pointed that commit 79741dd35713ff4f6fd0eafd59fa94e8a4ba922d ([PATCH] idle cputime accounting) was the real cause of the problem. account_process_tick() was not taking into account timer IRQ interrupting the idle task servicing a hard or soft irq. On mostly idle cpu, irqs were thus not accounted and top or mpstat could tell user/admin that cpu was 100 % idle, 0.00 % irq, 0.00 % softirq, while it was not. [ Impact: fix occasionally incorrect CPU statistics in top/mpstat ] Reported-by: Andrew Gallatin <gallatin@myri.com> Re-reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: rick.jones2@hp.com Cc: brice@myri.com Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <49F84BC1.7080602@cosmosbay.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/sched.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index b902e58..26efa47 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) if (user_tick) account_user_time(p, one_jiffy, one_jiffy_scaled); - else if (p != rq->idle) + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) account_system_time(p, HARDIRQ_OFFSET, one_jiffy, one_jiffy_scaled); else ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [tip:sched/urgent] sched: account system time properly 2009-04-29 13:09 ` [tip:sched/urgent] " tip-bot for Eric Dumazet @ 2009-04-29 14:58 ` Andrew Morton 2009-04-29 19:06 ` Ingo Molnar 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2009-04-29 14:58 UTC (permalink / raw) To: mingo, hpa, paulus, linux-kernel, schwidefsky, dada1, benh, gallatin, akpm, tglx, mingo Cc: tip-bot for Eric Dumazet, linux-tip-commits, linux-kernel, paulus, hpa, mingo, schwidefsky, benh, gallatin, tglx, mingo, stable On Wed, 29 Apr 2009 13:09:19 GMT tip-bot for Eric Dumazet <dada1@cosmosbay.com> wrote: > Commit-ID: f5f293a4e3d0a0c52cec31de6762c95050156516 > Gitweb: http://git.kernel.org/tip/f5f293a4e3d0a0c52cec31de6762c95050156516 > Author: Eric Dumazet <dada1@cosmosbay.com> > AuthorDate: Wed, 29 Apr 2009 14:44:49 +0200 > Committer: Ingo Molnar <mingo@elte.hu> > CommitDate: Wed, 29 Apr 2009 15:02:28 +0200 > > sched: account system time properly > > ... > > Reported-by: Andrew Gallatin <gallatin@myri.com> > Re-reported-by: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> > Cc: rick.jones2@hp.com > Cc: brice@myri.com > Cc: Paul Mackerras <paulus@samba.org> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > LKML-Reference: <49F84BC1.7080602@cosmosbay.com> > Signed-off-by: Ingo Molnar <mingo@elte.hu> The regression was present in 2.6.29, so a -stable backport (and cc!) is appropriate. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [tip:sched/urgent] sched: account system time properly 2009-04-29 14:58 ` Andrew Morton @ 2009-04-29 19:06 ` Ingo Molnar 0 siblings, 0 replies; 14+ messages in thread From: Ingo Molnar @ 2009-04-29 19:06 UTC (permalink / raw) To: Andrew Morton Cc: mingo, hpa, paulus, linux-kernel, schwidefsky, dada1, benh, gallatin, tglx, linux-tip-commits, stable * Andrew Morton <akpm@linux-foundation.org> wrote: > On Wed, 29 Apr 2009 13:09:19 GMT tip-bot for Eric Dumazet <dada1@cosmosbay.com> wrote: > > > Commit-ID: f5f293a4e3d0a0c52cec31de6762c95050156516 > > Gitweb: http://git.kernel.org/tip/f5f293a4e3d0a0c52cec31de6762c95050156516 > > Author: Eric Dumazet <dada1@cosmosbay.com> > > AuthorDate: Wed, 29 Apr 2009 14:44:49 +0200 > > Committer: Ingo Molnar <mingo@elte.hu> > > CommitDate: Wed, 29 Apr 2009 15:02:28 +0200 > > > > sched: account system time properly > > > > ... > > > > Reported-by: Andrew Gallatin <gallatin@myri.com> > > Re-reported-by: Andrew Morton <akpm@linux-foundation.org> > > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > > Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> > > Cc: rick.jones2@hp.com > > Cc: brice@myri.com > > Cc: Paul Mackerras <paulus@samba.org> > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > > LKML-Reference: <49F84BC1.7080602@cosmosbay.com> > > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > The regression was present in 2.6.29, so a -stable backport (and > cc!) is appropriate. agreed, i've forwarded this info to stable@kernel.org. Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sched: account system time properly 2009-04-29 12:44 ` Eric Dumazet 2009-04-29 13:09 ` [tip:sched/urgent] " tip-bot for Eric Dumazet @ 2009-04-29 13:23 ` Andrew Gallatin 1 sibling, 0 replies; 14+ messages in thread From: Andrew Gallatin @ 2009-04-29 13:23 UTC (permalink / raw) To: Eric Dumazet Cc: Ingo Molnar, Martin Schwidefsky, Andrew Morton, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt Eric Dumazet wrote: > On mostly idle cpu, irqs were thus not accounted and top or mpstat could tell > user/admin that cpu was 100 % idle, 0.00 % irq, 0.00 % softirq, while it was not. > > Reported-by: Andrew Gallatin <gallatin@myri.com> > Re-reported-by: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> > Tested-by: Eric Dumazet <dada1@cosmosbay.com> > Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> > > diff --git a/kernel/sched.c b/kernel/sched.c > index b902e58..26efa47 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -4732,7 +4732,7 @@ void account_process_tick(struct task_struct *p, int user_tick) > > if (user_tick) > account_user_time(p, one_jiffy, one_jiffy_scaled); > - else if (p != rq->idle) > + else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET)) > account_system_time(p, HARDIRQ_OFFSET, one_jiffy, > one_jiffy_scaled); > else > I can confirm this fixes the issue for me. Thank you, and I am sorry if my git-bisect fumble-fingers caused too much wasted time. Cheers, Drew ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 2009-04-28 23:30 ` Andrew Morton 2009-04-29 5:01 ` Eric Dumazet @ 2009-04-29 8:02 ` Martin Schwidefsky 1 sibling, 0 replies; 14+ messages in thread From: Martin Schwidefsky @ 2009-04-29 8:02 UTC (permalink / raw) To: Andrew Morton Cc: Andrew Gallatin, linux-kernel, rick.jones2, brice, Paul Mackerras, Benjamin Herrenschmidt, Ingo Molnar On Tue, 28 Apr 2009 16:30:04 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > (cc's added) > > On Thu, 23 Apr 2009 10:19:38 -0400 > Andrew Gallatin <gallatin@myri.com> wrote: > > > When running netperf for some 10GbE tests, I noticed > > that IRQ and SOFTIRQ CPU time is no longer reported for an > > otherwise idle CPU on recent kernels, at least for x86_64. > > > > I spent a few hours git-bisecting until I finally got here: > > > > % git-bisect bad > > Bisecting: 0 revisions left to test after this > > [457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4] fix scaled & unscaled cputime > > accounting > > > > I have neither CONFIG_NO_HZ, CONFIG_VIRT_CPU_ACCOUNTING, or XEN configured. > > > > Thanks for doing the bisection. > > 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 was merged late last year, so > this regression has been around for a while. > > We might have fixed it in more recent kernels - stranger things have > happened ;) Hmm, are you sure that this commit caused the problem? I can't see how. In regard to the accounting of system ticks on x86_64 the only change is the bug fix that a system tick of a guest cpu has incorrectly been added to p->stimescaled instead of p->utimescaled. Without guest cpus in the picture nothing changes. For x86_64 CONFIG_VIRT_CPU_ACCOUNTING=n so account_process_tick is used. Before the patch accout_process_tick accounted a system tick by calling account_system_time and account_system_time_scaled. The second function just added the cputime to p->stimescaled. That is now done in accout_system_time. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-04-29 19:08 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-04-23 14:19 IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 Andrew Gallatin 2009-04-28 23:30 ` Andrew Morton 2009-04-29 5:01 ` Eric Dumazet 2009-04-29 7:46 ` [PATCH] sched: account system time properly Eric Dumazet 2009-04-29 8:08 ` Martin Schwidefsky 2009-04-29 9:20 ` Eric Dumazet 2009-04-29 9:48 ` Martin Schwidefsky 2009-04-29 10:24 ` Ingo Molnar 2009-04-29 12:44 ` Eric Dumazet 2009-04-29 13:09 ` [tip:sched/urgent] " tip-bot for Eric Dumazet 2009-04-29 14:58 ` Andrew Morton 2009-04-29 19:06 ` Ingo Molnar 2009-04-29 13:23 ` [PATCH] " Andrew Gallatin 2009-04-29 8:02 ` IRQ / SoftIRQ CPU time accounting broken by 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 Martin Schwidefsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox