* [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle
@ 2016-07-11 16:53 riel
2016-07-11 16:53 ` [PATCH 1/3] sched,time: count actually elapsed irq & softirq time riel
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: riel @ 2016-07-11 16:53 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mingo, pbonzini, fweisbec, wanpeng.li, efault, tglx,
rkrcmar
Currently irq time accounting only works in these cases:
1) purely tick based accounting
2) nohz_full accounting, but only on housekeeping & nohz_full CPUs
3) architectures with native vtime accounting
On nohz_idle CPUs, which are probably the majority nowadays,
irq time accounting is currently broken. This leads to systems
reporting a dramatically lower amount of irq & softirq time than
is actually spent handling them, with all the time spent while the
system is in the idle task being accounted as idle.
This patch set seems to bring the amount of irq time reported by
top (and /proc/stat) roughly in line with that measured when I do
a "perf record -g -a" run to see what is using all that time.
The amount of irq time used, especially softirq, is shockingly high,
to the point of me thinking this patch set may be wrong, but the
numbers seem to match what perf is giving me...
These patches apply on top of Wanpeng Li's steal time patches.
CONFIG_IRQ_TIME_ACCOUNTING is now a config option that is available
as a separate choice from tick based / nohz_idle / nohz_full mode,
as suggested by Frederic Weisbecker.
Next up: look at the things that are using CPU time on an otherwise
idle system, and see if I can make those a little faster :)
v2: address Peterz's concerns, some more cleanups
v3: rewrite the code along Frederic's suggestions, now cputime_t
is used everywhere
v4: greatly simplify the local_irq_save/restore optimisation, thanks
to Paolo pointing out irqs are already blocked by the callers
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] sched,time: count actually elapsed irq & softirq time
2016-07-11 16:53 [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle riel
@ 2016-07-11 16:53 ` riel
2016-07-11 16:53 ` [PATCH 2/3] nohz,cputime: replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code riel
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: riel @ 2016-07-11 16:53 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mingo, pbonzini, fweisbec, wanpeng.li, efault, tglx,
rkrcmar
From: Rik van Riel <riel@redhat.com>
Currently, if there was any irq or softirq time during 'ticks'
jiffies, the entire period will be accounted as irq or softirq
time.
This is inaccurate if only a subset of the time was actually spent
handling irqs, and could conceivably mis-count all of the ticks during
a period as irq time, when there was some irq and some softirq time.
This can actually happen when irqtime_account_process_tick is called
from account_idle_ticks, which can pass a larger number of ticks down
all at once.
Fix this by changing irqtime_account_hi_update, irqtime_account_si_update,
and steal_account_process_tick to work with cputime_t time units, and
return the amount of time spent in each mode.
Rename steal_account_process_tick to steal_account_process_time, to
reflect that time is now accounted in cputime_t, instead of ticks.
Additionally, have irqtime_account_process_tick take into account how
much time was spent in each of steal, irq, and softirq time.
The latter could help improve the accuracy of cputime
accounting when returning from idle on a NO_HZ_IDLE CPU.
Properly accounting how much time was spent in hardirq and
softirq time will also allow the NO_HZ_FULL code to re-use
these same functions for hardirq and softirq accounting.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
include/asm-generic/cputime_nsecs.h | 2 +
kernel/sched/cputime.c | 124 ++++++++++++++++++++++--------------
2 files changed, 79 insertions(+), 47 deletions(-)
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0f1c6f315cdc..918ebb01486c 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -50,6 +50,8 @@ typedef u64 __nocast cputime64_t;
(__force u64)(__ct)
#define nsecs_to_cputime(__nsecs) \
(__force cputime_t)(__nsecs)
+#define nsecs_to_cputime64(__nsecs) \
+ (__force cputime64_t)(__nsecs)
/*
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 3d60e5d76fdb..db82ae12cf01 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -79,40 +79,50 @@ void irqtime_account_irq(struct task_struct *curr)
}
EXPORT_SYMBOL_GPL(irqtime_account_irq);
-static int irqtime_account_hi_update(void)
+static cputime_t irqtime_account_hi_update(cputime_t maxtime)
{
u64 *cpustat = kcpustat_this_cpu->cpustat;
unsigned long flags;
- u64 latest_ns;
- int ret = 0;
+ cputime_t irq_cputime;
local_irq_save(flags);
- latest_ns = this_cpu_read(cpu_hardirq_time);
- if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
- ret = 1;
+ irq_cputime = nsecs_to_cputime64(this_cpu_read(cpu_hardirq_time)) -
+ cpustat[CPUTIME_IRQ];
+ irq_cputime = min(irq_cputime, maxtime);
+ cpustat[CPUTIME_IRQ] += irq_cputime;
local_irq_restore(flags);
- return ret;
+ return irq_cputime;
}
-static int irqtime_account_si_update(void)
+static cputime_t irqtime_account_si_update(cputime_t maxtime)
{
u64 *cpustat = kcpustat_this_cpu->cpustat;
unsigned long flags;
- u64 latest_ns;
- int ret = 0;
+ cputime_t softirq_cputime;
local_irq_save(flags);
- latest_ns = this_cpu_read(cpu_softirq_time);
- if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_SOFTIRQ])
- ret = 1;
+ softirq_cputime = nsecs_to_cputime64(this_cpu_read(cpu_softirq_time)) -
+ cpustat[CPUTIME_SOFTIRQ];
+ softirq_cputime = min(softirq_cputime, maxtime);
+ cpustat[CPUTIME_SOFTIRQ] += softirq_cputime;
local_irq_restore(flags);
- return ret;
+ return softirq_cputime;
}
#else /* CONFIG_IRQ_TIME_ACCOUNTING */
#define sched_clock_irqtime (0)
+static cputime_t irqtime_account_hi_update(cputime_t dummy)
+{
+ return 0;
+}
+
+static cputime_t irqtime_account_si_update(cputime_t dummy)
+{
+ return 0;
+}
+
#endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
static inline void task_group_account_field(struct task_struct *p, int index,
@@ -257,32 +267,45 @@ void account_idle_time(cputime_t cputime)
cpustat[CPUTIME_IDLE] += (__force u64) cputime;
}
-static __always_inline unsigned long steal_account_process_tick(unsigned long max_jiffies)
+static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
{
#ifdef CONFIG_PARAVIRT
if (static_key_false(&paravirt_steal_enabled)) {
+ cputime_t steal_cputime;
u64 steal;
- unsigned long steal_jiffies;
steal = paravirt_steal_clock(smp_processor_id());
steal -= this_rq()->prev_steal_time;
- /*
- * steal is in nsecs but our caller is expecting steal
- * time in jiffies. Lets cast the result to jiffies
- * granularity and account the rest on the next rounds.
- */
- steal_jiffies = min(nsecs_to_jiffies(steal), max_jiffies);
- this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);
+ steal_cputime = min(nsecs_to_cputime(steal), maxtime);
+ account_steal_time(steal_cputime);
+ this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);
- account_steal_time(jiffies_to_cputime(steal_jiffies));
- return steal_jiffies;
+ return steal_cputime;
}
#endif
return 0;
}
/*
+ * Account how much elapsed time was spent in steal, irq, or softirq time.
+ */
+static inline cputime_t account_other_time(cputime_t max)
+{
+ cputime_t accounted;
+
+ accounted = steal_account_process_time(max);
+
+ if (accounted < max)
+ accounted += irqtime_account_hi_update(max - accounted);
+
+ if (accounted < max)
+ accounted += irqtime_account_si_update(max - accounted);
+
+ return accounted;
+}
+
+/*
* Accumulate raw cputime values of dead tasks (sig->[us]time) and live
* tasks (sum on group iteration) belonging to @tsk's group.
*/
@@ -342,21 +365,23 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
struct rq *rq, int ticks)
{
- cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
- u64 cputime = (__force u64) cputime_one_jiffy;
- u64 *cpustat = kcpustat_this_cpu->cpustat;
+ u64 cputime = (__force u64) cputime_one_jiffy * ticks;
+ cputime_t scaled, other;
- if (steal_account_process_tick(ULONG_MAX))
+ /*
+ * When returning from idle, many ticks can get accounted at
+ * once, including some ticks of steal, irq, and softirq time.
+ * Subtract those ticks from the amount of time accounted to
+ * idle, or potentially user or system time. Due to rounding,
+ * other time can exceed ticks occasionally.
+ */
+ other = account_other_time(cputime);
+ if (other >= cputime)
return;
+ cputime -= other;
+ scaled = cputime_to_scaled(cputime);
- cputime *= ticks;
- scaled *= ticks;
-
- if (irqtime_account_hi_update()) {
- cpustat[CPUTIME_IRQ] += cputime;
- } else if (irqtime_account_si_update()) {
- cpustat[CPUTIME_SOFTIRQ] += cputime;
- } else if (this_cpu_ksoftirqd() == p) {
+ if (this_cpu_ksoftirqd() == p) {
/*
* ksoftirqd time do not get accounted in cpu_softirq_time.
* So, we have to handle it separately here.
@@ -466,7 +491,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
*/
void account_process_tick(struct task_struct *p, int user_tick)
{
- cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
+ cputime_t cputime, scaled, steal;
struct rq *rq = this_rq();
if (vtime_accounting_cpu_enabled())
@@ -477,16 +502,21 @@ void account_process_tick(struct task_struct *p, int user_tick)
return;
}
- if (steal_account_process_tick(ULONG_MAX))
+ cputime = cputime_one_jiffy;
+ steal = steal_account_process_time(cputime);
+
+ if (steal >= cputime)
return;
+ cputime -= steal;
+ scaled = cputime_to_scaled(cputime);
+
if (user_tick)
- account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
+ account_user_time(p, cputime, scaled);
else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
- account_system_time(p, HARDIRQ_OFFSET, cputime_one_jiffy,
- one_jiffy_scaled);
+ account_system_time(p, HARDIRQ_OFFSET, cputime, scaled);
else
- account_idle_time(cputime_one_jiffy);
+ account_idle_time(cputime);
}
/*
@@ -681,14 +711,14 @@ static cputime_t vtime_delta(struct task_struct *tsk)
static cputime_t get_vtime_delta(struct task_struct *tsk)
{
unsigned long now = READ_ONCE(jiffies);
- unsigned long delta_jiffies, steal_jiffies;
+ cputime_t delta, steal;
- delta_jiffies = now - tsk->vtime_snap;
- steal_jiffies = steal_account_process_tick(delta_jiffies);
+ delta = jiffies_to_cputime(now - tsk->vtime_snap);
+ steal = steal_account_process_time(delta);
WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
tsk->vtime_snap = now;
- return jiffies_to_cputime(delta_jiffies - steal_jiffies);
+ return delta - steal;
}
static void __vtime_account_system(struct task_struct *tsk)
--
2.7.4
* [PATCH 2/3] nohz,cputime: replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code
2016-07-11 16:53 [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle riel
2016-07-11 16:53 ` [PATCH 1/3] sched,time: count actually elapsed irq & softirq time riel
@ 2016-07-11 16:53 ` riel
2016-07-11 16:53 ` [PATCH 3/3] time: drop local_irq_save/restore from irqtime_account_irq riel
2016-07-12 12:10 ` [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle Frederic Weisbecker
3 siblings, 0 replies; 6+ messages in thread
From: riel @ 2016-07-11 16:53 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mingo, pbonzini, fweisbec, wanpeng.li, efault, tglx,
rkrcmar
From: Rik van Riel <riel@redhat.com>
The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
appear to currently work right.
On CPUs without nohz_full=, only tick based irq time sampling is
done, which breaks down when dealing with a nohz_idle CPU.
On firewalls and similar systems, no ticks may happen on a CPU for a
while, and the irq time spent may never get accounted properly. This
can cause issues with capacity planning and power saving, which use
the CPU statistics as inputs in decision making.
Remove the VTIME_GEN vtime irq time code, and replace it with the
IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
include/linux/vtime.h | 32 ++++++++++++++------------------
init/Kconfig | 6 +++---
kernel/sched/cputime.c | 16 +++-------------
3 files changed, 20 insertions(+), 34 deletions(-)
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index fa2196990f84..d1977d84ebdf 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -14,6 +14,18 @@ struct task_struct;
*/
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
static inline bool vtime_accounting_cpu_enabled(void) { return true; }
+
+#ifdef __ARCH_HAS_VTIME_ACCOUNT
+extern void vtime_account_irq_enter(struct task_struct *tsk);
+#else
+extern void vtime_common_account_irq_enter(struct task_struct *tsk);
+static inline void vtime_account_irq_enter(struct task_struct *tsk)
+{
+ if (vtime_accounting_cpu_enabled())
+ vtime_common_account_irq_enter(tsk);
+}
+#endif /* __ARCH_HAS_VTIME_ACCOUNT */
+
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
@@ -64,17 +76,6 @@ extern void vtime_account_system(struct task_struct *tsk);
extern void vtime_account_idle(struct task_struct *tsk);
extern void vtime_account_user(struct task_struct *tsk);
-#ifdef __ARCH_HAS_VTIME_ACCOUNT
-extern void vtime_account_irq_enter(struct task_struct *tsk);
-#else
-extern void vtime_common_account_irq_enter(struct task_struct *tsk);
-static inline void vtime_account_irq_enter(struct task_struct *tsk)
-{
- if (vtime_accounting_cpu_enabled())
- vtime_common_account_irq_enter(tsk);
-}
-#endif /* __ARCH_HAS_VTIME_ACCOUNT */
-
#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
static inline void vtime_task_switch(struct task_struct *prev) { }
@@ -85,13 +86,8 @@ static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
extern void arch_vtime_task_switch(struct task_struct *tsk);
-extern void vtime_gen_account_irq_exit(struct task_struct *tsk);
-
-static inline void vtime_account_irq_exit(struct task_struct *tsk)
-{
- if (vtime_accounting_cpu_enabled())
- vtime_gen_account_irq_exit(tsk);
-}
+static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
+static inline void vtime_account_irq_exit(struct task_struct *tsk) { }
extern void vtime_user_enter(struct task_struct *tsk);
diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d54c65..4c7ee4f136cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -375,9 +375,11 @@ config VIRT_CPU_ACCOUNTING_GEN
If unsure, say N.
+endchoice
+
config IRQ_TIME_ACCOUNTING
bool "Fine granularity task level IRQ time accounting"
- depends on HAVE_IRQ_TIME_ACCOUNTING && !NO_HZ_FULL
+ depends on HAVE_IRQ_TIME_ACCOUNTING && !VIRT_CPU_ACCOUNTING_NATIVE
help
Select this option to enable fine granularity task irq time
accounting. This is done by reading a timestamp on each
@@ -386,8 +388,6 @@ config IRQ_TIME_ACCOUNTING
If in doubt, say N here.
-endchoice
-
config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
depends on MULTIUSER
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index db82ae12cf01..ca7e33cb0967 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -711,14 +711,14 @@ static cputime_t vtime_delta(struct task_struct *tsk)
static cputime_t get_vtime_delta(struct task_struct *tsk)
{
unsigned long now = READ_ONCE(jiffies);
- cputime_t delta, steal;
+ cputime_t delta, other;
delta = jiffies_to_cputime(now - tsk->vtime_snap);
- steal = steal_account_process_time(delta);
+ other = account_other_time(delta);
WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
tsk->vtime_snap = now;
- return delta - steal;
+ return delta - other;
}
static void __vtime_account_system(struct task_struct *tsk)
@@ -738,16 +738,6 @@ void vtime_account_system(struct task_struct *tsk)
write_seqcount_end(&tsk->vtime_seqcount);
}
-void vtime_gen_account_irq_exit(struct task_struct *tsk)
-{
- write_seqcount_begin(&tsk->vtime_seqcount);
- if (vtime_delta(tsk))
- __vtime_account_system(tsk);
- if (context_tracking_in_user())
- tsk->vtime_snap_whence = VTIME_USER;
- write_seqcount_end(&tsk->vtime_seqcount);
-}
-
void vtime_account_user(struct task_struct *tsk)
{
cputime_t delta_cpu;
--
2.7.4
* [PATCH 3/3] time: drop local_irq_save/restore from irqtime_account_irq
2016-07-11 16:53 [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle riel
2016-07-11 16:53 ` [PATCH 1/3] sched,time: count actually elapsed irq & softirq time riel
2016-07-11 16:53 ` [PATCH 2/3] nohz,cputime: replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code riel
@ 2016-07-11 16:53 ` riel
2016-07-11 17:03 ` Paolo Bonzini
2016-07-12 12:10 ` [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle Frederic Weisbecker
3 siblings, 1 reply; 6+ messages in thread
From: riel @ 2016-07-11 16:53 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mingo, pbonzini, fweisbec, wanpeng.li, efault, tglx,
rkrcmar
From: Rik van Riel <riel@redhat.com>
Paolo pointed out that irqs are already blocked when irqtime_account_irq
is called. That means there is no reason to call local_irq_save/restore
again.
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
---
kernel/sched/cputime.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ca7e33cb0967..7b6fa4d7ad4c 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -49,15 +49,12 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
*/
void irqtime_account_irq(struct task_struct *curr)
{
- unsigned long flags;
s64 delta;
int cpu;
if (!sched_clock_irqtime)
return;
- local_irq_save(flags);
-
cpu = smp_processor_id();
delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
__this_cpu_add(irq_start_time, delta);
@@ -75,7 +72,6 @@ void irqtime_account_irq(struct task_struct *curr)
__this_cpu_add(cpu_softirq_time, delta);
irq_time_write_end();
- local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(irqtime_account_irq);
--
2.7.4
* Re: [PATCH 3/3] time: drop local_irq_save/restore from irqtime_account_irq
2016-07-11 16:53 ` [PATCH 3/3] time: drop local_irq_save/restore from irqtime_account_irq riel
@ 2016-07-11 17:03 ` Paolo Bonzini
0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2016-07-11 17:03 UTC (permalink / raw)
To: riel, linux-kernel
Cc: peterz, mingo, fweisbec, wanpeng.li, efault, tglx, rkrcmar
On 11/07/2016 18:53, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
>
> Paolo pointed out that irqs are already blocked when irqtime_account_irq
> is called. That means there is no reason to call local_irq_save/restore
> again.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> kernel/sched/cputime.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index ca7e33cb0967..7b6fa4d7ad4c 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -49,15 +49,12 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
> */
> void irqtime_account_irq(struct task_struct *curr)
> {
> - unsigned long flags;
> s64 delta;
> int cpu;
>
> if (!sched_clock_irqtime)
> return;
>
> - local_irq_save(flags);
> -
> cpu = smp_processor_id();
> delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
> __this_cpu_add(irq_start_time, delta);
> @@ -75,7 +72,6 @@ void irqtime_account_irq(struct task_struct *curr)
> __this_cpu_add(cpu_softirq_time, delta);
>
> irq_time_write_end();
> - local_irq_restore(flags);
> }
> EXPORT_SYMBOL_GPL(irqtime_account_irq);
>
>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
* Re: [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle
2016-07-11 16:53 [PATCH v4 0/3] sched,time: fix irq time accounting with nohz_idle riel
` (2 preceding siblings ...)
2016-07-11 16:53 ` [PATCH 3/3] time: drop local_irq_save/restore from irqtime_account_irq riel
@ 2016-07-12 12:10 ` Frederic Weisbecker
3 siblings, 0 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2016-07-12 12:10 UTC (permalink / raw)
To: riel
Cc: linux-kernel, peterz, mingo, pbonzini, fweisbec, wanpeng.li,
efault, tglx, rkrcmar
On Mon, Jul 11, 2016 at 12:53:54PM -0400, riel@redhat.com wrote:
> Currently irq time accounting only works in these cases:
> 1) purely tick based accounting
> 2) nohz_full accounting, but only on housekeeping & nohz_full CPUs
> 3) architectures with native vtime accounting
>
> On nohz_idle CPUs, which are probably the majority nowadays,
> irq time accounting is currently broken. This leads to systems
> reporting a dramatically lower amount of irq & softirq time than
> is actually spent handling them, with all the time spent while the
> system is in the idle task being accounted as idle.
>
> This patch set seems to bring the amount of irq time reported by
> top (and /proc/stat) roughly in line with that measured when I do
> a "perf record -g -a" run to see what is using all that time.
>
> The amount of irq time used, especially softirq, is shockingly high,
> to the point of me thinking this patch set may be wrong, but the
> numbers seem to match what perf is giving me...
>
> These patches apply on top of Wanpeng Li's steal time patches.
>
> CONFIG_IRQ_TIME_ACCOUNTING is now a config option that is available
> as a separate choice from tick based / nohz_idle / nohz_full mode,
> as suggested by Frederic Weisbecker.
>
> Next up: look at the things that are using CPU time on an otherwise
> idle system, and see if I can make those a little faster :)
>
> v2: address Peterz's concerns, some more cleanups
> v3: rewrite the code along Frederic's suggestions, now cputime_t
> is used everywhere
> v4: greatly simplify the local_irq_save/restore optimisation, thanks
> to Paolo pointing out irqs are already blocked by the callers
>
Thanks Rik!
I'm applying the series with my patches and will do a pull request to
Ingo.