* [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers @ 2015-08-14 1:46 Frederic Weisbecker 2015-08-22 21:09 ` Ping: " Frederic Weisbecker 2015-08-23 5:40 ` Ingo Molnar 0 siblings, 2 replies; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-14 1:46 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter, Paul E . McKenney, Frederic Weisbecker From: Vatika Harlalka <vatikaharlalka@gmail.com> The problem addressed in this patch is about affining unpinned timers. Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary jitter due to firing of such timers on them. This patch will affine timers to online CPUs which are not full dynticks in NOHZ_FULL configured systems. It should not introduce overhead in nohz full off case due to static keys. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off by: Vatika Harlalka <vatikaharlalka@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- include/linux/tick.h | 9 ++++++++- kernel/sched/core.c | 7 +++++-- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/include/linux/tick.h b/include/linux/tick.h index 3741ba1..51e6493 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) if (tick_nohz_full_enabled()) cpumask_or(mask, mask, tick_nohz_full_mask); } - +static inline int housekeeping_any_cpu(void) +{ + return cpumask_any_and(housekeeping_mask, cpu_online_mask); +} extern void __tick_nohz_full_check(void); extern void tick_nohz_full_kick(void); extern void tick_nohz_full_kick_cpu(int cpu); extern void tick_nohz_full_kick_all(void); extern void __tick_nohz_task_switch(struct task_struct *tsk); #else +static inline int housekeeping_any_cpu(void) +{ + return smp_processor_id(); +} static inline bool tick_nohz_full_enabled(void) { return false; } static inline bool tick_nohz_full_cpu(int cpu) { return false; } static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9917c96..4fd42e4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -623,18 +623,21 @@ int get_nohz_timer_target(void) int i, cpu = smp_processor_id(); struct sched_domain *sd; - if (!idle_cpu(cpu)) + if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu)) return cpu; rcu_read_lock(); for_each_domain(cpu, sd) { for_each_cpu(i, sched_domain_span(sd)) { - if (!idle_cpu(i)) { + if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) { cpu = i; goto unlock; } } } + + if (!is_housekeeping_cpu(cpu)) + cpu = housekeeping_any_cpu(); unlock: rcu_read_unlock(); return cpu; -- 2.1.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Ping: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-14 1:46 [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers Frederic Weisbecker @ 2015-08-22 21:09 ` Frederic Weisbecker 2015-08-23 1:22 ` Christoph Lameter 2015-08-23 5:40 ` Ingo Molnar 1 sibling, 1 reply; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-22 21:09 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter, Paul E . McKenney Ping! On Fri, Aug 14, 2015 at 03:46:14AM +0200, Frederic Weisbecker wrote: > From: Vatika Harlalka <vatikaharlalka@gmail.com> > > The problem addressed in this patch is about affining unpinned timers. > Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary > jitter due to firing of such timers on them. > > This patch will affine timers to online CPUs which are not full dynticks > in NOHZ_FULL configured systems. It should not introduce overhead in > nohz full off case due to static keys. > > Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > Signed-off by: Vatika Harlalka <vatikaharlalka@gmail.com> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Christoph Lameter <cl@linux.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Cc: Chris Metcalf <cmetcalf@ezchip.com> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > --- > include/linux/tick.h | 9 ++++++++- > kernel/sched/core.c | 7 +++++-- > 2 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/include/linux/tick.h b/include/linux/tick.h > index 3741ba1..51e6493 100644 > --- a/include/linux/tick.h > +++ b/include/linux/tick.h > @@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) > if (tick_nohz_full_enabled()) > cpumask_or(mask, mask, tick_nohz_full_mask); > } > - > +static inline int housekeeping_any_cpu(void) > +{ > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > +} > extern void __tick_nohz_full_check(void); > extern void tick_nohz_full_kick(void); > extern void tick_nohz_full_kick_cpu(int cpu); > extern void tick_nohz_full_kick_all(void); > extern void __tick_nohz_task_switch(struct task_struct *tsk); > #else > +static inline int housekeeping_any_cpu(void) > +{ > + return smp_processor_id(); > +} > static inline bool tick_nohz_full_enabled(void) { return false; } > static inline bool tick_nohz_full_cpu(int cpu) { return false; } > static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { } > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 9917c96..4fd42e4 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -623,18 +623,21 @@ int get_nohz_timer_target(void) > int i, cpu = smp_processor_id(); > struct sched_domain *sd; > > - if (!idle_cpu(cpu)) > + if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu)) > return cpu; > > rcu_read_lock(); > for_each_domain(cpu, sd) { > for_each_cpu(i, sched_domain_span(sd)) { > - if (!idle_cpu(i)) { > + if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) { > cpu = i; > goto unlock; > } > } > } > + > + if (!is_housekeeping_cpu(cpu)) > + cpu = housekeeping_any_cpu(); > unlock: > rcu_read_unlock(); > return cpu; > -- > 2.1.4 > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Ping: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-22 21:09 ` Ping: " Frederic Weisbecker @ 2015-08-23 1:22 ` Christoph Lameter 0 siblings, 0 replies; 21+ messages in thread From: Christoph Lameter @ 2015-08-23 1:22 UTC (permalink / raw) To: Frederic Weisbecker Cc: Peter Zijlstra, Ingo Molnar, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Paul E . McKenney On Sat, 22 Aug 2015, Frederic Weisbecker wrote: > Ping! Exellent!!! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-14 1:46 [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers Frederic Weisbecker 2015-08-22 21:09 ` Ping: " Frederic Weisbecker @ 2015-08-23 5:40 ` Ingo Molnar 2015-08-23 16:01 ` Paul E. McKenney 2015-08-24 1:45 ` Frederic Weisbecker 1 sibling, 2 replies; 21+ messages in thread From: Ingo Molnar @ 2015-08-23 5:40 UTC (permalink / raw) To: Frederic Weisbecker Cc: Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter, Paul E . McKenney * Frederic Weisbecker <fweisbec@gmail.com> wrote: > From: Vatika Harlalka <vatikaharlalka@gmail.com> > > The problem addressed in this patch is about affining unpinned timers. > Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary > jitter due to firing of such timers on them. > > This patch will affine timers to online CPUs which are not full dynticks > in NOHZ_FULL configured systems. It should not introduce overhead in > nohz full off case due to static keys. > > Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > Signed-off by: Vatika Harlalka <vatikaharlalka@gmail.com> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Christoph Lameter <cl@linux.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Cc: Chris Metcalf <cmetcalf@ezchip.com> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > --- > include/linux/tick.h | 9 ++++++++- > kernel/sched/core.c | 7 +++++-- > 2 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/include/linux/tick.h b/include/linux/tick.h > index 3741ba1..51e6493 100644 > --- a/include/linux/tick.h > +++ b/include/linux/tick.h > @@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) > if (tick_nohz_full_enabled()) > cpumask_or(mask, mask, tick_nohz_full_mask); > } > - > +static inline int housekeeping_any_cpu(void) > +{ > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > +} > extern void __tick_nohz_full_check(void); > extern void tick_nohz_full_kick(void); > extern void tick_nohz_full_kick_cpu(int cpu); > extern void tick_nohz_full_kick_all(void); > extern void __tick_nohz_task_switch(struct task_struct *tsk); > #else > +static inline int housekeeping_any_cpu(void) > +{ > + return smp_processor_id(); > +} > static inline bool tick_nohz_full_enabled(void) { return false; } > static inline bool tick_nohz_full_cpu(int cpu) { return false; } > static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { } > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 9917c96..4fd42e4 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -623,18 +623,21 @@ int get_nohz_timer_target(void) > int i, cpu = smp_processor_id(); > struct sched_domain *sd; > > - if (!idle_cpu(cpu)) > + if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu)) > return cpu; > > rcu_read_lock(); > for_each_domain(cpu, sd) { > for_each_cpu(i, sched_domain_span(sd)) { > - if (!idle_cpu(i)) { > + if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) { > cpu = i; > goto unlock; > } > } > } > + > + if (!is_housekeeping_cpu(cpu)) > + cpu = housekeeping_any_cpu(); > unlock: > rcu_read_unlock(); > return cpu; So I almost applied this yesterday, but had the following question: what ensures that housekeeping_mask isn't empty? If it's empty then housekeeping_any_cpu() returns cpumask_any_and() of an empty cpumask - which returns an out of range index AFAICS - which will crash and burn in: kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target()); kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target()); housekeeping_mask itself is derived from tick_nohz_full_mask (it's the inverse of it in essence), and tick_nohz_full_mask is set via two methods, either via a boot parameter: if (cpulist_parse(str, tick_nohz_full_mask) < 0) { in tick_nohz_full_setup(). What ensures here that tick_nohz_full_mask is not completely full - making housekeeping_mask empty? The other method is via CONFIG_NO_HZ_FULL_ALL: cpumask_setall(tick_nohz_full_mask); here it's fully set - triggering the bug I'm worried about. So what am I missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing? Thanks, Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-23 5:40 ` Ingo Molnar @ 2015-08-23 16:01 ` Paul E. McKenney 2015-08-24 1:28 ` Frederic Weisbecker 2015-08-24 6:44 ` Ingo Molnar 2015-08-24 1:45 ` Frederic Weisbecker 1 sibling, 2 replies; 21+ messages in thread From: Paul E. McKenney @ 2015-08-23 16:01 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Sun, Aug 23, 2015 at 07:40:32AM +0200, Ingo Molnar wrote: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > From: Vatika Harlalka <vatikaharlalka@gmail.com> > > > > The problem addressed in this patch is about affining unpinned timers. > > Adaptive or Full Dynticks CPUs are currently disturbed by unnecessary > > jitter due to firing of such timers on them. > > > > This patch will affine timers to online CPUs which are not full dynticks > > in NOHZ_FULL configured systems. It should not introduce overhead in > > nohz full off case due to static keys. > > > > Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > > Signed-off by: Vatika Harlalka <vatikaharlalka@gmail.com> > > Cc: Ingo Molnar <mingo@kernel.org> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Christoph Lameter <cl@linux.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Cc: Chris Metcalf <cmetcalf@ezchip.com> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > > --- > > include/linux/tick.h | 9 ++++++++- > > kernel/sched/core.c | 7 +++++-- > > 2 files changed, 13 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/tick.h b/include/linux/tick.h > > index 3741ba1..51e6493 100644 > > --- a/include/linux/tick.h > > +++ b/include/linux/tick.h > > @@ -143,13 +143,20 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) > > if (tick_nohz_full_enabled()) > > cpumask_or(mask, mask, tick_nohz_full_mask); > > } > > - > > +static inline int housekeeping_any_cpu(void) > > +{ > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > > +} > > extern void __tick_nohz_full_check(void); > > extern void tick_nohz_full_kick(void); > > extern void tick_nohz_full_kick_cpu(int cpu); > > extern void tick_nohz_full_kick_all(void); > > extern void __tick_nohz_task_switch(struct task_struct *tsk); > > #else > > +static inline int housekeeping_any_cpu(void) > > +{ > > + return smp_processor_id(); > > +} > > static inline bool tick_nohz_full_enabled(void) { return false; } > > static inline bool tick_nohz_full_cpu(int cpu) { return false; } > > static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { } > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 9917c96..4fd42e4 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -623,18 +623,21 @@ int get_nohz_timer_target(void) > > int i, cpu = smp_processor_id(); > > struct sched_domain *sd; > > > > - if (!idle_cpu(cpu)) > > + if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu)) > > return cpu; > > > > rcu_read_lock(); > > for_each_domain(cpu, sd) { > > for_each_cpu(i, sched_domain_span(sd)) { > > - if (!idle_cpu(i)) { > > + if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) { > > cpu = i; > > goto unlock; > > } > > } > > } > > + > > + if (!is_housekeeping_cpu(cpu)) > > + cpu = housekeeping_any_cpu(); > > unlock: > > rcu_read_unlock(); > > return cpu; > > So I almost applied this yesterday, but had the following question: what ensures > that housekeeping_mask isn't empty? If it's empty then housekeeping_any_cpu() > returns cpumask_any_and() of an empty cpumask - which returns an out of range > index AFAICS - which will crash and burn in: > > kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target()); > kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target()); > > housekeeping_mask itself is derived from tick_nohz_full_mask (it's the inverse of > it in essence), and tick_nohz_full_mask is set via two methods, either via a boot > parameter: > > if (cpulist_parse(str, tick_nohz_full_mask) < 0) { > > in tick_nohz_full_setup(). What ensures here that tick_nohz_full_mask is not > completely full - making housekeeping_mask empty? > > The other method is via CONFIG_NO_HZ_FULL_ALL: > > cpumask_setall(tick_nohz_full_mask); > > here it's fully set - triggering the bug I'm worried about. So what am I missing, > what prevents CONFIG_NO_HZ_FULL_ALL from crashing? The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is called from tick_init() which is called from start_kernel() shortly after rcu_init(): cpu = smp_processor_id(); if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) { pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu); cpumask_clear_cpu(cpu, tick_nohz_full_mask); } This happens after the call to tick_nohz_init_all() that does the cpumask_setall() that you called out above. Or is a recent patch that I missed changing this? Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-23 16:01 ` Paul E. McKenney @ 2015-08-24 1:28 ` Frederic Weisbecker 2015-08-24 6:44 ` Ingo Molnar 1 sibling, 0 replies; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-24 1:28 UTC (permalink / raw) To: Paul E. McKenney Cc: Ingo Molnar, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Sun, Aug 23, 2015 at 09:01:01AM -0700, Paul E. McKenney wrote: > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), > which is called from tick_init() which is called from start_kernel() > shortly after rcu_init(): > > cpu = smp_processor_id(); > > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) { > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu); > cpumask_clear_cpu(cpu, tick_nohz_full_mask); > } > > This happens after the call to tick_nohz_init_all() that does the > cpumask_setall() that you called out above. > > Or is a recent patch that I missed changing this? Exactly, this happens right after tick_nohz_full_mask is filled and makes sure that at least the boot CPU runs as a housekeeper. We also make sure that it can't become offline later. Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-23 16:01 ` Paul E. McKenney 2015-08-24 1:28 ` Frederic Weisbecker @ 2015-08-24 6:44 ` Ingo Molnar 2015-08-24 7:23 ` Mike Galbraith ` (2 more replies) 1 sibling, 3 replies; 21+ messages in thread From: Ingo Molnar @ 2015-08-24 6:44 UTC (permalink / raw) To: Paul E. McKenney Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > here it's fully set - triggering the bug I'm worried about. So what am I > > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing? > > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is > called from tick_init() which is called from start_kernel() shortly after > rcu_init(): > > cpu = smp_processor_id(); > > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) { > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu); > cpumask_clear_cpu(cpu, tick_nohz_full_mask); > } > > This happens after the call to tick_nohz_init_all() that does the > cpumask_setall() that you called out above. Ah, indeed - I somehow missed that. This brings up two other questions: 1) the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to it (housekeeping_mask - a variable mask to begin with) and recover the housekeeping CPU via: + return cpumask_any_and(housekeeping_mask, cpu_online_mask); which can be pretty expensive, and which gets executed in two hotpaths: kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target()); kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target()); ... why not just use a single housekeeping_cpu which would be way faster to pass down to the timer code? 2) What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU is offlined. Thanks, Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 6:44 ` Ingo Molnar @ 2015-08-24 7:23 ` Mike Galbraith 2015-08-24 7:41 ` Ingo Molnar 2015-08-24 13:36 ` Frederic Weisbecker 2015-08-24 13:50 ` Paul E. McKenney 2 siblings, 1 reply; 21+ messages in thread From: Mike Galbraith @ 2015-08-24 7:23 UTC (permalink / raw) To: Ingo Molnar Cc: Paul E. McKenney, Frederic Weisbecker, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote: > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to > it (housekeeping_mask - a variable mask to begin with) and recover the > housekeeping CPU via: > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); There can be, and had better be if box is big, multiple housekeepers. Imagine a NO_HZ_FULL_ALL kernel on an SGI beast from hell. Offloading 8191 CPUs onto poor little CPU0 probably wouldn't work out well :) -Mike ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 7:23 ` Mike Galbraith @ 2015-08-24 7:41 ` Ingo Molnar 2015-08-24 7:54 ` Mike Galbraith 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2015-08-24 7:41 UTC (permalink / raw) To: Mike Galbraith Cc: Paul E. McKenney, Frederic Weisbecker, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter * Mike Galbraith <umgwanakikbuti@gmail.com> wrote: > On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote: > > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to > > it (housekeeping_mask - a variable mask to begin with) and recover the > > housekeeping CPU via: > > > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > > There can be, and had better be if box is big, multiple housekeepers. Yes - but that does not seem to be possible via the code right now AFAICS, so at minimum it's incomplete. Thanks, Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 7:41 ` Ingo Molnar @ 2015-08-24 7:54 ` Mike Galbraith 2015-08-24 8:00 ` Ingo Molnar 0 siblings, 1 reply; 21+ messages in thread From: Mike Galbraith @ 2015-08-24 7:54 UTC (permalink / raw) To: Ingo Molnar Cc: Paul E. McKenney, Frederic Weisbecker, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Mon, 2015-08-24 at 09:41 +0200, Ingo Molnar wrote: > * Mike Galbraith <umgwanakikbuti@gmail.com> wrote: > > > On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote: > > > > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to > > > it (housekeeping_mask - a variable mask to begin with) and recover the > > > housekeeping CPU via: > > > > > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > > > > There can be, and had better be if box is big, multiple housekeepers. > > Yes - but that does not seem to be possible via the code right now AFAICS, so at > minimum it's incomplete. In master housekeepers are set up in tick_nohz_init(). Everybody who's not a nohz_full CPU is a housekeeper. -Mike ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 7:54 ` Mike Galbraith @ 2015-08-24 8:00 ` Ingo Molnar 0 siblings, 0 replies; 21+ messages in thread From: Ingo Molnar @ 2015-08-24 8:00 UTC (permalink / raw) To: Mike Galbraith Cc: Paul E. McKenney, Frederic Weisbecker, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter * Mike Galbraith <umgwanakikbuti@gmail.com> wrote: > On Mon, 2015-08-24 at 09:41 +0200, Ingo Molnar wrote: > > * Mike Galbraith <umgwanakikbuti@gmail.com> wrote: > > > > > On Mon, 2015-08-24 at 08:44 +0200, Ingo Molnar wrote: > > > > > > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to > > > > it (housekeeping_mask - a variable mask to begin with) and recover the > > > > housekeeping CPU via: > > > > > > > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > > > > > > There can be, and had better be if box is big, multiple housekeepers. > > > > Yes - but that does not seem to be possible via the code right now AFAICS, so > > at minimum it's incomplete. > > In master housekeepers are set up in tick_nohz_init(). Everybody who's not a > nohz_full CPU is a housekeeper. Ah, indeed, I missed the nohz_full= boot option: static int __init tick_nohz_full_setup(char *str) { alloc_bootmem_cpumask_var(&tick_nohz_full_mask); if (cpulist_parse(str, tick_nohz_full_mask) < 0) { Ok, first question is resolved - but the second one, behavior on boot CPU unplug still holds. Thanks, Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 6:44 ` Ingo Molnar 2015-08-24 7:23 ` Mike Galbraith @ 2015-08-24 13:36 ` Frederic Weisbecker 2015-08-24 14:01 ` Mike Galbraith 2015-08-25 8:29 ` Ingo Molnar 2015-08-24 13:50 ` Paul E. McKenney 2 siblings, 2 replies; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-24 13:36 UTC (permalink / raw) To: Ingo Molnar Cc: Paul E. McKenney, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > 2) > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > is offlined. We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which is the boot CPU in nohz full, never gets offlined. > > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 13:36 ` Frederic Weisbecker @ 2015-08-24 14:01 ` Mike Galbraith 2015-08-25 8:29 ` Ingo Molnar 1 sibling, 0 replies; 21+ messages in thread From: Mike Galbraith @ 2015-08-24 14:01 UTC (permalink / raw) To: Frederic Weisbecker Cc: Ingo Molnar, Paul E. McKenney, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Mon, 2015-08-24 at 15:36 +0200, Frederic Weisbecker wrote: > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > 2) > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > > is offlined. > > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which > is the boot CPU in nohz full, never gets offlined. Aha.. and it works fine too. (CPU0 told me to go away earlier;) -Mike ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 13:36 ` Frederic Weisbecker 2015-08-24 14:01 ` Mike Galbraith @ 2015-08-25 8:29 ` Ingo Molnar 2015-08-25 13:45 ` Frederic Weisbecker 1 sibling, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2015-08-25 8:29 UTC (permalink / raw) To: Frederic Weisbecker Cc: Paul E. McKenney, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter * Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > 2) > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > > is offlined. > > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which > is the boot CPU in nohz full, never gets offlined. That solution really sucks - it essentially regresses a feature the user explicitly asked for! I also see no way for the user to migrate the timekeeping functionality over to another CPU without rebooting. If this is the last timekeeping CPU then it should migrate the timekeeping functionality to another CPU, and perhaps printk a warning if all other CPUs are nohz-full and we have to mark one of them as the timekeeper. Also, the nohz-full and timekeeper functionality should not be a boot parameter only thing, but should be runtime configurable. Thanks, Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-25 8:29 ` Ingo Molnar @ 2015-08-25 13:45 ` Frederic Weisbecker 2015-08-28 8:32 ` Ingo Molnar 0 siblings, 1 reply; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-25 13:45 UTC (permalink / raw) To: Ingo Molnar Cc: Paul E. McKenney, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Tue, Aug 25, 2015 at 10:29:04AM +0200, Ingo Molnar wrote: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > > 2) > > > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > > > is offlined. > > > > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which > > is the boot CPU in nohz full, never gets offlined. > > That solution really sucks - it essentially regresses a feature the user > explicitly asked for! I also see no way for the user to migrate the timekeeping > functionality over to another CPU without rebooting. > > If this is the last timekeeping CPU then it should migrate the timekeeping > functionality to another CPU, and perhaps printk a warning if all other CPUs are > nohz-full and we have to mark one of them as the timekeeper. > > Also, the nohz-full and timekeeper functionality should not be a boot parameter > only thing, but should be runtime configurable. When I tried to allow moving the timekeeping duty over all housekeeping CPUs, Thomas got angry because it broke the KISS current nohz full code. Indeed, there must be at least one running all the time on behalf of nohz full CPUs that can run anytime. Thus balancing the timekeeping duty over housekeepers is a bit more complicated than in normal configurations. Now surely we can do that using an IPI from CPU_DOWN_PREPARE to a housekeeper if any remains or to a nohz full one. Then we must make sure the new timekeeper never goes to idle. But nohz_full is a corner usecase and I'm not sure it's worth the complexity. If a nohz full user came and complained about CPU0 hotplog not working, I would definetly retry it but I haven't heard about that yet. Besides, hotplug is very isolation-unfriendly in general due to stop machine. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-25 13:45 ` Frederic Weisbecker @ 2015-08-28 8:32 ` Ingo Molnar 2015-08-28 12:30 ` Frederic Weisbecker 0 siblings, 1 reply; 21+ messages in thread From: Ingo Molnar @ 2015-08-28 8:32 UTC (permalink / raw) To: Frederic Weisbecker Cc: Paul E. McKenney, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter * Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Tue, Aug 25, 2015 at 10:29:04AM +0200, Ingo Molnar wrote: > > > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > > > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > > > 2) > > > > > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > > > > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > > > > is offlined. > > > > > > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which > > > is the boot CPU in nohz full, never gets offlined. > > > > That solution really sucks - it essentially regresses a feature the user > > explicitly asked for! I also see no way for the user to migrate the timekeeping > > functionality over to another CPU without rebooting. > > > > If this is the last timekeeping CPU then it should migrate the timekeeping > > functionality to another CPU, and perhaps printk a warning if all other CPUs are > > nohz-full and we have to mark one of them as the timekeeper. > > > > Also, the nohz-full and timekeeper functionality should not be a boot parameter > > only thing, but should be runtime configurable. > > When I tried to allow moving the timekeeping duty over all housekeeping CPUs, > Thomas got angry because it broke the KISS current nohz full code. Indeed, there > must be at least one running all the time on behalf of nohz full CPUs that can > run anytime. Thus balancing the timekeeping duty over housekeepers is a bit more > complicated than in normal configurations. > > Now surely we can do that using an IPI from CPU_DOWN_PREPARE to a housekeeper if > any remains or to a nohz full one. Then we must make sure the new timekeeper > never goes to idle. > > But nohz_full is a corner usecase and I'm not sure it's worth the complexity. If > a nohz full user came and complained about CPU0 hotplog not working, I would > definetly retry it but I haven't heard about that yet. Besides, hotplug is very > isolation-unfriendly in general due to stop machine. Ok, I guess we can live with this. Mind sending an updated series with all patches? Thanks, Ingo ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-28 8:32 ` Ingo Molnar @ 2015-08-28 12:30 ` Frederic Weisbecker 0 siblings, 0 replies; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-28 12:30 UTC (permalink / raw) To: Ingo Molnar Cc: Paul E. McKenney, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Fri, Aug 28, 2015 at 10:32:59AM +0200, Ingo Molnar wrote: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > On Tue, Aug 25, 2015 at 10:29:04AM +0200, Ingo Molnar wrote: > > > > > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > > > > > > > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > > > > 2) > > > > > > > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > > > > > > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > > > > > is offlined. > > > > > > > > We have tick_nohz_cpu_down_callback() which makes sure that the timekeeper, which > > > > is the boot CPU in nohz full, never gets offlined. > > > > > > That solution really sucks - it essentially regresses a feature the user > > > explicitly asked for! I also see no way for the user to migrate the timekeeping > > > functionality over to another CPU without rebooting. > > > > > > If this is the last timekeeping CPU then it should migrate the timekeeping > > > functionality to another CPU, and perhaps printk a warning if all other CPUs are > > > nohz-full and we have to mark one of them as the timekeeper. > > > > > > Also, the nohz-full and timekeeper functionality should not be a boot parameter > > > only thing, but should be runtime configurable. > > > > When I tried to allow moving the timekeeping duty over all housekeeping CPUs, > > Thomas got angry because it broke the KISS current nohz full code. Indeed, there > > must be at least one running all the time on behalf of nohz full CPUs that can > > run anytime. Thus balancing the timekeeping duty over housekeepers is a bit more > > complicated than in normal configurations. > > > > Now surely we can do that using an IPI from CPU_DOWN_PREPARE to a housekeeper if > > any remains or to a nohz full one. Then we must make sure the new timekeeper > > never goes to idle. > > > > But nohz_full is a corner usecase and I'm not sure it's worth the complexity. If > > a nohz full user came and complained about CPU0 hotplog not working, I would > > definetly retry it but I haven't heard about that yet. Besides, hotplug is very > > isolation-unfriendly in general due to stop machine. > > Ok, I guess we can live with this. Now this will likely evolve in the future, I can easily imagine that timekeeping becomes balanced among housekeepers when we'll have one per node. It's not yet the priority but we may come to that one day. > > Mind sending an updated series with all patches? Sure, I'm cooking that. Thanks! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 6:44 ` Ingo Molnar 2015-08-24 7:23 ` Mike Galbraith 2015-08-24 13:36 ` Frederic Weisbecker @ 2015-08-24 13:50 ` Paul E. McKenney 2015-08-24 14:04 ` Frederic Weisbecker 2 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2015-08-24 13:50 UTC (permalink / raw) To: Ingo Molnar Cc: Frederic Weisbecker, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > > here it's fully set - triggering the bug I'm worried about. So what am I > > > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing? > > > > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is > > called from tick_init() which is called from start_kernel() shortly after > > rcu_init(): > > > > cpu = smp_processor_id(); > > > > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) { > > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu); > > cpumask_clear_cpu(cpu, tick_nohz_full_mask); > > } > > > > This happens after the call to tick_nohz_init_all() that does the > > cpumask_setall() that you called out above. > > Ah, indeed - I somehow missed that. > > This brings up two other questions: > > 1) > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to > it (housekeeping_mask - a variable mask to begin with) and recover the > housekeeping CPU via: > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > > which can be pretty expensive, and which gets executed in two hotpaths: > > kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target()); > kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target()); > > ... why not just use a single housekeeping_cpu which would be way faster to pass > down to the timer code? The housekeeping_cpu came later, but that does seem like a good optimization. > 2) > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > is offlined. The tick_nohz_cpu_down_callback() function does this, though in a less than obvious way. The tick_do_timer_cpu variable is the housekeeping CPU that is currently handling timing, and it is not permitted to go offline. Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 13:50 ` Paul E. McKenney @ 2015-08-24 14:04 ` Frederic Weisbecker 2015-08-24 15:45 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-24 14:04 UTC (permalink / raw) To: Paul E. McKenney Cc: Ingo Molnar, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Mon, Aug 24, 2015 at 06:50:18AM -0700, Paul E. McKenney wrote: > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > > > > here it's fully set - triggering the bug I'm worried about. So what am I > > > > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing? > > > > > > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is > > > called from tick_init() which is called from start_kernel() shortly after > > > rcu_init(): > > > > > > cpu = smp_processor_id(); > > > > > > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) { > > > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu); > > > cpumask_clear_cpu(cpu, tick_nohz_full_mask); > > > } > > > > > > This happens after the call to tick_nohz_init_all() that does the > > > cpumask_setall() that you called out above. > > > > Ah, indeed - I somehow missed that. > > > > This brings up two other questions: > > > > 1) > > > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to > > it (housekeeping_mask - a variable mask to begin with) and recover the > > housekeeping CPU via: > > > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > > > > which can be pretty expensive, and which gets executed in two hotpaths: > > > > kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target()); > > kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target()); > > > > ... why not just use a single housekeeping_cpu which would be way faster to pass > > down to the timer code? > > The housekeeping_cpu came later, but that does seem like a good optimization. Well nohz full is likely to be used for HPC and that can involve big machines. Having the housekeeping duty spread per node is a likely future evolution there, if it isn't already used that way. So we need to keep it a cpumask. > > > 2) > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > > is offlined. > > The tick_nohz_cpu_down_callback() function does this, though in a less > than obvious way. The tick_do_timer_cpu variable is the housekeeping > CPU that is currently handling timing, and it is not permitted to go > offline. Indeed, more specifically tick-common.c makes sure to set the timekeeping duty to a housekeeper and that housekeeper is always the boot CPU due to early device initialization. But I should find a way to simplify that code and make it obvious it's always set to the boot CPU. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-24 14:04 ` Frederic Weisbecker @ 2015-08-24 15:45 ` Paul E. McKenney 0 siblings, 0 replies; 21+ messages in thread From: Paul E. McKenney @ 2015-08-24 15:45 UTC (permalink / raw) To: Frederic Weisbecker Cc: Ingo Molnar, Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter On Mon, Aug 24, 2015 at 04:04:37PM +0200, Frederic Weisbecker wrote: > On Mon, Aug 24, 2015 at 06:50:18AM -0700, Paul E. McKenney wrote: > > On Mon, Aug 24, 2015 at 08:44:12AM +0200, Ingo Molnar wrote: > > > > > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > > > > > > here it's fully set - triggering the bug I'm worried about. So what am I > > > > > missing, what prevents CONFIG_NO_HZ_FULL_ALL from crashing? > > > > > > > > The boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(), which is > > > > called from tick_init() which is called from start_kernel() shortly after > > > > rcu_init(): > > > > > > > > cpu = smp_processor_id(); > > > > > > > > if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) { > > > > pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu); > > > > cpumask_clear_cpu(cpu, tick_nohz_full_mask); > > > > } > > > > > > > > This happens after the call to tick_nohz_init_all() that does the > > > > cpumask_setall() that you called out above. > > > > > > Ah, indeed - I somehow missed that. > > > > > > This brings up two other questions: > > > > > > 1) > > > > > > the 'housekeeping CPU' is essentially the boot CPU. Yet we dedicate a full mask to > > > it (housekeeping_mask - a variable mask to begin with) and recover the > > > housekeeping CPU via: > > > > > > + return cpumask_any_and(housekeeping_mask, cpu_online_mask); > > > > > > which can be pretty expensive, and which gets executed in two hotpaths: > > > > > > kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target()); > > > kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target()); > > > > > > ... why not just use a single housekeeping_cpu which would be way faster to pass > > > down to the timer code? > > > > The housekeeping_cpu came later, but that does seem like a good optimization. > > Well nohz full is likely to be used for HPC and that can involve big machines. > Having the housekeeping duty spread per node is a likely future evolution there, > if it isn't already used that way. > > So we need to keep it a cpumask. Fair point! Thanx, Paul > > > 2) > > > > > > What happens if the boot CPU is offlined? (under CONFIG_BOOTPARAM_HOTPLUG_CPU0=y) > > > > > > I don't see CPU hotplug callbacks fixing up the housekeeping_mask if the boot CPU > > > is offlined. > > > > The tick_nohz_cpu_down_callback() function does this, though in a less > > than obvious way. The tick_do_timer_cpu variable is the housekeeping > > CPU that is currently handling timing, and it is not permitted to go > > offline. > > Indeed, more specifically tick-common.c makes sure to set the timekeeping > duty to a housekeeper and that housekeeper is always the boot CPU due to > early device initialization. > > But I should find a way to simplify that code and make it obvious it's always > set to the boot CPU. > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers 2015-08-23 5:40 ` Ingo Molnar 2015-08-23 16:01 ` Paul E. McKenney @ 2015-08-24 1:45 ` Frederic Weisbecker 1 sibling, 0 replies; 21+ messages in thread From: Frederic Weisbecker @ 2015-08-24 1:45 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, LKML, Vatika Harlalka, Chris Metcalf, Thomas Gleixner, Preeti U Murthy, Christoph Lameter, Paul E . McKenney On Sun, Aug 23, 2015 at 07:40:32AM +0200, Ingo Molnar wrote: > So I almost applied this yesterday, but had the following question: what ensures > that housekeeping_mask isn't empty? If it's empty then housekeeping_any_cpu() > returns cpumask_any_and() of an empty cpumask - which returns an out of range > index AFAICS - which will crash and burn in: > > kernel/time/hrtimer.c: return &per_cpu(hrtimer_bases, get_nohz_timer_target()); > kernel/time/timer.c: return per_cpu_ptr(&tvec_bases, get_nohz_timer_target()); > > housekeeping_mask itself is derived from tick_nohz_full_mask (it's the inverse of > it in essence), and tick_nohz_full_mask is set via two methods, either via a boot > parameter: > > if (cpulist_parse(str, tick_nohz_full_mask) < 0) { > > in tick_nohz_full_setup(). What ensures here that tick_nohz_full_mask is not > completely full - making housekeeping_mask empty? > > The other method is via CONFIG_NO_HZ_FULL_ALL: > > cpumask_setall(tick_nohz_full_mask); > > here it's fully set - triggering the bug I'm worried about. So what am I missing, > what prevents CONFIG_NO_HZ_FULL_ALL from crashing? Legitimate worry and I should have explained that in the changelog. Like Paul replied, we make sure that at least the boot CPU is excluded from tick_nohz_full_mask in tick_nohz_init(). Then housekeeping_mask, by reverse effect, contains that boot CPU at least. And we also make sure that the boot CPU can't get offline (tick_nohz_cpu_down_callback()). Now we should really document and check that assumption so here is a second patch below. The sched patch depends on tip:sched/core (to avoid conflicts with sched changes) and the following one is based on tip:timer/nohz but should be applicable to sched/core without conflict. Both are standalone anyway. Thanks! --- From: Frederic Weisbecker <fweisbec@gmail.com> Date: Sun, 23 Aug 2015 19:34:31 +0200 Subject: [PATCH] nohz: Assert existing housekeepers when nohz full enabled The code ensures that at least the boot CPU serves as a housekeeper. Let's assert this assumption to make sure that we have CPUs to handle unbound jobs like workqueues and timers while nohz full CPUs run undisturbed. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- kernel/time/tick-sched.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 3319e16..cc9884f 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -370,6 +370,12 @@ void __init tick_nohz_init(void) cpu_notifier(tick_nohz_cpu_down_callback, 0); pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n", cpumask_pr_args(tick_nohz_full_mask)); + + /* + * We need at least one CPU to handle housekeeping work such + * as timekeeping, unbound timers, workqueues, ... + */ + WARN_ON_ONCE(cpumask_empty(housekeeping_mask)); } #endif -- 2.1.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
end of thread, other threads:[~2015-08-28 12:31 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-14 1:46 [PATCH RESEND] sched/nohz: Affine unpinned timers to housekeepers Frederic Weisbecker 2015-08-22 21:09 ` Ping: " Frederic Weisbecker 2015-08-23 1:22 ` Christoph Lameter 2015-08-23 5:40 ` Ingo Molnar 2015-08-23 16:01 ` Paul E. McKenney 2015-08-24 1:28 ` Frederic Weisbecker 2015-08-24 6:44 ` Ingo Molnar 2015-08-24 7:23 ` Mike Galbraith 2015-08-24 7:41 ` Ingo Molnar 2015-08-24 7:54 ` Mike Galbraith 2015-08-24 8:00 ` Ingo Molnar 2015-08-24 13:36 ` Frederic Weisbecker 2015-08-24 14:01 ` Mike Galbraith 2015-08-25 8:29 ` Ingo Molnar 2015-08-25 13:45 ` Frederic Weisbecker 2015-08-28 8:32 ` Ingo Molnar 2015-08-28 12:30 ` Frederic Weisbecker 2015-08-24 13:50 ` Paul E. McKenney 2015-08-24 14:04 ` Frederic Weisbecker 2015-08-24 15:45 ` Paul E. McKenney 2015-08-24 1:45 ` Frederic Weisbecker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox