* [PATCH 0/2] Handle timers added to offlined CPU(s). @ 2025-01-15 13:41 Imran Khan 2025-01-15 13:41 ` [PATCH 1/2] timers: WARN if add_timer_on is used with offlined cpu Imran Khan 2025-01-15 13:41 ` [PATCH 2/2] timers: introduce timer_try_add_on_cpu Imran Khan 0 siblings, 2 replies; 8+ messages in thread From: Imran Khan @ 2025-01-15 13:41 UTC (permalink / raw) To: anna-maria, frederic, tglx; +Cc: linux-kernel This patchset is in continuation of issue discussed in [1]. Currently there is a possibilty that users of add_timer_on, can (mistakenly) use it for starting a timer on an offlined cpu. As expected such timers will not give desired result. This patchset: i. (PATCH 1) WARNs users of add_timer_on, if they are using it with offlined CPU. ii. (PATCH 2) Introduces an alternative interface that could be used, to ensure that timer gets started only on an online cpu. Imran Khan (2): timers: WARN if add_timer_on is used with offlined cpu. timers: introduce timer_try_add_on_cpu. include/linux/timer.h | 1 + kernel/time/timer.c | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) base-commit: e7bb221a638962d487231ac45a6699fb9bb8f9fa [1]: https://lore.kernel.org/all/0a83e9e1-3e85-4c33-bec8-6b60e6216bd8@oracle.com/ -- 2.34.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] timers: WARN if add_timer_on is used with offlined cpu. 2025-01-15 13:41 [PATCH 0/2] Handle timers added to offlined CPU(s) Imran Khan @ 2025-01-15 13:41 ` Imran Khan 2025-01-15 14:39 ` Thomas Gleixner 2025-01-15 13:41 ` [PATCH 2/2] timers: introduce timer_try_add_on_cpu Imran Khan 1 sibling, 1 reply; 8+ messages in thread From: Imran Khan @ 2025-01-15 13:41 UTC (permalink / raw) To: anna-maria, frederic, tglx; +Cc: linux-kernel timer started on an offlined cpu will not fire after its expiry time and may never fire if that cpu remains offline. So add a WARN_ON_ONCE in add_timer_on, to indicate if any of its users are (wrongly) starting a timer on an offlined cpu. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> --- kernel/time/timer.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index a5860bf6d16f9..ec9eb58e45241 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1370,6 +1370,12 @@ void add_timer_on(struct timer_list *timer, int cpu) if (!timer->function) goto out_unlock; + /* + * WARN if specified cpu is offline, because on offlined cpu + * timer will not fire even after its expiry. + */ + WARN_ON_ONCE(!cpu_online(cpu)); + if (base != new_base) { timer->flags |= TIMER_MIGRATING; -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] timers: WARN if add_timer_on is used with offlined cpu. 2025-01-15 13:41 ` [PATCH 1/2] timers: WARN if add_timer_on is used with offlined cpu Imran Khan @ 2025-01-15 14:39 ` Thomas Gleixner 2025-01-15 15:10 ` imran.f.khan 0 siblings, 1 reply; 8+ messages in thread From: Thomas Gleixner @ 2025-01-15 14:39 UTC (permalink / raw) To: Imran Khan, anna-maria, frederic; +Cc: linux-kernel On Thu, Jan 16 2025 at 00:41, Imran Khan wrote: > timer started on an offlined cpu will not fire after > its expiry time and may never fire if that cpu remains > offline. > So add a WARN_ON_ONCE in add_timer_on, to indicate > if any of its users are (wrongly) starting a timer > on an offlined cpu. > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Imran Khan <imran.f.khan@oracle.com> > --- > kernel/time/timer.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index a5860bf6d16f9..ec9eb58e45241 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -1370,6 +1370,12 @@ void add_timer_on(struct timer_list *timer, int cpu) > if (!timer->function) > goto out_unlock; > > + /* > + * WARN if specified cpu is offline, because on offlined cpu > + * timer will not fire even after its expiry. > + */ > + WARN_ON_ONCE(!cpu_online(cpu)); Then why queueing the timer in the first place? Thanks, tglx ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] timers: WARN if add_timer_on is used with offlined cpu. 2025-01-15 14:39 ` Thomas Gleixner @ 2025-01-15 15:10 ` imran.f.khan 0 siblings, 0 replies; 8+ messages in thread From: imran.f.khan @ 2025-01-15 15:10 UTC (permalink / raw) To: Thomas Gleixner, anna-maria, frederic; +Cc: linux-kernel Hello Thomas, On 16/1/2025 1:39 am, Thomas Gleixner wrote: > On Thu, Jan 16 2025 at 00:41, Imran Khan wrote: >> timer started on an offlined cpu will not fire after >> its expiry time and may never fire if that cpu remains >> offline. >> So add a WARN_ON_ONCE in add_timer_on, to indicate >> if any of its users are (wrongly) starting a timer >> on an offlined cpu. >> >> Suggested-by: Thomas Gleixner <tglx@linutronix.de> >> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> >> --- >> kernel/time/timer.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/kernel/time/timer.c b/kernel/time/timer.c >> index a5860bf6d16f9..ec9eb58e45241 100644 >> --- a/kernel/time/timer.c >> +++ b/kernel/time/timer.c >> @@ -1370,6 +1370,12 @@ void add_timer_on(struct timer_list *timer, int cpu) >> if (!timer->function) >> goto out_unlock; >> >> + /* >> + * WARN if specified cpu is offline, because on offlined cpu >> + * timer will not fire even after its expiry. >> + */ >> + WARN_ON_ONCE(!cpu_online(cpu)); > > Then why queueing the timer in the first place? > The queueing is for the case, if the cpu becomes online again. Warning tells that timer is being put on an offlined cpu so may never fire, if the cpu remains offline. But if the cpu becomes online, the timer will fire. Or should I just return after warning, like currently being done for timer_pending. In that case I can move the warning before lock_timer_base. Thanks, Imran > Thanks, > > tglx ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] timers: introduce timer_try_add_on_cpu. 2025-01-15 13:41 [PATCH 0/2] Handle timers added to offlined CPU(s) Imran Khan 2025-01-15 13:41 ` [PATCH 1/2] timers: WARN if add_timer_on is used with offlined cpu Imran Khan @ 2025-01-15 13:41 ` Imran Khan 2025-01-15 16:04 ` Thomas Gleixner 1 sibling, 1 reply; 8+ messages in thread From: Imran Khan @ 2025-01-15 13:41 UTC (permalink / raw) To: anna-maria, frederic, tglx; +Cc: linux-kernel Timers started using add_timer_on, can get stuck if the specified cpu is offline. If the user of add_timer_on can't guarantee that the specified cpu is online and ends up starting timer on an offline cpu, then that timer may not give expected results. Such users can use new interface timer_try_add_on_cpu, which starts timer on a given cpu, only after ensuring that it remains online. If it sees that the specified cpu is offline or if it can't ensure that the cpu is online, it does not start timer on any cpu and leaves the decision of selecting other cpu to the caller. Thus it ensures that started timer does not get lost, because it was started on an offlined cpu. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> --- include/linux/timer.h | 1 + kernel/time/timer.c | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+) diff --git a/include/linux/timer.h b/include/linux/timer.h index e67ecd1cbc97d..210c15527b325 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -148,6 +148,7 @@ static inline int timer_pending(const struct timer_list * timer) } extern void add_timer_on(struct timer_list *timer, int cpu); +extern bool timer_try_add_on_cpu(struct timer_list *timer, int cpu); extern int mod_timer(struct timer_list *timer, unsigned long expires); extern int mod_timer_pending(struct timer_list *timer, unsigned long expires); extern int timer_reduce(struct timer_list *timer, unsigned long expires); diff --git a/kernel/time/timer.c b/kernel/time/timer.c index ec9eb58e45241..800ed9b4dea7a 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1394,6 +1394,39 @@ void add_timer_on(struct timer_list *timer, int cpu) } EXPORT_SYMBOL_GPL(add_timer_on); +/** + * timer_try_add_on_cpu - Try to start a timer on a particular CPU, + * after ensuring that it is and remains online. + * @timer: The timer to be started + * @cpu: The CPU to start it on + * + * Check and ensure that specified cpu is around, before starting a timer + * on it. + * + * Return: + * * %true - If timer was started on an online cpu + * * %false - If the specified cpu was offline or if its online status + * could not be ensured due to unavailability of hotplug lock. + */ +bool timer_try_add_on_cpu(struct timer_list *timer, int cpu) +{ + bool ret = true; + + if (unlikely(!cpu_online(cpu))) + ret = false; + else if (cpus_read_trylock()) { + if (likely(cpu_online(cpu))) + add_timer_on(timer, cpu); + else + ret = false; + cpus_read_unlock(); + } else + ret = false; + + return ret; +} +EXPORT_SYMBOL_GPL(timer_try_add_on_cpu); + /** * __timer_delete - Internal function: Deactivate a timer * @timer: The timer to be deactivated -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] timers: introduce timer_try_add_on_cpu. 2025-01-15 13:41 ` [PATCH 2/2] timers: introduce timer_try_add_on_cpu Imran Khan @ 2025-01-15 16:04 ` Thomas Gleixner 2025-01-15 17:00 ` imran.f.khan 0 siblings, 1 reply; 8+ messages in thread From: Thomas Gleixner @ 2025-01-15 16:04 UTC (permalink / raw) To: Imran Khan, anna-maria, frederic; +Cc: linux-kernel On Thu, Jan 16 2025 at 00:41, Imran Khan wrote: > + * Return: > + * * %true - If timer was started on an online cpu > + * * %false - If the specified cpu was offline or if its online status > + * could not be ensured due to unavailability of hotplug lock. > + */ > +bool timer_try_add_on_cpu(struct timer_list *timer, int cpu) > +{ > + bool ret = true; > + > + if (unlikely(!cpu_online(cpu))) > + ret = false; > + else if (cpus_read_trylock()) { > + if (likely(cpu_online(cpu))) > + add_timer_on(timer, cpu); > + else > + ret = false; > + cpus_read_unlock(); > + } else > + ret = false; > + > + return ret; Aside of the horrible coding style, that cpus_read_trylock() part does not make any sense. It's perfectly valid to queue a timer on a online CPU when the CPU hotplug lock is held write, which can have tons of reasons even unrelated to an actual CPU hotplug operation. Even during a hotplug operation adding a timer on a particular CPU is valid, whether that's the CPU which is actually plugged or not is irrelevant. So if we add such a function, then it needs to have very precisely defined semantics, which have to be independent of the CPU hotplug lock. The only way I can imagine is that the state is part of the per CPU timer base, but then I have to ask the question what is actually tried to solve here. As far as I understood that there is an issue in the RDS code, queueing a delayed work on a offline CPU, but that should have triggered at least the warning in __queue_delayed_work(), right? So the question is whether this try() interface is solving any of this and not papering over the CPU hotplug related issues in the RDS code in some way. Thanks, tglx ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] timers: introduce timer_try_add_on_cpu. 2025-01-15 16:04 ` Thomas Gleixner @ 2025-01-15 17:00 ` imran.f.khan 2025-01-28 3:36 ` imran.f.khan 0 siblings, 1 reply; 8+ messages in thread From: imran.f.khan @ 2025-01-15 17:00 UTC (permalink / raw) To: Thomas Gleixner, anna-maria, frederic; +Cc: linux-kernel Hello Thomas, Thanks for taking a look and your feedback. On 16/1/2025 3:04 am, Thomas Gleixner wrote: > On Thu, Jan 16 2025 at 00:41, Imran Khan wrote: >> + * Return: >> + * * %true - If timer was started on an online cpu >> + * * %false - If the specified cpu was offline or if its online status >> + * could not be ensured due to unavailability of hotplug lock. >> + */ >> +bool timer_try_add_on_cpu(struct timer_list *timer, int cpu) >> +{ >> + bool ret = true; >> + >> + if (unlikely(!cpu_online(cpu))) >> + ret = false; >> + else if (cpus_read_trylock()) { >> + if (likely(cpu_online(cpu))) >> + add_timer_on(timer, cpu); >> + else >> + ret = false; >> + cpus_read_unlock(); >> + } else >> + ret = false; >> + >> + return ret; > > Aside of the horrible coding style, that cpus_read_trylock() part does > not make any sense. > > It's perfectly valid to queue a timer on a online CPU when the CPU > hotplug lock is held write, which can have tons of reasons even > unrelated to an actual CPU hotplug operation. > > Even during a hotplug operation adding a timer on a particular CPU is > valid, whether that's the CPU which is actually plugged or not is > irrelevant. > > So if we add such a function, then it needs to have very precisely > defined semantics, which have to be independent of the CPU hotplug lock. > The hotplug lock is being used to avoid the scenario where cpu_online tells that @cpu is online but @cpu gets offline before add_timer_on could actually add the timer to that @cpu's timer base. Are you saying that this can't happen or by "defined semantics" you mean that @cpu indicated as online by cpu_online should not get offline in the middle of this function. > The only way I can imagine is that the state is part of the per CPU > timer base, but then I have to ask the question what is actually tried > to solve here. > > As far as I understood that there is an issue in the RDS code, queueing > a delayed work on a offline CPU, but that should have triggered at least > the warning in __queue_delayed_work(), right? > I guess you are referring to warning of [1]. This was just added few days back but the timer of delayed_work can still end up on offlined cpu. > So the question is whether this try() interface is solving any of this > and not papering over the CPU hotplug related issues in the RDS code in > some way. > The RDS code that I referred to in my query, is an in-house change and there may be some scope of updating the cached-cpu information there with cpu hotplug callbacks. But we also wanted to see if something could be done on timer side to address the possibilty of timer ending up on an offlined cpu. That's why I asked earlier if you see any merit in having a try() interface. As of now I don't have any more cases, running into this problem (putting timer-wheel timers on offlined cpu). May be with warning in __queue_delayed_work and (if gets added) in add_timer_on we may see more such cases. But if you agree, try() interface could still be added albeit without hotplug lock. Thanks, Imran [1]: https://github.com/torvalds/linux/blob/master/kernel/workqueue.c#L2511 > Thanks, > > tglx > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] timers: introduce timer_try_add_on_cpu. 2025-01-15 17:00 ` imran.f.khan @ 2025-01-28 3:36 ` imran.f.khan 0 siblings, 0 replies; 8+ messages in thread From: imran.f.khan @ 2025-01-28 3:36 UTC (permalink / raw) To: Thomas Gleixner, anna-maria, frederic; +Cc: linux-kernel Hello Thomas, Below, I have tried further to explain the reason behind using cpus_read_trylock in try_add_timer_on_cpu. Say CPU X is being offlined and CPU Y is checking for its online status (before issuing add_timer_on(X)). CPU Y is not the bootstrap processor (BP) here, it is executing something else that does: if cpu_online(X) add_timer_on(X); If at the time of checking cpu_online(X),the hotplug thread of CPU X i.e. cpuhp/X has not yet done __cpu_disable, CPU Y will see CPU X as online and issue add_timer_on (in the above snippet). In this case, whether the timer ends on an offlined cpu or not, depends on who gets the per_cpu timer_base.lock first. If the bootstrap processor, offlining CPU X, gets this lock first (in timers_dead_cpu), it will migrate all the timers from CPU X and then release timer_base.lock. Then CPU Y (add_timer_on) will get this lock and add timer to CPU X's timer_base, but since CPU X's timer have already been migrated, this newly added timer will be left on an offlined CPU. On the other hand if CPU Y (add_timer_on) wins the race, it would have already added the timer into timer_base of CPU X, before BP (timers_dead_cpu) gets the timer_base.lock and migrates all timers (including the one just added), to bootstrap processor and hence the timer will not be left on an offlined CPU. Could you please let me know if you see any problems/mistakes in the above reasoning ? From your previous reply I could not understand if you are totally against using cpus_read_trylock (because it may not be needed here and I am wrongly seeing its need) or if you are against using cpus_read_trylock in try_add_timer_on_cpu (i.e. caller of try_add_timer_on_cpu should take this lock). So I have tried to explain my reasoning further and know your thoughts. Thanks, Imran On 16/1/2025 4:00 am, imran.f.khan@oracle.com wrote: > Hello Thomas, > Thanks for taking a look and your feedback. > On 16/1/2025 3:04 am, Thomas Gleixner wrote: >> On Thu, Jan 16 2025 at 00:41, Imran Khan wrote: >>> + * Return: >>> + * * %true - If timer was started on an online cpu >>> + * * %false - If the specified cpu was offline or if its online status >>> + * could not be ensured due to unavailability of hotplug lock. >>> + */ >>> +bool timer_try_add_on_cpu(struct timer_list *timer, int cpu) >>> +{ >>> + bool ret = true; >>> + >>> + if (unlikely(!cpu_online(cpu))) >>> + ret = false; >>> + else if (cpus_read_trylock()) { >>> + if (likely(cpu_online(cpu))) >>> + add_timer_on(timer, cpu); >>> + else >>> + ret = false; >>> + cpus_read_unlock(); >>> + } else >>> + ret = false; >>> + >>> + return ret; >> >> Aside of the horrible coding style, that cpus_read_trylock() part does >> not make any sense. >> >> It's perfectly valid to queue a timer on a online CPU when the CPU >> hotplug lock is held write, which can have tons of reasons even >> unrelated to an actual CPU hotplug operation. >> >> Even during a hotplug operation adding a timer on a particular CPU is >> valid, whether that's the CPU which is actually plugged or not is >> irrelevant. >> >> So if we add such a function, then it needs to have very precisely >> defined semantics, which have to be independent of the CPU hotplug lock. >> > > The hotplug lock is being used to avoid the scenario where cpu_online tells > that @cpu is online but @cpu gets offline before add_timer_on could > actually add the timer to that @cpu's timer base. > Are you saying that this can't happen or by "defined semantics" > you mean that @cpu indicated as online by cpu_online should not get > offline in the middle of this function. > >> The only way I can imagine is that the state is part of the per CPU >> timer base, but then I have to ask the question what is actually tried >> to solve here. >> >> As far as I understood that there is an issue in the RDS code, queueing >> a delayed work on a offline CPU, but that should have triggered at least >> the warning in __queue_delayed_work(), right? >> > > I guess you are referring to warning of [1]. This was just added few days > back but the timer of delayed_work can still end up on offlined cpu. > >> So the question is whether this try() interface is solving any of this >> and not papering over the CPU hotplug related issues in the RDS code in >> some way. >> > > The RDS code that I referred to in my query, is an in-house change and there > may be some scope of updating the cached-cpu information there with cpu hotplug > callbacks. But we also wanted to see if something could be done on timer > side to address the possibilty of timer ending up on an offlined cpu. That's > why I asked earlier if you see any merit in having a try() interface. > > As of now I don't have any more cases, running into this problem (putting > timer-wheel timers on offlined cpu). May be with warning in > __queue_delayed_work and (if gets added) in add_timer_on we may see > more such cases. > > But if you agree, try() interface could still be added albeit without > hotplug lock. > > Thanks, > Imran > > [1]: https://github.com/torvalds/linux/blob/master/kernel/workqueue.c#L2511 >> Thanks, >> >> tglx >> >> > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-01-28 3:36 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-15 13:41 [PATCH 0/2] Handle timers added to offlined CPU(s) Imran Khan 2025-01-15 13:41 ` [PATCH 1/2] timers: WARN if add_timer_on is used with offlined cpu Imran Khan 2025-01-15 14:39 ` Thomas Gleixner 2025-01-15 15:10 ` imran.f.khan 2025-01-15 13:41 ` [PATCH 2/2] timers: introduce timer_try_add_on_cpu Imran Khan 2025-01-15 16:04 ` Thomas Gleixner 2025-01-15 17:00 ` imran.f.khan 2025-01-28 3:36 ` imran.f.khan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox