* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline [not found] ` <20121205165356.GL3885@mtj.dyndns.org> @ 2012-12-05 18:32 ` Srivatsa S. Bhat 0 siblings, 0 replies; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:32 UTC (permalink / raw) To: Tejun Heo Cc: Oleg Nesterov, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, linux-kernel@vger.kernel.org, Linux PM mailing list, Nikunj A Dadhania On 12/05/2012 10:23 PM, Tejun Heo wrote: > Hello, Oleg. > > On Wed, Dec 05, 2012 at 05:46:40PM +0100, Oleg Nesterov wrote: >> Sorry I don't understand the context and I can't find this thread >> anywhere, so I am not sure I understand... > > Weird, lkml cc is missing. Srivatsa? > Argh! I forgot to CC mailing lists :-( Sorry about that! I'll repost the patchset (and I'll probably even try to post the comments that you and Oleg have already given, for the benefit of other readers). Regards, Srivatsa S. Bhat >> On 12/05, Tejun Heo wrote: >>> >>> So, we basically need percpu_rwlock. We already have percpu_rwsem. >> >> Yes, and with -mm patches it becomes reader-friendly. In particular >> see http://marc.info/?l=linux-mm-commits&m=135240650828875 >> >>> Oleg, it seems >>> CPU hotplug needs big-reader rwlock, ideas on how to proceed? >> >> I am going to convert get_online_cpus() to use percpu_down_read(), >> this looks simple. >> >> We already discussed this with Paul, see >> >> http://marc.info/?l=linux-kernel&m=135248463226031 >> >> and the whole thread. >> >> In short, all we need is percpu_down_write_recursive_readers() and >> afaics the only complication is lockdep, we need down_read_no_lockdep() >> which (like __up_read) doesn't do rwsem_acquire_read(). > > So, it's a different thing. There are two mechanism protecting > against cpu hotplug - get_online_cpus() and preempt_disable(). The > former can be used by ones which can sleep and need to protect against > the whole up/down process (DOWN_PREPARE and so on). The latter > protects the last step and can be used when the caller can't sleep. > Replacing get_online_cpus() w/ percpu_rwsem is great but this thread > is about replacing preempt_disable with something finer grained and > less heavy on the writer side - IOW, percpu_rwlock as opposed to > percpu_rwsem, so, I think the end result would be that CPU hotplug > will be protected by percpu_rwsem for the whole part and by > percpu_rwlock for the last commit stage. > > The problem seems that we don't have percpu_rwlock yet. It shouldn't > be too difficult to implement, right? > > Thanks. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH v2 00/10][RESEND] CPU hotplug: stop_machine()-free CPU hotplug @ 2012-12-05 18:42 Srivatsa S. Bhat 2012-12-05 18:43 ` [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline Srivatsa S. Bhat 0 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:42 UTC (permalink / raw) To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, oleg Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel [Resending because I forgot to CC lkml last time. Please send your replies to this thread instead of the previous (non-public) one. Sorry for the trouble!] Hi, This patchset removes CPU hotplug's dependence on stop_machine() from the CPU offline path and provides an alternative (set of APIs) to preempt_disable() to prevent CPUs from going offline, which can be invoked from atomic context. This is an RFC patchset with only a few call-sites of preempt_disable() converted to the new APIs for now, and the main goal is to get feedback on the design of the new atomic APIs and see if it serves as a viable replacement for stop_machine()-free CPU hotplug. Overview of the patches: ----------------------- Patch 1 introduces the new APIs that can be used from atomic context, to prevent CPUs from going offline. This set of APIs is for "light" atomic hotplug readers (the term "light" is explained in the changelog/comments). Patch 2 introduces the new APIs for "full" atomic hotplug readers. Patch 3 is a cleanup; it converts preprocessor macros to static inline functions. Patches 4 to 9 convert various call-sites to use the new APIs. (Patches 4 to 8 deal with "light" readers, patch 9 deals with a "full" reader. Together, they demonstrate how to use these APIs effectively). Patch 10 is the one which actually removes stop_machine() from the CPU offline path. Changes in v2: ------------- * Completely redesigned the synchronization scheme to avoid using any extra cpumasks. * Provided APIs for 2 types of atomic hotplug readers: "light" (for light-weight) and "full". We wish to have more "light" readers than the "full" ones, to avoid indirectly inducing the "stop_machine effect" without even actually using stop_machine(). And the patches show that it _is_ generally true: 5 patches deal with "light" readers, whereas only 1 patch deals with a "full" reader. Also, the "light" readers happen to be in very hot paths. So it makes a lot of sense to have such a distinction and a corresponding light-weight API. v1: https://lkml.org/lkml/2012/12/4/88 Comments and suggestions welcome! -- Paul E. McKenney (1): cpu: No more __stop_machine() in _cpu_down() Srivatsa S. Bhat (9): CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline CPU hotplug: Provide APIs for "full" atomic readers to prevent CPU offline CPU hotplug: Convert preprocessor macros to static inline functions smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq() kick_process(), cpu-hotplug: Prevent offlining of target CPU properly yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly kvm, vmx: Add full atomic synchronization with CPU Hotplug arch/x86/kvm/vmx.c | 8 ++- include/linux/cpu.h | 12 ++++- kernel/cpu.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++- kernel/sched/core.c | 22 +++++++- kernel/smp.c | 64 +++++++++++++++--------- 5 files changed, 210 insertions(+), 32 deletions(-) Thanks, Srivatsa S. Bhat IBM Linux Technology Center ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:42 [RFC PATCH v2 00/10][RESEND] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat @ 2012-12-05 18:43 ` Srivatsa S. Bhat 2012-12-05 18:47 ` Srivatsa S. Bhat 2012-12-05 19:07 ` Oleg Nesterov 0 siblings, 2 replies; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:43 UTC (permalink / raw) To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, oleg Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel There are places where preempt_disable() is used to prevent any CPU from going offline during the critical section. Let us call them as "atomic hotplug readers" (atomic because they run in atomic contexts). Often, these atomic hotplug readers have a simple need : they want the cpu online mask that they work with (inside their critical section), to be stable, i.e., it should be guaranteed that CPUs in that mask won't go offline during the critical section. The important point here is that they don't really need to synchronize with the actual CPU tear-down sequence. All they need is synchronization with the updates to the cpu_online_mask. (Hence the term "light", for light-weight). The intent of this patch is to provide synchronization APIs for such "light" atomic hotplug readers. [ get/put_online_cpus_atomic_light() ] Fundamental idea behind the design: ----------------------------------- Simply put, in the hotplug writer path, have appropriate locking around the update to the cpu_online_mask in the CPU tear-down sequence. And once the update is done, release the lock and allow the "light" atomic hotplug readers to go ahead. Meanwhile, the hotplug writer can safely continue the actual CPU tear-down sequence (running CPU_DYING notifiers etc) since the "light" atomic readers don't really care about those operations (and hence don't need to synchronize with them). Also, once the hotplug writer completes taking the CPU offline, it should not start any new cpu_down() operations until all existing "light" atomic hotplug readers have completed. Some important design requirements and considerations: ----------------------------------------------------- 1. The "light" atomic hotplug readers should ideally *never* have to wait for the hotplug writer (cpu_down()) for too long (like entire duration of CPU offline, for example). Because, these atomic hotplug readers can be in very hot-paths like interrupt handling/IPI and hence, if they have to wait for an ongoing cpu_down() to complete, it would pretty much introduce the same performance/latency problems as stop_machine(). 2. Any synchronization at the atomic hotplug readers side must be highly scalable - avoid global single-holder locks/counters etc. Because, these paths currently use the extremely fast preempt_disable(); our replacement to preempt_disable() should not become ridiculously costly and also should not serialize the readers among themselves needlessly. 3. preempt_disable() was recursive. The replacement should also be recursive. Implementation of the design: ---------------------------- At the core, we use a reader-writer lock to synchronize the update to the cpu_online_mask. That way, multiple "light" atomic hotplug readers can co-exist and the writer can acquire the lock only when all the readers have completed. Once acquired, the writer holds the "light" lock only during the duration of the update to the cpu_online_mask. That way, the readers don't have to spin for too long (ie., the write-hold-time for the "light" lock is tiny), which keeps the readers in good shape. Reader-writer lock are recursive, so they can be used in a nested fashion in the reader-path. Together, these satisfy all the 3 requirements mentioned above. Also, since we don't use per-cpu locks (because rwlocks themselves are quite scalable for readers), we don't end up in any lock ordering problems that can occur if we try to use per-cpu locks. I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful suggestions and ideas, which inspired and influenced many of the decisions in this as well as previous designs. Thanks a lot Michael and Xiao! Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> --- include/linux/cpu.h | 4 ++ kernel/cpu.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 90 insertions(+), 1 deletion(-) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index ce7a074..dd0a3ee 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys; extern void get_online_cpus(void); extern void put_online_cpus(void); +extern void get_online_cpus_atomic_light(void); +extern void put_online_cpus_atomic_light(void); #define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) @@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void) #define get_online_cpus() do { } while (0) #define put_online_cpus() do { } while (0) +#define get_online_cpus_atomic_light() do { } while (0) +#define put_online_cpus_atomic_light() do { } while (0) #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0) /* These aren't inline functions due to a GCC bug. */ #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) diff --git a/kernel/cpu.c b/kernel/cpu.c index 42bd331..381593c 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -49,6 +49,69 @@ static int cpu_hotplug_disabled; #ifdef CONFIG_HOTPLUG_CPU +/* + * Reader-writer lock to synchronize between "light" atomic hotplug readers + * and the hotplug writer while updating cpu_online_mask. + * "Light" atomic hotplug readers are those who don't really need to + * synchronize with the actual CPU bring-up/take-down sequence, but only + * need to synchronize with the updates to the cpu_online_mask. + */ +static DEFINE_RWLOCK(light_hotplug_rwlock); + +/* + * Hotplug readers (those that want to prevent CPUs from coming online or + * going offline ) sometimes run from atomic contexts, and hence can't use + * get/put_online_cpus() because they can sleep. And often-times, all + * they really want is that the cpu_online_mask remain unchanged while + * they are executing in their critical section. They also don't really + * need to synchronize with the actual CPU tear-down sequence. Such atomic + * hotplug readers are called "light" readers (light for light-weight). + * + * These "light" atomic hotplug readers can use the APIs + * get/put_online_atomic_light() around their critical sections to + * ensure that the cpu_online_mask remains unaltered throughout that + * critical section. + * + * Caution!: While the readers are in their critical section, a CPU offline + * operation can actually happen under the covers; its just that the bit flip + * in the cpu_online_mask will be synchronized properly if you use these APIs. + * If you really want full synchronization with the entire CPU tear-down + * sequence, then you are not a "light" hotplug reader. So don't use these + * APIs! + * + * Eg: + * + * "Light" atomic hotplug read-side critical section: + * -------------------------------------------------- + * + * get_online_cpus_atomic_light(); + * + * for_each_online_cpu(cpu) { + * ... Do something... + * } + * ... + * + * if (cpu_online(other_cpu)) + * do_something(); + * + * put_online_cpus_atomic_light(); + * + * You can call this function recursively. + */ +void get_online_cpus_atomic_light(void) +{ + preempt_disable(); + read_lock(&light_hotplug_rwlock); +} +EXPORT_SYMBOL_GPL(get_online_cpus_atomic_light); + +void put_online_cpus_atomic_light(void) +{ + read_unlock(&light_hotplug_rwlock); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(put_online_cpus_atomic_light); + static struct { struct task_struct *active_writer; struct mutex lock; /* Synchronizes accesses to refcount, */ @@ -246,14 +309,36 @@ struct take_cpu_down_param { static int __ref take_cpu_down(void *_param) { struct take_cpu_down_param *param = _param; + unsigned long flags; int err; + /* + * __cpu_disable() is the step where the CPU is removed from the + * cpu_online_mask. Protect it with the light-lock held for write. + */ + write_lock_irqsave(&light_hotplug_rwlock, flags); + /* Ensure this CPU doesn't handle any more interrupts. */ err = __cpu_disable(); - if (err < 0) + if (err < 0) { + write_unlock_irqrestore(&light_hotplug_rwlock, flags); return err; + } + + /* + * We have successfully removed the CPU from the cpu_online_mask. + * So release the light-lock, so that the light-weight atomic readers + * (who care only about the cpu_online_mask updates, and not really + * about the actual cpu-take-down operation) can continue. + * + * But don't enable interrupts yet, because we still have work left to + * do, to actually bring the CPU down. + */ + write_unlock(&light_hotplug_rwlock); cpu_notify(CPU_DYING | param->mod, param->hcpu); + + local_irq_restore(flags); return 0; } ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:43 ` [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline Srivatsa S. Bhat @ 2012-12-05 18:47 ` Srivatsa S. Bhat 2012-12-05 18:51 ` Srivatsa S. Bhat 2012-12-05 19:07 ` Oleg Nesterov 1 sibling, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:47 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel Replaying what Tejun wrote: (cc'ing Oleg) Hello, Srivatsa. On 12/06/2012 12:13 AM, Srivatsa S. Bhat wrote: > Also, since we don't use per-cpu locks (because rwlocks themselves are quite > scalable for readers), we don't end up in any lock ordering problems that can > occur if we try to use per-cpu locks. > Read-lock really isn't that scalable when you compare it to preempt_disable/enable(). When used on hot paths, it's gonna generate a lot of cacheline pingpongs. This patch is essentially creating a new big lock which has potential for being very hot. preempt_disable/enable() + stop_machine() essentially works as percpu rwlock with very heavy penalty on the writer side. Because the reader side doesn't even implement spinning while writer is in progress, the writer side has to preempt the readers before entering critical section and that's what the "stopping machine" is about. Note that the resolution on the reader side is very low. Any section w/ preemption disabled is protected against stop_machine(). Also, the stop_machine() itself is extremely heavy involving essentially locking up the machine until all CPUs can reach the same condition via scheduling the stop_machine tasks. So, I *think* all you need to do here is making cpu online locking finer grained (separated from preemption) and lighten the writer side a bit. I'm quite doubtful that you would need to go hunting donw all get_online_cpus(). They aren't used that often anyway. Anyways, so, separating out cpu hotplug locking from preemption is the right thing to do but I think rwlock is likely to be too heavy on the reader side. I think percpu reader accounting + reader spinning while writer in progress should be a good combination. It's a bit heavier than preempt_disable() - it'll have an extra conditional jump on the hot path, but there won't be any cacheline bouncing. The writer side would need to synchronize against all CPUs but only against the ones actually read locking cpu hotplug. As long as reader side critical sections don't go crazy, it should be okay. So, we basically need percpu_rwlock. We already have percpu_rwsem. We used to have some different variants of writer-heavy locks. Dunno what happened to them. Maybe we still have it somewhere. Oleg has been working on the area lately and should know more. Oleg, it seems CPU hotplug needs big-reader rwlock, ideas on how to proceed? Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:47 ` Srivatsa S. Bhat @ 2012-12-05 18:51 ` Srivatsa S. Bhat 2012-12-05 18:53 ` Srivatsa S. Bhat 0 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:51 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel Replaying what Oleg wrote: Hi, Sorry I don't understand the context and I can't find this thread anywhere, so I am not sure I understand... > Replaying what Tejun wrote: > So, we basically need percpu_rwlock. We already have percpu_rwsem. Yes, and with -mm patches it becomes reader-friendly. In particular see http://marc.info/?l=linux-mm-commits&m=135240650828875 > Oleg, it seems > CPU hotplug needs big-reader rwlock, ideas on how to proceed? > I am going to convert get_online_cpus() to use percpu_down_read(), this looks simple. We already discussed this with Paul, see http://marc.info/?l=linux-kernel&m=135248463226031 and the whole thread. In short, all we need is percpu_down_write_recursive_readers() and afaics the only complication is lockdep, we need down_read_no_lockdep() which (like __up_read) doesn't do rwsem_acquire_read(). Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:51 ` Srivatsa S. Bhat @ 2012-12-05 18:53 ` Srivatsa S. Bhat 2012-12-05 18:56 ` Srivatsa S. Bhat 0 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:53 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel Replaying what Tejun wrote: Hello, Oleg. > Replaying what Oleg wrote: > > Hi, > > Sorry I don't understand the context and I can't find this thread > anywhere, so I am not sure I understand... > Weird, lkml cc is missing. Srivatsa? [Now fixed. This thread has lkml CC] >> Replaying what Tejun wrote: >> So, we basically need percpu_rwlock. We already have percpu_rwsem. > > Yes, and with -mm patches it becomes reader-friendly. In particular > see http://marc.info/?l=linux-mm-commits&m=135240650828875 > >> Oleg, it seems >> CPU hotplug needs big-reader rwlock, ideas on how to proceed? >> > > I am going to convert get_online_cpus() to use percpu_down_read(), > this looks simple. > > We already discussed this with Paul, see > > http://marc.info/?l=linux-kernel&m=135248463226031 > > and the whole thread. > > In short, all we need is percpu_down_write_recursive_readers() and > afaics the only complication is lockdep, we need down_read_no_lockdep() > which (like __up_read) doesn't do rwsem_acquire_read(). > So, it's a different thing. There are two mechanism protecting against cpu hotplug - get_online_cpus() and preempt_disable(). The former can be used by ones which can sleep and need to protect against the whole up/down process (DOWN_PREPARE and so on). The latter protects the last step and can be used when the caller can't sleep. Replacing get_online_cpus() w/ percpu_rwsem is great but this thread is about replacing preempt_disable with something finer grained and less heavy on the writer side - IOW, percpu_rwlock as opposed to percpu_rwsem, so, I think the end result would be that CPU hotplug will be protected by percpu_rwsem for the whole part and by percpu_rwlock for the last commit stage. The problem seems that we don't have percpu_rwlock yet. It shouldn't be too difficult to implement, right? Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:53 ` Srivatsa S. Bhat @ 2012-12-05 18:56 ` Srivatsa S. Bhat 2012-12-05 18:59 ` Srivatsa S. Bhat 0 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:56 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel Replaying what Oleg wrote: (add lkml) > Replaying what Tejun wrote: > Replacing get_online_cpus() w/ percpu_rwsem is great but this thread > is about replacing preempt_disable with something finer grained and > less heavy on the writer side If only I understood why preempt_disable() is bad ;-) OK, I guess "less heavy on the writer side" is the hint, and in the previous email you mentioned that "stop_machine() itself is extremely heavy". Looks like, you are going to remove stop_machine() from cpu_down ??? > The problem seems that we don't have percpu_rwlock yet. It shouldn't > be too difficult to implement, right? > Oh, I am not sure... unless you simply copy-and-paste the lglock code and replace spinlock_t with rwlock_t. We probably want something more efficient, but I bet we can't avoid the barriers on the read side. And somehow we should avoid the livelocks. Say, we can't simply add the per_cpu_reader_counter, _read_lock should spin if the writer is active. But at the same time _read_lock should be recursive. Tejun, could you please send me mbox with this thread offlist? [That should now be unnecessary, since the discussion can continue on-list on this thread]. Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:56 ` Srivatsa S. Bhat @ 2012-12-05 18:59 ` Srivatsa S. Bhat 2012-12-05 20:14 ` Srivatsa S. Bhat 0 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 18:59 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel Replaying what Tejun wrote: Hello, Oleg. > Replaying what Oleg wrote: >> Replacing get_online_cpus() w/ percpu_rwsem is great but this thread >> is about replacing preempt_disable with something finer grained and >> less heavy on the writer side > > If only I understood why preempt_disable() is bad ;-) > > OK, I guess "less heavy on the writer side" is the hint, and in the > previous email you mentioned that "stop_machine() itself is extremely > heavy". > > Looks like, you are going to remove stop_machine() from cpu_down ??? > Yeah, that's what Srivatsa is trying to do. The problem seems to be that cpu up/down is very frequent on certain mobile platforms for power management and as currently implemented cpu hotplug is too heavy and latency-inducing. >> The problem seems that we don't have percpu_rwlock yet. It shouldn't >> be too difficult to implement, right? >> > > Oh, I am not sure... unless you simply copy-and-paste the lglock code > and replace spinlock_t with rwlock_t. > Ah... right, so that's where brlock ended up. So, lglock is the new thing and brlock is a wrapper around it. > We probably want something more efficient, but I bet we can't avoid > the barriers on the read side. > > And somehow we should avoid the livelocks. Say, we can't simply add > the per_cpu_reader_counter, _read_lock should spin if the writer is > active. But at the same time _read_lock should be recursive. > I think we should just go with lglock. It does involve local atomic ops but atomic ops themselves aren't that expensive and it's not like we can avoid memory barriers. Also, that's the non-sleeping counterpart of percpu_rwsem. If it's not good enough for some reason, we should improve it rather than introducing something else. Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:59 ` Srivatsa S. Bhat @ 2012-12-05 20:14 ` Srivatsa S. Bhat 2012-12-06 16:18 ` Oleg Nesterov 0 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 20:14 UTC (permalink / raw) To: tj, oleg Cc: Srivatsa S. Bhat, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel > Replaying what Tejun wrote: > > Hello, Oleg. > >> Replaying what Oleg wrote: >>> Replacing get_online_cpus() w/ percpu_rwsem is great but this thread >>> is about replacing preempt_disable with something finer grained and >>> less heavy on the writer side >> >> If only I understood why preempt_disable() is bad ;-) >> >> OK, I guess "less heavy on the writer side" is the hint, and in the >> previous email you mentioned that "stop_machine() itself is extremely >> heavy". >> >> Looks like, you are going to remove stop_machine() from cpu_down ??? >> > > Yeah, that's what Srivatsa is trying to do. The problem seems to be > that cpu up/down is very frequent on certain mobile platforms for > power management and as currently implemented cpu hotplug is too heavy > and latency-inducing. > >>> The problem seems that we don't have percpu_rwlock yet. It shouldn't >>> be too difficult to implement, right? >>> >> >> Oh, I am not sure... unless you simply copy-and-paste the lglock code >> and replace spinlock_t with rwlock_t. >> > > Ah... right, so that's where brlock ended up. So, lglock is the new > thing and brlock is a wrapper around it. > >> We probably want something more efficient, but I bet we can't avoid >> the barriers on the read side. >> >> And somehow we should avoid the livelocks. Say, we can't simply add >> the per_cpu_reader_counter, _read_lock should spin if the writer is >> active. But at the same time _read_lock should be recursive. >> > > I think we should just go with lglock. It does involve local atomic > ops but atomic ops themselves aren't that expensive and it's not like > we can avoid memory barriers. Also, that's the non-sleeping > counterpart of percpu_rwsem. If it's not good enough for some reason, > we should improve it rather than introducing something else. > While working on the v2 yesterday, I had actually used rwlocks for the light readers and atomic ops for the full-readers. (Later I changed both to rwlocks while posting this v2). Anyway, the atomic ops version looked something like shown below. I'll take a look at lglocks and see if that helps in our case. Regards, Srivatsa S. Bhat --- include/linux/cpu.h | 4 ++ kernel/cpu.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index c64b6ed..5011c7d 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -177,6 +177,8 @@ extern void get_online_cpus(void); extern void put_online_cpus(void); extern void get_online_cpus_stable_atomic(void); extern void put_online_cpus_stable_atomic(void); +extern void get_online_cpus_atomic(void); +extern void put_online_cpus_atomic(void); #define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) @@ -202,6 +204,8 @@ static inline void cpu_hotplug_driver_unlock(void) #define put_online_cpus() do { } while (0) #define get_online_cpus_stable_atomic() do { } while (0) #define put_online_cpus_stable_atomic() do { } while (0) +#define get_online_cpus_atomic() do { } while (0) +#define put_online_cpus_atomic() do { } while (0) #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0) /* These aren't inline functions due to a GCC bug. */ #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) diff --git a/kernel/cpu.c b/kernel/cpu.c index 8c9eecc..76b07f7 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -19,6 +19,7 @@ #include <linux/mutex.h> #include <linux/gfp.h> #include <linux/suspend.h> +#include <linux/atomic.h> #include "smpboot.h" @@ -104,6 +105,58 @@ void put_online_cpus_stable_atomic(void) } EXPORT_SYMBOL_GPL(put_online_cpus_stable_atomic); +static DEFINE_PER_CPU(atomic_t, atomic_reader_refcount); + +#define writer_active(v) ((v) < 0) +#define reader_active(v) ((v) > 0) + +/* + * Invoked by hotplug reader, to prevent CPUs from going offline. + * Increments its per-cpu 'atomic_reader_refcount' to mark itself as being + * active. + * + * If 'atomic_reader_refcount' is negative, it means that a CPU offline + * operation is in progress (hotplug writer). Wait for it to complete + * and then mark your presence (increment the count) and return. + * + * You can call this recursively, because it doesn't hold any locks. + * + * Returns with preemption disabled. + */ +void get_online_cpus_atomic(void) +{ + int c, old; + + preempt_disable(); + read_lock(&hotplug_rwlock); + + for (;;) { + c = atomic_read(&__get_cpu_var(atomic_reader_refcount)); + if (unlikely(writer_active(c))) { + cpu_relax(); + continue; + } + + old = atomic_cmpxchg(&__get_cpu_var(atomic_reader_refcount), + c, c + 1); + + if (likely(old == c)) + break; + + c = old; + } +} +EXPORT_SYMBOL_GPL(get_online_cpus_atomic); + +void put_online_cpus_atomic(void) +{ + atomic_dec(&__get_cpu_var(atomic_reader_refcount)); + smp_mb__after_atomic_dec(); + read_unlock(&hotplug_rwlock); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(put_online_cpus_atomic); + static struct { struct task_struct *active_writer; struct mutex lock; /* Synchronizes accesses to refcount, */ @@ -292,6 +345,42 @@ static inline void check_for_tasks(int cpu) write_unlock_irq(&tasklist_lock); } +/* + * Invoked by hotplug writer, in preparation to take a CPU offline. + * Decrements the per-cpu 'atomic_reader_refcount' to mark itself as being + * active. + * + * If 'atomic_reader_refcount' is positive, it means that there are active + * hotplug readers (those that prevent hot-unplug). Wait for them to complete + * and then mark your presence (decrement the count) and return. + */ +static void disable_atomic_reader(unsigned int cpu) +{ + int c, old; + + for (;;) { + c = atomic_read(&per_cpu(atomic_reader_refcount, cpu)); + if (likely(reader_active(c))) { + cpu_relax(); + continue; + } + + old = atomic_cmpxchg(&per_cpu(atomic_reader_refcount, cpu), + c, c - 1); + + if (likely(old == c)) + break; + + c = old; + } +} + +static void enable_atomic_reader(unsigned int cpu) +{ + atomic_inc(&per_cpu(atomic_reader_refcount, cpu)); + smp_mb__after_atomic_inc(); +} + struct take_cpu_down_param { unsigned long mod; void *hcpu; @@ -302,6 +391,7 @@ static int __ref take_cpu_down(void *_param) { struct take_cpu_down_param *param = _param; unsigned long flags; + unsigned int cpu; int err; /* @@ -317,6 +407,10 @@ static int __ref take_cpu_down(void *_param) return err; } + /* Disable the atomic hotplug readers who need full synchronization */ + for_each_online_cpu(cpu) + disable_atomic_reader(cpu); + /* * We have successfully removed the CPU from the cpu_online_mask. * So release the lock, so that the light-weight atomic readers (who care @@ -330,6 +424,10 @@ static int __ref take_cpu_down(void *_param) cpu_notify(CPU_DYING | param->mod, param->hcpu); + /* Enable the atomic hotplug readers who need full synchronization */ + for_each_online_cpu(cpu) + enable_atomic_reader(cpu); + local_irq_restore(flags); return 0; } ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 20:14 ` Srivatsa S. Bhat @ 2012-12-06 16:18 ` Oleg Nesterov 2012-12-06 18:48 ` Srivatsa S. Bhat 0 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2012-12-06 16:18 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/06, Srivatsa S. Bhat wrote: > > +void get_online_cpus_atomic(void) > +{ > + int c, old; > + > + preempt_disable(); > + read_lock(&hotplug_rwlock); Confused... Why it also takes hotplug_rwlock? > + > + for (;;) { > + c = atomic_read(&__get_cpu_var(atomic_reader_refcount)); > + if (unlikely(writer_active(c))) { > + cpu_relax(); > + continue; > + } > + > + old = atomic_cmpxchg(&__get_cpu_var(atomic_reader_refcount), > + c, c + 1); > + > + if (likely(old == c)) > + break; > + > + c = old; > + } > +} while (!atomic_inc_unless_negative(...)) cpu_relax(); and atomic_dec_unless_positive() in disable_atomic_reader(). Obviously you can't use get_online_cpus_atomic() under rq->lock or task->pi_lock or any other lock CPU_DYING can take. Probably this is fine, but perhaps it makes sense to add the lockdep annotations. Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 16:18 ` Oleg Nesterov @ 2012-12-06 18:48 ` Srivatsa S. Bhat 2012-12-06 19:17 ` Srivatsa S. Bhat ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-06 18:48 UTC (permalink / raw) To: Oleg Nesterov Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/06/2012 09:48 PM, Oleg Nesterov wrote: > On 12/06, Srivatsa S. Bhat wrote: >> >> +void get_online_cpus_atomic(void) >> +{ >> + int c, old; >> + >> + preempt_disable(); >> + read_lock(&hotplug_rwlock); > > Confused... Why it also takes hotplug_rwlock? To avoid ABBA deadlocks. hotplug_rwlock was meant for the "light" readers. The atomic counters were meant for the "heavy/full" readers. I wanted them to be able to nest in any manner they wanted, such as: Full inside light: get_online_cpus_atomic_light() ... get_online_cpus_atomic_full() ... put_online_cpus_atomic_full() ... put_online_cpus_atomic_light() Or, light inside full: get_online_cpus_atomic_full() ... get_online_cpus_atomic_light() ... put_online_cpus_atomic_light() ... put_online_cpus_atomic_full() To allow this, I made the two sets of APIs take the locks in the same order internally. (I had some more description of this logic in the changelog of 2/10; the only difference there is that instead of atomic counters, I used rwlocks for the full-readers as well. https://lkml.org/lkml/2012/12/5/320) > >> + >> + for (;;) { >> + c = atomic_read(&__get_cpu_var(atomic_reader_refcount)); >> + if (unlikely(writer_active(c))) { >> + cpu_relax(); >> + continue; >> + } >> + >> + old = atomic_cmpxchg(&__get_cpu_var(atomic_reader_refcount), >> + c, c + 1); >> + >> + if (likely(old == c)) >> + break; >> + >> + c = old; >> + } >> +} > > while (!atomic_inc_unless_negative(...)) > cpu_relax(); > > and atomic_dec_unless_positive() in disable_atomic_reader(). > Ah, great! I was searching for them while writing the code, but somehow overlooked them and rolled out my own. ;-) > > Obviously you can't use get_online_cpus_atomic() under rq->lock or > task->pi_lock or any other lock CPU_DYING can take. Probably this is > fine, but perhaps it makes sense to add the lockdep annotations. > Hmm, you are right. We can't use _atomic() in the CPU_DYING path. So how about altering it to _allow_ that, instead of teaching lockdep that we don't allow it? I mean, just like how the existing get_online_cpus() allows such calls in the writer? (I haven't thought it through; just thinking aloud...) Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 18:48 ` Srivatsa S. Bhat @ 2012-12-06 19:17 ` Srivatsa S. Bhat 2012-12-07 21:01 ` Oleg Nesterov 2012-12-06 19:28 ` Steven Rostedt 2012-12-07 19:56 ` Oleg Nesterov 2 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-06 19:17 UTC (permalink / raw) To: Oleg Nesterov Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/07/2012 12:18 AM, Srivatsa S. Bhat wrote: > On 12/06/2012 09:48 PM, Oleg Nesterov wrote: >> On 12/06, Srivatsa S. Bhat wrote: >>> >>> +void get_online_cpus_atomic(void) >>> +{ >>> + int c, old; >>> + >>> + preempt_disable(); >>> + read_lock(&hotplug_rwlock); >> >> Confused... Why it also takes hotplug_rwlock? > > To avoid ABBA deadlocks. > > hotplug_rwlock was meant for the "light" readers. > The atomic counters were meant for the "heavy/full" readers. > I wanted them to be able to nest in any manner they wanted, > such as: > > Full inside light: > > get_online_cpus_atomic_light() > ... > get_online_cpus_atomic_full() > ... > put_online_cpus_atomic_full() > ... > put_online_cpus_atomic_light() > > Or, light inside full: > > get_online_cpus_atomic_full() > ... > get_online_cpus_atomic_light() > ... > put_online_cpus_atomic_light() > ... > put_online_cpus_atomic_full() > > To allow this, I made the two sets of APIs take the locks > in the same order internally. > > (I had some more description of this logic in the changelog > of 2/10; the only difference there is that instead of atomic > counters, I used rwlocks for the full-readers as well. > https://lkml.org/lkml/2012/12/5/320) > One of the reasons why I changed everybody to global rwlocks instead of per-cpu atomic counters was to avoid lock ordering related deadlocks associated with per-cpu locking. Eg: CPU 0 CPU 1 ------ ------ 1. Acquire lock A Increment CPU1's atomic counter 2. Increment CPU0's Try to acquire lock A atomic counter Now consider what happens if a hotplug writer (cpu_down) begins, and starts looking at CPU0, to try to decrement its atomic counter, in between steps 1 and 2. The hotplug writer will be successful in CPU0 because CPU0 hasn't yet incremented its counter. So, now CPU0 spins waiting for the hotplug writer to reset the atomic counter again. When the hotplug writer looks at CPU1, it can't decrement the atomic counter because CPU1 has already incremented it. So the hotplug writer waits. CPU1 goes ahead, only to start spinning on lock A which was acquired by CPU0. So we end up in a deadlock due to circular locking dependency between the 3 entities. One way to deal with this would be that the writer should abort its loop of trying to atomic_dec per-cpu counters, whenever it has to wait. But that might prove to be too wasteful (and overly paranoid) in practice. So, instead, if we have global locks (like global rwlocks that I finally used when posting out this v2), then we won't end up in such messy issues. And why exactly was I concerned about all this? Because, the current code uses the extremely flexible preempt_disable() /preempt_enable() pair which impose absolutely no ordering restrictions. And probably the existing code _depends_ on that. So if our new APIs that replace preempt_disable/enable start imposing ordering restrictions, I feared that they might become unusable. Hence I used global rwlocks, thereby trading cache-line bouncing (due to global rwlocks) for lock-safety and flexibility. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 19:17 ` Srivatsa S. Bhat @ 2012-12-07 21:01 ` Oleg Nesterov 0 siblings, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2012-12-07 21:01 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/07, Srivatsa S. Bhat wrote: > > CPU 0 CPU 1 > ------ ------ > > 1. Acquire lock A Increment CPU1's > atomic counter > > > > 2. Increment CPU0's Try to acquire lock A > atomic counter > > > Now consider what happens if a hotplug writer (cpu_down) begins, Exactly. So the fake lockdep_map should be per-cpu as well. lglock doesn't need this because lg_local_lock() is not recursive and lockdep can catch the bug like this. So it can look as single lock for lockdep. IOW. If you use the global lockdep_map and want lockdep to notice this deadlock you need rwlock_acquire_read(map, 0, 0, IP). But then lockdep will complain if the task does "read lock" under "read lock". Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 18:48 ` Srivatsa S. Bhat 2012-12-06 19:17 ` Srivatsa S. Bhat @ 2012-12-06 19:28 ` Steven Rostedt 2012-12-06 19:36 ` Srivatsa S. Bhat [not found] ` <20121207200014.GB13238@redhat.com> 2012-12-07 19:56 ` Oleg Nesterov 2 siblings, 2 replies; 23+ messages in thread From: Steven Rostedt @ 2012-12-06 19:28 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Oleg Nesterov, tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On Fri, 2012-12-07 at 00:18 +0530, Srivatsa S. Bhat wrote: > On 12/06/2012 09:48 PM, Oleg Nesterov wrote: > > On 12/06, Srivatsa S. Bhat wrote: > >> > >> +void get_online_cpus_atomic(void) > >> +{ > >> + int c, old; > >> + > >> + preempt_disable(); > >> + read_lock(&hotplug_rwlock); > > > > Confused... Why it also takes hotplug_rwlock? > > To avoid ABBA deadlocks. > > hotplug_rwlock was meant for the "light" readers. > The atomic counters were meant for the "heavy/full" readers. > I wanted them to be able to nest in any manner they wanted, > such as: > > Full inside light: > > get_online_cpus_atomic_light() > ... > get_online_cpus_atomic_full() > ... > put_online_cpus_atomic_full() > ... > put_online_cpus_atomic_light() > > Or, light inside full: > > get_online_cpus_atomic_full() > ... > get_online_cpus_atomic_light() > ... > put_online_cpus_atomic_light() > ... > put_online_cpus_atomic_full() > > To allow this, I made the two sets of APIs take the locks > in the same order internally. > > (I had some more description of this logic in the changelog > of 2/10; the only difference there is that instead of atomic > counters, I used rwlocks for the full-readers as well. > https://lkml.org/lkml/2012/12/5/320) > You know reader locks can deadlock with each other, right? And this isn't caught be lockdep yet. This is because rwlocks have been made to be fair with writers. Before writers could be starved if a CPU always let a reader in. Now if a writer is waiting, a reader will block behind the writer. But this has introduced new issues with the kernel as follows: CPU0 CPU1 CPU2 CPU3 ---- ---- ---- ---- read_lock(A); read_lock(B) write_lock(A) <- block write_lock(B) <- block read_lock(B) <-block read_lock(A) <- block DEADLOCK! -- Steve ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 19:28 ` Steven Rostedt @ 2012-12-06 19:36 ` Srivatsa S. Bhat 2012-12-06 22:02 ` Steven Rostedt [not found] ` <20121207200014.GB13238@redhat.com> 1 sibling, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-06 19:36 UTC (permalink / raw) To: Steven Rostedt Cc: Oleg Nesterov, tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/07/2012 12:58 AM, Steven Rostedt wrote: > On Fri, 2012-12-07 at 00:18 +0530, Srivatsa S. Bhat wrote: >> On 12/06/2012 09:48 PM, Oleg Nesterov wrote: >>> On 12/06, Srivatsa S. Bhat wrote: >>>> >>>> +void get_online_cpus_atomic(void) >>>> +{ >>>> + int c, old; >>>> + >>>> + preempt_disable(); >>>> + read_lock(&hotplug_rwlock); >>> >>> Confused... Why it also takes hotplug_rwlock? >> >> To avoid ABBA deadlocks. >> >> hotplug_rwlock was meant for the "light" readers. >> The atomic counters were meant for the "heavy/full" readers. >> I wanted them to be able to nest in any manner they wanted, >> such as: >> >> Full inside light: >> >> get_online_cpus_atomic_light() >> ... >> get_online_cpus_atomic_full() >> ... >> put_online_cpus_atomic_full() >> ... >> put_online_cpus_atomic_light() >> >> Or, light inside full: >> >> get_online_cpus_atomic_full() >> ... >> get_online_cpus_atomic_light() >> ... >> put_online_cpus_atomic_light() >> ... >> put_online_cpus_atomic_full() >> >> To allow this, I made the two sets of APIs take the locks >> in the same order internally. >> >> (I had some more description of this logic in the changelog >> of 2/10; the only difference there is that instead of atomic >> counters, I used rwlocks for the full-readers as well. >> https://lkml.org/lkml/2012/12/5/320) >> > > You know reader locks can deadlock with each other, right? And this > isn't caught be lockdep yet. This is because rwlocks have been made to > be fair with writers. Before writers could be starved if a CPU always > let a reader in. Now if a writer is waiting, a reader will block behind > the writer. But this has introduced new issues with the kernel as > follows: > > > CPU0 CPU1 CPU2 CPU3 > ---- ---- ---- ---- > read_lock(A); > read_lock(B) > write_lock(A) <- block > write_lock(B) <- block > read_lock(B) <-block > > read_lock(A) <- block > > DEADLOCK! > The root-cause of this deadlock is again lock-ordering mismatch right? CPU0 takes locks in order A, B CPU1 takes locks in order B, A And the writer facilitates in actually getting deadlocked. I avoid this in this patchset by always taking the locks in the same order. So we won't be deadlocking like this. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 19:36 ` Srivatsa S. Bhat @ 2012-12-06 22:02 ` Steven Rostedt 2012-12-07 17:33 ` Srivatsa S. Bhat 0 siblings, 1 reply; 23+ messages in thread From: Steven Rostedt @ 2012-12-06 22:02 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Oleg Nesterov, tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On Fri, 2012-12-07 at 01:06 +0530, Srivatsa S. Bhat wrote: > On 12/07/2012 12:58 AM, Steven Rostedt wrote: > > On Fri, 2012-12-07 at 00:18 +0530, Srivatsa S. Bhat wrote: > >> On 12/06/2012 09:48 PM, Oleg Nesterov wrote: > >>> On 12/06, Srivatsa S. Bhat wrote: > >>>> > >>>> +void get_online_cpus_atomic(void) > >>>> +{ > >>>> + int c, old; > >>>> + > >>>> + preempt_disable(); > >>>> + read_lock(&hotplug_rwlock); > >>> > >>> Confused... Why it also takes hotplug_rwlock? > >> > >> To avoid ABBA deadlocks. > >> > >> hotplug_rwlock was meant for the "light" readers. > >> The atomic counters were meant for the "heavy/full" readers. > >> I wanted them to be able to nest in any manner they wanted, > >> such as: > >> > >> Full inside light: > >> > >> get_online_cpus_atomic_light() > >> ... > >> get_online_cpus_atomic_full() > >> ... > >> put_online_cpus_atomic_full() > >> ... > >> put_online_cpus_atomic_light() > >> > >> Or, light inside full: > >> > >> get_online_cpus_atomic_full() > >> ... > >> get_online_cpus_atomic_light() > >> ... > >> put_online_cpus_atomic_light() > >> ... > >> put_online_cpus_atomic_full() > >> > >> > The root-cause of this deadlock is again lock-ordering mismatch right? > CPU0 takes locks in order A, B > CPU1 takes locks in order B, A > > And the writer facilitates in actually getting deadlocked. > > I avoid this in this patchset by always taking the locks in the same > order. So we won't be deadlocking like this. OK, I haven't looked closely at the patch yet. I'm currently hacking on my own problems. But just from the description above, it looked like you were using rw_locks() to be able to inverse the order of the locks. -- Steve ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 22:02 ` Steven Rostedt @ 2012-12-07 17:33 ` Srivatsa S. Bhat 0 siblings, 0 replies; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-07 17:33 UTC (permalink / raw) To: Steven Rostedt Cc: Oleg Nesterov, tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/07/2012 03:32 AM, Steven Rostedt wrote: > On Fri, 2012-12-07 at 01:06 +0530, Srivatsa S. Bhat wrote: >> The root-cause of this deadlock is again lock-ordering mismatch right? >> CPU0 takes locks in order A, B >> CPU1 takes locks in order B, A >> >> And the writer facilitates in actually getting deadlocked. >> >> I avoid this in this patchset by always taking the locks in the same >> order. So we won't be deadlocking like this. > > OK, I haven't looked closely at the patch yet. I'm currently hacking on > my own problems. But just from the description above, it looked like you > were using rw_locks() to be able to inverse the order of the locks. > Ah, ok, no problem! I'd be grateful if you could take a look when you are free :-) I'll post a v3 soon, which has a completely redesigned synchronization scheme, to address the performance concerns related to global rwlocks that Tejun raised. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20121207200014.GB13238@redhat.com>]
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline [not found] ` <20121207200014.GB13238@redhat.com> @ 2012-12-10 18:21 ` Oleg Nesterov 2012-12-10 19:07 ` Steven Rostedt 0 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2012-12-10 18:21 UTC (permalink / raw) To: Steven Rostedt Cc: Srivatsa S. Bhat, tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/07, Oleg Nesterov wrote: > > On 12/06, Steven Rostedt wrote: > > > > You know reader locks can deadlock with each other, right? And this > > isn't caught be lockdep yet. This is because rwlocks have been made to > > be fair with writers. Before writers could be starved if a CPU always > > let a reader in. Now if a writer is waiting, a reader will block behind > > the writer. But this has introduced new issues with the kernel as > > follows: > > > > > > CPU0 CPU1 CPU2 CPU3 > > ---- ---- ---- ---- > > read_lock(A); > > read_lock(B) > > write_lock(A) <- block > > write_lock(B) <- block > > read_lock(B) <-block > > > > read_lock(A) <- block > > > > DEADLOCK! > > Really??? Oh I didn't know... > > Yes this was always true for rwsem, but rwlock_t? Sorry, please ignore my email. I misread your email. Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-10 18:21 ` Oleg Nesterov @ 2012-12-10 19:07 ` Steven Rostedt 0 siblings, 0 replies; 23+ messages in thread From: Steven Rostedt @ 2012-12-10 19:07 UTC (permalink / raw) To: Oleg Nesterov Cc: Srivatsa S. Bhat, tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On Mon, 2012-12-10 at 19:21 +0100, Oleg Nesterov wrote: > On 12/07, Oleg Nesterov wrote: > > > > On 12/06, Steven Rostedt wrote: > > > > > > You know reader locks can deadlock with each other, right? And this > > > isn't caught be lockdep yet. This is because rwlocks have been made to > > > be fair with writers. Before writers could be starved if a CPU always > > > let a reader in. Now if a writer is waiting, a reader will block behind > > > the writer. But this has introduced new issues with the kernel as > > > follows: > > > > > > > > > CPU0 CPU1 CPU2 CPU3 > > > ---- ---- ---- ---- > > > read_lock(A); > > > read_lock(B) > > > write_lock(A) <- block > > > write_lock(B) <- block > > > read_lock(B) <-block > > > > > > read_lock(A) <- block > > > > > > DEADLOCK! > > > > Really??? Oh I didn't know... > > > > Yes this was always true for rwsem, but rwlock_t? > > Sorry, please ignore my email. I misread your email. > No prob, looking at what I wrote, I should have explicitly stated two different rwlocks. The only hint that I gave about two locks was (A) and (B). Even what I started with: "reader locks can deadlock with each other" is a bit ambiguous. So I can easily see the confusion. -- Steve ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-06 18:48 ` Srivatsa S. Bhat 2012-12-06 19:17 ` Srivatsa S. Bhat 2012-12-06 19:28 ` Steven Rostedt @ 2012-12-07 19:56 ` Oleg Nesterov 2012-12-07 20:25 ` Srivatsa S. Bhat 2 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2012-12-07 19:56 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/07, Srivatsa S. Bhat wrote: > > On 12/06/2012 09:48 PM, Oleg Nesterov wrote: > > On 12/06, Srivatsa S. Bhat wrote: > >> > >> +void get_online_cpus_atomic(void) > >> +{ > >> + int c, old; > >> + > >> + preempt_disable(); > >> + read_lock(&hotplug_rwlock); > > > > Confused... Why it also takes hotplug_rwlock? > > To avoid ABBA deadlocks. > > hotplug_rwlock was meant for the "light" readers. > The atomic counters were meant for the "heavy/full" readers. OK, I got lost a bit. I'll try to read v3 tomorrow. > > Obviously you can't use get_online_cpus_atomic() under rq->lock or > > task->pi_lock or any other lock CPU_DYING can take. Probably this is > > fine, but perhaps it makes sense to add the lockdep annotations. > > Hmm, you are right. We can't use _atomic() in the CPU_DYING path. Not sure I undestand... I simply meant that, say, get_online_cpus_atomic() under task->pi_lock can obviously deadlock with take_cpu_down() which can want the same task->pi_lock after disable_atomic_reader(). Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-07 19:56 ` Oleg Nesterov @ 2012-12-07 20:25 ` Srivatsa S. Bhat 2012-12-07 20:59 ` Oleg Nesterov 0 siblings, 1 reply; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-07 20:25 UTC (permalink / raw) To: Oleg Nesterov Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/08/2012 01:26 AM, Oleg Nesterov wrote: > On 12/07, Srivatsa S. Bhat wrote: >> >> On 12/06/2012 09:48 PM, Oleg Nesterov wrote: >>> On 12/06, Srivatsa S. Bhat wrote: >>>> >>>> +void get_online_cpus_atomic(void) >>>> +{ >>>> + int c, old; >>>> + >>>> + preempt_disable(); >>>> + read_lock(&hotplug_rwlock); >>> >>> Confused... Why it also takes hotplug_rwlock? >> >> To avoid ABBA deadlocks. >> >> hotplug_rwlock was meant for the "light" readers. >> The atomic counters were meant for the "heavy/full" readers. > > OK, I got lost a bit. I'll try to read v3 tomorrow. > OK, thanks! But note that v3 is completely different from v2. >>> Obviously you can't use get_online_cpus_atomic() under rq->lock or >>> task->pi_lock or any other lock CPU_DYING can take. Probably this is >>> fine, but perhaps it makes sense to add the lockdep annotations. >> >> Hmm, you are right. We can't use _atomic() in the CPU_DYING path. > > Not sure I undestand... I simply meant that, say, > get_online_cpus_atomic() under task->pi_lock can obviously deadlock > with take_cpu_down() which can want the same task->pi_lock after > disable_atomic_reader(). > Right, I mistook your point for something else (i.e., ability for the writer to do get_online_cpus_atomic() safely, which I fixed in v3). So, your point above is very valid. And yes, we can't do much about it, we'll just have to teach lockdep to catch such usages. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-07 20:25 ` Srivatsa S. Bhat @ 2012-12-07 20:59 ` Oleg Nesterov 0 siblings, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2012-12-07 20:59 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/08, Srivatsa S. Bhat wrote: > > On 12/08/2012 01:26 AM, Oleg Nesterov wrote: > > Not sure I undestand... I simply meant that, say, > > get_online_cpus_atomic() under task->pi_lock can obviously deadlock > > with take_cpu_down() which can want the same task->pi_lock after > > disable_atomic_reader(). > > Right, I mistook your point for something else (i.e., ability for > the writer to do get_online_cpus_atomic() safely, which I fixed in > v3). > > So, your point above is very valid. And yes, we can't do much > about it, we'll just have to teach lockdep to catch such usages. Afaics, this is simple. Just add the "fake" lockdep_map as, say, lglock does. Except, you need rwlock_acquire_read(map, 0, 1, IP) because this lock is recursive. But. There is another email from you about the possible deadlock. I'll write the reply in a minute... Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 18:43 ` [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline Srivatsa S. Bhat 2012-12-05 18:47 ` Srivatsa S. Bhat @ 2012-12-05 19:07 ` Oleg Nesterov 2012-12-05 19:16 ` Srivatsa S. Bhat 1 sibling, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2012-12-05 19:07 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel I'll try to read this series later, one minor and almost offtopic nit. On 12/06, Srivatsa S. Bhat wrote: > > static int __ref take_cpu_down(void *_param) > { > struct take_cpu_down_param *param = _param; > + unsigned long flags; > int err; > > + /* > + * __cpu_disable() is the step where the CPU is removed from the > + * cpu_online_mask. Protect it with the light-lock held for write. > + */ > + write_lock_irqsave(&light_hotplug_rwlock, flags); > + > /* Ensure this CPU doesn't handle any more interrupts. */ > err = __cpu_disable(); > - if (err < 0) > + if (err < 0) { > + write_unlock_irqrestore(&light_hotplug_rwlock, flags); > return err; > + } > + > + /* > + * We have successfully removed the CPU from the cpu_online_mask. > + * So release the light-lock, so that the light-weight atomic readers > + * (who care only about the cpu_online_mask updates, and not really > + * about the actual cpu-take-down operation) can continue. > + * > + * But don't enable interrupts yet, because we still have work left to > + * do, to actually bring the CPU down. > + */ > + write_unlock(&light_hotplug_rwlock); > > cpu_notify(CPU_DYING | param->mod, param->hcpu); > + > + local_irq_restore(flags); > return 0; This is subjective, but imho _irqsave and the fat comment look confusing. Currently take_cpu_down() is always called with irqs disabled, so you do not need to play with interrupts. 10/10 does s/__stop_machine/stop_cpus/ and that patch could simply add local_irq_disable/enable into take_cpu_down(). But again this is minor and subjective, I won't insist. Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline 2012-12-05 19:07 ` Oleg Nesterov @ 2012-12-05 19:16 ` Srivatsa S. Bhat 0 siblings, 0 replies; 23+ messages in thread From: Srivatsa S. Bhat @ 2012-12-05 19:16 UTC (permalink / raw) To: Oleg Nesterov Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung, vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel On 12/06/2012 12:37 AM, Oleg Nesterov wrote: > I'll try to read this series later, > > one minor and almost offtopic nit. > > On 12/06, Srivatsa S. Bhat wrote: >> >> static int __ref take_cpu_down(void *_param) >> { >> struct take_cpu_down_param *param = _param; >> + unsigned long flags; >> int err; >> >> + /* >> + * __cpu_disable() is the step where the CPU is removed from the >> + * cpu_online_mask. Protect it with the light-lock held for write. >> + */ >> + write_lock_irqsave(&light_hotplug_rwlock, flags); >> + >> /* Ensure this CPU doesn't handle any more interrupts. */ >> err = __cpu_disable(); >> - if (err < 0) >> + if (err < 0) { >> + write_unlock_irqrestore(&light_hotplug_rwlock, flags); >> return err; >> + } >> + >> + /* >> + * We have successfully removed the CPU from the cpu_online_mask. >> + * So release the light-lock, so that the light-weight atomic readers >> + * (who care only about the cpu_online_mask updates, and not really >> + * about the actual cpu-take-down operation) can continue. >> + * >> + * But don't enable interrupts yet, because we still have work left to >> + * do, to actually bring the CPU down. >> + */ >> + write_unlock(&light_hotplug_rwlock); >> >> cpu_notify(CPU_DYING | param->mod, param->hcpu); >> + >> + local_irq_restore(flags); >> return 0; > > This is subjective, but imho _irqsave and the fat comment look confusing. > > Currently take_cpu_down() is always called with irqs disabled, so you > do not need to play with interrupts. > > 10/10 does s/__stop_machine/stop_cpus/ and that patch could simply add > local_irq_disable/enable into take_cpu_down(). > Hmm, we could certainly do that, but somehow I felt it would be easier to read if we tinker and fix up the take_cpu_down() logic at one place, as a whole, instead of breaking up into pieces in different patches. And that also makes the last patch look really cute: it just replaces stop_machine() with stop_cpus(), as the changelog intended. I'll see if doing like what you suggested improves the readability, and if yes, I'll change it. Thank you! Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2012-12-10 19:07 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20121205131038.17383.55472.stgit@srivatsabhat.in.ibm.com> [not found] ` <20121205131136.17383.23318.stgit@srivatsabhat.in.ibm.com> [not found] ` <20121205142316.GI3885@mtj.dyndns.org> [not found] ` <20121205164640.GA7382@redhat.com> [not found] ` <20121205165356.GL3885@mtj.dyndns.org> 2012-12-05 18:32 ` [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline Srivatsa S. Bhat 2012-12-05 18:42 [RFC PATCH v2 00/10][RESEND] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat 2012-12-05 18:43 ` [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline Srivatsa S. Bhat 2012-12-05 18:47 ` Srivatsa S. Bhat 2012-12-05 18:51 ` Srivatsa S. Bhat 2012-12-05 18:53 ` Srivatsa S. Bhat 2012-12-05 18:56 ` Srivatsa S. Bhat 2012-12-05 18:59 ` Srivatsa S. Bhat 2012-12-05 20:14 ` Srivatsa S. Bhat 2012-12-06 16:18 ` Oleg Nesterov 2012-12-06 18:48 ` Srivatsa S. Bhat 2012-12-06 19:17 ` Srivatsa S. Bhat 2012-12-07 21:01 ` Oleg Nesterov 2012-12-06 19:28 ` Steven Rostedt 2012-12-06 19:36 ` Srivatsa S. Bhat 2012-12-06 22:02 ` Steven Rostedt 2012-12-07 17:33 ` Srivatsa S. Bhat [not found] ` <20121207200014.GB13238@redhat.com> 2012-12-10 18:21 ` Oleg Nesterov 2012-12-10 19:07 ` Steven Rostedt 2012-12-07 19:56 ` Oleg Nesterov 2012-12-07 20:25 ` Srivatsa S. Bhat 2012-12-07 20:59 ` Oleg Nesterov 2012-12-05 19:07 ` Oleg Nesterov 2012-12-05 19:16 ` Srivatsa S. Bhat
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).