* Deadlock between cpu_hotplug_begin and cpu_add_remove_lock @ 2014-01-22 5:52 Paul Mackerras 2014-01-22 8:30 ` Srivatsa S. Bhat 0 siblings, 1 reply; 11+ messages in thread From: Paul Mackerras @ 2014-01-22 5:52 UTC (permalink / raw) To: Srivatsa S. Bhat; +Cc: linux-kernel This arises out of a report from a tester that offlining a CPU never finished on a system they were testing. This was on a POWER8 running a 3.10.x kernel, but the issue is still present in mainline AFAICS. What I found when I looked at the system was this: * There was a ppc64_cpu process stuck inside cpu_hotplug_begin(), called from _cpu_down(), from cpu_down(). This process was holding the cpu_add_remove_lock mutex, since cpu_down() calls cpu_maps_update_begin() before calling _cpu_down(). It was stuck there because cpu_hotplug.refcount == 1. * There was a mdadm process trying to acquire the cpu_add_remove_lock mutex inside register_cpu_notifier(), called from raid5_alloc_percpu() in drivers/md/raid5.c. That process had previously called get_online_cpus, which is why cpu_hotplug.refcount was 1. Result: deadlock. Thus it seems that the following code is not safe: get_online_cpus(); register_cpu_notifier(&...); put_online_cpus(); There are a few different places that do that sort of thing; besides drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu, arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c, drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c. My question is this: is it reasonable to call register_cpu_notifier inside a get/put_online_cpus block? If so, the deadlock needs to be fixed; if not, the callers need to be fixed, and the restriction should be documented. Regards, Paul. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-22 5:52 Deadlock between cpu_hotplug_begin and cpu_add_remove_lock Paul Mackerras @ 2014-01-22 8:30 ` Srivatsa S. Bhat 2014-01-22 9:16 ` Srivatsa S. Bhat 0 siblings, 1 reply; 11+ messages in thread From: Srivatsa S. Bhat @ 2014-01-22 8:30 UTC (permalink / raw) To: Paul Mackerras Cc: linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Oleg Nesterov, Tejun Heo, Michel Lespinasse, ego, rusty@rustcorp.com.au, Thomas Gleixner, akpm@linux-foundation.org Hi Paul, On 01/22/2014 11:22 AM, Paul Mackerras wrote: > This arises out of a report from a tester that offlining a CPU never > finished on a system they were testing. This was on a POWER8 running > a 3.10.x kernel, but the issue is still present in mainline AFAICS. > > What I found when I looked at the system was this: > > * There was a ppc64_cpu process stuck inside cpu_hotplug_begin(), > called from _cpu_down(), from cpu_down(). This process was holding > the cpu_add_remove_lock mutex, since cpu_down() calls > cpu_maps_update_begin() before calling _cpu_down(). It was stuck > there because cpu_hotplug.refcount == 1. > > * There was a mdadm process trying to acquire the cpu_add_remove_lock > mutex inside register_cpu_notifier(), called from > raid5_alloc_percpu() in drivers/md/raid5.c. That process had > previously called get_online_cpus, which is why cpu_hotplug.refcount > was 1. > > Result: deadlock. > > Thus it seems that the following code is not safe: > > get_online_cpus(); > register_cpu_notifier(&...); > put_online_cpus(); > Yes, this is a known problem, and I had proposed an elaborate solution some time ago: https://lkml.org/lkml/2012/3/1/39 But that won't work for all cases, so that solution is a no-go. If we forget the CPU_POST_DEAD stage for a moment, we can just replace the calls to cpu_maps_update_begin/done() with get/put_online_cpus() in both register_cpu_notifier() as well as unregister_cpu_notifier(). After all, the callback registration code needs to synchronize only with the actual hotplug operations, and not the update of cpu-maps. So they don't really need to acquire the cpu_add_remove_lock. However, CPU_POST_DEAD notifications run with the hotplug lock dropped. So we can't simply replace cpu_add_remove_lock with hotplug lock in the registration routines, because notifier invocations and notifier registration needs to be synchronized. Hmm... > There are a few different places that do that sort of thing; besides > drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu, > arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c, > drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c. > > My question is this: is it reasonable to call register_cpu_notifier > inside a get/put_online_cpus block? Ideally, we would want that to work. Because there is no other race-free way of registering a notifier. > If so, the deadlock needs to be > fixed; if not, the callers need to be fixed, and the restriction > should be documented. Fixing the callers is a last resort. I'm thinking of ways to fix the deadlock itself, and allow the callers to call register_cpu_notifier within a get/put_online_cpus() block... Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-22 8:30 ` Srivatsa S. Bhat @ 2014-01-22 9:16 ` Srivatsa S. Bhat 2014-01-22 19:18 ` Oleg Nesterov 2014-01-23 2:29 ` Rusty Russell 0 siblings, 2 replies; 11+ messages in thread From: Srivatsa S. Bhat @ 2014-01-22 9:16 UTC (permalink / raw) To: Paul Mackerras Cc: linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Oleg Nesterov, Tejun Heo, Michel Lespinasse, ego, rusty@rustcorp.com.au, Thomas Gleixner, akpm@linux-foundation.org On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote: > Hi Paul, > > On 01/22/2014 11:22 AM, Paul Mackerras wrote: >> This arises out of a report from a tester that offlining a CPU never >> finished on a system they were testing. This was on a POWER8 running >> a 3.10.x kernel, but the issue is still present in mainline AFAICS. >> >> What I found when I looked at the system was this: >> >> * There was a ppc64_cpu process stuck inside cpu_hotplug_begin(), >> called from _cpu_down(), from cpu_down(). This process was holding >> the cpu_add_remove_lock mutex, since cpu_down() calls >> cpu_maps_update_begin() before calling _cpu_down(). It was stuck >> there because cpu_hotplug.refcount == 1. >> >> * There was a mdadm process trying to acquire the cpu_add_remove_lock >> mutex inside register_cpu_notifier(), called from >> raid5_alloc_percpu() in drivers/md/raid5.c. That process had >> previously called get_online_cpus, which is why cpu_hotplug.refcount >> was 1. >> >> Result: deadlock. >> >> Thus it seems that the following code is not safe: >> >> get_online_cpus(); >> register_cpu_notifier(&...); >> put_online_cpus(); >> > > Yes, this is a known problem, and I had proposed an elaborate solution > some time ago: https://lkml.org/lkml/2012/3/1/39 > But that won't work for all cases, so that solution is a no-go. > Wait a min, that _will_ actually work for all cases because I have provided an option to invoke _any_ arbitrary function as the "setup" routine. So, taking the example of raid5 that you mentioned below, instead of doing this: static int raid5_alloc_percpu() { ... get_online_cpus(); for_each_present_cpu(cpu) { ... } ... register_cpu_notifier(); put_online_cpus(); } We can do this: void func() { for_each_present_cpu(cpu) { ... } ... } static int raid5_alloc_percpu() { ... register_allcpu_notifier(..., true, func); } The other, simpler alternative fix is to use cpu_hotplug_disable/enable() in place of get/put_online_cpus() around the callback registration code. Something like this: cpu_hotplug_disable(); register_cpu_notifier(); cpu_hotplug_enable(); But the problem with this is that a hotplug operation that tries to run concurrently with this will get a -EBUSY, which is kinda undesirable. Also, this will only synchronize with hotplug operations initiated via calls to cpu_up/down() (such as those that are initiated by writing to the online file in sysfs). It won't synchronize with the hotplug operations invoked by disable/enable_nonboot_cpus(), which by-pass cpu_up/down() and directly call _cpu_up/down() by ignoring the cpu_hotplug_disabled flag. The latter is a more controlled environment though, since its mostly used by the suspend/hibernate code, in a state where the entire userspace is frozen. So it might not be that bad. Regards, Srivatsa S. Bhat > If we forget the CPU_POST_DEAD stage for a moment, we can just replace the > calls to cpu_maps_update_begin/done() with get/put_online_cpus() in both > register_cpu_notifier() as well as unregister_cpu_notifier(). After all, > the callback registration code needs to synchronize only with the actual > hotplug operations, and not the update of cpu-maps. So they don't really > need to acquire the cpu_add_remove_lock. > > However, CPU_POST_DEAD notifications run with the hotplug lock dropped. > So we can't simply replace cpu_add_remove_lock with hotplug lock in the > registration routines, because notifier invocations and notifier registration > needs to be synchronized. > > Hmm... > >> There are a few different places that do that sort of thing; besides >> drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu, >> arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c, >> drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c. >> >> My question is this: is it reasonable to call register_cpu_notifier >> inside a get/put_online_cpus block? > > Ideally, we would want that to work. Because there is no other race-free > way of registering a notifier. > >> If so, the deadlock needs to be >> fixed; if not, the callers need to be fixed, and the restriction >> should be documented. > > Fixing the callers is a last resort. I'm thinking of ways to fix the > deadlock itself, and allow the callers to call register_cpu_notifier > within a get/put_online_cpus() block... > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-22 9:16 ` Srivatsa S. Bhat @ 2014-01-22 19:18 ` Oleg Nesterov 2014-01-22 19:58 ` Srivatsa S. Bhat 2014-01-23 2:29 ` Rusty Russell 1 sibling, 1 reply; 11+ messages in thread From: Oleg Nesterov @ 2014-01-22 19:18 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Paul Mackerras, linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Tejun Heo, Michel Lespinasse, ego, rusty@rustcorp.com.au, Thomas Gleixner, akpm@linux-foundation.org On 01/22, Srivatsa S. Bhat wrote: > > Wait a min, that _will_ actually work for all cases because I have provided > an option to invoke _any_ arbitrary function as the "setup" routine. And probably the generic solution makes sense. I am not sure I actually understand the semantics of register_allcpu_notifier(), but the problem it tries to solve looks clear/valid. But as for a quick fix for raid5_alloc_percpu(), can't it simply call register_cpu_notifier() before get_online_cpus/for_each_present_cpu ? This probably means that raid456_cpu_notify() should be modified because it obviously can be called before get_online_cpus(). Hmm, it already has safe_put_page(), so this looks really simple? Something like below, although of course I can miss easily something. Oleg. --- x/drivers/md/raid5.c +++ x/drivers/md/raid5.c @@ -5542,6 +5542,24 @@ static void free_conf(struct r5conf *con kfree(conf); } +static int alloc_xxx(struct r5conf *conf, struct raid5_percpu *percpu) +{ + if (conf->level == 6 && !percpu->spare_page) + percpu->spare_page = alloc_page(GFP_KERNEL); + if (!percpu->scribble) + percpu->scribble = kmalloc(conf->scribble_len, GFP_KERNEL); + + if (!percpu->scribble || (conf->level == 6 && !percpu->spare_page)) { + safe_put_page(percpu->spare_page); + kfree(percpu->scribble); + pr_err("%s: failed memory allocation for cpu%ld\n", + __func__, cpu); + return -ENOMEM; + } + + return 0; +} + #ifdef CONFIG_HOTPLUG_CPU static int raid456_cpu_notify(struct notifier_block *nfb, unsigned long action, void *hcpu) @@ -5553,19 +5571,8 @@ static int raid456_cpu_notify(struct not switch (action) { case CPU_UP_PREPARE: case CPU_UP_PREPARE_FROZEN: - if (conf->level == 6 && !percpu->spare_page) - percpu->spare_page = alloc_page(GFP_KERNEL); - if (!percpu->scribble) - percpu->scribble = kmalloc(conf->scribble_len, GFP_KERNEL); - - if (!percpu->scribble || - (conf->level == 6 && !percpu->spare_page)) { - safe_put_page(percpu->spare_page); - kfree(percpu->scribble); - pr_err("%s: failed memory allocation for cpu%ld\n", - __func__, cpu); + if (alloc_xxx(conf, percpu)) return notifier_from_errno(-ENOMEM); - } break; case CPU_DEAD: case CPU_DEAD_FROZEN: @@ -5585,39 +5592,27 @@ static int raid5_alloc_percpu(struct r5c { unsigned long cpu; struct page *spare_page; - struct raid5_percpu __percpu *allcpus; void *scribble; - int err; + int err = 0; - allcpus = alloc_percpu(struct raid5_percpu); - if (!allcpus) + conf->percpu = alloc_percpu(struct raid5_percpu); + if (!conf->percpu) return -ENOMEM; - conf->percpu = allcpus; - get_online_cpus(); - err = 0; - for_each_present_cpu(cpu) { - if (conf->level == 6) { - spare_page = alloc_page(GFP_KERNEL); - if (!spare_page) { - err = -ENOMEM; - break; - } - per_cpu_ptr(conf->percpu, cpu)->spare_page = spare_page; - } - scribble = kmalloc(conf->scribble_len, GFP_KERNEL); - if (!scribble) { - err = -ENOMEM; - break; - } - per_cpu_ptr(conf->percpu, cpu)->scribble = scribble; - } #ifdef CONFIG_HOTPLUG_CPU conf->cpu_notify.notifier_call = raid456_cpu_notify; conf->cpu_notify.priority = 0; - if (err == 0) - err = register_cpu_notifier(&conf->cpu_notify); + err = register_cpu_notifier(&conf->cpu_notify); + if (err) + return err; #endif + + get_online_cpus(); + for_each_present_cpu(cpu) { + err = alloc_xxx(conf, per_cpu_ptr(conf->percpu, cpu)); + if (err) + break; + } put_online_cpus(); return err; ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-22 19:18 ` Oleg Nesterov @ 2014-01-22 19:58 ` Srivatsa S. Bhat 2014-01-23 17:02 ` Oleg Nesterov 0 siblings, 1 reply; 11+ messages in thread From: Srivatsa S. Bhat @ 2014-01-22 19:58 UTC (permalink / raw) To: Oleg Nesterov Cc: Paul Mackerras, linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Tejun Heo, Michel Lespinasse, ego, rusty@rustcorp.com.au, Thomas Gleixner, akpm@linux-foundation.org On 01/23/2014 12:48 AM, Oleg Nesterov wrote: > On 01/22, Srivatsa S. Bhat wrote: >> >> Wait a min, that _will_ actually work for all cases because I have provided >> an option to invoke _any_ arbitrary function as the "setup" routine. > > And probably the generic solution makes sense. I am not sure I actually > understand the semantics of register_allcpu_notifier(), but the problem > it tries to solve looks clear/valid. > Thank you. But I was wondering whether its usage is a bit unintuitive/ convoluted. So I was contemplating between going with that solution or the below one, where the call-sites are expected to do: cpu_maps_update_begin(); for_each_online_cpu(cpu) { ... } __register_cpu_notifier(); //use the __reg() variant, which doesn't take locks cpu_maps_update_done(); Of course, that requires exporting the functions cpu_maps_update_begin/done(), but this latter form of callback registration might look more natural. diff --git a/kernel/cpu.c b/kernel/cpu.c index deff2e6..37373c1 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -166,6 +166,11 @@ int __ref register_cpu_notifier(struct notifier_block *nb) return ret; } +int __ref __register_cpu_notifier(struct notifier_block *nb) +{ + return raw_notifier_chain_register(&cpu_chain, nb); +} + static int __cpu_notify(unsigned long val, void *v, int nr_to_call, int *nr_calls) { @@ -189,6 +194,7 @@ static void cpu_notify_nofail(unsigned long val, void *v) BUG_ON(cpu_notify(val, v)); } EXPORT_SYMBOL(register_cpu_notifier); +EXPORT_SYMBOL(__register_cpu_notifier); void __ref unregister_cpu_notifier(struct notifier_block *nb) { @@ -198,6 +204,12 @@ void __ref unregister_cpu_notifier(struct notifier_block *nb) } EXPORT_SYMBOL(unregister_cpu_notifier); +void __ref __unregister_cpu_notifier(struct notifier_block *nb) +{ + raw_notifier_chain_unregister(&cpu_chain, nb); +} +EXPORT_SYMBOL(__unregister_cpu_notifier); + /** * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU * @cpu: a CPU id > But as for a quick fix for raid5_alloc_percpu(), can't it simply call > register_cpu_notifier() before get_online_cpus/for_each_present_cpu ? > > This probably means that raid456_cpu_notify() should be modified because > it obviously can be called before get_online_cpus(). Hmm, it already > has safe_put_page(), so this looks really simple? Something like below, > although of course I can miss easily something. Yes, your solution for raid5 does look good; it is also simple and elegant. But for some of the other call-sites, we might have to use one of the solutions mentioned above. Regards, Srivatsa S. Bhat > > > --- x/drivers/md/raid5.c > +++ x/drivers/md/raid5.c > @@ -5542,6 +5542,24 @@ static void free_conf(struct r5conf *con > kfree(conf); > } > > +static int alloc_xxx(struct r5conf *conf, struct raid5_percpu *percpu) > +{ > + if (conf->level == 6 && !percpu->spare_page) > + percpu->spare_page = alloc_page(GFP_KERNEL); > + if (!percpu->scribble) > + percpu->scribble = kmalloc(conf->scribble_len, GFP_KERNEL); > + > + if (!percpu->scribble || (conf->level == 6 && !percpu->spare_page)) { > + safe_put_page(percpu->spare_page); > + kfree(percpu->scribble); > + pr_err("%s: failed memory allocation for cpu%ld\n", > + __func__, cpu); > + return -ENOMEM; > + } > + > + return 0; > +} > + > #ifdef CONFIG_HOTPLUG_CPU > static int raid456_cpu_notify(struct notifier_block *nfb, unsigned long action, > void *hcpu) > @@ -5553,19 +5571,8 @@ static int raid456_cpu_notify(struct not > switch (action) { > case CPU_UP_PREPARE: > case CPU_UP_PREPARE_FROZEN: > - if (conf->level == 6 && !percpu->spare_page) > - percpu->spare_page = alloc_page(GFP_KERNEL); > - if (!percpu->scribble) > - percpu->scribble = kmalloc(conf->scribble_len, GFP_KERNEL); > - > - if (!percpu->scribble || > - (conf->level == 6 && !percpu->spare_page)) { > - safe_put_page(percpu->spare_page); > - kfree(percpu->scribble); > - pr_err("%s: failed memory allocation for cpu%ld\n", > - __func__, cpu); > + if (alloc_xxx(conf, percpu)) > return notifier_from_errno(-ENOMEM); > - } > break; > case CPU_DEAD: > case CPU_DEAD_FROZEN: > @@ -5585,39 +5592,27 @@ static int raid5_alloc_percpu(struct r5c > { > unsigned long cpu; > struct page *spare_page; > - struct raid5_percpu __percpu *allcpus; > void *scribble; > - int err; > + int err = 0; > > - allcpus = alloc_percpu(struct raid5_percpu); > - if (!allcpus) > + conf->percpu = alloc_percpu(struct raid5_percpu); > + if (!conf->percpu) > return -ENOMEM; > - conf->percpu = allcpus; > > - get_online_cpus(); > - err = 0; > - for_each_present_cpu(cpu) { > - if (conf->level == 6) { > - spare_page = alloc_page(GFP_KERNEL); > - if (!spare_page) { > - err = -ENOMEM; > - break; > - } > - per_cpu_ptr(conf->percpu, cpu)->spare_page = spare_page; > - } > - scribble = kmalloc(conf->scribble_len, GFP_KERNEL); > - if (!scribble) { > - err = -ENOMEM; > - break; > - } > - per_cpu_ptr(conf->percpu, cpu)->scribble = scribble; > - } > #ifdef CONFIG_HOTPLUG_CPU > conf->cpu_notify.notifier_call = raid456_cpu_notify; > conf->cpu_notify.priority = 0; > - if (err == 0) > - err = register_cpu_notifier(&conf->cpu_notify); > + err = register_cpu_notifier(&conf->cpu_notify); > + if (err) > + return err; > #endif > + > + get_online_cpus(); > + for_each_present_cpu(cpu) { > + err = alloc_xxx(conf, per_cpu_ptr(conf->percpu, cpu)); > + if (err) > + break; > + } > put_online_cpus(); > > return err; > ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-22 19:58 ` Srivatsa S. Bhat @ 2014-01-23 17:02 ` Oleg Nesterov 2014-01-28 14:32 ` Srivatsa S. Bhat 0 siblings, 1 reply; 11+ messages in thread From: Oleg Nesterov @ 2014-01-23 17:02 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Paul Mackerras, linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Tejun Heo, Michel Lespinasse, ego, rusty@rustcorp.com.au, Thomas Gleixner, akpm@linux-foundation.org On 01/23, Srivatsa S. Bhat wrote: > > On 01/23/2014 12:48 AM, Oleg Nesterov wrote: > > On 01/22, Srivatsa S. Bhat wrote: > >> > >> Wait a min, that _will_ actually work for all cases because I have provided > >> an option to invoke _any_ arbitrary function as the "setup" routine. > > > > And probably the generic solution makes sense. I am not sure I actually > > understand the semantics of register_allcpu_notifier(), but the problem > > it tries to solve looks clear/valid. > > > > Thank you. But I was wondering whether its usage is a bit unintuitive/ > convoluted. So I was contemplating between going with that solution or the > below one, where the call-sites are expected to do: > > cpu_maps_update_begin(); > for_each_online_cpu(cpu) { > ... > } > __register_cpu_notifier(); //use the __reg() variant, which doesn't take locks > cpu_maps_update_done(); > > Of course, that requires exporting the functions cpu_maps_update_begin/done(), > but this latter form of callback registration might look more natural. Yes, I thought about this too ;) > But for some of the other call-sites, we might have to use one > of the solutions mentioned above. Yes, yes, sure, I agree. I suggested this change only for discussion, for the case we need an "urgent" fix without changes outside of drivers/md/. The generic solution is better. Oleg. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-23 17:02 ` Oleg Nesterov @ 2014-01-28 14:32 ` Srivatsa S. Bhat 0 siblings, 0 replies; 11+ messages in thread From: Srivatsa S. Bhat @ 2014-01-28 14:32 UTC (permalink / raw) To: Oleg Nesterov Cc: Paul Mackerras, linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Tejun Heo, Michel Lespinasse, ego, rusty@rustcorp.com.au, Thomas Gleixner, akpm@linux-foundation.org On 01/23/2014 10:32 PM, Oleg Nesterov wrote: > On 01/23, Srivatsa S. Bhat wrote: >> >> On 01/23/2014 12:48 AM, Oleg Nesterov wrote: >>> On 01/22, Srivatsa S. Bhat wrote: >>>> >>>> Wait a min, that _will_ actually work for all cases because I have provided >>>> an option to invoke _any_ arbitrary function as the "setup" routine. >>> >>> And probably the generic solution makes sense. I am not sure I actually >>> understand the semantics of register_allcpu_notifier(), but the problem >>> it tries to solve looks clear/valid. >>> >> >> Thank you. But I was wondering whether its usage is a bit unintuitive/ >> convoluted. So I was contemplating between going with that solution or the >> below one, where the call-sites are expected to do: >> >> cpu_maps_update_begin(); >> for_each_online_cpu(cpu) { >> ... >> } >> __register_cpu_notifier(); //use the __reg() variant, which doesn't take locks >> cpu_maps_update_done(); >> >> Of course, that requires exporting the functions cpu_maps_update_begin/done(), >> but this latter form of callback registration might look more natural. > > Yes, I thought about this too ;) > >> But for some of the other call-sites, we might have to use one >> of the solutions mentioned above. > > Yes, yes, sure, I agree. > > I suggested this change only for discussion, for the case we need > an "urgent" fix without changes outside of drivers/md/. The generic > solution is better. > Ok :) But your fix for drivers/md/ also makes the code look much neater. So I'll include your patch in my series and convert the rest of the call- sites using the generic solution. Thank you! Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-22 9:16 ` Srivatsa S. Bhat 2014-01-22 19:18 ` Oleg Nesterov @ 2014-01-23 2:29 ` Rusty Russell 2014-01-23 5:36 ` Srivatsa S. Bhat 1 sibling, 1 reply; 11+ messages in thread From: Rusty Russell @ 2014-01-23 2:29 UTC (permalink / raw) To: Srivatsa S. Bhat, Paul Mackerras Cc: linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Oleg Nesterov, Tejun Heo, Michel Lespinasse, ego, Thomas Gleixner, akpm@linux-foundation.org "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes: > On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote: >> Hi Paul, I find an old patch for register_allcpu_notifier(), but the "bool replay_history" should be eliminated (always true): it's too weird. Then we should get rid of register_cpu_notifier, or at least hide it. Thanks, Rusty. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-23 2:29 ` Rusty Russell @ 2014-01-23 5:36 ` Srivatsa S. Bhat 2014-01-23 23:01 ` Rusty Russell 0 siblings, 1 reply; 11+ messages in thread From: Srivatsa S. Bhat @ 2014-01-23 5:36 UTC (permalink / raw) To: Rusty Russell Cc: Paul Mackerras, linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Oleg Nesterov, Tejun Heo, Michel Lespinasse, ego, Thomas Gleixner, akpm@linux-foundation.org On 01/23/2014 07:59 AM, Rusty Russell wrote: > "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes: >> On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote: >>> Hi Paul, > > I find an old patch for register_allcpu_notifier(), but the "bool > replay_history" should be eliminated (always true): it's too weird. > Sorry, I didn't get this part. Why do you say that replay_history will always be true? replay_history will be set to true whenever the caller wants to get notified of CPU_UP_PREPARE and CPU_ONLINE notifications for the already online CPUs, or wants to run a custom setup-routine of its own. And it will be false whenever the caller simply wants to just register the callback. Note that passing NULL for the setup-routine, by itself isn't enough to make a decision. NULL + replay_history == True will invoke the normal CPU_UP_PREPARE/CPU_ONLINE notifiers for the already online CPUs before registering the callback. NULL + replay_history == False will just register the callback and do nothing else. > Then we should get rid of register_cpu_notifier, or at least hide it. > Why? Isn't it easier to use (since you don't have to pass 2 additional parameters)? I see register_allcpu_notifier (or whatever better name we can give it), as an API for special cases where there is something more to be done than just registering the callback. And register_cpu_notifier will continue to be the API for the regular case when the caller wants to just register the callback. This latter case is the majority in the kernel. So I don't think eliminating the regular API would be a good idea. By the way, I'm still tempted to try out the simpler-looking alternative idea of exporting cpu_maps_update_begin() and cpu_maps_update_done() and then mandating that the callers do: cpu_maps_update_begin(); for_each_online_cpu(cpu) { ... } __register_cpu_notifier(); // this doesn't take the add_remove_lock cpu_maps_update_done(); I'm working on a patchset that does this and performs a tree-wide conversion. Please let me know if you have any objections to exporting cpu_maps_update_begin/done() in this manner. I thought I'd give this solution a try first, before going to the much fancier register_allcpu_notifier() method. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-23 5:36 ` Srivatsa S. Bhat @ 2014-01-23 23:01 ` Rusty Russell 2014-01-28 14:36 ` Srivatsa S. Bhat 0 siblings, 1 reply; 11+ messages in thread From: Rusty Russell @ 2014-01-23 23:01 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Paul Mackerras, linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Oleg Nesterov, Tejun Heo, Michel Lespinasse, ego, Thomas Gleixner, akpm@linux-foundation.org "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes: > On 01/23/2014 07:59 AM, Rusty Russell wrote: >> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes: >>> On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote: >>>> Hi Paul, >> >> I find an old patch for register_allcpu_notifier(), but the "bool >> replay_history" should be eliminated (always true): it's too weird. >> > > Sorry, I didn't get this part. Why do you say that replay_history > will always be true? OK, let me start again and try to explain myself properly: register_cpu_notifier is a bad API. It's hard to get right because: 1) You need to loop over online (or present) cpus once before you call it. 2) You have to beware the race between the loop and registration, but much example code happens at boot time where it doesn't matter, so random author is likely to copy that and have a race. 3) You have two paths doing the same thing: the loop which is run on every machine (cpu hotplug or not), and the notifier callback which is run far less rarely. What we actually *want* is a routine which will reliably call for every current and future CPU, and then there are very few places which should use the current register_cpu_notifier(). ie. halfway between register_cpu_notifier() (too racy) and register_allcpu_notifier() (too simplified). Let's call it register_cpu_callback / unregister_cpu_callback? > By the way, I'm still tempted to try out the simpler-looking alternative > idea of exporting cpu_maps_update_begin() and cpu_maps_update_done() > and then mandating that the callers do: > > cpu_maps_update_begin(); > for_each_online_cpu(cpu) { > ... > } > > __register_cpu_notifier(); // this doesn't take the add_remove_lock > cpu_maps_update_done(); Sure, fix this one for -stable. But let's create an idiom we can be proud of for the longer term. Thanks, Rusty. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock 2014-01-23 23:01 ` Rusty Russell @ 2014-01-28 14:36 ` Srivatsa S. Bhat 0 siblings, 0 replies; 11+ messages in thread From: Srivatsa S. Bhat @ 2014-01-28 14:36 UTC (permalink / raw) To: Rusty Russell Cc: Paul Mackerras, linux-kernel, Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Oleg Nesterov, Tejun Heo, Michel Lespinasse, ego, Thomas Gleixner, akpm@linux-foundation.org On 01/24/2014 04:31 AM, Rusty Russell wrote: > "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes: >> On 01/23/2014 07:59 AM, Rusty Russell wrote: >>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes: >>>> On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote: >>>>> Hi Paul, >>> >>> I find an old patch for register_allcpu_notifier(), but the "bool >>> replay_history" should be eliminated (always true): it's too weird. >>> >> >> Sorry, I didn't get this part. Why do you say that replay_history >> will always be true? > > OK, let me start again and try to explain myself properly: > > register_cpu_notifier is a bad API. It's hard to get right because: > 1) You need to loop over online (or present) cpus once before you call > it. > 2) You have to beware the race between the loop and registration, but > much example code happens at boot time where it doesn't matter, > so random author is likely to copy that and have a race. > 3) You have two paths doing the same thing: the loop which is run on > every machine (cpu hotplug or not), and the notifier callback which > is run far less rarely. > > What we actually *want* is a routine which will reliably call for every > current and future CPU, and then there are very few places which should > use the current register_cpu_notifier(). > > ie. halfway between register_cpu_notifier() (too racy) and > register_allcpu_notifier() (too simplified). > > Let's call it register_cpu_callback / unregister_cpu_callback? > Thanks a lot for the detailed and profound explanation! It makes perfect sense to me now. >> By the way, I'm still tempted to try out the simpler-looking alternative >> idea of exporting cpu_maps_update_begin() and cpu_maps_update_done() >> and then mandating that the callers do: >> >> cpu_maps_update_begin(); >> for_each_online_cpu(cpu) { >> ... >> } >> >> __register_cpu_notifier(); // this doesn't take the add_remove_lock >> cpu_maps_update_done(); > > Sure, fix this one for -stable. But let's create an idiom we can be > proud of for the longer term. > Ok, that sounds good, will work on that. Thank you very much! Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-01-28 14:41 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-01-22 5:52 Deadlock between cpu_hotplug_begin and cpu_add_remove_lock Paul Mackerras 2014-01-22 8:30 ` Srivatsa S. Bhat 2014-01-22 9:16 ` Srivatsa S. Bhat 2014-01-22 19:18 ` Oleg Nesterov 2014-01-22 19:58 ` Srivatsa S. Bhat 2014-01-23 17:02 ` Oleg Nesterov 2014-01-28 14:32 ` Srivatsa S. Bhat 2014-01-23 2:29 ` Rusty Russell 2014-01-23 5:36 ` Srivatsa S. Bhat 2014-01-23 23:01 ` Rusty Russell 2014-01-28 14:36 ` Srivatsa S. Bhat
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).