* Re: [PATCH 00/11] Modified workqueue patches for your review [not found] ` <4E8F8BE4.2080104@broadcom.com> @ 2011-10-08 14:51 ` Oleg Nesterov 0 siblings, 0 replies; 2+ messages in thread From: Oleg Nesterov @ 2011-10-08 14:51 UTC (permalink / raw) To: Bhanu Prakash Gollapudi Cc: Tejun Heo, Mike Christie, Michael Chan, linux-kernel On 10/07, Bhanu Prakash Gollapudi wrote: > > Ok. I guess I plan to do something like this. This should avoid the race > condition. I have not compiled or tested it yet, but will update you > the progress. Confused. I was CC'ed in the middle of discussion, I simply do not understand what are you talking about. And since we discuss this off-list I can't find the previous messages. I added lkml. So, what does this patch do? Looks like, it is on top of another patch which changes the old workqueue code to take get_online_cpus() instead of cpu_maps_update_begin() in create/destroy. That previous change was wrong. And how this one can help? And could you please explain which problem (or problems) you are trying to solve? I thought that the problem is that work->func() can't use cpu_hotplug_begin(), in particular this means it can not call destroy_workqueue(). > @@ -209,6 +220,7 @@ static int __ref _cpu_down(unsigned int cpu, int > tasks_frozen) > if (!cpu_online(cpu)) > return -EINVAL; > > + cpu_sync_hotplug_begin(); > cpu_hotplug_begin(); > set_cpu_active(cpu, false); > err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod, > @@ -258,6 +270,7 @@ out_release: > hcpu) == NOTIFY_BAD) > BUG(); > } > + cpu_sync_hotplug_done(); > return err; > } So, we add another global lock, it covers CPU_POST_DEAD. > @@ -930,7 +932,9 @@ void destroy_workqueue(struct workqueue_struct *wq) > const struct cpumask *cpu_map = wq_cpu_map(wq); > int cpu; > > + cpu_sync_hotplug_begin(); > get_online_cpus(); > + cpu_sync_hotplug_done(); OK, we are going to flush the pending works. Since we drop _sync_ lock, a work->func() can take it again. Seems to work, but it doesn't. Suppose _cpu_down() is called, suppose that it takes cpu_sync_hotplug_begin() before that work. Deadlock. Once again. May be I missed something (or even everything ;) but you should not blame 3da1c84c00c commit, it was always wrong to destroy_ from work->func(). Note that there is another problem, CPU_POST_DEAD needs to flush the pending works too and we have another obvious source of deadlock. Oleg. ^ permalink raw reply [flat|nested] 2+ messages in thread
[parent not found: <20111007145102.GA25449@redhat.com>]
[parent not found: <4E8F8C97.6010804@broadcom.com>]
* Re: [PATCH 00/11] Modified workqueue patches for your review [not found] ` <4E8F8C97.6010804@broadcom.com> @ 2011-10-08 15:51 ` Oleg Nesterov 0 siblings, 0 replies; 2+ messages in thread From: Oleg Nesterov @ 2011-10-08 15:51 UTC (permalink / raw) To: Bhanu Prakash Gollapudi Cc: Tejun Heo, Mike Christie, Michael Chan, linux-kernel On 10/07, Bhanu Prakash Gollapudi wrote: > > On 10/7/2011 7:51 AM, Oleg Nesterov wrote: >> On 10/06, Bhanu Prakash Gollapudi wrote: >>> >>> On 10/6/2011 5:48 PM, Tejun Heo wrote: >>>> >>>>> Reviewing the patch, I agree that we cannot rely on >>>>> get_online_cpus() any longer. But I'm also not convinced that >>>>> cpu_add_remove_lock should be used instead, as it shows up some >>>>> other deadlocks in destroy_workqueue context because of this global >>>>> lock. >> >> Which deadlocks? work->func() must not use cpu_maps_update_begin/end >> and thus it can't create/destroy !singlethread workqueue. > > Oleg, I attached the stack traces leading to deadlock in my previous > email. Yes, I didn't read it to the end... But you could save me some time and explain ;) OK. Afaics, it is easy to fix this particular problem... First of all, scsi_host_dev_release() destroys the single-threaded wq. In this case we do not actually need the locking/list_del, the code was written this way just for consistency. See the patch below. But, it seems, we could change scsi_host_dev_release() instead? It could probably schedule_work() a work which actually does destroy_workqueue(). destroy/flush under the lock shared with work->func's is always dangerous. > Based on Tejun's suggestion I sent a prototype patch that should fix the > deadlock due to cpu_add_remove_lock, and avoid the race condition. I'm > yet to test it. Doesn't look right... But once again, I didn't see the whole discussion, I have no idea what I have missed. Oleg. --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -930,14 +930,19 @@ void destroy_workqueue(struct workqueue_ const struct cpumask *cpu_map = wq_cpu_map(wq); int cpu; - cpu_maps_update_begin(); - spin_lock(&workqueue_lock); - list_del(&wq->list); - spin_unlock(&workqueue_lock); - - for_each_cpu(cpu, cpu_map) - cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu)); - cpu_maps_update_done(); + if (is_wq_single_threaded(wq)) { + cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, + singlethread_cpu)); + } else { + cpu_maps_update_begin(); + spin_lock(&workqueue_lock); + list_del(&wq->list); + spin_unlock(&workqueue_lock); + + for_each_cpu(cpu, cpu_map) + cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu)); + cpu_maps_update_done(); + } free_percpu(wq->cpu_wq); kfree(wq); ^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2011-10-08 15:55 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1314339850-32666-1-git-send-email-bprakash@broadcom.com>
[not found] ` <20110826085457.GC2632@htj.dyndns.org>
[not found] ` <4E58138A.5050702@broadcom.com>
[not found] ` <4E8E378B.30907@broadcom.com>
[not found] ` <20111007004824.GA5458@google.com>
[not found] ` <4E8E5493.5010804@broadcom.com>
[not found] ` <20111007014534.GC5458@google.com>
[not found] ` <4E8E6660.8070502@broadcom.com>
[not found] ` <20111007062635.GA18562@dhcp-172-17-108-109.mtv.corp.google.com>
[not found] ` <4E8F8BE4.2080104@broadcom.com>
2011-10-08 14:51 ` [PATCH 00/11] Modified workqueue patches for your review Oleg Nesterov
[not found] ` <20111007145102.GA25449@redhat.com>
[not found] ` <4E8F8C97.6010804@broadcom.com>
2011-10-08 15:51 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox