* Re: [PATCH 00/11] Modified workqueue patches for your review
From: Oleg Nesterov @ 2011-10-08 14:51 UTC
To: Bhanu Prakash Gollapudi
Cc: Tejun Heo, Mike Christie, Michael Chan, linux-kernel
On 10/07, Bhanu Prakash Gollapudi wrote:
>
> Ok. I guess I plan to do something like this. This should avoid the race
> condition. I have not compiled or tested it yet, but will update you
> the progress.
Confused. I was CC'ed in the middle of the discussion, and I simply do not
understand what you are talking about. And since this was discussed
off-list, I can't find the previous messages. I added lkml.
So, what does this patch do? It looks like it is on top of another patch
which changes the old workqueue code to take get_online_cpus() instead
of cpu_maps_update_begin() in create/destroy.

That previous change was wrong. And how can this one help?
And could you please explain which problem (or problems) you are trying
to solve? I thought the problem was that work->func() can't use
cpu_hotplug_begin(); in particular, this means it cannot call
destroy_workqueue().
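To make the constraint concrete, here is a kernel-style sketch (not meant
to compile; some_wq stands for any multithreaded workqueue) of what a
work function must not do under the old workqueue code:

```c
/* Unsafe: a work function tearing down another workqueue. */
static void my_work_fn(struct work_struct *unused)
{
	/*
	 * destroy_workqueue() -> cpu_maps_update_begin() ->
	 * mutex_lock(&cpu_add_remove_lock).
	 *
	 * _cpu_down() holds cpu_add_remove_lock across the hotplug
	 * notifiers, and the workqueue notifier flushes pending works.
	 * If it is waiting for this very work, we block on the mutex
	 * it holds while it waits for us to complete: deadlock.
	 */
	destroy_workqueue(some_wq);
}
```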
> @@ -209,6 +220,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
>  	if (!cpu_online(cpu))
>  		return -EINVAL;
>
> +	cpu_sync_hotplug_begin();
>  	cpu_hotplug_begin();
>  	set_cpu_active(cpu, false);
>  	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
> @@ -258,6 +270,7 @@ out_release:
>  				hcpu) == NOTIFY_BAD)
>  			BUG();
>  	}
> +	cpu_sync_hotplug_done();
>  	return err;
>  }
So, we add another global lock, and it covers CPU_POST_DEAD.
> @@ -930,7 +932,9 @@ void destroy_workqueue(struct workqueue_struct *wq)
>  	const struct cpumask *cpu_map = wq_cpu_map(wq);
>  	int cpu;
>
> +	cpu_sync_hotplug_begin();
>  	get_online_cpus();
> +	cpu_sync_hotplug_done();
OK, so we are going to flush the pending works. Since we drop the _sync_
lock, a work->func() can take it again.

This seems to work, but it doesn't. Suppose _cpu_down() is called, and
suppose it takes cpu_sync_hotplug_begin() before that work does. Deadlock.
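Spelled out as an interleaving (cpu_sync_hotplug_begin/_done are the
primitives from the quoted patch, not mainline):

```
destroy_workqueue()          _cpu_down()                 pending work->func()
-------------------          -----------                 --------------------
cpu_sync_hotplug_begin()
get_online_cpus()
cpu_sync_hotplug_done()
                             cpu_sync_hotplug_begin()
                             cpu_hotplug_begin()
                               /* waits for the
                                  get_online_cpus()
                                  reader on the left */
flushes pending works,                                   cpu_sync_hotplug_begin()
waits for work->func()...                                  /* blocks: _cpu_down()
                                                              holds the lock */
```

Each party waits on the next: a three-way deadlock.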
Once again, maybe I missed something (or even everything ;) but you
should not blame commit 3da1c84c00c; it was always wrong to call
destroy_workqueue() from work->func(). Note that there is another
problem: CPU_POST_DEAD needs to flush the pending works too, and we have
another obvious source of deadlock.
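That CPU_POST_DEAD case is the same kind of cycle, with
cpu_add_remove_lock in place of the new lock:

```
_cpu_down()                               work->func()
-----------                               ------------
cpu_maps_update_begin()
  /* takes cpu_add_remove_lock */
... DOWN_PREPARE/DEAD notifiers ...
CPU_POST_DEAD notifier:
  flush works of the dead CPU,            destroy_workqueue(other_wq)
  waits for work->func()...                 cpu_maps_update_begin()
                                              /* blocks on cpu_add_remove_lock */
```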
Oleg.
* Re: [PATCH 00/11] Modified workqueue patches for your review
From: Oleg Nesterov @ 2011-10-08 15:51 UTC
To: Bhanu Prakash Gollapudi
Cc: Tejun Heo, Mike Christie, Michael Chan, linux-kernel
On 10/07, Bhanu Prakash Gollapudi wrote:
>
> On 10/7/2011 7:51 AM, Oleg Nesterov wrote:
>> On 10/06, Bhanu Prakash Gollapudi wrote:
>>>
>>> On 10/6/2011 5:48 PM, Tejun Heo wrote:
>>>>
>>>>> Reviewing the patch, I agree that we cannot rely on
>>>>> get_online_cpus() any longer. But I'm also not convinced that
>>>>> cpu_add_remove_lock should be used instead, as it shows up some
>>>>> other deadlocks in destroy_workqueue context because of this global
>>>>> lock.
>>
>> Which deadlocks? work->func() must not use cpu_maps_update_begin/end
>> and thus it can't create/destroy !singlethread workqueue.
>
> Oleg, I attached the stack traces leading to deadlock in my previous
> email.
Yes, I didn't read it to the end... But you could save me some time and
explain ;)
OK. Afaics, it is easy to fix this particular problem... First of all,
scsi_host_dev_release() destroys the single-threaded wq. In this case
we do not actually need the locking/list_del; the code was written this
way just for consistency. See the patch below.
But, it seems, we could change scsi_host_dev_release() instead? It could
probably schedule_work() a work which actually does destroy_workqueue().
destroy/flush under a lock shared with work->func()s is always dangerous.
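A sketch of that idea (untested and hypothetical: the free_work member
and the helper do not exist in the driver, and shost lifetime details
are glossed over):

```c
/* Hypothetical: defer the destroy out of the release path. */
static void scsi_host_destroy_wq(struct work_struct *work)
{
	struct Scsi_Host *shost = container_of(work, struct Scsi_Host,
					       free_work);

	/*
	 * This runs from the generic kevent workqueue, not from
	 * shost->work_q itself, so the flush inside
	 * destroy_workqueue() cannot end up waiting for us.
	 */
	destroy_workqueue(shost->work_q);
	kfree(shost);			/* the final free moves here too */
}

static void scsi_host_dev_release(struct device *dev)
{
	struct Scsi_Host *shost = dev_to_shost(dev);

	if (shost->work_q) {
		INIT_WORK(&shost->free_work, scsi_host_destroy_wq);
		schedule_work(&shost->free_work);
		return;
	}
	kfree(shost);
}
```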
> Based on Tejun's suggestion I sent a prototype patch that should fix the
> deadlock due to cpu_add_remove_lock, and avoid the race condition. I'm
> yet to test it.
Doesn't look right...

But once again, I didn't see the whole discussion, so I have no idea
what I may have missed.
Oleg.
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -930,14 +930,19 @@ void destroy_workqueue(struct workqueue_
 	const struct cpumask *cpu_map = wq_cpu_map(wq);
 	int cpu;

-	cpu_maps_update_begin();
-	spin_lock(&workqueue_lock);
-	list_del(&wq->list);
-	spin_unlock(&workqueue_lock);
-
-	for_each_cpu(cpu, cpu_map)
-		cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));
-	cpu_maps_update_done();
+	if (is_wq_single_threaded(wq)) {
+		cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq,
+						     singlethread_cpu));
+	} else {
+		cpu_maps_update_begin();
+		spin_lock(&workqueue_lock);
+		list_del(&wq->list);
+		spin_unlock(&workqueue_lock);
+
+		for_each_cpu(cpu, cpu_map)
+			cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));
+		cpu_maps_update_done();
+	}

 	free_percpu(wq->cpu_wq);
 	kfree(wq);