From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sat, 8 Oct 2011 17:51:32 +0200
From: Oleg Nesterov
To: Bhanu Prakash Gollapudi
Cc: Tejun Heo, Mike Christie, Michael Chan, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/11] Modified workqueue patches for your review
Message-ID: <20111008155132.GA28854@redhat.com>
References: <1314339850-32666-1-git-send-email-bprakash@broadcom.com>
 <20110826085457.GC2632@htj.dyndns.org>
 <4E58138A.5050702@broadcom.com>
 <4E8E378B.30907@broadcom.com>
 <20111007004824.GA5458@google.com>
 <4E8E5493.5010804@broadcom.com>
 <20111007145102.GA25449@redhat.com>
 <4E8F8C97.6010804@broadcom.com>
In-Reply-To: <4E8F8C97.6010804@broadcom.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/07, Bhanu Prakash Gollapudi wrote:
>
> On 10/7/2011 7:51 AM, Oleg Nesterov wrote:
>> On 10/06, Bhanu Prakash Gollapudi wrote:
>>>
>>> On 10/6/2011 5:48 PM, Tejun Heo wrote:
>>>>
>>>>> Reviewing the patch, I agree that we cannot rely on
>>>>> get_online_cpus() any longer. But I'm also not convinced that
>>>>> cpu_add_remove_lock should be used instead, as it shows up some
>>>>> other deadlocks in destroy_workqueue context because of this global
>>>>> lock.
>>
>> Which deadlocks? work->func() must not use cpu_maps_update_begin/end
>> and thus it can't create/destroy a !singlethread workqueue.
>
> Oleg, I attached the stack traces leading to deadlock in my previous
> email.

Yes, I didn't read it to the end... But you could save me some time
and explain ;)

OK.
Afaics, it is easy to fix this particular problem...

First of all, scsi_host_dev_release() destroys the single-threaded wq.
In this case we do not actually need the locking/list_del, the code was
written this way just for consistency. See the patch below.

But, it seems, we could change scsi_host_dev_release() instead? It could
probably schedule_work() a work which actually does destroy_workqueue().
destroy/flush under the lock shared with work->func's is always
dangerous.

> Based on Tejun's suggestion I sent a prototype patch that should fix the
> deadlock due to cpu_add_remove_lock, and avoid the race condition. I'm
> yet to test it.

Doesn't look right... But once again, I didn't see the whole discussion,
I have no idea what I have missed.

Oleg.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -930,14 +930,19 @@ void destroy_workqueue(struct workqueue_
 	const struct cpumask *cpu_map = wq_cpu_map(wq);
 	int cpu;
 
-	cpu_maps_update_begin();
-	spin_lock(&workqueue_lock);
-	list_del(&wq->list);
-	spin_unlock(&workqueue_lock);
-
-	for_each_cpu(cpu, cpu_map)
-		cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));
-	cpu_maps_update_done();
+	if (is_wq_single_threaded(wq)) {
+		cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq,
+							singlethread_cpu));
+	} else {
+		cpu_maps_update_begin();
+		spin_lock(&workqueue_lock);
+		list_del(&wq->list);
+		spin_unlock(&workqueue_lock);
+
+		for_each_cpu(cpu, cpu_map)
+			cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));
+		cpu_maps_update_done();
+	}
 
 	free_percpu(wq->cpu_wq);
 	kfree(wq);