Subject: Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
To: Tejun Heo
References: <56C5CE85.6090808@suse.cz> <20160218174427.GG13177@mtj.duckdns.org>
 <56C6EC62.8080107@suse.cz> <56C70618.3010902@suse.cz>
 <20160302154507.GC4282@mtj.duckdns.org> <56D7FFE1.90900@suse.cz>
 <20160311171205.GB24046@htj.duckdns.org> <56EA9C4D.2080803@suse.cz>
 <20160318205231.GO20028@mtj.duckdns.org>
Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
 "David S. Miller", linux-bluetooth@vger.kernel.org, netdev, LKML,
 syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin,
 Eric Dumazet, Takashi Iwai
From: Jiri Slaby
Message-ID: <56F01A1C.40208@suse.cz>
Date: Mon, 21 Mar 2016 16:58:20 +0100
In-Reply-To: <20160318205231.GO20028@mtj.duckdns.org>

Hello,

On 03/18/2016, 09:52 PM, Tejun Heo wrote:
> On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
>>>> I have not done that yet, but today, I see:
>>>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>>>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>>>   pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>>>     in-flight: 18568:wq_barrier_func
>>>
>>> So, this means that there's flush_work() racing against workqueue
>>> destruction, which can't be safe. :(
>>
>> But I cannot trigger the WARN_ONs in the attached patch, so I am
>> confused how this can happen :(. (While I am still seeing the destroy
>> WARNINGs.)
>
> So, no operations should be in progress when destroy_workqueue() is
> called.  If somebody was flushing a work item, the flush call must
> have returned before destroy_workqueue() was invoked, which doesn't
> seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
> the above triggers?  There must be a task which is flushing a work
> item there and it shouldn't be difficult to pinpoint what's going on
> from it.

The output of sysrq-t is here (> 200k), but I cannot see anything
suspicious in it:
http://www.fi.muni.cz/~xslaby/sklad/panics/jctl.txt

This is what the code does now:
+	if ((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) {
+		pr_info("%s: name='%s' pwq=%p wq->dfl_pwq=%p pwq->refcnt=%d pwq->nr_active=%d delayed_works:",
+				__func__, wq->name, pwq, wq->dfl_pwq,
+				pwq->refcnt, pwq->nr_active);
+
+		show_pwq(pwq);
+
+		mutex_unlock(&wq->mutex);
+		show_state();
+		show_workqueue_state();
+		WARN_ON(1);
+		return;
+	}

thanks,
-- 
js
suse labs