Date: Tue, 10 Oct 2017 07:15:39 -0700
From: Tejun Heo
To: Boqun Feng
Cc: linux-kernel@vger.kernel.org, Josef Bacik, Peter Zijlstra, Lai Jiangshan
Subject: Re: [PATCH v2 workqueue/for-4.14-fixes] workqueue: replace pool->manager_arb mutex with a flag
Message-ID: <20171010141539.GL3301751@devbig577.frc2.facebook.com>
References: <20171008090225.27034-1-boqun.feng@gmail.com> <20171009132104.GZ3301751@devbig577.frc2.facebook.com> <20171009150413.GC3301751@devbig577.frc2.facebook.com>
In-Reply-To: <20171009150413.GC3301751@devbig577.frc2.facebook.com>

On Mon, Oct 09, 2017 at 08:04:13AM -0700, Tejun Heo wrote:
> Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
> lockdep:
>
> [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
> [ 1270.473240] -----------------------------------------------------
> [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 1270.474239]  (&(&lock->wait_lock)->rlock){+.+.}, at: [] __mutex_unlock_slowpath+0xa2/0x280
> [ 1270.474994]
> [ 1270.474994] and this task is already holding:
> [ 1270.475440]  (&pool->lock/1){-.-.}, at: [] worker_thread+0x366/0x3c0
> [ 1270.476046] which would create a new lock dependency:
> [ 1270.476436]  (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
> [ 1270.476949]
> [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 1270.477553]  (&pool->lock/1){-.-.}
> ...
> [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
> [ 1270.489327]  (&(&lock->wait_lock)->rlock){+.+.}
> ...
> [ 1270.494735]  Possible interrupt unsafe locking scenario:
> [ 1270.494735]
> [ 1270.495250]        CPU0                    CPU1
> [ 1270.495600]        ----                    ----
> [ 1270.495947]   lock(&(&lock->wait_lock)->rlock);
> [ 1270.496295]                                local_irq_disable();
> [ 1270.496753]                                lock(&pool->lock/1);
> [ 1270.497205]                                lock(&(&lock->wait_lock)->rlock);
> [ 1270.497744]   <Interrupt>
> [ 1270.497948]     lock(&pool->lock/1);
>
> This will cause an irq inversion deadlock if the above lock scenario
> happens.
>
> The root cause of this safe -> unsafe lock order is the
> mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
> held.
>
> Unlocking a mutex while holding an irq spinlock was never safe and
> this problem has been around forever, but it never got noticed because
> the mutex is usually only trylocked while holding the irq spinlock,
> making actual failures very unlikely, and the lockdep annotation
> missed the condition until the recent b9c16a0e1f73 ("locking/mutex:
> Fix lockdep_assert_held() fail").
>
> Using a mutex for pool->manager_arb has always been a bit of a
> stretch.  It is primarily a mechanism to arbitrate managership between
> workers, which can easily be done with a pool flag.  The only reason
> it became a mutex is that the pool destruction path wants to exclude
> parallel managing operations.
>
> This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
> and makes the destruction path wait for the current manager on a wait
> queue.
>
> v2: Drop unnecessary flag clearing before pool destruction as
>     suggested by Boqun.
>
> Signed-off-by: Tejun Heo
> Reported-by: Josef Bacik
> Cc: Peter Zijlstra
> Cc: Boqun Feng
> Cc: stable@vger.kernel.org
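
For reference, the resulting arbitration boils down to something like
the following condensed sketch (not the literal diff; the wait queue
name and the elided function bodies here are illustrative):

	/* single system-wide wait queue for would-be pool managers */
	static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait);

	static bool manage_workers(struct worker *worker)
	{
		struct worker_pool *pool = worker->pool;

		/*
		 * Called with pool->lock held, so testing and setting
		 * the flag is atomic wrt other would-be managers.
		 */
		if (pool->flags & POOL_MANAGER_ACTIVE)
			return false;
		pool->flags |= POOL_MANAGER_ACTIVE;

		/* do the actual managing - create workers as needed */
		maybe_create_worker(pool);

		pool->flags &= ~POOL_MANAGER_ACTIVE;
		/*
		 * Unlike mutex_unlock(), wake_up() only takes the
		 * irq-safe waitqueue lock, so it's fine to call it
		 * while holding pool->lock.
		 */
		wake_up(&wq_manager_wait);
		return true;
	}

	static void put_unbound_pool(struct worker_pool *pool)
	{
		...
		spin_lock_irq(&pool->lock);
		/*
		 * Become the manager to exclude concurrent managers;
		 * wait_event_lock_irq() drops pool->lock while waiting
		 * and reacquires it before rechecking the condition.
		 */
		wait_event_lock_irq(wq_manager_wait,
				    !(pool->flags & POOL_MANAGER_ACTIVE),
				    pool->lock);
		pool->flags |= POOL_MANAGER_ACTIVE;
		...
	}

The important part is that both setting and clearing the flag happen
under pool->lock, and the only wakeup primitive used while holding it
is wake_up(), whose internal lock is HARDIRQ-safe.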

Applied to wq/for-4.14-fixes. Thanks.

-- 
tejun