From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66])
	by mail09.linbit.com (LINBIT Mail Daemon) with ESMTP id 7358A1028A68
	for ; Thu, 25 Apr 2019 13:04:58 +0200 (CEST)
Received: by mail-wr1-f66.google.com with SMTP id s15so29654292wra.12
	for ; Thu, 25 Apr 2019 04:04:58 -0700 (PDT)
Received: from soda.linbit (212-186-191-219.static.upcbusiness.at. [212.186.191.219])
	by smtp.gmail.com with ESMTPSA id g19sm19717142wmh.17.2019.04.25.03.56.42
	for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
	Thu, 25 Apr 2019 03:56:42 -0700 (PDT)
Date: Thu, 25 Apr 2019 12:56:41 +0200
From: Lars Ellenberg
To: drbd-dev@lists.linbit.com
Message-ID: <20190425105641.GA919@soda.linbit>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
Subject: Re: [Drbd-dev] Please test with CONFIG_PROVE_LOCKING=y
List-Id: "*Coordination* of development, patches, contributions -- *Questions* \(even to developers\) go to drbd-user, please."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

On Thu, Apr 25, 2019 at 06:30:05PM +0900, Tetsuo Handa wrote:
> I found that simply doing
>
>   # mount /dev/drbd0 /mnt/
>
> on the primary side causes a lockdep splat on the peer side.
>
> [ 23.039882] ========================================================
> [ 23.039906] WARNING: possible irq lock inversion dependency detected
> [ 23.039931] 5.0.0 #891 Tainted: G O
> [ 23.039950] --------------------------------------------------------
> [ 23.039975] drbd_r_r0/8237 just changed the state of lock:
> [ 23.039997] 000000007cc227b6 (&(&connection->epoch_lock)->rlock){+.+.}, at: receive_Data+0x36b/0x1ca0 [drbd]
> [ 23.040049] but this lock was taken by another, SOFTIRQ-safe lock in the past:
> [ 23.040115]  (&(&resource->req_lock)->rlock){..-.}
> [ 23.040117]
>
> and interrupts could create inverse lock ordering between them.
>
> [ 23.040176]
> other info that might help us debug this:
> [ 23.040200]  Possible interrupt unsafe locking scenario:
>
> [ 23.040225]        CPU0                    CPU1
> [ 23.040243]        ----                    ----
> [ 23.040260]   lock(&(&connection->epoch_lock)->rlock);
> [ 23.040281]                                local_irq_disable();
> [ 23.040303]                                lock(&(&resource->req_lock)->rlock);
> [ 23.040330]                                lock(&(&connection->epoch_lock)->rlock);
> [ 23.040359]   <Interrupt>
> [ 23.040370]     lock(&(&resource->req_lock)->rlock);
> [ 23.040389]
>  *** DEADLOCK ***

Yes. We already know.
"impossible odds"... But needs fixing.
Certainly NOT by making all epoch_lock irqsave.

Problem was introduced by me with
f4acb16f drbd: fix lifetime of "need to apply activity log" metadata flag

I "just" need to come up with a way to check what I am checking there
without taking the epoch lock.

> Although making below change seems to solve the lockdep splat,
> I can't check the correctness because I don't know how drbd works.
> Please test with CONFIG_PROVE_LOCKING=y and fix.

See above.

Thanks.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT
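
For readers unfamiliar with this class of lockdep report, the rule behind the splat can be sketched as a small toy model. This is not lockdep's implementation and not DRBD code; the class ToyLockdep and its methods are hypothetical names made up for illustration, and only the softirq case is modelled. The idea: a lock that has ever been taken in softirq context is "softirq-safe"; a lock that has ever been taken with softirqs enabled is "softirq-unsafe"; an inversion is reported once a softirq-unsafe lock is reachable from a softirq-safe one through "taken while holding" dependencies, which is what the report says about req_lock and epoch_lock.

```python
class ToyLockdep:
    """Toy model of the softirq-safe -> softirq-unsafe check (illustration only)."""

    def __init__(self):
        self.used_in_softirq = set()         # lock classes ever taken in softirq context
        self.taken_with_softirqs_on = set()  # lock classes ever taken with softirqs enabled
        self.deps = {}                       # held lock -> locks acquired while holding it

    def acquire(self, lock, held=(), in_softirq=False, softirqs_on=True):
        # Record the usage state of this acquisition.
        if in_softirq:
            self.used_in_softirq.add(lock)
        elif softirqs_on:
            self.taken_with_softirqs_on.add(lock)
        # Record "held -> lock" ordering dependencies.
        for h in held:
            self.deps.setdefault(h, set()).add(lock)

    def reachable(self, start):
        # All locks that can be acquired, directly or indirectly, under `start`.
        seen, stack = set(), [start]
        while stack:
            for nxt in self.deps.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def inversions(self):
        # (safe, unsafe) pairs that could deadlock if a softirq interrupts
        # the holder of the unsafe lock and then wants the safe lock.
        return sorted(
            (safe, unsafe)
            for safe in self.used_in_softirq
            for unsafe in self.reachable(safe)
            if unsafe in self.taken_with_softirqs_on
        )


def demo():
    ld = ToyLockdep()
    # req_lock has been taken from softirq context at some point.
    ld.acquire("req_lock", in_softirq=True)
    # epoch_lock has been taken while req_lock was held, irqs disabled.
    ld.acquire("req_lock", softirqs_on=False)
    ld.acquire("epoch_lock", held=("req_lock",), softirqs_on=False)
    before = ld.inversions()
    # The receive path takes epoch_lock with softirqs enabled ({+.+.}).
    ld.acquire("epoch_lock")
    return before, ld.inversions()


print(demo())
```

In this model the inversion only appears once epoch_lock is taken with softirqs enabled while the req_lock -> epoch_lock dependency already exists. Either taking epoch_lock with interrupts disabled everywhere, or not taking epoch_lock at all in the path that nests under req_lock, would make the pair disappear; the reply above rules out the former and aims for the latter.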