Date: Tue, 16 Dec 2025 10:22:36 -0400
From: Jason Gunthorpe
To: Michael Gur
Cc: wujing, leon@kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, yuanql9@chinatelecom.cn
Subject: Re: [PATCH] IB/core: Fix ABBA deadlock in rdma_dev_exit_net
Message-ID: <20251216142236.GD31492@ziepe.ca>
References: <20251216005705.GB31492@ziepe.ca>

On Tue, Dec 16, 2025 at 03:59:32PM +0200, Michael Gur wrote:
>
> On 12/16/2025 11:59 AM, wujing wrote:
> > Hi Jason,
> >
> > You're right that the locks aren't nested in rdma_dev_exit_net() - it does release
> > rdma_nets_rwsem before acquiring devices_rwsem. However, this is still an ABBA deadlock,
> > just not the trivial nested kind. The issue is caused by **rwsem writer priority**
> > and lock ordering inconsistency.
> >
> > Here's the actual deadlock scenario:
> >
> > **Thread A (rdma_dev_exit_net - cleanup_net workqueue):**
> > ```
> > down_write(&rdma_nets_rwsem);    // Acquired
> > xa_store(&rdma_nets, ...);
> > up_write(&rdma_nets_rwsem);      // Released
> > down_read(&devices_rwsem);       // Waiting here <-- BLOCKED
> > ```
> >
> > **Thread B (rdma_dev_init_net - stress-ng-clone):**
> > ```
> > down_read(&devices_rwsem);       // Acquired
> > down_read(&rdma_nets_rwsem);     // Waiting here <-- BLOCKED
> > ```
> >
> > The deadlock happens because:
> >
> > 1. Thread A releases rdma_nets_rwsem as a **writer**
> > 2. Thread B (and many others) are waiting to acquire rdma_nets_rwsem as **readers**
> > 3. Thread A then tries to acquire devices_rwsem as a reader
> > 4. BUT: rwsem gives priority to pending writers over new readers
> > 5. Since Thread A was a pending writer on rdma_nets_rwsem, Thread B's read request is blocked
> > 6. Thread B holds devices_rwsem, which Thread A needs
> > 7. Thread A holds the "writer priority slot" on rdma_nets_rwsem, which Thread B needs
> >
>
> Why would Thread A still hold any writer priority after calling up_write()?

I've never heard of a 'writer priority slot' in linux, a thread does
not block other users of a lock after it has released the lock.

The rwsem priority is done by biasing the atomic counter, not with
some kind of weird per-thread slots.

Jason