From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 07EA8C3ABB2 for ; Wed, 28 May 2025 21:56:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc: To:From:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=4IKZPNA8q4jYRu2Nv6KgK2Z9hBJjmJbFKqGboqDBzz4=; b=o1Ihrtsx+nU8eau6nkhMThfQxS 6T/SUtbNTg6jvJd6PwWOwaR4NldyuUfO/QmDVOCOjfbjjGaxafeP4UtlZZES7OgRMWr/J+xjGBUMR 3g77xRGcOpdrn7wi3S9bTLloby2iuF1OADFBI6By0jgn77i4TCtmWXqfDHTCWyhAd95fu7GQP2IRs TTAtAyldiwVM8f2y8wWWN6F3mp+grT9vvYVXHfxiujVAgPomz6vpkHulFupNj4K1XNw7k1ORrV37d uW0YiVMmjFwbRjggJOd9yXmVbtxn/Dze9hl+mWMlowPLAspeIXTUezSJsIVu9v5ve1KtsqI9aEZJJ wrCcruCA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98.2 #2 (Red Hat Linux)) id 1uKOks-0000000EBIm-3uGh; Wed, 28 May 2025 21:56:10 +0000 Received: from sea.source.kernel.org ([2600:3c0a:e001:78e:0:1991:8:25]) by bombadil.infradead.org with esmtps (Exim 4.98.2 #2 (Red Hat Linux)) id 1uKOkp-0000000EBHU-49Ak for linux-nvme@lists.infradead.org; Wed, 28 May 2025 21:56:09 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sea.source.kernel.org (Postfix) with ESMTP id 6CC1B4A53E; Wed, 28 May 2025 21:56:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B418C4CEF1; Wed, 28 May 2025 21:56:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1748469367; bh=DOL0V1GlbHtlPlUEL9kQQBZoG2WC9rqsgkDdolSiyVs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=MMHTw/oadaZtznYsexEaSh/KXUS4Vh0/VEr3stSkPfouFwyVZE8WO2C4HxERc0s55 H0TLuXjb1/ngUUW/WKekGARYimByxET/umssSihlDkrmPFHHcvCcIph1UEyny4jmVN 1dkfCnVhgx9XgIExGGm1dSVeVOs0K6yykYjDNMXfCby1rwHru7gcllZH5S4fT6+rOJ Bf/ACqeg54RMew+oCNMKy7K/AJFSSRRQbCzAbcD6F9+tk8kHDpLLnaO5qjaq3CB5Fo Hht3WzEgtCeLTQ0NuiOWKd9qmBXgvVP1xgqS9+nSrjqB6he8+DxnTq/dsnKJ3aHaiM zuKwLFdbc4x6Q== From: Sasha Levin To: patches@lists.linux.dev, stable@vger.kernel.org Cc: Daniel Wagner , Christoph Hellwig , Sasha Levin , james.smart@broadcom.com, sagi@grimberg.me, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH AUTOSEL 6.15 6/9] nvmet-fcloop: access fcpreq only when holding reqlock Date: Wed, 28 May 2025 17:55:56 -0400 Message-Id: <20250528215559.1983214-6-sashal@kernel.org> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250528215559.1983214-1-sashal@kernel.org> References: <20250528215559.1983214-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 6.15 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250528_145608_073481_61718C8E X-CRM114-Status: GOOD ( 19.24 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org From: Daniel Wagner [ Upstream commit 47a827cd7929d0550c3496d70b417fcb5649b27b ] The abort handling logic expects that the state and the fcpreq are only accessed when holding the reqlock lock. While at it, only handle the aborts in the abort handler. Signed-off-by: Daniel Wagner Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- **YES** This commit should be backported to stable kernel trees. Here's my extensive analysis: ## Commit Analysis This commit fixes a critical **race condition and data corruption issue** in the nvmet-fcloop driver's abort handling logic. The changes address serious synchronization problems that could lead to use-after-free conditions and inconsistent state management. ## Key Issues Fixed ### 1. **Unsafe fcpreq Access Outside Lock Protection** The main issue is that `fcpreq` was being accessed without proper lock protection in `fcloop_fcp_recv_work()`: ```c // BEFORE (unsafe): struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq; // Access outside lock spin_lock_irqsave(&tfcp_req->reqlock, flags); // ... lock operations ... spin_unlock_irqrestore(&tfcp_req->reqlock, flags); // Later use of fcpreq - could be stale/freed // AFTER (safe): spin_lock_irqsave(&tfcp_req->reqlock, flags); fcpreq = tfcp_req->fcpreq; // Access inside lock protection // ... rest of operations ... ``` This change ensures `fcpreq` is only accessed while holding the `reqlock`, preventing race conditions where the pointer could be modified by concurrent abort operations. ### 2. **Improved Abort Handling Logic** The abort path in `fcloop_fcp_abort_recv_work()` was restructured to properly handle the `fcpreq` pointer: ```c // BEFORE: fcpreq = tfcp_req->fcpreq; // Read fcpreq switch (tfcp_req->inistate) { case INI_IO_ABORTED: break; // ... later operations outside lock set fcpreq to NULL // AFTER: switch (tfcp_req->inistate) { case INI_IO_ABORTED: fcpreq = tfcp_req->fcpreq; // Only read when in ABORTED state tfcp_req->fcpreq = NULL; // Clear immediately under lock break; ``` ### 3. **Cleaner Control Flow** The commit also improves the logic in `fcloop_fcp_recv_work()` by having the abort handler take full responsibility for calling `fcloop_call_host_done()` when aborted, rather than duplicating this logic. ## Stable Tree Backport Criteria Assessment ✅ **Fixes Important Bug**: Yes - race conditions and potential use-after-free in critical I/O path ✅ **Small and Contained**: Yes - only 30 lines changed, focused on specific synchronization issue ✅ **Minimal Regression Risk**: Yes - improves existing locking patterns without architectural changes ✅ **Clear Side Effects**: No major side effects - only improves synchronization ✅ **Confined to Subsystem**: Yes - only affects nvmet-fcloop test driver ✅ **Follows Stable Rules**: Yes - critical bugfix with minimal complexity ## Comparison with Historical Commits This commit aligns perfectly with **Similar Commit #4** which was marked **"Backport Status: YES"**: - Both fix locking/synchronization issues in fcloop - Both address race conditions in abort handling - Both are small, focused changes - Both improve existing patterns rather than introducing new architecture The pattern of fcloop locking fixes being suitable for backport is well-established, as seen in the historical reference where similar synchronization improvements were deemed appropriate for stable trees. ## Conclusion This is a textbook example of a stable tree backport candidate: it fixes a genuine race condition bug that could cause data corruption or crashes, uses a minimal and safe approach, and improves the robustness of the existing code without introducing new features or architectural changes. drivers/nvme/target/fcloop.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c index 641201e62c1ba..20becea1ad968 100644 --- a/drivers/nvme/target/fcloop.c +++ b/drivers/nvme/target/fcloop.c @@ -618,12 +618,13 @@ fcloop_fcp_recv_work(struct work_struct *work) { struct fcloop_fcpreq *tfcp_req = container_of(work, struct fcloop_fcpreq, fcp_rcv_work); - struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq; + struct nvmefc_fcp_req *fcpreq; unsigned long flags; int ret = 0; bool aborted = false; spin_lock_irqsave(&tfcp_req->reqlock, flags); + fcpreq = tfcp_req->fcpreq; switch (tfcp_req->inistate) { case INI_IO_START: tfcp_req->inistate = INI_IO_ACTIVE; @@ -638,16 +639,19 @@ fcloop_fcp_recv_work(struct work_struct *work) } spin_unlock_irqrestore(&tfcp_req->reqlock, flags); - if (unlikely(aborted)) - ret = -ECANCELED; - else { - if (likely(!check_for_drop(tfcp_req))) - ret = nvmet_fc_rcv_fcp_req(tfcp_req->tport->targetport, - &tfcp_req->tgt_fcp_req, - fcpreq->cmdaddr, fcpreq->cmdlen); - else - pr_info("%s: dropped command ********\n", __func__); + if (unlikely(aborted)) { + /* the abort handler will call fcloop_call_host_done */ + return; + } + + if (unlikely(check_for_drop(tfcp_req))) { + pr_info("%s: dropped command ********\n", __func__); + return; } + + ret = nvmet_fc_rcv_fcp_req(tfcp_req->tport->targetport, + &tfcp_req->tgt_fcp_req, + fcpreq->cmdaddr, fcpreq->cmdlen); if (ret) fcloop_call_host_done(fcpreq, tfcp_req, ret); } @@ -662,9 +666,10 @@ fcloop_fcp_abort_recv_work(struct work_struct *work) unsigned long flags; spin_lock_irqsave(&tfcp_req->reqlock, flags); - fcpreq = tfcp_req->fcpreq; switch (tfcp_req->inistate) { case INI_IO_ABORTED: + fcpreq = tfcp_req->fcpreq; + tfcp_req->fcpreq = NULL; break; case INI_IO_COMPLETED: completed = true; @@ -686,10 +691,6 @@ fcloop_fcp_abort_recv_work(struct work_struct *work) nvmet_fc_rcv_fcp_abort(tfcp_req->tport->targetport, &tfcp_req->tgt_fcp_req); - spin_lock_irqsave(&tfcp_req->reqlock, flags); - tfcp_req->fcpreq = NULL; - spin_unlock_irqrestore(&tfcp_req->reqlock, flags); - fcloop_call_host_done(fcpreq, tfcp_req, -ECANCELED); /* call_host_done releases reference for abort downcall */ } -- 2.39.5