From: Sasha Levin
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Alistair Francis, Hannes Reinecke, Christoph Hellwig, Keith Busch,
    Sasha Levin, sagi@grimberg.me, linux-nvme@lists.infradead.org
Subject: [PATCH AUTOSEL 6.17-6.1] nvme: Use non zero KATO for persistent discovery connections
Date: Thu, 9 Oct 2025 11:55:07 -0400
Message-ID: <20251009155752.773732-41-sashal@kernel.org>
In-Reply-To: <20251009155752.773732-1-sashal@kernel.org>
References: <20251009155752.773732-1-sashal@kernel.org>
X-stable: review
X-stable-base: Linux 6.17.1

From: Alistair Francis

[ Upstream commit 2e482655019ab6fcfe8865b62432c6d03f0b5f80 ]

The NVMe Base Specification 2.1 states that:

"""
A host requests an explicit persistent connection ... by specifying a
non-zero Keep Alive Timer value in the Connect command.
"""

As such if we are starting a persistent connection to a discovery
controller and the KATO is currently 0 we need to update KATO to a non
zero value to avoid continuous timeouts on the target.

Signed-off-by: Alistair Francis
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Signed-off-by: Keith Busch
Signed-off-by: Sasha Levin
---

LLM Generated explanations, may be completely bogus:

YES

- What it fixes
  - Aligns Linux host behavior with NVMe Base Spec 2.1 requirement that
    a host must specify a non-zero Keep Alive Timer in the Connect
    command to request an explicit persistent discovery connection. The
    previous behavior left `KATO=0` for discovery controllers even when
    the connection became persistent, causing targets to time out and
    drop connections.

- Code change and behavior
  - In `nvme_start_ctrl()` (`drivers/nvme/host/core.c:4998`), on
    reconnect for discovery controllers
    (`test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags)` and
    `nvme_discovery_ctrl(ctrl)`), the patch:
    - Checks if `ctrl->kato` is zero.
    - If zero, calls `nvme_stop_keep_alive(ctrl)`, sets
      `ctrl->kato = NVME_DEFAULT_KATO`, then
      `nvme_start_keep_alive(ctrl)`.
    - Still sends the rediscover uevent:
      `nvme_change_uevent(ctrl, "NVME_EVENT=rediscover")`.
  - This immediately starts keep-alive commands after a persistent
    discovery reconnect and ensures subsequent Connect commands
    advertise non-zero KATO.

- Why this is correct and effective
  - Immediate effect: Even if the just-completed Connect used `kato=0`,
    forcing a non-zero `kato` here starts the host keep-alive work right
    away, avoiding target keep-alive timeouts after a persistent
    reconnect.
  - Future connections: `nvmf_connect_cmd_prep()` sets Connect’s KATO
    from `ctrl->kato` (`drivers/nvme/host/fabrics.c:426`). With this
    change, the next reconnection will send a non-zero KATO in the
    Connect command as the spec requires.
  - Safe sequence: `nvme_stop_keep_alive()` is a no-op when `kato==0`
    (`drivers/nvme/host/core.c:1412`), then `ctrl->kato` is set to
    `NVME_DEFAULT_KATO` (`drivers/nvme/host/nvme.h:31`), and
    `nvme_start_keep_alive()` only schedules work when `kato!=0`
    (`drivers/nvme/host/core.c:1404`).

- Scope and risk
  - Scope-limited: Only affects discovery controllers on reconnect
    (persistent discovery) and only when `kato==0`. No effect on:
    - Non-discovery (I/O) controllers (they already default to non-zero
      KATO).
    - Discovery controllers where userspace explicitly set a non-zero
      KATO.
  - No architectural changes; uses existing helpers and flags; no ABI
    change.
  - Regression risk is low. Prior history already introduced persistent
    discovery semantics and a sysfs `kato` attribute, and transports
    already honor `ctrl->kato` for Connect. This change simply fills a
    corner case where `kato` remained zero in a persistent discovery
    reconnect.

- Historical context and consistency
  - 2018: We explicitly avoided KA to discovery controllers per early
    spec constraints.
  - 2021: The code was adjusted so discovery controllers default to
    `kato=0`, while I/O controllers default to `NVME_DEFAULT_KATO`
    (commit 32feb6de). Persistent discovery connections were intended to
    have a positive KATO (via options), but implicit persistent
    reconnects could still have `kato=0`.
  - 2022: Added rediscover uevent for persistent discovery reconnects
    (f46ef9e87) and `NVME_CTRL_STARTED_ONCE` usage.
  - This patch completes the intent by ensuring persistent discovery
    reconnects run with non-zero KATO automatically, preventing target
    timeouts and complying with spec 2.1.

- Stable backport suitability
  - Fixes a user-visible bug (target timeouts and unstable discovery
    connectivity on persistent reconnects).
  - Small, self-contained change confined to `nvme_start_ctrl()` in
    `drivers/nvme/host/core.c`.
  - No new features or interfaces; minimal risk of regression; behavior
    matches spec and existing design.
  - Dependencies exist in stable trees that already have persistent
    discovery support and the `NVME_CTRL_STARTED_ONCE` mechanism. For
    older branches that still use `test_and_set_bit` in the rediscover
    path, the logic remains valid within that conditional block.

- Side notes for backporters
  - Ensure the tree has `NVME_CTRL_STARTED_ONCE`,
    `nvme_discovery_ctrl()`, and the rediscover uevent path in
    `nvme_start_ctrl()`. If an older stable branch uses
    `test_and_set_bit` instead of `test_bit`, place the new KATO block
    inside that existing conditional.
  - `nvmf_connect_cmd_prep()` must already populate Connect’s `kato`
    from `ctrl->kato` (`drivers/nvme/host/fabrics.c:426`) so that future
    reconnects benefit from the updated `kato`.

 drivers/nvme/host/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6b7493934535a..5714d49932822 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4990,8 +4990,14 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 	 * checking that they started once before, hence are reconnecting back.
 	 */
 	if (test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
-	    nvme_discovery_ctrl(ctrl))
+	    nvme_discovery_ctrl(ctrl)) {
+		if (!ctrl->kato) {
+			nvme_stop_keep_alive(ctrl);
+			ctrl->kato = NVME_DEFAULT_KATO;
+			nvme_start_keep_alive(ctrl);
+		}
 		nvme_change_uevent(ctrl, "NVME_EVENT=rediscover");
+	}
 
 	if (ctrl->queue_count > 1) {
 		nvme_queue_scan(ctrl);
-- 
2.51.0
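
For anyone who wants to reason about the hunk without a kernel tree at
hand, here is a minimal userspace sketch of the stop/set/start sequence.
It is a model, not the driver code: `struct model_ctrl`, the `model_*`
helpers and `MODEL_DEFAULT_KATO` are hypothetical names invented for
illustration, the value 5 is an assumption about `NVME_DEFAULT_KATO`,
and the boolean `ka_running` merely stands in for the scheduled
keep-alive work. The early returns mirror the `kato == 0` no-op
behavior of the real helpers as described in the explanation.

```c
#include <stdbool.h>

/* Assumed to mirror NVME_DEFAULT_KATO (seconds); verify against nvme.h. */
#define MODEL_DEFAULT_KATO 5

/* Hypothetical stand-in for the few struct nvme_ctrl fields involved. */
struct model_ctrl {
	unsigned int kato;	/* keep-alive timeout in seconds, 0 = disabled */
	bool started_once;	/* models the NVME_CTRL_STARTED_ONCE flag */
	bool discovery;		/* models nvme_discovery_ctrl() returning true */
	bool ka_running;	/* models "keep-alive work is scheduled" */
};

/* Models nvme_stop_keep_alive(): a no-op while kato is zero. */
void model_stop_keep_alive(struct model_ctrl *c)
{
	if (c->kato == 0)
		return;
	c->ka_running = false;
}

/* Models nvme_start_keep_alive(): schedules work only if kato is non-zero. */
void model_start_keep_alive(struct model_ctrl *c)
{
	if (c->kato == 0)
		return;
	c->ka_running = true;
}

/*
 * Models only the patched conditional in nvme_start_ctrl().
 * Returns true when the rediscover uevent would be sent.
 */
bool model_start_ctrl(struct model_ctrl *c)
{
	if (c->started_once && c->discovery) {
		if (!c->kato) {
			model_stop_keep_alive(c);	/* no-op: kato is still 0 */
			c->kato = MODEL_DEFAULT_KATO;
			model_start_keep_alive(c);	/* now schedules work */
		}
		return true;
	}
	return false;
}
```

In this model a reconnecting discovery controller with `kato == 0` ends
up with `kato == MODEL_DEFAULT_KATO` and `ka_running == true`, while an
I/O controller or a discovery controller with a user-supplied KATO is
left untouched.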