From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 17 Feb 2026 10:35:55 -0800
From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: Hannes Reinecke
Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	James Smart, Aaron Dailey, Randy Jennings, Dhaval Giani,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 08/21] nvme: Implement cross-controller reset recovery
Message-ID: <20260217183555.GF3435530-mkhalfella@purestorage.com>
References: <20260214042753.4073668-1-mkhalfella@purestorage.com>
	<20260214042753.4073668-9-mkhalfella@purestorage.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon 2026-02-16 13:41:39 +0100, Hannes Reinecke wrote:
> On 2/14/26 05:25, Mohamed Khalfella wrote:
> > A host that has more than one path connecting to an nvme subsystem
> > typically has an nvme controller associated with every path. This is
> > mostly applicable to nvmeof. If one path goes down, inflight IOs on that
> > path should not be retried immediately on another path because this
> > could lead to data corruption as described in TP4129. TP8028 defines
> > cross-controller reset mechanism that can be used by host to terminate
> > IOs on the failed path using one of the remaining healthy paths.
> > Only after IOs are terminated, or long enough time passes as defined by
> > TP4129, inflight IOs should be retried on another path. Implement core
> > cross-controller reset shared logic to be used by the transports.
> > 
> > Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> > ---
> >  drivers/nvme/host/constants.c |   1 +
> >  drivers/nvme/host/core.c      | 141 ++++++++++++++++++++++++++++++++++
> >  drivers/nvme/host/nvme.h      |   9 +++
> >  3 files changed, 151 insertions(+)
> > 
> > diff --git a/drivers/nvme/host/constants.c b/drivers/nvme/host/constants.c
> > index dc90df9e13a2..f679efd5110e 100644
> > --- a/drivers/nvme/host/constants.c
> > +++ b/drivers/nvme/host/constants.c
> > @@ -46,6 +46,7 @@ static const char * const nvme_admin_ops[] = {
> >  	[nvme_admin_virtual_mgmt] = "Virtual Management",
> >  	[nvme_admin_nvme_mi_send] = "NVMe Send MI",
> >  	[nvme_admin_nvme_mi_recv] = "NVMe Receive MI",
> > +	[nvme_admin_cross_ctrl_reset] = "Cross Controller Reset",
> >  	[nvme_admin_dbbuf] = "Doorbell Buffer Config",
> >  	[nvme_admin_format_nvm] = "Format NVM",
> >  	[nvme_admin_security_send] = "Security Send",
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 231d402e9bfb..765b1524b3ed 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -554,6 +554,146 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
> >  }
> >  EXPORT_SYMBOL_GPL(nvme_cancel_admin_tagset);
> >  
> > +static struct nvme_ctrl *nvme_find_ctrl_ccr(struct nvme_ctrl *ictrl,
> > +					    u32 min_cntlid)
> > +{
> > +	struct nvme_subsystem *subsys = ictrl->subsys;
> > +	struct nvme_ctrl *ctrl, *sctrl = NULL;
> > +	unsigned long flags;
> > +
> > +	mutex_lock(&nvme_subsystems_lock);
> > +	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
> > +		if (ctrl->cntlid < min_cntlid)
> > +			continue;
> > +
> > +		if (atomic_dec_if_positive(&ctrl->ccr_limit) < 0)
> > +			continue;
> > +
> > +		spin_lock_irqsave(&ctrl->lock, flags);
> > +		if (ctrl->state != NVME_CTRL_LIVE) {
> > +			spin_unlock_irqrestore(&ctrl->lock, flags);
> > +			atomic_inc(&ctrl->ccr_limit);
> > +			continue;
> > +		}
> > +
> > +		/*
> > +		 * We got a good candidate source controller that is locked and
> > +		 * LIVE. However, no guarantee ctrl will not be deleted after
> > +		 * ctrl->lock is released. Get a ref of both ctrl and admin_q
> > +		 * so they do not disappear until we are done with them.
> > +		 */
> > +		WARN_ON_ONCE(!blk_get_queue(ctrl->admin_q));
> > +		nvme_get_ctrl(ctrl);
> > +		spin_unlock_irqrestore(&ctrl->lock, flags);
> > +		sctrl = ctrl;
> > +		break;
> > +	}
> > +	mutex_unlock(&nvme_subsystems_lock);
> > +	return sctrl;
> > +}
> > +
> > +static void nvme_put_ctrl_ccr(struct nvme_ctrl *sctrl)
> > +{
> > +	atomic_inc(&sctrl->ccr_limit);
> > +	blk_put_queue(sctrl->admin_q);
> > +	nvme_put_ctrl(sctrl);
> > +}
> > +
> > +static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
> > +{
> > +	struct nvme_ccr_entry ccr = { };
> > +	union nvme_result res = { 0 };
> > +	struct nvme_command c = { };
> > +	unsigned long flags, tmo;
> > +	bool completed = false;
> > +	int ret = 0;
> > +	u32 result;
> > +
> > +	init_completion(&ccr.complete);
> > +	ccr.ictrl = ictrl;
> > +
> > +	spin_lock_irqsave(&sctrl->lock, flags);
> > +	list_add_tail(&ccr.list, &sctrl->ccr_list);
> > +	spin_unlock_irqrestore(&sctrl->lock, flags);
> > +
> > +	c.ccr.opcode = nvme_admin_cross_ctrl_reset;
> > +	c.ccr.ciu = ictrl->ciu;
> > +	c.ccr.icid = cpu_to_le16(ictrl->cntlid);
> > +	c.ccr.cirn = cpu_to_le64(ictrl->cirn);
> > +	ret = __nvme_submit_sync_cmd(sctrl->admin_q, &c, &res,
> > +				     NULL, 0, NVME_QID_ANY, 0);
> > +	if (ret) {
> > +		ret = -EIO;
> > +		goto out;
> > +	}
> > +
> > +	result = le32_to_cpu(res.u32);
> > +	if (result & 0x01) /* Immediate Reset Successful */
> > +		goto out;
> > +
> > +	tmo = secs_to_jiffies(ictrl->kato);
> > +	if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> > +		ret = -ETIMEDOUT;
> > +		goto out;
> > +	}
> > +
> That will be tricky.
> The 'ccr' command will be sent with the default
> command queue timeout which is decoupled from KATO.
> So you really should set the command timeout for the 'ccr' command
> to ctrl->kato to ensure it'll be terminated correctly.
> 

Agreed. The timeout for the CCR request should be ctrl->kato, just like
what we do for the keep-alive request. The easiest way IMO to do this is
to extend __nvme_submit_sync_cmd() to take a request timeout. I do not
want to make this change in this patchset. Is it okay if I make this
change after this patchset gets merged?
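
For concreteness, the rough shape I have in mind is below. This is an
untested sketch only: the extra `timeout` argument (0 meaning "keep the
queue default") and the plumbing inside __nvme_submit_sync_cmd() are
illustrative, not a final interface, and every existing caller would need
to be updated to pass 0.

```c
/*
 * Untested sketch: let __nvme_submit_sync_cmd() accept an explicit request
 * timeout, the same way the keep-alive path pins its request timeout to
 * KATO rather than relying on the admin queue default.
 */
int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
			   union nvme_result *result, void *buffer,
			   unsigned bufflen, int qid, int flags,
			   unsigned int timeout)
{
	...
	if (timeout)
		req->timeout = timeout;	/* override the queue default */
	...
}
```

nvme_issue_wait_ccr() would then cap the CCR command at KATO:

```c
	ret = __nvme_submit_sync_cmd(sctrl->admin_q, &c, &res, NULL, 0,
				     NVME_QID_ANY, 0,
				     secs_to_jiffies(ictrl->kato));
```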