From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 17 Feb 2026 10:35:55 -0800
From: Mohamed Khalfella <mkhalfella@purestorage.com>
To: Hannes Reinecke
Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
	Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	James Smart, Aaron Dailey, Randy Jennings, Dhaval Giani,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 08/21] nvme: Implement cross-controller reset recovery
Message-ID: <20260217183555.GF3435530-mkhalfella@purestorage.com>
References: <20260214042753.4073668-1-mkhalfella@purestorage.com>
	<20260214042753.4073668-9-mkhalfella@purestorage.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon 2026-02-16 13:41:39 +0100, Hannes Reinecke wrote:
> On 2/14/26 05:25, Mohamed Khalfella wrote:
> > A host that has more than one path connecting to an nvme subsystem
> > typically has an nvme controller associated with every path. This is
> > mostly applicable to nvmeof. If one path goes down, inflight IOs on that
> > path should not be retried immediately on another path because this
> > could lead to data corruption as described in TP4129. TP8028 defines
> > cross-controller reset mechanism that can be used by host to terminate
> > IOs on the failed path using one of the remaining healthy paths.
> > Only after IOs are terminated, or long enough time passes as defined by
> > TP4129, inflight IOs should be retried on another path. Implement core
> > cross-controller reset shared logic to be used by the transports.
> > 
> > Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> > ---
> >  drivers/nvme/host/constants.c |   1 +
> >  drivers/nvme/host/core.c      | 141 ++++++++++++++++++++++++++++++++++
> >  drivers/nvme/host/nvme.h      |   9 +++
> >  3 files changed, 151 insertions(+)
> > 
> > diff --git a/drivers/nvme/host/constants.c b/drivers/nvme/host/constants.c
> > index dc90df9e13a2..f679efd5110e 100644
> > --- a/drivers/nvme/host/constants.c
> > +++ b/drivers/nvme/host/constants.c
> > @@ -46,6 +46,7 @@ static const char * const nvme_admin_ops[] = {
> >  	[nvme_admin_virtual_mgmt] = "Virtual Management",
> >  	[nvme_admin_nvme_mi_send] = "NVMe Send MI",
> >  	[nvme_admin_nvme_mi_recv] = "NVMe Receive MI",
> > +	[nvme_admin_cross_ctrl_reset] = "Cross Controller Reset",
> >  	[nvme_admin_dbbuf] = "Doorbell Buffer Config",
> >  	[nvme_admin_format_nvm] = "Format NVM",
> >  	[nvme_admin_security_send] = "Security Send",
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 231d402e9bfb..765b1524b3ed 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -554,6 +554,146 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
> >  }
> >  EXPORT_SYMBOL_GPL(nvme_cancel_admin_tagset);
> >  
> > +static struct nvme_ctrl *nvme_find_ctrl_ccr(struct nvme_ctrl *ictrl,
> > +					    u32 min_cntlid)
> > +{
> > +	struct nvme_subsystem *subsys = ictrl->subsys;
> > +	struct nvme_ctrl *ctrl, *sctrl = NULL;
> > +	unsigned long flags;
> > +
> > +	mutex_lock(&nvme_subsystems_lock);
> > +	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
> > +		if (ctrl->cntlid < min_cntlid)
> > +			continue;
> > +
> > +		if (atomic_dec_if_positive(&ctrl->ccr_limit) < 0)
> > +			continue;
> > +
> > +		spin_lock_irqsave(&ctrl->lock, flags);
> > +		if (ctrl->state != NVME_CTRL_LIVE) {
> > +			spin_unlock_irqrestore(&ctrl->lock, flags);
> > +			atomic_inc(&ctrl->ccr_limit);
> > +			continue;
> > +		}
> > +
> > +		/*
> > +		 * We got a good candidate source controller that is locked and
> > +		 * LIVE. However, no guarantee ctrl will not be deleted after
> > +		 * ctrl->lock is released. Get a ref of both ctrl and admin_q
> > +		 * so they do not disappear until we are done with them.
> > +		 */
> > +		WARN_ON_ONCE(!blk_get_queue(ctrl->admin_q));
> > +		nvme_get_ctrl(ctrl);
> > +		spin_unlock_irqrestore(&ctrl->lock, flags);
> > +		sctrl = ctrl;
> > +		break;
> > +	}
> > +	mutex_unlock(&nvme_subsystems_lock);
> > +	return sctrl;
> > +}
> > +
> > +static void nvme_put_ctrl_ccr(struct nvme_ctrl *sctrl)
> > +{
> > +	atomic_inc(&sctrl->ccr_limit);
> > +	blk_put_queue(sctrl->admin_q);
> > +	nvme_put_ctrl(sctrl);
> > +}
> > +
> > +static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
> > +{
> > +	struct nvme_ccr_entry ccr = { };
> > +	union nvme_result res = { 0 };
> > +	struct nvme_command c = { };
> > +	unsigned long flags, tmo;
> > +	bool completed = false;
> > +	int ret = 0;
> > +	u32 result;
> > +
> > +	init_completion(&ccr.complete);
> > +	ccr.ictrl = ictrl;
> > +
> > +	spin_lock_irqsave(&sctrl->lock, flags);
> > +	list_add_tail(&ccr.list, &sctrl->ccr_list);
> > +	spin_unlock_irqrestore(&sctrl->lock, flags);
> > +
> > +	c.ccr.opcode = nvme_admin_cross_ctrl_reset;
> > +	c.ccr.ciu = ictrl->ciu;
> > +	c.ccr.icid = cpu_to_le16(ictrl->cntlid);
> > +	c.ccr.cirn = cpu_to_le64(ictrl->cirn);
> > +	ret = __nvme_submit_sync_cmd(sctrl->admin_q, &c, &res,
> > +				     NULL, 0, NVME_QID_ANY, 0);
> > +	if (ret) {
> > +		ret = -EIO;
> > +		goto out;
> > +	}
> > +
> > +	result = le32_to_cpu(res.u32);
> > +	if (result & 0x01) /* Immediate Reset Successful */
> > +		goto out;
> > +
> > +	tmo = secs_to_jiffies(ictrl->kato);
> > +	if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> > +		ret = -ETIMEDOUT;
> > +		goto out;
> > +	}
> > +
> That will be tricky.
> The 'ccr' command will be sent with the default
> command queue timeout which is decoupled from KATO.
> So you really should set the command timeout for the 'ccr' command
> to ctrl->kato to ensure it'll be terminated correctly.
> 

Agreed. The timeout for the CCR request should be ctrl->kato, just like
what we do for the keep-alive request. The easiest way IMO to do this is
to extend __nvme_submit_sync_cmd() to take a request timeout. I do not
want to make this change in this patchset. Is it okay if I make this
change after this patchset gets merged?
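
For concreteness, the rough shape I have in mind is below. This is an
untested sketch only: the extra `timeout` argument (0 meaning "keep the
queue default") and the plumbing inside __nvme_submit_sync_cmd() are
illustrative, not a final interface, and every existing caller would need
to be updated to pass 0.

```c
/*
 * Untested sketch: let __nvme_submit_sync_cmd() accept an explicit request
 * timeout, the same way the keep-alive path pins its request timeout to
 * KATO rather than relying on the admin queue default.
 */
int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
			   union nvme_result *result, void *buffer,
			   unsigned bufflen, int qid, int flags,
			   unsigned int timeout)
{
	...
	if (timeout)
		req->timeout = timeout;	/* override the queue default */
	...
}
```

nvme_issue_wait_ccr() would then cap the CCR command at KATO:

```c
	ret = __nvme_submit_sync_cmd(sctrl->admin_q, &c, &res, NULL, 0,
				     NVME_QID_ANY, 0,
				     secs_to_jiffies(ictrl->kato));
```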