From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-dy1-f173.google.com (mail-dy1-f173.google.com [74.125.82.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C92AC423A63 for ; Tue, 31 Mar 2026 16:47:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.173 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774975657; cv=none; b=uiBNoRUL5bPdeMFVC8+CHsZG7hP0ciF3BxnXFOx9kwYazNInS3jN3aet+VK9PeO5A6u5TC0DcnZeKOnJH+RyoDYim1SOP683lMrdeY1jYqDBjPybAs/xtun4PGuxdPmMLLUIxPQfceIWg0zO9bucluFDy5Gz2hgrfl9aIPGjBN0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774975657; c=relaxed/simple; bh=2rnKIIOMIxRp+Lnl3uHgdjKepj5mRfY7fz97jm30PrQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ZuDcEc6R6RR2tTAP1F88jvfXIR5r0lKxL1KKxClbyGVj/4Ov1lLYbXFfHlIMMIPfdAEe3GEa/FMxkYEWaBOtqUELfVWwyNiOKLMPuZytV2vCF5waxJMG1e/FXg5E5U4rgsuTsEEOAtRjhfudM0CFexNEZA84k/7xAtAqGBHFYoY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=purestorage.com; spf=fail smtp.mailfrom=purestorage.com; dkim=pass (2048-bit key) header.d=purestorage.com header.i=@purestorage.com header.b=CV60w/3B; arc=none smtp.client-ip=74.125.82.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=purestorage.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=purestorage.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=purestorage.com header.i=@purestorage.com header.b="CV60w/3B" Received: by mail-dy1-f173.google.com with SMTP id 5a478bee46e88-2c156c4a9efso5701211eec.1 for ; Tue, 31 Mar 2026 09:47:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=purestorage.com; s=google2022; t=1774975655; x=1775580455; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=B5AgkFEvCnT6f4BK4/qZpldsBicfsRimh7sp9Id4RbY=; b=CV60w/3BHDrwbVj/ZgZemVxi7Mg9IqqpWqtLzxIIxw9L7if4ez+PUdQWjQCt9pjk0H ixeUWoMnbC+jTNSlElpBhuHpHJg33r9CU28aBXcQjAbmxrgHlWPx3Vzmi0XZ8g6wMv7D FBilDqgJhi0lqu442iFrtlmH50ErDTcd/zO1HdhnnmB5v5xEduE/ijyg6wtZKy2i9h00 wuQ2gDPc4Qax6qVEawfpyhWtJeXra3XEyx7IqZbbyYfKQHFCIJxS1ZT3230+hwpRkKk8 f1MttyDbeSfIH88cs+DZsjA0Q6jFnXL3jUAH8S18t70jz8VmEYtYawl6KHEl4A6Lynb9 eTgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774975655; x=1775580455; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=B5AgkFEvCnT6f4BK4/qZpldsBicfsRimh7sp9Id4RbY=; b=Sv1tk2wS/hyWtqJZGASdro18kMjFV1B1sI+sH4lkboBVN/Jd/HsaQXu6N8C/4djtae vHbmV1LZXvJn4M6YjNxKeuER9XfkiCHVxyL6DfOpPDA8SIVGU/rPwh//1X8pn6IV7LXs uyRlEf0esz98Rk/UoWiqtI0E8qF5pG2/Z71pz+aFbuNHBQeW7QEQjMMysfUrwVUu2wHs cZLVPl/TdNLA9dllF6gbcg7/4C+oMXY2ES+YLP339QXUJq6sZkR9TzbIzdxHdf8c6iHi tXqvQksgzcHcrTGMXc+cuIX4bCf6cfasniSKTzeZSBp/d6EoXsFaWGDzk5gJelgFwHtP eFZA== X-Forwarded-Encrypted: i=1; AJvYcCVMOT5rxp2c+6aRCP4Zd0gc3xO0At7xsAwgaVmRvzmkK9yQRKHD6jqARR+m5dEnRUCWZsCqfgZ2dQu769g=@vger.kernel.org X-Gm-Message-State: AOJu0YzcyFrwQYy0JkXGNdjSeNhCr+kXm3fAUCXh1ZLrvlUQ4TobSdCN 5XGumiSTYW0Jzv3qMXulysz3rwnRFPIsaFLErvC1tuCHIdd2t131ykGKMxnsKgbEScxNUItIdJi xMWE5Hic= X-Gm-Gg: ATEYQzwt6ywBvCkJ/UIc0B5OoXPv2K2QJicM3UNOQEBpxOjboBD0mqNXkZNtQf4Sh2c V+OwenjlziIk1e+LvANaxJP6x3bmD7Sac1dGdGHKSs/mG/UKMBpv99tkqoMlzNpBErHnP3glWhW THuY2p6pPxvcZlunrOzVZWdVIdak6ewPKwaMxz19ijN9kqnONALCedndz8xJCud1vRsy2suHhnI ohtLvAiFGw1k99T86Ir+MHnhIm29mLj0IPjMVZQWtTng5JlM/uDwN7v4JltY2AtlQq4RwTG/aal 6x5zZ5VsHpjTf+Na+WzK4TJzyIX4B09kBaBQK3QcsXmxwyUJRzlhWZsRIIYRi6svZnRZ2pfNGNV eZB3YId9wLUXiKMztabwSYB64Xu+f6l24RwDwEqPiIU+PlWgJbb9adVQWxaRdvcjz7JLGlaeZZC 42WaLA2YH7M2IFx6fYy39J2jByCiKM85dOm6nx5Sx4KQCY X-Received: by 2002:a05:693c:3005:b0:2c1:23d:c728 with SMTP id 5a478bee46e88-2c185f95f4fmr9120915eec.31.1774975654492; Tue, 31 Mar 2026 09:47:34 -0700 (PDT) Received: from medusa.lab.kspace.sh ([208.88.152.253]) by smtp.googlemail.com with UTF8SMTPSA id 5a478bee46e88-2c3c46a0a84sm10278042eec.9.2026.03.31.09.47.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 31 Mar 2026 09:47:34 -0700 (PDT) Date: Tue, 31 Mar 2026 09:47:33 -0700 From: Mohamed Khalfella To: Hannes Reinecke Cc: Justin Tee , Naresh Gottumukkala , Paul Ely , Chaitanya Kulkarni , Jens Axboe , Keith Busch , Sagi Grimberg , James Smart , Aaron Dailey , Randy Jennings , Dhaval Giani , linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v4 08/15] nvme: Implement cross-controller reset recovery Message-ID: <20260331164733.GC2861-mkhalfella@purestorage.com> References: <20260328004518.1729186-1-mkhalfella@purestorage.com> <20260328004518.1729186-9-mkhalfella@purestorage.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Mon 2026-03-30 12:50:24 +0200, Hannes Reinecke wrote: > On 3/28/26 01:43, Mohamed Khalfella wrote: > > A host that has more than one path connecting to an nvme subsystem > > typically has an nvme controller associated with every path. This is > > mostly applicable to nvmeof. If one path goes down, inflight IOs on that > > path should not be retried immediately on another path because this > > could lead to data corruption as described in TP4129. TP8028 defines > > cross-controller reset mechanism that can be used by host to terminate > > IOs on the failed path using one of the remaining healthy paths. Only > > after IOs are terminated, or long enough time passes as defined by > > TP4129, inflight IOs should be retried on another path. Implement core > > cross-controller reset shared logic to be used by the transports. > > > > Signed-off-by: Mohamed Khalfella > > --- > > drivers/nvme/host/constants.c | 1 + > > drivers/nvme/host/core.c | 145 ++++++++++++++++++++++++++++++++++ > > drivers/nvme/host/nvme.h | 9 +++ > > 3 files changed, 155 insertions(+) > > > > diff --git a/drivers/nvme/host/constants.c b/drivers/nvme/host/constants.c > > index dc90df9e13a2..f679efd5110e 100644 > > --- a/drivers/nvme/host/constants.c > > +++ b/drivers/nvme/host/constants.c > > @@ -46,6 +46,7 @@ static const char * const nvme_admin_ops[] = { > > [nvme_admin_virtual_mgmt] = "Virtual Management", > > [nvme_admin_nvme_mi_send] = "NVMe Send MI", > > [nvme_admin_nvme_mi_recv] = "NVMe Receive MI", > > + [nvme_admin_cross_ctrl_reset] = "Cross Controller Reset", > > [nvme_admin_dbbuf] = "Doorbell Buffer Config", > > [nvme_admin_format_nvm] = "Format NVM", > > [nvme_admin_security_send] = "Security Send", > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > index 824a1193bec8..5603ae36444f 100644 > > --- a/drivers/nvme/host/core.c > > +++ b/drivers/nvme/host/core.c > > @@ -554,6 +554,150 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl) > > } > > EXPORT_SYMBOL_GPL(nvme_cancel_admin_tagset); > > > > +static struct nvme_ctrl *nvme_find_ctrl_ccr(struct nvme_ctrl *ictrl, > > + u32 min_cntlid) > > +{ > > + struct nvme_subsystem *subsys = ictrl->subsys; > > + struct nvme_ctrl *ctrl, *sctrl = NULL; > > + unsigned long flags; > > + > > + mutex_lock(&nvme_subsystems_lock); > > + list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) { > > + if (ctrl->cntlid < min_cntlid) > > + continue; > > + > > + if (atomic_dec_if_positive(&ctrl->ccr_limit) < 0) > > + continue; > > + > > + spin_lock_irqsave(&ctrl->lock, flags); > > + if (ctrl->state != NVME_CTRL_LIVE) { > > + spin_unlock_irqrestore(&ctrl->lock, flags); > > + atomic_inc(&ctrl->ccr_limit); > > + continue; > > + } > > + > > + /* > > + * We got a good candidate source controller that is locked and > > + * LIVE. However, no guarantee ctrl will not be deleted after > > + * ctrl->lock is released. Get a ref of both ctrl and admin_q > > + * so they do not disappear until we are done with them. > > + */ > > + WARN_ON_ONCE(!blk_get_queue(ctrl->admin_q)); > > + nvme_get_ctrl(ctrl); > > + spin_unlock_irqrestore(&ctrl->lock, flags); > > + sctrl = ctrl; > > + break; > > + } > > + mutex_unlock(&nvme_subsystems_lock); > > + return sctrl; > > +} > > + > > +static void nvme_put_ctrl_ccr(struct nvme_ctrl *sctrl) > > +{ > > + atomic_inc(&sctrl->ccr_limit); > > + blk_put_queue(sctrl->admin_q); > > + nvme_put_ctrl(sctrl); > > +} > > + > > +static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl, > > + unsigned long deadline) > > +{ > > + struct nvme_ccr_entry ccr = { }; > > + union nvme_result res = { 0 }; > > + struct nvme_command c = { }; > > + unsigned long flags, now, tmo = 0; > > + bool completed = false; > > + int ret = 0; > > + u32 result; > > + > > + init_completion(&ccr.complete); > > + ccr.ictrl = ictrl; > > + > > + spin_lock_irqsave(&sctrl->lock, flags); > > + list_add_tail(&ccr.list, &sctrl->ccr_list); > > + spin_unlock_irqrestore(&sctrl->lock, flags); > > + > > + c.ccr.opcode = nvme_admin_cross_ctrl_reset; > > + c.ccr.ciu = ictrl->ciu; > > + c.ccr.icid = cpu_to_le16(ictrl->cntlid); > > + c.ccr.cirn = cpu_to_le64(ictrl->cirn); > > + ret = __nvme_submit_sync_cmd(sctrl->admin_q, &c, &res, > > + NULL, 0, NVME_QID_ANY, 0); > > + if (ret) { > > + ret = -EIO; > > + goto out; > > + } > > + > > + result = le32_to_cpu(res.u32); > > + if (result & 0x01) /* Immediate Reset Successful */ > > + goto out; > > + > > + now = jiffies; > > + if (time_before(now, deadline)) > > + tmo = min_t(unsigned long, > > + secs_to_jiffies(ictrl->kato), deadline - now); > > + > > + if (!wait_for_completion_timeout(&ccr.complete, tmo)) { > > + ret = -ETIMEDOUT; > > + goto out; > > + } > > + > > + completed = true; > > + > > +out: > > + spin_lock_irqsave(&sctrl->lock, flags); > > + list_del(&ccr.list); > > + spin_unlock_irqrestore(&sctrl->lock, flags); > > + if (completed) { > > + if (ccr.ccrs == NVME_CCR_STATUS_SUCCESS) > > + return 0; > > + return -EREMOTEIO; > > + } > > + return ret; > > +} > > + > > +int nvme_fence_ctrl(struct nvme_ctrl *ictrl) > > +{ > > + unsigned long deadline, timeout; > > + struct nvme_ctrl *sctrl; > > + u32 min_cntlid = 0; > > + int ret; > > + > > + timeout = nvme_fence_timeout_ms(ictrl); > > + dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout); > > + > > + deadline = jiffies + msecs_to_jiffies(timeout); > > + while (time_is_after_jiffies(deadline)) { > > + sctrl = nvme_find_ctrl_ccr(ictrl, min_cntlid); > > + if (!sctrl) { > > + dev_dbg(ictrl->device, > > + "failed to find source controller\n"); > > + return -EIO; > > + } > > + > > + ret = nvme_issue_wait_ccr(sctrl, ictrl, deadline); > > + if (!ret) { > > + dev_info(ictrl->device, "CCR succeeded using %s\n", > > + dev_name(sctrl->device)); > > + nvme_put_ctrl_ccr(sctrl); > > + return 0; > > + } > > + > > + min_cntlid = sctrl->cntlid + 1; > > + nvme_put_ctrl_ccr(sctrl); > > + > > + if (ret == -EIO) /* CCR command failed */ > > + continue; > > + > > + /* CCR operation failed or timed out */ > > + return ret; > > + } > > + > > + dev_info(ictrl->device, "CCR operation timeout\n"); > > + return -ETIMEDOUT; > > +} > > Please restructure the loop. > Having a comment 'CCR operation failed or timed out', > returning a status, and then have a comment > 'CCR operation timeout' _after_ the return is confusing. I can change /* CCR operation failed or timed out */ to something like /* * Source controller accepted CCR command but CCR operation * timed out or failed. Retrying another path is not likely * to succeed, return an error. */ And change the log line "CCR operation timeout\n" outside the while loop to "fencing timedout\n". Will this help? > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Kernel Storage Architect > hare@suse.de +49 911 74053 688 > SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg > HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich