Date: Tue, 7 Apr 2026 13:46:53 -0700
From: Mohamed Khalfella
To: Hannes Reinecke
Cc: Justin Tee, Naresh Gottumukkala, Paul Ely, Chaitanya Kulkarni,
 Jens Axboe, Keith Busch, Sagi Grimberg, James Smart, Aaron Dailey,
 Randy Jennings, Dhaval Giani, linux-nvme@lists.infradead.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 08/15] nvme: Implement cross-controller reset recovery
Message-ID: <20260407204653.GG2861-mkhalfella@purestorage.com>
References: <20260328004518.1729186-1-mkhalfella@purestorage.com>
 <20260328004518.1729186-9-mkhalfella@purestorage.com>
 <20260331164733.GC2861-mkhalfella@purestorage.com>
 <5d3fecf4-9101-4028-858d-bfbbccf3d8d3@suse.de>
In-Reply-To: <5d3fecf4-9101-4028-858d-bfbbccf3d8d3@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 2026-04-07 07:39:09 +0200, Hannes Reinecke wrote:
> On 3/31/26 18:47, Mohamed Khalfella wrote:
> > On Mon 2026-03-30 12:50:24 +0200, Hannes Reinecke wrote:
> >> On 3/28/26 01:43, Mohamed Khalfella wrote:
> >>> A host that has more than one path connecting to an nvme subsystem
> >>> typically has an nvme controller associated with every path. This is
> >>> mostly applicable to nvmeof.
> >>> If one path goes down, inflight IOs on that
> >>> path should not be retried immediately on another path because this
> >>> could lead to data corruption as described in TP4129. TP8028 defines
> >>> cross-controller reset mechanism that can be used by host to terminate
> >>> IOs on the failed path using one of the remaining healthy paths. Only
> >>> after IOs are terminated, or long enough time passes as defined by
> >>> TP4129, inflight IOs should be retried on another path. Implement core
> >>> cross-controller reset shared logic to be used by the transports.
> >>>
> >>> Signed-off-by: Mohamed Khalfella
> >>> ---
> >>>   drivers/nvme/host/constants.c |   1 +
> >>>   drivers/nvme/host/core.c      | 145 ++++++++++++++++++++++++++++++++++
> >>>   drivers/nvme/host/nvme.h      |   9 +++
> >>>   3 files changed, 155 insertions(+)
> >>>
> [ .. ]
> >>> +
> >>> +int nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> >>> +{
> >>> +	unsigned long deadline, timeout;
> >>> +	struct nvme_ctrl *sctrl;
> >>> +	u32 min_cntlid = 0;
> >>> +	int ret;
> >>> +
> >>> +	timeout = nvme_fence_timeout_ms(ictrl);
> >>> +	dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
> >>> +
> >>> +	deadline = jiffies + msecs_to_jiffies(timeout);
> >>> +	while (time_is_after_jiffies(deadline)) {
> >>> +		sctrl = nvme_find_ctrl_ccr(ictrl, min_cntlid);
> >>> +		if (!sctrl) {
> >>> +			dev_dbg(ictrl->device,
> >>> +				"failed to find source controller\n");
> >>> +			return -EIO;
> >>> +		}
> >>> +
> >>> +		ret = nvme_issue_wait_ccr(sctrl, ictrl, deadline);
> >>> +		if (!ret) {
> >>> +			dev_info(ictrl->device, "CCR succeeded using %s\n",
> >>> +				 dev_name(sctrl->device));
> >>> +			nvme_put_ctrl_ccr(sctrl);
> >>> +			return 0;
> >>> +		}
> >>> +
> >>> +		min_cntlid = sctrl->cntlid + 1;
> >>> +		nvme_put_ctrl_ccr(sctrl);
> >>> +
> >>> +		if (ret == -EIO) /* CCR command failed */
> >>> +			continue;
> >>> +
> >>> +		/* CCR operation failed or timed out */
> >>> +		return ret;
> >>> +	}
> >>> +
> >>> +	dev_info(ictrl->device, "CCR operation timeout\n");
> >>> +	return -ETIMEDOUT;
> >>> +}
> >>
> >> Please restructure the loop.
> >> Having a comment 'CCR operation failed or timed out',
> >> returning a status, and then have a comment
> >> 'CCR operation timeout' _after_ the return is confusing.
> >
> > I can change /* CCR operation failed or timed out */ to something like
> >
> > /*
> >  * Source controller accepted CCR command but CCR operation
> >  * timed out or failed. Retrying another path is not likely
> >  * to succeed, return an error.
> >  */
> >
> > And change the log line "CCR operation timeout\n" outside the while
> > loop to "fencing timedout\n".
> >
> > Will this help?
> >
> Yes, thank you.

Will do that.

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                  Kernel Storage Architect
> hare@suse.de                                +49 911 74053 688
> SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
> HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
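
For illustration only, the three exits of the fencing loop discussed in
this thread can be modelled in a few lines of userspace C. This is a
sketch under stated assumptions, not the patch itself: struct mock_ctrl,
find_ctrl() and fence_ctrl() are hypothetical names, the "budget"
counter stands in for the jiffies deadline, and each mock controller
simply carries the result that issuing CCR through it would return.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a candidate source controller. */
struct mock_ctrl {
	unsigned int cntlid;	/* controller ID */
	int ccr_result;		/* what issuing CCR through it returns */
};

/*
 * Return the first controller with cntlid >= min_cntlid, or NULL.
 * Assumes the array is sorted by cntlid.
 */
static const struct mock_ctrl *find_ctrl(const struct mock_ctrl *ctrls,
					 size_t n, unsigned int min_cntlid)
{
	for (size_t i = 0; i < n; i++)
		if (ctrls[i].cntlid >= min_cntlid)
			return &ctrls[i];
	return NULL;
}

/*
 * Model of the fencing loop. "budget" is the number of attempts that
 * fit before the deadline (an assumption replacing jiffies).
 */
static int fence_ctrl(const struct mock_ctrl *ctrls, size_t n,
		      unsigned int budget)
{
	unsigned int min_cntlid = 0;

	while (budget--) {
		const struct mock_ctrl *sctrl;
		int ret;

		sctrl = find_ctrl(ctrls, n, min_cntlid);
		if (!sctrl)
			return -EIO;	/* no source controller left */

		ret = sctrl->ccr_result;
		if (!ret)
			return 0;	/* CCR succeeded */

		min_cntlid = sctrl->cntlid + 1;
		if (ret == -EIO)	/* CCR command failed, try next path */
			continue;

		/*
		 * Source controller accepted the CCR command but the CCR
		 * operation timed out or failed. Retrying another path is
		 * not likely to succeed, return an error.
		 */
		return ret;
	}

	return -ETIMEDOUT;	/* fencing timed out */
}
```

In this model only -EIO from the command moves on to the next source
controller. Any other error ends the attempt immediately, and
-ETIMEDOUT is returned only when the deadline expires while candidates
are still being tried.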