From: Hannes Reinecke <hare@suse.de>
To: Nilay Shroff <nilay@linux.ibm.com>,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Cc: hch@lst.de, kbusch@kernel.org, sagi@grimberg.me,
jmeneghi@redhat.com, axboe@kernel.dk, martin.petersen@oracle.com,
gjoyce@ibm.com
Subject: Re: [RFC PATCHv2 2/3] nvme: introduce multipath_head_always module param
Date: Tue, 29 Apr 2025 09:01:53 +0200
Message-ID: <10ba7fa9-15e9-48b9-a8ac-e7c3982a211c@suse.de>
In-Reply-To: <cdbd9209-420e-4c1b-a0f4-30b2c7e9cfb3@linux.ibm.com>
On 4/29/25 08:24, Nilay Shroff wrote:
>
>
> On 4/29/25 11:19 AM, Hannes Reinecke wrote:
>> On 4/28/25 09:39, Nilay Shroff wrote:
>>>
>>>
>>> On 4/28/25 12:27 PM, Hannes Reinecke wrote:
>>>> On 4/25/25 12:33, Nilay Shroff wrote:
>>>>> Currently, a multipath head disk node is not created for single-ported
>>>>> NVMe adapters or private namespaces. However, creating a head node in
>>>>> these cases can help transparently handle transient PCIe link failures.
>>>>> Without a head node, features like delayed removal cannot be leveraged,
>>>>> making it difficult to tolerate such link failures. To address this,
>>>>> this commit introduces nvme_core module parameter multipath_head_always.
>>>>>
>>>>> When this param is set to true, it forces the creation of a multipath
>>>>> head node regardless of the NVMe disk or namespace type. This option
>>>>> thus allows delayed removal of the head node to be used even for
>>>>> single-ported NVMe disks and private namespaces, and so helps to
>>>>> transparently handle transient PCIe link failures.
>>>>>
>>>>> By default multipath_head_always is set to false, thus preserving the
>>>>> existing behavior. Setting it to true enables improved fault tolerance
>>>>> in PCIe setups. Moreover, please note that enabling this option would
>>>>> also implicitly enable nvme_core.multipath.
>>>>>
>>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>>>>> ---
>>>>> drivers/nvme/host/multipath.c | 70 +++++++++++++++++++++++++++++++----
>>>>> 1 file changed, 63 insertions(+), 7 deletions(-)
>>>>>
>>>> I really would model this according to dm-multipath where we have the
>>>> 'fail_if_no_path' flag.
>>>> This can be set for PCIe devices to retain the current behaviour
>>>> (which we need for things like 'md' on top of NVMe) whenever
>>>> this flag is set.
>>>>
>>> Okay, so you mean that when the sysfs attribute "delayed_removal_secs"
>>> under the head disk node is _NOT_ configured (or delayed_removal_secs
>>> is set to zero), the internal flag "fail_if_no_path" is set to
>>> true. In the other case, when "delayed_removal_secs" is set to
>>> a non-zero value, we set "fail_if_no_path" to false. Is that correct?
>>>
>> Don't make it overly complicated.
>> 'fail_if_no_path' (and the inverse 'queue_if_no_path') can both be
>> mapped onto delayed_removal_secs; if the value is '0' then the head
>> disk is immediately removed (the 'fail_if_no_path' case), and if it's
>> -1 it is never removed (the 'queue_if_no_path' case).
>>
> Yes, if the value of delayed_removal_secs is 0 then the head is immediately
> removed. However, if the value of delayed_removal_secs is anything but zero
> (i.e. greater than zero, as delayed_removal_secs is unsigned) then the head
> is removed only after delayed_removal_secs has elapsed and the disk
> couldn't recover from the transient link failure in that time. We never
> pin the head node indefinitely.
>
>> Question, though: How does it interact with the existing 'ctrl_loss_tmo'? Both describe essentially the same situation...
>>
> The delayed_removal_secs is modeled for NVMe PCIe adapters. So it really
> doesn't interact or interfere with ctrl_loss_tmo, which is a fabrics
> controller option.
>
I'm not so sure about that.
You _could_ expand the scope of ctrl_loss_tmo to PCI, too;
as most PCI devices will only ever have one controller, 'ctrl_loss_tmo'
would be identical to 'delayed_removal_secs'.
So I guess my question is: is there any value for fabrics in controlling
the lifetime of struct ns_head independently of the lifetime of the
controller?
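
For illustration only, the mapping between delayed_removal_secs and the
dm-multipath style policies discussed above could be modelled roughly as
below. This is a plain Python sketch, not kernel code; the enum and the
function name are made up for this mail, and the -1 case assumes the
attribute would become signed, which it is not in the current patch
(there it is unsigned, so the head node is never pinned indefinitely).

```python
from enum import Enum


class NoPathPolicy(Enum):
    """Hypothetical names for what happens once the last path is gone."""
    FAIL_IF_NO_PATH = "fail_if_no_path"        # head disk removed immediately
    QUEUE_WITH_TIMEOUT = "queue_with_timeout"  # head disk kept for N seconds
    QUEUE_IF_NO_PATH = "queue_if_no_path"      # head disk never removed


def no_path_policy(delayed_removal_secs: int) -> NoPathPolicy:
    """Map a delayed_removal_secs value onto a dm-multipath style policy.

    0   -> immediate removal (the 'fail_if_no_path' case)
    > 0 -> removal once the delay has elapsed
    < 0 -> never removed (the 'queue_if_no_path' case); ASSUMPTION: this
           needs a signed attribute, which the patch under review lacks.
    """
    if delayed_removal_secs == 0:
        return NoPathPolicy.FAIL_IF_NO_PATH
    if delayed_removal_secs < 0:
        return NoPathPolicy.QUEUE_IF_NO_PATH
    return NoPathPolicy.QUEUE_WITH_TIMEOUT
```

With the attribute as currently proposed, only the first and the last
branch are reachable.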
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich