linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Meneghini <jmeneghi@redhat.com>
To: Hannes Reinecke <hare@suse.de>, Bryan Gurney <bgurney@redhat.com>,
	linux-nvme@lists.infradead.org, kbusch@kernel.org, hch@lst.de,
	sagi@grimberg.me, axboe@kernel.dk
Cc: james.smart@broadcom.com, dick.kennedy@broadcom.com,
	linux-scsi@vger.kernel.org, njavali@marvell.com,
	muneendra737@gmail.com
Subject: Re: [PATCH v7 0/6] nvme-fc: FPIN link integrity handling
Date: Tue, 8 Jul 2025 15:56:13 -0400	[thread overview]
Message-ID: <b0817e2a-1c4e-4d7d-9a84-633913d9e0e4@redhat.com> (raw)
In-Reply-To: <19484829-0a7b-42ce-99fe-e74ccda67dce@suse.de>

On 7/2/25 2:10 AM, Hannes Reinecke wrote:
>> During path fail testing on the numa iopolicy, I found that I/O moves
>> off of the marginal path after a first link integrity event is
>> received, but if the non-marginal path the I/O is on is disconnected,
>> the I/O is transferred onto a marginal path (in testing, sometimes
>> I've seen it go to a "marginal optimized" path, and sometimes
>> "marginal non-optimized").
>>
> That is by design.
> 'marginal' paths are only evaluated for the 'optimized' path selection,
> where it's obvious that 'marginal' paths should not be selected as
> 'optimized'.

I think we might want to change this.  With the NUMA scheduler you can end up with
using the non-optimized marginal path.  This happens when we test with 4 paths
(2 optimized and 2 non-optimized) and set all 4 paths to marginal.  In this case
the NUMA scheduler should simply choose the optimized marginal path on the closest
numa node.  However, that's not what happens. It consistently chooses the
non-optimized first non-optimized path.

> For 'non-optimized' the situation is less clear; is the 'non-optimized'
> path preferrable to 'marginal'? Or the other way round?
> So once the 'optimized' path selection returns no paths, _any_ of the
> remaining paths are eligible.

This is a good question for Broadcom.  I think, with all IO schedulers, as long
as there is a non-marginal path available, that path should be used.  So
a non-marginal non-optimized path should take precedence over a marginal optimized path.

In the case were all paths are marginal, I think the scheduler should simply use the firt
optimized path on the closest numa node.
   >> The queue-depth iopolicy doesn't change its path selection based on
>> the marginal flag, but looking at nvme_queue_depth_path(), I can see
>> that there's currently no logic to handle marginal paths.  We're
>> developing a patch to address that issue in queue-depth, but we need
>> to do more testing.
>>
> Again, by design.
> The whole point of the marginal path patchset is that I/O should
> be steered away from the marginal path, but the path itself should
> not completely shut off (otherwise we just could have declared the
> path as 'faulty' and be done with).
> Any I/O on 'marginal' paths should have higher latencies, and higher
> latencies should result in higher queue depths on these paths. So
> the queue-depth based IO scheduler should to the right thing
> automatically.

I don't understand this.  The Round-robin scheduler removes marginal paths, why shouldn't the
queue-depth and numa scheduler do the same?

> Always assuming that marginal paths should have higher latencies.
> If they haven't then they will be happily selection for I/O.
> But then again, if the marginal path does _not_ have highert
> latencies why shouldn't we select it for I/O?

This may be true with FPIN Congestion Signal, but we are testing Link Integrity.  With FPIN LI I think we want to simply stop using the path.
LI has nothing to do with latency.  So unless ALL paths are marginal, the IO scheduler should not be using any marginal path.

Do we need another state here?  There is an ask to support FPIN CS, so maybe using the term "marginal" to describe LI is wrong.

Maybe we need something like "marginal_li" and "marginal_cs" to describe the difference.

/John


  reply	other threads:[~2025-07-08 19:56 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-24 20:20 [PATCH v7 0/6] nvme-fc: FPIN link integrity handling Bryan Gurney
2025-06-24 20:20 ` [PATCH v7 1/6] fc_els: use 'union fc_tlv_desc' Bryan Gurney
2025-06-24 20:20 ` [PATCH v7 2/6] nvme-fc: marginal path handling Bryan Gurney
2025-07-03 11:47   ` Christoph Hellwig
2025-06-24 20:20 ` [PATCH v7 3/6] nvme-fc: nvme_fc_fpin_rcv() callback Bryan Gurney
2025-06-24 20:20 ` [PATCH v7 4/6] lpfc: enable FPIN notification for NVMe Bryan Gurney
2025-06-24 20:20 ` [PATCH v7 5/6] qla2xxx: " Bryan Gurney
2025-06-24 20:20 ` [PATCH v7 6/6] nvme: sysfs: emit the marginal path state in show_state() Bryan Gurney
2025-07-01  0:27 ` [PATCHv7 0/6] nvme-fc: FPIN link integrity handling Muneendra Kumar
2025-07-01 20:32 ` [PATCH v7 " Bryan Gurney
2025-07-02  6:10   ` Hannes Reinecke
2025-07-08 19:56     ` John Meneghini [this message]
2025-07-09  6:21       ` Hannes Reinecke
2025-07-09 13:42         ` Bryan Gurney
2025-07-09 17:44           ` Hannes Reinecke
2025-07-11  8:54             ` Muneendra Kumar M
2025-07-11 16:49               ` John Meneghini
2025-07-11  2:59 ` [PATCH v8 8/8] nvme-multipath: queue-depth support for marginal paths Muneendra Kumar
2025-07-11 14:53   ` John Meneghini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b0817e2a-1c4e-4d7d-9a84-633913d9e0e4@redhat.com \
    --to=jmeneghi@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bgurney@redhat.com \
    --cc=dick.kennedy@broadcom.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=james.smart@broadcom.com \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=muneendra737@gmail.com \
    --cc=njavali@marvell.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).