From: Kevin Wolf <kwolf@redhat.com>
To: Martin Wilck <mwilck@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Benjamin Marzinski <bmarzins@redhat.com>,
dm-devel@lists.linux.dev, hreitz@redhat.com, mpatocka@redhat.com,
snitzer@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] dm mpath: Interface for explicit probing of active paths
Date: Thu, 15 May 2025 12:11:49 +0200
Message-ID: <aCW95f8RGpLJZwSA@redhat.com>
In-Reply-To: <cc2ec011cf286cb5d119f2378ecbd7b818e46769.camel@suse.com>

On 14.05.2025 at 23:21, Martin Wilck wrote:
> On Tue, 2025-05-13 at 10:00 +0200, Martin Wilck wrote:
> > > If you think it does, is there another reason why you didn't try
> > > this before?
> >
> > It didn't occur to me back then that we could fail paths without
> > retrying in the kernel.
> >
> > Perhaps we could have the sg driver pass the blk_status_t (which is
> > available on the sg level) to device mapper somehow in the sg_io_hdr
> > structure? That way we could entirely avoid the layering violation
> > between SCSI and dm. Not sure if that would be acceptable to
> > Christoph, as blk_status_t is supposed to be exclusive to the kernel.
> > Can we find a way to make sure it's passed to DM, but not to user
> > space?
>
> I have to correct myself. I was confused by my old patches, which
> contain special casing for SG_IO. Of course, the current upstream
> code does not special-case SG_IO in any way. device-mapper looks
> neither at the ioctl `cmd` value nor at its arguments, and only has
> the Unix error code to examine when the ioctl returns. The device
> mapper layer has access to *less* information than the user space
> process that issued the ioctl. Adding hooks to the sg driver wouldn't
> buy us anything in this situation.
>
> If we can't change this, we can't fail paths in the SG_IO error code
> path, end of story.

Yes, as long as we can't look at the sg_io_hdr, there is no way to
figure out if we got a path error.
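Here is what "looking at the sg_io_hdr" means in practice. User space gets the completed header back from SG_IO and can inspect the host byte, the SCSI status byte and the `info` flags. device-mapper sees none of these. The following is a minimal sketch of such a classification, assuming Linux's `<scsi/sg.h>`:

- The host byte constants are copied from the kernel's `DID_*` values under hypothetical local `HOST_*` names, because the kernel does not export them to user space.
- The decision rules are a simplification for illustration. They are not the logic used by the kernel, QEMU or multipath-tools.

```c
#include <scsi/sg.h>   /* sg_io_hdr_t, SG_INFO_OK_MASK, SG_INFO_OK */

/* Host byte values copied from the kernel's DID_* codes (not exported by
 * <scsi/sg.h>); the HOST_* names are local to this sketch. */
enum {
    HOST_NO_CONNECT          = 0x01, /* DID_NO_CONNECT */
    HOST_TIME_OUT            = 0x03, /* DID_TIME_OUT */
    HOST_TRANSPORT_DISRUPTED = 0x0e, /* DID_TRANSPORT_DISRUPTED */
    HOST_TRANSPORT_FAILFAST  = 0x0f, /* DID_TRANSPORT_FAILFAST */
};

/* SCSI status byte values as defined in SAM. */
enum {
    SAM_CHECK_CONDITION      = 0x02,
    SAM_RESERVATION_CONFLICT = 0x18,
};

enum sg_result {
    SG_RESULT_OK,
    SG_RESULT_PATH_ERROR,   /* another path might succeed */
    SG_RESULT_TARGET_ERROR, /* the device itself answered; keep the path */
};

/* Hypothetical, simplified classification of a completed SG_IO request. */
enum sg_result classify_sg_result(const sg_io_hdr_t *hdr)
{
    /* SG_INFO_OK: no sense data and no host or driver status reported. */
    if ((hdr->info & SG_INFO_OK_MASK) == SG_INFO_OK)
        return SG_RESULT_OK;

    switch (hdr->host_status) {
    case HOST_NO_CONNECT:
    case HOST_TIME_OUT:
    case HOST_TRANSPORT_DISRUPTED:
    case HOST_TRANSPORT_FAILFAST:
        return SG_RESULT_PATH_ERROR;
    }

    /* The target returned a status itself, so sending the same command
     * down another path would most likely give the same answer. */
    if (hdr->status == SAM_CHECK_CONDITION ||
        hdr->status == SAM_RESERVATION_CONFLICT)
        return SG_RESULT_TARGET_ERROR;

    /* Anything else is conservatively treated as a possible path error. */
    return SG_RESULT_PATH_ERROR;
}
```

A caller would run this on the header after `ioctl(fd, SG_IO, &hdr)` returns 0. Defaulting unknown errors to "possibly a path error" mirrors the kernel's `blk_path_error()`, which only excludes an explicit list of target-side statuses.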

> With Kevin's patch 1/2 applied, it would in principle be feasible to
> special-case SG_IO, handle it in dm-multipath, retrieve the
> blk_status_t somehow, and possibly initiate path failover. This way
> we'd at least keep the generic dm layer clean of SCSI-specific code.
> But still, the end result would look very similar to my attempt from
> 2021 and would therefore probably lead us nowhere.

Right, that was my impression, too.

The interfaces could be made to look a bit different, and we could
return -EAGAIN to userspace instead of retrying immediately (not that
it makes sense to me, but if that were really the issue, fine with
me). But the core logic of copying the sg_io_hdr, calling sg_io()
directly, then inspecting the status and possibly failing paths would
have to be pretty much the same as what you had.

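The in-kernel variant described above boils down to a loop over the active paths. The sketch below models only that control flow, in user-space C, under these assumptions:

- The path table, `issue_fn`, `mpath_sg_io()` and `scripted_issue()` are hypothetical stand-ins for copying the sg_io_hdr and calling sg_io() on one path. None of this is dm-mpath code.
- The `retry_in_kernel` flag contrasts retrying immediately with returning -EAGAIN to user space.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a multipath map entry. */
struct path {
    int id;
    bool active;
};

enum io_outcome { IO_OK, IO_PATH_ERROR, IO_TARGET_ERROR };

/* Hypothetical hook standing in for "copy the sg_io_hdr, call sg_io() on
 * this path and classify the completed header". */
typedef enum io_outcome (*issue_fn)(struct path *p, void *ctx);

/* Returns 0 once the command completed (the SCSI status, including a
 * target error, is then in the caller's header), -EAGAIN if a path was
 * failed and retry_in_kernel is false, or -EIO if no active path is left. */
int mpath_sg_io(struct path *paths, size_t npaths, bool retry_in_kernel,
                issue_fn issue, void *ctx)
{
    for (size_t i = 0; i < npaths; i++) {
        if (!paths[i].active)
            continue;

        switch (issue(&paths[i], ctx)) {
        case IO_OK:
        case IO_TARGET_ERROR:
            return 0;
        case IO_PATH_ERROR:
            paths[i].active = false;    /* fail the path */
            if (!retry_in_kernel)
                return -EAGAIN;         /* let user space resubmit */
            break;                      /* retry on the next active path */
        }
    }
    return -EIO;
}

/* Test double: replays one canned outcome per path id. */
enum io_outcome scripted_issue(struct path *p, void *ctx)
{
    const enum io_outcome *script = ctx;
    return script[p->id];
}
```

The -EAGAIN branch only moves the retry to user space. Deciding that a path error happened still needs the full SCSI result, which is why the core logic stays the same either way.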
> I'm still not too fond of DM_MPATH_PROBE_PATHS_CMD, but I can't offer a
> better solution at this time. If the side issues are fixed, it will be
> an improvement over the current upstream situation, where we can do no
> path failover at all.

Yes, I agree we should focus on improving what we have, rather than
trying to find another radically different approach that none of us have
thought of before.
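The probing approach keeps the SCSI-level decision in user space, which does see the full sg_io_hdr. It only asks dm-mpath to probe its active paths before the command is resubmitted. Below is a rough sketch of such a retry loop, under these assumptions:

- `submit` and `probe` are hypothetical placeholders. They stand for issuing SG_IO on the dm device and classifying the result, and for the probe-paths ioctl, respectively.
- The retry limit is an arbitrary choice.
- This is not QEMU's actual code.

```c
#include <errno.h>

enum io_outcome { IO_OK, IO_PATH_ERROR, IO_TARGET_ERROR };

/* Hypothetical hooks, not a real API. */
struct passthrough_ops {
    enum io_outcome (*submit)(void *ctx); /* SG_IO on the dm device + classify */
    int (*probe)(void *ctx);              /* ask dm-mpath to probe active paths */
};

/* Returns 0 when the command completed (with or without a target error),
 * a negative errno if probing failed, or -EIO if a path error persists
 * after max_retries probes. */
int submit_with_probe(const struct passthrough_ops *ops, void *ctx,
                      int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        if (ops->submit(ctx) != IO_PATH_ERROR)
            return 0;
        if (attempt == max_retries)
            return -EIO;

        /* Probing fails the paths that don't respond, so the resubmitted
         * command should be routed to a working one. */
        int ret = ops->probe(ctx);
        if (ret < 0)
            return ret;
    }
}

/* Test double: a device with one broken path that probing removes. */
struct fake_dev {
    int broken_active;
    int probes;
};

enum io_outcome fake_submit(void *ctx)
{
    struct fake_dev *d = ctx;
    return d->broken_active ? IO_PATH_ERROR : IO_OK;
}

int fake_probe(void *ctx)
{
    struct fake_dev *d = ctx;
    d->probes++;
    d->broken_active = 0;
    return 0;
}
```

The kernel side of this design stays free of any SCSI knowledge: it only needs to know how to probe a path, not how to interpret the SCSI result.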

> In the long term, we should evaluate alternatives. If my conjecture in
> my previous post is correct and we only need PRIN/PROUT commands,
> there might be a better solution than scsi-block for our customers.
> Using regular block I/O should also improve performance.

If you're talking about SG_IO in dm-mpath, then PRIN/PROUT commands are
actually the one thing that we don't need. libmpathpersist sends the
commands to the individual path devices, so dm-mpath will never see
those. It's mostly about getting the full results on the SCSI level for
normal I/O commands.
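This shows why dm-mpath never sees these commands: a PR registration has to be sent to each path (I_T nexus) separately. The sketch below does two things:

- It builds the 10-byte PERSISTENT RESERVE OUT CDB and the basic 24-byte parameter list for the REGISTER service action, following the layout in SPC.
- It fans the command out over a list of paths.

`send_fn`, `register_on_all_paths()` and `fake_send()` are hypothetical placeholders for issuing SG_IO on one path device. This is not libmpathpersist's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROUT_OPCODE      0x5f
#define PROUT_SA_REGISTER 0x00
#define PROUT_PARAM_LEN   24

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Build a PR OUT REGISTER command that registers new_key (SPC layout). */
void build_prout_register(uint8_t cdb[10], uint8_t param[PROUT_PARAM_LEN],
                          uint64_t old_key, uint64_t new_key)
{
    memset(cdb, 0, 10);
    cdb[0] = PROUT_OPCODE;
    cdb[1] = PROUT_SA_REGISTER;           /* service action, bits 4..0 */
    put_be32(&cdb[5], PROUT_PARAM_LEN);   /* parameter list length */

    memset(param, 0, PROUT_PARAM_LEN);
    put_be64(&param[0], old_key);         /* reservation key */
    put_be64(&param[8], new_key);         /* service action reservation key */
}

/* Hypothetical per-path submit hook (e.g. SG_IO on one /dev/sdX). */
typedef int (*send_fn)(int path_fd, const uint8_t *cdb, size_t cdb_len,
                       const uint8_t *param, size_t param_len);

/* Send the same registration down every path; returns how many succeeded.
 * old_key is 0 because this is a fresh registration. */
size_t register_on_all_paths(const int *path_fds, size_t npaths, uint64_t key,
                             send_fn send)
{
    uint8_t cdb[10], param[PROUT_PARAM_LEN];
    size_t ok = 0;

    build_prout_register(cdb, param, 0, key);
    for (size_t i = 0; i < npaths; i++)
        if (send(path_fds[i], cdb, sizeof cdb, param, sizeof param) == 0)
            ok++;
    return ok;
}

/* Test double: pretends that negative fds are dead paths. */
int fake_send(int path_fd, const uint8_t *cdb, size_t cdb_len,
              const uint8_t *param, size_t param_len)
{
    (void)cdb; (void)cdb_len; (void)param; (void)param_len;
    return path_fd >= 0 ? 0 : -1;
}
```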

Actually, there was a patch series on qemu-devel last year (which I
haven't found the time to review properly yet) that would add explicit
persistent reservation operations to QEMU's block layer, which could
then be used with the emulated scsi-hd device. On the backend side, it
was only implemented for iscsi, but I suppose we could implement it for
file-posix, too (using the same libmpathpersist code as for
passthrough). If that works, maybe at least some users can move away
from SCSI passthrough.

The thing we need to make sure of, though, is that the emulated status
we can expose to the guest is actually good enough. Paolo's remark that
the problem with reservation conflicts was mostly because -EBADE wasn't
a thing yet gives me some hope that at least this wouldn't be a problem
any more today.

We would still lose other parts of the SCSI status, so I'm still a bit
cautious about predicting how many users could eventually (I expect
this to take years) use the emulated device instead, and how many
would keep using passthrough even in the long term.

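An emulated disk only gets an errno back from the host block layer and has to turn it into a SCSI status for the guest. The sketch below shows a hypothetical, simplified version of that mapping; it is not QEMU's scsi-disk code. Its assumptions are:

- Linux reports reservation conflicts as -EBADE and critical medium errors as -ENODATA.
- HARDWARE ERROR is an arbitrary catch-all.

Whatever ends up in the catch-all shows which parts of the original SCSI status cannot be reproduced.

```c
#include <errno.h>
#include <stdint.h>

#define SAM_GOOD                 0x00
#define SAM_CHECK_CONDITION      0x02
#define SAM_RESERVATION_CONFLICT 0x18

#define SENSE_KEY_NONE           0x0
#define SENSE_KEY_MEDIUM_ERROR   0x3
#define SENSE_KEY_HARDWARE_ERROR 0x4

struct emulated_status {
    uint8_t status;     /* SCSI status byte returned to the guest */
    uint8_t sense_key;  /* only meaningful with CHECK CONDITION */
};

/* Hypothetical errno -> SCSI status mapping for an emulated disk; err is
 * the positive errno value reported by the host block layer. */
struct emulated_status status_from_errno(int err)
{
    switch (err) {
    case 0:
        return (struct emulated_status){ SAM_GOOD, SENSE_KEY_NONE };
    case EBADE:     /* Linux: reservation conflict */
        return (struct emulated_status){ SAM_RESERVATION_CONFLICT,
                                         SENSE_KEY_NONE };
    case ENODATA:   /* Linux: critical medium error */
        return (struct emulated_status){ SAM_CHECK_CONDITION,
                                         SENSE_KEY_MEDIUM_ERROR };
    default:
        /* Catch-all: the ASC/ASCQ and any other details of the real
         * failure are already gone at this point. */
        return (struct emulated_status){ SAM_CHECK_CONDITION,
                                         SENSE_KEY_HARDWARE_ERROR };
    }
}
```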
Kevin