* [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
@ 2025-05-13 11:37 Kevin Wolf
2025-05-13 13:51 ` Stefan Hajnoczi
2025-05-14 13:43 ` Hanna Czenczek
0 siblings, 2 replies; 9+ messages in thread
From: Kevin Wolf @ 2025-05-13 11:37 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, hreitz, stefanha, pbonzini, bmarzins, qemu-devel
When scsi-block is used on a host multipath device, it runs into the
problem that the kernel dm-mpath doesn't know anything about SCSI or
SG_IO and therefore can't decide if a SG_IO request returned an error
and needs to be retried on a different path. Instead of getting working
failover, an error is returned to scsi-block and handled according to
the configured error policy. Obviously, this is not what users want;
they want working failover.
QEMU can parse the SG_IO result and determine whether this could have
been a path error, but just retrying the same request could send it
to the same failing path again and result in the same error.
With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
block devices (queued in the device mapper tree for Linux 6.16), we can
tell the kernel to probe all paths and tell us if any usable paths
remained. If so, we can now retry the SG_IO ioctl and expect it to be
sent to a working path.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 81 insertions(+), 1 deletion(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index ef52ed9169..2ea41dbc2d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -41,6 +41,7 @@
#include "scsi/pr-manager.h"
#include "scsi/constants.h"
+#include "scsi/utils.h"
#if defined(__APPLE__) && (__MACH__)
#include <sys/ioctl.h>
@@ -72,6 +73,7 @@
#include <linux/blkzoned.h>
#endif
#include <linux/cdrom.h>
+#include <linux/dm-ioctl.h>
#include <linux/fd.h>
#include <linux/fs.h>
#include <linux/hdreg.h>
@@ -138,6 +140,8 @@
#define RAW_LOCK_PERM_BASE 100
#define RAW_LOCK_SHARED_BASE 200
+#define SG_IO_MAX_RETRIES 5
+
typedef struct BDRVRawState {
int fd;
bool use_lock;
@@ -165,6 +169,7 @@ typedef struct BDRVRawState {
bool use_linux_aio:1;
bool has_laio_fdsync:1;
bool use_linux_io_uring:1;
+ bool use_mpath:1;
int page_cache_inconsistent; /* errno from fdatasync failure */
bool has_fallocate;
bool needs_alignment;
@@ -4263,15 +4268,86 @@ hdev_open_Mac_error:
/* Since this does ioctl the device must be already opened */
bs->sg = hdev_is_sg(bs);
+ /* sg devices aren't even block devices and can't use dm-mpath */
+ s->use_mpath = !bs->sg;
+
return ret;
}
#if defined(__linux__)
+#if defined(DM_MPATH_PROBE_PATHS)
+static bool sgio_path_error(int ret, sg_io_hdr_t *io_hdr)
+{
+ if (ret == -ENODEV) {
+ return true;
+ } else if (ret < 0) {
+ return false;
+ }
+
+ if (io_hdr->host_status != SCSI_HOST_OK) {
+ return true;
+ }
+
+ switch (io_hdr->status) {
+ case GOOD:
+ case CONDITION_GOOD:
+ case INTERMEDIATE_GOOD:
+ case INTERMEDIATE_C_GOOD:
+ case RESERVATION_CONFLICT:
+ case COMMAND_TERMINATED:
+ return false;
+ case CHECK_CONDITION:
+ return !scsi_sense_buf_is_guest_recoverable(io_hdr->sbp,
+ io_hdr->mx_sb_len);
+ default:
+ return true;
+ }
+}
+
+static bool coroutine_fn hdev_co_ioctl_sgio_retry(RawPosixAIOData *acb, int ret)
+{
+ BDRVRawState *s = acb->bs->opaque;
+ RawPosixAIOData probe_acb;
+
+ if (!s->use_mpath) {
+ return false;
+ }
+
+ if (!sgio_path_error(ret, acb->ioctl.buf)) {
+ return false;
+ }
+
+ probe_acb = (RawPosixAIOData) {
+ .bs = acb->bs,
+ .aio_type = QEMU_AIO_IOCTL,
+ .aio_fildes = s->fd,
+ .aio_offset = 0,
+ .ioctl = {
+ .buf = NULL,
+ .cmd = DM_MPATH_PROBE_PATHS,
+ },
+ };
+
+ ret = raw_thread_pool_submit(handle_aiocb_ioctl, &probe_acb);
+ if (ret == -ENOTTY) {
+ s->use_mpath = false;
+ }
+
+ return ret == 0;
+}
+#else
+static bool coroutine_fn hdev_co_ioctl_sgio_retry(RawPosixAIOData *acb, int ret)
+{
+ return false;
+}
+#endif /* DM_MPATH_PROBE_PATHS */
+
static int coroutine_fn
hdev_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
{
BDRVRawState *s = bs->opaque;
RawPosixAIOData acb;
+ int retries = SG_IO_MAX_RETRIES;
int ret;
ret = fd_open(bs);
@@ -4299,7 +4375,11 @@ hdev_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
},
};
- return raw_thread_pool_submit(handle_aiocb_ioctl, &acb);
+ do {
+ ret = raw_thread_pool_submit(handle_aiocb_ioctl, &acb);
+ } while (req == SG_IO && retries-- && hdev_co_ioctl_sgio_retry(&acb, ret));
+
+ return ret;
}
#endif /* linux */
--
2.49.0
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-13 11:37 [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors Kevin Wolf
@ 2025-05-13 13:51 ` Stefan Hajnoczi
2025-05-15 8:15 ` Kevin Wolf
2025-05-14 13:43 ` Hanna Czenczek
1 sibling, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2025-05-13 13:51 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, pbonzini, bmarzins, qemu-devel
On Tue, May 13, 2025 at 01:37:30PM +0200, Kevin Wolf wrote:
> When scsi-block is used on a host multipath device, it runs into the
> problem that the kernel dm-mpath doesn't know anything about SCSI or
> SG_IO and therefore can't decide if a SG_IO request returned an error
> and needs to be retried on a different path. Instead of getting working
> failover, an error is returned to scsi-block and handled according to
> the configured error policy. Obviously, this is not what users want,
> they want working failover.
>
> QEMU can parse the SG_IO result and determine whether this could have
> been a path error, but just retrying the same request could just send it
> to the same failing path again and result in the same error.
>
> With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> block devices (queued in the device mapper tree for Linux 6.16), we can
> tell the kernel to probe all paths and tell us if any usable paths
> remained. If so, we can now retry the SG_IO ioctl and expect it to be
> sent to a working path.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 81 insertions(+), 1 deletion(-)
Maybe the probability of retry success would be higher with a delay so
that intermittent issues have time to resolve themselves. Either way,
the patch looks good.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-13 11:37 [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors Kevin Wolf
2025-05-13 13:51 ` Stefan Hajnoczi
@ 2025-05-14 13:43 ` Hanna Czenczek
1 sibling, 0 replies; 9+ messages in thread
From: Hanna Czenczek @ 2025-05-14 13:43 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: stefanha, pbonzini, bmarzins, qemu-devel
On 13.05.25 13:37, Kevin Wolf wrote:
> When scsi-block is used on a host multipath device, it runs into the
> problem that the kernel dm-mpath doesn't know anything about SCSI or
> SG_IO and therefore can't decide if a SG_IO request returned an error
> and needs to be retried on a different path. Instead of getting working
> failover, an error is returned to scsi-block and handled according to
> the configured error policy. Obviously, this is not what users want,
> they want working failover.
>
> QEMU can parse the SG_IO result and determine whether this could have
> been a path error, but just retrying the same request could just send it
> to the same failing path again and result in the same error.
>
> With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> block devices (queued in the device mapper tree for Linux 6.16), we can
> tell the kernel to probe all paths and tell us if any usable paths
> remained. If so, we can now retry the SG_IO ioctl and expect it to be
> sent to a working path.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 81 insertions(+), 1 deletion(-)
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-13 13:51 ` Stefan Hajnoczi
@ 2025-05-15 8:15 ` Kevin Wolf
2025-05-15 14:01 ` Stefan Hajnoczi
0 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2025-05-15 8:15 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-block, hreitz, pbonzini, bmarzins, qemu-devel
Am 13.05.2025 um 15:51 hat Stefan Hajnoczi geschrieben:
> On Tue, May 13, 2025 at 01:37:30PM +0200, Kevin Wolf wrote:
> > When scsi-block is used on a host multipath device, it runs into the
> > problem that the kernel dm-mpath doesn't know anything about SCSI or
> > SG_IO and therefore can't decide if a SG_IO request returned an error
> > and needs to be retried on a different path. Instead of getting working
> > failover, an error is returned to scsi-block and handled according to
> > the configured error policy. Obviously, this is not what users want,
> > they want working failover.
> >
> > QEMU can parse the SG_IO result and determine whether this could have
> > been a path error, but just retrying the same request could just send it
> > to the same failing path again and result in the same error.
> >
> > With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> > block devices (queued in the device mapper tree for Linux 6.16), we can
> > tell the kernel to probe all paths and tell us if any usable paths
> > remained. If so, we can now retry the SG_IO ioctl and expect it to be
> > sent to a working path.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 81 insertions(+), 1 deletion(-)
>
> Maybe the probability of retry success would be higher with a delay so
> that intermittent issues have time to resolve themselves. Either way,
> the patch looks good.
I don't think adding a delay here would be helpful. The point of
multipath isn't that you wait until a bad path comes back, but that you
just switch to a different path until it is restored.
Ideally, calling DM_MPATH_PROBE_PATHS would just remove all the bad
paths instantaneously and we would either be able to send the request
using one of the remaining good paths, or know that we have to fail. In
practice, the ioctl will probably have to wait for a timeout, so you get
a delay anyway.
What could potentially be useful is a new error policy [rw]error=retry
with a configurable delay. This wouldn't be in file-posix, but in the
devices after file-posix came to the conclusion that there is currently
no usable path. On the other hand, retrying indefinitely is probably not
what you want either, so that could quickly become rather complex.
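(As an aside, and only as an illustration under the assumption that scsi-block
exposes the usual werror/rerror block error properties, today's policy is
selected per device on the command line, e.g. with a hypothetical drive id:

    -device scsi-block,drive=lun0,werror=report,rerror=report

A retry policy with a configurable delay would be a new value next to
report, ignore and stop.)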
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Thanks.
Kevin
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-15 8:15 ` Kevin Wolf
@ 2025-05-15 14:01 ` Stefan Hajnoczi
2025-05-15 15:02 ` Kevin Wolf
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2025-05-15 14:01 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, pbonzini, bmarzins, qemu-devel
On Thu, May 15, 2025 at 10:15:53AM +0200, Kevin Wolf wrote:
> Am 13.05.2025 um 15:51 hat Stefan Hajnoczi geschrieben:
> > On Tue, May 13, 2025 at 01:37:30PM +0200, Kevin Wolf wrote:
> > > When scsi-block is used on a host multipath device, it runs into the
> > > problem that the kernel dm-mpath doesn't know anything about SCSI or
> > > SG_IO and therefore can't decide if a SG_IO request returned an error
> > > and needs to be retried on a different path. Instead of getting working
> > > failover, an error is returned to scsi-block and handled according to
> > > the configured error policy. Obviously, this is not what users want,
> > > they want working failover.
> > >
> > > QEMU can parse the SG_IO result and determine whether this could have
> > > been a path error, but just retrying the same request could just send it
> > > to the same failing path again and result in the same error.
> > >
> > > With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> > > block devices (queued in the device mapper tree for Linux 6.16), we can
> > > tell the kernel to probe all paths and tell us if any usable paths
> > > remained. If so, we can now retry the SG_IO ioctl and expect it to be
> > > sent to a working path.
> > >
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > > block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> > > 1 file changed, 81 insertions(+), 1 deletion(-)
> >
> > Maybe the probability of retry success would be higher with a delay so
> > that intermittent issues have time to resolve themselves. Either way,
> > the patch looks good.
>
> I don't think adding a delay here would be helpful. The point of
> multipath isn't that you wait until a bad path comes back, but that you
> just switch to a different path until it is restored.
That's not what this loop does. DM_MPATH_PROBE_PATHS probes all paths
and fails when no paths are available. The delay would only apply in the
case when there are no paths available.
If the point is not to wait until some path comes back, then why loop at
all?
> Ideally, calling DM_MPATH_PROBE_PATHS would just remove all the bad
> paths instantaneously and we would either be able to send the request
> using one of the remaining good paths, or know that we have to fail. In
> practice, the ioctl will probably have to wait for a timeout, so you get
> a delay anyway.
True, if the read requests themselves time out rather than failing with
an immediate error, then no delay is required. I guess both cases can
happen and userspace has no visibility aside from measuring the time
spent in the ioctl.
> What could potentially be useful is a new error policy [rw]error=retry
> with a configurable delay. This wouldn't be in file-posix, but in the
> devices after file-posix came to the conclusion that there is currently
> no usable path. On the other hand, retrying indefinitely is probably not
> what you want either, so that could quickly become rather complex.
Yes, it will be complex. I have no objection to this patch as-is.
Stefan
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-15 14:01 ` Stefan Hajnoczi
@ 2025-05-15 15:02 ` Kevin Wolf
2025-05-20 14:03 ` Stefan Hajnoczi
0 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2025-05-15 15:02 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-block, hreitz, pbonzini, bmarzins, qemu-devel
Am 15.05.2025 um 16:01 hat Stefan Hajnoczi geschrieben:
> On Thu, May 15, 2025 at 10:15:53AM +0200, Kevin Wolf wrote:
> > Am 13.05.2025 um 15:51 hat Stefan Hajnoczi geschrieben:
> > > On Tue, May 13, 2025 at 01:37:30PM +0200, Kevin Wolf wrote:
> > > > When scsi-block is used on a host multipath device, it runs into the
> > > > problem that the kernel dm-mpath doesn't know anything about SCSI or
> > > > SG_IO and therefore can't decide if a SG_IO request returned an error
> > > > and needs to be retried on a different path. Instead of getting working
> > > > failover, an error is returned to scsi-block and handled according to
> > > > the configured error policy. Obviously, this is not what users want,
> > > > they want working failover.
> > > >
> > > > QEMU can parse the SG_IO result and determine whether this could have
> > > > been a path error, but just retrying the same request could just send it
> > > > to the same failing path again and result in the same error.
> > > >
> > > > With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> > > > block devices (queued in the device mapper tree for Linux 6.16), we can
> > > > tell the kernel to probe all paths and tell us if any usable paths
> > > > remained. If so, we can now retry the SG_IO ioctl and expect it to be
> > > > sent to a working path.
> > > >
> > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > ---
> > > > block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> > > > 1 file changed, 81 insertions(+), 1 deletion(-)
> > >
> > > Maybe the probability of retry success would be higher with a delay so
> > > that intermittent issues have time to resolve themselves. Either way,
> > > the patch looks good.
> >
> > I don't think adding a delay here would be helpful. The point of
> > multipath isn't that you wait until a bad path comes back, but that you
> > just switch to a different path until it is restored.
>
> That's not what this loop does. DM_MPATH_PROBE_PATHS probes all paths
> and fails when no paths are available. The delay would only apply in the
> case when there are no paths available.
>
> If the point is not to wait until some path comes back, then why loop at
> all?
DM_MPATH_PROBE_PATHS can only send I/O to paths in the active path
group, so it doesn't fail over to different path groups. If there are no
usable paths left in the current path group, but there are some in
another one, then the ioctl returns 0 and the next SG_IO would switch to
a different path group, which may or may not succeed. If it fails, we
have to probe the paths in that group, too.
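To make the shape of the loop concrete, here is a minimal sketch of that
pattern as a plain ioctl() caller (purely illustrative, not part of the
patch; looks_like_path_error() is a hypothetical stand-in for the
host_status and sense checks that sgio_path_error() does, and error
handling is elided):

    #include <errno.h>
    #include <stdbool.h>
    #include <sys/ioctl.h>
    #include <scsi/sg.h>          /* SG_IO, struct sg_io_hdr */
    #include <linux/dm-ioctl.h>   /* DM_MPATH_PROBE_PATHS */

    #define SG_IO_MAX_RETRIES 5

    /* Hypothetical stand-in for the checks sgio_path_error() performs. */
    static bool looks_like_path_error(const struct sg_io_hdr *hdr);

    static int send_sg_io_with_failover(int fd, struct sg_io_hdr *hdr)
    {
        int retries = SG_IO_MAX_RETRIES;
        int ret;

        do {
            ret = ioctl(fd, SG_IO, hdr);
            if (ret == 0 && !looks_like_path_error(hdr)) {
                return 0;  /* genuine result, pass it on to the caller */
            }
            /*
             * Probe all paths. A return value of 0 only means that *some*
             * path group still has a usable path; the retried SG_IO may
             * fail over to a different path group, which can itself turn
             * out to be broken and need another round of probing.
             */
        } while (retries-- > 0 && ioctl(fd, DM_MPATH_PROBE_PATHS) == 0);

        return ret < 0 ? -errno : -EIO;  /* simplified: generic failure */
    }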
> > Ideally, calling DM_MPATH_PROBE_PATHS would just remove all the bad
> > paths instantaneously and we would either be able to send the request
> > using one of the remaining good paths, or know that we have to fail. In
> > practice, the ioctl will probably have to wait for a timeout, so you get
> > a delay anyway.
>
> True, if the read requests themselves time out rather than failing with
> an immediate error, then no delay is required. I guess both cases can
> happen and userspace has no visibility aside from measuring the time
> spent in the ioctl.
Right. But if they fail with an immediate error, that's really what we
want, because it means no I/O gets stuck for a long time.
Kevin
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-15 15:02 ` Kevin Wolf
@ 2025-05-20 14:03 ` Stefan Hajnoczi
2025-05-21 17:46 ` Kevin Wolf
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2025-05-20 14:03 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, hreitz, pbonzini, bmarzins, qemu-devel
On Thu, May 15, 2025 at 05:02:46PM +0200, Kevin Wolf wrote:
> Am 15.05.2025 um 16:01 hat Stefan Hajnoczi geschrieben:
> > On Thu, May 15, 2025 at 10:15:53AM +0200, Kevin Wolf wrote:
> > > Am 13.05.2025 um 15:51 hat Stefan Hajnoczi geschrieben:
> > > > On Tue, May 13, 2025 at 01:37:30PM +0200, Kevin Wolf wrote:
> > > > > When scsi-block is used on a host multipath device, it runs into the
> > > > > problem that the kernel dm-mpath doesn't know anything about SCSI or
> > > > > SG_IO and therefore can't decide if a SG_IO request returned an error
> > > > > and needs to be retried on a different path. Instead of getting working
> > > > > failover, an error is returned to scsi-block and handled according to
> > > > > the configured error policy. Obviously, this is not what users want,
> > > > > they want working failover.
> > > > >
> > > > > QEMU can parse the SG_IO result and determine whether this could have
> > > > > been a path error, but just retrying the same request could just send it
> > > > > to the same failing path again and result in the same error.
> > > > >
> > > > > With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> > > > > block devices (queued in the device mapper tree for Linux 6.16), we can
> > > > > tell the kernel to probe all paths and tell us if any usable paths
> > > > > remained. If so, we can now retry the SG_IO ioctl and expect it to be
> > > > > sent to a working path.
> > > > >
> > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > > ---
> > > > > block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> > > > > 1 file changed, 81 insertions(+), 1 deletion(-)
> > > >
> > > > Maybe the probability of retry success would be higher with a delay so
> > > > that intermittent issues have time to resolve themselves. Either way,
> > > > the patch looks good.
> > >
> > > I don't think adding a delay here would be helpful. The point of
> > > multipath isn't that you wait until a bad path comes back, but that you
> > > just switch to a different path until it is restored.
> >
> > That's not what this loop does. DM_MPATH_PROBE_PATHS probes all paths
> > and fails when no paths are available. The delay would only apply in the
> > case when there are no paths available.
> >
> > If the point is not to wait until some path comes back, then why loop at
> > all?
>
> DM_MPATH_PROBE_PATHS can only send I/O to paths in the active path
> group, so it doesn't fail over to different path groups. If there are no
> usable paths left in the current path group, but there are some in
> another one, then the ioctl returns 0 and the next SG_IO would switch to
> a different path group, which may or may not succeed. If it fails, we
> have to probe the paths in that group, too.
This wasn't obvious to me; can that be emphasized in the code via naming
or comments? About retrying up to 5 times: is the assumption that there
will be 5 or fewer path groups?
Stefan
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-20 14:03 ` Stefan Hajnoczi
@ 2025-05-21 17:46 ` Kevin Wolf
2025-05-21 18:23 ` Benjamin Marzinski
0 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2025-05-21 17:46 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-block, hreitz, pbonzini, bmarzins, qemu-devel
Am 20.05.2025 um 16:03 hat Stefan Hajnoczi geschrieben:
> On Thu, May 15, 2025 at 05:02:46PM +0200, Kevin Wolf wrote:
> > Am 15.05.2025 um 16:01 hat Stefan Hajnoczi geschrieben:
> > > On Thu, May 15, 2025 at 10:15:53AM +0200, Kevin Wolf wrote:
> > > > Am 13.05.2025 um 15:51 hat Stefan Hajnoczi geschrieben:
> > > > > On Tue, May 13, 2025 at 01:37:30PM +0200, Kevin Wolf wrote:
> > > > > > When scsi-block is used on a host multipath device, it runs into the
> > > > > > problem that the kernel dm-mpath doesn't know anything about SCSI or
> > > > > > SG_IO and therefore can't decide if a SG_IO request returned an error
> > > > > > and needs to be retried on a different path. Instead of getting working
> > > > > > failover, an error is returned to scsi-block and handled according to
> > > > > > the configured error policy. Obviously, this is not what users want,
> > > > > > they want working failover.
> > > > > >
> > > > > > QEMU can parse the SG_IO result and determine whether this could have
> > > > > > been a path error, but just retrying the same request could just send it
> > > > > > to the same failing path again and result in the same error.
> > > > > >
> > > > > > With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> > > > > > block devices (queued in the device mapper tree for Linux 6.16), we can
> > > > > > tell the kernel to probe all paths and tell us if any usable paths
> > > > > > remained. If so, we can now retry the SG_IO ioctl and expect it to be
> > > > > > sent to a working path.
> > > > > >
> > > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > > > ---
> > > > > > block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > 1 file changed, 81 insertions(+), 1 deletion(-)
> > > > >
> > > > > Maybe the probability of retry success would be higher with a delay so
> > > > > that intermittent issues have time to resolve themselves. Either way,
> > > > > the patch looks good.
> > > >
> > > > I don't think adding a delay here would be helpful. The point of
> > > > multipath isn't that you wait until a bad path comes back, but that you
> > > > just switch to a different path until it is restored.
> > >
> > > That's not what this loop does. DM_MPATH_PROBE_PATHS probes all paths
> > > and fails when no paths are available. The delay would only apply in the
> > > case when there are no paths available.
> > >
> > > If the point is not to wait until some path comes back, then why loop at
> > > all?
> >
> > DM_MPATH_PROBE_PATHS can only send I/O to paths in the active path
> > group, so it doesn't fail over to different path groups. If there are no
> > usable paths left in the current path group, but there are some in
> > another one, then the ioctl returns 0 and the next SG_IO would switch to
> > a different path group, which may or may not succeed. If it fails, we
> > have to probe the paths in that group, too.
>
> This wasn't obvious to me, can that be emphasized in the code via naming
> or comments? About retrying up to 5 times: is the assumption that there
> will be 5 or fewer path groups?
Originally, the thought behind the 5 was more about the case where
DM_MPATH_PROBE_PATHS offlines bad paths, but then another one goes down
before we retry SG_IO, so that it fails again.
But you're right that it would now apply to retrying in a different path
group. The assumption we make would then be that there will be 5 or
fewer path groups with no working path in them (rather than just 5 of
them existing). That doesn't seem like a completely unreasonable
assumption, but maybe we should increase the number now just to be on
the safe side?
Ben, do you have an opinion on this?
Kevin
* Re: [PATCH] file-posix: Probe paths and retry SG_IO on potential path errors
2025-05-21 17:46 ` Kevin Wolf
@ 2025-05-21 18:23 ` Benjamin Marzinski
0 siblings, 0 replies; 9+ messages in thread
From: Benjamin Marzinski @ 2025-05-21 18:23 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-block, hreitz, pbonzini, qemu-devel
On Wed, May 21, 2025 at 07:46:30PM +0200, Kevin Wolf wrote:
> Am 20.05.2025 um 16:03 hat Stefan Hajnoczi geschrieben:
> > On Thu, May 15, 2025 at 05:02:46PM +0200, Kevin Wolf wrote:
> > > Am 15.05.2025 um 16:01 hat Stefan Hajnoczi geschrieben:
> > > > On Thu, May 15, 2025 at 10:15:53AM +0200, Kevin Wolf wrote:
> > > > > Am 13.05.2025 um 15:51 hat Stefan Hajnoczi geschrieben:
> > > > > > On Tue, May 13, 2025 at 01:37:30PM +0200, Kevin Wolf wrote:
> > > > > > > When scsi-block is used on a host multipath device, it runs into the
> > > > > > > problem that the kernel dm-mpath doesn't know anything about SCSI or
> > > > > > > SG_IO and therefore can't decide if a SG_IO request returned an error
> > > > > > > and needs to be retried on a different path. Instead of getting working
> > > > > > > failover, an error is returned to scsi-block and handled according to
> > > > > > > the configured error policy. Obviously, this is not what users want,
> > > > > > > they want working failover.
> > > > > > >
> > > > > > > QEMU can parse the SG_IO result and determine whether this could have
> > > > > > > been a path error, but just retrying the same request could just send it
> > > > > > > to the same failing path again and result in the same error.
> > > > > > >
> > > > > > > With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> > > > > > > block devices (queued in the device mapper tree for Linux 6.16), we can
> > > > > > > tell the kernel to probe all paths and tell us if any usable paths
> > > > > > > remained. If so, we can now retry the SG_IO ioctl and expect it to be
> > > > > > > sent to a working path.
> > > > > > >
> > > > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > > > > > ---
> > > > > > > block/file-posix.c | 82 +++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > 1 file changed, 81 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > Maybe the probability of retry success would be higher with a delay so
> > > > > > that intermittent issues have time to resolve themselves. Either way,
> > > > > > the patch looks good.
> > > > >
> > > > > I don't think adding a delay here would be helpful. The point of
> > > > > multipath isn't that you wait until a bad path comes back, but that you
> > > > > just switch to a different path until it is restored.
> > > >
> > > > That's not what this loop does. DM_MPATH_PROBE_PATHS probes all paths
> > > > and fails when no paths are available. The delay would only apply in the
> > > > case when there are no paths available.
> > > >
> > > > If the point is not to wait until some path comes back, then why loop at
> > > > all?
> > >
> > > DM_MPATH_PROBE_PATHS can only send I/O to paths in the active path
> > > group, so it doesn't fail over to different path groups. If there are no
> > > usable paths left in the current path group, but there are some in
> > > another one, then the ioctl returns 0 and the next SG_IO would switch to
> > > a different path group, which may or may not succeed. If it fails, we
> > > have to probe the paths in that group, too.
> >
> > This wasn't obvious to me, can that be emphasized in the code via naming
> > or comments? About retrying up to 5 times: is the assumption that there
> > will be 5 or fewer path groups?
>
> Originally, the thought behind the 5 was more about the case where
> DM_MPATH_PROBE_PATHS offlines bad paths, but then another one goes down
> before we retry SG_IO, so that it fails again.
>
> But you're right that it would now apply to retrying in a different path
> group. The assumption we make would then be that there will be 5 or
> fewer path groups with no working path in them (rather than just 5 of
> them existing). That doesn't seem like a completely unreasonable
> assumption, but maybe we should increase the number now just to be on
> the safe side?
>
> Ben, do you have an opinion on this?
5 seems like a reasonable number. Unless people have the
path_grouping_policy set to failover, 5 path groups seems like more
than enough. You could make the argument that if users were configured
with failover (one path per path group) or had a number paths marked
marginal (and placed into marginal path groups), you could exceed 5
path groups. 8 would seems like a reasonable maximum then. It is
possible to have multipath devices with over 8 paths, but that's pretty
rare.
-Ben
>
> Kevin
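For illustration: the failover grouping policy Ben mentions is the
multipath.conf setting that puts each path into its own priority group,
e.g.

    defaults {
        path_grouping_policy    failover
    }

so a device with many paths can end up with just as many path groups.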