All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org, hreitz@redhat.com, bmarzins@redhat.com,
	pbonzini@redhat.com, qemu-devel@nongnu.org
Subject: Re: [PATCH v2] file-posix: Probe paths and retry SG_IO on potential path errors
Date: Thu, 22 May 2025 10:46:07 -0400	[thread overview]
Message-ID: <20250522144607.GA258433@fedora> (raw)
In-Reply-To: <20250522130803.34738-1-kwolf@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4409 bytes --]

On Thu, May 22, 2025 at 03:08:03PM +0200, Kevin Wolf wrote:
> When scsi-block is used on a host multipath device, it runs into the
> problem that the kernel dm-mpath doesn't know anything about SCSI or
> SG_IO and therefore can't decide if a SG_IO request returned an error
> and needs to be retried on a different path. Instead of getting working
> failover, an error is returned to scsi-block and handled according to
> the configured error policy. Obviously, this is not what users want,
> they want working failover.
> 
> QEMU can parse the SG_IO result and determine whether this could have
> been a path error, but just retrying the same request could just send it
> to the same failing path again and result in the same error.
> 
> With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath
> block devices (queued in the device mapper tree for Linux 6.16), we can
> tell the kernel to probe all paths and tell us if any usable paths
> remained. If so, we can now retry the SG_IO ioctl and expect it to be
> sent to a working path.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> v2:
> - Add a comment to explain retry scenarios [Stefan]
> - Handle -EAGAIN returned for suspended devices [Ben]
> 
>  block/file-posix.c | 115 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 114 insertions(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ec95b74869..569f4ca472 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -41,6 +41,7 @@
>  
>  #include "scsi/pr-manager.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  
>  #if defined(__APPLE__) && (__MACH__)
>  #include <sys/ioctl.h>
> @@ -72,6 +73,7 @@
>  #include <linux/blkzoned.h>
>  #endif
>  #include <linux/cdrom.h>
> +#include <linux/dm-ioctl.h>
>  #include <linux/fd.h>
>  #include <linux/fs.h>
>  #include <linux/hdreg.h>
> @@ -138,6 +140,22 @@
>  #define RAW_LOCK_PERM_BASE             100
>  #define RAW_LOCK_SHARED_BASE           200
>  
> +/*
> + * Multiple retries are mostly meant for two separate scenarios:
> + *
> + * - DM_MPATH_PROBE_PATHS returns success, but before SG_IO completes, another
> + *   path goes down.
> + *
> + * - DM_MPATH_PROBE_PATHS failed all paths in the current path group, so we have
> + *   to send another SG_IO to switch to another path group to probe the paths in
> + *   it.
> + *
> + * Even if each path is in a separate path group (path_grouping_policy set to
> + * failover), it's rare to have more than eight path groups - and even then
> + * pretty unlikely that only bad path groups would be chosen in eight retries.
> + */
> +#define SG_IO_MAX_RETRIES 8
> +
>  typedef struct BDRVRawState {
>      int fd;
>      bool use_lock;
> @@ -165,6 +183,7 @@ typedef struct BDRVRawState {
>      bool use_linux_aio:1;
>      bool has_laio_fdsync:1;
>      bool use_linux_io_uring:1;
> +    bool use_mpath:1;
>      int page_cache_inconsistent; /* errno from fdatasync failure */
>      bool has_fallocate;
>      bool needs_alignment;
> @@ -4264,15 +4283,105 @@ hdev_open_Mac_error:
>      /* Since this does ioctl the device must be already opened */
>      bs->sg = hdev_is_sg(bs);
>  
> +    /* sg devices aren't even block devices and can't use dm-mpath */
> +    s->use_mpath = !bs->sg;
> +
>      return ret;
>  }
>  
>  #if defined(__linux__)
> +#if defined(DM_MPATH_PROBE_PATHS)
> +static bool sgio_path_error(int ret, sg_io_hdr_t *io_hdr)
> +{
> +    if (ret < 0) {
> +        switch (ret) {
> +        case -ENODEV:
> +            return true;
> +        case -EAGAIN:
> +            /*
> +             * The device is probably suspended. This happens while the dm table
> +             * is reloaded, e.g. because a path is added or removed. This is an
> +             * operation that should complete within 1ms, so just wait a bit and
> +             * retry.
> +             *
> +             * If the device was suspended for another reason, we'll wait and
> +             * retry SG_IO_MAX_RETRIES times. This is a tolerable delay before
> +             * we return an error and potentially stop the VM.
> +             */
> +            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000);

sgio_path_error() is missing coroutine_fn.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2025-05-22 14:46 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-22 13:08 [PATCH v2] file-posix: Probe paths and retry SG_IO on potential path errors Kevin Wolf
2025-05-22 14:46 ` Stefan Hajnoczi [this message]
2025-05-22 14:57   ` Kevin Wolf
2025-05-22 15:13 ` Hanna Czenczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250522144607.GA258433@fedora \
    --to=stefanha@redhat.com \
    --cc=bmarzins@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.