From: Hanna Czenczek <hreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>, qemu-block@nongnu.org
Cc: qinwang@redhat.com, bmarzins@redhat.com, qemu-devel@nongnu.org,
qemu-stable@nongnu.org
Subject: Re: [PATCH for-10.2] file-posix: Handle suspended dm-multipath better for SG_IO
Date: Wed, 3 Dec 2025 18:27:20 +0100
Message-ID: <1c7784d0-8952-4b4f-9a5e-923bdaef9f2f@redhat.com>
In-Reply-To: <20251128221440.89125-1-kwolf@redhat.com>

On 28.11.25 23:14, Kevin Wolf wrote:
> When introducing DM_MPATH_PROBE_PATHS, we already anticipated that
> dm-multipath devices might be suspended for a short time when the DM
> tables are reloaded and that they return -EAGAIN in this case. The
> behaviour promised in the comment wasn't actually implemented, though:
> We don't get SG_IO_MAX_RETRIES in practice, because after the first
> 1ms sleep, DM_MPATH_PROBE_PATHS is called and if that still fails with
> -EAGAIN, we error out immediately without any retry.

How so? `hdev_co_ioctl_sgio_retry()` is what issues
`DM_MPATH_PROBE_PATHS`, and if it gets `-EAGAIN` it will return `true`,
requesting a retry.

> However, meanwhile it has also turned out that libmpathpersist (which is
> used by qemu-pr-helper) may need to perform more complex recovery
> operations to get reservations back to expected state if a path failure
> happened in the middle of a PR operation. In this case, the device is
> suspended for a longer time compared to the case we originally expected.

In any case, this does warrant the change.

> This patch changes hdev_co_ioctl() to treat -EAGAIN separately so that
> it doesn't result in an immediate failure if the device is suspended for
> more than 1ms, and moves to incremental backoff to cover both quick and
> slow cases without excessive delays.
>
> Buglink: https://issues.redhat.com/browse/RHEL-121543
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/file-posix.c | 56 ++++++++++++++++++++++++++++------------------
> 1 file changed, 34 insertions(+), 22 deletions(-)

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>

> diff --git a/block/file-posix.c b/block/file-posix.c
> index c9e367a222..6265d2e248 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -4288,25 +4288,8 @@ hdev_open_Mac_error:
>  static bool coroutine_fn sgio_path_error(int ret, sg_io_hdr_t *io_hdr)
>  {
>      if (ret < 0) {
> -        switch (ret) {
> -        case -ENODEV:
> -            return true;
> -        case -EAGAIN:
> -            /*
> -             * The device is probably suspended. This happens while the dm table
> -             * is reloaded, e.g. because a path is added or removed. This is an
> -             * operation that should complete within 1ms, so just wait a bit and
> -             * retry.
> -             *
> -             * If the device was suspended for another reason, we'll wait and
> -             * retry SG_IO_MAX_RETRIES times. This is a tolerable delay before
> -             * we return an error and potentially stop the VM.
> -             */
> -            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000);
> -            return true;
> -        default:
> -            return false;
> -        }
> +        /* Path errors sometimes result in -ENODEV */
> +        return ret == -ENODEV;
>      }
> 
>      if (io_hdr->host_status != SCSI_HOST_OK) {
> @@ -4375,6 +4358,7 @@ hdev_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
>  {
>      BDRVRawState *s = bs->opaque;
>      RawPosixAIOData acb;
> +    uint64_t eagain_sleep_ns = 1 * SCALE_MS;
>      int retries = SG_IO_MAX_RETRIES;
>      int ret;
> 
> @@ -4403,9 +4387,37 @@ hdev_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
>          },
>      };
> 
> -    do {
> -        ret = raw_thread_pool_submit(handle_aiocb_ioctl, &acb);
> -    } while (req == SG_IO && retries-- && hdev_co_ioctl_sgio_retry(&acb, ret));
> +retry:
> +    ret = raw_thread_pool_submit(handle_aiocb_ioctl, &acb);
> +    if (req == SG_IO && s->use_mpath) {
> +        if (ret == -EAGAIN && eagain_sleep_ns < NANOSECONDS_PER_SECOND) {
> +            /*
> +             * If this is a multipath device, it is probably suspended.
> +             *
> +             * This can happen while the dm table is reloaded, e.g. because a
> +             * path is added or removed. This is an operation that should
> +             * complete within 1ms, so just wait a bit and retry.
> +             *
> +             * There are also some cases in which libmpathpersist must recover
> +             * from path failure during its operation, which can leave the
> +             * device suspended for a bit longer while the library brings back
> +             * reservations into the expected state.
> +             *
> +             * Use increasing delays to cover both cases without waiting
> +             * excessively, and stop after a bit more than a second (1023 ms).
> +             * This is a tolerable delay before we return an error and
> +             * potentially stop the VM.
> +             */
> +            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, eagain_sleep_ns);
> +            eagain_sleep_ns *= 2;
> +            goto retry;
> +        }
> +
> +        /* Even for ret == 0, the SG_IO header can contain an error */
> +        if (retries-- && hdev_co_ioctl_sgio_retry(&acb, ret)) {
> +            goto retry;
> +        }
> +    }
> 
>      return ret;
>  }

Thread overview: 3+ messages
2025-11-28 22:14 [PATCH for-10.2] file-posix: Handle suspended dm-multipath better for SG_IO Kevin Wolf
2025-12-03 17:27 ` Hanna Czenczek [this message]
2025-12-04 10:44 ` Kevin Wolf