From: Mike Snitzer <snitzer@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: linux-scsi@vger.kernel.org, Daniel Wagner <dwagner@suse.de>,
Christoph Hellwig <hch@lst.de>,
linux-block@vger.kernel.org, dm-devel@redhat.com,
Paolo Bonzini <pbonzini@redhat.com>,
mwilck@suse.com, Alasdair G Kergon <agk@redhat.com>
Subject: Re: [dm-devel] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
Date: Fri, 30 Apr 2021 16:15:43 -0400 [thread overview]
Message-ID: <20210430201542.GA7880@redhat.com> (raw)
In-Reply-To: <7124009b-1ea5-61eb-419f-956e659a0996@suse.de>
On Thu, Apr 29 2021 at 2:02am -0400,
Hannes Reinecke <hare@suse.de> wrote:
> On 4/28/21 9:54 PM, Mike Snitzer wrote:
> >On Thu, Apr 22 2021 at 4:21P -0400,
> >mwilck@suse.com <mwilck@suse.com> wrote:
> >
> >>From: Martin Wilck <mwilck@suse.com>
> >>
> >>In virtual deployments, SCSI passthrough over dm-multipath devices is a
> >>common setup. The qemu "pr-helper" was specifically invented for it. I
> >>believe that this is the most important real-world scenario for sending
> >>SG_IO ioctls to device-mapper devices.
> >>
> >>In this configuration, guests send SCSI IO to the hypervisor in the form of
> >>SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> >>ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> >>switch paths in dm_blk_ioctl() code path"), no path switching was done at
> >>all. Worse though, if an SG_IO call fails because of a path error,
> >>dm-multipath doesn't retry the IO on a another path; rather, the failure is
> >>passed back to the guest, and paths are not marked as faulty. This is in
> >>stark contrast with regular block IO of guests on dm-multipath devices, and
> >>certainly comes as a surprise to users who switch to SCSI passthrough in
> >>qemu. In general, users of dm-multipath devices would probably expect failover
> >>to work at least in a basic way.
> >>
> >>This patch fixes this by taking a special code path for SG_IO on request-
> >>based device mapper targets. Rather then just choosing a single path,
> >>sending the IO to it, and failing to the caller if the IO on the path
> >>failed, it retries the same IO on another path for certain error codes,
> >>using the same logic as blk_path_error() to determine if a retry would make
> >>sense for the given error code. Moreover, it sends a message to the
> >>multipath target to mark the path as failed.
> >>
> >>I am aware that this looks sort of hack-ish. I'm submitting it here as an
> >>RFC, asking for guidance how to reach a clean implementation that would be
> >>acceptable in mainline. I believe that it fixes an important problem.
> >>Actually, it fixes the qemu SCSI-passthrough use case on dm-multipath,
> >>which is non-functional without it.
> >>
> >>One problem remains open: if all paths in a multipath map are failed,
> >>normal multipath IO may switch to queueing mode (depending on
> >>configuration). This isn't possible for SG_IO, as SG_IO requests can't
> >>easily be queued like regular block I/O. Thus in the "no path" case, the
> >>guest will still see an error. If anybody can conceive of a solution for
> >>that, I'd be interested.
> >>
> >>Signed-off-by: Martin Wilck <mwilck@suse.com>
> >>---
> >> block/scsi_ioctl.c | 5 +-
> >> drivers/md/dm.c | 134 ++++++++++++++++++++++++++++++++++++++++-
> >> include/linux/blkdev.h | 2 +
> >> 3 files changed, 137 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> >>index 6599bac0a78c..bcc60552f7b1 100644
> >>--- a/block/scsi_ioctl.c
> >>+++ b/block/scsi_ioctl.c
> >>@@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
> >> return ret;
> >> }
> >>-static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>- struct sg_io_hdr *hdr, fmode_t mode)
> >>+int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>+ struct sg_io_hdr *hdr, fmode_t mode)
> >> {
> >> unsigned long start_time;
> >> ssize_t ret = 0;
> >>@@ -369,6 +369,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >> blk_put_request(rq);
> >> return ret;
> >> }
> >>+EXPORT_SYMBOL_GPL(sg_io);
> >> /**
> >> * sg_scsi_ioctl -- handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> >>diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> >>index 50b693d776d6..443ac5e5406c 100644
> >>--- a/drivers/md/dm.c
> >>+++ b/drivers/md/dm.c
> >>@@ -29,6 +29,8 @@
> >> #include <linux/part_stat.h>
> >> #include <linux/blk-crypto.h>
> >> #include <linux/keyslot-manager.h>
> >>+#include <scsi/sg.h>
> >>+#include <scsi/scsi.h>
> >> #define DM_MSG_PREFIX "core"
> >
> >Ngh... not loving this at all. But here is a patch (that I didn't
> >compile test) that should be folded in, still have to think about how
> >this could be made cleaner in general though.
> >
> >Also, dm_sg_io_ioctl should probably be in dm-rq.c (maybe even dm-mpath.c!?)
> >
> I fully share your sentiments; this just feels so awkward, having to
> redo the logic in scsi_cmd_ioctl().
>
> My original idea to that was to _use_ scsi_cmd_ioctl() directly from
> dm_blk_ioctl:
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 50b693d776d6..007ff981f888 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -567,6 +567,9 @@ static int dm_blk_ioctl(struct block_device
> *bdev, fmode_t mode,
> struct mapped_device *md = bdev->bd_disk->private_data;
> int r, srcu_idx;
>
> + if (cmd == SG_IO)
> + return scsi_cmd_blk_ioctl(bdev, mode, cmd, arg);
> +
> r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
> if (r < 0)
> goto out;
>
>
> But that crashes horribly as sg_io() expects the request pdu to be
> of type 'scsi_request', which it definitely isn't here.
>
> We _could_ prepend struct dm_rq_target_io with a struct
> scsi_request, seeing that basically _all_ dm-rq installations are on
> SCSI devices:
If calling sg_io() is an issue then how does Martin's latest patchset,
that exports and calls sg_io direct from DM, work!?
And _no_ I have no interest in embedding struct scsi_request in
dm_rq_target_io.
Mike
>
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 13b4385f4d5a..aa7856621f83 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -16,6 +16,7 @@
> * One of these is allocated per request.
> */
> struct dm_rq_target_io {
> + struct scsi_request scsi_req;
> struct mapped_device *md;
> struct dm_target *ti;
> struct request *orig, *clone;
>
> Then the above should work, but we would increase the size of
> dm_rq_target_io by quite a lot. Although, given the current size of
> it, question is whether it matters...
>
> Hmm?
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Kernel Storage Architect
> hare@suse.de +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
>
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: mwilck@suse.com, Alasdair G Kergon <agk@redhat.com>,
dm-devel@redhat.com, Daniel Wagner <dwagner@suse.de>,
linux-block@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Christoph Hellwig <hch@lst.de>,
linux-scsi@vger.kernel.org
Subject: Re: dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
Date: Fri, 30 Apr 2021 16:15:43 -0400 [thread overview]
Message-ID: <20210430201542.GA7880@redhat.com> (raw)
In-Reply-To: <7124009b-1ea5-61eb-419f-956e659a0996@suse.de>
On Thu, Apr 29 2021 at 2:02am -0400,
Hannes Reinecke <hare@suse.de> wrote:
> On 4/28/21 9:54 PM, Mike Snitzer wrote:
> >On Thu, Apr 22 2021 at 4:21P -0400,
> >mwilck@suse.com <mwilck@suse.com> wrote:
> >
> >>From: Martin Wilck <mwilck@suse.com>
> >>
> >>In virtual deployments, SCSI passthrough over dm-multipath devices is a
> >>common setup. The qemu "pr-helper" was specifically invented for it. I
> >>believe that this is the most important real-world scenario for sending
> >>SG_IO ioctls to device-mapper devices.
> >>
> >>In this configuration, guests send SCSI IO to the hypervisor in the form of
> >>SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> >>ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> >>switch paths in dm_blk_ioctl() code path"), no path switching was done at
> >>all. Worse though, if an SG_IO call fails because of a path error,
> >>dm-multipath doesn't retry the IO on a another path; rather, the failure is
> >>passed back to the guest, and paths are not marked as faulty. This is in
> >>stark contrast with regular block IO of guests on dm-multipath devices, and
> >>certainly comes as a surprise to users who switch to SCSI passthrough in
> >>qemu. In general, users of dm-multipath devices would probably expect failover
> >>to work at least in a basic way.
> >>
> >>This patch fixes this by taking a special code path for SG_IO on request-
> >>based device mapper targets. Rather then just choosing a single path,
> >>sending the IO to it, and failing to the caller if the IO on the path
> >>failed, it retries the same IO on another path for certain error codes,
> >>using the same logic as blk_path_error() to determine if a retry would make
> >>sense for the given error code. Moreover, it sends a message to the
> >>multipath target to mark the path as failed.
> >>
> >>I am aware that this looks sort of hack-ish. I'm submitting it here as an
> >>RFC, asking for guidance how to reach a clean implementation that would be
> >>acceptable in mainline. I believe that it fixes an important problem.
> >>Actually, it fixes the qemu SCSI-passthrough use case on dm-multipath,
> >>which is non-functional without it.
> >>
> >>One problem remains open: if all paths in a multipath map are failed,
> >>normal multipath IO may switch to queueing mode (depending on
> >>configuration). This isn't possible for SG_IO, as SG_IO requests can't
> >>easily be queued like regular block I/O. Thus in the "no path" case, the
> >>guest will still see an error. If anybody can conceive of a solution for
> >>that, I'd be interested.
> >>
> >>Signed-off-by: Martin Wilck <mwilck@suse.com>
> >>---
> >> block/scsi_ioctl.c | 5 +-
> >> drivers/md/dm.c | 134 ++++++++++++++++++++++++++++++++++++++++-
> >> include/linux/blkdev.h | 2 +
> >> 3 files changed, 137 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> >>index 6599bac0a78c..bcc60552f7b1 100644
> >>--- a/block/scsi_ioctl.c
> >>+++ b/block/scsi_ioctl.c
> >>@@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
> >> return ret;
> >> }
> >>-static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>- struct sg_io_hdr *hdr, fmode_t mode)
> >>+int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>+ struct sg_io_hdr *hdr, fmode_t mode)
> >> {
> >> unsigned long start_time;
> >> ssize_t ret = 0;
> >>@@ -369,6 +369,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >> blk_put_request(rq);
> >> return ret;
> >> }
> >>+EXPORT_SYMBOL_GPL(sg_io);
> >> /**
> >> * sg_scsi_ioctl -- handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> >>diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> >>index 50b693d776d6..443ac5e5406c 100644
> >>--- a/drivers/md/dm.c
> >>+++ b/drivers/md/dm.c
> >>@@ -29,6 +29,8 @@
> >> #include <linux/part_stat.h>
> >> #include <linux/blk-crypto.h>
> >> #include <linux/keyslot-manager.h>
> >>+#include <scsi/sg.h>
> >>+#include <scsi/scsi.h>
> >> #define DM_MSG_PREFIX "core"
> >
> >Ngh... not loving this at all. But here is a patch (that I didn't
> >compile test) that should be folded in, still have to think about how
> >this could be made cleaner in general though.
> >
> >Also, dm_sg_io_ioctl should probably be in dm-rq.c (maybe even dm-mpath.c!?)
> >
> I fully share your sentiments; this just feels so awkward, having to
> redo the logic in scsi_cmd_ioctl().
>
> My original idea to that was to _use_ scsi_cmd_ioctl() directly from
> dm_blk_ioctl:
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 50b693d776d6..007ff981f888 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -567,6 +567,9 @@ static int dm_blk_ioctl(struct block_device
> *bdev, fmode_t mode,
> struct mapped_device *md = bdev->bd_disk->private_data;
> int r, srcu_idx;
>
> + if (cmd == SG_IO)
> + return scsi_cmd_blk_ioctl(bdev, mode, cmd, arg);
> +
> r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
> if (r < 0)
> goto out;
>
>
> But that crashes horribly as sg_io() expects the request pdu to be
> of type 'scsi_request', which it definitely isn't here.
>
> We _could_ prepend struct dm_rq_target_io with a struct
> scsi_request, seeing that basically _all_ dm-rq installations are on
> SCSI devices:
If calling sg_io() is an issue then how does Martin's latest patchset,
that exports and calls sg_io direct from DM, work!?
And _no_ I have no interest in embedding struct scsi_request in
dm_rq_target_io.
Mike
>
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 13b4385f4d5a..aa7856621f83 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -16,6 +16,7 @@
> * One of these is allocated per request.
> */
> struct dm_rq_target_io {
> + struct scsi_request scsi_req;
> struct mapped_device *md;
> struct dm_target *ti;
> struct request *orig, *clone;
>
> Then the above should work, but we would increase the size of
> dm_rq_target_io by quite a lot. Although, given the current size of
> it, question is whether it matters...
>
> Hmm?
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Kernel Storage Architect
> hare@suse.de +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
>
next prev parent reply other threads:[~2021-04-30 20:20 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-22 20:21 [dm-devel] [PATCH] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
2021-04-22 20:21 ` mwilck
2021-04-22 20:24 ` [dm-devel] [*RFC* PATCH] " Martin Wilck
2021-04-22 20:24 ` Martin Wilck
2021-04-28 19:54 ` [dm-devel] " Mike Snitzer
2021-04-28 19:54 ` Mike Snitzer
2021-04-28 20:54 ` [dm-devel] " Martin Wilck
2021-04-28 20:54 ` Martin Wilck
2021-04-29 6:02 ` [dm-devel] " Hannes Reinecke
2021-04-29 6:02 ` Hannes Reinecke
2021-04-30 20:15 ` Mike Snitzer [this message]
2021-04-30 20:15 ` Mike Snitzer
2021-05-03 7:27 ` [dm-devel] " Martin Wilck
2021-05-03 7:27 ` Martin Wilck
2021-04-29 8:33 ` [dm-devel] " Martin Wilck
2021-04-29 8:33 ` Martin Wilck
2021-04-29 8:39 ` [dm-devel] " Paolo Bonzini
2021-04-29 8:39 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210430201542.GA7880@redhat.com \
--to=snitzer@redhat.com \
--cc=agk@redhat.com \
--cc=dm-devel@redhat.com \
--cc=dwagner@suse.de \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=mwilck@suse.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.