From: Mike Snitzer <snitzer@redhat.com>
To: mwilck@suse.com
Cc: Alasdair G Kergon <agk@redhat.com>,
dm-devel@redhat.com, Hannes Reinecke <hare@suse.de>,
Daniel Wagner <dwagner@suse.de>,
linux-block@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Christoph Hellwig <hch@lst.de>,
linux-scsi@vger.kernel.org
Subject: Re: dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath
Date: Wed, 28 Apr 2021 15:54:57 -0400 [thread overview]
Message-ID: <20210428195457.GA46518@lobo> (raw)
In-Reply-To: <20210422202130.30906-1-mwilck@suse.com>
On Thu, Apr 22 2021 at 4:21P -0400,
mwilck@suse.com <mwilck@suse.com> wrote:
> From: Martin Wilck <mwilck@suse.com>
>
> In virtual deployments, SCSI passthrough over dm-multipath devices is a
> common setup. The qemu "pr-helper" was specifically invented for it. I
> believe that this is the most important real-world scenario for sending
> SG_IO ioctls to device-mapper devices.
>
> In this configuration, guests send SCSI IO to the hypervisor in the form of
> SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> switch paths in dm_blk_ioctl() code path"), no path switching was done at
> all. Worse though, if an SG_IO call fails because of a path error,
> dm-multipath doesn't retry the IO on a another path; rather, the failure is
> passed back to the guest, and paths are not marked as faulty. This is in
> stark contrast with regular block IO of guests on dm-multipath devices, and
> certainly comes as a surprise to users who switch to SCSI passthrough in
> qemu. In general, users of dm-multipath devices would probably expect failover
> to work at least in a basic way.
>
> This patch fixes this by taking a special code path for SG_IO on request-
> based device mapper targets. Rather then just choosing a single path,
> sending the IO to it, and failing to the caller if the IO on the path
> failed, it retries the same IO on another path for certain error codes,
> using the same logic as blk_path_error() to determine if a retry would make
> sense for the given error code. Moreover, it sends a message to the
> multipath target to mark the path as failed.
>
> I am aware that this looks sort of hack-ish. I'm submitting it here as an
> RFC, asking for guidance how to reach a clean implementation that would be
> acceptable in mainline. I believe that it fixes an important problem.
> Actually, it fixes the qemu SCSI-passthrough use case on dm-multipath,
> which is non-functional without it.
>
> One problem remains open: if all paths in a multipath map are failed,
> normal multipath IO may switch to queueing mode (depending on
> configuration). This isn't possible for SG_IO, as SG_IO requests can't
> easily be queued like regular block I/O. Thus in the "no path" case, the
> guest will still see an error. If anybody can conceive of a solution for
> that, I'd be interested.
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> block/scsi_ioctl.c | 5 +-
> drivers/md/dm.c | 134 ++++++++++++++++++++++++++++++++++++++++-
> include/linux/blkdev.h | 2 +
> 3 files changed, 137 insertions(+), 4 deletions(-)
>
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 6599bac0a78c..bcc60552f7b1 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
> return ret;
> }
>
> -static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> - struct sg_io_hdr *hdr, fmode_t mode)
> +int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> + struct sg_io_hdr *hdr, fmode_t mode)
> {
> unsigned long start_time;
> ssize_t ret = 0;
> @@ -369,6 +369,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> blk_put_request(rq);
> return ret;
> }
> +EXPORT_SYMBOL_GPL(sg_io);
>
> /**
> * sg_scsi_ioctl -- handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 50b693d776d6..443ac5e5406c 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -29,6 +29,8 @@
> #include <linux/part_stat.h>
> #include <linux/blk-crypto.h>
> #include <linux/keyslot-manager.h>
> +#include <scsi/sg.h>
> +#include <scsi/scsi.h>
>
> #define DM_MSG_PREFIX "core"
>
Ngh... not loving this at all. But here is a patch (that I didn't
compile test) that should be folded in, still have to think about how
this could be made cleaner in general though.
Also, dm_sg_io_ioctl should probably be in dm-rq.c (maybe even dm-mpath.c!?)
From: Mike Snitzer <snitzer@redhat.com>
Date: Wed, 28 Apr 2021 15:03:20 -0400
Subject: [PATCH] dm: use scsi_result_to_blk_status rather than open-coding it
Other small cleanups/nits.
BUT the fact that dm.c now #includes <scsi/scsi.h> and <scsi/sg.h>
leaves a very bitter taste.
Also, hardcoding the issuing of a "fail_path" message (assumes tgt is
dm-mpath) but still having checks like !tgt->type->message.. its all
very contrived and awkward!
Not-Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm.c | 50 ++++++++++++++++--------------------------------
drivers/scsi/scsi_lib.c | 21 ++++++++++++--------
include/scsi/scsi_cmnd.h | 2 ++
3 files changed, 32 insertions(+), 41 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b79adf021ef1..51e187309ebd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -524,9 +524,9 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
#define dm_blk_report_zones NULL
#endif /* CONFIG_BLK_DEV_ZONED */
-static int _dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
- struct block_device **bdev,
- struct dm_target **target)
+static int __dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
+ struct block_device **bdev,
+ struct dm_target **target)
{
struct dm_target *tgt;
struct dm_table *map;
@@ -565,7 +565,7 @@ static int _dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
struct block_device **bdev)
{
- return _dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
+ return __dm_prepare_ioctl(md, srcu_idx, bdev, NULL);
}
static void dm_unprepare_ioctl(struct mapped_device *md, int srcu_idx)
@@ -594,9 +594,9 @@ static int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
struct dm_target *tgt;
struct sg_io_hdr rhdr;
- rc = _dm_prepare_ioctl(md, &srcu_idx, &bdev, &tgt);
+ rc = __dm_prepare_ioctl(md, &srcu_idx, &bdev, &tgt);
if (rc < 0) {
- DMERR("%s: failed to get path: %d\n",
+ DMERR("%s: failed to get path: %d",
__func__, rc);
goto out;
}
@@ -605,7 +605,7 @@ static int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
rc = sg_io(bdev->bd_disk->queue, bdev->bd_disk, &rhdr, mode);
- DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x\n",
+ DMDEBUG("SG_IO via %s: rc = %d D%02xH%02xM%02xS%02x",
bdevname(bdev, path_name), rc,
rhdr.driver_status, rhdr.host_status,
rhdr.msg_status, rhdr.status);
@@ -626,32 +626,16 @@ static int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
}
if (rhdr.info & SG_INFO_CHECK) {
- /*
- * See if this is a target or path error.
- * Compare blk_path_error(), scsi_result_to_blk_status(),
- * blk_errors[].
- */
- switch (rhdr.host_status) {
- case DID_OK:
- if (scsi_status_is_good(rhdr.status))
- rc = 0;
- break;
- case DID_TARGET_FAILURE:
- rc = -EREMOTEIO;
- goto out;
- case DID_NEXUS_FAILURE:
- rc = -EBADE;
- goto out;
- case DID_ALLOC_FAILURE:
- rc = -ENOSPC;
- goto out;
- case DID_MEDIUM_ERROR:
- rc = -ENODATA;
- goto out;
- default:
- /* Everything else is a path error */
+ blk_status_t sts = scsi_result_to_blk_status(rhdr.host_status, NULL);
+
+ /* See if this is a target or path error. */
+ if (sts == BLK_STS_OK)
+ rc = 0;
+ else if (blk_path_error(sts))
rc = -EIO;
- break;
+ else {
+ rc = blk_status_to_errno(sts);
+ goto out;
}
}
@@ -674,7 +658,7 @@ static int dm_sg_io_ioctl(struct block_device *bdev, fmode_t mode,
scnprintf(bdbuf, sizeof(bdbuf), "%u:%u",
MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
- DMDEBUG("sending \"%s %s\" to target\n", argv[0], argv[1]);
+ DMDEBUG("sending \"%s %s\" to target", argv[0], argv[1]);
rc = tgt->type->message(tgt, 2, argv, NULL, 0);
if (rc < 0)
goto out;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7d52a11e1b61..ecaaba66983c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -617,7 +617,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
* Translate a SCSI result code into a blk_status_t value. May reset the host
* byte of @cmd->result.
*/
-static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
+blk_status_t scsi_result_to_blk_status(int result, struct scsi_cmnd *cmd)
{
switch (host_byte(result)) {
case DID_OK:
@@ -633,21 +633,26 @@ static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
case DID_TRANSPORT_MARGINAL:
return BLK_STS_TRANSPORT;
case DID_TARGET_FAILURE:
- set_host_byte(cmd, DID_OK);
+ if (cmd)
+ set_host_byte(cmd, DID_OK);
return BLK_STS_TARGET;
case DID_NEXUS_FAILURE:
- set_host_byte(cmd, DID_OK);
+ if (cmd)
+ set_host_byte(cmd, DID_OK);
return BLK_STS_NEXUS;
case DID_ALLOC_FAILURE:
- set_host_byte(cmd, DID_OK);
+ if (cmd)
+ set_host_byte(cmd, DID_OK);
return BLK_STS_NOSPC;
case DID_MEDIUM_ERROR:
- set_host_byte(cmd, DID_OK);
+ if (cmd)
+ set_host_byte(cmd, DID_OK);
return BLK_STS_MEDIUM;
default:
return BLK_STS_IOERR;
}
}
+EXPORT_SYMBOL(scsi_result_to_blk_status);
/* Helper for scsi_io_completion() when "reprep" action required. */
static void scsi_io_completion_reprep(struct scsi_cmnd *cmd,
@@ -691,7 +696,7 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
if (sense_valid)
sense_current = !scsi_sense_is_deferred(&sshdr);
- blk_stat = scsi_result_to_blk_status(cmd, result);
+ blk_stat = scsi_result_to_blk_status(result, cmd);
if (host_byte(result) == DID_RESET) {
/* Third party bus reset or reset for error recovery
@@ -869,14 +874,14 @@ static int scsi_io_completion_nz_result(struct scsi_cmnd *cmd, int result,
SCSI_SENSE_BUFFERSIZE);
}
if (sense_current)
- *blk_statp = scsi_result_to_blk_status(cmd, result);
+ *blk_statp = scsi_result_to_blk_status(result, cmd);
} else if (blk_rq_bytes(req) == 0 && sense_current) {
/*
* Flush commands do not transfers any data, and thus cannot use
* good_bytes != blk_rq_bytes(req) as the signal for an error.
* This sets *blk_statp explicitly for the problem case.
*/
- *blk_statp = scsi_result_to_blk_status(cmd, result);
+ *blk_statp = scsi_result_to_blk_status(result, cmd);
}
/*
* Recovered errors need reporting, but they're always treated as
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index ace15b5dc956..19e76f8db1ea 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -161,6 +161,8 @@ static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
extern void scsi_finish_command(struct scsi_cmnd *cmd);
+extern blk_status_t scsi_result_to_blk_status(int result, struct scsi_cmnd *cmd);
+
extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
size_t *offset, size_t *len);
extern void scsi_kunmap_atomic_sg(void *virt);
--
2.15.0
next prev parent reply other threads:[~2021-04-28 19:55 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-22 20:21 [PATCH] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath mwilck
2021-04-22 20:24 ` [*RFC* PATCH] " Martin Wilck
2021-04-28 19:54 ` Mike Snitzer [this message]
2021-04-28 20:54 ` Martin Wilck
2021-04-29 6:02 ` Hannes Reinecke
2021-04-30 20:15 ` Mike Snitzer
2021-05-03 7:27 ` Martin Wilck
2021-04-29 8:33 ` Martin Wilck
2021-04-29 8:39 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210428195457.GA46518@lobo \
--to=snitzer@redhat.com \
--cc=agk@redhat.com \
--cc=dm-devel@redhat.com \
--cc=dwagner@suse.de \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=mwilck@suse.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox