From: Juergen Gross <jgross@suse.com>
To: Bob Liu <bob.liu@oracle.com>, xen-devel@lists.xen.org
Cc: paul.durrant@citrix.com, ian.jackson@eu.citrix.com,
jbeulich@suse.com, roger.pau@citrix.com
Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
Date: Mon, 29 Feb 2016 09:12:30 +0100
Message-ID: <56D3FD6E.7000500@suse.com>
In-Reply-To: <1456717031-13423-1-git-send-email-bob.liu@oracle.com>
On 29/02/16 04:37, Bob Liu wrote:
> 1) What is this patch about?
> This patch introduces a new block operation (BLKIF_OP_EXTRA_FLAG).
> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> extra request which is used to pass through SCSI commands.
> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> It can be extended easily to transmit other per-request/bio data from frontend
> to backend, e.g. a Data Integrity Field per bio.
>
> 2) Why do we need this?
> Currently only raw data segments are transmitted from blkfront to blkback, which
> means some advanced features are lost.
> * Guest knows nothing about features of the real backend storage.
> For example, on bare metal the SCSI INQUIRY command can be used
> to query storage device information. If it's an SSD or flash device we
> can have the option to use the device as a fast cache.
> But this can't happen in current domU guests, because blkfront only
> knows it's just a normal virtual disk.
>
> * Failover Clusters in Windows
> Failover clusters require SCSI-3 persistent reservation target disks,
> but now this can't work in domU.
>
> 3) Known issues:
> * Security issues, how to 'validate' this extra request payload.
> E.g. SCSI operates on a per-LUN basis (the whole disk) while we really just want to
> operate on partitions.
It's not only validation: some operations just affect the whole LUN
(e.g. Reserve/Release). And what about "multi-LUN" commands like
"report LUNs"?
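To make that concrete, here is a rough sketch of the kind of opcode check a backend would need before forwarding anything. Only the opcode values are the standard SCSI ones; the enum, the function name classify_cdb() and the grouping itself are made up for illustration and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI operation codes (first byte of the CDB). */
#define SCSI_INQUIRY                 0x12
#define SCSI_RESERVE_6               0x16
#define SCSI_RELEASE_6               0x17
#define SCSI_RESERVE_10              0x56
#define SCSI_RELEASE_10              0x57
#define SCSI_PERSISTENT_RESERVE_OUT  0x5f
#define SCSI_REPORT_LUNS             0xa0

/* Hypothetical classification, not defined anywhere in blkif.h. */
enum cdb_scope {
    CDB_SCOPE_INVALID,    /* empty or missing CDB */
    CDB_SCOPE_QUERY,      /* device query such as INQUIRY */
    CDB_SCOPE_WHOLE_LUN,  /* changes reservation state of the whole LUN */
    CDB_SCOPE_MULTI_LUN,  /* reports on more than one LUN */
    CDB_SCOPE_UNKNOWN     /* anything this sketch does not know about */
};

/* Classify a command by its opcode only; a real filter would need more. */
static enum cdb_scope classify_cdb(const uint8_t *cdb, size_t len)
{
    if (cdb == NULL || len == 0)
        return CDB_SCOPE_INVALID;

    switch (cdb[0]) {
    case SCSI_INQUIRY:
        return CDB_SCOPE_QUERY;
    case SCSI_RESERVE_6:
    case SCSI_RELEASE_6:
    case SCSI_RESERVE_10:
    case SCSI_RELEASE_10:
    case SCSI_PERSISTENT_RESERVE_OUT:
        return CDB_SCOPE_WHOLE_LUN;
    case SCSI_REPORT_LUNS:
        return CDB_SCOPE_MULTI_LUN;
    default:
        return CDB_SCOPE_UNKNOWN;
    }
}
```

A vbd that maps only a partition has no sensible answer for the WHOLE_LUN and MULTI_LUN cases, which is the problem I mean.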
> * Can't pass SCSI commands through if the backend storage driver is bio-based
> instead of request-based.
>
> 4) Alternative approach: Using PVSCSI instead:
> * Doubt PVSCSI can support as many types of backend storage devices as Xen-block.
pvSCSI won't need to support all types of backends. It's enough to
support those where passing through SCSI commands makes sense.
This seems similar to the problem mentioned above with
bio-based backend storage drivers.
> * Much longer path:
> ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->
>
> With xen-block we only need:
> ioctl() -> blkfront -> blkback ->
I'd like to see performance numbers before making a decision.
> * xen-block has existed for many years, is widely used and is more stable.
Adding another SCSI passthrough capability wasn't accepted for pvSCSI
(that's the reason I used the Target Framework). Why do you think it
will be accepted for pvblk?
This is not my personal opinion, just a heads up from someone who had a
try already. ;-)
Juergen
> Welcome any input, thank you!
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
> xen/include/public/io/blkif.h | 73 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 73 insertions(+)
>
> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> index 99f0326..7c10bce 100644
> --- a/xen/include/public/io/blkif.h
> +++ b/xen/include/public/io/blkif.h
> @@ -635,6 +635,28 @@
> #define BLKIF_OP_INDIRECT 6
>
> /*
> + * Recognised only if "feature-extra-request" is present in backend xenbus info.
> + * A request with BLKIF_OP_EXTRA_FLAG set indicates that an extra request follows it
> + * in the shared ring buffer.
> + *
> + * This way, extra data like SCSI commands, DIF/DIX and other per-request/bio
> + * data can be transmitted from frontend to backend.
> + *
> + * The 'wire' format is like:
> + * Request 1: xen_blkif_request
> + * [Request 2: xen_blkif_extra_request] (only if request 1 has BLKIF_OP_EXTRA_FLAG)
> + * Request 3: xen_blkif_request
> + * Request 4: xen_blkif_request
> + * [Request 5: xen_blkif_extra_request] (only if request 4 has BLKIF_OP_EXTRA_FLAG)
> + * ...
> + * Request N: xen_blkif_request
> + *
> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create the
> + * "feature-extra-request" node!
> + */
> +#define BLKIF_OP_EXTRA_FLAG (0x80)
> +
> +/*
> * Maximum scatter/gather segments per request.
> * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
> * NB. This could be 12 if the ring indexes weren't stored in the same page.
> @@ -703,10 +725,61 @@ struct blkif_request_indirect {
> };
> typedef struct blkif_request_indirect blkif_request_indirect_t;
>
> +enum blkif_extra_request_type {
> + BLKIF_EXTRA_TYPE_SCSI_CMD = 1, /* Transmit SCSI command. */
> +};
> +
> +struct scsi_cmd_req {
> + /*
> + * Grant mapping for transmitting the SCSI command to the backend, and
> + * for receiving sense data from the backend.
> + * One 4KB page is enough.
> + */
> + grant_ref_t cmd_gref;
> + /* Length of SCSI command in the grant mapped page. */
> + unsigned int cmd_len;
> +
> + /*
> + * A SCSI command may require transmitting a data segment shorter
> + * than a sector (512 bytes).
> + * Record num_sg and last segment length in extra request so that
> + * backend can know about them.
> + */
> + unsigned int num_sg;
> + unsigned int last_sg_len;
> +};
> +
> +/*
> + * Extra request, must follow a normal-request and a normal-request can
> + * only be followed by one extra request.
> + */
> +struct blkif_request_extra {
> + uint8_t type; /* BLKIF_EXTRA_TYPE_* */
> + uint16_t _pad1;
> +#ifndef CONFIG_X86_32
> + uint32_t _pad2; /* offsetof(blkif_...,u.extra.id) == 8 */
> +#endif
> + uint64_t id;
> + struct scsi_cmd_req scsi_cmd;
> +} __attribute__((__packed__));
> +typedef struct blkif_request_extra blkif_request_extra_t;
> +
> +struct scsi_cmd_res {
> + unsigned int resid_len;
> + /* Length of sense data returned in grant mapped page. */
> + unsigned int sense_len;
> +};
> +
> +struct blkif_response_extra {
> + uint8_t type; /* BLKIF_EXTRA_TYPE_* */
> + struct scsi_cmd_res scsi_cmd;
> +} __attribute__((__packed__));
> +
> struct blkif_response {
> uint64_t id; /* copied from request */
> uint8_t operation; /* copied from request */
> int16_t status; /* BLKIF_RSP_??? */
> + struct blkif_response_extra extra;
> };
> typedef struct blkif_response blkif_response_t;
>
>
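For reference, the 'wire' format described in the patch comment could be consumed roughly as sketched below. This is only an illustration under simplifying assumptions: struct ring_slot, struct walk_result and walk_ring() are hypothetical names, a slot is reduced to its first byte plus an id, and only the BLKIF_OP_* values are taken from blkif.h and the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define BLKIF_OP_READ        0
#define BLKIF_OP_WRITE       1
#define BLKIF_OP_EXTRA_FLAG  0x80   /* value proposed in the patch */

/* Hypothetical, heavily simplified ring slot. */
struct ring_slot {
    uint8_t  operation;   /* BLKIF_OP_* for a normal request, type for an extra one */
    uint64_t id;
};

/* Hypothetical counters for one pass over the ring. */
struct walk_result {
    unsigned int normal;  /* normal requests seen */
    unsigned int extra;   /* extra requests consumed */
};

/*
 * Walk 'count' slots. Returns 0 on success, or -1 if a flagged request is
 * the last slot available, i.e. its extra request is not on the ring yet.
 */
static int walk_ring(const struct ring_slot *slots, size_t count,
                     struct walk_result *res)
{
    size_t i = 0;

    res->normal = 0;
    res->extra = 0;

    while (i < count) {
        uint8_t op = slots[i].operation;

        res->normal++;
        i++;

        if (op & BLKIF_OP_EXTRA_FLAG) {
            if (i >= count)
                return -1;
            /* Slot i is the extra request; never interpret it as a BLKIF_OP_*. */
            res->extra++;
            i++;
        }
    }
    return 0;
}
```

Note that the backend has to handle the case where the flagged request is visible but the extra request is not, which the -1 return stands in for here.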