xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Bob Liu <bob.liu@oracle.com>, xen-devel@lists.xen.org
Cc: paul.durrant@citrix.com, ian.jackson@eu.citrix.com,
	jbeulich@suse.com, roger.pau@citrix.com
Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
Date: Mon, 29 Feb 2016 09:12:30 +0100	[thread overview]
Message-ID: <56D3FD6E.7000500@suse.com> (raw)
In-Reply-To: <1456717031-13423-1-git-send-email-bob.liu@oracle.com>

On 29/02/16 04:37, Bob Liu wrote:
> 1) What is this patch about?
> This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> extra request which is used to pass through SCSI commands.
> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> It can be extended easily to transmit other per-request/bio data from frontend
> to backend e.g Data Integrity Field per bio.
> 
> 2) Why we need this?
> Currently only raw data segments are transmitted from blkfront to blkback, which
> means some advanced features are lost.
>  * Guest knows nothing about features of the real backend storage.
> 	For example, on bare-metal environment INQUIRY SCSI command can be used
> 	to query storage device information. If it's a SSD or flash device we
> 	can have the option to use the device as a fast cache.
> 	But this can't happen in current domU guests, because blkfront only
> 	knows it's just a normal virtual disk
> 
>  * Failover Clusters in Windows
> 	Failover clusters require SCSI-3 persistent reservation target disks,
> 	but now this can't work in domU.
> 
> 3) Known issues:
>  * Security issues, how to 'validate' this extra request payload.
>    E.g SCSI operates on LUN bases (the whole disk) while we really just want to
>    operate on partitions

It's not only validation: some operations just affect the whole LUN
(e.g. Reserve/Release). And what about "multi-LUN" commands like
"report LUNs"?

>  * Can't pass SCSI commands through if the backend storage driver is bio-based
>    instead of request-based.
> 
> 4) Alternative approach: Using PVSCSI instead:
>  * Doubt PVSCSI can support as many type of backend storage devices as Xen-block.

pvSCSI won't need to support all types of backends. It's enough to
support those where passing through SCSI commands makes sense.

Seems to be a similar issue as the above mentioned problem with
bio-based backend storage drivers.

>  * Much longer path:
>    ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->
> 
>    With xen-block we only need:
>    ioctl() -> blkfront -> blkback ->

I'd like to see performance numbers before making a decision.

>  * xen-block has been existed for many years, widely used and more stable.

Adding another SCSI passthrough capability wasn't accepted for pvSCSI
(that's the reason I used the Target Framework). Why do you think it
will be accepted for pvblk?

This is not my personal opinion, just a heads up from someone who had a
try already. ;-)


Juergen

> Welcome any input, thank you!
> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  xen/include/public/io/blkif.h |   73 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
> 
> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> index 99f0326..7c10bce 100644
> --- a/xen/include/public/io/blkif.h
> +++ b/xen/include/public/io/blkif.h
> @@ -635,6 +635,28 @@
>  #define BLKIF_OP_INDIRECT          6
>  
>  /*
> + * Recognised only if "feature-extra-request" is present in backend xenbus info.
> + * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is followed
> + * in the shared ring buffer.
> + *
> + * By this way, extra data like SCSI command, DIF/DIX and other per-request/bio
> + * data can be transmitted from frontend to backend.
> + *
> + * The 'wire' format is like:
> + *  Request 1: xen_blkif_request
> + * [Request 2: xen_blkif_extra_request]    (only if request 1 has BLKIF_OP_EXTRA_FLAG)
> + *  Request 3: xen_blkif_request
> + *  Request 4: xen_blkif_request
> + * [Request 5: xen_blkif_extra_request]    (only if request 4 has BLKIF_OP_EXTRA_FLAG)
> + *  ...
> + *  Request N: xen_blkif_request
> + *
> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create the
> + * "feature-extra-request" node!
> + */
> +#define BLKIF_OP_EXTRA_FLAG (0x80)
> +
> +/*
>   * Maximum scatter/gather segments per request.
>   * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
>   * NB. This could be 12 if the ring indexes weren't stored in the same page.
> @@ -703,10 +725,61 @@ struct blkif_request_indirect {
>  };
>  typedef struct blkif_request_indirect blkif_request_indirect_t;
>  
> +enum blkif_extra_request_type {
> +	BLKIF_EXTRA_TYPE_SCSI_CMD = 1,		/* Transmit SCSI command.  */
> +};
> +
> +struct scsi_cmd_req {
> +	/*
> +	 * Grant mapping for transmiting SCSI command to backend, and
> +	 * also receive sense data from backend.
> +	 * One 4KB page is enough.
> +	 */
> +	grant_ref_t cmd_gref;
> +	/* Length of SCSI command in the grant mapped page. */
> +	unsigned int cmd_len;
> +
> +	/*
> +	 * SCSI command may require transmiting data segment length less
> +	 * than a sector(512 bytes).
> +	 * Record num_sg and last segment length in extra request so that
> +	 * backend can know about them.
> +	 */
> +	unsigned int num_sg;
> +	unsigned int last_sg_len;
> +};
> +
> +/*
> + * Extra request, must follow a normal-request and a normal-request can
> + * only be followed by one extra request.
> + */
> +struct blkif_request_extra {
> +	uint8_t type;		/* BLKIF_EXTRA_TYPE_* */
> +	uint16_t _pad1;
> +#ifndef CONFIG_X86_32
> +	uint32_t _pad2;		/* offsetof(blkif_...,u.extra.id) == 8 */
> +#endif
> +	uint64_t id;
> +	struct scsi_cmd_req scsi_cmd;
> +} __attribute__((__packed__));
> +typedef struct blkif_request_extra blkif_request_extra_t;
> +
> +struct scsi_cmd_res {
> +	unsigned int resid_len;
> +	/* Length of sense data returned in grant mapped page. */
> +	unsigned int sense_len;
> +};
> +
> +struct blkif_response_extra {
> +	uint8_t type;  /* BLKIF_EXTRA_TYPE_* */
> +	struct scsi_cmd_res scsi_cmd;
> +} __attribute__((__packed__));
> +
>  struct blkif_response {
>      uint64_t        id;              /* copied from request */
>      uint8_t         operation;       /* copied from request */
>      int16_t         status;          /* BLKIF_RSP_???       */
> +    struct blkif_response_extra extra;
>  };
>  typedef struct blkif_response blkif_response_t;
>  
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-02-29  8:12 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-29  3:37 [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands Bob Liu
2016-02-29  8:12 ` Juergen Gross [this message]
2016-02-29 15:05   ` Konrad Rzeszutek Wilk
2016-02-29 15:34     ` Juergen Gross
2016-02-29  9:13 ` Paul Durrant
2016-02-29 14:55   ` Konrad Rzeszutek Wilk
2016-02-29 15:28     ` Paul Durrant
2016-02-29 15:35       ` Roger Pau Monné
2016-02-29 16:48         ` Konrad Rzeszutek Wilk
2016-02-29 16:56           ` Paul Durrant
2016-02-29 16:14 ` Ian Jackson
2016-02-29 16:29   ` Ian Jackson
2016-02-29 23:45     ` Bob Liu
2016-02-29 23:45     ` Bob Liu
2016-03-01 18:08       ` Ian Jackson
2016-03-02  7:39         ` Juergen Gross
2016-03-02  7:57           ` Bob Liu
2016-03-02 11:40             ` Ian Jackson
2016-03-02 11:46               ` Paul Durrant
2016-03-02 12:00                 ` Juergen Gross
2016-03-02 12:28               ` Bob Liu
2016-03-02 14:44                 ` Ian Jackson
     [not found]                   ` <20160302172257.GC27821@char.us.oracle.com>
2016-03-03 11:54                     ` Paul Durrant
2016-03-03 12:03                       ` Ian Jackson
2016-03-03 12:25                         ` Juergen Gross
2016-03-03 14:07                           ` Konrad Rzeszutek Wilk
2016-03-03 14:19                             ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56D3FD6E.7000500@suse.com \
    --to=jgross@suse.com \
    --cc=bob.liu@oracle.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=paul.durrant@citrix.com \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).