* [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
@ 2016-02-29 3:37 Bob Liu
2016-02-29 8:12 ` Juergen Gross
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Bob Liu @ 2016-02-29 3:37 UTC (permalink / raw)
To: xen-devel; +Cc: jgross, ian.jackson, Bob Liu, paul.durrant, jbeulich, roger.pau
1) What is this patch about?
This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
extra request which is used to pass through SCSI commands.
This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
It can be extended easily to transmit other per-request/bio data from frontend
to backend e.g Data Integrity Field per bio.
2) Why we need this?
Currently only raw data segments are transmitted from blkfront to blkback, which
means some advanced features are lost.
* Guest knows nothing about features of the real backend storage.
For example, on bare-metal environment INQUIRY SCSI command can be used
to query storage device information. If it's a SSD or flash device we
can have the option to use the device as a fast cache.
But this can't happen in current domU guests, because blkfront only
knows it's just a normal virtual disk
* Failover Clusters in Windows
Failover clusters require SCSI-3 persistent reservation target disks,
but now this can't work in domU.
3) Known issues:
* Security issues, how to 'validate' this extra request payload.
E.g SCSI operates on LUN bases (the whole disk) while we really just want to
operate on partitions
* Can't pass SCSI commands through if the backend storage driver is bio-based
instead of request-based.
4) Alternative approach: Using PVSCSI instead:
* Doubt PVSCSI can support as many type of backend storage devices as Xen-block.
* Much longer path:
ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->
With xen-block we only need:
ioctl() -> blkfront -> blkback ->
* xen-block has been existed for many years, widely used and more stable.
Welcome any input, thank you!
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
xen/include/public/io/blkif.h | 73 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
index 99f0326..7c10bce 100644
--- a/xen/include/public/io/blkif.h
+++ b/xen/include/public/io/blkif.h
@@ -635,6 +635,28 @@
#define BLKIF_OP_INDIRECT 6
/*
+ * Recognised only if "feature-extra-request" is present in backend xenbus info.
+ * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is followed
+ * in the shared ring buffer.
+ *
+ * By this way, extra data like SCSI command, DIF/DIX and other per-request/bio
+ * data can be transmitted from frontend to backend.
+ *
+ * The 'wire' format is like:
+ * Request 1: xen_blkif_request
+ * [Request 2: xen_blkif_extra_request] (only if request 1 has BLKIF_OP_EXTRA_FLAG)
+ * Request 3: xen_blkif_request
+ * Request 4: xen_blkif_request
+ * [Request 5: xen_blkif_extra_request] (only if request 4 has BLKIF_OP_EXTRA_FLAG)
+ * ...
+ * Request N: xen_blkif_request
+ *
+ * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create the
+ * "feature-extra-request" node!
+ */
+#define BLKIF_OP_EXTRA_FLAG (0x80)
+
+/*
* Maximum scatter/gather segments per request.
* This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
* NB. This could be 12 if the ring indexes weren't stored in the same page.
@@ -703,10 +725,61 @@ struct blkif_request_indirect {
};
typedef struct blkif_request_indirect blkif_request_indirect_t;
+enum blkif_extra_request_type {
+ BLKIF_EXTRA_TYPE_SCSI_CMD = 1, /* Transmit SCSI command. */
+};
+
+struct scsi_cmd_req {
+ /*
+ * Grant mapping for transmiting SCSI command to backend, and
+ * also receive sense data from backend.
+ * One 4KB page is enough.
+ */
+ grant_ref_t cmd_gref;
+ /* Length of SCSI command in the grant mapped page. */
+ unsigned int cmd_len;
+
+ /*
+ * SCSI command may require transmiting data segment length less
+ * than a sector(512 bytes).
+ * Record num_sg and last segment length in extra request so that
+ * backend can know about them.
+ */
+ unsigned int num_sg;
+ unsigned int last_sg_len;
+};
+
+/*
+ * Extra request, must follow a normal-request and a normal-request can
+ * only be followed by one extra request.
+ */
+struct blkif_request_extra {
+ uint8_t type; /* BLKIF_EXTRA_TYPE_* */
+ uint16_t _pad1;
+#ifndef CONFIG_X86_32
+ uint32_t _pad2; /* offsetof(blkif_...,u.extra.id) == 8 */
+#endif
+ uint64_t id;
+ struct scsi_cmd_req scsi_cmd;
+} __attribute__((__packed__));
+typedef struct blkif_request_extra blkif_request_extra_t;
+
+struct scsi_cmd_res {
+ unsigned int resid_len;
+ /* Length of sense data returned in grant mapped page. */
+ unsigned int sense_len;
+};
+
+struct blkif_response_extra {
+ uint8_t type; /* BLKIF_EXTRA_TYPE_* */
+ struct scsi_cmd_res scsi_cmd;
+} __attribute__((__packed__));
+
struct blkif_response {
uint64_t id; /* copied from request */
uint8_t operation; /* copied from request */
int16_t status; /* BLKIF_RSP_??? */
+ struct blkif_response_extra extra;
};
typedef struct blkif_response blkif_response_t;
--
1.7.10.4
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 3:37 [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands Bob Liu
@ 2016-02-29 8:12 ` Juergen Gross
2016-02-29 15:05 ` Konrad Rzeszutek Wilk
2016-02-29 9:13 ` Paul Durrant
2016-02-29 16:14 ` Ian Jackson
2 siblings, 1 reply; 27+ messages in thread
From: Juergen Gross @ 2016-02-29 8:12 UTC (permalink / raw)
To: Bob Liu, xen-devel; +Cc: paul.durrant, ian.jackson, jbeulich, roger.pau
On 29/02/16 04:37, Bob Liu wrote:
> 1) What is this patch about?
> This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> extra request which is used to pass through SCSI commands.
> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> It can be extended easily to transmit other per-request/bio data from frontend
> to backend e.g Data Integrity Field per bio.
>
> 2) Why we need this?
> Currently only raw data segments are transmitted from blkfront to blkback, which
> means some advanced features are lost.
> * Guest knows nothing about features of the real backend storage.
> For example, on bare-metal environment INQUIRY SCSI command can be used
> to query storage device information. If it's a SSD or flash device we
> can have the option to use the device as a fast cache.
> But this can't happen in current domU guests, because blkfront only
> knows it's just a normal virtual disk
>
> * Failover Clusters in Windows
> Failover clusters require SCSI-3 persistent reservation target disks,
> but now this can't work in domU.
>
> 3) Known issues:
> * Security issues, how to 'validate' this extra request payload.
> E.g SCSI operates on LUN bases (the whole disk) while we really just want to
> operate on partitions
It's not only validation: some operations just affect the whole LUN
(e.g. Reserve/Release). And what about "multi-LUN" commands like
"report LUNs"?
> * Can't pass SCSI commands through if the backend storage driver is bio-based
> instead of request-based.
>
> 4) Alternative approach: Using PVSCSI instead:
> * Doubt PVSCSI can support as many type of backend storage devices as Xen-block.
pvSCSI won't need to support all types of backends. It's enough to
support those where passing through SCSI commands makes sense.
Seems to be a similar issue as the above mentioned problem with
bio-based backend storage drivers.
> * Much longer path:
> ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->
>
> With xen-block we only need:
> ioctl() -> blkfront -> blkback ->
I'd like to see performance numbers before making a decision.
> * xen-block has been existed for many years, widely used and more stable.
Adding another SCSI passthrough capability wasn't accepted for pvSCSI
(that's the reason I used the Target Framework). Why do you think it
will be accepted for pvblk?
This is not my personal opinion, just a heads up from someone who had a
try already. ;-)
Juergen
> Welcome any input, thank you!
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
> xen/include/public/io/blkif.h | 73 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 73 insertions(+)
>
> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> index 99f0326..7c10bce 100644
> --- a/xen/include/public/io/blkif.h
> +++ b/xen/include/public/io/blkif.h
> @@ -635,6 +635,28 @@
> #define BLKIF_OP_INDIRECT 6
>
> /*
> + * Recognised only if "feature-extra-request" is present in backend xenbus info.
> + * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is followed
> + * in the shared ring buffer.
> + *
> + * By this way, extra data like SCSI command, DIF/DIX and other per-request/bio
> + * data can be transmitted from frontend to backend.
> + *
> + * The 'wire' format is like:
> + * Request 1: xen_blkif_request
> + * [Request 2: xen_blkif_extra_request] (only if request 1 has BLKIF_OP_EXTRA_FLAG)
> + * Request 3: xen_blkif_request
> + * Request 4: xen_blkif_request
> + * [Request 5: xen_blkif_extra_request] (only if request 4 has BLKIF_OP_EXTRA_FLAG)
> + * ...
> + * Request N: xen_blkif_request
> + *
> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create the
> + * "feature-extra-request" node!
> + */
> +#define BLKIF_OP_EXTRA_FLAG (0x80)
> +
> +/*
> * Maximum scatter/gather segments per request.
> * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
> * NB. This could be 12 if the ring indexes weren't stored in the same page.
> @@ -703,10 +725,61 @@ struct blkif_request_indirect {
> };
> typedef struct blkif_request_indirect blkif_request_indirect_t;
>
> +enum blkif_extra_request_type {
> + BLKIF_EXTRA_TYPE_SCSI_CMD = 1, /* Transmit SCSI command. */
> +};
> +
> +struct scsi_cmd_req {
> + /*
> + * Grant mapping for transmiting SCSI command to backend, and
> + * also receive sense data from backend.
> + * One 4KB page is enough.
> + */
> + grant_ref_t cmd_gref;
> + /* Length of SCSI command in the grant mapped page. */
> + unsigned int cmd_len;
> +
> + /*
> + * SCSI command may require transmiting data segment length less
> + * than a sector(512 bytes).
> + * Record num_sg and last segment length in extra request so that
> + * backend can know about them.
> + */
> + unsigned int num_sg;
> + unsigned int last_sg_len;
> +};
> +
> +/*
> + * Extra request, must follow a normal-request and a normal-request can
> + * only be followed by one extra request.
> + */
> +struct blkif_request_extra {
> + uint8_t type; /* BLKIF_EXTRA_TYPE_* */
> + uint16_t _pad1;
> +#ifndef CONFIG_X86_32
> + uint32_t _pad2; /* offsetof(blkif_...,u.extra.id) == 8 */
> +#endif
> + uint64_t id;
> + struct scsi_cmd_req scsi_cmd;
> +} __attribute__((__packed__));
> +typedef struct blkif_request_extra blkif_request_extra_t;
> +
> +struct scsi_cmd_res {
> + unsigned int resid_len;
> + /* Length of sense data returned in grant mapped page. */
> + unsigned int sense_len;
> +};
> +
> +struct blkif_response_extra {
> + uint8_t type; /* BLKIF_EXTRA_TYPE_* */
> + struct scsi_cmd_res scsi_cmd;
> +} __attribute__((__packed__));
> +
> struct blkif_response {
> uint64_t id; /* copied from request */
> uint8_t operation; /* copied from request */
> int16_t status; /* BLKIF_RSP_??? */
> + struct blkif_response_extra extra;
> };
> typedef struct blkif_response blkif_response_t;
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 8:12 ` Juergen Gross
@ 2016-02-29 15:05 ` Konrad Rzeszutek Wilk
2016-02-29 15:34 ` Juergen Gross
0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-29 15:05 UTC (permalink / raw)
To: Juergen Gross; +Cc: ian.jackson, xen-devel, paul.durrant, jbeulich, roger.pau
On Mon, Feb 29, 2016 at 09:12:30AM +0100, Juergen Gross wrote:
> On 29/02/16 04:37, Bob Liu wrote:
> > 1) What is this patch about?
> > This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> > A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> > extra request which is used to pass through SCSI commands.
> > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > It can be extended easily to transmit other per-request/bio data from frontend
> > to backend e.g Data Integrity Field per bio.
> >
> > 2) Why we need this?
> > Currently only raw data segments are transmitted from blkfront to blkback, which
> > means some advanced features are lost.
> > * Guest knows nothing about features of the real backend storage.
> > For example, on bare-metal environment INQUIRY SCSI command can be used
> > to query storage device information. If it's a SSD or flash device we
> > can have the option to use the device as a fast cache.
> > But this can't happen in current domU guests, because blkfront only
> > knows it's just a normal virtual disk
> >
> > * Failover Clusters in Windows
> > Failover clusters require SCSI-3 persistent reservation target disks,
> > but now this can't work in domU.
> >
> > 3) Known issues:
> > * Security issues, how to 'validate' this extra request payload.
> > E.g SCSI operates on LUN bases (the whole disk) while we really just want to
> > operate on partitions
>
> It's not only validation: some operations just affect the whole LUN
> (e.g. Reserve/Release). And what about "multi-LUN" commands like
> "report LUNs"?
Don't expose them. Bob and I want to get an idea of what would be a good
compromise to allow some SCSI specific (or perhaps ATA specific or DIF/DIX?) type of
commands go through the PV driver.
Would it be better if it was through XenBus? But that may not work for some
that are tied closely to requests, such as DIF/DIX.
However the 'DISCARD' for example worked out - it is an umbrella for both
SCSI UNMAP and ATA DISCARD operation and hides the complexity of the low level
protocol. Could there be an 'INQ' ? Since the SCSI VPD is the most exhaustive
in terms of details it may make sense to base it on that..?
>
> > * Can't pass SCSI commands through if the backend storage driver is bio-based
> > instead of request-based.
> >
> > 4) Alternative approach: Using PVSCSI instead:
> > * Doubt PVSCSI can support as many type of backend storage devices as Xen-block.
>
> pvSCSI won't need to support all types of backends. It's enough to
> support those where passing through SCSI commands makes sense.
>
> Seems to be a similar issue as the above mentioned problem with
> bio-based backend storage drivers.
In particular the ones we care about are:
- Multipath over FibreChannel devices.
- Linear mapping (LVM) over the multipath.
- And then potentially an filesystem on top of that
- .. and a raw file on the filesystem.
Having SCSI VPD 0x83 page sent to frontend for that would be good.
Not sure about SCSI reservations. Perhaps those are more of .. unique
in that the implementation would have to make sure that the guest
owns the whole LUN. But that is implementation question.
This is about the design - how would you envision to to cram in
SCSI commands or DIF/DIX commands or ATA commands via PV block layer?
>
> > * Much longer path:
> > ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->
> >
> > With xen-block we only need:
> > ioctl() -> blkfront -> blkback ->
>
> I'd like to see performance numbers before making a decision.
For SCSI INQ? <laughs>
Or are you talking about raw READ/WRITE?
>
> > * xen-block has been existed for many years, widely used and more stable.
>
> Adding another SCSI passthrough capability wasn't accepted for pvSCSI
> (that's the reason I used the Target Framework). Why do you think it
> will be accepted for pvblk?
>
> This is not my personal opinion, just a heads up from someone who had a
> try already. ;-)
Right. So SCSI passthrough out. What about mediated access for
specific SCSI, or specific ATA, or DIF/DIX ones? And how would you do it
knowing the SCSI maintainers much better than we do?
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 15:05 ` Konrad Rzeszutek Wilk
@ 2016-02-29 15:34 ` Juergen Gross
0 siblings, 0 replies; 27+ messages in thread
From: Juergen Gross @ 2016-02-29 15:34 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: ian.jackson, xen-devel, paul.durrant, jbeulich, roger.pau
On 29/02/16 16:05, Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 29, 2016 at 09:12:30AM +0100, Juergen Gross wrote:
>> On 29/02/16 04:37, Bob Liu wrote:
>>> 1) What is this patch about?
>>> This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
>>> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
>>> extra request which is used to pass through SCSI commands.
>>> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
>>> It can be extended easily to transmit other per-request/bio data from frontend
>>> to backend e.g Data Integrity Field per bio.
>>>
>>> 2) Why we need this?
>>> Currently only raw data segments are transmitted from blkfront to blkback, which
>>> means some advanced features are lost.
>>> * Guest knows nothing about features of the real backend storage.
>>> For example, on bare-metal environment INQUIRY SCSI command can be used
>>> to query storage device information. If it's a SSD or flash device we
>>> can have the option to use the device as a fast cache.
>>> But this can't happen in current domU guests, because blkfront only
>>> knows it's just a normal virtual disk
>>>
>>> * Failover Clusters in Windows
>>> Failover clusters require SCSI-3 persistent reservation target disks,
>>> but now this can't work in domU.
>>>
>>> 3) Known issues:
>>> * Security issues, how to 'validate' this extra request payload.
>>> E.g SCSI operates on LUN bases (the whole disk) while we really just want to
>>> operate on partitions
>>
>> It's not only validation: some operations just affect the whole LUN
>> (e.g. Reserve/Release). And what about "multi-LUN" commands like
>> "report LUNs"?
>
> Don't expose them. Bob and I want to get an idea of what would be a good
> compromise to allow some SCSI specific (or perhaps ATA specific or DIF/DIX?) type of
> commands go through the PV driver.
>
> Would it be better if it was through XenBus? But that may not work for some
> that are tied closely to requests, such as DIF/DIX.
>
> However the 'DISCARD' for example worked out - it is an umbrella for both
> SCSI UNMAP and ATA DISCARD operation and hides the complexity of the low level
> protocol. Could there be an 'INQ' ? Since the SCSI VPD is the most exhaustive
> in terms of details it may make sense to base it on that..?
>
>>
>>> * Can't pass SCSI commands through if the backend storage driver is bio-based
>>> instead of request-based.
>>>
>>> 4) Alternative approach: Using PVSCSI instead:
>>> * Doubt PVSCSI can support as many type of backend storage devices as Xen-block.
>>
>> pvSCSI won't need to support all types of backends. It's enough to
>> support those where passing through SCSI commands makes sense.
>>
>> Seems to be a similar issue as the above mentioned problem with
>> bio-based backend storage drivers.
>
> In particular the ones we care about are:
> - Multipath over FibreChannel devices.
> - Linear mapping (LVM) over the multipath.
> - And then potentially an filesystem on top of that
> - .. and a raw file on the filesystem.
>
> Having SCSI VPD 0x83 page sent to frontend for that would be good.
>
> Not sure about SCSI reservations. Perhaps those are more of .. unique
> in that the implementation would have to make sure that the guest
> owns the whole LUN. But that is implementation question.
>
> This is about the design - how would you envision to to cram in
> SCSI commands or DIF/DIX commands or ATA commands via PV block layer?
Have some kind of abstraction which can be mapped to SCSI commands
easily, but don't stick to the SCSI definitions. Use your own
structures, commands etc. and build SCSI commands from those in the
backend. This way you avoid having to emulate SCSI. Instead of
naming it "SCSI passthrough" call it "special commands". :-)
I would add only those operations you really need. Add an inquiry
operation which returns the supported "special commands". The
availability of the inquiry can be reported via Xenstore.
>
>>
>>> * Much longer path:
>>> ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->
>>>
>>> With xen-block we only need:
>>> ioctl() -> blkfront -> blkback ->
>>
>> I'd like to see performance numbers before making a decision.
>
> For SCSI INQ? <laughs>
>
> Or are you talking about raw READ/WRITE?
READ/WRITE. I'm quite sure pvSCSI will be slower than pvblk, but I don't
know how much slower.
>>
>>> * xen-block has been existed for many years, widely used and more stable.
>>
>> Adding another SCSI passthrough capability wasn't accepted for pvSCSI
>> (that's the reason I used the Target Framework). Why do you think it
>> will be accepted for pvblk?
>>
>> This is not my personal opinion, just a heads up from someone who had a
>> try already. ;-)
>
> Right. So SCSI passthrough out. What about mediated access for
> specific SCSI, or specific ATA, or DIF/DIX ones? And how would you do it
> knowing the SCSI maintainers much better than we do?
Don't call it SCSI command passthrough, or use the target framework.
As stated above: I think renaming the whole feature is better. You'll
avoid much trouble and weird configuration problems.
Juergen
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 3:37 [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands Bob Liu
2016-02-29 8:12 ` Juergen Gross
@ 2016-02-29 9:13 ` Paul Durrant
2016-02-29 14:55 ` Konrad Rzeszutek Wilk
2016-02-29 16:14 ` Ian Jackson
2 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2016-02-29 9:13 UTC (permalink / raw)
To: Bob Liu, xen-devel@lists.xen.org
Cc: jgross@suse.com, Ian Jackson, jbeulich@suse.com, Roger Pau Monne
> -----Original Message-----
> From: Bob Liu [mailto:bob.liu@oracle.com]
> Sent: 29 February 2016 03:37
> To: xen-devel@lists.xen.org
> Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne; jgross@suse.com;
> Paul Durrant; konrad.wilk@oracle.com; Bob Liu
> Subject: [RFC PATCH] xen-block: introduces extra request to pass-through
> SCSI commands
>
> 1) What is this patch about?
> This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> extra request which is used to pass through SCSI commands.
> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> It can be extended easily to transmit other per-request/bio data from
> frontend
> to backend e.g Data Integrity Field per bio.
>
> 2) Why we need this?
> Currently only raw data segments are transmitted from blkfront to blkback,
> which
> means some advanced features are lost.
> * Guest knows nothing about features of the real backend storage.
> For example, on bare-metal environment INQUIRY SCSI command
> can be used
> to query storage device information. If it's a SSD or flash device we
> can have the option to use the device as a fast cache.
> But this can't happen in current domU guests, because blkfront only
> knows it's just a normal virtual disk
>
That's the sort of information that should be advertised via xenstore then. There already feature flags for specific things but if some form of throughput/latency information is meaningful to a frontend stack then perhaps that can be advertised too.
> * Failover Clusters in Windows
> Failover clusters require SCSI-3 persistent reservation target disks,
> but now this can't work in domU.
>
That's true but allowing arbitrary SCSI messages through is not the way forward IMO. Just because Windows thinks every HBA is SCSI doesn't mean other OS do so I think reservation/release should have dedicated messages in the blkif protocol if it's desirable to support clustering in the frontend.
> 3) Known issues:
> * Security issues, how to 'validate' this extra request payload.
> E.g SCSI operates on LUN bases (the whole disk) while we really just want
> to
> operate on partitions
>
> * Can't pass SCSI commands through if the backend storage driver is bio-
> based
> instead of request-based.
>
> 4) Alternative approach: Using PVSCSI instead:
> * Doubt PVSCSI can support as many type of backend storage devices as
> Xen-block.
>
LIO can interface to any block device in much the same way blkback does IIRC.
> * Much longer path:
> ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-
> backend -> Target framework(LIO?) ->
>
> With xen-block we only need:
> ioctl() -> blkfront -> blkback ->
>
...and what happens if the block device that blkback is talking to is a SCSI LUN?
That latter path is also not true for Windows. You've got all the SCSI translation logic in the frontend when using blkif so that first path would collapse to:
Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX
> * xen-block has been existed for many years, widely used and more stable.
>
It's definitely widely used, but it has had stability issues in recent times.
Paul
> Welcome any input, thank you!
>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
> xen/include/public/io/blkif.h | 73
> +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 73 insertions(+)
>
> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> index 99f0326..7c10bce 100644
> --- a/xen/include/public/io/blkif.h
> +++ b/xen/include/public/io/blkif.h
> @@ -635,6 +635,28 @@
> #define BLKIF_OP_INDIRECT 6
>
> /*
> + * Recognised only if "feature-extra-request" is present in backend xenbus
> info.
> + * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is
> followed
> + * in the shared ring buffer.
> + *
> + * By this way, extra data like SCSI command, DIF/DIX and other per-
> request/bio
> + * data can be transmitted from frontend to backend.
> + *
> + * The 'wire' format is like:
> + * Request 1: xen_blkif_request
> + * [Request 2: xen_blkif_extra_request] (only if request 1 has
> BLKIF_OP_EXTRA_FLAG)
> + * Request 3: xen_blkif_request
> + * Request 4: xen_blkif_request
> + * [Request 5: xen_blkif_extra_request] (only if request 4 has
> BLKIF_OP_EXTRA_FLAG)
> + * ...
> + * Request N: xen_blkif_request
> + *
> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not*
> create the
> + * "feature-extra-request" node!
> + */
> +#define BLKIF_OP_EXTRA_FLAG (0x80)
> +
> +/*
> * Maximum scatter/gather segments per request.
> * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
> * NB. This could be 12 if the ring indexes weren't stored in the same page.
> @@ -703,10 +725,61 @@ struct blkif_request_indirect {
> };
> typedef struct blkif_request_indirect blkif_request_indirect_t;
>
> +enum blkif_extra_request_type {
> + BLKIF_EXTRA_TYPE_SCSI_CMD = 1, /* Transmit SCSI
> command. */
> +};
> +
> +struct scsi_cmd_req {
> + /*
> + * Grant mapping for transmiting SCSI command to backend, and
> + * also receive sense data from backend.
> + * One 4KB page is enough.
> + */
> + grant_ref_t cmd_gref;
> + /* Length of SCSI command in the grant mapped page. */
> + unsigned int cmd_len;
> +
> + /*
> + * SCSI command may require transmiting data segment length less
> + * than a sector(512 bytes).
> + * Record num_sg and last segment length in extra request so that
> + * backend can know about them.
> + */
> + unsigned int num_sg;
> + unsigned int last_sg_len;
> +};
> +
> +/*
> + * Extra request, must follow a normal-request and a normal-request can
> + * only be followed by one extra request.
> + */
> +struct blkif_request_extra {
> + uint8_t type; /* BLKIF_EXTRA_TYPE_* */
> + uint16_t _pad1;
> +#ifndef CONFIG_X86_32
> + uint32_t _pad2; /* offsetof(blkif_...,u.extra.id) == 8
> */
> +#endif
> + uint64_t id;
> + struct scsi_cmd_req scsi_cmd;
> +} __attribute__((__packed__));
> +typedef struct blkif_request_extra blkif_request_extra_t;
> +
> +struct scsi_cmd_res {
> + unsigned int resid_len;
> + /* Length of sense data returned in grant mapped page. */
> + unsigned int sense_len;
> +};
> +
> +struct blkif_response_extra {
> + uint8_t type; /* BLKIF_EXTRA_TYPE_* */
> + struct scsi_cmd_res scsi_cmd;
> +} __attribute__((__packed__));
> +
> struct blkif_response {
> uint64_t id; /* copied from request */
> uint8_t operation; /* copied from request */
> int16_t status; /* BLKIF_RSP_??? */
> + struct blkif_response_extra extra;
> };
> typedef struct blkif_response blkif_response_t;
>
> --
> 1.7.10.4
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 9:13 ` Paul Durrant
@ 2016-02-29 14:55 ` Konrad Rzeszutek Wilk
2016-02-29 15:28 ` Paul Durrant
0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-29 14:55 UTC (permalink / raw)
To: Paul Durrant
Cc: jgross@suse.com, xen-devel@lists.xen.org, jbeulich@suse.com,
Ian Jackson, Roger Pau Monne
On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Bob Liu [mailto:bob.liu@oracle.com]
> > Sent: 29 February 2016 03:37
> > To: xen-devel@lists.xen.org
> > Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne; jgross@suse.com;
> > Paul Durrant; konrad.wilk@oracle.com; Bob Liu
> > Subject: [RFC PATCH] xen-block: introduces extra request to pass-through
> > SCSI commands
> >
> > 1) What is this patch about?
> > This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> > A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> > extra request which is used to pass through SCSI commands.
> > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > It can be extended easily to transmit other per-request/bio data from
> > frontend
> > to backend e.g Data Integrity Field per bio.
> >
> > 2) Why we need this?
> > Currently only raw data segments are transmitted from blkfront to blkback,
> > which
> > means some advanced features are lost.
> > * Guest knows nothing about features of the real backend storage.
> > For example, on bare-metal environment INQUIRY SCSI command
> > can be used
> > to query storage device information. If it's a SSD or flash device we
> > can have the option to use the device as a fast cache.
> > But this can't happen in current domU guests, because blkfront only
> > knows it's just a normal virtual disk
> >
>
> That's the sort of information that should be advertised via xenstore then. There already feature flags for specific things but if some form of throughput/latency information is meaningful to a frontend stack then perhaps that can be advertised too.
Certainly could be put on the XenStore. Do you envision this being done
pre guest creation (so toolstack does it), or the backend finds this
and populates the XenStore keys?
Or that the frontend writes an XenStore key 'scsi-inq=vpd80' and the backend
responds by populating an 'scsi-inq-vpd80=' <binary blob>'? If so can
the XenStore accept binary payloads? Can it be more than 4K?
>
> > * Failover Clusters in Windows
> > Failover clusters require SCSI-3 persistent reservation target disks,
> > but now this can't work in domU.
> >
>
> That's true but allowing arbitrary SCSI messages through is not the way forward IMO. Just because Windows thinks every HBA is SCSI doesn't mean other OS do so I think reservation/release should have dedicated messages in the blkif protocol if it's desirable to support clustering in the frontend.
Could you expand a bit on the 'dedicated message' you have in mind please?
>
> > 3) Known issues:
> > * Security issues, how to 'validate' this extra request payload.
> > E.g SCSI operates on LUN bases (the whole disk) while we really just want
> > to
> > operate on partitions
> >
> > * Can't pass SCSI commands through if the backend storage driver is bio-
> > based
> > instead of request-based.
> >
> > 4) Alternative approach: Using PVSCSI instead:
> > * Doubt PVSCSI can support as many type of backend storage devices as
> > Xen-block.
> >
>
> LIO can interface to any block device in much the same way blkback does IIRC.
But it can't do multipath or LVMs - which is an integral component.
Anyhow that is more of a implementation specific quirk.
>
> > * Much longer path:
> > ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-
> > backend -> Target framework(LIO?) ->
> >
> > With xen-block we only need:
> > ioctl() -> blkfront -> blkback ->
> >
>
> ...and what happens if the block device that blkback is talking to is a SCSI LUN?
>
> That latter path is also not true for Windows. You've got all the SCSI translation logic in the frontend when using blkif so that first path would collapse to:
>
> Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX
I don't know if it matters on the length of the path for say SCSI INQ. It isn't like
that is performance specific. Neither are the clustering SCSI commands.
>
> > * xen-block has been existed for many years, widely used and more stable.
> >
>
> It's definitely widely used, but it has had stability issues in recent times.
Oh? Could you send the bug-reports to me and Roger, CC xen-devel and LKML please ?
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 14:55 ` Konrad Rzeszutek Wilk
@ 2016-02-29 15:28 ` Paul Durrant
2016-02-29 15:35 ` Roger Pau Monné
0 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2016-02-29 15:28 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: jgross@suse.com, xen-devel@lists.xen.org, jbeulich@suse.com,
Ian Jackson, Roger Pau Monne
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: 29 February 2016 14:56
> To: Paul Durrant
> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeulich@suse.com; Roger
> Pau Monne; jgross@suse.com
> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-
> through SCSI commands
>
> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Bob Liu [mailto:bob.liu@oracle.com]
> > > Sent: 29 February 2016 03:37
> > > To: xen-devel@lists.xen.org
> > > Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne;
> jgross@suse.com;
> > > Paul Durrant; konrad.wilk@oracle.com; Bob Liu
> > > Subject: [RFC PATCH] xen-block: introduces extra request to pass-
> through
> > > SCSI commands
> > >
> > > 1) What is this patch about?
> > > This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> > > A request with BLKIF_OP_EXTRA_FLAG set means the following request
> is an
> > > extra request which is used to pass through SCSI commands.
> > > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> > > It can be extended easily to transmit other per-request/bio data from
> > > frontend
> > > to backend e.g Data Integrity Field per bio.
> > >
> > > 2) Why we need this?
> > > Currently only raw data segments are transmitted from blkfront to
> blkback,
> > > which
> > > means some advanced features are lost.
> > > * Guest knows nothing about features of the real backend storage.
> > > For example, on bare-metal environment INQUIRY SCSI command
> > > can be used
> > > to query storage device information. If it's a SSD or flash device we
> > > can have the option to use the device as a fast cache.
> > > But this can't happen in current domU guests, because blkfront only
> > > knows it's just a normal virtual disk
> > >
> >
> > That's the sort of information that should be advertised via xenstore then.
> There already feature flags for specific things but if some form of
> throughput/latency information is meaningful to a frontend stack then
> perhaps that can be advertised too.
>
> Certainly could be put on the XenStore. Do you envision this being done
> pre guest creation (so toolstack does it), or the backend finds this
> and populates the XenStore keys?
>
> Or that the frontend writes an XenStore key 'scsi-inq=vpd80' and the
> backend
> responds by populating an 'scsi-inq-vpd80=' <binary blob>'? If so can
> the XenStore accept binary payloads? Can it be more than 4K?
>
I was thinking more along the lines of blkback creating xenstore keys with any relevant information. We have sector size and number of sectors already but I don't see any harm in having values for other quantities that may be useful to a frontend. Bouncing SCSI inquiries via xenstore was certainly not what I was thinking.
>
> >
> > > * Failover Clusters in Windows
> > > Failover clusters require SCSI-3 persistent reservation target disks,
> > > but now this can't work in domU.
> > >
> >
> > That's true but allowing arbitrary SCSI messages through is not the way
> forward IMO. Just because Windows thinks every HBA is SCSI doesn't mean
> other OS do so I think reservation/release should have dedicated messages
> in the blkif protocol if it's desirable to support clustering in the frontend.
>
> Could you expand a bit on the 'dedicated message' you have in mind please?
>
As in we create message types to reserve/release whatever backend is being used and the backend uses whatever mechanism is appropriate to deal with that. E.g. if it were qdisk talking to a raw file then that might just be an flock.
> >
> > > 3) Known issues:
> > > * Security issues, how to 'validate' this extra request payload.
> > > E.g SCSI operates on LUN bases (the whole disk) while we really just
> want
> > > to
> > > operate on partitions
> > >
> > > * Can't pass SCSI commands through if the backend storage driver is bio-
> > > based
> > > instead of request-based.
> > >
> > > 4) Alternative approach: Using PVSCSI instead:
> > > * Doubt PVSCSI can support as many type of backend storage devices as
> > > Xen-block.
> > >
> >
> > LIO can interface to any block device in much the same way blkback does
> IIRC.
>
> But it can't do multipath or LVMs - which is an integral component.
>
Really? I was not aware of that limitation and it surprises me since AFAIK LIO can also use a raw file as a backstore which seems like it would be above either of those.
> Anyhow that is more of a implementation specific quirk.
> >
> > > * Much longer path:
> > > ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-
> > > backend -> Target framework(LIO?) ->
> > >
> > > With xen-block we only need:
> > > ioctl() -> blkfront -> blkback ->
> > >
> >
> > ...and what happens if the block device that blkback is talking to is a SCSI
> LUN?
> >
> > That latter path is also not true for Windows. You've got all the SCSI
> translation logic in the frontend when using blkif so that first path would
> collapse to:
> >
> > Disk driver -> (SCSI) HBA Driver -> xen-scsiback -> LIO -> backstore -> XXX
>
> I don't know if it matters on the length of the path for say SCSI INQ. It isn't
> like
> that is performance specific. Neither are the clustering SCSI commands.
>
> >
> > > * xen-block has been existed for many years, widely used and more
> stable.
> > >
> >
> > It's definitely widely used, but it has had stability issues in recent times.
>
> Oh? Could you send the bug-reports to me and Roger, CC xen-devel and
> LKML please ?
I was casting my mind back to incompatibilities that crept in with persistent grants. TBH I haven't used blkback much since then; I tend to use qemu qdisk as my backend these days.
Paul
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 15:28 ` Paul Durrant
@ 2016-02-29 15:35 ` Roger Pau Monné
2016-02-29 16:48 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2016-02-29 15:35 UTC (permalink / raw)
To: Paul Durrant, Konrad Rzeszutek Wilk
Cc: jgross@suse.com, Ian Jackson, jbeulich@suse.com,
xen-devel@lists.xen.org
El 29/2/16 a les 16:28, Paul Durrant ha escrit:
>> -----Original Message-----
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> Sent: 29 February 2016 14:56
>> To: Paul Durrant
>> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeulich@suse.com; Roger
>> Pau Monne; jgross@suse.com
>> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-
>> through SCSI commands
>>
>> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Bob Liu [mailto:bob.liu@oracle.com]
>>>> Sent: 29 February 2016 03:37
>>>> To: xen-devel@lists.xen.org
>>>> Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne;
>> jgross@suse.com;
>>>> Paul Durrant; konrad.wilk@oracle.com; Bob Liu
>>>> Subject: [RFC PATCH] xen-block: introduces extra request to pass-
>>>> * xen-block has been existed for many years, widely used and more
>> stable.
>>>>
>>>
>>> It's definitely widely used, but it has had stability issues in recent times.
>>
>> Oh? Could you send the bug-reports to me and Roger, CC xen-devel and
>> LKML please ?
>
> I was casting my mind back to incompatibilities that crept in with persistent grants. TBH I haven't used blkback much since then; I tend to use qemu qdisk as my backend these days.
FWIW, QEMU Qdisk also has persistent grants support... and was
implemented more or less at the same time as blkback persistent grants.
Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 15:35 ` Roger Pau Monné
@ 2016-02-29 16:48 ` Konrad Rzeszutek Wilk
2016-02-29 16:56 ` Paul Durrant
0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-29 16:48 UTC (permalink / raw)
To: Roger Pau Monné
Cc: jgross@suse.com, xen-devel@lists.xen.org, Paul Durrant,
jbeulich@suse.com, Ian Jackson
On Mon, Feb 29, 2016 at 04:35:36PM +0100, Roger Pau Monné wrote:
> El 29/2/16 a les 16:28, Paul Durrant ha escrit:
> >> -----Original Message-----
> >> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> >> Sent: 29 February 2016 14:56
> >> To: Paul Durrant
> >> Cc: Bob Liu; xen-devel@lists.xen.org; Ian Jackson; jbeulich@suse.com; Roger
> >> Pau Monne; jgross@suse.com
> >> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-
> >> through SCSI commands
> >>
> >> On Mon, Feb 29, 2016 at 09:13:41AM +0000, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Bob Liu [mailto:bob.liu@oracle.com]
> >>>> Sent: 29 February 2016 03:37
> >>>> To: xen-devel@lists.xen.org
> >>>> Cc: Ian Jackson; jbeulich@suse.com; Roger Pau Monne;
> >> jgross@suse.com;
> >>>> Paul Durrant; konrad.wilk@oracle.com; Bob Liu
> >>>> Subject: [RFC PATCH] xen-block: introduces extra request to pass-
> >>>> * xen-block has been existed for many years, widely used and more
> >> stable.
> >>>>
> >>>
> >>> It's definitely widely used, but it has had stability issues in recent times.
> >>
> >> Oh? Could you send the bug-reports to me and Roger, CC xen-devel and
> >> LKML please ?
> >
> > I was casting my mind back to incompatibilities that crept in with persistent grants. TBH I haven't used blkback much since then; I tend to use qemu qdisk as my backend these days.
>
> FWIW, QEMU Qdisk also has persistent grants support... and was
> implemented more or less at the same time as blkback persistent grants.
<nods>
Perhaps Pauls' definition of 'recent times' means == two years or so :-)
:-)
>
> Roger.
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 16:48 ` Konrad Rzeszutek Wilk
@ 2016-02-29 16:56 ` Paul Durrant
0 siblings, 0 replies; 27+ messages in thread
From: Paul Durrant @ 2016-02-29 16:56 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Roger Pau Monne
Cc: jgross@suse.com, Ian Jackson, jbeulich@suse.com,
xen-devel@lists.xen.org
> -----Original Message-----
[snip]
> > > I was casting my mind back to incompatibilities that crept in with
> persistent grants. TBH I haven't used blkback much since then; I tend to use
> qemu qdisk as my backend these days.
> >
> > FWIW, QEMU Qdisk also has persistent grants support... and was
> > implemented more or less at the same time as blkback persistent grants.
>
> <nods>
>
> Perhaps Pauls' definition of 'recent times' means == two years or so :-)
>
> :-)
Indeed. That's a blink of an eye for someone as ancient as me :-)
Paul
> >
> > Roger.
> >
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 3:37 [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands Bob Liu
2016-02-29 8:12 ` Juergen Gross
2016-02-29 9:13 ` Paul Durrant
@ 2016-02-29 16:14 ` Ian Jackson
2016-02-29 16:29 ` Ian Jackson
2 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2016-02-29 16:14 UTC (permalink / raw)
To: Bob Liu; +Cc: jgross, xen-devel, paul.durrant, jbeulich, roger.pau
Bob Liu writes ("[RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> 4) Alternative approach: Using PVSCSI instead:
> * Doubt PVSCSI can support as many type of backend storage devices
> as Xen-block.
I don't understand why this would be the case. Your "extra requests"
are SCSI CDBs, so your proposal only works with SCSI targets.
> * Much longer path:
> ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> PVSCSI-backend -> Target framework(LIO?) ->
The pvscsi pieces here is precisely the multiplexer that you really
want, for this all to work correctly.
If you want your guests to be able to drive this thing properly, you
need to make the thing look like a SCSI target anyway. So effectively
you've reinvented PVSCSI.
When you say "longer path" do you really mean that the performance is
poor ? I don't think this is due to the number of layers here. The
layers are quite thin.
If there is a performance problem with PVSCSI then perhaps that should
be investigated.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 16:14 ` Ian Jackson
@ 2016-02-29 16:29 ` Ian Jackson
2016-02-29 23:45 ` Bob Liu
2016-02-29 23:45 ` Bob Liu
0 siblings, 2 replies; 27+ messages in thread
From: Ian Jackson @ 2016-02-29 16:29 UTC (permalink / raw)
To: Bob Liu, xen-devel, jbeulich, roger.pau, jgross, paul.durrant,
konrad.wilk
Ian Jackson writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> [stuff suggesting use of PVSCSI instead]
For the avoidance of doubt:
1. Thanks very much for bringing this proposal to us at the concept
stage. It is much easier to discuss these matters in a constructive
way before a lot of effort has been put into an implementation.
2. I should explain the downsides which I see in your proposal:
- Your suggestion has bad security properties: previously, the PV
block protocol would present only a very simple and narrow
interface. Your SCSI CDB passthrough proposal means that guests
would be able to activate features in SCSI targets which would be
unexpected and unintended by the host administrator. Such features
would perhaps even be unknown to the host administrator.
This could be mitigated by making this feature configurable, of
course, defaulting to off, along with clear documentation. But it's
not a desirable property.
- For similar reasons it will often be difficult to use such a feature
safely. Guest software in particular might expect that it can
safely use whatever features it can see, and do all sorts of
exciting things.
- It involves duplicating multiplexing logic which already exists in
PVSCSI.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 16:29 ` Ian Jackson
@ 2016-02-29 23:45 ` Bob Liu
2016-02-29 23:45 ` Bob Liu
1 sibling, 0 replies; 27+ messages in thread
From: Bob Liu @ 2016-02-29 23:45 UTC (permalink / raw)
To: Ian Jackson; +Cc: jgross, xen-devel, paul.durrant, jbeulich, roger.pau
On 03/01/2016 12:29 AM, Ian Jackson wrote:
> Ian Jackson writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
>> [stuff suggesting use of PVSCSI instead]
>
> For the avoidance of doubt:
>
> 1. Thanks very much for bringing this proposal to us at the concept
> stage. It is much easier to discuss these matters in a constructive
> way before a lot of effort has been put into an implementation.
>
> 2. I should explain the downsides which I see in your proposal:
>
> - Your suggestion has bad security properties: previously, the PV
> block protocol would present only a very simple and narrow
> interface. Your SCSI CDB passthrough proposal means that guests
> would be able to activate features in SCSI targets which would be
> unexpected and unintended by the host administrator. Such features
> would perhaps even be unknown to the host administrator.
>
> This could be mitigated by making this feature configurable, of
> course, defaulting to off, along with clear documentation. But it's
> not a desirable property.
>
> - For similar reasons it will often be difficult to use such a feature
> safely. Guest software in particular might expect that it can
> safely use whatever features it can see, and do all sorts of
> exciting things.
>
> - It involves duplicating multiplexing logic which already exists in
> PVSCSI.
>
One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device.
E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may send SCSI operates on LUN bases (the whole disk).
P.S. Thanks to all of you, it helps a lot!
--
Regards,
-Bob
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 16:29 ` Ian Jackson
2016-02-29 23:45 ` Bob Liu
@ 2016-02-29 23:45 ` Bob Liu
2016-03-01 18:08 ` Ian Jackson
1 sibling, 1 reply; 27+ messages in thread
From: Bob Liu @ 2016-02-29 23:45 UTC (permalink / raw)
To: Ian Jackson; +Cc: jgross, xen-devel, paul.durrant, jbeulich, roger.pau
On 03/01/2016 12:29 AM, Ian Jackson wrote:
> Ian Jackson writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
>> [stuff suggesting use of PVSCSI instead]
>
> For the avoidance of doubt:
>
> 1. Thanks very much for bringing this proposal to us at the concept
> stage. It is much easier to discuss these matters in a constructive
> way before a lot of effort has been put into an implementation.
>
> 2. I should explain the downsides which I see in your proposal:
>
> - Your suggestion has bad security properties: previously, the PV
> block protocol would present only a very simple and narrow
> interface. Your SCSI CDB passthrough proposal means that guests
> would be able to activate features in SCSI targets which would be
> unexpected and unintended by the host administrator. Such features
> would perhaps even be unknown to the host administrator.
>
> This could be mitigated by making this feature configurable, of
> course, defaulting to off, along with clear documentation. But it's
> not a desirable property.
>
> - For similar reasons it will often be difficult to use such a feature
> safely. Guest software in particular might expect that it can
> safely use whatever features it can see, and do all sorts of
> exciting things.
>
> - It involves duplicating multiplexing logic which already exists in
> PVSCSI.
>
One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device.
E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk).
P.S. Thanks to all of you, it helps a lot!
--
Regards,
-Bob
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-02-29 23:45 ` Bob Liu
@ 2016-03-01 18:08 ` Ian Jackson
2016-03-02 7:39 ` Juergen Gross
0 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2016-03-01 18:08 UTC (permalink / raw)
To: Bob Liu; +Cc: jgross, xen-devel, paul.durrant, jbeulich, roger.pau
Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device.
> E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk).
I don't think you can use pvscsi to passthrough a partition such as
/dev/sda1. Such a thing is not a SCSI command target.
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-03-01 18:08 ` Ian Jackson
@ 2016-03-02 7:39 ` Juergen Gross
2016-03-02 7:57 ` Bob Liu
0 siblings, 1 reply; 27+ messages in thread
From: Juergen Gross @ 2016-03-02 7:39 UTC (permalink / raw)
To: Ian Jackson, Bob Liu; +Cc: paul.durrant, roger.pau, jbeulich, xen-devel
On 01/03/16 19:08, Ian Jackson wrote:
> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
>> One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device.
>> E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk).
>
> I don't think you can use pvscsi to passthrough a partition such as
> /dev/sda1. Such a thing is not a SCSI command target.
It might be possible via the fileio target backend. In this case LUN
based SCSI operations are ignored/refused/emulated by LIO.
Juergen
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-03-02 7:39 ` Juergen Gross
@ 2016-03-02 7:57 ` Bob Liu
2016-03-02 11:40 ` Ian Jackson
0 siblings, 1 reply; 27+ messages in thread
From: Bob Liu @ 2016-03-02 7:57 UTC (permalink / raw)
To: Juergen Gross; +Cc: Ian Jackson, xen-devel, paul.durrant, jbeulich, roger.pau
Hi Juergen,
On 03/02/2016 03:39 PM, Juergen Gross wrote:
> On 01/03/16 19:08, Ian Jackson wrote:
>> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
>>> One thing I'm still not sure about PVSCSI is do we have the same security issue since LIO can interface to any block device.
>>> E.g when using a partition /dev/sda1 as the PVSCSI-backend, but the PVSCSI-frontend may still send SCSI operates on LUN bases (the whole disk).
>>
>> I don't think you can use pvscsi to passthrough a partition such as
>> /dev/sda1. Such a thing is not a SCSI command target.
>
> It might be possible via the fileio target backend. In this case LUN
> based SCSI operations are ignored/refused/emulated by LIO.
>
Do you know whether pvscsi can work on top of multipath(the device-mapper framework) or LVMs?
Thank you!
Bob
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-03-02 7:57 ` Bob Liu
@ 2016-03-02 11:40 ` Ian Jackson
2016-03-02 11:46 ` Paul Durrant
2016-03-02 12:28 ` Bob Liu
0 siblings, 2 replies; 27+ messages in thread
From: Ian Jackson @ 2016-03-02 11:40 UTC (permalink / raw)
To: Bob Liu; +Cc: Juergen Gross, xen-devel, paul.durrant, jbeulich, roger.pau
Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> Do you know whether pvscsi can work on top of multipath(the device-mapper framework) or LVMs?
No, it can't. devmapper and LVM work with the block device
abstraction.
Implicitly you seem to be suggesting that you want to use dm-multipath
and LVM, but also send other SCSI CDBs from the upper layers through
to the underlying SCSI storage target.
I can't see how that could cause anything but pain. In many cases
"the underlying SCSI storage target" wouldn't be well defined. Even
if it was, these side channel SCSI commands are likely to Go Wrong in
exciting ways.
What SCSI commands do you want to send ?
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-03-02 11:40 ` Ian Jackson
@ 2016-03-02 11:46 ` Paul Durrant
2016-03-02 12:00 ` Juergen Gross
2016-03-02 12:28 ` Bob Liu
1 sibling, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2016-03-02 11:46 UTC (permalink / raw)
To: Ian Jackson, Bob Liu
Cc: Juergen Gross, Roger Pau Monne, jbeulich@suse.com,
xen-devel@lists.xen.org
> -----Original Message-----
> From: Ian Jackson [mailto:Ian.Jackson@eu.citrix.com]
> Sent: 02 March 2016 11:41
> To: Bob Liu
> Cc: Juergen Gross; xen-devel@lists.xen.org; jbeulich@suse.com; Roger Pau
> Monne; Paul Durrant; konrad.wilk@oracle.com
> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-
> through SCSI commands
>
> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to
> pass-through SCSI commands"):
> > Do you know whether pvscsi can work on top of multipath(the device-
> mapper framework) or LVMs?
>
> No, it can't. devmapper and LVM work with the block device
> abstraction.
>
Google suggests you can use LIO to present an LV as a virtual SCSI LUN, in which case pvscsi should work.
Paul
> Implicitly you seem to be suggesting that you want to use dm-multipath
> and LVM, but also send other SCSI CDBs from the upper layers through
> to the underlying SCSI storage target.
>
> I can't see how that could cause anything but pain. In many cases
> "the underlying SCSI storage target" wouldn't be well defined. Even
> if it was, these side channel SCSI commands are likely to Go Wrong in
> exciting ways.
>
> What SCSI commands do you want to send ?
>
> Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-03-02 11:46 ` Paul Durrant
@ 2016-03-02 12:00 ` Juergen Gross
0 siblings, 0 replies; 27+ messages in thread
From: Juergen Gross @ 2016-03-02 12:00 UTC (permalink / raw)
To: Paul Durrant, Ian Jackson, Bob Liu
Cc: Roger Pau Monne, jbeulich@suse.com, xen-devel@lists.xen.org
On 02/03/16 12:46, Paul Durrant wrote:
>> -----Original Message-----
>> From: Ian Jackson [mailto:Ian.Jackson@eu.citrix.com]
>> Sent: 02 March 2016 11:41
>> To: Bob Liu
>> Cc: Juergen Gross; xen-devel@lists.xen.org; jbeulich@suse.com; Roger Pau
>> Monne; Paul Durrant; konrad.wilk@oracle.com
>> Subject: Re: [RFC PATCH] xen-block: introduces extra request to pass-
>> through SCSI commands
>>
>> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to
>> pass-through SCSI commands"):
>>> Do you know whether pvscsi can work on top of multipath(the device-
>> mapper framework) or LVMs?
>>
>> No, it can't. devmapper and LVM work with the block device
>> abstraction.
>>
>
> Google suggests you can use LIO to present an LV as a virtual SCSI LUN, in which case pvscsi should work.
Indeed: http://linux-iscsi.org/wiki/Targetcli#IBLOCK
Juergen
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-03-02 11:40 ` Ian Jackson
2016-03-02 11:46 ` Paul Durrant
@ 2016-03-02 12:28 ` Bob Liu
2016-03-02 14:44 ` Ian Jackson
1 sibling, 1 reply; 27+ messages in thread
From: Bob Liu @ 2016-03-02 12:28 UTC (permalink / raw)
To: Ian Jackson; +Cc: Juergen Gross, xen-devel, paul.durrant, jbeulich, roger.pau
On 03/02/2016 07:40 PM, Ian Jackson wrote:
> Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
>> Do you know whether pvscsi can work on top of multipath(the device-mapper framework) or LVMs?
>
> No, it can't. devmapper and LVM work with the block device
> abstraction.
>
> Implicitly you seem to be suggesting that you want to use dm-multipath
> and LVM, but also send other SCSI CDBs from the upper layers through
> to the underlying SCSI storage target.
>
Exactly!
> I can't see how that could cause anything but pain. In many cases
> "the underlying SCSI storage target" wouldn't be well defined. Even
> if it was, these side channel SCSI commands are likely to Go Wrong in
> exciting ways.
>
> What SCSI commands do you want to send ?
>
* INQUIRY
* PERSISTENT RESERVE IN
* PERSISTENT RESERVE OUT
This is for Failover Clusters in Windows, not sure whether more commands are required.
I didn't get a required scsi commands list in the failover document.
--
Regards,
-Bob
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
2016-03-02 12:28 ` Bob Liu
@ 2016-03-02 14:44 ` Ian Jackson
[not found] ` <20160302172257.GC27821@char.us.oracle.com>
0 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2016-03-02 14:44 UTC (permalink / raw)
To: Bob Liu; +Cc: Juergen Gross, xen-devel, paul.durrant, jbeulich, roger.pau
Bob Liu writes ("Re: [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands"):
> On 03/02/2016 07:40 PM, Ian Jackson wrote:
> > I can't see how that could cause anything but pain. In many cases
> > "the underlying SCSI storage target" wouldn't be well defined. Even
> > if it was, these side channel SCSI commands are likely to Go Wrong in
> > exciting ways.
> >
> > What SCSI commands do you want to send ?
>
> * INQUIRY
... but why ?
> * PERSISTENT RESERVE IN
> * PERSISTENT RESERVE OUT
>
> This is for Failover Clusters in Windows, not sure whether more
> commands are required. I didn't get a required scsi commands list
> in the failover document.
So you want to be able to reserve the volume against concurrent
access ? If you're using LVM, such a reservation should apply to the
LVM LV, not to the underlying physical storage device, clearly. So I
think LIO [1] + PVSCSI might be what you want.
Ian.
[1] http://linux-iscsi.org/wiki/LIO
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2016-03-03 14:19 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-29 3:37 [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands Bob Liu
2016-02-29 8:12 ` Juergen Gross
2016-02-29 15:05 ` Konrad Rzeszutek Wilk
2016-02-29 15:34 ` Juergen Gross
2016-02-29 9:13 ` Paul Durrant
2016-02-29 14:55 ` Konrad Rzeszutek Wilk
2016-02-29 15:28 ` Paul Durrant
2016-02-29 15:35 ` Roger Pau Monné
2016-02-29 16:48 ` Konrad Rzeszutek Wilk
2016-02-29 16:56 ` Paul Durrant
2016-02-29 16:14 ` Ian Jackson
2016-02-29 16:29 ` Ian Jackson
2016-02-29 23:45 ` Bob Liu
2016-02-29 23:45 ` Bob Liu
2016-03-01 18:08 ` Ian Jackson
2016-03-02 7:39 ` Juergen Gross
2016-03-02 7:57 ` Bob Liu
2016-03-02 11:40 ` Ian Jackson
2016-03-02 11:46 ` Paul Durrant
2016-03-02 12:00 ` Juergen Gross
2016-03-02 12:28 ` Bob Liu
2016-03-02 14:44 ` Ian Jackson
[not found] ` <20160302172257.GC27821@char.us.oracle.com>
2016-03-03 11:54 ` Paul Durrant
2016-03-03 12:03 ` Ian Jackson
2016-03-03 12:25 ` Juergen Gross
2016-03-03 14:07 ` Konrad Rzeszutek Wilk
2016-03-03 14:19 ` Paul Durrant
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).