* [Patch 0/7] pvSCSI driver
@ 2008-02-18 10:10 Jun Kamada
2008-02-18 12:14 ` James Harper
` (3 more replies)
0 siblings, 4 replies; 34+ messages in thread
From: Jun Kamada @ 2008-02-18 10:10 UTC (permalink / raw)
To: xen-devel; +Cc: kama
Hi all,
I will post a total of seven patches for the new pvSCSI driver in the
following e-mails.
New features of the driver are as follows.
- Supports assigning each SCSI device (LUN: Logical Unit Number) to
guest domains.
- The SCSI device can be specified in three ways (see below).
- Simplified RING mechanism for frontend/backend communication.
(The previous version used two RINGs, one for frontend-to-backend and
one for backend-to-frontend communication. This version uses a single
RING, as VBD does.)
[ How to use ]
a.) by "xm" command
# xm scsi-attach <domain> <scsidevice>
b.) by config file
vscsi=['scsidevice','scsidevice']
In both cases, "scsidevice" can be specified in three ways:
1.) /dev/sdx or sdx, /dev/stx or stx, /dev/sgx or sgx
2.) scsi_id (result of "scsi_id -gu -s /block/sda")
Example: 36000b5d0006a0000006a025700400000
3.) host:channel:target:lun
Example: 4:0:0:10
Any comments are welcome.
Best regards,
-----
Jun Kamada
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [Patch 0/7] pvSCSI driver
2008-02-18 10:10 [Patch 0/7] pvSCSI driver Jun Kamada
@ 2008-02-18 12:14 ` James Harper
2008-02-19 2:26 ` Jun Kamada
2008-02-20 3:58 ` James Harper
` (2 subsequent siblings)
3 siblings, 1 reply; 34+ messages in thread
From: James Harper @ 2008-02-18 12:14 UTC (permalink / raw)
To: Jun Kamada, xen-devel
>
> Hi all,
>
> I will post a total of seven patches for the new pvSCSI driver in the
> following e-mails.
>
Jun,
What version of Xen is this patch supposed to be against? I am keen to
develop a front end for the Windows PV drivers I've been working on, so
I need to build it for my Dom0. I have built the 'tools' stuff into the
Debian package (eg just patched the original tree, applied the Debian
patches, and did a dpkg-buildpackage), and am now trying to build the
scsiback driver 'out of tree', and it all builds but complains about
'bind_interdomain_evtchn_to_irqhandler', which I'm guessing is a symbol
that isn't in the Debian kernel...
Any suggestions?
Thanks
James
* Re: [Patch 0/7] pvSCSI driver
2008-02-18 12:14 ` James Harper
@ 2008-02-19 2:26 ` Jun Kamada
2008-02-19 2:28 ` James Harper
0 siblings, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-02-19 2:26 UTC (permalink / raw)
To: James Harper; +Cc: kama, xen-devel
Hi James-san,
The pvSCSI driver requires Xen 3.2.
I think the Debian kernel is probably based on Xen 3.1.x, isn't it?
Thanks
On Mon, 18 Feb 2008 23:14:01 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> >
> > Hi all,
> >
> > I will post a total of seven patches for the new pvSCSI driver in the
> > following e-mails.
> >
>
> Jun,
>
> What version of Xen is this patch supposed to be against? I am keen to
> develop a front end for the Windows PV drivers I've been working on, so
> I need to build it for my Dom0. I have built the 'tools' stuff into the
> Debian package (eg just patched the original tree, applied the Debian
> patches, and did a dpkg-buildpackage), and am now trying to build the
> scsiback driver 'out of tree', and it all builds but complains about
> 'bind_interdomain_evtchn_to_irqhandler', which I'm guessing is a symbol
> that isn't in the Debian kernel...
>
> Any suggestions?
>
> Thanks
>
> James
Jun Kamada
* RE: [Patch 0/7] pvSCSI driver
2008-02-19 2:26 ` Jun Kamada
@ 2008-02-19 2:28 ` James Harper
0 siblings, 0 replies; 34+ messages in thread
From: James Harper @ 2008-02-19 2:28 UTC (permalink / raw)
To: Jun Kamada; +Cc: xen-devel
> Hi James-san,
>
> The pvSCSI driver requires Xen 3.2.
> I think the Debian kernel is probably based on Xen 3.1.x, isn't it?
I think it's worse than that... I think the Xen hypervisor is 3.1.2, but
the Debian Xen Linux kernel has patches that are even older.
What is it about 3.2 that the pvSCSI driver requires?
James
* RE: [Patch 0/7] pvSCSI driver
2008-02-18 10:10 [Patch 0/7] pvSCSI driver Jun Kamada
2008-02-18 12:14 ` James Harper
@ 2008-02-20 3:58 ` James Harper
2008-02-20 5:09 ` Jun Kamada
2008-02-21 2:19 ` James Harper
2008-02-27 11:16 ` Steven Smith
3 siblings, 1 reply; 34+ messages in thread
From: James Harper @ 2008-02-20 3:58 UTC (permalink / raw)
To: Jun Kamada, xen-devel
I think I've got it working under Debian Etch.
I'm now trying to develop a frontend driver for Windows, and triggered a
BUG() on or around line 328 of scsiback.c, because I wasn't setting bus,
target, and lun in the request. This effectively breaks Dom0 (hotplug
scripts refused to work thereafter until a reboot), which means a rogue
DomU can crash Dom0. I think you should implement a more graceful
failure path.
Also, for what reason are the bus, target, and lun required in the
request? It looks like that's a leftover from an earlier version and I
don't see that it is required now.
James
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of Jun Kamada
> Sent: Monday, 18 February 2008 21:11
> To: xen-devel@lists.xensource.com
> Cc: kama@jp.fujitsu.com
> Subject: [Xen-devel] [Patch 0/7] pvSCSI driver
>
> Hi all,
>
> I will post a total of seven patches for the new pvSCSI driver in the
> following e-mails.
>
> New features of the driver are as follows.
>
> - Supports assigning each SCSI device (LUN: Logical Unit Number) to
> guest domains.
> - The SCSI device can be specified in three ways (see below).
> - Simplified RING mechanism for frontend/backend communication.
> (The previous version used two RINGs, one for frontend-to-backend and
> one for backend-to-frontend communication. This version uses a single
> RING, as VBD does.)
>
>
> [ How to use ]
> a.) by "xm" command
> # xm scsi-attach <domain> <scsidevice>
>
> b.) by config file
> vscsi=['scsidevice','scsidevice']
>
>
> In both cases, "scsidevice" can be specified in three ways:
>
> 1.) /dev/sdx or sdx, /dev/stx or stx, /dev/sgx or sgx
> 2.) scsi_id (result of "scsi_id -gu -s /block/sda")
> Example: 36000b5d0006a0000006a025700400000
> 3.) host:channel:target:lun
> Example: 4:0:0:10
>
>
> Any comments are welcome.
>
> Best regards,
>
> -----
> Jun Kamada
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
* Re: [Patch 0/7] pvSCSI driver
2008-02-20 3:58 ` James Harper
@ 2008-02-20 5:09 ` Jun Kamada
0 siblings, 0 replies; 34+ messages in thread
From: Jun Kamada @ 2008-02-20 5:09 UTC (permalink / raw)
To: James Harper; +Cc: kama, xen-devel
Hi James-san,
Thank you for your comment.
On Wed, 20 Feb 2008 14:58:48 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> I'm now trying to develop a frontend driver for Windows, and triggered a
> BUG() on or around line 328 of scsiback.c, because I wasn't setting bus,
> target, and lun in the request. This effectively breaks Dom0 (hotplug
> scripts refused to work thereafter until a reboot), which means a rogue
> DomU can crash Dom0. I think you should implement a more graceful
> failure path.
Yes, I agree.
Error handling needs some modifications and additions, including
support for the Reset/Abort SCSI commands. We would like to post a new
version ASAP.
However, we would also like to get as many comments as possible on the
*current* version, to help us improve it.
> Also, for what reason are the bus, target, and lun required in the
> request? It looks like that's a leftover from an earlier version and I
> don't see that it is required now.
Assigning individual LUNs to guests allows an HBA to be shared by
multiple guests, which we consider very useful in many usage
scenarios. LUN assignment can also cover whole-HBA assignment by using
a wildcard, for example "xm scsi-attach <domain> 4:*:*:*". Needless to
say, "xm" or "xend" would have to be extended in that case. :-)
Best regards,
-----
Jun Kamada
* RE: [Patch 0/7] pvSCSI driver
2008-02-18 10:10 [Patch 0/7] pvSCSI driver Jun Kamada
2008-02-18 12:14 ` James Harper
2008-02-20 3:58 ` James Harper
@ 2008-02-21 2:19 ` James Harper
2008-02-21 3:39 ` James Harper
2008-02-27 11:16 ` Steven Smith
3 siblings, 1 reply; 34+ messages in thread
From: James Harper @ 2008-02-21 2:19 UTC (permalink / raw)
To: Jun Kamada, xen-devel
Another issue I've come across - you appear to have hardcoded the
timeout to 5 seconds. I'm trying to run the HP Library & Tape Tools
under Windows, and things like unload and erase go well beyond the 5
seconds you allow. I increased the timeout to 30 seconds and the unload
works fine, but the erase goes for longer than that.
I notice you have a timeout field in the request structure, but have
commented it out. What was the reason for this?
Also, I had a system crash when I removed the pvscsi backend driver
module. Something else to look at.
The timeout is the thing that is causing concern at the moment though.
Thanks
James
* RE: [Patch 0/7] pvSCSI driver
2008-02-21 2:19 ` James Harper
@ 2008-02-21 3:39 ` James Harper
2008-02-21 4:23 ` Jun Kamada
0 siblings, 1 reply; 34+ messages in thread
From: James Harper @ 2008-02-21 3:39 UTC (permalink / raw)
To: Jun Kamada, xen-devel
>
> Another issue I've come across - you appear to have hardcoded the
> timeout to 5 seconds. I'm trying to run the HP Library & Tape Tools
> under Windows, and things like unload and erase go well beyond the 5
> seconds you allow. I increased the timeout to 30 seconds and the unload
> works fine, but the erase goes for longer than that.
>
> I notice you have a timeout field in the request structure, but have
> commented it out. What was the reason for this?
>
Just responding to myself, would I be guessing correctly that you
removed the timeout field to make the request structure smaller? The top
byte of the request_bufflen field could be used as a timeout, as
sensible timeout values don't need to be very exact. Even if we just
used the top 4 bits and made it (1 << (timeout + 1)) * 5 * HZ, that
would give us a bit over 45 hours, and we'd make 15 mean infinite.
By my calculations, the largest that bufflen could be is 27 * PAGE_SIZE
= ~110K on x86, so we have plenty of headroom.
With the timeout increased, the HP Library & Tape Tools have
successfully completed a 'LTO Drive Assessment test' under Windows.
James
* Re: [Patch 0/7] pvSCSI driver
2008-02-21 3:39 ` James Harper
@ 2008-02-21 4:23 ` Jun Kamada
2008-02-21 5:30 ` James Harper
0 siblings, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-02-21 4:23 UTC (permalink / raw)
To: James Harper; +Cc: kama, xen-devel
Hi James-san,
On Thu, 21 Feb 2008 14:39:47 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> Just responding to myself, would I be guessing correctly that you
> removed the timeout field to make the request structure smaller? The top
That is one reason. However, the main reason is as follows.
In a virtualized environment, the time that guests and the host observe
is not real-world time; it depends on the hypervisor's scheduling.
(Is this assumption right?)
For example, if the layer above the pvSCSI frontend specifies a 5-second
timeout, should it be treated as real-world time or as time within the
guest domain?
We didn't have a clear answer when we implemented that part, so we
temporarily hardcoded 5 seconds.
James-san, what do you think about this issue?
By the way, we understand that 5 seconds is too short to support tape
devices.
> byte of the request_bufflen field could be used as a timeout, as
> sensible timeout values don't need to be very exact. Even if we just
> used the top 4 bits and made it (1 << (timeout + 1)) * 5 * HZ, that
> would give us a bit over 45 hours, and we'd make 15 mean infinite.
>
> By my calculations, the largest that bufflen could be is 27 * PAGE_SIZE
> = ~110K on x86, so we have plenty of headroom.
>
> With the timeout increased, the HP Library & Tape Tools have
> successfully completed a 'LTO Drive Assessment test' under Windows.
>
> James
>
Jun Kamada
Linux Technology Development Div.
Server Systems Unit
Fujitsu Ltd.
kama@jp.fujitsu.com
* RE: [Patch 0/7] pvSCSI driver
2008-02-21 4:23 ` Jun Kamada
@ 2008-02-21 5:30 ` James Harper
2008-02-25 1:53 ` Jun Kamada
0 siblings, 1 reply; 34+ messages in thread
From: James Harper @ 2008-02-21 5:30 UTC (permalink / raw)
To: Jun Kamada; +Cc: xen-devel
> Hi James-san,
>
> On Thu, 21 Feb 2008 14:39:47 +1100
> "James Harper" <james.harper@bendigoit.com.au> wrote:
> > Just responding to myself, would I be guessing correctly that you
> > removed the timeout field to make the request structure smaller? The top
>
> That is one reason. However, the main reason is as follows.
>
> In a virtualized environment, the time that guests and the host observe
> is not real-world time; it depends on the hypervisor's scheduling.
> (Is this assumption right?)
>
>
> For example, if the layer above the pvSCSI frontend specifies a 5-second
> timeout, should it be treated as real-world time or as time within the
> guest domain?
>
> We didn't have a clear answer when we implemented that part, so we
> temporarily hardcoded 5 seconds.
>
> James-san, what do you think about this issue?
I don't think the exact value of the timeout matters that much. At
worst, a 5 second timeout is going to be at least 5 seconds, and
probably not much more than that. It's the Linux SCSI subsystem itself
that handles the timeout anyway.
> By the way, we understand that 5 seconds is too short to support tape
> devices.
Yes, way too short. For running the HP LT&T (Library and Tape Tools),
even 60 seconds is too short for some operations.
FYI, I have the HP LT&T working nicely under Windows now. All tests
succeed; even a firmware update to the tape drive worked. A Read/Write
test on an HP LTO2 drive with LTO1 media gives me 13.3MB/s (approx
800MB/minute) for both read and write operations. I assume that a CD or
DVD burner would work also, although I don't have one to test.
Are you planning on requesting that pvSCSI get merged into the Xen tree
once you have the timeout and unload issues sorted out?
James
* Re: [Patch 0/7] pvSCSI driver
2008-02-21 5:30 ` James Harper
@ 2008-02-25 1:53 ` Jun Kamada
0 siblings, 0 replies; 34+ messages in thread
From: Jun Kamada @ 2008-02-25 1:53 UTC (permalink / raw)
To: James Harper; +Cc: kama, xen-devel
Hi James-san,
Sorry for the late reply.
On Thu, 21 Feb 2008 16:30:09 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> Are you planning on requesting that pvSCSI get merged into the Xen tree
> once you have the timeout and unload issues sorted out?
Yes, I will re-post it ASAP. In addition to the issues mentioned above,
I would like to implement the Reset/Abort functions.
Best regards,
-----
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-02-18 10:10 [Patch 0/7] pvSCSI driver Jun Kamada
` (2 preceding siblings ...)
2008-02-21 2:19 ` James Harper
@ 2008-02-27 11:16 ` Steven Smith
2008-02-28 2:51 ` Jun Kamada
3 siblings, 1 reply; 34+ messages in thread
From: Steven Smith @ 2008-02-27 11:16 UTC (permalink / raw)
To: Jun Kamada; +Cc: xen-devel
> I will post a total of seven patches for the new pvSCSI driver in the
> following e-mails.
Thanks for doing this, being able to pass SCSI devices through to
guests is likely to be a useful facility.
I have a couple of comments on the design:
-- You've ended up re-implementing a lot of Linux SCSI stuff in the
backend. I don't understand why this was necessary. Would you
mind explaining, please?
-- The code seems to be a bit undecided about whether the exposed
devices are supposed to represent SCSI adapters or SCSI targets.
It looks like the frontend initially tries to treat them as a bunch
of targets, and then conditionally gloms them back together into
hosts depending on xenstore fields? Having a host per target would
make sense, as would having a single host with all of the targets
hanging off of it, but I don't understand why this split model is
useful. Perhaps I'm just missing something.
-- I don't understand the distinction between comfront and scsifront.
What was the reason for this split?
-- There don't seem to be many comments in these patches. Xen and
Linux are both generally pretty comment-light, but an entire new
device class without a single meaningful comment still kind of
stands out.
I'll reply to the individual patches with more detailed comments. A
lot of my complaints will doubtless turn out to just be because I'm
not very used to Linux SCSI. I've not looked at the xend changes,
because I'm not really competent to evaluate them.
Steven.
* Re: [Patch 0/7] pvSCSI driver
2008-02-27 11:16 ` Steven Smith
@ 2008-02-28 2:51 ` Jun Kamada
2008-02-28 11:13 ` Steven Smith
0 siblings, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-02-28 2:51 UTC (permalink / raw)
To: Steven Smith; +Cc: kama, xen-devel
Hi Steven-san,
Thank you for sending so many helpful comments. I will answer the
questions and comments quoted below now, and reply to your comments on
the individual source files in a separate mail later.
On Wed, 27 Feb 2008 11:16:10 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> I have a couple of comments on the design:
>
> -- You've ended up re-implementing a lot of Linux SCSI stuff in the
> backend. I don't understand why this was necessary. Would you
> mind explaining, please?
# If I have misunderstood your question, please let me know.
To support assigning LUNs to guest domains, the backend driver was
modified as follows.
- A Ring, EventChannel and GrantTable are allocated independently for
each assigned LUN, whereas the previous version allocated them for each
host (HBA). This is the main difference between the previous and new
versions.
We also removed the code for the FC transport layer.
> -- The code seems to be a bit undecided about whether the exposed
> devices are supposed to represent SCSI adapters or SCSI targets.
> It looks like the frontend initially tries to treat them as a bunch
> of targets, and then conditionally gloms them back together into
> hosts depending on xenstore fields? Having a host per target would
> make sense, as would having a single host with all of the targets
> hanging off of it, but I don't understand why this split model is
> useful. Perhaps I'm just missing something.
The frontend driver attaches each LUN according to the information in
xenstore. The new version supports only LUN assignment to guest domains.
If you want to assign a whole SCSI device/host to a guest, you have to
specify every LUN under that device/host. I understand that a wildcard
notation, such as "1:*:*:*", is needed.
> -- I don't understand the distinction between comfront and scsifront.
> What was the reason for this split?
I intended to separate two types of code: primitive code for
communication between the frontend and backend, and SCSI-specific code.
However, the separation may be incomplete.
> -- There don't seem to be many comments in these patches. Xen and
> Linux are both generally pretty comment-light, but an entire new
> device class without a single meaningful comment still kind of
> stands out.
I agree with your comment completely. I should add more comments to the
source code.
> I'll reply to the individual patches with more detailed comments. A
> lot of my complaints will doubtless turn out to just be because I'm
> not very used to Linux SCSI. I've not looked at the xend changes,
> because I'm not really competent to evaluate them.
>
> Steven.
Best regards,
-----
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-02-28 2:51 ` Jun Kamada
@ 2008-02-28 11:13 ` Steven Smith
2008-02-29 4:47 ` Jun Kamada
0 siblings, 1 reply; 34+ messages in thread
From: Steven Smith @ 2008-02-28 11:13 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, xen-devel
> > I have a couple of comments on the design:
> >
> > -- You've ended up re-implementing a lot of Linux SCSI stuff in the
> > backend. I don't understand why this was necessary. Would you
> > mind explaining, please?
>
> # If I have misunderstood your question, please let me know.
>
> To support assigning LUNs to guest domains, the backend driver was
> modified as follows.
>
> - A Ring, EventChannel and GrantTable are allocated independently for
> each assigned LUN, whereas the previous version allocated them for each
> host (HBA). This is the main difference between the previous and new
> versions.
>
> We also removed the code for the FC transport layer.
I'm afraid I haven't looked closely at the previous revisions of this
patch, so I don't know about any differences between them. I was
referring more to bits like request_map_sg, which is an almost direct
copy and paste of drivers/scsi/scsi_lib.c::scsi_req_map_sg, and
scsiback_merge_bio(), which is identical to scsi_merge_bio() except
for some whitespace changes. Having to carry our own implementation
of core Linux SCSI support seems like it'll be a significant
maintenance burden, and I'd like to understand why it was necessary.
> > -- The code seems to be a bit undecided about whether the exposed
> > devices are supposed to represent SCSI adapters or SCSI targets.
> > It looks like the frontend initially tries to treat them as a bunch
> > of targets, and then conditionally gloms them back together into
> > hosts depending on xenstore fields? Having a host per target would
> > make sense, as would having a single host with all of the targets
> > hanging off of it, but I don't understand why this split model is
> > useful. Perhaps I'm just missing something.
> The frontend driver attaches each LUN according to the information in
> xenstore. The new version supports only LUN assignment to guest domains.
> If you want to assign a whole SCSI device/host to a guest, you have to
> specify every LUN under that device/host. I understand that a wildcard
> notation, such as "1:*:*:*", is needed.
Okay, so a device in xenstore corresponds to a LUN, and you map them
to particular hosts based on the device name? That's kind of ugly,
but it's probably the most direct way of doing it through xenstore.
What I don't understand is why you need this at all. It seems like it
would make more sense to either:
a) Hang every LUN off of the same scsi host, or
b) Give each LUN its own scsi host.
Is there some reason why you might want to do something like this:
Host A -------+----- LUN 1
|
+----- LUN 2
Host B ------------- LUN 3
i.e. partition the virtual LUNs between multiple hosts in the guest,
but keeping some of them together? Perhaps I'm just missing
something, but I can't think of any use cases which would benefit from
that, and trying to support it noticeably complicates the frontend.
> > -- I don't understand the distinction between comfront and scsifront.
> > What was the reason for this split?
> I intended to separate two types of code: primitive code for
> communication between the frontend and backend, and SCSI-specific code.
> However, the separation may be incomplete.
Okay.
> > -- There don't seem to be many comments in these patches. Xen and
> > Linux are both generally pretty comment-light, but an entire new
> > device class without a single meaningful comment still kind of
> > stands out.
> I agree with your comment completely. I should add more comments to the
> source code.
Thanks.
Steven.
* Re: [Patch 0/7] pvSCSI driver
2008-02-28 11:13 ` Steven Smith
@ 2008-02-29 4:47 ` Jun Kamada
2008-03-03 11:38 ` Steven Smith
0 siblings, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-02-29 4:47 UTC (permalink / raw)
To: Steven Smith; +Cc: kama, xen-devel
Steven-san,
On Thu, 28 Feb 2008 11:13:31 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> What I don't understand is why you need this at all. It seems like it
> would make more sense to either:
>
> a) Hang every LUN off of the same scsi host, or
> b) Give each LUN its own scsi host.
>
> Is there some reason why you might want to do something like this:
>
> Host A -------+----- LUN 1
> |
> +----- LUN 2
>
> Host B ------------- LUN 3
>
> i.e. partition the virtual LUNs between multiple hosts in the guest,
> but keeping some of them together? Perhaps I'm just missing
> something, but I can't think of any use cases which would benefit from
> that, and trying to support it noticeably complicates the frontend.
May I explain the numbering logic used when assigning LUNs to guests?
Basically, each guest sees the same SCSI tree as the host, except for
the following two points.
1.) The "host" in the 4-tuple "host:channel:id:lun" on the guest may not
be the same as that on the host.
2.) The tree on the guest may be sparse when some LUNs are not assigned
to the guest.
Therefore, "a1:b:c:d" on the host becomes "a2:b:c:d" on the guest
(in general, a1 != a2).
I think this numbering logic is the same as option b) you mentioned
above. Is that right?
Thanks,
-----
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-02-29 4:47 ` Jun Kamada
@ 2008-03-03 11:38 ` Steven Smith
2008-03-04 7:57 ` Jun Kamada
0 siblings, 1 reply; 34+ messages in thread
From: Steven Smith @ 2008-03-03 11:38 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, xen-devel
> > What I don't understand is why you need this at all. It seems like it
> > would make more sense to either:
> >
> > a) Hang every LUN off of the same scsi host, or
> > b) Give each LUN its own scsi host.
> >
> > Is there some reason why you might want to do something like this:
> >
> > Host A -------+----- LUN 1
> > |
> > +----- LUN 2
> >
> > Host B ------------- LUN 3
> >
> > i.e. partition the virtual LUNs between multiple hosts in the guest,
> > but keeping some of them together? Perhaps I'm just missing
> > something, but I can't think of any use cases which would benefit from
> > that, and trying to support it noticeably complicates the frontend.
> May I explain the numbering logic used when assigning LUNs to guests?
That was what I was hoping you'd do, yes. :)
> Basically, each guest sees the same SCSI tree as the host, except for
> the following two points.
>
> 1.) The "host" in the 4-tuple "host:channel:id:lun" on the guest may not
> be the same as that on the host.
> 2.) The tree on the guest may be sparse when some LUNs are not assigned
> to the guest.
>
> Therefore, "a1:b:c:d" on the host becomes "a2:b:c:d" on the guest
> (in general, a1 != a2).
Okay, why do you require that the device in the guest has the same
channel:id:lun as the device on the host? That seems like a somewhat
gratuitous restriction to me.
> I think this numbering logic is the same as option b) you mentioned
> above. Is that right?
No, you've gone for option c:
c) The topology inside the guest reflects a subset of the host
topology
which I hadn't previously considered.
Steven.
* Re: [Patch 0/7] pvSCSI driver
2008-03-03 11:38 ` Steven Smith
@ 2008-03-04 7:57 ` Jun Kamada
2008-03-04 13:05 ` Steven Smith
0 siblings, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-03-04 7:57 UTC (permalink / raw)
To: Steven Smith; +Cc: kama, xen-devel
Hi Steven-san,
On Mon, 3 Mar 2008 11:38:57 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> > > What I don't understand is why you need this at all. It seems like it
> > > would make more sense to either:
> > >
> > > a) Hang every LUN off of the same scsi host, or
> > > b) Give each LUN its own scsi host.
> > >
> > > Is there some reason why you might want to do something like this:
> > >
> > > Host A -------+----- LUN 1
> > > |
> > > +----- LUN 2
> > >
> > > Host B ------------- LUN 3
> > >
> > > i.e. partition the virtual LUNs between multiple hosts in the guest,
> > > but keeping some of them together? Perhaps I'm just missing
> > > something, but I can't think of any use cases which would benefit from
> > > that, and trying to support it noticeably complicates the frontend.
> > May I explain the numbering logic used when assigning LUNs to guests?
> That was what I was hoping you'd do, yes. :)
>
> > Basically, each guest sees the same SCSI tree as the host, except for
> > the following two points.
> >
> > 1.) The "host" in the 4-tuple "host:channel:id:lun" on the guest may not
> > be the same as that on the host.
> > 2.) The tree on the guest may be sparse when some LUNs are not assigned
> > to the guest.
> >
> > Therefore, "a1:b:c:d" on the host becomes "a2:b:c:d" on the guest
> > (in general, a1 != a2).
> Okay, why do you require that the device in the guest has the same
> channel:id:lun as the device on the host? That seems like a somewhat
> gratuitous restriction to me.
>
> > I think this numbering logic is the same as option b) you mentioned
> > above. Is that right?
> No, you've gone for option c:
>
> c) The topology inside the guest reflects a subset of the host
> topology
>
> which I hadn't previously considered.
The reasons why we chose option c are as follows.
- Some storage management software running on a guest may assume the
physical topology. (However, I'm not sure whether such software
actually exists.)
- The "host" number is Linux-specific, and each dummy Scsi_Host
structure consumes a relatively large amount of memory. Therefore, we
decided to compress the "host" numbers (contiguous rather than sparse).
An explicit declaration like the one below may be one solution. Of
course, some default setting would also be needed.
On Dom0 On Guest
------------------------
"1:2:3:4" ---> "5:6:7:8"
Best regards,
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-03-04 7:57 ` Jun Kamada
@ 2008-03-04 13:05 ` Steven Smith
2008-03-05 2:34 ` James Harper
0 siblings, 1 reply; 34+ messages in thread
From: Steven Smith @ 2008-03-04 13:05 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, xen-devel
> > > > What I don't understand is why you need this at all. It seems like it
> > > > would make more sense to either:
> > > >
> > > > a) Hang every LUN off of the same scsi host, or
> > > > b) Give each LUN its own scsi host.
...
> > > May I explain the numbering logic used when assigning LUNs to guests?
> > That was what I was hoping you'd do, yes. :)
> >
> > > Basically, each guest sees the same SCSI tree as the host, except for
> > > the following two points.
> > >
> > > 1.) The "host" in the 4-tuple "host:channel:id:lun" on the guest may not
> > > be the same as that on the host.
> > > 2.) The tree on the guest may be sparse when some LUNs are not assigned
> > > to the guest.
> > >
> > > Therefore, "a1:b:c:d" on the host becomes "a2:b:c:d" on the guest
> > > (in general, a1 != a2).
> > Okay, why do you require that the device in the guest has the same
> > channel:id:lun as the device on the host? That seems like a somewhat
> > gratuitous restriction to me.
> >
> > > I think the numbering logic is same as b) you mentioned above. Is it
> > > right?
> > No, you've gone for option c:
> >
> > c) The topology inside the guest reflects a subset of the host
> > topology
> >
> > which I hadn't previously considered.
> The reason why we took the option c is as follows.
>
> - Some storage management software running on guest may assume physical
> topology. (However, I'm not sure whether there is such a software or
> not.)
There are three obvious ways for them to make that kind of assumption:
1) There's some SCSI command which applies to a collection of devices,
and that collection depends on the topology. Bus resets are the
obvious one here. All of these commands will need special handling
anyway, to prevent VMs from interfering with each other (and I
don't think you currently support any of them, anyway).
2) There are some magic LUNs/targets/whatevers which the application
tries to access at a particular address. SAM-4 (sam4r10) requires that
either LUN 0 or the REPORT LUNS well-known LUN be present, so
that's pretty plausible. I think your current implementation may
already have problems here if a user decides to only connect a
subset of a device's LUNs, yes?
3) There's some SCSI command which returns LUNs in its results.
REPORT LUNS is the obvious one here. The frontend will currently
report incorrect results for these commands if the user has only
connected a subset of the LUNs.
This kind of suggests that we should be plumbing things through to the
guest with a granularity of whole targets, rather than individual
logical units. The alternative is a much more complicated scsi
emulation which can fix up the LUN-sensitive commands.
> - The "host" is Linux specific number and Scsi-Host structure for
> dummy consumes relatively large memory space. Therefore, we decided
> to compress the "host" number. (Not sparse. Contiguous.)
Are you implying that frontend host numbers won't always match up with
backend host numbers?
If hosts are expensive to construct then that's a good reason to avoid
model (b) (one host per LUN/target) (although my desktop has a scsi
host for each SATA port, so they can't be *that* expensive). It
doesn't rule out model (a) (one host shared by all LUNs).
> Explicit declaration like below may be one solution. Of course some
> default setting is needed.
>
>
> On Dom0 On Guest
> ------------------------
> "1:2:3:4" ---> "5:6:7:8"
Allowing this kind of mapping sounds reasonable to me. It would also
make it possible (hopefully) to add support for some of the weirder
SCSI logical unit addressing modes without changing the frontends
(e.g. hierarchical addressing with 64 bit LUNs). That might involve a
certain amount of munging of REPORT LUNS commands in the backend,
though.
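To make the REPORT LUNS "munging" concrete, here is a minimal sketch (not code from the posted patches) of a backend synthesising the REPORT LUNS parameter data so that it lists only the LUNs attached to the guest. It assumes the SPC layout (an 8-byte header whose first four bytes are a big-endian LUN LIST LENGTH, followed by one 8-byte entry per LUN) and single-level peripheral device addressing, so only LUNs 0-255 are handled; `build_report_luns` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill buf with REPORT LUNS parameter data listing
 * only the guest-visible LUNs in luns[]. Uses single-level peripheral
 * device addressing (byte 0 = 0, byte 1 = LUN), so LUNs must be < 256.
 * Returns the number of bytes written, or 0 if buf is too small or a
 * LUN cannot be encoded this way.
 */
static size_t build_report_luns(const unsigned int *luns, size_t nluns,
                                uint8_t *buf, size_t buflen)
{
    size_t list_len = nluns * 8;   /* LUN LIST LENGTH, in bytes */
    size_t total = 8 + list_len;   /* 8-byte header plus entries */

    if (buflen < total)
        return 0;
    for (size_t i = 0; i < nluns; i++)
        if (luns[i] > 255)
            return 0;              /* would need flat or hierarchical addressing */

    memset(buf, 0, total);         /* also zeroes the reserved bytes 4-7 */
    buf[0] = (uint8_t)(list_len >> 24);   /* big-endian LUN LIST LENGTH */
    buf[1] = (uint8_t)(list_len >> 16);
    buf[2] = (uint8_t)(list_len >> 8);
    buf[3] = (uint8_t)list_len;
    for (size_t i = 0; i < nluns; i++)
        buf[8 + i * 8 + 1] = (uint8_t)luns[i];   /* byte 0 stays 0 */
    return total;
}
```

A real backend would also have to honour the CDB's ALLOCATION LENGTH, truncating the returned data while still reporting the full list length, and would need other address methods for the 64-bit hierarchical LUNs mentioned above.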
Steven.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* RE: [Patch 0/7] pvSCSI driver
2008-03-04 13:05 ` Steven Smith
@ 2008-03-05 2:34 ` James Harper
2008-03-05 9:53 ` Jun Kamada
2008-03-07 2:55 ` Jun Kamada
0 siblings, 2 replies; 34+ messages in thread
From: James Harper @ 2008-03-05 2:34 UTC (permalink / raw)
To: Steven Smith, Jun Kamada; +Cc: xen-devel
> This kind of suggests that we should be plumbing things through to the
> guest with a granularity of whole targets, rather than individual
> logical units. The alternative is a much more complicated scsi
> emulation which can fix up the LUN-sensitive commands.
I think we should probably have the option of doing either.
> Allowing this kind of mapping sounds reasonable to me. It would also
> make it possible (hopefully) to add support for some of the weirder
> SCSI logical unit addressing modes without changing the frontends
> (e.g. hierarchical addressing with 64 bit LUNs). That might involve a
> certain amount of munging of REPORT LUNS commands in the backend,
> though.
Not sure how much it matters, but any 'munging' of scsi commands would
be a real drag for Windows drivers. The Windows SCSI layer is very
strict on lots of things, and is a real pain if you are not talking to a
physical PCI scsi device.
James
* Re: [Patch 0/7] pvSCSI driver
2008-03-05 2:34 ` James Harper
@ 2008-03-05 9:53 ` Jun Kamada
2008-03-05 9:56 ` James Harper
2008-03-07 2:55 ` Jun Kamada
1 sibling, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-03-05 9:53 UTC (permalink / raw)
To: James Harper; +Cc: kama, Steven Smith, xen-devel
Hi James-san and Steven-san,
Thank you for your comments.
To avoid any misunderstanding on my part, could you explain what
'munging' means here? Does it mean rejecting some SCSI commands, or
modifying the contents of the command (CDB) and response (SENSE) in the
backend?
Thanks,
On Wed, 5 Mar 2008 13:34:48 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> > This kind of suggests that we should be plumbing things through to the
> > guest with a granularity of whole targets, rather than individual
> > logical units. The alternative is a much more complicated scsi
> > emulation which can fix up the LUN-sensitive commands.
>
> I think we should probably have the option of doing either.
>
> > Allowing this kind of mapping sounds reasonable to me. It would also
> > make it possible (hopefully) to add support for some of the weirder
> > SCSI logical unit addressing modes without changing the frontends
> > (e.g. hierarchical addressing with 64 bit LUNs). That might involve a
> > certain amount of munging of REPORT LUNS commands in the backend,
> > though.
>
> Not sure how much it matters, but any 'munging' of scsi commands would
> be a real drag for Windows drivers. The Windows SCSI layer is very
> strict on lots of things, and is a real pain if you are not talking to a
> physical PCI scsi device.
>
> James
Jun Kamada
* RE: [Patch 0/7] pvSCSI driver
2008-03-05 9:53 ` Jun Kamada
@ 2008-03-05 9:56 ` James Harper
2008-03-05 10:00 ` Jun Kamada
0 siblings, 1 reply; 34+ messages in thread
From: James Harper @ 2008-03-05 9:56 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, xen-devel
> Hi James-san and Steven-san,
>
> Thank you for your comments.
>
> In order to avoid my misunderstanding, could you teach me what the
> 'munging' is? It means to reject the some SCSI commands or to modify
> inside of the command(CDB) and response(SENSE) on the backend ?
>
:)
In this context it just means modifying the packets 'on the fly' in a
way that we'd probably rather not. I guess it's kind of a NAT for
SCSI... maybe we'd call it SAT for Scsi Address Translation :)
James
* Re: [Patch 0/7] pvSCSI driver
2008-03-05 9:56 ` James Harper
@ 2008-03-05 10:00 ` Jun Kamada
2008-03-06 23:48 ` Dan Magenheimer
0 siblings, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-03-05 10:00 UTC (permalink / raw)
To: James Harper; +Cc: kama, Steven Smith, xen-devel
Hi James-san,
On Wed, 5 Mar 2008 20:56:32 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> In this context it just means modifying the packets 'on the fly' in a
> way that we'd probably rather not. I guess it's kind of a NAT for
> SCSI... maybe we'd call it SAT for Scsi Address Translation :)
OK, I understood. Thanks.
Jun Kamada
* RE: [Patch 0/7] pvSCSI driver
2008-03-05 10:00 ` Jun Kamada
@ 2008-03-06 23:48 ` Dan Magenheimer
2008-03-07 1:20 ` Jun Kamada
0 siblings, 1 reply; 34+ messages in thread
From: Dan Magenheimer @ 2008-03-06 23:48 UTC (permalink / raw)
To: Jun Kamada, James Harper; +Cc: Steven Smith, xen-devel@lists.xensource.com
For a more precise definition of "munge" and for future
reference see:
http://foldoc.org/index.cgi?query=munge&action=Search
http://foldoc.org/
Hope that helps!
Dan
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com]On Behalf Of Jun Kamada
> Sent: Wednesday, March 05, 2008 3:00 AM
> To: James Harper
> Cc: kama@jp.fujitsu.com; Steven Smith; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [Patch 0/7] pvSCSI driver
>
>
> Hi James-san,
>
> On Wed, 5 Mar 2008 20:56:32 +1100
> "James Harper" <james.harper@bendigoit.com.au> wrote:
> > In this context it just means modifying the packets 'on the fly' in a
> > way that we'd probably rather not. I guess it's kind of a NAT for
> > SCSI... maybe we'd call it SAT for Scsi Address Translation :)
>
> OK, I understood. Thanks.
>
>
> Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-03-06 23:48 ` Dan Magenheimer
@ 2008-03-07 1:20 ` Jun Kamada
0 siblings, 0 replies; 34+ messages in thread
From: Jun Kamada @ 2008-03-07 1:20 UTC (permalink / raw)
To: dan.magenheimer@oracle.com
Cc: kama, Steven Smith, James Harper, xen-devel@lists.xensource.com
Hi Dan-san,
On Thu, 6 Mar 2008 16:48:42 -0700
"Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> For a more precise definition of "munge" and for future
> reference see:
>
> http://foldoc.org/index.cgi?query=munge&action=Search
>
> http://foldoc.org/
>
> Hope that helps!
> Dan
It's very helpful for me. Thanks.
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-03-05 2:34 ` James Harper
2008-03-05 9:53 ` Jun Kamada
@ 2008-03-07 2:55 ` Jun Kamada
2008-03-07 4:31 ` James Harper
2008-03-10 12:00 ` Steven Smith
1 sibling, 2 replies; 34+ messages in thread
From: Jun Kamada @ 2008-03-07 2:55 UTC (permalink / raw)
To: James Harper; +Cc: kama, Steven Smith, xen-devel
Hi,
The problems discussed in this thread (which portion of the whole SCSI
tree should be exposed to a guest, and how the guest's tree should be
numbered) are very fundamental and difficult, I think.
My current thinking is that the following two options are reasonable
solutions. What do you think of them? Could you please comment?
Option 1 (LUN assignment)
- Specify the assignment like below:
"host1:channel1:id1:lun1"(Dom0) -> "host2:channel2:id2:lun2"(guest)
lun1 must be the same as lun2.
- Munge :-) the REPORT LUNS command in Dom0 according to the LUNs
actually attached to the guest.
Option 2 (Target Assignment)
- Specify the assignment like below:
"host1:channel1:id1"(Dom0) -> "host2:channel2:id2"(guest)
All LUNs under id1 are assigned to one guest.
- No LUN munging is needed.
For each option, how should host/bus/device reset commands be handled?
Best regards,
On Wed, 5 Mar 2008 13:34:48 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> > This kind of suggests that we should be plumbing things through to the
> > guest with a granularity of whole targets, rather than individual
> > logical units. The alternative is a much more complicated scsi
> > emulation which can fix up the LUN-sensitive commands.
>
> I think we should probably have the option of doing either.
>
> > Allowing this kind of mapping sounds reasonable to me. It would also
> > make it possible (hopefully) to add support for some of the weirder
> > SCSI logical unit addressing modes without changing the frontends
> > (e.g. hierarchical addressing with 64 bit LUNs). That might involve a
> > certain amount of munging of REPORT LUNS commands in the backend,
> > though.
>
> Not sure how much it matters, but any 'munging' of scsi commands would
> be a real drag for Windows drivers. The Windows SCSI layer is very
> strict on lots of things, and is a real pain if you are not talking to a
> physical PCI scsi device.
>
> James
Jun Kamada
* RE: [Patch 0/7] pvSCSI driver
2008-03-07 2:55 ` Jun Kamada
@ 2008-03-07 4:31 ` James Harper
2008-03-14 19:04 ` James Smart
2008-03-10 12:00 ` Steven Smith
1 sibling, 1 reply; 34+ messages in thread
From: James Harper @ 2008-03-07 4:31 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, xen-devel
> Hi,
>
> Problems discussed in this context, what the portion of whole SCSI
> tree should be exposed to guest and how the numbering logic of guest's
> tree should be, is very fundamental and difficult, I think.
>
> In my current thought, following two options are reasonable solutions.
> How do you think about them? Could you please comment me?
>
> Option 1 (LUN assignment)
> - Specify the assignment like below:
> "host1:channel1:id1:lun1"(Dom0) -> "host2:channel2:id2:lun2"(guest)
> The lun1 must be same as the lun2.
> - Munging :-) REPORT LUNS command on Dom0 according to the number of
> LUNs actually attached to the guest.
>
> Option 2 (Target Assignment)
> - Specify the assignment like below:
> "host1:channel1:id1"(Dom0) -> "host2:channel2:id2"(guest)
> All LUNs under id1 are assigned to one guest.
> - Munging for LUN is not needed.
I think it would help to have some real-life examples about where each
option would and wouldn't make sense. It may be that you need to
implement both options. I'm not familiar enough with the variety of scsi
devices out there to be able to judge.
> For each option, how host/bus/device reset command should be?
I have thought about this some more. Typically a reset is issued
because of some error, usually a timeout I assume. You could implement
something like:
- if the reset requested is a device reset, and the DomU 'owns' all luns
attached to the device, then allow the device reset.
- if the reset requested is a device reset, and the DomU 'owns' only
some of the luns attached to the device, then only allow the device
reset if all the other 'owners' have requested a device reset also.
- the above two rules might work for host and bus resets too, as long as
all 'owners' agree to a reset.
The problem might be if you had a device with three luns, and three
DomUs with a single lun each. If the device had hung and required a
reset, then any DomU using it would notice the timeout and issue a
reset, but if one DomU wasn't using its lun at the time it might not
notice. Maybe you need another communication channel where Dom0 can ask
each DomU for permission to do the reset.
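The owner-agreement rules above could be sketched as a vote that Dom0 keeps per physical device. This is only an illustration under assumptions: the `shared_device` structure, the `MAX_OWNERS` limit and the use of domain IDs as owner keys are all hypothetical, and the idle-guest problem just described (an owner that never votes) is not solved here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OWNERS 8   /* hypothetical limit on guests sharing one device */

/* Hypothetical per-device state kept by Dom0. */
struct shared_device {
    int owners[MAX_OWNERS];           /* domain IDs owning >= 1 LUN */
    bool reset_requested[MAX_OWNERS]; /* which owners have asked */
    size_t nowners;
};

/*
 * Record a device-reset request from domid. Returns true when the reset
 * may actually be issued, i.e. every owner has now asked for it (a sole
 * owner therefore gets it immediately). Non-owners are always refused.
 */
static bool request_device_reset(struct shared_device *dev, int domid)
{
    bool known = false;

    for (size_t i = 0; i < dev->nowners; i++) {
        if (dev->owners[i] == domid) {
            dev->reset_requested[i] = true;
            known = true;
        }
    }
    if (!known)
        return false;                 /* not an owner of this device */

    for (size_t i = 0; i < dev->nowners; i++)
        if (!dev->reset_requested[i])
            return false;             /* still waiting for another owner */

    /* Consensus reached: clear the votes for the next round. */
    for (size_t i = 0; i < dev->nowners; i++)
        dev->reset_requested[i] = false;
    return true;
}
```

Because a sole owner's vote completes the set at once, the first rule falls out of the second; what is still missing is a timeout, or a Dom0-to-guest query, for owners that never ask.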
This reset stuff seems like a lot of extra work for probably not much
benefit though.
James
* Re: [Patch 0/7] pvSCSI driver
2008-03-07 2:55 ` Jun Kamada
2008-03-07 4:31 ` James Harper
@ 2008-03-10 12:00 ` Steven Smith
2008-03-12 6:23 ` Jun Kamada
1 sibling, 1 reply; 34+ messages in thread
From: Steven Smith @ 2008-03-10 12:00 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, James Harper, xen-devel
> Problems discussed in this context, what the portion of whole SCSI
> tree should be exposed to guest and how the numbering logic of guest's
> tree should be, is very fundamental and difficult, I think.
>
> In my current thought, following two options are reasonable solutions.
> How do you think about them? Could you please comment me?
>
> Option 1 (LUN assignment)
> - Specify the assignment like below:
> "host1:channel1:id1:lun1"(Dom0) -> "host2:channel2:id2:lun2"(guest)
> The lun1 must be same as the lun2.
> - Munging :-) REPORT LUNS command on Dom0 according to the number of
> LUNs actually attached to the guest.
I think this is the most flexible approach.
One thing to watch out for here is that some old systems get quite
confused if lun0 is missing but some of the higher luns are present.
That's easy to handle if you allow an arbitrary mapping between dom0
and guest luns, but is hard if you require them to be identical. This
might not be an issue in the cases which we care about, though.
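With an arbitrary mapping, the missing-lun0 problem can be avoided by numbering each guest target's LUNs densely from 0. A minimal sketch, assuming the Dom0 LUNs attached to one guest target are kept in an array in the order the guest should see them; both function names are hypothetical.

```c
#include <stddef.h>

/*
 * dom0_luns[] holds the Dom0 LUNs attached to one guest target. The
 * guest LUN of an entry is simply its index, so the guest always sees
 * a LUN 0 whenever anything is attached to that target.
 */

/* Dom0 LUN -> guest LUN; returns -1 if that LUN is not attached. */
static int dom0_to_guest_lun(const unsigned int *dom0_luns, size_t n,
                             unsigned int dom0_lun)
{
    for (size_t i = 0; i < n; i++)
        if (dom0_luns[i] == dom0_lun)
            return (int)i;
    return -1;
}

/* Guest LUN -> Dom0 LUN; returns 0 on success, -1 if out of range. */
static int guest_to_dom0_lun(const unsigned int *dom0_luns, size_t n,
                             unsigned int guest_lun, unsigned int *dom0_lun)
{
    if (guest_lun >= n)
        return -1;
    *dom0_lun = dom0_luns[guest_lun];
    return 0;
}
```

For example, Dom0 LUNs 2, 5 and 9 would appear in the guest as LUNs 0, 1 and 2, which a "lun1 must equal lun2" rule cannot express.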
> Option 2 (Target Assignment)
> - Specify the assignment like below:
> "host1:channel1:id1"(Dom0) -> "host2:channel2:id2"(guest)
> All LUNs under id1 are assigned to one guest.
> - Munging for LUN is not needed.
>
> For each option, how host/bus/device reset command should be?
It's possible that we'll be able to get away with just supporting
LOGICAL UNIT RESET commands, and completely ignoring lower granularity
resets. I'm not sure how widely supported they are on actual
hardware, but it might be good enough for a first implementation. You
might even be able to get away with not supporting any kind of reset
at all, and just accepting that error recovery is going to suck.
Steven.
> On Wed, 5 Mar 2008 13:34:48 +1100
> "James Harper" <james.harper@bendigoit.com.au> wrote:
>
> > > This kind of suggests that we should be plumbing things through to the
> > > guest with a granularity of whole targets, rather than individual
> > > logical units. The alternative is a much more complicated scsi
> > > emulation which can fix up the LUN-sensitive commands.
> >
> > I think we should probably have the option of doing either.
> >
> > > Allowing this kind of mapping sounds reasonable to me. It would also
> > > make it possible (hopefully) to add support for some of the weirder
> > > SCSI logical unit addressing modes without changing the frontends
> > > (e.g. hierarchical addressing with 64 bit LUNs). That might involve a
> > > certain amount of munging of REPORT LUNS commands in the backend,
> > > though.
> >
> > Not sure how much it matters, but any 'munging' of scsi commands would
> > be a real drag for Windows drivers. The Windows SCSI layer is very
> > strict on lots of things, and is a real pain if you are not talking to a
> > physical PCI scsi device.
> >
> > James
>
> Jun Kamada
>
>
* Re: [Patch 0/7] pvSCSI driver
2008-03-10 12:00 ` Steven Smith
@ 2008-03-12 6:23 ` Jun Kamada
2008-03-13 14:30 ` Steven Smith
2008-03-14 19:16 ` James Smart
0 siblings, 2 replies; 34+ messages in thread
From: Jun Kamada @ 2008-03-12 6:23 UTC (permalink / raw)
To: Steven Smith; +Cc: kama, James Harper, xen-devel
Hi Steven-san and James-san,
Thank you for your comments.
We have had an internal discussion based on your comments and reached
the following conclusions. I believe they provide both flexibility and
ease of implementation.
We would like to start modifying the pvSCSI driver accordingly. What
do you think? Are these ideas reasonable? If you have any comments,
please let us know.
-----
1.) Allow specifying an arbitrary mapping between Dom0's SCSI tree and
the guest's SCSI tree, including the "lun".
( Dom0's IDs [host1:channel1:id1:lun1] --->
Guest's IDs [host2:channel2:id2:lun2] )
2.) The guest is responsible for holding the mapping and translating
between Dom0's IDs and the guest's IDs. Which level of
mapping/translation is supported (e.g. only "host", the whole 4-tuple,
or no translation) depends on the guest OS's implementation.
If the guest supports LUN translation and "lun1 != lun2", the guest's
frontend driver should maintain the LUN value in the CDB data
structure accordingly.
3.) For the REPORT LUNS command, Dom0 performs the munging.
4.) Dom0 accepts only the LOGICAL UNIT RESET command.
5.) Of course, the backend driver sanity-checks the IDs that the guest
has already translated.
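Points 1, 2 and 5 together amount to a per-guest table of address pairs: the frontend translates with it, and the backend checks against it. A minimal sketch, assuming each side of the mapping is written in the "host:channel:id:lun" form used in this thread; the structures and helper names are hypothetical, not the driver's.

```c
#include <stddef.h>
#include <stdio.h>

/* A SCSI address in the "host:channel:id:lun" form. */
struct scsi_addr {
    unsigned int host, channel, id, lun;
};

/* One hypothetical mapping entry: Dom0 address -> guest address. */
struct vscsi_map_entry {
    struct scsi_addr dom0;
    struct scsi_addr guest;
};

/* Parse "h:c:t:l"; returns 0 on success, -1 on malformed input. */
static int parse_scsi_addr(const char *s, struct scsi_addr *out)
{
    char extra;
    /* Exactly four fields must convert; %c catches trailing junk. */
    if (sscanf(s, "%u:%u:%u:%u%c", &out->host, &out->channel,
               &out->id, &out->lun, &extra) != 4)
        return -1;
    return 0;
}

static int addr_equal(const struct scsi_addr *a, const struct scsi_addr *b)
{
    return a->host == b->host && a->channel == b->channel &&
           a->id == b->id && a->lun == b->lun;
}

/* Frontend side (point 2): translate a guest address to Dom0's IDs. */
static int guest_to_dom0(const struct vscsi_map_entry *map, size_t n,
                         const struct scsi_addr *guest,
                         struct scsi_addr *dom0)
{
    for (size_t i = 0; i < n; i++) {
        if (addr_equal(&map[i].guest, guest)) {
            *dom0 = map[i].dom0;
            return 0;
        }
    }
    return -1;
}

/* Backend side (point 5): accept only Dom0 addresses given to this guest. */
static int dom0_addr_assigned(const struct vscsi_map_entry *map, size_t n,
                              const struct scsi_addr *dom0)
{
    for (size_t i = 0; i < n; i++)
        if (addr_equal(&map[i].dom0, dom0))
            return 1;
    return 0;
}
```

Keeping the check in the backend matters even though the guest does the translation: the guest is untrusted, so the backend must not act on an address merely because the frontend produced it.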
I would also like to implement the pvSCSI frontend driver for Linux
with the following mapping/translation policy. (Please note that other
guest OSes, such as Windows, can of course adopt a different policy.)
- The guest sees the same tree as Dom0, except for "host".
(This is because arbitrary "host" mapping is difficult with the
current Linux implementation.)
- Naturally, the guest's tree is sparse if some LUNs are not attached
to the guest. The Linux kernel allows lun 0 to be absent, so a
sparse tree is not a problem.
Best regards,
On Mon, 10 Mar 2008 12:00:59 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> > Problems discussed in this context, what the portion of whole SCSI
> > tree should be exposed to guest and how the numbering logic of guest's
> > tree should be, is very fundamental and difficult, I think.
> >
> > In my current thought, following two options are reasonable solutions.
> > How do you think about them? Could you please comment me?
> >
> > Option 1 (LUN assignment)
> > - Specify the assignment like below:
> > "host1:channel1:id1:lun1"(Dom0) -> "host2:channel2:id2:lun2"(guest)
> > The lun1 must be same as the lun2.
> > - Munging :-) REPORT LUNS command on Dom0 according to the number of
> > LUNs actually attached to the guest.
> I think this is the most flexible approach.
>
> One thing to watch out for here is that some old systems get quite
> confused if lun0 is missing but some of the higher luns are present.
> That's easy to handle if you allow an arbitrary mapping between dom0
> and guest luns, but is hard if you require them to be identical. This
> might not be an issue in the cases which we care about, though.
>
> > Option 2 (Target Assignment)
> > - Specify the assignment like below:
> > "host1:channel1:id1"(Dom0) -> "host2:channel2:id2"(guest)
> > All LUNs under id1 are assigned to one guest.
> > - Munging for LUN is not needed.
> >
> > For each option, how host/bus/device reset command should be?
> It's possible that we'll be able to get away with just supporting
> LOGICAL UNIT RESET commands, and completely ignoring lower granularity
> resets. I'm not sure how widely supported they are on actual
> hardware, but it might be good enough for a first implementation. You
> might even be able to get away with not supporting any kind of reset
> at all, and just accepting that error recovery is going to suck.
>
> Steven.
>
> > On Wed, 5 Mar 2008 13:34:48 +1100
> > "James Harper" <james.harper@bendigoit.com.au> wrote:
> >
> > > > This kind of suggests that we should be plumbing things through to the
> > > > guest with a granularity of whole targets, rather than individual
> > > > logical units. The alternative is a much more complicated scsi
> > > > emulation which can fix up the LUN-sensitive commands.
> > >
> > > I think we should probably have the option of doing either.
> > >
> > > > Allowing this kind of mapping sounds reasonable to me. It would also
> > > > make it possible (hopefully) to add support for some of the weirder
> > > > SCSI logical unit addressing modes without changing the frontends
> > > > (e.g. hierarchical addressing with 64 bit LUNs). That might involve a
> > > > certain amount of munging of REPORT LUNS commands in the backend,
> > > > though.
> > >
> > > Not sure how much it matters, but any 'munging' of scsi commands would
> > > be a real drag for Windows drivers. The Windows SCSI layer is very
> > > strict on lots of things, and is a real pain if you are not talking to a
> > > physical PCI scsi device.
> > >
> > > James
> >
> > Jun Kamada
> >
> >
-----
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-03-12 6:23 ` Jun Kamada
@ 2008-03-13 14:30 ` Steven Smith
2008-03-17 2:33 ` Jun Kamada
2008-03-14 19:16 ` James Smart
1 sibling, 1 reply; 34+ messages in thread
From: Steven Smith @ 2008-03-13 14:30 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, James Harper, xen-devel
Looking through the SCSI spec, I don't think we're going to be able to
get away with passing requests through from the frontend all the way
to the physical disk without sanity checking the actual CDB in the
backend. There are a couple of commands which look scary:
-- CHANGE ALIASES/REPORT ALIASES -- the alias list is shared across
everything in the I_T nexus. That will lead to interesting issues
if you ever have multiple guests modifying it at the same time.
-- EXTENDED COPY -- allows you to copy arbitrary data between logical
units, sometimes even ones not in the same target device. That's
obviously going to need to be controlled in a VM setting.
-- Some mode pages, as modified by MODE SELECT, can apply across
multiple LUs. Even more exciting, the level of sharing can in
principle vary between devices, even for the same page.
-- WRITE BUFFER commands can be used to change the microcode on a
device. I've no idea what the implications of letting an untrusted
user push microcode into a device would be, but I doubt it's a good
idea.
-- I'm not sure whether we want to allow untrusted guests to issue SET
PRIORITY commands.
-- We've already been over REPORT LUNS :)
Plus whatever weird things the various device manufacturers decide to
introduce.
What this means is that the REPORT LUNS issue fundamentally isn't
restricted to just the REPORT LUNS command, but instead affects an
unknown and potentially large set of other commands. The only way I
can see to deal with this is to white-list commands individually once
they've been confirmed to be safe, and have the backend block any
commands which haven't been checked yet. That's going to be a fair
amount of work, and it'll screw up the whole ``transparent pass
through'' thing, but I can't see any other way of solving this problem
safely.
(And even that assumes that the hardware people got everything right.
Most devices will be designed on the assumption that only trusted
system components can submit CDBs, so it wouldn't surprise me if some
of them can be made to do bad things if a malicious CDB comes in.
There's not really a great deal we can do about this, though.)
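The white-list idea above can be sketched as a default-deny switch on the CDB operation code. The set of "safe" opcodes here is purely illustrative, an assumption rather than a vetted list, and `classify_cdb` is a hypothetical name.

```c
#include <stdint.h>

enum cdb_action {
    CDB_PASS,        /* forward unchanged to the physical device */
    CDB_SYNTHESIZE,  /* answer in the backend, e.g. REPORT LUNS */
    CDB_REJECT       /* fail the request */
};

/*
 * Default-deny classification on the operation code (CDB byte 0).
 * The "safe" set below is illustrative only, not a vetted list.
 */
static enum cdb_action classify_cdb(const uint8_t *cdb)
{
    switch (cdb[0]) {
    case 0x00: /* TEST UNIT READY */
    case 0x03: /* REQUEST SENSE */
    case 0x12: /* INQUIRY (response may still need filtering) */
    case 0x25: /* READ CAPACITY(10) */
    case 0x28: /* READ(10) */
    case 0x2A: /* WRITE(10) */
    case 0x88: /* READ(16) */
    case 0x8A: /* WRITE(16) */
        return CDB_PASS;
    case 0xA0: /* REPORT LUNS: must reflect only this guest's LUNs */
        return CDB_SYNTHESIZE;
    default:
        /* Everything else, including WRITE BUFFER (0x3B), EXTENDED COPY
         * (0x83), MODE SELECT (0x15/0x55) and MAINTENANCE IN/OUT
         * (0xA3/0xA4), stays blocked until someone has checked it. */
        return CDB_REJECT;
    }
}
```

Finer-grained policies, such as allowing only some MODE SELECT pages, or distinguishing the CHANGE ALIASES and SET PRIORITY service actions carried by MAINTENANCE OUT, would need to look beyond byte 0.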
Backtracking a little, the fundamental goal here is to make some
logical units which are accessible to dom0 appear inside the guest.
Guest operating systems are unlikely to be very happy about having
logical units floating around not attached to scsi hosts, and so we
need (somehow) to come up with a scsi host which has the right set of
logical units attached to it. There are lots of valid use cases in
which there don't exist physical hosts with the right set of LUs, and
so somebody needs to invent one, and then emulate it. That somebody
will necessarily be either the frontend or the backend.
Doing the emulation also gives you the option of filtering out things
like TCQ support in INQUIRY responses, which might be supported by the
physical device but certainly isn't supported by the pvSCSI protocol.
If you emulate the HBA in the backend, you get a design like this:
-- There is usually only one xenbus scsi device attached to any given
VM, and that device represents the emulated HBA.
-- scsifront creates a struct scsi_host (or equivalent) for each
xenbus device, and those provide your interface to the rest of the
guest operating system.
-- When the guest OS submits a request to the frontend driver, it gets
packaged up and shipped over the ring to the backend pretty much
completely unchanged.
-- The backend figures out what the request is doing, and either:
a) Routes it to a physical device, or
b) Synthesises an answer (for things like REPORT LUNS), or
c) Fails the request (for things like WRITE BUFFER),
as appropriate.
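The backend-side dispatch just described might look roughly like this. It is a sketch assuming one emulated HBA per guest that holds a table of the logical units attached to it; `emulated_lu`, `route_request` and the integer `phys_dev` handle are hypothetical, and a real backend would also run an opcode white-list check before the lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* One logical unit exposed through the emulated HBA (hypothetical). */
struct emulated_lu {
    unsigned int channel, id, lun;  /* address as seen by the guest */
    int phys_dev;                   /* handle for the Dom0 device */
};

struct emulated_hba {
    const struct emulated_lu *lus;
    size_t nlus;
};

enum route_kind { ROUTE_PHYSICAL, ROUTE_SYNTHESIZE, ROUTE_FAIL };

struct route {
    enum route_kind kind;
    int phys_dev;                   /* valid only for ROUTE_PHYSICAL */
};

/* Decide what the backend does with one request taken off the ring. */
static struct route route_request(const struct emulated_hba *hba,
                                  unsigned int channel, unsigned int id,
                                  unsigned int lun, const uint8_t *cdb)
{
    struct route r = { ROUTE_FAIL, -1 };

    if (cdb[0] == 0xA0) {           /* REPORT LUNS: answer from the table */
        r.kind = ROUTE_SYNTHESIZE;
        return r;
    }
    for (size_t i = 0; i < hba->nlus; i++) {
        const struct emulated_lu *lu = &hba->lus[i];
        if (lu->channel == channel && lu->id == id && lu->lun == lun) {
            r.kind = ROUTE_PHYSICAL;
            r.phys_dev = lu->phys_dev;
            return r;
        }
    }
    return r;                       /* address not attached to this guest */
}
```

Because the table lives in the backend, the frontend can pass requests through unchanged, which is the "simple frontend, complicated backend" trade-off discussed below.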
If you emulate the HBA in the frontend, you get a design which looks
like this:
-- Each logical unit exposed to the guest has its own xenbus scsi
device.
-- scsifront creates a single struct scsi_host, representing the
emulated HBA.
-- When the guest OS submits a request to the frontend driver, it
either:
a) Routes it to a Xen scsifront and passes it off to the backend, or
b) Synthesises an answer, or
c) Fails the request,
as appropriate.
-- When a request reaches the backend, it does a basic check to make
sure that it's dealing with one of the whitelisted requests, and
then sends it directly to the relevant physical device. The
routing problem is trivial here, because there is only ever one
physical device (struct scsi_device in Linux-speak) associated with
any xenbus device, and the request is just dropped directly into
the relevant request queue.
The first approach gives you a simple frontend at the expense of a
complicated backend, while the second one gives you a simple backend
at the expense of a complicated frontend. It seems likely that there
will be more frontend implementations than backend, which suggests
that putting the HBA emulation in the backend is a better choice.
The main difference from a performance point of view is that the
second approach will use a ring for each device, whereas the first has
a single ring shared across all devices, so you'll get more requests
in flight with the second scheme. I'd expect that just making the
rings larger would have more effect, though, and that's easier when
there's just one of them.
Steven.
On Wed, Mar 12, 2008 at 03:23:00PM +0900, Jun Kamada wrote:
> Date: Wed, 12 Mar 2008 15:23:00 +0900
> From: Jun Kamada <kama@jp.fujitsu.com>
> To: Steven Smith <steven.smith@eu.citrix.com>
> Subject: Re: [Xen-devel] [Patch 0/7] pvSCSI driver
> Cc: kama@jp.fujitsu.com, James Harper <james.harper@bendigoit.com.au>,
> xen-devel@lists.xensource.com
>
> Hi Steven-san and James-san,
>
> Thank you for your comments.
>
> We have had a internal discussion based on your comments and reached
> following thoughts. I consider that the thoughts can provide both
> flexibility and ease of implementation.
>
> We would like to start modification of the pvSCSI driver according to
> the thoughts. How do you think about it? The thoughts is reasonable?
> If you have any comments, could you please?
>
>
> -----
> 1.) Allow specifying arbitrary mapping between Dom0's SCSI tree and
> Guest's SCSI tree. This includes "lun".
> ( Dom0's IDs [host1:channel1:id1:lun1] --->
> Guest's IDs [host2:channel2:id2:lun2] )
> 2.) Guest has responsibility to have mapping and transform between
> Dom0's IDs and Guest's IDs. It depends on guest OS's implementation
> which level(e.g. only "host" or all of 4-tuples or no-transform) of
> mapping/transformation will be supported.
> If guest decides to support lun transformation and in case of
> "lun1 != lun2", the guest's frontend driver should maintain LUN
> value in CDB data structure.
> 3.) As for REPORT LUNS command, Dom0 performs munging.
> 4.) Dom0 accepts only LOGICAL UNIT RESET command.
> 5.) Of course, the backend driver performs sanity check of IDs that the
> guest already transformed.
>
>
> And I would like to implement pvSCSI frontend driver for Linux by
> following mapping/transformation policy. (Please note that another guest
> OS such as Windows can take another policy, of course.)
>
> - The guest looks identical tree as Dom0 looks except for "host".
> (This comes by the reason that arbitrary "host" mapping is difficult
> for current Linux implementation.)
> - Of course, the guest's tree is sparse if some LUNs were not attached
> to the guest. Linux kernel allows the situation that lun=0 does not
> exist, therefore sparse tree is not a problem.
>
>
> Best regards,
>
>
> On Mon, 10 Mar 2008 12:00:59 +0000
> Steven Smith <steven.smith@eu.citrix.com> wrote:
>
> > > The problems discussed in this context, namely what portion of the whole
> > > SCSI tree should be exposed to the guest and how the guest's tree should
> > > be numbered, are very fundamental and difficult, I think.
> > >
> > > In my current thinking, the following two options are reasonable solutions.
> > > What do you think about them? Could you please comment?
> > >
> > > Option 1 (LUN assignment)
> > > - Specify the assignment like below:
> > > "host1:channel1:id1:lun1"(Dom0) -> "host2:channel2:id2:lun2"(guest)
> > > The lun1 must be the same as the lun2.
> > > - Munging :-) REPORT LUNS command on Dom0 according to the number of
> > > LUNs actually attached to the guest.
> > I think this is the most flexible approach.
> >
> > One thing to watch out for here is that some old systems get quite
> > confused if lun0 is missing but some of the higher luns are present.
> > That's easy to handle if you allow an arbitrary mapping between dom0
> > and guest luns, but is hard if you require them to be identical. This
> > might not be an issue in the cases which we care about, though.
> >
> > > Option 2 (Target Assignment)
> > > - Specify the assignment like below:
> > > "host1:channel1:id1"(Dom0) -> "host2:channel2:id2"(guest)
> > > All LUNs under id1 are assigned to one guest.
> > > - Munging for LUN is not needed.
> > >
> > > For each option, how host/bus/device reset command should be?
> > It's possible that we'll be able to get away with just supporting
> > LOGICAL UNIT RESET commands, and completely ignoring lower granularity
> > resets. I'm not sure how widely supported they are on actual
> > hardware, but it might be good enough for a first implementation. You
> > might even be able to get away with not supporting any kind of reset
> > at all, and just accepting that error recovery is going to suck.
> >
> > Steven.
> >
> > > On Wed, 5 Mar 2008 13:34:48 +1100
> > > "James Harper" <james.harper@bendigoit.com.au> wrote:
> > >
> > > > > This kind of suggests that we should be plumbing things through to the
> > > > > guest with a granularity of whole targets, rather than individual
> > > > > logical units. The alternative is a much more complicated scsi
> > > > > emulation which can fix up the LUN-sensitive commands.
> > > >
> > > > I think we should probably have the option of doing either.
> > > >
> > > > > Allowing this kind of mapping sounds reasonable to me. It would also
> > > > > make it possible (hopefully) to add support for some of the weirder
> > > > > SCSI logical unit addressing modes without changing the frontends
> > > > > (e.g. hierarchical addressing with 64 bit LUNs). That might involve a
> > > > > certain amount of munging of REPORT LUNS commands in the backend,
> > > > > though.
> > > >
> > > > Not sure how much it matters, but any 'munging' of scsi commands would
> > > > be a real drag for Windows drivers. The Windows SCSI layer is very
> > > > strict on lots of things, and is a real pain if you are not talking to a
> > > > physical PCI scsi device.
> > > >
> > > > James
> > > >
> > > > _______________________________________________
> > > > Xen-devel mailing list
> > > > Xen-devel@lists.xensource.com
> > > > http://lists.xensource.com/xen-devel
> > >
> > > Jun Kamada
> > >
> > >
>
>
> -----
> Jun Kamada
>
>
* Re: [Patch 0/7] pvSCSI driver
2008-03-07 4:31 ` James Harper
@ 2008-03-14 19:04 ` James Smart
0 siblings, 0 replies; 34+ messages in thread
From: James Smart @ 2008-03-14 19:04 UTC (permalink / raw)
To: James Harper; +Cc: Jun Kamada, Steven Smith, xen-devel
James Harper wrote:
> This reset stuff seems like a lot of extra work for probably not much
> benefit though.
This ends up being the crux of it...
It all depends on how important the use of scsi is to the consumer. In other words,
for a scsi disk layered under LVM, filesystems, etc, the nuances of the resets and
inter-relations between luns and targets aren't that meaningful, and they will happily
live in a world where these things are emulated.
However, if the scsi disk is talked to directly via things like sg tools, or things
like multipathing software (where failover is disk & target specific), it matters
more.
And if the scsi disk is handed all the way to the database, it matters *much* *much*
more. In fact, this is one reason you find very few enterprise databases supported in
virtualized environments.
All this hints at levels of pass-thru. Granted, you can always take one step at a time.
-- james
* Re: [Patch 0/7] pvSCSI driver
2008-03-12 6:23 ` Jun Kamada
2008-03-13 14:30 ` Steven Smith
@ 2008-03-14 19:16 ` James Smart
2008-03-17 2:59 ` Jun Kamada
1 sibling, 1 reply; 34+ messages in thread
From: James Smart @ 2008-03-14 19:16 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, James Harper, xen-devel
Jun Kamada wrote:
> -----
> 1.) Allow specifying arbitrary mapping between Dom0's SCSI tree and
> Guest's SCSI tree. This includes "lun".
> ( Dom0's IDs [host1:channel1:id1:lun1] --->
> Guest's IDs [host2:channel2:id2:lun2] )
When considering a model of FC NPIV or IOV-based ports, it would really be
nice to allow a model where the mapping can be stronger and done in a single
step, e.g. map everything from a particular scsi_host into the DomU.
Note: I'm not lobbying for a change in emulation, but rather trying to automate
the arbitrary and individual mappings when there is a higher level (relative to
the scsi tree) association to the DomU.
Note: given that channel # is specific to the host#,
and id # is specific to the channel #,
and lun # is specific to the id #
there's no real reason why they couldn't be the same, or at least
overlap, with the Dom0 values. It's all up to whoever is doing
the transformation or emulation.
> 2.) Guest has responsibility to have mapping and transform between
> Dom0's IDs and Guest's IDs. It depends on guest OS's implementation
> which level(e.g. only "host" or all of 4-tuples or no-transform) of
> mapping/transformation will be supported.
> If guest decides to support lun transformation and in case of
> "lun1 != lun2", the guest's frontend driver should maintain LUN
> value in CDB data structure.
Wow. This seems odd. In my mind, this really is based on the abstraction
you choose between the DomU and Dom0. You're either exporting SCSI Disks,
SCSI targets, or SCSI Hosts. Each of these dictates differences in the way
the emulation is done.
I would have thought the translation always occurs on the Dom0 side.
> 3.) As for REPORT LUNS command, Dom0 performs munging.
This is in line with my last statement - it's on the Dom0 side.
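If Dom0 does the REPORT LUNS munging, the backend has to build the response itself from the LUs it attached to the guest instead of forwarding the physical target's list. A minimal sketch, assuming the SPC parameter-data layout (4-byte big-endian LUN LIST LENGTH, 4 reserved bytes, then one 8-byte entry per LUN) and LUN values below 16384; `vscsi_build_report_luns` is a hypothetical name, not part of the posted patches:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one byte only if it falls inside the guest's allocation length. */
static void put_byte(uint8_t *buf, size_t alloc_len, size_t off, uint8_t v)
{
    if (off < alloc_len)
        buf[off] = v;
}

/*
 * Hypothetical backend helper: build REPORT LUNS parameter data that lists
 * only the LUNs attached to this guest. Each LUN is assumed to be < 16384.
 * Returns the number of bytes written (never more than alloc_len).
 */
size_t vscsi_build_report_luns(const uint16_t *luns, size_t nluns,
                               uint8_t *buf, size_t alloc_len)
{
    uint32_t list_len = (uint32_t)(nluns * 8); /* 8 bytes per LUN entry */
    size_t total = 8 + (size_t)list_len;       /* 8-byte header + entries */
    size_t n = total < alloc_len ? total : alloc_len;

    memset(buf, 0, n); /* reserved bytes and unused LUN bytes are zero */

    /* Bytes 0-3: LUN LIST LENGTH (big-endian); bytes 4-7 stay reserved. */
    put_byte(buf, alloc_len, 0, (uint8_t)(list_len >> 24));
    put_byte(buf, alloc_len, 1, (uint8_t)(list_len >> 16));
    put_byte(buf, alloc_len, 2, (uint8_t)(list_len >> 8));
    put_byte(buf, alloc_len, 3, (uint8_t)list_len);

    for (size_t i = 0; i < nluns; i++) {
        size_t e = 8 + i * 8; /* offset of this LUN's 8-byte entry */
        if (luns[i] < 256) {
            /* Peripheral device addressing, bus 0: byte 0 = 00h, byte 1 = LUN. */
            put_byte(buf, alloc_len, e + 1, (uint8_t)luns[i]);
        } else {
            /* Flat space addressing: 01b in the top two bits of byte 0. */
            put_byte(buf, alloc_len, e, (uint8_t)(0x40 | ((luns[i] >> 8) & 0x3F)));
            put_byte(buf, alloc_len, e + 1, (uint8_t)(luns[i] & 0xFF));
        }
    }
    return n;
}
```

The LUN LIST LENGTH field reports the full list even when the allocation length truncates the data, so a guest that passed a short buffer can retry with a larger one.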
> 4.) Dom0 accepts only LOGICAL UNIT RESET command.
Note: At least for Linux stacks, and I know it's true for older Windows
releases as well, the scsi stacks don't generate LOGICAL UNIT RESETS.
They are either Target Resets or Bus Resets.
> 5.) Of course, the backend driver performs sanity check of IDs that the
> guest already transformed.
And if you're checking it - why are you (the Dom0) managing the transformation ?
Sounds like the work got done twice in your proposal.
-- james
* Re: [Patch 0/7] pvSCSI driver
2008-03-13 14:30 ` Steven Smith
@ 2008-03-17 2:33 ` Jun Kamada
2008-03-17 17:29 ` Steven Smith
0 siblings, 1 reply; 34+ messages in thread
From: Jun Kamada @ 2008-03-17 2:33 UTC (permalink / raw)
To: Steven Smith; +Cc: kama, James Harper, xen-devel
Hi Steven-san,
On Thu, 13 Mar 2008 14:30:10 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> Backtracking a little, the fundamental goal here is to make some
> logical units which are accessible to dom0 appear inside the guest.
> Guest operating systems are unlikely to be very happy about having
> logical units floating around not attached to scsi hosts, and so we
> need (somehow) to come up with a scsi host which has the right set of
> logical units attached to it. There are lots of valid use cases in
> which there don't exist physical hosts with the right set of LUs, and
> so somebody needs to invent one, and then emulate it. That somebody
> will necessarily be either the frontend or the backend.
>
> Doing the emulation also gives you the option of filtering out things
> like TCQ support in INQUIRY commands, which might be supported by the
> physical device but certainly isn't supported by the pvSCSI protocol.
>
> If you emulate the HBA in the backend, you get a design like this:
>
> -- There is usually only one xenbus scsi device attached to any given
> VM, and that device represents the emulated HBA.
>
> -- scsifront creates a struct scsi_host (or equivalent) for each
> xenbus device, and those provide your interface to the rest of the
> guest operating system.
>
> -- When the guest OS submits a request to the frontend driver, it gets
> packaged up and shipped over the ring to the backend pretty much
> completely unchanged.
>
> -- The backend figures out what the request is doing, and either:
>
> a) Routes it to a physical device, or
> b) Synthesises an answer (for things like REPORT LUNS), or
> c) Fails the request (for things like WRITE BUFFER),
>
> as appropriate.
>
> If you emulate the HBA in the frontend, you get a design which looks
> like this:
>
> -- Each logical unit exposed to the guest has its own xenbus scsi
> device.
>
> -- scsifront creates a single struct scsi_host, representing the
> emulated HBA.
>
> -- When the guest OS submits a request to the frontend driver, it
> either:
>
> a) Routes it to a Xen scsifront and passes it off to the backend, or
> b) Synthesises an answer, or
> c) Fails the request,
>
> as appropriate.
>
> -- When a request reaches the backend, it does a basic check to make
> sure that it's dealing with one of the whitelisted requests, and
> then sends it directly to the relevant physical device. The
> routing problem is trivial here, because there is only ever one
> physical device (struct scsi_device in Linux-speak) associated with
> any xenbus device, and the request is just dropped directly into
> the relevant request queue.
>
> The first approach gives you a simple frontend at the expense of a
> complicated backend, while the second one gives you a simple backend
> at the expense of a complicated frontend. It seems likely that there
> will be more frontend implementations than backend, which suggests
> that putting the HBA emulation in the backend is a better choice.
I agree with your thoughts. On the other hand, I also think that
"more frontend implementations" suggests each guest OS may have its own
emulation policy, and therefore emulating on the frontend may be suitable.
It's very difficult to decide which approach I should take.
Each approach has both good points and bad points. :-<
However, I would like to take the first approach, emulation on the
backend, according to your and James Smart-san's advice, and to start
implementation. :-)
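With the emulation in the backend, filtering TCQ support out of INQUIRY data, as Steven suggested, becomes a post-processing step on the response. A rough sketch, assuming the standard INQUIRY data layout in which byte 7 bit 1 is CMDQUE; the hook name and the point at which the backend calls it are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define INQUIRY_OPCODE   0x12
#define INQ_EVPD_BIT     0x01 /* CDB byte 1: request a VPD page instead */
#define INQ_BYTE7_CMDQUE 0x02 /* standard INQUIRY data byte 7, bit 1 */

/*
 * Hypothetical hook: run after the physical device completes a command and
 * before the data is copied back to the guest. Clears the command-queueing
 * capability bit from standard INQUIRY data; everything else is untouched.
 */
void vscsi_filter_inquiry(const uint8_t *cdb, uint8_t *data, size_t len)
{
    if (cdb[0] != INQUIRY_OPCODE)
        return;
    if (cdb[1] & INQ_EVPD_BIT)
        return; /* VPD pages have a different layout */
    if (len > 7)
        data[7] &= (uint8_t)~INQ_BYTE7_CMDQUE;
}
```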
> The main difference from a performance point of view is that the
> second approach will use a ring for each device, whereas the first has
> a single ring shared across all devices, so you'll get more requests
> in flight with the second scheme. I'd expect that just making the
> rings larger would have more effect, though, and that's easier when
> there's just one of them.
>
I expect Netchannel2 to solve the performance issues.
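To put rough numbers on "making the rings larger": Xen's shared-ring macros fit as many slots as possible into the ring area after the shared header and round that count down to a power of two. A back-of-the-envelope sketch, assuming a 64-byte shared-ring header (as in the Xen 3.x `io/ring.h`) and a hypothetical 112-byte ring entry; the real pvSCSI entry size depends on its request structure, and a ring larger than one page would need protocol support for granting more than one page:

```c
#include <stddef.h>

/* Assumption: req_prod, req_event, rsp_prod, rsp_event plus 48 bytes of pad. */
#define SRING_HEADER_BYTES 64

/* Round down to a power of two, as Xen's __RD32 macro does. */
static size_t rd_pow2(size_t n)
{
    size_t p = 1;
    if (n == 0)
        return 0;
    while (p * 2 <= n)
        p *= 2;
    return p;
}

/* Number of request/response slots in a shared ring of ring_bytes bytes. */
size_t ring_slots(size_t ring_bytes, size_t entry_bytes)
{
    if (ring_bytes <= SRING_HEADER_BYTES || entry_bytes == 0)
        return 0;
    return rd_pow2((ring_bytes - SRING_HEADER_BYTES) / entry_bytes);
}
```

Under these assumptions, one 4 KiB page gives 32 slots and four pages give 128, which is the effect Steven expects from a single larger ring.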
> Looking through the SCSI spec, I don't think we're going to be able to
> get away with passing requests through from the frontend all the way
> to the physical disk without sanity checking the actual CDB in the
> backend. There are a couple of commands which look scary:
>
> -- CHANGE ALIAS/REPORT ALIAS -- the alias list is shared across
> everything in the I_T nexus. That will lead to interesting issues
> if you ever have multiple guests modifying it at the same time.
>
> -- EXTENDED COPY -- allows you to copy arbitrary data between logical
> units, sometimes even ones not in the same target device. That's
> obviously going to need to be controlled in a VM setting.
>
> -- Some mode pages, as modified by MODE SELECT, can apply across
> multiple LUs. Even more exciting, the level of sharing can in
> principle vary between devices, even for the same page.
>
> -- WRITE BUFFER commands can be used to change the microcode on a
> device. I've no idea what the implications of letting an untrusted
> user push microcode into a device would be, but I doubt it's a good
> idea.
>
> -- I'm not sure whether we want to allow untrusted guests to issue SET
> PRIORITY commands.
>
> -- We've already been over REPORT LUNS :)
>
> Plus whatever weird things the various device manufacturers decide to
> introduce.
>
> What this means is that the REPORT LUNS issue fundamentally isn't
> restricted to just the REPORT LUNS command, but instead affects an
> unknown and potentially large set of other commands. The only way I
> can see to deal with this is to white-list commands individually once
> they've been confirmed to be safe, and have the backend block any
> commands which haven't been checked yet. That's going to be a fair
> amount of work, and it'll screw up the whole ``transparent pass
> through'' thing, but I can't see any other way of solving this problem
> safely.
I will take the approach of starting with a white-list of mandatory SCSI
commands and then extending it to other commands.
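A default-deny whitelist in the backend could start as small as the following. This is a sketch: the opcode set is illustrative rather than a vetted list of mandatory commands, and `vscsi_opcode_allowed` is a hypothetical name. Everything not listed, including WRITE BUFFER, EXTENDED COPY, MODE SELECT, and MAINTENANCE OUT (which carries CHANGE ALIASES and SET PRIORITY), is refused:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative starting set only; extend as each command is reviewed. */
static const uint8_t vscsi_allowed_opcodes[] = {
    0x00, /* TEST UNIT READY */
    0x03, /* REQUEST SENSE */
    0x12, /* INQUIRY (response may still be filtered) */
    0x25, /* READ CAPACITY(10) */
    0x28, /* READ(10) */
    0x2A, /* WRITE(10) */
    0xA0, /* REPORT LUNS (answered by the backend, not the device) */
};

/*
 * Default deny: anything not in the table is blocked, e.g. WRITE BUFFER
 * (0x3B), EXTENDED COPY (0x83), MODE SELECT (0x15/0x55), MAINTENANCE OUT (0xA4).
 */
bool vscsi_opcode_allowed(uint8_t opcode)
{
    for (size_t i = 0; i < sizeof vscsi_allowed_opcodes; i++)
        if (vscsi_allowed_opcodes[i] == opcode)
            return true;
    return false;
}
```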
> (And even that assumes that the hardware people got everything right.
> Most devices will be designed on the assumption that only trusted
> system components can submit CDBs, so it wouldn't surprise me if some
> of them can be made to do bad things if a malicious CDB comes in.
> There's not really a great deal we can do about this, though.)
Best regards,
-----
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-03-14 19:16 ` James Smart
@ 2008-03-17 2:59 ` Jun Kamada
0 siblings, 0 replies; 34+ messages in thread
From: Jun Kamada @ 2008-03-17 2:59 UTC (permalink / raw)
To: James.Smart; +Cc: kama, Steven Smith, James Harper, xen-devel
Hi James-san,
Thank you for your comments.
On Fri, 14 Mar 2008 15:16:44 -0400
James Smart <James.Smart@Emulex.Com> wrote:
> Jun Kamada wrote:
> > -----
> > 1.) Allow specifying arbitrary mapping between Dom0's SCSI tree and
> > Guest's SCSI tree. This includes "lun".
> > ( Dom0's IDs [host1:channel1:id1:lun1] --->
> > Guest's IDs [host2:channel2:id2:lun2] )
>
> When considering a model of FC NPIV or IOV-based ports, it would really be
> nice to allow a model where the mapping can be stronger and done in a single
> step, e.g. map everything from a particular scsi_host into the DomU.
>
> Note: I'm not lobbying for a change in emulation, but rather trying to automate
> the arbitrary and individual mappings when there is a higher level (relative to
> the scsi tree) association to the DomU.
I have the same thought: an interface such as a wild-card (for
example 1:0:*:*) is needed.
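A wild-card form such as `1:0:*:*` could be handled by parsing each field as either a number or `*` and matching the result against Dom0's host:channel:target:lun tuples. A minimal sketch; the function names and the `VSCSI_ANY` marker are hypothetical, and any syntax beyond the `1:0:*:*` example is an assumption:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define VSCSI_ANY UINT32_MAX /* hypothetical marker for a '*' field */

/* Parse "h:c:t:l", where each field is a decimal number or '*'. */
bool vscsi_parse_pattern(const char *s, uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        if (*s == '*') {
            out[i] = VSCSI_ANY;
            s++;
        } else {
            char *end;
            unsigned long v;
            if (!isdigit((unsigned char)*s))
                return false; /* rejects empty fields, signs, spaces */
            v = strtoul(s, &end, 10);
            if (v >= VSCSI_ANY)
                return false; /* out of range for a 32-bit ID */
            out[i] = (uint32_t)v;
            s = end;
        }
        if (i < 3) {
            if (*s != ':')
                return false;
            s++;
        }
    }
    return *s == '\0';
}

/* True if the Dom0 tuple hctl[4] matches the parsed pattern. */
bool vscsi_pattern_match(const uint32_t pat[4], const uint32_t hctl[4])
{
    for (int i = 0; i < 4; i++)
        if (pat[i] != VSCSI_ANY && pat[i] != hctl[i])
            return false;
    return true;
}
```

The tool side would then walk Dom0's SCSI devices and attach every LU for which `vscsi_pattern_match` returns true, which automates the individual mappings James describes.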
> Note: given that channel # is specific to the host#,
> and id # is specific to the channel #,
> and lun # is specific to the id #
> there's no real reason why they couldn't be the same, or at least
> overlap, with the Dom0 values. It's all up to whomever is doing
> the transformation or emulation.
>
> > 2.) Guest has responsibility to have mapping and transform between
> > Dom0's IDs and Guest's IDs. It depends on guest OS's implementation
> > which level(e.g. only "host" or all of 4-tuples or no-transform) of
> > mapping/transformation will be supported.
> > If guest decides to support lun transformation and in case of
> > "lun1 != lun2", the guest's frontend driver should maintain LUN
> > value in CDB data structure.
>
> Wow. This seems odd. In my mind, this really is based on the abstraction
> you choose between the DomU and Dom0. You're either exporting SCSI Disks,
> SCSI targets, or SCSI Hosts. Each of these dictates differences in the way
> the emulation is done.
>
> I would have thought the translation always occurs on the Dom0 side.
I would like to take the backend-side emulation approach, as mentioned in
another mail. Thank you for your advice.
> > 3.) As for REPORT LUNS command, Dom0 performs munging.
>
> This is in line with my last statement - it's on the Dom0 side.
>
> > 4.) Dom0 accepts only LOGICAL UNIT RESET command.
>
> Note: At least for Linux stacks, and I know it's true for older Windows
> releases as well, the scsi stacks don't generate LOGICAL UNIT RESETS.
> They are either Target Resets or Bus Resets.
This is a very difficult issue, and there is no good solution. :-<
> > 5.) Of course, the backend driver performs sanity check of IDs that the
> > guest already transformed.
>
> And if you're checking it - why are you (the Dom0) managing the transformation ?
> Sounds like the work got done twice in your proposal.
>
> -- james
Best regards,
-----
Jun Kamada
* Re: [Patch 0/7] pvSCSI driver
2008-03-17 2:33 ` Jun Kamada
@ 2008-03-17 17:29 ` Steven Smith
0 siblings, 0 replies; 34+ messages in thread
From: Steven Smith @ 2008-03-17 17:29 UTC (permalink / raw)
To: Jun Kamada; +Cc: Steven Smith, James Harper, xen-devel
> > The first approach gives you a simple frontend at the expense of a
> > complicated backend, while the second one gives you a simple backend
> > at the expense of a complicated frontend. It seems likely that there
> > will be more frontend implementations than backend, which suggests
> > that putting the HBA emulation in the backend is a better choice.
> I agree with your thoughts. On the other hand, I also think that
> "more frontend implementations" suggests each guest OS may have its own
> emulation policy, and therefore emulating on the frontend may be suitable.
> It's very difficult to decide which approach I should take.
> Each approach has both good points and bad points. :-<
>
> However, I would like to take the first approach, emulation on the
> backend, according to your and James Smart-san's advice, and to start
> implementation. :-)
It's a tricky decision, but I think this is the best path.
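With the HBA emulated in the backend, every request coming off the ring has to be classified into the three cases listed earlier: route it to a physical device, synthesise an answer, or fail it. A minimal sketch, assuming one emulated HBA per xenbus device (so the guest-side host number is implicit and only channel/id/lun are compared), a per-guest mapping table, and a caller-supplied 256-entry opcode whitelist; all names here are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vscsi_id { uint32_t host, channel, id, lun; };

/* One entry per LU attached to the guest: guest address -> Dom0 address. */
struct vscsi_map_entry { struct vscsi_id guest, dom0; };

enum vscsi_action { VSCSI_ROUTE, VSCSI_SYNTHESIZE, VSCSI_FAIL };

#define REPORT_LUNS 0xA0

/*
 * Classify one guest request. On VSCSI_ROUTE, *dom0_out holds the Dom0
 * address the request should be queued to.
 */
enum vscsi_action vscsi_dispatch(const struct vscsi_map_entry *map, size_t n,
                                 const struct vscsi_id *guest, uint8_t opcode,
                                 const bool allowed[256],
                                 struct vscsi_id *dom0_out)
{
    /* REPORT LUNS is answered from the map, never from the physical target. */
    if (opcode == REPORT_LUNS)
        return VSCSI_SYNTHESIZE;

    for (size_t i = 0; i < n; i++) {
        const struct vscsi_id *g = &map[i].guest;
        if (g->channel == guest->channel && g->id == guest->id &&
            g->lun == guest->lun) {
            if (!allowed[opcode])
                return VSCSI_FAIL; /* command not yet whitelisted */
            *dom0_out = map[i].dom0;
            return VSCSI_ROUTE;
        }
    }
    return VSCSI_FAIL; /* guest addressed an LU it does not own */
}
```

For example, with a map entry from guest 0:0:1:0 to Dom0 4:0:0:10 and READ(10) (0x28) whitelisted, a READ(10) to 0:0:1:0 is routed to 4:0:0:10, WRITE BUFFER to the same LU fails, and any unmapped address fails.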
> > The main difference from a performance point of view is that the
> > second approach will use a ring for each device, whereas the first has
> > a single ring shared across all devices, so you'll get more requests
> > in flight with the second scheme. I'd expect that just making the
> > rings larger would have more effect, though, and that's easier when
> > there's just one of them.
> I expect Netchannel2 to solve the performance issues.
It'll avoid this particular issue, yes.
> > Looking through the SCSI spec, I don't think we're going to be able to
> > get away with passing requests through from the frontend all the way
> > to the physical disk without sanity checking the actual CDB in the
> > backend. There are a couple of commands which look scary:
...
> > What this means is that the REPORT LUNS issue fundamentally isn't
> > restricted to just the REPORT LUNS command, but instead affects an
> > unknown and potentially large set of other commands. The only way I
> > can see to deal with this is to white-list commands individually once
> > they've been confirmed to be safe, and have the backend block any
> > commands which haven't been checked yet. That's going to be a fair
> > amount of work, and it'll screw up the whole ``transparent pass
> > through'' thing, but I can't see any other way of solving this problem
> > safely.
> I will take the approach of starting with a white-list of mandatory SCSI
> commands and then extending it to other commands.
Thank you.
Steven.
end of thread
Thread overview: 34+ messages
2008-02-18 10:10 [Patch 0/7] pvSCSI driver Jun Kamada
2008-02-18 12:14 ` James Harper
2008-02-19 2:26 ` Jun Kamada
2008-02-19 2:28 ` James Harper
2008-02-20 3:58 ` James Harper
2008-02-20 5:09 ` Jun Kamada
2008-02-21 2:19 ` James Harper
2008-02-21 3:39 ` James Harper
2008-02-21 4:23 ` Jun Kamada
2008-02-21 5:30 ` James Harper
2008-02-25 1:53 ` Jun Kamada
2008-02-27 11:16 ` Steven Smith
2008-02-28 2:51 ` Jun Kamada
2008-02-28 11:13 ` Steven Smith
2008-02-29 4:47 ` Jun Kamada
2008-03-03 11:38 ` Steven Smith
2008-03-04 7:57 ` Jun Kamada
2008-03-04 13:05 ` Steven Smith
2008-03-05 2:34 ` James Harper
2008-03-05 9:53 ` Jun Kamada
2008-03-05 9:56 ` James Harper
2008-03-05 10:00 ` Jun Kamada
2008-03-06 23:48 ` Dan Magenheimer
2008-03-07 1:20 ` Jun Kamada
2008-03-07 2:55 ` Jun Kamada
2008-03-07 4:31 ` James Harper
2008-03-14 19:04 ` James Smart
2008-03-10 12:00 ` Steven Smith
2008-03-12 6:23 ` Jun Kamada
2008-03-13 14:30 ` Steven Smith
2008-03-17 2:33 ` Jun Kamada
2008-03-17 17:29 ` Steven Smith
2008-03-14 19:16 ` James Smart
2008-03-17 2:59 ` Jun Kamada