* [Qemu-devel] virtio-scsi spec, first public draft
From: Paolo Bonzini @ 2011-05-04 22:28 UTC
To: qemu-devel, Hannes Reinecke, Stefan Hajnoczi
Here it is at last...
It might be overengineered, I'm waiting for the SCSI experts to tell me
about that. :)
Paolo
[-- Attachment #2: virtio-scsi-v4.txt --]
Virtio SCSI Controller Device Spec
==================================
The virtio controller device groups together one or more simple virtual
devices (e.g. disks), and allows communicating with these devices using
the SCSI protocol. A controller device represents a SCSI host with many
targets attached.
The virtio controller services two kinds of requests:
- command requests for a logical unit;
- task management functions related to a logical unit, target or
command.
The controller is also able to send out notifications about added
and removed devices.
v4:
First public version
Configuration
-------------
Subsystem Device ID
TBD
Virtqueues
0..n-1: one requestq per target
n:      control transmitq
n+1:    control receiveq
Feature bits
VIRTIO_SCSI_F_INOUT - Whether a single request can include both
read-only and write-only data buffers.
Device configuration layout
struct virtio_scsi_config {
u32 num_targets;
}
num_targets is the number of targets; it is also the index of the
virtqueue used for the control transmitq (the control receiveq has
index num_targets + 1).
Device initialization
---------------------
The initialization routine should first discover the controller's
control virtqueues.
The driver should then place at least one buffer in the control receiveq.
Buffers returned by the device on the control receiveq may be referred
to as "events" in the rest of the document.
The driver can immediately issue requests (for example, INQUIRY or
REPORT LUNS) or task management functions (for example, I_T NEXUS RESET).
Device operation: request queue
-------------------------------
The driver queues requests to the virtqueue, and they are used by the device
(not necessarily in order).
Requests have the following format:
struct virtio_scsi_req
{
u32 type;
...
u8 response;
}
#define VIRTIO_SCSI_T_BARRIER 0x80000000
The type identifies the remaining fields. The value
VIRTIO_SCSI_T_BARRIER can be ORed in the type as well. This bit
indicates that this request acts as a barrier and that all preceding
requests must be complete before this one, and all following requests
must not be started until this one is complete. Note that a barrier
does not flush caches in the host's underlying backend device,
and thus does not serve as a data consistency guarantee. The driver
must send a SYNCHRONIZE CACHE command to flush the host cache.
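For example, a driver wanting such ordering around a request would set
the bit as in this non-normative sketch:

    /* Sketch: mark a request as a barrier.  A SYNCHRONIZE CACHE
     * command is still needed to flush the host cache. */
    req->type = VIRTIO_SCSI_T_CMD | VIRTIO_SCSI_T_BARRIER;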
Valid response values are defined separately for each command.
- Task management function
#define VIRTIO_SCSI_T_TMF 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH (1 << 24)
struct virtio_scsi_req_tmf
{
u32 subtype;
u8 lun[8];
u8 additional[];
u8 response;
}
/* command-specific response values */
#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
#define VIRTIO_SCSI_S_NO_TARGET 1
#define VIRTIO_SCSI_S_TARGET_FAILURE 2
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 3
#define VIRTIO_SCSI_S_FUNCTION_REJECTED 4
#define VIRTIO_SCSI_S_INCORRECT_LUN 5
The type is VIRTIO_SCSI_T_LUN_INFO, possibly with the
VIRTIO_SCSI_T_BARRIER bit ORed in.
The subtype and lun fields are filled in by the driver; the additional
and response fields are filled in by the device. Unknown LUNs are
ignored; also, the lun field is ignored for the I_T NEXUS RESET
command.
Task management functions accepting an I_T_L_Q nexus (ABORT TASK,
QUERY TASK) are only accessible through the control transmitq.
Task management functions not in the above list are not accessible
in this version of the specification. Future versions may allow
access to them through additional features.
VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
logical unit (and the target as well if this is the last logical
unit) disappear. It takes an I_T_L nexus. This non-standard TMF
should be used in response to a host request to shut down a target
or LUN, after having placed the LUN in a clean state.
The outcome of the task management function is written by the device
in the response field. A value of VIRTIO_SCSI_S_NO_TARGET means
that (even though the virtqueue exists) there is no target with this
number. Other return values map 1-to-1 with those defined in SAM.
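As a non-normative sketch (the descriptor helpers and the "lun"
variable are illustrative, not part of the spec), a LOGICAL UNIT RESET
could be built as follows; the driver-written fields go in read-only
descriptors and the response byte in a write-only descriptor:

    /* Sketch: issue a LOGICAL UNIT RESET on a target's requestq. */
    struct {
        u32 type;                       /* VIRTIO_SCSI_T_TMF */
        u32 subtype;
        u8  lun[8];
    } out = {
        .type    = VIRTIO_SCSI_T_TMF,
        .subtype = VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET,
    };
    u8 response;                        /* written by the device */

    memcpy(out.lun, lun, sizeof(out.lun));
    add_readable(vq, &out, sizeof(out));    /* illustrative helpers */
    add_writable(vq, &response, sizeof(response));
    kick(vq);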
- SCSI command
#define VIRTIO_SCSI_T_CMD 1
struct virtio_scsi_req_cmd {
u32 type;
u32 ioprio;
u8 lun[8];
u64 id;
u32 num_dataout, num_datain;
char cdb[];
char data[][num_dataout+num_datain];
u8 sense[];
u32 sense_len;
u32 residual;
u8 status;
u8 response;
};
/* command-specific response values */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_NO_TARGET 1
#define VIRTIO_SCSI_S_UNDERRUN 6
#define VIRTIO_SCSI_S_ABORTED 7
#define VIRTIO_SCSI_S_FAILURE 8
The type field must be VIRTIO_SCSI_T_CMD, possibly with the
VIRTIO_SCSI_T_BARRIER bit ORed in. The ioprio field indicates the
priority of this request, with higher values corresponding to higher
priorities. The lun field addresses a logical unit in the target.
The id field is the command identifier as defined in SAM. All
of these fields are always read-only.
The cdb, data and sense fields must reside in separate buffers.
The cdb field is always read-only. The data buffers may be either
read-only or write-only, depending on the request, with the read-only
buffers coming first. The sense buffer is always write-only.
The request shall have num_dataout read-only data buffers and
num_datain write-only data buffers. One of these two values must be
zero if the VIRTIO_SCSI_F_INOUT feature has not been negotiated.
Remaining fields are filled in by the device. The sense_len field
indicates the number of bytes actually written to the sense buffer,
while the residual field indicates the residual size, calculated as
data_length - number_of_transferred_bytes.
The status byte is written by the device to be the SCSI status code.
The response byte is written by the device to be one of the following:
- VIRTIO_SCSI_S_OK when the request was completed and the status byte
is filled with a SCSI status code (not necessarily "GOOD").
- VIRTIO_SCSI_S_NO_TARGET is returned if there is no target with this
number.
- VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
more data than is available in the data buffers.
- VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a LUN reset,
target reset or another task management function.
- VIRTIO_SCSI_S_FAILURE for other host or guest error.
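Putting the pieces together, here is a non-normative sketch of an
INQUIRY with a single data-in buffer (the descriptor helpers and the
"lun" variable are illustrative only):

    /* Sketch: a 96-byte INQUIRY.  Header and CDB are read-only;
     * data, sense and the completion fields are write-only. */
    struct {
        u32 type, ioprio;
        u8  lun[8];
        u64 id;
        u32 num_dataout, num_datain;
    } hdr = {
        .type        = VIRTIO_SCSI_T_CMD,
        .id          = 42,              /* command identifier (tag) */
        .num_dataout = 0,
        .num_datain  = 1,
    };
    u8 cdb[6] = { 0x12, 0, 0, 0, 96, 0 };   /* INQUIRY, 96 bytes */
    u8 data[96], sense[96];
    struct { u32 sense_len, residual; u8 status, response; } in;

    memcpy(hdr.lun, lun, sizeof(hdr.lun));
    add_readable(vq, &hdr, sizeof(hdr));
    add_readable(vq, cdb, sizeof(cdb));
    add_writable(vq, data, sizeof(data));
    add_writable(vq, sense, sizeof(sense));
    add_writable(vq, &in, sizeof(in));
    kick(vq);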
- Asynchronous notification query
#define VIRTIO_SCSI_T_AN_QUERY 2
struct virtio_scsi_an_query {
    u8 lun[8];
    u32 event_requested;
    u32 event_actual;
    u8 response;
}
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
By sending this command, the driver asks the controller which
events it can report, as described in Annex A of the SCSI MMC
specification. The driver writes the events it is interested in
into the event_requested field; the device responds by writing the
events that it supports into the event_actual field.
The lun and event_requested fields are written by the driver.
The event_actual and response fields are written by the device.
Valid values of the response byte are VIRTIO_SCSI_S_OK,
VIRTIO_SCSI_S_NO_TARGET, VIRTIO_SCSI_S_FAILURE (with the same meaning
as above).
- Asynchronous notification subscription
#define VIRTIO_SCSI_T_AN_SUBSCRIBE 3
struct virtio_scsi_an_subscribe {
    u8 lun[8];
    u32 event_requested;
    u32 event_actual;
    u8 response;
}
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
By sending this command, the driver asks the controller to report
events as described in Annex A of the SCSI MMC specification. The
driver writes the events it is interested in into the event_requested
field; the device responds by writing the events that it supports into
the event_actual field.
The lun and event_requested fields are written by the driver.
The event_actual and response fields are written by the device.
Valid values of the response byte are VIRTIO_SCSI_S_OK,
VIRTIO_SCSI_S_NO_TARGET, VIRTIO_SCSI_S_FAILURE (with the same meaning
as above).
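A non-normative sketch of a media-change subscription (same
illustrative helpers as above):

    /* Sketch: subscribe to media change events on a LUN. */
    struct {
        u32 type;
        u8  lun[8];
        u32 event_requested;
    } out = {
        .type            = VIRTIO_SCSI_T_AN_SUBSCRIBE,
        .event_requested = VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE,
    };
    struct { u32 event_actual; u8 response; } in;

    memcpy(out.lun, lun, sizeof(out.lun));
    add_readable(vq, &out, sizeof(out));
    add_writable(vq, &in, sizeof(in));
    kick(vq);
    /* on completion, in.event_actual holds the supported events */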
Device operation: control transmitq
-----------------------------------
Currently, the control transmitq is only used to send out-of-band task
management functions. This allows the driver to abort tasks even when
a target's virtqueue is full. Note that, in contrast, sending TMF on
the request virtqueues allows setting the VIRTIO_SCSI_T_BARRIER bit;
no similar functionality is provided by the control transmitq.
Requests have the following format:
struct virtio_scsi_ctrl
{
u32 type;
...
u8 response;
}
The type identifies the remaining fields.
The following commands are defined:
- Task management function
#define VIRTIO_SCSI_T_TMF 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH (1 << 24)
struct virtio_scsi_ctrl_tmf
{
u32 subtype;
u32 target;
u8 lun[8];
u64 id;
u8 additional[];
u8 response;
}
/* command-specific response values */
#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
#define VIRTIO_SCSI_S_NO_TARGET 1
#define VIRTIO_SCSI_S_FAILURE 2
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 3
#define VIRTIO_SCSI_S_FUNCTION_REJECTED 4
#define VIRTIO_SCSI_S_INCORRECT_LUN 5
The type is VIRTIO_SCSI_T_TMF. All fields but the last one are
filled in by the driver; the response field is filled in by the device.
The id field must match the id of a SCSI command. Fields that are
irrelevant to the requested TMF are ignored.
VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
logical unit (and the target as well if this is the last logical
unit) disappear. It takes an I_T_L nexus. This non-standard TMF
should be used in response to a host request to shut down a target
or LUN, after having placed the LUN in a clean state.
The outcome of the task management function is written by the device
in the response field. VIRTIO_SCSI_S_NO_TARGET is returned if
there is no target with this number. Other return values map 1-to-1
with those defined in SAM.
Device operation: control receiveq
----------------------------------
The control receiveq is used by the device to report information on
devices that are attached to the controller. The driver should always
leave a few (?) buffers ready in the control receiveq. The device may
end up dropping events if it finds no buffer ready.
Buffers are placed in the control receiveq and filled by the device when
interesting events occur. Events have the following format:
#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
struct virtio_scsi_ctrl_recv {
    u32 event;
    u8 event_specific_data[];
    u8 event_specific_data_ret[];
}
If bit 31 is set in the event field, the device failed to report an
event due to missing buffers. In this case, the driver should poll the
logical units for unit attention conditions, and/or do whatever form
of bus scan is appropriate for the guest operating system.
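A non-normative sketch of the driver's receive path (helper names are
illustrative only):

    /* Sketch: process one returned event buffer. */
    struct virtio_scsi_ctrl_recv *evt = get_used_buffer(receiveq);

    if (evt->event & VIRTIO_SCSI_T_EVENTS_MISSED)
        rescan_bus();   /* OS-specific: poll unit attentions, rescan */
    else
        handle_event(evt);
    requeue_event_buffer(receiveq, evt);   /* keep buffers available */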
The following events are defined:
- Transport reset
#define VIRTIO_SCSI_T_TRANSPORT_RESET 0
struct virtio_scsi_reset {
u32 target;
u8 lun[8];
u32 reason;
}
#define VIRTIO_SCSI_EVT_RESET_HARD 1
#define VIRTIO_SCSI_EVT_RESET_RESCAN 2
#define VIRTIO_SCSI_EVT_RESET_SHUTDOWN 4
#define VIRTIO_SCSI_EVT_RESET_REMOVED 8
By sending this event, the controller signals that a logical unit
on a target has been reset, including the case of a new device
appearing or disappearing on the bus.
The device fills in all three fields. By convention, a LUN field
referring to a well-known LUN means that the event affects all LUNs
enumerated by the target.
The reason value is one of the four #define values appearing above.
VIRTIO_SCSI_EVT_RESET_REMOVED is used if the target or logical unit
is no longer able to receive commands. VIRTIO_SCSI_EVT_RESET_HARD
is used if the logical unit has been reset, but is still present.
VIRTIO_SCSI_EVT_RESET_RESCAN is used if a target or logical unit has
just appeared on the controller. VIRTIO_SCSI_EVT_RESET_SHUTDOWN
is used when the host wants to initiate a graceful shutdown of a
logical unit.
Events should also be reported via sense codes or response codes,
with the exception of newly appeared targets:
- VIRTIO_SCSI_EVT_RESET_HARD
sense UNIT ATTENTION
asc POWER ON, RESET OR BUS DEVICE RESET OCCURRED
- VIRTIO_SCSI_EVT_RESET_RESCAN
sense UNIT ATTENTION
asc REPORTED LUNS DATA HAS CHANGED
- VIRTIO_SCSI_EVT_RESET_SHUTDOWN
sense UNIT ATTENTION
asc TARGET OPERATING CONDITIONS HAVE CHANGED
ascq 0x80 (vendor specific)
- VIRTIO_SCSI_EVT_RESET_REMOVED
sense ILLEGAL REQUEST
asc LOGICAL UNIT NOT SUPPORTED
In general, however, events should be easier for the driver to handle
than sense codes.
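A driver might dispatch the reason codes as in this non-normative
sketch:

    /* Sketch: handle a transport reset event. */
    switch (reset->reason) {
    case VIRTIO_SCSI_EVT_RESET_HARD:
        /* unit was reset but is still present: revalidate it */
        break;
    case VIRTIO_SCSI_EVT_RESET_RESCAN:
        /* new target/LU: probe with INQUIRY and REPORT LUNS */
        break;
    case VIRTIO_SCSI_EVT_RESET_SHUTDOWN:
        /* host asks for a graceful shutdown: quiesce, then detach */
        break;
    case VIRTIO_SCSI_EVT_RESET_REMOVED:
        /* unit is gone: remove it from the guest's view */
        break;
    }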
- Asynchronous notification
#define VIRTIO_SCSI_T_MEDIA_CHANGE 1
struct virtio_scsi_an_event {
    u32 target;
    u8 lun[8];
    u32 event;
}
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 1
By sending this event, the controller signals that an event was
raised on a physical interface. The device only sends events
that the driver has subscribed to via the "Asynchronous notification
subscription" command.
All fields are written by the device.
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Stefan Hajnoczi @ 2011-05-05 9:43 UTC
To: Paolo Bonzini; +Cc: qemu-devel, Hannes Reinecke
On Thu, May 05, 2011 at 12:28:31AM +0200, Paolo Bonzini wrote:
> Virtio SCSI Controller Device Spec
> ==================================
>
> The virtio controller device groups together one or more simple virtual
> devices (ie. disk), and allows communicating to these devices using the
> SCSI protocol. A controller device represents a SCSI host with many
> targets attached.
>
> The virtio controller services two kinds of requests:
>
> - command requests for a logical unit;
>
> - task management functions related to a logical unit, target or
> command.
>
> The controller is also able to send out notifications about added
> and removed devices.
>
> v4:
> First public version
>
> Configuration
> -------------
>
> Subsystem Device ID
> TBD
>
> Virtqueues
> 0..n-1:one requestq per target
> n:control transmitq
> n+1:control receiveq
1 requestq per target makes it harder to support large numbers of
targets or dynamic targets. You mention detaching targets, so is there
a way to add a target?
The following would be simpler:
0:requestq
1:control transmitq
2:control receiveq
Requests must include a target port identifier/name so that they can be
delivered to the correct target. Adding or removing targets is easy
with a single requestq since the virtqueues don't change.
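For example (a purely hypothetical sketch, not taken from the draft),
the request header could name the target explicitly:

    /* Hypothetical: single-requestq scheme where each request
     * carries its target instead of being routed by virtqueue. */
    struct virtio_scsi_req_hdr {
        u32 type;
        u32 target;     /* target port identifier */
        u8  lun[8];
        /* ... remaining fields as in the draft ... */
    };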
> Feature bits
> VIRTIO_SCSI_F_INOUT - Whether a single request can include both
> read-only and write-only data buffers.
Why make this an optional feature?
> Device configuration layout
> struct virtio_scsi_config {
> u32 num_targets;
> }
>
> num_targets is the number of targets, and the id of the
> virtqueue used for the control receiveq.
>
> Device initialization
> ---------------------
>
> The initialization routine should first of all discover the controller's
> control virtqueues.
>
> The driver should then place at least a buffer in the control receiveq.
> Buffers returned by the device on the control receiveq may be referred
> to as "events" in the rest of the document.
>
> The driver can immediately issue requests (for example, INQUIRY or
> REPORT LUNS) or task management functions (for example, I_T RESET).
>
> Device operation: request queue
> -------------------------------
>
> The driver queues requests to the virtqueue, and they are used by the device
> (not necessarily in order).
>
> Requests have the following format:
>
> struct virtio_scsi_req
> {
> u32 type;
> ...
> u8 response;
> }
>
> #define VIRTIO_SCSI_T_BARRIER 0x80000000
>
> The type identifies the remaining fields. The value
> VIRTIO_SCSI_T_BARRIER can be ORed in the type as well. This bit
> indicates that this request acts as a barrier and that all preceding
> requests must be complete before this one, and all following requests
> must not be started until this is complete. Note that a barrier
> does not flush caches in the underlying backend device in host,
> and thus does not serve as data consistency guarantee. The driver
> must send a SYNCHRONIZE CACHE command to flush the host cache.
Why are these barrier semantics needed?
> Valid response values are defined separately for each command.
>
> - Task management function
>
> #define VIRTIO_SCSI_T_TMF 0
>
> #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
> #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
> #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
> #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
>
> #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH (1 << 24)
>
> struct virtio_scsi_req_tmf
> {
> u32 subtype;
> u8 lun[8];
> u8 additional[];
> u8 response;
> }
>
> /* command-specific response values */
> #define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
> #define VIRTIO_SCSI_S_NO_TARGET 1
> #define VIRTIO_SCSI_S_TARGET_FAILURE 2
> #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 3
> #define VIRTIO_SCSI_S_FUNCTION_REJECTED 4
> #define VIRTIO_SCSI_S_INCORRECT_LUN 5
>
> The type is VIRTIO_SCSI_T_LUN_INFO, possibly with the
> VIRTIO_SCSI_T_BARRIER bit ORed in.
Did you mean "type is VIRTIO_SCSI_T_TMF"?
>
> The subtype and lun field are filled in by the driver, the additional
> and response field is filled in by the device. Unknown LUNs are
> ignored; also, the lun field is ignored for the I_T NEXUS RESET
> command.
In/out buffers must be separate in virtio so I think it makes sense to
split apart a struct virtio_scsi_tmf_req and struct
virtio_scsi_tmf_resp.
> Task management functions accepting an I_T_L_Q nexus (ABORT TASK,
> QUERY TASK) are only accessible through the control transmitq.
> Task management functions not in the above list are not accessible
> in this version of the specification. Future versions may allow
> access to them through additional features.
>
> VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
> logical unit (and the target as well if this is the last logical
> unit) disappear. It takes an I_T_L nexus. This non-standard TMF
> should be used in response to a host request to shutdown a target
> or LUN, after having placed the LUN in a clean state.
Do we need an initiator-driven detach? If the initiator doesn't care
about a device anymore it simply doesn't communicate with it or allocate
resources for it. I think the real detach should be performed on the
target side (e.g. QEMU monitor command removes the target from the SCSI
bus). So I guess I'm asking what is the real use-case for this
function?
> The outcome of the task management function is written by the device
> in the response field. A value of VIRTIO_SCSI_S_NO_TARGET means
> that (even though the virtqueue exists) there is no target with this
> number. Other return values map 1-to-1 with those defined in SAM.
>
> - SCSI command
>
> #define VIRTIO_SCSI_T_CMD 1
>
> struct virtio_scsi_req_cmd {
> u32 type;
> u32 ioprio;
> u8 lun[8];
> u64 id;
> u32 num_dataout, num_datain;
> char cdb[];
> char data[][num_dataout+num_datain];
> u8 sense[];
> u32 sense_len;
> u32 residual;
> u8 status;
> u8 response;
> };
We don't need explicit buffer size fields since virtqueue elements
include sizes. For example:
size_t sense_len = elem->in_sg[sense_idx].iov_len;
memcpy(elem->in_sg[sense_idx].iov_base, sense_buf,
       MIN(sense_len, sizeof(sense_buf)));
Stefan
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Paolo Bonzini @ 2011-05-05 12:49 UTC
To: Stefan Hajnoczi; +Cc: qemu-devel, Hannes Reinecke
>> Virtqueues
>> 0..n-1:one requestq per target
>> n:control transmitq
>> n+1:control receiveq
>
> 1 requestq per target makes it harder to support large numbers or
> dynamic targets.
I chose 1 requestq per target so that, with MSI-X support, each target
can be associated with one MSI-X vector.
If you want a large number of units, you can subdivide targets into
logical units, or use multiple adapters if you prefer. We can have
20-odd SCSI adapters, each with 65534 targets. I think we're way beyond
the practical limits even before LUN support is added to QEMU.
For comparison, Windows supports up to 1024 targets per adapter (split
across 8 channels); IBM vSCSI provides up to 128; VMware supports a
maximum of 15 SCSI targets per adapter and 4 adapters per VM.
> You mention detaching targets so is there a way to add
> a target?
Yes, just add the first LUN to it (it will be LUN0, which must be there
anyway). The target's existence will be reported on the control receiveq.
>> Feature bits
>> VIRTIO_SCSI_F_INOUT - Whether a single request can include both
>> read-only and write-only data buffers.
>
> Why make this an optional feature?
Because QEMU does not support it so far.
>> The type identifies the remaining fields. The value
>> VIRTIO_SCSI_T_BARRIER can be ORed in the type as well. This bit
>> indicates that this request acts as a barrier and that all preceding
>> requests must be complete before this one, and all following requests
>> must not be started until this is complete. Note that a barrier
>> does not flush caches in the underlying backend device in host,
>> and thus does not serve as data consistency guarantee. The driver
>> must send a SYNCHRONIZE CACHE command to flush the host cache.
>
> Why are these barrier semantics needed?
They are a convenience that I took from virtio-blk. They are not needed
in upstream Linux (which uses flush/FUA instead), so I'm not wedded to
them, but they may be useful if virtio-scsi is ever ported to the stable
2.6.32 series.
>> The type is VIRTIO_SCSI_T_LUN_INFO, possibly with the
>> VIRTIO_SCSI_T_BARRIER bit ORed in.
>
> Did you mean "type is VIRTIO_SCSI_T_TMF"?
Yes, of course. Will fix.
>>
>> The subtype and lun field are filled in by the driver, the additional
>> and response field is filled in by the device. Unknown LUNs are
>> ignored; also, the lun field is ignored for the I_T NEXUS RESET
>> command.
>
> In/out buffers must be separate in virtio so I think it makes sense to
> split apart a struct virtio_scsi_tmf_req and struct
> virtio_scsi_tmf_resp.
Here I was using the same standard used by the existing virtio specs,
which place both kinds of buffers in the same struct. I am fine with
separating the two (and similarly for the other requests), but I'd
rather not make virtio-scsi the only different one.
>> VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
>> logical unit (and the target as well if this is the last logical
>> unit) disappear. It takes an I_T_L nexus. This non-standard TMF
>> should be used in response to a host request to shutdown a target
>> or LUN, after having placed the LUN in a clean state.
>
> Do we need an initiator-driven detach? If the initiator doesn't care
> about a device anymore it simply doesn't communicate with it or allocate
> resources for it. I think the real detach should be performed on the
> target side (e.g. QEMU monitor command removes the target from the SCSI
> bus). So I guess I'm asking what is the real use-case for this
> function?
It is not really an initiator-driven detach; it is the initiator's
acknowledgement of a target-driven detach. The target needs to know
when the initiator is ready so that it can free resources attached to
the logical unit (this is particularly important if the LU is a physical
disk and it is opened with exclusive access).
>> - SCSI command
>>
>> #define VIRTIO_SCSI_T_CMD 1
>>
>> struct virtio_scsi_req_cmd {
>> u32 type;
>> u32 ioprio;
>> u8 lun[8];
>> u64 id;
>> u32 num_dataout, num_datain;
>> char cdb[];
>> char data[][num_dataout+num_datain];
>> u8 sense[];
>> u32 sense_len;
>> u32 residual;
>> u8 status;
>> u8 response;
>> };
>
> We don't need explicit buffer size fields since virtqueue elements
> include sizes. For example:
>
> size_t sense_len = elem->in_sg[sense_idx].iov_len;
> memcpy(elem->in_sg[sense_idx].iov_base, sense_buf,
> MIN(sense_len, sizeof(sense_buf)));
I think only the total length is written in the used ring; letting the
driver figure out the number of bytes written to the sense buffer is
harder than just writing it.
Paolo
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Hannes Reinecke @ 2011-05-05 14:29 UTC
To: Paolo Bonzini; +Cc: Stefan Hajnoczi, qemu-devel
Hi all,
On 05/05/2011 02:49 PM, Paolo Bonzini wrote:
>>> Virtqueues
>>> 0..n-1:one requestq per target
>>> n:control transmitq
>>> n+1:control receiveq
>>
>> 1 requestq per target makes it harder to support large numbers or
>> dynamic targets.
>
> I chose 1 requestq per target so that, with MSI-X support, each
> target can be associated to one MSI-X vector.
>
> If you want a large number of units, you can subdivide targets into
> logical units, or use multiple adapters if you prefer. We can have
> 20-odd SCSI adapters, each with 65534 targets. I think we're way
> beyond the practical limits even before LUN support is added to QEMU.
>
But this will make queue full tracking harder.
If we have one queue per LUN the SCSI stack is able to track QUEUE
FULL states and will adjust the queue depth accordingly.
When we have only one queue per target we cannot track QUEUE FULL
anymore and have to rely on the static per-host 'can_queue' setting,
which doesn't work as well, especially in a virtualized environment
where the queue full conditions might change at any time.
But read on:
> For comparison, Windows supports up to 1024 targets per adapter
> (split across 8 channels); IBM vSCSI provides up to 128; VMware
> supports a maximum of 15 SCSI targets per adapter and 4 adapters per
> VM.
>
We don't have to impose any hard limits here. The virtio scsi
transport would need to be able to detect the targets, and we would
be using whatever targets have been found.
>> You mention detaching targets so is there a way to add
>> a target?
>
> Yes, just add the first LUN to it (it will be LUN0 which must be
> there anyway). The target's existence will be reported on the
> control receiveq.
>
?? How is this supposed to work?
How can I detect the existence of a virtqueue?
For this I actually like the MSI-X idea:
If we were to rely on MSI-X to refer to the virtqueues we could
just parse the MSI-X structure and create virtqueues for each entry
found to be valid.
And to be consistent with the SCSI layer, the virtqueues would then in
fact need to map to the SCSI targets; LUNs would be detected by the
SCSI midlayer outside the control of the virtio-scsi HBA.
>>> Feature bits
>>> VIRTIO_SCSI_F_INOUT - Whether a single request can include both
>>> read-only and write-only data buffers.
>>
>> Why make this an optional feature?
>
> Because QEMU does not support it so far.
>
>>> The type identifies the remaining fields. The value
>>> VIRTIO_SCSI_T_BARRIER can be ORed in the type as well. This bit
>>> indicates that this request acts as a barrier and that all preceding
>>> requests must be complete before this one, and all following
>>> requests
>>> must not be started until this is complete. Note that a barrier
>>> does not flush caches in the underlying backend device in host,
>>> and thus does not serve as data consistency guarantee. The driver
>>> must send a SYNCHRONIZE CACHE command to flush the host cache.
>>
>> Why are these barrier semantics needed?
>
> They are a convenience that I took from virtio-blk. They are not
> needed in upstream Linux (which uses flush/FUA instead), so I'm not
> wedded to it, but they may be useful if virtio-scsi is ever ported
> to the stable 2.6.32 series.
>
As mentioned by hch; just drop this.
[ .. ]
>>> VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
>>> logical unit (and the target as well if this is the last logical
>>> unit) disappear. It takes an I_T_L nexus. This non-standard TMF
>>> should be used in response to a host request to shutdown a target
>>> or LUN, after having placed the LUN in a clean state.
>>
>> Do we need an initiator-driven detach? If the initiator doesn't care
>> about a device anymore it simply doesn't communicate with it or
>> allocate
>> resources for it. I think the real detach should be performed on the
>> target side (e.g. QEMU monitor command removes the target from the
>> SCSI
>> bus). So I guess I'm asking what is the real use-case for this
>> function?
>
> It is not really an initiator-driven detach, it is the initiator's
> acknowledgement of a target-driven detach. The target needs to know
> when the initiator is ready so that it can free resources attached
> to the logical unit (this is particularly important if the LU is a
> physical disk and it is opened with exclusive access).
>
Not required. The target can detach any LUN at any time and can rely
on the initiator to handle this situation. Multipath handles this
just fine.
>>> - SCSI command
>>>
>>> #define VIRTIO_SCSI_T_CMD 1
>>>
>>> struct virtio_scsi_req_cmd {
>>> u32 type;
>>> u32 ioprio;
>>> u8 lun[8];
>>> u64 id;
>>> u32 num_dataout, num_datain;
>>> char cdb[];
>>> char data[][num_dataout+num_datain];
>>> u8 sense[];
>>> u32 sense_len;
>>> u32 residual;
>>> u8 status;
>>> u8 response;
>>> };
>>
>> We don't need explicit buffer size fields since virtqueue elements
>> include sizes. For example:
>>
>> size_t sense_len = elem->in_sg[sense_idx].iov_len;
>> memcpy(elem->in_sg[sense_idx].iov_base, sense_buf,
>> MIN(sense_len, sizeof(sense_buf)));
>
> I think only the total length is written in the used ring; letting
> the driver figure out the number of bytes written to the sense
> buffer is harder than just writing it.
>
Yes. The sense buffer would always be present, and we will need a
means of detecting whether the contents of the sense buffer are valid.
And no, CHECK CONDITION is not sufficient here.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Paolo Bonzini @ 2011-05-05 14:50 UTC
To: Hannes Reinecke; +Cc: Stefan Hajnoczi, qemu-devel
On 05/05/2011 04:29 PM, Hannes Reinecke wrote:
>> I chose 1 requestq per target so that, with MSI-X support, each
>> target can be associated to one MSI-X vector.
>>
>> If you want a large number of units, you can subdivide targets into
>> logical units, or use multiple adapters if you prefer. We can have
>> 20-odd SCSI adapters, each with 65534 targets. I think we're way
>> beyond the practical limits even before LUN support is added to QEMU.
>
> But this will make queue full tracking harder.
> If we have one queue per LUN the SCSI stack is able to track QUEUE FULL
> states and will adjust the queue depth accordingly.
> When we have only one queue per target we cannot track QUEUE FULL
> anymore and have to rely on the static per-host 'can_queue' setting.
> Which doesn't work as well, especially in a virtualized environment
> where the queue full conditions might change at any time.
So you want one virtqueue per LUN? I had it in the first version, but
then you had to associate a (target, 8-byte LUN) pair to each virtqueue
manually. That was very hairy, so I changed it to one target per queue.
> But read on:
>
>> For comparison, Windows supports up to 1024 targets per adapter
>> (split across 8 channels); IBM vSCSI provides up to 128; VMware
>> supports a maximum of 15 SCSI targets per adapter and 4 adapters per
>> VM.
>>
> We don't have to impose any hard limits here. The virtio scsi transport
> would need to be able to detect the targets, and we would be using
> whatever targets have been found.
Yes, that's what I wrote above. Right now "detect the targets" means
"send INQUIRY for LUN0 and/or REPORT LUNS to each virtqueue", thanks to
the 1:1 relationship. In my first version it would mean:
- associate each target's LUN0 to a virtqueue
- if needed, send INQUIRY for LUN0 and/or REPORT LUNS
- if needed, deassociate the LUN0 and the virtqueue
Really, it was ugly. It also brings up more questions, such as what
to do if a virtqueue has pending requests at deassociation time.
>> Yes, just add the first LUN to it (it will be LUN0 which must be
>> there anyway). The target's existence will be reported on the
>> control receiveq.
>>
> ?? How is this supposed to work?
> How can I detect the existence of a virtqueue ?
Config space tells you how many virtqueues exist. That gives how many
targets you can address at most. If some of them are empty at the
beginning of the guest's life, their LUN0 will fail to answer INQUIRY
and REPORT LUNS.
(It is the same for vmw_pvscsi by the way, except simpler: the maximum #
of targets is not configurable, and there is just one queue + one
interrupt).
> And to be consistent with the SCSI layer the virtqueues then in fact
> would need to map the SCSI targets; LUNs would be detected from the SCSI
> midlayer outside the control of the virtio-scsi HBA.
Exactly, that was my point! It seemed so clean compared to a dynamic
assignment between LUNs and virtqueues.
>>>> VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
>>>> logical unit (and the target as well if this is the last logical
>>>> unit) disappear. It takes an I_T_L nexus. This non-standard TMF
>>>> should be used in response to a host request to shutdown a target
>>>> or LUN, after having placed the LUN in a clean state.
>>
>> It is not really an initiator-driven detach, it is the initiator's
>> acknowledgement of a target-driven detach. The target needs to know
>> when the initiator is ready so that it can free resources attached
>> to the logical unit (this is particularly important if the LU is a
>> physical disk and it is opened with exclusive access).
>>
> Not required. The target can detach any LUN at any time and can rely on
> the initiator to handle this situation. Multipath handles this just fine.
I didn't invent this; we had a customer request this feature for Xen
guests in the past (a "soft" target detach where the filesystem is
unmounted cleanly). But I guess I can drop it since KVM guests have
agents like Matahari that will take care of this. They will use
out-of-band channels to start an initiator-driven detach, and I guess
it's better this way. :)
BTW, with barriers gone, I think I can also drop the per-target TMF command.
Thanks for the review.
Paolo
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Stefan Hajnoczi @ 2011-05-06 12:31 UTC
To: Paolo Bonzini; +Cc: qemu-devel, Hannes Reinecke, Stefan Hajnoczi
On Thu, May 5, 2011 at 3:50 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 05/05/2011 04:29 PM, Hannes Reinecke wrote:
>>>
>>> I chose 1 requestq per target so that, with MSI-X support, each
>>> target can be associated to one MSI-X vector.
>>>
>>> If you want a large number of units, you can subdivide targets into
>>> logical units, or use multiple adapters if you prefer. We can have
>>> 20-odd SCSI adapters, each with 65534 targets. I think we're way
>>> beyond the practical limits even before LUN support is added to QEMU.
>>
>> But this will make queue full tracking harder.
>> If we have one queue per LUN the SCSI stack is able to track QUEUE FULL
>> states and will adjust the queue depth accordingly.
>> When we have only one queue per target we cannot track QUEUE FULL
>> anymore and have to rely on the static per-host 'can_queue' setting.
>> Which doesn't work as well, especially in a virtualized environment
>> where the queue full conditions might change at any time.
>
> So you want one virtqueue per LUN? I had it in the first version, but then
> you had to associate a (target, 8-byte LUN) pair to each virtqueue manually.
> That was very hairy, so I changed it to one target per queue.
>
>> But read on:
>>
>>> For comparison, Windows supports up to 1024 targets per adapter
>>> (split across 8 channels); IBM vSCSI provides up to 128; VMware
>>> supports a maximum of 15 SCSI targets per adapter and 4 adapters per
>>> VM.
>>>
>> We don't have to impose any hard limits here. The virtio scsi transport
>> would need to be able to detect the targets, and we would be using
>> whatever targets have been found.
>
> Yes, that's what I wrote above. Right now "detect the targets" means "send
> INQUIRY for LUN0 and/or REPORT LUNS to each virtqueue", thanks to the 1:1
> relationship. In my first version it would mean:
>
> - associate each target's LUN0 to a virtqueue
> - if needed, send INQUIRY for LUN0 and/or REPORT LUNS
> - if needed, deassociate the LUN0 and the virtqueue
>
> Really, it was ugly. It also brings up more questions, such as what
> to do if a virtqueue has pending requests at deassociation time.
>
>>> Yes, just add the first LUN to it (it will be LUN0 which must be
>>> there anyway). The target's existence will be reported on the
>>> control receiveq.
>>>
>> ?? How is this supposed to work?
>> How can I detect the existence of a virtqueue ?
>
> Config space tells you how many virtqueues exist. That gives how many
> targets you can address at most. If some of them are empty at the beginning
> of the guest's life, their LUN0 will fail to answer INQUIRY and REPORT LUNS.
>
> (It is the same for vmw_pvscsi by the way, except simpler: the maximum # of
> targets is not configurable, and there is just one queue + one interrupt).
Okay, this explains how you plan to handle targets appearing - you
want to set a maximum number of targets. I was wondering how we would
add virtqueues dynamically (and why the control vqs are placed last at
n,n+1 instead of 0,1). Like Hannes said, why introduce a limit here
if we don't have to?
I'm really not sure I understand the win of creating lots of
virtqueues. I just want a pipe out onto the SCSI bus so I can talk to
all devices in the SCSI domain. Creating separate virtqueues
increases complexity in the driver and emulation IMO.
What is the MSI-X win you mentioned? I guess if an application on
vcpu0 is accessing target0 a lot then its interrupts can be handled
on vcpu0 while other vcpus handle interrupts for other SCSI targets?
I remember VMware pv scsi has a trick here: each request can contain
the vcpu number, which influences interrupt routing somehow.
Stefan
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Paolo Bonzini @ 2011-05-06 12:48 UTC
To: Stefan Hajnoczi; +Cc: qemu-devel, Hannes Reinecke, Stefan Hajnoczi
On 05/06/2011 02:31 PM, Stefan Hajnoczi wrote:
> Okay, this explains how you plan to handle targets appearing - you
> want to set a maximum number of targets. I was wondering how we would
> add virtqueues dynamically (and why the control vqs are placed last at
> n,n+1 instead of 0,1).
You don't; it's not in the spec. On the other hand, I don't think a
limit on the number of targets is that imposing, and the limit that
virtio places is more theoretical than practical.
(Control virtqueues are last simply to avoid +2 and -2 all over the place).
> I'm really not sure I understand the win of creating lots of
> virtqueues. I just want a pipe out onto the SCSI bus so I can talk to
> all devices in the SCSI domain. Creating separate virtqueues
> increases complexity in the driver and emulation IMO.
In the driver, probably. Emulation shouldn't change much; there's so
little to do in the end in a PV HBA emulation if you have a proper SCSI
subsystem and the protocol is a simple transport, or reasonably close.
> What is the MSI-X win you mentioned? I guess if an application on
> vcpu0 is accessing target0 a lot then interrupt handling can be
> handled on vcpu0 while other vcpus handle interrupts for other SCSI
> targets?
Yes, possibly. But I think the main benefit is in resiliency. If one
target malfunctions and times out, other targets still work normally
until the SCSI layer decides to reset that target.
> I remember VMware pv scsi has a trick here, each request can
> contain the vcpu number which influences interrupt routing somehow.
I don't think it works under Linux, though; it depends on how the OS
sets up the APICs.
Paolo
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Christoph Hellwig @ 2011-05-05 12:50 UTC
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi, Hannes Reinecke
> #define VIRTIO_SCSI_T_BARRIER 0x80000000
>
> The type identifies the remaining fields. The value
> VIRTIO_SCSI_T_BARRIER can be ORed in the type as well. This bit
> indicates that this request acts as a barrier and that all preceding
> requests must be complete before this one, and all following requests
> must not be started until this is complete. Note that a barrier
> does not flush caches in the underlying backend device in host,
> and thus does not serve as data consistency guarantee. The driver
> must send a SYNCHRONIZE CACHE command to flush the host cache.
Please don't repeat the barrier mistake made in the Xen and virtio-blk/lguest
protocols. It really doesn't make sense to put this kind of strict ordering
in. If we really want ordering, let's at least do it using SCSI ordered
tags, so that we use a standard implementation.
And SCSI already supports the FUA bit to force a write to be writethrough,
even if the QEMU SCSI code doesn't implement it.
Let's just make virtio-scsi purely a transport and not add magic features
into it.
* Re: [Qemu-devel] virtio-scsi spec, first public draft
From: Paolo Bonzini @ 2011-05-05 12:52 UTC
To: Christoph Hellwig; +Cc: qemu-devel, Stefan Hajnoczi, Hannes Reinecke
On 05/05/2011 02:50 PM, Christoph Hellwig wrote:
> Please don't repeat the barrier mistake made in the Xen and virtio-blk/lguest
> protocols. It really doesn't make sense to put this kind of strict ordering
> in. If we really want ordering, let's at least do it using SCSI ordered
> tags, so that we use a standard implementation.
>
> And SCSI already supports the FUA bit to force a write to be writethrough,
> even if the QEMU SCSI code doesn't implement it.
You're right; I reviewed the history of barriers, and you can consider
this gone.
Paolo