From: Paolo Bonzini
Date: Thu, 05 May 2011 14:49:15 +0200
Subject: Re: [Qemu-devel] virtio-scsi spec, first public draft
To: Stefan Hajnoczi
Cc: qemu-devel, Hannes Reinecke
Message-ID: <4DC29CCB.8050903@redhat.com>
In-Reply-To: <20110505094323.GC5298@stefanha-thinkpad.localdomain>

>> Virtqueues
>>     0..n-1: one requestq per target
>>     n:      control transmitq
>>     n+1:    control receiveq
>
> 1 requestq per target makes it harder to support large numbers or
> dynamic targets.

I chose 1 requestq per target so that, with MSI-X support, each target
can be associated with one MSI-X vector.  If you want a large number of
units, you can subdivide targets into logical units, or use multiple
adapters if you prefer.  We can have 20-odd SCSI adapters, each with
65534 targets.  I think we're way beyond the practical limits even
before LUN support is added to QEMU.

For comparison, Windows supports up to 1024 targets per adapter (split
across 8 channels); IBM vSCSI provides up to 128; VMware supports a
maximum of 15 SCSI targets per adapter and 4 adapters per VM.

> You mention detaching targets so is there a way to add
> a target?

Yes, just add the first LUN to it (it will be LUN0, which must be there
anyway).  The target's existence will be reported on the control
receiveq.

>> Feature bits
>>     VIRTIO_SCSI_F_INOUT - Whether a single request can include both
>>     read-only and write-only data buffers.
>
> Why make this an optional feature?

Because QEMU does not support it yet.

>> The type identifies the remaining fields.  The value
>> VIRTIO_SCSI_T_BARRIER can be ORed in the type as well.  This bit
>> indicates that this request acts as a barrier and that all preceding
>> requests must be complete before this one, and all following requests
>> must not be started until this is complete.  Note that a barrier
>> does not flush caches in the underlying backend device in host,
>> and thus does not serve as data consistency guarantee.  The driver
>> must send a SYNCHRONIZE CACHE command to flush the host cache.
>
> Why are these barrier semantics needed?

They are a convenience that I took from virtio-blk.  They are not
needed in upstream Linux (which uses flush/FUA instead), so I'm not
wedded to them, but they may be useful if virtio-scsi is ever ported
to the stable 2.6.32 series.

>> The type is VIRTIO_SCSI_T_LUN_INFO, possibly with the
>> VIRTIO_SCSI_T_BARRIER bit ORed in.
>
> Did you mean "type is VIRTIO_SCSI_T_TMF"?

Yes, of course.  Will fix.
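
Just to illustrate how I expect the barrier bit to be used, here is a
small sketch.  The draft does not pin down the numeric encodings, so
the values below (the TMF type and the high bit for the barrier) are
placeholders only:

    #include <stdint.h>

    /* Placeholder values for illustration -- the draft does not fix
     * the encoding of VIRTIO_SCSI_T_TMF or VIRTIO_SCSI_T_BARRIER. */
    #define VIRTIO_SCSI_T_TMF      0u
    #define VIRTIO_SCSI_T_BARRIER  (1u << 31)

    /* Build the type field for a TMF that acts as a barrier: all
     * requests queued before it must complete first, and all requests
     * queued after it must not start until it completes.  Remember
     * that the barrier does not flush the host cache. */
    static uint32_t virtio_scsi_tmf_type(int barrier)
    {
        uint32_t type = VIRTIO_SCSI_T_TMF;
        if (barrier)
            type |= VIRTIO_SCSI_T_BARRIER;
        return type;
    }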
>>
>> The subtype and lun field are filled in by the driver, the additional
>> and response field is filled in by the device.  Unknown LUNs are
>> ignored; also, the lun field is ignored for the I_T NEXUS RESET
>> command.
>
> In/out buffers must be separate in virtio so I think it makes sense to
> split apart a struct virtio_scsi_tmf_req and struct
> virtio_scsi_tmf_resp.

Here I was following the convention of the existing virtio specs, which
place both kinds of buffers in the same struct.  I am fine with
separating the two (and similarly for the other requests), but I'd
rather not make virtio-scsi the only one that differs.

>> VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
>> logical unit (and the target as well if this is the last logical
>> unit) disappear.  It takes an I_T_L nexus.  This non-standard TMF
>> should be used in response to a host request to shutdown a target
>> or LUN, after having placed the LUN in a clean state.
>
> Do we need an initiator-driven detach?  If the initiator doesn't care
> about a device anymore it simply doesn't communicate with it or
> allocate resources for it.  I think the real detach should be
> performed on the target side (e.g. QEMU monitor command removes the
> target from the SCSI bus).  So I guess I'm asking what is the real
> use-case for this function?

It is not really an initiator-driven detach; it is the initiator's
acknowledgement of a target-driven detach.  The target needs to know
when the initiator is ready, so that it can free resources attached to
the logical unit (this is particularly important if the LU is a
physical disk that is opened with exclusive access).

>> - SCSI command
>>
>>     #define VIRTIO_SCSI_T_CMD 1
>>
>>     struct virtio_scsi_req_cmd {
>>         u32 type;
>>         u32 ioprio;
>>         u8 lun[8];
>>         u64 id;
>>         u32 num_dataout, num_datain;
>>         char cdb[];
>>         char data[][num_dataout+num_datain];
>>         u8 sense[];
>>         u32 sense_len;
>>         u32 residual;
>>         u8 status;
>>         u8 response;
>>     };
>
> We don't need explicit buffer size fields since virtqueue elements
> include sizes.  For example:
>
>     size_t sense_len = elem->in_sg[sense_idx].iov_len;
>     memcpy(elem->in_sg[sense_idx].iov_buf, sense_buf,
>            MIN(sense_len, sizeof(sense_buf)));

I think only the total length is written in the used ring, so letting
the driver figure out the number of bytes written to the sense buffer
is harder than just writing it explicitly.
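
To make that concrete, this is roughly what the driver would have to do
if sense_len were dropped and only the used-ring total were available.
It relies on the device always writing the fixed response fields in
full, and the helper below is only a hypothetical sketch, not something
from the draft:

    #include <stdint.h>
    #include <stddef.h>

    /* used_len is the total number of bytes the device reported
     * writing for this request in the used ring; fixed_resp_len is the
     * combined size of the fixed device-written fields (sense_len,
     * residual, status, response).  Whatever is left over must have
     * gone into the sense buffer. */
    static size_t virtio_scsi_sense_bytes(uint32_t used_len,
                                          size_t fixed_resp_len)
    {
        if (used_len <= fixed_resp_len)
            return 0;
        return used_len - fixed_resp_len;
    }

An explicit sense_len field avoids baking that extra assumption into
the spec.

Paolo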