Date: Thu, 05 May 2011 16:29:41 +0200
From: Hannes Reinecke <hare@suse.de>
Subject: Re: [Qemu-devel] virtio-scsi spec, first public draft
To: Paolo Bonzini
Cc: Stefan Hajnoczi, qemu-devel

Hi all,

On 05/05/2011 02:49 PM, Paolo Bonzini wrote:
>>> Virtqueues
>>> 0..n-1: one requestq per target
>>> n:     control transmitq
>>> n+1:   control receiveq
>>
>> 1 requestq per target makes it harder to support large numbers of
>> targets or dynamic targets.
>
> I chose 1 requestq per target so that, with MSI-X support, each
> target can be associated to one MSI-X vector.
>
> If you want a large number of units, you can subdivide targets into
> logical units, or use multiple adapters if you prefer. We can have
> 20-odd SCSI adapters, each with 65534 targets. I think we're way
> beyond the practical limits even before LUN support is added to QEMU.
>
But this will make queue full tracking harder.

If we have one queue per LUN, the SCSI stack is able to track QUEUE
FULL states and will adjust the queue depth accordingly.
When we have only one queue per target we cannot track QUEUE FULL
anymore and have to rely on the static per-host 'can_queue' setting,
which doesn't work as well, especially in a virtualized environment
where the queue full conditions might change at any time.
But read on:

> For comparison, Windows supports up to 1024 targets per adapter
> (split across 8 channels); IBM vSCSI provides up to 128; VMware
> supports a maximum of 15 SCSI targets per adapter and 4 adapters per
> VM.
>
We don't have to impose any hard limits here. The virtio-scsi
transport would need to be able to detect the targets, and we would
be using whatever targets have been found.

>> You mention detaching targets so is there a way to add
>> a target?
>
> Yes, just add the first LUN to it (it will be LUN0 which must be
> there anyway). The target's existence will be reported on the
> control receiveq.
>
?? How is this supposed to work? How can I detect the existence of a
virtqueue?

For this I actually like the MSI-X idea:
If we were to rely on MSI-X to refer to the virtqueues, we could just
parse the MSI-X structure and create virtqueues for each entry found
to be valid.
And to be consistent with the SCSI layer, the virtqueues would then in
fact need to map to the SCSI targets; LUNs would be detected by the
SCSI midlayer, outside the control of the virtio-scsi HBA.
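
Just to make the numbering concrete, here is a small sketch of the
queue layout quoted above (request queues 0..n-1, then the two control
queues at n and n+1). This is purely illustrative; the helper names
are made up and not part of the draft or of any driver:

/*
 * Illustrative sketch only (not from the draft): the proposed layout
 * puts one requestq per target at queues 0..n-1, the control
 * transmitq at queue n, and the control receiveq at queue n+1.
 * All names here are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

/* One requestq (and, with MSI-X, one vector) per target. */
static inline uint32_t target_to_requestq(uint32_t target)
{
    return target;              /* queues 0..n-1 */
}

static inline uint32_t control_transmitq(uint32_t num_targets)
{
    return num_targets;         /* queue n */
}

static inline uint32_t control_receiveq(uint32_t num_targets)
{
    return num_targets + 1;     /* queue n+1 */
}

static inline bool is_requestq(uint32_t vq, uint32_t num_targets)
{
    return vq < num_targets;
}

/*
 * If the driver created one virtqueue per valid MSI-X entry, as
 * suggested above, the target count would simply be the number of
 * valid entries minus the two control queues.
 */
static inline uint32_t valid_vectors_to_num_targets(uint32_t valid_vectors)
{
    return valid_vectors > 2 ? valid_vectors - 2 : 0;
}
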
>>> Feature bits
>>> VIRTIO_SCSI_F_INOUT - Whether a single request can include both
>>> read-only and write-only data buffers.
>>
>> Why make this an optional feature?
>
> Because QEMU does not support it so far.
>
>>> The type identifies the remaining fields. The value
>>> VIRTIO_SCSI_T_BARRIER can be ORed in the type as well. This bit
>>> indicates that this request acts as a barrier and that all preceding
>>> requests must be complete before this one, and all following
>>> requests must not be started until this is complete. Note that a
>>> barrier does not flush caches in the underlying backend device in
>>> host, and thus does not serve as data consistency guarantee. The
>>> driver must send a SYNCHRONIZE CACHE command to flush the host cache.
>>
>> Why are these barrier semantics needed?
>
> They are a convenience that I took from virtio-blk. They are not
> needed in upstream Linux (which uses flush/FUA instead), so I'm not
> wedded to it, but they may be useful if virtio-scsi is ever ported
> to the stable 2.6.32 series.
>
As mentioned by hch: just drop this.

[ .. ]

>>> VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
>>> logical unit (and the target as well if this is the last logical
>>> unit) disappear. It takes an I_T_L nexus. This non-standard TMF
>>> should be used in response to a host request to shutdown a target
>>> or LUN, after having placed the LUN in a clean state.
>>
>> Do we need an initiator-driven detach? If the initiator doesn't care
>> about a device anymore it simply doesn't communicate with it or
>> allocate resources for it. I think the real detach should be
>> performed on the target side (e.g. a QEMU monitor command removes
>> the target from the SCSI bus). So I guess I'm asking what is the
>> real use-case for this function?
>
> It is not really an initiator-driven detach, it is the initiator's
> acknowledgement of a target-driven detach. The target needs to know
> when the initiator is ready so that it can free resources attached
> to the logical unit (this is particularly important if the LU is a
> physical disk and it is opened with exclusive access).
>
Not required. The target can detach any LUN at any time and can rely
on the initiator to handle this situation. Multipath handles this
just fine.

>>> - SCSI command
>>>
>>> #define VIRTIO_SCSI_T_CMD 1
>>>
>>> struct virtio_scsi_req_cmd {
>>>     u32 type;
>>>     u32 ioprio;
>>>     u8  lun[8];
>>>     u64 id;
>>>     u32 num_dataout, num_datain;
>>>     char cdb[];
>>>     char data[][num_dataout+num_datain];
>>>     u8  sense[];
>>>     u32 sense_len;
>>>     u32 residual;
>>>     u8  status;
>>>     u8  response;
>>> };
>>
>> We don't need explicit buffer size fields since virtqueue elements
>> include sizes. For example:
>>
>>     size_t sense_len = elem->in_sg[sense_idx].iov_len;
>>     memcpy(elem->in_sg[sense_idx].iov_base, sense_buf,
>>            MIN(sense_len, sizeof(sense_buf)));
>
> I think only the total length is written in the used ring, letting
> the driver figure out the number of bytes written to the sense
> buffer is harder than just writing it.
>
Yes. The sense buffer would always be present, and we will need a
means of detecting whether the contents of the sense buffer are valid.
And no, CHECK CONDITION is not sufficient here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                    zSeries & Storage
hare@suse.de                           +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)