Date: Thu, 05 May 2011 16:29:41 +0200
From: Hannes Reinecke <hare@suse.de>
Subject: Re: [Qemu-devel] virtio-scsi spec, first public draft
To: Paolo Bonzini
Cc: Stefan Hajnoczi, qemu-devel

Hi all,

On 05/05/2011 02:49 PM, Paolo Bonzini wrote:
>>> Virtqueues
>>> 0..n-1: one requestq per target
>>> n:     control transmitq
>>> n+1:   control receiveq
>>
>> 1 requestq per target makes it harder to support large numbers of
>> targets or dynamic targets.
>
> I chose 1 requestq per target so that, with MSI-X support, each
> target can be associated to one MSI-X vector.
>
> If you want a large number of units, you can subdivide targets into
> logical units, or use multiple adapters if you prefer. We can have
> 20-odd SCSI adapters, each with 65534 targets. I think we're way
> beyond the practical limits even before LUN support is added to QEMU.
>
But this will make queue full tracking harder.

If we have one queue per LUN, the SCSI stack is able to track QUEUE
FULL states and will adjust the queue depth accordingly.
When we have only one queue per target we cannot track QUEUE FULL
anymore and have to rely on the static per-host 'can_queue' setting,
which doesn't work as well, especially in a virtualized environment
where the queue full conditions might change at any time.
But read on:

> For comparison, Windows supports up to 1024 targets per adapter
> (split across 8 channels); IBM vSCSI provides up to 128; VMware
> supports a maximum of 15 SCSI targets per adapter and 4 adapters per
> VM.
>
We don't have to impose any hard limits here. The virtio-scsi
transport would need to be able to detect the targets, and we would
be using whatever targets have been found.

>> You mention detaching targets so is there a way to add
>> a target?
>
> Yes, just add the first LUN to it (it will be LUN0 which must be
> there anyway). The target's existence will be reported on the
> control receiveq.
>
?? How is this supposed to work? How can I detect the existence of a
virtqueue?

For this I actually like the MSI-X idea:
If we were to rely on MSI-X to refer to the virtqueues, we could just
parse the MSI-X structure and create virtqueues for each entry found
to be valid.
And to be consistent with the SCSI layer, the virtqueues would then in
fact need to map to the SCSI targets; LUNs would be detected by the
SCSI midlayer, outside the control of the virtio-scsi HBA.
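
Just to make the numbering concrete, here is a small sketch of the
queue layout quoted above (request queues 0..n-1, then the two control
queues at n and n+1). This is purely illustrative; the helper names
are made up and not part of the draft or of any driver:

/*
 * Illustrative sketch only (not from the draft): the proposed layout
 * puts one requestq per target at queues 0..n-1, the control
 * transmitq at queue n, and the control receiveq at queue n+1.
 * All names here are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

/* One requestq (and, with MSI-X, one vector) per target. */
static inline uint32_t target_to_requestq(uint32_t target)
{
    return target;              /* queues 0..n-1 */
}

static inline uint32_t control_transmitq(uint32_t num_targets)
{
    return num_targets;         /* queue n */
}

static inline uint32_t control_receiveq(uint32_t num_targets)
{
    return num_targets + 1;     /* queue n+1 */
}

static inline bool is_requestq(uint32_t vq, uint32_t num_targets)
{
    return vq < num_targets;
}

/*
 * If the driver created one virtqueue per valid MSI-X entry, as
 * suggested above, the target count would simply be the number of
 * valid entries minus the two control queues.
 */
static inline uint32_t valid_vectors_to_num_targets(uint32_t valid_vectors)
{
    return valid_vectors > 2 ? valid_vectors - 2 : 0;
}
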
>>> Feature bits
>>> VIRTIO_SCSI_F_INOUT - Whether a single request can include both
>>> read-only and write-only data buffers.
>>
>> Why make this an optional feature?
>
> Because QEMU does not support it so far.
>
>>> The type identifies the remaining fields. The value
>>> VIRTIO_SCSI_T_BARRIER can be ORed in the type as well. This bit
>>> indicates that this request acts as a barrier and that all preceding
>>> requests must be complete before this one, and all following
>>> requests must not be started until this is complete. Note that a
>>> barrier does not flush caches in the underlying backend device in
>>> host, and thus does not serve as data consistency guarantee. The
>>> driver must send a SYNCHRONIZE CACHE command to flush the host cache.
>>
>> Why are these barrier semantics needed?
>
> They are a convenience that I took from virtio-blk. They are not
> needed in upstream Linux (which uses flush/FUA instead), so I'm not
> wedded to it, but they may be useful if virtio-scsi is ever ported
> to the stable 2.6.32 series.
>
As mentioned by hch: just drop this.

[ .. ]

>>> VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_DETACH asks the device to make the
>>> logical unit (and the target as well if this is the last logical
>>> unit) disappear. It takes an I_T_L nexus. This non-standard TMF
>>> should be used in response to a host request to shutdown a target
>>> or LUN, after having placed the LUN in a clean state.
>>
>> Do we need an initiator-driven detach? If the initiator doesn't care
>> about a device anymore it simply doesn't communicate with it or
>> allocate resources for it. I think the real detach should be
>> performed on the target side (e.g. a QEMU monitor command removes
>> the target from the SCSI bus). So I guess I'm asking what is the
>> real use-case for this function?
>
> It is not really an initiator-driven detach, it is the initiator's
> acknowledgement of a target-driven detach. The target needs to know
> when the initiator is ready so that it can free resources attached
> to the logical unit (this is particularly important if the LU is a
> physical disk and it is opened with exclusive access).
>
Not required. The target can detach any LUN at any time and can rely
on the initiator to handle this situation. Multipath handles this
just fine.

>>> - SCSI command
>>>
>>> #define VIRTIO_SCSI_T_CMD 1
>>>
>>> struct virtio_scsi_req_cmd {
>>>     u32 type;
>>>     u32 ioprio;
>>>     u8  lun[8];
>>>     u64 id;
>>>     u32 num_dataout, num_datain;
>>>     char cdb[];
>>>     char data[][num_dataout+num_datain];
>>>     u8  sense[];
>>>     u32 sense_len;
>>>     u32 residual;
>>>     u8  status;
>>>     u8  response;
>>> };
>>
>> We don't need explicit buffer size fields since virtqueue elements
>> include sizes. For example:
>>
>>     size_t sense_len = elem->in_sg[sense_idx].iov_len;
>>     memcpy(elem->in_sg[sense_idx].iov_base, sense_buf,
>>            MIN(sense_len, sizeof(sense_buf)));
>
> I think only the total length is written in the used ring, letting
> the driver figure out the number of bytes written to the sense
> buffer is harder than just writing it.
>
Yes. The sense buffer would always be present, and we will need a
means of detecting whether the contents of the sense buffer are valid.
And no, CHECK CONDITION is not sufficient here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                    zSeries & Storage
hare@suse.de                           +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)