From: Vladislav Bolkhovitin
Subject: Re: [Stgt-devel] Question for pass-through target design
Date: Mon, 07 May 2007 18:24:44 +0400
Message-ID: <463F36AC.3010207@vlnb.net>
In-Reply-To: <20070506053629P.fujita.tomonori@lab.ntt.co.jp>
List-Id: linux-scsi@vger.kernel.org
To: FUJITA Tomonori
Cc: dougg@torque.net, stgt-devel@lists.berlios.de, scst-devel, linux-scsi@vger.kernel.org

FUJITA Tomonori wrote:
>>>>It looks like the pass-through target support is currently broken, at
>>>>least as I've checked for ibmvstgt, but I think it's a general problem.
>>>>I wanted to check my assumptions and get ideas.
>>>
>>>Yeah, unfortunately, it works only with the iSCSI target driver (which
>>>runs in user space).
>>>
>>>>The code isn't allocating any memory to pass along to the sg code to store
>>>>the result of a read or data for a write. Currently, dxferp for sg_io_hdr
>>>>or dout_xferp/din_xferp for sg_io_v4 are assigned to the value of uaddr,
>>>>which is set to 0 in kern_queue_cmd. With the pointer set to NULL,
>>>>the pass-through target isn't going to function. Even if we had memory
>>>>allocated, there isn't a means of getting data to be written via sg down
>>>>this code path.
>>>>
>>>>What ideas are there as to how the data will get to user-space so that
>>>>we can use sg?
>>>
>>>For kernel-space drivers, we don't need to go to user-space. We can do
>>>the pass-through in kernel space. I talked with James about this last
>>>year and he said that if the code is implemented cleanly, he would
>>>merge it into mainline.
>>
>>We already have a pass-through in the kernel space for
>>kernel space drivers. It is the scsi_tgt* code.
>
>
> Could you elaborate more?
>
> What I meant is that the kernel tgt code (scsi_tgt*) receives
> SCSI commands from one lld and sends them to another lld instead of
> sending them to user space.

Although passing SCSI commands from a target LLD to an initiator LLD without any significant intervention from the target software looks nice and simple, you should realize how limited, unsafe and, strictly speaking, illegal it is, since it badly violates the SCSI specs.

Before I elaborate, let's define the following terminology in addition to the one described in SAM:

- Target system - the overall system containing the target and initiator devices (and their LLDs). The target system exports one or more initiator devices via the target device(s).
- Target device - a SCSI device on the target system in target mode.
- Initiator device - a SCSI device on the target system in initiator mode. It actually serves the commands that come from remote initiators via the target device(s).
- Remote initiator - a SCSI initiator device that is connected to the target device on the target system and uses the devices exported through it (i.e. sends SCSI commands to them).
- Target software - software that runs on the target system and implements the necessary pass-through functionality.

Let's consider the simplest case: a target system has one target device and one initiator device, and it exports the initiator device via the target device in pass-through mode. The problem is that the target system thereby creates a new SCSI target device, which is not the same as the exported initiator device.
In particular, the new device can have more than one I_T nexus with the remote initiators connected to it, while the initiator device has no clue about them: it sees a single nexus with the target system, and only that one. So what? All event notifications that should be seen by all remote initiators will be delivered to only one of them, or not generated at all, since some events are generated only for I_T nexuses other than the one on which the command causing the event was received. The most common example of such events is Unit Attentions. For example, after a MODE SELECT command, all remote initiators except the one that sent the command shall receive a "MODE PARAMETERS CHANGED" Unit Attention. Otherwise, bad and silent data corruption can happen.

A more complicated example is SCSI reservations, whether persistent or SPC-2 ones. Since the initiator device knows about only one nexus instead of the many that actually exist, the reservation commands have to be handled entirely by the target software on the target system. Delivering Unit Attentions to all remote initiators is especially important for reservations, since a Unit Attention can mean that a reservation was revoked by another initiator via, e.g., some task management function.

Things get even worse when we realize that (1) the initiator device can report capabilities (like ACA support) that the target software doesn't support, thus misinforming the remote initiators and, again, possibly provoking silent data corruption, and (2) accesses to the initiator devices from local programs on the target system create yet another I_T nexus, which needs to be handled as well.

(I suppose it is obvious that if the target system exports more than one initiator device via a single target device, then, since the initiator devices don't know about each other, the target software in any case needs to implement its own LUN addressing as well as its own REPORT LUNS command handler.)
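The Unit Attention and reservation requirements described above boil down to per-I_T-nexus state that only the target software can keep. The following is a minimal Python model of that state, not SCST or scsi_tgt code: the class `LogicalUnit`, its method names and the nexus labels are hypothetical, and many SPC details (commands exempt from UA and reservation checks, the precedence between those checks, persistent reservations, task management) are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Standard SAM status codes and SPC sense values used in this sketch.
GOOD = 0x00
CHECK_CONDITION = 0x02
RESERVATION_CONFLICT = 0x18
SENSE_KEY_UNIT_ATTENTION = 0x06
MODE_PARAMETERS_CHANGED = (0x2A, 0x01)  # ASC/ASCQ


@dataclass
class LogicalUnit:
    """Hypothetical per-LU state kept by the target software.

    One entry per I_T nexus on the target device; the exported
    initiator device itself never sees these nexuses."""
    nexuses: list
    pending_ua: dict = field(default_factory=dict)  # nexus -> (ASC, ASCQ)
    reserved_by: Optional[str] = None               # SPC-2 reservation holder

    def establish_ua_except(self, origin, asc_ascq):
        # Queue a Unit Attention on every I_T nexus except the originating one.
        for nexus in self.nexuses:
            if nexus != origin:
                self.pending_ua[nexus] = asc_ascq

    def execute(self, nexus, opcode):
        """Return (status, sense) for a command received on `nexus`.

        Simplified: real SPC exempts INQUIRY, REPORT LUNS, REQUEST SENSE
        and others from these checks, and their precedence is not modelled."""
        if nexus in self.pending_ua:
            # Report the pending UA once, then clear it for this nexus only.
            asc, ascq = self.pending_ua.pop(nexus)
            return CHECK_CONDITION, (SENSE_KEY_UNIT_ATTENTION, asc, ascq)
        if opcode == "RELEASE":
            # SPC-2: RELEASE from a non-holder succeeds but changes nothing.
            if self.reserved_by == nexus:
                self.reserved_by = None
            return GOOD, None
        if self.reserved_by not in (None, nexus):
            return RESERVATION_CONFLICT, None
        if opcode == "RESERVE":
            self.reserved_by = nexus
        elif opcode == "MODE SELECT":
            # The command itself would be forwarded to the initiator device;
            # the UA fan-out to the other remote initiators must happen here.
            self.establish_ua_except(nexus, MODE_PARAMETERS_CHANGED)
        return GOOD, None


if __name__ == "__main__":
    lu = LogicalUnit(nexuses=["I1", "I2", "I3"])
    lu.execute("I1", "MODE SELECT")  # GOOD; UA queued for I2 and I3 only
    print(lu.execute("I2", "READ"))  # CHECK CONDITION, sense 06h/2Ah/01h
    print(lu.execute("I2", "READ"))  # GOOD: the UA is reported only once
```

In a naive pass-through, the initiator device effectively holds `LogicalUnit(nexuses=["target-system"])`: `establish_ua_except()` finds no other nexus to notify, and a RESERVE issued by one remote initiator is honoured for every remote initiator, which is exactly the problem described above.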
Thus, such an in-kernel pass-through mode could be used only for a limited set of SCSI commands and SCSI device types, with great caution and a complete understanding of what's going on and how it should work. The latter isn't true for the vast majority of uses and users, so such an approach will give users a perfect weapon to shoot themselves in the foot.

If you start addressing the above issues, I believe you will end up with one of the following:

- a complete duplication of the SCSI state machine in both user and kernel code, eventually copying what's already done in SCST (see below);
- such complicated interactions between user space and the kernel that you will never like them (here is why: http://lkml.org/lkml/2006/7/1/41, http://lkml.org/lkml/2007/4/24/364 and http://lkml.org/lkml/2007/4/24/451; I totally agree with Linus);
- fully dropping the in-kernel pass-through and leaving only the user space one, with all its drawbacks and penalties. But I believe that in this case you will also have serious difficulties handling the local nexus case (i.e. commands originating from local applications on the target system) cleanly.

So, if you need in-kernel pass-through, I would suggest looking at the SCST project (http://scst.sf.net), which is currently stable and mature, although not fully finished yet. From the very beginning it was designed for full-featured in-kernel pass-through, not only for stateless SCSI devices like disks, but also for stateful SCSI devices (like SSC ones, a.k.a. tapes), where correct handling of all of the above is essential. In addition to considerably better performance, the complete in-kernel approach makes the code simpler, smaller and cleaner, and it also allows things such as zero-copy buffered file IO, i.e. sending data to remote initiators directly from the page cache and receiving data from them directly into it (currently under development).

For those who need to implement SCSI devices in user space, the scst_user module is about to be added.
Since the SCSI state machine is in the kernel, the interface provided by scst_user is very simple: it essentially consists of only a single IOCTL and allows overhead as low as a single syscall per SCSI command, without any additional context switches. It is already implemented and works. For legal reasons I can't publish it at the moment, but you can see its full description in the project's SVN docs (you can get them with the command "svn co https://svn.sourceforge.net/svnroot/scst/trunk/doc").

Thanks to all who managed to read this far,
Vlad

P.S. This message is not meant to start a new flamewar; it is just an attempt at healthy criticism of the current mainline target approach, as well as a hope to get some healthy criticism of SCST.