Message-ID: <50375AD6.8060203@suse.de>
Date: Fri, 24 Aug 2012 12:43:34 +0200
From: Hannes Reinecke
To: Paolo Bonzini
Cc: Stefan Hajnoczi, zwanp@cn.ibm.com, linuxram@us.ibm.com, qemu-devel@nongnu.org, "Nicholas A. Bellinger", virtualization@lists.linux-foundation.org, Cong Meng, Christoph Hellwig
In-Reply-To: <503733A2.1050300@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/2 v1] blkdrv: Add queue limits parameters for sg block drive

On 08/24/2012 09:56 AM, Paolo Bonzini wrote:
> On 24/08/2012 02:45, Nicholas A. Bellinger wrote:
>> So up until very recently, TCM would accept an I/O request for a DATA
>> I/O type CDB with a max_sectors larger than the reported max_sectors for
>> its TCM backend (regardless of backend type), and silently generate N
>> backend 'tasks' to complete the single initiator-generated command.
>
> This is what QEMU does if you use scsi-block, except for MMC devices
> (because of the insanity of the commands used for burning).
>
>> Also FYI for Paolo, for control type CDBs I've never actually seen an
>> allocation length exceed max_sectors, so in practice AFAIK this only
>> happens for DATA I/O type CDBs.
>
> Yes, that was my impression as well.
>
>> This was historically required by the pSCSI backend driver (using a
>> number of old SCSI passthrough interfaces) in order to support this very
>> type of case described above, but over the years the logic ended up
>> creeping into various other non-passthrough backend drivers like
>> IBLOCK+FILEIO. So for v3.6-rc1 code, hch ended up removing the 'task'
>> logic, thus allowing backends (and the layers below) to do the I/O
>> sectors > max_sectors handling work, allowing modern pSCSI using struct
>> request to do the same. (hch assured me this works now for pSCSI)
>
> So now LIO and QEMU work the same. (Did he test tapes too?)
>
>> Anyway, I think having the guest limit virtio-scsi DATA I/O to
>> max_sectors based upon the host-accessible block limits is a reasonable
>> approach to consider. Reducing this value even further based upon the
>> lowest max_sectors available amongst possible migration hosts would be a
>> good idea here to avoid having to reject any I/Os exceeding a new
>> host's device block queue limits.
>
> Yeah, it's reasonable _assuming it is needed at all_. For disks, it is
> not needed.
> For CD-ROMs it is, but right now we have only one report
> and it is using USB so we don't know if the problem is in the drive or
> rather in the USB bridge (whose quality usually leaves much to be desired).
>
> So in the only observed case, the fix would really be a workaround; the
> right thing to do with USB devices is to use USB passthrough.
>
Hehe. So finally someone else stumbled across this one.

All is fine and dandy as long as you're able to use scsi-disk.
As soon as you're forced to use scsi-generic we're in trouble.

With scsi-generic we actually have two problems:
1) scsi-generic just acts as a pass-through and passes the commands
   as-is, including the scatter-gather information as formatted by the
   guest. So the guest could easily format an SG_IO command which will
   not be compatible with the host.
2) The host is not able to differentiate between a malformed SG_IO
   command and a real I/O error; in both cases it'll return -EIO.

So we can fix this by either
a) ignoring it (as we do nowadays :-)
b) fixing up scsi-generic to inspect and modify the SG_IO information
   to ensure the host limits are respected, or
c) fixing up the host to differentiate between a malformed SG_IO and a
   real I/O error.

c) would only be feasible for Linux et al. _Personally_ I would prefer
that approach, as I fail to see why we cannot return a proper error code
here. But I can already hear the outraged cry 'POSIX! POSIX!', so I guess
it's not going to happen anytime soon.
So I would vote for b). Yes, it's painful. But in the long run we'll
have to do an SG_IO inspection anyway, otherwise we'll always be
susceptible to malicious SG_IO attacks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
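For readers unfamiliar with the TCM behaviour Nicholas describes above, here is a minimal C sketch of the two ideas, under stated assumptions: one initiator command larger than the backend's max_sectors is split into N backend 'tasks', and the limit advertised to a guest is the lowest max_sectors among its possible migration hosts. The names backend_task, split_into_tasks() and lowest_max_sectors() are hypothetical; this is not the TCM, LIO or QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend "task": one contiguous slice of the initiator's
 * command, never longer than the backend's max_sectors. */
struct backend_task {
    uint64_t lba;        /* first logical block of this slice */
    uint32_t nr_sectors; /* length of this slice in sectors */
};

/* Split one command covering [lba, lba + nr_sectors) into tasks of at
 * most max_sectors each. Fills up to max_tasks entries of tasks[] and
 * returns the total number of tasks needed, which may exceed max_tasks
 * (the caller can then retry with a larger array). */
static size_t split_into_tasks(uint64_t lba, uint32_t nr_sectors,
                               uint32_t max_sectors,
                               struct backend_task *tasks, size_t max_tasks)
{
    size_t n = 0;

    assert(max_sectors > 0);
    while (nr_sectors > 0) {
        uint32_t chunk = nr_sectors < max_sectors ? nr_sectors : max_sectors;

        if (n < max_tasks) {
            tasks[n].lba = lba;
            tasks[n].nr_sectors = chunk;
        }
        n++;
        lba += chunk;
        nr_sectors -= chunk;
    }
    return n;
}

/* Limit to advertise to the guest so that no I/O has to be rejected after
 * migration: the lowest max_sectors among all candidate hosts.
 * With no hosts at all it returns UINT32_MAX, i.e. no constraint. */
static uint32_t lowest_max_sectors(const uint32_t *host_max_sectors,
                                   size_t nr_hosts)
{
    uint32_t min = UINT32_MAX;

    assert(nr_hosts == 0 || host_max_sectors != NULL);
    for (size_t i = 0; i < nr_hosts; i++) {
        if (host_max_sectors[i] < min)
            min = host_max_sectors[i];
    }
    return min;
}
```

For example, a 1000-sector command against max_sectors = 256 becomes four tasks of 256, 256, 256 and 232 sectors. Advertising lowest_max_sectors() to the guest up front is the alternative Nicholas suggests: the guest never builds a request that needs splitting on any host.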
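Option b) could start with something like the following minimal C sketch. It only covers the inspection half: it checks a guest-formatted SG_IO header (sg_io_hdr_t and sg_iovec_t from Linux's <scsi/sg.h>) against the host's queue limits before the request is passed through. struct host_queue_limits and sg_io_fits_host_limits() are hypothetical. How the limits are obtained is an assumption, for instance from the host device's max_sectors_kb, max_segments and max_segment_size queue attributes in sysfs. Modifying or splitting a non-conforming request is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <scsi/sg.h>

/* Hypothetical snapshot of the host device's queue limits. */
struct host_queue_limits {
    unsigned int max_sectors;      /* in 512-byte sectors */
    unsigned int max_segments;     /* max scatter-gather entries */
    unsigned int max_segment_size; /* in bytes */
};

/* Return true if the SG_IO request, as formatted by the guest, stays
 * within the host limits and can be passed through unchanged. */
static bool sg_io_fits_host_limits(const sg_io_hdr_t *hdr,
                                   const struct host_queue_limits *lim)
{
    assert(hdr != NULL && lim != NULL);

    /* Total transfer length must not exceed max_sectors (computed in
     * 64 bits so the multiplication cannot overflow). */
    if ((unsigned long long)hdr->dxfer_len >
        (unsigned long long)lim->max_sectors * 512ULL)
        return false;

    /* Single flat buffer at dxferp: only the total length is checked. */
    if (hdr->iovec_count == 0)
        return true;

    /* With iovec_count > 0, dxferp points to an array of sg_iovec_t. */
    if (hdr->iovec_count > lim->max_segments)
        return false;

    const sg_iovec_t *iov = hdr->dxferp;
    for (size_t i = 0; i < hdr->iovec_count; i++) {
        if (iov[i].iov_len > lim->max_segment_size)
            return false;
    }
    return true;
}
```

A check like this could let scsi-generic report a malformed request to the guest as such, instead of relying on the host's -EIO, which is what makes problem 2) hard to diagnose today.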