Message-ID: <545C9576.9080703@de.ibm.com>
Date: Fri, 07 Nov 2014 10:48:38 +0100
From: Christian Borntraeger
References: <1406636839-11946-1-git-send-email-tumanova@linux.vnet.ibm.com> <545B9A92.4030802@de.ibm.com> <8761ere5dx.fsf@blackfin.pond.sub.org>
In-Reply-To: <8761ere5dx.fsf@blackfin.pond.sub.org>
Subject: Re: [Qemu-devel] Geometry and blocksize support for backing devices
To: Markus Armbruster
Cc: Kevin
 Wolf, Stefan Hajnoczi, Ekaterina Tumanova, qemu-devel@nongnu.org, Viktor Mihajlovski, dahi@linux.vnet.ibm.com, cornelia.huck@de.ibm.com, Paolo Bonzini

On 07.11.2014 10:17, Markus Armbruster wrote:
> Christian Borntraeger writes:
>
>> Markus, Kevin, Stefan,
>>
>> here is a (somewhat late) followup of some KVM forum discussions regarding
>> block size and geometry of pass-through block devices. Let's just do a quick
>> wrap-up (as I understand it) and a proposal at the end of the mail.
>>
>> - DASD/ECKD disk devices have several special properties, e.g. the geometry
>>   still has an influence on the partitions, each track can have a different
>>   format, and the format of the disk actually follows what z/OS has as basic
>>   data structures. Linux does a low-level format of the disk to look like a
>>   block device (most common case is formatted with 4K blocks). There are
>>   still some small warts for z/OS compatibility (block0 has 28 bytes of data,
>>   block1 has 148 bytes and block2 has 84 bytes of data; everything else has
>>   4k. The dasd device driver will fake 4k blocks for all blocks by ignoring
>>   writes beyond the data for blocks 0-2 and filling the gaps with 0xe5).
>>
>> - Since Linux uses DASDs as normal block devices, we actually want to use
>>   virtio-blk to pass those to KVM guests. Due to the warts mentioned above,
>>   we have to have the proper block size and geometry in the guest. Otherwise
>>   things will fail in certain cases (Linux partition detection, for example).
>>
>> - We have libvirt support to provide this information in the XML. This is far
>>   from user friendly, though, as the admin has to manually look up the
>>   properties of the host disks and then modify the guest definition accordingly.
>>
>> - Kate came up with patches (based on initial patches from Einar Lueck) for
>>   auto-detection of geometry and block size for host block devices.
>>
>> - Stefan and Paolo had some concerns:
>>   1. if the geometry etc. is important, then make it part of the guest definition
>>   2. what about migration when the target disk differs
>>   3. is that issue System z specific or generic?
>>
>> Regarding 1: this does work as of today, but it is pretty complicated for an
>> admin to do so.
>>
>> Regarding 2: System z systems do not have built-in disks; they are always
>> accessed via fibre channel (either FICON protocol for DASDs or FCP protocol
>> for SCSI). So it's quite common to have shared access to the same disk from
>> different System z boxes. System z admins should be able to set this up
>> properly. Question is, is it ok to assume that and fail if not?
>>
>> Regarding 3: No idea.
>>
>> At KVM forum I talked to several people regarding a solution:
>>
>> a) Stefan suggested to make the auto detection explicit, e.g.: provide an
>>    "autodetect" tag for the secs, cyls, heads and logical_sector_size properties.
>>    This would require changes in qemu, but also in libvirt and its domain
>>    configuration.
>> b) Markus suggested that there are already some cases in QEMU where we rely
>>    on the admin to provide a proper setup on the target, e.g. an .iso
>>    file as image.
>>    If the target has different content in the .iso file, things will break.
>>
>> Markus said there are two classes of this.
>> a) problems that can be detected by QEMU. Here QEMU will abort migration.
>
> Example: device missing on target, target rejects migration when it
> receives the device's state.
>
>> b) problems that cannot be detected by QEMU (e.g. different iso content). This
>>    will trigger failures later on.
>>
>> a is preferred
>
> Note that the geometry is currently in class (b). Configuration
> generally is.
>
> A perpetual long-term goal of migration is embedding configuration in
> the migration stream, to move it from (b) to (a). Just frontend
> configuration, because backend configuration is generally host-specific.
>
> Geometry is a property of the device, thus frontend configuration.
>
> For historical reasons, device geometry properties default to backend
> values set with -drive cyls=... or -hdachs. This is deprecated.
>
> If the user doesn't specify geometry, QEMU makes one up, usually based
> on device size. In certain circumstances, it bases it on a DOS partition
> table instead. Misfeature, in my opinion. Partitioning a disk can give
> you a different geometry on the next restart, including migration,
> unless you specify the geometry explicitly. Fortunately, most guests
> don't care for geometry at all. This is entirely undocumented, as far
> as I can tell.
>
>> Now here comes my proposal:
>> Markus' statement brought up the idea of special-casing DASD support. We can
>> call an ioctl BIODASDINFO on the block device that will only succeed if the host
>> disk is really a dasd. We could enable the auto detection for that case.
>
> If BIODASDINFO succeeds, QEMU uses that instead of making up device
> geometry as described above. Correct?
>
> Let's spell out when exactly BIODASDINFO succeeds, to avoid
> misunderstandings. It does when the backend is a DASD (/dev/dasd*).
> What about a partition on a DASD? A file in a filesystem on a DASD?

BIODASDINFO (we will probably use BIODASDINFO2) will succeed only if the
backend is a dasd or one of its partitions. It will fail on other block
devices and on files on a dasd.

Now: when passing in only a dasd partition, we cannot do anything sane with
the dasd oddities inside the guest (e.g. creating dasd partitions, reading
the volume label or TOC etc.), so there is no need to pass in geometry and
block size. We could make the special case even stricter and check for
start == 0 when doing the HDIO_GETGEO.

So pass-through of geometry and block size is only performed if BIODASDINFO2
succeeds and the HDIO_GETGEO call indicates that this is not a partition by
having start == 0.

Makes sense?

>
> Auto-detecting geometry on DASDs adds an irregularity to the user
> interface. I guess DASDs are special enough to tolerate that.
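
As a rough, untested sketch of the rule above (BIODASDINFO2 succeeds and
HDIO_GETGEO reports start == 0), the decision could be written like the
following. The struct and function names are made up for illustration and
are not QEMU code; the two ioctl calls themselves are left out and only
their results are modelled. Treating a failed HDIO_GETGEO as "no
pass-through" is an assumption of this sketch.

```c
#include <stdbool.h>

/*
 * Hypothetical summary of what probing the host device returned.
 * Not a QEMU structure; the field names are invented for this sketch.
 */
struct dasd_probe {
    bool dasd_info_ok;       /* ioctl(fd, BIODASDINFO2, ...) returned 0 */
    bool getgeo_ok;          /* ioctl(fd, HDIO_GETGEO, ...) returned 0 */
    unsigned long geo_start; /* hd_geometry.start reported by HDIO_GETGEO */
};

/*
 * Pass geometry and block size through only for a whole DASD:
 * BIODASDINFO2 must succeed (backend is a DASD or a DASD partition)
 * and start must be 0 (the backend is not a partition).
 */
static bool dasd_geometry_passthrough(const struct dasd_probe *p)
{
    if (!p->dasd_info_ok) {
        return false;          /* other block device, or a file on a DASD */
    }
    if (!p->getgeo_ok) {
        return false;          /* assumption: no geometry, no pass-through */
    }
    return p->geo_start == 0;  /* a partition reports a non-zero start */
}
```

In every case where this returns false, QEMU would keep its existing
behaviour (explicit properties or the made-up default geometry).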
>
>> In addition, QEMU will check if geometry and block size match during
>> migration; if not, migration will fail. That would work with the following
>> cases
>>
>> (manual override == secs, cyls, heads, blocksize given by admin)
>>
>> HOST A                          HOST B
>> dasd (auto)          ----->     dasd (auto)
>> dasd (auto)          ----->     image file (manual override)
>> image file (manual)  ----->     dasd (auto)
>> image file (manual)  ----->     image file (manual)
>> dasd (auto)          ----->     other host block device with manual override
>>
>> If the dasds or their values differ, migration will fail (and that is
>> what we want).
>
> I guess I helped plant this idea. Thinking about it again, I fear doing
> it just for geometry could be shortsighted. When we do it for all
> device configuration later on, a geometry special case could get in the
> way.
>
> While you're certainly welcome to take on one of migration's long-term
> projects, I don't want to make it a prerequisite for getting DASDs more
> usable ;)
>
> Without this sanity check, we gain another way to mess up geometry by
> changing the default geometry (see "DOS partition table" above for the
> existing way). We reuse the same answer: don't do that then.
>
> Good enough for Paolo, if I understand him correctly. I guess it's good
> enough for me too, but please document it. Covering the whole default
> geometry machinery would be nice.
>
>> In addition we could also implement Stefan's proposal to add an
>> "autodetect" statement for secs, cyls, heads..., but I am not sure about
>> libvirt support, though.
>>
>> So 3 parts:
>> 1. autodetect on real DASDs
>> 2. geometry and sector size checking in generic code
>> 3. maybe an autodetect flag
>>
>> Makes sense? Any guidance on how to proceed?
>
> Try posting a patch just for 1., and see if anyone screams :)

Kate, can you give it a try?
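
For reference only, since part 2 is not a prerequisite for part 1: the
geometry and block size check during migration discussed above boils down
to a field-by-field comparison of what the guest sees on both hosts. A
minimal sketch, with a hypothetical struct and function name that do not
exist in QEMU, and without any of the migration stream plumbing:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the guest-visible disk properties on one host. */
struct blk_geometry {
    uint32_t cyls;
    uint32_t heads;
    uint32_t secs;
    uint32_t block_size;  /* logical block size in bytes */
};

/*
 * Migration may proceed only if source and target agree on every field.
 * Where the values came from (auto-detected on a DASD or given manually
 * by the admin) does not matter for the comparison.
 */
static bool blk_geometry_matches(const struct blk_geometry *src,
                                 const struct blk_geometry *dst)
{
    return src->cyls == dst->cyls &&
           src->heads == dst->heads &&
           src->secs == dst->secs &&
           src->block_size == dst->block_size;
}
```

This covers every row of the HOST A / HOST B table the same way; a
mismatch in any field would make the target reject the migration.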