Message-ID: <545CCB96.5020400@linux.vnet.ibm.com>
Date: Fri, 07 Nov 2014 16:39:34 +0300
From: Ekaterina Tumanova
References: <1406636839-11946-1-git-send-email-tumanova@linux.vnet.ibm.com> <545B9A92.4030802@de.ibm.com> <8761ere5dx.fsf@blackfin.pond.sub.org>
In-Reply-To: <8761ere5dx.fsf@blackfin.pond.sub.org>
Subject: Re: [Qemu-devel] Geometry and blocksize support for backing devices
To: Markus
 Armbruster, Christian Borntraeger
Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel@nongnu.org,
 Viktor Mihajlovski, dahi@linux.vnet.ibm.com, cornelia.huck@de.ibm.com,
 Paolo Bonzini

On 11/07/2014 12:17 PM, Markus Armbruster wrote:
> Christian Borntraeger writes:
>
>> Markus, Kevin, Stefan,
>>
>> here is a (somewhat late) followup of some KVM forum discussions regarding
>> block size and geometry of pass-through block devices. Let's just do a quick
>> wrap-up (as of my understanding) and a proposal at the end of the mail
>>
>> - DASD/ECKD disk devices have several special properties, e.g. the geometry
>>   still has an influence on the partitions and each track can have a different
>>   format and the format of the disk actually follows what z/OS has as basic
>>   data structures. Linux does a low-level format of the disk to look like a
>>   block device (most common case is formatted with 4K blocks). There are
>>   still some small warts for z/OS compatibility (block0 has 28 bytes of data,
>>   block1 has 148 bytes and block2 has 84 bytes of data. everything else has
>>   4k. the dasd device driver will fake 4k blocks for all blocks by ignoring
>>   writes beyond the data for blocks 0-2 and filling the gaps with 0xe5).
>>
>> - Since Linux uses DASDs as normal block devices, we actually want to use
>>   virtio-blk to pass those to KVM guests. Due to the warts mentioned above,
>>   we have to have the proper block size and geometry in the guest. Otherwise
>>   things will fail in certain cases (linux partition detection for example)
>>
>> - we have libvirt support to provide this information in the XML. This is far
>>   from user friendly, though, as the admin has to manually look up the
>>   properties of the host disks and then modify the guest definition accordingly.
>>
>> - Kate came up with patches (based on initial patches from Einar Lueck) for
>>   auto-detection of geometry and block size for host block devices
>>
>> - Stefan and Paolo had some concerns:
>>   1.
if the geometry etc is important, then make it part of the guest definition
>>   2. what about migration and the target disk differs
>>   3. is that issue system z specific or generic?
>>
>> Regarding 1: this does work as of today, but it is pretty complicated for an
>> admin to do so
>>
>> Regarding 2: System z systems do not have built-in disks, they are always
>> accessed via fibre channel (either FICON protocol for DASDs or FCP protocol
>> for scsi). So it's quite common to have shared access to the same disk from
>> different System z boxes. system z admins should be able to set this up
>> properly. Question is, is it ok to assume that and fail if not?
>>
>> Regarding 3: No idea.
>>
>> At KVM forum I talked to several people regarding a solution:
>>
>> a) Stefan suggested to make the auto detection explicit, e.g.: provide an
>>    "autodetect" tag for the secs, cyls, heads and logical_sector_size properties
>>    This would require changes in qemu, but also in libvirt and its domain
>>    configuration
>> b) Markus suggested that there are already some cases in QEMU, where we rely
>>    on the admin to provide a proper setup on the target, e.g. an .iso
>>    file as image.
>>    If the target has a different content in the .iso file things will break.
>>
>> Markus said, there are two classes of this.
>> a) problems that can be detected by QEMU. Here QEMU will abort migration
>
> Example: device missing on target, target rejects migration when it
> receives the device's state.
>
>> b) problems that cannot be detected by QEMU (e.g. different iso content). this
>>    will trigger failures later on
>>
>> a is preferred
>
> Note that the geometry is currently in class (b). Configuration
> generally is.
>
> A perpetual long-term goal of migration is embedding configuration in
> the migration stream, to move it from (b) to (a). Just frontend
> configuration, because backend configuration is generally host-specific.
>
> Geometry is a property of the device, thus frontend configuration.
>
> For historical reasons, device geometry properties default to backend
> values set with -drive cyls=... or -hdachs. This is deprecated.
>
> If the user doesn't specify geometry, QEMU makes one up, usually based
> on device size. In certain circumstances, it bases on a DOS partition
> table instead. Misfeature, in my opinion. Partitioning a disk can give
> you a different geometry on the next restart, including migration,
> unless you specify the geometry explicitly. Fortunately, most guests
> don't care for geometry at all. This is entirely undocumented, as far
> as I can tell.
>
>> Now here comes my proposal:
>> Markus' statement brought up an idea of special casing DASDs support. We can
>> call an ioctl BIODASDINFO on the block device that will only succeed if the host
>> disk is really a dasd. We could enable the auto detection for that case.
>
> If BIODASDINFO succeeds, QEMU uses that instead of making up device
> geometry as described above. Correct?
>
> Let's spell out when exactly BIODASDINFO succeeds, to avoid
> misunderstandings. It does when the backend is a DASD (/dev/dasd*).
> What about a partition on a DASD? A file in a filesystem on a DASD?
>
> Auto-detecting geometry on DASDs adds an irregularity to the user
> interface. I guess DASDs are special enough to tolerate that.
>
>> In addition, QEMU will check if geometry and block size match during
>> migration, if not, migration will fail. That would work with the
>> following cases
>>
>> (manual override == secs, cyls, heads, blocksize given by admin)
>>
>> HOST A                      HOST B
>> dasd (auto)          -----> dasd (auto)
>> dasd (auto)          -----> image file (manual override)
>> image file (manual)  -----> dasd (auto)
>> image file (manual)  -----> image file (manual)
>> dasd (auto)          -----> other host block device with manual override
>>
>> if there are different dasds or different values migration will fail (and that is
>> what we want)
>
> I guess I helped plant this idea.
Thinking about it again, I fear doing
> it just for geometry could be shortsighted. When we do it for all
> device configuration later on, a geometry special case could get in the
> way.
>
> While you're certainly welcome to take on one of migration's long-term
> projects, I don't want to make it a prerequisite for getting DASDs more
> usable ;)
>
> Without this sanity check, we gain another way to mess up geometry by
> changing the default geometry (see "DOS partition table" above for the
> existing way). We reuse the same answer: don't do that then.
>
> Good enough for Paolo, if I understand him correctly. I guess it's good
> enough for me too, but please document it. Covering the whole default
> geometry machinery would be nice.
>
>> In addition we could also implement Stefan's proposal to add an
>> "autodetect" statement
>> for secs, cyls, head.... but I am not sure about libvirt support, though
>>
>> So 3 parts:
>> 1. autodetect on real DASDs
>> 2. geometry and sector size checking in generic code
>> 3. maybe an autodetect flag
>>
>> makes sense? Any guidance how to proceed?
>
> Try posting a patch just for 1., and see if anyone screams :)
>

Thanks a lot for the advice!

A small follow-up question...

Since the whole thing becomes arch-specific again (or even DASD-specific),
I suppose I no longer need to change the driver code.

In my most recent patchset, geometry was detected by HDIO_GETGEO inside
hw/block/hd-geometry.c (via an arch-specific hook). But for the blocksize,
control was passed to the driver functions (probe_logical_blocksize,
probe_physical_blocksize), which called the appropriate ioctls (BLKPBSZGET,
BLKSSZGET) for "raw" and "host devices".

I suppose that now there's no need for that anymore: I can move the block
ioctl calls back to hw/block/block.c and remove the introduced driver
functions.

What do you think?

Kate.
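
For illustration only, the host-side probing discussed above could look
roughly like the standalone sketch below. It is not the code from the
patchset and it uses no QEMU APIs: struct host_blk_props and
probe_host_blk_props() are hypothetical names made up for this example.
Only the ioctls named in the mail (BLKSSZGET, BLKPBSZGET, HDIO_GETGEO) are
used; the BIODASDINFO check is left out on purpose, because its definition
comes from an s390-specific header and the sketch is meant to build on any
Linux host.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>     /* BLKSSZGET, BLKPBSZGET */
#include <linux/hdreg.h>  /* HDIO_GETGEO, struct hd_geometry */

/* Hypothetical container for the probed values; not a QEMU type. */
struct host_blk_props {
    int logical_block_size;            /* filled by BLKSSZGET */
    unsigned int physical_block_size;  /* filled by BLKPBSZGET */
    bool have_geometry;                /* true if HDIO_GETGEO succeeded */
    unsigned int cyls;
    unsigned int heads;
    unsigned int secs;
};

/*
 * Probe block sizes and, if available, geometry of a host block device.
 * Returns 0 when both block size ioctls succeed, -1 otherwise (for
 * example when the path cannot be opened or is not a block device).
 * Geometry is optional: many block devices do not implement HDIO_GETGEO.
 */
int probe_host_blk_props(const char *path, struct host_blk_props *out)
{
    struct hd_geometry geo;
    int fd;

    assert(path != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0) {
        return -1;
    }

    /* BLKSSZGET expects an int *, BLKPBSZGET an unsigned int *. */
    if (ioctl(fd, BLKSSZGET, &out->logical_block_size) < 0 ||
        ioctl(fd, BLKPBSZGET, &out->physical_block_size) < 0) {
        close(fd);
        return -1;
    }

    if (ioctl(fd, HDIO_GETGEO, &geo) == 0) {
        out->have_geometry = true;
        out->cyls = geo.cylinders;
        out->heads = geo.heads;
        out->secs = geo.sectors;
    }

    close(fd);
    return 0;
}
```

A caller would pass a device node such as /dev/dasda and fall back to the
existing guessed defaults whenever the function returns -1 or leaves
have_geometry unset. Regular files and character devices are rejected
because the block size ioctls fail on them.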