From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:54044) by lists.gnu.org
	with esmtp (Exim 4.71) (envelope-from ) id 1SPbsy-00048A-D8
	for qemu-devel@nongnu.org; Wed, 02 May 2012 11:54:40 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1SPbsr-0003Wy-MY
	for qemu-devel@nongnu.org; Wed, 02 May 2012 11:54:35 -0400
Received: from mx1.redhat.com ([209.132.183.28]:24632) by eggs.gnu.org
	with esmtp (Exim 4.71) (envelope-from ) id 1SPbsr-0003Wr-E4
	for qemu-devel@nongnu.org; Wed, 02 May 2012 11:54:29 -0400
Message-ID: <4FA15890.90908@redhat.com>
Date: Wed, 02 May 2012 17:53:52 +0200
From: Kevin Wolf
MIME-Version: 1.0
References: <1335801160-321-1-git-send-email-mjt@msgid.tls.msk.ru>
	<4FA104F2.6020106@redhat.com> <4FA1079C.80105@msgid.tls.msk.ru>
	<4FA1462D.7020108@redhat.com> <4FA154AB.2040201@msgid.tls.msk.ru>
In-Reply-To: <4FA154AB.2040201@msgid.tls.msk.ru>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] increase BlockConf.min_io_size type from uint16_t to uint32_t
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Michael Tokarev
Cc: qemu-devel@nongnu.org, Anthony Liguori , Stefan Hajnoczi

On 02.05.2012 17:37, Michael Tokarev wrote:
> On 02.05.2012 18:35, Kevin Wolf wrote:
> []
>>> As I already mentioned, the virtio protocol has the same defect (but
>>> there it is less serious due to the use of larger units). And that's
>>> where the additional overflow needs to be ELIMINATED, not just
>>> checked. I.e., the protocol should be changed somehow - the only
>>> issue is that I don't know how to do that now, once it has been in
>>> use for quite some time.
>>
>> Even if you create a new version of the protocol (introduce a new
>> feature flag or whatever), newer qemus will still have to deal with
>> older guests and vice versa.
>
> Sure. And for these, the checks indeed should be done in lower layers.
What lower layers do you mean?

>> But now that you're extending the property value to 32 bits, but only
>> 25 bits can be really used, the property type doesn't even
>> theoretically suffice as a check.
>
> So, what do you propose? To add a check into virtio-blk.c (and into
> whatever else place is using this variable) too? If yes, and indeed
> it appears to be the right thing to do, care to show me the right
> place please, I'm not very familiar with that code...

I think the qdev init functions, or whatever they have become with QOM,
are the right place. Looks like they are virtio_blk_init() and
scsi_initfn(). I believe it's possible to fail them.

>> But I can't see which structure is only used by virtio-blk, though.
>> The min_io_size property is contained in DEFINE_BLOCK_PROPERTIES(),
>> which is used by every qdevified block device. Most of them ignore
>> min_io_size, but virtio-blk and scsi-disk don't.
>
> I mean the BlockConf struct. It isn't used anywhere but in virtio-blk.c.
>
> But again, since I'm not familiar with the code, I might be wrong.

$ git grep BlockConf
block.h:typedef struct BlockConf {
block.h:} BlockConf;
block.h:static inline unsigned int get_physical_block_exp(BlockConf *conf)
hw/ide/internal.h:    BlockConf conf;
hw/s390-virtio-bus.h:    BlockConf block;
hw/scsi.h:    BlockConf conf;
hw/usb/dev-storage.c:    BlockConf conf;
hw/virtio-blk.c:    BlockConf *conf;
hw/virtio-blk.c:VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,
hw/virtio-pci.h:    BlockConf block;
hw/virtio.h:VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,

So there are more users than just virtio-blk.

>> That wouldn't be a good interface. Let's just take a 32 bit number
>> and add the checks in the devices.
>
> That's fine. The only thing left to do is to find the proper places
> for the checks. Help?

See above.

>> Just curious... What values do you want to use?
>> The 32 MB minimum I/O size that is possible with 16 bits and 512 byte
>> sectors already sounds insanely large.
>
> I don't "use" any values. I merely pass whatever is defined on my
> systems down to the guest.
>
> The md layer in the kernel - the raid4, raid5 and raid6 implementation
> - sets min_io_size to the chunk size, and opt_io_size to the "stripe
> size". It is not uncommon at all to have a chunk size of 1 MB or more.
> I've no idea how useful this information is, but at least with it
> present (my small raid5 array has a 256 KB chunk size), an xfs created
> in the guest performs much faster than without this information (which
> means the usual 512 there).
>
> This is how I discovered the issue - I wondered why an xfs created in
> the guest is so much slower than the same xfs created on the host. I/O
> sizes immediately came to mind, so I added these, but it was still
> slow. Then I noticed that min_io_size isn't passed correctly,
> increased the size of this type, and voila, an xfs created in the
> guest now behaves as fast as one created on the host. Something like
> that anyway.
>
> There's an obvious bug in there, but it is not obvious to me where or
> how it should be fixed. Maybe the sizes used by md raid5 are insane -
> that's arguable of course - but this is what is in use now (and has
> been since the day the I/O sizes were added to md to start with), and
> this is what makes things fly.

Thanks for the explanation. I wasn't trying to say that your setup is
wrong, I just wasn't aware that such sizes are in use.

Kevin