From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:54044) by lists.gnu.org
	with esmtp (Exim 4.71) (envelope-from ) id 1SPbsy-00048A-D8
	for qemu-devel@nongnu.org; Wed, 02 May 2012 11:54:40 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1SPbsr-0003Wy-MY
	for qemu-devel@nongnu.org; Wed, 02 May 2012 11:54:35 -0400
Received: from mx1.redhat.com ([209.132.183.28]:24632) by eggs.gnu.org
	with esmtp (Exim 4.71) (envelope-from ) id 1SPbsr-0003Wr-E4
	for qemu-devel@nongnu.org; Wed, 02 May 2012 11:54:29 -0400
Message-ID: <4FA15890.90908@redhat.com>
Date: Wed, 02 May 2012 17:53:52 +0200
From: Kevin Wolf
MIME-Version: 1.0
References: <1335801160-321-1-git-send-email-mjt@msgid.tls.msk.ru>
	<4FA104F2.6020106@redhat.com> <4FA1079C.80105@msgid.tls.msk.ru>
	<4FA1462D.7020108@redhat.com> <4FA154AB.2040201@msgid.tls.msk.ru>
In-Reply-To: <4FA154AB.2040201@msgid.tls.msk.ru>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] increase BlockConf.min_io_size type from uint16_t to uint32_t
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Michael Tokarev
Cc: qemu-devel@nongnu.org, Anthony Liguori , Stefan Hajnoczi

On 02.05.2012 17:37, Michael Tokarev wrote:
> On 02.05.2012 18:35, Kevin Wolf wrote:
> []
>>> As I already mentioned, the virtio protocol has the same defect (but
>>> there it is less serious due to the use of larger units). And that's
>>> where the additional overflow needs to be ELIMINATED, not just
>>> checked. I.e., the protocol should be changed somehow - the only
>>> issue is that I don't know how to do that now, once it has been in
>>> use for quite some time.
>>
>> Even if you create a new version of the protocol (introduce a new
>> feature flag or whatever), newer qemus will still have to deal with
>> older guests and vice versa.
>
> Sure. And for these, the checks indeed should be done in lower layers.
What lower layers do you mean?

>> But now that you're extending the property value to 32 bits, but only
>> 25 bits can be really used, the property type doesn't even
>> theoretically suffice as a check.
>
> So, what do you propose? To add a check into virtio-blk.c (and into
> whatever else place is using this variable) too? If yes, and indeed
> it appears to be the right thing to do, care to show me the right
> place please, I'm not very familiar with that code...

I think the qdev init functions, or whatever they have become with QOM,
are the right place. Looks like they are virtio_blk_init() and
scsi_initfn(). I believe it's possible to fail them.

>> But I can't see which structure is only used by virtio-blk, though.
>> The min_io_size property is contained in DEFINE_BLOCK_PROPERTIES(),
>> which is used by every qdevified block device. Most of them ignore
>> min_io_size, but virtio-blk and scsi-disk don't.
>
> I mean the BlockConf struct. It isn't used anywhere but in virtio-blk.c.
>
> But again, since I'm not familiar with the code, I might be wrong.

$ git grep BlockConf
block.h:typedef struct BlockConf {
block.h:} BlockConf;
block.h:static inline unsigned int get_physical_block_exp(BlockConf *conf)
hw/ide/internal.h:    BlockConf conf;
hw/s390-virtio-bus.h:    BlockConf block;
hw/scsi.h:    BlockConf conf;
hw/usb/dev-storage.c:    BlockConf conf;
hw/virtio-blk.c:    BlockConf *conf;
hw/virtio-blk.c:VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,
hw/virtio-pci.h:    BlockConf block;
hw/virtio.h:VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,

So there are more users than just virtio-blk.

>> That wouldn't be a good interface. Let's just take a 32 bit number
>> and add the checks in the devices.
>
> That's fine. The only thing left to do is to find the proper places
> for the checks. Help?

See above.

>> Just curious... What values do you want to use?
>> The 32 MB minimum I/O size that is possible with 16 bits and 512 byte
>> sectors already sounds insanely large.
>
> I don't "use" any values. I merely pass whatever is defined on my
> systems down to the guest.
>
> The md layer in the kernel - the raid4, raid5 and raid6 implementation
> - sets min_io_size to the chunk size, and opt_io_size to the "stripe
> size". It is not uncommon at all to have a chunk size of 1 MB or more.
> I've no idea how useful this information is, but at least with it
> present (my small raid5 array has a 256 KB chunk size), an xfs created
> in the guest performs much faster than without this information (which
> means the usual 512 there).
>
> This is how I discovered the issue - I wondered why an xfs created in
> the guest is so much slower than the same xfs created on the host. I/O
> sizes immediately came to mind, so I added these, but it was still
> slow. Then I noticed that min_io_size isn't passed correctly,
> increased the size of this type, and voila, an xfs created in the
> guest now behaves as fast as one created on the host. Something like
> that anyway.
>
> There's an obvious bug in there, but it is not obvious to me where or
> how it should be fixed. Maybe the sizes used by md raid5 are insane -
> that's arguable of course - but this is what is in use now (and has
> been since the day the I/O sizes were added to md to start with), and
> this is what makes things fly.

Thanks for the explanation. I wasn't trying to say that your setup is
wrong, I just wasn't aware that such sizes are in use.

Kevin