All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Michael Tokarev <mjt@tls.msk.ru>
Cc: qemu-devel@nongnu.org, Anthony Liguori <anthony@codemonkey.ws>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH] increase BlockConf.min_io_size type from uint16_t to uint32_t
Date: Wed, 02 May 2012 17:53:52 +0200	[thread overview]
Message-ID: <4FA15890.90908@redhat.com> (raw)
In-Reply-To: <4FA154AB.2040201@msgid.tls.msk.ru>

Am 02.05.2012 17:37, schrieb Michael Tokarev:
> On 02.05.2012 18:35, Kevin Wolf wrote:
> []
>>> As I already mentioned, the virtio protocol has the same defect (but there
>>> it is less serious due to usage of larger units).  And that's where the
>>> additional overflow needs to be ELIMINATED, not just checked.  Ie, the
>>> protocol should be changed somehow - the only issue is that I don't know
>>> how to do that now, once it has been in use for quite some time.
>>
>> Even if you create a new version of the protocol (introduce a new
>> feature flag or whatever), newer qemus will still have to deal with
>> older guests and vice versa.
> 
> Sure.  And for these, the checks indeed should be done in lower layers.

What lower layers do you mean?

>> But now that you're extending the property value to 32 bits, but only 25
>> bits can be really used, the property type doesn't even theoretically
>> suffice as a check.
> 
> So, what do you propose?  To add a check into virtio-blk.c (and into
> whatever else place is using this variable) too?  If yes, and indeed
> it appears to be the right thing to do, care to show me the right
> place please, I'm not very familiar with that code...

I think the qdev init functions, or whatever they have become with QOM,
are the right place. Looks like they are virtio_blk_init() and
scsi_initfn(). I believe it's possible to fail them.

>> But I can't see which structure is only used by virtio-blk, though. The
>> min_io_size property is contained in DEFINE_BLOCK_PROPERTIES(), which is
>> used by every qdevified block device. Most of them ignore min_io_size,
>> but virtio-blk and scsi-disk don't.
> 
> I mean the BlockConf struct.  It isn't used anywhere but in virtio-blk.c.
> 
> But again, since I'm not familiar with the code, I might be wrong.

$ git grep BlockConf
block.h:typedef struct BlockConf {
block.h:} BlockConf;
block.h:static inline unsigned int get_physical_block_exp(BlockConf *conf)
hw/ide/internal.h:    BlockConf conf;
hw/s390-virtio-bus.h:    BlockConf block;
hw/scsi.h:    BlockConf conf;
hw/usb/dev-storage.c:    BlockConf conf;
hw/virtio-blk.c:    BlockConf *conf;
hw/virtio-blk.c:VirtIODevice *virtio_blk_init(DeviceState *dev,
BlockConf *conf,
hw/virtio-pci.h:    BlockConf block;
hw/virtio.h:VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,

So there are more users than just virtio-blk.

>> That wouldn't be a good interface. Let's just take a 32 bit number and
>> add the checks in the devices.
> 
> That's fine.  The only thing left to do is to find the proper places for
> the checks.  Help?

See above.

>> Just curious... What values do you want to use? The 32 MB minimum I/O
>> size that are possible with 16 bits and 512 byte sectors already sounds
>> insanely large.
> 
> I don't "use" any values.  I merely pass whatever is defined on my systems
> down to the guest.
> 
> md layer in kernel - raid4, raid5 and raid6 implementation - sets min_io_size
> to the chunk size, and opt_io_size to "stripe size".  It is not uncommon at
> all to have chunk size = 1Mb or more.   I've no idea how useful this information
> is, but at least with it present (my small raid5 array has 256Kb chunk size),
> xfs created in the guest performs much faster than without this information
> (which means usual 512 there).
> 
> This is how I discovered the issue - I wondered why xfs created in the guest
> is so much slower than the same xfs but created on host.  I/O sizes immediately
> come to min, so I added these, but it was still slow.  So I noticed min_io_size
> isn't passed correctly, increased the size of this type, and voila, xfs
> created in guest now behaves as fast as created on host.  Something like that
> anyway.
> 
> There's an obvious bug in there, but it is not obvious for me where/how it should
> be fixed.  Maybe the sizes used by md raid5 are insane, that's arguable ofcourse,
> but this is what is in use now (and since the day the i/o sizes were added to
> md to start with), and this is what makes things fly.

Thanks for the explanation. I wasn't trying to say that your setup is
wrong, I just wasn't aware that such sizes are in use.

Kevin

      reply	other threads:[~2012-05-02 15:54 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1335801160-321-1-git-send-email-mjt@msgid.tls.msk.ru>
2012-05-01  8:27 ` [Qemu-devel] [PATCH] increase BlockConf.min_io_size type from uint16_t to uint32_t Stefan Hajnoczi
2012-05-01  8:43   ` Michael Tokarev
2012-05-02  9:57 ` Kevin Wolf
2012-05-02 10:08   ` Michael Tokarev
2012-05-02 14:35     ` Kevin Wolf
2012-05-02 15:37       ` Michael Tokarev
2012-05-02 15:53         ` Kevin Wolf [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FA15890.90908@redhat.com \
    --to=kwolf@redhat.com \
    --cc=anthony@codemonkey.ws \
    --cc=mjt@tls.msk.ru \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.