From: "Michael S. Tsirkin" <mst@redhat.com>
To: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
Denis Lunev <den@virtuozzo.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mreitz@redhat.com" <mreitz@redhat.com>,
"kraxel@redhat.com" <kraxel@redhat.com>,
"stefanha@redhat.com" <stefanha@redhat.com>
Subject: Re: [PATCH] virtio: fix IO request length in virtio SCSI/block #PSBM-78839
Date: Wed, 13 Nov 2019 08:18:48 -0500
Message-ID: <20191113081108-mutt-send-email-mst@kernel.org>
In-Reply-To: <650b229d-34d6-f9a5-6a52-d7b5892a2c0c@virtuozzo.com>
On Wed, Nov 13, 2019 at 12:38:48PM +0000, Denis Plotnikov wrote:
>
>
> On 06.11.2019 15:03, Michael S. Tsirkin wrote:
> > On Thu, Oct 24, 2019 at 11:34:34AM +0000, Denis Lunev wrote:
> >> On 10/24/19 12:28 AM, Michael S. Tsirkin wrote:
> >>> On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
> >>>> From: "Denis V. Lunev" <den@openvz.org>
> >>>>
> >>>> Linux guests submit IO requests no longer than PAGE_SIZE * max_seg,
> >>>> the field reported by the SCSI controller. Thus a typical sequential
> >>>> read of 1 MB results in the following IO pattern from the guest:
> >>>> 8,16 1 15754 2.766095122 2071 D R 2095104 + 1008 [dd]
> >>>> 8,16 1 15755 2.766108785 2071 D R 2096112 + 1008 [dd]
> >>>> 8,16 1 15756 2.766113486 2071 D R 2097120 + 32 [dd]
> >>>> 8,16 1 15757 2.767668961 0 C R 2095104 + 1008 [0]
> >>>> 8,16 1 15758 2.768534315 0 C R 2096112 + 1008 [0]
> >>>> 8,16 1 15759 2.768539782 0 C R 2097120 + 32 [0]
> >>>> The IO was generated by
> >>>> dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
> >>>>
> >>>> This effectively means that on rotational disks we will observe 3 IOPS
> >>>> for every 2 MB processed, which clearly hurts both guest and host IO
> >>>> performance.
> >>>>
> >>>> The cure is relatively simple - we should report the longer
> >>>> scatter-gather capability of the SCSI controller. Fortunately the
> >>>> situation here is very good. The VirtIO transport layer can accommodate
> >>>> 1024 items in one request while we are using only 128, and this has
> >>>> been the case almost from the very beginning. 2 items are dedicated to
> >>>> request metadata, thus we should publish VIRTQUEUE_MAX_SIZE - 2 as
> >>>> max_seg.
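> >>>>
> >>>> As a back-of-envelope check (assuming 4 KiB guest pages): with the
> >>>> current 128-entry queue we advertise 128 - 2 = 126 segments, i.e. at
> >>>> most 126 * 4 KiB = 504 KiB = 1008 sectors per request, which is exactly
> >>>> the "+ 1008" seen in the trace above. With VIRTQUEUE_MAX_SIZE - 2 =
> >>>> 1022 segments the per-request ceiling grows to roughly 4 MiB.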
> >>>>
> >>>> The following pattern is observed after the patch:
> >>>> 8,16 1 9921 2.662721340 2063 D R 2095104 + 1024 [dd]
> >>>> 8,16 1 9922 2.662737585 2063 D R 2096128 + 1024 [dd]
> >>>> 8,16 1 9923 2.665188167 0 C R 2095104 + 1024 [0]
> >>>> 8,16 1 9924 2.665198777 0 C R 2096128 + 1024 [0]
> >>>> which is much better.
> >>>>
> >>>> The dark side of this patch is that we are tweaking a guest-visible
> >>>> parameter, though this should be relatively safe as the transport layer
> >>>> support mentioned above has been present in QEMU/host Linux for a very
> >>>> long time. The patch adds a configurable property for VirtIO SCSI with
> >>>> a new default and hardcodes the value for VirtIO block, which does not
> >>>> provide a good configuration framework.
> >>>>
> >>>> Unfortunately the commit can not be applied as is. For the real cure we
> >>>> need the guest to be fixed to accommodate that queue length, which is
> >>>> done only in the latest 4.14 kernel. Thus we are going to expose the
> >>>> property and tweak it at the machine type level.
> >>>>
> >>>> The problem with the old kernels is that they have a
> >>>> max_segments <= virtqueue_size restriction, which causes the guest to
> >>>> crash if it is violated.
> >>> This isn't just in the guests: virtio spec also seems to imply this,
> >>> or at least be vague on this point.
> >>>
> >>> So I think it'll need a feature bit.
> >>> Doing that in a safe way will also allow being compatible with old guests.
> >>>
> >>> The only downside is it's a bit more work as we need to
> >>> spec this out and add guest support.
> >>>
> >>>> To fix the case described above in the old kernels we can increase
> >>>> virtqueue_size to 256 and max_segments to 254. The pitfall here is
> >>>> that seabios only allows virtqueue_size values < 128; however, the
> >>>> seabios patch extending that limit to 256 is pending.
> >>> And the fix here is just to limit large vq size to virtio 1.0.
> >>> In that mode it's fine I think:
> >>>
> >>>
> >>> /* check if the queue is available */
> >>> if (vp->use_modern) {
> >>>     num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
> >>>     if (num > MAX_QUEUE_NUM) {
> >>>         vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
> >>>                  MAX_QUEUE_NUM);
> >>>         num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
> >>>     }
> >>> } else {
> >>>     num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
> >>> }
> >> you mean to put the code like this into virtio_pci_realize() inside QEMU?
> >>
> >> If no, can you pls clarify which component should be touched.
> >>
> >> Den
> > I mean:
> > - add an API to change the default queue size
> > - add a validate features callback; in it, check whether the modern
> > flag is set in the negotiated features and, if so, increase the queue
> > size
> >
> > Maybe all this is too much work; we could block this for transitional
> > devices, but your patch does not do that: you need to check that
> > legacy is enabled, not that modern is not disabled.
> To develop the idea of how to adjust the queue size further, I'd like to
> summarize what we have:
>
> 1. A variety of guests without(?) queue size limitations, which can
> support queue sizes up to the maximum (1024)
>
> 2. seabios setups with two possible max queue size limitations: 128 and
> 256 (recently committed)
>
> 3. non-seabios setups with unknown max queue size limitations
>
> Taking into account that the queue size may be limited in the BIOS (or
> EFI), to safely support queue sizes > 128 we need to distinguish setups
> that can support greater-than-128 from those that can't.
> seabios potentially can't do it, so, as far as I understood, the idea is
> to start with queue size = 128 and then increase the queue size once the
> guest driver is engaged.
>
> To achieve that, we need to
>
> 1. understand which driver is currently working with a virtio device:
> seabios, the guest, or something else. Things here are quite complex,
> since we can't modify any guest, seabios or other drivers to explicitly
> tell that to the device
Anyone negotiating VIRTIO_1
> 2. be able to increase the queue size dynamically (re-create the
> queues?). At the moment this functionality is absent, at least in qemu
> virtio-scsi. Is it possible by design?
Why not, it's just an array.
This is what I meant when I said we need an API to resize a queue.
> 3. choose a place for extending (re-creating) the queues.
> VirtioDeviceClass->reset?
Definitely not reset, that gets you back to the original state.
> I actually don't know how to do it reliably, so would really appreciate
> some help or advice.
validate features sounds like a good place.
this is why I wrote "add a validate features callback".
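Roughly something like the below (an untested sketch, not code that
exists in QEMU: "modern_queue_size" is a made-up conf field;
virtio_queue_get_num(), virtio_queue_set_num() and
virtio_vdev_has_feature() are the existing helpers I have in mind):

static int virtio_scsi_validate_features(VirtIODevice *vdev)
{
    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
    int i;

    /* The rings were created with the small, legacy-safe size in
     * realize(); once the guest has acked VIRTIO_1 we know it can
     * handle a bigger ring, so grow it before the rings are used. */
    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
        for (i = 0; i < vs->conf.num_queues + 2; i++) {  /* + ctrl, event */
            if (virtio_queue_get_num(vdev, i) < vs->conf.modern_queue_size) {
                virtio_queue_set_num(vdev, i, vs->conf.modern_queue_size);
            }
        }
    }
    return 0;
}

and then wire it up as vdc->validate_features in the class init.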
>
> You've mentioned that old seabios won't use the modern interface, so
> would it be OK if we
>
> * define DEFAULT_QUEUE_SIZE = 128
> * leave queue creation as is at VirtioDeviceClass->realize()
>   with queue_size = conf.queue_size
> * on VirtioDeviceClass->reset() check whether the device is accessed
>   through the "legacy" interface; if so, then (in pseudocode)
>
>     if (current_queue_size > DEFAULT_QUEUE_SIZE) {
>         for (queue in all_queues) {
>             reduce_queue_size(queue, DEFAULT_QUEUE_SIZE)  // recreate_queue() ?
>         }
>     } else {
>         if (conf.queue_size > current_queue_size) {
>             for (queue in all_queues) {
>                 increase_queue_size(queue, conf.queue_size)
>             }
>         }
>     }
>
> Might this approach work? Is this what you meant?
>
> Denis
I don't think you can do anything useful in reset. We need to check
features after they have been negotiated. So we'd start with a small
queue, min(DEFAULT_QUEUE_SIZE, current_queue_size)?, and if VIRTIO_1 is
set increase the size.
This is very compatible but it is certainly ugly, as we are
second-guessing the user.
Simpler idea: add a new property that is simply
unsupported with legacy. E.g. "modern-queue-size" ?
If someone sets it, legacy must be disabled otherwise we fail.
Way less compatible but hey.
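Again only a sketch to show what I mean (the property name, the conf
field and the legacy_enabled() check are made up; DEFINE_PROP_UINT16()
and error_setg() are the real macros/helpers):

static Property virtio_scsi_properties[] = {
    /* existing properties omitted */
    DEFINE_PROP_UINT16("modern-queue-size", VirtIOSCSICommon,
                       conf.modern_queue_size, 128),
    DEFINE_PROP_END_OF_LIST(),
};

static void virtio_scsi_device_realize(DeviceState *dev, Error **errp)
{
    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);

    if (vs->conf.modern_queue_size > 128 &&
        legacy_enabled(VIRTIO_DEVICE(dev))) {  /* hypothetical helper */
        error_setg(errp, "modern-queue-size > 128 requires "
                   "the legacy interface to be disabled");
        return;
    }
    /* existing realize code follows */
}

The ugly part is that whether legacy is enabled is really known by the
virtio-pci proxy rather than the device, so the check might have to live
there instead.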
> >
> >
> >
>