From: "Michael S. Tsirkin" <mst@redhat.com>
To: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: kwolf@redhat.com, den@virtuozzo.com, qemu-devel@nongnu.org,
mreitz@redhat.com, kraxel@redhat.com, stefanha@redhat.com
Subject: Re: [PATCH] virtio: fix IO request length in virtio SCSI/block #PSBM-78839
Date: Wed, 23 Oct 2019 17:28:17 -0400 [thread overview]
Message-ID: <20191023172005-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20191018115547.19299-1-dplotnikov@virtuozzo.com>
On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
> From: "Denis V. Lunev" <den@openvz.org>
>
> Linux guests submit IO requests no larger than PAGE_SIZE * max_seg,
> the field reported by the SCSI controller. Thus a typical sequential
> read of 1 MB results in the following IO pattern from the guest:
> 8,16 1 15754 2.766095122 2071 D R 2095104 + 1008 [dd]
> 8,16 1 15755 2.766108785 2071 D R 2096112 + 1008 [dd]
> 8,16 1 15756 2.766113486 2071 D R 2097120 + 32 [dd]
> 8,16 1 15757 2.767668961 0 C R 2095104 + 1008 [0]
> 8,16 1 15758 2.768534315 0 C R 2096112 + 1008 [0]
> 8,16 1 15759 2.768539782 0 C R 2097120 + 32 [0]
> The IO was generated by
> dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
>
> This effectively means that on rotational disks we will observe 3 IOPS
> for each 2 MB processed. This definitely hurts both guest and host IO
> performance.
>
> The cure is relatively simple - we should report the longer
> scatter-gather capability of the SCSI controller. Fortunately the
> situation here is very good: the VirtIO transport layer can accommodate
> 1024 items in one request while we are using only 128, and this has
> been the case almost from the very beginning. 2 items are dedicated to
> request metadata, thus we should publish VIRTQUEUE_MAX_SIZE - 2 as
> max_seg.
>
> The following pattern is observed after the patch:
> 8,16 1 9921 2.662721340 2063 D R 2095104 + 1024 [dd]
> 8,16 1 9922 2.662737585 2063 D R 2096128 + 1024 [dd]
> 8,16 1 9923 2.665188167 0 C R 2095104 + 1024 [0]
> 8,16 1 9924 2.665198777 0 C R 2096128 + 1024 [0]
> which is much better.
>
> The dark side of this patch is that we are tweaking a guest-visible
> parameter, though this should be relatively safe as the transport-layer
> support mentioned above has been present in QEMU/host Linux for a very
> long time. The patch adds a configurable property with a new default
> for VirtIO SCSI and a hardcoded value for VirtIO Block, which does not
> provide a good configuration framework.
>
> Unfortunately the commit cannot be applied as-is. For the real cure we
> need the guest to be fixed to accommodate that queue length, which was
> done only in the recent 4.14 kernel. Thus we are going to expose the
> property and tweak it at the machine-type level.
>
> The problem with the old kernels is that they have a
> max_segments <= virtqueue_size restriction, which causes the guest to
> crash when it is violated.
This isn't just in the guests: the virtio spec also seems to imply this,
or at least to be vague on this point.
So I think it'll need a feature bit.
Doing that in a safe way will also allow being compatible with old guests.
The only downside is it's a bit more work as we need to
spec this out and add guest support.
> To fix the case described above on the old kernels we can increase
> virtqueue_size to 256 and max_segments to 254. The pitfall here is
> that seabios only allows virtqueue_size-s < 128; however, the seabios
> patch extending that limit to 256 is pending.
And the fix here is just to limit the large vq size to virtio 1.0.
In that mode it's fine, I think:
    /* check if the queue is available */
    if (vp->use_modern) {
        num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
        if (num > MAX_QUEUE_NUM) {
            vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
                     MAX_QUEUE_NUM);
            num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
        }
    } else {
        num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
    }
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Gerd Hoffmann <kraxel@redhat.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
> hw/block/virtio-blk.c | 3 ++-
> hw/scsi/vhost-scsi.c | 2 ++
> hw/scsi/virtio-scsi.c | 4 +++-
> include/hw/virtio/virtio-blk.h | 1 +
> include/hw/virtio/virtio-scsi.h | 1 +
> 5 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 06e57a4d39..b2eaeeaf67 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -903,7 +903,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> blk_get_geometry(s->blk, &capacity);
> memset(&blkcfg, 0, sizeof(blkcfg));
> virtio_stq_p(vdev, &blkcfg.capacity, capacity);
> - virtio_stl_p(vdev, &blkcfg.seg_max, 128 - 2);
> + virtio_stl_p(vdev, &blkcfg.seg_max, s->conf.max_segments);
> virtio_stw_p(vdev, &blkcfg.geometry.cylinders, conf->cyls);
> virtio_stl_p(vdev, &blkcfg.blk_size, blk_size);
> virtio_stw_p(vdev, &blkcfg.min_io_size, conf->min_io_size / blk_size);
> @@ -1240,6 +1240,7 @@ static Property virtio_blk_properties[] = {
> conf.max_discard_sectors, BDRV_REQUEST_MAX_SECTORS),
> DEFINE_PROP_UINT32("max-write-zeroes-sectors", VirtIOBlock,
> conf.max_write_zeroes_sectors, BDRV_REQUEST_MAX_SECTORS),
> + DEFINE_PROP_UINT32("max_segments", VirtIOBlock, conf.max_segments, 126),
> DEFINE_PROP_END_OF_LIST(),
> };
>
I'd worry that it's too easy to create a broken config with this
parameter.
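E.g. a realize-time sanity check along these lines could catch such configs (purely a sketch; the helper name and the exact bound are my assumptions, following the old-guest max_segments <= virtqueue_size restriction mentioned earlier):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: reject seg_max values that old guests cannot
 * handle. Two descriptors per request are reserved for metadata,
 * so the queue must hold at least max_segments + 2 entries. */
static bool seg_max_is_valid(unsigned max_segments, unsigned queue_size)
{
    return max_segments > 0 && max_segments + 2 <= queue_size;
}
```

With the defaults in this patch (max_segments=126, queue_size=128) such a check passes; bumping max_segments to 254 would require queue_size >= 256.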
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index 61e2e57da9..fa3b377807 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -242,6 +242,8 @@ static Property vhost_scsi_properties[] = {
> DEFINE_PROP_BIT64("t10_pi", VHostSCSICommon, host_features,
> VIRTIO_SCSI_F_T10_PI,
> false),
> + DEFINE_PROP_UINT32("max_segments", VirtIOSCSICommon, conf.max_segments,
> + 126),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 839f120256..8b070ddeed 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -650,7 +650,7 @@ static void virtio_scsi_get_config(VirtIODevice *vdev,
> VirtIOSCSICommon *s = VIRTIO_SCSI_COMMON(vdev);
>
> virtio_stl_p(vdev, &scsiconf->num_queues, s->conf.num_queues);
> - virtio_stl_p(vdev, &scsiconf->seg_max, 128 - 2);
> + virtio_stl_p(vdev, &scsiconf->seg_max, s->conf.max_segments);
> virtio_stl_p(vdev, &scsiconf->max_sectors, s->conf.max_sectors);
> virtio_stl_p(vdev, &scsiconf->cmd_per_lun, s->conf.cmd_per_lun);
> virtio_stl_p(vdev, &scsiconf->event_info_size, sizeof(VirtIOSCSIEvent));
> @@ -948,6 +948,8 @@ static Property virtio_scsi_properties[] = {
> VIRTIO_SCSI_F_CHANGE, true),
> DEFINE_PROP_LINK("iothread", VirtIOSCSI, parent_obj.conf.iothread,
> TYPE_IOTHREAD, IOThread *),
> + DEFINE_PROP_UINT32("max_segments", VirtIOSCSI, parent_obj.conf.max_segments,
> + 126),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index cddcfbebe9..22da23a4a3 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -40,6 +40,7 @@ struct VirtIOBlkConf
> uint16_t queue_size;
> uint32_t max_discard_sectors;
> uint32_t max_write_zeroes_sectors;
> + uint32_t max_segments;
> };
>
> struct VirtIOBlockDataPlane;
> diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
> index 4c0bcdb788..1e5805eec4 100644
> --- a/include/hw/virtio/virtio-scsi.h
> +++ b/include/hw/virtio/virtio-scsi.h
> @@ -49,6 +49,7 @@ struct VirtIOSCSIConf {
> uint32_t num_queues;
> uint32_t virtqueue_size;
> uint32_t max_sectors;
> + uint32_t max_segments;
> uint32_t cmd_per_lun;
> #ifdef CONFIG_VHOST_SCSI
> char *vhostfd;
> --
> 2.17.0