qemu-devel.nongnu.org archive mirror
From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Jason Wang <jasowang@redhat.com>
Cc: eperezma <eperezma@redhat.com>, Eli Cohen <eli@mellanox.com>,
	qemu-devel <qemu-devel@nongnu.org>, mst <mst@redhat.com>
Subject: Re: [RFC PATCH] vhost_net: should not use max_queue_pairs for non-mq guest
Date: Fri, 25 Mar 2022 16:15:28 -0700
Message-ID: <53dd4ba4-9c7e-cc0f-eaed-3c884dd1b144@oracle.com>
In-Reply-To: <CACGkMEuh8S3ShJZRtDkjvykHMNSi4A1pO0PRJPuEKJL=uAhX9Q@mail.gmail.com>



On 3/25/2022 12:59 AM, Jason Wang wrote:
> On Fri, Mar 25, 2022 at 3:02 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 3/21/2022 8:47 PM, Jason Wang wrote:
>>> On Sat, Mar 19, 2022 at 12:14 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> With MQ enabled vdpa device and non-MQ supporting guest e.g.
>>>> booting vdpa with mq=on over OVMF of single vqp, it's easy
>>>> to hit assert failure as the following:
>>>>
>>>> ../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
>>>>
>>>> 0  0x00007f8ce3ff3387 in raise () at /lib64/libc.so.6
>>>> 1  0x00007f8ce3ff4a78 in abort () at /lib64/libc.so.6
>>>> 2  0x00007f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
>>>> 3  0x00007f8ce3fec252 in  () at /lib64/libc.so.6
>>>> 4  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
>>>> 5  0x0000558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
>>>> 6  0x0000558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
>>>> 7  0x0000558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false)
>>>>      at ../hw/virtio/virtio-pci.c:974
>>>> 8  0x0000558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
>>>> 9  0x0000558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
>>>>      at ../hw/net/vhost_net.c:361
>>>> 10 0x0000558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
>>>> 11 0x0000558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
>>>> 12 0x0000558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
>>>> 13 0x0000558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
>>>> 14 0x0000558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...)
>>>>      at ../softmmu/memory.c:492
>>>> 15 0x0000558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
>>>> 16 0x0000558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...)
>>>>      at ../softmmu/memory.c:1504
>>>> 17 0x0000558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
>>>> 18 0x0000558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
>>>> 19 0x0000558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>)
>>>>      at ../softmmu/physmem.c:2914
>>>> 20 0x0000558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=...,
>>>>      attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
>>>> 21 0x0000558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
>>>> 22 0x0000558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
>>>> 23 0x0000558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
>>>> 24 0x00007f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
>>>> 25 0x00007f8ce40bb9fd in clone () at /lib64/libc.so.6
>>>>
>>>> The cause for the assert failure is due to that the vhost_dev index
>>>> for the ctrl vq was not aligned with actual one in use by the guest.
>>>> Upon multiqueue feature negotiation in virtio_net_set_multiqueue(),
>>>> if guest doesn't support multiqueue, the guest vq layout would shrink
>>>> to single queue pair of 3 vqs in total (rx, tx and ctrl). This results
>>>> in ctrl_vq taking a different vhost_dev group index than the default
>>>> n->max_queue_pairs, the latter of which is only valid for multiqueue
>>>> guest. While on those additional vqs not exposed to the guest,
>>>> vhost_net_set_vq_index() never populated vq_index properly, hence
>>>> getting the assert failure.
>>>>
>>>> A possible fix is to pick the correct vhost_dev group for the control
>>>> vq according to this table [*]:
>>>>
>>>> vdpa tool / QEMU arg / guest config    / ctrl_vq group index
>>>> ----------------------------------------------------------------
>>>> max_vqp 8 / mq=on    / mq=off  (UEFI) => data_queue_pairs
>>>> max_vqp 8 / mq=on    / mq=on  (Linux) => n->max_queue_pairs(>1)
>>>> max_vqp 8 / mq=off   / mq=on  (Linux) => n->max_queue_pairs(=1)
>>>>
>>>> [*] Please see FIXME in the code for open question and discussion
>>>>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> ---
>>>>    hw/net/vhost_net.c     | 13 +++++++++----
>>>>    hw/virtio/vhost-vdpa.c | 25 ++++++++++++++++++++++++-
>>>>    2 files changed, 33 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>> index 30379d2..9a4479b 100644
>>>> --- a/hw/net/vhost_net.c
>>>> +++ b/hw/net/vhost_net.c
>>>> @@ -322,6 +322,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>        BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
>>>>        VirtioBusState *vbus = VIRTIO_BUS(qbus);
>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>> +    bool mq = virtio_host_has_feature(dev, VIRTIO_NET_F_MQ);
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>> @@ -343,7 +344,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>            if (i < data_queue_pairs) {
>>>>                peer = qemu_get_peer(ncs, i);
>>>>            } else { /* Control Virtqueue */
>>>> -            peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>> +            peer = qemu_get_peer(ncs, mq ? data_queue_pairs : n->max_queue_pairs);
>>>>            }
>>>>
>>>>            net = get_vhost_net(peer);
>>>> @@ -368,7 +369,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>            if (i < data_queue_pairs) {
>>>>                peer = qemu_get_peer(ncs, i);
>>>>            } else {
>>>> -            peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>> +            peer = qemu_get_peer(ncs, mq ? data_queue_pairs : n->max_queue_pairs);
>>>>            }
>>>>            r = vhost_net_start_one(get_vhost_net(peer), dev);
>>>>
>>>> @@ -390,7 +391,10 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>
>>>>    err_start:
>>>>        while (--i >= 0) {
>>>> -        peer = qemu_get_peer(ncs , i);
>>>> +        if (mq)
>>>> +            peer = qemu_get_peer(ncs, i < data_queue_pairs ? i : data_queue_pairs);
>>>> +        else
>>>> +            peer = qemu_get_peer(ncs, i < data_queue_pairs ? i : n->max_queue_pairs);
>>>>            vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>        }
>>>>        e = k->set_guest_notifiers(qbus->parent, total_notifiers, false);
>>>> @@ -409,6 +413,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>        VirtioBusState *vbus = VIRTIO_BUS(qbus);
>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>> +    bool mq = virtio_host_has_feature(dev, VIRTIO_NET_F_MQ);
>>>>        NetClientState *peer;
>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>        int nvhosts = data_queue_pairs + cvq;
>>>> @@ -418,7 +423,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
>>>>            if (i < data_queue_pairs) {
>>>>                peer = qemu_get_peer(ncs, i);
>>>>            } else {
>>>> -            peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>> +            peer = qemu_get_peer(ncs, mq ? data_queue_pairs : n->max_queue_pairs);
>>>>            }
>>>>            vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>        }
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index 27ea706..623476e 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -1097,7 +1097,30 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>>        }
>>>>
>>>> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
>>>> +    /* FIXME the vhost_dev group for the control vq may have bogus nvqs=2
>>>> +     * value rather than nvqs=1. This can happen in case the guest doesn't
>>>> +     * support multiqueue, as a result of virtio_net_change_num_queue_pairs()
>>>> +     * destroying and rebuilding all the vqs, the guest index for control vq
>>>> +     * will no longer align with the host's. Currently net_init_vhost_vdpa()
>>>> +     * only initializes all vhost_dev's and net_clients once during
>>>> +     * net_client_init1() time, way earlier before multiqueue feature
>>>> +     * negotiation can kick in.
>>> See below, it looks like the code doesn't find the correct vhost_dev.
>>>
>>>> +     *
>>>> +     * Discussion - some possible fixes so far I can think of:
>>>> +     *
>>>> +     * option 1: fix vhost_net->dev.nvqs and nc->is_datapath in place for
>>>> +     * vdpa's ctrl vq, or rebuild all vdpa's vhost_dev groups and the
>>>> +     * net_client array, in the virtio_net_set_multiqueue() path;
>>>> +     *
>>>> +     * option 2: fix vhost_dev->nvqs in place at vhost_vdpa_set_features()
>>>> +     * before coming down to vhost_vdpa_dev_start() (Q: nc->is_datapath
>>>> +     * seems only used in virtio_net_device_realize, is it relevant?);
>>> Relevant but not directly related, for the vhost_dev where
>>> nc->is_datapath is false, it will assume it is backed by a single
>>> queue not a queue pair.
>>>
>>>> +     *
>>>> +     * option 3: use host queue index all along in vhost-vdpa ioctls instead
>>>> +     * of using guest vq index, so that vhost_net_start/stop() can remain
>>>> +     * as-is today
>>>> +     */
>>> Note that the vq_index of each vhost_dev is assigned during
>>> vhost_net_start() according to whether or not the MQ or CVQ is
>>> negotiated in vhost_net_start()
>>>
>>>       for (i = 0; i < nvhosts; i++) {
>>>
>>>           if (i < data_queue_pairs) {
>>>               peer = qemu_get_peer(ncs, i);
>>>           } else { /* Control Virtqueue */
>>>               peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>           }
>>>
>>>           net = get_vhost_net(peer);
>>>           vhost_net_set_vq_index(net, i * 2, index_end);
>>>
>>> It means some of the peers won't be used when MQ is not negotiated. So
>>> it looks to me the evil came from virtio_net_get_notifier_mask().
>> Yes, there it is. Where the control virtqueue first ever needs a
>> guest_notifier for vhost_dev.
>>> Where it doesn't mask the correct vhost dev when the guest doesn't
>>> support MQ but the host does. So we had option 4:
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 2087516253..5e9ac019cd 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -3179,7 +3179,13 @@ static void
>>> virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
>>>                                               bool mask)
>>>    {
>>>        VirtIONet *n = VIRTIO_NET(vdev);
>>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>> +    NetClientState *nc;
>>> +
>>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>> Hmmm, I thought it would be more natural to align the layout of
>> vhost_dev's with that of virtqueue's, not the other way around.
> The problem is that we need to make sure it works for vhost_net as
> well where it doesn't support cvq.
>
>> Not sure
>> how this vhost_dev selection scheme may work with additional queues
>> discovered through transport specific mechanism, such as the admin
>> virtqueue, but I can live with it for now:
>>
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -244,7 +244,8 @@ static void virtio_net_vhost_status(VirtIONet *n,
>> uint8_t status)
>>        VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>        NetClientState *nc = qemu_get_queue(n->nic);
>>        int queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
>> -    int cvq = n->max_ncs - n->max_queue_pairs;
>> +    int cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
>> +              n->max_ncs - n->max_queue_pairs : 0;
> Any reason for this line?
This corresponds to the following asserts:

assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));

If QEMU or the guest doesn't support the control vq, there's no need to
bother exposing a vhost_dev and guest notifier for it. Note that
vhost_net_start/stop implies DRIVER_OK is set in the device status,
meaning feature negotiation is already complete (same as n->multiqueue).
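
To make the intent concrete, here is a minimal standalone sketch (not
QEMU code) of the two decisions in the diff quoted in this message: how
many control-vq vhost_devs to start, and which net client backs a given
guest vq index. The struct net_cfg and the helpers cvq_count() and
nc_index_for_vq() are hypothetical names used only for illustration;
vq2q() is assumed to behave like the helper of the same name in
hw/net/virtio-net.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few VirtIONet fields used below. */
struct net_cfg {
    int max_queue_pairs;   /* data queue pairs offered by the backend */
    int max_ncs;           /* max_queue_pairs, plus one if a cvq peer exists */
    bool mq_negotiated;    /* guest acked VIRTIO_NET_F_MQ */
    bool cvq_negotiated;   /* guest acked VIRTIO_NET_F_CTRL_VQ */
};

/* Map an rx/tx vq index to its queue pair (assumed to match vq2q()). */
static int vq2q(int vq_index)
{
    return vq_index / 2;
}

/* Control-vq vhost_devs to start: none unless the feature was negotiated. */
static int cvq_count(const struct net_cfg *n)
{
    return n->cvq_negotiated ? n->max_ncs - n->max_queue_pairs : 0;
}

/*
 * Net client index backing guest vq idx. Without MQ the guest sees
 * rx=0, tx=1, ctrl=2, but the control vq's peer still sits at
 * max_queue_pairs on the host side.
 */
static int nc_index_for_vq(const struct net_cfg *n, int idx)
{
    if (!n->mq_negotiated && idx == 2) {
        assert(n->cvq_negotiated);
        return n->max_queue_pairs;
    }
    return vq2q(idx);
}
```

With max_queue_pairs = 8 and a non-MQ guest that negotiated the control
vq, guest vq index 2 resolves to net client 8 rather than 1, which is the
vhost_dev that vhost_net_start() actually set up for the control vq.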
>
> Btw, would you mind to post a formal patch for this?
I would love to. There is a set of mq bug fixes sitting in my queue
pending paperwork, but I was dragged onto other stuff earlier
this week. I will try to post it early next week.

-Siwei

>
> Thanks
>
>>        if (!get_vhost_net(nc->peer)) {
>>            return;
>> @@ -3161,8 +3162,14 @@ static NetClientInfo net_virtio_info = {
>>    static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
>>    {
>>        VirtIONet *n = VIRTIO_NET(vdev);
>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    NetClientState *nc;
>>        assert(n->vhost_started);
>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>> +    } else {
>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    }
>>        return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
>>    }
>>
>> @@ -3170,8 +3177,14 @@ static void
>> virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
>>                                               bool mask)
>>    {
>>        VirtIONet *n = VIRTIO_NET(vdev);
>> -    NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    NetClientState *nc;
>>        assert(n->vhost_started);
>> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ) && idx == 2) {
>> +        assert(virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ));
>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>> +    } else {
>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>> +    }
>>        vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
>>                                 vdev, idx, mask);
>>    }
>>
>>
>> Thanks,
>> -Siwei
>>
>>> +        nc = qemu_get_subqueue(n->nic, n->max_queue_pairs);
>>> +    } else {
>>> +        nc = qemu_get_subqueue(n->nic, vq2q(idx));
>>> +    }
>>>        assert(n->vhost_started);
>>>        vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
>>>                                 vdev, idx, mask);
>>>
>>> Thanks
>>>
>>>> +    if (dev->vq_index + dev->nvqs < dev->vq_index_end) {
>>>>            return 0;
>>>>        }
>>>>
>>>> --
>>>> 1.8.3.1
>>>>

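
For reference, the assertion quoted at the top of the patch is a
half-open range check on the guest vq index. The sketch below is a
standalone illustration, not QEMU code: struct vq_range and vq_in_range()
are hypothetical names, and the example values in the note after it are
chosen for illustration rather than read from a running QEMU.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct vhost_dev fields the check reads. */
struct vq_range {
    int vq_index;  /* first guest vq index owned by this vhost_dev */
    int nvqs;      /* number of vqs owned by this vhost_dev */
};

/* Same half-open range test as the assertion in vhost_vdpa_get_vq_index(). */
static bool vq_in_range(const struct vq_range *dev, int idx)
{
    return idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs;
}
```

Taking a data queue pair with vq_index = 0 and nvqs = 2, indexes 0 and 1
pass and index 2 fails. Taking a control vq vhost_dev with vq_index = 2
and nvqs = 1, index 2 passes. So a mask request for guest vq 2 only
satisfies the check if it is routed to the control vq's vhost_dev.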



Thread overview: 5+ messages
2022-03-19  4:13 [RFC PATCH] vhost_net: should not use max_queue_pairs for non-mq guest Si-Wei Liu
2022-03-22  3:47 ` Jason Wang
2022-03-25  7:01   ` Si-Wei Liu
2022-03-25  7:59     ` Jason Wang
2022-03-25 23:15       ` Si-Wei Liu [this message]
