From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
qemu-devel@nongnu.org, si-wei.liu@oracle.com,
Liuxiangdong <liuxiangdong5@huawei.com>,
Zhu Lingshan <lingshan.zhu@intel.com>,
"Gonglei (Arei)" <arei.gonglei@huawei.com>,
alvaro.karsz@solid-run.com, Shannon Nelson <snelson@pensando.io>,
Laurent Vivier <lvivier@redhat.com>,
Harpreet Singh Anand <hanand@xilinx.com>,
Gautam Dawar <gdawar@xilinx.com>,
Stefano Garzarella <sgarzare@redhat.com>,
Cornelia Huck <cohuck@redhat.com>, Cindy Lu <lulu@redhat.com>,
Eli Cohen <eli@mellanox.com>, Paolo Bonzini <pbonzini@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Parav Pandit <parav@mellanox.com>,
Juan Quintela <quintela@redhat.com>,
Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: Re: [RFC v2 11/13] vdpa: add vdpa net migration state notifier
Date: Tue, 17 Jan 2023 12:54:20 +0000
Message-ID: <Y8aafO51a77Xyn2x@work-vm>
In-Reply-To: <CAJaqyWd5uyCZwVPWb=1wk1uc0mX2hOpcj2qDNtA7WxHrGGvMgQ@mail.gmail.com>
* Eugenio Perez Martin (eperezma@redhat.com) wrote:
> On Tue, Jan 17, 2023 at 10:58 AM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Eugenio Perez Martin (eperezma@redhat.com) wrote:
> > > On Fri, Jan 13, 2023 at 5:55 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Fri, Jan 13, 2023 at 1:25 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > >
> > > > > This allows net to restart the device backend to configure SVQ on it.
> > > > >
> > > > > > Ideally, these changes should not be net specific. However, the vdpa net
> > > > > > backend is the one with enough knowledge to configure everything, for
> > > > > > two reasons:
> > > > > > * Queues might need to be shadowed or not depending on their kind (control
> > > > > > vs data).
> > > > > * Queues need to share the same map translations (iova tree).
> > > > >
> > > > > Because of that it is cleaner to restart the whole net backend and
> > > > > configure again as expected, similar to how vhost-kernel moves between
> > > > > userspace and passthrough.
> > > > >
> > > > > If more kinds of devices need dynamic switching to SVQ we can create a
> > > > > callback struct like VhostOps and move most of the code there.
> > > > > > VhostOps cannot be reused since all vdpa backends share them, and
> > > > > > specializing them just for networking would be too heavy.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > > net/vhost-vdpa.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > 1 file changed, 84 insertions(+)
> > > > >
> > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > index 5d7ad6e4d7..f38532b1df 100644
> > > > > --- a/net/vhost-vdpa.c
> > > > > +++ b/net/vhost-vdpa.c
> > > > > @@ -26,6 +26,8 @@
> > > > > #include <err.h>
> > > > > #include "standard-headers/linux/virtio_net.h"
> > > > > #include "monitor/monitor.h"
> > > > > +#include "migration/migration.h"
> > > > > +#include "migration/misc.h"
> > > > > #include "migration/blocker.h"
> > > > > #include "hw/virtio/vhost.h"
> > > > >
> > > > > @@ -33,6 +35,7 @@
> > > > > typedef struct VhostVDPAState {
> > > > > NetClientState nc;
> > > > > struct vhost_vdpa vhost_vdpa;
> > > > > + Notifier migration_state;
> > > > > Error *migration_blocker;
> > > > > VHostNetState *vhost_net;
> > > > >
> > > > > @@ -243,10 +246,86 @@ static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> > > > > return DO_UPCAST(VhostVDPAState, nc, nc0);
> > > > > }
> > > > >
> > > > > +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
> > > > > +{
> > > > > + struct vhost_vdpa *v = &s->vhost_vdpa;
> > > > > + VirtIONet *n;
> > > > > + VirtIODevice *vdev;
> > > > > + int data_queue_pairs, cvq, r;
> > > > > + NetClientState *peer;
> > > > > +
> > > > > + /* We are only called on the first data vqs and only if x-svq is not set */
> > > > > + if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + vdev = v->dev->vdev;
> > > > > + n = VIRTIO_NET(vdev);
> > > > > + if (!n->vhost_started) {
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + if (enable) {
> > > > > + ioctl(v->device_fd, VHOST_VDPA_SUSPEND);
> > > >
> > > > Do we need to check if the device is started or not here?
> > > >
> > >
> > > n->vhost_started is checked right above, right?
> > >
> > > > > + }
> > > >
> > > > I'm not sure I understand the reason for vhost_net_stop() after a
> > > > VHOST_VDPA_SUSPEND. It looks to me like those functions are duplicated.
> > > >
> > >
> > > I think this is really worth exploring, and it would have been clearer
> > > if I didn't squash the vhost_reset_status commit by mistake :).
> > >
> > > Looking at qemu master vhost.c:vhost_dev_stop:
> > > if (hdev->vhost_ops->vhost_dev_start) {
> > > hdev->vhost_ops->vhost_dev_start(hdev, false);
> > > }
> > > if (vrings) {
> > > vhost_dev_set_vring_enable(hdev, false);
> > > }
> > > for (i = 0; i < hdev->nvqs; ++i) {
> > > vhost_virtqueue_stop(hdev,
> > > vdev,
> > > hdev->vqs + i,
> > > hdev->vq_index + i);
> > > }
> > >
> > > Both vhost-user and vhost-vdpa call set_status(0) at
> > > ->vhost_dev_start(hdev, false). That clears the virtqueue state in vdpa, so
> > > it is not recoverable at vhost_virtqueue_stop->get_vring_base, and
> > > I think it is too late for vdpa devices to change it. I guess
> > > vhost-user devices do not lose the state there, but I did not test.
> > >
> > > I call VHOST_VDPA_SUSPEND here so vhost_vdpa_dev_start looks more
> > > similar to vhost_user_dev_start. We can make
> > > vhost_vdpa_dev_start(false) suspend the device instead. But then we
> > > need to reset it after getting the indexes. That's why I added
> > > vhost_vdpa_reset_status, but I admit it is neither the cleanest
> > > approach nor the best name for it.
> > >
> > > Adding Maxime, RFC here so we can keep -vdpa and -user from diverging too much.
> > >
> > > > > + data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > > > > + cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> > > > > + n->max_ncs - n->max_queue_pairs : 0;
> > > > > + vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > > > > +
> > > > > + peer = s->nc.peer;
> > > > > + for (int i = 0; i < data_queue_pairs + cvq; i++) {
> > > > > + VhostVDPAState *vdpa_state;
> > > > > + NetClientState *nc;
> > > > > +
> > > > > + if (i < data_queue_pairs) {
> > > > > + nc = qemu_get_peer(peer, i);
> > > > > + } else {
> > > > > + nc = qemu_get_peer(peer, n->max_queue_pairs);
> > > > > + }
> > > > > +
> > > > > + vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> > > > > + vdpa_state->vhost_vdpa.shadow_data = enable;
> > > > > +
> > > > > + if (i < data_queue_pairs) {
> > > > > + /* Do not override CVQ shadow_vqs_enabled */
> > > > > + vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > > > > + if (unlikely(r < 0)) {
> > > > > + error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
> > > > > + }
> > > > > +}
> > > > > +
> > > > > +static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
> > > > > +{
> > > > > + MigrationState *migration = data;
> > > > > + VhostVDPAState *s = container_of(notifier, VhostVDPAState,
> > > > > + migration_state);
> > > > > +
> > > > > + switch (migration->state) {
> > > > > + case MIGRATION_STATUS_SETUP:
> > > > > + vhost_vdpa_net_log_global_enable(s, true);
> > > > > + return;
> > > > > +
> > > > > + case MIGRATION_STATUS_CANCELLING:
> > > > > + case MIGRATION_STATUS_CANCELLED:
> > > > > + case MIGRATION_STATUS_FAILED:
> > > > > + vhost_vdpa_net_log_global_enable(s, false);
> > > >
> > > > Do we need to recover here?
> > > >
> > >
> > > I may be missing something, but the device is fully reset and restored
> > > in these cases.
> > >
> > > CCing Juan and D. Gilbert, a review would be appreciated to check if
> > > this covers all the cases.
> >
> > I'm surprised I'm not seeing an entry for MIGRATION_STATUS_COMPLETED
> > there.
> >
> > You might consider:
> > if (migration_in_setup(s)) {
> > vhost_vdpa_net_log_global_enable(s, true);
> > } else if (migration_has_finished(s) || migration_has_failed(s)) {
> > vhost_vdpa_net_log_global_enable(s, false);
> > }
> >
>
> Thank you very much for the input, I see this is definitely cleaner
> than my proposal.
>
> Just for completeness, I need to handle has_finished and has_failed
> differently here because of recovery. This is easily achievable from your
> snippet, so thank you very much.
>
> > I'm not too sure what will happen in your world with postcopy; it's
> > worth testing, just remember on the source you don't want to be changing
> > guest memory when you're in the postcopy phase.
> >
>
> If I'm not wrong, postcopy is forbidden as long as a vdpa device
> exists, but I can check to be sure.
Ah yes, we don't want the vdpa writing into the destination RAM during
the postcopy phase; I can imagine with shadow-queues you might be able
to come up with a solution to that - but that's a complication for
another time.
Dave
> Thanks!
>
>
> > Dave
> >
> > > Thanks!
> > >
> > >
> > > > Thanks
> > > >
> > > > > + return;
> > > > > + };
> > > > > +}
> > > > > +
> > > > > static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
> > > > > {
> > > > > struct vhost_vdpa *v = &s->vhost_vdpa;
> > > > >
> > > > > + if (v->feature_log) {
> > > > > + add_migration_state_change_notifier(&s->migration_state);
> > > > > + }
> > > > > +
> > > > > if (v->shadow_vqs_enabled) {
> > > > > v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
> > > > > v->iova_range.last);
> > > > > @@ -280,6 +359,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
> > > > >
> > > > > assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > > > >
> > > > > + if (s->vhost_vdpa.index == 0 && s->vhost_vdpa.feature_log) {
> > > > > + remove_migration_state_change_notifier(&s->migration_state);
> > > > > + }
> > > > > +
> > > > > dev = s->vhost_vdpa.dev;
> > > > > if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
> > > > > g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
> > > > > @@ -767,6 +850,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> > > > > s->vhost_vdpa.device_fd = vdpa_device_fd;
> > > > > s->vhost_vdpa.index = queue_pair_index;
> > > > > s->always_svq = svq;
> > > > > + s->migration_state.notify = vdpa_net_migration_state_notifier;
> > > > > s->vhost_vdpa.shadow_vqs_enabled = svq;
> > > > > s->vhost_vdpa.iova_range = iova_range;
> > > > > s->vhost_vdpa.shadow_data = svq;
> > > > > --
> > > > > 2.31.1
> > > > >
> > > >
> > > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
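
The state handling discussed in this message (enable SVQ when migration
enters setup, and treat a finished migration separately from a failed or
cancelled one) might be sketched roughly as follows. This is only an
illustration under stated assumptions: the enum values, the mig_* helpers
and svq_action_for() are hypothetical stand-ins written for this example,
not QEMU's definitions, and what each "disable" action would actually do
is left open because the thread does not spell it out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the migration states named in the thread.
 * These are NOT QEMU's real definitions. */
enum mig_state {
    MIG_NONE,
    MIG_SETUP,
    MIG_ACTIVE,
    MIG_CANCELLING,
    MIG_CANCELLED,
    MIG_FAILED,
    MIG_COMPLETED,
};

/* Illustrative outcomes for the net backend (names are assumptions). */
enum svq_action {
    SVQ_NO_CHANGE,                /* state is not interesting to us */
    SVQ_ENABLE,                   /* start shadowing the data vqs */
    SVQ_DISABLE_AFTER_FAILURE,    /* migration did not happen */
    SVQ_DISABLE_AFTER_COMPLETION, /* migration finished */
};

static bool mig_in_setup(enum mig_state st)
{
    return st == MIG_SETUP;
}

/* Mirrors the three cases the quoted patch groups together. */
static bool mig_has_failed(enum mig_state st)
{
    return st == MIG_CANCELLING || st == MIG_CANCELLED || st == MIG_FAILED;
}

static bool mig_has_finished(enum mig_state st)
{
    return st == MIG_COMPLETED;
}

/* Map a migration state to the action the notifier would take.
 * Finished and failed are kept apart so each can get its own handling. */
enum svq_action svq_action_for(enum mig_state st)
{
    if (mig_in_setup(st)) {
        return SVQ_ENABLE;
    }
    if (mig_has_finished(st)) {
        return SVQ_DISABLE_AFTER_COMPLETION;
    }
    if (mig_has_failed(st)) {
        return SVQ_DISABLE_AFTER_FAILURE;
    }
    return SVQ_NO_CHANGE;
}
```

Keeping the decision in a function with no side effects makes it easy to
check that every state, including the completed one that the original
switch did not list, maps to a deliberate action.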