From: Jonah Palmer <jonah.palmer@oracle.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: Peter Xu <peterx@redhat.com>,
qemu-devel@nongnu.org, farosas@suse.de, eblake@redhat.com,
armbru@redhat.com, jasowang@redhat.com, mst@redhat.com,
si-wei.liu@oracle.com, boris.ostrovsky@oracle.com,
Dragos Tatulea DE <dtatulea@nvidia.com>
Subject: Re: [RFC 5/6] virtio,virtio-net: skip consistency check in virtio_load for iterative migration
Date: Mon, 18 Aug 2025 10:46:00 -0400
Message-ID: <c5b97e10-a8bb-4d59-b509-734eab7d5be3@oracle.com>
In-Reply-To: <CAJaqyWfc3G5NLnxqXvZFxw2aRnVvOcZbLds5LHzcdoLjVGmOsw@mail.gmail.com>
On 8/18/25 2:51 AM, Eugenio Perez Martin wrote:
> On Fri, Aug 15, 2025 at 4:50 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>>
>>
>> On 8/14/25 5:28 AM, Eugenio Perez Martin wrote:
>>> On Wed, Aug 13, 2025 at 4:06 PM Peter Xu <peterx@redhat.com> wrote:
>>>>
>>>> On Wed, Aug 13, 2025 at 11:25:00AM +0200, Eugenio Perez Martin wrote:
>>>>> On Mon, Aug 11, 2025 at 11:56 PM Peter Xu <peterx@redhat.com> wrote:
>>>>>>
>>>>>> On Mon, Aug 11, 2025 at 05:26:05PM -0400, Jonah Palmer wrote:
>>>>>>> This effort was started to reduce the guest-visible downtime caused
>>>>>>> by virtio-net/vhost-net/vhost-vDPA during live migration, especially
>>>>>>> vhost-vDPA.
>>>>>>>
>>>>>>> The downtime contributed by vhost-vDPA, for example, is not from having to
>>>>>>> migrate a lot of state but rather from expensive backend control-plane
>>>>>>> latency like CVQ configurations (e.g. MQ queue pairs, RSS, MAC/VLAN
>>>>>>> filters, offload settings, MTU, etc.). These require kernel/HW NIC
>>>>>>> operations, which dominate its downtime.
>>>>>>>
>>>>>>> In other words, by migrating the state of virtio-net early (before the
>>>>>>> stop-and-copy phase), we can also start staging backend configurations,
>>>>>>> which is the main contributor to downtime when migrating a vhost-vDPA
>>>>>>> device.
>>>>>>>
>>>>>>> I apologize if this series gives the impression that we're migrating a lot
>>>>>>> of data here. It's more along the lines of moving control-plane latency out
>>>>>>> of the stop-and-copy phase.
>>>>>>
>>>>>> I see, thanks.
>>>>>>
>>>>>> Please add this to the cover letter of the next post. IMHO it's
>>>>>> extremely important information for explaining the real goal of this
>>>>>> work. I bet most people won't expect it when reading the current
>>>>>> cover letter.
>>>>>>
>>>>>> Then it could have nothing to do with the iterative phase, am I right?
>>>>>>
>>>>>> What data does the dest QEMU need to start staging backend
>>>>>> configurations to the HW underneath? Does dest QEMU already have it
>>>>>> on the cmdline?
>>>>>>
>>>>>> Asking this because I want to know whether it can be done completely
>>>>>> without src QEMU at all, e.g. when dest QEMU starts.
>>>>>>
>>>>>> If src QEMU's data is still needed, please also first consider
>>>>>> providing such a facility using an "early VMSD" if at all possible:
>>>>>> feel free to refer to commit 3b95a71b22827d26178.
>>>>>>
>>>>>
>>>>> While it works for this series, it does not allow resending the state
>>>>> when the src device changes, for example if the number of virtqueues
>>>>> is modified.
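
(For context: if I read that commit right, the "early VMSD" Peter refers
to is the .early_setup flag on a VMStateDescription, which makes that
section migrate during setup, before RAM. A minimal sketch of what that
could look like for virtio-net; the section name and field choice here
are hypothetical, not a real patch:

    static const VMStateDescription vmstate_virtio_net_early = {
        .name = "virtio-net-early",   /* hypothetical section name */
        .version_id = 1,
        .minimum_version_id = 1,
        .early_setup = true,          /* sent during setup, before RAM */
        .fields = (const VMStateField[]) {
            /* e.g. control-plane state the backend needs early: */
            VMSTATE_UINT16(max_queue_pairs, VirtIONet), /* illustrative */
            VMSTATE_END_OF_LIST()
        },
    };

As Eugenio notes, a section like this is sent once, so it cannot resend
state if the src device changes afterwards.)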
>>>>
>>>> Some explanation of how syncing the number of vqueues helps downtime
>>>> would help. Not "it might preheat things", but exactly why, and how
>>>> that differs between pure software and when hardware is involved.
>>>>
>>>
>>> According to nvidia engineers, configuring the vqs (number, size, RSS,
>>> etc.) takes about 200ms:
>>> https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/
>>>
>>> Adding Dragos here in case he can provide more details. Maybe the
>>> numbers have changed though.
>>>
>>> And I guess the difference from pure SW will always come down to PCI
>>> communication, which I assume is slower than configuring the host SW
>>> device in RAM or even CPU cache. But I admit that proper profiling is
>>> needed before making those claims.
>>>
>>> Jonah, can you print the time it takes to configure the vDPA device
>>> with traces vs the time it takes to enable the dataplane of the
>>> device, so we can get an idea of how much time we save with this?
>>>
>>
>> Let me know if this isn't what you're looking for.
>>
>> I'm assuming by "configuration time" you mean:
>> - Time from device startup (entry to vhost_vdpa_dev_start()) to right
>> before we start enabling the vrings (e.g.
>> VHOST_VDPA_SET_VRING_ENABLE in vhost_vdpa_net_cvq_load()).
>>
>> And by "time taken to enable the dataplane" I'm assuming you mean:
>> - Time right before we start enabling the vrings (see above) to right
>> after we enable the last vring (at the end of
>> vhost_vdpa_net_cvq_load())
>>
>> Guest specs: 128G Mem, SVQ=on, CVQ=on, 8 queue pairs:
>>
>> -netdev type=vhost-vdpa,vhostdev=$VHOST_VDPA_0,id=vhost-vdpa0,
>> queues=8,x-svq=on
>>
>> -device virtio-net-pci,netdev=vhost-vdpa0,id=vdpa0,bootindex=-1,
>> romfile=,page-per-vq=on,mac=$VF1_MAC,ctrl_vq=on,mq=on,
>> ctrl_vlan=off,vectors=18,host_mtu=9000,
>> disable-legacy=on,disable-modern=off
>>
>> ---
>>
>> Configuration time: ~31s
>> Dataplane enable time: ~0.14ms
>>
>
> I was vague, but yes, that's representative enough! It would be more
> accurate if the configuration time ended at the point QEMU enables the
> first queue of the dataplane, though.
>
> As Si-Wei mentions, is v->shared->listener_registered == true at the
> beginning of vhost_vdpa_dev_start?
>
Ah, I also realized that the QEMU I was using for measurements predated
the introduction of the listener_registered member.

I retested with the latest QEMU changes and set x-svq=off, i.e. guest
specs: 128G Mem, SVQ=off, CVQ=on, 8 queue pairs. I ran the test 3 times.
v->shared->listener_registered == false at the beginning of
vhost_vdpa_dev_start().
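
(So on this path the listener gets registered as part of the start
sequence itself, i.e. the guest memory mapping falls inside the
"configuration time" window below. Roughly, simplifying from memory
rather than quoting hw/virtio/vhost-vdpa.c verbatim:

    if (started && !v->shared->listener_registered) {
        /* first start: map guest memory now */
        memory_listener_register(&v->shared->listener,
                                 dev->vdev->dma_as);
        v->shared->listener_registered = true;
    }
)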
---
Configuration time: Time from first entry into vhost_vdpa_dev_start() to
right after QEMU enables the first VQ.
  - 26.947s, 26.606s, 27.326s

Enable dataplane: Time from right after the first VQ is enabled to right
after the last VQ is enabled.
  - 0.081ms, 0.081ms, 0.079ms
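
For anyone reproducing this: one quick way to bracket the two spans is
plain monotonic clock reads rather than the tracing subsystem. A
hypothetical ad-hoc sketch, not what's in the tree:

    static int64_t cfg_start_us, first_enable_us;

    /* at first entry into vhost_vdpa_dev_start() */
    cfg_start_us = g_get_monotonic_time();

    /* right after the first VQ is enabled */
    first_enable_us = g_get_monotonic_time();
    fprintf(stderr, "config: %.3fs\n",
            (first_enable_us - cfg_start_us) / (double)G_USEC_PER_SEC);

    /* right after the last VQ is enabled */
    fprintf(stderr, "dataplane enable: %.3fms\n",
            (g_get_monotonic_time() - first_enable_us) / 1000.0);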