From: Jonah Palmer <jonah.palmer@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, farosas@suse.de, eblake@redhat.com,
	armbru@redhat.com, jasowang@redhat.com, mst@redhat.com,
	si-wei.liu@oracle.com, eperezma@redhat.com,
	boris.ostrovsky@oracle.com
Subject: Re: [RFC 5/6] virtio,virtio-net: skip consistency check in virtio_load for iterative migration
Date: Mon, 11 Aug 2025 17:26:05 -0400	[thread overview]
Message-ID: <eafcf9ca-f23f-42d5-b8c2-69f81a395d11@oracle.com> (raw)
In-Reply-To: <aJnydjxFzKwVzi7Y@x1.local>



On 8/11/25 9:39 AM, Peter Xu wrote:
> On Mon, Aug 11, 2025 at 08:30:19AM -0400, Jonah Palmer wrote:
>>
>>
>> On 8/7/25 12:31 PM, Peter Xu wrote:
>>> On Thu, Aug 07, 2025 at 10:18:38AM -0400, Jonah Palmer wrote:
>>>>
>>>>
>>>> On 8/6/25 12:27 PM, Peter Xu wrote:
>>>>> On Tue, Jul 22, 2025 at 12:41:26PM +0000, Jonah Palmer wrote:
>>>>>> Iterative live migration for virtio-net sends an initial
>>>>>> VMStateDescription while the source is still active. Because data
>>>>>> continues to flow for virtio-net, the guest's avail index continues to
>>>>>> increment after last_avail_idx had already been sent. This causes the
>>>>>> destination to often see something like this from virtio_error():
>>>>>>
>>>>>> VQ 0 size 0x100 Guest index 0x0 inconsistent with Host index 0xc: delta 0xfff4
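
(For context: the check that fires here lives in virtio_load() in
hw/virtio/virtio.c. A simplified sketch of the relevant logic, not the
verbatim code, with the error arguments elided:

    /* Check it isn't doing strange things with descriptor numbers.
     * nheads is unsigned, so an avail index that no longer matches the
     * migrated last_avail_idx wraps: 0x0 - 0xc == 0xfff4 above. */
    nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
    if (nheads > vdev->vq[i].vring.num) {
        virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
                     "inconsistent with Host index 0x%x: delta 0x%x",
                     ...);
        return -1;
    }
)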
>>>>>
>>>>> This is pretty much understandable, as vmstate_save() / vmstate_load() are,
>>>>> IMHO, not designed to be used while the VM is running.
>>>>>
>>>>> To me, it's still illegal (per the previous patch) to use vmstate_save_state()
>>>>> while the VM is running, in a save_setup() phase.
>>>>
>>>> Yea I understand where you're coming from. It just seemed too good to pass
>>>> up as a way to send and receive the entire state of a device.
>>>>
>>>> I felt that if I were to implement something similar for iterative migration
>>>> only, I'd more or less be duplicating a lot of already existing code or
>>>> vmstate logic.
>>>>
>>>>>
>>>>> Some very high level questions from migration POV:
>>>>>
>>>>> - Have we figured out why the downtime can be shrunk just by sending the
>>>>>      vmstate twice?
>>>>>
>>>>>      If we suspect it's the memory getting preheated, have we tried other
>>>>>      ways to simply heat the memory up on the dest side?  For example, some
>>>>>      form of mlock[all]()?  IMHO it's pretty important we figure out where
>>>>>      such an optimization really comes from.
>>>>>
>>>>>      I do remember we had a downtime issue where the number of max_vqueues
>>>>>      may cause post_load() to be slow; I wonder whether there are other ways
>>>>>      to improve it instead of vmstate_save(), especially in the setup phase.
>>>>>
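
(For reference, the mlockall() form of the suggestion above would be
roughly the following, run early on the destination; whether this actually
reproduces the preheating effect is exactly the open question:

    #include <sys/mman.h>

    /* Pin current and future mappings so first-touch page faults
     * don't land in the downtime window. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
    }
)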
>>>>
>>>> Yea I believe that the downtime shrinks on the second vmstate_load_state due
>>>> to preheated memory. But I'd like to stress that it's not my intention to
>>>> resend the entire vmstate again during the stop-and-copy phase if iterative
>>>> migration was used. A future iteration of this series will eventually
>>>> include a more efficient approach to update the destination with any deltas
>>>> since the vmstate was sent during the iterative portion (instead of just
>>>> resending the entire vmstate again).
>>>>
>>>> And yea there is an inefficiency regarding walking through VIRTIO_QUEUE_MAX
>>>> (1024) VQs (twice with PCI) that I mentioned here in another comment:
>>>> https://lore.kernel.org/qemu-devel/0f5b804d-3852-4159-b151-308a57f1ec74@oracle.com/
>>>>
>>>> This might be better handled in a separate series though rather than as part
>>>> of this one.
>>>
>>> One thing to mention is I recall some other developer was trying to
>>> optimize device load from memory side:
>>>
>>> https://lore.kernel.org/all/20230317081904.24389-1-xuchuangxclwt@bytedance.com/
>>>
>>> So maybe there's more than one way of doing this, and I'm not sure which
>>> way is better, or whether both are needed.
>>>
>>
>> Ack. I'll take a look at this.
>>
>>>>
>>>>> - Normally devices need iterative phase because:
>>>>>
>>>>>      (a) the device may contain huge amount of data to transfer
>>>>>
>>>>>          E.g. RAM and VFIO are good examples and fall into this category.
>>>>>
>>>>>      (b) the device states are "iterable" from concept
>>>>>
>>>>>          RAM is definitely true.  VFIO somehow mimicked that even though it was
>>>>>          a streamed binary protocol..
>>>>>
>>>>>      What's the answer for virtio-net here?  How large is the device state?
>>>>>      Is this relevant to vDPA and real hardware (so virtio-net can look
>>>>>      similar to VFIO at some point)?
>>>>
>>>>
>>>> The main motivation behind implementing iterative migration for virtio-net
>>>> is really to improve the guest-visible downtime seen when migrating a vDPA
>>>> device.
>>>>
>>>> That is, by implementing iterative migration for virtio-net, we can see the
>>>> state of the device early on and get a head start on work that's currently
>>>> being done during the stop-and-copy phase. If we do this work before the
>>>> stop-and-copy phase, we can further decrease the time spent in this window.
>>>>
>>>> This would include work such as sending down the CVQ commands for queue-pair
>>>> creation (even more beneficial for multiqueue), RSS, filters, etc.
>>>>
>>>> I'm hoping to show this more explicitly in the next version of this RFC
>>>> series that I'm working on now.
>>>
>>> OK, thanks for the context. I can wait and read the new version.
>>>
>>> In all cases, please note that since the migration thread does not take
>>> the BQL, either the setup or iterable phase may happen concurrently
>>> with any of the vCPU threads.  I think that means it may not be wise to
>>> try to iterate everything: please be ready to see, e.g., a 64-bit MMIO
>>> register being partially updated when dumping it to the wire.
>>>
>>
>> Gotcha. Some of the iterative hooks, though, like .save_setup, .load_state,
>> etc., do hold the BQL, right?
> 
> load_state() definitely needs the lock.
> 
> save_setup(): yes, we have the BQL, but I really wish we didn't depend on
> it, and I don't know whether that will keep holding true - AFAIU, the
> majority of it really doesn't need the lock, and I've always wanted to
> see whether I can remove it.
> 
> Normal iterations definitely run without the lock.
> 

Gotcha. Shouldn't be an issue for my implementation (for .save_setup 
anyway).
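
For what it's worth, the shape I have in mind is roughly the following - a
minimal sketch, assuming the current SaveVMHandlers layout (the member
names are real, but the exact signatures have changed across QEMU
versions, and the virtio_net_* names here are illustrative):

    static int virtio_net_save_setup(QEMUFile *f, void *opaque, Error **errp)
    {
        /* Called once at the start of migration; holds the BQL today,
         * but per the above I won't depend on that. */
        return 0;
    }

    static const SaveVMHandlers savevm_virtio_net_handlers = {
        .save_setup = virtio_net_save_setup,
        /* .save_live_iterate, .save_live_complete_precopy,
         * .load_state, ... */
    };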

>>
>>> Do you have a rough estimation of the size of the device states to migrate?
>>>
>>
>> Do you have a method for how I might estimate this? I've been trying to
>> get some kind of rough estimate but have failed to do so.
> 
> Could I ask why you started this "migrate virtio-net in iteration phase"
> effort?
> 
> I thought it was because there's a lot of data to migrate, and there
> should be a way to estimate the minimum.  So is that not the case?
> 
> How about vDPA devices?  Do those devices have a lot of data to migrate?
> 
> We really need a good enough reason to have a device provide
> save_iterate().  If it's only about "preheating some MMIO registers", we
> should, IMHO, look at more generic ways first.
> 

This effort was started to reduce the guest-visible downtime caused by 
virtio-net/vhost-net/vhost-vDPA during live migration, especially 
vhost-vDPA.

The downtime contributed by vhost-vDPA, for example, comes not from having 
to migrate a lot of state, but rather from expensive backend control-plane 
latency, such as CVQ configuration (e.g. MQ queue pairs, RSS, MAC/VLAN 
filters, offload settings, MTU, etc.). This configuration requires 
kernel/HW NIC operations, which dominate its downtime.
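
For concreteness, the control-plane traffic in question is virtio-net CVQ
commands; e.g. the multiqueue pair setup uses the structures below (as
defined by the virtio spec, cf. include/standard-headers/linux/virtio_net.h;
shown only to illustrate how small the data is relative to its latency
cost):

    struct virtio_net_ctrl_hdr {
        uint8_t class;             /* e.g. VIRTIO_NET_CTRL_MQ */
        uint8_t cmd;               /* e.g. VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */
    };

    struct virtio_net_ctrl_mq {
        uint16_t virtqueue_pairs;  /* little-endian per the spec */
    };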

In other words, by migrating the state of virtio-net early (before the 
stop-and-copy phase), we can also start staging backend configuration, 
which is the main contributor to downtime when migrating a vhost-vDPA 
device.

I apologize if this series gives the impression that we're migrating a 
lot of data here. It's more along the lines of moving control-plane 
latency out of the stop-and-copy phase.
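
(On the sizing question above: once the next version settles, one way to
report it would be through the state_pending_* hooks - a hedged sketch,
assuming the current hook signature; the virtio_net_* names and the
helper are hypothetical:

    static void virtio_net_state_pending_estimate(void *opaque,
                                                  uint64_t *must_precopy,
                                                  uint64_t *can_postcopy)
    {
        VirtIONet *n = opaque;

        /* Device state here is small (KBs, not GBs); the expensive part
         * is control-plane latency, not bytes on the wire. */
        *must_precopy += virtio_net_estimated_state_size(n); /* hypothetical */
    }
)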

> Thanks,
> 




Thread overview: 66+ messages
2025-07-22 12:41 [RFC 0/6] virtio-net: initial iterative live migration support Jonah Palmer
2025-07-22 12:41 ` [RFC 1/6] migration: Add virtio-iterative capability Jonah Palmer
2025-08-06 15:58   ` Peter Xu
2025-08-07 12:50     ` Jonah Palmer
2025-08-07 13:13       ` Peter Xu
2025-08-07 14:20         ` Jonah Palmer
2025-08-08 10:48   ` Markus Armbruster
2025-08-11 12:18     ` Jonah Palmer
2025-08-25 12:44       ` Markus Armbruster
2025-08-25 14:57         ` Jonah Palmer
2025-08-26  6:11           ` Markus Armbruster
2025-08-26 18:08             ` Jonah Palmer
2025-08-27  6:37               ` Markus Armbruster
2025-08-28 15:29                 ` Jonah Palmer
2025-08-29  9:24                   ` Markus Armbruster
2025-09-01 14:10                     ` Jonah Palmer
2025-07-22 12:41 ` [RFC 2/6] virtio-net: Reorder vmstate_virtio_net and helpers Jonah Palmer
2025-07-22 12:41 ` [RFC 3/6] virtio-net: Add SaveVMHandlers for iterative migration Jonah Palmer
2025-07-22 12:41 ` [RFC 4/6] virtio-net: iter live migration - migrate vmstate Jonah Palmer
2025-07-23  6:51   ` Michael S. Tsirkin
2025-07-24 14:45     ` Jonah Palmer
2025-07-25  9:31       ` Michael S. Tsirkin
2025-07-28 12:30         ` Jonah Palmer
2025-07-22 12:41 ` [RFC 5/6] virtio, virtio-net: skip consistency check in virtio_load for iterative migration Jonah Palmer via
2025-07-28 15:30   ` [RFC 5/6] virtio,virtio-net: " Eugenio Perez Martin
2025-07-28 16:23     ` Jonah Palmer
2025-07-30  8:59       ` Eugenio Perez Martin
2025-08-06 16:27   ` Peter Xu
2025-08-07 14:18     ` Jonah Palmer
2025-08-07 16:31       ` Peter Xu
2025-08-11 12:30         ` Jonah Palmer
2025-08-11 13:39           ` Peter Xu
2025-08-11 21:26             ` Jonah Palmer [this message]
2025-08-11 21:55               ` Peter Xu
2025-08-12 15:51                 ` Jonah Palmer
2025-08-13  9:25                 ` Eugenio Perez Martin
2025-08-13 14:06                   ` Peter Xu
2025-08-14  9:28                     ` Eugenio Perez Martin
2025-08-14 16:16                       ` Dragos Tatulea
2025-08-14 20:27                       ` Peter Xu
2025-08-15 14:50                       ` Jonah Palmer
2025-08-15 19:35                         ` Si-Wei Liu
2025-08-18  6:51                         ` Eugenio Perez Martin
2025-08-18 14:46                           ` Jonah Palmer
2025-08-18 16:21                             ` Peter Xu
2025-08-19  7:20                               ` Eugenio Perez Martin
2025-08-19  7:10                             ` Eugenio Perez Martin
2025-08-19 15:10                               ` Jonah Palmer
2025-08-20  7:59                                 ` Eugenio Perez Martin
2025-08-25 12:16                                   ` Jonah Palmer
2025-08-27 16:55                                   ` Jonah Palmer
2025-09-01  6:57                                     ` Eugenio Perez Martin
2025-09-01 13:17                                       ` Jonah Palmer
2025-09-02  7:31                                         ` Eugenio Perez Martin
2025-07-22 12:41 ` [RFC 6/6] virtio-net: skip vhost_started assertion during " Jonah Palmer
2025-07-23  5:51 ` [RFC 0/6] virtio-net: initial iterative live migration support Jason Wang
2025-07-24 21:59   ` Jonah Palmer
2025-07-25  9:18     ` Lei Yang
2025-07-25  9:33     ` Michael S. Tsirkin
2025-07-28  7:09       ` Jason Wang
2025-07-28  7:35         ` Jason Wang
2025-07-28 12:41           ` Jonah Palmer
2025-07-28 14:51           ` Eugenio Perez Martin
2025-07-28 15:38             ` Eugenio Perez Martin
2025-07-29  2:38             ` Jason Wang
2025-07-29 12:41               ` Jonah Palmer
