Re: [PATCH v4 0/7] Move memory listener register to vhost_vdpa_init

From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Jonah Palmer <jonah.palmer@oracle.com>,
	 qemu-devel@nongnu.org, eperezma@redhat.com, peterx@redhat.com,
	mst@redhat.com, lvivier@redhat.com, dtatulea@nvidia.com,
	leiyang@redhat.com, parav@mellanox.com, sgarzare@redhat.com,
	lingshan.zhu@intel.com, boris.ostrovsky@oracle.com
Subject: Re: [PATCH v4 0/7] Move memory listener register to vhost_vdpa_init
Date: Thu, 29 May 2025 00:57:30 -0700
Message-ID: <dcbf9e2e-9442-4439-8593-dff036a4d781@oracle.com>
In-Reply-To: <87frgr7mvk.fsf@pond.sub.org>



On 5/26/2025 2:16 AM, Markus Armbruster wrote:
> Si-Wei Liu <si-wei.liu@oracle.com> writes:
>
>> On 5/15/2025 11:40 PM, Markus Armbruster wrote:
>>> Jason Wang <jasowang@redhat.com> writes:
>>>
>>>> On Thu, May 8, 2025 at 2:47 AM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>> Current memory operations like pinning may take a lot of time at the
>>>>> destination.  Currently they are done after the source of the migration is
>>>>> stopped, and before the workload is resumed at the destination.  This is a
>>>>> period where neither traffic can flow, nor the VM workload can continue
>>>>> (downtime).
>>>>>
>>>>> We can do better as we know the memory layout of the guest RAM at the
>>>>> destination from the moment that all devices are initialized.  So
>>>>> moving that operation allows QEMU to communicate the maps to the kernel
>>>>> while the workload is still running in the source, so Linux can start
>>>>> mapping them.
>>>>>
>>>>> As a small drawback, there is a time in the initialization where QEMU
>>>>> cannot respond to QMP etc.  By some testing, this time is about
>>>>> 0.2 seconds.
>>>> Adding Markus to see if this is a real problem or not.
>>> I guess the answer is "depends", and to get a more useful one, we need
>>> more information.
>>>
>>> When all you care is time from executing qemu-system-FOO to guest
>>> finish booting, and the guest takes 10s to boot, then an extra 0.2s
>>> won't matter much.
>> There's no extra delay of 0.2s or more per se. The patch just shifts the page pinning hiccup, whether it is 0.2s or something else, from guest boot time to before the guest is booted. This saves guest boot time or startup delay, but the same delay is in turn effectively charged to VM launch time. We follow the same model as VFIO, which sees the same hiccup during launch (at an early stage that no real mgmt software would care about).
>>
>>> When a management application runs qemu-system-FOO several times to
>>> probe its capabilities via QMP, then even milliseconds can hurt.
>>>
>> It's not like that. This page pinning hiccup is a one-time event that occurs at a very early stage of launching QEMU, i.e. there's no consistent delay every time QMP is called. The delay in the QMP response at that point depends on how much memory the VM has, but this is specific to VMs with VFIO or vDPA devices that have to pin memory for DMA. That said, there's no extra delay at all if the QEMU args have no vDPA device assignment. On the other hand, there's the same delay or QMP hiccup when VFIO is present in the QEMU args.
>>
>>> In what scenarios exactly is QMP delayed?
>> That said, this is not a new problem for QEMU in particular. This QMP delay is not peculiar: it exists with VFIO as well.
> In what scenarios exactly is QMP delayed compared to before the patch?
The page pinning process now runs in a pretty early phase of qemu_init(),
e.g. machine_run_board_init(), before any QMP command can be serviced.
QMP commands typically only get to run from qemu_main_loop(), once the
AIO context gets a chance to be polled and requests are dispatched to a
bh. Technically it's not a real delay for any specific QMP command, but
rather a longer initialization phase that may take place before the very
first QMP request, usually qmp_capabilities, is serviced. It's natural
for mgmt software to expect some initialization delay for the first
qmp_capabilities response if it issues one immediately after launching
QEMU, especially for a large guest with hundreds of GBs of memory and a
passthrough device that has to pin memory for DMA. With VFIO, for
example, the delay from the QEMU initialization process is very visible
too. On the other hand, before the patch, if memory happens to be in the
middle of being pinned, any ongoing QMP command can't be serviced by the
QEMU main loop either.
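
To put a number on the wait described above, a client could time the gap
between connecting to the QMP socket and receiving the reply to
qmp_capabilities. What follows is only a minimal Python sketch, not part
of this series. The function names are made up for illustration, and it
assumes QEMU exposes a QMP server on a Unix socket that the caller can
already connect to.

```python
import json
import socket
import time


def time_first_qmp_response(sock):
    """Time the QMP greeting plus the first qmp_capabilities round trip.

    `sock` is an already-connected stream socket to a QMP server.
    Returns (greeting, reply, elapsed_seconds).
    """
    reader = sock.makefile("r", encoding="utf-8")
    start = time.monotonic()

    # The server sends a greeting object with a "QMP" key after connect.
    line = reader.readline()
    if not line:
        raise ConnectionError("QMP server closed the connection")
    greeting = json.loads(line)

    # Leave capabilities negotiation mode; usually the first command sent.
    request = json.dumps({"execute": "qmp_capabilities"}) + "\r\n"
    sock.sendall(request.encode("utf-8"))

    # Skip anything that is not a command response (e.g. async events).
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP server closed the connection")
        reply = json.loads(line)
        if "return" in reply or "error" in reply:
            break

    elapsed = time.monotonic() - start
    return greeting, reply, elapsed


def time_qmp_on_unix_socket(path):
    """Hypothetical helper: connect to a QMP Unix socket, time the handshake."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        _, _, elapsed = time_first_qmp_response(sock)
    return elapsed
```

For instance, with QEMU started with something like
-qmp unix:/tmp/qmp.sock,server=on,wait=off (the path is an assumption),
calling time_qmp_on_unix_socket("/tmp/qmp.sock") right after launch,
once with and once without the series applied, should give a rough view
of how long the first response is held back on a given guest size. The
caller would still have to retry the connect until the socket exists.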

I'd also like to highlight that without this patch, the pretty high
delay due to page pinning is visible not only as a QMP delay but also to
the guest, and it has largely affected guest boot time with vDPA devices
already. This is a long-standing problem, and every VM user with a vDPA
device would like to avoid such a high delay on first boot, which is not
seen with similar devices, e.g. VFIO passthrough.

>
>> Thanks,
>> -Siwei
>>
>>> You told us an absolute delay you observed.  What's the relative delay,
>>> i.e. what's the delay with and without these patches?
> Can you answer this question?
I thought I had already answered that in an earlier reply. The relative
delay depends on the size of memory. Usually mgmt software won't be
able to notice it, unless the guest has more than 100GB of THP memory to
pin, for DMA or whatever reason.


>
>>> We need QMP to become available earlier in the startup sequence for
>>> other reasons.  Could we bypass the delay that way?  Please understand
>>> that this would likely be quite difficult: we know from experience that
>>> messing with the startup sequence is prone to introduce subtle
>>> compatibility breaks and even bugs.
>>>
>>>> (I remember VFIO has some optimization in the speed of the pinning,
>>>> could vDPA do the same?)
>>> That's well outside my bailiwick :)

Please understand that any possible optimization is out of scope for
this patch series, while there are certainly ways around that already,
to be carried out in the future, as Peter alluded to in an earlier
discussion thread:

https://lore.kernel.org/qemu-devel/ZZT7wuq-_IhfN_wR@x1n/
https://lore.kernel.org/qemu-devel/ZZZUNsOVxxqr-H5S@x1n/

Thanks,
-Siwei

>>>
>>> [...]
>>>
