qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
@ 2025-03-26 16:22 Marco Cavenati
  2025-03-26 16:48 ` Daniel P. Berrangé
  2025-03-29  5:41 ` Michael Tokarev
  0 siblings, 2 replies; 12+ messages in thread
From: Marco Cavenati @ 2025-03-26 16:22 UTC (permalink / raw)
  To: Peter Xu, Fabiano Rosas; +Cc: qemu-devel, Marco Cavenati, qemu-trivial

The SEEK_CUR case in qio_channel_block_seek was incorrectly using the
'whence' parameter instead of the 'offset' parameter when calculating the
new position.

Fixes: 65cf200a51ddc6d0a28ecceac30dc892233cddd7 ("migration: introduce a QIOChannel impl for BlockDriverState VMState")

Signed-off-by: Marco Cavenati <Marco.Cavenati@eurecom.fr>
---
 migration/channel-block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/channel-block.c b/migration/channel-block.c
index fff8d87094..b0477f5b6d 100644
--- a/migration/channel-block.c
+++ b/migration/channel-block.c
@@ -123,7 +123,7 @@ qio_channel_block_seek(QIOChannel *ioc,
         bioc->offset = offset;
         break;
     case SEEK_CUR:
-        bioc->offset += whence;
+        bioc->offset += offset;
         break;
     case SEEK_END:
         error_setg(errp, "Size of VMstate region is unknown");
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
  2025-03-26 16:22 [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek Marco Cavenati
@ 2025-03-26 16:48 ` Daniel P. Berrangé
  2025-03-29  5:41 ` Michael Tokarev
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel P. Berrangé @ 2025-03-26 16:48 UTC (permalink / raw)
  To: Marco Cavenati; +Cc: Peter Xu, Fabiano Rosas, qemu-devel, qemu-trivial

On Wed, Mar 26, 2025 at 05:22:30PM +0100, Marco Cavenati wrote:
> The SEEK_CUR case in qio_channel_block_seek was incorrectly using the
> 'whence' parameter instead of the 'offset' parameter when calculating the
> new position.
> 
> Fixes: 65cf200a51ddc6d0a28ecceac30dc892233cddd7 ("migration: introduce a QIOChannel impl for BlockDriverState VMState")
> 
> Signed-off-by: Marco Cavenati <Marco.Cavenati@eurecom.fr>
> ---
>  migration/channel-block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
  2025-03-26 16:22 [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek Marco Cavenati
  2025-03-26 16:48 ` Daniel P. Berrangé
@ 2025-03-29  5:41 ` Michael Tokarev
  2025-03-31 12:26   ` Fabiano Rosas
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2025-03-29  5:41 UTC (permalink / raw)
  To: Marco Cavenati, Peter Xu, Fabiano Rosas; +Cc: qemu-devel, qemu-trivial

26.03.2025 19:22, Marco Cavenati wrote:
> The SEEK_CUR case in qio_channel_block_seek was incorrectly using the
> 'whence' parameter instead of the 'offset' parameter when calculating the
> new position.
> 
> Fixes: 65cf200a51ddc6d0a28ecceac30dc892233cddd7 ("migration: introduce a QIOChannel impl for BlockDriverState VMState")
> 
> Signed-off-by: Marco Cavenati <Marco.Cavenati@eurecom.fr>
> ---
>   migration/channel-block.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/channel-block.c b/migration/channel-block.c
> index fff8d87094..b0477f5b6d 100644
> --- a/migration/channel-block.c
> +++ b/migration/channel-block.c
> @@ -123,7 +123,7 @@ qio_channel_block_seek(QIOChannel *ioc,
>           bioc->offset = offset;
>           break;
>       case SEEK_CUR:
> -        bioc->offset += whence;
> +        bioc->offset += offset;
>           break;
>       case SEEK_END:
>           error_setg(errp, "Size of VMstate region is unknown");

Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>

This is a (trivial) bugfix, I'd say it should be in 10.0.
Will you guys send a pullreq for the block layer, or should
I make a single-patch pullreq from the trivial tree?

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
  2025-03-29  5:41 ` Michael Tokarev
@ 2025-03-31 12:26   ` Fabiano Rosas
  2025-08-26 20:32     ` Michael Tokarev
  0 siblings, 1 reply; 12+ messages in thread
From: Fabiano Rosas @ 2025-03-31 12:26 UTC (permalink / raw)
  To: Michael Tokarev, Marco Cavenati, Peter Xu; +Cc: qemu-devel, qemu-trivial

Michael Tokarev <mjt@tls.msk.ru> writes:

> 26.03.2025 19:22, Marco Cavenati wrote:
>> The SEEK_CUR case in qio_channel_block_seek was incorrectly using the
>> 'whence' parameter instead of the 'offset' parameter when calculating the
>> new position.
>> 
>> Fixes: 65cf200a51ddc6d0a28ecceac30dc892233cddd7 ("migration: introduce a QIOChannel impl for BlockDriverState VMState")
>> 
>> Signed-off-by: Marco Cavenati <Marco.Cavenati@eurecom.fr>
>> ---
>>   migration/channel-block.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/migration/channel-block.c b/migration/channel-block.c
>> index fff8d87094..b0477f5b6d 100644
>> --- a/migration/channel-block.c
>> +++ b/migration/channel-block.c
>> @@ -123,7 +123,7 @@ qio_channel_block_seek(QIOChannel *ioc,
>>           bioc->offset = offset;
>>           break;
>>       case SEEK_CUR:
>> -        bioc->offset += whence;
>> +        bioc->offset += offset;
>>           break;
>>       case SEEK_END:
>>           error_setg(errp, "Size of VMstate region is unknown");
>
> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
>
> This is a (trivial) bugfix, I'd say it should be in 10.0.
> Will you guys send a pullreq for the block layer, or should
> I make a single-patch pullreq from the trivial tree?

I'll take it. It's not entirely trivial as it shifts a value by 1 in
mapped-ram migration. Fortunately, it's a value that doesn't need to be
the same between migration source and destination.

Thanks

>
> Thanks,
>
> /mjt


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
  2025-03-31 12:26   ` Fabiano Rosas
@ 2025-08-26 20:32     ` Michael Tokarev
  2025-08-26 21:27       ` Peter Xu
  2025-08-26 21:52       ` Fabiano Rosas
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Tokarev @ 2025-08-26 20:32 UTC (permalink / raw)
  To: farosas; +Cc: Marco.Cavenati, mjt, peterx, qemu-devel, qemu-trivial

Hi!

This is

commit c0b32426ce56182c1ce2a12904f3a702c2ecc460
Author: Marco Cavenati <Marco.Cavenati@eurecom.fr>
Date:   Wed Mar 26 17:22:30 2025 +0100

     migration: fix SEEK_CUR offset calculation in qio_channel_block_seek

which went to 10.0.0-rc2, and has been cherry-picked to
7.2 and 9.2 stable series.

Reportedly it breaks migration in 7.2.18 and up.  Which is
kinda strange, as it shouldn't do any harm?

https://bugs.debian.org/1112044

any guess what's going on?

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
  2025-08-26 20:32     ` Michael Tokarev
@ 2025-08-26 21:27       ` Peter Xu
  2025-08-26 21:52       ` Fabiano Rosas
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Xu @ 2025-08-26 21:27 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: farosas, Marco.Cavenati, qemu-devel, qemu-trivial

On Tue, Aug 26, 2025 at 11:32:20PM +0300, Michael Tokarev wrote:
> Hi!
> 
> This is
> 
> commit c0b32426ce56182c1ce2a12904f3a702c2ecc460
> Author: Marco Cavenati <Marco.Cavenati@eurecom.fr>
> Date:   Wed Mar 26 17:22:30 2025 +0100
> 
>     migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
> 
> which went to 10.0.0-rc2, and has been cherry-picked to
> 7.2 and 9.2 stable series.
> 
> Reportedly it breaks migration in 7.2.18 and up.  Which is
> kinda strange, as it shouldn't do any harm?
> 
> https://bugs.debian.org/1112044
> 
> any guess what's going on?

The only thing I can think of is, when it used to save to some file /
snapshot, then if the old image was stored with some wrong offsets (due to
wrong seek()s) then a new QEMU with correct offsets will instead read wrong
data even if they started to do the right things..

The reporter says:

        This occurs during live-migrating a guest onto a host with u15,
        migrating it back fixes the softlocks. A reset is required to fix
        it but is only required when the receiving host is on the latest
        version.

So it's a host-to-host live migration.  Is that using TCP as URI?  The
problem is I don't even think TCP layer should use io_seek at all.

qio_channel_io_seek() is only used in below (except VFIO when used with
multifd, that doesn't look like what the reporter was using..):

*** migration/file.c:
file_start_outgoing_migration[121] if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
file_start_incoming_migration[190] qio_channel_io_seek(QIO_CHANNEL(fioc), offset, SEEK_SET, errp) < 0) {

*** migration/qemu-file.c:
qemu_set_offset[611]           ret = qio_channel_io_seek(f->ioc, off, whence, &err);
qemu_get_offset[624]           ret = qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err);

All these references are about file migrations, not generic live
migrations..

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
  2025-08-26 20:32     ` Michael Tokarev
  2025-08-26 21:27       ` Peter Xu
@ 2025-08-26 21:52       ` Fabiano Rosas
  2025-08-27  6:59         ` [PATCH] virtio: Call set_features during reset (was: migration: fix SEEK_CUR offset calculation in qio_channel_block_seek) Michael Tokarev
  2025-08-28  0:57         ` [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek Akihiko Odaki
  1 sibling, 2 replies; 12+ messages in thread
From: Fabiano Rosas @ 2025-08-26 21:52 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Marco.Cavenati, mjt, peterx, qemu-devel, qemu-trivial,
	Akihiko Odaki

Michael Tokarev <mjt@tls.msk.ru> writes:

+CC Akihiko

> Hi!
>
> This is
>
> commit c0b32426ce56182c1ce2a12904f3a702c2ecc460
> Author: Marco Cavenati <Marco.Cavenati@eurecom.fr>
> Date:   Wed Mar 26 17:22:30 2025 +0100
>
>      migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
>
> which went to 10.0.0-rc2, and has been cherry-picked to
> 7.2 and 9.2 stable series.
>
> Reportedly it breaks migration in 7.2.18 and up.  Which is
> kinda strange, as it shouldn't do any harm?
>

Yeah, this is not it. Unless you're using colo or mapped-ram.

> https://bugs.debian.org/1112044
>
> any guess what's going on?
>

The virtio changes are probably the issue. One of them touches
mhdr.num_buffers, under mergeable_rx_bufs, which is migrated state. The
flag in turn depends on VIRTIO_NET_F_MRG_RXBUF, which is set on the
cmdline with -device virtio-net-pci,mrg_rxbuf= but also reset by
virtio_set_features_nocheck, if I'm reading this right.

Let's ask Akihiko.

> Thanks,
>
> /mjt


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] virtio: Call set_features during reset (was: migration: fix SEEK_CUR offset calculation in qio_channel_block_seek)
  2025-08-26 21:52       ` Fabiano Rosas
@ 2025-08-27  6:59         ` Michael Tokarev
  2025-08-28  0:57         ` [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek Akihiko Odaki
  1 sibling, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2025-08-27  6:59 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Marco.Cavenati, peterx, qemu-devel, qemu-trivial, Akihiko Odaki

On 27.08.2025 00:52, Fabiano Rosas wrote:
> Michael Tokarev <mjt@tls.msk.ru> writes:
> 
> +CC Akihiko
> 
>> Hi!
>>
>> This is
>>
>> commit c0b32426ce56182c1ce2a12904f3a702c2ecc460
>> Author: Marco Cavenati <Marco.Cavenati@eurecom.fr>
>> Date:   Wed Mar 26 17:22:30 2025 +0100
>>
>>       migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
>>
>> which went to 10.0.0-rc2, and has been cherry-picked to
>> 7.2 and 9.2 stable series.
>>
>> Reportedly it breaks migration in 7.2.18 and up.  Which is
>> kinda strange, as it shouldn't do any harm?
>>
> 
> Yeah, this is not it. Unless you're using colo or mapped-ram.
> 
>> https://bugs.debian.org/1112044
>>
>> any guess what's going on?
>>
> 
> The virtio changes are probably the issue. One of them touches
> mhdr.num_buffers, under mergeable_rx_bufs, which is migrated state. The
> flag in turn depends on VIRTIO_NET_F_MRG_RXBUF, which is set on the
> cmdline with -device virtio-net-pci,mrg_rxbuf= but also reset by
> virtio_set_features_nocheck, if I'm reading this right.
> 
> Let's ask Akihiko.

The reporter just reported that they used the wrong commit id, --
actually, the issue is caused by ce1431615292dc735597db4062834b:

commit ce1431615292dc735597db4062834bfb271381bc
Author: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Date:   Mon Apr 21 21:17:20 2025 +0900

     virtio: Call set_features during reset

     virtio-net expects set_features() will be called when the feature set
     used by the guest changes to update the number of virtqueues but it is
     not called during reset, which will clear all features, leaving the
     queues added for VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS. Not only these
     extra queues are visible to the guest, they will cause segmentation
     fault during migration.

     Call set_features() during reset to remove those queues for virtio-net
     as we call set_status(). It will also prevent similar bugs for
     virtio-net and other devices in the future.

     Fixes: f9d6dbf0bf6e ("virtio-net: remove virtio queues if the guest 
doesn't support multiqueue")
     Buglink: https://issues.redhat.com/browse/RHEL-73842
     Cc: qemu-stable@nongnu.org
     Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
     Message-Id: <20250421-reset-v2-1-e4c1ead88ea1@daynix.com>
     Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
     (cherry picked from commit 0caed25cd171c611781589b5402161d27d57229c)
     Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

(so commit 0caed25cd171c611781589b5402161 in master).

So yes, this one looks much more real in this context.

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
  2025-08-26 21:52       ` Fabiano Rosas
  2025-08-27  6:59         ` [PATCH] virtio: Call set_features during reset (was: migration: fix SEEK_CUR offset calculation in qio_channel_block_seek) Michael Tokarev
@ 2025-08-28  0:57         ` Akihiko Odaki
  2025-08-29  8:34           ` [PATCH] virtio: Call set_features during reset Michael Tokarev
  1 sibling, 1 reply; 12+ messages in thread
From: Akihiko Odaki @ 2025-08-28  0:57 UTC (permalink / raw)
  To: Fabiano Rosas, Michael Tokarev
  Cc: Marco.Cavenati, peterx, qemu-devel, qemu-trivial

On 2025/08/27 6:52, Fabiano Rosas wrote:
> Michael Tokarev <mjt@tls.msk.ru> writes:
> 
> +CC Akihiko
> 
>> Hi!
>>
>> This is
>>
>> commit c0b32426ce56182c1ce2a12904f3a702c2ecc460
>> Author: Marco Cavenati <Marco.Cavenati@eurecom.fr>
>> Date:   Wed Mar 26 17:22:30 2025 +0100
>>
>>       migration: fix SEEK_CUR offset calculation in qio_channel_block_seek
>>
>> which went to 10.0.0-rc2, and has been cherry-picked to
>> 7.2 and 9.2 stable series.
>>
>> Reportedly it breaks migration in 7.2.18 and up.  Which is
>> kinda strange, as it shouldn't do any harm?
>>
> 
> Yeah, this is not it. Unless you're using colo or mapped-ram.
> 
>> https://bugs.debian.org/1112044
>>
>> any guess what's going on?
>>
> 
> The virtio changes are probably the issue. One of them touches
> mhdr.num_buffers, under mergeable_rx_bufs, which is migrated state. The
> flag in turn depends on VIRTIO_NET_F_MRG_RXBUF, which is set on the
> cmdline with -device virtio-net-pci,mrg_rxbuf= but also reset by
> virtio_set_features_nocheck, if I'm reading this right.

I don't think commit cefd67f25430 ("virtio-net: Fix num_buffers for 
version 1") is related to the issue. Commit ce1431615292 ("virtio: Call 
set_features during reset") is more likely.

virtio_set_features_nocheck() shouldn't reset VIRTIO_NET_F_MRG_RXBUF. It 
calls virtio_net_set_features(), which does not clear features. 
virtio_net_get_features() clears features, but it is called before 
migration.

The posted call trace indicates a lockup happens in the control path, 
but commit cefd67f25430 ("virtio-net: Fix num_buffers for version 1") 
changes the data path.

On the other hand, I can come up with a possible failure scenario with 
commit ce1431615292 ("virtio: Call set_features during reset"). Perhaps 
it changed the machine state before loading the migrated state, and 
caused a mismatch between them.

I need more information to understand the issue. A command line to 
reproduce the issue is especially helpful because options like 
mrg_rxbuf=, which you mentioned, tell enabled features, which is 
valuable information.

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] virtio: Call set_features during reset
  2025-08-28  0:57         ` [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek Akihiko Odaki
@ 2025-08-29  8:34           ` Michael Tokarev
  2025-08-29 14:40             ` Akihiko Odaki
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2025-08-29  8:34 UTC (permalink / raw)
  To: Akihiko Odaki, Fabiano Rosas; +Cc: peterx, qemu-devel

On 28.08.2025 03:57, Akihiko Odaki wrote:

> The posted call trace indicates a lockup happens in the control path, 
> but commit cefd67f25430 ("virtio-net: Fix num_buffers for version 1") 
> changes the data path.
> 
> On the other hand, I can come up with a possible failure scenario with 
> commit ce1431615292 ("virtio: Call set_features during reset"). Perhaps 
> it changed the machine state before loading the migrated state, and 
> caused a mismatch between them.

Yes, the problem commit is 0caed25cd171c6 "virtio: Call set_features
during reset", - the OP corrected himself in the next message (subject
line updated).

> I need more information to understand the issue. A command line to 
> reproduce the issue is especially helpful because options like 
> mrg_rxbuf=, which you mentioned, tell enabled features, which is 
> valuable information.

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1112044#69 for
the command line.  The guest is started by libvirtd.

Please note: this is stable-7.2 series, it is *not* master (I picked
up this commit to 7.2.x).  It'd be interesting to check if master is
affected too.  Unfortunately I never tried migration, and now I only
have my notebook, so can only migrate between two qemu instances on
the same box - which is probably okay too.

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] virtio: Call set_features during reset
  2025-08-29  8:34           ` [PATCH] virtio: Call set_features during reset Michael Tokarev
@ 2025-08-29 14:40             ` Akihiko Odaki
  2025-08-29 15:14               ` Michael Tokarev
  0 siblings, 1 reply; 12+ messages in thread
From: Akihiko Odaki @ 2025-08-29 14:40 UTC (permalink / raw)
  To: Michael Tokarev, Fabiano Rosas; +Cc: peterx, qemu-devel

On 2025/08/29 17:34, Michael Tokarev wrote:
> On 28.08.2025 03:57, Akihiko Odaki wrote:
> 
>> The posted call trace indicates a lockup happens in the control path, 
>> but commit cefd67f25430 ("virtio-net: Fix num_buffers for version 1") 
>> changes the data path.
>>
>> On the other hand, I can come up with a possible failure scenario with 
>> commit ce1431615292 ("virtio: Call set_features during reset"). 
>> Perhaps it changed the machine state before loading the migrated 
>> state, and caused a mismatch between them.
> 
> Yes, the problem commit is 0caed25cd171c6 "virtio: Call set_features
> during reset", - the OP corrected himself in the next message (subject
> line updated).
> 
>> I need more information to understand the issue. A command line to 
>> reproduce the issue is especially helpful because options like 
>> mrg_rxbuf=, which you mentioned, tell enabled features, which is 
>> valuable information.
> 
> See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1112044#69 for
> the command line.  The guest is started by libvirtd.

Thank you, now I think I understand the problem.

> 
> Please note: this is stable-7.2 series, it is *not* master (I picked
> up this commit to 7.2.x).  It'd be interesting to check if master is
> affected too.  Unfortunately I never tried migration, and now I only
> have my notebook, so can only migrate between two qemu instances on
> the same box - which is probably okay too.

I think you need to backport commit 9379ea9db3c0 ("virtio-net: Add 
queues before loading them") and adda0ad56bd2 ("virtio-net: Add queues 
for RSS during migration"). Here is an explanation:

First, let me define two variables for conciseness:
N: the number of queue pairs
M: the maximum number of queue pairs, which is determined with
    n->max_queue_pairs

The problem is that QEMU inconsistently chose N for virtio-net in the 
past. Before commit 8c49756825da ("virtio-net: Add only one queue pair 
when realizing"):
1) realize() chose M.
2) set_features() chose: 1 (when RSS and MQ are disabled)
                          M (otherwise)

This itself was a problem; both RSS and MQ were disabled when realize() 
but N was M, which is inconsistent with 2) and this inconsistency was 
guest-visible.

I wrote commit 8c49756825da ("virtio-net: Add only one queue pair when 
realizing") to make QEMU implement the behavior in 2) also during 
realization and fix the inconsistency, but it broke migration when the 
migrated VM had enabled VIRTIO_NET_F_RSS and VIRTIO_NET_F_MQ because it 
expected that N == M.

This is also why the backported commit also broke migration; it 
accidentally fixed the inconsistency between the first reset state and 
the state after set_features() and caused the same problem.

I wrote commit 9379ea9db3c0 ("virtio-net: Add queues before loading 
them") to fix the issue and later complemented it with commit 
adda0ad56bd2 ("virtio-net: Add queues for RSS during migration").

There are several relevant commits because I could not fix the 
underlying problem at once, but hopefully this email clarifies how the 
two commits fixed it in the end.

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] virtio: Call set_features during reset
  2025-08-29 14:40             ` Akihiko Odaki
@ 2025-08-29 15:14               ` Michael Tokarev
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2025-08-29 15:14 UTC (permalink / raw)
  To: Akihiko Odaki; +Cc: qemu-devel

On 29.08.2025 17:40, Akihiko Odaki wrote:
> On 2025/08/29 17:34, Michael Tokarev wrote:
>> On 28.08.2025 03:57, Akihiko Odaki wrote:
...
>> Please note: this is stable-7.2 series, it is *not* master (I picked
>> up this commit to 7.2.x).  It'd be interesting to check if master is
>> affected too.  Unfortunately I never tried migration, and now I only
>> have my notebook, so can only migrate between two qemu instances on
>> the same box - which is probably okay too.
> 
> I think you need to backport commit 9379ea9db3c0 ("virtio-net: Add 
> queues before loading them") and adda0ad56bd2 ("virtio-net: Add queues 
> for RSS during migration"). Here is an explanation:

"Add queues before loading them" is marked as fixing

Both the mentioned commits were tagged with Cc: qemu-stable, but both
has Fixes: the same commit 8c49756825da ("virtio-net: Add only one queue
pair when realizing"), which is in 9.1, but not in 7.2.  I guess, in
this case, I also need 8c49756825da - let's first break things, and
next fix them :)

> First, let me define two variables for conciseness:
> N: the number of queue pairs
> M: the maximum number of queue pairs, which is determined with
>     n->max_queue_pairs
> 
> The problem is that QEMU inconsistently chose N for virtio-net in the 
> past. Before commit 8c49756825da ("virtio-net: Add only one queue pair 
> when realizing"):
> 1) realize() chose M.
> 2) set_features() chose: 1 (when RSS and MQ are disabled)
>                           M (otherwise)
> 
> This itself was a problem; both RSS and MQ were disabled when realize() 
> but N was M, which is inconsistent with 2) and this inconsistency was 
> guest-visible.
> 
> I wrote commit 8c49756825da ("virtio-net: Add only one queue pair when 
> realizing") to make QEMU implement the behavior in 2) also during 
> realization and fix the inconsistency, but it broke migration when the 
> migrated VM had enabled VIRTIO_NET_F_RSS and VIRTIO_NET_F_MQ because it 
> expected that N == M.

Yup, 8c49756825da it is, which is missing in 7.2, which broke things :))

> This is also why the backported commit also broke migration; it 
> accidentally fixed the inconsistency between the first reset state and 
> the state after set_features() and caused the same problem.
> 
> I wrote commit 9379ea9db3c0 ("virtio-net: Add queues before loading 
> them") to fix the issue and later complemented it with commit 
> adda0ad56bd2 ("virtio-net: Add queues for RSS during migration").
> 
> There are several relevant commits because I could not fix the 
> underlying problem at once, but hopefully this email clarifies how the 
> two commits fixed it in the end.

Now, with 3 commits picked up to 7.2, things are at least compiles :))

Let's test the result..  hopefully I didn't break something else in the
process of breaking+fixing stuff :))

Thank you very much for the detailed explanation and attention to
details.  It makes sense.

An for the 7.2.x series which is getting old(ish) - I plan to end
support for it in the near future, so there will be less surprises
like this one.. if not only for 10.0.x :)

/mjt


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-08-30 15:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-26 16:22 [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek Marco Cavenati
2025-03-26 16:48 ` Daniel P. Berrangé
2025-03-29  5:41 ` Michael Tokarev
2025-03-31 12:26   ` Fabiano Rosas
2025-08-26 20:32     ` Michael Tokarev
2025-08-26 21:27       ` Peter Xu
2025-08-26 21:52       ` Fabiano Rosas
2025-08-27  6:59         ` [PATCH] virtio: Call set_features during reset (was: migration: fix SEEK_CUR offset calculation in qio_channel_block_seek) Michael Tokarev
2025-08-28  0:57         ` [PATCH] migration: fix SEEK_CUR offset calculation in qio_channel_block_seek Akihiko Odaki
2025-08-29  8:34           ` [PATCH] virtio: Call set_features during reset Michael Tokarev
2025-08-29 14:40             ` Akihiko Odaki
2025-08-29 15:14               ` Michael Tokarev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).