* WARN in xennet_disconnect_backend when frontend is paused during backend shutdown
From: Marek Marczykowski-Górecki @ 2025-09-11 15:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Jürgen Groß

Hi,

The steps:
1. Have domU netfront ("untrusted" here) and domU netback
("sys-firewall-alt" here).
2. Pause frontend
3. Shutdown backend
4. Unpause frontend
5. Detach network (in my case attaching another one follows just after,
but I believe it's not relevant).
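
The steps above can be written down as a command plan. This is only a sketch: it assumes the domains are driven with the `xl` toolstack from dom0 (a Qubes setup would more likely use its own tools), and the device id `0` is an assumption rather than something stated in the report. It prints the commands instead of running them, and it does not wait for the backend to finish shutting down before the next step.

```python
import shlex

# Domain names are taken from the report; the device id is an assumption.
FRONTEND = "untrusted"
BACKEND = "sys-firewall-alt"
VIF_DEVID = "0"  # assumed value, not given in the report


def reproduction_commands(frontend=FRONTEND, backend=BACKEND, devid=VIF_DEVID):
    """Return the xl command lines for steps 2-5, in order."""
    return [
        ["xl", "pause", frontend],                  # 2. pause frontend
        ["xl", "shutdown", backend],                # 3. shut down backend
        ["xl", "unpause", frontend],                # 4. unpause frontend
        ["xl", "network-detach", frontend, devid],  # 5. detach network
    ]


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for argv in reproduction_commands():
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to insert delays or extra checks between steps when trying to narrow down the timing.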

This gives the following on the frontend side:

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 141 at include/linux/mm.h:1328 xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
    Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_ct nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulnighash_clmulni_intel xen_netfront pcspkr xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn i2c_dev loop fuse nfnetlink overlay xen_blkfront
    CPU: 1 UID: 0 PID: 141 Comm: xenwatch Not tainted 6.17.0-0.rc5.1.qubes.1.fc41.x86_64 #1 PREEMPT(full)
    RIP: 0010:xennet_disconnect_backend+0x1be/0x590 [xen_netfront]
    Code: 00 0f 83 93 03 00 00 48 8b 94 dd 90 10 00 00 48 8b 4a 08 f6 c1 01 75 79 66 90 0f b6 4a 33 81 f9 f5 00 00 00 0f 85 f3 fe ff ff <0f> 0b 49 81 ff 00 01 00 00 0f 82 01 ff ff ff 4c 89 fe 48 c7 c7 e0
    RSP: 0018:ffffc90001123cf8 EFLAGS: 00010246
    RAX: 0000000000000010 RBX: 0000000000000001 RCX: 00000000000000f5
    RDX: ffffea0000a05200 RSI: 0000000000000001 RDI: ffffffff82528d60
    RBP: ffff888041400000 R08: ffff888005054c80 R09: ffff888005054c80
    R10: 0000000000150013 R11: ffff88801851cd80 R12: 0000000000000000
    R13: ffff888053619000 R14: ffff888005d61a80 R15: 0000000000000001
    FS:  0000000000000000(0000) GS:ffff8880952c6000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00006182a11f3328 CR3: 000000001084c006 CR4: 0000000000770ef0
    PKRU: 55555554
    Call Trace:
     <TASK>
     xennet_remove+0x1e/0x80 [xen_netfront]
     xenbus_dev_remove+0x6e/0xf0
     device_release_driver_internal+0x19c/0x200
     bus_remove_device+0xc6/0x130
     device_del+0x160/0x3e0
     ? _raw_spin_unlock+0xe/0x30
     ? klist_iter_exit+0x18/0x30
     ? __pfx_xenwatch_thread+0x10/0x10
     device_unregister+0x17/0x60
     xenbus_dev_changed+0x1d7/0x240
     xenwatch_thread+0x8f/0x1c0
     ? __pfx_autoremove_wake_function+0x10/0x10
     kthread+0xf9/0x240
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x152/0x180
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    ---[ end trace 0000000000000000 ]---
    xen_netfront: backend supports XDP headroom
    vif vif-0: bouncing transmitted data to zeroed pages

The last two lines are likely related to the subsequent attach, not the detach.

The same happens on 6.15 too, so it isn't a new thing.

Shutting down the backend without detaching first is not really a normal
operation, and doing that while the frontend is paused is even less so. But
is the above the expected outcome? If I read it right, it's
WARN_ON_ONCE(folio_test_slab(folio)) in get_page(), which I find
confusing.
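
To make that check concrete, here is a small user-space model of it. It is a sketch and not kernel code: `ModelPage` and `model_get_page` are hypothetical names, the `is_slab` flag stands in for `folio_test_slab()`, and the model assumes that the reference is simply not taken when the check fires.

```python
import sys


class ModelPage:
    """Hypothetical stand-in for a kernel page; not the real structure."""

    def __init__(self, is_slab):
        self.is_slab = is_slab  # models folio_test_slab() returning true
        self.refcount = 1


_warned = False  # models the "once" part of WARN_ON_ONCE


def model_get_page(page):
    """Take a reference unless the page is slab-backed; warn once if it is."""
    global _warned
    if page.is_slab:
        if not _warned:
            _warned = True
            print("WARNING: get_page() on a slab page", file=sys.stderr)
        return False  # assumption: the reference is not taken in this case
    page.refcount += 1
    return True


if __name__ == "__main__":
    normal, slab = ModelPage(False), ModelPage(True)
    model_get_page(normal)
    model_get_page(slab)  # warns
    model_get_page(slab)  # silent, the warning fires only once
    print(normal.refcount, slab.refcount)
```

In this model the warning says nothing about who allocated the page; it only reports that a caller tried to take a page reference on slab-backed memory.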

Originally reported at https://github.com/QubesOS/qubes-core-agent-linux/pull/603#issuecomment-3280953080

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: WARN in xennet_disconnect_backend when frontend is paused during backend shutdown
From: Jürgen Groß @ 2025-09-12  9:49 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel

On 11.09.25 17:11, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> The steps:
> 1. Have domU netfront ("untrusted" here) and domU netback
> ("sys-firewall-alt" here).
> 2. Pause frontend
> 3. Shutdown backend
> 4. Unpause frontend
> 5. Detach network (in my case attaching another one follows just after,
> but I believe it's not relevant).
> 
> This gives the following on the frontend side:
> 
>      [... WARNING trace snipped ...]
> 
> The last two lines are likely related to the subsequent attach, not the detach.
> 
> The same happens on 6.15 too, so it isn't a new thing.
> 
> Shutting down the backend without detaching first is not really a normal
> operation, and doing that while the frontend is paused is even less so. But
> is the above the expected outcome? If I read it right, it's
> WARN_ON_ONCE(folio_test_slab(folio)) in get_page(), which I find
> confusing.
> 
> Originally reported at https://github.com/QubesOS/qubes-core-agent-linux/pull/603#issuecomment-3280953080
> 

Hmm, with this scenario I imagine you could manage to have
xennet_disconnect_backend() running multiple times for the same device
concurrently.

How reliably can this be reproduced? How many vcpus does the guest have?

Maybe the fix is as simple as adding a lock in xennet_disconnect_backend().
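
As a rough illustration of that idea, the following user-space sketch serializes a teardown routine with a lock. It is not a patch: `ModelDeviceState` and its fields are hypothetical, and the `connected` flag is an added assumption, since a lock alone would order two callers but would not stop the second one from repeating the teardown.

```python
import threading


class ModelDeviceState:
    """Hypothetical device state; the real driver's fields differ."""

    def __init__(self):
        self.lock = threading.Lock()
        self.connected = True
        self.teardown_runs = 0

    def disconnect(self):
        # Only one caller at a time may inspect and change the state.
        with self.lock:
            if not self.connected:
                return False  # another caller already tore the device down
            self.connected = False
            self.teardown_runs += 1  # stands in for releasing buffers and grants
            return True


def run_concurrent_disconnects(n_threads=8):
    """Call disconnect() from several threads and report what happened."""
    state = ModelDeviceState()
    results = []
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # line the threads up so the calls overlap
        results.append(state.disconnect())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.teardown_runs, results.count(True)


if __name__ == "__main__":
    print(run_concurrent_disconnects())
```

However many callers race, the teardown body runs once in this model. Whether concurrent calls are actually what happens in the reported scenario is not established in this thread.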


Juergen


* Re: WARN in xennet_disconnect_backend when frontend is paused during backend shutdown
From: Marek Marczykowski-Górecki @ 2025-09-12 10:48 UTC (permalink / raw)
  To: Jürgen Groß; +Cc: xen-devel

On Fri, Sep 12, 2025 at 11:49:12AM +0200, Jürgen Groß wrote:
> On 11.09.25 17:11, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > The steps:
> > 1. Have domU netfront ("untrusted" here) and domU netback
> > ("sys-firewall-alt" here).
> > 2. Pause frontend
> > 3. Shutdown backend
> > 4. Unpause frontend
> > 5. Detach network (in my case attaching another one follows just after,
> > but I believe it's not relevant).
> > 
> > This gives the following on the frontend side:
> > 
> >      [... WARNING trace snipped ...]
> > 
> > The last two lines are likely related to the subsequent attach, not the detach.
> > 
> > The same happens on 6.15 too, so it isn't a new thing.
> > 
> > Shutting down the backend without detaching first is not really a normal
> > operation, and doing that while the frontend is paused is even less so. But
> > is the above the expected outcome? If I read it right, it's
> > WARN_ON_ONCE(folio_test_slab(folio)) in get_page(), which I find
> > confusing.
> > 
> > Originally reported at https://github.com/QubesOS/qubes-core-agent-linux/pull/603#issuecomment-3280953080
> > 
> 
> Hmm, with this scenario I imagine you could manage to have
> xennet_disconnect_backend() running multiple times for the same device
> concurrently.
> 
> How reliably can this be reproduced? How many vcpus does the guest have?

Quite reliably (always?). And there are 2 vcpus.
Interestingly, it doesn't happen on 6.12.42, but does on 6.15.10 and
later.

> Maybe the fix is as simple as adding a lock in xennet_disconnect_backend().

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
