qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net/hub: make net_hub_port_cleanup idempotent
@ 2025-08-21 14:26 Jonah Palmer
  2025-08-21 15:13 ` David Woodhouse
  2025-09-03  5:42 ` Jason Wang
  0 siblings, 2 replies; 5+ messages in thread
From: Jonah Palmer @ 2025-08-21 14:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: jasowang, eperezma, leiyang, dwmw2, jonah.palmer

Makes the net_hub_port_cleanup function idempotent to avoid double
removals by guarding its QLIST_REMOVE with a flag.

When using a Xen networking device with hubport backends, e.g.:

-accel kvm,xen-version=0x40011
-netdev hubport,...
-device xen-net-device,...

the shutdown order starts with net_cleanup, which walks the list and
deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
called, which eventually leads to a second net_hub_port_cleanup call,
resulting in a segfault.

Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 net/hub.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/hub.c b/net/hub.c
index e3b58b1c4f8e..ee5881f6d5cb 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -34,6 +34,7 @@ typedef struct NetHubPort {
     QLIST_ENTRY(NetHubPort) next;
     NetHub *hub;
     int id;
+    bool listed;
 } NetHubPort;
 
 struct NetHub {
@@ -129,7 +130,10 @@ static void net_hub_port_cleanup(NetClientState *nc)
 {
     NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
 
-    QLIST_REMOVE(port, next);
+    if (port->listed) {
+        QLIST_REMOVE(port, next);
+        port->listed = false;
+    }
 }
 
 static NetClientInfo net_hub_port_info = {
@@ -159,8 +163,10 @@ static NetHubPort *net_hub_port_new(NetHub *hub, const char *name,
     port = DO_UPCAST(NetHubPort, nc, nc);
     port->id = id;
     port->hub = hub;
+    port->listed = false;
 
     QLIST_INSERT_HEAD(&hub->ports, port, next);
+    port->listed = true;
 
     return port;
 }
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
  2025-08-21 14:26 [PATCH] net/hub: make net_hub_port_cleanup idempotent Jonah Palmer
@ 2025-08-21 15:13 ` David Woodhouse
  2025-08-21 16:38   ` Jonah Palmer
  2025-09-03  5:42 ` Jason Wang
  1 sibling, 1 reply; 5+ messages in thread
From: David Woodhouse @ 2025-08-21 15:13 UTC (permalink / raw)
  To: Jonah Palmer, qemu-devel; +Cc: jasowang, eperezma, leiyang

[-- Attachment #1: Type: text/plain, Size: 1084 bytes --]

On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
> Makes the net_hub_port_cleanup function idempotent to avoid double
> removals by guarding its QLIST_REMOVE with a flag.
> 
> When using a Xen networking device with hubport backends, e.g.:
> 
> -accel kvm,xen-version=0x40011
> -netdev hubport,...
> -device xen-net-device,...
> 
> the shutdown order starts with net_cleanup, which walks the list and
> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> called, which eventually leads to a second net_hub_port_cleanup call,
> resulting in a segfault.
> 
> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")

Tested-by: David Woodhouse <dwmw@amazon.co.uk>

But I hate it.

The lifetime of these objects is confusing, and this patch doesn't make
it nicer.

Why is it OK for the object to be taken off the list while it still
exists and is findable by other pointers? What does it *mean* for it to
be in that state? Doesn't it have a refcount? Can't it be unlisted
*and* freed only when that refcount goes to zero?



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
  2025-08-21 15:13 ` David Woodhouse
@ 2025-08-21 16:38   ` Jonah Palmer
  2025-09-03  5:42     ` Jason Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Jonah Palmer @ 2025-08-21 16:38 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel; +Cc: jasowang, eperezma, leiyang



On 8/21/25 11:13 AM, David Woodhouse wrote:
> On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
>> Makes the net_hub_port_cleanup function idempotent to avoid double
>> removals by guarding its QLIST_REMOVE with a flag.
>>
>> When using a Xen networking device with hubport backends, e.g.:
>>
>> -accel kvm,xen-version=0x40011
>> -netdev hubport,...
>> -device xen-net-device,...
>>
>> the shutdown order starts with net_cleanup, which walks the list and
>> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
>> called, which eventually leads to a second net_hub_port_cleanup call,
>> resulting in a segfault.
>>
>> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> 
> Tested-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> But I hate it.
> 
> The lifetime of these objects is confusing, and this patch doesn't make
> it nicer.
> 
> Why is it OK for the object to be taken off the list while it still
> exists and is findable by other pointers? What does it *mean* for it to
> be in that state? Doesn't it have a refcount? Can't it be unlisted
> *and* freed only when that refcount goes to zero?
> 
> 

I believe this "off-list but still exists" state is intentional and is 
the model for NIC peer backends. IIUC, it essentially means "detached 
from hub broadcast but owned by the NIC until device teardown".

The net core explicitly transfers ownership of a backend from the global 
list to the NIC without freeing it, so it can be torn down later by the 
device itself. See the comments in qemu_del_net_client and net_cleanup 
in net/net.c.

In other words, a backend can be removed from net_clients and still 
exist (owned by the NIC) until qemu_del_nic cleans it up and frees it. 
There's no refcount today and lifetime is manual.

The real bug here is that hubport's cleanup is not idempotent, not 
necessarily the ownership model itself.

I do agree though that the lifetime of these objects is confusing. And a 
refcount model would be theoretically cleaner, but current code 
explicitly relies on ownership transfer without refcounts.

---

Actually, now that I think about it, perhaps we don't need this extra 
'listed' state and instead can just check 'if (port->hub)' and set it to 
NULL after it's removed from the list.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
  2025-08-21 16:38   ` Jonah Palmer
@ 2025-09-03  5:42     ` Jason Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Wang @ 2025-09-03  5:42 UTC (permalink / raw)
  To: Jonah Palmer; +Cc: David Woodhouse, qemu-devel, eperezma, leiyang

On Fri, Aug 22, 2025 at 12:39 AM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 8/21/25 11:13 AM, David Woodhouse wrote:
> > On Thu, 2025-08-21 at 14:26 +0000, Jonah Palmer wrote:
> >> Makes the net_hub_port_cleanup function idempotent to avoid double
> >> removals by guarding its QLIST_REMOVE with a flag.
> >>
> >> When using a Xen networking device with hubport backends, e.g.:
> >>
> >> -accel kvm,xen-version=0x40011
> >> -netdev hubport,...
> >> -device xen-net-device,...
> >>
> >> the shutdown order starts with net_cleanup, which walks the list and
> >> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> >> called, which eventually leads to a second net_hub_port_cleanup call,
> >> resulting in a segfault.
> >>
> >> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> >
> > Tested-by: David Woodhouse <dwmw@amazon.co.uk>
> >
> > But I hate it.
> >
> > The lifetime of these objects is confusing, and this patch doesn't make
> > it nicer.
> >
> > Why is it OK for the object to be taken off the list while it still
> > exists and is findable by other pointers? What does it *mean* for it to
> > be in that state? Doesn't it have a refcount? Can't it be unlisted
> > *and* freed only when that refcount goes to zero?
> >
> >
>
> I believe this "off-list but still exists" state is intentional and is
> the model for NIC peer backends. IIUC, it essentially means "detached
> from hub broadcast but owned by the NIC until device teardown".
>
> The net core explicitly transfers ownership of a backend from the global
> list to the NIC without freeing it, so it can be torn down later by the
> device itself. See the comments in qemu_del_net_client and net_cleanup
> in net/net.c.
>
> In other words, a backend can be removed from net_clients and still
> exist (owned by the NIC) until qemu_del_nic cleans it up and frees it.
> There's no refcount today and lifetime is manual.
>
> The real bug here is that hubport's cleanup is not idempotent, not
> necessarily the ownership model itself.
>
> I do agree though that the lifetime of these objects is confusing. And a
> refcount model would be theoretically cleaner, but current code
> explicitly relies on ownership transfer without refcounts.

In the long term, we should consider converting networking code to QOM.

Thanks

>
> ---
>
> Actually, now that I think about it, perhaps we don't need this extra
> 'listed' state and instead can just check 'if (port->hub)' and set it to
> NULL after it's removed from the list.
>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] net/hub: make net_hub_port_cleanup idempotent
  2025-08-21 14:26 [PATCH] net/hub: make net_hub_port_cleanup idempotent Jonah Palmer
  2025-08-21 15:13 ` David Woodhouse
@ 2025-09-03  5:42 ` Jason Wang
  1 sibling, 0 replies; 5+ messages in thread
From: Jason Wang @ 2025-09-03  5:42 UTC (permalink / raw)
  To: Jonah Palmer; +Cc: qemu-devel, eperezma, leiyang, dwmw2

On Thu, Aug 21, 2025 at 10:28 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Makes the net_hub_port_cleanup function idempotent to avoid double
> removals by guarding its QLIST_REMOVE with a flag.
>
> When using a Xen networking device with hubport backends, e.g.:
>
> -accel kvm,xen-version=0x40011
> -netdev hubport,...
> -device xen-net-device,...
>
> the shutdown order starts with net_cleanup, which walks the list and
> deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
> called, which eventually leads to a second net_hub_port_cleanup call,
> resulting in a segfault.
>
> Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>

Queued.

Thanks



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-09-03  5:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-21 14:26 [PATCH] net/hub: make net_hub_port_cleanup idempotent Jonah Palmer
2025-08-21 15:13 ` David Woodhouse
2025-08-21 16:38   ` Jonah Palmer
2025-09-03  5:42     ` Jason Wang
2025-09-03  5:42 ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).