From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, kwolf@redhat.com
Subject: Re: [PATCH 6/8] qio: Hoist ref of listener outside loop
Date: Wed, 12 Nov 2025 10:31:57 +0000
Message-ID: <aRRiHfgK24xcD72y@redhat.com>
In-Reply-To: <elpit3ypz2vysuvwiodd742i5js2konefudavrdzbvywczzlg6@jf4ebybiqcvb>

On Tue, Nov 11, 2025 at 02:07:50PM -0600, Eric Blake wrote:
> On Tue, Nov 11, 2025 at 01:09:24PM -0600, Eric Blake wrote:
> > > In the current code, when we unref() the GSource for the socket
> > > watch, the destroy-notify does not get called, because the
> > > event thread is in the middle of a dispatch callback for the
> > > I/O event.  When the dispatch callback returns control to the
> > > event loop, the GSourceCallback is unrefed, and this triggers
> > > the destroy-notify call, which unrefs the listener.
> > > 
> > > The flow looks like this:
> > > 
> > >   Thread 1:
> > >        qio_net_listener_set_client_func(lstnr, f, ...);
> > >            => foreach sock: socket
> > >                => object_ref(lstnr)
> > >                => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);
> > > 
> > >   Thread 2:
> > >        poll()
> > >           => event POLLIN on socket
> > >                => ref(GSourceCallback)
> > >                => call dispatch(sock)
> > >                     ...do stuff..
> > > 
> > >   Thread 1:
> > >        qio_net_listener_set_client_func(lstnr, NULL, ...);
> > >           => foreach sock: socket
> > >                => g_source_unref(sock_src)
> > >        unref(lstnr)  (the final reference)
> > >           => finalize(lstnr)
> > 
> > If I understand correctly, _this_ unref(lstnr) is NOT the final
> > reference, because of the additional reference still owned by the
> > GSource that is still dispatching, so finalize(lstnr) is not reached
> > here...
> > 
> > > 
> > >   Thread 2:
> > >               => return dispatch(sock)
> > >               => unref(GSourceCallback)
> > >                   => destroy-notify
> > >                      => object_unref
> > 
> > ...but instead here.  And that is the desirable property of the
> > pre-patch behavior with a per-GSource reference on the listener object
> > - we are guaranteed that listener can't be finalized while there are
> > any pending dispatch in flight, even if the caller has unref'd their
> > last mention of lstnr, and therefore the dispatch never has a
> > use-after-free.
> 
> But thinking further, even if it is not an object_unref, but merely a
> change of the async callback function, is this the sort of thing where
> a mutex is needed to make things safer?
> 
> Even without my patch series, we have this scenario:
> 
>   Thread 1:
>        qio_net_listener_set_client_func(lstnr, f1, ...);
>            => foreach sock: socket
>                => object_ref(lstnr)
>                => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);
> 
>   Thread 2:
>        poll()
>           => event POLLIN on socket
>                => ref(GSourceCallback) // while lstnr->io_func is f1
>                     ...do stuff..
> 
>   Thread 1:
>        qio_net_listener_set_client_func(lstnr, f2, ...);
>           => foreach sock: socket
>                => g_source_unref(sock_src)
>           => foreach sock: socket
>                => object_ref(lstnr)
>                => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);
> 
>   Thread 2:
>                => access lstnr->io_func // now sees f2
>                => call f2(sock)
>                => return dispatch(sock)
>                => unref(GSourceCallback)
>                   => destroy-notify
>                      => object_unref
> 
> So even though thread 2 noticed a client while f1 was registered,
> because we lack a mutex and are not using atomic access primitives,
> there is nothing preventing thread 1 from changing the io_func from f1
> to f2 before thread 2 finally calls the callback.  And if f2 is NULL
> (because the caller is deregistering the callback), I could see the
> race resulting in qio_net_listener_channel_func() with its pre-patch
> code of:
> 
>     if (listener->io_func) {
>         listener->io_func(listener, sioc, listener->io_data);
>     }
> 
> resulting in a NULL-pointer deref if thread 1 changed out
> listener->io_func after the if but before the dispatch.

Hmm, yes, that is a risk I hadn't thought of before.
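
To make the window concrete, here is a minimal standalone sketch, not
the actual QEMU code. The Listener struct, ClientFunc typedef,
count_client() and dispatch_snapshot() are all hypothetical stand-ins,
and it assumes C11 atomics rather than QEMU's own atomic helpers. The
point is only that the dispatch side loads io_func once into a local,
so the NULL check and the call see the same value:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for QIONetListener; not the real QEMU struct. */
typedef struct Listener Listener;
typedef void (*ClientFunc)(Listener *l, int fd, void *opaque);

struct Listener {
    _Atomic(ClientFunc) io_func;  /* may be swapped by another thread */
    void *io_data;                /* still unprotected in this sketch */
};

/* Example callback: counts how many clients were handed over. */
static void count_client(Listener *l, int fd, void *opaque)
{
    (void)l;
    (void)fd;
    (*(int *)opaque)++;
}

/*
 * Load io_func exactly once. The racy shape reads listener->io_func
 * twice (once in the if, once for the call), so a concurrent store of
 * NULL can land in between. Here both uses go through the local 'fn'.
 * Returns 1 if a callback was invoked, 0 if none was registered.
 */
static int dispatch_snapshot(Listener *l, int fd)
{
    ClientFunc fn = atomic_load(&l->io_func);

    if (!fn) {
        return 0;
    }
    fn(l, fd, l->io_data);
    return 1;
}
```

This only closes the NULL deref. It does nothing to keep io_func and
io_data consistent with each other, since io_data is still read
separately and without synchronization.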

> Adding a mutex might make QIONetListener slower in the time to grab
> the mutex (even if it is uncontended most of the time), but if we have
> a proper mutex in place, we could at least guarantee that
> listener->io_func, listener->io_data, and listener->io_notify are
> changed as a group, rather than torn if the polling thread is
> managing a dispatch with listener as its opaque data at the same time
> the user's thread is trying to change the registered callback.

Yeah, a mutex should be ok, as it will be uncontended 99.999% of
the time.
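
Roughly the shape I have in mind, as a standalone sketch rather than a
patch. It assumes plain pthreads instead of QemuMutex, and the Listener
struct, listener_init(), listener_set_client_func(),
listener_dispatch() and record_fd() are hypothetical names, not the
real QIONetListener API. The setter replaces all three fields under the
lock, and the dispatch side copies what it needs under the same lock
and then calls the callback with the lock dropped:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for QIONetListener; not the real QEMU struct. */
typedef struct Listener Listener;
typedef void (*ClientFunc)(Listener *l, int fd, void *opaque);
typedef void (*NotifyFunc)(void *opaque);

struct Listener {
    pthread_mutex_t lock;   /* protects the three fields below */
    ClientFunc io_func;
    void *io_data;
    NotifyFunc io_notify;
};

static void listener_init(Listener *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->io_func = NULL;
    l->io_data = NULL;
    l->io_notify = NULL;
}

/* Example callback: records the fd it was given. */
static void record_fd(Listener *l, int fd, void *opaque)
{
    (void)l;
    *(int *)opaque = fd;
}

/*
 * Replace func/data/notify as one group. Releasing the previous
 * io_data is deliberately left out: its lifetime relative to an
 * in-flight dispatch is a separate question from the locking.
 */
static void listener_set_client_func(Listener *l, ClientFunc func,
                                     void *data, NotifyFunc notify)
{
    pthread_mutex_lock(&l->lock);
    l->io_func = func;
    l->io_data = data;
    l->io_notify = notify;
    pthread_mutex_unlock(&l->lock);
}

/*
 * Snapshot func and data together under the lock, then invoke the
 * callback without holding it, so the callback may itself call
 * listener_set_client_func() without deadlocking.
 * Returns 1 if a callback was invoked, 0 if none was registered.
 */
static int listener_dispatch(Listener *l, int fd)
{
    ClientFunc func;
    void *data;

    pthread_mutex_lock(&l->lock);
    func = l->io_func;
    data = l->io_data;
    pthread_mutex_unlock(&l->lock);

    if (!func) {
        return 0;
    }
    func(l, fd, data);
    return 1;
}
```

With this shape a dispatch that raced with a change of callback sees
either the old func/data pair or the new one, never a mix of the two.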

> By itself, GSource DOES have a mutex lock around the GMainContext it
> is attached to; but our particular use of GSource is operating outside
> that locking scheme, so it looks like I've uncovered another latent
> problem.

Agreed.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
