qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Roman Kiryanov <rkir@google.com>
Cc: jansene@google.com, jpcottin@google.com, mett@google.com,
	 qemu-devel@nongnu.org
Subject: Re: [PATCH v3] timer: Fix a race condition between timer's callback and destroying code
Date: Thu, 27 Jun 2024 19:15:49 +0200	[thread overview]
Message-ID: <CABgObfaBjneGy__o_aABdm+60wsg=rxLmgZLthwCoezwnH34ZQ@mail.gmail.com> (raw)
In-Reply-To: <CAOGAQepLGTY-92So1arNZWjg9m+HZ8AjZ28Gsscm2ai5sj1AMQ@mail.gmail.com>

On Thu, Jun 27, 2024 at 6:12 PM Roman Kiryanov <rkir@google.com> wrote:
>
> On Thu, Jun 27, 2024 at 6:27 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On Thu, Jun 27, 2024 at 2:32 AM Roman Kiryanov <rkir@google.com> wrote:
> > > +        if (qatomic_read(&ts->cb_running)) {
> > > +            qemu_event_wait(&timer_list->timers_done_ev);
> > > +        }
> >
> > qemu_event_wait() already has the right atomic magic, and
> > ts->cb_running is both redundant (in general), and I think racy (as
> > implemented in this patch).
>
> I added cb_running to avoid waiting for timers_done_ev if we know our
> cb is done.

Yes, but it's very tricky. Assuming we want to fix it in the timer
core, the QemuEvent should be enough, no need to optimize it. On the
other hand, I'm still worried about deadlocks (more below).

> > But especially, you haven't justified in the commit message _why_ you
> > need this.
>
> I mentioned the problem of cleanup racing with the timer's callback function
> in the current shape of QEMU.

Yes, but it was not clear what are the involved threads. It is clear
now that you have a function in a separate thread, creating a timer in
the main QEMU event loop.

> > using
> > aio_bh_schedule_oneshot() or aio_wait_bh_oneshot() to synchronize
> > everything with the AioContext thread seems like a superior solution
> > to me.
>
> Could you please elaborate? The problem we want to solve is this:
>
> void myThreadFunc() {
>     CallbackState callbackState;
>     QEMUTimer timer;
>
>     timer_init(&timer, myClockType, myScale, &myTimerCallbackFunc,
> &callbackState);
>     ...
>     timer_del(&timer);
> }
>
> Currently, myTimerCallbackFunc could fire after myThreadFunc exits
> (if timer_del runs between qemu_mutex_unlock and cb(opaque) in
> timerlist_run_timers) and callbackState gets destroyed.

Ok, got it now. I agree that qemu_event_wait() is safe for you here
because you are in a completely separate thread. But I'm worried that
it causes deadlocks in QEMU where the timer callback and the timer_del
run in the same thread.

I think the easiest options would be:

1) if possible, allocate the timer and the callbackState statically in
the device.

2) use "aio_wait_bh_oneshot(qemu_get_aio_context(), [](void
*opaque){}, NULL);" after timer_del(). You can also put the timer and
the callbackState in a RAII wrapper, so that aio_wait_bh_oneshot() is
executed when the RAII wrapper is destructed

Another thing that you could do is to use a shared_ptr<> for the
timer+callbackState combo, and pass a weak_ptr<> to the timer. Then:

- at the beginning of the timer, you upgrade the weak_ptr with lock()
and if it fails, return

- at the end of myThreadfunc, you destruct the shared_ptr before
deleting the timer.

I'm not sure how you'd pass the weak_ptr/shared_ptr to a callback
(Rust has Weak::into_raw/Weak::from_raw, but I don't know C++ well
enough). That may be overkill.

Paolo



  reply	other threads:[~2024-06-27 17:16 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-26 21:52 [PATCH v2] timer: Fix a race condition between timer's callback and destroying code Roman Kiryanov
2024-06-26 23:29 ` Paolo Bonzini
2024-06-27  0:31   ` [PATCH v3] " Roman Kiryanov
2024-06-27 13:27     ` Paolo Bonzini
2024-06-27 16:12       ` Roman Kiryanov
2024-06-27 17:15         ` Paolo Bonzini [this message]
2024-06-27 17:53           ` Roman Kiryanov
2024-08-14 21:12             ` Patrick Leis
2024-08-14 22:10               ` Paolo Bonzini
2024-07-01 21:15           ` [PATCH v4] Add timer_join to avoid racing in timer cleanup Roman Kiryanov
2024-07-08 16:02             ` Roman Kiryanov
2024-07-22  6:00               ` Roman Kiryanov
  -- strict thread matches above, loose matches on Subject: below --
2025-04-19  7:36 [PATCH v3] timer: Fix a race condition between timer's callback and destroying code Ajmal K

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABgObfaBjneGy__o_aABdm+60wsg=rxLmgZLthwCoezwnH34ZQ@mail.gmail.com' \
    --to=pbonzini@redhat.com \
    --cc=jansene@google.com \
    --cc=jpcottin@google.com \
    --cc=mett@google.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rkir@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).