From: Peter Xu <peterx@redhat.com>
To: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Cc: qemu-devel@nongnu.org,
"Philippe Mathieu-Daudé" <philmd@mailo.com>,
"Fabiano Rosas" <farosas@suse.de>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH] memory/ramblock: Fix clear of mru_block on possible race condition
Date: Tue, 2 Jun 2026 15:51:43 -0400 [thread overview]
Message-ID: <ah80T6aIns82fspH@x1.local> (raw)
In-Reply-To: <36749195-2c49-4f9c-8445-2e67b2097676@rsg.ci.i.u-tokyo.ac.jp>
On Tue, Jun 02, 2026 at 01:25:02PM +0900, Akihiko Odaki wrote:
> On 2026/06/02 5:59, Peter Xu wrote:
> > The race condition was not reported in any real bug, but I found it while
> > reviewing some relevant chnages from Akihiko [1]. It's not clear if
> > there's any way to reproduce it, so far it's only theoretical. Hence
> > there's also no Fixes tag attached. There's also no need to copy stable
> > until we have a solid reproducer.
> >
> > Currently, mru_block might still points to ramblocks that are removed if
> > below race condition happens:
> >
> > Reader A Writer
> > -------- ------
> > rcu_read_lock()
> > walk list, find block X
> > QLIST_REMOVE_RCU(X)
> > qatomic_set(&mru_block, NULL)
> > call_rcu(X, reclaim_ramblock)
> > qatomic_set(&mru_block, X) <----------- overwrites NULL
> > rcu_read_unlock()
> > grace period ends, X freed
> >
> > Reader C
> > --------
> > rcu_read_lock()
> > qatomic_rcu_read(&mru_block) -> X
> > Read X's block->offset ... <----------- UAF
> >
> > To fix it, we can introduce a nested RCU free for reset of the mru_block
> > field, and only free the ramblock until the 2nd RCU call.
> >
> > Since QEMU always: (1) dequeue the RCU node before invoking the
> > function, (2) reset all fields in rcu_head in the call_rcu1() call, nested
> > RCU will work all fine like what's used in this patch.
> >
> > [1] https://lore.kernel.org/r/5fd9e540-cf13-45f5-ba9a-c7faf364035b@rsg.ci.i.u-tokyo.ac.jp
> >
> > Cc: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >
> > Based-on: <5fd9e540-cf13-45f5-ba9a-c7faf364035b@rsg.ci.i.u-tokyo.ac.jp>
> >
> > Since there's no reproducer, I didn't verify the problem as-is. What I did
> > is I ran this through some high concurrency "make check -jN" (8 processes
> > with 4 cores pinned, hence 8*N make-check threads running concurrently on 4
> > physical cores), I didn't see any regression.
> > ---
> > system/physmem.c | 18 +++++++++++++++++-
> > 1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/system/physmem.c b/system/physmem.c
> > index 9e1ac13e82..1697568fa7 100644
> > --- a/system/physmem.c
> > +++ b/system/physmem.c
> > @@ -2590,6 +2590,21 @@ static void reclaim_ramblock(RAMBlock *block)
> > g_free(block);
> > }
> > +static void reclaim_ramblock_prepare(RAMBlock *block)
> > +{
> > + /*
> > + * 1st round rcu free guarantees the last reader that can access this
> > + * ramblock is gone. Reset this field making sure it will never point
[1]
> > + * to the ramblock being freed.
>
> Somewhat nitpicking, but I would call this the first grace period rather
> than “rcu free”. call_rcu() itself does not free anything; it schedules a
> callback to run after a grace period, which is what matters here.
Sure, I'll fix the wording.
>
> > + */
> > + qatomic_set(&ram_list.mru_block, NULL);
> > + /*
> > + * Note: this is an intended nested use of rcu_head. If needed, we can
> > + * provide two rcu_heads for ramblock.
> > + */
> > + call_rcu(block, reclaim_ramblock, rcu);
> > +}
> > +
> > void qemu_ram_free(RAMBlock *block)
> > {
> > g_autofree char *name = NULL;
> > @@ -2607,10 +2622,11 @@ void qemu_ram_free(RAMBlock *block)
> > name = cpr_name(block->mr);
> > cpr_delete_fd(name, 0);
> > QLIST_REMOVE_RCU(block, next);
> > + /* First attempt of reset, see reclaim_ramblock_prepare() for the 2nd */
> > qatomic_set(&ram_list.mru_block, NULL);
>
> I think it would be better to remove this for simplicity.
>
> mru_block is cleared later anyway, so this does not affect correctness. The
> only practical effect is that qemu_get_ram_block() may be slower during the
> first grace period.
>
> Conceptually, it is simpler to describe the two grace periods as serving two
> distinct purposes, rather than as two attempts to clear mru_block. With the
> early clear removed, readers do not first need to understand that the first
> grace period leaves the first clear ineffective; they can instead focus on
> how it makes the later clear effective.
It was my intention to keep this, because then it's easier to justify
"nobody has access to the ramblock being removed".
If with this line removed here, it means after the first grace period (say,
reaching reclaim_ramblock_prepare()), RCU readers can still access the
ramblock, essentially because before reclaim_ramblock_prepare() clearing
mru_block even if the block is not visible on ram_list it's still visible
when a cache hit happens.
But I agree removal of this line shouldn't involve correctness issues,
because qemu_get_ram_block() when having a mru_block cache hit, will always
return the block directly:
[2]
block = qatomic_rcu_read(&ram_list.mru_block);
if (block && addr - block->offset < block->max_length) {
return block;
}
And it will never reach the "found:" label later to update mru_block again.
But that's obscure.
So it's a niche use of it, but keeping this line to reset makes this
justification slightly easier (corresponds to the comment at [1] above).
I can also remove this line, but then I'll need to enrich comment
explaining case [2] above. We'll also need to be careful on not changing
that behavior in qemu_get_ram_block().
Thanks,
>
> Regars,
> Akihiko Odaki
>
> > /* Write list before version */
> > qatomic_store_release(&ram_list.version, ram_list.version + 1);
> > - call_rcu(block, reclaim_ramblock, rcu);
> > + call_rcu(block, reclaim_ramblock_prepare, rcu);
> > qemu_mutex_unlock_ramlist();
> > }
>
--
Peter Xu
next prev parent reply other threads:[~2026-06-02 19:52 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-01 20:59 [PATCH] memory/ramblock: Fix clear of mru_block on possible race condition Peter Xu
2026-06-02 4:25 ` Akihiko Odaki
2026-06-02 19:51 ` Peter Xu [this message]
2026-06-03 5:11 ` Akihiko Odaki
2026-06-03 17:35 ` Peter Xu
2026-06-04 5:24 ` Akihiko Odaki
2026-06-04 13:33 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ah80T6aIns82fspH@x1.local \
--to=peterx@redhat.com \
--cc=alex.bennee@linaro.org \
--cc=farosas@suse.de \
--cc=odaki@rsg.ci.i.u-tokyo.ac.jp \
--cc=pbonzini@redhat.com \
--cc=philmd@mailo.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.