From: Peter Xu <peterx@redhat.com>
To: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Cc: qemu-devel@nongnu.org,
"Philippe Mathieu-Daudé" <philmd@mailo.com>,
"Fabiano Rosas" <farosas@suse.de>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH] memory/ramblock: Fix clear of mru_block on possible race condition
Date: Wed, 3 Jun 2026 13:35:54 -0400 [thread overview]
Message-ID: <aiBl-hXpmXc8ULSJ@x1.local> (raw)
In-Reply-To: <622da0a8-8101-4816-bf20-b5c3c9547a5f@rsg.ci.i.u-tokyo.ac.jp>
On Wed, Jun 03, 2026 at 02:11:44PM +0900, Akihiko Odaki wrote:
> qemu_get_ram_block() may set mru_block during the grace period. Once
> mru_block is set, RCU readers can still access the ramblock, even if the
> block is no longer visible in ram_list.
True..
>
> When enriching the comment in reclaim_ramblock_prepare(), I think the phrase
> “the last reader that can access this ramblock is gone” could be improved.
> What matters here is that the last reader that could find the block in
> ram_list is gone. The block is no longer visible in ram_list to any later
> reader, and clearing the cache at that point ensures that it is no longer
> visible through an mru_block cache hit either.
Yes that's ambiguous.
I updated comments in this patch, removed the 1st reset of mru_block, and
when at it also touched up the comment in qemu_get_ram_block(), let me know
if there's further comments before repost.
Thanks,
===8<===
From 18789a1136c753a6cdce1ea698cf4c1871121cc9 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 29 May 2026 12:05:15 -0400
Subject: [PATCH] memory/ramblock: Fix clear of mru_block on possible race
condition
The race condition was not reported in any real bug, but I found it while
reviewing some relevant chnages from Akihiko [1]. It's not clear if
there's any way to reproduce it, so far it's only theoretical. Hence
there's also no Fixes tag attached. There's also no need to copy stable
until we have a solid reproducer.
Currently, mru_block might still points to ramblocks that are removed if
below race condition happens:
Reader A Writer
-------- ------
rcu_read_lock()
walk list, find block X
QLIST_REMOVE_RCU(X)
qatomic_set(&mru_block, NULL)
call_rcu(X, reclaim_ramblock)
qatomic_set(&mru_block, X) <----------- overwrites NULL
rcu_read_unlock()
grace period ends, X freed
Reader C
--------
rcu_read_lock()
qatomic_rcu_read(&mru_block) -> X
Read X's block->offset ... <----------- UAF
To fix it, we can introduce a nested RCU free for reset of the mru_block
field, and only free the ramblock until the 2nd RCU call.
Since QEMU always: (1) dequeue the RCU node before invoking the
function, (2) reset all fields in rcu_head in the call_rcu1() call, nested
RCU will work all fine like what's used in this patch.
[1] https://lore.kernel.org/r/5fd9e540-cf13-45f5-ba9a-c7faf364035b@rsg.ci.i.u-tokyo.ac.jp
Cc: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
Based-on: <5fd9e540-cf13-45f5-ba9a-c7faf364035b@rsg.ci.i.u-tokyo.ac.jp>
---
system/physmem.c | 46 ++++++++++++++++++++++++++++++----------------
1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index 9e1ac13e82..c9e5696d6a 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -836,21 +836,14 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t addr)
abort();
found:
- /* It is safe to write mru_block outside the BQL. This
- * is what happens:
- *
- * qatomic_set(&mru_block, xxx)
- * rcu_read_unlock()
- * xxx removed from list
- * rcu_read_lock()
- * read mru_block
- * qatomic_set(&mru_block, NULL);
- * call_rcu(reclaim_ramblock, xxx);
- * rcu_read_unlock()
+ /*
+ * It is safe to write mru_block outside the BQL, the writer (e.g. when
+ * QEMU frees a ramblock) is designed to be thread-safe with readers
+ * updating this field concurrently. See reclaim_ramblock_prepare().
*
- * qatomic_rcu_set is not needed here. The block was already published
- * when it was placed into the list. Here we're just making an extra
- * copy of the pointer.
+ * qatomic_rcu_set() is not needed here, because the block was already
+ * published when it was placed into the list. Here we're just making
+ * an extra copy of the pointer.
*/
qatomic_set(&ram_list.mru_block, block);
return block;
@@ -2590,6 +2583,23 @@ static void reclaim_ramblock(RAMBlock *block)
g_free(block);
}
+static void reclaim_ramblock_prepare(RAMBlock *block)
+{
+ /*
+ * After one round of grace period, no more reader can see this
+ * ramblock via ram_list. Reset this field making sure it will never
+ * point to the ramblock being freed.
+ */
+ qatomic_set(&ram_list.mru_block, NULL);
+ /*
+ * Wait for a second round of grace period to make sure whoever
+ * accessed the ramblock previously via mru_block has finished using
+ * it. Note: this is an intended nested use of rcu_head. If needed,
+ * we can provide two rcu_heads for ramblock.
+ */
+ call_rcu(block, reclaim_ramblock, rcu);
+}
+
void qemu_ram_free(RAMBlock *block)
{
g_autofree char *name = NULL;
@@ -2607,10 +2617,14 @@ void qemu_ram_free(RAMBlock *block)
name = cpr_name(block->mr);
cpr_delete_fd(name, 0);
QLIST_REMOVE_RCU(block, next);
- qatomic_set(&ram_list.mru_block, NULL);
/* Write list before version */
qatomic_store_release(&ram_list.version, ram_list.version + 1);
- call_rcu(block, reclaim_ramblock, rcu);
+ /*
+ * Wait for a grace period to make sure no reader can see this ramblock
+ * via ram_list anymore. Note that readers can still see and access
+ * the ramblock via mru_block, so we can't free it yet.
+ */
+ call_rcu(block, reclaim_ramblock_prepare, rcu);
qemu_mutex_unlock_ramlist();
}
--
2.53.0
--
Peter Xu
next prev parent reply other threads:[~2026-06-03 17:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-01 20:59 [PATCH] memory/ramblock: Fix clear of mru_block on possible race condition Peter Xu
2026-06-02 4:25 ` Akihiko Odaki
2026-06-02 19:51 ` Peter Xu
2026-06-03 5:11 ` Akihiko Odaki
2026-06-03 17:35 ` Peter Xu [this message]
2026-06-04 5:24 ` Akihiko Odaki
2026-06-04 13:33 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aiBl-hXpmXc8ULSJ@x1.local \
--to=peterx@redhat.com \
--cc=alex.bennee@linaro.org \
--cc=farosas@suse.de \
--cc=odaki@rsg.ci.i.u-tokyo.ac.jp \
--cc=pbonzini@redhat.com \
--cc=philmd@mailo.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.