All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Cc: qemu-devel@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>
Subject: Re: [PATCH] system/physmem: Synchronize ram_list accesses
Date: Tue, 26 May 2026 17:35:55 -0400	[thread overview]
Message-ID: <ahYSO98JgdYJrP2e@x1.local> (raw)
In-Reply-To: <20260523-tsan-v1-1-07d5eb9dcaa2@rsg.ci.i.u-tokyo.ac.jp>

On Sat, May 23, 2026 at 09:38:28PM +0900, Akihiko Odaki wrote:
> Alex Bennée reported a ThreadSanitizer warning about a plain concurrent
> access to ram_list [1]. Ensure the concurrent accesses to ram_list are
> properly synchronized with atomic accesses, mutexes, or RCU.
> 
> First, the plain assignments of ram_list.mru_block are replaced with
> qatomic_set(). A comment in qemu_get_ram_block() explains why the
> ordering requirement is relaxed, but it still needs to be atomically
> accessed. include/qemu/atomic.h says:
> 
> > The C11 memory model says that variables that are accessed from
> > different threads should at least be done with __ATOMIC_RELAXED
> > primitives or the result is undefined. Generally this has little to
> > no effect on the generated code but not using the atomic primitives
> > will get flagged by sanitizers as a violation.
> 
> Second, ram_list.version accesses are replaced with atomic operations
> or protected with a mutex. Unlike ram_list.mru_block, ram_list.version
> has tighter ordering requirements for one of its goals: ensuring that
> the reader-held rs->last_seen_block value is invalidated whenever a RAM
> block is reclaimed between two RCU reader critical sections. Below are
> steps a reader and an updater follow:
> 
> Reader:
> 
> R-1. Enter the first RCU read-side critical section:
> R-1-1. rs->last_version = qatomic_load_acquire(&ram_list.version)
> R-1-2. rs->last_seen_block = an element of ram_list.blocks
> R-2. Enter the second RCU read-side critical section:
> R-2-1. if (qatomic_read(&ram_list.version) != rs->last_version)
> R-2-2.     rs->last_seen_block = NULL
> 
> Updater:
> 
> W-1. Enter a ram_list.mutex critical section
> W-1-1. Update ram_list.blocks
> W-1-2. qatomic_store_release(&ram_list.version, ram_list.version + 1)
> W-2. Enter another ram_list.mutex critical section
> W-2-1. QLIST_REMOVE_RCU(block, next)
> W-2-2. qatomic_store_release(&ram_list.version, ram_list.version + 1)
> W-2-3. call_rcu(block, reclaim_ramblock, rcu)
> 
> W-1-2 represents the write observed by R-1-1.
> 
> ram_list.version is read non-atomically on the update side because the
> update side is serialized with ram_list.mutex. The other ram_list
> accesses in these steps are reasoned about in two cases.
> 
> When the grace period of W-2-3 contains R-2:
>     qatomic_load_acquire() at R-1-1 and qatomic_store_release() at W-1-2
>     enforce the following ordering:
>         W-1-1 -> W-1-2 -> R-1-1 -> R-1-2
>     The value of ram_list.blocks stored by W-1-1 or a newer value that
>     was loaded by R-1-2 is still valid because of the grace period.
> 
> When the grace period of W-2-3 ends before R-2:
>     call_rcu() at W-2-3 and the read-side critical section at R-2 ensure
>     the following ordering:
>         W-2-2 -> W-2-3 -> the grace period -> R-2 -> R-2-1
>     The value of ram_list.version stored by W-2-2 or a newer value that
>     was loaded by R-2-1 differs from rs->last_version and the reader
>     invalidates rs->last_seen_block.
> 
> Together, these steps ensure that rs->last_seen_block is invalidated
> whenever necessary. With added atomic operations, pre-existing memory
> barriers are no longer necessary and are removed.
> 
> Any other ram_list accesses are already properly synchronized.
> 
> [1] https://lore.kernel.org/qemu-devel/878q9fbmap.fsf@draig.linaro.org/
> 
> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>

Above link [1] also mentioned about a hang.  Just to double check: we're
only trying to silent the TSAN warning (which is a false positive), right?

> ---
>  migration/ram.c  | 10 +++++-----
>  system/physmem.c | 16 +++++++---------
>  2 files changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index fc38ffbf8af1..6da24d725861 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2495,7 +2495,10 @@ static void ram_state_reset(RAMState *rs)
>  
>      rs->last_seen_block = NULL;
>      rs->last_page = 0;
> -    rs->last_version = ram_list.version;
> +
> +    /* Read version before ram_list.blocks */
> +    rs->last_version = qatomic_load_acquire(&ram_list.version);
> +
>      rs->xbzrle_started = false;
>  
>      ram_page_hint_reset(&rs->page_hint);
> @@ -3270,13 +3273,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>       */
>      WITH_QEMU_LOCK_GUARD(&rs->bitmap_mutex) {
>          WITH_RCU_READ_LOCK_GUARD() {
> -            if (ram_list.version != rs->last_version) {
> +            if (qatomic_read(&ram_list.version) != rs->last_version) {
>                  ram_state_reset(rs);
>              }

Not directly relevant to this patch, but I never thought into why this few
lines are needed at all, and then I found I don't have an answer..

Here, ram_list version change should only happen during ramblock
add/remove.  I can think of a few special cases:

a) device add/remove (e.g. dimms) causing new / removing ramblocks, should
   never happen as qdev code fails any device add/remove attempt during
   migration (qdev_device_add_from_qdict, qdev_unplug)

b) virtio-mem, which only manages one memory backend, hence one ramblock,
   no add/remove

c) virtio-gpu, will do random add/remove with memory_region_init_ram_ptr()
   which means ramblocks are not migratable

PS: I recall we have other special cases like ROM sizes can change on dest
QEMU, but it's about used_length, not about version boosts, so doesn't
matter here.

Both a) and b) should make sure no change on version, c) will change
version, but in a case where the ramblock is destined to be not migratable,
hence not visible to migration core.

The patch itself looks all fine, but I'm just wondering if we should just
remove this check completely or even try to somehow trap it from happening
(if we can rule out things like virtio-gpu): if something really changed,
it'll be a serious problem because we sync ramblock list already during
ram_save_setup.

Thanks,

>  
> -            /* Read version before ram_list.blocks */
> -            smp_rmb();
> -
>              ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
>              if (ret < 0) {
>                  qemu_file_set_error(f, ret);
> diff --git a/system/physmem.c b/system/physmem.c
> index 7bcbf8757361..9e1ac13e8259 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -839,12 +839,12 @@ found:
>      /* It is safe to write mru_block outside the BQL.  This
>       * is what happens:
>       *
> -     *     mru_block = xxx
> +     *     qatomic_set(&mru_block, xxx)
>       *     rcu_read_unlock()
>       *                                        xxx removed from list
>       *                  rcu_read_lock()
>       *                  read mru_block
> -     *                                        mru_block = NULL;
> +     *                                        qatomic_set(&mru_block, NULL);
>       *                                        call_rcu(reclaim_ramblock, xxx);
>       *                  rcu_read_unlock()
>       *
> @@ -852,7 +852,7 @@ found:
>       * when it was placed into the list.  Here we're just making an extra
>       * copy of the pointer.
>       */
> -    ram_list.mru_block = block;
> +    qatomic_set(&ram_list.mru_block, block);
>      return block;
>  }
>  
> @@ -2260,11 +2260,10 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>      } else { /* list is empty */
>          QLIST_INSERT_HEAD_RCU(&ram_list.blocks, new_block, next);
>      }
> -    ram_list.mru_block = NULL;
> +    qatomic_set(&ram_list.mru_block, NULL);
>  
>      /* Write list before version */
> -    smp_wmb();
> -    ram_list.version++;
> +    qatomic_store_release(&ram_list.version, ram_list.version + 1);
>      qemu_mutex_unlock_ramlist();
>  
>      physical_memory_set_dirty_range(new_block->offset,
> @@ -2608,10 +2607,9 @@ void qemu_ram_free(RAMBlock *block)
>      name = cpr_name(block->mr);
>      cpr_delete_fd(name, 0);
>      QLIST_REMOVE_RCU(block, next);
> -    ram_list.mru_block = NULL;
> +    qatomic_set(&ram_list.mru_block, NULL);
>      /* Write list before version */
> -    smp_wmb();
> -    ram_list.version++;
> +    qatomic_store_release(&ram_list.version, ram_list.version + 1);
>      call_rcu(block, reclaim_ramblock, rcu);
>      qemu_mutex_unlock_ramlist();
>  }
> 
> ---
> base-commit: e89049b3ba5f1f0468bc0d294173345597514a1b
> change-id: 20260520-tsan-c6f1aa909a90
> 
> Best regards,
> --  
> Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> 

-- 
Peter Xu



  parent reply	other threads:[~2026-05-26 21:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-23 12:38 [PATCH] system/physmem: Synchronize ram_list accesses Akihiko Odaki
2026-05-26  9:22 ` Philippe Mathieu-Daudé
2026-05-26 11:58   ` Akihiko Odaki
2026-05-26 12:10     ` Philippe Mathieu-Daudé
2026-05-26 21:35 ` Peter Xu [this message]
2026-05-27  7:54   ` Akihiko Odaki
2026-05-28 15:18     ` Peter Xu
2026-05-28 15:32       ` Akihiko Odaki
2026-05-28 16:03         ` Peter Xu
2026-05-29  8:24           ` Akihiko Odaki
2026-05-29 16:13             ` Peter Xu
2026-05-29 16:39               ` Akihiko Odaki
2026-06-01 21:01                 ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahYSO98JgdYJrP2e@x1.local \
    --to=peterx@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=farosas@suse.de \
    --cc=odaki@rsg.ci.i.u-tokyo.ac.jp \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.