All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gavin Shan <gshan@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	qemu-arm@nongnu.org, qemu-devel@nongnu.org, peterx@redhat.com,
	alex@shazbot.org, richard.henderson@linaro.org,
	berrange@redhat.com, philmd@oss.qualcomm.com, philmd@mailo.com,
	david@kernel.org, clg@redhat.com, pbonzini@redhat.com,
	phrdina@redhat.com, jugraham@redhat.com,
	liugang24219@sangfor.com.cn, dinghui@sangfor.com.cn,
	shan.gavin@gmail.com
Subject: Re: [PATCH v2 1/2] system/memory: Use qemu_ram_{copy, move}() in ram device region accessors
Date: Tue, 16 Jun 2026 00:36:16 -0400	[thread overview]
Message-ID: <20260616002413-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <2535e092-b3ef-427c-9d33-39861b3a43f1@redhat.com>

On Tue, Jun 16, 2026 at 02:22:18PM +1000, Gavin Shan wrote:
> On 6/16/26 7:31 AM, Gavin Shan wrote:
> > On 6/16/26 5:42 AM, Michael S. Tsirkin wrote:
> > > On Tue, Jun 16, 2026 at 05:24:15AM +1000, Gavin Shan wrote:
> > > > Hi Peter and Michael,
> > > > 
> > > > On 6/16/26 1:12 AM, Peter Maydell wrote:
> > > > > On Mon, 15 Jun 2026 at 15:56, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > 
> > > > > > On Mon, Jun 15, 2026 at 11:57:09AM +0100, Peter Maydell wrote:
> > > > > > > On Mon, 15 Jun 2026 at 11:03, Gavin Shan <gshan@redhat.com> wrote:
> > > > > > > > 
> > > > > > > > All ram device regions were turned to be indirectly accessible by commit
> > > > > > > > 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads
> > > > > > > > to guest hang on attempt to build 'cuda-samples' as reported by Julia. The
> > > > > > > > guest is started by the following command lines, with GH100 GPU card passed
> > > > > > > > from the host.
> > > > > > > 
> > > > > > > > diff --git a/include/system/memory.h b/include/system/memory.h
> > > > > > > > index 1417132f6d..5878727d09 100644
> > > > > > > > --- a/include/system/memory.h
> > > > > > > > +++ b/include/system/memory.h
> > > > > > > > @@ -2897,6 +2897,8 @@ void address_space_register_map_client(AddressSpace *as, QEMUBH *bh);
> > > > > > > >    void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh);
> > > > > > > > 
> > > > > > > >    /* Internal functions, part of the implementation of address_space_read.  */
> > > > > > > > +void qemu_ram_copy(void *dest, const void *src, size_t n);
> > > > > > > > +void qemu_ram_move(void *dest, const void *src, size_t n);
> > > > > > > 
> > > > > > > New function prototypes in include headers need documentation comments.
> > > > > > > 
> > > > > > > In particular for these, it's really important that we clearly say
> > > > > > > what semantics we are attempting to provide with them, so that
> > > > > > > (a) when we're reviewing or later updating the implementation we
> > > > > > > know what we are trying to provide
> > > > > > > (b) when we're looking at the callsites we know what the function
> > > > > > > is guaranteeing to us and what it is not, and thus whether it's
> > > > > > > OK to use it or we need something els.
> > > > > > > 
> > > > > > > > +#if defined(__i386__) || defined(__x86_64__)
> > > > > > > > +#define HOST_UNALIGNED_MMIO_OK 1
> > > > > > > > +#else
> > > > > > > > +#define HOST_UNALIGNED_MMIO_OK 0
> > > > > > > > +#endif
> > > > > > > 
> > > > > > > We need to do something better than this. We can't
> > > > > > > just say "oh, we trust that on x86 this works": it is
> > > > > > > neither actually true that the compiler guarantees it
> > > > > > 
> > > > > > sorry guarantee what?
> > > > > 
> > > > > Well, that's part of my point about "we need a doc comment":
> > > > > what exactly are we trying to guarantee ? But whatever it is,
> > > > > "hardcoded ifdef that privileges x86" is definitely the wrong
> > > > > answer.
> > > > > 
> > > > 
> > > > Could you please check the following comments (documentation context) for the
> > > > added functions look good to you? Please let me know if there are anything
> > > > can be improved.
> > > > 
> > > > +
> > > > +/**
> > > > + * qemu_ram_copy: copy data to a RAMBlock
> > > > + *
> > > > + * @dest: destination where the data is copied to. It's the pointer returned
> > > > + *        by ramblock_ptr() and its variants.
> > > > + * @src: source where the data is copied from. It can be a regular memory block
> > > > + *       or a pointer returned by ramblock_ptr() and its variants.
> > > > + * @n: length of data to be copied
> > > > + *
> > > > + * The destination is always a pointer to data area of a RAMBlock which can
> > > > + * be for a regular memory block or a MMIO region. The source can be a pointer
> > > > + * to a regular memory block or a MMIO region. Atomicity isn't guranteed by
> > > > + * memcpy() to copy data between those MMIO regions as we do in this function.
> > > > + * Besides, unaligned accesses are well handled on all architectures except
> > > > + * i386 and x86_64 where the unaligned accesses are supported by the CPUs.
> > > > + *
> > > > + * The ensured atomicity and alignment consideration, which aren't needed
> > > > + * by data copy between two regular memory blocks. So performance penalty
> > > > + * is introduced by this function in such circumstance.
> > > > + */
> > > > +void qemu_ram_copy(void *dest, const void *src, size_t n);
> > > > +
> > > > +/**
> > > > + * qemu_ram_move: move data to a RAMBlock
> > > > + *
> > > > + * @dest: destination where the data is moved to. It's the pointer returned
> > > > + *        by ramblock_ptr() and its variants.
> > > > + * @src: source where the data is moved from. It can be a regular memory block
> > > > + *       or a pointer returned by ramblock_ptr() and its variants.
> > > > + * @n: length of data to be moved
> > > > + *
> > > > + * The destination is always a pointer to data area of a RAMBlock which can
> > > > + * be for a regular memory block or a MMIO region. The source can be a pointer
> > > > + * to a regular memory block or a MMIO region. Atomicity isn't guranteed by
> > > > + * memmove() to move data from or to those MMIO regions as we do in this
> > > > + * function. Besides, unaligned accesses are well handled on all architectures
> > > > + * except i386 and x86_64 where the unaligned accesses are supported by the
> > > > + * CPUs.
> > > > + *
> > > > + * The ensured atomicity and alignment consideration, which aren't needed
> > > > + * by data movement between two regular memory blocks. So performance penalty
> > > > + * is introduced by this function in such circumstance.
> > > > + */
> > > > +void qemu_ram_move(void *dest, const void *src, size_t n);
> > > > +
> > > 
> > > 
> > > I apologize, I don't understand what these comments are saying, at all.
> > > 
> > 
> > Sorry to hear that, Michael. There are two things done in the newly added
> > function qemu_ram_{copy, move}(), comparing to the original memcpy() and
> > memmove():
> > 
> > - Data copy or movement on MMIO region(s) using __bultin_{memcopy, memmove}()
> >    on x86, or qatomic_set() on other architecutures, to avoid the unexpected
> >    optimization done by the compiler on memcpy/memmove. Unsafe instructions
> >    can be produced by the compiler's optimization. This fixes the issue resolved
> >    by commit 4a2e242bbb ("memory: Don't use memcpy for ram_device regions")
> > 
> > - The unaligned access is handled by forcing the access size aligned to (src | dest | n)
> >    on non-x86 architectures, comparing to the original memcpy() and memmove().
> > 
> > Please let me know if we're on same page. If we do, I can revise the comments
> > accordingly.
> > 
> 
> I've modified the comments to something as below. Please let me know if it's
> better. The point is to emphasize qemu_ram_{copy, memmove}() are friendly to
> IO regions :-)
> 
> -----> include/system/memory.h
> 
> +/**
> + * qemu_ram_copy: copy data to a guest memory block
> + *

what is a memory block? ramblock?

don't we need something like this for reads?


> + * @dest: destination where the data is copied to. A pointer to a guest memory

@dst?

> + *        block returned from ramblock_ptr() or its variants
> + * @src: source where the data is copied from. A pointer to host memory block
> + *       or guest memory block returned from ramblock_ptr() or its variants.
> + * @n: length of data to be copied
> + *
> + * When the source pointer, destinatoin pointer or both dereference guest
> + * memory blocks, which can be IO blocks (regions).

IO block?

let's not invent terminology pls.

> Under this cirtumstance,
> + * data copy between those blocks is equivalent to IO accesses. It's unsafe
> + * to use memcpy(), which could be translated to IO unfriendly instructions
> + * by the compiler due to code level optimization, resulting in the unexpected
> + * behavior. This function is ensured to be friendly to both IO and memory
> + * accesses. No optimized instructions should be generated by the compiler,
> + * and no unexpected behavior should be observed for IO accesses.

Let's be specific, and brief, and tell user what this does:
Maybe:

	Copy @n bytes from @src to @dst.
	Assumes @src and @dst do not overlap.
	Handles special cases such as uncacheable ramblocks correctly.
	Use this for accessing ramblock in response to DMA/VCPU IO
	and in preference to memcpy.



> + */
> +void qemu_ram_copy(void *dest, const void *src, size_t n);
> +
> +/**
> + * qemu_ram_move: move data to a guest memory block



> + *
> + * @dest: destination where the data is moved to. A pointer to a guest memory
> + *        block returned from ramblock_ptr() or its variants
> + * @src: source where the data is moved from. A pointer to host memory block
> + *       or guest memory block returned from ramblock_ptr() or its variants.
> + * @n: length of data to be copied
> + *
> + * Similar to qemu_ram_copy(), this function is ensured to be friendly to both
> + * IO and memory accesses. No optimized instructions should be generated by
> + * the compiler, and no unexpected behavior should be observed for IO accesses.

unclear how it is different from copy.

> + */
> +void qemu_ram_move(void *dest, const void *src, size_t n);



> 
> > > 
> > > 
> > > > > -- PMM
> > > > > 
> 
> Thanks,
> Gavin



  reply	other threads:[~2026-06-16  4:37 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-15 10:01 [PATCH v2 0/2] system/memory: Make ram device region directly accessible Gavin Shan
2026-06-15 10:01 ` [PATCH v2 1/2] system/memory: Use qemu_ram_{copy, move}() in ram device region accessors Gavin Shan
2026-06-15 10:57   ` Peter Maydell
2026-06-15 14:48     ` Michael S. Tsirkin
2026-06-15 14:56     ` Michael S. Tsirkin
2026-06-15 15:12       ` Peter Maydell
2026-06-15 19:24         ` Gavin Shan
2026-06-15 19:42           ` Michael S. Tsirkin
2026-06-15 21:31             ` Gavin Shan
2026-06-16  4:22               ` Gavin Shan
2026-06-16  4:36                 ` Michael S. Tsirkin [this message]
2026-06-16  4:23               ` Michael S. Tsirkin
2026-06-16  4:48                 ` Richard Henderson
2026-06-16  4:49                   ` Michael S. Tsirkin
2026-06-16  4:55                     ` Gavin Shan
2026-06-16  5:17                       ` Michael S. Tsirkin
2026-06-16  5:21                         ` Gavin Shan
2026-06-16  5:32                           ` Michael S. Tsirkin
2026-06-16  4:59                   ` Michael S. Tsirkin
2026-06-16  5:07                     ` Gavin Shan
2026-06-16  5:25                       ` Michael S. Tsirkin
2026-06-15 19:52         ` Michael S. Tsirkin
2026-06-15 15:17   ` Richard Henderson
2026-06-15 16:33     ` Gavin Shan
2026-06-15 17:03       ` Richard Henderson
2026-06-15 18:09         ` Gavin Shan
2026-06-15 18:33           ` Richard Henderson
2026-06-15 19:40           ` Michael S. Tsirkin
2026-06-16  4:18             ` Gavin Shan
2026-06-15 16:35     ` Michael S. Tsirkin
2026-06-15 16:37       ` Michael S. Tsirkin
2026-06-15 17:05       ` Richard Henderson
2026-06-15 10:01 ` [PATCH v2 2/2] system/memory: Make ram device region directly accessible Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260616002413-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=alex@shazbot.org \
    --cc=berrange@redhat.com \
    --cc=clg@redhat.com \
    --cc=david@kernel.org \
    --cc=dinghui@sangfor.com.cn \
    --cc=gshan@redhat.com \
    --cc=jugraham@redhat.com \
    --cc=liugang24219@sangfor.com.cn \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=philmd@mailo.com \
    --cc=philmd@oss.qualcomm.com \
    --cc=phrdina@redhat.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=shan.gavin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.