All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gavin Shan <gshan@redhat.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, peterx@redhat.com,
	alex@shazbot.org, richard.henderson@linaro.org,
	peter.maydell@linaro.org, berrange@redhat.com,
	philmd@oss.qualcomm.com, philmd@mailo.com, david@kernel.org,
	clg@redhat.com, pbonzini@redhat.com, phrdina@redhat.com,
	jugraham@redhat.com, liugang24219@sangfor.com.cn,
	dinghui@sangfor.com.cn, shan.gavin@gmail.com
Subject: Re: [PATCH v3 0/2] system/memory: Make ram device region directly accessible
Date: Wed, 17 Jun 2026 03:27:23 -0400	[thread overview]
Message-ID: <20260617031443-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <ee0280dc-0c21-44df-a1cb-94c580140491@redhat.com>

On Wed, Jun 17, 2026 at 05:00:45PM +1000, Gavin Shan wrote:
> Hi Michael,
> 
> On 6/17/26 3:52 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2026 at 12:35:00PM +1000, Gavin Shan wrote:
> > > On 6/16/26 3:44 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 16, 2026 at 03:40:34PM +1000, Gavin Shan wrote:
> > > > > On 6/16/26 3:25 PM, Gavin Shan wrote:
> > > > > > All ram device regions was turned to be indirectly accessible by commit
> > > > > > 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads
> > > > > > to a hanged guest where a NVidia GH100 GPU is passed from host. The memory
> > > > > > in its PCI BAR#4 can be allocated as DMA target buffer. qemu has to take
> > > > > > DMA bounce buffer in address_space_map() to cover the DMA request. However,
> > > > > > the bounce buffer size is 4096 bytes and we're overrunning it easily when
> > > > > > the guest has significant disk activities on compiling 'cuda-samples'.
> > > > > > The full log and problem description can be found from PATCH[1/2]'s commit
> > > > > > log.
> > > > > > 
> > > > > > Try to fix the issue handled in commit 4a2e242bbb by replacing memcopy()/
> > > > > > memmove() with newly added helpers qemu_ram_{copy, move}() that works on
> > > > > > top of __builtin_{memcpy, memmove} or unaligned access friendly memory
> > > > > > movement in the accessors to the ram device regions. With this, we can
> > > > > > basically revert that commit to make ram device region directly accessible
> > > > > > again and bypass the bounce buffer in address_space_map() where the guest
> > > > > > hang is caused.
> > > > > > 
> > > > > > PATCH[1] uses qemu_ram_{copy, move}() in ram device region accessors
> > > > > > PATCH[2] makes ram device region directly accessible again
> > > > > > 
> > > > > Michael asked to include below context in the cover letter in v3, but I
> > > > > didn't noticed that before I sent v3 series, appended with them.
> > > > > 
> > > 
> > > Looking at the list of issues (questions) raised by Michael, I don't understand
> > > every one
> > 
> > Gavin, I doubt one should make memory.c changes without understanding the issues
> > it is trying to address.
> > 
> > What is unclear? Ask away.
> > 
> 
> Yeah, absolutely. I need some time to understand all questions or suggestions
> by digging the code a bit, before I'm able to come back to you, but not very
> soon though :-)
> 
> > 
> > > before I'm able to put more time to dig, but I feel this series has
> > > too ambitious goal to cover accesses to all the directly accessible regions
> > > with the newly introduced qemu_ram_{copy, move}. It causes too many behavior
> > > changes and concerns, making this series impossible to land.
> > > 
> > > I would suggest to break down the goal and step back to apply the newly introduced
> > > qemu_ram_{copy, move} to the ram device regions only? It's actually something
> > > proposed by Peter Xu in the earlier replies. Taking address_space_write() as an
> > > example, the indirectly accessible regions are covered by memory_region_dispatch_write()
> > > in (1), the ram device region is covered by qemu_ram_move() in (2), and all other
> > > directly accessible regions are covered by memmove() in (3).
> > > 
> > >    address_space_write
> > >      flatview_write
> > >        flatview_write_continue
> > >          flatview_write_continue_step
> > >            memory_access_size             // (1) indirectly accessible region
> > >            memory_region_dispatch_write
> > >              access_with_adjusted_size
> > >                memory_region_write_accessor
> > >                  mr->ops->write
> > >            qemu_ram_move                  // (2) ram device region
> > >            memmove                        // (3) all other directly accessible regions
> > > 
> > > With the limitation, only the ram device regions in (2) are affected. We're
> > > basically moving the accesses to the ram device region from (1) to (2). No
> > > changes introduced to other types of regions. The goal is to make the ram device
> > > region accessible so that the bounce buffer can be bypassed in DMA path.
> > 
> > Esthetics aside - ram device regions have all the same issues.
> > 
> > Maybe you can limit the scope of the changes,
> > but I doubt you can get out understanding)
> > 
> 
> Yes. Limited to the scope of VFIO and PCI BARs, it depends on how the PCI BARs
> are mapped, pgprot_noncached or pgprot_writecombine. The listed problems and
> concerns are existing on the ram device region exposed with pgprot_noncached.
> However, everything should be just fine if the region is exposed with pgprot_writecombine.
> Am I understanding this correctly?

Not everything)

We have issues around guest RAM access, too.

But yes you can then mostly access device RAM as guest RAM on
arm and x86. I am not sure about power. for power pgprot_writecombine
is cache inhibited and I am not sure e.g. vector instructions
with misaligned addresses behave the same.



> [...]
> 
> Thanks,
> Gavin



      reply	other threads:[~2026-06-17  7:27 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16  5:25 [PATCH v3 0/2] system/memory: Make ram device region directly accessible Gavin Shan
2026-06-16  5:25 ` [PATCH v3 1/2] system/memory: Use qemu_ram_{copy, move}() in ram device region accessors Gavin Shan
2026-06-16  6:17   ` Michael S. Tsirkin
2026-06-16  7:15     ` Gavin Shan
2026-06-16  9:51       ` Michael S. Tsirkin
2026-06-16 12:50         ` Ding Hui
2026-06-16 15:51           ` Michael S. Tsirkin
2026-06-16 23:01             ` Gavin Shan
2026-06-16  5:25 ` [PATCH v3 2/2] system/memory: Make ram device region directly accessible Gavin Shan
2026-06-16  5:36 ` [PATCH v3 0/2] " Michael S. Tsirkin
2026-06-16  5:43   ` Gavin Shan
2026-06-16  5:40 ` Gavin Shan
2026-06-16  5:44   ` Michael S. Tsirkin
2026-06-17  2:35     ` Gavin Shan
2026-06-17  5:52       ` Michael S. Tsirkin
2026-06-17  7:00         ` Gavin Shan
2026-06-17  7:27           ` Michael S. Tsirkin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260617031443-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=alex@shazbot.org \
    --cc=berrange@redhat.com \
    --cc=clg@redhat.com \
    --cc=david@kernel.org \
    --cc=dinghui@sangfor.com.cn \
    --cc=gshan@redhat.com \
    --cc=jugraham@redhat.com \
    --cc=liugang24219@sangfor.com.cn \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=philmd@mailo.com \
    --cc=philmd@oss.qualcomm.com \
    --cc=phrdina@redhat.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=shan.gavin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.