qemu-devel.nongnu.org archive mirror
From: Paolo Bonzini <pbonzini@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org, thorsten kohfeldt <thorsten.kohfeldt@gmx.de>
Subject: Re: [Qemu-devel] [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
Date: Sat, 22 Oct 2016 05:14:21 -0400 (EDT)
Message-ID: <1810367462.6153203.1477127661599.JavaMail.zimbra@redhat.com>
In-Reply-To: <20161021171100.18049.96340.stgit@gimli.home>



----- Original Message -----
> From: "Alex Williamson" <alex.williamson@redhat.com>
> To: qemu-devel@nongnu.org
> Cc: pbonzini@redhat.com, "thorsten kohfeldt" <thorsten.kohfeldt@gmx.de>
> Sent: Friday, October 21, 2016 7:11:44 PM
> Subject: [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
> 
> With a vfio assigned device we lay down a base MemoryRegion registered
> as an IO region, giving us read & write accessors.  If the region
> supports mmap, we lay down a higher priority sub-region MemoryRegion
> on top of the base layer initialized as a RAM pointer to the mmap.
> Finally, if we have any quirks for the device (i.e., address ranges that
> need additional virtualization support), we put another IO sub-region
> on top of the mmap MemoryRegion.  When this is flattened, we now
> potentially have sub-page mmap MemoryRegions exposed which cannot be
> directly mapped through KVM.
> 
> This is as expected, but a subtle detail of this is that we end up
> with two different access mechanisms through QEMU.  If we disable the
> mmap MemoryRegion, we make use of the IO MemoryRegion and service
> accesses using pread and pwrite to the vfio device file descriptor.
> If the mmap MemoryRegion is enabled and we end up in one of these
> sub-page gaps, QEMU handles the access as RAM, using memcpy to the
> mmap.  Using the mmap through QEMU is a subtle difference, but it's
> fine; the problem is the memcpy.  My assumption is that memcpy makes
> no guarantees about access width and potentially uses all sorts of
> optimized memory transfers that are not intended for talking to device
> MMIO.  It turns out that this has been a problem for Realtek NIC
> assignment, which has such a quirk that creates a sub-page mmap
> MemoryRegion access.
> 
> My proposal to fix this is to leverage the skip_dump flag that we
> already use for special handling of these device-backed MMIO ranges.
> When skip_dump is set for a MemoryRegion, we mark memory access as
> non-direct and automatically insert MemoryRegionOps with basic
> semantics to handle accesses.  Note that we only enable dword
> accesses because some devices don't particularly like qword accesses
> (Realtek NICs are such a device).  This also fixes memory inspection
> via the xp command in the QEMU monitor.
> 
> Please comment.  Is this the best way to solve this problem?  Thanks

Looks good to me.

Paolo

> 
> Reported-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  include/exec/memory.h |    6 ++++--
>  memory.c              |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 10d7eac..a4c3acf 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1464,9 +1464,11 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>  {
>      if (is_write) {
> -        return memory_region_is_ram(mr) && !mr->readonly;
> +        return memory_region_is_ram(mr) &&
> +               !mr->readonly && !memory_region_is_skip_dump(mr);
>      } else {
> -        return memory_region_is_ram(mr) || memory_region_is_romd(mr);
> +        return (memory_region_is_ram(mr) && !memory_region_is_skip_dump(mr)) ||
> +               memory_region_is_romd(mr);
>      }
>  }
>  
> diff --git a/memory.c b/memory.c
> index 58f9269..7ed7ca9 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1136,6 +1136,46 @@ const MemoryRegionOps unassigned_mem_ops = {
>      .endianness = DEVICE_NATIVE_ENDIAN,
>  };
>  
> +static uint64_t skip_dump_mem_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    uint64_t val = (uint64_t)~0;
> +
> +    switch (size) {
> +    case 1:
> +        val = *(uint8_t *)(opaque + addr);
> +        break;
> +    case 2:
> +        val = *(uint16_t *)(opaque + addr);
> +        break;
> +    case 4:
> +        val = *(uint32_t *)(opaque + addr);
> +        break;
> +    }
> +
> +    return val;
> +}
> +
> +static void skip_dump_mem_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
> +{
> +    switch (size) {
> +    case 1:
> +        *(uint8_t *)(opaque + addr) = (uint8_t)data;
> +        break;
> +    case 2:
> +        *(uint16_t *)(opaque + addr) = (uint16_t)data;
> +        break;
> +    case 4:
> +        *(uint32_t *)(opaque + addr) = (uint32_t)data;
> +        break;
> +    }
> +}
> +
> +const MemoryRegionOps skip_dump_mem_ops = {
> +    .read = skip_dump_mem_read,
> +    .write = skip_dump_mem_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
>  bool memory_region_access_valid(MemoryRegion *mr,
>                                  hwaddr addr,
>                                  unsigned size,
> @@ -1366,6 +1406,10 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
>  void memory_region_set_skip_dump(MemoryRegion *mr)
>  {
>      mr->skip_dump = true;
> +    if (mr->ram && mr->ops == &unassigned_mem_ops) {
> +        mr->ops = &skip_dump_mem_ops;
> +        mr->opaque = mr->ram_block->host;
> +    }
>  }
>  
>  void memory_region_init_alias(MemoryRegion *mr,
> 
> 
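
The idea in the quoted patch can be sketched outside QEMU as follows.
This is only an illustrative model under stated assumptions, not QEMU
code: struct toy_region and every toy_* function are hypothetical names,
a calloc'd buffer stands in for the device mmap, and accesses are assumed
to be naturally aligned.  It shows the two pieces the patch combines: a
predicate that refuses "direct" (memcpy-style) access for skip_dump RAM,
and accessors that perform exactly one load or store of the requested
width.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the few MemoryRegion fields the patch consults. */
struct toy_region {
    bool ram;
    bool readonly;
    bool romd;
    bool skip_dump;
    uint8_t *host;      /* backing storage; the mmap in the real case */
};

/* Build a region flagged like a device-backed mmap (ram + skip_dump). */
static struct toy_region toy_region_new_mmio(size_t len)
{
    struct toy_region r = {
        .ram = true,
        .skip_dump = true,
        .host = calloc(1, len),
    };
    return r;
}

/* Same boolean logic as the patched memory_access_is_direct(). */
static bool toy_access_is_direct(const struct toy_region *r, bool is_write)
{
    if (is_write) {
        return r->ram && !r->readonly && !r->skip_dump;
    }
    return (r->ram && !r->skip_dump) || r->romd;
}

/* One load of exactly `size` bytes; other sizes return all ones. */
static uint64_t toy_read(const struct toy_region *r, uint64_t addr,
                         unsigned size)
{
    const volatile uint8_t *p = r->host + addr;

    switch (size) {
    case 1:
        return *p;
    case 2:
        return *(const volatile uint16_t *)p;
    case 4:
        return *(const volatile uint32_t *)p;
    default:
        return ~(uint64_t)0;
    }
}

/* One store of exactly `size` bytes; other sizes are ignored. */
static void toy_write(struct toy_region *r, uint64_t addr, uint64_t data,
                      unsigned size)
{
    volatile uint8_t *p = r->host + addr;

    switch (size) {
    case 1:
        *p = (uint8_t)data;
        break;
    case 2:
        *(volatile uint16_t *)p = (uint16_t)data;
        break;
    case 4:
        *(volatile uint32_t *)p = (uint32_t)data;
        break;
    default:
        break;
    }
}
```

A caller would check toy_access_is_direct() first and fall back to
toy_read()/toy_write() when it returns false, which is roughly the role
the inserted MemoryRegionOps play in the patch.  The volatile qualifier
is an addition of this sketch, used so the compiler keeps each access as
a single load or store of the stated width.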
