From: Paolo Bonzini <pbonzini@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org, thorsten kohfeldt <thorsten.kohfeldt@gmx.de>
Subject: Re: [Qemu-devel] [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
Date: Sat, 22 Oct 2016 05:14:21 -0400 (EDT) [thread overview]
Message-ID: <1810367462.6153203.1477127661599.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20161021171100.18049.96340.stgit@gimli.home>
----- Original Message -----
> From: "Alex Williamson" <alex.williamson@redhat.com>
> To: qemu-devel@nongnu.org
> Cc: pbonzini@redhat.com, "thorsten kohfeldt" <thorsten.kohfeldt@gmx.de>
> Sent: Friday, October 21, 2016 7:11:44 PM
> Subject: [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump
>
> With a vfio assigned device we lay down a base MemoryRegion registered
> as an IO region, giving us read & write accessors. If the region
> supports mmap, we lay down a higher priority sub-region MemoryRegion
> on top of the base layer initialized as a RAM pointer to the mmap.
> Finally, if we have any quirks for the device (ie. address ranges that
> need additional virtualization support), we put another IO sub-region
> on top of the mmap MemoryRegion. When this is flattened, we now
> potentially have sub-page mmap MemoryRegions exposed which cannot be
> directly mapped through KVM.
>
> This is as expected, but a subtle detail of this is that we end up
> with two different access mechanisms through QEMU. If we disable the
> mmap MemoryRegion, we make use of the IO MemoryRegion and service
> accesses using pread and pwrite to the vfio device file descriptor.
> If the mmap MemoryRegion is enabled and we end up in one of these
> sub-page gaps, QEMU handles the access as RAM, using memcpy to the
> mmap. Using the mmap through QEMU is a subtle difference, but it's
> fine, the problem is the memcpy. My assumption is that memcpy makes
> no guarantees about access width and potentially uses all sorts of
> optimized memory transfers that are not intended for talking to device
> MMIO. It turns out that this has been a problem for Realtek NIC
> assignment, which has such a quirk that creates a sub-page mmap
> MemoryRegion access.
>
> My proposal to fix this is to leverage the skip_dump flag that we
> already use for special handling of these device-backed MMIO ranges.
> When skip_dump is set for a MemoryRegion, we mark memory access as
> non-direct and automatically insert MemoryRegionOps with basic
> semantics to handle accesses. Note that we only enable dword
> accesses because some devices don't particularly like qword accesses
> (Realtek NICs are such a device). This actually also fixes memory
> inspection via the xp command in the QEMU monitor as well.
>
> Please comment. Is this the best way to solve this problem? Thanks
Looks good to me.
Paolo
>
> Reported-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> include/exec/memory.h | 6 ++++--
> memory.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 10d7eac..a4c3acf 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1464,9 +1464,11 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t
> addr);
> static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
> {
> if (is_write) {
> - return memory_region_is_ram(mr) && !mr->readonly;
> + return memory_region_is_ram(mr) &&
> + !mr->readonly && !memory_region_is_skip_dump(mr);
> } else {
> - return memory_region_is_ram(mr) || memory_region_is_romd(mr);
> + return (memory_region_is_ram(mr) && !memory_region_is_skip_dump(mr))
> ||
> + memory_region_is_romd(mr);
> }
> }
>
> diff --git a/memory.c b/memory.c
> index 58f9269..7ed7ca9 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1136,6 +1136,46 @@ const MemoryRegionOps unassigned_mem_ops = {
> .endianness = DEVICE_NATIVE_ENDIAN,
> };
>
> +static uint64_t skip_dump_mem_read(void *opaque, hwaddr addr, unsigned size)
> +{
> + uint64_t val = (uint64_t)~0;
> +
> + switch (size) {
> + case 1:
> + val = *(uint8_t *)(opaque + addr);
> + break;
> + case 2:
> + val = *(uint16_t *)(opaque + addr);
> + break;
> + case 4:
> + val = *(uint32_t *)(opaque + addr);
> + break;
> + }
> +
> + return val;
> +}
> +
> +static void skip_dump_mem_write(void *opaque, hwaddr addr, uint64_t data,
> unsigned size)
> +{
> + switch (size) {
> + case 1:
> + *(uint8_t *)(opaque + addr) = (uint8_t)data;
> + break;
> + case 2:
> + *(uint16_t *)(opaque + addr) = (uint16_t)data;
> + break;
> + case 4:
> + *(uint32_t *)(opaque + addr) = (uint32_t)data;
> + break;
> + }
> +}
> +
> +const MemoryRegionOps skip_dump_mem_ops = {
> + .read = skip_dump_mem_read,
> + .write = skip_dump_mem_write,
> + .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> bool memory_region_access_valid(MemoryRegion *mr,
> hwaddr addr,
> unsigned size,
> @@ -1366,6 +1406,10 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
> void memory_region_set_skip_dump(MemoryRegion *mr)
> {
> mr->skip_dump = true;
> + if (mr->ram && mr->ops == &unassigned_mem_ops) {
> + mr->ops = &skip_dump_mem_ops;
> + mr->opaque = mr->ram_block->host;
> + }
> }
>
> void memory_region_init_alias(MemoryRegion *mr,
>
>
prev parent reply other threads:[~2016-10-22 9:14 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-21 17:11 [Qemu-devel] [RFC PATCH] memory: Don't use memcpy for ram marked as skip_dump Alex Williamson
2016-10-22 9:10 ` Thorsten Kohfeldt
2016-10-22 15:09 ` Alex Williamson
2016-10-24 11:05 ` Paolo Bonzini
2016-10-24 15:16 ` Alex Williamson
2016-10-24 21:40 ` Thorsten Kohfeldt
2016-10-22 9:14 ` Paolo Bonzini [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1810367462.6153203.1477127661599.JavaMail.zimbra@redhat.com \
--to=pbonzini@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=thorsten.kohfeldt@gmx.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).