From: Paolo Bonzini <pbonzini@redhat.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: KVM devel mailing list <kvm@vger.kernel.org>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Marc Zyngier <marc.zyngier@arm.com>,
Laszlo Ersek <lersek@redhat.com>,
"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings
Date: Thu, 05 Mar 2015 18:43:36 +0100 [thread overview]
Message-ID: <54F895C8.2070306@redhat.com> (raw)
In-Reply-To: <20150305145831.GA11447@e104818-lin.cambridge.arm.com>
On 05/03/2015 15:58, Catalin Marinas wrote:
>> It would especially suck if the user has a cluster with different
>> machines, some of them coherent and others non-coherent, and then has to
>> debug why the same configuration works on some machines and not on others.
>
> That's a problem indeed, especially with guest migration. But I don't
> think we have any sane solution here for the bus master DMA.
I do not oppose doing cache management in QEMU for bus master DMA
(though if the solution you outlined below works it would be great).
> ARM can override them as well but only making them stricter. Otherwise,
> on a weakly ordered architecture, it's not always safe (let's say the
> guest thinks it accesses Strongly Ordered memory and avoids barriers for
> flag updates but the host "upgrades" it to Cacheable which breaks the
> memory order).
The same can happen on x86 though, even if it's rarer. You still need a
barrier between stores and loads.
> If we want the host to enforce guest memory mapping attributes via stage
> 2, we could do it the other way around: get the guests to always assume
> full cache coherency, generating Normal Cacheable mappings, but use the
> stage 2 attributes restriction in the host to make such mappings
> non-cacheable when needed (it works this way on ARM but not in the other
> direction to relax the attributes).
That sounds like a plan for device assignment. But it still would not
solve the problem of the MMIO framebuffer, right?
>> The problem arises with MMIO areas that the guest can reasonably expect
>> to be uncacheable, but that are optimized by the host so that they end
>> up backed by cacheable RAM. It's perfectly reasonable that the same
>> device needs cacheable mapping with one userspace, and works with
>> uncacheable mapping with another userspace that doesn't optimize the
>> MMIO area to RAM.
>
> Unless the guest allocates the framebuffer itself (e.g.
> dma_alloc_coherent), we can't control the cacheability via
> "dma-coherent" properties as it refers to bus master DMA.
Okay, it's good to rule that out. One less thing to think about. :)
Same for _DSD.
> So for MMIO with the buffer allocated by the host (Qemu), the only
> solution I see on ARM is for the host to ensure coherency, either via
> explicit cache maintenance (new KVM API) or by changing the memory
> attributes used by Qemu to access such virtual MMIO.
>
> Basically Qemu is acting as a bus master when reading the framebuffer it
> allocated but the guest considers it a slave access and we don't have a
> way to tell the guest that such accesses should be cacheable, nor can we
> upgrade them via architecture features.
Yes, that's a way to put it.
>> In practice, the VGA framebuffer has an optimization that uses dirty
>> page tracking, so we could piggyback on the ioctls that return which
>> pages are dirty. It turns out that piggybacking on those ioctls also
>> should fix the case of migrating a guest while the MMU is disabled.
>
> Yes, Qemu would need to invalidate the cache before reading a dirty
> framebuffer page.
>
> As I said above, an API that allows non-cacheable mappings for the VGA
> framebuffer in Qemu would also solve the problem. I'm not sure what KVM
> provides here (or whether we can add such API).
Nothing for now; other architectures simply do not have the issue.
As long as it's just VGA, we can quirk it. There's just a couple
vendor/device IDs to catch, and the guest can then use a cacheable mapping.
For a more generic solution, the API would be madvise(MADV_DONTCACHE).
It would be easy for QEMU to use it, but I am not too optimistic about
convincing the mm folks about it. We can try.
Paolo
WARNING: multiple messages have this Message-ID (diff)
From: pbonzini@redhat.com (Paolo Bonzini)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings
Date: Thu, 05 Mar 2015 18:43:36 +0100 [thread overview]
Message-ID: <54F895C8.2070306@redhat.com> (raw)
In-Reply-To: <20150305145831.GA11447@e104818-lin.cambridge.arm.com>
On 05/03/2015 15:58, Catalin Marinas wrote:
>> It would especially suck if the user has a cluster with different
>> machines, some of them coherent and others non-coherent, and then has to
>> debug why the same configuration works on some machines and not on others.
>
> That's a problem indeed, especially with guest migration. But I don't
> think we have any sane solution here for the bus master DMA.
I do not oppose doing cache management in QEMU for bus master DMA
(though if the solution you outlined below works it would be great).
> ARM can override them as well but only making them stricter. Otherwise,
> on a weakly ordered architecture, it's not always safe (let's say the
> guest thinks it accesses Strongly Ordered memory and avoids barriers for
> flag updates but the host "upgrades" it to Cacheable which breaks the
> memory order).
The same can happen on x86 though, even if it's rarer. You still need a
barrier between stores and loads.
> If we want the host to enforce guest memory mapping attributes via stage
> 2, we could do it the other way around: get the guests to always assume
> full cache coherency, generating Normal Cacheable mappings, but use the
> stage 2 attributes restriction in the host to make such mappings
> non-cacheable when needed (it works this way on ARM but not in the other
> direction to relax the attributes).
That sounds like a plan for device assignment. But it still would not
solve the problem of the MMIO framebuffer, right?
>> The problem arises with MMIO areas that the guest can reasonably expect
>> to be uncacheable, but that are optimized by the host so that they end
>> up backed by cacheable RAM. It's perfectly reasonable that the same
>> device needs cacheable mapping with one userspace, and works with
>> uncacheable mapping with another userspace that doesn't optimize the
>> MMIO area to RAM.
>
> Unless the guest allocates the framebuffer itself (e.g.
> dma_alloc_coherent), we can't control the cacheability via
> "dma-coherent" properties as it refers to bus master DMA.
Okay, it's good to rule that out. One less thing to think about. :)
Same for _DSD.
> So for MMIO with the buffer allocated by the host (Qemu), the only
> solution I see on ARM is for the host to ensure coherency, either via
> explicit cache maintenance (new KVM API) or by changing the memory
> attributes used by Qemu to access such virtual MMIO.
>
> Basically Qemu is acting as a bus master when reading the framebuffer it
> allocated but the guest considers it a slave access and we don't have a
> way to tell the guest that such accesses should be cacheable, nor can we
> upgrade them via architecture features.
Yes, that's a way to put it.
>> In practice, the VGA framebuffer has an optimization that uses dirty
>> page tracking, so we could piggyback on the ioctls that return which
>> pages are dirty. It turns out that piggybacking on those ioctls also
>> should fix the case of migrating a guest while the MMU is disabled.
>
> Yes, Qemu would need to invalidate the cache before reading a dirty
> framebuffer page.
>
> As I said above, an API that allows non-cacheable mappings for the VGA
> framebuffer in Qemu would also solve the problem. I'm not sure what KVM
> provides here (or whether we can add such API).
Nothing for now; other architectures simply do not have the issue.
As long as it's just VGA, we can quirk it. There's just a couple
vendor/device IDs to catch, and the guest can then use a cacheable mapping.
For a more generic solution, the API would be madvise(MADV_DONTCACHE).
It would be easy for QEMU to use it, but I am not too optimistic about
convincing the mm folks about it. We can try.
Paolo
next prev parent reply other threads:[~2015-03-05 17:37 UTC|newest]
Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-19 10:54 [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings Ard Biesheuvel
2015-02-19 10:54 ` Ard Biesheuvel
2015-02-19 10:54 ` [RFC/RFT PATCH 1/3] arm64: KVM: handle some sysreg writes in EL2 Ard Biesheuvel
2015-02-19 10:54 ` Ard Biesheuvel
2015-03-03 17:59 ` Mario Smarduch
2015-03-03 17:59 ` Mario Smarduch
2015-02-19 10:54 ` [RFC/RFT PATCH 2/3] arm64: KVM: mangle MAIR register to prevent uncached guest mappings Ard Biesheuvel
2015-02-19 10:54 ` Ard Biesheuvel
2015-02-19 10:54 ` [RFC/RFT PATCH 3/3] arm64: KVM: keep trapping of VM sysreg writes enabled Ard Biesheuvel
2015-02-19 10:54 ` Ard Biesheuvel
2015-02-19 13:40 ` Marc Zyngier
2015-02-19 13:40 ` Marc Zyngier
2015-02-19 13:44 ` Ard Biesheuvel
2015-02-19 13:44 ` Ard Biesheuvel
2015-02-19 15:19 ` Marc Zyngier
2015-02-19 15:19 ` Marc Zyngier
2015-02-19 15:22 ` Ard Biesheuvel
2015-02-19 15:22 ` Ard Biesheuvel
2015-02-19 14:50 ` [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings Alexander Graf
2015-02-19 14:50 ` Alexander Graf
2015-02-19 14:56 ` Ard Biesheuvel
2015-02-19 14:56 ` Ard Biesheuvel
2015-02-19 15:27 ` Alexander Graf
2015-02-19 15:27 ` Alexander Graf
2015-02-19 15:31 ` Ard Biesheuvel
2015-02-19 15:31 ` Ard Biesheuvel
2015-02-19 16:57 ` Andrew Jones
2015-02-19 16:57 ` Andrew Jones
2015-02-19 17:19 ` Ard Biesheuvel
2015-02-19 17:19 ` Ard Biesheuvel
2015-02-19 17:55 ` Andrew Jones
2015-02-19 17:55 ` Andrew Jones
2015-02-19 17:57 ` Paolo Bonzini
2015-02-19 17:57 ` Paolo Bonzini
2015-02-20 14:29 ` Andrew Jones
2015-02-20 14:29 ` Andrew Jones
2015-02-20 14:37 ` Ard Biesheuvel
2015-02-20 14:37 ` Ard Biesheuvel
2015-02-20 15:36 ` Andrew Jones
2015-02-20 15:36 ` Andrew Jones
2015-02-24 14:55 ` Andrew Jones
2015-02-24 14:55 ` Andrew Jones
2015-02-24 17:47 ` Ard Biesheuvel
2015-02-24 17:47 ` Ard Biesheuvel
2015-02-24 19:12 ` Andrew Jones
2015-02-24 19:12 ` Andrew Jones
2015-03-02 16:31 ` Christoffer Dall
2015-03-02 16:31 ` Christoffer Dall
2015-03-02 16:47 ` Paolo Bonzini
2015-03-02 16:47 ` Paolo Bonzini
2015-03-02 16:55 ` Laszlo Ersek
2015-03-02 16:55 ` Laszlo Ersek
2015-03-02 17:05 ` Andrew Jones
2015-03-02 17:05 ` Andrew Jones
2015-03-02 16:48 ` Andrew Jones
2015-03-02 16:48 ` Andrew Jones
2015-03-03 2:20 ` Mario Smarduch
2015-03-03 2:20 ` Mario Smarduch
2015-03-04 11:35 ` Catalin Marinas
2015-03-04 11:35 ` Catalin Marinas
2015-03-04 11:50 ` Ard Biesheuvel
2015-03-04 11:50 ` Ard Biesheuvel
2015-03-04 12:29 ` Catalin Marinas
2015-03-04 12:29 ` Catalin Marinas
2015-03-04 12:43 ` Ard Biesheuvel
2015-03-04 12:43 ` Ard Biesheuvel
2015-03-04 14:12 ` Andrew Jones
2015-03-04 14:12 ` Andrew Jones
2015-03-04 14:29 ` Catalin Marinas
2015-03-04 14:29 ` Catalin Marinas
2015-03-04 14:34 ` Peter Maydell
2015-03-04 14:34 ` Peter Maydell
2015-03-04 17:03 ` Paolo Bonzini
2015-03-04 17:03 ` Paolo Bonzini
2015-03-04 17:28 ` Catalin Marinas
2015-03-04 17:28 ` Catalin Marinas
2015-03-05 10:12 ` Paolo Bonzini
2015-03-05 10:12 ` Paolo Bonzini
2015-03-05 11:04 ` Catalin Marinas
2015-03-05 11:04 ` Catalin Marinas
2015-03-05 11:52 ` Peter Maydell
2015-03-05 11:52 ` Peter Maydell
2015-03-05 12:03 ` Catalin Marinas
2015-03-05 12:03 ` Catalin Marinas
2015-03-05 12:26 ` Paolo Bonzini
2015-03-05 12:26 ` Paolo Bonzini
2015-03-05 14:58 ` Catalin Marinas
2015-03-05 14:58 ` Catalin Marinas
2015-03-05 17:43 ` Paolo Bonzini [this message]
2015-03-05 17:43 ` Paolo Bonzini
2015-03-06 21:08 ` Mario Smarduch
2015-03-06 21:08 ` Mario Smarduch
2015-03-09 14:26 ` Andrew Jones
2015-03-09 14:26 ` Andrew Jones
2015-03-09 15:33 ` Mario Smarduch
2015-03-09 15:33 ` Mario Smarduch
2015-03-05 19:13 ` Ard Biesheuvel
2015-03-05 19:13 ` Ard Biesheuvel
2015-03-06 20:33 ` Mario Smarduch
2015-03-06 20:33 ` Mario Smarduch
2015-02-19 18:44 ` Ard Biesheuvel
2015-02-19 18:44 ` Ard Biesheuvel
2015-03-03 17:34 ` Alexander Graf
2015-03-03 17:34 ` Alexander Graf
2015-03-03 18:13 ` Laszlo Ersek
2015-03-03 18:13 ` Laszlo Ersek
2015-03-03 20:58 ` Andrew Jones
2015-03-03 20:58 ` Andrew Jones
2015-03-03 18:32 ` Catalin Marinas
2015-03-03 18:32 ` Catalin Marinas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54F895C8.2070306@redhat.com \
--to=pbonzini@redhat.com \
--cc=ard.biesheuvel@linaro.org \
--cc=catalin.marinas@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=lersek@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=marc.zyngier@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.