Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

public inbox for kvmarm@lists.cs.columbia.edu

From: Mario Smarduch <m.smarduch@samsung.com>
To: Andrew Jones <drjones@redhat.com>
Cc: KVM devel mailing list <kvm@vger.kernel.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Laszlo Ersek <lersek@redhat.com>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings
Date: Mon, 09 Mar 2015 08:33:13 -0700
Message-ID: <54FDBD39.8060000@samsung.com>
In-Reply-To: <20150309142601.GA4171@hawk.usersys.redhat.com>

On 03/09/2015 07:26 AM, Andrew Jones wrote:
> On Fri, Mar 06, 2015 at 01:08:29PM -0800, Mario Smarduch wrote:
>> On 03/05/2015 09:43 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 05/03/2015 15:58, Catalin Marinas wrote:
>>>>> It would especially suck if the user has a cluster with different
>>>>> machines, some of them coherent and others non-coherent, and then has to
>>>>> debug why the same configuration works on some machines and not on others.
>>>>
>>>> That's a problem indeed, especially with guest migration. But I don't
>>>> think we have any sane solution here for the bus master DMA.
>>>
>>> I do not oppose doing cache management in QEMU for bus master DMA
>>> (though if the solution you outlined below works it would be great).
>>>
>>>> ARM can override them as well but only making them stricter. Otherwise,
>>>> on a weakly ordered architecture, it's not always safe (let's say the
>>>> guest thinks it accesses Strongly Ordered memory and avoids barriers for
>>>> flag updates but the host "upgrades" it to Cacheable which breaks the
>>>> memory order).
>>>
>>> The same can happen on x86 though, even if it's rarer.  You still need a
>>> barrier between stores and loads.
>>>
>>>> If we want the host to enforce guest memory mapping attributes via stage
>>>> 2, we could do it the other way around: get the guests to always assume
>>>> full cache coherency, generating Normal Cacheable mappings, but use the
>>>> stage 2 attributes restriction in the host to make such mappings
>>>> non-cacheable when needed (it works this way on ARM but not in the other
>>>> direction to relax the attributes).
>>>
>>> That sounds like a plan for device assignment.  But it still would not
>>> solve the problem of the MMIO framebuffer, right?
>>>
>>>>> The problem arises with MMIO areas that the guest can reasonably expect
>>>>> to be uncacheable, but that are optimized by the host so that they end
>>>>> up backed by cacheable RAM.  It's perfectly reasonable that the same
>>>>> device needs cacheable mapping with one userspace, and works with
>>>>> uncacheable mapping with another userspace that doesn't optimize the
>>>>> MMIO area to RAM.
>>>>
>>>> Unless the guest allocates the framebuffer itself (e.g.
>>>> dma_alloc_coherent), we can't control the cacheability via
>>>> "dma-coherent" properties as it refers to bus master DMA.
>>>
>>> Okay, it's good to rule that out.  One less thing to think about. :)
>>> Same for _DSD.
>>>
>>>> So for MMIO with the buffer allocated by the host (Qemu), the only
>>>> solution I see on ARM is for the host to ensure coherency, either via
>>>> explicit cache maintenance (new KVM API) or by changing the memory
>>>> attributes used by Qemu to access such virtual MMIO.
>>>>
>>>> Basically Qemu is acting as a bus master when reading the framebuffer it
>>>> allocated but the guest considers it a slave access and we don't have a
>>>> way to tell the guest that such accesses should be cacheable, nor can we
>>>> upgrade them via architecture features.
>>>
>>> Yes, that's a way to put it.
>>>
>>>>> In practice, the VGA framebuffer has an optimization that uses dirty
>>>>> page tracking, so we could piggyback on the ioctls that return which
>>>>> pages are dirty.  It turns out that piggybacking on those ioctls also
>>>>> should fix the case of migrating a guest while the MMU is disabled.
>>>>
>>>> Yes, Qemu would need to invalidate the cache before reading a dirty
>>>> framebuffer page.
>>>>
>>>> As I said above, an API that allows non-cacheable mappings for the VGA
>>>> framebuffer in Qemu would also solve the problem. I'm not sure what KVM
>>>> provides here (or whether we can add such API).
>>>
>>> Nothing for now; other architectures simply do not have the issue.
>>>
>>> As long as it's just VGA, we can quirk it.  There's just a couple
>>> vendor/device IDs to catch, and the guest can then use a cacheable mapping.
>>>
>>> For a more generic solution, the API would be madvise(MADV_DONTCACHE).
>>> It would be easy for QEMU to use it, but I am not too optimistic about
>>> convincing the mm folks about it.  We can try.
> 
> I forgot to list this one in my summary of approaches[*]. This is a
> nice, clean approach. Avoids getting cache maintenance into everything.
> However, besides the difficulty to get it past mm people, it reduces
> performance for any userspace-userspace uses/sharing of the memory.
> userspace-guest requires cache maintenance, but nothing else. Maybe
> that's not an important concern for the few emulated devices that need
> it though.
> 
>>
>> Interested to see the outcome.
>>
>> I was thinking of a very basic memory driver that can provide
>> an uncached memslot to QEMU - in mmap() file operation
>> apply pgprot_uncached to allocated pages, lock them, flush TLB
>> call remap_pfn_range().
> 
> I guess this is the same as the madvise approach, but with a driver.
> KVM could take this approach itself when memslots are added/updated
> with the INCOHERENT flag. Maybe worth some experimental patches to
> find out?

I would work on this, but I'm tied up for the next 3 weeks.
If anyone is interested I can provide base code; I used
it for memory passthrough, although testing may be time consuming.
I think the hurdle here is making sure the kernel doesn't remap these
pages for any reason, such as page migration; locking the pages should
tell the kernel not to touch them. madvise() is the desired solution,
but I suspect it might take a while to get in.
> 
> I'm still thinking about experimenting with the ARM private syscalls
> next though.

Hope it succeeds.
> 
> drew
> 
> [*] http://lists.gnu.org/archive/html/qemu-devel/2015-03/msg01254.html
>
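
For reference, the dirty-log piggyback discussed further up (invalidate
the cache before reading a dirty framebuffer page) comes down to walking
a one-bit-per-page bitmap, such as the one KVM_GET_DIRTY_LOG returns, and
running a maintenance step on each dirty page. Below is a minimal
userspace sketch of just that walk. The names for_each_dirty_page,
dirty_page_fn, visit_log and record_page are made up for illustration and
are not QEMU or KVM APIs; the callback only records page indexes, standing
in for whatever invalidate operation ends up being available.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical hook: called once per dirty page. In a real user this is
 * where the cache invalidate for that page would go. */
typedef void (*dirty_page_fn)(size_t page_index, void *opaque);

/* Walk a dirty bitmap with one bit per page (bit i set means page i is
 * dirty) and call fn for each dirty page. Returns the number of dirty
 * pages visited. */
size_t for_each_dirty_page(const unsigned long *bitmap, size_t npages,
                           dirty_page_fn fn, void *opaque)
{
    size_t visited = 0;

    assert(bitmap != NULL && fn != NULL);

    for (size_t i = 0; i < npages; i++) {
        unsigned long word = bitmap[i / BITS_PER_LONG];

        if (word & (1UL << (i % BITS_PER_LONG))) {
            fn(i, opaque);
            visited++;
        }
    }
    return visited;
}

/* Stand-in callback for illustration: remembers which pages were seen. */
struct visit_log {
    size_t count;
    size_t pages[16];
};

void record_page(size_t page_index, void *opaque)
{
    struct visit_log *seen = opaque;

    if (seen->count < 16)
        seen->pages[seen->count] = page_index;
    seen->count++;
}
```

In an actual consumer the bitmap would come from the dirty-log ioctl for
the framebuffer memslot, and the callback would do the invalidate before
the page contents are read.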
