linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: christoffer.dall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
Date: Fri, 16 Aug 2013 10:19:12 -0700	[thread overview]
Message-ID: <20130816171912.GB20246@cbox> (raw)
In-Reply-To: <CAAhSdy3W+q4yk992ajnHBb23SVDqmfe_4BiGfH9XTYkwfVmC-A@mail.gmail.com>

On Fri, Aug 16, 2013 at 12:27:55PM +0530, Anup Patel wrote:
> On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@brainfault.org> wrote:
> > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> > <christoffer.dall@linaro.org> wrote:
> >> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> >>> On 2013-08-15 16:13, Anup Patel wrote:
> >>> > Hi Marc,
> >>> >
> >>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com>
> >>> > wrote:
> >>> >> On 2013-08-15 14:31, Anup Patel wrote:
> >>> >>>
> >>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> >>> >>> <marc.zyngier@arm.com>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Hi Marc,
> >>> >>>>>
> >>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> >>> >>>>> <marc.zyngier@arm.com>
> >>> >>>>> wrote:
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> Hi Anup,
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> >>> >>>>>>> <marc.zyngier@arm.com>
> >>> >>>>>>> wrote:
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> Hi Pranav,
> >>> >>>>>>>>
> >>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> >>> >>>>>>>>> dirty
> >>> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> >>> >>>>>>>>> this,
> >>> >>>>>>>>> we need to flush such dirty content from d-cache so that
> >>> >>>>>>>>> guest
> >>> >>>>>>>>> will see correct contents of guest page when guest MMU is
> >>> >>>>>>>>> disabled.
> >>> >>>>>>>>>
> >>> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> >>> >>>>>>>>> L3-cache.
> >>> >>>>>>>>>
> >>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> >>> >>>>>>>>> <pranavkumar@linaro.org>
> >>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> >>> >>>>>>>>> ---
> >>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> >>> >>>>>>>>>  1 file changed, 2 insertions(+)
> >>> >>>>>>>>>
> >>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> index efe609c..5129038 100644
> >>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> >>> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> >>> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> >>> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> >>> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> >>> >>>>>>>>> +             /* Flush d-cache for systems with external
> >>> >>>>>>>>> caches. */
> >>> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> >>> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> >>> >>>>>>>>> ASID-tagged
> >>> >>>>>>>>> VIVT
> >>> >>>>>>>>> */
> >>> >>>>>>>>>               /* any kind of VIPT cache */
> >>> >>>>>>>>>               __flush_icache_all();
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> [adding Will to the discussion as we talked about this in the
> >>> >>>>>>>> past]
> >>> >>>>>>>>
> >>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit
> >>> >>>>>>>> the
> >>> >>>>>>>> data
> >>> >>>>>>>> cache on each page mapping. It looks overkill.
> >>> >>>>>>>>
> >>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> >>> >>>>>>>> kvmtools
> >>> >>>>>>>> knows which bits of the guest memory have been touched, and
> >>> >>>>>>>> can do a
> >>> >>>>>>>> "DC
> >>> >>>>>>>> DVAC" on this region.
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> >>> >>>>>>> I am
> >>> >>>>>>> sure
> >>> >>>>>>> other architectures don't have such cache cleaning in
> >>> >>>>>>> user-space for
> >>> >>>>>>> KVM.
> >>> >>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> The alternative is do it in the kernel before running any vcpu
> >>> >>>>>>>> - but
> >>> >>>>>>>> that's not very nice either (have to clean the whole of the
> >>> >>>>>>>> guest
> >>> >>>>>>>> memory, which makes a full dcache clean more appealing).
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> >>> >>>>>>> VCPU was
> >>> >>>>>>> our second option but current approach seemed very simple hence
> >>> >>>>>>> we went for this.
> >>> >>>>>>>
> >>> >>>>>>> If more people vote for full d-cache clean upon first run of
> >>> >>>>>>> VCPU then
> >>> >>>>>>> we should revise this patch.
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> Can you please give the attached patch a spin on your HW? I've
> >>> >>>>>> boot-tested
> >>> >>>>>> it on a model, but of course I can't really verify that it fixes
> >>> >>>>>> your
> >>> >>>>>> issue.
> >>> >>>>>>
> >>> >>>>>> As far as I can see, it should fix it without any additional
> >>> >>>>>> flushing.
> >>> >>>>>>
> >>> >>>>>> Please let me know how it goes.
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> >>> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> >>> >>>>> Write-Allocate,
> >>> >>>>> Outer Write-Back Write-Allocate memory"
> >>> >>>>>
> >>> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> >>> >>>>> treated as "Strongly-ordered device memory"
> >>> >>>>>
> >>> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> >>> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> >>> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> >>> >>>> be
> >>> >>>> accessed as device memory.
> >>> >>>>
> >>> >>>>
> >>> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> >>> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> >>> >>>>> KVM ARM64 because we use VGIC.
> >>> >>>>>
> >>> >>>>> IMHO, this patch enforces incorrect memory attribute for Guest/VM
> >>> >>>>> when Stage1 MMU is off.
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> See above. My understanding is that HCR.DC controls the default
> >>> >>>> output of
> >>> >>>> Stage-1, and Stage-2 overrides still apply.
> >>> >>>
> >>> >>>
> >>> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> >>> >>> and wanted guest to decide the memory attribute. In other words,
> >>> >>> you
> >>> >>> did not want to enforce any memory attribute in Stage2.
> >>> >>>
> >>> >>> Please refer to this patch
> >>> >>> https://patchwork.kernel.org/patch/2543201/
> >>> >>
> >>> >>
> >>> >> This patch has never been merged. If you carry on following the
> >>> >> discussion,
> >>> >> you will certainly notice it was dropped for a very good reason:
> >>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> >>> >>
> >>> >> So Stage-2 memory attributes are used, they are not going away, and
> >>> >> they are
> >>> >> essential to the patch I sent this morning.
> >>> >>
> >>> >>
> >>> >>         M.
> >>> >> --
> >>> >> Fast, cheap, reliable. Pick two.
> >>> >
> >>> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> >>> > is
> >>> > provided a DMA-capable device in pass-through mode. The reason being
> >>> > bootloader/firmware typically don't enable MMU and such
> >>> > bootloader/firmware
> >>> > will programme a pass-through DMA-capable device without any flushes
> >>> > to
> >>> > guest RAM (because it has not enabled MMU).
> >>> >
> >>> > A good example of such a device would be SATA AHCI controller given
> >>> > to a
> >>> > Guest/VM as direct access (using SystemMMU) and Guest
> >>> > bootloader/firmware
> >>> > accessing this SATA AHCI controller to load kernel images from SATA
> >>> > disk.
> >>> > In this situation, we will have to hack Guest bootloader/firmware
> >>> > AHCI driver to
> >>> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> >>>
> >>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
> >>> curiosity: is that a made up example or something you actually have?
> >>>
> >>> Back to square one:
> >>> Can you please benchmark the various cache cleaning options (global at
> >>> startup time, per-page on S2 translation fault, and user-space)?
> >>>
> >> Eh, why is this a more valid argument than the vgic?  The device
> >> passthrough Stage-2 mappings would still have the Stage-2 memory
> >> attributes to configure that memory region as device memory.  Why is it
> >> relevant if the device is DMA-capable in this context?
> >>
> >> -Christoffer
> >
> > Most ARM bootloader/firmware run with MMU off hence, they will not do
> > explicit cache flushes when programming DMA-capable device. Now If
> > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> > will not be visible to DMA-capable device.
> >
> > --Anup
> 
> The approach of flushing d-cache by set/way upon first run of VCPU will
> not work because for set/way operations ARM ARM says: "For set/way
> operations, and for All (entire cache) operations, the point is defined to be
> to the next level of caching". In other words, set/way operations work upto
> point of unification.
> 
> Also, Guest Linux already does __flush_dcache_all() from __cpu_setup()
> at bootup time which does not work for us when L3-cache is enabled so,
> there is no point is adding one more __flush_dcache_all() upon first run of
> VCPU in KVM ARM64.

When did anyone suggest using a cache cleaning method that doesn't apply
to this case.  I'm a little confused about your comment here.

> 
> IMHO, we are left with following options:
> 1. Flush all RAM regions of VCPU using __flush_dcache_range()
>     upon first run of VCPU

We could do that, but we have to ensure that no other memory regions can
be added later.  Is this the case?  I don't remember.

> 2. Implement outer-cache framework for ARM64 and flush all
>     caches + outer cache (i.e. L3-cache) upon first run of VCPU

What's the difference between (2) and (1)?

> 3. Use an alternate version of flush_icache_range() which will
>     flush d-cache by PoC instead of PoU. We can also ensure
>     that coherent_icache_guest_page() function will be called
>     upon Stage2 prefetch aborts only.
> 

I'm sorry, but I'm not following this discussion.  Are we not mixing a
discussion along two different axis here?  As I see it there are two
separate issues:

 (A) When/Where to flush the memory regions
 (B) How to flush the memory regions, including the hardware method and
     the kernel software engineering aspect.

-Christoffer

  reply	other threads:[~2013-08-16 17:19 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-14 11:47 [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache Pranavkumar Sawargaonkar
2013-08-14 12:04 ` Marc Zyngier
2013-08-14 14:22   ` Anup Patel
2013-08-14 15:06     ` Alexander Graf
2013-08-14 15:34       ` Marc Zyngier
2013-08-14 15:41         ` Peter Maydell
2013-08-14 15:57           ` Marc Zyngier
2013-08-14 16:36             ` Anup Patel
2013-08-14 15:23     ` Marc Zyngier
2013-08-14 15:35       ` Peter Maydell
2013-08-14 15:49         ` Marc Zyngier
2013-08-14 17:34           ` Christoffer Dall
2013-08-15  4:44             ` Marc Zyngier
2013-08-15 16:58               ` Christoffer Dall
2013-08-14 15:36       ` Anup Patel
2013-08-15  4:52     ` Marc Zyngier
2013-08-15  6:26       ` Anup Patel
2013-08-15  8:31         ` Marc Zyngier
2013-08-15 13:31           ` Anup Patel
2013-08-15 14:47             ` Marc Zyngier
2013-08-15 15:13               ` Anup Patel
2013-08-15 15:37                 ` Marc Zyngier
2013-08-15 15:45                   ` Anup Patel
2013-08-15 16:53                   ` Christoffer Dall
2013-08-16  5:02                     ` Anup Patel
2013-08-16  6:57                       ` Anup Patel
2013-08-16 17:19                         ` Christoffer Dall [this message]
2013-08-16 17:42                           ` Anup Patel
2013-08-16 17:50                             ` Christoffer Dall
2013-08-16 18:06                               ` Christoffer Dall
2013-08-16 18:20                                 ` Anup Patel
2013-08-16 18:11                               ` Anup Patel
2013-08-16 18:20                                 ` Christoffer Dall
2013-08-30  9:52                                 ` Catalin Marinas
2013-08-30 10:44                                   ` Anup Patel
2013-08-30 13:02                                     ` Catalin Marinas
2013-08-30 13:21                                     ` Marc Zyngier
2013-08-30 14:04                                       ` Catalin Marinas
2013-08-30 14:22                                         ` Marc Zyngier
2013-08-30 14:30                                           ` Will Deacon
2013-08-30 14:52                                             ` Anup Patel
2013-08-30 15:12                                             ` Marc Zyngier
2013-08-29 10:52                         ` Catalin Marinas
2013-08-29 12:31                           ` Anup Patel
2013-08-29 12:53                             ` Catalin Marinas
2013-08-29 16:02                               ` Anup Patel
2013-08-30  9:44                                 ` Catalin Marinas
2013-08-30 10:36                                   ` Anup Patel
2013-08-30 12:52                                     ` Catalin Marinas
2013-08-16 17:14                       ` Christoffer Dall
     [not found]                         ` <CALrVBkvEP1Q0mKpv8ViOTLRvW2ks18MQXgmurSBHn+aJcz+=gw@mail.gmail.com>
2013-08-16 17:28                           ` Christoffer Dall
2013-08-16 17:42                             ` Christoffer Dall
2013-08-15  8:39       ` Pranavkumar Sawargaonkar
2013-08-15 15:42         ` Marc Zyngier
2013-08-14 14:23 ` Sudeep KarkadaNagesha
2013-08-14 14:35   ` Anup Patel
2013-08-14 17:37 ` Christoffer Dall
  -- strict thread matches above, loose matches on Subject: below --
2013-08-14 11:45 Pranavkumar Sawargaonkar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130816171912.GB20246@cbox \
    --to=christoffer.dall@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).