From: Alexander Graf
Subject: Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
Date: Tue, 09 Jul 2013 19:44:32 +0200
To: Scott Wood
Cc: Mihai Caraman, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

On 07/09/2013 07:13 PM, Scott Wood wrote:
> On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
>>
>> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>>
>> > lwepx faults need to be handled by KVM, and this implies additional
>> > code in the DO_KVM macro to identify exceptions originating from
>> > host context. This requires checking the Exception Syndrome Register
>> > (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for
>> > DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the
>> > host.
>> >
>> > Get rid of lwepx and acquire the last instruction in
>> > kvmppc_handle_exit() by searching for the physical address and
>> > kmapping it. This fixes an infinite loop
>>
>> What's the difference in speed for this?
>>
>> Also, could we call lwepx later in host code, when
>> kvmppc_get_last_inst() gets invoked?
>
> Any use of lwepx is problematic unless we want to add overhead to the
> main Linux TLB miss handler.

What exactly would be missing?

I'd also still like to see some performance benchmarks on this to make
sure we're not walking into a bad direction.
>
>> > +		return;
>> > +	}
>> > +
>> > +	mas3 = mfspr(SPRN_MAS3);
>> > +	pr = vcpu->arch.shared->msr & MSR_PR;
>> > +	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
>> > +		/*
>> > +		 * Another thread may rewrite the TLB entry in parallel, don't
>> > +		 * execute from the address if the execute permission is not set
>>
>> Isn't this racy?
>
> Yes, that's the point.  We want to access permissions atomically with
> the address.  If the guest races here, the unpredictable behavior is
> its own fault, but we don't want to make it worse by assuming that the
> new TLB entry is executable just because the old TLB entry was.

I see.

> There's still a potential problem if the instruction at the new TLB
> entry is valid but not something that KVM emulates (because it
> wouldn't have trapped).  Given that the guest is already engaging in
> unpredictable behavior, though, and that it's no longer a security
> issue (it'll just cause the guest to exit), I don't think we need to
> worry too much about it.

No, that case is fine. It's the same as book3s pr.

>
>> > +		 */
>> > +		vcpu->arch.fault_esr = 0;
>> > +		*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
>> > +		return;
>> > +	}
>> > +
>> > +	/* Get page size */
>> > +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
>> > +		psize_shift = PAGE_SHIFT;
>> > +	else
>> > +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
>> > +
>> > +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
>> > +		    mfspr(SPRN_MAS3);
>>
>> You're non-atomically reading MAS3/MAS7 after you've checked for
>> permissions on MAS3. I'm surprised there's no handler that allows
>> MAS3/7 access through the new, combined SPR for 64-bit systems.
>
> There is, but then we'd need to special-case 64-bit systems.
Oh, what I was trying to say is that I'm surprised there's nothing in
Linux already like:

static inline u64 get_mas73(void)
{
#ifdef CONFIG_PPC64
	return mfspr(SPRN_MAS73);
#else
	return ((u64)mfspr(SPRN_MAS7) << 32) | mfspr(SPRN_MAS3);
#endif
}

> Why does atomicity matter here?  The MAS registers were filled in
> when we did the tlbsx.  They are thread-local.  They don't magically
> change just because the other thread rewrites the TLB entry that was
> used to fill them.

Yeah, it doesn't matter.

>
>> > +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
>> > +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
>> > +
>> > +	/* Map a page and get guest's instruction */
>> > +	page = pfn_to_page(addr >> PAGE_SHIFT);
>>
>> So it seems to me like you're jumping through a lot of hoops to make
>> sure this works for LRAT and non-LRAT at the same time. Can't we just
>> treat them as the different things they are?
>>
>> What if we have different MMU backends for LRAT and non-LRAT? The
>> non-LRAT case could then try lwepx and, if that fails, fall back to
>> reading the shadow TLB. For the LRAT case, we'd do lwepx and, if that
>> fails, fall back to this logic.
>
> This isn't about LRAT; it's about hardware threads.  It also fixes the
> handling of execute-only pages on current chips.

On non-LRAT systems we could always check our shadow copy of the
guest's TLB, no?

I'd really like to know what the performance difference would be for
the two approaches.


Alex