From mboxrd@z Thu Jan 1 00:00:00 1970
From: Scott Wood
Date: Wed, 10 Jul 2013 18:42:42 +0000
Subject: Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
Message-Id: <1373481762.8183.220@snotra>
References: <2750D29D-8CE6-40D3-922D-864F447FEFD8@suse.de>
In-Reply-To: <2750D29D-8CE6-40D3-922D-864F447FEFD8@suse.de> (from agraf@suse.de on Wed Jul 10 05:15:09 2013)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Alexander Graf
Cc: Mihai Caraman, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
>
> On 10.07.2013, at 02:06, Scott Wood wrote:
>
> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
> >> On 09.07.2013, at 20:46, Scott Wood wrote:
> >> > I suspect that tlbsx is faster, or at worst similar. And unlike
> >> > comparing tlbsx to lwepx (not counting a fix for the threading
> >> > problem), we don't already have code to search the guest TLB, so
> >> > testing would be more work.
> >> We have code to walk the guest TLB for TLB misses. This really is
> >> just the TLB miss search without host TLB injection.
> >> So let's say we're using the shadow TLB. The guest always has its
> >> say 64 TLB entries that it can count on - we never evict anything by
> >> accident, because we store all of the 64 entries in our guest TLB
> >> cache. When the guest faults at an address, the first thing we do is
> >> we check the cache whether we have that page already mapped.
> >> However, with this method we now have 2 enumeration methods for
> >> guest TLB searches. We have the tlbsx one which searches the host TLB
> >> and we have our guest TLB cache. The guest TLB cache might still
> >> contain an entry for an address that we already invalidated on the
> >> host. Would that impose a problem?
> >> I guess not because we're swizzling the exit code around to
> >> instead be an instruction miss which means we restore the TLB entry
> >> into our host's TLB so that when we resume, we land here and the
> >> tlbsx hits. But it feels backwards.
> >
> > Any better way? Searching the guest TLB won't work for the LRAT
> > case, so we'd need to have this logic around anyway. We shouldn't
> > add a second codepath unless it's a clear performance gain -- and
> > again, I suspect it would be the opposite, especially if the entry is
> > not in TLB0 or in one of the first few entries searched in TLB1. The
> > tlbsx miss case is not what we should optimize for.
>
> Hrm.
>
> So let's redesign this thing theoretically. We would have an exit
> that requires an instruction fetch. We would override
> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can
> fail because it can't find the TLB entry in the host TLB. When it
> fails, we have to abort the emulation and resume the guest at the
> same IP.
>
> Now the guest gets the TLB miss, we populate, go back into the guest.
> The guest hits the emulation failure again. We go back to
> kvmppc_ld_inst() which succeeds this time and we can emulate the
> instruction.

That's pretty much what this patch does, except that it goes
immediately to the TLB miss code rather than having the extra
round-trip back to the guest. Is there any benefit from adding that
extra round-trip? Rewriting the exit type instead doesn't seem that
bad...

> I think this works. Just make sure that the gateway to the
> instruction fetch is kvmppc_get_last_inst() and make that failable.
> Then the difference between looking for the TLB entry in the host's
> TLB or in the guest's TLB cache is hopefully negligible.

I don't follow here. What does this have to do with looking in the
guest TLB?

-Scott
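
For readers following the thread, the two flows compared above can be sketched as a small user-space C model. This is only an illustration, not kernel code: every name here (toy_vcpu, toy_get_last_inst, toy_handle_itlb_miss, toy_emulate_exit) is hypothetical, a single flag stands in for the host TLB, and the model assumes the TLB miss handler always manages to restore the mapping. It counts how many times the guest is re-entered before the instruction can be fetched and emulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes, not the kernel's actual definitions. */
enum fetch_status { FETCH_OK, FETCH_TLB_MISS };

/* Toy vcpu: one flag stands in for "the host TLB maps the guest PC". */
struct toy_vcpu {
    bool host_tlb_has_pc;
    uint32_t inst_at_pc;
};

/* Failable fetch: models a host TLB search that can miss. */
enum fetch_status toy_get_last_inst(const struct toy_vcpu *v, uint32_t *inst)
{
    if (!v->host_tlb_has_pc)
        return FETCH_TLB_MISS;
    *inst = v->inst_at_pc;
    return FETCH_OK;
}

/* Models the TLB miss path restoring the host mapping (assumed to succeed). */
void toy_handle_itlb_miss(struct toy_vcpu *v)
{
    v->host_tlb_has_pc = true;
}

/*
 * Returns the number of guest re-entries needed before the fetch succeeds.
 * rewrite_exit == true:  treat the failed fetch as an instruction TLB miss
 *                        right away, then resume the guest once.
 * rewrite_exit == false: resume the guest first, let it take the TLB miss
 *                        itself, handle that exit, then resume again.
 */
int toy_emulate_exit(struct toy_vcpu *v, bool rewrite_exit, uint32_t *inst)
{
    int reentries = 0;

    while (toy_get_last_inst(v, inst) != FETCH_OK) {
        if (!rewrite_exit)
            reentries++;        /* extra trip: guest faults on the same PC */
        toy_handle_itlb_miss(v);
        reentries++;            /* resume; guest traps for emulation again */
        assert(reentries <= 2); /* this toy model never needs more */
    }
    return reentries;
}
```

In this model the rewritten-exit variant needs one guest re-entry after a failed fetch and the abort-and-resume variant needs two; when the mapping is already present, both need none. Real costs depend on the hardware and on the actual exit paths, which this sketch does not measure.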