From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christian Ehrhardt Date: Tue, 12 Aug 2008 08:04:20 +0000 Subject: Re: [PATCH 4 of 5] kvm: ppc: Write only modified shadow entries into Message-Id: <48A14404.3090707@linux.vnet.ibm.com> List-Id: References: In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable To: kvm-ppc@vger.kernel.org Liu Yu wrote: >> -----Original Message----- >> From: Christian Ehrhardt [mailto:ehrhardt@linux.vnet.ibm.com]=20 >> Sent: Monday, August 11, 2008 7:48 PM >> To: Liu Yu >> Cc: Hollis Blanchard; kvm-ppc@vger.kernel.org >> Subject: Re: [PATCH 4 of 5] kvm: ppc: Write only modified=20 >> shadow entries into theTLB on exit >> =20 >>> When you set nearly all tlb entries modified in this loop,=20 >>> =20 >> you have to=20 >> =20 >>> write all of them back before entering the guest in=20 >>> =20 >> booke_interrupts.S. >> =20 >>> So, why not "_tlbia()" here outside this loop, and don't=20 >>> =20 >> set them modified. >> =20 >>> Then, you needn't write them back. >>> In fact, the shadow tlb entries are still consistent with=20 >>> =20 >> the hardware, as all of them are invalidated. >> =20 >>> Am I overlooking something ? >>> =20 >>> =20 >> As Hollis wrote in his patch comment usually just one tlb=20 >> entry is modified which saves a lot of updates. >> If we invalidate all we later on would have to satisfy the=20 >> misses we cause by doing so. >> =20 > > In this loop, aren't you invalidating them all? > > =20 >> With some free time we could test _tlbia() here, but I expect=20 >> it to be slower by causing the need for tlb re-population=20 >> compared to what we save here. >> >> =20 > > I don't understand why it will be slower. > I think _tlbia will take the same effect as what you do here. > That is to say, it will always cause the need for tlb re-population. > And what's more, _tlbia will save some time when it enter the guest. > > =20 On 440 which this file 440x_tlb.c is for, the implementation of tlbia is=20 more or less the same than Hollis code when reentering the guest (there=20 is no tlbia instruction on 440) with the addition of a check of the=20 shadow modified flag. I agree with you that the code in the kvmppc_mmu_priv_switch effectively=20 releases all tlbe's to the high water mark, marks them modified and=20 later on pushes them (being invalid) when entering the guest. That means that the part of revoking all tlbe's to the high water mark=20 can be done with tlbia() instead of using the kvmppc_tlbe_set_modified=20 call for all indexes here. I also agree that this can save us memory accesses from 0 to=20 tlb_44x_hwater for the setting of the shadow modified flag, but=20 everything else will be the same. Don't get me wrong I usually embrace=20 any performance improvement, but when you look in the patched source=20 file the is a comment just above the patched section: /* Future optimization: clear only userspace mappings. */ Once we do that (or anything similar if we get other ideas here) keeping=20 the guest kernel mapping we can't use _tlbia() anymore because it clears=20 everyting. That as an example would save us a guest exit which should=20 save more time than some memory accesses. Alltogether thats just my opinion, maybe we both wait for Hollis come=20 back from vacation next week and add a third opinion to this discussion=20 making decisions easier. P.S. as I stated above this code is from the 440 tlb implementation, so=20 if e.g. there is hardware support for it on some platforms that might=20 make that improvement more beneficial we need an abstraction here. --=20 Gr=FCsse / regards,=20 Christian Ehrhardt IBM Linux Technology Center, Open Virtualization