From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?IuKAnHRpZWp1bi5jaGVu4oCdIg==?= Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Date: Wed, 24 Jul 2013 10:26:47 +0800 Message-ID: <51EF3B67.5060403@windriver.com> References: <1374127456-9614-1-git-send-email-Bharat.Bhushan@freescale.com> <1374127456-9614-2-git-send-email-Bharat.Bhushan@freescale.com> <51E78A88.4000904@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6AFF@039-SN2MPN1-013.039d.mgd.msft.net> <51E799D0.1030406@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Bhushan Bharat-R65777 , "kvm-ppc@vger.kernel.org" , "kvm@vger.kernel.org" , Wood Scott-B07421 To: Alexander Graf Return-path: In-Reply-To: Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 07/18/2013 06:27 PM, Alexander Graf wrote: > > On 18.07.2013, at 12:19, =E2=80=9Ctiejun.chen=E2=80=9D wrote: > >> On 07/18/2013 06:12 PM, Alexander Graf wrote: >>> >>> On 18.07.2013, at 12:08, =E2=80=9Ctiejun.chen=E2=80=9D wrote: >>> >>>> On 07/18/2013 05:48 PM, Alexander Graf wrote: >>>>> >>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: >>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Bhushan Bharat-R65777 >>>>>>> Sent: Thursday, July 18, 2013 1:53 PM >>>>>>> To: '"=EF=BF=BDtiejun.chen=EF=BF=BD"' >>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de= ; Wood Scott- >>>>>>> B07421 >>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only= for kernel >>>>>>> managed pages >>>>>>> >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: "=EF=BF=BDtiejun.chen=EF=BF=BD" [mailto:tiejun.chen@wind= river.com] >>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM >>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.d= e; Wood >>>>>>>> Scott- >>>>>>>> B07421 >>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency onl= y for >>>>>>>> kernel managed pages >>>>>>>> >>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org >>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "=EF=BF=BD= tiejun.chen=EF=BF=BD" >>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM >>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse= =2Ede; >>>>>>>>>> Wood >>>>>>>>>> Scott- >>>>>>>>>> B07421 >>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency o= nly for >>>>>>>>>> kernel managed pages >>>>>>>>>> >>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: "=EF=BF=BDtiejun.chen=EF=BF=BD" [mailto:tiejun.chen@= windriver.com] >>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM >>>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@su= se.de; >>>>>>>>>>>> Wood >>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777 >>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency= only >>>>>>>>>>>> for kernel managed pages >>>>>>>>>>>> >>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: >>>>>>>>>>>>> If there is a struct page for the requested mapping then = it's >>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cachea= ble) >>>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache >>>>>>>>>>>>> inhibited, >>>>>>>>>>>>> guarded) >>>>>>>>>>>>> >>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned= device >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Bharat Bhushan >>>>>>>>>>>>> --- >>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-= ---- >>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>> index 1c6a9d7..089c227 100644 >>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32 >>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int >>>>>>>>>>>> usermode) >>>>>>>>>>>>> return mas3; >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int >>>>>>>>>>>>> usermode) >>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_= t pfn) >>>>>>>>>>>>> { >>>>>>>>>>>>> + u32 mas2_attr; >>>>>>>>>>>>> + >>>>>>>>>>>>> + mas2_attr =3D mas2 & MAS2_ATTRIB_MASK; >>>>>>>>>>>>> + >>>>>>>>>>>>> + if (!pfn_valid(pfn)) { >>>>>>>>>>>> >>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()? >>>>>>>>>>> >>>>>>>>>>> What I understand from this function (someone can correct m= e) is >>>>>>>>>>> that it >>>>>>>>>> returns "false" when the page is managed by kernel and is no= t >>>>>>>>>> marked as RESERVED (for some reason). For us it does not mat= ter >>>>>>>>>> whether the page is reserved or not, if it is kernel visible= page then it >>>>>>> is DDR. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think you are setting I|G by addressing all mmio pages, ri= ght? If >>>>>>>>>> so, >>>>>>>>>> >>>>>>>>>> KVM: direct mmio pfn check >>>>>>>>>> >>>>>>>>>> Userspace may specify memory slots that are backed by = mmio >>>>>>>>>> pages rather than >>>>>>>>>> normal RAM. In some cases it is not enough to identif= y these >>>>>>>>>> mmio >>>>>>>> pages >>>>>>>>>> by pfn_valid(). This patch adds checking the PageRese= rved as well. >>>>>>>>> >>>>>>>>> Do you know what are those "some cases" and how checking >>>>>>>>> PageReserved helps in >>>>>>>> those cases? >>>>>>>> >>>>>>>> No, myself didn't see these actual cases in qemu,too. But this= should >>>>>>>> be chronically persistent as I understand ;-) >>>>>>> >>>>>>> Then I will wait till someone educate me :) >>>>>> >>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy an= d I do not want to call this for all tlbwe operation unless it is neces= sary. >>>>> >>>>> It certainly does more than we need and potentially slows down th= e fast path (RAM mapping). The only thing it does on top of "if (pfn_va= lid())" is to check for pages that are declared reserved on the host. T= his happens in 2 cases: >>>>> >>>>> 1) Non cache coherent DMA >>>>> 2) Memory hot remove >>>>> >>>>> The non coherent DMA case would be interesting, as with the mecha= nism as it is in place in Linux today, we could potentially break norma= l guest operation if we don't take it into account. However, it's Kconf= ig guarded by: >>>>> >>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE= _COMMON >>>>> default n if PPC_47x >>>>> default y >>>>> >>>>> so we never hit it with any core we care about ;). >>>>> >>>>> Memory hot remove does not exist on e500 FWIW, so we don't have t= o worry about that one either. >>>> >>>> Thanks for this good information :) >>>> >>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm= _is_mmio_pfn() to make sure that check is only valid when that is reall= y needed? This can decrease those unnecessary performance loss. >>>> >>>> If I'm wrong please correct me :) >>> >>> You're perfectly right, but this is generic KVM code. So it gets ru= n across all architectures. What if someone has the great idea to add a= new case here for x86, but doesn't tell us? In that case we potentiall= y break x86. >>> >>> I'd rather not like to break x86 :). >>> >>> However, it'd be very interesting to see a benchmark with this chan= ge. Do you think you could just rip out the whole reserved check and ru= n a few benchmarks and show us the results? >>> >> >> Often what case should be adopted to validate this scenario? > > Something which hammers the TLB emulation heavily. I usually just run= /bin/echo a thousand times in "time" and see how long it takes ;) > I tried to run five times with this combination, "time `for ((i=3D0; i<= 5000;=20 i++)); do /bin/echo; done`", to calculate the average value with this = change: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1580dd4..5e8635b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -102,6 +102,10 @@ static bool largepages_enabled =3D true; bool kvm_is_mmio_pfn(pfn_t pfn) { +#ifdef CONFIG_MEMORY_HOTPLUG + /* + * Currently only in memory hot remove case we may still need t= his. + */ if (pfn_valid(pfn)) { int reserved; struct page *tail =3D pfn_to_page(pfn); @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn) } return PageReserved(tail); } +#endif return true; } Before apply this change: real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5=3D= 1m21.376s user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5=3D= 0m23.433s sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5=3D 0m= 50.349s After apply this change: real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5=3D= 1m20.667s user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5=3D= 0m22.615s sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5=3D 0m= 49.496s So, real (1m20.667s - 1m21.376s)/1m21.376s x 100% =3D -0.6% user (0m22.615s - 0m23.433s)/0m23.433s x 100% =3D -3.5% sys (0m49.496s - 0m50.349s)/0m50.349s x 100% =3D -1.7% Note I only boot one VM. Tiejun