From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bharata B Rao Date: Fri, 31 Jul 2020 04:41:40 +0000 Subject: Re: [PATCH] KVM: PPC: Book3S HV: fix an oops in kvmppc_uvmem_page_free() Message-Id: <20200731042940.GA20199@in.ibm.com> List-Id: References: <1596151526-4374-1-git-send-email-linuxram@us.ibm.com> In-Reply-To: <1596151526-4374-1-git-send-email-linuxram@us.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable To: Ram Pai Cc: ldufour@linux.ibm.com, cclaudio@linux.ibm.com, kvm-ppc@vger.kernel.org, sathnaga@linux.vnet.ibm.com, aneesh.kumar@linux.ibm.com, sukadev@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org, bauerman@linux.ibm.com, david@gibson.dropbear.id.au

On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> Observed the following oops while stress-testing, using multiple
> secure VMs on a distro kernel. However, this issue theoretically exists
> in the 5.5 kernel and later.
>
> This issue occurs when the total number of requested device PFNs exceeds
> the total number of available device PFNs. PFN migration fails to
> allocate a device PFN, which causes migrate_vma_finalize() to trigger
> kvmppc_uvmem_page_free() on a page that is not associated with any
> device PFN. kvmppc_uvmem_page_free() blindly tries to access the
> contents of the private data, which can be NULL, leading to the
> following kernel fault.
>
> --------------------------------------------------------------------------
> Unable to handle kernel paging request for data at address 0x00000011
> Faulting instruction address: 0xc00800000e36e110
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE SMP NR_CPUS 48 NUMA PowerNV
> ....
> MSR: 900000000280b033
> CR: 24424822 XER: 00000000
> CFAR: c000000000e3d764 DAR: 0000000000000011 DSISR: 40000000 IRQMASK: 0
> GPR00: c00800000e36e0a4 c000001f1d59f610 c00800000e38a400 0000000000000000
> GPR04: c000001fa5000000 fffffffffffffffe ffffffffffffffff c000201fffeaf300
> GPR08: 00000000000001f0 0000000000000000 0000000000000f80 c00800000e373608
> GPR12: c000000000e3d710 c000201fffeaf300 0000000000000001 00007fef87360000
> GPR16: 00007fff97db4410 c000201c3b66a578 ffffffffffffffff 0000000000000000
> GPR20: 0000000119db9ad0 000000000000000a fffffffffffffffc 0000000000000001
> GPR24: c000201c3b660000 c000001f1d59f7a0 c0000000004cffb0 0000000000000001
> GPR28: 0000000000000000 c00a001ff003e000 c00800000e386150 0000000000000f80
> NIP [c00800000e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv]
> LR [c00800000e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv]
> Call Trace:
> [c000000000512010] free_devmap_managed_page+0xd0/0x100
> [c0000000003f71d0] put_devmap_managed_page+0xa0/0xc0
> [c0000000004d24bc] migrate_vma_finalize+0x32c/0x410
> [c00800000e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv]
> [c00800000e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv]
> [c00800000e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv]
> [c00800000e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv]
> [c00800000e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv]
> [c00800000e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm]
> [c00800000e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
> [c00800000e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
> [c0000000005451d0] do_vfs_ioctl+0xe0/0xaa0
> [c000000000545d64] sys_ioctl+0xc4/0x160
> [c00000000000b408] system_call+0x5c/0x70
> Instruction dump:
> a12d1174 2f890000 409e0158 a1271172 3929ffff b1271172 7c2004ac 39200000
> 913e0140 39200000 e87d0010 f93d0010 <89230011> e8c30000 e9030008 2f890000
> --------------------------------------------------------------------------
>
> Fix the oops..
>
> Fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
> Signed-off-by: Ram Pai
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 2806983..f4002bf 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page)
>  {
>  	unsigned long pfn = page_to_pfn(page) -
>  			(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
> -	struct kvmppc_uvmem_page_pvt *pvt;
> +	struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
> +
> +	if (!pvt)
> +		return;
>
>  	spin_lock(&kvmppc_uvmem_bitmap_lock);
>  	bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
>  	spin_unlock(&kvmppc_uvmem_bitmap_lock);
>
> -	pvt = page->zone_device_data;
>  	page->zone_device_data = NULL;
>  	if (pvt->remove_gfn)
>  		kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);

In our case, device pages that are in use are always associated with a
valid pvt member. See kvmppc_uvmem_get_page(), which returns failure if it
runs out of device PFNs, and that results in proper failure of the page-in
calls.

For the case where we run out of device PFNs, migrate_vma_finalize() will
restore the original PTE and will not replace the PTE with a device-private
PTE. Also, kvmppc_uvmem_page_free() (= dev_pagemap_ops.page_free()) is
never called for non-device-private pages.

This could be a use-after-free case, possibly arising out of the new state
changes in HV. If so, this fix will only mask the bug and not address the
original problem.

Regards, Bharata.
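To illustrate the allocation discipline described above, here is a minimal
user-space sketch (not the kernel code; the names uvmem_get_page() and
uvmem_page_free() are illustrative analogs of the kernel functions, and the
bitmap size is arbitrary). It models the invariant: the allocator sets the
per-page private data before handing a device PFN out, and fails cleanly on
exhaustion, so the free path should never see a page without valid private
data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_DEVICE_PFNS 4	/* arbitrary pool size for the sketch */

/* Models kvmppc_uvmem_bitmap and page->zone_device_data. */
static bool pfn_bitmap[NR_DEVICE_PFNS];
static void *zone_device_data[NR_DEVICE_PFNS];

/*
 * Analog of kvmppc_uvmem_get_page(): returns a device PFN, or -1 when the
 * pool is exhausted, so the page-in call fails properly instead of handing
 * out a page with no private data.
 */
static int uvmem_get_page(void *pvt)
{
	for (int pfn = 0; pfn < NR_DEVICE_PFNS; pfn++) {
		if (!pfn_bitmap[pfn]) {
			pfn_bitmap[pfn] = true;
			/* pvt is set before the page is handed out */
			zone_device_data[pfn] = pvt;
			return pfn;
		}
	}
	return -1;
}

/*
 * Analog of kvmppc_uvmem_page_free(): under the discipline above, private
 * data can never be NULL for a page that was actually allocated, so a NULL
 * here indicates a state bug rather than pool exhaustion.
 */
static void uvmem_page_free(int pfn)
{
	void *pvt = zone_device_data[pfn];

	assert(pvt != NULL);
	pfn_bitmap[pfn] = false;
	zone_device_data[pfn] = NULL;
}
```

With this discipline, an exhausted pool surfaces as a failed page-in, never
as a free of an unassociated page, which is why a NULL pvt in the free path
points at a deeper state bug rather than something a NULL check should
silently absorb.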