From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bharata B Rao
Date: Fri, 31 Jul 2020 04:41:40 +0000
Subject: Re: [PATCH] KVM: PPC: Book3S HV: fix a oops in kvmppc_uvmem_page_free()
Message-Id: <20200731042940.GA20199@in.ibm.com>
List-Id:
References: <1596151526-4374-1-git-send-email-linuxram@us.ibm.com>
In-Reply-To: <1596151526-4374-1-git-send-email-linuxram@us.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
To: Ram Pai
Cc: ldufour@linux.ibm.com, cclaudio@linux.ibm.com, kvm-ppc@vger.kernel.org,
 sathnaga@linux.vnet.ibm.com, aneesh.kumar@linux.ibm.com,
 sukadev@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org,
 bauerman@linux.ibm.com, david@gibson.dropbear.id.au

On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> Observed the following oops while stress-testing, using multiple
> secureVM on a distro kernel. However this issue theoritically exists in
> 5.5 kernel and later.
>
> This issue occurs when the total number of requested device-PFNs exceed
> the total-number of available device-PFNs. PFN migration fails to
> allocate a device-pfn, which causes migrate_vma_finalize() to trigger
> kvmppc_uvmem_page_free() on a page, that is not associated with any
> device-pfn. kvmppc_uvmem_page_free() blindly tries to access the
> contents of the private data which can be null, leading to the following
> kernel fault.
>
> --------------------------------------------------------------------------
> Unable to handle kernel paging request for data at address 0x00000011
> Faulting instruction address: 0xc00800000e36e110
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> ....
> MSR: 900000000280b033
> CR: 24424822 XER: 00000000
> CFAR: c000000000e3d764 DAR: 0000000000000011 DSISR: 40000000 IRQMASK: 0
> GPR00: c00800000e36e0a4 c000001f1d59f610 c00800000e38a400 0000000000000000
> GPR04: c000001fa5000000 fffffffffffffffe ffffffffffffffff c000201fffeaf300
> GPR08: 00000000000001f0 0000000000000000 0000000000000f80 c00800000e373608
> GPR12: c000000000e3d710 c000201fffeaf300 0000000000000001 00007fef87360000
> GPR16: 00007fff97db4410 c000201c3b66a578 ffffffffffffffff 0000000000000000
> GPR20: 0000000119db9ad0 000000000000000a fffffffffffffffc 0000000000000001
> GPR24: c000201c3b660000 c000001f1d59f7a0 c0000000004cffb0 0000000000000001
> GPR28: 0000000000000000 c00a001ff003e000 c00800000e386150 0000000000000f80
> NIP [c00800000e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv]
> LR [c00800000e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv]
> Call Trace:
> [c000000000512010] free_devmap_managed_page+0xd0/0x100
> [c0000000003f71d0] put_devmap_managed_page+0xa0/0xc0
> [c0000000004d24bc] migrate_vma_finalize+0x32c/0x410
> [c00800000e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv]
> [c00800000e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv]
> [c00800000e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv]
> [c00800000e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv]
> [c00800000e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv]
> [c00800000e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm]
> [c00800000e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
> [c00800000e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
> [c0000000005451d0] do_vfs_ioctl+0xe0/0xaa0
> [c000000000545d64] sys_ioctl+0xc4/0x160
> [c00000000000b408] system_call+0x5c/0x70
> Instruction dump:
> a12d1174 2f890000 409e0158 a1271172 3929ffff b1271172 7c2004ac 39200000
> 913e0140 39200000 e87d0010 f93d0010 <89230011> e8c30000 e9030008 2f890000
> --------------------------------------------------------------------------
>
> Fix the oops..
>
> fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
> Signed-off-by: Ram Pai
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 2806983..f4002bf 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page)
>  {
>  	unsigned long pfn = page_to_pfn(page) -
>  			(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
> -	struct kvmppc_uvmem_page_pvt *pvt;
> +	struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
> +
> +	if (!pvt)
> +		return;
>
>  	spin_lock(&kvmppc_uvmem_bitmap_lock);
>  	bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
>  	spin_unlock(&kvmppc_uvmem_bitmap_lock);
>
> -	pvt = page->zone_device_data;
>  	page->zone_device_data = NULL;
>  	if (pvt->remove_gfn)
>  		kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);

In our case, device pages that are in use are always associated with a valid
pvt member. See kvmppc_uvmem_get_page() which returns failure if it
runs out of device pfns and that will result in proper failure of
page-in calls.

For the case where we run out of device pfns, migrate_vma_finalize() will
restore the original PTE and will not replace the PTE with device private PTE.

Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never
called for non-device-private pages.

This could be a use-after-free case possibly arising out of the new state
changes in HV. If so, this fix will only mask the bug and not address the
original problem.

Regards,
Bharata.
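To make the invariant argued above concrete, here is a minimal userspace sketch in C. It is not the kvm_hv code: model_get_page(), model_page_free(), struct dev_page, struct page_pvt, NR_DEVICE_PFNS and missing_pvt_reports are all hypothetical names invented for illustration. It assumes a fixed pool tracked by a bitmap, an allocator that fails cleanly when the pool is exhausted, and a free path that reports a missing private pointer instead of silently returning, so that a double free or use-after-free shows up rather than being masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_DEVICE_PFNS 4	/* hypothetical pool size */

/* Stand-in for the per-page private data (not the real kvmppc struct). */
struct page_pvt {
	unsigned long gpa;
};

/* Stand-in for a device page carrying a zone_device_data pointer. */
struct dev_page {
	struct page_pvt *zone_device_data;
};

static bool pfn_bitmap[NR_DEVICE_PFNS];		/* true = pfn in use */
static struct dev_page pages[NR_DEVICE_PFNS];
static unsigned int missing_pvt_reports;	/* frees seen without pvt */

/*
 * Allocation side: returns NULL when the pool is exhausted, so the
 * caller can fail the page-in. A page that is handed out always has
 * its private data attached.
 */
static struct dev_page *model_get_page(unsigned long gpa)
{
	for (size_t i = 0; i < NR_DEVICE_PFNS; i++) {
		if (pfn_bitmap[i])
			continue;

		struct page_pvt *pvt = malloc(sizeof(*pvt));
		if (!pvt)
			return NULL;

		pvt->gpa = gpa;
		pfn_bitmap[i] = true;
		pages[i].zone_device_data = pvt;
		return &pages[i];
	}
	return NULL;	/* out of device pfns */
}

/*
 * Free side: a page arriving here without private data breaks the
 * invariant above, so it is counted and reported instead of being
 * silently ignored.
 */
static void model_page_free(struct dev_page *page)
{
	assert(page >= pages && page < pages + NR_DEVICE_PFNS);

	struct page_pvt *pvt = page->zone_device_data;
	if (!pvt) {
		missing_pvt_reports++;
		fprintf(stderr, "page %td freed without private data\n",
			page - pages);
		return;
	}

	pfn_bitmap[page - pages] = false;
	page->zone_device_data = NULL;
	free(pvt);
}
```

In this model, exhausting the pool only makes model_get_page() return NULL; it never produces a page without private data. The only way to reach the reporting branch is to free the same page twice, which is the kind of bug a bare NULL check would hide. In kernel code the analogous choice might be a WARN_ON_ONCE() on the missing pointer, though whether that is appropriate here is for the maintainers to judge.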