From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bharata B Rao Date: Fri, 31 Jul 2020 10:14:40 +0000 Subject: Re: [PATCH] KVM: PPC: Book3S HV: fix a oops in kvmppc_uvmem_page_free() Message-Id: <20200731100240.GC20199@in.ibm.com> List-Id: References: <1596151526-4374-1-git-send-email-linuxram@us.ibm.com> <20200731042940.GA20199@in.ibm.com> <20200731083700.GB5787@oc0525413822.ibm.com> In-Reply-To: <20200731083700.GB5787@oc0525413822.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable To: Ram Pai Cc: ldufour@linux.ibm.com, cclaudio@linux.ibm.com, kvm-ppc@vger.kernel.org, sathnaga@linux.vnet.ibm.com, aneesh.kumar@linux.ibm.com, sukadev@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org, bauerman@linux.ibm.com, david@gibson.dropbear.id.au On Fri, Jul 31, 2020 at 01:37:00AM -0700, Ram Pai wrote: > On Fri, Jul 31, 2020 at 09:59:40AM +0530, Bharata B Rao wrote: > > On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote: > > In our case, device pages that are in use are always associated with a = valid > > pvt member. See kvmppc_uvmem_get_page() which returns failure if it > > runs out of device pfns and that will result in proper failure of > > page-in calls. >=20 > looked at the code, and yes that code path looks correct. So my > reasoning behind the root cause of this bug is incorrect. However the > bug is surfacing and there must be a reason. >=20 > >=20 > > For the case where we run out of device pfns, migrate_vma_finalize() wi= ll > > restore the original PTE and will not replace the PTE with device priva= te PTE. > >=20 > > Also kvmppc_uvmem_page_free() (=DEv_pagemap_ops.page_free()) is never > > called for non-device-private pages. >=20 > Yes. it should not be called. But as seen above in the stack trace, it is= called.=20 >=20 > What would cause the HMM to call ->page_free() on a page that is not > associated with that device's pfn? I believe it is being called for a device private page, you can verify it when you hit it next time? >=20 > >=20 > > This could be a use-after-free case possibly arising out of the new sta= te > > changes in HV. If so, this fix will only mask the bug and not address t= he > > original problem. >=20 > I can verify by rerunning the tests, without the new state changes. But > I do not see how those changes can cause this fault? >=20 > This could also be caused by a duplicate ->page_free() call due to some > bug in the migrate_page path? Could there be a race between > migrate_page() and a page_fault ? >=20 >=20 > Regardless, kvmppc_uvmem_page_free() needs to be fixed. It should not > access contents of pvt, without verifing pvt is valid. We don't expect pvt to be NULL here. Checking for NULL and returning isn't the right fix, I think. Regards, Bharata.