From mboxrd@z Thu Jan 1 00:00:00 1970 From: Leonardo =?ISO-8859-1?Q?Br=E1s?= Date: Tue, 31 Aug 2021 20:30:00 +0000 Subject: Re: [PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode Message-Id: <12346f49d2f9962d654efe4466754c519a98c236.camel@gmail.com> List-Id: References: <20210827040706.517652-1-aik@ozlabs.ru> In-Reply-To: <20210827040706.517652-1-aik@ozlabs.ru> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit To: Alexey Kardashevskiy , linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Hello Alexey, On Fri, 2021-08-27 at 14:07 +1000, Alexey Kardashevskiy wrote: > Since e1a1ef84cd07, pages for TCE tables for KVM guests are allocated > only when needed. This allows skipping any update when clearing TCEs. > This works mostly fine as TCE updates are handled when MMU is enabled. > The realmode handlers fail with H_TOO_HARD when pages are not yet > allocated except when clearing a TCE in which case KVM prints a warning > but proceeds to dereference a NULL pointer which crashes the host OS. > > This has not been caught so far as the change is reasonably new, > POWER9 runs mostly radix which does not use realmode handlers. > With hash, the default TCE table is memset() by QEMU the machine reset > which triggers page faults and the KVM TCE device's > kvm_spapr_tce_fault() > handles those with MMU on. And the huge DMA windows are not cleared > by VMs whicn instead successfully create a DMA window big enough to map > the VM memory 1:1 and then VMs just map everything without clearing. > > This started crashing now as upcoming sriov-under-powervm support added > a mode when a dymanic DMA window not big enough to map the VM memory > 1:1 > but it is used anyway, and the VM now is the first (i.e. not QEMU) to > clear a just created table. Note that the upstream QEMU needs to be > modified to trigger the VM to trigger the host OS crash. > > This replaces WARN_ON_ONCE_RM() with a check and return. > This adds another warning if TCE is not being cleared. > > Cc: Leonardo Bras > Fixes: e1a1ef84cd07 ("KVM: PPC: Book3S: Allocate guest TCEs on demand > too") > Signed-off-by: Alexey Kardashevskiy > --- > > With recent changes in the printk() department, calling pr_err() when > MMU > off causes lockdep lockups which I did not dig any further so we should > start getting rid of the realmode's WARN_ON_ONCE_RM(). > --- >  arch/powerpc/kvm/book3s_64_vio_hv.c | 9 ++++++--- >  1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c > b/arch/powerpc/kvm/book3s_64_vio_hv.c > index 083a4e037718..e5ba96c41f3f 100644 > --- a/arch/powerpc/kvm/book3s_64_vio_hv.c > +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c > @@ -173,10 +173,13 @@ static void kvmppc_rm_tce_put(struct > kvmppc_spapr_tce_table *stt, >         idx -= stt->offset; >         page = stt->pages[idx / TCES_PER_PAGE]; >         /* > -        * page must not be NULL in real mode, > -        * kvmppc_rm_ioba_validate() must have taken care of this. > +        * kvmppc_rm_ioba_validate() allows pages not be allocated if > TCE is > +        * being cleared, otherwise it returns H_TOO_HARD and we skip > this. >          */ > -       WARN_ON_ONCE_RM(!page); > +       if (!page) { > +               WARN_ON_ONCE_RM(tce != 0); > +               return; > +       } >         tbl = kvmppc_page_address(page); >   >         tbl[idx % TCES_PER_PAGE] = tce; That looks reasonable IMHO. FWIW: Reviewed-by: Leonardo Bras