From: SeongJae Park <sj@kernel.org>
To: Alistair Popple <apopple@nvidia.com>
Cc: SeongJae Park <sj@kernel.org>,
akpm@linux-foundation.org, ajd@linux.ibm.com,
catalin.marinas@arm.com, fbarrat@linux.ibm.com,
iommu@lists.linux.dev, jgg@ziepe.ca, jhubbard@nvidia.com,
kevin.tian@intel.com, kvm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au,
nicolinc@nvidia.com, npiggin@gmail.com, robin.murphy@arm.com,
seanjc@google.com, will@kernel.org, x86@kernel.org,
zhi.wang.linux@gmail.com
Subject: Re: [PATCH v2 3/5] mmu_notifiers: Call invalidate_range() when invalidating TLBs
Date: Thu, 20 Jul 2023 01:31:31 +0000
Message-ID: <20230720013131.1880-1-sj@kernel.org>
In-Reply-To: <877cqvl7vr.fsf@nvdebian.thelocal>
On Thu, 20 Jul 2023 10:52:59 +1000 Alistair Popple <apopple@nvidia.com> wrote:
>
> SeongJae Park <sj@kernel.org> writes:
>
> > Hi Alistair,
> >
> > On Wed, 19 Jul 2023 22:18:44 +1000 Alistair Popple <apopple@nvidia.com> wrote:
> >
> >> The invalidate_range() callback is going to become an architecture-specific
> >> MMU notifier used to keep the TLBs of secondary MMUs, such as an IOMMU, in
> >> sync with the CPU page tables. Currently it is called from code paths
> >> separate from the main CPU TLB invalidations. This can lead to a
> >> secondary TLB not getting invalidated when required, and it makes it hard
> >> to reason about when exactly the secondary TLB is invalidated.
> >>
> >> To fix this, move the notifier call into the architecture-specific TLB
> >> maintenance functions for architectures that have secondary MMUs
> >> requiring explicit software invalidations.
> >>
> >> This fixes an SMMU bug on ARM64. On ARM64, PTE permission upgrades
> >> require a TLB invalidation. This invalidation is done by the
> >> architecture-specific ptep_set_access_flags(), which calls
> >> flush_tlb_page() if required. However, this doesn't call the notifier,
> >> resulting in infinite faults being generated by devices using the SMMU
> >> if they have previously cached a read-only PTE in their TLB.
> >>
> >> Moving the invalidations into the TLB invalidation functions ensures
> >> all invalidations happen at the same time as the CPU invalidation. The
> >> architecture-specific flush_tlb_all() routines do not call the
> >> notifier, as none of the IOMMUs require this.
> >>
> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> >> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> >
> > I found a kernel NULL-dereference issue on the latest mm-unstable tree,
> > and bisection points to the commit of this patch, namely
> > 75c400f82d347af1307010a3e06f3aa5d831d995.
> >
> > To reproduce, I use 'stress-ng --bigheap $(nproc)'. The issue happens as
> > soon as it starts reclaiming memory. I haven't dug into this yet, but I'm
> > reporting it first, since you might already have an idea.
>
> Thanks for the report SJ!
>
> I see the problem: current->mm can (obviously!) be NULL, which is what's
> leading to the NULL dereference. Instead, I think on x86 I need to call
> the notifier when adding the invalidation to the tlbbatch in
> arch_tlbbatch_add_pending(), which is equivalent to what ARM64 does.
>
> The below should fix it. Will do a respin with this.
Thank you for this quick reply! I confirm this fixes my issue.
Tested-by: SeongJae Park <sj@kernel.org>
>
> ---
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 837e4a50281a..79c46da919b9 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -4,6 +4,7 @@
>
>  #include <linux/mm_types.h>
>  #include <linux/sched.h>
> +#include <linux/mmu_notifier.h>
Nit. How about putting it between mm_types.h and sched.h, so that it looks
alphabetically sorted?
>
>  #include <asm/processor.h>
>  #include <asm/cpufeature.h>
> @@ -282,6 +283,7 @@ static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *b
>  {
>  	inc_mm_tlb_gen(mm);
>  	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
>  }
>
>  static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 0b990fb56b66..2d253919b3e8 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1265,7 +1265,6 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>
>  	put_flush_tlb_info();
>  	put_cpu();
> -	mmu_notifier_arch_invalidate_secondary_tlbs(current->mm, 0, -1UL);
>  }
>
>  /*
>
>
Thanks,
SJ