All of lore.kernel.org
 help / color / mirror / Atom feed
From: Piotr Jaroszynski <pjaroszynski@nvidia.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	 Alistair Popple <apopple@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	 John Hubbard <jhubbard@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Breno Leitao <leitao@debian.org>,
	 stable@vger.kernel.org
Subject: Re: [PATCH] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
Date: Tue, 3 Mar 2026 10:40:00 -0800	[thread overview]
Message-ID: <aacohVRfAK46lOjo@box> (raw)
In-Reply-To: <0a10ea33-937a-4294-b9a1-9323c706434d@arm.com>

On Tue, Mar 03, 2026 at 08:38:23AM +0000, Ryan Roberts wrote:
> On 03/03/2026 06:37, Piotr Jaroszynski wrote:
> > contpte_ptep_set_access_flags() compared the gathered ptep_get() value
> > against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
> > from all sub-PTEs in the CONT block, so a dirty sibling can make the
> > target appear already-dirty. When the gathered value matches entry, the
> > function returns 0 even though the target sub-PTE still has PTE_RDONLY
> > set in hardware.
> > 
> > For CPU page-table walks this is benign: with FEAT_HAFDBS the hardware
> > may set AF/dirty on any sub-PTE and the CPU TLB treats the gathered
> > result as authoritative for the entire range. But an SMMU without HTTU
> > (or with HA/HD disabled in CD.TCR) evaluates each descriptor
> > individually and will keep raising F_PERMISSION on the unchanged target
> > sub-PTE, causing an infinite fault loop.
> 
> Ouch; thanks for the fix!
> 
> > 
> > Gathering can therefore cause false no-ops when only a sibling has been
> > updated:
> >  - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
> >  - read faults:  target still lacks PTE_AF
> > 
> > Fix by checking all sub-PTEs' access flags individually (not via the
> > gathered view) before returning no-op, and use the raw target PTE for
> > the write-bit unfold decision. The access-flag mask matches the one
> > used by __ptep_set_access_flags().
> > 
> > Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT
> > range may become the effective cached translation and software must
> > maintain consistent attributes across the range.
> > 
> > Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
> > 
> 
> nit: there shouldn't be whitespace here.
> 
> > Reviewed-by: Alistair Popple <apopple@nvidia.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Breno Leitao <leitao@debian.org>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
> 
> This fix looks good to me:
> 
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks!

> 
> 
> > ---
> >  arch/arm64/mm/contpte.c | 47 +++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 43 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > index bcac4f55f9c1..9868bfe4607c 100644
> > --- a/arch/arm64/mm/contpte.c
> > +++ b/arch/arm64/mm/contpte.c
> > @@ -390,6 +390,23 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
> >  }
> >  EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);
> >  
> > +static bool contpte_all_subptes_match_access_flags(pte_t *ptep, pte_t entry)
> > +{
> > +	pte_t *cont_ptep = contpte_align_down(ptep);
> > +	const pteval_t access_mask = PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
> > +	pteval_t entry_access = pte_val(entry) & access_mask;
> > +	int i;
> > +
> > +	for (i = 0; i < CONT_PTES; i++) {
> > +		pteval_t pte_access = pte_val(__ptep_get(cont_ptep + i)) & access_mask;
> > +
> > +		if (pte_access != entry_access)
> > +			return false;
> > +	}
> 
> There are 2 forms of "dirty"; HW and SW. Here you are testing that all ptes in
> the contpte block have the same form of dirty, which I think is the correct
> thing to do. You could relax to just test that every pte has one of the forms of
> dirty, But in that case, if a pte is sw-dirty but not hw-dirty, then the
> PTE_RDONLY bit remains set and the SMMU will fault, I think?

Yes, for SMMU we need to make sure the HW-dirty state is correct or it
will keep faulting. And while we are doing it, it makes sense to just
update all the bits to be consistent.

> 
> If my reasoning is correct, then I think arm64 hugetlb has a similar bug; See
> __cont_access_flags_changed(), which just checks for any form of dirty. So I
> guess hugetlb is buggy in the same way and should be fixed to use this more
> stringent approach?

Given sw-dirty is managed by sw, is it correct for sw to ever create a
PTE that's sw-dirty but not hw-dirty? If not, then I think it will still
work fine for the SMMU case as sw-dirty implies hw-dirty, and if it's
missing then we will set both. But for thoroughness it could make sense
to be stricter and add some comments there as it does feel a little
fragile. I'm very new to this area though so probably best for others to
comment and tackle this.

Thanks,
Piotr


> 
> Thanks,
> Ryan
> 
> > +
> > +	return true;
> > +}
> > +
> >  int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> >  					unsigned long addr, pte_t *ptep,
> >  					pte_t entry, int dirty)
> > @@ -399,13 +416,35 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> >  	int i;
> >  
> >  	/*
> > -	 * Gather the access/dirty bits for the contiguous range. If nothing has
> > -	 * changed, its a noop.
> > +	 * Check whether all sub-PTEs in the CONT block already have the
> > +	 * requested access flags, using raw per-PTE values rather than the
> > +	 * gathered ptep_get() view.
> > +	 *
> > +	 * ptep_get() gathers AF/dirty state across the whole CONT block,
> > +	 * which is correct for CPU TLB semantics: with FEAT_HAFDBS the
> > +	 * hardware may set AF/dirty on any sub-PTE and the CPU TLB treats
> > +	 * the gathered result as authoritative for the entire range. But an
> > +	 * SMMU without HTTU (or with HA/HD disabled in CD.TCR) evaluates
> > +	 * each descriptor individually and will keep faulting on the target
> > +	 * sub-PTE if its flags haven't actually been updated. Gathering can
> > +	 * therefore cause false no-ops when only a sibling has been updated:
> > +	 *  - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
> > +	 *  - read faults:  target still lacks PTE_AF
> > +	 *
> > +	 * Per Arm ARM (DDI 0487) D8.7.1, any sub-PTE in a CONT range may
> > +	 * become the effective cached translation, so all entries must have
> > +	 * consistent attributes. Check the full CONT block before returning
> > +	 * no-op, and when any sub-PTE mismatches, proceed to update the whole
> > +	 * range.
> >  	 */
> > -	orig_pte = pte_mknoncont(ptep_get(ptep));
> > -	if (pte_val(orig_pte) == pte_val(entry))
> > +	if (contpte_all_subptes_match_access_flags(ptep, entry))
> >  		return 0;
> >  
> > +	/*
> > +	 * Use raw target pte (not gathered) for write-bit unfold decision.
> > +	 */
> > +	orig_pte = pte_mknoncont(__ptep_get(ptep));
> > +
> >  	/*
> >  	 * We can fix up access/dirty bits without having to unfold the contig
> >  	 * range. But if the write bit is changing, we must unfold.
> 
> 


  reply	other threads:[~2026-03-03 18:40 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-03  6:37 [PATCH] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults Piotr Jaroszynski
2026-03-03  7:19 ` James Houghton
2026-03-03 12:45   ` Jason Gunthorpe
2026-03-03 21:40   ` Piotr Jaroszynski
2026-03-05  4:31     ` James Houghton
2026-03-03  8:38 ` Ryan Roberts
2026-03-03 18:40   ` Piotr Jaroszynski [this message]
2026-03-03 19:12     ` Jason Gunthorpe
2026-03-04 12:20       ` Ryan Roberts
2026-03-04 13:44         ` Jason Gunthorpe
2026-03-04 11:17 ` Catalin Marinas
2026-03-04 13:43   ` Jason Gunthorpe
2026-03-04 15:01     ` Catalin Marinas
2026-03-04 15:39       ` Jason Gunthorpe
2026-03-04 17:16         ` Piotr Jaroszynski
2026-03-04 17:25         ` Catalin Marinas
2026-03-04 17:37 ` Breno Leitao
2026-03-05 17:33 ` Catalin Marinas
2026-03-05 22:49   ` Piotr Jaroszynski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aacohVRfAK46lOjo@box \
    --to=pjaroszynski@nvidia.com \
    --cc=apopple@nvidia.com \
    --cc=catalin.marinas@arm.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=leitao@debian.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=will@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.