Huge page(contiguous bit) slow down

All of lore.kernel.org
 help / color / mirror / Atom feed

From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: Huge page(contiguous bit) slow down
Date: Wed, 19 Sep 2018 11:29:37 +0100	[thread overview]
Message-ID: <20180919102937.GD25094@arm.com> (raw)
In-Reply-To: <20180918171133.2suapy5le7lfp5gf@armageddon.cambridge.arm.com>

On Tue, Sep 18, 2018 at 06:11:34PM +0100, Catalin Marinas wrote:
> On Tue, Sep 18, 2018 at 05:14:33PM +0100, Will Deacon wrote:
> > On Tue, Sep 18, 2018 at 05:09:28PM +0100, Catalin Marinas wrote:
> > > On Tue, Sep 18, 2018 at 04:16:26PM +0100, Will Deacon wrote:
> > > > One thing I don't really grok is the interaction between the contiguous
> > > > hint and HW_AFDBM. Is it possible for us to be e.g. halfway through the
> > > > set_pte_at() loop and then for the hardware to perform atomic PTE updates
> > > > for entries later in the loop? If so, we've got a race and need to use
> > > > cmpxchg() like we do for the non-contiguous code.
> > > 
> > > With the current code, no, since get_clear_flush() sets all of them to
> > > 0, so no hardware updates before set_pte_at().
> > 
> > The case I'm concerned about is when we've set_pte_at() half of the mapping,
> > though. At this point, a CPU can get a translation via one of the entries
> > that we've put down, and it's not clear to me whether this could establish
> > a contiguous TLB entry which could then result in access/dirty updates to
> > PTEs that we haven't yet written out.
> 
> Ah, I see what you meant. The actual updates are not done based on the
> TLB information but rather the CPU performs a read-modify-write of the
> entry (when an existing TLB entry would give fault; for writes as we
> don't cache the no access flag in the TLB).
> 
> The AF case is simpler as the hardware doesn't cache an AF=0 pte in the
> TLB. For DBM, we could indeed have a contiguous writable+clean (DBM=1,
> AP[2] = 1) entry covering not yet written ptes. The ARM ARM describes
> that the AP[2] bit is updated (cleared) atomically only if the DBM bit
> is set in the pte:
> 
>   If, for a write access, the PE finds that a cached copy of the
>   descriptor in a TLB had the DBM bit set to 1 and the AP[2] or S2AP[1]
>   bit set to the value that forbids writes, then the PE must check that
>   the cached copy is not stale with regard to the descriptor entry in
>   memory, and if necessary perform an atomic read-modify-write update of
>   the descriptor in memory. This applies if the cached copy of the
>   descriptor in a TLB is either:
>   - A stage 1 descriptor in which DBM has the value 1 and AP[2] has the
>     value 1.
>   - A stage 2 descriptor in which DBM has the value 1 and S2AP[1] has
>     the value 0.
> 
> (and there are some further notes in the ARM ARM; I think we don't
> have an issue here)

Thanks for the clarification, I also think we're alright here. The concern
would be if the CPU can, for example, always check the first PTE in the
contiguous range yet perform an atomic update on a different one. I have no
idea why a CPU would do this, so I think the architecture text can just be
interpreted a bit broadly there and I'll see if I can raise a clarification.

However, as you mention off-list, there is a problem with the proposed
solution of using pte_same() only on the first entry of the range. If a
CPU ignores the contiguous hint (as it is permitted to do), then we could
repeatedly fault on a different PTE within the range (for example, if we
are using software dirty-bit) and get stuck.

Something like the diff below might help, but I haven't tested it. What do
you think?

Will

--->8

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 192b3ba07075..edae6774132d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -324,7 +324,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       unsigned long addr, pte_t *ptep,
 			       pte_t pte, int dirty)
 {
-	int ncontig, i, changed = 0;
+	int ncontig, i;
 	size_t pgsize = 0;
 	unsigned long pfn = pte_pfn(pte), dpfn;
 	pgprot_t hugeprot;
@@ -336,9 +336,14 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
 	dpfn = pgsize >> PAGE_SHIFT;
 
+	for (i = 0; i < ncontig; i++)
+		if (!pte_same(huge_ptep_get(ptep + i), pte))
+			break;
+
+	if (i == ncontig)
+		return 0;
+
 	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
-	if (!pte_same(orig_pte, pte))
-		changed = 1;
 
 	/* Make sure we don't lose the dirty state */
 	if (pte_dirty(orig_pte))
@@ -348,7 +353,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
 		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
 
-	return changed;
+	return 1;
 }
 
 void huge_ptep_set_wrprotect(struct mm_struct *mm,

next prev parent reply	other threads:[~2018-09-19 10:29 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-18  3:02 Huge page(contiguous bit) slow down Zhang, Lei
2018-09-18 11:33 ` Will Deacon
2018-09-18 14:58   ` Catalin Marinas
2018-09-18 15:16     ` Will Deacon
2018-09-18 16:09       ` Catalin Marinas
2018-09-18 16:14         ` Will Deacon
2018-09-18 17:11           ` Catalin Marinas
2018-09-18 20:30             ` Steve Capper
2018-09-19 10:29             ` Will Deacon [this message]
2018-09-19 15:37               ` Steve Capper
2018-09-19 16:05                 ` Will Deacon
2018-09-18 20:26       ` Steve Capper
2018-09-18 15:18   ` Punit Agrawal

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:192b3ba0707 dfblob:edae6774132 )
 OR (
bs:"Huge page(contiguous bit) slow down" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180919102937.GD25094@arm.com \
    --to=will.deacon@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.