Huge page(contiguous bit) slow down

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: Huge page(contiguous bit) slow down
Date: Wed, 19 Sep 2018 11:29:37 +0100	[thread overview]
Message-ID: <20180919102937.GD25094@arm.com> (raw)
In-Reply-To: <20180918171133.2suapy5le7lfp5gf@armageddon.cambridge.arm.com>

On Tue, Sep 18, 2018 at 06:11:34PM +0100, Catalin Marinas wrote:
> On Tue, Sep 18, 2018 at 05:14:33PM +0100, Will Deacon wrote:
> > On Tue, Sep 18, 2018 at 05:09:28PM +0100, Catalin Marinas wrote:
> > > On Tue, Sep 18, 2018 at 04:16:26PM +0100, Will Deacon wrote:
> > > > One thing I don't really grok is the interaction between the contiguous
> > > > hint and HW_AFDBM. Is it possible for us to be e.g. halfway through the
> > > > set_pte_at() loop and then for the hardware to perform atomic PTE updates
> > > > for entries later in the loop? If so, we've got a race and need to use
> > > > cmpxchg() like we do for the non-contiguous code.
> > > 
> > > With the current code, no, since get_clear_flush() sets all of them to
> > > 0, so no hardware updates before set_pte_at().
> > 
> > The case I'm concerned about is when we've set_pte_at() half of the mapping,
> > though. At this point, a CPU can get a translation via one of the entries
> > that we've put down, and it's not clear to me whether this could establish
> > a contiguous TLB entry which could then result in access/dirty updates to
> > PTEs that we haven't yet written out.
> 
> Ah, I see what you meant. The actual updates are not done based on the
> TLB information but rather the CPU performs a read-modify-write of the
> entry (when an existing TLB entry would give fault; for writes as we
> don't cache the no access flag in the TLB).
> 
> The AF case is simpler as the hardware doesn't cache an AF=0 pte in the
> TLB. For DBM, we could indeed have a contiguous writable+clean (DBM=1,
> AP[2] = 1) entry covering not yet written ptes. The ARM ARM describes
> that the AP[2] bit is updated (cleared) atomically only if the DBM bit
> is set in the pte:
> 
>   If, for a write access, the PE finds that a cached copy of the
>   descriptor in a TLB had the DBM bit set to 1 and the AP[2] or S2AP[1]
>   bit set to the value that forbids writes, then the PE must check that
>   the cached copy is not stale with regard to the descriptor entry in
>   memory, and if necessary perform an atomic read-modify-write update of
>   the descriptor in memory. This applies if the cached copy of the
>   descriptor in a TLB is either:
>   - A stage 1 descriptor in which DBM has the value 1 and AP[2] has the
>     value 1.
>   - A stage 2 descriptor in which DBM has the value 1 and S2AP[1] has
>     the value 0.
> 
> (and there are some further notes in the ARM ARM; I think we don't
> have an issue here)

Thanks for the clarification, I also think we're alright here. The concern
would be if the CPU can, for example, always check the first PTE in the
contiguous range yet perform an atomic update on a different one. I have no
idea why a CPU would do this, so I think the architecture text can just be
interpreted a bit broadly there and I'll see if I can raise a clarification.

However, as you mention off-list, there is a problem with the proposed
solution of using pte_same() only on the first entry of the range. If a
CPU ignores the contiguous hint (as it is permitted to do), then we could
repeatedly fault on a different PTE within the range (for example, if we
are using software dirty-bit) and get stuck.

Something like the diff below might help, but I haven't tested it. What do
you think?

Will

--->8

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 192b3ba07075..edae6774132d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -324,7 +324,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       unsigned long addr, pte_t *ptep,
 			       pte_t pte, int dirty)
 {
-	int ncontig, i, changed = 0;
+	int ncontig, i;
 	size_t pgsize = 0;
 	unsigned long pfn = pte_pfn(pte), dpfn;
 	pgprot_t hugeprot;
@@ -336,9 +336,14 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
 	dpfn = pgsize >> PAGE_SHIFT;
 
+	for (i = 0; i < ncontig; i++)
+		if (!pte_same(huge_ptep_get(ptep + i), pte))
+			break;
+
+	if (i == ncontig)
+		return 0;
+
 	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
-	if (!pte_same(orig_pte, pte))
-		changed = 1;
 
 	/* Make sure we don't lose the dirty state */
 	if (pte_dirty(orig_pte))
@@ -348,7 +353,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
 		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
 
-	return changed;
+	return 1;
 }
 
 void huge_ptep_set_wrprotect(struct mm_struct *mm,

next prev parent reply	other threads:[~2018-09-19 10:29 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-18  3:02 Huge page(contiguous bit) slow down Zhang, Lei
2018-09-18 11:33 ` Will Deacon
2018-09-18 14:58   ` Catalin Marinas
2018-09-18 15:16     ` Will Deacon
2018-09-18 16:09       ` Catalin Marinas
2018-09-18 16:14         ` Will Deacon
2018-09-18 17:11           ` Catalin Marinas
2018-09-18 20:30             ` Steve Capper
2018-09-19 10:29             ` Will Deacon [this message]
2018-09-19 15:37               ` Steve Capper
2018-09-19 16:05                 ` Will Deacon
2018-09-18 20:26       ` Steve Capper
2018-09-18 15:18   ` Punit Agrawal

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:192b3ba0707 dfblob:edae6774132 )
 OR (
bs:"Huge page(contiguous bit) slow down" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180919102937.GD25094@arm.com \
    --to=will.deacon@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).