Re: [PATCH 0/6] Fix TLB invalidation on arm64

From: Nicholas Piggin <npiggin@gmail.com>
To: linux-arm-kernel@lists.infradead.org, Will Deacon <will@kernel.org>
Cc: linux-arch@vger.kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Marc Zyngier <maz@kernel.org>
Subject: Re: [PATCH 0/6] Fix TLB invalidation on arm64
Date: Wed, 28 Aug 2019 10:35:24 +1000
Message-ID: <1566947104.2uma6s0pl1.astroid@bobo.none>
In-Reply-To: <20190827131818.14724-1-will@kernel.org>

Will Deacon wrote on August 27, 2019 11:18 pm:
> can actually raise a translation fault on the load instruction because the
> translation can be performed speculatively before the page table update and
> then marked as "faulting" by the CPU. For user PTEs, this is ok because we
> can handle the spurious fault, but for kernel PTEs and intermediate table
> entries this results in a panic().

It sounds like powerpc has the same coherency issue with stores vs loads
from the MMU's page table walker, and it has a barrier called ptesync to
order them.

> We can fix this by reverting 24fe1b0efad4fcdd, but the fun doesn't stop
> there. If we consider the unmap case, then a similar constraint applies to
> ordering subsequent memory accesses after the completion of the TLB
> invalidation, so we also need to add an ISB instruction to
> __flush_tlb_kernel_pgtable(). For user addresses, the exception return
> provides the necessary context synchronisation.
> 
> This then raises an interesting question: if an ISB is required after a TLBI
> instruction to prevent speculative translation of subsequent instructions,
> how is this speculation prevented on concurrent CPUs that receive the
> broadcast TLB invalidation message? Sending and completing a broadcast TLB
> invalidation message does not imply execution of an ISB on the remote CPU,
> however it /does/ require that the remote CPU will no longer make use of any
> old translations because otherwise we wouldn't be able to guarantee that an
> unmapped page could no longer be modified. In this regard, receiving a TLB
> invalidation is in some ways stronger than sending one (where you need the
> ISB).

Similarly with powerpc's tlbie: the sender requires extra barriers!

> So far, so good, but the final piece of the puzzle isn't quite so rosy.
> 
> *** Other architecture maintainers -- start here! ***
> 
> In the case that one CPU maps a page and then sets a flag to tell another
> CPU:
> 
> 	CPU 0
> 	-----
> 
> 	MOV	X0, <valid pte>
> 	STR	X0, [Xptep]	// Store new PTE to page table
> 	DSB	ISHST
> 	ISB
> 	MOV	X1, #1
> 	STR	X1, [Xflag]	// Set the flag
> 
> 	CPU 1
> 	-----
> 
> loop:	LDAR	X0, [Xflag]	// Poll flag with Acquire semantics
> 	CBZ	X0, loop
> 	LDR	X1, [X2]	// Translates using the new PTE
> 
> then the final load on CPU 1 can raise a translation fault for the same
> reasons as mentioned at the start of this description.

powerpc's ptesync instruction is defined to order MMU memory accesses on
all other CPUs. ptesync does not go out to the fabric though. How does
it work then?

Because the MMU coherency problem (at least the one we have) is not that
the load begins to "partially" execute ahead of the store, enough to
kick off a table walk that goes ahead of the store, but not so much
that it violates the regular CPU barriers. It's just that the loads
from the MMU don't participate in the LSU pipeline: they don't snoop
the store queues and aren't inserted into the load queues, so the regular
memory barrier instructions won't see stores from other threads causing
ordering violations.

In your first example, if powerpc just has a normal memory barrier
there instead of a ptesync, it could all execute completely
non-speculatively and in-order but still cause a fault, because the
table walker's loads didn't see the store in the store queue.

From the other side of the fabric you have no such problem. The table
walker is cache coherent apart from the local stores, so we don't need a 
special barrier on the other side. That's why ptesync doesn't broadcast.
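
To make that concrete, here is a toy C model of the behaviour described
above. It is only a sketch under stated assumptions, not a description of
how any POWER implementation works: the names (struct cpu, cpu_store,
cpu_drain, lsu_load, walker_load, walk_ok) are all made up for
illustration, the store queue is assumed to drain in program order, and
cpu_drain() merely stands in for the part of ptesync that makes the PTE
store visible to the table walker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 8
#define SQ_DEPTH  4

/* A pending store sitting in a CPU's private store queue. */
struct store { unsigned addr; uint64_t val; };

/* Hypothetical per-CPU state: only the store queue matters here. */
struct cpu {
    struct store sq[SQ_DEPTH];  /* stores not yet in coherent memory */
    int sq_len;
};

/* Coherent memory, as seen from "the other side of the fabric". */
uint64_t coherent_mem[MEM_WORDS];

/* A store lands in the local store queue first. */
void cpu_store(struct cpu *c, unsigned addr, uint64_t val)
{
    assert(c->sq_len < SQ_DEPTH && addr < MEM_WORDS);
    c->sq[c->sq_len++] = (struct store){ addr, val };
}

/* Drain the queue in order; the toy stand-in for ptesync's effect. */
void cpu_drain(struct cpu *c)
{
    for (int i = 0; i < c->sq_len; i++)
        coherent_mem[c->sq[i].addr] = c->sq[i].val;
    c->sq_len = 0;
}

/* Ordinary LSU load: forwards from the youngest matching queued store. */
uint64_t lsu_load(const struct cpu *c, unsigned addr)
{
    assert(addr < MEM_WORDS);
    for (int i = c->sq_len - 1; i >= 0; i--)
        if (c->sq[i].addr == addr)
            return c->sq[i].val;
    return coherent_mem[addr];
}

/* Table-walker load: reads coherent memory only, never a store queue. */
uint64_t walker_load(unsigned addr)
{
    assert(addr < MEM_WORDS);
    return coherent_mem[addr];
}

/* A walk succeeds if the PTE it reads is non-zero; otherwise it faults. */
bool walk_ok(unsigned pte_addr)
{
    return walker_load(pte_addr) != 0;
}
```

In this model, after cpu_store(&c0, 3, 0x1001) an lsu_load(&c0, 3) on the
same CPU already returns 0x1001, yet walk_ok(3) is still false, with no
speculation or reordering involved. Once cpu_drain(&c0) has run, walk_ok(3)
is true for every CPU, and a flag stored after the PTE becomes visible no
earlier than the PTE itself, which is why nothing extra is needed on the
remote side.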

I would be surprised if ARM's issue is different, but interested to 
hear if it is.

> In reality, code
> such as:
> 
> 	CPU 0				CPU 1
> 	-----				-----
> 	spin_lock(&lock);		spin_lock(&lock);
> 	*ptr = vmalloc(size);		if (*ptr)
> 	spin_unlock(&lock);			foo = **ptr;
> 					spin_unlock(&lock);
> 
> will not trigger the fault because there is an address dependency on
> CPU1 which prevents the speculative translation. However, more exotic
> code where the virtual address is known ahead of time, such as:
> 
> 	CPU 0				CPU 1
> 	-----				-----
> 	spin_lock(&lock);		spin_lock(&lock);
> 	set_fixmap(0, paddr, prot);	if (mapped)
> 	mapped = true;				foo = *fix_to_virt(0);
> 	spin_unlock(&lock);		spin_unlock(&lock);
> 
> could fault.

This is kind of a different issue, or part of a wider one at least.
Consider speculative execution more generally: any branch mispredict can
send us off to crazy town executing instructions using nonsense register
values. CPU0 does not have to be in the picture, nor any kernel page
table modification at all; CPU1 alone will be doing speculative loads
wildly all over the kernel address space and trying to access pages with
no pte.

Yet we don't have to flush the TLB when creating a new kernel mapping, and
we don't get spurious kernel faults. The page table walker won't
install negative entries, at least not "architectural" ones, i.e., ones
that cause faults and require flushing. My guess is ARM is similar, or
you would have seen bigger problems by now?

If you have CPU0 doing a ro->rw upgrade on a kernel PTE, then it may
be possible that another CPU1 would speculatively install a ro TLB entry
and then take a spurious fault on it when attempting to store to it. But
no amount of barriers would help, because CPU1 could have picked up that
TLB entry any time in the past.
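
The two cases above (no negative entries, but possibly stale ro entries)
can be sketched with another toy C model. Again, this is an illustration
under assumptions rather than real hardware behaviour: struct tlb,
tlb_lookup, can_load, can_store and tlb_invalidate are hypothetical names,
pages are identified by a small index, and the TLB is assumed to cache a
translation only when the walk finds a valid PTE.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4

enum perm { PERM_NONE = 0, PERM_RO, PERM_RW };

/* In-memory PTEs shared by all CPUs; PERM_NONE means "not mapped". */
enum perm page_table[NPAGES];

/* Hypothetical per-CPU TLB holding only valid translations. */
struct tlb {
    bool valid[NPAGES];
    enum perm perm[NPAGES];
};

/* On a miss, walk the table; a missing PTE installs nothing. */
enum perm tlb_lookup(struct tlb *t, unsigned vpn)
{
    assert(vpn < NPAGES);
    if (!t->valid[vpn] && page_table[vpn] != PERM_NONE) {
        t->valid[vpn] = true;
        t->perm[vpn] = page_table[vpn];
    }
    return t->valid[vpn] ? t->perm[vpn] : PERM_NONE;
}

/* True if a load through this TLB would not fault. */
bool can_load(struct tlb *t, unsigned vpn)
{
    return tlb_lookup(t, vpn) != PERM_NONE;
}

/* True if a store through this TLB would not fault. */
bool can_store(struct tlb *t, unsigned vpn)
{
    return tlb_lookup(t, vpn) == PERM_RW;
}

/* Drop one cached entry, as a flush or a fault handler might. */
void tlb_invalidate(struct tlb *t, unsigned vpn)
{
    assert(vpn < NPAGES);
    t->valid[vpn] = false;
}
```

With a zeroed struct tlb for CPU1, a (speculative) can_load() on an unmapped
page fails but leaves nothing behind, so once CPU0 sets
page_table[0] = PERM_RW the next access succeeds without any flush. If
instead CPU1 touches a page while it is PERM_RO and CPU0 then upgrades it to
PERM_RW, can_store() on CPU1 keeps failing on the stale entry until
tlb_invalidate() drops it, regardless of what barriers CPU0 executed.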

Thanks,
Nick
