linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM
@ 2016-04-13 15:01 Catalin Marinas
  2016-06-07 14:13 ` Alexander Graf
  0 siblings, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2016-04-13 15:01 UTC (permalink / raw)
  To: linux-arm-kernel

When hardware updates of the access and dirty states are enabled, the
default ptep_set_access_flags() implementation based on calling
set_pte_at() directly is potentially racy. This triggers the "racy dirty
state clearing" warning in set_pte_at() because an existing writable PTE
is overridden with a clean entry.

There are two main scenarios for this situation:

1. The CPU getting an access fault does not support hardware updates of
   the access/dirty flags. However, a different agent in the system
   (e.g. SMMU) can do this, therefore overriding a writable entry with a
   clean one could potentially lose the automatically updated dirty
   status

2. A more complex situation is possible when all CPUs support hardware
   AF/DBM:

   a) Initial state: shareable + writable vma and pte_none(pte)
   b) Read fault taken by two threads of the same process on different
      CPUs
   c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
      eventually reaches do_set_pte() which sets a writable + clean pte.
      CPU0 releases the mmap_sem
   d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
      pte entry it reads is present, writable and clean and it continues
      to pte_mkyoung()
   e) CPU1 calls ptep_set_access_flags()

   If between (d) and (e) the hardware (another CPU) updates the dirty
   state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
   marking the entry clean again.

This patch implements an arm64-specific ptep_set_access_flags() function
to perform an atomic update of the PTE flags.

Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.3+
---
 arch/arm64/include/asm/pgtable.h |  5 ++++
 arch/arm64/mm/fault.c            | 49 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 989fef16d461..12ecc1f398d0 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -526,6 +526,11 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 }
 
 #ifdef CONFIG_ARM64_HW_AFDBM
+#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+extern int ptep_set_access_flags(struct vm_area_struct *vma,
+				 unsigned long address, pte_t *ptep,
+				 pte_t entry, int dirty);
+
 /*
  * Atomic pte/pmd modifications.
  */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 95df28bc875f..639083f901cd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -81,6 +81,55 @@ void show_pte(struct mm_struct *mm, unsigned long addr)
 	printk("\n");
 }
 
+#ifdef CONFIG_ARM64_HW_AFDBM
+/*
+ * It only sets the access flags (dirty, accessed), as well as write
+ * permission, and only to a more permissive setting. This function needs to
+ * cope with hardware update of the accessed/dirty state by other agents in
+ * the system. It can safely skip the __sync_icache_dcache() call as in
+ * set_pte_at() since the PTE is never changed from no-exec to exec by this
+ * function.
+ * It returns whether the PTE actually changed.
+ */
+int ptep_set_access_flags(struct vm_area_struct *vma,
+			  unsigned long address, pte_t *ptep,
+			  pte_t entry, int dirty)
+{
+	pteval_t old_pteval;
+	unsigned int tmp;
+
+	if (pte_same(*ptep, entry))
+		return 0;
+
+	/* only preserve the access flags and write permission */
+	pte_val(entry) &= PTE_AF | PTE_WRITE | PTE_DIRTY;
+
+	/*
+	 * PTE_RDONLY is cleared by default in the asm below, so set it in
+	 * back if necessary (read-only or clean PTE).
+	 */
+	if (!pte_write(entry) || !dirty)
+		pte_val(entry) |= PTE_RDONLY;
+
+	/*
+	 * Setting the flags must be done atomically to avoid racing with the
+	 * hardware update of the access/dirty state.
+	 */
+	asm volatile("//	ptep_set_access_flags\n"
+	"	prfm	pstl1strm, %2\n"
+	"1:	ldxr	%0, %2\n"
+	"	and	%0, %0, %3		// clear PTE_RDONLY\n"
+	"	orr	%0, %0, %4		// set flags\n"
+	"	stxr	%w1, %0, %2\n"
+	"	cbnz	%w1, 1b\n"
+	: "=&r" (old_pteval), "=&r" (tmp), "+Q" (pte_val(*ptep))
+	: "L" (~PTE_RDONLY), "r" (pte_val(entry)));
+
+	flush_tlb_fix_spurious_fault(vma, address);
+	return 1;
+}
+#endif
+
 /*
  * The kernel tried to access some page that wasn't present.
  */

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM
  2016-04-13 15:01 [PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
@ 2016-06-07 14:13 ` Alexander Graf
  2016-06-07 16:30   ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Alexander Graf @ 2016-06-07 14:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 13.04.16 17:01, Catalin Marinas wrote:
> When hardware updates of the access and dirty states are enabled, the
> default ptep_set_access_flags() implementation based on calling
> set_pte_at() directly is potentially racy. This triggers the "racy dirty
> state clearing" warning in set_pte_at() because an existing writable PTE
> is overridden with a clean entry.
> 
> There are two main scenarios for this situation:
> 
> 1. The CPU getting an access fault does not support hardware updates of
>    the access/dirty flags. However, a different agent in the system
>    (e.g. SMMU) can do this, therefore overriding a writable entry with a
>    clean one could potentially lose the automatically updated dirty
>    status
> 
> 2. A more complex situation is possible when all CPUs support hardware
>    AF/DBM:
> 
>    a) Initial state: shareable + writable vma and pte_none(pte)
>    b) Read fault taken by two threads of the same process on different
>       CPUs
>    c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>       eventually reaches do_set_pte() which sets a writable + clean pte.
>       CPU0 releases the mmap_sem
>    d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>       pte entry it reads is present, writable and clean and it continues
>       to pte_mkyoung()
>    e) CPU1 calls ptep_set_access_flags()
> 
>    If between (d) and (e) the hardware (another CPU) updates the dirty
>    state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
>    marking the entry clean again.
> 
> This patch implements an arm64-specific ptep_set_access_flags() function
> to perform an atomic update of the PTE flags.
> 
> Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits")
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Ming Lei <tom.leiming@gmail.com>
> Tested-by: Julien Grall <julien.grall@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: <stable@vger.kernel.org> # 4.3+

This patch breaks swapping for me.

I've hit weird issues where systems stopped working half-way, with the
kernel still being fine and user space applications just stopping to
respond.

After some debugging we found out that it always happens when swapping
(to anything, backing storage doesn't matter). A quick bisect points to
this commit as culprit and indeed, if I disable CONFIG_ARM64_HW_AFDBM
the system works as expected.

For reference, here's my test case:

  $ qemu-system-aarch64 -nographic -M virt -cpu host -m 800M -kernel
Image  -initrd initrd.test -enable-kvm -append rd.break=pre-mount\
loglevel=9

Inside the VM:

  $ modprobe zram; echo $(( 256 * 1024 * 1024 )) >
/sys/block/zram0/disksize; mkswap /dev/zram0; swapon /dev/zram0
  $ dd if=/dev/zero of=/dev/null bs=700M &
  $ top

In the broken case, you'll see either systemd cpu time spike (because
it's stuck in a page fault loop) or the system hang (because the
application owning the screen is stuck in a page fault loop).

The back traces indicate that the page fault handler goes through and
the process just keeps banging on the same page fault over and over
again. I have not yet figured out what *exactly* is going wrong or why
this patch would actually give us that effect.

I was able to fully reproduce the issue with current Linus tree (4.7-rc2+).


Alex

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM
  2016-06-07 14:13 ` Alexander Graf
@ 2016-06-07 16:30   ` Will Deacon
  2016-06-07 16:34     ` Catalin Marinas
  2016-06-07 16:38     ` Alexander Graf
  0 siblings, 2 replies; 5+ messages in thread
From: Will Deacon @ 2016-06-07 16:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Alex,

Thanks for the bug report.

On Tue, Jun 07, 2016 at 04:13:03PM +0200, Alexander Graf wrote:
> On 13.04.16 17:01, Catalin Marinas wrote:
> > When hardware updates of the access and dirty states are enabled, the
> > default ptep_set_access_flags() implementation based on calling
> > set_pte_at() directly is potentially racy. This triggers the "racy dirty
> > state clearing" warning in set_pte_at() because an existing writable PTE
> > is overridden with a clean entry.

[...]

> This patch breaks swapping for me.
> 
> I've hit weird issues where systems stopped working half-way, with the
> kernel still being fine and user space applications just stopping to
> respond.
> 
> After some debugging we found out that it always happens when swapping
> (to anything, backing storage doesn't matter). A quick bisect points to
> this commit as culprit and indeed, if I disable CONFIG_ARM64_HW_AFDBM
> the system works as expected.

[...]

> The back traces indicate that the page fault handler goes through and
> the process just keeps banging on the same page fault over and over
> again. I have not yet figured out what *exactly* is going wrong or why
> this patch would actually give us that effect.
> 
> I was able to fully reproduce the issue with current Linus tree (4.7-rc2+).

It looks like the following happens:

  1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
  2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
  3. A read faults due to the missing access flag
  4. ptep_set_access_flags is called with dirty = 0, due to the read fault
  5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
  6. A write faults, but pte_write is true so we get stuck

Does the diff below fix the issue for you? If so, I'll write a proper
patch.

Cheers,

Will

--->8

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 5954881a35ac..ba3fc12bd272 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -109,7 +109,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 	 * PTE_RDONLY is cleared by default in the asm below, so set it in
 	 * back if necessary (read-only or clean PTE).
 	 */
-	if (!pte_write(entry) || !dirty)
+	if (!pte_write(entry) || !pte_sw_dirty(entry))
 		pte_val(entry) |= PTE_RDONLY;
 
 	/*

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM
  2016-06-07 16:30   ` Will Deacon
@ 2016-06-07 16:34     ` Catalin Marinas
  2016-06-07 16:38     ` Alexander Graf
  1 sibling, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2016-06-07 16:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 07, 2016 at 05:30:25PM +0100, Will Deacon wrote:
> On Tue, Jun 07, 2016 at 04:13:03PM +0200, Alexander Graf wrote:
> > On 13.04.16 17:01, Catalin Marinas wrote:
> > > When hardware updates of the access and dirty states are enabled, the
> > > default ptep_set_access_flags() implementation based on calling
> > > set_pte_at() directly is potentially racy. This triggers the "racy dirty
> > > state clearing" warning in set_pte_at() because an existing writable PTE
> > > is overridden with a clean entry.
> 
> [...]
> 
> > This patch breaks swapping for me.
> > 
> > I've hit weird issues where systems stopped working half-way, with the
> > kernel still being fine and user space applications just stopping to
> > respond.
> > 
> > After some debugging we found out that it always happens when swapping
> > (to anything, backing storage doesn't matter). A quick bisect points to
> > this commit as culprit and indeed, if I disable CONFIG_ARM64_HW_AFDBM
> > the system works as expected.
> 
> [...]
> 
> > The back traces indicate that the page fault handler goes through and
> > the process just keeps banging on the same page fault over and over
> > again. I have not yet figured out what *exactly* is going wrong or why
> > this patch would actually give us that effect.
> > 
> > I was able to fully reproduce the issue with current Linus tree (4.7-rc2+).
> 
> It looks like the following happens:
> 
>   1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
>   2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
>   3. A read faults due to the missing access flag
>   4. ptep_set_access_flags is called with dirty = 0, due to the read fault
>   5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
>   6. A write faults, but pte_write is true so we get stuck
> 
> Does the diff below fix the issue for you? If so, I'll write a proper
> patch.
> 
> Cheers,
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 5954881a35ac..ba3fc12bd272 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -109,7 +109,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
>  	 * PTE_RDONLY is cleared by default in the asm below, so set it in
>  	 * back if necessary (read-only or clean PTE).
>  	 */
> -	if (!pte_write(entry) || !dirty)
> +	if (!pte_write(entry) || !pte_sw_dirty(entry))
>  		pte_val(entry) |= PTE_RDONLY;
>  
>  	/*

Whether it fixes this particular case or not, I think this patch is
still valid. So feel free to add:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM
  2016-06-07 16:30   ` Will Deacon
  2016-06-07 16:34     ` Catalin Marinas
@ 2016-06-07 16:38     ` Alexander Graf
  1 sibling, 0 replies; 5+ messages in thread
From: Alexander Graf @ 2016-06-07 16:38 UTC (permalink / raw)
  To: linux-arm-kernel



On 07.06.16 18:30, Will Deacon wrote:
> Hi Alex,
> 
> Thanks for the bug report.
> 
> On Tue, Jun 07, 2016 at 04:13:03PM +0200, Alexander Graf wrote:
>> On 13.04.16 17:01, Catalin Marinas wrote:
>>> When hardware updates of the access and dirty states are enabled, the
>>> default ptep_set_access_flags() implementation based on calling
>>> set_pte_at() directly is potentially racy. This triggers the "racy dirty
>>> state clearing" warning in set_pte_at() because an existing writable PTE
>>> is overridden with a clean entry.
> 
> [...]
> 
>> This patch breaks swapping for me.
>>
>> I've hit weird issues where systems stopped working half-way, with the
>> kernel still being fine and user space applications just stopping to
>> respond.
>>
>> After some debugging we found out that it always happens when swapping
>> (to anything, backing storage doesn't matter). A quick bisect points to
>> this commit as culprit and indeed, if I disable CONFIG_ARM64_HW_AFDBM
>> the system works as expected.
> 
> [...]
> 
>> The back traces indicate that the page fault handler goes through and
>> the process just keeps banging on the same page fault over and over
>> again. I have not yet figured out what *exactly* is going wrong or why
>> this patch would actually give us that effect.
>>
>> I was able to fully reproduce the issue with current Linus tree (4.7-rc2+).
> 
> It looks like the following happens:
> 
>   1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
>   2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
>   3. A read faults due to the missing access flag
>   4. ptep_set_access_flags is called with dirty = 0, due to the read fault
>   5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
>   6. A write faults, but pte_write is true so we get stuck
> 
> Does the diff below fix the issue for you? If so, I'll write a proper
> patch.

With that patch on top of Linus master I don't see the issue anymore,
yes :).


Thanks for the quick reply!

Alex

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-06-07 16:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-13 15:01 [PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM Catalin Marinas
2016-06-07 14:13 ` Alexander Graf
2016-06-07 16:30   ` Will Deacon
2016-06-07 16:34     ` Catalin Marinas
2016-06-07 16:38     ` Alexander Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).