All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Dave Hansen <dave@sr71.net>, linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	bp@alien8.de, ak@linux.intel.com, mhocko@suse.com
Subject: Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum
Date: Sat, 02 Jul 2016 08:28:12 +1000	[thread overview]
Message-ID: <1467412092.7422.56.camel@kernel.crashing.org> (raw)
In-Reply-To: <20160701174658.6ED27E64@viggo.jf.intel.com>

On Fri, 2016-07-01 at 10:46 -0700, Dave Hansen wrote:
> The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
> Landing) has an erratum where a processor thread setting the Accessed
> or Dirty bits may not do so atomically against its checks for the
> Present bit.A  This may cause a thread (which is about to page fault)
> to set A and/or D, even though the Present bit had already been
> atomically cleared.

Interesting.... I always wondered where in the Intel docs did it specify
that present was tested atomically with setting of A and D ... I couldn't
find it.

Isn't there a more fundamental issue however that you may actually lose
those bits ? For example if we do an munmap, in zap_pte_range()

We first exchange all the PTEs with 0 with ptep_get_and_clear_full()
and we then transfer D that we just read into the struct page.

We rely on the fact that D will never be set again, what we go it a
"final" D bit. IE. We rely on the fact that a processor either:

   - Has a cached PTE in its TLB with D set, in which case it can still
write to the page until we flush the TLB or

   - Doesn't have a cached PTE in its TLB with D set and so will fail
to do so due to the atomic P check, thus never writing.

With the errata, don't you have a situation where a processor in the second
category will write and set D despite P having been cleared (due to the
race) and thus causing us to miss the transfer of that D to the struct
page and essentially completely miss that the physical page is dirty ?

(Leading to memory corruption).

> If the PTE is used for storing a swap index or a NUMA migration index,
> the A bit could be misinterpreted as part of the swap type.A  The stray
> bits being set cause a software-cleared PTE to be interpreted as a
> swap entry.A  In some cases (like when the swap index ends up being
> for a non-existent swapfile), the kernel detects the stray value
> and WARN()s about it, but there is no guarantee that the kernel can
> always detect it.
> 
> This patch changes the kernel to attempt to ignore those stray bits
> when they get set.A  We do this by making our swap PTE format
> completely ignore the A/D bits, and also by ignoring them in our
> pte_none() checks.
> 
> Andi Kleen wrote the original version of this patch.A  Dave Hansen
> wrote the later ones.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Dave Hansen <dave@sr71.net>, linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	bp@alien8.de, ak@linux.intel.com, mhocko@suse.com
Subject: Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum
Date: Sat, 02 Jul 2016 08:28:12 +1000	[thread overview]
Message-ID: <1467412092.7422.56.camel@kernel.crashing.org> (raw)
In-Reply-To: <20160701174658.6ED27E64@viggo.jf.intel.com>

On Fri, 2016-07-01 at 10:46 -0700, Dave Hansen wrote:
> The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
> Landing) has an erratum where a processor thread setting the Accessed
> or Dirty bits may not do so atomically against its checks for the
> Present bit.  This may cause a thread (which is about to page fault)
> to set A and/or D, even though the Present bit had already been
> atomically cleared.

Interesting.... I always wondered where in the Intel docs did it specify
that present was tested atomically with setting of A and D ... I couldn't
find it.

Isn't there a more fundamental issue however that you may actually lose
those bits ? For example if we do an munmap, in zap_pte_range()

We first exchange all the PTEs with 0 with ptep_get_and_clear_full()
and we then transfer D that we just read into the struct page.

We rely on the fact that D will never be set again, what we go it a
"final" D bit. IE. We rely on the fact that a processor either:

   - Has a cached PTE in its TLB with D set, in which case it can still
write to the page until we flush the TLB or

   - Doesn't have a cached PTE in its TLB with D set and so will fail
to do so due to the atomic P check, thus never writing.

With the errata, don't you have a situation where a processor in the second
category will write and set D despite P having been cleared (due to the
race) and thus causing us to miss the transfer of that D to the struct
page and essentially completely miss that the physical page is dirty ?

(Leading to memory corruption).

> If the PTE is used for storing a swap index or a NUMA migration index,
> the A bit could be misinterpreted as part of the swap type.  The stray
> bits being set cause a software-cleared PTE to be interpreted as a
> swap entry.  In some cases (like when the swap index ends up being
> for a non-existent swapfile), the kernel detects the stray value
> and WARN()s about it, but there is no guarantee that the kernel can
> always detect it.
> 
> This patch changes the kernel to attempt to ignore those stray bits
> when they get set.  We do this by making our swap PTE format
> completely ignore the A/D bits, and also by ignoring them in our
> pte_none() checks.
> 
> Andi Kleen wrote the original version of this patch.  Dave Hansen
> wrote the later ones.

  parent reply	other threads:[~2016-07-01 22:29 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 17:46 [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum Dave Hansen
2016-07-01 17:46 ` Dave Hansen
2016-07-01 17:47 ` [PATCH 1/4] x86, swap: move swap offset/type up in PTE to work around erratum Dave Hansen
2016-07-01 17:47   ` Dave Hansen
2016-07-01 17:47 ` [PATCH 2/4] x86, pagetable: ignore A/D bits in pte/pmd/pud_none() Dave Hansen
2016-07-01 17:47   ` Dave Hansen
2016-07-01 17:47 ` [PATCH 3/4] x86: disallow running with 32-bit PTEs to work around erratum Dave Hansen
2016-07-01 17:47   ` Dave Hansen
2016-07-01 17:47 ` [PATCH 4/4] x86: use pte_none() to test for empty PTE Dave Hansen
2016-07-01 17:47   ` Dave Hansen
2016-07-01 22:28 ` Benjamin Herrenschmidt [this message]
2016-07-01 22:28   ` [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum Benjamin Herrenschmidt
2016-07-13 11:37   ` Vlastimil Babka
2016-07-13 11:37     ` Vlastimil Babka
2016-07-13 12:10     ` Vlastimil Babka
2016-07-13 12:10       ` Vlastimil Babka
2016-07-13 14:04     ` Dave Hansen
2016-07-13 14:04       ` Dave Hansen
  -- strict thread matches above, loose matches on Subject: below --
2016-07-08  0:19 Dave Hansen
2016-07-08  0:19 ` Dave Hansen
2016-07-13  9:54 ` Vlastimil Babka
2016-07-13  9:54   ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1467412092.7422.56.camel@kernel.crashing.org \
    --to=benh@kernel.crashing.org \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave@sr71.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.