From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Dave Hansen <dave@sr71.net>, linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-mm@kvack.org,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
bp@alien8.de, ak@linux.intel.com, mhocko@suse.com
Subject: Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum
Date: Sat, 02 Jul 2016 08:28:12 +1000
Message-ID: <1467412092.7422.56.camel@kernel.crashing.org>
In-Reply-To: <20160701174658.6ED27E64@viggo.jf.intel.com>
On Fri, 2016-07-01 at 10:46 -0700, Dave Hansen wrote:
> The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
> Landing) has an erratum where a processor thread setting the Accessed
> or Dirty bits may not do so atomically against its checks for the
> Present bit.  This may cause a thread (which is about to page fault)
> to set A and/or D, even though the Present bit had already been
> atomically cleared.

Interesting... I always wondered where the Intel docs specify that
Present is tested atomically with the setting of A and D. I couldn't
find it.

Isn't there a more fundamental issue, however, in that you may actually
lose those bits? For example, if we do an munmap, in zap_pte_range() we
first exchange all the PTEs with 0 using ptep_get_and_clear_full(), and
we then transfer the D bit that we just read into the struct page.

We rely on the fact that D will never be set again, so what we got is a
"final" D bit. I.e., we rely on the fact that a processor either:

 - Has a cached PTE in its TLB with D set, in which case it can still
   write to the page until we flush the TLB, or

 - Doesn't have a cached PTE in its TLB with D set, and so will fail
   to do so due to the atomic P check, thus never writing.

With the erratum, don't you have a situation where a processor in the
second category will write and set D despite P having been cleared (due
to the race), thus causing us to miss the transfer of that D to the
struct page and essentially completely miss that the physical page is
dirty (leading to memory corruption)?
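
To make the ordering concrete, here is a minimal user-space sketch of the interleaving in question. It is not kernel code: `model_zap_pte()`, `hw_set_dirty_atomic()`, `hw_check_present()`, `hw_set_dirty_blind()` and `struct model_page` are hypothetical names invented for illustration, and the "hardware" is just a pair of C functions. Only the bit positions (P is bit 0, D is bit 6) come from the x86 PTE layout. The model shows the stray D bit landing after the kernel has sampled D; whether the data write itself goes through is exactly the open question above, and the model does not answer it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bit positions: Present is bit 0, Dirty is bit 6. */
#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_DIRTY   (UINT64_C(1) << 6)

/* Hypothetical stand-in for struct page: only tracks dirtiness. */
struct model_page { bool dirty; };

/*
 * Kernel side, modeled on the zap path described above: atomically
 * exchange the PTE with 0, then fold the D bit we read into the page.
 */
static void model_zap_pte(_Atomic uint64_t *ptep, struct model_page *page)
{
    uint64_t old = atomic_exchange(ptep, 0);
    if (old & PTE_DIRTY)
        page->dirty = true;
}

/*
 * Hardware side as software expects it: D is set only if P is still
 * set, as one atomic step. Returns false when the access would fault.
 */
static bool hw_set_dirty_atomic(_Atomic uint64_t *ptep)
{
    uint64_t old = atomic_load(ptep);
    while (old & PTE_PRESENT) {
        if (atomic_compare_exchange_weak(ptep, &old, old | PTE_DIRTY))
            return true;
    }
    return false;
}

/* Erratum model: the P check and the D update are two separate steps. */
static bool hw_check_present(_Atomic uint64_t *ptep)
{
    return (atomic_load(ptep) & PTE_PRESENT) != 0;
}

static void hw_set_dirty_blind(_Atomic uint64_t *ptep)
{
    atomic_fetch_or(ptep, PTE_DIRTY);
}

struct race_result { uint64_t final_pte; bool page_dirty; };

/* Replays the problematic interleaving deterministically. */
static struct race_result model_lost_dirty_race(void)
{
    _Atomic uint64_t pte = PTE_PRESENT;
    struct model_page page = { .dirty = false };

    bool saw_present = hw_check_present(&pte); /* CPU: P looks set      */
    model_zap_pte(&pte, &page);                /* kernel: xchg, D was 0 */
    if (saw_present)
        hw_set_dirty_blind(&pte);              /* CPU: stray D appears  */

    struct race_result r = { atomic_load(&pte), page.dirty };
    return r;
}
```

In this replay the PTE ends up non-zero (D set, P clear) while the page was never marked dirty. With `hw_set_dirty_atomic()` in place of the two-step sequence, the CPU side fails after the zap and the PTE stays 0.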
> If the PTE is used for storing a swap index or a NUMA migration index,
> the A bit could be misinterpreted as part of the swap type.  The stray
> bits being set cause a software-cleared PTE to be interpreted as a
> swap entry.  In some cases (like when the swap index ends up being
> for a non-existent swapfile), the kernel detects the stray value
> and WARN()s about it, but there is no guarantee that the kernel can
> always detect it.
>
> This patch changes the kernel to attempt to ignore those stray bits
> when they get set.  We do this by making our swap PTE format
> completely ignore the A/D bits, and also by ignoring them in our
> pte_none() checks.
>
> Andi Kleen wrote the original version of this patch.  Dave Hansen
> wrote the later ones.
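
For readers following the quoted description, the two measures it names (a swap PTE format that ignores A/D, and pte_none() checks that ignore them) can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the patches: the `model_*` helpers, `SWP_SHIFT` and `SWP_TYPE_BITS` are hypothetical, and the real x86 swap entry layout differs. Only the A and D bit positions (bits 5 and 6) are taken from the x86 PTE format.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bit positions: Accessed is bit 5, Dirty is bit 6. */
#define PTE_ACCESSED   (UINT64_C(1) << 5)
#define PTE_DIRTY      (UINT64_C(1) << 6)
#define PTE_STRAY_MASK (PTE_ACCESSED | PTE_DIRTY)

/*
 * Hypothetical swap entry layout, chosen only so that neither field
 * overlaps bits 5 or 6: a 5-bit type starting at bit 9, with the
 * offset in the bits above it.
 */
#define SWP_SHIFT     9
#define SWP_TYPE_BITS 5

/* An empty PTE may carry stray A/D bits, so mask them before testing. */
static bool model_pte_none(uint64_t pte)
{
    return (pte & ~PTE_STRAY_MASK) == 0;
}

/* Pack type and offset above the A/D bits. */
static uint64_t model_swp_encode(unsigned type, uint64_t offset)
{
    return ((uint64_t)type << SWP_SHIFT) |
           (offset << (SWP_SHIFT + SWP_TYPE_BITS));
}

/* Decoding shifts the low bits away, so stray A/D bits are ignored. */
static unsigned model_swp_type(uint64_t pte)
{
    return (unsigned)((pte >> SWP_SHIFT) & ((1u << SWP_TYPE_BITS) - 1));
}

static uint64_t model_swp_offset(uint64_t pte)
{
    return pte >> (SWP_SHIFT + SWP_TYPE_BITS);
}
```

With this layout, a cleared PTE that later picks up A and/or D still satisfies `model_pte_none()`, and a swap entry decodes to the same type and offset whether or not those bits were set afterwards.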