From: Dave Hansen <dave@sr71.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>, Andi Kleen <ak@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH 6/6] x86: Fix stray A/D bit setting into non-present PTEs
Date: Thu, 30 Jun 2016 21:39:52 -0700
Message-ID: <5775F418.2000803@sr71.net>
In-Reply-To: <CA+55aFwm74uiqwsV5dvVMDBAthwmHub3J3Wz9cso0PpgVTHUPA@mail.gmail.com>

On 06/30/2016 07:55 PM, Linus Torvalds wrote:
> On Thu, Jun 30, 2016 at 5:12 PM, Dave Hansen <dave@sr71.net> wrote:
>> From: Dave Hansen <dave.hansen@linux.intel.com>
>> The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
>> Landing) has an erratum where a processor thread setting the Accessed
>> or Dirty bits may not do so atomically against its checks for the
>> Present bit.  This may cause a thread (which is about to page fault)
>> to set A and/or D, even though the Present bit had already been
>> atomically cleared.
> 
> So I don't think your approach is wrong, but I suspect this is
> overkill, and what we should instead just do is to not use the A/D
> bits at all in the swap representation.

We actually don't even use Dirty today.  It's (implicitly) used to
determine pte_none(), but it ends up being masked out for the
swp_offset/type() calculations entirely, much to my surprise.

I think what you suggest will work if we don't consider A/D in
pte_none().  I think there are a bunch of code paths where we assume that
!pte_present() && !pte_none() means swap.
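
To make the pte_none() point concrete, here is a rough user-space model of
the three-way classification, not kernel code.  The bit positions for
Present (0), Accessed (5) and Dirty (6) are the x86 hardware ones; all the
model_* and PTE_KIND_* names are made up for illustration, and the present
check is simplified (the real pte_present() also looks at other bits such
as PROTNONE).  The assumption being sketched is that a "none" check which
masks out A/D keeps a stray A/D bit from turning an empty entry into
something that looks like swap.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE flag bits (hardware-defined positions). */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/* Bits the erratum may set spuriously in a non-present PTE. */
#define PTE_STRAY_MASK (PTE_ACCESSED | PTE_DIRTY)

/* Hypothetical names, used only in this sketch. */
enum pte_kind { PTE_KIND_NONE, PTE_KIND_PRESENT, PTE_KIND_SWAP };

/* Simplified present check: only looks at the Present bit. */
static bool model_pte_present(uint64_t pte)
{
	return (pte & PTE_PRESENT) != 0;
}

/* "None" check that ignores A/D, so stray bits do not matter. */
static bool model_pte_none(uint64_t pte)
{
	return (pte & ~PTE_STRAY_MASK) == 0;
}

/* The common assumption: !present && !none means swap. */
static enum pte_kind model_classify(uint64_t pte)
{
	if (model_pte_present(pte))
		return PTE_KIND_PRESENT;
	if (model_pte_none(pte))
		return PTE_KIND_NONE;
	return PTE_KIND_SWAP;
}
```

With a strict "pte == 0" test instead, an entry holding only a stray Dirty
bit would fall through to the swap case, which is the failure mode this
is meant to avoid.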

> The swap-entry representation was a bit tight on 32-bit page table
> entries, but in 64-bit ones, I think we have tons of bits, don't we?
> So we could decide just to not use those two bits on x86.

Yeah, we've definitely got space.  I'll go poke around and make sure
that this works everywhere.  I agree that throwing 32-bit non-PAE under
the bus is definitely worth it here.  Nobody will care about that in a
million years.
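
As a sketch of what "not using those two bits" could look like on 64-bit,
here is a purely hypothetical swap-entry layout, again as a user-space
model.  The shifts and widths below (type in 5 bits starting at bit 9,
offset above that) are assumptions picked for illustration, not the
layout the kernel uses or will use; the only hard facts are the A/D bit
positions.  The point is just that if neither field overlaps bits 5 and
6, a stray A/D bit cannot change what gets decoded.

```c
#include <stdint.h>

/* x86 Accessed and Dirty bit positions (hardware-defined). */
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/* Hypothetical layout: both fields sit entirely above the A/D bits. */
#define SWP_TYPE_SHIFT   9
#define SWP_TYPE_BITS    5
#define SWP_TYPE_MASK    ((UINT64_C(1) << SWP_TYPE_BITS) - 1)
#define SWP_OFFSET_SHIFT (SWP_TYPE_SHIFT + SWP_TYPE_BITS)

/* Pack a swap type and offset into a non-present PTE value. */
static uint64_t model_swp_encode(unsigned int type, uint64_t offset)
{
	return ((type & SWP_TYPE_MASK) << SWP_TYPE_SHIFT) |
	       (offset << SWP_OFFSET_SHIFT);
}

/* Extract the type field; bits 5 and 6 never reach the result. */
static unsigned int model_swp_type(uint64_t pte)
{
	return (unsigned int)((pte >> SWP_TYPE_SHIFT) & SWP_TYPE_MASK);
}

/* Extract the offset field; everything below the shift is dropped. */
static uint64_t model_swp_offset(uint64_t pte)
{
	return pte >> SWP_OFFSET_SHIFT;
}
```

Decoding an entry with A and/or D ORed in gives the same type and offset
as decoding the clean entry, so the erratum would be harmless for swap
entries under a layout of this shape.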
