From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Mel Gorman <mgorman@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>, Dave Jones <davej@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: pipe/page fault oddness.
Date: Tue, 07 Oct 2014 00:48:43 +0530
Message-ID: <87d2a5f1m4.fsf@linux.vnet.ibm.com>
In-Reply-To: <20141002124537.GL17501@suse.de>

Mel Gorman <mgorman@suse.de> writes:

> On Wed, Oct 01, 2014 at 09:18:25AM -0700, Linus Torvalds wrote:
>> On Wed, Oct 1, 2014 at 9:01 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> > We need to get rid of it, and just make it the same as pte_protnone().
>> > And then the real protnone is in the vma flags, and if you actually
>> > ever get to a pte that is marked protnone, you know it's a numa page.
>> 
>> So I'd really suggest we do exactly that. Get rid of "pte_numa()"
>> entirely, get rid of "_PAGE_[BIT_]NUMA" entirely, and instead add a
>> "pte_protnone()" helper to check for the "protnone" case (which on x86
>> is testing the _PAGE_PROTNONE bit, and on most other architectures is
>> just testing that the page has no access rights).
>> 
>
> Do not interpret the following as being against the idea of taking the
> pte_protnone approach. This is intended to give background.
>
> At the time the changes were made to the _PAGE_NUMA bits it was acknowledged
> that a full move to prot_none was an option but it was not the preferred
> solution at the time. It replaced one set of corner cases with another and
> the last time, like this time, there was considerable time pressure. The
> VMA would be required to distinguish between a NUMA hinting fault and a
> real prot_none bit. In most cases, we have the VMA now with the exception
> of GUP. GUP would have to unconditionally go into the slow path to do the
> VMA lookup. That is not likely to be much of a problem but it was a concern.
>
> In early implementations based on prot_none there were some VMA-based
> protection checks that had higher overhead. At the time, there were severe
> problems with overhead due to NUMA balancing and adding more was not
> desirable. This has been addressed since with changes in multiple other
> areas so it's much less of a concern now than it was. In the current shape,
> this is probably not as much of a problem as long as any check on pte_numa
> was first guarded by a VMA check. One way of handling the corner cases
> would be to pass in the VMA where available and have a VM_BUG_ON that
> fires if it's a PROT_NONE VMA. That would catch problems during debugging
> without adding overhead in the !debug case.
>
> Going back to the start, the PTE bit was used as the approach due to
> concerns that a pte_protnone helper would not work on all architectures,
> ppc64 in particular.  There was no PROT_NONE bit there and instead prot_none
> protections rely on PAGE_USER not being set so it's inaccessible from
> userspace. There was discussion at the time that this could conceivably be
> broken on some sub-architectures but I don't recall the details. Looking
> at the current shape and your patch, it's conceivable that pte_protnone
> could be implemented as a _PAGE_PRESENT && !_PAGE_USER check as long
> as it was guarded by a VMA check which x86 requires anyway. Not sure
> if that would work for PMDs as I'm not familiar enough with ppc64 to tell
> offhand. Alternatively, ppc64 would potentially use the bit currently used
> for _PAGE_NUMA as a _PROT_NONE bit.

Are we still looking at these options? I could look at implementing the
first option, which would also let us free up one pte bit.

Note: Freeing up one bit would let us implement the soft-dirty tracking
needed for CRIU.
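
To make the first option concrete, here is a minimal userspace sketch of a
pte_protnone-style check built from "present and not user", guarded by a VMA
check as Mel describes. It is an illustration under stated assumptions, not
the ppc64 implementation: every sk_ name, the bit values and the two structs
are hypothetical stand-ins and do not match any real page table layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE bits, for illustration only (not the ppc64 layout). */
#define SK_PAGE_PRESENT 0x1u
#define SK_PAGE_USER    0x2u

/* Hypothetical VMA access flags, standing in for VM_READ/VM_WRITE/VM_EXEC. */
#define SK_VM_READ  0x1u
#define SK_VM_WRITE 0x2u
#define SK_VM_EXEC  0x4u

typedef struct { uint64_t val; } sk_pte_t;
struct sk_vma { unsigned long vm_flags; };

/* "protnone" here means: the pte is present but not accessible to userspace. */
static bool sk_pte_protnone(sk_pte_t pte)
{
	return (pte.val & (SK_PAGE_PRESENT | SK_PAGE_USER)) == SK_PAGE_PRESENT;
}

/* A VMA with no access rights at all is a real PROT_NONE mapping. */
static bool sk_vma_is_accessible(const struct sk_vma *vma)
{
	return (vma->vm_flags & (SK_VM_READ | SK_VM_WRITE | SK_VM_EXEC)) != 0;
}

/*
 * The VMA check comes first: a protnone pte inside an accessible VMA is
 * treated as a NUMA hinting entry, while the same pte inside a PROT_NONE
 * VMA is an ordinary protection fault.
 */
static bool sk_is_numa_hinting_fault(const struct sk_vma *vma, sk_pte_t pte)
{
	return sk_vma_is_accessible(vma) && sk_pte_protnone(pte);
}
```

The point of the sketch is only the ordering: the pte test alone cannot
distinguish the two cases, so callers that lack the VMA (GUP, as noted above)
would need to look it up first.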

-aneesh

