All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@au1.ibm.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, paulus@samba.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
Date: Thu, 05 Dec 2013 10:48:13 +0530	[thread overview]
Message-ID: <87a9gfri3u.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <1386126782.16703.137.camel@pasglop>


Adding Mel and Rik to cc:

Benjamin Herrenschmidt <benh@au1.ibm.com> writes:

> On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
>> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
>> a separate page table outside linux pagetable, we just need to
>> make sure that when calling change_prot_numa we flush the
>> hardware page table entry so that next page access  result in a numa
>> fault.
>
> That patch doesn't look right...
>
> You are essentially making change_prot_numa() do whatever it does (which
> I don't completely understand) *for all architectures* now, whether they
> have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
> behaviour on powerpc book3s64, you change everybody.
>
> Is that correct ?


Yes. 

>
> Also what exactly is that doing, can you explain ? From what I can see,
> it calls back into the core of mprotect to change the protection to
> vma->vm_page_prot, which I would have expected is already the protection
> there, with the added "prot_numa" flag passed down.

it set the _PAGE_NUMA bit. Now we also want to make sure that when
we set _PAGE_NUMA, we would get a pagefault on that so that we can track
that fault as a numa fault. To ensure that, we had the below BUILD_BUG

	BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
        

But other than that the function doesn't really have any dependency on
_PAGE_PROTNONE. The only requirement is when we set _PAGE_NUMA, the
architecture should do enough to ensure that we get a page fault. Now on
ppc64 we does that by clearlying hpte entry and also clearing
_PAGE_PRESENT. Since we have _PAGE_PRESENT cleared hash_page will return
1 and we get to page fault handler.

>
> Your changeset comment says "On archs like ppc64 [...] we just need to
> make sure that when calling change_prot_numa we flush the
> hardware page table entry so that next page access  result in a numa
> fault."
>
> But change_prot_numa() does a lot more than that ... it does
> pte_mknuma(), do we need it ? I assume we do or we wouldn't have added
> that PTE bit to begin with...
>
> Now it *might* be allright and it might be that no other architecture
> cares anyway etc... but I need at least some mm folks to ack on that
> patch before I can take it because it *will* change behaviour of other
> architectures.
>

Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
change_prot_numa from task_numa_work and queue_pages_range(). The later
may be an issue. So doing the below will help ?

-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING


-aneesh

WARNING: multiple messages have this Message-ID (diff)
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@au1.ibm.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>
Cc: paulus@samba.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
Date: Thu, 05 Dec 2013 10:48:13 +0530	[thread overview]
Message-ID: <87a9gfri3u.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <1386126782.16703.137.camel@pasglop>


Adding Mel and Rik to cc:

Benjamin Herrenschmidt <benh@au1.ibm.com> writes:

> On Mon, 2013-11-18 at 14:58 +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
>> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
>> a separate page table outside linux pagetable, we just need to
>> make sure that when calling change_prot_numa we flush the
>> hardware page table entry so that next page access  result in a numa
>> fault.
>
> That patch doesn't look right...
>
> You are essentially making change_prot_numa() do whatever it does (which
> I don't completely understand) *for all architectures* now, whether they
> have CONFIG_ARCH_USES_NUMA_PROT_NONE or not ... So because you want that
> behaviour on powerpc book3s64, you change everybody.
>
> Is that correct ?


Yes. 

>
> Also what exactly is that doing, can you explain ? From what I can see,
> it calls back into the core of mprotect to change the protection to
> vma->vm_page_prot, which I would have expected is already the protection
> there, with the added "prot_numa" flag passed down.

it set the _PAGE_NUMA bit. Now we also want to make sure that when
we set _PAGE_NUMA, we would get a pagefault on that so that we can track
that fault as a numa fault. To ensure that, we had the below BUILD_BUG

	BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
        

But other than that the function doesn't really have any dependency on
_PAGE_PROTNONE. The only requirement is when we set _PAGE_NUMA, the
architecture should do enough to ensure that we get a page fault. Now on
ppc64 we does that by clearlying hpte entry and also clearing
_PAGE_PRESENT. Since we have _PAGE_PRESENT cleared hash_page will return
1 and we get to page fault handler.

>
> Your changeset comment says "On archs like ppc64 [...] we just need to
> make sure that when calling change_prot_numa we flush the
> hardware page table entry so that next page access  result in a numa
> fault."
>
> But change_prot_numa() does a lot more than that ... it does
> pte_mknuma(), do we need it ? I assume we do or we wouldn't have added
> that PTE bit to begin with...
>
> Now it *might* be allright and it might be that no other architecture
> cares anyway etc... but I need at least some mm folks to ack on that
> patch before I can take it because it *will* change behaviour of other
> architectures.
>

Ok, I can move the changes below #ifdef CONFIG_NUMA_BALANCING ? We call
change_prot_numa from task_numa_work and queue_pages_range(). The later
may be an issue. So doing the below will help ?

-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING


-aneesh


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-12-05  5:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-18  9:28 [PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64 Aneesh Kumar K.V
2013-11-18  9:28 ` Aneesh Kumar K.V
2013-11-18  9:28 ` [PATCH -V2 1/5] powerpc: Use HPTE constants when updating hpte bits Aneesh Kumar K.V
2013-11-18  9:28   ` Aneesh Kumar K.V
2013-11-20  4:35   ` Paul Mackerras
2013-11-20  4:35     ` Paul Mackerras
2013-11-18  9:28 ` [PATCH -V2 2/5] powerpc: Free up _PAGE_COHERENCE for numa fault use later Aneesh Kumar K.V
2013-11-18  9:28   ` Aneesh Kumar K.V
2013-11-20  4:35   ` Paul Mackerras
2013-11-20  4:35     ` Paul Mackerras
2013-11-18  9:28 ` [PATCH -V2 3/5] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE Aneesh Kumar K.V
2013-11-18  9:28   ` Aneesh Kumar K.V
2013-12-04  3:13   ` Benjamin Herrenschmidt
2013-12-04  3:13     ` Benjamin Herrenschmidt
2013-12-05  5:18     ` Aneesh Kumar K.V [this message]
2013-12-05  5:18       ` Aneesh Kumar K.V
2013-12-05  5:20       ` Benjamin Herrenschmidt
2013-12-05  5:20         ` Benjamin Herrenschmidt
2013-12-05 17:52         ` Rik van Riel
2013-12-05 17:52           ` Rik van Riel
2013-12-05 17:27     ` Rik van Riel
2013-12-05 17:27       ` Rik van Riel
2013-12-05 21:00       ` Benjamin Herrenschmidt
2013-12-05 21:00         ` Benjamin Herrenschmidt
2013-11-18  9:28 ` [PATCH -V2 4/5] powerpc: mm: Only check for _PAGE_PRESENT in set_pte/pmd functions Aneesh Kumar K.V
2013-11-18  9:28   ` Aneesh Kumar K.V
2013-11-20  4:36   ` Paul Mackerras
2013-11-20  4:36     ` Paul Mackerras
2013-11-18  9:28 ` [PATCH -V2 5/5] powerpc: mm: book3s: Enable _PAGE_NUMA for book3s Aneesh Kumar K.V
2013-11-18  9:28   ` Aneesh Kumar K.V
2013-11-20  4:37   ` Paul Mackerras
2013-11-20  4:37     ` Paul Mackerras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a9gfri3u.fsf@linux.vnet.ibm.com \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=benh@au1.ibm.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@suse.de \
    --cc=paulus@samba.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.