From: Benjamin Herrenschmidt <benh@au1.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
linuxppc-dev@lists.ozlabs.org, paulus@samba.org
Subject: Re: [PATCH -V10 00/15] THP support for PPC64
Date: Sun, 16 Jun 2013 12:00:07 +1000 [thread overview]
Message-ID: <1371348007.21896.62.camel@pasglop> (raw)
In-Reply-To: <1370446119-8837-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Wed, 2013-06-05 at 20:58 +0530, Aneesh Kumar K.V wrote:
> This is the second patchset needed to support THP on ppc64. Some of
> the changes
> included in this series are tricky in that it changes the powerpc
> linux page table
> walk subtly. We also overload few of the pte flags for ptes at PMD
> level (huge
> page PTEs).
>
> The related mm/ changes are already merged to Andrew's -mm tree.
[Andrea, question for you near the end ]
So I'm trying to understand how you handle races between hash_page
and collapse.
The generic collapse code does:
_pmd = pmdp_clear_flush(vma, address, pmd);
Which expects the architecture to essentially have stopped any
concurrent walk by the time it returns.
Your implementation of the above does this:
pmd = *pmdp;
pmd_clear(pmdp);
/*
* Now invalidate the hpte entries in the range
* covered by pmd. This make sure we take a
* fault and will find the pmd as none, which will
* result in a major fault which takes mmap_sem and
* hence wait for collapse to complete. Without this
* the __collapse_huge_page_copy can result in copying
* the old content.
*/
flush_tlb_pmd_range(vma->vm_mm, &pmd, address);
So we clear the pmd after making a copy of it. This will eventually
prevent a tablewalk but only eventually, once that store becomes visible
to other processors, which may take a while. Then you proceed to flush
the hash table for all the underlying PTEs.
So at this point, hash_page might *still* see the old pmd. Unless I
missed something, you did nothing that will prevent that (the only way
to lock against hash_page is really an IPI & wait or to take the PTE's
busy and make them !present or something). So as far as I can tell,
a concurrent hash_page can still sneak into the hash some "small"
entries after you have supposedly flushed them.
In addition, my reading of __collapse_huge_page_isolate() is that it
expects page_young() to be stable at that point, while because of the
above, a concurrent hash_page() might still be setting _PAGE_ACCESSED
(and _PAGE_DIRTY).
So it might be that you have a sneaky way to perform the synchronization
that I have missed :-) But so far I haven't seen it....
Also, a more general question from Andrea. Looking at the code, I was
originally thinking that there is a similar race with dirty. But then
I noticed that the collapse code doesn't look at dirty at all on the sub
pages, it just ignores it. That stroke me as broken until I also noticed
that you seem to always make the THPs dirty....
Is there a reason for that rather than harvesting dirty in the sub pages
and making the THP's dirty the logical OR of the small pages one ?
I understand that anonymous memory is often either zero or dirty, but
I suppose it can be clear with content as well as a result of swap out
and back in, no ? Or is there other reasons why THPs must be dirty
always ?
Cheers,
Ben.
next prev parent reply other threads:[~2013-06-16 2:00 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-05 15:28 [PATCH -V10 00/15] THP support for PPC64 Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 01/15] powerpc/mm: handle hugepage size correctly when invalidating hpte entries Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 02/15] powerpc/THP: Double the PMD table size for THP Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 03/15] powerpc/THP: Implement transparent hugepages for ppc64 Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 04/15] powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 05/15] powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 06/15] powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 07/15] powerpc: Update gup_pmd_range to handle transparent hugepages Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 08/15] powerpc/THP: Add code to handle HPTE faults for hugepages Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 09/15] powerpc: Make linux pagetable walk safe with THP enabled Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 10/15] powerpc: Prevent gcc to re-read the pagetables Aneesh Kumar K.V
2013-06-05 15:41 ` David Laight
2013-06-05 22:39 ` Benjamin Herrenschmidt
2013-06-05 15:28 ` [PATCH -V10 11/15] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 12/15] powerpc: Optimize hugepage invalidate Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 13/15] powerpc: disable assert_pte_locked for collapse_huge_page Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 14/15] powerpc: use smp_rmb when looking at deposisted pgtable to store hash index Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 15/15] powerpc: split hugepage when using subpage protection Aneesh Kumar K.V
2013-06-05 23:31 ` [PATCH -V10 00/15] THP support for PPC64 Benjamin Herrenschmidt
2013-06-06 0:13 ` Andrew Morton
2013-06-06 6:05 ` Aneesh Kumar K.V
2013-06-06 7:20 ` Andrew Morton
2013-06-16 2:00 ` Benjamin Herrenschmidt [this message]
2013-06-16 3:37 ` Benjamin Herrenschmidt
2013-06-16 4:06 ` Benjamin Herrenschmidt
2013-06-16 23:02 ` Benjamin Herrenschmidt
2013-06-18 18:46 ` Aneesh Kumar K.V
2013-06-18 22:03 ` Benjamin Herrenschmidt
2013-06-19 3:30 ` Aneesh Kumar K.V
2013-06-18 17:54 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1371348007.21896.62.camel@pasglop \
--to=benh@au1.ibm.com \
--cc=aarcange@redhat.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.