From: Zi Yan <zi.yan@cs.rutgers.edu>
To: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: n-horiguchi@ah.jp.nec.com, kirill.shutemov@linux.intel.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, minchan@kernel.org, vbabka@suse.cz,
mgorman@techsingularity.net, mhocko@kernel.org,
zi.yan@cs.rutgers.edu, dnellans@nvidia.com
Subject: Re: [PATCH v5 06/11] mm: thp: check pmd migration entry in common path
Date: Fri, 21 Apr 2017 10:17:53 -0500 [thread overview]
Message-ID: <58FA22A1.105@cs.rutgers.edu> (raw)
In-Reply-To: <edb7c113-4c25-4e5b-9182-594c002cbf93@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 7189 bytes --]
Anshuman Khandual wrote:
> On 04/21/2017 02:17 AM, Zi Yan wrote:
>> From: Zi Yan <zi.yan@cs.rutgers.edu>
>>
>> If one of callers of page migration starts to handle thp,
>> memory management code start to see pmd migration entry, so we need
>> to prepare for it before enabling. This patch changes various code
>> point which checks the status of given pmds in order to prevent race
>> between thp migration and the pmd-related works.
>>
>> ChangeLog v1 -> v2:
>> - introduce pmd_related() (I know the naming is not good, but can't
>> think up no better name. Any suggesntion is welcomed.)
>>
>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>
>> ChangeLog v2 -> v3:
>> - add is_swap_pmd()
>> - a pmd entry should be pmd pointing to pte pages, is_swap_pmd(),
>> pmd_trans_huge(), pmd_devmap(), or pmd_none()
>> - pmd_none_or_trans_huge_or_clear_bad() and pmd_trans_unstable() return
>> true on pmd_migration_entry, so that migration entries are not
>> treated as pmd page table entries.
>>
>> ChangeLog v4 -> v5:
>> - add explanation in pmd_none_or_trans_huge_or_clear_bad() to state
>> the equivalence of !pmd_present() and is_pmd_migration_entry()
>> - fix migration entry wait deadlock code (from v1) in follow_page_mask()
>> - remove unnecessary code (from v1) in follow_trans_huge_pmd()
>> - use is_swap_pmd() instead of !pmd_present() for pmd migration entry,
>> so it will not be confused with pmd_none()
>> - change author information
>>
>> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
>> ---
>> arch/x86/mm/gup.c | 7 +++--
>> fs/proc/task_mmu.c | 30 +++++++++++++--------
>> include/asm-generic/pgtable.h | 17 +++++++++++-
>> include/linux/huge_mm.h | 14 ++++++++--
>> mm/gup.c | 22 ++++++++++++++--
>> mm/huge_memory.c | 61 ++++++++++++++++++++++++++++++++++++++-----
>> mm/memcontrol.c | 5 ++++
>> mm/memory.c | 12 +++++++--
>> mm/mprotect.c | 4 +--
>> mm/mremap.c | 2 +-
>> 10 files changed, 145 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
>> index 456dfdfd2249..096bbcc801e6 100644
>> --- a/arch/x86/mm/gup.c
>> +++ b/arch/x86/mm/gup.c
>> @@ -9,6 +9,7 @@
>> #include <linux/vmstat.h>
>> #include <linux/highmem.h>
>> #include <linux/swap.h>
>> +#include <linux/swapops.h>
>> #include <linux/memremap.h>
>>
>> #include <asm/mmu_context.h>
>> @@ -243,9 +244,11 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>> pmd_t pmd = *pmdp;
>>
>> next = pmd_addr_end(addr, end);
>> - if (pmd_none(pmd))
>> + if (!pmd_present(pmd)) {
>> + VM_BUG_ON(is_swap_pmd(pmd) && IS_ENABLED(CONFIG_MIGRATION) &&
>> + !is_pmd_migration_entry(pmd));
>> return 0;
>> - if (unlikely(pmd_large(pmd) || !pmd_present(pmd))) {
>> + } else if (unlikely(pmd_large(pmd))) {
>> /*
>> * NUMA hinting faults need to be handled in the GUP
>> * slowpath for accounting purposes and so that they
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 5c8359704601..57489dcd71c4 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -600,7 +600,8 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>
>> ptl = pmd_trans_huge_lock(pmd, vma);
>> if (ptl) {
>> - smaps_pmd_entry(pmd, addr, walk);
>> + if (pmd_present(*pmd))
>> + smaps_pmd_entry(pmd, addr, walk);
>> spin_unlock(ptl);
>> return 0;
>> }
>> @@ -942,6 +943,9 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
>> goto out;
>> }
>>
>> + if (!pmd_present(*pmd))
>> + goto out;
>> +
>
> These pmd_present() checks should have been done irrespective of the
> presence of new PMD migration entries. Please separate them out in a
> different clean up patch.
Not really. The introduction of PMD migration entries makes
pmd_trans_huge_lock() return a lock when PMD is a swap entry (See
changes on pmd_trans_huge_lock() in this patch). This was not the case
before, where pmd_trans_huge_lock() returned NULL if PMD entry was
pmd_none() and both two chunks were not reachable.
Maybe I should use is_swap_pmd() to clarify the confusion.
<snip>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 7406d88445bf..3479e9caf2fa 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -912,6 +912,22 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>
>> ret = -EAGAIN;
>> pmd = *src_pmd;
>> +
>> + if (unlikely(is_swap_pmd(pmd))) {
>> + swp_entry_t entry = pmd_to_swp_entry(pmd);
>> +
>> + VM_BUG_ON(IS_ENABLED(CONFIG_MIGRATION) &&
>> + !is_pmd_migration_entry(pmd));
>> + if (is_write_migration_entry(entry)) {
>> + make_migration_entry_read(&entry);
>
> We create a read migration entry after detecting a write ?
When copying page tables, COW mappings require pages in both parent and
child to be set to read. In copy_huge_pmd(), only anonymous VMAs are
copied and the other VMAs will be refilled on fault. Writable anonymous
VMAs have VM_MAYWRITE set but not VM_SHARED and this matches
is_cow_mapping(). So all mappings copied in this function are COW mappings.
>
>> + pmd = swp_entry_to_pmd(entry);
>> + set_pmd_at(src_mm, addr, src_pmd, pmd);
>> + }
>> + set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>> + ret = 0;
>> + goto out_unlock;
>> + }
>> +
>> if (unlikely(!pmd_trans_huge(pmd))) {
>> pte_free(dst_mm, pgtable);
>> goto out_unlock;
>> @@ -1218,6 +1234,9 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
>> if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
>> goto out_unlock;
>>
>> + if (unlikely(!pmd_present(orig_pmd)))
>> + goto out_unlock;
>> +
>> page = pmd_page(orig_pmd);
>> VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
>> /*
>> @@ -1548,6 +1567,12 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>> if (is_huge_zero_pmd(orig_pmd))
>> goto out;
>>
>> + if (unlikely(!pmd_present(orig_pmd))) {
>> + VM_BUG_ON(IS_ENABLED(CONFIG_MIGRATION) &&
>> + !is_pmd_migration_entry(orig_pmd));
>> + goto out;
>> + }
>> +
>> page = pmd_page(orig_pmd);
>> /*
>> * If other processes are mapping this page, we couldn't discard
>> @@ -1758,6 +1783,21 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>> preserve_write = prot_numa && pmd_write(*pmd);
>> ret = 1;
>>
>> + if (is_swap_pmd(*pmd)) {
>> + swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> +
>> + VM_BUG_ON(IS_ENABLED(CONFIG_MIGRATION) &&
>> + !is_pmd_migration_entry(*pmd));
>> + if (is_write_migration_entry(entry)) {
>> + pmd_t newpmd;
>> +
>> + make_migration_entry_read(&entry);
>
> Same here or maybe I am missing something.
I follow the same pattern in change_pte_range() (mm/mprotect.c). The
comment there says "A protection check is difficult so just be safe and
disable write".
--
Best Regards,
Yan Zi
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 537 bytes --]
next prev parent reply other threads:[~2017-04-21 15:18 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-20 20:47 [PATCH v5 00/11] mm: page migration enhancement for thp Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-20 20:47 ` [PATCH v5 01/11] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1 Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-05-19 13:07 ` Anshuman Khandual
2017-05-19 13:07 ` Anshuman Khandual
2017-05-19 15:55 ` Dave Hansen
2017-05-19 15:55 ` Dave Hansen
2017-05-19 16:31 ` Zi Yan
2017-05-19 16:36 ` Dave Hansen
2017-05-19 16:36 ` Dave Hansen
2017-04-20 20:47 ` [PATCH v5 02/11] mm: mempolicy: add queue_pages_node_check() Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-21 4:04 ` Anshuman Khandual
2017-04-21 4:04 ` Anshuman Khandual
2017-05-19 13:13 ` Anshuman Khandual
2017-05-19 13:13 ` Anshuman Khandual
2017-05-19 16:02 ` Mel Gorman
2017-05-19 16:02 ` Mel Gorman
2017-05-19 16:37 ` Zi Yan
2017-05-19 20:28 ` Mel Gorman
2017-05-19 20:28 ` Mel Gorman
2017-05-19 20:48 ` Zi Yan
2017-05-19 21:39 ` Mel Gorman
2017-05-19 21:39 ` Mel Gorman
2017-04-20 20:47 ` [PATCH v5 03/11] mm: thp: introduce separate TTU flag for thp freezing Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-21 4:29 ` Anshuman Khandual
2017-04-21 4:29 ` Anshuman Khandual
2017-04-20 20:47 ` [PATCH v5 04/11] mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-21 4:36 ` Anshuman Khandual
2017-04-21 4:36 ` Anshuman Khandual
2017-04-20 20:47 ` [PATCH v5 05/11] mm: thp: enable thp migration in generic path Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-20 20:47 ` [PATCH v5 06/11] mm: thp: check pmd migration entry in common path Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-21 6:17 ` Anshuman Khandual
2017-04-21 6:17 ` Anshuman Khandual
2017-04-21 15:17 ` Zi Yan [this message]
2017-04-20 20:47 ` [PATCH v5 07/11] mm: soft-dirty: keep soft-dirty bits over thp migration Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-20 20:47 ` [PATCH v5 08/11] mm: hwpoison: soft offline supports " Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-21 8:10 ` Anshuman Khandual
2017-04-21 8:10 ` Anshuman Khandual
2017-04-21 15:55 ` Zi Yan
2017-04-27 4:41 ` Naoya Horiguchi
2017-04-27 4:41 ` Naoya Horiguchi
2017-04-27 16:39 ` Zi Yan
2017-04-20 20:47 ` [PATCH v5 09/11] mm: mempolicy: mbind and migrate_pages support " Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-21 8:22 ` Anshuman Khandual
2017-04-21 8:22 ` Anshuman Khandual
2017-04-21 16:00 ` Zi Yan
2017-04-20 20:47 ` [PATCH v5 10/11] mm: migrate: move_pages() supports " Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-04-20 20:47 ` [PATCH v5 11/11] mm: memory_hotplug: memory hotremove " Zi Yan
2017-04-20 20:47 ` Zi Yan
2017-05-19 13:56 ` Anshuman Khandual
2017-05-19 13:56 ` Anshuman Khandual
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=58FA22A1.105@cs.rutgers.edu \
--to=zi.yan@cs.rutgers.edu \
--cc=akpm@linux-foundation.org \
--cc=dnellans@nvidia.com \
--cc=khandual@linux.vnet.ibm.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mhocko@kernel.org \
--cc=minchan@kernel.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.