Re: [PATCH v4 06/11] mm: thp: check pmd migration entry in common path

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Zi Yan <zi.yan@cs.rutgers.edu>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <zi.yan@sent.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kirill.shutemov@linux.intel.com, akpm@linux-foundation.org,
	minchan@kernel.org, vbabka@suse.cz, mgorman@techsingularity.net,
	mhocko@kernel.org, n-horiguchi@ah.jp.nec.com,
	khandual@linux.vnet.ibm.com, dnellans@nvidia.com
Subject: Re: [PATCH v4 06/11] mm: thp: check pmd migration entry in common path
Date: Fri, 24 Mar 2017 11:09:25 -0500	[thread overview]
Message-ID: <58D544B5.20102@cs.rutgers.edu> (raw)
In-Reply-To: <20170324145042.bda52glerop5wydx@node.shutemov.name>

[-- Attachment #1: Type: text/plain, Size: 8267 bytes --]



Kirill A. Shutemov wrote:
> On Mon, Mar 13, 2017 at 11:45:02AM -0400, Zi Yan wrote:
>> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>
>> If one of callers of page migration starts to handle thp,
>> memory management code start to see pmd migration entry, so we need
>> to prepare for it before enabling. This patch changes various code
>> point which checks the status of given pmds in order to prevent race
>> between thp migration and the pmd-related works.
>>
>> ChangeLog v1 -> v2:
>> - introduce pmd_related() (I know the naming is not good, but can't
>>   think up no better name. Any suggesntion is welcomed.)
>>
>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>
>> ChangeLog v2 -> v3:
>> - add is_swap_pmd()
>> - a pmd entry should be pmd pointing to pte pages, is_swap_pmd(),
>>   pmd_trans_huge(), pmd_devmap(), or pmd_none()
>> - use pmdp_huge_clear_flush() instead of pmdp_huge_get_and_clear()
>> - flush_cache_range() while set_pmd_migration_entry()
>> - pmd_none_or_trans_huge_or_clear_bad() and pmd_trans_unstable() return
>>   true on pmd_migration_entry, so that migration entries are not
>>   treated as pmd page table entries.
>>
>> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
>> ---
>>  arch/x86/mm/gup.c             |  4 +--
>>  fs/proc/task_mmu.c            | 22 +++++++++------
>>  include/asm-generic/pgtable.h |  3 +-
>>  include/linux/huge_mm.h       | 14 +++++++--
>>  mm/gup.c                      | 22 +++++++++++++--
>>  mm/huge_memory.c              | 66 ++++++++++++++++++++++++++++++++++++++-----
>>  mm/madvise.c                  |  2 ++
>>  mm/memcontrol.c               |  2 ++
>>  mm/memory.c                   |  9 ++++--
>>  mm/mprotect.c                 |  6 ++--
>>  mm/mremap.c                   |  2 +-
>>  11 files changed, 124 insertions(+), 28 deletions(-)
>>
<snip>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 94fab8fa432b..2b1effb16242 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -272,6 +272,15 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>>  			return page;
>>  		return no_page_table(vma, flags);
>>  	}
>> +	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
>> +		return no_page_table(vma, flags);
>> +	if (!pmd_present(*pmd)) {
>> +retry:
>> +		if (likely(!(flags & FOLL_MIGRATION)))
>> +			return no_page_table(vma, flags);
>> +		pmd_migration_entry_wait(mm, pmd);
>> +		goto retry;
> 
> This looks a lot like endless loop if flags contain FOLL_MIGRATION. Hm?
> 
> I guess retry label should be on previous line.

You are right. It should be:

+	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
+		return no_page_table(vma, flags);
+retry:
+	if (!pmd_present(*pmd)) {
+		if (likely(!(flags & FOLL_MIGRATION)))
+			return no_page_table(vma, flags);
+		pmd_migration_entry_wait(mm, pmd);
+		goto retry;

> 
>> +	}
>>  	if (pmd_devmap(*pmd)) {
>>  		ptl = pmd_lock(mm, pmd);
>>  		page = follow_devmap_pmd(vma, address, pmd, flags);
>> @@ -286,6 +295,15 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>>  		return no_page_table(vma, flags);
>>  
>>  	ptl = pmd_lock(mm, pmd);
>> +	if (unlikely(!pmd_present(*pmd))) {
>> +retry_locked:
>> +		if (likely(!(flags & FOLL_MIGRATION))) {
>> +			spin_unlock(ptl);
>> +			return no_page_table(vma, flags);
>> +		}
>> +		pmd_migration_entry_wait(mm, pmd);
>> +		goto retry_locked;
> 
> Again. That's doesn't look right..

It will be changed:

 	ptl = pmd_lock(mm, pmd);
+retry_locked:
+	if (unlikely(!pmd_present(*pmd))) {
+		if (likely(!(flags & FOLL_MIGRATION))) {
+			spin_unlock(ptl);
+			return no_page_table(vma, flags);
+		}
+		pmd_migration_entry_wait(mm, pmd);
+		goto retry_locked;

> 
>> +	}
>>  	if (unlikely(!pmd_trans_huge(*pmd))) {
>>  		spin_unlock(ptl);
>>  		return follow_page_pte(vma, address, pmd, flags);
>> @@ -341,7 +359,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
>>  	pud = pud_offset(pgd, address);
>>  	BUG_ON(pud_none(*pud));
>>  	pmd = pmd_offset(pud, address);
>> -	if (pmd_none(*pmd))
>> +	if (!pmd_present(*pmd))
>>  		return -EFAULT;
>>  	VM_BUG_ON(pmd_trans_huge(*pmd));
>>  	pte = pte_offset_map(pmd, address);
>> @@ -1369,7 +1387,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>>  		pmd_t pmd = READ_ONCE(*pmdp);
>>  
>>  		next = pmd_addr_end(addr, end);
>> -		if (pmd_none(pmd))
>> +		if (!pmd_present(pmd))
>>  			return 0;
>>  
>>  		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a9c2a0ef5b9b..3f18452f3eb1 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -898,6 +898,21 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>  
>>  	ret = -EAGAIN;
>>  	pmd = *src_pmd;
>> +
>> +	if (unlikely(is_pmd_migration_entry(pmd))) {
> 
> Shouldn't you first check that the pmd is not present?

is_pmd_migration_entry() checks !pmd_present().

in linux/swapops.h, is_pmd_migration_entry is defined as:

static inline int is_pmd_migration_entry(pmd_t pmd)
{
    return !pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd));
}


> 
>> +		swp_entry_t entry = pmd_to_swp_entry(pmd);
>> +
>> +		if (is_write_migration_entry(entry)) {
>> +			make_migration_entry_read(&entry);
>> +			pmd = swp_entry_to_pmd(entry);
>> +			set_pmd_at(src_mm, addr, src_pmd, pmd);
>> +		}
>> +		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>> +		ret = 0;
>> +		goto out_unlock;
>> +	}
>> +	WARN_ONCE(!pmd_present(pmd), "Uknown non-present format on pmd.\n");
> 
> Typo.

Got it.

> 
>> +
>>  	if (unlikely(!pmd_trans_huge(pmd))) {
>>  		pte_free(dst_mm, pgtable);
>>  		goto out_unlock;
>> @@ -1204,6 +1219,9 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
>>  	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
>>  		goto out_unlock;j
>>  
>> +	if (unlikely(!pmd_present(orig_pmd)))
>> +		goto out_unlock;
>> +
>>  	page = pmd_page(orig_pmd);
>>  	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
>>  	/*
>> @@ -1338,7 +1356,15 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>>  	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
>>  		goto out;
>>  
>> -	page = pmd_page(*pmd);
>> +	if (is_pmd_migration_entry(*pmd)) {
> 
> Again, I don't think it's it's safe to check if pmd is migration entry
> before checking if it's present.
> 
>> +		swp_entry_t entry;
>> +
>> +		entry = pmd_to_swp_entry(*pmd);
>> +		page = pfn_to_page(swp_offset(entry));
>> +		if (!is_migration_entry(entry))
>> +			goto out;
> 
> I don't understand how it suppose to work.
> You take swp_offset() of entry before checking if it's migration entry.
> What's going on?

This chunk of change inside follow_trans_huge_pmd() is not needed.
Because two callers, smaps_pmd_entry() and follow_page_mask(), guarantee
that the pmd points to a present entry.

I will drop this chunk in the next version.

> 
>> +	} else
>> +		page = pmd_page(*pmd);
>>  	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
>>  	if (flags & FOLL_TOUCH)
>>  		touch_pmd(vma, addr, pmd);
>> @@ -1534,6 +1560,9 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>  	if (is_huge_zero_pmd(orig_pmd))
>>  		goto out;
>>  
>> +	if (unlikely(!pmd_present(orig_pmd)))
>> +		goto out;
>> +
>>  	page = pmd_page(orig_pmd);
>>  	/*
>>  	 * If other processes are mapping this page, we couldn't discard
>> @@ -1766,6 +1795,20 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>  	if (prot_numa && pmd_protnone(*pmd))
>>  		goto unlock;
>>  
>> +	if (is_pmd_migration_entry(*pmd)) {
>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> +
>> +		if (is_write_migration_entry(entry)) {
>> +			pmd_t newpmd;
>> +
>> +			make_migration_entry_read(&entry);
>> +			newpmd = swp_entry_to_pmd(entry);
>> +			set_pmd_at(mm, addr, pmd, newpmd);
>> +		}
>> +		goto unlock;
>> +	} else if (!pmd_present(*pmd))
>> +		WARN_ONCE(1, "Uknown non-present format on pmd.\n");
> 
> Another typo.

Got it.

Thanks for all your comments.

-- 
Best Regards,
Yan Zi


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 537 bytes --]

next prev parent reply	other threads:[~2017-03-24 16:09 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-13 15:44 [PATCH v4 00/11] mm: page migration enhancement for thp Zi Yan
2017-03-13 15:44 ` [PATCH v4 01/11] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1 Zi Yan
2017-03-24 18:23   ` Tim Chen
2017-03-24 18:30     ` Zi Yan
2017-03-13 15:44 ` [PATCH v4 02/11] mm: mempolicy: add queue_pages_node_check() Zi Yan
2017-03-13 15:44 ` [PATCH v4 03/11] mm: thp: introduce separate TTU flag for thp freezing Zi Yan
2017-03-13 15:45 ` [PATCH v4 04/11] mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION Zi Yan
2017-03-24 14:10   ` Kirill A. Shutemov
2017-03-24 14:21     ` Zi Yan
2017-03-13 15:45 ` [PATCH v4 05/11] mm: thp: enable thp migration in generic path Zi Yan
2017-03-14 21:19   ` kbuild test robot
2017-03-14 21:55     ` Zi Yan
2017-03-15  9:01       ` Geert Uytterhoeven
2017-03-15 16:00         ` Zi Yan
2017-03-14 21:26   ` kbuild test robot
2017-03-24 14:28   ` Kirill A. Shutemov
2017-03-24 15:30     ` Zi Yan
2017-03-13 15:45 ` [PATCH v4 06/11] mm: thp: check pmd migration entry in common path Zi Yan
2017-03-24 14:50   ` Kirill A. Shutemov
2017-03-24 16:09     ` Zi Yan [this message]
2017-03-24 16:50       ` Kirill A. Shutemov
2017-03-24 17:09         ` Zi Yan
2017-03-13 15:45 ` [PATCH v4 07/11] mm: soft-dirty: keep soft-dirty bits over thp migration Zi Yan
2017-03-13 15:45 ` [PATCH v4 08/11] mm: hwpoison: soft offline supports " Zi Yan
2017-03-13 15:45 ` [PATCH v4 09/11] mm: mempolicy: mbind and migrate_pages support " Zi Yan
2017-03-13 15:45 ` [PATCH v4 10/11] mm: migrate: move_pages() supports " Zi Yan
2017-03-13 15:45 ` [PATCH v4 11/11] mm: memory_hotplug: memory hotremove " Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58D544B5.20102@cs.rutgers.edu \
    --to=zi.yan@cs.rutgers.edu \
    --cc=akpm@linux-foundation.org \
    --cc=dnellans@nvidia.com \
    --cc=khandual@linux.vnet.ibm.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=vbabka@suse.cz \
    --cc=zi.yan@sent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).