Re: [PATCH v12 21/29] HMM: mm add helper to update page table when migrating memory back v2.


From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: "Jérôme Glisse" <jglisse@redhat.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	joro@8bytes.org, Mel Gorman <mgorman@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
	Brendan Conoboy <blc@redhat.com>,
	Joe Donohue <jdonohue@redhat.com>,
	Christophe Harle <charle@nvidia.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Lucien Dunning <ldunning@nvidia.com>,
	Cameron Buschardt <cabuschardt@nvidia.com>,
	Arvind Gopalakrishnan <arvindg@nvidia.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Shachar Raindel <raindel@mellanox.com>,
	Liran Liss <liranl@mellanox.com>,
	Roland Dreier <roland@purestorage.com>,
	Ben Sander <ben.sander@amd.com>,
	Greg Stoner <Greg.Stoner@amd.com>,
	John Bridgman <John.Bridgman@amd.com>,
	Michael Mantor <Michael.Mantor@amd.com>,
	Paul Blinzer <Paul.Blinzer@amd.com>,
	Leonid Shamis <Leonid.Shamis@amd.com>,
	Laurent Morichetti <Laurent.Morichetti@amd.com>,
	Alexander Deucher <Alexander.Deucher@amd.com>
Subject: Re: [PATCH v12 21/29] HMM: mm add helper to update page table when migrating memory back v2.
Date: Mon, 21 Mar 2016 16:57:32 +0530
Message-ID: <877fgwul3v.fsf@linux.vnet.ibm.com>
In-Reply-To: <1457469802-11850-22-git-send-email-jglisse@redhat.com>

Jérôme Glisse <jglisse@redhat.com> writes:

> To migrate memory back we first need to lock HMM special CPU page
> table entry so we know no one else might try to migrate those entry
> back. Helper also allocate new page where data will be copied back
> from the device. Then we can proceed with the device DMA operation.
>
> Once DMA is done we can update again the CPU page table to point to
> the new page that holds the content copied back from device memory.
>
> Note that we do not need to invalidate the range are we are only
> modifying non present CPU page table entry.
>
> Changed since v1:
>   - Save memcg against which each page is precharge as it might
>     change along the way.
>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> ---
>  include/linux/mm.h |  12 +++
>  mm/memory.c        | 257 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 269 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c5c062e..1cd060f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2392,6 +2392,18 @@ static inline void hmm_mm_init(struct mm_struct *mm)
>  {
>  	mm->hmm = NULL;
>  }
> +
> +int mm_hmm_migrate_back(struct mm_struct *mm,
> +			struct vm_area_struct *vma,
> +			pte_t *new_pte,
> +			unsigned long start,
> +			unsigned long end);
> +void mm_hmm_migrate_back_cleanup(struct mm_struct *mm,
> +				 struct vm_area_struct *vma,
> +				 pte_t *new_pte,
> +				 dma_addr_t *hmm_pte,
> +				 unsigned long start,
> +				 unsigned long end);
>  #else /* !CONFIG_HMM */
>  static inline void hmm_mm_init(struct mm_struct *mm)
>  {
> diff --git a/mm/memory.c b/mm/memory.c
> index 3cb3653..d917911a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3513,6 +3513,263 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  }
>  EXPORT_SYMBOL_GPL(handle_mm_fault);
>  
> +
> +#ifdef CONFIG_HMM
> +/* mm_hmm_migrate_back() - lock HMM CPU page table entry and allocate new page.
> + *
> + * @mm: The mm struct.
> + * @vma: The vm area struct the range is in.
> + * @new_pte: Array of new CPU page table entry value.
> + * @start: Start address of the range (inclusive).
> + * @end: End address of the range (exclusive).
> + *
> + * This function will lock HMM page table entry and allocate new page for entry
> + * it successfully locked.
> + */


Can you add more comments around this?

> +int mm_hmm_migrate_back(struct mm_struct *mm,
> +			struct vm_area_struct *vma,
> +			pte_t *new_pte,
> +			unsigned long start,
> +			unsigned long end)
> +{
> +	pte_t hmm_entry = swp_entry_to_pte(make_hmm_entry_locked());
> +	unsigned long addr, i;
> +	int ret = 0;
> +
> +	VM_BUG_ON(vma->vm_ops || (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
> +
> +	if (unlikely(anon_vma_prepare(vma)))
> +		return -ENOMEM;
> +
> +	start &= PAGE_MASK;
> +	end = PAGE_ALIGN(end);
> +	memset(new_pte, 0, sizeof(pte_t) * ((end - start) >> PAGE_SHIFT));
> +
> +	for (addr = start; addr < end;) {
> +		unsigned long cstart, next;
> +		spinlock_t *ptl;
> +		pgd_t *pgdp;
> +		pud_t *pudp;
> +		pmd_t *pmdp;
> +		pte_t *ptep;
> +
> +		pgdp = pgd_offset(mm, addr);
> +		pudp = pud_offset(pgdp, addr);
> +		/*
> +		 * Some other thread might already have migrated back the entry
> +		 * and freed the page table. Unlikely thought.
> +		 */
> +		if (unlikely(!pudp)) {
> +			addr = min((addr + PUD_SIZE) & PUD_MASK, end);
> +			continue;
> +		}
> +		pmdp = pmd_offset(pudp, addr);
> +		if (unlikely(!pmdp || pmd_bad(*pmdp) || pmd_none(*pmdp) ||
> +			     pmd_trans_huge(*pmdp))) {
> +			addr = min((addr + PMD_SIZE) & PMD_MASK, end);
> +			continue;
> +		}
> +		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> +		for (cstart = addr, i = (addr - start) >> PAGE_SHIFT,
> +		     next = min((addr + PMD_SIZE) & PMD_MASK, end);
> +		     addr < next; addr += PAGE_SIZE, ptep++, i++) {
> +			swp_entry_t entry;
> +
> +			entry = pte_to_swp_entry(*ptep);
> +			if (pte_none(*ptep) || pte_present(*ptep) ||
> +			    !is_hmm_entry(entry) ||
> +			    is_hmm_entry_locked(entry))
> +				continue;
> +
> +			set_pte_at(mm, addr, ptep, hmm_entry);
> +			new_pte[i] = pte_mkspecial(pfn_pte(my_zero_pfn(addr),
> +						   vma->vm_page_prot));
> +		}
> +		pte_unmap_unlock(ptep - 1, ptl);


I guess this is fixing up all the PTEs in the CPU page table that are
mapped by one PMD entry. But then what is the loop below doing?
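
To make the range arithmetic concrete, here is a minimal userspace sketch
of how the quoted loop splits [start, end) into PMD-sized chunks and
derives the new_pte[] index. The constants are assumed x86-64 values with
4 KiB pages, and the helper names (pmd_chunk_end, pte_index_in_range,
count_pmd_chunks) are hypothetical; only the two expressions are taken
from the patch.

```c
#include <assert.h>

/* Assumed values for illustration (x86-64, 4 KiB pages); the kernel
 * defines these per architecture. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_SIZE   (1UL << PMD_SHIFT)
#define PMD_MASK   (~(PMD_SIZE - 1))

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* End of the PMD-sized chunk containing addr, clamped to end.
 * Same expression as "next" in the quoted patch. */
static unsigned long pmd_chunk_end(unsigned long addr, unsigned long end)
{
	assert(addr < end);
	return min_ul((addr + PMD_SIZE) & PMD_MASK, end);
}

/* Slot in new_pte[] for addr; same expression as "i" in the patch. */
static unsigned long pte_index_in_range(unsigned long start,
					unsigned long addr)
{
	assert(addr >= start);
	return (addr - start) >> PAGE_SHIFT;
}

/* Number of iterations the outer loop makes when no chunk is skipped. */
static unsigned long count_pmd_chunks(unsigned long start,
				      unsigned long end)
{
	unsigned long addr, n = 0;

	for (addr = start; addr < end; addr = pmd_chunk_end(addr, end))
		n++;
	return n;
}
```

Each inner loop in the patch recomputes both values from cstart, which is
why all three of them cover the same chunk.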


> +
> +		for (addr = cstart, i = (addr - start) >> PAGE_SHIFT;
> +		     addr < next; addr += PAGE_SIZE, i++) {

Your use of the variable addr, with multiple loops updating it, also
makes this hard to follow. We should definitely add more comments here.
I guess we are going through the same range we iterated over above.

> +			struct mem_cgroup *memcg;
> +			struct page *page;
> +
> +			if (!pte_present(new_pte[i]))
> +				continue;

What is that checking for? Didn't we set that using pte_mkspecial above?

> +
> +			page = alloc_zeroed_user_highpage_movable(vma, addr);
> +			if (!page) {
> +				ret = -ENOMEM;
> +				break;
> +			}
> +			__SetPageUptodate(page);
> +			if (mem_cgroup_try_charge(page, mm, GFP_KERNEL,
> +						  &memcg)) {
> +				page_cache_release(page);
> +				ret = -ENOMEM;
> +				break;
> +			}
> +			/*
> +			 * We can safely reuse the s_mem/mapping field of page
> +			 * struct to store the memcg as the page is only seen
> +			 * by HMM at this point and we can clear it before it
> +			 * is public see mm_hmm_migrate_back_cleanup().
> +			 */
> +			page->s_mem = memcg;
> +			new_pte[i] = mk_pte(page, vma->vm_page_prot);
> +			if (vma->vm_flags & VM_WRITE) {
> +				new_pte[i] = pte_mkdirty(new_pte[i]);
> +				new_pte[i] = pte_mkwrite(new_pte[i]);
> +			}

Why mark it dirty if vm_flags has VM_WRITE set?

> +		}
> +
> +		if (!ret)
> +			continue;
> +
> +		hmm_entry = swp_entry_to_pte(make_hmm_entry());
> +		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);


Again we loop through the same range?

> +		for (addr = cstart, i = (addr - start) >> PAGE_SHIFT;
> +		     addr < next; addr += PAGE_SIZE, ptep++, i++) {
> +			unsigned long pfn = pte_pfn(new_pte[i]);
> +
> +			if (!pte_present(new_pte[i]) || !is_zero_pfn(pfn))
> +				continue;


What is that checking for?
> +
> +			set_pte_at(mm, addr, ptep, hmm_entry);
> +			pte_clear(mm, addr, &new_pte[i]);

What is that pte_clear for? The handling of new_pte needs more code comments.
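
As a guess at what this error path intends, here is a small userspace
model: slots that still hold the zero-page placeholder (locked but never
given a page) get their CPU entry restored to the unlocked HMM value and
their new_pte slot cleared, while slots that already hold a page are left
alone. The enums and the function name are hypothetical, and this models
only my reading of the hunk, not verified kernel behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CPU PTE and new_pte[] states. */
enum cpu_state { CPU_NONE, CPU_PRESENT, CPU_HMM, CPU_HMM_LOCKED };
enum slot_state { SLOT_EMPTY = 0, SLOT_ZERO_MARKER, SLOT_PAGE };

/* Undo the lock for every slot that never received a page.
 * Returns the number of entries rolled back. */
static size_t rollback_unallocated(enum cpu_state *cpu,
				   enum slot_state *new_pte, size_t n)
{
	size_t i, undone = 0;

	assert(cpu && new_pte);
	for (i = 0; i < n; i++) {
		/* Models: !pte_present(new_pte[i]) || !is_zero_pfn(pfn) */
		if (new_pte[i] != SLOT_ZERO_MARKER)
			continue;
		cpu[i] = CPU_HMM;		/* models set_pte_at(..., hmm_entry) */
		new_pte[i] = SLOT_EMPTY;	/* models pte_clear(..., &new_pte[i]) */
		undone++;
	}
	return undone;
}
```

If that is right, the pte_clear keeps the caller from seeing a placeholder
for an entry that is no longer locked.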

> +		}
> +		pte_unmap_unlock(ptep - 1, ptl);
> +		break;
> +	}
> +	return ret;
> +}
> +EXPORT_SYMBOL(mm_hmm_migrate_back);
> +


-aneesh


