Re: [HMM v13 16/18] mm/hmm/migrate: new memory migration helper for use with device memory

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: "Jérôme Glisse" <jglisse@redhat.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>,
	Jatin Kumar <jakumar@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>
Subject: Re: [HMM v13 16/18] mm/hmm/migrate: new memory migration helper for use with device memory
Date: Sat, 19 Nov 2016 01:27:28 +0530
Message-ID: <87k2c0muhj.fsf@linux.vnet.ibm.com>
In-Reply-To: <1479493107-982-17-git-send-email-jglisse@redhat.com>

Jérôme Glisse <jglisse@redhat.com> writes:

> This patch adds new memory migration helpers, which migrate the memory
> backing a range of virtual addresses of a process to different memory
> (which can be allocated through a special allocator). It differs from
> NUMA migration by working on a range of virtual addresses, and thus by
> doing migration in chunks that can be large enough to use a DMA engine
> or a special copy-offloading engine.
>
> Expected users are anyone with heterogeneous memory where different
> memories have different characteristics (latency, bandwidth, ...). As
> an example, an IBM platform with a CAPI bus can make use of this feature
> to migrate between regular memory and CAPI device memory. New CPU
> architectures with a pool of high-performance memory that is not managed
> as a cache but presented as regular memory (while being faster and with
> lower latency than DDR) will also be prime users of this patch.
>
> Migration to private device memory will be useful for devices that
> have a large pool of such memory, like GPUs; NVidia plans to use HMM for that.
>



..............


>+
> +static int hmm_collect_walk_pmd(pmd_t *pmdp,
> +				unsigned long start,
> +				unsigned long end,
> +				struct mm_walk *walk)
> +{
> +	struct hmm_migrate *migrate = walk->private;
> +	struct mm_struct *mm = walk->vma->vm_mm;
> +	unsigned long addr = start;
> +	spinlock_t *ptl;
> +	hmm_pfn_t *pfns;
> +	int pages = 0;
> +	pte_t *ptep;
> +
> +again:
> +	if (pmd_none(*pmdp))
> +		return 0;
> +
> +	split_huge_pmd(walk->vma, pmdp, addr);
> +	if (pmd_trans_unstable(pmdp))
> +		goto again;
> +
> +	pfns = &migrate->pfns[(addr - migrate->start) >> PAGE_SHIFT];
> +	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> +	arch_enter_lazy_mmu_mode();
> +
> +	for (; addr < end; addr += PAGE_SIZE, pfns++, ptep++) {
> +		unsigned long pfn;
> +		swp_entry_t entry;
> +		struct page *page;
> +		hmm_pfn_t flags;
> +		bool write;
> +		pte_t pte;
> +
> +		pte = ptep_get_and_clear(mm, addr, ptep);
> +		if (!pte_present(pte)) {
> +			if (pte_none(pte))
> +				continue;
> +
> +			entry = pte_to_swp_entry(pte);
> +			if (!is_device_entry(entry)) {
> +				set_pte_at(mm, addr, ptep, pte);
> +				continue;
> +			}
> +
> +			flags = HMM_PFN_DEVICE | HMM_PFN_UNADDRESSABLE;
> +			page = device_entry_to_page(entry);
> +			write = is_write_device_entry(entry);
> +			pfn = page_to_pfn(page);
> +
> +			if (!(page->pgmap->flags & MEMORY_MOVABLE)) {
> +				set_pte_at(mm, addr, ptep, pte);
> +				continue;
> +			}
> +
> +		} else {
> +			pfn = pte_pfn(pte);
> +			page = pfn_to_page(pfn);
> +			write = pte_write(pte);
> +			flags = is_zone_device_page(page) ? HMM_PFN_DEVICE : 0;
> +		}
> +
> +		/* FIXME support THP see hmm_migrate_page_check() */
> +		if (PageTransCompound(page))
> +			continue;
> +
> +		*pfns = hmm_pfn_from_pfn(pfn) | HMM_PFN_MIGRATE | flags;
> +		*pfns |= write ? HMM_PFN_WRITE : 0;
> +		migrate->npages++;
> +		get_page(page);
> +
> +		if (!trylock_page(page)) {
> +			set_pte_at(mm, addr, ptep, pte);
> +		} else {
> +			pte_t swp_pte;
> +
> +			*pfns |= HMM_PFN_LOCKED;
> +
> +			entry = make_migration_entry(page, write);
> +			swp_pte = swp_entry_to_pte(entry);
> +			if (pte_soft_dirty(pte))
> +				swp_pte = pte_swp_mksoft_dirty(swp_pte);
> +			set_pte_at(mm, addr, ptep, swp_pte);
> +
> +			page_remove_rmap(page, false);
> +			put_page(page);
> +			pages++;
> +		}

Can you explain this? What does a failure to lock mean here? Also, why
convert the pte to a migration entry here? We do that in try_to_unmap(), right?
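
To make the question concrete, here is how the two branches read to me,
written as a small userspace model. This is only a sketch of my reading of
the hunk above, not kernel code: model_page, model_slot, model_trylock(),
model_collect() and the MODEL_* flags are all made-up names, and the
refcount handling simply mirrors the get_page()/put_page() calls in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum slot_state { SLOT_NONE, SLOT_MAPPED, SLOT_MIGRATION };

struct model_page {
	bool locked;	/* stands in for the page lock */
	int refcount;	/* stands in for the page reference count */
};

struct model_slot {		/* stands in for one pte */
	enum slot_state state;
	struct model_page *page;
};

#define MODEL_MIGRATE	(1u << 0)	/* page recorded as a candidate */
#define MODEL_LOCKED	(1u << 1)	/* page lock taken during collection */

static bool model_trylock(struct model_page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

/*
 * Walk n slots. Every mapped page is recorded as a migration candidate
 * and gets an extra reference; only pages whose lock could be taken have
 * their slot replaced by a migration marker.
 * Returns the number of slots that were rewritten.
 */
static int model_collect(struct model_slot *slots, unsigned int *flags,
			 size_t n)
{
	int replaced = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_slot old = slots[i];

		flags[i] = 0;
		if (old.state != SLOT_MAPPED)
			continue;
		assert(old.page != NULL);

		slots[i].state = SLOT_NONE;	/* like ptep_get_and_clear() */
		flags[i] = MODEL_MIGRATE;
		old.page->refcount++;		/* like get_page() */

		if (!model_trylock(old.page)) {
			/* Lock contended: restore the original mapping. */
			slots[i] = old;
			continue;
		}

		flags[i] |= MODEL_LOCKED;
		slots[i].state = SLOT_MIGRATION; /* like the migration entry */
		old.page->refcount--;	/* mapping's reference is dropped */
		replaced++;
	}
	return replaced;
}
```

In this model a page whose lock is already held keeps its original mapping
and its extra reference, and is still flagged MODEL_MIGRATE but not
MODEL_LOCKED. If that matches the intent, I assume a later pass has to
lock and unmap those pages, which is the part I would like explained.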


> +	}
> +
> +	arch_leave_lazy_mmu_mode();
> +	pte_unmap_unlock(ptep - 1, ptl);
> +
> +	/* Only flush the TLB if we actually modified any entries */
> +	if (pages)
> +		flush_tlb_range(walk->vma, start, end);
> +
> +	return 0;
> +}
> 
