linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Jérôme Glisse" <jglisse@redhat.com>
To: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Cc: "John Hubbard" <jhubbard@nvidia.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Jatin Kumar" <jakumar@nvidia.com>,
	"Mark Hairgrove" <mhairgrove@nvidia.com>,
	"Sherry Cheung" <SCheung@nvidia.com>,
	"Subhash Gutti" <sgutti@nvidia.com>
Subject: [HMM v13 08/18] mm/hmm: heterogeneous memory management (HMM for short)
Date: Fri, 18 Nov 2016 13:18:17 -0500	[thread overview]
Message-ID: <1479493107-982-9-git-send-email-jglisse@redhat.com> (raw)
In-Reply-To: <1479493107-982-1-git-send-email-jglisse@redhat.com>

HMM provides 3 separate functionality :
    - Mirroring: synchronize CPU page table and device page table
    - Device memory: allocating struct page for device memory
    - Migration: migrating regular memory to device memory

This patch introduces some common helpers and definitions to all of
those 3 functionality.

Signed-off-by: JA(C)rA'me Glisse <jglisse@redhat.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
---
 MAINTAINERS              |   7 +++
 include/linux/hmm.h      | 139 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h |   5 ++
 kernel/fork.c            |   2 +
 mm/Kconfig               |  11 ++++
 mm/Makefile              |   1 +
 mm/hmm.c                 |  86 +++++++++++++++++++++++++++++
 7 files changed, 251 insertions(+)
 create mode 100644 include/linux/hmm.h
 create mode 100644 mm/hmm.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f593300..41cd63d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5582,6 +5582,13 @@ S:	Supported
 F:	drivers/scsi/hisi_sas/
 F:	Documentation/devicetree/bindings/scsi/hisilicon-sas.txt
 
+HMM - Heterogeneous Memory Management
+M:	JA(C)rA'me Glisse <jglisse@redhat.com>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	mm/hmm*
+F:	include/linux/hmm*
+
 HOST AP DRIVER
 M:	Jouni Malinen <j@w1.fi>
 L:	hostap@shmoo.com (subscribers-only)
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
new file mode 100644
index 0000000..54dd529
--- /dev/null
+++ b/include/linux/hmm.h
@@ -0,0 +1,139 @@
+/*
+ * Copyright 2013 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Authors: JA(C)rA'me Glisse <jglisse@redhat.com>
+ */
+/*
+ * HMM provides 3 separate functionality :
+ *   - Mirroring: synchronize CPU page table and device page table
+ *   - Device memory: allocating struct page for device memory
+ *   - Migration: migrating regular memory to device memory
+ *
+ * Each can be use independently from the others.
+ *
+ *
+ * Mirroring:
+ *
+ * HMM provide helpers to mirror process address space on a device. For this it
+ * provides several helpers to order device page table update in respect to CPU
+ * page table update. Requirement is that for any given virtual address the CPU
+ * and device page table can not point to different physical page. It uses the
+ * mmu_notifier API and introduce virtual address range lock which block CPU
+ * page table update for a range while the device page table is being updated.
+ * Usage pattern is:
+ *
+ *      hmm_vma_range_lock(vma, start, end);
+ *      // snap shot CPU page table
+ *      // update device page table from snapshot
+ *      hmm_vma_range_unlock(vma, start, end);
+ *
+ * Any CPU page table update that conflict with a range lock will wait until
+ * range is unlock. This garanty proper serialization of CPU and device page
+ * table update.
+ *
+ *
+ * Device memory:
+ *
+ * HMM provides helpers to help leverage device memory either addressable like
+ * regular memory by the CPU or un-addressable at all. In both case the device
+ * memory is associated to dedicated structs page (which are allocated like for
+ * hotplug memory). Device memory management is under the responsability of the
+ * device driver. HMM only allocate and initialize the struct pages associated
+ * with the device memory.
+ *
+ * Allocating struct page for device memory allow to use device memory allmost
+ * like any regular memory. Unlike regular memory it can not be added to the
+ * lru, nor can any memory allocation can use device memory directly. Device
+ * memory will only end up to be use in a process if device driver migrate some
+ * of the process memory from regular memory to device memory.
+ *
+ *
+ * Migration:
+ *
+ * Existing memory migration mechanism (mm/migrate.c) does not allow to use
+ * something else than the CPU to copy from source to destination memory. More
+ * over existing code is not tailor to drive migration from process virtual
+ * address rather than from list of pages. Finaly the migration flow does not
+ * allow for graceful failure at different step of the migration process.
+ *
+ * HMM solves all of the above though simple API :
+ *
+ *      hmm_vma_migrate(vma, start, end, ops);
+ *
+ * With ops struct providing 2 callback alloc_and_copy() which allocated the
+ * destination memory and initialize it using source memory. Migration can fail
+ * after this step and thus last callback finalize_and_map() allow the device
+ * driver to know which page were successfully migrated and which were not.
+ *
+ * This can easily be use outside of HMM intended use case.
+ *
+ *
+ * This header file contain all the API related to this 3 functionality and
+ * each functions and struct are more thouroughly documented in below comments.
+ */
+#ifndef LINUX_HMM_H
+#define LINUX_HMM_H
+
+#include <linux/kconfig.h>
+
+#if IS_ENABLED(CONFIG_HMM)
+
+
+/*
+ * hmm_pfn_t - HMM use its own pfn type to keep several flags per page
+ *
+ * Flags:
+ * HMM_PFN_VALID: pfn is valid
+ * HMM_PFN_WRITE: CPU page table have the write permission set
+ */
+typedef unsigned long hmm_pfn_t;
+
+#define HMM_PFN_VALID (1 << 0)
+#define HMM_PFN_WRITE (1 << 1)
+#define HMM_PFN_SHIFT 2
+
+static inline struct page *hmm_pfn_to_page(hmm_pfn_t pfn)
+{
+	if (!(pfn & HMM_PFN_VALID))
+		return NULL;
+	return pfn_to_page(pfn >> HMM_PFN_SHIFT);
+}
+
+static inline unsigned long hmm_pfn_to_pfn(hmm_pfn_t pfn)
+{
+	if (!(pfn & HMM_PFN_VALID))
+		return -1UL;
+	return (pfn >> HMM_PFN_SHIFT);
+}
+
+static inline hmm_pfn_t hmm_pfn_from_page(struct page *page)
+{
+	return (page_to_pfn(page) << HMM_PFN_SHIFT) | HMM_PFN_VALID;
+}
+
+static inline hmm_pfn_t hmm_pfn_from_pfn(unsigned long pfn)
+{
+	return (pfn << HMM_PFN_SHIFT) | HMM_PFN_VALID;
+}
+
+
+/* Below are for HMM internal use only ! Not to be use by device driver ! */
+void hmm_mm_destroy(struct mm_struct *mm);
+
+#else /* IS_ENABLED(CONFIG_HMM) */
+
+/* Below are for HMM internal use only ! Not to be use by device driver ! */
+static inline void hmm_mm_destroy(struct mm_struct *mm) {}
+
+#endif /* IS_ENABLED(CONFIG_HMM) */
+#endif /* LINUX_HMM_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a8aced..4effdbf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,6 +23,7 @@
 
 struct address_space;
 struct mem_cgroup;
+struct hmm;
 
 #define USE_SPLIT_PTE_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
 #define USE_SPLIT_PMD_PTLOCKS	(USE_SPLIT_PTE_PTLOCKS && \
@@ -516,6 +517,10 @@ struct mm_struct {
 	atomic_long_t hugetlb_usage;
 #endif
 	struct work_struct async_put_work;
+#if IS_ENABLED(CONFIG_HMM)
+	/* HMM need to track few things per mm */
+	struct hmm *hmm;
+#endif
 };
 
 static inline void mm_init_cpumask(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index 690a1aad..af0eec8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -27,6 +27,7 @@
 #include <linux/binfmts.h>
 #include <linux/mman.h>
 #include <linux/mmu_notifier.h>
+#include <linux/hmm.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/vmacache.h>
@@ -702,6 +703,7 @@ void __mmdrop(struct mm_struct *mm)
 	BUG_ON(mm == &init_mm);
 	mm_free_pgd(mm);
 	destroy_context(mm);
+	hmm_mm_destroy(mm);
 	mmu_notifier_mm_destroy(mm);
 	check_mm(mm);
 	free_mm(mm);
diff --git a/mm/Kconfig b/mm/Kconfig
index 0a21411..be18cc2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -289,6 +289,17 @@ config MIGRATION
 config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	bool
 
+config HMM
+	bool "Heterogeneous memory management (HMM)"
+	depends on MMU
+	default n
+	help
+	  Heterogeneous memory management, set of helpers for:
+	    - mirroring of process address space on a device
+	    - using device memory transparently inside a process
+
+	  If unsure, say N to disable HMM.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
diff --git a/mm/Makefile b/mm/Makefile
index 2ca1faf..6ac1284 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_MEMTEST)		+= memtest.o
 obj-$(CONFIG_MIGRATION) += migrate.o
+obj-$(CONFIG_HMM) += hmm.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
diff --git a/mm/hmm.c b/mm/hmm.c
new file mode 100644
index 0000000..342b596
--- /dev/null
+++ b/mm/hmm.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright 2013 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Authors: JA(C)rA'me Glisse <jglisse@redhat.com>
+ */
+/*
+ * Refer to include/linux/hmm.h for informations about heterogeneous memory
+ * management or HMM for short.
+ */
+#include <linux/mm.h>
+#include <linux/hmm.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+
+/*
+ * struct hmm - HMM per mm struct
+ *
+ * @mm: mm struct this HMM struct is bound to
+ */
+struct hmm {
+	struct mm_struct	*mm;
+};
+
+/*
+ * hmm_register - register HMM against an mm (HMM internal)
+ *
+ * @mm: mm struct to attach to
+ *
+ * This is not intended to be use directly by device driver but by other HMM
+ * component. It allocates an HMM struct if mm does not have one and initialize
+ * it.
+ */
+static struct hmm *hmm_register(struct mm_struct *mm)
+{
+	struct hmm *hmm = NULL;
+
+	if (!mm->hmm) {
+		hmm = kmalloc(sizeof(*hmm), GFP_KERNEL);
+		if (!hmm)
+			return NULL;
+		hmm->mm = mm;
+	}
+
+	spin_lock(&mm->page_table_lock);
+	if (!mm->hmm)
+		/*
+		 * The hmm struct can only be free once mm_struct goes away
+		 * hence we should always have pre-allocated an new hmm struct
+		 * above.
+		 */
+		mm->hmm = hmm;
+	else if (hmm)
+		kfree(hmm);
+	hmm = mm->hmm;
+	spin_unlock(&mm->page_table_lock);
+
+	return hmm;
+}
+
+void hmm_mm_destroy(struct mm_struct *mm)
+{
+	struct hmm *hmm;
+
+	/*
+	 * We should not need to lock here as no one should be able to register
+	 * a new HMM while an mm is being destroy. But just to be safe ...
+	 */
+	spin_lock(&mm->page_table_lock);
+	hmm = mm->hmm;
+	mm->hmm = NULL;
+	spin_unlock(&mm->page_table_lock);
+	if (!hmm)
+		return;
+
+	kfree(hmm);
+}
-- 
2.4.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-11-18 17:17 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-18 18:18 [HMM v13 00/18] HMM (Heterogeneous Memory Management) v13 Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 01/18] mm/memory/hotplug: convert device parameter bool to set of flags Jérôme Glisse
2016-11-21  0:44   ` Balbir Singh
2016-11-21  4:53     ` Jerome Glisse
2016-11-21  6:57       ` Anshuman Khandual
2016-11-21 12:19         ` Jerome Glisse
2016-11-21  6:41   ` Anshuman Khandual
2016-11-21 12:27     ` Jerome Glisse
2016-11-22  5:35       ` Anshuman Khandual
2016-11-22 14:08         ` Jerome Glisse
2016-11-18 18:18 ` [HMM v13 02/18] mm/ZONE_DEVICE/unaddressable: add support for un-addressable device memory Jérôme Glisse
2016-11-21  8:06   ` Anshuman Khandual
2016-11-21 12:33     ` Jerome Glisse
2016-11-22  5:15       ` Anshuman Khandual
2016-11-18 18:18 ` [HMM v13 03/18] mm/ZONE_DEVICE/free_hot_cold_page: catch ZONE_DEVICE pages Jérôme Glisse
2016-11-21  8:18   ` Anshuman Khandual
2016-11-21 12:50     ` Jerome Glisse
2016-11-22  4:30       ` Anshuman Khandual
2016-11-18 18:18 ` [HMM v13 04/18] mm/ZONE_DEVICE/free-page: callback when page is freed Jérôme Glisse
2016-11-21  1:49   ` Balbir Singh
2016-11-21  4:57     ` Jerome Glisse
2016-11-21  8:26   ` Anshuman Khandual
2016-11-21 12:34     ` Jerome Glisse
2016-11-22  5:02       ` Anshuman Khandual
2016-11-18 18:18 ` [HMM v13 05/18] mm/ZONE_DEVICE/devmem_pages_remove: allow early removal of device memory Jérôme Glisse
2016-11-21 10:37   ` Anshuman Khandual
2016-11-21 12:39     ` Jerome Glisse
2016-11-22  4:54       ` Anshuman Khandual
2016-11-18 18:18 ` [HMM v13 06/18] mm/ZONE_DEVICE/unaddressable: add special swap for unaddressable Jérôme Glisse
2016-11-21  2:06   ` Balbir Singh
2016-11-21  5:05     ` Jerome Glisse
2016-11-22  2:19       ` Balbir Singh
2016-11-22 13:59         ` Jerome Glisse
2016-11-21 11:10     ` Anshuman Khandual
2016-11-21 10:58   ` Anshuman Khandual
2016-11-21 12:42     ` Jerome Glisse
2016-11-22  4:48       ` Anshuman Khandual
2016-11-24 13:56         ` Jerome Glisse
2016-11-18 18:18 ` [HMM v13 07/18] mm/ZONE_DEVICE/x86: add support for un-addressable device memory Jérôme Glisse
2016-11-21  2:08   ` Balbir Singh
2016-11-21  5:08     ` Jerome Glisse
2016-11-18 18:18 ` Jérôme Glisse [this message]
2016-11-21  2:29   ` [HMM v13 08/18] mm/hmm: heterogeneous memory management (HMM for short) Balbir Singh
2016-11-21  5:14     ` Jerome Glisse
2016-11-23  4:03   ` Anshuman Khandual
2016-11-27 13:10     ` Jerome Glisse
2016-11-28  2:58       ` Anshuman Khandual
2016-11-28  9:41         ` Jerome Glisse
2016-11-18 18:18 ` [HMM v13 09/18] mm/hmm/mirror: mirror process address space on device with HMM helpers Jérôme Glisse
2016-11-21  2:42   ` Balbir Singh
2016-11-21  5:18     ` Jerome Glisse
2016-11-18 18:18 ` [HMM v13 10/18] mm/hmm/mirror: add range lock helper, prevent CPU page table update for the range Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 11/18] mm/hmm/mirror: add range monitor helper, to monitor CPU page table update Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 12/18] mm/hmm/mirror: helper to snapshot CPU page table Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 13/18] mm/hmm/mirror: device page fault handler Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 14/18] mm/hmm/migrate: support un-addressable ZONE_DEVICE page in migration Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 15/18] mm/hmm/migrate: add new boolean copy flag to migratepage() callback Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 16/18] mm/hmm/migrate: new memory migration helper for use with device memory Jérôme Glisse
2016-11-18 19:57   ` Aneesh Kumar K.V
2016-11-18 20:15     ` Jerome Glisse
2016-11-19 14:32   ` Aneesh Kumar K.V
2016-11-19 17:17     ` Jerome Glisse
2016-11-20 18:21       ` Aneesh Kumar K.V
2016-11-20 20:06         ` Jerome Glisse
2016-11-21  3:30   ` Balbir Singh
2016-11-21  5:31     ` Jerome Glisse
2016-11-18 18:18 ` [HMM v13 17/18] mm/hmm/devmem: device driver helper to hotplug ZONE_DEVICE memory Jérôme Glisse
2016-11-18 18:18 ` [HMM v13 18/18] mm/hmm/devmem: dummy HMM device as an helper for " Jérôme Glisse
2016-11-19  0:41 ` [HMM v13 00/18] HMM (Heterogeneous Memory Management) v13 John Hubbard
2016-11-19 14:50   ` Aneesh Kumar K.V
2016-11-23  9:16 ` Haggai Eran
2016-11-25 16:16   ` Jerome Glisse
2016-11-27 13:27     ` Haggai Eran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1479493107-982-9-git-send-email-jglisse@redhat.com \
    --to=jglisse@redhat.com \
    --cc=SCheung@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=jakumar@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhairgrove@nvidia.com \
    --cc=sgutti@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).