linux-mm.kvack.org archive mirror
* Re: mm: Various preparatory patches for hmm and kfd
       [not found] <1404856801-11702-1-git-send-email-j.glisse@gmail.com>
@ 2014-07-08 23:50 ` Jerome Glisse
       [not found] ` <1404856801-11702-2-git-send-email-j.glisse@gmail.com>
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-08 23:50 UTC (permalink / raw)
  To: akpm, linux-mm, linux-kernel
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay

On Tue, Jul 08, 2014 at 05:59:57PM -0400, j.glisse@gmail.com wrote:
> So here is the updated patchset. I believe the first patch, despite
> Joerg's opposition, is still a welcome change to mmput. Whether or not
> kfd and hmm should register might still be debated, but I strongly
> believe that the fact that we are tied to an mm_struct, while the file
> lifespan is not tied to a specific mm_struct, argues that we should
> interface with mm_struct destruction.
> 
> The third patch has updated comments and, I hope, addresses all of
> Linus' worries. The event descriptions have been updated and are
> hopefully clear enough.
> 
> The last patch has also been updated and now only passes the vma to the
> various mmu_notifier callbacks. This does mean that instead of calling
> the mmu_notifier once to zap a whole address range, it will be called
> once per vma. Given that this change only impacts processes that use
> vmas, and that we restrict ourselves to existing vmas, I believe the
> impact will be small and might even turn out to be a win: the mmu
> listener will not have to traverse a whole secondary page table full of
> nonexistent entries, but only do work on things that might exist in the
> secondary page table.
> 
> Hope this addresses any previous concerns about those patches.
> 
> Cheers,
> Jerome
> 
> 

I managed to forget the mailing list.


* Re: [PATCH 1/8] mmput: use notifier chain to call subsystem exit handler.
       [not found] ` <1404856801-11702-2-git-send-email-j.glisse@gmail.com>
@ 2014-07-08 23:51   ` Jerome Glisse
  2014-07-09 16:21   ` Joerg Roedel
  1 sibling, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-08 23:51 UTC (permalink / raw)
  To: akpm, linux-mm, linux-kernel
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse

On Tue, Jul 08, 2014 at 05:59:58PM -0400, j.glisse@gmail.com wrote:
From: Jerome Glisse <jglisse@redhat.com>

Several subsystems require a callback when an mm struct is being
destroyed so that they can clean up their respective per-mm state.
Instead of having each subsystem add its callback to mmput, use a
notifier chain to call each of the subsystems.

This will allow new subsystems to register callbacks even if they are
modules. There should be no contention on the rw semaphore protecting
the call chain, and the impact on the code path should be low and
buried in the noise.

Note that this patch also moves the call to the cleanup functions after
exit_mmap, so that new callbacks can assume that mmu_notifier_release
has already been called. This does not impact the existing cleanup
functions, as they do not rely on anything that exit_mmap is freeing.
Also moved khugepaged_exit to exit_mmap so that ordering is preserved
for that function.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 fs/aio.c                | 29 ++++++++++++++++++++++-------
 include/linux/aio.h     |  2 --
 include/linux/ksm.h     | 11 -----------
 include/linux/sched.h   |  5 +++++
 include/linux/uprobes.h |  1 -
 kernel/events/uprobes.c | 19 ++++++++++++++++---
 kernel/fork.c           | 22 ++++++++++++++++++----
 mm/ksm.c                | 26 +++++++++++++++++++++-----
 mm/mmap.c               |  3 +++
 9 files changed, 85 insertions(+), 33 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f2b85f8..a465b6d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -40,6 +40,7 @@
 #include <linux/ramfs.h>
 #include <linux/percpu-refcount.h>
 #include <linux/mount.h>
+#include <linux/notifier.h>
 
 #include <asm/kmap_types.h>
 #include <asm/uaccess.h>
@@ -776,20 +777,22 @@ ssize_t wait_on_sync_kiocb(struct kiocb *req)
 EXPORT_SYMBOL(wait_on_sync_kiocb);
 
 /*
- * exit_aio: called when the last user of mm goes away.  At this point, there is
+ * aio_exit: called when the last user of mm goes away.  At this point, there is
  * no way for any new requests to be submited or any of the io_* syscalls to be
  * called on the context.
  *
  * There may be outstanding kiocbs, but free_ioctx() will explicitly wait on
  * them.
  */
-void exit_aio(struct mm_struct *mm)
+static int aio_exit(struct notifier_block *nb,
+		    unsigned long action, void *data)
 {
+	struct mm_struct *mm = data;
 	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
 	int i;
 
 	if (!table)
-		return;
+		return 0;
 
 	for (i = 0; i < table->nr; ++i) {
 		struct kioctx *ctx = table->table[i];
@@ -798,10 +801,10 @@ void exit_aio(struct mm_struct *mm)
 			continue;
 		/*
 		 * We don't need to bother with munmap() here - exit_mmap(mm)
-		 * is coming and it'll unmap everything. And we simply can't,
-		 * this is not necessarily our ->mm.
-		 * Since kill_ioctx() uses non-zero ->mmap_size as indicator
-		 * that it needs to unmap the area, just set it to 0.
+		 * has already been called and everything is unmapped by now.
+		 * But to be safe, set ->mmap_size to 0, since aio_free_ring()
+		 * uses a non-zero ->mmap_size as an indicator that it needs
+		 * to unmap the area.
 		 */
 		ctx->mmap_size = 0;
 		kill_ioctx(mm, ctx, NULL);
@@ -809,6 +812,7 @@ void exit_aio(struct mm_struct *mm)
 
 	RCU_INIT_POINTER(mm->ioctx_table, NULL);
 	kfree(table);
+	return 0;
 }
 
 static void put_reqs_available(struct kioctx *ctx, unsigned nr)
@@ -1631,3 +1635,14 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
 	}
 	return ret;
 }
+
+static struct notifier_block aio_mmput_nb = {
+	.notifier_call		= aio_exit,
+	.priority		= 1,
+};
+
+static int __init aio_init(void)
+{
+	return mmput_register_notifier(&aio_mmput_nb);
+}
+subsys_initcall(aio_init);
diff --git a/include/linux/aio.h b/include/linux/aio.h
index d9c92da..6308fac 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -73,7 +73,6 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 extern ssize_t wait_on_sync_kiocb(struct kiocb *iocb);
 extern void aio_complete(struct kiocb *iocb, long res, long res2);
 struct mm_struct;
-extern void exit_aio(struct mm_struct *mm);
 extern long do_io_submit(aio_context_t ctx_id, long nr,
 			 struct iocb __user *__user *iocbpp, bool compat);
 void kiocb_set_cancel_fn(struct kiocb *req, kiocb_cancel_fn *cancel);
@@ -81,7 +80,6 @@ void kiocb_set_cancel_fn(struct kiocb *req, kiocb_cancel_fn *cancel);
 static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
 static inline void aio_complete(struct kiocb *iocb, long res, long res2) { }
 struct mm_struct;
-static inline void exit_aio(struct mm_struct *mm) { }
 static inline long do_io_submit(aio_context_t ctx_id, long nr,
 				struct iocb __user * __user *iocbpp,
 				bool compat) { return 0; }
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 3be6bb1..84c184f 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -20,7 +20,6 @@ struct mem_cgroup;
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		unsigned long end, int advice, unsigned long *vm_flags);
 int __ksm_enter(struct mm_struct *mm);
-void __ksm_exit(struct mm_struct *mm);
 
 static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
@@ -29,12 +28,6 @@ static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 	return 0;
 }
 
-static inline void ksm_exit(struct mm_struct *mm)
-{
-	if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
-		__ksm_exit(mm);
-}
-
 /*
  * A KSM page is one of those write-protected "shared pages" or "merged pages"
  * which KSM maps into multiple mms, wherever identical anonymous page content
@@ -83,10 +76,6 @@ static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 	return 0;
 }
 
-static inline void ksm_exit(struct mm_struct *mm)
-{
-}
-
 static inline int PageKsm(struct page *page)
 {
 	return 0;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ec89295..e9c3ea0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2384,6 +2384,11 @@ static inline void mmdrop(struct mm_struct * mm)
 		__mmdrop(mm);
 }
 
+/* mmput calls a list of notifiers; subsystems/modules can register
+ * new ones through the calls below.
+ */
+extern int mmput_register_notifier(struct notifier_block *nb);
+extern int mmput_unregister_notifier(struct notifier_block *nb);
 /* mmput gets rid of the mappings and all user-space */
 extern void mmput(struct mm_struct *);
 /* Grab a reference to a task's mm, if it is not already going away */
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 4f844c6..44e7267 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -120,7 +120,6 @@ extern int uprobe_pre_sstep_notifier(struct pt_regs *regs);
 extern void uprobe_notify_resume(struct pt_regs *regs);
 extern bool uprobe_deny_signal(void);
 extern bool arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs);
-extern void uprobe_clear_state(struct mm_struct *mm);
 extern int  arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
 extern int  arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
 extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1d0af8a..29aad1f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -37,6 +37,7 @@
 #include <linux/percpu-rwsem.h>
 #include <linux/task_work.h>
 #include <linux/shmem_fs.h>
+#include <linux/notifier.h>
 
 #include <linux/uprobes.h>
 
@@ -1220,16 +1221,19 @@ static struct xol_area *get_xol_area(void)
 /*
  * uprobe_clear_state - Free the area allocated for slots.
  */
-void uprobe_clear_state(struct mm_struct *mm)
+static int uprobe_clear_state(struct notifier_block *nb,
+			      unsigned long action, void *data)
 {
+	struct mm_struct *mm = data;
 	struct xol_area *area = mm->uprobes_state.xol_area;
 
 	if (!area)
-		return;
+		return 0;
 
 	put_page(area->page);
 	kfree(area->bitmap);
 	kfree(area);
+	return 0;
 }
 
 void uprobe_start_dup_mmap(void)
@@ -1979,9 +1983,14 @@ static struct notifier_block uprobe_exception_nb = {
 	.priority		= INT_MAX-1,	/* notified after kprobes, kgdb */
 };
 
+static struct notifier_block uprobe_mmput_nb = {
+	.notifier_call		= uprobe_clear_state,
+	.priority		= 0,
+};
+
 static int __init init_uprobes(void)
 {
-	int i;
+	int i, err;
 
 	for (i = 0; i < UPROBES_HASH_SZ; i++)
 		mutex_init(&uprobes_mmap_mutex[i]);
@@ -1989,6 +1998,10 @@ static int __init init_uprobes(void)
 	if (percpu_init_rwsem(&dup_mmap_sem))
 		return -ENOMEM;
 
+	err = mmput_register_notifier(&uprobe_mmput_nb);
+	if (err)
+		return err;
+
 	return register_die_notifier(&uprobe_exception_nb);
 }
 __initcall(init_uprobes);
diff --git a/kernel/fork.c b/kernel/fork.c
index 41c9890..18e4487 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -87,6 +87,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/task.h>
 
+static BLOCKING_NOTIFIER_HEAD(mmput_notifier);
+
 /*
  * Protected counters by write_lock_irq(&tasklist_lock)
  */
@@ -623,6 +625,21 @@ void __mmdrop(struct mm_struct *mm)
 EXPORT_SYMBOL_GPL(__mmdrop);
 
 /*
+ * Register a notifier that will be called by mmput
+ */
+int mmput_register_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&mmput_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(mmput_register_notifier);
+
+int mmput_unregister_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&mmput_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(mmput_unregister_notifier);
+
+/*
  * Decrement the use count and release all resources for an mm.
  */
 void mmput(struct mm_struct *mm)
@@ -630,11 +647,8 @@ void mmput(struct mm_struct *mm)
 	might_sleep();
 
 	if (atomic_dec_and_test(&mm->mm_users)) {
-		uprobe_clear_state(mm);
-		exit_aio(mm);
-		ksm_exit(mm);
-		khugepaged_exit(mm); /* must run before exit_mmap */
 		exit_mmap(mm);
+		blocking_notifier_call_chain(&mmput_notifier, 0, mm);
 		set_mm_exe_file(mm, NULL);
 		if (!list_empty(&mm->mmlist)) {
 			spin_lock(&mmlist_lock);
diff --git a/mm/ksm.c b/mm/ksm.c
index 346ddc9..cb1e976 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -37,6 +37,7 @@
 #include <linux/freezer.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/notifier.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -1586,7 +1587,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
 		/*
-		 * Although we tested list_empty() above, a racing __ksm_exit
+		 * Although we tested list_empty() above, a racing ksm_exit
 		 * of the last mm on the list may have removed it since then.
 		 */
 		if (slot == &ksm_mm_head)
@@ -1658,9 +1659,9 @@ next_mm:
 		/*
 		 * We've completed a full scan of all vmas, holding mmap_sem
 		 * throughout, and found no VM_MERGEABLE: so do the same as
-		 * __ksm_exit does to remove this mm from all our lists now.
-		 * This applies either when cleaning up after __ksm_exit
-		 * (but beware: we can reach here even before __ksm_exit),
+		 * ksm_exit does to remove this mm from all our lists now.
+		 * This applies either when cleaning up after ksm_exit
+		 * (but beware: we can reach here even before ksm_exit),
 		 * or when all VM_MERGEABLE areas have been unmapped (and
 		 * mmap_sem then protects against race with MADV_MERGEABLE).
 		 */
@@ -1821,11 +1822,16 @@ int __ksm_enter(struct mm_struct *mm)
 	return 0;
 }
 
-void __ksm_exit(struct mm_struct *mm)
+static int ksm_exit(struct notifier_block *nb,
+		    unsigned long action, void *data)
 {
+	struct mm_struct *mm = data;
 	struct mm_slot *mm_slot;
 	int easy_to_free = 0;
 
+	if (!test_bit(MMF_VM_MERGEABLE, &mm->flags))
+		return 0;
+
 	/*
 	 * This process is exiting: if it's straightforward (as is the
 	 * case when ksmd was never running), free mm_slot immediately.
@@ -1857,6 +1863,7 @@ void __ksm_exit(struct mm_struct *mm)
 		down_write(&mm->mmap_sem);
 		up_write(&mm->mmap_sem);
 	}
+	return 0;
 }
 
 struct page *ksm_might_need_to_copy(struct page *page,
@@ -2305,11 +2312,20 @@ static struct attribute_group ksm_attr_group = {
 };
 #endif /* CONFIG_SYSFS */
 
+static struct notifier_block ksm_mmput_nb = {
+	.notifier_call		= ksm_exit,
+	.priority		= 2,
+};
+
 static int __init ksm_init(void)
 {
 	struct task_struct *ksm_thread;
 	int err;
 
+	err = mmput_register_notifier(&ksm_mmput_nb);
+	if (err)
+		return err;
+
 	err = ksm_slab_init();
 	if (err)
 		goto out;
diff --git a/mm/mmap.c b/mm/mmap.c
index b2f81a6..b018aad 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2774,6 +2774,9 @@ void exit_mmap(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	unsigned long nr_accounted = 0;
 
+	/* Important to call this first. */
+	khugepaged_exit(mm);
+
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
-- 
1.9.0


* Re: [PATCH 2/8] mm: differentiate unmap for vmscan from other unmap.
       [not found] ` <1404856801-11702-3-git-send-email-j.glisse@gmail.com>
@ 2014-07-08 23:51   ` Jerome Glisse
  2014-07-09 16:24   ` Joerg Roedel
  1 sibling, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-08 23:51 UTC (permalink / raw)
  To: akpm, linux-mm, linux-kernel
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse

On Tue, Jul 08, 2014 at 05:59:59PM -0400, j.glisse@gmail.com wrote:
From: Jerome Glisse <jglisse@redhat.com>

New code will need to be able to differentiate between a regular unmap
and an unmap triggered by vmscan, in which case we want to be as quick
as possible.
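
As a sketch of the intent (the helper below is hypothetical; only the
TTU_VMSCAN and TTU_POISON flags come from this patch), callers can now
key their behavior off the reason for the unmap:

/* Hypothetical helper: pick a strategy based on why we are unmapping. */
static bool example_unmap_can_block(enum ttu_flags flags)
{
	/* vmscan wants the quick path; never block on its behalf. */
	if (flags & TTU_VMSCAN)
		return false;

	/* Memory poisoning (and other callers) can afford to wait. */
	return true;
}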

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 include/linux/rmap.h | 15 ++++++++-------
 mm/memory-failure.c  |  2 +-
 mm/vmscan.c          |  4 ++--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index be57450..eddbc07 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -72,13 +72,14 @@ struct anon_vma_chain {
 };
 
 enum ttu_flags {
-	TTU_UNMAP = 1,			/* unmap mode */
-	TTU_MIGRATION = 2,		/* migration mode */
-	TTU_MUNLOCK = 4,		/* munlock mode */
-
-	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */
-	TTU_IGNORE_ACCESS = (1 << 9),	/* don't age */
-	TTU_IGNORE_HWPOISON = (1 << 10),/* corrupted page is recoverable */
+	TTU_VMSCAN = 1,			/* unmap for vmscan */
+	TTU_POISON = 2,			/* unmap for poison */
+	TTU_MIGRATION = 4,		/* migration mode */
+	TTU_MUNLOCK = 8,		/* munlock mode */
+
+	TTU_IGNORE_MLOCK = (1 << 9),	/* ignore mlock */
+	TTU_IGNORE_ACCESS = (1 << 10),	/* don't age */
+	TTU_IGNORE_HWPOISON = (1 << 11),/* corrupted page is recoverable */
 };
 
 #ifdef CONFIG_MMU
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c035a2a..c931f05 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -887,7 +887,7 @@ static int page_action(struct page_state *ps, struct page *p,
 static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 				  int trapno, int flags, struct page **hpagep)
 {
-	enum ttu_flags ttu = TTU_UNMAP | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
+	enum ttu_flags ttu = TTU_POISON | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
 	struct address_space *mapping;
 	LIST_HEAD(tokill);
 	int ret;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6d24fd6..5a7d286 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1163,7 +1163,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 	}
 
 	ret = shrink_page_list(&clean_pages, zone, &sc,
-			TTU_UNMAP|TTU_IGNORE_ACCESS,
+			TTU_VMSCAN|TTU_IGNORE_ACCESS,
 			&dummy1, &dummy2, &dummy3, &dummy4, &dummy5, true);
 	list_splice(&clean_pages, page_list);
 	mod_zone_page_state(zone, NR_ISOLATED_FILE, -ret);
@@ -1518,7 +1518,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	if (nr_taken == 0)
 		return 0;
 
-	nr_reclaimed = shrink_page_list(&page_list, zone, sc, TTU_UNMAP,
+	nr_reclaimed = shrink_page_list(&page_list, zone, sc, TTU_VMSCAN,
 				&nr_dirty, &nr_unqueued_dirty, &nr_congested,
 				&nr_writeback, &nr_immediate,
 				false);
-- 
1.9.0


* Re: [PATCH 3/8] mmu_notifier: add event information to address invalidation v3
       [not found] ` <1404856801-11702-4-git-send-email-j.glisse@gmail.com>
@ 2014-07-08 23:52   ` Jerome Glisse
  2014-07-09 16:32   ` Joerg Roedel
  1 sibling, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-08 23:52 UTC (permalink / raw)
  To: akpm, linux-mm, linux-kernel
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse

On Tue, Jul 08, 2014 at 06:00:00PM -0400, j.glisse@gmail.com wrote:
From: Jerome Glisse <jglisse@redhat.com>

The event information will be useful for new users of the mmu_notifier
API. The event argument differentiates between a vma disappearing, a
page being write protected, or simply a page being unmapped. This allows
new users to take different paths for different events: for instance, on
unmap, the resources used to track a vma are still valid and should stay
around, while if the event says that a vma is being destroyed, any
resources used to track that vma can be freed.

Changed since v1:
  - renamed action into event (updated commit message too).
  - simplified the event names and clarified their intended usage,
    also documenting what expectations the listener can have with
    respect to each event.

Changed since v2:
  - Avoid crazy names.
  - Do not move code that does not need to move.
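
A sketch of the kind of listener this enables (the callback below is
hypothetical; only enum mmu_event and its values come from this patch):

/* Keep per-vma tracking on page unmap; drop it when the vma goes away. */
static void example_invalidate_range_start(struct mmu_notifier *mn,
					   struct mm_struct *mm,
					   unsigned long start,
					   unsigned long end,
					   enum mmu_event event)
{
	switch (event) {
	case MMU_MUNMAP:
		/* The vma itself is going: per-vma tracking can be freed. */
		break;
	case MMU_WRITE_BACK:
	case MMU_WRITE_PROTECT:
		/* Only write access must stop; reads stay valid. */
		break;
	default:
		/* MMU_MIGRATE and others: pages change, vma tracking stays. */
		break;
	}
}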

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c |   3 +-
 drivers/iommu/amd_iommu_v2.c            |  14 ++--
 drivers/misc/sgi-gru/grutlbpurge.c      |   9 ++-
 drivers/xen/gntdev.c                    |   9 ++-
 fs/proc/task_mmu.c                      |   6 +-
 include/linux/mmu_notifier.h            | 123 ++++++++++++++++++++++++++------
 kernel/events/uprobes.c                 |  10 ++-
 mm/filemap_xip.c                        |   2 +-
 mm/huge_memory.c                        |  39 ++++++----
 mm/hugetlb.c                            |  23 +++---
 mm/ksm.c                                |  18 +++--
 mm/memory.c                             |  27 ++++---
 mm/migrate.c                            |   9 ++-
 mm/mmu_notifier.c                       |  28 +++++---
 mm/mprotect.c                           |   5 +-
 mm/mremap.c                             |   6 +-
 mm/rmap.c                               |  24 +++++--
 virt/kvm/kvm_main.c                     |  12 ++--
 18 files changed, 263 insertions(+), 104 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 21ea928..ed6f35e 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -56,7 +56,8 @@ struct i915_mmu_object {
 static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 						       struct mm_struct *mm,
 						       unsigned long start,
-						       unsigned long end)
+						       unsigned long end,
+						       enum mmu_event event)
 {
 	struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn);
 	struct interval_tree_node *it = NULL;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 499b436..2bb9771 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -414,21 +414,25 @@ static int mn_clear_flush_young(struct mmu_notifier *mn,
 static void mn_change_pte(struct mmu_notifier *mn,
 			  struct mm_struct *mm,
 			  unsigned long address,
-			  pte_t pte)
+			  pte_t pte,
+			  enum mmu_event event)
 {
 	__mn_flush_page(mn, address);
 }
 
 static void mn_invalidate_page(struct mmu_notifier *mn,
 			       struct mm_struct *mm,
-			       unsigned long address)
+			       unsigned long address,
+			       enum mmu_event event)
 {
 	__mn_flush_page(mn, address);
 }
 
 static void mn_invalidate_range_start(struct mmu_notifier *mn,
 				      struct mm_struct *mm,
-				      unsigned long start, unsigned long end)
+				      unsigned long start,
+				      unsigned long end,
+				      enum mmu_event event)
 {
 	struct pasid_state *pasid_state;
 	struct device_state *dev_state;
@@ -449,7 +453,9 @@ static void mn_invalidate_range_start(struct mmu_notifier *mn,
 
 static void mn_invalidate_range_end(struct mmu_notifier *mn,
 				    struct mm_struct *mm,
-				    unsigned long start, unsigned long end)
+				    unsigned long start,
+				    unsigned long end,
+				    enum mmu_event event)
 {
 	struct pasid_state *pasid_state;
 	struct device_state *dev_state;
diff --git a/drivers/misc/sgi-gru/grutlbpurge.c b/drivers/misc/sgi-gru/grutlbpurge.c
index 2129274..e67fed1 100644
--- a/drivers/misc/sgi-gru/grutlbpurge.c
+++ b/drivers/misc/sgi-gru/grutlbpurge.c
@@ -221,7 +221,8 @@ void gru_flush_all_tlb(struct gru_state *gru)
  */
 static void gru_invalidate_range_start(struct mmu_notifier *mn,
 				       struct mm_struct *mm,
-				       unsigned long start, unsigned long end)
+				       unsigned long start, unsigned long end,
+				       enum mmu_event event)
 {
 	struct gru_mm_struct *gms = container_of(mn, struct gru_mm_struct,
 						 ms_notifier);
@@ -235,7 +236,8 @@ static void gru_invalidate_range_start(struct mmu_notifier *mn,
 
 static void gru_invalidate_range_end(struct mmu_notifier *mn,
 				     struct mm_struct *mm, unsigned long start,
-				     unsigned long end)
+				     unsigned long end,
+				     enum mmu_event event)
 {
 	struct gru_mm_struct *gms = container_of(mn, struct gru_mm_struct,
 						 ms_notifier);
@@ -248,7 +250,8 @@ static void gru_invalidate_range_end(struct mmu_notifier *mn,
 }
 
 static void gru_invalidate_page(struct mmu_notifier *mn, struct mm_struct *mm,
-				unsigned long address)
+				unsigned long address,
+				enum mmu_event event)
 {
 	struct gru_mm_struct *gms = container_of(mn, struct gru_mm_struct,
 						 ms_notifier);
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 073b4a1..fe9da94 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -428,7 +428,9 @@ static void unmap_if_in_range(struct grant_map *map,
 
 static void mn_invl_range_start(struct mmu_notifier *mn,
 				struct mm_struct *mm,
-				unsigned long start, unsigned long end)
+				unsigned long start,
+				unsigned long end,
+				enum mmu_event event)
 {
 	struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
 	struct grant_map *map;
@@ -445,9 +447,10 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 
 static void mn_invl_page(struct mmu_notifier *mn,
 			 struct mm_struct *mm,
-			 unsigned long address)
+			 unsigned long address,
+			 enum mmu_event event)
 {
-	mn_invl_range_start(mn, mm, address, address + PAGE_SIZE);
+	mn_invl_range_start(mn, mm, address, address + PAGE_SIZE, event);
 }
 
 static void mn_release(struct mmu_notifier *mn,
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..e9e79f7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -830,7 +830,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		};
 		down_read(&mm->mmap_sem);
 		if (type == CLEAR_REFS_SOFT_DIRTY)
-			mmu_notifier_invalidate_range_start(mm, 0, -1);
+			mmu_notifier_invalidate_range_start(mm, 0,
+							    -1, MMU_STATUS);
 		for (vma = mm->mmap; vma; vma = vma->vm_next) {
 			cp.vma = vma;
 			if (is_vm_hugetlb_page(vma))
@@ -858,7 +859,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 					&clear_refs_walk);
 		}
 		if (type == CLEAR_REFS_SOFT_DIRTY)
-			mmu_notifier_invalidate_range_end(mm, 0, -1);
+			mmu_notifier_invalidate_range_end(mm, 0,
+							  -1, MMU_STATUS);
 		flush_tlb_mm(mm);
 		up_read(&mm->mmap_sem);
 		mmput(mm);
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index deca874..34717b8 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -9,6 +9,58 @@
 struct mmu_notifier;
 struct mmu_notifier_ops;
 
+/* MMU events report fine-grained information to the callback routine,
+ * allowing the event listener to make a more informed decision as to what
+ * action to take. The event types are:
+ *
+ *   - MMU_MIGRATE: memory is migrating from one page to another, thus all
+ *     write accesses must stop after the invalidate_range_start callback
+ *     returns. Furthermore, no read access should be allowed either, as a new
+ *     page can be remapped with write access before the invalidate_range_end
+ *     callback happens, and thus any read access to the old page might read
+ *     stale data. There are several sources for this event, including:
+ *
+ *         - A page moving to swap (various reasons, including page reclaim),
+ *         - An mremap syscall,
+ *         - migration for NUMA reasons,
+ *         - balancing the memory pool,
+ *         - write fault on a COW page,
+ *         - and more that are not listed here.
+ *
+ *   - MMU_MPROT: memory access protection is changing. Refer to the vma to
+ *     get the new access protection. All memory accesses are still valid
+ *     until the invalidate_range_end callback.
+ *
+ *   - MMU_MUNMAP: the range is being unmapped (outcome of a munmap syscall
+ *     or process destruction). However, access is still allowed up until the
+ *     invalidate_range_free_pages callback. This also implies the secondary
+ *     page table can be trimmed, because the address range is no longer valid.
+ *
+ *   - MMU_WRITE_BACK: memory is being written back to disk, so all write
+ *     accesses must stop after the invalidate_range_start callback returns.
+ *     Read accesses are still allowed.
+ *
+ *   - MMU_WRITE_PROTECT: memory is being write protected (i.e. should be
+ *     mapped read-only no matter what the vma memory protection allows). All
+ *     write accesses must stop after the invalidate_range_start callback
+ *     returns. Read accesses are still allowed.
+ *
+ *   - MMU_STATUS: memory status change, like the soft dirty or huge page
+ *     splitting flag being set on a pmd.
+ *
+ * If in doubt when adding a new notifier caller, please use MMU_MIGRATE,
+ * because it will always lead to reasonable behavior, but will not allow the
+ * listener a chance to optimize its events.
+enum mmu_event {
+	MMU_MIGRATE = 0,
+	MMU_MPROT,
+	MMU_MUNMAP,
+	MMU_STATUS,
+	MMU_WRITE_BACK,
+	MMU_WRITE_PROTECT,
+};
+
 #ifdef CONFIG_MMU_NOTIFIER
 
 /*
@@ -79,7 +131,8 @@ struct mmu_notifier_ops {
 	void (*change_pte)(struct mmu_notifier *mn,
 			   struct mm_struct *mm,
 			   unsigned long address,
-			   pte_t pte);
+			   pte_t pte,
+			   enum mmu_event event);
 
 	/*
 	 * Before this is invoked any secondary MMU is still ok to
@@ -90,7 +143,8 @@ struct mmu_notifier_ops {
 	 */
 	void (*invalidate_page)(struct mmu_notifier *mn,
 				struct mm_struct *mm,
-				unsigned long address);
+				unsigned long address,
+				enum mmu_event event);
 
 	/*
 	 * invalidate_range_start() and invalidate_range_end() must be
@@ -137,10 +191,14 @@ struct mmu_notifier_ops {
 	 */
 	void (*invalidate_range_start)(struct mmu_notifier *mn,
 				       struct mm_struct *mm,
-				       unsigned long start, unsigned long end);
+				       unsigned long start,
+				       unsigned long end,
+				       enum mmu_event event);
 	void (*invalidate_range_end)(struct mmu_notifier *mn,
 				     struct mm_struct *mm,
-				     unsigned long start, unsigned long end);
+				     unsigned long start,
+				     unsigned long end,
+				     enum mmu_event event);
 };
 
 /*
@@ -177,13 +235,20 @@ extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 extern int __mmu_notifier_test_young(struct mm_struct *mm,
 				     unsigned long address);
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
-				      unsigned long address, pte_t pte);
+				      unsigned long address,
+				      pte_t pte,
+				      enum mmu_event event);
 extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
-					  unsigned long address);
+					  unsigned long address,
+					  enum mmu_event event);
 extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
-				  unsigned long start, unsigned long end);
+						  unsigned long start,
+						  unsigned long end,
+						  enum mmu_event event);
 extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
-				  unsigned long start, unsigned long end);
+						unsigned long start,
+						unsigned long end,
+						enum mmu_event event);
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
@@ -208,31 +273,38 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,
 }
 
 static inline void mmu_notifier_change_pte(struct mm_struct *mm,
-					   unsigned long address, pte_t pte)
+					   unsigned long address,
+					   pte_t pte,
+					   enum mmu_event event)
 {
 	if (mm_has_notifiers(mm))
-		__mmu_notifier_change_pte(mm, address, pte);
+		__mmu_notifier_change_pte(mm, address, pte, event);
 }
 
 static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
-					  unsigned long address)
+						unsigned long address,
+						enum mmu_event event)
 {
 	if (mm_has_notifiers(mm))
-		__mmu_notifier_invalidate_page(mm, address);
+		__mmu_notifier_invalidate_page(mm, address, event);
 }
 
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
-				  unsigned long start, unsigned long end)
+						       unsigned long start,
+						       unsigned long end,
+						       enum mmu_event event)
 {
 	if (mm_has_notifiers(mm))
-		__mmu_notifier_invalidate_range_start(mm, start, end);
+		__mmu_notifier_invalidate_range_start(mm, start, end, event);
 }
 
 static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
-				  unsigned long start, unsigned long end)
+						     unsigned long start,
+						     unsigned long end,
+						     enum mmu_event event)
 {
 	if (mm_has_notifiers(mm))
-		__mmu_notifier_invalidate_range_end(mm, start, end);
+		__mmu_notifier_invalidate_range_end(mm, start, end, event);
 }
 
 static inline void mmu_notifier_mm_init(struct mm_struct *mm)
@@ -278,13 +350,13 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
  * old page would remain mapped readonly in the secondary MMUs after the new
  * page is already writable by some CPU through the primary MMU.
  */
-#define set_pte_at_notify(__mm, __address, __ptep, __pte)		\
+#define set_pte_at_notify(__mm, __address, __ptep, __pte, __event)	\
 ({									\
 	struct mm_struct *___mm = __mm;					\
 	unsigned long ___address = __address;				\
 	pte_t ___pte = __pte;						\
 									\
-	mmu_notifier_change_pte(___mm, ___address, ___pte);		\
+	mmu_notifier_change_pte(___mm, ___address, ___pte, __event);	\
 	set_pte_at(___mm, ___address, __ptep, ___pte);			\
 })
 
@@ -307,22 +379,29 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,
 }
 
 static inline void mmu_notifier_change_pte(struct mm_struct *mm,
-					   unsigned long address, pte_t pte)
+					   unsigned long address,
+					   pte_t pte,
+					   enum mmu_event event)
 {
 }
 
 static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
-					  unsigned long address)
+						unsigned long address,
+						enum mmu_event event)
 {
 }
 
 static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
-				  unsigned long start, unsigned long end)
+						       unsigned long start,
+						       unsigned long end,
+						       enum mmu_event event)
 {
 }
 
 static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
-				  unsigned long start, unsigned long end)
+						     unsigned long start,
+						     unsigned long end,
+						     enum mmu_event event)
 {
 }
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 29aad1f..ddbe598 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -177,7 +177,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	/* For try_to_free_swap() and munlock_vma_page() below */
 	lock_page(page);
 
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start,
+					    mmun_end, MMU_MIGRATE);
 	err = -EAGAIN;
 	ptep = page_check_address(page, mm, addr, &ptl, 0);
 	if (!ptep)
@@ -195,7 +196,9 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	flush_cache_page(vma, addr, pte_pfn(*ptep));
 	ptep_clear_flush(vma, addr, ptep);
-	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
+	set_pte_at_notify(mm, addr, ptep,
+			  mk_pte(kpage, vma->vm_page_prot),
+			  MMU_MIGRATE);
 
 	page_remove_rmap(page);
 	if (!page_mapped(page))
@@ -209,7 +212,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	err = 0;
  unlock:
 	mem_cgroup_cancel_charge(kpage, memcg);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 	unlock_page(page);
 	return err;
 }
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index d8d9fe3..a2b3f09 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -198,7 +198,7 @@ retry:
 			BUG_ON(pte_dirty(pteval));
 			pte_unmap_unlock(pte, ptl);
 			/* must invalidate_page _before_ freeing the page */
-			mmu_notifier_invalidate_page(mm, address);
+			mmu_notifier_invalidate_page(mm, address, MMU_MIGRATE);
 			page_cache_release(page);
 		}
 	}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5d562a9..3fb1f9e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1029,7 +1029,8 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
 
 	mmun_start = haddr;
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+					    MMU_MIGRATE);
 
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_same(*pmd, orig_pmd)))
@@ -1063,7 +1064,8 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
 	page_remove_rmap(page);
 	spin_unlock(ptl);
 
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 
 	ret |= VM_FAULT_WRITE;
 	put_page(page);
@@ -1073,7 +1075,8 @@ out:
 
 out_free_pages:
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		memcg = (void *)page_private(pages[i]);
 		set_page_private(pages[i], 0);
@@ -1165,7 +1168,8 @@ alloc:
 
 	mmun_start = haddr;
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+					    MMU_MIGRATE);
 
 	spin_lock(ptl);
 	if (page)
@@ -1197,7 +1201,8 @@ alloc:
 	}
 	spin_unlock(ptl);
 out_mn:
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 out:
 	return ret;
 out_unlock:
@@ -1632,7 +1637,8 @@ static int __split_huge_page_splitting(struct page *page,
 	const unsigned long mmun_start = address;
 	const unsigned long mmun_end   = address + HPAGE_PMD_SIZE;
 
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start,
+					    mmun_end, MMU_STATUS);
 	pmd = page_check_address_pmd(page, mm, address,
 			PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG, &ptl);
 	if (pmd) {
@@ -1647,7 +1653,8 @@ static int __split_huge_page_splitting(struct page *page,
 		ret = 1;
 		spin_unlock(ptl);
 	}
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_STATUS);
 
 	return ret;
 }
@@ -2446,7 +2453,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 
 	mmun_start = address;
 	mmun_end   = address + HPAGE_PMD_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start,
+					    mmun_end, MMU_MIGRATE);
 	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
 	/*
 	 * After this gup_fast can't run anymore. This also removes
@@ -2456,7 +2464,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	_pmd = pmdp_clear_flush(vma, address, pmd);
 	spin_unlock(pmd_ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 
 	spin_lock(pte_ptl);
 	isolated = __collapse_huge_page_isolate(vma, address, pte);
@@ -2845,24 +2854,28 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
 	mmun_start = haddr;
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
 again:
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start,
+					    mmun_end, MMU_MIGRATE);
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(mm, mmun_start,
+						  mmun_end, MMU_MIGRATE);
 		return;
 	}
 	if (is_huge_zero_pmd(*pmd)) {
 		__split_huge_zero_page_pmd(vma, haddr, pmd);
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(mm, mmun_start,
+						  mmun_end, MMU_MIGRATE);
 		return;
 	}
 	page = pmd_page(*pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	get_page(page);
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 
 	split_huge_page(page);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4e49741..fa28018 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2555,7 +2555,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	mmun_start = vma->vm_start;
 	mmun_end = vma->vm_end;
 	if (cow)
-		mmu_notifier_invalidate_range_start(src, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_start(src, mmun_start,
+						    mmun_end, MMU_MIGRATE);
 
 	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
@@ -2605,7 +2606,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	}
 
 	if (cow)
-		mmu_notifier_invalidate_range_end(src, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(src, mmun_start,
+						  mmun_end, MMU_MIGRATE);
 
 	return ret;
 }
@@ -2631,7 +2633,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	BUG_ON(end & ~huge_page_mask(h));
 
 	tlb_start_vma(tlb, vma);
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start,
+					    mmun_end, MMU_MIGRATE);
 again:
 	for (address = start; address < end; address += sz) {
 		ptep = huge_pte_offset(mm, address);
@@ -2702,7 +2705,8 @@ unlock:
 		if (address < end && !ref_page)
 			goto again;
 	}
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 	tlb_end_vma(tlb, vma);
 }
 
@@ -2880,8 +2884,8 @@ retry_avoidcopy:
 
 	mmun_start = address & huge_page_mask(h);
 	mmun_end = mmun_start + huge_page_size(h);
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
-
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+					    MMU_MIGRATE);
 	/*
 	 * Retake the page table lock to check for racing updates
 	 * before the page tables are altered
@@ -2901,7 +2905,8 @@ retry_avoidcopy:
 		new_page = old_page;
 	}
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end,
+					  MMU_MIGRATE);
 out_release_all:
 	page_cache_release(new_page);
 out_release_old:
@@ -3339,7 +3344,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	BUG_ON(address >= end);
 	flush_cache_range(vma, address, end);
 
-	mmu_notifier_invalidate_range_start(mm, start, end);
+	mmu_notifier_invalidate_range_start(mm, start, end, MMU_MPROT);
 	mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
 	for (; address < end; address += huge_page_size(h)) {
 		spinlock_t *ptl;
@@ -3369,7 +3374,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	 */
 	flush_tlb_range(vma, start, end);
 	mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
-	mmu_notifier_invalidate_range_end(mm, start, end);
+	mmu_notifier_invalidate_range_end(mm, start, end, MMU_MPROT);
 
 	return pages << h->order;
 }
diff --git a/mm/ksm.c b/mm/ksm.c
index cb1e976..f50e103 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -873,7 +873,8 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 
 	mmun_start = addr;
 	mmun_end   = addr + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+					    MMU_WRITE_PROTECT);
 
 	ptep = page_check_address(page, mm, addr, &ptl, 0);
 	if (!ptep)
@@ -905,7 +906,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 		if (pte_dirty(entry))
 			set_page_dirty(page);
 		entry = pte_mkclean(pte_wrprotect(entry));
-		set_pte_at_notify(mm, addr, ptep, entry);
+		set_pte_at_notify(mm, addr, ptep, entry, MMU_WRITE_PROTECT);
 	}
 	*orig_pte = *ptep;
 	err = 0;
@@ -913,7 +914,8 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 out_unlock:
 	pte_unmap_unlock(ptep, ptl);
 out_mn:
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end,
+					  MMU_WRITE_PROTECT);
 out:
 	return err;
 }
@@ -949,7 +951,8 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 
 	mmun_start = addr;
 	mmun_end   = addr + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+					    MMU_MIGRATE);
 
 	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	if (!pte_same(*ptep, orig_pte)) {
@@ -962,7 +965,9 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 
 	flush_cache_page(vma, addr, pte_pfn(*ptep));
 	ptep_clear_flush(vma, addr, ptep);
-	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
+	set_pte_at_notify(mm, addr, ptep,
+			  mk_pte(kpage, vma->vm_page_prot),
+			  MMU_MIGRATE);
 
 	page_remove_rmap(page);
 	if (!page_mapped(page))
@@ -972,7 +977,8 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	pte_unmap_unlock(ptep, ptl);
 	err = 0;
 out_mn:
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end,
+					  MMU_MIGRATE);
 out:
 	return err;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 13141ae..74fe242 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1050,7 +1050,7 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	mmun_end   = end;
 	if (is_cow)
 		mmu_notifier_invalidate_range_start(src_mm, mmun_start,
-						    mmun_end);
+						    mmun_end, MMU_MIGRATE);
 
 	ret = 0;
 	dst_pgd = pgd_offset(dst_mm, addr);
@@ -1067,7 +1067,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
 
 	if (is_cow)
-		mmu_notifier_invalidate_range_end(src_mm, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(src_mm, mmun_start, mmun_end,
+						  MMU_MIGRATE);
 	return ret;
 }
 
@@ -1371,10 +1372,12 @@ void unmap_vmas(struct mmu_gather *tlb,
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
+	mmu_notifier_invalidate_range_start(mm, start_addr,
+					    end_addr, MMU_MUNMAP);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
 		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
-	mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
+	mmu_notifier_invalidate_range_end(mm, start_addr,
+					  end_addr, MMU_MUNMAP);
 }
 
 /**
@@ -1396,10 +1399,10 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
-	mmu_notifier_invalidate_range_start(mm, start, end);
+	mmu_notifier_invalidate_range_start(mm, start, end, MMU_MUNMAP);
 	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
 		unmap_single_vma(&tlb, vma, start, end, details);
-	mmu_notifier_invalidate_range_end(mm, start, end);
+	mmu_notifier_invalidate_range_end(mm, start, end, MMU_MUNMAP);
 	tlb_finish_mmu(&tlb, start, end);
 }
 
@@ -1422,9 +1425,9 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, address, end);
 	update_hiwater_rss(mm);
-	mmu_notifier_invalidate_range_start(mm, address, end);
+	mmu_notifier_invalidate_range_start(mm, address, end, MMU_MUNMAP);
 	unmap_single_vma(&tlb, vma, address, end, details);
-	mmu_notifier_invalidate_range_end(mm, address, end);
+	mmu_notifier_invalidate_range_end(mm, address, end, MMU_MUNMAP);
 	tlb_finish_mmu(&tlb, address, end);
 }
 
@@ -2208,7 +2211,8 @@ gotten:
 
 	mmun_start  = address & PAGE_MASK;
 	mmun_end    = mmun_start + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start,
+					    mmun_end, MMU_MIGRATE);
 
 	/*
 	 * Re-check the pte - we dropped the lock
@@ -2240,7 +2244,7 @@ gotten:
 		 * mmu page tables (such as kvm shadow page tables), we want the
 		 * new page to be mapped directly into the secondary page table.
 		 */
-		set_pte_at_notify(mm, address, page_table, entry);
+		set_pte_at_notify(mm, address, page_table, entry, MMU_MIGRATE);
 		update_mmu_cache(vma, address, page_table);
 		if (old_page) {
 			/*
@@ -2279,7 +2283,8 @@ gotten:
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (mmun_end > mmun_start)
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(mm, mmun_start,
+						  mmun_end, MMU_MIGRATE);
 	if (old_page) {
 		/*
 		 * Don't let another task, with possibly unlocked vma,
diff --git a/mm/migrate.c b/mm/migrate.c
index ab43fbf..b526c72 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1820,12 +1820,14 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	WARN_ON(PageLRU(new_page));
 
 	/* Recheck the target PMD */
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start,
+					    mmun_end, MMU_MIGRATE);
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
 fail_putback:
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(mm, mmun_start,
+						  mmun_end, MMU_MIGRATE);
 
 		/* Reverse changes made by migrate_page_copy() */
 		if (TestClearPageActive(new_page))
@@ -1878,7 +1880,8 @@ fail_putback:
 	page_remove_rmap(page);
 
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 
 	/* Take an "isolate" reference and put new page on the LRU. */
 	get_page(new_page);
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 41cefdf..9decb88 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -122,8 +122,10 @@ int __mmu_notifier_test_young(struct mm_struct *mm,
 	return young;
 }
 
-void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
-			       pte_t pte)
+void __mmu_notifier_change_pte(struct mm_struct *mm,
+			       unsigned long address,
+			       pte_t pte,
+			       enum mmu_event event)
 {
 	struct mmu_notifier *mn;
 	int id;
@@ -131,13 +133,14 @@ void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->change_pte)
-			mn->ops->change_pte(mn, mm, address, pte);
+			mn->ops->change_pte(mn, mm, address, pte, event);
 	}
 	srcu_read_unlock(&srcu, id);
 }
 
 void __mmu_notifier_invalidate_page(struct mm_struct *mm,
-					  unsigned long address)
+				    unsigned long address,
+				    enum mmu_event event)
 {
 	struct mmu_notifier *mn;
 	int id;
@@ -145,13 +148,16 @@ void __mmu_notifier_invalidate_page(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_page)
-			mn->ops->invalidate_page(mn, mm, address);
+			mn->ops->invalidate_page(mn, mm, address, event);
 	}
 	srcu_read_unlock(&srcu, id);
 }
 
 void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
-				  unsigned long start, unsigned long end)
+					   unsigned long start,
+					   unsigned long end,
+					   enum mmu_event event)
+
 {
 	struct mmu_notifier *mn;
 	int id;
@@ -159,14 +165,17 @@ void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start)
-			mn->ops->invalidate_range_start(mn, mm, start, end);
+			mn->ops->invalidate_range_start(mn, mm, start,
+							end, event);
 	}
 	srcu_read_unlock(&srcu, id);
 }
 EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
 
 void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
-				  unsigned long start, unsigned long end)
+					 unsigned long start,
+					 unsigned long end,
+					 enum mmu_event event)
 {
 	struct mmu_notifier *mn;
 	int id;
@@ -174,7 +183,8 @@ void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_end)
-			mn->ops->invalidate_range_end(mn, mm, start, end);
+			mn->ops->invalidate_range_end(mn, mm, start,
+						      end, event);
 	}
 	srcu_read_unlock(&srcu, id);
 }
diff --git a/mm/mprotect.c b/mm/mprotect.c
index c43d557..886405b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -157,7 +157,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		/* invoke the mmu notifier if the pmd is populated */
 		if (!mni_start) {
 			mni_start = addr;
-			mmu_notifier_invalidate_range_start(mm, mni_start, end);
+			mmu_notifier_invalidate_range_start(mm, mni_start,
+							    end, MMU_MPROT);
 		}
 
 		if (pmd_trans_huge(*pmd)) {
@@ -185,7 +186,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 	} while (pmd++, addr = next, addr != end);
 
 	if (mni_start)
-		mmu_notifier_invalidate_range_end(mm, mni_start, end);
+		mmu_notifier_invalidate_range_end(mm, mni_start, end, MMU_MPROT);
 
 	if (nr_huge_updates)
 		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
diff --git a/mm/mremap.c b/mm/mremap.c
index 05f1180..6827d2f 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -177,7 +177,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	mmun_start = old_addr;
 	mmun_end   = old_end;
-	mmu_notifier_invalidate_range_start(vma->vm_mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(vma->vm_mm, mmun_start,
+					    mmun_end, MMU_MIGRATE);
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
@@ -228,7 +229,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 	if (likely(need_flush))
 		flush_tlb_range(vma, old_end-len, old_addr);
 
-	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start,
+					  mmun_end, MMU_MIGRATE);
 
 	return len + old_addr - old_end;	/* how much done */
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index 7928ddd..474a913 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -840,7 +840,7 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 	pte_unmap_unlock(pte, ptl);
 
 	if (ret) {
-		mmu_notifier_invalidate_page(mm, address);
+		mmu_notifier_invalidate_page(mm, address, MMU_WRITE_BACK);
 		(*cleaned)++;
 	}
 out:
@@ -1128,6 +1128,10 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	int ret = SWAP_AGAIN;
 	enum ttu_flags flags = (enum ttu_flags)arg;
+	enum mmu_event event = MMU_MIGRATE;
+
+	if (flags & TTU_MUNLOCK)
+		event = MMU_STATUS;
 
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
@@ -1233,7 +1237,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret != SWAP_FAIL && !(flags & TTU_MUNLOCK))
-		mmu_notifier_invalidate_page(mm, address);
+		mmu_notifier_invalidate_page(mm, address, event);
 out:
 	return ret;
 
@@ -1287,7 +1291,9 @@ out_mlock:
 #define CLUSTER_MASK	(~(CLUSTER_SIZE - 1))
 
 static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
-		struct vm_area_struct *vma, struct page *check_page)
+				struct vm_area_struct *vma,
+				struct page *check_page,
+				enum ttu_flags flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t *pmd;
@@ -1301,6 +1307,10 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 	unsigned long end;
 	int ret = SWAP_AGAIN;
 	int locked_vma = 0;
+	enum mmu_event event = MMU_MIGRATE;
+
+	if (flags & TTU_MUNLOCK)
+		event = MMU_STATUS;
 
 	address = (vma->vm_start + cursor) & CLUSTER_MASK;
 	end = address + CLUSTER_SIZE;
@@ -1315,7 +1325,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 
 	mmun_start = address;
 	mmun_end   = end;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end, event);
 
 	/*
 	 * If we can acquire the mmap_sem for read, and vma is VM_LOCKED,
@@ -1380,7 +1390,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 		(*mapcount)--;
 	}
 	pte_unmap_unlock(pte - 1, ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end, event);
 	if (locked_vma)
 		up_read(&vma->vm_mm->mmap_sem);
 	return ret;
@@ -1436,7 +1446,9 @@ static int try_to_unmap_nonlinear(struct page *page,
 			while (cursor < max_nl_cursor &&
 				cursor < vma->vm_end - vma->vm_start) {
 				if (try_to_unmap_cluster(cursor, &mapcount,
-						vma, page) == SWAP_MLOCK)
+							 vma, page,
+							 (enum ttu_flags)arg)
+							 == SWAP_MLOCK)
 					ret = SWAP_MLOCK;
 				cursor += CLUSTER_SIZE;
 				vma->vm_private_data = (void *) cursor;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..6e1992f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -262,7 +262,8 @@ static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
 
 static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
 					     struct mm_struct *mm,
-					     unsigned long address)
+					     unsigned long address,
+					     enum mmu_event event)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int need_tlb_flush, idx;
@@ -301,7 +302,8 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
 static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 					struct mm_struct *mm,
 					unsigned long address,
-					pte_t pte)
+					pte_t pte,
+					enum mmu_event event)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
@@ -317,7 +319,8 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 						    struct mm_struct *mm,
 						    unsigned long start,
-						    unsigned long end)
+						    unsigned long end,
+						    enum mmu_event event)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int need_tlb_flush = 0, idx;
@@ -343,7 +346,8 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 						  struct mm_struct *mm,
 						  unsigned long start,
-						  unsigned long end)
+						  unsigned long end,
+						  enum mmu_event event)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/8] mmu_notifier: pass through vma to invalidate_range and invalidate_page v2
       [not found] ` <1404856801-11702-5-git-send-email-j.glisse@gmail.com>
@ 2014-07-08 23:52   ` Jerome Glisse
       [not found]   ` <20140709164131.GQ1958@8bytes.org>
  1 sibling, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-08 23:52 UTC (permalink / raw)
  To: akpm, linux-mm, linux-kernel
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse

On Tue, Jul 08, 2014 at 06:00:01PM -0400, j.glisse@gmail.com wrote:
From: Jerome Glisse <jglisse@redhat.com>

New users of the mmu_notifier interface need to look up the vma in order
to perform the invalidation operation. Instead of redoing a vma lookup
inside the callback, just pass the vma through from the call site where
it is already available.

This needs a small refactoring in memory.c to call invalidate_range on
vma boundaries, while previously it was called once for the larger
range. The overhead might be offset by the fact that mmu_notifier
listeners now work on smaller and exact ranges.

Changed since v1:
  - Only pass through the vma.
  - Updated the commit comment.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c |  1 +
 drivers/iommu/amd_iommu_v2.c            |  6 ++---
 drivers/misc/sgi-gru/grutlbpurge.c      |  8 ++++---
 drivers/xen/gntdev.c                    |  6 ++---
 fs/proc/task_mmu.c                      | 16 ++++++++-----
 include/linux/mmu_notifier.h            | 41 +++++++++++++++++----------------
 kernel/events/uprobes.c                 |  4 ++--
 mm/filemap_xip.c                        |  3 ++-
 mm/huge_memory.c                        | 26 ++++++++++-----------
 mm/hugetlb.c                            | 16 ++++++-------
 mm/ksm.c                                |  8 +++----
 mm/memory.c                             | 30 ++++++++++++------------
 mm/migrate.c                            |  6 ++---
 mm/mmu_notifier.c                       | 15 +++++++-----
 mm/mprotect.c                           |  6 ++---
 mm/mremap.c                             |  4 ++--
 mm/rmap.c                               |  9 ++++----
 virt/kvm/kvm_main.c                     |  6 ++---
 18 files changed, 112 insertions(+), 99 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index ed6f35e..191ac71 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -55,6 +55,7 @@ struct i915_mmu_object {
 
 static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 						       struct mm_struct *mm,
+						       struct vm_area_struct *vma,
 						       unsigned long start,
 						       unsigned long end,
 						       enum mmu_event event)
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 2bb9771..5d4bc8c 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -421,7 +421,7 @@ static void mn_change_pte(struct mmu_notifier *mn,
 }
 
 static void mn_invalidate_page(struct mmu_notifier *mn,
-			       struct mm_struct *mm,
+			       struct vm_area_struct *vma,
 			       unsigned long address,
 			       enum mmu_event event)
 {
@@ -429,7 +429,7 @@ static void mn_invalidate_page(struct mmu_notifier *mn,
 }
 
 static void mn_invalidate_range_start(struct mmu_notifier *mn,
-				      struct mm_struct *mm,
+				      struct vm_area_struct *vma,
 				      unsigned long start,
 				      unsigned long end,
 				      enum mmu_event event)
@@ -452,7 +452,7 @@ static void mn_invalidate_range_start(struct mmu_notifier *mn,
 }
 
 static void mn_invalidate_range_end(struct mmu_notifier *mn,
-				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
 				    unsigned long start,
 				    unsigned long end,
 				    enum mmu_event event)
diff --git a/drivers/misc/sgi-gru/grutlbpurge.c b/drivers/misc/sgi-gru/grutlbpurge.c
index e67fed1..ef29b45 100644
--- a/drivers/misc/sgi-gru/grutlbpurge.c
+++ b/drivers/misc/sgi-gru/grutlbpurge.c
@@ -220,7 +220,7 @@ void gru_flush_all_tlb(struct gru_state *gru)
  * MMUOPS notifier callout functions
  */
 static void gru_invalidate_range_start(struct mmu_notifier *mn,
-				       struct mm_struct *mm,
+				       struct vm_area_struct *vma,
 				       unsigned long start, unsigned long end,
 				       enum mmu_event event)
 {
@@ -235,7 +235,8 @@ static void gru_invalidate_range_start(struct mmu_notifier *mn,
 }
 
 static void gru_invalidate_range_end(struct mmu_notifier *mn,
-				     struct mm_struct *mm, unsigned long start,
+				     struct vm_area_struct *vma,
+				     unsigned long start,
 				     unsigned long end,
 				     enum mmu_event event)
 {
@@ -249,7 +250,8 @@ static void gru_invalidate_range_end(struct mmu_notifier *mn,
 	gru_dbg(grudev, "gms %p, start 0x%lx, end 0x%lx\n", gms, start, end);
 }
 
-static void gru_invalidate_page(struct mmu_notifier *mn, struct mm_struct *mm,
+static void gru_invalidate_page(struct mmu_notifier *mn,
+				struct vm_area_struct *vma,
 				unsigned long address,
 				enum mmu_event event)
 {
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index fe9da94..768f425 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -427,7 +427,7 @@ static void unmap_if_in_range(struct grant_map *map,
 }
 
 static void mn_invl_range_start(struct mmu_notifier *mn,
-				struct mm_struct *mm,
+				struct vm_area_struct *vma,
 				unsigned long start,
 				unsigned long end,
 				enum mmu_event event)
@@ -446,11 +446,11 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 }
 
 static void mn_invl_page(struct mmu_notifier *mn,
-			 struct mm_struct *mm,
+			 struct vm_area_struct *vma,
 			 unsigned long address,
 			 enum mmu_event event)
 {
-	mn_invl_range_start(mn, mm, address, address + PAGE_SIZE, event);
+	mn_invl_range_start(mn, vma, address, address + PAGE_SIZE, event);
 }
 
 static void mn_release(struct mmu_notifier *mn,
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e9e79f7..d1ed285 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -829,13 +829,15 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 			.private = &cp,
 		};
 		down_read(&mm->mmap_sem);
-		if (type == CLEAR_REFS_SOFT_DIRTY)
-			mmu_notifier_invalidate_range_start(mm, 0,
-							    -1, MMU_STATUS);
 		for (vma = mm->mmap; vma; vma = vma->vm_next) {
 			cp.vma = vma;
 			if (is_vm_hugetlb_page(vma))
 				continue;
+			if (type == CLEAR_REFS_SOFT_DIRTY)
+				mmu_notifier_invalidate_range_start(vma,
+								    vma->vm_start,
+								    vma->vm_end,
+								    MMU_STATUS);
 			/*
 			 * Writing 1 to /proc/pid/clear_refs affects all pages.
 			 *
@@ -857,10 +859,12 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 			}
 			walk_page_range(vma->vm_start, vma->vm_end,
 					&clear_refs_walk);
+			if (type == CLEAR_REFS_SOFT_DIRTY)
+				mmu_notifier_invalidate_range_end(vma,
+								  vma->vm_start,
+								  vma->vm_end,
+								  MMU_STATUS);
 		}
-		if (type == CLEAR_REFS_SOFT_DIRTY)
-			mmu_notifier_invalidate_range_end(mm, 0,
-							  -1, MMU_STATUS);
 		flush_tlb_mm(mm);
 		up_read(&mm->mmap_sem);
 		mmput(mm);
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 34717b8..499d675 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -142,7 +142,7 @@ struct mmu_notifier_ops {
 	 * be called internally to this method.
 	 */
 	void (*invalidate_page)(struct mmu_notifier *mn,
-				struct mm_struct *mm,
+				struct vm_area_struct *vma,
 				unsigned long address,
 				enum mmu_event event);
 
@@ -190,12 +190,12 @@ struct mmu_notifier_ops {
 	 * the last refcount is dropped.
 	 */
 	void (*invalidate_range_start)(struct mmu_notifier *mn,
-				       struct mm_struct *mm,
+				       struct vm_area_struct *vma,
 				       unsigned long start,
 				       unsigned long end,
 				       enum mmu_event event);
 	void (*invalidate_range_end)(struct mmu_notifier *mn,
-				     struct mm_struct *mm,
+				     struct vm_area_struct *vma,
 				     unsigned long start,
 				     unsigned long end,
 				     enum mmu_event event);
@@ -238,14 +238,14 @@ extern void __mmu_notifier_change_pte(struct mm_struct *mm,
 				      unsigned long address,
 				      pte_t pte,
 				      enum mmu_event event);
-extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
-					  unsigned long address,
-					  enum mmu_event event);
-extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+extern void __mmu_notifier_invalidate_page(struct vm_area_struct *vma,
+					   unsigned long address,
+					   enum mmu_event event);
+extern void __mmu_notifier_invalidate_range_start(struct vm_area_struct *vma,
 						  unsigned long start,
 						  unsigned long end,
 						  enum mmu_event event);
-extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+extern void __mmu_notifier_invalidate_range_end(struct vm_area_struct *vma,
 						unsigned long start,
 						unsigned long end,
 						enum mmu_event event);
@@ -281,30 +281,31 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 		__mmu_notifier_change_pte(mm, address, pte, event);
 }
 
-static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
+static inline void mmu_notifier_invalidate_page(struct vm_area_struct *vma,
 						unsigned long address,
 						enum mmu_event event)
 {
-	if (mm_has_notifiers(mm))
-		__mmu_notifier_invalidate_page(mm, address, event);
+	if (mm_has_notifiers(vma->vm_mm))
+		__mmu_notifier_invalidate_page(vma, address, event);
 }
 
-static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+static inline void mmu_notifier_invalidate_range_start(struct vm_area_struct *vma,
 						       unsigned long start,
 						       unsigned long end,
 						       enum mmu_event event)
 {
-	if (mm_has_notifiers(mm))
-		__mmu_notifier_invalidate_range_start(mm, start, end, event);
+	if (mm_has_notifiers(vma->vm_mm))
+		__mmu_notifier_invalidate_range_start(vma, start,
+						      end, event);
 }
 
-static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+static inline void mmu_notifier_invalidate_range_end(struct vm_area_struct *vma,
 						     unsigned long start,
 						     unsigned long end,
 						     enum mmu_event event)
 {
-	if (mm_has_notifiers(mm))
-		__mmu_notifier_invalidate_range_end(mm, start, end, event);
+	if (mm_has_notifiers(vma->vm_mm))
+		__mmu_notifier_invalidate_range_end(vma, start, end, event);
 }
 
 static inline void mmu_notifier_mm_init(struct mm_struct *mm)
@@ -385,20 +386,20 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 {
 }
 
-static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
+static inline void mmu_notifier_invalidate_page(struct vm_area_struct *vma,
 						unsigned long address,
 						enum mmu_event event)
 {
 }
 
-static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+static inline void mmu_notifier_invalidate_range_start(struct vm_area_struct *vma,
 						       unsigned long start,
 						       unsigned long end,
 						       enum mmu_event event)
 {
 }
 
-static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+static inline void mmu_notifier_invalidate_range_end(struct vm_area_struct *vma,
 						     unsigned long start,
 						     unsigned long end,
 						     enum mmu_event event)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ddbe598..ab1197f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -177,7 +177,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	/* For try_to_free_swap() and munlock_vma_page() below */
 	lock_page(page);
 
-	mmu_notifier_invalidate_range_start(mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_MIGRATE);
 	err = -EAGAIN;
 	ptep = page_check_address(page, mm, addr, &ptl, 0);
@@ -212,7 +212,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	err = 0;
  unlock:
 	mem_cgroup_cancel_charge(kpage, memcg);
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 	unlock_page(page);
 	return err;
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index a2b3f09..f0113df 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -198,7 +198,8 @@ retry:
 			BUG_ON(pte_dirty(pteval));
 			pte_unmap_unlock(pte, ptl);
 			/* must invalidate_page _before_ freeing the page */
-			mmu_notifier_invalidate_page(mm, address, MMU_MIGRATE);
+			mmu_notifier_invalidate_page(mm, vma, address,
+			mmu_notifier_invalidate_page(vma, address,
+						     MMU_MIGRATE);
 		}
 	}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3fb1f9e..524475d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1029,7 +1029,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
 
 	mmun_start = haddr;
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_start(vma, mmun_start, mmun_end,
 					    MMU_MIGRATE);
 
 	ptl = pmd_lock(mm, pmd);
@@ -1064,7 +1064,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
 	page_remove_rmap(page);
 	spin_unlock(ptl);
 
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 
 	ret |= VM_FAULT_WRITE;
@@ -1075,7 +1075,7 @@ out:
 
 out_free_pages:
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		memcg = (void *)page_private(pages[i]);
@@ -1168,7 +1168,7 @@ alloc:
 
 	mmun_start = haddr;
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_start(vma, mmun_start, mmun_end,
 					    MMU_MIGRATE);
 
 	spin_lock(ptl);
@@ -1201,7 +1201,7 @@ alloc:
 	}
 	spin_unlock(ptl);
 out_mn:
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 out:
 	return ret;
@@ -1637,7 +1637,7 @@ static int __split_huge_page_splitting(struct page *page,
 	const unsigned long mmun_start = address;
 	const unsigned long mmun_end   = address + HPAGE_PMD_SIZE;
 
-	mmu_notifier_invalidate_range_start(mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_STATUS);
 	pmd = page_check_address_pmd(page, mm, address,
 			PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG, &ptl);
@@ -1653,7 +1653,7 @@ static int __split_huge_page_splitting(struct page *page,
 		ret = 1;
 		spin_unlock(ptl);
 	}
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_STATUS);
 
 	return ret;
@@ -2453,7 +2453,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 
 	mmun_start = address;
 	mmun_end   = address + HPAGE_PMD_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_MIGRATE);
 	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
 	/*
@@ -2464,7 +2464,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	_pmd = pmdp_clear_flush(vma, address, pmd);
 	spin_unlock(pmd_ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 
 	spin_lock(pte_ptl);
@@ -2854,19 +2854,19 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
 	mmun_start = haddr;
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
 again:
-	mmu_notifier_invalidate_range_start(mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_MIGRATE);
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start,
+		mmu_notifier_invalidate_range_end(vma, mmun_start,
 						  mmun_end, MMU_MIGRATE);
 		return;
 	}
 	if (is_huge_zero_pmd(*pmd)) {
 		__split_huge_zero_page_pmd(vma, haddr, pmd);
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start,
+		mmu_notifier_invalidate_range_end(vma, mmun_start,
 						  mmun_end, MMU_MIGRATE);
 		return;
 	}
@@ -2874,7 +2874,7 @@ again:
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	get_page(page);
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 
 	split_huge_page(page);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fa28018..fe3c9bb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2555,7 +2555,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	mmun_start = vma->vm_start;
 	mmun_end = vma->vm_end;
 	if (cow)
-		mmu_notifier_invalidate_range_start(src, mmun_start,
+		mmu_notifier_invalidate_range_start(vma, mmun_start,
 						    mmun_end, MMU_MIGRATE);
 
 	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
@@ -2606,7 +2606,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	}
 
 	if (cow)
-		mmu_notifier_invalidate_range_end(src, mmun_start,
+		mmu_notifier_invalidate_range_end(vma, mmun_start,
 						  mmun_end, MMU_MIGRATE);
 
 	return ret;
@@ -2633,7 +2633,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	BUG_ON(end & ~huge_page_mask(h));
 
 	tlb_start_vma(tlb, vma);
-	mmu_notifier_invalidate_range_start(mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_MIGRATE);
 again:
 	for (address = start; address < end; address += sz) {
@@ -2705,7 +2705,7 @@ unlock:
 		if (address < end && !ref_page)
 			goto again;
 	}
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 	tlb_end_vma(tlb, vma);
 }
@@ -2884,7 +2884,7 @@ retry_avoidcopy:
 
 	mmun_start = address & huge_page_mask(h);
 	mmun_end = mmun_start + huge_page_size(h);
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_start(vma, mmun_start, mmun_end,
 					    MMU_MIGRATE);
 	/*
 	 * Retake the page table lock to check for racing updates
@@ -2905,7 +2905,7 @@ retry_avoidcopy:
 		new_page = old_page;
 	}
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_end(vma, mmun_start, mmun_end,
 					  MMU_MIGRATE);
 out_release_all:
 	page_cache_release(new_page);
@@ -3344,7 +3344,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	BUG_ON(address >= end);
 	flush_cache_range(vma, address, end);
 
-	mmu_notifier_invalidate_range_start(mm, start, end, MMU_MPROT);
+	mmu_notifier_invalidate_range_start(vma, start, end, MMU_MPROT);
 	mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
 	for (; address < end; address += huge_page_size(h)) {
 		spinlock_t *ptl;
@@ -3374,7 +3374,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	 */
 	flush_tlb_range(vma, start, end);
 	mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
-	mmu_notifier_invalidate_range_end(mm, start, end, MMU_MPROT);
+	mmu_notifier_invalidate_range_end(vma, start, end, MMU_MPROT);
 
 	return pages << h->order;
 }
diff --git a/mm/ksm.c b/mm/ksm.c
index f50e103..6a5e247 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -873,7 +873,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 
 	mmun_start = addr;
 	mmun_end   = addr + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_start(vma, mmun_start, mmun_end,
 					    MMU_WRITE_PROTECT);
 
 	ptep = page_check_address(page, mm, addr, &ptl, 0);
@@ -914,7 +914,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 out_unlock:
 	pte_unmap_unlock(ptep, ptl);
 out_mn:
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_end(vma, mmun_start, mmun_end,
 					  MMU_WRITE_PROTECT);
 out:
 	return err;
@@ -951,7 +951,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 
 	mmun_start = addr;
 	mmun_end   = addr + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_start(vma, mmun_start, mmun_end,
 					    MMU_MIGRATE);
 
 	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
@@ -977,7 +977,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	pte_unmap_unlock(ptep, ptl);
 	err = 0;
 out_mn:
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end,
+	mmu_notifier_invalidate_range_end(vma, mmun_start, mmun_end,
 					  MMU_MIGRATE);
 out:
 	return err;
diff --git a/mm/memory.c b/mm/memory.c
index 74fe242..c2a8948 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1049,7 +1049,7 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	mmun_start = addr;
 	mmun_end   = end;
 	if (is_cow)
-		mmu_notifier_invalidate_range_start(src_mm, mmun_start,
+		mmu_notifier_invalidate_range_start(vma, mmun_start,
 						    mmun_end, MMU_MIGRATE);
 
 	ret = 0;
@@ -1067,8 +1067,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
 
 	if (is_cow)
-		mmu_notifier_invalidate_range_end(src_mm, mmun_start, mmun_end,
-						  MMU_MIGRATE);
+		mmu_notifier_invalidate_range_end(vma, mmun_start,
+						  mmun_end, MMU_MIGRATE);
 	return ret;
 }
 
@@ -1319,6 +1319,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 	if (end <= vma->vm_start)
 		return;
 
+	mmu_notifier_invalidate_range_start(vma,
+					    max(start_addr, vma->vm_start),
+					    min(end_addr, vma->vm_end),
+					    MMU_MUNMAP);
+
 	if (vma->vm_file)
 		uprobe_munmap(vma, start, end);
 
@@ -1346,6 +1351,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
 	}
+
+	mmu_notifier_invalidate_range_end(vma,
+					  max(start_addr, vma->vm_start),
+					  min(end_addr, vma->vm_end),
+					  MMU_MUNMAP);
 }
 
 /**
@@ -1370,14 +1380,8 @@ void unmap_vmas(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, unsigned long start_addr,
 		unsigned long end_addr)
 {
-	struct mm_struct *mm = vma->vm_mm;
-
-	mmu_notifier_invalidate_range_start(mm, start_addr,
-					    end_addr, MMU_MUNMAP);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
 		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
-	mmu_notifier_invalidate_range_end(mm, start_addr,
-					  end_addr, MMU_MUNMAP);
 }
 
 /**
@@ -1399,10 +1403,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
-	mmu_notifier_invalidate_range_start(mm, start, end, MMU_MUNMAP);
 	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
 		unmap_single_vma(&tlb, vma, start, end, details);
-	mmu_notifier_invalidate_range_end(mm, start, end, MMU_MUNMAP);
 	tlb_finish_mmu(&tlb, start, end);
 }
 
@@ -1425,9 +1427,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, address, end);
 	update_hiwater_rss(mm);
-	mmu_notifier_invalidate_range_start(mm, address, end, MMU_MUNMAP);
 	unmap_single_vma(&tlb, vma, address, end, details);
-	mmu_notifier_invalidate_range_end(mm, address, end, MMU_MUNMAP);
 	tlb_finish_mmu(&tlb, address, end);
 }
 
@@ -2211,7 +2211,7 @@ gotten:
 
 	mmun_start  = address & PAGE_MASK;
 	mmun_end    = mmun_start + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_MIGRATE);
 
 	/*
@@ -2283,7 +2283,7 @@ gotten:
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (mmun_end > mmun_start)
-		mmu_notifier_invalidate_range_end(mm, mmun_start,
+		mmu_notifier_invalidate_range_end(vma, mmun_start,
 						  mmun_end, MMU_MIGRATE);
 	if (old_page) {
 		/*
diff --git a/mm/migrate.c b/mm/migrate.c
index b526c72..5c63ac8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1820,13 +1820,13 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	WARN_ON(PageLRU(new_page));
 
 	/* Recheck the target PMD */
-	mmu_notifier_invalidate_range_start(mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_MIGRATE);
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
 fail_putback:
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start,
+		mmu_notifier_invalidate_range_end(vma, mmun_start,
 						  mmun_end, MMU_MIGRATE);
 
 		/* Reverse changes made by migrate_page_copy() */
@@ -1880,7 +1880,7 @@ fail_putback:
 	page_remove_rmap(page);
 
 	spin_unlock(ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 
 	/* Take an "isolate" reference and put new page on the LRU. */
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 9decb88..b56040e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -138,52 +138,55 @@ void __mmu_notifier_change_pte(struct mm_struct *mm,
 	srcu_read_unlock(&srcu, id);
 }
 
-void __mmu_notifier_invalidate_page(struct mm_struct *mm,
+void __mmu_notifier_invalidate_page(struct vm_area_struct *vma,
 				    unsigned long address,
 				    enum mmu_event event)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier *mn;
 	int id;
 
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_page)
-			mn->ops->invalidate_page(mn, mm, address, event);
+			mn->ops->invalidate_page(mn, vma, address, event);
 	}
 	srcu_read_unlock(&srcu, id);
 }
 
-void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
+void __mmu_notifier_invalidate_range_start(struct vm_area_struct *vma,
 					   unsigned long start,
 					   unsigned long end,
 					   enum mmu_event event)
 
 {
+	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier *mn;
 	int id;
 
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start)
-			mn->ops->invalidate_range_start(mn, mm, start,
+			mn->ops->invalidate_range_start(mn, vma, start,
 							end, event);
 	}
 	srcu_read_unlock(&srcu, id);
 }
 EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
 
-void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+void __mmu_notifier_invalidate_range_end(struct vm_area_struct *vma,
 					 unsigned long start,
 					 unsigned long end,
 					 enum mmu_event event)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier *mn;
 	int id;
 
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_end)
-			mn->ops->invalidate_range_end(mn, mm, start,
+			mn->ops->invalidate_range_end(mn, vma, start,
 						      end, event);
 	}
 	srcu_read_unlock(&srcu, id);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 886405b..fdcb254 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -140,7 +140,6 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pgprot_t newprot, int dirty_accountable, int prot_numa)
 {
 	pmd_t *pmd;
-	struct mm_struct *mm = vma->vm_mm;
 	unsigned long next;
 	unsigned long pages = 0;
 	unsigned long nr_huge_updates = 0;
@@ -157,7 +156,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		/* invoke the mmu notifier if the pmd is populated */
 		if (!mni_start) {
 			mni_start = addr;
-			mmu_notifier_invalidate_range_start(mm, mni_start,
+			mmu_notifier_invalidate_range_start(vma, mni_start,
 							    end, MMU_MPROT);
 		}
 
@@ -186,7 +185,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 	} while (pmd++, addr = next, addr != end);
 
 	if (mni_start)
-		mmu_notifier_invalidate_range_end(mm, mni_start, end, MMU_MPROT);
+		mmu_notifier_invalidate_range_end(vma, mni_start,
+						  end, MMU_MPROT);
 
 	if (nr_huge_updates)
 		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
diff --git a/mm/mremap.c b/mm/mremap.c
index 6827d2f..a223c20 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -177,7 +177,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	mmun_start = old_addr;
 	mmun_end   = old_end;
-	mmu_notifier_invalidate_range_start(vma->vm_mm, mmun_start,
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
 					    mmun_end, MMU_MIGRATE);
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
@@ -229,7 +229,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 	if (likely(need_flush))
 		flush_tlb_range(vma, old_end-len, old_addr);
 
-	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start,
+	mmu_notifier_invalidate_range_end(vma, mmun_start,
 					  mmun_end, MMU_MIGRATE);
 
 	return len + old_addr - old_end;	/* how much done */
diff --git a/mm/rmap.c b/mm/rmap.c
index 474a913..196cd0c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -840,7 +840,7 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 	pte_unmap_unlock(pte, ptl);
 
 	if (ret) {
-		mmu_notifier_invalidate_page(mm, address, MMU_WRITE_BACK);
+		mmu_notifier_invalidate_page(vma, address, MMU_WRITE_BACK);
 		(*cleaned)++;
 	}
 out:
@@ -1237,7 +1237,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret != SWAP_FAIL && !(flags & TTU_MUNLOCK))
-		mmu_notifier_invalidate_page(mm, address, event);
+		mmu_notifier_invalidate_page(vma, address, event);
 out:
 	return ret;
 
@@ -1325,7 +1325,8 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 
 	mmun_start = address;
 	mmun_end   = end;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end, event);
+	mmu_notifier_invalidate_range_start(vma, mmun_start,
+					    mmun_end, event);
 
 	/*
 	 * If we can acquire the mmap_sem for read, and vma is VM_LOCKED,
@@ -1390,7 +1391,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 		(*mapcount)--;
 	}
 	pte_unmap_unlock(pte - 1, ptl);
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end, event);
+	mmu_notifier_invalidate_range_end(vma, mmun_start, mmun_end, event);
 	if (locked_vma)
 		up_read(&vma->vm_mm->mmap_sem);
 	return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6e1992f..35ed19c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -261,7 +261,7 @@ static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
 }
 
 static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
-					     struct mm_struct *mm,
+					     struct vm_area_struct *vma,
 					     unsigned long address,
 					     enum mmu_event event)
 {
@@ -317,7 +317,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 }
 
 static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
-						    struct mm_struct *mm,
+						    struct vm_area_struct *vma,
 						    unsigned long start,
 						    unsigned long end,
 						    enum mmu_event event)
@@ -344,7 +344,7 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 }
 
 static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
-						  struct mm_struct *mm,
+						  struct vm_area_struct *vma,
 						  unsigned long start,
 						  unsigned long end,
 						  enum mmu_event event)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/8] mmput: use notifier chain to call subsystem exit handler.
       [not found] ` <1404856801-11702-2-git-send-email-j.glisse@gmail.com>
  2014-07-08 23:51   ` [PATCH 1/8] mmput: use notifier chain to call subsystem exit handler Jerome Glisse
@ 2014-07-09 16:21   ` Joerg Roedel
  2014-07-09 16:30     ` Gabbay, Oded
  2014-07-09 17:33     ` Jerome Glisse
  1 sibling, 2 replies; 14+ messages in thread
From: Joerg Roedel @ 2014-07-09 16:21 UTC (permalink / raw)
  To: j.glisse
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse, linux-mm, linux-kernel

On Tue, Jul 08, 2014 at 05:59:58PM -0400, j.glisse@gmail.com wrote:
> +int mmput_register_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&mmput_notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(mmput_register_notifier);
> +
> +int mmput_unregister_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_unregister(&mmput_notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(mmput_unregister_notifier);

I am still not convinced that this is required. For core code that needs
to hook into mmput (like aio or uprobes) it really improves code
readability if their teardown functions are called explicitly in mmput.

And drivers that deal with the mm can use the already existing
mmu_notifiers. That works at least for the AMD-IOMMUv2 and KFD drivers.

Maybe HMM is different here, but then you should explain why and how it
is different and why you can't add an explicit teardown function for
HMM.


	Joerg


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/8] mm: differentiate unmap for vmscan from other unmap.
       [not found] ` <1404856801-11702-3-git-send-email-j.glisse@gmail.com>
  2014-07-08 23:51   ` [PATCH 2/8] mm: differentiate unmap for vmscan from other unmap Jerome Glisse
@ 2014-07-09 16:24   ` Joerg Roedel
  2014-07-09 17:24     ` Jerome Glisse
  1 sibling, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2014-07-09 16:24 UTC (permalink / raw)
  To: j.glisse
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse, linux-mm, linux-kernel

On Tue, Jul 08, 2014 at 05:59:59PM -0400, j.glisse@gmail.com wrote:
> From: Jerome Glisse <jglisse@redhat.com>
> 
> New code will need to be able to differentiate [...]

Why?

> between a regular unmap and an unmap triggered by vmscan, in which case
> we want to be as quick as possible.

Why would you want to be as slow as possible in the other case?


	Joerg


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/8] mmput: use notifier chain to call subsystem exit handler.
  2014-07-09 16:21   ` Joerg Roedel
@ 2014-07-09 16:30     ` Gabbay, Oded
  2014-07-09 17:33     ` Jerome Glisse
  1 sibling, 0 replies; 14+ messages in thread
From: Gabbay, Oded @ 2014-07-09 16:30 UTC (permalink / raw)
  To: joro@8bytes.org
  Cc: ldunning@nvidia.com, airlied@redhat.com, Morichetti, Laurent,
	Bridgman, John, hpa@zytor.com, cabuschardt@nvidia.com,
	peterz@infradead.org, jweiner@redhat.com, liranl@mellanox.com,
	raindel@mellanox.com, roland@purestorage.com, dpoole@nvidia.com,
	jglisse@redhat.com, blc@redhat.com, sgutti@nvidia.com,
	Blinzer, Paul, linux-mm@kvack.org, SCheung@nvidia.com,
	Mantor, Michael, Sander, Ben, mhairgrove@nvidia.com,
	Deucher, Alexander, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, aarcange@redhat.com,
	jdonohue@redhat.com, mgorman@suse.de, j.glisse@gmail.com,
	riel@redhat.com, Stoner, Greg, lwoodman@redhat.com,
	linux-kernel@vger.kernel.org, arvindg@nvidia.com,
	jhubbard@nvidia.com

On Wed, 2014-07-09 at 18:21 +0200, Joerg Roedel wrote:
> On Tue, Jul 08, 2014 at 05:59:58PM -0400, j.glisse@gmail.com wrote:
> >  +int mmput_register_notifier(struct notifier_block *nb)
> >  +{
> >  +        return blocking_notifier_chain_register(&mmput_notifier, nb);
> >  +}
> >  +EXPORT_SYMBOL_GPL(mmput_register_notifier);
> >  +
> >  +int mmput_unregister_notifier(struct notifier_block *nb)
> >  +{
> >  +        return blocking_notifier_chain_unregister(&mmput_notifier, nb);
> >  +}
> >  +EXPORT_SYMBOL_GPL(mmput_unregister_notifier);
>  
> I am still not convinced that this is required. For core code that
> needs to hook into mmput (like aio or uprobes) it really improves
> code readability if their teardown functions are called explicitly
> in mmput.
> 
> And drivers that deal with the mm can use the already existing
> mmu_notifiers. That works at least for the AMD-IOMMUv2 and KFD
> drivers.
> 
> Maybe HMM is different here, but then you should explain why and how
> it is different and why you can't add an explicit teardown function
> for HMM.
>  
>  
>         Joerg
>  
>  
Joerg, 

It's true I'm using the callback from AMD-IOMMUv2 when it is called
from the release notifier, but I only use that to destroy queues that
are related to the pasid that is being unbound.
The KFD driver still needs either this patch or an explicit call to
kfd_process_exit from mmput to release the kfd_process object, which
can't be released in the callback. I went for the explicit call, but
Andrew Morton said that this begs for conversion into a notifier
chain. Jerome and I followed his advice, hence this patch.
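
With this patch, the KFD side would then look roughly like the sketch
below (kfd_process_teardown() is a placeholder for our per-process
cleanup, and I am assuming the chain is invoked with the dying
mm_struct as the data pointer, which the quoted hunk does not show):

static int kfd_mmput_handler(struct notifier_block *nb,
			     unsigned long action, void *data)
{
	struct mm_struct *mm = data;	/* assumed: mm passed as @data */

	/* release the kfd_process object tied to this mm */
	kfd_process_teardown(mm);
	return NOTIFY_OK;
}

static struct notifier_block kfd_mmput_nb = {
	.notifier_call = kfd_mmput_handler,
};

static int __init kfd_module_init(void)
{
	return mmput_register_notifier(&kfd_mmput_nb);
}

static void __exit kfd_module_exit(void)
{
	mmput_unregister_notifier(&kfd_mmput_nb);
}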

        Oded

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/8] mmu_notifier: add event information to address invalidation v3
       [not found] ` <1404856801-11702-4-git-send-email-j.glisse@gmail.com>
  2014-07-08 23:52   ` [PATCH 3/8] mmu_notifier: add event information to address invalidation v3 Jerome Glisse
@ 2014-07-09 16:32   ` Joerg Roedel
  2014-07-09 17:16     ` Jerome Glisse
  1 sibling, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2014-07-09 16:32 UTC (permalink / raw)
  To: j.glisse
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse, linux-mm, linux-kernel

On Tue, Jul 08, 2014 at 06:00:00PM -0400, j.glisse@gmail.com wrote:
> From: Jerome Glisse <jglisse@redhat.com>
> 
> The event information will be useful for new users of the mmu_notifier
> API. The event argument differentiates between a vma disappearing, a
> page being write protected, or simply a page being unmapped. This
> allows new users to take a different path for each event: for instance,
> on unmap the resources used to track a vma are still valid and should
> stay around, while if the event says that a vma is being destroyed, any
> resources used to track this vma can be freed.
> 
> Changed since v1:
>   - renamed action to event (updated commit message too).
>   - simplified the event names and clarified their intended usage,
>     also documenting what expectations the listener can have with
>     respect to each event.
> 
> Changed since v2:
>   - Avoid crazy name.
>   - Do not move code that does not need to move.

Okay, I can actually see use-cases for something like this. Given the
number of invalidate_range_start/end call-sites and the semantics of
these call-backs, knowing the details of what is going on between these
calls would allow certain optimizations.

But why do you need this event information for all the other
mmu_notifier call-backs? In change_pte, for example, you already get the
address and the new pte value; isn't that enough to find out what's
going on?


	Joerg


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/8] mmu_notifier: pass through vma to invalidate_range and invalidate_page v2
       [not found]   ` <20140709164131.GQ1958@8bytes.org>
@ 2014-07-09 16:55     ` Jerome Glisse
  0 siblings, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-09 16:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay, linux-mm,
	linux-kernel

On Wed, Jul 09, 2014 at 06:41:31PM +0200, Joerg Roedel wrote:
> On Tue, Jul 08, 2014 at 06:00:01PM -0400, j.glisse@gmail.com wrote:
> > New users of the mmu_notifier interface need to look up the vma in order
> > to perform the invalidation operation. Instead of redoing a vma lookup
> > inside the callback, just pass the vma through from the call site where
> > it is already available.
> 
> Why do you need to look up the vma during the invalidation operation?
> Given that invalidate_range_start/end can be expensive to implement on
> the call-back side, those calls should happen less instead of more
> frequently.
> 

To do different things depending on whether the vma is file backed, and
on which block device the file lives on.
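
For instance (the hmm_invalidate_*() helpers here are placeholders for
illustration, not code from the actual patchset):

static void hmm_invalidate_range_start(struct mmu_notifier *mn,
				       struct vm_area_struct *vma,
				       unsigned long start,
				       unsigned long end,
				       enum mmu_event event)
{
	if (vma->vm_file) {
		/* file backed: pick a path based on the backing device */
		hmm_invalidate_file_range(mn, vma->vm_file, start, end);
	} else {
		/* anonymous memory */
		hmm_invalidate_anon_range(mn, start, end);
	}
}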

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/8] mmu_notifier: add event information to address invalidation v3
  2014-07-09 16:32   ` Joerg Roedel
@ 2014-07-09 17:16     ` Jerome Glisse
  0 siblings, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-09 17:16 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse, linux-mm, linux-kernel

On Wed, Jul 09, 2014 at 06:32:08PM +0200, Joerg Roedel wrote:
> On Tue, Jul 08, 2014 at 06:00:00PM -0400, j.glisse@gmail.com wrote:
> > From: Jerome Glisse <jglisse@redhat.com>
> > 
> > The event information will be useful for new users of the mmu_notifier
> > API. The event argument differentiates between a vma disappearing, a
> > page being write protected, or simply a page being unmapped. This
> > allows new users to take a different path for each event: for instance,
> > on unmap the resources used to track a vma are still valid and should
> > stay around, while if the event says that a vma is being destroyed, any
> > resources used to track this vma can be freed.
> > 
> > Changed since v1:
> >   - renamed action to event (updated commit message too).
> >   - simplified the event names and clarified their intended usage,
> >     also documenting what expectations the listener can have with
> >     respect to each event.
> > 
> > Changed since v2:
> >   - Avoid crazy name.
> >   - Do not move code that does not need to move.
> 
> Okay, I can actually see use-cases for something like this. Given the
> number of invalidate_range_start/end call-sites and the semantics of
> these call-backs, knowing the details of what is going on between these
> calls would allow certain optimizations.
> 
> But why do you need this event information for all the other
> mmu_notifier call-backs? In change_pte, for example, you already get the
> address and the new pte value; isn't that enough to find out what's
> going on?
> 

For hmm, no, because hmm does not know the old pte value and thus has no
idea what changed. But you are right: as I am not going to make further
use of change_pte, I do not mind much about it, so I can drop the event
there. Other users might still find it useful, though, as it saves them
from looking up the old pte and doing a comparison.
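
To illustrate, with the events from this patch a listener can branch
like the sketch below (the listener_*() helpers are made up):

static void listener_invalidate_range_start(struct mmu_notifier *mn,
					    struct mm_struct *mm,
					    unsigned long start,
					    unsigned long end,
					    enum mmu_event event)
{
	switch (event) {
	case MMU_MUNMAP:
		/* the vma is going away: tracking resources can be freed */
		listener_free_range_tracking(mn, start, end);
		break;
	case MMU_WRITE_PROTECT:
		/* pages stay valid, only write access is revoked */
		listener_wprotect_secondary_ptes(mn, start, end);
		break;
	default:
		/* MMU_MIGRATE, MMU_MPROT, MMU_STATUS, ... */
		listener_clear_secondary_ptes(mn, start, end);
		break;
	}
}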

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/8] mm: differentiate unmap for vmscan from other unmap.
  2014-07-09 16:24   ` Joerg Roedel
@ 2014-07-09 17:24     ` Jerome Glisse
  0 siblings, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-09 17:24 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse, linux-mm, linux-kernel

On Wed, Jul 09, 2014 at 06:24:53PM +0200, Joerg Roedel wrote:
> On Tue, Jul 08, 2014 at 05:59:59PM -0400, j.glisse@gmail.com wrote:
> > From: Jerome Glisse <jglisse@redhat.com>
> > 
> > New code will need to be able to differentiate [...]
> 
> Why?
> 
> > between a regular unmap and an unmap triggered by vmscan, in which case
> > we want to be as quick as possible.
> 
> Why would you want to be as slow as possible in the other case?
> 

Well, I should probably have updated the commit message. Trying to be
faster is one side of the coin. The other is to actually know that the
page is being considered for reclaim, which is useful information in
itself. A secondary user of the page might want to mark the page as
recently used and move it to the active list.

In most cases an unmap event is not as critical as a vmscan one can be
(note that this depends on the current state of memory resources,
whether they are scarce or not). While right now there is no special
code inside hmm for offering a fast path, this is definitely something
that is envisioned on some hardware.

Again, it is all about trying to take the best course of action
depending on context, and the more information we have about the
current context, the easier it is to choose properly.
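
As a sketch of that last point (MMU_VMSCAN stands in for whatever event
name patch 2/8 ends up defining for reclaim, and the listener_*()
helpers are imaginary):

static void listener_invalidate_page(struct mmu_notifier *mn,
				     struct mm_struct *mm,
				     unsigned long address,
				     enum mmu_event event)
{
	if (event == MMU_VMSCAN && listener_page_is_hot(mn, address)) {
		/* reclaim candidate still in active use by the device:
		 * record that so the page can be marked recently used */
		listener_mark_page_accessed(mn, address);
	}
	/* in all cases drop the secondary mapping as fast as possible */
	listener_clear_secondary_pte(mn, address);
}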

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mm: Various preparatory patches for hmm and kfd
       [not found] ` <20140709164637.GR1958@8bytes.org>
@ 2014-07-09 17:26   ` Jerome Glisse
  0 siblings, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-09 17:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay, linux-mm,
	linux-kernel

On Wed, Jul 09, 2014 at 06:46:37PM +0200, Joerg Roedel wrote:
> On Tue, Jul 08, 2014 at 05:59:57PM -0400, j.glisse@gmail.com wrote:
> > Hope this address any previous concern about those patches.
> 
Okay, so as you may have gathered from my previous comments, my main
concern is that I don't know the details of why all of this is needed
for HMM. Do you have a high-level overview of the devices HMM is aimed
at and of the design of HMM?
> 
> Maybe you have already posted this somewhere and I missed it, so any
> link to archives or explanation would be good to understand your
> use-cases better.
> 
> 

Lengthy description:

http://comments.gmane.org/gmane.linux.kernel.mm/116584

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/8] mmput: use notifier chain to call subsystem exit handler.
  2014-07-09 16:21   ` Joerg Roedel
  2014-07-09 16:30     ` Gabbay, Oded
@ 2014-07-09 17:33     ` Jerome Glisse
  1 sibling, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2014-07-09 17:33 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: akpm, Linus Torvalds, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole,
	Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
	Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
	Shachar Raindel, Liran Liss, Roland Dreier, Ben Sander,
	Greg Stoner, John Bridgman, Michael Mantor, Paul Blinzer,
	Laurent Morichetti, Alexander Deucher, Oded Gabbay,
	Jérôme Glisse, linux-mm, linux-kernel

On Wed, Jul 09, 2014 at 06:21:24PM +0200, Joerg Roedel wrote:
> On Tue, Jul 08, 2014 at 05:59:58PM -0400, j.glisse@gmail.com wrote:
> > +int mmput_register_notifier(struct notifier_block *nb)
> > +{
> > +	return blocking_notifier_chain_register(&mmput_notifier, nb);
> > +}
> > +EXPORT_SYMBOL_GPL(mmput_register_notifier);
> > +
> > +int mmput_unregister_notifier(struct notifier_block *nb)
> > +{
> > +	return blocking_notifier_chain_unregister(&mmput_notifier, nb);
> > +}
> > +EXPORT_SYMBOL_GPL(mmput_unregister_notifier);
> 
> I am still not convinced that this is required. For core code that needs
> to hook into mmput (like aio or uprobes) it really improves code
> readability if their teardown functions are called explicitly in mmput.
> 
> And drivers that deal with the mm can use the already existing
> mmu_notifiers. That works at least for the AMD-IOMMUv2 and KFD drivers.
> 
> Maybe HMM is different here, but then you should explain why and how it
> is different and why you can't add an explicit teardown function for
> HMM.

My first patchset added a direct call to hmm in mmput, but Andrew asked
me to add a notifier chain instead, as he foresees more users for it.
Hence this patch.

As for why hmm needs to clean up at this point, it is simple:
  - hmm is tied to the mm_struct (it adds a pointer to mm_struct)
  - the hmm pointer of the mm_struct is cleared on fork
  - the hmm object's lifespan should be the same as the mm_struct's
  - device file descriptors can outlive the mm_struct in which they
    were opened, so an hmm structure allocated on behalf of a device
    driver would otherwise stay allocated for as long as children
    that have no use for it live (ie until they close the device
    file).

So again, hmm is tied to the mm_struct's lifespan. We want to free hmm
and its resources when the mm is destroyed. We cannot do that in the
file->release callback because it might happen long after the mm_struct
is freed.

We cannot do that from the mmu_notifier release callback because this
would lead to a use-after-free.

We could schedule a delayed job from the mmu_notifier callback, but
this would be hacky, as we would have no way to synchronize ourselves
with the mm destruction without complex rules and convoluted code.

So again, I do not see any alternative to hmm interfacing with
mm_struct destruction, and I genuinely believe IOMMUv2 is in the same
situation as us, which justifies the notifier chain idea even more.
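
To make the intended usage concrete, here is a minimal sketch (not part
of the patchset) of how a driver in hmm's situation could hook the
proposed chain. The mmput_register_notifier()/mmput_unregister_notifier()
calls are the ones added by patch 1; the my_* helpers and the assumption
that the chain passes the dying mm_struct as the data argument are mine:

#include <linux/notifier.h>
#include <linux/mm_types.h>

static int my_mmput_notify(struct notifier_block *nb,
                           unsigned long action, void *data)
{
        struct mm_struct *mm = data;

        /*
         * The mm is being destroyed: free every per-mm resource now,
         * instead of waiting for file->release, which may run long
         * after the mm_struct is gone.
         */
        my_free_per_mm_state(mm);
        return NOTIFY_OK;
}

static struct notifier_block my_mmput_nb = {
        .notifier_call = my_mmput_notify,
};

/* at driver init:     mmput_register_notifier(&my_mmput_nb);   */
/* at driver teardown: mmput_unregister_notifier(&my_mmput_nb); */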

Cheers,
Jerome


end of thread

Thread overview: 14+ messages
     [not found] <1404856801-11702-1-git-send-email-j.glisse@gmail.com>
2014-07-08 23:50 ` mm: Various preparatory patches for hmm and kfd Jerome Glisse
     [not found] ` <1404856801-11702-2-git-send-email-j.glisse@gmail.com>
2014-07-08 23:51   ` [PATCH 1/8] mmput: use notifier chain to call subsystem exit handler Jerome Glisse
2014-07-09 16:21   ` Joerg Roedel
2014-07-09 16:30     ` Gabbay, Oded
2014-07-09 17:33     ` Jerome Glisse
     [not found] ` <1404856801-11702-3-git-send-email-j.glisse@gmail.com>
2014-07-08 23:51   ` [PATCH 2/8] mm: differentiate unmap for vmscan from other unmap Jerome Glisse
2014-07-09 16:24   ` Joerg Roedel
2014-07-09 17:24     ` Jerome Glisse
     [not found] ` <1404856801-11702-4-git-send-email-j.glisse@gmail.com>
2014-07-08 23:52   ` [PATCH 3/8] mmu_notifier: add event information to address invalidation v3 Jerome Glisse
2014-07-09 16:32   ` Joerg Roedel
2014-07-09 17:16     ` Jerome Glisse
     [not found] ` <1404856801-11702-5-git-send-email-j.glisse@gmail.com>
2014-07-08 23:52   ` [PATCH 4/8] mmu_notifier: pass through vma to invalidate_range and invalidate_page v2 Jerome Glisse
     [not found]   ` <20140709164131.GQ1958@8bytes.org>
2014-07-09 16:55     ` Jerome Glisse
     [not found] ` <20140709164637.GR1958@8bytes.org>
2014-07-09 17:26   ` mm: Various preparatory patches for hmm and kfd Jerome Glisse
