* [patch 0/6] [RFC] MMU Notifiers V3
@ 2008-01-30 2:29 Christoph Lameter
2008-01-30 2:29 ` [patch 1/6] mmu_notifier: Core code Christoph Lameter
` (5 more replies)
0 siblings, 6 replies; 24+ messages in thread
From: Christoph Lameter @ 2008-01-30 2:29 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Benjamin Herrenschmidt, steiner-sJ/iWh9BUns,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity,
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins
This is a patchset implementing MMU notifier callbacks based on Andrea's
earlier work. These are needed if Linux pages are referenced from something
else than tracked by the rmaps of the kernel. The known immediate users are
KVM (establishes a refcount to the page. External references called spte)
GRU (simple TLB shootdown without refcount. Has its own pagetable/tlb)
XPmem (uses its own reverse mappings and refcount. Remote ptes, Needs
to sleep when sending messages)
Issues:
- Feedback from uses of the callbacks for KVM, RDMA, XPmem and GRU
Early tests with the GRU were successful.
- Pages may be freed before the external mapping are torn down
through invalidate_range() if no refcount on the page is taken.
There is the chance that page content may be visible after
they have been reallocated (mainly an issue for the GRU that
takes no refcount).
- invalidate_range() callbacks are sometimes called under i_mmap_lock.
These need to be dealt with or XPmem needs to be able to work around
these.
- filemap_xip.c does not follow conventions for Rmap callbacks.
We could depends on XIP support not being active to avoid the issue.
Things that we leave as is:
- RCU quiescent periods are required on registering and unregistering
notifiers to guarantee visibility to other processors.
Currently only mmu_notifier_release() does the correct thing.
It is up to the user to provide RCU quiescent periods for
register/unregister functions if they are called outside of the
->release method.
Andrea's mmu_notifier #4 -> RFC V1
- Merge subsystem rmap based with Linux rmap based approach
- Move Linux rmap based notifiers out of macro
- Try to account for what locks are held while the notifiers are
called.
- Develop a patch sequence that separates out the different types of
hooks so that we can review their use.
- Avoid adding include to linux/mm_types.h
- Integrate RCU logic suggested by Peter.
V1->V2:
- Improve RCU support
- Use mmap_sem for mmu_notifier register / unregister
- Drop invalidate_page from COW, mm/fremap.c and mm/rmap.c since we
already have invalidate_range() callbacks there.
- Clean compile for !MMU_NOTIFIER
- Isolate filemap_xip strangeness into its own diff
- Pass a the flag to invalidate_range to indicate if a spinlock
is held.
- Add invalidate_all()
V2->V3:
- Further RCU fixes
- Fixes from Andrea to fixup aging and move invalidate_range() in do_wp_page
and sys_remap_file_pages() after the pte clearing.
--
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
^ permalink raw reply [flat|nested] 24+ messages in thread* [patch 1/6] mmu_notifier: Core code 2008-01-30 2:29 [patch 0/6] [RFC] MMU Notifiers V3 Christoph Lameter @ 2008-01-30 2:29 ` Christoph Lameter [not found] ` <20080130022944.236370194-sJ/iWh9BUns@public.gmane.org> 2008-01-30 2:29 ` [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Christoph Lameter ` (4 subsequent siblings) 5 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 2:29 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins [-- Attachment #1: mmu_core --] [-- Type: text/plain, Size: 15636 bytes --] Core code for mmu notifiers. Signed-off-by: Christoph Lameter <clameter-sJ/iWh9BUns@public.gmane.org> Signed-off-by: Andrea Arcangeli <andrea-atKUWr5tajBWk0Htik3J/w@public.gmane.org> --- include/linux/list.h | 14 ++ include/linux/mm_types.h | 6 + include/linux/mmu_notifier.h | 210 +++++++++++++++++++++++++++++++++++++++++++ include/linux/page-flags.h | 10 ++ kernel/fork.c | 2 mm/Kconfig | 4 mm/Makefile | 1 mm/mmap.c | 2 mm/mmu_notifier.c | 101 ++++++++++++++++++++ 9 files changed, 350 insertions(+) Index: linux-2.6/include/linux/mm_types.h =================================================================== --- linux-2.6.orig/include/linux/mm_types.h 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/include/linux/mm_types.h 2008-01-29 16:56:36.000000000 -0800 @@ -153,6 +153,10 @@ struct vm_area_struct { #endif }; +struct mmu_notifier_head { + struct hlist_head head; +}; + struct mm_struct { struct vm_area_struct * mmap; /* list of VMAs */ struct rb_root mm_rb; @@ -219,6 +223,8 @@ struct mm_struct { /* aio bits */ rwlock_t ioctx_list_lock; struct kioctx *ioctx_list; + + struct mmu_notifier_head mmu_notifier; /* MMU notifier list */ }; #endif /* _LINUX_MM_TYPES_H */ Index: linux-2.6/include/linux/mmu_notifier.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6/include/linux/mmu_notifier.h 2008-01-29 16:56:36.000000000 -0800 @@ -0,0 +1,210 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +/* + * MMU motifier + * + * Notifier functions for hardware and software that establishes external + * references to pages of a Linux system. The notifier calls ensure that + * the external mappings are removed when the Linux VM removes memory ranges + * or individual pages from a process. + * + * These fall into two classes + * + * 1. mmu_notifier + * + * These are callbacks registered with an mm_struct. If mappings are + * removed from an address space then callbacks are performed. + * Spinlocks must be held in order to the walk reverse maps and the + * notifications are performed while the spinlock is held. + * + * + * 2. mmu_rmap_notifier + * + * Callbacks for subsystems that provide their own rmaps. These + * need to walk their own rmaps for a page. The invalidate_page + * callback is outside of locks so that we are not in a strictly + * atomic context (but we may be in a PF_MEMALLOC context if the + * notifier is called from reclaim code) and are able to sleep. + * Rmap notifiers need an extra page bit and are only available + * on 64 bit platforms. It is up to the subsystem to mark pags + * as PageExternalRmap as needed to trigger the callbacks. Pages + * must be marked dirty if dirty bits are set in the external + * pte. + */ + +#include <linux/list.h> +#include <linux/spinlock.h> +#include <linux/rcupdate.h> +#include <linux/mm_types.h> + +struct mmu_notifier_ops; + +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +struct mmu_notifier_ops { + /* + * Note: The mmu_notifier structure must be released with + * call_rcu() since other processors are only guaranteed to + * see the changes after a quiescent period. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + int (*age_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * lock indicates that the function is called under spinlock. + */ + void (*invalidate_range)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end, + int lock); +}; + +struct mmu_rmap_notifier_ops; + +struct mmu_rmap_notifier { + struct hlist_node hlist; + const struct mmu_rmap_notifier_ops *ops; +}; + +struct mmu_rmap_notifier_ops { + /* + * Called with the page lock held after ptes are modified or removed + * so that a subsystem with its own rmap's can remove remote ptes + * mapping a page. + */ + void (*invalidate_page)(struct mmu_rmap_notifier *mrn, + struct page *page); +}; + +#ifdef CONFIG_MMU_NOTIFIER + +/* + * Must hold the mmap_sem for write. + * + * RCU is used to traverse the list. A quiescent period needs to pass + * before the notifier is guaranteed to be visible to all threads + */ +extern void __mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +/* Will acquire mmap_sem for write*/ +extern void mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +/* + * Will acquire mmap_sem for write. + * + * A quiescent period needs to pass before the mmu_notifier structure + * can be released. mmu_notifier_release() will wait for a quiescent period + * after calling the ->release callback. So it is safe to call + * mmu_notifier_unregister from the ->release function. + */ +extern void mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm); + + +extern void mmu_notifier_release(struct mm_struct *mm); +extern int mmu_notifier_age_page(struct mm_struct *mm, + unsigned long address); + +static inline void mmu_notifier_head_init(struct mmu_notifier_head *mnh) +{ + INIT_HLIST_HEAD(&mnh->head); +} + +#define mmu_notifier(function, mm, args...) \ + do { \ + struct mmu_notifier *__mn; \ + struct hlist_node *__n; \ + \ + if (unlikely(!hlist_empty(&(mm)->mmu_notifier.head))) { \ + rcu_read_lock(); \ + hlist_for_each_entry_rcu(__mn, __n, \ + &(mm)->mmu_notifier.head, \ + hlist) \ + if (__mn->ops->function) \ + __mn->ops->function(__mn, \ + mm, \ + args); \ + rcu_read_unlock(); \ + } \ + } while (0) + +extern void mmu_rmap_notifier_register(struct mmu_rmap_notifier *mrn); +extern void mmu_rmap_notifier_unregister(struct mmu_rmap_notifier *mrn); + +extern struct hlist_head mmu_rmap_notifier_list; + +#define mmu_rmap_notifier(function, args...) \ + do { \ + struct mmu_rmap_notifier *__mrn; \ + struct hlist_node *__n; \ + \ + rcu_read_lock(); \ + hlist_for_each_entry_rcu(__mrn, __n, \ + &mmu_rmap_notifier_list, \ + hlist) \ + if (__mrn->ops->function) \ + __mrn->ops->function(__mrn, args); \ + rcu_read_unlock(); \ + } while (0); + +#else /* CONFIG_MMU_NOTIFIER */ + +/* + * Notifiers that use the parameters that they were passed so that the + * compiler does not complain about unused variables but does proper + * parameter checks even if !CONFIG_MMU_NOTIFIER. + * Macros generate no code. + */ +#define mmu_notifier(function, mm, args...) \ + do { \ + if (0) { \ + struct mmu_notifier *__mn; \ + \ + __mn = (struct mmu_notifier *)(0x00ff); \ + __mn->ops->function(__mn, mm, args); \ + }; \ + } while (0) + +#define mmu_rmap_notifier(function, args...) \ + do { \ + if (0) { \ + struct mmu_rmap_notifier *__mrn; \ + \ + __mrn = (struct mmu_rmap_notifier *)(0x00ff); \ + __mrn->ops->function(__mrn, args); \ + } \ + } while (0); + +static inline void mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm) {} +static inline void mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm) {} +static inline void mmu_notifier_release(struct mm_struct *mm) {} +static inline int mmu_notifier_age_page(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_head_init(struct mmu_notifier_head *mmh) {} + +static inline void mmu_rmap_notifier_register(struct mmu_rmap_notifier *mrn) + {} +static inline void mmu_rmap_notifier_unregister(struct mmu_rmap_notifier *mrn) + {} + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ Index: linux-2.6/include/linux/page-flags.h =================================================================== --- linux-2.6.orig/include/linux/page-flags.h 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/include/linux/page-flags.h 2008-01-29 16:56:36.000000000 -0800 @@ -105,6 +105,7 @@ * 64 bit | FIELDS | ?????? FLAGS | * 63 32 0 */ +#define PG_external_rmap 30 /* Page has external rmap */ #define PG_uncached 31 /* Page has been mapped as uncached */ #endif @@ -260,6 +261,15 @@ static inline void __ClearPageTail(struc #define SetPageUncached(page) set_bit(PG_uncached, &(page)->flags) #define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags) +#if defined(CONFIG_MMU_NOTIFIER) && defined(CONFIG_64BIT) +#define PageExternalRmap(page) test_bit(PG_external_rmap, &(page)->flags) +#define SetPageExternalRmap(page) set_bit(PG_external_rmap, &(page)->flags) +#define ClearPageExternalRmap(page) clear_bit(PG_external_rmap, \ + &(page)->flags) +#else +#define PageExternalRmap(page) 0 +#endif + struct page; /* forward declaration */ extern void cancel_dirty_page(struct page *page, unsigned int account_size); Index: linux-2.6/mm/Kconfig =================================================================== --- linux-2.6.orig/mm/Kconfig 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/mm/Kconfig 2008-01-29 16:56:36.000000000 -0800 @@ -193,3 +193,7 @@ config NR_QUICK config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" Index: linux-2.6/mm/Makefile =================================================================== --- linux-2.6.orig/mm/Makefile 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/mm/Makefile 2008-01-29 16:56:36.000000000 -0800 @@ -30,4 +30,5 @@ obj-$(CONFIG_FS_XIP) += filemap_xip.o obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o Index: linux-2.6/mm/mmu_notifier.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6/mm/mmu_notifier.c 2008-01-29 16:57:26.000000000 -0800 @@ -0,0 +1,101 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter <clameter-sJ/iWh9BUns@public.gmane.org> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include <linux/mmu_notifier.h> +#include <linux/module.h> + +void mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + struct hlist_node *n, *t; + + if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) { + rcu_read_lock(); + hlist_for_each_entry_safe_rcu(mn, n, t, + &mm->mmu_notifier.head, hlist) { + hlist_del_rcu(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + } + rcu_read_unlock(); + synchronize_rcu(); + } +} + +/* + * If no young bitflag is supported by the hardware, ->age_page can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. + */ +int mmu_notifier_age_page(struct mm_struct *mm, unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + + if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) { + rcu_read_lock(); + hlist_for_each_entry_rcu(mn, n, + &mm->mmu_notifier.head, hlist) { + if (mn->ops->age_page) + young |= mn->ops->age_page(mn, mm, address); + } + rcu_read_unlock(); + } + + return young; +} + +/* + * Note that all notifiers use RCU. The updates are only guaranteed to be + * visible to other processes after a RCU quiescent period! + */ +void __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier.head); +} +EXPORT_SYMBOL_GPL(__mmu_notifier_register); + +void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + down_write(&mm->mmap_sem); + __mmu_notifier_register(mn, mm); + up_write(&mm->mmap_sem); +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); + +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) +{ + down_write(&mm->mmap_sem); + hlist_del_rcu(&mn->hlist); + up_write(&mm->mmap_sem); +} +EXPORT_SYMBOL_GPL(mmu_notifier_unregister); + +static DEFINE_SPINLOCK(mmu_notifier_list_lock); +HLIST_HEAD(mmu_rmap_notifier_list); + +void mmu_rmap_notifier_register(struct mmu_rmap_notifier *mrn) +{ + spin_lock(&mmu_notifier_list_lock); + hlist_add_head_rcu(&mrn->hlist, &mmu_rmap_notifier_list); + spin_unlock(&mmu_notifier_list_lock); +} +EXPORT_SYMBOL(mmu_rmap_notifier_register); + +void mmu_rmap_notifier_unregister(struct mmu_rmap_notifier *mrn) +{ + spin_lock(&mmu_notifier_list_lock); + hlist_del_rcu(&mrn->hlist); + spin_unlock(&mmu_notifier_list_lock); +} +EXPORT_SYMBOL(mmu_rmap_notifier_unregister); + Index: linux-2.6/kernel/fork.c =================================================================== --- linux-2.6.orig/kernel/fork.c 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/kernel/fork.c 2008-01-29 16:56:36.000000000 -0800 @@ -52,6 +52,7 @@ #include <linux/tty.h> #include <linux/proc_fs.h> #include <linux/blkdev.h> +#include <linux/mmu_notifier.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -360,6 +361,7 @@ static struct mm_struct * mm_init(struct if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_head_init(&mm->mmu_notifier); return mm; } free_mm(mm); Index: linux-2.6/mm/mmap.c =================================================================== --- linux-2.6.orig/mm/mmap.c 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/mm/mmap.c 2008-01-29 16:56:36.000000000 -0800 @@ -26,6 +26,7 @@ #include <linux/mount.h> #include <linux/mempolicy.h> #include <linux/rmap.h> +#include <linux/mmu_notifier.h> #include <asm/uaccess.h> #include <asm/cacheflush.h> @@ -2043,6 +2044,7 @@ void exit_mmap(struct mm_struct *mm) vm_unacct_memory(nr_accounted); free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); + mmu_notifier_release(mm); /* * Walk the list again, actually closing and freeing it, Index: linux-2.6/include/linux/list.h =================================================================== --- linux-2.6.orig/include/linux/list.h 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/include/linux/list.h 2008-01-29 16:56:36.000000000 -0800 @@ -991,6 +991,20 @@ static inline void hlist_add_after_rcu(s ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \ pos = pos->next) +/** + * hlist_for_each_entry_safe_rcu - iterate over list of given type + * @tpos: the type * to use as a loop cursor. + * @pos: the &struct hlist_node to use as a loop cursor. + * @n: temporary pointer + * @head: the head for your list. + * @member: the name of the hlist_node within the struct. + */ +#define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \ + for (pos = (head)->first; \ + rcu_dereference(pos) && ({ n = pos->next; 1;}) && \ + ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \ + pos = n) + #else #warning "don't include kernel headers in userspace" #endif /* __KERNEL__ */ -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20080130022944.236370194-sJ/iWh9BUns@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130022944.236370194-sJ/iWh9BUns@public.gmane.org> @ 2008-01-30 15:37 ` Andrea Arcangeli [not found] ` <20080130153749.GN7233-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org> 2008-01-30 17:10 ` Peter Zijlstra 2008-01-30 18:02 ` Robin Holt 1 sibling, 2 replies; 24+ messages in thread From: Andrea Arcangeli @ 2008-01-30 15:37 UTC (permalink / raw) To: Christoph Lameter Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins On Tue, Jan 29, 2008 at 06:29:10PM -0800, Christoph Lameter wrote: > +void mmu_notifier_release(struct mm_struct *mm) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n, *t; > + > + if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) { > + rcu_read_lock(); > + hlist_for_each_entry_safe_rcu(mn, n, t, > + &mm->mmu_notifier.head, hlist) { > + hlist_del_rcu(&mn->hlist); This will race and kernel crash against mmu_notifier_register in SMP. You should resurrect the per-mmu_notifier_head lock in my last patch (except it can be converted from a rwlock_t to a regular spinlock_t) and drop the mmap_sem from mmu_notifier_register/unregister. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20080130153749.GN7233-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130153749.GN7233-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org> @ 2008-01-30 15:53 ` Jack Steiner [not found] ` <20080130155306.GA13746-sJ/iWh9BUns@public.gmane.org> 0 siblings, 1 reply; 24+ messages in thread From: Jack Steiner @ 2008-01-30 15:53 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins, Christoph Lameter On Wed, Jan 30, 2008 at 04:37:49PM +0100, Andrea Arcangeli wrote: > On Tue, Jan 29, 2008 at 06:29:10PM -0800, Christoph Lameter wrote: > > +void mmu_notifier_release(struct mm_struct *mm) > > +{ > > + struct mmu_notifier *mn; > > + struct hlist_node *n, *t; > > + > > + if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) { > > + rcu_read_lock(); > > + hlist_for_each_entry_safe_rcu(mn, n, t, > > + &mm->mmu_notifier.head, hlist) { > > + hlist_del_rcu(&mn->hlist); > > This will race and kernel crash against mmu_notifier_register in > SMP. You should resurrect the per-mmu_notifier_head lock in my last > patch (except it can be converted from a rwlock_t to a regular > spinlock_t) and drop the mmap_sem from > mmu_notifier_register/unregister. Agree. That will also resolve the problem we discussed yesterday. I want to unregister my mmu_notifier when a GRU segment is unmapped. This would not necessarily be at task termination. However, the mmap_sem is already held for write by the core VM at the point I would call the unregister function. Currently, there is no __mmu_notifier_unregister() defined. Moving to a different lock solves the problem. -- jack ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20080130155306.GA13746-sJ/iWh9BUns@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130155306.GA13746-sJ/iWh9BUns@public.gmane.org> @ 2008-01-30 16:38 ` Andrea Arcangeli 2008-01-30 19:19 ` Christoph Lameter 1 sibling, 0 replies; 24+ messages in thread From: Andrea Arcangeli @ 2008-01-30 16:38 UTC (permalink / raw) To: Jack Steiner Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins, Christoph Lameter On Wed, Jan 30, 2008 at 09:53:06AM -0600, Jack Steiner wrote: > That will also resolve the problem we discussed yesterday. > I want to unregister my mmu_notifier when a GRU segment is > unmapped. This would not necessarily be at task termination. My proof that there is something wrong in the smp locking of the current code is very simple: it can't be right to use hlist_for_each_entry_safe_rcu and rcu_read_lock inside mmu_notifier_release, and then to call hlist_del_rcu without any spinlock or semaphore. If we walk the list with hlist_for_each_entry_safe_rcu (and not with hlist_for_each_entry_safe), it means the list _can_ change from under us, and in turn the hlist_del_rcu must be surrounded by a spinlock or sempahore too! If by design the list _can't_ change from under us and calling hlist_del_rcu was safe w/o locks, then hlist_for_each_entry_safe is _sure_ enough for mmu_notifier_release, and rcu_read_lock most certainly can be removed too. To make an usage case where the race could trigger, I was thinking at somebody bumping the mm_count (not mm_users) and registering a notifier while mmu_notifier_release runs and relaying on ->release to know if it has to run mmu_notifier_unregister. However I now started wondering how it can relay on ->release to know that if ->release is called after hlist_del_rcu because with the latest changes ->release will also allow the mn to release itself ;). It's unsafe to call list_del_rcu twice (the second will crash on a poisoned entry). This starts to make me think we should remove the auto-disarming feature and require the notifier-user to have the ->release call mmu_notifier_unregister first and to free the "mn" inside ->release too if needed. Or alternatively the notifier-user can bump mm_count and to call a mmu_notifier_unregister before calling mmdrop (like kvm could do). Another approach is to simply define mmu_notifier_release as implicitly serialized by other code design, with a real lock (not rcu) against the whole register/unregister operations. So to guarantee the notifier list can't change from under us while mmu_notifier_release runs. If we go this route, yes, the auto-disarming hlist_del can be kept, the current code would have been safe, but to avoid confusion the mmu_notifier_release shall become this: void mmu_notifier_release(struct mm_struct *mm) { struct mmu_notifier *mn; struct hlist_node *n, *t; if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) { hlist_for_each_entry_safe(mn, n, t, &mm->mmu_notifier.head, hlist) { hlist_del(&mn->hlist); if (mn->ops->release) mn->ops->release(mn, mm); } } } > However, the mmap_sem is already held for write by the core > VM at the point I would call the unregister function. > Currently, there is no __mmu_notifier_unregister() defined. > > Moving to a different lock solves the problem. Unless the mmu_notifier_release becomes like above and we rely on the user of the mmu notifiers to implement a highlevel external lock that will we definitely forbid to bump the mm_count of the mm, and to call register/unregister while mmu_notifier_release could run, 1) moving to a different lock and 2) removing the auto-disarming hlist_del_rcu from mmu_notifier_release sounds the only possible smp safe way. As far as KVM is concerned mmu_notifier_released could be changed to the version I written above and everything should be ok. For KVM the mm_count bump is done by the task that also holds a mm_user, so when exit_mmap runs I don't think the list could possible change anymore. Anyway those are details that can be perfected after mainline merging, so this isn't something to worry about too much right now. My idea is to keep working to perfect it while I hope progress is being made by Christoph to merge the mmu notifiers V3 patchset in mainline ;). ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130155306.GA13746-sJ/iWh9BUns@public.gmane.org> 2008-01-30 16:38 ` Andrea Arcangeli @ 2008-01-30 19:19 ` Christoph Lameter [not found] ` <Pine.LNX.4.64.0801301116510.27491-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org> 1 sibling, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 19:19 UTC (permalink / raw) To: Jack Steiner Cc: Nick Piggin, Andrea Arcangeli, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins On Wed, 30 Jan 2008, Jack Steiner wrote: > Moving to a different lock solves the problem. Well it gets us back to the issue why we removed the lock. As Robin said before: If its global then we can have a huge number of tasks contending for the lock on startup of a process with a large number of ranks. The reason to go to mmap_sem was that it was placed in the mm_struct and so we would just have a couple of contentions per mm_struct. I'll be looking for some other way to do this. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <Pine.LNX.4.64.0801301116510.27491-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <Pine.LNX.4.64.0801301116510.27491-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org> @ 2008-01-30 22:20 ` Robin Holt [not found] ` <20080130222035.GX26420-sJ/iWh9BUns@public.gmane.org> 0 siblings, 1 reply; 24+ messages in thread From: Robin Holt @ 2008-01-30 22:20 UTC (permalink / raw) To: Christoph Lameter Cc: Nick Piggin, Andrea Arcangeli, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, Jack Steiner, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins On Wed, Jan 30, 2008 at 11:19:28AM -0800, Christoph Lameter wrote: > On Wed, 30 Jan 2008, Jack Steiner wrote: > > > Moving to a different lock solves the problem. > > Well it gets us back to the issue why we removed the lock. As Robin said > before: If its global then we can have a huge number of tasks contending > for the lock on startup of a process with a large number of ranks. The > reason to go to mmap_sem was that it was placed in the mm_struct and so we > would just have a couple of contentions per mm_struct. > > I'll be looking for some other way to do this. I think Andrea's original concept of the lock in the mmu_notifier_head structure was the best. I agree with him that it should be a spinlock instead of the rw_lock. Thanks, Robin ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20080130222035.GX26420-sJ/iWh9BUns@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130222035.GX26420-sJ/iWh9BUns@public.gmane.org> @ 2008-01-30 23:38 ` Andrea Arcangeli 2008-01-30 23:55 ` Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Andrea Arcangeli @ 2008-01-30 23:38 UTC (permalink / raw) To: Robin Holt Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, Jack Steiner, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Hugh Dickins, Christoph Lameter On Wed, Jan 30, 2008 at 04:20:35PM -0600, Robin Holt wrote: > On Wed, Jan 30, 2008 at 11:19:28AM -0800, Christoph Lameter wrote: > > On Wed, 30 Jan 2008, Jack Steiner wrote: > > > > > Moving to a different lock solves the problem. > > > > Well it gets us back to the issue why we removed the lock. As Robin said > > before: If its global then we can have a huge number of tasks contending > > for the lock on startup of a process with a large number of ranks. The > > reason to go to mmap_sem was that it was placed in the mm_struct and so we > > would just have a couple of contentions per mm_struct. > > > > I'll be looking for some other way to do this. > > I think Andrea's original concept of the lock in the mmu_notifier_head > structure was the best. I agree with him that it should be a spinlock > instead of the rw_lock. BTW, I don't see the scalability concern with huge number of tasks: the lock is still in the mm, down_write(mm->mmap_sem); oneinstruction; up_write(mm->mmap_sem) is always going to scale worse than spin_lock(mm->somethingelse); oneinstruction; spin_unlock(mm->somethinglese). Furthermore if we go this route and we don't relay on implicit serialization of all the mmu notifier users against exit_mmap (i.e. the mmu notifier user must agree to stop calling mmu_notifier_register on a mm after the last mmput) the autodisarming feature will likely have to be removed or it can't possibly be safe to run mmu_notifier_unregister while mmu_notifier_release runs. With the auto-disarming feature, there is no way to safely know if mmu_notifier_unregister has to be called or not. I'm ok with removing the auto-disarming feature and to have as self-contained-as-possible locking. Then mmu_notifier_release can just become the invalidate_all_after and invalidate_all, invalidate_all_before. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch 1/6] mmu_notifier: Core code 2008-01-30 23:38 ` Andrea Arcangeli @ 2008-01-30 23:55 ` Christoph Lameter [not found] ` <Pine.LNX.4.64.0801301552210.1722-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org> 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 23:55 UTC (permalink / raw) To: Andrea Arcangeli Cc: Robin Holt, Jack Steiner, Avi Kivity, Izik Eidus, Nick Piggin, kvm-devel, Benjamin Herrenschmidt, Peter Zijlstra, linux-kernel, linux-mm, daniel.blueman, Hugh Dickins On Thu, 31 Jan 2008, Andrea Arcangeli wrote: > > I think Andrea's original concept of the lock in the mmu_notifier_head > > structure was the best. I agree with him that it should be a spinlock > > instead of the rw_lock. > > BTW, I don't see the scalability concern with huge number of tasks: > the lock is still in the mm, down_write(mm->mmap_sem); oneinstruction; > up_write(mm->mmap_sem) is always going to scale worse than > spin_lock(mm->somethingelse); oneinstruction; > spin_unlock(mm->somethinglese). If we put it elsewhere in the mm then we increase the size of the memory used in the mm_struct. > Furthermore if we go this route and we don't relay on implicit > serialization of all the mmu notifier users against exit_mmap > (i.e. the mmu notifier user must agree to stop calling > mmu_notifier_register on a mm after the last mmput) the autodisarming > feature will likely have to be removed or it can't possibly be safe to > run mmu_notifier_unregister while mmu_notifier_release runs. With the > auto-disarming feature, there is no way to safely know if > mmu_notifier_unregister has to be called or not. I'm ok with removing > the auto-disarming feature and to have as self-contained-as-possible > locking. Then mmu_notifier_release can just become the > invalidate_all_after and invalidate_all, invalidate_all_before. Hmmmm.. exit_mmap is only called when the last reference is removed against the mm right? So no tasks are running anymore. No pages are left. Do we need to serialize at all for mmu_notifier_release? ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <Pine.LNX.4.64.0801301552210.1722-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <Pine.LNX.4.64.0801301552210.1722-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org> @ 2008-01-31 0:12 ` Andrea Arcangeli [not found] ` <20080131001258.GD7185-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 24+ messages in thread From: Andrea Arcangeli @ 2008-01-31 0:12 UTC (permalink / raw) To: Christoph Lameter Cc: Nick Piggin, Peter Zijlstra, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Benjamin Herrenschmidt, Jack Steiner, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins On Wed, Jan 30, 2008 at 03:55:37PM -0800, Christoph Lameter wrote: > On Thu, 31 Jan 2008, Andrea Arcangeli wrote: > > > > I think Andrea's original concept of the lock in the mmu_notifier_head > > > structure was the best. I agree with him that it should be a spinlock > > > instead of the rw_lock. > > > > BTW, I don't see the scalability concern with huge number of tasks: > > the lock is still in the mm, down_write(mm->mmap_sem); oneinstruction; > > up_write(mm->mmap_sem) is always going to scale worse than > > spin_lock(mm->somethingelse); oneinstruction; > > spin_unlock(mm->somethinglese). > > If we put it elsewhere in the mm then we increase the size of the memory > used in the mm_struct. Yes, and it will increase of the same amount of RAM that you pretend everyone to pay even if MMU_NOTIFIER=n after your patch is applied (vs mine that generated 0 ram utilization increase when MMU_NOTIFIER=n). And the additional ram will provide not just self-contained locking but higher scalability too. I think it's much more important to generate zero ram and CPU overhead for the embedded (this is something I was very careful to enforce in all my patches), than to reduce scalability and not having a self contained locking on full configurations with MMU_NOTIFIER=y. > Hmmmm.. exit_mmap is only called when the last reference is removed > against the mm right? So no tasks are running anymore. No pages are left. > Do we need to serialize at all for mmu_notifier_release? KVM sure doesn't need any locking there. I thought somebody had to possibly take a pin on the "mm_count" and pretend to call mmu_notifier_register at will until mmdrop was finally called, in a out of order fashion given mmu_notifier_release was implemented like if the list could change from under it. Note mmdrop != mmput. mmput and in turn mm_users is the serialization point if you prefer to drop all locking from _release. Nobody must ever attempt a mmu_notifier_* after calling mmput for that mm. That should be enough to be safe. I'm fine either ways... ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20080131001258.GD7185-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080131001258.GD7185-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org> @ 2008-01-31 1:27 ` Christoph Lameter 0 siblings, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-31 1:27 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Benjamin Herrenschmidt, Jack Steiner, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins On Thu, 31 Jan 2008, Andrea Arcangeli wrote: > > Hmmmm.. exit_mmap is only called when the last reference is removed > > against the mm right? So no tasks are running anymore. No pages are left. > > Do we need to serialize at all for mmu_notifier_release? > > KVM sure doesn't need any locking there. I thought somebody had to > possibly take a pin on the "mm_count" and pretend to call > mmu_notifier_register at will until mmdrop was finally called, in a > out of order fashion given mmu_notifier_release was implemented like > if the list could change from under it. Note mmdrop != mmput. mmput > and in turn mm_users is the serialization point if you prefer to drop > all locking from _release. Nobody must ever attempt a mmu_notifier_* > after calling mmput for that mm. That should be enough to be > safe. I'm fine either ways... exit_mmap (where we call invalidate_all() and release()) is called when mm_users == 0: void mmput(struct mm_struct *mm) { might_sleep(); if (atomic_dec_and_test(&mm->mm_users)) { exit_aio(mm); exit_mmap(mm); if (!list_empty(&mm->mmlist)) { spin_lock(&mmlist_lock); list_del(&mm->mmlist); spin_unlock(&mmlist_lock); } put_swap_token(mm); mmdrop(mm); } } EXPORT_SYMBOL_GPL(mmput); So there is only a single thread executing at the time when invalidate_all() is called from exit_mmap(). Then we drop the pages, and the page tables. After the page tables we call the ->release method and then remove the vmas. So even dropping off the mmu_notifier chain in invalidate_all() could be done without an issue and without locking. Trouble is if other callbacks attempt the same. Do we need to support the removal from the mmu_notifier list in invalidate_range()? ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch 1/6] mmu_notifier: Core code 2008-01-30 15:37 ` Andrea Arcangeli [not found] ` <20080130153749.GN7233-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org> @ 2008-01-30 17:10 ` Peter Zijlstra 2008-01-30 19:28 ` Christoph Lameter 1 sibling, 1 reply; 24+ messages in thread From: Peter Zijlstra @ 2008-01-30 17:10 UTC (permalink / raw) To: Andrea Arcangeli Cc: Christoph Lameter, Robin Holt, Avi Kivity, Izik Eidus, Nick Piggin, kvm-devel, Benjamin Herrenschmidt, steiner, linux-kernel, linux-mm, daniel.blueman, Hugh Dickins On Wed, 2008-01-30 at 16:37 +0100, Andrea Arcangeli wrote: > On Tue, Jan 29, 2008 at 06:29:10PM -0800, Christoph Lameter wrote: > > +void mmu_notifier_release(struct mm_struct *mm) > > +{ > > + struct mmu_notifier *mn; > > + struct hlist_node *n, *t; > > + > > + if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) { > > + rcu_read_lock(); > > + hlist_for_each_entry_safe_rcu(mn, n, t, > > + &mm->mmu_notifier.head, hlist) { > > + hlist_del_rcu(&mn->hlist); > > This will race and kernel crash against mmu_notifier_register in > SMP. You should resurrect the per-mmu_notifier_head lock in my last > patch (except it can be converted from a rwlock_t to a regular > spinlock_t) and drop the mmap_sem from > mmu_notifier_register/unregister. Agreed, sorry for this oversight. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch 1/6] mmu_notifier: Core code 2008-01-30 17:10 ` Peter Zijlstra @ 2008-01-30 19:28 ` Christoph Lameter 0 siblings, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 19:28 UTC (permalink / raw) To: Peter Zijlstra Cc: Nick Piggin, Andrea Arcangeli, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins How about just taking the mmap_sem writelock in release? We have only a single caller of mmu_notifier_release() in mm/mmap.c and we know that we are not holding mmap_sem at that point. So just acquire it when needed? Index: linux-2.6/mm/mmu_notifier.c =================================================================== --- linux-2.6.orig/mm/mmu_notifier.c 2008-01-30 11:21:57.000000000 -0800 +++ linux-2.6/mm/mmu_notifier.c 2008-01-30 11:24:59.000000000 -0800 @@ -18,6 +19,7 @@ void mmu_notifier_release(struct mm_stru struct hlist_node *n, *t; if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) { + down_write(&mm->mmap_sem); rcu_read_lock(); hlist_for_each_entry_safe_rcu(mn, n, t, &mm->mmu_notifier.head, hlist) { @@ -26,6 +28,7 @@ void mmu_notifier_release(struct mm_stru mn->ops->release(mn, mm); } rcu_read_unlock(); + up_write(&mm->mmap_sem); synchronize_rcu(); } } ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130022944.236370194-sJ/iWh9BUns@public.gmane.org> 2008-01-30 15:37 ` Andrea Arcangeli @ 2008-01-30 18:02 ` Robin Holt [not found] ` <20080130180207.GU26420-sJ/iWh9BUns@public.gmane.org> 1 sibling, 1 reply; 24+ messages in thread From: Robin Holt @ 2008-01-30 18:02 UTC (permalink / raw) To: Christoph Lameter Cc: Nick Piggin, Andrea Arcangeli, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins Back to one of Andrea's points from a couple days ago, I think we still have a problem with the PageExternalRmap page flag. If I had two drivers with external rmap implementations, there is no way I can think of for a simple flag to coordinate a single page being exported and maintained by the two. Since the intended use seems to point in the direction of the external rmap must be maintained consistent with the all pages the driver has exported and the driver will already need to handle cases where the page does not appear in its rmap, I would propose the setting and clearing should be handled in the mmu_notifier code. This is the first of two patches. This one is intended as an addition to patch 1/6. I will post the other shortly under the patch 3/6 thread. Index: git-linus/include/linux/mmu_notifier.h =================================================================== --- git-linus.orig/include/linux/mmu_notifier.h 2008-01-30 11:43:45.000000000 -0600 +++ git-linus/include/linux/mmu_notifier.h 2008-01-30 11:44:35.000000000 -0600 @@ -146,6 +146,7 @@ static inline void mmu_notifier_head_ini extern void mmu_rmap_notifier_register(struct mmu_rmap_notifier *mrn); extern void mmu_rmap_notifier_unregister(struct mmu_rmap_notifier *mrn); +extern void mmu_rmap_export_page(struct page *page); extern struct hlist_head mmu_rmap_notifier_list; Index: git-linus/mm/mmu_notifier.c =================================================================== --- git-linus.orig/mm/mmu_notifier.c 2008-01-30 11:43:45.000000000 -0600 +++ git-linus/mm/mmu_notifier.c 2008-01-30 11:56:08.000000000 -0600 @@ -99,3 +99,8 @@ void mmu_rmap_notifier_unregister(struct } EXPORT_SYMBOL(mmu_rmap_notifier_unregister); +void mmu_rmap_export_page(struct page *page) +{ + SetPageExternalRmap(page); +} +EXPORT_SYMBOL(mmu_rmap_export_page); ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20080130180207.GU26420-sJ/iWh9BUns@public.gmane.org>]
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130180207.GU26420-sJ/iWh9BUns@public.gmane.org> @ 2008-01-30 19:08 ` Christoph Lameter 2008-01-30 19:14 ` Christoph Lameter 1 sibling, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 19:08 UTC (permalink / raw) To: Robin Holt Cc: Nick Piggin, Andrea Arcangeli, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Hugh Dickins On Wed, 30 Jan 2008, Robin Holt wrote: > Index: git-linus/mm/mmu_notifier.c > =================================================================== > --- git-linus.orig/mm/mmu_notifier.c 2008-01-30 11:43:45.000000000 -0600 > +++ git-linus/mm/mmu_notifier.c 2008-01-30 11:56:08.000000000 -0600 > @@ -99,3 +99,8 @@ void mmu_rmap_notifier_unregister(struct > } > EXPORT_SYMBOL(mmu_rmap_notifier_unregister); > > +void mmu_rmap_export_page(struct page *page) > +{ > + SetPageExternalRmap(page); > +} > +EXPORT_SYMBOL(mmu_rmap_export_page); Then mmu_rmap_export_page would have to be called before the subsystem establishes the rmap entry for the page. Could we do all PageExternalRmap modifications under Pagelock? ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch 1/6] mmu_notifier: Core code [not found] ` <20080130180207.GU26420-sJ/iWh9BUns@public.gmane.org> 2008-01-30 19:08 ` Christoph Lameter @ 2008-01-30 19:14 ` Christoph Lameter 1 sibling, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 19:14 UTC (permalink / raw) To: Robin Holt Cc: Nick Piggin, Andrea Arcangeli, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Hugh Dickins Ok. So I added the following patch: --- include/linux/mmu_notifier.h | 1 + mm/mmu_notifier.c | 12 ++++++++++++ 2 files changed, 13 insertions(+) Index: linux-2.6/include/linux/mmu_notifier.h =================================================================== --- linux-2.6.orig/include/linux/mmu_notifier.h 2008-01-30 11:09:06.000000000 -0800 +++ linux-2.6/include/linux/mmu_notifier.h 2008-01-30 11:10:38.000000000 -0800 @@ -146,6 +146,7 @@ static inline void mmu_notifier_head_ini extern void mmu_rmap_notifier_register(struct mmu_rmap_notifier *mrn); extern void mmu_rmap_notifier_unregister(struct mmu_rmap_notifier *mrn); +extern void mmu_rmap_export_page(struct page *page); extern struct hlist_head mmu_rmap_notifier_list; Index: linux-2.6/mm/mmu_notifier.c =================================================================== --- linux-2.6.orig/mm/mmu_notifier.c 2008-01-30 11:09:01.000000000 -0800 +++ linux-2.6/mm/mmu_notifier.c 2008-01-30 11:12:10.000000000 -0800 @@ -99,3 +99,15 @@ void mmu_rmap_notifier_unregister(struct } EXPORT_SYMBOL(mmu_rmap_notifier_unregister); +/* + * Export a page. + * + * Pagelock must be held. + * Must be called before a page is put on an external rmap. + */ +void mmu_rmap_export_page(struct page *page) +{ + BUG_ON(!PageLocked(page)); + SetPageExternalRmap(page); +} +EXPORT_SYMBOL(mmu_rmap_export_page); ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges 2008-01-30 2:29 [patch 0/6] [RFC] MMU Notifiers V3 Christoph Lameter 2008-01-30 2:29 ` [patch 1/6] mmu_notifier: Core code Christoph Lameter @ 2008-01-30 2:29 ` Christoph Lameter 2008-01-30 2:29 ` [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap Christoph Lameter ` (3 subsequent siblings) 5 siblings, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 2:29 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins [-- Attachment #1: mmu_invalidate_range_callbacks --] [-- Type: text/plain, Size: 4940 bytes --] The invalidation of address ranges in a mm_struct needs to be performed when pages are removed or permissions etc change. Most of the VM address space changes can use the range invalidate callback. invalidate_range() is generally called with mmap_sem held but no spinlocks are active. If invalidate_range() is called with locks held then we pass a flag into invalidate_range() Comments state that mmap_sem must be held for remap_pfn_range() but various drivers do not seem to do this. Signed-off-by: Andrea Arcangeli <andrea-atKUWr5tajBWk0Htik3J/w@public.gmane.org> Signed-off-by: Robin Holt <holt-sJ/iWh9BUns@public.gmane.org> Signed-off-by: Christoph Lameter <clameter-sJ/iWh9BUns@public.gmane.org> --- mm/fremap.c | 2 ++ mm/hugetlb.c | 2 ++ mm/memory.c | 11 +++++++++-- mm/mmap.c | 1 + 4 files changed, 14 insertions(+), 2 deletions(-) Index: linux-2.6/mm/fremap.c =================================================================== --- linux-2.6.orig/mm/fremap.c 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/mm/fremap.c 2008-01-29 16:59:24.000000000 -0800 @@ -15,6 +15,7 @@ #include <linux/rmap.h> #include <linux/module.h> #include <linux/syscalls.h> +#include <linux/mmu_notifier.h> #include <asm/mmu_context.h> #include <asm/cacheflush.h> @@ -212,6 +213,7 @@ asmlinkage long sys_remap_file_pages(uns } err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier(invalidate_range, mm, start, start + size, 0); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); Index: linux-2.6/mm/memory.c =================================================================== --- linux-2.6.orig/mm/memory.c 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/mm/memory.c 2008-01-29 16:59:24.000000000 -0800 @@ -50,6 +50,7 @@ #include <linux/delayacct.h> #include <linux/init.h> #include <linux/writeback.h> +#include <linux/mmu_notifier.h> #include <asm/pgalloc.h> #include <asm/uaccess.h> @@ -891,6 +892,8 @@ unsigned long zap_page_range(struct vm_a end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); if (tlb) tlb_finish_mmu(tlb, address, end); + mmu_notifier(invalidate_range, mm, address, end, + (details ? (details->i_mmap_lock != NULL) : 0)); return end; } @@ -1319,7 +1322,7 @@ int remap_pfn_range(struct vm_area_struc { pgd_t *pgd; unsigned long next; - unsigned long end = addr + PAGE_ALIGN(size); + unsigned long start = addr, end = addr + PAGE_ALIGN(size); struct mm_struct *mm = vma->vm_mm; int err; @@ -1360,6 +1363,7 @@ int remap_pfn_range(struct vm_area_struc if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier(invalidate_range, mm, start, end, 0); return err; } EXPORT_SYMBOL(remap_pfn_range); @@ -1443,7 +1447,7 @@ int apply_to_page_range(struct mm_struct { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); @@ -1454,6 +1458,7 @@ int apply_to_page_range(struct mm_struct if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier(invalidate_range, mm, start, end, 0); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1669,6 +1674,8 @@ gotten: page_cache_release(old_page); unlock: pte_unmap_unlock(page_table, ptl); + mmu_notifier(invalidate_range, mm, address, + address + PAGE_SIZE - 1, 0); if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); Index: linux-2.6/mm/mmap.c =================================================================== --- linux-2.6.orig/mm/mmap.c 2008-01-29 16:56:36.000000000 -0800 +++ linux-2.6/mm/mmap.c 2008-01-29 16:58:15.000000000 -0800 @@ -1748,6 +1748,7 @@ static void unmap_region(struct mm_struc free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); + mmu_notifier(invalidate_range, mm, start, end, 0); } /* Index: linux-2.6/mm/hugetlb.c =================================================================== --- linux-2.6.orig/mm/hugetlb.c 2008-01-29 16:56:33.000000000 -0800 +++ linux-2.6/mm/hugetlb.c 2008-01-29 16:58:15.000000000 -0800 @@ -14,6 +14,7 @@ #include <linux/mempolicy.h> #include <linux/cpuset.h> #include <linux/mutex.h> +#include <linux/mmu_notifier.h> #include <asm/page.h> #include <asm/pgtable.h> @@ -763,6 +764,7 @@ void __unmap_hugepage_range(struct vm_ar } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier(invalidate_range, mm, start, end, 1); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap 2008-01-30 2:29 [patch 0/6] [RFC] MMU Notifiers V3 Christoph Lameter 2008-01-30 2:29 ` [patch 1/6] mmu_notifier: Core code Christoph Lameter 2008-01-30 2:29 ` [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Christoph Lameter @ 2008-01-30 2:29 ` Christoph Lameter [not found] ` <20080130022944.699753910-sJ/iWh9BUns@public.gmane.org> 2008-01-30 2:29 ` [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps Christoph Lameter ` (2 subsequent siblings) 5 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 2:29 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins [-- Attachment #1: mmu_invalidate_page_rmap_callbacks --] [-- Type: text/plain, Size: 1667 bytes --] Callbacks to remove individual pages if the subsystem has an rmap capability. The pagelock is held but no spinlocks are held. The refcount of the page is elevated so that dropping the refcount in the subsystem will not directly free the page. The callbacks occur after the Linux rmaps have been walked. Signed-off-by: Christoph Lameter <clameter-sJ/iWh9BUns@public.gmane.org> --- mm/rmap.c | 6 ++++++ 1 file changed, 6 insertions(+) Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-01-25 14:24:19.000000000 -0800 +++ linux-2.6/mm/rmap.c 2008-01-25 14:24:38.000000000 -0800 @@ -49,6 +49,7 @@ #include <linux/rcupdate.h> #include <linux/module.h> #include <linux/kallsyms.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> @@ -473,6 +474,8 @@ int page_mkclean(struct page *page) struct address_space *mapping = page_mapping(page); if (mapping) { ret = page_mkclean_file(mapping, page); + if (unlikely(PageExternalRmap(page))) + mmu_rmap_notifier(invalidate_page, page); if (page_test_dirty(page)) { page_clear_dirty(page); ret = 1; @@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int else ret = try_to_unmap_file(page, migration); + if (unlikely(PageExternalRmap(page))) + mmu_rmap_notifier(invalidate_page, page); + if (!page_mapped(page)) ret = SWAP_SUCCESS; return ret; -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20080130022944.699753910-sJ/iWh9BUns@public.gmane.org>]
* Re: [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap [not found] ` <20080130022944.699753910-sJ/iWh9BUns@public.gmane.org> @ 2008-01-30 18:03 ` Robin Holt 0 siblings, 0 replies; 24+ messages in thread From: Robin Holt @ 2008-01-30 18:03 UTC (permalink / raw) To: Christoph Lameter Cc: Nick Piggin, Andrea Arcangeli, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins This is the second part of a patch posted to patch 1/6. Index: git-linus/mm/rmap.c =================================================================== --- git-linus.orig/mm/rmap.c 2008-01-30 11:55:56.000000000 -0600 +++ git-linus/mm/rmap.c 2008-01-30 12:01:28.000000000 -0600 @@ -476,8 +476,10 @@ int page_mkclean(struct page *page) struct address_space *mapping = page_mapping(page); if (mapping) { ret = page_mkclean_file(mapping, page); - if (unlikely(PageExternalRmap(page))) + if (unlikely(PageExternalRmap(page))) { mmu_rmap_notifier(invalidate_page, page); + ClearPageExported(page); + } if (page_test_dirty(page)) { page_clear_dirty(page); ret = 1; @@ -980,8 +982,10 @@ int try_to_unmap(struct page *page, int else ret = try_to_unmap_file(page, migration); - if (unlikely(PageExternalRmap(page))) + if (unlikely(PageExternalRmap(page))) { mmu_rmap_notifier(invalidate_page, page); + ClearPageExported(page); + } if (!page_mapped(page)) ret = SWAP_SUCCESS; ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps 2008-01-30 2:29 [patch 0/6] [RFC] MMU Notifiers V3 Christoph Lameter ` (2 preceding siblings ...) 2008-01-30 2:29 ` [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap Christoph Lameter @ 2008-01-30 2:29 ` Christoph Lameter 2008-01-30 2:29 ` [patch 5/6] mmu_notifier: Callbacks for xip_filemap.c Christoph Lameter 2008-01-30 2:29 ` [patch 6/6] mmu_notifier: Add invalidate_all() Christoph Lameter 5 siblings, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 2:29 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins [-- Attachment #1: mmu_invalidate_page_callbacks --] [-- Type: text/plain, Size: 2716 bytes --] These notifiers here use the Linux rmaps to perform the callbacks. In order to walk the rmaps locks must be held. Callbacks can therefore only operate in an atomic context. Signed-off-by: Christoph Lameter <clameter-sJ/iWh9BUns@public.gmane.org> --- mm/rmap.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-01-29 16:58:25.000000000 -0800 +++ linux-2.6/mm/rmap.c 2008-01-29 16:58:39.000000000 -0800 @@ -285,7 +285,8 @@ static int page_referenced_one(struct pa if (!pte) goto out; - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte) | + mmu_notifier_age_page(mm, address)) referenced++; /* Pretend the page is referenced if the task has the @@ -435,6 +436,7 @@ static int page_mkclean_one(struct page flush_cache_page(vma, address, pte_pfn(*pte)); entry = ptep_clear_flush(vma, address, pte); + mmu_notifier(invalidate_page, mm, address); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -680,7 +682,8 @@ static int try_to_unmap_one(struct page * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young(vma, address, pte) | + mmu_notifier_age_page(mm, address)))) { ret = SWAP_FAIL; goto out_unmap; } @@ -688,6 +691,7 @@ static int try_to_unmap_one(struct page /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); pteval = ptep_clear_flush(vma, address, pte); + mmu_notifier(invalidate_page, mm, address); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -812,12 +816,14 @@ static void try_to_unmap_cluster(unsigne page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte) | + mmu_notifier_age_page(mm, address)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); pteval = ptep_clear_flush(vma, address, pte); + mmu_notifier(invalidate_page, mm, address); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch 5/6] mmu_notifier: Callbacks for xip_filemap.c 2008-01-30 2:29 [patch 0/6] [RFC] MMU Notifiers V3 Christoph Lameter ` (3 preceding siblings ...) 2008-01-30 2:29 ` [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps Christoph Lameter @ 2008-01-30 2:29 ` Christoph Lameter 2008-01-30 2:29 ` [patch 6/6] mmu_notifier: Add invalidate_all() Christoph Lameter 5 siblings, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 2:29 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins [-- Attachment #1: mmu_xip --] [-- Type: text/plain, Size: 1493 bytes --] Problem for external rmaps: There is no pagelock held on the page. Signed-off-by: Robin Holt <holt-sJ/iWh9BUns@public.gmane.org> --- mm/filemap_xip.c | 5 +++++ 1 file changed, 5 insertions(+) Index: linux-2.6/mm/filemap_xip.c =================================================================== --- linux-2.6.orig/mm/filemap_xip.c 2008-01-25 19:39:04.000000000 -0800 +++ linux-2.6/mm/filemap_xip.c 2008-01-25 19:39:06.000000000 -0800 @@ -13,6 +13,7 @@ #include <linux/module.h> #include <linux/uio.h> #include <linux/rmap.h> +#include <linux/mmu_notifier.h> #include <linux/sched.h> #include <asm/tlbflush.h> @@ -183,6 +184,9 @@ __xip_unmap (struct address_space * mapp if (!page) return; + if (PageExternalRmap(page)) + mmu_rmap_notifier(invalidate_page, page); + spin_lock(&mapping->i_mmap_lock); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { mm = vma->vm_mm; @@ -194,6 +198,7 @@ __xip_unmap (struct address_space * mapp /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); pteval = ptep_clear_flush(vma, address, pte); + mmu_notifier(invalidate_page, mm, address); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch 6/6] mmu_notifier: Add invalidate_all() 2008-01-30 2:29 [patch 0/6] [RFC] MMU Notifiers V3 Christoph Lameter ` (4 preceding siblings ...) 2008-01-30 2:29 ` [patch 5/6] mmu_notifier: Callbacks for xip_filemap.c Christoph Lameter @ 2008-01-30 2:29 ` Christoph Lameter 5 siblings, 0 replies; 24+ messages in thread From: Christoph Lameter @ 2008-01-30 2:29 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins [-- Attachment #1: mmu_all --] [-- Type: text/plain, Size: 1793 bytes --] when a task exits we can remove all external pts at once. At that point the extern mmu may also unregister itself from the mmu notifier chain to avoid future calls. Note the complications because of RCU. Other processors may not see that the notifier was unlinked until a quiescent period has passed! Signed-off-by: Christoph Lameter <clameter-sJ/iWh9BUns@public.gmane.org> --- include/linux/mmu_notifier.h | 4 ++++ mm/mmap.c | 1 + 2 files changed, 5 insertions(+) Index: linux-2.6/include/linux/mmu_notifier.h =================================================================== --- linux-2.6.orig/include/linux/mmu_notifier.h 2008-01-28 14:02:18.000000000 -0800 +++ linux-2.6/include/linux/mmu_notifier.h 2008-01-28 14:15:49.000000000 -0800 @@ -62,6 +62,10 @@ struct mmu_notifier_ops { struct mm_struct *mm, unsigned long address); + /* Dummy needed because the mmu_notifier() macro requires it */ + void (*invalidate_all)(struct mmu_notifier *mn, struct mm_struct *mm, + int dummy); + /* * lock indicates that the function is called under spinlock. */ Index: linux-2.6/mm/mmap.c =================================================================== --- linux-2.6.orig/mm/mmap.c 2008-01-28 14:15:49.000000000 -0800 +++ linux-2.6/mm/mmap.c 2008-01-28 14:15:49.000000000 -0800 @@ -2034,6 +2034,7 @@ void exit_mmap(struct mm_struct *mm) unsigned long end; /* mm's last user has gone, and its about to be pulled down */ + mmu_notifier(invalidate_all, mm, 0); arch_exit_mmap(mm); lru_add_drain(); -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch 0/6] [RFC] MMU Notifiers V2 @ 2008-01-28 20:28 Christoph Lameter 2008-01-28 20:28 ` [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap Christoph Lameter 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2008-01-28 20:28 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins This is a patchset implementing MMU notifier callbacks based on Andrea's earlier work. These are needed if Linux pages are referenced from something else than tracked by the rmaps of the kernel. Issues: - Feedback from uses of the callbacks for KVM, RDMA, XPmem and GRU - RCU quiescent periods are required on registering and unregistering notifiers to guarantee visibility to other processors. Currently only mmu_notifier_release() does the correct thing. It is up to the user to provide RCU quiescent periods for register/unregister functions if they are called outside of the ->release method. Andrea's mmu_notifier #4 -> RFC V1 - Merge subsystem rmap based with Linux rmap based approach - Move Linux rmap based notifiers out of macro - Try to account for what locks are held while the notifiers are called. - Develop a patch sequence that separates out the different types of hooks so that we can review their use. - Avoid adding include to linux/mm_types.h - Integrate RCU logic suggested by Peter. V1->V2: - Improve RCU support - Use mmap_sem for mmu_notifier register / unregister - Drop invalidate_page from COW, mm/fremap.c and mm/rmap.c since we already have invalidate_range() callbacks there. - Clean compile for !MMU_NOTIFIER - Isolate filemap_xip strangeness into its own diff - Pass a the flag to invalidate_range to indicate if a spinlock is held. - Add invalidate_all() -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap 2008-01-28 20:28 [patch 0/6] [RFC] MMU Notifiers V2 Christoph Lameter @ 2008-01-28 20:28 ` Christoph Lameter 2008-01-29 16:28 ` Robin Holt 0 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2008-01-28 20:28 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nick Piggin, Peter Zijlstra, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Benjamin Herrenschmidt, steiner-sJ/iWh9BUns, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, daniel.blueman-xqY44rlHlBpWk0Htik3J/w, Robin Holt, Hugh Dickins [-- Attachment #1: mmu_invalidate_page_rmap_callbacks --] [-- Type: text/plain, Size: 1667 bytes --] Callbacks to remove individual pages if the subsystem has an rmap capability. The pagelock is held but no spinlocks are held. The refcount of the page is elevated so that dropping the refcount in the subsystem will not directly free the page. The callbacks occur after the Linux rmaps have been walked. Signed-off-by: Christoph Lameter <clameter-sJ/iWh9BUns@public.gmane.org> --- mm/rmap.c | 6 ++++++ 1 file changed, 6 insertions(+) Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-01-25 14:24:19.000000000 -0800 +++ linux-2.6/mm/rmap.c 2008-01-25 14:24:38.000000000 -0800 @@ -49,6 +49,7 @@ #include <linux/rcupdate.h> #include <linux/module.h> #include <linux/kallsyms.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> @@ -473,6 +474,8 @@ int page_mkclean(struct page *page) struct address_space *mapping = page_mapping(page); if (mapping) { ret = page_mkclean_file(mapping, page); + if (unlikely(PageExternalRmap(page))) + mmu_rmap_notifier(invalidate_page, page); if (page_test_dirty(page)) { page_clear_dirty(page); ret = 1; @@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int else ret = try_to_unmap_file(page, migration); + if (unlikely(PageExternalRmap(page))) + mmu_rmap_notifier(invalidate_page, page); + if (!page_mapped(page)) ret = SWAP_SUCCESS; return ret; -- ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap 2008-01-28 20:28 ` [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap Christoph Lameter @ 2008-01-29 16:28 ` Robin Holt 0 siblings, 0 replies; 24+ messages in thread From: Robin Holt @ 2008-01-29 16:28 UTC (permalink / raw) To: Christoph Lameter Cc: Andrea Arcangeli, Robin Holt, Avi Kivity, Izik Eidus, Nick Piggin, kvm-devel, Benjamin Herrenschmidt, Peter Zijlstra, steiner, linux-kernel, linux-mm, daniel.blueman, Hugh Dickins I don't understand how this is intended to work. I think the page flag needs to be maintained by the mmu_notifier subsystem. Let's assume we have a mapping that has a grant from xpmem and an additional grant from kvm. The exporters are not important, the fact that there may be two is. Assume that the user revokes the grant from xpmem (we call that xpmem_remove). As far as xpmem is concerned, there are no longer any exports of that page so the page should no longer have its exported flag set. Note: This is not a process exit, but a function of xpmem. In that case, at the remove time, we have no idea whether the flag should be cleared. For the invalidate_page side, I think we should have: > @@ -473,6 +474,10 @@ int page_mkclean(struct page *page) > struct address_space *mapping = page_mapping(page); > if (mapping) { > ret = page_mkclean_file(mapping, page); > + if (unlikely(PageExternalRmap(page))) { > + mmu_rmap_notifier(invalidate_page, page); > + ClearPageExternalRmap(page); > + } > if (page_test_dirty(page)) { > page_clear_dirty(page); > ret = 1; I would assume we would then want a function which sets the page flag. Additionally, I would think we would want some intervention in the freeing of the page side to ensure the page flag is cleared as well. Thanks, Robin ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2008-01-31 1:27 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-30 2:29 [patch 0/6] [RFC] MMU Notifiers V3 Christoph Lameter
2008-01-30 2:29 ` [patch 1/6] mmu_notifier: Core code Christoph Lameter
[not found] ` <20080130022944.236370194-sJ/iWh9BUns@public.gmane.org>
2008-01-30 15:37 ` Andrea Arcangeli
[not found] ` <20080130153749.GN7233-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2008-01-30 15:53 ` Jack Steiner
[not found] ` <20080130155306.GA13746-sJ/iWh9BUns@public.gmane.org>
2008-01-30 16:38 ` Andrea Arcangeli
2008-01-30 19:19 ` Christoph Lameter
[not found] ` <Pine.LNX.4.64.0801301116510.27491-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org>
2008-01-30 22:20 ` Robin Holt
[not found] ` <20080130222035.GX26420-sJ/iWh9BUns@public.gmane.org>
2008-01-30 23:38 ` Andrea Arcangeli
2008-01-30 23:55 ` Christoph Lameter
[not found] ` <Pine.LNX.4.64.0801301552210.1722-RYO/mD75kfhx2SFC9UQUAuF7EQX82lMiAL8bYrjMMd8@public.gmane.org>
2008-01-31 0:12 ` Andrea Arcangeli
[not found] ` <20080131001258.GD7185-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2008-01-31 1:27 ` Christoph Lameter
2008-01-30 17:10 ` Peter Zijlstra
2008-01-30 19:28 ` Christoph Lameter
2008-01-30 18:02 ` Robin Holt
[not found] ` <20080130180207.GU26420-sJ/iWh9BUns@public.gmane.org>
2008-01-30 19:08 ` Christoph Lameter
2008-01-30 19:14 ` Christoph Lameter
2008-01-30 2:29 ` [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Christoph Lameter
2008-01-30 2:29 ` [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap Christoph Lameter
[not found] ` <20080130022944.699753910-sJ/iWh9BUns@public.gmane.org>
2008-01-30 18:03 ` Robin Holt
2008-01-30 2:29 ` [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps Christoph Lameter
2008-01-30 2:29 ` [patch 5/6] mmu_notifier: Callbacks for xip_filemap.c Christoph Lameter
2008-01-30 2:29 ` [patch 6/6] mmu_notifier: Add invalidate_all() Christoph Lameter
-- strict thread matches above, loose matches on Subject: below --
2008-01-28 20:28 [patch 0/6] [RFC] MMU Notifiers V2 Christoph Lameter
2008-01-28 20:28 ` [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap Christoph Lameter
2008-01-29 16:28 ` Robin Holt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox