From: Andrea Arcangeli <andrea@qumranet.com>
To: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>,
Christoph Lameter <clameter@sgi.com>,
Nick Piggin <npiggin@suse.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
kvm-devel@lists.sourceforge.net,
Kanoj Sarcar <kanojsarcar@yahoo.com>,
Roland Dreier <rdreier@cisco.com>,
Steve Wise <swise@opengridcomputing.com>,
linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
linux-mm@kvack.org, general@lists.openfabrics.org,
Hugh Dickins <hugh@veritas.com>,
akpm@linux-foundation.org, Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [PATCH 01 of 12] Core of mmu notifiers
Date: Thu, 24 Apr 2008 19:41:45 +0200 [thread overview]
Message-ID: <20080424174145.GM24536@duo.random> (raw)
In-Reply-To: <20080424153943.GJ24536@duo.random>
On Thu, Apr 24, 2008 at 05:39:43PM +0200, Andrea Arcangeli wrote:
> There's at least one small issue I noticed so far, that while _release
> don't need to care about _register, but _unregister definitely need to
> care about _register. I've to take the mmap_sem in addition or in
In the end the best is to use the spinlock around those
list_add/list_del they all run in O(1) with the hlist and they take a
few asm insn. This also avoids to take the mmap_sem in exit_mmap, at
exit_mmap time nobody should need to use mmap_sem anymore, it might
work but this looks cleaner. The lock is dynamically allocated only
when the notifiers are registered, so the few bytes taken by it aren't
relevant.
A full new update will some become visible here:
http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/
Please have a close look again. Your help is extremely appreciated and
very helpful as usual! Thanks a lot.
diff -urN xxx/include/linux/mmu_notifier.h xx/include/linux/mmu_notifier.h
--- xxx/include/linux/mmu_notifier.h 2008-04-24 19:41:15.000000000 +0200
+++ xx/include/linux/mmu_notifier.h 2008-04-24 19:38:37.000000000 +0200
@@ -15,7 +15,7 @@
struct hlist_head list;
struct srcu_struct srcu;
/* to serialize mmu_notifier_unregister against mmu_notifier_release */
- spinlock_t unregister_lock;
+ spinlock_t lock;
};
struct mmu_notifier_ops {
diff -urN xxx/mm/memory.c xx/mm/memory.c
--- xxx/mm/memory.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/memory.c 2008-04-24 19:38:37.000000000 +0200
@@ -605,16 +605,13 @@
* readonly mappings. The tradeoff is that copy_page_range is more
* efficient than faulting.
*/
- ret = 0;
if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) {
if (!vma->anon_vma)
- goto out;
+ return 0;
}
- if (unlikely(is_vm_hugetlb_page(vma))) {
- ret = copy_hugetlb_page_range(dst_mm, src_mm, vma);
- goto out;
- }
+ if (is_vm_hugetlb_page(vma))
+ return copy_hugetlb_page_range(dst_mm, src_mm, vma);
if (is_cow_mapping(vma->vm_flags))
mmu_notifier_invalidate_range_start(src_mm, addr, end);
@@ -636,7 +633,6 @@
if (is_cow_mapping(vma->vm_flags))
mmu_notifier_invalidate_range_end(src_mm,
vma->vm_start, end);
-out:
return ret;
}
diff -urN xxx/mm/mmap.c xx/mm/mmap.c
--- xxx/mm/mmap.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/mmap.c 2008-04-24 19:38:37.000000000 +0200
@@ -2381,7 +2381,7 @@
if (data->nr_anon_vma_locks)
mm_unlock_vfree(data->anon_vma_locks,
data->nr_anon_vma_locks);
- if (data->i_mmap_locks)
+ if (data->nr_i_mmap_locks)
mm_unlock_vfree(data->i_mmap_locks,
data->nr_i_mmap_locks);
}
diff -urN xxx/mm/mmu_notifier.c xx/mm/mmu_notifier.c
--- xxx/mm/mmu_notifier.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/mmu_notifier.c 2008-04-24 19:31:23.000000000 +0200
@@ -24,22 +24,16 @@
* zero). All other tasks of this mm already quit so they can't invoke
* mmu notifiers anymore. This can run concurrently only against
* mmu_notifier_unregister and it serializes against it with the
- * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go
- * away from under us as the exit_mmap holds a mm_count pin itself.
- *
- * The ->release method can't allow the module to be unloaded, the
- * module can only be unloaded after mmu_notifier_unregister run. This
- * is because the release method has to run the ret instruction to
- * return back here, and so it can't allow the ret instruction to be
- * freed.
+ * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm
+ * can't go away from under us as exit_mmap holds a mm_count pin
+ * itself.
*/
void __mmu_notifier_release(struct mm_struct *mm)
{
struct mmu_notifier *mn;
int srcu;
- srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock(&mm->mmu_notifier_mm->lock);
while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
mn = hlist_entry(mm->mmu_notifier_mm->list.first,
struct mmu_notifier,
@@ -52,23 +46,28 @@
*/
hlist_del_init(&mn->hlist);
/*
+ * SRCU here will block mmu_notifier_unregister until
+ * ->release returns.
+ */
+ srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
+ /*
* if ->release runs before mmu_notifier_unregister it
* must be handled as it's the only way for the driver
- * to flush all existing sptes before the pages in the
- * mm are freed.
+ * to flush all existing sptes and stop the driver
+ * from establishing any more sptes before all the
+ * pages in the mm are freed.
*/
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
- /* SRCU will block mmu_notifier_unregister */
mn->ops->release(mn, mm);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ spin_lock(&mm->mmu_notifier_mm->lock);
}
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
- srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
/*
- * Wait ->release if mmu_notifier_unregister run list_del_rcu.
- * srcu can't go away from under us because one mm_count is
- * hold by exit_mmap.
+ * Wait ->release if mmu_notifier_unregister is running it.
+ * The mmu_notifier_mm can't go away from under us because one
+ * mm_count is hold by exit_mmap.
*/
synchronize_srcu(&mm->mmu_notifier_mm->srcu);
}
@@ -177,11 +176,19 @@
goto out_unlock;
}
INIT_HLIST_HEAD(&mm->mmu_notifier_mm->list);
- spin_lock_init(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock_init(&mm->mmu_notifier_mm->lock);
}
atomic_inc(&mm->mm_count);
+ /*
+ * Serialize the update against mmu_notifier_unregister. A
+ * side note: mmu_notifier_release can't run concurrently with
+ * us because we hold the mm_users pin (either implicitly as
+ * current->mm or explicitly with get_task_mm() or similar).
+ */
+ spin_lock(&mm->mmu_notifier_mm->lock);
hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
out_unlock:
mm_unlock(mm, &data);
out:
@@ -215,23 +222,32 @@
BUG_ON(atomic_read(&mm->mm_count) <= 0);
- srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock(&mm->mmu_notifier_mm->lock);
if (!hlist_unhashed(&mn->hlist)) {
hlist_del_rcu(&mn->hlist);
before_release = 1;
}
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
if (before_release)
/*
+ * SRCU here will force exit_mmap to wait ->release to finish
+ * before freeing the pages.
+ */
+ srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
+ if (before_release) {
+ /*
* exit_mmap will block in mmu_notifier_release to
* guarantee ->release is called before freeing the
* pages.
*/
mn->ops->release(mn, mm);
- srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ }
- /* wait any running method to finish, including ->release */
+ /*
+ * Wait any running method to finish, of course including
+ * ->release if it was run by mmu_notifier_relase instead of us.
+ */
synchronize_srcu(&mm->mmu_notifier_mm->srcu);
BUG_ON(atomic_read(&mm->mm_count) <= 0);
WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <andrea@qumranet.com>
To: Robin Holt <holt@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>,
Rusty Russell <rusty@rustcorp.com.au>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
kvm-devel@lists.sourceforge.net,
Kanoj Sarcar <kanojsarcar@yahoo.com>,
Roland Dreier <rdreier@cisco.com>, Jack Steiner <steiner@sgi.com>,
linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
linux-mm@kvack.org, general@lists.openfabrics.org,
Hugh Dickins <hugh@veritas.com>,
akpm@linux-foundation.org, Christoph Lameter <clameter@sgi.com>
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
Date: Thu, 24 Apr 2008 19:41:45 +0200 [thread overview]
Message-ID: <20080424174145.GM24536@duo.random> (raw)
In-Reply-To: <20080424153943.GJ24536@duo.random>
On Thu, Apr 24, 2008 at 05:39:43PM +0200, Andrea Arcangeli wrote:
> There's at least one small issue I noticed so far, that while _release
> don't need to care about _register, but _unregister definitely need to
> care about _register. I've to take the mmap_sem in addition or in
In the end the best is to use the spinlock around those
list_add/list_del they all run in O(1) with the hlist and they take a
few asm insn. This also avoids to take the mmap_sem in exit_mmap, at
exit_mmap time nobody should need to use mmap_sem anymore, it might
work but this looks cleaner. The lock is dynamically allocated only
when the notifiers are registered, so the few bytes taken by it aren't
relevant.
A full new update will some become visible here:
http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/
Please have a close look again. Your help is extremely appreciated and
very helpful as usual! Thanks a lot.
diff -urN xxx/include/linux/mmu_notifier.h xx/include/linux/mmu_notifier.h
--- xxx/include/linux/mmu_notifier.h 2008-04-24 19:41:15.000000000 +0200
+++ xx/include/linux/mmu_notifier.h 2008-04-24 19:38:37.000000000 +0200
@@ -15,7 +15,7 @@
struct hlist_head list;
struct srcu_struct srcu;
/* to serialize mmu_notifier_unregister against mmu_notifier_release */
- spinlock_t unregister_lock;
+ spinlock_t lock;
};
struct mmu_notifier_ops {
diff -urN xxx/mm/memory.c xx/mm/memory.c
--- xxx/mm/memory.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/memory.c 2008-04-24 19:38:37.000000000 +0200
@@ -605,16 +605,13 @@
* readonly mappings. The tradeoff is that copy_page_range is more
* efficient than faulting.
*/
- ret = 0;
if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) {
if (!vma->anon_vma)
- goto out;
+ return 0;
}
- if (unlikely(is_vm_hugetlb_page(vma))) {
- ret = copy_hugetlb_page_range(dst_mm, src_mm, vma);
- goto out;
- }
+ if (is_vm_hugetlb_page(vma))
+ return copy_hugetlb_page_range(dst_mm, src_mm, vma);
if (is_cow_mapping(vma->vm_flags))
mmu_notifier_invalidate_range_start(src_mm, addr, end);
@@ -636,7 +633,6 @@
if (is_cow_mapping(vma->vm_flags))
mmu_notifier_invalidate_range_end(src_mm,
vma->vm_start, end);
-out:
return ret;
}
diff -urN xxx/mm/mmap.c xx/mm/mmap.c
--- xxx/mm/mmap.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/mmap.c 2008-04-24 19:38:37.000000000 +0200
@@ -2381,7 +2381,7 @@
if (data->nr_anon_vma_locks)
mm_unlock_vfree(data->anon_vma_locks,
data->nr_anon_vma_locks);
- if (data->i_mmap_locks)
+ if (data->nr_i_mmap_locks)
mm_unlock_vfree(data->i_mmap_locks,
data->nr_i_mmap_locks);
}
diff -urN xxx/mm/mmu_notifier.c xx/mm/mmu_notifier.c
--- xxx/mm/mmu_notifier.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/mmu_notifier.c 2008-04-24 19:31:23.000000000 +0200
@@ -24,22 +24,16 @@
* zero). All other tasks of this mm already quit so they can't invoke
* mmu notifiers anymore. This can run concurrently only against
* mmu_notifier_unregister and it serializes against it with the
- * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go
- * away from under us as the exit_mmap holds a mm_count pin itself.
- *
- * The ->release method can't allow the module to be unloaded, the
- * module can only be unloaded after mmu_notifier_unregister run. This
- * is because the release method has to run the ret instruction to
- * return back here, and so it can't allow the ret instruction to be
- * freed.
+ * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm
+ * can't go away from under us as exit_mmap holds a mm_count pin
+ * itself.
*/
void __mmu_notifier_release(struct mm_struct *mm)
{
struct mmu_notifier *mn;
int srcu;
- srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock(&mm->mmu_notifier_mm->lock);
while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
mn = hlist_entry(mm->mmu_notifier_mm->list.first,
struct mmu_notifier,
@@ -52,23 +46,28 @@
*/
hlist_del_init(&mn->hlist);
/*
+ * SRCU here will block mmu_notifier_unregister until
+ * ->release returns.
+ */
+ srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
+ /*
* if ->release runs before mmu_notifier_unregister it
* must be handled as it's the only way for the driver
- * to flush all existing sptes before the pages in the
- * mm are freed.
+ * to flush all existing sptes and stop the driver
+ * from establishing any more sptes before all the
+ * pages in the mm are freed.
*/
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
- /* SRCU will block mmu_notifier_unregister */
mn->ops->release(mn, mm);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ spin_lock(&mm->mmu_notifier_mm->lock);
}
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
- srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
/*
- * Wait ->release if mmu_notifier_unregister run list_del_rcu.
- * srcu can't go away from under us because one mm_count is
- * hold by exit_mmap.
+ * Wait ->release if mmu_notifier_unregister is running it.
+ * The mmu_notifier_mm can't go away from under us because one
+ * mm_count is hold by exit_mmap.
*/
synchronize_srcu(&mm->mmu_notifier_mm->srcu);
}
@@ -177,11 +176,19 @@
goto out_unlock;
}
INIT_HLIST_HEAD(&mm->mmu_notifier_mm->list);
- spin_lock_init(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock_init(&mm->mmu_notifier_mm->lock);
}
atomic_inc(&mm->mm_count);
+ /*
+ * Serialize the update against mmu_notifier_unregister. A
+ * side note: mmu_notifier_release can't run concurrently with
+ * us because we hold the mm_users pin (either implicitly as
+ * current->mm or explicitly with get_task_mm() or similar).
+ */
+ spin_lock(&mm->mmu_notifier_mm->lock);
hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
out_unlock:
mm_unlock(mm, &data);
out:
@@ -215,23 +222,32 @@
BUG_ON(atomic_read(&mm->mm_count) <= 0);
- srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock(&mm->mmu_notifier_mm->lock);
if (!hlist_unhashed(&mn->hlist)) {
hlist_del_rcu(&mn->hlist);
before_release = 1;
}
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
if (before_release)
/*
+ * SRCU here will force exit_mmap to wait ->release to finish
+ * before freeing the pages.
+ */
+ srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
+ if (before_release) {
+ /*
* exit_mmap will block in mmu_notifier_release to
* guarantee ->release is called before freeing the
* pages.
*/
mn->ops->release(mn, mm);
- srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ }
- /* wait any running method to finish, including ->release */
+ /*
+ * Wait any running method to finish, of course including
+ * ->release if it was run by mmu_notifier_relase instead of us.
+ */
synchronize_srcu(&mm->mmu_notifier_mm->srcu);
BUG_ON(atomic_read(&mm->mm_count) <= 0);
WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <andrea@qumranet.com>
To: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>,
Christoph Lameter <clameter@sgi.com>,
Nick Piggin <npiggin@suse.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
kvm-devel@lists.sourceforge.net,
Kanoj Sarcar <kanojsarcar@yahoo.com>,
Roland Dreier <rdreier@cisco.com>,
Steve Wise <swise@opengridcomputing.com>,
linux-kernel@vger.kernel.org, Avi Kivity <avi@qumranet.com>,
linux-mm@kvack.org, general@lists.openfabrics.org,
Hugh Dickins <hugh@veritas.com>,
akpm@linux-foundation.org, Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [PATCH 01 of 12] Core of mmu notifiers
Date: Thu, 24 Apr 2008 19:41:45 +0200 [thread overview]
Message-ID: <20080424174145.GM24536@duo.random> (raw)
In-Reply-To: <20080424153943.GJ24536@duo.random>
On Thu, Apr 24, 2008 at 05:39:43PM +0200, Andrea Arcangeli wrote:
> There's at least one small issue I noticed so far, that while _release
> don't need to care about _register, but _unregister definitely need to
> care about _register. I've to take the mmap_sem in addition or in
In the end the best is to use the spinlock around those
list_add/list_del they all run in O(1) with the hlist and they take a
few asm insn. This also avoids to take the mmap_sem in exit_mmap, at
exit_mmap time nobody should need to use mmap_sem anymore, it might
work but this looks cleaner. The lock is dynamically allocated only
when the notifiers are registered, so the few bytes taken by it aren't
relevant.
A full new update will some become visible here:
http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/
Please have a close look again. Your help is extremely appreciated and
very helpful as usual! Thanks a lot.
diff -urN xxx/include/linux/mmu_notifier.h xx/include/linux/mmu_notifier.h
--- xxx/include/linux/mmu_notifier.h 2008-04-24 19:41:15.000000000 +0200
+++ xx/include/linux/mmu_notifier.h 2008-04-24 19:38:37.000000000 +0200
@@ -15,7 +15,7 @@
struct hlist_head list;
struct srcu_struct srcu;
/* to serialize mmu_notifier_unregister against mmu_notifier_release */
- spinlock_t unregister_lock;
+ spinlock_t lock;
};
struct mmu_notifier_ops {
diff -urN xxx/mm/memory.c xx/mm/memory.c
--- xxx/mm/memory.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/memory.c 2008-04-24 19:38:37.000000000 +0200
@@ -605,16 +605,13 @@
* readonly mappings. The tradeoff is that copy_page_range is more
* efficient than faulting.
*/
- ret = 0;
if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) {
if (!vma->anon_vma)
- goto out;
+ return 0;
}
- if (unlikely(is_vm_hugetlb_page(vma))) {
- ret = copy_hugetlb_page_range(dst_mm, src_mm, vma);
- goto out;
- }
+ if (is_vm_hugetlb_page(vma))
+ return copy_hugetlb_page_range(dst_mm, src_mm, vma);
if (is_cow_mapping(vma->vm_flags))
mmu_notifier_invalidate_range_start(src_mm, addr, end);
@@ -636,7 +633,6 @@
if (is_cow_mapping(vma->vm_flags))
mmu_notifier_invalidate_range_end(src_mm,
vma->vm_start, end);
-out:
return ret;
}
diff -urN xxx/mm/mmap.c xx/mm/mmap.c
--- xxx/mm/mmap.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/mmap.c 2008-04-24 19:38:37.000000000 +0200
@@ -2381,7 +2381,7 @@
if (data->nr_anon_vma_locks)
mm_unlock_vfree(data->anon_vma_locks,
data->nr_anon_vma_locks);
- if (data->i_mmap_locks)
+ if (data->nr_i_mmap_locks)
mm_unlock_vfree(data->i_mmap_locks,
data->nr_i_mmap_locks);
}
diff -urN xxx/mm/mmu_notifier.c xx/mm/mmu_notifier.c
--- xxx/mm/mmu_notifier.c 2008-04-24 19:41:15.000000000 +0200
+++ xx/mm/mmu_notifier.c 2008-04-24 19:31:23.000000000 +0200
@@ -24,22 +24,16 @@
* zero). All other tasks of this mm already quit so they can't invoke
* mmu notifiers anymore. This can run concurrently only against
* mmu_notifier_unregister and it serializes against it with the
- * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go
- * away from under us as the exit_mmap holds a mm_count pin itself.
- *
- * The ->release method can't allow the module to be unloaded, the
- * module can only be unloaded after mmu_notifier_unregister run. This
- * is because the release method has to run the ret instruction to
- * return back here, and so it can't allow the ret instruction to be
- * freed.
+ * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm
+ * can't go away from under us as exit_mmap holds a mm_count pin
+ * itself.
*/
void __mmu_notifier_release(struct mm_struct *mm)
{
struct mmu_notifier *mn;
int srcu;
- srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock(&mm->mmu_notifier_mm->lock);
while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
mn = hlist_entry(mm->mmu_notifier_mm->list.first,
struct mmu_notifier,
@@ -52,23 +46,28 @@
*/
hlist_del_init(&mn->hlist);
/*
+ * SRCU here will block mmu_notifier_unregister until
+ * ->release returns.
+ */
+ srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
+ /*
* if ->release runs before mmu_notifier_unregister it
* must be handled as it's the only way for the driver
- * to flush all existing sptes before the pages in the
- * mm are freed.
+ * to flush all existing sptes and stop the driver
+ * from establishing any more sptes before all the
+ * pages in the mm are freed.
*/
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
- /* SRCU will block mmu_notifier_unregister */
mn->ops->release(mn, mm);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ spin_lock(&mm->mmu_notifier_mm->lock);
}
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
- srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
/*
- * Wait ->release if mmu_notifier_unregister run list_del_rcu.
- * srcu can't go away from under us because one mm_count is
- * hold by exit_mmap.
+ * Wait ->release if mmu_notifier_unregister is running it.
+ * The mmu_notifier_mm can't go away from under us because one
+ * mm_count is hold by exit_mmap.
*/
synchronize_srcu(&mm->mmu_notifier_mm->srcu);
}
@@ -177,11 +176,19 @@
goto out_unlock;
}
INIT_HLIST_HEAD(&mm->mmu_notifier_mm->list);
- spin_lock_init(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock_init(&mm->mmu_notifier_mm->lock);
}
atomic_inc(&mm->mm_count);
+ /*
+ * Serialize the update against mmu_notifier_unregister. A
+ * side note: mmu_notifier_release can't run concurrently with
+ * us because we hold the mm_users pin (either implicitly as
+ * current->mm or explicitly with get_task_mm() or similar).
+ */
+ spin_lock(&mm->mmu_notifier_mm->lock);
hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
out_unlock:
mm_unlock(mm, &data);
out:
@@ -215,23 +222,32 @@
BUG_ON(atomic_read(&mm->mm_count) <= 0);
- srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
- spin_lock(&mm->mmu_notifier_mm->unregister_lock);
+ spin_lock(&mm->mmu_notifier_mm->lock);
if (!hlist_unhashed(&mn->hlist)) {
hlist_del_rcu(&mn->hlist);
before_release = 1;
}
- spin_unlock(&mm->mmu_notifier_mm->unregister_lock);
if (before_release)
/*
+ * SRCU here will force exit_mmap to wait ->release to finish
+ * before freeing the pages.
+ */
+ srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu);
+ spin_unlock(&mm->mmu_notifier_mm->lock);
+ if (before_release) {
+ /*
* exit_mmap will block in mmu_notifier_release to
* guarantee ->release is called before freeing the
* pages.
*/
mn->ops->release(mn, mm);
- srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu);
+ }
- /* wait any running method to finish, including ->release */
+ /*
+ * Wait any running method to finish, of course including
+ * ->release if it was run by mmu_notifier_relase instead of us.
+ */
synchronize_srcu(&mm->mmu_notifier_mm->srcu);
BUG_ON(atomic_read(&mm->mm_count) <= 0);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-04-24 17:41 UTC|newest]
Thread overview: 249+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-22 13:51 [PATCH 00 of 12] mmu notifier #v13 Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` [ofa-general] " Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 01 of 12] Core of mmu notifiers Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 14:56 ` Eric Dumazet
2008-04-22 14:56 ` Eric Dumazet
2008-04-22 15:15 ` Andrea Arcangeli
2008-04-22 15:15 ` Andrea Arcangeli
2008-04-22 15:15 ` [ofa-general] " Andrea Arcangeli
2008-04-22 15:24 ` Avi Kivity
2008-04-22 15:24 ` Avi Kivity
2008-04-22 15:37 ` Eric Dumazet
2008-04-22 15:37 ` Eric Dumazet
2008-04-22 16:46 ` Andrea Arcangeli
2008-04-22 16:46 ` Andrea Arcangeli
2008-04-22 16:46 ` [ofa-general] " Andrea Arcangeli
2008-04-22 20:19 ` Christoph Lameter
2008-04-22 20:19 ` Christoph Lameter
2008-04-22 20:19 ` [ofa-general] " Christoph Lameter
2008-04-22 20:31 ` Robin Holt
2008-04-22 20:31 ` Robin Holt
2008-04-22 20:31 ` [ofa-general] " Robin Holt
2008-04-22 22:35 ` Andrea Arcangeli
2008-04-22 22:35 ` Andrea Arcangeli
2008-04-22 22:35 ` Andrea Arcangeli
2008-04-22 23:07 ` Robin Holt
2008-04-22 23:07 ` Robin Holt
2008-04-22 23:07 ` Robin Holt
2008-04-23 0:28 ` Jack Steiner
2008-04-23 0:28 ` Jack Steiner
2008-04-23 0:28 ` [ofa-general] " Jack Steiner
2008-04-23 16:37 ` Andrea Arcangeli
2008-04-23 16:37 ` Andrea Arcangeli
2008-04-23 16:37 ` [ofa-general] " Andrea Arcangeli
2008-04-23 18:19 ` Christoph Lameter
2008-04-23 18:19 ` Christoph Lameter
2008-04-23 18:19 ` [ofa-general] " Christoph Lameter
2008-04-23 18:25 ` Andrea Arcangeli
2008-04-23 18:25 ` Andrea Arcangeli
2008-04-23 18:25 ` [ofa-general] " Andrea Arcangeli
2008-04-23 22:19 ` Andrea Arcangeli
2008-04-23 22:19 ` Andrea Arcangeli
2008-04-23 22:19 ` [ofa-general] " Andrea Arcangeli
2008-04-24 6:49 ` Andrea Arcangeli
2008-04-24 6:49 ` Andrea Arcangeli
2008-04-24 6:49 ` Andrea Arcangeli
2008-04-24 9:51 ` Robin Holt
2008-04-24 9:51 ` Robin Holt
2008-04-24 9:51 ` [ofa-general] " Robin Holt
2008-04-24 15:39 ` Andrea Arcangeli
2008-04-24 15:39 ` Andrea Arcangeli
2008-04-24 15:39 ` [ofa-general] " Andrea Arcangeli
2008-04-24 17:41 ` Andrea Arcangeli [this message]
2008-04-24 17:41 ` Andrea Arcangeli
2008-04-24 17:41 ` [ofa-general] " Andrea Arcangeli
2008-04-26 13:17 ` Robin Holt
2008-04-26 13:17 ` Robin Holt
2008-04-26 13:17 ` Robin Holt
2008-04-26 14:04 ` Andrea Arcangeli
2008-04-26 14:04 ` Andrea Arcangeli
2008-04-26 14:04 ` Andrea Arcangeli
2008-04-27 12:27 ` Andrea Arcangeli
2008-04-27 12:27 ` Andrea Arcangeli
2008-04-27 12:27 ` Andrea Arcangeli
2008-04-28 20:34 ` Christoph Lameter
2008-04-28 20:34 ` Christoph Lameter
2008-04-28 20:34 ` [ofa-general] " Christoph Lameter
2008-04-29 0:10 ` Andrea Arcangeli
2008-04-29 0:10 ` Andrea Arcangeli
2008-04-29 0:10 ` Andrea Arcangeli
2008-04-29 1:28 ` Christoph Lameter
2008-04-29 1:28 ` Christoph Lameter
2008-04-29 1:28 ` Christoph Lameter
2008-04-29 15:30 ` Andrea Arcangeli
2008-04-29 15:30 ` Andrea Arcangeli
2008-04-29 15:30 ` [ofa-general] " Andrea Arcangeli
2008-04-29 15:50 ` Robin Holt
2008-04-29 15:50 ` Robin Holt
2008-04-29 15:50 ` [ofa-general] " Robin Holt
2008-04-29 16:03 ` Andrea Arcangeli
2008-04-29 16:03 ` Andrea Arcangeli
2008-04-29 16:03 ` [ofa-general] " Andrea Arcangeli
2008-05-07 15:00 ` Andrea Arcangeli
2008-05-07 15:00 ` Andrea Arcangeli
2008-05-07 15:00 ` [ofa-general] " Andrea Arcangeli
2008-04-29 10:49 ` Hugh Dickins
2008-04-29 10:49 ` Hugh Dickins
2008-04-29 10:49 ` [ofa-general] " Hugh Dickins
2008-04-29 13:32 ` Andrea Arcangeli
2008-04-29 13:32 ` Andrea Arcangeli
2008-04-23 13:36 ` Andrea Arcangeli
2008-04-23 13:36 ` Andrea Arcangeli
2008-04-23 13:36 ` [ofa-general] " Andrea Arcangeli
2008-04-23 14:47 ` Robin Holt
2008-04-23 14:47 ` Robin Holt
2008-04-23 14:47 ` [ofa-general] " Robin Holt
2008-04-23 15:59 ` Andrea Arcangeli
2008-04-23 15:59 ` Andrea Arcangeli
2008-04-23 15:59 ` [ofa-general] " Andrea Arcangeli
2008-04-23 18:09 ` Christoph Lameter
2008-04-23 18:09 ` Christoph Lameter
2008-04-23 18:09 ` [ofa-general] " Christoph Lameter
2008-04-23 18:19 ` Andrea Arcangeli
2008-04-23 18:19 ` Andrea Arcangeli
2008-04-23 18:19 ` [ofa-general] " Andrea Arcangeli
2008-04-23 18:27 ` Christoph Lameter
2008-04-23 18:27 ` Christoph Lameter
2008-04-23 18:27 ` [ofa-general] " Christoph Lameter
2008-04-23 18:37 ` Andrea Arcangeli
2008-04-23 18:37 ` Andrea Arcangeli
2008-04-23 18:37 ` [ofa-general] " Andrea Arcangeli
2008-04-23 18:46 ` Christoph Lameter
2008-04-23 18:46 ` Christoph Lameter
2008-04-23 18:46 ` [ofa-general] " Christoph Lameter
2008-04-22 23:20 ` Christoph Lameter
2008-04-22 23:20 ` Christoph Lameter
2008-04-22 23:20 ` [ofa-general] " Christoph Lameter
2008-04-23 16:26 ` Andrea Arcangeli
2008-04-23 16:26 ` Andrea Arcangeli
2008-04-23 16:26 ` [ofa-general] " Andrea Arcangeli
2008-04-23 17:24 ` Andrea Arcangeli
2008-04-23 17:24 ` Andrea Arcangeli
2008-04-23 18:21 ` Christoph Lameter
2008-04-23 18:21 ` Christoph Lameter
2008-04-23 18:21 ` [ofa-general] " Christoph Lameter
2008-04-23 18:34 ` Andrea Arcangeli
2008-04-23 18:34 ` Andrea Arcangeli
2008-04-23 18:15 ` Christoph Lameter
2008-04-23 18:15 ` Christoph Lameter
2008-04-23 18:15 ` [ofa-general] " Christoph Lameter
2008-04-23 17:09 ` Jack Steiner
2008-04-23 17:09 ` Jack Steiner
2008-04-23 17:09 ` [ofa-general] " Jack Steiner
2008-04-23 17:45 ` Andrea Arcangeli
2008-04-23 17:45 ` Andrea Arcangeli
2008-04-23 17:45 ` [ofa-general] " Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` [ofa-general] " Andrea Arcangeli
2008-04-22 20:22 ` Christoph Lameter
2008-04-22 20:22 ` Christoph Lameter
2008-04-22 20:22 ` Christoph Lameter
2008-04-22 22:43 ` Andrea Arcangeli
2008-04-22 22:43 ` Andrea Arcangeli
2008-04-22 22:43 ` [ofa-general] " Andrea Arcangeli
2008-04-22 23:07 ` Robin Holt
2008-04-22 23:07 ` Robin Holt
2008-04-22 23:07 ` [ofa-general] " Robin Holt
2008-04-22 13:51 ` [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 20:23 ` Christoph Lameter
2008-04-22 20:23 ` Christoph Lameter
2008-04-22 20:23 ` [ofa-general] " Christoph Lameter
2008-04-22 22:37 ` Andrea Arcangeli
2008-04-22 22:37 ` Andrea Arcangeli
2008-04-22 22:37 ` [ofa-general] " Andrea Arcangeli
2008-04-22 23:13 ` Christoph Lameter
2008-04-22 23:13 ` Christoph Lameter
2008-04-22 23:13 ` [ofa-general] " Christoph Lameter
2008-04-22 13:51 ` [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 20:24 ` Christoph Lameter
2008-04-22 20:24 ` Christoph Lameter
2008-04-22 20:24 ` Christoph Lameter
2008-04-22 22:40 ` Andrea Arcangeli
2008-04-22 22:40 ` Andrea Arcangeli
2008-04-22 22:40 ` [ofa-general] " Andrea Arcangeli
2008-04-22 23:14 ` Christoph Lameter
2008-04-22 23:14 ` Christoph Lameter
2008-04-22 23:14 ` Christoph Lameter
2008-04-23 13:44 ` Andrea Arcangeli
2008-04-23 13:44 ` Andrea Arcangeli
2008-04-23 13:44 ` [ofa-general] " Andrea Arcangeli
2008-04-23 15:45 ` Robin Holt
2008-04-23 15:45 ` Robin Holt
2008-04-23 15:45 ` [ofa-general] " Robin Holt
2008-04-23 16:15 ` Andrea Arcangeli
2008-04-23 16:15 ` Andrea Arcangeli
2008-04-23 16:15 ` [ofa-general] " Andrea Arcangeli
2008-04-23 19:55 ` Robin Holt
2008-04-23 19:55 ` Robin Holt
2008-04-23 19:55 ` [ofa-general] " Robin Holt
2008-04-23 21:05 ` Avi Kivity
2008-04-23 21:05 ` Avi Kivity
2008-04-23 21:05 ` [ofa-general] " Avi Kivity
2008-04-23 18:02 ` Christoph Lameter
2008-04-23 18:02 ` Christoph Lameter
2008-04-23 18:02 ` [ofa-general] " Christoph Lameter
2008-04-23 18:16 ` Andrea Arcangeli
2008-04-23 18:16 ` Andrea Arcangeli
2008-04-23 18:16 ` [ofa-general] " Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 05 of 12] Move the tlb flushing into free_pgtables. The conversion of the locks Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 20:25 ` Christoph Lameter
2008-04-22 20:25 ` Christoph Lameter
2008-04-22 20:25 ` [ofa-general] " Christoph Lameter
2008-04-22 13:51 ` [PATCH 06 of 12] Move the tlb flushing inside of unmap vmas. This saves us from passing Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 07 of 12] Add a function to rw_semaphores to check if there are any processes Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 08 of 12] The conversion to a rwsem allows notifier callbacks during rmap traversal Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 09 of 12] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 20:26 ` Christoph Lameter
2008-04-22 20:26 ` Christoph Lameter
2008-04-22 22:54 ` Andrea Arcangeli
2008-04-22 22:54 ` Andrea Arcangeli
2008-04-22 22:54 ` [ofa-general] " Andrea Arcangeli
2008-04-22 23:19 ` Christoph Lameter
2008-04-22 23:19 ` Christoph Lameter
2008-04-22 23:19 ` [ofa-general] " Christoph Lameter
2008-04-22 13:51 ` [PATCH 11 of 12] XPMEM would have used sys_madvise() except that madvise_dontneed() Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` [PATCH 12 of 12] This patch adds a lock ordering rule to avoid a potential deadlock when Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 13:51 ` Andrea Arcangeli
2008-04-22 18:22 ` [PATCH 00 of 12] mmu notifier #v13 Robin Holt
2008-04-22 18:22 ` Robin Holt
2008-04-22 18:22 ` Robin Holt
2008-04-22 18:43 ` Andrea Arcangeli
2008-04-22 18:43 ` Andrea Arcangeli
2008-04-22 19:42 ` Robin Holt
2008-04-22 19:42 ` Robin Holt
2008-04-22 19:42 ` [ofa-general] " Robin Holt
2008-04-22 20:30 ` Christoph Lameter
2008-04-22 20:30 ` Christoph Lameter
2008-04-22 20:30 ` [ofa-general] " Christoph Lameter
2008-04-23 13:33 ` Andrea Arcangeli
2008-04-23 13:33 ` Andrea Arcangeli
2008-04-23 13:33 ` [ofa-general] " Andrea Arcangeli
2008-04-22 20:28 ` Christoph Lameter
2008-04-22 20:28 ` Christoph Lameter
2008-04-22 20:28 ` [ofa-general] " Christoph Lameter
2008-04-23 0:31 ` Jack Steiner
2008-04-23 0:31 ` Jack Steiner
2008-04-23 0:31 ` [ofa-general] " Jack Steiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080424174145.GM24536@duo.random \
--to=andrea@qumranet.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=avi@qumranet.com \
--cc=clameter@sgi.com \
--cc=general@lists.openfabrics.org \
--cc=holt@sgi.com \
--cc=hugh@veritas.com \
--cc=kanojsarcar@yahoo.com \
--cc=kvm-devel@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=rdreier@cisco.com \
--cc=rusty@rustcorp.com.au \
--cc=steiner@sgi.com \
--cc=swise@opengridcomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.