* [PATCH 0/8] mm: Preemptibility -v8
From: Peter Zijlstra @ 2011-02-17 17:05 UTC
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
This series depends on the previous two series:
  - mm: Simplify anon_vma lifetime rules
  - mm: mmu_gather rework

These patches make part of the mm a lot more preemptible. They convert
i_mmap_lock and anon_vma->lock to mutexes, which, together with the
mmu_gather rework, makes mmu_gather preemptible as well.

Making i_mmap_lock a mutex also enables a clean-up of the truncate code.

This also allows for preemptible mmu_notifiers, something that I think
XPMEM wants.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/8] lockdep, mutex: Provide mutex_lock_nest_lock
From: Peter Zijlstra @ 2011-02-17 17:05 UTC
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-lockdep_mutex-provide_mutex_lock_nest_lock.patch --]
[-- Type: text/plain, Size: 5745 bytes --]
Provide the mutex_lock_nest_lock() annotation.
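For readers unfamiliar with the annotation: a nest_lock tells lockdep
that several locks of one class may be held together because an outer
lock, already held, serializes taking them. The user-space sketch below
only models that locking pattern with pthreads; it does not involve
lockdep, and every name in it (struct object, outer_lock, lock_all,
bump_all, ...) is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_OBJECTS 4

/* Hypothetical object; all object locks belong to one "lock class". */
struct object {
	pthread_mutex_t lock;
	int value;
};

/* Outer lock: only its holder may take several object locks at once. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static struct object objects[NR_OBJECTS];

void objects_init(void)
{
	for (size_t i = 0; i < NR_OBJECTS; i++) {
		pthread_mutex_init(&objects[i].lock, NULL);
		objects[i].value = 0;
	}
}

/*
 * Caller must hold outer_lock. In kernel code each acquisition here
 * would be the place for mutex_lock_nest_lock(&obj->lock, outer).
 */
void lock_all(void)
{
	for (size_t i = 0; i < NR_OBJECTS; i++)
		pthread_mutex_lock(&objects[i].lock);
}

void unlock_all(void)
{
	for (size_t i = NR_OBJECTS; i-- > 0; )
		pthread_mutex_unlock(&objects[i].lock);
}

/* Increment every object under all locks and return the sum of values. */
int bump_all(void)
{
	int sum = 0;

	pthread_mutex_lock(&outer_lock);
	lock_all();
	for (size_t i = 0; i < NR_OBJECTS; i++)
		sum += ++objects[i].value;
	unlock_all();
	pthread_mutex_unlock(&outer_lock);
	return sum;
}
```

Because every multi-lock acquisition happens under outer_lock, two
threads can never interleave their acquisitions of the object locks,
which is the property the annotation lets lockdep rely on.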
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/lockdep.h | 3 +++
include/linux/mutex.h | 9 +++++++++
kernel/mutex.c | 25 +++++++++++++++++--------
3 files changed, 29 insertions(+), 8 deletions(-)
Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -495,12 +495,15 @@ static inline void print_irqtrace_events
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# ifdef CONFIG_PROVE_LOCKING
# define mutex_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 2, NULL, i)
+# define mutex_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 2, n, i)
# else
# define mutex_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 1, NULL, i)
+# define mutex_acquire_nest(l, s, t, n, i) lock_acquire(l, s, t, 0, 1, n, i)
# endif
# define mutex_release(l, n, i) lock_release(l, n, i)
#else
# define mutex_acquire(l, s, t, i) do { } while (0)
+# define mutex_acquire_nest(l, s, t, n, i) do { } while (0)
# define mutex_release(l, n, i) do { } while (0)
#endif
Index: linux-2.6/include/linux/mutex.h
===================================================================
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -132,6 +132,7 @@ static inline int mutex_is_locked(struct
*/
#ifdef CONFIG_DEBUG_LOCK_ALLOC
extern void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
+extern void _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest_lock);
extern int __must_check mutex_lock_interruptible_nested(struct mutex *lock,
unsigned int subclass);
extern int __must_check mutex_lock_killable_nested(struct mutex *lock,
@@ -140,6 +141,13 @@ extern int __must_check mutex_lock_killa
#define mutex_lock(lock) mutex_lock_nested(lock, 0)
#define mutex_lock_interruptible(lock) mutex_lock_interruptible_nested(lock, 0)
#define mutex_lock_killable(lock) mutex_lock_killable_nested(lock, 0)
+
+#define mutex_lock_nest_lock(lock, nest_lock) \
+do { \
+ typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \
+ _mutex_lock_nest_lock(lock, &(nest_lock)->dep_map); \
+} while (0)
+
#else
extern void mutex_lock(struct mutex *lock);
extern int __must_check mutex_lock_interruptible(struct mutex *lock);
@@ -148,6 +156,7 @@ extern int __must_check mutex_lock_killa
# define mutex_lock_nested(lock, subclass) mutex_lock(lock)
# define mutex_lock_interruptible_nested(lock, subclass) mutex_lock_interruptible(lock)
# define mutex_lock_killable_nested(lock, subclass) mutex_lock_killable(lock)
+# define mutex_lock_nest_lock(lock, nest_lock) mutex_lock(lock)
#endif
/*
Index: linux-2.6/kernel/mutex.c
===================================================================
--- linux-2.6.orig/kernel/mutex.c
+++ linux-2.6/kernel/mutex.c
@@ -131,14 +131,14 @@ EXPORT_SYMBOL(mutex_unlock);
*/
static inline int __sched
__mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
- unsigned long ip)
+ struct lockdep_map *nest_lock, unsigned long ip)
{
struct task_struct *task = current;
struct mutex_waiter waiter;
unsigned long flags;
preempt_disable();
- mutex_acquire(&lock->dep_map, subclass, 0, ip);
+ mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
/*
@@ -276,16 +276,25 @@ void __sched
mutex_lock_nested(struct mutex *lock, unsigned int subclass)
{
might_sleep();
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subclass, _RET_IP_);
+ __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subclass, NULL, _RET_IP_);
}
EXPORT_SYMBOL_GPL(mutex_lock_nested);
+void __sched
+_mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest)
+{
+ might_sleep();
+ __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, nest, _RET_IP_);
+}
+
+EXPORT_SYMBOL_GPL(_mutex_lock_nest_lock);
+
int __sched
mutex_lock_killable_nested(struct mutex *lock, unsigned int subclass)
{
might_sleep();
- return __mutex_lock_common(lock, TASK_KILLABLE, subclass, _RET_IP_);
+ return __mutex_lock_common(lock, TASK_KILLABLE, subclass, NULL, _RET_IP_);
}
EXPORT_SYMBOL_GPL(mutex_lock_killable_nested);
@@ -294,7 +303,7 @@ mutex_lock_interruptible_nested(struct m
{
might_sleep();
return __mutex_lock_common(lock, TASK_INTERRUPTIBLE,
- subclass, _RET_IP_);
+ subclass, NULL, _RET_IP_);
}
EXPORT_SYMBOL_GPL(mutex_lock_interruptible_nested);
@@ -400,7 +409,7 @@ __mutex_lock_slowpath(atomic_t *lock_cou
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
- __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, _RET_IP_);
+ __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
}
static noinline int __sched
@@ -408,7 +417,7 @@ __mutex_lock_killable_slowpath(atomic_t
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
- return __mutex_lock_common(lock, TASK_KILLABLE, 0, _RET_IP_);
+ return __mutex_lock_common(lock, TASK_KILLABLE, 0, NULL, _RET_IP_);
}
static noinline int __sched
@@ -416,7 +425,7 @@ __mutex_lock_interruptible_slowpath(atom
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
- return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, _RET_IP_);
+ return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
}
#endif
* [PATCH 2/8] mm: Remove i_mmap_mutex lockbreak
From: Peter Zijlstra @ 2011-02-17 17:05 UTC
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: mm-fix-zap_block_size.patch --]
[-- Type: text/plain, Size: 16470 bytes --]
Hugh says:

 "The only significant loser, I think, would be page reclaim (when
  concurrent with truncation): could spin for a long time waiting for
  the i_mmap_mutex it expects would soon be dropped?"

Counterpoints:
 - cpu contention makes the spin stop (need_resched())
 - zap pages should be freeing pages at a higher rate than reclaim
   ever can
 - shouldn't hold up reclaim more than lock_page() would

I think the simplification of the truncate code is definitely worth
it.
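
To make the removed mechanism concrete: before this patch, unmapping ran
in budgeted chunks (zap_work, reset to ZAP_BLOCK_SIZE), handed back a
restart address when the budget ran out, and the caller dropped
i_mmap_lock, rescheduled, and restarted its scan. The user-space model
below contrasts that with the single pass the patch leaves behind. It is
a simplified sketch with no locking at all; ZAP_BUDGET, zap_chunk,
zap_with_lockbreak and zap_simple are hypothetical names, not kernel
symbols.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES   64
#define ZAP_BUDGET 10	/* hypothetical stand-in for ZAP_BLOCK_SIZE */

/* Old style: clear at most ZAP_BUDGET entries, return where to restart. */
size_t zap_chunk(int *pages, size_t start, size_t end)
{
	size_t budget = ZAP_BUDGET;

	while (start < end && budget > 0) {
		pages[start++] = 0;
		budget--;
	}
	return start;	/* == end when done, otherwise the restart address */
}

/* Old-style caller: repeat until the range is done; count lock breaks. */
size_t zap_with_lockbreak(int *pages, size_t start, size_t end)
{
	size_t breaks = 0;

	while ((start = zap_chunk(pages, start, end)) < end)
		breaks++;	/* the kernel dropped and retook i_mmap_lock here */
	return breaks;
}

/* New style: one pass over the range, no restart state to carry. */
void zap_simple(int *pages, size_t start, size_t end)
{
	while (start < end)
		pages[start++] = 0;
}
```

Both variants clear the same range; the difference is that the first one
needs the restart bookkeeping (vm_truncate_count and friends in the real
code) that this patch deletes.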
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/fs.h | 1
include/linux/mm.h | 2
include/linux/mm_types.h | 1
kernel/fork.c | 1
mm/memory.c | 193 ++++++-----------------------------------------
mm/mmap.c | 13 ---
mm/mremap.c | 3
7 files changed, 27 insertions(+), 187 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -878,8 +878,6 @@ struct zap_details {
struct address_space *check_mapping; /* Check page->mapping if set */
pgoff_t first_index; /* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
- spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */
- unsigned long truncate_count; /* Compare vm_truncate_count */
};
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -986,13 +986,13 @@ int copy_page_range(struct mm_struct *ds
static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
struct mm_struct *mm = tlb->mm;
int need_flush = 0;
- pte_t *pte;
- spinlock_t *ptl;
int rss[NR_MM_COUNTERS];
+ spinlock_t *ptl;
+ pte_t *pte;
init_rss_vec(rss);
again:
@@ -1001,12 +1001,9 @@ static unsigned long zap_pte_range(struc
do {
pte_t ptent = *pte;
if (pte_none(ptent)) {
- (*zap_work)--;
continue;
}
- (*zap_work) -= PAGE_SIZE;
-
if (pte_present(ptent)) {
struct page *page;
@@ -1073,8 +1070,7 @@ static unsigned long zap_pte_range(struc
print_bad_pte(vma, addr, ptent, NULL);
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
- } while (pte++, addr += PAGE_SIZE,
- (addr != end && *zap_work > 0 && !need_flush));
+ } while (pte++, addr += PAGE_SIZE, (addr != end && !need_flush));
add_mm_rss_vec(mm, rss);
arch_leave_lazy_mmu_mode();
@@ -1093,7 +1089,7 @@ static unsigned long zap_pte_range(struc
static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pmd_t *pmd;
unsigned long next;
@@ -1105,19 +1101,15 @@ static inline unsigned long zap_pmd_rang
if (next-addr != HPAGE_PMD_SIZE) {
VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
split_huge_page_pmd(vma->vm_mm, pmd);
- } else if (zap_huge_pmd(tlb, vma, pmd)) {
- (*zap_work)--;
+ } else if (zap_huge_pmd(tlb, vma, pmd))
continue;
- }
/* fall through */
}
- if (pmd_none_or_clear_bad(pmd)) {
- (*zap_work)--;
+ if (pmd_none_or_clear_bad(pmd))
continue;
- }
- next = zap_pte_range(tlb, vma, pmd, addr, next,
- zap_work, details);
- } while (pmd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pte_range(tlb, vma, pmd, addr, next, details);
+ cond_resched();
+ } while (pmd++, addr = next, addr != end);
return addr;
}
@@ -1125,7 +1117,7 @@ static inline unsigned long zap_pmd_rang
static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pud_t *pud;
unsigned long next;
@@ -1133,13 +1125,10 @@ static inline unsigned long zap_pud_rang
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
- if (pud_none_or_clear_bad(pud)) {
- (*zap_work)--;
+ if (pud_none_or_clear_bad(pud))
continue;
- }
- next = zap_pmd_range(tlb, vma, pud, addr, next,
- zap_work, details);
- } while (pud++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pmd_range(tlb, vma, pud, addr, next, details);
+ } while (pud++, addr = next, addr != end);
return addr;
}
@@ -1147,7 +1136,7 @@ static inline unsigned long zap_pud_rang
static unsigned long unmap_page_range(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long addr, unsigned long end,
- long *zap_work, struct zap_details *details)
+ struct zap_details *details)
{
pgd_t *pgd;
unsigned long next;
@@ -1161,13 +1150,10 @@ static unsigned long unmap_page_range(st
pgd = pgd_offset(vma->vm_mm, addr);
do {
next = pgd_addr_end(addr, end);
- if (pgd_none_or_clear_bad(pgd)) {
- (*zap_work)--;
+ if (pgd_none_or_clear_bad(pgd))
continue;
- }
- next = zap_pud_range(tlb, vma, pgd, addr, next,
- zap_work, details);
- } while (pgd++, addr = next, (addr != end && *zap_work > 0));
+ next = zap_pud_range(tlb, vma, pgd, addr, next, details);
+ } while (pgd++, addr = next, addr != end);
tlb_end_vma(tlb, vma);
mem_cgroup_uncharge_end();
@@ -1212,9 +1198,7 @@ unsigned long unmap_vmas(struct mmu_gath
unsigned long end_addr, unsigned long *nr_accounted,
struct zap_details *details)
{
- long zap_work = ZAP_BLOCK_SIZE;
unsigned long start = start_addr;
- spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
struct mm_struct *mm = vma->vm_mm;
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
@@ -1247,33 +1231,15 @@ unsigned long unmap_vmas(struct mmu_gath
* Since no pte has actually been setup, it is
* safe to do nothing in this case.
*/
- if (vma->vm_file) {
+ if (vma->vm_file)
unmap_hugepage_range(vma, start, end, NULL);
- zap_work -= (end - start) /
- pages_per_huge_page(hstate_vma(vma));
- }
start = end;
} else
- start = unmap_page_range(tlb, vma,
- start, end, &zap_work, details);
-
- if (zap_work > 0) {
- BUG_ON(start != end);
- break;
- }
-
- if (need_resched() ||
- (i_mmap_lock && spin_needbreak(i_mmap_lock))) {
- if (i_mmap_lock)
- goto out;
- cond_resched();
- }
-
- zap_work = ZAP_BLOCK_SIZE;
+ start = unmap_page_range(tlb, vma, start, end, details);
}
}
-out:
+
mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
return start; /* which is now the end (or restart) address */
}
@@ -2535,96 +2501,11 @@ static int do_wp_page(struct mm_struct *
return ret;
}
-/*
- * Helper functions for unmap_mapping_range().
- *
- * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __
- *
- * We have to restart searching the prio_tree whenever we drop the lock,
- * since the iterator is only valid while the lock is held, and anyway
- * a later vma might be split and reinserted earlier while lock dropped.
- *
- * The list of nonlinear vmas could be handled more efficiently, using
- * a placeholder, but handle it in the same way until a need is shown.
- * It is important to search the prio_tree before nonlinear list: a vma
- * may become nonlinear and be shifted from prio_tree to nonlinear list
- * while the lock is dropped; but never shifted from list to prio_tree.
- *
- * In order to make forward progress despite restarting the search,
- * vm_truncate_count is used to mark a vma as now dealt with, so we can
- * quickly skip it next time around. Since the prio_tree search only
- * shows us those vmas affected by unmapping the range in question, we
- * can't efficiently keep all vmas in step with mapping->truncate_count:
- * so instead reset them all whenever it wraps back to 0 (then go to 1).
- * mapping->truncate_count and vma->vm_truncate_count are protected by
- * i_mmap_lock.
- *
- * In order to make forward progress despite repeatedly restarting some
- * large vma, note the restart_addr from unmap_vmas when it breaks out:
- * and restart from that address when we reach that vma again. It might
- * have been split or merged, shrunk or extended, but never shifted: so
- * restart_addr remains valid so long as it remains in the vma's range.
- * unmap_mapping_range forces truncate_count to leap over page-aligned
- * values so we can save vma's restart_addr in its truncate_count field.
- */
-#define is_restart_addr(truncate_count) (!((truncate_count) & ~PAGE_MASK))
-
-static void reset_vma_truncate_counts(struct address_space *mapping)
-{
- struct vm_area_struct *vma;
- struct prio_tree_iter iter;
-
- vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX)
- vma->vm_truncate_count = 0;
- list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
- vma->vm_truncate_count = 0;
-}
-
-static int unmap_mapping_range_vma(struct vm_area_struct *vma,
+static void unmap_mapping_range_vma(struct vm_area_struct *vma,
unsigned long start_addr, unsigned long end_addr,
struct zap_details *details)
{
- unsigned long restart_addr;
- int need_break;
-
- /*
- * files that support invalidating or truncating portions of the
- * file from under mmaped areas must have their ->fault function
- * return a locked page (and set VM_FAULT_LOCKED in the return).
- * This provides synchronisation against concurrent unmapping here.
- */
-
-again:
- restart_addr = vma->vm_truncate_count;
- if (is_restart_addr(restart_addr) && start_addr < restart_addr) {
- start_addr = restart_addr;
- if (start_addr >= end_addr) {
- /* Top of vma has been split off since last time */
- vma->vm_truncate_count = details->truncate_count;
- return 0;
- }
- }
-
- restart_addr = zap_page_range(vma, start_addr,
- end_addr - start_addr, details);
- need_break = need_resched() || spin_needbreak(details->i_mmap_lock);
-
- if (restart_addr >= end_addr) {
- /* We have now completed this vma: mark it so */
- vma->vm_truncate_count = details->truncate_count;
- if (!need_break)
- return 0;
- } else {
- /* Note restart_addr in vma's truncate_count field */
- vma->vm_truncate_count = restart_addr;
- if (!need_break)
- goto again;
- }
-
- spin_unlock(details->i_mmap_lock);
- cond_resched();
- spin_lock(details->i_mmap_lock);
- return -EINTR;
+ zap_page_range(vma, start_addr, end_addr - start_addr, details);
}
static inline void unmap_mapping_range_tree(struct prio_tree_root *root,
@@ -2634,12 +2515,8 @@ static inline void unmap_mapping_range_t
struct prio_tree_iter iter;
pgoff_t vba, vea, zba, zea;
-restart:
vma_prio_tree_foreach(vma, &iter, root,
details->first_index, details->last_index) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
vba = vma->vm_pgoff;
vea = vba + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) - 1;
@@ -2651,11 +2528,10 @@ static inline void unmap_mapping_range_t
if (zea > vea)
zea = vea;
- if (unmap_mapping_range_vma(vma,
+ unmap_mapping_range_vma(vma,
((zba - vba) << PAGE_SHIFT) + vma->vm_start,
((zea - vba + 1) << PAGE_SHIFT) + vma->vm_start,
- details) < 0)
- goto restart;
+ details);
}
}
@@ -2670,15 +2546,9 @@ static inline void unmap_mapping_range_l
* across *all* the pages in each nonlinear VMA, not just the pages
* whose virtual address lies outside the file truncation point.
*/
-restart:
list_for_each_entry(vma, head, shared.vm_set.list) {
- /* Skip quickly over those we have already dealt with */
- if (vma->vm_truncate_count == details->truncate_count)
- continue;
details->nonlinear_vma = vma;
- if (unmap_mapping_range_vma(vma, vma->vm_start,
- vma->vm_end, details) < 0)
- goto restart;
+ unmap_mapping_range_vma(vma, vma->vm_start, vma->vm_end, details);
}
}
@@ -2717,19 +2587,8 @@ void unmap_mapping_range(struct address_
details.last_index = hba + hlen - 1;
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
- details.i_mmap_lock = &mapping->i_mmap_lock;
spin_lock(&mapping->i_mmap_lock);
-
- /* Protect against endless unmapping loops */
- mapping->truncate_count++;
- if (unlikely(is_restart_addr(mapping->truncate_count))) {
- if (mapping->truncate_count == 0)
- reset_vma_truncate_counts(mapping);
- mapping->truncate_count++;
- }
- details.truncate_count = mapping->truncate_count;
-
if (unlikely(!prio_tree_empty(&mapping->i_mmap)))
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -640,7 +640,6 @@ struct address_space {
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
spinlock_t i_mmap_lock; /* protect tree, count, list */
- unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods */
Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h
+++ linux-2.6/include/linux/mm_types.h
@@ -175,7 +175,6 @@ struct vm_area_struct {
units, *not* PAGE_CACHE_SIZE */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
- unsigned long vm_truncate_count;/* truncate_count or restart_addr */
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -379,7 +379,6 @@ static int dup_mmap(struct mm_struct *mm
spin_lock(&mapping->i_mmap_lock);
if (tmp->vm_flags & VM_SHARED)
mapping->i_mmap_writable++;
- tmp->vm_truncate_count = mpnt->vm_truncate_count;
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -464,10 +464,8 @@ static void vma_link(struct mm_struct *m
if (vma->vm_file)
mapping = vma->vm_file->f_mapping;
- if (mapping) {
+ if (mapping)
spin_lock(&mapping->i_mmap_lock);
- vma->vm_truncate_count = mapping->truncate_count;
- }
__vma_link(mm, vma, prev, rb_link, rb_parent);
__vma_link_file(vma);
@@ -577,16 +575,7 @@ again: remove_next = 1 + (end > next->
if (!(vma->vm_flags & VM_NONLINEAR))
root = &mapping->i_mmap;
spin_lock(&mapping->i_mmap_lock);
- if (importer &&
- vma->vm_truncate_count != next->vm_truncate_count) {
- /*
- * unmap_mapping_range might be in progress:
- * ensure that the expanding vma is rescanned.
- */
- importer->vm_truncate_count = 0;
- }
if (insert) {
- insert->vm_truncate_count = vma->vm_truncate_count;
/*
* Put into prio_tree now, so instantiated pages
* are visible to arm/parisc __flush_dcache_page
Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c
+++ linux-2.6/mm/mremap.c
@@ -94,9 +94,6 @@ static void move_ptes(struct vm_area_str
*/
mapping = vma->vm_file->f_mapping;
spin_lock(&mapping->i_mmap_lock);
- if (new_vma->vm_truncate_count &&
- new_vma->vm_truncate_count != vma->vm_truncate_count)
- new_vma->vm_truncate_count = 0;
}
/*
* [PATCH 3/8] mm: Convert i_mmap_lock to a mutex
From: Peter Zijlstra @ 2011-02-17 17:05 UTC
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-convert_i_mmap_lock_to_mutexes.patch --]
[-- Type: text/plain, Size: 23113 bytes --]
Straightforward conversion of i_mmap_lock to a mutex.
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Documentation/lockstat.txt | 2 +-
Documentation/vm/locking | 2 +-
arch/x86/mm/hugetlbpage.c | 4 ++--
fs/gfs2/main.c | 2 +-
fs/hugetlbfs/inode.c | 4 ++--
fs/inode.c | 2 +-
fs/nilfs2/page.c | 2 +-
include/linux/fs.h | 2 +-
include/linux/mm.h | 1 +
include/linux/mmu_notifier.h | 2 +-
kernel/fork.c | 4 ++--
mm/filemap.c | 10 +++++-----
mm/filemap_xip.c | 4 ++--
mm/fremap.c | 4 ++--
mm/hugetlb.c | 14 +++++++-------
mm/memory-failure.c | 4 ++--
mm/memory.c | 4 ++--
mm/mmap.c | 22 +++++++++++-----------
mm/mremap.c | 4 ++--
mm/rmap.c | 28 ++++++++++++++--------------
20 files changed, 61 insertions(+), 60 deletions(-)
Index: linux-2.6/arch/x86/mm/hugetlbpage.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/hugetlbpage.c
+++ linux-2.6/arch/x86/mm/hugetlbpage.c
@@ -72,7 +72,7 @@ static void huge_pmd_share(struct mm_str
if (!vma_shareable(vma, addr))
return;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) {
if (svma == vma)
continue;
@@ -97,7 +97,7 @@ static void huge_pmd_share(struct mm_str
put_page(virt_to_page(spte));
spin_unlock(&mm->page_table_lock);
out:
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
/*
Index: linux-2.6/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hugetlbfs/inode.c
+++ linux-2.6/fs/hugetlbfs/inode.c
@@ -413,10 +413,10 @@ static int hugetlb_vmtruncate(struct ino
pgoff = offset >> PAGE_SHIFT;
i_size_write(inode, offset);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (!prio_tree_empty(&mapping->i_mmap))
hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
truncate_hugepages(inode, offset);
return 0;
}
Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -310,7 +310,7 @@ void inode_init_once(struct inode *inode
INIT_LIST_HEAD(&inode->i_lru);
INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
spin_lock_init(&inode->i_data.tree_lock);
- spin_lock_init(&inode->i_data.i_mmap_lock);
+ mutex_init(&inode->i_data.i_mmap_mutex);
INIT_LIST_HEAD(&inode->i_data.private_list);
spin_lock_init(&inode->i_data.private_lock);
INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -639,7 +639,7 @@ struct address_space {
unsigned int i_mmap_writable;/* count VM_SHARED mappings */
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
- spinlock_t i_mmap_lock; /* protect tree, count, list */
+ struct mutex i_mmap_mutex; /* protect tree, count, list */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods */
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -376,14 +376,14 @@ static int dup_mmap(struct mm_struct *mm
get_file(file);
if (tmp->vm_flags & VM_DENYWRITE)
atomic_dec(&inode->i_writecount);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (tmp->vm_flags & VM_SHARED)
mapping->i_mmap_writable++;
flush_dcache_mmap_lock(mapping);
/* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
flush_dcache_mmap_unlock(mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
/*
Index: linux-2.6/mm/filemap_xip.c
===================================================================
--- linux-2.6.orig/mm/filemap_xip.c
+++ linux-2.6/mm/filemap_xip.c
@@ -183,7 +183,7 @@ __xip_unmap (struct address_space * mapp
return;
retry:
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
mm = vma->vm_mm;
address = vma->vm_start +
@@ -201,7 +201,7 @@ __xip_unmap (struct address_space * mapp
page_cache_release(page);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (locked) {
mutex_unlock(&xip_sparse_mutex);
Index: linux-2.6/mm/fremap.c
===================================================================
--- linux-2.6.orig/mm/fremap.c
+++ linux-2.6/mm/fremap.c
@@ -211,13 +211,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
}
goto out;
}
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
flush_dcache_mmap_lock(mapping);
vma->vm_flags |= VM_NONLINEAR;
vma_prio_tree_remove(vma, &mapping->i_mmap);
vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear);
flush_dcache_mmap_unlock(mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
if (vma->vm_flags & VM_LOCKED) {
Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c
+++ linux-2.6/mm/hugetlb.c
@@ -2207,7 +2207,7 @@ void __unmap_hugepage_range(struct vm_ar
unsigned long sz = huge_page_size(h);
/*
- * A page gathering list, protected by per file i_mmap_lock. The
+ * A page gathering list, protected by per file i_mmap_mutex. The
* lock is used to avoid list corruption from multiple unmapping
* of the same page since we are using page->lru.
*/
@@ -2276,9 +2276,9 @@ void __unmap_hugepage_range(struct vm_ar
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page)
{
- spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
__unmap_hugepage_range(vma, start, end, ref_page);
- spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
}
/*
@@ -2310,7 +2310,7 @@ static int unmap_ref_private(struct mm_s
* this mapping should be shared between all the VMAs,
* __unmap_hugepage_range() is called as the lock is already held
*/
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
/* Do not unmap the current VMA */
if (iter_vma == vma)
@@ -2328,7 +2328,7 @@ static int unmap_ref_private(struct mm_s
address, address + huge_page_size(h),
page);
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return 1;
}
@@ -2812,7 +2812,7 @@ void hugetlb_change_protection(struct vm
BUG_ON(address >= end);
flush_cache_range(vma, address, end);
- spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_lock(&vma->vm_file->f_mapping->i_mmap_mutex);
spin_lock(&mm->page_table_lock);
for (; address < end; address += huge_page_size(h)) {
ptep = huge_pte_offset(mm, address);
@@ -2827,7 +2827,7 @@ void hugetlb_change_protection(struct vm
}
}
spin_unlock(&mm->page_table_lock);
- spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
+ mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
flush_tlb_range(vma, start, end);
}
Index: linux-2.6/mm/memory-failure.c
===================================================================
--- linux-2.6.orig/mm/memory-failure.c
+++ linux-2.6/mm/memory-failure.c
@@ -429,7 +429,7 @@ static void collect_procs_file(struct pa
*/
read_lock(&tasklist_lock);
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
for_each_process(tsk) {
pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
@@ -449,7 +449,7 @@ static void collect_procs_file(struct pa
add_to_kill(tsk, page, vma, to_kill, tkc);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
read_unlock(&tasklist_lock);
}
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -2588,12 +2588,12 @@ void unmap_mapping_range(struct address_
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (unlikely(!prio_tree_empty(&mapping->i_mmap)))
unmap_mapping_range_tree(&mapping->i_mmap, &details);
if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
EXPORT_SYMBOL(unmap_mapping_range);
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -190,7 +190,7 @@ int __vm_enough_memory(struct mm_struct
}
/*
- * Requires inode->i_mapping->i_mmap_lock
+ * Requires inode->i_mapping->i_mmap_mutex
*/
static void __remove_shared_vm_struct(struct vm_area_struct *vma,
struct file *file, struct address_space *mapping)
@@ -218,9 +218,9 @@ void unlink_file_vma(struct vm_area_stru
if (file) {
struct address_space *mapping = file->f_mapping;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
__remove_shared_vm_struct(vma, file, mapping);
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
}
}
@@ -465,13 +465,13 @@ static void vma_link(struct mm_struct *m
mapping = vma->vm_file->f_mapping;
if (mapping)
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
__vma_link(mm, vma, prev, rb_link, rb_parent);
__vma_link_file(vma);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
mm->map_count++;
validate_mm(mm);
@@ -574,7 +574,7 @@ again: remove_next = 1 + (end > next->
mapping = file->f_mapping;
if (!(vma->vm_flags & VM_NONLINEAR))
root = &mapping->i_mmap;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
if (insert) {
/*
* Put into prio_tree now, so instantiated pages
@@ -641,7 +641,7 @@ again: remove_next = 1 + (end > next->
if (anon_vma)
anon_vma_unlock(anon_vma);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (remove_next) {
if (file) {
@@ -2300,7 +2300,7 @@ void exit_mmap(struct mm_struct *mm)
/* Insert vm structure into process list sorted by address
* and into the inode's i_mmap tree. If vm_file is non-NULL
- * then i_mmap_lock is taken here.
+ * then i_mmap_mutex is taken here.
*/
int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma)
{
@@ -2542,7 +2542,7 @@ static void vm_lock_mapping(struct mm_st
*/
if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
BUG();
- spin_lock_nest_lock(&mapping->i_mmap_lock, &mm->mmap_sem);
+ mutex_lock_nest_lock(&mapping->i_mmap_mutex, &mm->mmap_sem);
}
}
@@ -2569,7 +2569,7 @@ static void vm_lock_mapping(struct mm_st
* vma in this mm is backed by the same anon_vma or address_space.
*
* We can take all the locks in random order because the VM code
- * taking i_mmap_lock or anon_vma->lock outside the mmap_sem never
+ * taking i_mmap_mutex or anon_vma->lock outside the mmap_sem never
* takes more than one of them in a row. Secondly we're protected
* against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
*
@@ -2641,7 +2641,7 @@ static void vm_unlock_mapping(struct add
* AS_MM_ALL_LOCKS can't change to 0 from under us
* because we hold the mm_all_locks_mutex.
*/
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
if (!test_and_clear_bit(AS_MM_ALL_LOCKS,
&mapping->flags))
BUG();
Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c
+++ linux-2.6/mm/mremap.c
@@ -93,7 +93,7 @@ static void move_ptes(struct vm_area_str
* and we propagate stale pages into the dst afterward.
*/
mapping = vma->vm_file->f_mapping;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
}
/*
@@ -122,7 +122,7 @@ static void move_ptes(struct vm_area_str
pte_unmap(new_pte - 1);
pte_unmap_unlock(old_pte - 1, old_ptl);
if (mapping)
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
}
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -24,7 +24,7 @@
* inode->i_alloc_sem (vmtruncate_range)
* mm->mmap_sem
* page->flags PG_locked (lock_page)
- * mapping->i_mmap_lock
+ * mapping->i_mmap_mutex
* anon_vma->lock
* mm->page_table_lock or pte_lock
* zone->lru_lock (in mark_page_accessed, isolate_lru_page)
@@ -629,14 +629,14 @@ static int page_referenced_file(struct p
* The page lock not only makes sure that page->mapping cannot
* suddenly be NULLified by truncation, it makes sure that the
* structure at mapping cannot be freed and reused yet,
- * so we can safely take mapping->i_mmap_lock.
+ * so we can safely take mapping->i_mmap_mutex.
*/
BUG_ON(!PageLocked(page));
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
/*
- * i_mmap_lock does not stabilize mapcount at all, but mapcount
+ * i_mmap_mutex does not stabilize mapcount at all, but mapcount
* is more likely to be accurate if we note it after spinning.
*/
mapcount = page_mapcount(page);
@@ -658,7 +658,7 @@ static int page_referenced_file(struct p
break;
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return referenced;
}
@@ -745,7 +745,7 @@ static int page_mkclean_file(struct addr
BUG_ON(PageAnon(page));
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
if (vma->vm_flags & VM_SHARED) {
unsigned long address = vma_address(page, vma);
@@ -754,7 +754,7 @@ static int page_mkclean_file(struct addr
ret += page_mkclean_one(page, vma, address);
}
}
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
@@ -1105,7 +1105,7 @@ int try_to_unmap_one(struct page *page,
/*
* We need mmap_sem locking, Otherwise VM_LOCKED check makes
* unstable result and race. Plus, We can't wait here because
- * we now hold anon_vma->lock or mapping->i_mmap_lock.
+ * we now hold anon_vma->lock or mapping->i_mmap_mutex.
* if trylock failed, the page remain in evictable lru and later
* vmscan could retry to move the page to unevictable lru if the
* page is actually mlocked.
@@ -1331,7 +1331,7 @@ static int try_to_unmap_file(struct page
unsigned long max_nl_size = 0;
unsigned int mapcount;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
unsigned long address = vma_address(page, vma);
if (address == -EFAULT)
@@ -1377,7 +1377,7 @@ static int try_to_unmap_file(struct page
mapcount = page_mapcount(page);
if (!mapcount)
goto out;
- cond_resched_lock(&mapping->i_mmap_lock);
+ cond_resched();
max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK;
if (max_nl_cursor == 0)
@@ -1399,7 +1399,7 @@ static int try_to_unmap_file(struct page
}
vma->vm_private_data = (void *) max_nl_cursor;
}
- cond_resched_lock(&mapping->i_mmap_lock);
+ cond_resched();
max_nl_cursor += CLUSTER_SIZE;
} while (max_nl_cursor <= max_nl_size);
@@ -1411,7 +1411,7 @@ static int try_to_unmap_file(struct page
list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
vma->vm_private_data = NULL;
out:
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
@@ -1527,7 +1527,7 @@ static int rmap_walk_file(struct page *p
if (!mapping)
return ret;
- spin_lock(&mapping->i_mmap_lock);
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
unsigned long address = vma_address(page, vma);
if (address == -EFAULT)
@@ -1541,7 +1541,7 @@ static int rmap_walk_file(struct page *p
* never contain migration ptes. Decide what to do about this
* limitation to linear when we need rmap_walk() on nonlinear.
*/
- spin_unlock(&mapping->i_mmap_lock);
+ mutex_unlock(&mapping->i_mmap_mutex);
return ret;
}
Index: linux-2.6/fs/gfs2/main.c
===================================================================
--- linux-2.6.orig/fs/gfs2/main.c
+++ linux-2.6/fs/gfs2/main.c
@@ -62,7 +62,7 @@ static void gfs2_init_gl_aspace_once(voi
memset(mapping, 0, sizeof(*mapping));
INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
spin_lock_init(&mapping->tree_lock);
- spin_lock_init(&mapping->i_mmap_lock);
+ mutex_init(&mapping->i_mmap_mutex);
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap);
Index: linux-2.6/fs/nilfs2/page.c
===================================================================
--- linux-2.6.orig/fs/nilfs2/page.c
+++ linux-2.6/fs/nilfs2/page.c
@@ -500,7 +500,7 @@ void nilfs_mapping_init_once(struct addr
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);
- spin_lock_init(&mapping->i_mmap_lock);
+ mutex_init(&mapping->i_mmap_mutex);
INIT_RAW_PRIO_TREE_ROOT(&mapping->i_mmap);
INIT_LIST_HEAD(&mapping->i_mmap_nonlinear);
}
Index: linux-2.6/Documentation/lockstat.txt
===================================================================
--- linux-2.6.orig/Documentation/lockstat.txt
+++ linux-2.6/Documentation/lockstat.txt
@@ -136,7 +136,7 @@ The integer part of the time values is i
dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24
&inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74
&zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06
- &inode->i_data.i_mmap_lock: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44
+ &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44
&q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52
&rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62
&rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63
Index: linux-2.6/Documentation/vm/locking
===================================================================
--- linux-2.6.orig/Documentation/vm/locking
+++ linux-2.6/Documentation/vm/locking
@@ -66,7 +66,7 @@ in some cases it is not really needed. E
expand_stack(), it is hard to come up with a destructive scenario without
having the vmlist protection in this case.
-The page_table_lock nests with the inode i_mmap_lock and the kmem cache
+The page_table_lock nests with the inode i_mmap_mutex and the kmem cache
c_spinlock spinlocks. This is okay, since the kmem code asks for pages after
dropping c_spinlock. The page_table_lock also nests with pagecache_lock and
pagemap_lru_lock spinlocks, and no code asks for memory with these locks
Index: linux-2.6/include/linux/mmu_notifier.h
===================================================================
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -150,7 +150,7 @@ struct mmu_notifier_ops {
* Therefore notifier chains can only be traversed when either
*
* 1. mmap_sem is held.
- * 2. One of the reverse map locks is held (i_mmap_lock or anon_vma->lock).
+ * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->lock).
* 3. No other concurrent thread can access the list (release)
*/
struct mmu_notifier {
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -58,16 +58,16 @@
/*
* Lock ordering:
*
- * ->i_mmap_lock (truncate_pagecache)
+ * ->i_mmap_mutex (truncate_pagecache)
* ->private_lock (__free_pte->__set_page_dirty_buffers)
* ->swap_lock (exclusive_swap_page, others)
* ->mapping->tree_lock
*
* ->i_mutex
- * ->i_mmap_lock (truncate->unmap_mapping_range)
+ * ->i_mmap_mutex (truncate->unmap_mapping_range)
*
* ->mmap_sem
- * ->i_mmap_lock
+ * ->i_mmap_mutex
* ->page_table_lock or pte_lock (various, mainly in memory.c)
* ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock)
*
@@ -84,7 +84,7 @@
* ->sb_lock (fs/fs-writeback.c)
* ->mapping->tree_lock (__sync_single_inode)
*
- * ->i_mmap_lock
+ * ->i_mmap_mutex
* ->anon_vma.lock (vma_adjust)
*
* ->anon_vma.lock
@@ -104,7 +104,7 @@
*
* (code doesn't rely on that order, so you could switch it around)
* ->tasklist_lock (memory_failure, collect_procs_ao)
- * ->i_mmap_lock
+ * ->i_mmap_mutex
*/
/*
* [PATCH 4/8] mm: Revert page_lock_anon_vma() lock annotation
2011-02-17 17:05 [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
` (2 preceding siblings ...)
2011-02-17 17:05 ` [PATCH 3/8] mm: Convert i_mmap_lock to a mutex Peter Zijlstra
@ 2011-02-17 17:05 ` Peter Zijlstra
2011-02-17 17:05 ` [PATCH 5/8] mm: Improve page_lock_anon_vma() comment Peter Zijlstra
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2011-02-17 17:05 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Namhyung Kim,
Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-revert_page_lock_anon_vma_lock_annotation.patch --]
[-- Type: text/plain, Size: 2207 bytes --]
It's beyond ugly and gets in the way.
Cc: Namhyung Kim <namhyung@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/rmap.h | 15 +--------------
mm/rmap.c | 4 +---
2 files changed, 2 insertions(+), 17 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -243,20 +243,7 @@ int try_to_munlock(struct page *);
/*
* Called by memory-failure.c to kill processes.
*/
-struct anon_vma *__page_lock_anon_vma(struct page *page);
-
-static inline struct anon_vma *page_lock_anon_vma(struct page *page)
-{
- struct anon_vma *anon_vma;
-
- __cond_lock(RCU, anon_vma = __page_lock_anon_vma(page));
-
- /* (void) is needed to make gcc happy */
- (void) __cond_lock(&anon_vma->root->lock, anon_vma);
-
- return anon_vma;
-}
-
+struct anon_vma *page_lock_anon_vma(struct page *page);
void page_unlock_anon_vma(struct anon_vma *anon_vma);
int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -318,7 +318,7 @@ void __init anon_vma_init(void)
* Getting a lock on a stable anon_vma from a page off the LRU is
* tricky: page_lock_anon_vma rely on RCU to guard against the races.
*/
-struct anon_vma *__page_lock_anon_vma(struct page *page)
+struct anon_vma *page_lock_anon_vma(struct page *page)
{
struct anon_vma *anon_vma, *root_anon_vma;
unsigned long anon_mapping;
@@ -352,8 +352,6 @@ struct anon_vma *__page_lock_anon_vma(st
}
void page_unlock_anon_vma(struct anon_vma *anon_vma)
- __releases(&anon_vma->root->lock)
- __releases(RCU)
{
anon_vma_unlock(anon_vma);
rcu_read_unlock();
* [PATCH 5/8] mm: Improve page_lock_anon_vma() comment
2011-02-17 17:05 [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
` (3 preceding siblings ...)
2011-02-17 17:05 ` [PATCH 4/8] mm: Revert page_lock_anon_vma() lock annotation Peter Zijlstra
@ 2011-02-17 17:05 ` Peter Zijlstra
2011-02-17 17:05 ` [PATCH 6/8] mm: Use refcounts for page_lock_anon_vma() Peter Zijlstra
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2011-02-17 17:05 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-improve_page_lock_anon_vma_comment.patch --]
[-- Type: text/plain, Size: 1988 bytes --]
A slightly more verbose comment to go along with the trickery in
page_lock_anon_vma().
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/rmap.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -315,8 +315,22 @@ void __init anon_vma_init(void)
}
/*
- * Getting a lock on a stable anon_vma from a page off the LRU is
- * tricky: page_lock_anon_vma rely on RCU to guard against the races.
+ * Getting a lock on a stable anon_vma from a page off the LRU is tricky!
+ *
+ * Since there is no serialization what so ever against page_remove_rmap()
+ * the best this function can do is return a locked anon_vma that might
+ * have been relevant to this page.
+ *
+ * The page might have been remapped to a different anon_vma or the anon_vma
+ * returned may already be freed (and even reused).
+ *
+ * All users of this function must be very careful when walking the anon_vma
+ * chain and verify that the page in question is indeed mapped in it
+ * [ something equivalent to page_mapped_in_vma() ].
+ *
+ * Since anon_vma's slab is DESTROY_BY_RCU and we know from page_remove_rmap()
+ * that the anon_vma pointer from page->mapping is valid if there is a
+ * mapcount, we can dereference the anon_vma after observing those.
*/
struct anon_vma *page_lock_anon_vma(struct page *page)
{
* [PATCH 6/8] mm: Use refcounts for page_lock_anon_vma()
2011-02-17 17:05 [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
` (4 preceding siblings ...)
2011-02-17 17:05 ` [PATCH 5/8] mm: Improve page_lock_anon_vma() comment Peter Zijlstra
@ 2011-02-17 17:05 ` Peter Zijlstra
2011-02-17 17:05 ` [PATCH 7/8] mm: Convert anon_vma->lock to a mutex Peter Zijlstra
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2011-02-17 17:05 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-use_refcounts_for_page_lock_anon_vma.patch --]
[-- Type: text/plain, Size: 4302 bytes --]
Convert page_lock_anon_vma() over to use refcounts. This is done to
prepare for the conversion of anon_vma from spinlock to mutex.
Sadly this increases the cost of page_lock_anon_vma() from one to two
atomics. A follow-up patch addresses this; let's keep it simple for
now.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/migrate.c | 17 ++++-------------
mm/rmap.c | 42 +++++++++++++++++++++++++++---------------
2 files changed, 31 insertions(+), 28 deletions(-)
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -336,9 +336,9 @@ void __init anon_vma_init(void)
* that the anon_vma pointer from page->mapping is valid if there is a
* mapcount, we can dereference the anon_vma after observing those.
*/
-struct anon_vma *page_lock_anon_vma(struct page *page)
+struct anon_vma *page_get_anon_vma(struct page *page)
{
- struct anon_vma *anon_vma, *root_anon_vma;
+ struct anon_vma *anon_vma = NULL;
unsigned long anon_mapping;
rcu_read_lock();
@@ -349,30 +349,42 @@ struct anon_vma *page_lock_anon_vma(stru
goto out;
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
- root_anon_vma = ACCESS_ONCE(anon_vma->root);
- spin_lock(&root_anon_vma->lock);
+ if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+ anon_vma = NULL;
+ goto out;
+ }
/*
* If this page is still mapped, then its anon_vma cannot have been
- * freed. But if it has been unmapped, we have no security against
- * the anon_vma structure being freed and reused (for another anon_vma:
- * SLAB_DESTROY_BY_RCU guarantees that - so the spin_lock above cannot
- * corrupt): with anon_vma_prepare() or anon_vma_fork() redirecting
- * anon_vma->root before page_unlock_anon_vma() is called to unlock.
+ * freed. But if it has been unmapped, we have no security against the
+ * anon_vma structure being freed and reused (for another anon_vma:
+ * SLAB_DESTROY_BY_RCU guarantees that - so the atomic_inc_not_zero()
+ * above cannot corrupt).
*/
- if (page_mapped(page))
- return anon_vma;
-
- spin_unlock(&root_anon_vma->lock);
+ if (!page_mapped(page)) {
+ put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ }
out:
rcu_read_unlock();
- return NULL;
+
+ return anon_vma;
+}
+
+struct anon_vma *page_lock_anon_vma(struct page *page)
+{
+ struct anon_vma *anon_vma = page_get_anon_vma(page);
+
+ if (anon_vma)
+ anon_vma_lock(anon_vma);
+
+ return anon_vma;
}
void page_unlock_anon_vma(struct anon_vma *anon_vma)
{
anon_vma_unlock(anon_vma);
- rcu_read_unlock();
+ put_anon_vma(anon_vma);
}
/*
Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c
+++ linux-2.6/mm/migrate.c
@@ -703,15 +703,11 @@ static int unmap_and_move(new_page_t get
* Only page_lock_anon_vma() understands the subtleties of
* getting a hold on an anon_vma from outside one of its mms.
*/
- anon_vma = page_lock_anon_vma(page);
+ anon_vma = page_get_anon_vma(page);
if (anon_vma) {
/*
- * Take a reference count on the anon_vma if the
- * page is mapped so that it is guaranteed to
- * exist when the page is remapped later
+ * Anon page
*/
- get_anon_vma(anon_vma);
- page_unlock_anon_vma(anon_vma);
} else if (PageSwapCache(page)) {
/*
* We cannot be sure that the anon_vma of an unmapped
@@ -840,13 +836,8 @@ static int unmap_and_move_huge_page(new_
lock_page(hpage);
}
- if (PageAnon(hpage)) {
- anon_vma = page_lock_anon_vma(hpage);
- if (anon_vma) {
- get_anon_vma(anon_vma);
- page_unlock_anon_vma(anon_vma);
- }
- }
+ if (PageAnon(hpage))
+ anon_vma = page_get_anon_vma(hpage);
try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
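
For readers without the kernel tree at hand, the "take a reference unless
the count is already zero" step that page_get_anon_vma() relies on can be
sketched in userspace C11. This is only an illustration under stated
assumptions: struct object, get_unless_zero() and put() are hypothetical
names standing in for struct anon_vma, atomic_inc_not_zero() and
put_anon_vma(). The kernel versions differ in detail, and the RCU and
SLAB_DESTROY_BY_RCU parts are not modelled here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct anon_vma; only the refcount matters. */
struct object {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero, in the spirit of
 * atomic_inc_not_zero(). Returns true when a reference was taken.
 */
bool get_unless_zero(struct object *obj)
{
	int old = atomic_load(&obj->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcount, &old, old + 1))
			return true;
	}
	return false;	/* count was zero: the object is already dying */
}

/* Drop a reference. Returns true when the caller dropped the last one. */
bool put(struct object *obj)
{
	return atomic_fetch_sub(&obj->refcount, 1) == 1;
}
```

A caller that gets false back must treat the object as gone, which mirrors
page_get_anon_vma() returning NULL when the refcount could not be raised.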
* [PATCH 7/8] mm: Convert anon_vma->lock to a mutex
2011-02-17 17:05 [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
` (5 preceding siblings ...)
2011-02-17 17:05 ` [PATCH 6/8] mm: Use refcounts for page_lock_anon_vma() Peter Zijlstra
@ 2011-02-17 17:05 ` Peter Zijlstra
2011-02-17 17:05 ` [PATCH 8/8] mm: Optimize page_lock_anon_vma() fast-path Peter Zijlstra
2011-02-17 17:36 ` [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
8 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2011-02-17 17:05 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang, Hugh Dickins
[-- Attachment #1: peter_zijlstra-mm-anon_vma-lock_to_mutexes.patch --]
[-- Type: text/plain, Size: 8044 bytes --]
Straightforward conversion of anon_vma->lock to a mutex.
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/huge_mm.h | 8 ++------
include/linux/mmu_notifier.h | 2 +-
include/linux/rmap.h | 14 +++++++-------
mm/huge_memory.c | 4 ++--
mm/mmap.c | 10 +++++-----
mm/rmap.c | 8 ++++----
6 files changed, 21 insertions(+), 25 deletions(-)
Index: linux-2.6/include/linux/rmap.h
===================================================================
--- linux-2.6.orig/include/linux/rmap.h
+++ linux-2.6/include/linux/rmap.h
@@ -7,7 +7,7 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/mm.h>
-#include <linux/spinlock.h>
+#include <linux/mutex.h>
#include <linux/memcontrol.h>
/*
@@ -26,7 +26,7 @@
*/
struct anon_vma {
struct anon_vma *root; /* Root of this anon_vma tree */
- spinlock_t lock; /* Serialize access to vma list */
+ struct mutex mutex; /* Serialize access to vma list */
/*
* The refcount is taken on an anon_vma when there is no
* guarantee that the vma of page tables will exist for
@@ -64,7 +64,7 @@ struct anon_vma_chain {
struct vm_area_struct *vma;
struct anon_vma *anon_vma;
struct list_head same_vma; /* locked by mmap_sem & page_table_lock */
- struct list_head same_anon_vma; /* locked by anon_vma->lock */
+ struct list_head same_anon_vma; /* locked by anon_vma->mutex */
};
#ifdef CONFIG_MMU
@@ -93,24 +93,24 @@ static inline void vma_lock_anon_vma(str
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- spin_lock(&anon_vma->root->lock);
+ mutex_lock(&anon_vma->root->mutex);
}
static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- spin_unlock(&anon_vma->root->lock);
+ mutex_unlock(&anon_vma->root->mutex);
}
static inline void anon_vma_lock(struct anon_vma *anon_vma)
{
- spin_lock(&anon_vma->root->lock);
+ mutex_lock(&anon_vma->root->mutex);
}
static inline void anon_vma_unlock(struct anon_vma *anon_vma)
{
- spin_unlock(&anon_vma->root->lock);
+ mutex_unlock(&anon_vma->root->mutex);
}
/*
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -25,7 +25,7 @@
* mm->mmap_sem
* page->flags PG_locked (lock_page)
* mapping->i_mmap_mutex
- * anon_vma->lock
+ * anon_vma->mutex
* mm->page_table_lock or pte_lock
* zone->lru_lock (in mark_page_accessed, isolate_lru_page)
* swap_lock (in swap_duplicate, swap_info_get)
@@ -39,7 +39,7 @@
*
* (code doesn't rely on that order so it could be switched around)
* ->tasklist_lock
- * anon_vma->lock (memory_failure, collect_procs_anon)
+ * anon_vma->mutex (memory_failure, collect_procs_anon)
* pte map lock
*/
@@ -306,7 +306,7 @@ static void anon_vma_ctor(void *data)
{
struct anon_vma *anon_vma = data;
- spin_lock_init(&anon_vma->lock);
+ mutex_init(&anon_vma->mutex);
atomic_set(&anon_vma->refcount, 0);
INIT_LIST_HEAD(&anon_vma->head);
}
@@ -1129,7 +1129,7 @@ int try_to_unmap_one(struct page *page,
/*
* We need mmap_sem locking, Otherwise VM_LOCKED check makes
* unstable result and race. Plus, We can't wait here because
- * we now hold anon_vma->lock or mapping->i_mmap_mutex.
+ * we now hold anon_vma->mutex or mapping->i_mmap_mutex.
* if trylock failed, the page remain in evictable lru and later
* vmscan could retry to move the page to unevictable lru if the
* page is actually mlocked.
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -2523,15 +2523,15 @@ static void vm_lock_anon_vma(struct mm_s
* The LSB of head.next can't change from under us
* because we hold the mm_all_locks_mutex.
*/
- spin_lock_nest_lock(&anon_vma->root->lock, &mm->mmap_sem);
+ mutex_lock_nest_lock(&anon_vma->root->mutex, &mm->mmap_sem);
/*
* We can safely modify head.next after taking the
- * anon_vma->root->lock. If some other vma in this mm shares
+ * anon_vma->root->mutex. If some other vma in this mm shares
* the same anon_vma we won't take it again.
*
* No need of atomic instructions here, head.next
* can't change from under us thanks to the
- * anon_vma->root->lock.
+ * anon_vma->root->mutex.
*/
if (__test_and_set_bit(0, (unsigned long *)
&anon_vma->root->head.next))
@@ -2580,7 +2580,7 @@ static void vm_lock_mapping(struct mm_st
* vma in this mm is backed by the same anon_vma or address_space.
*
* We can take all the locks in random order because the VM code
- * taking i_mmap_mutex or anon_vma->lock outside the mmap_sem never
+ * taking i_mmap_mutex or anon_vma->mutex outside the mmap_sem never
* takes more than one of them in a row. Secondly we're protected
* against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
*
@@ -2636,7 +2636,7 @@ static void vm_unlock_anon_vma(struct an
*
* No need of atomic instructions here, head.next
* can't change from under us until we release the
- * anon_vma->root->lock.
+ * anon_vma->root->mutex.
*/
if (!__test_and_clear_bit(0, (unsigned long *)
&anon_vma->root->head.next))
Index: linux-2.6/include/linux/mmu_notifier.h
===================================================================
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -150,7 +150,7 @@ struct mmu_notifier_ops {
* Therefore notifier chains can only be traversed when either
*
* 1. mmap_sem is held.
- * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->lock).
+ * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->mutex).
* 3. No other concurrent thread can access the list (release)
*/
struct mmu_notifier {
Index: linux-2.6/mm/huge_memory.c
===================================================================
--- linux-2.6.orig/mm/huge_memory.c
+++ linux-2.6/mm/huge_memory.c
@@ -1128,7 +1128,7 @@ static int __split_huge_page_splitting(s
* We can't temporarily set the pmd to null in order
* to split it, the pmd must remain marked huge at all
* times or the VM won't take the pmd_trans_huge paths
- * and it won't wait on the anon_vma->root->lock to
+ * and it won't wait on the anon_vma->root->mutex to
* serialize against split_huge_page*.
*/
pmdp_splitting_flush_notify(vma, address, pmd);
@@ -1315,7 +1315,7 @@ static int __split_huge_page_map(struct
return ret;
}
-/* must be called with anon_vma->root->lock hold */
+/* must be called with anon_vma->root->mutex held */
static void __split_huge_page(struct page *page,
struct anon_vma *anon_vma)
{
Index: linux-2.6/include/linux/huge_mm.h
===================================================================
--- linux-2.6.orig/include/linux/huge_mm.h
+++ linux-2.6/include/linux/huge_mm.h
@@ -91,12 +91,8 @@ extern void __split_huge_page_pmd(struct
#define wait_split_huge_page(__anon_vma, __pmd) \
do { \
pmd_t *____pmd = (__pmd); \
- spin_unlock_wait(&(__anon_vma)->root->lock); \
- /* \
- * spin_unlock_wait() is just a loop in C and so the \
- * CPU can reorder anything around it. \
- */ \
- smp_mb(); \
+ anon_vma_lock(__anon_vma); \
+ anon_vma_unlock(__anon_vma); \
BUG_ON(pmd_trans_splitting(*____pmd) || \
pmd_trans_huge(*____pmd)); \
} while (0)
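The wait_split_huge_page() hunk above replaces spin_unlock_wait() plus
smp_mb() with a plain lock/unlock pair on the anon_vma. A rough userspace
sketch of that idiom follows, using pthreads instead of kernel mutexes.
The names wait_for_unlock(), struct split_state and waiter() are
hypothetical and not part of the patch.

```c
#include <pthread.h>

/* Hypothetical stand-in for the state protected by anon_vma->root->mutex. */
struct split_state {
	pthread_mutex_t mutex;
	int done;	/* written only while the mutex is held */
	int observed;	/* what the waiter saw after waiting */
};

/*
 * Wait until the current holder (if any) has dropped the mutex.
 * Taking and immediately releasing the lock cannot complete while
 * another thread is inside the critical section, and with pthreads
 * the waiter then sees the writes the holder made under the lock.
 */
static void wait_for_unlock(pthread_mutex_t *m)
{
	pthread_mutex_lock(m);
	pthread_mutex_unlock(m);
}

/* Thread body: wait for the holder, then sample the protected flag. */
static void *waiter(void *arg)
{
	struct split_state *s = arg;

	wait_for_unlock(&s->mutex);
	s->observed = s->done;
	return NULL;
}
```

Unlike spin_unlock_wait(), this can sleep, which is presumably acceptable
here only because the lock has become a mutex.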
* [PATCH 8/8] mm: Optimize page_lock_anon_vma() fast-path
2011-02-17 17:05 [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
` (6 preceding siblings ...)
2011-02-17 17:05 ` [PATCH 7/8] mm: Convert anon_vma->lock to a mutex Peter Zijlstra
@ 2011-02-17 17:05 ` Peter Zijlstra
2011-02-17 17:36 ` [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
8 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2011-02-17 17:05 UTC (permalink / raw)
To: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds
Cc: linux-kernel, linux-arch, linux-mm, Benjamin Herrenschmidt,
David Miller, Hugh Dickins, Mel Gorman, Nick Piggin,
Peter Zijlstra, Paul McKenney, Yanmin Zhang
[-- Attachment #1: peter_zijlstra-mm-optimize_page_lock_anon_vma_fast-path.patch --]
[-- Type: text/plain, Size: 3739 bytes --]
Optimize the page_lock_anon_vma() fast path to be one LOCKed op,
instead of two.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/rmap.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 81 insertions(+), 4 deletions(-)
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -371,20 +371,75 @@ struct anon_vma *page_get_anon_vma(struc
return anon_vma;
}
+/*
+ * Similar to page_get_anon_vma() except it locks the anon_vma.
+ *
+ * It's a little more complex as it tries to keep the fast path to a single
+ * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
+ * reference like with page_get_anon_vma() and then block on the mutex.
+ */
struct anon_vma *page_lock_anon_vma(struct page *page)
{
- struct anon_vma *anon_vma = page_get_anon_vma(page);
+ struct anon_vma *anon_vma = NULL;
+ unsigned long anon_mapping;
- if (anon_vma)
- anon_vma_lock(anon_vma);
+ rcu_read_lock();
+ anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
+ if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
+ goto out;
+ if (!page_mapped(page))
+ goto out;
+
+ anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
+ if (mutex_trylock(&anon_vma->root->mutex)) {
+ /*
+ * If we observe a !0 refcount, then holding the lock ensures
+ * the anon_vma will not go away, see __put_anon_vma().
+ */
+ if (!atomic_read(&anon_vma->refcount)) {
+ anon_vma_unlock(anon_vma);
+ anon_vma = NULL;
+ }
+ goto out;
+ }
+
+ /* trylock failed, we have to sleep */
+ if (!atomic_inc_not_zero(&anon_vma->refcount)) {
+ anon_vma = NULL;
+ goto out;
+ }
+
+ if (!page_mapped(page)) {
+ put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ goto out;
+ }
+ /* we pinned the anon_vma, it's safe to sleep */
+ rcu_read_unlock();
+ anon_vma_lock(anon_vma);
+
+ if (atomic_dec_and_test(&anon_vma->refcount)) {
+ /*
+ * Oops, we held the last refcount, release the lock
+ * and bail -- can't simply use put_anon_vma() because
+ * we'll deadlock on the anon_vma_lock() recursion.
+ */
+ anon_vma_unlock(anon_vma);
+ __put_anon_vma(anon_vma);
+ anon_vma = NULL;
+ }
+
+ return anon_vma;
+
+out:
+ rcu_read_unlock();
return anon_vma;
}
void page_unlock_anon_vma(struct anon_vma *anon_vma)
{
anon_vma_unlock(anon_vma);
- put_anon_vma(anon_vma);
}
/*
@@ -1500,6 +1555,28 @@ int try_to_munlock(struct page *page)
void __put_anon_vma(struct anon_vma *anon_vma)
{
+ /*
+ * Synchronize against page_lock_anon_vma() such that
+ * we can safely hold the lock without the anon_vma getting
+ * freed.
+ *
+ * Relies on the full mb implied by the atomic_dec_and_test() from
+ * put_anon_vma() against the acquire barrier implied by
+ * mutex_trylock() from page_lock_anon_vma(). This orders:
+ *
+ * page_lock_anon_vma() VS put_anon_vma()
+ * mutex_trylock() atomic_dec_and_test()
+ * LOCK MB
+ * atomic_read() mutex_is_locked()
+ *
+ * LOCK should suffice since the actual taking of the lock must
+ * happen _before_ what follows.
+ */
+ if (mutex_is_locked(&anon_vma->root->mutex)) {
+ anon_vma_lock(anon_vma);
+ anon_vma_unlock(anon_vma);
+ }
+
if (anon_vma->root != anon_vma)
put_anon_vma(anon_vma->root);
anon_vma_free(anon_vma);
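The fast-path/slow-path split in page_lock_anon_vma() above can be
sketched in userspace roughly as follows. This is only an illustration
under stated assumptions: it uses pthreads and C11 atomics, leaves out the
RCU read-side section and the page_mapped() checks, and struct object,
object_lock() and object_unlock() are hypothetical names that do not
appear in the patch.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct anon_vma. */
struct object {
	pthread_mutex_t mutex;
	atomic_int refcount;
};

/*
 * Return the object locked, or NULL if it is already dead.
 * Fast path: one trylock followed by a plain read of the refcount.
 * Slow path: pin the object with a reference, block on the mutex,
 * then drop the pin again.
 */
static struct object *object_lock(struct object *obj)
{
	if (pthread_mutex_trylock(&obj->mutex) == 0) {
		/* A zero refcount means the object is being torn down. */
		if (atomic_load(&obj->refcount) == 0) {
			pthread_mutex_unlock(&obj->mutex);
			return NULL;
		}
		return obj;
	}

	/* Trylock failed: take a reference unless the count is already 0. */
	int old = atomic_load(&obj->refcount);
	do {
		if (old == 0)
			return NULL;
	} while (!atomic_compare_exchange_weak(&obj->refcount, &old, old + 1));

	pthread_mutex_lock(&obj->mutex);

	/* Drop the pin; if it was the last reference, give up the lock. */
	if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
		pthread_mutex_unlock(&obj->mutex);
		return NULL;	/* a real caller would free the object here */
	}
	return obj;
}

static void object_unlock(struct object *obj)
{
	pthread_mutex_unlock(&obj->mutex);
}
```

On the uncontended path the only atomic read-modify-write is the trylock
itself, which is the point of the optimization. The reference count is
touched only when the caller has to block.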
* Re: [PATCH 0/8] mm: Preemptibility -v8
2011-02-17 17:05 [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
` (7 preceding siblings ...)
2011-02-17 17:05 ` [PATCH 8/8] mm: Optimize page_lock_anon_vma() fast-path Peter Zijlstra
@ 2011-02-17 17:36 ` Peter Zijlstra
8 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2011-02-17 17:36 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Avi Kivity, Thomas Gleixner, Rik van Riel, Ingo Molnar, akpm,
Linus Torvalds, linux-kernel, linux-arch, linux-mm,
Benjamin Herrenschmidt, David Miller, Hugh Dickins, Mel Gorman,
Nick Piggin, Paul McKenney, Yanmin Zhang
On Thu, 2011-02-17 at 18:05 +0100, Peter Zijlstra wrote:
> This series depends on the previous two series:
> - mm: Simplify anon_vma lifetime rules
> - mm: mmu_gather rework
>
> These patches make part of the mm a lot more preemptible. It converts
> i_mmap_lock and anon_vma->lock to mutexes which together with the mmu_gather
> rework makes mmu_gather preemptible as well.
>
> Making i_mmap_lock a mutex also enables a clean-up of the truncate code.
>
> This also allows for preemptible mmu_notifiers, something that XPMEM I think
> wants.
---
Documentation/lockstat.txt | 2
Documentation/vm/locking | 2
arch/x86/mm/hugetlbpage.c | 4
fs/gfs2/main.c | 2
fs/hugetlbfs/inode.c | 4
fs/inode.c | 2
fs/nilfs2/page.c | 2
include/linux/fs.h | 3
include/linux/huge_mm.h | 8 -
include/linux/lockdep.h | 3
include/linux/mm.h | 2
include/linux/mm_types.h | 1
include/linux/mmu_notifier.h | 2
include/linux/mutex.h | 9 +
include/linux/rmap.h | 29 +-----
kernel/fork.c | 5 -
kernel/mutex.c | 25 +++--
mm/filemap.c | 10 +-
mm/filemap_xip.c | 4
mm/fremap.c | 4
mm/huge_memory.c | 4
mm/hugetlb.c | 14 +--
mm/memory-failure.c | 4
mm/memory.c | 197 ++++++-------------------------------------
mm/migrate.c | 17 ---
mm/mmap.c | 43 +++------
mm/mremap.c | 7 -
mm/rmap.c | 171 +++++++++++++++++++++++++++++--------
28 files changed, 258 insertions(+), 322 deletions(-)
* Re: [PATCH 2/8] mm: Remove i_mmap_mutex lockbreak
2011-02-17 17:05 ` [PATCH 2/8] mm: Remove i_mmap_mutex lockbreak Peter Zijlstra
@ 2011-02-17 17:46 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 11+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-17 17:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrea Arcangeli, Avi Kivity, Thomas Gleixner, Rik van Riel,
Ingo Molnar, akpm, Linus Torvalds, linux-kernel, linux-arch,
linux-mm, Benjamin Herrenschmidt, David Miller, Hugh Dickins,
Mel Gorman, Nick Piggin, Paul McKenney, Yanmin Zhang
On Thu, 17 Feb 2011 18:05:22 +0100
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Hugh says:
> "The only significant loser, I think, would be page reclaim (when
> concurrent with truncation): could spin for a long time waiting for
> the i_mmap_mutex it expects would soon be dropped? "
>
> Counter points:
> - cpu contention makes the spin stop (need_resched())
> - zap pages should be freeing pages at a higher rate than reclaim
> ever can
> - shouldn't hold up reclaim more than lock_page() would
>
> I think the simplification of the truncate code is definitely worth
> it.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
I may have to improve batched uncharge in memcg, since it depends on
ZAP_BLOCK_SIZE, but the zap routine looks cleaner.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Thread overview: 11+ messages
2011-02-17 17:05 [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra
2011-02-17 17:05 ` [PATCH 1/8] lockdep, mutex: Provide mutex_lock_nest_lock Peter Zijlstra
2011-02-17 17:05 ` [PATCH 2/8] mm: Remove i_mmap_mutex lockbreak Peter Zijlstra
2011-02-17 17:46 ` KAMEZAWA Hiroyuki
2011-02-17 17:05 ` [PATCH 3/8] mm: Convert i_mmap_lock to a mutex Peter Zijlstra
2011-02-17 17:05 ` [PATCH 4/8] mm: Revert page_lock_anon_vma() lock annotation Peter Zijlstra
2011-02-17 17:05 ` [PATCH 5/8] mm: Improve page_lock_anon_vma() comment Peter Zijlstra
2011-02-17 17:05 ` [PATCH 6/8] mm: Use refcounts for page_lock_anon_vma() Peter Zijlstra
2011-02-17 17:05 ` [PATCH 7/8] mm: Convert anon_vma->lock to a mutex Peter Zijlstra
2011-02-17 17:05 ` [PATCH 8/8] mm: Optimize page_lock_anon_vma() fast-path Peter Zijlstra
2011-02-17 17:36 ` [PATCH 0/8] mm: Preemptibility -v8 Peter Zijlstra