* [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas
@ 2012-10-31 10:33 Michel Lespinasse
2012-10-31 10:33 ` [RFC PATCH 1/6] mm: augment vma rbtree with rb_subtree_gap Michel Lespinasse
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 10:33 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm
Earlier this year, Rik proposed using augmented rbtrees to optimize
our search for a suitable unmapped area during mmap(). This prompted
my work on improving the augmented rbtree code. Rik doesn't currently
seem to have time to follow up on his idea, so I'm sending this series
to revive it.
These changes are against v3.7-rc3. I have only converted the generic
and x86_64 search implementations so far, as I'm really looking for
comments on the general approach; however it shouldn't be too
difficult to convert other architectures in the same way (and
eventually the drivers that define their own f_op->get_unmapped_area
method as well).
Patch 1 augments the VMA rbtree with a new rb_subtree_gap field,
indicating the length of the largest gap immediately preceding any
VMAs in a subtree.
Patch 2 adds new checks to CONFIG_DEBUG_VM_RB to verify the above
information is correctly maintained.
Patch 3 rearranges the vm_area_struct layout so that rbtree searches only
need data contained in the first cache line (this one is from
Rik's original patch series).
Patch 4 adds a generic vm_unmapped_area() search function, which
allows searching for an unmapped address range of any desired length,
within [low; high[ address constraints, and with any desired alignment.
The generic arch_get_unmapped_area[_topdown] functions are also converted
to use this.
Patch 5 converts the x86_64 arch_get_unmapped_area[_topdown] functions
to use vm_unmapped_area() as well.
Patch 6 fixes cache coloring on x86_64, as suggested by Rik in his
previous series.
My own feeling about this series is that I'm fairly confident in the
robustness of my vm_unmapped_area() implementation; however, I would
like to confirm that people are happy with this new interface. Also
the code that figures out what constraints to pass to
vm_unmapped_area() is a bit odd; I have tried to make the constraints
match the behavior of the current code but it's not clear to me if
that behavior makes sense in the first place.
There is also the question of performance. I remember from IRC
conversations that someone (I think it was Mel ?) had found some
regressions with Rik's prior proposal. My current proposal is
significantly faster at updating the rbtrees; however there is still
the fact that vm_unmapped_area() does not use a free area cache as the
old brute-force implementation did. I don't expect that to be a
problem, but I have not confirmed yet if the prior regressions are
gone (and if they are still present, one would want to find out if
they are introduced by rbtree maintenance in Patch 1, or by removing
the free area cache as part of patches 4/5).
Michel Lespinasse (5):
mm: augment vma rbtree with rb_subtree_gap
mm: check rb_subtree_gap correctness
mm: vm_unmapped_area() lookup function
mm: use vm_unmapped_area() on x86_64 architecture
mm: fix cache coloring on x86_64
Rik van Riel (1):
mm: rearrange vm_area_struct for fewer cache misses
arch/x86/include/asm/elf.h | 6 +-
arch/x86/kernel/sys_x86_64.c | 151 +++-----------
arch/x86/vdso/vma.c | 2 +-
include/linux/mm.h | 31 +++
include/linux/mm_types.h | 19 ++-
mm/mmap.c | 450 ++++++++++++++++++++++++++++++++----------
6 files changed, 423 insertions(+), 236 deletions(-)
--
1.7.7.3
* [RFC PATCH 1/6] mm: augment vma rbtree with rb_subtree_gap
2012-10-31 10:33 [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
@ 2012-10-31 10:33 ` Michel Lespinasse
2012-11-01 21:43 ` Rik van Riel
2012-10-31 10:33 ` [RFC PATCH 2/6] mm: check rb_subtree_gap correctness Michel Lespinasse
` (5 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 10:33 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm
Define vma->rb_subtree_gap as the largest gap between any vma in the
subtree rooted at that vma and its predecessor. Or, for a recursive
definition (a worked example follows the list), vma->rb_subtree_gap is
the max of:
- vma->vm_start - vma->vm_prev->vm_end
- the rb_subtree_gap fields of the vmas pointed to by vma->rb.rb_left
and vma->rb.rb_right
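As a purely illustrative example (addresses made up, not from the patch),
consider three vmas with B at the rbtree root and A, C as its children:

    A = [0x08000, 0x0a000)  B = [0x10000, 0x12000)  C = [0x20000, 0x24000)

    gap before A: 0x08000 - 0       = 0x08000   (A has no vm_prev)
    gap before B: 0x10000 - 0x0a000 = 0x06000
    gap before C: 0x20000 - 0x12000 = 0x0e000

    A->rb_subtree_gap = 0x08000                          (leaf: own gap)
    C->rb_subtree_gap = 0x0e000                          (leaf: own gap)
    B->rb_subtree_gap = max(0x06000, 0x08000, 0x0e000)   = 0x0e000

A search for, say, a 0x0c000 byte gap can then prune any subtree whose
rb_subtree_gap is below that value and only descend where a large enough
gap is known to exist.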
This will allow get_unmapped_area_* to find a free area of the right
size in O(log(N)) time, instead of potentially having to do a linear
walk across all the VMAs.
Also define mm->highest_vm_end as the vm_end field of the highest vma,
so that we can easily check if the following gap is suitable.
This does have the potential to make unmapping VMAs more expensive,
especially for processes with very large numbers of VMAs, where the
VMA rbtree can grow quite deep.
Signed-off-by: Michel Lespinasse <walken@google.com>
---
include/linux/mm_types.h | 9 ++++
mm/mmap.c | 114 ++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 114 insertions(+), 9 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 31f8a3af7d94..94fa52b28ee8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -238,6 +238,14 @@ struct vm_area_struct {
struct rb_node vm_rb;
/*
+ * Largest free memory gap in bytes to the left of this VMA.
+ * Either between this VMA and vma->vm_prev, or between one of the
+ * VMAs below us in the VMA rbtree and its ->vm_prev. This helps
+ * get_unmapped_area find a free area of the right size.
+ */
+ unsigned long rb_subtree_gap;
+
+ /*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree, or
* linkage of vma in the address_space->i_mmap_nonlinear list.
@@ -322,6 +330,7 @@ struct mm_struct {
unsigned long task_size; /* size of task vm space */
unsigned long cached_hole_size; /* if non-zero, the largest hole below free_area_cache */
unsigned long free_area_cache; /* first hole of size cached_hole_size or larger */
+ unsigned long highest_vm_end; /* highest vma end address */
pgd_t * pgd;
atomic_t mm_users; /* How many users with user space? */
atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
diff --git a/mm/mmap.c b/mm/mmap.c
index 2d942353d681..4b14c4070305 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -31,6 +31,7 @@
#include <linux/audit.h>
#include <linux/khugepaged.h>
#include <linux/uprobes.h>
+#include <linux/rbtree_augmented.h>
#include <asm/uaccess.h>
#include <asm/cacheflush.h>
@@ -297,6 +298,70 @@ out:
return retval;
}
+static long vma_compute_subtree_gap(struct vm_area_struct *vma)
+{
+ unsigned long max, subtree_gap;
+ max = vma->vm_start;
+ if (vma->vm_prev)
+ max -= vma->vm_prev->vm_end;
+ if (vma->vm_rb.rb_left) {
+ subtree_gap = rb_entry(vma->vm_rb.rb_left,
+ struct vm_area_struct, vm_rb)->rb_subtree_gap;
+ if (subtree_gap > max)
+ max = subtree_gap;
+ }
+ if (vma->vm_rb.rb_right) {
+ subtree_gap = rb_entry(vma->vm_rb.rb_right,
+ struct vm_area_struct, vm_rb)->rb_subtree_gap;
+ if (subtree_gap > max)
+ max = subtree_gap;
+ }
+ return max;
+}
+
+RB_DECLARE_CALLBACKS(static, vma_gap_callbacks, struct vm_area_struct, vm_rb,
+ unsigned long, rb_subtree_gap, vma_compute_subtree_gap)
+
+/*
+ * Update augmented rbtree rb_subtree_gap values after vma->vm_start or
+ * vma->vm_prev->vm_end values changed, without modifying the vma's position
+ * in the rbtree.
+ */
+static void vma_gap_update(struct vm_area_struct *vma)
+{
+ /*
+ * As it turns out, RB_DECLARE_CALLBACKS() already created a callback
+ * function that does exactly what we want.
+ */
+ vma_gap_callbacks_propagate(&vma->vm_rb, NULL);
+}
+
+static inline void vma_rb_insert(struct vm_area_struct *vma,
+ struct rb_root *root)
+{
+ /*
+ * vma->vm_prev wasn't known when we followed the rbtree to find the
+ * correct insertion point for that vma. As a result, we could not
+ * update the vma vm_rb parents rb_subtree_gap values on the way down.
+ * So, we first insert the vma with a zero rb_subtree_gap value
+ * (to be consistent with what we did on the way down), and then
+ * immediately update the gap to the correct value.
+ */
+ vma->rb_subtree_gap = 0;
+ rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
+ vma_gap_update(vma);
+}
+
+static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root)
+{
+ /*
+ * Note rb_erase_augmented is a fairly large inline function,
+ * so make sure we instantiate it only once with our desired
+ * augmented rbtree callbacks.
+ */
+ rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
+}
+
#ifdef CONFIG_DEBUG_VM_RB
static int browse_rb(struct rb_root *root)
{
@@ -420,7 +485,11 @@ void __vma_link_rb(struct mm_struct *mm, struct vm_area_struct *vma,
struct rb_node **rb_link, struct rb_node *rb_parent)
{
rb_link_node(&vma->vm_rb, rb_parent, rb_link);
- rb_insert_color(&vma->vm_rb, &mm->mm_rb);
+ vma_rb_insert(vma, &mm->mm_rb);
+ if (vma->vm_next)
+ vma_gap_update(vma->vm_next);
+ else
+ mm->highest_vm_end = vma->vm_end;
}
static void __vma_link_file(struct vm_area_struct *vma)
@@ -501,7 +570,7 @@ __vma_unlink(struct mm_struct *mm, struct vm_area_struct *vma,
prev->vm_next = next;
if (next)
next->vm_prev = prev;
- rb_erase(&vma->vm_rb, &mm->mm_rb);
+ vma_rb_erase(vma, &mm->mm_rb);
if (mm->mmap_cache == vma)
mm->mmap_cache = prev;
}
@@ -523,6 +592,7 @@ int vma_adjust(struct vm_area_struct *vma, unsigned long start,
struct rb_root *root = NULL;
struct anon_vma *anon_vma = NULL;
struct file *file = vma->vm_file;
+ bool start_changed = false, end_changed = false;
long adjust_next = 0;
int remove_next = 0;
@@ -613,8 +683,14 @@ again: remove_next = 1 + (end > next->vm_end);
vma_interval_tree_remove(next, root);
}
- vma->vm_start = start;
- vma->vm_end = end;
+ if (start != vma->vm_start) {
+ vma->vm_start = start;
+ start_changed = true;
+ }
+ if (end != vma->vm_end) {
+ vma->vm_end = end;
+ end_changed = true;
+ }
vma->vm_pgoff = pgoff;
if (adjust_next) {
next->vm_start += adjust_next << PAGE_SHIFT;
@@ -643,6 +719,15 @@ again: remove_next = 1 + (end > next->vm_end);
* (it may either follow vma or precede it).
*/
__insert_vm_struct(mm, insert);
+ } else {
+ if (start_changed)
+ vma_gap_update(vma);
+ if (end_changed) {
+ if (!next)
+ mm->highest_vm_end = end;
+ else if (!adjust_next)
+ vma_gap_update(next);
+ }
}
if (anon_vma) {
@@ -676,10 +761,13 @@ again: remove_next = 1 + (end > next->vm_end);
* we must remove another next too. It would clutter
* up the code too much to do both in one go.
*/
- if (remove_next == 2) {
- next = vma->vm_next;
+ next = vma->vm_next;
+ if (remove_next == 2)
goto again;
- }
+ else if (next)
+ vma_gap_update(next);
+ else
+ mm->highest_vm_end = end;
}
if (insert && file)
uprobe_mmap(insert);
@@ -1781,6 +1869,10 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
anon_vma_interval_tree_pre_update_vma(vma);
vma->vm_end = address;
anon_vma_interval_tree_post_update_vma(vma);
+ if (vma->vm_next)
+ vma_gap_update(vma->vm_next);
+ else
+ mm->highest_vm_end = address;
perf_event_mmap(vma);
}
}
@@ -1835,6 +1927,7 @@ int expand_downwards(struct vm_area_struct *vma,
vma->vm_start = address;
vma->vm_pgoff -= grow;
anon_vma_interval_tree_post_update_vma(vma);
+ vma_gap_update(vma);
perf_event_mmap(vma);
}
}
@@ -1957,14 +2050,17 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
insertion_point = (prev ? &prev->vm_next : &mm->mmap);
vma->vm_prev = NULL;
do {
- rb_erase(&vma->vm_rb, &mm->mm_rb);
+ vma_rb_erase(vma, &mm->mm_rb);
mm->map_count--;
tail_vma = vma;
vma = vma->vm_next;
} while (vma && vma->vm_start < end);
*insertion_point = vma;
- if (vma)
+ if (vma) {
vma->vm_prev = prev;
+ vma_gap_update(vma);
+ } else
+ mm->highest_vm_end = prev ? prev->vm_end : 0;
tail_vma->vm_next = NULL;
if (mm->unmap_area == arch_unmap_area)
addr = prev ? prev->vm_end : mm->mmap_base;
--
1.7.7.3
* [RFC PATCH 2/6] mm: check rb_subtree_gap correctness
2012-10-31 10:33 [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
2012-10-31 10:33 ` [RFC PATCH 1/6] mm: augment vma rbtree with rb_subtree_gap Michel Lespinasse
@ 2012-10-31 10:33 ` Michel Lespinasse
2012-11-01 21:49 ` Rik van Riel
2012-10-31 10:33 ` [RFC PATCH 3/6] mm: rearrange vm_area_struct for fewer cache misses Michel Lespinasse
` (4 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 10:33 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm
When CONFIG_DEBUG_VM_RB is enabled, check that rb_subtree_gap is
correctly set for every vma and that mm->highest_vm_end is also correct.
Also add an explicit 'bug' variable to track if browse_rb() detected any
invalid condition.
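For illustration (the values are made up): with CONFIG_DEBUG_VM_RB=y, a
stale rb_subtree_gap value would now be reported by browse_rb() as
something like

    free gap 3000, correct 5000

and, since browse_rb() returns -1 whenever any check fired, validate_mm()
then flags the mm as buggy as well.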
Signed-off-by: Michel Lespinasse <walken@google.com>
---
mm/mmap.c | 24 ++++++++++++++++--------
1 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 4b14c4070305..548fed471398 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -365,7 +365,7 @@ static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root)
#ifdef CONFIG_DEBUG_VM_RB
static int browse_rb(struct rb_root *root)
{
- int i = 0, j;
+ int i = 0, j, bug = 0;
struct rb_node *nd, *pn = NULL;
unsigned long prev = 0, pend = 0;
@@ -373,39 +373,47 @@ static int browse_rb(struct rb_root *root)
struct vm_area_struct *vma;
vma = rb_entry(nd, struct vm_area_struct, vm_rb);
if (vma->vm_start < prev)
- printk("vm_start %lx prev %lx\n", vma->vm_start, prev), i = -1;
+ printk("vm_start %lx prev %lx\n", vma->vm_start, prev), bug = 1;
if (vma->vm_start < pend)
- printk("vm_start %lx pend %lx\n", vma->vm_start, pend);
+ printk("vm_start %lx pend %lx\n", vma->vm_start, pend), bug = 1;
if (vma->vm_start > vma->vm_end)
- printk("vm_end %lx < vm_start %lx\n", vma->vm_end, vma->vm_start);
+ printk("vm_end %lx < vm_start %lx\n", vma->vm_end, vma->vm_start), bug = 1;
+ if (vma->rb_subtree_gap != vma_compute_subtree_gap(vma))
+ printk("free gap %lx, correct %lx\n",
+ vma->rb_subtree_gap,
+ vma_compute_subtree_gap(vma)), bug = 1;
i++;
pn = nd;
prev = vma->vm_start;
pend = vma->vm_end;
}
j = 0;
- for (nd = pn; nd; nd = rb_prev(nd)) {
+ for (nd = pn; nd; nd = rb_prev(nd))
j++;
- }
if (i != j)
- printk("backwards %d, forwards %d\n", j, i), i = 0;
- return i;
+ printk("backwards %d, forwards %d\n", j, i), bug = 1;
+ return bug ? -1 : i;
}
void validate_mm(struct mm_struct *mm)
{
int bug = 0;
int i = 0;
+ unsigned long highest_address = 0;
struct vm_area_struct *vma = mm->mmap;
while (vma) {
struct anon_vma_chain *avc;
list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
anon_vma_interval_tree_verify(avc);
+ highest_address = vma->vm_end;
vma = vma->vm_next;
i++;
}
if (i != mm->map_count)
printk("map_count %d vm_next %d\n", mm->map_count, i), bug = 1;
+ if (highest_address != mm->highest_vm_end)
+ printk("mm->highest_vm_end %lx, found %lx\n",
+ mm->highest_vm_end, highest_address), bug = 1;
i = browse_rb(&mm->mm_rb);
if (i != mm->map_count)
printk("map_count %d rb %d\n", mm->map_count, i), bug = 1;
--
1.7.7.3
* [RFC PATCH 3/6] mm: rearrange vm_area_struct for fewer cache misses
2012-10-31 10:33 [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
2012-10-31 10:33 ` [RFC PATCH 1/6] mm: augment vma rbtree with rb_subtree_gap Michel Lespinasse
2012-10-31 10:33 ` [RFC PATCH 2/6] mm: check rb_subtree_gap correctness Michel Lespinasse
@ 2012-10-31 10:33 ` Michel Lespinasse
2012-10-31 10:33 ` [RFC PATCH 4/6] mm: vm_unmapped_area() lookup function Michel Lespinasse
` (3 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 10:33 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm, Rik van Riel
From: Rik van Riel <riel@surriel.com>
The kernel walks the VMA rbtree in various places, including
the page fault path. However, the vm_rb node spanned two
cache lines on 64-bit systems with 64-byte cache lines (most
x86 systems).
Rearrange vm_area_struct a little, so all the information we
need to do a VMA tree walk is in the first cache line.
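For what it's worth, on a 64-bit build the fields above add up to exactly
one 64-byte line (vm_start + vm_end + vm_next + vm_prev = 32 bytes, struct
rb_node = 24 bytes, rb_subtree_gap = 8 bytes). A build-time assertion along
these lines (just a sketch, not part of this patch) would catch future
layout regressions:

    /* sketch: keep the VMA tree-walk fields within the first cache line */
    BUILD_BUG_ON(offsetof(struct vm_area_struct, rb_subtree_gap) +
                 sizeof(unsigned long) > 64);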
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Michel Lespinasse <walken@google.com>
---
include/linux/mm_types.h | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 94fa52b28ee8..528da4abf8ee 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -224,7 +224,8 @@ struct vm_region {
* library, the executable area etc).
*/
struct vm_area_struct {
- struct mm_struct * vm_mm; /* The address space we belong to. */
+ /* The first cache line has the info for VMA tree walking. */
+
unsigned long vm_start; /* Our start address within vm_mm. */
unsigned long vm_end; /* The first byte after our end address
within vm_mm. */
@@ -232,9 +233,6 @@ struct vm_area_struct {
/* linked list of VM areas per task, sorted by address */
struct vm_area_struct *vm_next, *vm_prev;
- pgprot_t vm_page_prot; /* Access permissions of this VMA. */
- unsigned long vm_flags; /* Flags, see mm.h. */
-
struct rb_node vm_rb;
/*
@@ -245,6 +243,12 @@ struct vm_area_struct {
*/
unsigned long rb_subtree_gap;
+ /* Second cache line starts here. */
+
+ struct mm_struct * vm_mm; /* The address space we belong to. */
+ pgprot_t vm_page_prot; /* Access permissions of this VMA. */
+ unsigned long vm_flags; /* Flags, see mm.h. */
+
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree, or
--
1.7.7.3
* [RFC PATCH 4/6] mm: vm_unmapped_area() lookup function
2012-10-31 10:33 [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
` (2 preceding siblings ...)
2012-10-31 10:33 ` [RFC PATCH 3/6] mm: rearrange vm_area_struct for fewer cache misses Michel Lespinasse
@ 2012-10-31 10:33 ` Michel Lespinasse
2012-11-02 16:52 ` Rik van Riel
2012-10-31 10:33 ` [RFC PATCH 5/6] mm: use vm_unmapped_area() on x86_64 architecture Michel Lespinasse
` (2 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 10:33 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm
Implement vm_unmapped_area() using the rb_subtree_gap and highest_vm_end
information to search for suitable virtual address space gaps.
struct vm_unmapped_area_info is used to define the desired allocation
request:
- lowest or highest possible address matching the remaining constraints
- desired gap length
- low/high address limits that the gap must fit into
- alignment mask and offset
Also update the generic arch_get_unmapped_area[_topdown] functions to
make use of vm_unmapped_area() instead of implementing a brute force search.
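As a rough usage sketch (the values here are just what the generic
bottom-up conversion below ends up passing), a caller fills the struct
and calls vm_unmapped_area():

    struct vm_unmapped_area_info info;

    info.flags = 0;              /* bottom-up; VM_UNMAPPED_AREA_TOPDOWN for top-down */
    info.length = len;           /* size of the gap we need */
    info.low_limit = TASK_UNMAPPED_BASE;   /* gap must fit in [low, high[ */
    info.high_limit = TASK_SIZE;
    info.align_mask = 0;         /* no alignment constraint */
    info.align_offset = 0;
    addr = vm_unmapped_area(&info);

The return value is either a suitable page-aligned start address or a
negative errno (-ENOMEM) when no gap matching the constraints exists.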
Signed-off-by: Michel Lespinasse <walken@google.com>
---
include/linux/mm.h | 31 +++++
mm/mmap.c | 312 +++++++++++++++++++++++++++++++++++++---------------
2 files changed, 253 insertions(+), 90 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa0680402738..8dea2a732505 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1456,6 +1456,37 @@ extern unsigned long vm_mmap(struct file *, unsigned long,
unsigned long, unsigned long,
unsigned long, unsigned long);
+struct vm_unmapped_area_info {
+#define VM_UNMAPPED_AREA_TOPDOWN 1
+ unsigned long flags;
+ unsigned long length;
+ unsigned long low_limit;
+ unsigned long high_limit;
+ unsigned long align_mask;
+ unsigned long align_offset;
+};
+
+extern unsigned long unmapped_area(struct vm_unmapped_area_info *info);
+extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
+
+/*
+ * Search for an unmapped address range.
+ *
+ * We are looking for a range that:
+ * - does not intersect with any VMA;
+ * - is contained within the [low_limit, high_limit[ interval;
+ * - is at least the desired size.
+ * - satisfies (begin_addr & align_mask) == (align_offset & align_mask)
+ */
+static inline unsigned long
+vm_unmapped_area(struct vm_unmapped_area_info *info)
+{
+ if (!(info->flags & VM_UNMAPPED_AREA_TOPDOWN))
+ return unmapped_area(info);
+ else
+ return unmapped_area_topdown(info);
+}
+
/* truncate.c */
extern void truncate_inode_pages(struct address_space *, loff_t);
extern void truncate_inode_pages_range(struct address_space *,
diff --git a/mm/mmap.c b/mm/mmap.c
index 548fed471398..7f017468f1d0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1494,6 +1494,206 @@ unacct_error:
return error;
}
+unsigned long unmapped_area(struct vm_unmapped_area_info *info)
+{
+ /*
+ * We implement the search by looking for an rbtree node that
+ * immediately follows a suitable gap. That is,
+ * - gap_start = vma->vm_prev->vm_end <= info->high_limit - length;
+ * - gap_end = vma->vm_start >= info->low_limit + length;
+ * - gap_end - gap_start >= length
+ */
+
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ unsigned long length, low_limit, high_limit, gap_start, gap_end;
+
+ /* Adjust search length to account for worst case alignment overhead */
+ length = info->length + info->align_mask;
+ if (length < info->length)
+ return -ENOMEM;
+
+ /* Adjust search limits by the desired length */
+ if (info->high_limit < length)
+ return -ENOMEM;
+ high_limit = info->high_limit - length;
+
+ if (info->low_limit > high_limit)
+ return -ENOMEM;
+ low_limit = info->low_limit + length;
+
+ /* Check if rbtree root looks promising */
+ if (RB_EMPTY_ROOT(&mm->mm_rb))
+ goto check_highest;
+ vma = rb_entry(mm->mm_rb.rb_node, struct vm_area_struct, vm_rb);
+ if (vma->rb_subtree_gap < length)
+ goto check_highest;
+
+ while (true) {
+ /* Visit left subtree if it looks promising */
+ gap_end = vma->vm_start;
+ if (gap_end >= low_limit && vma->vm_rb.rb_left) {
+ struct vm_area_struct *left =
+ rb_entry(vma->vm_rb.rb_left,
+ struct vm_area_struct, vm_rb);
+ if (left->rb_subtree_gap >= length) {
+ vma = left;
+ continue;
+ }
+ }
+
+ gap_start = vma->vm_prev ? vma->vm_prev->vm_end : 0;
+ check_current:
+ /* Check if current node has a suitable gap */
+ if (gap_start > high_limit)
+ return -ENOMEM;
+ if (gap_end >= low_limit && gap_end - gap_start >= length)
+ goto found;
+
+ /* Visit right subtree if it looks promising */
+ if (vma->vm_rb.rb_right) {
+ struct vm_area_struct *right =
+ rb_entry(vma->vm_rb.rb_right,
+ struct vm_area_struct, vm_rb);
+ if (right->rb_subtree_gap >= length) {
+ vma = right;
+ continue;
+ }
+ }
+
+ /* Go back up the rbtree to find next candidate node */
+ while (true) {
+ struct rb_node *prev = &vma->vm_rb;
+ if (!rb_parent(prev))
+ goto check_highest;
+ vma = rb_entry(rb_parent(prev),
+ struct vm_area_struct, vm_rb);
+ if (prev == vma->vm_rb.rb_left) {
+ gap_start = vma->vm_prev->vm_end;
+ gap_end = vma->vm_start;
+ goto check_current;
+ }
+ }
+ }
+
+check_highest:
+ /* Check highest gap, which does not precede any rbtree node */
+ gap_start = mm->highest_vm_end;
+ gap_end = ULONG_MAX; /* Only for VM_BUG_ON below */
+ if (gap_start > high_limit)
+ return -ENOMEM;
+
+found:
+ /* We found a suitable gap. Clip it with the original low_limit. */
+ if (gap_start < info->low_limit)
+ gap_start = info->low_limit;
+
+ /* Adjust gap address to the desired alignment */
+ gap_start += (info->align_offset - gap_start) & info->align_mask;
+
+ VM_BUG_ON(gap_start + info->length > info->high_limit);
+ VM_BUG_ON(gap_start + info->length > gap_end);
+ return gap_start;
+}
+
+unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ unsigned long length, low_limit, high_limit, gap_start, gap_end;
+
+ /* Adjust search length to account for worst case alignment overhead */
+ length = info->length + info->align_mask;
+ if (length < info->length)
+ return -ENOMEM;
+
+ /*
+ * Adjust search limits by the desired length.
+ * See implementation comment at top of unmapped_area().
+ */
+ gap_end = info->high_limit;
+ if (gap_end < length)
+ return -ENOMEM;
+ high_limit = gap_end - length;
+
+ if (info->low_limit > high_limit)
+ return -ENOMEM;
+ low_limit = info->low_limit + length;
+
+ /* Check highest gap, which does not precede any rbtree node */
+ gap_start = mm->highest_vm_end;
+ if (gap_start <= high_limit)
+ goto found_highest;
+
+ /* Check if rbtree root looks promising */
+ if (RB_EMPTY_ROOT(&mm->mm_rb))
+ return -ENOMEM;
+ vma = rb_entry(mm->mm_rb.rb_node, struct vm_area_struct, vm_rb);
+ if (vma->rb_subtree_gap < length)
+ return -ENOMEM;
+
+ while (true) {
+ /* Visit right subtree if it looks promising */
+ gap_start = vma->vm_prev ? vma->vm_prev->vm_end : 0;
+ if (gap_start <= high_limit && vma->vm_rb.rb_right) {
+ struct vm_area_struct *right =
+ rb_entry(vma->vm_rb.rb_right,
+ struct vm_area_struct, vm_rb);
+ if (right->rb_subtree_gap >= length) {
+ vma = right;
+ continue;
+ }
+ }
+
+ check_current:
+ /* Check if current node has a suitable gap */
+ gap_end = vma->vm_start;
+ if (gap_end < low_limit)
+ return -ENOMEM;
+ if (gap_start <= high_limit && gap_end - gap_start >= length)
+ goto found;
+
+ /* Visit left subtree if it looks promising */
+ if (vma->vm_rb.rb_left) {
+ struct vm_area_struct *left =
+ rb_entry(vma->vm_rb.rb_left,
+ struct vm_area_struct, vm_rb);
+ if (left->rb_subtree_gap >= length) {
+ vma = left;
+ continue;
+ }
+ }
+
+ /* Go back up the rbtree to find next candidate node */
+ while (true) {
+ struct rb_node *prev = &vma->vm_rb;
+ if (!rb_parent(prev))
+ return -ENOMEM;
+ vma = rb_entry(rb_parent(prev),
+ struct vm_area_struct, vm_rb);
+ if (prev == vma->vm_rb.rb_right) {
+ gap_start = vma->vm_prev ?
+ vma->vm_prev->vm_end : 0;
+ goto check_current;
+ }
+ }
+ }
+
+found:
+ /* We found a suitable gap. Clip it with the original high_limit. */
+ if (gap_end > info->high_limit)
+ gap_end = info->high_limit;
+
+found_highest:
+ /* Compute highest gap address at the desired alignment */
+ gap_end -= info->length;
+ gap_end -= (gap_end - info->align_offset) & info->align_mask;
+
+ VM_BUG_ON(gap_end < info->low_limit);
+ VM_BUG_ON(gap_end < gap_start);
+ return gap_end;
+}
+
/* Get an address range which is currently unmapped.
* For shmat() with addr=0.
*
@@ -1512,7 +1712,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- unsigned long start_addr;
+ struct vm_unmapped_area_info info;
if (len > TASK_SIZE)
return -ENOMEM;
@@ -1527,40 +1727,13 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
(!vma || addr + len <= vma->vm_start))
return addr;
}
- if (len > mm->cached_hole_size) {
- start_addr = addr = mm->free_area_cache;
- } else {
- start_addr = addr = TASK_UNMAPPED_BASE;
- mm->cached_hole_size = 0;
- }
-full_search:
- for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
- /* At this point: (!vma || addr < vma->vm_end). */
- if (TASK_SIZE - len < addr) {
- /*
- * Start a new search - just in case we missed
- * some holes.
- */
- if (start_addr != TASK_UNMAPPED_BASE) {
- addr = TASK_UNMAPPED_BASE;
- start_addr = addr;
- mm->cached_hole_size = 0;
- goto full_search;
- }
- return -ENOMEM;
- }
- if (!vma || addr + len <= vma->vm_start) {
- /*
- * Remember the place where we stopped the search:
- */
- mm->free_area_cache = addr + len;
- return addr;
- }
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
- addr = vma->vm_end;
- }
+ info.flags = 0;
+ info.length = len;
+ info.low_limit = TASK_UNMAPPED_BASE;
+ info.high_limit = TASK_SIZE;
+ info.align_mask = 0;
+ return vm_unmapped_area(&info);
}
#endif
@@ -1585,7 +1758,8 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
- unsigned long addr = addr0, start_addr;
+ unsigned long addr = addr0;
+ struct vm_unmapped_area_info info;
/* requested length too big for entire address space */
if (len > TASK_SIZE)
@@ -1603,53 +1777,12 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;
}
- /* check if free_area_cache is useful for us */
- if (len <= mm->cached_hole_size) {
- mm->cached_hole_size = 0;
- mm->free_area_cache = mm->mmap_base;
- }
-
-try_again:
- /* either no address requested or can't fit in requested address hole */
- start_addr = addr = mm->free_area_cache;
-
- if (addr < len)
- goto fail;
-
- addr -= len;
- do {
- /*
- * Lookup failure means no vma is above this address,
- * else if new region fits below vma->vm_start,
- * return with success:
- */
- vma = find_vma(mm, addr);
- if (!vma || addr+len <= vma->vm_start)
- /* remember the address as a hint for next time */
- return (mm->free_area_cache = addr);
-
- /* remember the largest hole we saw so far */
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
-
- /* try just below the current vma->vm_start */
- addr = vma->vm_start-len;
- } while (len < vma->vm_start);
-
-fail:
- /*
- * if hint left us with no space for the requested
- * mapping then try again:
- *
- * Note: this is different with the case of bottomup
- * which does the fully line-search, but we use find_vma
- * here that causes some holes skipped.
- */
- if (start_addr != mm->mmap_base) {
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = 0;
- goto try_again;
- }
+ info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+ info.length = len;
+ info.low_limit = 0;
+ info.high_limit = mm->mmap_base;
+ info.align_mask = 0;
+ addr = vm_unmapped_area(&info);
/*
* A failed mmap() very likely causes application failure,
@@ -1657,14 +1790,13 @@ fail:
* can happen with large stack limits and large mmap()
* allocations.
*/
- mm->cached_hole_size = ~0UL;
- mm->free_area_cache = TASK_UNMAPPED_BASE;
- addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
- /*
- * Restore the topdown base:
- */
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = ~0UL;
+ if (addr & ~PAGE_MASK) {
+ VM_BUG_ON(addr != -ENOMEM);
+ info.flags = 0;
+ info.low_limit = TASK_UNMAPPED_BASE;
+ info.high_limit = TASK_SIZE;
+ addr = vm_unmapped_area(&info);
+ }
return addr;
}
--
1.7.7.3
* [RFC PATCH 5/6] mm: use vm_unmapped_area() on x86_64 architecture
2012-10-31 10:33 [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
` (3 preceding siblings ...)
2012-10-31 10:33 ` [RFC PATCH 4/6] mm: vm_unmapped_area() lookup function Michel Lespinasse
@ 2012-10-31 10:33 ` Michel Lespinasse
2012-11-02 20:39 ` Rik van Riel
2012-10-31 10:33 ` [RFC PATCH 6/6] mm: fix cache coloring on x86_64 Michel Lespinasse
2012-10-31 11:03 ` [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
6 siblings, 1 reply; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 10:33 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm
Signed-off-by: Michel Lespinasse <walken@google.com>
---
arch/x86/include/asm/elf.h | 6 +-
arch/x86/kernel/sys_x86_64.c | 151 ++++++++---------------------------------
arch/x86/vdso/vma.c | 2 +-
3 files changed, 33 insertions(+), 126 deletions(-)
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 5939f44fe0c0..9c999c1674fa 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -354,12 +354,10 @@ static inline int mmap_is_ia32(void)
return 0;
}
-/* The first two values are special, do not change. See align_addr() */
+/* Do not change the values. See get_align_mask() */
enum align_flags {
ALIGN_VA_32 = BIT(0),
ALIGN_VA_64 = BIT(1),
- ALIGN_VDSO = BIT(2),
- ALIGN_TOPDOWN = BIT(3),
};
struct va_alignment {
@@ -368,5 +366,5 @@ struct va_alignment {
} ____cacheline_aligned;
extern struct va_alignment va_align;
-extern unsigned long align_addr(unsigned long, struct file *, enum align_flags);
+extern unsigned long align_vdso_addr(unsigned long);
#endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index b4d3c3927dd8..fa9227214753 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -21,37 +21,23 @@
/*
* Align a virtual address to avoid aliasing in the I$ on AMD F15h.
- *
- * @flags denotes the allocation direction - bottomup or topdown -
- * or vDSO; see call sites below.
*/
-unsigned long align_addr(unsigned long addr, struct file *filp,
- enum align_flags flags)
+static unsigned long get_align_mask(void)
{
- unsigned long tmp_addr;
-
/* handle 32- and 64-bit case with a single conditional */
if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
- return addr;
+ return 0;
if (!(current->flags & PF_RANDOMIZE))
- return addr;
-
- if (!((flags & ALIGN_VDSO) || filp))
- return addr;
-
- tmp_addr = addr;
-
- /*
- * We need an address which is <= than the original
- * one only when in topdown direction.
- */
- if (!(flags & ALIGN_TOPDOWN))
- tmp_addr += va_align.mask;
+ return 0;
- tmp_addr &= ~va_align.mask;
+ return va_align.mask;
+}
- return tmp_addr;
+unsigned long align_vdso_addr(unsigned long addr)
+{
+ unsigned long align_mask = get_align_mask();
+ return (addr + align_mask) & ~align_mask;
}
static int __init control_va_addr_alignment(char *str)
@@ -126,7 +112,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- unsigned long start_addr;
+ struct vm_unmapped_area_info info;
unsigned long begin, end;
if (flags & MAP_FIXED)
@@ -144,50 +130,16 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
(!vma || addr + len <= vma->vm_start))
return addr;
}
- if (((flags & MAP_32BIT) || test_thread_flag(TIF_ADDR32))
- && len <= mm->cached_hole_size) {
- mm->cached_hole_size = 0;
- mm->free_area_cache = begin;
- }
- addr = mm->free_area_cache;
- if (addr < begin)
- addr = begin;
- start_addr = addr;
-
-full_search:
-
- addr = align_addr(addr, filp, 0);
-
- for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
- /* At this point: (!vma || addr < vma->vm_end). */
- if (end - len < addr) {
- /*
- * Start a new search - just in case we missed
- * some holes.
- */
- if (start_addr != begin) {
- start_addr = addr = begin;
- mm->cached_hole_size = 0;
- goto full_search;
- }
- return -ENOMEM;
- }
- if (!vma || addr + len <= vma->vm_start) {
- /*
- * Remember the place where we stopped the search:
- */
- mm->free_area_cache = addr + len;
- return addr;
- }
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
- addr = vma->vm_end;
- addr = align_addr(addr, filp, 0);
- }
+ info.flags = 0;
+ info.length = len;
+ info.low_limit = begin;
+ info.high_limit = end;
+ info.align_mask = filp ? get_align_mask() : 0;
+ info.align_offset = 0;
+ return vm_unmapped_area(&info);
}
-
unsigned long
arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
const unsigned long len, const unsigned long pgoff,
@@ -195,7 +147,8 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
- unsigned long addr = addr0, start_addr;
+ unsigned long addr = addr0;
+ struct vm_unmapped_area_info info;
/* requested length too big for entire address space */
if (len > TASK_SIZE)
@@ -217,51 +170,16 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;
}
- /* check if free_area_cache is useful for us */
- if (len <= mm->cached_hole_size) {
- mm->cached_hole_size = 0;
- mm->free_area_cache = mm->mmap_base;
- }
-
-try_again:
- /* either no address requested or can't fit in requested address hole */
- start_addr = addr = mm->free_area_cache;
-
- if (addr < len)
- goto fail;
-
- addr -= len;
- do {
- addr = align_addr(addr, filp, ALIGN_TOPDOWN);
-
- /*
- * Lookup failure means no vma is above this address,
- * else if new region fits below vma->vm_start,
- * return with success:
- */
- vma = find_vma(mm, addr);
- if (!vma || addr+len <= vma->vm_start)
- /* remember the address as a hint for next time */
- return mm->free_area_cache = addr;
-
- /* remember the largest hole we saw so far */
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
-
- /* try just below the current vma->vm_start */
- addr = vma->vm_start-len;
- } while (len < vma->vm_start);
-
-fail:
- /*
- * if hint left us with no space for the requested
- * mapping then try again:
- */
- if (start_addr != mm->mmap_base) {
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = 0;
- goto try_again;
- }
+ info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+ info.length = len;
+ info.low_limit = 0; // XXX could be PAGE_SIZE ???
+ info.high_limit = mm->mmap_base;
+ info.align_mask = filp ? get_align_mask() : 0;
+ info.align_offset = 0;
+ addr = vm_unmapped_area(&info);
+ if (!(addr & ~PAGE_MASK))
+ return addr;
+ VM_BUG_ON(addr != -ENOMEM);
bottomup:
/*
@@ -270,14 +188,5 @@ bottomup:
* can happen with large stack limits and large mmap()
* allocations.
*/
- mm->cached_hole_size = ~0UL;
- mm->free_area_cache = TASK_UNMAPPED_BASE;
- addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
- /*
- * Restore the topdown base:
- */
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = ~0UL;
-
- return addr;
+ return arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
}
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 00aaf047b39f..431e87544411 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -141,7 +141,7 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
* unaligned here as a result of stack start randomization.
*/
addr = PAGE_ALIGN(addr);
- addr = align_addr(addr, NULL, ALIGN_VDSO);
+ addr = align_vdso_addr(addr);
return addr;
}
--
1.7.7.3
* [RFC PATCH 6/6] mm: fix cache coloring on x86_64
2012-10-31 10:33 [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
` (4 preceding siblings ...)
2012-10-31 10:33 ` [RFC PATCH 5/6] mm: use vm_unmapped_area() on x86_64 architecture Michel Lespinasse
@ 2012-10-31 10:33 ` Michel Lespinasse
2012-10-31 11:03 ` [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
6 siblings, 0 replies; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 10:33 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm, Rik van Riel
From: Rik van Riel <riel@surriel.com>
Fix the x86-64 cache alignment code to take pgoff into account.
Use the x86 and MIPS cache alignment code as the basis for a generic
cache alignment function.
The old x86 code will always align the mmap to aliasing boundaries,
even if the program mmaps the file with a non-zero pgoff.
If program A mmaps the file with pgoff 0, and program B mmaps the
file with pgoff 1, the old code would align the mmaps, resulting in
misaligned pages:

  A: 0123
  B: 123

After this patch, they are aligned so the pages line up:

  A: 0123
  B:  123
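To spell out the mechanism (the mask value below is made up for
illustration): unmapped_area() aligns the result with
gap_start += (align_offset - gap_start) & align_mask, so with
align_offset = pgoff << PAGE_SHIFT and, say, align_mask == 0x7000, the
returned address satisfies

    (addr & 0x7000) == ((pgoff << PAGE_SHIFT) & 0x7000)

i.e. a given file page always lands at the same position within the
aliasing period, no matter at which pgoff a particular mapping of the
file starts.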
Proposed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Michel Lespinasse <walken@google.com>
---
arch/x86/kernel/sys_x86_64.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index fa9227214753..5df5837ce6a3 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -136,7 +136,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
info.low_limit = begin;
info.high_limit = end;
info.align_mask = filp ? get_align_mask() : 0;
- info.align_offset = 0;
+ info.align_offset = pgoff << PAGE_SHIFT;
return vm_unmapped_area(&info);
}
@@ -175,7 +175,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
info.low_limit = 0; // XXX could be PAGE_SIZE ???
info.high_limit = mm->mmap_base;
info.align_mask = filp ? get_align_mask() : 0;
- info.align_offset = 0;
+ info.align_offset = pgoff << PAGE_SHIFT;
addr = vm_unmapped_area(&info);
if (!(addr & ~PAGE_MASK))
return addr;
--
1.7.7.3
* Re: [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas
2012-10-31 10:33 [RFC PATCH 0/6] mm: use augmented rbtrees for finding unmapped areas Michel Lespinasse
` (5 preceding siblings ...)
2012-10-31 10:33 ` [RFC PATCH 6/6] mm: fix cache coloring on x86_64 Michel Lespinasse
@ 2012-10-31 11:03 ` Michel Lespinasse
6 siblings, 0 replies; 14+ messages in thread
From: Michel Lespinasse @ 2012-10-31 11:03 UTC (permalink / raw)
To: Rik van Riel, Hugh Dickins, Mel Gorman, Peter Zijlstra,
Johannes Weiner, Andrea Arcangeli
Cc: linux-mm
On Wed, Oct 31, 2012 at 3:33 AM, Michel Lespinasse <walken@google.com> wrote:
> My own feel for this series is that I'm fairly confident in the
> robustness of my vm_unmapped_area() implementation; however I would
> like to confirm that people are happy with this new interface. Also
> the code that figures out what constraints to pass to
> vm_unmapped_area() is a bit odd; I have tried to make the constraints
> match the behavior of the current code but it's not clear to me if
> that behavior makes sense in the first place.
I wanted to expand a bit on that by listing some of these behaviors I
have made sure to preserve without really understanding why they are
as they are:
- arch_get_unmapped_area() doesn't make use of mm->mmap_base; this
value is used only when doing downwards allocations. However, many
architectures including x86_64 carefully initialize this (in
arch_pick_mmap_layout() ) to different values based on the up/down
allocation direction. It seems that the legacy (upwards allocation)
mmap_base value is irrelevant as I don't see any place using it ???
- For downwards allocations, it is not clear if the lowest valid
address should be 0 or PAGE_SIZE. Existing brute-force search code
will treat address 0 as valid on entering the loop, but invalid when
reaching the end of the loop.
- When the user passes a suggested address without the MAP_FIXED flag, the
address range we validate the address against varies depending on the
upwards/downwards allocation direction. This doesn't make much sense
since there is no address space search taking place in this case.
- The strategy of allocating upwards if the downwards allocation
failed is a bit strange. I'm not sure what we really want; maybe we
only need to extend the valid address range for the initial search ?
(IIRC Rik's initial patch series got rid of this redundant search, but
didn't explain why this was considered safe).
That's all I noticed, but this is really most of the remaining code
left in arch_get_unmapped_area[_topdown]... and I didn't even go into
architectures other than x86, where I could find some additional
questionable stuff (but I don't even want to go there before we at
least agree on the general principle of this patch series).
I hope with a proper understanding of the allocation strategies /
constraints it might be possible to unify the remaining
arch_get_unmapped_area[_topdown] code between architectures, but I'm
keeping this for a later step as I'm obviously not informed enough to
tackle that just yet...
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
* Re: [RFC PATCH 1/6] mm: augment vma rbtree with rb_subtree_gap
2012-10-31 10:33 ` [RFC PATCH 1/6] mm: augment vma rbtree with rb_subtree_gap Michel Lespinasse
@ 2012-11-01 21:43 ` Rik van Riel
0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2012-11-01 21:43 UTC (permalink / raw)
To: Michel Lespinasse
Cc: Hugh Dickins, Mel Gorman, Peter Zijlstra, Johannes Weiner,
Andrea Arcangeli, linux-mm
On 10/31/2012 06:33 AM, Michel Lespinasse wrote:
> Define vma->rb_subtree_gap as the largest gap between any vma in the
> subtree rooted at that vma, and their predecessor. Or, for a recursive
> definition, vma->rb_subtree_gap is the max of:
> - vma->vm_start - vma->vm_prev->vm_end
> - rb_subtree_gap fields of the vmas pointed by vma->rb.rb_left and
> vma->rb.rb_right
>
> This will allow get_unmapped_area_* to find a free area of the right
> size in O(log(N)) time, instead of potentially having to do a linear
> walk across all the VMAs.
>
> Also define mm->highest_vm_end as the vm_end field of the highest vma,
> so that we can easily check if the following gap is suitable.
>
> This does have the potential to make unmapping VMAs more expensive,
> especially for processes with very large numbers of VMAs, where the
> VMA rbtree can grow quite deep.
>
> Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
--
All rights reversed
* Re: [RFC PATCH 2/6] mm: check rb_subtree_gap correctness
2012-10-31 10:33 ` [RFC PATCH 2/6] mm: check rb_subtree_gap correctness Michel Lespinasse
@ 2012-11-01 21:49 ` Rik van Riel
0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2012-11-01 21:49 UTC (permalink / raw)
To: Michel Lespinasse
Cc: Hugh Dickins, Mel Gorman, Peter Zijlstra, Johannes Weiner,
Andrea Arcangeli, linux-mm
On 10/31/2012 06:33 AM, Michel Lespinasse wrote:
> When CONFIG_DEBUG_VM_RB is enabled, check that rb_subtree_gap is
> correctly set for every vma and that mm->highest_vm_end is also correct.
>
> Also add an explicit 'bug' variable to track if browse_rb() detected any
> invalid condition.
Looks somewhat familiar :)
> Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
--
All rights reversed
* Re: [RFC PATCH 4/6] mm: vm_unmapped_area() lookup function
2012-10-31 10:33 ` [RFC PATCH 4/6] mm: vm_unmapped_area() lookup function Michel Lespinasse
@ 2012-11-02 16:52 ` Rik van Riel
2012-11-02 22:41 ` Michel Lespinasse
0 siblings, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2012-11-02 16:52 UTC (permalink / raw)
To: Michel Lespinasse
Cc: Hugh Dickins, Mel Gorman, Peter Zijlstra, Johannes Weiner,
Andrea Arcangeli, linux-mm
On 10/31/2012 06:33 AM, Michel Lespinasse wrote:
> +/*
> + * Search for an unmapped address range.
> + *
> + * We are looking for a range that:
> + * - does not intersect with any VMA;
> + * - is contained within the [low_limit, high_limit[ interval;
^
bracket is the wrong way around :)
> + * - is at least the desired size.
> + * - satisfies (begin_addr & align_mask) == (align_offset & align_mask)
> + */
> +static inline unsigned long
> +vm_unmapped_area(struct vm_unmapped_area_info *info)
> +{
> + if (!(info->flags & VM_UNMAPPED_AREA_TOPDOWN))
> + return unmapped_area(info);
> + else
> + return unmapped_area_topdown(info);
> +}
I like how you pass all the info in the struct. That solves
some of the problems I was trying to deal with when I implemented
something similar.
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1494,6 +1494,206 @@ unacct_error:
> return error;
> }
>
> +unsigned long unmapped_area(struct vm_unmapped_area_info *info)
> +{
> + /*
> + * We implement the search by looking for an rbtree node that
> + * immediately follows a suitable gap. That is,
> + * - gap_start = vma->vm_prev->vm_end <= info->high_limit - length;
> + * - gap_end = vma->vm_start >= info->low_limit + length;
> + * - gap_end - gap_start >= length
> + */
> +
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> + unsigned long length, low_limit, high_limit, gap_start, gap_end;
> +
> + /* Adjust search length to account for worst case alignment overhead */
> + length = info->length + info->align_mask;
> + if (length < info->length)
> + return -ENOMEM;
If we unmap a VMA that is size N and correctly aligned, then
we may leave a gap of size N.
A subsequent call to mmap should be able to find that gap,
and use it.
It may make sense to not add the align_mask the first time
around, and see if the gap that we find already has the
correct alignment.
If it does not, we can always add the align_mask, and do
a second search for a hole of N + align_mask:
if (hole not aligned) {
length += info->align_mask;
goto again;
}
> @@ -1603,53 +1777,12 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
> return addr;
> }
>
...
> + info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> + info.length = len;
> + info.low_limit = 0;
> + info.high_limit = mm->mmap_base;
> + info.align_mask = 0;
> + addr = vm_unmapped_area(&info);
I believe it would be better to set low_limit to PAGE_SIZE.
Otherwise we can return 0, which is indistinguishable from a null
pointer. Keeping address 0 unmapped provides the security benefit
that code that can be made to address null pointers causes a crash,
instead of a potential privilege escalation.
The rest of the patch looks good.
--
All rights reversed
* Re: [RFC PATCH 5/6] mm: use vm_unmapped_area() on x86_64 architecture
2012-10-31 10:33 ` [RFC PATCH 5/6] mm: use vm_unmapped_area() on x86_64 architecture Michel Lespinasse
@ 2012-11-02 20:39 ` Rik van Riel
0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2012-11-02 20:39 UTC (permalink / raw)
To: Michel Lespinasse
Cc: Hugh Dickins, Mel Gorman, Peter Zijlstra, Johannes Weiner,
Andrea Arcangeli, linux-mm
On 10/31/2012 06:33 AM, Michel Lespinasse wrote:
> Signed-off-by: Michel Lespinasse <walken@google.com>
The patch could use a changelog.
> + info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> + info.length = len;
> + info.low_limit = 0; // XXX could be PAGE_SIZE ???
Indeed.
Everything else in the patch looks good to me.
> + info.high_limit = mm->mmap_base;
> + info.align_mask = filp ? get_align_mask() : 0;
> + info.align_offset = 0;
> + addr = vm_unmapped_area(&info);
> + if (!(addr & ~PAGE_MASK))
> + return addr;
> + VM_BUG_ON(addr != -ENOMEM);
--
All rights reversed
* Re: [RFC PATCH 4/6] mm: vm_unmapped_area() lookup function
2012-11-02 16:52 ` Rik van Riel
@ 2012-11-02 22:41 ` Michel Lespinasse
2012-11-03 0:40 ` Rik van Riel
0 siblings, 1 reply; 14+ messages in thread
From: Michel Lespinasse @ 2012-11-02 22:41 UTC (permalink / raw)
To: Rik van Riel
Cc: Hugh Dickins, Mel Gorman, Peter Zijlstra, Johannes Weiner,
Andrea Arcangeli, linux-mm
On Fri, Nov 2, 2012 at 9:52 AM, Rik van Riel <riel@redhat.com> wrote:
> On 10/31/2012 06:33 AM, Michel Lespinasse wrote:
>
>> +/*
>> + * Search for an unmapped address range.
>> + *
>> + * We are looking for a range that:
>> + * - does not intersect with any VMA;
>> + * - is contained within the [low_limit, high_limit[ interval;
>
> bracket is the wrong way around :)
Ah, I meant to indicate the byte at high_limit is not included in the interval.
I think the convention people are used to here would be [low_limit, high_limit)
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1494,6 +1494,206 @@ unacct_error:
>> return error;
>> }
>>
>> +unsigned long unmapped_area(struct vm_unmapped_area_info *info)
>> +{
>> + /*
>> + * We implement the search by looking for an rbtree node that
>> + * immediately follows a suitable gap. That is,
>> + * - gap_start = vma->vm_prev->vm_end <= info->high_limit -
>> length;
>> + * - gap_end = vma->vm_start >= info->low_limit +
>> length;
>> + * - gap_end - gap_start >= length
>> + */
>> +
>> + struct mm_struct *mm = current->mm;
>> + struct vm_area_struct *vma;
>> + unsigned long length, low_limit, high_limit, gap_start, gap_end;
>> +
>> + /* Adjust search length to account for worst case alignment
>> overhead */
>> + length = info->length + info->align_mask;
>> + if (length < info->length)
>> + return -ENOMEM;
>
>
> If we unmap a VMA that is size N and correctly aligned, then
> we may leave a gap of size N.
>
> A subsequent call to mmap should be able to find that gap,
> and use it.
>
> It may make sense to not add the align_mask the first time
> around, and see if the gap that we find already has the
> correct alignment.
>
> If it does not, we can always add the align_mask, and do
> a second search for a hole of N + align_mask:
>
> if (hole not aligned) {
> length += info->align_mask;
> goto again;
>
> }
I guess the suggestion is OK in the sense that I can't see a case
where it'd hurt. However, it still won't find all cases where we just
unmapped a region of size N with the correct alignment - it could be
that the first region we find has insufficient alignment, and then the
search with an increased length could fail, even though there exists
an aligned gap (just not the first) of the desired size. So, this is
only a partial solution.
Another suggestion I thought about (but thought it may be overkill)
would be to use an extra field (maybe stored in the lowest bits of
rb_subtree_gap) to indicate the largest *aligned* gap preceding vmas
within a subtree: if the value would be N, there would have to be an
gap of size PAGE_SIZE<<N and aligned to PAGE_SIZE<<N. This still
doesn't solve the general problem, but does help in the case of
wanting to allocate an aligned power-of-2 sized vma.
Unfortunately, I don't think there is an efficient solution to the
general problem, and the partial solutions discussed above (both yours
and mine) don't seem to cover enough cases to warrant the complexity
IMO...
>> @@ -1603,53 +1777,12 @@ arch_get_unmapped_area_topdown(struct file *filp,
>> const unsigned long addr0,
>> return addr;
>> }
>>
> ...
>
>
>> + info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>> + info.length = len;
>> + info.low_limit = 0;
>> + info.high_limit = mm->mmap_base;
>> + info.align_mask = 0;
>> + addr = vm_unmapped_area(&info);
>
> I believe it would be better to set low_limit to PAGE_SIZE.
Noted. I should probably have tried that first as this is the safer option.
(It's one of the things I had noted in my 0/6 email reply: the
original code seems to be inconsistent here as it'd allow allocation
at address 0 on the first loop iteration but fail it on further
iterations)
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
* Re: [RFC PATCH 4/6] mm: vm_unmapped_area() lookup function
2012-11-02 22:41 ` Michel Lespinasse
@ 2012-11-03 0:40 ` Rik van Riel
0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2012-11-03 0:40 UTC (permalink / raw)
To: Michel Lespinasse
Cc: Hugh Dickins, Mel Gorman, Peter Zijlstra, Johannes Weiner,
Andrea Arcangeli, linux-mm
On 11/02/2012 06:41 PM, Michel Lespinasse wrote:
> On Fri, Nov 2, 2012 at 9:52 AM, Rik van Riel <riel@redhat.com> wrote:
>> On 10/31/2012 06:33 AM, Michel Lespinasse wrote:
> I guess the suggestion is OK in the sense that I can't see a case
> where it'd hurt. However, it still won't find all cases where we just
> unmapped a region of size N with the correct alignment - it could be
> that the first region we find has insufficient alignment, and then the
> search with an increased length could fail, even though there exists
> an aligned gap (just not the first) of the desired size. So, this is
> only a partial solution.
> Unfortunately, I don't think there is an efficient solution to the
> general problem, and the partial solutions discussed above (both yours
> and mine) don't seem to cover enough cases to warrant the complexity
> IMO...
The common case (anonymous memory) is that no alignment is
required, so your solution and mine would be equivalent :)
You are right that your solution and mine are pretty
much the same for the (rarer) cases where alignment
is needed. Let's stick with your simpler version.
ACK to your version
--
All rights reversed