* [patch] mm: reduce pagetable-freeing latencies
@ 2007-07-24 8:38 Ingo Molnar
2007-07-24 8:54 ` Andrew Morton
2007-07-24 9:40 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 17+ messages in thread
From: Ingo Molnar @ 2007-07-24 8:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Hugh Dickins
the patch below has been tested in -rt for a long time - lets get it
upstream please. It fixes some bad latencies on PREEMPT too.
Ingo
----------------------->
From: Hugh Dickins <hugh@veritas.com>
Subject: mm: reduce pagetable-freeing latencies
2.6.15-rc1 moved the unlinking of a vma from its prio_tree and anon_vma
into free_pgtables: so the vma is hidden from rmap and vmtruncate before
freeing its page tables, allowing safe descent without page table lock.
But free_pgtables is still called with preemption disabled, and Lee
Revell has now detected high latency there.
The right fix will be to rework the mmu_gathering, not to need preemption
disabled; but for now an ugly CONFIG_PREEMPT block in free_pgtables, to
make an initial unlinking pass with preemption enabled - made uglier by
CONFIG_IA64 definitions (only ia64 actually uses the start and end given
to tlb_finish_mmu, and our floor and ceiling don't quite work for those).
These CONFIG choices being to minimize the additional TLB flushing.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
mm/memory.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
Index: linux-rt-rebase.q/mm/memory.c
===================================================================
--- linux-rt-rebase.q.orig/mm/memory.c
+++ linux-rt-rebase.q/mm/memory.c
@@ -264,18 +264,48 @@ void free_pgd_range(struct mmu_gather **
flush_tlb_pgtables((*tlb)->mm, start, end);
}
+#ifdef CONFIG_IA64
+#define tlb_start_addr(tlb) (tlb)->start_addr
+#define tlb_end_addr(tlb) (tlb)->end_addr
+#else
+#define tlb_start_addr(tlb) 0UL /* only ia64 really uses it */
+#define tlb_end_addr(tlb) 0UL /* only ia64 really uses it */
+#endif
+
void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
unsigned long floor, unsigned long ceiling)
{
+#ifdef CONFIG_PREEMPT
+ struct vm_area_struct *unlink = vma;
+ int fullmm = (*tlb)->fullmm;
+
+ if (!vma) /* Sometimes when exiting after an oops */
+ return;
+ if (vma->vm_next)
+ tlb_finish_mmu(*tlb, tlb_start_addr(*tlb), tlb_end_addr(*tlb));
+ /*
+ * Hide vma from rmap and vmtruncate before freeeing pgtables,
+ * with preemption enabled, except when unmapping just one area.
+ */
+ while (unlink) {
+ anon_vma_unlink(unlink);
+ unlink_file_vma(unlink);
+ unlink = unlink->vm_next;
+ }
+ if (vma->vm_next)
+ *tlb = tlb_gather_mmu(vma->vm_mm, fullmm);
+#endif
while (vma) {
struct vm_area_struct *next = vma->vm_next;
unsigned long addr = vma->vm_start;
+#ifndef CONFIG_PREEMPT
/*
* Hide vma from rmap and vmtruncate before freeing pgtables
*/
anon_vma_unlink(vma);
unlink_file_vma(vma);
+#endif
if (is_vm_hugetlb_page(vma)) {
hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
@@ -288,8 +318,10 @@ void free_pgtables(struct mmu_gather **t
&& !is_vm_hugetlb_page(next)) {
vma = next;
next = vma->vm_next;
+#ifndef CONFIG_PREEMPT
anon_vma_unlink(vma);
unlink_file_vma(vma);
+#endif
}
free_pgd_range(tlb, addr, vma->vm_end,
floor, next? next->vm_start: ceiling);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-24 8:38 [patch] mm: reduce pagetable-freeing latencies Ingo Molnar
@ 2007-07-24 8:54 ` Andrew Morton
2007-07-24 9:40 ` Benjamin Herrenschmidt
2007-07-24 11:22 ` Ingo Molnar
2007-07-24 9:40 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2007-07-24 8:54 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Hugh Dickins
On Tue, 24 Jul 2007 10:38:55 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> the patch below has been tested in -rt for a long time - lets get it
> upstream please. It fixes some bad latencies on PREEMPT too.
>
> Ingo
>
> ----------------------->
> From: Hugh Dickins <hugh@veritas.com>
> Subject: mm: reduce pagetable-freeing latencies
>
> 2.6.15-rc1 moved the unlinking of a vma from its prio_tree and anon_vma
> into free_pgtables: so the vma is hidden from rmap and vmtruncate before
> freeing its page tables, allowing safe descent without page table lock.
> But free_pgtables is still called with preemption disabled, and Lee
> Revell has now detected high latency there.
>
> The right fix will be to rework the mmu_gathering, not to need preemption
> disabled; but for now an ugly CONFIG_PREEMPT block in free_pgtables, to
> make an initial unlinking pass with preemption enabled - made uglier by
> CONFIG_IA64 definitions (only ia64 actually uses the start and end given
> to tlb_finish_mmu, and our floor and ceiling don't quite work for those).
> These CONFIG choices being to minimize the additional TLB flushing.
>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> --
>
> mm/memory.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> Index: linux-rt-rebase.q/mm/memory.c
> ===================================================================
> --- linux-rt-rebase.q.orig/mm/memory.c
> +++ linux-rt-rebase.q/mm/memory.c
> @@ -264,18 +264,48 @@ void free_pgd_range(struct mmu_gather **
> flush_tlb_pgtables((*tlb)->mm, start, end);
> }
>
> +#ifdef CONFIG_IA64
> +#define tlb_start_addr(tlb) (tlb)->start_addr
> +#define tlb_end_addr(tlb) (tlb)->end_addr
> +#else
> +#define tlb_start_addr(tlb) 0UL /* only ia64 really uses it */
> +#define tlb_end_addr(tlb) 0UL /* only ia64 really uses it */
> +#endif
> +
> void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
> unsigned long floor, unsigned long ceiling)
> {
> +#ifdef CONFIG_PREEMPT
> + struct vm_area_struct *unlink = vma;
> + int fullmm = (*tlb)->fullmm;
> +
> + if (!vma) /* Sometimes when exiting after an oops */
> + return;
> + if (vma->vm_next)
> + tlb_finish_mmu(*tlb, tlb_start_addr(*tlb), tlb_end_addr(*tlb));
> + /*
> + * Hide vma from rmap and vmtruncate before freeeing pgtables,
> + * with preemption enabled, except when unmapping just one area.
> + */
> + while (unlink) {
> + anon_vma_unlink(unlink);
> + unlink_file_vma(unlink);
> + unlink = unlink->vm_next;
> + }
> + if (vma->vm_next)
> + *tlb = tlb_gather_mmu(vma->vm_mm, fullmm);
> +#endif
> while (vma) {
> struct vm_area_struct *next = vma->vm_next;
> unsigned long addr = vma->vm_start;
>
> +#ifndef CONFIG_PREEMPT
> /*
> * Hide vma from rmap and vmtruncate before freeing pgtables
> */
> anon_vma_unlink(vma);
> unlink_file_vma(vma);
> +#endif
>
> if (is_vm_hugetlb_page(vma)) {
> hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
> @@ -288,8 +318,10 @@ void free_pgtables(struct mmu_gather **t
> && !is_vm_hugetlb_page(next)) {
> vma = next;
> next = vma->vm_next;
> +#ifndef CONFIG_PREEMPT
> anon_vma_unlink(vma);
> unlink_file_vma(vma);
> +#endif
> }
> free_pgd_range(tlb, addr, vma->vm_end,
> floor, next? next->vm_start: ceiling);
What a truly putrid patch. I am suspecting that this was a quick
get-you-out-of-trouble thing, which then got forgotten about.
We have two months to do the "right fix". Please?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-24 8:38 [patch] mm: reduce pagetable-freeing latencies Ingo Molnar
2007-07-24 8:54 ` Andrew Morton
@ 2007-07-24 9:40 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-24 9:40 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Hugh Dickins
On Tue, 2007-07-24 at 10:38 +0200, Ingo Molnar wrote:
> the patch below has been tested in -rt for a long time - lets get it
> upstream please. It fixes some bad latencies on PREEMPT too.
Note that I'm working on the "right way" (reworking mmu gathering)
Patches hopefully this week.
Ben.
> Ingo
>
> ----------------------->
> From: Hugh Dickins <hugh@veritas.com>
> Subject: mm: reduce pagetable-freeing latencies
>
> 2.6.15-rc1 moved the unlinking of a vma from its prio_tree and anon_vma
> into free_pgtables: so the vma is hidden from rmap and vmtruncate before
> freeing its page tables, allowing safe descent without page table lock.
> But free_pgtables is still called with preemption disabled, and Lee
> Revell has now detected high latency there.
>
> The right fix will be to rework the mmu_gathering, not to need preemption
> disabled; but for now an ugly CONFIG_PREEMPT block in free_pgtables, to
> make an initial unlinking pass with preemption enabled - made uglier by
> CONFIG_IA64 definitions (only ia64 actually uses the start and end given
> to tlb_finish_mmu, and our floor and ceiling don't quite work for those).
> These CONFIG choices being to minimize the additional TLB flushing.
>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> --
>
> mm/memory.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> Index: linux-rt-rebase.q/mm/memory.c
> ===================================================================
> --- linux-rt-rebase.q.orig/mm/memory.c
> +++ linux-rt-rebase.q/mm/memory.c
> @@ -264,18 +264,48 @@ void free_pgd_range(struct mmu_gather **
> flush_tlb_pgtables((*tlb)->mm, start, end);
> }
>
> +#ifdef CONFIG_IA64
> +#define tlb_start_addr(tlb) (tlb)->start_addr
> +#define tlb_end_addr(tlb) (tlb)->end_addr
> +#else
> +#define tlb_start_addr(tlb) 0UL /* only ia64 really uses it */
> +#define tlb_end_addr(tlb) 0UL /* only ia64 really uses it */
> +#endif
> +
> void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
> unsigned long floor, unsigned long ceiling)
> {
> +#ifdef CONFIG_PREEMPT
> + struct vm_area_struct *unlink = vma;
> + int fullmm = (*tlb)->fullmm;
> +
> + if (!vma) /* Sometimes when exiting after an oops */
> + return;
> + if (vma->vm_next)
> + tlb_finish_mmu(*tlb, tlb_start_addr(*tlb), tlb_end_addr(*tlb));
> + /*
> + * Hide vma from rmap and vmtruncate before freeeing pgtables,
> + * with preemption enabled, except when unmapping just one area.
> + */
> + while (unlink) {
> + anon_vma_unlink(unlink);
> + unlink_file_vma(unlink);
> + unlink = unlink->vm_next;
> + }
> + if (vma->vm_next)
> + *tlb = tlb_gather_mmu(vma->vm_mm, fullmm);
> +#endif
> while (vma) {
> struct vm_area_struct *next = vma->vm_next;
> unsigned long addr = vma->vm_start;
>
> +#ifndef CONFIG_PREEMPT
> /*
> * Hide vma from rmap and vmtruncate before freeing pgtables
> */
> anon_vma_unlink(vma);
> unlink_file_vma(vma);
> +#endif
>
> if (is_vm_hugetlb_page(vma)) {
> hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
> @@ -288,8 +318,10 @@ void free_pgtables(struct mmu_gather **t
> && !is_vm_hugetlb_page(next)) {
> vma = next;
> next = vma->vm_next;
> +#ifndef CONFIG_PREEMPT
> anon_vma_unlink(vma);
> unlink_file_vma(vma);
> +#endif
> }
> free_pgd_range(tlb, addr, vma->vm_end,
> floor, next? next->vm_start: ceiling);
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-24 8:54 ` Andrew Morton
@ 2007-07-24 9:40 ` Benjamin Herrenschmidt
2007-07-24 12:13 ` Andi Kleen
2007-07-24 11:22 ` Ingo Molnar
1 sibling, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-24 9:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel, Hugh Dickins
> What a truly putrid patch. I am suspecting that this was a quick
> get-you-out-of-trouble thing, which then got forgotten about.
>
> We have two months to do the "right fix". Please?
Working on it...
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-24 8:54 ` Andrew Morton
2007-07-24 9:40 ` Benjamin Herrenschmidt
@ 2007-07-24 11:22 ` Ingo Molnar
1 sibling, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2007-07-24 11:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Hugh Dickins, Benjamin Herrenschmidt
* Andrew Morton <akpm@linux-foundation.org> wrote:
> What a truly putrid patch. I am suspecting that this was a quick
> get-you-out-of-trouble thing, which then got forgotten about.
>
> We have two months to do the "right fix". Please?
sure. Just wanted to bring attention to this issue - the regression got
introduced in v2.6.15. I'd be happy to test-drive Ben's patch in -rt.
Ingo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-24 9:40 ` Benjamin Herrenschmidt
@ 2007-07-24 12:13 ` Andi Kleen
2007-07-24 21:29 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2007-07-24 12:13 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Andrew Morton, Ingo Molnar, linux-kernel, Hugh Dickins
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> > What a truly putrid patch. I am suspecting that this was a quick
> > get-you-out-of-trouble thing, which then got forgotten about.
> >
> > We have two months to do the "right fix". Please?
>
> Working on it...
Ideally the patch would DTRT even on non preemptible kernels,
aka do cond_resched()s when needed.
-Andi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-24 12:13 ` Andi Kleen
@ 2007-07-24 21:29 ` Benjamin Herrenschmidt
2007-07-25 6:44 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-24 21:29 UTC (permalink / raw)
To: Andi Kleen; +Cc: Andrew Morton, Ingo Molnar, linux-kernel, Hugh Dickins
On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
> > > What a truly putrid patch. I am suspecting that this was a quick
> > > get-you-out-of-trouble thing, which then got forgotten about.
> > >
> > > We have two months to do the "right fix". Please?
> >
> > Working on it...
>
> Ideally the patch would DTRT even on non preemptible kernels,
> aka do cond_resched()s when needed.
First is to rework the batch structure to make it more manageable. That
is, patch #1 will keep the page list in per-cpu (and thus non-preempt),
but the batch "head" will be on the stack.
Now, there are two approaches regarding getting rid of the
get_cpu/put_cpu:
- One is to have a small number of entries for the page list in the
batch structure on the stack, and attempt to gfp' a page for more. If
that fails, we can still free, though with less batching, using only the
few entries in the batch struct itself. That's Hugh initial appraoch
iirc.
- Another is to hook up with those folks who've been asking for a
notifier that we are being preempted/scheduled out. In this case, I can
happily access the per-cpu list, and just trigger a batch flush if we
happen to be scheduled out.
I tend to prefer the former solution though, gfp should be fast, and
there is no need to force a flush if we get scheduled out. It would be
rare to hit the worst case scenario of falling back to the few page
heads in the batch itself. On the other hand, that solution has the
problem of bloating the stack a bit (with the few page pointers) even in
the case where I plan to use the extended batch outside of zap_*, such
as fork, mprotect, ....
So I'll first do patch #1, which will not fix the problem, but will make
the fix easier to fit in, in the meantime, please provide feedback of
your preferred solution for avoiding the get/put_cpu of the 2 above,
unless you find a good 3rd one.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-24 21:29 ` Benjamin Herrenschmidt
@ 2007-07-25 6:44 ` Peter Zijlstra
2007-07-25 9:46 ` Andi Kleen
2007-07-28 1:36 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2007-07-25 6:44 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Andi Kleen, Andrew Morton, Ingo Molnar, linux-kernel,
Hugh Dickins
On Wed, 2007-07-25 at 07:29 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> >
> > > > What a truly putrid patch. I am suspecting that this was a quick
> > > > get-you-out-of-trouble thing, which then got forgotten about.
> > > >
> > > > We have two months to do the "right fix". Please?
> > >
> > > Working on it...
> >
> > Ideally the patch would DTRT even on non preemptible kernels,
> > aka do cond_resched()s when needed.
>
> First is to rework the batch structure to make it more manageable. That
> is, patch #1 will keep the page list in per-cpu (and thus non-preempt),
> but the batch "head" will be on the stack.
>
> Now, there are two approaches regarding getting rid of the
> get_cpu/put_cpu:
>
> - One is to have a small number of entries for the page list in the
> batch structure on the stack, and attempt to gfp' a page for more. If
> that fails, we can still free, though with less batching, using only the
> few entries in the batch struct itself. That's Hugh initial appraoch
> iirc.
>
> - Another is to hook up with those folks who've been asking for a
> notifier that we are being preempted/scheduled out. In this case, I can
> happily access the per-cpu list, and just trigger a batch flush if we
> happen to be scheduled out.
>
> I tend to prefer the former solution though, gfp should be fast, and
> there is no need to force a flush if we get scheduled out. It would be
> rare to hit the worst case scenario of falling back to the few page
> heads in the batch itself. On the other hand, that solution has the
> problem of bloating the stack a bit (with the few page pointers) even in
> the case where I plan to use the extended batch outside of zap_*, such
> as fork, mprotect, ....
>
> So I'll first do patch #1, which will not fix the problem, but will make
> the fix easier to fit in, in the meantime, please provide feedback of
> your preferred solution for avoiding the get/put_cpu of the 2 above,
> unless you find a good 3rd one.
I too would prefer the former solution. I think preemption notifiers are
a particular iffy hack.
You could perhaps use C99 variable length arrays to avoid the stack
waste when not needed, however Andi once told me that generates rather
dubious code.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-25 6:44 ` Peter Zijlstra
@ 2007-07-25 9:46 ` Andi Kleen
2007-07-25 10:08 ` Benjamin Herrenschmidt
2007-07-28 1:36 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2007-07-25 9:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Benjamin Herrenschmidt, Andi Kleen, Andrew Morton, Ingo Molnar,
linux-kernel, Hugh Dickins
> You could perhaps use C99 variable length arrays to avoid the stack
> waste when not needed, however Andi once told me that generates rather
> dubious code.
It generates frame pointers, but that's not that bad. I'm not
aware of any other bad side effects. Ok the compiler will limit
your goto usage, but that's more a good thing.
But since you always have to strictly limit the array in kernel code anyways
you could as well just allocate the fixed limit.
-Andi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-25 9:46 ` Andi Kleen
@ 2007-07-25 10:08 ` Benjamin Herrenschmidt
2007-07-25 10:26 ` Andi Kleen
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-25 10:08 UTC (permalink / raw)
To: Andi Kleen
Cc: Peter Zijlstra, Andrew Morton, Ingo Molnar, linux-kernel,
Hugh Dickins
On Wed, 2007-07-25 at 11:46 +0200, Andi Kleen wrote:
> > You could perhaps use C99 variable length arrays to avoid the stack
> > waste when not needed, however Andi once told me that generates rather
> > dubious code.
>
> It generates frame pointers, but that's not that bad. I'm not
> aware of any other bad side effects. Ok the compiler will limit
> your goto usage, but that's more a good thing.
>
> But since you always have to strictly limit the array in kernel code anyways
> you could as well just allocate the fixed limit.
Plan is fixed array or 4 or maybe 8 entries (pointers), that shouldn't
be -too- bad. The code path I'm a bit worried about is
unmap_mapping_ranges() which goes into zapping page tables from deep
within filesystems.
At worst, I can reduce the fixed array to 1 entry. That means that if
the batch can't manage to get a page to use for the page list, it will
end up doing the flush for each page :-) But that should rarely happen,
in fact, I would expect it to be able to get a page the next time around
because it just freed one...
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-25 10:08 ` Benjamin Herrenschmidt
@ 2007-07-25 10:26 ` Andi Kleen
2007-07-25 10:46 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2007-07-25 10:26 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Andi Kleen, Peter Zijlstra, Andrew Morton, Ingo Molnar,
linux-kernel, Hugh Dickins
> Plan is fixed array or 4 or maybe 8 entries (pointers), that shouldn't
> be -too- bad. The code path I'm a bit worried about is
Yep.
> unmap_mapping_ranges() which goes into zapping page tables from deep
> within filesystems.
Your aim is to conserve stack space?
The worst case has to work without overflow anyways, so using VLAs don't
help you. Just allocate the largest size you can safely afford.
This is also easier to test; if it is too large it is better when
the overflow triggers always.
They are more useful in user space with larger stacks when you don't
know the maximum size.
-Andi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-25 10:26 ` Andi Kleen
@ 2007-07-25 10:46 ` Benjamin Herrenschmidt
2007-07-26 17:11 ` Hugh Dickins
0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-25 10:46 UTC (permalink / raw)
To: Andi Kleen
Cc: Peter Zijlstra, Andrew Morton, Ingo Molnar, linux-kernel,
Hugh Dickins
On Wed, 2007-07-25 at 12:26 +0200, Andi Kleen wrote:
> > Plan is fixed array or 4 or maybe 8 entries (pointers), that shouldn't
> > be -too- bad. The code path I'm a bit worried about is
>
> Yep.
>
> > unmap_mapping_ranges() which goes into zapping page tables from deep
> > within filesystems.
>
> Your aim is to conserve stack space?
> The worst case has to work without overflow anyways, so using VLAs don't
> help you. Just allocate the largest size you can safely afford.
>
> This is also easier to test; if it is too large it is better when
> the overflow triggers always.
>
> They are more useful in user space with larger stacks when you don't
> know the maximum size.
In fact, i discussed with peter and I think the best is to only have one
entry in the stack mmu_gather structure. If we fail to allocate a page
to batch entries, we flush that one and steal it :-) Then we have a page
to gather more.
I'll give that a go tomorrow, I got delayed by some stupid HW issues
today.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-25 10:46 ` Benjamin Herrenschmidt
@ 2007-07-26 17:11 ` Hugh Dickins
2007-07-26 21:35 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2007-07-26 17:11 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Andi Kleen, Peter Zijlstra, Andrew Morton, Ingo Molnar,
linux-kernel
On Wed, 25 Jul 2007, Benjamin Herrenschmidt wrote:
>
> In fact, i discussed with peter and I think the best is to only have one
> entry in the stack mmu_gather structure.
Four or eight would be much nicer than one, but you're right there's
a stackdepth worry: good spotting of those unmap_mapping_range cases,
I'd certainly been thinking we were fairly near the top of the stack
in unmap_vmas, which is not so (or not obviously so) in those cases.
> If we fail to allocate a page
> to batch entries, we flush that one and steal it :-) Then we have a page
> to gather more.
That will often be the case for anonymous pages, but not for pagecache
pages. I had wondered whether to make exit_mmap unmap the stack first
(rather than text first as usually happens), but seems rather a hack.
Hugh
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-26 17:11 ` Hugh Dickins
@ 2007-07-26 21:35 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-26 21:35 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andi Kleen, Peter Zijlstra, Andrew Morton, Ingo Molnar,
linux-kernel
On Thu, 2007-07-26 at 18:11 +0100, Hugh Dickins wrote:
> That will often be the case for anonymous pages, but not for pagecache
> pages. I had wondered whether to make exit_mmap unmap the stack first
> (rather than text first as usually happens), but seems rather a hack.
Yup and defeated by HIGHMEM ,so I'm back to per-cpu page lists for now
and we'll see how to move to the next step once I've ironed out other
issues such as sparc64/ia64 related bits.
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-25 6:44 ` Peter Zijlstra
2007-07-25 9:46 ` Andi Kleen
@ 2007-07-28 1:36 ` Benjamin Herrenschmidt
2007-07-28 5:54 ` Hugh Dickins
1 sibling, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-28 1:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andi Kleen, Andrew Morton, Ingo Molnar, linux-kernel,
Hugh Dickins
> > So I'll first do patch #1, which will not fix the problem, but will make
> > the fix easier to fit in, in the meantime, please provide feedback of
> > your preferred solution for avoiding the get/put_cpu of the 2 above,
> > unless you find a good 3rd one.
>
> I too would prefer the former solution. I think preemption notifiers are
> a particular iffy hack.
>
> You could perhaps use C99 variable length arrays to avoid the stack
> waste when not needed, however Andi once told me that generates rather
> dubious code.
As I'm sweeping through arch code etc... preparing the ground for the
proper mmu_gather surgery, I've been thinking about the way to deal with
that per-cpu page list and finally came up with the idea that the best
we can do is around the lines of trying to allocate the list via gfp,
and if that fails, fallback to a (smaller than now) per-cpu. I'm
reworking the interfaces such that the higher level code doesn't have to
care whether preemption is enabled or disabled at a given point.
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-28 1:36 ` Benjamin Herrenschmidt
@ 2007-07-28 5:54 ` Hugh Dickins
2007-07-28 22:36 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2007-07-28 5:54 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Peter Zijlstra, Andi Kleen, Andrew Morton, Ingo Molnar,
linux-kernel
On Sat, 28 Jul 2007, Benjamin Herrenschmidt wrote:
>
> As I'm sweeping through arch code etc... preparing the ground for the
> proper mmu_gather surgery, I've been thinking about the way to deal with
> that per-cpu page list and finally came up with the idea that the best
> we can do is around the lines of trying to allocate the list via gfp,
> and if that fails, fallback to a (smaller than now) per-cpu. I'm
> reworking the interfaces such that the higher level code doesn't have to
> care whether preemption is enabled or disabled at a given point.
That doesn't sound like the best way to me at all. Using two means
of buffering, one with preemption enabled and the other not, seems
complex and prone to error (perhaps not while you're working on it,
but later on). We do already have that problem (the i_mmap_lock case
versus the others), but it's not a complication I'd want to extend.
The onstack array seems fine to me, even if you do end up deciding on
an array of one. Is there any evidence that it's a problem getting a
page for the freeing (other than in circumstances that are already
badly slowed down)? It's obvious that we need a fallback route,
but optimizing throughput on that route seems premature.
Hugh
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [patch] mm: reduce pagetable-freeing latencies
2007-07-28 5:54 ` Hugh Dickins
@ 2007-07-28 22:36 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2007-07-28 22:36 UTC (permalink / raw)
To: Hugh Dickins
Cc: Peter Zijlstra, Andi Kleen, Andrew Morton, Ingo Molnar,
linux-kernel
> The onstack array seems fine to me, even if you do end up deciding on
> an array of one. Is there any evidence that it's a problem getting a
> page for the freeing (other than in circumstances that are already
> badly slowed down)? It's obvious that we need a fallback route,
> but optimizing throughput on that route seems premature.
Hrm, no evidence of that so far indeed. I'm worried by the stack usage
of the unmap_mapping_ranges() but appart from that, no.
Appart from that, yeah, I suppose we can have a macro defining how many
on-stack backup we have and adjust it if we see that being a problem.
I'm not fan of dynamic on-stack allocations.
Ben.
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2007-07-28 22:38 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-24 8:38 [patch] mm: reduce pagetable-freeing latencies Ingo Molnar
2007-07-24 8:54 ` Andrew Morton
2007-07-24 9:40 ` Benjamin Herrenschmidt
2007-07-24 12:13 ` Andi Kleen
2007-07-24 21:29 ` Benjamin Herrenschmidt
2007-07-25 6:44 ` Peter Zijlstra
2007-07-25 9:46 ` Andi Kleen
2007-07-25 10:08 ` Benjamin Herrenschmidt
2007-07-25 10:26 ` Andi Kleen
2007-07-25 10:46 ` Benjamin Herrenschmidt
2007-07-26 17:11 ` Hugh Dickins
2007-07-26 21:35 ` Benjamin Herrenschmidt
2007-07-28 1:36 ` Benjamin Herrenschmidt
2007-07-28 5:54 ` Hugh Dickins
2007-07-28 22:36 ` Benjamin Herrenschmidt
2007-07-24 11:22 ` Ingo Molnar
2007-07-24 9:40 ` Benjamin Herrenschmidt
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.