* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic [not found] <tip-3ff0141aa3a03ca3388b40b36167d0a37919f3fd@git.kernel.org> @ 2009-06-15 14:46 ` Peter Zijlstra 2009-06-15 15:30 ` Hugh Dickins 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 14:46 UTC (permalink / raw) To: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, mingo Cc: linux-tip-commits, hugh.dickins On Mon, 2009-06-15 at 14:07 +0000, tip-bot for Peter Zijlstra wrote: > Commit-ID: 3ff0141aa3a03ca3388b40b36167d0a37919f3fd > Gitweb: http://git.kernel.org/tip/3ff0141aa3a03ca3388b40b36167d0a37919f3fd > Author: Peter Zijlstra <a.p.zijlstra@chello.nl> > AuthorDate: Mon, 15 Jun 2009 12:40:41 +0200 > Committer: Ingo Molnar <mingo@elte.hu> > CommitDate: Mon, 15 Jun 2009 15:57:52 +0200 > > x86: Add NMI types for kmap_atomic > > Two new kmap_atomic slots for NMI context. And teach pte_offset_map() > about NMI context. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > CC: Nick Piggin <npiggin@suse.de> > Cc: Mike Galbraith <efault@gmx.de> > Cc: Paul Mackerras <paulus@samba.org> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com> > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > > --- > arch/x86/include/asm/kmap_types.h | 4 +++- > arch/x86/include/asm/pgtable_32.h | 5 +++-- > 2 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/include/asm/kmap_types.h b/arch/x86/include/asm/kmap_types.h > index 5759c16..ff00a44 100644 > --- a/arch/x86/include/asm/kmap_types.h > +++ b/arch/x86/include/asm/kmap_types.h > @@ -21,7 +21,9 @@ D(9) KM_IRQ0, > D(10) KM_IRQ1, > D(11) KM_SOFTIRQ0, > D(12) KM_SOFTIRQ1, > -D(13) KM_TYPE_NR > +D(13) KM_NMI, > +D(14) KM_NMI_PTE, > +D(15) KM_TYPE_NR > }; > > #undef D > diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h > index 31bd120..8546497 100644 > --- a/arch/x86/include/asm/pgtable_32.h > +++ b/arch/x86/include/asm/pgtable_32.h > @@ -49,13 +49,14 @@ extern void set_pmd_pfn(unsigned long, unsigned long, pgprot_t); > #endif > > #if defined(CONFIG_HIGHPTE) > +#define __KM_PTE (in_nmi() ? KM_NMI_PTE : KM_PTE0) > #define pte_offset_map(dir, address) \ > - ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), KM_PTE0) + \ > + ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) + \ > pte_index((address))) > #define pte_offset_map_nested(dir, address) \ > ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), KM_PTE1) + \ > pte_index((address))) > -#define pte_unmap(pte) kunmap_atomic((pte), KM_PTE0) > +#define pte_unmap(pte) kunmap_atomic((pte), __KM_PTE) > #define pte_unmap_nested(pte) kunmap_atomic((pte), KM_PTE1) > #else > #define pte_offset_map(dir, address) \ I just realized this has a kmap_atomic bug in... The below would fix it, but that's getting rather ugly :-/, alternatively I would have to introduce something like pte_offset_map_irq() which would make the irq/nmi detection and leave the regular code paths alone, however that would mean either duplicating the gup_fast() pagewalk or passing down a pte function pointer, which would only duplicate the gup_pte_range() bit, neither is really attractive... Index: linux-2.6/arch/x86/include/asm/kmap_types.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/kmap_types.h +++ linux-2.6/arch/x86/include/asm/kmap_types.h @@ -19,11 +19,12 @@ D(7) KM_PTE0, D(8) KM_PTE1, D(9) KM_IRQ0, D(10) KM_IRQ1, -D(11) KM_SOFTIRQ0, -D(12) KM_SOFTIRQ1, -D(13) KM_NMI, -D(14) KM_NMI_PTE, -D(15) KM_TYPE_NR +D(11) KM_IRQ_PTE, +D(12) KM_SOFTIRQ0, +D(13) KM_SOFTIRQ1, +D(14) KM_NMI, +D(15) KM_NMI_PTE, +D(16) KM_TYPE_NR }; #undef D Index: linux-2.6/arch/x86/include/asm/pgtable_32.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h +++ linux-2.6/arch/x86/include/asm/pgtable_32.h @@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u #endif #if defined(CONFIG_HIGHPTE) -#define __KM_PTE (in_nmi() ? KM_NMI_PTE : KM_PTE0) +#define __KM_PTE \ + (in_nmi() ? KM_NMI_PTE : \ + in_irq() ? KM_IRQ_PTE : \ + KM_PTE0) #define pte_offset_map(dir, address) \ ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) + \ pte_index((address))) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 14:46 ` [tip:perfcounters/core] x86: Add NMI types for kmap_atomic Peter Zijlstra @ 2009-06-15 15:30 ` Hugh Dickins 2009-06-15 15:41 ` Peter Zijlstra 0 siblings, 1 reply; 23+ messages in thread From: Hugh Dickins @ 2009-06-15 15:30 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, mingo, linux-tip-commits, hugh.dickins On Mon, 15 Jun 2009, Peter Zijlstra wrote: > > The below would fix it, but that's getting rather ugly :-/, > alternatively I would have to introduce something like > pte_offset_map_irq() which would make the irq/nmi detection and leave > the regular code paths alone, however that would mean either duplicating > the gup_fast() pagewalk or passing down a pte function pointer, which > would only duplicate the gup_pte_range() bit, neither is really > attractive... > Index: linux-2.6/arch/x86/include/asm/pgtable_32.h > =================================================================== > --- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h > +++ linux-2.6/arch/x86/include/asm/pgtable_32.h > @@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u > #endif > > #if defined(CONFIG_HIGHPTE) > -#define __KM_PTE (in_nmi() ? KM_NMI_PTE : KM_PTE0) > +#define __KM_PTE \ > + (in_nmi() ? KM_NMI_PTE : \ > + in_irq() ? KM_IRQ_PTE : \ > + KM_PTE0) > #define pte_offset_map(dir, address) \ > ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) + \ > pte_index((address))) Yes, that does look ugly! I've not been following the background to this, but I've often wondered if a kmap_push() and kmap_pop() could be useful, allowing you to reuse the slot in between - any use here? Hugh ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 15:30 ` Hugh Dickins @ 2009-06-15 15:41 ` Peter Zijlstra 2009-06-15 15:52 ` Ingo Molnar 2009-06-15 18:04 ` Peter Zijlstra 0 siblings, 2 replies; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 15:41 UTC (permalink / raw) To: Hugh Dickins Cc: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, mingo, linux-tip-commits On Mon, 2009-06-15 at 16:30 +0100, Hugh Dickins wrote: > On Mon, 15 Jun 2009, Peter Zijlstra wrote: > > > > The below would fix it, but that's getting rather ugly :-/, > > alternatively I would have to introduce something like > > pte_offset_map_irq() which would make the irq/nmi detection and leave > > the regular code paths alone, however that would mean either duplicating > > the gup_fast() pagewalk or passing down a pte function pointer, which > > would only duplicate the gup_pte_range() bit, neither is really > > attractive... > > > Index: linux-2.6/arch/x86/include/asm/pgtable_32.h > > =================================================================== > > --- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h > > +++ linux-2.6/arch/x86/include/asm/pgtable_32.h > > @@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u > > #endif > > > > #if defined(CONFIG_HIGHPTE) > > -#define __KM_PTE (in_nmi() ? KM_NMI_PTE : KM_PTE0) > > +#define __KM_PTE \ > > + (in_nmi() ? KM_NMI_PTE : \ > > + in_irq() ? KM_IRQ_PTE : \ > > + KM_PTE0) > > #define pte_offset_map(dir, address) \ > > ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) + \ > > pte_index((address))) > > Yes, that does look ugly! > > I've not been following the background to this, We need/want to do a user-space stack walk from IRQ/NMI context. The NMI bit means we cannot simply use __copy_from_user_inatomic() since that will still fault (albeit not page), and the fault return path invokes IRET which will terminate the NMI context. Therefore I wrote a copy_from_user_nmi() variant that is based of of __get_user_pages_fast() (a variant that doesn't fall back to the regular GUP), but that means we get 2 kmap_atomic()s, one for HIGHPTE and one for the user page. So this introduces the pte map from IRQ context and one from NMI context. > but I've often > wondered if a kmap_push() and kmap_pop() could be useful, > allowing you to reuse the slot in between - any use here? Yes, that would be much nicer, although less we would loose some of the type validation that lives in -mm, (along with a massive overhaul of the current kmap_atomic usage). Hmm, if we give each explicit type an level and ensure the new push()'ed type's level <= the previous one, we'd still have the full nesting validation and such.. I'll look into doing this. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 15:41 ` Peter Zijlstra @ 2009-06-15 15:52 ` Ingo Molnar 2009-06-15 16:02 ` Hugh Dickins 2009-06-15 18:04 ` Peter Zijlstra 1 sibling, 1 reply; 23+ messages in thread From: Ingo Molnar @ 2009-06-15 15:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, mathieu.desnoyers, torvalds, Jeremy Fitzhardinge * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > > I've not been following the background to this, > > We need/want to do a user-space stack walk from IRQ/NMI context. > The NMI bit means we cannot simply use __copy_from_user_inatomic() > since that will still fault (albeit not page), and the fault > return path invokes IRET which will terminate the NMI context. The goal is to allow 'perf' (see tools/perf/) non-flat categorizations like the sample output in the (pending) commit (see it attached further below). Here's the kind of output it allows: $ perf record -g -m 512 -f -- make -j32 kernel $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi 4.14% [k] do_page_fault 1.20% [k] sys_write 1.10% [k] sys_open 0.63% [k] sys_exit_group 0.48% [k] smp_apic_timer_interrupt 0.37% [k] sys_read 0.37% [k] sys_execve 0.20% [k] sys_mmap 0.18% [k] sys_close 0.14% [k] sys_munmap 0.13% [k] sys_poll Note that Oprofile uses the same method of __copy_user_inatomic() in arch/x86/oprofile/backtrace.c, but i believe that code is broken - i doubt the call-chain support for user-space stacks ever worked in oprofile - with perfcounters i can make this method crash under load. (we re-enter the NMI which due to IST executes over the exact same, still pending NMI frame. Kaboom.) I saw you being involved with the Oprofile code 3 years ago: | commit c34d1b4d165c67b966bca4aba026443d7ff161eb | Author: Hugh Dickins <hugh@veritas.com> | Date: Sat Oct 29 18:16:32 2005 -0700 | | [PATCH] mm: kill check_user_page_readable That method of __copy_user_inatomic(), while elegant, is subtly wrong in an NMI context. We really must avoid taking faults there. Ingo ------------> >From 3dfabc74c65904c9e6cf952391312d16ea772ef5 Mon Sep 17 00:00:00 2001 From: Ingo Molnar <mingo@elte.hu> Date: Mon, 15 Jun 2009 11:24:38 +0200 Subject: [PATCH] perf report: Add per system call overhead histogram Take advantage of call-graph percounter sampling/recording to display a non-trivial histogram: the true, collapsed/summarized cost measurement, on a per system call total overhead basis: aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10 aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10 # # (3536 samples) # # Overhead Symbol # ........ ...... # 40.75% [k] sys_write 40.21% [k] sys_read 4.44% [k] do_nmi ... This is done by accounting each (reliable) call-chain that chains back to a given system call to that system call function. [ So in the above example we can see that hackbench spends about 40% of its total time somewhere in sys_write() and 40% somewhere in sys_read(), the rest of the time is spent in user-space. The time is not spent in sys_write() _itself_ but in one of its many child functions. ] Or, a recording of a (source files are already in the page-cache) kernel build: $ perf record -g -m 512 -f -- make -j32 kernel $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi 4.14% [k] do_page_fault 1.20% [k] sys_write 1.10% [k] sys_open 0.63% [k] sys_exit_group 0.48% [k] smp_apic_timer_interrupt 0.37% [k] sys_read 0.37% [k] sys_execve 0.20% [k] sys_mmap 0.18% [k] sys_close 0.14% [k] sys_munmap 0.13% [k] sys_poll 0.09% [k] sys_newstat 0.07% [k] sys_clone 0.06% [k] sys_newfstat 0.05% [k] sys_access 0.05% [k] schedule Shows the true total cost of each syscall variant that gets used during a kernel build. This profile reveals it that pagefaults are the costliest, followed by read()/write(). An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds per 100 seconds of kernel build-time. (this was done with HZ=1000) The summary is done in 'perf report', i.e. in the post-processing stage - so once we have a good call-graph recording, this type of non-trivial high-level analysis becomes possible. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- tools/perf/builtin-report.c | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index aebba56..1e2f5dd 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -40,6 +40,7 @@ static int dump_trace = 0; static int verbose; static int full_paths; +static int collapse_syscalls; static unsigned long page_size; static unsigned long mmap_window = 32; @@ -983,6 +984,15 @@ process_overflow_event(event_t *event, unsigned long offset, unsigned long head) for (i = 0; i < chain->nr; i++) dprintf("..... %2d: %p\n", i, (void *)chain->ips[i]); } + if (collapse_syscalls) { + /* + * Find the all-but-last kernel entry + * amongst the call-chains - to get + * to the level of system calls: + */ + if (chain->kernel >= 2) + ip = chain->ips[chain->kernel-2]; + } } dprintf(" ... thread: %s:%d\n", thread->comm, thread->pid); @@ -1343,6 +1353,8 @@ static const struct option options[] = { "sort by key(s): pid, comm, dso, symbol. Default: pid,symbol"), OPT_BOOLEAN('P', "full-paths", &full_paths, "Don't shorten the pathnames taking into account the cwd"), + OPT_BOOLEAN('S', "syscalls", &collapse_syscalls, + "show per syscall summary overhead, using call graph"), OPT_END() }; ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 15:52 ` Ingo Molnar @ 2009-06-15 16:02 ` Hugh Dickins 0 siblings, 0 replies; 23+ messages in thread From: Hugh Dickins @ 2009-06-15 16:02 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, mathieu.desnoyers, torvalds, Jeremy Fitzhardinge On Mon, 15 Jun 2009, Ingo Molnar wrote: > > Note that Oprofile uses the same method of __copy_user_inatomic() in > arch/x86/oprofile/backtrace.c, but i believe that code is broken - i > doubt the call-chain support for user-space stacks ever worked in > oprofile - with perfcounters i can make this method crash under > load. (we re-enter the NMI which due to IST executes over the exact > same, still pending NMI frame. Kaboom.) > > I saw you being involved with the Oprofile code 3 years ago: > > | commit c34d1b4d165c67b966bca4aba026443d7ff161eb > | Author: Hugh Dickins <hugh@veritas.com> > | Date: Sat Oct 29 18:16:32 2005 -0700 > | > | [PATCH] mm: kill check_user_page_readable > > That method of __copy_user_inatomic(), while elegant, is subtly > wrong in an NMI context. We really must avoid taking faults there. Yes, I'm afraid that subtlety escaped me - thanks for explaining. Hugh ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 15:41 ` Peter Zijlstra 2009-06-15 15:52 ` Ingo Molnar @ 2009-06-15 18:04 ` Peter Zijlstra 2009-06-15 18:15 ` Ingo Molnar 2009-06-15 18:42 ` Andrew Morton 1 sibling, 2 replies; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 18:04 UTC (permalink / raw) To: Hugh Dickins Cc: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, mingo, linux-tip-commits, Linus Torvalds, Andrew Morton On Mon, 2009-06-15 at 17:41 +0200, Peter Zijlstra wrote: > On Mon, 2009-06-15 at 16:30 +0100, Hugh Dickins wrote: > > On Mon, 15 Jun 2009, Peter Zijlstra wrote: > > > > > > The below would fix it, but that's getting rather ugly :-/, > > > alternatively I would have to introduce something like > > > pte_offset_map_irq() which would make the irq/nmi detection and leave > > > the regular code paths alone, however that would mean either duplicating > > > the gup_fast() pagewalk or passing down a pte function pointer, which > > > would only duplicate the gup_pte_range() bit, neither is really > > > attractive... > > > > > Index: linux-2.6/arch/x86/include/asm/pgtable_32.h > > > =================================================================== > > > --- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h > > > +++ linux-2.6/arch/x86/include/asm/pgtable_32.h > > > @@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u > > > #endif > > > > > > #if defined(CONFIG_HIGHPTE) > > > -#define __KM_PTE (in_nmi() ? KM_NMI_PTE : KM_PTE0) > > > +#define __KM_PTE \ > > > + (in_nmi() ? KM_NMI_PTE : \ > > > + in_irq() ? KM_IRQ_PTE : \ > > > + KM_PTE0) > > > #define pte_offset_map(dir, address) \ > > > ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) + \ > > > pte_index((address))) > > > > Yes, that does look ugly! > > > > I've not been following the background to this, > > We need/want to do a user-space stack walk from IRQ/NMI context. The NMI > bit means we cannot simply use __copy_from_user_inatomic() since that > will still fault (albeit not page), and the fault return path invokes > IRET which will terminate the NMI context. > > Therefore I wrote a copy_from_user_nmi() variant that is based of of > __get_user_pages_fast() (a variant that doesn't fall back to the regular > GUP), but that means we get 2 kmap_atomic()s, one for HIGHPTE and one > for the user page. > > So this introduces the pte map from IRQ context and one from NMI > context. > > > but I've often > > wondered if a kmap_push() and kmap_pop() could be useful, > > allowing you to reuse the slot in between - any use here? > > Yes, that would be much nicer, although less we would loose some of the > type validation that lives in -mm, (along with a massive overhaul of the > current kmap_atomic usage). > > Hmm, if we give each explicit type an level and ensure the new push()'ed > type's level <= the previous one, we'd still have the full nesting > validation and such.. > > I'll look into doing this. OK, utterly untested, but how does this look? Not-Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- diff --git a/arch/x86/include/asm/kmap_types.h b/arch/x86/include/asm/kmap_types.h index ff00a44..f866138 100644 --- a/arch/x86/include/asm/kmap_types.h +++ b/arch/x86/include/asm/kmap_types.h @@ -19,11 +19,12 @@ D(7) KM_PTE0, D(8) KM_PTE1, D(9) KM_IRQ0, D(10) KM_IRQ1, -D(11) KM_SOFTIRQ0, -D(12) KM_SOFTIRQ1, -D(13) KM_NMI, -D(14) KM_NMI_PTE, -D(15) KM_TYPE_NR +D(11) KM_IRQ_PTE, +D(12) KM_SOFTIRQ0, +D(13) KM_SOFTIRQ1, +D(14) KM_NMI, +D(15) KM_NMI_PTE, +D(16) KM_TYPE_NR }; #undef D diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h index 8546497..d31930a 100644 --- a/arch/x86/include/asm/pgtable_32.h +++ b/arch/x86/include/asm/pgtable_32.h @@ -49,7 +49,20 @@ extern void set_pmd_pfn(unsigned long, unsigned long, pgprot_t); #endif #if defined(CONFIG_HIGHPTE) -#define __KM_PTE (in_nmi() ? KM_NMI_PTE : KM_PTE0) + +#ifdef CONFIG_DEBUG_VM +/* + * for debug we need the right type so that we can validate context + * nesting + */ +#define __KM_PTE \ + (in_nmi() ? KM_NMI_PTE : \ + in_irq() ? KM_IRQ_PTE : \ + KM_PTE0) +#else +#define __KM_PTE KM_PTE0 +#endif + #define pte_offset_map(dir, address) \ ((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) + \ pte_index((address))) diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c index 58f621e..62d15f7 100644 --- a/arch/x86/mm/highmem_32.c +++ b/arch/x86/mm/highmem_32.c @@ -19,6 +19,147 @@ void kunmap(struct page *page) kunmap_high(page); } +struct kmap_atomic_state { + int slot; +#ifdef CONFIG_DEBUG_VM + int taken[KM_TYPE_NR]; + int context; +#endif +}; + +#ifdef CONFIG_DEBUG_VM +enum km_context { + KM_CTX_USER, + KM_CTX_SOFTIRQ, + KM_CTX_IRQ, + KM_CTX_NMI, + + KM_CTX_MAX +}; + +static int kmap_type_to_context(enum km_type type) +{ + switch (type) { + case KM_BOUNCE_READ: + return KM_CTX_USER; + case KM_SKB_SUNRPC_DATA: + return KM_CTX_USER; + case KM_SKB_DATA_SOFTIRQ: + return KM_CTX_SOFTIRQ; + case KM_USER0: + return KM_CTX_USER; + case KM_USER1: + return KM_CTX_USER; + case KM_BIO_SRC_IRQ: + return KM_CTX_IRQ; + case KM_BIO_DST_IRQ: + return KM_CTX_IRQ; + case KM_PTE0: + return KM_CTX_USER; + case KM_PTE1: + return KM_CTX_USER; + case KM_IRQ0: + return KM_CTX_IRQ; + case KM_IRQ1: + return KM_CTX_IRQ; + case KM_SOFTIRQ0: + return KM_CTX_SOFTIRQ; + case KM_SOFTIRQ1: + return KM_CTX_SOFTIRQ; + case KM_NMI: + return KM_CTX_NMI; + case KM_NMI_PTE: + return KM_CTX_NMI; + } + + return KM_CTX_MAX; +} + +static void +kmap_atomic_push_debug(struct kmap_atomic_state *state, enum km_type type) +{ + int context = kmap_type_to_context(type); + int i; + + for (i = 0; i < state->slot; i++) + WARN_ON_ONCE(state->taken[i] == type); + + WARN_ON_ONCE(state->context > context); + + if (context > state->context) + state->context = context; + + switch (context) { + case KM_CTX_USER: + WARN_ON_ONCE(in_irq()); + WARN_ON_ONCE(in_nmi()); + break; + + case KM_CTX_SOFTIRQ: + WARN_ON_ONCE(in_irq()); + WARN_ON_ONCE(in_nmi()); + break; + + case KM_CTX_IRQ: + WARN_ON_ONCE(in_nmi()); + break; + + case KM_CTX_NMI: + break; + + case KM_CTX_MAX: + WARN_ON_ONCE(1); + break; + } +} + +static void +kmap_atomic_pop_debug(struct kmap_atomic_state *state, enum km_type type) +{ + WARN_ON_ONCE(state->taken[state->slot] != type); + + if (!state->slot) { + state->context = KM_CTX_USER; + return; + } + + state->context = kmap_type_to_context(state->taken[state->slot - 1]); +} + +#else /* !CONFIG_DEBUG_VM */ +static inline void +kmap_atomic_push_debug(struct kmap_atomic_state *state, enum km_type type) +{ +} + +static inline void +kmap_atomic_pop_debug(struct kmap_atomic_state *state, enum km_type type) +{ +} +#endif + +static DEFINE_PER_CPU(struct kmap_atomic_state, kmap_state); + +static int kmap_atomic_push(enum km_type type) +{ + struct kmap_atomic_state *state = &per_cpu(kmap_state); + + kmap_atomic_push_debug(state, type); + + return state->slot++; +} + +static int kmap_atomic_pop(enum km_type type) +{ + struct kmap_atomic_state *state = &per_cpu(kmap_state); + + state->slot--; + + kmap_atomic_pop_debug(state, type); + + return state->slot; +} + /* * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because * no global lock is needed and because the kmap code must perform a global TLB @@ -38,9 +179,7 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot) if (!PageHighMem(page)) return page_address(page); - debug_kmap_atomic(type); - - idx = type + KM_TYPE_NR*smp_processor_id(); + idx = KM_TYPE_NR*smp_processor_id() + kmap_atomic_push(type); vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); BUG_ON(!pte_none(*(kmap_pte-idx))); set_pte(kmap_pte-idx, mk_pte(page, prot)); @@ -56,8 +195,9 @@ void *kmap_atomic(struct page *page, enum km_type type) void kunmap_atomic(void *kvaddr, enum km_type type) { unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id(); + enum fixed_addresses idx = KM_TYPE_NR*smp_processor_id(); + idx += kmap_atomic_pop(type); /* * Force other mappings to Oops if they'll try to access this pte * without first remap it. Keeping stale mappings around is a bad idea diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 1fcb712..92123b9 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -21,18 +21,6 @@ static inline void flush_kernel_dcache_page(struct page *page) #include <asm/kmap_types.h> -#if defined(CONFIG_DEBUG_HIGHMEM) && defined(CONFIG_TRACE_IRQFLAGS_SUPPORT) - -void debug_kmap_atomic(enum km_type type); - -#else - -static inline void debug_kmap_atomic(enum km_type type) -{ -} - -#endif - #ifdef CONFIG_HIGHMEM #include <asm/highmem.h> diff --git a/mm/highmem.c b/mm/highmem.c index 68eb1d9..9101980 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -422,48 +422,3 @@ void __init page_address_init(void) } #endif /* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */ - -#if defined(CONFIG_DEBUG_HIGHMEM) && defined(CONFIG_TRACE_IRQFLAGS_SUPPORT) - -void debug_kmap_atomic(enum km_type type) -{ - static unsigned warn_count = 10; - - if (unlikely(warn_count == 0)) - return; - - if (unlikely(in_interrupt())) { - if (in_irq()) { - if (type != KM_IRQ0 && type != KM_IRQ1 && - type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ && - type != KM_BOUNCE_READ) { - WARN_ON(1); - warn_count--; - } - } else if (!irqs_disabled()) { /* softirq */ - if (type != KM_IRQ0 && type != KM_IRQ1 && - type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 && - type != KM_SKB_SUNRPC_DATA && - type != KM_SKB_DATA_SOFTIRQ && - type != KM_BOUNCE_READ) { - WARN_ON(1); - warn_count--; - } - } - } - - if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ || - type == KM_BIO_SRC_IRQ || type == KM_BIO_DST_IRQ) { - if (!irqs_disabled()) { - WARN_ON(1); - warn_count--; - } - } else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) { - if (irq_count() == 0 && !irqs_disabled()) { - WARN_ON(1); - warn_count--; - } - } -} - -#endif ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:04 ` Peter Zijlstra @ 2009-06-15 18:15 ` Ingo Molnar 2009-06-15 18:19 ` Peter Zijlstra 2009-06-15 18:42 ` Andrew Morton 1 sibling, 1 reply; 23+ messages in thread From: Ingo Molnar @ 2009-06-15 18:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > +static int kmap_type_to_context(enum km_type type) > +{ > + switch (type) { > + case KM_BOUNCE_READ: > + return KM_CTX_USER; > + case KM_SKB_SUNRPC_DATA: > + return KM_CTX_USER; > + case KM_SKB_DATA_SOFTIRQ: > + return KM_CTX_SOFTIRQ; > + case KM_USER0: > + return KM_CTX_USER; > + case KM_USER1: > + return KM_CTX_USER; > + case KM_BIO_SRC_IRQ: > + return KM_CTX_IRQ; > + case KM_BIO_DST_IRQ: > + return KM_CTX_IRQ; > + case KM_PTE0: > + return KM_CTX_USER; > + case KM_PTE1: > + return KM_CTX_USER; > + case KM_IRQ0: > + return KM_CTX_IRQ; > + case KM_IRQ1: > + return KM_CTX_IRQ; > + case KM_SOFTIRQ0: > + return KM_CTX_SOFTIRQ; > + case KM_SOFTIRQ1: > + return KM_CTX_SOFTIRQ; > + case KM_NMI: > + return KM_CTX_NMI; > + case KM_NMI_PTE: > + return KM_CTX_NMI; > + } > + > + return KM_CTX_MAX; why not do a very simple stack of atomic kmaps, like Hugh suggested? That would mean a much simpler interface: kaddr = kmap_atomic(page); no index needed. The kunmap pops the entry off the stack: kunmap_atomic(kaddr); This becomes simpler too. Now, a stack can be overflown by imbalance - but that's easy to detect and existing entries are easily printed and thus the source of the leak is easily identified. In my book this design beats the current enumeration of kmap types indices hands down ... It would likely be much more robust as well, and much more easy to extend. Am i missing any subtlety? Ingo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:15 ` Ingo Molnar @ 2009-06-15 18:19 ` Peter Zijlstra 2009-06-15 18:25 ` Ingo Molnar 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 18:19 UTC (permalink / raw) To: Ingo Molnar Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton On Mon, 2009-06-15 at 20:15 +0200, Ingo Molnar wrote: > * Peter Zijlstra <peterz@infradead.org> wrote: > > > +static int kmap_type_to_context(enum km_type type) > > +{ > > + switch (type) { > > + case KM_BOUNCE_READ: > > + return KM_CTX_USER; > > + case KM_SKB_SUNRPC_DATA: > > + return KM_CTX_USER; > > + case KM_SKB_DATA_SOFTIRQ: > > + return KM_CTX_SOFTIRQ; > > + case KM_USER0: > > + return KM_CTX_USER; > > + case KM_USER1: > > + return KM_CTX_USER; > > + case KM_BIO_SRC_IRQ: > > + return KM_CTX_IRQ; > > + case KM_BIO_DST_IRQ: > > + return KM_CTX_IRQ; > > + case KM_PTE0: > > + return KM_CTX_USER; > > + case KM_PTE1: > > + return KM_CTX_USER; > > + case KM_IRQ0: > > + return KM_CTX_IRQ; > > + case KM_IRQ1: > > + return KM_CTX_IRQ; > > + case KM_SOFTIRQ0: > > + return KM_CTX_SOFTIRQ; > > + case KM_SOFTIRQ1: > > + return KM_CTX_SOFTIRQ; > > + case KM_NMI: > > + return KM_CTX_NMI; > > + case KM_NMI_PTE: > > + return KM_CTX_NMI; > > + } > > + > > + return KM_CTX_MAX; > > why not do a very simple stack of atomic kmaps, like Hugh suggested? > > That would mean a much simpler interface: > > kaddr = kmap_atomic(page); > > no index needed. The kunmap pops the entry off the stack: > > kunmap_atomic(kaddr); > > This becomes simpler too. > > Now, a stack can be overflown by imbalance - but that's easy to > detect and existing entries are easily printed and thus the source > of the leak is easily identified. > > In my book this design beats the current enumeration of kmap types > indices hands down ... It would likely be much more robust as well, > and much more easy to extend. > > Am i missing any subtlety? The above is mostly debug code used to validate the kmap_atomic conditions. KM_CTX_NMI nests in KM_CTX_IRQ nests in KM_CTX_SOFTIRQ nests in KM_CTX_USER. And validate that we indeed are in the context specified by the type. That is, it will warn if we use KM_IRQ1 with KM_CTX_IRQ from user context. Some of this was already captured in the old kmap debug code which I removed. But yes, I should write that nicer.. /me goes clean up ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:19 ` Peter Zijlstra @ 2009-06-15 18:25 ` Ingo Molnar 2009-06-15 18:30 ` Peter Zijlstra 0 siblings, 1 reply; 23+ messages in thread From: Ingo Molnar @ 2009-06-15 18:25 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2009-06-15 at 20:15 +0200, Ingo Molnar wrote: > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > +static int kmap_type_to_context(enum km_type type) > > > +{ > > > + switch (type) { > > > + case KM_BOUNCE_READ: > > > + return KM_CTX_USER; > > > + case KM_SKB_SUNRPC_DATA: > > > + return KM_CTX_USER; > > > + case KM_SKB_DATA_SOFTIRQ: > > > + return KM_CTX_SOFTIRQ; > > > + case KM_USER0: > > > + return KM_CTX_USER; > > > + case KM_USER1: > > > + return KM_CTX_USER; > > > + case KM_BIO_SRC_IRQ: > > > + return KM_CTX_IRQ; > > > + case KM_BIO_DST_IRQ: > > > + return KM_CTX_IRQ; > > > + case KM_PTE0: > > > + return KM_CTX_USER; > > > + case KM_PTE1: > > > + return KM_CTX_USER; > > > + case KM_IRQ0: > > > + return KM_CTX_IRQ; > > > + case KM_IRQ1: > > > + return KM_CTX_IRQ; > > > + case KM_SOFTIRQ0: > > > + return KM_CTX_SOFTIRQ; > > > + case KM_SOFTIRQ1: > > > + return KM_CTX_SOFTIRQ; > > > + case KM_NMI: > > > + return KM_CTX_NMI; > > > + case KM_NMI_PTE: > > > + return KM_CTX_NMI; > > > + } > > > + > > > + return KM_CTX_MAX; > > > > why not do a very simple stack of atomic kmaps, like Hugh suggested? > > > > That would mean a much simpler interface: > > > > kaddr = kmap_atomic(page); > > > > no index needed. The kunmap pops the entry off the stack: > > > > kunmap_atomic(kaddr); > > > > This becomes simpler too. > > > > Now, a stack can be overflown by imbalance - but that's easy to > > detect and existing entries are easily printed and thus the source > > of the leak is easily identified. > > > > In my book this design beats the current enumeration of kmap types > > indices hands down ... It would likely be much more robust as well, > > and much more easy to extend. > > > > Am i missing any subtlety? > > The above is mostly debug code used to validate the kmap_atomic > conditions. > > KM_CTX_NMI nests in KM_CTX_IRQ nests in KM_CTX_SOFTIRQ nests in > KM_CTX_USER. > > And validate that we indeed are in the context specified by the type. > That is, it will warn if we use KM_IRQ1 with KM_CTX_IRQ from user > context. > > Some of this was already captured in the old kmap debug code which I > removed. > > But yes, I should write that nicer.. but ... look at the APIs i propose above. We dont need _any_ 'types'. That type enumeration is basically an open-coded allocator. If we do a _real_ allocator (a balanced stack of atomic kmaps) we dont need any of those indices, and all the potential for mismatch goes away as well - a stack nests trivially with IRQ and NMI and arbitrary other contexts. Ingo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:25 ` Ingo Molnar @ 2009-06-15 18:30 ` Peter Zijlstra 2009-06-15 18:42 ` Ingo Molnar 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 18:30 UTC (permalink / raw) To: Ingo Molnar Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote: > * Peter Zijlstra <peterz@infradead.org> wrote: > but ... look at the APIs i propose above. We dont need _any_ > 'types'. > > That type enumeration is basically an open-coded allocator. If we do > a _real_ allocator (a balanced stack of atomic kmaps) we dont need > any of those indices, and all the potential for mismatch goes away > as well - a stack nests trivially with IRQ and NMI and arbitrary > other contexts. You want types because: - they encode the intent, and can be verified - they help keep track of the max nesting depth In the proposed implementation all type code basically falls away no ! CONFIG_DEBUG_VM, but is kept around for robustness. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:30 ` Peter Zijlstra @ 2009-06-15 18:42 ` Ingo Molnar 2009-06-15 18:47 ` Peter Zijlstra 0 siblings, 1 reply; 23+ messages in thread From: Ingo Molnar @ 2009-06-15 18:42 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote: > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > but ... look at the APIs i propose above. We dont need _any_ > > 'types'. > > > > That type enumeration is basically an open-coded allocator. If we do > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need > > any of those indices, and all the potential for mismatch goes away > > as well - a stack nests trivially with IRQ and NMI and arbitrary > > other contexts. > > You want types because: > - they encode the intent, and can be verified > - they help keep track of the max nesting depth > > In the proposed implementation all type code basically falls away > no ! CONFIG_DEBUG_VM, but is kept around for robustness. But much of the fragility of the types (and their clumsiness - for example in highpte ops we have to know at which level of the pagetables we are, and use the right kind of index) is _precisely_ because we have the types ... Unbalanced unmaps will be detected under CONFIG_DEBUG_VM: kunmap uses 'page' as a parameter which is checked against the pte entry - they must match. I.e. it becomes a symmetric and expressive API: kaddr = kmap_atomic(page); ... kunmap_atomic(page); Hm? Ingo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:42 ` Ingo Molnar @ 2009-06-15 18:47 ` Peter Zijlstra 2009-06-15 18:52 ` Ingo Molnar 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 18:47 UTC (permalink / raw) To: Ingo Molnar Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote: > * Peter Zijlstra <peterz@infradead.org> wrote: > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote: > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > but ... look at the APIs i propose above. We dont need _any_ > > > 'types'. > > > > > > That type enumeration is basically an open-coded allocator. If we do > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need > > > any of those indices, and all the potential for mismatch goes away > > > as well - a stack nests trivially with IRQ and NMI and arbitrary > > > other contexts. > > > > You want types because: > > - they encode the intent, and can be verified > > - they help keep track of the max nesting depth > > > > In the proposed implementation all type code basically falls away > > no ! CONFIG_DEBUG_VM, but is kept around for robustness. > > But much of the fragility of the types (and their clumsiness - for > example in highpte ops we have to know at which level of the > pagetables we are, and use the right kind of index) is _precisely_ > because we have the types ... How will you manage the max depth? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:47 ` Peter Zijlstra @ 2009-06-15 18:52 ` Ingo Molnar 2009-06-15 19:00 ` Peter Zijlstra 0 siblings, 1 reply; 23+ messages in thread From: Ingo Molnar @ 2009-06-15 18:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote: > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote: > > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > > but ... look at the APIs i propose above. We dont need _any_ > > > > 'types'. > > > > > > > > That type enumeration is basically an open-coded allocator. If we do > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need > > > > any of those indices, and all the potential for mismatch goes away > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary > > > > other contexts. > > > > > > You want types because: > > > - they encode the intent, and can be verified > > > - they help keep track of the max nesting depth > > > > > > In the proposed implementation all type code basically falls away > > > no ! CONFIG_DEBUG_VM, but is kept around for robustness. > > > > But much of the fragility of the types (and their clumsiness - for > > example in highpte ops we have to know at which level of the > > pagetables we are, and use the right kind of index) is _precisely_ > > because we have the types ... > > How will you manage the max depth? if (++depth == MAX_DEPTH) { print_all_entries_and_nasty_warning(); /* hope we'll live long enough for the syslog to touch disk */ depth = 0; } unbalanced kmap is a bad bug - the easier we make it to catch, the better. The system wouldnt survive anyway. Ingo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:52 ` Ingo Molnar @ 2009-06-15 19:00 ` Peter Zijlstra 2009-06-16 8:13 ` Ingo Molnar 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 19:00 UTC (permalink / raw) To: Ingo Molnar Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton On Mon, 2009-06-15 at 20:52 +0200, Ingo Molnar wrote: > * Peter Zijlstra <peterz@infradead.org> wrote: > > > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote: > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote: > > > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > > > > but ... look at the APIs i propose above. We dont need _any_ > > > > > 'types'. > > > > > > > > > > That type enumeration is basically an open-coded allocator. If we do > > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need > > > > > any of those indices, and all the potential for mismatch goes away > > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary > > > > > other contexts. > > > > > > > > You want types because: > > > > - they encode the intent, and can be verified > > > > - they help keep track of the max nesting depth > > > > > > > > In the proposed implementation all type code basically falls away > > > > no ! CONFIG_DEBUG_VM, but is kept around for robustness. > > > > > > But much of the fragility of the types (and their clumsiness - for > > > example in highpte ops we have to know at which level of the > > > pagetables we are, and use the right kind of index) is _precisely_ > > > because we have the types ... > > > > How will you manage the max depth? > > if (++depth == MAX_DEPTH) { > print_all_entries_and_nasty_warning(); > /* hope we'll live long enough for the syslog to touch disk */ > depth = 0; > } That will only trigger if we hit it, which will be _very_ rare. > unbalanced kmap is a bad bug - the easier we make it to catch, the > better. The system wouldnt survive anyway. My proposed patch validates strict balance of types. But I can easily add the above as well. By removing the types it becomes very difficult to verify the max depth. I really don't like removing them. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 19:00 ` Peter Zijlstra @ 2009-06-16 8:13 ` Ingo Molnar 2009-06-16 12:38 ` Hugh Dickins 2009-06-17 7:44 ` Peter Zijlstra 0 siblings, 2 replies; 23+ messages in thread From: Ingo Molnar @ 2009-06-16 8:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2009-06-15 at 20:52 +0200, Ingo Molnar wrote: > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote: > > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote: > > > > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > > > > > > but ... look at the APIs i propose above. We dont need _any_ > > > > > > 'types'. > > > > > > > > > > > > That type enumeration is basically an open-coded allocator. If we do > > > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need > > > > > > any of those indices, and all the potential for mismatch goes away > > > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary > > > > > > other contexts. > > > > > > > > > > You want types because: > > > > > - they encode the intent, and can be verified > > > > > - they help keep track of the max nesting depth > > > > > > > > > > In the proposed implementation all type code basically falls away > > > > > no ! CONFIG_DEBUG_VM, but is kept around for robustness. > > > > > > > > But much of the fragility of the types (and their clumsiness - for > > > > example in highpte ops we have to know at which level of the > > > > pagetables we are, and use the right kind of index) is _precisely_ > > > > because we have the types ... > > > > > > How will you manage the max depth? > > > > if (++depth == MAX_DEPTH) { > > print_all_entries_and_nasty_warning(); > > /* hope we'll live long enough for the syslog to touch disk */ > > depth = 0; > > } > > That will only trigger if we hit it, which will be _very_ rare. > > > unbalanced kmap is a bad bug - the easier we make it to catch, > > the better. The system wouldnt survive anyway. > > My proposed patch validates strict balance of types. But I can > easily add the above as well. > > By removing the types it becomes very difficult to verify the max > depth. I really don't like removing them. The fact that it implies an atomic section pretty much limits its depth in practice, doesnt it? All we need to track in the debug code is max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts must be smaller than the max - even if (as you are right to point out) we dont hit that magic combo that truly maximizes the depth. And note that in practice many of the current types are exclusive to each other - so using the stack would _reduce_ the amount of kmap-atomic space we need. Ingo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-16 8:13 ` Ingo Molnar @ 2009-06-16 12:38 ` Hugh Dickins 2009-06-17 7:58 ` Peter Zijlstra 2009-06-17 7:44 ` Peter Zijlstra 1 sibling, 1 reply; 23+ messages in thread From: Hugh Dickins @ 2009-06-16 12:38 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton On Tue, 16 Jun 2009, Ingo Molnar wrote: > * Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, 2009-06-15 at 20:52 +0200, Ingo Molnar wrote: > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote: > > > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote: > > > > > > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > > > > > > > > but ... look at the APIs i propose above. We dont need _any_ > > > > > > > 'types'. > > > > > > > > > > > > > > That type enumeration is basically an open-coded allocator. If we do > > > > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need > > > > > > > any of those indices, and all the potential for mismatch goes away > > > > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary > > > > > > > other contexts. > > > > > > > > > > > > You want types because: > > > > > > - they encode the intent, and can be verified > > > > > > - they help keep track of the max nesting depth > > > > > > > > > > > > In the proposed implementation all type code basically falls away > > > > > > no ! CONFIG_DEBUG_VM, but is kept around for robustness. > > > > > > > > > > But much of the fragility of the types (and their clumsiness - for > > > > > example in highpte ops we have to know at which level of the > > > > > pagetables we are, and use the right kind of index) is _precisely_ > > > > > because we have the types ... > > > > > > > > How will you manage the max depth? > > > > > > if (++depth == MAX_DEPTH) { > > > print_all_entries_and_nasty_warning(); > > > /* hope we'll live long enough for the syslog to touch disk */ > > > depth = 0; > > > } > > > > That will only trigger if we hit it, which will be _very_ rare. > > > > > unbalanced kmap is a bad bug - the easier we make it to catch, > > > the better. The system wouldnt survive anyway. > > > > My proposed patch validates strict balance of types. But I can > > easily add the above as well. > > > > By removing the types it becomes very difficult to verify the max > > depth. I really don't like removing them. > > The fact that it implies an atomic section pretty much limits its > depth in practice, doesnt it? > > All we need to track in the debug code is > max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts must be > smaller than the max - even if (as you are right to point out) we > dont hit that magic combo that truly maximizes the depth. > > And note that in practice many of the current types are exclusive to > each other - so using the stack would _reduce_ the amount of > kmap-atomic space we need. I'll briefly resurface into the discussion before submerging again ;) I like very much the direction you're taking this, Ingo. Yes, that is how I've sometimes thought we should go - though when making the kmap_push/kmap_pop suggestion to Peter yesterday, I wasn't expecting him to make that revolution, just provide a way to save a current KM_type mapping and restore it later, so he can safely use the standard primitives like pte_offset_map() within. I wasn't expecting in_nmi() and in_irq() tests still to be there, even if only when debug. I can understand Peter's lockdep background wanting to retain the checking and KM_types, but if we're actually going to overhaul this area, I'd love just to get rid of them. Yes, that should reduce the amount of kmap_atomic space needed; though I've not thought how we keep track of the maximum needed as the kernel goes on developing. There might be a very few places where we expect to kmap_atomic A, kmap_atomic B, kunmap_atomic A, kunmap_atomic B? Something else to throw in: what if they were not just atomic, but also replaced the current sleeping kmaps? i.e. a task context carries around its own stack of these. I've always rejected that as introducing a pretty terrible overhead just where we don't want it; but maybe you're ingenious enough to devise ways of amortizing that cost. It would be nice to delete mm/highmem.c is we could. Ah, but there are probably places where one task passes a kmap address to another? Hugh ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-16 12:38 ` Hugh Dickins @ 2009-06-17 7:58 ` Peter Zijlstra 2009-06-17 8:43 ` Tejun Heo 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2009-06-17 7:58 UTC (permalink / raw) To: Hugh Dickins Cc: Ingo Molnar, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton, Tejun Heo On Tue, 2009-06-16 at 13:38 +0100, Hugh Dickins wrote: > > Something else to throw in: what if they were not just atomic, > but also replaced the current sleeping kmaps? i.e. a task context > carries around its own stack of these. I actually did that once, but it means the task needs to be cpu-affine, because fixmaps have different addresses between cpus. And disabling migration for tasks has subtle side-effects so I dropped that again. However, I recently considered the possiblity of putting the fixmaps in the new per-cpu address space so that we might use the %gs segment to normalize the fixmap addresses between the cpus. This would allow full preemptible kmaps (yay for -rt). However I suspect it might greatly complicate kmaps for the !i386 world. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-17 7:58 ` Peter Zijlstra @ 2009-06-17 8:43 ` Tejun Heo 2009-06-17 9:05 ` Peter Zijlstra 0 siblings, 1 reply; 23+ messages in thread From: Tejun Heo @ 2009-06-17 8:43 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, Ingo Molnar, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton Hello, Peter Zijlstra wrote: > On Tue, 2009-06-16 at 13:38 +0100, Hugh Dickins wrote: >> Something else to throw in: what if they were not just atomic, >> but also replaced the current sleeping kmaps? i.e. a task context >> carries around its own stack of these. > > I actually did that once, but it means the task needs to be cpu-affine, > because fixmaps have different addresses between cpus. And disabling > migration for tasks has subtle side-effects so I dropped that again. > > However, I recently considered the possiblity of putting the fixmaps in > the new per-cpu address space so that we might use the %gs segment to > normalize the fixmap addresses between the cpus. > > This would allow full preemptible kmaps (yay for -rt). > > However I suspect it might greatly complicate kmaps for the !i386 world. Other archs are in the process of conversion so once that is complete, there's no reason this should be more difficult but it means that kmapped addresses should be accessed differently from regular ones which we can't do. :-( Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-17 8:43 ` Tejun Heo @ 2009-06-17 9:05 ` Peter Zijlstra 0 siblings, 0 replies; 23+ messages in thread From: Peter Zijlstra @ 2009-06-17 9:05 UTC (permalink / raw) To: Tejun Heo Cc: Hugh Dickins, Ingo Molnar, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton On Wed, 2009-06-17 at 17:43 +0900, Tejun Heo wrote: > Hello, > > Peter Zijlstra wrote: > > On Tue, 2009-06-16 at 13:38 +0100, Hugh Dickins wrote: > >> Something else to throw in: what if they were not just atomic, > >> but also replaced the current sleeping kmaps? i.e. a task context > >> carries around its own stack of these. > > > > I actually did that once, but it means the task needs to be cpu-affine, > > because fixmaps have different addresses between cpus. And disabling > > migration for tasks has subtle side-effects so I dropped that again. > > > > However, I recently considered the possiblity of putting the fixmaps in > > the new per-cpu address space so that we might use the %gs segment to > > normalize the fixmap addresses between the cpus. > > > > This would allow full preemptible kmaps (yay for -rt). > > > > However I suspect it might greatly complicate kmaps for the !i386 world. > > Other archs are in the process of conversion so once that is complete, > there's no reason this should be more difficult but it means that > kmapped addresses should be accessed differently from regular ones > which we can't do. :-( Right, we'd need percpu_memset, percpu_copy_from, percpu_copy_to, etc. that all operate on %gs like addrs. might be a tad invasive. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-16 8:13 ` Ingo Molnar 2009-06-16 12:38 ` Hugh Dickins @ 2009-06-17 7:44 ` Peter Zijlstra 2009-06-17 12:28 ` Ingo Molnar 1 sibling, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2009-06-17 7:44 UTC (permalink / raw) To: Ingo Molnar Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton On Tue, 2009-06-16 at 10:13 +0200, Ingo Molnar wrote: > > By removing the types it becomes very difficult to verify the max > > depth. I really don't like removing them. > > The fact that it implies an atomic section pretty much limits its > depth in practice, doesnt it? > > All we need to track in the debug code is > max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts must be > smaller than the max - even if (as you are right to point out) we > dont hit that magic combo that truly maximizes the depth. Right, so the thing I'd worry about is someone adding kmap_atomic() to an interrupt context that didn't have interrupts disabled and then managing to nest that a few times. Suppose you put it in some IO completion handler, and someone has 4 IO controllers installed and all 4 IO interrupts come in at the 'same' time. With types you could warn on similarly to what we do today, but with the simple push/pop that might be a lot harder. Anyway, with the whole cr2 fiddling bit being discussed this seems to become redundant. > And note that in practice many of the current types are exclusive to > each other - so using the stack would _reduce_ the amount of > kmap-atomic space we need. Yeah, I did realize that. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-17 7:44 ` Peter Zijlstra @ 2009-06-17 12:28 ` Ingo Molnar 0 siblings, 0 replies; 23+ messages in thread From: Ingo Molnar @ 2009-06-17 12:28 UTC (permalink / raw) To: Peter Zijlstra Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton * Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, 2009-06-16 at 10:13 +0200, Ingo Molnar wrote: > > > > By removing the types it becomes very difficult to verify the max > > > depth. I really don't like removing them. > > > > The fact that it implies an atomic section pretty much limits its > > depth in practice, doesnt it? > > > > All we need to track in the debug code is > > max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts > > must be smaller than the max - even if (as you are right to > > point out) we dont hit that magic combo that truly maximizes the > > depth. > > Right, so the thing I'd worry about is someone adding > kmap_atomic() to an interrupt context that didn't have interrupts > disabled and then managing to nest that a few times. > > Suppose you put it in some IO completion handler, and someone has > 4 IO controllers installed and all 4 IO interrupts come in at the > 'same' time. > > With types you could warn on similarly to what we do today, but > with the simple push/pop that might be a lot harder. Yes, fixed-purpose allocations are easier to warn about - they imply more constraints, no doubt about that. But we could warn about using kmap-atomic with in irq context with irqs enabled and thus exclude the case you are worried about? > Anyway, with the whole cr2 fiddling bit being discussed this seems > to become redundant. It's not just the cr2 fiddling but also conversion of pagefault returns from IRET to RET. The kmap_atomic API change is a nice cleanup in itself. Ingo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:04 ` Peter Zijlstra 2009-06-15 18:15 ` Ingo Molnar @ 2009-06-15 18:42 ` Andrew Morton 2009-06-15 18:45 ` Peter Zijlstra 1 sibling, 1 reply; 23+ messages in thread From: Andrew Morton @ 2009-06-15 18:42 UTC (permalink / raw) To: Peter Zijlstra Cc: hugh.dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, mingo, linux-tip-commits, torvalds, Randy Dunlap On Mon, 15 Jun 2009 20:04:25 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > OK, utterly untested, but how does this look? arch/x86/include/asm/kmap_types.h | 11 +- arch/x86/include/asm/pgtable_32.h | 15 +++ arch/x86/mm/highmem_32.c | 148 ++++++++++++++++++++++++++++++++++++-- include/linux/highmem.h | 12 --- mm/highmem.c | 45 ----------- 5 files changed, 164 insertions(+), 67 deletions(-) Did http://userweb.kernel.org/~akpm/mmotm/broken-out/kmap_types-make-most-arches-use-generic-header-file.patch just die a horrid death? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic 2009-06-15 18:42 ` Andrew Morton @ 2009-06-15 18:45 ` Peter Zijlstra 0 siblings, 0 replies; 23+ messages in thread From: Peter Zijlstra @ 2009-06-15 18:45 UTC (permalink / raw) To: Andrew Morton Cc: hugh.dickins, linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, mingo, linux-tip-commits, torvalds, Randy Dunlap On Mon, 2009-06-15 at 11:42 -0700, Andrew Morton wrote: > On Mon, 15 Jun 2009 20:04:25 +0200 > Peter Zijlstra <peterz@infradead.org> wrote: > > > OK, utterly untested, but how does this look? > > arch/x86/include/asm/kmap_types.h | 11 +- > arch/x86/include/asm/pgtable_32.h | 15 +++ > arch/x86/mm/highmem_32.c | 148 ++++++++++++++++++++++++++++++++++++-- > include/linux/highmem.h | 12 --- > mm/highmem.c | 45 ----------- > 5 files changed, 164 insertions(+), 67 deletions(-) > > Did > http://userweb.kernel.org/~akpm/mmotm/broken-out/kmap_types-make-most-arches-use-generic-header-file.patch > just die a horrid death? hehe, yep, but I can fix that up.. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2009-06-17 12:28 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <tip-3ff0141aa3a03ca3388b40b36167d0a37919f3fd@git.kernel.org>
2009-06-15 14:46 ` [tip:perfcounters/core] x86: Add NMI types for kmap_atomic Peter Zijlstra
2009-06-15 15:30 ` Hugh Dickins
2009-06-15 15:41 ` Peter Zijlstra
2009-06-15 15:52 ` Ingo Molnar
2009-06-15 16:02 ` Hugh Dickins
2009-06-15 18:04 ` Peter Zijlstra
2009-06-15 18:15 ` Ingo Molnar
2009-06-15 18:19 ` Peter Zijlstra
2009-06-15 18:25 ` Ingo Molnar
2009-06-15 18:30 ` Peter Zijlstra
2009-06-15 18:42 ` Ingo Molnar
2009-06-15 18:47 ` Peter Zijlstra
2009-06-15 18:52 ` Ingo Molnar
2009-06-15 19:00 ` Peter Zijlstra
2009-06-16 8:13 ` Ingo Molnar
2009-06-16 12:38 ` Hugh Dickins
2009-06-17 7:58 ` Peter Zijlstra
2009-06-17 8:43 ` Tejun Heo
2009-06-17 9:05 ` Peter Zijlstra
2009-06-17 7:44 ` Peter Zijlstra
2009-06-17 12:28 ` Ingo Molnar
2009-06-15 18:42 ` Andrew Morton
2009-06-15 18:45 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox