* NMI between switch_mm and switch_to
From: Paul Mackerras @ 2009-07-28  4:49 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel

Ben H. suggested there might be a problem if we get a PMU interrupt
and try to do a stack trace of userspace in the interval between when
we call switch_mm() from sched.c:context_switch() and when we call
switch_to().  If we get an NMI in that interval and do a stack trace
of userspace, we'll see the registers of the old task, but when we
peek at user addresses we'll see the memory image of the new task, so
the stack trace we get will be completely bogus.

Is this in fact also a problem on x86, or is there some subtle reason
why it can't happen there?

Paul.
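For reference, the window Paul describes sits between the two calls in sched.c:context_switch(). An abridged sketch, not the verbatim 2.6.31 source (locking, lazy-TLB handling, and the function's other arguments are elided):

```c
/* Abridged sketch of kernel/sched.c:context_switch() -- not verbatim;
 * the lazy-TLB cases and runqueue bookkeeping are elided. */
static inline void context_switch(struct rq *rq, struct task_struct *prev,
				  struct task_struct *next)
{
	struct mm_struct *mm = next->mm, *oldmm = prev->active_mm;

	switch_mm(oldmm, mm, next);	/* MMU now uses next's page tables */

	/*
	 * Window: 'current' and the saved user registers still belong
	 * to prev, but user addresses already resolve through next's
	 * mm.  An NMI taking a userspace stack trace here walks prev's
	 * stack pointer against next's memory image.
	 */

	switch_to(prev, next, prev);	/* 'current' finally becomes next */
}
```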
* Re: NMI between switch_mm and switch_to
From: Peter Zijlstra @ 2009-07-28  7:51 UTC
To: Paul Mackerras
Cc: Ingo Molnar, linux-kernel

On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
> Ben H. suggested there might be a problem if we get a PMU interrupt
> and try to do a stack trace of userspace in the interval between when
> we call switch_mm() from sched.c:context_switch() and when we call
> switch_to().  If we get an NMI in that interval and do a stack trace
> of userspace, we'll see the registers of the old task, but when we
> peek at user addresses we'll see the memory image of the new task, so
> the stack trace we get will be completely bogus.
>
> Is this in fact also a problem on x86, or is there some subtle reason
> why it can't happen there?

I can't spot one, maybe Ingo can when he's back :-)

So I think this is very good spotting from Ben.

We could use preempt notifiers (or put in our own hooks) to disable
callchains during the context switch, I suppose.
* Re: NMI between switch_mm and switch_to
From: Andi Kleen @ 2009-07-28  9:23 UTC
To: Peter Zijlstra
Cc: Paul Mackerras, Ingo Molnar, linux-kernel

Peter Zijlstra <a.p.zijlstra@chello.nl> writes:

> We could use preempt notifiers (or put in our own hooks) to disable
> callchains during the context switch, I suppose.

You can simply check whether CR3 matches the current->mm page tables,
and not dump the user stack if it doesn't.

-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: NMI between switch_mm and switch_to
From: Ingo Molnar @ 2009-08-03  8:29 UTC
To: Peter Zijlstra, Benjamin Herrenschmidt
Cc: Paul Mackerras, linux-kernel

* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
>
> > Ben H. suggested there might be a problem if we get a PMU
> > interrupt and try to do a stack trace of userspace in the
> > interval between when we call switch_mm() from
> > sched.c:context_switch() and when we call switch_to().  If we
> > get an NMI in that interval and do a stack trace of userspace,
> > we'll see the registers of the old task, but when we peek at user
> > addresses we'll see the memory image of the new task, so the
> > stack trace we get will be completely bogus.
> >
> > Is this in fact also a problem on x86, or is there some subtle
> > reason why it can't happen there?
>
> I can't spot one, maybe Ingo can when he's back :-)
>
> So I think this is very good spotting from Ben.

Yeah.

> We could use preempt notifiers (or put in our own hooks) to
> disable callchains during the context switch, I suppose.

I think we should only disable user call-chains - the in-kernel
call-chain is still reliable.

Also, I think we don't need preempt notifiers, we can use a simple
check like this:

	if (current->mm &&
	    cpu_isset(smp_processor_id(), &current->mm->cpu_vm_mask)) {
		...
	}

in the user-call-chain code.  We'd only touch the user memory image if
that bit is set.  cpu_vm_mask is maintained atomically and before we
switch the MM, so it should be race-free.

	Ingo
* Re: NMI between switch_mm and switch_to
From: Paul Mackerras @ 2009-08-03 10:32 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Benjamin Herrenschmidt, linux-kernel

Ingo Molnar writes:

> Also, I think we don't need preempt notifiers, we can use a simple
> check like this:
>
>	if (current->mm &&
>	    cpu_isset(smp_processor_id(), &current->mm->cpu_vm_mask)) {

On x86, do you clear the current processor's bit in cpu_vm_mask when
you switch the MMU away from a task?  We don't on powerpc, which would
render the above test incorrect.  (But then we don't actually have the
problem on powerpc, since interrupts get hard-disabled in switch_mm
and stay hard-disabled until they get soft-enabled.)

Paul.
* Re: NMI between switch_mm and switch_to
From: Ingo Molnar @ 2009-08-03 10:43 UTC
To: Paul Mackerras
Cc: Peter Zijlstra, Benjamin Herrenschmidt, linux-kernel

* Paul Mackerras <paulus@samba.org> wrote:

> On x86, do you clear the current processor's bit in cpu_vm_mask
> when you switch the MMU away from a task?  We don't on powerpc,
> which would render the above test incorrect.  (But then we don't
> actually have the problem on powerpc, since interrupts get
> hard-disabled in switch_mm and stay hard-disabled until they get
> soft-enabled.)

This is what x86 does in arch/x86/include/asm/mmu_context.h:

static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
			     struct task_struct *tsk)
{
	unsigned cpu = smp_processor_id();

	if (likely(prev != next)) {
		/* stop flush ipis for the previous mm */
		cpu_clear(cpu, prev->cpu_vm_mask);
#ifdef CONFIG_SMP
		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
		percpu_write(cpu_tlbstate.active_mm, next);
#endif
		cpu_set(cpu, next->cpu_vm_mask);

		/* Re-load page tables */
		load_cr3(next->pgd);

		/*
		 * load the LDT, if the LDT is different:
		 */
		if (unlikely(prev->context.ldt != next->context.ldt))
			load_LDT_nolock(&next->context);
	}
#ifdef CONFIG_SMP
	else {
		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
		BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);

		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
			/*
			 * We were in lazy tlb mode and leave_mm disabled
			 * tlb flush IPI delivery.  We must reload CR3
			 * to make sure to use no freed page tables.
			 */
			load_cr3(next->pgd);
			load_LDT_nolock(&next->context);
		}
	}
#endif
}

which would suggest to me that cpu_vm_mask is precise.

	Ingo
Thread overview: 6+ messages (newest: 2009-08-03 10:43 UTC)

2009-07-28  4:49 NMI between switch_mm and switch_to  Paul Mackerras
2009-07-28  7:51 ` Peter Zijlstra
2009-07-28  9:23   ` Andi Kleen
2009-08-03  8:29   ` Ingo Molnar
2009-08-03 10:32     ` Paul Mackerras
2009-08-03 10:43       ` Ingo Molnar