* NMI between switch_mm and switch_to
From: Paul Mackerras @ 2009-07-28 4:49 UTC
To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel
Ben H. suggested there might be a problem if we get a PMU interrupt
and try to do a stack trace of userspace in the interval between when
we call switch_mm() from sched.c:context_switch() and when we call
switch_to(). If we get an NMI in that interval and do a stack trace
of userspace, we'll see the registers of the old task but when we peek
at user addresses we'll see the memory image for the new task, so the
stack trace we get will be completely bogus.
Is this in fact also a problem on x86, or is there some subtle reason
why it can't happen there?
Paul.
* Re: NMI between switch_mm and switch_to
From: Peter Zijlstra @ 2009-07-28 7:51 UTC
To: Paul Mackerras; +Cc: Ingo Molnar, linux-kernel
On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
> Ben H. suggested there might be a problem if we get a PMU interrupt
> and try to do a stack trace of userspace in the interval between when
> we call switch_mm() from sched.c:context_switch() and when we call
> switch_to(). If we get an NMI in that interval and do a stack trace
> of userspace, we'll see the registers of the old task but when we peek
> at user addresses we'll see the memory image for the new task, so the
> stack trace we get will be completely bogus.
>
> Is this in fact also a problem on x86, or is there some subtle reason
> why it can't happen there?
I can't spot one, maybe Ingo can when he's back :-)
So I think this is very good spotting from Ben.
We could use preempt notifiers (or put in our own hooks) to disable
callchains during the context switch I suppose.
* Re: NMI between switch_mm and switch_to
From: Andi Kleen @ 2009-07-28 9:23 UTC
To: Peter Zijlstra; +Cc: Paul Mackerras, Ingo Molnar, linux-kernel
Peter Zijlstra <a.p.zijlstra@chello.nl> writes:
> We could use preempt notifiers (or put in our own hooks) to disable
> callchains during the context switch I suppose.
You can simply check if cr3 matches the current->mm page tables
and not dump if it doesn't.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: NMI between switch_mm and switch_to
From: Ingo Molnar @ 2009-08-03 8:29 UTC
To: Peter Zijlstra, Benjamin Herrenschmidt; +Cc: Paul Mackerras, linux-kernel
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
>
> > Ben H. suggested there might be a problem if we get a PMU
> > interrupt and try to do a stack trace of userspace in the
> > interval between when we call switch_mm() from
> > sched.c:context_switch() and when we call switch_to(). If we
> > get an NMI in that interval and do a stack trace of userspace,
> > we'll see the registers of the old task but when we peek at user
> > addresses we'll see the memory image for the new task, so the
> > stack trace we get will be completely bogus.
> >
> > Is this in fact also a problem on x86, or is there some subtle
> > reason why it can't happen there?
>
> I can't spot one, maybe Ingo can when he's back :-)
>
> So I think this is very good spotting from Ben.
Yeah.
> We could use preempt notifiers (or put in our own hooks) to
> disable callchains during the context switch I suppose.
I think we should only disable user call-chains - the
in-kernel call-chain is still reliable.
Also, I think we don't need preempt notifiers; we can use a simple
check like this:

	if (current->mm &&
	    cpu_isset(smp_processor_id(), &current->mm->cpu_vm_mask)) {
		...
	}

in the user-call-chain code. We'd only touch the user memory image
if that bit is set. cpu_vm_mask is maintained atomically and before
we switch the MM, so it should be race-free.
Ingo
* Re: NMI between switch_mm and switch_to
From: Paul Mackerras @ 2009-08-03 10:32 UTC
To: Ingo Molnar; +Cc: Peter Zijlstra, Benjamin Herrenschmidt, linux-kernel
Ingo Molnar writes:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
> >
> > > Ben H. suggested there might be a problem if we get a PMU
> > > interrupt and try to do a stack trace of userspace in the
> > > interval between when we call switch_mm() from
> > > sched.c:context_switch() and when we call switch_to(). If we
> > > get an NMI in that interval and do a stack trace of userspace,
> > > we'll see the registers of the old task but when we peek at user
> > > addresses we'll see the memory image for the new task, so the
> > > stack trace we get will be completely bogus.
> > >
> > > Is this in fact also a problem on x86, or is there some subtle
> > > reason why it can't happen there?
> >
> > I can't spot one, maybe Ingo can when he's back :-)
> >
> > So I think this is very good spotting from Ben.
>
> Yeah.
>
> > We could use preempt notifiers (or put in our own hooks) to
> > disable callchains during the context switch I suppose.
>
> I think we should only disable user call-chains - the
> in-kernel call-chain is still reliable.
>
> Also, I think we don't need preempt notifiers; we can use a simple
> check like this:
>
> 	if (current->mm &&
> 	    cpu_isset(smp_processor_id(), &current->mm->cpu_vm_mask)) {
On x86, do you clear the current processor's bit in cpu_vm_mask when
you switch the MMU away from a task? We don't on powerpc, which would
render the above test incorrect. (But then we don't actually have the
problem on powerpc since interrupts get hard-disabled in switch_mm and
stay hard-disabled until they get soft-enabled.)
Paul.
* Re: NMI between switch_mm and switch_to
From: Ingo Molnar @ 2009-08-03 10:43 UTC
To: Paul Mackerras; +Cc: Peter Zijlstra, Benjamin Herrenschmidt, linux-kernel
* Paul Mackerras <paulus@samba.org> wrote:
> Ingo Molnar writes:
>
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > > On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
> > >
> > > > Ben H. suggested there might be a problem if we get a PMU
> > > > interrupt and try to do a stack trace of userspace in the
> > > > interval between when we call switch_mm() from
> > > > sched.c:context_switch() and when we call switch_to(). If we
> > > > get an NMI in that interval and do a stack trace of userspace,
> > > > we'll see the registers of the old task but when we peek at user
> > > > addresses we'll see the memory image for the new task, so the
> > > > stack trace we get will be completely bogus.
> > > >
> > > > Is this in fact also a problem on x86, or is there some subtle
> > > > reason why it can't happen there?
> > >
> > > I can't spot one, maybe Ingo can when he's back :-)
> > >
> > > So I think this is very good spotting from Ben.
> >
> > Yeah.
> >
> > > We could use preempt notifiers (or put in our own hooks) to
> > > disable callchains during the context switch I suppose.
> >
> > I think we should only disable user call-chains - the
> > in-kernel call-chain is still reliable.
> >
> > Also, I think we don't need preempt notifiers; we can use a simple
> > check like this:
> >
> > 	if (current->mm &&
> > 	    cpu_isset(smp_processor_id(), &current->mm->cpu_vm_mask)) {
>
> On x86, do you clear the current processor's bit in cpu_vm_mask
> when you switch the MMU away from a task? We don't on powerpc,
> which would render the above test incorrect. (But then we don't
> actually have the problem on powerpc since interrupts get
> hard-disabled in switch_mm and stay hard-disabled until they get
> soft-enabled.)
This is what x86 does in arch/x86/include/asm/mmu_context.h:
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
			     struct task_struct *tsk)
{
	unsigned cpu = smp_processor_id();

	if (likely(prev != next)) {
		/* stop flush ipis for the previous mm */
		cpu_clear(cpu, prev->cpu_vm_mask);
#ifdef CONFIG_SMP
		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
		percpu_write(cpu_tlbstate.active_mm, next);
#endif
		cpu_set(cpu, next->cpu_vm_mask);

		/* Re-load page tables */
		load_cr3(next->pgd);

		/*
		 * load the LDT, if the LDT is different:
		 */
		if (unlikely(prev->context.ldt != next->context.ldt))
			load_LDT_nolock(&next->context);
	}
#ifdef CONFIG_SMP
	else {
		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
		BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);

		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
			/* We were in lazy tlb mode and leave_mm disabled
			 * tlb flush IPI delivery. We must reload CR3
			 * to make sure to use no freed page tables.
			 */
			load_cr3(next->pgd);
			load_LDT_nolock(&next->context);
		}
	}
#endif
}
which would suggest to me that cpu_vm_mask is precise.
Ingo