Re: 2.6.26-git: NULL pointer deref in __switch_to


From: Rusty Russell <rusty@rustcorp.com.au>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Simon Holm Thøgersen" <odie@cs.aau.dk>,
	"Vegard Nossum" <vegard.nossum@gmail.com>,
	"Patrick McHardy" <kaber@trash.net>,
	"Linux Kernel Mailinglist" <linux-kernel@vger.kernel.org>,
	"Chuck Ebbert" <cebbert@redhat.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: 2.6.26-git: NULL pointer deref in __switch_to
Date: Wed, 18 Jun 2008 15:34:23 +1000
Message-ID: <200806181534.24085.rusty@rustcorp.com.au>
In-Reply-To: <20080617235022.GA23370@linux-os.sc.intel.com>

On Wednesday 18 June 2008 09:50:22 Suresh Siddha wrote:
> On Mon, Jun 16, 2008 at 02:21:23PM -0700, Simon Holm Thøgersen wrote:
> > > Can you please upload it somewhere? I will also try with another guest
> > > image meanwhile.
> >
> > [access provided to Suresh in private email]
>
> Simon, Thanks.
>
> Simon, Patrick, I am able to reproduce the oops in __switch_to()
> with lguest.  My debugging showed that there is at least one lguest-specific
> issue (which should be present in 2.6.25 and before as well) and it got
> exposed as a kernel oops by the recent FPU dynamic allocation patches.
>
> In addition to the previous possible scenario (with fpu_counter), in the
> presence of lguest, it is possible that the CPU's TS bit is still set and
> the lguest launcher task's thread_info has TS_USEDFPU still set.
>
> This is because of the way the lguest launcher handles the guest's TS bit
> (look at lguest_set_ts() in lguest_arch_run_guest()). This can result
> in a DNA (device not available) fault while doing unlazy_fpu() in
> __switch_to(). The DNA fault is then taken in the context of the new
> process that's getting context switched in (as opposed to being handled
> in the context of the lguest launcher/helper process).
>
> This is wrong in both pre- and post-2.6.25 kernels. In the recent
> 2.6.26-rc series, this shows up as NULL pointer dereferences or
> sleeping function called from atomic context (__switch_to()), as
> we free and dynamically allocate the FPU context for newly
> created threads. Older kernels might show some FPU corruption for processes
> running inside of lguest.
>
> With the appended patch, my test system has been running for more than
> 50 minutes now. So at least some of your oopses (hopefully all!) should
> get fixed. Please give it a try. I will spend more time on this fix tomorrow.
>
> Apart from the last hunk (MSR_IA32_SYSENTER_CS changes), I believe
> the below patch is needed for 2.6.25 as well.
>
> Thanks.
>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> ---
>
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 6d54833..e2db9ac 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -333,6 +333,7 @@ void flush_thread(void)
>  	/*
>  	 * Forget coprocessor state..
>  	 */
> +	tsk->fpu_counter = 0;
>  	clear_fpu(tsk);
>  	clear_used_math();
>  }
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index ac54ff5..c6eb5c9 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -294,6 +294,7 @@ void flush_thread(void)
>  	/*
>  	 * Forget coprocessor state..
>  	 */
> +	tsk->fpu_counter = 0;
>  	clear_fpu(tsk);
>  	clear_used_math();
>  }
> diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
> index 5126d5d..4a98404 100644
> --- a/drivers/lguest/x86/core.c
> +++ b/drivers/lguest/x86/core.c
> @@ -176,7 +176,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	 * we set it now, so we can trap and pass that trap to the Guest if it
>  	 * uses the FPU. */
>  	if (cpu->ts)
> -		lguest_set_ts();
> +		unlazy_fpu(current);
>
>  	/* SYSENTER is an optimized way of doing system calls.  We can't allow
>  	 * it because it always jumps to privilege level 0.  A normal Guest
> @@ -196,6 +196,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	 * trap made the switcher code come back, and an error code which some
>  	 * traps set.  */
>
> +	/* Restore SYSENTER if it's supposed to be on. */
> +	if (boot_cpu_has(X86_FEATURE_SEP))
> +		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
> +
>  	/* If the Guest page faulted, then the cr2 register will tell us the
>  	 * bad virtual address.  We have to grab this now, because once we
>  	 * re-enable interrupts an interrupt could fault and thus overwrite
> @@ -203,13 +207,12 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	if (cpu->regs->trapnum == 14)
>  		cpu->arch.last_pagefault = read_cr2();
>  	/* Similarly, if we took a trap because the Guest used the FPU,
> -	 * we have to restore the FPU it expects to see. */
> +	 * we have to restore the FPU it expects to see.
> +	 * math_state_restore() may sleep and we may even move off to
> +	 * a different CPU. So all the critical stuff should be done
> +	 * before this.  */
>  	else if (cpu->regs->trapnum == 7)
>  		math_state_restore();

Hi Suresh,

Firstly, thanks for figuring this out.  But math_state_restore() now has
nasty semantics: as your new comment says, it may sleep and we may even move
to a different CPU.  Currently lguest will work, because no code path
following this call relies on being on the same CPU.

So, this patch is fine, but I wonder if I should just be forcing FPU
allocation earlier for lguest tasks, so I can avoid this altogether?

Thanks,
Rusty.

Thread overview: 20+ messages
2008-06-13 17:42 2.6.26-git: NULL pointer deref in __switch_to Patrick McHardy
2008-06-13 18:24 ` Vegard Nossum
2008-06-13 22:47   ` Suresh Siddha
2008-06-14  6:20     ` Ingo Molnar
2008-06-14  7:39       ` Patrick McHardy
2008-06-16 11:06       ` Jens Axboe
2008-06-14  7:36     ` Patrick McHardy
2008-06-16 10:15     ` Simon Holm Thøgersen
2008-06-16 10:29       ` Patrick McHardy
2008-06-16 12:10         ` Patrick McHardy
2008-06-16 17:49       ` Suresh Siddha
2008-06-16 21:21         ` Simon Holm Thøgersen
2008-06-17 23:50           ` Suresh Siddha
2008-06-18  5:34             ` Rusty Russell [this message]
2008-06-18  6:23               ` Suresh Siddha
2008-06-18 12:19                 ` Rusty Russell
2008-06-18  8:42             ` Patrick McHardy
2008-06-18 13:57             ` Simon Holm Thøgersen
2008-06-13 20:10 ` Rafael J. Wysocki
2008-06-14  7:33   ` Patrick McHardy
