From: Rusty Russell <rusty@rustcorp.com.au>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Simon Holm Thøgersen" <odie@cs.aau.dk>,
	"Vegard Nossum" <vegard.nossum@gmail.com>,
	"Patrick McHardy" <kaber@trash.net>,
	"Linux Kernel Mailinglist" <linux-kernel@vger.kernel.org>,
	"Chuck Ebbert" <cebbert@redhat.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: 2.6.26-git: NULL pointer deref in __switch_to
Date: Wed, 18 Jun 2008 15:34:23 +1000
Message-ID: <200806181534.24085.rusty@rustcorp.com.au>
In-Reply-To: <20080617235022.GA23370@linux-os.sc.intel.com>

On Wednesday 18 June 2008 09:50:22 Suresh Siddha wrote:
> On Mon, Jun 16, 2008 at 02:21:23PM -0700, Simon Holm Thøgersen wrote:
> > > Can you please upload it somewhere? I will also try with another guest
> > > image meanwhile.
> >
> > [access provided to Suresh in private email]
>
> Simon, Thanks.
>
> Simon, Patrick, I am able to reproduce the oops in __switch_to()
> with lguest.  My debugging showed that there is at least one lguest-specific
> issue (which should be present in 2.6.25 and earlier as well), and it got
> exposed as a kernel oops by the recent FPU dynamic allocation patches.
>
> In addition to the previous possible scenario (with fpu_counter), in the
> presence of lguest, it is possible that the CPU's TS bit is still set and
> the lguest launcher task's thread_info has TS_USEDFPU still set.
>
> This is because of the way the lguest launcher handles the guest's TS bit
> (look at lguest_set_ts() in lguest_arch_run_guest()). This can result
> in a DNA fault while doing unlazy_fpu() in __switch_to(). This will
> end up causing a DNA fault in the context of the new process that's
> getting context switched in (as opposed to handling the DNA fault in the
> context of the lguest launcher/helper process).
>
> This is wrong in both pre- and post-2.6.25 kernels. In the recent
> 2.6.26-rc series, this is showing up as NULL pointer dereferences or
> "sleeping function called from atomic context" (in __switch_to()), as
> we free and dynamically allocate the FPU context for the newly
> created threads. Older kernels might show some FPU corruption for processes
> running inside of lguest.
>
> With the appended patch, my test system has been running for more than
> 50 minutes now. So at least some of your oopses (hopefully all!) should
> get fixed. Please give it a try. I will spend more time with this fix tomorrow.
>
> Apart from the last hunk (the MSR_IA32_SYSENTER_CS changes), I believe
> the below patch is needed for 2.6.25 as well.
>
> Thanks.
>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> ---
>
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 6d54833..e2db9ac 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -333,6 +333,7 @@ void flush_thread(void)
>  	/*
>  	 * Forget coprocessor state..
>  	 */
> +	tsk->fpu_counter = 0;
>  	clear_fpu(tsk);
>  	clear_used_math();
>  }
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index ac54ff5..c6eb5c9 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -294,6 +294,7 @@ void flush_thread(void)
>  	/*
>  	 * Forget coprocessor state..
>  	 */
> +	tsk->fpu_counter = 0;
>  	clear_fpu(tsk);
>  	clear_used_math();
>  }
> diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
> index 5126d5d..4a98404 100644
> --- a/drivers/lguest/x86/core.c
> +++ b/drivers/lguest/x86/core.c
> @@ -176,7 +176,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	 * we set it now, so we can trap and pass that trap to the Guest if it
>  	 * uses the FPU. */
>  	if (cpu->ts)
> -		lguest_set_ts();
> +		unlazy_fpu(current);
>
>  	/* SYSENTER is an optimized way of doing system calls.  We can't allow
>  	 * it because it always jumps to privilege level 0.  A normal Guest
> @@ -196,6 +196,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	 * trap made the switcher code come back, and an error code which some
>  	 * traps set.  */
>
> +	/* Restore SYSENTER if it's supposed to be on. */
> +	if (boot_cpu_has(X86_FEATURE_SEP))
> +		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
> +
>  	/* If the Guest page faulted, then the cr2 register will tell us the
>  	 * bad virtual address.  We have to grab this now, because once we
>  	 * re-enable interrupts an interrupt could fault and thus overwrite
> @@ -203,13 +207,12 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	if (cpu->regs->trapnum == 14)
>  		cpu->arch.last_pagefault = read_cr2();
>  	/* Similarly, if we took a trap because the Guest used the FPU,
> -	 * we have to restore the FPU it expects to see. */
> +	 * we have to restore the FPU it expects to see.
> +	 * math_state_restore() may sleep and we may even move off to
> +	 * a different CPU. So all the critical stuff should be done
> +	 * before this.  */
>  	else if (cpu->regs->trapnum == 7)
>  		math_state_restore();

Hi Suresh,

   Firstly, thanks for figuring this out.  But math_state_restore() has nasty
semantics now: it can sleep, and we may come back on a different CPU.  Currently
lguest will work, because no code path following this call relies on being on
the same CPU.

So, this patch is fine, but I wonder if I should just be forcing FPU
allocation earlier for lguest tasks, so I can avoid this altogether?

Thanks,
Rusty.
