All of lore.kernel.org
 help / color / mirror / Atom feed
From: Takashi Iwai <tiwai@suse.de>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>,
	Denys Vlasenko <vda.linux@googlemail.com>,
	Jiri Kosina <jkosina@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Stefan Seyfried <stefan.seyfried@googlemail.com>,
	X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Mon, 23 Mar 2015 19:56:43 +0100	[thread overview]
Message-ID: <s5hlhinlez8.wl-tiwai@suse.de> (raw)
In-Reply-To: <CALCETrWmz2JoWJptt-YxVszJX0J8m+OhQPqiXRJsE460tXbYNg@mail.gmail.com>

At Mon, 23 Mar 2015 11:38:30 -0700,
Andy Lutomirski wrote:
> 
> On Mon, Mar 23, 2015 at 9:07 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > On 03/23/2015 02:22 PM, Takashi Iwai wrote:
> >> At Mon, 23 Mar 2015 10:35:41 +0100,
> >> Takashi Iwai wrote:
> >>>
> >>> At Mon, 23 Mar 2015 10:02:52 +0100,
> >>> Takashi Iwai wrote:
> >>>>
> >>>> At Fri, 20 Mar 2015 19:16:53 +0100,
> >>>> Denys Vlasenko wrote:
> 
> >> I'm really puzzled now.  We have a few pieces of information:
> >>
> >> - git bisection pointed the commit 96b6352c1271:
> >>     x86_64, entry: Remove the syscall exit audit and schedule optimizations
> >>   and reverting this "fixes" the problem indeed.  Even just moving two
> >>   lines
> >>     LOCKDEP_SYS_EXIT
> >>     DISABLE_INTERRUPTS(CLBR_NONE)
> >>   at the beginning of ret_from_sys_call already fixes.  (Of course I
> >>   can't prove the fix but it stabilizes for a day without crash while
> >>   usually I hit the bug in 10 minutes in full test running.)
> >
> > The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from
> > interrupt-disabled region to interrupt-enabled:
> >
> >         cmpq $__NR_syscall_max,%rax
> >         ja ret_from_sys_call
> >         movq %r10,%rcx
> >         call *sys_call_table(,%rax,8)  # XXX:    rip relative
> >         movq %rax,RAX-ARGOFFSET(%rsp)
> > ret_from_sys_call:
> >         testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >         jnz int_ret_from_sys_call_fixup /* Go the the slow path */
> >         LOCKDEP_SYS_EXIT
> >         DISABLE_INTERRUPTS(CLBR_NONE)
> >         TRACE_IRQS_OFF
> > ...
> > ...
> > int_ret_from_sys_call_fixup:
> >         FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
> >         jmp int_ret_from_sys_call
> > ...
> > ...
> > GLOBAL(int_ret_from_sys_call)
> >         DISABLE_INTERRUPTS(CLBR_NONE)
> >         TRACE_IRQS_OFF
> >
> > You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE).
> >
> > I also don't see how moving that check (even if it is wrong in a more
> > benign way) can have such a drastic effect.
> 
> I bet I see it.  I have the advantage of having stared at KVM code and
> cursed at it more recently than you, I suspect.  KVM does awful, awful
> things to CPU state, and, as an optimization, it allows kernel code to
> run with CPU state that would be totally invalid in user mode.  This
> happens through a bunch of hooks, including this bit in __switch_to:
> 
>     /*
>      * Now maybe reload the debug registers and handle I/O bitmaps
>      */
>     if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
>              task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
>         __switch_to_xtra(prev_p, next_p, tss);
> 
> IOW, we *change* tif during context switches.
> 
> 
> The race looks like this:
> 
>     testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
>     jnz int_ret_from_sys_call_fixup    /* Go the the slow path */
> 
> --- preempted here, switch to KVM guest ---
> 
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK.  This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?
> 
> Now KVM schedules, calling __switch_to.  __switch_to sets
> _TIF_USER_RETURN_NOTIFY.  We IRET back to the syscall exit code, turn
> off interrupts, and do sysret.  We are now screwed.

Thanks for enlightening!  That looks like a feasible scenario.
(I tested only a 64bit KVM guest, BTW.)

> I don't know why this manifests in this particular failure, but any
> number of terrible things could happen now.
> 
> FWIW, this will affect things other than KVM.  For example, SIGKILL
> sent while a process is sleeping in that two-instruction window won't
> work.
> 
> Takashi, can you re-send your patch so we can review it for real in
> light of this race?

The patch below worked.  I'll double-check tomorrow whether this
really cures reliably.


thanks,

Takashi

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1d74d161687c..5340ac7f88a9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -364,12 +364,12 @@ system_call_fastpath:
  * Has incomplete stack frame and undefined top of stack.
  */
 ret_from_sys_call:
-	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
-	jnz int_ret_from_sys_call_fixup	/* Go the the slow path */
-
 	LOCKDEP_SYS_EXIT
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
+	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+	jnz int_ret_from_sys_call_fixup	/* Go the the slow path */
+
 	CFI_REMEMBER_STATE
 	/*
 	 * sysretq will re-enable interrupts:

  parent reply	other threads:[~2015-03-23 18:56 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-15  8:17 PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-18 14:16 ` Takashi Iwai
2015-03-18 15:05   ` Takashi Iwai
2015-03-18 17:43   ` Takashi Iwai
2015-03-18 17:46     ` Takashi Iwai
2015-03-18 18:03       ` Andy Lutomirski
2015-03-18 19:03         ` Stefan Seyfried
2015-03-18 19:26           ` Andy Lutomirski
2015-03-18 20:05             ` Stefan Seyfried
2015-03-18 20:51               ` Andy Lutomirski
2015-03-18 21:12                 ` Stefan Seyfried
2015-03-18 21:21                   ` Andy Lutomirski
2015-03-18 21:41                     ` Stefan Seyfried
2015-03-18 21:49                       ` Denys Vlasenko
2015-03-18 21:53                         ` Stefan Seyfried
2015-03-18 20:06             ` Denys Vlasenko
2015-03-18 20:49               ` Andy Lutomirski
2015-03-18 21:06                 ` Denys Vlasenko
2015-03-18 21:17                   ` Andy Lutomirski
2015-03-18 21:32             ` Linus Torvalds
2015-03-18 21:42               ` Denys Vlasenko
2015-03-18 21:55                 ` Andy Lutomirski
2015-03-18 22:17                   ` Denys Vlasenko
2015-03-18 22:20                     ` Andy Lutomirski
2015-03-18 22:27                       ` Denys Vlasenko
2015-03-18 22:18                   ` Linus Torvalds
2015-03-18 22:24                     ` Andy Lutomirski
2015-03-18 22:22                   ` Jiri Kosina
2015-03-18 22:28                     ` Linus Torvalds
2015-03-18 22:29                       ` Andy Lutomirski
2015-03-18 22:29                     ` Andy Lutomirski
2015-03-18 22:38                       ` Stefan Seyfried
2015-03-18 22:40                         ` Andy Lutomirski
2015-03-18 23:22                           ` Andy Lutomirski
2015-03-19  0:23                             ` Stefan Seyfried
2015-03-19  0:57                               ` Andy Lutomirski
2015-03-19  2:15                                 ` Linus Torvalds
2015-03-19  6:24                                 ` Stefan Seyfried
2015-03-19 10:16                       ` Takashi Iwai
2015-03-19 10:58                         ` Denys Vlasenko
2015-03-19 11:21                           ` Takashi Iwai
2015-03-19 12:48                             ` Denys Vlasenko
2015-03-19 13:47                               ` Takashi Iwai
2015-03-19 14:55                                 ` Takashi Iwai
2015-03-19 15:22                                   ` Takashi Iwai
2015-03-19 15:41                                     ` Andy Lutomirski
2015-03-19 15:51                                       ` Takashi Iwai
2015-03-19 16:01                                         ` Andy Lutomirski
2015-03-20 18:16                                         ` Denys Vlasenko
2015-03-20 18:50                                           ` Takashi Iwai
2015-03-23  9:02                                           ` Takashi Iwai
2015-03-23  9:35                                             ` Takashi Iwai
2015-03-23 13:22                                               ` Takashi Iwai
2015-03-23 16:07                                                 ` Denys Vlasenko
2015-03-23 17:18                                                   ` Takashi Iwai
2015-03-23 17:46                                                     ` Denys Vlasenko
2015-03-23 18:43                                                       ` Takashi Iwai
2015-03-23 18:38                                                   ` Andy Lutomirski
2015-03-23 18:48                                                     ` Andy Lutomirski
2015-03-23 18:59                                                       ` Takashi Iwai
2015-03-23 19:10                                                         ` [PATCH] x86, entry: Check for syscall exit work with IRQs disabled Andy Lutomirski
2015-03-23 19:21                                                           ` Denys Vlasenko
2015-03-23 19:27                                                             ` Andy Lutomirski
2015-03-23 19:32                                                               ` Andy Lutomirski
2015-03-24 11:17                                                           ` Takashi Iwai
2015-03-24 20:08                                                           ` Ingo Molnar
2015-03-25  0:35                                                             ` Andy Lutomirski
2015-03-25 12:21                                                               ` Ingo Molnar
2015-03-25 15:07                                                                 ` Andy Lutomirski
2015-03-25  9:13                                                           ` [tip:x86/asm] x86/asm/entry: " tip-bot for Andy Lutomirski
2015-03-23 18:54                                                     ` PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-23 18:56                                                     ` Takashi Iwai [this message]
2015-03-23 19:07                                                     ` Denys Vlasenko
2015-03-23 19:10                                                       ` Andy Lutomirski
2015-03-19 13:21                   ` Denys Vlasenko
2015-03-18 21:49               ` Stefan Seyfried
2015-03-28 23:57             ` Maciej W. Rozycki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=s5hlhinlez8.wl-tiwai@suse.de \
    --to=tiwai@suse.de \
    --cc=dvlasenk@redhat.com \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=stefan.seyfried@googlemail.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vda.linux@googlemail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.