From: Denys Vlasenko <dvlasenk@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Takashi Iwai <tiwai@suse.de>,
Denys Vlasenko <vda.linux@googlemail.com>,
Jiri Kosina <jkosina@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Stefan Seyfried <stefan.seyfried@googlemail.com>,
X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Mon, 23 Mar 2015 20:07:18 +0100
Message-ID: <55106466.7030202@redhat.com>
In-Reply-To: <CALCETrWmz2JoWJptt-YxVszJX0J8m+OhQPqiXRJsE460tXbYNg@mail.gmail.com>

On 03/23/2015 07:38 PM, Andy Lutomirski wrote:
>> cmpq $__NR_syscall_max,%rax
>> ja ret_from_sys_call
>> movq %r10,%rcx
>> call *sys_call_table(,%rax,8) # XXX: rip relative
>> movq %rax,RAX-ARGOFFSET(%rsp)
>> ret_from_sys_call:
>> testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>> LOCKDEP_SYS_EXIT
>> DISABLE_INTERRUPTS(CLBR_NONE)
>> TRACE_IRQS_OFF
>> ...
>> ...
>> int_ret_from_sys_call_fixup:
>> FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>> jmp int_ret_from_sys_call
>> ...
>> ...
>> GLOBAL(int_ret_from_sys_call)
>> DISABLE_INTERRUPTS(CLBR_NONE)
>> TRACE_IRQS_OFF
>>
>> You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE).
>>
>> I also don't see how moving that check (even if it is wrong in a more
>> benign way) can have such a drastic effect.
>
> I bet I see it. I have the advantage of having stared at KVM code and
> cursed at it more recently than you, I suspect. KVM does awful, awful
> things to CPU state, and, as an optimization, it allows kernel code to
> run with CPU state that would be totally invalid in user mode. This
> happens through a bunch of hooks, including this bit in __switch_to:
>
> /*
> * Now maybe reload the debug registers and handle I/O bitmaps
> */
> if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
> task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
> __switch_to_xtra(prev_p, next_p, tss);
>
> IOW, we *change* tif during context switches.
>
>
> The race looks like this:
>
> testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
> jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>
> --- preempted here, switch to KVM guest ---
>
> KVM guest enters and screws up, say, MSR_SYSCALL_MASK. This wouldn't
> happen to be a *32-bit* KVM guest, perhaps?
>
> Now KVM schedules, calling __switch_to. __switch_to sets
> _TIF_USER_RETURN_NOTIFY.

Clear up to now...

> We IRET back to the syscall exit code,

So we end up just after the "testl", right?
We go into "int_ret_from_sys_call_fixup".
We FIXUP_TOP_OF_STACK - now the iret frame contains correct values.
Then we jump to "int_ret_from_sys_call".

> turn off interrupts, and do sysret. We are now screwed.

I don't understand. Where exactly would it go wrong?
On sysret, rsp would be restored from PER_CPU(old_rsp), right?
We'd end up in *userspace* with the userspace rsp.

What's more, since we FIXUPed the iret frame, it does not even matter
how we exit to userspace. Either sysret or iret would work.
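
To make the ordering under discussion concrete, here is a minimal
user-space C sketch (not kernel code). It only models the window between
testing the flags and disabling interrupts; it says nothing about what
happens afterwards on sysret or iret, which is the open question above.
All names (sim_task, SIM_TIF_WORK, sim_preempt and the two exit
functions) are hypothetical stand-ins, and the "preemption" is assumed to
set a work flag whenever interrupts are still enabled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a work bit such as _TIF_USER_RETURN_NOTIFY. */
#define SIM_TIF_WORK (1u << 0)

struct sim_task {
    unsigned int flags;  /* models thread_info->flags */
    bool irqs_enabled;   /* models the CPU interrupt flag */
};

/*
 * Models a preemption that sets a work flag (assumption: this is what
 * __switch_to does in the scenario). It can only happen while
 * interrupts are enabled.
 */
static void sim_preempt(struct sim_task *t)
{
    if (t->irqs_enabled)
        t->flags |= SIM_TIF_WORK;
}

/*
 * Ordering A: test the flags first, disable interrupts afterwards.
 * Returns true if the fast path is taken although work is pending.
 */
static bool sim_exit_check_then_disable(struct sim_task *t)
{
    bool saw_work = (t->flags & SIM_TIF_WORK) != 0; /* the "testl" */
    sim_preempt(t);                                  /* window is open */
    t->irqs_enabled = false;                         /* DISABLE_INTERRUPTS */
    return !saw_work && (t->flags & SIM_TIF_WORK) != 0;
}

/*
 * Ordering B: disable interrupts first, then test the flags.
 * Returns true if the fast path is taken although work is pending.
 */
static bool sim_exit_disable_then_check(struct sim_task *t)
{
    t->irqs_enabled = false;                         /* DISABLE_INTERRUPTS */
    sim_preempt(t);                                  /* no effect now */
    bool saw_work = (t->flags & SIM_TIF_WORK) != 0; /* the "testl" */
    return !saw_work && (t->flags & SIM_TIF_WORK) != 0;
}
```

Starting from a task with flags == 0 and interrupts enabled, ordering A
reports a missed flag and ordering B does not. Whether a missed flag
actually leads to the double fault is not something this sketch can show.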