public inbox for linux-kernel@vger.kernel.org
From: Sabrina Dubroca <sd@queasysnail.net>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>,
	Andrey Wagin <avagin@gmail.com>, Ingo Molnar <mingo@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Frederic Weisbecker <fweisbec@gmail.com>, X86 ML <x86@kernel.org>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Will Drewry <wad@chromium.org>, Kees Cook <keescook@chromium.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/3 v3] x86: entry_64.S: always allocate complete "struct pt_regs"
Date: Wed, 25 Feb 2015 23:40:55 +0100
Message-ID: <20150225224055.GA3678@kria>
In-Reply-To: <CALCETrVFzm-nXwJSgjhLuHS6B1HT70Q0CsxybjJUmVAKALL5NA@mail.gmail.com>

2015-02-25, 13:59:06 -0800, Andy Lutomirski wrote:
> On Wed, Feb 25, 2015 at 1:28 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > On 02/25/2015 09:10 PM, Andy Lutomirski wrote:
> >> On Wed, Feb 25, 2015 at 11:59 AM, Andrey Wagin <avagin@gmail.com> wrote:
> >>> 2015-02-25 21:42 GMT+03:00 Denys Vlasenko <dvlasenk@redhat.com>:
> >>>> On 02/25/2015 01:37 PM, Andrey Wagin wrote:
> >>>>> 2015-02-13 0:54 GMT+03:00 Denys Vlasenko <dvlasenk@redhat.com>:
> >>>>>> 64-bit code was using six fewer stack slots by not saving/restoring
> >>>>>> registers which are callee-preserved according to the C ABI,
> >>>>>> and not allocating space for them.
> >>>>>> Only when syscall needed a complete "struct pt_regs",
> >>>>>> the complete area was allocated and filled in.
> >>>>>> As an additional twist, on interrupt entry a "slightly less truncated pt_regs"
> >>>>>> trick is used, to make nested interrupt stacks easier to unwind.
> >>>>>>
> >>>>>> This proved to be a source of significant obfuscation and subtle bugs.
> >>>>>> For example, stub_fork had to pop the return address,
> >>>>>> extend the struct, save registers, and push return address back. Ugly.
> >>>>>> ia32_ptregs_common pops return address and "returns" via jmp insn,
> >>>>>> throwing a wrench into CPU return stack cache.
> >>>>>>
> >>>>>> This patch changes code to always allocate a complete "struct pt_regs".
> >>>>>> The saving of registers is still done lazily.
> >>>>>>
> >>>>>> "Partial pt_regs" trick on interrupt stack is retained.
> >>>>>>
> >>>>>> Macros which manipulate "struct pt_regs" on stack are reworked:
> >>>>>> ALLOC_PT_GPREGS_ON_STACK allocates the structure.
> >>>>>> SAVE_C_REGS saves to it those registers which are clobbered by C code.
> >>>>>> SAVE_EXTRA_REGS saves to it all other registers.
> >>>>>> Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it.
> >>>>>>
> >>>>>> ia32_ptregs_common, stub_fork and friends lost their ugly dance with
> >>>>>> return pointer.
> >>>>>>
> >>>>>> LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
> >>>>>> instead of magic numbers.
> >>>>>>
> >>>>>> error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS
> >>>>>> instead of having it open-coded yet again.
> >>>>>>
> >>>>>> Patch was run-tested: 64-bit executables, 32-bit executables,
> >>>>>> strace works.
> >>>>>> Timing tests did not show measurable difference in 32-bit
> >>>>>> and 64-bit syscalls.
> >>>>>
> >>>>> Hello Denys,
> >>>>>
> >>>>> My test vm doesn't boot with this patch. Could you help to investigate
> >>>>> this issue?
> >>>>
> >>>> I think I found it. This part of my patch is possibly wrong:
> >>>>
> >>>> @@ -171,9 +171,9 @@ static inline int arch_irqs_disabled(void)
> >>>>  #define ARCH_LOCKDEP_SYS_EXIT_IRQ      \
> >>>>         TRACE_IRQS_ON; \
> >>>>         sti; \
> >>>> -       SAVE_REST; \
> >>>> +       SAVE_EXTRA_REGS; \
> >>>>         LOCKDEP_SYS_EXIT; \
> >>>> -       RESTORE_REST; \
> >>>> +       RESTORE_EXTRA_REGS; \
> >>>>         cli; \
> >>>>         TRACE_IRQS_OFF;
> >>>>
> >>>> The "SAVE_REST" here is intended to really *push* extra regs on stack,
> >>>> but the patch changed it so that they are written to existing stack
> >>>> slots above.
> >>>>
> >>>> From code inspection it should work in almost all cases, but some
> >>>> locations where it is used are really obscure.
> >>>>
> >>>> If there are places where *pushing* regs is really necessary,
> >>>> this can corrupt rbp,rbx,r12-15 registers.
> >>>>
> >>>> Your config has CONFIG_LOCKDEP=y, I think it's worth trying whether the bug
> >>>> was here.
> >>>> Please find updated patch attached. Can you try it?
> >>>
> >>> It doesn't work
> >
> > Thanks for testing it anyway.
> >
> >
> >>> [    3.016262] traps: systemd-cgroups[390] general protection
> >>> ip:7f456f7b6028 sp:7fffdc059718 error:0 in
> >>> ld-2.18.so[7f456f79e000+20000]
> >
> > This is what I know about these crashes. The SEGV itself is caused by
> > HLT instruction executed by dynamic loader, ld-2.NN.so.
> > The instruction is in _exit function, and is only reachable if
> > exit_group and exit syscalls fail to terminate the process.
> > So it seems that syscall execution is getting badly broken somehow
> > at some point.
> >
> > This happens to both reporters.
> >
> > My theory that it is related to lockdep seems to be wrong, because
> > Sabrina's kernel is not lockdep-enabled, yet it sees the same failure.
> >
> > Both kernels are paravirtualized, both are booted under KVM,
> > Andrey runs it with four virtual CPUs, Sabrina runs with two.
> >
> > My next theory is that I missed something related to paravirt.
> > I am looking at that code, so far I don't see anything suspicious.
> >
> > Unfortunately, it doesn't happen to me: I have Sabrina's bzImage,
> > I run it under "qemu-system-x86_64 -enable-kvm -smp 2",
> > I see in dmesg that kernel does detect that it is being run under KVM,
> > but it works for me. No mysterious segfaults.
> >
> > Andrey, can you send me your bzImage? Maybe it will trigger
> > the problem for me.
> >
> >
> >> The change to stub_\func looks wrong to me.  It saves and restores
> >> regs, but those regs might already have been saved if we're on the
> >> slow path.  (Yes, all that code is quite buggy even without all these
> >> patches.)  So is execve.
> >>
> >> This means that, for example, execve called in the slow path will
> >> save/restore regs twice.  If the values in the regs after the first
> >> save and before the second save are different, then we corrupt user
> >> state.
> >
> > This part?
> >
> >         .macro FORK_LIKE func
> >  ENTRY(stub_\func)
> >         CFI_STARTPROC
> > -       popq    %r11                    /* save return address */
> > -       PARTIAL_FRAME 0
> > -       SAVE_REST
> > -       pushq   %r11                    /* put it back on stack */
> > +       DEFAULT_FRAME 0, 8              /* offset 8: return address */
> > +       SAVE_EXTRA_REGS 8
> >         FIXUP_TOP_OF_STACK %r11, 8
> > -       DEFAULT_FRAME 0 8               /* offset 8: return address */
> >         call sys_\func
> >         RESTORE_TOP_OF_STACK %r11, 8
> > -       ret $REST_SKIP          /* pop extended registers */
> > +       ret
> >         CFI_ENDPROC
> >  END(stub_\func)
> >         .endm
> >
> >         FORK_LIKE  clone
> >         FORK_LIKE  fork
> >         FORK_LIKE  vfork
> >
> > But the old code (SAVE_REST thing) was also saving registers here.
> > It had to jump through hoops (pop return address, SAVE_REST,
> > push return address) to do that.
> > After the patch, "SAVE_EXTRA_REGS 8" does the same, just without
> > pop/push pair.
> >
> > I just don't see what's wrong with it. Can you elaborate?
> 
> SAVE_REST pushed the regs onto the stack, whereas SAVE_EXTRA_REGS just
> writes them in place.  It's possible for this to be called when the
> regs have already been saved.
> 
> >
> > And this area of code has no paravirt gunk, so if the bug is here,
> > why doesn't it fail for people running this natively?
> 
> I don't know whether paravirt is involved.  It could be something else.

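[Editorial sketch, not part of the original exchange.] The concern Andy
raises above (SAVE_REST pushed a fresh copy of the registers, while
SAVE_EXTRA_REGS writes into slots that may already hold saved values) can
be illustrated with a toy model. Everything here is an assumption made for
illustration: registers and the stack are plain Python objects, and the
helper names save_by_push, restore_by_pop, save_in_place and run are
hypothetical, not kernel symbols. It only models the double-save argument;
it does not show that this is the cause of the traps reported in this
thread.

```python
# Toy model only: this is NOT kernel code. Registers are dict entries and
# the "stack" is a Python list of frames.
EXTRA_REGS = ("rbx", "rbp", "r12", "r13", "r14", "r15")


def save_by_push(stack, regs):
    """Old SAVE_REST-like behaviour: push a brand-new frame and fill it."""
    stack.append({name: regs[name] for name in EXTRA_REGS})


def restore_by_pop(stack, regs):
    """Counterpart of save_by_push: pop the frame back into the registers."""
    regs.update(stack.pop())


def save_in_place(frame, regs):
    """New SAVE_EXTRA_REGS-like behaviour: overwrite preallocated slots."""
    for name in EXTRA_REGS:
        frame[name] = regs[name]


def run(in_place):
    """Return True if the saved user state survives a second save."""
    user = {name: i for i, name in enumerate(EXTRA_REGS, start=1)}
    regs = dict(user)

    pt_regs = {}
    save_in_place(pt_regs, regs)      # first save, e.g. on the slow path

    regs["rbx"] = 0xDEAD              # live register changes before the stub

    if in_place:
        save_in_place(pt_regs, regs)  # second save clobbers the user value
    else:
        stack = []
        save_by_push(stack, regs)     # second save lands in a fresh frame
        restore_by_pop(stack, regs)

    return pt_regs == user


if __name__ == "__main__":
    print("push-style second save keeps user state:", run(in_place=False))
    print("in-place second save keeps user state:  ", run(in_place=True))
```

In this model the push-style variant leaves the first saved frame intact,
while the in-place variant overwrites the saved rbx with whatever the live
register held at the time of the second save, which is the "corrupt user
state" case described above.
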
After reading Denys's last mail, I tried booting the same VM
 - with 1 cpu
 - without CONFIG_PARAVIRT
 - with x86_64_defconfig

and I still get the same traps in all 3 cases.

I can run some userspace programs, but I have no idea what would be
helpful.
I can also try booting a real machine with archlinux/systemd tomorrow.

-- 
Sabrina
