From: Denys Vlasenko <dvlasenk@redhat.com>
To: Kees Cook <keescook@chromium.org>, David Drysdale <drysdale@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Will Drewry <wad@chromium.org>, Ingo Molnar <mingo@kernel.org>,
	Alok Kataria <akataria@vmware.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Oleg Nesterov <oleg@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>, X86 ML <x86@kernel.org>
Subject: Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM?
Date: Thu, 13 Aug 2015 23:35:40 +0200
Message-ID: <55CD0DAC.9080809@redhat.com>
In-Reply-To: <CAGXu5j+gXShOAdK93KuVide93VYA_ObyjbK-zb7CwgOLc2JCnQ@mail.gmail.com>

On 08/13/2015 08:47 PM, Kees Cook wrote:
> On Thu, Aug 13, 2015 at 10:39 AM, David Drysdale <drysdale@google.com> wrote:
>> On Thu, Aug 13, 2015 at 6:15 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Thu, Aug 13, 2015 at 9:28 AM, David Drysdale <drysdale@google.com> wrote:
>>>> On Thu, Aug 13, 2015 at 4:17 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>>> On 08/13/2015 10:30 AM, David Drysdale wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> I've got an odd regression with the v4.2 rc kernel, and I wondered if anyone
>>>>>> else could reproduce it.
>>>>>>
>>>>>> The problem occurs with a seccomp-bpf filter program that's set up to return
>>>>>> an errno value -- an errno of 1 is always returned instead of what's in the
>>>>>> filter, plus other oddities (selftest output below).
>>>>>>
>>>>>> The problem seems to need a combination of circumstances to occur:
>>>>>>
>>>>>>  - The seccomp-bpf userspace program needs to be 32-bit, running against a
>>>>>>    64-bit kernel -- I'm testing with seccomp_bpf from
>>>>>>    tools/testing/selftests/seccomp/, built via 'CFLAGS=-m32 make'.
>>>>>
>>>>> Does it work correctly when built as 64-bit program?
>>>>
>>>> Yep, 64-bit works fine (both at v4.2-rc6 and at commit 3f5159).
>>>>
>>>>>>
>>>>>>  - The kernel needs to be running as a VM guest -- it occurs inside my
>>>>>>    VMware Fusion host, but not if I run on bare metal.  Kees tells me he
>>>>>>    cannot repro with a kvm guest though.
>>>>>>
>>>>>> Bisecting indicates that the commit that induces the problem is
>>>>>> 3f5159a9221f19b0, "x86/asm/entry/32: Update -ENOSYS handling to match the
>>>>>> 64-bit logic", included in all the v4.2-rc* candidates.
>>>>>>
>>>>>> Apologies if I've just got something odd with my local setup, but the
>>>>>> bisection was unequivocal enough that I thought it worth reporting...
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> seccomp_bpf failure outputs:
>>>>
>>>> [snip]
>>>>
>>>>> End result should be:
>>>>> pt_regs->ax = -E2BIG (via syscall_set_return_value())
>>>>> pt_regs->orig_ax = -1 ("skip syscall")
>>>>> and syscall_trace_enter_phase1() usually returns with 0,
>>>>> meaning "re-execute syscall at once, no phase2 needed".
>>>>>
>>>>> This, in turn, is called from .S files, and when it returns there,
>>>>> execution loops back to syscall dispatch.
>>>>>
>>>>> Because of orig_ax = -1, syscall dispatch should skip calling syscall.
>>>>> So -E2BIG should survive and be returned...
>>>>
>>>> So I was just about to send:
>>>>
>>>>  That makes sense, and given that exactly the same 32-bit binary
>>>>  runs fine on a different machine, there's presumably something up
>>>>  with my local setup.  The failing machine is a VMware guest, but
>>>>  maybe that's not the relevant interaction -- particularly if no-one
>>>>  else can repro.
>>>>
>>>> But then I noticed some odd audit entries in the main log:
>>>>
>>>> Aug 13 16:52:56 ubuntu kernel: [   20.687249] audit: type=1326
>>>> audit(1439481176.034:62): auid=4294967295 uid=1000 gid=1000
>>>> ses=4294967295 pid=2621 comm="secccomp_bpf.ke"
>>>> exe="/home/dmd/secccomp_bpf.kees.m32" sig=9 arch=40000003 syscall=172
>>>> compat=1 ip=0xf773cc90 code=0x0
>>>> Aug 13 16:52:56 ubuntu kernel: [   20.691157] audit: type=1326
>>>> audit(1439481176.038:63): auid=4294967295 uid=1000 gid=1000
>>>> ses=4294967295 pid=2631 comm="secccomp_bpf.ke"
>>>> exe="/home/dmd/secccomp_bpf.kees.m32" sig=31 arch=40000003 syscall=20
>>>> compat=1 ip=0xf773cc90 code=0x10000000
>>>> ...
>>>>
>>>> I didn't think I had any audit stuff turned on, and indeed:
>>>>   # auditctl -l
>>>>   No rules
>>>>
>>>> But as soon as I'd run that auditctl command, the 32-bit
>>>> seccomp_bpf binary started running fine!
>>>>
>>>> So now I'm confused, and I can no longer reproduce the
>>>> problem.  Which probably means this was a false alarm, in
>>>> which case, my apologies.
>>>
>>> You might have triggered TIF_AUDIT or whatever it's called, which
>>> causes a whole different path through the asm tangle, so you might
>>> really have a problem.
>>>
>>> Try auditctl -a task,never.  If that doesn't change anything, try
>>> rebooting the guest.
>>
>> Aha, that seems to re-instate the problem -- with that auditctl setup
>> I get the 32-bit seccomp failures on two different machines (one VM,
>> one bare).  So can anyone else repro?
>>
>> I guess the relevant steps are thus:
>>   - sudo auditctl -a task,never
>>   - cd tools/testing/selftests/seccomp
>>   - CFLAGS=-m32 make clean run_tests
> 
> That was it! I can reproduce this now on kvm (after adding the auditctl rule).

I suspect this change:

        .macro auditsys_entry_common
...
        movl %ebx,%esi                  /* 2nd arg: 1st syscall arg */
        movl %eax,%edi                  /* 1st arg: syscall number */
        call __audit_syscall_entry
-       movl RAX(%rsp),%eax     /* reload syscall number */
-       cmpq $(IA32_NR_syscalls-1),%rax
-       ja ia32_badsys
+       movl ORIG_RAX(%rsp),%eax        /* reload syscall number */
        movl %ebx,%edi                  /* reload 1st syscall arg */
        movl RCX(%rsp),%esi     /* reload 2nd syscall arg */
        movl RDX(%rsp),%edx     /* reload 3rd syscall arg */

Before the patch, we were reloading the syscall number from pt_regs->ax.

After the patch, pt_regs->ax isn't equal to syscall# on entry,
instead it contains -ENOSYS. Therefore the change shown above
was made, to reload it from pt_regs->orig_ax.
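
To make the ax vs. orig_ax distinction concrete, here is a small
userspace C model. It is only a sketch, not kernel code: struct
regs_model, MODEL_NR_SYSCALLS and every model_* / reload_* helper are
hypothetical names invented for illustration. It assumes what is
described in this thread: entry preloads ax with -ENOSYS, a seccomp
errno verdict stores -errno in ax and -1 in orig_ax, and dispatch skips
any out-of-range syscall number.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, cut-down stand-in for the two pt_regs slots discussed here. */
struct regs_model {
	long ax;      /* return-value slot */
	long orig_ax; /* syscall number as passed by userspace */
};

#define MODEL_NR_SYSCALLS 400L /* assumed table size, illustration only */

/* Entry after the patch: ax is preloaded with -ENOSYS, orig_ax holds the nr. */
static void model_entry(struct regs_model *r, long nr)
{
	r->orig_ax = nr;
	r->ax = -ENOSYS;
}

/* Pre-patch style reload: no longer yields the syscall number. */
static long reload_from_ax(const struct regs_model *r)
{
	return r->ax;
}

/* Post-patch style reload, as in the diff above. */
static long reload_from_orig_ax(const struct regs_model *r)
{
	return r->orig_ax;
}

/* Seccomp "return errno" verdict: set the result and request a skip. */
static void model_seccomp_errno(struct regs_model *r, int err)
{
	r->ax = -err;
	r->orig_ax = -1;
}

/* Stub syscall body for the model; always succeeds. */
static long model_handler(void)
{
	return 0;
}

/* Dispatch: an out-of-range nr (including -1) leaves ax untouched. */
static long model_dispatch(struct regs_model *r, long nr)
{
	if ((unsigned long)nr < (unsigned long)MODEL_NR_SYSCALLS)
		r->ax = model_handler();
	return r->ax;
}
```

In this model, calling model_entry(), then model_seccomp_errno(&r, E2BIG),
then model_dispatch(&r, reload_from_orig_ax(&r)) leaves -E2BIG in ax,
which is the behaviour expected in the quoted analysis. Whether the
real asm path preserves it is exactly the open question.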

Well. This still should work... in fact it is "more correct"
than it was before...

The 64-bit code has no call to __audit_syscall_entry; it uses the
syscall_trace_enter_phase1/phase2 mechanism instead of the
"only audit" shortcut. If the bug is here (though I don't see it),
that would explain why the 64-bit binary works.


Now, how do we reach this bit of code?

ia32_sysenter_target:
...
        testl   $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jnz  sysenter_tracesys
...
sysenter_tracesys:
        testl   $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jz      sysenter_auditsys
...
sysenter_auditsys:
        auditsys_entry_common     <== OUR MACRO
        movl %ebp,%r9d                  /* reload 6th syscall arg */
        jmp sysenter_dispatch


ia32_cstar_target:
...
        testl   $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jnz   cstar_tracesys
...
cstar_tracesys:
        testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
        jz cstar_auditsys
...
cstar_auditsys:
        movl %r9d,R9(%rsp)      /* register to be clobbered by call */
        auditsys_entry_common  <== OUR MACRO
        movl R9(%rsp),%r9d      /* reload 6th syscall arg */
        jmp cstar_dispatch
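
The two testl/jnz/jz pairs above amount to a three-way classification
of the thread flags. A rough C sketch of that decision follows. The
M_TIF_* bit values and the reduced work mask are made up for
illustration (the real definitions live in the kernel's thread_info.h
and contain more flags); enum entry_path and classify_entry() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit assignments only, NOT the kernel's real values. */
#define M_TIF_SYSCALL_TRACE (1u << 0)
#define M_TIF_SYSCALL_AUDIT (1u << 1)
#define M_TIF_SECCOMP       (1u << 2)

/* Reduced stand-in for _TIF_WORK_SYSCALL_ENTRY. */
#define M_TIF_WORK_SYSCALL_ENTRY \
	(M_TIF_SYSCALL_TRACE | M_TIF_SYSCALL_AUDIT | M_TIF_SECCOMP)

enum entry_path {
	PATH_FAST,     /* no work flags: fall through to dispatch */
	PATH_AUDITSYS, /* only the audit flag: *_auditsys shortcut */
	PATH_TRACESYS  /* any other work flag: rest of *_tracesys */
};

static enum entry_path classify_entry(uint32_t ti_flags)
{
	/* testl $_TIF_WORK_SYSCALL_ENTRY, ... ; jnz *_tracesys */
	if (!(ti_flags & M_TIF_WORK_SYSCALL_ENTRY))
		return PATH_FAST;

	/* testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ... ; jz *_auditsys */
	if (!(ti_flags & (M_TIF_WORK_SYSCALL_ENTRY & ~M_TIF_SYSCALL_AUDIT)))
		return PATH_AUDITSYS;

	return PATH_TRACESYS;
}
```

Under these assumptions, classify_entry(M_TIF_SYSCALL_AUDIT) selects the
auditsys shortcut, while any combination that includes another work
flag, such as M_TIF_SECCOMP, goes down the remainder of the tracesys
path instead.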

