From: Denys Vlasenko <dvlasenk@redhat.com>
To: David Drysdale <drysdale@google.com>,
Kees Cook <keescook@chromium.org>,
Andy Lutomirski <luto@amacapital.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Will Drewry <wad@chromium.org>, Ingo Molnar <mingo@kernel.org>
Cc: Alok Kataria <akataria@vmware.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Borislav Petkov <bp@alien8.de>,
Alexei Starovoitov <ast@plumgrid.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"H. Peter Anvin" <hpa@zytor.com>, Oleg Nesterov <oleg@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>, X86 ML <x86@kernel.org>
Subject: Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM?
Date: Thu, 13 Aug 2015 17:17:36 +0200 [thread overview]
Message-ID: <55CCB510.3060807@redhat.com> (raw)
In-Reply-To: <CAHse=S8UeYVJqRBx-_wf=jeDERTJxzM=YRH5izBpQnrCHOxcDA@mail.gmail.com>
On 08/13/2015 10:30 AM, David Drysdale wrote:
> Hi folks,
>
> I've got an odd regression with the v4.2 rc kernel, and I wondered if anyone
> else could reproduce it.
>
> The problem occurs with a seccomp-bpf filter program that's set up to return
> an errno value -- an errno of 1 is always returned instead of what's in the
> filter, plus other oddities (selftest output below).
>
> The problem seems to need a combination of circumstances to occur:
>
> - The seccomp-bpf userspace program needs to be 32-bit, running against a
> 64-bit kernel -- I'm testing with seccomp_bpf from
> tools/testing/selftests/seccomp/, built via 'CFLAGS=-m32 make'.
Does it work correctly when built as 64-bit program?
>
> - The kernel needs to be running as a VM guest -- it occurs inside my
> VMware Fusion host, but not if I run on bare metal. Kees tells me he
> cannot repro with a kvm guest though.
>
> Bisecting indicates that the commit that induces the problem is
> 3f5159a9221f19b0, "x86/asm/entry/32: Update -ENOSYS handling to match the
> 64-bit logic", included in all the v4.2-rc* candidates.
>
> Apologies if I've just got something odd with my local setup, but the
> bisection was unequivocal enough that I thought it worth reporting...
>
> Thanks,
> David
>
>
> seccomp_bpf failure outputs:
>
> seccomp_bpf.c:533:global.ERRNO_valid:Expected 7 (7) ==
> (*__errno_location ()) (1)
Test source code:
TEST(ERRNO_valid)
{
struct sock_filter filter[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
};
struct sock_fprog prog = {
.len = (unsigned short)ARRAY_SIZE(filter),
.filter = filter,
};
long ret;
pid_t parent = getppid();
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
ASSERT_EQ(0, ret);
EXPECT_EQ(parent, syscall(__NR_getppid));
EXPECT_EQ(-1, read(0, NULL, 0));
EXPECT_EQ(E2BIG, errno);
}
The last EXPECT expects 7 (E2BIG) but sees 1.
I'm trying to see how that happens.
SECCOMP_RET_ERRNO action is processed as follows:
static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
{
...
case SECCOMP_RET_ERRNO:
/* Set low-order bits as an errno, capped at MAX_ERRNO. */
if (data > MAX_ERRNO)
data = MAX_ERRNO;
syscall_set_return_value(current, task_pt_regs(current),
-data, 0);
goto skip;
...
skip:
audit_seccomp(this_syscall, 0, action);
return SECCOMP_PHASE1_SKIP; // "the syscall should not be invoked"
}
The above is called from:
unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
{
...
if (work & _TIF_SECCOMP) {
... ret = seccomp_phase1(&sd);
if (ret == SECCOMP_PHASE1_SKIP) {
regs->orig_ax = -1;
ret = 0;
}
...
}
/* Do our best to finish without phase 2. */
if (work == 0)
return ret; /* seccomp and/or nohz only (ret == 0 here) */
#ifdef CONFIG_AUDITSYSCALL
if (work == _TIF_SYSCALL_AUDIT) {
/*
* If there is no more work to be done except auditing,
* then audit in phase 1. Phase 2 always audits, so, if
* we audit here, then we can't go on to phase 2.
*/
do_audit_syscall_entry(regs, arch);
return 0;
}
#endif
return 1; /* Something is enabled that we can't handle in phase 1 */
}
...
long syscall_trace_enter(struct pt_regs *regs)
{
u32 arch = is_ia32_task() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
unsigned long phase1_result = syscall_trace_enter_phase1(regs, arch);
if (phase1_result == 0)
return regs->orig_ax;
else
return syscall_trace_enter_phase2(regs, arch, phase1_result);
}
End result should be:
pt_regs->ax = -E2BIG (via syscall_set_return_value())
pt_regs->orig_ax = -1 ("skip syscall")
and syscall_trace_enter_phase1() usually returns with 0,
meaning "re-execute syscall at once, no phase2 needed".
This, in turn, is called from .S files, and when it returns there,
execution loops back to syscall dispatch.
Because of orig_ax = -1, syscall dispatch should skip calling syscall.
So -E2BIG should survive and be returned...
next prev parent reply other threads:[~2015-08-13 15:17 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-13 8:30 [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM? David Drysdale
2015-08-13 15:17 ` Denys Vlasenko [this message]
2015-08-13 16:28 ` David Drysdale
2015-08-13 17:15 ` Andy Lutomirski
2015-08-13 17:39 ` David Drysdale
2015-08-13 18:47 ` Kees Cook
2015-08-13 21:35 ` Denys Vlasenko
2015-08-13 21:47 ` Andy Lutomirski
2015-08-13 22:49 ` Linus Torvalds
2015-08-13 22:54 ` Linus Torvalds
2015-08-13 22:56 ` Kees Cook
2015-08-13 22:59 ` Andy Lutomirski
2015-08-13 23:14 ` Kees Cook
2015-08-13 23:30 ` Linus Torvalds
2015-08-14 11:58 ` Denys Vlasenko
2015-08-14 14:27 ` Andy Lutomirski
2015-08-14 7:33 ` David Drysdale
2015-08-13 22:58 ` Andy Lutomirski
2015-08-13 23:25 ` Linus Torvalds
2015-08-13 22:27 ` Linus Torvalds
2015-08-14 11:20 ` Denys Vlasenko
2015-08-22 10:03 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55CCB510.3060807@redhat.com \
--to=dvlasenk@redhat.com \
--cc=akataria@vmware.com \
--cc=ast@plumgrid.com \
--cc=bp@alien8.de \
--cc=drysdale@google.com \
--cc=fweisbec@gmail.com \
--cc=hpa@zytor.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=rostedt@goodmis.org \
--cc=torvalds@linux-foundation.org \
--cc=wad@chromium.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox