From: Denys Vlasenko <dvlasenk@redhat.com>
To: David Drysdale <drysdale@google.com>,
Kees Cook <keescook@chromium.org>,
Andy Lutomirski <luto@amacapital.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Will Drewry <wad@chromium.org>, Ingo Molnar <mingo@kernel.org>
Cc: Alok Kataria <akataria@vmware.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Borislav Petkov <bp@alien8.de>,
Alexei Starovoitov <ast@plumgrid.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"H. Peter Anvin" <hpa@zytor.com>, Oleg Nesterov <oleg@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>, X86 ML <x86@kernel.org>
Subject: Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM?
Date: Thu, 13 Aug 2015 17:17:36 +0200 [thread overview]
Message-ID: <55CCB510.3060807@redhat.com> (raw)
In-Reply-To: <CAHse=S8UeYVJqRBx-_wf=jeDERTJxzM=YRH5izBpQnrCHOxcDA@mail.gmail.com>
On 08/13/2015 10:30 AM, David Drysdale wrote:
> Hi folks,
>
> I've got an odd regression with the v4.2 rc kernel, and I wondered if anyone
> else could reproduce it.
>
> The problem occurs with a seccomp-bpf filter program that's set up to return
> an errno value -- an errno of 1 is always returned instead of what's in the
> filter, plus other oddities (selftest output below).
>
> The problem seems to need a combination of circumstances to occur:
>
> - The seccomp-bpf userspace program needs to be 32-bit, running against a
> 64-bit kernel -- I'm testing with seccomp_bpf from
> tools/testing/selftests/seccomp/, built via 'CFLAGS=-m32 make'.
Does it work correctly when built as 64-bit program?
>
> - The kernel needs to be running as a VM guest -- it occurs inside my
> VMware Fusion host, but not if I run on bare metal. Kees tells me he
> cannot repro with a kvm guest though.
>
> Bisecting indicates that the commit that induces the problem is
> 3f5159a9221f19b0, "x86/asm/entry/32: Update -ENOSYS handling to match the
> 64-bit logic", included in all the v4.2-rc* candidates.
>
> Apologies if I've just got something odd with my local setup, but the
> bisection was unequivocal enough that I thought it worth reporting...
>
> Thanks,
> David
>
>
> seccomp_bpf failure outputs:
>
> seccomp_bpf.c:533:global.ERRNO_valid:Expected 7 (7) ==
> (*__errno_location ()) (1)
Test source code:
TEST(ERRNO_valid)
{
struct sock_filter filter[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
};
struct sock_fprog prog = {
.len = (unsigned short)ARRAY_SIZE(filter),
.filter = filter,
};
long ret;
pid_t parent = getppid();
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
ASSERT_EQ(0, ret);
EXPECT_EQ(parent, syscall(__NR_getppid));
EXPECT_EQ(-1, read(0, NULL, 0));
EXPECT_EQ(E2BIG, errno);
}
The last EXPECT expects 7 (E2BIG) but sees 1.
I'm trying to see how that happens.
SECCOMP_RET_ERRNO action is processed as follows:
static u32 __seccomp_phase1_filter(int this_syscall, struct seccomp_data *sd)
{
...
case SECCOMP_RET_ERRNO:
/* Set low-order bits as an errno, capped at MAX_ERRNO. */
if (data > MAX_ERRNO)
data = MAX_ERRNO;
syscall_set_return_value(current, task_pt_regs(current),
-data, 0);
goto skip;
...
skip:
audit_seccomp(this_syscall, 0, action);
return SECCOMP_PHASE1_SKIP; // "the syscall should not be invoked"
}
The above is called from:
unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
{
...
if (work & _TIF_SECCOMP) {
... ret = seccomp_phase1(&sd);
if (ret == SECCOMP_PHASE1_SKIP) {
regs->orig_ax = -1;
ret = 0;
}
...
}
/* Do our best to finish without phase 2. */
if (work == 0)
return ret; /* seccomp and/or nohz only (ret == 0 here) */
#ifdef CONFIG_AUDITSYSCALL
if (work == _TIF_SYSCALL_AUDIT) {
/*
* If there is no more work to be done except auditing,
* then audit in phase 1. Phase 2 always audits, so, if
* we audit here, then we can't go on to phase 2.
*/
do_audit_syscall_entry(regs, arch);
return 0;
}
#endif
return 1; /* Something is enabled that we can't handle in phase 1 */
}
...
long syscall_trace_enter(struct pt_regs *regs)
{
u32 arch = is_ia32_task() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
unsigned long phase1_result = syscall_trace_enter_phase1(regs, arch);
if (phase1_result == 0)
return regs->orig_ax;
else
return syscall_trace_enter_phase2(regs, arch, phase1_result);
}
End result should be:
pt_regs->ax = -E2BIG (via syscall_set_return_value())
pt_regs->orig_ax = -1 ("skip syscall")
and syscall_trace_enter_phase1() usually returns with 0,
meaning "re-execute syscall at once, no phase2 needed".
This, in turn, is called from .S files, and when it returns there,
execution loops back to syscall dispatch.
Because of orig_ax = -1, syscall dispatch should skip calling syscall.
So -E2BIG should survive and be returned...
next prev parent reply other threads:[~2015-08-13 15:17 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-13 8:30 [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM? David Drysdale
2015-08-13 15:17 ` Denys Vlasenko [this message]
2015-08-13 16:28 ` David Drysdale
2015-08-13 17:15 ` Andy Lutomirski
2015-08-13 17:39 ` David Drysdale
2015-08-13 18:47 ` Kees Cook
2015-08-13 21:35 ` Denys Vlasenko
2015-08-13 21:47 ` Andy Lutomirski
2015-08-13 22:49 ` Linus Torvalds
2015-08-13 22:54 ` Linus Torvalds
2015-08-13 22:56 ` Kees Cook
2015-08-13 22:59 ` Andy Lutomirski
2015-08-13 23:14 ` Kees Cook
2015-08-13 23:30 ` Linus Torvalds
2015-08-14 11:58 ` Denys Vlasenko
2015-08-14 14:27 ` Andy Lutomirski
2015-08-14 7:33 ` David Drysdale
2015-08-13 22:58 ` Andy Lutomirski
2015-08-13 23:25 ` Linus Torvalds
2015-08-13 22:27 ` Linus Torvalds
2015-08-14 11:20 ` Denys Vlasenko
2015-08-22 10:03 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55CCB510.3060807@redhat.com \
--to=dvlasenk@redhat.com \
--cc=akataria@vmware.com \
--cc=ast@plumgrid.com \
--cc=bp@alien8.de \
--cc=drysdale@google.com \
--cc=fweisbec@gmail.com \
--cc=hpa@zytor.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=rostedt@goodmis.org \
--cc=torvalds@linux-foundation.org \
--cc=wad@chromium.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.