From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andy Lutomirski Subject: Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases Date: Fri, 6 Feb 2015 12:20:20 -0800 Message-ID: References: <2df320a600020fda055fccf2b668145729dd0c04.1409954077.git.luto@amacapital.net> <20150205211916.GA31367@altlinux.org> <20150205214027.GB31367@altlinux.org> <20150205233945.GA31540@altlinux.org> <20150206023249.GB31540@altlinux.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Return-path: In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org To: Kees Cook Cc: "Dmitry V. Levin" , LKML , Will Drewry , Oleg Nesterov , "x86@kernel.org" , "linux-arm-kernel@lists.infradead.org" , Linux MIPS Mailing List , linux-arch , linux-security-module , Alexei Starovoitov , "H. Peter Anvin" , Frederic Weisbecker , Michael Kerrisk-manpages List-Id: linux-arch.vger.kernel.org On Fri, Feb 6, 2015 at 12:16 PM, Kees Cook wrote: > On Fri, Feb 6, 2015 at 12:12 PM, Andy Lutomirski wrote: >> On Fri, Feb 6, 2015 at 12:07 PM, Kees Cook wrote: >>> On Fri, Feb 6, 2015 at 11:32 AM, Andy Lutomirski wrote: >>>> On Fri, Feb 6, 2015 at 11:23 AM, Kees Cook wrote: >>>>> And especially since a ptracer >>>>> can change syscalls during syscall-enter-stop to any syscall it wants, >>>>> bypassing seccomp. This condition is already documented. >>>> >>>> If a ptracer (using PTRACE_SYSCALL) were to get the entry callback >>>> before seccomp, then this oddity would go away, which might be a good >>>> thing. A ptracer could change the syscall, but seccomp would based on >>>> what the ptracer changed the syscall to. >>> >>> I want kill events to trigger immediately. I don't want to leave the >>> ptrace surface available on a SECCOMP_RET_KILL. So maybe it can be >>> seccomp phase 1, then ptrace, then seccomp phase 2? And pass more >>> information between phases to determine how things should behave >>> beyond just "skip"? >> >> I thought so too, originally, but I'm far less convinced now, for two reasons: >> >> 1. I think that a lot of filters these days use RET_ERRNO heavily, so >> this won't benefit them. >> >> 2. I'm not convinced it really reduces the attack surface for anyone. >> Unless your filter is literally "return SECCOMP_RET_KILL", then the >> seccomp-filtered task can always cause the ptracer to get a pair of >> syscall notifications. Also, the task can send itself signals (using >> page faults, breakpoints, etc) and cause ptrace events via other >> paths. > > What are you thinking for a solution? > I'm writing a patch now. It's an ABI break, but this thread seems to show that the ABI was somewhat useless before the split-phase changes, and it's differently broken now, so I would be surprised if the change broke anything that was currently working. I'll send it later today, hopefully. > As for capping SECCOMP_RET_ERRNO to MAX_ERRNO, how about this (sorry > if gmail butchers the paste): > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index 4ef9687ac115..c88148d20bd5 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -629,7 +629,9 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct > > switch (action) { > case SECCOMP_RET_ERRNO: > - /* Set the low-order 16-bits as a errno. */ > + /* Set the low-order bits as a errno. */ > + if (data > MAX_ERRNO) > + data = MAX_ERRNO; > syscall_set_return_value(current, task_pt_regs(current), > -data, 0); > goto skip; > I'm fine with this, but I'm not entirely convinced it solves a problem. SECCOMP_RET_ERRNO | 5000 didn't work before, and it doesn't work now. Admittedly, the new failure mode is possibly better. --Andy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f178.google.com ([209.85.217.178]:45423 "EHLO mail-lb0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752419AbbBFUUm (ORCPT ); Fri, 6 Feb 2015 15:20:42 -0500 Received: by mail-lb0-f178.google.com with SMTP id b6so13640681lbj.9 for ; Fri, 06 Feb 2015 12:20:41 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <2df320a600020fda055fccf2b668145729dd0c04.1409954077.git.luto@amacapital.net> <20150205211916.GA31367@altlinux.org> <20150205214027.GB31367@altlinux.org> <20150205233945.GA31540@altlinux.org> <20150206023249.GB31540@altlinux.org> From: Andy Lutomirski Date: Fri, 6 Feb 2015 12:20:20 -0800 Message-ID: Subject: Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases Content-Type: text/plain; charset=UTF-8 Sender: linux-arch-owner@vger.kernel.org List-ID: To: Kees Cook Cc: "Dmitry V. Levin" , LKML , Will Drewry , Oleg Nesterov , "x86@kernel.org" , "linux-arm-kernel@lists.infradead.org" , Linux MIPS Mailing List , linux-arch , linux-security-module , Alexei Starovoitov , "H. Peter Anvin" , Frederic Weisbecker , Michael Kerrisk-manpages Message-ID: <20150206202020.Yp3NXLL98QW6PvXTvBIPW6Dpt9eZdJiLIeMHMU5hi8c@z> On Fri, Feb 6, 2015 at 12:16 PM, Kees Cook wrote: > On Fri, Feb 6, 2015 at 12:12 PM, Andy Lutomirski wrote: >> On Fri, Feb 6, 2015 at 12:07 PM, Kees Cook wrote: >>> On Fri, Feb 6, 2015 at 11:32 AM, Andy Lutomirski wrote: >>>> On Fri, Feb 6, 2015 at 11:23 AM, Kees Cook wrote: >>>>> And especially since a ptracer >>>>> can change syscalls during syscall-enter-stop to any syscall it wants, >>>>> bypassing seccomp. This condition is already documented. >>>> >>>> If a ptracer (using PTRACE_SYSCALL) were to get the entry callback >>>> before seccomp, then this oddity would go away, which might be a good >>>> thing. A ptracer could change the syscall, but seccomp would based on >>>> what the ptracer changed the syscall to. >>> >>> I want kill events to trigger immediately. I don't want to leave the >>> ptrace surface available on a SECCOMP_RET_KILL. So maybe it can be >>> seccomp phase 1, then ptrace, then seccomp phase 2? And pass more >>> information between phases to determine how things should behave >>> beyond just "skip"? >> >> I thought so too, originally, but I'm far less convinced now, for two reasons: >> >> 1. I think that a lot of filters these days use RET_ERRNO heavily, so >> this won't benefit them. >> >> 2. I'm not convinced it really reduces the attack surface for anyone. >> Unless your filter is literally "return SECCOMP_RET_KILL", then the >> seccomp-filtered task can always cause the ptracer to get a pair of >> syscall notifications. Also, the task can send itself signals (using >> page faults, breakpoints, etc) and cause ptrace events via other >> paths. > > What are you thinking for a solution? > I'm writing a patch now. It's an ABI break, but this thread seems to show that the ABI was somewhat useless before the split-phase changes, and it's differently broken now, so I would be surprised if the change broke anything that was currently working. I'll send it later today, hopefully. > As for capping SECCOMP_RET_ERRNO to MAX_ERRNO, how about this (sorry > if gmail butchers the paste): > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index 4ef9687ac115..c88148d20bd5 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -629,7 +629,9 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct > > switch (action) { > case SECCOMP_RET_ERRNO: > - /* Set the low-order 16-bits as a errno. */ > + /* Set the low-order bits as a errno. */ > + if (data > MAX_ERRNO) > + data = MAX_ERRNO; > syscall_set_return_value(current, task_pt_regs(current), > -data, 0); > goto skip; > I'm fine with this, but I'm not entirely convinced it solves a problem. SECCOMP_RET_ERRNO | 5000 didn't work before, and it doesn't work now. Admittedly, the new failure mode is possibly better. --Andy