* [RFC PATCH 0/1] Adding previous syscall context to seccomp
@ 2016-01-22 6:29 Daniel Sangorrin
2016-01-22 6:30 ` [RFC PATCH 1/1] seccomp: provide information about the previous syscall Daniel Sangorrin
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Sangorrin @ 2016-01-22 6:29 UTC (permalink / raw)
To: keescook; +Cc: luto, wad, linux-kernel, linux-api, kernel-hardening
Hi,
During my presentation last year at Linuxcon Japan [1], I released
a proof-of-concept patch [2] for the seccomp subsystem. The main
purpose of that patch was to let applications restrict the order
in which their system calls are requested. In more technical terms,
it implemented a host-based anomaly intrusion detection system (HIDS)
that uses call sequence monitoring to detect unusual patterns. For
example, it can detect when the execution flow unexpectedly diverts
towards the 'mprotect' syscall, perhaps after a stack overflow.
The main target for the patch was embedded real-time systems
where applications have a high degree of determinism. For that
reason, my original proof-of-concept patch used bitmaps,
which allow for a constant O(1) overhead (correct me if
I'm wrong, but I think the current seccomp-filter implementation
introduces an O(n) overhead proportional to the number of system
calls that one wants to allow or prohibit).
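For readers unfamiliar with the bitmap approach, here is a rough userspace model of it. The table layout, SKETCH_MAX_NR and all helper names are assumptions for illustration and are not taken from the proof-of-concept in [2]. With one bit per (previous, current) syscall pair, each check is a single index computation and bit test, however many transitions are allowed:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical upper bound on the syscall numbers covered by the table. */
#define SKETCH_MAX_NR 512
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_PER_ROW ((SKETCH_MAX_NR + WORD_BITS - 1) / WORD_BITS)

/* Row = previous syscall; bit n in the row = syscall n may follow it. */
struct transition_table {
	unsigned long rows[SKETCH_MAX_NR][WORDS_PER_ROW];
};

static void table_init(struct transition_table *t)
{
	memset(t, 0, sizeof(*t));
}

static void table_allow(struct transition_table *t, int prev, int nr)
{
	assert(prev >= 0 && prev < SKETCH_MAX_NR);
	assert(nr >= 0 && nr < SKETCH_MAX_NR);
	t->rows[prev][nr / WORD_BITS] |= 1UL << (nr % WORD_BITS);
}

/* O(1): one index computation and one bit test, independent of how many
 * transitions are allowed. Out-of-range numbers are treated as denied. */
static bool table_check(const struct transition_table *t, int prev, int nr)
{
	if (prev < 0 || prev >= SKETCH_MAX_NR || nr < 0 || nr >= SKETCH_MAX_NR)
		return false;
	return (t->rows[prev][nr / WORD_BITS] >> (nr % WORD_BITS)) & 1UL;
}
```

The price is memory: with 512 syscall numbers the table is 512 x 512 bits, i.e. 32 KiB per policy.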
However, I realized that it would be too hard to merge with the
current code, so I have adapted my original patch: it now allows
BPF filters to retrieve information about the previous system
call requested by the application.
The patch can be tested on linux-master as follows (tested
on Debian Jessie x86_64):
$ sudo vi /usr/include/linux/seccomp.h
...
struct seccomp_data {
int nr;
int prev_nr; <-- add this entry
...
$ cd samples/seccomp/
$ make bpf-prev
$ ./bpf-prev
parent msgsnd: hello
parent msgrcv after prctl: hello (128 bytes)
parent msgsnd: world
parent msgrcv after msgsnd: world (128 bytes)
parent msgsnd: this is mars
child msgrcv after clone: this is mars (128 bytes)
parent: child 11409 exited with status 0
Should fail: Bad system call
For simplicity, the patch currently records only the last
requested system call. Although it is vulnerable to specially
crafted mimicry attacks, I think it can deter common attacks,
especially during the "initial phase" of the attack (e.g. a
return-oriented jump).
It could also be extended with longer call sequences (NGRAMs),
call stack and call site information, or scratch memory for
restricting a system call to the application's initialization,
for example. However, I'm not sure such complexity would
be worth it. I would like to know at this early stage whether any
of you are interested in this type of approach and what you
think about it.
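As an illustration of the NGRAM extension, the following userspace model keeps a sliding window of the last NGRAM_LEN syscalls and accepts a call only if the resulting window is on an allow list. NGRAM_LEN, NR_UNKNOWN and the linear search are assumptions made to keep the sketch short. A real implementation would need a faster lookup structure to keep the overhead bounded:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NGRAM_LEN 3       /* hypothetical window length */
#define NR_UNKNOWN (-1)   /* hypothetical "no syscall yet" marker */

struct ngram_monitor {
	int window[NGRAM_LEN];            /* oldest first; last slot = current call */
	const int (*allowed)[NGRAM_LEN];  /* allowed n-grams */
	size_t n_allowed;
};

static void monitor_init(struct ngram_monitor *m,
			 const int (*allowed)[NGRAM_LEN], size_t n_allowed)
{
	assert(allowed != NULL || n_allowed == 0);
	for (size_t i = 0; i < NGRAM_LEN; i++)
		m->window[i] = NR_UNKNOWN;
	m->allowed = allowed;
	m->n_allowed = n_allowed;
}

/* Shift nr into the window, then report whether the resulting n-gram
 * appears in the allow list (linear scan for brevity). */
static bool monitor_step(struct ngram_monitor *m, int nr)
{
	for (size_t i = 0; i + 1 < NGRAM_LEN; i++)
		m->window[i] = m->window[i + 1];
	m->window[NGRAM_LEN - 1] = nr;

	for (size_t k = 0; k < m->n_allowed; k++) {
		bool match = true;

		for (size_t i = 0; i < NGRAM_LEN; i++) {
			if (m->allowed[k][i] != m->window[i]) {
				match = false;
				break;
			}
		}
		if (match)
			return true;
	}
	return false;
}
```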
Thanks,
Daniel
[1] Kernel security hacking for the Internet of Things
http://events.linuxfoundation.jp/sites/events/files/slides/linuxcon-2015-daniel-sangorrin-final.pdf
[2] https://github.com/sangorrin/linuxcon-japan-2015/tree/master/hids
Daniel Sangorrin (1):
seccomp: provide information about the previous syscall
include/linux/seccomp.h | 2 +
include/uapi/linux/seccomp.h | 2 +
kernel/seccomp.c | 10 +++
samples/seccomp/.gitignore | 1 +
samples/seccomp/Makefile | 9 ++-
samples/seccomp/bpf-prev.c | 160 +++++++++++++++++++++++++++++++++++++++++++
6 files changed, 183 insertions(+), 1 deletion(-)
create mode 100644 samples/seccomp/bpf-prev.c
--
2.1.4
* [RFC PATCH 1/1] seccomp: provide information about the previous syscall
2016-01-22 6:29 [RFC PATCH 0/1] Adding previous syscall context to seccomp Daniel Sangorrin
@ 2016-01-22 6:30 ` Daniel Sangorrin
2016-01-22 10:48 ` [kernel-hardening] " Jann Horn
2016-01-22 17:30 ` Alexei Starovoitov
0 siblings, 2 replies; 8+ messages in thread
From: Daniel Sangorrin @ 2016-01-22 6:30 UTC (permalink / raw)
To: keescook; +Cc: luto, wad, linux-kernel, linux-api, kernel-hardening
This patch allows applications to restrict the order in which
their system calls may be requested. To do that, we provide
seccomp-BPF filters with information about the previous
system call requested.
An example use case is detecting (and stopping) return-oriented
attacks that disturb the normal execution flow of a user program.
Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
---
include/linux/seccomp.h | 2 +
include/uapi/linux/seccomp.h | 2 +
kernel/seccomp.c | 10 +++
samples/seccomp/.gitignore | 1 +
samples/seccomp/Makefile | 9 ++-
samples/seccomp/bpf-prev.c | 160 +++++++++++++++++++++++++++++++++++++++++++
6 files changed, 183 insertions(+), 1 deletion(-)
create mode 100644 samples/seccomp/bpf-prev.c
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 2296e6b..8c6de6d 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -16,6 +16,7 @@ struct seccomp_filter;
*
* @mode: indicates one of the valid values above for controlled
* system calls available to a process.
+ * @prev_nr: stores the previous system call number.
* @filter: must always point to a valid seccomp-filter or NULL as it is
* accessed without locking during system call entry.
*
@@ -24,6 +25,7 @@ struct seccomp_filter;
*/
struct seccomp {
int mode;
+ int prev_nr;
struct seccomp_filter *filter;
};
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..42775dc 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -38,6 +38,7 @@
/**
* struct seccomp_data - the format the BPF program executes over.
* @nr: the system call number
+ * @prev_nr: the previous system call number
* @arch: indicates system call convention as an AUDIT_ARCH_* value
* as defined in <linux/audit.h>.
* @instruction_pointer: at the time of the system call.
@@ -46,6 +47,7 @@
*/
struct seccomp_data {
int nr;
+ int prev_nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 580ac2d..98b2c9d3 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -190,6 +190,8 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
sd = &sd_local;
}
+ sd->prev_nr = current->seccomp.prev_nr;
+
/*
* All filters in the list are evaluated and the lowest BPF return
* value always takes priority (ignoring the DATA).
@@ -200,6 +202,9 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
ret = cur_ret;
}
+
+ current->seccomp.prev_nr = sd->nr;
+
return ret;
}
#endif /* CONFIG_SECCOMP_FILTER */
@@ -443,6 +448,11 @@ static long seccomp_attach_filter(unsigned int flags,
return ret;
}
+ /* Initialize the prev_nr field only once */
+ if (current->seccomp.filter == NULL)
+ current->seccomp.prev_nr =
+ syscall_get_nr(current, task_pt_regs(current));
+
/*
* If there is an existing filter, make it the prev and don't drop its
* task reference.
diff --git a/samples/seccomp/.gitignore b/samples/seccomp/.gitignore
index 78fb781..11dda7a 100644
--- a/samples/seccomp/.gitignore
+++ b/samples/seccomp/.gitignore
@@ -1,3 +1,4 @@
bpf-direct
bpf-fancy
dropper
+bpf-prev
diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
index 1b4e4b8..b50821c 100644
--- a/samples/seccomp/Makefile
+++ b/samples/seccomp/Makefile
@@ -1,7 +1,7 @@
# kbuild trick to avoid linker error. Can be omitted if a module is built.
obj- := dummy.o
-hostprogs-$(CONFIG_SECCOMP_FILTER) := bpf-fancy dropper bpf-direct
+hostprogs-$(CONFIG_SECCOMP_FILTER) := bpf-fancy dropper bpf-direct bpf-prev
HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
@@ -17,6 +17,11 @@ HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
bpf-direct-objs := bpf-direct.o
+HOSTCFLAGS_bpf-prev.o += -I$(objtree)/usr/include
+HOSTCFLAGS_bpf-prev.o += -idirafter $(objtree)/include
+bpf-prev-objs := bpf-prev.o
+
+
# Try to match the kernel target.
ifndef CROSS_COMPILE
ifndef CONFIG_64BIT
@@ -29,10 +34,12 @@ MFLAG = -m31
endif
HOSTCFLAGS_bpf-direct.o += $(MFLAG)
+HOSTCFLAGS_bpf-prev.o += $(MFLAG)
HOSTCFLAGS_dropper.o += $(MFLAG)
HOSTCFLAGS_bpf-helper.o += $(MFLAG)
HOSTCFLAGS_bpf-fancy.o += $(MFLAG)
HOSTLOADLIBES_bpf-direct += $(MFLAG)
+HOSTLOADLIBES_bpf-prev += $(MFLAG)
HOSTLOADLIBES_bpf-fancy += $(MFLAG)
HOSTLOADLIBES_dropper += $(MFLAG)
endif
diff --git a/samples/seccomp/bpf-prev.c b/samples/seccomp/bpf-prev.c
new file mode 100644
index 0000000..138c584
--- /dev/null
+++ b/samples/seccomp/bpf-prev.c
@@ -0,0 +1,160 @@
+/*
+ * Seccomp BPF example that uses information about the previous syscall.
+ *
+ * Copyright (C) 2015 TOSHIBA corp.
+ * Author: Daniel Sangorrin <daniel.sangorrin@gmail.com>
+ *
+ * The code may be used by anyone for any purpose,
+ * and can serve as a starting point for developing
+ * applications using prctl or seccomp.
+ */
+#if defined(__x86_64__)
+#define SUPPORTED_ARCH 1
+#endif
+
+#if defined(SUPPORTED_ARCH)
+#define __USE_GNU 1
+#define _GNU_SOURCE 1
+
+#include <linux/filter.h>
+/* NOTE: make sure seccomp_data in /usr/include/linux/seccomp.h has prev_nr */
+#include <linux/seccomp.h>
+#include <linux/unistd.h>
+#include <stdio.h>
+#include <stddef.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+#include <sys/msg.h>
+#include <assert.h>
+
+#define MSGPERM 0600
+#define MTEXTSIZE 128
+#define MTYPE 1
+
+struct msg_buf {
+ long mtype;
+ char mtext[MTEXTSIZE];
+};
+
+#define syscall_nr (offsetof(struct seccomp_data, nr))
+#define prev_nr (offsetof(struct seccomp_data, prev_nr))
+
+#define EXAMINE_SYSCALL \
+ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr)
+
+#define EXAMINE_PREV_SYSCALL \
+ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, prev_nr)
+
+#define KILL_PROCESS \
+ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
+
+#ifndef PR_SET_NO_NEW_PRIVS
+#define PR_SET_NO_NEW_PRIVS 38
+#endif
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+static int install_syscall_filter(void)
+{
+ /* allow __NR_msgrcv only if prev_nr is __NR_prctl, __NR_msgsnd or __NR_clone */
+ struct sock_filter filter[] = {
+ EXAMINE_SYSCALL,
+ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_msgrcv, 1, 0),
+ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+ EXAMINE_PREV_SYSCALL,
+ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_prctl, 0, 1),
+ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_msgsnd, 0, 1),
+ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_clone, 0, 1),
+ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+ KILL_PROCESS,
+ };
+ struct sock_fprog prog = {
+ .len = ARRAY_SIZE(filter),
+ .filter = filter,
+ };
+
+ if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+ perror("prctl(NO_NEW_PRIVS)");
+ return 1;
+ }
+
+ if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
+ perror("prctl(SECCOMP)");
+ return 1;
+ }
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ long ret;
+ int id;
+ struct msg_buf send, recv;
+
+ id = syscall(__NR_msgget, IPC_PRIVATE, MSGPERM | IPC_CREAT | IPC_EXCL);
+ assert(id >= 0);
+
+ send.mtype = MTYPE;
+ snprintf(send.mtext, MTEXTSIZE, "hello");
+ printf("parent msgsnd: %s\n", send.mtext);
+ ret = syscall(__NR_msgsnd, id, &send, MTEXTSIZE, 0);
+ assert(ret == 0);
+
+ install_syscall_filter();
+
+ /* TEST 1: msgrcv can be executed after prctl */
+ ret = syscall(__NR_msgrcv, id, &recv, MTEXTSIZE, MTYPE, 0);
+ assert(ret == MTEXTSIZE);
+ printf("parent msgrcv after prctl: %s (%ld bytes)\n", recv.mtext, ret);
+
+ snprintf(send.mtext, MTEXTSIZE, "world");
+ printf("parent msgsnd: %s\n", send.mtext);
+ ret = syscall(__NR_msgsnd, id, &send, MTEXTSIZE, 0);
+ assert(ret == 0);
+
+ /* TEST 2: msgrcv can be executed after msgsnd */
+ ret = syscall(__NR_msgrcv, id, &recv, MTEXTSIZE, MTYPE, 0);
+ assert(ret == MTEXTSIZE);
+ printf("parent msgrcv after msgsnd: %s (%ld bytes)\n", recv.mtext, ret);
+
+ snprintf(send.mtext, MTEXTSIZE, "this is mars");
+ printf("parent msgsnd: %s\n", send.mtext);
+ ret = syscall(__NR_msgsnd, id, &send, MTEXTSIZE, 0);
+ assert(ret == 0);
+
+ pid_t pid = fork();
+
+ if (pid == 0) {
+ /* TEST 3a: msgrcv can be executed after clone */
+ ret = syscall(__NR_msgrcv, id, &recv, MTEXTSIZE, MTYPE, 0);
+ assert(ret == MTEXTSIZE);
+ printf("child msgrcv after clone: %s (%ld bytes)\n",
+ recv.mtext, ret);
+ _exit(0);
+ } else if (pid > 0) {
+ int status;
+
+ pid = wait(&status);
+ printf("parent: child %d exited with status %d\n", pid, status);
+ /* TEST 3b: msgrcv can NOT be executed after write (dmesg) */
+ syscall(__NR_write, STDOUT_FILENO, "Should fail: ", 14);
+ syscall(__NR_msgrcv, id, &recv, MTEXTSIZE, MTYPE, 0);
+ return 0;
+ }
+
+ assert(0); /* should never arrive here */
+
+ return 0;
+}
+#else /* SUPPORTED_ARCH */
+/*
+ * This sample has been tested on x86_64. Other architectures will result in
+ * using only the main() below.
+ */
+int main(void)
+{
+ return 1;
+}
+#endif /* SUPPORTED_ARCH */
--
2.1.4
* Re: [kernel-hardening] [RFC PATCH 1/1] seccomp: provide information about the previous syscall
2016-01-22 6:30 ` [RFC PATCH 1/1] seccomp: provide information about the previous syscall Daniel Sangorrin
@ 2016-01-22 10:48 ` Jann Horn
2016-01-22 17:17 ` Andy Lutomirski
2016-01-25 3:39 ` Daniel Sangorrin
2016-01-22 17:30 ` Alexei Starovoitov
1 sibling, 2 replies; 8+ messages in thread
From: Jann Horn @ 2016-01-22 10:48 UTC (permalink / raw)
To: Daniel Sangorrin
Cc: keescook, luto, wad, linux-kernel, linux-api, kernel-hardening
On Fri, Jan 22, 2016 at 03:30:00PM +0900, Daniel Sangorrin wrote:
> This patch allows applications to restrict the order in which
> its system calls may be requested. In order to do that, we
> provide seccomp-BPF scripts with information about the
> previous system call requested.
>
> An example use case consists of detecting (and stopping) return
> oriented attacks that disturb the normal execution flow of
> a user program.
The intent here is to mitigate attacks in which an attacker has
e.g. a function pointer overwrite without a high degree of stack
control or the ability to perform a stack pivot, correct? So that
e.g. a one-gadget system() call won't succeed?
Do you have data on how effective this protection is using just
the previous system call number?
I think that for example, the "magic ROP gadget" in glibc that
can be used given just a single pointer overwrite and stdin
control (https://gist.github.com/zachriggle/ca24daf4e8be953a3f96),
which (as far as I can tell) is in the middle of the system()
implementation, could be used as long as a transition to one of
the following syscalls is allowed:
- rt_sigaction
- rt_sigprocmask
- clone
- execve
I'm not sure how many interesting syscalls typically transition
to that, perhaps you can comment on that?
However, when exploiting network servers, this magic gadget
won't help much - an attacker would probably have to either
call into an interesting function in the application or use
ROP. In the latter case, this protection won't help much -
especially considering that most syscalls just return
-EFAULT / -EINVAL when you supply nonsense arguments, ROPping
through a "pop rax;ret" gadget and a "syscall;ret" gadget
should make it fairly easy to bypass the protection. There
are a bunch of occurences of both gadgets in Debian's libc
(and these are just the trivial ones):
$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '58 c3'
000382e0 00 00 48 8b 00 5b 8b 40 58 c3 48 8d 05 4f 8a 36 |..H..[.@X.H..O.6|
000383b0 58 c3 48 8d 05 87 89 36 00 48 39 c3 74 0e 48 89 |X.H....6.H9.t.H.|
00038450 40 58 c3 48 8d 05 e6 88 36 00 48 39 c3 74 0e 48 |@X.H....6.H9.t.H|
000d9a00 48 89 44 24 18 e8 56 ff ff ff 48 83 c4 58 c3 90 |H.D$..V...H..X..|
000e51d0 c3 0f 1f 80 00 00 00 00 48 8b 40 58 c3 0f 1f 00 |........H.@X....|
000ea2f0 48 83 3d 58 c3 2b 00 00 48 8b 1d 69 8b 2b 00 64 |H.=X.+..H..i.+.d|
00160520 48 c3 fa ff 58 c3 fa ff 68 c3 fa ff 80 c3 fa ff |H...X...h.......|
00171470 58 c3 f8 ff 84 60 02 00 74 c3 f8 ff 94 62 02 00 |X....`..t....b..|
$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '0f 05 c3'
000b85b0 b8 6e 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.n..............|
000b85c0 b8 66 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.f..............|
000b85d0 b8 6b 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.k..............|
000b85e0 b8 68 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.h..............|
000b85f0 b8 6c 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.l..............|
000b87f0 b8 6f 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |.o..............|
000d9260 b8 5f 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |._..............|
000e6400 b8 e4 00 00 00 0f 05 c3 0f 1f 84 00 00 00 00 00 |................|
000fff60 48 63 3f b8 03 00 00 00 0f 05 c3 0f 1f 44 00 00 |Hc?..........D..|
So an attacker would craft the stack like this:
[pop rax;ret address]
[first syscall for transition]
[syscall;ret address]
[pop rax;ret address]
[second syscall for transition]
[syscall;ret address]
[...]
[normal ROP for whatever the attacker wants to do]
Maybe someone who knows a bit more about binary exploiting
can comment on this, especially how likely it is that a
manipulation of a network service's program flow is successful
in the presence of full ASLR and so on without ROP.
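Jann's bypass argument can be phrased as a graph question: if a filter only checks (previous, current) pairs, an attacker who can issue arbitrary syscalls only needs a path of allowed transitions that ends in the call they want. The breadth-first search below runs over a made-up transition graph (GRAPH_MAX_NR, the syscall numbers and the helper name are all assumptions) and returns the length of the shortest such chain, i.e. roughly how many padding syscalls a stack like the one above would need:

```c
#include <assert.h>
#include <stdbool.h>

#define GRAPH_MAX_NR 16   /* small hypothetical syscall space for the model */

/* g[p][n] is true if syscall n may directly follow syscall p. */
typedef bool transition_graph[GRAPH_MAX_NR][GRAPH_MAX_NR];

/* Length of the shortest chain of calls that starts right after `from`
 * and ends with `target`, or -1 if no allowed chain exists. */
static int shortest_chain(transition_graph g, int from, int target)
{
	int dist[GRAPH_MAX_NR];
	int queue[GRAPH_MAX_NR];   /* each node is enqueued at most once */
	int head = 0, tail = 0;

	assert(from >= 0 && from < GRAPH_MAX_NR);
	assert(target >= 0 && target < GRAPH_MAX_NR);
	for (int i = 0; i < GRAPH_MAX_NR; i++)
		dist[i] = -1;

	/* Seed with every call that may directly follow `from`. */
	for (int n = 0; n < GRAPH_MAX_NR; n++) {
		if (g[from][n]) {
			dist[n] = 1;
			queue[tail++] = n;
		}
	}

	/* Nodes leave the queue in order of distance, so the first time the
	 * target is dequeued its distance is minimal. */
	while (head < tail) {
		int p = queue[head++];

		if (p == target)
			return dist[p];
		for (int n = 0; n < GRAPH_MAX_NR; n++) {
			if (g[p][n] && dist[n] == -1) {
				dist[n] = dist[p] + 1;
				queue[tail++] = n;
			}
		}
	}
	return -1;
}
```

A defender could run the same search over a real policy to see how short the chains to sensitive calls such as mprotect or execve are.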
Also, there is a potential functional issue: What about signal handlers?
Signal handlers will require transitions from all syscalls to any syscall
that occurs at the start of a signal handler to be allowed as far as I
can tell.
> @@ -443,6 +448,11 @@ static long seccomp_attach_filter(unsigned int flags,
> return ret;
> }
>
> + /* Initialize the prev_nr field only once */
> + if (current->seccomp.filter == NULL)
> + current->seccomp.prev_nr =
> + syscall_get_nr(current, task_pt_regs(current));
> +
> /*
> * If there is an existing filter, make it the prev and don't drop its
> * task reference.
What about SECCOMP_FILTER_FLAG_TSYNC? When a thread is transitioned from
SECCOMP_MODE_DISABLED to SECCOMP_MODE_FILTER by another thread, its initial
prev_nr will be 0, which would e.g. appear to be the read() syscall on
x86_64, right?
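A small userspace model of the bookkeeping in the RFC's seccomp_run_filters() (expose the stored prev_nr, run the filter, then store the current nr) makes the TSYNC problem easy to see: whatever value a thread starts with is what its first filter run sees. The model below seeds every thread with a hypothetical PREV_NR_UNKNOWN sentinel (-1) instead of 0, so a policy can tell "no previous syscall" apart from read(). All names are illustrative and nothing here is kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "no previous syscall observed yet" marker. Being negative,
 * it cannot collide with read(), which is 0 on x86_64. */
#define PREV_NR_UNKNOWN (-1)

struct model_task { int prev_nr; };
struct model_data { int nr; int prev_nr; };

typedef bool (*model_policy)(const struct model_data *sd);

/* Same seed whether the filter was attached by this thread or synced
 * onto it by another thread. */
static void model_attach(struct model_task *t)
{
	t->prev_nr = PREV_NR_UNKNOWN;
}

/* Same order as the RFC: expose prev_nr, run the policy, then remember
 * the current nr (even when the call is denied). */
static bool model_syscall(struct model_task *t, model_policy policy, int nr)
{
	struct model_data sd = { .nr = nr, .prev_nr = t->prev_nr };
	bool allowed;

	assert(policy != 0);
	allowed = policy(&sd);
	t->prev_nr = nr;
	return allowed;
}

/* Example policy with made-up numbers: syscall 7 is only allowed as the
 * very first call or right after syscall 3. */
static bool example_policy(const struct model_data *sd)
{
	if (sd->nr != 7)
		return true;
	return sd->prev_nr == PREV_NR_UNKNOWN || sd->prev_nr == 3;
}
```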
* Re: [kernel-hardening] [RFC PATCH 1/1] seccomp: provide information about the previous syscall
2016-01-22 10:48 ` [kernel-hardening] " Jann Horn
@ 2016-01-22 17:17 ` Andy Lutomirski
2016-01-25 3:39 ` Daniel Sangorrin
1 sibling, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2016-01-22 17:17 UTC (permalink / raw)
To: Jann Horn
Cc: Daniel Sangorrin, Kees Cook, Will Drewry,
linux-kernel@vger.kernel.org, Linux API,
kernel-hardening@lists.openwall.com
On Fri, Jan 22, 2016 at 2:48 AM, Jann Horn <jann@thejh.net> wrote:
> On Fri, Jan 22, 2016 at 03:30:00PM +0900, Daniel Sangorrin wrote:
>> This patch allows applications to restrict the order in which
>> its system calls may be requested. In order to do that, we
>> provide seccomp-BPF scripts with information about the
>> previous system call requested.
>>
>> An example use case consists of detecting (and stopping) return
>> oriented attacks that disturb the normal execution flow of
>> a user program.
>
>
> The intent here is to mitigate attacks in which an attacker has
> e.g. a function pointer overwrite without a high degree of stack
> control or the ability to perform a stack pivot, correct? So that
> e.g. a one-gadget system() call won't succeed?
>
> Do you have data on how effective this protection is using just
> the previous system call number?
>
> I think that for example, the "magic ROP gadget" in glibc that
> can be used given just a single pointer overwrite and stdin
> control (https://gist.github.com/zachriggle/ca24daf4e8be953a3f96),
> which (as far as I can tell) is in the middle of the system()
> implementation, could be used as long as a transition to one of
> the following syscalls is allowed:
>
> - rt_sigaction
> - rt_sigprocmask
> - clone
> - execve
>
> I'm not sure how many interesting syscalls typically transition
> to that, perhaps you can comment on that?
rt_sigaction is going to be a problem. It can legitimately follow
*anything* because of async signals.
In general, I think I don't like this idea. It seems like a hack that
we'll have to support forever that will allow semi-reliable IDS
signatures to break due to async signals and occasionally detect
intrusions that don't modify themselves slightly to evade detection.
--Andy
* Re: [RFC PATCH 1/1] seccomp: provide information about the previous syscall
2016-01-22 6:30 ` [RFC PATCH 1/1] seccomp: provide information about the previous syscall Daniel Sangorrin
2016-01-22 10:48 ` [kernel-hardening] " Jann Horn
@ 2016-01-22 17:30 ` Alexei Starovoitov
2016-01-22 21:23 ` Kees Cook
1 sibling, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2016-01-22 17:30 UTC (permalink / raw)
To: Daniel Sangorrin
Cc: keescook, luto, wad, linux-kernel, linux-api, kernel-hardening
On Fri, Jan 22, 2016 at 03:30:00PM +0900, Daniel Sangorrin wrote:
> This patch allows applications to restrict the order in which
> its system calls may be requested. In order to do that, we
> provide seccomp-BPF scripts with information about the
> previous system call requested.
>
> An example use case consists of detecting (and stopping) return
> oriented attacks that disturb the normal execution flow of
> a user program.
>
> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
...
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
...
> struct seccomp_data {
> int nr;
> + int prev_nr;
> __u32 arch;
> __u64 instruction_pointer;
> __u64 args[6];
this will break abi for existing seccomp programs.
New field has to be at the end.
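To make the ABI concern concrete: existing filters read seccomp_data at fixed offsets baked in at compile time, for example offsetof(struct seccomp_data, arch), which is 4. The mirror structs below (hypothetical names, with uint32_t/uint64_t standing in for __u32/__u64) show that inserting prev_nr after nr moves arch to offset 8, so an old filter loading offset 4 would silently read prev_nr instead. Appending the field leaves every existing offset unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sd_orig {          /* layout before the RFC */
	int nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
};

struct sd_rfc {           /* prev_nr inserted after nr, as in the RFC */
	int nr;
	int prev_nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
};

struct sd_appended {      /* prev_nr appended at the end instead */
	int nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
	int prev_nr;
};

static_assert(offsetof(struct sd_rfc, arch) != offsetof(struct sd_orig, arch),
	      "inserting a field after nr moves arch");
static_assert(offsetof(struct sd_appended, args) == offsetof(struct sd_orig, args),
	      "appending a field keeps the old offsets");
```

On x86_64 the appended variant is 72 bytes rather than 68 because of tail padding after prev_nr.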
* Re: [RFC PATCH 1/1] seccomp: provide information about the previous syscall
2016-01-22 17:30 ` Alexei Starovoitov
@ 2016-01-22 21:23 ` Kees Cook
2016-01-22 22:10 ` [kernel-hardening] " Paul Moore
0 siblings, 1 reply; 8+ messages in thread
From: Kees Cook @ 2016-01-22 21:23 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Daniel Sangorrin, Andy Lutomirski, Will Drewry, LKML, Linux API,
kernel-hardening@lists.openwall.com
On Fri, Jan 22, 2016 at 9:30 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Jan 22, 2016 at 03:30:00PM +0900, Daniel Sangorrin wrote:
>> This patch allows applications to restrict the order in which
>> its system calls may be requested. In order to do that, we
>> provide seccomp-BPF scripts with information about the
>> previous system call requested.
>>
>> An example use case consists of detecting (and stopping) return
>> oriented attacks that disturb the normal execution flow of
>> a user program.
>>
>> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
> ...
>> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> ...
>> struct seccomp_data {
>> int nr;
>> + int prev_nr;
>> __u32 arch;
>> __u64 instruction_pointer;
>> __u64 args[6];
>
> this will break abi for existing seccomp programs.
> New field has to be at the end.
Yeah, and if we break abi, we need to add further sanity checking to
the parser to determine which "version" of seccomp_data we need. I'm
not convinced that there is enough utility here to break ABI.
(Though if we do, I'd like to add tid to the seccomp_data, which has
been requested in the past to make some pid-based arg checks easier to
do.)
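One way to picture the "version" check is a helper that scans a classic BPF program and reports how many bytes of seccomp_data it reads. The helper name and its contract are hypothetical; the checks are similar in spirit to the alignment and bounds checks the kernel already applies to these loads:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/filter.h>

/* Hypothetical helper: sets *need to the number of seccomp_data bytes the
 * program reads through BPF_LD|BPF_W|BPF_ABS loads. Returns false if any
 * such load is not 4-byte aligned. */
static bool required_data_size(const struct sock_filter *prog, size_t len,
			       size_t *need)
{
	*need = 0;
	for (size_t i = 0; i < len; i++) {
		const struct sock_filter *f = &prog[i];

		/* Only absolute 32-bit loads are considered here. */
		if (BPF_CLASS(f->code) != BPF_LD || BPF_SIZE(f->code) != BPF_W ||
		    BPF_MODE(f->code) != BPF_ABS)
			continue;
		if (f->k & 3)
			return false;   /* misaligned load */
		if ((size_t)f->k + 4 > *need)
			*need = (size_t)f->k + 4;
	}
	return true;
}
```

A loader could then compare `need` against the size of the seccomp_data layout a filter was written for; whether and how the struct would be versioned is exactly the open question here.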
-Kees
--
Kees Cook
Chrome OS & Brillo Security
* Re: [kernel-hardening] Re: [RFC PATCH 1/1] seccomp: provide information about the previous syscall
2016-01-22 21:23 ` Kees Cook
@ 2016-01-22 22:10 ` Paul Moore
0 siblings, 0 replies; 8+ messages in thread
From: Paul Moore @ 2016-01-22 22:10 UTC (permalink / raw)
To: kernel-hardening
Cc: Alexei Starovoitov, Daniel Sangorrin, Andy Lutomirski,
Will Drewry, LKML, Linux API
On Fri, Jan 22, 2016 at 4:23 PM, Kees Cook <keescook@chromium.org> wrote:
> Yeah, and if we break abi, we need to add further sanity checking to
> the parser to determine which "version" of seccomp_data we need. I'm
> not convinced that there is enough utility here to break ABI.
PLEASE do not break the seccomp ABI for this alone ... I'm still
trying to sort out the well intentioned, but extremely annoying
direct-wired x86 socket syscalls in 4.4 for libseccomp, I don't want
to have to deal with another big change for at least a month or two ;)
> (Though if we do, I'd like to add tid to the seccomp_data, which has
> been requested in the past to make some pid-based arg checks easier to
> do.)
Agreed, if we break anything, please add this.
--
paul moore
www.paul-moore.com
* RE: [kernel-hardening] [RFC PATCH 1/1] seccomp: provide information about the previous syscall
2016-01-22 10:48 ` [kernel-hardening] " Jann Horn
2016-01-22 17:17 ` Andy Lutomirski
@ 2016-01-25 3:39 ` Daniel Sangorrin
1 sibling, 0 replies; 8+ messages in thread
From: Daniel Sangorrin @ 2016-01-25 3:39 UTC (permalink / raw)
To: 'Jann Horn'
Cc: keescook, luto, wad, linux-kernel, linux-api, kernel-hardening,
'Alexei Starovoitov', 'Andy Lutomirski',
'Paul Moore'
Hi,
Jann, Andy, Alexei, Kees and Paul: thanks a lot for your comments on my RFC!
There were a few important points that I didn't mention but that are critical to
understanding what I was trying to do. The focus of the patch was on protecting
"real-time embedded IoT devices" such as a PLC (programmable logic controller) on a
factory assembly line.
They have a few important properties that I took into consideration:
- They often rely on firewall technology and are not updated for many
years (~20 years). For that reason, I think that a whitelist approach (defining
the correct behaviour) is suitable. Note also that the typical problem
of whitelist approaches, false positives, is unlikely to occur because these
are very deterministic systems.
- No asynchronous signal handlers: real-time applications need deterministic
response times. For that reason, signals are typically handled synchronously,
by calling 'sigtimedwait' from a separate thread.
- Initialization vs. cycle: real-time applications usually have an initialization phase
where memory and stack are locked into RAM and threads are created. After
the initialization phase, threads typically loop through periodic cycles and
perform their tasks. The important point here is that once the initialization
is done we can ban any further calls to 'clone', 'execve', 'mprotect' and the like.
This can already be done by installing an extra filter. For the cyclic phase, my
patch would allow enforcing the order of the system calls inside the cycles
(e.g. read a sensor, send a message, and write to an actuator). Even though
the attacker can no longer call 'clone', he could still try to alter the
control of an external actuator (e.g. a motor) by using the 'ioctl' system call,
for example.
- Mimicry: as I mentioned in the cover letter (and as Jann showed with
his ROP attack), if the attacker is able to emulate the order of the system calls
(plus their arguments and the addresses from which the calls were made),
this patch can be bypassed. However, note that this is not easy, for several
reasons:
+ the attacker may need a long stack to mimic all the system calls and their
arguments.
+ the stealthy attacker must make sure the real-time application does not
crash, miss any of its deadlines, or cause deadline misses in other apps.
[Note] Real-time application binaries are usually closed source, so
this might require quite a bit of effort.
+ randomized system calls: applications could randomly activate dummy
system calls each time they are instantiated (and adjust their BPF filter,
which should later be zeroed). In this case, the attacker (or virus)
would need to figure out which dummy system calls have to
be mimicked and prepare a stack accordingly. This seems challenging.
[Note] Under a brute-force attack, the application may just raise an alarm,
activate a redundant node (not connected to the network), and
commit digital suicide :).
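The initialization-vs-cycle point above can already be expressed with today's seccomp, without the prev_nr patch: once initialization is done, install one more filter that refuses a few calls. The sketch below uses the standard prctl()/SECCOMP_MODE_FILTER interface. lock_down_after_init(), demo_lockdown_in_child() and the exact deny list are assumptions for illustration (a real list would also need clone3, fork, vfork, execveat and similar), and it returns EPERM instead of killing so that the demo can observe the result:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Hypothetical helper, called once the initialization phase is over. */
static int lock_down_after_init(void)
{
	struct sock_filter filter[] = {
		/* Kill on an unexpected syscall ABI. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		/* x32 syscalls on x86_64 set bit 30; deny them wholesale. */
		BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 4, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone, 3, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 2, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mprotect, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Demo in a throwaway child so the caller keeps its own syscalls.
 * Returns 0 if mprotect is refused with EPERM while getpid still works. */
static int demo_lockdown_in_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (lock_down_after_init() != 0)
			_exit(2);
		/* The filter answers before the kernel looks at the arguments. */
		if (syscall(__NR_mprotect, NULL, 0, 0) != -1 || errno != EPERM)
			_exit(3);
		if (syscall(__NR_getpid) <= 0)
			_exit(4);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```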
Regarding the ABI, I certainly don't want to break it. If putting the field at
the end does not break it, as Alexei mentioned, I can change it. I would also
be glad to look into the SECCOMP_FILTER_FLAG_TSYNC issue raised by Jann
in case there is any interest.
However, I will understand a NACK, whether because the maintenance is not worth it
(as Andy mentioned), because it can be bypassed under certain conditions, or because
it focuses on a particular type of system. I will keep reading the
messages in the kernel-hardening list and see if I find another topic to
contribute to :).
Thanks a lot for your consideration and comments,
Daniel
Thread overview: 8+ messages
2016-01-22 6:29 [RFC PATCH 0/1] Adding previous syscall context to seccomp Daniel Sangorrin
2016-01-22 6:30 ` [RFC PATCH 1/1] seccomp: provide information about the previous syscall Daniel Sangorrin
2016-01-22 10:48 ` [kernel-hardening] " Jann Horn
2016-01-22 17:17 ` Andy Lutomirski
2016-01-25 3:39 ` Daniel Sangorrin
2016-01-22 17:30 ` Alexei Starovoitov
2016-01-22 21:23 ` Kees Cook
2016-01-22 22:10 ` [kernel-hardening] " Paul Moore