From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756154AbbJAJiZ (ORCPT ); Thu, 1 Oct 2015 05:38:25 -0400 Received: from mail-wi0-f177.google.com ([209.85.212.177]:37036 "EHLO mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751840AbbJAJiC (ORCPT ); Thu, 1 Oct 2015 05:38:02 -0400 Date: Thu, 1 Oct 2015 11:37:57 +0200 From: Ingo Molnar To: Kees Cook Cc: Thomas Gleixner , Andy Lutomirski , Dmitry Vyukov , Andrey Ryabinin , Ingo Molnar , "H. Peter Anvin" , Andy Lutomirski , Borislav Petkov , Denys Vlasenko , "x86@kernel.org" , LKML , Kostya Serebryany , Alexander Potapenko , Andrey Konovalov , Sasha Levin , Andi Kleen , kasan-dev , Linus Torvalds , Peter Zijlstra , Andrew Morton , Al Viro Subject: [PATCH v4] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan Message-ID: <20151001093757.GA6060@gmail.com> References: <1443430839-13225-1-git-send-email-dvyukov@google.com> <20150930071537.GA19048@gmail.com> <20150930135917.GA3285@gmail.com> <20151001075715.GA23430@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151001075715.GA23430@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Ingo Molnar wrote: > > * Kees Cook wrote: > > > > @@ -507,7 +505,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, > > > seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL); > > > seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL); > > > seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL); > > > - seq_put_decimal_ull(m, ' ', wchan); > > > + seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */ > > > > Probably should also update Documentation/filesystems/proc.txt with > > something like: > > > > --- a/Documentation/filesystems/proc.txt > > +++ b/Documentation/filesystems/proc.txt > > @@ -310,7 +310,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) > > blocked bitmap of blocked signals > > sigign bitmap of ignored signals > > sigcatch bitmap of caught signals > > - wchan address where process went to sleep > > + 0 (place holder, was wchan, see /proc/PID/wchan instead) > > 0 (place holder) > > 0 (place holder) > > exit_signal signal to send to parent thread on exit > > Indeed - I ended up clarifying both wchan explanations, see the changes below. > > I also made the 'no symbols' output "0" (instead of an empty string), to better > match the /proc/PID/stat behavior and previous output. > > I'll push it out after a bit more testing and if nothing goes wrong I'll send this > patch to Linus in the v4.4 merge window. Yeah, so testing uncovered the following additional ABI detail: procps relies on the wchan field in /proc/PID/stat, but only as a flag (in most cases), whether to look at /proc/PID/wchan. To keep the ABI, the v4 patch below outputs not the absolute address, but a 0/1 flag to indicate whether the task is blocked and whether there's anything worth looking at in /proc/PID/wchan. I tested this approach with procps and it seems to fully work. In fact due to the ptrace check we properly restrict the information to our own tasks only. root still sees the wchan field of all tasks. Btw., the very latest procps-ng grew this nice change: 6b8dc5511fb9 ("library: refactor and rely on modern kernels for wchan") which greatly simplified procps's handling of /proc/PID/wchan. ... but my testing was done with an older procps version. Thanks, Ingo ==========================> >>From b26a16469b0b6f3f0604aafb95c50d1532b3fff2 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Wed, 30 Sep 2015 15:59:17 +0200 Subject: [PATCH] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan So the /proc/PID/stat 'wchan' field (the 30th field, which contains the absolute kernel address of the kernel function a task is blocked in) leaks absolute kernel addresses to unprivileged user-space: seq_put_decimal_ull(m, ' ', wchan); The absolute address might also leak via /proc/PID/wchan as well, if KALLSYMS is turned off or if the symbol lookup fails for some reason: static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { unsigned long wchan; char symname[KSYM_NAME_LEN]; wchan = get_wchan(task); if (lookup_symbol_name(wchan, symname) < 0) { if (!ptrace_may_access(task, PTRACE_MODE_READ)) return 0; seq_printf(m, "%lu", wchan); } else { seq_printf(m, "%s", symname); } return 0; } This isn't ideal, because for example it trivially leaks the KASLR offset to any local attacker: fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35) ffffffff8123b380 Most real-life uses of wchan are symbolic: ps -eo pid:10,tid:10,wchan:30,comm and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat: triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1 open("/proc/30833/wchan", O_RDONLY) = 6 There's one compatibility quirk here: procps relies on whether the absolute value is non-zero - and we can provide that functionality by outputing "0" or "1" depending on whether the task is blocked (whether there's a wchan address). These days there appears to be very little legitimate reason user-space would be interested in the absolute address. The absolute address is mostly historic: from the days when we didn't have kallsyms and user-space procps had to do the decoding itself via the System.map. So this patch sets all numeric output to "0" or "1" and keeps only symbolic output, in /proc/PID/wchan. ( The absolute sleep address can generally still be profiled via perf, by tasks with sufficient privileges. ) Reviewed-by: Thomas Gleixner Acked-by: Kees Cook Acked-by: Linus Torvalds Cc: Cc: Al Viro Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Andy Lutomirski Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Denys Vlasenko Cc: Dmitry Vyukov Cc: Kostya Serebryany Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Peter Zijlstra Cc: Sasha Levin Cc: kasan-dev Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com Signed-off-by: Ingo Molnar --- Documentation/filesystems/proc.txt | 5 +++-- fs/proc/array.c | 16 ++++++++++++++-- fs/proc/base.c | 9 +++------ 3 files changed, 20 insertions(+), 10 deletions(-) diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index d411ca63c8b6..3a9d65c912e7 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc stat Process status statm Process memory status information status Process status in human readable form - wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan + wchan Present with CONFIG_KALLSYMS=y: it shows the kernel function + symbol the task is blocked in - or "0" if not blocked. pagemap Page table stack Report full stack trace, enable via CONFIG_STACKTRACE smaps a extension based on maps, showing the memory consumption of @@ -310,7 +311,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) blocked bitmap of blocked signals sigign bitmap of ignored signals sigcatch bitmap of caught signals - wchan address where process went to sleep + 0 (place holder, used to be the wchan address, use /proc/PID/wchan instead) 0 (place holder) 0 (place holder) exit_signal signal to send to parent thread on exit diff --git a/fs/proc/array.c b/fs/proc/array.c index f60f0121e331..eed2050db9be 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -375,7 +375,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task, int whole) { - unsigned long vsize, eip, esp, wchan = ~0UL; + unsigned long vsize, eip, esp, wchan = 0; int priority, nice; int tty_pgrp = -1, tty_nr = 0; sigset_t sigign, sigcatch; @@ -507,7 +507,19 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL); seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL); seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL); - seq_put_decimal_ull(m, ' ', wchan); + + /* + * We used to output the absolute kernel address, but that's an + * information leak - so instead we show a 0/1 flag here, to signal + * to user-space whether there's a wchan field in /proc/PID/wchan. + * + * This works with older implementations of procps as well. + */ + if (wchan) + seq_puts(m, " 1"); + else + seq_puts(m, " 0"); + seq_put_decimal_ull(m, ' ', 0); seq_put_decimal_ull(m, ' ', 0); seq_put_decimal_ll(m, ' ', task->exit_signal); diff --git a/fs/proc/base.c b/fs/proc/base.c index b25eee4cead5..6f05aabce3aa 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns, wchan = get_wchan(task); - if (lookup_symbol_name(wchan, symname) < 0) { - if (!ptrace_may_access(task, PTRACE_MODE_READ)) - return 0; - seq_printf(m, "%lu", wchan); - } else { + if (!lookup_symbol_name(wchan, symname)) seq_printf(m, "%s", symname); - } + else + seq_putc(m, '0'); return 0; }