From: Andy Lutomirski
Subject: Re: [PATCH v7 03/11] task_isolation: support PR_TASK_ISOLATION_STRICT mode
Date: Mon, 28 Sep 2015 16:51:49 -0400
To: Chris Metcalf
Cc: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra, Andrew Morton, Rik van Riel, Tejun Heo, Frederic Weisbecker, Thomas Gleixner, "Paul E. McKenney", Christoph Lameter, Viresh Kumar, Catalin Marinas, Will Deacon, linux-doc@vger.kernel.org, Linux API, linux-kernel@vger.kernel.org
List-Id: linux-api@vger.kernel.org

On Mon, Sep 28, 2015 at 11:17 AM, Chris Metcalf wrote:
> With task_isolation mode, the task is in principle guaranteed not to
> be interrupted by the kernel, but only if it behaves. In particular,
> if it enters the kernel via system call, page fault, or any of a
> number of other synchronous traps, it may be unexpectedly exposed
> to long latencies. Add a simple flag that puts the process into
> a state where any such kernel entry is fatal; this is defined as
> happening immediately after the SECCOMP test.

Why after seccomp? Seccomp is still an entry, and the code would be
considerably simpler if it were before seccomp.
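To make the ordering question concrete, here is a small userspace model (not kernel code) of a syscall slow path in which the strict-mode test runs before seccomp. The names `model_task`, `fake_seccomp()` and `model_syscall_entry()` are hypothetical stand-ins; only the exempt syscalls (prctl, exit, exit_group) come from the patch. In this sketch a strict task making a non-exempt call is rejected at one early decision point, before the filter is ever consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/syscall.h>

/* Hypothetical model of syscall-entry work; not the kernel's entry code. */
enum entry_action { ENTRY_CONTINUE, ENTRY_KILL };

struct model_task {
	bool strict;        /* stands in for task_isolation_strict() */
	int seccomp_calls;  /* how many times the fake seccomp filter ran */
};

/* Same exemptions as the patch: prctl() and task exit. */
static bool strict_exempt(long nr)
{
	return nr == SYS_prctl || nr == SYS_exit || nr == SYS_exit_group;
}

/* Fake seccomp filter: allows everything, but counts its invocations. */
static enum entry_action fake_seccomp(struct model_task *t, long nr)
{
	(void)nr;
	t->seccomp_calls++;
	return ENTRY_CONTINUE;
}

/*
 * Strict-mode test placed before seccomp: a single early check, so the
 * seccomp path never has to deal with a task that is about to be killed.
 */
static enum entry_action model_syscall_entry(struct model_task *t, long nr)
{
	if (t->strict && !strict_exempt(nr))
		return ENTRY_KILL;
	return fake_seccomp(t, nr);
}
```

With this ordering, a strict task calling read() is rejected without seccomp running, while prctl() still passes through to the filter as usual.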
> @@ -35,8 +36,12 @@ static inline enum ctx_state exception_enter(void)
>  		return 0;
>
>  	prev_ctx = this_cpu_read(context_tracking.state);
> -	if (prev_ctx != CONTEXT_KERNEL)
> -		context_tracking_exit(prev_ctx);
> +	if (prev_ctx != CONTEXT_KERNEL) {
> +		if (context_tracking_exit(prev_ctx)) {
> +			if (task_isolation_strict())
> +				task_isolation_exception();
> +		}
> +	}
>
>  	return prev_ctx;
>  }

x86 does not promise to call this function. In fact, x86 is rather
likely to stop ever calling this function in the reasonably near
future.

> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -144,15 +144,16 @@ NOKPROBE_SYMBOL(context_tracking_user_enter);
>   * This call supports re-entrancy. This way it can be called from any exception
>   * handler without needing to know if we came from userspace or not.
>   */
> -void context_tracking_exit(enum ctx_state state)
> +bool context_tracking_exit(enum ctx_state state)

This needs clear documentation of what the return value means.

> +static void kill_task_isolation_strict_task(void)
> +{
> +	/* RCU should have been enabled prior to this point. */
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "kernel entry without RCU");
> +
> +	dump_stack();
> +	current->task_isolation_flags &= ~PR_TASK_ISOLATION_ENABLE;
> +	send_sig(SIGKILL, current, 1);
> +}

Wasn't this supposed to be configurable? Or is that something that
happens later on in the series?

> +
> +/*
> + * This routine is called from syscall entry (with the syscall number
> + * passed in) if the _STRICT flag is set.
> + */
> +void task_isolation_syscall(int syscall)
> +{
> +	/* Ignore prctl() syscalls or any task exit. */
> +	switch (syscall) {
> +	case __NR_prctl:
> +	case __NR_exit:
> +	case __NR_exit_group:
> +		return;
> +	}
> +
> +	pr_warn("%s/%d: task_isolation strict mode violated by syscall %d\n",
> +		current->comm, current->pid, syscall);
> +	kill_task_isolation_strict_task();
> +}

Ick. I guess it works, but this is still quite ugly IMO.
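On the question of whether the signal sent by kill_task_isolation_strict_task() should be configurable, one possible shape is sketched below in userspace C: keep SIGKILL as the default and let the task store an alternative signal number in spare bits of its isolation flags. The `TI_*` flag layout and the `ti_*()` helpers are assumptions for illustration only, not the prctl ABI defined in this series.

```c
#include <assert.h>
#include <signal.h>

/*
 * Hypothetical flag layout, NOT the prctl ABI from this series:
 * bit 0 = enable, bit 1 = strict, bits 8..15 = signal number
 * (0 means "use the default, SIGKILL").
 */
#define TI_ENABLE     (1u << 0)
#define TI_STRICT     (1u << 1)
#define TI_SIG_SHIFT  8
#define TI_SIG_MASK   (0xffu << TI_SIG_SHIFT)

/* Store a signal number in the flags; it must fit in the 8-bit field. */
static unsigned int ti_set_sig(unsigned int flags, int sig)
{
	assert(sig >= 0 && sig < 256);
	return (flags & ~TI_SIG_MASK) | ((unsigned int)sig << TI_SIG_SHIFT);
}

/* Signal to deliver on a strict-mode violation. */
static int ti_violation_signal(unsigned int flags)
{
	int sig = (int)((flags & TI_SIG_MASK) >> TI_SIG_SHIFT);

	return sig ? sig : SIGKILL;
}

/*
 * Models kill_task_isolation_strict_task(): like the patch, clear the
 * ENABLE bit, then report which signal would be sent.
 */
static int ti_on_violation(unsigned int *flags)
{
	int sig = ti_violation_signal(*flags);

	*flags &= ~TI_ENABLE;
	return sig;
}
```

Treating 0 as "use the default" means a task that never sets a signal keeps today's SIGKILL behaviour.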
> +void task_isolation_exception(void)
> +{
> +	pr_warn("%s/%d: task_isolation strict mode violated by exception\n",
> +		current->comm, current->pid);
> +	kill_task_isolation_strict_task();
> +}

Should this say what exception?

--Andy
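One way to address "should this say what exception?" is to have callers pass a printf-style description, sketched here under that assumption: the message keeps the patch's existing prefix and appends whatever the caller supplies (for example a fault address). `format_isolation_exception()` is a hypothetical userspace helper, not part of the series; an in-kernel version would presumably forward the arguments to pr_warn() instead of formatting into a buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical variant of the warning in task_isolation_exception() that
 * names the exception. Formats into buf and returns snprintf()'s result.
 */
static int format_isolation_exception(char *buf, size_t len,
				      const char *comm, int pid,
				      const char *fmt, ...)
{
	char what[64];
	va_list ap;

	/* Build the caller's description first, e.g. "page fault at 0x1000". */
	va_start(ap, fmt);
	vsnprintf(what, sizeof(what), fmt, ap);
	va_end(ap);

	return snprintf(buf, len,
			"%s/%d: task_isolation strict mode violated by %s\n",
			comm, pid, what);
}
```

A page-fault handler could then call it with something like `"page fault at %#lx", address`, so the log line identifies both the exception type and where it happened.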