From: Peter Zijlstra
Subject: Re: [PATCH] x86,seccomp,prctl: Remove PR_TSC_SIGSEGV and seccomp TSC filtering
Date: Sat, 4 Oct 2014 10:13:24 +0200
Message-ID: <20141004081324.GR10583@worktop.programming.kicks-ass.net>
References: <20141003201409.GM10583@worktop.programming.kicks-ass.net>
 <20141003204443.GP10583@worktop.programming.kicks-ass.net>
 <20141003210213.GG6324@worktop.programming.kicks-ass.net>
 <20141003211204.GQ10583@worktop.programming.kicks-ass.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
To: Andy Lutomirski
Cc: "linux-kernel@vger.kernel.org", Ingo Molnar, Kees Cook,
 Andrea Arcangeli, Erik Bosman, "H. Peter Anvin", Linux API,
 Michael Kerrisk-manpages, Paul Mackerras, Arnaldo Carvalho de Melo,
 X86 ML
List-Id: linux-api@vger.kernel.org

On Fri, Oct 03, 2014 at 02:15:24PM -0700, Andy Lutomirski wrote:
> On Fri, Oct 3, 2014 at 2:12 PM, Peter Zijlstra wrote:
> > On Fri, Oct 03, 2014 at 02:04:53PM -0700, Andy Lutomirski wrote:
> >> On Fri, Oct 3, 2014 at 2:02 PM, Peter Zijlstra wrote:
> >
> >> > Something like so.. slightly less ugly and possibly with more
> >> > complicated conditions setting the cr4 if you want to fix tsc vs seccomp
> >> > as well.
> >>
> >> This will crash anything that tries rdpmc in an allow-everything
> >> seccomp sandbox. It's also not very compatible with my grand scheme
> >> of allowing rdtsc to be turned off without breaking clock_gettime. :)
> >
> > Well, we clear cap_user_rdpmc, so everybody who still tries it gets what
> > he deserves, no problem there.
>
> Oh, interesting.
>
> To continue playing devil's advocate, what if you do perf_event_open,
> then mmap it, then start the seccomp sandbox?

We update that cap bit on every update to the self-monitor state, and in
a perfect world people would also check the cap bit every time they try
and read it, and fall back to the syscall.

So we could just clear it.. but I can imagine reality ruining things
here.

> My draft patches are currently tracking the number of perf_event mmaps
> per mm. I'm not thrilled with it, but it's straightforward. And I
> still need to benchmark cr4 writes, which is tedious, because I can't
> do it from user code.

Should be fairly straight fwd from kernel space, get a tsc stamp,
read+write cr4 1000 times, get another tsc read, and maybe do that
several times. No?
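
The loop described above (stamp, 1000 read+write pairs, stamp, repeat a few times) could look roughly like the sketch below. It is only an illustration of the timing structure, written as user code so it can be compiled anywhere. Since cr4 cannot be touched from user space, `fake_cr4` and `fake_cr4_read_write()` are hypothetical stand-ins. `stamp()` and `bench_min()` are likewise names made up for this sketch, and keeping the smallest round is an assumption about what "do that several times" should reduce to. In the kernel, the stand-in would be replaced by a real cr4 read followed by a write.

```c
#define _POSIX_C_SOURCE 199309L  /* only needed by the non-x86 fallback */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Take a TSC stamp, as suggested in the mail. */
static inline uint64_t stamp(void) { return __rdtsc(); }
#else
/* Fallback so the sketch builds off x86: nanoseconds, not cycles. */
static inline uint64_t stamp(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical stand-in for the register: real cr4 is kernel-only. */
volatile unsigned long fake_cr4;

/* Stand-in for one read+write pair; toggles bit 8 (CR4.PCE on x86). */
void fake_cr4_read_write(void)
{
    unsigned long v = fake_cr4;   /* "read cr4" */
    fake_cr4 = v ^ (1ul << 8);    /* "write cr4" with PCE flipped */
}

/*
 * Time `iters` calls of `op` between two stamps, repeat that `rounds`
 * times, and return the smallest total seen (least disturbed round).
 */
uint64_t bench_min(void (*op)(void), unsigned iters, unsigned rounds)
{
    uint64_t best = UINT64_MAX;

    assert(op && iters > 0 && rounds > 0);
    for (unsigned r = 0; r < rounds; r++) {
        uint64_t t0 = stamp();
        for (unsigned i = 0; i < iters; i++)
            op();
        uint64_t t1 = stamp();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling `bench_min(fake_cr4_read_write, 1000, 5)` and dividing the result by 1000 gives a per-pair estimate. In a kernel version one would presumably also want preemption and interrupts off around each round, so that a round is not inflated by an unrelated interruption.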