* disable tsc with seccomp
@ 2005-11-05 13:47 Andrea Arcangeli
2005-11-05 15:37 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2005-11-05 13:47 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, Andrew Morton
Hello,
This changeset is backing out an useful feature I implemented some month
ago:
http://kernel.org/hg/linux-2.6/?cs=2fd4e5f089df
Anything that can strengthen security is needed, the covert channels are
theoretically possible and this is a fact, you don't need hyperthreading
for that.
I tried to convince you a few times privately but I failed, and now that
you made mainline less secure, I have to raise the topic on l-k since
all other attemps to convince you privately already failed.
As I told you a few times, in real life any admin that doesn't notice a
task running at 100% cpu load for months means there are more serious
problems in that server, than the risk of covert channel. Because of
that, covert channels remains mostly a theoretical problem in servers.
But with the CPUShare usage of seccomp, running untrusted bycode for
months at 100% cpu load is the norm, so we must disable all high
precision timing information that we can disable.
Infact we should disable MISC_ENABLE too at runtime (if possible).
Furthermore i386 still has the tsc disable with seccomp, so the fact my
patch is still applied to i386 and has been backed out only of x86-64 is
a nosense. Either we back out both (and I strongly disagree with that),
or we keep both applied (this is what I'm suggesting). Current status
makes no sense to me.
If the end result of this discussion will be that both patches are
backed out, I'll rewrite them with a config option (turned off by
default). So the CPUShare users that wants to be safer, can enable it
when compiling the kernel (plus crossing fingers in the hope that
distros would also enable it before compiling their kernels). A config
option would make it acceptable even in the worst case I hope.
I think it would have been nicer from your part to at least make it a
config option instead of dropping it right away, especially after I
explicitly asked you not to drop it.
Thanks.
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: disable tsc with seccomp 2005-11-05 13:47 disable tsc with seccomp Andrea Arcangeli @ 2005-11-05 15:37 ` Andi Kleen 2005-11-05 16:07 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Andi Kleen @ 2005-11-05 15:37 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Andrew Morton On Saturday 05 November 2005 14:47, Andrea Arcangeli wrote: > Hello, > > This changeset is backing out an useful feature I implemented some month > ago: > > http://kernel.org/hg/linux-2.6/?cs=2fd4e5f089df > > Anything that can strengthen security is needed, the covert channels are > theoretically possible and this is a fact, you don't need hyperthreading > for that. It was useless, you can get exactly the same information by using RDPMC on perfctr 0 which always runs the NMI watchdog and counts all cycles too. And even with that I don't want to have such checks in the context switch for something that is at best theoretical. Letting it in would open the floodgates for making the context switch really slow long term. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-05 15:37 ` Andi Kleen @ 2005-11-05 16:07 ` Andrea Arcangeli 2005-11-05 16:12 ` Andi Kleen 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2005-11-05 16:07 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, Andrew Morton On Sat, Nov 05, 2005 at 04:37:44PM +0100, Andi Kleen wrote: > It was useless, you can get exactly the same information by using RDPMC > on perfctr 0 which always runs the NMI watchdog and counts all cycles too. nmi watchdog is off in my system, but it was used to be very slow. Anyway performance counters should be turned off too. They can be turned off on a per task basis right? Just switching another cr4 bit or what? The fact turning off the tsc is not enough to remove all timing info, is sure not a good reason to remove that code IMHO, infact we should disable more stuff if there are other ways to gather that information. The fast path cost is constant and not measurable, no matter how much stuff we disable in the slow path. So that's not a problem. We must disable everything that can be disabled and that can provide potential high precision timing info to userland. For example we could also flip the ptes to non-present over the hpet mapping. Zero-cost! The only _fixed_ cost is the one we already had before you backed out the feature. But note that hpet is a very very low prio compared to tsc, the precision is not high enough for it to matter, but certainly I would welcome a patch flipping the hpet ptes too. I only care about turning off timings sources that allows to count cycles. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-05 16:07 ` Andrea Arcangeli @ 2005-11-05 16:12 ` Andi Kleen 2005-11-05 16:31 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Andi Kleen @ 2005-11-05 16:12 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Andrew Morton On Saturday 05 November 2005 17:07, Andrea Arcangeli wrote: > On Sat, Nov 05, 2005 at 04:37:44PM +0100, Andi Kleen wrote: > > It was useless, you can get exactly the same information by using RDPMC > > on perfctr 0 which always runs the NMI watchdog and counts all cycles > > too. > > nmi watchdog is off in my system, but it was used to be very slow. It is normally on on all x86-64 systems. > Anyway performance counters should be turned off too. They can be turned > off on a per task basis right? Just switching another cr4 bit or what? I definitely don't want any code like this in the context switch. It is critical and I don't want to pollute fast paths with stuff like this that nobody needs. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-05 16:12 ` Andi Kleen @ 2005-11-05 16:31 ` Andrea Arcangeli 2005-11-05 17:04 ` Andi Kleen 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2005-11-05 16:31 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, Andrew Morton On Sat, Nov 05, 2005 at 05:12:09PM +0100, Andi Kleen wrote: > It is normally on on all x86-64 systems. Can the performance counters be disabled for seccomp only right? > I definitely don't want any code like this in the context switch. It is > critical and I don't want to pollute fast paths with stuff like this > that nobody needs. 287 registered CPUShare users will appreciate to compute more securely thanks to this feature (about 10 up at any given time), and once I start allowing transactions I hope much more users will need this (it's not finished yet). We have in the kernel lots of features that slowdown a bit and that benefit only a part of the userbase. Even kmap only benefits people with >1G of ram. Even the security_* api in the syscalls only benefit a part of the userbase. There are infinite other examples. The point is that none of this is measurable, _especially_ this one in the context switch, context switches aren't as frequent as syscalls! It's only two cachelines at every context switch, and they might be hot already since we're mangling over the task struct of the prev/next regardless of my patch. If something we could discuss how to pack the seccomp structure in the task structure so that it's already hot. But calling my feature "pollution" seems a bit exaggerated to me. Plus Andrew would have never allowed it to go in, if this could have impacted performance, you also should know this can't slowdown anything and you're just talking about theory. Of course if 1000 other people also adds their feature to the context switch then it might become measurable, but this is the first time we had to change the context switch to add more security on per-task basis, so I don't see it happening any time soon. Plus if it happens we can implement some technique to share the same cacheline for multiple purposes so it remains a fixed cost (like an hook). If you tell me how to use an hook today that avoids me to touch two cachelines in the context switch fast path, let me know, I don't see it. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-05 16:31 ` Andrea Arcangeli @ 2005-11-05 17:04 ` Andi Kleen 2005-11-06 1:55 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Andi Kleen @ 2005-11-05 17:04 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Andrew Morton On Saturday 05 November 2005 17:31, Andrea Arcangeli wrote: > On Sat, Nov 05, 2005 at 05:12:09PM +0100, Andi Kleen wrote: > > It is normally on on all x86-64 systems. > > Can the performance counters be disabled for seccomp only right? Yes, there is a bit to disable reading performance counters in ring 3. But I promise you to complain about a patch to add setting it in the context switch too :) > > I definitely don't want any code like this in the context switch. It is > > critical and I don't want to pollute fast paths with stuff like this > > that nobody needs. > > 287 registered CPUShare users will appreciate to compute more securely > thanks to this feature (about 10 up at any given time), and once I start > allowing transactions I hope much more users will need this (it's not > finished yet). I don't believe they need it - the side channel attack is too theoretical for their use case. > We have in the kernel lots of features that slowdown a bit and that > benefit only a part of the userbase. Even kmap only benefits people with > > >1G of ram. Even the security_* api in the syscalls only benefit a part > > of the userbase. There are infinite other examples. The point is that > none of this is measurable, LSM was actually quite measurable on some systems, the indirect calls really hurt on IA64 on some of the network benchmarks. > _especially_ this one in the context switch, > context switches aren't as frequent as syscalls! It's only two > cachelines at every context switch, and they might be hot If they're not hot for some reason (e.g. cache pig in userspace) you're talking about 1000+ cycles. > Plus Andrew would have never allowed it to go in, if this could have > impacted performance, you also should know this can't slowdown anything > and you're just talking about theory. The person talking about theory is you in my opinion with this basically theoretical attack. > Of course if 1000 other people also adds their feature to the context > switch then it might become measurable, but this is the first time we > had to change the context switch to add more security on per-task basis, Better to stamp out any such attempts in the roots. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-05 17:04 ` Andi Kleen @ 2005-11-06 1:55 ` Andrea Arcangeli 2005-11-21 16:43 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2005-11-06 1:55 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, Andrew Morton On Sat, Nov 05, 2005 at 06:04:08PM +0100, Andi Kleen wrote: > But I promise you to complain about a patch to add setting it in the context > switch too :) If we include the tsc disable, the performance counter disable will be completely zerocost for the fast path, so you shouldn't complain about that, and we can keep the focus only on the tsc part :) This invalidates your argument that the tsc patch can be disabled because the same info can be collected using performance counters: if you are ok with the tsc patch, the performance counter problem will be solved too at zero additional cost for the fast path. > LSM was actually quite measurable on some systems, the indirect > calls really hurt on IA64 on some of the network benchmarks. I can imagine that since they're executed during syscalls. But we're talking about context switches here, so in average order of magnitude less frequent than syscalls. > > _especially_ this one in the context switch, > > context switches aren't as frequent as syscalls! It's only two > > cachelines at every context switch, and they might be hot > > If they're not hot for some reason (e.g. cache pig in userspace) you're > talking about 1000+ cycles. What I meant is that we could try to be careful enough in ordering the task struct so that no additional cacheline has to be read from the cpu, so it doesn't matter what runs in userland. I'm all for discussing how to possibly eliminate the additional two cachelines of working set in the cpu. If you have to touch a close area during the context switch, the cacheline of the close area has to become hot, no matter what pigs runs in userland. So our objective would be to store the seccomp bitflags in the same cacheline that is required to become hot for the context switch to execute. If we can achieve that objective we'll be doing the check nearly at zerocost. > The person talking about theory is you in my opinion with this basically > theoretical attack. But there is a great deal of difference in the impact of the two theories. If my theory happens in practice, a ssh or ssl key can be stolen. If your theory happens in practice, there will be a masurable but for sure very tiny slowdown in the fast path. So if I'm right, things might go bad. If you're right, nothing will go wrong. If you're right, it basically means we need a config option around my code, but we shouldn't remove it since it still provides value to the CPUShare users (I'm sure they'll be happy to risk your claimed microslowdown, to be sure there is no covert channel possible with seccomp). I can't fix what I don't know about. But this thing I know about it, and I want to do all I can to guarantee it's impossible to trigger it. The only way to guarantee it is to apply my patch. For normal execution of tasks, disabling the tsc sounds not needed, as said in the previous email, CPUShare is the only project out there that runs untrusted bytecode at 100% cpu load for months, with the administrator acknowledging it as the "normal safe load". So CPUShare is the only thing that has to be careful about preventing covert channels. For anything else, it is more an administration problem than a kernel problem. > Better to stamp out any such attempts in the roots. I see your point, but OTOH I'd like math guarantee of preventing covert channels. I know for myself, that I'll feel safer in running CPUShare on my systems, knowing a cover channel is mathematically impossible. If you don't use CPUShare, to know a cover channel is impossible, you only have to run `top`! So you definitely don't need to disable the tsc to feel safe, if you're not running untrusted bytecode under seccomp at full cpu load. Hope this clarifies my point of view ;). ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-06 1:55 ` Andrea Arcangeli @ 2005-11-21 16:43 ` Andrea Arcangeli 2005-11-21 17:05 ` Andi Kleen 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2005-11-21 16:43 UTC (permalink / raw) Cc: Andi Kleen, linux-kernel, Andrew Morton Since there was no feedback to my last post, I assume you agree, so please backout the tsc disable so then I can plug the performane counter disable on top of it (at zero additional runtime cost). Also note: even if it may not be obvious because it's nicely hidden by header files, if you disable CONFIG_SECCOMP, the tsc disable patch becomes a noop and it's optimized away by gcc (like the rest of seccomp). So if you don't want to risk the microslowdown in your systems, just disable SECCOMP and be done with it but please resurrect that patch so I can fix the performance counters too after that. Thanks! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-21 16:43 ` Andrea Arcangeli @ 2005-11-21 17:05 ` Andi Kleen 2005-11-21 17:16 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Andi Kleen @ 2005-11-21 17:05 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andi Kleen, linux-kernel, Andrew Morton On Mon, Nov 21, 2005 at 05:43:49PM +0100, Andrea Arcangeli wrote: > Since there was no feedback to my last post, I assume you agree, so > please backout the tsc disable so then I can plug the performane counter > disable on top of it (at zero additional runtime cost). Sorry I don't agree. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-21 17:05 ` Andi Kleen @ 2005-11-21 17:16 ` Andrea Arcangeli 2005-11-21 17:24 ` Andi Kleen 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2005-11-21 17:16 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, Andrew Morton On Mon, Nov 21, 2005 at 06:05:17PM +0100, Andi Kleen wrote: > On Mon, Nov 21, 2005 at 05:43:49PM +0100, Andrea Arcangeli wrote: > > Since there was no feedback to my last post, I assume you agree, so > > please backout the tsc disable so then I can plug the performane counter > > disable on top of it (at zero additional runtime cost). > > Sorry I don't agree. You've the config option, turn that off on your systems, what's the problem with that? Or does this mean I need to ship kernels myself with covert channels made mathematically impossible with seccomp enabled? I'd rather avoid having to ship special kernels to run CPUShare as safely as physically possible. My time is already too short, so I hope I won't have to take care of this additional burden. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-21 17:16 ` Andrea Arcangeli @ 2005-11-21 17:24 ` Andi Kleen 2005-11-21 17:38 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Andi Kleen @ 2005-11-21 17:24 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andi Kleen, linux-kernel, Andrew Morton On Mon, Nov 21, 2005 at 06:16:16PM +0100, Andrea Arcangeli wrote: > On Mon, Nov 21, 2005 at 06:05:17PM +0100, Andi Kleen wrote: > > On Mon, Nov 21, 2005 at 05:43:49PM +0100, Andrea Arcangeli wrote: > > > Since there was no feedback to my last post, I assume you agree, so > > > please backout the tsc disable so then I can plug the performane counter > > > disable on top of it (at zero additional runtime cost). > > > > Sorry I don't agree. > > You've the config option, turn that off on your systems, what's the > problem with that? > > Or does this mean I need to ship kernels myself with covert channels > made mathematically impossible with seccomp enabled? I'd rather avoid I don't believe theoretical, unlikely to be usable in any way covert channels are a justification to make fast paths slower. That's independent of CONFIG options. And in addition your change doesn't even close that channel in theory. -Andi ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-21 17:24 ` Andi Kleen @ 2005-11-21 17:38 ` Andrea Arcangeli 2005-11-21 18:40 ` Andrea Arcangeli 0 siblings, 1 reply; 13+ messages in thread From: Andrea Arcangeli @ 2005-11-21 17:38 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, Andrew Morton On Mon, Nov 21, 2005 at 06:24:44PM +0100, Andi Kleen wrote: > I don't believe theoretical, unlikely to be usable in any way Demonstrate it if you can. The fact is that the closest thing to a demonstration that we have, demonstrated the opposite of what you claim, so you'll have an hard time to convince me about your point given you've no way to demonstrate it. I can guarantee that my machine is safe by running top, the only thing that escapes the easy "top" check is CPUShare and I want to fix it, since I can. > And in addition your change doesn't even close that channel > in theory. The the PCE is red herring, since it's a zero-cost addition. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: disable tsc with seccomp 2005-11-21 17:38 ` Andrea Arcangeli @ 2005-11-21 18:40 ` Andrea Arcangeli 0 siblings, 0 replies; 13+ messages in thread From: Andrea Arcangeli @ 2005-11-21 18:40 UTC (permalink / raw) Cc: Andi Kleen, linux-kernel, Andrew Morton On Sat, Nov 05, 2005 at 04:37:44PM +0100, Andi Kleen wrote: > It was useless, you can get exactly the same information by using > RDPMC on perfctr 0 which always runs the NMI watchdog and counts all > cycles too. I can't see how you can claim that you can read stuff with rdpmc. andrea@opteron:~> gdb ./a.out GNU gdb 6.3 Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1". (gdb) r Starting program: /home/andrea/a.out Program received signal SIGSEGV, Segmentation fault. 0x00000000004004bc in main () (gdb) dis 0x00000000004004bc warning: bad breakpoint number at or near '0x00000000004004bc' (gdb) disassem 0x00000000004004bc Dump of assembler code for function main: 0x00000000004004b8 <main+0>: push %rbp 0x00000000004004b9 <main+1>: mov %rsp,%rbp 0x00000000004004bc <main+4>: rdpmc I get an immediate segfault if I try to execute that instruction. infact the PCE bitflag is _already_ zero (checked with sysrq+P). Perhaps you mean if you change the kernel to allow RDPMC then you have a problem? But then it means your kernel modifications are buggy if they break seccomp. Please tell me how to generate a convert channel with an unmodified 2.6.15-rc2 plus the attached patch applied so I can fix it too. I also moved the seccomp struct next to the scheduler data so the two cachelines may be hot already and the theoretical overhead may go away too. And next time please bother to send me an email instead of silenty backing out my recent patches from your tree, especially when your backouts might decrease the security of some users of the kernel. I was lucky that Andrew notified me about it. (thanks!) In the below patch I also added a forced clear of the PCE just in case somebody writes buggy kernel code (as an additional security measure so if somebody writes buggy code like you seem to imply, seccomp still won't break this way, the buggy code will break instead and it will deserve it ;). Signed-off-by: Andrea Arcangeli <andrea@cpushare.com> diff -r 6377b3f31134 include/linux/sched.h --- a/include/linux/sched.h Mon Nov 21 06:06:28 2005 +0800 +++ b/include/linux/sched.h Mon Nov 21 20:04:38 2005 +0200 @@ -713,6 +719,8 @@ #ifdef CONFIG_SCHEDSTATS struct sched_info sched_info; #endif + + seccomp_t seccomp; struct list_head tasks; /* @@ -810,7 +818,6 @@ void *security; struct audit_context *audit_context; - seccomp_t seccomp; /* Thread group tracking */ u32 parent_exec_id; diff -r 6377b3f31134 arch/x86_64/kernel/process.c --- a/arch/x86_64/kernel/process.c Mon Nov 21 06:06:28 2005 +0800 +++ b/arch/x86_64/kernel/process.c Mon Nov 21 20:04:38 2005 +0200 @@ -485,6 +485,33 @@ } /* + * This function selects if the context switch from prev to next + * has to tweak the TSC disable bit in the cr4. + */ +static inline void disable_tsc(struct task_struct *prev_p, + struct task_struct *next_p) +{ + struct thread_info *prev, *next; + + /* + * gcc should eliminate the ->thread_info dereference if + * has_secure_computing returns 0 at compile time (SECCOMP=n). + */ + prev = prev_p->thread_info; + next = next_p->thread_info; + + if (has_secure_computing(prev) || has_secure_computing(next)) { + /* slow path here */ + if (has_secure_computing(prev) && + !has_secure_computing(next)) { + write_cr4(read_cr4() & ~X86_CR4_TSD); + } else if (!has_secure_computing(prev) && + has_secure_computing(next)) + write_cr4((read_cr4() | X86_CR4_TSD) & ~X86_CR4_PCE); + } +} + +/* * This special macro can be used to load a debugging register */ #define loaddebug(thread,r) set_debug(thread->debugreg ## r, r) @@ -603,6 +630,8 @@ memset(tss->io_bitmap, 0xff, prev->io_bitmap_max); } } + + disable_tsc(prev_p, next_p); return prev_p; } ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2005-11-21 18:40 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-11-05 13:47 disable tsc with seccomp Andrea Arcangeli 2005-11-05 15:37 ` Andi Kleen 2005-11-05 16:07 ` Andrea Arcangeli 2005-11-05 16:12 ` Andi Kleen 2005-11-05 16:31 ` Andrea Arcangeli 2005-11-05 17:04 ` Andi Kleen 2005-11-06 1:55 ` Andrea Arcangeli 2005-11-21 16:43 ` Andrea Arcangeli 2005-11-21 17:05 ` Andi Kleen 2005-11-21 17:16 ` Andrea Arcangeli 2005-11-21 17:24 ` Andi Kleen 2005-11-21 17:38 ` Andrea Arcangeli 2005-11-21 18:40 ` Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox