[2.6 patch] let CONFIG_SECCOMP default to n

public inbox for linux-kernel@vger.kernel.org

* [2.6 patch] let CONFIG_SECCOMP default to n
@ 2006-06-29 19:21 Adrian Bunk
  2006-06-30  0:44 ` Lee Revell
  0 siblings, 1 reply; 73+ messages in thread
From: Adrian Bunk @ 2006-06-29 19:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Ingo Molnar

From: Ingo Molnar <mingo@elte.hu>

I was profiling the scheduler on x86 and noticed some overhead related 
to SECCOMP, and indeed, SECCOMP runs disable_tsc() at _every_ 
context-switch:

        if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
                handle_io_bitmap(next, tss);

        disable_tsc(prev_p, next_p);

        return prev_p;

these are a couple of instructions in the hottest scheduler codepath!

x86_64 already removed disable_tsc() from switch_to(), but i think the 
right solution is to turn SECCOMP off by default.
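For readers who want to see what that per-switch work amounts to, here is a minimal user-space model of the check. It is a sketch, not the kernel source: `task_model`, `cr4_model` and `disable_tsc_model` are hypothetical names, and the model assumes the hook only has to toggle the CR4.TSD bit (bit 2 of CR4, which makes RDTSC privileged when set) whenever the seccomp status of the outgoing and incoming task differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_TSD (1u << 2)   /* CR4.TSD: when set, RDTSC faults outside ring 0 */

/* Hypothetical stand-in for the per-task state the hook inspects. */
struct task_model {
    bool seccomp;           /* true if the task runs in seccomp mode */
};

/* Hypothetical stand-in for CR4; a real kernel touches the register itself. */
uint32_t cr4_model;

/*
 * Model of the check made on every context switch.
 * Returns true if the CR4 model had to be rewritten (the slow path).
 */
bool disable_tsc_model(const struct task_model *prev,
                       const struct task_model *next)
{
    assert(prev != NULL && next != NULL);

    /* Fast path: same seccomp status on both sides, nothing to change. */
    if (prev->seccomp == next->seccomp)
        return false;

    if (next->seccomp)
        cr4_model |= CR4_TSD;    /* entering a seccomp task: block RDTSC */
    else
        cr4_model &= ~CR4_TSD;   /* leaving a seccomp task: allow RDTSC */
    return true;
}
```

In this model the comparison on the fast path still runs on every switch even when no task uses seccomp, which is the kind of cost the paragraph above objects to.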

besides the runtime overhead, there are a couple of other reasons as 
well why this should be done:

 - CONFIG_SECCOMP=y adds 836 bytes of bloat to the kernel:

       text    data     bss     dec     hex filename
    4185360  867112  391012 5443484  530f9c vmlinux-noseccomp
    4185992  867316  391012 5444320  5312e0 vmlinux-seccomp

 - virtually nobody seems to be using it (but cpushare.com, which seems
   pretty inactive)

 - users/distributions can still turn it on if they want it

 - http://www.cpushare.com/legal seems to suggest that it is pursuing a
   software patent to utilize the seccomp concept in a distributed 
   environment, and seems to give a promise that 'end users' will not be
   affected by that patent. How about non-end-users [i.e. server-side]?
   Has the Linux kernel become a vehicle for a proprietary server-side
   feature, with every Linux user paying the price of it?
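The 836-byte figure in the first item above can be checked from the two `size` rows: text grows by 632 bytes, data by 204 bytes, and bss is unchanged. A small sketch of that arithmetic, with the hypothetical names `size_row`, `size_total` and `size_growth`:

```c
#include <assert.h>

/* One row of `size` output: section sizes in bytes. */
struct size_row {
    long text;
    long data;
    long bss;
};

/* Total image size, as `size` reports it in the "dec" column. */
long size_total(struct size_row r)
{
    assert(r.text >= 0 && r.data >= 0 && r.bss >= 0);
    return r.text + r.data + r.bss;
}

/* Growth in bytes from one build to another. */
long size_growth(struct size_row before, struct size_row after)
{
    return size_total(after) - size_total(before);
}
```

Feeding it the two rows quoted above (4185360/867112/391012 and 4185992/867316/391012) gives totals of 5443484 and 5444320, a difference of 836 bytes.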

so the patch below just does the minimal common-sense change: turn it 
off by default.

Adrian Bunk:
I've removed the superfluous "default n"'s the original patch introduced.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

----

This patch was already sent on:
- 26 Jun 2006
- 27 Apr 2006
- 19 Apr 2006
- 11 Apr 2006
- 10 Mar 2006
- 29 Jan 2006
- 21 Jan 2006

This patch was sent by Ingo Molnar on:
- 9 Jan 2006

Index: linux/arch/i386/Kconfig
===================================================================
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -637,7 +637,6 @@ config REGPARM
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/mips/Kconfig
===================================================================
--- linux.orig/arch/mips/Kconfig
+++ linux/arch/mips/Kconfig
@@ -1787,7 +1787,6 @@ config BINFMT_ELF32
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS && BROKEN
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/powerpc/Kconfig
===================================================================
--- linux.orig/arch/powerpc/Kconfig
+++ linux/arch/powerpc/Kconfig
@@ -666,7 +666,6 @@ endif
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/ppc/Kconfig
===================================================================
--- linux.orig/arch/ppc/Kconfig
+++ linux/arch/ppc/Kconfig
@@ -1127,7 +1127,6 @@ endif
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/sparc64/Kconfig
===================================================================
--- linux.orig/arch/sparc64/Kconfig
+++ linux/arch/sparc64/Kconfig
@@ -64,7 +64,6 @@ endchoice
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -466,7 +466,6 @@ config PHYSICAL_START
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-29 19:21 [2.6 patch] let CONFIG_SECCOMP default to n Adrian Bunk
@ 2006-06-30  0:44 ` Lee Revell
  2006-06-30  1:07   ` Andrew Morton
  0 siblings, 1 reply; 73+ messages in thread
From: Lee Revell @ 2006-06-30  0:44 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, linux-kernel, Ingo Molnar

On Thu, 2006-06-29 at 21:21 +0200, Adrian Bunk wrote:
> This patch was already sent on:
> - 26 Jun 2006
> - 27 Apr 2006
> - 19 Apr 2006
> - 11 Apr 2006
> - 10 Mar 2006
> - 29 Jan 2006
> - 21 Jan 2006 

3 days ago?  That seems a bit silly.  Why didn't you just ping Andrew on
it?

Andrew, what's the status of this?  Can we get an ACK or a NACK before
this starts getting reposted every day? ;-)

Lee



* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  0:44 ` Lee Revell
@ 2006-06-30  1:07   ` Andrew Morton
  2006-06-30  1:40     ` Adrian Bunk
  2006-06-30  2:35     ` Randy.Dunlap
  0 siblings, 2 replies; 73+ messages in thread
From: Andrew Morton @ 2006-06-30  1:07 UTC (permalink / raw)
  To: Lee Revell; +Cc: bunk, linux-kernel, mingo

Lee Revell <rlrevell@joe-job.com> wrote:
>
> On Thu, 2006-06-29 at 21:21 +0200, Adrian Bunk wrote:
> > This patch was already sent on:
> > - 26 Jun 2006
> > - 27 Apr 2006
> > - 19 Apr 2006
> > - 11 Apr 2006
> > - 10 Mar 2006
> > - 29 Jan 2006
> > - 21 Jan 2006 
> 
> 3 days ago?  That seems a bit silly.  Why didn't you just ping Andrew on
> it?
> 
> Andrew, what's the status of this?  Can we get an ACK or a NACK before
> this starts getting reposted every day? ;-)
> 

I am stolidly letting the arch maintainers and the developer of this
feature work out what to do.


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  1:07   ` Andrew Morton
@ 2006-06-30  1:40     ` Adrian Bunk
  2006-06-30  4:52       ` Andrea Arcangeli
  2006-06-30 12:39       ` [2.6 patch] " Alan Cox
  2006-06-30  2:35     ` Randy.Dunlap
  1 sibling, 2 replies; 73+ messages in thread
From: Adrian Bunk @ 2006-06-30  1:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lee Revell, linux-kernel, mingo, Alan Cox, Linus Torvalds,
	Andrea Arcangeli

On Thu, Jun 29, 2006 at 06:07:06PM -0700, Andrew Morton wrote:
> Lee Revell <rlrevell@joe-job.com> wrote:
> >
> > On Thu, 2006-06-29 at 21:21 +0200, Adrian Bunk wrote:
> > > This patch was already sent on:
> > > - 26 Jun 2006
> > > - 27 Apr 2006
> > > - 19 Apr 2006
> > > - 11 Apr 2006
> > > - 10 Mar 2006
> > > - 29 Jan 2006
> > > - 21 Jan 2006 
> > 
> > 3 days ago?  That seems a bit silly.  Why didn't you just ping Andrew on
> > it?
> > 
> > Andrew, what's the status of this?  Can we get an ACK or a NACK before
> > this starts getting reposted every day? ;-)
> 
> I am stolidly letting the arch maintainers and the developer of this
> feature work out what to do.

Andrea is proud of getting a patent for the server part [1], so I doubt 
he would be happy with no longer having the client part defaulting to Y...

It might sound a bit strange that although Alan Cox and Linus Torvalds 
even wrote an open letter to the President of the European Parliament
calling "Software patents are also the utmost threat to the development 
of Linux and other free software products" [2]...

One bonus point for people arguing in favor of software patents - even 
Linux actively supports patented services.

cu
Adrian

[1] http://www.cpushare.com/legal
[2] http://www.effi.org/patentit/patents_torvalds_cox.txt

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  1:40     ` Adrian Bunk
@ 2006-06-30  4:52       ` Andrea Arcangeli
  2006-06-30  9:47         ` Ingo Molnar
  2006-06-30 12:39       ` [2.6 patch] " Alan Cox
  1 sibling, 1 reply; 73+ messages in thread
From: Andrea Arcangeli @ 2006-06-30  4:52 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, Lee Revell, linux-kernel, mingo, Alan Cox,
	Linus Torvalds

Hello,

On Fri, Jun 30, 2006 at 03:40:50AM +0200, Adrian Bunk wrote:
> Andrea is proud [..]

I wish I could be proud of it like you suggest, but for now it remains
to be seen if it will be approved and useful, perhaps one day will pay
off and I could be proud of the hard work, but for now I'm being very
cautious.

> [..] so I doubt 
> he would be happy with no longer having the client part defaulting to
> Y... [..]

Correct but this is a purely technical matter, let's not confuse
technical issues with strict bureaucracy.

> It might sound a bit strange that although Alan Cox and Linus Torvalds 
> even wrote an open letter to the President of the European Parliament
> calling "Software patents are also the utmost threat to the development 
> of Linux and other free software products" [2]...

Alan filed too:

	http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PG01&s1=%22alan+cox%22&OS="alan+cox"&RS="alan+cox"

Ingo who started this focus on disabling seccomp by default filed too:

	http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=molnar&s2=ingo&OS=molnar+AND+ingo&RS=molnar+AND+ingo

FWIW I agree that patents at large are one of the threats to the
development of linux and other free software. But it's not me that
you've to talk with if you want to change the system. I can only agree
with Linus, Alan and you. Like most others here I would be _very_ happy
if all patents would suddenly disappear, not just the software patents.

In fact you should inform yourself on the usages of your cpu resources if
you're donating them:

	http://ubuntuforums.org/archive/index.php/t-31418.html

As far as CPUShare is concerned that problem doesn't exist because it's
not a donation, so it's up to you to ask a fair price.

> One bonus point for people arguing in favor of software patents - even 
> Linux actively supports patented services.

Not quite, all open source or proprietary OS out there are free to add
seccomp to their kernels, I will never have any right about the seccomp
idea no matter what.

BTW, I suggested a few weeks ago to the rpm maintainer to use seccomp to
validate the rpm header data because he wasn't convinced that such code
could be trusted. It seems he was looking into it. There are many other
possible usages but nobody ever got interested in implementing them so far.
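As an illustration of that kind of usage, here is a minimal sketch of parsing untrusted bytes inside a seccomp-confined child process. It is not rpm's code and not the interface of the kernels discussed in this thread, which enabled seccomp through procfs (hence the `depends on PROC_FS` in the Kconfig entries); it assumes a later kernel where strict mode is entered with `prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT)`. The helper name `checksum_in_seccomp` is hypothetical, and a byte sum stands in for real header validation.

```c
#include <assert.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum the bytes of an untrusted buffer in a child
 * process confined by strict seccomp. Returns the sum, or -1 if the
 * sandbox could not be set up or the child did not finish cleanly.
 */
long checksum_in_seccomp(const unsigned char *buf, size_t len)
{
    int fds[2];
    pid_t pid;
    long sum = -1;
    ssize_t got;
    int status;

    assert(buf != NULL || len == 0);

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        long child_sum = 0;
        size_t i;

        close(fds[0]);
        /* After this call only read, write, exit and sigreturn are allowed. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(1);

        for (i = 0; i < len; i++)
            child_sum += buf[i];

        if (write(fds[1], &child_sum, sizeof child_sum)
                != (ssize_t)sizeof child_sum)
            syscall(SYS_exit, 2);
        /* _exit() would use exit_group, which strict mode does not allow. */
        syscall(SYS_exit, 0);
    }

    /* Parent: collect the result and check how the child ended. */
    close(fds[1]);
    got = read(fds[0], &sum, sizeof sum);
    close(fds[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (got != (ssize_t)sizeof sum || !WIFEXITED(status)
            || WEXITSTATUS(status) != 0)
        return -1;
    return sum;
}
```

If a bug in the parsing loop were exploited, the child could at most read and write its already-open descriptors before being killed. The helper returns -1 when strict mode is unavailable, for example when the process already runs under another seccomp policy.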

I think Y is the right setting. If anything, I can add a secondary
config option for the tsc disable and set that one to N, but the global
CONFIG_SECCOMP should be set to Y because it generates absolutely zero
overhead, not just practically but theoretically too.


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  4:52       ` Andrea Arcangeli
@ 2006-06-30  9:47         ` Ingo Molnar
  2006-06-30 14:58           ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Ingo Molnar @ 2006-06-30  9:47 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel, Alan Cox,
	Linus Torvalds


* Andrea Arcangeli <andrea@cpushare.com> wrote:

> Alan filed too:

> Ingo who started this focus on disabling seccomp by default filed too:

and both are pledged and available to GPL users. Is yours?

	Ingo


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  9:47         ` Ingo Molnar
@ 2006-06-30 14:58           ` andrea
  2006-07-11  7:36             ` [patch] " Ingo Molnar
  0 siblings, 1 reply; 73+ messages in thread
From: andrea @ 2006-06-30 14:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel, Alan Cox,
	Linus Torvalds

On Fri, Jun 30, 2006 at 11:47:53AM +0200, Ingo Molnar wrote:
> and both are pledged and available to GPL users. [..]

If the GPL offered any protection to my system software I would
consider it too, but the GPL can't protect software that runs behind
the corporate firewall. You know google can change the kernel without
having to release anything back (note that they a few times posted me
patches and fixes, so they at least try to contribute their changes back
to the community, it's in their interest I think, but I'm just saying
they're not _required_ to publish the exact copy of the kernel that runs
on their servers, if I'm wrong then please send me the link where to
download it). So if I would release my software as GPL anybody with a
bigger web farm than I have could install it, throw some million on ads,
and then I could just setup a redirect from my server that points at
theirs because I would have no chance to survive a competitor with
better financing. Make a license that forces them to release the
software behind the firewall like they have to do if they offer it as
download, and I will think about it. And at the moment thinking about it
or trying writing a license like that myself, is just wasted time. I'll
think about these matters only if it will be accepted.

And for yours that covers the http optimizations inside the http
accelerator, apache and other open source webservers aren't GPL and if
you only pledged it under the GPL like you suggest above, apache still
is forbidden to use your technique:

	http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=5&f=G&l=50&co1=AND&d=PG01&s1=molnar&s2=ingo&OS=molnar+AND+ingo&RS=molnar+AND+ingo

Same goes for sendmail in the mail one, assuming it has something to do
with the mail (I didn't read it all since it's not my field of
interest).

If I've to keep reading these threads about CONFIG_SECCOMP every few
months then set it to N (even if I disagree with that setting). Like
Alan said, what really matters is what distro will choose in their
config, not the default (and I doubt fedora ships with cifs=Y like the
default where only the required stuff is set to Y, please focus on the
big stuff first ;).


* [patch] let CONFIG_SECCOMP default to n
  2006-06-30 14:58           ` andrea
@ 2006-07-11  7:36             ` Ingo Molnar
  2006-07-11 14:17               ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Ingo Molnar @ 2006-07-11  7:36 UTC (permalink / raw)
  To: andrea
  Cc: Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel, Alan Cox,
	Linus Torvalds


* andrea@cpushare.com <andrea@cpushare.com> wrote:

> On Fri, Jun 30, 2006 at 11:47:53AM +0200, Ingo Molnar wrote:
> > and both are pledged and available to GPL users. [..]
> 
> If the GPL offered any protection to my system software I would 
> consider it too, but the GPL can't protect software that runs behind 
> the corporate firewall. [...]

so you admit and confirm that you explicitly and intentionally do not 
pledge your patent to GPL users. That is troubling (and unethical) in my 
opinion and strengthens my argument that this feature should _at a 
minimum_ be made default-off, just like hundreds of other kernel 
features are.

I'm also wondering what the upstream decision would have been, had you 
disclosed this patent licensing intention of yours. (to use the GPL-ed 
Linux kernel as a vehicle for your 'invention', while not fully living 
up to the basic quid-pro-quo.).

and i'm not really interested in marketing arguments about cpushare and 
seccomp in general. What matters to me is that this feature has been in 
the kernel for more than a year already, that nobody but you is using it 
and that _everyone_ using the default kernel options is paying the price 
in the context-switch hotpath. (The fact that in my view one of 
seccomp's obvious user-space usages is also patent-tainted without a 
fair pledge is 'just' icing on the cake.)

So i'd like to request the patch below to be included in v2.6.18.

	Ingo

---------------->
From: Ingo Molnar <mingo@elte.hu>
Subject: let CONFIG_SECCOMP default to n

I was profiling the scheduler on x86 and noticed some overhead related 
to SECCOMP, and indeed, SECCOMP runs disable_tsc() at _every_ 
context-switch:

        if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
                handle_io_bitmap(next, tss);

        disable_tsc(prev_p, next_p);

        return prev_p;

these are a couple of instructions in the hottest scheduler codepath!

x86_64 already removed disable_tsc() from switch_to(), but i think the 
right solution is to turn SECCOMP off by default.

besides the runtime overhead, there are a couple of other reasons as 
well why this should be done:

 - CONFIG_SECCOMP=y adds 836 bytes of bloat to the kernel:

       text    data     bss     dec     hex filename
    4185360  867112  391012 5443484  530f9c vmlinux-noseccomp
    4185992  867316  391012 5444320  5312e0 vmlinux-seccomp

 - virtually nobody seems to be using it (but cpushare.com, which seems
   pretty inactive)

 - users/distributions can still turn it on if they want it

 - http://www.cpushare.com/legal seems to suggest that it is pursuing a
   software patent to utilize the seccomp concept in a distributed 
   environment, and seems to give a promise that 'end users' will not be
   affected by that patent. How about non-end-users [i.e. server-side]?
   Has the Linux kernel become a vehicle for a proprietary server-side
   feature, with every Linux user paying the price of it?

so the patch below just does the minimal common-sense change: turn it 
off by default.

Adrian Bunk:
I've removed the superfluous "default n"'s the original patch introduced.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

----

This patch was already sent on:
- 19 Apr 2006
- 11 Apr 2006
- 10 Mar 2006
- 29 Jan 2006
- 21 Jan 2006

This patch was sent by Ingo Molnar on:
- 9 Jan 2006

Index: linux/arch/i386/Kconfig
===================================================================
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -637,7 +637,6 @@ config REGPARM
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/mips/Kconfig
===================================================================
--- linux.orig/arch/mips/Kconfig
+++ linux/arch/mips/Kconfig
@@ -1787,7 +1787,6 @@ config BINFMT_ELF32
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS && BROKEN
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/powerpc/Kconfig
===================================================================
--- linux.orig/arch/powerpc/Kconfig
+++ linux/arch/powerpc/Kconfig
@@ -666,7 +666,6 @@ endif
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/ppc/Kconfig
===================================================================
--- linux.orig/arch/ppc/Kconfig
+++ linux/arch/ppc/Kconfig
@@ -1127,7 +1127,6 @@ endif
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/sparc64/Kconfig
===================================================================
--- linux.orig/arch/sparc64/Kconfig
+++ linux/arch/sparc64/Kconfig
@@ -64,7 +64,6 @@ endchoice
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -466,7 +466,6 @@ config PHYSICAL_START
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their



* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11  7:36             ` [patch] " Ingo Molnar
@ 2006-07-11 14:17               ` andrea
  2006-07-11 14:32                 ` Arjan van de Ven
  2006-07-11 15:54                 ` Pavel Machek
  0 siblings, 2 replies; 73+ messages in thread
From: andrea @ 2006-07-11 14:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel, Alan Cox,
	Linus Torvalds

On Tue, Jul 11, 2006 at 09:36:25AM +0200, Ingo Molnar wrote:
> 
> * andrea@cpushare.com <andrea@cpushare.com> wrote:
> 
> > On Fri, Jun 30, 2006 at 11:47:53AM +0200, Ingo Molnar wrote:
> > > and both are pledged and available to GPL users. [..]
> > 
> > If the GPL offered any protection to my system software I would 
> > consider it too, but the GPL can't protect software that runs behind 
> > the corporate firewall. [...]
> 
> so you admit and confirm that you explicitly and intentionally do not 
> pledge your patent to GPL users. [..]

How many times do I need to say it. The pending patent has nothing to do
with the kernel, and it is _not_ pledged under the GPL.

The software world is moving from proprietary software that runs on top
of end user computers, to proprietary software that runs behind the
firewall. And the GPL is getting old and not capable to cope with the
second kind of usage of the software. This is being discussed in gplv3
context.

This isn't too bad, because as long as the software you run is open
source, you're guaranteed to be in control of what runs on your
computer, in control of your privacy, and in control of your security.
So it's a better model, you practically run the proprietary software on
the server through the firefox open source client stack, but the
general forced-contribution-mode that the GPL allows, tends to weaken
significantly when software is only meant to run behind the firewall of
large corporations that don't depend on external companies for their own
software support.

I've no magic solution for this, I'm not a lawyer so I've no idea if
anything more than the GPL would be enforceable under US law in the
first place (until recently it wasn't even 100% sure that the GPL was
enforceable), so I certainly can't help on this side.

> [..] That is troubling (and unethical) in my 

You should talk about ethics to the people who wrote the law,
certainly not with me. I've to comply with the law and with the rules of
economy. It's not my fault the fact that if I don't file patents I risk
troubles ahead. I'm against patents too and it wasn't fun to pay for the
legal costs of the filing. I seem to recall that transmeta also filed a
software patent over the CMS but it seems people had not many problems
to work for them (you once sent emails @transmeta.com too). Transmeta
perhaps was the smallest player, there are many more bigger examples
that I won't be making.

Furthermore I obtained no patents yet, it could be the way I wrote the
text is wrong or whatever, I'm not the most expert in terms of patents,
I did my best just to be sure I couldn't regret having done a fatal
and obvious mistake later and it seems everything is going fine, but
doing my best is definitely no guarantee of success. So it could be all
this will be void if I've bad luck and they reject my application.

> opinion and strengthens my argument that this feature should _at a 
> minimum_ be made default-off, just like hundreds of other kernel 
> features are.

See the below email of yours.

Also note, seccomp is the basic mode, at some point trusted
computing will be supported by xen so then I'll have a compelling reason
to switch the client side from seccomp to xen. But even then I'll be
always supporting seccomp for a long time because it's the most solid
and simplest mode of all.

> I'm also wondering what the upstream decision would have been, had you 
> disclosed this patent licensing intention of yours. (to use the GPL-ed 

Talking about patents when submitting seccomp, would be like talking
about mp3 patents when submitting alsa code or talking about google
server side patents when submitting a new tcp/ip feature that could
allow google render html faster. This whole discussion is officially
offtopic and it's a pure waste of bandwidth(tm), believe it or not.

All grid computing providers out there are welcome to start using
seccomp today to make their clients more secure against possible
software bugs in the remote computed bytecode. I welcome boinc to start
using seccomp too, I welcome worldcommunitygrid to use seccomp on the
linux client, I welcome all OS vendors to add seccomp to their OS, I
even tried to contact apple since it should be easy to port seccomp to
their kernels.

As far as cpushare is concerned, I never had anything to hide. The legal
part of the website is there since day zero I think.

> Linux kernel as a vehicle for your 'invention', while not fully living 
> up to the basic quid-pro-quo.).

If you prefer I can move all patent pending code on top of windows,
though I believe the "powered by" ads on my site were nice ads for free
software and open source (and obviously I'm very proud to be using them
like I'm very proud to be able to contribute to free software as well).
If you check the user agreement I even stated that any income generated
by cpucoins transactions started by cpushare will be donated to the
development of open source software.

> So i'd like to request the patch below to be included in v2.6.18.

Here the email that now you're forcing me to remind you:

http://www.cpushare.com/hypermail/cpushare-discuss/06/01/0080.html

	"ok, i agree with you here - having it on by default does make
	sense from an API uniformity POV."

I didn't feel the need of mentioning this opinion of yours before as
backing for my arguments pro Y, because I thought you were agreeing with
my previous post and that my arguments alone were enough, but since you
changed your mind again...

Note that I don't think Y or N makes any difference at the end for my
project. But fedora could set it to N under your advice and that would
do more damage to my project than whatever default setting we have
in-kernel. So if you want to hurt my project, you should ask Dave to
turn off seccomp instead of asking Linus to turn off it in the kernel
source.

Even more significant is the fact that it turns out 90% of the
interested userbase seems windows based, so for 90% of users it won't
matter if seccomp is even in the kernel.

Even if you seem to believe I don't care about the kernel when I talk
about seccomp, I really think Y is the right setting for the kernel, and
I'm not speaking for my own personal usages of seccomp, for the reason
why you also agreed with it in the above email a few months ago.

But like I said in previous emails, if these discussions about Y or N
have to keep going, then feel free to apply Ingo's patch to make him
happy. I'm happy either way, but I think Y is the appropriate
setting like with EPOLL and friends, now that all overhead triggered by
the purely paranoid additional feature has been removed by default.

config EPOLL
        bool "Enable eventpoll support" if EMBEDDED
        default y


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 14:17               ` andrea
@ 2006-07-11 14:32                 ` Arjan van de Ven
  2006-07-11 15:31                   ` andrea
  2006-07-11 15:54                 ` Pavel Machek
  1 sibling, 1 reply; 73+ messages in thread
From: Arjan van de Ven @ 2006-07-11 14:32 UTC (permalink / raw)
  To: andrea
  Cc: Ingo Molnar, Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel,
	Alan Cox, Linus Torvalds


> Note that I don't think Y or N makes any difference at the end for my
> project. But fedora could set it to N under your advice and that would
> do more damage to my project than whatever default setting we have

as far as I can see Fedora has SECCOMP off for a long time already

> Even if you seem to believe I don't care about the kernel when I talk
> about seccomp, I really think Y is the right setting for the kernel, and
> I'm not speaking for my own personal usages of seccomp, for the reason
> why you also agreed with it in the above email a few months ago.

if there is overhead, and there is no general use for it (which there
isn't really) then it should be off imo.




* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 14:32                 ` Arjan van de Ven
@ 2006-07-11 15:31                   ` andrea
  2006-07-11 15:54                     ` Arjan van de Ven
                                       ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: andrea @ 2006-07-11 15:31 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel,
	Alan Cox, Linus Torvalds

On Tue, Jul 11, 2006 at 04:32:53PM +0200, Arjan van de Ven wrote:
> as far as I can see Fedora has SECCOMP off for a long time already

Well, I didn't know about it... Long time can't be more than a few
months because I was sure in older releases it was enabled because I had
people running seccomp code on fedora.

I never expected it to be an easy thing to start up the CPUShare project, but
one thing that I didn't expect however was this kind of behaviour from
the leading linux vendor, I didn't get a single email of questions and I
wasn't informed about this, despite they know me perfectly. This
effectively reminds me about the high profile news articles I keep
reading recently that on the sidelines mention some RH behaviour
in the industry.

> if there is overhead, and there is no general use for it (which there
> isn't really) then it should be off imo.

I hope the reason was the lack of my last patch. But even in such case
RH could have turned off the tsc thing immediately themself (they know
how to patch the kernel no?) or they could have asked me a single
question about it before turning it off, no?

I hope RH will reconsider with my last patch applied and at the light of
this email as well:

	http://www.cpushare.com/hypermail/cpushare-discuss/06/01/0080.html

If they don't reconsider I'll be forced to recommend the Fedora CPUShare
users to switch distro if they don't want having to recompile the kernel
by themself.

I guess now I understand why this new change of mind of Ingo: if he
would succeed to push the N in the main kernel, then nobody could
complain to fedora for setting it to N, while they're in a less obvious
position at the moment where the kernel says "default to y" and they set
it to N to be happy.

As for no general use, this is the people that certainly used seccomp so
far:

cpushare=> select count(*) from accounts where cpucoins != 0;
 count 
-------
   122
(1 row)

cpushare=> 

remove 1 that is myself, that leaves 121 persons using seccomp so far
in CPUShare context. One first user already started buying CPU resource
a few days ago, and he's currently computing his own seccomp bytecode
remotely as we speak. So unless they're all wasting their time by
helping me test the stuff, I'm not the only one that finds at least
one useful usage for seccomp (but I think there are many more if only
people would care to use it). Certainly the FUD about the Y and N
availability doesn't help in convincing people to use seccomp to
strengthen decompression security etc...

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 15:31                   ` andrea
@ 2006-07-11 15:54                     ` Arjan van de Ven
  2006-07-11 16:13                       ` andrea
  2006-07-11 16:25                       ` Alan Cox
  2006-07-11 16:02                     ` Adrian Bunk
  2006-07-11 16:24                     ` Alan Cox
  2 siblings, 2 replies; 73+ messages in thread
From: Arjan van de Ven @ 2006-07-11 15:54 UTC (permalink / raw)
  To: andrea
  Cc: Ingo Molnar, Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel,
	Alan Cox, Linus Torvalds

On Tue, 2006-07-11 at 17:31 +0200, andrea@cpushare.com wrote:
> On Tue, Jul 11, 2006 at 04:32:53PM +0200, Arjan van de Ven wrote:
> > as far as I can see Fedora has SECCOMP off for a long time already
> 
> Well, I didn't know about it... Long time can't be more than a few
> months because I was sure in older releases it was enabled because I had
> people running seccomp code on fedora.

hmm I checked my laptop which runs a quite old version

> I never expect it was easy thing to startup the CPUShare project, but
> one thing that I didn't expect however was this kind of behaviour from
> the leading linux vendor, I didn't get a single email of questions and I
> wasn't informed about this, despite they know me perfectly. 

Ehm I wasn't aware all linux vendors in the world owe that to you, or
that you own their kernel configuration

> > if there is overhead, and there is no general use for it (which there
> > isn't really) then it should be off imo.
> 
> I hope the reason was the lack of my last patch. But even in such case
> RH could have turned off the tsc thing immediately themself (they know
> how to patch the kernel no?) or they could have asked me a single
> question about it before turning it off, no?
> 

I have no idea; I don't work there. Also I checked Fedora, not RHEL, and
Fedora is done by the Fedora project, not by Red Hat the company. If you
want to ask them to enable it, you should do so on the fedora-devel
mailing list




* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 15:54                     ` Arjan van de Ven
@ 2006-07-11 16:13                       ` andrea
  2006-07-11 16:23                         ` Arjan van de Ven
  2006-07-11 16:57                         ` Alan Cox
  2006-07-11 16:25                       ` Alan Cox
  1 sibling, 2 replies; 73+ messages in thread
From: andrea @ 2006-07-11 16:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel,
	Alan Cox, Linus Torvalds

On Tue, Jul 11, 2006 at 05:54:02PM +0200, Arjan van de Ven wrote:
> Ehm I wasn't aware all linux vendors in the world owe that to you, or
> that you own their kernel configuration

I perfectly know nobody owes anything to me, I said I didn't expect it
because it sounds very weird having to take an anti-fedora position in a
project like CPUShare. Hope you didn't get it wrong because I'd be sad
having opened this whole topic if you were wrong and SECCOMP was
actually enabled in fedora.

> I have no idea; I don't work there. Also I checked Fedora, not RHEL, and
> Fedora is done by the Fedora project, not by Red Hat the company. If you
> want to ask them to enable it, you should do so on the fedora-devel
> mailing list

Aren't Ingo and Alan Fedora? If they ask N in the main kernel, and they
already set it to N in fedora I'm unsure what I should discuss further
with them.

And most of this whole thread is grossly offtopic, I'm amazed nobody
complained yet about the questions they ask about cpushare legal details
on this list, I guess it was entertaining enough for people not to
complain just yet.

I won't post more emails from my part... hope it helps reducing the
noise.

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 16:13                       ` andrea
@ 2006-07-11 16:23                         ` Arjan van de Ven
  2006-07-11 16:57                         ` Alan Cox
  1 sibling, 0 replies; 73+ messages in thread
From: Arjan van de Ven @ 2006-07-11 16:23 UTC (permalink / raw)
  To: andrea
  Cc: Ingo Molnar, Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel,
	Alan Cox, Linus Torvalds

On Tue, 2006-07-11 at 18:13 +0200, andrea@cpushare.com wrote:
> On Tue, Jul 11, 2006 at 05:54:02PM +0200, Arjan van de Ven wrote:
> > Ehm I wasn't aware all linux vendors in the world owe that to you, or
> > that you own their kernel configuration
> 
> I perfectly know nobody owes anything to me, I said I didn't expect it
> because it sounds very weird having to take an anti-fedora position in a
> project like CPUShare. 

it sounds very weird taking an anti-fedora position without even having
asked Fedora to turn it on. But maybe that's just me.




* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 16:13                       ` andrea
  2006-07-11 16:23                         ` Arjan van de Ven
@ 2006-07-11 16:57                         ` Alan Cox
  1 sibling, 0 replies; 73+ messages in thread
From: Alan Cox @ 2006-07-11 16:57 UTC (permalink / raw)
  To: andrea
  Cc: Arjan van de Ven, Ingo Molnar, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

Ar Maw, 2006-07-11 am 18:13 +0200, ysgrifennodd andrea@cpushare.com:
> On Tue, Jul 11, 2006 at 05:54:02PM +0200, Arjan van de Ven wrote:
> Aren't Ingo and Alan Fedora? If they ask N in the main kernel, and they
> already set it to N in fedora I'm unsure what I should discuss further
> with them.

Neither of us are the ones who set the final options for Fedora kernels,
although I'd be surprised given the minuscule user base and the added
cost to every user if anyone else reached a differing conclusion, but
that is for the Fedora Project to decide not for me.


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 15:54                     ` Arjan van de Ven
  2006-07-11 16:13                       ` andrea
@ 2006-07-11 16:25                       ` Alan Cox
  1 sibling, 0 replies; 73+ messages in thread
From: Alan Cox @ 2006-07-11 16:25 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: andrea, Ingo Molnar, Adrian Bunk, Andrew Morton, Lee Revell,
	linux-kernel, Alan Cox, Linus Torvalds

Ar Maw, 2006-07-11 am 17:54 +0200, ysgrifennodd Arjan van de Ven:
> On Tue, 2006-07-11 at 17:31 +0200, andrea@cpushare.com wrote:
> Fedora is done by the Fedora project, not by Red Hat the company. If you
> want to ask them to enable it, you should do so on the fedora-devel
> mailing list

Or roll your own alternative kernel package and set up a mini yum
repository for it and updates you make. The GPL is a wonderful thing 8)


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 15:31                   ` andrea
  2006-07-11 15:54                     ` Arjan van de Ven
@ 2006-07-11 16:02                     ` Adrian Bunk
  2006-07-11 16:16                       ` andrea
  2006-07-11 16:24                     ` Alan Cox
  2 siblings, 1 reply; 73+ messages in thread
From: Adrian Bunk @ 2006-07-11 16:02 UTC (permalink / raw)
  To: andrea
  Cc: Arjan van de Ven, Ingo Molnar, Andrew Morton, Lee Revell,
	linux-kernel, Alan Cox, Linus Torvalds

On Tue, Jul 11, 2006 at 05:31:17PM +0200, andrea@cpushare.com wrote:
> On Tue, Jul 11, 2006 at 04:32:53PM +0200, Arjan van de Ven wrote:
>...
> > if there is overhead, and there is no general use for it (which there
> > isn't really) then it should be off imo.
> 
> I hope the reason was the lack of my last patch. But even in such case
> RH could have turned off the tsc thing immediately themself (they know
> how to patch the kernel no?) or they could have asked me a single
> question about it before turning it off, no?
> 
> I hope RH will reconsider with my last patch applied and at the light of
> this email as well:
> 
> 	http://www.cpushare.com/hypermail/cpushare-discuss/06/01/0080.html
> 
> If they don't reconsider I'll be forced to recommend the Fedora CPUShare
> users to switch distro if they don't want having to recompile the kernel
> by themself.
> 
> I guess now I understand why this new change of mind of Ingo: if he
> would succeed to push the N in the main kernel, then nobody could
> complain to fedora for setting it to N, while they're in a less obvious
> position at the moment where the kernel says "default to y" and they set
> it to N to be happy.
>...

WTF are you smoking?

You said yourself that your feature has currently exactly 121 users.

And why should anyone have to contact you before disabling your feature?
Everyone enables the subset of features he considers useful, and there's
no reason to contact anyone when disabling a feature in the kernel
(or would you consider it a morally bad thing that I disabled kernel 
preemption in my kernel without asking anyone for permission?).

And it was you who said just a few days ago [1]:

<--  snip  -->

...
If I've to keep reading these threads about CONFIG_SECCOMP every few
months then set it to N (even if I disagree with that setting). Like
Alan said, what really matters is what distro will choose in their
config, not the default (and I doubt fedora ships with cifs=Y like the
default where only the required stuff is set to Y, please focus on the
big stuff first ;).

<--  snip  -->

cu
Adrian

[1] http://lkml.org/lkml/2006/6/30/132

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 16:02                     ` Adrian Bunk
@ 2006-07-11 16:16                       ` andrea
  0 siblings, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-11 16:16 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Arjan van de Ven, Ingo Molnar, Andrew Morton, Lee Revell,
	linux-kernel, Alan Cox, Linus Torvalds

On Tue, Jul 11, 2006 at 06:02:36PM +0200, Adrian Bunk wrote:
> And it was you who said just a few days ago [1]:
> 
> <--  snip  -->
> 
> ...
> If I've to keep reading these threads about CONFIG_SECCOMP every few
> months then set it to N (even if I disagree with that setting). Like
> Alan said, what really matters is what distro will choose in their
> config, not the default (and I doubt fedora ships with cifs=Y like the
> default where only the required stuff is set to Y, please focus on the
> big stuff first ;).
> 
> <--  snip  -->

The above was in the context of the mainline kernel in case you didn't
notice (when I wrote the above I expected fedora to set it to Y even if
the main kernel was set to N, imagine how way off I was when I wrote the
above ;).

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 15:31                   ` andrea
  2006-07-11 15:54                     ` Arjan van de Ven
  2006-07-11 16:02                     ` Adrian Bunk
@ 2006-07-11 16:24                     ` Alan Cox
  2006-07-12 15:43                       ` Andi Kleen
  2 siblings, 1 reply; 73+ messages in thread
From: Alan Cox @ 2006-07-11 16:24 UTC (permalink / raw)
  To: andrea
  Cc: Arjan van de Ven, Ingo Molnar, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

Ar Maw, 2006-07-11 am 17:31 +0200, ysgrifennodd andrea@cpushare.com:
> If they don't reconsider I'll be forced to recommend the Fedora CPUShare
> users to switch distro if they don't want having to recompile the kernel
> by themself.

I'm sure they'll both be deeply upset.

I really don't care about cpushare and patents for some users of the
code in question. On the other hand turning on performance harming code
for a tiny number of users is dumb. If it were a loadable module it
would be different.


Alan


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 16:24                     ` Alan Cox
@ 2006-07-12 15:43                       ` Andi Kleen
  2006-07-12 21:07                         ` Ingo Molnar
  2006-07-12 21:22                         ` Ingo Molnar
  0 siblings, 2 replies; 73+ messages in thread
From: Andi Kleen @ 2006-07-12 15:43 UTC (permalink / raw)
  To: Alan Cox
  Cc: Arjan van de Ven, Ingo Molnar, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> 
> I really don't care about cpushare and patents for some users of the
> code in question. On the other hand turning on performance harming code
> for a tiny number of users is dumb. If it were a loadable module it
> would be different.

Actually there are some promising applications of seccomp outside
cpushare.

e.g. Andrea at some point proposed to run codecs which often
have security issues in a simple seccomp jail.  That's ok for 
them because they normally don't need to do any system calls.

I liked the idea. While this can be done with LSM (e.g. apparmor) too 
seccomp is definitely much easier and simpler and more "obviously safe"
than anything LSM based.

If the TSC disabling code is taken out the runtime overhead
of seccomp is also very small because it's only tested in slow
paths.

-Andi
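
The codec-jail idea above can be sketched in user space roughly as
follows. This is a minimal illustration under stated assumptions, not
code from this thread: it uses the prctl(PR_SET_SECCOMP) interface,
which only exists in later kernels (2.6.23 and newer), whereas the
kernels discussed here enabled seccomp through /proc/<pid>/seccomp.
byte_sum() and byte_sum_in_jail() are hypothetical names; byte_sum()
stands in for a decoder that only computes. The helper reports -1 when
the jail cannot be entered, for instance when the process is already
running under a seccomp filter.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>   /* SECCOMP_MODE_STRICT */
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>       /* prctl(), PR_SET_SECCOMP */
#include <sys/syscall.h>     /* SYS_exit */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a decoder: pure computation, no system calls. */
static uint32_t byte_sum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/*
 * Run byte_sum() in a forked child that has entered seccomp strict mode.
 * Returns 0 and stores the result in *out, or -1 if the jail could not
 * be set up or the child did not exit cleanly.
 */
int byte_sum_in_jail(const unsigned char *buf, size_t len, uint32_t *out)
{
	int fds[2];

	if (pipe(fds) != 0)
		return -1;

	pid_t pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		close(fds[0]);
		/* If this succeeds, only read, write, exit and sigreturn remain allowed. */
		if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
			_exit(2);
		uint32_t sum = byte_sum(buf, len);
		ssize_t n = write(fds[1], &sum, sizeof(sum));
		/* glibc's _exit() issues exit_group(), which strict mode would kill. */
		syscall(SYS_exit, n == (ssize_t)sizeof(sum) ? 0 : 1);
	}

	/* Parent: collect the result over the pipe, then reap the child. */
	close(fds[1]);
	uint32_t sum = 0;
	ssize_t n = read(fds[0], &sum, sizeof(sum));
	close(fds[0]);

	int status = 0;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (n != (ssize_t)sizeof(sum) || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	*out = sum;
	return 0;
}
```

A caller would pass the untrusted input and check the return value,
e.g. byte_sum_in_jail(data, len, &result). Anything the jailed code
tries beyond read/write/exit gets the child killed, which is the
"obviously safe" property: the parent never has to audit the decoder.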

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 15:43                       ` Andi Kleen
@ 2006-07-12 21:07                         ` Ingo Molnar
  2006-07-12 22:06                           ` Andi Kleen
  2006-07-13  1:51                           ` Andrew Morton
  2006-07-12 21:22                         ` Ingo Molnar
  1 sibling, 2 replies; 73+ messages in thread
From: Ingo Molnar @ 2006-07-12 21:07 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Arjan van de Ven, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

* Andi Kleen <ak@suse.de> wrote:

> If the TSC disabling code is taken out the runtime overhead of seccomp 
> is also very small because it's only tested in slow paths.

correct. But when i suggested to do precisely that i got a rant from 
Andrea of how super duper important it was to disable the TSC for 
seccomp ... (which argument is almost total hogwash)

so i'm going with the simpler path of making seccomp default-off. (which 
solves the problem as far as i'm concerned - i.e. no default overhead in 
the scheduler.)

but Andrea's creative arguments wrt. his decision to not pledge the 
seccomp related patent to GPL users makes me worry about whether this 
technology is untainted. We _dont_ want to make Linux reliant on 
possibly hostile patents - especially not for core default-enabled 
functionality. Basically Andrea wants us to help his project but he is 
in essence rejecting to do the same for us (for one of the most obvious 
applications of trusted bytecode: over-the-internet clustering via 
seccomp) - i find that approach fundamentally unfair. And he has not 
given a satisfactory answer either - IMO his 'GPL is not good because of 
the corporate firewall licensing backhole' argument is ridiculous [if he 
doesnt like the GPL he should write his own OS i guess, and not try to 
use a GPL-ed kernel as a vehicle for his technology]. (And he has 
injected his code into code that i authored too, which makes it doubly 
offensive to me.)

the fundamental problem is what i sense to be arrogant behavior of 
Andrea all across. I reported a performance problem months ago with two 
simple patches (the first one fixing seccomp, the second one disabling 
it by default) to get rid of the problem, and what i got was insults 
from Andrea and hours spent on writing pointless emails. Andrea is 
forcing me to invest time into this stupidity and that just increases my 
sense of being abused. I also start being of the opinion that no matter 
how good of a coder Andrea is, i dont want to deal with his code _at 
all_ if such _basic_ issues like performance regressions are so hard to 
communicate. At a minimum Andrea should apologize for all that abuse 
that i got just because i happened to cross his tracks with his 
holy-grail patent-pending technology.

another problem is the double standard Andrea's code is enjoying. 
Despite good reasons to apply the patch, it has not been applied yet, 
with no explanation. Again, i request the patch below to be applied to 
the upstream kernel. If Andrea fixes the performance problem and fixes 
the patent tainting issue we can turn the feature back on. Is Andrea's 
code above the rules of maintenance?

really, how much more stupid can the situation get before we get a 
resolution?

And i just wasted another 15 minutes on this ...

	Ingo

---------------->
From: Ingo Molnar <mingo@elte.hu>
Subject: let CONFIG_SECCOMP default to n

I was profiling the scheduler on x86 and noticed some overhead related 
to SECCOMP, and indeed, SECCOMP runs disable_tsc() at _every_ 
context-switch:

        if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
                handle_io_bitmap(next, tss);

        disable_tsc(prev_p, next_p);

        return prev_p;

these are a couple of instructions in the hottest scheduler codepath!

x86_64 already removed disable_tsc() from switch_to(), but i think the 
right solution is to turn SECCOMP off by default.

besides the runtime overhead, there are a couple of other reasons as 
well why this should be done:

 - CONFIG_SECCOMP=y adds 836 bytes of bloat to the kernel:

       text    data     bss     dec     hex filename
    4185360  867112  391012 5443484  530f9c vmlinux-noseccomp
    4185992  867316  391012 5444320  5312e0 vmlinux-seccomp

 - virtually nobody seems to be using it (but cpushare.com, which seems
   pretty inactive)

 - users/distributions can still turn it on if they want it

 - http://www.cpushare.com/legal seems to suggest that it is pursuing a
   software patent to utilize the seccomp concept in a distributed 
   environment, and seems to give a promise that 'end users' will not be
   affected by that patent. How about non-end-users [i.e. server-side]?
   Has the Linux kernel become a vehicle for a proprietary server-side
   feature, with every Linux user paying the price of it?

so the patch below just does the minimal common-sense change: turn it 
off by default.

Adrian Bunk:
I've removed the superfluous "default n"'s the original patch introduced.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

----

This patch was already sent on:
- 19 Apr 2006
- 11 Apr 2006
- 10 Mar 2006
- 29 Jan 2006
- 21 Jan 2006

This patch was sent by Ingo Molnar on:
- 9 Jan 2006

Index: linux/arch/i386/Kconfig
===================================================================
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -637,7 +637,6 @@ config REGPARM
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/mips/Kconfig
===================================================================
--- linux.orig/arch/mips/Kconfig
+++ linux/arch/mips/Kconfig
@@ -1787,7 +1787,6 @@ config BINFMT_ELF32
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS && BROKEN
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/powerpc/Kconfig
===================================================================
--- linux.orig/arch/powerpc/Kconfig
+++ linux/arch/powerpc/Kconfig
@@ -666,7 +666,6 @@ endif
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/ppc/Kconfig
===================================================================
--- linux.orig/arch/ppc/Kconfig
+++ linux/arch/ppc/Kconfig
@@ -1127,7 +1127,6 @@ endif
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/sparc64/Kconfig
===================================================================
--- linux.orig/arch/sparc64/Kconfig
+++ linux/arch/sparc64/Kconfig
@@ -64,7 +64,6 @@ endchoice
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -466,7 +466,6 @@ config PHYSICAL_START
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
-	default y
 	help
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 21:07                         ` Ingo Molnar
@ 2006-07-12 22:06                           ` Andi Kleen
  2006-07-12 22:19                             ` Ingo Molnar
  2006-07-13  3:04                             ` Andrea Arcangeli
  2006-07-13  1:51                           ` Andrew Morton
  1 sibling, 2 replies; 73+ messages in thread
From: Andi Kleen @ 2006-07-12 22:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Cox, Arjan van de Ven, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

On Wednesday 12 July 2006 23:07, Ingo Molnar wrote:
> 
> * Andi Kleen <ak@suse.de> wrote:
> 
> > If the TSC disabling code is taken out the runtime overhead of seccomp 
> > is also very small because it's only tested in slow paths.
> 
> correct. But when i suggested to do precisely that i got a rant from 
> Andrea of how super duper important it was to disable the TSC for 
> seccomp ... (which argument is almost total hogwash)

I wouldn't call it completely hogwash - there was a published paper
with a demo of an attack - but still the attack required so much
preparation and advance knowledge of the system that it seemed
more of academic value to me. At least for the standard kernel
we chose to not care about it. So for seccomp it was also not needed
imho.

> 
> so i'm going with the simpler path of making seccomp default-off. (which 
> solves the problem as far as i'm concerned - i.e. no default overhead in 
> the scheduler.)

I think without the context switch overhead it's a moderately useful facility.
Ok currently near nobody uses it, but having a very lightweight sandbox
with simple security semantics and that's easy to use is a useful 
facility for more secure user space.

It certainly would need to be better advertised to be of any use.
e.g. with a simple user space library that makes it easy to use.

> 
> but Andrea's creative arguments wrt. his decision to not pledge the 
> seccomp related patent to GPL users makes me worry about whether this 
> technology is untainted. 

I don't know any details about this, but I would generally trust Andrea not to
attempt to do anything evil regarding kernel & patents.

> 
> another problem is the double standard Andrea's code is enjoying. 
> Despite good reasons to apply the patch, it has not been applied yet, 
> with no explanation. Again, i request the patch below to be applied to 
> the upstream kernel. 

I can put in a patch into my tree for the next merge to disable the TSC
disable code on i386 too like I did earlier for x86-64.

I don't have a great opinion on the Kconfig defaults, so I won't put
in a patch for that.

-Andi

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 22:06                           ` Andi Kleen
@ 2006-07-12 22:19                             ` Ingo Molnar
  2006-07-12 22:33                               ` Andi Kleen
  2006-07-13  3:16                               ` Andrea Arcangeli
  2006-07-13  3:04                             ` Andrea Arcangeli
  1 sibling, 2 replies; 73+ messages in thread
From: Ingo Molnar @ 2006-07-12 22:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Arjan van de Ven, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds


* Andi Kleen <ak@suse.de> wrote:

> On Wednesday 12 July 2006 23:07, Ingo Molnar wrote:
> > 
> > * Andi Kleen <ak@suse.de> wrote:
> > 
> > > If the TSC disabling code is taken out the runtime overhead of seccomp 
> > > is also very small because it's only tested in slow paths.
> > 
> > correct. But when i suggested to do precisely that i got a rant from 
> > Andrea of how super duper important it was to disable the TSC for 
> > seccomp ... (which argument is almost total hogwash)
> 
> I wouldn't call it completely hogwash - there was a published paper 
> with a demo of an attack - but still the attack required so much 
> preparation and advance knowledge of the system that it seemed more of 
> academic value to me. [...]

(certainly - that's why i added the 'almost' qualifier to 'total 
hogwash'.)

> > so i'm going with the simpler path of making seccomp default-off. (which 
> > solves the problem as far as i'm concerned - i.e. no default overhead in 
> > the scheduler.)
> 
> I think without the context switch overhead it's a moderately useful 
> facility. Ok currently near nobody uses it, but having a very 
> lightweight sandbox with simple security semantics and that's easy to 
> use is a useful facility for more secure user space.

yeah. But wouldnt it be nicer to have the same damn thing that also 
improves a vital infrastructure of Linux, namely ptrace? Andrea didnt 
even try to improve ptrace - in fact he actively (and mostly unfairly) 
attacked ptrace, implicitly weakening the security perception of other 
syscall filtering based projects like User Mode Linux. Now what we have 
is the same old ptrace, some context-switch overhead, ~900 bytes of 
bloat and a NIH API. It's a lose-lose scenario IMO ...

> > but Andrea's creative arguments wrt. his decision to not pledge the 
> > seccomp related patent to GPL users makes me worry about whether 
> > this technology is untainted.
> 
> I don't know any details about this, [...]

Andrea wrote:

"If the GPL offered any protection to my system software I would 
 consider it too, but the GPL can't protect software that runs behind 
 the corporate firewall."

see:

 http://marc.theaimsgroup.com/?l=linux-kernel&m=115167947608676&w=2

> [...] but I would generally trust Andrea not to attempt to do anything 
> evil regarding kernel & patents.

firstly, you might trust Andrea, but do you trust the entity that 
actually owns the patent (cpushare.com and its investors)? And even if 
you trusted Andrea, would you trust his heir(s)?

> > another problem is the double standard Andrea's code is enjoying. 
> > Despite good reasons to apply the patch, it has not been applied yet, 
> > with no explanation. Again, i request the patch below to be applied 
> > to the upstream kernel.
> 
> I can put in a patch into my tree for the next merge to disable the 
> TSC disable code on i386 too like I did earlier for x86-64.

please do.

	Ingo

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 22:19                             ` Ingo Molnar
@ 2006-07-12 22:33                               ` Andi Kleen
  2006-07-12 22:49                                 ` Ingo Molnar
  2006-07-13  3:16                               ` Andrea Arcangeli
  1 sibling, 1 reply; 73+ messages in thread
From: Andi Kleen @ 2006-07-12 22:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Cox, Arjan van de Ven, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds


> > I can put in a patch into my tree for the next merge to disable the 
> > TSC disable code on i386 too like I did earlier for x86-64.
> 
> please do.

Hmm, with the new thread test as it was pointed out it can be indeed made zero 
cost for the common case. Perhaps that's not needed then.

-Andi
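
For readers following the "zero cost for the common case" point, here
is a small user-space model of the idea. It is only a sketch under
assumptions, not the patch referred to in this thread: the MODEL_*
flag bits, struct model_task and both function names are hypothetical,
and CR4 is modelled as a plain integer. Only the X86_CR4_TSD bit value
(bit 2) matches the hardware. The first function mirrors the decision
made by disable_tsc(); the second shows how one combined flag test can
keep that decision out of the hot path.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_TSD 0x0004u /* CR4 bit 2: RDTSC faults outside ring 0 when set */

/* Hypothetical per-task flag bits for this model (not the kernel's values). */
#define MODEL_TIF_IO_BITMAP (1u << 0)
#define MODEL_TIF_NOTSC     (1u << 1)
#define MODEL_WORK_CTXSW    (MODEL_TIF_IO_BITMAP | MODEL_TIF_NOTSC)

struct model_task {
	uint32_t flags;
};

/* Same decision as disable_tsc(): toggle TSD only when the seccomp state changes. */
uint32_t cr4_after_switch(uint32_t cr4, bool prev_seccomp, bool next_seccomp)
{
	if (prev_seccomp && !next_seccomp)
		return cr4 & ~X86_CR4_TSD;
	if (!prev_seccomp && next_seccomp)
		return cr4 | X86_CR4_TSD;
	return cr4;
}

/*
 * Model of a context switch where all rare work hides behind one test.
 * Tasks with no special flags pay for a single OR, AND and branch.
 */
uint32_t model_switch_to(uint32_t cr4, const struct model_task *prev,
			 const struct model_task *next)
{
	uint32_t work = (prev->flags | next->flags) & MODEL_WORK_CTXSW;

	/* Hot path: neither task needs extra work (GCC/Clang branch hint). */
	if (__builtin_expect(work == 0, 1))
		return cr4;

	/* Slow path: I/O bitmap handling would also live here in a real kernel. */
	return cr4_after_switch(cr4,
				(prev->flags & MODEL_TIF_NOTSC) != 0,
				(next->flags & MODEL_TIF_NOTSC) != 0);
}
```

The design point is that the common case shares one predictable branch
with other rarely needed work, so an ordinary task switching to another
ordinary task never touches the CR4 logic at all.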

* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 22:33                               ` Andi Kleen
@ 2006-07-12 22:49                                 ` Ingo Molnar
  0 siblings, 0 replies; 73+ messages in thread
From: Ingo Molnar @ 2006-07-12 22:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Arjan van de Ven, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds


* Andi Kleen <ak@suse.de> wrote:

> 
> > > I can put in a patch into my tree for the next merge to disable the 
> > > TSC disable code on i386 too like I did earlier for x86-64.
> > 
> > please do.
> 
> Hmm, with the new thread test as it was pointed out it can be indeed 
> made zero cost for the common case. Perhaps that's not needed then.

putting aside the fundamental fallacy of disabling TSC based timing 
attacks while not even considering network-based timing attacks (which 
are still very much possible), Chuck's approach of pushing the seccomp 
TSC cr4 twiddling into the context-switch slowpath is the right 
solution, given the circumstances. Will Chuck's patch be in 2.6.18? If 
not then my months-old patch below should be applied.

	Ingo

----

remove TSC-disabling logic from the context-switch hotpath. It has
marginal security relevance. Truly paranoid users can boot with the
TSC disabled anyway.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
----

 arch/i386/kernel/process.c |   29 -----------------------------
 1 files changed, 29 deletions(-)

Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -589,33 +589,6 @@ handle_io_bitmap(struct thread_struct *n
 }
 
 /*
- * This function selects if the context switch from prev to next
- * has to tweak the TSC disable bit in the cr4.
- */
-static inline void disable_tsc(struct task_struct *prev_p,
-			       struct task_struct *next_p)
-{
-	struct thread_info *prev, *next;
-
-	/*
-	 * gcc should eliminate the ->thread_info dereference if
-	 * has_secure_computing returns 0 at compile time (SECCOMP=n).
-	 */
-	prev = prev_p->thread_info;
-	next = next_p->thread_info;
-
-	if (has_secure_computing(prev) || has_secure_computing(next)) {
-		/* slow path here */
-		if (has_secure_computing(prev) &&
-		    !has_secure_computing(next)) {
-			write_cr4(read_cr4() & ~X86_CR4_TSD);
-		} else if (!has_secure_computing(prev) &&
-			   has_secure_computing(next))
-			write_cr4(read_cr4() | X86_CR4_TSD);
-	}
-}
-
-/*
  *	switch_to(x,yn) should switch tasks from x to y.
  *
  * We fsave/fwait so that an exception goes off at the right time
@@ -709,8 +682,6 @@ struct task_struct fastcall * __switch_t
 	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
 		handle_io_bitmap(next, tss);
 
-	disable_tsc(prev_p, next_p);
-
 	return prev_p;
 }
 


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 22:19                             ` Ingo Molnar
  2006-07-12 22:33                               ` Andi Kleen
@ 2006-07-13  3:16                               ` Andrea Arcangeli
  2006-07-13 11:23                                 ` Jeff Dike
  1 sibling, 1 reply; 73+ messages in thread
From: Andrea Arcangeli @ 2006-07-13  3:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

On Thu, Jul 13, 2006 at 12:19:11AM +0200, Ingo Molnar wrote:
> attacked ptrace, implicitly weakening the security perception of other 
> syscall filtering based projects like User Mode Linux. Now what we have 

Note that UML already had a security weakness that allowed escaping
the jail, see bugtraq. In fact its complexity is huge regardless of
ptrace; the security hole probably wasn't even ptrace related (I don't
remember the exact details).

I'm a big fan of UML and other userland virtualization projects; my own
ex-professor is working on a few of them. That doesn't mean I would use
UML as a jail myself for CPUShare.

In the two years that seccomp has existed, there has never been a
single bug that allowed escaping the jail; in fact there has never
been one that I know of, even if you backtest seccomp. And this track
record will continue.

Even the kernel itself as a whole is less secure than the seccomp
jail, but that doesn't mean I want to weaken the perception of anything.
It's purely a matter of probability: the higher the complexity and the
bigger the size of the project in kernel space, the more likely it is
that there is a bug that can lead to an exploit.


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  3:16                               ` Andrea Arcangeli
@ 2006-07-13 11:23                                 ` Jeff Dike
  2006-07-13 11:35                                   ` Ingo Molnar
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff Dike @ 2006-07-13 11:23 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ingo Molnar, Andi Kleen, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

On Thu, Jul 13, 2006 at 05:16:14AM +0200, Andrea Arcangeli wrote:
> On Thu, Jul 13, 2006 at 12:19:11AM +0200, Ingo Molnar wrote:
> > attacked ptrace, implicitly weakening the security perception of other 
> > syscall filtering based projects like User Mode Linux. Now what we have 
> 
> Note that UML already had a security weakness that allowed escaping
> the jail, see bugtraq. In fact its complexity is huge regardless of
> ptrace; the security hole probably wasn't even ptrace related (I don't
> remember the exact details).

Not hardly.  If you did remember the exact details, you'd remember
that it was in 2000, and someone "discovered" that tt mode didn't
allow kernel memory to be protected from userspace.  It had always
been well documented that tt mode had this problem and you shouldn't
be using it if you needed a secure VM.

See http://www.securityfocus.com/bid/3973/info

Now, there were a couple of ways to legitimately escape from UML, and
they *did* involve ptrace.  Things like single-stepping a system call
instruction or putting a breakpoint on a system call instruction and
single-stepping from the breakpoint.  As far as I know, these were
discovered and fixed by UML developers before there was any outside
awareness of these bugs.

				Jeff


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13 11:23                                 ` Jeff Dike
@ 2006-07-13 11:35                                   ` Ingo Molnar
  0 siblings, 0 replies; 73+ messages in thread
From: Ingo Molnar @ 2006-07-13 11:35 UTC (permalink / raw)
  To: Jeff Dike
  Cc: Andrea Arcangeli, Andi Kleen, Alan Cox, Arjan van de Ven,
	Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel, Alan Cox,
	Linus Torvalds


* Jeff Dike <jdike@addtoit.com> wrote:

> Now, there were a couple of ways to legitimately escape from UML, and 
> they *did* involve ptrace.  Things like single-stepping a system call 
> instruction or putting a breakpoint on a system call instruction and 
> single-stepping from the breakpoint.  As far as I know, these were 
> discovered and fixed by UML developers before there was any outside 
> awareness of these bugs.

also, UML 'ptrace clients' are allowed a lot more leeway than what a
seccomp-alike ptrace/utrace based syscall filter would allow. It would 
clearly exclude activities like 'setting a breakpoint' or 
'single-stepping' - valid syscalls would be limited to 
read/write/sigreturn/exit.
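
As a rough illustration of the restriction described above (a sketch
only, not code from any patch in this thread), the filter reduces to an
allowlist check on the syscall number. The enum values and the function
name seccomp_like_allows() are hypothetical placeholders, not real
syscall numbers:

```c
#include <stdbool.h>

/*
 * Hypothetical syscall identifiers for illustration only; real syscall
 * numbers are architecture specific and come from the kernel headers.
 */
enum demo_syscall {
	DEMO_SYS_READ,
	DEMO_SYS_WRITE,
	DEMO_SYS_SIGRETURN,
	DEMO_SYS_EXIT,
	DEMO_SYS_PTRACE,
	DEMO_SYS_OPEN,
};

/*
 * Return true only for the four calls named above. Everything else,
 * including anything that could set a breakpoint or single-step, is
 * rejected.
 */
bool seccomp_like_allows(enum demo_syscall nr)
{
	switch (nr) {
	case DEMO_SYS_READ:
	case DEMO_SYS_WRITE:
	case DEMO_SYS_SIGRETURN:
	case DEMO_SYS_EXIT:
		return true;
	default:
		return false;
	}
}
```

A tracer built this way would presumably kill the traced task on the
first syscall for which the check returns false.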

	Ingo


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 22:06                           ` Andi Kleen
  2006-07-12 22:19                             ` Ingo Molnar
@ 2006-07-13  3:04                             ` Andrea Arcangeli
  2006-07-13  3:12                               ` Linus Torvalds
  1 sibling, 1 reply; 73+ messages in thread
From: Andrea Arcangeli @ 2006-07-13  3:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

On Thu, Jul 13, 2006 at 12:06:12AM +0200, Andi Kleen wrote:
> I don't know any details about this, but I would generally trust Andrea not to
> attempt to do anything evil regarding kernel & patents.

I appreciate it, Andi.

For the ones that don't seem to trust me I quote Alan (and I also know
that what Alan said is correct):

  As to patented code for the kernel. That itself is a non-issue providing
  the patent owner or someone with permission from them submitted the
  code. The law recognizes that you cannot go around making promises
  (estoppel) and then trying to sue people for acting on them. The GPL
  likewise makes this clear.

What Ingo complains about is the fact somebody could be selling a
patented mp3 player that uses alsa. Should alsa be rejected from the
kernel? Does that mean alsa has anything to do with the mp3 patent?

Another example is when you make a search on google.com, you use the
tcp/ip kernel stack to connect to software covered by
patents. Should the tcp/ip stack be removed from the kernel? Does that
mean that the tcp/ip code has anything to do with the google patents?

Yes I also use tcp/ip, so do you want to reject tcp/ip from the kernel
to prevent people from running the software that connects the seccomp
task to the server? seccomp alone won't allow the client software to
work unless I can connect to the server, so tcp/ip is guilty exactly
the same way as seccomp.

There are infinite other examples...

About the GPL, I'm a huge believer in the GPL, I said multiple times I
think Linux has got the success it has because it's under the GPL and
not under the BSD. The GPL works perfectly for the kernel.

But it doesn't mean the GPL works for everything, in fact the GPL
translates to a sort of BSD behind the firewall.

Ask Ingo for the link to the kernel source running in the google
supercomputer if he keeps saying that the GPL works universally.

Ask Ingo what was so deadly wrong with the LGPL that made him decide
that it would have been bad if people writing LGPL code had
been allowed to use his patent-pending ideas. (Then re-ask him the
same question after replacing the LGPL with the BSD license).


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  3:04                             ` Andrea Arcangeli
@ 2006-07-13  3:12                               ` Linus Torvalds
  2006-07-13  4:40                                 ` Andrea Arcangeli
  0 siblings, 1 reply; 73+ messages in thread
From: Linus Torvalds @ 2006-07-13  3:12 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Ingo Molnar, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox

On Thu, 13 Jul 2006, Andrea Arcangeli wrote:
> 
> What Ingo complains about is the fact somebody could be selling a
> patented mp3 player that uses alsa. Should alsa be rejected from the
> kernel? Does that mean alsa has anything to do with the mp3 patent?

ALSA is used for other things _too_.

I don't think SECCOMP is wrong per se, but I do believe that if other 
approaches become more popular, and the only user of SECCOMP is not GPL'd 
and uses some patented stuff, then we should seriously look at the other 
interfaces (eg the extended ptrace).

Does anybody actually really _use_ SECCOMP outside of the patented stuff?

		Linus


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  3:12                               ` Linus Torvalds
@ 2006-07-13  4:40                                 ` Andrea Arcangeli
  2006-07-13  4:51                                   ` andrea
  2006-07-13  5:12                                   ` Linus Torvalds
  0 siblings, 2 replies; 73+ messages in thread
From: Andrea Arcangeli @ 2006-07-13  4:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ingo Molnar, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox

On Wed, Jul 12, 2006 at 08:12:16PM -0700, Linus Torvalds wrote:
> ALSA is used for other things _too_.

But there may be some users only using alsa to play mp3, so it's not
so different (certainly I agree it would be nice if there were
more users since it can solve the codec decompression exploits).

> I don't think SECCOMP is wrong per se, but I do believe that if other 
> approaches become more popular, and the only user of SECCOMP is not GPL'd 
> and uses some patented stuff, then we should seriously look at the other 
> interfaces (eg the extended ptrace).

You want to extend ptrace so I can run two ptraces of the same ptraced
task? (one to stop the syscalls from happening like current seccomp
does, the other to debug the task with gdb while it's under the first
ptrace?) I think Linux will be better off without this complication
(and I'll be better off too, I've enough parallelism to deal with
already in this project without this one more ;)

If what you don't like is the API and you want to change it (like
replacing the /proc interface with a syscall or a prctl) that's fine
with me though.

> Does anybody actually really _use_ SECCOMP outside of the patented
> stuff?

Just a side note, it's patent-pending, not patented. In fact it may
never be patented, all this discussion sounds very premature to me.

About your question, does it really matter what I would answer, given
we already have code in the kernel that can only be used in
combination with patented software and that it isn't useful for
anything else?

config X86_LONGRUN
        tristate "Transmeta LongRun"
        help
          This adds the CPUFreq driver for Transmeta Crusoe and Efficeon processors
          which support LongRun.

          For details, take a look at <file:Documentation/cpu-freq/>.

          If in doubt, say N.

Both these files:

     linux-2.6/arch/i386/kernel/cpu/cpufreq/longrun.c
     linux-2.6/arch/i386/kernel/cpu/transmeta.c

are only useful if used in combination with the CMS patented software
(Combining hardware and software to provide an improved
microprocessor):

     http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=50&f=G&l=50&co1=AND&d=PTXT&s1=transmeta.ASNM.&OS=AN/transmeta&RS=AN/transmeta

Furthermore transmeta.o is a 2964 byte object and cannot
be set to N, so it's linked in all i386 kernels out there; seccomp.o
OTOH can be set to N generating zero bytes of overhead and its final
.o size is 1108 bytes.

If you are aware of any other use of the above two files other than
the patented stuff you may want to communicate it to
Transmeta because I guess they would be interested to know.


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  4:40                                 ` Andrea Arcangeli
@ 2006-07-13  4:51                                   ` andrea
  2006-07-13  5:12                                   ` Linus Torvalds
  1 sibling, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-13  4:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ingo Molnar, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox

I sent the previous email from suse.de but, as with the previous
emails in this thread, I was only speaking for myself. I'm following up
to myself to state this explicitly because if I made any major or minor
mistake in my reasoning in this thread, I want to be clear that you
must only blame me personally. The previous From address was picked up
automatically from the reply headers.


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  4:40                                 ` Andrea Arcangeli
  2006-07-13  4:51                                   ` andrea
@ 2006-07-13  5:12                                   ` Linus Torvalds
  2006-07-13  6:22                                     ` andrea
  1 sibling, 1 reply; 73+ messages in thread
From: Linus Torvalds @ 2006-07-13  5:12 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Ingo Molnar, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox



On Thu, 13 Jul 2006, Andrea Arcangeli wrote:
> 
> But there may be some users only using alsa to play mp3, so it's not
> so different (certainly I agree it would be nice if there were
> more users since it can solve the codec decompression exploits).

You aren't even listening.

> If what you don't like is the API and you want to change it (like
> replacing the /proc interface with a syscall or a prctl) that's fine
> with me though.

This has NOTHING to do with the API.

You're just in denial, and don't even listen to what people say. It also 
has nothing to do with cpufreq, which again is a case of _some_ uses may 
be patented, but not "_the_ use"

I just stated that if other interfaces don't have the problem that their 
only use is patent-protected, then other interfaces are clearly better 
alternatives. IF they have users at all.

			Linus


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  5:12                                   ` Linus Torvalds
@ 2006-07-13  6:22                                     ` andrea
  0 siblings, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-13  6:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ingo Molnar, Alan Cox, Arjan van de Ven, Adrian Bunk,
	Andrew Morton, Lee Revell, linux-kernel, Alan Cox

On Wed, Jul 12, 2006 at 10:12:15PM -0700, Linus Torvalds wrote:
> You're just in denial, and don't even listen to what people say. It also 
> has nothing to do with cpufreq, which again is a case of _some_ uses may 
> be patented, but not "_the_ use"

You know "_the_ only use" possible of transmeta.o (see the function
init_transmeta) is in connection with the CMS patented software:

	/* Print CMS and CPU revision */
	max = cpuid_eax(0x80860000);

If you can see a difference between transmeta.o and seccomp.o then I
trust you but personally the only difference I can see is that with
seccomp.o it is possible that it will be used for something else
useful too.

I never cared about transmeta.o being linked into my kernels even though
I have never needed it so far in my life and even though it's larger
than seccomp. I'm happy to spend those hundred bytes in the transmeta
code just in case I become a transmeta user in the future.

> I just stated that if other interfaces don't have the problem that
> their only use is patent-protected, then other interfaces are
> clearly better alternatives. IF they have users at all.

Obviously you're free to change the kernel the way you want (feel free
to nuke seccomp as well if you want), but I'm also free not to switch
to ptrace if the only reason you give me is sadly non-technical. If
seccomp is better, it is better regardless if the server side is
patent-pending (not patent-protected) or not. So even trusting you
that transmeta.o is fundamentally different from seccomp.o, and it's
all fair as you imply, it still won't make a difference to me since I
only care about technical arguments for my decisions about CPUShare.


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 21:07                         ` Ingo Molnar
  2006-07-12 22:06                           ` Andi Kleen
@ 2006-07-13  1:51                           ` Andrew Morton
  2006-07-13  2:00                             ` Linus Torvalds
                                               ` (2 more replies)
  1 sibling, 3 replies; 73+ messages in thread
From: Andrew Morton @ 2006-07-13  1:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: ak, alan, arjan, bunk, rlrevell, linux-kernel, alan, torvalds

On Wed, 12 Jul 2006 23:07:32 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> Despite good reasons to apply the patch, it has not been applied yet, 
> with no explanation.

I queued the below.  Andrea claims that it'll reduce seccomp overhead to
literally zero.

But looking at it, I think it's a bit confused.  The patch needs
s/DISABLE_TSC/ENABLE_TSC/ to make it right.





From: Andrea Arcangeli <andrea@cpushare.com>

Make the purely paranoid TSC-disable feature optional, so by default seccomp
returns to absolutely zero cost.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 arch/i386/Kconfig            |   12 ++++++++++++
 arch/i386/kernel/process.c   |    2 ++
 arch/x86_64/Kconfig          |   12 ++++++++++++
 arch/x86_64/kernel/process.c |   31 +++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+)

diff -puN arch/i386/Kconfig~add-seccomp_disable_tsc-config-option arch/i386/Kconfig
--- a/arch/i386/Kconfig~add-seccomp_disable_tsc-config-option
+++ a/arch/i386/Kconfig
@@ -737,6 +737,18 @@ config SECCOMP
 
 	  If unsure, say Y. Only embedded should say N here.
 
+config SECCOMP_DISABLE_TSC
+	bool "Disable the TSC for seccomp tasks"
+	depends on SECCOMP
+	default n
+	help
+	  This feature mathematically prevents covert channels
+	  for tasks running under SECCOMP. This can generate
+	  a minuscule overhead in the scheduler.
+
+	  If you care most about performance say N. Say Y only if you're
+	  paranoid about covert channels.
+
 config VGA_NOPROBE
        bool "Don't probe VGA at boot" if EMBEDDED
        default n
diff -puN arch/i386/kernel/process.c~add-seccomp_disable_tsc-config-option arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c~add-seccomp_disable_tsc-config-option
+++ a/arch/i386/kernel/process.c
@@ -572,6 +572,7 @@ handle_io_bitmap(struct thread_struct *n
 static inline void disable_tsc(struct task_struct *prev_p,
 			       struct task_struct *next_p)
 {
+#ifdef CONFIG_SECCOMP_DISABLE_TSC
 	struct thread_info *prev, *next;
 
 	/*
@@ -590,6 +591,7 @@ static inline void disable_tsc(struct ta
 			   has_secure_computing(next))
 			write_cr4(read_cr4() | X86_CR4_TSD);
 	}
+#endif
 }
 
 /*
diff -puN arch/x86_64/Kconfig~add-seccomp_disable_tsc-config-option arch/x86_64/Kconfig
--- a/arch/x86_64/Kconfig~add-seccomp_disable_tsc-config-option
+++ a/arch/x86_64/Kconfig
@@ -526,6 +526,18 @@ config SECCOMP
 
 	  If unsure, say Y. Only embedded should say N here.
 
+config SECCOMP_DISABLE_TSC
+	bool "Disable the TSC for seccomp tasks"
+	depends on SECCOMP
+	default n
+	help
+	  This feature mathematically prevents covert channels
+	  for tasks running under SECCOMP. This can generate
+	  a minuscule overhead in the scheduler.
+
+	  If you care most about performance say N. Say Y only if you're
+	  paranoid about covert channels.
+
 source kernel/Kconfig.hz
 
 config REORDER
diff -puN arch/x86_64/kernel/process.c~add-seccomp_disable_tsc-config-option arch/x86_64/kernel/process.c
--- a/arch/x86_64/kernel/process.c~add-seccomp_disable_tsc-config-option
+++ a/arch/x86_64/kernel/process.c
@@ -494,6 +494,35 @@ out:
 }
 
 /*
+ * This function selects if the context switch from prev to next
+ * has to tweak the TSC disable bit in the cr4.
+ */
+static inline void disable_tsc(struct task_struct *prev_p,
+			       struct task_struct *next_p)
+{
+#ifdef CONFIG_SECCOMP_DISABLE_TSC
+	struct thread_info *prev, *next;
+
+	/*
+	 * gcc should eliminate the ->thread_info dereference if
+	 * has_secure_computing returns 0 at compile time (SECCOMP=n).
+	 */
+	prev = prev_p->thread_info;
+	next = next_p->thread_info;
+
+	if (has_secure_computing(prev) || has_secure_computing(next)) {
+		/* slow path here */
+		if (has_secure_computing(prev) &&
+		    !has_secure_computing(next)) {
+			write_cr4(read_cr4() & ~X86_CR4_TSD);
+		} else if (!has_secure_computing(prev) &&
+			   has_secure_computing(next))
+			write_cr4((read_cr4() | X86_CR4_TSD) & ~X86_CR4_PCE);
+	}
+#endif
+}
+
+/*
  * This special macro can be used to load a debugging register
  */
 #define loaddebug(thread,r) set_debugreg(thread->debugreg ## r, r)
@@ -622,6 +651,8 @@ __switch_to(struct task_struct *prev_p, 
 		}
 	}
 
+	disable_tsc(prev_p, next_p);
+
 	/* If the task has used fpu the last 5 timeslices, just do a full
 	 * restore of the math state immediately to avoid the trap; the
 	 * chances of needing FPU soon are obviously high now
_



* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  1:51                           ` Andrew Morton
@ 2006-07-13  2:00                             ` Linus Torvalds
  2006-07-13  7:44                             ` James Bruce
  2006-07-13 12:13                             ` [patch] let CONFIG_SECCOMP default to n Andi Kleen
  2 siblings, 0 replies; 73+ messages in thread
From: Linus Torvalds @ 2006-07-13  2:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, ak, alan, arjan, bunk, rlrevell, linux-kernel, alan



On Wed, 12 Jul 2006, Andrew Morton wrote:
> 
> But looking at it, I think it's a bit confused.  The patch needs
> s/DISABLE_TSC/ENABLE_TSC/ to make it right.

No, SECCOMP_DISABLE_TSC _enables_ the "disable TSC" feature.

Rather confusing naming, I'd agree.

That said, I still think the code is crap, and that if we want to support 
tasks that don't have access to the TSC, we should make that an 
independent feature of anything like SECCOMP. 

		Linus


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  1:51                           ` Andrew Morton
  2006-07-13  2:00                             ` Linus Torvalds
@ 2006-07-13  7:44                             ` James Bruce
  2006-07-13  8:34                               ` andrea
  2006-07-13 12:13                             ` [patch] let CONFIG_SECCOMP default to n Andi Kleen
  2 siblings, 1 reply; 73+ messages in thread
From: James Bruce @ 2006-07-13  7:44 UTC (permalink / raw)
  To: andrea
  Cc: Andrew Morton, alan, arjan, bunk, rlrevell, linux-kernel, alan,
	torvalds, Ingo Molnar

Andrew Morton wrote:
> On Wed, 12 Jul 2006 23:07:32 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
>> Despite good reasons to apply the patch, it has not been applied yet, 
>> with no explanation.
> 
> I queued the below.  Andrea claims that it'll reduce seccomp overhead to
> literally zero.
> 
> But looking at it, I think it's a bit confused.  The patch needs
> s/DISABLE_TSC/ENABLE_TSC/ to make it right.
<-- snip -->

Andrea,
what happened to Andrew James Wade's rewording [1] of your config help? 
   It seemed to disappear from what was submitted to akpm.

To "mathematically prevent covert channels" is far too strong a claim to 
make, since you only handle the case of TSC-related timing attacks. 
AJW's wording is much better, so please don't drop it.

Of course, if the new wording will be included in some forthcoming patch 
that also makes Linus happy [2], then never mind.

  - Jim Bruce

[1] http://lkml.org/lkml/2006/7/10/440
[2] http://lkml.org/lkml/2006/7/12/328


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  7:44                             ` James Bruce
@ 2006-07-13  8:34                               ` andrea
  2006-07-13  9:18                                 ` Andrew Morton
  0 siblings, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-13  8:34 UTC (permalink / raw)
  To: James Bruce
  Cc: Andrew Morton, alan, arjan, bunk, rlrevell, linux-kernel, alan,
	torvalds, Ingo Molnar

On Thu, Jul 13, 2006 at 03:44:38AM -0400, James Bruce wrote:
> Andrea,
> what happened to Andrew James Wade's rewording [1] of your config help? 
>   It seemed to disappear from what was submitted to akpm.

Andrew picked the patch I made originally, before Andrew James Wade
patched it.

Both patches are obsoleted by the new logic in the context switch that
uses the bitflags to enter the slow path, see Chuck's patch. That
removes the need for a config option because it's zero cost like the
core of seccomp.

As long as seccomp won't be nuked from the kernel, Chuck's patch seems
the way to go.

But the point is that I've no idea anymore what will happen to
seccomp so perhaps all patches will be useless.


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  8:34                               ` andrea
@ 2006-07-13  9:18                                 ` Andrew Morton
  2006-07-14  6:09                                   ` [PATCH] TIF_NOTSC and SECCOMP prctl andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Andrew Morton @ 2006-07-13  9:18 UTC (permalink / raw)
  To: andrea
  Cc: bruce, alan, arjan, bunk, rlrevell, linux-kernel, alan, torvalds,
	mingo

On Thu, 13 Jul 2006 10:34:41 +0200
andrea@cpushare.com wrote:

> Both patches are obsoleted by the new logic in the context switch that
> uses the bitflags to enter the slow path, see Chuck's patch.

What darn patch?

<looks>

hm, p73wtain80h.fsf@verdi.suse.de, who appears to be Andi has (again)
removed me from cc.  Possibly an act of mercy ;)

> As long as seccomp won't be nuked from the kernel, Chuck's patch seems
> the way to go.

I see "[compile tested only; requires just-sent fix to i386 system.h]", so
an appropriate next step would be for you to review, test, sign-off and
forward it, please.

> But the point is that I've no idea anymore what will happen to
> seccomp so perhaps all patches will be useless.

Shrug.  If we can optimise the current code, fine.  If there's a default-on
config option that makes no-TSC seccomp have zero overhead, better.  If that
makes us go back to doing useful stuff, perfect.


* [PATCH] TIF_NOTSC and SECCOMP prctl
  2006-07-13  9:18                                 ` Andrew Morton
@ 2006-07-14  6:09                                   ` andrea
  2006-07-14  6:27                                     ` Andrew Morton
  0 siblings, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-14  6:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: bruce, alan, arjan, bunk, rlrevell, linux-kernel, alan, torvalds,
	mingo

On Thu, Jul 13, 2006 at 02:18:18AM -0700, Andrew Morton wrote:
> removed me from cc.  Possibly an act of mercy ;)

;)

> I see "[compile tested only; requires just-sent fix to i386 system.h]", so
> an appropriate next step would be for you to review, test, sign-off and
> forward it, please.

I took the liberty to add Chuck's signoff as well since I started
hacking on top of his patch, if this is not ok Chuck please let us
know.

The below patch seems to work, I ported all my client code on top of
prctl already. (it's a bit more painful to autodetect a kernel with
CONFIG_SECCOMP turned off but I already adapted to it)
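
For reference, a client can probe for seccomp support roughly as
follows. This is a sketch that assumes the PR_GET_SECCOMP prctl as it
exists in current mainline kernel headers, which may not match the
exact interface of the patch below; probe_seccomp() is a hypothetical
helper name:

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Returns the current seccomp mode (0 means the caller is not in
 * seccomp mode), or -1 if the query fails, for example with EINVAL on
 * a kernel built without CONFIG_SECCOMP.
 */
int probe_seccomp(void)
{
	errno = 0;
	return prctl(PR_GET_SECCOMP, 0, 0, 0, 0);
}
```

A return value of -1 with errno set to EINVAL is what distinguishes a
kernel without seccomp support from one where seccomp is available but
not active.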

The only thing left worth discussing is why if I set TIF_NOTSC to 10
instead of 19 the kernel was crashing hard... After I checked and
rechecked everything else I deduced it had to be that number and after
changing it to 19 everything works fine... I also verified the first
rdtsc kills the task with a sigsegv. It would be nice to make sure
it's not a bug in the below patch that 10 didn't work but just some
hidden kernel "feature" ;).

The reduction of 36 lines should be a welcome thing. I also left a
CONFIG_SECCOMP in the slow path around the TIF_NOTSC stuff, so the ones
setting CONFIG_SECCOMP=n won't notice any bytecode size
difference. (those two CONFIG_SECCOMP should be removed if somebody
adds a standalone prctl that only calls disable_TSC()).

Compared to Chuck's patch I also moved the io_bitmap in a path that
only executes if either prev or next have the TIF_IO_BITMAP set, which
seems more optimal.

Reviews are welcome (then I will move on to x86-64; all other archs
supporting seccomp should require no changes despite the API
change). Thanks.

 arch/i386/kernel/process.c     |  124 +++++++++++++++++++++--------------------
 fs/proc/base.c                 |   91 ------------------------------
 include/asm-i386/processor.h   |    4 +
 include/asm-i386/thread_info.h |    5 +
 include/linux/prctl.h          |    4 +
 include/linux/seccomp.h        |   19 +++---
 kernel/seccomp.c               |   31 +++++++++-
 kernel/sys.c                   |    8 ++
 8 files changed, 125 insertions(+), 161 deletions(-)

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>

# HG changeset patch
# User andrea@cpushare.com
# Date 1152856077 -7200
# Node ID 9be99cbb325935c2a7af96ac39411fdde58d4eef
# Parent  bcfd682ea605a2ab00469eaa875988de6b910814
Removes the overhead of disabling the TSC under SECCOMP
with a new TIF_NOTSC bitflag (idea and part of the code from Chuck Ebbert).
disable_TSC can be called by other kernel code without interfering with SECCOMP
in any way. A prctl could be added just to disable the TSC if anybody needs it.
Only the "current" task can call disable_TSC.

To reduce the bytes of .text to the minimum, the seccomp API is moved from
/proc to prctl. /proc wasn't necessary anymore because only the "current" task
can safely turn on the NOTSC bit without SMP race conditions.

diff -r bcfd682ea605 -r 9be99cbb3259 arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c	Thu Jul 13 03:03:35 2006 +0700
+++ b/arch/i386/kernel/process.c	Fri Jul 14 07:47:57 2006 +0200
@@ -535,8 +535,29 @@ int dump_task_regs(struct task_struct *t
 	return 1;
 }
 
-static noinline void __switch_to_xtra(struct task_struct *next_p,
-				    struct tss_struct *tss)
+#ifdef CONFIG_SECCOMP
+void hard_disable_TSC(void)
+{
+	write_cr4(read_cr4() | X86_CR4_TSD);
+}
+void disable_TSC(void)
+{
+	if (!test_and_set_thread_flag(TIF_NOTSC))
+		/*
+		 * Must flip the CPU state synchronously with
+		 * TIF_NOTSC in the current running context.
+		 */
+		hard_disable_TSC();
+}
+void hard_enable_TSC(void)
+{
+	write_cr4(read_cr4() & ~X86_CR4_TSD);
+}
+#endif /* CONFIG_SECCOMP */
+
+static noinline void
+__switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
+		 struct tss_struct *tss)
 {
 	struct thread_struct *next;
 
@@ -552,60 +573,47 @@ static noinline void __switch_to_xtra(st
 		set_debugreg(next->debugreg[7], 7);
 	}
 
-	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+#ifdef CONFIG_SECCOMP
+	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
+	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
+		/* prev and next are different */
+		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
+			hard_disable_TSC();
+		else
+			hard_enable_TSC();
+	}
+#endif
+
+	if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP) ||
+	    test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+		if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+			/*
+			 * Disable the bitmap via an invalid offset. We still cache
+			 * the previous bitmap owner and the IO bitmap contents:
+			 */
+			tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
+			return;
+		}
+
+		if (likely(next == tss->io_bitmap_owner)) {
+			/*
+			 * Previous owner of the bitmap (hence the bitmap content)
+			 * matches the next task, we dont have to do anything but
+			 * to set a valid offset in the TSS:
+			 */
+			tss->io_bitmap_base = IO_BITMAP_OFFSET;
+			return;
+		}
 		/*
-		 * Disable the bitmap via an invalid offset. We still cache
-		 * the previous bitmap owner and the IO bitmap contents:
+		 * Lazy TSS's I/O bitmap copy. We set an invalid offset here
+		 * and we let the task to get a GPF in case an I/O instruction
+		 * is performed.  The handler of the GPF will verify that the
+		 * faulting task has a valid I/O bitmap and, it true, does the
+		 * real copy and restart the instruction.  This will save us
+		 * redundant copies when the currently switched task does not
+		 * perform any I/O during its timeslice.
 		 */
-		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
-		return;
-	}
-
-	if (likely(next == tss->io_bitmap_owner)) {
-		/*
-		 * Previous owner of the bitmap (hence the bitmap content)
-		 * matches the next task, we dont have to do anything but
-		 * to set a valid offset in the TSS:
-		 */
-		tss->io_bitmap_base = IO_BITMAP_OFFSET;
-		return;
-	}
-	/*
-	 * Lazy TSS's I/O bitmap copy. We set an invalid offset here
-	 * and we let the task to get a GPF in case an I/O instruction
-	 * is performed.  The handler of the GPF will verify that the
-	 * faulting task has a valid I/O bitmap and, it true, does the
-	 * real copy and restart the instruction.  This will save us
-	 * redundant copies when the currently switched task does not
-	 * perform any I/O during its timeslice.
-	 */
-	tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET_LAZY;
-}
-
-/*
- * This function selects if the context switch from prev to next
- * has to tweak the TSC disable bit in the cr4.
- */
-static inline void disable_tsc(struct task_struct *prev_p,
-			       struct task_struct *next_p)
-{
-	struct thread_info *prev, *next;
-
-	/*
-	 * gcc should eliminate the ->thread_info dereference if
-	 * has_secure_computing returns 0 at compile time (SECCOMP=n).
-	 */
-	prev = task_thread_info(prev_p);
-	next = task_thread_info(next_p);
-
-	if (has_secure_computing(prev) || has_secure_computing(next)) {
-		/* slow path here */
-		if (has_secure_computing(prev) &&
-		    !has_secure_computing(next)) {
-			write_cr4(read_cr4() & ~X86_CR4_TSD);
-		} else if (!has_secure_computing(prev) &&
-			   has_secure_computing(next))
-			write_cr4(read_cr4() | X86_CR4_TSD);
+		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET_LAZY;
 	}
 }
 
@@ -690,11 +698,9 @@ struct task_struct fastcall * __switch_t
 	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
 	 */
-	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW))
-	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP))
-		__switch_to_xtra(next_p, tss);
-
-	disable_tsc(prev_p, next_p);
+	if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
+		     task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
+		__switch_to_xtra(prev_p, next_p, tss);
 
 	return prev_p;
 }
diff -r bcfd682ea605 -r 9be99cbb3259 fs/proc/base.c
--- a/fs/proc/base.c	Thu Jul 13 03:03:35 2006 +0700
+++ b/fs/proc/base.c	Fri Jul 14 07:47:57 2006 +0200
@@ -67,7 +67,6 @@
 #include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/ptrace.h>
-#include <linux/seccomp.h>
 #include <linux/cpuset.h>
 #include <linux/audit.h>
 #include <linux/poll.h>
@@ -98,9 +97,6 @@ enum pid_directory_inos {
 	PROC_TGID_TASK,
 	PROC_TGID_STATUS,
 	PROC_TGID_MEM,
-#ifdef CONFIG_SECCOMP
-	PROC_TGID_SECCOMP,
-#endif
 	PROC_TGID_CWD,
 	PROC_TGID_ROOT,
 	PROC_TGID_EXE,
@@ -141,9 +137,6 @@ enum pid_directory_inos {
 	PROC_TID_INO,
 	PROC_TID_STATUS,
 	PROC_TID_MEM,
-#ifdef CONFIG_SECCOMP
-	PROC_TID_SECCOMP,
-#endif
 	PROC_TID_CWD,
 	PROC_TID_ROOT,
 	PROC_TID_EXE,
@@ -212,9 +205,6 @@ static struct pid_entry tgid_base_stuff[
 	E(PROC_TGID_NUMA_MAPS, "numa_maps", S_IFREG|S_IRUGO),
 #endif
 	E(PROC_TGID_MEM,       "mem",     S_IFREG|S_IRUSR|S_IWUSR),
-#ifdef CONFIG_SECCOMP
-	E(PROC_TGID_SECCOMP,   "seccomp", S_IFREG|S_IRUSR|S_IWUSR),
-#endif
 	E(PROC_TGID_CWD,       "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_ROOT,      "root",    S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_EXE,       "exe",     S_IFLNK|S_IRWXUGO),
@@ -255,9 +245,6 @@ static struct pid_entry tid_base_stuff[]
 	E(PROC_TID_NUMA_MAPS,  "numa_maps",    S_IFREG|S_IRUGO),
 #endif
 	E(PROC_TID_MEM,        "mem",     S_IFREG|S_IRUSR|S_IWUSR),
-#ifdef CONFIG_SECCOMP
-	E(PROC_TID_SECCOMP,    "seccomp", S_IFREG|S_IRUSR|S_IWUSR),
-#endif
 	E(PROC_TID_CWD,        "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_ROOT,       "root",    S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_EXE,        "exe",     S_IFLNK|S_IRWXUGO),
@@ -970,78 +957,6 @@ static struct file_operations proc_login
 	.write		= proc_loginuid_write,
 };
 #endif
-
-#ifdef CONFIG_SECCOMP
-static ssize_t seccomp_read(struct file *file, char __user *buf,
-			    size_t count, loff_t *ppos)
-{
-	struct task_struct *tsk = get_proc_task(file->f_dentry->d_inode);
-	char __buf[20];
-	loff_t __ppos = *ppos;
-	size_t len;
-
-	if (!tsk)
-		return -ESRCH;
-	/* no need to print the trailing zero, so use only len */
-	len = sprintf(__buf, "%u\n", tsk->seccomp.mode);
-	put_task_struct(tsk);
-	if (__ppos >= len)
-		return 0;
-	if (count > len - __ppos)
-		count = len - __ppos;
-	if (copy_to_user(buf, __buf + __ppos, count))
-		return -EFAULT;
-	*ppos = __ppos + count;
-	return count;
-}
-
-static ssize_t seccomp_write(struct file *file, const char __user *buf,
-			     size_t count, loff_t *ppos)
-{
-	struct task_struct *tsk = get_proc_task(file->f_dentry->d_inode);
-	char __buf[20], *end;
-	unsigned int seccomp_mode;
-	ssize_t result;
-
-	result = -ESRCH;
-	if (!tsk)
-		goto out_no_task;
-
-	/* can set it only once to be even more secure */
-	result = -EPERM;
-	if (unlikely(tsk->seccomp.mode))
-		goto out;
-
-	result = -EFAULT;
-	memset(__buf, 0, sizeof(__buf));
-	count = min(count, sizeof(__buf) - 1);
-	if (copy_from_user(__buf, buf, count))
-		goto out;
-
-	seccomp_mode = simple_strtoul(__buf, &end, 0);
-	if (*end == '\n')
-		end++;
-	result = -EINVAL;
-	if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) {
-		tsk->seccomp.mode = seccomp_mode;
-		set_tsk_thread_flag(tsk, TIF_SECCOMP);
-	} else
-		goto out;
-	result = -EIO;
-	if (unlikely(!(end - __buf)))
-		goto out;
-	result = end - __buf;
-out:
-	put_task_struct(tsk);
-out_no_task:
-	return result;
-}
-
-static struct file_operations proc_seccomp_operations = {
-	.read		= seccomp_read,
-	.write		= seccomp_write,
-};
-#endif /* CONFIG_SECCOMP */
 
 static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
@@ -1726,12 +1641,6 @@ static struct dentry *proc_pident_lookup
 		case PROC_TGID_MEM:
 			inode->i_fop = &proc_mem_operations;
 			break;
-#ifdef CONFIG_SECCOMP
-		case PROC_TID_SECCOMP:
-		case PROC_TGID_SECCOMP:
-			inode->i_fop = &proc_seccomp_operations;
-			break;
-#endif /* CONFIG_SECCOMP */
 		case PROC_TID_MOUNTS:
 		case PROC_TGID_MOUNTS:
 			inode->i_fop = &proc_mounts_operations;
diff -r bcfd682ea605 -r 9be99cbb3259 include/asm-i386/processor.h
--- a/include/asm-i386/processor.h	Thu Jul 13 03:03:35 2006 +0700
+++ b/include/asm-i386/processor.h	Fri Jul 14 07:47:57 2006 +0200
@@ -256,6 +256,10 @@ static inline void clear_in_cr4 (unsigne
 	cr4 &= ~mask;
 	write_cr4(cr4);
 }
+
+extern void hard_disable_TSC(void);
+extern void disable_TSC(void);
+extern void hard_enable_TSC(void);
 
 /*
  *      NSC/Cyrix CPU configuration register indexes
diff -r bcfd682ea605 -r 9be99cbb3259 include/asm-i386/thread_info.h
--- a/include/asm-i386/thread_info.h	Thu Jul 13 03:03:35 2006 +0700
+++ b/include/asm-i386/thread_info.h	Fri Jul 14 07:47:57 2006 +0200
@@ -142,6 +142,7 @@ static inline struct thread_info *curren
 #define TIF_MEMDIE		16
 #define TIF_DEBUG		17	/* uses debug registers */
 #define TIF_IO_BITMAP		18	/* uses I/O bitmap */
+#define TIF_NOTSC		19	/* TSC is not accessible in userland */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
@@ -153,6 +154,7 @@ static inline struct thread_info *curren
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1<<TIF_SECCOMP)
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
+#define _TIF_NOTSC		(1<<TIF_NOTSC)
 #define _TIF_DEBUG		(1<<TIF_DEBUG)
 #define _TIF_IO_BITMAP		(1<<TIF_IO_BITMAP)
 
@@ -164,7 +166,8 @@ static inline struct thread_info *curren
 #define _TIF_ALLWORK_MASK	(0x0000FFFF & ~_TIF_SECCOMP)
 
 /* flags to check in __switch_to() */
-#define _TIF_WORK_CTXSW (_TIF_DEBUG|_TIF_IO_BITMAP)
+#define _TIF_WORK_CTXSW_NEXT (_TIF_IO_BITMAP | _TIF_NOTSC | _TIF_DEBUG)
+#define _TIF_WORK_CTXSW_PREV (_TIF_IO_BITMAP | _TIF_NOTSC)
 
 /*
  * Thread-synchronous status.
diff -r bcfd682ea605 -r 9be99cbb3259 include/linux/prctl.h
--- a/include/linux/prctl.h	Thu Jul 13 03:03:35 2006 +0700
+++ b/include/linux/prctl.h	Fri Jul 14 07:47:57 2006 +0200
@@ -59,4 +59,8 @@
 # define PR_ENDIAN_LITTLE	1	/* True little endian mode */
 # define PR_ENDIAN_PPC_LITTLE	2	/* "PowerPC" pseudo little endian */
 
+/* Get/set process seccomp mode */
+#define PR_GET_SECCOMP	21
+#define PR_SET_SECCOMP	22
+
 #endif /* _LINUX_PRCTL_H */
diff -r bcfd682ea605 -r 9be99cbb3259 include/linux/seccomp.h
--- a/include/linux/seccomp.h	Thu Jul 13 03:03:35 2006 +0700
+++ b/include/linux/seccomp.h	Fri Jul 14 07:47:57 2006 +0200
@@ -3,8 +3,6 @@
 
 
 #ifdef CONFIG_SECCOMP
-
-#define NR_SECCOMP_MODES 1
 
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
@@ -18,20 +16,23 @@ static inline void secure_computing(int 
 		__secure_computing(this_syscall);
 }
 
-static inline int has_secure_computing(struct thread_info *ti)
-{
-	return unlikely(test_ti_thread_flag(ti, TIF_SECCOMP));
-}
+extern long prctl_get_seccomp(void);
+extern long prctl_set_seccomp(unsigned long);
 
 #else /* CONFIG_SECCOMP */
 
 typedef struct { } seccomp_t;
 
 #define secure_computing(x) do { } while (0)
-/* static inline to preserve typechecking */
-static inline int has_secure_computing(struct thread_info *ti)
+
+static inline long prctl_get_seccomp(void)
 {
-	return 0;
+	return -EINVAL;
+}
+
+static inline long prctl_set_seccomp(unsigned long arg2)
+{
+	return -EINVAL;
 }
 
 #endif /* CONFIG_SECCOMP */
diff -r bcfd682ea605 -r 9be99cbb3259 kernel/seccomp.c
--- a/kernel/seccomp.c	Thu Jul 13 03:03:35 2006 +0700
+++ b/kernel/seccomp.c	Fri Jul 14 07:47:57 2006 +0200
@@ -1,7 +1,7 @@
 /*
  * linux/kernel/seccomp.c
  *
- * Copyright 2004-2005  Andrea Arcangeli <andrea@cpushare.com>
+ * Copyright 2004-2006  Andrea Arcangeli <andrea@cpushare.com>
  *
  * This defines a simple but solid secure-computing mode.
  */
@@ -10,6 +10,7 @@
 #include <linux/sched.h>
 
 /* #define SECCOMP_DEBUG 1 */
+#define NR_SECCOMP_MODES 1
 
 /*
  * Secure computing mode 1 allows only read/write/exit/sigreturn.
@@ -54,3 +55,31 @@ void __secure_computing(int this_syscall
 #endif
 	do_exit(SIGKILL);
 }
+
+long prctl_get_seccomp(void)
+{
+	return current->seccomp.mode;
+}
+
+long prctl_set_seccomp(unsigned long seccomp_mode)
+{
+	long ret;
+
+	/* can set it only once to be even more secure */
+	ret = -EPERM;
+	if (unlikely(current->seccomp.mode))
+		goto out;
+
+	ret = -EINVAL;
+	if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) {
+		current->seccomp.mode = seccomp_mode;
+		set_thread_flag(TIF_SECCOMP);
+#ifdef TIF_NOTSC
+		disable_TSC();
+#endif
+		ret = 0;
+	}
+
+ out:
+	return ret;
+}
diff -r bcfd682ea605 -r 9be99cbb3259 kernel/sys.c
--- a/kernel/sys.c	Thu Jul 13 03:03:35 2006 +0700
+++ b/kernel/sys.c	Fri Jul 14 07:47:57 2006 +0200
@@ -28,6 +28,7 @@
 #include <linux/tty.h>
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
+#include <linux/seccomp.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
@@ -2056,6 +2057,13 @@ asmlinkage long sys_prctl(int option, un
 			error = SET_ENDIAN(current, arg2);
 			break;
 
+		case PR_GET_SECCOMP:
+			error = prctl_get_seccomp();
+			break;
+		case PR_SET_SECCOMP:
+			error = prctl_set_seccomp(arg2);
+			break;
+
 		default:
 			error = -EINVAL;
 			break;
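
The TIF_NOTSC handling that the patch above adds to __switch_to_xtra() can be modeled in user space as a pure decision function. This is a sketch under stated assumptions: the bit number 19 comes from the patch, while NOTSC_MASK, enum tsd_action and tsd_on_switch() are illustrative names, and no control register is touched.

```c
#include <assert.h>

#define TIF_NOTSC	19			/* bit number from the patch */
#define NOTSC_MASK	(1UL << TIF_NOTSC)	/* illustrative name */

enum tsd_action { TSD_KEEP, TSD_DISABLE_TSC, TSD_ENABLE_TSC };

/*
 * Decide what to do with CR4.TSD when switching from prev to next,
 * given the thread flags of both tasks.
 */
static enum tsd_action tsd_on_switch(unsigned long prev_flags,
				     unsigned long next_flags)
{
	int prev_notsc = !!(prev_flags & NOTSC_MASK);
	int next_notsc = !!(next_flags & NOTSC_MASK);

	if (!(prev_notsc ^ next_notsc))
		return TSD_KEEP;	/* same setting, CR4 stays untouched */
	return next_notsc ? TSD_DISABLE_TSC : TSD_ENABLE_TSC;
}
```

The point of the XOR is that CR4 is written only when the two tasks disagree about TSC access, so a switch between two ordinary tasks (or two NOTSC tasks) does no extra work.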


* Re: [PATCH] TIF_NOTSC and SECCOMP prctl
  2006-07-14  6:09                                   ` [PATCH] TIF_NOTSC and SECCOMP prctl andrea
@ 2006-07-14  6:27                                     ` Andrew Morton
  2006-07-14  6:33                                       ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Andrew Morton @ 2006-07-14  6:27 UTC (permalink / raw)
  To: andrea
  Cc: bruce, alan, arjan, bunk, rlrevell, linux-kernel, alan, torvalds,
	mingo

On Fri, 14 Jul 2006 08:09:32 +0200
andrea@cpushare.com wrote:

> The only thing left worth discussing is why if I set TIF_NOTSC to 10
> instead of 19 the kernel was crashing hard... After I checked and
> rechecked everything else I deduced it had to be that number and after
> changing it to 19 everything works fine... I also verified the first
> rdtsc kills the task with a sigsegv. It would be nice to make sure
> it's not a bug in the below patch that 10 didn't work but just some
> hidden kernel "feature" ;).

Using a bit <= 15 will cause the kernel to take the work_notifysig path
"pending work-to-be-done flags are in LSW".  I'm not sure what happens if
there's such a flag set but nothing is set up to handle it.  I guess it
stays set and processes never get out of the kernel again.

Perhaps TIF_SECCOMP should be >= 16 too - the special-case in
_TIF_ALLWORK_MASK looks odd.
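
To make the low-word point concrete, the following sketch checks which bit positions fall inside the _TIF_ALLWORK_MASK definition shown in the patch context (0x0000FFFF & ~_TIF_SECCOMP). The TIF_SECCOMP bit number used here (8) is an assumption for illustration, not a value taken from this thread, and the macro and function names are made up. Which exit path the kernel then takes is not modeled.

```c
#include <assert.h>

#define TIF_SECCOMP_BIT	8	/* assumed value, for illustration only */
#define SECCOMP_MASK	(1UL << TIF_SECCOMP_BIT)
#define ALLWORK_MASK	(0x0000FFFFUL & ~SECCOMP_MASK)

/* Nonzero if a flag at bit position `bit` falls inside the work mask. */
static int counts_as_work(unsigned int bit)
{
	return ((1UL << bit) & ALLWORK_MASK) != 0;
}
```

With this mask, bit 10 counts as pending work while bit 19 does not, which is consistent with the crash going away once TIF_NOTSC was moved from 10 to 19.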



* Re: [PATCH] TIF_NOTSC and SECCOMP prctl
  2006-07-14  6:27                                     ` Andrew Morton
@ 2006-07-14  6:33                                       ` andrea
  0 siblings, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-14  6:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: bruce, alan, arjan, bunk, rlrevell, linux-kernel, alan, torvalds,
	mingo

On Thu, Jul 13, 2006 at 11:27:27PM -0700, Andrew Morton wrote:
> Using a bit <= 15 will cause the kernel to take the work_notifysig path
> "pending work-to-be-done flags are in LSW".  I'm not sure what happens if
> there's such a flag set but nothing is set up to handle it.  I guess it
> stays set and processes never get out of the kernel again.

Ah ok, thanks.

> Perhaps TIF_SECCOMP should be >= 16 too - the special-case in
> _TIF_ALLWORK_MASK looks odd.

It's checked with testw so it must be < 16.
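
The testw constraint can be illustrated in plain C: a 16-bit test only ever sees the low word of the flags, so a flag moved to bit 16 or above would be invisible to it. This is a sketch with a made-up helper name; the real check is an assembly testw instruction, which this only approximates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximates `testw $mask, flags`: both operands are truncated to
 * 16 bits before the AND, so bits 16..31 can never match.
 */
static int testw_matches(uint32_t flags, uint32_t mask)
{
	return ((uint16_t)flags & (uint16_t)mask) != 0;
}
```

A flag at bit 8 or bit 15 is seen by such a test; the same flag at bit 16 or bit 19 is not, even when it is set.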


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-13  1:51                           ` Andrew Morton
  2006-07-13  2:00                             ` Linus Torvalds
  2006-07-13  7:44                             ` James Bruce
@ 2006-07-13 12:13                             ` Andi Kleen
  2 siblings, 0 replies; 73+ messages in thread
From: Andi Kleen @ 2006-07-13 12:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, alan, arjan, linux-kernel, alan, torvalds

On Thursday 13 July 2006 03:51, Andrew Morton wrote:
> On Wed, 12 Jul 2006 23:07:32 +0200
>
> Ingo Molnar <mingo@elte.hu> wrote:
> > Despite good reasons to apply the patch, it has not been applied yet,
> > with no explanation.
>
> I queued the below.  Andrea claims that it'll reduce seccomp overhead to
> literally zero.

Chuck's patch - possibly with Linus' rename - is better.

-Andi



* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 15:43                       ` Andi Kleen
  2006-07-12 21:07                         ` Ingo Molnar
@ 2006-07-12 21:22                         ` Ingo Molnar
  2006-07-12 22:11                           ` Andi Kleen
  1 sibling, 1 reply; 73+ messages in thread
From: Ingo Molnar @ 2006-07-12 21:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Arjan van de Ven, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

* Andi Kleen <ak@suse.de> wrote:

> I liked the idea. While this can be done with LSM (e.g. apparmor) too 
> seccomp is definitely much easier and simpler and more "obviously 
> safe" than anything LSM based.

LSM is probably too heavy for this - but utrace (posted by Roland 
McGrath a few weeks ago) is a lot more focused on modularizing ptrace 
features. utrace also solves a whole host of other issues that we have 
with ptrace!

for example the first sample utrace module that Roland posted was a 
'stop the task if it becomes undebugged, instead of letting the task run 
away'. That solves precisely the ptrace property that Andrea complained 
about most.

i think Andrea didn't even try to fix/generalize ptrace perhaps because 
that would make his 'security feature' too banal? It would also become 
unpatentable? Even though this decision hurts the 'reach' of his project 
fundamentally: ptrace support is everywhere, and users could very much 
and consciously decide to run 'compatible ptrace' or 'more secure 
ptrace' [provided by newer kernels].

Andrea's "ptrace is insecure" argument is just plain FUD: there's 
nothing inherently insecure about the _client side_ of the ptrace APIs 
or the client side of ptrace implementation. So my suggestion is to get 
utrace in, to implement an utrace module that implements untrusted code 
execution and then let's get rid of seccomp.

	Ingo


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-12 21:22                         ` Ingo Molnar
@ 2006-07-12 22:11                           ` Andi Kleen
  0 siblings, 0 replies; 73+ messages in thread
From: Andi Kleen @ 2006-07-12 22:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Cox, Arjan van de Ven, Adrian Bunk, Andrew Morton,
	Lee Revell, linux-kernel, Alan Cox, Linus Torvalds

On Wednesday 12 July 2006 23:22, Ingo Molnar wrote:

> 
> i think Andrea didn't even try to fix/generalize ptrace perhaps because 
> that would make his 'security feature' too banal? 

seccomp in its current state is already "banal". I think that was the
whole point of it. If he had wanted to do something complicated I'm sure
LSM would have offered lots of opportunity to go wild @). But seccomp
is really simple and easy to analyze. I bet if he could have made
it simpler he would have done that too.

That said, the problem I see with using ptrace for this is that it
just adds too many context switches for each syscall and would likely be
too slow. Hmm, actually there might not be that many syscalls for these
applications (just some reads and writes), so it might work or not. But it
would certainly be slower than it is right now. Would probably need some
testing.

If utrace allows the filtering to be done in kernel space, it would
probably be a useful replacement. I don't remember enough of the code
to know if it can do this or not. But I suppose it would still
need a kernel module or kernel patch of some sort to implement this
specific filtering.

> there's 
> nothing inherently insecure about the _client side_ of the ptrace APIs 
> or the client side of ptrace implementation. 

Agreed. 

> So my suggestion is to get  
> utrace in, to implement an utrace module that implements untrusted code 
> execution and then let's get rid of seccomp.

Sounds fine to me in theory (without having looked at any code) 

-Andi


* Re: [patch] let CONFIG_SECCOMP default to n
  2006-07-11 14:17               ` andrea
  2006-07-11 14:32                 ` Arjan van de Ven
@ 2006-07-11 15:54                 ` Pavel Machek
  1 sibling, 0 replies; 73+ messages in thread
From: Pavel Machek @ 2006-07-11 15:54 UTC (permalink / raw)
  To: andrea
  Cc: Ingo Molnar, Adrian Bunk, Andrew Morton, Lee Revell, linux-kernel,
	Alan Cox, Linus Torvalds

Hi!

> > > > and both are pledged and available to GPL users. [..]
> > > 
> > > If the GPL offered any protection to my system software I would 
> > > consider it too, but the GPL can't protect software that runs behind 
> > > the corporate firewall. [...]
> > 
> > so you admit and confirm that you explicitly and intentionally do not 
> > pledge your patent to GPL users. [..]
> 
> How many times do I need to say it. The pending patent has nothing to do
> with the kernel, and it is _not_ pledged under the GPL.
...
> Talking about patents when submitting seccomp, would be like talking
> about mp3 patents when submitting alsa code or talking about google

Well, if mp3 was only known user of alsa, yes that would be relevant
discussion... and that seems to be the case here.
							Pavel
-- 
Thanks for all the (sleeping) penguins.


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  1:40     ` Adrian Bunk
  2006-06-30  4:52       ` Andrea Arcangeli
@ 2006-06-30 12:39       ` Alan Cox
  1 sibling, 0 replies; 73+ messages in thread
From: Alan Cox @ 2006-06-30 12:39 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, Lee Revell, linux-kernel, mingo, Alan Cox,
	Linus Torvalds, Andrea Arcangeli

On Fri, 2006-06-30 at 03:40 +0200, Adrian Bunk wrote:
> > I am stolidly letting the arch maintainers and the developer of this
> > feature work out what to do.
> 
> Andrea is proud of getting a patent for the server part [1], so I doubt 
> he would be happy with no longer having the client part defaulting to Y...

If Andrea has clear personal business interests in that decision then
perhaps you could make the case he shouldn't make the decision as to
whether it should be Y or not, or that someone should review it. No big
deal. There are lots of uses for that code. None of them appear
interesting 8)

I don't think it's actually important because distributions make their
own decisions about such questions and most of the running kernels are
distribution ones. 

As to patented code for the kernel. That itself is a non-issue providing
the patent owner or someone with permission from them submitted the
code. The law recognizes that you cannot go around making promises
(estoppel) and then trying to sue people for acting on them. The GPL
likewise makes this clear.

> It might sound a bit strange that although Alan Cox and Linus Torvalds 
> even wrote an open letter to the President of the European Parliament
> calling "Software patents are also the utmost threat to the development 
> of Linux and other free software products" [2]...

The Red Hat position on patents is on the web site, along with the
permissions for GPL use. It makes clear the view we have on patents for
software.

Alan


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  1:07   ` Andrew Morton
  2006-06-30  1:40     ` Adrian Bunk
@ 2006-06-30  2:35     ` Randy.Dunlap
  2006-06-30 15:03       ` Lee Revell
  1 sibling, 1 reply; 73+ messages in thread
From: Randy.Dunlap @ 2006-06-30  2:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rlrevell, bunk, linux-kernel, mingo

On Thu, 29 Jun 2006 18:07:06 -0700 Andrew Morton wrote:

> Lee Revell <rlrevell@joe-job.com> wrote:
> >
> > On Thu, 2006-06-29 at 21:21 +0200, Adrian Bunk wrote:
> > > This patch was already sent on:
> > > - 26 Jun 2006
> > > - 27 Apr 2006
> > > - 19 Apr 2006
> > > - 11 Apr 2006
> > > - 10 Mar 2006
> > > - 29 Jan 2006
> > > - 21 Jan 2006 
> > 
> > 3 days ago?  That seems a bit silly.  Why didn't you just ping Andrew on
> > it?
> > 
> > Andrew, what's the status of this?  Can we get an ACK or a NACK before
> > this starts getting reposted every day? ;-)
> > 
> 
> I am stolidly letting the arch maintainers and the developer of this
> feature work out what to do.

Bah, options that are not Required should default to n.
I support Adrian's patch.

---
~Randy


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30  2:35     ` Randy.Dunlap
@ 2006-06-30 15:03       ` Lee Revell
  2006-07-08  9:23         ` Andrea Arcangeli
  0 siblings, 1 reply; 73+ messages in thread
From: Lee Revell @ 2006-06-30 15:03 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Andrew Morton, bunk, linux-kernel, mingo

On Thu, 2006-06-29 at 19:35 -0700, Randy.Dunlap wrote:
> On Thu, 29 Jun 2006 18:07:06 -0700 Andrew Morton wrote:
> 
> > Lee Revell <rlrevell@joe-job.com> wrote:
> > >
> > > On Thu, 2006-06-29 at 21:21 +0200, Adrian Bunk wrote:
> > > > This patch was already sent on:
> > > > - 26 Jun 2006
> > > > - 27 Apr 2006
> > > > - 19 Apr 2006
> > > > - 11 Apr 2006
> > > > - 10 Mar 2006
> > > > - 29 Jan 2006
> > > > - 21 Jan 2006 
> > > 
> > > 3 days ago?  That seems a bit silly.  Why didn't you just ping Andrew on
> > > it?
> > > 
> > > Andrew, what's the status of this?  Can we get an ACK or a NACK before
> > > this starts getting reposted every day? ;-)
> > > 
> > 
> > I am stolidly letting the arch maintainers and the developer of this
> > feature work out what to do.
> 
> Bah, options that are not Required should default to n.
> I support Adrian's patch.

Agreed:

- Most people don't use it
- There's a performance hit

Clearly should default to N.

Lee



* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-06-30 15:03       ` Lee Revell
@ 2006-07-08  9:23         ` Andrea Arcangeli
  2006-07-11  1:59           ` Andrew James Wade
  0 siblings, 1 reply; 73+ messages in thread
From: Andrea Arcangeli @ 2006-07-08  9:23 UTC (permalink / raw)
  To: Lee Revell; +Cc: Randy.Dunlap, Andrew Morton, bunk, linux-kernel, mingo

On Fri, Jun 30, 2006 at 11:03:00AM -0400, Lee Revell wrote:
> On Thu, 2006-06-29 at 19:35 -0700, Randy.Dunlap wrote:
> > On Thu, 29 Jun 2006 18:07:06 -0700 Andrew Morton wrote:
> > 
> > > Lee Revell <rlrevell@joe-job.com> wrote:
> > > >
> > > > On Thu, 2006-06-29 at 21:21 +0200, Adrian Bunk wrote:
> > > > > This patch was already sent on:
> > > > > - 26 Jun 2006
> > > > > - 27 Apr 2006
> > > > > - 19 Apr 2006
> > > > > - 11 Apr 2006
> > > > > - 10 Mar 2006
> > > > > - 29 Jan 2006
> > > > > - 21 Jan 2006 
> > > > 
> > > > 3 days ago?  That seems a bit silly.  Why didn't you just ping Andrew on
> > > > it?
> > > > 
> > > > Andrew, what's the status of this?  Can we get an ACK or a NACK before
> > > > this starts getting reposted every day? ;-)
> > > > 
> > > 
> > > I am stolidly letting the arch maintainers and the developer of this
> > > feature work out what to do.
> > 
> > Bah, options that are not Required should default to n.
> > I support Adrian's patch.
> 
> Agreed:
> 
> - Most people don't use it
> - There's a performance hit

On x86-64 SECCOMP generates absolutely zero performance hit.

The original seccomp patch for x86 also generated absolutely zero
performance hit, both practically and theoretically too. _zero_ CPU
cycles of difference, zero cachelines.

What generates a minuscule overhead is a feature I added later on top of
SECCOMP, that disables TSC for SECCOMP tasks. I thought such minuscule
overhead wouldn't be measurable compared to all other heavyweight work
we do in the scheduler (note that unless you sell cpu through CPUShare
actively this overhead consists in two cacheline touches per context
switch), but anyway I agree it's a good idea to make it optional so
there will be absolutely no reason left to leave seccomp disabled by
default anymore.

Andi thinks the feature is absolutely unnecessary, he's certainly right,
and it has been there only for paranoid reasons.

	http://www.cpushare.com/blog/CPUShare/article/26/

> Clearly should default to N.

I think the best is to add a CONFIG_SECCOMP_DISABLE_TSC obviously
defaulted to N, so seccomp returns absolutely zerocost and everybody
will be happy (me included for sure, since I agree with Andi, except I
can't be sure of it and that's the only reason why I developed the tsc
disable feature).

I strongly agree with leaving CONFIG_SECCOMP_DISABLE_TSC set to N by
default.

-------------

Make the TSC disable purely paranoid feature optional, so by default seccomp
returns absolutely zerocost.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>

diff -r 67137165b47d arch/i386/Kconfig
--- a/arch/i386/Kconfig	Thu Jul 06 19:45:01 2006 +0200
+++ b/arch/i386/Kconfig	Sat Jul 08 11:06:49 2006 +0200
@@ -734,6 +734,18 @@ config SECCOMP
 	  defined by each seccomp mode.
 
 	  If unsure, say Y. Only embedded should say N here.
+
+config SECCOMP_DISABLE_TSC
+	bool "Disable the TSC for seccomp tasks"
+	depends on SECCOMP
+	default n
+	help
+	  This feature mathematically prevents covert channels
+	  for tasks running under SECCOMP. This can generate
+	  a minuscule overhead in the scheduler.
+
+	  If you care most about performance say N. Say Y only if you're
+	  paranoid about covert channels.
 
 source kernel/Kconfig.hz
 
diff -r 67137165b47d arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c	Thu Jul 06 19:45:01 2006 +0200
+++ b/arch/i386/kernel/process.c	Sat Jul 08 11:05:35 2006 +0200
@@ -572,6 +572,7 @@ static inline void disable_tsc(struct ta
 static inline void disable_tsc(struct task_struct *prev_p,
 			       struct task_struct *next_p)
 {
+#ifdef CONFIG_SECCOMP_DISABLE_TSC
 	struct thread_info *prev, *next;
 
 	/*
@@ -590,6 +591,7 @@ static inline void disable_tsc(struct ta
 			   has_secure_computing(next))
 			write_cr4(read_cr4() | X86_CR4_TSD);
 	}
+#endif
 }
 
 /*
diff -r 67137165b47d arch/x86_64/Kconfig
--- a/arch/x86_64/Kconfig	Thu Jul 06 19:45:01 2006 +0200
+++ b/arch/x86_64/Kconfig	Sat Jul 08 11:06:40 2006 +0200
@@ -522,6 +522,18 @@ config SECCOMP
 
 	  If unsure, say Y. Only embedded should say N here.
 
+config SECCOMP_DISABLE_TSC
+	bool "Disable the TSC for seccomp tasks"
+	depends on SECCOMP
+	default n
+	help
+	  This feature mathematically prevents covert channels
+	  for tasks running under SECCOMP. This can generate
+	  a minuscule overhead in the scheduler.
+
+	  If you care most about performance say N. Say Y only if you're
+	  paranoid about covert channels.
+
 source kernel/Kconfig.hz
 
 config REORDER
diff -r 67137165b47d arch/x86_64/kernel/process.c
--- a/arch/x86_64/kernel/process.c	Thu Jul 06 19:45:01 2006 +0200
+++ b/arch/x86_64/kernel/process.c	Sat Jul 08 11:05:26 2006 +0200
@@ -494,6 +494,35 @@ out:
 }
 
 /*
+ * This function selects if the context switch from prev to next
+ * has to tweak the TSC disable bit in the cr4.
+ */
+static inline void disable_tsc(struct task_struct *prev_p,
+			       struct task_struct *next_p)
+{
+#ifdef CONFIG_SECCOMP_DISABLE_TSC
+	struct thread_info *prev, *next;
+
+	/*
+	 * gcc should eliminate the ->thread_info dereference if
+	 * has_secure_computing returns 0 at compile time (SECCOMP=n).
+	 */
+	prev = prev_p->thread_info;
+	next = next_p->thread_info;
+
+	if (has_secure_computing(prev) || has_secure_computing(next)) {
+		/* slow path here */
+		if (has_secure_computing(prev) &&
+		    !has_secure_computing(next)) {
+			write_cr4(read_cr4() & ~X86_CR4_TSD);
+		} else if (!has_secure_computing(prev) &&
+			   has_secure_computing(next))
+			write_cr4((read_cr4() | X86_CR4_TSD) & ~X86_CR4_PCE);
+	}
+#endif
+}
+
+/*
  * This special macro can be used to load a debugging register
  */
 #define loaddebug(thread,r) set_debugreg(thread->debugreg ## r, r)
@@ -617,6 +646,8 @@ __switch_to(struct task_struct *prev_p, 
 			memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
 		}
 	}
+
+	disable_tsc(prev_p, next_p);
 
 	return prev_p;
 }
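
The CR4 update rule in the x86_64 hunk above can be modelled in user space
as a pure function. This is only a sketch for reasoning about the
transitions, not kernel code: cr4_after_switch() is a hypothetical name,
and the two booleans stand in for has_secure_computing() on the outgoing
and incoming task. The bit positions are the architectural ones (TSD is
bit 2, PCE is bit 8).

```c
#include <stdbool.h>

#define X86_CR4_TSD 0x004UL /* bit 2: RDTSC faults outside ring 0 */
#define X86_CR4_PCE 0x100UL /* bit 8: RDPMC allowed outside ring 0 */

/*
 * Hypothetical user-space model of the x86_64 disable_tsc() above:
 * returns the CR4 value the CPU should hold after the context switch.
 */
unsigned long cr4_after_switch(unsigned long cr4,
			       bool prev_seccomp, bool next_seccomp)
{
	/* fast path: both or neither task is under seccomp, nothing to do */
	if (prev_seccomp == next_seccomp)
		return cr4;

	/* entering a seccomp task: forbid RDTSC and RDPMC in user space */
	if (next_seccomp)
		return (cr4 | X86_CR4_TSD) & ~X86_CR4_PCE;

	/* leaving a seccomp task: allow RDTSC again (PCE is not restored) */
	return cr4 & ~X86_CR4_TSD;
}
```

Note that, as in the hunk, the model only touches CR4 when exactly one of
the two tasks is under seccomp, which is why the common case costs two
flag reads and no control-register write.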



* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-08  9:23         ` Andrea Arcangeli
@ 2006-07-11  1:59           ` Andrew James Wade
  2006-07-11  4:16             ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Andrew James Wade @ 2006-07-11  1:59 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Lee Revell, Randy.Dunlap, Andrew Morton, bunk, linux-kernel,
	mingo

On Saturday 08 July 2006 05:23, Andrea Arcangeli wrote:
..
> (note that unless you sell cpu through CPUShare
> actively this overhead consists in two cacheline touches per context
> switch),

It's probably not worth the complication, but I suppose that could be
reduced to one cacheline by lazily enabling the TSC access.
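
One way to read the lazy scheme is sketched below as a user-space model.
It is an assumption about what "lazily enabling" would look like, not
code from any patch: the context switch only looks at the incoming task
and sets TSD for seccomp tasks, and TSD is cleared again only when a
non-seccomp task actually traps on RDTSC. The struct and function names
are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for the sketch. */
struct cpu_model {
	bool tsd;        /* models the CR4.TSD bit */
	unsigned faults; /* number of RDTSC traps taken so far */
};

/* Context switch: only the incoming task's seccomp flag is inspected. */
void lazy_switch_to(struct cpu_model *cpu, bool next_seccomp)
{
	if (next_seccomp)
		cpu->tsd = true;
	/* otherwise leave TSD alone; a later RDTSC may trap */
}

/*
 * Models a user-space RDTSC. Returns true if the instruction completes,
 * false if the task would be killed (seccomp task touching the TSC).
 */
bool lazy_rdtsc(struct cpu_model *cpu, bool current_seccomp)
{
	if (!cpu->tsd)
		return true;     /* TSC enabled, no trap */

	cpu->faults++;           /* trap into the kernel */
	if (current_seccomp)
		return false;    /* seccomp task: not allowed */

	cpu->tsd = false;        /* lazily re-enable for normal tasks */
	return true;
}
```

The trade-off is visible in the faults counter: the switch path gets
cheaper, but the first RDTSC after a seccomp task has run on that CPU
pays for a full exception.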

...
> +	  This feature mathematically prevents covert channels
> +	  for tasks running under SECCOMP.

I disagree with this wording. First, for most users the worry isn't so
much covert channels, as it is side channels. In other words, the
worry is not so much that data is sent to the SECCOMP process
secretly, as that the data could be sensitive. Second, the feature
closes only one type of side-channel; others may still exist. It's
quite possible for cpu bugs or undefined behaviour to reveal internal
cpu state (possibly affected by another process) without otherwise
being security risks. (In my uninformed opinion). I wouldn't worry
about such side channels myself, but they do likely exist.

Suggested wording as a patch against 2.6.18-rc1-mm1:
------

Change help text for SECCOMP_DISABLE_TSC to warn about
side channels (the larger concern) instead of covert channels.

Signed-off-by: Andrew Wade <andrew.j.wade@gmail.com>
---

diff -rupN a/arch/i386/Kconfig b/arch/i386/Kconfig
--- a/arch/i386/Kconfig	2006-07-10 21:00:37.000000000 -0400
+++ b/arch/i386/Kconfig	2006-07-10 21:37:12.000000000 -0400
@@ -748,12 +748,12 @@ config SECCOMP_DISABLE_TSC
 	depends on SECCOMP
 	default n
 	help
-	  This feature mathematically prevents covert channels
-	  for tasks running under SECCOMP. This can generate
-	  a minuscule overhead in the scheduler.
+	  This feature closes potential side channels for tasks
+	  running under SECCOMP. Enabling this can generate a
+	  minuscule overhead in the scheduler.
 
 	  If you care most about performance say N. Say Y only if you're
-	  paranoid about covert channels.
+	  paranoid about security.
 
 config VGA_NOPROBE
        bool "Don't probe VGA at boot" if EMBEDDED
diff -rupN a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
--- a/arch/x86_64/Kconfig	2006-07-10 21:00:40.000000000 -0400
+++ b/arch/x86_64/Kconfig	2006-07-10 21:44:59.000000000 -0400
@@ -537,12 +537,12 @@ config SECCOMP_DISABLE_TSC
 	depends on SECCOMP
 	default n
 	help
-	  This feature mathematically prevents covert channels
-	  for tasks running under SECCOMP. This can generate
-	  a minuscule overhead in the scheduler.
+	  This feature closes potential side channels for tasks
+	  running under SECCOMP. Enabling this can generate a
+	  minuscule overhead in the scheduler.
 
 	  If you care most about performance say N. Say Y only if you're
-	  paranoid about covert channels.
+	  paranoid about security.
 
 source kernel/Kconfig.hz
 


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-11  1:59           ` Andrew James Wade
@ 2006-07-11  4:16             ` andrea
  2006-07-11 20:19               ` Andrew James Wade
  0 siblings, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-11  4:16 UTC (permalink / raw)
  To: ajwade; +Cc: Lee Revell, Randy.Dunlap, Andrew Morton, bunk, linux-kernel,
	mingo

Hello,

On Mon, Jul 10, 2006 at 09:59:09PM -0400, Andrew James Wade wrote:
> It's probably not worth the complication, but I suppose that could be
> reduced to one cacheline by lazily enabling the TSC access.

Yep, OTOH lazily enabling means generating exception faults and kernel
entries/exits that take orders of magnitude more time and stall the
pipeline, unlike the two cacheline touches. So that may be ok on x86
(well, modulo a few apps that I know do a flood of rdtsc, and in fact
you have to disable them on NUMA), but x86-64 may be using it more
frequently through vgettimeofday in some UP configurations, making the
optimization dubious.

I already tried to reduce the number of cacheline touches to zero,
without risking exception faults, by relocating the seccomp_t in the
task_t, but I removed that part from the last patch just to make it 100%
straightforward.

> I disagree with this wording. First, for most users the worry isn't so
> much covert channels, as it is side channels. In other words, the
> worry is not so much that data is sent to the SECCOMP process
> secretly, as that the data could be sensitive. Second, the feature

Well, this comes as news to me, because by covert channel I always meant
your "side channel", and only in the timing context. Perhaps it was just
me misunderstanding ;).  But we obviously agree: I meant your side
channel, if you're the one who is right about the wording.

> closes only one type of side-channel; others may still exist. It's
> quite possible for cpu bugs or undefined behaviour to reveal internal

Well, math guarantees and unpredictable hardware issues don't go well
together. If there are cpu bugs, the covert channel is the last thing
anybody (not just the seccomp users) should care about.

> cpu state (possibly affected by another process) without otherwise
> being security risks. (In my uninformed opinion). I wouldn't worry

Any bug that affects seccomp security is always a security bug for
everyone else too (in terms of multiuser security of course).

I'll also give you the perfect example of a side channel not related to
timing attacks if that's what you meant (I thought covert channels were
only about timing attacks): see the mmx example that I can quote here
taken out of a webpage of my site (pathname /technical):

	The most severe attack possible I'm aware of is the mmx
	incorrect initialization caused by the MMX capable cpus not being
	backwards compatible with previous Pentium cpus. No computer could have
	been permanently compromised by such an attack and a simple kernel
	upgrade would have fixed the problem.

That was affecting all multiuser systems, and it wasn't really a cpu
bug (though it's hard to call it a "feature" ;), it was just the newer
cpus not being "security" backwards compatible.

f00f was a real cpu bug instead, and it led to a DoS.

The opposite isn't true: a security bug for everyone is practically
never a bug for seccomp. Historically, backtesting seccomp, the only
exception I'm aware of has been the above mmx data leak; the f00f and
some fdiv bug in the kernel (not a cpu bug) were also DoSable. But we
never even got close to exploitability as far as I can remember, and
that's after all the most important thing.

Running linux seccomp under the vista virtualization will be even more
secure than running it on top of the bare hardware, because if there are
_kernel_ bugs that make seccomp exploitable, things will still be secure
and no reinstall will be necessary, since the iso will be mounted
readonly and there will be no access to any filesystem or harddisk (it
all runs in ramfs).

> about such side channels myself, but they do likely exist.

Nothing can be guaranteed perfect, and if anything I'm more worried
about kernel bugs than about cpu bugs. It's all a matter of probability:
if it's more likely that you're hit by an asteroid than that your
CPU has a bug that allows an attacker to execute code outside seccomp, I
think you should be fine ;). Though it's probably more likely that the
CPU has a bug than that we are hit by an asteroid (or at least I hope so).

From another POV, if any seccomp related usage really can save energy, I
suppose that's less risky than producing the saved energy with fission
(we could argue about oil).

> Suggested wording as a patch against 2.6.18-rc1-mm1:
> ------
> 
> Change help text for SECCOMP_DISABLE_TSC to warn about
> side channels (the larger concern) instead of covert channels.
> 
> Signed-off-by: Andrew Wade <andrew.j.wade@gmail.com>
Acked-by: Andrea Arcangeli <andrea@cpushare.com>


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-11  4:16             ` andrea
@ 2006-07-11 20:19               ` Andrew James Wade
  2006-07-12 21:05                 ` andrea
  2006-07-12 21:13                 ` Ingo Molnar
  0 siblings, 2 replies; 73+ messages in thread
From: Andrew James Wade @ 2006-07-11 20:19 UTC (permalink / raw)
  To: andrea; +Cc: Lee Revell, Randy.Dunlap, Andrew Morton, bunk, linux-kernel,
	mingo

Hello,

On Tuesday 11 July 2006 00:16, andrea@cpushare.com wrote:
> Hello,
> 
> On Mon, Jul 10, 2006 at 09:59:09PM -0400, Andrew James Wade wrote:
> > It's probably not worth the complication, but I suppose that could be
> > reduced to one cacheline by lazily enabling the TSC access.
> 
> Yep, OTOH lazily enabling means generating exception faults and kernel
> entries/exits that take orders of magnitude more time and stall the
> pipeline, unlike the two cacheline touches.

But only if SECCOMP code runs, otherwise it's never needed. OTOH, if it
can't reduce the number of cacheline touches over a tuned seccomp common
case, there's no benefit either.

> > I disagree with this wording. First, for most users the worry isn't so
> > much covert channels, as it is side channels. In other words, the
> > worry is not so much that data is sent to the SECCOMP process
> > secretly, as that the data could be sensitive. Second, the feature
> 
> Well, this comes as news to me, because by covert channel I always meant
> your "side channel", and only in the timing context. Perhaps it was just
> me misunderstanding ;).  But we obviously agree: I meant your side
> channel, if you're the one who is right about the wording.

I'm not an expert, but I believe I'm using the terminology correctly.

> > closes only one type of side-channel; others may still exist. It's
> > quite possible for cpu bugs or undefined behaviour to reveal internal
> 
> Well math guarantees and unpredictable hardware issues don't go well
> together.

Yes. By necessity any proofs about software must make assumptions that
aren't quite valid. Such proofs can still be useful of course. I think
we're in agreement.

> If there are cpu bugs, the covert channel is the last thing anybody
> (not just the seccomp users) should care about.
> 
> > cpu state (possibly affected by another process) without otherwise
> > being security risks. (In my uninformed opinion). I wouldn't worry
> 
> Any bug that affects seccomp security is always a security bug for
> everyone else too (in terms of multiuser security of course).

Yes. But it is possible for the only exploitable side-effect of a bug
to be the opening of a side-channel. The incorrect fp initialization
being almost a case in point; all programs outside the jail would
likely ignore initial values in mmx registers, but they would still be
vulnerable to their floating point state being read. That's probably
not useful information to an attacker. Many other side channels are
likely similar, in that the information revealed is not actually
useful to the attacker. f00f also has very limited security
implications, as I understand it.

The various software and hardware caches will open a plethora of
timing side channels, almost all useless to an attacker in that the
revealed information is uninteresting/useless. At least I would hope
so. The downside of security is that it is hard to be sure.

> The opposite isn't true: a security bug for everyone is practically
> never a bug for seccomp.

Ah, fail safe. Nice property to have. From my observation, userspace
does appear to have something of that property with regards to cpu
bugs as well. What I recall from the errata sheets is that many of the
bugs could only be triggered from privileged code.

...

> It's all a matter of probability:
> if it's more likely that you're hit by an asteroid than that your
> CPU has a bug that allows an attacker to execute code outside seccomp, I
> think you should be fine ;).

And that's where fail-safe and simple design comes in. In this
application an oops is better than a jail-break by orders of
magnitude. But then that's why you wrote seccomp instead of using
ptrace in the first place.

Andrew Wade


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-11 20:19               ` Andrew James Wade
@ 2006-07-12 21:05                 ` andrea
  2006-07-12 22:02                   ` Alan Cox
  2006-07-12 21:13                 ` Ingo Molnar
  1 sibling, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-12 21:05 UTC (permalink / raw)
  To: ajwade; +Cc: Lee Revell, Randy.Dunlap, Andrew Morton, bunk, linux-kernel,
	mingo

Hello,

On Tue, Jul 11, 2006 at 04:19:35PM -0400, Andrew James Wade wrote:
> But only if SECCOMP code runs, otherwise it's never needed. OTOH, if it
> can't reduce the number of cacheline touches over a tuned seccomp common
> case, there's no benefit either.

Yep.

> I'm not an expert, but I believe I'm using the terminology correctly.

Nobody else answered so you must be right then ;).

> Yes. By necessity any proofs about software must make assumptions that
> aren't quite valid. Such proofs can still be useful of course. I think
> we're in agreement.

We are.

> useful to the attacker. f00f also has very limited security
> implications, as I understand it.

f00f has very limited implications and it would be immediately stopped
from creating further damage (I could even autodetect it with some
clever heuristic). In fact, on architectures with the NX bit the stack
can be marked not executable too, so I could then make self-modifying
code impossible, and in turn I would be able to filter out bytecode
reliably on the server. So even if a CPU has a bug that cannot be
worked around in the kernel (like with the idt marked readonly), I could
prevent such a bug from being exploited thanks to the NX bit. The only
self-modifying code currently allowed is on the user stack and nothing
else. I didn't bother to enable the NX bit yet because most i686 CPUs
lack it and there are not (yet) security related bugs that require
bytecode filtering as a workaround (and if they appear, it means such
a cpu will be insecure for multiuser systems on linux without any hope
of software workarounds; only my special seccomp usage could prevent
such a CPU bug from triggering, by filtering the bytecode).

> The various software and hardware caches will open a plethora of
> timing side channels, almost all useless to an attacker in that the
> revealed information is uninteresting/useless. At least I would hope
> so. The downside of security is that it is hard to be sure.

The whole point of the tsc disable was exactly to be sure there are no
timing side channels. If they can't access an accurate source of time
information the very bytecode that attempts to measure the time will
simply get killed instantly.

Measuring time through the network is currently impractical; the rtt is
too large for that (though perhaps 10 years from now we'll have to
rethink this).

> Ah, fail safe. Nice property to have. From my observation, userspace
> does appear to have something of that property with regards to cpu
> bugs as well. What I recall from the errata sheets is that many of the
> bugs could only be triggered from privileged code.

Right.

> And that's where fail-safe and simple design comes in. In this
> application an oops is better than a jail-break by orders of

An oops or more generically a system crash.

> magnitude. But then that's why you wrote seccomp instead of using
> ptrace in the first place.

Exactly ;).


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-12 21:05                 ` andrea
@ 2006-07-12 22:02                   ` Alan Cox
  2006-07-12 23:44                     ` andrea
  2006-07-13  2:56                     ` Andrew James Wade
  0 siblings, 2 replies; 73+ messages in thread
From: Alan Cox @ 2006-07-12 22:02 UTC (permalink / raw)
  To: andrea
  Cc: ajwade, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel, mingo

Ar Mer, 2006-07-12 am 23:05 +0200, ysgrifennodd andrea@cpushare.com:
> Measuring time through the network is currently impractical; the rtt is
> too large for that (though perhaps 10 years from now we'll have to
> rethink this).

Actually measuring time through the network is extremely doable given
enough samples as is communication through delay perturbation. A good
viterbi encoder/decoder will fish a signal out of very high noise. Yes
you pay a lot in data rate at that point but it works.

Anyway at the point you pass the bytecode through a processing filter
you don't need SECCOMP because your filter can remove any syscall
attempts. 

Alan


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-12 22:02                   ` Alan Cox
@ 2006-07-12 23:44                     ` andrea
  2006-07-13 21:29                       ` Pavel Machek
  2006-07-13  2:56                     ` Andrew James Wade
  1 sibling, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-12 23:44 UTC (permalink / raw)
  To: Alan Cox
  Cc: ajwade, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel, mingo

On Wed, Jul 12, 2006 at 11:02:56PM +0100, Alan Cox wrote:
> Actually measuring time through the network is extremely doable given
> enough samples as is communication through delay perturbation. A good
> viterbi encoder/decoder will fish a signal out of very high noise. Yes
> you pay a lot in data rate at that point but it works.

Currently the bandwidth is free, I'll charge for the transaction
associated bandwidth only if I'm forced to (which would happen quickly
if people start doing the above ;).

The way the current transactions are running as we speak, is not like
in a full peer to peer system. It's half peer to peer, a trusted node
always sits in between buyer and seller. I need this for a multitude
of reasons (I could offload the middle node in a p3p system that is
reliable as long as only 1 of the 3 is malicious but it's certainly
more secure if the node in the middle is fully trusted so I'll try to
avoid that). So if you are right, my trusted node will simply add
/dev/urandom delay as needed before forwarding any packet, to prevent
any meaningful measurement. Any network side channel can be closed with
a few-line patch, and very quickly.

Theoretically speaking everything is possible, but practically speaking
I will defer any further thought about this for 10 years, because I
think it's impossible to measure any signal with nanosecond frequency
over a connection with millisecond resolution, passing through the
randomization of the tcp/ip kernel code, the slowdown of the python
interpreter and kernel pipes, until it finally reaches the seccomp
bytecode, and then the same way backwards (plus virtualization on vista).

I rate the network side channel even less probable than the TSC one
(which is purely theoretical too, as Andi can certainly confirm).

> Anyway at the point you pass the bytecode through a processing filter
> you don't need SECCOMP because your filter can remove any syscall
> attempts. 

Even if I wanted to run the filtered (but originally untrusted)
bytecode out of any jail, I would need the NX bit to do that, and I
don't have it in a large part of the (currently tiny) userbase. In the
previous email I wasn't accurate in saying self-modifying code is only
possible on the stack; obviously it's possible in the heap too, so not
even the non-executable-stack patches could help. SECCOMP is the only
feasible basic mode to cover all systems >=i686 (I don't support i586
and lower because I think it's not worth the bandwidth they would
generate, and if I'm wrong I can add them later; ia64 is also not
supported, and that has higher prio if anything).

Security is about having tons of things to break before you gain ring
0 privilege. It'd be totally wrong in security terms to remove
zerocost SECCOMP (or trusted-xen) just because you added a further
security measure on top of seccomp (or xen). It's like leaving the
door of your apartment open just because you enabled the security
alarm (you want to do both don't you?).

The more security you have the better, especially when it's zero cost
like seccomp.

Furthermore the filter would need to know about all archs and
bytecodes in the world, arch details and all possible ways to enter
the kernel, it would need to be maintained out of sync with the kernel
development, and it would be of huge complexity compared to the
few-line seccomp patch (all high risk stuff compared to SECCOMP).


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-12 23:44                     ` andrea
@ 2006-07-13 21:29                       ` Pavel Machek
  2006-07-13 23:11                         ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Pavel Machek @ 2006-07-13 21:29 UTC (permalink / raw)
  To: andrea
  Cc: Alan Cox, ajwade, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel, mingo

Hi!

> > Actually measuring time through the network is extremely doable given
> > enough samples as is communication through delay perturbation. A good
> > viterbi encoder/decoder will fish a signal out of very high noise. Yes
> > you pay a lot in data rate at that point but it works.
> 
> Currently the bandwidth is free, I'll charge for the transaction
> associated bandwidth only if I'm forced to (which would happen quickly
> if people start doing the above ;).
> 
> The way the current transactions are running as we speak, is not like
> in a full peer to peer system. It's half peer to peer, a trusted node
> always sits in between buyer and seller. I need this for a multitude
> of reasons (I could offload the middle node in a p3p system that is
> reliable as long as only 1 of the 3 is malicious but it's certainly
> more secure if the node in the middle is fully trusted so I'll try to
> avoid that). So if you are right, my trusted node will simply add
> /dev/urandom delay as needed before forwarding any packet, to prevent
> any meaningful measurement. Any network side channel can be closed with
> a few-line patch, and very quickly.

Actually random delays are unlikely to help (much). You have just added
noise, but you can still decode original signal...

-- 
Thanks for all the (sleeping) penguins.


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-13 21:29                       ` Pavel Machek
@ 2006-07-13 23:11                         ` andrea
  2006-07-13 23:20                           ` Pavel Machek
  2006-07-15  2:55                           ` Valdis.Kletnieks
  0 siblings, 2 replies; 73+ messages in thread
From: andrea @ 2006-07-13 23:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, ajwade, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel, mingo

On Thu, Jul 13, 2006 at 09:29:41PM +0000, Pavel Machek wrote:
> Actually random delays are unlikely to help (much). You have just added
> noise, but you can still decode original signal...

You're wrong, the random delays added to every packet will definitely
wipe out any signal.

But regardless of what is the best fix for the network attack I quote
Ingo:

   correct. But when i suggested to do precisely that i got a rant from
   Andrea of how super duper important it was to disable the TSC for
   seccomp ... (which argument is almost total hogwash)

Now if the availability of the nanosecond precision of the TSC is
almost total hogwash, how can the network attack be a real concern?

Either the NOTSC feature is critically important (and I don't think it
is, but it's not total hogwash either), or the network attack is an
absolute red-herring.

You can't have it both ways. It can't be that the NOTSC isn't needed
but the network attack is a serious concern.

What is currently shocking me is that if you really think the network
attack isn't an absolute red-herring, then it's very weird you're
replying to my email instead of replying to Ingo when he says the
availability of the TSC is almost total hogwash.

And please feel free to demonstrate the network attack, remote seccomp
computations are already possible so if you want to start listening to
a signal you can.


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-13 23:11                         ` andrea
@ 2006-07-13 23:20                           ` Pavel Machek
  2006-07-14  0:34                             ` andrea
  2006-07-15  2:55                           ` Valdis.Kletnieks
  1 sibling, 1 reply; 73+ messages in thread
From: Pavel Machek @ 2006-07-13 23:20 UTC (permalink / raw)
  To: andrea
  Cc: Alan Cox, ajwade, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel, mingo

Hi!

I do not want to enter the seccomp flamewar, and that's why I did not
answer Ingo.

> > Actually random delays are unlikely to help (much). You have just added
> > noise, but you can still decode original signal...
> 
> You're wrong, the random delays added to every packet will definitely
> wipe out any signal.

Strictly speaking, this is wrong. This is like adding noise into the
room. You have to pick a maximum delay (amount of noise), and you
clearly can't drown out a signal that's longer than the maximum delay.
But you also can't drown out a signal that's half the maximum delay,
given that the transmitter will retransmit it 4-or-so times. Just
average 4 samples, and your random delays will largely cancel out.
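
The averaging argument can be tried out with a small user-space
simulation. It is only a sketch under stated assumptions: the randomizer
adds a uniform delay of 0..999 us, a 1 bit is signalled by an extra
500 us (half the maximum delay), and the receiver averages several
retransmissions before thresholding. All names and constants are
hypothetical, and the PRNG is a fixed-seed xorshift64* so the run is
reproducible.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DELAY_US 1000u               /* randomizer adds 0..999 us */
#define SIGNAL_US    (MAX_DELAY_US / 2)  /* a 1 bit adds 500 us, a 0 bit none */
#define THRESHOLD_US (MAX_DELAY_US / 2 + SIGNAL_US / 2) /* midpoint of means */

/* Small deterministic PRNG (xorshift64*), fixed seed for reproducibility. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

static uint32_t rng_next(void)
{
	rng_state ^= rng_state >> 12;
	rng_state ^= rng_state << 25;
	rng_state ^= rng_state >> 27;
	return (uint32_t)((rng_state * 0x2545F4914F6CDD1DULL) >> 32);
}

/* Delay the receiver observes for one transmission of `bit`. */
uint32_t observed_delay_us(int bit)
{
	uint32_t noise = rng_next() % MAX_DELAY_US;

	return (bit ? SIGNAL_US : 0u) + noise;
}

/* Decode one bit by averaging `repeats` retransmissions of it. */
int decode_bit(int bit_sent, unsigned repeats)
{
	uint64_t sum = 0;

	assert(repeats > 0);
	for (unsigned i = 0; i < repeats; i++)
		sum += observed_delay_us(bit_sent);

	/* compare the sum with repeats * threshold to avoid a division */
	return sum > (uint64_t)repeats * THRESHOLD_US;
}

/* Send a 64-bit message bit by bit; return how many bits decode wrongly. */
unsigned count_bit_errors(uint64_t message, unsigned repeats)
{
	unsigned errors = 0;

	for (int i = 0; i < 64; i++) {
		int bit = (int)((message >> i) & 1u);

		if (decode_bit(bit, repeats) != bit)
			errors++;
	}
	return errors;
}
```

Calling count_bit_errors() with a small and a large repeat count shows
the trade-off: the randomizer does not remove the signal, it only forces
the sender to spend more retransmissions (and so more time) per bit.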

No, this probably does not apply to seccomp, because we are picking
unintended noise from affected computer.

OTOH I'm pretty sure I could communicate from seccomp process by
sending zeros alone, and I could communicate from another process on
box running seccomp through your randomizing packetizer to my machine.

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-13 23:20                           ` Pavel Machek
@ 2006-07-14  0:34                             ` andrea
  0 siblings, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-14  0:34 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, ajwade, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel, mingo

On Fri, Jul 14, 2006 at 01:20:26AM +0200, Pavel Machek wrote:
> I do not want to enter the seccomp flamewar, and that's why I did not
> answer Ingo.

Ok ;), I didn't imagine this was the reason. So we agree that the risk
introduced by the availability of the TSC is orders of magnitude
higher than whatever network timing attack.

> Strictly speaking, this is wrong. This is like adding noise into the
> room. You have to pick a maximum delay (amount of noise), and you
> clearly can't drown out a signal that's longer than the maximum delay.
> But you also can't drown out a signal that's half the maximum delay,
> given that the transmitter will retransmit it 4-or-so times. Just
> average 4 samples, and your random delays will largely cancel out.
> 
> No, this probably does not apply to seccomp, because we are picking
> unintended noise from affected computer.

Well, perhaps I wasn't clear enough, but I am only talking about
seccomp not some other unrelated and hypothetical network system.

I know a few algorithms are potentially vulnerable to network timing
attacks; the tcp sequence number and urandom come to mind. urandom is
perhaps the worst of all (which, btw, also gets data from the
tsc). Those issues have absolutely nothing to do with seccomp.

As far as seccomp is concerned, the only worry is the demonstration of
the timing side channel that obtained openssl keys by controlling
the host and running openssl commands on the other cpu at the
attacker's will. And to do that you need the TSC. Even that is totally
vapourware, because the attacked environment was strictly controlled by
the attacker; it's unclear what would happen should the attacked
environment be mostly random, as in real life. Disabling the TSC
has been generally agreed to be good enough to stop it.

Second in the priority list are the readonly ptes mapping the HPET
on some x86-64 configs. The network timing attack on CPUShare is far
beyond anything I could actually worry about.

> OTOH I'm pretty sure I could communicate from seccomp process by
> sending zeros alone, and I could communicate from another process on
> box running seccomp through your randomizing packetizer to my machine.

You mean you could communicate using some sort of morse-code and you
would use the frequency of the zeros to send the messages? That's
certainly possible today (without any randomizer), but the timing will
be measurable through the internet, so you're talking morse-code in
cleartext over the internet. The whole internet will be able to sniff
your message. Furthermore I register all ips and ports for all
transactions, so it's really no different from a direct tcp connection;
even if you talk encrypted-morse-code over it, you're only wasting some
resources.

Also note, there's absolutely no way for you to know for sure who you
are talking with.

So I don't see anything to worry about, feel free to communicate with
the other side through seccomp if you want, I'm certainly not adding a
randomizer to prevent that.

Thanks.


* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-13 23:11                         ` andrea
  2006-07-13 23:20                           ` Pavel Machek
@ 2006-07-15  2:55                           ` Valdis.Kletnieks
  2006-07-16  0:51                             ` andrea
  1 sibling, 1 reply; 73+ messages in thread
From: Valdis.Kletnieks @ 2006-07-15  2:55 UTC (permalink / raw)
  To: andrea
  Cc: Pavel Machek, Alan Cox, ajwade, Lee Revell, Randy.Dunlap,
	Andrew Morton, bunk, linux-kernel, mingo

[-- Attachment #1: Type: text/plain, Size: 942 bytes --]

On Fri, 14 Jul 2006 01:11:18 +0200, andrea@cpushare.com said:
> On Thu, Jul 13, 2006 at 09:29:41PM +0000, Pavel Machek wrote:
> > Actually random delays are unlikely to help (much). You have just added
> > noise, but you can still decode original signal...
> 
> You're wrong, the random delays added to every packet will definitely
> wipe out any signal.

I call shenanigans on that.

Take a look at the NTP userspace code, which has some very nice code to
filter network jitter. 

In fact, the best you can do here is to reduce the effective bandwidth
the signal can have, as Shannon showed quite clearly.

And even 20 years ago, the guys who did the original DoD Orange Book
requirements understood this - they didn't make a requirement that covert
channels (both timing and other) be totally closed down, they only made
a requirement that for higher security configurations the bandwidth of
the channel be reduced below a specified level...

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-15  2:55                           ` Valdis.Kletnieks
@ 2006-07-16  0:51                             ` andrea
  2006-07-16  1:54                               ` Pavel Machek
  0 siblings, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-16  0:51 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Pavel Machek, Alan Cox, ajwade, Lee Revell, Randy.Dunlap,
	Andrew Morton, bunk, linux-kernel, mingo

On Fri, Jul 14, 2006 at 10:55:28PM -0400, Valdis.Kletnieks@vt.edu wrote:
> In fact, the best you can do here is to reduce the effective bandwidth
> the signal can have, as Shannon showed quite clearly.

Yes.

> And even 20 years ago, the guys who did the original DoD Orange Book
> requirements understood this - they didn't make a requirement that covert
> channels (both timing and other) be totally closed down, they only made
> a requirement that for higher security configurations the bandwidth of
> the channel be reduced below a specified level...

The reason I think it's trivial to guarantee the closure of the seccomp
side channel timing attack, even on a very fast internet, by simply
introducing the random delay is that below a certain sampling
frequency you won't be able to extract data from the latencies of the
cache. The max length of the random noise has to be >= what it
takes to refill the whole cache. Then you won't know if it was a cache
miss or a randomly introduced delay that generated the slowdown, problem
solved.

As you and Pavel correctly pointed out, you can still communicate
whatever you want over the wire (between the two points) by using a
low enough frequency, but I don't think that has security relevance in
this context.

Thanks.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-16  0:51                             ` andrea
@ 2006-07-16  1:54                               ` Pavel Machek
  2006-07-16 15:36                                 ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Pavel Machek @ 2006-07-16  1:54 UTC (permalink / raw)
  To: andrea
  Cc: Valdis.Kletnieks, Alan Cox, ajwade, Lee Revell, Randy.Dunlap,
	Andrew Morton, bunk, linux-kernel, mingo

> On Fri, Jul 14, 2006 at 10:55:28PM -0400, Valdis.Kletnieks@vt.edu wrote:
> > In fact, the best you can do here is to reduce the effective bandwidth
> > the signal can have, as Shannon showed quite clearly.
> 
> Yes.
> 
> > And even 20 years ago, the guys who did the original DoD Orange Book
> > requirements understood this - they didn't make a requirement that covert
> > channels (both timing and other) be totally closed down, they only made
> > a requirement that for higher security configurations the bandwidth of
> > the channel be reduced below a specified level...
> 
> The reason I think it's trivial to guarantee the closure of the seccomp
> side channel timing attack, even on a very fast internet, by simply
> introducing the random delay is that below a certain sampling
> frequency you won't be able to extract data from the latencies of the
> cache. The max length of the random noise has to be >= what it
> takes to refill the whole cache. Then you won't know if it was a cache

You won't know for sure... but. Let t be the time it takes to reload the
cache. Let your random noise be in <0, t> interval. According to you,
that would be okay. IT IS NOT.

If the original delay was long, and your generator returned t,
attacker sees 2*t. He can be _sure_ delay was long now.

If the delay was short, and your generator returns 0, attacker sees 0,
and _knows_ delay was short. (Chance that generator produces 0 or t is
small, but non zero).

Even if you do random noise in <0, 2*t) interval, I'll be able to
gather some statistics.

							Pavel
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-16  1:54                               ` Pavel Machek
@ 2006-07-16 15:36                                 ` andrea
  0 siblings, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-16 15:36 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Valdis.Kletnieks, Alan Cox, ajwade, Lee Revell, Randy.Dunlap,
	Andrew Morton, bunk, linux-kernel, mingo

On Sun, Jul 16, 2006 at 03:54:27AM +0200, Pavel Machek wrote:
> You won't know for sure... but. Let t be the time it takes to reload the
> cache. Let your random noise be in <0, t> interval. According to you,
> that would be okay. IT IS NOT.
> 
> If the original delay was long, and your generator returned t,
> attacker sees 2*t. He can be _sure_ delay was long now.

Well, it could be a random internet delay that made it 2*t. So you
certainly can't be sure, but I agree you can hope that you were lucky ;).

> If the delay was short, and your generator returns 0, attacker sees 0,
> and _knows_ delay was short. (Chance that generator produces 0 or t is

Yes, when you see zero you're sure there was no randomization, but the
zero is what you pay for adding randomization in the first place
(i.e. you need this zero delay for all the other points to become
random).

> small, but non zero).
> 
> Even if you do random noise in <0, 2*t) interval, I'll be able to
> gather some statistics.

And how would those statistics help you in extracting any meaningful
data out of the system? It's as if you had a .wav file that is completely
random except for a few points that you may guess could be in
their original position (or close). With a few points scattered
randomly, you won't have any hope of listening to the wav music. It's not
like you're sampling a signal that repeats itself exactly the same
again and again so that you can reconstruct it by mixing the
zero-error points. For example, when you measure the same point
again and you don't get 0, you'll never know if it was the
artificial-random delay that made it non-zero, or if the randomizer
was 0 and this time the time measurement was non zero, or if it was a
network delay.

Even if you were right that it would be theoretically feasible, at first
glance it sounds easier to crack the ssh key with brute force, than to
try to sniff the private key using your statistics on top of the
randomizer (even ignoring the number of network packets that you would
need to transfer ;).

Now going back to the current server code that doesn't have any
randomizer at all, keep in mind the attack in the paper happened in a
strictly artificial environment (not even close to real life), and the
TSC was used because it takes a nanosecond or so to run, so that you
can measure time at full cpu bandwidth (not at the rate of an
adsl). So if it could take 1 day of sniffing for the guy to extract
anything meaningful with the TSC in real life (which sounds very
unlikely too unless you run ssh in a loop), and you would find a way
to reliably measure the nanoseconds using a millisecond clock (and
here I mean there is no randomizer at all in the system, if I add the
randomizer the whole network attack would fall apart), it would take
you one million days to sample the same data that the tsc can
sample in one day. And really it's double RTT because it's not a pure
p2p, and it'll more likely be in the order of 20msec if you're both on
adsl. That change alone will raise the time from 1 million days to 20
million days. Not only does this assume no randomizer, it also doesn't
account in any way for the several repetitions of measurements required to
apply ntp-like algorithms, which would explode the number of days to
orders of magnitude bigger than the 1-20 million days mentioned above.

All the above considerations should be combined with the fact that a
CPUShare transaction takes 1 hour, not 1-20 million days. Once the
transaction is complete you will never know who you attach with next
time, so after one hour passes, your above statistic would be sampling
a different ssh private key, not the same one.

No matter from what point of view you look at the problem, the network
attack sounds like the last thing I could be concerned about; attacks
against urandom for the ssh private key generation sound more likely
than this one.

And if somebody attempts this kind of attack, I'll notice it in
the network bill ;). Once my network bill is high enough I may
decide to add the randomizer just in case. The sell orders could have
network quotas as well in the future, so a seller can specify the max
amount of network data he accepts to transfer during the transaction
to be more secure and to generate less network traffic spread over the
hour of computations (so he can still leave a good portion of his adsl
free to surf or run other p2p software).

Last but not least, completely closing the cache timing side
channel is possible if I wanted to, by simply invalidating the whole l2
cache in the same place where I flip the cr4 (plus a change of the
scheduler to forbid seccomp and non-seccomp tasks to mix in the same
physical cpu). It's just not worth it.

While I want the best security available in the basic computing mode
supported by all clients (i.e. seccomp), CPUShare is very clear that
any exposure of confidential data through the internet, or any other
damage like the spread of trojans, spyware, adware or viruses is at your
risk. While I'm convinced the network timing attack is total
hogwash, CPU bugs are very possible, kernel bugs are very possible
too, and those are orders of magnitude more likely to be exploitable than
any network timing attack. No matter what technology I use,
everything can be buggy at both the hardware and software layer, even
the math itself of the crypto could have been proven wrong. There's no
way for me to provide any guarantee, the kernel itself is under the
GPL with no warranty, and so is seccomp under the GPL and with no
warranty too. I can only guarantee that I'm doing my best for anything
that makes sense (i.e. the tsc, since the tsc, being so fast [and so
accurate too], is a million times more risky than any remote
clock). Worrying about the TSC sounds paranoid enough already;
worrying about the network attack is just way beyond what I can consider to
have practical security relevance given 2006 internet network
bandwidth and latency in connection with the current CPUShare code. As
I said I'd be glad to check again 10 years from now in the hope
latencies will go down to the usec and bandwidth up to the gigabits
like I strongly hope.

If you don't believe me and you want to be sure, feel free to add a
seccomp mode that flushes the cache at every context switch from
non-seccomp to seccomp plus the HT hack as I suggested above. You
don't have to trust me, all code that runs on your computer is free
software and not covered by any pending-patent at all, so you can
change it as you want, though if it was me I wouldn't do that since I
don't like to run slow for no apparent good reason.

If you've more doubts please feel free to go ahead, but I recommend
moving this thread on security attacks to the cpushare-devel mailing
list. This is way offtopic here (I'm answering in CC just to express
my point of view on the matter, but I don't think it's much relevant
for this list).

Thanks Pavel.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-12 22:02                   ` Alan Cox
  2006-07-12 23:44                     ` andrea
@ 2006-07-13  2:56                     ` Andrew James Wade
  1 sibling, 0 replies; 73+ messages in thread
From: Andrew James Wade @ 2006-07-13  2:56 UTC (permalink / raw)
  To: Alan Cox
  Cc: andrea, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel, mingo

On Wednesday 12 July 2006 18:02, Alan Cox wrote:
...
> Actually measuring time through the network is extremely doable given
> enough samples as is communication through delay perturbation. A good
> viterbi encoder/decoder will fish a signal out of very high noise. Yes
> you pay a lot in data rate at that point but it works.

The data rate is important: it can mean a difference between an attack
that is practical and one that is impractical. Although I suspect the
orders of magnitude in the sampling rate of rdtsc versus network
packet times is more important. Another source of timing information
could be two seccomp threads (scheduled to different cores/SMT
threads) comparing their relative progress. What made the L1 cache
miss side-channel so interesting was its very high bandwidth. Relative
progress timing techniques will yield lower side-channel bandwidth on
many interesting configurations.

> Anyway at the point you pass the bytecode through a processing filter
> you don't need SECCOMP because your filter can remove any syscall
> attempts. 

Both int 80 and the rdtsc instructions are only 2 bytes long: they'll
generate too many false positives. It may be practical to filter out
the "f00f" instructions though. 
And filtering isn't fail-safe from a security point-of-view: if you
miss a case you lose. For example, can the f00f bug still be triggered
if there are prefixes between the lock prefix and the cmpxchg8b? I
don't know, but if so you'll need to filter for those cases too.

Filtering may be a good idea, but I wouldn't want to rely on it alone.

Andrew Wade

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-11 20:19               ` Andrew James Wade
  2006-07-12 21:05                 ` andrea
@ 2006-07-12 21:13                 ` Ingo Molnar
  2006-07-13  1:16                   ` andrea
  2006-07-13  1:37                   ` Andrew James Wade
  1 sibling, 2 replies; 73+ messages in thread
From: Ingo Molnar @ 2006-07-12 21:13 UTC (permalink / raw)
  To: ajwade; +Cc: andrea, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel

* Andrew James Wade <andrew.j.wade@gmail.com> wrote:

> And that's where fail-safe and simple design comes in. In this 
> application an oops is better than a jail-break by orders of 
> magnitude. But then that's why you wrote seccomp instead of using 
> ptrace in the first place.

actually, the client side of ptrace isnt all that more complex. I guess 
one of the main problems with using ptrace was that it has no catchy 
name that Andrea could claim for his project and that it couldnt be 
patented ;-)

Andrea could have isolated the 'client side' functionality of ptrace 
(which is often confused with the 'server side' of ptrace - where the 
overwhelming majority of ptrace security holes were located) and he 
could have made it simple to review, to get a comparable 'feeling' of 
security. [User Mode Linux uses the client-side ptrace model to execute 
untrusted code.]

Andrea could also have extended ptrace to solve whatever marginal 
problems he has with ptrace. [in fact such extension of ptrace was 
posted recently, see Roland McGrath's utrace framework!]

But he chose not to do so - and that has nothing to do with being unable 
to improve ptrace - it evidently is improvable. So i see SECCOMP being 
the result of the NIH syndrome.

	Ingo

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-12 21:13                 ` Ingo Molnar
@ 2006-07-13  1:16                   ` andrea
  2006-07-13  1:37                   ` Andrew James Wade
  1 sibling, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-13  1:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: ajwade, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel

I think this thread is not worth reading but since you tell me that I
am affected by NIH syndrome...

On Wed, Jul 12, 2006 at 11:13:58PM +0200, Ingo Molnar wrote:
> But he chose not to do so - and that has nothing to do with being unable 
> to improve ptrace - it evidently is improvable. So i see SECCOMP being 
> the result of the NIH syndrome.

	http://www.cpushare.com/hypermail/cpushare-discuss/06/01/0086.html

	"so if you walk this thought-experiment a bit, you'll quickly arrive to a 
	virtual machine that is actually pretty useful already, and is fully 
	provable. You should not dismiss this as "I dont trust it because it's 
	at ring 0", unless you can show some fatal flaw in my thinking."

In short a few months ago you told me that instead of using SECCOMP I
should filter and run the resulting filtered (but originally
untrusted) bytecode in ring0 inside the kernel without any memory
protection, because your filter was a lot more than the "int 0x80
checks plus jump offsets checks" (that Alan just mentioned), you
suggested a sort of kernel virtual machine that you just invented that
would verify where the bytecode could touch the ram. So the first bug
in this hugely complicated filter (IMHO nearly impossible to implement
due to the lack of memory protection) gives the attacker ring0
privilege because it all runs inside the kernel...

If it's you inventing the hugely complicated, insecure and probably
impossible to realize virtual machine in the kernel, it's ok. While if
it's me, I should be using ptrace instead of the simpler and much more
secure seccomp. If the above is not NIH syndrome I don't know what
that is.

About your first claim, as far as I'm concerned, if I use seccomp or
ptrace it doesn't change anything at all for me. It only changes for
the *users* because if I use ptrace they'll be less secure.

In case it's not obvious yet, the reasons for seccomp are multiple.

1) gdb (all buyers of the CPU resources will need a way to debug their
   own bytecode by running the sell client on their own machine, if
   something goes wrong). If seccomp will be used by others to do
   secure decompression, the fact gdb will keep working will be
   welcome for the same reason.
2) strace (I used it quite a lot of times to debug problems in seccomp
   bytecode myself already, it has been a fundamental debugging tool
   as usual)
3) not being susceptible to the newest ptrace addition of the day (you
   just mentioned a ptrace rework with the utrace framework).
4) I didn't want to require userland knowledge of kernel details
   like syscall numbers and signal details for all archs (the signal
   part could even go out of sync with the kernel).
5) being simple, so less likely to be buggy. ptrace security is
   extremely complex, just the case where the ptracer task gets
   killed and the ptraced tasks must be synchronously sigkilled too is
   a controlled race in itself (that again could break after the
   frequent ptrace reworks you just mentioned and since it's a race
   condition it may go unnoticed for a while).
6) Ptrace is generally not being used for critical security features,
   it's primarily a debugging kludge, and others (including my ex. prof) are
   using it for virtualization similar to UML. But they're not running
   untrusted bytecode inside UML or in their virtualization software.
   No major outbreak would happen if ptrace suddenly breaks after one
   of the frequent reworks that you just mentioned.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [2.6 patch] let CONFIG_SECCOMP default to n
  2006-07-12 21:13                 ` Ingo Molnar
  2006-07-13  1:16                   ` andrea
@ 2006-07-13  1:37                   ` Andrew James Wade
  1 sibling, 0 replies; 73+ messages in thread
From: Andrew James Wade @ 2006-07-13  1:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: andrea, Lee Revell, Randy.Dunlap, Andrew Morton, bunk,
	linux-kernel

On Wednesday 12 July 2006 17:13, Ingo Molnar wrote:
...
> actually, the client side of ptrace isnt all that more complex.

Ah. I'm out of my depth here. 

Andrew Wade

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] TIF_NOTSC and SECCOMP prctl
@ 2006-07-18 10:20 Chuck Ebbert
  2006-07-18 13:29 ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Chuck Ebbert @ 2006-07-18 10:20 UTC (permalink / raw)
  To: andrea@cpushare.com
  Cc: bruce@andrew.cmu.edu, linux-kernel, Alan Cox, Arjan van de Ven,
	Adrian Bunk, Lee Revell, Linus Torvalds, Ingo Molnar

In-Reply-To: <20060714060932.GE18774@opteron.random>

On Fri, 14 Jul 2006 08:09:32 +0200, andrea@cpushare.com wrote:

> The below patch seems to work, I ported all my client code on top of
> prctl already. (it's a bit more painful to autodetect a kernel with
> CONFIG_SECCOMP turned off but I already adapted to it)

AFAIC the /proc method of controlling seccomp is so ugly it should
just go, but what about backwards compatibility?

I have a couple of questions:


+void disable_TSC(void)
+{
+       if (!test_and_set_thread_flag(TIF_NOTSC))
+               /*
+                * Must flip the CPU state synchronously with
+                * TIF_NOTSC in the current running context.
+                */
+               hard_disable_TSC();
+}

This gets called from sys_prctl().  Do you need to worry about preemption
between the test_and_set and TSC disable?


--- a/include/asm-i386/processor.h      Thu Jul 13 03:03:35 2006 +0700
+++ b/include/asm-i386/processor.h      Fri Jul 14 07:47:57 2006 +0200
@@ -256,6 +256,10 @@ static inline void clear_in_cr4 (unsigne
        cr4 &= ~mask;
        write_cr4(cr4);
 }
+
+extern void hard_disable_TSC(void);
+extern void disable_TSC(void);
+extern void hard_enable_TSC(void);

Maybe these should be inline?  They're really small and that way you
don't need #ifdef around the code for them.


> Reviews are welcome (then I will move into x86-64, all other archs
> supporting seccomp should require no changes despite the API
> change). Thanks.

For x86_64 you need this:

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt-current/patches/tif-flags-for-debug-regs-and-io-bitmap-in-ctxsw

But I don't think Andi plans on pushing it for 2.6.18.

-- 
Chuck

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] TIF_NOTSC and SECCOMP prctl
  2006-07-18 10:20 [PATCH] TIF_NOTSC and SECCOMP prctl Chuck Ebbert
@ 2006-07-18 13:29 ` andrea
  2006-07-25 21:44   ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-18 13:29 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: bruce@andrew.cmu.edu, linux-kernel, Alan Cox, Arjan van de Ven,
	Adrian Bunk, Lee Revell, Linus Torvalds, Ingo Molnar

On Tue, Jul 18, 2006 at 06:20:20AM -0400, Chuck Ebbert wrote:
> AFAIC the /proc method of controlling seccomp is so ugly it should
> just go, but what about backwards compatibility?

Given that so far CPUShare seems the only user there should be no
problem, I already uploaded a new CPUShare package that handles both
the old and new interfaces transparently, no matter what kernel runs
under it.

> I have a couple of questions:
> 
> 
> +void disable_TSC(void)
> +{
> +       if (!test_and_set_thread_flag(TIF_NOTSC))
> +               /*
> +                * Must flip the CPU state synchronously with
> +                * TIF_NOTSC in the current running context.
> +                */
> +               hard_disable_TSC();
> +}
> 
> This gets called from sys_prctl().  Do you need to worry about preemption
> between the test_and_set and TSC disable?

I tend to completely forget about preempt.

> Maybe these should be inline?  They're really small and that way you
> don't need #ifdef around the code for them.

I wanted to reduce the bytecode overhead to the minimum when seccomp
is set to y, so I tried to avoid inlines.

> For x86_64 you need this:
> 
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt-current/patches/tif-flags-for-debug-regs-and-io-bitmap-in-ctxsw
> 
> But I don't think Andi plans on pushing it for 2.6.18.

Thanks for the pointer.

For now the patch I posted already works on x86-64 and on all other
archs (x86_64 misses the notsc feature for now, but that's not a
problem, the patch is self contained and we can take care of the notsc
for x86-64 later on).

This is the incremental patch to address the preempt=y kernel builds.

diff -r 373f0be00c40 arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c	Sun Jul 16 15:51:54 2006 +0200
+++ b/arch/i386/kernel/process.c	Tue Jul 18 14:59:23 2006 +0200
@@ -542,12 +542,14 @@ void hard_disable_TSC(void)
 }
 void disable_TSC(void)
 {
+	preempt_disable();
 	if (!test_and_set_thread_flag(TIF_NOTSC))
 		/*
 		 * Must flip the CPU state synchronously with
 		 * TIF_NOTSC in the current running context.
 		 */
 		hard_disable_TSC();
+	preempt_enable();
 }
 void hard_enable_TSC(void)
 {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH] TIF_NOTSC and SECCOMP prctl
  2006-07-18 13:29 ` andrea
@ 2006-07-25 21:44   ` andrea
  2006-07-26  8:07     ` Ingo Molnar
  0 siblings, 1 reply; 73+ messages in thread
From: andrea @ 2006-07-25 21:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, bruce@andrew.cmu.edu, Alan Cox, Arjan van de Ven,
	Adrian Bunk, Lee Revell, Linus Torvalds, Ingo Molnar

Here is a repost of the last seccomp patch against current mainline
including the preempt fix. This changes the seccomp API from
/proc/<pid>/seccomp to a prctl (this will produce a smaller kernel)
and it adds a TIF_NOTSC that seccomp sets. Only the current task can
call disable_TSC (obviously because it hasn't a task_t param). This
includes Chuck's patch to give zero runtime cost to the notsc feature.

After applying this patch, seccomp will keep working fine on all other
archs that currently support it too.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>

diff -r 93feac10afde -r fa49c58866fe arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c	Tue Jul 25 11:05:21 2006 -0200
+++ b/arch/i386/kernel/process.c	Tue Jul 25 23:33:52 2006 +0200
@@ -535,8 +535,31 @@ int dump_task_regs(struct task_struct *t
 	return 1;
 }
 
-static noinline void __switch_to_xtra(struct task_struct *next_p,
-				    struct tss_struct *tss)
+#ifdef CONFIG_SECCOMP
+void hard_disable_TSC(void)
+{
+	write_cr4(read_cr4() | X86_CR4_TSD);
+}
+void disable_TSC(void)
+{
+	preempt_disable();
+	if (!test_and_set_thread_flag(TIF_NOTSC))
+		/*
+		 * Must flip the CPU state synchronously with
+		 * TIF_NOTSC in the current running context.
+		 */
+		hard_disable_TSC();
+	preempt_enable();
+}
+void hard_enable_TSC(void)
+{
+	write_cr4(read_cr4() & ~X86_CR4_TSD);
+}
+#endif /* CONFIG_SECCOMP */
+
+static noinline void
+__switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
+		 struct tss_struct *tss)
 {
 	struct thread_struct *next;
 
@@ -552,60 +575,47 @@ static noinline void __switch_to_xtra(st
 		set_debugreg(next->debugreg[7], 7);
 	}
 
-	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+#ifdef CONFIG_SECCOMP
+	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
+	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
+		/* prev and next are different */
+		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
+			hard_disable_TSC();
+		else
+			hard_enable_TSC();
+	}
+#endif
+
+	if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP) ||
+	    test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+		if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+			/*
+			 * Disable the bitmap via an invalid offset. We still cache
+			 * the previous bitmap owner and the IO bitmap contents:
+			 */
+			tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
+			return;
+		}
+
+		if (likely(next == tss->io_bitmap_owner)) {
+			/*
+			 * Previous owner of the bitmap (hence the bitmap content)
+			 * matches the next task, we dont have to do anything but
+			 * to set a valid offset in the TSS:
+			 */
+			tss->io_bitmap_base = IO_BITMAP_OFFSET;
+			return;
+		}
 		/*
-		 * Disable the bitmap via an invalid offset. We still cache
-		 * the previous bitmap owner and the IO bitmap contents:
+		 * Lazy TSS's I/O bitmap copy. We set an invalid offset here
+		 * and we let the task to get a GPF in case an I/O instruction
+		 * is performed.  The handler of the GPF will verify that the
+		 * faulting task has a valid I/O bitmap and, it true, does the
+		 * real copy and restart the instruction.  This will save us
+		 * redundant copies when the currently switched task does not
+		 * perform any I/O during its timeslice.
 		 */
-		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
-		return;
-	}
-
-	if (likely(next == tss->io_bitmap_owner)) {
-		/*
-		 * Previous owner of the bitmap (hence the bitmap content)
-		 * matches the next task, we dont have to do anything but
-		 * to set a valid offset in the TSS:
-		 */
-		tss->io_bitmap_base = IO_BITMAP_OFFSET;
-		return;
-	}
-	/*
-	 * Lazy TSS's I/O bitmap copy. We set an invalid offset here
-	 * and we let the task to get a GPF in case an I/O instruction
-	 * is performed.  The handler of the GPF will verify that the
-	 * faulting task has a valid I/O bitmap and, it true, does the
-	 * real copy and restart the instruction.  This will save us
-	 * redundant copies when the currently switched task does not
-	 * perform any I/O during its timeslice.
-	 */
-	tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET_LAZY;
-}
-
-/*
- * This function selects if the context switch from prev to next
- * has to tweak the TSC disable bit in the cr4.
- */
-static inline void disable_tsc(struct task_struct *prev_p,
-			       struct task_struct *next_p)
-{
-	struct thread_info *prev, *next;
-
-	/*
-	 * gcc should eliminate the ->thread_info dereference if
-	 * has_secure_computing returns 0 at compile time (SECCOMP=n).
-	 */
-	prev = task_thread_info(prev_p);
-	next = task_thread_info(next_p);
-
-	if (has_secure_computing(prev) || has_secure_computing(next)) {
-		/* slow path here */
-		if (has_secure_computing(prev) &&
-		    !has_secure_computing(next)) {
-			write_cr4(read_cr4() & ~X86_CR4_TSD);
-		} else if (!has_secure_computing(prev) &&
-			   has_secure_computing(next))
-			write_cr4(read_cr4() | X86_CR4_TSD);
+		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET_LAZY;
 	}
 }
 
@@ -690,11 +700,9 @@ struct task_struct fastcall * __switch_t
 	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
 	 */
-	if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW))
-	    || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP))
-		__switch_to_xtra(next_p, tss);
-
-	disable_tsc(prev_p, next_p);
+	if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
+		     task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
+		__switch_to_xtra(prev_p, next_p, tss);
 
 	return prev_p;
 }
diff -r 93feac10afde -r fa49c58866fe fs/proc/base.c
--- a/fs/proc/base.c	Tue Jul 25 11:05:21 2006 -0200
+++ b/fs/proc/base.c	Tue Jul 25 23:33:52 2006 +0200
@@ -67,7 +67,6 @@
 #include <linux/mount.h>
 #include <linux/security.h>
 #include <linux/ptrace.h>
-#include <linux/seccomp.h>
 #include <linux/cpuset.h>
 #include <linux/audit.h>
 #include <linux/poll.h>
@@ -98,9 +97,6 @@ enum pid_directory_inos {
 	PROC_TGID_TASK,
 	PROC_TGID_STATUS,
 	PROC_TGID_MEM,
-#ifdef CONFIG_SECCOMP
-	PROC_TGID_SECCOMP,
-#endif
 	PROC_TGID_CWD,
 	PROC_TGID_ROOT,
 	PROC_TGID_EXE,
@@ -141,9 +137,6 @@ enum pid_directory_inos {
 	PROC_TID_INO,
 	PROC_TID_STATUS,
 	PROC_TID_MEM,
-#ifdef CONFIG_SECCOMP
-	PROC_TID_SECCOMP,
-#endif
 	PROC_TID_CWD,
 	PROC_TID_ROOT,
 	PROC_TID_EXE,
@@ -212,9 +205,6 @@ static struct pid_entry tgid_base_stuff[
 	E(PROC_TGID_NUMA_MAPS, "numa_maps", S_IFREG|S_IRUGO),
 #endif
 	E(PROC_TGID_MEM,       "mem",     S_IFREG|S_IRUSR|S_IWUSR),
-#ifdef CONFIG_SECCOMP
-	E(PROC_TGID_SECCOMP,   "seccomp", S_IFREG|S_IRUSR|S_IWUSR),
-#endif
 	E(PROC_TGID_CWD,       "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_ROOT,      "root",    S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_EXE,       "exe",     S_IFLNK|S_IRWXUGO),
@@ -255,9 +245,6 @@ static struct pid_entry tid_base_stuff[]
 	E(PROC_TID_NUMA_MAPS,  "numa_maps",    S_IFREG|S_IRUGO),
 #endif
 	E(PROC_TID_MEM,        "mem",     S_IFREG|S_IRUSR|S_IWUSR),
-#ifdef CONFIG_SECCOMP
-	E(PROC_TID_SECCOMP,    "seccomp", S_IFREG|S_IRUSR|S_IWUSR),
-#endif
 	E(PROC_TID_CWD,        "cwd",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_ROOT,       "root",    S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_EXE,        "exe",     S_IFLNK|S_IRWXUGO),
@@ -991,78 +978,6 @@ static struct file_operations proc_login
 	.write		= proc_loginuid_write,
 };
 #endif
-
-#ifdef CONFIG_SECCOMP
-static ssize_t seccomp_read(struct file *file, char __user *buf,
-			    size_t count, loff_t *ppos)
-{
-	struct task_struct *tsk = get_proc_task(file->f_dentry->d_inode);
-	char __buf[20];
-	loff_t __ppos = *ppos;
-	size_t len;
-
-	if (!tsk)
-		return -ESRCH;
-	/* no need to print the trailing zero, so use only len */
-	len = sprintf(__buf, "%u\n", tsk->seccomp.mode);
-	put_task_struct(tsk);
-	if (__ppos >= len)
-		return 0;
-	if (count > len - __ppos)
-		count = len - __ppos;
-	if (copy_to_user(buf, __buf + __ppos, count))
-		return -EFAULT;
-	*ppos = __ppos + count;
-	return count;
-}
-
-static ssize_t seccomp_write(struct file *file, const char __user *buf,
-			     size_t count, loff_t *ppos)
-{
-	struct task_struct *tsk = get_proc_task(file->f_dentry->d_inode);
-	char __buf[20], *end;
-	unsigned int seccomp_mode;
-	ssize_t result;
-
-	result = -ESRCH;
-	if (!tsk)
-		goto out_no_task;
-
-	/* can set it only once to be even more secure */
-	result = -EPERM;
-	if (unlikely(tsk->seccomp.mode))
-		goto out;
-
-	result = -EFAULT;
-	memset(__buf, 0, sizeof(__buf));
-	count = min(count, sizeof(__buf) - 1);
-	if (copy_from_user(__buf, buf, count))
-		goto out;
-
-	seccomp_mode = simple_strtoul(__buf, &end, 0);
-	if (*end == '\n')
-		end++;
-	result = -EINVAL;
-	if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) {
-		tsk->seccomp.mode = seccomp_mode;
-		set_tsk_thread_flag(tsk, TIF_SECCOMP);
-	} else
-		goto out;
-	result = -EIO;
-	if (unlikely(!(end - __buf)))
-		goto out;
-	result = end - __buf;
-out:
-	put_task_struct(tsk);
-out_no_task:
-	return result;
-}
-
-static struct file_operations proc_seccomp_operations = {
-	.read		= seccomp_read,
-	.write		= seccomp_write,
-};
-#endif /* CONFIG_SECCOMP */
 
 static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
@@ -1753,12 +1668,6 @@ static struct dentry *proc_pident_lookup
 		case PROC_TGID_MEM:
 			inode->i_fop = &proc_mem_operations;
 			break;
-#ifdef CONFIG_SECCOMP
-		case PROC_TID_SECCOMP:
-		case PROC_TGID_SECCOMP:
-			inode->i_fop = &proc_seccomp_operations;
-			break;
-#endif /* CONFIG_SECCOMP */
 		case PROC_TID_MOUNTS:
 		case PROC_TGID_MOUNTS:
 			inode->i_fop = &proc_mounts_operations;
diff -r 93feac10afde -r fa49c58866fe include/asm-i386/processor.h
--- a/include/asm-i386/processor.h	Tue Jul 25 11:05:21 2006 -0200
+++ b/include/asm-i386/processor.h	Tue Jul 25 23:33:52 2006 +0200
@@ -256,6 +256,10 @@ static inline void clear_in_cr4 (unsigne
 	cr4 &= ~mask;
 	write_cr4(cr4);
 }
+
+extern void hard_disable_TSC(void);
+extern void disable_TSC(void);
+extern void hard_enable_TSC(void);
 
 /*
  *      NSC/Cyrix CPU configuration register indexes
diff -r 93feac10afde -r fa49c58866fe include/asm-i386/thread_info.h
--- a/include/asm-i386/thread_info.h	Tue Jul 25 11:05:21 2006 -0200
+++ b/include/asm-i386/thread_info.h	Tue Jul 25 23:33:52 2006 +0200
@@ -142,6 +142,7 @@ static inline struct thread_info *curren
 #define TIF_MEMDIE		16
 #define TIF_DEBUG		17	/* uses debug registers */
 #define TIF_IO_BITMAP		18	/* uses I/O bitmap */
+#define TIF_NOTSC		19	/* TSC is not accessible in userland */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
@@ -155,6 +156,7 @@ static inline struct thread_info *curren
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
 #define _TIF_DEBUG		(1<<TIF_DEBUG)
 #define _TIF_IO_BITMAP		(1<<TIF_IO_BITMAP)
+#define _TIF_NOTSC		(1<<TIF_NOTSC)
 
 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK \
@@ -164,7 +166,8 @@ static inline struct thread_info *curren
 #define _TIF_ALLWORK_MASK	(0x0000FFFF & ~_TIF_SECCOMP)
 
 /* flags to check in __switch_to() */
-#define _TIF_WORK_CTXSW (_TIF_DEBUG|_TIF_IO_BITMAP)
+#define _TIF_WORK_CTXSW_NEXT (_TIF_IO_BITMAP | _TIF_NOTSC | _TIF_DEBUG)
+#define _TIF_WORK_CTXSW_PREV (_TIF_IO_BITMAP | _TIF_NOTSC)
 
 /*
  * Thread-synchronous status.
diff -r 93feac10afde -r fa49c58866fe include/linux/prctl.h
--- a/include/linux/prctl.h	Tue Jul 25 11:05:21 2006 -0200
+++ b/include/linux/prctl.h	Tue Jul 25 23:33:52 2006 +0200
@@ -59,4 +59,8 @@
 # define PR_ENDIAN_LITTLE	1	/* True little endian mode */
 # define PR_ENDIAN_PPC_LITTLE	2	/* "PowerPC" pseudo little endian */
 
+/* Get/set process seccomp mode */
+#define PR_GET_SECCOMP	21
+#define PR_SET_SECCOMP	22
+
 #endif /* _LINUX_PRCTL_H */
diff -r 93feac10afde -r fa49c58866fe include/linux/seccomp.h
--- a/include/linux/seccomp.h	Tue Jul 25 11:05:21 2006 -0200
+++ b/include/linux/seccomp.h	Tue Jul 25 23:33:52 2006 +0200
@@ -3,8 +3,6 @@
 
 
 #ifdef CONFIG_SECCOMP
-
-#define NR_SECCOMP_MODES 1
 
 #include <linux/thread_info.h>
 #include <asm/seccomp.h>
@@ -18,20 +16,23 @@ static inline void secure_computing(int 
 		__secure_computing(this_syscall);
 }
 
-static inline int has_secure_computing(struct thread_info *ti)
-{
-	return unlikely(test_ti_thread_flag(ti, TIF_SECCOMP));
-}
+extern long prctl_get_seccomp(void);
+extern long prctl_set_seccomp(unsigned long);
 
 #else /* CONFIG_SECCOMP */
 
 typedef struct { } seccomp_t;
 
 #define secure_computing(x) do { } while (0)
-/* static inline to preserve typechecking */
-static inline int has_secure_computing(struct thread_info *ti)
+
+static inline long prctl_get_seccomp(void)
 {
-	return 0;
+	return -EINVAL;
+}
+
+static inline long prctl_set_seccomp(unsigned long arg2)
+{
+	return -EINVAL;
 }
 
 #endif /* CONFIG_SECCOMP */
diff -r 93feac10afde -r fa49c58866fe kernel/seccomp.c
--- a/kernel/seccomp.c	Tue Jul 25 11:05:21 2006 -0200
+++ b/kernel/seccomp.c	Tue Jul 25 23:33:52 2006 +0200
@@ -1,7 +1,7 @@
 /*
  * linux/kernel/seccomp.c
  *
- * Copyright 2004-2005  Andrea Arcangeli <andrea@cpushare.com>
+ * Copyright 2004-2006  Andrea Arcangeli <andrea@cpushare.com>
  *
  * This defines a simple but solid secure-computing mode.
  */
@@ -10,6 +10,7 @@
 #include <linux/sched.h>
 
 /* #define SECCOMP_DEBUG 1 */
+#define NR_SECCOMP_MODES 1
 
 /*
  * Secure computing mode 1 allows only read/write/exit/sigreturn.
@@ -54,3 +55,31 @@ void __secure_computing(int this_syscall
 #endif
 	do_exit(SIGKILL);
 }
+
+long prctl_get_seccomp(void)
+{
+	return current->seccomp.mode;
+}
+
+long prctl_set_seccomp(unsigned long seccomp_mode)
+{
+	long ret;
+
+	/* can set it only once to be even more secure */
+	ret = -EPERM;
+	if (unlikely(current->seccomp.mode))
+		goto out;
+
+	ret = -EINVAL;
+	if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) {
+		current->seccomp.mode = seccomp_mode;
+		set_thread_flag(TIF_SECCOMP);
+#ifdef TIF_NOTSC
+		disable_TSC();
+#endif
+		ret = 0;
+	}
+
+ out:
+	return ret;
+}
diff -r 93feac10afde -r fa49c58866fe kernel/sys.c
--- a/kernel/sys.c	Tue Jul 25 11:05:21 2006 -0200
+++ b/kernel/sys.c	Tue Jul 25 23:33:52 2006 +0200
@@ -28,6 +28,7 @@
 #include <linux/tty.h>
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
+#include <linux/seccomp.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
@@ -2056,6 +2057,13 @@ asmlinkage long sys_prctl(int option, un
 			error = SET_ENDIAN(current, arg2);
 			break;
 
+		case PR_GET_SECCOMP:
+			error = prctl_get_seccomp();
+			break;
+		case PR_SET_SECCOMP:
+			error = prctl_set_seccomp(arg2);
+			break;
+
 		default:
 			error = -EINVAL;
 			break;


* Re: [PATCH] TIF_NOTSC and SECCOMP prctl
  2006-07-25 21:44   ` andrea
@ 2006-07-26  8:07     ` Ingo Molnar
  2006-07-26 11:45       ` andrea
  0 siblings, 1 reply; 73+ messages in thread
From: Ingo Molnar @ 2006-07-26  8:07 UTC (permalink / raw)
  To: andrea
  Cc: linux-kernel, Chuck Ebbert, bruce@andrew.cmu.edu, Alan Cox,
	Arjan van de Ven, Adrian Bunk, Lee Revell, Linus Torvalds


* andrea@cpushare.com <andrea@cpushare.com> wrote:

> Here a repost of the last seccomp patch against current mainline 
> including the preempt fix. This changes the seccomp API from 
> /proc/<pid>/seccomp to a prctl (this will produce a smaller kernel) 
> and it adds a TIF_NOTSC that seccomp sets. Only the current task can 
> call disable_TSC (obviously because it hasn't a task_t param). This 
> includes Chuck's patch to give zero runtime cost to the notsc feature.

please send a patch-queue that is properly split-up: the bugfix, the API 
change and the TIF_NOTSC improvement.

	Ingo


* Re: [PATCH] TIF_NOTSC and SECCOMP prctl
  2006-07-26  8:07     ` Ingo Molnar
@ 2006-07-26 11:45       ` andrea
  0 siblings, 0 replies; 73+ messages in thread
From: andrea @ 2006-07-26 11:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Chuck Ebbert, bruce@andrew.cmu.edu, Alan Cox,
	Arjan van de Ven, Adrian Bunk, Lee Revell, Linus Torvalds

On Wed, Jul 26, 2006 at 10:07:39AM +0200, Ingo Molnar wrote:
> 
> * andrea@cpushare.com <andrea@cpushare.com> wrote:
> 
> > Here a repost of the last seccomp patch against current mainline 
> > including the preempt fix. This changes the seccomp API from 
> > /proc/<pid>/seccomp to a prctl (this will produce a smaller kernel) 
> > and it adds a TIF_NOTSC that seccomp sets. Only the current task can 
> > call disable_TSC (obviously because it hasn't a task_t param). This 
> > includes Chuck's patch to give zero runtime cost to the notsc feature.
> 
> please send a patch-queue that is properly split-up: the bugfix, the API 
> change and the TIF_NOTSC improvement.

Which bugfix do you mean? If you mean the preempt fix for the NOTSC
improvement it makes no sense to split it up from the NOTSC
part. There are no other bugfixes (the reduction of the notsc window
isn't strictly a bugfix, since the feature already helped).
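
For readers following the thread, the context-switch check that the NOTSC part relies on can be sketched in user space. The bit numbers and the two masks below are copied from the thread_info.h hunk of the patch. The helper names needs_switch_to_xtra() and tsd_change() are hypothetical, and tsd_change() only mirrors the logic of the removed disable_tsc() code; it is an assumption about the slow path, not the patch's actual __switch_to_xtra() body.

```c
#include <stdint.h>

/* Bit numbers as defined in the patch's include/asm-i386/thread_info.h hunk. */
#define TIF_DEBUG      17
#define TIF_IO_BITMAP  18
#define TIF_NOTSC      19

#define _TIF_DEBUG     (1u << TIF_DEBUG)
#define _TIF_IO_BITMAP (1u << TIF_IO_BITMAP)
#define _TIF_NOTSC     (1u << TIF_NOTSC)

/* Masks checked in __switch_to(), as in the patch. */
#define _TIF_WORK_CTXSW_NEXT (_TIF_IO_BITMAP | _TIF_NOTSC | _TIF_DEBUG)
#define _TIF_WORK_CTXSW_PREV (_TIF_IO_BITMAP | _TIF_NOTSC)

/*
 * Hypothetical helper: returns 1 when the switch has to leave the fast
 * path, i.e. when either task carries a flag from its respective mask.
 */
static inline int needs_switch_to_xtra(uint32_t prev_flags, uint32_t next_flags)
{
	return ((prev_flags & _TIF_WORK_CTXSW_PREV) ||
		(next_flags & _TIF_WORK_CTXSW_NEXT)) ? 1 : 0;
}

/*
 * Hypothetical helper (assumption): what should happen to CR4.TSD.
 *   +1  set TSD   (next task must not read the TSC, prev could)
 *   -1  clear TSD (prev task could not read the TSC, next may)
 *    0  leave CR4 untouched
 */
static inline int tsd_change(uint32_t prev_flags, uint32_t next_flags)
{
	int prev_notsc = !!(prev_flags & _TIF_NOTSC);
	int next_notsc = !!(next_flags & _TIF_NOTSC);

	return next_notsc - prev_notsc;
}
```

When neither task has any of these flags set, needs_switch_to_xtra() returns 0, which is the case the "zero runtime cost" remark refers to: a single combined flag test replaces the unconditional disable_tsc() call.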

I can split the API change from the NOTSC feature. I'll wait a few more
days in the hope that this one goes in. If it doesn't, I'll follow
your suggestion and try again later with the split-up patches, in the
hope of improving my chances.

From my point of view it's not urgent to merge it; it's the
anti-seccomp advocates who should want this patch merged
urgently.
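
As an illustration of the API change discussed above, here is a minimal user-space model of the set-once semantics that prctl_set_seccomp() has in the patch. It is a sketch, not kernel code: struct task_model and the model_* functions are hypothetical names, and the tif_notsc field assumes that disable_TSC() marks the task with TIF_NOTSC, as the patch description says seccomp does.

```c
#include <errno.h>

#define NR_SECCOMP_MODES 1	/* same value as in kernel/seccomp.c */

/* Hypothetical stand-in for the per-task state touched by the patch. */
struct task_model {
	unsigned int seccomp_mode;	/* models current->seccomp.mode */
	int tif_seccomp;		/* models TIF_SECCOMP */
	int tif_notsc;			/* models TIF_NOTSC (assumption) */
};

/* Models prctl(PR_GET_SECCOMP): report the current mode. */
static inline long model_get_seccomp(const struct task_model *t)
{
	return t->seccomp_mode;
}

/* Models prctl(PR_SET_SECCOMP, mode): the mode can be set only once. */
static inline long model_set_seccomp(struct task_model *t, unsigned long mode)
{
	/* Already in a seccomp mode: refuse any further change. */
	if (t->seccomp_mode)
		return -EPERM;

	/* Mode 0 and modes above NR_SECCOMP_MODES are rejected. */
	if (!mode || mode > NR_SECCOMP_MODES)
		return -EINVAL;

	t->seccomp_mode = (unsigned int)mode;
	t->tif_seccomp = 1;
	t->tif_notsc = 1;
	return 0;
}
```

In this model, as in the patch, the order of the checks matters: a task that is already in seccomp mode gets -EPERM even for an otherwise invalid argument, so it cannot probe or change its mode again.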


end of thread, other threads:[~2006-07-26 11:44 UTC | newest]

Thread overview: 73+ messages
2006-06-29 19:21 [2.6 patch] let CONFIG_SECCOMP default to n Adrian Bunk
2006-06-30  0:44 ` Lee Revell
2006-06-30  1:07   ` Andrew Morton
2006-06-30  1:40     ` Adrian Bunk
2006-06-30  4:52       ` Andrea Arcangeli
2006-06-30  9:47         ` Ingo Molnar
2006-06-30 14:58           ` andrea
2006-07-11  7:36             ` [patch] " Ingo Molnar
2006-07-11 14:17               ` andrea
2006-07-11 14:32                 ` Arjan van de Ven
2006-07-11 15:31                   ` andrea
2006-07-11 15:54                     ` Arjan van de Ven
2006-07-11 16:13                       ` andrea
2006-07-11 16:23                         ` Arjan van de Ven
2006-07-11 16:57                         ` Alan Cox
2006-07-11 16:25                       ` Alan Cox
2006-07-11 16:02                     ` Adrian Bunk
2006-07-11 16:16                       ` andrea
2006-07-11 16:24                     ` Alan Cox
2006-07-12 15:43                       ` Andi Kleen
2006-07-12 21:07                         ` Ingo Molnar
2006-07-12 22:06                           ` Andi Kleen
2006-07-12 22:19                             ` Ingo Molnar
2006-07-12 22:33                               ` Andi Kleen
2006-07-12 22:49                                 ` Ingo Molnar
2006-07-13  3:16                               ` Andrea Arcangeli
2006-07-13 11:23                                 ` Jeff Dike
2006-07-13 11:35                                   ` Ingo Molnar
2006-07-13  3:04                             ` Andrea Arcangeli
2006-07-13  3:12                               ` Linus Torvalds
2006-07-13  4:40                                 ` Andrea Arcangeli
2006-07-13  4:51                                   ` andrea
2006-07-13  5:12                                   ` Linus Torvalds
2006-07-13  6:22                                     ` andrea
2006-07-13  1:51                           ` Andrew Morton
2006-07-13  2:00                             ` Linus Torvalds
2006-07-13  7:44                             ` James Bruce
2006-07-13  8:34                               ` andrea
2006-07-13  9:18                                 ` Andrew Morton
2006-07-14  6:09                                   ` [PATCH] TIF_NOTSC and SECCOMP prctl andrea
2006-07-14  6:27                                     ` Andrew Morton
2006-07-14  6:33                                       ` andrea
2006-07-13 12:13                             ` [patch] let CONFIG_SECCOMP default to n Andi Kleen
2006-07-12 21:22                         ` Ingo Molnar
2006-07-12 22:11                           ` Andi Kleen
2006-07-11 15:54                 ` Pavel Machek
2006-06-30 12:39       ` [2.6 patch] " Alan Cox
2006-06-30  2:35     ` Randy.Dunlap
2006-06-30 15:03       ` Lee Revell
2006-07-08  9:23         ` Andrea Arcangeli
2006-07-11  1:59           ` Andrew James Wade
2006-07-11  4:16             ` andrea
2006-07-11 20:19               ` Andrew James Wade
2006-07-12 21:05                 ` andrea
2006-07-12 22:02                   ` Alan Cox
2006-07-12 23:44                     ` andrea
2006-07-13 21:29                       ` Pavel Machek
2006-07-13 23:11                         ` andrea
2006-07-13 23:20                           ` Pavel Machek
2006-07-14  0:34                             ` andrea
2006-07-15  2:55                           ` Valdis.Kletnieks
2006-07-16  0:51                             ` andrea
2006-07-16  1:54                               ` Pavel Machek
2006-07-16 15:36                                 ` andrea
2006-07-13  2:56                     ` Andrew James Wade
2006-07-12 21:13                 ` Ingo Molnar
2006-07-13  1:16                   ` andrea
2006-07-13  1:37                   ` Andrew James Wade
  -- strict thread matches above, loose matches on Subject: below --
2006-07-18 10:20 [PATCH] TIF_NOTSC and SECCOMP prctl Chuck Ebbert
2006-07-18 13:29 ` andrea
2006-07-25 21:44   ` andrea
2006-07-26  8:07     ` Ingo Molnar
2006-07-26 11:45       ` andrea
