LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: James Morris @ 2011-05-13  0:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Peter Zijlstra, microblaze-uclinux,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner,
	Roland McGrath, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110512130104.GA2912@elte.hu>

On Thu, 12 May 2011, Ingo Molnar wrote:
> Funnily enough, back then you wrote this:
> 
>   " I'm concerned that we're seeing yet another security scheme being designed on 
>     the fly, without a well-formed threat model, and without taking into account 
>     lessons learned from the seemingly endless parade of similar, failed schemes. "
> 
> so when and how did your opinion of this scheme turn from it being an "endless 
> parade of failed schemes" to it being a "well-defined and readily 
> understandable feature"? :-)

When it was defined in a way which limited its purpose to reducing the 
attack surface of the syscall interface.


- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: fsl_udc_core: BUG: scheduling while atomic
From: Sergej.Stepanov @ 2011-05-13  8:34 UTC (permalink / raw)
  To: mlcreech; +Cc: linuxppc-dev
In-Reply-To: <BANLkTim8ntQHkcm_yOgD+UQQb+b+tSoFqA@mail.gmail.com>

I would say it is a general problem when using CONFIG_PREEMPT_VOLUNTARY,
not only with Freescale...

On Thursday, 12.05.2011 at 11:30 -0400, Matthew L. Creech wrote:
> On Thu, May 12, 2011 at 4:37 AM,  <Sergej.Stepanov@ids.de> wrote:
> > Hi Matthew,
> >
> > such oops you can get also with spi.
> > For such problem helps to compile your kernel with other preemption
> > model:
> >  - preempt
> >  - standard
> >  - !!! but not voluntary preemption !!!
>
> Thanks Sergej, indeed I'm currently using CONFIG_PREEMPT_VOLUNTARY on
> this board.  I'll change it to fix this problem for now.
>
> Do you happen to know whether the Freescale folks intend to fix this?
> If not, it seems like at least some sort of warning is in order.
>
> --
> Matthew L. Creech

^ permalink raw reply

* RE: [linuxppc-release] [PATCH 1/2] powerpc, e5500: add networking to defconfig
From: Li Yang-R58472 @ 2011-05-13 10:36 UTC (permalink / raw)
  To: Wood Scott-B07421; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20110512103135.2c2a037a@schlenkerla.am.freescale.net>

>Subject: Re: [linuxppc-release] [PATCH 1/2] powerpc, e5500: add networking
>to defconfig
>
>On Thu, 12 May 2011 10:31:08 -0500
>Scott Wood <scottwood@freescale.com> wrote:
>
>> On Thu, 12 May 2011 01:11:03 -0500
>> Li Yang-R58472 <R58472@freescale.com> wrote:
>>
>> > >diff --git a/arch/powerpc/configs/e55xx_smp_defconfig
>> > >b/arch/powerpc/configs/e55xx_smp_defconfig
>> > >index 9fa1613..f4c5780 100644
>> > >--- a/arch/powerpc/configs/e55xx_smp_defconfig
>> > >+++ b/arch/powerpc/configs/e55xx_smp_defconfig
>> > >@@ -6,10 +6,10 @@ CONFIG_NR_CPUS=2
>> > > CONFIG_EXPERIMENTAL=y
>> > > CONFIG_SYSVIPC=y
>> > > CONFIG_BSD_PROCESS_ACCT=y
>> > >+CONFIG_SPARSE_IRQ=y
>> >
>> > Hi Scott,
>> >
>> > I remember in previous testing that this option has a negative effect
>> > on performance.  Do we really need it to be enabled?
>>
>> I didn't change this setting, it just moved due to running it through
>> savedefconfig.
>
>What was the performance impact?

It adds CPU cycles to the interrupt handling path.  It will cause a performance
drop for benchmarks with a large number of interrupts, such as IP forwarding.

- Leo

^ permalink raw reply

* [PATCH] Fix for Pegasos keyboard and mouse
From: Gabriel Paubert @ 2011-05-13 11:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: pacman, linuxppc-dev

[See http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-October/086424.html
and followups. Part of the commit message is directly copied from that.]

Commit 540c6c392f01887dcc96bef0a41e63e6c1334f01 tries to find i8042 IRQs in
the device-tree but doesn't fall back to the old hardcoded 1 and 12 in all
failure cases.

Specifically, the case where the device-tree contains nothing matching
pnpPNP,303 or pnpPNP,f03 doesn't seem to be handled well. It sort of falls
through to the old code, but leaves the IRQs set to 0.

Signed-off-by: Gabriel Paubert <paubert@iram.es>

---

This fix has only been tested on Pegasos, but to my knowledge it only
affects a Pegasos-specific path (all other firmwares should be able
to find the keyboard through the pnp identifiers).

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 21f30cb..6c7abbf 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -602,6 +602,10 @@ int check_legacy_ioport(unsigned long base_port)
 		 * name instead */
 		if (!np)
 			np = of_find_node_by_name(NULL, "8042");
+		if (np) {
+			of_i8042_kbd_irq = 1;
+			of_i8042_aux_irq = 12;
+		}
 		break;
 	case FDC_BASE: /* FDC1 */
 		np = of_find_node_by_type(NULL, "fdc");

^ permalink raw reply related

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-13 12:10 UTC (permalink / raw)
  To: James Morris
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Peter Zijlstra, microblaze-uclinux,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner,
	Roland McGrath, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <alpine.LRH.2.00.1105131018040.3047@tundra.namei.org>

* James Morris <jmorris@namei.org> wrote:

> On Thu, 12 May 2011, Ingo Molnar wrote:
> > Funnily enough, back then you wrote this:
> > 
> >   " I'm concerned that we're seeing yet another security scheme being designed on 
> >     the fly, without a well-formed threat model, and without taking into account 
> >     lessons learned from the seemingly endless parade of similar, failed schemes. "
> > 
> > so when and how did your opinion of this scheme turn from it being an 
> > "endless parade of failed schemes" to it being a "well-defined and readily 
> > understandable feature"? :-)
> 
> When it was defined in a way which limited its purpose to reducing the attack 
> surface of the syscall interface.

Let me outline a simple example of a new filter expression based security 
feature that could be implemented outside the narrow system call boundary you 
find acceptable, and please tell me what is bad about it.

Say i'm a user-space sandbox developer who wants to enforce that sandboxed code 
should only be allowed to open files in /home/sandbox/, /lib/ and /usr/lib/.

It is a simple and sensible security feature, agreed? It allows most code to 
run well and link to countless libraries - but no access to other files is 
allowed.

I would also like my sandbox app to be able to install this policy without 
having to be root. I do not want the sandbox app to have permission to create 
labels on /lib and /usr/lib and what not.

Firstly, using the filter code i deny the various link creation syscalls so 
that sandboxed code cannot escape for example by creating a symlink to outside 
the permitted VFS namespace. (Note: we opt-in to syscalls, that way new 
syscalls added by new kernels are denied by default. The current symlink 
creation syscalls are not opted in to.)

But the next step, actually checking filenames, poses a big hurdle: i cannot 
implement the filename checking at the sys_open() syscall level in a secure 
way: because the pathname is passed to sys_open() by pointer, and if i check it 
at the generic sys_open() syscall level, another thread in the sandbox might 
modify the underlying filename *after* i've checked it.

But if i had a VFS event at the fs/namei.c::getname() level, i would have 
access to a central point where the VFS string becomes stable to the kernel and 
can be checked (and denied if necessary).
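
To make the race and its fix concrete, here is a minimal user-space sketch in C. It is not kernel code and not part of the patchset: stable_copy() is a hypothetical stand-in for what getname() achieves (copying the caller-supplied string into memory the caller can no longer modify), and check_path() is a hypothetical policy check.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PATH_BUF 256

/* Hypothetical policy: only paths below /home/sandbox/ are allowed. */
static int check_path(const char *name)
{
	return strncmp(name, "/home/sandbox/", 14) ? -EACCES : 0;
}

/*
 * Hypothetical stand-in for getname(): copy the caller-controlled string
 * into a private buffer. After this point another thread may scribble
 * over 'shared' as much as it likes; 'stable' no longer changes.
 */
static int stable_copy(char *stable, size_t len, const char *shared)
{
	const char *end = memchr(shared, '\0', len);

	if (!end)
		return -ENAMETOOLONG;
	memcpy(stable, shared, (size_t)(end - shared) + 1);
	return 0;
}

/* Check the private copy, never the shared buffer. */
static int open_checked(const char *shared, char *stable, size_t len)
{
	int err = stable_copy(stable, len, shared);

	if (err)
		return err;
	return check_path(stable);
}
```

Checking 'shared' directly would leave a window between the check and the later use of the string; checking 'stable' closes it, because the checked bytes are the same bytes that get used.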

A sidenote, and not surprisingly, the audit subsystem already has an event 
callback there:

        audit_getname(result);

Unfortunately this audit callback cannot be used for my purposes, because the 
event is single-purpose for auditd and because it allows no feedback (no 
deny/accept discretion for the security policy).

But if i had this simple event there:

	err = event_vfs_getname(result);

I could implement this new filename based sandboxing policy, using a filter 
like this installed on the vfs::getname event and inherited by all sandboxed 
tasks (which cannot uninstall the filter, obviously):

  "
	if (strstr(name, ".."))
		return -EACCES;

	if (strncmp(name, "/home/sandbox/", 14) &&
	    strncmp(name, "/lib/", 5) &&
	    strncmp(name, "/usr/lib/", 9))
		return -EACCES;

  "

  #
  # Note1: Obviously the filter engine would be extended to allow such simple string
  #        match functions.
  #
  # Note2: ".." is disallowed so that sandboxed code cannot escape the restrictions
  #         using "/..".
  #
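
For reference, the intended policy can be written as an ordinary C function and exercised in user space. This is only a sketch: sandbox_filter() is a hypothetical name, and the real filter engine would evaluate an expression rather than compiled C. Note that strncmp() returns 0 on a match, so "deny unless one prefix matches" means all three calls must return nonzero.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: return 0 if 'name' lies inside the sandbox's
 * permitted VFS namespace, -EACCES otherwise.
 */
static int sandbox_filter(const char *name)
{
	/* Reject any ".." so the prefix check cannot be bypassed via "/..". */
	if (strstr(name, ".."))
		return -EACCES;

	/* Deny unless the name starts with one of the allowed prefixes. */
	if (strncmp(name, "/home/sandbox/", 14) &&
	    strncmp(name, "/lib/", 5) &&
	    strncmp(name, "/usr/lib/", 9))
		return -EACCES;

	return 0;
}
```

With this sketch, relative paths are rejected as well, since they match none of the absolute prefixes.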

This kind of flexible and dynamic sandboxing would allow a wide range of file 
ops within the sandbox, while still isolating it from files not included in the 
specified VFS namespace.

( Note that there are tons of other examples as well, for useful security features
  that are best done using events outside the syscall boundary. )

The security event filters code tied to seccomp and syscalls at the moment is 
useful, but limited in its future potential.

So i argue that it should go slightly further and should become:

 - unprivileged:  application-definable, allowing the embedding of security 
                  policy in *apps* as well, not just the system

 - flexible:      can be added/removed at runtime, unprivileged, and cheaply so

 - transparent:   does not impact executing code that meets the policy

 - nestable:      it is inherited by child tasks and is fundamentally stackable,
                  multiple policies will have the combined effect and they
                  are transparent to each other. So if a child task within a
                  sandbox adds *more* checks then those add to the already
                  existing set of checks. We only narrow permissions, never
                  extend them.

 - generic:       allowing observation and (safe) control of security relevant
                  parameters not just at the system call boundary but at other
                  relevant places of kernel execution as well: which 
                  points/callbacks could also be used for other types of event 
                  extraction such as perf. It could even be shared with audit ...
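
The "nestable" property in particular can be sketched in a few lines of user-space C. Everything here is hypothetical (struct policy, policy_add(), policy_inherit() and the two example filters are invented for illustration); the point is only that filters can be added but never removed, and that a child starts from a copy of its parent's set.

```c
#include <errno.h>
#include <string.h>

#define MAX_FILTERS 8

typedef int (*filter_fn)(const char *name);

/* Hypothetical per-task policy: a stack of filters that can only grow. */
struct policy {
	filter_fn filters[MAX_FILTERS];
	int nr;
};

/* Push one more check; there is deliberately no "remove" operation. */
static int policy_add(struct policy *p, filter_fn fn)
{
	if (p->nr == MAX_FILTERS)
		return -ENOSPC;
	p->filters[p->nr++] = fn;
	return 0;
}

/* A child task starts with a copy of the parent's stack (inheritance). */
static struct policy policy_inherit(const struct policy *parent)
{
	return *parent;
}

/* Access is allowed only if every stacked filter allows it. */
static int policy_check(const struct policy *p, const char *name)
{
	for (int i = 0; i < p->nr; i++) {
		int err = p->filters[i](name);

		if (err)
			return err;
	}
	return 0;
}

/* Two example filters, both hypothetical. */
static int only_sandbox_home(const char *name)
{
	return strncmp(name, "/home/sandbox/", 14) ? -EACCES : 0;
}

static int no_dotfiles(const char *name)
{
	return strstr(name, "/.") ? -EACCES : 0;
}
```

A child that adds no_dotfiles on top of an inherited only_sandbox_home ends up strictly more restricted than its parent, and the parent's own policy is unaffected.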

I argue that this is the LSM and audit subsystems designed right: in the long 
run it could allow everything that LSM does at the moment - and so much more 
...

And you argue that allowing this would be bad, if it was extended like that 
then you'd consider it a failed scheme? Why?

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 12:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <20110513121034.GG21022@elte.hu>

On Fri, 2011-05-13 at 14:10 +0200, Ingo Molnar wrote:
>         err = event_vfs_getname(result);

I really think we should not do this. Events like we have them should be
inactive, totally passive entities, only observe but not affect
execution (other than the bare minimal time delay introduced by
observance).

If you want another entity that is more active, please invent a new name
for it and create a new subsystem for them, now you could have these
active entities also have an (automatic) passive event side, but that's
some detail.

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-13 12:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <1305289146.2466.8.camel@twins>


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-05-13 at 14:10 +0200, Ingo Molnar wrote:
> >         err = event_vfs_getname(result);
> 
> I really think we should not do this. Events like we have them should be 
> inactive, totally passive entities, only observe but not affect execution 
> (other than the bare minimal time delay introduced by observance).

Well, this patchset already demonstrates that we can use a single event 
callback for a rather useful purpose.

Either it makes sense to do, in which case we should share facilities as much 
as possible, or it makes no sense, in which case we should not merge it at all.

> If you want another entity that is more active, please invent a new name for 
> it and create a new subsystem for them, now you could have these active 
> entities also have an (automatic) passive event side, but that's some detail.

Why should we have two callbacks next to each other:

	event_vfs_getname(result);
	result = check_event_vfs_getname(result);

if one could do it all?

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 12:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <20110513122646.GA3924@elte.hu>

On Fri, 2011-05-13 at 14:26 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Fri, 2011-05-13 at 14:10 +0200, Ingo Molnar wrote:
> > >         err = event_vfs_getname(result);
> >
> > I really think we should not do this. Events like we have them should be
> > inactive, totally passive entities, only observe but not affect execution
> > (other than the bare minimal time delay introduced by observance).
>
> Well, this patchset already demonstrates that we can use a single event
> callback for a rather useful purpose.

Can and should are two distinct things.

> Either it makes sense to do, in which case we should share facilities as much
> as possible, or it makes no sense, in which case we should not merge it at all.

And I'm arguing we should _not_. Observing is radically different from
Affecting, at the very least the two things should have different
permission schemes. We should not confuse these two matters.

> > If you want another entity that is more active, please invent a new name for
> > it and create a new subsystem for them, now you could have these active
> > entities also have an (automatic) passive event side, but that's some detail.
>
> Why should we have two callbacks next to each other:
>
> 	event_vfs_getname(result);
> 	result = check_event_vfs_getname(result);
>
> if one could do it all?

Did you actually read the bit where I said that check_event_* (although
I still think that name sucks) could imply a matching event_*?
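
A rough user-space sketch of that idea, with simplified and hypothetical signatures (the functions below take a plain string and are not the ones from the thread or the patchset): the active hook fires the passive event itself, so a call site only ever needs the one call.

```c
#include <errno.h>
#include <string.h>

/* Passive side: observers only count/record, they cannot fail. */
static unsigned long getname_events;

static void event_vfs_getname(const char *name)
{
	(void)name;
	getname_events++;
}

/* Active side: a hypothetical policy that may veto the name. */
static int policy_vfs_getname(const char *name)
{
	return strstr(name, "..") ? -EACCES : 0;
}

/*
 * Single call site: the active hook implies the matching passive event,
 * so callers never place event_*() and check_event_*() side by side.
 */
static int check_event_vfs_getname(const char *name)
{
	event_vfs_getname(name);
	return policy_vfs_getname(name);
}
```

The two sides stay separate functions with separate semantics here; only the call site is shared.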

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 12:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <1305290370.2466.14.camel@twins>

On Fri, 2011-05-13 at 14:39 +0200, Peter Zijlstra wrote:
>
> >       event_vfs_getname(result);
> >       result = check_event_vfs_getname(result);

Another fundamental difference is how to treat the callback chains for
these two.

Observers won't have a return value and are assumed to never fail,
therefore we can always call every entry on the callback list.

Active things otoh do have a return value, and thus we need to have
semantics that define what to do with that during callback iteration,
when to continue and when to break. Thus for active elements it's
impossible to guarantee all entries will indeed be called.
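
The difference can be shown with two small chain walkers in user-space C. This is an illustrative sketch with hypothetical types and callbacks, and it assumes "stop at the first error" as the active semantics, which is only one of the possible choices.

```c
#include <errno.h>
#include <stddef.h>

typedef void (*observer_fn)(const char *name);
typedef int (*active_fn)(const char *name);

/* Observers cannot fail, so every entry is always called. */
static void call_observers(const observer_fn *chain, size_t nr, const char *name)
{
	for (size_t i = 0; i < nr; i++)
		chain[i](name);
}

/*
 * Active callbacks return a value. With "stop at first error" semantics,
 * entries after the failing one are never reached.
 */
static int call_active(const active_fn *chain, size_t nr, const char *name)
{
	for (size_t i = 0; i < nr; i++) {
		int err = chain[i](name);

		if (err)
			return err;
	}
	return 0;
}

/* Hypothetical callbacks that record whether they ran. */
static int seen_a, seen_b;

static void obs_a(const char *name) { (void)name; seen_a++; }
static void obs_b(const char *name) { (void)name; seen_b++; }
static int act_deny(const char *name) { (void)name; seen_a++; return -EACCES; }
static int act_allow(const char *name) { (void)name; seen_b++; return 0; }
```

Walking { obs_a, obs_b } runs both callbacks, while walking { act_deny, act_allow } returns -EACCES without ever calling act_allow.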

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-13 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <1305290370.2466.14.camel@twins>

* Peter Zijlstra <peterz@infradead.org> wrote:

> > Why should we have two callbacks next to each other:
> > 
> > 	event_vfs_getname(result);
> > 	result = check_event_vfs_getname(result);
> > 
> > if one could do it all?
> 
> Did you actually read the bit where I said that check_event_* (although
> I still think that name sucks) could imply a matching event_*?

No, did not notice that - and yes that solves this particular problem.

So given that by your own admission it makes sense to share the facilities at 
the low level, i also argue that it makes sense to share as high up as 
possible.

Are you perhaps arguing for a ->observe flag that would make 100% sure that the 
default behavior for events is observe-only? That would make sense indeed.

Otherwise both cases really want to use all the same facilities for event 
discovery, setup, control and potential extraction of events.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-13 12:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <1305290612.2466.17.camel@twins>

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-05-13 at 14:39 +0200, Peter Zijlstra wrote:
> > 
> > >       event_vfs_getname(result);
> > >       result = check_event_vfs_getname(result); 
> 
> Another fundamental difference is how to treat the callback chains for
> these two.
> 
> Observers won't have a return value and are assumed to never fail,
> therefore we can always call every entry on the callback list.
> 
> Active things otoh do have a return value, and thus we need to have
> semantics that define what to do with that during callback iteration,
> when to continue and when to break. Thus for active elements it's
> impossible to guarantee all entries will indeed be called.

I think the sanest semantics is to run all active callbacks as well.

For example if this is used for three stacked security policies - as if 3 LSM 
modules were stacked at once. We'd call all three, and we'd determine that at 
least one failed - and we'd return a failure.

Even if the first one failed already we'd still want to trigger *all* the 
failures, because security policies like to know when they have triggered a 
failure (regardless of other active policies) and want to see that failure 
event (if they are logging such events).

So to me this looks pretty similar to observer callbacks as well, it's the 
natural extension to an observer callback chain.

Observer callbacks are simply constant functions (to the caller), those which 
never return failure and which never modify any of the parameters.

It's as if you argued that there should be separate syscalls/facilities for 
handling readonly files versus handling read/write files.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 13:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <20110513125452.GD3924@elte.hu>

On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> I think the sanest semantics is to run all active callbacks as well.
>
> For example if this is used for three stacked security policies - as if 3 LSM
> modules were stacked at once. We'd call all three, and we'd determine that at
> least one failed - and we'd return a failure.

But that only works for boolean functions where you can return the
multi-bit-or of the result. What if you need to return the specific
error code?

Also, there's bound to be other cases where people will want to employ
this, look at all the various notifier chain muck we've got, it already
deals with much of this -- simply because users need it.

Then there's the whole indirection argument, if you don't need
indirection, it's often better to not use it, I myself much prefer code
to look like:

   foo1(bar);
   foo2(bar);
   foo3(bar);

Than:

   foo_notifier(bar);

Simply because it's much clearer who all are involved without me having
to grep around to see who registers for foo_notifier and wth they do
with it. It also makes it much harder to sneak in another user, whereas
it's nearly impossible to find new notifier users.

It's also much faster, no extra memory accesses, no indirect function
calls, no other muck.

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-13 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <1305292132.2466.26.camel@twins>


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > I think the sanest semantics is to run all active callbacks as well.
> > 
> > For example if this is used for three stacked security policies - as if 3 LSM 
> > modules were stacked at once. We'd call all three, and we'd determine that at 
> > least one failed - and we'd return a failure. 
> 
> But that only works for boolean functions where you can return the
> multi-bit-or of the result. What if you need to return the specific
> error code?

Do you mean that one filter returns -EINVAL while the other -EACCES?

Seems like a non-problem to me, we'd return the first nonzero value.
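
As a sketch of what that could look like (user-space C, hypothetical names throughout, not code from the patchset): every policy on the chain is called even after an earlier one has failed, so each policy can see and log its own rejection, and the caller gets the first nonzero result.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*policy_fn)(const char *name);

/*
 * Call every policy even after one has failed, so each policy gets to
 * see (and log) its own rejection; report the first nonzero result.
 */
static int run_all_policies(const policy_fn *chain, size_t nr, const char *name)
{
	int ret = 0;

	for (size_t i = 0; i < nr; i++) {
		int err = chain[i](name);

		if (err && !ret)
			ret = err;
	}
	return ret;
}

/* Three hypothetical stacked policies; each counts its own rejections. */
static int denials[3];

static int policy_a(const char *name) { (void)name; return 0; }
static int policy_b(const char *name) { (void)name; denials[1]++; return -EINVAL; }
static int policy_c(const char *name) { (void)name; denials[2]++; return -EACCES; }
```

With the chain { policy_a, policy_b, policy_c } the caller sees -EINVAL, while policy_c still runs and records its own -EACCES.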

> Also, there's bound to be other cases where people will want to employ
> this, look at all the various notifier chain muck we've got, it already
> deals with much of this -- simply because users need it.

Do you mean it would be easy to abuse it? What kind of abuse are you most 
worried about?

> Then there's the whole indirection argument, if you don't need
> indirection, it's often better to not use it, I myself much prefer code
> to look like:
> 
>    foo1(bar);
>    foo2(bar);
>    foo3(bar);
> 
> Than:
> 
>    foo_notifier(bar);
> 
> Simply because it's much clearer who all are involved without me having
> to grep around to see who registers for foo_notifier and wth they do
> with it. It also makes it much harder to sneak in another user, whereas
> it's nearly impossible to find new notifier users.
> 
> It's also much faster, no extra memory accesses, no indirect function
> calls, no other muck.

But i suspect this question has been settled, given the fact that even pure 
observer events need and already process a chain of events? Am i missing 
something about your argument?

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 13:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <20110513124902.GC3924@elte.hu>

On Fri, 2011-05-13 at 14:49 +0200, Ingo Molnar wrote:
>
> So given that by your own admission it makes sense to share the facilities at
> the low level, i also argue that it makes sense to share as high up as
> possible.

I'm not saying any such thing, I'm saying that it might make sense to
observe active objects and auto-create these observation points. That
doesn't make them similar or make them share anything.

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 13:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Steven Rostedt, Martin Schwidefsky,
	Thomas Gleixner, Roland McGrath, Michal Marek, Michal Simek,
	Will Drewry, linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Tejun Heo, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110513131800.GA7883@elte.hu>

Cut the microblaze list since it's bouncy.

On Fri, 2011-05-13 at 15:18 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > > I think the sanest semantics is to run all active callbacks as well.
> > >
> > > For example if this is used for three stacked security policies - as if 3 LSM
> > > modules were stacked at once. We'd call all three, and we'd determine that at
> > > least one failed - and we'd return a failure.
> >
> > But that only works for boolean functions where you can return the
> > multi-bit-or of the result. What if you need to return the specific
> > error code?
>
> Do you mean that one filter returns -EINVAL while the other -EACCES?
>
> Seems like a non-problem to me, we'd return the first nonzero value.

Assuming the first is -EINVAL, what then is the value in computing the
-EACCES? Sounds like a massive waste of time to me.

> > Also, there's bound to be other cases where people will want to employ
> > this, look at all the various notifier chain muck we've got, it already
> > deals with much of this -- simply because users need it.
>=20
> Do you mean it would be easy to abuse it? What kind of abuse are you most
> worried about?

I'm not worried about abuse, I'm saying that going by the existing
notifier pattern, always visiting all entries on the callback list is
undesired.

> > Then there's the whole indirection argument, if you don't need
> > indirection, it's often better to not use it, I myself much prefer code
> > to look like:
> >
> >    foo1(bar);
> >    foo2(bar);
> >    foo3(bar);
> >
> > Than:
> >
> >    foo_notifier(bar);
> >
> > Simply because it's much clearer who all are involved without me having
> > to grep around to see who registers for foo_notifier and wth they do
> > with it. It also makes it much harder to sneak in another user, whereas
> > it's nearly impossible to find new notifier users.
> >
> > It's also much faster, no extra memory accesses, no indirect function
> > calls, no other muck.
>
> But i suspect this question has been settled, given the fact that even pure
> observer events need and already process a chain of events? Am i missing
> something about your argument?

I'm saying that there are reasons not to use notifiers, passive or active.

Mostly the whole notifier/indirection muck comes up once you want
modules to make use of the thing, because then you need dynamic
management of the callback list.

(Then again, I'm fairly glad we don't have explicit callbacks in
kernel/cpu.c for all the cpu-hotplug callbacks :-)

Anyway, I oppose the existing events gaining an active role.

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-13 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Steven Rostedt, Martin Schwidefsky,
	Thomas Gleixner, Roland McGrath, Michal Marek, Michal Simek,
	Will Drewry, linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Tejun Heo, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <1305294935.2466.64.camel@twins>


* Peter Zijlstra <peterz@infradead.org> wrote:

> Cut the microblaze list since its bouncy.
> 
> On Fri, 2011-05-13 at 15:18 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > > > I think the sanest semantics is to run all active callbacks as well.
> > > > 
> > > > For example if this is used for three stacked security policies - as if 3 LSM 
> > > > modules were stacked at once. We'd call all three, and we'd determine that at 
> > > > least one failed - and we'd return a failure. 
> > > 
> > > But that only works for boolean functions where you can return the
> > > multi-bit-or of the result. What if you need to return the specific
> > > error code.
> > 
> > Do you mean that one filter returns -EINVAL while the other -EACCES?
> > 
> > Seems like a non-problem to me, we'd return the first nonzero value.
> 
> Assuming the first is -EINVAL, what then is the value in computing the
> -EACCES? Sounds like a massive waste of time to me.

No, because the common case is no rejection - this is a security mechanism. So 
in the normal case we would execute all 3 anyway, just to determine that all 
return 0.

Are you really worried about the abnormal case of one of them returning an 
error and us calculating all 3 return values?
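
The "run every filter, return the first nonzero value" semantics can be
sketched in user-space C as follows. This is only an illustration under
assumptions: filter_fn, run_all_filters and the three example policies are
hypothetical names, not existing kernel interfaces.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical filter callback: 0 accepts, a negative errno denies. */
typedef int (*filter_fn)(const void *ctx);

/*
 * Call every filter, even after one has already denied the operation,
 * and return the first nonzero result (0 if all filters accept).
 */
int run_all_filters(const filter_fn *filters, size_t n, const void *ctx)
{
	int first_err = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = filters[i](ctx);

		if (ret != 0 && first_err == 0)
			first_err = ret;
	}
	return first_err;
}

/* Three example policies; the names are illustrative only. */
int allow(const void *ctx)       { (void)ctx; return 0; }
int deny_inval(const void *ctx)  { (void)ctx; return -EINVAL; }
int deny_access(const void *ctx) { (void)ctx; return -EACCES; }
```

In the common case all filters return 0, so the loop runs to completion
either way; the extra work only shows up when an earlier filter has already
denied the operation.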

> > > Also, there's bound to be other cases where people will want to employ
> > > this, look at all the various notifier chain muck we've got, it already
> > > deals with much of this -- simply because users need it.
> > 
> > Do you mean it would be easy to abuse it? What kind of abuse are you most 
> > worried about?
> 
> I'm not worried about abuse, I'm saying that going by the existing
> notifier pattern always visiting all entries on the callback list is
> undesired.

That is because many notifier chains are used in an 'event consuming' manner - 
they are responding to things like hardware events and are called in an 
interrupt-handler-like fashion most of the time.

> > > Then there's the whole indirection argument, if you don't need
> > > indirection, its often better to not use it, I myself much prefer code
> > > to look like:
> > > 
> > >    foo1(bar);
> > >    foo2(bar);
> > >    foo3(bar);
> > > 
> > > Than:
> > > 
> > >    foo_notifier(bar);
> > > 
> > > Simply because its much clearer who all are involved without me having
> > > to grep around to see who registers for foo_notifier and wth they do
> > > with it. It also makes it much harder to sneak in another user, whereas
> > > its nearly impossible to find new notifier users.
> > > 
> > > Its also much faster, no extra memory accesses, no indirect function
> > > calls, no other muck.
> > 
> > But i suspect this question has been settled, given the fact that even pure 
> > observer events need and already process a chain of events? Am i missing 
> > something about your argument?
> 
> I'm saying that there's reasons to not use notifiers passive or active.
> 
> Mostly the whole notifier/indirection muck comes up once you want
> modules to make use of the thing, because then you need dynamic
> management of the callback list.

But your argument assumes that we'd have a chain of functions to call, like 
regular notifiers.

While the natural model here would be to have a list of registered event 
structs for that point, with different filters but basically the same callback 
mechanism (a call into the filter engine in essence).

Also note that the common case would be no event registered - and we'd 
automatically optimize that case via the existing jump labels optimization.
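
A rough user-space sketch of that model: a list of registered events, each
carrying its own filter, all evaluated through one shared routine. The
struct layouts and function names are assumptions made for illustration,
and a plain NULL test stands in for the jump-label optimization, which
cannot be reproduced outside the kernel.

```c
#include <stddef.h>

/* Hypothetical record handed to the filter engine at one event point. */
struct event_record {
	long syscall_nr;
};

/* Hypothetical registered event: a filter reduced to a single field. */
struct registered_event {
	long denied_nr;			/* deny when the record matches this */
	struct registered_event *next;
};

/* Stand-in for the filter engine: every registered event goes through it. */
int filter_engine_match(const struct registered_event *ev,
			const struct event_record *rec)
{
	return rec->syscall_nr == ev->denied_nr;
}

/* Returns 0 to accept and -1 to deny. */
int check_event(const struct registered_event *head,
		const struct event_record *rec)
{
	const struct registered_event *ev;
	int denied = 0;

	if (head == NULL)		/* common case: nothing registered */
		return 0;

	for (ev = head; ev != NULL; ev = ev->next)
		if (filter_engine_match(ev, rec))
			denied = 1;	/* keep going: all events are evaluated */

	return denied ? -1 : 0;
}
```

The point of the sketch is that there is one callback (the filter engine)
and many data entries, rather than a chain of unrelated functions.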

> (Then again, I'm fairly glad we don't have explicit callbacks in kernel/cpu.c 
> for all the cpu-hotplug callbacks :-)
> 
> Anyway, I oppose the existing events gaining an active role.

Why if 'being active' is optional and useful?

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-13 15:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, microblaze-uclinux, Steven Rostedt,
	Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
	Michal Simek, Will Drewry, linuxppc-dev, linux-kernel,
	Ralf Baechle, Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
	David S. Miller
In-Reply-To: <1305294936.2466.65.camel@twins>

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-05-13 at 14:49 +0200, Ingo Molnar wrote:
> > 
> > So given that by your own admission it makes sense to share the facilities at 
> > the low level, i also argue that it makes sense to share as high up as 
> > possible. 
> 
> I'm not saying any such thing, I'm saying that it might make sense to
> observe active objects and auto-create these observation points. That
> doesn't make them similar or make them share anything.

Well, they would share the lowest level call site:

	result = check_event_vfs_getname(result);

You call it 'auto-generated call site', i call it a shared (single line) call 
site. The same thing as far as the lowest level goes.

Now (the way i understood it) you'd want to stop the sharing right after that. 
I argue that it should go all the way up.

Note: i fully agree that there should be events where filters can have no 
effect whatsoever. For example if this was written as:

	check_event_vfs_getname(result);

Then it would have no effect. This is decided by the subsystem developers, 
obviously. So whether an event is 'active' or 'passive' can be enforced at the 
subsystem level as well.

As far as the event facilities go, 'no effect observation' is a special-case of 
'active observation' - just like read-only files are a special case of 
read-write files.
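
The difference between the two call sites can be sketched in user-space C.
This is a simplified illustration, not kernel code: the body of
check_event_vfs_getname() below is a made-up stand-in, returning NULL
replaces the kernel's error-pointer convention, and getname_active() and
getname_passive() are hypothetical wrappers.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for check_event_vfs_getname(): returns the name
 * unchanged to accept it, or NULL to deny it. The deny rule is arbitrary.
 */
const char *check_event_vfs_getname(const char *name)
{
	if (name != NULL && strncmp(name, "/etc/", 5) == 0)
		return NULL;
	return name;
}

/* Active call site: the result is used, so a filter can deny. */
const char *getname_active(const char *name)
{
	const char *result = name;

	result = check_event_vfs_getname(result);
	return result;
}

/* Passive call site: the result is discarded, so filters only observe. */
const char *getname_passive(const char *name)
{
	const char *result = name;

	(void)check_event_vfs_getname(result);
	return result;
}
```

The event facility is identical in both cases; only the subsystem's call
site decides whether the filter's verdict has any effect.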

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Eric Paris @ 2011-05-13 15:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
	Serge E. Hallyn, Peter Zijlstra, Steven Rostedt, Tejun Heo,
	Thomas Gleixner, linux-arm-kernel, Michal Marek, Michal Simek,
	Will Drewry, linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110513121034.GG21022@elte.hu>

[dropping microblaze and roland]

On Fri, 2011-05-13 at 14:10 +0200, Ingo Molnar wrote:
> * James Morris <jmorris@namei.org> wrote:

> It is a simple and sensible security feature, agreed? It allows most code to 
> run well and link to countless libraries - but no access to other files is 
> allowed.

It's simple enough and sounds reasonable, but you can read all the
discussion about AppArmor to see why many people don't really think it's
the best.  Still, I'll agree it's a lot better than nothing.

> But if i had a VFS event at the fs/namei.c::getname() level, i would have 
> access to a central point where the VFS string becomes stable to the kernel and 
> can be checked (and denied if necessary).
> 
> A sidenote, and not surprisingly, the audit subsystem already has an event 
> callback there:
> 
>         audit_getname(result);
> 
> Unfortunately this audit callback cannot be used for my purposes, because the 
> event is single-purpose for auditd and because it allows no feedback (no 
> deny/accept discretion for the security policy).
> 
> But if had this simple event there:
> 
> 	err = event_vfs_getname(result);

Wow it sounds so easy.  Now let's keep extending your train of thought
until we can actually provide the security provided by SELinux.  What do
we end up with?  We end up with an event hook right next to every LSM
hook.  You know, the LSM hooks were placed where they are for a reason.
Because those were the locations inside the kernel where you actually
have information about the task doing an operation and the objects
(files, sockets, directories, other tasks, etc) they are doing an
operation on.

Honestly all you are talking about is remaking the LSM with 2 sets of
hooks instead of 1.  Why?  It seems much easier that if you want the
language of the filter engine you would just make a new LSM that uses
the filter engine for its policy language rather than the language
created by SELinux or SMACK or name your LSM implementation.

>  - unprivileged:  application-definable, allowing the embedding of security 
>                   policy in *apps* as well, not just the system
> 
>  - flexible:      can be added/removed runtime unprivileged, and cheaply so
> 
>  - transparent:   does not impact executing code that meets the policy
> 
>  - nestable:      it is inherited by child tasks and is fundamentally stackable,
>                   multiple policies will have the combined effect and they
>                   are transparent to each other. So if a child task within a
>                   sandbox adds *more* checks then those add to the already
>                   existing set of checks. We only narrow permissions, never
>                   extend them.
> 
>  - generic:       allowing observation and (safe) control of security relevant
>                   parameters not just at the system call boundary but at other
>                   relevant places of kernel execution as well: which 
>                   points/callbacks could also be used for other types of event 
>                   extraction such as perf. It could even be shared with audit ...
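
The "nestable" property quoted above (only narrow, never extend) can be
pictured with a plain bit set. This is a toy sketch: perm_mask and narrow()
are hypothetical names, and real filters are expressions rather than bits.

```c
/* Permissions as a bit set; a set bit means the operation is allowed. */
typedef unsigned long perm_mask;

/*
 * A child asks for `requested`, but can only ever keep what it inherited:
 * intersecting the two sets can clear bits, never set new ones.
 */
perm_mask narrow(perm_mask inherited, perm_mask requested)
{
	return inherited & requested;
}
```

Applying narrow() repeatedly models stacked policies: each layer can remove
permissions, and no later layer can restore what an earlier one removed.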

I'm not arguing that any of these things are bad things.  What you
describe is a new LSM that uses a discretionary access control model but
with the granularity and flexibility that has traditionally only existed
in the mandatory access control security modules previously implemented
in the kernel.

I won't argue that's a bad idea, there's no reason in my mind that a
process shouldn't be allowed to control its own access decisions in a
more flexible way than rwx bits.  Then again, I certainly don't see a
reason that this syscall hardening patch should be held up while a whole
new concept in computer security is contemplated...

-Eric

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Eric Paris @ 2011-05-13 15:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
	Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110513131800.GA7883@elte.hu>

[dropping microblaze and roland]

On Fri, 2011-05-13 at 15:18 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > > I think the sanest semantics is to run all active callbacks as well.
> > > 
> > > For example if this is used for three stacked security policies - as if 3 LSM 
> > > modules were stacked at once. We'd call all three, and we'd determine that at 
> > > least one failed - and we'd return a failure. 
> > 
> > But that only works for boolean functions where you can return the
> > multi-bit-or of the result. What if you need to return the specific
> > error code.
> 
> Do you mean that one filter returns -EINVAL while the other -EACCES?
> 
> Seems like a non-problem to me, we'd return the first nonzero value.

Sounds so easy!  Why haven't LSMs stacked already?  Because what happens
if one of these hooks did something stateful?  Let's say on open, hook #1
returns EPERM.  hook #2 allocates memory.  The open is going to fail and
hook #2 is never going to get the close() which should have freed the
allocation.  If you can be completely stateless it's easier, but there's
a reason that stacking security modules is hard.  Serge has tried in the
past and both dhowells and casey schaufler are working on it right now.
Stacking is never as easy as it sounds   :)
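
The stateful-hook problem described above can be reproduced in a small
user-space sketch. Everything here is hypothetical: the hook names, the
counter that stands in for a real allocation, and the `unwind` flag, which
shows one possible remedy that the thread itself does not propose.

```c
#include <errno.h>

/* Number of per-open state blocks hook #2 currently holds. */
int hook2_live_state;

/* Hook #1: stateless, denies the open. */
int hook1_open(void)
{
	return -EPERM;
}

/* Hook #2: stateful, sets up state on open and expects a matching close. */
int hook2_open(void)
{
	hook2_live_state++;	/* stands in for allocating per-open state */
	return 0;
}

void hook2_close(void)
{
	hook2_live_state--;	/* stands in for freeing that state */
}

/*
 * Run both hooks and return the first nonzero value. When the open is
 * denied no close will ever arrive, so unless the caller unwinds hook #2
 * itself, that hook's state is stranded.
 */
int stacked_open(int unwind)
{
	int ret1 = hook1_open();
	int ret2 = hook2_open();
	int ret = ret1 ? ret1 : ret2;

	if (ret != 0 && ret2 == 0 && unwind)
		hook2_close();
	return ret;
}
```

Calling stacked_open(0) leaves hook2_live_state incremented even though the
open failed, which is exactly the leak described; stacked_open(1) shows
that the stacking layer, not the hooks, would have to know how to undo it.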

-Eric

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 15:23 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, H. Peter Anvin,
	sparclinux, Jiri Slaby, linux-s390, Russell King, x86,
	James Morris, Linus Torvalds, Ingo Molnar, Ingo Molnar,
	Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	kees.cook, linux-arm-kernel, Michal Marek, Michal Simek,
	Will Drewry, linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <1305299455.2076.26.camel@localhost.localdomain>

On Fri, 2011-05-13 at 11:10 -0400, Eric Paris wrote:
> Then again, I certainly don't see a
> reason that this syscall hardening patch should be held up while a whole
> new concept in computer security is contemplated...

Which makes me wonder why this syscall hardening stuff is done outside
of LSM? Why isn't it part of the LSM so that say SELinux can have a
syscall bitmask per security context?

Making it part of the LSM also avoids having to add this prctl().

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-13 15:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
	Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110513145737.GC32688@elte.hu>

On Fri, 2011-05-13 at 16:57 +0200, Ingo Molnar wrote:
> this is a security mechanism

Who says? And why would you want to unify two separate concepts only to
then limit it to security? That just doesn't make sense.

Either you provide a full on replacement for notifier chain like things
or you don't, only extending trace events in this fashion for security
is like way weird.

Plus see the arguments Eric made about stacking stuff, not only security
schemes will have those problems.

^ permalink raw reply

* RE: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: David Laight @ 2011-05-13 15:29 UTC (permalink / raw)
  To: Eric Paris, Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, linux-kernel, David Howells, Paul Mackerras,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Ingo Molnar, kees.cook, Serge E. Hallyn,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner, agl,
	linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, Oleg Nesterov, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, Linus Torvalds, David S. Miller
In-Reply-To: <1305299880.2076.31.camel@localhost.localdomain>

> ... If you can be completely stateless it's easier, but there's
> a reason that stacking security modules is hard.  Serge has tried in the
> past and both dhowells and casey schaufler are working on it right now.
> Stacking is never as easy as it sounds   :)

For a bad example of trying to allow alternate security models
look at NetBSD's kauth code :-)

NetBSD also had issues where some 'system call trace' code
was being used to (try to) apply security - unfortunately
it worked by looking at the user-space buffers on system
call entry - and a multithreaded program can easily arrange
to update them after the initial check!
For trace/event type activities this wouldn't really matter,
for security policy it does.
(I've not looked directly at these event points in linux)
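
The race described above can be simulated deterministically in user-space
C. This is a sketch under assumptions: the policy, the buffer size and all
function names are invented, and racing_thread() is an ordinary function
call standing in for a second thread that wins the race.

```c
#include <stddef.h>
#include <string.h>

#define PATH_MAX_SKETCH 64

/* Hypothetical policy: only paths under /tmp/ are allowed. */
int policy_allows(const char *path)
{
	return strncmp(path, "/tmp/", 5) == 0;
}

/* Simulates another thread rewriting the user buffer (needs >= 12 bytes). */
void racing_thread(char *user_buf)
{
	strcpy(user_buf, "/etc/shadow");
}

/*
 * Racy: checks the user buffer in place, then reads it again for use.
 * Stores the path actually used in used[]; returns 1 if allowed.
 */
int check_in_place(char *user_buf, char used[PATH_MAX_SKETCH])
{
	if (!policy_allows(user_buf))
		return 0;
	racing_thread(user_buf);	/* window between check and use */
	strncpy(used, user_buf, PATH_MAX_SKETCH - 1);
	used[PATH_MAX_SKETCH - 1] = '\0';
	return 1;
}

/* Safe: copies once into private memory, then checks and uses the copy. */
int check_stable_copy(char *user_buf, char used[PATH_MAX_SKETCH])
{
	char kbuf[PATH_MAX_SKETCH];

	strncpy(kbuf, user_buf, sizeof(kbuf) - 1);
	kbuf[sizeof(kbuf) - 1] = '\0';
	if (!policy_allows(kbuf))
		return 0;
	racing_thread(user_buf);	/* too late: the copy is stable */
	strcpy(used, kbuf);
	return 1;
}
```

With check_in_place() the policy approves "/tmp/..." while the operation
ends up using "/etc/shadow"; with check_stable_copy() the checked string
and the used string are the same object, which is why a security decision
needs a point where the string is already stable.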

	David

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Eric Paris @ 2011-05-13 15:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, H. Peter Anvin,
	sparclinux, Jiri Slaby, linux-s390, Russell King, x86,
	James Morris, Linus Torvalds, Ingo Molnar, Ingo Molnar,
	Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	kees.cook, linux-arm-kernel, Michal Marek, Michal Simek,
	Will Drewry, linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <1305300181.2466.72.camel@twins>

On Fri, 2011-05-13 at 17:23 +0200, Peter Zijlstra wrote:
> On Fri, 2011-05-13 at 11:10 -0400, Eric Paris wrote:
> > Then again, I certainly don't see a
> > reason that this syscall hardening patch should be held up while a whole
> > new concept in computer security is contemplated... 
> 
> Which makes me wonder why this syscall hardening stuff is done outside
> of LSM? Why isn't it part of the LSM so that say SELinux can have a
> syscall bitmask per security context?

I could do that, but I like Will's approach better.  From the PoV of
meeting security goals of information flow, data confidentiality,
integrity, least priv, etc limiting on the syscall boundary doesn't make
a lot of sense.  You just don't know enough there to enforce these
things.  These are the types of goals that SELinux and other LSMs have
previously tried to enforce.  From the PoV of making the kernel more
resistant to attacks and making a process more resistant to misbehavior
I think that the syscall boundary is appropriate.  Although I could do
it in SELinux, I don't really want to do it there.

In case people are interested or confused let me give my definition of
two words I've used a bit in these conversations: discretionary and
mandatory.  Any time I talk about a 'discretionary' security decision it
is a security decision that a process imposed upon itself.  Aka the
choice to use seccomp is discretionary.  The choice to mark our own file
u-wx is discretionary.  This isn't the best definition but it's one that
works well in this discussion.  Mandatory security is one enforced by a
global policy.  It's what selinux is all about.  SELinux doesn't give a
hoot what a process wants to do, it enforces a global policy from the
top down.  You take over a process, well, too bad, you still have no
choice but to follow the mandatory policy.

The LSM does NOT enforce a mandatory access control model, it's just how
it's been used in the past.  Ingo appears to me (please correct me if
I'm wrong) to really be a fan of exposing the flexibility of the LSM to
a discretionary access control model.  That doesn't seem like a bad
idea.  And maybe using the filter engine to define the language to do
this isn't a bad idea either.  But I think that's a 'down the road'
project, not something to hold up a better seccomp.

> Making it part of the LSM also avoids having to add this prctl().

Well, it would mean exposing some new language construct to every LSM
(instead of a single prctl construct) and it would mean anyone wanting
to use the interface would have to rely on the LSM implementing those
hooks the way they need it.  Honestly chrome can already get all of the
benefits of this patch (given a perfectly coded kernel) and a whole lot
more using SELinux, but (surprise surprise) not everyone uses SELinux.
I think it's a good idea to expose a simple interface which will be
widely enough adopted that many userspace applications can rely on it
for hardening.

The existence of the LSM and the fact that there exists multiple
security modules that may or may not be enabled really leads application
developers to be unable to rely on LSM for security.  If linux had a
single security model which everyone could rely on we wouldn't really
have as big of an issue but that's not possible.  So I'm advocating for
this series which will provide a single useful change which applications
can rely upon across distros and platforms to enhance the properties and
abilities of the linux kernel.

-Eric

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Will Drewry @ 2011-05-13 16:29 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, Ingo Molnar,
	Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	kees.cook, linux-arm-kernel, Michal Marek, Michal Simek,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <1305302141.2076.56.camel@localhost.localdomain>

On Fri, May 13, 2011 at 10:55 AM, Eric Paris <eparis@redhat.com> wrote:
> On Fri, 2011-05-13 at 17:23 +0200, Peter Zijlstra wrote:
>> On Fri, 2011-05-13 at 11:10 -0400, Eric Paris wrote:
>> > Then again, I certainly don't see a
>> > reason that this syscall hardening patch should be held up while a whole
>> > new concept in computer security is contemplated...
>>
>> Which makes me wonder why this syscall hardening stuff is done outside
>> of LSM? Why isn't it part of the LSM so that say SELinux can have a
>> syscall bitmask per security context?
>
> I could do that, but I like Will's approach better.  From the PoV of
> meeting security goals of information flow, data confidentiality,
> integrity, least priv, etc limiting on the syscall boundary doesn't make
> a lot of sense.  You just don't know enough there to enforce these
> things.  These are the types of goals that SELinux and other LSMs have
> previously tried to enforce.  From the PoV of making the kernel more
> resistant to attacks and making a process more resistant to misbehavior
> I think that the syscall boundary is appropriate.  Although I could do
> it in SELinux, I don't really want to do it there.

There's also the problem that there are no hooks per-system call for
LSMs, only logical hooks that sometimes mirror system call names and
are called after user data has been parsed.  If system call enter
hooks, like seccomp's, were added for LSMs, it would allow the lsm
bitmask approach, but it still wouldn't satisfy the issues you raise
below (and I wholeheartedly agree with).
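
For reference, the "syscall bitmask" idea mentioned above amounts to one
bit per system call number, tested at system call entry. The sketch below
is user-space C with invented names and an arbitrary table size; it is not
how seccomp or any LSM is actually implemented.

```c
#include <limits.h>
#include <stddef.h>

#define NR_SYSCALLS_SKETCH 512	/* arbitrary size for the sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((NR_SYSCALLS_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per system call number; a set bit means "allowed". */
struct syscall_mask {
	unsigned long bits[MASK_WORDS];
};

void mask_allow(struct syscall_mask *m, unsigned int nr)
{
	if (nr < NR_SYSCALLS_SKETCH)
		m->bits[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Entry-time check: out-of-range numbers are always refused. */
int mask_permits(const struct syscall_mask *m, unsigned int nr)
{
	if (nr >= NR_SYSCALLS_SKETCH)
		return 0;
	return (int)((m->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}
```

A check like this only sees the system call number, which is why it suits
attack-surface reduction but cannot express the object-level goals that
LSM hooks are placed to enforce.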

> In case people are interested or confused let me give my definition of
> two words I've used a bit in these conversations: discretionary and
> mandatory.  Any time I talk about a 'discretionary' security decision it
> is a security decision that a process imposed upon itself.  Aka the
> choice to use seccomp is discretionary.  The choice to mark our own file
> u-wx is discretionary.  This isn't the best definition but it's one that
> works well in this discussion.  Mandatory security is one enforced by a
> global policy.  It's what selinux is all about.  SELinux doesn't give a
> hoot what a process wants to do, it enforces a global policy from the
> top down.  You take over a process, well, too bad, you still have no
> choice but to follow the mandatory policy.
>
> The LSM does NOT enforce a mandatory access control model, it's just how
> it's been used in the past.  Ingo appears to me (please correct me if
> I'm wrong) to really be a fan of exposing the flexibility of the LSM to
> a discretionary access control model.  That doesn't seem like a bad
> idea.  And maybe using the filter engine to define the language to do
> this isn't a bad idea either.  But I think that's a 'down the road'
> project, not something to hold up a better seccomp.
>
>> Making it part of the LSM also avoids having to add this prctl().
>
> Well, it would mean exposing some new language construct to every LSM
> (instead of a single prctl construct) and it would mean anyone wanting
> to use the interface would have to rely on the LSM implementing those
> hooks the way they need it.  Honestly chrome can already get all of the
> benefits of this patch (given a perfectly coded kernel) and a whole lot
> more using SELinux, but (surprise surprise) not everyone uses SELinux.
> I think it's a good idea to expose a simple interface which will be
> widely enough adopted that many userspace applications can rely on it
> for hardening.
>
> The existence of the LSM and the fact that there exists multiple
> security modules that may or may not be enabled really leads application
> developers to be unable to rely on LSM for security.  If linux had a
> single security model which everyone could rely on we wouldn't really
> have as big of an issue but that's not possible.  So I'm advocating for
> this series which will provide a single useful change which applications
> can rely upon across distros and platforms to enhance the properties and
> abilities of the linux kernel.
>
> -Eric
>
>

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-14  7:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
	Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <1305300443.2466.77.camel@twins>

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-05-13 at 16:57 +0200, Ingo Molnar wrote:
> > this is a security mechanism
> 
> Who says? [...]

Kernel developers/maintainers of the affected code.

We have security hooks all around the kernel, which can deny/accept execution 
at various key points, but we do not have 'execute arbitrary user-space defined 
(safe) scripts' callbacks in general.

But yes, if a particular callback point is defined widely enough to allow much 
bigger intervention into the flow of execution, then more is possible as well.

> [...] and why would you want to unify two separate concepts only to then
> limit it to security? That just doesn't make sense.

I don't limit them to security - the callbacks themselves are either for 
passive observation or, at most, for security accept/deny callbacks.

It's decided by the subsystem maintainers what kind of user-space control power 
(or observation power) they want to allow, not me.

I would just like to not stop the facility itself at the 'observe only' level, 
like you suggest.

Thanks,

	Ingo

^ permalink raw reply
