LSM namespacing API

* LSM namespacing API
@ 2025-08-19 14:56 Paul Moore
  2025-08-19 17:11 ` Casey Schaufler
                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-19 14:56 UTC (permalink / raw)
  To: linux-security-module, selinux; +Cc: John Johansen, Stephen Smalley

Hello all,

As most of you are likely aware, Stephen Smalley has been working on
adding namespace support to SELinux, and the work has now progressed
to the point where a serious discussion on the API is warranted.  For
those of you who are unfamiliar with the details of Stephen's patchset, or
simply need a refresher, he has some excellent documentation in his
work-in-progress repo:

* https://github.com/stephensmalley/selinuxns

Stephen also gave a (pre-recorded) presentation at LSS-NA this year
about SELinux namespacing, you can watch the presentation here:

* https://www.youtube.com/watch?v=AwzGCOwxLoM

In the past you've heard me state, rather firmly at times, that I
believe namespacing at the LSM framework layer to be a mistake,
although if there is something that can be done to help facilitate the
namespacing of individual LSMs at the framework layer, I would be
supportive of that.  I think that a single LSM namespace API, similar
to our recently added LSM syscalls, may be such a thing, so I'd like
us to have a discussion to see if we all agree on that, and if so,
what such an API might look like.

At LSS-NA this year, John Johansen and I had a brief discussion where
he suggested a single LSM wide clone*(2) flag that individual LSMs
could opt into via callbacks.  John is directly CC'd on this mail, so
I'll let him expand on this idea.

While I agree with John that a fs based API is problematic (see all of
our discussions around the LSM syscalls), I'm concerned that a single
clone*(2) flag will significantly limit our flexibility around how
individual LSMs are namespaced, something I don't want to see happen.
This makes me wonder about the potential for expanding
lsm_set_self_attr(2) to support a new LSM attribute that would support
a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
provide a single LSM framework API for an unshare operation while also
providing a mechanism to pass LSM specific data via the lsm_ctx struct if
needed.  Just as we do with the other LSM_ATTR_* flags today,
individual LSMs can opt-in to the API fairly easily by providing a
setselfattr() LSM callback.
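
For illustration only, a userspace caller of such an attribute might
look roughly like the sketch below.  lsm_set_self_attr(2) and the
lsm_ctx layout are existing interfaces; LSM_ATTR_UNSHARE does not exist,
so SKETCH_LSM_ATTR_UNSHARE, its value, and the helper names are
assumptions made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of the uapi struct lsm_ctx layout (linux/lsm.h). */
struct sketch_lsm_ctx {
	uint64_t id;      /* LSM ID, e.g. LSM_ID_SELINUX */
	uint64_t flags;   /* LSM specific flags */
	uint64_t len;     /* total size of this record, including padding */
	uint64_t ctx_len; /* size of the payload in ctx[] */
	uint8_t  ctx[];   /* optional LSM specific payload */
};

/* HYPOTHETICAL: no such attribute exists today; the value is a placeholder. */
#define SKETCH_LSM_ATTR_UNSHARE 1000U

/*
 * Build a single lsm_ctx record for one LSM.  Returns a zeroed, heap
 * allocated buffer (caller frees) and stores its size in *size, or
 * returns NULL on allocation failure.
 */
struct sketch_lsm_ctx *sketch_build_ctx(uint64_t lsm_id, const void *payload,
					size_t payload_len, size_t *size)
{
	/* Round the record up to an 8 byte boundary. */
	size_t total = (sizeof(struct sketch_lsm_ctx) + payload_len + 7) & ~(size_t)7;
	struct sketch_lsm_ctx *ctx;

	assert(size != NULL);
	ctx = calloc(1, total);
	if (!ctx)
		return NULL;
	ctx->id = lsm_id;
	ctx->len = total;
	ctx->ctx_len = payload_len;
	if (payload_len)
		memcpy(ctx->ctx, payload, payload_len);
	*size = total;
	return ctx;
}

/* Ask one LSM to unshare its namespace; returns 0, or -1 with errno set. */
int sketch_lsm_unshare(uint64_t lsm_id, const void *payload, size_t payload_len)
{
	size_t size;
	struct sketch_lsm_ctx *ctx = sketch_build_ctx(lsm_id, payload, payload_len, &size);
	int rc, saved_errno;

	if (!ctx) {
		errno = ENOMEM;
		return -1;
	}
#ifdef SYS_lsm_set_self_attr
	rc = (int)syscall(SYS_lsm_set_self_attr, SKETCH_LSM_ATTR_UNSHARE,
			  ctx, (uint32_t)size, 0U);
#else
	errno = ENOSYS;	/* libc headers predate the LSM syscalls */
	rc = -1;
#endif
	saved_errno = errno;
	free(ctx);
	errno = saved_errno;
	return rc;
}
```

On a current kernel the call is expected to fail, since the attribute is
unknown; the point is only that the payload in ctx[] gives each LSM room
for its own parameters without changing the syscall signature.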

Thoughts?

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-19 14:56 LSM namespacing API Paul Moore
@ 2025-08-19 17:11 ` Casey Schaufler
  2025-08-19 18:40   ` Paul Moore
  2025-08-19 17:47 ` Stephen Smalley
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Casey Schaufler @ 2025-08-19 17:11 UTC (permalink / raw)
  To: Paul Moore, linux-security-module, selinux
  Cc: John Johansen, Stephen Smalley, Casey Schaufler

On 8/19/2025 7:56 AM, Paul Moore wrote:
> Hello all,
>
> As most of you are likely aware, Stephen Smalley has been working on
> adding namespace support to SELinux, and the work has now progressed
> to the point where a serious discussion on the API is warranted.  For
> those of you are unfamiliar with the details or Stephen's patchset, or
> simply need a refresher, he has some excellent documentation in his
> work-in-progress repo:
>
> * https://github.com/stephensmalley/selinuxns
>
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> about SELinux namespacing, you can watch the presentation here:
>
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>
> In the past you've heard me state, rather firmly at times, that I
> believe namespacing at the LSM framework layer to be a mistake,
> although if there is something that can be done to help facilitate the
> namespacing of individual LSMs at the framework layer, I would be
> supportive of that.  I think that a single LSM namespace API, similar
> to our recently added LSM syscalls, may be such a thing, so I'd like
> us to have a discussion to see if we all agree on that, and if so,
> what such an API might look like.
>
> At LSS-NA this year, John Johansen and I had a brief discussion where
> he suggested a single LSM wide clone*(2) flag that individual LSM's
> could opt into via callbacks.  John is directly CC'd on this mail, so
> I'll let him expand on this idea.
>
> While I agree with John that a fs based API is problematic (see all of
> our discussions around the LSM syscalls), I'm concerned that a single
> clone*(2) flag will significantly limit our flexibility around how
> individual LSMs are namespaced, something I don't want to see happen.
> This makes me wonder about the potential for expanding
> lsm_set_self_attr(2) to support a new LSM attribute that would support
> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> provide a single LSM framework API for an unshare operation while also
> providing a mechanism to pass LSM specific via the lsm_ctx struct if
> needed.  Just as we do with the other LSM_ATTR_* flags today,
> individual LSMs can opt-in to the API fairly easily by providing a
> setselfattr() LSM callback.
>
> Thoughts?

The advantage of a clone flag is that the operation is atomic with
the other namespace flag based behaviors. Having a two step process

	clone(); lsm_set_self_attr(); - or -
	lsm_set_self_attr(); clone();

is going to lead to cases where neither order really works correctly.

On the other hand, it's better to have a mechanism with a few drawbacks
than nothing at all. I think it could be workable.



* Re: LSM namespacing API
  2025-08-19 17:11 ` Casey Schaufler
@ 2025-08-19 18:40   ` Paul Moore
  2025-08-19 18:58     ` Stephen Smalley
                       ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-19 18:40 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: linux-security-module, selinux, John Johansen, Stephen Smalley

On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> The advantage of a clone flag is that the operation is atomic with
> the other namespace flag based behaviors. Having a two step process
>
>         clone(); lsm_set_self_attr(); - or -
>         lsm_set_self_attr(); clone();
>
> is going to lead to cases where neither order really works correctly.

I was envisioning something that works similarly to LSM_ATTR_EXEC
where the unshare isn't immediate, but rather happens at a future
event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
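
A toy userspace model of that deferred behavior, purely to pin down the
semantics being discussed: the request only records intent, and the new
namespace shows up on the child at the next clone.  Nothing here is
kernel code; struct sketch_task, the integer namespace IDs, and the
function names are invented for illustration, and clearing the pending
request on the parent after one clone is an assumption, not something
that has been decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL model of a task's LSM namespace state. */
struct sketch_task {
	int  lsm_ns;           /* ID of the LSM namespace the task lives in */
	bool unshare_pending;  /* set by the (hypothetical) LSM_ATTR_UNSHARE */
};

static int sketch_next_ns_id = 1;  /* namespace 0 is the initial namespace */

/* Models lsm_set_self_attr(LSM_ATTR_UNSHARE, ...): record intent only. */
void sketch_request_unshare(struct sketch_task *t)
{
	assert(t != NULL);
	t->unshare_pending = true;
}

/*
 * Models clone*(): the child normally inherits the parent's namespace,
 * but a pending request is applied as part of the clone and then
 * cleared on the parent so that it fires only once (an assumption).
 */
struct sketch_task sketch_clone(struct sketch_task *parent)
{
	struct sketch_task child;

	assert(parent != NULL);
	child.lsm_ns = parent->lsm_ns;
	child.unshare_pending = false;

	if (parent->unshare_pending) {
		child.lsm_ns = sketch_next_ns_id++;
		parent->unshare_pending = false;
	}
	return child;
}
```

In this model there is no window between the clone and the namespace
change in which the child runs in the parent's LSM namespace, which is
the ordering problem Casey describes above.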

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-19 18:40   ` Paul Moore
@ 2025-08-19 18:58     ` Stephen Smalley
  2025-08-21  7:26       ` John Johansen
  2025-08-21  7:23     ` John Johansen
  2025-08-21 10:00     ` Mickaël Salaün
  2 siblings, 1 reply; 43+ messages in thread
From: Stephen Smalley @ 2025-08-19 18:58 UTC (permalink / raw)
  To: Paul Moore; +Cc: Casey Schaufler, linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 2:41 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >
> > The advantage of a clone flag is that the operation is atomic with
> > the other namespace flag based behaviors. Having a two step process
> >
> >         clone(); lsm_set_self_attr(); - or -
> >         lsm_set_self_attr(); clone();
> >
> > is going to lead to cases where neither order really works correctly.
>
> I was envisioning something that works similarly to LSM_ATTR_EXEC
> where the unshare isn't immediate, but rather happens at a future
> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().

I've only implemented support for an immediate unsharing of the
SELinux namespace, not any kind of deferred unsharing until the next
exec or clone.
I'm not saying that would be impossible, but since I was following the
example of clone(2) and unshare(2) I didn't do it.
There may be some complications in doing so, but I haven't looked at it yet.


* Re: LSM namespacing API
  2025-08-19 18:58     ` Stephen Smalley
@ 2025-08-21  7:26       ` John Johansen
  0 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21  7:26 UTC (permalink / raw)
  To: Stephen Smalley, Paul Moore
  Cc: Casey Schaufler, linux-security-module, selinux

On 8/19/25 11:58, Stephen Smalley wrote:
> On Tue, Aug 19, 2025 at 2:41 PM Paul Moore <paul@paul-moore.com> wrote:
>>
>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>
>>> The advantage of a clone flag is that the operation is atomic with
>>> the other namespace flag based behaviors. Having a two step process
>>>
>>>          clone(); lsm_set_self_attr(); - or -
>>>          lsm_set_self_attr(); clone();
>>>
>>> is going to lead to cases where neither order really works correctly.
>>
>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>> where the unshare isn't immediate, but rather happens at a future
>> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
> 
> I've only implemented support for an immediate unsharing of the
> SELinux namespace, not any kind of deferred unsharing until the next
> exec or clone.
> Not saying that would be impossible, but since I was following the
> example of clone(2) and unshare(2) I didn't do it.
> May be some complications in doing so, but I haven't looked at it yet.

If the hooks are set up correctly, I expect it will actually remove some
potential complications. But I haven't dug deeply into the SELinux code
yet, so call that an uninformed hunch.



* Re: LSM namespacing API
  2025-08-19 18:40   ` Paul Moore
  2025-08-19 18:58     ` Stephen Smalley
@ 2025-08-21  7:23     ` John Johansen
  2025-08-22  1:57       ` Paul Moore
  2025-08-21 10:00     ` Mickaël Salaün
  2 siblings, 1 reply; 43+ messages in thread
From: John Johansen @ 2025-08-21  7:23 UTC (permalink / raw)
  To: Paul Moore, Casey Schaufler
  Cc: linux-security-module, selinux, Stephen Smalley

On 8/19/25 11:40, Paul Moore wrote:
> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>
>> The advantage of a clone flag is that the operation is atomic with
>> the other namespace flag based behaviors. Having a two step process
>>
>>          clone(); lsm_set_self_attr(); - or -
>>          lsm_set_self_attr(); clone();
>>
>> is going to lead to cases where neither order really works correctly.
> 
> I was envisioning something that works similarly to LSM_ATTR_EXEC
> where the unshare isn't immediate, but rather happens at a future
> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
> 
I do think something like this is needed to deal well with the two
step process. Without it, it is fairly easy to get into situations
where you either need more permissions than strictly necessary
because of the steps in between or, as Casey says, things just don't
work correctly.

There will need to be an additional call that allows entering a
namespace separately from clone/unshare, but that covers a different
use case.



* Re: LSM namespacing API
  2025-08-21  7:23     ` John Johansen
@ 2025-08-22  1:57       ` Paul Moore
  2025-08-22 14:30         ` John Johansen
  0 siblings, 1 reply; 43+ messages in thread
From: Paul Moore @ 2025-08-22  1:57 UTC (permalink / raw)
  To: John Johansen
  Cc: Casey Schaufler, linux-security-module, selinux, Stephen Smalley

On Thu, Aug 21, 2025 at 3:23 AM John Johansen
<john.johansen@canonical.com> wrote:
> On 8/19/25 11:40, Paul Moore wrote:
> > On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>
> >> The advantage of a clone flag is that the operation is atomic with
> >> the other namespace flag based behaviors. Having a two step process
> >>
> >>          clone(); lsm_set_self_attr(); - or -
> >>          lsm_set_self_attr(); clone();
> >>
> >> is going to lead to cases where neither order really works correctly.
> >
> > I was envisioning something that works similarly to LSM_ATTR_EXEC
> > where the unshare isn't immediate, but rather happens at a future
> > event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
> > LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>
> I do think something like this is needed to deal well with the two
> step process. Without it is fairly easy to get into situations
> where you either need more permissions, than strictly necessary,
> because of steps in between or as Casey says things just don't work
> correctly.

I think we're starting to all coalesce on this basic idea now, at
least for creating new LSM namespace sets, that's good.  As the only
LSM that really has a namespace currently, would AppArmor be able to
work within the lsm_set_self_attr(2) approach, or would you need
something a bit different?  If so, can you give us a basic idea of
what AA would need to work?

> There will need to be an additional call that allows entering a
> namespace separately from clone/unshare, but that covers a different
> use case.

In this particular case I've been thinking of not allowing the same
level of arbitrary LSM namespace composability, but rather limiting
the caller to the set of LSM namespaces already configured for a given
process, using the procfs/setns(2) mechanism.  Does that work for your
use case(s), or do you need more flexibility?
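
As a rough sketch of what entering another process's LSM namespace set
could look like from userspace, assuming it is exposed the same way as
the existing namespaces: open(2) and setns(2) are real, but the procfs
entry name "lsm" is purely hypothetical and no such file exists today.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL procfs entry name; no LSM namespace file exists today. */
#define SKETCH_LSM_NS_NAME "lsm"

/* Format "/proc/<pid>/ns/<name>"; returns 0, or -1 if buf is too small. */
int sketch_ns_path(char *buf, size_t len, pid_t pid, const char *name)
{
	int n;

	assert(buf != NULL && name != NULL);
	n = snprintf(buf, len, "/proc/%ld/ns/%s", (long)pid, name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Join the namespace called `name` that process `pid` is already in. */
int sketch_enter_ns_of(pid_t pid, const char *name)
{
	char path[64];
	int fd, rc;

	if (sketch_ns_path(path, sizeof(path), pid, name) < 0)
		return -1;
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	rc = setns(fd, 0);	/* 0: accept whatever namespace type fd refers to */
	close(fd);
	return rc;
}
```

A caller would use sketch_enter_ns_of(target_pid, SKETCH_LSM_NS_NAME);
because the target's namespaces are joined as a set, the caller cannot
mix and match individual LSM namespaces from different processes.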

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-22  1:57       ` Paul Moore
@ 2025-08-22 14:30         ` John Johansen
  0 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-22 14:30 UTC (permalink / raw)
  To: Paul Moore
  Cc: Casey Schaufler, linux-security-module, selinux, Stephen Smalley

On 8/21/25 18:57, Paul Moore wrote:
> On Thu, Aug 21, 2025 at 3:23 AM John Johansen
> <john.johansen@canonical.com> wrote:
>> On 8/19/25 11:40, Paul Moore wrote:
>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>
>>>> The advantage of a clone flag is that the operation is atomic with
>>>> the other namespace flag based behaviors. Having a two step process
>>>>
>>>>           clone(); lsm_set_self_attr(); - or -
>>>>           lsm_set_self_attr(); clone();
>>>>
>>>> is going to lead to cases where neither order really works correctly.
>>>
>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>> where the unshare isn't immediate, but rather happens at a future
>>> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>>
>> I do think something like this is needed to deal well with the two
>> step process. Without it is fairly easy to get into situations
>> where you either need more permissions, than strictly necessary,
>> because of steps in between or as Casey says things just don't work
>> correctly.
> 
> I think we're starting to all coalesce on this basic idea now, at
> least for creating new LSM namespace sets, that's good.  As the only
> LSM that really has a namespace currently, would AppArmor be able to
> work within the lsm_set_self_attr(2) approach, or would you need
> something a bit different?  If so, can you give us a basic idea of
> what AA would need to work?
> 
>> There will need to be an additional call that allows entering a
>> namespace separately from clone/unshare, but that covers a different
>> use case.
> 
> In this particular case I've been thinking of not allowing the same
> level of arbitrary LSM namespace composability, but rather limiting
> the caller to the set of LSM namespaces already configured for a given
> process, using the procfs/setns(2) mechanism.  Does that work for your
> use case(s), or do you need more flexibility?
> 
Yes, it should work. I think the LSM/security namespaces need to move
together. In fact, I want even less arbitrary composability, as I think
switching LSM namespaces should be able to force system namespace
changes as well.

There are all kinds of potential security corner cases you have to
worry about when trying to move them independently.



* Re: LSM namespacing API
  2025-08-19 18:40   ` Paul Moore
  2025-08-19 18:58     ` Stephen Smalley
  2025-08-21  7:23     ` John Johansen
@ 2025-08-21 10:00     ` Mickaël Salaün
  2025-08-22  2:14       ` Paul Moore
  2 siblings, 1 reply; 43+ messages in thread
From: Mickaël Salaün @ 2025-08-21 10:00 UTC (permalink / raw)
  To: Paul Moore
  Cc: Casey Schaufler, linux-security-module, selinux, John Johansen,
	Stephen Smalley, Maxime Bélair

On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >
> > The advantage of a clone flag is that the operation is atomic with
> > the other namespace flag based behaviors. Having a two step process
> >
> >         clone(); lsm_set_self_attr(); - or -
> >         lsm_set_self_attr(); clone();
> >
> > is going to lead to cases where neither order really works correctly.
> 
> I was envisioning something that works similarly to LSM_ATTR_EXEC
> where the unshare isn't immediate, but rather happens at a future
> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().

The next unshare(2) would make more sense to me.

This deferred operation could be requested with a flag in
lsm_config_system_policy(2) instead:
https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com


* Re: LSM namespacing API
  2025-08-21 10:00     ` Mickaël Salaün
@ 2025-08-22  2:14       ` Paul Moore
  2025-08-22 14:47         ` Casey Schaufler
  0 siblings, 1 reply; 43+ messages in thread
From: Paul Moore @ 2025-08-22  2:14 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Casey Schaufler, linux-security-module, selinux, John Johansen,
	Stephen Smalley, Maxime Bélair

On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
> > On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> > >
> > > The advantage of a clone flag is that the operation is atomic with
> > > the other namespace flag based behaviors. Having a two step process
> > >
> > >         clone(); lsm_set_self_attr(); - or -
> > >         lsm_set_self_attr(); clone();
> > >
> > > is going to lead to cases where neither order really works correctly.
> >
> > I was envisioning something that works similarly to LSM_ATTR_EXEC
> > where the unshare isn't immediate, but rather happens at a future
> > event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
> > LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>
> The next unshare(2) would make more sense to me.

That's definitely something to discuss.  I've been fairly loose on
that in the discussion thus far, but as things are starting to settle
on the lsm_set_self_attr(2) approach as one API, we should start to
clarify that.

> This deferred operation could be requested with a flag in
> lsm_config_system_policy(2) instead:
> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com

I want to keep the policy syscall work separate from the LSM namespace
discussion as we don't want to require a policy load operation to
create a new LSM namespace.  I think it's probably okay if the policy
syscall work were to be namespace aware so that an orchestrator could
load an LSM policy into an LSM namespace other than its own, but that
is still not overly dependent on what we are discussing here (yes,
maybe it is a little, but only just so).

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-22  2:14       ` Paul Moore
@ 2025-08-22 14:47         ` Casey Schaufler
  2025-08-22 19:59           ` John Johansen
  0 siblings, 1 reply; 43+ messages in thread
From: Casey Schaufler @ 2025-08-22 14:47 UTC (permalink / raw)
  To: Paul Moore, Mickaël Salaün
  Cc: linux-security-module, selinux, John Johansen, Stephen Smalley,
	Maxime Bélair, Casey Schaufler

On 8/21/2025 7:14 PM, Paul Moore wrote:
> On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net> wrote:
>> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> The advantage of a clone flag is that the operation is atomic with
>>>> the other namespace flag based behaviors. Having a two step process
>>>>
>>>>         clone(); lsm_set_self_attr(); - or -
>>>>         lsm_set_self_attr(); clone();
>>>>
>>>> is going to lead to cases where neither order really works correctly.
>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>> where the unshare isn't immediate, but rather happens at a future
>>> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>> The next unshare(2) would make more sense to me.
> That's definitely something to discuss.  I've been fairly loose on
> that in the discussion thus far, but as things are starting to settle
> on the lsm_set_self_attr(2) approach as one API, we should start to
> clarify that.
>
>> This deferred operation could be requested with a flag in
>> lsm_config_system_policy(2) instead:
>> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
> I want to keep the policy syscall work separate from the LSM namespace
> discussion as we don't want to require a policy load operation to
> create a new LSM namespace.  I think it's probably okay if the policy
> syscall work were to be namespace aware so that an orchestrator could
> load a LSM policy into a LSM namespace other than it's own, but that
> is still not overly dependent on what we are discussing here (yes,
> maybe it is a little, but only just so).

Policy load and namespace manipulation *must* be kept separate. Smack
requires the ability to "load policy" at any time. Smack allows a process
to add "policy" to further restrict its own access, and does not require
a namespace change. There has been an implementation of namespaces for
Smack, but the developers disappeared quietly and sadly no one picked it
up. Introducing a requirement that LSMs support namespaces in order to
load policy beyond system initialization is a non-starter.



* Re: LSM namespacing API
  2025-08-22 14:47         ` Casey Schaufler
@ 2025-08-22 19:59           ` John Johansen
  2025-08-23 17:41             ` Dr. Greg
  0 siblings, 1 reply; 43+ messages in thread
From: John Johansen @ 2025-08-22 19:59 UTC (permalink / raw)
  To: Casey Schaufler, Paul Moore, Mickaël Salaün
  Cc: linux-security-module, selinux, Stephen Smalley,
	Maxime Bélair

On 8/22/25 07:47, Casey Schaufler wrote:
> On 8/21/2025 7:14 PM, Paul Moore wrote:
>> On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net> wrote:
>>> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
>>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>> The advantage of a clone flag is that the operation is atomic with
>>>>> the other namespace flag based behaviors. Having a two step process
>>>>>
>>>>>          clone(); lsm_set_self_attr(); - or -
>>>>>          lsm_set_self_attr(); clone();
>>>>>
>>>>> is going to lead to cases where neither order really works correctly.
>>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>>> where the unshare isn't immediate, but rather happens at a future
>>>> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
>>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>>> The next unshare(2) would make more sense to me.
>> That's definitely something to discuss.  I've been fairly loose on
>> that in the discussion thus far, but as things are starting to settle
>> on the lsm_set_self_attr(2) approach as one API, we should start to
>> clarify that.
>>
>>> This deferred operation could be requested with a flag in
>>> lsm_config_system_policy(2) instead:
>>> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
>> I want to keep the policy syscall work separate from the LSM namespace
>> discussion as we don't want to require a policy load operation to
>> create a new LSM namespace.  I think it's probably okay if the policy
>> syscall work were to be namespace aware so that an orchestrator could
>> load a LSM policy into a LSM namespace other than it's own, but that
>> is still not overly dependent on what we are discussing here (yes,
>> maybe it is a little, but only just so).
> 
> Policy load and namespace manipulation *must* be kept separate. Smack
> requires the ability to "load policy" at any time. Smack allows a process
> to add "policy" to further restrict its own access, and does not require
> a namespace change. There has been an implementation of namespaces for
> Smack, but the developers disappeared quietly and sadly no one picked it
> up. Introducing a requirement that LSMs support namespaces in order to
> load policy beyond system initialization is a non-starter.
> 
Yes, the ability to load policy must exist separately; however,
policy load could be made namespace aware so that a parent could
inject policy into a child.

There is also an open question as to whether we need to allow, but not
require, some kind of policy manipulation/injection with the creation
of the LSM namespace so that there is an atomic transition with
entering the namespace. Is there a case where policy really needs to
be present atomically with the creation of the namespace? If so we
need to further break it down to

1. Is it sufficient for the LSM to do it, without container manager
guidance?  For example, by inheriting policy, or by injecting policy
that is already present. In that case we don't need to consider policy
load/injection at the point of clone/unshare.

2. Do we need to let the container manager hint/load policy?

So far I think the inherit/policy directed injection works for
AppArmor and SELinux. Container managers, generally speaking, have to
do additional setup after the container is created and before running
the workload, which means a separate load phase should be fine.

However, I can see an argument for having policy in place when
clone/unshare exit. Admittedly, at the moment it's largely about
flexibility and nebulous, ill-defined use cases. Just because something
works for AppArmor, SELinux, and I think Smack, doesn't mean it would
work for all use cases.

But we also shouldn't add flexibility for flexibility's sake just
because we can see there might be some future utility for some future
use case. It would certainly make the interface uglier and more
complicated, and I would hate to have to carry that without a concrete
use case.

I think unless there is a solid use case for making clone/unshare
policy aware, we shouldn't worry about it for now. A new interface can
be added in the future if the capability is really needed.


* Re: LSM namespacing API
  2025-08-22 19:59           ` John Johansen
@ 2025-08-23 17:41             ` Dr. Greg
  2025-08-23 23:00               ` John Johansen
  0 siblings, 1 reply; 43+ messages in thread
From: Dr. Greg @ 2025-08-23 17:41 UTC (permalink / raw)
  To: John Johansen
  Cc: Casey Schaufler, Paul Moore, Mickaël Salaün,
	linux-security-module, selinux, Stephen Smalley, Maxime Bélair

On Fri, Aug 22, 2025 at 12:59:29PM -0700, John Johansen wrote:

Good morning, I hope the weekend is going well for everyone.

> On 8/22/25 07:47, Casey Schaufler wrote:
> >On 8/21/2025 7:14 PM, Paul Moore wrote:
> >>On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net>
> >>wrote:
> >>>On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
> >>>>On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler
> >>>><casey@schaufler-ca.com> wrote:
> >>>>>The advantage of a clone flag is that the operation is atomic with
> >>>>>the other namespace flag based behaviors. Having a two step process
> >>>>>
> >>>>>         clone(); lsm_set_self_attr(); - or -
> >>>>>         lsm_set_self_attr(); clone();
> >>>>>
> >>>>>is going to lead to cases where neither order really works correctly.
> >>>>I was envisioning something that works similarly to LSM_ATTR_EXEC
> >>>>where the unshare isn't immediate, but rather happens at a future
> >>>>event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
> >>>>LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
> >>>The next unshare(2) would make more sense to me.
> >>That's definitely something to discuss.  I've been fairly loose on
> >>that in the discussion thus far, but as things are starting to settle
> >>on the lsm_set_self_attr(2) approach as one API, we should start to
> >>clarify that.
> >>
> >>>This deferred operation could be requested with a flag in
> >>>lsm_config_system_policy(2) instead:
> >>>https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
> >>I want to keep the policy syscall work separate from the LSM namespace
> >>discussion as we don't want to require a policy load operation to
> >>create a new LSM namespace.  I think it's probably okay if the policy
> >>syscall work were to be namespace aware so that an orchestrator could
> >>load a LSM policy into a LSM namespace other than it's own, but that
> >>is still not overly dependent on what we are discussing here (yes,
> >>maybe it is a little, but only just so).
> >
> >Policy load and namespace manipulation *must* be kept separate. Smack
> >requires the ability to "load policy" at any time. Smack allows a process
> >to add "policy" to further restrict its own access, and does not require
> >a namespace change. There has been an implementation of namespaces for
> >Smack, but the developers disappeared quietly and sadly no one picked it
> >up. Introducing a requirement that LSMs support namespaces in order to
> >load policy beyond system initialization is a non-starter.

> yes the ability to load policy must be exist separately, however
> policy load could be made namespace aware so that a parent could
> inject policy into a child.

Policy or model load, specific to the subordinate namespace, will be
a necessity.

As Casey noted, some LSM namespaces will require configuration and
management calls well after the namespace has started.  Other LSMs
will want the configuration to be completed before the namespace
starts, with any further configuration of the namespace blocked.

There is a very valid security rationale for isolating the capability
for namespace separation from the capability that allows the
configuration of a security model.  It would be an entirely realistic
security objective for a namespace to block further separation
attempts, while still allowing for management operations to be
conducted in the context of the subordinate namespace.

Hence the rationale for splitting CAP_MAC_ADMIN from the new
capability, whatever name the bikeshedding over it ends up
producing.

> There is also an open question as to whether we need to allow, but
> not require, some kind of policy manipulation/injection with the
> creation of the LSM namespace so that the there is an atomic
> transition with entering the namespace. Is there a case where policy
> really needs to be present atomically with the creation of the
> namespace? If so we need to further break it down to
>
> 1. is it sufficient for the LSM to do it, without container manager
> guidance?  An inherit of policy, or already present policy that can be
> injected. Then we don't need policy load inject to be considered at
> the point of clone/unshare.
> 
> 2. do we need to let the container manager hint/load policy.

Policy load needs to be atomic with respect to namespace separation.
In other words, the policy needs to be in place when execution within
the context of the new security namespace begins.

A resource orchestrator will need the ability to load the new policy
that will be enforced into the context of the new namespace.

In the case of some model/integrity based LSM's, the security events
related to the policy load need to occur in the context of the parent
LSM namespace.

See the writings of Werner Karl Heisenberg for the reasoning behind
that... :-)

> So far I think the inherit/policy directed injection works for
> apparmor, and selinux. Container managers generally speaking have to
> additional setup after the container is created before running the
> work load, which means a separate load phase should be fine.
> 
> However I can see an argument for having policy in place when
> clone/unshare exit. Admittedly atm its largely around flexibility, and
> nebulous ill defined use cases. Just because something works for
> apparmor, selinux, and I think smack, doesn't mean it would work for
> all use cases.
> 
> But we also should add flexibility for flexibility just because we can
> see there might be some future utility for some future use case. It
> would certainly make the interface uglier, and more complicated, and I
> would hate to have to carry that without a concrete use case.
> 
> I think unless there is a solid use case for making clone/unshare
> policy aware we don't worry about it for now. A new interface can be
> add in the future if the capability is really needed.

We will respond more directly to the issue of clone, unshare and
external process entry, in the other thread where you initiated a
discussion of these issues.  We believe there is a strong argument to
be made that LSM namespace separation is a poor fit for the classic
fork/unshare model of the other resource namespaces.

Among other issues, a direct separation model places the complexity of
policy verification and loading in userspace.  As was noted above,
accounting for the security events related to the policy verification
and load process, in the orchestrator process, will be a requirement
for some integrity and functional models.

Have a good weekend.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project


* Re: LSM namespacing API
  2025-08-23 17:41             ` Dr. Greg
@ 2025-08-23 23:00               ` John Johansen
  0 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-23 23:00 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Casey Schaufler, Paul Moore, Mickaël Salaün,
	linux-security-module, selinux, Stephen Smalley, Maxime Bélair

On 8/23/25 10:41, Dr. Greg wrote:
> On Fri, Aug 22, 2025 at 12:59:29PM -0700, John Johansen wrote:
> 
> Good morning, I hope the weekend is going well for everyone.
> 
>> On 8/22/25 07:47, Casey Schaufler wrote:
>>> On 8/21/2025 7:14 PM, Paul Moore wrote:
>>>> On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net>
>>>> wrote:
>>>>> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
>>>>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler
>>>>>> <casey@schaufler-ca.com> wrote:
>>>>>>> The advantage of a clone flag is that the operation is atomic with
>>>>>>> the other namespace flag based behaviors. Having a two step process
>>>>>>>
>>>>>>>          clone(); lsm_set_self_attr(); - or -
>>>>>>>          lsm_set_self_attr(); clone();
>>>>>>>
>>>>>>> is going to lead to cases where neither order really works correctly.
>>>>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>>>>> where the unshare isn't immediate, but rather happens at a future
>>>>>> event.  With LSM_ATTR_EXEC it happens at the next exec*(), with
>>>>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>>>>> The next unshare(2) would make more sense to me.
>>>> That's definitely something to discuss.  I've been fairly loose on
>>>> that in the discussion thus far, but as things are starting to settle
>>>> on the lsm_set_self_attr(2) approach as one API, we should start to
>>>> clarify that.
>>>>
>>>>> This deferred operation could be requested with a flag in
>>>>> lsm_config_system_policy(2) instead:
>>>>> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
>>>> I want to keep the policy syscall work separate from the LSM namespace
>>>> discussion as we don't want to require a policy load operation to
>>>> create a new LSM namespace.  I think it's probably okay if the policy
>>>> syscall work were to be namespace aware so that an orchestrator could
>>>> load a LSM policy into a LSM namespace other than it's own, but that
>>>> is still not overly dependent on what we are discussing here (yes,
>>>> maybe it is a little, but only just so).
>>>
>>> Policy load and namespace manipulation *must* be kept separate. Smack
>>> requires the ability to "load policy" at any time. Smack allows a process
>>> to add "policy" to further restrict its own access, and does not require
>>> a namespace change. There has been an implementation of namespaces for
>>> Smack, but the developers disappeared quietly and sadly no one picked it
>>> up. Introducing a requirement that LSMs support namespaces in order to
>>> load policy beyond system initialization is a non-starter.
> 
>> yes the ability to load policy must be exist separately, however
>> policy load could be made namespace aware so that a parent could
>> inject policy into a child.
> 
> Policy or model load, specific to the subordinate namespace, will be
> a necessity.
> 
> As Casey noted, some LSM namespaces will require configuration and
> management calls well after the namespace has started.  Other LSM's
> will want the configuration to be completed before the namespace
> starts, with any further configurations to the namespace blocked.
> 
> There is a very valid security rationale for isolating the capability
> for namespace separation from the capability that allows the
> configuration of a security model.  It would be an entirely realistic
> security objective for a namespace to block further separation
> attempts, while still allowing for management operations to be
> conducted in the context of the subordinate namespace.
> 
> Hence the rationale for splitting CAP_MAC_ADMIN from whatever name the
> bike shedding process around the new capability naming process
> produces.
> 
>> There is also an open question as to whether we need to allow, but
>> not require, some kind of policy manipulation/injection with the
>> creation of the LSM namespace so that the there is an atomic
>> transition with entering the namespace. Is there a case where policy
>> really needs to be present atomically with the creation of the
>> namespace? If so we need to further break it down to
>>
>> 1. is it sufficient for the LSM to do it, without container manager
>> guidance?  An inherit of policy, or already present policy that can be
>> injected. Then we don't need policy load inject to be considered at
>> the point of clone/unshare.
>>
>> 2. do we need to let the container manager hint/load policy.
> 
> Policy load needs to be atomic with respect to namespace separation.
> In other words, the policy needs to be in place when execution within
> the context of the new security namespace begins.
> 
No, it _may_ need to be, depending on the model/policy being used, and
an LSM is in the best place to make that decision and do it for its own
policy, as long as the infrastructure supports it.

> A resource orchestrator will need the ability to load the new policy
> that will be enforced into the context of the new namespace.

No, an LSM is fully capable of doing this and is honestly in a better
position to do so for its own policy than an external orchestrator.
Where coordination/orchestration is needed is at the infrastructure
layer (LSM), to ensure that once everything is decided by the individual
LSMs, the security context is atomically set up correctly.

So in that sense the LSM infrastructure is an orchestrator, but only
in the loosest sense.

> 
> In the case of some model/integrity based LSM's, the security events
> related to the policy load need to occur in the context of the parent
> LSM namespace.
> 
Yes, it very much depends on the model. I would argue that if the LSM
needs this:
1. the policy at the exec/fork/clone/unshare point already needs to
    be loaded.
2. the LSM's policy needs a way to initiate the transition. E.g. in
    the SELinux case, the transition is setting up a new layer in
    mediation that will be bounded by the previous layers. There isn't
    a transition from one policy to another, but the addition of a new
    layer on top.
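
To make the "new layer bounded by the previous layers" idea concrete,
here is a toy userspace model in Python. It is only a sketch under
stated assumptions: the `Namespace` class, its `unshare()` and
`allows()` methods, the rule tuples, and the type names are all invented
for illustration and are not how SELinux implements this. The single
assumption being illustrated is that an access is permitted only if the
new layer and every layer beneath it permit it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule format: (subject, object, permission).
Rule = tuple[str, str, str]


@dataclass
class Namespace:
    """Toy policy layer; 'parent' is the layer this one is stacked on."""
    name: str
    rules: frozenset[Rule]
    parent: Optional["Namespace"] = None

    def unshare(self, name: str, rules: set[Rule]) -> "Namespace":
        # Entering a child namespace adds a layer; it does not replace
        # the policy of the layers underneath.
        return Namespace(name, frozenset(rules), parent=self)

    def allows(self, subject: str, obj: str, perm: str) -> bool:
        # Walk from this layer down to the root: every layer must agree.
        layer: Optional[Namespace] = self
        while layer is not None:
            if (subject, obj, perm) not in layer.rules:
                return False
            layer = layer.parent
        return True


# Made-up policies for the example.
host = Namespace("host", frozenset({
    ("app_t", "log_t", "write"),
    ("app_t", "etc_t", "read"),
}))
# The child policy tries to grant more than the host allows.
child = host.unshare("container", {
    ("app_t", "log_t", "write"),
    ("app_t", "shadow_t", "read"),
})
```

In this model `child.allows("app_t", "shadow_t", "read")` is false even
though the child policy lists it, because the host layer never granted
it; the child can only narrow what the layers below already allow.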

> See the writings of Werner Karl Heisenberg for the reasoning behind
> that... :-)
> 
>> So far I think the inherit/policy directed injection works for
>> apparmor, and selinux. Container managers generally speaking have to
>> additional setup after the container is created before running the
>> work load, which means a separate load phase should be fine.
>>
>> However I can see an argument for having policy in place when
>> clone/unshare exit. Admittedly atm its largely around flexibility, and
>> nebulous ill defined use cases. Just because something works for
>> apparmor, selinux, and I think smack, doesn't mean it would work for
>> all use cases.
>>
>> But we also should add flexibility for flexibility just because we can
>> see there might be some future utility for some future use case. It
>> would certainly make the interface uglier, and more complicated, and I
>> would hate to have to carry that without a concrete use case.
>>
>> I think unless there is a solid use case for making clone/unshare
>> policy aware we don't worry about it for now. A new interface can be
>> add in the future if the capability is really needed.
> 
> We will respond more directly to the issue of clone, unshare and
> external process entry, in the other thread where you initiated a
> discussion of these issues.  We believe there is a strong argument to
> be made that LSM namespace separation is a poor fit for the classic
> fork/unshare model of the other resource namespaces.
> 
The other resource namespaces being able to move independently of the
security namespace, or at least of mediation by the security namespace,
is a complete disaster and should never have been allowed.

> Among other issues, a direct separation model places the complexity of
> policy verification and loading in userspace.  As was noted above,
> accounting for the security events related to the policy verification
> and load process, in the orchestrator process, will be a requirement
> for some integrity and functional models.
> 
There are different levels of verification. It makes sense to do some
of it in the individual LSM, some of it in userspace, and potentially
some at another level in another LSM. Unfortunately Linux has forced the
concept of containers to be a user level construct, and this forces
certain verifications around containers to be in userspace.

AppArmor does a policy verification, checking that policy meets all the
bounding constraints etc. That is very different from the verification
check that IMA may be doing, checking that this policy is blessed and
allowed to be loaded. AppArmor could support some IMA verification but
is very much designed to be like Landlock in that unprivileged userspace
_may_ have privilege to load policy into the kernel. You may not want to
allow that on some systems, but you certainly do on others. The system
level signature check that IMA does isn't appropriate for unprivileged
user policy. But the AppArmor verification check is.

And yes, something like IMA that is doing a system level integrity check
is going to need a post policy load callback to do verification. This
again doesn't need an orchestrator, just support in the infrastructure
and a callback that individual LSMs can trigger. See the work Paul is
doing to rework the LSM init, or how IMA is doing a verification of
SELinux policy.

Of course you have to trust the LSMs to trigger the callback, but it's
open source and the code can be checked. If you can't trust the
individual LSMs you have a much bigger problem, because you just can't
trust a monolithic kernel and you are going to need a trust
zone/hypervisor above the kernel to do any form of integrity check you
can trust.


> Have a good weekend.
> 
> As always,
> Dr. Greg
> 
> The Quixote Project - Flailing at the Travails of Cybersecurity
>                https://github.com/Quixote-Project
> 



* Re: LSM namespacing API
  2025-08-19 14:56 LSM namespacing API Paul Moore
  2025-08-19 17:11 ` Casey Schaufler
@ 2025-08-19 17:47 ` Stephen Smalley
  2025-08-19 18:51   ` Paul Moore
  2025-08-21  7:46   ` John Johansen
  2025-08-21  7:14 ` John Johansen
  2025-08-21 11:20 ` Dr. Greg
  3 siblings, 2 replies; 43+ messages in thread
From: Stephen Smalley @ 2025-08-19 17:47 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
>
> Hello all,
>
> As most of you are likely aware, Stephen Smalley has been working on
> adding namespace support to SELinux, and the work has now progressed
> to the point where a serious discussion on the API is warranted.  For
> those of you are unfamiliar with the details or Stephen's patchset, or
> simply need a refresher, he has some excellent documentation in his
> work-in-progress repo:
>
> * https://github.com/stephensmalley/selinuxns
>
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> about SELinux namespacing, you can watch the presentation here:
>
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>
> In the past you've heard me state, rather firmly at times, that I
> believe namespacing at the LSM framework layer to be a mistake,
> although if there is something that can be done to help facilitate the
> namespacing of individual LSMs at the framework layer, I would be
> supportive of that.  I think that a single LSM namespace API, similar
> to our recently added LSM syscalls, may be such a thing, so I'd like
> us to have a discussion to see if we all agree on that, and if so,
> what such an API might look like.
>
> At LSS-NA this year, John Johansen and I had a brief discussion where
> he suggested a single LSM wide clone*(2) flag that individual LSM's
> could opt into via callbacks.  John is directly CC'd on this mail, so
> I'll let him expand on this idea.
>
> While I agree with John that a fs based API is problematic (see all of
> our discussions around the LSM syscalls), I'm concerned that a single
> clone*(2) flag will significantly limit our flexibility around how
> individual LSMs are namespaced, something I don't want to see happen.
> This makes me wonder about the potential for expanding
> lsm_set_self_attr(2) to support a new LSM attribute that would support
> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> provide a single LSM framework API for an unshare operation while also
> providing a mechanism to pass LSM specific via the lsm_ctx struct if
> needed.  Just as we do with the other LSM_ATTR_* flags today,
> individual LSMs can opt-in to the API fairly easily by providing a
> setselfattr() LSM callback.
>
> Thoughts?

I think we want to be able to unshare a specific security module
namespace without unsharing the others, i.e. just SELinux or just
AppArmor.
Not sure if your suggestion above supports that already but wanted to note it.
Regardless, I have no objections to any system call or flag that can
be used to unshare the SELinux namespace and it should be trivial to
wire it up to the existing underlying function.
Serge pointed out that we also will need an API to attach to an
existing SELinux namespace, which I captured here:
https://github.com/stephensmalley/selinuxns/issues/19
This is handled for other Linux namespaces by opening a pseudo file
under /proc/pid/ns and invoking setns(2), so not sure how we want to
do it.


* Re: LSM namespacing API
  2025-08-19 17:47 ` Stephen Smalley
@ 2025-08-19 18:51   ` Paul Moore
  2025-08-19 18:52     ` Paul Moore
                       ` (2 more replies)
  2025-08-21  7:46   ` John Johansen
  1 sibling, 3 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-19 18:51 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> I think we want to be able to unshare a specific security module
> namespace without unsharing the others, i.e. just SELinux or just
> AppArmor.
> Not sure if your suggestion above supports that already but wanted to note it.

The lsm_set_self_attr(2) approach allows for LSM specific unshare
operations.  Take the existing LSM_ATTR_EXEC attribute as an example,
two LSMs have implemented support (AppArmor and SELinux), and
userspace can independently set the attribute as desired for each LSM.

> Serge pointed out that we also will need an API to attach to an
> existing SELinux namespace, which I captured here:
> https://github.com/stephensmalley/selinuxns/issues/19
> This is handled for other Linux namespaces by opening a pseudo file
> under /proc/pid/ns and invoking setns(2), so not sure how we want to
> do it.

One option would be to have the LSM framework return an LSM namespace
"handle" for a given LSM using lsm_get_self_attr(2) and then do a
setns(2)-esque operation using lsm_set_self_attr(2) with that
"handle".  We would need to figure out what would constitute a
"handle" but let's just mark that as TBD for now with this approach (I
think better options are available).

Since we have an existing LSM namespace combination, with processes
running inside of it, it might be sufficient to simply support moving
into an existing LSM namespace set with setns(2) using only a pidfd
and a new CLONE_LSMNS flag (or similar, upstream might want this as
CLONE_NEWLSM).  This would simply set the LSM namespace set for the
setns(2) caller to match that of the target pidfd.  We still wouldn't
want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().

Any other ideas?

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-19 18:51   ` Paul Moore
@ 2025-08-19 18:52     ` Paul Moore
  2025-08-20 14:44     ` Mickaël Salaün
  2025-08-21  2:05     ` Serge E. Hallyn
  2 siblings, 0 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-19 18:52 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 2:51 PM Paul Moore <paul@paul-moore.com> wrote:
> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> >
> > I think we want to be able to unshare a specific security module
> > namespace without unsharing the others, i.e. just SELinux or just
> > AppArmor.
> > Not sure if your suggestion above supports that already but wanted to note it.
>
> The lsm_set_self_attr(2) approach allows for LSM specific unshare
> operations.  Take the existing LSM_ATTR_EXEC attribute as an example,
> two LSMs have implemented support (AppArmor and SELinux), and
> userspace can independently set the attribute as desired for each LSM.

I should add, for those that didn't follow the lsm_set_self_attr(2)
development, if you want to set the same attribute on multiple LSMs,
you must make multiple calls to lsm_set_self_attr(2) (think of error
handling/conditions).
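
As a rough userspace illustration of "one call per LSM", the Python
sketch below packs one `struct lsm_ctx` per LSM and issues the calls one
at a time, recording each result separately. Assumptions to check before
relying on it: the LSM ID and LSM_ATTR_EXEC values are quoted from
memory of include/uapi/linux/lsm.h, rounding `len` up to 8 bytes is a
conservative choice rather than a documented requirement, the label and
profile strings are made up, and `set_attr` / `fake_set_attr` are
hypothetical stand-ins for the real lsm_set_self_attr(2) call so the
sketch runs without touching the kernel. A namespace attribute such as
the proposed LSM_ATTR_UNSHARE would presumably follow the same pattern,
but it has no assigned value, so the existing LSM_ATTR_EXEC is used.

```python
import errno
import struct
from typing import Callable

# Values as I understand include/uapi/linux/lsm.h; verify locally.
LSM_ID_SELINUX = 101
LSM_ID_APPARMOR = 104
LSM_ATTR_EXEC = 101

# Fixed part of struct lsm_ctx: id, flags, len, ctx_len (all __u64).
LSM_CTX_HEADER = struct.Struct("=QQQQ")


def pack_lsm_ctx(lsm_id: int, value: bytes) -> bytes:
    """Build one struct lsm_ctx: fixed header, value, zero padding."""
    total = LSM_CTX_HEADER.size + len(value)
    total += -total % 8  # round the overall length up to 8 bytes
    header = LSM_CTX_HEADER.pack(lsm_id, 0, total, len(value))
    return (header + value).ljust(total, b"\0")


def set_attr_per_lsm(
    attr: int,
    values: dict[int, bytes],
    set_attr: Callable[[int, bytes], int],
) -> dict[int, int]:
    """Call set_attr once per LSM; return {lsm_id: 0 or errno}."""
    results = {}
    for lsm_id, value in values.items():
        # One context per call, so a failure in one LSM is reported on
        # its own and does not hide the outcome for the others.
        results[lsm_id] = set_attr(attr, pack_lsm_ctx(lsm_id, value))
    return results


def fake_set_attr(attr: int, ctx: bytes) -> int:
    """Stand-in for the syscall: pretend only SELinux implements attr."""
    lsm_id = LSM_CTX_HEADER.unpack_from(ctx)[0]
    return 0 if lsm_id == LSM_ID_SELINUX else errno.EOPNOTSUPP


results = set_attr_per_lsm(
    LSM_ATTR_EXEC,
    {LSM_ID_SELINUX: b"system_u:system_r:app_t:s0",
     LSM_ID_APPARMOR: b"my_profile"},
    fake_set_attr,
)
```

With the fake setter, `results` maps the SELinux ID to 0 and the
AppArmor ID to EOPNOTSUPP, which is the situation a caller has to decide
how to handle: continue with a partially applied setting, or undo the
calls that already succeeded.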

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-19 18:51   ` Paul Moore
  2025-08-19 18:52     ` Paul Moore
@ 2025-08-20 14:44     ` Mickaël Salaün
  2025-08-20 15:37       ` Casey Schaufler
  2025-08-20 20:47       ` Paul Moore
  2025-08-21  2:05     ` Serge E. Hallyn
  2 siblings, 2 replies; 43+ messages in thread
From: Mickaël Salaün @ 2025-08-20 14:44 UTC (permalink / raw)
  To: Paul Moore
  Cc: Stephen Smalley, linux-security-module, selinux, John Johansen,
	Maxime Bélair

On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> >
> > I think we want to be able to unshare a specific security module
> > namespace without unsharing the others, i.e. just SELinux or just
> > AppArmor.
> > Not sure if your suggestion above supports that already but wanted to note it.
> 
> The lsm_set_self_attr(2) approach allows for LSM specific unshare
> operations.  Take the existing LSM_ATTR_EXEC attribute as an example,
> two LSMs have implemented support (AppArmor and SELinux), and
> userspace can independently set the attribute as desired for each LSM.
> 
> > Serge pointed out that we also will need an API to attach to an
> > existing SELinux namespace, which I captured here:
> > https://github.com/stephensmalley/selinuxns/issues/19
> > This is handled for other Linux namespaces by opening a pseudo file
> > under /proc/pid/ns and invoking setns(2), so not sure how we want to
> > do it.
> 
> One option would be to have a the LSM framework return a LSM namespace
> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
> setns(2)-esque operation using lsm_set_self_attr(2) with that
> "handle".  We would need to figure out what would constitute a
> "handle" but let's just mark that as TBD for now with this approach (I
> think better options are available).
> 
> Since we have an existing LSM namespace combination, with processes
> running inside of it, it might be sufficient to simply support moving
> into an existing LSM namespace set with setns(2) using only a pidfd
> and a new CLONE_LSMNS flag (or similar, upstream might want this as
> CLONE_NEWLSM).  This would simply set the LSM namespace set for the

Bike shedding, but I would prefer CLONE_NEWSEC or something without LSM
because the goal is not to add a new LSM but a new "security" namespace.
To fit with existing capabilities that could be reused by such a security
namespace (CAP_MAC_ADMIN), CLONE_NEWMAC is another option.  I know that
LSMs may not enforce MAC, but I think "LSM" would be confusing for
users.

> setns(2) caller to match that of the target pidfd.  We still wouldn't
> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().

Why would making clone*() support this flag be an issue?

> 
> Any other ideas?

The goal of a namespace is to configure absolute references (e.g. file
path, network address, PID, time).  I think it would make sense to have
an LSM/MAC/SEC namespace that would enforce a consistent access control
on every process in this namespace.  A related namespace file
descriptor could then be used with an LSM-specific syscall to configure
the policy related to a specific namespace (instead of only the current
namespace), see
https://lore.kernel.org/r/20250820.Ao3iquoshaiB@digikod.net
That would enable us to build a context before running untrusted code
in it, and to update the related security policy without requiring a
trusted (and exposed) process in each namespace.

I guess a security namespace would not be exclusive to an LSM but could
be shared, right?  If yes, then it's OK to only have one new security
namespace instead of one per LSM. ;)


* Re: LSM namespacing API
  2025-08-20 14:44     ` Mickaël Salaün
@ 2025-08-20 15:37       ` Casey Schaufler
  2025-08-20 20:47       ` Paul Moore
  1 sibling, 0 replies; 43+ messages in thread
From: Casey Schaufler @ 2025-08-20 15:37 UTC (permalink / raw)
  To: Mickaël Salaün, Paul Moore
  Cc: Stephen Smalley, linux-security-module, selinux, John Johansen,
	Maxime Bélair, Casey Schaufler

On 8/20/2025 7:44 AM, Mickaël Salaün wrote:
> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
>> <stephen.smalley.work@gmail.com> wrote:
>>> I think we want to be able to unshare a specific security module
>>> namespace without unsharing the others, i.e. just SELinux or just
>>> AppArmor.
>>> Not sure if your suggestion above supports that already but wanted to note it.
>> The lsm_set_self_attr(2) approach allows for LSM specific unshare
>> operations.  Take the existing LSM_ATTR_EXEC attribute as an example,
>> two LSMs have implemented support (AppArmor and SELinux), and
>> userspace can independently set the attribute as desired for each LSM.
>>
>>> Serge pointed out that we also will need an API to attach to an
>>> existing SELinux namespace, which I captured here:
>>> https://github.com/stephensmalley/selinuxns/issues/19
>>> This is handled for other Linux namespaces by opening a pseudo file
>>> under /proc/pid/ns and invoking setns(2), so not sure how we want to
>>> do it.
>> One option would be to have a the LSM framework return a LSM namespace
>> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
>> setns(2)-esque operation using lsm_set_self_attr(2) with that
>> "handle".  We would need to figure out what would constitute a
>> "handle" but let's just mark that as TBD for now with this approach (I
>> think better options are available).
>>
>> Since we have an existing LSM namespace combination, with processes
>> running inside of it, it might be sufficient to simply support moving
>> into an existing LSM namespace set with setns(2) using only a pidfd
>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>> CLONE_NEWLSM).  This would simply set the LSM namespace set for the
> Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM
> because the goal is not to add a new LSM but a new "security" namespace.
> To fit with existing capabilities that could be reused by such security
> namespace (CAP_MAC_ADMIN), CLONE_NEWMAC is another option.  I know that
> LSM may not be enforce MAC, but I think "LSM" would be confusing for
> users.

I disagree. Using MAC in the name is bad because many LSMs don't do MAC.
Using SEC is even worse, because no two "users" define "security" the
same way, and most of what implements security in Linux is outside of
LSMs. Since this feature would be limited to use by LSMs, it makes sense
that LSM be in the name.



* Re: LSM namespacing API
  2025-08-20 14:44     ` Mickaël Salaün
  2025-08-20 15:37       ` Casey Schaufler
@ 2025-08-20 20:47       ` Paul Moore
  2025-08-21  9:56         ` Mickaël Salaün
  1 sibling, 1 reply; 43+ messages in thread
From: Paul Moore @ 2025-08-20 20:47 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Stephen Smalley, linux-security-module, selinux, John Johansen,
	Maxime Bélair

On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > <stephen.smalley.work@gmail.com> wrote:

...

> > Since we have an existing LSM namespace combination, with processes
> > running inside of it, it might be sufficient to simply support moving
> > into an existing LSM namespace set with setns(2) using only a pidfd
> > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > CLONE_NEWLSM).  This would simply set the LSM namespace set for the
>
> Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM
> because the goal is not to add a new LSM but a new "security" namespace.

I disagree with your statement about the goal.  In fact I would argue
that one of the goals is to explicitly *not* create a generic
"security" namespace.  Defining a single, LSM-wide namespace, is
already an almost impossible task, extending it to become a generic
"security" namespace seems maddening.

> > setns(2) caller to match that of the target pidfd.  We still wouldn't
> > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>
> Why making clone*() support this flag would be an issue?

With the understanding that I'm not going to support a single LSM-wide
namespace (see my previous comments), we would need multiple flags for
clone*(), one for each LSM that wanted to implement a namespace.
While clone3() has expanded the number of flag bits from clone(),
there is still a limitation of 64-bits and I'm fairly certain the
other kernel devs are not going to be supportive of a flag for each
LSM that wants one.

Maybe we could argue for our own u64 in cl_args, or create our own
lsm_clone(2) syscall that mimics clone3(2) with better LSM support,
but neither of these seem like great ideas at the moment.

> > Any other ideas?
>
> The goal of a namespace is to configure absolute references (e.g. file
> path, network address, PID, time).  I think it would make sense to have
> an LSM/MAC/SEC namespace that would enforce a consistent access control
> on every processes in this namespace.

Once again, I'm not going to support the idea of a namespace at the
LSM framework layer, individual LSMs are better suited to implementing
their own namespacing concepts.  However, I do support the LSM
framework providing an API and/or helpers to help make it easier for
individual LSMs and userspace to create/manage individual LSM
namespaces.

> A related namespace file
> descriptor could then be used with an LSM-specific syscall to configure
> the policy related to a specific namespace (instead of only the current
> namespace)

That is a reasonable request, and I think the same underlying solution
that we would use for setns(2) could also be used here.

--
paul-moore.com


* Re: LSM namespacing API
  2025-08-20 20:47       ` Paul Moore
@ 2025-08-21  9:56         ` Mickaël Salaün
  2025-08-21 14:18           ` John Johansen
  2025-08-22  2:09           ` Paul Moore
  0 siblings, 2 replies; 43+ messages in thread
From: Mickaël Salaün @ 2025-08-21  9:56 UTC (permalink / raw)
  To: Paul Moore
  Cc: Stephen Smalley, linux-security-module, selinux, John Johansen,
	Maxime Bélair

On Wed, Aug 20, 2025 at 04:47:15PM -0400, Paul Moore wrote:
> On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote:
> > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > > <stephen.smalley.work@gmail.com> wrote:
> 
> ...
> 
> > > Since we have an existing LSM namespace combination, with processes
> > > running inside of it, it might be sufficient to simply support moving
> > > into an existing LSM namespace set with setns(2) using only a pidfd
> > > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > > CLONE_NEWLSM).  This would simply set the LSM namespace set for the
> >
> > Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM
> > because the goal is not to add a new LSM but a new "security" namespace.
> 
> I disagree with your statement about the goal.  In fact I would argue
> that one of the goals is to explicitly *not* create a generic
> "security" namespace.  Defining a single, LSM-wide namespace, is
> already an almost impossible task, extending it to become a generic
> "security" namespace seems maddening.

I didn't suggest a generic "security" namespace that would include
non-LSM access checks, just using the name "security" instead of "LSM",
but never mind.

> 
> > > setns(2) caller to match that of the target pidfd.  We still wouldn't
> > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
> >
> > Why making clone*() support this flag would be an issue?
> 
> With the understanding that I'm not going to support a single LSM-wide
> namespace (see my previous comments), we would need multiple flags for

I'm confused about the goal of this thread...  When I read "namespace" I
think about the user space interface that ties a set of processes to
ambient kernel objects.  I'm not suggesting that we force all LSMs to
handle namespaces, but that we have a unified user space interface
(e.g. namespace flag, file descriptor...) that user space can use to
request a new "context" that may or may not be used by running LSMs.

> clone*(), one for each LSM that wanted to implement a namespace.

My understanding of this proposal was to create an LSM-wide namespace,
and one of the reasons was to avoid one namespace per LSM.  As I
explained in my previous email, I think it would make sense and could be
convincing.

> While clone3() has expanded the number of flag bits from clone(),
> there is still a limitation of 64-bits and I'm fairly certain the
> other kernel devs are not going to be supportive of a flag for each
> LSM that wants one.
> 
> Maybe we could argue for our own u64 in cl_args, or create our own
> lsm_clone(2) syscall that mimics clone3(2) with better LSM support,
> but neither of these seem like great ideas at the moment.

My idea was that using CLONE_NEWLSM would just fork the current/initial
namespace used by LSMs to tie security policies/configurations to
processes, but as John already said, it would be the responsibility of
each LSM to either inherit and keep in sync the parent policy (e.g.
SELinux) or start with a blank/default one (e.g. Yama).

One way to configure a newly created namespace could be to load a
configuration in the parent namespace (e.g. with one of the new LSM
config syscalls and a dedicated flag) that would only be applied to child
namespaces when they are created, similarly to attr/exec for execve(2).
I think this is what you meant with the LSM_UNSHARE flag, right?

> 
> > > Any other ideas?
> >
> > The goal of a namespace is to configure absolute references (e.g. file
> > path, network address, PID, time).  I think it would make sense to have
> > an LSM/MAC/SEC namespace that would enforce a consistent access control
> > on every processes in this namespace.
> 
> Once again, I'm not going to support the idea of a namespace at the
> LSM framework layer, individual LSMs are better suited to implementing
> their own namespacing concepts.  However, I do support the LSM
> framework providing an API and/or helpers to help make it easier for
> individual LSMs and userspace to create/manage individual LSM
> namespaces.

Should we still talk about "namespace" or use another name?

> 
> > A related namespace file
> > descriptor could then be used with an LSM-specific syscall to configure
> > the policy related to a specific namespace (instead of only the current
> > namespace)
> 
> That is a reasonable request, and I think the same underlying solution
> that we would use for setns(2) could also be used here.

I'm not sure having a set of namespace file descriptors without related
clone flags would be acceptable, at least for what we currently call
a Linux "namespace".


* Re: LSM namespacing API
  2025-08-21  9:56         ` Mickaël Salaün
@ 2025-08-21 14:18           ` John Johansen
  2025-08-22  2:09           ` Paul Moore
  1 sibling, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21 14:18 UTC (permalink / raw)
  To: Mickaël Salaün, Paul Moore
  Cc: Stephen Smalley, linux-security-module, selinux,
	Maxime Bélair

On 8/21/25 02:56, Mickaël Salaün wrote:
> On Wed, Aug 20, 2025 at 04:47:15PM -0400, Paul Moore wrote:
>> On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote:
>>> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
>>>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
>>>> <stephen.smalley.work@gmail.com> wrote:
>>
>> ...
>>
>>>> Since we have an existing LSM namespace combination, with processes
>>>> running inside of it, it might be sufficient to simply support moving
>>>> into an existing LSM namespace set with setns(2) using only a pidfd
>>>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>>>> CLONE_NEWLSM).  This would simply set the LSM namespace set for the
>>>
>>> Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM
>>> because the goal is not to add a new LSM but a new "security" namespace.
>>
>> I disagree with your statement about the goal.  In fact I would argue
>> that one of the goals is to explicitly *not* create a generic
>> "security" namespace.  Defining a single, LSM-wide namespace, is
>> already an almost impossible task, extending it to become a generic
>> "security" namespace seems maddening.
> 
> I didn't suggest a generic "security" namespace that would include
> non-LSM access checks, just using the name "security" instead of "LSM",
> but never mind.
> 
>>
>>>> setns(2) caller to match that of the target pidfd.  We still wouldn't
>>>> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>>>
>>> Why making clone*() support this flag would be an issue?
>>
>> With the understanding that I'm not going to support a single LSM-wide
>> namespace (see my previous comments), we would need multiple flags for
> 
> I'm confused about the goal of this thread...  When I read namespace I
> think about the user space interface that enables to tie a set of
> processes to ambient kernel objects.  I'm not suggesting to force all
> LSM to handle namespaces, but to have a unified user space interface
> (i.e. namespace flag, file descriptor...) that can be used by user space
> to request a new "context" that may or may not be used by running LSMs.
> 

Yes to a unified interface, no to an LSM-wide namespace.  The interface
could ask each LSM to namespace itself, but it's up to the LSM what it
will do: whether it creates a namespace at all and, if it does, whether
that namespace is hierarchical or flat.

At the end of the call you would likely get a proxy object referring to
a set of individual LSM namespace contexts.  That is not so different
from the way a task already has a set of different system namespaces:
mount, pid, user, ...
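
To make the comparison concrete, here is a rough userspace sketch of what
such a proxy object could look like, loosely in the spirit of how a task's
mount, pid, user, ... namespaces are grouped behind one pointer.  Everything
in it (struct lsm_ns_set, the EX_LSM_* ids, the helper names) is
hypothetical and only illustrates the shape; it is not an existing kernel
structure or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LSM identifiers, for illustration only. */
enum ex_lsm_id { EX_LSM_SELINUX, EX_LSM_APPARMOR, EX_LSM_YAMA, EX_LSM_MAX };

/*
 * Hypothetical proxy object: one opaque slot per LSM.  A NULL slot means
 * that LSM did not create a namespace, so the task keeps whatever state
 * that LSM already had for it.
 */
struct lsm_ns_set {
	void *ctx[EX_LSM_MAX];
};

/* Record the namespace context an LSM chose to create. */
int lsm_ns_set_attach(struct lsm_ns_set *set, enum ex_lsm_id id, void *ctx)
{
	if (!set || (unsigned int)id >= EX_LSM_MAX)
		return -1;
	set->ctx[id] = ctx;
	return 0;
}

/* Look up an LSM's context; NULL if that LSM is not namespaced here. */
void *lsm_ns_set_lookup(const struct lsm_ns_set *set, enum ex_lsm_id id)
{
	if (!set || (unsigned int)id >= EX_LSM_MAX)
		return NULL;
	return set->ctx[id];
}
```

The point of the opaque slots is that the framework never interprets a
context; whether it is hierarchical, flat, or absent stays entirely inside
the owning LSM.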

>> clone*(), one for each LSM that wanted to implement a namespace.
> 
> My understanding of this proposal was to create a LSM-wide namespace,
> and one of the reason was to avoid one namespace per LSM.  As I

No, each LSM will do its own thing wrt namespacing.  The proposal is just
to provide a common API and minimal infra around it.

> explained in my previous email, I think it would make sense and could be
> convincing.
> 
I have to agree with Paul that we won't generically agree on what an LSM
namespace should be.

>> While clone3() has expanded the number of flag bits from clone(),
>> there is still a limitation of 64-bits and I'm fairly certain the
>> other kernel devs are not going to be supportive of a flag for each
>> LSM that wants one.
>>
>> Maybe we could argue for our own u64 in cl_args, or create our own
>> lsm_clone(2) syscall that mimics clone3(2) with better LSM support,
>> but neither of these seem like great ideas at the moment.
> 
> My idea was that using CLONE_NEWLSM would just fork the current/initial
> namespace used by LSMs to tie security policies/configurations to
> processes, but as John already said, it would be the responsibility of
> each LSM to either inherit and keep in sync the parent policy (e.g.
> SELinux) or start with a blank/default one (e.g. Yama).
> 
It's not just these options though.  The container manager may want to
"drop/add" an LSM.
E.g. a Fedora/RH host booting an Ubuntu container: your host has SELinux,
the container wants AppArmor.

In reality you have both SELinux and AppArmor active on the system,
but SELinux is in an enforcing state, and AppArmor is in a no-policy
state.

SELinux could deny creating the namespace, it could return its current
state, or it could mask itself by creating a namespace for the container
with the default unconfined_t policy.  In that last case its current
state is still there bounding the container; the container just doesn't
see it.

On the AppArmor side, when a new namespace with AppArmor is requested,
it needs to decide what to do independent of what SELinux does.  Yes,
if configured correctly it should set up its policy namespace for the
container, but it has choices just like SELinux, and those are driven
by policy as well as by the userspace request for a specific combination
of LSMs for the container.

> One way to configure a newly created namespace could be to load a
> configuration in the parent namespace (e.g. with one of the new LSM
> config syscall and a dedicated flag) that would only be applied to child
> namespaces when they are created, similarly to attr/exec for execve(2).
The host injecting policy into the container certainly could be supported,
but I think that would be a per-LSM thing.

The attr/exec flags Paul was discussing (correct me if I am wrong) were
a way to specify which LSMs should be part of the unshare.  So the
whole "I want a container to support Ubuntu or RH and need these LSMs"
case.

> I think this is what you meant with the LSM_UNSHARE flag, right?
> 
Per my above understanding the LSM_UNSHARE flag is then just a
namespacing flag that indicates you want to unshare the LSM and use the
aforementioned attrs.

I don't think it is actually needed, but it may be desirable for
consistency.  If you have already set the above attrs, that already
indicates what you want to do with the namespace at clone/unshare.

This then gets fed into every LSM (whether in the attrs or not) so that
each can make a decision under its current policy.  If that is allowed,
a second hook passes the same info so that each LSM can set up and
return its context.  Not really all that different from exec.
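
The two-step flow described here (ask every LSM first, set up only if
nobody objects) can be sketched in plain C.  This is a userspace model
under assumed names: struct ex_lsm_ns_ops, its permit/setup callbacks and
ex_lsm_ns_unshare() are all hypothetical, and the two toy "LSMs" at the
bottom exist only to exercise the dispatcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-LSM callbacks; none of these names exist in the kernel. */
struct ex_lsm_ns_ops {
	const char *name;
	/* Phase 1: may the request go ahead under current policy?  0 = yes. */
	int (*permit)(unsigned int requested);
	/* Phase 2: set up this LSM's context for the new namespace set. */
	void (*setup)(unsigned int requested);
};

/*
 * Feed the request to every LSM, whether or not it is named in
 * "requested".  Nothing is set up unless every LSM permits the request.
 * Returns 0 on success, or the index + 1 of the first LSM that refused.
 */
int ex_lsm_ns_unshare(const struct ex_lsm_ns_ops *ops, size_t n,
		      unsigned int requested)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ops[i].permit && ops[i].permit(requested) != 0)
			return (int)i + 1;

	for (i = 0; i < n; i++)
		if (ops[i].setup)
			ops[i].setup(requested);

	return 0;
}

/* Toy callbacks used to exercise the dispatcher. */
int ex_setup_calls;

int ex_permit_always(unsigned int requested) { (void)requested; return 0; }
int ex_permit_never(unsigned int requested) { (void)requested; return -1; }
void ex_setup_count(unsigned int requested) { (void)requested; ex_setup_calls++; }
```

Running all of the permission checks before any setup means a refusal from
one LSM never leaves another LSM with a half-created namespace to undo.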

>>
>>>> Any other ideas?
>>>
>>> The goal of a namespace is to configure absolute references (e.g. file
>>> path, network address, PID, time).  I think it would make sense to have
>>> an LSM/MAC/SEC namespace that would enforce a consistent access control
>>> on every processes in this namespace.
>>
>> Once again, I'm not going to support the idea of a namespace at the
>> LSM framework layer, individual LSMs are better suited to implementing
>> their own namespacing concepts.  However, I do support the LSM
>> framework providing an API and/or helpers to help make it easier for
>> individual LSMs and userspace to create/manage individual LSM
>> namespaces.
> 
> Should we still talk about "namespace" or use another name?
> 
It's namespaces for LSMs, just not an LSM namespace.

>>
>>> A related namespace file
>>> descriptor could then be used with an LSM-specific syscall to configure
>>> the policy related to a specific namespace (instead of only the current
>>> namespace)
>>
>> That is a reasonable request, and I think the same underlying solution
>> that we would use for setns(2) could also be used here.
> 
> I'm not sure having a set of namespace file descriptors without related
> clone flags would be acceptable, at least for what we currently call
> Linux "namespace".

Well, Paul did propose a single Clone_LSM flag that would cover them ;-).

I agree with Paul that a per-LSM flag would be unlikely, and would just
bring on the whole "security is crazy, why can't you agree on one?" kind
of "fun".



* Re: LSM namespacing API
  2025-08-21  9:56         ` Mickaël Salaün
  2025-08-21 14:18           ` John Johansen
@ 2025-08-22  2:09           ` Paul Moore
  1 sibling, 0 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-22  2:09 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Stephen Smalley, linux-security-module, selinux, John Johansen,
	Maxime Bélair

On Thu, Aug 21, 2025 at 5:56 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Wed, Aug 20, 2025 at 04:47:15PM -0400, Paul Moore wrote:
> > On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > > > <stephen.smalley.work@gmail.com> wrote:
> >
> > ...
> >
> > > > Since we have an existing LSM namespace combination, with processes
> > > > running inside of it, it might be sufficient to simply support moving
> > > > into an existing LSM namespace set with setns(2) using only a pidfd
> > > > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > > > CLONE_NEWLSM).  This would simply set the LSM namespace set for the
> > >
> > > Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM
> > > because the goal is not to add a new LSM but a new "security" namespace.
> >
> > I disagree with your statement about the goal.  In fact I would argue
> > that one of the goals is to explicitly *not* create a generic
> > "security" namespace.  Defining a single, LSM-wide namespace, is
> > already an almost impossible task, extending it to become a generic
> > "security" namespace seems maddening.
>
> I didn't suggest a generic "security" namespace that would include
> non-LSM access checks, just using the name "security" instead of "LSM",
> but never mind.
>
> > > > setns(2) caller to match that of the target pidfd.  We still wouldn't
> > > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
> > >
> > > Why making clone*() support this flag would be an issue?
> >
> > With the understanding that I'm not going to support a single LSM-wide
> > namespace (see my previous comments), we would need multiple flags for
>
> I'm confused about the goal of this thread...  When I read namespace I
> think about the user space interface that enables to tie a set of
> processes to ambient kernel objects.  I'm not suggesting to force all
> LSM to handle namespaces, but to have a unified user space interface
> (i.e. namespace flag, file descriptor...) that can be used by user space
> to request a new "context" that may or may not be used by running LSMs.

The goal of this thread is to hopefully define a set of APIs that
allow userspace to create new LSM namespace sets, and join existing
LSM namespace sets.  We're not necessarily focused on any individual
LSM namespace concepts, beyond ensuring that the API provides enough
flexibility for the different concepts to be implemented.

> > clone*(), one for each LSM that wanted to implement a namespace.
>
> My understanding of this proposal was to create a LSM-wide namespace,
> and one of the reason was to avoid one namespace per LSM.

As I stated in my original email, perhaps not clearly enough, and
several times in the past, I have no interest in supporting a single
LSM-wide namespace at this point in time.  Any LSM namespaces must be
done at the individual LSM layer, although I am supportive of an API
at the LSM framework layer to both help facilitate the individual LSM
namespaces and provide a better userspace interface.

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-19 18:51   ` Paul Moore
  2025-08-19 18:52     ` Paul Moore
  2025-08-20 14:44     ` Mickaël Salaün
@ 2025-08-21  2:05     ` Serge E. Hallyn
  2025-08-21  2:35       ` Paul Moore
  2025-08-21  8:07       ` John Johansen
  2 siblings, 2 replies; 43+ messages in thread
From: Serge E. Hallyn @ 2025-08-21  2:05 UTC (permalink / raw)
  To: Paul Moore; +Cc: Stephen Smalley, linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> >
> > I think we want to be able to unshare a specific security module
> > namespace without unsharing the others, i.e. just SELinux or just
> > AppArmor.
> > Not sure if your suggestion above supports that already but wanted to note it.
> 
> The lsm_set_self_attr(2) approach allows for LSM specific unshare
> operations.  Take the existing LSM_ATTR_EXEC attribute as an example,
> two LSMs have implemented support (AppArmor and SELinux), and
> userspace can independently set the attribute as desired for each LSM.

Overall I really like the idea.

> > Serge pointed out that we also will need an API to attach to an
> > existing SELinux namespace, which I captured here:
> > https://github.com/stephensmalley/selinuxns/issues/19
> > This is handled for other Linux namespaces by opening a pseudo file
> > under /proc/pid/ns and invoking setns(2), so not sure how we want to
> > do it.
> 
> One option would be to have a the LSM framework return a LSM namespace
> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
> setns(2)-esque operation using lsm_set_self_attr(2) with that
> "handle".  We would need to figure out what would constitute a
> "handle" but let's just mark that as TBD for now with this approach (I
> think better options are available).

The use case which would be complicated (not blocked) by this, is

* a runtime creates a process p1
  * p1 unshares its lsm namespace
* runtime forks a debug/admin process p2
  * p2 wants to enter p1's namespace

Of course the runtime could work around it by, before relinquishing
control of p1 to a new executable, returning the lsm_get_self_attr()
data over a pipe.

Note I don't think we should support setting another task's namespace,
only getting its namespace ID.
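
A minimal sketch of that pipe workaround, assuming the namespace "handle"
is a plain 64-bit value.  What a handle would really be is still TBD in
this thread, so the value here is a stand-in, the unshare step is not
performed, and ex_collect_child_handle() is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child standing in for p1.  The child would unshare its LSM
 * namespace (not done here, no such API exists yet) and then reports an
 * opaque handle to the parent over a pipe.
 * Returns 0 and stores the handle in *out, or -1 on error.
 */
int ex_collect_child_handle(uint64_t child_handle, uint64_t *out)
{
	int fds[2];
	pid_t pid;
	ssize_t n;
	int status;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child (p1): send the handle before any exec would happen. */
		close(fds[0]);
		n = write(fds[1], &child_handle, sizeof(child_handle));
		_exit(n == (ssize_t)sizeof(child_handle) ? 0 : 1);
	}

	/* Parent (runtime): keep the handle so a later p2 could use it. */
	close(fds[1]);
	n = read(fds[0], out, sizeof(*out));
	close(fds[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (n != (ssize_t)sizeof(*out))
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The extra plumbing is the complication: the runtime has to arrange the
pipe before p1 execs, whereas with a pidfd or /proc/pid/ns style interface
p2 could find p1's namespace after the fact.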

> Since we have an existing LSM namespace combination, with processes
> running inside of it, it might be sufficient to simply support moving
> into an existing LSM namespace set with setns(2) using only a pidfd
> and a new CLONE_LSMNS flag (or similar, upstream might want this as
> CLONE_NEWLSM).  This would simply set the LSM namespace set for the
> setns(2) caller to match that of the target pidfd.  We still wouldn't
> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().

A part of me is telling (another part of) me that being able to setns
to a subset of the lsms could lead to privilege escapes through
weird policy configurations for the various LSMs.  In which case,
an all-or-nothing LSM setns might actually be preferable.

I haven't thought of a concrete example, though.

> Any other ideas?
> 
> -- 
> paul-moore.com


* Re: LSM namespacing API
  2025-08-21  2:05     ` Serge E. Hallyn
@ 2025-08-21  2:35       ` Paul Moore
  2025-08-21  3:02         ` Serge E. Hallyn
  2025-08-21  8:12         ` John Johansen
  2025-08-21  8:07       ` John Johansen
  1 sibling, 2 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-21  2:35 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Stephen Smalley, linux-security-module, selinux, John Johansen

On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > <stephen.smalley.work@gmail.com> wrote:

...

> > > Serge pointed out that we also will need an API to attach to an
> > > existing SELinux namespace, which I captured here:
> > > https://github.com/stephensmalley/selinuxns/issues/19
> > > This is handled for other Linux namespaces by opening a pseudo file
> > > under /proc/pid/ns and invoking setns(2), so not sure how we want to
> > > do it.
> >
> > One option would be to have a the LSM framework return a LSM namespace
> > "handle" for a given LSM using lsm_get_self_attr(2) and then do a
> > setns(2)-esque operation using lsm_set_self_attr(2) with that
> > "handle".  We would need to figure out what would constitute a
> > "handle" but let's just mark that as TBD for now with this approach (I
> > think better options are available).
>
> The use case which would be complicated (not blocked) by this, is
>
> * a runtime creates a process p1
>   * p1 unshares its lsm namespace
> * runtime forks a debug/admin process p2
>   * p2 wants to enter p1's namespace
>
> Of course the runtime could work around it by, before relinquishing
> control of p1 to a new executable, returning the lsm_get_self_attr()
> data to over a pipe.
>
> Note I don't think we should support setting another task's namespace,
> only getting its namespace ID.
>
> > Since we have an existing LSM namespace combination, with processes
> > running inside of it, it might be sufficient to simply support moving
> > into an existing LSM namespace set with setns(2) using only a pidfd
> > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > CLONE_NEWLSM).  This would simply set the LSM namespace set for the
> > setns(2) caller to match that of the target pidfd.  We still wouldn't
> > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>
> A part of me is telling (another part of) me that being able to setns
> to a subset of the lsms could lead to privilege escapes through
> weird policy configurations for the various LSMs.  In which case,
> an all-or-nothing LSM setns might actually be preferable.

Sorry, I probably wasn't as clear as I should have been, but my idea
with using the existing procfs/setns(2) approach with a single
CLONE_NEWLSM (name pending sufficient bikeshedding) was that the
process being setns()'d would simply end up with an exact copy of the
target process' LSM namespace configuration; it shouldn't be a new
set/subset/configuration ... and I would expect us to have controls
around that such that LSMs could enforce policy on a setns(2)
operation that involved their LSM.
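
For reference, this is roughly what the existing pidfd plus setns(2) path
looks like from userspace today.  The sketch only uses flags that exist
(the caller passes something like CLONE_NEWUTS); a CLONE_NEWLSM or
CLONE_LSMNS bit is still only a proposal, and ex_join_ns_of() is a made-up
helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Join the namespaces of "pid" selected by "nstype" (a mask of CLONE_NEW*
 * flags) through a pidfd; setns(2) has accepted pidfds since Linux 5.8.
 * If a CLONE_NEWLSM style bit were ever added, it would presumably be
 * passed here like any other CLONE_NEW* flag.
 * Returns 0 on success, -1 with errno set on failure.
 */
int ex_join_ns_of(pid_t pid, int nstype)
{
	int pidfd, ret, saved_errno;

	/* Use syscall(2) so this builds with C libraries lacking a wrapper. */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	ret = setns(pidfd, nstype);
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;	/* report setns()'s error, not close()'s */
	return ret;
}
```

Because the caller names a target process rather than individual
namespaces, the kernel is free to move it into the target's LSM namespace
set as a whole, which matches the all-or-nothing behaviour discussed above.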

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-21  2:35       ` Paul Moore
@ 2025-08-21  3:02         ` Serge E. Hallyn
  2025-08-22  1:50           ` Paul Moore
  2025-08-21  8:12         ` John Johansen
  1 sibling, 1 reply; 43+ messages in thread
From: Serge E. Hallyn @ 2025-08-21  3:02 UTC (permalink / raw)
  To: Paul Moore
  Cc: Serge E. Hallyn, Stephen Smalley, linux-security-module, selinux,
	John Johansen

On Wed, Aug 20, 2025 at 10:35:42PM -0400, Paul Moore wrote:
> On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > > <stephen.smalley.work@gmail.com> wrote:
> 
> ...
> 
> > > > Serge pointed out that we also will need an API to attach to an
> > > > existing SELinux namespace, which I captured here:
> > > > https://github.com/stephensmalley/selinuxns/issues/19
> > > > This is handled for other Linux namespaces by opening a pseudo file
> > > > under /proc/pid/ns and invoking setns(2), so not sure how we want to
> > > > do it.
> > >
> > > One option would be to have a the LSM framework return a LSM namespace
> > > "handle" for a given LSM using lsm_get_self_attr(2) and then do a
> > > setns(2)-esque operation using lsm_set_self_attr(2) with that
> > > "handle".  We would need to figure out what would constitute a
> > > "handle" but let's just mark that as TBD for now with this approach (I
> > > think better options are available).
> >
> > The use case which would be complicated (not blocked) by this, is
> >
> > * a runtime creates a process p1
> >   * p1 unshares its lsm namespace
> > * runtime forks a debug/admin process p2
> >   * p2 wants to enter p1's namespace
> >
> > Of course the runtime could work around it by, before relinquishing
> > control of p1 to a new executable, returning the lsm_get_self_attr()
> > data to over a pipe.
> >
> > Note I don't think we should support setting another task's namespace,
> > only getting its namespace ID.
> >
> > > Since we have an existing LSM namespace combination, with processes
> > > running inside of it, it might be sufficient to simply support moving
> > > into an existing LSM namespace set with setns(2) using only a pidfd
> > > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > > CLONE_NEWLSM).  This would simply set the LSM namespace set for the
> > > setns(2) caller to match that of the target pidfd.  We still wouldn't
> > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
> >
> > A part of me is telling (another part of) me that being able to setns
> > to a subset of the lsms could lead to privilege escapes through
> > weird policy configurations for the various LSMs.  In which case,
> > an all-or-nothing LSM setns might actually be preferable.
> 
> Sorry I probably wasn't as clear as I should have been, but my idea
> with using the existing procfs/setns(2) approach with a single
> CLONE_NEWLSM (name pending sufficient bikeshedding) was that the
> process being setns()'d would simply end up in the exact copy of the
> target process' LSM namespace configuration, it shouldn't be a new

Oh, I think I was being unclear - I thought the first option, using
lsm_set_self_attr(), would allow choosing a subset of LSMs to setns to.
In contrast, the pure setns with a single flag is less flexible, but
possibly safer.  So I typed there the result of my train of thought,
which is that your second suggestion is probably preferable.

> set/subset/configuration ... and I would expect us to have controls
> around that such that LSMs could enforce policy on a setns(2)
> operation that involved their LSM.




* Re: LSM namespacing API
  2025-08-21  3:02         ` Serge E. Hallyn
@ 2025-08-22  1:50           ` Paul Moore
  0 siblings, 0 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-22  1:50 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Stephen Smalley, linux-security-module, selinux, John Johansen

On Wed, Aug 20, 2025 at 11:02 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Wed, Aug 20, 2025 at 10:35:42PM -0400, Paul Moore wrote:
> > On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > > > <stephen.smalley.work@gmail.com> wrote:
> >
> > ...
> >
> > > > > Serge pointed out that we also will need an API to attach to an
> > > > > existing SELinux namespace, which I captured here:
> > > > > https://github.com/stephensmalley/selinuxns/issues/19
> > > > > This is handled for other Linux namespaces by opening a pseudo file
> > > > > under /proc/pid/ns and invoking setns(2), so not sure how we want to
> > > > > do it.
> > > >
> > > > One option would be to have a the LSM framework return a LSM namespace
> > > > "handle" for a given LSM using lsm_get_self_attr(2) and then do a
> > > > setns(2)-esque operation using lsm_set_self_attr(2) with that
> > > > "handle".  We would need to figure out what would constitute a
> > > > "handle" but let's just mark that as TBD for now with this approach (I
> > > > think better options are available).
> > >
> > > The use case which would be complicated (not blocked) by this, is
> > >
> > > * a runtime creates a process p1
> > >   * p1 unshares its lsm namespace
> > > * runtime forks a debug/admin process p2
> > >   * p2 wants to enter p1's namespace
> > >
> > > Of course the runtime could work around it by, before relinquishing
> > > control of p1 to a new executable, returning the lsm_get_self_attr()
> > > data to over a pipe.
> > >
> > > Note I don't think we should support setting another task's namespace,
> > > only getting its namespace ID.
> > >
> > > > Since we have an existing LSM namespace combination, with processes
> > > > running inside of it, it might be sufficient to simply support moving
> > > > into an existing LSM namespace set with setns(2) using only a pidfd
> > > > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > > > CLONE_NEWLSM).  This would simply set the LSM namespace set for the
> > > > setns(2) caller to match that of the target pidfd.  We still wouldn't
> > > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
> > >
> > > A part of me is telling (another part of) me that being able to setns
> > > to a subset of the lsms could lead to privilege escapes through
> > > weird policy configurations for the various LSMs.  In which case,
> > > an all-or-nothing LSM setns might actually be preferable.
> >
> > Sorry I probably wasn't as clear as I should have been, but my idea
> > with using the existing procfs/setns(2) approach with a single
> > CLONE_NEWLSM (name pending sufficient bikeshedding) was that the
> > process being setns()'d would simply end up in the exact copy of the
> > target process' LSM namespace configuration, it shouldn't be a new
>
> Oh, I think I was being unclear - I thought the first option, using
> lsm_set_self_attr(), would allow choosing a subset of LSMs to setns to.
> In contrast, the pure setns with a single flag is less flexible, but
> possibly safer.  So I typed there the result of my train of thought,
> which is that your second suggestion is probably preferable.

I think we've probably both been a bit off :)  Let me try again ...

I'm proposing the lsm_set_self_attr(2) approach as a way for a process
to set up an arbitrary set of LSM namespaces to take effect on an
upcoming clone() or exec() (we can discuss that detail).  I didn't
originally envision this as a way to potentially join existing LSM
namespaces, but rather a way to create new LSM namespaces when a new
process is created/exec'd.

The procfs/setns(2) approach would be in addition to the
lsm_set_self_attr(2) mechanism, and would allow a process to enter a
previously configured LSM namespace set when a CLONE_LSMNS (or
similar) flag was passed to setns(2).

Both mechanisms are very much up for debate in my mind, and doing
either, or both, is possible as far as I'm concerned.
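
To illustrate the per-LSM flexibility of the lsm_set_self_attr(2)
approach, here is a sketch of how userspace might build one request per
LSM.  It assumes the request reuses the lsm_ctx layout of the existing LSM
syscalls; the unshare attribute itself, the payload format, and the
ex_build_ns_request() helper are all hypothetical, and the sketch stops
short of making any syscall.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Local mirror of the layout of struct lsm_ctx from <linux/lsm.h>, so the
 * sketch builds on systems whose headers predate the LSM syscalls.
 */
struct ex_lsm_ctx {
	uint64_t id;		/* which LSM this entry is for */
	uint64_t flags;
	uint64_t len;		/* total size of this entry, with padding */
	uint64_t ctx_len;	/* size of the payload in ctx[] */
	uint8_t ctx[];
};

/*
 * Build one per-LSM request.  With a (hypothetical) unshare attribute the
 * caller would hand the result to lsm_set_self_attr(2) once for each LSM
 * it wants namespaced; the payload format would be up to each LSM.
 * Returns a malloc()ed entry, or NULL on failure.
 */
struct ex_lsm_ctx *ex_build_ns_request(uint64_t lsm_id, const void *payload,
				       size_t payload_len)
{
	struct ex_lsm_ctx *req;
	size_t len;

	if (payload_len > SIZE_MAX - sizeof(*req) - 7)
		return NULL;

	len = sizeof(*req) + payload_len;
	len = (len + 7) & ~(size_t)7;	/* keep entries 8-byte aligned */

	req = calloc(1, len);
	if (!req)
		return NULL;

	req->id = lsm_id;
	req->len = len;
	req->ctx_len = payload_len;
	if (payload_len)
		memcpy(req->ctx, payload, payload_len);
	return req;
}
```

Since every request carries its own LSM id, a container manager could
stage a request for one LSM and none for another, which is the kind of
independent control that a single clone*() flag cannot express.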

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-21  2:35       ` Paul Moore
  2025-08-21  3:02         ` Serge E. Hallyn
@ 2025-08-21  8:12         ` John Johansen
  1 sibling, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21  8:12 UTC (permalink / raw)
  To: Paul Moore, Serge E. Hallyn
  Cc: Stephen Smalley, linux-security-module, selinux

On 8/20/25 19:35, Paul Moore wrote:
> On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
>>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
>>> <stephen.smalley.work@gmail.com> wrote:
> 
> ...
> 
>>>> Serge pointed out that we also will need an API to attach to an
>>>> existing SELinux namespace, which I captured here:
>>>> https://github.com/stephensmalley/selinuxns/issues/19
>>>> This is handled for other Linux namespaces by opening a pseudo file
>>>> under /proc/pid/ns and invoking setns(2), so not sure how we want to
>>>> do it.
>>>
>>> One option would be to have a the LSM framework return a LSM namespace
>>> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
>>> setns(2)-esque operation using lsm_set_self_attr(2) with that
>>> "handle".  We would need to figure out what would constitute a
>>> "handle" but let's just mark that as TBD for now with this approach (I
>>> think better options are available).
>>
>> The use case which would be complicated (not blocked) by this, is
>>
>> * a runtime creates a process p1
>>    * p1 unshares its lsm namespace
>> * runtime forks a debug/admin process p2
>>    * p2 wants to enter p1's namespace
>>
>> Of course the runtime could work around it by, before relinquishing
>> control of p1 to a new executable, returning the lsm_get_self_attr()
>> data to over a pipe.
>>
>> Note I don't think we should support setting another task's namespace,
>> only getting its namespace ID.
>>
>>> Since we have an existing LSM namespace combination, with processes
>>> running inside of it, it might be sufficient to simply support moving
>>> into an existing LSM namespace set with setns(2) using only a pidfd
>>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>>> CLONE_NEWLSM).  This would simply set the LSM namespace set for the
>>> setns(2) caller to match that of the target pidfd.  We still wouldn't
>>> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>>
>> A part of me is telling (another part of) me that being able to setns
>> to a subset of the lsms could lead to privilege escapes through
>> weird policy configurations for the various LSMs.  In which case,
>> an all-or-nothing LSM setns might actually be preferable.
> 
> Sorry I probably wasn't as clear as I should have been, but my idea
> with using the existing procfs/setns(2) approach with a single
> CLONE_NEWLSM (name pending sufficient bikeshedding) was that the
> process being setns()'d would simply end up in the exact copy of the
> target process' LSM namespace configuration, it shouldn't be a new
> set/subset/configuration ... and I would expect us to have controls
> around that such that LSMs could enforce policy on a setns(2)
> operation that involved their LSM.
> 
Entering as a complete set is certainly the safest. At a minimum the
LSMs are going to need to be able to specify the set of namespaces
that are needed if you enter the LSM namespace. The easiest way to
do this is what you propose: take away the flexibility and allow
moving everything as a set.
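
As a rough illustration of the "move everything as a set" idea, here is a toy Python model, not kernel code. Every name in it (`LsmNamespaceSet`, `Task`, `setns_lsm`, the per-LSM policy hooks) is hypothetical and only assumes that each LSM gets a chance to veto the move before anything changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LsmNamespaceSet:
    """Hypothetical: one namespace name per LSM, stored as (lsm, ns) pairs."""
    members: tuple


@dataclass
class Task:
    name: str
    lsm_ns: LsmNamespaceSet


def setns_lsm(caller, target, policy_hooks):
    """Move caller into target's whole LSM namespace set, or change nothing.

    policy_hooks maps an LSM name to a callable(caller, target) -> bool.
    Any LSM in the target set may refuse; a refusal leaves caller untouched.
    """
    for lsm, _ns in target.lsm_ns.members:
        hook = policy_hooks.get(lsm)
        if hook is not None and not hook(caller, target):
            return False
    # All-or-nothing: the caller shares the exact same set object afterwards.
    caller.lsm_ns = target.lsm_ns
    return True
```

In this sketch there is deliberately no way to name a subset of LSMs, so the caller can never end up in a combination that no existing process is already running in.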

I do think we might still have a need to be able to request entering
an LSM namespace from the set, but I think that, at least for a first
pass, it's probably better not to go there.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: LSM namespacing API
  2025-08-21  2:05     ` Serge E. Hallyn
  2025-08-21  2:35       ` Paul Moore
@ 2025-08-21  8:07       ` John Johansen
  1 sibling, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21  8:07 UTC (permalink / raw)
  To: Serge E. Hallyn, Paul Moore
  Cc: Stephen Smalley, linux-security-module, selinux

On 8/20/25 19:05, Serge E. Hallyn wrote:
> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
>> <stephen.smalley.work@gmail.com> wrote:
>>>
>>> I think we want to be able to unshare a specific security module
>>> namespace without unsharing the others, i.e. just SELinux or just
>>> AppArmor.
>>> Not sure if your suggestion above supports that already but wanted to note it.
>>
>> The lsm_set_self_attr(2) approach allows for LSM specific unshare
>> operations.  Take the existing LSM_ATTR_EXEC attribute as an example,
>> two LSMs have implemented support (AppArmor and SELinux), and
>> userspace can independently set the attribute as desired for each LSM.
> 
> Overall I really like the idea.
> 
>>> Serge pointed out that we also will need an API to attach to an
>>> existing SELinux namespace, which I captured here:
>>> https://github.com/stephensmalley/selinuxns/issues/19
>>> This is handled for other Linux namespaces by opening a pseudo file
>>> under /proc/pid/ns and invoking setns(2), so not sure how we want to
>>> do it.
>>
>> One option would be to have a the LSM framework return a LSM namespace
>> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
>> setns(2)-esque operation using lsm_set_self_attr(2) with that
>> "handle".  We would need to figure out what would constitute a
>> "handle" but let's just mark that as TBD for now with this approach (I
>> think better options are available).
> 
> The use case which would be complicated (not blocked) by this, is
> 
> * a runtime creates a process p1
>    * p1 unshares its lsm namespace
> * runtime forks a debug/admin process p2
>    * p2 wants to enter p1's namespace
> 
> Of course the runtime could work around it by, before relinquishing
> control of p1 to a new executable, returning the lsm_get_self_attr()
> data to over a pipe.
> 
> Note I don't think we should support setting another task's namespace,
> only getting its namespace ID.
> 
It's not reasonably doable without a significant update to the creds
architecture. Being able to set another task's credentials is an
orthogonal feature, and as such can be saved for another argument. So
very much in agreement, let's not allow that as part of the design.


>> Since we have an existing LSM namespace combination, with processes
>> running inside of it, it might be sufficient to simply support moving
>> into an existing LSM namespace set with setns(2) using only a pidfd
>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>> CLONE_NEWLSM).  This would simply set the LSM namespace set for the
>> setns(2) caller to match that of the target pidfd.  We still wouldn't
>> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
> 
> A part of me is telling (another part of) me that being able to setns
> to a subset of the lsms could lead to privilege escapes through
> weird policy configurations for the various LSMs.  In which case,
> an all-or-nothing LSM setns might actually be preferable.
> 
> I haven't thought of a concrete example, though.
> 
Not just potentially, and not just security/LSM namespaces. Really
the LSMs need to be able to determine whether/which namespaces (including
system namespaces) need to move together as a set.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: LSM namespacing API
  2025-08-19 17:47 ` Stephen Smalley
  2025-08-19 18:51   ` Paul Moore
@ 2025-08-21  7:46   ` John Johansen
  2025-08-21 14:26     ` Serge E. Hallyn
  2025-08-22  1:59     ` Paul Moore
  1 sibling, 2 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21  7:46 UTC (permalink / raw)
  To: Stephen Smalley, Paul Moore; +Cc: linux-security-module, selinux

On 8/19/25 10:47, Stephen Smalley wrote:
> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
>>
>> Hello all,
>>
>> As most of you are likely aware, Stephen Smalley has been working on
>> adding namespace support to SELinux, and the work has now progressed
>> to the point where a serious discussion on the API is warranted.  For
>> those of you are unfamiliar with the details or Stephen's patchset, or
>> simply need a refresher, he has some excellent documentation in his
>> work-in-progress repo:
>>
>> * https://github.com/stephensmalley/selinuxns
>>
>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>> about SELinux namespacing, you can watch the presentation here:
>>
>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>
>> In the past you've heard me state, rather firmly at times, that I
>> believe namespacing at the LSM framework layer to be a mistake,
>> although if there is something that can be done to help facilitate the
>> namespacing of individual LSMs at the framework layer, I would be
>> supportive of that.  I think that a single LSM namespace API, similar
>> to our recently added LSM syscalls, may be such a thing, so I'd like
>> us to have a discussion to see if we all agree on that, and if so,
>> what such an API might look like.
>>
>> At LSS-NA this year, John Johansen and I had a brief discussion where
>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>> could opt into via callbacks.  John is directly CC'd on this mail, so
>> I'll let him expand on this idea.
>>
>> While I agree with John that a fs based API is problematic (see all of
>> our discussions around the LSM syscalls), I'm concerned that a single
>> clone*(2) flag will significantly limit our flexibility around how
>> individual LSMs are namespaced, something I don't want to see happen.
>> This makes me wonder about the potential for expanding
>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>> provide a single LSM framework API for an unshare operation while also
>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>> individual LSMs can opt-in to the API fairly easily by providing a
>> setselfattr() LSM callback.
>>
>> Thoughts?
> 
> I think we want to be able to unshare a specific security module
> namespace without unsharing the others, i.e. just SELinux or just
> AppArmor.

yes which is part of the problem with the single flag. That choice
would be entirely at the policy level, without any input from userspace.

I still think the policy may decide something different than what
userspace requests, but that just means the namespacing of an LSM is
under the individual LSM's control and not the infrastructure's.

E.g. SELinux is using hierarchical namespaces, so when asked for a
new namespace you will get the bounding hierarchy, but Yama (if it
ever gets namespace support) could very well just use independent
namespaces.
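
The difference between the two styles can be sketched with a toy Python model. This is only an illustration: the class and function names are made up, and it assumes "bounding hierarchy" means that a child namespace can never allow more than its ancestors do.

```python
class Namespace:
    """Hypothetical namespace node; parent is None for a root."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestry(self):
        """Return this namespace's name followed by its ancestors' names."""
        chain, ns = [], self
        while ns is not None:
            chain.append(ns.name)
            ns = ns.parent
        return chain


def unshare_hierarchical(current, name):
    # The new namespace stays bounded by the one it was created from.
    return Namespace(name, parent=current)


def unshare_independent(current, name):
    # The new namespace keeps no link back to the creating namespace.
    return Namespace(name, parent=None)


def allowed(ns, check):
    """Allow only if check(ns) passes for ns and for every ancestor."""
    while ns is not None:
        if not check(ns):
            return False
        ns = ns.parent
    return True
```

With the hierarchical style a denial in the parent still applies inside the child, while an independent namespace is evaluated on its own.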

> Not sure if your suggestion above supports that already but wanted to note it.
> Regardless, I have no objections to any system call or flag that can
> be used to unshare the SELinux namespace and it should be trivial to
> wire it up to the existing underlying function.
> Serge pointed out that we also will need an API to attach to an
> existing SELinux namespace, which I captured here:
> https://github.com/stephensmalley/selinuxns/issues/19

yes a mechanism to switch is needed, but I also strongly dislike
setns(2). For security purposes we definitely want to control whether
the LSM namespace is associated with other system namespaces.

> This is handled for other Linux namespaces by opening a pseudo file
> under /proc/pid/ns and invoking setns(2), so not sure how we want to
> do it.

That is a possible interface, not one that I like, so I would like to
explore other options first.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: LSM namespacing API
  2025-08-21  7:46   ` John Johansen
@ 2025-08-21 14:26     ` Serge E. Hallyn
  2025-08-21 14:57       ` John Johansen
  2025-08-22  1:59     ` Paul Moore
  1 sibling, 1 reply; 43+ messages in thread
From: Serge E. Hallyn @ 2025-08-21 14:26 UTC (permalink / raw)
  To: John Johansen; +Cc: Stephen Smalley, Paul Moore, linux-security-module, selinux

On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
> On 8/19/25 10:47, Stephen Smalley wrote:
> > On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
> > > 
> > > Hello all,
> > > 
> > > As most of you are likely aware, Stephen Smalley has been working on
> > > adding namespace support to SELinux, and the work has now progressed
> > > to the point where a serious discussion on the API is warranted.  For
> > > those of you are unfamiliar with the details or Stephen's patchset, or
> > > simply need a refresher, he has some excellent documentation in his
> > > work-in-progress repo:
> > > 
> > > * https://github.com/stephensmalley/selinuxns
> > > 
> > > Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> > > about SELinux namespacing, you can watch the presentation here:
> > > 
> > > * https://www.youtube.com/watch?v=AwzGCOwxLoM
> > > 
> > > In the past you've heard me state, rather firmly at times, that I
> > > believe namespacing at the LSM framework layer to be a mistake,
> > > although if there is something that can be done to help facilitate the
> > > namespacing of individual LSMs at the framework layer, I would be
> > > supportive of that.  I think that a single LSM namespace API, similar
> > > to our recently added LSM syscalls, may be such a thing, so I'd like
> > > us to have a discussion to see if we all agree on that, and if so,
> > > what such an API might look like.
> > > 
> > > At LSS-NA this year, John Johansen and I had a brief discussion where
> > > he suggested a single LSM wide clone*(2) flag that individual LSM's
> > > could opt into via callbacks.  John is directly CC'd on this mail, so
> > > I'll let him expand on this idea.
> > > 
> > > While I agree with John that a fs based API is problematic (see all of
> > > our discussions around the LSM syscalls), I'm concerned that a single
> > > clone*(2) flag will significantly limit our flexibility around how
> > > individual LSMs are namespaced, something I don't want to see happen.
> > > This makes me wonder about the potential for expanding
> > > lsm_set_self_attr(2) to support a new LSM attribute that would support
> > > a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> > > provide a single LSM framework API for an unshare operation while also
> > > providing a mechanism to pass LSM specific via the lsm_ctx struct if
> > > needed.  Just as we do with the other LSM_ATTR_* flags today,
> > > individual LSMs can opt-in to the API fairly easily by providing a
> > > setselfattr() LSM callback.
> > > 
> > > Thoughts?
> > 
> > I think we want to be able to unshare a specific security module
> > namespace without unsharing the others, i.e. just SELinux or just
> > AppArmor.
> 
> yes which is part of the problem with the single flag. That choice
> would be entirely at the policy level, without any input from userspace.

AIUI Paul's suggestion is the user can pre-set the details of which
lsms to unshare and how with the lsm_set_self_attr(), and then a
single CLONE_LSM effects that.

-serge

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: LSM namespacing API
  2025-08-21 14:26     ` Serge E. Hallyn
@ 2025-08-21 14:57       ` John Johansen
  2025-09-01 16:01         ` Dr. Greg
  0 siblings, 1 reply; 43+ messages in thread
From: John Johansen @ 2025-08-21 14:57 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Stephen Smalley, Paul Moore, linux-security-module, selinux

On 8/21/25 07:26, Serge E. Hallyn wrote:
> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>> On 8/19/25 10:47, Stephen Smalley wrote:
>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
>>>>
>>>> Hello all,
>>>>
>>>> As most of you are likely aware, Stephen Smalley has been working on
>>>> adding namespace support to SELinux, and the work has now progressed
>>>> to the point where a serious discussion on the API is warranted.  For
>>>> those of you are unfamiliar with the details or Stephen's patchset, or
>>>> simply need a refresher, he has some excellent documentation in his
>>>> work-in-progress repo:
>>>>
>>>> * https://github.com/stephensmalley/selinuxns
>>>>
>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>>>> about SELinux namespacing, you can watch the presentation here:
>>>>
>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>>>
>>>> In the past you've heard me state, rather firmly at times, that I
>>>> believe namespacing at the LSM framework layer to be a mistake,
>>>> although if there is something that can be done to help facilitate the
>>>> namespacing of individual LSMs at the framework layer, I would be
>>>> supportive of that.  I think that a single LSM namespace API, similar
>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
>>>> us to have a discussion to see if we all agree on that, and if so,
>>>> what such an API might look like.
>>>>
>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
>>>> I'll let him expand on this idea.
>>>>
>>>> While I agree with John that a fs based API is problematic (see all of
>>>> our discussions around the LSM syscalls), I'm concerned that a single
>>>> clone*(2) flag will significantly limit our flexibility around how
>>>> individual LSMs are namespaced, something I don't want to see happen.
>>>> This makes me wonder about the potential for expanding
>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>>>> provide a single LSM framework API for an unshare operation while also
>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>>>> individual LSMs can opt-in to the API fairly easily by providing a
>>>> setselfattr() LSM callback.
>>>>
>>>> Thoughts?
>>>
>>> I think we want to be able to unshare a specific security module
>>> namespace without unsharing the others, i.e. just SELinux or just
>>> AppArmor.
>>
>> yes which is part of the problem with the single flag. That choice
>> would be entirely at the policy level, without any input from userspace.
> 
> AIUI Paul's suggestion is the user can pre-set the details of which
> lsms to unshare and how with the lsm_set_self_attr(), and then a
> single CLONE_LSM effects that.
> 
yes, I was specifically addressing the conversation I had with Paul at
LSS that Paul brought up. That is

   At LSS-NA this year, John Johansen and I had a brief discussion where
   he suggested a single LSM wide clone*(2) flag that individual LSM's
   could opt into via callbacks.

the idea there isn't all that different than what Paul proposed. You
could have a single flag, if you can provide ancillary information. But
a single flag on its own isn't sufficient.
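
One way to picture "a single flag plus ancillary information" is the toy Python model below. It is a sketch under stated assumptions, not a proposed interface: `stage_unshare` stands in for a per-LSM request made ahead of time (for example through something like lsm_set_self_attr(2)), `clone_with_lsm_flag` stands in for the single flag, and both names, as well as the generated namespace names, are hypothetical.

```python
class Task:
    def __init__(self, ns):
        self.ns = dict(ns)   # LSM name -> namespace name
        self.pending = {}    # staged, not yet applied, unshare requests


def stage_unshare(task, lsm, params=None):
    """Record that `lsm` should get a new namespace at the next flagged clone."""
    task.pending[lsm] = params or {}


def clone_with_lsm_flag(parent, child_tag):
    """Create a child; only LSMs with a staged request get a new namespace."""
    child = Task(parent.ns)
    for lsm, params in parent.pending.items():
        child.ns[lsm] = params.get("name", f"{lsm}-{child_tag}")
    # Staged requests are consumed by the clone and do not affect the parent.
    parent.pending.clear()
    return child
```

Here the flag on its own carries no per-LSM detail; everything specific comes from what was staged beforehand.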

You can do a subset with a single flag and only policy directing things,
but that would cut container managers out of the decision. Without a
universal container identifier that really limits what you can do. In
another email I likened it to the MCS label approach to the container,
where you have a single security policy for the container and each
container gets to be a unique instance of that policy. It's not a perfect
analogy, as with namespaces policy can be loaded into the namespace, making
it unique. I don't think the approach is right because not all namespaces
implement a loadable policy, and even when they do I think we can do a
better job if the container manager is allowed to provide additional
context with the namespacing request.






^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: LSM namespacing API
  2025-08-21 14:57       ` John Johansen
@ 2025-09-01 16:01         ` Dr. Greg
  2025-09-01 17:31           ` Casey Schaufler
  2025-09-02 10:55           ` John Johansen
  0 siblings, 2 replies; 43+ messages in thread
From: Dr. Greg @ 2025-09-01 16:01 UTC (permalink / raw)
  To: John Johansen
  Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore,
	linux-security-module, selinux

On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:

Good morning, I hope the week is starting well for everyone.

Now that everyone is getting past the summer holiday season, it would
seem useful to specifically clarify some of the LSM namespace
implementation details.

> On 8/21/25 07:26, Serge E. Hallyn wrote:
> >On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
> >>On 8/19/25 10:47, Stephen Smalley wrote:
> >>>On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com>
> >>>wrote:
> >>>>
> >>>>Hello all,
> >>>>
> >>>>As most of you are likely aware, Stephen Smalley has been working on
> >>>>adding namespace support to SELinux, and the work has now progressed
> >>>>to the point where a serious discussion on the API is warranted.  For
> >>>>those of you are unfamiliar with the details or Stephen's patchset, or
> >>>>simply need a refresher, he has some excellent documentation in his
> >>>>work-in-progress repo:
> >>>>
> >>>>* https://github.com/stephensmalley/selinuxns
> >>>>
> >>>>Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> >>>>about SELinux namespacing, you can watch the presentation here:
> >>>>
> >>>>* https://www.youtube.com/watch?v=AwzGCOwxLoM
> >>>>
> >>>>In the past you've heard me state, rather firmly at times, that I
> >>>>believe namespacing at the LSM framework layer to be a mistake,
> >>>>although if there is something that can be done to help facilitate the
> >>>>namespacing of individual LSMs at the framework layer, I would be
> >>>>supportive of that.  I think that a single LSM namespace API, similar
> >>>>to our recently added LSM syscalls, may be such a thing, so I'd like
> >>>>us to have a discussion to see if we all agree on that, and if so,
> >>>>what such an API might look like.
> >>>>
> >>>>At LSS-NA this year, John Johansen and I had a brief discussion where
> >>>>he suggested a single LSM wide clone*(2) flag that individual LSM's
> >>>>could opt into via callbacks.  John is directly CC'd on this mail, so
> >>>>I'll let him expand on this idea.
> >>>>
> >>>>While I agree with John that a fs based API is problematic (see all of
> >>>>our discussions around the LSM syscalls), I'm concerned that a single
> >>>>clone*(2) flag will significantly limit our flexibility around how
> >>>>individual LSMs are namespaced, something I don't want to see happen.
> >>>>This makes me wonder about the potential for expanding
> >>>>lsm_set_self_attr(2) to support a new LSM attribute that would support
> >>>>a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> >>>>provide a single LSM framework API for an unshare operation while also
> >>>>providing a mechanism to pass LSM specific via the lsm_ctx struct if
> >>>>needed.  Just as we do with the other LSM_ATTR_* flags today,
> >>>>individual LSMs can opt-in to the API fairly easily by providing a
> >>>>setselfattr() LSM callback.
> >>>>
> >>>>Thoughts?
> >>>
> >>>I think we want to be able to unshare a specific security module
> >>>namespace without unsharing the others, i.e. just SELinux or just
> >>>AppArmor.
> >>
> >>yes which is part of the problem with the single flag. That choice
> >>would be entirely at the policy level, without any input from userspace.
> >
> >AIUI Paul's suggestion is the user can pre-set the details of which
> >lsms to unshare and how with the lsm_set_self_attr(), and then a
> >single CLONE_LSM effects that.

> yes, I was specifically addressing the conversation I had with Paul at
> LSS that Paul brought up. That is
> 
>   At LSS-NA this year, John Johansen and I had a brief discussion where
>   he suggested a single LSM wide clone*(2) flag that individual LSM's
>   could opt into via callbacks.
> 
> the idea there isn't all that different than what Paul proposed. You
> could have a single flag, if you can provide ancillary information. But
> a single flag on its own isn't sufficient.

If one thing has come out of this thread, it would seem to be the fact
that there is going to be little commonality in the requirements that
various LSM's will have for the creation of a namespace.

Given that, the most infrastructure that the LSM should provide would
be a common API for a resource orchestrator to request namespace
separation and to provide a framework for configuring the namespace
prior to when execution begins in the context of the namespace.

The first issue to resolve would seem to be what namespace separation
implies.

John, if I interpret your comments in this discussion correctly, your
contention is that when namespace separation is requested, all of the
LSM's that implement namespaces will create a subordinate namespace,
is that a correct assumption?

It would seem, consistent with the 'stacking' concept, that any LSM
with namespace capability that chooses not to separate, will result in
denial of the separation request.  That in turn will imply the need to
unwind or delete any namespace context that other LSM's may have
allocated before the refusal occurred.
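
The unwind requirement described above can be sketched as a toy Python model. The `ToyLsm` class and `request_separation` function are hypothetical and only illustrate the all-or-nothing semantics being discussed; they say nothing about how an actual LSM hook would look.

```python
class ToyLsm:
    """Hypothetical LSM that may or may not be willing to separate."""

    def __init__(self, name, willing=True):
        self.name = name
        self.willing = willing
        self.live = []  # namespaces this LSM currently has allocated

    def create(self):
        if not self.willing:
            return None
        ns = f"{self.name}-ns{len(self.live) + 1}"
        self.live.append(ns)
        return ns

    def destroy(self, ns):
        self.live.remove(ns)


def request_separation(lsms):
    """Ask every LSM for a new namespace; undo everything on the first refusal."""
    created = []
    for lsm in lsms:
        ns = lsm.create()
        if ns is None:
            # Unwind in reverse order so no partial state survives the denial.
            for done_lsm, done_ns in reversed(created):
                done_lsm.destroy(done_ns)
            return None
        created.append((lsm, ns))
    return {lsm.name: ns for lsm, ns in created}
```

The cost of this model is visible in the sketch: a single unwilling LSM forces every other LSM to allocate and then tear down state.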

This model also implies that the orchestrator requesting the
separation will need to pass a set of parameters describing the
characteristics of each namespace, described by the LSM identifier
that they pertain to.  Since there may be a need to configure multiple
namespaces there would be a requirement to pass an array or list of
these parameter sets.

There will also be a need to inject possibly substantial amounts of
policy or model information into the namespace before execution in
the context of the namespace begins.

There will also be a need to decide whether namespace separation
should occur at the request of the orchestrator or at the next fork,
the latter model being what the other resource namespaces use.  We
believe the argument for direct separation can be made by looking at
the gymnastics that orchestrators need to jump through with the
'change-on-fork' model.

Case in point, it would seem realistic that a process with sufficient
privilege may desire to place itself in a new LSM namespace context
in a manner that does not require re-execution of itself.

With respect to separation, the remaining issue is if a new security
capability bit needs to be implemented to gate namespace separation.
John, based on your comments, I believe you would support this need?

> You can do a subset with a single flag and only policy directing things,
> but that would cut container managers out of the decision. Without a
> universal container identifier that really limits what you can do. In
> another email I likend it to the MCS label approach to the container
> where you have a single security policy for the container and each
> container gets to be a unique instance of that policy. Its not a perfect
> analogy as with namespace policy can be loaded into the namespace making
> it unique. I don't think the approach is right because not all namespaces
> implement a loadable policy, and even when they do I think we can do a
> better job if the container manager is allowed to provide additional
> context with the namespacing request.

In order to be relevant, the configuration of LSM namespaces needs to
be under control of a resource orchestrator or container manager.

What we hear from people doing Kubernetes, at scale, is a desire to be
able to request that a container be run somewhere in the hardware
resource pool and for that container to implement a security model
specific to the needs of the workload running in that container, in a
manner that is orthogonal to other security policies that may be in
effect for other workloads, on the host or in other containers.

Hopefully the above will be of assistance in furthering discussion.

Have a good week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: LSM namespacing API
  2025-09-01 16:01         ` Dr. Greg
@ 2025-09-01 17:31           ` Casey Schaufler
  2025-09-04  2:16             ` Dr. Greg
  2025-09-02 10:55           ` John Johansen
  1 sibling, 1 reply; 43+ messages in thread
From: Casey Schaufler @ 2025-09-01 17:31 UTC (permalink / raw)
  To: Dr. Greg, John Johansen
  Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore,
	linux-security-module, selinux, Casey Schaufler

On 9/1/2025 9:01 AM, Dr. Greg wrote:
> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
>
> Good morning, I hope the week is starting well for everyone.
>
> Now that everyone is getting past the summer holiday season, it would
> seem useful to specifically clarify some of the LSM namespace
> implementation details.
>
>> On 8/21/25 07:26, Serge E. Hallyn wrote:
>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>>>> On 8/19/25 10:47, Stephen Smalley wrote:
>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com>
>>>>> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> As most of you are likely aware, Stephen Smalley has been working on
>>>>>> adding namespace support to SELinux, and the work has now progressed
>>>>>> to the point where a serious discussion on the API is warranted.  For
>>>>>> those of you are unfamiliar with the details or Stephen's patchset, or
>>>>>> simply need a refresher, he has some excellent documentation in his
>>>>>> work-in-progress repo:
>>>>>>
>>>>>> * https://github.com/stephensmalley/selinuxns
>>>>>>
>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>>>>>> about SELinux namespacing, you can watch the presentation here:
>>>>>>
>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>>>>>
>>>>>> In the past you've heard me state, rather firmly at times, that I
>>>>>> believe namespacing at the LSM framework layer to be a mistake,
>>>>>> although if there is something that can be done to help facilitate the
>>>>>> namespacing of individual LSMs at the framework layer, I would be
>>>>>> supportive of that.  I think that a single LSM namespace API, similar
>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
>>>>>> us to have a discussion to see if we all agree on that, and if so,
>>>>>> what such an API might look like.
>>>>>>
>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
>>>>>> I'll let him expand on this idea.
>>>>>>
>>>>>> While I agree with John that a fs based API is problematic (see all of
>>>>>> our discussions around the LSM syscalls), I'm concerned that a single
>>>>>> clone*(2) flag will significantly limit our flexibility around how
>>>>>> individual LSMs are namespaced, something I don't want to see happen.
>>>>>> This makes me wonder about the potential for expanding
>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>>>>>> provide a single LSM framework API for an unshare operation while also
>>>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>>>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>>>>>> individual LSMs can opt-in to the API fairly easily by providing a
>>>>>> setselfattr() LSM callback.
>>>>>>
>>>>>> Thoughts?
>>>>> I think we want to be able to unshare a specific security module
>>>>> namespace without unsharing the others, i.e. just SELinux or just
>>>>> AppArmor.
>>>> yes which is part of the problem with the single flag. That choice
>>>> would be entirely at the policy level, without any input from userspace.
>>> AIUI Paul's suggestion is the user can pre-set the details of which
>>> lsms to unshare and how with the lsm_set_self_attr(), and then a
>>> single CLONE_LSM effects that.
>> yes, I was specifically addressing the conversation I had with Paul at
>> LSS that Paul brought up. That is
>>
>>   At LSS-NA this year, John Johansen and I had a brief discussion where
>>   he suggested a single LSM wide clone*(2) flag that individual LSM's
>>   could opt into via callbacks.
>>
>> the idea there isn't all that different than what Paul proposed. You
>> could have a single flag, if you can provide ancillary information. But
>> a single flag on its own isn't sufficient.
> If one thing has come out of this thread, it would seem to be the fact
> that there is going to be little commonality in the requirements that
> various LSM's will have for the creation of a namespace.
>
> Given that, the most infrastructure that the LSM should provide would
> be a common API for a resource orchestrator to request namespace
> separation and to provide a framework for configuring the namespace
> prior to when execution begins in the context of the namespace.
>
> The first issue to resolve would seem to be what namespace separation
> implies.
>
> John, if I interpret your comments in this discussion correctly, your
> contention is that when namespace separation is requested, all of the
> LSM's that implement namespaces will create a subordinate namespace,
> is that a correct assumption?
>
> It would seem, consistent with the 'stacking' concept, that any LSM
> with namespace capability that chooses not to separate, will result in
> denial of the separation request.  That in turn will imply the need to
> unwind or delete any namespace context that other LSM's may have
> allocated before the refusal occurred.

Were it true that 'stacking' rated the status of a 'concept'.

An LSM that is capable of namespacing (the definition of which is
elusive at this time) should be allowed to decline participation
in a namespace creation. That, or there needs to be a convention
for "null" namespaces, by which an LSM can pretend that it isn't
involved in the new namespace. I think the latter smells funny
and would invite "security people don't understand performance"
remarks. No LSM should be allowed to prevent another from using
namespaces.


>
> This model also implies that the orchestrator requesting the
> separation will need to pass a set of parameters describing the
> characteristics of each namespace, described by the LSM identifier
> that they pertain to.  Since there may be a need to configure multiple
> namespaces there would be a requirement to pass an array or list of
> these parameter sets.

Just like lsm_set_self_attr(2).

> There will also be a need to inject, possibly substantial amounts of
> policy or model information into the namespace, before execution in
> the context of the namespace begins.

Yup. A major downside of loadable policy.

> There will also be a need to decide whether namespace separation
> should occur at the request of the orchestrator or at the next fork,
> the latter model being what the other resource namespaces use.  We
> believe the argument for direct separation can be made by looking at
> the gymnastics that orchestrators need to jump through with the
> 'change-on-fork' model.
>
> Case in point, it would seem realistic that a process with sufficient
> privilege, may desire to place itself in a new LSM namespace context
> in a manner that does not require re-execution of itself.
>
> With respect to separation, the remaining issue is if a new security
> capability bit needs to be implemented to gate namespace separation.
> John, based on your comments, I believe you would support this need?

I don't like the notion of a new capability for this. But then,
I object to almost every new capability proposed. Existing namespaces
don't need their own capabilities. I don't see this case as special.

>
>> You can do a subset with a single flag and only policy directing things,
>> but that would cut container managers out of the decision. Without a
>> universal container identifier that really limits what you can do. In
>> another email I likened it to the MCS label approach to the container
>> where you have a single security policy for the container and each
>> container gets to be a unique instance of that policy. It's not a perfect
>> analogy as with namespace policy can be loaded into the namespace making
>> it unique. I don't think the approach is right because not all namespaces
>> implement a loadable policy, and even when they do I think we can do a
>> better job if the container manager is allowed to provide additional
>> context with the namespacing request.
> In order to be relevant, the configuration of LSM namespaces need to
> be under control of a resource orchestrator or container manager.

I do not approve of kernel features that are pointless without specific
user space support. If it can't be used in ways other than those
defined by a particular user space component they really don't belong
in the kernel. 

>
> What we hear from people doing Kubernetes, at scale, is a desire to be
> able to request that a container be run somewhere in the hardware
> resource pool and for that container to implement a security model
> specific to the needs of the workload running in that container.  In a
> manner that is orthogonal from other security policies that may be in
> effect for other workloads, on the host or in other containers.

That sounds to me like they want per-container security policy. That
would require that the kernel have the 'concept' of a container. That's
not something I expect to see in my lifetime.

>
> Hopefully the above will be of assistance in furthering discussion.
>
> Have a good week.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
>               https://github.com/Quixote-Project
>


* Re: LSM namespacing API
  2025-09-01 17:31           ` Casey Schaufler
@ 2025-09-04  2:16             ` Dr. Greg
  2025-09-04 17:40               ` Casey Schaufler
  0 siblings, 1 reply; 43+ messages in thread
From: Dr. Greg @ 2025-09-04  2:16 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: John Johansen, Serge E. Hallyn, Stephen Smalley, Paul Moore,
	linux-security-module, selinux

On Mon, Sep 01, 2025 at 10:31:43AM -0700, Casey Schaufler wrote:

Hi, I hope mid-week has gone well for everyone.

> On 9/1/2025 9:01 AM, Dr. Greg wrote:
> > On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
> >
> > Good morning, I hope the week is starting well for everyone.
> >
> > Now that everyone is getting past the summer holiday season, it would
> > seem useful to specifically clarify some of the LSM namespace
> > implementation details.
> >
> >> On 8/21/25 07:26, Serge E. Hallyn wrote:
> >>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
> >>>> On 8/19/25 10:47, Stephen Smalley wrote:
> >>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> 
> >>>>> wrote:
> >>>>>> Hello all,
> >>>>>>
> >>>>>> As most of you are likely aware, Stephen Smalley has been working on
> >>>>>> adding namespace support to SELinux, and the work has now progressed
> >>>>>> to the point where a serious discussion on the API is warranted.  For
> >>>>>> those of you are unfamiliar with the details or Stephen's patchset, or
> >>>>>> simply need a refresher, he has some excellent documentation in his
> >>>>>> work-in-progress repo:
> >>>>>>
> >>>>>> * https://github.com/stephensmalley/selinuxns
> >>>>>>
> >>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> >>>>>> about SELinux namespacing, you can watch the presentation here:
> >>>>>>
> >>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
> >>>>>>
> >>>>>> In the past you've heard me state, rather firmly at times, that I
> >>>>>> believe namespacing at the LSM framework layer to be a mistake,
> >>>>>> although if there is something that can be done to help facilitate the
> >>>>>> namespacing of individual LSMs at the framework layer, I would be
> >>>>>> supportive of that.  I think that a single LSM namespace API, similar
> >>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
> >>>>>> us to have a discussion to see if we all agree on that, and if so,
> >>>>>> what such an API might look like.
> >>>>>>
> >>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
> >>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
> >>>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
> >>>>>> I'll let him expand on this idea.
> >>>>>>
> >>>>>> While I agree with John that a fs based API is problematic (see all of
> >>>>>> our discussions around the LSM syscalls), I'm concerned that a single
> >>>>>> clone*(2) flag will significantly limit our flexibility around how
> >>>>>> individual LSMs are namespaced, something I don't want to see happen.
> >>>>>> This makes me wonder about the potential for expanding
> >>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
> >>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> >>>>>> provide a single LSM framework API for an unshare operation while also
> >>>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
> >>>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
> >>>>>> individual LSMs can opt-in to the API fairly easily by providing a
> >>>>>> setselfattr() LSM callback.
> >>>>>>
> >>>>>> Thoughts?
> >>>>> I think we want to be able to unshare a specific security module
> >>>>> namespace without unsharing the others, i.e. just SELinux or just
> >>>>> AppArmor.
> >>>> yes which is part of the problem with the single flag. That choice
> >>>> would be entirely at the policy level, without any input from userspace.
> >>> AIUI Paul's suggestion is the user can pre-set the details of which
> >>> lsms to unshare and how with the lsm_set_self_attr(), and then a
> >>> single CLONE_LSM effects that.
> >> yes, I was specifically addressing the conversation I had with Paul at
> >> LSS that Paul brought up. That is
> >>
> >>   At LSS-NA this year, John Johansen and I had a brief discussion where
> >>   he suggested a single LSM wide clone*(2) flag that individual LSM's
> >>   could opt into via callbacks.
> >>
> >> the idea there isn't all that different than what Paul proposed. You
> >> could have a single flag, if you can provide ancillary information. But
> >> a single flag on its own isn't sufficient.
> > If one thing has come out of this thread, it would seem to be the fact
> > that there is going to be little commonality in the requirements that
> > various LSM's will have for the creation of a namespace.
> >
> > Given that, the most infrastructure that the LSM should provide would
> > be a common API for a resource orchestrator to request namespace
> > separation and to provide a framework for configuring the namespace
> > prior to when execution begins in the context of the namespace.
> >
> > The first issue to resolve would seem to be what namespace separation
> > implies.
> >
> > John, if I interpret your comments in this discussion correctly, your
> > contention is that when namespace separation is requested, all of the
> > LSM's that implement namespaces will create a subordinate namespace,
> > is that a correct assumption?
> >
> > It would seem, consistent with the 'stacking' concept, that any LSM
> > with namespace capability that chooses not to separate, will result in
> > denial of the separation request.  That in turn will imply the need to
> > unwind or delete any namespace context that other LSM's may have
> > allocated before the refusal occurred.

> Were it true that 'stacking' rated the status of a 'concept'.

If 'concept' doesn't work as a term, we can call it an agreement on
the co-existence of multiple security models.

> An LSM that is capable of namespacing (the definition of which is
> elusive at this time) should be allowed to decline participation in
> a namespace creation.

Given the above, a full stop may be in order.

Perhaps, in pursuit of wisdom, we should call for a general consensus
among the group as to whether or not we have any clue as to what we
are doing?

> That, or there needs to be a convention for "null" namespaces, by
> which an LSM can pretend that it isn't involved in the new
> namespace. I think the latter smells funny and would invite
> "security people don't understand performance" remarks. No LSM
> should be allowed to prevent another from using namespaces.

Unfortunately that would seem to collide with the general consensus
that has evolved around 'stacking', as the means by which Linux
supports multiple LSM based security models/architectures.

The kernel security architecture admits to the notion that all of
the active LSM's have to agree that a specific security event be
allowed.  If any LSM elects to deny a hook call, permission is denied
for the event.
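
The "any LSM may deny" rule described above can be sketched in a few
lines of userspace C.  This is only an illustration under stated
assumptions, not kernel code: the hook type, the hook functions, and
call_ns_create_hooks() are hypothetical names, and the loop simply
models the convention that the first non-zero return from a stacked
hook ends the walk and denies the event.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type: return 0 to allow, a negative errno to deny. */
typedef int (*ns_create_hook_t)(void);

/* Two illustrative "LSMs": one that allows, one that refuses. */
static int allow_hook(void) { return 0; }
static int deny_hook(void)  { return -EPERM; }

/*
 * Walk the stacked hooks in registration order.  The first non-zero
 * return value wins, so a single refusal denies the whole request.
 */
int call_ns_create_hooks(const ns_create_hook_t *hooks, size_t n)
{
	assert(hooks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i]();

		if (rc != 0)
			return rc;
	}
	return 0;
}
```

With a stack of { allow_hook, deny_hook, allow_hook } the request is
refused with -EPERM and the third hook is never consulted, which is
also why anything allocated by earlier hooks would need unwinding.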

John responded to our e-mail in this thread and clarified that he
doesn't believe that a POSIX 1e style capability for namespace
separation is required.  However, our understanding from his reply is
that he felt that LSM namespace creation itself should have its own
LSM hook/event.

If this is the case, to be consistent with the stacking architecture,
any LSM should have the ability to deny security namespace creation
through its interpretation of the LSM namespace creation hook.

For example, it would certainly seem to be a valid concept for
something like an enhanced 'lockdown' mode to deny the ability for any
processes to escape into an LSM policy domain other than what was
configured when the platform was placed in a locked down status.

If we don't adhere to this model, we will have a 'snowflake' to
contend with in the LSM security model.

> > This model also implies that the orchestrator requesting the
> > separation will need to pass a set of parameters describing the
> > characteristics of each namespace, described by the LSM identifier
> > that they pertain to.  Since there may be a need to configure multiple
> > namespaces there would be a requirement to pass an array or list of
> > these parameter sets.

> Just like lsm_set_self_attr(2).

That provides basic infrastructure, however, with concession to the
general acknowledgement that every LSM is different, the requirement
for every attribute to have a unique descriptive identity value may
prove restrictive, particularly in model based LSM's.

What may be needed is an agnostic attribute identifier that
orchestration software could use, in combination with the 'flags'
variable to specify exactly what type of attribute is being delivered
by the system call to an LSM.  In other words, the attribute would
tell an LSM to interpret the flags value as an indicator of the
payload being delivered.
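
As a rough sketch of that idea, assuming a record laid out like the
existing struct lsm_ctx (id, flags, len, ctx_len, payload), the
orchestrator could tag each record with a payload type in 'flags'.
The struct below is a local mirror defined only for this example, and
the DEMO_PAYLOAD_* values and demo_ctx_new() are hypothetical; nothing
here is an existing kernel or libc interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the struct lsm_ctx layout, for illustration only. */
struct demo_lsm_ctx {
	uint64_t id;		/* LSM the record is addressed to */
	uint64_t flags;		/* here: hypothetical payload type */
	uint64_t len;		/* total record size, padding included */
	uint64_t ctx_len;	/* number of payload bytes in ctx[] */
	uint8_t  ctx[];
};

/* Hypothetical payload types an LSM would use to interpret ctx[]. */
enum { DEMO_PAYLOAD_POLICY = 1, DEMO_PAYLOAD_MODEL = 2 };

/* Build one record; the caller frees it with free(). */
struct demo_lsm_ctx *demo_ctx_new(uint64_t lsm_id, uint64_t payload_type,
				  const void *payload, size_t payload_len)
{
	size_t total = sizeof(struct demo_lsm_ctx) + payload_len;
	struct demo_lsm_ctx *rec;

	assert(payload != NULL || payload_len == 0);

	total = (total + 7) & ~(size_t)7;	/* pad to 8-byte multiple */
	rec = calloc(1, total);
	if (!rec)
		return NULL;

	rec->id = lsm_id;
	rec->flags = payload_type;
	rec->len = total;
	rec->ctx_len = payload_len;
	if (payload_len)
		memcpy(rec->ctx, payload, payload_len);
	return rec;
}
```

Several such records, one per LSM, could then be packed back to back
to form the array of parameter sets mentioned earlier.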

> > There will also be a need to inject, possibly substantial amounts of
> > policy or model information into the namespace, before execution in
> > the context of the namespace begins.

> Yup. A major downside of loadable policy.

Irregardless of merit, it will be reality, see below.

> > There will also be a need to decide whether namespace separation
> > should occur at the request of the orchestrator or at the next fork,
> > the latter model being what the other resource namespaces use.  We
> > believe the argument for direct separation can be made by looking at
> > the gymnastics that orchestrators need to jump through with the
> > 'change-on-fork' model.
> >
> > Case in point, it would seem realistic that a process with sufficient
> > privilege, may desire to place itself in a new LSM namespace context
> > in a manner that does not require re-execution of itself.
> >
> > With respect to separation, the remaining issue is if a new security
> > capability bit needs to be implemented to gate namespace separation.
> > John, based on your comments, I believe you would support this need?

> I don't like the notion of a new capability for this. But then, I
> object to almost every new capability proposed. Existing namespaces
> don't need their own capabilities. I don't see this case as special.

It appears that John is thinking that an LSM hook is what will be
needed, so no new capability bit would be required.

That concept seems consistent with the precedent that was established
by using this type of scheme to control the creation of user
namespaces.

> >> You can do a subset with a single flag and only policy directing things,
> >> but that would cut container managers out of the decision. Without a
> >> universal container identifier that really limits what you can do. In
> >> another email I likened it to the MCS label approach to the container
> >> where you have a single security policy for the container and each
> >> container gets to be a unique instance of that policy. It's not a perfect
> >> analogy as with namespace policy can be loaded into the namespace making
> >> it unique. I don't think the approach is right because not all namespaces
> >> implement a loadable policy, and even when they do I think we can do a
> >> better job if the container manager is allowed to provide additional
> >> context with the namespacing request.
> > In order to be relevant, the configuration of LSM namespaces need to
> > be under control of a resource orchestrator or container manager.

> I do not approve of kernel features that are pointless without
> specific user space support. If it can't be used in ways other than
> those defined by a particular user space component they really don't
> belong in the kernel.

It appears you have already created the necessary infrastructure with
lsm_set_self_attr(2).

Given the apparent consensus that an LSM is free to implement
namespaces in whatever manner it pleases, an LSM can offer
configuration of an instance of its security namespace with an LSM
specific pseudo-filesystem interface.

If a centralized namespace separation is pursued, what will be
required is a method for loading policy/configuration before execution
starts in the context of the namespace.

> > What we hear from people doing Kubernetes, at scale, is a desire to be
> > able to request that a container be run somewhere in the hardware
> > resource pool and for that container to implement a security model
> > specific to the needs of the workload running in that container.  In a
> > manner that is orthogonal from other security policies that may be in
> > effect for other workloads, on the host or in other containers.

> That sounds to me like they want per-container security policy. That
> would require that the kernel have the 'concept' of a
> container. That's not something I expect to see in my lifetime.

Per-container security policy is the expectation that will be raised
by the creation of LSM namespaces.  We can speak very directly to that
fact, from conversations with groups that are running fleets of
thousands of virtual machines supporting tens of thousands of
container instances.

A 'container' is a set of kernel resource domains applied to an
execution workload.  An LSM namespace will be another resource domain
that is placed around the workload by an orchestration system.

Speaking from personal implementation experience, if the LSM
namespace is entered and configured before the container runtime
engine is started, you have, in effect, created a per-container
security policy for that workload.

There are a plethora of issues surrounding this but it may be best to
leave those to further evolution of this discussion.

Have a good remainder of the week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project


* Re: LSM namespacing API
  2025-09-04  2:16             ` Dr. Greg
@ 2025-09-04 17:40               ` Casey Schaufler
  0 siblings, 0 replies; 43+ messages in thread
From: Casey Schaufler @ 2025-09-04 17:40 UTC (permalink / raw)
  To: Dr. Greg
  Cc: John Johansen, Serge E. Hallyn, Stephen Smalley, Paul Moore,
	linux-security-module, selinux, Casey Schaufler

On 9/3/2025 7:16 PM, Dr. Greg wrote:
> On Mon, Sep 01, 2025 at 10:31:43AM -0700, Casey Schaufler wrote:
>
> Hi, I hope mid-week has gone well for everyone.
>
>> On 9/1/2025 9:01 AM, Dr. Greg wrote:
>>> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
>>>
>>> Good morning, I hope the week is starting well for everyone.
>>>
>>> Now that everyone is getting past the summer holiday season, it would
>>> seem useful to specifically clarify some of the LSM namespace
>>> implementation details.
>>>
>>>> On 8/21/25 07:26, Serge E. Hallyn wrote:
>>>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>>>>>> On 8/19/25 10:47, Stephen Smalley wrote:
>>>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> 
>>>>>>> wrote:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> As most of you are likely aware, Stephen Smalley has been working on
>>>>>>>> adding namespace support to SELinux, and the work has now progressed
>>>>>>>> to the point where a serious discussion on the API is warranted.  For
>>>>>>>> those of you are unfamiliar with the details or Stephen's patchset, or
>>>>>>>> simply need a refresher, he has some excellent documentation in his
>>>>>>>> work-in-progress repo:
>>>>>>>>
>>>>>>>> * https://github.com/stephensmalley/selinuxns
>>>>>>>>
>>>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>>>>>>>> about SELinux namespacing, you can watch the presentation here:
>>>>>>>>
>>>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>>>>>>>
>>>>>>>> In the past you've heard me state, rather firmly at times, that I
>>>>>>>> believe namespacing at the LSM framework layer to be a mistake,
>>>>>>>> although if there is something that can be done to help facilitate the
>>>>>>>> namespacing of individual LSMs at the framework layer, I would be
>>>>>>>> supportive of that.  I think that a single LSM namespace API, similar
>>>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
>>>>>>>> us to have a discussion to see if we all agree on that, and if so,
>>>>>>>> what such an API might look like.
>>>>>>>>
>>>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
>>>>>>>> I'll let him expand on this idea.
>>>>>>>>
>>>>>>>> While I agree with John that a fs based API is problematic (see all of
>>>>>>>> our discussions around the LSM syscalls), I'm concerned that a single
>>>>>>>> clone*(2) flag will significantly limit our flexibility around how
>>>>>>>> individual LSMs are namespaced, something I don't want to see happen.
>>>>>>>> This makes me wonder about the potential for expanding
>>>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>>>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>>>>>>>> provide a single LSM framework API for an unshare operation while also
>>>>>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>>>>>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>>>>>>>> individual LSMs can opt-in to the API fairly easily by providing a
>>>>>>>> setselfattr() LSM callback.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>> I think we want to be able to unshare a specific security module
>>>>>>> namespace without unsharing the others, i.e. just SELinux or just
>>>>>>> AppArmor.
>>>>>> yes which is part of the problem with the single flag. That choice
>>>>>> would be entirely at the policy level, without any input from userspace.
>>>>> AIUI Paul's suggestion is the user can pre-set the details of which
>>>>> lsms to unshare and how with the lsm_set_self_attr(), and then a
>>>>> single CLONE_LSM effects that.
>>>> yes, I was specifically addressing the conversation I had with Paul at
>>>> LSS that Paul brought up. That is
>>>>
>>>>   At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>   he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>   could opt into via callbacks.
>>>>
>>>> the idea there isn't all that different than what Paul proposed. You
>>>> could have a single flag, if you can provide ancillary information. But
>>>> a single flag on its own isn't sufficient.
>>> If one thing has come out of this thread, it would seem to be the fact
>>> that there is going to be little commonality in the requirements that
>>> various LSM's will have for the creation of a namespace.
>>>
>>> Given that, the most infrastructure that the LSM should provide would
>>> be a common API for a resource orchestrator to request namespace
>>> separation and to provide a framework for configuring the namespace
>>> prior to when execution begins in the context of the namespace.
>>>
>>> The first issue to resolve would seem to be what namespace separation
>>> implies.
>>>
>>> John, if I interpret your comments in this discussion correctly, your
>>> contention is that when namespace separation is requested, all of the
>>> LSM's that implement namespaces will create a subordinate namespace,
>>> is that a correct assumption?
>>>
>>> It would seem, consistent with the 'stacking' concept, that any LSM
>>> with namespace capability that chooses not to separate, will result in
>>> denial of the separation request.  That in turn will imply the need to
>>> unwind or delete any namespace context that other LSM's may have
>>> allocated before the refusal occurred.
>> Were it true that 'stacking' rated the status of a 'concept'.
> If 'concept' doesn't work as a term, we can call it an agreement on
> the co-existence of multiple security models.

Sure.

>> An LSM that is capable of namespacing (the definition of which is
>> elusive at this time) should be allowed to decline participation in
>> a namespace creation.
> Given the above, a full stop may be in order.
>
> Perhaps, in pursuit of wisdom, we should call for a general consensus
> among the group as to whether or not we have any clue as to what we
> are doing?

That's the purpose of this thread, I believe. Now, whether we'll ever
get to true consensus seems unlikely, but I expect to see something
close enough that the wailing of those opposed will fail to prevent
acceptance.

>> That, or there needs to be a convention for "null" namespaces, by
>> which an LSM can pretend that it isn't involved in the new
>> namespace. I think the latter smells funny and would invite
>> "security people don't understand performance" remarks. No LSM
>> should be allowed to prevent another from using namespaces.
> Unfortunately that would seem to collide with the general consensus
> that has evolved around 'stacking', as the means by which Linux
> supports multiple LSM based security models/architectures.

I don't see that at all. For whatever reason, the developers of
namespaces chose to ignore the LSM infrastructure and the implications
their scheme has upon it. Managing the combination of differing
philosophies is often complex, and this is no exception.

> The kernel security architecture admits to the notion that all of
> the active LSM's have to agree that a specific security event be
> allowed.  If any LSM elects to deny a hook call, permission is denied
> for the event.

call_void_hook()

> John responded to our e-mail in this thread and clarified that he
> doesn't believe that a POSIX 1e style capability for namespace
> separation is required.  However, our understanding from his reply is
> that he felt that LSM namespace creation itself should have its own
> LSM hook/event.
>
> If this is the case, to be consistent with the stacking architecture,
> any LSM should have the ability to deny security namespace creation
> through its interpretation of the LSM namespace creation hook.
>
> For example, it would certainly seem to be a valid concept for
> something like an enhanced 'lockdown' mode to deny the ability for any
> processes to escape into an LSM policy domain other than what was
> configured when the platform was placed in a locked down status.
>
> If we don't adhere to this model, we will have a 'snowflake' to
> contend with in the LSM security model.

Again, call_void_hook()
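
For anyone not familiar with the reference: call_void_hook() is the
framework macro used for hooks that return void, so a hook of that
kind has nothing to return and cannot veto anything.  A minimal
userspace sketch of that behaviour follows; every name in it is
hypothetical and it is not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical void hook: it is told about the event, nothing more. */
typedef void (*demo_void_hook_t)(int *calls);

/* Illustrative hook that just records that it was invoked. */
static void demo_notify(int *calls) { (*calls)++; }

/*
 * Every registered hook is called unconditionally.  There is no return
 * value to inspect, so no hook can stop the others or deny the event.
 */
void demo_call_void_hooks(const demo_void_hook_t *hooks, size_t n,
			  int *calls)
{
	assert(hooks != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		hooks[i](calls);
}
```

If namespace creation were a hook of this shape, each LSM would be
notified and could set up or skip its own namespace, but none could
block the operation for the others.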

>>> This model also implies that the orchestrator requesting the
>>> separation will need to pass a set of parameters describing the
>>> characteristics of each namespace, described by the LSM identifier
>>> that they pertain to.  Since there may be a need to configure multiple
>>> namespaces there would be a requirement to pass an array or list of
>>> these parameter sets.
>> Just like lsm_set_self_attr(2).
> That provides basic infrastructure, however, with concession to the
> general acknowledgement that every LSM is different, the requirement
> for every attribute to have a unique descriptive identity value may
> prove restrictive, particularly in model based LSM's.

That was an argument made against the lsm_set_self_attr() interface
in the beginning. Even if lsm_set_self_attr() isn't the answer, it
provides a clue on how to formulate one.

> What may be needed is an agnostic attribute identifier that
> orchestration software could use, in combination with the 'flags'
> variable to specify exactly what type of attribute is being delivered
> by the system call to an LSM.  In other words, the attribute would
> tell an LSM to interpret the flags value as an indicator of the
> payload being delivered.

That's what flags are for. Or did I miss something?

>>> There will also be a need to inject, possibly substantial amounts of
>>> policy or model information into the namespace, before execution in
>>> the context of the namespace begins.
>> Yup. A major downside of loadable policy.
> Irregardless of merit, it will be reality, see below.

s/Irregardless/Regardless/

"Irregardless" is not a word.

>>> There will also be a need to decide whether namespace separation
>>> should occur at the request of the orchestrator or at the next fork,
>>> the latter model being what the other resource namespaces use.  We
>>> believe the argument for direct separation can be made by looking at
>>> the gymnastics that orchestrators need to jump through with the
>>> 'change-on-fork' model.
>>>
>>> Case in point, it would seem realistic that a process with sufficient
>>> privilege, may desire to place itself in a new LSM namespace context
>>> in a manner that does not require re-execution of itself.
>>>
>>> With respect to separation, the remaining issue is if a new security
>>> capability bit needs to be implemented to gate namespace separation.
>>> John, based on your comments, I believe you would support this need?
>> I don't like the notion of a new capability for this. But then, I
>> object to almost every new capability proposed. Existing namespaces
>> don't need their own capabilities. I don't see this case as special.
> It appears that John is thinking that an LSM hook is what will be
> needed, so no new capability bit would be required.
>
> That concept seems consistent with the precedent that was established
> by using this type of scheme to control the creation of user
> namespaces.
>
>>>> You can do a subset with a single flag and only policy directing things,
>>>> but that would cut container managers out of the decision. Without a
>>>> universal container identifier that really limits what you can do. In
>>>> another email I likened it to the MCS label approach to the container
>>>> where you have a single security policy for the container and each
>>>> container gets to be a unique instance of that policy. It's not a perfect
>>>> analogy as with namespace policy can be loaded into the namespace making
>>>> it unique. I don't think the approach is right because not all namespaces
>>>> implement a loadable policy, and even when they do I think we can do a
>>>> better job if the container manager is allowed to provide additional
>>>> context with the namespacing request.
>>> In order to be relevant, the configuration of LSM namespaces need to
>>> be under control of a resource orchestrator or container manager.
>> I do not approve of kernel features that are pointless without
>> specific user space support. If it can't be used in ways other than
>> those defined by a particular user space component they really don't
>> belong in the kernel.
> It appears you have already created the necessary infrastructure with
> lsm_set_self_attr(2).
>
> Given the apparent consensus that an LSM is free to implement
> namespaces in whatever manner it pleases, an LSM can offer
> configuration of an instance of its security namespace with an LSM
> specific pseudo-filesystem interface.
>
> If a centralized namespace separation is pursued, what will be
> required is a method for loading policy/configuration before execution
> starts in the context of the namespace.

Just so.

>>> What we hear from people doing Kubernetes, at scale, is a desire to be
>>> able to request that a container be run somewhere in the hardware
>>> resource pool and for that container to implement a security model
>>> specific to the needs of the workload running in that container.  In a
>>> manner that is orthogonal from other security policies that may be in
>>> effect for other workloads, on the host or in other containers.
>> That sounds to me like they want per-container security policy. That
>> would require that the kernel have the 'concept' of a
>> container. That's not something I expect to see in my lifetime.
> Per-container security policy is the expectation that will be raised
> by the creation of LSM namespaces.  We can speak very directly to that
> fact, from conversations with groups that are running fleets of
> thousands of virtual machines supporting tens of thousands of
> container instances.

All the more reason not to implement them at the LSM level. If
you can't meet expectations, the effort is futile.

> A 'container' is a set of kernel resource domains applied to an
> execution workload.  An LSM namespace will be another resource domain
> that is placed around the workload by an orchestration system.

A 'container' is whatever the snake oil sales rep says it is.
Kata containers use virtual machines. Containers may be implemented
without an "orchestration system".

> Speaking from personal implementation experience.  If the LSM
> namespace is entered and configured before the container runtime
> engine is started, you have in effect, created a per container
> security policy for that workload.

The base system policy will still be enforced. Having multiple policies
in place is tricky. What we can't have is a system where the base
policy is replaced rather than supplemented. Is that not obvious?

> There are a plethora of issues surrounding this but it may be best to
> leave those to further evolution of this discussion.
>
> Have a good remainder of the week.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
>               https://github.com/Quixote-Project
>


* Re: LSM namespacing API
  2025-09-01 16:01         ` Dr. Greg
  2025-09-01 17:31           ` Casey Schaufler
@ 2025-09-02 10:55           ` John Johansen
  2025-09-05 22:14             ` Dr. Greg
  1 sibling, 1 reply; 43+ messages in thread
From: John Johansen @ 2025-09-02 10:55 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore,
	linux-security-module, selinux

On 9/1/25 09:01, Dr. Greg wrote:
> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
> 
> Good morning, I hope the week is starting well for everyone.
> 
> Now that everyone is getting past the summer holiday season, it would
> seem useful to specifically clarify some of the LSM namespace
> implementation details.
> 
>> On 8/21/25 07:26, Serge E. Hallyn wrote:
>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>>>> On 8/19/25 10:47, Stephen Smalley wrote:
>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com>
>>>>> wrote:
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> As most of you are likely aware, Stephen Smalley has been working on
>>>>>> adding namespace support to SELinux, and the work has now progressed
>>>>>> to the point where a serious discussion on the API is warranted.  For
>>>>>> those of you are unfamiliar with the details or Stephen's patchset, or
>>>>>> simply need a refresher, he has some excellent documentation in his
>>>>>> work-in-progress repo:
>>>>>>
>>>>>> * https://github.com/stephensmalley/selinuxns
>>>>>>
>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>>>>>> about SELinux namespacing, you can watch the presentation here:
>>>>>>
>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>>>>>
>>>>>> In the past you've heard me state, rather firmly at times, that I
>>>>>> believe namespacing at the LSM framework layer to be a mistake,
>>>>>> although if there is something that can be done to help facilitate the
>>>>>> namespacing of individual LSMs at the framework layer, I would be
>>>>>> supportive of that.  I think that a single LSM namespace API, similar
>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
>>>>>> us to have a discussion to see if we all agree on that, and if so,
>>>>>> what such an API might look like.
>>>>>>
>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
>>>>>> I'll let him expand on this idea.
>>>>>>
>>>>>> While I agree with John that a fs based API is problematic (see all of
>>>>>> our discussions around the LSM syscalls), I'm concerned that a single
>>>>>> clone*(2) flag will significantly limit our flexibility around how
>>>>>> individual LSMs are namespaced, something I don't want to see happen.
>>>>>> This makes me wonder about the potential for expanding
>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>>>>>> provide a single LSM framework API for an unshare operation while also
>>>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>>>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>>>>>> individual LSMs can opt-in to the API fairly easily by providing a
>>>>>> setselfattr() LSM callback.
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> I think we want to be able to unshare a specific security module
>>>>> namespace without unsharing the others, i.e. just SELinux or just
>>>>> AppArmor.
>>>>
>>>> yes which is part of the problem with the single flag. That choice
>>>> would be entirely at the policy level, without any input from userspace.
>>>
>>> AIUI Paul's suggestion is the user can pre-set the details of which
>>> lsms to unshare and how with the lsm_set_self_attr(), and then a
>>> single CLONE_LSM effects that.
> 
>> yes, I was specifically addressing the conversation I had with Paul at
>> LSS that Paul brought up. That is
>>
>>    At LSS-NA this year, John Johansen and I had a brief discussion where
>>    he suggested a single LSM wide clone*(2) flag that individual LSM's
>>    could opt into via callbacks.
>>
>> the idea there isn't all that different than what Paul proposed. You
>> could have a single flag, if you can provide ancillary information. But
>> a single flag on its own isn't sufficient.
> 
> If one thing has come out of this thread, it would seem to be the fact
> that there is going to be little commonality in the requirements that
> various LSM's will have for the creation of a namespace.
> 

yes

> Given that, the most infrastructure that the LSM should provide would
> be a common API for a resource orchestrator to request namespace
> separation and to provide a framework for configuring the namespace
> prior to when execution begins in the context of the namespace.
> 

hrmmm, certainly a common API. Any task could theoretically use the API;
it doesn't have to be a resource orchestrator, but I suppose you could
call it such.

I also don't know that we need to provide a framework for configuring
the namespace prior to when execution begins in the context of the
namespace. It might be a nice to have, but configuring of LSMs is
very LSM specific.

We don't even have a common LSM policy load interface atm, though there
is a proposal. Configuration is a step beyond that. Would it be nice
to have, sure. Are we going to get that far, I don't know.


> The first issue to resolve would seem to be what namespace separation
> implies.
> 
> John, if I interpret your comments in this discussion correctly, your
> contention is that when namespace separation is requested, all of the
> LSM's that implement namespaces will create a subordinate namespace,
> is that a correct assumption?
> 
No, not necessarily. The task can request to "unshare/create" LSMs
similar to requesting a set of system namespaces. Then every LSM,
whether part of the request or not, gets to do its thing. If every
LSM agrees, then a transition hook will run and each LSM will
again do its thing. This would likely be what was requested, but it's
possible that an LSM not in the request will do something, based
on its model.

In the end userspace gets to make a request, and each security policy is
responsible for staying within its security model/policy.

> It would seem, consistent with the 'stacking' concept, that any LSM
> with namespace capability that chooses not to separate, will result in
> denial of the separation request.  That in turn will imply the need to

Not necessarily. They could allow and choose not to transition. Or they
could not create a namespace but update some state.

> unwind or delete any namespace context that other LSM's may have
> allocated before the refusal occurred.

The request does need to be split into a permission hook and a
transition hook similar to exec. If any LSM in the permission hook
denies, the request is denied. If any LSM in the transition hook fails,
the request will again fail, and the LSMs would get their regular cleanup
hook called for the associated object.
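
To make the split concrete, here is a minimal userspace sketch of the
two-phase flow. It is an illustration only, not code from any posted
patch: struct ns_lsm_ops, ns_request() and the sample hooks are
hypothetical names, and cleanup is modelled as a callback run on every
LSM when a transition hook fails.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-LSM callbacks; none of these names exist in the kernel. */
struct ns_lsm_ops {
    const char *name;
    int (*permission)(void *req);  /* 0 = allow, negative errno = deny */
    int (*transition)(void *req);  /* 0 = ok, negative errno = failure */
    void (*cleanup)(void *req);    /* release state tied to the request */
};

/* Sample hooks used only to exercise the dispatcher below. */
int ns_allow(void *req) { (void)req; return 0; }
int ns_deny(void *req) { (void)req; return -EPERM; }
void ns_count_cleanup(void *req) { ++*(int *)req; }

/*
 * Phase 1: every LSM is asked for permission; the first denial ends
 * the request before any state has been changed.
 * Phase 2: every LSM runs its transition hook; if one fails, the
 * cleanup hook of every LSM is called and the request fails.
 */
int ns_request(const struct ns_lsm_ops *ops, size_t n, void *req)
{
    size_t i;
    int rc;

    for (i = 0; i < n; i++) {
        if (ops[i].permission) {
            rc = ops[i].permission(req);
            if (rc)
                return rc;
        }
    }

    for (i = 0; i < n; i++) {
        if (ops[i].transition) {
            rc = ops[i].transition(req);
            if (rc)
                goto fail;
        }
    }
    return 0;

fail:
    for (i = 0; i < n; i++)
        if (ops[i].cleanup)
            ops[i].cleanup(req);
    return rc;
}
```

Note that an LSM that was not named in the request still sees both
phases, which matches the point that every LSM gets a say.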

> 
> This model also implies that the orchestrator requesting the
> separation will need to pass a set of parameters describing the
> characteristics of each namespace, described by the LSM identifier
> that they pertain to.  Since there may be a need to configure multiple
> namespaces there would be a requirement to pass an array or list of
> these parameter sets.
> 
yes, it will require a list/array; see lsm_set_self_attr(2)
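
As a rough sketch of what such a list could look like in memory, the
code below packs variable-length records back to back. The struct
mirrors the id/flags/len/ctx_len/ctx layout of struct lsm_ctx as I
understand it from the uapi header, but it is declared locally as
struct ns_ctx so the example stands alone. ns_ctx_append() and the IDs
used with it are hypothetical, and nothing here claims that
lsm_set_self_attr(2) accepts more than one record today.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the lsm_ctx field layout (an assumption, see above). */
struct ns_ctx {
    uint64_t id;       /* which LSM the record is for */
    uint64_t flags;
    uint64_t len;      /* total record size: header plus padded payload */
    uint64_t ctx_len;  /* payload bytes actually used */
    uint8_t  ctx[];
};

/*
 * Append one record at offset 'off' in 'buf'. Payloads are padded to an
 * 8-byte boundary so the next header stays aligned. Returns the offset
 * just past the new record, or 0 if the record does not fit.
 */
size_t ns_ctx_append(uint8_t *buf, size_t cap, size_t off, uint64_t id,
                     const void *payload, size_t payload_len)
{
    size_t padded, total;
    struct ns_ctx hdr;

    if (off > cap || payload_len > cap)
        return 0;
    padded = (payload_len + 7) & ~(size_t)7;
    total = sizeof(struct ns_ctx) + padded;
    if (total > cap - off)
        return 0;

    hdr.id = id;
    hdr.flags = 0;
    hdr.len = total;
    hdr.ctx_len = payload_len;

    memcpy(buf + off, &hdr, sizeof(hdr));
    memset(buf + off + sizeof(hdr), 0, padded);
    if (payload_len)
        memcpy(buf + off + sizeof(hdr), payload, payload_len);
    return off + total;
}
```

A caller would invoke ns_ctx_append() once per LSM it wants to address
and hand the filled buffer, plus the final offset as its size, to
whatever interface ends up consuming it.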

> There will also be a need to inject, possibly substantial amounts of
> policy or model information into the namespace, before execution in
> the context of the namespace begins.
> 
Allowing for this and requiring this are two different things. Like I
said above we don't even currently have a common policy load interface.
Configuration is another step beyond policy load.


> There will also be a need to decide whether namespace separation
> should occur at the request of the orchestrator or at the next fork,

Or allow both, but yes a decision needs to be made

> the latter model being what the other resource namespaces use.  We
> believe the argument for direct separation can be made by looking at
> the gymnastics that orchestrators need to jump through with the
> 'change-on-fork' model.
>
Looking at current system namespacing, we have clone/unshare, which
really act on fork; setns enters existing namespaces.

We either need to create new variants of clone/unshare or potentially
have an LSM syscall that sets up additional parameters that are then
triggered by clone/unshare. If going the latter route, then it's just
a matter of whether the LSM call returns a handle that can be operated
on or not.
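
A minimal sketch of the second route, under stated assumptions: a task
stages the LSM ids it wants unshared, and a later trigger point (where
clone/unshare would sit) hands each staged id to the LSM side and
clears the list. struct ns_task, ns_stage(), ns_trigger() and the
8-entry limit are all hypothetical; this variant returns no handle.

```c
#include <errno.h>
#include <stdint.h>

#define NS_MAX_PENDING 8  /* arbitrary limit chosen for the sketch */

/* Hypothetical per-task state holding the staged request. */
struct ns_task {
    uint64_t pending[NS_MAX_PENDING];  /* LSM ids staged for unshare */
    unsigned int npending;
};

/* Counter bumped by the sample apply callback below. */
static unsigned int ns_applied;

/* Sample callback standing in for "the LSM acts on the request". */
int ns_apply_count(uint64_t lsm_id)
{
    (void)lsm_id;
    ns_applied++;
    return 0;
}

/* Stage one LSM id; nothing takes effect yet. */
int ns_stage(struct ns_task *t, uint64_t lsm_id)
{
    if (t->npending == NS_MAX_PENDING)
        return -ENOSPC;
    t->pending[t->npending++] = lsm_id;
    return 0;
}

/*
 * Trigger point: apply staged ids in order, stop at the first error,
 * and clear the staged list either way so it cannot fire twice.
 */
int ns_trigger(struct ns_task *t, int (*apply)(uint64_t lsm_id))
{
    unsigned int i;
    int rc = 0;

    for (i = 0; i < t->npending && !rc; i++)
        rc = apply(t->pending[i]);
    t->npending = 0;
    return rc;
}
```

Returning a handle instead would mean ns_stage() hands back something
the caller can inspect or modify before the trigger, at the cost of
another object whose lifetime has to be managed.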

> Case in point, it would seem realistic that a process with sufficient
> privilege, may desire to place itself in a new LSM namespace context
> in a manner that does not require re-execution of itself.
> 
yes, but it is questionable whether security policy should allow that.
At the very least security policy should be consulted and may deny
it.

> With respect to separation, the remaining issue is if a new security
> capability bit needs to be implemented to gate namespace separation.
> John, based on your comments, I believe you would support this need?
> 
No, I don't think a capability (as in POSIX.1e) per se is needed. I
think an LSM permission request is.

>> You can do a subset with a single flag and only policy directing things,
>> but that would cut container managers out of the decision. Without a
>> universal container identifier that really limits what you can do. In
>> another email I likened it to the MCS label approach to the container
>> where you have a single security policy for the container and each
>> container gets to be a unique instance of that policy. It's not a perfect
>> analogy as with namespace policy can be loaded into the namespace making
>> it unique. I don't think the approach is right because not all namespaces
>> implement a loadable policy, and even when they do I think we can do a
>> better job if the container manager is allowed to provide additional
>> context with the namespacing request.
> 
> In order to be relevant, the configuration of LSM namespaces needs to
> be under control of a resource orchestrator or container manager.
> 
No, they must be under the control of the LSMs.

> What we hear from people doing Kubernetes, at scale, is a desire to be
> able to request that a container be run somewhere in the hardware
> resource pool and for that container to implement a security model
> specific to the needs of the workload running in that container.  In a
> manner that is orthogonal from other security policies that may be in
> effect for other workloads, on the host or in other containers.
> 
sure, assuming the host policy allows it. Otherwise it is just a host
policy bypass, which cannot be allowed. K8s people have a specific
use case; they need to configure the host for that use case. They
cannot expect that use case to work on a host that has been configured
for, say, an MLS security constraint.

> Hopefully the above will be of assistance in furthering discussion.
> 
> Have a good week.
> 
> As always,
> Dr. Greg
> 
> The Quixote Project - Flailing at the Travails of Cybersecurity
>                https://github.com/Quixote-Project




* Re: LSM namespacing API
  2025-09-02 10:55           ` John Johansen
@ 2025-09-05 22:14             ` Dr. Greg
  2025-09-06  2:01               ` John Johansen
  0 siblings, 1 reply; 43+ messages in thread
From: Dr. Greg @ 2025-09-05 22:14 UTC (permalink / raw)
  To: John Johansen
  Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore,
	linux-security-module, selinux

On Tue, Sep 02, 2025 at 03:55:39AM -0700, John Johansen wrote:

Hi, I hope the week has gone well for everyone.

> On 9/1/25 09:01, Dr. Greg wrote:
> >On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
> >
> >Good morning, I hope the week is starting well for everyone.
> >
> >Now that everyone is getting past the summer holiday season, it would
> >seem useful to specifically clarify some of the LSM namespace
> >implementation details.
> >
> >>On 8/21/25 07:26, Serge E. Hallyn wrote:
> >>>On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
> >>>>On 8/19/25 10:47, Stephen Smalley wrote:
> >>>>>On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com>
> >>>>>wrote:
> >>>>>>
> >>>>>>Hello all,
> >>>>>>
> >>>>>>As most of you are likely aware, Stephen Smalley has been working on
> >>>>>>adding namespace support to SELinux, and the work has now progressed
> >>>>>>to the point where a serious discussion on the API is warranted.  For
> >>>>>>those of you are unfamiliar with the details or Stephen's patchset, or
> >>>>>>simply need a refresher, he has some excellent documentation in his
> >>>>>>work-in-progress repo:
> >>>>>>
> >>>>>>* https://github.com/stephensmalley/selinuxns
> >>>>>>
> >>>>>>Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> >>>>>>about SELinux namespacing, you can watch the presentation here:
> >>>>>>
> >>>>>>* https://www.youtube.com/watch?v=AwzGCOwxLoM
> >>>>>>
> >>>>>>In the past you've heard me state, rather firmly at times, that I
> >>>>>>believe namespacing at the LSM framework layer to be a mistake,
> >>>>>>although if there is something that can be done to help facilitate the
> >>>>>>namespacing of individual LSMs at the framework layer, I would be
> >>>>>>supportive of that.  I think that a single LSM namespace API, similar
> >>>>>>to our recently added LSM syscalls, may be such a thing, so I'd like
> >>>>>>us to have a discussion to see if we all agree on that, and if so,
> >>>>>>what such an API might look like.
> >>>>>>
> >>>>>>At LSS-NA this year, John Johansen and I had a brief discussion where
> >>>>>>he suggested a single LSM wide clone*(2) flag that individual LSM's
> >>>>>>could opt into via callbacks.  John is directly CC'd on this mail, so
> >>>>>>I'll let him expand on this idea.
> >>>>>>
> >>>>>>While I agree with John that a fs based API is problematic (see all of
> >>>>>>our discussions around the LSM syscalls), I'm concerned that a single
> >>>>>>clone*(2) flag will significantly limit our flexibility around how
> >>>>>>individual LSMs are namespaced, something I don't want to see happen.
> >>>>>>This makes me wonder about the potential for expanding
> >>>>>>lsm_set_self_attr(2) to support a new LSM attribute that would support
> >>>>>>a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> >>>>>>provide a single LSM framework API for an unshare operation while also
> >>>>>>providing a mechanism to pass LSM specific via the lsm_ctx struct if
> >>>>>>needed.  Just as we do with the other LSM_ATTR_* flags today,
> >>>>>>individual LSMs can opt-in to the API fairly easily by providing a
> >>>>>>setselfattr() LSM callback.
> >>>>>>
> >>>>>>Thoughts?
> >>>>>
> >>>>>I think we want to be able to unshare a specific security module
> >>>>>namespace without unsharing the others, i.e. just SELinux or just
> >>>>>AppArmor.
> >>>>
> >>>>yes which is part of the problem with the single flag. That choice
> >>>>would be entirely at the policy level, without any input from userspace.
> >>>
> >>>AIUI Paul's suggestion is the user can pre-set the details of which
> >>>lsms to unshare and how with the lsm_set_self_attr(), and then a
> >>>single CLONE_LSM effects that.
> >
> >>yes, I was specifically addressing the conversation I had with Paul at
> >>LSS that Paul brought up. That is
> >>
> >>   At LSS-NA this year, John Johansen and I had a brief discussion where
> >>   he suggested a single LSM wide clone*(2) flag that individual LSM's
> >>   could opt into via callbacks.
> >>
> >>the idea there isn't all that different than what Paul proposed. You
> >>could have a single flag, if you can provide ancillary information. But
> >>a single flag on its own isn't sufficient.
> >
> >If one thing has come out of this thread, it would seem to be the fact
> >that there is going to be little commonality in the requirements that
> >various LSM's will have for the creation of a namespace.

> yes

Given that and the conversations to date, the open question may be
whether there needs to be a common 'LSM namespace' infrastructure at
all or just punt everything to LSM's that choose to implement
namespaces.

> >Given that, the most infrastructure that the LSM should provide would
> >be a common API for a resource orchestrator to request namespace
> >separation and to provide a framework for configuring the namespace
> >prior to when execution begins in the context of the namespace.

> hrmmm, certainly a common API. Any task could theoretically use the API;
> it doesn't have to be a resource orchestrator, but I suppose you could
> call it such.

No argument that any task could call for separation.

We seem to be dancing around the notion that the primary use, nay
demand, for a security namespace will be to allow container specific
security policies.  In that scenario, the resource orchestrator or
container runtime will be what is requesting a specific security
model to be implemented in a namespace.

> I also don't know that we need to provide a framework for configuring
> the namespace prior to when execution begins in the context of the
> namespace. It might be a nice to have, but configuring of LSMs is
> very LSM specific.
>
> We don't even have a common LSM policy load interface atm, though there
> is a proposal. Configuration is a step beyond that. Would it be nice
> to have, sure. Are we going to get that far, I don't know.

At least for model based LSM's, the configuration needs to occur
before execution within the namespace begins in order to avoid
possible races with respect to the security policy that gets effected.

Casey advocates for the use of lsm_set_self_attr(2), which has the
advantage of a common API and is probably sufficient if an LSM elects
to provide a generic management interface.

The system call is currently not namespace aware so the challenge will
be how to direct the configuration payload to the correct namespace.

Given that limitation, it seems highly probable that individual LSM's
will implement configuration/policy management via their various
pseudo-filesystem implementations that will grow awareness for the
namespace context that the commands are being issued for.

> >The first issue to resolve would seem to be what namespace separation
> >implies.
> >
> >John, if I interpret your comments in this discussion correctly, your
> >contention is that when namespace separation is requested, all of the
> >LSM's that implement namespaces will create a subordinate namespace,
> >is that a correct assumption?

> No, not necessarily. The task can request to "unshare/create" LSMs
> similar to requesting a set of system namespaces. Then every LSM,
> whether part of the request or not, gets to do its thing. If every
> LSM agrees, then a transition hook will run and each LSM will
> again do its thing. This would likely be what was requested, but it's
> possible that an LSM not in the request will do something, based on
> its model.
>
> In the end userspace gets to make a request, and each security policy is
> responsible for staying within its security model/policy.

This approach seems contrary to what Casey is advocating for in our
conversations, but perhaps we misunderstand what he is saying.

Casey indicated that no other LSM should be able to deny the ability
of another LSM to create a namespace.

As we noted in our exchange with him, this seems to violate the
current LSM model where all of the LSM's need to agree that an event
should be allowed, or it fails.

> >It would seem, consistent with the 'stacking' concept, that any LSM
> >with namespace capability that chooses not to separate, will result in
> >denial of the separation request.  That in turn will imply the need to

> Not necessarily. They could allow and choose not to transition. Or
> they could not create a namespace but update some state.

> >unwind or delete any namespace context that other LSM's may have
> >allocated before the refusal occurred.

> The request does need to be split into a permission hook and a
> transition hook similar to exec. If any LSM in the permission hook
> denies, the request is denied. If any LSM in the transition hook
> fails, the request will again fail, and the LSMs would get their
> regular cleanup hook called for the associated object.

See above, the open question seems to be whether or not there is
agreement that any LSM can generically deny namespace creation.

Again, we may misunderstand Casey on this issue.

> >This model also implies that the orchestrator requesting the
> >separation will need to pass a set of parameters describing the
> >characteristics of each namespace, described by the LSM identifier
> >that they pertain to.  Since there may be a need to configure multiple
> >namespaces there would be a requirement to pass an array or list of
> >these parameter sets.

> yes, it will require a list/array; see lsm_set_self_attr(2)

Again, the issue is making this system call namespace aware.

> >There will also be a need to inject, possibly substantial amounts of
> >policy or model information into the namespace, before execution in
> >the context of the namespace begins.

> Allowing for this and requiring this are two different things. Like
> I said above we don't even currently have a common policy load
> interface.  Configuration is another step beyond policy load.

It would seem the most straightforward path is to simply punt this to
the LSMs themselves.  If nothing else, it reduces the issues that
everyone needs to agree on.

> >There will also be a need to decide whether namespace separation
> >should occur at the request of the orchestrator or at the next fork,

> Or allow both, but yes a decision needs to be made

Again, allow both at the discretion of the LSM.

> >the latter model being what the other resource namespaces use.  We
> >believe the argument for direct separation can be made by looking at
> >the gymnastics that orchestrators need to jump through with the
> >'change-on-fork' model.

> Looking at current system namespacing, we have clone/unshare, which
> really act on fork; setns enters existing namespaces.
>
> We either need to create new variants of clone/unshare or potentially
> have an LSM syscall that sets up additional parameters that are then
> triggered by clone/unshare. If going the latter route, then it's just
> a matter of whether the LSM call returns a handle that can be operated
> on or not.

We will find that current namespace semantics are challenging with
respect to being a good model for LSM namespaces.

Current namespaces focus on managing a single resource.  In contrast,
as we have seen in our discussions, an 'LSM namespace' involves
multiple resources, each with their own specific requirements.  On top
of that we have the complication of 'stacking' where anything that
happens will be the composite of what all the LSM's agree on, some of
which may be in the root namespace and some of which may be in
subordinate namespaces.

The notion of a process entering a security namespace, aka setns, will
be interesting.  It would seem that this will require callbacks to
every LSM that is participating in the namespace.  Presumably all of
the references to LSM security contexts will need to be suspended and
replaced with references to the context(s) for the security namespace
that is being entered.

With respect to managing this effectively, we would advocate for a
64-bit global counter that gets incremented on each successful LSM
namespace creation event.  That would provide a unique handle for the
namespace that will never wrap.
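
For scale: even at one billion creations per second, a 64-bit counter
takes roughly 584 years to wrap. A minimal userspace sketch of such a
handle allocator follows. The names ns_id_counter and ns_id_next() are
hypothetical, and reserving 0 for the initial namespace is an
assumption of the sketch; in-kernel code would more likely use an
atomic64_t.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical global counter. Static storage makes it start at 0,
 * which this sketch reserves for the initial (root) namespace.
 */
static atomic_uint_fast64_t ns_id_counter;

/*
 * Return a fresh identifier. Intended to be called once per successful
 * namespace creation, so failed requests do not consume an id.
 * fetch_add returns the old value, so the first caller gets 1.
 */
uint64_t ns_id_next(void)
{
    return (uint64_t)atomic_fetch_add_explicit(&ns_id_counter, 1,
                                               memory_order_relaxed) + 1;
}
```

Relaxed ordering is enough here because the only property needed is
that no two callers ever observe the same value.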

> >Case in point, it would seem realistic that a process with sufficient
> >privilege, may desire to place itself in a new LSM namespace context
> >in a manner that does not require re-execution of itself.

> yes, but it is questionable whether security policy should allow that.
> At the very least security policy should be consulted and may deny
> it.

What we are talking about here is the need to support a process
requesting to run in an alternate LSM namespace without forking.

The question of whether this should be allowed will be regulated by
whatever composite security policy is operational, the same as would
be the case with the switch on fork model.

> >With respect to separation, the remaining issue is if a new security
> >capability bit needs to be implemented to gate namespace separation.
> >John, based on your comments, I believe you would support this need?

> No, I don't think a capability (as in POSIX.1e) per se is needed. I
> think an LSM permission request is.

Once again, that seems inconsistent with what Casey is advocating.

Although I'm sure he is happy that a new capability bit is not in the
offing... :-)

> >>You can do a subset with a single flag and only policy directing things,
> >>but that would cut container managers out of the decision. Without a
> >>universal container identifier that really limits what you can do. In
> >>another email I likened it to the MCS label approach to the container
> >>where you have a single security policy for the container and each
> >>container gets to be a unique instance of that policy. It's not a perfect
> >>analogy as with namespace policy can be loaded into the namespace making
> >>it unique. I don't think the approach is right because not all namespaces
> >>implement a loadable policy, and even when they do I think we can do a
> >>better job if the container manager is allowed to provide additional
> >>context with the namespacing request.
> >
> >In order to be relevant, the configuration of LSM namespaces needs to
> >be under control of a resource orchestrator or container manager.

> No, they must be under the control of the LSMs.

I think we are talking past one another.

Configuration was perhaps a poor choice of vernacular, we were
referring to policy or model load.

As we mentioned in our exchange with Casey, the expectation for all of
this from the user community will be to allow resource orchestrators
to run a workload under the constraints of a specific security policy.

Where 'policy' should probably be plural.

Stephen even notes this on the slides that are linked from his GitHub
selinuxns site.

> >What we hear from people doing Kubernetes, at scale, is a desire to be
> >able to request that a container be run somewhere in the hardware
> >resource pool and for that container to implement a security model
> >specific to the needs of the workload running in that container.  In a
> >manner that is orthogonal from other security policies that may be in
> >effect for other workloads, on the host or in other containers.

> sure, assuming the host policy allows it. Otherwise it is just a host
> policy bypass, which cannot be allowed. K8s people have a specific
> use case; they need to configure the host for that use case. They
> cannot expect that use case to work on a host that has been configured
> for, say, an MLS security constraint.

Given that the concept of LSM stacking is overlaid on top of
namespaces, the result of all this will be security policies that will
be very interesting to reason about, particularly if multiple levels
of namespacing are allowed.

The other issue will be potential performance issues for LSM's that
choose to chase permissions all the way back up to the root namespace.
We've heard continuous suggestions that every pointer de-reference
is problematic from a performance perspective.
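
To illustrate both points, here is a small sketch, under stated
assumptions, of a check that walks from a namespace up to the root.
Each level carries a bitmask of permitted operations, and an operation
is allowed only if every level allows it, so a child can narrow the
base policy but never replace it. struct ns_node and ns_check() are
hypothetical; a real LSM would consult its own policy at each level
rather than a bitmask.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical namespace node; parent is NULL for the root namespace. */
struct ns_node {
    const struct ns_node *parent;
    unsigned int allowed;  /* bitmask of permitted operations */
};

/*
 * Walk from 'ns' up to the root. The operation is permitted only if
 * every level on the path allows it. If 'visited' is non-NULL it
 * receives the number of levels examined, which makes the
 * per-check cost of deep nesting visible.
 */
int ns_check(const struct ns_node *ns, unsigned int op_bit,
             unsigned int *visited)
{
    unsigned int n = 0;
    int rc = 0;

    for (; ns; ns = ns->parent) {
        n++;
        if (!(ns->allowed & op_bit)) {
            rc = -EACCES;
            break;
        }
    }
    if (visited)
        *visited = n;
    return rc;
}
```

The cost of an allowed operation grows linearly with nesting depth,
one pointer dereference per level, which is the performance concern in
concrete form.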

So, lots of issues to consider in all of this.

Have a good weekend.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project


* Re: LSM namespacing API
  2025-09-05 22:14             ` Dr. Greg
@ 2025-09-06  2:01               ` John Johansen
  0 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-09-06  2:01 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore,
	linux-security-module, selinux

On 9/5/25 15:14, Dr. Greg wrote:
> On Tue, Sep 02, 2025 at 03:55:39AM -0700, John Johansen wrote:
> 
> Hi, I hope the week has gone well for everyone.
> 
I wish, *sigh*

>> On 9/1/25 09:01, Dr. Greg wrote:
>>> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
>>>
>>> Good morning, I hope the week is starting well for everyone.
>>>
>>> Now that everyone is getting past the summer holiday season, it would
>>> seem useful to specifically clarify some of the LSM namespace
>>> implementation details.
>>>
>>>> On 8/21/25 07:26, Serge E. Hallyn wrote:
>>>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>>>>>> On 8/19/25 10:47, Stephen Smalley wrote:
>>>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> As most of you are likely aware, Stephen Smalley has been working on
>>>>>>>> adding namespace support to SELinux, and the work has now progressed
>>>>>>>> to the point where a serious discussion on the API is warranted.  For
>>>>>>>> those of you are unfamiliar with the details or Stephen's patchset, or
>>>>>>>> simply need a refresher, he has some excellent documentation in his
>>>>>>>> work-in-progress repo:
>>>>>>>>
>>>>>>>> * https://github.com/stephensmalley/selinuxns
>>>>>>>>
>>>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>>>>>>>> about SELinux namespacing, you can watch the presentation here:
>>>>>>>>
>>>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>>>>>>>
>>>>>>>> In the past you've heard me state, rather firmly at times, that I
>>>>>>>> believe namespacing at the LSM framework layer to be a mistake,
>>>>>>>> although if there is something that can be done to help facilitate the
>>>>>>>> namespacing of individual LSMs at the framework layer, I would be
>>>>>>>> supportive of that.  I think that a single LSM namespace API, similar
>>>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
>>>>>>>> us to have a discussion to see if we all agree on that, and if so,
>>>>>>>> what such an API might look like.
>>>>>>>>
>>>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
>>>>>>>> I'll let him expand on this idea.
>>>>>>>>
>>>>>>>> While I agree with John that a fs based API is problematic (see all of
>>>>>>>> our discussions around the LSM syscalls), I'm concerned that a single
>>>>>>>> clone*(2) flag will significantly limit our flexibility around how
>>>>>>>> individual LSMs are namespaced, something I don't want to see happen.
>>>>>>>> This makes me wonder about the potential for expanding
>>>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>>>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>>>>>>>> provide a single LSM framework API for an unshare operation while also
>>>>>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>>>>>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>>>>>>>> individual LSMs can opt-in to the API fairly easily by providing a
>>>>>>>> setselfattr() LSM callback.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>
>>>>>>> I think we want to be able to unshare a specific security module
>>>>>>> namespace without unsharing the others, i.e. just SELinux or just
>>>>>>> AppArmor.
>>>>>>
>>>>>> yes which is part of the problem with the single flag. That choice
>>>>>> would be entirely at the policy level, without any input from userspace.
>>>>>
>>>>> AIUI Paul's suggestion is the user can pre-set the details of which
>>>>> lsms to unshare and how with the lsm_set_self_attr(), and then a
>>>>> single CLONE_LSM effects that.
>>>
>>>> yes, I was specifically addressing the conversation I had with Paul at
>>>> LSS that Paul brought up. That is
>>>>
>>>>    At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>    he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>    could opt into via callbacks.
>>>>
>>>> the idea there isn't all that different than what Paul proposed. You
>>>> could have a single flag, if you can provide ancillary information. But
>>>> a single flag on its own isn't sufficient.
>>>
>>> If one thing has come out of this thread, it would seem to be the fact
>>> that there is going to be little commonality in the requirements that
>>> various LSM's will have for the creation of a namespace.
> 
>> yes
> 
> Given that and the conversations to date, the open question may be
> whether there needs to be a common 'LSM namespace' infrastructure at
> all or just punt everything to LSM's that choose to implement
> namespaces.
> 
>>> Given that, the most infrastructure that the LSM should provide would
>>> be a common API for a resource orchestrator to request namespace
>>> separation and to provide a framework for configuring the namespace
>>> prior to when execution begins in the context of the namespace.
> 
>> hrmmm, certainly a common API. Any task could theoretically use the API
>> it doesn't have to be a resource orchestrator, but I suppose you could
>> call it such.
> 
> No argument that any task could call for separation.
> 
> We seem to be dancing around the notion that the primary use, nee
> demand, for a security namespace will be to allow container specific
> security policies.  In that scenario, the resource orchestrator or
> container runtime will be what is requesting a specific security
> model to be implemented in a namespace.
> 
no that is one use of them.

AppArmor is using namespaces for sub-confinement/priv sep. They are also
used for tiered policy restrictions, global blacklisting, and
unprivileged user and application policy.

>> I also don't know that we need to provide a framework for configuring
>> the namespace prior to when execution begins in the context of the
>> namespace. It might be a nice to have, but configuring of LSMs is
>> very LSM specific.
>>
>> We don't even have a common LSM policy load interface atm, though there
>> is a proposal. Configuration is a step beyond that. Would it be nice
>> to have, sure. Are we going to get that far, I don't know.
> 
> At least for model based LSM's, the configuration needs to occur
> before execution within the namespace begins in order to avoid
> possible races with respect to the security policy that gets effected.
> 
depends on what you mean by configuration. There might be some config
of the namespace, but policy doesn't necessarily need to be loaded.
In both the unprivileged user and unprivileged application policy
cases, policy needs to be loaded after the namespace is entered.

You can even split the container/orchestrator case: LXD, emulating
a system, will want to load the system policy as part of the OS boot
process.

Whereas the docker/k8s/sandboxing use cases have an orchestrator or
sandbox app set up policy beforehand.

> Casey advocates for the use of lsm_set_self_attr(2), which has the
> advantage of a common API and is probably sufficient if an LSM elects
> to provide a generic management interface.
> 
yeah that or something similar seems to be the way to go

> The system call is currently not namespace aware so the challenge will
> be how to direct the configuration payload to the correct namespace.
> 
yes

> Given that limitation, it seems highly probable that individual LSM's
> will implement configuration/policy management via their various
> pseudo-filesystem implementations that will grow awareness for the
> namespace context that the commands are being issued for.
>
possible. But ideally if we get it right they can expand the syscall
instead of an fs interface.

An fs interface has lots of problems like needing to be available within
a given namespace. If we want to be nesting namespaces (which we do),
then mounting custom FSes into the namespace is extra setup, and things
like proc may not even be available, depending on how the container is
being set up.

>>> The first issue to resolve would seem to be what namespace separation
>>> implies.
>>>
>>> John, if I interpret your comments in this discussion correctly, your
>>> contention is that when namespace separation is requested, all of the
>>> LSM's that implement namespaces will create a subordinate namespace,
>>> is that a correct assumption?
> 
>> No, not necessarily. The task can request to "unshare/create" LSMs
>> similar to requesting a set of system namespaces. Then every LSM,
>> whether part of the request or not, gets to do its thing. If every
>> LSM agrees, then a transition hook will process and each LSM will
>> again do its thing. This would likely be what was requested but it's
>> possible that an LSM not in the request will do something, based on
>> its model.
>>
>> In the end userspace gets to make a request, each security policy is
>> responsible for staying within its security model/policy.
> 
> This approach seems contrary to what Casey is advocating for in our
> conversations, but perhaps we misunderstand what he is saying.
> 
Maybe, it's not what I am getting from him, but I could be misunderstanding
as well.

> Casey indicated that no other LSM should be able to deny the ability
> of another LSM to create a namespace.
> 
correct, at least in isolation. However if it is tied to other namespace
creation, say at clone/unshare, an LSM should be able to deny that and
have the whole set fail.

That is an individual LSM can deny the creation of other non-LSM
namespaces that are happening at the same time. This may affect the
creation of other LSM namespaces, but any given individual LSM is
not denying another LSM from creating a namespace.



> As we noted in our exchange with him, this seems to violate the
> current LSM model where all of the LSM's need to agree that an event
> should be allowed, or it fails.
> 

there is good reason for it. Experience has shown forcing each LSM to
update policy for the policy of another LSM is problematic. Allowing
each LSM to manage itself based on its own policy while the rest of
the events are allowed or fail is very practical.

>>> It would seem, consistent with the 'stacking' concept, that any LSM
>>> with namespace capability that chooses not to separate, will result in
>>> denial of the separation request.  That in turn will imply the need to
> 
>> Not necessarily. They could allow and choose not to transition. Or
>> they could not create a namespace but update some state.
> 
>>> unwind or delete any namespace context that other LSM's may have
>>> allocated before the refusal occurred.
> 
>> The request does need to be split into a permission hook and a
>> transition hook similar to exec. If any LSM in the permission hook
>> denies, the request is denied. If any LSM in the transition hook
>> fails again the request will fail, and the LSMs would get their
>> regular clean up hook called for the object associated.
> 
> See above, the open question seems to be whether or not there is
> agreement that any LSM can generically deny namespace
> creation.
> 
> Again, we may misunderstand Casey on this issue.
> 

It's not about what an individual LSM is allowed but what is happening
at the system level. If system events are moving with the LSM event
the system event is fair game.

Even if we are talking individual LSM updates, a two hook model may be
needed when taking into account the constraints of creds, and non-LSM
permission checks.
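
A minimal sketch of the permission-then-transition flow discussed
above, written as plain userspace C so it can be compiled and run.
Everything in it is an assumption made for illustration: the struct,
the callback names and the return codes do not exist in the kernel, and
the stub callbacks merely stand in for real LSM behaviour.

```c
#include <stddef.h>

/* Hypothetical per-LSM callbacks; none of these names exist in the kernel. */
struct demo_lsm {
	const char *name;
	int  (*may_unshare)(void);  /* permission hook: non-zero means allow */
	int  (*do_unshare)(void);   /* transition hook: 0 means success */
	void (*cleanup)(void);      /* undo a completed transition */
};

int demo_cleanups;  /* counts cleanup calls, for illustration only */

/* Stub callbacks standing in for real LSM behaviour. */
int  demo_allow(void) { return 1; }
int  demo_deny(void)  { return 0; }
int  demo_ok(void)    { return 0; }
int  demo_fail(void)  { return -1; }
void demo_undo(void)  { demo_cleanups++; }

/*
 * Phase 1: every LSM is asked for permission; a single denial rejects
 * the request before any state has been touched.
 * Phase 2: every LSM transitions; if one fails, the LSMs that already
 * transitioned are cleaned up in reverse order (the failing LSM is
 * assumed to have undone its own partial work).
 * Returns 0 on success, -1 if denied, -2 if a transition failed.
 */
int demo_unshare(struct demo_lsm *lsms, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!lsms[i].may_unshare())
			return -1;

	for (i = 0; i < n; i++) {
		if (lsms[i].do_unshare() != 0) {
			while (i-- > 0)
				lsms[i].cleanup();
			return -2;
		}
	}
	return 0;
}
```

Splitting the request this way means a denial never requires an unwind,
and a failed transition only unwinds the LSMs that had already
committed.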

>>> This model also implies that the orchestrator requesting the
>>> separation will need to pass a set of parameters describing the
>>> characteristics of each namespace, described by the LSM identifier
>>> that they pertain to.  Since there may be a need to configure multiple
>>> namespaces there would be a requirement to pass an array or list of
>>> these parameter sets.
> 
>> yes it will require a list/array see lsm_set_self_attr(2)
> 
> Again, the issue is making this system call namespace aware.
> 
sure or another similar syscall. I don't think we are saying that it
has to be lsm_set_self_attr. More that it provides an example of how
to do this. It could be that it can be extended, it could be it turns
out that doing a new call that is similar but meets the constraints
is needed.
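
To make the "list/array of parameter sets" idea concrete, here is a
hedged sketch of packing and walking several per-LSM records in one
buffer. The record header mirrors the layout of the existing
struct lsm_ctx, redefined locally so the example builds without kernel
headers; the helper names and the payload strings are invented, and
nothing here calls a real syscall.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of the uapi struct lsm_ctx layout (assumption: kept in
 * sync by hand for this sketch). */
struct demo_lsm_ctx {
	uint64_t id;       /* LSM_ID_* of the target module */
	uint64_t flags;
	uint64_t len;      /* size of this record: header + payload + padding */
	uint64_t ctx_len;  /* size of the payload in ctx[] */
	uint8_t  ctx[];
};

#define DEMO_LSM_ID_SELINUX  101  /* same value as LSM_ID_SELINUX */
#define DEMO_LSM_ID_APPARMOR 104  /* same value as LSM_ID_APPARMOR */

/* Append one record at *off; returns 0, or -1 if it does not fit.
 * The caller must keep *off <= cap. */
int demo_ctx_append(uint8_t *buf, size_t cap, size_t *off, uint64_t id,
		    const void *payload, size_t payload_len)
{
	struct demo_lsm_ctx hdr;
	size_t len = sizeof(hdr) + payload_len;

	len = (len + 7) & ~(size_t)7;  /* keep the next header 8-byte aligned */
	if (len > cap - *off)
		return -1;
	memset(buf + *off, 0, len);
	hdr.id = id;
	hdr.flags = 0;
	hdr.len = len;
	hdr.ctx_len = payload_len;
	memcpy(buf + *off, &hdr, sizeof(hdr));
	memcpy(buf + *off + sizeof(hdr), payload, payload_len);
	*off += len;
	return 0;
}

/* Walk the packed list; return the payload for id, or NULL if the id is
 * absent or a record is malformed. */
const uint8_t *demo_ctx_find(const uint8_t *buf, size_t used, uint64_t id,
			     uint64_t *ctx_len)
{
	size_t off = 0;

	while (used - off >= sizeof(struct demo_lsm_ctx)) {
		struct demo_lsm_ctx hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > used - off ||
		    hdr.ctx_len > hdr.len - sizeof(hdr))
			return NULL;
		if (hdr.id == id) {
			*ctx_len = hdr.ctx_len;
			return buf + off + sizeof(hdr);
		}
		off += hdr.len;
	}
	return NULL;
}
```

Each LSM would only look at the record carrying its own ID, which is
what lets the payload format stay entirely LSM specific.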

>>> There will also be a need to inject, possibly substantial amounts of
>>> policy or model information into the namespace, before execution in
>>> the context of the namespace begins.
> 
>> Allowing for this and requiring this are two different things. Like
>> I said above we don't even currently have a common policy load
>> interface.  Configuration is another step beyond policy load.
> 
> It would seem the most straight forward path is to simply punt this to
> the LSM's itself.  If nothing else, it reduces the issues that
> everyone needs to agree on.
> 
Yes, configuration requirements are definitely a per LSM thing.

>>> There will also be a need to decide whether namespace separation
>>> should occur at the request of the orchestrator or at the next fork,
> 
>> Or allow both, but yes a decision needs to be made
> 
> Again, allow both at the discretion of the LSM.
> 
sure

>>> the latter model being what the other resource namespaces use.  We
>>> believe the argument for direct separation can be made by looking at
>>> the gymnastics that orchestrators need to jump through with the
>>> 'change-on-fork' model.
> 
>> Looking at current system namespacing we have clone/unshare which
>> are really "on fork". setns enters existing namespaces.
>>
>> We either need to create new variants of clone/unshare or potentially
>> have an LSM syscall that sets up additional parameters that then are
>> triggered by clone/unshare. If going the latter route then it's just
>> a matter of whether the LSM call returns a handle that can be operated
>> on or not.
> 
> We will find that current namespace semantics are challenging with
> respect to being a good model for LSM namespaces.
> 
> Current namespaces focus on managing a single resource.  In contrast,
> as we have seen in our discussions, an 'LSM namespace' involves
> multiple resources, each with their own specific requirements.  On top
> of that we have the complication of 'stacking' where anything that
> happens will be the composite of what all the LSM's agree on, some of
> which may be in the root namespace and some of which may be in
> subordinate namespaces.
> 
it's easy to see why people call security people crazy :)

> The notion of a process entering a security namespace, aka setns, will
> be interesting.  It would seem that this will require callbacks to
> every LSM that is participating in the namespace.  Presumably all of
> the references to LSM security contexts will need to be suspended and
> replaced with references to the context(s) for the security namespace
> that is being entered.
> 
yes setns from a security pov is problematic.

> With respect to managing this effectively, we would advocate for a
> 64-bit global counter that gets incremented on each successful LSM
> namespace creation event.  That would provide a unique handle for the
> namespace that will never wrap.
> 

uhmmm, a unique container id? Well I guess that is one way to guarantee
this will never happen.
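
For reference, the 64-bit counter suggested in the quoted text can be
sketched in a few lines of C11. The names are hypothetical and this is
a userspace illustration only; an in-kernel version would presumably
use the kernel's own 64-bit atomic helpers. The second function just
does the arithmetic behind the "will never wrap" claim.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Global creation counter; static storage starts at zero. */
static _Atomic uint64_t demo_ns_counter;

/* Return a fresh, non-zero identifier for a newly created namespace. */
uint64_t demo_ns_next_id(void)
{
	return atomic_fetch_add(&demo_ns_counter, 1) + 1;
}

/* Years until a 64-bit counter wraps at a steady creation rate. */
double demo_years_until_wrap(double creations_per_second)
{
	const double two_pow_64 = 18446744073709551616.0;
	const double seconds_per_year = 365.25 * 24 * 3600;

	return two_pow_64 / creations_per_second / seconds_per_year;
}
```

Even at a billion namespace creations per second the counter would take
more than 500 years to wrap.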

>>> Case in point, it would seem realistic that a process with sufficient
>>> privilege, may desire to place itself in a new LSM namespace context
>>> in a manner that does not require re-execution of itself.
> 
>> yes, but it is questionable whether security policy should allow that.
>> At the very least security policy should be consulted and may deny
>> it.
> 
> What we are talking about here is the need to support a process
> requesting to run in an alternate LSM namespace without forking.
> 
sure, I support allowing a process to ask

> The question of whether this should be allowed will be regulated by
> whatever composite security policy is operational, the same as would
> be the case with the switch on fork model.
> 
>>> With respect to separation, the remaining issue is if a new security
>>> capability bit needs to be implemented to gate namespace separation.
>>> John, based on your comments, I believe you would support this need?
> 
>> No, I don't think a capability (as in posix.1e) per se is needed. I
>> think an LSM permission request is.
> 
> Once again, that seems inconsistent with what Casey is advocating.
> 
> Although I'm sure he is happy that a new capability bit is not in the
> offing... :-)
> 

not at all. I think the distinction is the LSM hook is asking the LSM
that is being asked to be namespaced. That is each LSM is consulted about
itself.

>>>> You can do a subset with a single flag and only policy directing things,
>>>> but that would cut container managers out of the decision. Without a
>>>> universal container identifier that really limits what you can do. In
>>>> another email I likened it to the MCS label approach to the container
>>>> where you have a single security policy for the container and each
>>>> container gets to be a unique instance of that policy. It's not a perfect
>>>> analogy, as with namespaces policy can be loaded into the namespace making
>>>> it unique. I don't think the approach is right because not all namespaces
>>>> implement a loadable policy, and even when they do I think we can do a
>>>> better job if the container manager is allowed to provide additional
>>>> context with the namespacing request.
>>>
>>> In order to be relevant, the configuration of LSM namespaces need to
>>> be under control of a resource orchestrator or container manager.
> 
>> No, they must be under the control of the LSMs.
> 
> I think we are talking past one another.
> 
quite possibly

> Configuration was perhaps a poor choice of vernacular, we were
> referring to policy or model load.
> 
which is one part of configuration. It's conceivable that an LSM could
have knobs to turn beyond policy.

> As we mentioned in our exchange with Casey, the expectation for all of
> this from the user community will be to allow resource orchestrators
> to run a workload under the constraints of a specific security policy.
> 
sure that is the expectation of the container community. It's just not
the only use.

> Where policy should be probably plural.
> 
> Stephen even notes this on the slides that are linked from his GitHub
> selinuxns site.
> 
>>> What we hear from people doing Kubernetes, at scale, is a desire to be
>>> able to request that a container be run somewhere in the hardware
>>> resource pool and for that container to implement a security model
>>> specific to the needs of the workload running in that container.  In a
>>> manner that is orthogonal from other security policies that may be in
>>> effect for other workloads, on the host or in other containers.
> 
>> sure, assuming the host policy allows it. Otherwise it is just a host
>> policy by-pass, which can not be allowed. K8s people have a specific
>> use case, they need to configure the host for that use case. They can
>> not expect that use case to work on host that has been configured
>> for say an MLS security constraint.
> 
> Given that the concept of LSM stacking is overlaid on top of
> namespaces, the result of all this will be security policies that will
> be very interesting to reason about, particularly if multiple levels
> of namespacing are allowed.
> 
"interesting"*TM* indeed

> The other issue will be potential performance issues for LSM's that
> choose to chase permissions all the way back up to the root namespace.
> We've heard continuous suggestions that every pointer de-reference
> is problematic from a performance perspective.
> 
oh it is, the performance people can get snippy about just a few
cycles. Ultimately that is just the cost of stacking policy. The
more layers you add the higher the cost.

AppArmor is already working towards a jit of policy that will be able
to flatten stacked policy, so the cost can be pushed back to the
same as non-stacked. That however comes with the cost of increased
memory use, and it will only deal with the AppArmor part of the
whole stack.
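
The trade-off described above can be illustrated with a toy model in
which each namespace level grants a bitmask of permissions. This is
only a sketch under that assumption: real policy is not a bitmask, the
names are invented, and it is not a description of how AppArmor's
policy flattening works.

```c
#include <stdint.h>
#include <stddef.h>

struct demo_ns {
	const struct demo_ns *parent;  /* NULL for the root namespace */
	uint32_t allowed;              /* permissions this level's policy grants */
	uint32_t flat;                 /* cached AND of allowed up to the root */
};

/* Stacked check: chase every level up to the root; all must allow.
 * Cost grows with nesting depth. */
int demo_allowed_walk(const struct demo_ns *ns, uint32_t perm)
{
	for (; ns; ns = ns->parent)
		if ((ns->allowed & perm) != perm)
			return 0;
	return 1;
}

/* Flatten once when a namespace is set up; the parent must already
 * have been flattened. Costs one extra word of memory per namespace. */
void demo_flatten(struct demo_ns *ns)
{
	ns->flat = ns->allowed & (ns->parent ? ns->parent->flat : UINT32_MAX);
}

/* Flattened check: a single comparison regardless of depth. */
int demo_allowed_flat(const struct demo_ns *ns, uint32_t perm)
{
	return (ns->flat & perm) == perm;
}
```

Both checks give the same answer; the flattened form moves the per-level
cost from every permission check to namespace setup time.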

> So, lots of issues to consider in all of this.
> 
> Have a good weekend.
> 
> As always,
> Dr. Greg
> 
> The Quixote Project - Flailing at the Travails of Cybersecurity
>                https://github.com/Quixote-Project



* Re: LSM namespacing API
  2025-08-21  7:46   ` John Johansen
  2025-08-21 14:26     ` Serge E. Hallyn
@ 2025-08-22  1:59     ` Paul Moore
  1 sibling, 0 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-22  1:59 UTC (permalink / raw)
  To: John Johansen; +Cc: Stephen Smalley, linux-security-module, selinux

On Thu, Aug 21, 2025 at 3:46 AM John Johansen
<john.johansen@canonical.com> wrote:
> On 8/19/25 10:47, Stephen Smalley wrote:

...

> > This is handled for other Linux namespaces by opening a pseudo file
> > under /proc/pid/ns and invoking setns(2), so not sure how we want to
> > do it.
>
> That is a possible interface, not one that I like, so I would like to
> explore other options first.

Fair enough, suggestions are definitely welcome :)

-- 
paul-moore.com


* Re: LSM namespacing API
  2025-08-19 14:56 LSM namespacing API Paul Moore
  2025-08-19 17:11 ` Casey Schaufler
  2025-08-19 17:47 ` Stephen Smalley
@ 2025-08-21  7:14 ` John Johansen
  2025-08-21 11:20 ` Dr. Greg
  3 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21  7:14 UTC (permalink / raw)
  To: Paul Moore, linux-security-module, selinux; +Cc: Stephen Smalley

On 8/19/25 07:56, Paul Moore wrote:
> Hello all,
> 
> As most of you are likely aware, Stephen Smalley has been working on
> adding namespace support to SELinux, and the work has now progressed
> to the point where a serious discussion on the API is warranted.  For
> those of you are unfamiliar with the details or Stephen's patchset, or
> simply need a refresher, he has some excellent documentation in his
> work-in-progress repo:
> 
> * https://github.com/stephensmalley/selinuxns
> 
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> about SELinux namespacing, you can watch the presentation here:
> 
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
> 
> In the past you've heard me state, rather firmly at times, that I
> believe namespacing at the LSM framework layer to be a mistake,
> although if there is something that can be done to help facilitate the
> namespacing of individual LSMs at the framework layer, I would be
> supportive of that.  I think that a single LSM namespace API, similar
> to our recently added LSM syscalls, may be such a thing, so I'd like
> us to have a discussion to see if we all agree on that, and if so,
> what such an API might look like.
> 
> At LSS-NA this year, John Johansen and I had a brief discussion where
> he suggested a single LSM wide clone*(2) flag that individual LSM's
> could opt into via callbacks.  John is directly CC'd on this mail, so
> I'll let him expand on this idea.
> 
> While I agree with John that a fs based API is problematic (see all of
> our discussions around the LSM syscalls), I'm concerned that a single
> clone*(2) flag will significantly limit our flexibility around how
> individual LSMs are namespaced, something I don't want to see happen.
> This makes me wonder about the potential for expanding
> lsm_set_self_attr(2) to support a new LSM attribute that would support
> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> provide a single LSM framework API for an unshare operation while also
> providing a mechanism to pass LSM specific via the lsm_ctx struct if
> needed.  Just as we do with the other LSM_ATTR_* flags today,
> individual LSMs can opt-in to the API fairly easily by providing a
> setselfattr() LSM callback.
> 
> Thoughts?
> 
sorry I have been dealing with a forced email migration that uhhmmm hasn't
gone well.

So yes we could do a single clone flag, but it does have significant
issues, and is not generic enough for every LSM, at least not without
some form of providing augmented information.

A single clone flag means each LSM is completely in charge of its
transitions (needed) but without any hinting from userspace container
managers (this is a problem). Under the single flag, policy would have
to drive what can be done, and that would be fairly limiting. It would
allow for something like the current MCS labeling approach but not a
finer Udica style approach, at least not without an additional call
similar to setexeccon(), or as you have proposed more generically
LSM_ATTR_UNSHARE.

The more I have looked at it, the more convinced I am that the single
clone flag approach is wrong and is just going to lead to problems.




* Re: LSM namespacing API
  2025-08-19 14:56 LSM namespacing API Paul Moore
                   ` (2 preceding siblings ...)
  2025-08-21  7:14 ` John Johansen
@ 2025-08-21 11:20 ` Dr. Greg
  2025-08-21 14:44   ` John Johansen
  3 siblings, 1 reply; 43+ messages in thread
From: Dr. Greg @ 2025-08-21 11:20 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-security-module, selinux, John Johansen, Stephen Smalley

On Tue, Aug 19, 2025 at 10:56:27AM -0400, Paul Moore wrote:

> Hello all,

Good morning, I hope the day is going well for everyone.

> As most of you are likely aware, Stephen Smalley has been working on
> adding namespace support to SELinux, and the work has now progressed
> to the point where a serious discussion on the API is warranted.  For
> those of you are unfamiliar with the details or Stephen's patchset, or
> simply need a refresher, he has some excellent documentation in his
> work-in-progress repo:
> 
> * https://github.com/stephensmalley/selinuxns
> 
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> about SELinux namespacing, you can watch the presentation here:
> 
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
> 
> In the past you've heard me state, rather firmly at times, that I
> believe namespacing at the LSM framework layer to be a mistake,
> although if there is something that can be done to help facilitate the
> namespacing of individual LSMs at the framework layer, I would be
> supportive of that.  I think that a single LSM namespace API, similar
> to our recently added LSM syscalls, may be such a thing, so I'd like
> us to have a discussion to see if we all agree on that, and if so,
> what such an API might look like.
> 
> At LSS-NA this year, John Johansen and I had a brief discussion where
> he suggested a single LSM wide clone*(2) flag that individual LSM's
> could opt into via callbacks.  John is directly CC'd on this mail, so
> I'll let him expand on this idea.
> 
> While I agree with John that a fs based API is problematic (see all of
> our discussions around the LSM syscalls), I'm concerned that a single
> clone*(2) flag will significantly limit our flexibility around how
> individual LSMs are namespaced, something I don't want to see happen.
> This makes me wonder about the potential for expanding
> lsm_set_self_attr(2) to support a new LSM attribute that would support
> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
> provide a single LSM framework API for an unshare operation while also
> providing a mechanism to pass LSM specific via the lsm_ctx struct if
> needed.  Just as we do with the other LSM_ATTR_* flags today,
> individual LSMs can opt-in to the API fairly easily by providing a
> setselfattr() LSM callback.
> 
> Thoughts?

There has been an adage that traces back to the writings of George
Santayana in 1905 that seems relevant:

"Those who cannot remember the past are condemned to repeat it."

To that end, some input from more than a decade of our work on this
issue.  Some of our reflections below are relevant to issues being
covered in downstream components of this thread, particularly by John
in the last few hours.

We have had code on the table for three years with respect to the
problem of generic namespacing of security policy/model/architecture,
whatever one chooses to call it.

For everyone's reference, here are the URLs to the patch series:

V1:
https://lore.kernel.org/linux-security-module/20230204050954.11583-1-greg@enjellic.com/T/#t

V2:
https://lore.kernel.org/linux-security-module/20230710102319.19716-1-greg@enjellic.com/T/#t

V3:
https://lore.kernel.org/linux-security-module/20240401105015.27614-1-greg@enjellic.com/T/#t

V4:
https://lore.kernel.org/linux-security-module/20240826103728.3378-1-greg@enjellic.com/T/#t

We started this work about 13-15 years ago.  We initially described
our work and the need for it, 10 years ago almost to this day.  See
our 2015 paper at the Linux Security Summit in Seattle.

James Morris and Casey were in the first row, Stephen and a co-worker
from the NSA were in the second row, to the speaker's left.

If one spends some time looking under the hood, TSEM is in large part
about providing a generic framework for running multiple, independent
and orthogonal security frameworks/policies/architectures, whatever
one chooses to call these entities.

The reason that we argue that TSEM is a generic framework, is that in
our internal work, we have ported the major LSM's, including the IMA
infrastructure, to run in isolated namespaces as plugins for TSEM's
notion of Trusted Modeling Agents (TMA's).  We also have ongoing work
that enables Kubernetes to dispatch workloads, using whatever LSM
based security policy that container developers desire for their
workloads.

Suffice it to say, we have hoed a lot of ground on the issues
surrounding this, including issues surrounding production deployment
of this type of technology.

In our initial implementation, circa 2015, we adopted the approach of
using a CLONE_* flag and wired the implementation of security
namespaces into the rest of the namespace infrastructure.

During COVID, we re-architected the entire implementation and moved to
using a control file in the pseudo-filesystem that TSEM implements; we
have never looked back on this decision.

TSEM security workloads are a poster child for security namespaces
that require a number of different setup parameters.  A command verb
syntax with key=value pairs, written to a pseudo-file, has proven
itself to be the most flexible approach when setting up security
workloads.
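
As an illustration of the general technique, here is a small parser
for a "verb key=value key=value" command line of the kind that might be
written to a control file. It is a sketch under stated assumptions: the
verb and key names used with it are made up, the size limits are
arbitrary, and it does not reproduce TSEM's actual command syntax.

```c
#include <string.h>

#define DEMO_MAX_ARGS 8

struct demo_cmd {
	char verb[32];
	int  nargs;
	char key[DEMO_MAX_ARGS][32];
	char val[DEMO_MAX_ARGS][64];
};

/* Parse "verb key=value ..."; returns 0 on success, -1 if malformed. */
int demo_parse_cmd(const char *line, struct demo_cmd *cmd)
{
	const char *p = line;
	size_t n;

	memset(cmd, 0, sizeof(*cmd));  /* also NUL-terminates every field */
	n = strcspn(p, " \n");
	if (n == 0 || n >= sizeof(cmd->verb))
		return -1;
	memcpy(cmd->verb, p, n);
	p += n;

	for (;;) {
		const char *eq;
		size_t klen, vlen;

		p += strspn(p, " \n");         /* skip separators */
		if (*p == '\0')
			return 0;
		if (cmd->nargs == DEMO_MAX_ARGS)
			return -1;
		n = strcspn(p, " \n");         /* length of this token */
		eq = memchr(p, '=', n);
		if (!eq)
			return -1;             /* every argument needs '=' */
		klen = (size_t)(eq - p);
		vlen = n - klen - 1;
		if (klen == 0 || klen >= sizeof(cmd->key[0]) ||
		    vlen >= sizeof(cmd->val[0]))
			return -1;
		memcpy(cmd->key[cmd->nargs], p, klen);
		memcpy(cmd->val[cmd->nargs], eq + 1, vlen);
		cmd->nargs++;
		p += n;
	}
}

/* Look up a value by key; returns NULL if the key was not supplied. */
const char *demo_cmd_get(const struct demo_cmd *cmd, const char *key)
{
	for (int i = 0; i < cmd->nargs; i++)
		if (strcmp(cmd->key[i], key) == 0)
			return cmd->val[i];
	return NULL;
}
```

New parameters can be added later without changing the interface, which
is the flexibility argument being made for this style of setup.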

With respect to namespace transition, we trigger the transition of a
process to a new namespace (unsharing) when the process issues the
request via the control file.  This has proven to be, at once, the
most straight forward and least security prone approach.

The other major, and thorny issue, is the notion of another process
'entering' a security namespace.  There are a ton of open issues to be
considered with this; the approach that we took, which has worked well
to date, is the notion of a 'trust orchestrator' that has
responsibility for controlling the namespace.  Any manipulations or
control of the namespace are conducted through the orchestrator
process.

If anyone chooses to look at our implementation, you will find that we
'bless' the orchestrator process, at the time of namespace creation,
with access to the security namespace context control structure for
the namespace being created.  The orchestrator is the only entity that
can access the security state of the namespace, other than processes
within the namespace itself.
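
The access rule described above reduces to a simple predicate. The
following is only a sketch with invented types: identifying the
orchestrator by pid is a simplification made for the example, whereas
the text describes the orchestrator holding access to the namespace's
control structure itself.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_sec_ns {
	uint64_t id;
	int orchestrator_pid;  /* 'blessed' once, at namespace creation */
};

struct demo_task {
	int pid;
	uint64_t ns_id;  /* security namespace the task runs in */
};

/* Only the blessed orchestrator, or a task already running inside the
 * namespace, may touch the namespace's security state. */
bool demo_may_access_ns(const struct demo_task *t,
			const struct demo_sec_ns *ns)
{
	return t->pid == ns->orchestrator_pid || t->ns_id == ns->id;
}
```

Because there is no third case, an arbitrary outside process has no
path to the namespace's state except through the orchestrator.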

This significantly narrows the scope of vulnerability with respect to
who or what can manipulate a security namespace.  There are a number
of thorny issues, that we have not seen anyone mention, that surface
with respect to allowing entry into a security namespace by an
arbitrary process.  Believe me when I say we have found a number of
them by accident and incident.

So big picture.

Over a decade of experience with these issues suggests that Paul's
premise that most of these issues are best left to specific LSM's that
elect to implement namespacing, is correct.

The challenge is that this situation ends up being all or nothing.

The actual amount of code involved in unsharing a namespace is so
trivial, in comparison to the work involved with setting up and
maintaining state information for a security namespace context, that
it seems to make little sense to implement this support at the level
of the LSM infrastructure itself.

If the decision is made to provide generic namespace support, other
than a request to create a namespace, it will rapidly become a
slippery slope with respect to the amount of infrastructure needed to
address the complexities associated with every security model being
different from every other.

The caveat to this is if our notion of a 'trust orchestrator' would be
deemed to have merit.  In that case, an LSM based namespace separation
architecture would provide a common point for the orchestrator to be
'blessed' with access to control of a namespace.

The other open issue is whether or not a separate capability should be
implemented that allows the creation of a new security namespace.  If
one paws through our TSEM submissions, one will see that we proposed
such a capability bit.

Casey noted, rather emphatically, that no new capabilities were going
to be implemented in Linux, particularly for what was described as a
'toy' project.  He indicated that CAP_MAC_ADMIN was the canonical
capability that should be used for manipulating LSM's.

We will be very interested in seeing how a discussion around this
evolves, as 'escaping' from an existing security context to a new one
is an extremely critical operation from a security perspective, if one
stands back and looks at the issue objectively.  If the concept of a
'security orchestrator' is embraced, it would make perfect sense for
the orchestrator to drop CAP_SEC_NS, or whatever it would be called,
and retain CAP_MAC_ADMIN in order to manage the namespace.

So lots of issues to consider; thorny, political and otherwise, on
multiple fronts.

> paul-moore.com

Have a good day.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project


* Re: LSM namespacing API
  2025-08-21 11:20 ` Dr. Greg
@ 2025-08-21 14:44   ` John Johansen
  0 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21 14:44 UTC (permalink / raw)
  To: Dr. Greg, Paul Moore; +Cc: linux-security-module, selinux, Stephen Smalley

On 8/21/25 04:20, Dr. Greg wrote:
> On Tue, Aug 19, 2025 at 10:56:27AM -0400, Paul Moore wrote:
> 
>> Hello all,
> 
> Good morning, I hope the day is going well for everyone.
> 
>> As most of you are likely aware, Stephen Smalley has been working on
>> adding namespace support to SELinux, and the work has now progressed
>> to the point where a serious discussion on the API is warranted.  For
>> those of you are unfamiliar with the details or Stephen's patchset, or
>> simply need a refresher, he has some excellent documentation in his
>> work-in-progress repo:
>>
>> * https://github.com/stephensmalley/selinuxns
>>
>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>> about SELinux namespacing, you can watch the presentation here:
>>
>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>
>> In the past you've heard me state, rather firmly at times, that I
>> believe namespacing at the LSM framework layer to be a mistake,
>> although if there is something that can be done to help facilitate the
>> namespacing of individual LSMs at the framework layer, I would be
>> supportive of that.  I think that a single LSM namespace API, similar
>> to our recently added LSM syscalls, may be such a thing, so I'd like
>> us to have a discussion to see if we all agree on that, and if so,
>> what such an API might look like.
>>
>> At LSS-NA this year, John Johansen and I had a brief discussion where
>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>> could opt into via callbacks.  John is directly CC'd on this mail, so
>> I'll let him expand on this idea.
>>
>> While I agree with John that a fs based API is problematic (see all of
>> our discussions around the LSM syscalls), I'm concerned that a single
>> clone*(2) flag will significantly limit our flexibility around how
>> individual LSMs are namespaced, something I don't want to see happen.
>> This makes me wonder about the potential for expanding
>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>> provide a single LSM framework API for an unshare operation while also
>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>> individual LSMs can opt-in to the API fairly easily by providing a
>> setselfattr() LSM callback.
>>
>> Thoughts?
> 
> There has been an adage that traces back to the writings of George
> Santayana in 1905 that seems relevant:
> 
> "Those who cannot remember the past are condemned to repeat it."
> 
> To that end, some input from more than a decade of our work on this
> issue.  Some of our reflections below are relevant to issues being
> covered in downstream components of this thread, particularly by John
> in the last few hours.
> 
> We have had code on the table for three years with respect to the
> problem of generic namespacing of security policy/model/architecture,
> whatever one chooses to call it.
> 
> For everyone's reference, here are the URL's to the patch series:
> 
> V1:
> https://lore.kernel.org/linux-security-module/20230204050954.11583-1-greg@enjellic.com/T/#t
> 
> V2:
> https://lore.kernel.org/linux-security-module/20230710102319.19716-1-greg@enjellic.com/T/#t
> 
> V3:
> https://lore.kernel.org/linux-security-module/20240401105015.27614-1-greg@enjellic.com/T/#t
> 
> V4:
> https://lore.kernel.org/linux-security-module/20240826103728.3378-1-greg@enjellic.com/T/#t
> 
> We started this work about 13-15 years ago.  We initially described
> our work and the need for it, 10 years ago almost to this day.  See
> our 2015 paper at the Linux Security Summit in Seattle.
> 
> James Morris and Casey were in the first row, Stephen and a co-worker
> from the NSA were in the second row, to the speakers left.
> 
> If one spends some time looking under the hood, TSEM is in large part
> about providing a generic framework for running multiple, independent
> and orthogonal security frameworks/policies/architectures, whatever
> one chooses to call these entities.
> 
> The reason that we argue that TSEM is a generic framework, is that in
> our internal work, we have ported the major LSM's, including the IMA
> infrastructure, to run in isolated namespaces as plugins for TSEM's
> notion of Trusted Modeling Agents (TMA's).  We also have ongoing work
> that enables Kubernetes to dispatch workloads, using whatever LSM
> based security policy that container developers desire for their
> workloads.
> 
> Suffice it to say, we have howed a lot of ground on the issues
> surrounding this, including issues surrounding production deployment
> of this type of technology.
> 
> In our initial implementation, circa 2015, we adopted the approach of
> using a CLONE_* flag and wired the implementation of security
> namespaces into the rest of the namespace infrastructure.
> 
> During COVID, we re-architected the entire implementation and moved to
> using a control file in the pseudo-filesystem that TSEM implements, we
> have never looked back on this decision.
> 
> TSEM security workloads are a poster child for security namespaces
> that require a number of different setup parameters.  A command verb
> syntax with key=value pairs, written to a pseudo-file, has proven
> itself to be the most flexible approach when setting up security
> workloads.
> 
> With respect to namespace transition, we trigger the transition of a
> process to a new namespace (unsharing) when the process issues the
> request via the control file.  This has proven to be, at once, the
> most straight forward and least security prone approach.
> 
> The other major, and thorny issue, is the notion of another process
> 'entering' a security namespace.  There are a ton of open issues to be
> considered with this, the approach that we took that has worked well
> to date, is the notion of a 'trust orchestrator' that has
> responsibility for controlling the namespace.  Any manipulations or
> control of the namespace are conducted through the orchestrator
> process.
> 
> If anyone chooses to look at our implementation, you will find that we
> 'bless' the orchestrator process, at the time of namespace creation,
> with access to the security namespace context control structure for
> the namespace being created.  The orchestrator is the only entity that
> can access the security state of the namespace, other than processes
> within the namespace itself.
> 
> This significantly narrows the scope of vulnerability with respect to
> who or what can manipulate a security namespace.  There are a number
> of thorny issues, that we have not seen anyone mention, that surface
> with respect to allowing entry into a security namespace by an
> arbitrary process.  Believe me when I say we have found a number of
> them by accident and incident.
> 
Indeed, this has to be tightly controlled, much more so than just
creating a namespace. And it's not just the "security/LSM" namespace
but the entire context around it, i.e. whether or not you can step
into, say, the mount namespace separately from the security/LSM
namespace it was created with.

Each and every one of those opens potential attack surface. Even if
it turns out to be safe, you have to carefully evaluate each
potential combination.

> So big picture.
> 
> Over a decade of experience with these issues, suggests that Paul's
> premise that most of these issues are best left to specific LSM's that
> elect to implement namespacing, is correct.
> 
> The challenge is that this situation ends up being all or nothing.
> 
> The actual amount of code involved in unsharing a namespace is so
> trivial, in comparison to the work involved with setting up and
> maintaining state information for a security namespace context, that
> it seems to make little sense to implement this support at the level
> of the LSM infrastructure itself.
> 
Actually, I think that is pretty much the goal: just a minimal thin
layer that provides the hooks, and maybe an LSM blob object, for the
individual LSMs to do their thing.

Instead of each LSM implementing its own interface, there is a common
one for container orchestrators to use to make the request.
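
As a rough illustration of what such a thin layer might look like,
here is a toy Python model, not kernel code. All names (LsmFramework,
register, unshare, the ns_unshare callback, lsm_a, lsm_b) are made up
for this sketch; the only ideas taken from the discussion are that
LSMs opt in via a callback and that the framework keeps an opaque
per-LSM blob.

```python
# Toy model of a thin framework layer: one common entry point, with
# opt-in callbacks and an opaque per-LSM blob. All names are hypothetical.


class LsmFramework:
    def __init__(self):
        self.callbacks = {}  # LSM name -> ns_unshare callback (opted-in only)

    def register(self, lsm_name, ns_unshare=None):
        # An LSM opts in simply by supplying a callback.
        if ns_unshare is not None:
            self.callbacks[lsm_name] = ns_unshare

    def unshare(self, task, requested):
        """Common entry point; 'requested' maps LSM name -> LSM-specific params."""
        blobs = {}
        for lsm_name, params in requested.items():
            cb = self.callbacks.get(lsm_name)
            if cb is None:
                raise LookupError(f"{lsm_name} does not support namespacing")
            # The LSM applies its own policy and returns its private state;
            # the framework stores the blob without interpreting it.
            blobs[lsm_name] = cb(task, params)
        # Commit only after every requested LSM has agreed.
        task.setdefault("ns_blobs", {}).update(blobs)
        return blobs


def lsm_a_unshare(task, params):
    # Stand-in for an LSM-specific policy decision and state setup.
    if not task.get("may_unshare"):
        raise PermissionError("policy denies namespace creation")
    return {"policy_root": params.get("root", "default")}


fw = LsmFramework()
fw.register("lsm_a", ns_unshare=lsm_a_unshare)
fw.register("lsm_b")  # supplies no callback, so it has not opted in

task = {"comm": "orchestrator", "may_unshare": True}
fw.unshare(task, {"lsm_a": {"root": "container-a"}})
```

The framework part of the model is a dispatch loop and a dictionary;
everything that involves policy or state lives in the per-LSM
callback.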

> If the decision is made to provide generic namespace support, other
> than a request to create a namespace, it will rapidly become a
> slippery slope with respect to the amount of infrastructure needed to
> address the complexities associated with every security model being
> different from every other.
> 
Yep, this is really just about a thin common API and minimal
infrastructure around the existing system namespacing calls (clone,
unshare, setns).

> The caveat to this is if our notion of a 'trust orchestrator' would be
> deemed to have merit.  In that case, an LSM based namespace separation
> architecture would provide a common point for the orchestrator to be
> 'blessed' with access to control of a namespace.
> 
A trust orchestrator isn't necessarily needed. Each LSM can manage its
own trust within its policy. A trust orchestrator becomes more necessary
when you are trying to do namespacing without the LSMs themselves
participating in the decision around namespacing, which admittedly has
largely been the current situation.

> The other open issue is whether or not a separate capability should be
> implemented that allows the creation of a new security namespace.  If
> one paws through our TSEM submissions, one will see that we proposed
> such a capability bit.
> 
It's not needed if individual LSMs are making decisions around namespacing
based on policy. In fact, in that case it can even be harmful. Per-LSM
policy would be finer grained, whereas a capability becomes a shared
flag that lacks context. Examples abound in the kernel where we have
a cap check without context and then a more context-based security
check.

Where the capability might be useful is when LSMs aren't dealing with
the namespacing request directly.
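
To illustrate the difference between the two kinds of check, here is
a toy Python model. It is a sketch only: the function names, the task
dictionaries, and the labels are invented, and using CAP_MAC_ADMIN as
the coarse flag is just for the sake of the example, not a proposal.

```python
# Toy model of the two-stage pattern: a context-free capability check
# followed by a context-aware LSM policy check. Names are hypothetical.


def capable(task, cap):
    # Coarse check: only asks whether the task holds the flag at all.
    return cap in task["caps"]


def lsm_policy_allows(task, request):
    # Fine-grained check: the LSM sees who is asking and what is requested.
    rules = task["policy"].get(task["label"], set())
    return request["ns_type"] in rules


def may_unshare(task, request, require_cap=None):
    # The capability gate is optional; the policy check always runs.
    if require_cap is not None and not capable(task, require_cap):
        return False
    return lsm_policy_allows(task, request)


policy = {"container_runtime": {"lsm"}, "web_server": set()}
runtime = {"caps": {"CAP_MAC_ADMIN"}, "label": "container_runtime",
           "policy": policy}
server = {"caps": {"CAP_MAC_ADMIN"}, "label": "web_server",
          "policy": policy}
request = {"ns_type": "lsm"}

runtime_ok = may_unshare(runtime, request, require_cap="CAP_MAC_ADMIN")
server_ok = may_unshare(server, request, require_cap="CAP_MAC_ADMIN")
```

Both tasks hold the same capability, so the coarse check cannot tell
them apart; only the policy check, which knows the label and the
request, separates the two.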

> Casey noted, rather emphatically, that no new capabilities were going
> to be implemented in Linux, particularly for what was described as a
> 'toy' project.  He indicated that CAP_MAC_ADMIN was the canonical
> capability that should be used for manipulating LSM's.
> 
I disagree with the reuse of CAP_MAC_ADMIN, if there is going to be
a capability around this it should be distinct from MAC_ADMIN and
MAC_OVERRIDE, as it very much has different semantics.

> We will be very interested in seeing how a discussion around this
> evolves, as 'escaping' from an existing security context to a new one
> is an extremely critical operation from a security perspective, if one

yes. I might have mentioned just how much I dislike setns().

> stands back and looks at the issue objectively.  If the concept of a
> 'security orchestrator' is embraced, it would make perfect sense for
> the orchestrator to drop CAP_SEC_NS, or whatever it would be called,
> and retain CAP_MAC_ADMIN in order to manage the namespace.
> 
> So lots of issues to consider; thorny, political and otherwise, on
> multiple fronts.
> 
>> paul-moore.com
> 
> Have a good day.
> 
> As always,
> Dr. Greg
> 
> The Quixote Project - Flailing at the Travails of Cybersecurity
>                https://github.com/Quixote-Project
> 



end of thread, other threads:[~2025-09-06  2:01 UTC | newest]

Thread overview: 43+ messages
2025-08-19 14:56 LSM namespacing API Paul Moore
2025-08-19 17:11 ` Casey Schaufler
2025-08-19 18:40   ` Paul Moore
2025-08-19 18:58     ` Stephen Smalley
2025-08-21  7:26       ` John Johansen
2025-08-21  7:23     ` John Johansen
2025-08-22  1:57       ` Paul Moore
2025-08-22 14:30         ` John Johansen
2025-08-21 10:00     ` Mickaël Salaün
2025-08-22  2:14       ` Paul Moore
2025-08-22 14:47         ` Casey Schaufler
2025-08-22 19:59           ` John Johansen
2025-08-23 17:41             ` Dr. Greg
2025-08-23 23:00               ` John Johansen
2025-08-19 17:47 ` Stephen Smalley
2025-08-19 18:51   ` Paul Moore
2025-08-19 18:52     ` Paul Moore
2025-08-20 14:44     ` Mickaël Salaün
2025-08-20 15:37       ` Casey Schaufler
2025-08-20 20:47       ` Paul Moore
2025-08-21  9:56         ` Mickaël Salaün
2025-08-21 14:18           ` John Johansen
2025-08-22  2:09           ` Paul Moore
2025-08-21  2:05     ` Serge E. Hallyn
2025-08-21  2:35       ` Paul Moore
2025-08-21  3:02         ` Serge E. Hallyn
2025-08-22  1:50           ` Paul Moore
2025-08-21  8:12         ` John Johansen
2025-08-21  8:07       ` John Johansen
2025-08-21  7:46   ` John Johansen
2025-08-21 14:26     ` Serge E. Hallyn
2025-08-21 14:57       ` John Johansen
2025-09-01 16:01         ` Dr. Greg
2025-09-01 17:31           ` Casey Schaufler
2025-09-04  2:16             ` Dr. Greg
2025-09-04 17:40               ` Casey Schaufler
2025-09-02 10:55           ` John Johansen
2025-09-05 22:14             ` Dr. Greg
2025-09-06  2:01               ` John Johansen
2025-08-22  1:59     ` Paul Moore
2025-08-21  7:14 ` John Johansen
2025-08-21 11:20 ` Dr. Greg
2025-08-21 14:44   ` John Johansen
