* LSM namespacing API
From: Paul Moore @ 2025-08-19 14:56 UTC
To: linux-security-module, selinux
Cc: John Johansen, Stephen Smalley

Hello all,

As most of you are likely aware, Stephen Smalley has been working on
adding namespace support to SELinux, and the work has now progressed
to the point where a serious discussion on the API is warranted. For
those of you who are unfamiliar with the details of Stephen's patchset,
or simply need a refresher, he has some excellent documentation in his
work-in-progress repo:

* https://github.com/stephensmalley/selinuxns

Stephen also gave a (pre-recorded) presentation at LSS-NA this year
about SELinux namespacing; you can watch the presentation here:

* https://www.youtube.com/watch?v=AwzGCOwxLoM

In the past you've heard me state, rather firmly at times, that I
believe namespacing at the LSM framework layer to be a mistake,
although if there is something that can be done to help facilitate the
namespacing of individual LSMs at the framework layer, I would be
supportive of that. I think that a single LSM namespace API, similar
to our recently added LSM syscalls, may be such a thing, so I'd like
us to have a discussion to see if we all agree on that, and if so,
what such an API might look like.

At LSS-NA this year, John Johansen and I had a brief discussion where
he suggested a single LSM-wide clone*(2) flag that individual LSMs
could opt into via callbacks. John is directly CC'd on this mail, so
I'll let him expand on this idea.

While I agree with John that a fs based API is problematic (see all of
our discussions around the LSM syscalls), I'm concerned that a single
clone*(2) flag will significantly limit our flexibility around how
individual LSMs are namespaced, something I don't want to see happen.
This makes me wonder about the potential for expanding
lsm_set_self_attr(2) to support a new LSM attribute that would support
a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. This would
provide a single LSM framework API for an unshare operation while also
providing a mechanism to pass LSM-specific data via the lsm_ctx struct
if needed. Just as we do with the other LSM_ATTR_* flags today,
individual LSMs can opt in to the API fairly easily by providing a
setselfattr() LSM callback.

Thoughts?

--
paul-moore.com
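For readers less familiar with the existing syscall, here is a minimal userspace sketch of how a caller might pack a struct lsm_ctx for the proposed LSM_ATTR_UNSHARE attribute. The struct below mirrors the field layout of struct lsm_ctx from <linux/lsm.h> so that it builds without recent kernel headers. LSM_ATTR_UNSHARE does not exist: HYPOTHETICAL_LSM_ATTR_UNSHARE and the "inherit" payload are placeholders invented for illustration, and no syscall is actually made.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same field layout as struct lsm_ctx in <linux/lsm.h>. */
struct lsm_ctx_mirror {
	uint64_t id;      /* LSM identifier, e.g. LSM_ID_SELINUX */
	uint64_t flags;   /* LSM-specific flags */
	uint64_t len;     /* size of this entry: header plus ctx[] */
	uint64_t ctx_len; /* size of the LSM-specific payload */
	uint8_t ctx[];    /* LSM-specific payload */
};

#define LSM_ID_SELINUX_VALUE 101 /* value of LSM_ID_SELINUX in <linux/lsm.h> */

/* HYPOTHETICAL: no such attribute exists; the value is a placeholder. */
#define HYPOTHETICAL_LSM_ATTR_UNSHARE 1000

/*
 * Build one lsm_ctx entry carrying an LSM-specific unshare request.
 * Returns a heap buffer (caller frees) and stores its size in *out_size.
 * A real caller would then hand it to the syscall, roughly:
 *   syscall(__NR_lsm_set_self_attr, HYPOTHETICAL_LSM_ATTR_UNSHARE,
 *           ctx, size, 0);
 */
static struct lsm_ctx_mirror *build_unshare_ctx(uint64_t lsm_id,
						const void *payload,
						size_t payload_len,
						size_t *out_size)
{
	size_t total = sizeof(struct lsm_ctx_mirror) + payload_len;
	struct lsm_ctx_mirror *ctx;

	assert(out_size != NULL);
	ctx = calloc(1, total);
	if (!ctx)
		return NULL;
	ctx->id = lsm_id;
	ctx->flags = 0;
	ctx->len = total;
	ctx->ctx_len = payload_len;
	if (payload_len)
		memcpy(ctx->ctx, payload, payload_len);
	*out_size = total;
	return ctx;
}
```

The existing syscall routes a single lsm_ctx to the LSM named by its id field, which is what would let each LSM interpret its own unshare payload; what that payload contains is left entirely to the individual LSM in this sketch.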
* Re: LSM namespacing API
From: Casey Schaufler @ 2025-08-19 17:11 UTC
To: Paul Moore, linux-security-module, selinux
Cc: John Johansen, Stephen Smalley, Casey Schaufler

On 8/19/2025 7:56 AM, Paul Moore wrote:
> Hello all,
>
> As most of you are likely aware, Stephen Smalley has been working on
> adding namespace support to SELinux, and the work has now progressed
> to the point where a serious discussion on the API is warranted. For
> those of you who are unfamiliar with the details of Stephen's patchset,
> or simply need a refresher, he has some excellent documentation in his
> work-in-progress repo:
>
> * https://github.com/stephensmalley/selinuxns
>
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> about SELinux namespacing; you can watch the presentation here:
>
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>
> In the past you've heard me state, rather firmly at times, that I
> believe namespacing at the LSM framework layer to be a mistake,
> although if there is something that can be done to help facilitate the
> namespacing of individual LSMs at the framework layer, I would be
> supportive of that. I think that a single LSM namespace API, similar
> to our recently added LSM syscalls, may be such a thing, so I'd like
> us to have a discussion to see if we all agree on that, and if so,
> what such an API might look like.
>
> At LSS-NA this year, John Johansen and I had a brief discussion where
> he suggested a single LSM-wide clone*(2) flag that individual LSMs
> could opt into via callbacks. John is directly CC'd on this mail, so
> I'll let him expand on this idea.
>
> While I agree with John that a fs based API is problematic (see all of
> our discussions around the LSM syscalls), I'm concerned that a single
> clone*(2) flag will significantly limit our flexibility around how
> individual LSMs are namespaced, something I don't want to see happen.
> This makes me wonder about the potential for expanding
> lsm_set_self_attr(2) to support a new LSM attribute that would support
> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. This would
> provide a single LSM framework API for an unshare operation while also
> providing a mechanism to pass LSM-specific data via the lsm_ctx struct
> if needed. Just as we do with the other LSM_ATTR_* flags today,
> individual LSMs can opt in to the API fairly easily by providing a
> setselfattr() LSM callback.
>
> Thoughts?

The advantage of a clone flag is that the operation is atomic with
the other namespace flag based behaviors. Having a two step process

  clone(); lsm_set_self_attr();   - or -
  lsm_set_self_attr(); clone();

is going to lead to cases where neither order really works correctly.
On the other hand, it's better to have a mechanism with a few drawbacks
than nothing at all. I think it could be workable.
* Re: LSM namespacing API
From: Paul Moore @ 2025-08-19 18:40 UTC
To: Casey Schaufler
Cc: linux-security-module, selinux, John Johansen, Stephen Smalley

On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> The advantage of a clone flag is that the operation is atomic with
> the other namespace flag based behaviors. Having a two step process
>
>   clone(); lsm_set_self_attr();   - or -
>   lsm_set_self_attr(); clone();
>
> is going to lead to cases where neither order really works correctly.

I was envisioning something that works similarly to LSM_ATTR_EXEC
where the unshare isn't immediate, but rather happens at a future
event. With LSM_ATTR_EXEC it happens at the next exec*(), with
LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().

--
paul-moore.com
* Re: LSM namespacing API
From: Stephen Smalley @ 2025-08-19 18:58 UTC
To: Paul Moore
Cc: Casey Schaufler, linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 2:41 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >
> > The advantage of a clone flag is that the operation is atomic with
> > the other namespace flag based behaviors. Having a two step process
> >
> >   clone(); lsm_set_self_attr();   - or -
> >   lsm_set_self_attr(); clone();
> >
> > is going to lead to cases where neither order really works correctly.
>
> I was envisioning something that works similarly to LSM_ATTR_EXEC
> where the unshare isn't immediate, but rather happens at a future
> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().

I've only implemented support for an immediate unsharing of the
SELinux namespace, not any kind of deferred unsharing until the next
exec or clone.
Not saying that would be impossible, but since I was following the
example of clone(2) and unshare(2) I didn't do it.
May be some complications in doing so, but I haven't looked at it yet.
* Re: LSM namespacing API
From: John Johansen @ 2025-08-21 7:26 UTC
To: Stephen Smalley, Paul Moore
Cc: Casey Schaufler, linux-security-module, selinux

On 8/19/25 11:58, Stephen Smalley wrote:
> On Tue, Aug 19, 2025 at 2:41 PM Paul Moore <paul@paul-moore.com> wrote:
>>
>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>
>>> The advantage of a clone flag is that the operation is atomic with
>>> the other namespace flag based behaviors. Having a two step process
>>>
>>>   clone(); lsm_set_self_attr();   - or -
>>>   lsm_set_self_attr(); clone();
>>>
>>> is going to lead to cases where neither order really works correctly.
>>
>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>> where the unshare isn't immediate, but rather happens at a future
>> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>
> I've only implemented support for an immediate unsharing of the
> SELinux namespace, not any kind of deferred unsharing until the next
> exec or clone.
> Not saying that would be impossible, but since I was following the
> example of clone(2) and unshare(2) I didn't do it.
> May be some complications in doing so, but I haven't looked at it yet.

If the hooks are set up correctly, I expect it will actually remove some
potential complications. But I haven't deep dived the selinux code yet,
so call that an uninformed hunch.
* Re: LSM namespacing API
From: John Johansen @ 2025-08-21 7:23 UTC
To: Paul Moore, Casey Schaufler
Cc: linux-security-module, selinux, Stephen Smalley

On 8/19/25 11:40, Paul Moore wrote:
> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>
>> The advantage of a clone flag is that the operation is atomic with
>> the other namespace flag based behaviors. Having a two step process
>>
>>   clone(); lsm_set_self_attr();   - or -
>>   lsm_set_self_attr(); clone();
>>
>> is going to lead to cases where neither order really works correctly.
>
> I was envisioning something that works similarly to LSM_ATTR_EXEC
> where the unshare isn't immediate, but rather happens at a future
> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>

I do think something like this is needed to deal well with the two
step process. Without it, it is fairly easy to get into situations
where you either need more permissions than strictly necessary,
because of steps in between, or, as Casey says, things just don't work
correctly.

There will need to be an additional call that allows entering a
namespace separately from clone/unshare, but that covers a different
use case.
* Re: LSM namespacing API
From: Paul Moore @ 2025-08-22 1:57 UTC
To: John Johansen
Cc: Casey Schaufler, linux-security-module, selinux, Stephen Smalley

On Thu, Aug 21, 2025 at 3:23 AM John Johansen
<john.johansen@canonical.com> wrote:
> On 8/19/25 11:40, Paul Moore wrote:
> > On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>
> >> The advantage of a clone flag is that the operation is atomic with
> >> the other namespace flag based behaviors. Having a two step process
> >>
> >>   clone(); lsm_set_self_attr();   - or -
> >>   lsm_set_self_attr(); clone();
> >>
> >> is going to lead to cases where neither order really works correctly.
> >
> > I was envisioning something that works similarly to LSM_ATTR_EXEC
> > where the unshare isn't immediate, but rather happens at a future
> > event. With LSM_ATTR_EXEC it happens at the next exec*(), with
> > LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>
> I do think something like this is needed to deal well with the two
> step process. Without it, it is fairly easy to get into situations
> where you either need more permissions than strictly necessary,
> because of steps in between, or, as Casey says, things just don't work
> correctly.

I think we're starting to all coalesce on this basic idea now, at
least for creating new LSM namespace sets, that's good. As the only
LSM that really has a namespace currently, would AppArmor be able to
work within the lsm_set_self_attr(2) approach, or would you need
something a bit different? If so, can you give us a basic idea of
what AA would need to work?

> There will need to be an additional call that allows entering a
> namespace separately from clone/unshare, but that covers a different
> use case.

In this particular case I've been thinking of not allowing the same
level of arbitrary LSM namespace composability, but rather limiting
the caller to the set of LSM namespaces already configured for a given
process, using the procfs/setns(2) mechanism. Does that work for your
use case(s), or do you need more flexibility?

--
paul-moore.com
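Paul's suggestion reuses the existing procfs/setns(2) pattern for entering a namespace. The sketch below shows that pattern with real calls: open(2) on /proc/<pid>/ns/<name>, then setns(2). The namespace file name "lsm" used in the usage example is a hypothetical placeholder, since no such file is defined in this thread or in the kernel today, and joining a real namespace also requires suitable privileges in the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/ns/<name>"; returns 0 on success, -1 if truncated. */
static int build_ns_path(char *buf, size_t len, pid_t pid, const char *name)
{
	int n = snprintf(buf, len, "/proc/%d/ns/%s", (int)pid, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Join the namespace exposed as /proc/<pid>/ns/<name> using setns(2).
 * nstype 0 means "accept whatever namespace type the fd refers to".
 * Returns 0 on success, -1 with errno set on failure.
 */
static int join_ns(pid_t pid, const char *name)
{
	char path[64];
	int fd, ret, saved;

	assert(name != NULL);
	if (build_ns_path(path, sizeof(path), pid, name) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	ret = setns(fd, 0);
	saved = errno;
	close(fd);
	errno = saved;
	return ret;
}
```

Usage would look like `join_ns(target_pid, "lsm")` under the hypothetical file name. Since Linux 5.8, setns(2) also accepts a pidfd together with a mask of CLONE_NEW* flags to join several of a process's namespaces in one call, which is close in spirit to restricting callers to the set of namespaces a given process already has.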
* Re: LSM namespacing API
From: John Johansen @ 2025-08-22 14:30 UTC
To: Paul Moore
Cc: Casey Schaufler, linux-security-module, selinux, Stephen Smalley

On 8/21/25 18:57, Paul Moore wrote:
> On Thu, Aug 21, 2025 at 3:23 AM John Johansen
> <john.johansen@canonical.com> wrote:
>> On 8/19/25 11:40, Paul Moore wrote:
>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>
>>>> The advantage of a clone flag is that the operation is atomic with
>>>> the other namespace flag based behaviors. Having a two step process
>>>>
>>>>   clone(); lsm_set_self_attr();   - or -
>>>>   lsm_set_self_attr(); clone();
>>>>
>>>> is going to lead to cases where neither order really works correctly.
>>>
>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>> where the unshare isn't immediate, but rather happens at a future
>>> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>>
>> I do think something like this is needed to deal well with the two
>> step process. Without it, it is fairly easy to get into situations
>> where you either need more permissions than strictly necessary,
>> because of steps in between, or, as Casey says, things just don't work
>> correctly.
>
> I think we're starting to all coalesce on this basic idea now, at
> least for creating new LSM namespace sets, that's good. As the only
> LSM that really has a namespace currently, would AppArmor be able to
> work within the lsm_set_self_attr(2) approach, or would you need
> something a bit different? If so, can you give us a basic idea of
> what AA would need to work?
>
>> There will need to be an additional call that allows entering a
>> namespace separately from clone/unshare, but that covers a different
>> use case.
>
> In this particular case I've been thinking of not allowing the same
> level of arbitrary LSM namespace composability, but rather limiting
> the caller to the set of LSM namespaces already configured for a given
> process, using the procfs/setns(2) mechanism. Does that work for your
> use case(s), or do you need more flexibility?
>

Yes, it should work. I think the LSM/security namespaces need to move
together. In fact I want even less arbitrary composability, as I think
switching LSM namespaces should be able to force system namespace
changes as well. There are all kinds of potential security corner
cases you have to worry about when trying to move them independently.
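John's preference that LSM namespaces move together with the other namespaces could be enforced in two ways: reject a request that switches the LSM namespace on its own, or widen the request so the coupled namespaces move as well. A minimal sketch of both, using the real CLONE_NEW* flags from <sched.h>. Which namespaces are coupled (HYPOTHETICAL_LSM_COUPLED_NS) and both helper functions are hypothetical choices made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/*
 * HYPOTHETICAL policy: system namespaces that must change whenever the
 * LSM namespace changes. The choice of user + mount is illustrative.
 */
#define HYPOTHETICAL_LSM_COUPLED_NS (CLONE_NEWUSER | CLONE_NEWNS)

/* Strict variant: reject requests that would move the LSM ns alone. */
static bool lsm_ns_request_ok(int requested_flags, bool switching_lsm_ns)
{
	if (!switching_lsm_ns)
		return true;
	return (requested_flags & HYPOTHETICAL_LSM_COUPLED_NS) ==
	       HYPOTHETICAL_LSM_COUPLED_NS;
}

/* Forcing variant: widen the request so coupled namespaces move too. */
static int lsm_ns_force_coupled(int requested_flags, bool switching_lsm_ns)
{
	int out = switching_lsm_ns
			  ? (requested_flags | HYPOTHETICAL_LSM_COUPLED_NS)
			  : requested_flags;

	/* After widening, the strict check must always pass. */
	assert(!switching_lsm_ns || lsm_ns_request_ok(out, true));
	return out;
}
```

The strict variant keeps the caller in control and surfaces the coupling as an error; the forcing variant matches the "should be able to force system namespace changes" wording but silently changes what the caller asked for.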
* Re: LSM namespacing API
From: Mickaël Salaün @ 2025-08-21 10:00 UTC
To: Paul Moore
Cc: Casey Schaufler, linux-security-module, selinux, John Johansen,
    Stephen Smalley, Maxime Bélair

On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >
> > The advantage of a clone flag is that the operation is atomic with
> > the other namespace flag based behaviors. Having a two step process
> >
> >   clone(); lsm_set_self_attr();   - or -
> >   lsm_set_self_attr(); clone();
> >
> > is going to lead to cases where neither order really works correctly.
>
> I was envisioning something that works similarly to LSM_ATTR_EXEC
> where the unshare isn't immediate, but rather happens at a future
> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().

The next unshare(2) would make more sense to me.

This deferred operation could be requested with a flag in
lsm_config_system_policy(2) instead:
https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
* Re: LSM namespacing API
From: Paul Moore @ 2025-08-22 2:14 UTC
To: Mickaël Salaün
Cc: Casey Schaufler, linux-security-module, selinux, John Johansen,
    Stephen Smalley, Maxime Bélair

On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
> > On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> > >
> > > The advantage of a clone flag is that the operation is atomic with
> > > the other namespace flag based behaviors. Having a two step process
> > >
> > >   clone(); lsm_set_self_attr();   - or -
> > >   lsm_set_self_attr(); clone();
> > >
> > > is going to lead to cases where neither order really works correctly.
> >
> > I was envisioning something that works similarly to LSM_ATTR_EXEC
> > where the unshare isn't immediate, but rather happens at a future
> > event. With LSM_ATTR_EXEC it happens at the next exec*(), with
> > LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>
> The next unshare(2) would make more sense to me.

That's definitely something to discuss. I've been fairly loose on
that in the discussion thus far, but as things are starting to settle
on the lsm_set_self_attr(2) approach as one API, we should start to
clarify that.

> This deferred operation could be requested with a flag in
> lsm_config_system_policy(2) instead:
> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com

I want to keep the policy syscall work separate from the LSM namespace
discussion as we don't want to require a policy load operation to
create a new LSM namespace. I think it's probably okay if the policy
syscall work were to be namespace aware so that an orchestrator could
load a LSM policy into a LSM namespace other than its own, but that
is still not overly dependent on what we are discussing here (yes,
maybe it is a little, but only just so).

--
paul-moore.com
* Re: LSM namespacing API
From: Casey Schaufler @ 2025-08-22 14:47 UTC
To: Paul Moore, Mickaël Salaün
Cc: linux-security-module, selinux, John Johansen, Stephen Smalley,
    Maxime Bélair, Casey Schaufler

On 8/21/2025 7:14 PM, Paul Moore wrote:
> On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net> wrote:
>> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> The advantage of a clone flag is that the operation is atomic with
>>>> the other namespace flag based behaviors. Having a two step process
>>>>
>>>>   clone(); lsm_set_self_attr();   - or -
>>>>   lsm_set_self_attr(); clone();
>>>>
>>>> is going to lead to cases where neither order really works correctly.
>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>> where the unshare isn't immediate, but rather happens at a future
>>> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>> The next unshare(2) would make more sense to me.
> That's definitely something to discuss. I've been fairly loose on
> that in the discussion thus far, but as things are starting to settle
> on the lsm_set_self_attr(2) approach as one API, we should start to
> clarify that.
>
>> This deferred operation could be requested with a flag in
>> lsm_config_system_policy(2) instead:
>> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
> I want to keep the policy syscall work separate from the LSM namespace
> discussion as we don't want to require a policy load operation to
> create a new LSM namespace. I think it's probably okay if the policy
> syscall work were to be namespace aware so that an orchestrator could
> load a LSM policy into a LSM namespace other than its own, but that
> is still not overly dependent on what we are discussing here (yes,
> maybe it is a little, but only just so).

Policy load and namespace manipulation *must* be kept separate. Smack
requires the ability to "load policy" at any time. Smack allows a process
to add "policy" to further restrict its own access, and does not require
a namespace change. There has been an implementation of namespaces for
Smack, but the developers disappeared quietly and sadly no one picked it
up. Introducing a requirement that LSMs support namespaces in order to
load policy beyond system initialization is a non-starter.
* Re: LSM namespacing API
From: John Johansen @ 2025-08-22 19:59 UTC
To: Casey Schaufler, Paul Moore, Mickaël Salaün
Cc: linux-security-module, selinux, Stephen Smalley, Maxime Bélair

On 8/22/25 07:47, Casey Schaufler wrote:
> On 8/21/2025 7:14 PM, Paul Moore wrote:
>> On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net> wrote:
>>> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
>>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>> The advantage of a clone flag is that the operation is atomic with
>>>>> the other namespace flag based behaviors. Having a two step process
>>>>>
>>>>>   clone(); lsm_set_self_attr();   - or -
>>>>>   lsm_set_self_attr(); clone();
>>>>>
>>>>> is going to lead to cases where neither order really works correctly.
>>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>>> where the unshare isn't immediate, but rather happens at a future
>>>> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
>>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>>> The next unshare(2) would make more sense to me.
>> That's definitely something to discuss. I've been fairly loose on
>> that in the discussion thus far, but as things are starting to settle
>> on the lsm_set_self_attr(2) approach as one API, we should start to
>> clarify that.
>>
>>> This deferred operation could be requested with a flag in
>>> lsm_config_system_policy(2) instead:
>>> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
>> I want to keep the policy syscall work separate from the LSM namespace
>> discussion as we don't want to require a policy load operation to
>> create a new LSM namespace. I think it's probably okay if the policy
>> syscall work were to be namespace aware so that an orchestrator could
>> load a LSM policy into a LSM namespace other than its own, but that
>> is still not overly dependent on what we are discussing here (yes,
>> maybe it is a little, but only just so).
>
> Policy load and namespace manipulation *must* be kept separate. Smack
> requires the ability to "load policy" at any time. Smack allows a process
> to add "policy" to further restrict its own access, and does not require
> a namespace change. There has been an implementation of namespaces for
> Smack, but the developers disappeared quietly and sadly no one picked it
> up. Introducing a requirement that LSMs support namespaces in order to
> load policy beyond system initialization is a non-starter.
>

Yes, the ability to load policy must exist separately; however,
policy load could be made namespace aware so that a parent could
inject policy into a child.

There is also an open question as to whether we need to allow, but
not require, some kind of policy manipulation/injection with the
creation of the LSM namespace so that there is an atomic
transition with entering the namespace. Is there a case where policy
really needs to be present atomically with the creation of the
namespace? If so we need to further break it down to

1. is it sufficient for the LSM to do it, without container manager
guidance? An inherit of policy, or already present policy that can be
injected. Then we don't need policy load inject to be considered at
the point of clone/unshare.

2. do we need to let the container manager hint/load policy.

So far I think the inherit/policy directed injection works for
apparmor, and selinux. Container managers generally speaking have to do
additional setup after the container is created before running the
work load, which means a separate load phase should be fine.

However I can see an argument for having policy in place when
clone/unshare exit. Admittedly atm it's largely around flexibility, and
nebulous ill defined use cases. Just because something works for
apparmor, selinux, and I think smack, doesn't mean it would work for
all use cases.

But we also shouldn't add flexibility for flexibility's sake just
because we can see there might be some future utility for some future
use case. It would certainly make the interface uglier, and more
complicated, and I would hate to have to carry that without a concrete
use case.

I think unless there is a solid use case for making clone/unshare
policy aware we don't worry about it for now. A new interface can be
added in the future if the capability is really needed.
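John's observation that container managers already have a setup phase between creating the container and starting the workload corresponds to a common pattern: the child blocks on a pipe until the parent has finished its setup. The sketch below uses only fork(2), pipe(2) and waitpid(2). The policy-loading step is a hypothetical stub (demo_load_ok / demo_load_fail), since no namespace-aware load interface exists yet. The pattern guarantees that policy is in place before the workload runs, but it is not atomic with clone/unshare itself.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* HYPOTHETICAL stand-ins for "load policy into the child's LSM
 * namespace"; no real LSM interface is called here. */
static int demo_load_ok(pid_t child) { (void)child; return 0; }
static int demo_load_fail(pid_t child) { (void)child; return -1; }

/* Example workload that exits with a recognisable status. */
static int demo_workload(void) { return 42; }

/*
 * Start a child that waits on a pipe until the parent has loaded its
 * policy, then runs the workload. If loading fails, the parent closes
 * the pipe without writing; the child sees EOF and exits with 127, so
 * the workload never runs without policy. Returns the child's exit
 * status, or -1 on error.
 */
static int run_with_policy_gate(int (*load)(pid_t), int (*workload)(void))
{
	int gate[2], status;
	pid_t pid;
	char go = 'g';

	assert(load != NULL && workload != NULL);
	if (pipe(gate) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(gate[0]);
		close(gate[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: a real manager would enter the new LSM namespace here. */
		close(gate[1]);
		if (read(gate[0], &go, 1) != 1)
			_exit(127); /* setup failed: refuse to run */
		close(gate[0]);
		_exit(workload());
	}
	close(gate[0]);
	if (load(pid) == 0) {
		ssize_t w = write(gate[1], &go, 1);
		(void)w; /* on failure the child sees EOF and refuses to run */
	}
	close(gate[1]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child closes its copy of the write end before reading; without that, a failed load would leave it blocked forever instead of seeing EOF.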
* Re: LSM namespacing API 2025-08-22 19:59 ` John Johansen @ 2025-08-23 17:41 ` Dr. Greg 2025-08-23 23:00 ` John Johansen 0 siblings, 1 reply; 43+ messages in thread From: Dr. Greg @ 2025-08-23 17:41 UTC (permalink / raw) To: John Johansen Cc: Casey Schaufler, Paul Moore, Micka??l Sala??n, linux-security-module, selinux, Stephen Smalley, Maxime B??lair On Fri, Aug 22, 2025 at 12:59:29PM -0700, John Johansen wrote: Good morning, I hope the weekend is going well for everyone. > On 8/22/25 07:47, Casey Schaufler wrote: > >On 8/21/2025 7:14 PM, Paul Moore wrote: > >>On Thu, Aug 21, 2025 at 6:00???AM Micka??l Sala??n <mic@digikod.net> > >>wrote: > >>>On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote: > >>>>On Tue, Aug 19, 2025 at 1:11???PM Casey Schaufler > >>>><casey@schaufler-ca.com> wrote: > >>>>>The advantage of a clone flag is that the operation is atomic with > >>>>>the other namespace flag based behaviors. Having a two step process > >>>>> > >>>>> clone(); lsm_set_self_attr(); - or - > >>>>> lsm_set_self_attr(); clone(); > >>>>> > >>>>>is going to lead to cases where neither order really works correctly. > >>>>I was envisioning something that works similarly to LSM_ATTR_EXEC > >>>>where the unshare isn't immediate, but rather happens at a future > >>>>event. With LSM_ATTR_EXEC it happens at the next exec*(), with > >>>>LSM_ATTR_UNSHARE I imagine it would happen at the next clone*(). > >>>The next unshare(2) would make more sense to me. > >>That's definitely something to discuss. I've been fairly loose on > >>that in the discussion thus far, but as things are starting to settle > >>on the lsm_set_self_attr(2) approach as one API, we should start to > >>clarify that. 
> >> > >>>This deferred operation could be requested with a flag in > >>>lsm_config_system_policy(2) instead: > >>>https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com > >>I want to keep the policy syscall work separate from the LSM namespace > >>discussion as we don't want to require a policy load operation to > >>create a new LSM namespace. I think it's probably okay if the policy > >>syscall work were to be namespace aware so that an orchestrator could > >>load a LSM policy into a LSM namespace other than it's own, but that > >>is still not overly dependent on what we are discussing here (yes, > >>maybe it is a little, but only just so). > > > >Policy load and namespace manipulation *must* be kept separate. Smack > >requires the ability to "load policy" at any time. Smack allows a process > >to add "policy" to further restrict its own access, and does not require > >a namespace change. There has been an implementation of namespaces for > >Smack, but the developers disappeared quietly and sadly no one picked it > >up. Introducing a requirement that LSMs support namespaces in order to > >load policy beyond system initialization is a non-starter. > yes the ability to load policy must be exist separately, however > policy load could be made namespace aware so that a parent could > inject policy into a child. Policy or model load, specific to the subordinate namespace, will be a necessity. As Casey noted, some LSM namespaces will require configuration and management calls well after the namespace has started. Other LSM's will want the configuration to be completed before the namespace starts, with any further configurations to the namespace blocked. There is a very valid security rationale for isolating the capability for namespace separation from the capability that allows the configuration of a security model. 
It would be an entirely realistic security objective for a namespace to block further separation attempts, while still allowing for management operations to be conducted in the context of the subordinate namespace. Hence the rationale for splitting CAP_MAC_ADMIN from whatever name the bike shedding process around the new capability naming process produces. > There is also an open question as to whether we need to allow, but > not require, some kind of policy manipulation/injection with the > creation of the LSM namespace so that the there is an atomic > transition with entering the namespace. Is there a case where policy > really needs to be present atomically with the creation of the > namespace? If so we need to further break it down to > > 1. is it sufficient for the LSM to do it, without container manager > guidance? An inherit of policy, or already present policy that can be > injected. Then we don't need policy load inject to be considered at > the point of clone/unshare. > > 2. do we need to let the container manager hint/load policy. Policy load needs to be atomic with respect to namespace separation. In other words, the policy needs to be in place when execution within the context of the new security namespace begins. A resource orchestrator will need the ability to load the new policy that will be enforced into the context of the new namespace. In the case of some model/integrity based LSM's, the security events related to the policy load need to occur in the context of the parent LSM namespace. See the writings of Werner Karl Heisenberg for the reasoning behind that... :-) > So far I think the inherit/policy directed injection works for > apparmor, and selinux. Container managers generally speaking have to > additional setup after the container is created before running the > work load, which means a separate load phase should be fine. > > However I can see an argument for having policy in place when > clone/unshare exit. 
> Admittedly atm it's largely around flexibility, and
> nebulous ill defined use cases. Just because something works for
> apparmor, selinux, and I think smack, doesn't mean it would work for
> all use cases.
>
> But we also shouldn't add flexibility for flexibility's sake just
> because we can see there might be some future utility for some future
> use case. It would certainly make the interface uglier, and more
> complicated, and I would hate to have to carry that without a
> concrete use case.
>
> I think unless there is a solid use case for making clone/unshare
> policy aware we don't worry about it for now. A new interface can be
> added in the future if the capability is really needed.

We will respond more directly to the issue of clone, unshare and
external process entry, in the other thread where you initiated a
discussion of these issues. We believe there is a strong argument to
be made that LSM namespace separation is a poor fit for the classic
fork/unshare model of the other resource namespaces.

Among other issues, a direct separation model places the complexity of
policy verification and loading in userspace. As was noted above,
accounting for the security events related to the policy verification
and load process, in the orchestrator process, will be a requirement
for some integrity and functional models.

Have a good weekend.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project
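One way to reconcile the deferred-operation idea quoted above (a request recorded now and acted on at a later event, in the style of LSM_ATTR_EXEC) with Dr. Greg's requirement that policy already be in place when execution in the new namespace begins is to stage the policy together with the unshare request. The following is a user-space model only, assuming a hypothetical per-task "pending unshare" flag; struct ns_model, struct task_model, stage_unshare() and on_unshare() are illustrative names, not kernel interfaces, and LSM_ATTR_UNSHARE is still only a proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define POLICY_MAX 32
#define NS_MAX 8

/* Illustrative model of one LSM namespace; not a kernel structure. */
struct ns_model {
    int id;
    char policy[POLICY_MAX];        /* "" means no policy loaded */
};

/* Illustrative per-task state for a deferred unshare request. */
struct task_model {
    struct ns_model *ns;            /* namespace the task currently runs in */
    bool unshare_pending;           /* hypothetical LSM_ATTR_UNSHARE was set */
    char staged_policy[POLICY_MAX]; /* policy to install in the new namespace */
};

static struct ns_model ns_pool[NS_MAX];
static int ns_count;

/* Record the request; nothing changes until the next unshare event. */
void stage_unshare(struct task_model *t, const char *policy)
{
    t->unshare_pending = true;
    strncpy(t->staged_policy, policy, POLICY_MAX - 1);
    t->staged_policy[POLICY_MAX - 1] = '\0';
}

/*
 * Hook run at the next unshare/clone: build the child namespace, install
 * the staged policy, and only then switch the task into it, so the task
 * never executes in the new namespace without its policy.
 * Returns true if a transition happened.
 */
bool on_unshare(struct task_model *t)
{
    struct ns_model *child;

    if (!t->unshare_pending)
        return false;
    assert(ns_count < NS_MAX);
    child = &ns_pool[ns_count];
    child->id = ns_count++;
    strcpy(child->policy, t->staged_policy);  /* policy first ... */
    t->ns = child;                            /* ... then the switch */
    t->unshare_pending = false;               /* one-shot in this model */
    t->staged_policy[0] = '\0';
    return true;
}
```

The ordering inside on_unshare() is the point of the sketch: installing the policy before publishing the new namespace pointer is what makes the transition look atomic from the task's point of view.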
* Re: LSM namespacing API
From: John Johansen @ 2025-08-23 23:00 UTC
To: Dr. Greg
Cc: Casey Schaufler, Paul Moore, Mickaël Salaün, linux-security-module, selinux, Stephen Smalley, Maxime Bélair

On 8/23/25 10:41, Dr. Greg wrote:
> On Fri, Aug 22, 2025 at 12:59:29PM -0700, John Johansen wrote:
>
> Good morning, I hope the weekend is going well for everyone.
>
>> On 8/22/25 07:47, Casey Schaufler wrote:
>>> On 8/21/2025 7:14 PM, Paul Moore wrote:
>>>> On Thu, Aug 21, 2025 at 6:00 AM Mickaël Salaün <mic@digikod.net>
>>>> wrote:
>>>>> On Tue, Aug 19, 2025 at 02:40:52PM -0400, Paul Moore wrote:
>>>>>> On Tue, Aug 19, 2025 at 1:11 PM Casey Schaufler
>>>>>> <casey@schaufler-ca.com> wrote:
>>>>>>> The advantage of a clone flag is that the operation is atomic with
>>>>>>> the other namespace flag based behaviors. Having a two step process
>>>>>>>
>>>>>>> clone(); lsm_set_self_attr(); - or -
>>>>>>> lsm_set_self_attr(); clone();
>>>>>>>
>>>>>>> is going to lead to cases where neither order really works correctly.
>>>>>> I was envisioning something that works similarly to LSM_ATTR_EXEC
>>>>>> where the unshare isn't immediate, but rather happens at a future
>>>>>> event. With LSM_ATTR_EXEC it happens at the next exec*(), with
>>>>>> LSM_ATTR_UNSHARE I imagine it would happen at the next clone*().
>>>>> The next unshare(2) would make more sense to me.
>>>> That's definitely something to discuss. I've been fairly loose on
>>>> that in the discussion thus far, but as things are starting to settle
>>>> on the lsm_set_self_attr(2) approach as one API, we should start to
>>>> clarify that.
>>>>
>>>>> This deferred operation could be requested with a flag in
>>>>> lsm_config_system_policy(2) instead:
>>>>> https://lore.kernel.org/r/20250709080220.110947-1-maxime.belair@canonical.com
>>>> I want to keep the policy syscall work separate from the LSM namespace
>>>> discussion as we don't want to require a policy load operation to
>>>> create a new LSM namespace. I think it's probably okay if the policy
>>>> syscall work were to be namespace aware so that an orchestrator could
>>>> load a LSM policy into a LSM namespace other than its own, but that
>>>> is still not overly dependent on what we are discussing here (yes,
>>>> maybe it is a little, but only just so).
>>>
>>> Policy load and namespace manipulation *must* be kept separate. Smack
>>> requires the ability to "load policy" at any time. Smack allows a process
>>> to add "policy" to further restrict its own access, and does not require
>>> a namespace change. There has been an implementation of namespaces for
>>> Smack, but the developers disappeared quietly and sadly no one picked it
>>> up. Introducing a requirement that LSMs support namespaces in order to
>>> load policy beyond system initialization is a non-starter.
>
>> yes the ability to load policy must exist separately, however
>> policy load could be made namespace aware so that a parent could
>> inject policy into a child.
>
> Policy or model load, specific to the subordinate namespace, will be
> a necessity.
>
> As Casey noted, some LSM namespaces will require configuration and
> management calls well after the namespace has started. Other LSMs
> will want the configuration to be completed before the namespace
> starts, with any further configurations to the namespace blocked.
>
> There is a very valid security rationale for isolating the capability
> for namespace separation from the capability that allows the
> configuration of a security model.
> It would be an entirely realistic security objective for a namespace
> to block further separation attempts, while still allowing for
> management operations to be conducted in the context of the
> subordinate namespace.
>
> Hence the rationale for splitting the namespace separation capability,
> under whatever name the bike-shedding process eventually produces, out
> of CAP_MAC_ADMIN.
>
>> There is also an open question as to whether we need to allow, but
>> not require, some kind of policy manipulation/injection with the
>> creation of the LSM namespace so that there is an atomic
>> transition with entering the namespace. Is there a case where policy
>> really needs to be present atomically with the creation of the
>> namespace? If so we need to further break it down to
>>
>> 1. is it sufficient for the LSM to do it, without container manager
>> guidance? An inherit of policy, or already present policy that can be
>> injected. Then we don't need policy load inject to be considered at
>> the point of clone/unshare.
>>
>> 2. do we need to let the container manager hint/load policy.
>
> Policy load needs to be atomic with respect to namespace separation.
> In other words, the policy needs to be in place when execution within
> the context of the new security namespace begins.
>
No, it _may_ need to be, depending on the model/policy being used, and
an LSM is in the best place to make that decision and to do it for its
own policy, as long as the infrastructure supports it.

> A resource orchestrator will need the ability to load the new policy
> that will be enforced into the context of the new namespace.

No, an LSM is fully capable of doing this, and honestly is in a better
position to do so for its own policy than an external orchestrator.
Where coordination/orchestration is needed is at the infrastructure
layer (LSM), to ensure that, once everything has been decided by the
individual LSMs, the security context is set up atomically and
correctly.
So in that sense the LSM infrastructure is an orchestrator, but only
in the loosest sense.

>
> In the case of some model/integrity-based LSMs, the security events
> related to the policy load need to occur in the context of the parent
> LSM namespace.
>
Yes, it very much depends on the model. I would argue that if the LSM
needs this:

1. the policy at the exec/fork/clone/unshare point already needs to be
   loaded.
2. the LSM's policy needs a way to initiate the transition.

E.g. in the SELinux case, the transition is setting up a new layer of
mediation that will be bounded by the previous layers. There isn't a
transition from one policy to another, but rather a new layer added on
top.

> See the writings of Werner Karl Heisenberg for the reasoning behind
> that... :-)
>
>> So far I think the inherit/policy directed injection works for
>> apparmor, and selinux. Container managers generally speaking have to do
>> additional setup after the container is created before running the
>> work load, which means a separate load phase should be fine.
>>
>> However I can see an argument for having policy in place when
>> clone/unshare exit. Admittedly atm it's largely around flexibility, and
>> nebulous ill defined use cases. Just because something works for
>> apparmor, selinux, and I think smack, doesn't mean it would work for
>> all use cases.
>>
>> But we also shouldn't add flexibility for flexibility's sake just
>> because we can see there might be some future utility for some future
>> use case. It would certainly make the interface uglier, and more
>> complicated, and I would hate to have to carry that without a
>> concrete use case.
>>
>> I think unless there is a solid use case for making clone/unshare
>> policy aware we don't worry about it for now. A new interface can be
>> added in the future if the capability is really needed.
>
> We will respond more directly to the issue of clone, unshare and
> external process entry, in the other thread where you initiated a
> discussion of these issues.
> We believe there is a strong argument to
> be made that LSM namespace separation is a poor fit for the classic
> fork/unshare model of the other resource namespaces.
>
The other resource namespaces being able to move independently of the
security namespace (or at least independently of mediation by the
security namespace) is a complete disaster and should never have been
allowed.

> Among other issues, a direct separation model places the complexity of
> policy verification and loading in userspace. As was noted above,
> accounting for the security events related to the policy verification
> and load process, in the orchestrator process, will be a requirement
> for some integrity and functional models.
>
There are different levels of verification. It makes sense to do some
of it in the individual LSM, some of it in userspace, and potentially
some at another level in another LSM. Unfortunately Linux has forced
the concept of containers to be a user level construct, and this
forces certain verifications around containers to be in userspace.

AppArmor does a policy verification that checks the policy meets all
the bounding constraints, etc. That is very different from the
verification check IMA may do, checking that this policy is blessed
and allowed to be loaded. AppArmor could support some IMA
verification, but it is very much designed to be like Landlock, in
that unprivileged userspace _may_ have the privilege to load policy
into the kernel. You may not want to allow that on some systems, but
you certainly do on others. The system level signature check that IMA
does isn't appropriate for unprivileged user policy, but the AppArmor
verification check is.

And yes, something like IMA that is doing system level integrity
checking is going to need a post policy load callback to do
verification. This again doesn't need an orchestrator, just support in
the infrastructure and a callback that individual LSMs can trigger.
See the work Paul is doing to rework the LSM init, or how IMA verifies
SELinux policy. Of course you have to trust the LSMs to trigger the
callback, but it's open source and the code can be checked. If you
can't trust the individual LSMs, you have a much bigger problem,
because you just can't trust a monolithic kernel, and you are going to
need a trust zone/hypervisor above the kernel to do any form of
integrity check you can trust.

> Have a good weekend.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
> https://github.com/Quixote-Project
>
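John's description of the LSM infrastructure as only a loose orchestrator (each LSM decides its own policy, and the framework merely guarantees that the combined security context is set up atomically) can be sketched as a prepare/commit/abort pass over the participating LSMs. This is a user-space model under that assumption; struct lsmns_ops, lsmns_transition() and the toy LSM are hypothetical and do not correspond to existing LSM hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-LSM callbacks for a namespace transition. */
struct lsmns_ops {
    const char *name;
    int  (*prepare)(void *state);  /* decide and allocate; 0 on success */
    void (*commit)(void *state);   /* make the prepared context live */
    void (*abort)(void *state);    /* release what prepare() set up */
    void *state;
};

/*
 * Framework side: every LSM prepares its own namespace first; the new
 * contexts are committed only if all of them succeeded, so the task never
 * ends up with a mix of old and new LSM contexts.
 * Returns 0, or the error from the first prepare() that failed.
 */
int lsmns_transition(struct lsmns_ops *ops, size_t n)
{
    size_t i;
    int err = 0;

    assert(ops != NULL || n == 0);
    for (i = 0; i < n; i++) {
        err = ops[i].prepare(ops[i].state);
        if (err)
            break;
    }
    if (err) {
        /* Undo only the LSMs before the failing one. */
        while (i-- > 0)
            ops[i].abort(ops[i].state);
        return err;
    }
    for (i = 0; i < n; i++)
        ops[i].commit(ops[i].state);
    return 0;
}

/* Toy LSM used to exercise the coordinator; records what happened. */
struct toy_lsm {
    int fail_with;   /* nonzero: prepare() fails with this value */
    int prepared, committed, aborted;
};

static int toy_prepare(void *s)
{
    struct toy_lsm *t = s;

    if (t->fail_with)
        return t->fail_with;
    t->prepared = 1;
    return 0;
}

static void toy_commit(void *s) { ((struct toy_lsm *)s)->committed = 1; }
static void toy_abort(void *s)  { ((struct toy_lsm *)s)->aborted = 1; }
```

In this design the policy decisions stay inside each LSM's prepare() callback; the coordinator only sequences them, which matches the "orchestrator in the loosest sense" framing.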
* Re: LSM namespacing API
From: Stephen Smalley @ 2025-08-19 17:47 UTC
To: Paul Moore; +Cc: linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
>
> Hello all,
>
> As most of you are likely aware, Stephen Smalley has been working on
> adding namespace support to SELinux, and the work has now progressed
> to the point where a serious discussion on the API is warranted. For
> those of you are unfamiliar with the details or Stephen's patchset, or
> simply need a refresher, he has some excellent documentation in his
> work-in-progress repo:
>
> * https://github.com/stephensmalley/selinuxns
>
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
> about SELinux namespacing, you can watch the presentation here:
>
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>
> In the past you've heard me state, rather firmly at times, that I
> believe namespacing at the LSM framework layer to be a mistake,
> although if there is something that can be done to help facilitate the
> namespacing of individual LSMs at the framework layer, I would be
> supportive of that. I think that a single LSM namespace API, similar
> to our recently added LSM syscalls, may be such a thing, so I'd like
> us to have a discussion to see if we all agree on that, and if so,
> what such an API might look like.
>
> At LSS-NA this year, John Johansen and I had a brief discussion where
> he suggested a single LSM wide clone*(2) flag that individual LSM's
> could opt into via callbacks. John is directly CC'd on this mail, so
> I'll let him expand on this idea.
>
> While I agree with John that a fs based API is problematic (see all of
> our discussions around the LSM syscalls), I'm concerned that a single
> clone*(2) flag will significantly limit our flexibility around how
> individual LSMs are namespaced, something I don't want to see happen.
> This makes me wonder about the potential for expanding
> lsm_set_self_attr(2) to support a new LSM attribute that would support
> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. This would
> provide a single LSM framework API for an unshare operation while also
> providing a mechanism to pass LSM specific via the lsm_ctx struct if
> needed. Just as we do with the other LSM_ATTR_* flags today,
> individual LSMs can opt-in to the API fairly easily by providing a
> setselfattr() LSM callback.
>
> Thoughts?

I think we want to be able to unshare a specific security module
namespace without unsharing the others, i.e. just SELinux or just
AppArmor.
Not sure if your suggestion above supports that already but wanted to note it.

Regardless, I have no objections to any system call or flag that can
be used to unshare the SELinux namespace and it should be trivial to
wire it up to the existing underlying function.

Serge pointed out that we also will need an API to attach to an
existing SELinux namespace, which I captured here:
https://github.com/stephensmalley/selinuxns/issues/19
This is handled for other Linux namespaces by opening a pseudo file
under /proc/pid/ns and invoking setns(2), so not sure how we want to
do it.
* Re: LSM namespacing API
From: Paul Moore @ 2025-08-19 18:51 UTC
To: Stephen Smalley; +Cc: linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> I think we want to be able to unshare a specific security module
> namespace without unsharing the others, i.e. just SELinux or just
> AppArmor.
> Not sure if your suggestion above supports that already but wanted to note it.

The lsm_set_self_attr(2) approach allows for LSM specific unshare
operations. Take the existing LSM_ATTR_EXEC attribute as an example,
two LSMs have implemented support (AppArmor and SELinux), and
userspace can independently set the attribute as desired for each LSM.

> Serge pointed out that we also will need an API to attach to an
> existing SELinux namespace, which I captured here:
> https://github.com/stephensmalley/selinuxns/issues/19
> This is handled for other Linux namespaces by opening a pseudo file
> under /proc/pid/ns and invoking setns(2), so not sure how we want to
> do it.

One option would be to have the LSM framework return a LSM namespace
"handle" for a given LSM using lsm_get_self_attr(2) and then do a
setns(2)-esque operation using lsm_set_self_attr(2) with that
"handle". We would need to figure out what would constitute a
"handle" but let's just mark that as TBD for now with this approach (I
think better options are available).

Since we have an existing LSM namespace combination, with processes
running inside of it, it might be sufficient to simply support moving
into an existing LSM namespace set with setns(2) using only a pidfd
and a new CLONE_LSMNS flag (or similar, upstream might want this as
CLONE_NEWLSM).
This would simply set the LSM namespace set for the setns(2) caller
to match that of the target pidfd. We still wouldn't want to support
CLONE_LSMNS/CLONE_NEWLSM for clone*().

Any other ideas?

--
paul-moore.com
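Paul's pidfd-based idea treats the combination of per-LSM namespaces as one unit: setns(2) on a pidfd (which setns(2) already accepts for the existing namespace types since Linux 5.8) with a new flag would replace the caller's entire LSM namespace set with the target's. The sketch below models only that replacement step, assuming reference-counted per-LSM namespace objects; CLONE_NEWLSM does not exist, and struct lsm_ns, struct lsm_ns_set, NR_NS_LSMS and lsm_setns_from() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NS_LSMS 3  /* hypothetical: LSMs that implement a namespace */

/* Hypothetical reference-counted namespace object for one LSM. */
struct lsm_ns {
    int refcount;
    int id;
};

/* Hypothetical "LSM namespace set": one slot per namespaced LSM. */
struct lsm_ns_set {
    struct lsm_ns *ns[NR_NS_LSMS];  /* NULL: LSM not namespaced/not active */
};

static void lsm_ns_get(struct lsm_ns *ns) { if (ns) ns->refcount++; }
static void lsm_ns_put(struct lsm_ns *ns) { if (ns) ns->refcount--; }

/*
 * Model of setns(pidfd, <hypothetical CLONE_NEWLSM>): the caller adopts the
 * target's whole set. References on the new namespaces are taken before the
 * old ones are dropped, so the operation is safe even if caller == target.
 */
void lsm_setns_from(struct lsm_ns_set *caller, const struct lsm_ns_set *target)
{
    struct lsm_ns *old[NR_NS_LSMS];
    size_t i;

    assert(caller != NULL && target != NULL);
    for (i = 0; i < NR_NS_LSMS; i++) {
        old[i] = caller->ns[i];
        lsm_ns_get(target->ns[i]);
    }
    for (i = 0; i < NR_NS_LSMS; i++) {
        caller->ns[i] = target->ns[i];
        lsm_ns_put(old[i]);
    }
}
```

Because the whole set moves at once, the caller cannot end up in, say, the target's SELinux namespace but its own AppArmor namespace; a real implementation would also free a namespace when its count reaches zero, which this model omits.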
* Re: LSM namespacing API
From: Paul Moore @ 2025-08-19 18:52 UTC
To: Stephen Smalley; +Cc: linux-security-module, selinux, John Johansen

On Tue, Aug 19, 2025 at 2:51 PM Paul Moore <paul@paul-moore.com> wrote:
> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> >
> > I think we want to be able to unshare a specific security module
> > namespace without unsharing the others, i.e. just SELinux or just
> > AppArmor.
> > Not sure if your suggestion above supports that already but wanted to note it.
>
> The lsm_set_self_attr(2) approach allows for LSM specific unshare
> operations. Take the existing LSM_ATTR_EXEC attribute as an example,
> two LSMs have implemented support (AppArmor and SELinux), and
> userspace can independently set the attribute as desired for each LSM.

I should add, for those who didn't follow the lsm_set_self_attr(2)
development, that if you want to set the same attribute on multiple
LSMs, you must make multiple calls to lsm_set_self_attr(2) (think of
error handling/conditions).

--
paul-moore.com
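Paul's point about making one lsm_set_self_attr(2) call per LSM could look roughly like the following from userspace. The struct below mirrors the field layout of struct lsm_ctx from <linux/lsm.h>, and the function-pointer type has the same parameter shape as lsm_set_self_attr(2) (attr, ctx, size, flags); the attribute value would be the proposed LSM_ATTR_UNSHARE, which does not exist in any released kernel. A stand-in for the syscall is injected so the sketch runs anywhere; unshare_each(), fake_set_self_attr() and fake_supported_id are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same field layout as struct lsm_ctx in <linux/lsm.h>. */
struct lsm_ctx_model {
    uint64_t id;       /* LSM_ID_* value of the target LSM */
    uint64_t flags;    /* LSM-specific flags */
    uint64_t len;      /* size of this record, including ctx[] */
    uint64_t ctx_len;  /* size of ctx[] */
    uint8_t  ctx[];    /* optional LSM-specific payload */
};

/* Same shape as lsm_set_self_attr(2); injected so the sketch can run. */
typedef int (*set_self_attr_fn)(unsigned int attr, struct lsm_ctx_model *ctx,
                                uint32_t size, uint32_t flags);

/*
 * Issue one request per LSM so each failure is reported individually.
 * payloads may be NULL, and individual entries may be NULL (no payload).
 * Returns how many LSMs accepted the request; results[i] holds each return.
 */
size_t unshare_each(set_self_attr_fn set_attr, unsigned int attr,
                    const uint64_t *lsm_ids, const char *const *payloads,
                    int *results, size_t n)
{
    size_t accepted = 0;

    assert(set_attr != NULL && (n == 0 || (lsm_ids && results)));
    for (size_t i = 0; i < n; i++) {
        const char *p = payloads ? payloads[i] : NULL;
        size_t plen = p ? strlen(p) : 0;
        size_t size = sizeof(struct lsm_ctx_model) + plen;
        struct lsm_ctx_model *ctx = calloc(1, size);

        if (!ctx) {
            results[i] = -ENOMEM;
            continue;
        }
        ctx->id = lsm_ids[i];
        ctx->len = size;
        ctx->ctx_len = plen;
        if (plen)
            memcpy(ctx->ctx, p, plen);
        results[i] = set_attr(attr, ctx, (uint32_t)size, 0);
        if (results[i] == 0)
            accepted++;
        free(ctx);
    }
    return accepted;
}

/* Stand-in for the syscall: only LSM id fake_supported_id implements attr. */
static uint64_t fake_supported_id;

static int fake_set_self_attr(unsigned int attr, struct lsm_ctx_model *ctx,
                              uint32_t size, uint32_t flags)
{
    (void)attr;
    (void)flags;
    if (size < sizeof(*ctx) || ctx->len != size ||
        ctx->len != sizeof(*ctx) + ctx->ctx_len)
        return -EINVAL;
    return ctx->id == fake_supported_id ? 0 : -EOPNOTSUPP;
}
```

Keeping the calls separate means a caller can see that one LSM accepted the request while another rejected it, and decide for itself whether to proceed, which is the error-handling concern Paul raises.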
* Re: LSM namespacing API
From: Mickaël Salaün @ 2025-08-20 14:44 UTC
To: Paul Moore
Cc: Stephen Smalley, linux-security-module, selinux, John Johansen, Maxime Bélair

On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> >
> > I think we want to be able to unshare a specific security module
> > namespace without unsharing the others, i.e. just SELinux or just
> > AppArmor.
> > Not sure if your suggestion above supports that already but wanted to note it.
>
> The lsm_set_self_attr(2) approach allows for LSM specific unshare
> operations. Take the existing LSM_ATTR_EXEC attribute as an example,
> two LSMs have implemented support (AppArmor and SELinux), and
> userspace can independently set the attribute as desired for each LSM.
>
> > Serge pointed out that we also will need an API to attach to an
> > existing SELinux namespace, which I captured here:
> > https://github.com/stephensmalley/selinuxns/issues/19
> > This is handled for other Linux namespaces by opening a pseudo file
> > under /proc/pid/ns and invoking setns(2), so not sure how we want to
> > do it.
>
> One option would be to have the LSM framework return a LSM namespace
> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
> setns(2)-esque operation using lsm_set_self_attr(2) with that
> "handle". We would need to figure out what would constitute a
> "handle" but let's just mark that as TBD for now with this approach (I
> think better options are available).
>
> Since we have an existing LSM namespace combination, with processes
> running inside of it, it might be sufficient to simply support moving
> into an existing LSM namespace set with setns(2) using only a pidfd
> and a new CLONE_LSMNS flag (or similar, upstream might want this as
> CLONE_NEWLSM). This would simply set the LSM namespace set for the

Bike shedding, but I would prefer CLONE_NEWSEC or something without LSM
because the goal is not to add a new LSM but a new "security" namespace.
To fit with existing capabilities that could be reused by such security
namespace (CAP_MAC_ADMIN), CLONE_NEWMAC is another option. I know that
LSMs may not enforce MAC, but I think "LSM" would be confusing for
users.

> setns(2) caller to match that of the target pidfd. We still wouldn't
> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().

Why would making clone*() support this flag be an issue?

>
> Any other ideas?

The goal of a namespace is to configure absolute references (e.g. file
path, network address, PID, time). I think it would make sense to have
an LSM/MAC/SEC namespace that would enforce a consistent access control
on every process in this namespace. A related namespace file
descriptor could then be used with an LSM-specific syscall to configure
the policy related to a specific namespace (instead of only the current
namespace), see
https://lore.kernel.org/r/20250820.Ao3iquoshaiB@digikod.net

That would enable us to build a context before running untrusted code
in it, and to update the related security policy without requiring a
trusted (and exposed) process in each namespace.

I guess a security namespace would not be exclusive to an LSM but could
be shared, right? If yes, then it's OK to only have one new security
namespace instead of one per LSM. ;)
* Re: LSM namespacing API
From: Casey Schaufler @ 2025-08-20 15:37 UTC
To: Mickaël Salaün, Paul Moore
Cc: Stephen Smalley, linux-security-module, selinux, John Johansen, Maxime Bélair, Casey Schaufler

On 8/20/2025 7:44 AM, Mickaël Salaün wrote:
> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
>> <stephen.smalley.work@gmail.com> wrote:
>>> I think we want to be able to unshare a specific security module
>>> namespace without unsharing the others, i.e. just SELinux or just
>>> AppArmor.
>>> Not sure if your suggestion above supports that already but wanted to note it.
>> The lsm_set_self_attr(2) approach allows for LSM specific unshare
>> operations. Take the existing LSM_ATTR_EXEC attribute as an example,
>> two LSMs have implemented support (AppArmor and SELinux), and
>> userspace can independently set the attribute as desired for each LSM.
>>
>>> Serge pointed out that we also will need an API to attach to an
>>> existing SELinux namespace, which I captured here:
>>> https://github.com/stephensmalley/selinuxns/issues/19
>>> This is handled for other Linux namespaces by opening a pseudo file
>>> under /proc/pid/ns and invoking setns(2), so not sure how we want to
>>> do it.
>> One option would be to have the LSM framework return a LSM namespace
>> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
>> setns(2)-esque operation using lsm_set_self_attr(2) with that
>> "handle". We would need to figure out what would constitute a
>> "handle" but let's just mark that as TBD for now with this approach (I
>> think better options are available).
>>
>> Since we have an existing LSM namespace combination, with processes
>> running inside of it, it might be sufficient to simply support moving
>> into an existing LSM namespace set with setns(2) using only a pidfd
>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>> CLONE_NEWLSM). This would simply set the LSM namespace set for the
> Bike shedding, but I would prefer CLONE_NEWSEC or something without LSM
> because the goal is not to add a new LSM but a new "security" namespace.
> To fit with existing capabilities that could be reused by such security
> namespace (CAP_MAC_ADMIN), CLONE_NEWMAC is another option. I know that
> LSMs may not enforce MAC, but I think "LSM" would be confusing for
> users.

I disagree. Using MAC in the name is bad because many LSMs don't do MAC.
Using SEC is even worse, because no two "users" define "security" the
same way, and most of what implements security in Linux is outside of
LSMs. Since this feature would be limited to use by LSMs, it makes sense
that LSM be in the name.
* Re: LSM namespacing API
From: Paul Moore @ 2025-08-20 20:47 UTC
To: Mickaël Salaün
Cc: Stephen Smalley, linux-security-module, selinux, John Johansen, Maxime Bélair

On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > <stephen.smalley.work@gmail.com> wrote:

...

> > Since we have an existing LSM namespace combination, with processes
> > running inside of it, it might be sufficient to simply support moving
> > into an existing LSM namespace set with setns(2) using only a pidfd
> > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > CLONE_NEWLSM). This would simply set the LSM namespace set for the
>
> Bike shedding, but I would prefer CLONE_NEWSEC or something without LSM
> because the goal is not to add a new LSM but a new "security" namespace.

I disagree with your statement about the goal. In fact I would argue
that one of the goals is to explicitly *not* create a generic
"security" namespace. Defining a single LSM-wide namespace is
already an almost impossible task; extending it to become a generic
"security" namespace seems maddening.

> > setns(2) caller to match that of the target pidfd. We still wouldn't
> > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>
> Why would making clone*() support this flag be an issue?

With the understanding that I'm not going to support a single LSM-wide
namespace (see my previous comments), we would need multiple flags for
clone*(), one for each LSM that wanted to implement a namespace.
While clone3() has expanded the number of flag bits from clone(),
there is still a limit of 64 bits, and I'm fairly certain the
other kernel devs are not going to be supportive of a flag for each
LSM that wants one.

Maybe we could argue for our own u64 in cl_args, or create our own
lsm_clone(2) syscall that mimics clone3(2) with better LSM support,
but neither of these seem like great ideas at the moment.

> > Any other ideas?
>
> The goal of a namespace is to configure absolute references (e.g. file
> path, network address, PID, time). I think it would make sense to have
> an LSM/MAC/SEC namespace that would enforce a consistent access control
> on every process in this namespace.

Once again, I'm not going to support the idea of a namespace at the
LSM framework layer; individual LSMs are better suited to implementing
their own namespacing concepts. However, I do support the LSM
framework providing an API and/or helpers to help make it easier for
individual LSMs and userspace to create/manage individual LSM
namespaces.

> A related namespace file
> descriptor could then be used with an LSM-specific syscall to configure
> the policy related to a specific namespace (instead of only the current
> namespace)

That is a reasonable request, and I think the same underlying solution
that we would use for setns(2) could also be used here.

--
paul-moore.com
* Re: LSM namespacing API
From: Mickaël Salaün @ 2025-08-21 9:56 UTC
To: Paul Moore
Cc: Stephen Smalley, linux-security-module, selinux, John Johansen, Maxime Bélair

On Wed, Aug 20, 2025 at 04:47:15PM -0400, Paul Moore wrote:
> On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote:
> > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
> > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
> > > <stephen.smalley.work@gmail.com> wrote:
>
> ...
>
> > > Since we have an existing LSM namespace combination, with processes
> > > running inside of it, it might be sufficient to simply support moving
> > > into an existing LSM namespace set with setns(2) using only a pidfd
> > > and a new CLONE_LSMNS flag (or similar, upstream might want this as
> > > CLONE_NEWLSM). This would simply set the LSM namespace set for the
> >
> > Bike shedding, but I would prefer CLONE_NEWSEC or something without LSM
> > because the goal is not to add a new LSM but a new "security" namespace.
>
> I disagree with your statement about the goal. In fact I would argue
> that one of the goals is to explicitly *not* create a generic
> "security" namespace. Defining a single LSM-wide namespace is
> already an almost impossible task; extending it to become a generic
> "security" namespace seems maddening.

I didn't suggest a generic "security" namespace that would include
non-LSM access checks, just using the name "security" instead of "LSM",
but never mind.

>
> > > setns(2) caller to match that of the target pidfd. We still wouldn't
> > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
> >
> > Why would making clone*() support this flag be an issue?
>
> With the understanding that I'm not going to support a single LSM-wide
> namespace (see my previous comments), we would need multiple flags for

I'm confused about the goal of this thread... When I read "namespace" I
think about the user space interface that makes it possible to tie a set
of processes to ambient kernel objects. I'm not suggesting forcing all
LSMs to handle namespaces, but having a unified user space interface
(i.e. namespace flag, file descriptor...) that can be used by user space
to request a new "context" that may or may not be used by running LSMs.

> clone*(), one for each LSM that wanted to implement a namespace.

My understanding of this proposal was to create an LSM-wide namespace,
and one of the reasons was to avoid one namespace per LSM. As I
explained in my previous email, I think it would make sense and could
be convincing.

> While clone3() has expanded the number of flag bits from clone(),
> there is still a limit of 64 bits, and I'm fairly certain the
> other kernel devs are not going to be supportive of a flag for each
> LSM that wants one.
>
> Maybe we could argue for our own u64 in cl_args, or create our own
> lsm_clone(2) syscall that mimics clone3(2) with better LSM support,
> but neither of these seem like great ideas at the moment.

My idea was that using CLONE_NEWLSM would just fork the current/initial
namespace used by LSMs to tie security policies/configurations to
processes, but as John already said, it would be the responsibility of
each LSM to either inherit and keep in sync the parent policy (e.g.
SELinux) or start with a blank/default one (e.g. Yama).

One way to configure a newly created namespace could be to load a
configuration in the parent namespace (e.g. with one of the new LSM
config syscalls and a dedicated flag) that would only be applied to
child namespaces when they are created, similarly to attr/exec for
execve(2). I think this is what you meant with the LSM_UNSHARE flag,
right?

>
> > > Any other ideas?
> > > > The goal of a namespace is to configure absolute references (e.g. file > > path, network address, PID, time). I think it would make sense to have > > an LSM/MAC/SEC namespace that would enforce a consistent access control > > on every processes in this namespace. > > Once again, I'm not going to support the idea of a namespace at the > LSM framework layer, individual LSMs are better suited to implementing > their own namespacing concepts. However, I do support the LSM > framework providing an API and/or helpers to help make it easier for > individual LSMs and userspace to create/manage individual LSM > namespaces. Should we still talk about "namespace" or use another name? > > > A related namespace file > > descriptor could then be used with an LSM-specific syscall to configure > > the policy related to a specific namespace (instead of only the current > > namespace) > > That is a reasonable request, and I think the same underlying solution > that we would use for setns(2) could also be used here. I'm not sure having a set of namespace file descriptors without related clone flags would be acceptable, at least for what we currently call Linux "namespace". ^ permalink raw reply [flat|nested] 43+ messages in thread
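To make Mickaël's suggestion concrete, here is a small Python model of a CLONE_NEWLSM-style fork in which each LSM decides how its child namespace starts. It is only a sketch under stated assumptions: `LsmNamespace`, `STRATEGY`, `stage_child_config()` and `unshare_lsm()` are hypothetical names, not kernel interfaces; the inherit/blank split simply mirrors the SELinux and Yama examples in the message; and "keep in sync" is reduced to a one-time copy.

```python
# Hypothetical model, not a kernel API: LsmNamespace, STRATEGY,
# stage_child_config() and unshare_lsm() are illustrative names only.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LsmNamespace:
    lsm: str
    policy: dict
    parent: Optional["LsmNamespace"] = None
    # Config staged here that only applies to namespaces created from
    # this one, similar in spirit to attr/exec applying at execve(2).
    pending_child_config: dict = field(default_factory=dict)


# Assumed per-LSM choice on unshare: copy the parent's policy
# (SELinux-like) or start from an empty/default one (Yama-like).
STRATEGY = {"selinux": "inherit", "yama": "blank"}


def stage_child_config(ns: LsmNamespace, config: dict) -> None:
    """Stage configuration in the parent for the next child namespace."""
    ns.pending_child_config.update(config)


def unshare_lsm(current: Dict[str, LsmNamespace]) -> Dict[str, LsmNamespace]:
    """Fork every LSM's namespace; each LSM decides how its child starts."""
    children = {}
    for name, ns in current.items():
        if STRATEGY.get(name) == "inherit":
            policy = dict(ns.policy)    # one-time copy, no ongoing sync
        else:
            policy = {}
        # Apply, then consume, anything staged in the parent.
        policy.update(ns.pending_child_config)
        ns.pending_child_config.clear()
        children[name] = LsmNamespace(name, policy, parent=ns)
    return children


if __name__ == "__main__":
    init = {
        "selinux": LsmNamespace("selinux", {"domain": "container_t"}),
        "yama": LsmNamespace("yama", {"ptrace_scope": 1}),
    }
    stage_child_config(init["yama"], {"ptrace_scope": 2})
    child = unshare_lsm(init)
    print(child["selinux"].policy, child["yama"].policy)
```

Staging the child configuration in the parent leaves the running namespace untouched until a child is actually created, which is the attr/exec analogy Mickaël draws.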
* Re: LSM namespacing API 2025-08-21 9:56 ` Mickaël Salaün @ 2025-08-21 14:18 ` John Johansen 2025-08-22 2:09 ` Paul Moore 1 sibling, 0 replies; 43+ messages in thread From: John Johansen @ 2025-08-21 14:18 UTC (permalink / raw) To: Mickaël Salaün, Paul Moore Cc: Stephen Smalley, linux-security-module, selinux, Maxime Bélair On 8/21/25 02:56, Mickaël Salaün wrote: > On Wed, Aug 20, 2025 at 04:47:15PM -0400, Paul Moore wrote: >> On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote: >>> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: >>>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley >>>> <stephen.smalley.work@gmail.com> wrote: >> >> ... >> >>>> Since we have an existing LSM namespace combination, with processes >>>> running inside of it, it might be sufficient to simply support moving >>>> into an existing LSM namespace set with setns(2) using only a pidfd >>>> and a new CLONE_LSMNS flag (or similar, upstream might want this as >>>> CLONE_NEWLSM). This would simply set the LSM namespace set for the >>> >>> Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM >>> because the goal is not to add a new LSM but a new "security" namespace. >> >> I disagree with your statement about the goal. In fact I would argue >> that one of the goals is to explicitly *not* create a generic >> "security" namespace. Defining a single, LSM-wide namespace, is >> already an almost impossible task, extending it to become a generic >> "security" namespace seems maddening. > > I didn't suggest a generic "security" namespace that would include > non-LSM access checks, just using the name "security" instead of "LSM", > but never mind. > >> >>>> setns(2) caller to match that of the target pidfd. We still wouldn't >>>> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*(). >>> >>> Why making clone*() support this flag would be an issue? 
>>
>> With the understanding that I'm not going to support a single LSM-wide
>> namespace (see my previous comments), we would need multiple flags for
>
> I'm confused about the goal of this thread... When I read namespace I
> think about the user space interface that enables to tie a set of
> processes to ambient kernel objects. I'm not suggesting to force all
> LSM to handle namespaces, but to have a unified user space interface
> (i.e. namespace flag, file descriptor...) that can be used by user space
> to request a new "context" that may or may not be used by running LSMs.
>

Yes to a unified interface, no to an LSM-wide namespace. The interface
could ask each LSM to namespace, but it's up to the LSM what it will do:
whether it creates a namespace at all and, if it does, whether that
namespace is hierarchical or flat. At the end of the call you would
likely get a proxy object to a set of individual LSM namespace contexts.
Not that different from having a set of different system namespaces:
mount, pid, user, ...

>> clone*(), one for each LSM that wanted to implement a namespace.
>
> My understanding of this proposal was to create a LSM-wide namespace,
> and one of the reason was to avoid one namespace per LSM. As I

No, each LSM will do its own thing wrt namespacing. The proposal is just
to provide a common API and minimal infra around it.

> explained in my previous email, I think it would make sense and could be
> convincing.
>

I have to agree with Paul that we won't generically agree on what an LSM
namespace should be.

>> While clone3() has expanded the number of flag bits from clone(),
>> there is still a limitation of 64-bits and I'm fairly certain the
>> other kernel devs are not going to be supportive of a flag for each
>> LSM that wants one.
>>
>> Maybe we could argue for our own u64 in cl_args, or create our own
>> lsm_clone(2) syscall that mimics clone3(2) with better LSM support,
>> but neither of these seem like great ideas at the moment.
>
> My idea was that using CLONE_NEWLSM would just fork the current/initial
> namespace used by LSMs to tie security policies/configurations to
> processes, but as John already said, it would be the responsibility of
> each LSM to either inherit and keep in sync the parent policy (e.g.
> SELinux) or start with a blank/default one (e.g. Yama).
>

It's not just these options, though. The container manager may want to
"drop/add" an LSM. E.g. a fedora/RH host booting an Ubuntu container:
your host has selinux, the container wants apparmor. In reality you have
both selinux and apparmor active on the system, but selinux is in an
enforcing state and apparmor is in a no-policy state.

selinux could deny creating the namespace, it could return its current
state, or it could mask itself by creating a namespace for the container
with the default unconfined_t policy; but its current state is still
there bounding the container, the container just doesn't see it.

On the AppArmor side, at the request for a new namespace with apparmor,
it needs to decide what to do independently of what selinux does. Yes,
if configured correctly it should set up its policy namespace for the
container, but it has choices just like selinux that are driven by
policy as well as by the userspace request for a specific combination of
LSMs for the container.

> One way to configure a newly created namespace could be to load a
> configuration in the parent namespace (e.g. with one of the new LSM
> config syscall and a dedicated flag) that would only be applied to child
> namespaces when they are created, similarly to attr/exec for execve(2).

Host injecting policy into the container certainly could be supported,
but I think that would be a per-LSM thing. The attr/exec flags Paul was
discussing (correct me if I am wrong) were a way to specify which LSMs
should be part of the unshare. So, the whole "I want a container to
support Ubuntu or RH and need these LSMs" case.

> I think this is what you meant with the LSM_UNSHARE flag, right?
>

Per my understanding above, the LSM_UNSHARE flag is then just a
namespacing flag indicating that you want to unshare the LSM and use the
aforementioned attrs. I don't think it is actually needed, but it may be
desirable for consistency. If you have already set the above attrs, that
already indicates what you want to do with the namespace at
clone/unshare.

This then gets fed into every LSM (whether in the attrs or not), so that
each can make a policy decision against its current policy and then, if
allowed, receive a second hook with the info so that it can set up and
return with its context in place. Not really all that different from
exec.

>>
>>>> Any other ideas?
>>>
>>> The goal of a namespace is to configure absolute references (e.g. file
>>> path, network address, PID, time). I think it would make sense to have
>>> an LSM/MAC/SEC namespace that would enforce a consistent access control
>>> on every processes in this namespace.
>>
>> Once again, I'm not going to support the idea of a namespace at the
>> LSM framework layer, individual LSMs are better suited to implementing
>> their own namespacing concepts. However, I do support the LSM
>> framework providing an API and/or helpers to help make it easier for
>> individual LSMs and userspace to create/manage individual LSM
>> namespaces.
>
> Should we still talk about "namespace" or use another name?
>

It's namespaces for LSMs, just not an LSM namespace.

>>
>>> A related namespace file
>>> descriptor could then be used with an LSM-specific syscall to configure
>>> the policy related to a specific namespace (instead of only the current
>>> namespace)
>>
>> That is a reasonable request, and I think the same underlying solution
>> that we would use for setns(2) could also be used here.
>
> I'm not sure having a set of namespace file descriptors without related
> clone flags would be acceptable, at least for what we currently call
> Linux "namespace".

Well, Paul did propose a single CLONE_LSMNS flag that would cover them ;-).
Agree with Paul that a per-LSM flag would be unlikely to be accepted, and
would just raise the whole "security is crazy, why can't you agree on
one" fun.
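John's two-hook flow (every LSM first makes a policy decision, then each sets up its context only if nobody refused) can be sketched as follows. This is an illustrative Python model, not kernel code: the `Lsm` class, `check_unshare()`, `setup_unshare()` and the "new"/"mask"/"deny" outcomes are assumptions chosen to mirror the selinux/apparmor example above.

```python
# Hypothetical model: check_unshare()/setup_unshare() stand in for the
# two LSM hooks described in the thread; they are not existing hooks.
from typing import Dict, Iterable, Set


class UnshareDenied(Exception):
    """Raised in phase 1 when an LSM refuses the request."""


class Lsm:
    def __init__(self, name: str, context: str, on_request: str):
        self.name = name
        self.context = context          # context the task has today
        # Assumed outcome if this LSM is named in the request:
        # "new" (fresh namespace), "mask" (default policy, still bounded
        # by the current state), or "deny".
        self.on_request = on_request

    def check_unshare(self, requested: Set[str]) -> None:
        """Phase 1: policy decision only, no state is changed."""
        if self.name in requested and self.on_request == "deny":
            raise UnshareDenied(self.name)

    def setup_unshare(self, requested: Set[str]) -> str:
        """Phase 2: return the context the task will run with."""
        if self.name not in requested:
            return self.context         # keep sharing the current one
        if self.on_request == "mask":
            return f"default (bounded by {self.context})"
        return f"{self.name}-new-ns"


def unshare(lsms: Iterable[Lsm], requested: Set[str]) -> Dict[str, str]:
    """Return the per-LSM 'proxy' view, or raise before any setup runs."""
    lsms = list(lsms)
    for lsm in lsms:                    # every LSM sees the request
        lsm.check_unshare(requested)
    return {lsm.name: lsm.setup_unshare(requested) for lsm in lsms}


if __name__ == "__main__":
    selinux = Lsm("selinux", "host_policy", on_request="mask")
    apparmor = Lsm("apparmor", "no_policy", on_request="new")
    print(unshare([selinux, apparmor], {"selinux", "apparmor"}))
```

Splitting the decision from the setup means a single denial leaves no LSM half-configured, and the returned dict plays the role of the proxy object John describes.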
* Re: LSM namespacing API 2025-08-21 9:56 ` Mickaël Salaün 2025-08-21 14:18 ` John Johansen @ 2025-08-22 2:09 ` Paul Moore 1 sibling, 0 replies; 43+ messages in thread From: Paul Moore @ 2025-08-22 2:09 UTC (permalink / raw) To: Mickaël Salaün Cc: Stephen Smalley, linux-security-module, selinux, John Johansen, Maxime Bélair On Thu, Aug 21, 2025 at 5:56 AM Mickaël Salaün <mic@digikod.net> wrote: > On Wed, Aug 20, 2025 at 04:47:15PM -0400, Paul Moore wrote: > > On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic@digikod.net> wrote: > > > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: > > > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley > > > > <stephen.smalley.work@gmail.com> wrote: > > > > ... > > > > > > Since we have an existing LSM namespace combination, with processes > > > > running inside of it, it might be sufficient to simply support moving > > > > into an existing LSM namespace set with setns(2) using only a pidfd > > > > and a new CLONE_LSMNS flag (or similar, upstream might want this as > > > > CLONE_NEWLSM). This would simply set the LSM namespace set for the > > > > > > Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM > > > because the goal is not to add a new LSM but a new "security" namespace. > > > > I disagree with your statement about the goal. In fact I would argue > > that one of the goals is to explicitly *not* create a generic > > "security" namespace. Defining a single, LSM-wide namespace, is > > already an almost impossible task, extending it to become a generic > > "security" namespace seems maddening. > > I didn't suggest a generic "security" namespace that would include > non-LSM access checks, just using the name "security" instead of "LSM", > but never mind. > > > > > setns(2) caller to match that of the target pidfd. We still wouldn't > > > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*(). > > > > > > Why making clone*() support this flag would be an issue? 
> > > > With the understanding that I'm not going to support a single LSM-wide > > namespace (see my previous comments), we would need multiple flags for > > I'm confused about the goal of this thread... When I read namespace I > think about the user space interface that enables to tie a set of > processes to ambient kernel objects. I'm not suggesting to force all > LSM to handle namespaces, but to have a unified user space interface > (i.e. namespace flag, file descriptor...) that can be used by user space > to request a new "context" that may or may not be used by running LSMs. The goal of this thread is to hopefully define a set of APIs that allow userspace to create new LSM namespace sets, and join existing LSM namespace sets. We're not necessarily focused on any individual LSM namespace concepts, beyond ensuring that the API provides enough flexibility for the different concepts to be implemented. > > clone*(), one for each LSM that wanted to implement a namespace. > > My understanding of this proposal was to create a LSM-wide namespace, > and one of the reason was to avoid one namespace per LSM. As I stated in my original email, perhaps not clearly enough, and several times in the past, I have no interest in supporting a single LSM-wide namespace at this point in time. Any LSM namespaces must be done at the individual LSM layer, although I am supportive of an API at the LSM framework layer to both help facilitate the individual LSM namespaces and provide a better userspace interface. -- paul-moore.com ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: LSM namespacing API 2025-08-19 18:51 ` Paul Moore 2025-08-19 18:52 ` Paul Moore 2025-08-20 14:44 ` Mickaël Salaün @ 2025-08-21 2:05 ` Serge E. Hallyn 2025-08-21 2:35 ` Paul Moore 2025-08-21 8:07 ` John Johansen 2 siblings, 2 replies; 43+ messages in thread From: Serge E. Hallyn @ 2025-08-21 2:05 UTC (permalink / raw) To: Paul Moore; +Cc: Stephen Smalley, linux-security-module, selinux, John Johansen On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley > <stephen.smalley.work@gmail.com> wrote: > > > > I think we want to be able to unshare a specific security module > > namespace without unsharing the others, i.e. just SELinux or just > > AppArmor. > > Not sure if your suggestion above supports that already but wanted to note it. > > The lsm_set_self_attr(2) approach allows for LSM specific unshare > operations. Take the existing LSM_ATTR_EXEC attribute as an example, > two LSMs have implemented support (AppArmor and SELinux), and > userspace can independently set the attribute as desired for each LSM. Overall I really like the idea. > > Serge pointed out that we also will need an API to attach to an > > existing SELinux namespace, which I captured here: > > https://github.com/stephensmalley/selinuxns/issues/19 > > This is handled for other Linux namespaces by opening a pseudo file > > under /proc/pid/ns and invoking setns(2), so not sure how we want to > > do it. > > One option would be to have a the LSM framework return a LSM namespace > "handle" for a given LSM using lsm_get_self_attr(2) and then do a > setns(2)-esque operation using lsm_set_self_attr(2) with that > "handle". We would need to figure out what would constitute a > "handle" but let's just mark that as TBD for now with this approach (I > think better options are available). 
The use case which would be complicated (not blocked) by this, is * a runtime creates a process p1 * p1 unshares its lsm namespace * runtime forks a debug/admin process p2 * p2 wants to enter p1's namespace Of course the runtime could work around it by, before relinquishing control of p1 to a new executable, returning the lsm_get_self_attr() data to over a pipe. Note I don't think we should support setting another task's namespace, only getting its namespace ID. > Since we have an existing LSM namespace combination, with processes > running inside of it, it might be sufficient to simply support moving > into an existing LSM namespace set with setns(2) using only a pidfd > and a new CLONE_LSMNS flag (or similar, upstream might want this as > CLONE_NEWLSM). This would simply set the LSM namespace set for the > setns(2) caller to match that of the target pidfd. We still wouldn't > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*(). A part of me is telling (another part of) me that being able to setns to a subset of the lsms could lead to privilege escapes through weird policy configurations for the various LSMs. In which case, an all-or-nothing LSM setns might actually be preferable. I haven't thought of a concrete example, though. > Any other ideas? > > -- > paul-moore.com ^ permalink raw reply [flat|nested] 43+ messages in thread
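Serge's workaround, where p1 hands its namespace identifier to the runtime before exec so that p2 can join later, might look roughly like this. Only the pipe is real (`os.pipe()`); `get_ns_handle()`, `join_ns_handle()` and the integer handle are hypothetical stand-ins for the still-undefined "handle" that lsm_get_self_attr(2) might return.

```python
# Hypothetical model of the workaround: get_ns_handle()/join_ns_handle()
# stand in for a TBD handle from lsm_get_self_attr(2); not real APIs.
import itertools
import json
import os
from typing import Dict

_ids = itertools.count(1)
_HANDLES: Dict[int, Dict[str, str]] = {}   # handle -> LSM namespace set


class Task:
    def __init__(self, ns_set: Dict[str, str]):
        self.ns_set = dict(ns_set)

    def unshare(self, lsm: str) -> None:
        self.ns_set[lsm] = f"{lsm}-ns{next(_ids)}"


def get_ns_handle(task: Task) -> int:
    """Getting an ID for a task's current set is allowed."""
    handle = next(_ids)
    _HANDLES[handle] = dict(task.ns_set)
    return handle


def join_ns_handle(task: Task, handle: int) -> None:
    """A task only ever replaces its *own* set, never another task's."""
    task.ns_set = dict(_HANDLES[handle])


def send_handle(fd: int, handle: int) -> None:
    os.write(fd, json.dumps({"lsm_ns_handle": handle}).encode())


def recv_handle(fd: int) -> int:
    return json.loads(os.read(fd, 4096))["lsm_ns_handle"]


if __name__ == "__main__":
    host = {"selinux": "init", "apparmor": "init"}
    p1 = Task(host)
    p1.unshare("selinux")
    r, w = os.pipe()
    send_handle(w, get_ns_handle(p1))   # done by p1 before it execs
    os.close(w)
    p2 = Task(host)                     # the debug/admin process
    join_ns_handle(p2, recv_handle(r))
    os.close(r)
    print(p2.ns_set == p1.ns_set)
```

Note that the model keeps Serge's restriction: nothing here lets one task set another task's namespaces, only read an ID and join it itself.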
* Re: LSM namespacing API 2025-08-21 2:05 ` Serge E. Hallyn @ 2025-08-21 2:35 ` Paul Moore 2025-08-21 3:02 ` Serge E. Hallyn 2025-08-21 8:12 ` John Johansen 2025-08-21 8:07 ` John Johansen 1 sibling, 2 replies; 43+ messages in thread From: Paul Moore @ 2025-08-21 2:35 UTC (permalink / raw) To: Serge E. Hallyn Cc: Stephen Smalley, linux-security-module, selinux, John Johansen On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote: > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley > > <stephen.smalley.work@gmail.com> wrote: ... > > > Serge pointed out that we also will need an API to attach to an > > > existing SELinux namespace, which I captured here: > > > https://github.com/stephensmalley/selinuxns/issues/19 > > > This is handled for other Linux namespaces by opening a pseudo file > > > under /proc/pid/ns and invoking setns(2), so not sure how we want to > > > do it. > > > > One option would be to have a the LSM framework return a LSM namespace > > "handle" for a given LSM using lsm_get_self_attr(2) and then do a > > setns(2)-esque operation using lsm_set_self_attr(2) with that > > "handle". We would need to figure out what would constitute a > > "handle" but let's just mark that as TBD for now with this approach (I > > think better options are available). > > The use case which would be complicated (not blocked) by this, is > > * a runtime creates a process p1 > * p1 unshares its lsm namespace > * runtime forks a debug/admin process p2 > * p2 wants to enter p1's namespace > > Of course the runtime could work around it by, before relinquishing > control of p1 to a new executable, returning the lsm_get_self_attr() > data to over a pipe. > > Note I don't think we should support setting another task's namespace, > only getting its namespace ID. 
> > > Since we have an existing LSM namespace combination, with processes > > running inside of it, it might be sufficient to simply support moving > > into an existing LSM namespace set with setns(2) using only a pidfd > > and a new CLONE_LSMNS flag (or similar, upstream might want this as > > CLONE_NEWLSM). This would simply set the LSM namespace set for the > > setns(2) caller to match that of the target pidfd. We still wouldn't > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*(). > > A part of me is telling (another part of) me that being able to setns > to a subset of the lsms could lead to privilege escapes through > weird policy configurations for the various LSMs. In which case, > an all-or-nothing LSM setns might actually be preferable. Sorry I probably wasn't as clear as I should have been, but my idea with using the existing procfs/setns(2) approach with a single CLONE_NEWLSM (name pending sufficient bikeshedding) was that the process being setns()'d would simply end up in the exact copy of the target process' LSM namespace configuration, it shouldn't be a new set/subset/configuration ... and I would expect us to have controls around that such that LSMs could enforce policy on a setns(2) operation that involved their LSM. -- paul-moore.com ^ permalink raw reply [flat|nested] 43+ messages in thread
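A sketch of the all-or-nothing semantics Paul describes: the caller receives an exact copy of the target's set, and only LSMs whose namespace would actually change get a chance to veto. `setns_lsm()` and the hook mapping are hypothetical; in the kernel the target would be identified by a pidfd rather than passed in as a dict.

```python
# Hypothetical model of setns(2) with a single CLONE_LSMNS-style flag;
# setns_lsm() and the hook signature are illustrative, not kernel APIs.
from typing import Callable, Dict, Mapping

NsSet = Dict[str, str]                  # lsm name -> namespace id
Hook = Callable[[NsSet, NsSet], bool]   # (caller, target) -> allowed?


def setns_lsm(caller: NsSet, target: NsSet,
              hooks: Mapping[str, Hook]) -> NsSet:
    """Return the caller's new set: an exact copy of the target's.

    There is deliberately no 'subset' parameter.
    """
    changed = [lsm for lsm in target if caller.get(lsm) != target[lsm]]
    for lsm in changed:                 # only LSMs involved in the move
        hook = hooks.get(lsm)
        if hook is not None and not hook(caller, target):
            # Nothing has been applied yet: all-or-nothing.
            raise PermissionError(f"{lsm} denied setns")
    return dict(target)


if __name__ == "__main__":
    caller = {"selinux": "init", "apparmor": "init"}
    target = {"selinux": "ns1", "apparmor": "init"}
    print(setns_lsm(caller, target, {}))
```

Leaving out any subset parameter is meant to address Serge's concern about ending up in a mix of namespaces from different LSM configurations.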
* Re: LSM namespacing API 2025-08-21 2:35 ` Paul Moore @ 2025-08-21 3:02 ` Serge E. Hallyn 2025-08-22 1:50 ` Paul Moore 2025-08-21 8:12 ` John Johansen 1 sibling, 1 reply; 43+ messages in thread From: Serge E. Hallyn @ 2025-08-21 3:02 UTC (permalink / raw) To: Paul Moore Cc: Serge E. Hallyn, Stephen Smalley, linux-security-module, selinux, John Johansen On Wed, Aug 20, 2025 at 10:35:42PM -0400, Paul Moore wrote: > On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote: > > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: > > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley > > > <stephen.smalley.work@gmail.com> wrote: > > ... > > > > > Serge pointed out that we also will need an API to attach to an > > > > existing SELinux namespace, which I captured here: > > > > https://github.com/stephensmalley/selinuxns/issues/19 > > > > This is handled for other Linux namespaces by opening a pseudo file > > > > under /proc/pid/ns and invoking setns(2), so not sure how we want to > > > > do it. > > > > > > One option would be to have a the LSM framework return a LSM namespace > > > "handle" for a given LSM using lsm_get_self_attr(2) and then do a > > > setns(2)-esque operation using lsm_set_self_attr(2) with that > > > "handle". We would need to figure out what would constitute a > > > "handle" but let's just mark that as TBD for now with this approach (I > > > think better options are available). > > > > The use case which would be complicated (not blocked) by this, is > > > > * a runtime creates a process p1 > > * p1 unshares its lsm namespace > > * runtime forks a debug/admin process p2 > > * p2 wants to enter p1's namespace > > > > Of course the runtime could work around it by, before relinquishing > > control of p1 to a new executable, returning the lsm_get_self_attr() > > data to over a pipe. > > > > Note I don't think we should support setting another task's namespace, > > only getting its namespace ID. 
> > > > > Since we have an existing LSM namespace combination, with processes > > > running inside of it, it might be sufficient to simply support moving > > > into an existing LSM namespace set with setns(2) using only a pidfd > > > and a new CLONE_LSMNS flag (or similar, upstream might want this as > > > CLONE_NEWLSM). This would simply set the LSM namespace set for the > > > setns(2) caller to match that of the target pidfd. We still wouldn't > > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*(). > > > > A part of me is telling (another part of) me that being able to setns > > to a subset of the lsms could lead to privilege escapes through > > weird policy configurations for the various LSMs. In which case, > > an all-or-nothing LSM setns might actually be preferable. > > Sorry I probably wasn't as clear as I should have been, but my idea > with using the existing procfs/setns(2) approach with a single > CLONE_NEWLSM (name pending sufficient bikeshedding) was that the > process being setns()'d would simply end up in the exact copy of the > target process' LSM namespace configuration, it shouldn't be a new Oh, I think I was being unclear - I thought the first option, using lsm_set_self_attr(), would allow choosing a subset of LSMs to setns to. In contrast, the pure setns with a single flag is less flexible, but possibly safer. So I typed there the result of my train of thought, which is that your second suggestion is probably preferable. > set/subset/configuration ... and I would expect us to have controls > around that such that LSMs could enforce policy on a setns(2) > operation that involved their LSM. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: LSM namespacing API 2025-08-21 3:02 ` Serge E. Hallyn @ 2025-08-22 1:50 ` Paul Moore 0 siblings, 0 replies; 43+ messages in thread From: Paul Moore @ 2025-08-22 1:50 UTC (permalink / raw) To: Serge E. Hallyn Cc: Stephen Smalley, linux-security-module, selinux, John Johansen On Wed, Aug 20, 2025 at 11:02 PM Serge E. Hallyn <serge@hallyn.com> wrote: > On Wed, Aug 20, 2025 at 10:35:42PM -0400, Paul Moore wrote: > > On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote: > > > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: > > > > On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley > > > > <stephen.smalley.work@gmail.com> wrote: > > > > ... > > > > > > > Serge pointed out that we also will need an API to attach to an > > > > > existing SELinux namespace, which I captured here: > > > > > https://github.com/stephensmalley/selinuxns/issues/19 > > > > > This is handled for other Linux namespaces by opening a pseudo file > > > > > under /proc/pid/ns and invoking setns(2), so not sure how we want to > > > > > do it. > > > > > > > > One option would be to have a the LSM framework return a LSM namespace > > > > "handle" for a given LSM using lsm_get_self_attr(2) and then do a > > > > setns(2)-esque operation using lsm_set_self_attr(2) with that > > > > "handle". We would need to figure out what would constitute a > > > > "handle" but let's just mark that as TBD for now with this approach (I > > > > think better options are available). > > > > > > The use case which would be complicated (not blocked) by this, is > > > > > > * a runtime creates a process p1 > > > * p1 unshares its lsm namespace > > > * runtime forks a debug/admin process p2 > > > * p2 wants to enter p1's namespace > > > > > > Of course the runtime could work around it by, before relinquishing > > > control of p1 to a new executable, returning the lsm_get_self_attr() > > > data to over a pipe. 
> > > > > > Note I don't think we should support setting another task's namespace, > > > only getting its namespace ID. > > > > > > > Since we have an existing LSM namespace combination, with processes > > > > running inside of it, it might be sufficient to simply support moving > > > > into an existing LSM namespace set with setns(2) using only a pidfd > > > > and a new CLONE_LSMNS flag (or similar, upstream might want this as > > > > CLONE_NEWLSM). This would simply set the LSM namespace set for the > > > > setns(2) caller to match that of the target pidfd. We still wouldn't > > > > want to support CLONE_LSMNS/CLONE_NEWLSM for clone*(). > > > > > > A part of me is telling (another part of) me that being able to setns > > > to a subset of the lsms could lead to privilege escapes through > > > weird policy configurations for the various LSMs. In which case, > > > an all-or-nothing LSM setns might actually be preferable. > > > > Sorry I probably wasn't as clear as I should have been, but my idea > > with using the existing procfs/setns(2) approach with a single > > CLONE_NEWLSM (name pending sufficient bikeshedding) was that the > > process being setns()'d would simply end up in the exact copy of the > > target process' LSM namespace configuration, it shouldn't be a new > > Oh, I think I was being unclear - I thought the first option, using > lsm_set_self_attr(), would allow choosing a subset of LSMs to setns to. > In contrast, the pure setns with a single flag is less flexible, but > possibly safer. So I typed there the result of my train of thought, > which is that your second suggestion is probably preferable. I think we've probably both been a bit off :) Let me try again ... I'm proposing the lsm_set_self_attr(2) approach as a way for a process to setup an arbitrary set of LSM namespaces to take effect on an upcoming clone() or exec() (we can discuss that detail). 
I didn't originally envision this as a way to potentially join existing
LSM namespaces, but rather a way to create new LSM namespaces when a new
process is created/exec'd.

The procfs/setns(2) approach would be in addition to the
lsm_set_self_attr(2) mechanism, and would allow a process to enter a
previously configured LSM namespace set when a CLONE_LSMNS (or similar)
flag was passed to setns(2).

Both mechanisms are very much up for debate in my mind, and doing either,
or both, is possible as far as I'm concerned.

--
paul-moore.com
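The staging half of Paul's proposal, where an arbitrary set of per-LSM requests is recorded now and applied at the next exec, can be modelled in the spirit of LSM_ATTR_EXEC. `LSM_ATTR_UNSHARE` is only a proposal in this thread, and `set_self_attr()`/`do_exec()` are illustrative stand-ins that do not follow the real lsm_set_self_attr(2) signature (which takes a `struct lsm_ctx`).

```python
# Hypothetical model: LSM_ATTR_UNSHARE is only proposed, and
# set_self_attr()/do_exec() are stand-ins, not real kernel interfaces.
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional

LSM_ATTR_UNSHARE = "unshare"            # placeholder value
_ns_ids = itertools.count(1)


@dataclass
class Task:
    ns: Dict[str, str]                  # lsm name -> namespace id
    staged: Dict[str, Optional[bytes]] = field(default_factory=dict)


def set_self_attr(task: Task, attr: str, lsm: str,
                  ctx: Optional[bytes] = None) -> None:
    """Record an unshare request for one LSM; ctx is LSM-specific data."""
    if attr != LSM_ATTR_UNSHARE:
        raise ValueError("only the unshare attribute is modelled")
    if lsm not in task.ns:
        raise ValueError(f"{lsm} is not active")
    task.staged[lsm] = ctx


def do_exec(task: Task) -> Task:
    """Apply staged requests; LSMs that were not asked keep sharing."""
    new_ns = dict(task.ns)
    for lsm, ctx in task.staged.items():
        suffix = f":{ctx.decode()}" if ctx else ""
        new_ns[lsm] = f"{lsm}-ns{next(_ns_ids)}{suffix}"
    return Task(ns=new_ns)              # staged requests are consumed


if __name__ == "__main__":
    t = Task(ns={"selinux": "init", "apparmor": "init", "landlock": "init"})
    set_self_attr(t, LSM_ATTR_UNSHARE, "selinux", b"policy=container")
    print(do_exec(t).ns)
```

LSMs that were never asked keep sharing their current namespace, which is the per-LSM behaviour Stephen asked for (unsharing just SELinux or just AppArmor).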
* Re: LSM namespacing API 2025-08-21 2:35 ` Paul Moore 2025-08-21 3:02 ` Serge E. Hallyn @ 2025-08-21 8:12 ` John Johansen 1 sibling, 0 replies; 43+ messages in thread From: John Johansen @ 2025-08-21 8:12 UTC (permalink / raw) To: Paul Moore, Serge E. Hallyn Cc: Stephen Smalley, linux-security-module, selinux On 8/20/25 19:35, Paul Moore wrote: > On Wed, Aug 20, 2025 at 10:05 PM Serge E. Hallyn <serge@hallyn.com> wrote: >> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: >>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley >>> <stephen.smalley.work@gmail.com> wrote: > > ... > >>>> Serge pointed out that we also will need an API to attach to an >>>> existing SELinux namespace, which I captured here: >>>> https://github.com/stephensmalley/selinuxns/issues/19 >>>> This is handled for other Linux namespaces by opening a pseudo file >>>> under /proc/pid/ns and invoking setns(2), so not sure how we want to >>>> do it. >>> >>> One option would be to have a the LSM framework return a LSM namespace >>> "handle" for a given LSM using lsm_get_self_attr(2) and then do a >>> setns(2)-esque operation using lsm_set_self_attr(2) with that >>> "handle". We would need to figure out what would constitute a >>> "handle" but let's just mark that as TBD for now with this approach (I >>> think better options are available). >> >> The use case which would be complicated (not blocked) by this, is >> >> * a runtime creates a process p1 >> * p1 unshares its lsm namespace >> * runtime forks a debug/admin process p2 >> * p2 wants to enter p1's namespace >> >> Of course the runtime could work around it by, before relinquishing >> control of p1 to a new executable, returning the lsm_get_self_attr() >> data to over a pipe. >> >> Note I don't think we should support setting another task's namespace, >> only getting its namespace ID. 
>>
>>> Since we have an existing LSM namespace combination, with processes
>>> running inside of it, it might be sufficient to simply support moving
>>> into an existing LSM namespace set with setns(2) using only a pidfd
>>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>>> CLONE_NEWLSM). This would simply set the LSM namespace set for the
>>> setns(2) caller to match that of the target pidfd. We still wouldn't
>>> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>>
>> A part of me is telling (another part of) me that being able to setns
>> to a subset of the lsms could lead to privilege escapes through
>> weird policy configurations for the various LSMs. In which case,
>> an all-or-nothing LSM setns might actually be preferable.
>
> Sorry I probably wasn't as clear as I should have been, but my idea
> with using the existing procfs/setns(2) approach with a single
> CLONE_NEWLSM (name pending sufficient bikeshedding) was that the
> process being setns()'d would simply end up in the exact copy of the
> target process' LSM namespace configuration, it shouldn't be a new
> set/subset/configuration ... and I would expect us to have controls
> around that such that LSMs could enforce policy on a setns(2)
> operation that involved their LSM.
>

Entering as a complete set is certainly the safest. At a minimum, the
LSMs are going to need to be able to specify the set of namespaces that
are needed if you enter the LSM namespace. The easiest way to do this is
what you propose: take away the flexibility and allow moving everything
as a set. I do think we might still need to be able to request entering
an individual LSM namespace from the set, but at least for a first pass
it's probably better not to go there.
* Re: LSM namespacing API 2025-08-21 2:05 ` Serge E. Hallyn 2025-08-21 2:35 ` Paul Moore @ 2025-08-21 8:07 ` John Johansen 1 sibling, 0 replies; 43+ messages in thread From: John Johansen @ 2025-08-21 8:07 UTC (permalink / raw) To: Serge E. Hallyn, Paul Moore Cc: Stephen Smalley, linux-security-module, selinux On 8/20/25 19:05, Serge E. Hallyn wrote: > On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote: >> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley >> <stephen.smalley.work@gmail.com> wrote: >>> >>> I think we want to be able to unshare a specific security module >>> namespace without unsharing the others, i.e. just SELinux or just >>> AppArmor. >>> Not sure if your suggestion above supports that already but wanted to note it. >> >> The lsm_set_self_attr(2) approach allows for LSM specific unshare >> operations. Take the existing LSM_ATTR_EXEC attribute as an example, >> two LSMs have implemented support (AppArmor and SELinux), and >> userspace can independently set the attribute as desired for each LSM. > > Overall I really like the idea. > >>> Serge pointed out that we also will need an API to attach to an >>> existing SELinux namespace, which I captured here: >>> https://github.com/stephensmalley/selinuxns/issues/19 >>> This is handled for other Linux namespaces by opening a pseudo file >>> under /proc/pid/ns and invoking setns(2), so not sure how we want to >>> do it. >> >> One option would be to have a the LSM framework return a LSM namespace >> "handle" for a given LSM using lsm_get_self_attr(2) and then do a >> setns(2)-esque operation using lsm_set_self_attr(2) with that >> "handle". We would need to figure out what would constitute a >> "handle" but let's just mark that as TBD for now with this approach (I >> think better options are available). 
>
> The use case which would be complicated (not blocked) by this, is
>
> * a runtime creates a process p1
> * p1 unshares its lsm namespace
> * runtime forks a debug/admin process p2
> * p2 wants to enter p1's namespace
>
> Of course the runtime could work around it by, before relinquishing control of p1 to a new executable, returning the lsm_get_self_attr() data to over a pipe.
>
> Note I don't think we should support setting another task's namespace, only getting its namespace ID.
>

It's not reasonably doable without a significant update to the creds architecture. It's an orthogonal feature (being able to set another task's credentials), and as such it can be saved for another argument. So, very much in agreement: let's not allow that as part of the design.

>> Since we have an existing LSM namespace combination, with processes running inside of it, it might be sufficient to simply support moving into an existing LSM namespace set with setns(2) using only a pidfd and a new CLONE_LSMNS flag (or similar, upstream might want this as CLONE_NEWLSM). This would simply set the LSM namespace set for the setns(2) caller to match that of the target pidfd. We still wouldn't want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>
> A part of me is telling (another part of) me that being able to setns to a subset of the lsms could lead to privilege escapes through weird policy configurations for the various LSMs. In which case, an all-or-nothing LSM setns might actually be preferable.
>
> I haven't thought of a concrete example, though.
>

Not just potentially, and not just security/LSM namespaces. Really, the LSMs need to be able to determine whether, and which, namespaces (including system namespaces) need to move together as a set.
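Serge's workaround can be sketched with ordinary POSIX calls: p1 hands its LSM namespace data back to the runtime over a pipe before it execs the workload. The pipe, fork() and waitpid() handling is standard. The namespace "handle" is not, because the thread leaves its format TBD. So the hypothetical `hypothetical_lsm_ns_handle()` below just returns a placeholder derived from p1's PID. In a real runtime, that value would presumably come from lsm_get_self_attr(2) after p1 unshares, and p1 would exec the workload instead of exiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder: the "handle" format is TBD in the thread,
 * so p1 just reports a value derived from its own PID. */
static uint64_t hypothetical_lsm_ns_handle(void)
{
	return 0x1000u + (uint64_t)getpid();
}

/* Read exactly len bytes from fd; returns 0 on success, -1 otherwise. */
static int read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Runtime side: fork p1 and have it send its LSM namespace data back
 * over a pipe. The runtime can later hand that data to a debug/admin
 * process p2.
 */
int spawn_p1_and_collect(pid_t *p1, uint64_t *handle)
{
	int fds[2], status;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* p1: the LSM unshare would happen here (not shown); then
		 * report the data. A real p1 would exec the workload next. */
		uint64_t h = hypothetical_lsm_ns_handle();

		close(fds[0]);
		_exit(write(fds[1], &h, sizeof(h)) == (ssize_t)sizeof(h) ? 0 : 1);
	}
	close(fds[1]);
	int rc = read_full(fds[0], handle, sizeof(*handle));

	close(fds[0]);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		rc = -1;
	*p1 = pid;
	return rc;
}
```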
* Re: LSM namespacing API
From: John Johansen @ 2025-08-21 7:46 UTC
To: Stephen Smalley, Paul Moore
Cc: linux-security-module, selinux

On 8/19/25 10:47, Stephen Smalley wrote:
> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
>> [...]
>
> I think we want to be able to unshare a specific security module namespace without unsharing the others, i.e. just SELinux or just AppArmor.

Yes, which is part of the problem with the single flag: that choice would be made entirely at the policy level, without any input from userspace. I still think the policy may decide something different from what userspace requests, but that just means the namespacing of an LSM is under the individual LSM's control and not the infrastructure's. E.g., SELinux is using hierarchical namespaces, so when asked for a new namespace you will get the bounding hierarchy, but Yama (if it ever gets namespace support) could very well just use independent namespaces.

> Not sure if your suggestion above supports that already but wanted to note it.
> Regardless, I have no objections to any system call or flag that can be used to unshare the SELinux namespace and it should be trivial to wire it up to the existing underlying function.
>
> Serge pointed out that we also will need an API to attach to an existing SELinux namespace, which I captured here:
> https://github.com/stephensmalley/selinuxns/issues/19

Yes, a mechanism to switch is needed, but I also strongly dislike setns(2). For security purposes we definitely want to control whether the LSM namespace is associated with other system namespaces.

> This is handled for other Linux namespaces by opening a pseudo file under /proc/pid/ns and invoking setns(2), so not sure how we want to do it.

That is a possible interface, but not one that I like, so I would like to explore other options first.
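John's point about controlling which system namespaces an LSM namespace is associated with can be read as follows. Each LSM declares the namespace types that must move together with its own, and a setns-style request is refused unless it covers all of them. The sketch below assumes that model. `struct lsm_ns_requirement` and `check_ns_bundle()` are hypothetical; only the CLONE_NEW* constants from <sched.h> are real.

```c
#define _GNU_SOURCE
#include <sched.h> /* CLONE_NEWNS, CLONE_NEWUSER, CLONE_NEWNET, ... */
#include <stddef.h>

/* Hypothetical per-LSM declaration: which other namespaces (as
 * CLONE_NEW* bits) must move together with this LSM's namespace. */
struct lsm_ns_requirement {
	const char *lsm;
	int required;
};

/*
 * Check a requested bundle of namespace types against every LSM's
 * requirement. Returns NULL if all are satisfied; otherwise returns
 * the name of the first LSM whose required namespaces are missing.
 */
const char *check_ns_bundle(const struct lsm_ns_requirement *reqs, size_t n,
			    int requested)
{
	for (size_t i = 0; i < n; i++)
		if ((requested & reqs[i].required) != reqs[i].required)
			return reqs[i].lsm;
	return NULL;
}
```

For example, a request carrying only CLONE_NEWNS would be rejected if some LSM also required CLONE_NEWUSER to travel with it. The returned name tells the caller which LSM objected.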
* Re: LSM namespacing API
From: Serge E. Hallyn @ 2025-08-21 14:26 UTC
To: John Johansen
Cc: Stephen Smalley, Paul Moore, linux-security-module, selinux

On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
> On 8/19/25 10:47, Stephen Smalley wrote:
> > On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
> > > [...]
> >
> > I think we want to be able to unshare a specific security module namespace without unsharing the others, i.e. just SELinux or just AppArmor.
>
> yes which is part of the problem with the single flag. That choice would be entirely at the policy level, without any input from userspace.

AIUI Paul's suggestion is that the user can pre-set the details of which LSMs to unshare, and how, with lsm_set_self_attr(), and then a single CLONE_LSM effects that.

-serge
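Serge's reading of Paul's proposal is a two-step flow: per-LSM details are staged first, then one trigger applies them. The toy model below illustrates only that ordering. LSM_ATTR_UNSHARE and CLONE_LSM are proposals from this thread, not existing kernel interfaces. So `stage_unshare()` and `apply_staged_unshare()` are hypothetical stand-ins. The LSM_ID_* values are the ones defined in include/uapi/linux/lsm.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* LSM identifiers as defined in include/uapi/linux/lsm.h */
#define LSM_ID_SELINUX  101
#define LSM_ID_APPARMOR 104

#define NR_SLOTS 2

/* Hypothetical toy state for one task: the namespace each LSM has it in,
 * and whether an unshare has been staged for that LSM. */
struct lsm_slot {
	uint64_t lsm_id;
	uint64_t ns_id; /* toy namespace identifier */
	bool staged;
};

struct task_lsm {
	struct lsm_slot slot[NR_SLOTS];
};

static uint64_t next_ns_id = 1000; /* toy namespace allocator */

/* Step 1, standing in for a hypothetical
 * lsm_set_self_attr(LSM_ATTR_UNSHARE, ...) call: record the request. */
int stage_unshare(struct task_lsm *t, uint64_t lsm_id)
{
	for (size_t i = 0; i < NR_SLOTS; i++) {
		if (t->slot[i].lsm_id == lsm_id) {
			t->slot[i].staged = true;
			return 0;
		}
	}
	return -1; /* LSM absent or not namespace-capable */
}

/* Step 2, standing in for the single CLONE_LSM-style trigger: only the
 * staged LSMs get new namespaces. Returns how many were unshared. */
int apply_staged_unshare(struct task_lsm *t)
{
	int n = 0;

	for (size_t i = 0; i < NR_SLOTS; i++) {
		if (t->slot[i].staged) {
			t->slot[i].ns_id = next_ns_id++;
			t->slot[i].staged = false;
			n++;
		}
	}
	return n;
}
```

LSMs that were never staged keep their current namespace. This is how a single trigger can still unshare, say, just SELinux.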
* Re: LSM namespacing API
From: John Johansen @ 2025-08-21 14:57 UTC
To: Serge E. Hallyn
Cc: Stephen Smalley, Paul Moore, linux-security-module, selinux

On 8/21/25 07:26, Serge E. Hallyn wrote:
> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>> On 8/19/25 10:47, Stephen Smalley wrote:
>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
>>>> [...]
>>>
>>> I think we want to be able to unshare a specific security module namespace without unsharing the others, i.e. just SELinux or just AppArmor.
>>
>> yes which is part of the problem with the single flag. That choice would be entirely at the policy level, without any input from userspace.
>
> AIUI Paul's suggestion is the user can pre-set the details of which lsms to unshare and how with the lsm_set_self_attr(), and then a single CLONE_LSM effects that.
>

Yes, I was specifically addressing the conversation I had with Paul at LSS that Paul brought up, that is:

    At LSS-NA this year, John Johansen and I had a brief discussion where
    he suggested a single LSM wide clone*(2) flag that individual LSM's
    could opt into via callbacks.

The idea there isn't all that different from what Paul proposed. You could have a single flag if you can provide ancillary information, but a single flag on its own isn't sufficient.

You can do a subset with a single flag and only policy directing things, but that would cut container managers out of the decision. Without a universal container identifier, that really limits what you can do. In another email I likened it to the MCS label approach to containers, where you have a single security policy for the container and each container gets to be a unique instance of that policy. It's not a perfect analogy, as with namespaces, policy can be loaded into the namespace, making it unique. I don't think the approach is right, because not all namespaces implement a loadable policy, and even when they do, I think we can do a better job if the container manager is allowed to provide additional context with the namespacing request.
* Re: LSM namespacing API
From: Dr. Greg @ 2025-09-01 16:01 UTC
To: John Johansen
Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore, linux-security-module, selinux

On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:

Good morning, I hope the week is starting well for everyone.

Now that everyone is getting past the summer holiday season, it would seem useful to specifically clarify some of the LSM namespace implementation details.

> On 8/21/25 07:26, Serge E. Hallyn wrote:
> >On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
> >>On 8/19/25 10:47, Stephen Smalley wrote:
> >>>On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
> >>>>[...]
> >>>
> >>>I think we want to be able to unshare a specific security module namespace without unsharing the others, i.e. just SELinux or just AppArmor.
> >>
> >>yes which is part of the problem with the single flag. That choice would be entirely at the policy level, without any input from userspace.
> >
> >AIUI Paul's suggestion is the user can pre-set the details of which lsms to unshare and how with the lsm_set_self_attr(), and then a single CLONE_LSM effects that.
> yes, I was specifically addressing the conversation I had with Paul at LSS that Paul brought up.
> That is
>
> At LSS-NA this year, John Johansen and I had a brief discussion where he suggested a single LSM wide clone*(2) flag that individual LSM's could opt into via callbacks.
>
> the idea there isn't all that different than what Paul proposed. You could have a single flag, if you can provide ancillary information. But a single flag on its own isn't sufficient.

If one thing has come out of this thread, it would seem to be that there is going to be little commonality in the requirements that the various LSMs will have for the creation of a namespace.

Given that, the most the LSM infrastructure should provide would be a common API for a resource orchestrator to request namespace separation, along with a framework for configuring the namespace before execution begins in the context of the namespace.

The first issue to resolve would seem to be what namespace separation implies.

John, if I interpret your comments in this discussion correctly, your contention is that when namespace separation is requested, all of the LSMs that implement namespaces will create a subordinate namespace. Is that a correct assumption?

It would seem, consistent with the 'stacking' concept, that any namespace-capable LSM that chooses not to separate will cause the separation request to be denied. That in turn implies the need to unwind or delete any namespace context that other LSMs may have allocated before the refusal occurred.

This model also implies that the orchestrator requesting the separation will need to pass a set of parameters describing the characteristics of each namespace, tagged with the identifier of the LSM they pertain to. Since multiple namespaces may need to be configured, this would require passing an array or list of these parameter sets.

There will also be a need to inject possibly substantial amounts of policy or model information into the namespace before execution in the context of the namespace begins.
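The all-or-nothing creation Greg describes follows the usual allocate-then-roll-back pattern: one LSM's refusal means unwinding what the others already allocated. Below is a minimal sketch. It assumes a hypothetical per-LSM `struct ns_ops` with create/release callbacks; no such kernel interface exists.

```c
#include <stdlib.h>

/* Hypothetical per-LSM namespace operations. */
struct ns_ops {
	const char *name;
	void *(*create)(void);      /* returns NULL to refuse separation */
	void (*release)(void *ns);
};

/*
 * Create a namespace in every LSM or in none. If any LSM refuses,
 * release whatever the earlier LSMs allocated and fail the request.
 */
int create_ns_all_or_nothing(const struct ns_ops *ops, size_t n, void **out)
{
	for (size_t i = 0; i < n; i++) {
		out[i] = ops[i].create();
		if (!out[i]) {
			while (i-- > 0) {       /* unwind in reverse order */
				ops[i].release(out[i]);
				out[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

Releasing in reverse order mirrors how kernel error paths typically unwind. Casey argues later in the thread against giving any single LSM this kind of veto.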

There will also be a need to decide whether namespace separation should occur at the request of the orchestrator or at the next fork, the latter being the model the other resource namespaces use. We believe the argument for direct separation can be made by looking at the gymnastics that orchestrators need to go through with the 'change-on-fork' model.

Case in point, it would seem realistic that a process with sufficient privilege may want to place itself in a new LSM namespace context in a manner that does not require re-executing itself.

With respect to separation, the remaining issue is whether a new security capability bit needs to be implemented to gate namespace separation. John, based on your comments, I believe you would support this need?

> You can do a subset with a single flag and only policy directing things, but that would cut container managers out of the decision. Without a universal container identifier that really limits what you can do. In another email I likend it to the MCS label approach to the container where you have a single security policy for the container and each container gets to be a unique instance of that policy. Its not a perfect analogy as with namespace policy can be loaded into the namespace making it unique. I don't think the approach is right because not all namespaces implement a loadable policy, and even when they do I think we can do a better job if the container manager is allowed to provide additional context with the namespacing request.

In order to be relevant, the configuration of LSM namespaces needs to be under the control of a resource orchestrator or container manager.

What we hear from people doing Kubernetes at scale is a desire to be able to request that a container be run somewhere in the hardware resource pool. They also want that container to implement a security model specific to the needs of its workload, orthogonal to other security policies that may be in effect for other workloads, on the host or in other containers.

Hopefully the above will be of assistance in furthering discussion.

Have a good week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project
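Greg describes an "array or list of parameter sets" keyed by LSM identifier. It could plausibly reuse the packed struct lsm_ctx format that lsm_get_self_attr(2) already uses to return one entry per LSM. The sketch below builds and searches such a buffer in user space. The header mirrors the fields of struct lsm_ctx in include/uapi/linux/lsm.h. The following are all assumptions for illustration: the 8-byte padding rule, the payload contents, and passing such a buffer to an unshare operation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* LSM identifiers as defined in include/uapi/linux/lsm.h */
#define LSM_ID_SELINUX  101
#define LSM_ID_APPARMOR 104

/* Header layout mirroring struct lsm_ctx in include/uapi/linux/lsm.h;
 * the variable-length ctx[] payload follows it in the buffer. */
struct lsm_ctx_hdr {
	uint64_t id;      /* LSM_ID_* this entry belongs to */
	uint64_t flags;
	uint64_t len;     /* whole entry: header + payload + padding */
	uint64_t ctx_len; /* payload bytes */
};

#define ALIGN8(x) (((x) + (size_t)7) & ~(size_t)7)

/* Append one entry; returns the new used size, or 0 if it won't fit. */
size_t lsm_ctx_append(uint8_t *buf, size_t cap, size_t used, uint64_t id,
		      const void *payload, size_t plen)
{
	struct lsm_ctx_hdr h = { id, 0, 0, plen };
	size_t entry = ALIGN8(sizeof(h) + plen);

	if (used > cap || entry > cap - used)
		return 0;
	h.len = entry;
	memset(buf + used, 0, entry);      /* zero the padding bytes */
	memcpy(buf + used, &h, sizeof(h)); /* memcpy: buf may be unaligned */
	memcpy(buf + used + sizeof(h), payload, plen);
	return used + entry;
}

/* Look up the payload for one LSM; returns its length, or -1 if the
 * LSM has no entry or the buffer is malformed. */
long lsm_ctx_find(const uint8_t *buf, size_t used, uint64_t id,
		  const uint8_t **payload)
{
	size_t off = 0;

	while (used - off >= sizeof(struct lsm_ctx_hdr)) {
		struct lsm_ctx_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > used - off ||
		    h.ctx_len > h.len - sizeof(h))
			return -1;
		if (h.id == id) {
			*payload = buf + off + sizeof(h);
			return (long)h.ctx_len;
		}
		off += h.len;
	}
	return -1;
}
```

Each header is 32 bytes. A 7-byte SELinux payload and a 2-byte AppArmor payload therefore each occupy a 40-byte entry, making the two-entry buffer 80 bytes.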
* Re: LSM namespacing API
From: Casey Schaufler @ 2025-09-01 17:31 UTC
To: Dr. Greg, John Johansen
Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore, linux-security-module, selinux, Casey Schaufler

On 9/1/2025 9:01 AM, Dr. Greg wrote:
> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
>
> Good morning, I hope the week is starting well for everyone.
>
> Now that everyone is getting past the summer holiday season, it would seem useful to specifically clarify some of the LSM namespace implementation details.
>
>> On 8/21/25 07:26, Serge E. Hallyn wrote:
>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>>>> On 8/19/25 10:47, Stephen Smalley wrote:
>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> wrote:
>>>>>> [...]
>>>>> I think we want to be able to unshare a specific security module namespace without unsharing the others, i.e. just SELinux or just AppArmor.
>>>> yes which is part of the problem with the single flag. That choice would be entirely at the policy level, without any input from userspace.
>>> AIUI Paul's suggestion is the user can pre-set the details of which lsms to unshare and how with the lsm_set_self_attr(), and then a single CLONE_LSM effects that.
>> yes, I was specifically addressing the conversation I had with Paul at LSS that Paul brought up.
>> That is
>>
>> At LSS-NA this year, John Johansen and I had a brief discussion where he suggested a single LSM wide clone*(2) flag that individual LSM's could opt into via callbacks.
>>
>> the idea there isn't all that different than what Paul proposed. You could have a single flag, if you can provide ancillary information. But a single flag on its own isn't sufficient.
> If one thing has come out of this thread, it would seem to be the fact that there is going to be little commonality in the requirements that various LSM's will have for the creation of a namespace.
>
> Given that, the most infrastructure that the LSM should provide would be a common API for a resource orchestrator to request namespace separation and to provide a framework for configuring the namespace prior to when execution begins in the context of the namespace.
>
> The first issue to resolve would seem to be what namespace separation implies.
>
> John, if I interpret your comments in this discussion correctly, your contention is that when namespace separation is requested, all of the LSM's that implement namespaces will create a subordinate namespace, is that a correct assumption?
>
> It would seem, consistent with the 'stacking' concept, that any LSM with namespace capability that chooses not to separate, will result in denial of the separation request. That in turn will imply the need to unwind or delete any namespace context that other LSM's may have allocated before the refusal occurred.

Were it true that 'stacking' rated the status of a 'concept'. An LSM that is capable of namespacing (the definition of which is elusive at this time) should be allowed to decline participation in a namespace creation. That, or there needs to be a convention for "null" namespaces, by which an LSM can pretend that it isn't involved in the new namespace. I think the latter smells funny and would invite "security people don't understand performance" remarks. No LSM should be allowed to prevent another from using namespaces.

>
> This model also implies that the orchestrator requesting the separation will need to pass a set of parameters describing the characteristics of each namespace, described by the LSM identifier that they pertain to. Since there may be a need to configure multiple namespaces there would be a requirement to pass an array or list of these parameter sets.

Just like lsm_set_self_attr(2).

> There will also be a need to inject, possibly substantial amounts of policy or model information into the namespace, before execution in the context of the namespace begins.

Yup. A major downside of loadable policy.

> There will also be a need to decide whether namespace separation should occur at the request of the orchestrator or at the next fork, the latter model being what the other resource namespaces use. We believe the argument for direct separation can be made by looking at the gymnastics that orchestrators need to jump through with the 'change-on-fork' model.
>
> Case in point, it would seem realistic that a process with sufficient privilege, may desire to place itself in a new LSM namespace context in a manner that does not require re-execution of itself.
>
> With respect to separation, the remaining issue is if a new security capability bit needs to be implemented to gate namespace separation. John, based on your comments, I believe you would support this need?

I don't like the notion of a new capability for this. But then, I object to almost every new capability proposed. Existing namespaces don't need their own capabilities. I don't see this case as special.

>
>> You can do a subset with a single flag and only policy directing things, but that would cut container managers out of the decision. Without a universal container identifier that really limits what you can do. In another email I likend it to the MCS label approach to the container where you have a single security policy for the container and each container gets to be a unique instance of that policy. Its not a perfect analogy as with namespace policy can be loaded into the namespace making it unique. I don't think the approach is right because not all namespaces implement a loadable policy, and even when they do I think we can do a better job if the container manager is allowed to provide additional context with the namespacing request.
> In order to be relevant, the configuration of LSM namespaces need to be under control of a resource orchestrator or container manager.

I do not approve of kernel features that are pointless without specific user space support. If a feature can't be used in ways other than those defined by a particular user space component, it really doesn't belong in the kernel.

>
> What we hear from people doing Kubernetes, at scale, is a desire to be able to request that a container be run somewhere in the hardware resource pool and for that container to implement a security model specific to the needs of the workload running in that container. In a manner that is orthogonal from other security policies that may be in effect for other workloads, on the host or in other containers.

That sounds to me like they want per-container security policy. That would require that the kernel have the 'concept' of a container. That's not something I expect to see in my lifetime.

>
> Hopefully the above will be of assistance in furthering discussion.
>
> Have a good week.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
> https://github.com/Quixote-Project
>
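Casey proposes an alternative: a namespace-capable LSM may decline to participate but cannot block the others. That makes a refusal purely local to the LSM that declines. A toy sketch follows, with a hypothetical `enum ns_vote` and per-LSM vote callback; neither corresponds to an existing kernel interface.

```c
#include <stddef.h>

/* Hypothetical outcome of a per-LSM "new namespace" decision. */
enum ns_vote { NS_CREATE, NS_DECLINE };

struct lsm_ns_slot {
	const char *lsm;
	int ns_id;                  /* toy namespace identifier */
	enum ns_vote (*vote)(void); /* this LSM's choice for the request */
};

/*
 * Each LSM independently either creates a child namespace or keeps its
 * parent's. A decline affects only that LSM and never fails the whole
 * request. Returns the number of namespaces created.
 */
int unshare_independent(struct lsm_ns_slot *slots, size_t n, int *next_id)
{
	int created = 0;

	for (size_t i = 0; i < n; i++) {
		if (slots[i].vote() == NS_CREATE) {
			slots[i].ns_id = (*next_id)++;
			created++;
		}
		/* NS_DECLINE: stay in the parent's namespace for this LSM */
	}
	return created;
}
```

A declining LSM simply leaves the task in its parent's namespace for that module, so there is nothing to unwind.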
* Re: LSM namespacing API 2025-09-01 17:31 ` Casey Schaufler @ 2025-09-04 2:16 ` Dr. Greg 2025-09-04 17:40 ` Casey Schaufler 0 siblings, 1 reply; 43+ messages in thread From: Dr. Greg @ 2025-09-04 2:16 UTC To: Casey Schaufler Cc: John Johansen, Serge E. Hallyn, Stephen Smalley, Paul Moore, linux-security-module, selinux On Mon, Sep 01, 2025 at 10:31:43AM -0700, Casey Schaufler wrote: Hi, I hope mid-week has gone well for everyone. > On 9/1/2025 9:01 AM, Dr. Greg wrote: > > On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote: > > > > Good morning, I hope the week is starting well for everyone. > > > > Now that everyone is getting past the summer holiday season, it would > > seem useful to specifically clarify some of the LSM namespace > > implementation details. > > > >> On 8/21/25 07:26, Serge E. Hallyn wrote: > >>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote: > >>>> On 8/19/25 10:47, Stephen Smalley wrote: > >>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> > >>>>> wrote: > >>>>>> Hello all, > >>>>>> > >>>>>> As most of you are likely aware, Stephen Smalley has been working on > >>>>>> adding namespace support to SELinux, and the work has now progressed > >>>>>> to the point where a serious discussion on the API is warranted.
For > >>>>>> those of you who are unfamiliar with the details of Stephen's patchset, or > >>>>>> simply need a refresher, he has some excellent documentation in his > >>>>>> work-in-progress repo: > >>>>>> > >>>>>> * https://github.com/stephensmalley/selinuxns > >>>>>> > >>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year > >>>>>> about SELinux namespacing; you can watch the presentation here: > >>>>>> > >>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM > >>>>>> > >>>>>> In the past you've heard me state, rather firmly at times, that I > >>>>>> believe namespacing at the LSM framework layer to be a mistake, > >>>>>> although if there is something that can be done to help facilitate the > >>>>>> namespacing of individual LSMs at the framework layer, I would be > >>>>>> supportive of that. I think that a single LSM namespace API, similar > >>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like > >>>>>> us to have a discussion to see if we all agree on that, and if so, > >>>>>> what such an API might look like. > >>>>>> > >>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where > >>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's > >>>>>> could opt into via callbacks. John is directly CC'd on this mail, so > >>>>>> I'll let him expand on this idea. > >>>>>> > >>>>>> While I agree with John that a fs based API is problematic (see all of > >>>>>> our discussions around the LSM syscalls), I'm concerned that a single > >>>>>> clone*(2) flag will significantly limit our flexibility around how > >>>>>> individual LSMs are namespaced, something I don't want to see happen. > >>>>>> This makes me wonder about the potential for expanding > >>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support > >>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.
This would > >>>>>> provide a single LSM framework API for an unshare operation while also > >>>>>> providing a mechanism to pass LSM specific data via the lsm_ctx struct if > >>>>>> needed. Just as we do with the other LSM_ATTR_* flags today, > >>>>>> individual LSMs can opt in to the API fairly easily by providing a > >>>>>> setselfattr() LSM callback. > >>>>>> > >>>>>> Thoughts? > >>>>> I think we want to be able to unshare a specific security module > >>>>> namespace without unsharing the others, i.e. just SELinux or just > >>>>> AppArmor. > >>>> yes, which is part of the problem with the single flag. That choice > >>>> would be entirely at the policy level, without any input from userspace. > >>> AIUI Paul's suggestion is the user can pre-set the details of which > >>> lsms to unshare and how with the lsm_set_self_attr(), and then a > >>> single CLONE_LSM effects that. > >> yes, I was specifically addressing the conversation I had with Paul at > >> LSS that Paul brought up. That is > >> > >> At LSS-NA this year, John Johansen and I had a brief discussion where > >> he suggested a single LSM wide clone*(2) flag that individual LSM's > >> could opt into via callbacks. > >> > >> the idea there isn't all that different than what Paul proposed. You > >> could have a single flag, if you can provide ancillary information. But > >> a single flag on its own isn't sufficient. > > If one thing has come out of this thread, it would seem to be the fact > > that there is going to be little commonality in the requirements that > > various LSM's will have for the creation of a namespace. > > > > Given that, the most infrastructure that the LSM should provide would > > be a common API for a resource orchestrator to request namespace > > separation and to provide a framework for configuring the namespace > > prior to when execution begins in the context of the namespace. > > > > The first issue to resolve would seem to be what namespace separation > > implies.
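Paul's proposal would reuse the existing lsm_set_self_attr(2) syscall, passing LSM-specific unshare options in a `struct lsm_ctx`. The following is a minimal userspace sketch under stated assumptions: `struct lsm_ctx_hdr` is a local mirror of `struct lsm_ctx` from `<linux/lsm.h>` (Linux 6.8+), `EX_LSM_ID_SELINUX` uses the value of `LSM_ID_SELINUX`, and `EX_LSM_ATTR_UNSHARE` is entirely hypothetical, since no such attribute exists. Padding `len` to 8 bytes is an assumption modeled on how the kernel aligns the records it returns from lsm_get_self_attr(2) on 64-bit systems.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of struct lsm_ctx from <linux/lsm.h>, so the sketch
 * builds without new kernel headers. */
struct lsm_ctx_hdr {
    uint64_t id;      /* LSM_ID_* of the LSM this record is for */
    uint64_t flags;   /* LSM-specific flags */
    uint64_t len;     /* size of the whole record, header included */
    uint64_t ctx_len; /* size of the payload in ctx[] */
    uint8_t ctx[];
};

#define EX_LSM_ID_SELINUX 101   /* value of LSM_ID_SELINUX */
#define EX_LSM_ATTR_UNSHARE 200 /* HYPOTHETICAL: no such attribute exists */

/* Build one record carrying LSM-specific unshare options, with len
 * padded up to a multiple of 8 bytes. */
struct lsm_ctx_hdr *build_unshare_ctx(uint64_t lsm_id, const void *opts,
                                      uint64_t opts_len, size_t *out_size)
{
    size_t total = (sizeof(struct lsm_ctx_hdr) + opts_len + 7) & ~(size_t)7;
    struct lsm_ctx_hdr *ctx = calloc(1, total);

    if (!ctx)
        return NULL;
    ctx->id = lsm_id;
    ctx->len = total;
    ctx->ctx_len = opts_len;
    if (opts_len)
        memcpy(ctx->ctx, opts, opts_len);
    *out_size = total;
    return ctx;
}

/* Hand the record to lsm_set_self_attr(2).  Because the attribute value
 * is made up, current kernels are expected to reject the call. */
int request_lsm_unshare(struct lsm_ctx_hdr *ctx, size_t size)
{
#ifdef SYS_lsm_set_self_attr
    return (int)syscall(SYS_lsm_set_self_attr, EX_LSM_ATTR_UNSHARE, ctx,
                        (uint32_t)size, 0u);
#else
    (void)ctx;
    (void)size;
    errno = ENOSYS;
    return -1;
#endif
}
```

Because the `id` field names the target LSM, a request like this unshares only that LSM, which is the per-LSM selectivity Stephen asks for; an LSM that does not implement the attribute simply never sees a matching record.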
> > > > John, if I interpret your comments in this discussion correctly, your > > contention is that when namespace separation is requested, all of the > > LSM's that implement namespaces will create a subordinate namespace, > > is that a correct assumption? > > > > It would seem, consistent with the 'stacking' concept, that any LSM > > with namespace capability that chooses not to separate, will result in > > denial of the separation request. That in turn will imply the need to > > unwind or delete any namespace context that other LSM's may have > > allocated before the refusal occurred. > Were it true that 'stacking' rated the status of a 'concept'. If 'concept' doesn't work as a term, we can call it an agreement on the co-existence of multiple security models. > An LSM that is capable of namespacing (the definition of which is > elusive at this time) should be allowed to decline participation in > a namespace creation. Given the above, a full stop may be in order. Perhaps, in pursuit of wisdom, we should call for a general consensus among the group as to whether or not we have any clue as to what we are doing? > That, or there needs to be a convention for "null" namespaces, by > which an LSM can pretend that it isn't involved in the new > namespace. I think the latter smells funny and would invite > "security people don't understand performance" remarks. No LSM > should be allowed to prevent another from using namespaces. Unfortunately that would seem to collide with the general consensus that has evolved around 'stacking', as the means by which Linux supports multiple LSM based security models/architectures. The kernel security architecture admits to the notion that all of the active LSM's have to agree that a specific security event be allowed. If any LSM elects to deny a hook call, permission is denied for the event. 
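The disagreement over whether one LSM may block another's namespace creation turns on which dispatch style the creation hook would use. The following simplified userspace model contrasts the two styles used in `security/security.c`. It is not the kernel's macros; `run_int_hooks`, `run_void_hooks` and the example hooks are hypothetical names. For an int hook whose default return is 0, the first non-zero return ends the walk, so any LSM can veto. A void hook (Casey's `call_void_hook()`) notifies every LSM and gives none of them a veto.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*int_hook_fn)(void *arg);
typedef void (*void_hook_fn)(void *arg);

/* Modeled on call_int_hook() for a hook whose default is 0: LSMs run
 * in order and the first non-zero return is propagated immediately. */
int run_int_hooks(const int_hook_fn *hooks, size_t n, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        int rc = hooks[i](arg);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Modeled on call_void_hook(): every LSM is called and none can refuse. */
void run_void_hooks(const void_hook_fn *hooks, size_t n, void *arg)
{
    for (size_t i = 0; i < n; i++)
        hooks[i](arg);
}

/* Example hooks used to exercise the model; the int hooks count calls. */
static int hooks_run;
int allow_hook(void *arg) { (void)arg; hooks_run++; return 0; }
int deny_hook(void *arg) { (void)arg; hooks_run++; return -EACCES; }
void notify_hook(void *arg) { (*(int *)arg)++; }
```

With `{ allow_hook, deny_hook, allow_hook }`, `run_int_hooks` returns `-EACCES` after only two hooks have run, whereas a void-style dispatch reaches every LSM regardless of what the others decide.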
John responded to our e-mail in this thread and clarified that he doesn't believe that a POSIX 1e style capability for namespace separation is required. However, our understanding from his reply is that he felt that LSM namespace creation itself should have its own LSM hook/event. If this is the case, to be consistent with the stacking architecture, any LSM should have the ability to deny security namespace creation through its interpretation of the LSM namespace creation hook. For example, it would certainly seem to be a valid concept for something like an enhanced 'lockdown' mode to deny the ability for any processes to escape into an LSM policy domain other than what was configured when the platform was placed in a locked down status. If we don't adhere to this model, we will have a 'snowflake' to contend with in the LSM security model. > > This model also implies that the orchestrator requesting the > > separation will need to pass a set of parameters describing the > > characteristics of each namespace, described by the LSM identifier > > that they pertain to. Since there may be a need to configure multiple > > namespaces there would be a requirement to pass an array or list of > > these parameter sets. > Just like lsm_set_self_attr(2). That provides basic infrastructure, however, with concession to the general acknowledgement that every LSM is different, the requirement for every attribute to have a unique descriptive identity value may prove restrictive, particularly in model based LSM's. What may be needed is an agnostic attribute identifier that orchestration software could use, in combination with the 'flags' variable to specify exactly what type of attribute is being delivered by the system call to an LSM. In other words, the attribute would tell an LSM to interpret the flags value as an indicator of the payload being delivered. 
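Dr. Greg's two requirements, an array of per-LSM parameter sets keyed by LSM identifier and an attribute-agnostic record whose `flags` say what kind of payload it carries, can be combined in one buffer layout. As merged, lsm_set_self_attr(2) accepts a single `struct lsm_ctx`; the packed array here borrows the shape of lsm_get_self_attr(2)'s output. The following is a hedged sketch only: the `EX_PAYLOAD_*` selectors and both helper functions are hypothetical, and the header struct is a local mirror of `struct lsm_ctx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct lsm_ctx in <linux/lsm.h>. */
struct lsm_ctx_hdr {
    uint64_t id, flags, len, ctx_len;
    uint8_t ctx[];
};

#define EX_LSM_ID_SELINUX 101  /* LSM_ID_SELINUX */
#define EX_LSM_ID_APPARMOR 104 /* LSM_ID_APPARMOR */

/* HYPOTHETICAL payload selectors carried in the flags field. */
#define EX_PAYLOAD_POLICY 1u
#define EX_PAYLOAD_MODEL 2u

/* Append one 8-byte-aligned record at offset off of an 8-byte-aligned
 * buffer; returns the new offset, or 0 if the record does not fit. */
size_t append_ctx(uint8_t *buf, size_t cap, size_t off, uint64_t id,
                  uint64_t flags, const void *data, uint64_t data_len)
{
    size_t len = (sizeof(struct lsm_ctx_hdr) + data_len + 7) & ~(size_t)7;
    struct lsm_ctx_hdr *ctx;

    if (off > cap || len > cap - off)
        return 0;
    ctx = (struct lsm_ctx_hdr *)(buf + off);
    memset(ctx, 0, len);
    ctx->id = id;
    ctx->flags = flags;
    ctx->len = len;
    ctx->ctx_len = data_len;
    if (data_len)
        memcpy(ctx->ctx, data, data_len);
    return off + len;
}

/* What a receiving LSM might do: walk the array, skip other LSMs'
 * records, and use flags to pick the payload kind it understands. */
const struct lsm_ctx_hdr *find_payload(const uint8_t *buf, size_t size,
                                       uint64_t id, uint64_t kind)
{
    size_t off = 0;

    while (size - off >= sizeof(struct lsm_ctx_hdr)) {
        const struct lsm_ctx_hdr *ctx = (const struct lsm_ctx_hdr *)(buf + off);

        if (ctx->len < sizeof(*ctx) + ctx->ctx_len || ctx->len > size - off)
            return NULL; /* malformed record: stop walking */
        if (ctx->id == id && ctx->flags == kind)
            return ctx;
        off += ctx->len;
    }
    return NULL;
}
```

The length check on every record keeps a malformed or zero-length entry from stalling the walk, which matters if a single buffer carries configuration for several LSMs at once.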
> > There will also be a need to inject, possibly substantial amounts of > > policy or model information into the namespace, before execution in > > the context of the namespace begins. > Yup. A major downside of loadable policy. Irregardless of merit, it will be reality, see below. > > There will also be a need to decide whether namespace separation > > should occur at the request of the orchestrator or at the next fork, > > the latter model being what the other resource namespaces use. We > > believe the argument for direct separation can be made by looking at > > the gymnastics that orchestrators need to jump through with the > > 'change-on-fork' model. > > > > Case in point, it would seem realistic that a process with sufficient > > privilege, may desire to place itself in a new LSM namespace context > > in a manner that does not require re-execution of itself. > > > > With respect to separation, the remaining issue is if a new security > > capability bit needs to be implemented to gate namespace separation. > > John, based on your comments, I believe you would support this need? > I don't like the notion of a new capability for this. But then, I > object to almost every new capability proposed. Existing namespaces > don't need their own capabilities. I don't see this case as special. It appears that John is thinking that an LSM hook is what will be needed, so no new capability bit would be required. That concept seems consistent with the precedent that was established by using this type of scheme to control the creation of user namespaces. > >> You can do a subset with a single flag and only policy directing things, > >> but that would cut container managers out of the decision. Without a > >> universal container identifier that really limits what you can do.
In > >> another email I likened it to the MCS label approach to the container > >> where you have a single security policy for the container and each > >> container gets to be a unique instance of that policy. It's not a perfect > >> analogy, as with a namespace, policy can be loaded into the namespace, making > >> it unique. I don't think the approach is right because not all namespaces > >> implement a loadable policy, and even when they do I think we can do a > >> better job if the container manager is allowed to provide additional > >> context with the namespacing request. > > In order to be relevant, the configuration of LSM namespaces needs to > > be under control of a resource orchestrator or container manager. > I do not approve of kernel features that are pointless without > specific user space support. If it can't be used in ways other than > those defined by a particular user space component they really don't > belong in the kernel. It appears you have already created the necessary infrastructure with lsm_set_self_attr(2). Given the apparent consensus that an LSM is free to implement namespaces in whatever manner it pleases, an LSM can offer configuration of an instance of its security namespace with an LSM specific pseudo-filesystem interface. If a centralized namespace separation is pursued, what will be required is a method for loading policy/configuration before execution starts in the context of the namespace. > > What we hear from people doing Kubernetes, at scale, is a desire to be > > able to request that a container be run somewhere in the hardware > > resource pool and for that container to implement a security model > > specific to the needs of the workload running in that container. In a > > manner that is orthogonal from other security policies that may be in > > effect for other workloads, on the host or in other containers. > That sounds to me like they want per-container security policy. That
That > would require that the kernel have the 'concept' of a > container. That's not something I expect to see in my lifetime. Per-container security policy is the expectation that will be raised by the creation of LSM namespaces. We can speak very directly to that fact, from conversations with groups that are running fleets of thousands of virtual machines supporting tens of thousands of container instances. A 'container' is a set of kernel resource domains applied to an execution workload. An LSM namespace will be another resource domain that is placed around the workload by an orchestration system. Speaking from personal implementation experience. If the LSM namespace is entered and configured before the container runtime engine is started, you have in effect, created a per container security policy for that workload. There are a plethora of issues surrounding this but it may be best to leave those to further evolution of this discussion. Have a good remainder of the week. As always, Dr. Greg The Quixote Project - Flailing at the Travails of Cybersecurity https://github.com/Quixote-Project ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: LSM namespacing API 2025-09-04 2:16 ` Dr. Greg @ 2025-09-04 17:40 ` Casey Schaufler 0 siblings, 0 replies; 43+ messages in thread From: Casey Schaufler @ 2025-09-04 17:40 UTC To: Dr. Greg Cc: John Johansen, Serge E. Hallyn, Stephen Smalley, Paul Moore, linux-security-module, selinux, Casey Schaufler On 9/3/2025 7:16 PM, Dr. Greg wrote: > On Mon, Sep 01, 2025 at 10:31:43AM -0700, Casey Schaufler wrote: > > Hi, I hope mid-week has gone well for everyone. > >> On 9/1/2025 9:01 AM, Dr. Greg wrote: >>> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote: >>> >>> Good morning, I hope the week is starting well for everyone. >>> >>> Now that everyone is getting past the summer holiday season, it would >>> seem useful to specifically clarify some of the LSM namespace >>> implementation details. >>> >>>> On 8/21/25 07:26, Serge E. Hallyn wrote: >>>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote: >>>>>> On 8/19/25 10:47, Stephen Smalley wrote: >>>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> >>>>>>> wrote: >>>>>>>> Hello all, >>>>>>>> >>>>>>>> As most of you are likely aware, Stephen Smalley has been working on >>>>>>>> adding namespace support to SELinux, and the work has now progressed >>>>>>>> to the point where a serious discussion on the API is warranted.
For >>>>>>>> those of you who are unfamiliar with the details of Stephen's patchset, or >>>>>>>> simply need a refresher, he has some excellent documentation in his >>>>>>>> work-in-progress repo: >>>>>>>> >>>>>>>> * https://github.com/stephensmalley/selinuxns >>>>>>>> >>>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year >>>>>>>> about SELinux namespacing; you can watch the presentation here: >>>>>>>> >>>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM >>>>>>>> >>>>>>>> In the past you've heard me state, rather firmly at times, that I >>>>>>>> believe namespacing at the LSM framework layer to be a mistake, >>>>>>>> although if there is something that can be done to help facilitate the >>>>>>>> namespacing of individual LSMs at the framework layer, I would be >>>>>>>> supportive of that. I think that a single LSM namespace API, similar >>>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like >>>>>>>> us to have a discussion to see if we all agree on that, and if so, >>>>>>>> what such an API might look like. >>>>>>>> >>>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where >>>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's >>>>>>>> could opt into via callbacks. John is directly CC'd on this mail, so >>>>>>>> I'll let him expand on this idea. >>>>>>>> >>>>>>>> While I agree with John that a fs based API is problematic (see all of >>>>>>>> our discussions around the LSM syscalls), I'm concerned that a single >>>>>>>> clone*(2) flag will significantly limit our flexibility around how >>>>>>>> individual LSMs are namespaced, something I don't want to see happen. >>>>>>>> This makes me wonder about the potential for expanding >>>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support >>>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.
This would >>>>>>>> provide a single LSM framework API for an unshare operation while also >>>>>>>> providing a mechanism to pass LSM specific data via the lsm_ctx struct if >>>>>>>> needed. Just as we do with the other LSM_ATTR_* flags today, >>>>>>>> individual LSMs can opt in to the API fairly easily by providing a >>>>>>>> setselfattr() LSM callback. >>>>>>>> >>>>>>>> Thoughts? >>>>>>> I think we want to be able to unshare a specific security module >>>>>>> namespace without unsharing the others, i.e. just SELinux or just >>>>>>> AppArmor. >>>>>> yes, which is part of the problem with the single flag. That choice >>>>>> would be entirely at the policy level, without any input from userspace. >>>>> AIUI Paul's suggestion is the user can pre-set the details of which >>>>> lsms to unshare and how with the lsm_set_self_attr(), and then a >>>>> single CLONE_LSM effects that. >>>> yes, I was specifically addressing the conversation I had with Paul at >>>> LSS that Paul brought up. That is >>>> >>>> At LSS-NA this year, John Johansen and I had a brief discussion where >>>> he suggested a single LSM wide clone*(2) flag that individual LSM's >>>> could opt into via callbacks. >>>> >>>> the idea there isn't all that different than what Paul proposed. You >>>> could have a single flag, if you can provide ancillary information. But >>>> a single flag on its own isn't sufficient. >>> If one thing has come out of this thread, it would seem to be the fact >>> that there is going to be little commonality in the requirements that >>> various LSM's will have for the creation of a namespace. >>> >>> Given that, the most infrastructure that the LSM should provide would >>> be a common API for a resource orchestrator to request namespace >>> separation and to provide a framework for configuring the namespace >>> prior to when execution begins in the context of the namespace. >>> >>> The first issue to resolve would seem to be what namespace separation >>> implies.
>>> >>> John, if I interpret your comments in this discussion correctly, your >>> contention is that when namespace separation is requested, all of the >>> LSM's that implement namespaces will create a subordinate namespace, >>> is that a correct assumption? >>> >>> It would seem, consistent with the 'stacking' concept, that any LSM >>> with namespace capability that chooses not to separate, will result in >>> denial of the separation request. That in turn will imply the need to >>> unwind or delete any namespace context that other LSM's may have >>> allocated before the refusal occurred. >> Were it true that 'stacking' rated the status of a 'concept'. > If 'concept' doesn't work as a term, we can call it an agreement on > the co-existence of multiple security models. Sure. >> An LSM that is capable of namespacing (the definition of which is >> elusive at this time) should be allowed to decline participation in >> a namespace creation. > Given the above, a full stop may be in order. > > Perhaps, in pursuit of wisdom, we should call for a general consensus > among the group as to whether or not we have any clue as to what we > are doing? That's the purpose of this thread, I believe. Now, whether we'll ever get to true consensus seems unlikely, but I expect to see something close enough that the wailing of those opposed will fail to prevent acceptance. >> That, or there needs to be a convention for "null" namespaces, by >> which an LSM can pretend that it isn't involved in the new >> namespace. I think the latter smells funny and would invite >> "security people don't understand performance" remarks. No LSM >> should be allowed to prevent another from using namespaces. > Unfortunately that would seem to collide with the general consensus > that has evolved around 'stacking', as the means by which Linux > supports multiple LSM based security models/architectures. I don't see that at all. 
For whatever reason, the developers of namespaces chose to ignore the LSM infrastructure and the implications their scheme has upon it. Managing the combination of differing philosophies is often complex, and this is no exception. > The kernel security architecture admits to the notion that all of > the active LSM's have to agree that a specific security event be > allowed. If any LSM elects to deny a hook call, permission is denied > for the event. call_void_hook() > John responded to our e-mail in this thread and clarified that he > doesn't believe that a POSIX 1e style capability for namespace > separation is required. However, our understanding from his reply is > that he felt that LSM namespace creation itself should have its own > LSM hook/event. > > If this is the case, to be consistent with the stacking architecture, > any LSM should have the ability to deny security namespace creation > through its interpretation of the LSM namespace creation hook. > > For example, it would certainly seem to be a valid concept for > something like an enhanced 'lockdown' mode to deny the ability for any > processes to escape into an LSM policy domain other than what was > configured when the platform was placed in a locked down status. > > If we don't adhere to this model, we will have a 'snowflake' to > contend with in the LSM security model. Again, call_void_hook() >>> This model also implies that the orchestrator requesting the >>> separation will need to pass a set of parameters describing the >>> characteristics of each namespace, described by the LSM identifier >>> that they pertain to. Since there may be a need to configure multiple >>> namespaces there would be a requirement to pass an array or list of >>> these parameter sets. >> Just like lsm_set_self_attr(2). 
> That provides basic infrastructure, however, with concession to the > general acknowledgement that every LSM is different, the requirement > for every attribute to have a unique descriptive identity value may > prove restrictive, particularly in model based LSM's. That was an argument made against the lsm_set_self_attr() interface in the beginning. Even if lsm_set_self_attr() isn't the answer, it provides a clue on how to formulate one. > What may be needed is an agnostic attribute identifier that > orchestration software could use, in combination with the 'flags' > variable to specify exactly what type of attribute is being delivered > by the system call to an LSM. In other words, the attribute would > tell an LSM to interpret the flags value as an indicator of the > payload being delivered. That's what flags are for. Or did I miss something? >>> There will also be a need to inject, possibly substantial amounts of >>> policy or model information into the namespace, before execution in >>> the context of the namespace begins. >> Yup. A major downside of loadable policy. > Irregardless of merit, it will be reality, see below. s/Irregardless/Regardless/ "Irregardless" is not a word. >>> There will also be a need to decide whether namespace separation >>> should occur at the request of the orchestrator or at the next fork, >>> the latter model being what the other resource namespaces use. We >>> believe the argument for direct separation can be made by looking at >>> the gymnastics that orchestrators need to jump through with the >>> 'change-on-fork' model. >>> >>> Case in point, it would seem realistic that a process with sufficient >>> privilege, may desire to place itself in a new LSM namespace context >>> in a manner that does not require re-execution of itself. >>> >>> With respect to separation, the remaining issue is if a new security >>> capability bit needs to be implemented to gate namespace separation. 
>>> John, based on your comments, I believe you would support this need? >> I don't like the notion of a new capability for this. But then, I >> object to almost every new capability proposed. Existing namespaces >> don't need their own capabilities. I don't see this case as special. > It appears that John is thinking that an LSM hook is what will be > needed, so no new capability bit would be required. > > That concept seems consistent with the precedent that was established > by using this type of scheme to control the creation of user > namespaces. > >>>> You can do a subset with a single flag and only policy directing things, >>>> but that would cut container managers out of the decision. Without a >>>> universal container identifier that really limits what you can do. In >>>> another email I likened it to the MCS label approach to the container >>>> where you have a single security policy for the container and each >>>> container gets to be a unique instance of that policy. It's not a perfect >>>> analogy, as with a namespace, policy can be loaded into the namespace, making >>>> it unique. I don't think the approach is right because not all namespaces >>>> implement a loadable policy, and even when they do I think we can do a >>>> better job if the container manager is allowed to provide additional >>>> context with the namespacing request. >>> In order to be relevant, the configuration of LSM namespaces needs to >>> be under control of a resource orchestrator or container manager. >> I do not approve of kernel features that are pointless without >> specific user space support. If it can't be used in ways other than >> those defined by a particular user space component they really don't >> belong in the kernel. > It appears you have already created the necessary infrastructure with > lsm_set_self_attr(2).
> > Given the apparent consensus that an LSM is free to implement > namespaces in whatever manner it pleases, an LSM can offer > configuration of an instance of its security namespace with an LSM > specific pseudo-filesystem interface. > > If a centralized namespace separation is pursued, what will be > required is a method for loading policy/configuration before execution > starts in the context of the namespace. Just so. >>> What we hear from people doing Kubernetes, at scale, is a desire to be >>> able to request that a container be run somewhere in the hardware >>> resource pool and for that container to implement a security model >>> specific to the needs of the workload running in that container. In a >>> manner that is orthogonal from other security policies that may be in >>> effect for other workloads, on the host or in other containers. >> That sounds to me like they want per-container security policy. That >> would require that the kernel have the 'concept' of a >> container. That's not something I expect to see in my lifetime. > Per-container security policy is the expectation that will be raised > by the creation of LSM namespaces. We can speak very directly to that > fact, from conversations with groups that are running fleets of > thousands of virtual machines supporting tens of thousands of > container instances. All the more reason not to implement them at the LSM level. If you can't meet expectations, the effort is futile. > A 'container' is a set of kernel resource domains applied to an > execution workload. An LSM namespace will be another resource domain > that is placed around the workload by an orchestration system. A 'container' is whatever the snake oil sales rep says it is. Kata containers use virtual machines. Containers may be implemented without an "orchestration system". > Speaking from personal implementation experience. 
If the LSM > namespace is entered and configured before the container runtime > engine is started, you have in effect, created a per container > security policy for that workload. The base system policy will still be enforced. Having multiple policies in place is tricky. What we can't have is a system where the base policy is replaced rather than supplemented. It that not obvious? > There are a plethora of issues surrounding this but it may be best to > leave those to further evolution of this discussion. > > Have a good remainder of the week. > > As always, > Dr. Greg > > The Quixote Project - Flailing at the Travails of Cybersecurity > https://github.com/Quixote-Project > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: LSM namespacing API 2025-09-01 16:01 ` Dr. Greg 2025-09-01 17:31 ` Casey Schaufler @ 2025-09-02 10:55 ` John Johansen 2025-09-05 22:14 ` Dr. Greg 1 sibling, 1 reply; 43+ messages in thread From: John Johansen @ 2025-09-02 10:55 UTC To: Dr. Greg Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore, linux-security-module, selinux On 9/1/25 09:01, Dr. Greg wrote: > On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote: > > Good morning, I hope the week is starting well for everyone. > > Now that everyone is getting past the summer holiday season, it would > seem useful to specifically clarify some of the LSM namespace > implementation details. > >> On 8/21/25 07:26, Serge E. Hallyn wrote: >>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote: >>>> On 8/19/25 10:47, Stephen Smalley wrote: >>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> >>>>> wrote: >>>>>> >>>>>> Hello all, >>>>>> >>>>>> As most of you are likely aware, Stephen Smalley has been working on >>>>>> adding namespace support to SELinux, and the work has now progressed >>>>>> to the point where a serious discussion on the API is warranted. For >>>>>> those of you who are unfamiliar with the details of Stephen's patchset, or >>>>>> simply need a refresher, he has some excellent documentation in his >>>>>> work-in-progress repo: >>>>>> >>>>>> * https://github.com/stephensmalley/selinuxns >>>>>> >>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year >>>>>> about SELinux namespacing; you can watch the presentation here: >>>>>> >>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM >>>>>> >>>>>> In the past you've heard me state, rather firmly at times, that I >>>>>> believe namespacing at the LSM framework layer to be a mistake, >>>>>> although if there is something that can be done to help facilitate the >>>>>> namespacing of individual LSMs at the framework layer, I would be >>>>>> supportive of that.
I think that a single LSM namespace API, similar >>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like >>>>>> us to have a discussion to see if we all agree on that, and if so, >>>>>> what such an API might look like. >>>>>> >>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where >>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's >>>>>> could opt into via callbacks. John is directly CC'd on this mail, so >>>>>> I'll let him expand on this idea. >>>>>> >>>>>> While I agree with John that a fs based API is problematic (see all of >>>>>> our discussions around the LSM syscalls), I'm concerned that a single >>>>>> clone*(2) flag will significantly limit our flexibility around how >>>>>> individual LSMs are namespaced, something I don't want to see happen. >>>>>> This makes me wonder about the potential for expanding >>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support >>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. This would >>>>>> provide a single LSM framework API for an unshare operation while also >>>>>> providing a mechanism to pass LSM specific data via the lsm_ctx struct if >>>>>> needed. Just as we do with the other LSM_ATTR_* flags today, >>>>>> individual LSMs can opt in to the API fairly easily by providing a >>>>>> setselfattr() LSM callback. >>>>>> >>>>>> Thoughts? >>>>> >>>>> I think we want to be able to unshare a specific security module >>>>> namespace without unsharing the others, i.e. just SELinux or just >>>>> AppArmor. >>>> >>>> yes, which is part of the problem with the single flag. That choice >>>> would be entirely at the policy level, without any input from userspace. >>> >>> AIUI Paul's suggestion is the user can pre-set the details of which >>> lsms to unshare and how with the lsm_set_self_attr(), and then a >>> single CLONE_LSM effects that.
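Serge's reading of Paul's proposal has two steps: a task first records which LSMs should unshare (via lsm_set_self_attr()), and a single CLONE_LSM flag then applies exactly those choices. This also covers Stephen's case of unsharing only SELinux or only AppArmor. The following hedged state-model sketch illustrates it. `CLONE_LSM` does not exist, and the flag value, the per-task pending mask and the decision to consume the request once applied are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Indexes into the model's per-LSM arrays (not LSM_ID_* values). */
enum { EX_LSM_SELINUX = 0, EX_LSM_APPARMOR = 1, EX_LSM_SMACK = 2, EX_NR_LSMS = 3 };

#define EX_CLONE_LSM (1u << 0) /* HYPOTHETICAL clone flag bit */

struct task_model {
    uint32_t pending;           /* bit n: LSM n unshares at next CLONE_LSM */
    unsigned ns_id[EX_NR_LSMS]; /* current namespace per LSM; 0 = initial */
};

static unsigned next_ns_id = 1;

/* Step 1: stands in for an lsm_set_self_attr() call recording a request. */
void preset_unshare(struct task_model *t, int lsm)
{
    t->pending |= 1u << lsm;
}

/* Step 2: a clone with the flag gives the child a fresh namespace for
 * each pre-selected LSM; without the flag nothing changes. */
struct task_model clone_task(struct task_model *parent, unsigned flags)
{
    struct task_model child = *parent;

    child.pending = 0; /* requests are not inherited */
    if (flags & EX_CLONE_LSM) {
        for (int i = 0; i < EX_NR_LSMS; i++)
            if (parent->pending & (1u << i))
                child.ns_id[i] = next_ns_id++;
        parent->pending = 0; /* the request has been consumed */
    }
    return child;
}
```

John's objection maps onto this model directly: the flag alone carries no information, so without the pre-set step (or equivalent ancillary data) the choice of which LSMs unshare would fall entirely to policy.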
> >> yes, I was specifically addressing the conversation I had with Paul at >> LSS that Paul brought up. That is >> >> At LSS-NA this year, John Johansen and I had a brief discussion where >> he suggested a single LSM wide clone*(2) flag that individual LSM's >> could opt into via callbacks. >> >> the idea there isn't all that different than what Paul proposed. You >> could have a single flag, if you can provide ancillary information. But >> a single flag on its own isn't sufficient. > > If one thing has come out of this thread, it would seem to be the fact > that there is going to be little commonality in the requirements that > various LSM's will have for the creation of a namespace. > yes > Given that, the most infrastructure that the LSM should provide would > be a common API for a resource orchestrator to request namespace > separation and to provide a framework for configuring the namespace > prior to when execution begins in the context of the namespace. > hrmmm, certainly a common API. Any task could theoretically use the API; it doesn't have to be a resource orchestrator, but I suppose you could call it such. I also don't know that we need to provide a framework for configuring the namespace prior to when execution begins in the context of the namespace. It might be a nice to have, but configuring of LSMs is very LSM specific. We don't even have a common LSM policy load interface atm, though there is a proposal. Configuration is a step beyond that. Would it be nice to have, sure. Are we going to get that far, I don't know. > The first issue to resolve would seem to be what namespace separation > implies. > > John, if I interpret your comments in this discussion correctly, your > contention is that when namespace separation is requested, all of the > LSM's that implement namespaces will create a subordinate namespace, > is that a correct assumption? > No, not necessarily.
The task can request to "unshare/create" LSMs similar to requesting a set of system namespaces. Then every LSM, whether part of the request or not, gets to do their thing. If every LSM agrees, then a transition hook will process and each LSM will again do its thing. This would likely be what was requested but it's possible that an LSM not in the request will do something, based on its model. In the end userspace gets to make a request, each security policy is responsible for staying within its security model/policy. > It would seem, consistent with the 'stacking' concept, that any LSM > with namespace capability that chooses not to separate, will result in > denial of the separation request. That in turn will imply the need to Not necessarily. They could allow and choose not to transition. Or they could not create a namespace but update some state. > unwind or delete any namespace context that other LSM's may have > allocated before the refusal occurred. The request does need to be split into a permission hook and a transition hook similar to exec. If any LSM in the permission hook denies, the request is denied. If any LSM in the transition hook fails again the request will fail, and the LSMs would get their regular clean up hook called for the object associated. > > This model also implies that the orchestrator requesting the > separation will need to pass a set of parameters describing the > characteristics of each namespace, described by the LSM identifier > that they pertain to. Since there may be a need to configure multiple > namespaces there would be a requirement to pass an array or list of > these parameter sets. > yes it will require a list/array see lsm_set_self_attr(2) > There will also be a need to inject, possibly substantial amounts of > policy or model information into the namespace, before execution in > the context of the namespace begins. > Allowing for this and requiring this are two different things.
Like I said above we don't even currently have a common policy load interface. Configuration is another step beyond policy load. > There will also be a need to decide whether namespace separation > should occur at the request of the orchestrator or at the next fork, Or allow both, but yes a decision needs to be made > the latter model being what the other resource namespaces use. We > believe the argument for direct separation can be made by looking at > the gymnastics that orchestrators need to jump through with the > 'change-on-fork' model. > Looking at current system namespacing we have clone/unshare which really are on fork. setns enters existing namespaces. We either need to create new variants of clone/unshare or potentially have an LSM syscall that sets up additional parameters that then are triggered by clone/unshare. If going the latter route then it's just a matter of whether the LSM call returns a handle that can be operated on or not. > Case in point, it would seem realistic that a process with sufficient > privilege, may desire to place itself in a new LSM namespace context > in a manner that does not require re-execution of itself. > yes, but it is questionable whether security policy should allow that. At the very least security policy should be consulted and may deny it. > With respect to separation, the remaining issue is if a new security > capability bit needs to be implemented to gate namespace separation. > John, based on your comments, I believe you would support this need? > No, I don't think a capability (as in posix.1e) per se is needed. I think an LSM permission request is. >> You can do a subset with a single flag and only policy directing things, >> but that would cut container managers out of the decision. Without a >> universal container identifier that really limits what you can do.
In >> another email I likened it to the MCS label approach to the container >> where you have a single security policy for the container and each >> container gets to be a unique instance of that policy. It's not a perfect >> analogy, as with a namespace, policy can be loaded into the namespace, making >> it unique. I don't think the approach is right because not all namespaces >> implement a loadable policy, and even when they do I think we can do a >> better job if the container manager is allowed to provide additional >> context with the namespacing request. > > In order to be relevant, the configuration of LSM namespaces needs to > be under control of a resource orchestrator or container manager. > No, they must be under the control of the LSMs. > What we hear from people doing Kubernetes, at scale, is a desire to be > able to request that a container be run somewhere in the hardware > resource pool and for that container to implement a security model > specific to the needs of the workload running in that container. In a > manner that is orthogonal from other security policies that may be in > effect for other workloads, on the host or in other containers. > sure, assuming the host policy allows it. Otherwise it is just a host policy by-pass, which cannot be allowed. K8s people have a specific use case, they need to configure the host for that use case. They cannot expect that use case to work on a host that has been configured for, say, an MLS security constraint. > Hopefully the above will be of assistance in furthering discussion. > > Have a good week. > > As always, > Dr. Greg > > The Quixote Project - Flailing at the Travails of Cybersecurity > https://github.com/Quixote-Project
* Re: LSM namespacing API 2025-09-02 10:55 ` John Johansen @ 2025-09-05 22:14 ` Dr. Greg 2025-09-06 2:01 ` John Johansen 0 siblings, 1 reply; 43+ messages in thread From: Dr. Greg @ 2025-09-05 22:14 UTC To: John Johansen Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore, linux-security-module, selinux On Tue, Sep 02, 2025 at 03:55:39AM -0700, John Johansen wrote: Hi, I hope the week has gone well for everyone. > On 9/1/25 09:01, Dr. Greg wrote: > >On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote: > > > >Good morning, I hope the week is starting well for everyone. > > > >Now that everyone is getting past the summer holiday season, it would > >seem useful to specifically clarify some of the LSM namespace > >implementation details. > > > >>On 8/21/25 07:26, Serge E. Hallyn wrote: > >>>On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote: > >>>>On 8/19/25 10:47, Stephen Smalley wrote: > >>>>>On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> > >>>>>wrote: > >>>>>> > >>>>>>Hello all, > >>>>>> > >>>>>>As most of you are likely aware, Stephen Smalley has been working on > >>>>>>adding namespace support to SELinux, and the work has now progressed > >>>>>>to the point where a serious discussion on the API is warranted.
For > >>>>>>those of you are unfamiliar with the details or Stephen's patchset, or > >>>>>>simply need a refresher, he has some excellent documentation in his > >>>>>>work-in-progress repo: > >>>>>> > >>>>>>* https://github.com/stephensmalley/selinuxns > >>>>>> > >>>>>>Stephen also gave a (pre-recorded) presentation at LSS-NA this year > >>>>>>about SELinux namespacing, you can watch the presentation here: > >>>>>> > >>>>>>* https://www.youtube.com/watch?v=AwzGCOwxLoM > >>>>>> > >>>>>>In the past you've heard me state, rather firmly at times, that I > >>>>>>believe namespacing at the LSM framework layer to be a mistake, > >>>>>>although if there is something that can be done to help facilitate the > >>>>>>namespacing of individual LSMs at the framework layer, I would be > >>>>>>supportive of that. I think that a single LSM namespace API, similar > >>>>>>to our recently added LSM syscalls, may be such a thing, so I'd like > >>>>>>us to have a discussion to see if we all agree on that, and if so, > >>>>>>what such an API might look like. > >>>>>> > >>>>>>At LSS-NA this year, John Johansen and I had a brief discussion where > >>>>>>he suggested a single LSM wide clone*(2) flag that individual LSM's > >>>>>>could opt into via callbacks. John is directly CC'd on this mail, so > >>>>>>I'll let him expand on this idea. > >>>>>> > >>>>>>While I agree with John that a fs based API is problematic (see all of > >>>>>>our discussions around the LSM syscalls), I'm concerned that a single > >>>>>>clone*(2) flag will significantly limit our flexibility around how > >>>>>>individual LSMs are namespaced, something I don't want to see happen. > >>>>>>This makes me wonder about the potential for expanding > >>>>>>lsm_set_self_attr(2) to support a new LSM attribute that would support > >>>>>>a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. 
This would > >>>>>>provide a single LSM framework API for an unshare operation while also > >>>>>>providing a mechanism to pass LSM specific via the lsm_ctx struct if > >>>>>>needed. Just as we do with the other LSM_ATTR_* flags today, > >>>>>>individual LSMs can opt-in to the API fairly easily by providing a > >>>>>>setselfattr() LSM callback. > >>>>>> > >>>>>>Thoughts? > >>>>> > >>>>>I think we want to be able to unshare a specific security module > >>>>>namespace without unsharing the others, i.e. just SELinux or just > >>>>>AppArmor. > >>>> > >>>>yes which is part of the problem with the single flag. That choice > >>>>would be entirely at the policy level, without any input from userspace. > >>> > >>>AIUI Paul's suggestion is the user can pre-set the details of which > >>>lsms to unshare and how with the lsm_set_self_attr(), and then a > >>>single CLONE_LSM effects that. > > > >>yes, I was specifically addressing the conversation I had with Paul at > >>LSS that Paul brought up. That is > >> > >> At LSS-NA this year, John Johansen and I had a brief discussion where > >> he suggested a single LSM wide clone*(2) flag that individual LSM's > >> could opt into via callbacks. > >> > >>the idea there isn't all that different than what Paul proposed. You > >>could have a single flag, if you can provide ancillary information. But > >>a single flag on its own isn't sufficient. > > > >If one thing has come out of this thread, it would seem to be the fact > >that there is going to be little commonality in the requirements that > >various LSM's will have for the creation of a namespace. > yes Given that and the conversations to date, the open question may be whether there needs to be a common 'LSM namespace' infrastructure at all or just punt everything to LSM's that choose to implement namespaces. 
> >Given that, the most infrastructure that the LSM should provide would > >be a common API for a resource orchestrator to request namespace > >separation and to provide a framework for configuring the namespace > >prior to when execution begins in the context of the namespace. > hrmmm, certainly a common API. Any task could theoretically use the API; > it doesn't have to be a resource orchestrator, but I suppose you could > call it such. No argument that any task could call for separation. We seem to be dancing around the notion that the primary use, nee demand, for a security namespace will be to allow container specific security policies. In that scenario, the resource orchestrator or container runtime will be what is requesting a specific security model to be implemented in a namespace. > I also don't know that we need to provide a framework for configuring > the namespace prior to when execution begins in the context of the > namespace. It might be a nice to have, but configuring of LSMs is > very LSM specific. > > We don't even have a common LSM policy load interface atm, though there > is a proposal. Configuration is a step beyond that. Would it be nice > to have, sure. Are we going to get that far, I don't know. At least for model based LSM's, the configuration needs to occur before execution within the namespace begins in order to avoid possible races with respect to the security policy that gets effected. Casey advocates for the use of lsm_set_self_attr(2), which has the advantage of a common API and is probably sufficient if an LSM elects to provide a generic management interface. The system call is currently not namespace aware so the challenge will be how to direct the configuration payload to the correct namespace.
Given that limitation, it seems highly probable that individual LSM's will implement configuration/policy management via their various pseudo-filesystem implementations that will grow awareness for the namespace context that the commands are being issued for. > >The first issue to resolve would seem to be what namespace separation > >implies. > > > >John, if I interpret your comments in this discussion correctly, your > >contention is that when namespace separation is requested, all of the > >LSM's that implement namespaces will create a subordinate namespace, > >is that a correct assumption? > No, not necessarily. The task can request to "unshare/create" LSMs > similar to requesting a set of system namespaces. Then every LSM, > whether part of the request or not, gets to do their thing. If every > LSM agrees, then a transition hook will process and each LSM will > again do its thing. This would likely be what was requested but it's > possible that an LSM not in the request will do something, based on > its model. > > In the end userspace gets to make a request, each security policy is > responsible for staying within its security model/policy. This approach seems contrary to what Casey is advocating for in our conversations, but perhaps we misunderstand what he is saying. Casey indicated that no other LSM should be able to deny the ability of another LSM to create a namespace. As we noted in our exchange with him, this seems to violate the current LSM model where all of the LSM's need to agree that an event should be allowed, or it fails. > >It would seem, consistent with the 'stacking' concept, that any LSM > >with namespace capability that chooses not to separate, will result in > >denial of the separation request. That in turn will imply the need to > Not necessarily. They could allow and choose not to transition. Or > they could not create a namespace but update some state.
> >unwind or delete any namespace context that other LSM's may have > >allocated before the refusal occurred. > The request does need to be split into a permission hook and a > transition hook similar to exec. If any LSM in the permission hook > denies, the request is denied. If any LSM in the transition hook > fails again the request will fail, and the LSMs would get their > regular clean up hook called for the object associated. See above, the open question seems to be whether or not there is agreement that any LSM can generically deny namespace creation. Again, we may misunderstand Casey on this issue. > >This model also implies that the orchestrator requesting the > >separation will need to pass a set of parameters describing the > >characteristics of each namespace, described by the LSM identifier > >that they pertain to. Since there may be a need to configure multiple > >namespaces there would be a requirement to pass an array or list of > >these parameter sets. > yes it will require a list/array see lsm_set_self_attr(2) Again, the issue is making this system call namespace aware. > >There will also be a need to inject, possibly substantial amounts of > >policy or model information into the namespace, before execution in > >the context of the namespace begins. > Allowing for this and requiring this are two different things. Like > I said above we don't even currently have a common policy load > interface. Configuration is another step beyond policy load. It would seem the most straightforward path is to simply punt this to the LSM's themselves. If nothing else, it reduces the issues that everyone needs to agree on. > >There will also be a need to decide whether namespace separation > >should occur at the request of the orchestrator or at the next fork, > Or allow both, but yes a decision needs to be made Again, allow both at the discretion of the LSM. > >the latter model being what the other resource namespaces use.
We > >believe the argument for direct separation can be made by looking at > >the gymnastics that orchestrators need to jump through with the > >'change-on-fork' model. > Looking at current system namespacing we have clone/unshare which > really are on fork. setns enters existing namespaces. > > We either need to create new variants of clone/unshare or potentially > have an LSM syscall that sets up additional parameters that then are > triggered by clone/unshare. If going the latter route then it's just > a matter of whether the LSM call returns a handle that can be operated > on or not. We will find that current namespace semantics are challenging with respect to being a good model for LSM namespaces. Current namespaces focus on managing a single resource. In contrast, as we have seen in our discussions, an 'LSM namespace' involves multiple resources, each with their own specific requirements. On top of that we have the complication of 'stacking' where anything that happens will be the composite of what all the LSM's agree on, some of which may be in the root namespace and some of which may be in subordinate namespaces. The notion of a process entering a security namespace, aka setns, will be interesting. It would seem that this will require callbacks to every LSM that is participating in the namespace. Presumably all of the references to LSM security contexts will need to be suspended and replaced with references to the context(s) for the security namespace that is being entered. With respect to managing this effectively, we would advocate for a 64-bit global counter that gets incremented on each successful LSM namespace creation event. That would provide a unique handle for the namespace that will never wrap. > >Case in point, it would seem realistic that a process with sufficient > >privilege, may desire to place itself in a new LSM namespace context > >in a manner that does not require re-execution of itself.
> yes, but it is questionable whether security policy should allow that. > At the very least security policy should be consulted and may deny > it. What we are talking about here is the need to support a process requesting to run in an alternate LSM namespace without forking. The question of whether this should be allowed will be regulated by whatever composite security policy is operational, the same as would be the case with the switch on fork model. > >With respect to separation, the remaining issue is if a new security > >capability bit needs to be implemented to gate namespace separation. > >John, based on your comments, I believe you would support this need? > No, I don't think a capability (as in posix.1e) per se is needed. I > think an LSM permission request is. Once again, that seems inconsistent with what Casey is advocating. Although I'm sure he is happy that a new capability bit is not in the offing... :-) > >>You can do a subset with a single flag and only policy directing things, > >>but that would cut container managers out of the decision. Without a > >>universal container identifier that really limits what you can do. In > >>another email I likened it to the MCS label approach to the container > >>where you have a single security policy for the container and each > >>container gets to be a unique instance of that policy. It's not a perfect > >>analogy, as with a namespace, policy can be loaded into the namespace, making > >>it unique. I don't think the approach is right because not all namespaces > >>implement a loadable policy, and even when they do I think we can do a > >>better job if the container manager is allowed to provide additional > >>context with the namespacing request. > > > >In order to be relevant, the configuration of LSM namespaces needs to > >be under control of a resource orchestrator or container manager. > No, they must be under the control of the LSMs. I think we are talking past one another.
Configuration was perhaps a poor choice of vernacular; we were referring to policy or model load. As we mentioned in our exchange with Casey, the expectation for all of this from the user community will be to allow resource orchestrators to run a workload under the constraints of a specific security policy, where 'policy' should probably be plural. Stephen even notes this on the slides that are linked from his GitHub selinuxns site. > >What we hear from people doing Kubernetes, at scale, is a desire to be > >able to request that a container be run somewhere in the hardware > >resource pool and for that container to implement a security model > >specific to the needs of the workload running in that container. In a > >manner that is orthogonal from other security policies that may be in > >effect for other workloads, on the host or in other containers. > sure, assuming the host policy allows it. Otherwise it is just a host > policy by-pass, which cannot be allowed. K8s people have a specific > use case, they need to configure the host for that use case. They cannot > expect that use case to work on a host that has been configured > for, say, an MLS security constraint. Given that the concept of LSM stacking is overlaid on top of namespaces, the result of all this will be security policies that will be very interesting to reason about, particularly if multiple levels of namespacing are allowed. The other issue will be potential performance issues for LSM's that choose to chase permissions all the way back up to the root namespace. We've heard continuous suggestions that every pointer de-reference is problematic from a performance perspective. So, lots of issues to consider in all of this. Have a good weekend. As always, Dr. Greg The Quixote Project - Flailing at the Travails of Cybersecurity https://github.com/Quixote-Project
* Re: LSM namespacing API 2025-09-05 22:14 ` Dr. Greg @ 2025-09-06 2:01 ` John Johansen 0 siblings, 0 replies; 43+ messages in thread From: John Johansen @ 2025-09-06 2:01 UTC To: Dr. Greg Cc: Serge E. Hallyn, Stephen Smalley, Paul Moore, linux-security-module, selinux On 9/5/25 15:14, Dr. Greg wrote: > On Tue, Sep 02, 2025 at 03:55:39AM -0700, John Johansen wrote: > > Hi, I hope the week has gone well for everyone. > I wish, *sigh* >> On 9/1/25 09:01, Dr. Greg wrote: >>> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote: >>> >>> Good morning, I hope the week is starting well for everyone. >>> >>> Now that everyone is getting past the summer holiday season, it would >>> seem useful to specifically clarify some of the LSM namespace >>> implementation details. >>> >>>> On 8/21/25 07:26, Serge E. Hallyn wrote: >>>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote: >>>>>> On 8/19/25 10:47, Stephen Smalley wrote: >>>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul@paul-moore.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> Hello all, >>>>>>>> >>>>>>>> As most of you are likely aware, Stephen Smalley has been working on >>>>>>>> adding namespace support to SELinux, and the work has now progressed >>>>>>>> to the point where a serious discussion on the API is warranted.
For >>>>>>>> those of you are unfamiliar with the details or Stephen's patchset, or >>>>>>>> simply need a refresher, he has some excellent documentation in his >>>>>>>> work-in-progress repo: >>>>>>>> >>>>>>>> * https://github.com/stephensmalley/selinuxns >>>>>>>> >>>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year >>>>>>>> about SELinux namespacing, you can watch the presentation here: >>>>>>>> >>>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM >>>>>>>> >>>>>>>> In the past you've heard me state, rather firmly at times, that I >>>>>>>> believe namespacing at the LSM framework layer to be a mistake, >>>>>>>> although if there is something that can be done to help facilitate the >>>>>>>> namespacing of individual LSMs at the framework layer, I would be >>>>>>>> supportive of that. I think that a single LSM namespace API, similar >>>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like >>>>>>>> us to have a discussion to see if we all agree on that, and if so, >>>>>>>> what such an API might look like. >>>>>>>> >>>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where >>>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's >>>>>>>> could opt into via callbacks. John is directly CC'd on this mail, so >>>>>>>> I'll let him expand on this idea. >>>>>>>> >>>>>>>> While I agree with John that a fs based API is problematic (see all of >>>>>>>> our discussions around the LSM syscalls), I'm concerned that a single >>>>>>>> clone*(2) flag will significantly limit our flexibility around how >>>>>>>> individual LSMs are namespaced, something I don't want to see happen. >>>>>>>> This makes me wonder about the potential for expanding >>>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support >>>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. 
This would >>>>>>>> provide a single LSM framework API for an unshare operation while also >>>>>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if >>>>>>>> needed. Just as we do with the other LSM_ATTR_* flags today, >>>>>>>> individual LSMs can opt-in to the API fairly easily by providing a >>>>>>>> setselfattr() LSM callback. >>>>>>>> >>>>>>>> Thoughts? >>>>>>> >>>>>>> I think we want to be able to unshare a specific security module >>>>>>> namespace without unsharing the others, i.e. just SELinux or just >>>>>>> AppArmor. >>>>>> >>>>>> yes which is part of the problem with the single flag. That choice >>>>>> would be entirely at the policy level, without any input from userspace. >>>>> >>>>> AIUI Paul's suggestion is the user can pre-set the details of which >>>>> lsms to unshare and how with the lsm_set_self_attr(), and then a >>>>> single CLONE_LSM effects that. >>> >>>> yes, I was specifically addressing the conversation I had with Paul at >>>> LSS that Paul brought up. That is >>>> >>>> At LSS-NA this year, John Johansen and I had a brief discussion where >>>> he suggested a single LSM wide clone*(2) flag that individual LSM's >>>> could opt into via callbacks. >>>> >>>> the idea there isn't all that different than what Paul proposed. You >>>> could have a single flag, if you can provide ancillary information. But >>>> a single flag on its own isn't sufficient. >>> >>> If one thing has come out of this thread, it would seem to be the fact >>> that there is going to be little commonality in the requirements that >>> various LSM's will have for the creation of a namespace. > >> yes > > Given that and the conversations to date, the open question may be > whether there needs to be a common 'LSM namespace' infrastructure at > all or just punt everything to LSM's that choose to implement > namespaces. 
> >>> Given that, the most infrastructure that the LSM should provide would >>> be a common API for a resource orchestrator to request namespace >>> separation and to provide a framework for configuring the namespace >>> prior to when execution begins in the context of the namespace. > >> hrmmm, certainly a common API. Any task could theoretically use the API; >> it doesn't have to be a resource orchestrator, but I suppose you could >> call it such. > > No argument that any task could call for separation. > > We seem to be dancing around the notion that the primary use, nee > demand, for a security namespace will be to allow container specific > security policies. In that scenario, the resource orchestrator or > container runtime will be what is requesting a specific security > model to be implemented in a namespace. > no, that is one use of them. AppArmor is using namespaces for sub-confinement/privilege separation. They are also used for tiered policy restrictions, global blacklisting, and unprivileged user and application policy. >> I also don't know that we need to provide a framework for configuring >> the namespace prior to when execution begins in the context of the >> namespace. It might be a nice to have, but configuring of LSMs is >> very LSM specific. >> >> We don't even have a common LSM policy load interface atm, though there >> is a proposal. Configuration is a step beyond that. Would it be nice >> to have, sure. Are we going to get that far, I don't know. > > At least for model based LSM's, the configuration needs to occur > before execution within the namespace begins in order to avoid > possible races with respect to the security policy that gets effected. > depends on what you mean by configuration. There might be some config of the namespace, but policy doesn't necessarily need to be loaded. In both the unprivileged user and unprivileged application policy cases, policy needs to be loaded after the namespace is entered.
You can even split the container/orchestrator case: LXD, emulating a system, will want to load the system policy as part of the OS boot process, whereas the docker/k8s/sandboxing use cases have an orchestrator/sandbox app set up policy beforehand. > Casey advocates for the use of lsm_set_self_attr(2), which has the > advantage of a common API and is probably sufficient if an LSM elects > to provide a generic management interface. > yeah that or something similar seems to be the way to go > The system call is currently not namespace aware so the challenge will > be how to direct the configuration payload to the correct namespace. > yes > Given that limitation, it seems highly probable that individual LSM's > will implement configuration/policy management via their various > pseudo-filesystem implementations that will grow awareness for the > namespace context that the commands are being issued for. > possible. But ideally, if we get it right, they can expand the syscall instead of using an fs interface. An fs interface has lots of problems, such as needing to be available within a given namespace. If we want to be nesting namespaces (which we do), then mounting custom FSes into the namespace is extra setup, and things like proc may not even be available, depending on how the container is being set up. >>> The first issue to resolve would seem to be what namespace separation >>> implies. >>> >>> John, if I interpret your comments in this discussion correctly, your >>> contention is that when namespace separation is requested, all of the >>> LSM's that implement namespaces will create a subordinate namespace, >>> is that a correct assumption? > >> No, not necessarily. The task can request to "unshare/create" LSMs >> similar to requesting a set of system namespaces. Then every LSM, >> whether part of the request or not, gets to do their thing. If every >> LSM agrees, then a transition hook will process and each LSM will >> again do its thing.
>> This would likely be what was requested, but it's possible that an LSM not in the request will do something, based on its model.
>>
>> In the end, userspace gets to make a request, and each security policy is responsible for staying within its security model/policy.
>
> This approach seems contrary to what Casey is advocating for in our conversations, but perhaps we misunderstand what he is saying.
>
Maybe; it's not what I am getting from him, but I could be misunderstanding as well.

> Casey indicated that no LSM should be able to deny another LSM the ability to create a namespace.
>
Correct, at least in isolation. However, if it is tied to other namespace creation, say at clone/unshare, an LSM should be able to deny that and have the whole set fail. That is, an individual LSM can deny the creation of other, non-LSM namespaces that are happening at the same time. This may affect the creation of other LSM namespaces, but no individual LSM is denying another LSM the ability to create a namespace.

> As we noted in our exchange with him, this seems to violate the current LSM model, where all of the LSMs need to agree that an event should be allowed, or it fails.
>
There is good reason for it. Experience has shown that forcing each LSM to update its policy to account for the policy of another LSM is problematic. Allowing each LSM to manage itself based on its own policy, while the rest of the events are allowed or fail, is very practical.

>>> It would seem, consistent with the 'stacking' concept, that any LSM with namespace capability that chooses not to separate will result in denial of the separation request. That in turn will imply the need to
>
>> Not necessarily. They could allow and choose not to transition. Or they could not create a namespace but update some state.
>
>>> unwind or delete any namespace context that other LSMs may have allocated before the refusal occurred.
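A toy model of the all-or-nothing behavior under discussion: every LSM is consulted in a permission phase, and only if all agree does a transition phase run, with LSMs that already transitioned unwound through their cleanup hook if a later one fails. This is a hypothetical Python sketch, not kernel code; `ToyLSM`, `ns_permission`, `ns_transition`, `ns_free` and `request_namespaces` are invented names, and a request is simplified to the set of LSM names asking for a new namespace.

```python
from dataclasses import dataclass, field


class NamespaceRequestError(Exception):
    """The composite LSM namespace request was denied or failed."""


@dataclass
class ToyLSM:
    """Stand-in for one LSM; all names here are hypothetical."""
    name: str
    allow: bool = True          # result of this LSM's permission check
    transition_ok: bool = True  # whether this LSM's transition succeeds
    log: list = field(default_factory=list)

    def ns_permission(self, request):
        # Every LSM is consulted, whether or not it is named in the request.
        self.log.append("perm")
        return self.allow

    def ns_transition(self, request):
        if not self.transition_ok:
            raise NamespaceRequestError(f"{self.name}: transition failed")
        self.log.append("transition")
        # An LSM not named in the request may proceed without a new namespace.
        return f"{self.name}-ns" if self.name in request else None

    def ns_free(self, blob):
        # Regular cleanup hook for state set up during the transition.
        self.log.append("free")


def request_namespaces(lsms, request):
    """Run the permission phase for all LSMs, then the transition phase."""
    for lsm in lsms:
        if not lsm.ns_permission(request):
            raise NamespaceRequestError(f"{lsm.name}: permission denied")
    done = []
    try:
        for lsm in lsms:
            done.append((lsm, lsm.ns_transition(request)))
    except NamespaceRequestError:
        # Unwind the LSMs that already transitioned, most recent first.
        for lsm, blob in reversed(done):
            lsm.ns_free(blob)
        raise
    return {lsm.name: blob for lsm, blob in done}
```

In this model an LSM outside the request still takes part in both phases but returns no namespace, which mirrors the point that an LSM may allow a request without transitioning itself.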
>
>> The request does need to be split into a permission hook and a transition hook, similar to exec. If any LSM in the permission hook denies, the request is denied. If any LSM in the transition hook fails, again the request will fail, and the LSMs would get their regular cleanup hook called for the associated object.
>
> See above; the open question seems to be whether or not there is agreement that any LSM can generically deny namespace creation.
>
> Again, we may misunderstand Casey on this issue.
>
It's not about what an individual LSM is allowed to do, but what is happening at the system level. If system events are moving with the LSM event, the system event is fair game. Even if we are talking about individual LSM updates, a two-hook model may be needed when taking into account the constraints of creds and non-LSM permission checks.

>>> This model also implies that the orchestrator requesting the separation will need to pass a set of parameters describing the characteristics of each namespace, tagged with the identifier of the LSM they pertain to. Since there may be a need to configure multiple namespaces, there would be a requirement to pass an array or list of these parameter sets.
>
>> Yes, it will require a list/array; see lsm_set_self_attr(2).
>
> Again, the issue is making this system call namespace-aware.
>
Sure, or another similar syscall. I don't think we are saying that it has to be lsm_set_self_attr; rather, it provides an example of how to do this. It could be that it can be extended, or it could turn out that a new, similar call that meets the constraints is needed.

>>> There will also be a need to inject possibly substantial amounts of policy or model information into the namespace before execution in the context of the namespace begins.
>
>> Allowing for this and requiring this are two different things. As I said above, we don't even currently have a common policy load interface.
>> Configuration is another step beyond policy load.
>
> It would seem the most straightforward path is to simply punt this to the LSMs themselves. If nothing else, it reduces the issues that everyone needs to agree on.
>
Yes, configuration requirements are definitely a per-LSM thing.

>>> There will also be a need to decide whether namespace separation should occur at the request of the orchestrator or at the next fork,
>
>> Or allow both, but yes, a decision needs to be made.
>
> Again, allow both at the discretion of the LSM.
>
Sure.

>>> the latter model being what the other resource namespaces use. We believe the argument for direct separation can be made by looking at the gymnastics that orchestrators need to jump through with the 'change-on-fork' model.
>
>> Looking at current system namespacing, we have clone/unshare, which really are on fork, while setns enters existing namespaces.
>>
>> We either need to create new variants of clone/unshare, or potentially have an LSM syscall that sets up additional parameters that are then triggered by clone/unshare. If going the latter route, then it's just a matter of whether the LSM call returns a handle that can be operated on or not.
>
> We will find that current namespace semantics are challenging with respect to being a good model for LSM namespaces.
>
> Current namespaces focus on managing a single resource. In contrast, as we have seen in our discussions, an 'LSM namespace' involves multiple resources, each with its own specific requirements. On top of that, we have the complication of 'stacking', where anything that happens will be the composite of what all the LSMs agree on, some of which may be in the root namespace and some of which may be in subordinate namespaces.
>
It's easy to see why people call security people crazy :)

> The notion of a process entering a security namespace, aka setns, will be interesting.
> It would seem that this will require callbacks to every LSM that is participating in the namespace. Presumably all of the references to LSM security contexts will need to be suspended and replaced with references to the context(s) for the security namespace that is being entered.
>
Yes, setns from a security point of view is problematic.

> With respect to managing this effectively, we would advocate for a 64-bit global counter that gets incremented on each successful LSM namespace creation event. That would provide a unique handle for the namespace that will never wrap.
>
Uhmmm, a unique container ID? Well, I guess that is one way to guarantee this will never happen.

>>> Case in point, it would seem realistic that a process with sufficient privilege may desire to place itself in a new LSM namespace context in a manner that does not require re-execution of itself.
>
>> Yes, but it is questionable whether security policy should allow that. At the very least, security policy should be consulted and may deny it.
>
> What we are talking about here is the need to support a process requesting to run in an alternate LSM namespace without forking.
>
Sure, I support allowing a process to ask.

> The question of whether this should be allowed will be regulated by whatever composite security policy is operational, the same as would be the case with the switch-on-fork model.
>>
>>> With respect to separation, the remaining issue is whether a new security capability bit needs to be implemented to gate namespace separation. John, based on your comments, I believe you would support this need?
>
>> No, I don't think a capability (as in POSIX.1e) per se is needed. I think an LSM permission request is.
>
> Once again, that seems inconsistent with what Casey is advocating.
>
> Although I'm sure he is happy that a new capability bit is not in the offing... :-)
>
Not at all.
I think the distinction is that the LSM hook is asking the LSM that is being asked to be namespaced. That is, each LSM is consulted about itself.

>>>> You can do a subset with a single flag and only policy directing things, but that would cut container managers out of the decision. Without a universal container identifier, that really limits what you can do. In another email I likened it to the MCS label approach to containers, where you have a single security policy for the container and each container gets to be a unique instance of that policy. It's not a perfect analogy, as with namespaces, policy can be loaded into the namespace, making it unique. I don't think the approach is right because not all namespaces implement a loadable policy, and even when they do, I think we can do a better job if the container manager is allowed to provide additional context with the namespacing request.
>>>
>>> In order to be relevant, the configuration of LSM namespaces needs to be under the control of a resource orchestrator or container manager.
>
>> No, they must be under the control of the LSMs.
>
> I think we are talking past one another.
>
Quite possibly.

> Configuration was perhaps a poor choice of vernacular; we were referring to policy or model load.
>
Which is one part of configuration. It's conceivable that an LSM could have knobs to turn beyond policy.

> As we mentioned in our exchange with Casey, the expectation for all of this from the user community will be to allow resource orchestrators to run a workload under the constraints of a specific security policy.
>
Sure, that is the expectation of the container community. It's just not the only use.

> Where policy should probably be plural.
>
> Stephen even notes this on the slides that are linked from his GitHub selinuxns site.
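A quick check on the "will never wrap" claim for the proposed 64-bit global counter: even at an implausible one billion namespace creations per second, the counter lasts 2^64 / 10^9 ≈ 1.8 × 10^10 seconds, roughly 584 years. The Python sketch below models such a counter; the class and function names are hypothetical, and a kernel implementation would more plausibly use an atomic64_t bumped with atomic64_inc_return() than a lock.

```python
import threading

U64_MAX = 2**64 - 1


class NamespaceIdAllocator:
    """Hands out unique IDs from a single global 64-bit counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def alloc(self):
        # Serialize increments so concurrent creators never receive the same ID.
        with self._lock:
            if self._last == U64_MAX:
                raise OverflowError("64-bit namespace ID space exhausted")
            self._last += 1
            return self._last


def years_until_wrap(creations_per_second, bits=64):
    """Years needed to exhaust a `bits`-wide counter at a constant rate."""
    seconds = 2**bits / creations_per_second
    return seconds / (365.25 * 24 * 3600)
```

Calling `years_until_wrap(1e9)` reproduces the estimate above (just over 584 years).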
>
>>> What we hear from people doing Kubernetes at scale is a desire to be able to request that a container be run somewhere in the hardware resource pool, and for that container to implement a security model specific to the needs of the workload running in that container, in a manner that is orthogonal to other security policies that may be in effect for other workloads, on the host or in other containers.
>
>> Sure, assuming the host policy allows it. Otherwise it is just a host policy bypass, which cannot be allowed. K8s people have a specific use case; they need to configure the host for that use case. They cannot expect that use case to work on a host that has been configured for, say, an MLS security constraint.
>
> Given that the concept of LSM stacking is overlaid on top of namespaces, the result of all this will be security policies that will be very interesting to reason about, particularly if multiple levels of namespacing are allowed.
>
"Interesting"*TM* indeed.

> The other issue will be potential performance issues for LSMs that choose to chase permissions all the way back up to the root namespace. We've heard continuous suggestions that every pointer dereference is problematic from a performance perspective.
>
Oh, it is; the performance people can get snippy about just a few cycles. Ultimately, that is just the cost of stacking policy: the more layers you add, the higher the cost. AppArmor is already working towards a JIT of policy that will be able to flatten stacked policy, so the cost can be pushed back to the same as non-stacked. That, however, comes with the cost of increased memory use, and it will only deal with the AppArmor part of the whole stack.

> So, lots of issues to consider in all of this.
>
> Have a good weekend.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
> https://github.com/Quixote-Project
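The performance trade-off discussed in this exchange can be illustrated with a toy permission model in which each namespace level grants a bitmask and a request must be allowed at every level up to the root. Walking the chain costs one lookup per level, while a precomputed ("flattened") mask answers in a single test at the price of storing it. This is a hypothetical Python illustration of the idea only; it is not how AppArmor's policy JIT works, and the `PERM_*` values are invented.

```python
from dataclasses import dataclass
from typing import Optional

# Invented permission bits for this illustration only.
PERM_READ, PERM_WRITE, PERM_EXEC = 0x1, 0x2, 0x4


@dataclass
class PolicyNamespace:
    allowed: int                                # bits this level's policy grants
    parent: Optional["PolicyNamespace"] = None  # None for the root namespace

    def check_by_walking(self, want):
        """Require `want` at every level up to the root; cost grows with depth."""
        ns, levels = self, 0
        while ns is not None:
            levels += 1
            if want & ~ns.allowed:
                return False, levels
            ns = ns.parent
        return True, levels

    def flatten(self):
        """Intersect all levels once, so later checks need a single test."""
        mask, ns = ~0, self
        while ns is not None:
            mask &= ns.allowed
            ns = ns.parent
        return mask
```

A flattened mask has to be recomputed whenever any ancestor's policy changes and costs memory per namespace, which is the memory-for-speed trade-off described above.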
* Re: LSM namespacing API
  2025-08-21 7:46 ` John Johansen
  2025-08-21 14:26 ` Serge E. Hallyn
@ 2025-08-22 1:59 ` Paul Moore
  1 sibling, 0 replies; 43+ messages in thread
From: Paul Moore @ 2025-08-22 1:59 UTC
To: John Johansen; +Cc: Stephen Smalley, linux-security-module, selinux

On Thu, Aug 21, 2025 at 3:46 AM John Johansen <john.johansen@canonical.com> wrote:
> On 8/19/25 10:47, Stephen Smalley wrote:

...

> > This is handled for other Linux namespaces by opening a pseudo file under /proc/pid/ns and invoking setns(2), so not sure how we want to do it.
>
> That is a possible interface, not one that I like, so I would like to explore other options first.

Fair enough, suggestions are definitely welcome :)

--
paul-moore.com
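For reference, the existing mechanism Stephen mentions looks like this from userspace: open the namespace file under /proc/<pid>/ns and pass the descriptor to setns(2). The sketch uses Python's `os.setns()` and `os.CLONE_NEWUTS` (Linux, Python 3.12 or later); joining a UTS namespace typically requires CAP_SYS_ADMIN, and there is no LSM namespace file there today, which is part of what this thread is trying to decide.

```python
import os


def ns_path(pid, ns_type):
    """Path of the namespace file handed to setns(2), e.g. ns_type='uts'."""
    return f"/proc/{pid}/ns/{ns_type}"


def enter_uts_namespace(pid):
    """Join the UTS namespace of `pid` (Linux, Python 3.12+, needs privilege)."""
    fd = os.open(ns_path(pid, "uts"), os.O_RDONLY | os.O_CLOEXEC)
    try:
        # CLONE_NEWUTS makes setns() fail unless fd really is a UTS namespace.
        os.setns(fd, os.CLONE_NEWUTS)
    finally:
        os.close(fd)
```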
* Re: LSM namespacing API
  2025-08-19 14:56 LSM namespacing API Paul Moore
  2025-08-19 17:11 ` Casey Schaufler
  2025-08-19 17:47 ` Stephen Smalley
@ 2025-08-21 7:14 ` John Johansen
  2025-08-21 11:20 ` Dr. Greg
  3 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21 7:14 UTC
To: Paul Moore, linux-security-module, selinux; +Cc: Stephen Smalley

On 8/19/25 07:56, Paul Moore wrote:
> Hello all,
>
> As most of you are likely aware, Stephen Smalley has been working on adding namespace support to SELinux, and the work has now progressed to the point where a serious discussion on the API is warranted. For those of you are unfamiliar with the details or Stephen's patchset, or simply need a refresher, he has some excellent documentation in his work-in-progress repo:
>
> * https://github.com/stephensmalley/selinuxns
>
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year about SELinux namespacing, you can watch the presentation here:
>
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>
> In the past you've heard me state, rather firmly at times, that I believe namespacing at the LSM framework layer to be a mistake, although if there is something that can be done to help facilitate the namespacing of individual LSMs at the framework layer, I would be supportive of that. I think that a single LSM namespace API, similar to our recently added LSM syscalls, may be such a thing, so I'd like us to have a discussion to see if we all agree on that, and if so, what such an API might look like.
>
> At LSS-NA this year, John Johansen and I had a brief discussion where he suggested a single LSM wide clone*(2) flag that individual LSM's could opt into via callbacks. John is directly CC'd on this mail, so I'll let him expand on this idea.
>
> While I agree with John that a fs based API is problematic (see all of our discussions around the LSM syscalls), I'm concerned that a single clone*(2) flag will significantly limit our flexibility around how individual LSMs are namespaced, something I don't want to see happen. This makes me wonder about the potential for expanding lsm_set_self_attr(2) to support a new LSM attribute that would support a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. This would provide a single LSM framework API for an unshare operation while also providing a mechanism to pass LSM specific via the lsm_ctx struct if needed. Just as we do with the other LSM_ATTR_* flags today, individual LSMs can opt-in to the API fairly easily by providing a setselfattr() LSM callback.
>
> Thoughts?
>

Sorry, I have been dealing with a forced email migration that, uhhmmm, hasn't gone well.

So yes, we could do a single clone flag, but it does have significant issues, and it is not generic enough for every LSM, at least not without some way of providing augmented information.

A single clone flag means each LSM is completely in charge of its transitions (needed), but without any hinting from userspace container managers (this is a problem). Under the single flag, policy would have to drive what can be done, and that would be fairly limiting. It would allow for something like the current MCS labeling approach, but not a finer-grained Udica-style approach, at least not without an additional call similar to setexeccon(), or, as you have proposed more generically, LSM_ATTR_UNSHARE.

The more I have looked at it, the more convinced I am that the single clone flag approach is wrong and is just going to lead to problems.
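To make the "list/array of per-LSM parameter sets" idea concrete, here is a sketch that packs records back to back in the `struct lsm_ctx` layout from include/uapi/linux/lsm.h, where each record's `len` covers the header, the `ctx` payload and any padding. Assumptions, stated plainly: LSM_ATTR_UNSHARE is only a proposal, lsm_set_self_attr(2) currently takes a single record (it is lsm_get_self_attr(2) that returns a list in this layout), and the key=value payload strings are invented. The two ID values are the LSM_ID_SELINUX and LSM_ID_APPARMOR constants from that header.

```python
import struct

# Fixed-size header of struct lsm_ctx (include/uapi/linux/lsm.h):
#   __u64 id; __u64 flags; __u64 len; __u64 ctx_len; __u8 ctx[];
LSM_CTX_HEADER = struct.Struct("=QQQQ")
LSM_ID_SELINUX = 101
LSM_ID_APPARMOR = 104


def pack_lsm_ctx_list(entries, flags=0):
    """Pack (lsm_id, payload) pairs back to back, each padded to 8 bytes."""
    buf = bytearray()
    for lsm_id, payload in entries:
        used = LSM_CTX_HEADER.size + len(payload)
        rec_len = (used + 7) & ~7  # `len` covers header, ctx and padding
        buf += LSM_CTX_HEADER.pack(lsm_id, flags, rec_len, len(payload))
        buf += payload + bytes(rec_len - used)
    return bytes(buf)


def unpack_lsm_ctx_list(buf):
    """Walk the records by following each record's `len` field."""
    entries, off = [], 0
    while off < len(buf):
        lsm_id, _flags, rec_len, ctx_len = LSM_CTX_HEADER.unpack_from(buf, off)
        if rec_len < LSM_CTX_HEADER.size + ctx_len or off + rec_len > len(buf):
            raise ValueError("malformed lsm_ctx record")
        start = off + LSM_CTX_HEADER.size
        entries.append((lsm_id, bytes(buf[start:start + ctx_len])))
        off += rec_len
    return entries
```

Following `len` rather than `ctx_len` is what lets a reader skip records for LSMs it does not understand, which is one reason the layout suits a per-LSM opt-in API.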
* Re: LSM namespacing API
  2025-08-19 14:56 LSM namespacing API Paul Moore
  ` (2 preceding siblings ...)
  2025-08-21 7:14 ` John Johansen
@ 2025-08-21 11:20 ` Dr. Greg
  2025-08-21 14:44 ` John Johansen
  3 siblings, 1 reply; 43+ messages in thread
From: Dr. Greg @ 2025-08-21 11:20 UTC
To: Paul Moore; +Cc: linux-security-module, selinux, John Johansen, Stephen Smalley

On Tue, Aug 19, 2025 at 10:56:27AM -0400, Paul Moore wrote:

> Hello all,

Good morning, I hope the day is going well for everyone.

> As most of you are likely aware, Stephen Smalley has been working on adding namespace support to SELinux, and the work has now progressed to the point where a serious discussion on the API is warranted. For those of you are unfamiliar with the details or Stephen's patchset, or simply need a refresher, he has some excellent documentation in his work-in-progress repo:
>
> * https://github.com/stephensmalley/selinuxns
>
> Stephen also gave a (pre-recorded) presentation at LSS-NA this year about SELinux namespacing, you can watch the presentation here:
>
> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>
> In the past you've heard me state, rather firmly at times, that I believe namespacing at the LSM framework layer to be a mistake, although if there is something that can be done to help facilitate the namespacing of individual LSMs at the framework layer, I would be supportive of that. I think that a single LSM namespace API, similar to our recently added LSM syscalls, may be such a thing, so I'd like us to have a discussion to see if we all agree on that, and if so, what such an API might look like.
>
> At LSS-NA this year, John Johansen and I had a brief discussion where he suggested a single LSM wide clone*(2) flag that individual LSM's could opt into via callbacks. John is directly CC'd on this mail, so I'll let him expand on this idea.
>
> While I agree with John that a fs based API is problematic (see all of our discussions around the LSM syscalls), I'm concerned that a single clone*(2) flag will significantly limit our flexibility around how individual LSMs are namespaced, something I don't want to see happen. This makes me wonder about the potential for expanding lsm_set_self_attr(2) to support a new LSM attribute that would support a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. This would provide a single LSM framework API for an unshare operation while also providing a mechanism to pass LSM specific via the lsm_ctx struct if needed. Just as we do with the other LSM_ATTR_* flags today, individual LSMs can opt-in to the API fairly easily by providing a setselfattr() LSM callback.
>
> Thoughts?

There has been an adage that traces back to the writings of George Santayana in 1905 that seems relevant:

"Those who cannot remember the past are condemned to repeat it."

To that end, some input from more than a decade of our work on this issue. Some of our reflections below are relevant to issues being covered in downstream components of this thread, particularly by John in the last few hours.

We have had code on the table for three years with respect to the problem of generic namespacing of security policy/model/architecture, whatever one chooses to call it.

For everyone's reference, here are the URLs to the patch series:

V1:
https://lore.kernel.org/linux-security-module/20230204050954.11583-1-greg@enjellic.com/T/#t

V2:
https://lore.kernel.org/linux-security-module/20230710102319.19716-1-greg@enjellic.com/T/#t

V3:
https://lore.kernel.org/linux-security-module/20240401105015.27614-1-greg@enjellic.com/T/#t

V4:
https://lore.kernel.org/linux-security-module/20240826103728.3378-1-greg@enjellic.com/T/#t

We started this work about 13-15 years ago. We initially described our work, and the need for it, 10 years ago almost to this day.
See our 2015 paper at the Linux Security Summit in Seattle.

James Morris and Casey were in the first row; Stephen and a co-worker from the NSA were in the second row, to the speaker's left.

If one spends some time looking under the hood, TSEM is in large part about providing a generic framework for running multiple, independent and orthogonal security frameworks/policies/architectures, whatever one chooses to call these entities.

The reason we argue that TSEM is a generic framework is that, in our internal work, we have ported the major LSMs, including the IMA infrastructure, to run in isolated namespaces as plugins for TSEM's notion of Trusted Modeling Agents (TMAs). We also have ongoing work that enables Kubernetes to dispatch workloads using whatever LSM-based security policy container developers desire for their workloads.

Suffice it to say, we have hoed a lot of ground on the issues surrounding this, including issues surrounding production deployment of this type of technology.

In our initial implementation, circa 2015, we adopted the approach of using a CLONE_* flag and wired the implementation of security namespaces into the rest of the namespace infrastructure.

During COVID, we re-architected the entire implementation and moved to using a control file in the pseudo-filesystem that TSEM implements; we have never looked back on this decision.

TSEM security workloads are a poster child for security namespaces that require a number of different setup parameters. A command verb syntax with key=value pairs, written to a pseudo-file, has proven itself to be the most flexible approach when setting up security workloads.

With respect to namespace transition, we trigger the transition of a process to a new namespace (unsharing) when the process issues the request via the control file. This has proven to be, at once, the most straightforward and least security-prone approach.
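A command verb followed by key=value pairs, written to a control file, is simple to parse and validate. The sketch below is hypothetical: the verb and key names in the usage (`create`, `model`, `digest`) are placeholders, not TSEM's actual command grammar, which is defined in the patch series linked above.

```python
def parse_control_command(line):
    """Split 'verb key=value ...' into (verb, {key: value})."""
    fields = line.split()
    if not fields:
        raise ValueError("empty command")
    verb, args = fields[0], {}
    for item in fields[1:]:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed argument: {item!r}")
        if key in args:
            raise ValueError(f"duplicate key: {key!r}")
        args[key] = value
    return verb, args
```

For example, `parse_control_command("create model=example digest=sha256")` yields the verb `"create"` and a dictionary of two parameters; rejecting duplicate and malformed keys up front keeps a half-parsed request from ever reaching namespace setup.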
The other major, and thorny, issue is the notion of another process 'entering' a security namespace. There are a ton of open issues to be considered with this; the approach we took, which has worked well to date, is the notion of a 'trust orchestrator' that has responsibility for controlling the namespace. Any manipulation or control of the namespace is conducted through the orchestrator process.

If anyone chooses to look at our implementation, they will find that we 'bless' the orchestrator process, at the time of namespace creation, with access to the security namespace context control structure for the namespace being created. The orchestrator is the only entity that can access the security state of the namespace, other than processes within the namespace itself.

This significantly narrows the scope of vulnerability with respect to who or what can manipulate a security namespace. There are a number of thorny issues, which we have not seen anyone mention, that surface when an arbitrary process is allowed to enter a security namespace. Believe me when I say we have found a number of them by accident and incident.

So, the big picture.

Over a decade of experience with these issues suggests that Paul's premise, that most of these issues are best left to the specific LSMs that elect to implement namespacing, is correct.

The challenge is that this situation ends up being all or nothing.

The actual amount of code involved in unsharing a namespace is so trivial, in comparison to the work involved with setting up and maintaining state information for a security namespace context, that it seems to make little sense to implement this support at the level of the LSM infrastructure itself.
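The 'blessing' described here reduces to a small access rule: the orchestrator recorded when the namespace is created, plus tasks already running inside it, may touch the namespace's security state, and nobody else may. The following is a hypothetical Python model with invented names; TSEM's real checks live in its patches and may differ. It compares task identity rather than PID, on the assumption that a PID can be reused after the orchestrator exits.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Task:
    pid: int
    ns_id: int  # security namespace the task currently runs in


@dataclass
class SecurityNamespace:
    ns_id: int
    orchestrator: Task  # "blessed" once, when the namespace is created

    def may_access_state(self, task):
        # Compare identity, not PID: a recycled PID must not inherit the blessing.
        return task is self.orchestrator or task.ns_id == self.ns_id
```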
If the decision is made to provide generic namespace support, beyond a request to create a namespace, it will rapidly become a slippery slope with respect to the amount of infrastructure needed to address the complexities that arise because every security model is different from every other.

The caveat to this is if our notion of a 'trust orchestrator' were deemed to have merit. In that case, an LSM-based namespace separation architecture would provide a common point for the orchestrator to be 'blessed' with access to control of a namespace.

The other open issue is whether or not a separate capability should be implemented that allows the creation of a new security namespace. If one paws through our TSEM submissions, one will see that we proposed such a capability bit.

Casey noted, rather emphatically, that no new capabilities were going to be implemented in Linux, particularly for what was described as a 'toy' project. He indicated that CAP_MAC_ADMIN was the canonical capability that should be used for manipulating LSMs.

We will be very interested in seeing how a discussion around this evolves, as 'escaping' from an existing security context to a new one is an extremely critical operation from a security perspective, if one stands back and looks at the issue objectively. If the concept of a 'security orchestrator' is embraced, it would make perfect sense for the orchestrator to drop CAP_SEC_NS, or whatever it would be called, and retain CAP_MAC_ADMIN in order to manage the namespace.

So, lots of issues to consider: thorny, political and otherwise, on multiple fronts.

> paul-moore.com

Have a good day.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project
* Re: LSM namespacing API
  2025-08-21 11:20 ` Dr. Greg
@ 2025-08-21 14:44 ` John Johansen
  0 siblings, 0 replies; 43+ messages in thread
From: John Johansen @ 2025-08-21 14:44 UTC
To: Dr. Greg, Paul Moore; +Cc: linux-security-module, selinux, Stephen Smalley

On 8/21/25 04:20, Dr. Greg wrote:
> On Tue, Aug 19, 2025 at 10:56:27AM -0400, Paul Moore wrote:
>
>> Hello all,
>
> Good morning, I hope the day is going well for everyone.
>
>> As most of you are likely aware, Stephen Smalley has been working on adding namespace support to SELinux, and the work has now progressed to the point where a serious discussion on the API is warranted. For those of you are unfamiliar with the details or Stephen's patchset, or simply need a refresher, he has some excellent documentation in his work-in-progress repo:
>>
>> * https://github.com/stephensmalley/selinuxns
>>
>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year about SELinux namespacing, you can watch the presentation here:
>>
>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>
>> In the past you've heard me state, rather firmly at times, that I believe namespacing at the LSM framework layer to be a mistake, although if there is something that can be done to help facilitate the namespacing of individual LSMs at the framework layer, I would be supportive of that. I think that a single LSM namespace API, similar to our recently added LSM syscalls, may be such a thing, so I'd like us to have a discussion to see if we all agree on that, and if so, what such an API might look like.
>>
>> At LSS-NA this year, John Johansen and I had a brief discussion where he suggested a single LSM wide clone*(2) flag that individual LSM's could opt into via callbacks. John is directly CC'd on this mail, so I'll let him expand on this idea.
>>
>> While I agree with John that a fs based API is problematic (see all of our discussions around the LSM syscalls), I'm concerned that a single clone*(2) flag will significantly limit our flexibility around how individual LSMs are namespaced, something I don't want to see happen. This makes me wonder about the potential for expanding lsm_set_self_attr(2) to support a new LSM attribute that would support a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE. This would provide a single LSM framework API for an unshare operation while also providing a mechanism to pass LSM specific via the lsm_ctx struct if needed. Just as we do with the other LSM_ATTR_* flags today, individual LSMs can opt-in to the API fairly easily by providing a setselfattr() LSM callback.
>>
>> Thoughts?
>
> There has been an adage that traces back to the writings of George Santayana in 1905 that seems relevant:
>
> "Those who cannot remember the past are condemned to repeat it."
>
> To that end, some input from more than a decade of our work on this issue. Some of our reflections below are relevant to issues being covered in downstream components of this thread, particularly by John in the last few hours.
>
> We have had code on the table for three years with respect to the problem of generic namespacing of security policy/model/architecture, whatever one chooses to call it.
>
> For everyone's reference, here are the URLs to the patch series:
>
> V1:
> https://lore.kernel.org/linux-security-module/20230204050954.11583-1-greg@enjellic.com/T/#t
>
> V2:
> https://lore.kernel.org/linux-security-module/20230710102319.19716-1-greg@enjellic.com/T/#t
>
> V3:
> https://lore.kernel.org/linux-security-module/20240401105015.27614-1-greg@enjellic.com/T/#t
>
> V4:
> https://lore.kernel.org/linux-security-module/20240826103728.3378-1-greg@enjellic.com/T/#t
>
> We started this work about 13-15 years ago.
> We initially described our work, and the need for it, 10 years ago almost to this day. See our 2015 paper at the Linux Security Summit in Seattle.
>
> James Morris and Casey were in the first row; Stephen and a co-worker from the NSA were in the second row, to the speaker's left.
>
> If one spends some time looking under the hood, TSEM is in large part about providing a generic framework for running multiple, independent and orthogonal security frameworks/policies/architectures, whatever one chooses to call these entities.
>
> The reason we argue that TSEM is a generic framework is that, in our internal work, we have ported the major LSMs, including the IMA infrastructure, to run in isolated namespaces as plugins for TSEM's notion of Trusted Modeling Agents (TMAs). We also have ongoing work that enables Kubernetes to dispatch workloads using whatever LSM-based security policy container developers desire for their workloads.
>
> Suffice it to say, we have hoed a lot of ground on the issues surrounding this, including issues surrounding production deployment of this type of technology.
>
> In our initial implementation, circa 2015, we adopted the approach of using a CLONE_* flag and wired the implementation of security namespaces into the rest of the namespace infrastructure.
>
> During COVID, we re-architected the entire implementation and moved to using a control file in the pseudo-filesystem that TSEM implements; we have never looked back on this decision.
>
> TSEM security workloads are a poster child for security namespaces that require a number of different setup parameters. A command verb syntax with key=value pairs, written to a pseudo-file, has proven itself to be the most flexible approach when setting up security workloads.
>
> With respect to namespace transition, we trigger the transition of a process to a new namespace (unsharing) when the process issues the request via the control file.
> This has proven to be, at once, the most straightforward and least security-prone approach.
>
> The other major, and thorny, issue is the notion of another process 'entering' a security namespace. There are a ton of open issues to be considered with this; the approach we took, which has worked well to date, is the notion of a 'trust orchestrator' that has responsibility for controlling the namespace. Any manipulation or control of the namespace is conducted through the orchestrator process.
>
> If anyone chooses to look at our implementation, they will find that we 'bless' the orchestrator process, at the time of namespace creation, with access to the security namespace context control structure for the namespace being created. The orchestrator is the only entity that can access the security state of the namespace, other than processes within the namespace itself.
>
> This significantly narrows the scope of vulnerability with respect to who or what can manipulate a security namespace. There are a number of thorny issues, which we have not seen anyone mention, that surface when an arbitrary process is allowed to enter a security namespace. Believe me when I say we have found a number of them by accident and incident.
>
Indeed, this has to be tightly controlled, much more so than just creating a namespace. And it's not just the "security/LSM" namespace but the entire context around it: that is, whether or not you can step into, say, the mount namespace separately from the security/LSM namespace it was created with. Each and every one of those opens potential attack surface. Even if it turns out to be safe, you have to carefully evaluate each potential combination.

> So, the big picture.
>
> Over a decade of experience with these issues suggests that Paul's premise, that most of these issues are best left to the specific LSMs that elect to implement namespacing, is correct.
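The concern that "each potential combination" has to be evaluated can be put in numbers. setns(2) can join eight namespace types today (cgroup, ipc, mnt, net, pid, time, user, uts), and since Linux 5.8 a pidfd lets a caller join several of them in one call. Adding one hypothetical LSM namespace gives 2^9 - 1 = 511 non-empty combinations, 256 of which include the LSM namespace. A short Python enumeration, where the "lsm" entry is the only invented name:

```python
from itertools import combinations

# Namespace types setns(2) can join today, plus one hypothetical "lsm" type.
NS_TYPES = ["cgroup", "ipc", "mnt", "net", "pid", "time", "user", "uts", "lsm"]


def join_combinations(types):
    """All non-empty subsets of namespace types one entry operation could join."""
    return [frozenset(c)
            for r in range(1, len(types) + 1)
            for c in combinations(types, r)]


combos = join_combinations(NS_TYPES)
with_lsm = [c for c in combos if "lsm" in c]
print(len(combos), len(with_lsm))  # 511 256
```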
>
> The challenge is that this situation ends up being all or nothing.
>
> The actual amount of code involved in unsharing a namespace is so trivial, in comparison to the work involved with setting up and maintaining state information for a security namespace context, that it seems to make little sense to implement this support at the level of the LSM infrastructure itself.

Actually, I think that is pretty much the goal: just a minimal, thin layer that provides the hooks, and maybe an LSM blob object, for the individual LSMs to do their thing. Instead of each LSM implementing its own interface, there is a common one for container orchestrators to use to make the request.

> If the decision is made to provide generic namespace support, other than a request to create a namespace, it will rapidly become a slippery slope with respect to the amount of infrastructure needed to address the complexities associated with every security model being different from every other.

Yep, this is really just about a thin common API and minimal infrastructure around the existing system namespacing calls (clone, unshare, setns).

> The caveat to this is if our notion of a 'trust orchestrator' were deemed to have merit. In that case, an LSM-based namespace separation architecture would provide a common point for the orchestrator to be 'blessed' with access to control of a namespace.

A trust orchestrator isn't necessarily needed. Each LSM can manage its own trust within its policy. A trust orchestrator becomes more necessary when you are trying to do namespacing without the LSMs themselves participating in the decision around namespacing, which admittedly has largely been the current situation.

> The other open issue is whether or not a separate capability should be implemented that allows the creation of a new security namespace. If one paws through our TSEM submissions, one will see that we proposed such a capability bit.
>

It's not needed if individual LSMs are making decisions around namespacing based on policy. In fact, in that case it can even be harmful. Per-LSM policy would be finer grained, whereas a capability becomes a shared flag that lacks context. Examples abound in the kernel where we have a cap check without context and then a more context-based security check. Where the capability might be useful is when LSMs aren't dealing with the namespacing request directly.

> Casey noted, rather emphatically, that no new capabilities were going to be implemented in Linux, particularly for what was described as a 'toy' project. He indicated that CAP_MAC_ADMIN was the canonical capability that should be used for manipulating LSMs.

I disagree with the reuse of CAP_MAC_ADMIN. If there is going to be a capability around this, it should be distinct from MAC_ADMIN and MAC_OVERRIDE, as it has very different semantics.

> We will be very interested in seeing how a discussion around this evolves, as 'escaping' from an existing security context to a new one is an extremely critical operation from a security perspective, if one

Yes. I might have mentioned just how much I dislike setns().

> stands back and looks at the issue objectively. If the concept of a 'security orchestrator' is embraced, it would make perfect sense for the orchestrator to drop CAP_SEC_NS, or whatever it would be called, and retain CAP_MAC_ADMIN in order to manage the namespace.
>
> So lots of issues to consider; thorny, political and otherwise, on multiple fronts.
>
>> paul-moore.com
>
> Have a good day.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
> https://github.com/Quixote-Project
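John's "minimal thin layer" could look roughly like the following userspace C sketch: one common entry point that a container orchestrator calls, LSMs that opt in by registering callbacks, a framework-owned blob slot per LSM, and an optional capability check that is coarser than each LSM's own policy decision. Everything here (`struct lsm_ns_ops`, `lsm_ns_unshare()`, the labels, and the check-then-commit split) is a hypothetical illustration, not an existing or proposed kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch only: these types and functions are not kernel APIs. */

#define MAX_LSMS 4

/* What the framework knows about the requesting task. */
struct cred {
    bool has_cap;      /* outcome of a coarse, context-free capability check */
    const char *label; /* LSM-specific security context */
};

/* Callbacks an LSM registers to opt in; NULL means "not namespaced". */
struct lsm_ns_ops {
    const char *name;
    int (*may_unshare)(const struct cred *c);         /* policy decision only */
    void (*unshare)(const struct cred *c, int *blob); /* set up new state */
};

/* Framework-owned storage: one slot per registered LSM. */
struct lsm_ns_blob {
    int state[MAX_LSMS];
};

static const struct lsm_ns_ops *lsms[MAX_LSMS];
static int nr_lsms;

static void lsm_register(const struct lsm_ns_ops *ops)
{
    if (nr_lsms < MAX_LSMS)
        lsms[nr_lsms++] = ops;
}

/* The single common entry point an orchestrator would call. */
static int lsm_ns_unshare(const struct cred *c, struct lsm_ns_blob *blob,
                          bool require_cap)
{
    /* Optional coarse gate: it has no context, so it cannot express LSM policy. */
    if (require_cap && !c->has_cap)
        return -EPERM;

    /* Phase 1: each opted-in LSM decides from its own policy; any denial
     * aborts before any LSM has changed state (all or nothing). */
    for (int i = 0; i < nr_lsms; i++) {
        if (lsms[i]->may_unshare != NULL) {
            int err = lsms[i]->may_unshare(c);
            if (err)
                return err;
        }
    }

    /* Phase 2: no LSM objected, so each one sets up its own namespace state. */
    for (int i = 0; i < nr_lsms; i++) {
        if (lsms[i]->unshare != NULL)
            lsms[i]->unshare(c, &blob->state[i]);
    }
    return 0;
}

/* Example LSM A: permits a new namespace only for one made-up label. */
static int a_may_unshare(const struct cred *c)
{
    return strcmp(c->label, "container_runtime") == 0 ? 0 : -EACCES;
}

static void a_unshare(const struct cred *c, int *blob)
{
    (void)c;
    *blob = 1; /* mark that LSM A now has a child namespace for this task */
}

static const struct lsm_ns_ops lsm_a = { "lsm_a", a_may_unshare, a_unshare };

/* Example LSM B: registers no callbacks, so the framework skips it. */
static const struct lsm_ns_ops lsm_b = { "lsm_b", NULL, NULL };
```

Splitting the policy decision from the state setup is one way to keep a request all or nothing across several LSMs; it also shows why a context-free capability adds little once each LSM is deciding from its own policy. A real design would also need hooks for clone() and setns(), which this sketch does not model.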
Thread overview: 43+ messages
2025-08-19 14:56 LSM namespacing API Paul Moore
2025-08-19 17:11 ` Casey Schaufler
2025-08-19 18:40 ` Paul Moore
2025-08-19 18:58 ` Stephen Smalley
2025-08-21 7:26 ` John Johansen
2025-08-21 7:23 ` John Johansen
2025-08-22 1:57 ` Paul Moore
2025-08-22 14:30 ` John Johansen
2025-08-21 10:00 ` Mickaël Salaün
2025-08-22 2:14 ` Paul Moore
2025-08-22 14:47 ` Casey Schaufler
2025-08-22 19:59 ` John Johansen
2025-08-23 17:41 ` Dr. Greg
2025-08-23 23:00 ` John Johansen
2025-08-19 17:47 ` Stephen Smalley
2025-08-19 18:51 ` Paul Moore
2025-08-19 18:52 ` Paul Moore
2025-08-20 14:44 ` Mickaël Salaün
2025-08-20 15:37 ` Casey Schaufler
2025-08-20 20:47 ` Paul Moore
2025-08-21 9:56 ` Mickaël Salaün
2025-08-21 14:18 ` John Johansen
2025-08-22 2:09 ` Paul Moore
2025-08-21 2:05 ` Serge E. Hallyn
2025-08-21 2:35 ` Paul Moore
2025-08-21 3:02 ` Serge E. Hallyn
2025-08-22 1:50 ` Paul Moore
2025-08-21 8:12 ` John Johansen
2025-08-21 8:07 ` John Johansen
2025-08-21 7:46 ` John Johansen
2025-08-21 14:26 ` Serge E. Hallyn
2025-08-21 14:57 ` John Johansen
2025-09-01 16:01 ` Dr. Greg
2025-09-01 17:31 ` Casey Schaufler
2025-09-04 2:16 ` Dr. Greg
2025-09-04 17:40 ` Casey Schaufler
2025-09-02 10:55 ` John Johansen
2025-09-05 22:14 ` Dr. Greg
2025-09-06 2:01 ` John Johansen
2025-08-22 1:59 ` Paul Moore
2025-08-21 7:14 ` John Johansen
2025-08-21 11:20 ` Dr. Greg
2025-08-21 14:44 ` John Johansen