* Re: [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux
2025-09-18 13:59 [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux Stephen Smalley
@ 2025-09-22 15:45 ` Casey Schaufler
2025-09-24 8:08 ` Dr. Greg
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Casey Schaufler @ 2025-09-22 15:45 UTC
To: Stephen Smalley, linux-security-module, selinux
Cc: paul, omosnace, john.johansen, serge, Casey Schaufler
On 9/18/2025 6:59 AM, Stephen Smalley wrote:
> RFC-only, will ultimately split the LSM-only changes to their own
> patch for submission. I have now tested this with the corresponding
> selinux userspace change that you can find at
> https://lore.kernel.org/selinux/20250918135118.9896-2-stephen.smalley.work@gmail.com/
> and also verified that my modified systemd-nspawn still works when
> starting containers with their own SELinux namespace.
>
> This defines a new LSM_ATTR_UNSHARE attribute for the
> lsm_set_self_attr(2) system call and wires it up for SELinux to invoke
> the underlying function for unsharing the SELinux namespace. As with
> the selinuxfs interface, this immediately unshares the SELinux
> namespace of the current process just like an unshare(2) system call
> would do for other namespaces. I have not yet explored the
> alternatives of deferring the unshare to the next unshare(2),
> clone(2), or execve(2) call and would want to first confirm that doing
> so does not introduce any issues in the kernel or make it harder to
> integrate with existing container runtimes.
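A minimal userspace sketch of invoking the proposed attribute. It assumes the RFC's LSM_ATTR_UNSHARE value of 106, LSM_ID_SELINUX (101) from <linux/lsm.h>, and a syscall number of 460 wherever libc headers lack SYS_lsm_set_self_attr. The struct is a local mirror of struct lsm_ctx so the sketch builds against older uapi headers, and the helper names are hypothetical. Because the attribute carries no value, ctx_len is 0 and the entry consists of the header alone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of struct lsm_ctx from <linux/lsm.h>. */
struct lsm_ctx_mirror {
	uint64_t id;      /* LSM_ID_* of the target module */
	uint64_t flags;   /* must be 0 for lsm_set_self_attr(2) */
	uint64_t len;     /* total size of this entry, header included */
	uint64_t ctx_len; /* size of ctx[]; 0 because UNSHARE takes no value */
	uint8_t ctx[];
};

#define LSM_ID_SELINUX_VAL   101u /* LSM_ID_SELINUX in <linux/lsm.h> */
#define LSM_ATTR_UNSHARE_VAL 106u /* proposed by this RFC, not upstream */

#ifndef SYS_lsm_set_self_attr
#define SYS_lsm_set_self_attr 460 /* assumption: unified syscall number */
#endif

/* Fill an lsm_ctx that addresses SELinux with an empty value. */
void build_unshare_ctx(struct lsm_ctx_mirror *ctx)
{
	assert(ctx != NULL);
	memset(ctx, 0, sizeof(*ctx));
	ctx->id = LSM_ID_SELINUX_VAL;
	ctx->len = sizeof(*ctx);
	ctx->ctx_len = 0;
}

/* Unshare the SELinux namespace of the calling task. Returns 0 on
 * success or -1 with errno set, e.g. EOPNOTSUPP on kernels without
 * this RFC or EACCES if policy denies process2:unshare_selinuxns. */
int selinux_unshare_ns(void)
{
	struct lsm_ctx_mirror ctx;

	build_unshare_ctx(&ctx);
	return (int)syscall(SYS_lsm_set_self_attr, LSM_ATTR_UNSHARE_VAL,
			    &ctx, (uint32_t)sizeof(ctx), 0u);
}
```

On a kernel without this patch the call is expected to fail rather than change anything, so a runtime can probe for support by checking errno.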
>
> Differences between this syscall interface and the selinuxfs interface
> that need discussion before moving forward:
>
> 1. The syscall interface does not currently check any Linux capability
> or DAC permissions, whereas the selinuxfs interface can only be set by
> uid-0 or CAP_DAC_OVERRIDE processes. We need to decide what if any
> capability or DAC check should apply to this syscall interface and if
> any, add the checks to either the LSM framework code or to the SELinux
> hook function.
>
> Pros: Checking a capability or DAC permissions prevents misuse of this
> interface by unprivileged processes, particularly on systems with
> policies that do not yet define any of the new SELinux permissions
> introduced for controlling this operation. This is a potential concern
> on Linux distributions that do not tightly coordinate kernel updates
> with policy updates (or where users may choose to deploy upstream
> kernels on their own), but not on Android.
>
> Cons: Checking a capability or DAC permissions requires any process
> that uses this facility to have the corresponding capability or
> permissions, which might otherwise be unnecessary and create
> additional risks. This is less likely if we use a capability already
> required by container runtimes and similar components that might
> leverage this facility for unsharing SELinux namespaces.
>
> 2. The syscall interface checks a new SELinux unshare_selinuxns
> permission in the process2 class between the task SID and itself,
> similar to other checks for setting process attributes. This means
> that:
> allow domain self:process2 *; -or-
> allow domain self:process2 ~anything-other-than-unshare_selinuxns; -or-
> allow domain self:process2 unshare_selinuxns;
> would allow a process to unshare its SELinux namespace.
>
> The selinuxfs interface checks a new unshare permission in the
> security class between the task SID and the security initial SID,
> likewise similar to other checks for setting selinuxfs attributes.
> This means that:
> allow domain security_t:security *; -or-
> allow domain security_t:security ~anything-other-than-unshare; -or-
> allow domain security_t:security unshare;
> would allow a process to unshare its SELinux namespace.
>
> Technically, the selinuxfs interface also currently requires open and
> write access to the selinuxfs node; hence:
> allow domain security_t:file { open write };
> is also required for the selinuxfs interface.
>
> We need to decide what we want the SELinux check(s) to be for the
> syscall and whether it should be more like the former (process
> attributes) or more like the latter (security policy settings). Note
> that the permission name itself is unimportant here and only differs
> because it seemed less evident in the process2 class that we are
> talking about a SELinux namespace otherwise.
>
> Regardless, either form of allow rule can be prohibited in policies
> via neverallow rules on systems that enforce their usage
> (e.g. Android, not necessarily on Linux distributions).
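The two policy checks can be contrasted with a toy allow-rule table. This is only a model: real SELinux compares SIDs and access vectors rather than strings, and runtime_t is a hypothetical domain. The rules mirror the examples above. The syscall path needs one process2 permission on self, while the selinuxfs path needs security:unshare plus file open and write on security_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct allow_rule {
	const char *source;
	const char *target; /* "self" means target == source */
	const char *tclass;
	const char *perm;
};

bool rule_allows(const struct allow_rule *rules, size_t n, const char *scon,
		 const char *tcon, const char *tclass, const char *perm)
{
	for (size_t i = 0; i < n; i++) {
		const struct allow_rule *r = &rules[i];
		const char *t = strcmp(r->target, "self") == 0 ? r->source : r->target;

		if (strcmp(r->source, scon) == 0 && strcmp(t, tcon) == 0 &&
		    strcmp(r->tclass, tclass) == 0 && strcmp(r->perm, perm) == 0)
			return true;
	}
	return false;
}

/* Syscall path in this RFC: task -> itself, process2:unshare_selinuxns. */
bool syscall_path_allowed(const struct allow_rule *rules, size_t n,
			  const char *domain)
{
	return rule_allows(rules, n, domain, domain, "process2",
			   "unshare_selinuxns");
}

/* selinuxfs path: security:unshare on security_t plus file open/write. */
bool selinuxfs_path_allowed(const struct allow_rule *rules, size_t n,
			    const char *domain)
{
	return rule_allows(rules, n, domain, "security_t", "security", "unshare") &&
	       rule_allows(rules, n, domain, "security_t", "file", "open") &&
	       rule_allows(rules, n, domain, "security_t", "file", "write");
}
```

Granting one interface therefore does not implicitly grant the other, which matters if both interfaces coexist.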
>
> 3. The selinuxfs interface currently offers more functionality than I
> have implemented here for the syscall interface, including:
>
> a) the ability to read the selinuxfs node to see if your namespace has
> been unshared, which should be easily implementable via
> lsm_get_self_attr(2). However, questions remain as to when that
> should return 1 versus 0 (currently returns 1 whenever your namespace
> is NOT the initial SELinux namespace, useful for the testsuite to
> detect it is in a child, but could instead be reset to 0 by a
> subsequent policy load to indicate completion of the setup of the
> namespace, thus hiding from child processes that they are in a child
> namespace once its policy has been loaded).
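If this state were exposed through lsm_get_self_attr(2), userspace would have to walk the returned array of lsm_ctx entries, each ctx->len bytes long, to find the SELinux entry. A sketch of that walk follows. It assumes a hypothetical readable LSM_ATTR_UNSHARE whose value is the string "1" or "0"; the struct is a local mirror of struct lsm_ctx and the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct lsm_ctx from <linux/lsm.h>. */
struct lsm_ctx_mirror {
	uint64_t id, flags, len, ctx_len;
	uint8_t ctx[];
};

/* Return the entry for lsm_id among `count` back-to-back entries in
 * buf (size bytes, 8-byte aligned), or NULL if absent or malformed. */
const struct lsm_ctx_mirror *find_lsm_ctx(const void *buf, size_t size,
					  unsigned int count, uint64_t lsm_id)
{
	const uint8_t *p = buf;
	size_t off = 0;

	assert(buf != NULL);
	for (unsigned int i = 0; i < count; i++) {
		const struct lsm_ctx_mirror *c;

		if (size - off < sizeof(*c))
			return NULL;            /* truncated header */
		c = (const struct lsm_ctx_mirror *)(p + off);
		if (c->len < sizeof(*c) || c->len > size - off)
			return NULL;            /* malformed entry length */
		if (c->id == lsm_id)
			return c;
		off += c->len;                  /* entries are packed back to back */
	}
	return NULL;
}

/* Interpret the hypothetical value: 1, 0, or -1 if unrecognized. */
int parse_unshared_flag(const struct lsm_ctx_mirror *c)
{
	if (c == NULL || c->ctx_len == 0)
		return -1;
	if (c->ctx[0] == '1')
		return 1;
	if (c->ctx[0] == '0')
		return 0;
	return -1;
}
```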
>
> b) the abilities to get and set the maximum number of SELinux
> namespaces (via a /sys/fs/selinux/maxns node) and to get and set the
> maximum depth for SELinux namespaces (via a /sys/fs/selinux/maxnsdepth
> node). These could be left in selinuxfs or migrated to some other LSM
> management APIs since they are global in scope, not per-process
> attributes.
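For reference, reading one of these global limits from userspace needs nothing beyond stdio. A sketch, assuming the maxns and maxnsdepth nodes each hold a single decimal number (both the format and the helper names are assumptions):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the decimal text of a limit node; rejects signs and junk. */
int parse_ns_limit(const char *text, unsigned long *out)
{
	char *end;
	unsigned long v;

	assert(text != NULL && out != NULL);
	if (text[0] < '0' || text[0] > '9')
		return -1;
	errno = 0;
	v = strtoul(text, &end, 10);
	if (errno != 0 || (*end != '\0' && *end != '\n'))
		return -1;
	*out = v;
	return 0;
}

/* Read e.g. /sys/fs/selinux/maxns; fails where the node is absent. */
int read_ns_limit(const char *path, unsigned long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_ns_limit(buf, out);
}
```

Because these values are system-wide, an LSM-wide management API for them would look more like a sysctl than a per-process attribute.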
>
> Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
This looks like an appropriate use of lsm_set_self_attr() to me.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> ---
> v2 fixes a typo (PROCESS->PROCESS2) and is now tested.
>
> include/uapi/linux/lsm.h | 1 +
> security/selinux/hooks.c | 8 ++++++++
> security/selinux/include/classmap.h | 4 +++-
> 3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/lsm.h b/include/uapi/linux/lsm.h
> index 938593dfd5da..fb1b4a8aa639 100644
> --- a/include/uapi/linux/lsm.h
> +++ b/include/uapi/linux/lsm.h
> @@ -83,6 +83,7 @@ struct lsm_ctx {
> #define LSM_ATTR_KEYCREATE 103
> #define LSM_ATTR_PREV 104
> #define LSM_ATTR_SOCKCREATE 105
> +#define LSM_ATTR_UNSHARE 106
>
> /*
> * LSM_FLAG_XXX definitions identify special handling instructions
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index f48483383d6e..1e34a16b7954 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6816,6 +6816,10 @@ static int selinux_lsm_setattr(u64 attr, void *value, size_t size)
> error = avc_has_perm(state, mysid, mysid, SECCLASS_PROCESS,
> PROCESS__SETCURRENT, NULL);
> break;
> + case LSM_ATTR_UNSHARE:
> + error = avc_has_perm(state, mysid, mysid, SECCLASS_PROCESS2,
> + PROCESS2__UNSHARE_SELINUXNS, NULL);
> + break;
> default:
> error = -EOPNOTSUPP;
> break;
> @@ -6927,6 +6931,10 @@ static int selinux_lsm_setattr(u64 attr, void *value, size_t size)
> }
>
> tsec->sid = sid;
> + } else if (attr == LSM_ATTR_UNSHARE) {
> + error = selinux_state_create(new);
> + if (error)
> + goto abort_change;
> } else {
> error = -EINVAL;
> goto abort_change;
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index be52ebb6b94a..07fe316308cd 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -60,7 +60,9 @@ const struct security_class_mapping secclass_map[] = {
> "siginh", "setrlimit", "rlimitinh", "dyntransition",
> "setcurrent", "execmem", "execstack", "execheap",
> "setkeycreate", "setsockcreate", "getrlimit", NULL } },
> - { "process2", { "nnp_transition", "nosuid_transition", NULL } },
> + { "process2",
> + { "nnp_transition", "nosuid_transition", "unshare_selinuxns",
> + NULL } },
> { "system",
> { "ipc_info", "syslog_read", "syslog_mod", "syslog_console",
> "module_request", "module_load", "firmware_load",
* Re: [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux
2025-09-18 13:59 [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux Stephen Smalley
2025-09-22 15:45 ` Casey Schaufler
@ 2025-09-24 8:08 ` Dr. Greg
2025-09-24 18:11 ` Casey Schaufler
2025-09-29 3:35 ` Serge E. Hallyn
2025-09-30 12:32 ` Stephen Smalley
3 siblings, 1 reply; 6+ messages in thread
From: Dr. Greg @ 2025-09-24 8:08 UTC
To: Stephen Smalley
Cc: linux-security-module, selinux, paul, omosnace, john.johansen,
serge, casey
On Thu, Sep 18, 2025 at 09:59:05AM -0400, Stephen Smalley wrote:
Good morning, I hope the week is going well for everyone.
> RFC-only, will ultimately split the LSM-only changes to their own
> patch for submission. I have now tested this with the corresponding
> selinux userspace change that you can find at
> https://lore.kernel.org/selinux/20250918135118.9896-2-stephen.smalley.work@gmail.com/
> and also verified that my modified systemd-nspawn still works when
> starting containers with their own SELinux namespace.
>
> This defines a new LSM_ATTR_UNSHARE attribute for the
> lsm_set_self_attr(2) system call and wires it up for SELinux to invoke
> the underlying function for unsharing the SELinux namespace. As with
> the selinuxfs interface, this immediately unshares the SELinux
> namespace of the current process just like an unshare(2) system call
> would do for other namespaces. I have not yet explored the
> alternatives of deferring the unshare to the next unshare(2),
> clone(2), or execve(2) call and would want to first confirm that doing
> so does not introduce any issues in the kernel or make it harder to
> integrate with existing container runtimes.
>
> Differences between this syscall interface and the selinuxfs interface
> that need discussion before moving forward:
>
> 1. The syscall interface does not currently check any Linux capability
> or DAC permissions, whereas the selinuxfs interface can only be set by
> uid-0 or CAP_DAC_OVERRIDE processes. We need to decide what if any
> capability or DAC check should apply to this syscall interface and if
> any, add the checks to either the LSM framework code or to the SELinux
> hook function.
>
> Pros: Checking a capability or DAC permissions prevents misuse of this
> interface by unprivileged processes, particularly on systems with
> policies that do not yet define any of the new SELinux permissions
> introduced for controlling this operation. This is a potential concern
> on Linux distributions that do not tightly coordinate kernel updates
> with policy updates (or where users may choose to deploy upstream
> kernels on their own), but not on Android.
>
> Cons: Checking a capability or DAC permissions requires any process
> that uses this facility to have the corresponding capability or
> permissions, which might otherwise be unnecessary and create
> additional risks. This is less likely if we use a capability already
> required by container runtimes and similar components that might
> leverage this facility for unsharing SELinux namespaces.
>
> 2. The syscall interface checks a new SELinux unshare_selinuxns
> permission in the process2 class between the task SID and itself,
> similar to other checks for setting process attributes. This means
> that:
> allow domain self:process2 *; -or-
> allow domain self:process2 ~anything-other-than-unshare_selinuxns; -or-
> allow domain self:process2 unshare_selinuxns;
> would allow a process to unshare its SELinux namespace.
>
> The selinuxfs interface checks a new unshare permission in the
> security class between the task SID and the security initial SID,
> likewise similar to other checks for setting selinuxfs attributes.
> This means that:
> allow domain security_t:security *; -or-
> allow domain security_t:security ~anything-other-than-unshare; -or-
> allow domain security_t:security unshare;
> would allow a process to unshare its SELinux namespace.
>
> Technically, the selinuxfs interface also currently requires open and
> write access to the selinuxfs node; hence:
> allow domain security_t:file { open write };
> is also required for the selinuxfs interface.
>
> We need to decide what we want the SELinux check(s) to be for the
> syscall and whether it should be more like the former (process
> attributes) or more like the latter (security policy settings). Note
> that the permission name itself is unimportant here and only differs
> because it seemed less evident in the process2 class that we are
> talking about a SELinux namespace otherwise.
>
> Regardless, either form of allow rule can be prohibited in policies
> via neverallow rules on systems that enforce their usage
> (e.g. Android, not necessarily on Linux distributions).
>
> 3. The selinuxfs interface currently offers more functionality than I
> have implemented here for the syscall interface, including:
>
> a) the ability to read the selinuxfs node to see if your namespace has
> been unshared, which should be easily implementable via
> lsm_get_self_attr(2). However, questions remain as to when that
> should return 1 versus 0 (currently returns 1 whenever your namespace
> is NOT the initial SELinux namespace, useful for the testsuite to
> detect it is in a child, but could instead be reset to 0 by a
> subsequent policy load to indicate completion of the setup of the
> namespace, thus hiding from child processes that they are in a child
> namespace once its policy has been loaded).
>
> b) the abilities to get and set the maximum number of SELinux
> namespaces (via a /sys/fs/selinux/maxns node) and to get and set the
> maximum depth for SELinux namespaces (via a /sys/fs/selinux/maxnsdepth
> node). These could be left in selinuxfs or migrated to some other LSM
> management APIs since they are global in scope, not per-process
> attributes.
We had a number of exchanges regarding LSM namespacing in the thread
that Paul Moore started on this issue:
https://lore.kernel.org/linux-security-module/CAHC9VhRGMmhxbajwQNfGFy+ZFF1uN=UEBjqQZQ4UBy7yds3eVQ@mail.gmail.com/
The one issue that seemed to achieve universal consensus was that
every LSM was going to have different requirements for namespacing.
At the risk of playing devil's advocate, this seems to raise the
question as to whether or not there is a need to have a common API for
requesting security namespace separation or leave the issue to LSM
specific implementations.
The primary rationale for some modicum of centralized infrastructure
would seem to be to have a system call rather than an LSM specific
pseudo-filesystem interface to control security namespaces. Since
creating a system call interface is going to lock the API in stone it
would seem that we would want to get this right, or at least as
generic as possible.
So some comments to that end.
If we use the lsm_set_self_attr(2) system call as our approach, the
namespace separation process needs to be split into two separate
calls. One to request the creation of a namespace and a second call
to request that the process join the new namespace.
This is required in order to support the ability for an orchestration
process to load a policy or model and have it in place before the new
namespace is allowed to enforce the policy or model.
So we would need something like an LSM_ATTR_UNSHARE_INIT as well as
the LSM_ATTR_UNSHARE attribute.
So the model would be for a process to issue an LSM_ATTR_UNSHARE_INIT
call to create the new security context namespace. That namespace
context can then be configured through either an LSM specific
pseudo-filesystem interface or alternatively through additional calls
to lsm_set_self_attr(2).
Once the configuration process is complete, the process would be set
free in its new namespace with the LSM_ATTR_UNSHARE attribute.
Separating from a security policy namespace to a new namespace will be
one of the most security sensitive operations that a system can
execute. As such it has to be gated by some type of security control.
At a minimum this needs to be uid-0 or possession of CAP_MAC_ADMIN.
Given the current LSM concept of stacking, there needs to be an LSM
security hook assigned, so as to give all of the LSMs an opportunity
to accept or deny the attribute operations.
For example, it would seem entirely reasonable that the lockdown LSM
may want to deny the ability to create any departures from the current
security configuration.
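A toy model of the gating described above: a uid-0 or CAP_MAC_ADMIN precheck followed by a stacked hook in which any LSM can veto. Stopping at the first non-zero return mirrors how most int-returning LSM hooks are combined; the hook type, operation codes and example hooks are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*secns_hook_fn)(int op);

enum { SECNS_OP_INIT, SECNS_OP_UNSHARE };

/* Precheck, then ask every stacked LSM in order; first denial wins. */
int secns_check(bool is_uid0, bool has_cap_mac_admin,
		const secns_hook_fn *hooks, size_t nhooks, int op)
{
	if (!is_uid0 && !has_cap_mac_admin)
		return -EPERM;          /* the proposed minimum gate */
	for (size_t i = 0; i < nhooks; i++) {
		int rc = hooks[i](op);

		if (rc != 0)
			return rc;      /* any LSM may veto */
	}
	return 0;
}

/* Example hooks: a permissive module and a lockdown-style veto. */
int permissive_hook(int op) { (void)op; return 0; }
int lockdown_like_hook(int op) { (void)op; return -EPERM; }
```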
Making this generic for any security namespace will require some
additional plumbing, most notably the ability for any LSM to register
for the ability to receive namespace event notifications. If we
create new security_task_secns_init() and
security_task_secns_unshare() hooks we could use those as both
notification and security control mechanisms.
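The notification half differs from the access-control half: every registered LSM should see the event, not only those consulted before the first denial. A sketch of such a registry, using hypothetical names; security_task_secns_init() and security_task_secns_unshare() are only proposals in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SECNS_LISTENERS 8

enum secns_event { SECNS_EVENT_INIT, SECNS_EVENT_UNSHARE };

typedef void (*secns_notify_fn)(enum secns_event ev, void *data);

struct secns_listener {
	secns_notify_fn fn;
	void *data;
};

static struct secns_listener listeners[MAX_SECNS_LISTENERS];
static size_t nlisteners;

int secns_register(secns_notify_fn fn, void *data)
{
	assert(fn != NULL);
	if (nlisteners == MAX_SECNS_LISTENERS)
		return -1;
	listeners[nlisteners].fn = fn;
	listeners[nlisteners].data = data;
	nlisteners++;
	return 0;
}

/* Deliver the event to every registered LSM, with no early exit. */
void secns_notify(enum secns_event ev)
{
	for (size_t i = 0; i < nlisteners; i++)
		listeners[i].fn(ev, listeners[i].data);
}

/* Example listener that counts events per LSM. */
void count_event(enum secns_event ev, void *data)
{
	int *counts = data;

	counts[ev]++;
}
```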
So lots of details to discuss but the above should be about the most
generic implementation that can be leveraged by all of the LSMs.
Comments/suggestions welcome.
Have a good remainder of the week.
As always,
Dr. Greg
The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project
* Re: [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux
2025-09-24 8:08 ` Dr. Greg
@ 2025-09-24 18:11 ` Casey Schaufler
0 siblings, 0 replies; 6+ messages in thread
From: Casey Schaufler @ 2025-09-24 18:11 UTC
To: Dr. Greg, Stephen Smalley
Cc: linux-security-module, selinux, paul, omosnace, john.johansen,
serge, Casey Schaufler
On 9/24/2025 1:08 AM, Dr. Greg wrote:
> On Thu, Sep 18, 2025 at 09:59:05AM -0400, Stephen Smalley wrote:
>
> Good morning, I hope the week is going well for everyone.
>
>> RFC-only, will ultimately split the LSM-only changes to their own
>> patch for submission. I have now tested this with the corresponding
>> selinux userspace change that you can find at
>> https://lore.kernel.org/selinux/20250918135118.9896-2-stephen.smalley.work@gmail.com/
>> and also verified that my modified systemd-nspawn still works when
>> starting containers with their own SELinux namespace.
>>
>> This defines a new LSM_ATTR_UNSHARE attribute for the
>> lsm_set_self_attr(2) system call and wires it up for SELinux to invoke
>> the underlying function for unsharing the SELinux namespace. As with
>> the selinuxfs interface, this immediately unshares the SELinux
>> namespace of the current process just like an unshare(2) system call
>> would do for other namespaces. I have not yet explored the
>> alternatives of deferring the unshare to the next unshare(2),
>> clone(2), or execve(2) call and would want to first confirm that doing
>> so does not introduce any issues in the kernel or make it harder to
>> integrate with existing container runtimes.
>>
>> Differences between this syscall interface and the selinuxfs interface
>> that need discussion before moving forward:
>>
>> 1. The syscall interface does not currently check any Linux capability
>> or DAC permissions, whereas the selinuxfs interface can only be set by
>> uid-0 or CAP_DAC_OVERRIDE processes. We need to decide what if any
>> capability or DAC check should apply to this syscall interface and if
>> any, add the checks to either the LSM framework code or to the SELinux
>> hook function.
>>
>> Pros: Checking a capability or DAC permissions prevents misuse of this
>> interface by unprivileged processes, particularly on systems with
>> policies that do not yet define any of the new SELinux permissions
>> introduced for controlling this operation. This is a potential concern
>> on Linux distributions that do not tightly coordinate kernel updates
>> with policy updates (or where users may choose to deploy upstream
>> kernels on their own), but not on Android.
>>
>> Cons: Checking a capability or DAC permissions requires any process
>> that uses this facility to have the corresponding capability or
>> permissions, which might otherwise be unnecessary and create
>> additional risks. This is less likely if we use a capability already
>> required by container runtimes and similar components that might
>> leverage this facility for unsharing SELinux namespaces.
>>
>> 2. The syscall interface checks a new SELinux unshare_selinuxns
>> permission in the process2 class between the task SID and itself,
>> similar to other checks for setting process attributes. This means
>> that:
>> allow domain self:process2 *; -or-
>> allow domain self:process2 ~anything-other-than-unshare_selinuxns; -or-
>> allow domain self:process2 unshare_selinuxns;
>> would allow a process to unshare its SELinux namespace.
>>
>> The selinuxfs interface checks a new unshare permission in the
>> security class between the task SID and the security initial SID,
>> likewise similar to other checks for setting selinuxfs attributes.
>> This means that:
>> allow domain security_t:security *; -or-
>> allow domain security_t:security ~anything-other-than-unshare; -or-
>> allow domain security_t:security unshare;
>> would allow a process to unshare its SELinux namespace.
>>
>> Technically, the selinuxfs interface also currently requires open and
>> write access to the selinuxfs node; hence:
>> allow domain security_t:file { open write };
>> is also required for the selinuxfs interface.
>>
>> We need to decide what we want the SELinux check(s) to be for the
>> syscall and whether it should be more like the former (process
>> attributes) or more like the latter (security policy settings). Note
>> that the permission name itself is unimportant here and only differs
>> because it seemed less evident in the process2 class that we are
>> talking about a SELinux namespace otherwise.
>>
>> Regardless, either form of allow rule can be prohibited in policies
>> via neverallow rules on systems that enforce their usage
>> (e.g. Android, not necessarily on Linux distributions).
>>
>> 3. The selinuxfs interface currently offers more functionality than I
>> have implemented here for the syscall interface, including:
>>
>> a) the ability to read the selinuxfs node to see if your namespace has
>> been unshared, which should be easily implementable via
>> lsm_get_self_attr(2). However, questions remain as to when that
>> should return 1 versus 0 (currently returns 1 whenever your namespace
>> is NOT the initial SELinux namespace, useful for the testsuite to
>> detect it is in a child, but could instead be reset to 0 by a
>> subsequent policy load to indicate completion of the setup of the
>> namespace, thus hiding from child processes that they are in a child
>> namespace once its policy has been loaded).
>>
>> b) the abilities to get and set the maximum number of SELinux
>> namespaces (via a /sys/fs/selinux/maxns node) and to get and set the
>> maximum depth for SELinux namespaces (via a /sys/fs/selinux/maxnsdepth
>> node). These could be left in selinuxfs or migrated to some other LSM
>> management APIs since they are global in scope, not per-process
>> attributes.
> We had a number of exchanges regarding LSM namespacing in the thread
> that Paul Moore started on this issue:
>
> https://lore.kernel.org/linux-security-module/CAHC9VhRGMmhxbajwQNfGFy+ZFF1uN=UEBjqQZQ4UBy7yds3eVQ@mail.gmail.com/
>
> The one issue that seemed to achieve universal consensus was that
> every LSM was going to have different requirements for namespacing.
>
> At the risk of playing devil's advocate, this seems to raise the
> question as to whether or not there is a need to have a common API for
> requesting security namespace separation or leave the issue to LSM
> specific implementations.
It does raise the question. I, for one, would like to see the
LSM infrastructure move toward being a bit more helpful regarding
things that the various LSMs do that can be done commonly. One
example of this is moving blob allocations out of the individual
modules. Another is the introduction of lsm properties
(struct lsm_prop) to replace the SELinux specific secid model.
When there were five LSMs total, only two to be used at a time,
leaving everything to the individual module was mildly tolerable.
Today we see that there's a heap of waste in data management and
duplicate code when even a base Fedora uses eight at a time.
I don't see this as a matter of taste. I see this as a case where
we can make the LSM infrastructure a little less chaotic.
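As an illustration of what moving blob allocations into the infrastructure means, each LSM declares how many bytes it needs and the framework hands out offsets into one shared allocation. The sketch below is a simplified userspace model loosely based on the kernel's lsm_set_blob_size(); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOB_ALIGN sizeof(void *)

struct lsm_blob_need {
	const char *lsm;
	size_t need;   /* bytes requested by the LSM; 0 means none */
	size_t offset; /* assigned by the framework */
};

/* Assign pointer-aligned offsets and return the shared blob size. */
size_t assign_blob_offsets(struct lsm_blob_need *lsms, size_t n)
{
	size_t total = 0;

	assert(lsms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (lsms[i].need == 0)
			continue;
		total = (total + BLOB_ALIGN - 1) / BLOB_ALIGN * BLOB_ALIGN;
		lsms[i].offset = total;
		total += lsms[i].need;
	}
	return total;
}
```

One allocation per object then serves every active LSM, instead of each module allocating and freeing its own.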
> The primary rationale for some modicum of centralized infrastructure
> would seem to be to have a system call rather than an LSM specific
> pseudo-filesystem interface to control security namespaces. Since
> creating a system call interface is going to lock the API in stone it
> would seem that we would want to get this right, or at least as
> generic as possible.
OK.
> So some comments to that end.
>
> If we use the lsm_set_self_attr(2) system call as our approach, the
> namespace separation process needs to be split into two separate
> calls. One to request the creation of a namespace and a second call
> to request that the process join the new namespace.
>
> This is required in order to support the ability for an orchestration
> process to load a policy or model and have it in place before the new
> namespace is allowed to enforce the policy or model.
>
> So we would need something like an LSM_ATTR_UNSHARE_INIT as well as
> the LSM_ATTR_UNSHARE attribute.
Stephen does not require this in his SELinux implementation.
Nor did the abandoned Smack namespaces. On the other hand, I
can see where a system with a dynamic security configuration
such as yours might indeed need this.
> So the model would be for a process to issue an LSM_ATTR_UNSHARE_INIT
> call to create the new security context namespace. That namespace
> context can then be configured through either an LSM specific
> pseudo-filesystem interface or alternatively through additional calls
> to lsm_set_self_attr(2).
>
> Once the configuration process is complete, the process would be set
> free in its new namespace with the LSM_ATTR_UNSHARE attribute.
>
> Separating from a security policy namespace to a new namespace will be
> one of the most security sensitive operations that a system can
> execute. As such it has to be gated by some type of security control.
>
> At a minimum this needs to be uid-0 or possession of CAP_MAC_ADMIN.
> Given the current LSM concept of stacking, there needs to be an LSM
> security hook assigned, so as to give all of the LSMs an opportunity
> to accept or deny the attribute operations.
You've got a paradigm conflict developing here. SELinux (for example)
would prefer to make its own decisions regarding "security relevant"
operations. Landlock, which is "unprivileged", ought to be able to
perform its namespace operations without undue interference. While
I generally think that all LSMs ought to integrate with capabilities,
I have to admit to the status quo.
> For example, it would seem entirely reasonable that the lockdown LSM
> may want to deny the ability to create any departures from the current
> security configuration.
Bluntly, that's not gonna happen. It would break Smack for one thing.
> Making this generic for any security namespace will require some
> additional plumbing, most notably the ability for any LSM to register
> for the ability to receive namespace event notifications. If we
> create new security_task_secns_init() and
> security_task_secns_unshare() hooks we could use those as both
> notification and security control mechanisms.
>
> So lots of details to discuss but the above should be about the most
> generic implementation that can be leveraged by all of the LSMs.
>
> Comments/suggestions welcome.
>
> Have a good remainder of the week.
If I must. :)
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
> https://github.com/Quixote-Project
* Re: [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux
2025-09-18 13:59 [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux Stephen Smalley
2025-09-22 15:45 ` Casey Schaufler
2025-09-24 8:08 ` Dr. Greg
@ 2025-09-29 3:35 ` Serge E. Hallyn
2025-09-30 12:32 ` Stephen Smalley
3 siblings, 0 replies; 6+ messages in thread
From: Serge E. Hallyn @ 2025-09-29 3:35 UTC
To: Stephen Smalley
Cc: linux-security-module, selinux, paul, omosnace, john.johansen,
serge, casey
On Thu, Sep 18, 2025 at 09:59:05AM -0400, Stephen Smalley wrote:
> RFC-only, will ultimately split the LSM-only changes to their own
> patch for submission. I have now tested this with the corresponding
> selinux userspace change that you can find at
> https://lore.kernel.org/selinux/20250918135118.9896-2-stephen.smalley.work@gmail.com/
> and also verified that my modified systemd-nspawn still works when
> starting containers with their own SELinux namespace.
>
> This defines a new LSM_ATTR_UNSHARE attribute for the
> lsm_set_self_attr(2) system call and wires it up for SELinux to invoke
> the underlying function for unsharing the SELinux namespace. As with
> the selinuxfs interface, this immediately unshares the SELinux
> namespace of the current process just like an unshare(2) system call
> would do for other namespaces. I have not yet explored the
> alternatives of deferring the unshare to the next unshare(2),
> clone(2), or execve(2) call and would want to first confirm that doing
> so does not introduce any issues in the kernel or make it harder to
> integrate with existing container runtimes.
Doing it immediately seems like the right thing to do, so that
the container runtime can keep the umount/remount of selinuxfs
together with the unshare instead of having to defer it until after
a later syscall.
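The ordering described above can be sketched as a runtime helper that unshares and then immediately replaces the selinuxfs mount. The ops are injectable so the sequence can be checked without privileges. In practice the unshare op would wrap lsm_set_self_attr(2) with LSM_ATTR_UNSHARE, and the mount steps assume the runtime already runs in a private mount namespace. All names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

struct unshare_ops {
	int (*unshare_selinuxns)(void *ctx);
	int (*umount_selinuxfs)(void *ctx);
	int (*mount_selinuxfs)(void *ctx);
	void *ctx;
};

/* Unshare first, then swap the selinuxfs mount so it reflects the
 * child namespace. Returns 0 on success, -1 on the first failure. */
int runtime_unshare_selinux(const struct unshare_ops *ops)
{
	assert(ops != NULL);
	if (ops->unshare_selinuxns(ops->ctx) != 0)
		return -1;      /* nothing changed yet */
	if (ops->umount_selinuxfs(ops->ctx) != 0)
		return -1;
	return ops->mount_selinuxfs(ops->ctx) == 0 ? 0 : -1;
}

/* Real mount steps; they need privilege. */
int real_umount_selinuxfs(void *ctx)
{
	(void)ctx;
	return umount2("/sys/fs/selinux", MNT_DETACH);
}

int real_mount_selinuxfs(void *ctx)
{
	(void)ctx;
	return mount("selinuxfs", "/sys/fs/selinux", "selinuxfs", 0, NULL);
}

/* Dry-run ops that only record the order of the steps. */
struct step_log {
	char steps[4];
	int n;
};

int log_unshare(void *ctx) { struct step_log *l = ctx; l->steps[l->n++] = 'U'; return 0; }
int log_umount(void *ctx) { struct step_log *l = ctx; l->steps[l->n++] = 'D'; return 0; }
int log_mount(void *ctx) { struct step_log *l = ctx; l->steps[l->n++] = 'M'; return 0; }
int fail_unshare(void *ctx) { (void)ctx; return -1; }
```

If the unshare itself fails, the existing selinuxfs mount is left untouched, which is the main benefit of keeping the two steps adjacent.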
> Differences between this syscall interface and the selinuxfs interface
> that need discussion before moving forward:
>
> 1. The syscall interface does not currently check any Linux capability
> or DAC permissions, whereas the selinuxfs interface can only be set by
> uid-0 or CAP_DAC_OVERRIDE processes. We need to decide what if any
> capability or DAC check should apply to this syscall interface and if
> any, add the checks to either the LSM framework code or to the SELinux
> hook function.
I think this should be done by the SELinux hook. And I suspect you
do want to require those privs, but I could be wrong.
> Pros: Checking a capability or DAC permissions prevents misuse of this
> interface by unprivileged processes, particularly on systems with
> policies that do not yet define any of the new SELinux permissions
> introduced for controlling this operation. This is a potential concern
> on Linux distributions that do not tightly coordinate kernel updates
> with policy updates (or where users may choose to deploy upstream
> kernels on their own), but not on Android.
Hm, that's an interesting problem.
> Cons: Checking a capability or DAC permissions requires any process
> that uses this facility to have the corresponding capability or
> permissions, which might otherwise be unnecessary and create
> additional risks. This is less likely if we use a capability already
> required by container runtimes and similar components that might
> leverage this facility for unsharing SELinux namespaces.
>
> 2. The syscall interface checks a new SELinux unshare_selinuxns
> permission in the process2 class between the task SID and itself,
> similar to other checks for setting process attributes. This means
> that:
> allow domain self:process2 *; -or-
> allow domain self:process2 ~anything-other-than-unshare_selinuxns; -or-
> allow domain self:process2 unshare_selinuxns;
> would allow a process to unshare its SELinux namespace.
>
> The selinuxfs interface checks a new unshare permission in the
> security class between the task SID and the security initial SID,
> likewise similar to other checks for setting selinuxfs attributes.
> This means that:
> allow domain security_t:security *; -or-
> allow domain security_t:security ~anything-other-than-unshare; -or-
> allow domain security_t:security unshare;
> would allow a process to unshare its SELinux namespace.
>
> Technically, the selinuxfs interface also currently requires open and
> write access to the selinuxfs node; hence:
> allow domain security_t:file { open write };
> is also required for the selinuxfs interface.
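For comparison, a userspace caller of the selinuxfs interface might look roughly like the following. This is a minimal sketch: the helper name is hypothetical. It also ASSUMES the node accepts the string "1", which the thread does not specify. The open() and write() calls here are what the security_t:file { open write } rule above has to cover.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Request an unshare through a selinuxfs node (e.g. /sys/fs/selinux/unshare).
 * ASSUMPTION: the node accepts the string "1". The caller needs open and write
 * access to the node (security_t:file { open write }), the
 * security_t:security unshare permission, and uid 0 or CAP_DAC_OVERRIDE.
 * Returns 0 on success, or -1 with errno set.
 */
int selinuxfs_unshare(const char *path)
{
	static const char val[] = "1";
	const size_t len = sizeof(val) - 1;

	int fd = open(path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	ssize_t n = write(fd, val, len);
	int saved_errno = errno;
	close(fd);
	if (n < 0) {
		errno = saved_errno;
		return -1;
	}
	if ((size_t)n != len) {
		errno = EIO;  /* short write */
		return -1;
	}
	return 0;
}
```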
>
> We need to decide what we want the SELinux check(s) to be for the
> syscall and whether it should be more like the former (process
> attributes) or more like the latter (security policy settings). Note
> that the permission name itself is unimportant here; it only differs
> because, in the process2 class, it would otherwise be less evident
> that we are talking about a SELinux namespace.
>
> Regardless, either form of allow rule can be prohibited in policies
> via neverallow rules on systems that enforce their usage
> (e.g. Android, not necessarily on Linux distributions).
>
> 3. The selinuxfs interface currently offers more functionality than I
> have implemented here for the syscall interface, including:
>
> a) the ability to read the selinuxfs node to see if your namespace has
> been unshared, which should be easily implementable via
> lsm_get_self_attr(2). However, questions remain as to when that
> should return 1 versus 0 (currently returns 1 whenever your namespace
> is NOT the initial SELinux namespace, useful for the testsuite to
> detect it is in a child, but could instead be reset to 0 by a
> subsequent policy load to indicate completion of the setup of the
> namespace, thus hiding from child processes that they are in a child
> namespace once its policy has been loaded).
Maybe 'unshare' means that an unshare is in progress, and add an
'unshared' attribute which is incremented on every unshare (and never
decremented) for use by the testsuite?
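The suggested split between an in-progress flag and a monotonic counter could be modeled roughly as follows. This is only a sketch: the struct and function names are hypothetical. It keeps a single status record and does not decide whether the counter is per-process, per-namespace, or global.

```c
#include <stdbool.h>

/* Hypothetical status record for the 'unshare'/'unshared' suggestion. */
struct selinuxns_status {
	bool unshare;            /* true while an unshare awaits its policy load */
	unsigned long unshared;  /* bumped on every unshare, never decremented */
};

/* Called when the process unshares its SELinux namespace. */
void on_unshare(struct selinuxns_status *s)
{
	s->unshare = true;
	s->unshared++;
}

/* Called when the new namespace's policy has been loaded. */
void on_policy_load(struct selinuxns_status *s)
{
	/* Setup is complete: clear only the in-progress flag; the counter stays. */
	s->unshare = false;
}
```

With this split, 'unshare' can hide the namespace from children once setup completes, while the testsuite can still rely on 'unshared' being nonzero.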
> b) the abilities to get and set the maximum number of SELinux
> namespaces (via a /sys/fs/selinux/maxns node) and to get and set the
> maximum depth for SELinux namespaces (via a /sys/fs/selinux/maxnsdepth
> node). These could be left in selinuxfs or migrated to some other LSM
> management APIs since they are global in scope, not per-process
> attributes.
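Wherever these global limits end up, a consumer of the current selinuxfs nodes would read them roughly as below. This sketch covers only reading via selinuxfs. The helper name is hypothetical, and the node format (a decimal number with an optional trailing newline) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read an unsigned decimal value from a selinuxfs node such as
 * /sys/fs/selinux/maxns or /sys/fs/selinux/maxnsdepth. The node names come
 * from the thread; the "decimal plus optional newline" format is an
 * ASSUMPTION. Returns 0 on success, or -1 with errno set.
 */
int read_selinuxfs_ulong(const char *path, unsigned long *out)
{
	char buf[32];
	int fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	ssize_t n = read(fd, buf, sizeof(buf) - 1);
	int saved_errno = errno;
	close(fd);
	if (n < 0) {
		errno = saved_errno;
		return -1;
	}
	buf[n] = '\0';

	/* Require a leading digit so strtoul() cannot accept "-1" or blanks. */
	if (buf[0] < '0' || buf[0] > '9') {
		errno = EINVAL;
		return -1;
	}
	char *end;
	errno = 0;
	unsigned long v = strtoul(buf, &end, 10);
	if (errno == ERANGE || (*end != '\0' && *end != '\n')) {
		errno = EINVAL;
		return -1;
	}
	*out = v;
	return 0;
}
```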
>
> Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
> v2 fixes a typo (PROCESS->PROCESS2) and is now tested.
>
> include/uapi/linux/lsm.h | 1 +
> security/selinux/hooks.c | 8 ++++++++
> security/selinux/include/classmap.h | 4 +++-
> 3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/lsm.h b/include/uapi/linux/lsm.h
> index 938593dfd5da..fb1b4a8aa639 100644
> --- a/include/uapi/linux/lsm.h
> +++ b/include/uapi/linux/lsm.h
> @@ -83,6 +83,7 @@ struct lsm_ctx {
> #define LSM_ATTR_KEYCREATE 103
> #define LSM_ATTR_PREV 104
> #define LSM_ATTR_SOCKCREATE 105
> +#define LSM_ATTR_UNSHARE 106
>
> /*
> * LSM_FLAG_XXX definitions identify special handling instructions
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index f48483383d6e..1e34a16b7954 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6816,6 +6816,10 @@ static int selinux_lsm_setattr(u64 attr, void *value, size_t size)
> error = avc_has_perm(state, mysid, mysid, SECCLASS_PROCESS,
> PROCESS__SETCURRENT, NULL);
> break;
> + case LSM_ATTR_UNSHARE:
> + error = avc_has_perm(state, mysid, mysid, SECCLASS_PROCESS2,
> + PROCESS2__UNSHARE_SELINUXNS, NULL);
> + break;
> default:
> error = -EOPNOTSUPP;
> break;
> @@ -6927,6 +6931,10 @@ static int selinux_lsm_setattr(u64 attr, void *value, size_t size)
> }
>
> tsec->sid = sid;
> + } else if (attr == LSM_ATTR_UNSHARE) {
> + error = selinux_state_create(new);
> + if (error)
> + goto abort_change;
> } else {
> error = -EINVAL;
> goto abort_change;
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index be52ebb6b94a..07fe316308cd 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -60,7 +60,9 @@ const struct security_class_mapping secclass_map[] = {
> "siginh", "setrlimit", "rlimitinh", "dyntransition",
> "setcurrent", "execmem", "execstack", "execheap",
> "setkeycreate", "setsockcreate", "getrlimit", NULL } },
> - { "process2", { "nnp_transition", "nosuid_transition", NULL } },
> + { "process2",
> + { "nnp_transition", "nosuid_transition", "unshare_selinuxns",
> + NULL } },
> { "system",
> { "ipc_info", "syslog_read", "syslog_mod", "syslog_console",
> "module_request", "module_load", "firmware_load",
> --
> 2.50.1
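From userspace, the new attribute would be requested through lsm_set_self_attr(2) roughly as follows. This sketch mirrors struct lsm_ctx locally so it builds against older headers. The following are assumptions: the SKETCH_ macros, the helper names, the fallback syscall number (460, as on x86_64 and the generic syscall table), and the empty context payload (the patch does not appear to consume one). The call can only succeed on a kernel carrying this RFC; elsewhere expect -1 with errno such as EOPNOTSUPP or ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* LSM_ID_SELINUX from include/uapi/linux/lsm.h; LSM_ATTR_UNSHARE is the
 * value proposed by this RFC and is NOT in released kernel headers. */
#define SKETCH_LSM_ID_SELINUX   101
#define SKETCH_LSM_ATTR_UNSHARE 106
#ifndef SYS_lsm_set_self_attr
#define SYS_lsm_set_self_attr 460  /* assumption: x86_64/generic table number */
#endif

/* Local mirror of struct lsm_ctx (header only; ctx[] is left empty). */
struct sketch_lsm_ctx {
	uint64_t id;       /* which LSM should handle the request */
	uint64_t flags;
	uint64_t len;      /* total size: this header plus ctx_len bytes */
	uint64_t ctx_len;  /* payload length; zero for an unshare request */
};

void fill_unshare_ctx(struct sketch_lsm_ctx *c)
{
	memset(c, 0, sizeof(*c));
	c->id = SKETCH_LSM_ID_SELINUX;
	c->len = sizeof(*c);
	c->ctx_len = 0;
}

/* Returns 0 on success, or -1 with errno set. */
int lsm_unshare_selinuxns(void)
{
	struct sketch_lsm_ctx c;

	fill_unshare_ctx(&c);
	return (int)syscall(SYS_lsm_set_self_attr, (long)SKETCH_LSM_ATTR_UNSHARE,
			    &c, (long)sizeof(c), 0L);
}
```

The id field matters because the LSM framework dispatches lsm_set_self_attr(2) only to the module named in the context, so only SELinux's hook sees this request.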
* Re: [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux
2025-09-18 13:59 [RFC PATCH v2] lsm,selinux: introduce LSM_ATTR_UNSHARE and wire it up for SELinux Stephen Smalley
` (2 preceding siblings ...)
2025-09-29 3:35 ` Serge E. Hallyn
@ 2025-09-30 12:32 ` Stephen Smalley
3 siblings, 0 replies; 6+ messages in thread
From: Stephen Smalley @ 2025-09-30 12:32 UTC
To: linux-security-module, selinux
Cc: paul, omosnace, john.johansen, serge, casey
On Thu, Sep 18, 2025 at 10:07 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> RFC-only, will ultimately split the LSM-only changes to their own
> patch for submission. I have now tested this with the corresponding
> selinux userspace change that you can find at
> https://lore.kernel.org/selinux/20250918135118.9896-2-stephen.smalley.work@gmail.com/
> and also verified that my modified systemd-nspawn still works when
> starting containers with their own SELinux namespace.
As it turns out, I had NOT tested this with my modified systemd-nspawn:
it was picking up the wrong libselinux, and it failed when I used the
one updated to call the lsm_set_self_attr() system call instead of the
/sys/fs/selinux/unshare interface. This was because systemd-nspawn
installs its seccomp filters before the point at which it performs the
unshare, and lsm_set_self_attr() was not included in the allowlist. I
fixed this by moving the selinux_unshare() call before the seccomp
filters are set, which solved the problem. I have pushed the updated
systemd patches to my systemd fork for anyone trying this themselves.
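The ordering problem can be illustrated with a toy model (not real seccomp code). Once a filter is installed, any syscall missing from its allowlist is rejected. So either the unshare has to happen first, or lsm_set_self_attr() has to be added to the allowlist. All names here are hypothetical, and the -EPERM return is an assumption; the real errno depends on how the filter rejects calls.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a sandboxed process; all names are hypothetical. */
struct toy_proc {
	bool filter_installed;          /* has the seccomp filter been set? */
	bool lsm_set_self_attr_allowed; /* is the syscall on the allowlist? */
	bool selinuxns_unshared;
};

static int toy_lsm_set_self_attr_unshare(struct toy_proc *p)
{
	if (p->filter_installed && !p->lsm_set_self_attr_allowed)
		return -EPERM;  /* the filter rejects the call */
	p->selinuxns_unshared = true;
	return 0;
}

static void toy_install_seccomp(struct toy_proc *p)
{
	p->filter_installed = true;
}

/* Original order: filter first, so the unshare fails without an allowlist entry. */
int setup_filter_then_unshare(struct toy_proc *p)
{
	toy_install_seccomp(p);
	return toy_lsm_set_self_attr_unshare(p);
}

/* Reordered setup: unshare first, then install the filter. */
int setup_unshare_then_filter(struct toy_proc *p)
{
	int rc = toy_lsm_set_self_attr_unshare(p);

	if (rc < 0)
		return rc;
	toy_install_seccomp(p);
	return 0;
}
```

Reordering keeps the filter unchanged. The alternative of allowlisting lsm_set_self_attr() would leave the call available to the container payload.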