* Re: [PATCH net 0/4] net: trust-after-modification fixes for IPv4 options + netlabel
From: Qi Tang @ 2026-05-14 17:06 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet
Cc: netdev, lyutoon, dsahern, idosch, horms, paul, huw,
linux-security-module, Qi Tang
In-Reply-To: <20260514165139.436961-1-tpluszz77@gmail.com>
Sorry, v1 went out unthreaded and 1/4 was duplicated. I will
resend properly as [PATCH net v2 0/4] in ~24h. Please ignore
the v1 thread.
Qi Tang <tpluszz77@gmail.com>
Tong Liu <lyutoon@gmail.com>
* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Casey Schaufler @ 2026-05-14 17:11 UTC (permalink / raw)
To: Qi Tang, davem, kuba, pabeni, edumazet
Cc: netdev, lyutoon, stable, Paul Moore, Simon Horman, Huw Davies,
linux-security-module, Casey Schaufler
In-Reply-To: <20260514165139.436961-4-tpluszz77@gmail.com>
On 5/14/2026 9:51 AM, Qi Tang wrote:
> netlbl_skbuff_getattr() locates the CALIPSO option in the IPv6 HBH
> header via calipso_optptr() and hands the bare pointer to
> calipso_getattr() -> calipso_opt_getattr(). The consumer re-reads
> calipso[1] (option data length) and calipso[6] (cat_len/4) and walks
> calipso + 10 for cat_len bytes via netlbl_bitmap_walk().
>
> ipv6_hop_calipso() validates these bytes only at parse time inside
> ipv6_parse_hopopts(). An nftables PRE_ROUTING payload write
> reachable from an unprivileged user namespace can rewrite both bytes
> between parse and the SELinux/Smack peer-label consume path
> (selinux_sock_rcv_skb_compat -> selinux_netlbl_sock_rcv_skb ->
> netlbl_skbuff_getattr). The self-consistency check
> (cat_len + 8 > len) inside calipso_opt_getattr() is defeated by
> mutating both bytes consistently, allowing a ~232-byte
> slab-out-of-bounds read from calipso + 10 whose set bits become MLS
> categories driving the access decision.
>
> netlbl_skbuff_getattr() has the skb; gate the consume on the option
> fitting within skb_tail_pointer(). The IPv6 option layout is
> type(1) + length(1) + length bytes of data, so requiring
> ptr + 2 + ptr[1] <= skb_tail covers the option and its embedded
> bitmap.
>
> Runtime confirmation (Smack peer-label policy + nft HBH mutation):
I'm the Smack maintainer and do not understand what you are trying
to say. Smack does not use CALIPSO, although support is on the
wish list.
> Udp6InDatagrams increments to 1 with the mutated cat_len, showing
> selinux/smack_socket_sock_rcv_skb -> netlbl_skbuff_getattr ->
> calipso_opt_getattr -> netlbl_bitmap_walk runs end-to-end past the
> option's true bound; with this patch the consume path short-circuits
> at the bounds check and the counter stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
> net/netlabel/netlabel_kapi.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 3583fa63dd01f..4af8ab76964e0 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1399,11 +1399,20 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
> return 0;
> break;
> #if IS_ENABLED(CONFIG_IPV6)
> - case AF_INET6:
> + case AF_INET6: {
> + const unsigned char *tail = skb_tail_pointer(skb);
> + u8 opt_data_len;
> +
> ptr = calipso_optptr(skb);
> - if (ptr && calipso_getattr(ptr, secattr) == 0)
> + if (!ptr || ptr + 2 > tail)
> + break;
> + opt_data_len = ptr[1]; /* IPv6 option data length */
> + if (ptr + 2 + opt_data_len > tail)
> + break;
> + if (calipso_getattr(ptr, secattr) == 0)
> return 0;
> break;
> + }
> #endif /* IPv6 */
> }
>
* Re: [PATCH v2 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Paul Moore @ 2026-05-14 20:15 UTC (permalink / raw)
To: Aaron Tomlin
Cc: tsbogend, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <bscbywzzx4nmxzbuw2bkzltb7rrmgmzy5u4gqy5pfpmafcnlto@eznniiguusqb>
On Tue, May 12, 2026 at 3:49 PM Aaron Tomlin <atomlin@atomlin.com> wrote:
> On Mon, May 11, 2026 at 04:28:09PM -0400, Paul Moore wrote:
> [ ... ]
> > > Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> > > ---
> > > arch/mips/kernel/mips-mt-fpaff.c | 30 +++++++++++++++++-------------
> > > fs/proc/base.c | 2 +-
> > > include/linux/lsm_hook_defs.h | 3 ++-
> > > include/linux/security.h | 11 +++++++----
> > > kernel/cgroup/cpuset.c | 4 ++--
> > > kernel/sched/syscalls.c | 4 ++--
> > > security/commoncap.c | 7 +++++--
> > > security/security.c | 11 ++++++-----
> > > security/selinux/hooks.c | 3 ++-
> > > security/smack/smack_lsm.c | 11 +++++++++--
> > > 10 files changed, 53 insertions(+), 33 deletions(-)
> >
> > I haven't looked too closely at this patch yet, but based on a quick
> > glance, can you help me understand why it is included with the other
> > two patches in one patchset? The other two patches look like stable
> > level kernel bug fixes, while this patch introduces functionality to
> > an existing LSM hook; one of these is not like the others :)
> >
> > Unless there is something critical that I'm missing here, I would
> > suggest splitting this patch out from the other two bugfixes for
> > separate handling. If there is a patch dependency issue you can
> > always mention that in the cover letter.
>
> Hi Paul,
>
> Thank you for taking the time to have a look.
>
> You raise a perfectly valid point.
>
> Please note, the cgroup-related BUG fix will be dropped from the next
> iteration of this series. As per Waiman Long (on Cc), a solution for the
> BUG was already proposed here [1].
That's good news. I saw some discussion on that but didn't follow it
very closely.
> However, I suspect the MIPS-related patch will need to remain coupled with
> this feature patch. Because the first patch fundamentally alters the
> signature of the security_task_setscheduler() hook, the MIPS FPU affinity
> code must be updated concurrently to accommodate the new parameter.
I generally dislike when bug fixes depend on new functionality; it's
backwards in my opinion. I would much rather see the MIPS bug fix
patch submitted as a standalone patch and then have the LSM hook
modification patch come separately, perhaps with a note that it
depends on the bug fix patch.
--
paul-moore.com
* Re: [PATCH RFC 4/5] selinux: Restrict cross-cgroup dma-heap charging
From: Paul Moore @ 2026-05-14 20:44 UTC (permalink / raw)
To: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan, Sumit Semwal, Christian König,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
T.J. Mercier, Christian Brauner, James Morris, Serge E. Hallyn,
Stephen Smalley, Ondrej Mosnacek, Shuah Khan
Cc: cgroups, linux-doc, linux-kernel, linux-media, dri-devel,
linaro-mm-sig, linux-mm, linux-security-module, selinux,
linux-kselftest, Albert Esteve, mripard, echanude
In-Reply-To: <20260512-v2_20230123_tjmercier_google_com-v1-4-6326701c3691@redhat.com>
On May 12, 2026 Albert Esteve <aesteve@redhat.com> wrote:
>
> The security_dma_heap_alloc() hook allows security modules
> to control which processes may charge dma-buf allocations
> to another process's cgroup via the charge_pid_fd field of
> DMA_HEAP_IOCTL_ALLOC. Without a policy implementation, the
> hook is a no-op and the restriction is not enforced.
>
> On SELinux-managed systems any domain with access to a
> dma-heap device node can therefore exhaust another cgroup's
> memory budget without restriction.
>
> Implement selinux_dma_heap_alloc() using avc_has_perm() with
> a new dma_heap object class and a charge_to permission. Policy
> authors can then grant cross-cgroup charging selectively,
> for example:
>
> allow allocator_app_t client_app_t:dma_heap charge_to;
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> security/selinux/hooks.c | 7 +++++++
> security/selinux/include/classmap.h | 1 +
> 2 files changed, 8 insertions(+)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 0f704380a8c81..ea1f410b9f619 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2189,6 +2189,12 @@ static int selinux_capable(const struct cred *cred, struct user_namespace *ns,
> return cred_has_capability(cred, cap, opts, ns == &init_user_ns);
> }
>
> +static int selinux_dma_heap_alloc(const struct cred *from, const struct cred *to)
> +{
> + return avc_has_perm(cred_sid(from), cred_sid(to),
> + SECCLASS_DMA_HEAP, DMA_HEAP__CHARGE_TO, NULL);
> +}
> +
> static int selinux_quotactl(int cmds, int type, int id, const struct super_block *sb)
> {
> const struct cred *cred = current_cred();
> @@ -7541,6 +7547,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
> LSM_HOOK_INIT(capget, selinux_capget),
> LSM_HOOK_INIT(capset, selinux_capset),
> LSM_HOOK_INIT(capable, selinux_capable),
> + LSM_HOOK_INIT(dma_heap_alloc, selinux_dma_heap_alloc),
> LSM_HOOK_INIT(quotactl, selinux_quotactl),
> LSM_HOOK_INIT(quota_on, selinux_quota_on),
> LSM_HOOK_INIT(syslog, selinux_syslog),
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index 90cb61b164256..d232f7808f6b8 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -181,6 +181,7 @@ const struct security_class_mapping secclass_map[] = {
> { "user_namespace", { "create", NULL } },
> { "memfd_file",
> { COMMON_FILE_PERMS, "execute_no_trans", "entrypoint", NULL } },
> + { "dma_heap", { "charge_to", NULL } },
> /* last one */ { NULL, {} }
> };
While we have seen some one-off patches to add specific resource/cgroups
controls in the past, much like this one, we've yet to see a patchset
that provides a more comprehensive set of resource/cgroup access controls
for SELinux.
I'm not opposed to a patch like this, but I would like to see it as part
of a larger effort to introduce access controls across all of the
existing cgroup control points where it makes sense. In other words,
let's see a design for cgroup access controls so that we can ensure we
have something that is meaningful and makes sense from a policy
developer's perspective.
--
paul-moore.com
* Re: [PATCH] lsm: hold cred_guard_mutex for lsm_set_self_attr()
From: Paul Moore @ 2026-05-14 20:47 UTC (permalink / raw)
To: Stephen Smalley, selinux
Cc: omosnace, casey, serge, john.johansen, linux-security-module,
Stephen Smalley
In-Reply-To: <20260513180506.760657-1-stephen.smalley.work@gmail.com>
On May 13, 2026 Stephen Smalley <stephen.smalley.work@gmail.com> wrote:
>
> Just as proc_pid_attr_write() already does before calling the LSM
> hook. This only matters for SELinux and AppArmor which check
> whether the process is being ptraced and if so, whether to
> allow the transition.
>
> Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> ---
> security/lsm_syscalls.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
Thanks Stephen. I'm going to merge this into lsm/stable-7.1 now, but
hold on to it until next week before sending it to Linus. While I
can't see why John would have any objections to this, the extra time
should give him a chance to respond.
--
paul-moore.com
* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15 1:54 UTC (permalink / raw)
To: casey
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, paul, horms, huw,
linux-security-module, Qi Tang
In-Reply-To: <7e165421-a688-4025-a33a-8eefbb84c4b5@schaufler-ca.com>
Hi Casey,
You're right. "SELinux/Smack peer-label consume path" was wrong
in the CALIPSO patch. Our reasoning was that both LSMs call
netlbl_skbuff_getattr() in their socket-rcv path, but we only
actually verified the OOB read via SELinux's compat path
(selinux=1 enforcing=0, with a CALIPSO DOI installed via
netlabelctl). We never tested with Smack and shouldn't have
included it.
v2 will say "SELinux" only on the CALIPSO patch. The companion
CIPSO patch keeps the Smack mention since Smack does use CIPSO.
Sorry for the noise.
Qi
* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Paul Moore @ 2026-05-15 2:18 UTC (permalink / raw)
To: Qi Tang
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, stable,
Simon Horman, Huw Davies, linux-security-module
In-Reply-To: <20260514165139.436961-4-tpluszz77@gmail.com>
On Thu, May 14, 2026 at 12:52 PM Qi Tang <tpluszz77@gmail.com> wrote:
>
> netlbl_skbuff_getattr() locates the CALIPSO option in the IPv6 HBH
> header via calipso_optptr() and hands the bare pointer to
> calipso_getattr() -> calipso_opt_getattr(). The consumer re-reads
> calipso[1] (option data length) and calipso[6] (cat_len/4) and walks
> calipso + 10 for cat_len bytes via netlbl_bitmap_walk().
>
> ipv6_hop_calipso() validates these bytes only at parse time inside
> ipv6_parse_hopopts(). An nftables PRE_ROUTING payload write
> reachable from an unprivileged user namespace can rewrite both bytes
> between parse and the SELinux/Smack peer-label consume path
> (selinux_sock_rcv_skb_compat -> selinux_netlbl_sock_rcv_skb ->
> netlbl_skbuff_getattr). The self-consistency check
> (cat_len + 8 > len) inside calipso_opt_getattr() is defeated by
> mutating both bytes consistently, allowing a ~232-byte
> slab-out-of-bounds read from calipso + 10 whose set bits become MLS
> categories driving the access decision.
>
> netlbl_skbuff_getattr() has the skb; gate the consume on the option
> fitting within skb_tail_pointer(). The IPv6 option layout is
> type(1) + length(1) + length bytes of data, so requiring
> ptr + 2 + ptr[1] <= skb_tail covers the option and its embedded
> bitmap.
>
> Runtime confirmation (Smack peer-label policy + nft HBH mutation):
> Udp6InDatagrams increments to 1 with the mutated cat_len, showing
> selinux/smack_socket_sock_rcv_skb -> netlbl_skbuff_getattr ->
> calipso_opt_getattr -> netlbl_bitmap_walk runs end-to-end past the
> option's true bound; with this patch the consume path short-circuits
> at the bounds check and the counter stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
> net/netlabel/netlabel_kapi.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 3583fa63dd01f..4af8ab76964e0 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1399,11 +1399,20 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
> return 0;
> break;
> #if IS_ENABLED(CONFIG_IPV6)
> - case AF_INET6:
> + case AF_INET6: {
> + const unsigned char *tail = skb_tail_pointer(skb);
> + u8 opt_data_len;
> +
> ptr = calipso_optptr(skb);
> - if (ptr && calipso_getattr(ptr, secattr) == 0)
> + if (!ptr || ptr + 2 > tail)
> + break;
Is there a reason why you simply break here and drop down into the
unlabeled code? I would think we would want to return an error here
since we had a packet that was munged.
> + opt_data_len = ptr[1]; /* IPv6 option data length */
> + if (ptr + 2 + opt_data_len > tail)
> + break;
Same thing.
> + if (calipso_getattr(ptr, secattr) == 0)
> return 0;
> break;
> + }
> #endif /* IPv6 */
> }
>
> --
> 2.47.3
--
paul-moore.com
* Re: [PATCH net 4/4] netlabel: validate CIPSO option against skb tail in netlbl_skbuff_getattr
From: Paul Moore @ 2026-05-15 2:18 UTC (permalink / raw)
To: Qi Tang
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, stable,
Simon Horman, linux-security-module
In-Reply-To: <20260514165139.436961-5-tpluszz77@gmail.com>
On Thu, May 14, 2026 at 12:52 PM Qi Tang <tpluszz77@gmail.com> wrote:
>
> netlbl_skbuff_getattr() locates the CIPSO option in the IPv4 IP header
> via cipso_v4_optptr() and hands the bare pointer to cipso_v4_getattr().
> The consumer re-reads cipso[1] (option length), cipso[6] (tag type),
> and then cipso_v4_parsetag_*() re-reads further bytes from the skb.
>
> __ip_options_compile() validates these bytes only at parse time. An
> nftables LOCAL_IN payload write reachable from an unprivileged user
> namespace can rewrite them after parse and before the SELinux/Smack
> peer-label consume path (selinux_sock_rcv_skb_compat ->
> selinux_netlbl_sock_rcv_skb -> netlbl_skbuff_getattr). This is the
> IPv4 analogue of the CALIPSO IPv6 trust-after-modification fixed in
> the previous patch: the tag parsers walk the option using attacker-
> controlled length bytes, producing slab-out-of-bounds reads whose
> contents feed into the MLS access decision.
>
> Validate the option fits within skb_tail_pointer(skb) before invoking
> cipso_v4_getattr().
>
> Runtime confirmation (Smack peer-label policy + nft LOCAL_IN
> mutation of tag_len): UdpInDatagrams increments to 1 and recvfrom
> returns the payload, showing netlbl_skbuff_getattr ->
> cipso_v4_getattr -> cipso_v4_parsetag_rbm -> netlbl_bitmap_walk runs
> end-to-end past the option's true bound; with this patch the
> consume path short-circuits at the bounds check and the counter
> stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 04f81f0154e4 ("cipso: don't use IPCB() to locate the CIPSO IP option")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
> net/netlabel/netlabel_kapi.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 4af8ab76964e0..ace561a2904a4 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1393,11 +1393,21 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
> unsigned char *ptr;
>
> switch (family) {
> - case AF_INET:
> + case AF_INET: {
> + const unsigned char *tail = skb_tail_pointer(skb);
> + u8 opt_len, tag_len;
> +
> ptr = cipso_v4_optptr(skb);
> - if (ptr && cipso_v4_getattr(ptr, secattr) == 0)
> + if (!ptr || ptr + 8 > tail)
> + break;
Similar to my CALIPSO comments, I suspect we would want to return an
error here, yes?
Also, how did you arrive at the magic number of '8' above?
> + opt_len = ptr[1]; /* total CIPSO option length */
> + tag_len = ptr[7]; /* first tag length */
> + if (ptr + opt_len > tail || ptr + 6 + tag_len > tail)
> + break;
> + if (cipso_v4_getattr(ptr, secattr) == 0)
> return 0;
> break;
> + }
> #if IS_ENABLED(CONFIG_IPV6)
> case AF_INET6: {
> const unsigned char *tail = skb_tail_pointer(skb);
> --
> 2.47.3
--
paul-moore.com
* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15 2:42 UTC (permalink / raw)
To: paul
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, horms, huw, casey,
linux-security-module, Qi Tang
In-Reply-To: <CAHC9VhR52b2FbD-aiMFsaXwwRrUGTLSdRFzWcVAZjUm-K3qgkw@mail.gmail.com>
Agreed, -EINVAL is right. The bytes passed parse-time
validation, so hitting either bounds check at consume time means
they were mutated after parse. Treating such a packet as "no
label" via netlbl_unlabel_getattr() drops it into the wrong
default. v2 returns -EINVAL on both checks.
Will also drop the Smack mention from the commit message (Casey
flagged that separately).
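For readers following along, the behaviour agreed for v2 can be modelled in
userspace roughly as below. This is only a sketch under assumptions: `opt`
stands in for the pointer returned by calipso_optptr(), `tail` for
skb_tail_pointer(skb), and the helper name calipso_opt_bounds_check() is
hypothetical; it is not taken from the tree or from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model, not kernel code. `opt` points at the option's type
 * byte and `tail` is one past the last valid byte of the buffer.
 * Returns 0 if the option fits, -EINVAL if its length byte claims more
 * data than the buffer holds (i.e. it was rewritten after parse).
 */
static int calipso_opt_bounds_check(const uint8_t *opt, const uint8_t *tail)
{
	size_t avail;

	assert(opt && tail && opt <= tail);
	avail = (size_t)(tail - opt);

	/* type(1) + length(1) must be readable before touching opt[1] */
	if (avail < 2)
		return -EINVAL;

	/* the option spans 2 header bytes plus opt[1] bytes of data */
	if ((size_t)opt[1] + 2 > avail)
		return -EINVAL;

	return 0;
}
```

Returning an error from both checks, rather than breaking out of the switch,
is what keeps a mutated packet from reaching the unlabeled fallback.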
Qi
* Re: [PATCH net 4/4] netlabel: validate CIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15 2:42 UTC (permalink / raw)
To: paul
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, horms,
linux-security-module, Qi Tang
In-Reply-To: <CAHC9VhS63xq5Pja2iA4DEkRU5sqpQ8ozXzgLBaE6Ck4PDCKpMQ@mail.gmail.com>
Agreed on the return value, same reasoning as on 3/4: a length
mismatch here means post-parse mutation, and the unlabeled
fallback is the wrong default for that. v2 returns -EINVAL on
all three CIPSO bounds checks.
The 8 is the number of bytes up to and including the first tag's
length byte. The CIPSO option header is type(1) + length(1) + DOI(4)
= 6, and the first tag header is type(1) + length(1) = 2. We need
ptr+8 within the skb before dereferencing ptr[7]. v2 will document
this inline, and use CIPSO_V4_HDR_LEN if it's exposed in the header.
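Spelled out as a userspace model, the three checks look roughly like the
sketch below. The assumptions: `opt` stands in for the pointer returned by
cipso_v4_optptr(), `tail` for skb_tail_pointer(skb), and the SKETCH_*
constants and the function name are hypothetical names chosen for
illustration, not existing kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the layout described above. */
#define SKETCH_CIPSO_HDR_LEN     6	/* type(1) + length(1) + DOI(4) */
#define SKETCH_CIPSO_TAG_HDR_LEN 2	/* tag type(1) + tag length(1)  */

/*
 * Userspace model, not kernel code. Returns 0 if the option and its
 * first tag fit before `tail`, -EINVAL otherwise.
 */
static int cipso_opt_bounds_check(const uint8_t *opt, const uint8_t *tail)
{
	size_t avail, opt_len, tag_len;

	assert(opt && tail && opt <= tail);
	avail = (size_t)(tail - opt);

	/* check 1: bytes 0..7 must exist before reading opt[1] and opt[7] */
	if (avail < SKETCH_CIPSO_HDR_LEN + SKETCH_CIPSO_TAG_HDR_LEN)
		return -EINVAL;

	opt_len = opt[1];	/* total CIPSO option length */
	tag_len = opt[7];	/* first tag length, counted from opt + 6 */

	/* check 2: the whole option must fit */
	if (opt_len > avail)
		return -EINVAL;

	/* check 3: the first tag must fit */
	if (SKETCH_CIPSO_HDR_LEN + tag_len > avail)
		return -EINVAL;

	return 0;
}
```

Comparing lengths against the remaining byte count, instead of adding them to
the pointer, is a choice made for this model so that no out-of-range pointer
is ever formed.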
Qi
* Re: [PATCH] killswitch: add per-function short-circuit mitigation primitive
From: Paul Moore @ 2026-05-15 3:48 UTC (permalink / raw)
To: Sasha Levin
Cc: corbet, akpm, skhan, linux-doc, linux-kernel, linux-kselftest,
gregkh, linux-security-module
In-Reply-To: <20260507070547.2268452-1-sashal@kernel.org>
On Thu, May 7, 2026 at 3:05 AM Sasha Levin <sashal@kernel.org> wrote:
>
> When a (security) issue goes public, fleets stay exposed until a patched kernel
> is built, distributed, and rebooted into.
>
> For many such issues the simplest mitigation is to stop calling the buggy
> function. Killswitch provides that. An admin writes:
>
> echo "engage af_alg_sendmsg -1" \
> > /sys/kernel/security/killswitch/control
>
> After this, af_alg_sendmsg() returns -EPERM on every call without
> running its body. The mitigation takes effect immediately, and is dropped on
> the next reboot.
>
> A lot of recent kernel issues sit in code paths most installs only have enabled
> to support a relative minority of users: AF_ALG, ksmbd, nf_tables, vsock, ax25,
> and friends.
>
> For most users, the cost of "this socket family stops working for the day" is
> much smaller than the cost of running a known vulnerable kernel until the fix
> lands.
>
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> Documentation/admin-guide/index.rst | 1 +
> Documentation/admin-guide/killswitch.rst | 159 ++++
> Documentation/admin-guide/tainted-kernels.rst | 8 +
> MAINTAINERS | 11 +
> include/linux/killswitch.h | 19 +
> include/linux/panic.h | 3 +-
> init/Kconfig | 2 +
> kernel/Kconfig.killswitch | 31 +
> kernel/Makefile | 1 +
> kernel/killswitch.c | 798 ++++++++++++++++++
> kernel/panic.c | 1 +
> lib/Kconfig.debug | 13 +
> lib/Makefile | 1 +
> lib/test_killswitch.c | 85 ++
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/killswitch/.gitignore | 1 +
> tools/testing/selftests/killswitch/Makefile | 8 +
> .../selftests/killswitch/cve_31431_test.c | 162 ++++
> .../selftests/killswitch/killswitch_test.sh | 147 ++++
> 19 files changed, 1451 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/admin-guide/killswitch.rst
> create mode 100644 include/linux/killswitch.h
> create mode 100644 kernel/Kconfig.killswitch
> create mode 100644 kernel/killswitch.c
> create mode 100644 lib/test_killswitch.c
> create mode 100644 tools/testing/selftests/killswitch/.gitignore
> create mode 100644 tools/testing/selftests/killswitch/Makefile
> create mode 100644 tools/testing/selftests/killswitch/cve_31431_test.c
> create mode 100755 tools/testing/selftests/killswitch/killswitch_test.sh
If we made Lockdown an LSM, we should probably also make killswitch an LSM.
For the LSM crowd who might be seeing this for the first time, the
original thread can be found on lore via the link below:
https://lore.kernel.org/all/20260507070547.2268452-1-sashal@kernel.org
--
paul-moore.com
* [PATCH] apparmor: hold peer path references in aa_unix_file_perm()
From: Zhang Cen @ 2026-05-15 5:01 UTC (permalink / raw)
To: John Johansen, Paul Moore, James Morris, Serge E. Hallyn
Cc: apparmor, linux-security-module, linux-kernel, zerocling0077,
2045gemini, Zhang Cen
aa_unix_file_perm() keeps the connected peer alive with sock_hold(peer_sk),
but it then carries unix_sk(peer_sk)->path outside the peer socket state
lock without taking a path reference. That copied peer_path can race with
unix_release_sock(), which clears u->path under unix_state_lock(peer_sk)
and drops the socket-owned path reference with path_put() before the final
sock_put(peer_sk).
Take peer_sk's unix_state_lock() long enough to snapshot peer_path,
cache whether the peer is filesystem-bound, and path_get() a non-NULL
path before dropping the lock. Drop that path reference after the last
AppArmor peer path check. This restores the ownership invariant for
peer_path without changing AF_UNIX shutdown semantics once the peer path
has already been cleared.
The buggy scenario involves two paths, with each column showing the
order within that path:
aa_unix_file_perm() [borrower]:           unix_release_sock() [peer close]:
1. unix_state_lock(sock->sk)              1. unix_state_lock(peer_sk)
2. peer_sk = unix_peer(sock->sk)          2. Save path = u->path
3. sock_hold(peer_sk)                     3. Clear u->path.dentry/mnt
4. unix_state_unlock(sock->sk)            4. unix_state_unlock(peer_sk)
5. peer_path = unix_sk(peer_sk)->path     5. path_put(&path)
6. unix_fs_perm(&peer_path)               6. sock_put(peer_sk)
KASAN reported a slab-use-after-free in unix_fs_perm() at
security/apparmor/af_unix.c:46, with the free side in
unix_release_sock() -> path_put() at net/unix/af_unix.c:730.
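The ownership rule described above (snapshot the pointer and take a reference
under the same lock the releaser clears it under) can be illustrated with a
small userspace model. The assumptions: a pthread mutex stands in for
unix_state_lock(peer_sk), a refcounted `struct obj` stands in for the
dentry/mnt pair behind struct path, and every name below is hypothetical
rather than taken from the AppArmor or AF_UNIX code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct obj {
	atomic_int refs;	/* models the references pinned by a struct path */
};

struct peer {
	pthread_mutex_t lock;	/* models unix_state_lock(peer_sk) */
	struct obj *path;	/* models unix_sk(peer_sk)->path */
};

static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refs, 1);
}

static void obj_put(struct obj *o)
{
	int prev = atomic_fetch_sub(&o->refs, 1);

	assert(prev > 0);	/* catch a put without a matching get */
	(void)prev;
}

/* Borrower side: copy the pointer and pin it before dropping the lock. */
static struct obj *peer_path_snapshot(struct peer *p)
{
	struct obj *o;

	pthread_mutex_lock(&p->lock);
	o = p->path;
	if (o)
		obj_get(o);
	pthread_mutex_unlock(&p->lock);
	return o;	/* caller must obj_put() a non-NULL result */
}

/* Peer-close side: clear under the lock, drop the owner's reference after. */
static void peer_release(struct peer *p)
{
	struct obj *o;

	pthread_mutex_lock(&p->lock);
	o = p->path;
	p->path = NULL;
	pthread_mutex_unlock(&p->lock);
	if (o)
		obj_put(o);
}
```

In this model a borrower that snapshots before the release still holds a
counted reference afterwards, and one that snapshots after the release sees
NULL, which mirrors the two outcomes the patch allows.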
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
---
security/apparmor/af_unix.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/security/apparmor/af_unix.c b/security/apparmor/af_unix.c
index fdb4a9f21..7a1562f6f 100644
--- a/security/apparmor/af_unix.c
+++ b/security/apparmor/af_unix.c
@@ -716,7 +716,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
struct sock *peer_sk = NULL;
u32 sk_req = request & ~NET_PEER_MASK;
struct path path;
- bool is_sk_fs;
+ struct path peer_path = {};
+ bool is_sk_fs, is_peer_fs = false;
int error = 0;
AA_BUG(!label);
@@ -724,9 +725,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
AA_BUG(!sock->sk);
AA_BUG(sock->sk->sk_family != PF_UNIX);
- /* investigate only using lock via unix_peer_get()
- * addr only needs the memory barrier, but need to investigate
- * path
+ /* addr only needs the memory barrier; hold a peer path reference
+ * under peer_sk's state lock after sock_hold(peer_sk)
*/
unix_state_lock(sock->sk);
peer_sk = unix_peer(sock->sk);
@@ -749,14 +749,18 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
goto out;
peer_addr = aa_sunaddr(unix_sk(peer_sk), &peer_addrlen);
-
- struct path peer_path;
-
- peer_path = unix_sk(peer_sk)->path;
- if (!is_sk_fs && is_unix_fs(peer_sk)) {
+ if (!is_sk_fs) {
+ unix_state_lock(peer_sk);
+ is_peer_fs = is_unix_fs(peer_sk);
+ peer_path = unix_sk(peer_sk)->path;
+ if (peer_path.dentry)
+ path_get(&peer_path);
+ unix_state_unlock(peer_sk);
+ }
+ if (!is_sk_fs && is_peer_fs) {
last_error(error,
unix_fs_perm(op, request, subj_cred, label,
- is_unix_fs(peer_sk) ? &peer_path : NULL));
+ &peer_path));
} else if (!is_sk_fs) {
struct aa_label *plabel;
struct aa_sk_ctx *pctx = aa_sock(peer_sk);
@@ -772,12 +776,12 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
MAY_READ | MAY_WRITE, sock->sk,
is_sk_fs ? &path : NULL,
peer_addr, peer_addrlen,
- is_unix_fs(peer_sk) ?
+ is_peer_fs ?
&peer_path : NULL,
plabel),
unix_peer_perm(file->f_cred, plabel, op,
MAY_READ | MAY_WRITE, peer_sk,
- is_unix_fs(peer_sk) ?
+ is_peer_fs ?
&peer_path : NULL,
addr, addrlen,
is_sk_fs ? &path : NULL,
@@ -785,6 +789,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
if (!error && !__aa_subj_label_is_cached(plabel, label))
update_peer_ctx(peer_sk, pctx, label);
}
+ if (peer_path.dentry)
+ path_put(&peer_path);
sock_put(peer_sk);
out:
@@ -796,4 +802,3 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
return error;
}
-
--
2.43.0
* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian Brauner @ 2026-05-15 13:53 UTC (permalink / raw)
To: Albert Esteve
Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan, Sumit Semwal, Christian König, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
linux-media, dri-devel, linaro-mm-sig, linux-mm,
linux-security-module, selinux, linux-kselftest, mripard,
echanude
In-Reply-To: <20260512-v2_20230123_tjmercier_google_com-v1-2-6326701c3691@redhat.com>
On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> On embedded platforms a central process often allocates dma-buf
> memory on behalf of client applications. Without a way to
> attribute the charge to the requesting client's cgroup, the
> cost lands on the allocator, making per-cgroup memory limits
> ineffective for the actual consumers.
>
> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
Please be aware that pidfds come in two flavors:
thread-group pidfds and thread-specific pidfds. Make sure that your API
doesn't implicitly depend on this distinction not existing.
> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> the mem_accounting module parameter enabled, the buffer is charged
> to the allocator's own cgroup.
>
> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> all accounting through a single MEMCG_DMABUF path.
>
> Usage examples:
>
> 1. Central allocator charging to a client at allocation time.
> The allocator knows the client's PID (e.g., from binder's
> sender_pid) and uses pidfd to attribute the charge:
>
> pid_t client_pid = txn->sender_pid;
> int pidfd = pidfd_open(client_pid, 0);
>
> struct dma_heap_allocation_data alloc = {
> .len = buffer_size,
> .fd_flags = O_RDWR | O_CLOEXEC,
> .charge_pid_fd = pidfd,
> };
> ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> close(pidfd);
> /* alloc.fd is now charged to client's cgroup */
>
> 2. Default allocation (no pidfd, mem_accounting=1).
> When charge_pid_fd is not set and the mem_accounting module
> parameter is enabled, the buffer is charged to the allocator's
> own cgroup:
>
> struct dma_heap_allocation_data alloc = {
> .len = buffer_size,
> .fd_flags = O_RDWR | O_CLOEXEC,
> };
> ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> /* charged to current process's cgroup */
>
> Current limitations:
>
> - Single-owner model: a dma-buf carries one memcg charge regardless of
> how many processes share it. This means only the first owner (and exporter)
> of the shared buffer bears the charge.
> - Only memcg accounting supported. While this makes sense for system
> heap buffers, other heaps (e.g., CMA heaps) will require selectively
> charging also for the dmem controller.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 5 ++--
> drivers/dma-buf/dma-buf.c | 16 ++++---------
> drivers/dma-buf/dma-heap.c | 42 ++++++++++++++++++++++++++++++---
> drivers/dma-buf/heaps/system_heap.c | 2 --
> include/uapi/linux/dma-heap.h | 6 +++++
> 5 files changed, 53 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 8bdbc2e866430..824d269531eb1 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> structures.
>
> dmabuf (npn)
> - Amount of memory used for exported DMA buffers allocated by the cgroup.
> - Stays with the allocating cgroup regardless of how the buffer is shared.
> + Amount of memory used for exported DMA buffers allocated by or on
> + behalf of the cgroup. Stays with the allocating cgroup regardless
> + of how the buffer is shared.
>
> workingset_refault_anon
> Number of refaults of previously evicted anonymous pages.
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index ce02377f48908..23fb758b78297 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> */
> BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
>
> - mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> - mem_cgroup_put(dmabuf->memcg);
> + if (dmabuf->memcg) {
> + mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> + PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> + mem_cgroup_put(dmabuf->memcg);
> + }
>
> dmabuf->ops->release(dmabuf);
>
> @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> dmabuf->resv = resv;
> }
>
> - dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> - if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> - GFP_KERNEL)) {
> - ret = -ENOMEM;
> - goto err_memcg;
> - }
> -
> file->private_data = dmabuf;
> file->f_path.dentry->d_fsdata = dmabuf;
> dmabuf->file = file;
> @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>
> return dmabuf;
>
> -err_memcg:
> - mem_cgroup_put(dmabuf->memcg);
> err_file:
> fput(file);
> err_module:
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index ac5f8685a6494..ff6e259afcdc0 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -7,13 +7,17 @@
> */
>
> #include <linux/cdev.h>
> +#include <linux/cgroup.h>
> #include <linux/device.h>
> #include <linux/dma-buf.h>
> #include <linux/dma-heap.h>
> +#include <linux/memcontrol.h>
> +#include <linux/sched/mm.h>
> #include <linux/err.h>
> #include <linux/export.h>
> #include <linux/list.h>
> #include <linux/nospec.h>
> +#include <linux/pidfd.h>
> #include <linux/syscalls.h>
> #include <linux/uaccess.h>
> #include <linux/xarray.h>
> @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
>
> static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> - u32 fd_flags,
> - u64 heap_flags)
> + u32 fd_flags, u64 heap_flags,
> + struct mem_cgroup *charge_to)
> {
> struct dma_buf *dmabuf;
> + unsigned int nr_pages;
> + struct mem_cgroup *memcg = charge_to;
> int fd;
>
> /*
> @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> if (IS_ERR(dmabuf))
> return PTR_ERR(dmabuf);
>
> + nr_pages = len / PAGE_SIZE;
> +
> + if (memcg)
> + css_get(&memcg->css);
> + else if (mem_accounting)
> + memcg = get_mem_cgroup_from_mm(current->mm);
> +
> + if (memcg) {
> + if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> + mem_cgroup_put(memcg);
> + dma_buf_put(dmabuf);
> + return -ENOMEM;
> + }
> + dmabuf->memcg = memcg;
> + }
> +
> fd = dma_buf_fd(dmabuf, fd_flags);
> if (fd < 0) {
> dma_buf_put(dmabuf);
> @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> {
> struct dma_heap_allocation_data *heap_allocation = data;
> struct dma_heap *heap = file->private_data;
> + struct mem_cgroup *memcg = NULL;
> + struct task_struct *task;
> + unsigned int pidfd_flags;
> int fd;
>
> if (heap_allocation->fd)
> @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> return -EINVAL;
>
> + if (heap_allocation->charge_pid_fd) {
> + task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
Will always get a thread-group leader pidfd and will fail if this is a
thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
open a thread-specific pidfd.
> + if (IS_ERR(task))
> + return PTR_ERR(task);
> +
> + memcg = get_mem_cgroup_from_mm(task->mm);
> + put_task_struct(task);
> + }
> +
> fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> heap_allocation->fd_flags,
> - heap_allocation->heap_flags);
> + heap_allocation->heap_flags,
> + memcg);
> + mem_cgroup_put(memcg);
> if (fd < 0)
> return fd;
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 03c2b87cb1112..95d7688167b93 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> if (max_order < orders[i])
> continue;
> flags = order_flags[i];
> - if (mem_accounting)
> - flags |= __GFP_ACCOUNT;
> page = alloc_pages(flags, orders[i]);
> if (!page)
> continue;
> diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> index a4cf716a49fa6..e02b0f8cbc6a1 100644
> --- a/include/uapi/linux/dma-heap.h
> +++ b/include/uapi/linux/dma-heap.h
> @@ -29,6 +29,10 @@
> * handle to the allocated dma-buf
> * @fd_flags: file descriptor flags used when allocating
> * @heap_flags: flags passed to heap
> + * @charge_pid_fd: optional pidfd of the process whose cgroup should be
> + * charged for this allocation; 0 means charge the calling
> + * process's cgroup
> + * @__padding: reserved, must be zero
> *
> * Provided by userspace as an argument to the ioctl
> */
> @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> __u32 fd;
> __u32 fd_flags;
> __u64 heap_flags;
> + __u32 charge_pid_fd;
> + __u32 __padding;
> };
>
> #define DMA_HEAP_IOC_MAGIC 'H'
>
> --
> 2.53.0
>
^ permalink raw reply
* Re: [PATCH] Documentation: fix typo and formatting in security/credentials.rst
From: Jonathan Corbet @ 2026-05-15 14:10 UTC (permalink / raw)
To: Mayank Gite, Paul Moore
Cc: Mayank Gite, Serge Hallyn, Shuah Khan, linux-security-module,
linux-doc, linux-kernel
In-Reply-To: <20260506225925.271163-1-drapl0n.kernel@gmail.com>
Mayank Gite <drapl0n.kernel@gmail.com> writes:
> - Fixes a typo in "Keys and keyrings" section. Replaces "keying" with
> "keyring".
> - Updates formatting of keyring types.
>
> Signed-off-by: Mayank Gite <drapl0n.kernel@gmail.com>
> ---
> Documentation/security/credentials.rst | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
> index d0191c8b8060..4996838491b1 100644
> --- a/Documentation/security/credentials.rst
> +++ b/Documentation/security/credentials.rst
> @@ -189,9 +189,9 @@ The Linux kernel supports the following types of credentials:
> be searched for the desired key. Each process may subscribe to a number
> of keyrings:
>
> - Per-thread keying
> - Per-process keyring
> - Per-session keyring
> + - Per-thread keyring
> + - Per-process keyring
> + - Per-session keyring
>
Applied, thanks.
jon
^ permalink raw reply
* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: T.J. Mercier @ 2026-05-15 17:06 UTC (permalink / raw)
To: Christian Brauner
Cc: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan, Sumit Semwal, Christian König,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
linux-media, dri-devel, linaro-mm-sig, linux-mm,
linux-security-module, selinux, linux-kselftest, mripard,
echanude
In-Reply-To: <20260515-hinschauen-effizient-9e3a05a94f2e@brauner>
On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> > On embedded platforms a central process often allocates dma-buf
> > memory on behalf of client applications. Without a way to
> > attribute the charge to the requesting client's cgroup, the
> > cost lands on the allocator, making per-cgroup memory limits
> > ineffective for the actual consumers.
> >
> > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>
> Please be aware that pidfds come in two flavors:
>
> thread-group pidfds and thread-specific pidfds. Make sure that your API
> doesn't implicitly depend on this distinction not existing.
Hi Christian,
Memcg is not a controller that supports "thread mode" so all threads
in a group should belong to the same memcg.
Would checking the flags returned by pidfd_get_pid() be the best way
to explicitly check the pidfd type?
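A minimal sketch of what such an explicit check could look like, written
so it compiles in userspace. It assumes the flags reported for the pidfd
carry PIDFD_THREAD (defined as O_EXCL in the uapi pidfd header) when the
pidfd is thread-specific. charge_pidfd_flags_ok() is a hypothetical
helper used only for illustration; it is not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>

/* Matches the uapi definition; provided here in case libc headers lack it. */
#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL
#endif

/*
 * Hypothetical helper: decide whether a pidfd with the given file flags
 * may be used as a charge target. Thread-specific pidfds are rejected
 * up front rather than relying on the task lookup to fail later.
 */
static int charge_pidfd_flags_ok(unsigned int f_flags)
{
	if (f_flags & PIDFD_THREAD)
		return -EINVAL;
	return 0;
}
```

Since memcg has no thread mode, rejecting thread-specific pidfds loses
nothing in this sketch; accepting them and resolving the thread-group
leader would be the other option.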
> > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > the mem_accounting module parameter enabled, the buffer is charged
> > to the allocator's own cgroup.
> >
> > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > all accounting through a single MEMCG_DMABUF path.
> >
> > Usage examples:
> >
> > 1. Central allocator charging to a client at allocation time.
> > The allocator knows the client's PID (e.g., from binder's
> > sender_pid) and uses pidfd to attribute the charge:
> >
> > pid_t client_pid = txn->sender_pid;
> > int pidfd = pidfd_open(client_pid, 0);
> >
> > struct dma_heap_allocation_data alloc = {
> > .len = buffer_size,
> > .fd_flags = O_RDWR | O_CLOEXEC,
> > .charge_pid_fd = pidfd,
> > };
> > ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > close(pidfd);
> > /* alloc.fd is now charged to client's cgroup */
> >
> > 2. Default allocation (no pidfd, mem_accounting=1).
> > When charge_pid_fd is not set and the mem_accounting module
> > parameter is enabled, the buffer is charged to the allocator's
> > own cgroup:
> >
> > struct dma_heap_allocation_data alloc = {
> > .len = buffer_size,
> > .fd_flags = O_RDWR | O_CLOEXEC,
> > };
> > ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > /* charged to current process's cgroup */
> >
> > Current limitations:
> >
> > - Single-owner model: a dma-buf carries one memcg charge regardless of
> > how many processes share it. Means only the first owner (and exporter)
> > of the shared buffer bears the charge.
> > - Only memcg accounting supported. While this makes sense for system
> > heap buffers, other heaps (e.g., CMA heaps) will require selectively
> > charging also for the dmem controller.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> > Documentation/admin-guide/cgroup-v2.rst | 5 ++--
> > drivers/dma-buf/dma-buf.c | 16 ++++---------
> > drivers/dma-buf/dma-heap.c | 42 ++++++++++++++++++++++++++++++---
> > drivers/dma-buf/heaps/system_heap.c | 2 --
> > include/uapi/linux/dma-heap.h | 6 +++++
> > 5 files changed, 53 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 8bdbc2e866430..824d269531eb1 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> > structures.
> >
> > dmabuf (npn)
> > - Amount of memory used for exported DMA buffers allocated by the cgroup.
> > - Stays with the allocating cgroup regardless of how the buffer is shared.
> > + Amount of memory used for exported DMA buffers allocated by or on
> > + behalf of the cgroup. Stays with the allocating cgroup regardless
> > + of how the buffer is shared.
> >
> > workingset_refault_anon
> > Number of refaults of previously evicted anonymous pages.
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index ce02377f48908..23fb758b78297 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> > */
> > BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> >
> > - mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > - mem_cgroup_put(dmabuf->memcg);
> > + if (dmabuf->memcg) {
> > + mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > + PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > + mem_cgroup_put(dmabuf->memcg);
> > + }
> >
> > dmabuf->ops->release(dmabuf);
> >
> > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > dmabuf->resv = resv;
> > }
> >
> > - dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > - if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > - GFP_KERNEL)) {
> > - ret = -ENOMEM;
> > - goto err_memcg;
> > - }
> > -
> > file->private_data = dmabuf;
> > file->f_path.dentry->d_fsdata = dmabuf;
> > dmabuf->file = file;
> > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >
> > return dmabuf;
> >
> > -err_memcg:
> > - mem_cgroup_put(dmabuf->memcg);
> > err_file:
> > fput(file);
> > err_module:
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index ac5f8685a6494..ff6e259afcdc0 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -7,13 +7,17 @@
> > */
> >
> > #include <linux/cdev.h>
> > +#include <linux/cgroup.h>
> > #include <linux/device.h>
> > #include <linux/dma-buf.h>
> > #include <linux/dma-heap.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/sched/mm.h>
> > #include <linux/err.h>
> > #include <linux/export.h>
> > #include <linux/list.h>
> > #include <linux/nospec.h>
> > +#include <linux/pidfd.h>
> > #include <linux/syscalls.h>
> > #include <linux/uaccess.h>
> > #include <linux/xarray.h>
> > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> > "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> >
> > static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > - u32 fd_flags,
> > - u64 heap_flags)
> > + u32 fd_flags, u64 heap_flags,
> > + struct mem_cgroup *charge_to)
> > {
> > struct dma_buf *dmabuf;
> > + unsigned int nr_pages;
> > + struct mem_cgroup *memcg = charge_to;
> > int fd;
> >
> > /*
> > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > if (IS_ERR(dmabuf))
> > return PTR_ERR(dmabuf);
> >
> > + nr_pages = len / PAGE_SIZE;
> > +
> > + if (memcg)
> > + css_get(&memcg->css);
> > + else if (mem_accounting)
> > + memcg = get_mem_cgroup_from_mm(current->mm);
> > +
> > + if (memcg) {
> > + if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > + mem_cgroup_put(memcg);
> > + dma_buf_put(dmabuf);
> > + return -ENOMEM;
> > + }
> > + dmabuf->memcg = memcg;
> > + }
> > +
> > fd = dma_buf_fd(dmabuf, fd_flags);
> > if (fd < 0) {
> > dma_buf_put(dmabuf);
> > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > {
> > struct dma_heap_allocation_data *heap_allocation = data;
> > struct dma_heap *heap = file->private_data;
> > + struct mem_cgroup *memcg = NULL;
> > + struct task_struct *task;
> > + unsigned int pidfd_flags;
> > int fd;
> >
> > if (heap_allocation->fd)
> > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > return -EINVAL;
> >
> > + if (heap_allocation->charge_pid_fd) {
> > + task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
>
> Will always get a thread-group leader pidfd and will fail if this is a
> thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
> open a thread-specific pidfd.
>
> > + if (IS_ERR(task))
> > + return PTR_ERR(task);
> > +
> > + memcg = get_mem_cgroup_from_mm(task->mm);
> > + put_task_struct(task);
> > + }
> > +
> > fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > heap_allocation->fd_flags,
> > - heap_allocation->heap_flags);
> > + heap_allocation->heap_flags,
> > + memcg);
> > + mem_cgroup_put(memcg);
> > if (fd < 0)
> > return fd;
> >
> > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > index 03c2b87cb1112..95d7688167b93 100644
> > --- a/drivers/dma-buf/heaps/system_heap.c
> > +++ b/drivers/dma-buf/heaps/system_heap.c
> > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> > if (max_order < orders[i])
> > continue;
> > flags = order_flags[i];
> > - if (mem_accounting)
> > - flags |= __GFP_ACCOUNT;
> > page = alloc_pages(flags, orders[i]);
> > if (!page)
> > continue;
> > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > --- a/include/uapi/linux/dma-heap.h
> > +++ b/include/uapi/linux/dma-heap.h
> > @@ -29,6 +29,10 @@
> > * handle to the allocated dma-buf
> > * @fd_flags: file descriptor flags used when allocating
> > * @heap_flags: flags passed to heap
> > + * @charge_pid_fd: optional pidfd of the process whose cgroup should be
> > + * charged for this allocation; 0 means charge the calling
> > + * process's cgroup
> > + * @__padding: reserved, must be zero
> > *
> > * Provided by userspace as an argument to the ioctl
> > */
> > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> > __u32 fd;
> > __u32 fd_flags;
> > __u64 heap_flags;
> > + __u32 charge_pid_fd;
> > + __u32 __padding;
> > };
> >
> > #define DMA_HEAP_IOC_MAGIC 'H'
> >
> > --
> > 2.53.0
> >
^ permalink raw reply
* Re: [PATCH v5 00/13] ima: Introduce staging mechanism
From: Lakshmi Ramasubramanian @ 2026-05-15 17:37 UTC (permalink / raw)
To: Roberto Sassu, steven chen, corbet, skhan, zohar, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, Roberto Sassu
In-Reply-To: <2302296a13b847960dbdbab3cf5518b275938838.camel@huaweicloud.com>
Thanks for the response Roberto.
On 5/12/2026 1:17 AM, Roberto Sassu wrote:
>>>
>>> This submission proposes two ways for log trimming:
>>>
>>> *Flavor 1:* Staging With Prompt
>>> *Flavor 2:* Stage and Delete N
>>>
>
> I'm happy to support your trimming method. Just does not fit with my
> use case. I would like to keep both.
>
If "Flavor 1: Staging With Prompt" would be beneficial to the Linux
kernel customers, in general, we should continue to review the change
and merge it eventually.
My request, then, would be to split this patch set into 2 parts:
Part 1: Implements "Staging With Prompt"
Part 2: Implements "Stage and Delete N"
I think that would make the code easier to review, test/validate,
and merge.
Thanks,
-lakshmi
^ permalink raw reply
* Re: [PATCH v1] landlock: Demonstrate best-effort allowed_access filtering
From: Günther Noack @ 2026-05-15 17:53 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Günther Noack, linux-security-module, Justin Suess,
Tingmao Wang
In-Reply-To: <20260513151856.148423-1-mic@digikod.net>
On Wed, May 13, 2026 at 05:18:53PM +0200, Mickaël Salaün wrote:
> Landlock provides best-effort sandboxing across ABI versions:
> applications request the rights they need, and on older kernels the
> unsupported rights are silently dropped from handled_access_* by the
> documented compatibility switch. The recommended pattern for
> landlock_add_rule(2) calls is to mirror this filtering at the rule
> level, which wasn't explicitly described in the example.
>
> Show the pattern explicitly in the filesystem and network rule examples
> by masking each rule's allowed_access against the ruleset's
> handled_access_* and adding the rule only when at least one bit remains
> set. This makes the recommended best-effort pattern self-documenting.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> Documentation/userspace-api/landlock.rst | 48 +++++++++++++-----------
> 1 file changed, 27 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index fd8b78c31f2f..45861fa75685 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> =====================================
>
> :Author: Mickaël Salaün
> -:Date: March 2026
> +:Date: May 2026
>
> The goal of Landlock is to enable restriction of ambient rights (e.g. global
> filesystem or network access) for a set of processes. Because Landlock
> @@ -155,7 +155,7 @@ this file descriptor.
>
> .. code-block:: c
>
> - int err;
> + int err = 0;
> struct landlock_path_beneath_attr path_beneath = {
> .allowed_access =
> LANDLOCK_ACCESS_FS_EXECUTE |
> @@ -163,25 +163,29 @@ this file descriptor.
> LANDLOCK_ACCESS_FS_READ_DIR,
> };
>
> - path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> - if (path_beneath.parent_fd < 0) {
> - perror("Failed to open file");
> - close(ruleset_fd);
> - return 1;
> - }
> - err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> - &path_beneath, 0);
> - close(path_beneath.parent_fd);
> - if (err) {
> - perror("Failed to update ruleset");
> - close(ruleset_fd);
> - return 1;
> + path_beneath.allowed_access &= ruleset_attr.handled_access_fs;
> + if (path_beneath.allowed_access) {
> + path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> + if (path_beneath.parent_fd < 0) {
> + perror("Failed to open file");
> + close(ruleset_fd);
> + return 1;
> + }
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> + &path_beneath, 0);
> + close(path_beneath.parent_fd);
> + if (err) {
> + perror("Failed to update ruleset");
> + close(ruleset_fd);
> + return 1;
> + }
> }
>
> -It may also be required to create rules following the same logic as explained
> -for the ruleset creation, by filtering access rights according to the Landlock
> -ABI version. In this example, this is not required because all of the requested
> -``allowed_access`` rights are already available in ABI 1.
> +As shown above, masking the rule's ``allowed_access`` against the ruleset's
> +``handled_access_*`` is the recommended best-effort pattern: rights the running
> +kernel does not support are dropped (the compatibility switch above already
> +cleared them in ``handled_access_*``), and the rule is skipped if no supported
> +right remains.
>
> For network access-control, we can add a set of rules that allow to use a port
> number for a specific action: HTTPS connections.
> @@ -193,8 +197,10 @@ number for a specific action: HTTPS connections.
> .port = 443,
> };
>
> - err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> - &net_port, 0);
> + net_port.allowed_access &= ruleset_attr.handled_access_net;
> + if (net_port.allowed_access)
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> + &net_port, 0);
>
> When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> similar backwards compatibility check is needed for the restrict flags
> --
> 2.54.0
>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Thanks for the documentation improvement!
–Günther
P.S.: Please don't forget to also transfer this change to the
landlock(7) man page, where we are using the same code example. I
believe the overlap is mostly in the code there, and the text is
slightly different.
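For reference, the masking step in the patch can be factored into a small
helper, sketched here under some assumptions: the ACCESS_FS_* macros are
local stand-ins for the LANDLOCK_ACCESS_FS_* bits so the snippet builds
without <linux/landlock.h>, and filter_allowed_access() is a hypothetical
name, not something the documentation or the man page defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for LANDLOCK_ACCESS_FS_* bits (illustration only). */
#define ACCESS_FS_EXECUTE   (1ULL << 0)
#define ACCESS_FS_READ_FILE (1ULL << 2)
#define ACCESS_FS_READ_DIR  (1ULL << 3)
#define ACCESS_FS_TRUNCATE  (1ULL << 14)

/*
 * Hypothetical helper: drop the rights the ruleset does not handle.
 * Returns true when at least one right remains, i.e. when the caller
 * should go on to call landlock_add_rule(); false means skip the rule.
 */
static bool filter_allowed_access(uint64_t *allowed_access,
				  uint64_t handled_access)
{
	assert(allowed_access != NULL);
	*allowed_access &= handled_access;
	return *allowed_access != 0;
}
```

A caller would pass &path_beneath.allowed_access together with
ruleset_attr.handled_access_fs, and only open the directory and add the
rule when the helper returns true.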
^ permalink raw reply
* Re: [PATCH] rust: cred: add safe abstractions for capable() and ns_capable()
From: Miguel Ojeda @ 2026-05-15 19:07 UTC (permalink / raw)
To: Arnav Sharma
Cc: lkp, Andreas Hindborg, Alice Ryhl, Björn Roy Baron,
Boqun Feng, Danilo Krummrich, Gary Guo, linux-kernel,
linux-security-module, Benno Lossin, oe-kbuild-all, ojeda, paul,
rust-for-linux, Serge Hallyn, Trevor Gross
In-Reply-To: <CACeWta-RVTqcmLVmLTJ7yLrStjrLEyL_oYpG8QLAcy7sEyKmCA@mail.gmail.com>
On Fri, May 15, 2026 at 7:13 PM Arnav Sharma <arnav4324@gmail.com> wrote:
>
> I don't have an immediate in-tree use case for this. I'm fairly new to kernel development and was going through the existing Rust-for-Linux abstractions. I noticed that capable() and ns_capable() had no safe Rust wrappers and attempted to fill that gap — I was approaching it more from a "Hmm.. this seems missing" angle rather than "Do we even need it". I understand now that abstractions need concrete users to justify their inclusion, and I'll keep that in mind going forward.
Yeah, in general, all kernel code needs a user, i.e. a justification
for it to be added (i.e. it is a general rule, not just for Rust
abstractions).
Please see https://rust-for-linux.com/contributing#submitting-new-abstractions-and-modules
for some more details.
(By the way, your email uses HTML, so it may not reach the mailing
list. Please use plain text instead.)
Cheers,
Miguel
^ permalink raw reply
* [PATCH v4 0/7] lsm: Replace security_sb_mount with granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
This series replaces the monolithic security_sb_mount() hook with
per-operation mount hooks, addressing two main issues:
1. TOCTOU: security_sb_mount() receives dev_name as a string, which
LSMs like AppArmor and Tomoyo re-resolve via kern_path(). The new
hooks pass pre-resolved struct path pointers where possible (bind
mount, move mount), eliminating the double-resolution.
2. Conflation: security_sb_mount() handles bind, new mount, remount,
move, propagation changes, and mount reconfiguration through a
single hook, requiring LSMs to dispatch on flags internally. The
new hooks are called at the operation level with appropriate
context.
The new hooks are:
mount_bind - bind mount (pre-resolved source path)
mount_new - new filesystem mount (with fs_context)
mount_remount - filesystem remount (with fs_context)
mount_reconfigure - mount flag reconfiguration (MS_REMOUNT|MS_BIND)
mount_move - move mount (pre-resolved paths)
mount_change_type - propagation type changes
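To illustrate the conflation point, here is a rough userspace sketch of
the flag dispatch that an LSM behind security_sb_mount() has to redo
internally. The check order is assumed to mirror how mount(2) flags are
routed in fs/namespace.c. The enum and classify_mount_flags() are
hypothetical names for illustration; they do not appear in this series.

```c
#include <sys/mount.h>

/* Hypothetical labels for the six operations; not kernel identifiers. */
enum mount_op {
	MOUNT_OP_RECONFIGURE,	/* MS_REMOUNT|MS_BIND */
	MOUNT_OP_REMOUNT,
	MOUNT_OP_BIND,
	MOUNT_OP_CHANGE_TYPE,
	MOUNT_OP_MOVE,
	MOUNT_OP_NEW,
};

/* Map mount(2) flags to the operation they select; first match wins. */
static enum mount_op classify_mount_flags(unsigned long flags)
{
	const unsigned long propagation =
		MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE;

	if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
		return MOUNT_OP_RECONFIGURE;
	if (flags & MS_REMOUNT)
		return MOUNT_OP_REMOUNT;
	if (flags & MS_BIND)
		return MOUNT_OP_BIND;
	if (flags & propagation)
		return MOUNT_OP_CHANGE_TYPE;
	if (flags & MS_MOVE)
		return MOUNT_OP_MOVE;
	return MOUNT_OP_NEW;
}
```

With per-operation hooks, each branch corresponds to exactly one hook
called from the matching do_* function, so an LSM no longer needs a
classifier of this kind.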
mount_new and mount_remount are called after parse_monolithic_mount_data(),
so LSMs have access to the fs_context with parsed mount options. They also
receive the original mount(2) flags and data pointer for LSMs (AppArmor,
Tomoyo) that need them for policy matching.
The series also replaces security_move_mount() with the new mount_move
hook, unifying the old mount(2) MS_MOVE path with the move_mount(2)
syscall path.
All existing LSM behaviors are preserved:
AppArmor: same policy matching, TOCTOU fixed for bind/move
SELinux: same permission checks (FILE__MOUNTON, FILESYSTEM__REMOUNT)
Landlock: same deny-all for sandboxed processes
Tomoyo: same policy matching, TOCTOU fixed for bind/move, unused
data_page parameter removed
This work is inspired by earlier discussions:
[1] https://lore.kernel.org/bpf/20251127005011.1872209-1-song@kernel.org/
[2] https://lore.kernel.org/linux-security-module/20250708230504.3994335-1-song@kernel.org/
Changes v3 => v4:
1. Move LSM_HOOK_INIT(move_mount, ...) removal from patch 7/7 to each
per-LSM conversion patch (3/7, 4/7, 5/7). (Paul Moore)
2. Add kdoc comments to tomoyo mount hook functions and rename
tomoyo_move_mount to tomoyo_mount_move in patch 6/7. (Tetsuo Handa)
3. Add Acked-by from Tetsuo Handa to patch 6/7.
v3: https://lore.kernel.org/linux-security-module/20260509015208.3853132-1-song@kernel.org/
Changes v2 => v3:
1. Rebase.
2. Move security_mount_move() call in vfs_move_mount() from patch 7/7
to patch 1/7. (Paul Moore)
v2: https://lore.kernel.org/linux-security-module/20260430000315.918964-1-song@kernel.org/
Changes v1 => v2:
1. Rebase.
2. Add Reviewed-by and Tested-by from Stephen Smalley.
v1: https://lore.kernel.org/linux-security-module/20260318184400.3502908-1-song@kernel.org/
Song Liu (7):
lsm: Add granular mount hooks to replace security_sb_mount
apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
apparmor: Convert from sb_mount to granular mount hooks
selinux: Convert from sb_mount to granular mount hooks
landlock: Convert from sb_mount to granular mount hooks
tomoyo: Convert from sb_mount to granular mount hooks
lsm: Remove security_sb_mount and security_move_mount
fs/namespace.c | 41 +++++++---
include/linux/lsm_hook_defs.h | 14 +++-
include/linux/security.h | 56 +++++++++++---
kernel/bpf/bpf_lsm.c | 7 +-
security/apparmor/include/mount.h | 5 +-
security/apparmor/lsm.c | 102 ++++++++++++++++++-------
security/apparmor/mount.c | 37 ++--------
security/landlock/fs.c | 41 ++++++++--
security/security.c | 119 +++++++++++++++++++++++-------
security/selinux/hooks.c | 49 ++++++++----
security/tomoyo/common.h | 2 +-
security/tomoyo/mount.c | 31 +++++---
security/tomoyo/tomoyo.c | 109 ++++++++++++++++++++++++---
13 files changed, 457 insertions(+), 156 deletions(-)
--
2.53.0-Meta
^ permalink raw reply
* [PATCH v4 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Add six new LSM hooks for mount operations:
- mount_bind(from, to, recurse): bind mount with pre-resolved
struct path for source and destination.
- mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
mount options are parsed. The flags and data parameters carry the
original mount(2) flags and data for LSMs that need them (AppArmor,
Tomoyo).
- mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
called after mount options are parsed into the fs_context.
- mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
(MS_REMOUNT|MS_BIND path).
- mount_move(from, to): move mount with pre-resolved paths.
- mount_change_type(mp, ms_flags): propagation type changes.
These replace the monolithic security_sb_mount() which conflates
multiple distinct operations into a single hook, and suffers from
TOCTOU issues where LSMs re-resolve string-based dev_name via
kern_path().
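As a simplified model of how a per-operation hook such as mount_bind is
evaluated, the sketch below walks a list of callbacks and stops at the
first non-zero return, so any single LSM can deny the operation. All
names here are hypothetical: struct path_stub stands in for the
pre-resolved struct path, and the two sample hooks are illustrative
policies, not code from this series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an already-resolved struct path (illustration only). */
struct path_stub {
	int id;
};

typedef int (*mount_bind_hook_t)(const struct path_stub *from,
				 const struct path_stub *to, bool recurse);

/* Sample policy: allow every bind mount. */
static int hook_allow(const struct path_stub *from,
		      const struct path_stub *to, bool recurse)
{
	(void)from; (void)to; (void)recurse;
	return 0;
}

/* Sample policy: deny recursive bind mounts only. */
static int hook_deny_recursive(const struct path_stub *from,
			       const struct path_stub *to, bool recurse)
{
	(void)from; (void)to;
	return recurse ? -EPERM : 0;
}

/* Call each registered hook in order; the first error ends the walk. */
static int call_mount_bind_hooks(const mount_bind_hook_t *hooks, size_t n,
				 const struct path_stub *from,
				 const struct path_stub *to, bool recurse)
{
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](from, to, recurse);

		if (rc)
			return rc;
	}
	return 0;
}
```

Because the hooks receive resolved paths rather than a dev_name string,
no callback in the chain needs to look the source up again.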
The mount_move hook is added alongside the existing move_mount hook.
During the transition, LSMs register for both hooks. The move_mount
hook will be removed once all LSMs have been converted.
Some LSMs, such as apparmor and tomoyo, audit the original input passed
in the mount syscall. To keep the same behavior, the data and flags
arguments are passed into the do_* functions. They can be removed once
these LSMs no longer need this information.
All new hooks are registered as sleepable BPF LSM hooks.
Code generated with the assistance of Claude, reviewed by human.
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
Signed-off-by: Song Liu <song@kernel.org>
---
fs/namespace.c | 39 +++++++++++--
include/linux/lsm_hook_defs.h | 12 ++++
include/linux/security.h | 50 +++++++++++++++++
kernel/bpf/bpf_lsm.c | 7 +++
security/security.c | 101 ++++++++++++++++++++++++++++++++++
5 files changed, 203 insertions(+), 6 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..04e3bd7f6336 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2888,6 +2888,10 @@ static int do_change_type(const struct path *path, int ms_flags)
if (!type)
return -EINVAL;
+ err = security_mount_change_type(path, ms_flags);
+ if (err)
+ return err;
+
guard(namespace_excl)();
err = may_change_propagation(mnt);
@@ -3006,6 +3010,10 @@ static int do_loopback(const struct path *path, const char *old_name,
if (err)
return err;
+ err = security_mount_bind(&old_path, path, recurse);
+ if (err)
+ return err;
+
if (mnt_ns_loop(old_path.dentry))
return -EINVAL;
@@ -3328,7 +3336,8 @@ static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
* superblock it refers to. This is triggered by specifying MS_REMOUNT|MS_BIND
* to mount(2).
*/
-static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
+static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags,
+ unsigned long flags)
{
struct super_block *sb = path->mnt->mnt_sb;
struct mount *mnt = real_mount(path->mnt);
@@ -3343,6 +3352,10 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
if (!can_change_locked_flags(mnt, mnt_flags))
return -EPERM;
+ ret = security_mount_reconfigure(path, mnt_flags, flags);
+ if (ret)
+ return ret;
+
/*
* We're only checking whether the superblock is read-only not
* changing it, so only take down_read(&sb->s_umount).
@@ -3366,7 +3379,7 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
* on it - tough luck.
*/
static int do_remount(const struct path *path, int sb_flags,
- int mnt_flags, void *data)
+ int mnt_flags, void *data, unsigned long flags)
{
int err;
struct super_block *sb = path->mnt->mnt_sb;
@@ -3393,6 +3406,9 @@ static int do_remount(const struct path *path, int sb_flags,
fc->oldapi = true;
err = parse_monolithic_mount_data(fc, data);
+ if (!err)
+ err = security_mount_remount(fc, path, mnt_flags, flags,
+ data);
if (!err) {
down_write(&sb->s_umount);
err = -EPERM;
@@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
if (err)
return err;
+ err = security_mount_move(&old_path, path);
+ if (err)
+ return err;
+
return do_move_mount(&old_path, path, 0);
}
@@ -3786,7 +3806,7 @@ static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
*/
static int do_new_mount(const struct path *path, const char *fstype,
int sb_flags, int mnt_flags,
- const char *name, void *data)
+ const char *name, void *data, unsigned long flags)
{
struct file_system_type *type;
struct fs_context *fc;
@@ -3830,6 +3850,9 @@ static int do_new_mount(const struct path *path, const char *fstype,
err = parse_monolithic_mount_data(fc, data);
if (!err && !mount_capable(fc))
err = -EPERM;
+
+ if (!err)
+ err = security_mount_new(fc, path, mnt_flags, flags, data);
if (!err)
err = do_new_mount_fc(fc, path, mnt_flags);
@@ -4141,9 +4164,9 @@ int path_mount(const char *dev_name, const struct path *path,
SB_I_VERSION);
if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
- return do_reconfigure_mnt(path, mnt_flags);
+ return do_reconfigure_mnt(path, mnt_flags, flags);
if (flags & MS_REMOUNT)
- return do_remount(path, sb_flags, mnt_flags, data_page);
+ return do_remount(path, sb_flags, mnt_flags, data_page, flags);
if (flags & MS_BIND)
return do_loopback(path, dev_name, flags & MS_REC);
if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
@@ -4152,7 +4175,7 @@ int path_mount(const char *dev_name, const struct path *path,
return do_move_mount_old(path, dev_name);
return do_new_mount(path, type_page, sb_flags, mnt_flags, dev_name,
- data_page);
+ data_page, flags);
}
int do_mount(const char *dev_name, const char __user *dir_name,
@@ -4549,6 +4572,10 @@ static inline int vfs_move_mount(const struct path *from_path,
if (ret)
return ret;
+ ret = security_mount_move(from_path, to_path);
+ if (ret)
+ return ret;
+
if (mflags & MNT_TREE_PROPAGATION)
return do_set_group(from_path, to_path);
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..98f0fe382665 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -81,6 +81,18 @@ LSM_HOOK(int, 0, sb_clone_mnt_opts, const struct super_block *oldsb,
unsigned long *set_kern_flags)
LSM_HOOK(int, 0, move_mount, const struct path *from_path,
const struct path *to_path)
+LSM_HOOK(int, 0, mount_bind, const struct path *from, const struct path *to,
+ bool recurse)
+LSM_HOOK(int, 0, mount_new, struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+LSM_HOOK(int, 0, mount_remount, struct fs_context *fc,
+ const struct path *mp, int mnt_flags, unsigned long flags,
+ void *data)
+LSM_HOOK(int, 0, mount_reconfigure, const struct path *mp,
+ unsigned int mnt_flags, unsigned long flags)
+LSM_HOOK(int, 0, mount_move, const struct path *from_path,
+ const struct path *to_path)
+LSM_HOOK(int, 0, mount_change_type, const struct path *mp, int ms_flags)
LSM_HOOK(int, -EOPNOTSUPP, dentry_init_security, struct dentry *dentry,
int mode, const struct qstr *name, const char **xattr_name,
struct lsm_context *cp)
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..b1b3da51a88d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -386,6 +386,17 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
unsigned long kern_flags,
unsigned long *set_kern_flags);
int security_move_mount(const struct path *from_path, const struct path *to_path);
+int security_mount_bind(const struct path *from, const struct path *to,
+ bool recurse);
+int security_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data);
+int security_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data);
+int security_mount_reconfigure(const struct path *mp, unsigned int mnt_flags,
+ unsigned long flags);
+int security_mount_move(const struct path *from_path,
+ const struct path *to_path);
+int security_mount_change_type(const struct path *mp, int ms_flags);
int security_dentry_init_security(struct dentry *dentry, int mode,
const struct qstr *name,
const char **xattr_name,
@@ -854,6 +865,45 @@ static inline int security_move_mount(const struct path *from_path,
return 0;
}
+static inline int security_mount_bind(const struct path *from,
+ const struct path *to, bool recurse)
+{
+ return 0;
+}
+
+static inline int security_mount_new(struct fs_context *fc,
+ const struct path *mp, int mnt_flags,
+ unsigned long flags, void *data)
+{
+ return 0;
+}
+
+static inline int security_mount_remount(struct fs_context *fc,
+ const struct path *mp, int mnt_flags,
+ unsigned long flags, void *data)
+{
+ return 0;
+}
+
+static inline int security_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ return 0;
+}
+
+static inline int security_mount_move(const struct path *from_path,
+ const struct path *to_path)
+{
+ return 0;
+}
+
+static inline int security_mount_change_type(const struct path *mp,
+ int ms_flags)
+{
+ return 0;
+}
+
static inline int security_path_notify(const struct path *path, u64 mask,
unsigned int obj_type)
{
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index c5c925f00202..aa228372cfb4 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -382,6 +382,13 @@ BTF_ID(func, bpf_lsm_task_setscheduler)
BTF_ID(func, bpf_lsm_userns_create)
BTF_ID(func, bpf_lsm_bdev_alloc_security)
BTF_ID(func, bpf_lsm_bdev_setintegrity)
+BTF_ID(func, bpf_lsm_move_mount)
+BTF_ID(func, bpf_lsm_mount_bind)
+BTF_ID(func, bpf_lsm_mount_new)
+BTF_ID(func, bpf_lsm_mount_remount)
+BTF_ID(func, bpf_lsm_mount_reconfigure)
+BTF_ID(func, bpf_lsm_mount_move)
+BTF_ID(func, bpf_lsm_mount_change_type)
BTF_SET_END(sleepable_lsm_hooks)
BTF_SET_START(untrusted_lsm_hooks)
diff --git a/security/security.c b/security/security.c
index 4e999f023651..b7ec0ec7af26 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1182,6 +1182,107 @@ int security_move_mount(const struct path *from_path,
return call_int_hook(move_mount, from_path, to_path);
}
+/**
+ * security_mount_bind() - Check permissions for a bind mount
+ * @from: source path
+ * @to: destination mount point
+ * @recurse: whether this is a recursive bind mount
+ *
+ * Check permission before a bind mount is performed. Called with the
+ * source path already resolved, eliminating TOCTOU issues with
+ * string-based dev_name in security_sb_mount().
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
+{
+ return call_int_hook(mount_bind, from, to, recurse);
+}
+
+/**
+ * security_mount_new() - Check permissions for a new mount
+ * @fc: filesystem context with parsed options
+ * @mp: mount point path
+ * @mnt_flags: mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ * @data: filesystem specific data (used by AppArmor)
+ *
+ * Check permission before a new filesystem is mounted. Called after
+ * mount options are parsed, providing access to the fs_context.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return call_int_hook(mount_new, fc, mp, mnt_flags, flags, data);
+}
+
+/**
+ * security_mount_remount() - Check permissions for a remount
+ * @fc: filesystem context with parsed options
+ * @mp: mount point path
+ * @mnt_flags: mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ * @data: filesystem specific data (used by AppArmor)
+ *
+ * Check permission before a filesystem is remounted. Called after
+ * mount options are parsed, providing access to the fs_context.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return call_int_hook(mount_remount, fc, mp, mnt_flags, flags, data);
+}
+
+/**
+ * security_mount_reconfigure() - Check permissions for mount reconfiguration
+ * @mp: mount point path
+ * @mnt_flags: new mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ *
+ * Check permission before mount flags are reconfigured (MS_REMOUNT|MS_BIND).
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_reconfigure(const struct path *mp, unsigned int mnt_flags,
+ unsigned long flags)
+{
+ return call_int_hook(mount_reconfigure, mp, mnt_flags, flags);
+}
+
+/**
+ * security_mount_move() - Check permissions for moving a mount
+ * @from_path: source mount path
+ * @to_path: destination mount point path
+ *
+ * Check permission before a mount is moved.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_move(const struct path *from_path,
+ const struct path *to_path)
+{
+ return call_int_hook(mount_move, from_path, to_path);
+}
+
+/**
+ * security_mount_change_type() - Check permissions for propagation changes
+ * @mp: mount point path
+ * @ms_flags: propagation flags (MS_SHARED, MS_PRIVATE, etc.)
+ *
+ * Check permission before mount propagation type is changed.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_change_type(const struct path *mp, int ms_flags)
+{
+ return call_int_hook(mount_change_type, mp, ms_flags);
+}
+
/**
* security_path_notify() - Check if setting a watch is allowed
* @path: file path
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 2/7] apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
path_mount() already strips the magic number from flags before
calling security_sb_mount(), so this check in apparmor_sb_mount()
is a no-op. Remove it.
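A minimal userspace sketch of why the second strip cannot change anything. The SK_-prefixed macros and sk_strip_magic() are hypothetical names for this example; the two constants carry the values of MS_MGC_VAL and MS_MGC_MSK from include/uapi/linux/mount.h.

```c
/* Values of MS_MGC_VAL and MS_MGC_MSK, renamed for this sketch. */
#define SK_MS_MGC_VAL 0xC0ED0000UL
#define SK_MS_MGC_MSK 0xffff0000UL

/* Same test-and-clear that path_mount() applies before calling the LSM. */
unsigned long sk_strip_magic(unsigned long flags)
{
	if ((flags & SK_MS_MGC_MSK) == SK_MS_MGC_VAL)
		flags &= ~SK_MS_MGC_MSK;
	return flags;
}
```

Once the magic has been cleared, the masked bits are zero and can no longer equal the magic value; if the magic was never present, neither call clears anything. Applying the function a second time therefore returns its input unchanged.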
Code generated with the assistance of Claude, reviewed by human.
Signed-off-by: Song Liu <song@kernel.org>
---
security/apparmor/lsm.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 3491e9f60194..4415bca5889c 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -705,10 +705,6 @@ static int apparmor_sb_mount(const char *dev_name, const struct path *path,
int error = 0;
bool needput;
- /* Discard magic */
- if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
- flags &= ~MS_MGC_MSK;
-
flags &= ~AA_MS_IGNORE_MASK;
label = __begin_current_label_crit_section(&needput);
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 3/7] apparmor: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace AppArmor's monolithic apparmor_sb_mount() with granular
mount hooks.
Key changes:
- mount_bind: uses the pre-resolved struct path from VFS instead of
re-resolving dev_name via kern_path(), eliminating a TOCTOU
vulnerability. aa_bind_mount() now takes a struct path instead of
a string for the source.
- mount_new, mount_remount: receive the original mount(2) flags and
data parameters for policy matching via match_mnt_flags() and
AA_MNT_CONT_MATCH data matching.
- mount_reconfigure: handles MS_REMOUNT|MS_BIND (mount attribute
reconfiguration) which was previously handled as a remount.
- mount_move: reuses apparmor_move_mount() which already handles
pre-resolved paths.
- mount_change_type: propagation type changes.
aa_move_mount_old() is removed since move mounts now go through
security_mount_move() with pre-resolved struct path pointers for
both the old mount(2) and new move_mount(2) APIs.
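For the mount_bind conversion, aa_bind_mount() no longer receives the raw flags and instead rebuilds them from the recurse argument. The following userspace sketch, with hypothetical SK_-prefixed names, suggests why the rebuilt value should match what the old masking produced, assuming the caller only reaches this path with MS_BIND set and derives recurse from MS_REC.

```c
#include <stdbool.h>

/* Values of MS_BIND and MS_REC, renamed for this sketch. */
#define SK_MS_BIND 4096UL
#define SK_MS_REC  16384UL

/* Old approach: mask the raw mount(2) flags down to the bind bits. */
unsigned long sk_bind_flags_old(unsigned long flags)
{
	return flags & (SK_MS_REC | SK_MS_BIND);
}

/* New approach: rebuild the same bits from the boolean hook argument. */
unsigned long sk_bind_flags_new(bool recurse)
{
	return SK_MS_BIND | (recurse ? SK_MS_REC : 0);
}
```

Unrelated bits in the raw flags are dropped by the old mask and never appear in the rebuilt value, so policy matching sees the same input either way.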
Code generated with the assistance of Claude, reviewed by human.
Signed-off-by: Song Liu <song@kernel.org>
---
security/apparmor/include/mount.h | 5 +-
security/apparmor/lsm.c | 100 +++++++++++++++++++++++-------
security/apparmor/mount.c | 37 ++---------
3 files changed, 83 insertions(+), 59 deletions(-)
diff --git a/security/apparmor/include/mount.h b/security/apparmor/include/mount.h
index 46834f828179..088e2f938cc1 100644
--- a/security/apparmor/include/mount.h
+++ b/security/apparmor/include/mount.h
@@ -31,16 +31,13 @@ int aa_remount(const struct cred *subj_cred,
int aa_bind_mount(const struct cred *subj_cred,
struct aa_label *label, const struct path *path,
- const char *old_name, unsigned long flags);
+ const struct path *old_path, bool recurse);
int aa_mount_change_type(const struct cred *subj_cred,
struct aa_label *label, const struct path *path,
unsigned long flags);
-int aa_move_mount_old(const struct cred *subj_cred,
- struct aa_label *label, const struct path *path,
- const char *old_name);
int aa_move_mount(const struct cred *subj_cred,
struct aa_label *label, const struct path *from_path,
const struct path *to_path);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 4415bca5889c..b0de7f316f51 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -13,6 +13,7 @@
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/mount.h>
+#include <linux/fs_context.h>
#include <linux/namei.h>
#include <linux/ptrace.h>
#include <linux/ctype.h>
@@ -698,34 +699,83 @@ static int apparmor_uring_sqpoll(void)
}
#endif /* CONFIG_IO_URING */
-static int apparmor_sb_mount(const char *dev_name, const struct path *path,
- const char *type, unsigned long flags, void *data)
+static int apparmor_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
{
struct aa_label *label;
int error = 0;
bool needput;
- flags &= ~AA_MS_IGNORE_MASK;
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_bind_mount(current_cred(), label, to, from,
+ recurse);
+ __end_current_label_crit_section(label, needput);
+ return error;
+}
+
+static int apparmor_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ /* flags and data are from the original mount(2) call */
label = __begin_current_label_crit_section(&needput);
- if (!unconfined(label)) {
- if (flags & MS_REMOUNT)
- error = aa_remount(current_cred(), label, path, flags,
- data);
- else if (flags & MS_BIND)
- error = aa_bind_mount(current_cred(), label, path,
- dev_name, flags);
- else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE |
- MS_UNBINDABLE))
- error = aa_mount_change_type(current_cred(), label,
- path, flags);
- else if (flags & MS_MOVE)
- error = aa_move_mount_old(current_cred(), label, path,
- dev_name);
- else
- error = aa_new_mount(current_cred(), label, dev_name,
- path, type, flags, data);
- }
+ if (!unconfined(label))
+ error = aa_new_mount(current_cred(), label, fc->source,
+ mp, fc->fs_type->name, flags, data);
+ __end_current_label_crit_section(label, needput);
+
+ return error;
+}
+
+static int apparmor_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags,
+ void *data)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ /* flags and data are from the original mount(2) call */
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_remount(current_cred(), label, mp, flags, data);
+ __end_current_label_crit_section(label, needput);
+
+ return error;
+}
+
+static int apparmor_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ /* flags are from the original mount(2) call */
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_remount(current_cred(), label, mp, flags, NULL);
+ __end_current_label_crit_section(label, needput);
+
+ return error;
+}
+
+static int apparmor_mount_change_type(const struct path *mp, int ms_flags)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_mount_change_type(current_cred(), label, mp,
+ ms_flags);
__end_current_label_crit_section(label, needput);
return error;
@@ -1655,8 +1705,12 @@ static struct security_hook_list apparmor_hooks[] __ro_after_init = {
LSM_HOOK_INIT(capget, apparmor_capget),
LSM_HOOK_INIT(capable, apparmor_capable),
- LSM_HOOK_INIT(move_mount, apparmor_move_mount),
- LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
+ LSM_HOOK_INIT(mount_bind, apparmor_mount_bind),
+ LSM_HOOK_INIT(mount_new, apparmor_mount_new),
+ LSM_HOOK_INIT(mount_remount, apparmor_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, apparmor_mount_reconfigure),
+ LSM_HOOK_INIT(mount_move, apparmor_move_mount),
+ LSM_HOOK_INIT(mount_change_type, apparmor_mount_change_type),
LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
index 523570aa1a5a..38b40e16014f 100644
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -418,25 +418,17 @@ int aa_remount(const struct cred *subj_cred,
}
int aa_bind_mount(const struct cred *subj_cred,
- struct aa_label *label, const struct path *path,
- const char *dev_name, unsigned long flags)
+ struct aa_label *label, const struct path *path,
+ const struct path *old_path, bool recurse)
{
struct aa_profile *profile;
char *buffer = NULL, *old_buffer = NULL;
- struct path old_path;
+ unsigned long flags = MS_BIND | (recurse ? MS_REC : 0);
int error;
AA_BUG(!label);
AA_BUG(!path);
-
- if (!dev_name || !*dev_name)
- return -EINVAL;
-
- flags &= MS_REC | MS_BIND;
-
- error = kern_path(dev_name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &old_path);
- if (error)
- return error;
+ AA_BUG(!old_path);
buffer = aa_get_buffer(false);
old_buffer = aa_get_buffer(false);
@@ -445,12 +437,11 @@ int aa_bind_mount(const struct cred *subj_cred,
goto out;
error = fn_for_each_confined(label, profile,
- match_mnt(subj_cred, profile, path, buffer, &old_path,
+ match_mnt(subj_cred, profile, path, buffer, old_path,
old_buffer, NULL, flags, NULL, false));
out:
aa_put_buffer(buffer);
aa_put_buffer(old_buffer);
- path_put(&old_path);
return error;
}
@@ -514,24 +505,6 @@ int aa_move_mount(const struct cred *subj_cred,
return error;
}
-int aa_move_mount_old(const struct cred *subj_cred, struct aa_label *label,
- const struct path *path, const char *orig_name)
-{
- struct path old_path;
- int error;
-
- if (!orig_name || !*orig_name)
- return -EINVAL;
- error = kern_path(orig_name, LOOKUP_FOLLOW, &old_path);
- if (error)
- return error;
-
- error = aa_move_mount(subj_cred, label, &old_path, path);
- path_put(&old_path);
-
- return error;
-}
-
int aa_new_mount(const struct cred *subj_cred, struct aa_label *label,
const char *dev_name, const struct path *path,
const char *type, unsigned long flags, void *data)
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 4/7] selinux: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace selinux_mount() with granular mount hooks, preserving the
same permission checks:
- mount_bind, mount_new, mount_change_type: FILE__MOUNTON
- mount_remount, mount_reconfigure: FILESYSTEM__REMOUNT
- mount_move: FILE__MOUNTON (reuses selinux_move_mount)
The flags and data parameters are unused by SELinux.
Code generated with the assistance of Claude, reviewed by human.
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>
---
security/selinux/hooks.c | 49 ++++++++++++++++++++++++++++------------
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0f704380a8c8..c8de175bde04 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2802,19 +2802,37 @@ static int selinux_sb_statfs(struct dentry *dentry)
return superblock_has_perm(cred, dentry->d_sb, FILESYSTEM__GETATTR, &ad);
}
-static int selinux_mount(const char *dev_name,
- const struct path *path,
- const char *type,
- unsigned long flags,
- void *data)
+static int selinux_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
{
- const struct cred *cred = current_cred();
+ return path_has_perm(current_cred(), to, FILE__MOUNTON);
+}
- if (flags & MS_REMOUNT)
- return superblock_has_perm(cred, path->dentry->d_sb,
- FILESYSTEM__REMOUNT, NULL);
- else
- return path_has_perm(cred, path, FILE__MOUNTON);
+static int selinux_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return path_has_perm(current_cred(), mp, FILE__MOUNTON);
+}
+
+static int selinux_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags,
+ void *data)
+{
+ return superblock_has_perm(current_cred(), fc->root->d_sb,
+ FILESYSTEM__REMOUNT, NULL);
+}
+
+static int selinux_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ return superblock_has_perm(current_cred(), mp->dentry->d_sb,
+ FILESYSTEM__REMOUNT, NULL);
+}
+
+static int selinux_mount_change_type(const struct path *mp, int ms_flags)
+{
+ return path_has_perm(current_cred(), mp, FILE__MOUNTON);
}
static int selinux_move_mount(const struct path *from_path,
@@ -7558,13 +7576,16 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
LSM_HOOK_INIT(sb_kern_mount, selinux_sb_kern_mount),
LSM_HOOK_INIT(sb_show_options, selinux_sb_show_options),
LSM_HOOK_INIT(sb_statfs, selinux_sb_statfs),
- LSM_HOOK_INIT(sb_mount, selinux_mount),
+ LSM_HOOK_INIT(mount_bind, selinux_mount_bind),
+ LSM_HOOK_INIT(mount_new, selinux_mount_new),
+ LSM_HOOK_INIT(mount_remount, selinux_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, selinux_mount_reconfigure),
+ LSM_HOOK_INIT(mount_change_type, selinux_mount_change_type),
+ LSM_HOOK_INIT(mount_move, selinux_move_mount),
LSM_HOOK_INIT(sb_umount, selinux_umount),
LSM_HOOK_INIT(sb_set_mnt_opts, selinux_set_mnt_opts),
LSM_HOOK_INIT(sb_clone_mnt_opts, selinux_sb_clone_mnt_opts),
- LSM_HOOK_INIT(move_mount, selinux_move_mount),
-
LSM_HOOK_INIT(dentry_init_security, selinux_dentry_init_security),
LSM_HOOK_INIT(dentry_create_files_as, selinux_dentry_create_files_as),
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 5/7] landlock: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace hook_sb_mount() with granular mount hooks. Landlock denies
all mount operations for sandboxed processes regardless of flags,
so all new hooks share a common hook_mount_deny() helper. The
mount_move hook reuses hook_move_mount().
Code generated with the assistance of Claude, reviewed by human.
Signed-off-by: Song Liu <song@kernel.org>
---
security/landlock/fs.c | 41 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 36 insertions(+), 5 deletions(-)
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index c1ecfe239032..7377f22a165e 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1416,9 +1416,7 @@ static void log_fs_change_topology_dentry(
* inherit these new constraints. Anyway, for backward compatibility reasons,
* a dedicated user space option would be required (e.g. as a ruleset flag).
*/
-static int hook_sb_mount(const char *const dev_name,
- const struct path *const path, const char *const type,
- const unsigned long flags, void *const data)
+static int hook_mount_deny(const struct path *const path)
{
size_t handle_layer;
const struct landlock_cred_security *const subject =
@@ -1432,6 +1430,35 @@ static int hook_sb_mount(const char *const dev_name,
return -EPERM;
}
+static int hook_mount_bind(const struct path *const from,
+ const struct path *const to, bool recurse)
+{
+ return hook_mount_deny(to);
+}
+
+static int hook_mount_new(struct fs_context *fc, const struct path *const mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return hook_mount_deny(mp);
+}
+
+static int hook_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return hook_mount_deny(mp);
+}
+
+static int hook_mount_reconfigure(const struct path *const mp,
+ unsigned int mnt_flags, unsigned long flags)
+{
+ return hook_mount_deny(mp);
+}
+
+static int hook_mount_change_type(const struct path *const mp, int ms_flags)
+{
+ return hook_mount_deny(mp);
+}
+
static int hook_move_mount(const struct path *const from_path,
const struct path *const to_path)
{
@@ -1950,8 +1977,12 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu),
LSM_HOOK_INIT(sb_delete, hook_sb_delete),
- LSM_HOOK_INIT(sb_mount, hook_sb_mount),
- LSM_HOOK_INIT(move_mount, hook_move_mount),
+ LSM_HOOK_INIT(mount_bind, hook_mount_bind),
+ LSM_HOOK_INIT(mount_new, hook_mount_new),
+ LSM_HOOK_INIT(mount_remount, hook_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, hook_mount_reconfigure),
+ LSM_HOOK_INIT(mount_change_type, hook_mount_change_type),
+ LSM_HOOK_INIT(mount_move, hook_move_mount),
LSM_HOOK_INIT(sb_umount, hook_sb_umount),
LSM_HOOK_INIT(sb_remount, hook_sb_remount),
LSM_HOOK_INIT(sb_pivotroot, hook_sb_pivotroot),
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 6/7] tomoyo: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace tomoyo_sb_mount() with granular mount hooks. Each hook
reconstructs the MS_* flags expected by tomoyo_mount_permission()
using the original flags parameter where available.
Key changes:
- mount_bind: passes the pre-resolved source path to
tomoyo_mount_acl() via a new dev_path parameter, instead of
re-resolving dev_name via kern_path(). This eliminates a TOCTOU
vulnerability.
- mount_new, mount_remount, mount_reconfigure: use the original
mount(2) flags for policy matching.
- mount_move: passes pre-resolved paths for both source and
destination.
- mount_change_type: passes raw ms_flags directly.
Also removes the unused data_page parameter from
tomoyo_mount_permission().
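The TOCTOU issue addressed by the mount_bind change can be modeled in userspace. This is only a toy sketch: sk_table, sk_resolve() and the other SK-prefixed names are hypothetical stand-ins for a name lookup such as kern_path(), not kernel interfaces.

```c
#include <string.h>

/* Hypothetical name-to-object table standing in for the filesystem. */
struct sk_entry {
	const char *name;
	int object_id;
};

static struct sk_entry sk_table[] = { { "/mnt/src", 1 } };

/* Look a name up; returns the object id, or -1 if it is not found. */
int sk_resolve(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sk_table) / sizeof(sk_table[0]); i++)
		if (strcmp(sk_table[i].name, name) == 0)
			return sk_table[i].object_id;
	return -1;
}

/* Simulates a rename or symlink swap racing with the mount call. */
void sk_retarget(int new_id)
{
	sk_table[0].object_id = new_id;
}

/* Old style: the check resolves the string again on its own. */
int sk_check_by_name(const char *name)
{
	return sk_resolve(name);
}

/* New style: the check is handed the object the caller resolved. */
int sk_check_by_object(int object_id)
{
	return object_id;
}

/* Returns 1 when only the name-based check saw a different object. */
int sk_demo_mismatch(void)
{
	int mounted;

	sk_table[0].object_id = 1;        /* reset the toy state */
	mounted = sk_resolve("/mnt/src"); /* what the VFS acts on */
	sk_retarget(2);                   /* racing change */
	return sk_check_by_name("/mnt/src") != mounted &&
	       sk_check_by_object(mounted) == mounted;
}
```

In this model the name-based check authorizes object 2 while the operation proceeds on object 1; passing the already-resolved object closes that gap.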
Code generated with the assistance of Claude, reviewed by human.
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Song Liu <song@kernel.org>
---
security/tomoyo/common.h | 2 +-
security/tomoyo/mount.c | 31 +++++++----
security/tomoyo/tomoyo.c | 109 +++++++++++++++++++++++++++++++++++----
3 files changed, 121 insertions(+), 21 deletions(-)
diff --git a/security/tomoyo/common.h b/security/tomoyo/common.h
index d098cf8aae61..9241034cfede 100644
--- a/security/tomoyo/common.h
+++ b/security/tomoyo/common.h
@@ -1013,7 +1013,7 @@ int tomoyo_mkdev_perm(const u8 operation, const struct path *path,
const unsigned int mode, unsigned int dev);
int tomoyo_mount_permission(const char *dev_name, const struct path *path,
const char *type, unsigned long flags,
- void *data_page);
+ const struct path *dev_path);
int tomoyo_open_control(const u8 type, struct file *file);
int tomoyo_path2_perm(const u8 operation, const struct path *path1,
const struct path *path2);
diff --git a/security/tomoyo/mount.c b/security/tomoyo/mount.c
index 322dfd188ada..82ffe7d02814 100644
--- a/security/tomoyo/mount.c
+++ b/security/tomoyo/mount.c
@@ -70,6 +70,7 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
* @dir: Pointer to "struct path".
* @type: Name of filesystem type.
* @flags: Mount options.
+ * @dev_path: Pre-resolved device/source path. Maybe NULL.
*
* Returns 0 on success, negative value otherwise.
*
@@ -78,11 +79,11 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
static int tomoyo_mount_acl(struct tomoyo_request_info *r,
const char *dev_name,
const struct path *dir, const char *type,
- unsigned long flags)
+ unsigned long flags,
+ const struct path *dev_path)
__must_hold_shared(&tomoyo_ss)
{
struct tomoyo_obj_info obj = { };
- struct path path;
struct file_system_type *fstype = NULL;
const char *requested_type = NULL;
const char *requested_dir_name = NULL;
@@ -134,13 +135,23 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
need_dev = 1;
}
if (need_dev) {
- /* Get mount point or device file. */
- if (!dev_name || kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
+ if (dev_path) {
+ /* Use pre-resolved path to avoid TOCTOU issues. */
+ obj.path1 = *dev_path;
+ path_get(&obj.path1);
+ } else if (!dev_name) {
error = -ENOENT;
goto out;
+ } else {
+ struct path path;
+
+ if (kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
+ error = -ENOENT;
+ goto out;
+ }
+ obj.path1 = path;
}
- obj.path1 = path;
- requested_dev_name = tomoyo_realpath_from_path(&path);
+ requested_dev_name = tomoyo_realpath_from_path(&obj.path1);
if (!requested_dev_name) {
error = -ENOENT;
goto out;
@@ -173,7 +184,7 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
if (fstype)
put_filesystem(fstype);
kfree(requested_type);
- /* Drop refcount obtained by kern_path(). */
+ /* Drop refcount obtained by kern_path() or path_get(). */
if (obj.path1.dentry)
path_put(&obj.path1);
return error;
@@ -186,13 +197,13 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
* @path: Pointer to "struct path".
* @type: Name of filesystem type. Maybe NULL.
* @flags: Mount options.
- * @data_page: Optional data. Maybe NULL.
+ * @dev_path: Pre-resolved device/source path. Maybe NULL.
*
* Returns 0 on success, negative value otherwise.
*/
int tomoyo_mount_permission(const char *dev_name, const struct path *path,
const char *type, unsigned long flags,
- void *data_page)
+ const struct path *dev_path)
{
struct tomoyo_request_info r;
int error;
@@ -236,7 +247,7 @@ int tomoyo_mount_permission(const char *dev_name, const struct path *path,
if (!type)
type = "<NULL>";
idx = tomoyo_read_lock();
- error = tomoyo_mount_acl(&r, dev_name, path, type, flags);
+ error = tomoyo_mount_acl(&r, dev_name, path, type, flags, dev_path);
tomoyo_read_unlock(idx);
return error;
}
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index c66e02ed8ee3..c93d000acc95 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -6,6 +6,8 @@
*/
#include <linux/lsm_hooks.h>
+#include <linux/fs_context.h>
+#include <uapi/linux/mount.h>
#include <uapi/linux/lsm.h>
#include "common.h"
@@ -399,20 +401,102 @@ static int tomoyo_path_chroot(const struct path *path)
}
/**
- * tomoyo_sb_mount - Target for security_sb_mount().
+ * tomoyo_mount_bind - Target for security_mount_bind().
*
- * @dev_name: Name of device file. Maybe NULL.
- * @path: Pointer to "struct path".
- * @type: Name of filesystem type. Maybe NULL.
- * @flags: Mount options.
- * @data: Optional data. Maybe NULL.
+ * @from: Pointer to "struct path".
+ * @to: Pointer to "struct path".
+ * @recurse: Whether recursive bind mount or not.
*
* Returns 0 on success, negative value otherwise.
*/
-static int tomoyo_sb_mount(const char *dev_name, const struct path *path,
- const char *type, unsigned long flags, void *data)
+static int tomoyo_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
{
- return tomoyo_mount_permission(dev_name, path, type, flags, data);
+ unsigned long flags = MS_BIND | (recurse ? MS_REC : 0);
+
+ return tomoyo_mount_permission(NULL, to, NULL, flags, from);
+}
+
+/**
+ * tomoyo_mount_new - Target for security_mount_new().
+ *
+ * @fc: Pointer to "struct fs_context".
+ * @mp: Pointer to "struct path".
+ * @mnt_flags: Mount options.
+ * @flags: Original mount options.
+ * @data: Optional data. Maybe NULL.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ /* Use original MS_* flags for policy matching */
+ return tomoyo_mount_permission(fc->source, mp, fc->fs_type->name,
+ flags, NULL);
+}
+
+/**
+ * tomoyo_mount_remount - Target for security_mount_remount().
+ *
+ * @fc: Pointer to "struct fs_context".
+ * @mp: Pointer to "struct path".
+ * @mnt_flags: Mount options.
+ * @flags: Original mount options.
+ * @data: Optional data. Maybe NULL.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ /* Use original MS_* flags for policy matching */
+ return tomoyo_mount_permission(NULL, mp, NULL, flags, NULL);
+}
+
+/**
+ * tomoyo_mount_reconfigure - Target for security_mount_reconfigure().
+ *
+ * @mp: Pointer to "struct path".
+ * @mnt_flags: Mount options.
+ * @flags: Original mount options.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ /* Use original MS_* flags for policy matching */
+ return tomoyo_mount_permission(NULL, mp, NULL, flags, NULL);
+}
+
+/**
+ * tomoyo_mount_change_type - Target for security_mount_change_type().
+ *
+ * @mp: Pointer to "struct path".
+ * @ms_flags: Mount options.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_change_type(const struct path *mp, int ms_flags)
+{
+ return tomoyo_mount_permission(NULL, mp, NULL, ms_flags, NULL);
+}
+
+/**
+ * tomoyo_mount_move - Target for security_mount_move().
+ *
+ * @from_path: Pointer to "struct path".
+ * @to_path: Pointer to "struct path".
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_move(const struct path *from_path,
+ const struct path *to_path)
+{
+ return tomoyo_mount_permission(NULL, to_path, NULL, MS_MOVE,
+ from_path);
}
/**
@@ -576,7 +660,12 @@ static struct security_hook_list tomoyo_hooks[] __ro_after_init = {
LSM_HOOK_INIT(path_chmod, tomoyo_path_chmod),
LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
- LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
+ LSM_HOOK_INIT(mount_bind, tomoyo_mount_bind),
+ LSM_HOOK_INIT(mount_new, tomoyo_mount_new),
+ LSM_HOOK_INIT(mount_remount, tomoyo_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, tomoyo_mount_reconfigure),
+ LSM_HOOK_INIT(mount_change_type, tomoyo_mount_change_type),
+ LSM_HOOK_INIT(mount_move, tomoyo_mount_move),
LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),
LSM_HOOK_INIT(socket_bind, tomoyo_socket_bind),
--
2.53.0-Meta