linux-security-module.vger.kernel.org archive mirror
* [RFC PATCH v3 0/3] devguard: guard mknod for non-initial user namespace
From: Michael Weiß @ 2023-12-13 14:38 UTC
  To: Christian Brauner, Alexander Mikhalitsyn, Alexei Starovoitov,
	Paul Moore
  Cc: Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Quentin Monnet, Alexander Viro,
	Miklos Szeredi, Amir Goldstein, Serge E. Hallyn, bpf,
	linux-kernel, linux-fsdevel, linux-security-module, gyroidos,
	Michael Weiß

If a container manager restricts its unprivileged (user-namespaced)
children with a device cgroup, it is no longer necessary to deny
mknod(). User space applications can then map devices at different
locations in the file system by calling mknod() inside the container.

One use case, which we also have in GyroidOS, is running virsh for
VMs inside an unprivileged container. virsh creates device nodes,
e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null", which currently fails
in a non-initial userns even if a device cgroup whitelist with the
corresponding major and minor numbers of /dev/null exists. In this
case the usual bind mounts or pre-populated device nodes under /dev
are not sufficient.

Following the discussion with Christian on v2, I agree that the previous
approach was too complex. Actually, we just want working device
nodes in a user namespace if a device cgroup is in place which
handles access decisions.

Patch 1 provides a helper function to check whether the current task
is guarded by a bpf device cgroup program.
Thanks to Alexander Mikhalitsyn for reviewing.

Patch 2 implements the ns_capable check, including the sysctl, as
proposed by Christian. I provide a short overview of device node
creation and access decisions in the commit message there.

Patch 3 provides devguard, a small LSM which actually strips out
SB_I_NODEV.
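
How the three patches are meant to interact can be sketched as a small
user space model. This is not the kernel code from the patches: the
struct, its fields and the model_*() functions are hypothetical, and
the exact semantics of the sysctl are an assumption (here it simply
gates the namespaced capability check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; the field names are illustrative, not kernel API. */
struct mknod_ctx {
	bool cap_mknod_init_userns; /* CAP_MKNOD in the initial userns */
	bool cap_mknod_sb_userns;   /* CAP_MKNOD in the userns owning the sb */
	bool sysctl_userns_mknod;   /* assumed sysctl gate from patch 2 */
	bool bpf_device_cgroup;     /* patch 1: task guarded by a bpf device cgroup program */
	bool sb_nodev;              /* models SB_I_NODEV on the superblock */
};

/* Patch 2 (modelled): global capability, or namespaced one if the sysctl allows it. */
bool model_mknod_capable(const struct mknod_ctx *c)
{
	assert(c != NULL);

	if (c->cap_mknod_init_userns)
		return true;

	return c->sysctl_userns_mknod && c->cap_mknod_sb_userns;
}

/* Patch 3 (modelled): strip the nodev flag only if a device cgroup guards the task. */
void model_devguard_mknod_hook(struct mknod_ctx *c)
{
	assert(c != NULL);

	if (c->bpf_device_cgroup)
		c->sb_nodev = false;
}

/*
 * The new node is only usable if mknod was permitted and the superblock
 * no longer carries nodev. Actual access to the device is still decided
 * by the device cgroup, which this model does not cover.
 */
bool model_node_usable(struct mknod_ctx *c)
{
	if (!model_mknod_capable(c))
		return false;

	model_devguard_mknod_hook(c);
	return !c->sb_nodev;
}
```

In this model, a task with CAP_MKNOD in the superblock's userns gets a
usable node only if the sysctl is enabled and a device cgroup program
is attached. Without the device cgroup the nodev flag stays set, so
the node would not work.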

---
Changes in v3:
- Small LSM to just implement security_inode_mknod() hook
- Leave devcgroup as is
- Strip SB_I_NODEV in security_inode_mknod hook as suggested by
  Christian
- Do not change bpf or cgroup access decision at all
- ns_capable(sb->s_user_ns, CAP_MKNOD) in vfs_mknod()
- Link to v2: https://lore.kernel.org/lkml/1d481e11-6601-4b82-a317-f8506f3ccf9b@aisec.fraunhofer.de/

Changes in v2:
- Integrate this as LSM (Christian, Paul)
- Switched to a device cgroup specific flag instead of a generic
  bpf program flag (Christian)
- Do not ignore SB_I_NODEV in fs/namei.c but use LSM hook in
  sb_alloc_super in fs/super.c
- Link to v1: https://lore.kernel.org/lkml/20230814-devcg_guard-v1-0-654971ab88b1@aisec.fraunhofer.de

Michael Weiß (3):
  bpf: cgroup: Introduce helper cgroup_bpf_current_enabled()
  fs: Make vfs_mknod() to check CAP_MKNOD in user namespace of sb
  devguard: added device guard for mknod in non-initial userns

 fs/namei.c                   | 30 +++++++++++++++++++++++-
 include/linux/bpf-cgroup.h   |  2 ++
 kernel/bpf/cgroup.c          | 14 ++++++++++++
 security/Kconfig             | 11 +++++----
 security/Makefile            |  1 +
 security/devguard/Kconfig    | 12 ++++++++++
 security/devguard/Makefile   |  2 ++
 security/devguard/devguard.c | 44 ++++++++++++++++++++++++++++++++++++
 8 files changed, 110 insertions(+), 6 deletions(-)
 create mode 100644 security/devguard/Kconfig
 create mode 100644 security/devguard/Makefile
 create mode 100644 security/devguard/devguard.c


base-commit: a39b6ac3781d46ba18193c9dbb2110f31e9bffe9
-- 
2.30.2

