Re: [RFC PATCH v2 bpf-next 0/3] bpf: cgroup: support writing and freezing cgroups from BPF

From: "Michal Koutný" <mkoutny@suse.com>
To: Djalal Harouni <tixxdz@gmail.com>
Cc: tj@kernel.org, hannes@cmpxchg.org, ast@kernel.org,
	daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
	eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
	haoluo@google.com, jolsa@kernel.org, mykolal@fb.com,
	shuah@kernel.org, cgroups@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, tixxdz@opendz.org
Subject: Re: [RFC PATCH v2 bpf-next 0/3] bpf: cgroup: support writing and freezing cgroups from BPF
Date: Thu, 28 Aug 2025 16:38:45 +0200
Message-ID: <m7laj6747wtu5r732iph47zn6no3mbu6iq3mne3zslzyqlq523@7tmw25ap77ek>
In-Reply-To: <0e78be6f-ef48-4fcc-b0c7-48bc14fdfc7f@gmail.com>

On Wed, Aug 27, 2025 at 12:27:08AM +0100, Djalal Harouni <tixxdz@gmail.com> wrote:
> It solves the case perfectly, you detect something you fail the
> security hook return -EPERM and optionally freeze the cgroup,
> snapshot the runtime state.

So -EPERM is the right way to cut off such tasks.

> Oh I thought the attached example is an obvious one, customers want to
> restrict bpf() usage per cgroup specific container/pod, so when
> we detect bpf() that's not per allowed cgroup we fail it and freeze
> it.
> 
> Take this and build on top, detect bash/shell exec or any other new
> dropped binaries, fail and freeze the exec early at linux_bprm object
> checks.

Or, if you want to do some follow-up analysis, the process can be killed
and core-dumped. At least seccomp allows this; it would be good to have
such a possibility with LSMs if there isn't one already (I'm not that
familiar with them). Freezing the cgroups sounds like a way to DoS the
system: not only does the faulty process itself hang, but the hang can
also spread via IPC dependencies to unrelated processes.

> > Also why couldn't all these tools execute the cgroup actions themselves
> > through traditional userspace API?
> 
> - Freezing at BPF is obviously better, less race since you don't need
>   access to the corresponding cgroup fs and namespace. Not all tools run
>   as supervisor/container manager.

Less race or more race: I know the size of the race window may vary,
but strictly speaking there either is a race or there isn't (it depends
on whether there is proper synchronization). And when intentionally
misbehaving processes are considered, even a tiny window is a potential
risk.

> - The bpf_send_signal in some cases is not enough, what if you race with
>   a task clone as an example? however freezing the cgroup hierarchy or
>   the one above is a catch all...

Yeah, this might be the part that I don't fully grasp. If you're
running the hook in a particular task's process context, that task
cannot call clone at the same time. If they are independent tasks,
there's no ordering between them, so there's always a possibility of a
race. So why not embrace it, do whatever is possible with userspace
monitoring (audit log or similar), and respond based on that?
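
For reference, the userspace response could look roughly like the sketch
below. It relies only on the cgroup v2 interface files cgroup.freeze and
cgroup.events; the function names, the timeout values and the mount
point in the usage comment are assumptions made for illustration.
Freezing is asynchronous, so the sketch polls the "frozen" key in
cgroup.events rather than assuming the write takes effect immediately.

```python
import os
import time


def set_frozen(cgroup_dir: str, frozen: bool) -> None:
    """Write 1 (freeze) or 0 (thaw) to the cgroup.freeze file of cgroup_dir."""
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("1" if frozen else "0")


def is_frozen(cgroup_dir: str) -> bool:
    """Return True if cgroup.events in cgroup_dir reports 'frozen 1'."""
    with open(os.path.join(cgroup_dir, "cgroup.events")) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            if key == "frozen":
                return value == "1"
    return False


def freeze_and_wait(cgroup_dir: str, timeout: float = 1.0,
                    poll: float = 0.01) -> bool:
    """Request a freeze, then poll until it is reported or timeout expires."""
    set_frozen(cgroup_dir, True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_frozen(cgroup_dir):
            return True
        time.sleep(poll)
    return is_frozen(cgroup_dir)


# Hypothetical usage, assuming cgroup v2 is mounted at /sys/fs/cgroup:
#   freeze_and_wait("/sys/fs/cgroup/pod-x/container-y/cgroup-z")
```

The interval between detecting the event and the freeze being reported
is the race window discussed above; polling does not remove it, it only
makes the end of the window observable.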

> The feature is supposed to be used by sleepable BPF programs, I don't
> think we need extra checks here?

Good.

> It could be that this BPF code runs in a process that is under
> pod-x/container-y/cgroup-z/  and maybe you want to freeze "cgroup-z"
> or "container-y" and so on... or in case of delegated hierarchies,
> freezing the parent is a catch all.

OK, this would be good. Could it also be pod-x/container-y/cgroup-z2?
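
To make the hierarchy question concrete: in cgroup v2, freezing a cgroup
also freezes all of its descendants, so which tasks are affected depends
only on the path relationship. A small sketch, using the paths from the
example above (the helper name covers is hypothetical):

```python
from pathlib import PurePosixPath


def covers(frozen_cgroup: str, task_cgroup: str) -> bool:
    """Return True if task_cgroup is frozen_cgroup itself or a descendant.

    Path components are compared one by one, so that "cgroup-z" is not
    mistaken for an ancestor of its sibling "cgroup-z2".
    """
    frozen_parts = PurePosixPath(frozen_cgroup).parts
    task_parts = PurePosixPath(task_cgroup).parts
    return task_parts[:len(frozen_parts)] == frozen_parts


# Freezing the parent is a catch-all for both children:
#   covers("pod-x/container-y", "pod-x/container-y/cgroup-z")   is True
#   covers("pod-x/container-y", "pod-x/container-y/cgroup-z2")  is True
# Freezing one child does not affect its sibling:
#   covers("pod-x/container-y/cgroup-z", "pod-x/container-y/cgroup-z2")
#   is False
```

Targeting the sibling cgroup-z2 from a task running in cgroup-z is thus
a different case from freezing an ancestor: the caller's own cgroup is
not covered by the freeze.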

---

I acknowledge that sooner or later some kind of access to cgroups
through BPF will be added. I'd prefer it to be done in a generic way (so
that it doesn't become cgroup's problem but someone else's, e.g. VFS's
or kernfs's ;-)).
I can even imagine some usefulness in helpers for selected specific
cgroup (core) operations (which is the direction brought up in the other
discussion); I just don't think that solves the problem as you present
it.

HTH,
Michal

Thread overview: 16+ messages
2025-08-18  9:04 [RFC PATCH v2 bpf-next 0/3] bpf: cgroup: support writing and freezing cgroups from BPF Djalal Harouni
2025-08-18  9:04 ` [RFC PATCH v2 bpf-next 1/3] kernfs: cgroup: support writing cgroup interfaces from a kernfs node Djalal Harouni
2025-08-18  9:04 ` [RFC PATCH v2 bpf-next 2/3] bpf: cgroup: Add BPF Kfunc to write and freeze a cgroup Djalal Harouni
2025-08-18  9:04 ` [RFC PATCH v2 bpf-next 3/3] selftests/bpf: add selftest for bpf_cgroup_write_interface Djalal Harouni
2025-08-18 17:32 ` [RFC PATCH v2 bpf-next 0/3] bpf: cgroup: support writing and freezing cgroups from BPF Tejun Heo
2025-08-19 23:31   ` Djalal Harouni
2025-08-19 23:36     ` Djalal Harouni
2025-08-20  1:14     ` Tejun Heo
2025-08-22 18:16       ` Djalal Harouni
2025-08-25 18:48         ` Tejun Heo
2025-08-26  3:45           ` Alexei Starovoitov
2025-08-26 10:23           ` Djalal Harouni
2025-08-26 14:18 ` Michal Koutný
2025-08-26 23:27   ` Djalal Harouni
2025-08-28 14:38     ` Michal Koutný [this message]
2025-09-01 19:53       ` Djalal Harouni
