From: Martin Pitt <martin@piware.de>
To: regressions@lists.linux.dev
Cc: cgroups@vger.kernel.org, tj@kernel.org, lizefan.x@bytedance.com,
hannes@cmpxchg.org
Subject: [REGRESSION] 6.9.11: systemd hangs in cgroup_drain_dying during cleanup after podman operations
Date: Wed, 29 Apr 2026 11:21:07 +0200
Message-ID: <afHNg2VX2jy9bW7y@piware.de>
Hello,
Our cockpit tests found a kernel regression introduced between 6.9.10 (working)
and 6.9.11 (broken) that causes a system hang during cgroup cleanup after
podman container operations. I've kept notes in
https://github.com/cockpit-project/bots/pull/8970#issuecomment-4342147158 , but
I am now at a loss as to how to squeeze more information out of this.
=== Summary ===
When running podman REST API operations on rootless containers followed by user
session cleanup (loginctl/pkill), systemd (pid 1) gets stuck in
cgroup_drain_dying trying to remove an empty cgroup. After that:
- I cannot run commands that access /proc (ps, top, lsns, ls /proc, etc.)
- I cannot create new SSH sessions or VT logins
- If I previously logged into the QEMU VT, that login session remains
  more or less functional, except that it cannot run most commands
=== Kernel Versions ===
- Last known working: 6.9.10
- Broken: 6.9.11 (openSUSE Tumbleweed), 6.9.13 (Fedora 44), 6.9.14 (Fedora 44),
  7.0.0 (Ubuntu 26.04)
=== Stack Trace ===
From sysrq-trigger task dump, systemd is stuck in:
[ 207.958946] task:systemd state:D stack:0 pid:1 tgid:1 ppid:0
[ 207.959734] Call Trace:
[ 207.960117] <TASK>
[ 207.960333] __schedule+0x2b2/0x5d0
[ 207.960603] schedule+0x27/0x80
[ 207.960945] cgroup_drain_dying+0xef/0x1a0
[ 207.961287] ? __pfx_autoremove_wake_function+0x10/0x10
[ 207.961639] cgroup_rmdir+0x37/0x100
[ 207.961945] kernfs_iop_rmdir+0x6a/0xd0
[ 207.962239] vfs_rmdir+0x154/0x270
[ 207.962486] do_rmdir+0x201/0x280
[ 207.962723] __x64_sys_unlinkat+0x8c/0xd0
=== Observations ===
- /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.procs was empty, indicating
  that all processes were killed but the cgroup itself could not be removed
- Multiple zombie processes are present and cannot be reaped (user@1000.service
  systemd, podman, conmon processes)
- RCU subsystem appears healthy (rcu_exp_gp_kthr in S state)
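For reference, inspecting the cgroup state comes down to plain reads of
cgroupfs; a sketch of the checks (paths as in the observations above,
cgroup.events added as another field worth looking at):

  # cgroup.procs of the user slice -- empty, yet rmdir of the cgroup hangs
  cat /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.procs

  # remaining child cgroups under the user slice
  ls /sys/fs/cgroup/user.slice/user-1000.slice/

  # per-cgroup "populated"/"frozen" state
  cat /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.events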
=== Reproducer ===
The bug is triggered by a specific sequence of podman REST API operations on
rootless containers, followed by user cleanup. The reproducer is part of the
cockpit-podman test suite. I created a branch where I reduced the test to the
absolute minimum, and also replaced as many UI clicks as possible with shell
operations (all but one):
https://github.com/martinpitt/cockpit-podman/blob/kernel-hang/test/check-application#L1486
Sequence:
1. Create and stop a rootless container as the admin user
2. Call podman REST API lifecycle operations: start → restart → stop
3. Create an exec session (console/TTY connection) via REST API
4. Start the container again via REST API
5. Cleanup: loginctl terminate-user admin; loginctl kill-user admin; pkill -9 -u admin
Using podman CLI commands (e.g. "podman start swamped-crate") instead of the
REST API does NOT trigger the hang. That may be because of the different
process layout, or just sheer timing -- eventually, both the CLI and the API
should result in the same actual cgroup/container operations on the podman
side.
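For illustration, the REST API variant of steps 2-4 above corresponds to
roughly the following calls against the rootless podman API socket. This is
only a sketch, not the actual test code: the socket path and API version
prefix are assumptions, and the real test drives the API through cockpit's
HTTP channel rather than curl.

  # run as the admin user; assumes the rootless podman API socket is active
  SOCK=/run/user/$(id -u)/podman/podman.sock
  API=http://d/v4.0.0/libpod

  # step 2: lifecycle operations start -> restart -> stop
  curl -s --unix-socket $SOCK -X POST $API/containers/swamped-crate/start
  curl -s --unix-socket $SOCK -X POST $API/containers/swamped-crate/restart
  curl -s --unix-socket $SOCK -X POST $API/containers/swamped-crate/stop

  # step 3: create an exec session with a TTY
  curl -s --unix-socket $SOCK -X POST $API/containers/swamped-crate/exec \
      -H 'Content-Type: application/json' \
      -d '{"Cmd": ["/bin/sh"], "Tty": true, "AttachStdin": true, "AttachStdout": true}'

  # step 4: start the container again
  curl -s --unix-socket $SOCK -X POST $API/containers/swamped-crate/start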
The bug is very timing-sensitive. I attempted to create a standalone shell
script reproducer, but failed; with that, the test always passes. Even the
original cockpit-podman integration test reproduces it unreliably: it can hang
on the first iteration, and most of the time it fails within 5 runs, but I've
had stretches where 50+ iterations passed before the hang happened.
=== Full debug output ===
The above GitHub PR comment links to the full dmesg log. Direct link:
https://github.com/user-attachments/files/27195205/dmesg-cgrouphang.txt
This covers initial boot up to the hang, and then the outputs of sysrq task
dump (t), memory info (m), and blocked tasks (w).
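For reference, those dumps were produced via the magic sysrq interface; the
rough incantation (assuming kernel.sysrq permits these functions):

  # allow all sysrq functions
  sysctl kernel.sysrq=1

  # t: dump all tasks, m: memory info, w: blocked (uninterruptible) tasks
  echo t > /proc/sysrq-trigger
  echo m > /proc/sysrq-trigger
  echo w > /proc/sysrq-trigger

  # the output goes to the kernel log / serial console
  dmesg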
=== Additional Notes ===
In one early test run, a different hang pattern was observed where
rcu_exp_gp_kthr was in D state with a process stuck in
synchronize_rcu_expedited during namespace cleanup, but this variant has not
reproduced in subsequent runs. The cgroup cleanup deadlock appears to be the
primary manifestation.
This is my first (non-trivial) kernel bug report, so please bear with me. I
normally stay firmly in userland.
Thanks,
Martin Pitt