Linux cgroups development
From: Martin Pitt <martin@piware.de>
To: regressions@lists.linux.dev
Cc: cgroups@vger.kernel.org, tj@kernel.org, lizefan.x@bytedance.com,
	hannes@cmpxchg.org
Subject: [REGRESSION] 6.9.11: systemd hangs in cgroup_drain_dying during cleanup after podman operations
Date: Wed, 29 Apr 2026 11:21:07 +0200	[thread overview]
Message-ID: <afHNg2VX2jy9bW7y@piware.de> (raw)

Hello,

Our cockpit tests found a kernel regression introduced between 6.9.10 (working)
and 6.9.11 (broken) that causes a system hang during cgroup cleanup after
podman container operations. I've kept notes in
https://github.com/cockpit-project/bots/pull/8970#issuecomment-4342147158 , but
I am now at a loss as to how to squeeze more information out of this.

=== Summary ===

When running podman REST API operations on rootless containers followed by user
session cleanup (loginctl/pkill), systemd (pid 1) gets stuck in
cgroup_drain_dying trying to remove an empty cgroup. After that, I'm

- Unable to run commands that access /proc (ps, top, lsns, ls /proc, etc.)
- Unable to create new SSH sessions or VT logins
- If I previously logged into the QEMU VT, that login session remains
  more or less functional, except not being able to run most commands

=== Kernel Versions ===

- Last known working: 6.9.10
- Broken: 6.9.11 (OpenSUSE Tumbleweed), 6.9.13 (Fedora 44), 6.9.14 (Fedora 44),
  Ubuntu 26.04 (7.0.0)

=== Stack Trace ===

From sysrq-trigger task dump, systemd is stuck in:

[  207.958946] task:systemd         state:D stack:0     pid:1     tgid:1     ppid:0
[  207.959734] Call Trace:
[  207.960117]  <TASK>
[  207.960333]  __schedule+0x2b2/0x5d0
[  207.960603]  schedule+0x27/0x80
[  207.960945]  cgroup_drain_dying+0xef/0x1a0
[  207.961287]  ? __pfx_autoremove_wake_function+0x10/0x10
[  207.961639]  cgroup_rmdir+0x37/0x100
[  207.961945]  kernfs_iop_rmdir+0x6a/0xd0
[  207.962239]  vfs_rmdir+0x154/0x270
[  207.962486]  do_rmdir+0x201/0x280
[  207.962723]  __x64_sys_unlinkat+0x8c/0xd0
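
Not part of the report itself, but for sifting a sysrq-t dump like the linked
dmesg log, a small filter along these lines can list every task stuck in
uninterruptible (D) sleep:

```python
import re

def blocked_tasks(dmesg: str) -> list[str]:
    """Names of tasks that a sysrq-t dump reports in D (uninterruptible) state."""
    pat = re.compile(r"task:(\S+)\s+state:D\b")
    return [m.group(1) for m in pat.finditer(dmesg)]

sample = "[  207.958946] task:systemd         state:D stack:0     pid:1     tgid:1"
print(blocked_tasks(sample))  # ['systemd']
```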

=== Observations ===

- /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.procs was empty, indicating
  all processes were killed but the cgroup itself cannot be removed
- Multiple zombie processes present, unable to be reaped (user@1000.service
  systemd, podman, conmon processes)
- RCU subsystem appears healthy (rcu_exp_gp_kthr in S state)
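
To check these observations on a live system, a helper along these lines
(illustrative sketch; the slice path in the comment is the one from this
report) shows whether a cgroup is still considered populated, which is what
blocks rmdir:

```shell
# cgstate: print the liveness state of a cgroup v2 directory
cgstate() {
    cg=${1:?usage: cgstate <cgroup dir>}
    echo "== cgroup.procs (empty output means no live, non-zombie tasks) =="
    cat "$cg/cgroup.procs"
    echo "== cgroup.events ('populated 0' is required before rmdir can succeed) =="
    cat "$cg/cgroup.events"
}
# example (path from this report):
# cgstate /sys/fs/cgroup/user.slice/user-1000.slice
```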

=== Reproducer ===

The bug is triggered by a specific sequence of podman REST API operations on
rootless containers, followed by user cleanup. The reproducer is part of the
cockpit-podman test suite. I created a branch where I reduced the test to the
absolute minimum, and also replaced as many UI clicks as possible with shell
operations (all but one):

  https://github.com/martinpitt/cockpit-podman/blob/kernel-hang/test/check-application#L1486

Sequence:
1. Create and stop a rootless container as the admin user
2. Call podman REST API lifecycle operations: start → restart → stop
3. Create an exec session (console/TTY connection) via REST API
4. Start the container again via REST API
5. Cleanup: loginctl terminate-user admin; loginctl kill-user admin; pkill -9 -u admin
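
Steps 2-4 can be driven directly over the rootless podman socket with curl;
the sketch below is my best condensed approximation, not the exact test code
(the exec JSON body in particular is a plausible minimal payload, not
verified against the test):

```shell
# Condensed reproducer sketch for the REST API lifecycle steps.
# Assumes the rootless podman service is listening on its default socket.
sock="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/podman/podman.sock"

api() {  # api METHOD PATH [extra curl args...]
    local method=$1 path=$2
    shift 2
    curl -sf --unix-socket "$sock" -X "$method" \
        "http://d/v4.0.0/libpod$path" "$@"
}

reproduce() {
    api POST "/containers/swamped-crate/start"
    api POST "/containers/swamped-crate/restart"
    api POST "/containers/swamped-crate/stop"
    # step 3: create an exec session (console/TTY connection)
    api POST "/containers/swamped-crate/exec" \
        -H 'Content-Type: application/json' \
        -d '{"Cmd": ["/bin/sh"], "Tty": true, "AttachStdin": true}'
    # step 4: start the container again
    api POST "/containers/swamped-crate/start"
    # step 5 then happens as root in another session:
    #   loginctl terminate-user admin; loginctl kill-user admin; pkill -9 -u admin
}

# only attempt this against a live podman service
if [ -S "$sock" ]; then reproduce; fi
```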

Using podman CLI commands (e.g., "podman start swamped-crate") instead of the
REST API does NOT trigger the hang; it occurs only via the REST API. That may be
because of the different process layout, or just sheer timing -- as eventually,
both CLI and API should result in the same actual cgroup/container operations
on the podman side.

The bug is very timing-sensitive. I attempted to create a standalone shell
script reproducer, but failed: it always passes with that. Even with the
original cockpit-podman integration test failure it's unreliable: it can hang
on the first iteration, most of the time it fails within 5 runs, but I've had
stretches where 50+ iterations passed before the hang happened.
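
Given that flakiness, iterating with a hang timeout helps; a generic wrapper
along these lines (a sketch, not tied to the cockpit test runner) reports the
first iteration that fails or times out:

```shell
# Retry a flaky reproducer until it hangs (times out) or fails.
# run_until_hang MAX_ITERATIONS TIMEOUT_SECONDS COMMAND [ARGS...]
run_until_hang() {
    local max=$1 secs=$2 i=1
    shift 2
    while [ "$i" -le "$max" ]; do
        if ! timeout "$secs" "$@"; then
            echo "iteration $i failed or timed out (possible hang)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang in $max iterations"
}
# e.g.: run_until_hang 50 300 ./test/check-application
```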

=== Full debug output ===

The above GitHub PR comment links to the full dmesg log. Direct link:
https://github.com/user-attachments/files/27195205/dmesg-cgrouphang.txt

This covers initial boot up to the hang, and then the outputs of sysrq task
dump (t), memory info (m), and blocked tasks (w).

=== Additional Notes ===

In one early test run, a different hang pattern was observed where
rcu_exp_gp_kthr was in D state with a process stuck in
synchronize_rcu_expedited during namespace cleanup, but this variant has not
reproduced in subsequent runs. The cgroup cleanup deadlock appears to be the
primary manifestation.

This is my first (non-trivial) kernel bug report, so please bear with me. I
normally stay firmly in userland.

Thanks,

Martin Pitt

Thread overview: 8+ messages
2026-04-29  9:21 Martin Pitt [this message]
2026-04-29 16:21 ` [REGRESSION] 6.9.11: systemd hangs in cgroup_drain_dying during cleanup after podman operations Tejun Heo
2026-04-29 21:15   ` Tejun Heo
2026-04-30  6:15   ` Martin Pitt
2026-05-01  2:29 ` [PATCH] cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated Tejun Heo
2026-05-03 19:30   ` kernel test robot
2026-05-03 20:15   ` kernel test robot
2026-05-03 22:45   ` kernel test robot
