cgroup2 freezer and kvm_vm_worker_thread()

From: Tejun Heo <tj@kernel.org>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Luca Boccassi <bluca@debian.org>,
	Roman Gushchin <roman.gushchin@linux.dev>
Cc: kvm@vger.kernel.org, cgroups@vger.kernel.org,
	"Michal Koutný" <mkoutny@suse.com>,
	linux-kernel@vger.kernel.org
Subject: cgroup2 freezer and kvm_vm_worker_thread()
Date: Mon, 28 Oct 2024 14:07:36 -1000
Message-ID: <ZyAnSAw34jwWicJl@slm.duckdns.org>

Hello,

Luca is reporting that cgroups which have kvm instances inside never
complete freezing. This can be trivially reproduced:

  root@test ~# mkdir /sys/fs/cgroup/test
  root@test ~# echo $fish_pid > /sys/fs/cgroup/test/cgroup.procs
  root@test ~# qemu-system-x86_64 --nographic -enable-kvm

and in another terminal:

  root@test ~# echo 1 > /sys/fs/cgroup/test/cgroup.freeze
  root@test ~# cat /sys/fs/cgroup/test/cgroup.events
  populated 1
  frozen 0
  root@test ~# for i in (cat /sys/fs/cgroup/test/cgroup.threads); echo $i; cat /proc/$i/stack; end 
  2070
  [<0>] do_freezer_trap+0x42/0x70
  [<0>] get_signal+0x4da/0x870
  [<0>] arch_do_signal_or_restart+0x1a/0x1c0
  [<0>] syscall_exit_to_user_mode+0x73/0x120
  [<0>] do_syscall_64+0x87/0x140
  [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
  2159
  [<0>] do_freezer_trap+0x42/0x70
  [<0>] get_signal+0x4da/0x870
  [<0>] arch_do_signal_or_restart+0x1a/0x1c0
  [<0>] syscall_exit_to_user_mode+0x73/0x120
  [<0>] do_syscall_64+0x87/0x140
  [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
  2160
  [<0>] do_freezer_trap+0x42/0x70
  [<0>] get_signal+0x4da/0x870
  [<0>] arch_do_signal_or_restart+0x1a/0x1c0
  [<0>] syscall_exit_to_user_mode+0x73/0x120
  [<0>] do_syscall_64+0x87/0x140
  [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
  2161
  [<0>] kvm_nx_huge_page_recovery_worker+0xea/0x680
  [<0>] kvm_vm_worker_thread+0x8f/0x2b0
  [<0>] kthread+0xe8/0x110
  [<0>] ret_from_fork+0x33/0x40
  [<0>] ret_from_fork_asm+0x1a/0x30
  2164
  [<0>] do_freezer_trap+0x42/0x70
  [<0>] get_signal+0x4da/0x870
  [<0>] arch_do_signal_or_restart+0x1a/0x1c0
  [<0>] syscall_exit_to_user_mode+0x73/0x120
  [<0>] do_syscall_64+0x87/0x140
  [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
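
The same per-thread check can be scripted. What follows is a minimal sketch in Python, assuming the stack text looks like the output above. The helper names unfrozen_threads() and read_stacks() are made up for illustration, and reading /proc/<tid>/stack generally requires root.

```python
from pathlib import Path


def unfrozen_threads(stacks):
    """Return the TIDs whose kernel stack does not contain do_freezer_trap.

    `stacks` maps a TID to the text of /proc/<tid>/stack.
    """
    return sorted(tid for tid, text in stacks.items()
                  if "do_freezer_trap" not in text)


def read_stacks(cgroup_dir):
    """Collect /proc/<tid>/stack for every thread listed in cgroup.threads."""
    stacks = {}
    for tid in Path(cgroup_dir, "cgroup.threads").read_text().split():
        try:
            stacks[int(tid)] = Path("/proc", tid, "stack").read_text()
        except OSError:
            # The thread exited or we lack permission; skip it.
            continue
    return stacks


# Abbreviated stacks taken from the transcript above.
sample = {
    2070: "[<0>] do_freezer_trap+0x42/0x70\n[<0>] get_signal+0x4da/0x870\n",
    2161: "[<0>] kvm_nx_huge_page_recovery_worker+0xea/0x680\n"
          "[<0>] kvm_vm_worker_thread+0x8f/0x2b0\n",
    2164: "[<0>] do_freezer_trap+0x42/0x70\n[<0>] get_signal+0x4da/0x870\n",
}
print(unfrozen_threads(sample))  # [2161]
```

On a live system, unfrozen_threads(read_stacks("/sys/fs/cgroup/test")) would run the same check against the test cgroup.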

The cgroup freezing happens in the signal delivery path, but
kvm_vm_worker_thread() threads never call into the signal delivery path even
though they join non-root cgroups, so they never get frozen. Because the
cgroup freezer determines whether a given cgroup is frozen by comparing the
number of frozen threads to the total number of threads in the cgroup, the
cgroup never becomes frozen and users waiting for the state transition may
hang indefinitely.
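
Until this is resolved, a waiter can protect itself with a deadline. The sketch below is one possible way to do that in Python, not an existing tool: parse_events() and wait_frozen() are hypothetical helpers, the `read` parameter exists only so the logic can be exercised without a real cgroup, and plain polling is used for simplicity.

```python
import time
from pathlib import Path


def parse_events(text):
    """Parse cgroup.events content ("key value" per line) into a dict of ints."""
    events = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key and value:
            events[key] = int(value)
    return events


def wait_frozen(cgroup_dir, timeout=5.0, interval=0.1, read=None):
    """Poll cgroup.events until it shows "frozen 1" or `timeout` seconds pass.

    Returns True if the cgroup reported frozen, False on timeout.
    """
    if read is None:
        # Default reader: the real cgroup.events file of the given cgroup.
        read = lambda: Path(cgroup_dir, "cgroup.events").read_text()
    deadline = time.monotonic() + timeout
    while True:
        if parse_events(read()).get("frozen") == 1:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the reproducer above, wait_frozen("/sys/fs/cgroup/test") would be expected to return False after the timeout instead of blocking forever.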

There are two paths that we can take:

1. Make kvm_vm_worker_thread() call into signal delivery path.
   io_wq_worker() is in a similar boat and handles signal delivery and can
   be frozen and trapped like regular threads.

2. Keep the count of threads which can't be frozen per cgroup so that cgroup
   freezer can ignore these threads.

#1 is better in that the cgroup will actually be frozen when reported
frozen. However, the rather ambiguous criterion we've been using for cgroup
freezer is whether the cgroup can be safely snapshotted while frozen, and as
long as the workers not being frozen doesn't break that, we can go for #2
too.
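
To make the difference concrete, here is a toy model in Python of the frozen test with and without #2. It is a simplification for discussion, not kernel code, and nr_unfreezable is a hypothetical counter name.

```python
def frozen_current(nr_tasks, nr_frozen):
    """Current rule: frozen only when every thread in the cgroup is frozen."""
    return nr_frozen == nr_tasks


def frozen_option2(nr_tasks, nr_frozen, nr_unfreezable):
    """Option #2: threads that can't be frozen are left out of the comparison."""
    return nr_frozen == nr_tasks - nr_unfreezable


# The reproducer: five threads, four trapped in do_freezer_trap(),
# one kvm_vm_worker_thread() that never reaches the trap.
print(frozen_current(5, 4))     # False
print(frozen_option2(5, 4, 1))  # True
```

Under #2 the cgroup reports frozen while the worker thread keeps running, which is the trade-off described above.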

What do you guys think?

Thanks.

-- 
tejun