Linux cgroups development
* [PATCH 0/6] psi: slightly improve performance of psi
@ 2026-05-12  6:19 Luka Bai
  2026-05-12  6:19 ` [PATCH 1/6] psi: move curr_in_memstall out of psi_group_change Luka Bai
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Luka Bai @ 2026-05-12  6:19 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Suren Baghdasaryan, Peter Zijlstra, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, K Prateek Nayak,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport, Michal Hocko,
	Kees Cook, Tejun Heo, Michal Koutný, linux-kernel, cgroups,
	Luka Bai

PSI is useful for resource pressure monitoring, but its callbacks sit
on common calling paths, some of which are quite performance critical.
The hottest callback, psi_group_change, is called by both
psi_task_switch and psi_task_change, which run as part of task switch,
enqueue and dequeue. The CPU usage of PSI is therefore quite important.

We ran a common hackbench test under perf using the following command:

perf record --kernel-callchains -a -g hackbench -s 512 -P -g 10 -f 30 \
        -l 1000 --pipe

On a machine with 8 cores and 16GB of memory across two NUMA nodes
(8GB per node), the flame graph of the perf data showed a CPU usage of
4.3% for PSI, which can have an observable effect on actual workloads.

This patchset makes several small improvements to the hot path, which
slightly improves the performance of PSI. With the same setup of
8 cores + 16GB, the CPU usage of PSI drops from 4.3% to 3.4%, a
relative improvement of about 20%. Future patches may go further (for
example, by adding switches for the different types of PSI resources).
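
For reference, the "about 20%" figure is the relative reduction between
the two measured shares. A minimal sketch of that arithmetic follows;
the helper name relative_reduction_pct is hypothetical and is not part
of the patchset:

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: relative reduction in
 * percent when a CPU share drops from `before` to `after`.
 */
static double relative_reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

With the numbers above, relative_reduction_pct(4.3, 3.4) evaluates to
roughly 20.9.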

Patch Details:
========
* Patch 1 moves the check of cpu_curr(cpu)->in_memstall out of
  psi_group_change to eliminate some repeated memory accesses.
* Patch 2 adds a bit variable, need_psi, to help decide whether PSI
  accounting is needed for the cgroup. It is moved, together with
  psi_flags (which currently uses only 5 bits), next to the bitfield
  variable in_memstall, so that they sit in the same cacheline.
* Patch 3 prefetches the parent cgroups before they are actually
  accessed, since they are always accessed in the following step.
* Patch 4 calls record_times only when the state actually changes, to
  save some unnecessary accesses.
* Patch 5 adds a psi_group for the root cgroup to remove an unnecessary
  if condition.
* Patch 6 replaces the psi_bug variable with printk_deferred_once and
  moves the tasks[NR_RUNNING] check, which is the most likely case,
  to the front of the if condition.
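
To make the ideas behind patches 1, 3 and 4 easier to picture, here is
a simplified userspace sketch. It is not the kernel code: struct group,
MEMSTALL_BIT, group_change and task_change are hypothetical stand-ins,
record_times here is only a counter, and __builtin_prefetch is the
GCC/Clang builtin rather than the kernel's prefetch helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MEMSTALL_BIT 0x80u	/* hypothetical flag, for illustration */

/* Hypothetical stand-in for a per-group PSI record. */
struct group {
	struct group *parent;		/* NULL at the root */
	unsigned int state_mask;	/* last recorded state bits */
	unsigned int nr_records;	/* how often record_times() ran */
};

/* Stand-in for the time accounting done on a state change. */
static void record_times(struct group *g)
{
	g->nr_records++;
}

/*
 * Per-group update. curr_in_memstall is read once by the caller and
 * passed in (patch 1 idea) instead of being re-read for every group.
 */
static void group_change(struct group *g, unsigned int new_mask,
			 bool curr_in_memstall)
{
	if (curr_in_memstall)
		new_mask |= MEMSTALL_BIT;

	/* Patch 4 idea: skip the accounting when nothing changed. */
	if (new_mask == g->state_mask)
		return;

	record_times(g);
	g->state_mask = new_mask;
}

/* Walk from the leaf group up to the root. */
static void task_change(struct group *leaf, unsigned int new_mask,
			bool curr_in_memstall)
{
	for (struct group *g = leaf; g; g = g->parent) {
		/* Patch 3 idea: start fetching the parent early. */
		if (g->parent)
			__builtin_prefetch(g->parent, 1, 3);
		group_change(g, new_mask, curr_in_memstall);
	}
}
```

In this sketch, repeating a task_change call with the same mask leaves
nr_records untouched on every level, which is the saving patch 4 aims
for. The actual patches operate on the real psi_group and
psi_group_cpu structures and may differ in detail.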

Thanks for reading. Comments and suggestions are very welcome!

Signed-off-by: Luka Bai <lukabai@tencent.com>
---
Luka Bai (6):
      psi: move curr_in_memstall out of psi_group_change
      psi: reorganize the psi members for cacheline benifits
      psi: use prefetch to preread the parent groupc
      psi: do not call record_times when the state is not changed
      psi: add psi group for the root cgroup
      psi: remove psi_bug and moves checking of NR_RUNNING ahead.

 include/linux/psi.h       |  2 +-
 include/linux/psi_types.h | 20 +------------
 include/linux/sched.h     | 29 ++++++++++++++++---
 kernel/cgroup/cgroup.c    |  3 ++
 kernel/fork.c             | 10 +++++++
 kernel/sched/psi.c        | 71 ++++++++++++++++++++++++++++++-----------------
 6 files changed, 85 insertions(+), 50 deletions(-)
---
base-commit: 972c53e0ec3abfc6f5fe2cb503640710fb23cf95
change-id: 20260512-psi_impr-f543a199f39d

Best regards,
--  
Luka Bai <lukabai@tencent.com>


Thread overview: 7+ messages
2026-05-12  6:19 [PATCH 0/6] psi: slightly improve performance of psi Luka Bai
2026-05-12  6:19 ` [PATCH 1/6] psi: move curr_in_memstall out of psi_group_change Luka Bai
2026-05-12  6:19 ` [PATCH 2/6] psi: reorganize the psi members for cacheline benifits Luka Bai
2026-05-12  6:19 ` [PATCH 3/6] psi: use prefetch to preread the parent groupc Luka Bai
2026-05-12  6:20 ` [PATCH 4/6] psi: do not call record_times when the state is not changed Luka Bai
2026-05-12  6:20 ` [PATCH 5/6] psi: add psi group for the root cgroup Luka Bai
2026-05-12  6:20 ` [PATCH 6/6] psi: remove psi_bug and moves checking of NR_RUNNING ahead Luka Bai
