* [PATCH] sched/rt: optimize cpupri_vec layout
@ 2025-06-12  3:11 Pan Deng
  2025-06-12  3:11 ` Deng, Pan
  2025-06-16  8:28 ` kernel test robot
  0 siblings, 2 replies; 3+ messages in thread
From: Pan Deng @ 2025-06-12  3:11 UTC
  To: peterz, mingo; +Cc: linux-kernel, tianyou.li, tim.c.chen, pan.deng

When running a multi-instance ffmpeg transcoding workload that uses RT
threads on a high-core-count system, updates to cpupri_vec->count contend
with reads of mask in the same cache line in cpupri_find_fitness() and
cpupri_set().

This change places count and mask in different cache lines by marking
mask with the ____cacheline_aligned attribute, which avoids the false
sharing.

Tested on a 2-socket machine with 240 physical cores and 480 logical
cores, running 60 ffmpeg transcoding instances. With the change, kernel
cycles% drops from ~20% to ~12%, and the fps metric improves by ~11%.

The side effect of this change is that the size of struct cpupri grows
from 26 cache lines to 203 cache lines.
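
The size figures above can be cross-checked with a small userspace model of
the two layouts. This is only a sketch under assumptions the patch does not
state: 64-byte cache lines, 8-byte pointers, cpumask_var_t being a pointer
(as with CONFIG_CPUMASK_OFFSTACK=y), and 101 priority vectors per struct
cpupri. The type and macro names below are stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, chosen for illustration (not taken from the patch). */
#define CACHE_LINE	64	/* bytes per cache line */
#define NR_PRIORITIES	101	/* vectors in struct cpupri */

/* Cache lines needed to hold `bytes` bytes, rounded up. */
#define CACHE_LINES(bytes)	(((bytes) + CACHE_LINE - 1) / CACHE_LINE)

/* Model of the layout before the patch: count and mask share a line. */
struct vec_before {
	int32_t		count;	/* stands in for atomic_t */
	uint64_t	mask;	/* stands in for the cpumask pointer */
};

/* Model of the layout after the patch: mask starts a new cache line. */
struct vec_after {
	int32_t			count;
	_Alignas(CACHE_LINE) uint64_t	mask;
};

struct cpupri_before {
	struct vec_before	pri_to_cpu[NR_PRIORITIES];
	uint64_t		cpu_to_pri;	/* stands in for int * */
};

struct cpupri_after {
	struct vec_after	pri_to_cpu[NR_PRIORITIES];
	uint64_t		cpu_to_pri;
};

/* Before: mask sits in the same cache line as count. */
static_assert(offsetof(struct vec_before, mask) < CACHE_LINE,
	      "count and mask share a cache line");
/* After: mask is pushed to the start of the next cache line. */
static_assert(offsetof(struct vec_after, mask) == CACHE_LINE,
	      "mask starts its own cache line");

/* Under the assumptions above, the model matches the reported sizes. */
static_assert(CACHE_LINES(sizeof(struct cpupri_before)) == 26,
	      "26 cache lines before");
static_assert(CACHE_LINES(sizeof(struct cpupri_after)) == 203,
	      "203 cache lines after");
```

In this model each vector grows from 16 to 128 bytes, because aligning one
member to 64 bytes also raises the alignment (and therefore the padding) of
the whole structure. That accounts for most of the growth in struct cpupri.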

Signed-off-by: Pan Deng <pan.deng@intel.com>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/cpupri.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpupri.h b/kernel/sched/cpupri.h
index d6cba0020064..245b0fa626be 100644
--- a/kernel/sched/cpupri.h
+++ b/kernel/sched/cpupri.h
@@ -9,7 +9,7 @@
 
 struct cpupri_vec {
 	atomic_t		count;
-	cpumask_var_t		mask;
+	cpumask_var_t		mask	____cacheline_aligned;
 };
 
 struct cpupri {
-- 
2.43.5

