* [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability
@ 2025-06-13  9:12 Gabriele Monaco
  2025-06-13  9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13  9:12 UTC (permalink / raw)
  To: linux-kernel, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar
  Cc: Gabriele Monaco

This patchset moves task_mm_cid_work to a preemptible and migratable
context, reducing its impact on the scheduling latency of real-time
tasks. The change also makes the recurrence of the work somewhat more
predictable.

The behaviour causing latency was introduced in commit 223baf9d17f2
("sched: Fix performance regression introduced by mm_cid"), which
introduced a task work tied to the scheduler tick (sketched below).
That approach presents two issues:
* the task work runs before returning to userspace and thus adds, in
  effect, a scheduling latency (of an order of magnitude that is
  significant under PREEMPT_RT)
* periodic tasks with a short runtime are less likely to be running
  during the tick, hence they might never run the task work at all
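
For reference, this is the tick-side hook being replaced, condensed
from the code removed in patch 2: the work is queued with TWA_RESUME,
so it runs on the current CPU right before the task returns to
userspace, for the full duration of the scan.

void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
{
	struct callback_head *work = &curr->cid_work;

	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) ||
	    work->next != work)	/* already queued */
		return;
	if (time_before(jiffies, READ_ONCE(curr->mm->mm_cid_next_scan)))
		return;
	/* Runs task_mm_cid_work() before the return to userspace. */
	task_work_add(curr, work, TWA_RESUME);
}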

Patch 1 adds support for prev_sum_exec_runtime to the RT, deadline and
sched_ext classes, as already supported by fair; this is required to
avoid calling rseq_preempt on tick if the runtime is below a threshold.

Patch 2 contains the main changes, removing the task_work on the
scheduler tick and using a work_struct scheduled more reliably during
__rseq_handle_notify_resume.

Patch 3 adds a selftest to validate the functionality of the
task_mm_cid_work (i.e. to compact the mm_cids).

Rebased on v6.16-rc1; no changes since V13 [1].

Changes since V12:
* Ensure the tick schedules the mm_cid compaction only once for tasks
  executing longer than 100ms (until the scan expires again)
* Execute an rseq_preempt from the tick only after compaction was done
  and the cid assignment changed

Changes since V11:
* Remove variable to make mm_cid_needs_scan more compact
* All patches reviewed

Changes since V10:
* Fix compilation errors with RSEQ and/or MM_CID disabled

Changes since V9:
* Simplify and move checks from task_queue_mm_cid to its call site

Changes since V8 [2]:
* Add support for prev_sum_exec_runtime to RT, deadline and sched_ext
* Avoid rseq_preempt on ticks unless executing for more than 100ms
* Queue the work on the unbound workqueue

Changes since V7:
* Schedule mm_cid compaction and update at every tick too
* mmgrab before scheduling the work

Changes since V6 [3]:
* Switch to a simple work_struct instead of a delayed work
* Schedule the work_struct in __rseq_handle_notify_resume
* Asynchronously disable the work but make sure mm is there while we run
* Remove first patch as merged independently
* Fix commit tag for test

Changes since V5:
* Punctuation

Changes since V4 [4]:
* Fixes on the selftest
    * Polished memory allocation and cleanup
    * Handle the test failure in main

Changes since V3 [5]:
* Fixes on the selftest
    * Minor style issues in comments and indentation
    * Use of perror where possible
    * Add a barrier to align threads execution
    * Improve test failure and error handling

Changes since V2 [6]:
* Change the order of the patches
* Merge patches changing the main delayed_work logic
* Improved self-test to spawn one fewer thread and use the main one instead

Changes since V1 [7]:
* Re-arm the delayed_work at each invocation
* Cancel the work synchronously at mmdrop
* Remove next scan fields and completely rely on the delayed_work
* Shrink mm_cid allocation with nr thread/affinity (Mathieu Desnoyers)
* Add self test

[1] - https://lore.kernel.org/lkml/20250414123630.177385-5-gmonaco@redhat.com
[2] - https://lore.kernel.org/lkml/20250220102639.141314-1-gmonaco@redhat.com
[3] - https://lore.kernel.org/lkml/20250210153253.460471-1-gmonaco@redhat.com
[4] - https://lore.kernel.org/lkml/20250113074231.61638-4-gmonaco@redhat.com
[5] - https://lore.kernel.org/lkml/20241216130909.240042-1-gmonaco@redhat.com
[6] - https://lore.kernel.org/lkml/20241213095407.271357-1-gmonaco@redhat.com
[7] - https://lore.kernel.org/lkml/20241205083110.180134-2-gmonaco@redhat.com

To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Ingo Molnar <mingo@redhat.org>

Gabriele Monaco (3):
  sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes
  sched: Move task_mm_cid_work to mm work_struct
  selftests/rseq: Add test for mm_cid compaction

 include/linux/mm_types.h                      |  26 +++
 include/linux/sched.h                         |   8 +-
 kernel/rseq.c                                 |   2 +
 kernel/sched/core.c                           |  75 ++++---
 kernel/sched/deadline.c                       |   1 +
 kernel/sched/ext.c                            |   1 +
 kernel/sched/rt.c                             |   1 +
 kernel/sched/sched.h                          |   6 +-
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 200 ++++++++++++++++++
 11 files changed, 294 insertions(+), 29 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c


base-commit: 2c4a1f3fe03edab80db66688360685031802160a
-- 
2.49.0


* [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes
  2025-06-13  9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
@ 2025-06-13  9:12 ` Gabriele Monaco
  2025-06-13  9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
  2025-06-13  9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
  2 siblings, 0 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13  9:12 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra
  Cc: Gabriele Monaco, Mathieu Desnoyers, Ingo Molnar

The fair scheduling class relies on prev_sum_exec_runtime to compute the
duration of the task's runtime since it was last scheduled. This value
is currently not required by other scheduling classes but can be useful
to understand long-running tasks and take certain actions (e.g. during a
scheduler tick).

Add support for prev_sum_exec_runtime to the RT, deadline and sched_ext
classes by simply assigning the sum_exec_runtime at each set_next_task.
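
As an illustration of the intended use (the actual consumer is added in
the next patch), the difference between the two counters gives the time
the task has been running since it was last picked, for any class; a
minimal sketch, with an illustrative helper name that is not part of
the patch:

static inline u64 task_unpreempted_runtime(struct task_struct *t)
{
	/* prev_sum_exec_runtime is updated by every set_next_task(). */
	return t->se.sum_exec_runtime - t->se.prev_sum_exec_runtime;
}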

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/sched/deadline.c | 1 +
 kernel/sched/ext.c      | 1 +
 kernel/sched/rt.c       | 1 +
 3 files changed, 3 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245e..8387006396c8a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2389,6 +2389,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
 	p->se.exec_start = rq_clock_task(rq);
 	if (on_dl_rq(&p->dl))
 		update_stats_wait_end_dl(dl_rq, dl_se);
+	p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
 
 	/* You can't push away the running task */
 	dequeue_pushable_dl_task(rq, p);
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 2c41c78be61eb..75772767f87d2 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3255,6 +3255,7 @@ static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first)
 	}
 
 	p->se.exec_start = rq_clock_task(rq);
+	p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
 
 	/* see dequeue_task_scx() on why we skip when !QUEUED */
 	if (SCX_HAS_OP(sch, running) && (p->scx.flags & SCX_TASK_QUEUED))
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e40422c370335..2c70ff2042ee9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1693,6 +1693,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 	p->se.exec_start = rq_clock_task(rq);
 	if (on_rt_rq(&p->rt))
 		update_stats_wait_end_rt(rt_rq, rt_se);
+	p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
 
 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
-- 
2.49.0


* [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
  2025-06-13  9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
  2025-06-13  9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
@ 2025-06-13  9:12 ` Gabriele Monaco
  2025-06-25  8:01   ` kernel test robot
  2025-06-13  9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
  2 siblings, 1 reply; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13  9:12 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, David Hildenbrand, Ingo Molnar,
	Peter Zijlstra, Mathieu Desnoyers, Paul E. McKenney, linux-mm
  Cc: Gabriele Monaco, Ingo Molnar

Currently, the task_mm_cid_work function is called in a task work
triggered by a scheduler tick to periodically compact the mm_cids of
each process. This can delay the execution of the corresponding thread
for the entire duration of the function, negatively affecting the
response time of real-time tasks. In practice, we observe
task_mm_cid_work increasing the latency by 30-35us on a 128-core
system; this order of magnitude is meaningful under PREEMPT_RT.

Run task_mm_cid_work in a new work_struct connected to the mm_struct,
rather than in the task context before returning to userspace.

This work_struct is initialised with the mm and disabled before the mm
is freed. The work is queued while returning to userspace in
__rseq_handle_notify_resume, keeping the checks that avoid running it
more often than MM_CID_SCAN_DELAY allows.
To make sure this also happens predictably for long-running tasks, we
trigger a call to __rseq_handle_notify_resume from the scheduler tick
as well if the runtime exceeds a 100ms threshold.
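
In code, the hook amounts to the two lines added at the end of
__rseq_handle_notify_resume (shown in full in the diff below):

	if (mm_cid_needs_scan(t->mm))
		task_queue_mm_cid(t);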

The main advantage of this change is that the function can be offloaded
to a different CPU and even preempted by RT tasks.

Moreover, this new behaviour is more predictable for periodic tasks
with a short runtime, which may rarely be running during a scheduler
tick. Now, the work is always scheduled when the task returns to
userspace.

The work is disabled during mmdrop: since the function cannot sleep in
all kernel configurations, we cannot wait for possibly running work
items to terminate. We make sure the mm is valid in case the task is
terminating by reserving it with mmgrab/mmdrop, returning prematurely
if we are really the last user while the work gets to run (see the
sketch below).
This situation is unlikely, since we don't schedule the work for
exiting tasks, but we cannot rule it out.
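
The resulting lifetime protocol, condensed from the code in the diff
below: the queueing side takes a reference so the mm outlives the
queued work, and the work bails out early when it finds itself the
last user.

void task_queue_mm_cid(struct task_struct *curr)
{
	/* Ensure the mm exists when we run. */
	mmgrab(curr->mm);
	queue_work(system_unbound_wq, &curr->mm->cid_work);
}

void task_mm_cid_work(struct work_struct *work)
{
	struct mm_struct *mm = container_of(work, struct mm_struct, cid_work);

	/* We are the last user, process already terminated. */
	if (atomic_read(&mm->mm_count) == 1)
		goto out_drop;
	/* ... compact the mm_cids (elided) ... */
out_drop:
	mmdrop(mm);
}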

Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/linux/mm_types.h | 26 ++++++++++++++
 include/linux/sched.h    |  8 ++++-
 kernel/rseq.c            |  2 ++
 kernel/sched/core.c      | 75 ++++++++++++++++++++++++++--------------
 kernel/sched/sched.h     |  6 ++--
 5 files changed, 89 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d6b91e8a66d6d..d14c7c49cf0ec 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1017,6 +1017,10 @@ struct mm_struct {
 		 * mm nr_cpus_allowed updates.
 		 */
 		raw_spinlock_t cpus_allowed_lock;
+		/*
+		 * @cid_work: Work item to run the mm_cid scan.
+		 */
+		struct work_struct cid_work;
 #endif
 #ifdef CONFIG_MMU
 		atomic_long_t pgtables_bytes;	/* size of all page tables */
@@ -1321,6 +1325,8 @@ enum mm_cid_state {
 	MM_CID_LAZY_PUT = (1U << 31),
 };
 
+extern void task_mm_cid_work(struct work_struct *work);
+
 static inline bool mm_cid_is_unset(int cid)
 {
 	return cid == MM_CID_UNSET;
@@ -1393,12 +1399,14 @@ static inline int mm_alloc_cid_noprof(struct mm_struct *mm, struct task_struct *
 	if (!mm->pcpu_cid)
 		return -ENOMEM;
 	mm_init_cid(mm, p);
+	INIT_WORK(&mm->cid_work, task_mm_cid_work);
 	return 0;
 }
 #define mm_alloc_cid(...)	alloc_hooks(mm_alloc_cid_noprof(__VA_ARGS__))
 
 static inline void mm_destroy_cid(struct mm_struct *mm)
 {
+	disable_work(&mm->cid_work);
 	free_percpu(mm->pcpu_cid);
 	mm->pcpu_cid = NULL;
 }
@@ -1420,6 +1428,16 @@ static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumas
 	WRITE_ONCE(mm->nr_cpus_allowed, cpumask_weight(mm_allowed));
 	raw_spin_unlock(&mm->cpus_allowed_lock);
 }
+
+static inline bool mm_cid_needs_scan(struct mm_struct *mm)
+{
+	return mm && !time_before(jiffies, READ_ONCE(mm->mm_cid_next_scan));
+}
+
+static inline bool mm_cid_scan_pending(struct mm_struct *mm)
+{
+	return mm && work_pending(&mm->cid_work);
+}
 #else /* CONFIG_SCHED_MM_CID */
 static inline void mm_init_cid(struct mm_struct *mm, struct task_struct *p) { }
 static inline int mm_alloc_cid(struct mm_struct *mm, struct task_struct *p) { return 0; }
@@ -1430,6 +1448,14 @@ static inline unsigned int mm_cid_size(void)
 	return 0;
 }
 static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumask *cpumask) { }
+static inline bool mm_cid_needs_scan(struct mm_struct *mm)
+{
+	return false;
+}
+static inline bool mm_cid_scan_pending(struct mm_struct *mm)
+{
+	return false;
+}
 #endif /* CONFIG_SCHED_MM_CID */
 
 struct mmu_gather;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4f78a64beb52c..e90bc52dece3e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1432,7 +1432,7 @@ struct task_struct {
 	int				last_mm_cid;	/* Most recent cid in mm */
 	int				migrate_from_cpu;
 	int				mm_cid_active;	/* Whether cid bitmap is active */
-	struct callback_head		cid_work;
+	unsigned long			last_cid_reset;	/* Time of last reset in jiffies */
 #endif
 
 	struct tlbflush_unmap_batch	tlb_ubc;
@@ -2277,4 +2277,10 @@ static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct allo
 #define alloc_tag_restore(_tag, _old)		do {} while (0)
 #endif
 
+#ifdef CONFIG_SCHED_MM_CID
+extern void task_queue_mm_cid(struct task_struct *curr);
+#else
+static inline void task_queue_mm_cid(struct task_struct *curr) { }
+#endif
+
 #endif
diff --git a/kernel/rseq.c b/kernel/rseq.c
index b7a1ec327e811..383db2ccad4d0 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -441,6 +441,8 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 	}
 	if (unlikely(rseq_update_cpu_node_id(t)))
 		goto error;
+	if (mm_cid_needs_scan(t->mm))
+		task_queue_mm_cid(t);
 	return;
 
 error:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dce50fa57471d..7d502a99a69cb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10589,22 +10589,16 @@ static void sched_mm_cid_remote_clear_weight(struct mm_struct *mm, int cpu,
 	sched_mm_cid_remote_clear(mm, pcpu_cid, cpu);
 }
 
-static void task_mm_cid_work(struct callback_head *work)
+void task_mm_cid_work(struct work_struct *work)
 {
 	unsigned long now = jiffies, old_scan, next_scan;
-	struct task_struct *t = current;
 	struct cpumask *cidmask;
-	struct mm_struct *mm;
+	struct mm_struct *mm = container_of(work, struct mm_struct, cid_work);
 	int weight, cpu;
 
-	WARN_ON_ONCE(t != container_of(work, struct task_struct, cid_work));
-
-	work->next = work;	/* Prevent double-add */
-	if (t->flags & PF_EXITING)
-		return;
-	mm = t->mm;
-	if (!mm)
-		return;
+	/* We are the last user, process already terminated. */
+	if (atomic_read(&mm->mm_count) == 1)
+		goto out_drop;
 	old_scan = READ_ONCE(mm->mm_cid_next_scan);
 	next_scan = now + msecs_to_jiffies(MM_CID_SCAN_DELAY);
 	if (!old_scan) {
@@ -10617,9 +10611,9 @@ static void task_mm_cid_work(struct callback_head *work)
 			old_scan = next_scan;
 	}
 	if (time_before(now, old_scan))
-		return;
+		goto out_drop;
 	if (!try_cmpxchg(&mm->mm_cid_next_scan, &old_scan, next_scan))
-		return;
+		goto out_drop;
 	cidmask = mm_cidmask(mm);
 	/* Clear cids that were not recently used. */
 	for_each_possible_cpu(cpu)
@@ -10631,6 +10625,8 @@ static void task_mm_cid_work(struct callback_head *work)
 	 */
 	for_each_possible_cpu(cpu)
 		sched_mm_cid_remote_clear_weight(mm, cpu, weight);
+out_drop:
+	mmdrop(mm);
 }
 
 void init_sched_mm_cid(struct task_struct *t)
@@ -10643,23 +10639,52 @@ void init_sched_mm_cid(struct task_struct *t)
 		if (mm_users == 1)
 			mm->mm_cid_next_scan = jiffies + msecs_to_jiffies(MM_CID_SCAN_DELAY);
 	}
-	t->cid_work.next = &t->cid_work;	/* Protect against double add */
-	init_task_work(&t->cid_work, task_mm_cid_work);
 }
 
-void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
+void task_tick_mm_cid(struct rq *rq, struct task_struct *t)
 {
-	struct callback_head *work = &curr->cid_work;
-	unsigned long now = jiffies;
+	u64 rtime = t->se.sum_exec_runtime - t->se.prev_sum_exec_runtime;
 
-	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) ||
-	    work->next != work)
-		return;
-	if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
-		return;
+	/*
+	 * If a task is running unpreempted for a long time, it won't get its
+	 * mm_cid compacted and won't update its mm_cid value after a
+	 * compaction occurs.
+	 * For such a task, this function does two things:
+	 * A) trigger the mm_cid recompaction,
+	 * B) trigger an update of the task's rseq->mm_cid field at some point
+	 * after recompaction, so it can get a mm_cid value closer to 0.
+	 * A change in the mm_cid triggers an rseq_preempt.
+	 *
+	 * A occurs only once after the scan time elapsed, until the next scan
+	 * expires as well.
+	 * B occurs once after the compaction work completes, that is when scan
+	 * is no longer needed (it occurred for this mm) but the last rseq
+	 * preempt was done before the last mm_cid scan.
+	 */
+	if (t->mm && rtime > RSEQ_UNPREEMPTED_THRESHOLD) {
+		if (mm_cid_needs_scan(t->mm) && !mm_cid_scan_pending(t->mm))
+			rseq_set_notify_resume(t);
+		else if (time_after(jiffies, t->last_cid_reset +
+				      msecs_to_jiffies(MM_CID_SCAN_DELAY))) {
+			int old_cid = t->mm_cid;
+
+			if (!t->mm_cid_active)
+				return;
+			mm_cid_snapshot_time(rq, t->mm);
+			mm_cid_put_lazy(t);
+			t->last_mm_cid = t->mm_cid = mm_cid_get(rq, t, t->mm);
+			if (old_cid != t->mm_cid)
+				rseq_preempt(t);
+		}
+	}
+}
 
-	/* No page allocation under rq lock */
-	task_work_add(curr, work, TWA_RESUME);
+/* Call only when curr is a user thread. */
+void task_queue_mm_cid(struct task_struct *curr)
+{
+	/* Ensure the mm exists when we run. */
+	mmgrab(curr->mm);
+	queue_work(system_unbound_wq, &curr->mm->cid_work);
 }
 
 void sched_mm_cid_exit_signals(struct task_struct *t)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bb5998295e..c1881ba10ac62 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3606,13 +3606,14 @@ extern const char *preempt_modes[];
 
 #define SCHED_MM_CID_PERIOD_NS	(100ULL * 1000000)	/* 100ms */
 #define MM_CID_SCAN_DELAY	100			/* 100ms */
+#define RSEQ_UNPREEMPTED_THRESHOLD	SCHED_MM_CID_PERIOD_NS
 
 extern raw_spinlock_t cid_lock;
 extern int use_cid_lock;
 
 extern void sched_mm_cid_migrate_from(struct task_struct *t);
 extern void sched_mm_cid_migrate_to(struct rq *dst_rq, struct task_struct *t);
-extern void task_tick_mm_cid(struct rq *rq, struct task_struct *curr);
+extern void task_tick_mm_cid(struct rq *rq, struct task_struct *t);
 extern void init_sched_mm_cid(struct task_struct *t);
 
 static inline void __mm_cid_put(struct mm_struct *mm, int cid)
@@ -3822,6 +3823,7 @@ static inline int mm_cid_get(struct rq *rq, struct task_struct *t,
 	cid = __mm_cid_get(rq, t, mm);
 	__this_cpu_write(pcpu_cid->cid, cid);
 	__this_cpu_write(pcpu_cid->recent_cid, cid);
+	t->last_cid_reset = jiffies;
 
 	return cid;
 }
@@ -3881,7 +3883,7 @@ static inline void switch_mm_cid(struct rq *rq,
 static inline void switch_mm_cid(struct rq *rq, struct task_struct *prev, struct task_struct *next) { }
 static inline void sched_mm_cid_migrate_from(struct task_struct *t) { }
 static inline void sched_mm_cid_migrate_to(struct rq *dst_rq, struct task_struct *t) { }
-static inline void task_tick_mm_cid(struct rq *rq, struct task_struct *curr) { }
+static inline void task_tick_mm_cid(struct rq *rq, struct task_struct *t) { }
 static inline void init_sched_mm_cid(struct task_struct *t) { }
 #endif /* !CONFIG_SCHED_MM_CID */
 
-- 
2.49.0


* [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
  2025-06-13  9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
  2025-06-13  9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
  2025-06-13  9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
@ 2025-06-13  9:12 ` Gabriele Monaco
  2025-06-18 21:04   ` Shuah Khan
  2 siblings, 1 reply; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13  9:12 UTC (permalink / raw)
  To: linux-kernel, Mathieu Desnoyers, Peter Zijlstra, Paul E. McKenney,
	Shuah Khan, linux-kselftest
  Cc: Gabriele Monaco, Ingo Molnar

A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it
runs correctly and in a timely manner.

The test spawns one thread pinned to each CPU; each thread, including
the main one, then runs in short bursts for some time. During this
period, the mm_cids should span all values between 0 and nproc.

At the end of this phase, a thread with a high enough mm_cid
(>= nproc/2) is selected to be the new leader; all other threads
terminate.

After some time, the only running thread should see 0 as its mm_cid;
if that doesn't happen, the compaction mechanism didn't work and the
test fails.

The test never fails if only one core is available, in which case we
cannot test anything, as the only available mm_cid is 0.
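
The pass condition boils down to the loop below, condensed from
thread_runner() in the test added by this patch: the surviving leader
polls its mm_cid until compaction brings it down to 0 or the timeout
expires.

	int i;

	for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
		if (rseq_current_mm_cid() == 0)
			exit(EXIT_SUCCESS);
		usleep(RUNNER_PERIOD);
	}
	exit(EXIT_FAILURE);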

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 200 ++++++++++++++++++
 3 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c

diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 0fda241fa62b0..b3920c59bf401 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
 basic_percpu_ops_mm_cid_test
 basic_test
 basic_rseq_op_test
+mm_cid_compaction_test
 param_test
 param_test_benchmark
 param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 0d0a5fae59547..bc4d940f66d40 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,7 +17,7 @@ OVERRIDE_TARGETS = 1
 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
 		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
-		syscall_errors_test
+		syscall_errors_test mm_cid_compaction_test
 
 TEST_GEN_PROGS_EXTENDED = librseq.so
 
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...)                    \
+	do {                                        \
+		if (VERBOSE)                        \
+			printf(fmt, ##__VA_ARGS__); \
+	} while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+	int cpu;
+	int num_cpus;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	pthread_t *tinfo;
+	struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+	struct thread_args *args = arg;
+	int i, ret, curr_mm_cid;
+	cpu_set_t cpumask;
+
+	CPU_ZERO(&cpumask);
+	CPU_SET(args->cpu, &cpumask);
+	ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to set affinity");
+		abort();
+	}
+	pthread_barrier_wait(args->barrier);
+
+	for (i = 0; i < THREAD_RUNS; i++)
+		usleep(RUNNER_PERIOD);
+	curr_mm_cid = rseq_current_mm_cid();
+	/*
+	 * We select one thread with high enough mm_cid to be the new leader.
+	 * All other threads (including the main thread) will terminate.
+	 * After some time, the mm_cid of the only remaining thread should
+	 * converge to 0, if not, the test fails.
+	 */
+	if (curr_mm_cid >= args->num_cpus / 2 &&
+	    !pthread_mutex_trylock(args->token)) {
+		printf_verbose(
+			"cpu%d has mm_cid=%d and will be the new leader.\n",
+			sched_getcpu(), curr_mm_cid);
+		for (i = 0; i < args->num_cpus; i++) {
+			if (args->tinfo[i] == pthread_self())
+				continue;
+			ret = pthread_join(args->tinfo[i], NULL);
+			if (ret) {
+				errno = ret;
+				perror("Error: failed to join thread");
+				abort();
+			}
+		}
+		pthread_barrier_destroy(args->barrier);
+		free(args->tinfo);
+		free(args->token);
+		free(args->barrier);
+		free(args->args_head);
+
+		for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+			curr_mm_cid = rseq_current_mm_cid();
+			printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+				       curr_mm_cid, sched_getcpu());
+			if (curr_mm_cid == 0)
+				exit(EXIT_SUCCESS);
+			usleep(RUNNER_PERIOD);
+		}
+		exit(EXIT_FAILURE);
+	}
+	printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+		       sched_getcpu(), curr_mm_cid);
+	pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+	cpu_set_t affinity;
+	int i, j, ret = 0, num_threads;
+	pthread_t *tinfo;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	struct thread_args *args;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	num_threads = CPU_COUNT(&affinity);
+	tinfo = calloc(num_threads, sizeof(*tinfo));
+	if (!tinfo) {
+		perror("Error: failed to allocate tinfo");
+		return -1;
+	}
+	args = calloc(num_threads, sizeof(*args));
+	if (!args) {
+		perror("Error: failed to allocate args");
+		ret = -1;
+		goto out_free_tinfo;
+	}
+	token = malloc(sizeof(*token));
+	if (!token) {
+		perror("Error: failed to allocate token");
+		ret = -1;
+		goto out_free_args;
+	}
+	barrier = malloc(sizeof(*barrier));
+	if (!barrier) {
+		perror("Error: failed to allocate barrier");
+		ret = -1;
+		goto out_free_token;
+	}
+	if (num_threads == 1) {
+		fprintf(stderr, "Cannot test on a single cpu. "
+				"Skipping mm_cid_compaction test.\n");
+		/* only skipping the test, this is not a failure */
+		goto out_free_barrier;
+	}
+	pthread_mutex_init(token, NULL);
+	ret = pthread_barrier_init(barrier, NULL, num_threads);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to initialise barrier");
+		goto out_free_barrier;
+	}
+	for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+		if (!CPU_ISSET(i, &affinity))
+			continue;
+		args[j].num_cpus = num_threads;
+		args[j].tinfo = tinfo;
+		args[j].token = token;
+		args[j].barrier = barrier;
+		args[j].cpu = i;
+		args[j].args_head = args;
+		if (!j) {
+			/* The first thread is the main one */
+			tinfo[0] = pthread_self();
+			++j;
+			continue;
+		}
+		ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+		if (ret) {
+			errno = ret;
+			perror("Error: failed to create thread");
+			abort();
+		}
+		++j;
+	}
+	printf_verbose("Started %d threads.\n", num_threads);
+
+	/* Also main thread will terminate if it is not selected as leader */
+	thread_runner(&args[0]);
+
+	/* only reached in case of errors */
+out_free_barrier:
+	free(barrier);
+out_free_token:
+	free(token);
+out_free_args:
+	free(args);
+out_free_tinfo:
+	free(tinfo);
+
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	if (!rseq_mm_cid_available()) {
+		fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+		return -1;
+	}
+	if (test_mm_cid_compaction())
+		return -1;
+	return 0;
+}
-- 
2.49.0


* Re: [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
  2025-06-13  9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
@ 2025-06-18 21:04   ` Shuah Khan
  2025-06-20 17:20     ` Gabriele Monaco
  0 siblings, 1 reply; 11+ messages in thread
From: Shuah Khan @ 2025-06-18 21:04 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
	Paul E. McKenney, Shuah Khan, linux-kselftest
  Cc: Ingo Molnar, Shuah Khan

On 6/13/25 03:12, Gabriele Monaco wrote:
> A task in the kernel (task_mm_cid_work) runs somewhat periodically to
> compact the mm_cid for each process. Add a test to validate that it runs
> correctly and timely.
> 
> The test spawns 1 thread pinned to each CPU, then each thread, including
> the main one, runs in short bursts for some time. During this period, the
> mm_cids should be spanning all numbers between 0 and nproc.
> 
> At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
> is selected to be the new leader, all other threads terminate.
> 
> After some time, the only running thread should see 0 as mm_cid, if that
> doesn't happen, the compaction mechanism didn't work and the test fails.
> 
> The test never fails if only 1 core is available, in which case, we
> cannot test anything as the only available mm_cid is 0.
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

Mathieu,

Let me know if you would like me to take this through my tree.

Acked-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah


* Re: [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
  2025-06-18 21:04   ` Shuah Khan
@ 2025-06-20 17:20     ` Gabriele Monaco
  2025-06-20 17:29       ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-20 17:20 UTC (permalink / raw)
  To: Shuah Khan
  Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra, Paul E. McKenney,
	Shuah Khan, linux-kselftest, Ingo Molnar

2025-06-18T21:04:30Z Shuah Khan <skhan@linuxfoundation.org>:

> On 6/13/25 03:12, Gabriele Monaco wrote:
>> A task in the kernel (task_mm_cid_work) runs somewhat periodically to
>> compact the mm_cid for each process. Add a test to validate that it runs
>> correctly and timely.
>> The test spawns 1 thread pinned to each CPU, then each thread, including
>> the main one, runs in short bursts for some time. During this period, the
>> mm_cids should be spanning all numbers between 0 and nproc.
>> At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
>> is selected to be the new leader, all other threads terminate.
>> After some time, the only running thread should see 0 as mm_cid, if that
>> doesn't happen, the compaction mechanism didn't work and the test fails.
>> The test never fails if only 1 core is available, in which case, we
>> cannot test anything as the only available mm_cid is 0.
>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>
> Mathieu,
>
> Let me know if you would like me to take this through my tree.
>
> Acked-by: Shuah Khan <skhan@linuxfoundation.org>

Thanks for the Ack. Just to add some context: the test is flaky without the previous patches; would it still be alright to pull it before them?

Thanks,
Gabriele


* Re: [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
  2025-06-20 17:20     ` Gabriele Monaco
@ 2025-06-20 17:29       ` Mathieu Desnoyers
  0 siblings, 0 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2025-06-20 17:29 UTC (permalink / raw)
  To: Gabriele Monaco, Shuah Khan
  Cc: linux-kernel, Peter Zijlstra, Paul E. McKenney, Shuah Khan,
	linux-kselftest, Ingo Molnar

On 2025-06-20 13:20, Gabriele Monaco wrote:
> 2025-06-18T21:04:30Z Shuah Khan <skhan@linuxfoundation.org>:
> 
>> On 6/13/25 03:12, Gabriele Monaco wrote:
>>> A task in the kernel (task_mm_cid_work) runs somewhat periodically to
>>> compact the mm_cid for each process. Add a test to validate that it runs
>>> correctly and timely.
>>> The test spawns 1 thread pinned to each CPU, then each thread, including
>>> the main one, runs in short bursts for some time. During this period, the
>>> mm_cids should be spanning all numbers between 0 and nproc.
>>> At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
>>> is selected to be the new leader, all other threads terminate.
>>> After some time, the only running thread should see 0 as mm_cid, if that
>>> doesn't happen, the compaction mechanism didn't work and the test fails.
>>> The test never fails if only 1 core is available, in which case, we
>>> cannot test anything as the only available mm_cid is 0.
>>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>>
>> Mathieu,
>>
>> Let me know if you would like me to take this through my tree.
>>
>> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
> 
> Thanks for the Ack, just to add some context: the test is flaky without the previous patches, would it still be alright to pull it before them?

We need Peter Zijlstra to act on merging the fix.

Peter ?

Thanks,

Mathieu

> 
> Thanks,
> Gabriele
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
  2025-06-13  9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
@ 2025-06-25  8:01   ` kernel test robot
  2025-06-25 13:57     ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: kernel test robot @ 2025-06-25  8:01 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
	Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
	Mathieu Desnoyers, Paul E. McKenney, Gabriele Monaco, Ingo Molnar,
	oliver.sang


Hello,

kernel test robot noticed a 10.1% regression of hackbench.throughput on:


commit: f3de761c52148abfb1b4512914f64c7e1c737fc8 ("[RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct")
url: https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250613-171504
patch link: https://lore.kernel.org/all/20250613091229.21500-3-gmonaco@redhat.com/
patch subject: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct

testcase: hackbench
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	iterations: 4
	mode: process
	ipc: pipe
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput  2.9% regression                                               |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | ipc=socket                                                                                     |
|                  | iterations=4                                                                                   |
|                  | mode=process                                                                                   |
|                  | nr_threads=50%                                                                                 |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.shell_rtns_3.ops_per_sec  1.7% regression                                           |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | test=shell_rtns_3                                                                              |
|                  | testtime=300s                                                                                  |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput  6.2% regression                                               |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | ipc=pipe                                                                                       |
|                  | iterations=4                                                                                   |
|                  | mode=process                                                                                   |
|                  | nr_threads=800%                                                                                |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.shell_rtns_1.ops_per_sec  2.1% regression                                           |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | test=shell_rtns_1                                                                              |
|                  | testtime=300s                                                                                  |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput 11.8% improvement                                              |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | ipc=pipe                                                                                       |
|                  | iterations=4                                                                                   |
|                  | mode=process                                                                                   |
|                  | nr_threads=50%                                                                                 |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.shell_rtns_2.ops_per_sec  2.2% regression                                           |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | test=shell_rtns_2                                                                              |
|                  | testtime=300s                                                                                  |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.exec_test.ops_per_sec  2.6% regression                                              |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | test=exec_test                                                                                 |
|                  | testtime=300s                                                                                  |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim7: aim7.jobs-per-min  5.5% regression                                                       |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | disk=1BRD_48G                                                                                  |
|                  | fs=xfs                                                                                         |
|                  | load=600                                                                                       |
|                  | test=sync_disk_rw                                                                              |
+------------------+------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506251555.de6720f7-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250625/202506251555.de6720f7-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     55140 ± 80%    +229.2%     181547 ± 20%  numa-meminfo.node1.Mapped
     13048 ± 80%    +248.2%      45431 ± 20%  numa-vmstat.node1.nr_mapped
    679.17 ± 22%     -25.3%     507.33 ± 10%  sched_debug.cfs_rq:/.util_est.max
 4.287e+08 ±  3%     +20.3%  5.158e+08        cpuidle..time
   2953716 ± 13%    +228.9%    9716185 ±  2%  cpuidle..usage
     91072 ± 12%    +134.8%     213855 ±  7%  meminfo.Mapped
   8848637           +10.4%    9769875 ±  5%  meminfo.Memused
      0.67 ±  4%      +0.1        0.78 ±  2%  mpstat.cpu.all.irq%
      0.03 ±  2%      +0.0        0.03 ±  4%  mpstat.cpu.all.soft%
      4.17 ±  8%    +596.0%      29.00 ± 31%  mpstat.max_utilization.seconds
      2950           -12.3%       2587        vmstat.procs.r
   4557607 ±  2%     +35.9%    6192548        vmstat.system.cs
    397195 ±  5%     +73.4%     688726        vmstat.system.in
   1490153           -10.1%    1339340        hackbench.throughput
   1424170            -8.7%    1299590        hackbench.throughput_avg
   1490153           -10.1%    1339340        hackbench.throughput_best
   1353181 ±  2%     -10.1%    1216523        hackbench.throughput_worst
  53158738 ±  3%     +34.0%   71240022        hackbench.time.involuntary_context_switches
     12177            -2.4%      11891        hackbench.time.percent_of_cpu_this_job_got
      4482            +7.6%       4821        hackbench.time.system_time
    798.92            +2.0%     815.24        hackbench.time.user_time
  1.54e+08 ±  3%     +46.6%  2.257e+08        hackbench.time.voluntary_context_switches
    210335            +3.3%     217333        proc-vmstat.nr_anon_pages
     23353 ± 14%    +136.2%      55152 ±  7%  proc-vmstat.nr_mapped
     61825 ±  3%      +6.6%      65928 ±  2%  proc-vmstat.nr_page_table_pages
     30859            +4.4%      32213        proc-vmstat.nr_slab_reclaimable
      1294 ±177%   +1657.1%      22743 ± 66%  proc-vmstat.numa_hint_faults
      1153 ±198%   +1597.0%      19566 ± 79%  proc-vmstat.numa_hint_faults_local
 1.242e+08            -3.2%  1.202e+08        proc-vmstat.numa_hit
 1.241e+08            -3.2%  1.201e+08        proc-vmstat.numa_local
      2195 ±110%   +2337.0%      53508 ± 55%  proc-vmstat.numa_pte_updates
 1.243e+08            -3.2%  1.203e+08        proc-vmstat.pgalloc_normal
    875909 ±  2%      +8.6%     951378 ±  2%  proc-vmstat.pgfault
 1.231e+08            -3.5%  1.188e+08        proc-vmstat.pgfree
 6.903e+10            -5.6%  6.514e+10        perf-stat.i.branch-instructions
      0.21            +0.0        0.26        perf-stat.i.branch-miss-rate%
  89225177 ±  2%     +38.3%  1.234e+08        perf-stat.i.branch-misses
     25.64 ±  2%      -5.7       19.95 ±  2%  perf-stat.i.cache-miss-rate%
 9.322e+08 ±  2%     +22.8%  1.145e+09        perf-stat.i.cache-references
   4553621 ±  2%     +39.8%    6363761        perf-stat.i.context-switches
      1.12            +4.5%       1.17        perf-stat.i.cpi
    186890 ±  2%    +143.9%     455784        perf-stat.i.cpu-migrations
 2.787e+11            -4.9%  2.649e+11        perf-stat.i.instructions
      0.91            -4.4%       0.87        perf-stat.i.ipc
     36.79 ±  2%     +44.9%      53.30        perf-stat.i.metric.K/sec
      0.13 ±  2%      +0.1        0.19        perf-stat.overall.branch-miss-rate%
     24.44 ±  2%      -4.7       19.74 ±  2%  perf-stat.overall.cache-miss-rate%
      1.12            +4.6%       1.17        perf-stat.overall.cpi
      0.89            -4.4%       0.85        perf-stat.overall.ipc
 6.755e+10            -5.4%  6.392e+10        perf-stat.ps.branch-instructions
  87121352 ±  2%     +38.5%  1.206e+08        perf-stat.ps.branch-misses
 9.098e+08 ±  2%     +23.1%   1.12e+09        perf-stat.ps.cache-references
   4443812 ±  2%     +39.9%    6218298        perf-stat.ps.context-switches
    181595 ±  2%    +144.5%     443985        perf-stat.ps.cpu-migrations
 2.727e+11            -4.7%  2.599e+11        perf-stat.ps.instructions
  1.21e+13            +4.3%  1.262e+13        perf-stat.total.instructions
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.ctx_resched.event_function.remote_function.generic_exec_single
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.ctx_resched.event_function.remote_function.generic_exec_single.smp_call_function_single
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.event_function.remote_function.generic_exec_single.smp_call_function_single.event_function_call
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.remote_function.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record.run_builtin
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.__x64_sys_ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp._perf_ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.ctx_resched
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.event_function
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.generic_exec_single
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.perf_event_for_each_child
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.perf_ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.remote_function
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.__evlist__enable
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.perf_c2c__record
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.perf_evsel__enable_cpu
     11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.perf_evsel__run_ioctl
     11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
     23.74 ±185%     -98.6%       0.34 ±114%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
     12.77 ± 80%     -83.9%       2.05 ±138%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      5.93 ± 69%     -90.5%       0.56 ±105%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      6.70 ±152%     -94.5%       0.37 ±145%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      0.82 ± 85%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.59 ±202%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     13.53 ± 11%     -47.0%       7.18 ± 76%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     15.63 ± 17%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     47.22 ± 77%     -85.5%       6.87 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
    133.35 ±132%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     68.01 ±203%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     13.53 ± 11%     -47.0%       7.18 ± 76%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     34.59 ±  3%    -100.0%       0.00        perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     40.97 ±  8%     -71.8%      11.55 ± 64%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    373.07 ±123%     -99.8%       0.78 ±156%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     13.53 ± 11%     -62.0%       5.14 ±107%  perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
    120.97 ± 23%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     46.03 ± 30%     -62.5%      17.27 ± 87%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
    984.50 ± 14%     -43.5%     556.24 ± 58%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    339.42 ± 12%     -97.3%       9.11 ± 54%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      8.00 ± 23%     -85.4%       1.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     22.17 ± 49%    -100.0%       0.00        perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     73.83 ± 20%     -76.3%      17.50 ± 96%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
     13.53 ± 11%     -62.0%       5.14 ±107%  perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
    336.30 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     23.74 ±185%     -98.6%       0.34 ±114%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
     14.48 ± 61%     -74.1%       3.76 ±152%  perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      6.48 ± 68%     -91.3%       0.56 ±105%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      6.70 ±152%     -94.5%       0.37 ±145%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      2.18 ± 75%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     10.79 ±165%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      1.53 ±100%     -97.5%       0.04 ± 84%  perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
    105.34 ± 26%    -100.0%       0.00        perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     29.72 ± 40%     -76.5%       7.00 ±102%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
     32.21 ± 33%     -65.7%      11.04 ± 85%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
    984.49 ± 14%     -43.5%     556.23 ± 58%  perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    337.00 ± 12%     -97.6%       8.11 ± 52%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     53.42 ± 59%     -69.8%      16.15 ±162%  perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
    218.65 ± 83%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     82.52 ±162%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     10.89 ± 98%     -98.8%       0.13 ±134%  perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
    334.02 ±  6%    -100.0%       0.00        perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
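
The perf-sched figures above and below are scheduler wait/delay
statistics keyed by call chain (the dotted suffix of each metric name).
For orientation, comparable numbers can be gathered by hand with the
stock perf tool; a minimal sketch, assuming a recent perf and root
access on the test box:

  # trace scheduling events on all CPUs for 10 seconds
  perf sched record -a -- sleep 10
  # per-task wait time and scheduling delay, sorted by worst case
  perf sched latency --sort max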


***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-12/performance/socket/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
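
The line above encodes the LKP job parameters: hackbench over Unix
sockets ("ipc=socket"), process mode, 50% of the 128 CPUs as workers.
A rough standalone equivalent is sketched below; the group and loop
counts are placeholders, not the values from the actual LKP job file:

  # illustrative only: Unix-socket senders/receivers in process mode
  hackbench -g 16 --process -l 60000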

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    161258           -12.6%     141018 ±  5%  perf-c2c.HITM.total
      6514 ±  3%     +13.3%       7381 ±  3%  uptime.idle
    692218           +17.8%     815512        vmstat.system.in
 4.747e+08 ±  7%    +137.3%  1.127e+09 ± 21%  cpuidle..time
   5702271 ± 12%    +503.6%   34419686 ± 13%  cpuidle..usage
    141191 ±  2%     +10.3%     155768 ±  3%  meminfo.PageTables
     62180           +26.0%      78348        meminfo.Percpu
      2.20 ± 14%      +3.5        5.67 ± 20%  mpstat.cpu.all.idle%
      0.55            +0.2        0.72 ±  5%  mpstat.cpu.all.irq%
      0.04 ±  2%      +0.0        0.06 ±  5%  mpstat.cpu.all.soft%
    448780            -2.9%     435554        hackbench.throughput
    440656            -2.6%     429130        hackbench.throughput_avg
    448780            -2.9%     435554        hackbench.throughput_best
    425797            -2.2%     416584        hackbench.throughput_worst
  90998790           -15.0%   77364427 ±  6%  hackbench.time.involuntary_context_switches
     12446            -3.9%      11960        hackbench.time.percent_of_cpu_this_job_got
     16057            -1.4%      15825        hackbench.time.system_time
     63421            -2.3%      61955        proc-vmstat.nr_kernel_stack
     35455 ±  2%     +10.0%      38991 ±  3%  proc-vmstat.nr_page_table_pages
     34542            +5.1%      36312 ±  2%  proc-vmstat.nr_slab_reclaimable
    151083 ± 16%     +46.6%     221509 ± 17%  proc-vmstat.numa_hint_faults
    113731 ± 26%     +64.7%     187314 ± 20%  proc-vmstat.numa_hint_faults_local
    133591            +3.1%     137709        proc-vmstat.numa_other
     53696 ± 16%     -28.6%      38362 ± 10%  proc-vmstat.numa_pages_migrated
   1053504 ±  2%      +7.7%    1135052 ±  4%  proc-vmstat.pgfault
   2077549 ±  3%      +8.5%    2254157 ±  4%  proc-vmstat.pgfree
     53696 ± 16%     -28.6%      38362 ± 10%  proc-vmstat.pgmigrate_success
 4.941e+10            -2.6%   4.81e+10        perf-stat.i.branch-instructions
 2.232e+08            -1.9%  2.189e+08        perf-stat.i.branch-misses
  2.11e+09            -5.8%  1.989e+09 ±  2%  perf-stat.i.cache-references
 3.221e+11            -2.5%  3.141e+11        perf-stat.i.cpu-cycles
 2.365e+11            -2.7%  2.303e+11        perf-stat.i.instructions
      6787 ±  3%      +8.0%       7327 ±  4%  perf-stat.i.minor-faults
      6789 ±  3%      +8.0%       7329 ±  4%  perf-stat.i.page-faults
 4.904e+10            -2.5%  4.779e+10        perf-stat.ps.branch-instructions
 2.215e+08            -1.8%  2.174e+08        perf-stat.ps.branch-misses
 2.094e+09            -5.7%  1.974e+09 ±  2%  perf-stat.ps.cache-references
 3.197e+11            -2.4%   3.12e+11        perf-stat.ps.cpu-cycles
 2.348e+11            -2.6%  2.288e+11        perf-stat.ps.instructions
      6691 ±  3%      +7.2%       7174 ±  4%  perf-stat.ps.minor-faults
      6693 ±  3%      +7.2%       7176 ±  4%  perf-stat.ps.page-faults
   7475567           +16.4%    8699139        sched_debug.cfs_rq:/.avg_vruntime.avg
   8752154 ±  3%     +20.6%   10551563 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
    211424 ± 12%    +374.5%    1003211 ± 39%  sched_debug.cfs_rq:/.avg_vruntime.stddev
     19.44 ±  6%     +29.4%      25.17 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.max
      4.49 ±  4%     +33.5%       5.99 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
     19.33 ±  6%     +29.0%      24.94 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.max
      4.47 ±  4%     +33.4%       5.96 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
      6446 ±223%    +885.4%      63529 ± 57%  sched_debug.cfs_rq:/.left_deadline.avg
    825119 ±223%    +613.5%    5886958 ± 44%  sched_debug.cfs_rq:/.left_deadline.max
     72645 ±223%    +713.6%     591074 ± 49%  sched_debug.cfs_rq:/.left_deadline.stddev
      6446 ±223%    +885.5%      63527 ± 57%  sched_debug.cfs_rq:/.left_vruntime.avg
    825080 ±223%    +613.5%    5886805 ± 44%  sched_debug.cfs_rq:/.left_vruntime.max
     72642 ±223%    +713.7%     591058 ± 49%  sched_debug.cfs_rq:/.left_vruntime.stddev
      4202 ±  8%   +1115.1%      51069 ± 61%  sched_debug.cfs_rq:/.load.stddev
    367.11           +20.2%     441.44 ± 17%  sched_debug.cfs_rq:/.load_avg.max
   7475567           +16.4%    8699139        sched_debug.cfs_rq:/.min_vruntime.avg
   8752154 ±  3%     +20.6%   10551563 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
    211424 ± 12%    +374.5%    1003211 ± 39%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.17 ± 16%     +39.8%       0.24 ±  6%  sched_debug.cfs_rq:/.nr_queued.stddev
      6446 ±223%    +885.5%      63527 ± 57%  sched_debug.cfs_rq:/.right_vruntime.avg
    825080 ±223%    +613.5%    5886805 ± 44%  sched_debug.cfs_rq:/.right_vruntime.max
     72642 ±223%    +713.7%     591058 ± 49%  sched_debug.cfs_rq:/.right_vruntime.stddev
    752.39 ± 81%     -81.4%     139.72 ± 53%  sched_debug.cfs_rq:/.runnable_avg.min
      2728 ±  3%     +51.2%       4126 ±  8%  sched_debug.cfs_rq:/.runnable_avg.stddev
    265.50 ±  2%     +12.3%     298.07 ±  2%  sched_debug.cfs_rq:/.util_avg.stddev
    686.78 ±  7%     +23.4%     847.76 ±  6%  sched_debug.cfs_rq:/.util_est.stddev
     19.44 ±  5%     +29.7%      25.22 ±  4%  sched_debug.cpu.nr_running.max
      4.48 ±  5%     +34.4%       6.02 ±  3%  sched_debug.cpu.nr_running.stddev
     67323 ± 14%    +130.3%     155017 ± 29%  sched_debug.cpu.nr_switches.stddev
    -20.78           -18.2%     -17.00        sched_debug.cpu.nr_uninterruptible.min
      0.13 ±100%     -85.8%       0.02 ±163%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
      0.17 ±116%     -97.8%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
     22.92 ±110%     -97.4%       0.59 ±137%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
      8.10 ± 45%     -78.0%       1.78 ±135%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      3.14 ± 19%     -70.9%       0.91 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     39.05 ±149%     -97.4%       1.01 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
     15.77 ±203%     -99.7%       0.04 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
      1.27 ±177%     -98.2%       0.02 ±190%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
      0.20 ±140%     -92.4%       0.02 ±201%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
     86.63 ±221%     -99.9%       0.05 ±184%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.18 ± 75%     -97.0%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
      0.13 ± 34%     -75.5%       0.03 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      0.26 ±108%     -86.2%       0.04 ±142%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      2.33 ± 11%     -65.8%       0.80 ±107%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      0.18 ± 88%     -91.1%       0.02 ±194%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
      0.50 ±145%     -92.5%       0.04 ±210%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
      0.19 ±116%     -98.5%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
      0.24 ±128%     -96.8%       0.01 ±180%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
      0.99 ± 16%     -58.0%       0.42 ±100%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      0.27 ±124%     -97.5%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
      1.08 ± 28%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.96 ± 93%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.53 ±182%     -94.2%       0.03 ±158%  perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.84 ±160%     -93.5%       0.05 ±100%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
     29.39 ±172%     -94.0%       1.78 ±123%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     21.51 ± 60%     -74.7%       5.45 ±118%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.77 ± 61%     -81.3%       2.57 ±113%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     11.22 ± 33%     -74.5%       2.86 ±107%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      1.99 ± 90%     -90.1%       0.20 ±100%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      4.50 ±138%     -94.9%       0.23 ±200%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     27.91 ±218%     -99.6%       0.11 ±120%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      9.91 ± 51%     -68.3%       3.15 ±124%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     10.18 ± 24%     -62.4%       3.83 ±105%  perf-sched.sch_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
      1.16 ± 20%     -62.7%       0.43 ±106%  perf-sched.sch_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      0.27 ± 99%     -92.0%       0.02 ±172%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
      0.32 ±128%     -98.9%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
      0.88 ± 94%     -86.7%       0.12 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
    252.53 ±128%     -98.4%       4.12 ±138%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
     60.22 ± 58%     -67.8%      19.37 ±146%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    168.93 ±209%     -99.9%       0.15 ±100%  perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
      3.79 ±169%     -98.6%       0.05 ±199%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
    517.19 ±222%     -99.9%       0.29 ±201%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.54 ± 82%     -98.4%       0.01 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
      0.34 ± 57%     -93.1%       0.02 ±203%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
      0.64 ±141%     -99.4%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
      0.28 ±111%     -97.2%       0.01 ±180%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
      0.29 ±114%     -97.6%       0.01 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
    133.30 ± 46%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.53 ±135%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      1.11 ± 85%     -76.9%       0.26 ±202%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
      7.48 ±214%     -99.0%       0.08 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     28.59 ±191%     -99.0%       0.28 ±120%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
    285.16 ±145%     -99.3%       1.94 ±111%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
    143.71 ±128%     -91.0%      12.97 ±134%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    107.10 ±162%     -99.1%       0.95 ±190%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    352.73 ±216%     -99.4%       2.06 ±118%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1169 ± 25%     -58.7%     482.79 ±101%  perf-sched.sch_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.80 ± 20%     -58.5%       0.75 ±105%  perf-sched.total_sch_delay.average.ms
      5.09 ± 20%     -58.0%       2.14 ±106%  perf-sched.total_wait_and_delay.average.ms
     20.86 ± 25%     -82.0%       3.76 ±147%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      8.10 ± 21%     -69.1%       2.51 ±103%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     22.82 ± 27%     -66.9%       7.55 ±103%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      6.55 ± 13%     -64.1%       2.35 ±108%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
    139.95 ± 55%     -64.0%      50.45 ±122%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
     27.54 ± 61%     -81.3%       5.15 ±113%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     27.75 ± 30%     -73.3%       7.42 ±106%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     26.76 ± 25%     -64.2%       9.57 ±107%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
     29.39 ± 34%     -67.3%       9.61 ±115%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     27.53 ± 25%     -62.9%      10.21 ±105%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
      3.25 ± 20%     -62.2%       1.23 ±106%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
    864.18 ±  4%     -99.3%       6.27 ±103%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    141.47 ± 38%     -72.9%      38.27 ±154%  perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      2346 ± 25%     -58.7%     969.53 ±101%  perf-sched.wait_and_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     83.99 ±223%    -100.0%       0.02 ±163%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
      0.16 ±122%     -97.7%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
     12.76 ± 37%     -81.6%       2.35 ±125%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      4.96 ± 22%     -67.9%       1.59 ±104%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     75.22 ± 91%     -96.4%       2.67 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
     23.31 ±188%     -98.8%       0.28 ±195%  perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
     14.93 ± 22%     -68.0%       4.78 ±104%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      1.29 ±178%     -98.5%       0.02 ±185%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
      0.20 ±140%     -92.5%       0.02 ±200%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
     87.29 ±221%     -99.9%       0.05 ±184%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.18 ± 76%     -97.0%       0.01 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
      0.12 ± 33%     -87.4%       0.02 ±212%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      4.22 ± 15%     -63.3%       1.55 ±108%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      0.18 ± 88%     -91.1%       0.02 ±194%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
      0.50 ±145%     -92.5%       0.04 ±210%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
      0.19 ±116%     -98.5%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
      0.24 ±128%     -96.8%       0.01 ±180%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
      1.79 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.98 ± 92%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      2.44 ±199%     -98.1%       0.05 ±109%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
    125.16 ± 52%     -64.6%      44.36 ±120%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
     13.77 ± 61%     -81.3%       2.58 ±113%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     16.53 ± 29%     -72.5%       4.55 ±106%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      3.11 ± 80%     -80.7%       0.60 ±138%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
     17.30 ± 23%     -65.0%       6.05 ±107%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
     50.76 ±143%     -98.1%       0.97 ±101%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     19.48 ± 27%     -66.8%       6.46 ±111%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     17.35 ± 25%     -63.3%       6.37 ±106%  perf-sched.wait_time.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
      2.09 ± 21%     -62.0%       0.79 ±107%  perf-sched.wait_time.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
    850.73 ±  6%     -99.3%       5.76 ±102%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    168.00 ±223%    -100.0%       0.02 ±172%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
      0.32 ±131%     -98.8%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
      0.88 ± 94%     -86.7%       0.12 ±144%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     83.05 ± 45%     -75.0%      20.78 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    393.39 ± 76%     -96.3%      14.60 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
      3.87 ±170%     -98.6%       0.05 ±199%  perf-sched.wait_time.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
    520.88 ±222%     -99.9%       0.29 ±201%  perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.54 ± 82%     -98.4%       0.01 ±141%  perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
      0.34 ± 57%     -93.1%       0.02 ±203%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
      0.64 ±141%     -99.4%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
      0.28 ±111%     -97.2%       0.01 ±180%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
    210.15 ± 42%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     34.48 ±131%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      1.11 ± 85%     -76.9%       0.26 ±202%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
     92.32 ±212%     -99.7%       0.27 ±123%  perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      3252 ± 21%     -58.5%       1351 ±103%  perf-sched.wait_time.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      1602 ± 28%     -66.2%     541.12 ±100%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    530.17 ± 95%     -98.5%       7.79 ±119%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1177 ± 25%     -58.7%     486.74 ±101%  perf-sched.wait_time.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     50.88            -1.4       49.53        perf-profile.calltrace.cycles-pp.read
     45.95            -1.0       44.92        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
     45.66            -1.0       44.64        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      3.44 ±  4%      -0.8        2.66 ±  4%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
      3.32 ±  4%      -0.8        2.56 ±  4%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg
      3.28 ±  4%      -0.8        2.52 ±  4%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable
      3.48 ±  3%      -0.6        2.83 ±  5%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      3.52 ±  3%      -0.6        2.87 ±  5%  perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      3.45 ±  3%      -0.6        2.80 ±  5%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg
     47.06            -0.6       46.45        perf-profile.calltrace.cycles-pp.write
      4.26 ±  5%      -0.6        3.69        perf-profile.calltrace.cycles-pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write
      1.58 ±  3%      -0.6        1.02 ±  8%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
      1.31 ±  3%      -0.5        0.85 ±  8%  perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
      1.25 ±  3%      -0.4        0.81 ±  8%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
      0.84 ±  3%      -0.2        0.60 ±  5%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      7.91            -0.2        7.68        perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      3.17 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
      7.80            -0.2        7.58        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      7.58            -0.2        7.36        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
      1.22 ±  4%      -0.2        1.02 ±  4%  perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
      1.18 ±  4%      -0.2        0.99 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_timeout
      0.87            -0.2        0.68 ±  8%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_timeout
      1.14 ±  4%      -0.2        0.95 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
      0.90            -0.2        0.72 ±  7%  perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
      3.45 ±  3%      -0.1        3.30        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
      1.96            -0.1        1.82        perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
      1.97            -0.1        1.86        perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
      2.35            -0.1        2.25        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
      2.58            -0.1        2.48        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
      1.38 ±  4%      -0.1        1.28 ±  2%  perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
      1.35            -0.1        1.25        perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
      0.67 ±  7%      -0.1        0.58 ±  3%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
      2.59            -0.1        2.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      2.02            -0.1        1.96        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      0.77 ±  3%      -0.0        0.72 ±  2%  perf-profile.calltrace.cycles-pp.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.65 ±  4%      -0.0        0.60 ±  2%  perf-profile.calltrace.cycles-pp.fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      0.74            -0.0        0.70        perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      1.04            -0.0        0.99        perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb
      0.69            -0.0        0.65 ±  2%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter
      0.82            -0.0        0.80        perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
      0.57            -0.0        0.56        perf-profile.calltrace.cycles-pp.refill_obj_stock.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
      0.80 ±  9%      +0.2        1.01 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
      2.50 ±  4%      +0.3        2.82 ±  9%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      2.64 ±  6%      +0.4        3.06 ± 12%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic
      2.73 ±  6%      +0.4        3.16 ± 12%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
      2.87 ±  6%      +0.4        3.30 ± 12%  perf-profile.calltrace.cycles-pp.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     18.38            +0.6       18.93        perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
      0.00            +0.7        0.70 ± 11%  perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
      0.00            +0.8        0.76 ± 16%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_stream_sendmsg.sock_write_iter.vfs_write
      0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.00            +1.5        1.50 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      0.00            +1.5        1.52 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      0.00            +1.6        1.61 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      0.18 ±141%      +1.8        1.93 ± 11%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      0.18 ±141%      +1.8        1.94 ± 11%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
      0.18 ±141%      +1.8        1.94 ± 11%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
      0.18 ±141%      +1.8        1.97 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
      0.00            +2.0        1.96 ± 11%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
     87.96            -1.4       86.57        perf-profile.children.cycles-pp.do_syscall_64
     88.72            -1.4       87.33        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     51.44            -1.4       50.05        perf-profile.children.cycles-pp.read
      4.55 ±  2%      -0.8        3.74 ±  5%  perf-profile.children.cycles-pp.schedule
      3.76 ±  4%      -0.7        3.02 ±  3%  perf-profile.children.cycles-pp.__wake_up_common
      3.64 ±  4%      -0.7        2.92 ±  3%  perf-profile.children.cycles-pp.autoremove_wake_function
      3.60 ±  4%      -0.7        2.90 ±  3%  perf-profile.children.cycles-pp.try_to_wake_up
      4.00 ±  2%      -0.6        3.36 ±  4%  perf-profile.children.cycles-pp.schedule_timeout
      4.65 ±  2%      -0.6        4.02 ±  4%  perf-profile.children.cycles-pp.__schedule
     47.64            -0.6       47.01        perf-profile.children.cycles-pp.write
      4.58 ±  4%      -0.5        4.06        perf-profile.children.cycles-pp.__wake_up_sync_key
      1.45 ±  2%      -0.4        1.00 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      1.84 ±  3%      -0.3        1.50 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
      1.62 ±  2%      -0.3        1.33 ±  3%  perf-profile.children.cycles-pp.enqueue_task
      1.53 ±  2%      -0.3        1.26 ±  3%  perf-profile.children.cycles-pp.enqueue_task_fair
      1.40            -0.3        1.14 ±  6%  perf-profile.children.cycles-pp.pick_next_task_fair
      3.97            -0.2        3.73        perf-profile.children.cycles-pp.clear_bhb_loop
      1.43            -0.2        1.19 ±  5%  perf-profile.children.cycles-pp.__pick_next_task
      0.75 ±  4%      -0.2        0.52 ±  8%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
      7.95            -0.2        7.72        perf-profile.children.cycles-pp.unix_stream_read_actor
      7.84            -0.2        7.61        perf-profile.children.cycles-pp.skb_copy_datagram_iter
      3.24 ±  2%      -0.2        3.01        perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
      7.63            -0.2        7.42        perf-profile.children.cycles-pp.__skb_datagram_iter
      0.94 ±  4%      -0.2        0.73 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
      0.95 ±  8%      -0.2        0.76 ±  4%  perf-profile.children.cycles-pp.update_curr
      1.37 ±  3%      -0.2        1.18 ±  3%  perf-profile.children.cycles-pp.dequeue_task_fair
      1.34 ±  4%      -0.2        1.16 ±  3%  perf-profile.children.cycles-pp.try_to_block_task
      4.50            -0.2        4.34        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      1.37 ±  3%      -0.2        1.20 ±  3%  perf-profile.children.cycles-pp.dequeue_entities
      3.48 ±  3%      -0.1        3.33        perf-profile.children.cycles-pp._copy_to_iter
      0.91            -0.1        0.78 ±  3%  perf-profile.children.cycles-pp.update_load_avg
      4.85            -0.1        4.72        perf-profile.children.cycles-pp.__check_object_size
      3.23            -0.1        3.11        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.54 ±  3%      -0.1        0.42 ±  5%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      1.40 ±  4%      -0.1        1.30 ±  2%  perf-profile.children.cycles-pp._copy_from_iter
      2.02            -0.1        1.92        perf-profile.children.cycles-pp.its_return_thunk
      0.43 ±  2%      -0.1        0.32 ±  3%  perf-profile.children.cycles-pp.switch_fpu_return
      0.29 ±  2%      -0.1        0.18 ±  6%  perf-profile.children.cycles-pp.__enqueue_entity
      1.46 ±  3%      -0.1        1.36 ±  2%  perf-profile.children.cycles-pp.fdget_pos
      0.44 ±  3%      -0.1        0.34 ±  5%  perf-profile.children.cycles-pp.set_next_entity
      0.42 ±  2%      -0.1        0.32 ±  4%  perf-profile.children.cycles-pp.pick_task_fair
      0.31 ±  2%      -0.1        0.24 ±  6%  perf-profile.children.cycles-pp.reweight_entity
      0.28 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.__dequeue_entity
      1.96            -0.1        1.88        perf-profile.children.cycles-pp.obj_cgroup_charge_account
      0.28 ±  2%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.update_cfs_group
      0.23 ±  2%      -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.pick_eevdf
      0.26 ±  2%      -0.1        0.19 ±  4%  perf-profile.children.cycles-pp.wakeup_preempt
      1.46            -0.1        1.40        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.48 ±  2%      -0.1        0.42 ±  5%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.30            -0.1        0.24 ±  4%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.82            -0.1        0.77        perf-profile.children.cycles-pp.__cond_resched
      0.27 ±  2%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.14 ±  3%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.update_curr_se
      0.79            -0.0        0.74        perf-profile.children.cycles-pp.mutex_lock
      0.34 ±  3%      -0.0        0.30 ±  5%  perf-profile.children.cycles-pp.rseq_ip_fixup
      0.15 ±  4%      -0.0        0.11 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.21 ±  3%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.__switch_to
      0.17 ±  4%      -0.0        0.13 ±  7%  perf-profile.children.cycles-pp.place_entity
      0.22            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.wake_affine
      0.24            -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.check_stack_object
      0.64 ±  2%      -0.0        0.61 ±  3%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.38 ±  2%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.18 ±  3%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.update_rq_clock
      0.66            -0.0        0.62        perf-profile.children.cycles-pp.rw_verify_area
      0.19            -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.task_mm_cid_work
      0.34 ±  3%      -0.0        0.31 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.12 ±  8%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.detach_tasks
      0.39 ±  3%      -0.0        0.36 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.21 ±  3%      -0.0        0.18 ±  6%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.18 ±  6%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.task_tick_fair
      0.25 ±  3%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.rseq_get_rseq_cs
      0.23 ±  5%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.sched_tick
      0.14 ±  3%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.check_preempt_wakeup_fair
      0.11 ±  4%      -0.0        0.08 ±  7%  perf-profile.children.cycles-pp.update_min_vruntime
      0.06            -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.update_curr_dl_se
      0.14 ±  3%      -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.put_prev_entity
      0.13 ±  5%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.task_h_load
      0.68            -0.0        0.65        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.46 ±  2%      -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.52            -0.0        0.50        perf-profile.children.cycles-pp.scm_recv_unix
      0.08 ±  4%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.__cgroup_account_cputime
      0.11 ±  5%      -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
      0.46 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.08 ±  8%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.activate_task
      0.08 ±  8%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.detach_task
      0.11 ±  5%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.os_xsave
      0.13 ±  5%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.avg_vruntime
      0.13 ±  4%      -0.0        0.11 ±  5%  perf-profile.children.cycles-pp.update_entity_lag
      0.08 ±  4%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.__calc_delta
      0.09 ±  5%      -0.0        0.07 ±  8%  perf-profile.children.cycles-pp.vruntime_eligible
      0.34 ±  2%      -0.0        0.32        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.30            -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.__build_skb_around
      0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.rseq_update_cpu_node_id
      0.15            -0.0        0.14        perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
      0.07 ±  5%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.38 ±  2%      +0.0        0.40 ±  2%  perf-profile.children.cycles-pp.mod_memcg_lruvec_state
      0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.children.cycles-pp.prepare_task_switch
      0.05 ±  7%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.handle_softirqs
      0.06            +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.finish_wait
      0.06 ±  7%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.06 ±  8%      +0.1        0.11 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.01 ±223%      +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.ktime_get
      0.54 ±  4%      +0.1        0.61        perf-profile.children.cycles-pp.select_task_rq
      0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.enqueue_dl_entity
      0.12 ±  4%      +0.1        0.19 ±  7%  perf-profile.children.cycles-pp.get_any_partial
      0.10 ±  9%      +0.1        0.18 ±  5%  perf-profile.children.cycles-pp.available_idle_cpu
      0.00            +0.1        0.08 ±  9%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
      0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.dl_server_start
      0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.dl_server_stop
      0.46 ±  2%      +0.1        0.54 ±  2%  perf-profile.children.cycles-pp.select_task_rq_fair
      0.00            +0.1        0.10 ± 10%  perf-profile.children.cycles-pp.select_idle_core
      0.09 ±  7%      +0.1        0.20 ±  8%  perf-profile.children.cycles-pp.select_idle_cpu
      0.18 ±  4%      +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.select_idle_sibling
      0.00            +0.2        0.18 ±  4%  perf-profile.children.cycles-pp.process_one_work
      0.06 ± 13%      +0.2        0.25 ±  9%  perf-profile.children.cycles-pp.schedule_idle
      0.44 ±  2%      +0.2        0.64 ±  8%  perf-profile.children.cycles-pp.prepare_to_wait
      0.00            +0.2        0.21 ±  5%  perf-profile.children.cycles-pp.kthread
      0.00            +0.2        0.21 ±  5%  perf-profile.children.cycles-pp.worker_thread
      0.00            +0.2        0.21 ±  4%  perf-profile.children.cycles-pp.ret_from_fork
      0.00            +0.2        0.21 ±  4%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.11 ± 12%      +0.3        0.36 ±  9%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.31 ± 35%      +0.3        0.59 ± 11%  perf-profile.children.cycles-pp.__cmd_record
      0.26 ± 45%      +0.3        0.54 ± 13%  perf-profile.children.cycles-pp.perf_session__process_events
      0.26 ± 45%      +0.3        0.54 ± 13%  perf-profile.children.cycles-pp.reader__read_event
      0.26 ± 45%      +0.3        0.54 ± 13%  perf-profile.children.cycles-pp.record__finish_output
      0.16 ± 11%      +0.3        0.45 ±  9%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.14 ± 11%      +0.3        0.45 ±  9%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.14 ± 60%      +0.3        0.48 ± 17%  perf-profile.children.cycles-pp.ordered_events__queue
      0.14 ± 61%      +0.3        0.48 ± 17%  perf-profile.children.cycles-pp.queue_event
      0.15 ± 59%      +0.3        0.49 ± 16%  perf-profile.children.cycles-pp.process_simple
      0.16 ± 12%      +0.4        0.54 ± 10%  perf-profile.children.cycles-pp.sysvec_call_function_single
      4.61 ±  3%      +0.5        5.13 ±  8%  perf-profile.children.cycles-pp.get_partial_node
      5.57 ±  3%      +0.6        6.12 ±  7%  perf-profile.children.cycles-pp.___slab_alloc
     18.44            +0.6       19.00        perf-profile.children.cycles-pp.sock_alloc_send_pskb
      6.51 ±  3%      +0.7        7.26 ±  9%  perf-profile.children.cycles-pp.__put_partials
      0.33 ± 14%      +1.0        1.30 ± 11%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.34 ± 17%      +1.1        1.47 ± 11%  perf-profile.children.cycles-pp.pv_native_safe_halt
      0.34 ± 17%      +1.1        1.48 ± 11%  perf-profile.children.cycles-pp.acpi_safe_halt
      0.34 ± 17%      +1.1        1.48 ± 11%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      0.34 ± 17%      +1.1        1.48 ± 11%  perf-profile.children.cycles-pp.acpi_idle_enter
      0.35 ± 17%      +1.2        1.53 ± 11%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.35 ± 17%      +1.2        1.54 ± 11%  perf-profile.children.cycles-pp.cpuidle_enter
      0.38 ± 17%      +1.3        1.63 ± 11%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.45 ± 16%      +1.5        1.94 ± 11%  perf-profile.children.cycles-pp.start_secondary
      0.46 ± 17%      +1.5        1.96 ± 11%  perf-profile.children.cycles-pp.do_idle
      0.46 ± 17%      +1.5        1.97 ± 11%  perf-profile.children.cycles-pp.common_startup_64
      0.46 ± 17%      +1.5        1.97 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
     13.76 ±  2%      +1.7       15.44 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     12.09 ±  2%      +1.9       14.00 ±  6%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      3.93            -0.2        3.69        perf-profile.self.cycles-pp.clear_bhb_loop
      3.43 ±  3%      -0.1        3.29        perf-profile.self.cycles-pp._copy_to_iter
      0.50 ±  2%      -0.1        0.39 ±  5%  perf-profile.self.cycles-pp.switch_mm_irqs_off
      1.37 ±  4%      -0.1        1.27 ±  2%  perf-profile.self.cycles-pp._copy_from_iter
      0.28 ±  2%      -0.1        0.18 ±  7%  perf-profile.self.cycles-pp.__enqueue_entity
      1.41 ±  3%      -0.1        1.31 ±  2%  perf-profile.self.cycles-pp.fdget_pos
      2.51            -0.1        2.42        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      1.35            -0.1        1.28        perf-profile.self.cycles-pp.read
      2.24            -0.1        2.17        perf-profile.self.cycles-pp.do_syscall_64
      0.27 ±  3%      -0.1        0.20 ±  3%  perf-profile.self.cycles-pp.update_cfs_group
      1.28            -0.1        1.22        perf-profile.self.cycles-pp.sock_write_iter
      0.84            -0.1        0.77        perf-profile.self.cycles-pp.vfs_read
      1.42            -0.1        1.36        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.20            -0.1        1.14        perf-profile.self.cycles-pp.__alloc_skb
      0.18 ±  2%      -0.1        0.13 ±  5%  perf-profile.self.cycles-pp.pick_eevdf
      1.04            -0.1        0.99        perf-profile.self.cycles-pp.its_return_thunk
      0.29 ±  2%      -0.1        0.24 ±  4%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.28 ±  5%      -0.1        0.23 ±  6%  perf-profile.self.cycles-pp.update_curr
      0.13 ±  5%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.switch_fpu_return
      0.20 ±  3%      -0.0        0.15 ±  6%  perf-profile.self.cycles-pp.__dequeue_entity
      1.00            -0.0        0.95        perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
      0.33            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.update_load_avg
      0.88            -0.0        0.83 ±  2%  perf-profile.self.cycles-pp.vfs_write
      0.91            -0.0        0.86        perf-profile.self.cycles-pp.sock_read_iter
      0.13 ±  3%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.update_curr_se
      0.25 ±  2%      -0.0        0.21 ±  4%  perf-profile.self.cycles-pp.__update_load_avg_se
      1.22            -0.0        1.18        perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
      0.68            -0.0        0.63        perf-profile.self.cycles-pp.__check_object_size
      0.78 ±  2%      -0.0        0.74        perf-profile.self.cycles-pp.obj_cgroup_charge_account
      0.20 ±  3%      -0.0        0.16 ±  4%  perf-profile.self.cycles-pp.__switch_to
      0.15 ±  3%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.try_to_wake_up
      0.90            -0.0        0.86        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.76 ±  2%      -0.0        0.73        perf-profile.self.cycles-pp.__check_heap_object
      0.92            -0.0        0.89 ±  2%  perf-profile.self.cycles-pp.__account_obj_stock
      0.19 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.check_stack_object
      0.40 ±  3%      -0.0        0.37        perf-profile.self.cycles-pp.__schedule
      0.60 ±  2%      -0.0        0.56 ±  3%  perf-profile.self.cycles-pp.__virt_addr_valid
      0.71            -0.0        0.68        perf-profile.self.cycles-pp.__skb_datagram_iter
      0.18 ±  4%      -0.0        0.14 ±  5%  perf-profile.self.cycles-pp.task_mm_cid_work
      0.68            -0.0        0.65        perf-profile.self.cycles-pp.refill_obj_stock
      0.34            -0.0        0.31 ±  2%  perf-profile.self.cycles-pp.unix_stream_recvmsg
      0.06 ±  7%      -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.enqueue_task
      0.11            -0.0        0.08        perf-profile.self.cycles-pp.pick_task_fair
      0.15 ±  2%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.20 ±  3%      -0.0        0.17 ±  7%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.41            -0.0        0.38        perf-profile.self.cycles-pp.sock_recvmsg
      0.10            -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.update_min_vruntime
      0.13 ±  3%      -0.0        0.10        perf-profile.self.cycles-pp.task_h_load
      0.23 ±  3%      -0.0        0.20 ±  6%  perf-profile.self.cycles-pp.__get_user_8
      0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.exit_to_user_mode_loop
      0.39 ±  2%      -0.0        0.37 ±  2%  perf-profile.self.cycles-pp.rw_verify_area
      0.11 ±  3%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.os_xsave
      0.12 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.pick_next_task_fair
      0.35            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
      0.46            -0.0        0.44        perf-profile.self.cycles-pp.mutex_lock
      0.11 ±  4%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__switch_to_asm
      0.10 ±  3%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.enqueue_entity
      0.08 ±  7%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.place_entity
      0.30            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.alloc_skb_with_frags
      0.50            -0.0        0.48        perf-profile.self.cycles-pp.kfree
      0.30            -0.0        0.28        perf-profile.self.cycles-pp.ksys_write
      0.12 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.dequeue_entity
      0.11 ±  4%      -0.0        0.09        perf-profile.self.cycles-pp.prepare_to_wait
      0.19 ±  2%      -0.0        0.17        perf-profile.self.cycles-pp.update_rq_clock_task
      0.27            -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.__build_skb_around
      0.08 ±  6%      -0.0        0.06 ±  9%  perf-profile.self.cycles-pp.vruntime_eligible
      0.12 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.__wake_up_common
      0.27            -0.0        0.26        perf-profile.self.cycles-pp.kmalloc_reserve
      0.48            -0.0        0.46        perf-profile.self.cycles-pp.unix_write_space
      0.19            -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.skb_copy_datagram_iter
      0.07            -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.__calc_delta
      0.06 ±  6%      -0.0        0.05        perf-profile.self.cycles-pp.__put_user_8
      0.28            -0.0        0.27        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.11            -0.0        0.10        perf-profile.self.cycles-pp.wait_for_unix_gc
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.__x64_sys_write
      0.07 ±  5%      +0.0        0.08 ±  5%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.19 ±  7%      +0.0        0.22 ±  4%  perf-profile.self.cycles-pp.prepare_task_switch
      0.10 ±  6%      +0.1        0.17 ±  5%  perf-profile.self.cycles-pp.available_idle_cpu
      0.14 ± 61%      +0.3        0.48 ± 17%  perf-profile.self.cycles-pp.queue_event
      0.19 ± 18%      +0.7        0.85 ± 12%  perf-profile.self.cycles-pp.pv_native_safe_halt
     12.07 ±  2%      +1.9       13.97 ±  6%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_3/aim9/300s

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      9156           +20.2%      11004        vmstat.system.cs
   8715946 ±  6%     -14.0%    7494314 ± 13%  meminfo.DirectMap2M
     10992           +85.4%      20381        meminfo.PageTables
    318.58            -1.7%     313.01        aim9.shell_rtns_3.ops_per_sec
  27145198            -2.1%   26576524        aim9.time.minor_page_faults
   1049306            -1.8%    1030938        aim9.time.voluntary_context_switches
      6173 ± 20%     +74.0%      10742 ±  4%  numa-meminfo.node0.PageTables
      5702 ± 31%     +55.1%       8844 ± 19%  numa-meminfo.node0.Shmem
      4803 ± 25%    +100.6%       9636 ±  6%  numa-meminfo.node1.PageTables
      1538 ± 20%     +73.7%       2673 ±  5%  numa-vmstat.node0.nr_page_table_pages
      1425 ± 31%     +55.1%       2210 ± 19%  numa-vmstat.node0.nr_shmem
      1194 ± 25%    +101.2%       2402 ±  6%  numa-vmstat.node1.nr_page_table_pages
     30413           +19.3%      36291        sched_debug.cpu.nr_switches.avg
     84768 ±  6%     +20.3%     101955 ±  4%  sched_debug.cpu.nr_switches.max
     25510 ± 13%     +23.0%      31383 ±  3%  sched_debug.cpu.nr_switches.stddev
      2727           +85.8%       5066        proc-vmstat.nr_page_table_pages
  19325131            -1.6%   19014535        proc-vmstat.numa_hit
  19274656            -1.6%   18964467        proc-vmstat.numa_local
  19877211            -1.6%   19563123        proc-vmstat.pgalloc_normal
  28020416            -2.0%   27451741        proc-vmstat.pgfault
  19829318            -1.6%   19508263        proc-vmstat.pgfree
      2679            -1.6%       2636        proc-vmstat.unevictable_pgs_culled
      0.03 ± 10%     +30.9%       0.04 ±  2%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±  5%     +26.2%       0.02 ±  3%  perf-sched.total_sch_delay.average.ms
     27.03 ±  2%     -12.4%      23.66        perf-sched.total_wait_and_delay.average.ms
     23171           +18.2%      27385        perf-sched.total_wait_and_delay.count.ms
     27.01 ±  2%     -12.5%      23.64        perf-sched.total_wait_time.average.ms
    110.73 ±  4%     -71.1%      31.98        perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1662 ±  2%    +278.6%       6294        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    110.70 ±  4%     -71.1%      31.94        perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      5.94            +0.1        6.00        perf-stat.i.branch-miss-rate%
      9184           +20.2%      11041        perf-stat.i.context-switches
      1.96            +1.6%       1.99        perf-stat.i.cpi
     71.73 ±  4%     +66.1%     119.11 ±  5%  perf-stat.i.cpu-migrations
      0.53            -1.4%       0.52        perf-stat.i.ipc
      3.79            -2.0%       3.71        perf-stat.i.metric.K/sec
     90919            -2.0%      89065        perf-stat.i.minor-faults
     90919            -2.0%      89065        perf-stat.i.page-faults
      6.00            +0.1        6.06        perf-stat.overall.branch-miss-rate%
      1.79            +1.2%       1.81        perf-stat.overall.cpi
      0.56            -1.2%       0.55        perf-stat.overall.ipc
      9154           +20.2%      11004        perf-stat.ps.context-switches
     71.49 ±  4%     +66.1%     118.72 ±  5%  perf-stat.ps.cpu-migrations
     90616            -2.0%      88768        perf-stat.ps.minor-faults
     90616            -2.0%      88768        perf-stat.ps.page-faults
      8.89            -0.2        8.68        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      8.88            -0.2        8.66        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.51 ±  3%      -0.2        3.33        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
      1.66 ±  2%      -0.1        1.57 ±  4%  perf-profile.calltrace.cycles-pp.setlocale
      0.27 ±100%      +0.3        0.61 ±  5%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      0.18 ±141%      +0.4        0.60 ±  5%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
     62.46            +0.6       63.01        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      0.09 ±223%      +0.6        0.65 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.09 ±223%      +0.6        0.65 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
     49.01            +0.6       49.60        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     67.47            +0.7       68.17        perf-profile.calltrace.cycles-pp.common_startup_64
     20.25            -0.7       19.58        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     20.21            -0.7       19.54        perf-profile.children.cycles-pp.do_syscall_64
      6.54            -0.2        6.33        perf-profile.children.cycles-pp.asm_exc_page_fault
      6.10            -0.2        5.90        perf-profile.children.cycles-pp.do_user_addr_fault
      3.77 ±  3%      -0.2        3.60        perf-profile.children.cycles-pp.x64_sys_call
      3.62 ±  3%      -0.2        3.46        perf-profile.children.cycles-pp.do_exit
      2.63 ±  3%      -0.2        2.48 ±  2%  perf-profile.children.cycles-pp.__mmput
      2.16 ±  2%      -0.1        2.06 ±  3%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
      1.66 ±  2%      -0.1        1.57 ±  4%  perf-profile.children.cycles-pp.setlocale
      2.69 ±  2%      -0.1        2.61        perf-profile.children.cycles-pp.do_pte_missing
      0.77 ±  5%      -0.1        0.70 ±  6%  perf-profile.children.cycles-pp.tlb_finish_mmu
      0.92 ±  2%      -0.0        0.87 ±  4%  perf-profile.children.cycles-pp.__irqentry_text_end
      0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.tick_nohz_tick_stopped
      0.10 ± 11%      -0.0        0.07 ± 21%  perf-profile.children.cycles-pp.__percpu_counter_init_many
      0.14 ±  9%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.strnlen
      0.12 ± 11%      -0.0        0.10 ±  8%  perf-profile.children.cycles-pp.mas_prev_slot
      0.11 ± 12%      +0.0        0.14 ±  9%  perf-profile.children.cycles-pp.update_curr
      0.19 ±  8%      +0.0        0.22 ±  6%  perf-profile.children.cycles-pp.enqueue_entity
      0.10 ± 11%      +0.0        0.13 ± 11%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
      0.05 ± 46%      +0.0        0.08 ± 13%  perf-profile.children.cycles-pp.select_task_rq
      0.13 ± 14%      +0.0        0.17 ±  8%  perf-profile.children.cycles-pp.perf_pmu_sched_task
      0.20 ± 10%      +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.try_to_wake_up
      0.28 ±  9%      +0.1        0.34 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.04 ± 44%      +0.1        0.11 ± 13%  perf-profile.children.cycles-pp.__queue_work
      0.30 ± 11%      +0.1        0.38 ±  8%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.30 ±  4%      +0.1        0.38 ±  8%  perf-profile.children.cycles-pp.__pick_next_task
      0.22 ±  7%      +0.1        0.29 ±  9%  perf-profile.children.cycles-pp.try_to_block_task
      0.02 ±141%      +0.1        0.09 ± 10%  perf-profile.children.cycles-pp.kick_pool
      0.02 ± 99%      +0.1        0.10 ± 19%  perf-profile.children.cycles-pp.queue_work_on
      0.25 ±  4%      +0.1        0.35 ±  7%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.33 ±  6%      +0.1        0.43 ±  5%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.29 ±  4%      +0.1        0.39 ±  6%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.51 ±  6%      +0.1        0.63 ±  6%  perf-profile.children.cycles-pp.schedule_idle
      0.46 ±  7%      +0.1        0.58 ±  5%  perf-profile.children.cycles-pp.schedule
      0.88 ±  6%      +0.2        1.04 ±  5%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.18 ±  6%      +0.2        0.34 ±  8%  perf-profile.children.cycles-pp.worker_thread
      0.88 ±  6%      +0.2        1.04 ±  5%  perf-profile.children.cycles-pp.ret_from_fork
      0.38 ±  8%      +0.2        0.56 ± 10%  perf-profile.children.cycles-pp.kthread
      1.08 ±  3%      +0.2        1.32 ±  2%  perf-profile.children.cycles-pp.__schedule
     66.15            +0.5       66.64        perf-profile.children.cycles-pp.cpuidle_idle_call
     62.89            +0.6       63.47        perf-profile.children.cycles-pp.cpuidle_enter_state
     63.00            +0.6       63.59        perf-profile.children.cycles-pp.cpuidle_enter
     49.10            +0.6       49.69        perf-profile.children.cycles-pp.intel_idle
     67.47            +0.7       68.17        perf-profile.children.cycles-pp.do_idle
     67.47            +0.7       68.17        perf-profile.children.cycles-pp.common_startup_64
     67.47            +0.7       68.17        perf-profile.children.cycles-pp.cpu_startup_entry
      0.91 ±  2%      -0.0        0.86 ±  4%  perf-profile.self.cycles-pp.__irqentry_text_end
      0.14 ± 11%      +0.1        0.22 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
     49.08            +0.6       49.68        perf-profile.self.cycles-pp.intel_idle



***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/800%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   3745213 ± 39%    +108.1%    7794858 ± 12%  cpuidle..usage
    186670           +17.3%     218939 ±  2%  meminfo.Percpu
      5.00          +306.7%      20.33 ± 66%  mpstat.max_utilization.seconds
      9.35 ± 76%      -4.5        4.80 ±141%  perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
      8.90 ± 75%      -4.3        4.57 ±141%  perf-profile.calltrace.cycles-pp.perf_session__deliver_event.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
      3283 ±  7%     -16.2%       2751 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.avg
      3283 ±  7%     -16.2%       2751 ±  5%  sched_debug.cfs_rq:/.min_vruntime.avg
   1522512 ±  6%     +80.0%    2739797 ±  4%  vmstat.system.cs
    308726 ±  8%     +60.5%     495472 ±  5%  vmstat.system.in
    467562            +3.7%     485068 ±  2%  proc-vmstat.nr_kernel_stack
    266084            +3.8%     276310        proc-vmstat.nr_slab_unreclaimable
 1.375e+08            -2.0%  1.347e+08        proc-vmstat.numa_hit
 1.373e+08            -2.0%  1.346e+08        proc-vmstat.numa_local
    217472 ±  3%     -28.1%     156410        proc-vmstat.numa_other
 1.382e+08            -2.0%  1.354e+08        proc-vmstat.pgalloc_normal
 1.375e+08            -2.0%  1.347e+08        proc-vmstat.pgfree
   1514102            -6.2%    1420287        hackbench.throughput
   1480357            -6.7%    1380775        hackbench.throughput_avg
   1514102            -6.2%    1420287        hackbench.throughput_best
   1436918            -7.9%    1323413        hackbench.throughput_worst
  14551264 ± 13%    +138.1%   34644707 ±  3%  hackbench.time.involuntary_context_switches
      9919            -1.6%       9762        hackbench.time.percent_of_cpu_this_job_got
      4239            +4.5%       4428        hackbench.time.system_time
  56365933 ±  6%     +65.3%   93172066 ±  4%  hackbench.time.voluntary_context_switches
  65085618           +26.7%   82440571 ±  2%  perf-stat.i.branch-misses
     31.25            -1.6       29.66        perf-stat.i.cache-miss-rate%
 2.469e+08            +8.9%  2.689e+08        perf-stat.i.cache-misses
 7.519e+08           +15.9%  8.712e+08        perf-stat.i.cache-references
   1353061 ±  7%     +87.5%    2537450 ±  5%  perf-stat.i.context-switches
 2.269e+11            +3.5%  2.348e+11        perf-stat.i.cpu-cycles
    134588 ± 13%     +81.9%     244825 ±  8%  perf-stat.i.cpu-migrations
     13.60 ±  5%     +70.5%      23.20 ±  5%  perf-stat.i.metric.K/sec
      1.26            +7.6%       1.35        perf-stat.overall.MPKI
      0.11 ±  2%      +0.0        0.14 ±  2%  perf-stat.overall.branch-miss-rate%
     34.12            -2.1       31.97        perf-stat.overall.cache-miss-rate%
      1.17            +1.8%       1.19        perf-stat.overall.cpi
    931.96            -5.3%     882.44        perf-stat.overall.cycles-between-cache-misses
      0.85            -1.8%       0.84        perf-stat.overall.ipc
 5.372e+10            -1.2%   5.31e+10        perf-stat.ps.branch-instructions
  57783128 ±  2%     +32.9%   76802898 ±  2%  perf-stat.ps.branch-misses
 2.696e+08            +7.2%   2.89e+08        perf-stat.ps.cache-misses
 7.902e+08           +14.4%  9.039e+08        perf-stat.ps.cache-references
   1288664 ±  7%     +94.6%    2508227 ±  5%  perf-stat.ps.context-switches
 2.512e+11            +1.5%   2.55e+11        perf-stat.ps.cpu-cycles
    122960 ± 14%     +82.3%     224127 ±  9%  perf-stat.ps.cpu-migrations
 1.108e+13            +5.7%  1.171e+13        perf-stat.total.instructions
      0.94 ±223%   +5929.9%      56.62 ±121%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
     26.44 ± 81%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
    100.25 ±141%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      9.01 ± 43%   +1823.1%     173.24 ±106%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
     49.43 ± 14%     +73.8%      85.93 ± 19%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
    130.63 ± 17%    +135.8%     308.04 ± 28%  perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
     18.09 ± 30%    +130.4%      41.70 ± 26%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    196.51 ± 21%    +102.9%     398.77 ± 15%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     34.17 ± 39%    +191.1%      99.46 ± 20%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    154.91 ±163%   +1649.9%       2710 ± 91%  perf-sched.sch_delay.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
      0.94 ±223%  +1.9e+05%       1743 ±120%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      3.19 ±124%     -91.9%       0.26 ±150%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    646.26 ± 94%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
    282.66 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     63.17 ± 52%   +2854.4%       1866 ±121%  perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
      1507 ± 35%    +249.4%       5266 ± 47%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3915 ± 67%     +98.7%       7779 ± 16%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     53.31 ± 18%     +79.9%      95.90 ± 23%  perf-sched.total_sch_delay.average.ms
    149.37 ± 18%     +80.0%     268.92 ± 22%  perf-sched.total_wait_and_delay.average.ms
     96.07 ± 18%     +80.1%     173.01 ± 21%  perf-sched.total_wait_time.average.ms
    244.53 ± 47%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
    529.64 ± 20%     +38.5%     733.60 ± 20%  perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
    136.52 ± 15%     +73.7%     237.07 ± 18%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
    373.41 ± 16%    +136.3%     882.34 ± 27%  perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
     51.96 ± 29%    +127.5%     118.22 ± 25%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    554.86 ± 23%    +103.0%       1126 ± 14%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    298.52 ±136%    +436.9%       1602 ± 27%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    556.66 ± 37%     -97.1%      16.09 ± 47%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    707.67 ± 31%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
      1358 ± 28%   +4707.9%      65291 ± 27%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     12184 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
      1393 ±134%    +379.9%       6685 ± 15%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      6927 ±  6%    +119.8%      15224 ± 19%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    341.61 ± 21%     +39.1%     475.15 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
     51.39 ± 99%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
    121.14 ±122%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     87.09 ± 15%     +73.6%     151.14 ± 18%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
    242.78 ± 16%    +136.6%     574.31 ± 27%  perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
     33.86 ± 29%    +126.0%      76.52 ± 24%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    250.32 ±109%     -89.4%      26.44 ±111%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
    358.36 ± 25%    +103.1%     727.72 ± 14%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     77.40 ± 47%    +102.5%     156.70 ± 28%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
     17.91 ± 42%     -75.3%       4.42 ± 76%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
    266.70 ±137%    +431.6%       1417 ± 36%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    536.93 ± 40%     -97.4%      13.81 ± 50%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    180.38 ±135%   +2208.8%       4164 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
      1028 ±129%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
    312.94 ±123%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
    418.66 ±132%     -93.7%      26.44 ±111%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
      1388 ±133%    +379.7%       6660 ± 15%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      2022 ± 25%    +164.9%       5358 ± 46%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread

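The large increase in worker_thread activity in this table (e.g. the wait_and_delay.count jump from 1358 to 65291) is consistent with the mm_cid compaction now running from a kworker rather than from tick-driven task_work, as the new queue_work_on/kick_pool entries in the profiles also show. A minimal sketch of that pattern follows, using hypothetical names (mm_like, cid_compact_fn, cid_schedule) rather than the actual patch code:

#include <linux/kernel.h>
#include <linux/workqueue.h>

struct mm_like {			/* stand-in for the real mm_struct */
	struct work_struct cid_work;	/* replaces the tick-driven task_work */
};

static void cid_compact_fn(struct work_struct *work)
{
	struct mm_like *mm = container_of(work, struct mm_like, cid_work);

	/* Runs preemptible and migratable in kworker context. */
	(void)mm;	/* ... mm_cid compaction would happen here ... */
}

static void cid_init(struct mm_like *mm)
{
	INIT_WORK(&mm->cid_work, cid_compact_fn);
}

static void cid_schedule(struct mm_like *mm)
{
	/*
	 * system_unbound_wq has no CPU affinity, so the deferred work
	 * does not add to the return-to-user latency of the queuing
	 * task; it only shows up as extra kworker wakeups, which is
	 * what the perf-sched counters above are measuring.
	 */
	queue_work(system_unbound_wq, &mm->cid_work);
}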


***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_1/aim9/300s

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     11004           +86.2%      20490        meminfo.PageTables
    121.33 ± 12%     +18.8%     144.17 ±  5%  perf-c2c.DRAM.remote
      9155           +20.0%      10990        vmstat.system.cs
      5129 ± 20%    +107.2%      10631 ±  3%  numa-meminfo.node0.PageTables
      5864 ± 17%     +67.3%       9811 ±  3%  numa-meminfo.node1.PageTables
      1278 ± 20%    +107.9%       2658 ±  3%  numa-vmstat.node0.nr_page_table_pages
      1469 ± 17%     +66.4%       2446 ±  3%  numa-vmstat.node1.nr_page_table_pages
    319.43            -2.1%     312.66        aim9.shell_rtns_1.ops_per_sec
  27217846            -2.5%   26546962        aim9.time.minor_page_faults
   1051878            -2.1%    1029547        aim9.time.voluntary_context_switches
     30502           +18.6%      36187        sched_debug.cpu.nr_switches.avg
     90327 ± 12%     +22.7%     110866 ±  4%  sched_debug.cpu.nr_switches.max
     26316 ± 16%     +25.5%      33021 ±  5%  sched_debug.cpu.nr_switches.stddev
      0.03 ±  7%     +70.7%       0.05 ± 53%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±  3%     +38.9%       0.02 ± 28%  perf-sched.total_sch_delay.average.ms
     27.43 ±  2%     -14.5%      23.45        perf-sched.total_wait_and_delay.average.ms
     23174           +18.0%      27340        perf-sched.total_wait_and_delay.count.ms
     27.41 ±  2%     -14.6%      23.42        perf-sched.total_wait_time.average.ms
    115.38 ±  3%     -71.9%      32.37 ±  2%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1656 ±  3%    +280.2%       6299        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    115.35 ±  3%     -72.0%      32.31 ±  2%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2737           +86.1%       5095        proc-vmstat.nr_page_table_pages
     30460            +3.2%      31439        proc-vmstat.nr_shmem
     27933            +1.8%      28432        proc-vmstat.nr_slab_unreclaimable
  19466749            -2.5%   18980434        proc-vmstat.numa_hit
  19414531            -2.5%   18927584        proc-vmstat.numa_local
  20028107            -2.5%   19528806        proc-vmstat.pgalloc_normal
  28087705            -2.4%   27417155        proc-vmstat.pgfault
  19980173            -2.5%   19474402        proc-vmstat.pgfree
    420074            -5.7%     396239 ±  8%  proc-vmstat.pgreuse
      2685            -1.9%       2633        proc-vmstat.unevictable_pgs_culled
  5.48e+08            -1.2%  5.412e+08        perf-stat.i.branch-instructions
      5.92            +0.1        6.00        perf-stat.i.branch-miss-rate%
      9195           +19.9%      11021        perf-stat.i.context-switches
      1.96            +1.7%       1.99        perf-stat.i.cpi
     70.13           +73.4%     121.59 ±  8%  perf-stat.i.cpu-migrations
 2.725e+09            -1.3%   2.69e+09        perf-stat.i.instructions
      0.53            -1.6%       0.52        perf-stat.i.ipc
      3.80            -2.4%       3.71        perf-stat.i.metric.K/sec
     91139            -2.4%      88949        perf-stat.i.minor-faults
     91139            -2.4%      88949        perf-stat.i.page-faults
      5.00 ± 44%      +1.1        6.07        perf-stat.overall.branch-miss-rate%
      1.49 ± 44%     +21.9%       1.82        perf-stat.overall.cpi
      7643 ± 44%     +43.7%      10984        perf-stat.ps.context-switches
     58.17 ± 44%    +108.4%     121.21 ±  8%  perf-stat.ps.cpu-migrations
      2.06 ±  2%      -0.2        1.87 ± 12%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.98 ±  7%      -0.2        0.83 ± 12%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      1.69 ±  2%      -0.1        1.54 ±  2%  perf-profile.calltrace.cycles-pp.setlocale
      0.58 ±  5%      -0.1        0.44 ± 44%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__open64_nocancel.setlocale
      0.72 ±  6%      -0.1        0.60 ±  8%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
      3.21 ±  2%      -0.1        3.11        perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
      0.70 ±  4%      -0.1        0.62 ±  6%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
      1.52 ±  2%      -0.1        1.44 ±  3%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.34 ±  3%      -0.1        1.28 ±  3%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.89 ±  3%      -0.1        0.84        perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.17 ±141%      +0.4        0.61 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.17 ±141%      +0.4        0.61 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
     65.10            +0.5       65.56        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     66.40            +0.6       67.00        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     66.46            +0.6       67.08        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
     66.46            +0.6       67.08        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
     67.63            +0.7       68.30        perf-profile.calltrace.cycles-pp.common_startup_64
     20.14            -0.6       19.51        perf-profile.children.cycles-pp.do_syscall_64
     20.20            -0.6       19.57        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.13 ±  5%      -0.2        0.98 ±  9%  perf-profile.children.cycles-pp.rcu_core
      1.69 ±  2%      -0.1        1.54 ±  2%  perf-profile.children.cycles-pp.setlocale
      0.84 ±  4%      -0.1        0.71 ±  5%  perf-profile.children.cycles-pp.rcu_do_batch
      2.16 ±  2%      -0.1        2.04 ±  3%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
      1.15 ±  4%      -0.1        1.04 ±  5%  perf-profile.children.cycles-pp.__open64_nocancel
      3.22 ±  2%      -0.1        3.12        perf-profile.children.cycles-pp.exec_binprm
      2.09 ±  2%      -0.1        2.00 ±  2%  perf-profile.children.cycles-pp.kernel_clone
      0.88 ±  4%      -0.1        0.79 ±  4%  perf-profile.children.cycles-pp.mas_store_prealloc
      2.19            -0.1        2.10 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.70 ±  4%      -0.1        0.62 ±  6%  perf-profile.children.cycles-pp.dup_mm
      1.36 ±  3%      -0.1        1.30        perf-profile.children.cycles-pp._Fork
      0.56 ±  4%      -0.1        0.50 ±  8%  perf-profile.children.cycles-pp.dup_mmap
      0.09 ± 16%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
      0.31 ±  8%      -0.1        0.25 ± 10%  perf-profile.children.cycles-pp.strncpy_from_user
      0.94 ±  3%      -0.1        0.88 ±  2%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.41 ±  5%      -0.0        0.36 ±  5%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.18 ± 12%      -0.0        0.14 ±  7%  perf-profile.children.cycles-pp.tlb_remove_table_rcu
      0.20 ±  7%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.08 ± 14%      -0.0        0.05 ± 49%  perf-profile.children.cycles-pp.mas_update_gap
      0.24 ±  5%      -0.0        0.21 ±  5%  perf-profile.children.cycles-pp.filemap_read
      0.19 ±  7%      -0.0        0.16 ±  8%  perf-profile.children.cycles-pp.__call_rcu_common
      0.22 ±  2%      -0.0        0.19 ±  5%  perf-profile.children.cycles-pp.mas_next_slot
      0.09 ±  5%      +0.0        0.12 ±  7%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
      0.05 ± 47%      +0.0        0.08 ± 10%  perf-profile.children.cycles-pp.lru_gen_del_folio
      0.10 ± 14%      +0.0        0.12 ± 18%  perf-profile.children.cycles-pp.__folio_mod_stat
      0.12 ± 12%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.perf_pmu_sched_task
      0.20 ± 10%      +0.0        0.24 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
      0.06 ± 47%      +0.0        0.10 ± 11%  perf-profile.children.cycles-pp.__queue_work
      0.56 ±  5%      +0.1        0.61 ±  4%  perf-profile.children.cycles-pp.sched_balance_domains
      0.04 ± 72%      +0.1        0.09 ± 11%  perf-profile.children.cycles-pp.kick_pool
      0.04 ± 72%      +0.1        0.09 ± 14%  perf-profile.children.cycles-pp.queue_work_on
      0.33 ±  6%      +0.1        0.38 ±  7%  perf-profile.children.cycles-pp.dequeue_entities
      0.35 ±  6%      +0.1        0.40 ±  7%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.52 ±  6%      +0.1        0.58 ±  5%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.54 ±  7%      +0.1        0.60 ±  5%  perf-profile.children.cycles-pp.enqueue_task
      0.28 ±  9%      +0.1        0.35 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.21 ±  4%      +0.1        0.28 ± 12%  perf-profile.children.cycles-pp.try_to_block_task
      0.34 ±  4%      +0.1        0.42 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.36 ±  3%      +0.1        0.46 ±  6%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.28 ±  4%      +0.1        0.38 ±  5%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.33 ±  2%      +0.1        0.43 ±  5%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.46 ±  7%      +0.1        0.56 ±  6%  perf-profile.children.cycles-pp.schedule
      0.48 ±  8%      +0.1        0.61 ±  8%  perf-profile.children.cycles-pp.timerqueue_del
      0.18 ± 13%      +0.1        0.32 ± 11%  perf-profile.children.cycles-pp.worker_thread
      0.38 ±  9%      +0.2        0.52 ± 10%  perf-profile.children.cycles-pp.kthread
      1.10 ±  5%      +0.2        1.25 ±  2%  perf-profile.children.cycles-pp.__schedule
      0.85 ±  8%      +0.2        1.01 ±  7%  perf-profile.children.cycles-pp.ret_from_fork
      0.85 ±  8%      +0.2        1.02 ±  7%  perf-profile.children.cycles-pp.ret_from_fork_asm
     63.15            +0.5       63.64        perf-profile.children.cycles-pp.cpuidle_enter
     66.26            +0.5       66.77        perf-profile.children.cycles-pp.cpuidle_idle_call
     66.46            +0.6       67.08        perf-profile.children.cycles-pp.start_secondary
     67.63            +0.7       68.30        perf-profile.children.cycles-pp.common_startup_64
     67.63            +0.7       68.30        perf-profile.children.cycles-pp.cpu_startup_entry
     67.63            +0.7       68.30        perf-profile.children.cycles-pp.do_idle
      1.20 ±  3%      -0.1        1.12 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.09 ± 16%      -0.1        0.03 ± 70%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
      0.25 ±  6%      -0.0        0.21 ± 12%  perf-profile.self.cycles-pp.irqtime_account_irq
      0.02 ±141%      +0.0        0.06 ± 13%  perf-profile.self.cycles-pp.prepend_path
      0.13 ± 10%      +0.1        0.24 ± 11%  perf-profile.self.cycles-pp.timerqueue_del



***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 3.924e+08 ±  3%     +55.1%  6.086e+08 ±  2%  cpuidle..time
   7504886 ± 11%    +184.4%   21340245 ±  6%  cpuidle..usage
  13350305            -3.8%   12848570        vmstat.system.cs
   1849619            +5.1%    1943754        vmstat.system.in
      3.56 ±  5%      +2.6        6.16 ±  7%  mpstat.cpu.all.idle%
      0.69            +0.2        0.90 ±  3%  mpstat.cpu.all.irq%
      0.03 ±  3%      +0.0        0.04 ±  3%  mpstat.cpu.all.soft%
     18666 ±  9%     +41.2%      26352 ±  6%  perf-c2c.DRAM.remote
    197041           -39.6%     118945 ±  5%  perf-c2c.HITM.local
      3178 ± 12%     +37.2%       4361 ± 11%  perf-c2c.HITM.remote
    200219           -38.4%     123307 ±  5%  perf-c2c.HITM.total
   2842579 ± 11%     +60.1%    4550025 ± 12%  meminfo.Active
   2842579 ± 11%     +60.1%    4550025 ± 12%  meminfo.Active(anon)
   5535242 ±  5%     +30.9%    7248257 ±  7%  meminfo.Cached
   3846718 ±  8%     +44.0%    5539484 ±  9%  meminfo.Committed_AS
   9684149 ±  3%     +20.5%   11666616 ±  4%  meminfo.Memused
    136127 ±  3%     +14.2%     155524        meminfo.PageTables
     62144           +22.8%      76336        meminfo.Percpu
   2001586 ± 16%     +85.6%    3714611 ± 14%  meminfo.Shmem
   9759598 ±  3%     +20.0%   11714619 ±  4%  meminfo.max_used_kB
    710625 ± 11%     +59.3%    1131770 ± 11%  proc-vmstat.nr_active_anon
   1383631 ±  5%     +30.6%    1806419 ±  7%  proc-vmstat.nr_file_pages
     34220 ±  3%     +13.9%      38987        proc-vmstat.nr_page_table_pages
    500216 ± 16%     +84.5%     923007 ± 14%  proc-vmstat.nr_shmem
    710625 ± 11%     +59.3%    1131770 ± 11%  proc-vmstat.nr_zone_active_anon
  92308030            +8.7%  1.004e+08        proc-vmstat.numa_hit
  92171407            +8.7%  1.002e+08        proc-vmstat.numa_local
    133616            +2.7%     137265        proc-vmstat.numa_other
  92394313            +8.7%  1.004e+08        proc-vmstat.pgalloc_normal
  91035691            +7.8%   98094626        proc-vmstat.pgfree
    867815           +11.8%     970369        hackbench.throughput
    830278           +11.6%     926834        hackbench.throughput_avg
    867815           +11.8%     970369        hackbench.throughput_best
    760822           +14.2%     869145        hackbench.throughput_worst
     72.87           -10.3%      65.36        hackbench.time.elapsed_time
     72.87           -10.3%      65.36        hackbench.time.elapsed_time.max
 2.493e+08           -17.7%  2.052e+08        hackbench.time.involuntary_context_switches
     12357            -3.9%      11879        hackbench.time.percent_of_cpu_this_job_got
      8029           -14.8%       6842        hackbench.time.system_time
    976.58            -5.5%     923.21        hackbench.time.user_time
  7.54e+08           -14.4%  6.451e+08        hackbench.time.voluntary_context_switches
 5.598e+10            +6.6%  5.965e+10        perf-stat.i.branch-instructions
      0.40            -0.0        0.38        perf-stat.i.branch-miss-rate%
      8.36 ±  2%      +4.6       12.98 ±  3%  perf-stat.i.cache-miss-rate%
  2.11e+09           -33.8%  1.396e+09        perf-stat.i.cache-references
  13687653            -3.4%   13225338        perf-stat.i.context-switches
      1.36            -7.9%       1.25        perf-stat.i.cpi
 3.219e+11            -2.2%  3.147e+11        perf-stat.i.cpu-cycles
      1915 ±  2%      -6.6%       1788 ±  3%  perf-stat.i.cycles-between-cache-misses
 2.371e+11            +6.0%  2.512e+11        perf-stat.i.instructions
      0.74            +8.5%       0.80        perf-stat.i.ipc
      1.15 ± 14%     -28.3%       0.82 ± 23%  perf-stat.i.major-faults
    115.09            -3.2%     111.40        perf-stat.i.metric.K/sec
      0.37            -0.0        0.35        perf-stat.overall.branch-miss-rate%
      8.15 ±  3%      +4.6       12.74 ±  3%  perf-stat.overall.cache-miss-rate%
      1.36            -7.7%       1.25        perf-stat.overall.cpi
      1875 ±  2%      -5.5%       1772 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.74            +8.3%       0.80        perf-stat.overall.ipc
 5.524e+10            +6.4%  5.877e+10        perf-stat.ps.branch-instructions
 2.079e+09           -33.9%  1.375e+09        perf-stat.ps.cache-references
  13486088            -3.4%   13020988        perf-stat.ps.context-switches
 3.175e+11            -2.3%  3.101e+11        perf-stat.ps.cpu-cycles
  2.34e+11            +5.8%  2.475e+11        perf-stat.ps.instructions
      1.09 ± 14%     -28.3%       0.78 ± 21%  perf-stat.ps.major-faults
  1.73e+13            -5.1%  1.642e+13        perf-stat.total.instructions
   3527725           +10.7%    3905361        sched_debug.cfs_rq:/.avg_vruntime.avg
   3975260           +14.1%    4535959 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.max
     98657 ± 17%     +84.9%     182407 ± 18%  sched_debug.cfs_rq:/.avg_vruntime.stddev
     11.83 ±  7%     +17.6%      13.92 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.max
      2.71 ±  5%     +21.8%       3.30 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
     11.75 ±  7%     +17.7%      13.83 ±  6%  sched_debug.cfs_rq:/.h_nr_runnable.max
      2.68 ±  4%     +21.2%       3.25 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
      4556 ±223%    +691.0%      36039 ± 34%  sched_debug.cfs_rq:/.left_deadline.avg
    583131 ±223%    +577.3%    3949548 ±  4%  sched_debug.cfs_rq:/.left_deadline.max
     51341 ±223%    +622.0%     370695 ± 16%  sched_debug.cfs_rq:/.left_deadline.stddev
      4555 ±223%    +691.0%      36035 ± 34%  sched_debug.cfs_rq:/.left_vruntime.avg
    583105 ±223%    +577.3%    3949123 ±  4%  sched_debug.cfs_rq:/.left_vruntime.max
     51338 ±223%    +622.0%     370651 ± 16%  sched_debug.cfs_rq:/.left_vruntime.stddev
   3527725           +10.7%    3905361        sched_debug.cfs_rq:/.min_vruntime.avg
   3975260           +14.1%    4535959 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
     98657 ± 17%     +84.9%     182407 ± 18%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.22 ±  5%     +13.9%       0.25 ±  5%  sched_debug.cfs_rq:/.nr_queued.stddev
      4555 ±223%    +691.0%      36035 ± 34%  sched_debug.cfs_rq:/.right_vruntime.avg
    583105 ±223%    +577.3%    3949123 ±  4%  sched_debug.cfs_rq:/.right_vruntime.max
     51338 ±223%    +622.0%     370651 ± 16%  sched_debug.cfs_rq:/.right_vruntime.stddev
      1336 ±  7%     +50.8%       2014 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
    552.53 ±  8%     +19.6%     660.87 ±  5%  sched_debug.cfs_rq:/.util_est.avg
    384.27 ±  9%     +28.9%     495.43 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
      1328 ± 17%     +42.7%       1896 ± 13%  sched_debug.cpu.curr->pid.stddev
     11.75 ±  8%     +19.1%      14.00 ±  6%  sched_debug.cpu.nr_running.max
      2.71 ±  5%     +22.7%       3.33 ±  4%  sched_debug.cpu.nr_running.stddev
     76578 ±  9%     +33.7%     102390 ±  5%  sched_debug.cpu.nr_switches.stddev
     62.25 ±  7%     +17.9%      73.42 ±  7%  sched_debug.cpu.nr_uninterruptible.max
      8.11 ± 58%     -82.0%       1.46 ± 47%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
     12.04 ±104%     -86.8%       1.58 ± 55%  perf-sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
      0.11 ±123%     -95.3%       0.01 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
      0.06 ±103%     -93.6%       0.00 ±154%  perf-sched.sch_delay.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
      0.10 ±109%     -93.9%       0.01 ±163%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
      1.00 ± 21%     -59.6%       0.40 ± 50%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
     14.54 ± 14%     -79.2%       3.02 ± 51%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
      1.50 ± 84%     -74.1%       0.39 ± 90%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      1.13 ± 68%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.38 ± 97%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      1.10 ± 17%     -68.9%       0.34 ± 49%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
     42.25 ± 18%     -71.7%      11.96 ± 53%  perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
      3.25 ± 17%     -77.5%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     29.17 ± 33%     -62.0%      11.09 ± 85%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     46.25 ± 15%     -68.8%      14.43 ± 52%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      3.72 ± 70%     -81.0%       0.70 ± 67%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      7.95 ± 55%     -69.7%       2.41 ± 65%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      3.66 ±139%     -97.1%       0.11 ± 58%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      3.05 ± 44%     -91.9%       0.25 ± 57%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
     29.96 ±  9%     -83.6%       4.90 ± 48%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
     26.20 ± 59%     -88.9%       2.92 ± 66%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.14 ± 84%     -91.2%       0.01 ±142%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
      0.20 ±149%     -97.5%       0.01 ±102%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
      0.11 ±144%     -96.6%       0.00 ±154%  perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
      0.19 ±118%     -96.7%       0.01 ±163%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
    274.64 ± 95%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.72 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      3135 ±  5%     -48.6%       1611 ± 57%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1320 ± 19%     -78.6%     282.01 ± 74%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    265.55 ± 82%     -77.9%      58.70 ±124%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
      1850 ± 28%     -59.1%     757.74 ± 68%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
    766.85 ± 56%     -68.0%     245.51 ± 51%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1.77 ± 17%     -71.9%       0.50 ± 49%  perf-sched.total_sch_delay.average.ms
      5.15 ± 17%     -69.5%       1.57 ± 48%  perf-sched.total_wait_and_delay.average.ms
      3.38 ± 17%     -68.2%       1.07 ± 48%  perf-sched.total_wait_time.average.ms
      5100 ±  3%     -31.0%       3522 ± 47%  perf-sched.total_wait_time.max.ms
     27.42 ± 49%     -85.2%       4.07 ± 47%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
     35.29 ± 80%     -85.8%       5.00 ± 51%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
     42.28 ± 14%     -79.4%       8.70 ± 51%  perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
      3.12 ± 17%     -66.4%       1.05 ± 48%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
    122.62 ± 18%     -70.4%      36.26 ± 53%  perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
    250.26 ± 65%     -94.2%      14.56 ± 55%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      9.37 ± 17%     -78.2%       2.05 ± 48%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     58.34 ± 33%     -62.0%      22.18 ± 85%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
    134.44 ± 15%     -69.3%      41.24 ± 52%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     86.94 ±  6%     -83.1%      14.68 ± 48%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
     86.57 ± 39%     -86.0%      12.14 ± 59%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    647.92 ± 48%     -97.9%      13.86 ± 45%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      6386 ±  6%     -46.8%       3397 ± 57%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      3868 ± 27%     -60.4%       1531 ± 67%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
      1647 ± 55%     -67.7%     531.51 ± 50%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      5014 ±  5%     -32.5%       3385 ± 47%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     19.31 ± 47%     -86.5%       2.61 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
     23.25 ± 70%     -85.3%       3.42 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
     18.33 ± 15%     -42.0%      10.64 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.11 ±123%     -95.3%       0.01 ±102%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
      0.06 ±103%     -93.6%       0.00 ±154%  perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
      0.10 ±109%     -93.9%       0.01 ±163%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
      1.70 ± 21%     -52.6%       0.81 ± 48%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
     27.74 ± 15%     -79.5%       5.68 ± 51%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
      2.17 ± 75%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.42 ± 97%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      2.02 ± 17%     -65.1%       0.70 ± 48%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
     80.37 ± 18%     -69.8%      24.31 ± 52%  perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
    210.13 ± 68%     -95.1%      10.21 ± 55%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.12 ± 17%     -78.5%       1.32 ± 48%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     29.17 ± 33%     -62.0%      11.09 ± 85%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     88.19 ± 16%     -69.6%      26.81 ± 52%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     13.77 ± 45%     -65.7%       4.72 ± 53%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
    104.64 ± 42%     -76.4%      24.74 ±135%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      5.16 ± 29%     -92.5%       0.39 ± 48%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
     56.98 ±  5%     -82.9%       9.77 ± 48%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
     60.36 ± 32%     -84.7%       9.22 ± 57%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    619.88 ± 43%     -98.0%      12.52 ± 45%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.14 ± 84%     -91.2%       0.01 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
    740.14 ± 35%     -68.5%     233.31 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
      0.20 ±149%     -97.5%       0.01 ±102%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
      0.11 ±144%     -96.6%       0.00 ±154%  perf-sched.wait_time.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
      0.19 ±118%     -96.7%       0.01 ±163%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
    327.64 ± 71%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.72 ±151%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      3299 ±  6%     -40.7%       1957 ± 51%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    436.75 ± 39%     -76.9%     100.85 ± 98%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
      2112 ± 19%     -62.3%     796.34 ± 63%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
    947.83 ± 46%     -58.8%     390.83 ± 53%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      5014 ±  5%     -32.5%       3385 ± 47%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm



***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_2/aim9/300s

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     11036           +85.7%      20499        meminfo.PageTables
    125.17 ±  8%     +18.4%     148.17 ±  7%  perf-c2c.HITM.local
     30464           +18.7%      36160        sched_debug.cpu.nr_switches.avg
      9166           +19.8%      10985        vmstat.system.cs
      6623 ± 17%     +60.8%      10652 ±  5%  numa-meminfo.node0.PageTables
      4414 ± 26%    +123.2%       9853 ±  6%  numa-meminfo.node1.PageTables
      1653 ± 17%     +60.1%       2647 ±  5%  numa-vmstat.node0.nr_page_table_pages
      1097 ± 26%    +123.9%       2457 ±  6%  numa-vmstat.node1.nr_page_table_pages
    319.08            -2.2%     312.04        aim9.shell_rtns_2.ops_per_sec
  27170926            -2.2%   26586121        aim9.time.minor_page_faults
   1051038            -2.2%    1027732        aim9.time.voluntary_context_switches
      2736           +86.4%       5101        proc-vmstat.nr_page_table_pages
     28014            +1.3%      28378        proc-vmstat.nr_slab_unreclaimable
  19332129            -1.5%   19048363        proc-vmstat.numa_hit
  19283853            -1.5%   18996609        proc-vmstat.numa_local
  19892794            -1.5%   19598065        proc-vmstat.pgalloc_normal
  28044189            -2.1%   27457289        proc-vmstat.pgfault
  19843766            -1.5%   19543091        proc-vmstat.pgfree
    419715            -5.7%     395688 ±  8%  proc-vmstat.pgreuse
      2682            -2.0%       2628        proc-vmstat.unevictable_pgs_culled
      0.07 ±  6%     -30.5%       0.05 ± 22%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.03 ±  6%     +36.0%       0.04        perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.07 ± 33%     -57.5%       0.03 ± 53%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.02 ± 74%    +112.0%       0.05 ± 36%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
      0.02           +24.1%       0.02 ±  2%  perf-sched.total_sch_delay.average.ms
     27.52           -14.0%      23.67        perf-sched.total_wait_and_delay.average.ms
     23179           +18.3%      27421        perf-sched.total_wait_and_delay.count.ms
     27.50           -14.0%      23.65        perf-sched.total_wait_time.average.ms
    117.03 ±  3%     -72.4%      32.27 ±  2%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1655 ±  2%    +282.0%       6324        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.96 ± 29%     +51.6%       1.45 ± 22%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
    117.00 ±  3%     -72.5%      32.23 ±  2%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      5.93            +0.1        6.00        perf-stat.i.branch-miss-rate%
      9189           +19.8%      11011        perf-stat.i.context-switches
      1.96            +1.6%       1.99        perf-stat.i.cpi
     71.21           +60.6%     114.39 ±  4%  perf-stat.i.cpu-migrations
      0.53            -1.5%       0.52        perf-stat.i.ipc
      3.79            -2.1%       3.71        perf-stat.i.metric.K/sec
     90998            -2.1%      89084        perf-stat.i.minor-faults
     90998            -2.1%      89084        perf-stat.i.page-faults
      5.99            +0.1        6.06        perf-stat.overall.branch-miss-rate%
      1.79            +1.4%       1.82        perf-stat.overall.cpi
      0.56            -1.3%       0.55        perf-stat.overall.ipc
      9158           +19.8%      10974        perf-stat.ps.context-switches
     70.99           +60.6%     114.02 ±  4%  perf-stat.ps.cpu-migrations
     90694            -2.1%      88787        perf-stat.ps.minor-faults
     90695            -2.1%      88787        perf-stat.ps.page-faults
 8.155e+11            -1.1%  8.065e+11        perf-stat.total.instructions
      8.87            -0.3        8.55        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      8.86            -0.3        8.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.53 ±  2%      -0.1        2.43 ±  2%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      2.54            -0.1        2.44 ±  2%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2.49            -0.1        2.40 ±  2%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
      0.98 ±  5%      -0.1        0.90 ±  5%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.70 ±  3%      -0.1        0.62 ±  6%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
      0.18 ±141%      +0.5        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.18 ±141%      +0.5        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      0.00            +0.6        0.59 ±  7%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
     62.48            +0.7       63.14        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     49.10            +0.7       49.78        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     67.62            +0.8       68.43        perf-profile.calltrace.cycles-pp.common_startup_64
     20.14            -0.7       19.40        perf-profile.children.cycles-pp.do_syscall_64
     20.18            -0.7       19.44        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      3.33 ±  2%      -0.2        3.16 ±  2%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      3.22 ±  2%      -0.2        3.06        perf-profile.children.cycles-pp.do_mmap
      3.51 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.do_exit
      3.52 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.__x64_sys_exit_group
      3.52 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.do_group_exit
      3.67            -0.1        3.54        perf-profile.children.cycles-pp.x64_sys_call
      2.21            -0.1        2.09 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
      2.07 ±  2%      -0.1        1.94 ±  2%  perf-profile.children.cycles-pp.path_openat
      2.09 ±  2%      -0.1        1.97 ±  2%  perf-profile.children.cycles-pp.do_filp_open
      2.19            -0.1        2.08 ±  3%  perf-profile.children.cycles-pp.do_sys_openat2
      1.50 ±  4%      -0.1        1.39 ±  3%  perf-profile.children.cycles-pp.copy_process
      2.56            -0.1        2.46 ±  2%  perf-profile.children.cycles-pp.exit_mm
      2.55            -0.1        2.44 ±  2%  perf-profile.children.cycles-pp.__mmput
      2.51 ±  2%      -0.1        2.41 ±  2%  perf-profile.children.cycles-pp.exit_mmap
      0.70 ±  3%      -0.1        0.62 ±  6%  perf-profile.children.cycles-pp.dup_mm
      0.94 ±  4%      -0.1        0.89 ±  2%  perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      0.57 ±  3%      -0.0        0.52 ±  4%  perf-profile.children.cycles-pp.alloc_pages_noprof
      0.20 ± 12%      -0.0        0.15 ± 10%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.18 ±  4%      -0.0        0.14 ± 15%  perf-profile.children.cycles-pp.xas_find
      0.10 ± 12%      -0.0        0.07 ± 24%  perf-profile.children.cycles-pp.up_write
      0.09 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.tick_check_broadcast_expired
      0.08 ± 12%      +0.0        0.10 ±  8%  perf-profile.children.cycles-pp.hrtimer_try_to_cancel
      0.10 ± 13%      +0.0        0.13 ±  5%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
      0.20 ±  8%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
      0.21 ±  9%      +0.0        0.25 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
      0.03 ±101%      +0.0        0.07 ± 16%  perf-profile.children.cycles-pp.run_ksoftirqd
      0.04 ± 71%      +0.1        0.09 ± 15%  perf-profile.children.cycles-pp.kick_pool
      0.05 ± 47%      +0.1        0.11 ± 16%  perf-profile.children.cycles-pp.__queue_work
      0.28 ±  5%      +0.1        0.34 ±  7%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.50            +0.1        0.56 ±  2%  perf-profile.children.cycles-pp.timerqueue_del
      0.04 ± 71%      +0.1        0.11 ± 17%  perf-profile.children.cycles-pp.queue_work_on
      0.51 ±  4%      +0.1        0.58 ±  2%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.32 ±  3%      +0.1        0.40 ±  4%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.53 ±  5%      +0.1        0.61 ±  3%  perf-profile.children.cycles-pp.enqueue_task
      0.49 ±  4%      +0.1        0.57 ±  6%  perf-profile.children.cycles-pp.schedule
      0.28 ±  6%      +0.1        0.38        perf-profile.children.cycles-pp.sched_ttwu_pending
      0.32 ±  5%      +0.1        0.43 ±  2%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.35 ±  8%      +0.1        0.47 ±  2%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.17 ± 10%      +0.2        0.34 ± 12%  perf-profile.children.cycles-pp.worker_thread
      0.88 ±  3%      +0.2        1.06 ±  4%  perf-profile.children.cycles-pp.ret_from_fork
      0.88 ±  3%      +0.2        1.06 ±  4%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.39 ±  6%      +0.2        0.59 ±  7%  perf-profile.children.cycles-pp.kthread
     66.24            +0.6       66.85        perf-profile.children.cycles-pp.cpuidle_idle_call
     63.09            +0.6       63.73        perf-profile.children.cycles-pp.cpuidle_enter
     62.97            +0.6       63.61        perf-profile.children.cycles-pp.cpuidle_enter_state
     67.61            +0.8       68.43        perf-profile.children.cycles-pp.do_idle
     67.62            +0.8       68.43        perf-profile.children.cycles-pp.common_startup_64
     67.62            +0.8       68.43        perf-profile.children.cycles-pp.cpu_startup_entry
      0.37 ± 11%      -0.1        0.31 ±  3%  perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      0.10 ± 13%      -0.0        0.06 ± 50%  perf-profile.self.cycles-pp.up_write
      0.15 ±  4%      +0.1        0.22 ±  8%  perf-profile.self.cycles-pp.timerqueue_del
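
A side note on the profile deltas above: the cycles gained by
queue_work_on()/__queue_work()/kick_pool() and worker_thread() are what
deferring work to a workqueue looks like. A minimal, hypothetical sketch
of that generic pattern (illustrative names, not the patch's actual
code):

#include <linux/workqueue.h>

/* Deferred, preemptible work: it runs later in a kworker, so the
 * profile cost shows up under worker_thread()/kthread() rather than
 * in the submitting task.
 */
static void example_scan_fn(struct work_struct *work)
{
	/* compaction-style housekeeping would go here */
}

static DECLARE_WORK(example_scan_work, example_scan_fn);

static void example_submit(void)
{
	/* system_unbound_wq: any CPU may run it; the submitter only enqueues */
	queue_work(system_unbound_wq, &example_scan_work);
}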



***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/exec_test/aim9/300s

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     12120           +76.7%      21422        meminfo.PageTables
      8543           +26.9%      10840        vmstat.system.cs
      6148 ± 11%     +89.9%      11678 ±  5%  numa-meminfo.node0.PageTables
      5909 ± 11%     +64.0%       9689 ±  7%  numa-meminfo.node1.PageTables
      1532 ± 10%     +90.5%       2919 ±  5%  numa-vmstat.node0.nr_page_table_pages
      1468 ± 11%     +65.2%       2426 ±  7%  numa-vmstat.node1.nr_page_table_pages
      2991           +78.0%       5323        proc-vmstat.nr_page_table_pages
  32726750            -2.4%   31952115        proc-vmstat.pgfault
      1228            -2.6%       1197        aim9.exec_test.ops_per_sec
     11018 ±  2%     +10.5%      12178 ±  2%  aim9.time.involuntary_context_switches
  31835059            -2.4%   31062527        aim9.time.minor_page_faults
    736468            -2.9%     715310        aim9.time.voluntary_context_switches
      0.28 ±  7%     +11.3%       0.31 ±  6%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.28 ±  7%     +11.3%       0.31 ±  6%  sched_debug.cfs_rq:/.nr_queued.stddev
    356683 ± 16%     +27.0%     453000 ±  9%  sched_debug.cpu.avg_idle.min
     27620 ±  7%     +29.5%      35775        sched_debug.cpu.nr_switches.avg
     84830 ± 14%     +16.3%      98648 ±  4%  sched_debug.cpu.nr_switches.max
      4563 ± 26%     +46.2%       6671 ± 26%  sched_debug.cpu.nr_switches.min
      0.03 ±  4%     -67.3%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exec_release.exec_mm_release.exec_mmap
      0.03           +11.2%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.05 ± 28%     +61.3%       0.09 ± 21%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.10 ± 18%     +18.8%       0.12        perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.02 ±  3%     +18.3%       0.02 ±  2%  perf-sched.total_sch_delay.average.ms
     28.80           -19.8%      23.10 ±  3%  perf-sched.total_wait_and_delay.average.ms
     22332           +24.4%      27778        perf-sched.total_wait_and_delay.count.ms
     28.78           -19.8%      23.07 ±  3%  perf-sched.total_wait_time.average.ms
     17.39 ± 10%     -15.6%      14.67 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     41.02 ±  4%     -54.6%      18.64 ±  6%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      4795 ±  2%    +122.5%      10668        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     17.35 ± 10%     -15.7%      14.63 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.00 ±141%    +400.0%       0.00 ± 44%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     40.99 ±  4%     -54.6%      18.61 ±  6%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.00 ±149%    +542.9%       0.03 ± 41%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
 5.617e+08            -1.6%  5.529e+08        perf-stat.i.branch-instructions
      5.76            +0.1        5.84        perf-stat.i.branch-miss-rate%
      8562           +27.0%      10878        perf-stat.i.context-switches
      1.87            +2.6%       1.92        perf-stat.i.cpi
     78.02 ±  3%     +11.8%      87.23 ±  2%  perf-stat.i.cpu-migrations
 2.792e+09            -1.6%  2.748e+09        perf-stat.i.instructions
      0.55            -2.5%       0.54        perf-stat.i.ipc
      4.42            -2.4%       4.31        perf-stat.i.metric.K/sec
    106019            -2.4%     103509        perf-stat.i.minor-faults
    106019            -2.4%     103509        perf-stat.i.page-faults
      5.83            +0.1        5.91        perf-stat.overall.branch-miss-rate%
      1.72            +2.3%       1.76        perf-stat.overall.cpi
      0.58            -2.3%       0.57        perf-stat.overall.ipc
 5.599e+08            -1.6%  5.511e+08        perf-stat.ps.branch-instructions
      8534           +27.0%      10841        perf-stat.ps.context-switches
     77.77 ±  3%     +11.8%      86.96 ±  2%  perf-stat.ps.cpu-migrations
 2.783e+09            -1.6%  2.739e+09        perf-stat.ps.instructions
    105666            -2.4%     103164        perf-stat.ps.minor-faults
    105666            -2.4%     103164        perf-stat.ps.page-faults
 8.386e+11            -1.6%  8.253e+11        perf-stat.total.instructions
      7.79            -0.4        7.41 ±  2%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      7.75            -0.3        7.47        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      7.73            -0.3        7.46        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
      2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.73 ±  2%      -0.2        2.57 ±  2%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
      2.61            -0.1        2.47 ±  3%  perf-profile.calltrace.cycles-pp.execve.exec_test
      2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
      2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve.exec_test
      1.92 ±  3%      -0.1        1.79 ±  2%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      1.92 ±  3%      -0.1        1.80 ±  2%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      4.68            -0.1        4.57        perf-profile.calltrace.cycles-pp._Fork
      1.88 ±  2%      -0.1        1.77 ±  2%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
      2.76            -0.1        2.66 ±  2%  perf-profile.calltrace.cycles-pp.exec_test
      3.24            -0.1        3.16        perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.84 ±  4%      -0.1        0.77 ±  5%  perf-profile.calltrace.cycles-pp.wait4
      0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      0.46 ± 45%      +0.3        0.78 ±  5%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.17 ±141%      +0.4        0.53 ±  4%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      0.18 ±141%      +0.4        0.54 ±  2%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     66.08            +0.8       66.85        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
     66.08            +0.8       66.85        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
     66.02            +0.8       66.80        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
     67.06            +0.9       68.00        perf-profile.calltrace.cycles-pp.common_startup_64
     21.19            -0.9       20.30        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     21.15            -0.9       20.27        perf-profile.children.cycles-pp.do_syscall_64
      7.92            -0.4        7.53 ±  2%  perf-profile.children.cycles-pp.execve
      7.94            -0.4        7.56 ±  2%  perf-profile.children.cycles-pp.__x64_sys_execve
      7.84            -0.4        7.46 ±  2%  perf-profile.children.cycles-pp.do_execveat_common
      5.51            -0.3        5.25 ±  2%  perf-profile.children.cycles-pp.load_elf_binary
      3.68            -0.2        3.49 ±  2%  perf-profile.children.cycles-pp.__mmput
      2.81 ±  2%      -0.2        2.63        perf-profile.children.cycles-pp.__x64_sys_exit_group
      2.80 ±  2%      -0.2        2.62 ±  2%  perf-profile.children.cycles-pp.do_exit
      2.81 ±  2%      -0.2        2.62 ±  2%  perf-profile.children.cycles-pp.do_group_exit
      2.93 ±  2%      -0.2        2.76 ±  2%  perf-profile.children.cycles-pp.x64_sys_call
      3.60            -0.2        3.44 ±  2%  perf-profile.children.cycles-pp.exit_mmap
      5.66            -0.1        5.51        perf-profile.children.cycles-pp.__handle_mm_fault
      1.94 ±  3%      -0.1        1.82 ±  2%  perf-profile.children.cycles-pp.exit_mm
      2.64            -0.1        2.52 ±  3%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      2.55 ±  2%      -0.1        2.43 ±  3%  perf-profile.children.cycles-pp.do_mmap
      2.19 ±  2%      -0.1        2.08 ±  3%  perf-profile.children.cycles-pp.__mmap_region
      2.27            -0.1        2.16 ±  2%  perf-profile.children.cycles-pp.begin_new_exec
      2.79            -0.1        2.69 ±  2%  perf-profile.children.cycles-pp.exec_test
      0.83 ±  4%      -0.1        0.76 ±  6%  perf-profile.children.cycles-pp.__mmap_prepare
      0.86 ±  4%      -0.1        0.78 ±  5%  perf-profile.children.cycles-pp.wait4
      0.52 ±  5%      -0.1        0.45 ±  7%  perf-profile.children.cycles-pp.kernel_wait4
      0.50 ±  5%      -0.1        0.43 ±  6%  perf-profile.children.cycles-pp.do_wait
      0.88 ±  3%      -0.1        0.81 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.51 ±  2%      -0.1        0.46 ±  6%  perf-profile.children.cycles-pp.setup_arg_pages
      0.39 ±  2%      -0.0        0.34 ±  8%  perf-profile.children.cycles-pp.unlink_anon_vmas
      0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
      0.37 ±  5%      -0.0        0.33 ±  3%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.21 ±  6%      -0.0        0.17 ±  5%  perf-profile.children.cycles-pp.user_path_at
      0.21 ±  3%      -0.0        0.18 ± 10%  perf-profile.children.cycles-pp.__percpu_counter_sum
      0.18 ±  7%      -0.0        0.15 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
      0.33 ±  5%      -0.0        0.30        perf-profile.children.cycles-pp.relocate_vma_down
      0.04 ± 45%      +0.0        0.08 ± 12%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.14 ±  7%      +0.0        0.18 ± 10%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
      0.19 ±  9%      +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.prepare_task_switch
      0.02 ±142%      +0.0        0.06 ± 23%  perf-profile.children.cycles-pp.select_task_rq
      0.03 ±100%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.task_contending
      0.45 ±  7%      +0.1        0.51 ±  3%  perf-profile.children.cycles-pp.__pick_next_task
      0.14 ± 22%      +0.1        0.20 ± 10%  perf-profile.children.cycles-pp.kick_pool
      0.36 ±  4%      +0.1        0.42 ±  4%  perf-profile.children.cycles-pp.dequeue_entities
      0.36 ±  4%      +0.1        0.44 ±  5%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.15 ± 20%      +0.1        0.23 ± 10%  perf-profile.children.cycles-pp.__queue_work
      0.49 ±  5%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.schedule_idle
      0.14 ± 22%      +0.1        0.23 ±  9%  perf-profile.children.cycles-pp.queue_work_on
      0.36 ±  3%      +0.1        0.46 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.47 ±  7%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.timerqueue_del
      0.30 ± 13%      +0.1        0.42 ±  7%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.23 ± 15%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.18 ± 14%      +0.1        0.32 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.19 ± 13%      +0.1        0.34 ±  4%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.61 ±  3%      +0.2        0.76 ±  5%  perf-profile.children.cycles-pp.schedule
      1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
      1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
      0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.children.cycles-pp.kthread
      1.22 ±  3%      +0.2        1.45 ±  5%  perf-profile.children.cycles-pp.__schedule
      0.54 ±  8%      +0.2        0.78 ±  5%  perf-profile.children.cycles-pp.worker_thread
     66.08            +0.8       66.85        perf-profile.children.cycles-pp.start_secondary
     67.06            +0.9       68.00        perf-profile.children.cycles-pp.common_startup_64
     67.06            +0.9       68.00        perf-profile.children.cycles-pp.cpu_startup_entry
     67.06            +0.9       68.00        perf-profile.children.cycles-pp.do_idle
      0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
      0.04 ± 45%      +0.0        0.08 ± 10%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.14 ± 10%      +0.1        0.23 ± 11%  perf-profile.self.cycles-pp.timerqueue_del



***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1BRD_48G/xfs/x86_64-rhel-9.4/600/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/sync_disk_rw/aim7

commit: 
  baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
  f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")

baffb122772da116 f3de761c52148abfb1b4512914f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    344180 ±  6%     -13.0%     299325 ±  9%  meminfo.Mapped
      9594 ±123%    +191.8%      27995 ± 54%  numa-meminfo.node1.PageTables
      2399 ±123%    +191.3%       6989 ± 54%  numa-vmstat.node1.nr_page_table_pages
   1860734            -5.2%    1763194        vmstat.io.bo
    831686            +1.3%     842493        vmstat.system.cs
     50372            -5.5%      47609        aim7.jobs-per-min
   1435644           +11.5%    1600707        aim7.time.involuntary_context_switches
      7242            +1.2%       7332        aim7.time.percent_of_cpu_this_job_got
      5159            +7.1%       5526        aim7.time.system_time
  33195986            +6.9%   35497140        aim7.time.voluntary_context_switches
     40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
     40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
    605972 ±  2%     +14.5%     693922 ±  7%  sched_debug.cpu.avg_idle.max
     30974 ±  8%     -20.9%      24498 ± 15%  sched_debug.cpu.avg_idle.min
    118758 ±  5%     +22.0%     144899 ±  6%  sched_debug.cpu.avg_idle.stddev
    856253            +1.5%     869009        perf-stat.i.context-switches
      3.06            +2.3%       3.13        perf-stat.i.cpi
    164824            +7.7%     177546        perf-stat.i.cpu-migrations
      7.93            +2.5%       8.13        perf-stat.i.metric.K/sec
      3.41            +1.8%       3.47        perf-stat.overall.cpi
      1355            +5.8%       1434 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.29            -1.8%       0.29        perf-stat.overall.ipc
    845412            +1.6%     858925        perf-stat.ps.context-switches
    162728            +7.8%     175475        perf-stat.ps.cpu-migrations
 4.391e+12            +5.0%  4.609e+12        perf-stat.total.instructions
    444798            +6.0%     471383 ±  5%  proc-vmstat.nr_active_anon
     28190            -2.8%      27402        proc-vmstat.nr_dirty
   1231373            +2.3%    1259666 ±  2%  proc-vmstat.nr_file_pages
     63763            +0.9%      64355        proc-vmstat.nr_inactive_file
     86758 ±  6%     -12.9%      75546 ±  8%  proc-vmstat.nr_mapped
     10162 ±  2%      +7.2%      10895 ±  3%  proc-vmstat.nr_page_table_pages
    265229           +10.4%     292795 ±  9%  proc-vmstat.nr_shmem
    444798            +6.0%     471383 ±  5%  proc-vmstat.nr_zone_active_anon
     63763            +0.9%      64355        proc-vmstat.nr_zone_inactive_file
     28191            -2.8%      27400        proc-vmstat.nr_zone_write_pending
     24349           +11.6%      27171 ±  8%  proc-vmstat.pgreuse
      0.02 ±  3%     +11.3%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
      0.29 ± 17%     -30.7%       0.20 ± 14%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
      0.03 ± 10%     +33.5%       0.04 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      0.21 ± 32%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.16 ± 16%     +51.9%       0.24 ± 11%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.22 ± 19%     +44.1%       0.32 ± 25%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      0.30 ± 28%     -38.7%       0.18 ± 28%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.11 ±  5%     +12.8%       0.12 ±  4%  perf-sched.sch_delay.avg.ms.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
      0.08 ±  4%     +15.8%       0.09 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync
      0.02 ±  3%     +13.7%       0.02 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
      0.01 ±223%   +1289.5%       0.09 ±111%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
      2.49 ± 40%     -43.4%       1.41 ± 50%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
      0.76 ±  7%     +92.8%       1.46 ± 40%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      0.65 ± 41%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.40 ± 64%   +2968.7%      43.04 ± 13%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.63 ± 19%     +89.8%       1.19 ± 51%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     28.67 ±  3%     -11.2%      25.45 ±  5%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
      0.80 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
      5.76 ±107%    +152.4%      14.53 ± 10%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      8441          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
     18.67 ± 71%    +108.0%      38.83 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_trans_commit
    116.17 ±105%   +1677.8%       2065 ±  5%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    424.79 ±151%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
     28.51 ±  3%     -11.2%      25.31 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
      0.38 ± 59%     -79.0%       0.08 ±107%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
      0.77 ±  9%     -56.5%       0.34 ±  3%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
      1.80 ±138%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.13 ± 93%    +133.2%      14.29 ± 10%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1.00 ± 16%     -48.1%       0.52 ± 20%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.92 ± 16%     -62.0%       0.35 ± 14%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
      0.26 ±  2%     -59.8%       0.11        perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
      0.24 ±223%   +2180.2%       5.56 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
      1.25 ± 77%     -79.8%       0.25 ±107%  perf-sched.wait_time.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
      1.78 ± 51%    +958.6%      18.82 ±117%  perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map
     58.48 ±  6%     -10.7%      52.22 ±  2%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.__flush_workqueue.xlog_cil_push_now.isra
     10.87 ±192%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.63 ± 27%     -63.9%       3.12 ± 29%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
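
Conversely, the __cond_resched.task_work_run.exit_to_user_mode_loop.*
rows going to -100.0% across these reports match task work no longer
running on the return-to-user path. A hypothetical sketch of that older
pattern, using the generic task_work API (illustrative names only):

#include <linux/task_work.h>

static struct callback_head example_cb;

/* Runs from task_work_run() just before the task resumes userspace,
 * which is why it shows up under exit_to_user_mode_loop() and adds
 * directly to syscall-exit latency.
 */
static void example_twork_fn(struct callback_head *head)
{
}

static void example_arm(struct task_struct *t)
{
	init_task_work(&example_cb, example_twork_fn);
	task_work_add(t, &example_cb, TWA_RESUME); /* -ESRCH if t is exiting */
}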





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
  2025-06-25  8:01   ` kernel test robot
@ 2025-06-25 13:57     ` Mathieu Desnoyers
  2025-06-25 15:06       ` Gabriele Monaco
  2025-07-02 13:58       ` Gabriele Monaco
  0 siblings, 2 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2025-06-25 13:57 UTC (permalink / raw)
  To: kernel test robot, Gabriele Monaco
  Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
	Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
	Paul E. McKenney, Ingo Molnar

On 2025-06-25 04:01, kernel test robot wrote:
> 
> Hello,
> 
> kernel test robot noticed a 10.1% regression of hackbench.throughput on:

Hi Gabriele,

This is a significant regression. Can you investigate before it gets
merged?

Thanks,

Mathieu

> 
> 
> commit: f3de761c52148abfb1b4512914f64c7e1c737fc8 ("[RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct")
> url: https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250613-171504
> patch link: https://lore.kernel.org/all/20250613091229.21500-3-gmonaco@redhat.com/
> patch subject: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
> 
> testcase: hackbench
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	nr_threads: 100%
> 	iterations: 4
> 	mode: process
> 	ipc: pipe
> 	cpufreq_governor: performance
> 
> 
> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput  2.9% regression                                               |
> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | ipc=socket                                                                                     |
> |                  | iterations=4                                                                                   |
> |                  | mode=process                                                                                   |
> |                  | nr_threads=50%                                                                                 |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.shell_rtns_3.ops_per_sec  1.7% regression                                           |
> | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | test=shell_rtns_3                                                                              |
> |                  | testtime=300s                                                                                  |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput  6.2% regression                                               |
> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | ipc=pipe                                                                                       |
> |                  | iterations=4                                                                                   |
> |                  | mode=process                                                                                   |
> |                  | nr_threads=800%                                                                                |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.shell_rtns_1.ops_per_sec  2.1% regression                                           |
> | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | test=shell_rtns_1                                                                              |
> |                  | testtime=300s                                                                                  |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput 11.8% improvement                                              |
> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | ipc=pipe                                                                                       |
> |                  | iterations=4                                                                                   |
> |                  | mode=process                                                                                   |
> |                  | nr_threads=50%                                                                                 |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.shell_rtns_2.ops_per_sec  2.2% regression                                           |
> | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | test=shell_rtns_2                                                                              |
> |                  | testtime=300s                                                                                  |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.exec_test.ops_per_sec  2.6% regression                                              |
> | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | test=exec_test                                                                                 |
> |                  | testtime=300s                                                                                  |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim7: aim7.jobs-per-min  5.5% regression                                                       |
> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> | test parameters  | cpufreq_governor=performance                                                                   |
> |                  | disk=1BRD_48G                                                                                  |
> |                  | fs=xfs                                                                                         |
> |                  | load=600                                                                                       |
> |                  | test=sync_disk_rw                                                                              |
> +------------------+------------------------------------------------------------------------------------------------+
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202506251555.de6720f7-lkp@intel.com
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250625/202506251555.de6720f7-lkp@intel.com
> 
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
>    gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>       55140 ± 80%    +229.2%     181547 ± 20%  numa-meminfo.node1.Mapped
>       13048 ± 80%    +248.2%      45431 ± 20%  numa-vmstat.node1.nr_mapped
>      679.17 ± 22%     -25.3%     507.33 ± 10%  sched_debug.cfs_rq:/.util_est.max
>   4.287e+08 ±  3%     +20.3%  5.158e+08        cpuidle..time
>     2953716 ± 13%    +228.9%    9716185 ±  2%  cpuidle..usage
>       91072 ± 12%    +134.8%     213855 ±  7%  meminfo.Mapped
>     8848637           +10.4%    9769875 ±  5%  meminfo.Memused
>        0.67 ±  4%      +0.1        0.78 ±  2%  mpstat.cpu.all.irq%
>        0.03 ±  2%      +0.0        0.03 ±  4%  mpstat.cpu.all.soft%
>        4.17 ±  8%    +596.0%      29.00 ± 31%  mpstat.max_utilization.seconds
>        2950           -12.3%       2587        vmstat.procs.r
>     4557607 ±  2%     +35.9%    6192548        vmstat.system.cs
>      397195 ±  5%     +73.4%     688726        vmstat.system.in
>     1490153           -10.1%    1339340        hackbench.throughput
>     1424170            -8.7%    1299590        hackbench.throughput_avg
>     1490153           -10.1%    1339340        hackbench.throughput_best
>     1353181 ±  2%     -10.1%    1216523        hackbench.throughput_worst
>    53158738 ±  3%     +34.0%   71240022        hackbench.time.involuntary_context_switches
>       12177            -2.4%      11891        hackbench.time.percent_of_cpu_this_job_got
>        4482            +7.6%       4821        hackbench.time.system_time
>      798.92            +2.0%     815.24        hackbench.time.user_time
>    1.54e+08 ±  3%     +46.6%  2.257e+08        hackbench.time.voluntary_context_switches
>      210335            +3.3%     217333        proc-vmstat.nr_anon_pages
>       23353 ± 14%    +136.2%      55152 ±  7%  proc-vmstat.nr_mapped
>       61825 ±  3%      +6.6%      65928 ±  2%  proc-vmstat.nr_page_table_pages
>       30859            +4.4%      32213        proc-vmstat.nr_slab_reclaimable
>        1294 ±177%   +1657.1%      22743 ± 66%  proc-vmstat.numa_hint_faults
>        1153 ±198%   +1597.0%      19566 ± 79%  proc-vmstat.numa_hint_faults_local
>   1.242e+08            -3.2%  1.202e+08        proc-vmstat.numa_hit
>   1.241e+08            -3.2%  1.201e+08        proc-vmstat.numa_local
>        2195 ±110%   +2337.0%      53508 ± 55%  proc-vmstat.numa_pte_updates
>   1.243e+08            -3.2%  1.203e+08        proc-vmstat.pgalloc_normal
>      875909 ±  2%      +8.6%     951378 ±  2%  proc-vmstat.pgfault
>   1.231e+08            -3.5%  1.188e+08        proc-vmstat.pgfree
>   6.903e+10            -5.6%  6.514e+10        perf-stat.i.branch-instructions
>        0.21            +0.0        0.26        perf-stat.i.branch-miss-rate%
>    89225177 ±  2%     +38.3%  1.234e+08        perf-stat.i.branch-misses
>       25.64 ±  2%      -5.7       19.95 ±  2%  perf-stat.i.cache-miss-rate%
>   9.322e+08 ±  2%     +22.8%  1.145e+09        perf-stat.i.cache-references
>     4553621 ±  2%     +39.8%    6363761        perf-stat.i.context-switches
>        1.12            +4.5%       1.17        perf-stat.i.cpi
>      186890 ±  2%    +143.9%     455784        perf-stat.i.cpu-migrations
>   2.787e+11            -4.9%  2.649e+11        perf-stat.i.instructions
>        0.91            -4.4%       0.87        perf-stat.i.ipc
>       36.79 ±  2%     +44.9%      53.30        perf-stat.i.metric.K/sec
>        0.13 ±  2%      +0.1        0.19        perf-stat.overall.branch-miss-rate%
>       24.44 ±  2%      -4.7       19.74 ±  2%  perf-stat.overall.cache-miss-rate%
>        1.12            +4.6%       1.17        perf-stat.overall.cpi
>        0.89            -4.4%       0.85        perf-stat.overall.ipc
>   6.755e+10            -5.4%  6.392e+10        perf-stat.ps.branch-instructions
>    87121352 ±  2%     +38.5%  1.206e+08        perf-stat.ps.branch-misses
>   9.098e+08 ±  2%     +23.1%   1.12e+09        perf-stat.ps.cache-references
>     4443812 ±  2%     +39.9%    6218298        perf-stat.ps.context-switches
>      181595 ±  2%    +144.5%     443985        perf-stat.ps.cpu-migrations
>   2.727e+11            -4.7%  2.599e+11        perf-stat.ps.instructions
>    1.21e+13            +4.3%  1.262e+13        perf-stat.total.instructions
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.ctx_resched.event_function.remote_function.generic_exec_single
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.ctx_resched.event_function.remote_function.generic_exec_single.smp_call_function_single
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.event_function.remote_function.generic_exec_single.smp_call_function_single.event_function_call
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.remote_function.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record.run_builtin
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.__x64_sys_ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp._perf_ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.ctx_resched
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.event_function
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.generic_exec_single
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.perf_event_for_each_child
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.perf_ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.remote_function
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.__evlist__enable
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.perf_c2c__record
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.perf_evsel__enable_cpu
>       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.children.cycles-pp.perf_evsel__run_ioctl
>       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
>       23.74 ±185%     -98.6%       0.34 ±114%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
>       12.77 ± 80%     -83.9%       2.05 ±138%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>        5.93 ± 69%     -90.5%       0.56 ±105%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        6.70 ±152%     -94.5%       0.37 ±145%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>        0.82 ± 85%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        8.59 ±202%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       13.53 ± 11%     -47.0%       7.18 ± 76%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       15.63 ± 17%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
>       47.22 ± 77%     -85.5%       6.87 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>      133.35 ±132%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       68.01 ±203%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       13.53 ± 11%     -47.0%       7.18 ± 76%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       34.59 ±  3%    -100.0%       0.00        perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
>       40.97 ±  8%     -71.8%      11.55 ± 64%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
>      373.07 ±123%     -99.8%       0.78 ±156%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       13.53 ± 11%     -62.0%       5.14 ±107%  perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>      120.97 ± 23%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
>       46.03 ± 30%     -62.5%      17.27 ± 87%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
>      984.50 ± 14%     -43.5%     556.24 ± 58%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>      339.42 ± 12%     -97.3%       9.11 ± 54%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        8.00 ± 23%     -85.4%       1.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       22.17 ± 49%    -100.0%       0.00        perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
>       73.83 ± 20%     -76.3%      17.50 ± 96%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
>       13.53 ± 11%     -62.0%       5.14 ±107%  perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>      336.30 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
>       23.74 ±185%     -98.6%       0.34 ±114%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
>       14.48 ± 61%     -74.1%       3.76 ±152%  perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>        6.48 ± 68%     -91.3%       0.56 ±105%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>        6.70 ±152%     -94.5%       0.37 ±145%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>        2.18 ± 75%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       10.79 ±165%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        1.53 ±100%     -97.5%       0.04 ± 84%  perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>      105.34 ± 26%    -100.0%       0.00        perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
>       29.72 ± 40%     -76.5%       7.00 ±102%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
>       32.21 ± 33%     -65.7%      11.04 ± 85%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
>      984.49 ± 14%     -43.5%     556.23 ± 58%  perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
>      337.00 ± 12%     -97.6%       8.11 ± 52%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       53.42 ± 59%     -69.8%      16.15 ±162%  perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>      218.65 ± 83%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       82.52 ±162%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       10.89 ± 98%     -98.8%       0.13 ±134%  perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>      334.02 ±  6%    -100.0%       0.00        perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 
> 
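(A note on the perf-sched fields above, for readers less familiar with
these reports: sch_delay is roughly the time between a task being woken
and it actually running, wait_time the time it spent blocked, and
wait_and_delay their sum, all in milliseconds, with the ±N% column giving
the relative stddev across runs. The *.task_work_run.* rows dropping to
0.00 on the second commit are expected: f3de761c52 moves the mm_cid
compaction out of task_work, so those call paths simply no longer exist.)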
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
>    gcc-12/performance/socket/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> 
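To make the numbers below easier to place: per the job parameters above,
this is hackbench in process mode over AF_UNIX sockets. The following is
a minimal C sketch of that style of workload (sender/receiver process
pairs exchanging 100-byte messages over socketpair(2)); NR_PAIRS,
NR_LOOPS and the error handling are illustrative, not the exact lkp job
parameters.

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <sys/wait.h>

	#define NR_PAIRS 4      /* illustrative; the lkp job scales with nr_threads */
	#define NR_LOOPS 10000  /* messages sent per pair */
	#define MSG_SIZE 100    /* hackbench exchanges 100-byte messages */

	static void die(const char *msg)
	{
		perror(msg);
		exit(1);
	}

	int main(void)
	{
		char buf[MSG_SIZE];

		memset(buf, 0, sizeof(buf));
		for (int i = 0; i < NR_PAIRS; i++) {
			int sv[2];

			if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
				die("socketpair");
			if (fork() == 0) {	/* receiver: read until sender closes */
				close(sv[0]);
				while (read(sv[1], buf, sizeof(buf)) > 0)
					;
				_exit(0);
			}
			if (fork() == 0) {	/* sender: NR_LOOPS small messages */
				close(sv[1]);
				for (int n = 0; n < NR_LOOPS; n++)
					if (write(sv[0], buf, sizeof(buf)) < 0)
						die("write");
				_exit(0);
			}
			close(sv[0]);		/* parent keeps no fds open */
			close(sv[1]);
		}
		while (wait(NULL) > 0)		/* reap all children */
			;
		return 0;
	}

The pairs run concurrently and spend most of their time in socket
send/receive and the resulting wakeups, so the scheduler-side deltas are
the interesting part of the comparison that follows.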
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>      161258           -12.6%     141018 ±  5%  perf-c2c.HITM.total
>        6514 ±  3%     +13.3%       7381 ±  3%  uptime.idle
>      692218           +17.8%     815512        vmstat.system.in
>   4.747e+08 ±  7%    +137.3%  1.127e+09 ± 21%  cpuidle..time
>     5702271 ± 12%    +503.6%   34419686 ± 13%  cpuidle..usage
>      141191 ±  2%     +10.3%     155768 ±  3%  meminfo.PageTables
>       62180           +26.0%      78348        meminfo.Percpu
>        2.20 ± 14%      +3.5        5.67 ± 20%  mpstat.cpu.all.idle%
>        0.55            +0.2        0.72 ±  5%  mpstat.cpu.all.irq%
>        0.04 ±  2%      +0.0        0.06 ±  5%  mpstat.cpu.all.soft%
>      448780            -2.9%     435554        hackbench.throughput
>      440656            -2.6%     429130        hackbench.throughput_avg
>      448780            -2.9%     435554        hackbench.throughput_best
>      425797            -2.2%     416584        hackbench.throughput_worst
>    90998790           -15.0%   77364427 ±  6%  hackbench.time.involuntary_context_switches
>       12446            -3.9%      11960        hackbench.time.percent_of_cpu_this_job_got
>       16057            -1.4%      15825        hackbench.time.system_time
>       63421            -2.3%      61955        proc-vmstat.nr_kernel_stack
>       35455 ±  2%     +10.0%      38991 ±  3%  proc-vmstat.nr_page_table_pages
>       34542            +5.1%      36312 ±  2%  proc-vmstat.nr_slab_reclaimable
>      151083 ± 16%     +46.6%     221509 ± 17%  proc-vmstat.numa_hint_faults
>      113731 ± 26%     +64.7%     187314 ± 20%  proc-vmstat.numa_hint_faults_local
>      133591            +3.1%     137709        proc-vmstat.numa_other
>       53696 ± 16%     -28.6%      38362 ± 10%  proc-vmstat.numa_pages_migrated
>     1053504 ±  2%      +7.7%    1135052 ±  4%  proc-vmstat.pgfault
>     2077549 ±  3%      +8.5%    2254157 ±  4%  proc-vmstat.pgfree
>       53696 ± 16%     -28.6%      38362 ± 10%  proc-vmstat.pgmigrate_success
>   4.941e+10            -2.6%   4.81e+10        perf-stat.i.branch-instructions
>   2.232e+08            -1.9%  2.189e+08        perf-stat.i.branch-misses
>    2.11e+09            -5.8%  1.989e+09 ±  2%  perf-stat.i.cache-references
>   3.221e+11            -2.5%  3.141e+11        perf-stat.i.cpu-cycles
>   2.365e+11            -2.7%  2.303e+11        perf-stat.i.instructions
>        6787 ±  3%      +8.0%       7327 ±  4%  perf-stat.i.minor-faults
>        6789 ±  3%      +8.0%       7329 ±  4%  perf-stat.i.page-faults
>   4.904e+10            -2.5%  4.779e+10        perf-stat.ps.branch-instructions
>   2.215e+08            -1.8%  2.174e+08        perf-stat.ps.branch-misses
>   2.094e+09            -5.7%  1.974e+09 ±  2%  perf-stat.ps.cache-references
>   3.197e+11            -2.4%   3.12e+11        perf-stat.ps.cpu-cycles
>   2.348e+11            -2.6%  2.288e+11        perf-stat.ps.instructions
>        6691 ±  3%      +7.2%       7174 ±  4%  perf-stat.ps.minor-faults
>        6693 ±  3%      +7.2%       7176 ±  4%  perf-stat.ps.page-faults
>     7475567           +16.4%    8699139        sched_debug.cfs_rq:/.avg_vruntime.avg
>     8752154 ±  3%     +20.6%   10551563 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
>      211424 ± 12%    +374.5%    1003211 ± 39%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>       19.44 ±  6%     +29.4%      25.17 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.max
>        4.49 ±  4%     +33.5%       5.99 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
>       19.33 ±  6%     +29.0%      24.94 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.max
>        4.47 ±  4%     +33.4%       5.96 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
>        6446 ±223%    +885.4%      63529 ± 57%  sched_debug.cfs_rq:/.left_deadline.avg
>      825119 ±223%    +613.5%    5886958 ± 44%  sched_debug.cfs_rq:/.left_deadline.max
>       72645 ±223%    +713.6%     591074 ± 49%  sched_debug.cfs_rq:/.left_deadline.stddev
>        6446 ±223%    +885.5%      63527 ± 57%  sched_debug.cfs_rq:/.left_vruntime.avg
>      825080 ±223%    +613.5%    5886805 ± 44%  sched_debug.cfs_rq:/.left_vruntime.max
>       72642 ±223%    +713.7%     591058 ± 49%  sched_debug.cfs_rq:/.left_vruntime.stddev
>        4202 ±  8%   +1115.1%      51069 ± 61%  sched_debug.cfs_rq:/.load.stddev
>      367.11           +20.2%     441.44 ± 17%  sched_debug.cfs_rq:/.load_avg.max
>     7475567           +16.4%    8699139        sched_debug.cfs_rq:/.min_vruntime.avg
>     8752154 ±  3%     +20.6%   10551563 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
>      211424 ± 12%    +374.5%    1003211 ± 39%  sched_debug.cfs_rq:/.min_vruntime.stddev
>        0.17 ± 16%     +39.8%       0.24 ±  6%  sched_debug.cfs_rq:/.nr_queued.stddev
>        6446 ±223%    +885.5%      63527 ± 57%  sched_debug.cfs_rq:/.right_vruntime.avg
>      825080 ±223%    +613.5%    5886805 ± 44%  sched_debug.cfs_rq:/.right_vruntime.max
>       72642 ±223%    +713.7%     591058 ± 49%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      752.39 ± 81%     -81.4%     139.72 ± 53%  sched_debug.cfs_rq:/.runnable_avg.min
>        2728 ±  3%     +51.2%       4126 ±  8%  sched_debug.cfs_rq:/.runnable_avg.stddev
>      265.50 ±  2%     +12.3%     298.07 ±  2%  sched_debug.cfs_rq:/.util_avg.stddev
>      686.78 ±  7%     +23.4%     847.76 ±  6%  sched_debug.cfs_rq:/.util_est.stddev
>       19.44 ±  5%     +29.7%      25.22 ±  4%  sched_debug.cpu.nr_running.max
>        4.48 ±  5%     +34.4%       6.02 ±  3%  sched_debug.cpu.nr_running.stddev
>       67323 ± 14%    +130.3%     155017 ± 29%  sched_debug.cpu.nr_switches.stddev
>      -20.78           -18.2%     -17.00        sched_debug.cpu.nr_uninterruptible.min
>        0.13 ±100%     -85.8%       0.02 ±163%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
>        0.17 ±116%     -97.8%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
>       22.92 ±110%     -97.4%       0.59 ±137%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
>        8.10 ± 45%     -78.0%       1.78 ±135%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
>        3.14 ± 19%     -70.9%       0.91 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
>       39.05 ±149%     -97.4%       1.01 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
>       15.77 ±203%     -99.7%       0.04 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
>        1.27 ±177%     -98.2%       0.02 ±190%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
>        0.20 ±140%     -92.4%       0.02 ±201%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
>       86.63 ±221%     -99.9%       0.05 ±184%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.18 ± 75%     -97.0%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
>        0.13 ± 34%     -75.5%       0.03 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>        0.26 ±108%     -86.2%       0.04 ±142%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
>        2.33 ± 11%     -65.8%       0.80 ±107%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>        0.18 ± 88%     -91.1%       0.02 ±194%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
>        0.50 ±145%     -92.5%       0.04 ±210%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
>        0.19 ±116%     -98.5%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
>        0.24 ±128%     -96.8%       0.01 ±180%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
>        0.99 ± 16%     -58.0%       0.42 ±100%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>        0.27 ±124%     -97.5%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
>        1.08 ± 28%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.96 ± 93%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        0.53 ±182%     -94.2%       0.03 ±158%  perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>        0.84 ±160%     -93.5%       0.05 ±100%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>       29.39 ±172%     -94.0%       1.78 ±123%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       21.51 ± 60%     -74.7%       5.45 ±118%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       13.77 ± 61%     -81.3%       2.57 ±113%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       11.22 ± 33%     -74.5%       2.86 ±107%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>        1.99 ± 90%     -90.1%       0.20 ±100%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>        4.50 ±138%     -94.9%       0.23 ±200%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       27.91 ±218%     -99.6%       0.11 ±120%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        9.91 ± 51%     -68.3%       3.15 ±124%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       10.18 ± 24%     -62.4%       3.83 ±105%  perf-sched.sch_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
>        1.16 ± 20%     -62.7%       0.43 ±106%  perf-sched.sch_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>        0.27 ± 99%     -92.0%       0.02 ±172%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
>        0.32 ±128%     -98.9%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
>        0.88 ± 94%     -86.7%       0.12 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>      252.53 ±128%     -98.4%       4.12 ±138%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
>       60.22 ± 58%     -67.8%      19.37 ±146%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
>      168.93 ±209%     -99.9%       0.15 ±100%  perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
>        3.79 ±169%     -98.6%       0.05 ±199%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
>      517.19 ±222%     -99.9%       0.29 ±201%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.54 ± 82%     -98.4%       0.01 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
>        0.34 ± 57%     -93.1%       0.02 ±203%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
>        0.64 ±141%     -99.4%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
>        0.28 ±111%     -97.2%       0.01 ±180%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
>        0.29 ±114%     -97.6%       0.01 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
>      133.30 ± 46%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       12.53 ±135%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        1.11 ± 85%     -76.9%       0.26 ±202%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
>        7.48 ±214%     -99.0%       0.08 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>       28.59 ±191%     -99.0%       0.28 ±120%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>      285.16 ±145%     -99.3%       1.94 ±111%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>      143.71 ±128%     -91.0%      12.97 ±134%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>      107.10 ±162%     -99.1%       0.95 ±190%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>      352.73 ±216%     -99.4%       2.06 ±118%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        1169 ± 25%     -58.7%     482.79 ±101%  perf-sched.sch_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>        1.80 ± 20%     -58.5%       0.75 ±105%  perf-sched.total_sch_delay.average.ms
>        5.09 ± 20%     -58.0%       2.14 ±106%  perf-sched.total_wait_and_delay.average.ms
>       20.86 ± 25%     -82.0%       3.76 ±147%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
>        8.10 ± 21%     -69.1%       2.51 ±103%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
>       22.82 ± 27%     -66.9%       7.55 ±103%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        6.55 ± 13%     -64.1%       2.35 ±108%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>      139.95 ± 55%     -64.0%      50.45 ±122%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
>       27.54 ± 61%     -81.3%       5.15 ±113%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       27.75 ± 30%     -73.3%       7.42 ±106%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>       26.76 ± 25%     -64.2%       9.57 ±107%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
>       29.39 ± 34%     -67.3%       9.61 ±115%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       27.53 ± 25%     -62.9%      10.21 ±105%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
>        3.25 ± 20%     -62.2%       1.23 ±106%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      864.18 ±  4%     -99.3%       6.27 ±103%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      141.47 ± 38%     -72.9%      38.27 ±154%  perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
>        2346 ± 25%     -58.7%     969.53 ±101%  perf-sched.wait_and_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>       83.99 ±223%    -100.0%       0.02 ±163%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
>        0.16 ±122%     -97.7%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
>       12.76 ± 37%     -81.6%       2.35 ±125%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
>        4.96 ± 22%     -67.9%       1.59 ±104%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
>       75.22 ± 91%     -96.4%       2.67 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
>       23.31 ±188%     -98.8%       0.28 ±195%  perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
>       14.93 ± 22%     -68.0%       4.78 ±104%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        1.29 ±178%     -98.5%       0.02 ±185%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
>        0.20 ±140%     -92.5%       0.02 ±200%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
>       87.29 ±221%     -99.9%       0.05 ±184%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.18 ± 76%     -97.0%       0.01 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
>        0.12 ± 33%     -87.4%       0.02 ±212%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>        4.22 ± 15%     -63.3%       1.55 ±108%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>        0.18 ± 88%     -91.1%       0.02 ±194%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
>        0.50 ±145%     -92.5%       0.04 ±210%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
>        0.19 ±116%     -98.5%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
>        0.24 ±128%     -96.8%       0.01 ±180%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
>        1.79 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        1.98 ± 92%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        2.44 ±199%     -98.1%       0.05 ±109%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>      125.16 ± 52%     -64.6%      44.36 ±120%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
>       13.77 ± 61%     -81.3%       2.58 ±113%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       16.53 ± 29%     -72.5%       4.55 ±106%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>        3.11 ± 80%     -80.7%       0.60 ±138%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>       17.30 ± 23%     -65.0%       6.05 ±107%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
>       50.76 ±143%     -98.1%       0.97 ±101%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       19.48 ± 27%     -66.8%       6.46 ±111%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       17.35 ± 25%     -63.3%       6.37 ±106%  perf-sched.wait_time.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
>        2.09 ± 21%     -62.0%       0.79 ±107%  perf-sched.wait_time.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      850.73 ±  6%     -99.3%       5.76 ±102%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      168.00 ±223%    -100.0%       0.02 ±172%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
>        0.32 ±131%     -98.8%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
>        0.88 ± 94%     -86.7%       0.12 ±144%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
>       83.05 ± 45%     -75.0%      20.78 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
>      393.39 ± 76%     -96.3%      14.60 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
>        3.87 ±170%     -98.6%       0.05 ±199%  perf-sched.wait_time.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
>      520.88 ±222%     -99.9%       0.29 ±201%  perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>        0.54 ± 82%     -98.4%       0.01 ±141%  perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
>        0.34 ± 57%     -93.1%       0.02 ±203%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
>        0.64 ±141%     -99.4%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
>        0.28 ±111%     -97.2%       0.01 ±180%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
>      210.15 ± 42%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       34.48 ±131%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        1.11 ± 85%     -76.9%       0.26 ±202%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
>       92.32 ±212%     -99.7%       0.27 ±123%  perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>        3252 ± 21%     -58.5%       1351 ±103%  perf-sched.wait_time.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
>        1602 ± 28%     -66.2%     541.12 ±100%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      530.17 ± 95%     -98.5%       7.79 ±119%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        1177 ± 25%     -58.7%     486.74 ±101%  perf-sched.wait_time.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>       50.88            -1.4       49.53        perf-profile.calltrace.cycles-pp.read
>       45.95            -1.0       44.92        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
>       45.66            -1.0       44.64        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        3.44 ±  4%      -0.8        2.66 ±  4%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
>        3.32 ±  4%      -0.8        2.56 ±  4%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg
>        3.28 ±  4%      -0.8        2.52 ±  4%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable
>        3.48 ±  3%      -0.6        2.83 ±  5%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>        3.52 ±  3%      -0.6        2.87 ±  5%  perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>        3.45 ±  3%      -0.6        2.80 ±  5%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg
>       47.06            -0.6       46.45        perf-profile.calltrace.cycles-pp.write
>        4.26 ±  5%      -0.6        3.69        perf-profile.calltrace.cycles-pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write
>        1.58 ±  3%      -0.6        1.02 ±  8%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
>        1.31 ±  3%      -0.5        0.85 ±  8%  perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
>        1.25 ±  3%      -0.4        0.81 ±  8%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
>        0.84 ±  3%      -0.2        0.60 ±  5%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        7.91            -0.2        7.68        perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>        3.17 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
>        7.80            -0.2        7.58        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>        7.58            -0.2        7.36        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
>        1.22 ±  4%      -0.2        1.02 ±  4%  perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
>        1.18 ±  4%      -0.2        0.99 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_timeout
>        0.87            -0.2        0.68 ±  8%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_timeout
>        1.14 ±  4%      -0.2        0.95 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
>        0.90            -0.2        0.72 ±  7%  perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
>        3.45 ±  3%      -0.1        3.30        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
>        1.96            -0.1        1.82        perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
>        1.97            -0.1        1.86        perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
>        2.35            -0.1        2.25        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
>        2.58            -0.1        2.48        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
>        1.38 ±  4%      -0.1        1.28 ±  2%  perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
>        1.35            -0.1        1.25        perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
>        0.67 ±  7%      -0.1        0.58 ±  3%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
>        2.59            -0.1        2.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
>        2.02            -0.1        1.96        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>        0.77 ±  3%      -0.0        0.72 ±  2%  perf-profile.calltrace.cycles-pp.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
>        0.65 ±  4%      -0.0        0.60 ±  2%  perf-profile.calltrace.cycles-pp.fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
>        0.74            -0.0        0.70        perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>        1.04            -0.0        0.99        perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb
>        0.69            -0.0        0.65 ±  2%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter
>        0.82            -0.0        0.80        perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
>        0.57            -0.0        0.56        perf-profile.calltrace.cycles-pp.refill_obj_stock.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
>        0.80 ±  9%      +0.2        1.01 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
>        2.50 ±  4%      +0.3        2.82 ±  9%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>        2.64 ±  6%      +0.4        3.06 ± 12%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic
>        2.73 ±  6%      +0.4        3.16 ± 12%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
>        2.87 ±  6%      +0.4        3.30 ± 12%  perf-profile.calltrace.cycles-pp.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>       18.38            +0.6       18.93        perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
>        0.00            +0.7        0.70 ± 11%  perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
>        0.00            +0.8        0.76 ± 16%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_stream_sendmsg.sock_write_iter.vfs_write
>        0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
>        0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>        0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>        0.00            +1.5        1.50 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>        0.00            +1.5        1.52 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>        0.00            +1.6        1.61 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        0.18 ±141%      +1.8        1.93 ± 11%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        0.18 ±141%      +1.8        1.94 ± 11%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
>        0.18 ±141%      +1.8        1.94 ± 11%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
>        0.18 ±141%      +1.8        1.97 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
>        0.00            +2.0        1.96 ± 11%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
>       87.96            -1.4       86.57        perf-profile.children.cycles-pp.do_syscall_64
>       88.72            -1.4       87.33        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       51.44            -1.4       50.05        perf-profile.children.cycles-pp.read
>        4.55 ±  2%      -0.8        3.74 ±  5%  perf-profile.children.cycles-pp.schedule
>        3.76 ±  4%      -0.7        3.02 ±  3%  perf-profile.children.cycles-pp.__wake_up_common
>        3.64 ±  4%      -0.7        2.92 ±  3%  perf-profile.children.cycles-pp.autoremove_wake_function
>        3.60 ±  4%      -0.7        2.90 ±  3%  perf-profile.children.cycles-pp.try_to_wake_up
>        4.00 ±  2%      -0.6        3.36 ±  4%  perf-profile.children.cycles-pp.schedule_timeout
>        4.65 ±  2%      -0.6        4.02 ±  4%  perf-profile.children.cycles-pp.__schedule
>       47.64            -0.6       47.01        perf-profile.children.cycles-pp.write
>        4.58 ±  4%      -0.5        4.06        perf-profile.children.cycles-pp.__wake_up_sync_key
>        1.45 ±  2%      -0.4        1.00 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>        1.84 ±  3%      -0.3        1.50 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
>        1.62 ±  2%      -0.3        1.33 ±  3%  perf-profile.children.cycles-pp.enqueue_task
>        1.53 ±  2%      -0.3        1.26 ±  3%  perf-profile.children.cycles-pp.enqueue_task_fair
>        1.40            -0.3        1.14 ±  6%  perf-profile.children.cycles-pp.pick_next_task_fair
>        3.97            -0.2        3.73        perf-profile.children.cycles-pp.clear_bhb_loop
>        1.43            -0.2        1.19 ±  5%  perf-profile.children.cycles-pp.__pick_next_task
>        0.75 ±  4%      -0.2        0.52 ±  8%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
>        7.95            -0.2        7.72        perf-profile.children.cycles-pp.unix_stream_read_actor
>        7.84            -0.2        7.61        perf-profile.children.cycles-pp.skb_copy_datagram_iter
>        3.24 ±  2%      -0.2        3.01        perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
>        7.63            -0.2        7.42        perf-profile.children.cycles-pp.__skb_datagram_iter
>        0.94 ±  4%      -0.2        0.73 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
>        0.95 ±  8%      -0.2        0.76 ±  4%  perf-profile.children.cycles-pp.update_curr
>        1.37 ±  3%      -0.2        1.18 ±  3%  perf-profile.children.cycles-pp.dequeue_task_fair
>        1.34 ±  4%      -0.2        1.16 ±  3%  perf-profile.children.cycles-pp.try_to_block_task
>        4.50            -0.2        4.34        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
>        1.37 ±  3%      -0.2        1.20 ±  3%  perf-profile.children.cycles-pp.dequeue_entities
>        3.48 ±  3%      -0.1        3.33        perf-profile.children.cycles-pp._copy_to_iter
>        0.91            -0.1        0.78 ±  3%  perf-profile.children.cycles-pp.update_load_avg
>        4.85            -0.1        4.72        perf-profile.children.cycles-pp.__check_object_size
>        3.23            -0.1        3.11        perf-profile.children.cycles-pp.entry_SYSCALL_64
>        0.54 ±  3%      -0.1        0.42 ±  5%  perf-profile.children.cycles-pp.switch_mm_irqs_off
>        1.40 ±  4%      -0.1        1.30 ±  2%  perf-profile.children.cycles-pp._copy_from_iter
>        2.02            -0.1        1.92        perf-profile.children.cycles-pp.its_return_thunk
>        0.43 ±  2%      -0.1        0.32 ±  3%  perf-profile.children.cycles-pp.switch_fpu_return
>        0.29 ±  2%      -0.1        0.18 ±  6%  perf-profile.children.cycles-pp.__enqueue_entity
>        1.46 ±  3%      -0.1        1.36 ±  2%  perf-profile.children.cycles-pp.fdget_pos
>        0.44 ±  3%      -0.1        0.34 ±  5%  perf-profile.children.cycles-pp.set_next_entity
>        0.42 ±  2%      -0.1        0.32 ±  4%  perf-profile.children.cycles-pp.pick_task_fair
>        0.31 ±  2%      -0.1        0.24 ±  6%  perf-profile.children.cycles-pp.reweight_entity
>        0.28 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.__dequeue_entity
>        1.96            -0.1        1.88        perf-profile.children.cycles-pp.obj_cgroup_charge_account
>        0.28 ±  2%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.update_cfs_group
>        0.23 ±  2%      -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.pick_eevdf
>        0.26 ±  2%      -0.1        0.19 ±  4%  perf-profile.children.cycles-pp.wakeup_preempt
>        1.46            -0.1        1.40        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>        0.48 ±  2%      -0.1        0.42 ±  5%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
>        0.30            -0.1        0.24 ±  4%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
>        0.82            -0.1        0.77        perf-profile.children.cycles-pp.__cond_resched
>        0.27 ±  2%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_se
>        0.14 ±  3%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.update_curr_se
>        0.79            -0.0        0.74        perf-profile.children.cycles-pp.mutex_lock
>        0.34 ±  3%      -0.0        0.30 ±  5%  perf-profile.children.cycles-pp.rseq_ip_fixup
>        0.15 ±  4%      -0.0        0.11 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>        0.21 ±  3%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.__switch_to
>        0.17 ±  4%      -0.0        0.13 ±  7%  perf-profile.children.cycles-pp.place_entity
>        0.22            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.wake_affine
>        0.24            -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.check_stack_object
>        0.64 ±  2%      -0.0        0.61 ±  3%  perf-profile.children.cycles-pp.__virt_addr_valid
>        0.38 ±  2%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
>        0.18 ±  3%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.update_rq_clock
>        0.66            -0.0        0.62        perf-profile.children.cycles-pp.rw_verify_area
>        0.19            -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.task_mm_cid_work
>        0.34 ±  3%      -0.0        0.31 ±  2%  perf-profile.children.cycles-pp.update_process_times
>        0.12 ±  8%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.detach_tasks
>        0.39 ±  3%      -0.0        0.36 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>        0.21 ±  3%      -0.0        0.18 ±  6%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
>        0.18 ±  6%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.task_tick_fair
>        0.25 ±  3%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.rseq_get_rseq_cs
>        0.23 ±  5%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.sched_tick
>        0.14 ±  3%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.check_preempt_wakeup_fair
>        0.11 ±  4%      -0.0        0.08 ±  7%  perf-profile.children.cycles-pp.update_min_vruntime
>        0.06            -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.update_curr_dl_se
>        0.14 ±  3%      -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.put_prev_entity
>        0.13 ±  5%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.task_h_load
>        0.68            -0.0        0.65        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>        0.46 ±  2%      -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
>        0.52            -0.0        0.50        perf-profile.children.cycles-pp.scm_recv_unix
>        0.08 ±  4%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.__cgroup_account_cputime
>        0.11 ±  5%      -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
>        0.46 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>        0.08 ±  8%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.activate_task
>        0.08 ±  8%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.detach_task
>        0.11 ±  5%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.os_xsave
>        0.13 ±  5%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.avg_vruntime
>        0.13 ±  4%      -0.0        0.11 ±  5%  perf-profile.children.cycles-pp.update_entity_lag
>        0.08 ±  4%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.__calc_delta
>        0.09 ±  5%      -0.0        0.07 ±  8%  perf-profile.children.cycles-pp.vruntime_eligible
>        0.34 ±  2%      -0.0        0.32        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
>        0.30            -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.__build_skb_around
>        0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.rseq_update_cpu_node_id
>        0.15            -0.0        0.14        perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
>        0.07 ±  5%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.native_irq_return_iret
>        0.38 ±  2%      +0.0        0.40 ±  2%  perf-profile.children.cycles-pp.mod_memcg_lruvec_state
>        0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.children.cycles-pp.prepare_task_switch
>        0.05 ±  7%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.handle_softirqs
>        0.06            +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.finish_wait
>        0.06 ±  7%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.__irq_exit_rcu
>        0.06 ±  8%      +0.1        0.11 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>        0.01 ±223%      +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.ktime_get
>        0.54 ±  4%      +0.1        0.61        perf-profile.children.cycles-pp.select_task_rq
>        0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.enqueue_dl_entity
>        0.12 ±  4%      +0.1        0.19 ±  7%  perf-profile.children.cycles-pp.get_any_partial
>        0.10 ±  9%      +0.1        0.18 ±  5%  perf-profile.children.cycles-pp.available_idle_cpu
>        0.00            +0.1        0.08 ±  9%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
>        0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.dl_server_start
>        0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.dl_server_stop
>        0.46 ±  2%      +0.1        0.54 ±  2%  perf-profile.children.cycles-pp.select_task_rq_fair
>        0.00            +0.1        0.10 ± 10%  perf-profile.children.cycles-pp.select_idle_core
>        0.09 ±  7%      +0.1        0.20 ±  8%  perf-profile.children.cycles-pp.select_idle_cpu
>        0.18 ±  4%      +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.select_idle_sibling
>        0.00            +0.2        0.18 ±  4%  perf-profile.children.cycles-pp.process_one_work
>        0.06 ± 13%      +0.2        0.25 ±  9%  perf-profile.children.cycles-pp.schedule_idle
>        0.44 ±  2%      +0.2        0.64 ±  8%  perf-profile.children.cycles-pp.prepare_to_wait
>        0.00            +0.2        0.21 ±  5%  perf-profile.children.cycles-pp.kthread
>        0.00            +0.2        0.21 ±  5%  perf-profile.children.cycles-pp.worker_thread
>        0.00            +0.2        0.21 ±  4%  perf-profile.children.cycles-pp.ret_from_fork
>        0.00            +0.2        0.21 ±  4%  perf-profile.children.cycles-pp.ret_from_fork_asm
>        0.11 ± 12%      +0.3        0.36 ±  9%  perf-profile.children.cycles-pp.sched_ttwu_pending
>        0.31 ± 35%      +0.3        0.59 ± 11%  perf-profile.children.cycles-pp.__cmd_record
>        0.26 ± 45%      +0.3        0.54 ± 13%  perf-profile.children.cycles-pp.perf_session__process_events
>        0.26 ± 45%      +0.3        0.54 ± 13%  perf-profile.children.cycles-pp.reader__read_event
>        0.26 ± 45%      +0.3        0.54 ± 13%  perf-profile.children.cycles-pp.record__finish_output
>        0.16 ± 11%      +0.3        0.45 ±  9%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>        0.14 ± 11%      +0.3        0.45 ±  9%  perf-profile.children.cycles-pp.__sysvec_call_function_single
>        0.14 ± 60%      +0.3        0.48 ± 17%  perf-profile.children.cycles-pp.ordered_events__queue
>        0.14 ± 61%      +0.3        0.48 ± 17%  perf-profile.children.cycles-pp.queue_event
>        0.15 ± 59%      +0.3        0.49 ± 16%  perf-profile.children.cycles-pp.process_simple
>        0.16 ± 12%      +0.4        0.54 ± 10%  perf-profile.children.cycles-pp.sysvec_call_function_single
>        4.61 ±  3%      +0.5        5.13 ±  8%  perf-profile.children.cycles-pp.get_partial_node
>        5.57 ±  3%      +0.6        6.12 ±  7%  perf-profile.children.cycles-pp.___slab_alloc
>       18.44            +0.6       19.00        perf-profile.children.cycles-pp.sock_alloc_send_pskb
>        6.51 ±  3%      +0.7        7.26 ±  9%  perf-profile.children.cycles-pp.__put_partials
>        0.33 ± 14%      +1.0        1.30 ± 11%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
>        0.34 ± 17%      +1.1        1.47 ± 11%  perf-profile.children.cycles-pp.pv_native_safe_halt
>        0.34 ± 17%      +1.1        1.48 ± 11%  perf-profile.children.cycles-pp.acpi_safe_halt
>        0.34 ± 17%      +1.1        1.48 ± 11%  perf-profile.children.cycles-pp.acpi_idle_do_entry
>        0.34 ± 17%      +1.1        1.48 ± 11%  perf-profile.children.cycles-pp.acpi_idle_enter
>        0.35 ± 17%      +1.2        1.53 ± 11%  perf-profile.children.cycles-pp.cpuidle_enter_state
>        0.35 ± 17%      +1.2        1.54 ± 11%  perf-profile.children.cycles-pp.cpuidle_enter
>        0.38 ± 17%      +1.3        1.63 ± 11%  perf-profile.children.cycles-pp.cpuidle_idle_call
>        0.45 ± 16%      +1.5        1.94 ± 11%  perf-profile.children.cycles-pp.start_secondary
>        0.46 ± 17%      +1.5        1.96 ± 11%  perf-profile.children.cycles-pp.do_idle
>        0.46 ± 17%      +1.5        1.97 ± 11%  perf-profile.children.cycles-pp.common_startup_64
>        0.46 ± 17%      +1.5        1.97 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
>       13.76 ±  2%      +1.7       15.44 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       12.09 ±  2%      +1.9       14.00 ±  6%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>        3.93            -0.2        3.69        perf-profile.self.cycles-pp.clear_bhb_loop
>        3.43 ±  3%      -0.1        3.29        perf-profile.self.cycles-pp._copy_to_iter
>        0.50 ±  2%      -0.1        0.39 ±  5%  perf-profile.self.cycles-pp.switch_mm_irqs_off
>        1.37 ±  4%      -0.1        1.27 ±  2%  perf-profile.self.cycles-pp._copy_from_iter
>        0.28 ±  2%      -0.1        0.18 ±  7%  perf-profile.self.cycles-pp.__enqueue_entity
>        1.41 ±  3%      -0.1        1.31 ±  2%  perf-profile.self.cycles-pp.fdget_pos
>        2.51            -0.1        2.42        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
>        1.35            -0.1        1.28        perf-profile.self.cycles-pp.read
>        2.24            -0.1        2.17        perf-profile.self.cycles-pp.do_syscall_64
>        0.27 ±  3%      -0.1        0.20 ±  3%  perf-profile.self.cycles-pp.update_cfs_group
>        1.28            -0.1        1.22        perf-profile.self.cycles-pp.sock_write_iter
>        0.84            -0.1        0.77        perf-profile.self.cycles-pp.vfs_read
>        1.42            -0.1        1.36        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>        1.20            -0.1        1.14        perf-profile.self.cycles-pp.__alloc_skb
>        0.18 ±  2%      -0.1        0.13 ±  5%  perf-profile.self.cycles-pp.pick_eevdf
>        1.04            -0.1        0.99        perf-profile.self.cycles-pp.its_return_thunk
>        0.29 ±  2%      -0.1        0.24 ±  4%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
>        0.28 ±  5%      -0.1        0.23 ±  6%  perf-profile.self.cycles-pp.update_curr
>        0.13 ±  5%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.switch_fpu_return
>        0.20 ±  3%      -0.0        0.15 ±  6%  perf-profile.self.cycles-pp.__dequeue_entity
>        1.00            -0.0        0.95        perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
>        0.33            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.update_load_avg
>        0.88            -0.0        0.83 ±  2%  perf-profile.self.cycles-pp.vfs_write
>        0.91            -0.0        0.86        perf-profile.self.cycles-pp.sock_read_iter
>        0.13 ±  3%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.update_curr_se
>        0.25 ±  2%      -0.0        0.21 ±  4%  perf-profile.self.cycles-pp.__update_load_avg_se
>        1.22            -0.0        1.18        perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
>        0.68            -0.0        0.63        perf-profile.self.cycles-pp.__check_object_size
>        0.78 ±  2%      -0.0        0.74        perf-profile.self.cycles-pp.obj_cgroup_charge_account
>        0.20 ±  3%      -0.0        0.16 ±  4%  perf-profile.self.cycles-pp.__switch_to
>        0.15 ±  3%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.try_to_wake_up
>        0.90            -0.0        0.86        perf-profile.self.cycles-pp.entry_SYSCALL_64
>        0.76 ±  2%      -0.0        0.73        perf-profile.self.cycles-pp.__check_heap_object
>        0.92            -0.0        0.89 ±  2%  perf-profile.self.cycles-pp.__account_obj_stock
>        0.19 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.check_stack_object
>        0.40 ±  3%      -0.0        0.37        perf-profile.self.cycles-pp.__schedule
>        0.60 ±  2%      -0.0        0.56 ±  3%  perf-profile.self.cycles-pp.__virt_addr_valid
>        0.71            -0.0        0.68        perf-profile.self.cycles-pp.__skb_datagram_iter
>        0.18 ±  4%      -0.0        0.14 ±  5%  perf-profile.self.cycles-pp.task_mm_cid_work
>        0.68            -0.0        0.65        perf-profile.self.cycles-pp.refill_obj_stock
>        0.34            -0.0        0.31 ±  2%  perf-profile.self.cycles-pp.unix_stream_recvmsg
>        0.06 ±  7%      -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.enqueue_task
>        0.11            -0.0        0.08        perf-profile.self.cycles-pp.pick_task_fair
>        0.15 ±  2%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.enqueue_task_fair
>        0.20 ±  3%      -0.0        0.17 ±  7%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
>        0.41            -0.0        0.38        perf-profile.self.cycles-pp.sock_recvmsg
>        0.10            -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.update_min_vruntime
>        0.13 ±  3%      -0.0        0.10        perf-profile.self.cycles-pp.task_h_load
>        0.23 ±  3%      -0.0        0.20 ±  6%  perf-profile.self.cycles-pp.__get_user_8
>        0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.exit_to_user_mode_loop
>        0.39 ±  2%      -0.0        0.37 ±  2%  perf-profile.self.cycles-pp.rw_verify_area
>        0.11 ±  3%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.os_xsave
>        0.12 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.pick_next_task_fair
>        0.35            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
>        0.46            -0.0        0.44        perf-profile.self.cycles-pp.mutex_lock
>        0.11 ±  4%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__switch_to_asm
>        0.10 ±  3%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.enqueue_entity
>        0.08 ±  7%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.place_entity
>        0.30            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.alloc_skb_with_frags
>        0.50            -0.0        0.48        perf-profile.self.cycles-pp.kfree
>        0.30            -0.0        0.28        perf-profile.self.cycles-pp.ksys_write
>        0.12 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.dequeue_entity
>        0.11 ±  4%      -0.0        0.09        perf-profile.self.cycles-pp.prepare_to_wait
>        0.19 ±  2%      -0.0        0.17        perf-profile.self.cycles-pp.update_rq_clock_task
>        0.27            -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.__build_skb_around
>        0.08 ±  6%      -0.0        0.06 ±  9%  perf-profile.self.cycles-pp.vruntime_eligible
>        0.12 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.__wake_up_common
>        0.27            -0.0        0.26        perf-profile.self.cycles-pp.kmalloc_reserve
>        0.48            -0.0        0.46        perf-profile.self.cycles-pp.unix_write_space
>        0.19            -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.skb_copy_datagram_iter
>        0.07            -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.__calc_delta
>        0.06 ±  6%      -0.0        0.05        perf-profile.self.cycles-pp.__put_user_8
>        0.28            -0.0        0.27        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
>        0.11            -0.0        0.10        perf-profile.self.cycles-pp.wait_for_unix_gc
>        0.05            +0.0        0.06        perf-profile.self.cycles-pp.__x64_sys_write
>        0.07 ±  5%      +0.0        0.08 ±  5%  perf-profile.self.cycles-pp.native_irq_return_iret
>        0.19 ±  7%      +0.0        0.22 ±  4%  perf-profile.self.cycles-pp.prepare_task_switch
>        0.10 ±  6%      +0.1        0.17 ±  5%  perf-profile.self.cycles-pp.available_idle_cpu
>        0.14 ± 61%      +0.3        0.48 ± 17%  perf-profile.self.cycles-pp.queue_event
>        0.19 ± 18%      +0.7        0.85 ± 12%  perf-profile.self.cycles-pp.pv_native_safe_halt
>       12.07 ±  2%      +1.9       13.97 ±  6%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> 
> 
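A note for readers skimming the comparison tables: for scalar metrics the
middle column is the relative %change between the two commits, while for
perf-profile cycles-pp rows it is an absolute delta in percentage points.
A minimal Python sketch of both conventions, checked against one profile
row above and one vmstat row in the next table below (the helper names
are illustrative, not part of the LKP tooling):

    def relative_change(parent, patched):
        # %change column for scalar metrics, e.g. vmstat.system.cs
        return (patched - parent) / parent * 100.0

    def absolute_delta(parent, patched):
        # delta column for perf-profile cycles-pp rows, in percentage points
        return patched - parent

    assert round(relative_change(9156, 11004), 1) == 20.2  # vmstat.system.cs
    assert round(absolute_delta(12.07, 13.97), 1) == 1.9   # native_queued_spin_lock_slowpath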
> 
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_3/aim9/300s
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>        9156           +20.2%      11004        vmstat.system.cs
>     8715946 ±  6%     -14.0%    7494314 ± 13%  meminfo.DirectMap2M
>       10992           +85.4%      20381        meminfo.PageTables
>      318.58            -1.7%     313.01        aim9.shell_rtns_3.ops_per_sec
>    27145198            -2.1%   26576524        aim9.time.minor_page_faults
>     1049306            -1.8%    1030938        aim9.time.voluntary_context_switches
>        6173 ± 20%     +74.0%      10742 ±  4%  numa-meminfo.node0.PageTables
>        5702 ± 31%     +55.1%       8844 ± 19%  numa-meminfo.node0.Shmem
>        4803 ± 25%    +100.6%       9636 ±  6%  numa-meminfo.node1.PageTables
>        1538 ± 20%     +73.7%       2673 ±  5%  numa-vmstat.node0.nr_page_table_pages
>        1425 ± 31%     +55.1%       2210 ± 19%  numa-vmstat.node0.nr_shmem
>        1194 ± 25%    +101.2%       2402 ±  6%  numa-vmstat.node1.nr_page_table_pages
>       30413           +19.3%      36291        sched_debug.cpu.nr_switches.avg
>       84768 ±  6%     +20.3%     101955 ±  4%  sched_debug.cpu.nr_switches.max
>       25510 ± 13%     +23.0%      31383 ±  3%  sched_debug.cpu.nr_switches.stddev
>        2727           +85.8%       5066        proc-vmstat.nr_page_table_pages
>    19325131            -1.6%   19014535        proc-vmstat.numa_hit
>    19274656            -1.6%   18964467        proc-vmstat.numa_local
>    19877211            -1.6%   19563123        proc-vmstat.pgalloc_normal
>    28020416            -2.0%   27451741        proc-vmstat.pgfault
>    19829318            -1.6%   19508263        proc-vmstat.pgfree
>        2679            -1.6%       2636        proc-vmstat.unevictable_pgs_culled
>        0.03 ± 10%     +30.9%       0.04 ±  2%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.02 ±  5%     +26.2%       0.02 ±  3%  perf-sched.total_sch_delay.average.ms
>       27.03 ±  2%     -12.4%      23.66        perf-sched.total_wait_and_delay.average.ms
>       23171           +18.2%      27385        perf-sched.total_wait_and_delay.count.ms
>       27.01 ±  2%     -12.5%      23.64        perf-sched.total_wait_time.average.ms
>      110.73 ±  4%     -71.1%      31.98        perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1662 ±  2%    +278.6%       6294        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      110.70 ±  4%     -71.1%      31.94        perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        5.94            +0.1        6.00        perf-stat.i.branch-miss-rate%
>        9184           +20.2%      11041        perf-stat.i.context-switches
>        1.96            +1.6%       1.99        perf-stat.i.cpi
>       71.73 ±  4%     +66.1%     119.11 ±  5%  perf-stat.i.cpu-migrations
>        0.53            -1.4%       0.52        perf-stat.i.ipc
>        3.79            -2.0%       3.71        perf-stat.i.metric.K/sec
>       90919            -2.0%      89065        perf-stat.i.minor-faults
>       90919            -2.0%      89065        perf-stat.i.page-faults
>        6.00            +0.1        6.06        perf-stat.overall.branch-miss-rate%
>        1.79            +1.2%       1.81        perf-stat.overall.cpi
>        0.56            -1.2%       0.55        perf-stat.overall.ipc
>        9154           +20.2%      11004        perf-stat.ps.context-switches
>       71.49 ±  4%     +66.1%     118.72 ±  5%  perf-stat.ps.cpu-migrations
>       90616            -2.0%      88768        perf-stat.ps.minor-faults
>       90616            -2.0%      88768        perf-stat.ps.page-faults
>        8.89            -0.2        8.68        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>        8.88            -0.2        8.66        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        3.51 ±  3%      -0.2        3.33        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
>        1.66 ±  2%      -0.1        1.57 ±  4%  perf-profile.calltrace.cycles-pp.setlocale
>        0.27 ±100%      +0.3        0.61 ±  5%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>        0.18 ±141%      +0.4        0.60 ±  5%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
>       62.46            +0.6       63.01        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>        0.09 ±223%      +0.6        0.65 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>        0.09 ±223%      +0.6        0.65 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       49.01            +0.6       49.60        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       67.47            +0.7       68.17        perf-profile.calltrace.cycles-pp.common_startup_64
>       20.25            -0.7       19.58        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       20.21            -0.7       19.54        perf-profile.children.cycles-pp.do_syscall_64
>        6.54            -0.2        6.33        perf-profile.children.cycles-pp.asm_exc_page_fault
>        6.10            -0.2        5.90        perf-profile.children.cycles-pp.do_user_addr_fault
>        3.77 ±  3%      -0.2        3.60        perf-profile.children.cycles-pp.x64_sys_call
>        3.62 ±  3%      -0.2        3.46        perf-profile.children.cycles-pp.do_exit
>        2.63 ±  3%      -0.2        2.48 ±  2%  perf-profile.children.cycles-pp.__mmput
>        2.16 ±  2%      -0.1        2.06 ±  3%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>        1.66 ±  2%      -0.1        1.57 ±  4%  perf-profile.children.cycles-pp.setlocale
>        2.69 ±  2%      -0.1        2.61        perf-profile.children.cycles-pp.do_pte_missing
>        0.77 ±  5%      -0.1        0.70 ±  6%  perf-profile.children.cycles-pp.tlb_finish_mmu
>        0.92 ±  2%      -0.0        0.87 ±  4%  perf-profile.children.cycles-pp.__irqentry_text_end
>        0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.tick_nohz_tick_stopped
>        0.10 ± 11%      -0.0        0.07 ± 21%  perf-profile.children.cycles-pp.__percpu_counter_init_many
>        0.14 ±  9%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.strnlen
>        0.12 ± 11%      -0.0        0.10 ±  8%  perf-profile.children.cycles-pp.mas_prev_slot
>        0.11 ± 12%      +0.0        0.14 ±  9%  perf-profile.children.cycles-pp.update_curr
>        0.19 ±  8%      +0.0        0.22 ±  6%  perf-profile.children.cycles-pp.enqueue_entity
>        0.10 ± 11%      +0.0        0.13 ± 11%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
>        0.05 ± 46%      +0.0        0.08 ± 13%  perf-profile.children.cycles-pp.select_task_rq
>        0.13 ± 14%      +0.0        0.17 ±  8%  perf-profile.children.cycles-pp.perf_pmu_sched_task
>        0.20 ± 10%      +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.try_to_wake_up
>        0.28 ±  9%      +0.1        0.34 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>        0.04 ± 44%      +0.1        0.11 ± 13%  perf-profile.children.cycles-pp.__queue_work
>        0.30 ± 11%      +0.1        0.38 ±  8%  perf-profile.children.cycles-pp.ttwu_do_activate
>        0.30 ±  4%      +0.1        0.38 ±  8%  perf-profile.children.cycles-pp.__pick_next_task
>        0.22 ±  7%      +0.1        0.29 ±  9%  perf-profile.children.cycles-pp.try_to_block_task
>        0.02 ±141%      +0.1        0.09 ± 10%  perf-profile.children.cycles-pp.kick_pool
>        0.02 ± 99%      +0.1        0.10 ± 19%  perf-profile.children.cycles-pp.queue_work_on
>        0.25 ±  4%      +0.1        0.35 ±  7%  perf-profile.children.cycles-pp.sched_ttwu_pending
>        0.33 ±  6%      +0.1        0.43 ±  5%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>        0.29 ±  4%      +0.1        0.39 ±  6%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>        0.51 ±  6%      +0.1        0.63 ±  6%  perf-profile.children.cycles-pp.schedule_idle
>        0.46 ±  7%      +0.1        0.58 ±  5%  perf-profile.children.cycles-pp.schedule
>        0.88 ±  6%      +0.2        1.04 ±  5%  perf-profile.children.cycles-pp.ret_from_fork_asm
>        0.18 ±  6%      +0.2        0.34 ±  8%  perf-profile.children.cycles-pp.worker_thread
>        0.88 ±  6%      +0.2        1.04 ±  5%  perf-profile.children.cycles-pp.ret_from_fork
>        0.38 ±  8%      +0.2        0.56 ± 10%  perf-profile.children.cycles-pp.kthread
>        1.08 ±  3%      +0.2        1.32 ±  2%  perf-profile.children.cycles-pp.__schedule
>       66.15            +0.5       66.64        perf-profile.children.cycles-pp.cpuidle_idle_call
>       62.89            +0.6       63.47        perf-profile.children.cycles-pp.cpuidle_enter_state
>       63.00            +0.6       63.59        perf-profile.children.cycles-pp.cpuidle_enter
>       49.10            +0.6       49.69        perf-profile.children.cycles-pp.intel_idle
>       67.47            +0.7       68.17        perf-profile.children.cycles-pp.do_idle
>       67.47            +0.7       68.17        perf-profile.children.cycles-pp.common_startup_64
>       67.47            +0.7       68.17        perf-profile.children.cycles-pp.cpu_startup_entry
>        0.91 ±  2%      -0.0        0.86 ±  4%  perf-profile.self.cycles-pp.__irqentry_text_end
>        0.14 ± 11%      +0.1        0.22 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
>       49.08            +0.6       49.68        perf-profile.self.cycles-pp.intel_idle
> 
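On the "± N%" annotations: these track run-to-run variance; as I read the
LKP convention (an assumption on my part, not stated in this report), they
are the sample standard deviation across repeated runs expressed as a
percentage of the mean, and rows without the annotation had negligible
spread. A small sketch with made-up run values:

    import statistics

    def pct_stddev(samples):
        # "+- N%": sample stddev as a percentage of the mean (assumed convention)
        return statistics.stdev(samples) / statistics.mean(samples) * 100.0

    # hypothetical repeated runs of aim9.shell_rtns_3.ops_per_sec
    runs = [300.0, 320.0, 335.0]
    print(f"{statistics.mean(runs):.2f} +- {pct_stddev(runs):.0f}%")  # 318.33 +- 6%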
> 
> 
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
>    gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/800%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>     3745213 ± 39%    +108.1%    7794858 ± 12%  cpuidle..usage
>      186670           +17.3%     218939 ±  2%  meminfo.Percpu
>        5.00          +306.7%      20.33 ± 66%  mpstat.max_utilization.seconds
>        9.35 ± 76%      -4.5        4.80 ±141%  perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
>        8.90 ± 75%      -4.3        4.57 ±141%  perf-profile.calltrace.cycles-pp.perf_session__deliver_event.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
>        3283 ±  7%     -16.2%       2751 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.avg
>        3283 ±  7%     -16.2%       2751 ±  5%  sched_debug.cfs_rq:/.min_vruntime.avg
>     1522512 ±  6%     +80.0%    2739797 ±  4%  vmstat.system.cs
>      308726 ±  8%     +60.5%     495472 ±  5%  vmstat.system.in
>      467562            +3.7%     485068 ±  2%  proc-vmstat.nr_kernel_stack
>      266084            +3.8%     276310        proc-vmstat.nr_slab_unreclaimable
>   1.375e+08            -2.0%  1.347e+08        proc-vmstat.numa_hit
>   1.373e+08            -2.0%  1.346e+08        proc-vmstat.numa_local
>      217472 ±  3%     -28.1%     156410        proc-vmstat.numa_other
>   1.382e+08            -2.0%  1.354e+08        proc-vmstat.pgalloc_normal
>   1.375e+08            -2.0%  1.347e+08        proc-vmstat.pgfree
>     1514102            -6.2%    1420287        hackbench.throughput
>     1480357            -6.7%    1380775        hackbench.throughput_avg
>     1514102            -6.2%    1420287        hackbench.throughput_best
>     1436918            -7.9%    1323413        hackbench.throughput_worst
>    14551264 ± 13%    +138.1%   34644707 ±  3%  hackbench.time.involuntary_context_switches
>        9919            -1.6%       9762        hackbench.time.percent_of_cpu_this_job_got
>        4239            +4.5%       4428        hackbench.time.system_time
>    56365933 ±  6%     +65.3%   93172066 ±  4%  hackbench.time.voluntary_context_switches
>    65085618           +26.7%   82440571 ±  2%  perf-stat.i.branch-misses
>       31.25            -1.6       29.66        perf-stat.i.cache-miss-rate%
>   2.469e+08            +8.9%  2.689e+08        perf-stat.i.cache-misses
>   7.519e+08           +15.9%  8.712e+08        perf-stat.i.cache-references
>     1353061 ±  7%     +87.5%    2537450 ±  5%  perf-stat.i.context-switches
>   2.269e+11            +3.5%  2.348e+11        perf-stat.i.cpu-cycles
>      134588 ± 13%     +81.9%     244825 ±  8%  perf-stat.i.cpu-migrations
>       13.60 ±  5%     +70.5%      23.20 ±  5%  perf-stat.i.metric.K/sec
>        1.26            +7.6%       1.35        perf-stat.overall.MPKI
>        0.11 ±  2%      +0.0        0.14 ±  2%  perf-stat.overall.branch-miss-rate%
>       34.12            -2.1       31.97        perf-stat.overall.cache-miss-rate%
>        1.17            +1.8%       1.19        perf-stat.overall.cpi
>      931.96            -5.3%     882.44        perf-stat.overall.cycles-between-cache-misses
>        0.85            -1.8%       0.84        perf-stat.overall.ipc
>   5.372e+10            -1.2%   5.31e+10        perf-stat.ps.branch-instructions
>    57783128 ±  2%     +32.9%   76802898 ±  2%  perf-stat.ps.branch-misses
>   2.696e+08            +7.2%   2.89e+08        perf-stat.ps.cache-misses
>   7.902e+08           +14.4%  9.039e+08        perf-stat.ps.cache-references
>     1288664 ±  7%     +94.6%    2508227 ±  5%  perf-stat.ps.context-switches
>   2.512e+11            +1.5%   2.55e+11        perf-stat.ps.cpu-cycles
>      122960 ± 14%     +82.3%     224127 ±  9%  perf-stat.ps.cpu-migrations
>   1.108e+13            +5.7%  1.171e+13        perf-stat.total.instructions
>        0.94 ±223%   +5929.9%      56.62 ±121%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
>       26.44 ± 81%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      100.25 ±141%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        9.01 ± 43%   +1823.1%     173.24 ±106%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
>       49.43 ± 14%     +73.8%      85.93 ± 19%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
>      130.63 ± 17%    +135.8%     308.04 ± 28%  perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>       18.09 ± 30%    +130.4%      41.70 ± 26%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      196.51 ± 21%    +102.9%     398.77 ± 15%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>       34.17 ± 39%    +191.1%      99.46 ± 20%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>      154.91 ±163%   +1649.9%       2710 ± 91%  perf-sched.sch_delay.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>        0.94 ±223%  +1.9e+05%       1743 ±120%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
>        3.19 ±124%     -91.9%       0.26 ±150%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      646.26 ± 94%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      282.66 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       63.17 ± 52%   +2854.4%       1866 ±121%  perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
>        1507 ± 35%    +249.4%       5266 ± 47%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        3915 ± 67%     +98.7%       7779 ± 16%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       53.31 ± 18%     +79.9%      95.90 ± 23%  perf-sched.total_sch_delay.average.ms
>      149.37 ± 18%     +80.0%     268.92 ± 22%  perf-sched.total_wait_and_delay.average.ms
>       96.07 ± 18%     +80.1%     173.01 ± 21%  perf-sched.total_wait_time.average.ms
>      244.53 ± 47%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
>      529.64 ± 20%     +38.5%     733.60 ± 20%  perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
>      136.52 ± 15%     +73.7%     237.07 ± 18%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
>      373.41 ± 16%    +136.3%     882.34 ± 27%  perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>       51.96 ± 29%    +127.5%     118.22 ± 25%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      554.86 ± 23%    +103.0%       1126 ± 14%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>      298.52 ±136%    +436.9%       1602 ± 27%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
>      556.66 ± 37%     -97.1%      16.09 ± 47%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      707.67 ± 31%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
>        1358 ± 28%   +4707.9%      65291 ± 27%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       12184 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
>        1393 ±134%    +379.9%       6685 ± 15%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
>        6927 ±  6%    +119.8%      15224 ± 19%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      341.61 ± 21%     +39.1%     475.15 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
>       51.39 ± 99%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      121.14 ±122%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       87.09 ± 15%     +73.6%     151.14 ± 18%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
>      242.78 ± 16%    +136.6%     574.31 ± 27%  perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>       33.86 ± 29%    +126.0%      76.52 ± 24%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      250.32 ±109%     -89.4%      26.44 ±111%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
>      358.36 ± 25%    +103.1%     727.72 ± 14%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>       77.40 ± 47%    +102.5%     156.70 ± 28%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
>       17.91 ± 42%     -75.3%       4.42 ± 76%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
>      266.70 ±137%    +431.6%       1417 ± 36%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
>      536.93 ± 40%     -97.4%      13.81 ± 50%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      180.38 ±135%   +2208.8%       4164 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>        1028 ±129%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      312.94 ±123%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>      418.66 ±132%     -93.7%      26.44 ±111%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
>        1388 ±133%    +379.7%       6660 ± 15%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
>        2022 ± 25%    +164.9%       5358 ± 46%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 
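Two of the derived perf-stat rows are simple reciprocals: cpi is
cycles-per-instruction and ipc is instructions-per-cycle, so
overall.cpi * overall.ipc should be ~1. A quick check against the
perf-stat.overall rows quoted in the sections above:

    # (cpi, ipc) pairs from perf-stat.overall rows in the tables above
    for cpi, ipc in [(1.79, 0.56), (1.81, 0.55), (1.17, 0.85), (1.19, 0.84)]:
        assert abs(1.0 / cpi - ipc) < 0.01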
> 
> 
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_1/aim9/300s
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>       11004           +86.2%      20490        meminfo.PageTables
>      121.33 ± 12%     +18.8%     144.17 ±  5%  perf-c2c.DRAM.remote
>        9155           +20.0%      10990        vmstat.system.cs
>        5129 ± 20%    +107.2%      10631 ±  3%  numa-meminfo.node0.PageTables
>        5864 ± 17%     +67.3%       9811 ±  3%  numa-meminfo.node1.PageTables
>        1278 ± 20%    +107.9%       2658 ±  3%  numa-vmstat.node0.nr_page_table_pages
>        1469 ± 17%     +66.4%       2446 ±  3%  numa-vmstat.node1.nr_page_table_pages
>      319.43            -2.1%     312.66        aim9.shell_rtns_1.ops_per_sec
>    27217846            -2.5%   26546962        aim9.time.minor_page_faults
>     1051878            -2.1%    1029547        aim9.time.voluntary_context_switches
>       30502           +18.6%      36187        sched_debug.cpu.nr_switches.avg
>       90327 ± 12%     +22.7%     110866 ±  4%  sched_debug.cpu.nr_switches.max
>       26316 ± 16%     +25.5%      33021 ±  5%  sched_debug.cpu.nr_switches.stddev
>        0.03 ±  7%     +70.7%       0.05 ± 53%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.02 ±  3%     +38.9%       0.02 ± 28%  perf-sched.total_sch_delay.average.ms
>       27.43 ±  2%     -14.5%      23.45        perf-sched.total_wait_and_delay.average.ms
>       23174           +18.0%      27340        perf-sched.total_wait_and_delay.count.ms
>       27.41 ±  2%     -14.6%      23.42        perf-sched.total_wait_time.average.ms
>      115.38 ±  3%     -71.9%      32.37 ±  2%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1656 ±  3%    +280.2%       6299        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      115.35 ±  3%     -72.0%      32.31 ±  2%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        2737           +86.1%       5095        proc-vmstat.nr_page_table_pages
>       30460            +3.2%      31439        proc-vmstat.nr_shmem
>       27933            +1.8%      28432        proc-vmstat.nr_slab_unreclaimable
>    19466749            -2.5%   18980434        proc-vmstat.numa_hit
>    19414531            -2.5%   18927584        proc-vmstat.numa_local
>    20028107            -2.5%   19528806        proc-vmstat.pgalloc_normal
>    28087705            -2.4%   27417155        proc-vmstat.pgfault
>    19980173            -2.5%   19474402        proc-vmstat.pgfree
>      420074            -5.7%     396239 ±  8%  proc-vmstat.pgreuse
>        2685            -1.9%       2633        proc-vmstat.unevictable_pgs_culled
>    5.48e+08            -1.2%  5.412e+08        perf-stat.i.branch-instructions
>        5.92            +0.1        6.00        perf-stat.i.branch-miss-rate%
>        9195           +19.9%      11021        perf-stat.i.context-switches
>        1.96            +1.7%       1.99        perf-stat.i.cpi
>       70.13           +73.4%     121.59 ±  8%  perf-stat.i.cpu-migrations
>   2.725e+09            -1.3%   2.69e+09        perf-stat.i.instructions
>        0.53            -1.6%       0.52        perf-stat.i.ipc
>        3.80            -2.4%       3.71        perf-stat.i.metric.K/sec
>       91139            -2.4%      88949        perf-stat.i.minor-faults
>       91139            -2.4%      88949        perf-stat.i.page-faults
>        5.00 ± 44%      +1.1        6.07        perf-stat.overall.branch-miss-rate%
>        1.49 ± 44%     +21.9%       1.82        perf-stat.overall.cpi
>        7643 ± 44%     +43.7%      10984        perf-stat.ps.context-switches
>       58.17 ± 44%    +108.4%     121.21 ±  8%  perf-stat.ps.cpu-migrations
>        2.06 ±  2%      -0.2        1.87 ± 12%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.98 ±  7%      -0.2        0.83 ± 12%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>        1.69 ±  2%      -0.1        1.54 ±  2%  perf-profile.calltrace.cycles-pp.setlocale
>        0.58 ±  5%      -0.1        0.44 ± 44%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__open64_nocancel.setlocale
>        0.72 ±  6%      -0.1        0.60 ±  8%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
>        3.21 ±  2%      -0.1        3.11        perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
>        0.70 ±  4%      -0.1        0.62 ±  6%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
>        1.52 ±  2%      -0.1        1.44 ±  3%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        1.34 ±  3%      -0.1        1.28 ±  3%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>        0.89 ±  3%      -0.1        0.84        perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>        0.17 ±141%      +0.4        0.61 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>        0.17 ±141%      +0.4        0.61 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       65.10            +0.5       65.56        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>       66.40            +0.6       67.00        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>       66.46            +0.6       67.08        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
>       66.46            +0.6       67.08        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
>       67.63            +0.7       68.30        perf-profile.calltrace.cycles-pp.common_startup_64
>       20.14            -0.6       19.51        perf-profile.children.cycles-pp.do_syscall_64
>       20.20            -0.6       19.57        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>        1.13 ±  5%      -0.2        0.98 ±  9%  perf-profile.children.cycles-pp.rcu_core
>        1.69 ±  2%      -0.1        1.54 ±  2%  perf-profile.children.cycles-pp.setlocale
>        0.84 ±  4%      -0.1        0.71 ±  5%  perf-profile.children.cycles-pp.rcu_do_batch
>        2.16 ±  2%      -0.1        2.04 ±  3%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>        1.15 ±  4%      -0.1        1.04 ±  5%  perf-profile.children.cycles-pp.__open64_nocancel
>        3.22 ±  2%      -0.1        3.12        perf-profile.children.cycles-pp.exec_binprm
>        2.09 ±  2%      -0.1        2.00 ±  2%  perf-profile.children.cycles-pp.kernel_clone
>        0.88 ±  4%      -0.1        0.79 ±  4%  perf-profile.children.cycles-pp.mas_store_prealloc
>        2.19            -0.1        2.10 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
>        0.70 ±  4%      -0.1        0.62 ±  6%  perf-profile.children.cycles-pp.dup_mm
>        1.36 ±  3%      -0.1        1.30        perf-profile.children.cycles-pp._Fork
>        0.56 ±  4%      -0.1        0.50 ±  8%  perf-profile.children.cycles-pp.dup_mmap
>        0.09 ± 16%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
>        0.31 ±  8%      -0.1        0.25 ± 10%  perf-profile.children.cycles-pp.strncpy_from_user
>        0.94 ±  3%      -0.1        0.88 ±  2%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>        0.41 ±  5%      -0.0        0.36 ±  5%  perf-profile.children.cycles-pp.irqtime_account_irq
>        0.18 ± 12%      -0.0        0.14 ±  7%  perf-profile.children.cycles-pp.tlb_remove_table_rcu
>        0.20 ±  7%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.perf_event_task_tick
>        0.08 ± 14%      -0.0        0.05 ± 49%  perf-profile.children.cycles-pp.mas_update_gap
>        0.24 ±  5%      -0.0        0.21 ±  5%  perf-profile.children.cycles-pp.filemap_read
>        0.19 ±  7%      -0.0        0.16 ±  8%  perf-profile.children.cycles-pp.__call_rcu_common
>        0.22 ±  2%      -0.0        0.19 ±  5%  perf-profile.children.cycles-pp.mas_next_slot
>        0.09 ±  5%      +0.0        0.12 ±  7%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
>        0.05 ± 47%      +0.0        0.08 ± 10%  perf-profile.children.cycles-pp.lru_gen_del_folio
>        0.10 ± 14%      +0.0        0.12 ± 18%  perf-profile.children.cycles-pp.__folio_mod_stat
>        0.12 ± 12%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.perf_pmu_sched_task
>        0.20 ± 10%      +0.0        0.24 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
>        0.06 ± 47%      +0.0        0.10 ± 11%  perf-profile.children.cycles-pp.__queue_work
>        0.56 ±  5%      +0.1        0.61 ±  4%  perf-profile.children.cycles-pp.sched_balance_domains
>        0.04 ± 72%      +0.1        0.09 ± 11%  perf-profile.children.cycles-pp.kick_pool
>        0.04 ± 72%      +0.1        0.09 ± 14%  perf-profile.children.cycles-pp.queue_work_on
>        0.33 ±  6%      +0.1        0.38 ±  7%  perf-profile.children.cycles-pp.dequeue_entities
>        0.35 ±  6%      +0.1        0.40 ±  7%  perf-profile.children.cycles-pp.dequeue_task_fair
>        0.52 ±  6%      +0.1        0.58 ±  5%  perf-profile.children.cycles-pp.enqueue_task_fair
>        0.54 ±  7%      +0.1        0.60 ±  5%  perf-profile.children.cycles-pp.enqueue_task
>        0.28 ±  9%      +0.1        0.35 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>        0.21 ±  4%      +0.1        0.28 ± 12%  perf-profile.children.cycles-pp.try_to_block_task
>        0.34 ±  4%      +0.1        0.42 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
>        0.36 ±  3%      +0.1        0.46 ±  6%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>        0.28 ±  4%      +0.1        0.38 ±  5%  perf-profile.children.cycles-pp.sched_ttwu_pending
>        0.33 ±  2%      +0.1        0.43 ±  5%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>        0.46 ±  7%      +0.1        0.56 ±  6%  perf-profile.children.cycles-pp.schedule
>        0.48 ±  8%      +0.1        0.61 ±  8%  perf-profile.children.cycles-pp.timerqueue_del
>        0.18 ± 13%      +0.1        0.32 ± 11%  perf-profile.children.cycles-pp.worker_thread
>        0.38 ±  9%      +0.2        0.52 ± 10%  perf-profile.children.cycles-pp.kthread
>        1.10 ±  5%      +0.2        1.25 ±  2%  perf-profile.children.cycles-pp.__schedule
>        0.85 ±  8%      +0.2        1.01 ±  7%  perf-profile.children.cycles-pp.ret_from_fork
>        0.85 ±  8%      +0.2        1.02 ±  7%  perf-profile.children.cycles-pp.ret_from_fork_asm
>       63.15            +0.5       63.64        perf-profile.children.cycles-pp.cpuidle_enter
>       66.26            +0.5       66.77        perf-profile.children.cycles-pp.cpuidle_idle_call
>       66.46            +0.6       67.08        perf-profile.children.cycles-pp.start_secondary
>       67.63            +0.7       68.30        perf-profile.children.cycles-pp.common_startup_64
>       67.63            +0.7       68.30        perf-profile.children.cycles-pp.cpu_startup_entry
>       67.63            +0.7       68.30        perf-profile.children.cycles-pp.do_idle
>        1.20 ±  3%      -0.1        1.12 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>        0.09 ± 16%      -0.1        0.03 ± 70%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
>        0.25 ±  6%      -0.0        0.21 ± 12%  perf-profile.self.cycles-pp.irqtime_account_irq
>        0.02 ±141%      +0.0        0.06 ± 13%  perf-profile.self.cycles-pp.prepend_path
>        0.13 ± 10%      +0.1        0.24 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
> 
> 
> 
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
>    gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>   3.924e+08 ±  3%     +55.1%  6.086e+08 ±  2%  cpuidle..time
>     7504886 ± 11%    +184.4%   21340245 ±  6%  cpuidle..usage
>    13350305            -3.8%   12848570        vmstat.system.cs
>     1849619            +5.1%    1943754        vmstat.system.in
>        3.56 ±  5%      +2.6        6.16 ±  7%  mpstat.cpu.all.idle%
>        0.69            +0.2        0.90 ±  3%  mpstat.cpu.all.irq%
>        0.03 ±  3%      +0.0        0.04 ±  3%  mpstat.cpu.all.soft%
>       18666 ±  9%     +41.2%      26352 ±  6%  perf-c2c.DRAM.remote
>      197041           -39.6%     118945 ±  5%  perf-c2c.HITM.local
>        3178 ± 12%     +37.2%       4361 ± 11%  perf-c2c.HITM.remote
>      200219           -38.4%     123307 ±  5%  perf-c2c.HITM.total
>     2842579 ± 11%     +60.1%    4550025 ± 12%  meminfo.Active
>     2842579 ± 11%     +60.1%    4550025 ± 12%  meminfo.Active(anon)
>     5535242 ±  5%     +30.9%    7248257 ±  7%  meminfo.Cached
>     3846718 ±  8%     +44.0%    5539484 ±  9%  meminfo.Committed_AS
>     9684149 ±  3%     +20.5%   11666616 ±  4%  meminfo.Memused
>      136127 ±  3%     +14.2%     155524        meminfo.PageTables
>       62144           +22.8%      76336        meminfo.Percpu
>     2001586 ± 16%     +85.6%    3714611 ± 14%  meminfo.Shmem
>     9759598 ±  3%     +20.0%   11714619 ±  4%  meminfo.max_used_kB
>      710625 ± 11%     +59.3%    1131770 ± 11%  proc-vmstat.nr_active_anon
>     1383631 ±  5%     +30.6%    1806419 ±  7%  proc-vmstat.nr_file_pages
>       34220 ±  3%     +13.9%      38987        proc-vmstat.nr_page_table_pages
>      500216 ± 16%     +84.5%     923007 ± 14%  proc-vmstat.nr_shmem
>      710625 ± 11%     +59.3%    1131770 ± 11%  proc-vmstat.nr_zone_active_anon
>    92308030            +8.7%  1.004e+08        proc-vmstat.numa_hit
>    92171407            +8.7%  1.002e+08        proc-vmstat.numa_local
>      133616            +2.7%     137265        proc-vmstat.numa_other
>    92394313            +8.7%  1.004e+08        proc-vmstat.pgalloc_normal
>    91035691            +7.8%   98094626        proc-vmstat.pgfree
>      867815           +11.8%     970369        hackbench.throughput
>      830278           +11.6%     926834        hackbench.throughput_avg
>      867815           +11.8%     970369        hackbench.throughput_best
>      760822           +14.2%     869145        hackbench.throughput_worst
>       72.87           -10.3%      65.36        hackbench.time.elapsed_time
>       72.87           -10.3%      65.36        hackbench.time.elapsed_time.max
>   2.493e+08           -17.7%  2.052e+08        hackbench.time.involuntary_context_switches
>       12357            -3.9%      11879        hackbench.time.percent_of_cpu_this_job_got
>        8029           -14.8%       6842        hackbench.time.system_time
>      976.58            -5.5%     923.21        hackbench.time.user_time
>    7.54e+08           -14.4%  6.451e+08        hackbench.time.voluntary_context_switches
>   5.598e+10            +6.6%  5.965e+10        perf-stat.i.branch-instructions
>        0.40            -0.0        0.38        perf-stat.i.branch-miss-rate%
>        8.36 ±  2%      +4.6       12.98 ±  3%  perf-stat.i.cache-miss-rate%
>    2.11e+09           -33.8%  1.396e+09        perf-stat.i.cache-references
>    13687653            -3.4%   13225338        perf-stat.i.context-switches
>        1.36            -7.9%       1.25        perf-stat.i.cpi
>   3.219e+11            -2.2%  3.147e+11        perf-stat.i.cpu-cycles
>        1915 ±  2%      -6.6%       1788 ±  3%  perf-stat.i.cycles-between-cache-misses
>   2.371e+11            +6.0%  2.512e+11        perf-stat.i.instructions
>        0.74            +8.5%       0.80        perf-stat.i.ipc
>        1.15 ± 14%     -28.3%       0.82 ± 23%  perf-stat.i.major-faults
>      115.09            -3.2%     111.40        perf-stat.i.metric.K/sec
>        0.37            -0.0        0.35        perf-stat.overall.branch-miss-rate%
>        8.15 ±  3%      +4.6       12.74 ±  3%  perf-stat.overall.cache-miss-rate%
>        1.36            -7.7%       1.25        perf-stat.overall.cpi
>        1875 ±  2%      -5.5%       1772 ±  4%  perf-stat.overall.cycles-between-cache-misses
>        0.74            +8.3%       0.80        perf-stat.overall.ipc
>   5.524e+10            +6.4%  5.877e+10        perf-stat.ps.branch-instructions
>   2.079e+09           -33.9%  1.375e+09        perf-stat.ps.cache-references
>    13486088            -3.4%   13020988        perf-stat.ps.context-switches
>   3.175e+11            -2.3%  3.101e+11        perf-stat.ps.cpu-cycles
>    2.34e+11            +5.8%  2.475e+11        perf-stat.ps.instructions
>        1.09 ± 14%     -28.3%       0.78 ± 21%  perf-stat.ps.major-faults
>    1.73e+13            -5.1%  1.642e+13        perf-stat.total.instructions
>     3527725           +10.7%    3905361        sched_debug.cfs_rq:/.avg_vruntime.avg
>     3975260           +14.1%    4535959 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.max
>       98657 ± 17%     +84.9%     182407 ± 18%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>       11.83 ±  7%     +17.6%      13.92 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.max
>        2.71 ±  5%     +21.8%       3.30 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
>       11.75 ±  7%     +17.7%      13.83 ±  6%  sched_debug.cfs_rq:/.h_nr_runnable.max
>        2.68 ±  4%     +21.2%       3.25 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
>        4556 ±223%    +691.0%      36039 ± 34%  sched_debug.cfs_rq:/.left_deadline.avg
>      583131 ±223%    +577.3%    3949548 ±  4%  sched_debug.cfs_rq:/.left_deadline.max
>       51341 ±223%    +622.0%     370695 ± 16%  sched_debug.cfs_rq:/.left_deadline.stddev
>        4555 ±223%    +691.0%      36035 ± 34%  sched_debug.cfs_rq:/.left_vruntime.avg
>      583105 ±223%    +577.3%    3949123 ±  4%  sched_debug.cfs_rq:/.left_vruntime.max
>       51338 ±223%    +622.0%     370651 ± 16%  sched_debug.cfs_rq:/.left_vruntime.stddev
>     3527725           +10.7%    3905361        sched_debug.cfs_rq:/.min_vruntime.avg
>     3975260           +14.1%    4535959 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
>       98657 ± 17%     +84.9%     182407 ± 18%  sched_debug.cfs_rq:/.min_vruntime.stddev
>        0.22 ±  5%     +13.9%       0.25 ±  5%  sched_debug.cfs_rq:/.nr_queued.stddev
>        4555 ±223%    +691.0%      36035 ± 34%  sched_debug.cfs_rq:/.right_vruntime.avg
>      583105 ±223%    +577.3%    3949123 ±  4%  sched_debug.cfs_rq:/.right_vruntime.max
>       51338 ±223%    +622.0%     370651 ± 16%  sched_debug.cfs_rq:/.right_vruntime.stddev
>        1336 ±  7%     +50.8%       2014 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
>      552.53 ±  8%     +19.6%     660.87 ±  5%  sched_debug.cfs_rq:/.util_est.avg
>      384.27 ±  9%     +28.9%     495.43 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
>        1328 ± 17%     +42.7%       1896 ± 13%  sched_debug.cpu.curr->pid.stddev
>       11.75 ±  8%     +19.1%      14.00 ±  6%  sched_debug.cpu.nr_running.max
>        2.71 ±  5%     +22.7%       3.33 ±  4%  sched_debug.cpu.nr_running.stddev
>       76578 ±  9%     +33.7%     102390 ±  5%  sched_debug.cpu.nr_switches.stddev
>       62.25 ±  7%     +17.9%      73.42 ±  7%  sched_debug.cpu.nr_uninterruptible.max
>        8.11 ± 58%     -82.0%       1.46 ± 47%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
>       12.04 ±104%     -86.8%       1.58 ± 55%  perf-sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
>        0.11 ±123%     -95.3%       0.01 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
>        0.06 ±103%     -93.6%       0.00 ±154%  perf-sched.sch_delay.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
>        0.10 ±109%     -93.9%       0.01 ±163%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
>        1.00 ± 21%     -59.6%       0.40 ± 50%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
>       14.54 ± 14%     -79.2%       3.02 ± 51%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
>        1.50 ± 84%     -74.1%       0.39 ± 90%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>        1.13 ± 68%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.38 ± 97%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        1.10 ± 17%     -68.9%       0.34 ± 49%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
>       42.25 ± 18%     -71.7%      11.96 ± 53%  perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>        3.25 ± 17%     -77.5%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       29.17 ± 33%     -62.0%      11.09 ± 85%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       46.25 ± 15%     -68.8%      14.43 ± 52%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>        3.72 ± 70%     -81.0%       0.70 ± 67%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        7.95 ± 55%     -69.7%       2.41 ± 65%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
>        3.66 ±139%     -97.1%       0.11 ± 58%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
>        3.05 ± 44%     -91.9%       0.25 ± 57%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
>       29.96 ±  9%     -83.6%       4.90 ± 48%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
>       26.20 ± 59%     -88.9%       2.92 ± 66%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        0.14 ± 84%     -91.2%       0.01 ±142%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
>        0.20 ±149%     -97.5%       0.01 ±102%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
>        0.11 ±144%     -96.6%       0.00 ±154%  perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
>        0.19 ±118%     -96.7%       0.01 ±163%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
>      274.64 ± 95%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        3.72 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        3135 ±  5%     -48.6%       1611 ± 57%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1320 ± 19%     -78.6%     282.01 ± 74%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>      265.55 ± 82%     -77.9%      58.70 ±124%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
>        1850 ± 28%     -59.1%     757.74 ± 68%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
>      766.85 ± 56%     -68.0%     245.51 ± 51%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        1.77 ± 17%     -71.9%       0.50 ± 49%  perf-sched.total_sch_delay.average.ms
>        5.15 ± 17%     -69.5%       1.57 ± 48%  perf-sched.total_wait_and_delay.average.ms
>        3.38 ± 17%     -68.2%       1.07 ± 48%  perf-sched.total_wait_time.average.ms
>        5100 ±  3%     -31.0%       3522 ± 47%  perf-sched.total_wait_time.max.ms
>       27.42 ± 49%     -85.2%       4.07 ± 47%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
>       35.29 ± 80%     -85.8%       5.00 ± 51%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
>       42.28 ± 14%     -79.4%       8.70 ± 51%  perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
>        3.12 ± 17%     -66.4%       1.05 ± 48%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
>      122.62 ± 18%     -70.4%      36.26 ± 53%  perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>      250.26 ± 65%     -94.2%      14.56 ± 55%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        9.37 ± 17%     -78.2%       2.05 ± 48%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       58.34 ± 33%     -62.0%      22.18 ± 85%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>      134.44 ± 15%     -69.3%      41.24 ± 52%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>       86.94 ±  6%     -83.1%      14.68 ± 48%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
>       86.57 ± 39%     -86.0%      12.14 ± 59%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      647.92 ± 48%     -97.9%      13.86 ± 45%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        6386 ±  6%     -46.8%       3397 ± 57%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        3868 ± 27%     -60.4%       1531 ± 67%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
>        1647 ± 55%     -67.7%     531.51 ± 50%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        5014 ±  5%     -32.5%       3385 ± 47%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       19.31 ± 47%     -86.5%       2.61 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
>       23.25 ± 70%     -85.3%       3.42 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
>       18.33 ± 15%     -42.0%      10.64 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        0.11 ±123%     -95.3%       0.01 ±102%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
>        0.06 ±103%     -93.6%       0.00 ±154%  perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
>        0.10 ±109%     -93.9%       0.01 ±163%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
>        1.70 ± 21%     -52.6%       0.81 ± 48%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
>       27.74 ± 15%     -79.5%       5.68 ± 51%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
>        2.17 ± 75%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.42 ± 97%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        2.02 ± 17%     -65.1%       0.70 ± 48%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
>       80.37 ± 18%     -69.8%      24.31 ± 52%  perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
>      210.13 ± 68%     -95.1%      10.21 ± 55%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        6.12 ± 17%     -78.5%       1.32 ± 48%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       29.17 ± 33%     -62.0%      11.09 ± 85%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       88.19 ± 16%     -69.6%      26.81 ± 52%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>       13.77 ± 45%     -65.7%       4.72 ± 53%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
>      104.64 ± 42%     -76.4%      24.74 ±135%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
>        5.16 ± 29%     -92.5%       0.39 ± 48%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
>       56.98 ±  5%     -82.9%       9.77 ± 48%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
>       60.36 ± 32%     -84.7%       9.22 ± 57%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      619.88 ± 43%     -98.0%      12.52 ± 45%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.14 ± 84%     -91.2%       0.01 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
>      740.14 ± 35%     -68.5%     233.31 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
>        0.20 ±149%     -97.5%       0.01 ±102%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
>        0.11 ±144%     -96.6%       0.00 ±154%  perf-sched.wait_time.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
>        0.19 ±118%     -96.7%       0.01 ±163%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
>      327.64 ± 71%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        3.72 ±151%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>        3299 ±  6%     -40.7%       1957 ± 51%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      436.75 ± 39%     -76.9%     100.85 ± 98%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
>        2112 ± 19%     -62.3%     796.34 ± 63%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
>      947.83 ± 46%     -58.8%     390.83 ± 53%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        5014 ±  5%     -32.5%       3385 ± 47%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 
> 
> 
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_2/aim9/300s
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>       11036           +85.7%      20499        meminfo.PageTables
>      125.17 ±  8%     +18.4%     148.17 ±  7%  perf-c2c.HITM.local
>       30464           +18.7%      36160        sched_debug.cpu.nr_switches.avg
>        9166           +19.8%      10985        vmstat.system.cs
>        6623 ± 17%     +60.8%      10652 ±  5%  numa-meminfo.node0.PageTables
>        4414 ± 26%    +123.2%       9853 ±  6%  numa-meminfo.node1.PageTables
>        1653 ± 17%     +60.1%       2647 ±  5%  numa-vmstat.node0.nr_page_table_pages
>        1097 ± 26%    +123.9%       2457 ±  6%  numa-vmstat.node1.nr_page_table_pages
>      319.08            -2.2%     312.04        aim9.shell_rtns_2.ops_per_sec
>    27170926            -2.2%   26586121        aim9.time.minor_page_faults
>     1051038            -2.2%    1027732        aim9.time.voluntary_context_switches
>        2736           +86.4%       5101        proc-vmstat.nr_page_table_pages
>       28014            +1.3%      28378        proc-vmstat.nr_slab_unreclaimable
>    19332129            -1.5%   19048363        proc-vmstat.numa_hit
>    19283853            -1.5%   18996609        proc-vmstat.numa_local
>    19892794            -1.5%   19598065        proc-vmstat.pgalloc_normal
>    28044189            -2.1%   27457289        proc-vmstat.pgfault
>    19843766            -1.5%   19543091        proc-vmstat.pgfree
>      419715            -5.7%     395688 ±  8%  proc-vmstat.pgreuse
>        2682            -2.0%       2628        proc-vmstat.unevictable_pgs_culled
>        0.07 ±  6%     -30.5%       0.05 ± 22%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>        0.03 ±  6%     +36.0%       0.04        perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.07 ± 33%     -57.5%       0.03 ± 53%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
>        0.02 ± 74%    +112.0%       0.05 ± 36%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
>        0.02           +24.1%       0.02 ±  2%  perf-sched.total_sch_delay.average.ms
>       27.52           -14.0%      23.67        perf-sched.total_wait_and_delay.average.ms
>       23179           +18.3%      27421        perf-sched.total_wait_and_delay.count.ms
>       27.50           -14.0%      23.65        perf-sched.total_wait_time.average.ms
>      117.03 ±  3%     -72.4%      32.27 ±  2%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        1655 ±  2%    +282.0%       6324        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.96 ± 29%     +51.6%       1.45 ± 22%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
>      117.00 ±  3%     -72.5%      32.23 ±  2%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        5.93            +0.1        6.00        perf-stat.i.branch-miss-rate%
>        9189           +19.8%      11011        perf-stat.i.context-switches
>        1.96            +1.6%       1.99        perf-stat.i.cpi
>       71.21           +60.6%     114.39 ±  4%  perf-stat.i.cpu-migrations
>        0.53            -1.5%       0.52        perf-stat.i.ipc
>        3.79            -2.1%       3.71        perf-stat.i.metric.K/sec
>       90998            -2.1%      89084        perf-stat.i.minor-faults
>       90998            -2.1%      89084        perf-stat.i.page-faults
>        5.99            +0.1        6.06        perf-stat.overall.branch-miss-rate%
>        1.79            +1.4%       1.82        perf-stat.overall.cpi
>        0.56            -1.3%       0.55        perf-stat.overall.ipc
>        9158           +19.8%      10974        perf-stat.ps.context-switches
>       70.99           +60.6%     114.02 ±  4%  perf-stat.ps.cpu-migrations
>       90694            -2.1%      88787        perf-stat.ps.minor-faults
>       90695            -2.1%      88787        perf-stat.ps.page-faults
>   8.155e+11            -1.1%  8.065e+11        perf-stat.total.instructions
>        8.87            -0.3        8.55        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>        8.86            -0.3        8.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        2.53 ±  2%      -0.1        2.43 ±  2%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
>        2.54            -0.1        2.44 ±  2%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>        2.49            -0.1        2.40 ±  2%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
>        0.98 ±  5%      -0.1        0.90 ±  5%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.70 ±  3%      -0.1        0.62 ±  6%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
>        0.18 ±141%      +0.5        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>        0.18 ±141%      +0.5        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>        0.00            +0.6        0.59 ±  7%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>       62.48            +0.7       63.14        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>       49.10            +0.7       49.78        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       67.62            +0.8       68.43        perf-profile.calltrace.cycles-pp.common_startup_64
>       20.14            -0.7       19.40        perf-profile.children.cycles-pp.do_syscall_64
>       20.18            -0.7       19.44        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>        3.33 ±  2%      -0.2        3.16 ±  2%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>        3.22 ±  2%      -0.2        3.06        perf-profile.children.cycles-pp.do_mmap
>        3.51 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.do_exit
>        3.52 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.__x64_sys_exit_group
>        3.52 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.do_group_exit
>        3.67            -0.1        3.54        perf-profile.children.cycles-pp.x64_sys_call
>        2.21            -0.1        2.09 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
>        2.07 ±  2%      -0.1        1.94 ±  2%  perf-profile.children.cycles-pp.path_openat
>        2.09 ±  2%      -0.1        1.97 ±  2%  perf-profile.children.cycles-pp.do_filp_open
>        2.19            -0.1        2.08 ±  3%  perf-profile.children.cycles-pp.do_sys_openat2
>        1.50 ±  4%      -0.1        1.39 ±  3%  perf-profile.children.cycles-pp.copy_process
>        2.56            -0.1        2.46 ±  2%  perf-profile.children.cycles-pp.exit_mm
>        2.55            -0.1        2.44 ±  2%  perf-profile.children.cycles-pp.__mmput
>        2.51 ±  2%      -0.1        2.41 ±  2%  perf-profile.children.cycles-pp.exit_mmap
>        0.70 ±  3%      -0.1        0.62 ±  6%  perf-profile.children.cycles-pp.dup_mm
>        0.94 ±  4%      -0.1        0.89 ±  2%  perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
>        0.57 ±  3%      -0.0        0.52 ±  4%  perf-profile.children.cycles-pp.alloc_pages_noprof
>        0.20 ± 12%      -0.0        0.15 ± 10%  perf-profile.children.cycles-pp.perf_event_task_tick
>        0.18 ±  4%      -0.0        0.14 ± 15%  perf-profile.children.cycles-pp.xas_find
>        0.10 ± 12%      -0.0        0.07 ± 24%  perf-profile.children.cycles-pp.up_write
>        0.09 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.tick_check_broadcast_expired
>        0.08 ± 12%      +0.0        0.10 ±  8%  perf-profile.children.cycles-pp.hrtimer_try_to_cancel
>        0.10 ± 13%      +0.0        0.13 ±  5%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
>        0.20 ±  8%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
>        0.21 ±  9%      +0.0        0.25 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
>        0.03 ±101%      +0.0        0.07 ± 16%  perf-profile.children.cycles-pp.run_ksoftirqd
>        0.04 ± 71%      +0.1        0.09 ± 15%  perf-profile.children.cycles-pp.kick_pool
>        0.05 ± 47%      +0.1        0.11 ± 16%  perf-profile.children.cycles-pp.__queue_work
>        0.28 ±  5%      +0.1        0.34 ±  7%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>        0.50            +0.1        0.56 ±  2%  perf-profile.children.cycles-pp.timerqueue_del
>        0.04 ± 71%      +0.1        0.11 ± 17%  perf-profile.children.cycles-pp.queue_work_on
>        0.51 ±  4%      +0.1        0.58 ±  2%  perf-profile.children.cycles-pp.enqueue_task_fair
>        0.32 ±  3%      +0.1        0.40 ±  4%  perf-profile.children.cycles-pp.ttwu_do_activate
>        0.53 ±  5%      +0.1        0.61 ±  3%  perf-profile.children.cycles-pp.enqueue_task
>        0.49 ±  4%      +0.1        0.57 ±  6%  perf-profile.children.cycles-pp.schedule
>        0.28 ±  6%      +0.1        0.38        perf-profile.children.cycles-pp.sched_ttwu_pending
>        0.32 ±  5%      +0.1        0.43 ±  2%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>        0.35 ±  8%      +0.1        0.47 ±  2%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>        0.17 ± 10%      +0.2        0.34 ± 12%  perf-profile.children.cycles-pp.worker_thread
>        0.88 ±  3%      +0.2        1.06 ±  4%  perf-profile.children.cycles-pp.ret_from_fork
>        0.88 ±  3%      +0.2        1.06 ±  4%  perf-profile.children.cycles-pp.ret_from_fork_asm
>        0.39 ±  6%      +0.2        0.59 ±  7%  perf-profile.children.cycles-pp.kthread
>       66.24            +0.6       66.85        perf-profile.children.cycles-pp.cpuidle_idle_call
>       63.09            +0.6       63.73        perf-profile.children.cycles-pp.cpuidle_enter
>       62.97            +0.6       63.61        perf-profile.children.cycles-pp.cpuidle_enter_state
>       67.61            +0.8       68.43        perf-profile.children.cycles-pp.do_idle
>       67.62            +0.8       68.43        perf-profile.children.cycles-pp.common_startup_64
>       67.62            +0.8       68.43        perf-profile.children.cycles-pp.cpu_startup_entry
>        0.37 ± 11%      -0.1        0.31 ±  3%  perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
>        0.10 ± 13%      -0.0        0.06 ± 50%  perf-profile.self.cycles-pp.up_write
>        0.15 ±  4%      +0.1        0.22 ±  8%  perf-profile.self.cycles-pp.timerqueue_del
> 
> 
> 
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/exec_test/aim9/300s
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>       12120           +76.7%      21422        meminfo.PageTables
>        8543           +26.9%      10840        vmstat.system.cs
>        6148 ± 11%     +89.9%      11678 ±  5%  numa-meminfo.node0.PageTables
>        5909 ± 11%     +64.0%       9689 ±  7%  numa-meminfo.node1.PageTables
>        1532 ± 10%     +90.5%       2919 ±  5%  numa-vmstat.node0.nr_page_table_pages
>        1468 ± 11%     +65.2%       2426 ±  7%  numa-vmstat.node1.nr_page_table_pages
>        2991           +78.0%       5323        proc-vmstat.nr_page_table_pages
>    32726750            -2.4%   31952115        proc-vmstat.pgfault
>        1228            -2.6%       1197        aim9.exec_test.ops_per_sec
>       11018 ±  2%     +10.5%      12178 ±  2%  aim9.time.involuntary_context_switches
>    31835059            -2.4%   31062527        aim9.time.minor_page_faults
>      736468            -2.9%     715310        aim9.time.voluntary_context_switches
>        0.28 ±  7%     +11.3%       0.31 ±  6%  sched_debug.cfs_rq:/.h_nr_queued.stddev
>        0.28 ±  7%     +11.3%       0.31 ±  6%  sched_debug.cfs_rq:/.nr_queued.stddev
>      356683 ± 16%     +27.0%     453000 ±  9%  sched_debug.cpu.avg_idle.min
>       27620 ±  7%     +29.5%      35775        sched_debug.cpu.nr_switches.avg
>       84830 ± 14%     +16.3%      98648 ±  4%  sched_debug.cpu.nr_switches.max
>        4563 ± 26%     +46.2%       6671 ± 26%  sched_debug.cpu.nr_switches.min
>        0.03 ±  4%     -67.3%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exec_release.exec_mm_release.exec_mmap
>        0.03           +11.2%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.05 ± 28%     +61.3%       0.09 ± 21%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
>        0.10 ± 18%     +18.8%       0.12        perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>        0.02 ±  3%     +18.3%       0.02 ±  2%  perf-sched.total_sch_delay.average.ms
>       28.80           -19.8%      23.10 ±  3%  perf-sched.total_wait_and_delay.average.ms
>       22332           +24.4%      27778        perf-sched.total_wait_and_delay.count.ms
>       28.78           -19.8%      23.07 ±  3%  perf-sched.total_wait_time.average.ms
>       17.39 ± 10%     -15.6%      14.67 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>       41.02 ±  4%     -54.6%      18.64 ±  6%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        4795 ±  2%    +122.5%      10668        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       17.35 ± 10%     -15.7%      14.63 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>        0.00 ±141%    +400.0%       0.00 ± 44%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>       40.99 ±  4%     -54.6%      18.61 ±  6%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.00 ±149%    +542.9%       0.03 ± 41%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
>   5.617e+08            -1.6%  5.529e+08        perf-stat.i.branch-instructions
>        5.76            +0.1        5.84        perf-stat.i.branch-miss-rate%
>        8562           +27.0%      10878        perf-stat.i.context-switches
>        1.87            +2.6%       1.92        perf-stat.i.cpi
>       78.02 ±  3%     +11.8%      87.23 ±  2%  perf-stat.i.cpu-migrations
>   2.792e+09            -1.6%  2.748e+09        perf-stat.i.instructions
>        0.55            -2.5%       0.54        perf-stat.i.ipc
>        4.42            -2.4%       4.31        perf-stat.i.metric.K/sec
>      106019            -2.4%     103509        perf-stat.i.minor-faults
>      106019            -2.4%     103509        perf-stat.i.page-faults
>        5.83            +0.1        5.91        perf-stat.overall.branch-miss-rate%
>        1.72            +2.3%       1.76        perf-stat.overall.cpi
>        0.58            -2.3%       0.57        perf-stat.overall.ipc
>   5.599e+08            -1.6%  5.511e+08        perf-stat.ps.branch-instructions
>        8534           +27.0%      10841        perf-stat.ps.context-switches
>       77.77 ±  3%     +11.8%      86.96 ±  2%  perf-stat.ps.cpu-migrations
>   2.783e+09            -1.6%  2.739e+09        perf-stat.ps.instructions
>      105666            -2.4%     103164        perf-stat.ps.minor-faults
>      105666            -2.4%     103164        perf-stat.ps.page-faults
>   8.386e+11            -1.6%  8.253e+11        perf-stat.total.instructions
>        7.79            -0.4        7.41 ±  2%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
>        7.75            -0.3        7.47        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>        7.73            -0.3        7.46        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
>        2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        2.73 ±  2%      -0.2        2.57 ±  2%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
>        2.61            -0.1        2.47 ±  3%  perf-profile.calltrace.cycles-pp.execve.exec_test
>        2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
>        2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve.exec_test
>        1.92 ±  3%      -0.1        1.79 ±  2%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
>        1.92 ±  3%      -0.1        1.80 ±  2%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
>        4.68            -0.1        4.57        perf-profile.calltrace.cycles-pp._Fork
>        1.88 ±  2%      -0.1        1.77 ±  2%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
>        2.76            -0.1        2.66 ±  2%  perf-profile.calltrace.cycles-pp.exec_test
>        3.24            -0.1        3.16        perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.84 ±  4%      -0.1        0.77 ±  5%  perf-profile.calltrace.cycles-pp.wait4
>        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>        0.46 ± 45%      +0.3        0.78 ±  5%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>        0.17 ±141%      +0.4        0.53 ±  4%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
>        0.18 ±141%      +0.4        0.54 ±  2%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>       66.08            +0.8       66.85        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
>       66.08            +0.8       66.85        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
>       66.02            +0.8       66.80        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
>       67.06            +0.9       68.00        perf-profile.calltrace.cycles-pp.common_startup_64
>       21.19            -0.9       20.30        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       21.15            -0.9       20.27        perf-profile.children.cycles-pp.do_syscall_64
>        7.92            -0.4        7.53 ±  2%  perf-profile.children.cycles-pp.execve
>        7.94            -0.4        7.56 ±  2%  perf-profile.children.cycles-pp.__x64_sys_execve
>        7.84            -0.4        7.46 ±  2%  perf-profile.children.cycles-pp.do_execveat_common
>        5.51            -0.3        5.25 ±  2%  perf-profile.children.cycles-pp.load_elf_binary
>        3.68            -0.2        3.49 ±  2%  perf-profile.children.cycles-pp.__mmput
>        2.81 ±  2%      -0.2        2.63        perf-profile.children.cycles-pp.__x64_sys_exit_group
>        2.80 ±  2%      -0.2        2.62 ±  2%  perf-profile.children.cycles-pp.do_exit
>        2.81 ±  2%      -0.2        2.62 ±  2%  perf-profile.children.cycles-pp.do_group_exit
>        2.93 ±  2%      -0.2        2.76 ±  2%  perf-profile.children.cycles-pp.x64_sys_call
>        3.60            -0.2        3.44 ±  2%  perf-profile.children.cycles-pp.exit_mmap
>        5.66            -0.1        5.51        perf-profile.children.cycles-pp.__handle_mm_fault
>        1.94 ±  3%      -0.1        1.82 ±  2%  perf-profile.children.cycles-pp.exit_mm
>        2.64            -0.1        2.52 ±  3%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>        2.55 ±  2%      -0.1        2.43 ±  3%  perf-profile.children.cycles-pp.do_mmap
>        2.19 ±  2%      -0.1        2.08 ±  3%  perf-profile.children.cycles-pp.__mmap_region
>        2.27            -0.1        2.16 ±  2%  perf-profile.children.cycles-pp.begin_new_exec
>        2.79            -0.1        2.69 ±  2%  perf-profile.children.cycles-pp.exec_test
>        0.83 ±  4%      -0.1        0.76 ±  6%  perf-profile.children.cycles-pp.__mmap_prepare
>        0.86 ±  4%      -0.1        0.78 ±  5%  perf-profile.children.cycles-pp.wait4
>        0.52 ±  5%      -0.1        0.45 ±  7%  perf-profile.children.cycles-pp.kernel_wait4
>        0.50 ±  5%      -0.1        0.43 ±  6%  perf-profile.children.cycles-pp.do_wait
>        0.88 ±  3%      -0.1        0.81 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
>        0.51 ±  2%      -0.1        0.46 ±  6%  perf-profile.children.cycles-pp.setup_arg_pages
>        0.39 ±  2%      -0.0        0.34 ±  8%  perf-profile.children.cycles-pp.unlink_anon_vmas
>        0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
>        0.37 ±  5%      -0.0        0.33 ±  3%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
>        0.21 ±  6%      -0.0        0.17 ±  5%  perf-profile.children.cycles-pp.user_path_at
>        0.21 ±  3%      -0.0        0.18 ± 10%  perf-profile.children.cycles-pp.__percpu_counter_sum
>        0.18 ±  7%      -0.0        0.15 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
>        0.33 ±  5%      -0.0        0.30        perf-profile.children.cycles-pp.relocate_vma_down
>        0.04 ± 45%      +0.0        0.08 ± 12%  perf-profile.children.cycles-pp.__update_load_avg_se
>        0.14 ±  7%      +0.0        0.18 ± 10%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
>        0.19 ±  9%      +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.prepare_task_switch
>        0.02 ±142%      +0.0        0.06 ± 23%  perf-profile.children.cycles-pp.select_task_rq
>        0.03 ±100%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.task_contending
>        0.45 ±  7%      +0.1        0.51 ±  3%  perf-profile.children.cycles-pp.__pick_next_task
>        0.14 ± 22%      +0.1        0.20 ± 10%  perf-profile.children.cycles-pp.kick_pool
>        0.36 ±  4%      +0.1        0.42 ±  4%  perf-profile.children.cycles-pp.dequeue_entities
>        0.36 ±  4%      +0.1        0.44 ±  5%  perf-profile.children.cycles-pp.dequeue_task_fair
>        0.15 ± 20%      +0.1        0.23 ± 10%  perf-profile.children.cycles-pp.__queue_work
>        0.49 ±  5%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.schedule_idle
>        0.14 ± 22%      +0.1        0.23 ±  9%  perf-profile.children.cycles-pp.queue_work_on
>        0.36 ±  3%      +0.1        0.46 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>        0.47 ±  7%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.timerqueue_del
>        0.30 ± 13%      +0.1        0.42 ±  7%  perf-profile.children.cycles-pp.ttwu_do_activate
>        0.23 ± 15%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>        0.18 ± 14%      +0.1        0.32 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
>        0.19 ± 13%      +0.1        0.34 ±  4%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>        0.61 ±  3%      +0.2        0.76 ±  5%  perf-profile.children.cycles-pp.schedule
>        1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
>        1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
>        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.children.cycles-pp.kthread
>        1.22 ±  3%      +0.2        1.45 ±  5%  perf-profile.children.cycles-pp.__schedule
>        0.54 ±  8%      +0.2        0.78 ±  5%  perf-profile.children.cycles-pp.worker_thread
>       66.08            +0.8       66.85        perf-profile.children.cycles-pp.start_secondary
>       67.06            +0.9       68.00        perf-profile.children.cycles-pp.common_startup_64
>       67.06            +0.9       68.00        perf-profile.children.cycles-pp.cpu_startup_entry
>       67.06            +0.9       68.00        perf-profile.children.cycles-pp.do_idle
>        0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
>        0.04 ± 45%      +0.0        0.08 ± 10%  perf-profile.self.cycles-pp.__update_load_avg_se
>        0.14 ± 10%      +0.1        0.23 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
> 
> 
> 
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
>    gcc-12/performance/1BRD_48G/xfs/x86_64-rhel-9.4/600/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/sync_disk_rw/aim7
> 
> commit:
>    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
>    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> 
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>      344180 ±  6%     -13.0%     299325 ±  9%  meminfo.Mapped
>        9594 ±123%    +191.8%      27995 ± 54%  numa-meminfo.node1.PageTables
>        2399 ±123%    +191.3%       6989 ± 54%  numa-vmstat.node1.nr_page_table_pages
>     1860734            -5.2%    1763194        vmstat.io.bo
>      831686            +1.3%     842493        vmstat.system.cs
>       50372            -5.5%      47609        aim7.jobs-per-min
>     1435644           +11.5%    1600707        aim7.time.involuntary_context_switches
>        7242            +1.2%       7332        aim7.time.percent_of_cpu_this_job_got
>        5159            +7.1%       5526        aim7.time.system_time
>    33195986            +6.9%   35497140        aim7.time.voluntary_context_switches
>       40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>       40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
>      605972 ±  2%     +14.5%     693922 ±  7%  sched_debug.cpu.avg_idle.max
>       30974 ±  8%     -20.9%      24498 ± 15%  sched_debug.cpu.avg_idle.min
>      118758 ±  5%     +22.0%     144899 ±  6%  sched_debug.cpu.avg_idle.stddev
>      856253            +1.5%     869009        perf-stat.i.context-switches
>        3.06            +2.3%       3.13        perf-stat.i.cpi
>      164824            +7.7%     177546        perf-stat.i.cpu-migrations
>        7.93            +2.5%       8.13        perf-stat.i.metric.K/sec
>        3.41            +1.8%       3.47        perf-stat.overall.cpi
>        1355            +5.8%       1434 ±  4%  perf-stat.overall.cycles-between-cache-misses
>        0.29            -1.8%       0.29        perf-stat.overall.ipc
>      845412            +1.6%     858925        perf-stat.ps.context-switches
>      162728            +7.8%     175475        perf-stat.ps.cpu-migrations
>   4.391e+12            +5.0%  4.609e+12        perf-stat.total.instructions
>      444798            +6.0%     471383 ±  5%  proc-vmstat.nr_active_anon
>       28190            -2.8%      27402        proc-vmstat.nr_dirty
>     1231373            +2.3%    1259666 ±  2%  proc-vmstat.nr_file_pages
>       63763            +0.9%      64355        proc-vmstat.nr_inactive_file
>       86758 ±  6%     -12.9%      75546 ±  8%  proc-vmstat.nr_mapped
>       10162 ±  2%      +7.2%      10895 ±  3%  proc-vmstat.nr_page_table_pages
>      265229           +10.4%     292795 ±  9%  proc-vmstat.nr_shmem
>      444798            +6.0%     471383 ±  5%  proc-vmstat.nr_zone_active_anon
>       63763            +0.9%      64355        proc-vmstat.nr_zone_inactive_file
>       28191            -2.8%      27400        proc-vmstat.nr_zone_write_pending
>       24349           +11.6%      27171 ±  8%  proc-vmstat.pgreuse
>        0.02 ±  3%     +11.3%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
>        0.29 ± 17%     -30.7%       0.20 ± 14%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
>        0.03 ± 10%     +33.5%       0.04 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
>        0.21 ± 32%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        0.16 ± 16%     +51.9%       0.24 ± 11%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.22 ± 19%     +44.1%       0.32 ± 25%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
>        0.30 ± 28%     -38.7%       0.18 ± 28%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        0.11 ±  5%     +12.8%       0.12 ±  4%  perf-sched.sch_delay.avg.ms.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
>        0.08 ±  4%     +15.8%       0.09 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync
>        0.02 ±  3%     +13.7%       0.02 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
>        0.01 ±223%   +1289.5%       0.09 ±111%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
>        2.49 ± 40%     -43.4%       1.41 ± 50%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
>        0.76 ±  7%     +92.8%       1.46 ± 40%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
>        0.65 ± 41%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        1.40 ± 64%   +2968.7%      43.04 ± 13%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        0.63 ± 19%     +89.8%       1.19 ± 51%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       28.67 ±  3%     -11.2%      25.45 ±  5%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
>        0.80 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
>        5.76 ±107%    +152.4%      14.53 ± 10%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        8441          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
>       18.67 ± 71%    +108.0%      38.83 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_trans_commit
>      116.17 ±105%   +1677.8%       2065 ±  5%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      424.79 ±151%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
>       28.51 ±  3%     -11.2%      25.31 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
>        0.38 ± 59%     -79.0%       0.08 ±107%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
>        0.77 ±  9%     -56.5%       0.34 ±  3%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
>        1.80 ±138%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        6.13 ± 93%    +133.2%      14.29 ± 10%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>        1.00 ± 16%     -48.1%       0.52 ± 20%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
>        0.92 ± 16%     -62.0%       0.35 ± 14%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
>        0.26 ±  2%     -59.8%       0.11        perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
>        0.24 ±223%   +2180.2%       5.56 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
>        1.25 ± 77%     -79.8%       0.25 ±107%  perf-sched.wait_time.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
>        1.78 ± 51%    +958.6%      18.82 ±117%  perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map
>       58.48 ±  6%     -10.7%      52.22 ±  2%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.__flush_workqueue.xlog_cil_push_now.isra
>       10.87 ±192%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>        8.63 ± 27%     -63.9%       3.12 ± 29%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
> 
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
  2025-06-25 13:57     ` Mathieu Desnoyers
@ 2025-06-25 15:06       ` Gabriele Monaco
  2025-07-02 13:58       ` Gabriele Monaco
  1 sibling, 0 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-25 15:06 UTC (permalink / raw)
  To: Mathieu Desnoyers, kernel test robot
  Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
	Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
	Paul E. McKenney, Ingo Molnar



On Wed, 2025-06-25 at 09:57 -0400, Mathieu Desnoyers wrote:
> On 2025-06-25 04:01, kernel test robot wrote:
> > 
> > Hello,
> > 
> > kernel test robot noticed a 10.1% regression of hackbench.throughput on:
> 
> Hi Gabriele,
> 
> This is a significant regression. Can you investigate before it gets
> merged?
> 

Hi Mathieu,

I'll have a closer look at this next week.
For now let's keep this stalled.

Thanks,
Gabriele
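
P.S. For anyone who wants to eyeball scheduler wait/delay numbers like the
perf-sched figures in these LKP reports without the full robot setup, a
minimal local sketch (the hackbench group/loop counts below are
placeholders, not the robot's exact job parameters):

	# Trace scheduling events while a hackbench run is active, then
	# summarize per-task scheduling delays, worst cases first.
	perf sched record -- hackbench -g 16 -l 10000
	perf sched latency --sort max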

> Thanks,
> 
> Mathieu
> 
> > 
> > 
> > commit: f3de761c52148abfb1b4512914f64c7e1c737fc8 ("[RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct")
> > url: https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250613-171504
> > patch link: https://lore.kernel.org/all/20250613091229.21500-3-gmonaco@redhat.com/
> > patch subject: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
> > 
> > testcase: hackbench
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > parameters:
> > 
> > 	nr_threads: 100%
> > 	iterations: 4
> > 	mode: process
> > 	ipc: pipe
> > 	cpufreq_governor: performance
> > 
> > 
> > In addition to that, the commit also has significant impact on the following tests:
> > 
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput  2.9% regression                                               |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | ipc=socket                                                                                     |
> > |                  | iterations=4                                                                                   |
> > |                  | mode=process                                                                                   |
> > |                  | nr_threads=50%                                                                                 |
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.shell_rtns_3.ops_per_sec  1.7% regression                                           |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | test=shell_rtns_3                                                                              |
> > |                  | testtime=300s                                                                                  |
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput  6.2% regression                                               |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | ipc=pipe                                                                                       |
> > |                  | iterations=4                                                                                   |
> > |                  | mode=process                                                                                   |
> > |                  | nr_threads=800%                                                                                |
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.shell_rtns_1.ops_per_sec  2.1% regression                                           |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | test=shell_rtns_1                                                                              |
> > |                  | testtime=300s                                                                                  |
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput 11.8% improvement                                              |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | ipc=pipe                                                                                       |
> > |                  | iterations=4                                                                                   |
> > |                  | mode=process                                                                                   |
> > |                  | nr_threads=50%                                                                                 |
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.shell_rtns_2.ops_per_sec  2.2% regression                                           |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | test=shell_rtns_2                                                                              |
> > |                  | testtime=300s                                                                                  |
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.exec_test.ops_per_sec  2.6% regression                                              |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | test=exec_test                                                                                 |
> > |                  | testtime=300s                                                                                  |
> > +------------------+------------------------------------------------------------------------------------------------+
> > | testcase: change | aim7: aim7.jobs-per-min  5.5% regression                                                       |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory     |
> > | test parameters  | cpufreq_governor=performance                                                                   |
> > |                  | disk=1BRD_48G                                                                                  |
> > |                  | fs=xfs                                                                                         |
> > |                  | load=600                                                                                       |
> > |                  | test=sync_disk_rw                                                                              |
> > +------------------+------------------------------------------------------------------------------------------------+
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202506251555.de6720f7-lkp@intel.com
> > 
> > 
> > Details are as below:
> > --------------------------------------------------------------------------------------------------->
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250625/202506251555.de6720f7-lkp@intel.com
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> >    gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >       55140 ± 80%    +229.2%     181547 ± 20%  numa-meminfo.node1.Mapped
> >       13048 ± 80%    +248.2%      45431 ± 20%  numa-vmstat.node1.nr_mapped
> >      679.17 ± 22%     -25.3%     507.33 ± 10%  sched_debug.cfs_rq:/.util_est.max
> >   4.287e+08 ±  3%     +20.3%  5.158e+08        cpuidle..time
> >     2953716 ± 13%    +228.9%    9716185 ±  2%  cpuidle..usage
> >       91072 ± 12%    +134.8%     213855 ±  7%  meminfo.Mapped
> >     8848637           +10.4%    9769875 ±  5%  meminfo.Memused
> >        0.67 ±  4%      +0.1        0.78 ±  2%  mpstat.cpu.all.irq%
> >        0.03 ±  2%      +0.0        0.03 ±  4%  mpstat.cpu.all.soft%
> >        4.17 ±  8%    +596.0%      29.00 ± 31%  mpstat.max_utilization.seconds
> >        2950           -12.3%       2587        vmstat.procs.r
> >     4557607 ±  2%     +35.9%    6192548        vmstat.system.cs
> >      397195 ±  5%     +73.4%     688726        vmstat.system.in
> >     1490153           -10.1%    1339340        hackbench.throughput
> >     1424170            -8.7%    1299590        hackbench.throughput_avg
> >     1490153           -10.1%    1339340        hackbench.throughput_best
> >     1353181 ±  2%     -10.1%    1216523        hackbench.throughput_worst
> >    53158738 ±  3%     +34.0%   71240022        hackbench.time.involuntary_context_switches
> >       12177            -2.4%      11891        hackbench.time.percent_of_cpu_this_job_got
> >        4482            +7.6%       4821        hackbench.time.system_time
> >      798.92            +2.0%     815.24        hackbench.time.user_time
> >    1.54e+08 ±  3%     +46.6%  2.257e+08        hackbench.time.voluntary_context_switches
> >      210335            +3.3%     217333        proc-vmstat.nr_anon_pages
> >       23353 ± 14%    +136.2%      55152 ±  7%  proc-vmstat.nr_mapped
> >       61825 ±  3%      +6.6%      65928 ±  2%  proc-vmstat.nr_page_table_pages
> >       30859            +4.4%      32213        proc-vmstat.nr_slab_reclaimable
> >        1294 ±177%   +1657.1%      22743 ± 66%  proc-vmstat.numa_hint_faults
> >        1153 ±198%   +1597.0%      19566 ± 79%  proc-vmstat.numa_hint_faults_local
> >   1.242e+08            -3.2%  1.202e+08        proc-vmstat.numa_hit
> >   1.241e+08            -3.2%  1.201e+08        proc-vmstat.numa_local
> >        2195 ±110%   +2337.0%      53508 ± 55%  proc-vmstat.numa_pte_updates
> >   1.243e+08            -3.2%  1.203e+08        proc-vmstat.pgalloc_normal
> >      875909 ±  2%      +8.6%     951378 ±  2%  proc-vmstat.pgfault
> >   1.231e+08            -3.5%  1.188e+08        proc-vmstat.pgfree
> >   6.903e+10            -5.6%  6.514e+10        perf-stat.i.branch-instructions
> >        0.21            +0.0        0.26        perf-stat.i.branch-miss-rate%
> >    89225177 ±  2%     +38.3%  1.234e+08        perf-stat.i.branch-misses
> >       25.64 ±  2%      -5.7       19.95 ±  2%  perf-stat.i.cache-miss-rate%
> >   9.322e+08 ±  2%     +22.8%  1.145e+09        perf-stat.i.cache-references
> >     4553621 ±  2%     +39.8%    6363761        perf-stat.i.context-switches
> >        1.12            +4.5%       1.17        perf-stat.i.cpi
> >      186890 ±  2%    +143.9%     455784        perf-stat.i.cpu-migrations
> >   2.787e+11            -4.9%  2.649e+11        perf-stat.i.instructions
> >        0.91            -4.4%       0.87        perf-stat.i.ipc
> >       36.79 ±  2%     +44.9%      53.30        perf-stat.i.metric.K/sec
> >        0.13 ±  2%      +0.1        0.19        perf-stat.overall.branch-miss-rate%
> >       24.44 ±  2%      -4.7       19.74 ±  2%  perf-stat.overall.cache-miss-rate%
> >        1.12            +4.6%       1.17        perf-stat.overall.cpi
> >        0.89            -4.4%       0.85        perf-stat.overall.ipc
> >   6.755e+10            -5.4%  6.392e+10        perf-stat.ps.branch-instructions
> >    87121352 ±  2%     +38.5%  1.206e+08        perf-stat.ps.branch-misses
> >   9.098e+08 ±  2%     +23.1%   1.12e+09        perf-stat.ps.cache-references
> >     4443812 ±  2%     +39.9%    6218298        perf-stat.ps.context-switches
> >      181595 ±  2%    +144.5%     443985        perf-stat.ps.cpu-migrations
> >   2.727e+11            -4.7%  2.599e+11        perf-stat.ps.instructions
> >    1.21e+13            +4.3%  1.262e+13        perf-stat.total.instructions
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.ctx_resched.event_function.remote_function.generic_exec_single
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.ctx_resched.event_function.remote_function.generic_exec_single.smp_call_function_single
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.event_function.remote_function.generic_exec_single.smp_call_function_single.event_function_call
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.remote_function.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record.run_builtin
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-profile.calltrace.cycles-pp.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.__x64_sys_ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp._perf_ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.ctx_resched
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.event_function
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.generic_exec_single
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.perf_event_for_each_child
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.perf_ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-profile.children.cycles-pp.remote_function
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-
> > profile.children.cycles-pp.__evlist__enable
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-
> > profile.children.cycles-pp.perf_c2c__record
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-
> > profile.children.cycles-pp.perf_evsel__enable_cpu
> >       11.84 ± 91%      -8.4        3.49 ±154%  perf-
> > profile.children.cycles-pp.perf_evsel__run_ioctl
> >       11.84 ± 91%      -9.5        2.30 ±141%  perf-
> > profile.self.cycles-pp.__intel_pmu_enable_all
> >       23.74 ±185%     -98.6%       0.34 ±114%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
> >       12.77 ± 80%     -83.9%       2.05 ±138%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> >        5.93 ± 69%     -90.5%       0.56 ±105%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> >        6.70 ±152%     -94.5%       0.37 ±145%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> >        0.82 ± 85%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        8.59 ±202%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       13.53 ± 11%     -47.0%       7.18 ± 76%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> >       15.63 ± 17%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >       47.22 ± 77%     -85.5%       6.87 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> >      133.35 ±132%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       68.01 ±203%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       13.53 ± 11%     -47.0%       7.18 ± 76%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> >       34.59 ±  3%    -100.0%       0.00        perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >       40.97 ±  8%     -71.8%      11.55 ± 64%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> >      373.07 ±123%     -99.8%       0.78 ±156%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> >       13.53 ± 11%     -62.0%       5.14 ±107%  perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> >      120.97 ± 23%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >       46.03 ± 30%     -62.5%      17.27 ± 87%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> >      984.50 ± 14%     -43.5%     556.24 ± 58%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
> >      339.42 ± 12%     -97.3%       9.11 ± 54%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        8.00 ± 23%     -85.4%       1.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> >       22.17 ± 49%    -100.0%       0.00        perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >       73.83 ± 20%     -76.3%      17.50 ± 96%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> >       13.53 ± 11%     -62.0%       5.14 ±107%  perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> >      336.30 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >       23.74 ±185%     -98.6%       0.34 ±114%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
> >       14.48 ± 61%     -74.1%       3.76 ±152%  perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> >        6.48 ± 68%     -91.3%       0.56 ±105%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> >        6.70 ±152%     -94.5%       0.37 ±145%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> >        2.18 ± 75%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       10.79 ±165%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        1.53 ±100%     -97.5%       0.04 ± 84%  perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> >      105.34 ± 26%    -100.0%       0.00        perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >       29.72 ± 40%     -76.5%       7.00 ±102%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> >       32.21 ± 33%     -65.7%      11.04 ± 85%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> >      984.49 ± 14%     -43.5%     556.23 ± 58%  perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
> >      337.00 ± 12%     -97.6%       8.11 ± 52%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >       53.42 ± 59%     -69.8%      16.15 ±162%  perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> >      218.65 ± 83%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       82.52 ±162%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       10.89 ± 98%     -98.8%       0.13 ±134%  perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> >      334.02 ±  6%    -100.0%       0.00        perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 
> > 
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> >    gcc-12/performance/socket/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >      161258           -12.6%     141018 ±  5%  perf-c2c.HITM.total
> >        6514 ±  3%     +13.3%       7381 ±  3%  uptime.idle
> >      692218           +17.8%     815512        vmstat.system.in
> >   4.747e+08 ±  7%    +137.3%  1.127e+09 ± 21%  cpuidle..time
> >     5702271 ± 12%    +503.6%   34419686 ± 13%  cpuidle..usage
> >      141191 ±  2%     +10.3%     155768 ±  3%  meminfo.PageTables
> >       62180           +26.0%      78348        meminfo.Percpu
> >        2.20 ± 14%      +3.5        5.67 ± 20%  mpstat.cpu.all.idle%
> >        0.55            +0.2        0.72 ±  5%  mpstat.cpu.all.irq%
> >        0.04 ±  2%      +0.0        0.06 ±  5%  mpstat.cpu.all.soft%
> >      448780            -2.9%     435554        hackbench.throughput
> >      440656            -2.6%     429130        hackbench.throughput_avg
> >      448780            -2.9%     435554        hackbench.throughput_best
> >      425797            -2.2%     416584        hackbench.throughput_worst
> >    90998790           -15.0%   77364427 ±  6%  hackbench.time.involuntary_context_switches
> >       12446            -3.9%      11960        hackbench.time.percent_of_cpu_this_job_got
> >       16057            -1.4%      15825        hackbench.time.system_time
> >       63421            -2.3%      61955        proc-vmstat.nr_kernel_stack
> >       35455 ±  2%     +10.0%      38991 ±  3%  proc-vmstat.nr_page_table_pages
> >       34542            +5.1%      36312 ±  2%  proc-vmstat.nr_slab_reclaimable
> >      151083 ± 16%     +46.6%     221509 ± 17%  proc-vmstat.numa_hint_faults
> >      113731 ± 26%     +64.7%     187314 ± 20%  proc-vmstat.numa_hint_faults_local
> >      133591            +3.1%     137709        proc-vmstat.numa_other
> >       53696 ± 16%     -28.6%      38362 ± 10%  proc-vmstat.numa_pages_migrated
> >     1053504 ±  2%      +7.7%    1135052 ±  4%  proc-vmstat.pgfault
> >     2077549 ±  3%      +8.5%    2254157 ±  4%  proc-vmstat.pgfree
> >       53696 ± 16%     -28.6%      38362 ± 10%  proc-vmstat.pgmigrate_success
> >   4.941e+10            -2.6%   4.81e+10        perf-stat.i.branch-instructions
> >   2.232e+08            -1.9%  2.189e+08        perf-stat.i.branch-misses
> >    2.11e+09            -5.8%  1.989e+09 ±  2%  perf-stat.i.cache-references
> >   3.221e+11            -2.5%  3.141e+11        perf-stat.i.cpu-cycles
> >   2.365e+11            -2.7%  2.303e+11        perf-stat.i.instructions
> >        6787 ±  3%      +8.0%       7327 ±  4%  perf-stat.i.minor-faults
> >        6789 ±  3%      +8.0%       7329 ±  4%  perf-stat.i.page-faults
> >   4.904e+10            -2.5%  4.779e+10        perf-stat.ps.branch-instructions
> >   2.215e+08            -1.8%  2.174e+08        perf-stat.ps.branch-misses
> >   2.094e+09            -5.7%  1.974e+09 ±  2%  perf-stat.ps.cache-references
> >   3.197e+11            -2.4%   3.12e+11        perf-stat.ps.cpu-cycles
> >   2.348e+11            -2.6%  2.288e+11        perf-stat.ps.instructions
> >        6691 ±  3%      +7.2%       7174 ±  4%  perf-stat.ps.minor-faults
> >        6693 ±  3%      +7.2%       7176 ±  4%  perf-stat.ps.page-faults
> >     7475567           +16.4%    8699139        sched_debug.cfs_rq:/.avg_vruntime.avg
> >     8752154 ±  3%     +20.6%   10551563 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
> >      211424 ± 12%    +374.5%    1003211 ± 39%  sched_debug.cfs_rq:/.avg_vruntime.stddev
> >       19.44 ±  6%     +29.4%      25.17 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.max
> >        4.49 ±  4%     +33.5%       5.99 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
> >       19.33 ±  6%     +29.0%      24.94 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.max
> >        4.47 ±  4%     +33.4%       5.96 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
> >        6446 ±223%    +885.4%      63529 ± 57%  sched_debug.cfs_rq:/.left_deadline.avg
> >      825119 ±223%    +613.5%    5886958 ± 44%  sched_debug.cfs_rq:/.left_deadline.max
> >       72645 ±223%    +713.6%     591074 ± 49%  sched_debug.cfs_rq:/.left_deadline.stddev
> >        6446 ±223%    +885.5%      63527 ± 57%  sched_debug.cfs_rq:/.left_vruntime.avg
> >      825080 ±223%    +613.5%    5886805 ± 44%  sched_debug.cfs_rq:/.left_vruntime.max
> >       72642 ±223%    +713.7%     591058 ± 49%  sched_debug.cfs_rq:/.left_vruntime.stddev
> >        4202 ±  8%   +1115.1%      51069 ± 61%  sched_debug.cfs_rq:/.load.stddev
> >      367.11           +20.2%     441.44 ± 17%  sched_debug.cfs_rq:/.load_avg.max
> >     7475567           +16.4%    8699139        sched_debug.cfs_rq:/.min_vruntime.avg
> >     8752154 ±  3%     +20.6%   10551563 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
> >      211424 ± 12%    +374.5%    1003211 ± 39%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >        0.17 ± 16%     +39.8%       0.24 ±  6%  sched_debug.cfs_rq:/.nr_queued.stddev
> >        6446 ±223%    +885.5%      63527 ± 57%  sched_debug.cfs_rq:/.right_vruntime.avg
> >      825080 ±223%    +613.5%    5886805 ± 44%  sched_debug.cfs_rq:/.right_vruntime.max
> >       72642 ±223%    +713.7%     591058 ± 49%  sched_debug.cfs_rq:/.right_vruntime.stddev
> >      752.39 ± 81%     -81.4%     139.72 ± 53%  sched_debug.cfs_rq:/.runnable_avg.min
> >        2728 ±  3%     +51.2%       4126 ±  8%  sched_debug.cfs_rq:/.runnable_avg.stddev
> >      265.50 ±  2%     +12.3%     298.07 ±  2%  sched_debug.cfs_rq:/.util_avg.stddev
> >      686.78 ±  7%     +23.4%     847.76 ±  6%  sched_debug.cfs_rq:/.util_est.stddev
> >       19.44 ±  5%     +29.7%      25.22 ±  4%  sched_debug.cpu.nr_running.max
> >        4.48 ±  5%     +34.4%       6.02 ±  3%  sched_debug.cpu.nr_running.stddev
> >       67323 ± 14%    +130.3%     155017 ± 29%  sched_debug.cpu.nr_switches.stddev
> >      -20.78           -18.2%     -17.00        sched_debug.cpu.nr_uninterruptible.min
> >        0.13 ±100%     -85.8%       0.02 ±163%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> >        0.17 ±116%     -97.8%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> >       22.92 ±110%     -97.4%       0.59 ±137%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
> >        8.10 ± 45%     -78.0%       1.78 ±135%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> >        3.14 ± 19%     -70.9%       0.91 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> >       39.05 ±149%     -97.4%       1.01 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
> >       15.77 ±203%     -99.7%       0.04 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> >        1.27 ±177%     -98.2%       0.02 ±190%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> >        0.20 ±140%     -92.4%       0.02 ±201%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
> >       86.63 ±221%     -99.9%       0.05 ±184%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> >        0.18 ± 75%     -97.0%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> >        0.13 ± 34%     -75.5%       0.03 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> >        0.26 ±108%     -86.2%       0.04 ±142%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> >        2.33 ± 11%     -65.8%       0.80 ±107%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >        0.18 ± 88%     -91.1%       0.02 ±194%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> >        0.50 ±145%     -92.5%       0.04 ±210%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
> >        0.19 ±116%     -98.5%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> >        0.24 ±128%     -96.8%       0.01 ±180%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> >        0.99 ± 16%     -58.0%       0.42 ±100%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >        0.27 ±124%     -97.5%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
> >        1.08 ± 28%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        0.96 ± 93%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        0.53 ±182%     -94.2%       0.03 ±158%  perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> >        0.84 ±160%     -93.5%       0.05 ±100%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> >       29.39 ±172%     -94.0%       1.78 ±123%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> >       21.51 ± 60%     -74.7%       5.45 ±118%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       13.77 ± 61%     -81.3%       2.57 ±113%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       11.22 ± 33%     -74.5%       2.86 ±107%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >        1.99 ± 90%     -90.1%       0.20 ±100%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> >        4.50 ±138%     -94.9%       0.23 ±200%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
> >       27.91 ±218%     -99.6%       0.11 ±120%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >        9.91 ± 51%     -68.3%       3.15 ±124%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >       10.18 ± 24%     -62.4%       3.83 ±105%  perf-sched.sch_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
> >        1.16 ± 20%     -62.7%       0.43 ±106%  perf-sched.sch_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >        0.27 ± 99%     -92.0%       0.02 ±172%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> >        0.32 ±128%     -98.9%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> >        0.88 ± 94%     -86.7%       0.12 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> >      252.53 ±128%     -98.4%       4.12 ±138%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
> >       60.22 ± 58%     -67.8%      19.37 ±146%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> >      168.93 ±209%     -99.9%       0.15 ±100%  perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> >        3.79 ±169%     -98.6%       0.05 ±199%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> >      517.19 ±222%     -99.9%       0.29 ±201%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> >        0.54 ± 82%     -98.4%       0.01 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> >        0.34 ± 57%     -93.1%       0.02 ±203%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> >        0.64 ±141%     -99.4%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> >        0.28 ±111%     -97.2%       0.01 ±180%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> >        0.29 ±114%     -97.6%       0.01 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
> >      133.30 ± 46%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       12.53 ±135%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        1.11 ± 85%     -76.9%       0.26 ±202%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> >        7.48 ±214%     -99.0%       0.08 ±141%  perf-sched.sch_delay.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> >       28.59 ±191%     -99.0%       0.28 ±120%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> >      285.16 ±145%     -99.3%       1.94 ±111%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> >      143.71 ±128%     -91.0%      12.97 ±134%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> >      107.10 ±162%     -99.1%       0.95 ±190%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
> >      352.73 ±216%     -99.4%       2.06 ±118%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >        1169 ± 25%     -58.7%     482.79 ±101%  perf-sched.sch_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >        1.80 ± 20%     -58.5%       0.75 ±105%  perf-sched.total_sch_delay.average.ms
> >        5.09 ± 20%     -58.0%       2.14 ±106%  perf-sched.total_wait_and_delay.average.ms
> >       20.86 ± 25%     -82.0%       3.76 ±147%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> >        8.10 ± 21%     -69.1%       2.51 ±103%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> >       22.82 ± 27%     -66.9%       7.55 ±103%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> >        6.55 ± 13%     -64.1%       2.35 ±108%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >      139.95 ± 55%     -64.0%      50.45 ±122%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> >       27.54 ± 61%     -81.3%       5.15 ±113%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       27.75 ± 30%     -73.3%       7.42 ±106%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >       26.76 ± 25%     -64.2%       9.57 ±107%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> >       29.39 ± 34%     -67.3%       9.61 ±115%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >       27.53 ± 25%     -62.9%      10.21 ±105%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
> >        3.25 ± 20%     -62.2%       1.23 ±106%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >      864.18 ±  4%     -99.3%       6.27 ±103%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >      141.47 ± 38%     -72.9%      38.27 ±154%  perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> >        2346 ± 25%     -58.7%     969.53 ±101%  perf-sched.wait_and_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >       83.99 ±223%    -100.0%       0.02 ±163%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> >        0.16 ±122%     -97.7%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> >       12.76 ± 37%     -81.6%       2.35 ±125%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> >        4.96 ± 22%     -67.9%       1.59 ±104%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> >       75.22 ± 91%     -96.4%       2.67 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
> >       23.31 ±188%     -98.8%       0.28 ±195%  perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> >       14.93 ± 22%     -68.0%       4.78 ±104%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> >        1.29 ±178%     -98.5%       0.02 ±185%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> >        0.20 ±140%     -92.5%       0.02 ±200%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
> >       87.29 ±221%     -99.9%       0.05 ±184%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> >        0.18 ± 76%     -97.0%       0.01 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> >        0.12 ± 33%     -87.4%       0.02 ±212%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> >        4.22 ± 15%     -63.3%       1.55 ±108%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >        0.18 ± 88%     -91.1%       0.02 ±194%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> >        0.50 ±145%     -92.5%       0.04 ±210%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
> >        0.19 ±116%     -98.5%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> >        0.24 ±128%     -96.8%       0.01 ±180%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> >        1.79 ± 27%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        1.98 ± 92%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        2.44 ±199%     -98.1%       0.05 ±109%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> >      125.16 ± 52%     -64.6%      44.36 ±120%  perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> >       13.77 ± 61%     -81.3%       2.58 ±113%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       16.53 ± 29%     -72.5%       4.55 ±106%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >        3.11 ± 80%     -80.7%       0.60 ±138%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> >       17.30 ± 23%     -65.0%       6.05 ±107%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> >       50.76 ±143%     -98.1%       0.97 ±101%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >       19.48 ± 27%     -66.8%       6.46 ±111%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >       17.35 ± 25%     -63.3%       6.37 ±106%  perf-sched.wait_time.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
> >        2.09 ± 21%     -62.0%       0.79 ±107%  perf-sched.wait_time.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >      850.73 ±  6%     -99.3%       5.76 ±102%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >      168.00 ±223%    -100.0%       0.02 ±172%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> >        0.32 ±131%     -98.8%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> >        0.88 ± 94%     -86.7%       0.12 ±144%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> >       83.05 ± 45%     -75.0%      20.78 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> >      393.39 ± 76%     -96.3%      14.60 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
> >        3.87 ±170%     -98.6%       0.05 ±199%  perf-sched.wait_time.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> >      520.88 ±222%     -99.9%       0.29 ±201%  perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> >        0.54 ± 82%     -98.4%       0.01 ±141%  perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> >        0.34 ± 57%     -93.1%       0.02 ±203%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> >        0.64 ±141%     -99.4%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> >        0.28 ±111%     -97.2%       0.01 ±180%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> >      210.15 ± 42%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       34.48 ±131%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        1.11 ± 85%     -76.9%       0.26 ±202%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> >       92.32 ±212%     -99.7%       0.27 ±123%  perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> >        3252 ± 21%     -58.5%       1351 ±103%  perf-sched.wait_time.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> >        1602 ± 28%     -66.2%     541.12 ±100%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >      530.17 ± 95%     -98.5%       7.79 ±119%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >        1177 ± 25%     -58.7%     486.74 ±101%  perf-sched.wait_time.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >       50.88            -1.4       49.53        perf-profile.calltrace.cycles-pp.read
> >       45.95            -1.0       44.92        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
> >       45.66            -1.0       44.64        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> >        3.44 ±  4%      -0.8        2.66 ±  4%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
> >        3.32 ±  4%      -0.8        2.56 ±  4%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg
> >        3.28 ±  4%      -0.8        2.52 ±  4%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable
> >        3.48 ±  3%      -0.6        2.83 ±  5%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >        3.52 ±  3%      -0.6        2.87 ±  5%  perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
> >        3.45 ±  3%      -0.6        2.80 ±  5%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg
> >       47.06            -0.6       46.45        perf-profile.calltrace.cycles-pp.write
> >        4.26 ±  5%      -0.6        3.69        perf-profile.calltrace.cycles-pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write
> >        1.58 ±  3%      -0.6        1.02 ±  8%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
> >        1.31 ±  3%      -0.5        0.85 ±  8%  perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >        1.25 ±  3%      -0.4        0.81 ±  8%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> >        0.84 ±  3%      -0.2        0.60 ±  5%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> >        7.91            -0.2        7.68        perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
> >        3.17 ±  2%      -0.2        2.94        perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
> >        7.80            -0.2        7.58        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >        7.58            -0.2        7.36        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
> >        1.22 ±  4%      -0.2        1.02 ±  4%  perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
> >        1.18 ±  4%      -0.2        0.99 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_timeout
> >        0.87            -0.2        0.68 ±  8%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_timeout
> >        1.14 ±  4%      -0.2        0.95 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
> >        0.90            -0.2        0.72 ±  7%  perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
> >        3.45 ±  3%      -0.1        3.30        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> >        1.96            -0.1        1.82        perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
> >        1.97            -0.1        1.86        perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
> >        2.35            -0.1        2.25        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> >        2.58            -0.1        2.48        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
> >        1.38 ±  4%      -0.1        1.28 ±  2%  perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
> >        1.35            -0.1        1.25        perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
> >        0.67 ±  7%      -0.1        0.58 ±  3%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
> >        2.59            -0.1        2.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> >        2.02            -0.1        1.96        perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >        0.77 ±  3%      -0.0        0.72 ±  2%  perf-profile.calltrace.cycles-pp.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> >        0.65 ±  4%      -0.0        0.60 ±  2%  perf-profile.calltrace.cycles-pp.fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> >        0.74            -0.0        0.70        perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
> >        1.04            -0.0        0.99        perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb
> >        0.69            -0.0        0.65 ±  2%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter
> >        0.82            -0.0        0.80        perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
> >        0.57            -0.0        0.56        perf-profile.calltrace.cycles-pp.refill_obj_stock.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
> >        0.80 ±  9%      +0.2        1.01 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
> >        2.50 ±  4%      +0.3        2.82 ±  9%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >        2.64 ±  6%      +0.4        3.06 ± 12%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic
> >        2.73 ±  6%      +0.4        3.16 ± 12%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
> >        2.87 ±  6%      +0.4        3.30 ± 12%  perf-profile.calltrace.cycles-pp.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> >       18.38            +0.6       18.93        perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
> >        0.00            +0.7        0.70 ± 11%  perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
> >        0.00            +0.8        0.76 ± 16%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_stream_sendmsg.sock_write_iter.vfs_write
> >        0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
> >        0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> >        0.00            +1.5        1.46 ± 11%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >        0.00            +1.5        1.50 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> >        0.00            +1.5        1.52 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> >        0.00            +1.6        1.61 ± 11%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >        0.18 ±141%      +1.8        1.93 ± 11%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >        0.18 ±141%      +1.8        1.94 ± 11%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
> >        0.18 ±141%      +1.8        1.94 ± 11%  perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
> >        0.18 ±141%      +1.8        1.97 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
> >        0.00            +2.0        1.96 ± 11%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
> >       87.96            -1.4       86.57        perf-profile.children.cycles-pp.do_syscall_64
> >       88.72            -1.4       87.33        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       51.44            -1.4       50.05        perf-profile.children.cycles-pp.read
> >        4.55 ±  2%      -0.8        3.74 ±  5%  perf-profile.children.cycles-pp.schedule
> >        3.76 ±  4%      -0.7        3.02 ±  3%  perf-profile.children.cycles-pp.__wake_up_common
> >        3.64 ±  4%      -0.7        2.92 ±  3%  perf-profile.children.cycles-pp.autoremove_wake_function
> >        3.60 ±  4%      -0.7        2.90 ±  3%  perf-profile.children.cycles-pp.try_to_wake_up
> >        4.00 ±  2%      -0.6        3.36 ±  4%  perf-profile.children.cycles-pp.schedule_timeout
> >        4.65 ±  2%      -0.6        4.02 ±  4%  perf-profile.children.cycles-pp.__schedule
> >       47.64            -0.6       47.01        perf-profile.children.cycles-pp.write
> >        4.58 ±  4%      -0.5        4.06        perf-profile.children.cycles-pp.__wake_up_sync_key
> >        1.45 ±  2%      -0.4        1.00 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >        1.84 ±  3%      -0.3        1.50 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
> >        1.62 ±  2%      -0.3        1.33 ±  3%  perf-profile.children.cycles-pp.enqueue_task
> >        1.53 ±  2%      -0.3        1.26 ±  3%  perf-profile.children.cycles-pp.enqueue_task_fair
> >        1.40            -0.3        1.14 ±  6%  perf-profile.children.cycles-pp.pick_next_task_fair
> >        3.97            -0.2        3.73        perf-profile.children.cycles-pp.clear_bhb_loop
> >        1.43            -0.2        1.19 ±  5%  perf-profile.children.cycles-pp.__pick_next_task
> >        0.75 ±  4%      -0.2        0.52 ±  8%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
> >        7.95            -0.2        7.72        perf-profile.children.cycles-pp.unix_stream_read_actor
> >        7.84            -0.2        7.61        perf-profile.children.cycles-pp.skb_copy_datagram_iter
> >        3.24 ±  2%      -0.2        3.01        perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
> >        7.63            -0.2        7.42        perf-profile.children.cycles-pp.__skb_datagram_iter
> >        0.94 ±  4%      -0.2        0.73 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
> >        0.95 ±  8%      -0.2        0.76 ±  4%  perf-profile.children.cycles-pp.update_curr
> >        1.37 ±  3%      -0.2        1.18 ±  3%  perf-profile.children.cycles-pp.dequeue_task_fair
> >        1.34 ±  4%      -0.2        1.16 ±  3%  perf-profile.children.cycles-pp.try_to_block_task
> >        4.50            -0.2        4.34        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
> >        1.37 ±  3%      -0.2        1.20 ±  3%  perf-profile.children.cycles-pp.dequeue_entities
> >        3.48 ±  3%      -0.1        3.33        perf-profile.children.cycles-pp._copy_to_iter
> >        0.91            -0.1        0.78 ±  3%  perf-profile.children.cycles-pp.update_load_avg
> >        4.85            -0.1        4.72        perf-profile.children.cycles-pp.__check_object_size
> >        3.23            -0.1        3.11        perf-profile.children.cycles-pp.entry_SYSCALL_64
> >        0.54 ±  3%      -0.1        0.42 ±  5%  perf-profile.children.cycles-pp.switch_mm_irqs_off
> >        1.40 ±  4%      -0.1        1.30 ±  2%  perf-profile.children.cycles-pp._copy_from_iter
> >        2.02            -0.1        1.92        perf-profile.children.cycles-pp.its_return_thunk
> >        0.43 ±  2%      -0.1        0.32 ±  3%  perf-profile.children.cycles-pp.switch_fpu_return
> >        0.29 ±  2%      -0.1        0.18 ±  6%  perf-profile.children.cycles-pp.__enqueue_entity
> >        1.46 ±  3%      -0.1        1.36 ±  2%  perf-profile.children.cycles-pp.fdget_pos
> >        0.44 ±  3%      -0.1        0.34 ±  5%  perf-profile.children.cycles-pp.set_next_entity
> >        0.42 ±  2%      -0.1        0.32 ±  4%  perf-profile.children.cycles-pp.pick_task_fair
> >        0.31 ±  2%      -0.1        0.24 ±  6%  perf-profile.children.cycles-pp.reweight_entity
> >        0.28 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.__dequeue_entity
> >        1.96            -0.1        1.88        perf-profile.children.cycles-pp.obj_cgroup_charge_account
> >        0.28 ±  2%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.update_cfs_group
> >        0.23 ±  2%      -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.pick_eevdf
> >        0.26 ±  2%      -0.1        0.19 ±  4%  perf-profile.children.cycles-pp.wakeup_preempt
> >        1.46            -0.1        1.40        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> >        0.48 ±  2%      -0.1        0.42 ±  5%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
> >        0.30            -0.1        0.24 ±  4%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> >        0.82            -0.1        0.77        perf-profile.children.cycles-pp.__cond_resched
> >        0.27 ±  2%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_se
> >        0.14 ±  3%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.update_curr_se
> >        0.79            -0.0        0.74        perf-profile.children.cycles-pp.mutex_lock
> >        0.34 ±  3%      -0.0        0.30 ±  5%  perf-profile.children.cycles-pp.rseq_ip_fixup
> >        0.15 ±  4%      -0.0        0.11 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> >        0.21 ±  3%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.__switch_to
> >        0.17 ±  4%      -0.0        0.13 ±  7%  perf-profile.children.cycles-pp.place_entity
> >        0.22            -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.wake_affine
> >        0.24            -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.check_stack_object
> >        0.64 ±  2%      -0.0        0.61 ±  3%  perf-profile.children.cycles-pp.__virt_addr_valid
> >        0.38 ±  2%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
> >        0.18 ±  3%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.update_rq_clock
> >        0.66            -0.0        0.62        perf-profile.children.cycles-pp.rw_verify_area
> >        0.19            -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.task_mm_cid_work
> >        0.34 ±  3%      -0.0        0.31 ±  2%  perf-profile.children.cycles-pp.update_process_times
> >        0.12 ±  8%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.detach_tasks
> >        0.39 ±  3%      -0.0        0.36 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
> >        0.21 ±  3%      -0.0        0.18 ±  6%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> >        0.18 ±  6%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.task_tick_fair
> >        0.25 ±  3%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.rseq_get_rseq_cs
> >        0.23 ±  5%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.sched_tick
> >        0.14 ±  3%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.check_preempt_wakeup_fair
> >        0.11 ±  4%      -0.0        0.08 ±  7%  perf-profile.children.cycles-pp.update_min_vruntime
> >        0.06            -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.update_curr_dl_se
> >        0.14 ±  3%      -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.put_prev_entity
> >        0.13 ±  5%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.task_h_load
> >        0.68            -0.0        0.65        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> >        0.46 ±  2%      -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
> >        0.52            -0.0        0.50        perf-profile.children.cycles-pp.scm_recv_unix
> >        0.08 ±  4%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.__cgroup_account_cputime
> >        0.11 ±  5%      -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
> >        0.46 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> >        0.08 ±  8%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.activate_task
> >        0.08 ±  8%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.detach_task
> >        0.11 ±  5%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.os_xsave
> >        0.13 ±  5%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.avg_vruntime
> >        0.13 ±  4%      -0.0        0.11 ±  5%  perf-profile.children.cycles-pp.update_entity_lag
> >        0.08 ±  4%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.__calc_delta
> >        0.09 ±  5%      -0.0        0.07 ±  8%  perf-profile.children.cycles-pp.vruntime_eligible
> >        0.34 ±  2%      -0.0        0.32        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> >        0.30            -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.__build_skb_around
> >        0.08 ±  5%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.rseq_update_cpu_node_id
> >        0.15            -0.0        0.14        perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
> >        0.07 ±  5%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.native_irq_return_iret
> >        0.38 ±  2%      +0.0        0.40 ±  2%  perf-profile.children.cycles-pp.mod_memcg_lruvec_state
> >        0.27 ±  2%      +0.0        0.30 ±  2%  perf-profile.children.cycles-pp.prepare_task_switch
> >        0.05 ±  7%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.handle_softirqs
> >        0.06            +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.finish_wait
> >        0.06 ±  7%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.__irq_exit_rcu
> >        0.06 ±  8%      +0.1        0.11 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
> >        0.01 ±223%      +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.ktime_get
> >        0.54 ±  4%      +0.1        0.61        perf-profile.children.cycles-pp.select_task_rq
> >        0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.enqueue_dl_entity
> >        0.12 ±  4%      +0.1        0.19 ±  7%  perf-profile.children.cycles-pp.get_any_partial
> >        0.10 ±  9%      +0.1        0.18 ±  5%  perf-profile.children.cycles-pp.available_idle_cpu
> >        0.00            +0.1        0.08 ±  9%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
> >        0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.dl_server_start
> >        0.00            +0.1        0.08 ± 11%  perf-profile.children.cycles-pp.dl_server_stop
> >        0.46 ±  2%      +0.1        0.54 ±  2%  perf-profile.children.cycles-pp.select_task_rq_fair
> >        0.00            +0.1        0.10 ± 10%  perf-profile.children.cycles-pp.select_idle_core
> >        0.09 ±  7%      +0.1        0.20 ±  8%  perf-profile.children.cycles-pp.select_idle_cpu
> >        0.18 ±  4%      +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.select_idle_sibling
> >        0.00            +0.2        0.18 ±  4%  perf-profile.children.cycles-pp.process_one_work
> >        0.06 ± 13%      +0.2        0.25 ±  9%  perf-profile.children.cycles-pp.schedule_idle
> >        0.44 ±  2%      +0.2        0.64 ±  8%  perf-
> > profile.children.cycles-pp.prepare_to_wait
> >        0.00            +0.2        0.21 ±  5%  perf-
> > profile.children.cycles-pp.kthread
> >        0.00            +0.2        0.21 ±  5%  perf-
> > profile.children.cycles-pp.worker_thread
> >        0.00            +0.2        0.21 ±  4%  perf-
> > profile.children.cycles-pp.ret_from_fork
> >        0.00            +0.2        0.21 ±  4%  perf-
> > profile.children.cycles-pp.ret_from_fork_asm
> >        0.11 ± 12%      +0.3        0.36 ±  9%  perf-
> > profile.children.cycles-pp.sched_ttwu_pending
> >        0.31 ± 35%      +0.3        0.59 ± 11%  perf-
> > profile.children.cycles-pp.__cmd_record
> >        0.26 ± 45%      +0.3        0.54 ± 13%  perf-
> > profile.children.cycles-pp.perf_session__process_events
> >        0.26 ± 45%      +0.3        0.54 ± 13%  perf-
> > profile.children.cycles-pp.reader__read_event
> >        0.26 ± 45%      +0.3        0.54 ± 13%  perf-
> > profile.children.cycles-pp.record__finish_output
> >        0.16 ± 11%      +0.3        0.45 ±  9%  perf-
> > profile.children.cycles-pp.__flush_smp_call_function_queue
> >        0.14 ± 11%      +0.3        0.45 ±  9%  perf-
> > profile.children.cycles-pp.__sysvec_call_function_single
> >        0.14 ± 60%      +0.3        0.48 ± 17%  perf-
> > profile.children.cycles-pp.ordered_events__queue
> >        0.14 ± 61%      +0.3        0.48 ± 17%  perf-
> > profile.children.cycles-pp.queue_event
> >        0.15 ± 59%      +0.3        0.49 ± 16%  perf-
> > profile.children.cycles-pp.process_simple
> >        0.16 ± 12%      +0.4        0.54 ± 10%  perf-
> > profile.children.cycles-pp.sysvec_call_function_single
> >        4.61 ±  3%      +0.5        5.13 ±  8%  perf-
> > profile.children.cycles-pp.get_partial_node
> >        5.57 ±  3%      +0.6        6.12 ±  7%  perf-
> > profile.children.cycles-pp.___slab_alloc
> >       18.44            +0.6       19.00        perf-
> > profile.children.cycles-pp.sock_alloc_send_pskb
> >        6.51 ±  3%      +0.7        7.26 ±  9%  perf-
> > profile.children.cycles-pp.__put_partials
> >        0.33 ± 14%      +1.0        1.30 ± 11%  perf-
> > profile.children.cycles-pp.asm_sysvec_call_function_single
> >        0.34 ± 17%      +1.1        1.47 ± 11%  perf-
> > profile.children.cycles-pp.pv_native_safe_halt
> >        0.34 ± 17%      +1.1        1.48 ± 11%  perf-
> > profile.children.cycles-pp.acpi_safe_halt
> >        0.34 ± 17%      +1.1        1.48 ± 11%  perf-
> > profile.children.cycles-pp.acpi_idle_do_entry
> >        0.34 ± 17%      +1.1        1.48 ± 11%  perf-
> > profile.children.cycles-pp.acpi_idle_enter
> >        0.35 ± 17%      +1.2        1.53 ± 11%  perf-
> > profile.children.cycles-pp.cpuidle_enter_state
> >        0.35 ± 17%      +1.2        1.54 ± 11%  perf-
> > profile.children.cycles-pp.cpuidle_enter
> >        0.38 ± 17%      +1.3        1.63 ± 11%  perf-
> > profile.children.cycles-pp.cpuidle_idle_call
> >        0.45 ± 16%      +1.5        1.94 ± 11%  perf-
> > profile.children.cycles-pp.start_secondary
> >        0.46 ± 17%      +1.5        1.96 ± 11%  perf-
> > profile.children.cycles-pp.do_idle
> >        0.46 ± 17%      +1.5        1.97 ± 11%  perf-
> > profile.children.cycles-pp.common_startup_64
> >        0.46 ± 17%      +1.5        1.97 ± 11%  perf-
> > profile.children.cycles-pp.cpu_startup_entry
> >       13.76 ±  2%      +1.7       15.44 ±  5%  perf-
> > profile.children.cycles-pp._raw_spin_lock_irqsave
> >       12.09 ±  2%      +1.9       14.00 ±  6%  perf-
> > profile.children.cycles-pp.native_queued_spin_lock_slowpath
> >        3.93            -0.2        3.69        perf-profile.self.cycles-pp.clear_bhb_loop
> >        3.43 ±  3%      -0.1        3.29        perf-profile.self.cycles-pp._copy_to_iter
> >        0.50 ±  2%      -0.1        0.39 ±  5%  perf-profile.self.cycles-pp.switch_mm_irqs_off
> >        1.37 ±  4%      -0.1        1.27 ±  2%  perf-profile.self.cycles-pp._copy_from_iter
> >        0.28 ±  2%      -0.1        0.18 ±  7%  perf-profile.self.cycles-pp.__enqueue_entity
> >        1.41 ±  3%      -0.1        1.31 ±  2%  perf-profile.self.cycles-pp.fdget_pos
> >        2.51            -0.1        2.42        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> >        1.35            -0.1        1.28        perf-profile.self.cycles-pp.read
> >        2.24            -0.1        2.17        perf-profile.self.cycles-pp.do_syscall_64
> >        0.27 ±  3%      -0.1        0.20 ±  3%  perf-profile.self.cycles-pp.update_cfs_group
> >        1.28            -0.1        1.22        perf-profile.self.cycles-pp.sock_write_iter
> >        0.84            -0.1        0.77        perf-profile.self.cycles-pp.vfs_read
> >        1.42            -0.1        1.36        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> >        1.20            -0.1        1.14        perf-profile.self.cycles-pp.__alloc_skb
> >        0.18 ±  2%      -0.1        0.13 ±  5%  perf-profile.self.cycles-pp.pick_eevdf
> >        1.04            -0.1        0.99        perf-profile.self.cycles-pp.its_return_thunk
> >        0.29 ±  2%      -0.1        0.24 ±  4%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> >        0.28 ±  5%      -0.1        0.23 ±  6%  perf-profile.self.cycles-pp.update_curr
> >        0.13 ±  5%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.switch_fpu_return
> >        0.20 ±  3%      -0.0        0.15 ±  6%  perf-profile.self.cycles-pp.__dequeue_entity
> >        1.00            -0.0        0.95        perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
> >        0.33            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.update_load_avg
> >        0.88            -0.0        0.83 ±  2%  perf-profile.self.cycles-pp.vfs_write
> >        0.91            -0.0        0.86        perf-profile.self.cycles-pp.sock_read_iter
> >        0.13 ±  3%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.update_curr_se
> >        0.25 ±  2%      -0.0        0.21 ±  4%  perf-profile.self.cycles-pp.__update_load_avg_se
> >        1.22            -0.0        1.18        perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
> >        0.68            -0.0        0.63        perf-profile.self.cycles-pp.__check_object_size
> >        0.78 ±  2%      -0.0        0.74        perf-profile.self.cycles-pp.obj_cgroup_charge_account
> >        0.20 ±  3%      -0.0        0.16 ±  4%  perf-profile.self.cycles-pp.__switch_to
> >        0.15 ±  3%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.try_to_wake_up
> >        0.90            -0.0        0.86        perf-profile.self.cycles-pp.entry_SYSCALL_64
> >        0.76 ±  2%      -0.0        0.73        perf-profile.self.cycles-pp.__check_heap_object
> >        0.92            -0.0        0.89 ±  2%  perf-profile.self.cycles-pp.__account_obj_stock
> >        0.19 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.check_stack_object
> >        0.40 ±  3%      -0.0        0.37        perf-profile.self.cycles-pp.__schedule
> >        0.60 ±  2%      -0.0        0.56 ±  3%  perf-profile.self.cycles-pp.__virt_addr_valid
> >        0.71            -0.0        0.68        perf-profile.self.cycles-pp.__skb_datagram_iter
> >        0.18 ±  4%      -0.0        0.14 ±  5%  perf-profile.self.cycles-pp.task_mm_cid_work
> >        0.68            -0.0        0.65        perf-profile.self.cycles-pp.refill_obj_stock
> >        0.34            -0.0        0.31 ±  2%  perf-profile.self.cycles-pp.unix_stream_recvmsg
> >        0.06 ±  7%      -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.enqueue_task
> >        0.11            -0.0        0.08        perf-profile.self.cycles-pp.pick_task_fair
> >        0.15 ±  2%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.enqueue_task_fair
> >        0.20 ±  3%      -0.0        0.17 ±  7%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> >        0.41            -0.0        0.38        perf-profile.self.cycles-pp.sock_recvmsg
> >        0.10            -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.update_min_vruntime
> >        0.13 ±  3%      -0.0        0.10        perf-profile.self.cycles-pp.task_h_load
> >        0.23 ±  3%      -0.0        0.20 ±  6%  perf-profile.self.cycles-pp.__get_user_8
> >        0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.exit_to_user_mode_loop
> >        0.39 ±  2%      -0.0        0.37 ±  2%  perf-profile.self.cycles-pp.rw_verify_area
> >        0.11 ±  3%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.os_xsave
> >        0.12 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.pick_next_task_fair
> >        0.35            -0.0        0.33 ±  2%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
> >        0.46            -0.0        0.44        perf-profile.self.cycles-pp.mutex_lock
> >        0.11 ±  4%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__switch_to_asm
> >        0.10 ±  3%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.enqueue_entity
> >        0.08 ±  7%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.place_entity
> >        0.30            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.alloc_skb_with_frags
> >        0.50            -0.0        0.48        perf-profile.self.cycles-pp.kfree
> >        0.30            -0.0        0.28        perf-profile.self.cycles-pp.ksys_write
> >        0.12 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.dequeue_entity
> >        0.11 ±  4%      -0.0        0.09        perf-profile.self.cycles-pp.prepare_to_wait
> >        0.19 ±  2%      -0.0        0.17        perf-profile.self.cycles-pp.update_rq_clock_task
> >        0.27            -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.__build_skb_around
> >        0.08 ±  6%      -0.0        0.06 ±  9%  perf-profile.self.cycles-pp.vruntime_eligible
> >        0.12 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.__wake_up_common
> >        0.27            -0.0        0.26        perf-profile.self.cycles-pp.kmalloc_reserve
> >        0.48            -0.0        0.46        perf-profile.self.cycles-pp.unix_write_space
> >        0.19            -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.skb_copy_datagram_iter
> >        0.07            -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.__calc_delta
> >        0.06 ±  6%      -0.0        0.05        perf-profile.self.cycles-pp.__put_user_8
> >        0.28            -0.0        0.27        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> >        0.11            -0.0        0.10        perf-profile.self.cycles-pp.wait_for_unix_gc
> >        0.05            +0.0        0.06        perf-profile.self.cycles-pp.__x64_sys_write
> >        0.07 ±  5%      +0.0        0.08 ±  5%  perf-profile.self.cycles-pp.native_irq_return_iret
> >        0.19 ±  7%      +0.0        0.22 ±  4%  perf-profile.self.cycles-pp.prepare_task_switch
> >        0.10 ±  6%      +0.1        0.17 ±  5%  perf-profile.self.cycles-pp.available_idle_cpu
> >        0.14 ± 61%      +0.3        0.48 ± 17%  perf-profile.self.cycles-pp.queue_event
> >        0.19 ± 18%      +0.7        0.85 ± 12%  perf-profile.self.cycles-pp.pv_native_safe_halt
> >       12.07 ±  2%      +1.9       13.97 ±  6%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> > 
> > 
> > 
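The worker_thread/kthread/ret_from_fork entries that appear above with the
patched kernel are the mm_cid compaction running from a workqueue instead of
the tick-driven task work. For readers following along, a minimal sketch of
that pattern, under stated assumptions: the cid_work field, the callback body
and the queueing helper below are illustrative only; the real ones live in
commit f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct").

	/* Sketch: defer per-mm cid compaction to the unbound workqueue. */
	#include <linux/workqueue.h>
	#include <linux/sched/mm.h>

	static void task_mm_cid_work(struct work_struct *work)
	{
		/* cid_work is assumed to be a work_struct added to mm_struct */
		struct mm_struct *mm = container_of(work, struct mm_struct, cid_work);

		/* scan and compact mm_cids here: preemptible, migratable context */
		mmdrop(mm);	/* pairs with the mmgrab() taken when queueing */
	}

	static void task_queue_mm_cid(struct mm_struct *mm)
	{
		mmgrab(mm);	/* keep mm alive until the worker has run */
		/* unbound workqueue: the scan never blocks an RT task's return path */
		if (!queue_work(system_unbound_wq, &mm->cid_work))
			mmdrop(mm);	/* work was already pending */
	}

With INIT_WORK() done once at mm setup (mm_init() would be the natural spot),
a queue attempt from __rseq_handle_notify_resume() costs little more than the
pending test above, which lines up with the modest context-switch increase the
counters below show.
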
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> >    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_3/aim9/300s
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >        9156           +20.2%      11004        vmstat.system.cs
> >     8715946 ±  6%     -14.0%    7494314 ± 13%  meminfo.DirectMap2M
> >       10992           +85.4%      20381        meminfo.PageTables
> >      318.58            -1.7%     313.01        aim9.shell_rtns_3.ops_per_sec
> >    27145198            -2.1%   26576524        aim9.time.minor_page_faults
> >     1049306            -1.8%    1030938        aim9.time.voluntary_context_switches
> >        6173 ± 20%     +74.0%      10742 ±  4%  numa-meminfo.node0.PageTables
> >        5702 ± 31%     +55.1%       8844 ± 19%  numa-meminfo.node0.Shmem
> >        4803 ± 25%    +100.6%       9636 ±  6%  numa-meminfo.node1.PageTables
> >        1538 ± 20%     +73.7%       2673 ±  5%  numa-vmstat.node0.nr_page_table_pages
> >        1425 ± 31%     +55.1%       2210 ± 19%  numa-vmstat.node0.nr_shmem
> >        1194 ± 25%    +101.2%       2402 ±  6%  numa-vmstat.node1.nr_page_table_pages
> >       30413           +19.3%      36291        sched_debug.cpu.nr_switches.avg
> >       84768 ±  6%     +20.3%     101955 ±  4%  sched_debug.cpu.nr_switches.max
> >       25510 ± 13%     +23.0%      31383 ±  3%  sched_debug.cpu.nr_switches.stddev
> >        2727           +85.8%       5066        proc-vmstat.nr_page_table_pages
> >    19325131            -1.6%   19014535        proc-vmstat.numa_hit
> >    19274656            -1.6%   18964467        proc-vmstat.numa_local
> >    19877211            -1.6%   19563123        proc-vmstat.pgalloc_normal
> >    28020416            -2.0%   27451741        proc-vmstat.pgfault
> >    19829318            -1.6%   19508263        proc-vmstat.pgfree
> >        2679            -1.6%       2636        proc-vmstat.unevictable_pgs_culled
> >        0.03 ± 10%     +30.9%       0.04 ±  2%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.02 ±  5%     +26.2%       0.02 ±  3%  perf-sched.total_sch_delay.average.ms
> >       27.03 ±  2%     -12.4%      23.66        perf-sched.total_wait_and_delay.average.ms
> >       23171           +18.2%      27385        perf-sched.total_wait_and_delay.count.ms
> >       27.01 ±  2%     -12.5%      23.64        perf-sched.total_wait_time.average.ms
> >      110.73 ±  4%     -71.1%      31.98        perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        1662 ±  2%    +278.6%       6294        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >      110.70 ±  4%     -71.1%      31.94        perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        5.94            +0.1        6.00        perf-stat.i.branch-miss-rate%
> >        9184           +20.2%      11041        perf-stat.i.context-switches
> >        1.96            +1.6%       1.99        perf-stat.i.cpi
> >       71.73 ±  4%     +66.1%     119.11 ±  5%  perf-stat.i.cpu-migrations
> >        0.53            -1.4%       0.52        perf-stat.i.ipc
> >        3.79            -2.0%       3.71        perf-stat.i.metric.K/sec
> >       90919            -2.0%      89065        perf-stat.i.minor-faults
> >       90919            -2.0%      89065        perf-stat.i.page-faults
> >        6.00            +0.1        6.06        perf-stat.overall.branch-miss-rate%
> >        1.79            +1.2%       1.81        perf-stat.overall.cpi
> >        0.56            -1.2%       0.55        perf-stat.overall.ipc
> >        9154           +20.2%      11004        perf-stat.ps.context-switches
> >       71.49 ±  4%     +66.1%     118.72 ±  5%  perf-stat.ps.cpu-migrations
> >       90616            -2.0%      88768        perf-stat.ps.minor-faults
> >       90616            -2.0%      88768        perf-stat.ps.page-faults
> >        8.89            -0.2        8.68        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> >        8.88            -0.2        8.66        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        3.51 ±  3%      -0.2        3.33        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        3.47 ±  2%      -0.2        3.29        perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
> >        1.66 ±  2%      -0.1        1.57 ±  4%  perf-profile.calltrace.cycles-pp.setlocale
> >        0.27 ±100%      +0.3        0.61 ±  5%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >        0.18 ±141%      +0.4        0.60 ±  5%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
> >       62.46            +0.6       63.01        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> >        0.09 ±223%      +0.6        0.65 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> >        0.09 ±223%      +0.6        0.65 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> >       49.01            +0.6       49.60        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       67.47            +0.7       68.17        perf-profile.calltrace.cycles-pp.common_startup_64
> >       20.25            -0.7       19.58        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       20.21            -0.7       19.54        perf-profile.children.cycles-pp.do_syscall_64
> >        6.54            -0.2        6.33        perf-profile.children.cycles-pp.asm_exc_page_fault
> >        6.10            -0.2        5.90        perf-profile.children.cycles-pp.do_user_addr_fault
> >        3.77 ±  3%      -0.2        3.60        perf-profile.children.cycles-pp.x64_sys_call
> >        3.62 ±  3%      -0.2        3.46        perf-profile.children.cycles-pp.do_exit
> >        2.63 ±  3%      -0.2        2.48 ±  2%  perf-profile.children.cycles-pp.__mmput
> >        2.16 ±  2%      -0.1        2.06 ±  3%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
> >        1.66 ±  2%      -0.1        1.57 ±  4%  perf-profile.children.cycles-pp.setlocale
> >        2.69 ±  2%      -0.1        2.61        perf-profile.children.cycles-pp.do_pte_missing
> >        0.77 ±  5%      -0.1        0.70 ±  6%  perf-profile.children.cycles-pp.tlb_finish_mmu
> >        0.92 ±  2%      -0.0        0.87 ±  4%  perf-profile.children.cycles-pp.__irqentry_text_end
> >        0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.tick_nohz_tick_stopped
> >        0.10 ± 11%      -0.0        0.07 ± 21%  perf-profile.children.cycles-pp.__percpu_counter_init_many
> >        0.14 ±  9%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.strnlen
> >        0.12 ± 11%      -0.0        0.10 ±  8%  perf-profile.children.cycles-pp.mas_prev_slot
> >        0.11 ± 12%      +0.0        0.14 ±  9%  perf-profile.children.cycles-pp.update_curr
> >        0.19 ±  8%      +0.0        0.22 ±  6%  perf-profile.children.cycles-pp.enqueue_entity
> >        0.10 ± 11%      +0.0        0.13 ± 11%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
> >        0.05 ± 46%      +0.0        0.08 ± 13%  perf-profile.children.cycles-pp.select_task_rq
> >        0.13 ± 14%      +0.0        0.17 ±  8%  perf-profile.children.cycles-pp.perf_pmu_sched_task
> >        0.20 ± 10%      +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.try_to_wake_up
> >        0.28 ±  9%      +0.1        0.34 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >        0.04 ± 44%      +0.1        0.11 ± 13%  perf-profile.children.cycles-pp.__queue_work
> >        0.30 ± 11%      +0.1        0.38 ±  8%  perf-profile.children.cycles-pp.ttwu_do_activate
> >        0.30 ±  4%      +0.1        0.38 ±  8%  perf-profile.children.cycles-pp.__pick_next_task
> >        0.22 ±  7%      +0.1        0.29 ±  9%  perf-profile.children.cycles-pp.try_to_block_task
> >        0.02 ±141%      +0.1        0.09 ± 10%  perf-profile.children.cycles-pp.kick_pool
> >        0.02 ± 99%      +0.1        0.10 ± 19%  perf-profile.children.cycles-pp.queue_work_on
> >        0.25 ±  4%      +0.1        0.35 ±  7%  perf-profile.children.cycles-pp.sched_ttwu_pending
> >        0.33 ±  6%      +0.1        0.43 ±  5%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
> >        0.29 ±  4%      +0.1        0.39 ±  6%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> >        0.51 ±  6%      +0.1        0.63 ±  6%  perf-profile.children.cycles-pp.schedule_idle
> >        0.46 ±  7%      +0.1        0.58 ±  5%  perf-profile.children.cycles-pp.schedule
> >        0.88 ±  6%      +0.2        1.04 ±  5%  perf-profile.children.cycles-pp.ret_from_fork_asm
> >        0.18 ±  6%      +0.2        0.34 ±  8%  perf-profile.children.cycles-pp.worker_thread
> >        0.88 ±  6%      +0.2        1.04 ±  5%  perf-profile.children.cycles-pp.ret_from_fork
> >        0.38 ±  8%      +0.2        0.56 ± 10%  perf-profile.children.cycles-pp.kthread
> >        1.08 ±  3%      +0.2        1.32 ±  2%  perf-profile.children.cycles-pp.__schedule
> >       66.15            +0.5       66.64        perf-profile.children.cycles-pp.cpuidle_idle_call
> >       62.89            +0.6       63.47        perf-profile.children.cycles-pp.cpuidle_enter_state
> >       63.00            +0.6       63.59        perf-profile.children.cycles-pp.cpuidle_enter
> >       49.10            +0.6       49.69        perf-profile.children.cycles-pp.intel_idle
> >       67.47            +0.7       68.17        perf-profile.children.cycles-pp.do_idle
> >       67.47            +0.7       68.17        perf-profile.children.cycles-pp.common_startup_64
> >       67.47            +0.7       68.17        perf-profile.children.cycles-pp.cpu_startup_entry
> >        0.91 ±  2%      -0.0        0.86 ±  4%  perf-profile.self.cycles-pp.__irqentry_text_end
> >        0.14 ± 11%      +0.1        0.22 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
> >       49.08            +0.6       49.68        perf-profile.self.cycles-pp.intel_idle
> > 
> > 
> > 
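Worth noting in the aim9 run above: the worker_thread kthreads are woken far
more often (wait_and_delay count 1662 -> 6294) but each instance waits much
less on average (110.73 ms -> 31.98 ms), i.e. the compaction work is now
spread out rather than bursty. To eyeball the same behaviour locally, a rough
recipe with stock perf (the trace duration is a placeholder):

	# system-wide scheduling trace for 10s, then per-task latency summary
	perf sched record -a -- sleep 10
	perf sched latency --sort max

The mm_cid work items would show up under the kworker/u*:* threads that
service the unbound workqueue.
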
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> >    gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/800%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >     3745213 ± 39%    +108.1%    7794858 ± 12%  cpuidle..usage
> >      186670           +17.3%     218939 ±  2%  meminfo.Percpu
> >        5.00          +306.7%      20.33 ± 66%  mpstat.max_utilization.seconds
> >        9.35 ± 76%      -4.5        4.80 ±141%  perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
> >        8.90 ± 75%      -4.3        4.57 ±141%  perf-profile.calltrace.cycles-pp.perf_session__deliver_event.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
> >        3283 ±  7%     -16.2%       2751 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.avg
> >        3283 ±  7%     -16.2%       2751 ±  5%  sched_debug.cfs_rq:/.min_vruntime.avg
> >     1522512 ±  6%     +80.0%    2739797 ±  4%  vmstat.system.cs
> >      308726 ±  8%     +60.5%     495472 ±  5%  vmstat.system.in
> >      467562            +3.7%     485068 ±  2%  proc-vmstat.nr_kernel_stack
> >      266084            +3.8%     276310        proc-vmstat.nr_slab_unreclaimable
> >   1.375e+08            -2.0%  1.347e+08        proc-vmstat.numa_hit
> >   1.373e+08            -2.0%  1.346e+08        proc-vmstat.numa_local
> >      217472 ±  3%     -28.1%     156410        proc-vmstat.numa_other
> >   1.382e+08            -2.0%  1.354e+08        proc-vmstat.pgalloc_normal
> >   1.375e+08            -2.0%  1.347e+08        proc-vmstat.pgfree
> >     1514102            -6.2%    1420287        hackbench.throughput
> >     1480357            -6.7%    1380775        hackbench.throughput_avg
> >     1514102            -6.2%    1420287        hackbench.throughput_best
> >     1436918            -7.9%    1323413        hackbench.throughput_worst
> >    14551264 ± 13%    +138.1%   34644707 ±  3%  hackbench.time.involuntary_context_switches
> >        9919            -1.6%       9762        hackbench.time.percent_of_cpu_this_job_got
> >        4239            +4.5%       4428        hackbench.time.system_time
> >    56365933 ±  6%     +65.3%   93172066 ±  4%  hackbench.time.voluntary_context_switches
> >    65085618           +26.7%   82440571 ±  2%  perf-stat.i.branch-misses
> >       31.25            -1.6       29.66        perf-stat.i.cache-miss-rate%
> >   2.469e+08            +8.9%  2.689e+08        perf-stat.i.cache-misses
> >   7.519e+08           +15.9%  8.712e+08        perf-stat.i.cache-references
> >     1353061 ±  7%     +87.5%    2537450 ±  5%  perf-stat.i.context-switches
> >   2.269e+11            +3.5%  2.348e+11        perf-stat.i.cpu-cycles
> >      134588 ± 13%     +81.9%     244825 ±  8%  perf-stat.i.cpu-migrations
> >       13.60 ±  5%     +70.5%      23.20 ±  5%  perf-stat.i.metric.K/sec
> >        1.26            +7.6%       1.35        perf-stat.overall.MPKI
> >        0.11 ±  2%      +0.0        0.14 ±  2%  perf-stat.overall.branch-miss-rate%
> >       34.12            -2.1       31.97        perf-stat.overall.cache-miss-rate%
> >        1.17            +1.8%       1.19        perf-stat.overall.cpi
> >      931.96            -5.3%     882.44        perf-stat.overall.cycles-between-cache-misses
> >        0.85            -1.8%       0.84        perf-stat.overall.ipc
> >   5.372e+10            -1.2%   5.31e+10        perf-stat.ps.branch-instructions
> >    57783128 ±  2%     +32.9%   76802898 ±  2%  perf-stat.ps.branch-misses
> >   2.696e+08            +7.2%   2.89e+08        perf-stat.ps.cache-misses
> >   7.902e+08           +14.4%  9.039e+08        perf-stat.ps.cache-references
> >     1288664 ±  7%     +94.6%    2508227 ±  5%  perf-stat.ps.context-switches
> >   2.512e+11            +1.5%   2.55e+11        perf-stat.ps.cpu-cycles
> >      122960 ± 14%     +82.3%     224127 ±  9%  perf-stat.ps.cpu-migrations
> >   1.108e+13            +5.7%  1.171e+13        perf-stat.total.instructions
> >        0.94 ±223%   +5929.9%      56.62 ±121%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> >       26.44 ± 81%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      100.25 ±141%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        9.01 ± 43%   +1823.1%     173.24 ±106%  perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> >       49.43 ± 14%     +73.8%      85.93 ± 19%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> >      130.63 ± 17%    +135.8%     308.04 ± 28%  perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >       18.09 ± 30%    +130.4%      41.70 ± 26%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >      196.51 ± 21%    +102.9%     398.77 ± 15%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >       34.17 ± 39%    +191.1%      99.46 ± 20%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> >      154.91 ±163%   +1649.9%       2710 ± 91%  perf-sched.sch_delay.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >        0.94 ±223%  +1.9e+05%       1743 ±120%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> >        3.19 ±124%     -91.9%       0.26 ±150%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> >      646.26 ± 94%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      282.66 ±139%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       63.17 ± 52%   +2854.4%       1866 ±121%  perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> >        1507 ± 35%    +249.4%       5266 ± 47%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >        3915 ± 67%     +98.7%       7779 ± 16%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >       53.31 ± 18%     +79.9%      95.90 ± 23%  perf-sched.total_sch_delay.average.ms
> >      149.37 ± 18%     +80.0%     268.92 ± 22%  perf-sched.total_wait_and_delay.average.ms
> >       96.07 ± 18%     +80.1%     173.01 ± 21%  perf-sched.total_wait_time.average.ms
> >      244.53 ± 47%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> >      529.64 ± 20%     +38.5%     733.60 ± 20%  perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> >      136.52 ± 15%     +73.7%     237.07 ± 18%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> >      373.41 ± 16%    +136.3%     882.34 ± 27%  perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >       51.96 ± 29%    +127.5%     118.22 ± 25%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >      554.86 ± 23%    +103.0%       1126 ± 14%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >      298.52 ±136%    +436.9%       1602 ± 27%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> >      556.66 ± 37%     -97.1%      16.09 ± 47%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >      707.67 ± 31%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> >        1358 ± 28%   +4707.9%      65291 ± 27%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >       12184 ±  5%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> >        1393 ±134%    +379.9%       6685 ± 15%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> >        6927 ±  6%    +119.8%      15224 ± 19%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >      341.61 ± 21%     +39.1%     475.15 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> >       51.39 ± 99%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      121.14 ±122%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       87.09 ± 15%     +73.6%     151.14 ± 18%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> >      242.78 ± 16%    +136.6%     574.31 ± 27%  perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >       33.86 ± 29%    +126.0%      76.52 ± 24%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >      250.32 ±109%     -89.4%      26.44 ±111%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
> >      358.36 ± 25%    +103.1%     727.72 ± 14%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >       77.40 ± 47%    +102.5%     156.70 ± 28%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> >       17.91 ± 42%     -75.3%       4.42 ± 76%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
> >      266.70 ±137%    +431.6%       1417 ± 36%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> >      536.93 ± 40%     -97.4%      13.81 ± 50%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >      180.38 ±135%   +2208.8%       4164 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >        1028 ±129%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      312.94 ±123%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >      418.66 ±132%     -93.7%      26.44 ±111%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
> >        1388 ±133%    +379.7%       6660 ± 15%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> >        2022 ± 25%    +164.9%       5358 ± 46%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> > 
> > 
> > 
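The 800% overcommit case above is the unfriendliest of the four reports:
context switches nearly double (perf-stat.i.context-switches +87.5%), cpu
migrations rise +81.9%, and hackbench.throughput_avg drops 6.7%. A rough
sketch for reproducing just the switch/migration delta on a comparable
128-CPU box; the group count is an assumption, sized to roughly 8 runnable
tasks per CPU given hackbench's default 40 tasks per group:

	perf stat -a -e context-switches,cpu-migrations -- \
		hackbench -p -g 26 -l 20000

The 50% load variant of the same benchmark further down shows the opposite
sign, so the verdict likely depends on which operating point matters for a
given workload.
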
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> >    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_1/aim9/300s
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >       11004           +86.2%      20490        meminfo.PageTables
> >      121.33 ± 12%     +18.8%     144.17 ±  5%  perf-c2c.DRAM.remote
> >        9155           +20.0%      10990        vmstat.system.cs
> >        5129 ± 20%    +107.2%      10631 ±  3%  numa-meminfo.node0.PageTables
> >        5864 ± 17%     +67.3%       9811 ±  3%  numa-meminfo.node1.PageTables
> >        1278 ± 20%    +107.9%       2658 ±  3%  numa-vmstat.node0.nr_page_table_pages
> >        1469 ± 17%     +66.4%       2446 ±  3%  numa-vmstat.node1.nr_page_table_pages
> >      319.43            -2.1%     312.66        aim9.shell_rtns_1.ops_per_sec
> >    27217846            -2.5%   26546962        aim9.time.minor_page_faults
> >     1051878            -2.1%    1029547        aim9.time.voluntary_context_switches
> >       30502           +18.6%      36187        sched_debug.cpu.nr_switches.avg
> >       90327 ± 12%     +22.7%     110866 ±  4%  sched_debug.cpu.nr_switches.max
> >       26316 ± 16%     +25.5%      33021 ±  5%  sched_debug.cpu.nr_switches.stddev
> >        0.03 ±  7%     +70.7%       0.05 ± 53%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.02 ±  3%     +38.9%       0.02 ± 28%  perf-sched.total_sch_delay.average.ms
> >       27.43 ±  2%     -14.5%      23.45        perf-sched.total_wait_and_delay.average.ms
> >       23174           +18.0%      27340        perf-sched.total_wait_and_delay.count.ms
> >       27.41 ±  2%     -14.6%      23.42        perf-sched.total_wait_time.average.ms
> >      115.38 ±  3%     -71.9%      32.37 ±  2%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        1656 ±  3%    +280.2%       6299        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >      115.35 ±  3%     -72.0%      32.31 ±  2%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        2737           +86.1%       5095        proc-vmstat.nr_page_table_pages
> >       30460            +3.2%      31439        proc-vmstat.nr_shmem
> >       27933            +1.8%      28432        proc-vmstat.nr_slab_unreclaimable
> >    19466749            -2.5%   18980434        proc-vmstat.numa_hit
> >    19414531            -2.5%   18927584        proc-vmstat.numa_local
> >    20028107            -2.5%   19528806        proc-vmstat.pgalloc_normal
> >    28087705            -2.4%   27417155        proc-vmstat.pgfault
> >    19980173            -2.5%   19474402        proc-vmstat.pgfree
> >      420074            -5.7%     396239 ±  8%  proc-vmstat.pgreuse
> >        2685            -1.9%       2633        proc-vmstat.unevictable_pgs_culled
> >    5.48e+08            -1.2%  5.412e+08        perf-stat.i.branch-instructions
> >        5.92            +0.1        6.00        perf-stat.i.branch-miss-rate%
> >        9195           +19.9%      11021        perf-stat.i.context-switches
> >        1.96            +1.7%       1.99        perf-stat.i.cpi
> >       70.13           +73.4%     121.59 ±  8%  perf-stat.i.cpu-migrations
> >   2.725e+09            -1.3%   2.69e+09        perf-stat.i.instructions
> >        0.53            -1.6%       0.52        perf-stat.i.ipc
> >        3.80            -2.4%       3.71        perf-stat.i.metric.K/sec
> >       91139            -2.4%      88949        perf-stat.i.minor-faults
> >       91139            -2.4%      88949        perf-stat.i.page-faults
> >        5.00 ± 44%      +1.1        6.07        perf-stat.overall.branch-miss-rate%
> >        1.49 ± 44%     +21.9%       1.82        perf-stat.overall.cpi
> >        7643 ± 44%     +43.7%      10984        perf-stat.ps.context-switches
> >       58.17 ± 44%    +108.4%     121.21 ±  8%  perf-stat.ps.cpu-migrations
> >        2.06 ±  2%      -0.2        1.87 ± 12%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        0.98 ±  7%      -0.2        0.83 ± 12%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> >        1.69 ±  2%      -0.1        1.54 ±  2%  perf-profile.calltrace.cycles-pp.setlocale
> >        0.58 ±  5%      -0.1        0.44 ± 44%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__open64_nocancel.setlocale
> >        0.72 ±  6%      -0.1        0.60 ±  8%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
> >        3.21 ±  2%      -0.1        3.11        perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
> >        0.70 ±  4%      -0.1        0.62 ±  6%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
> >        1.52 ±  2%      -0.1        1.44 ±  3%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        1.34 ±  3%      -0.1        1.28 ±  3%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
> >        0.89 ±  3%      -0.1        0.84        perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
> >        0.17 ±141%      +0.4        0.61 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> >        0.17 ±141%      +0.4        0.61 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> >       65.10            +0.5       65.56        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >       66.40            +0.6       67.00        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >       66.46            +0.6       67.08        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
> >       66.46            +0.6       67.08        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
> >       67.63            +0.7       68.30        perf-profile.calltrace.cycles-pp.common_startup_64
> >       20.14            -0.6       19.51        perf-profile.children.cycles-pp.do_syscall_64
> >       20.20            -0.6       19.57        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >        1.13 ±  5%      -0.2        0.98 ±  9%  perf-profile.children.cycles-pp.rcu_core
> >        1.69 ±  2%      -0.1        1.54 ±  2%  perf-profile.children.cycles-pp.setlocale
> >        0.84 ±  4%      -0.1        0.71 ±  5%  perf-profile.children.cycles-pp.rcu_do_batch
> >        2.16 ±  2%      -0.1        2.04 ±  3%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
> >        1.15 ±  4%      -0.1        1.04 ±  5%  perf-profile.children.cycles-pp.__open64_nocancel
> >        3.22 ±  2%      -0.1        3.12        perf-profile.children.cycles-pp.exec_binprm
> >        2.09 ±  2%      -0.1        2.00 ±  2%  perf-profile.children.cycles-pp.kernel_clone
> >        0.88 ±  4%      -0.1        0.79 ±  4%  perf-profile.children.cycles-pp.mas_store_prealloc
> >        2.19            -0.1        2.10 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
> >        0.70 ±  4%      -0.1        0.62 ±  6%  perf-profile.children.cycles-pp.dup_mm
> >        1.36 ±  3%      -0.1        1.30        perf-profile.children.cycles-pp._Fork
> >        0.56 ±  4%      -0.1        0.50 ±  8%  perf-profile.children.cycles-pp.dup_mmap
> >        0.09 ± 16%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
> >        0.31 ±  8%      -0.1        0.25 ± 10%  perf-profile.children.cycles-pp.strncpy_from_user
> >        0.94 ±  3%      -0.1        0.88 ±  2%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
> >        0.41 ±  5%      -0.0        0.36 ±  5%  perf-profile.children.cycles-pp.irqtime_account_irq
> >        0.18 ± 12%      -0.0        0.14 ±  7%  perf-profile.children.cycles-pp.tlb_remove_table_rcu
> >        0.20 ±  7%      -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.perf_event_task_tick
> >        0.08 ± 14%      -0.0        0.05 ± 49%  perf-profile.children.cycles-pp.mas_update_gap
> >        0.24 ±  5%      -0.0        0.21 ±  5%  perf-profile.children.cycles-pp.filemap_read
> >        0.19 ±  7%      -0.0        0.16 ±  8%  perf-profile.children.cycles-pp.__call_rcu_common
> >        0.22 ±  2%      -0.0        0.19 ±  5%  perf-profile.children.cycles-pp.mas_next_slot
> >        0.09 ±  5%      +0.0        0.12 ±  7%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
> >        0.05 ± 47%      +0.0        0.08 ± 10%  perf-profile.children.cycles-pp.lru_gen_del_folio
> >        0.10 ± 14%      +0.0        0.12 ± 18%  perf-profile.children.cycles-pp.__folio_mod_stat
> >        0.12 ± 12%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.perf_pmu_sched_task
> >        0.20 ± 10%      +0.0        0.24 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
> >        0.06 ± 47%      +0.0        0.10 ± 11%  perf-profile.children.cycles-pp.__queue_work
> >        0.56 ±  5%      +0.1        0.61 ±  4%  perf-profile.children.cycles-pp.sched_balance_domains
> >        0.04 ± 72%      +0.1        0.09 ± 11%  perf-profile.children.cycles-pp.kick_pool
> >        0.04 ± 72%      +0.1        0.09 ± 14%  perf-profile.children.cycles-pp.queue_work_on
> >        0.33 ±  6%      +0.1        0.38 ±  7%  perf-profile.children.cycles-pp.dequeue_entities
> >        0.35 ±  6%      +0.1        0.40 ±  7%  perf-profile.children.cycles-pp.dequeue_task_fair
> >        0.52 ±  6%      +0.1        0.58 ±  5%  perf-profile.children.cycles-pp.enqueue_task_fair
> >        0.54 ±  7%      +0.1        0.60 ±  5%  perf-profile.children.cycles-pp.enqueue_task
> >        0.28 ±  9%      +0.1        0.35 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >        0.21 ±  4%      +0.1        0.28 ± 12%  perf-profile.children.cycles-pp.try_to_block_task
> >        0.34 ±  4%      +0.1        0.42 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
> >        0.36 ±  3%      +0.1        0.46 ±  6%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
> >        0.28 ±  4%      +0.1        0.38 ±  5%  perf-profile.children.cycles-pp.sched_ttwu_pending
> >        0.33 ±  2%      +0.1        0.43 ±  5%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> >        0.46 ±  7%      +0.1        0.56 ±  6%  perf-profile.children.cycles-pp.schedule
> >        0.48 ±  8%      +0.1        0.61 ±  8%  perf-profile.children.cycles-pp.timerqueue_del
> >        0.18 ± 13%      +0.1        0.32 ± 11%  perf-profile.children.cycles-pp.worker_thread
> >        0.38 ±  9%      +0.2        0.52 ± 10%  perf-profile.children.cycles-pp.kthread
> >        1.10 ±  5%      +0.2        1.25 ±  2%  perf-profile.children.cycles-pp.__schedule
> >        0.85 ±  8%      +0.2        1.01 ±  7%  perf-profile.children.cycles-pp.ret_from_fork
> >        0.85 ±  8%      +0.2        1.02 ±  7%  perf-profile.children.cycles-pp.ret_from_fork_asm
> >       63.15            +0.5       63.64        perf-profile.children.cycles-pp.cpuidle_enter
> >       66.26            +0.5       66.77        perf-profile.children.cycles-pp.cpuidle_idle_call
> >       66.46            +0.6       67.08        perf-profile.children.cycles-pp.start_secondary
> >       67.63            +0.7       68.30        perf-profile.children.cycles-pp.common_startup_64
> >       67.63            +0.7       68.30        perf-profile.children.cycles-pp.cpu_startup_entry
> >       67.63            +0.7       68.30        perf-profile.children.cycles-pp.do_idle
> >        1.20 ±  3%      -0.1        1.12 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> >        0.09 ± 16%      -0.1        0.03 ± 70%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
> >        0.25 ±  6%      -0.0        0.21 ± 12%  perf-profile.self.cycles-pp.irqtime_account_irq
> >        0.02 ±141%      +0.0        0.06 ± 13%  perf-profile.self.cycles-pp.prepend_path
> >        0.13 ± 10%      +0.1        0.24 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
> > 
> > 
> > 
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> >    gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >   3.924e+08 ±  3%     +55.1%  6.086e+08 ±  2%  cpuidle..time
> >     7504886 ± 11%    +184.4%   21340245 ±  6%  cpuidle..usage
> >    13350305            -3.8%   12848570        vmstat.system.cs
> >     1849619            +5.1%    1943754        vmstat.system.in
> >        3.56 ±  5%      +2.6        6.16 ±  7%  mpstat.cpu.all.idle%
> >        0.69            +0.2        0.90 ±  3%  mpstat.cpu.all.irq%
> >        0.03 ±  3%      +0.0        0.04 ±  3%  mpstat.cpu.all.soft%
> >       18666 ±  9%     +41.2%      26352 ±  6%  perf-c2c.DRAM.remote
> >      197041           -39.6%     118945 ±  5%  perf-c2c.HITM.local
> >        3178 ± 12%     +37.2%       4361 ± 11%  perf-c2c.HITM.remote
> >      200219           -38.4%     123307 ±  5%  perf-c2c.HITM.total
> >     2842579 ± 11%     +60.1%    4550025 ± 12%  meminfo.Active
> >     2842579 ± 11%     +60.1%    4550025 ± 12%  meminfo.Active(anon)
> >     5535242 ±  5%     +30.9%    7248257 ±  7%  meminfo.Cached
> >     3846718 ±  8%     +44.0%    5539484 ±  9%  meminfo.Committed_AS
> >     9684149 ±  3%     +20.5%   11666616 ±  4%  meminfo.Memused
> >      136127 ±  3%     +14.2%     155524        meminfo.PageTables
> >       62144           +22.8%      76336        meminfo.Percpu
> >     2001586 ± 16%     +85.6%    3714611 ± 14%  meminfo.Shmem
> >     9759598 ±  3%     +20.0%   11714619 ±  4%  meminfo.max_used_kB
> >      710625 ± 11%     +59.3%    1131770 ± 11%  proc-vmstat.nr_active_anon
> >     1383631 ±  5%     +30.6%    1806419 ±  7%  proc-vmstat.nr_file_pages
> >       34220 ±  3%     +13.9%      38987        proc-vmstat.nr_page_table_pages
> >      500216 ± 16%     +84.5%     923007 ± 14%  proc-vmstat.nr_shmem
> >      710625 ± 11%     +59.3%    1131770 ± 11%  proc-vmstat.nr_zone_active_anon
> >    92308030            +8.7%  1.004e+08        proc-vmstat.numa_hit
> >    92171407            +8.7%  1.002e+08        proc-vmstat.numa_local
> >      133616            +2.7%     137265        proc-vmstat.numa_other
> >    92394313            +8.7%  1.004e+08        proc-vmstat.pgalloc_normal
> >    91035691            +7.8%   98094626        proc-vmstat.pgfree
> >      867815           +11.8%     970369        hackbench.throughput
> >      830278           +11.6%     926834        hackbench.throughput_avg
> >      867815           +11.8%     970369        hackbench.throughput_best
> >      760822           +14.2%     869145        hackbench.throughput_worst
> >       72.87           -10.3%      65.36        hackbench.time.elapsed_time
> >       72.87           -10.3%      65.36        hackbench.time.elapsed_time.max
> >   2.493e+08           -17.7%  2.052e+08        hackbench.time.involuntary_context_switches
> >       12357            -3.9%      11879        hackbench.time.percent_of_cpu_this_job_got
> >        8029           -14.8%       6842        hackbench.time.system_time
> >      976.58            -5.5%     923.21        hackbench.time.user_time
> >    7.54e+08           -14.4%  6.451e+08        hackbench.time.voluntary_context_switches
> >   5.598e+10            +6.6%  5.965e+10        perf-stat.i.branch-instructions
> >        0.40            -0.0        0.38        perf-stat.i.branch-miss-rate%
> >        8.36 ±  2%      +4.6       12.98 ±  3%  perf-stat.i.cache-miss-rate%
> >    2.11e+09           -33.8%  1.396e+09        perf-stat.i.cache-references
> >    13687653            -3.4%   13225338        perf-stat.i.context-switches
> >        1.36            -7.9%       1.25        perf-stat.i.cpi
> >   3.219e+11            -2.2%  3.147e+11        perf-stat.i.cpu-cycles
> >        1915 ±  2%      -6.6%       1788 ±  3%  perf-stat.i.cycles-between-cache-misses
> >   2.371e+11            +6.0%  2.512e+11        perf-stat.i.instructions
> >        0.74            +8.5%       0.80        perf-stat.i.ipc
> >        1.15 ± 14%     -28.3%       0.82 ± 23%  perf-stat.i.major-faults
> >      115.09            -3.2%     111.40        perf-stat.i.metric.K/sec
> >        0.37            -0.0        0.35        perf-stat.overall.branch-miss-rate%
> >        8.15 ±  3%      +4.6       12.74 ±  3%  perf-stat.overall.cache-miss-rate%
> >        1.36            -7.7%       1.25        perf-stat.overall.cpi
> >        1875 ±  2%      -5.5%       1772 ±  4%  perf-stat.overall.cycles-between-cache-misses
> >        0.74            +8.3%       0.80        perf-stat.overall.ipc
> >   5.524e+10            +6.4%  5.877e+10        perf-stat.ps.branch-instructions
> >   2.079e+09           -33.9%  1.375e+09        perf-stat.ps.cache-references
> >    13486088            -3.4%   13020988        perf-stat.ps.context-switches
> >   3.175e+11            -2.3%  3.101e+11        perf-stat.ps.cpu-cycles
> >    2.34e+11            +5.8%  2.475e+11        perf-stat.ps.instructions
> >        1.09 ± 14%     -28.3%       0.78 ± 21%  perf-stat.ps.major-faults
> >    1.73e+13            -5.1%  1.642e+13        perf-stat.total.instructions
> >     3527725           +10.7%    3905361        sched_debug.cfs_rq:/.avg_vruntime.avg
> >     3975260           +14.1%    4535959 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.max
> >       98657 ± 17%     +84.9%     182407 ± 18%  sched_debug.cfs_rq:/.avg_vruntime.stddev
> >       11.83 ±  7%     +17.6%      13.92 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.max
> >        2.71 ±  5%     +21.8%       3.30 ±  4%  sched_debug.cfs_rq:/.h_nr_queued.stddev
> >       11.75 ±  7%     +17.7%      13.83 ±  6%  sched_debug.cfs_rq:/.h_nr_runnable.max
> >        2.68 ±  4%     +21.2%       3.25 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
> >        4556 ±223%    +691.0%      36039 ± 34%  sched_debug.cfs_rq:/.left_deadline.avg
> >      583131 ±223%    +577.3%    3949548 ±  4%  sched_debug.cfs_rq:/.left_deadline.max
> >       51341 ±223%    +622.0%     370695 ± 16%  sched_debug.cfs_rq:/.left_deadline.stddev
> >        4555 ±223%    +691.0%      36035 ± 34%  sched_debug.cfs_rq:/.left_vruntime.avg
> >      583105 ±223%    +577.3%    3949123 ±  4%  sched_debug.cfs_rq:/.left_vruntime.max
> >       51338 ±223%    +622.0%     370651 ± 16%  sched_debug.cfs_rq:/.left_vruntime.stddev
> >     3527725           +10.7%    3905361        sched_debug.cfs_rq:/.min_vruntime.avg
> >     3975260           +14.1%    4535959 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
> >       98657 ± 17%     +84.9%     182407 ± 18%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >        0.22 ±  5%     +13.9%       0.25 ±  5%  sched_debug.cfs_rq:/.nr_queued.stddev
> >        4555 ±223%    +691.0%      36035 ± 34%  sched_debug.cfs_rq:/.right_vruntime.avg
> >      583105 ±223%    +577.3%    3949123 ±  4%  sched_debug.cfs_rq:/.right_vruntime.max
> >       51338 ±223%    +622.0%     370651 ± 16%  sched_debug.cfs_rq:/.right_vruntime.stddev
> >        1336 ±  7%     +50.8%       2014 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
> >      552.53 ±  8%     +19.6%     660.87 ±  5%  sched_debug.cfs_rq:/.util_est.avg
> >      384.27 ±  9%     +28.9%     495.43 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
> >        1328 ± 17%     +42.7%       1896 ± 13%  sched_debug.cpu.curr->pid.stddev
> >       11.75 ±  8%     +19.1%      14.00 ±  6%  sched_debug.cpu.nr_running.max
> >        2.71 ±  5%     +22.7%       3.33 ±  4%  sched_debug.cpu.nr_running.stddev
> >       76578 ±  9%     +33.7%     102390 ±  5%  sched_debug.cpu.nr_switches.stddev
> >       62.25 ±  7%     +17.9%      73.42 ±  7%  sched_debug.cpu.nr_uninterruptible.max
> >        8.11 ± 58%     -82.0%       1.46 ± 47%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> >       12.04 ±104%     -86.8%       1.58 ± 55%  perf-sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
> >        0.11 ±123%     -95.3%       0.01 ±102%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> >        0.06 ±103%     -93.6%       0.00 ±154%  perf-sched.sch_delay.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> >        0.10 ±109%     -93.9%       0.01 ±163%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> >        1.00 ± 21%     -59.6%       0.40 ± 50%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> >       14.54 ± 14%     -79.2%       3.02 ± 51%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> >        1.50 ± 84%     -74.1%       0.39 ± 90%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> >        1.13 ± 68%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        0.38 ± 97%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        1.10 ± 17%     -68.9%       0.34 ± 49%  perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> >       42.25 ± 18%     -71.7%      11.96 ± 53%  perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >        3.25 ± 17%     -77.5%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       29.17 ± 33%     -62.0%      11.09 ± 85%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       46.25 ± 15%     -68.8%      14.43 ± 52%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >        3.72 ± 70%     -81.0%       0.70 ± 67%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> >        7.95 ± 55%     -69.7%       2.41 ± 65%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> >        3.66 ±139%     -97.1%       0.11 ± 58%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> >        3.05 ± 44%     -91.9%       0.25 ± 57%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> >       29.96 ±  9%     -83.6%       4.90 ± 48%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> >       26.20 ± 59%     -88.9%       2.92 ± 66%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >        0.14 ± 84%     -91.2%       0.01 ±142%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
> >        0.20 ±149%     -97.5%       0.01 ±102%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> >        0.11 ±144%     -96.6%       0.00 ±154%  perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> >        0.19 ±118%     -96.7%       0.01 ±163%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> >      274.64 ± 95%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        3.72 ±151%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        3135 ±  5%     -48.6%       1611 ± 57%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >        1320 ± 19%     -78.6%     282.01 ± 74%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> >      265.55 ± 82%     -77.9%      58.70 ±124%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> >        1850 ± 28%     -59.1%     757.74 ± 68%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> >      766.85 ± 56%     -68.0%     245.51 ± 51%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >        1.77 ± 17%     -71.9%       0.50 ± 49%  perf-sched.total_sch_delay.average.ms
> >        5.15 ± 17%     -69.5%       1.57 ± 48%  perf-sched.total_wait_and_delay.average.ms
> >        3.38 ± 17%     -68.2%       1.07 ± 48%  perf-sched.total_wait_time.average.ms
> >        5100 ±  3%     -31.0%       3522 ± 47%  perf-sched.total_wait_time.max.ms
> >       27.42 ± 49%     -85.2%       4.07 ± 47%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> >       35.29 ± 80%     -85.8%       5.00 ± 51%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
> >       42.28 ± 14%     -79.4%       8.70 ± 51%  perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> >        3.12 ± 17%     -66.4%       1.05 ± 48%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> >      122.62 ± 18%     -70.4%      36.26 ± 53%  perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >      250.26 ± 65%     -94.2%      14.56 ± 55%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        9.37 ± 17%     -78.2%       2.05 ± 48%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       58.34 ± 33%     -62.0%      22.18 ± 85%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >      134.44 ± 15%     -69.3%      41.24 ± 52%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >       86.94 ±  6%     -83.1%      14.68 ± 48%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> >       86.57 ± 39%     -86.0%      12.14 ± 59%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >      647.92 ± 48%     -97.9%      13.86 ± 45%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        6386 ±  6%     -46.8%       3397 ± 57%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >        3868 ± 27%     -60.4%       1531 ± 67%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> >        1647 ± 55%     -67.7%     531.51 ± 50%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >        5014 ±  5%     -32.5%       3385 ± 47%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> >       19.31 ± 47%     -86.5%       2.61 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> >       23.25 ± 70%     -85.3%       3.42 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
> >       18.33 ± 15%     -42.0%      10.64 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> >        0.11 ±123%     -95.3%       0.01 ±102%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> >        0.06 ±103%     -93.6%       0.00 ±154%  perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> >        0.10 ±109%     -93.9%       0.01 ±163%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> >        1.70 ± 21%     -52.6%       0.81 ± 48%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> >       27.74 ± 15%     -79.5%       5.68 ± 51%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> >        2.17 ± 75%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        0.42 ± 97%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        2.02 ± 17%     -65.1%       0.70 ± 48%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> >       80.37 ± 18%     -69.8%      24.31 ± 52%  perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> >      210.13 ± 68%     -95.1%      10.21 ± 55%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        6.12 ± 17%     -78.5%       1.32 ± 48%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       29.17 ± 33%     -62.0%      11.09 ± 85%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >       88.19 ± 16%     -69.6%      26.81 ± 52%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >       13.77 ± 45%     -65.7%       4.72 ± 53%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> >      104.64 ± 42%     -76.4%      24.74 ±135%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
> >        5.16 ± 29%     -92.5%       0.39 ± 48%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> >       56.98 ±  5%     -82.9%       9.77 ± 48%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> >       60.36 ± 32%     -84.7%       9.22 ± 57%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >      619.88 ± 43%     -98.0%      12.52 ± 45%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.14 ± 84%     -91.2%       0.01 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
> >      740.14 ± 35%     -68.5%     233.31 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> >        0.20 ±149%     -97.5%       0.01 ±102%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> >        0.11 ±144%     -96.6%       0.00 ±154%  perf-sched.wait_time.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> >        0.19 ±118%     -96.7%       0.01 ±163%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> >      327.64 ± 71%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        3.72 ±151%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> >        3299 ±  6%     -40.7%       1957 ± 51%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >      436.75 ± 39%     -76.9%     100.85 ± 98%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> >        2112 ± 19%     -62.3%     796.34 ± 63%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> >      947.83 ± 46%     -58.8%     390.83 ± 53%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >        5014 ±  5%     -32.5%       3385 ± 47%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> > 
> > 
> > 
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> >    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_2/aim9/300s
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >       11036           +85.7%      20499        meminfo.PageTables
> >      125.17 ±  8%     +18.4%     148.17 ±  7%  perf-c2c.HITM.local
> >       30464           +18.7%      36160        sched_debug.cpu.nr_switches.avg
> >        9166           +19.8%      10985        vmstat.system.cs
> >        6623 ± 17%     +60.8%      10652 ±  5%  numa-meminfo.node0.PageTables
> >        4414 ± 26%    +123.2%       9853 ±  6%  numa-meminfo.node1.PageTables
> >        1653 ± 17%     +60.1%       2647 ±  5%  numa-vmstat.node0.nr_page_table_pages
> >        1097 ± 26%    +123.9%       2457 ±  6%  numa-vmstat.node1.nr_page_table_pages
> >      319.08            -2.2%     312.04        aim9.shell_rtns_2.ops_per_sec
> >    27170926            -2.2%   26586121        aim9.time.minor_page_faults
> >     1051038            -2.2%    1027732        aim9.time.voluntary_context_switches
> >        2736           +86.4%       5101        proc-vmstat.nr_page_table_pages
> >       28014            +1.3%      28378        proc-vmstat.nr_slab_unreclaimable
> >    19332129            -1.5%   19048363        proc-vmstat.numa_hit
> >    19283853            -1.5%   18996609        proc-vmstat.numa_local
> >    19892794            -1.5%   19598065        proc-vmstat.pgalloc_normal
> >    28044189            -2.1%   27457289        proc-vmstat.pgfault
> >    19843766            -1.5%   19543091        proc-vmstat.pgfree
> >      419715            -5.7%     395688 ±  8%  proc-vmstat.pgreuse
> >        2682            -2.0%       2628        proc-vmstat.unevictable_pgs_culled
> >        0.07 ±  6%     -30.5%       0.05 ± 22%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> >        0.03 ±  6%     +36.0%       0.04        perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.07 ± 33%     -57.5%       0.03 ± 53%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
> >        0.02 ± 74%    +112.0%       0.05 ± 36%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
> >        0.02           +24.1%       0.02 ±  2%  perf-sched.total_sch_delay.average.ms
> >       27.52           -14.0%      23.67        perf-sched.total_wait_and_delay.average.ms
> >       23179           +18.3%      27421        perf-sched.total_wait_and_delay.count.ms
> >       27.50           -14.0%      23.65        perf-sched.total_wait_time.average.ms
> >      117.03 ±  3%     -72.4%      32.27 ±  2%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        1655 ±  2%    +282.0%       6324        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.96 ± 29%     +51.6%       1.45 ± 22%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> >      117.00 ±  3%     -72.5%      32.23 ±  2%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        5.93            +0.1        6.00        perf-stat.i.branch-miss-rate%
> >        9189           +19.8%      11011        perf-stat.i.context-switches
> >        1.96            +1.6%       1.99        perf-stat.i.cpi
> >       71.21           +60.6%     114.39 ±  4%  perf-stat.i.cpu-migrations
> >        0.53            -1.5%       0.52        perf-stat.i.ipc
> >        3.79            -2.1%       3.71        perf-stat.i.metric.K/sec
> >       90998            -2.1%      89084        perf-stat.i.minor-faults
> >       90998            -2.1%      89084        perf-stat.i.page-faults
> >        5.99            +0.1        6.06        perf-stat.overall.branch-miss-rate%
> >        1.79            +1.4%       1.82        perf-stat.overall.cpi
> >        0.56            -1.3%       0.55        perf-stat.overall.ipc
> >        9158           +19.8%      10974        perf-stat.ps.context-switches
> >       70.99           +60.6%     114.02 ±  4%  perf-stat.ps.cpu-migrations
> >       90694            -2.1%      88787        perf-stat.ps.minor-faults
> >       90695            -2.1%      88787        perf-stat.ps.page-faults
> >   8.155e+11            -1.1%  8.065e+11        perf-stat.total.instructions
> >        8.87            -0.3        8.55        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> >        8.86            -0.3        8.54        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        2.53 ±  2%      -0.1        2.43 ±  2%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
> >        2.54            -0.1        2.44 ±  2%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
> >        2.49            -0.1        2.40 ±  2%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
> >        0.98 ±  5%      -0.1        0.90 ±  5%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        0.70 ±  3%      -0.1        0.62 ±  6%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
> >        0.18 ±141%      +0.5        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> >        0.18 ±141%      +0.5        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> >        0.00            +0.6        0.59 ±  7%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> >       62.48            +0.7       63.14        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> >       49.10            +0.7       49.78        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       67.62            +0.8       68.43        perf-profile.calltrace.cycles-pp.common_startup_64
> >       20.14            -0.7       19.40        perf-profile.children.cycles-pp.do_syscall_64
> >       20.18            -0.7       19.44        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >        3.33 ±  2%      -0.2        3.16 ±  2%  perf-profile.children.cycles-pp.vm_mmap_pgoff
> >        3.22 ±  2%      -0.2        3.06        perf-profile.children.cycles-pp.do_mmap
> >        3.51 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.do_exit
> >        3.52 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.__x64_sys_exit_group
> >        3.52 ±  2%      -0.1        3.38        perf-profile.children.cycles-pp.do_group_exit
> >        3.67            -0.1        3.54        perf-profile.children.cycles-pp.x64_sys_call
> >        2.21            -0.1        2.09 ±  3%  perf-profile.children.cycles-pp.__x64_sys_openat
> >        2.07 ±  2%      -0.1        1.94 ±  2%  perf-profile.children.cycles-pp.path_openat
> >        2.09 ±  2%      -0.1        1.97 ±  2%  perf-profile.children.cycles-pp.do_filp_open
> >        2.19            -0.1        2.08 ±  3%  perf-profile.children.cycles-pp.do_sys_openat2
> >        1.50 ±  4%      -0.1        1.39 ±  3%  perf-profile.children.cycles-pp.copy_process
> >        2.56            -0.1        2.46 ±  2%  perf-profile.children.cycles-pp.exit_mm
> >        2.55            -0.1        2.44 ±  2%  perf-profile.children.cycles-pp.__mmput
> >        2.51 ±  2%      -0.1        2.41 ±  2%  perf-profile.children.cycles-pp.exit_mmap
> >        0.70 ±  3%      -0.1        0.62 ±  6%  perf-profile.children.cycles-pp.dup_mm
> >        0.94 ±  4%      -0.1        0.89 ±  2%  perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
> >        0.57 ±  3%      -0.0        0.52 ±  4%  perf-profile.children.cycles-pp.alloc_pages_noprof
> >        0.20 ± 12%      -0.0        0.15 ± 10%  perf-profile.children.cycles-pp.perf_event_task_tick
> >        0.18 ±  4%      -0.0        0.14 ± 15%  perf-profile.children.cycles-pp.xas_find
> >        0.10 ± 12%      -0.0        0.07 ± 24%  perf-profile.children.cycles-pp.up_write
> >        0.09 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.tick_check_broadcast_expired
> >        0.08 ± 12%      +0.0        0.10 ±  8%  perf-profile.children.cycles-pp.hrtimer_try_to_cancel
> >        0.10 ± 13%      +0.0        0.13 ±  5%  perf-profile.children.cycles-pp.__perf_event_task_sched_out
> >        0.20 ±  8%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
> >        0.21 ±  9%      +0.0        0.25 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
> >        0.03 ±101%      +0.0        0.07 ± 16%  perf-profile.children.cycles-pp.run_ksoftirqd
> >        0.04 ± 71%      +0.1        0.09 ± 15%  perf-profile.children.cycles-pp.kick_pool
> >        0.05 ± 47%      +0.1        0.11 ± 16%  perf-profile.children.cycles-pp.__queue_work
> >        0.28 ±  5%      +0.1        0.34 ±  7%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >        0.50            +0.1        0.56 ±  2%  perf-profile.children.cycles-pp.timerqueue_del
> >        0.04 ± 71%      +0.1        0.11 ± 17%  perf-profile.children.cycles-pp.queue_work_on
> >        0.51 ±  4%      +0.1        0.58 ±  2%  perf-profile.children.cycles-pp.enqueue_task_fair
> >        0.32 ±  3%      +0.1        0.40 ±  4%  perf-profile.children.cycles-pp.ttwu_do_activate
> >        0.53 ±  5%      +0.1        0.61 ±  3%  perf-profile.children.cycles-pp.enqueue_task
> >        0.49 ±  4%      +0.1        0.57 ±  6%  perf-profile.children.cycles-pp.schedule
> >        0.28 ±  6%      +0.1        0.38        perf-profile.children.cycles-pp.sched_ttwu_pending
> >        0.32 ±  5%      +0.1        0.43 ±  2%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> >        0.35 ±  8%      +0.1        0.47 ±  2%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
> >        0.17 ± 10%      +0.2        0.34 ± 12%  perf-profile.children.cycles-pp.worker_thread
> >        0.88 ±  3%      +0.2        1.06 ±  4%  perf-profile.children.cycles-pp.ret_from_fork
> >        0.88 ±  3%      +0.2        1.06 ±  4%  perf-profile.children.cycles-pp.ret_from_fork_asm
> >        0.39 ±  6%      +0.2        0.59 ±  7%  perf-profile.children.cycles-pp.kthread
> >       66.24            +0.6       66.85        perf-profile.children.cycles-pp.cpuidle_idle_call
> >       63.09            +0.6       63.73        perf-profile.children.cycles-pp.cpuidle_enter
> >       62.97            +0.6       63.61        perf-profile.children.cycles-pp.cpuidle_enter_state
> >       67.61            +0.8       68.43        perf-profile.children.cycles-pp.do_idle
> >       67.62            +0.8       68.43        perf-profile.children.cycles-pp.common_startup_64
> >       67.62            +0.8       68.43        perf-profile.children.cycles-pp.cpu_startup_entry
> >        0.37 ± 11%      -0.1        0.31 ±  3%  perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> >        0.10 ± 13%      -0.0        0.06 ± 50%  perf-profile.self.cycles-pp.up_write
> >        0.15 ±  4%      +0.1        0.22 ±  8%  perf-profile.self.cycles-pp.timerqueue_del
> > 
> > 
> > 
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> >    gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/exec_test/aim9/300s
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >       12120           +76.7%      21422        meminfo.PageTables
> >        8543           +26.9%      10840        vmstat.system.cs
> >        6148 ± 11%     +89.9%      11678 ±  5%  numa-meminfo.node0.PageTables
> >        5909 ± 11%     +64.0%       9689 ±  7%  numa-meminfo.node1.PageTables
> >        1532 ± 10%     +90.5%       2919 ±  5%  numa-vmstat.node0.nr_page_table_pages
> >        1468 ± 11%     +65.2%       2426 ±  7%  numa-vmstat.node1.nr_page_table_pages
> >        2991           +78.0%       5323        proc-vmstat.nr_page_table_pages
> >    32726750            -2.4%   31952115        proc-vmstat.pgfault
> >        1228            -2.6%       1197        aim9.exec_test.ops_per_sec
> >       11018 ±  2%     +10.5%      12178 ±  2%  aim9.time.involuntary_context_switches
> >    31835059            -2.4%   31062527        aim9.time.minor_page_faults
> >      736468            -2.9%     715310        aim9.time.voluntary_context_switches
> >        0.28 ±  7%     +11.3%       0.31 ±  6%  sched_debug.cfs_rq:/.h_nr_queued.stddev
> >        0.28 ±  7%     +11.3%       0.31 ±  6%  sched_debug.cfs_rq:/.nr_queued.stddev
> >      356683 ± 16%     +27.0%     453000 ±  9%  sched_debug.cpu.avg_idle.min
> >       27620 ±  7%     +29.5%      35775        sched_debug.cpu.nr_switches.avg
> >       84830 ± 14%     +16.3%      98648 ±  4%  sched_debug.cpu.nr_switches.max
> >        4563 ± 26%     +46.2%       6671 ± 26%  sched_debug.cpu.nr_switches.min
> >        0.03 ±  4%     -67.3%       0.01 ±141%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exec_release.exec_mm_release.exec_mmap
> >        0.03           +11.2%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.05 ± 28%     +61.3%       0.09 ± 21%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >        0.10 ± 18%     +18.8%       0.12        perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >        0.02 ±  3%     +18.3%       0.02 ±  2%  perf-sched.total_sch_delay.average.ms
> >       28.80           -19.8%      23.10 ±  3%  perf-sched.total_wait_and_delay.average.ms
> >       22332           +24.4%      27778        perf-sched.total_wait_and_delay.count.ms
> >       28.78           -19.8%      23.07 ±  3%  perf-sched.total_wait_time.average.ms
> >       17.39 ± 10%     -15.6%      14.67 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> >       41.02 ±  4%     -54.6%      18.64 ±  6%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        4795 ±  2%    +122.5%      10668        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >       17.35 ± 10%     -15.7%      14.63 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> >        0.00 ±141%    +400.0%       0.00 ± 44%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
> >       40.99 ±  4%     -54.6%      18.61 ±  6%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.00 ±149%    +542.9%       0.03 ± 41%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
> >   5.617e+08            -1.6%  5.529e+08        perf-stat.i.branch-instructions
> >        5.76            +0.1        5.84        perf-stat.i.branch-miss-rate%
> >        8562           +27.0%      10878        perf-stat.i.context-switches
> >        1.87            +2.6%       1.92        perf-stat.i.cpi
> >       78.02 ±  3%     +11.8%      87.23 ±  2%  perf-stat.i.cpu-migrations
> >   2.792e+09            -1.6%  2.748e+09        perf-stat.i.instructions
> >        0.55            -2.5%       0.54        perf-stat.i.ipc
> >        4.42            -2.4%       4.31        perf-stat.i.metric.K/sec
> >      106019            -2.4%     103509        perf-stat.i.minor-faults
> >      106019            -2.4%     103509        perf-stat.i.page-faults
> >        5.83            +0.1        5.91        perf-stat.overall.branch-miss-rate%
> >        1.72            +2.3%       1.76        perf-stat.overall.cpi
> >        0.58            -2.3%       0.57        perf-stat.overall.ipc
> >   5.599e+08            -1.6%  5.511e+08        perf-stat.ps.branch-instructions
> >        8534           +27.0%      10841        perf-stat.ps.context-switches
> >       77.77 ±  3%     +11.8%      86.96 ±  2%  perf-stat.ps.cpu-migrations
> >   2.783e+09            -1.6%  2.739e+09        perf-stat.ps.instructions
> >      105666            -2.4%     103164        perf-stat.ps.minor-faults
> >      105666            -2.4%     103164        perf-stat.ps.page-faults
> >   8.386e+11            -1.6%  8.253e+11        perf-stat.total.instructions
> >        7.79            -0.4        7.41 ±  2%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
> >        7.75            -0.3        7.47        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> >        7.73            -0.3        7.46        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
> >        2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        2.68 ±  2%      -0.2        2.52 ±  2%  perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        2.73 ±  2%      -0.2        2.57 ±  2%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
> >        2.61            -0.1        2.47 ±  3%  perf-profile.calltrace.cycles-pp.execve.exec_test
> >        2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
> >        2.60            -0.1        2.46 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve.exec_test
> >        1.92 ±  3%      -0.1        1.79 ±  2%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
> >        1.92 ±  3%      -0.1        1.80 ±  2%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
> >        4.68            -0.1        4.57        perf-profile.calltrace.cycles-pp._Fork
> >        1.88 ±  2%      -0.1        1.77 ±  2%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
> >        2.76            -0.1        2.66 ±  2%  perf-profile.calltrace.cycles-pp.exec_test
> >        3.24            -0.1        3.16        perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        0.84 ±  4%      -0.1        0.77 ±  5%  perf-profile.calltrace.cycles-pp.wait4
> >        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> >        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> >        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> >        0.46 ± 45%      +0.3        0.78 ±  5%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >        0.17 ±141%      +0.4        0.53 ±  4%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
> >        0.18 ±141%      +0.4        0.54 ±  2%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >       66.08            +0.8       66.85        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
> >       66.08            +0.8       66.85        perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
> >       66.02            +0.8       66.80        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> >       67.06            +0.9       68.00        perf-profile.calltrace.cycles-pp.common_startup_64
> >       21.19            -0.9       20.30        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       21.15            -0.9       20.27        perf-profile.children.cycles-pp.do_syscall_64
> >        7.92            -0.4        7.53 ±  2%  perf-profile.children.cycles-pp.execve
> >        7.94            -0.4        7.56 ±  2%  perf-profile.children.cycles-pp.__x64_sys_execve
> >        7.84            -0.4        7.46 ±  2%  perf-profile.children.cycles-pp.do_execveat_common
> >        5.51            -0.3        5.25 ±  2%  perf-profile.children.cycles-pp.load_elf_binary
> >        3.68            -0.2        3.49 ±  2%  perf-profile.children.cycles-pp.__mmput
> >        2.81 ±  2%      -0.2        2.63        perf-profile.children.cycles-pp.__x64_sys_exit_group
> >        2.80 ±  2%      -0.2        2.62 ±  2%  perf-profile.children.cycles-pp.do_exit
> >        2.81 ±  2%      -0.2        2.62 ±  2%  perf-profile.children.cycles-pp.do_group_exit
> >        2.93 ±  2%      -0.2        2.76 ±  2%  perf-profile.children.cycles-pp.x64_sys_call
> >        3.60            -0.2        3.44 ±  2%  perf-profile.children.cycles-pp.exit_mmap
> >        5.66            -0.1        5.51        perf-profile.children.cycles-pp.__handle_mm_fault
> >        1.94 ±  3%      -0.1        1.82 ±  2%  perf-profile.children.cycles-pp.exit_mm
> >        2.64            -0.1        2.52 ±  3%  perf-profile.children.cycles-pp.vm_mmap_pgoff
> >        2.55 ±  2%      -0.1        2.43 ±  3%  perf-profile.children.cycles-pp.do_mmap
> >        2.19 ±  2%      -0.1        2.08 ±  3%  perf-profile.children.cycles-pp.__mmap_region
> >        2.27            -0.1        2.16 ±  2%  perf-profile.children.cycles-pp.begin_new_exec
> >        2.79            -0.1        2.69 ±  2%  perf-profile.children.cycles-pp.exec_test
> >        0.83 ±  4%      -0.1        0.76 ±  6%  perf-profile.children.cycles-pp.__mmap_prepare
> >        0.86 ±  4%      -0.1        0.78 ±  5%  perf-profile.children.cycles-pp.wait4
> >        0.52 ±  5%      -0.1        0.45 ±  7%  perf-profile.children.cycles-pp.kernel_wait4
> >        0.50 ±  5%      -0.1        0.43 ±  6%  perf-profile.children.cycles-pp.do_wait
> >        0.88 ±  3%      -0.1        0.81 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
> >        0.51 ±  2%      -0.1        0.46 ±  6%  perf-profile.children.cycles-pp.setup_arg_pages
> >        0.39 ±  2%      -0.0        0.34 ±  8%  perf-profile.children.cycles-pp.unlink_anon_vmas
> >        0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
> >        0.37 ±  5%      -0.0        0.33 ±  3%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
> >        0.21 ±  6%      -0.0        0.17 ±  5%  perf-profile.children.cycles-pp.user_path_at
> >        0.21 ±  3%      -0.0        0.18 ± 10%  perf-profile.children.cycles-pp.__percpu_counter_sum
> >        0.18 ±  7%      -0.0        0.15 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
> >        0.33 ±  5%      -0.0        0.30        perf-profile.children.cycles-pp.relocate_vma_down
> >        0.04 ± 45%      +0.0        0.08 ± 12%  perf-profile.children.cycles-pp.__update_load_avg_se
> >        0.14 ±  7%      +0.0        0.18 ± 10%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
> >        0.19 ±  9%      +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.prepare_task_switch
> >        0.02 ±142%      +0.0        0.06 ± 23%  perf-profile.children.cycles-pp.select_task_rq
> >        0.03 ±100%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.task_contending
> >        0.45 ±  7%      +0.1        0.51 ±  3%  perf-profile.children.cycles-pp.__pick_next_task
> >        0.14 ± 22%      +0.1        0.20 ± 10%  perf-profile.children.cycles-pp.kick_pool
> >        0.36 ±  4%      +0.1        0.42 ±  4%  perf-profile.children.cycles-pp.dequeue_entities
> >        0.36 ±  4%      +0.1        0.44 ±  5%  perf-profile.children.cycles-pp.dequeue_task_fair
> >        0.15 ± 20%      +0.1        0.23 ± 10%  perf-profile.children.cycles-pp.__queue_work
> >        0.49 ±  5%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.schedule_idle
> >        0.14 ± 22%      +0.1        0.23 ±  9%  perf-profile.children.cycles-pp.queue_work_on
> >        0.36 ±  3%      +0.1        0.46 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >        0.47 ±  7%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.timerqueue_del
> >        0.30 ± 13%      +0.1        0.42 ±  7%  perf-profile.children.cycles-pp.ttwu_do_activate
> >        0.23 ± 15%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
> >        0.18 ± 14%      +0.1        0.32 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
> >        0.19 ± 13%      +0.1        0.34 ±  4%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> >        0.61 ±  3%      +0.2        0.76 ±  5%  perf-profile.children.cycles-pp.schedule
> >        1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
> >        1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
> >        0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.children.cycles-pp.kthread
> >        1.22 ±  3%      +0.2        1.45 ±  5%  perf-profile.children.cycles-pp.__schedule
> >        0.54 ±  8%      +0.2        0.78 ±  5%  perf-profile.children.cycles-pp.worker_thread
> >       66.08            +0.8       66.85        perf-profile.children.cycles-pp.start_secondary
> >       67.06            +0.9       68.00        perf-profile.children.cycles-pp.common_startup_64
> >       67.06            +0.9       68.00        perf-profile.children.cycles-pp.cpu_startup_entry
> >       67.06            +0.9       68.00        perf-profile.children.cycles-pp.do_idle
> >        0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
> >        0.04 ± 45%      +0.0        0.08 ± 10%  perf-profile.self.cycles-pp.__update_load_avg_se
> >        0.14 ± 10%      +0.1        0.23 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
> > 
> > 
> > 
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> >    gcc-12/performance/1BRD_48G/xfs/x86_64-rhel-9.4/600/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/sync_disk_rw/aim7
> > 
> > commit:
> >    baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >    f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> > 
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >           %stddev     %change         %stddev
> >               \          |                \
> >      344180 ±  6%     -13.0%     299325 ±  9%  meminfo.Mapped
> >        9594 ±123%    +191.8%      27995 ± 54%  numa-meminfo.node1.PageTables
> >        2399 ±123%    +191.3%       6989 ± 54%  numa-vmstat.node1.nr_page_table_pages
> >     1860734            -5.2%    1763194        vmstat.io.bo
> >      831686            +1.3%     842493        vmstat.system.cs
> >       50372            -5.5%      47609        aim7.jobs-per-min
> >     1435644           +11.5%    1600707        aim7.time.involuntary_context_switches
> >        7242            +1.2%       7332        aim7.time.percent_of_cpu_this_job_got
> >        5159            +7.1%       5526        aim7.time.system_time
> >    33195986            +6.9%   35497140        aim7.time.voluntary_context_switches
> >       40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
> >       40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >      605972 ±  2%     +14.5%     693922 ±  7%  sched_debug.cpu.avg_idle.max
> >       30974 ±  8%     -20.9%      24498 ± 15%  sched_debug.cpu.avg_idle.min
> >      118758 ±  5%     +22.0%     144899 ±  6%  sched_debug.cpu.avg_idle.stddev
> >      856253            +1.5%     869009        perf-stat.i.context-switches
> >        3.06            +2.3%       3.13        perf-stat.i.cpi
> >      164824            +7.7%     177546        perf-stat.i.cpu-migrations
> >        7.93            +2.5%       8.13        perf-stat.i.metric.K/sec
> >        3.41            +1.8%       3.47        perf-stat.overall.cpi
> >        1355            +5.8%       1434 ±  4%  perf-stat.overall.cycles-between-cache-misses
> >        0.29            -1.8%       0.29        perf-stat.overall.ipc
> >      845412            +1.6%     858925        perf-stat.ps.context-switches
> >      162728            +7.8%     175475        perf-stat.ps.cpu-migrations
> >   4.391e+12            +5.0%  4.609e+12        perf-stat.total.instructions
> >      444798            +6.0%     471383 ±  5%  proc-vmstat.nr_active_anon
> >       28190            -2.8%      27402        proc-vmstat.nr_dirty
> >     1231373            +2.3%    1259666 ±  2%  proc-vmstat.nr_file_pages
> >       63763            +0.9%      64355        proc-vmstat.nr_inactive_file
> >       86758 ±  6%     -12.9%      75546 ±  8%  proc-vmstat.nr_mapped
> >       10162 ±  2%      +7.2%      10895 ±  3%  proc-vmstat.nr_page_table_pages
> >      265229           +10.4%     292795 ±  9%  proc-vmstat.nr_shmem
> >      444798            +6.0%     471383 ±  5%  proc-vmstat.nr_zone_active_anon
> >       63763            +0.9%      64355        proc-vmstat.nr_zone_inactive_file
> >       28191            -2.8%      27400        proc-vmstat.nr_zone_write_pending
> >       24349           +11.6%      27171 ±  8%  proc-vmstat.pgreuse
> >        0.02 ±  3%     +11.3%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >        0.29 ± 17%     -30.7%       0.20 ± 14%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
> >        0.03 ± 10%     +33.5%       0.04 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> >        0.21 ± 32%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        0.16 ± 16%     +51.9%       0.24 ± 11%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >        0.22 ± 19%     +44.1%       0.32 ± 25%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> >        0.30 ± 28%     -38.7%       0.18 ± 28%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> >        0.11 ±  5%     +12.8%       0.12 ±  4%  perf-sched.sch_delay.avg.ms.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
> >        0.08 ±  4%     +15.8%       0.09 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync
> >        0.02 ±  3%     +13.7%       0.02 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
> >        0.01 ±223%   +1289.5%       0.09 ±111%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
> >        2.49 ± 40%     -43.4%       1.41 ± 50%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
> >        0.76 ±  7%     +92.8%       1.46 ± 40%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> >        0.65 ± 41%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        1.40 ± 64%   +2968.7%      43.04 ± 13%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >        0.63 ± 19%     +89.8%       1.19 ± 51%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >       28.67 ±  3%     -11.2%      25.45 ±  5%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
> >        0.80 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >        5.76 ±107%    +152.4%      14.53 ± 10%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >        8441          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >       18.67 ± 71%    +108.0%      38.83 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_trans_commit
> >      116.17 ±105%   +1677.8%       2065 ±  5%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >      424.79 ±151%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >       28.51 ±  3%     -11.2%      25.31 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
> >        0.38 ± 59%     -79.0%       0.08 ±107%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
> >        0.77 ±  9%     -56.5%       0.34 ±  3%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >        1.80 ±138%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        6.13 ± 93%    +133.2%      14.29 ± 10%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >        1.00 ± 16%     -48.1%       0.52 ± 20%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> >        0.92 ± 16%     -62.0%       0.35 ± 14%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
> >        0.26 ±  2%     -59.8%       0.11        perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
> >        0.24 ±223%   +2180.2%       5.56 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
> >        1.25 ± 77%     -79.8%       0.25 ±107%  perf-sched.wait_time.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
> >        1.78 ± 51%    +958.6%      18.82 ±117%  perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map
> >       58.48 ±  6%     -10.7%      52.22 ±  2%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.__flush_workqueue.xlog_cil_push_now.isra
> >       10.87 ±192%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >        8.63 ± 27%     -63.9%       3.12 ± 29%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are
> > provided for informational purposes only. Any difference in system
> > hardware or software design or configuration may affect actual
> > performance.
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
  2025-06-25 13:57     ` Mathieu Desnoyers
  2025-06-25 15:06       ` Gabriele Monaco
@ 2025-07-02 13:58       ` Gabriele Monaco
  1 sibling, 0 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-07-02 13:58 UTC (permalink / raw)
  To: Mathieu Desnoyers, kernel test robot
  Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
	Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
	Paul E. McKenney, Ingo Molnar



On Wed, 2025-06-25 at 09:57 -0400, Mathieu Desnoyers wrote:
> On 2025-06-25 04:01, kernel test robot wrote:
> > 
> > Hello,
> > 
> > kernel test robot noticed a 10.1% regression of
> > hackbench.throughput on:
> 
> Hi Gabriele,
> 
> This is a significant regression. Can you investigate before it gets
> merged?
> 

Hi Mathieu,

I ran some tests; the culprit for this performance regression seems to
be the interference from the more consistent `mm_cid` scans, combined
with the fact that they now run in a `work_struct`, which adds some
scheduling overhead.

One solution could be to reduce the frequency: the scans currently run
(sporadically) about every 100ms; if the minimum delay is raised to 1s,
the test results look fine.
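
Concretely, that would just mean a larger scan delay; a minimal sketch,
assuming we keep the MM_CID_SCAN_DELAY / mm_cid_next_scan scheme the
series already uses (the 1000ms value is illustrative):

/* kernel/sched/sched.h: raise the minimum gap between mm_cid scans */
#define MM_CID_SCAN_DELAY	1000	/* ms, was 100 */

/* Only scan once the stored deadline has passed */
static inline bool mm_cid_needs_scan(struct mm_struct *mm)
{
	return time_after(jiffies, READ_ONCE(mm->mm_cid_next_scan));
}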

However, I tried another approach that seems promising: a work_struct
gets scheduled relatively fast, which ends up causing a lot of
contention with kworkers, whereas something like a timer_list is less
aggressive; we get similar reliability in how often the mm_cid scan
runs, without the same performance impact.

At the moment I kept roughly the same structure as the patch and used a
timer delayed by 1 jiffy in place of the work_struct.
It might look cleaner to use the timer directly for the 100ms delay
instead of storing and checking the timestamp, in effect running a scan
about 100ms after every rseq_handle_notify_resume.
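
To make that concrete, here is a rough (untested) sketch of the timer
variant; mm_cid_timer and task_mm_cid_scan() are placeholder names, not
necessarily what the final patch will use:

/* Assumed field: struct timer_list mm_cid_timer in struct mm_struct,
 * set up at mm init with
 * timer_setup(&mm->mm_cid_timer, mm_cid_timer_fn, 0). */
static void mm_cid_timer_fn(struct timer_list *t)
{
	struct mm_struct *mm = from_timer(mm, t, mm_cid_timer);

	/*
	 * Runs in softirq context: fine as long as the scan does not
	 * sleep, otherwise we would queue the existing work from here.
	 */
	task_mm_cid_scan(mm);
}

/* Called from __rseq_handle_notify_resume() instead of queueing the
 * work_struct directly; the 1 jiffy delay is what avoids the kworker
 * contention seen in the benchmark. */
static void task_queue_mm_cid(struct mm_struct *mm)
{
	if (mm_cid_needs_scan(mm))
		mod_timer(&mm->mm_cid_timer, jiffies + 1);
}

The timer would of course also need to be stopped on mm teardown (e.g.
with timer_shutdown_sync()) before the mm goes away.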

What do you think?

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-07-02 14:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-13  9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
2025-06-13  9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
2025-06-13  9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
2025-06-25  8:01   ` kernel test robot
2025-06-25 13:57     ` Mathieu Desnoyers
2025-06-25 15:06       ` Gabriele Monaco
2025-07-02 13:58       ` Gabriele Monaco
2025-06-13  9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
2025-06-18 21:04   ` Shuah Khan
2025-06-20 17:20     ` Gabriele Monaco
2025-06-20 17:29       ` Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).