* [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability
@ 2025-06-13 9:12 Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13 9:12 UTC (permalink / raw)
To: linux-kernel, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar
Cc: Gabriele Monaco
This patchset moves task_mm_cid_work to a preemptible and migratable
context. This reduces the impact of this work on the scheduling latency
of real-time tasks.
The change also makes the recurrence of the work a bit more predictable.
The behaviour causing latency was introduced in commit 223baf9d17f2
("sched: Fix performance regression introduced by mm_cid"), which
added a task work tied to the scheduler tick.
That approach presents two possible issues:
* the task work runs before returning to userspace and, in effect, adds
scheduling latency (of an order of magnitude that is significant under
PREEMPT_RT)
* periodic tasks with short runtime are less likely to be running during
the tick, hence they might not run the task work at all
Patch 1 adds support for prev_sum_exec_runtime to the RT, deadline and
sched_ext classes, as already supported by fair; this is required to
avoid calling rseq_preempt on tick if the runtime is below a threshold.
Patch 2 contains the main changes, removing the task_work on the
scheduler tick and using a work_struct scheduled more reliably during
__rseq_handle_notify_resume.
Patch 3 adds a selftest to validate the functionality of the
task_mm_cid_work (i.e. to compact the mm_cids).
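For reference, the resulting control flow is roughly the following
(condensed from patch 2; the rseq_preempt path on the tick and error
handling are omitted):

  /* Scheduler tick: only consider tasks running unpreempted for long. */
  rtime = t->se.sum_exec_runtime - t->se.prev_sum_exec_runtime;  /* needs patch 1 for RT/DL/SCX */
  if (t->mm && rtime > RSEQ_UNPREEMPTED_THRESHOLD && mm_cid_needs_scan(t->mm))
          rseq_set_notify_resume(t);

  /* Return to userspace (__rseq_handle_notify_resume): queue the scan. */
  if (mm_cid_needs_scan(t->mm))
          task_queue_mm_cid(t);  /* mmgrab() + queue_work(system_unbound_wq, ...) */

  /* Unbound workqueue: task_mm_cid_work() compacts the mm_cids, then mmdrop()s the mm. */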
Rebased on v6.16-rc1, no change since V13 [1].
Changes since V12:
* Ensure the tick schedules the mm_cid compaction only once for tasks
executing longer than 100ms (until the scan expires again)
* Execute an rseq_preempt from the tick only after compaction was done
and the cid assignment changed
Changes since V11:
* Remove variable to make mm_cid_needs_scan more compact
* All patches reviewed
Changes since V10:
* Fix compilation errors with RSEQ and/or MM_CID disabled
Changes since V9:
* Simplify and move checks from task_queue_mm_cid to its call site
Changes since V8 [2]:
* Add support for prev_sum_exec_runtime to RT, deadline and sched_ext
* Avoid rseq_preempt on ticks unless executing for more than 100ms
* Queue the work on the unbound workqueue
Changes since V7:
* Schedule mm_cid compaction and update at every tick too
* mmgrab before scheduling the work
Changes since V6 [3]:
* Switch to a simple work_struct instead of a delayed work
* Schedule the work_struct in __rseq_handle_notify_resume
* Asynchronously disable the work but make sure mm is there while we run
* Remove first patch as merged independently
* Fix commit tag for test
Changes since V5:
* Punctuation
Changes since V4 [4]:
* Fixes on the selftest
* Polished memory allocation and cleanup
* Handle the test failure in main
Changes since V3 [5]:
* Fixes on the selftest
* Minor style issues in comments and indentation
* Use of perror where possible
* Add a barrier to align threads execution
* Improve test failure and error handling
Changes since V2 [6]:
* Change the order of the patches
* Merge patches changing the main delayed_work logic
* Improved self-test to spawn 1 less thread and use the main one instead
Changes since V1 [7]:
* Re-arm the delayed_work at each invocation
* Cancel the work synchronously at mmdrop
* Remove next scan fields and completely rely on the delayed_work
* Shrink mm_cid allocation with nr thread/affinity (Mathieu Desnoyers)
* Add self test
[1] - https://lore.kernel.org/lkml/20250414123630.177385-5-gmonaco@redhat.com
[2] - https://lore.kernel.org/lkml/20250220102639.141314-1-gmonaco@redhat.com
[3] - https://lore.kernel.org/lkml/20250210153253.460471-1-gmonaco@redhat.com
[4] - https://lore.kernel.org/lkml/20250113074231.61638-4-gmonaco@redhat.com
[5] - https://lore.kernel.org/lkml/20241216130909.240042-1-gmonaco@redhat.com
[6] - https://lore.kernel.org/lkml/20241213095407.271357-1-gmonaco@redhat.com
[7] - https://lore.kernel.org/lkml/20241205083110.180134-2-gmonaco@redhat.com
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Ingo Molnar <mingo@redhat.org>
Gabriele Monaco (3):
sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes
sched: Move task_mm_cid_work to mm work_struct
selftests/rseq: Add test for mm_cid compaction
include/linux/mm_types.h | 26 +++
include/linux/sched.h | 8 +-
kernel/rseq.c | 2 +
kernel/sched/core.c | 75 ++++---
kernel/sched/deadline.c | 1 +
kernel/sched/ext.c | 1 +
kernel/sched/rt.c | 1 +
kernel/sched/sched.h | 6 +-
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
11 files changed, 294 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
base-commit: 2c4a1f3fe03edab80db66688360685031802160a
--
2.49.0
* [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes
2025-06-13 9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
@ 2025-06-13 9:12 ` Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
2 siblings, 0 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13 9:12 UTC (permalink / raw)
To: linux-kernel, Ingo Molnar, Peter Zijlstra
Cc: Gabriele Monaco, Mathieu Desnoyers, Ingo Molnar
The fair scheduling class relies on prev_sum_exec_runtime to compute the
duration of the task's runtime since it was last scheduled. This value
is currently not required by other scheduling classes but can be useful
to detect long-running tasks and act on them (e.g. during a
scheduler tick).
Add support for prev_sum_exec_runtime to the RT, deadline and sched_ext
classes by simply assigning the sum_exec_runtime at each set_next_task.
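As an example of this use (which is how patch 2 in this series uses it in
the tick path), the runtime since the task was last picked can then be
computed regardless of the scheduling class:

  u64 rtime = t->se.sum_exec_runtime - t->se.prev_sum_exec_runtime;

  /* e.g. flag tasks that ran unpreempted for more than 100ms */
  if (rtime > RSEQ_UNPREEMPTED_THRESHOLD)
          rseq_set_notify_resume(t);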
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
kernel/sched/deadline.c | 1 +
kernel/sched/ext.c | 1 +
kernel/sched/rt.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245e..8387006396c8a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2389,6 +2389,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
p->se.exec_start = rq_clock_task(rq);
if (on_dl_rq(&p->dl))
update_stats_wait_end_dl(dl_rq, dl_se);
+ p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
/* You can't push away the running task */
dequeue_pushable_dl_task(rq, p);
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 2c41c78be61eb..75772767f87d2 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3255,6 +3255,7 @@ static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first)
}
p->se.exec_start = rq_clock_task(rq);
+ p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
/* see dequeue_task_scx() on why we skip when !QUEUED */
if (SCX_HAS_OP(sch, running) && (p->scx.flags & SCX_TASK_QUEUED))
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e40422c370335..2c70ff2042ee9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1693,6 +1693,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
p->se.exec_start = rq_clock_task(rq);
if (on_rt_rq(&p->rt))
update_stats_wait_end_rt(rt_rq, rt_se);
+ p->se.prev_sum_exec_runtime = p->se.sum_exec_runtime;
/* The running task is never eligible for pushing */
dequeue_pushable_task(rq, p);
--
2.49.0
* [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
2025-06-13 9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
@ 2025-06-13 9:12 ` Gabriele Monaco
2025-06-25 8:01 ` kernel test robot
2025-06-13 9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
2 siblings, 1 reply; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13 9:12 UTC (permalink / raw)
To: linux-kernel, Andrew Morton, David Hildenbrand, Ingo Molnar,
Peter Zijlstra, Mathieu Desnoyers, Paul E. McKenney, linux-mm
Cc: Gabriele Monaco, Ingo Molnar
Currently, the task_mm_cid_work function is called in a task work
triggered by the scheduler tick to periodically compact the mm_cids of each
process. This can delay the execution of the corresponding thread for
the entire duration of the function, negatively affecting the response
time of real-time tasks. In practice, we observe task_mm_cid_work
increasing the latency by 30-35us on a 128-core system, an order of
magnitude that is meaningful under PREEMPT_RT.
Run the task_mm_cid_work in a new work_struct connected to the
mm_struct rather than in the task context before returning to
userspace.
This work_struct is initialised with the mm and disabled before freeing
it. The queuing of the work happens while returning to userspace in
__rseq_handle_notify_resume, maintaining the checks to avoid running
more frequently than MM_CID_SCAN_DELAY.
To make sure this happens predictably also on long-running tasks, we
trigger a call to __rseq_handle_notify_resume from the scheduler tick
as well if the runtime exceeds a 100ms threshold.
The main advantage of this change is that the function can be offloaded
to a different CPU and even preempted by RT tasks.
Moreover, this new behaviour is more predictable with periodic tasks
with short runtime, which may rarely run during a scheduler tick.
Now, the work is always scheduled when the task returns to userspace.
The work is disabled during mmdrop; since the function cannot sleep in
all kernel configurations, we cannot wait for possibly running work
items to terminate. We make sure the mm is valid in case the task is
terminating by reserving it with mmgrab/mmdrop, returning prematurely if
we are really the last user while the work gets to run.
This situation is unlikely since we don't schedule the work for exiting
tasks, but we cannot rule it out.
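In short, the mm lifetime around the work item follows this pattern
(condensed from the hunks below):

  /* Queue side, task_queue_mm_cid(), on return to userspace: */
  mmgrab(curr->mm);               /* keep the mm valid until the work runs */
  queue_work(system_unbound_wq, &curr->mm->cid_work);

  /* Work side, task_mm_cid_work(): */
  struct mm_struct *mm = container_of(work, struct mm_struct, cid_work);

  if (atomic_read(&mm->mm_count) == 1)    /* we are the last user, process already terminated */
          goto out_drop;
  /* ... compact the mm_cids ... */
  out_drop:
  mmdrop(mm);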
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/linux/mm_types.h | 26 ++++++++++++++
include/linux/sched.h | 8 ++++-
kernel/rseq.c | 2 ++
kernel/sched/core.c | 75 ++++++++++++++++++++++++++--------------
kernel/sched/sched.h | 6 ++--
5 files changed, 89 insertions(+), 28 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d6b91e8a66d6d..d14c7c49cf0ec 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1017,6 +1017,10 @@ struct mm_struct {
* mm nr_cpus_allowed updates.
*/
raw_spinlock_t cpus_allowed_lock;
+ /*
+ * @cid_work: Work item to run the mm_cid scan.
+ */
+ struct work_struct cid_work;
#endif
#ifdef CONFIG_MMU
atomic_long_t pgtables_bytes; /* size of all page tables */
@@ -1321,6 +1325,8 @@ enum mm_cid_state {
MM_CID_LAZY_PUT = (1U << 31),
};
+extern void task_mm_cid_work(struct work_struct *work);
+
static inline bool mm_cid_is_unset(int cid)
{
return cid == MM_CID_UNSET;
@@ -1393,12 +1399,14 @@ static inline int mm_alloc_cid_noprof(struct mm_struct *mm, struct task_struct *
if (!mm->pcpu_cid)
return -ENOMEM;
mm_init_cid(mm, p);
+ INIT_WORK(&mm->cid_work, task_mm_cid_work);
return 0;
}
#define mm_alloc_cid(...) alloc_hooks(mm_alloc_cid_noprof(__VA_ARGS__))
static inline void mm_destroy_cid(struct mm_struct *mm)
{
+ disable_work(&mm->cid_work);
free_percpu(mm->pcpu_cid);
mm->pcpu_cid = NULL;
}
@@ -1420,6 +1428,16 @@ static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumas
WRITE_ONCE(mm->nr_cpus_allowed, cpumask_weight(mm_allowed));
raw_spin_unlock(&mm->cpus_allowed_lock);
}
+
+static inline bool mm_cid_needs_scan(struct mm_struct *mm)
+{
+ return mm && !time_before(jiffies, READ_ONCE(mm->mm_cid_next_scan));
+}
+
+static inline bool mm_cid_scan_pending(struct mm_struct *mm)
+{
+ return mm && work_pending(&mm->cid_work);
+}
#else /* CONFIG_SCHED_MM_CID */
static inline void mm_init_cid(struct mm_struct *mm, struct task_struct *p) { }
static inline int mm_alloc_cid(struct mm_struct *mm, struct task_struct *p) { return 0; }
@@ -1430,6 +1448,14 @@ static inline unsigned int mm_cid_size(void)
return 0;
}
static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumask *cpumask) { }
+static inline bool mm_cid_needs_scan(struct mm_struct *mm)
+{
+ return false;
+}
+static inline bool mm_cid_scan_pending(struct mm_struct *mm)
+{
+ return false;
+}
#endif /* CONFIG_SCHED_MM_CID */
struct mmu_gather;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4f78a64beb52c..e90bc52dece3e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1432,7 +1432,7 @@ struct task_struct {
int last_mm_cid; /* Most recent cid in mm */
int migrate_from_cpu;
int mm_cid_active; /* Whether cid bitmap is active */
- struct callback_head cid_work;
+ unsigned long last_cid_reset; /* Time of last reset in jiffies */
#endif
struct tlbflush_unmap_batch tlb_ubc;
@@ -2277,4 +2277,10 @@ static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct allo
#define alloc_tag_restore(_tag, _old) do {} while (0)
#endif
+#ifdef CONFIG_SCHED_MM_CID
+extern void task_queue_mm_cid(struct task_struct *curr);
+#else
+static inline void task_queue_mm_cid(struct task_struct *curr) { }
+#endif
+
#endif
diff --git a/kernel/rseq.c b/kernel/rseq.c
index b7a1ec327e811..383db2ccad4d0 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -441,6 +441,8 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
}
if (unlikely(rseq_update_cpu_node_id(t)))
goto error;
+ if (mm_cid_needs_scan(t->mm))
+ task_queue_mm_cid(t);
return;
error:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dce50fa57471d..7d502a99a69cb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10589,22 +10589,16 @@ static void sched_mm_cid_remote_clear_weight(struct mm_struct *mm, int cpu,
sched_mm_cid_remote_clear(mm, pcpu_cid, cpu);
}
-static void task_mm_cid_work(struct callback_head *work)
+void task_mm_cid_work(struct work_struct *work)
{
unsigned long now = jiffies, old_scan, next_scan;
- struct task_struct *t = current;
struct cpumask *cidmask;
- struct mm_struct *mm;
+ struct mm_struct *mm = container_of(work, struct mm_struct, cid_work);
int weight, cpu;
- WARN_ON_ONCE(t != container_of(work, struct task_struct, cid_work));
-
- work->next = work; /* Prevent double-add */
- if (t->flags & PF_EXITING)
- return;
- mm = t->mm;
- if (!mm)
- return;
+ /* We are the last user, process already terminated. */
+ if (atomic_read(&mm->mm_count) == 1)
+ goto out_drop;
old_scan = READ_ONCE(mm->mm_cid_next_scan);
next_scan = now + msecs_to_jiffies(MM_CID_SCAN_DELAY);
if (!old_scan) {
@@ -10617,9 +10611,9 @@ static void task_mm_cid_work(struct callback_head *work)
old_scan = next_scan;
}
if (time_before(now, old_scan))
- return;
+ goto out_drop;
if (!try_cmpxchg(&mm->mm_cid_next_scan, &old_scan, next_scan))
- return;
+ goto out_drop;
cidmask = mm_cidmask(mm);
/* Clear cids that were not recently used. */
for_each_possible_cpu(cpu)
@@ -10631,6 +10625,8 @@ static void task_mm_cid_work(struct callback_head *work)
*/
for_each_possible_cpu(cpu)
sched_mm_cid_remote_clear_weight(mm, cpu, weight);
+out_drop:
+ mmdrop(mm);
}
void init_sched_mm_cid(struct task_struct *t)
@@ -10643,23 +10639,52 @@ void init_sched_mm_cid(struct task_struct *t)
if (mm_users == 1)
mm->mm_cid_next_scan = jiffies + msecs_to_jiffies(MM_CID_SCAN_DELAY);
}
- t->cid_work.next = &t->cid_work; /* Protect against double add */
- init_task_work(&t->cid_work, task_mm_cid_work);
}
-void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
+void task_tick_mm_cid(struct rq *rq, struct task_struct *t)
{
- struct callback_head *work = &curr->cid_work;
- unsigned long now = jiffies;
+ u64 rtime = t->se.sum_exec_runtime - t->se.prev_sum_exec_runtime;
- if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) ||
- work->next != work)
- return;
- if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
- return;
+ /*
+ * If a task is running unpreempted for a long time, it won't get its
+ * mm_cid compacted and won't update its mm_cid value after a
+ * compaction occurs.
+ * For such a task, this function does two things:
+ * A) trigger the mm_cid recompaction,
+ * B) trigger an update of the task's rseq->mm_cid field at some point
+ * after recompaction, so it can get a mm_cid value closer to 0.
+ * A change in the mm_cid triggers an rseq_preempt.
+ *
+ * A occurs only once after the scan time elapsed, until the next scan
+ * expires as well.
+ * B occurs once after the compaction work completes, that is when scan
+ * is no longer needed (it occurred for this mm) but the last rseq
+ * preempt was done before the last mm_cid scan.
+ */
+ if (t->mm && rtime > RSEQ_UNPREEMPTED_THRESHOLD) {
+ if (mm_cid_needs_scan(t->mm) && !mm_cid_scan_pending(t->mm))
+ rseq_set_notify_resume(t);
+ else if (time_after(jiffies, t->last_cid_reset +
+ msecs_to_jiffies(MM_CID_SCAN_DELAY))) {
+ int old_cid = t->mm_cid;
+
+ if (!t->mm_cid_active)
+ return;
+ mm_cid_snapshot_time(rq, t->mm);
+ mm_cid_put_lazy(t);
+ t->last_mm_cid = t->mm_cid = mm_cid_get(rq, t, t->mm);
+ if (old_cid != t->mm_cid)
+ rseq_preempt(t);
+ }
+ }
+}
- /* No page allocation under rq lock */
- task_work_add(curr, work, TWA_RESUME);
+/* Call only when curr is a user thread. */
+void task_queue_mm_cid(struct task_struct *curr)
+{
+ /* Ensure the mm exists when we run. */
+ mmgrab(curr->mm);
+ queue_work(system_unbound_wq, &curr->mm->cid_work);
}
void sched_mm_cid_exit_signals(struct task_struct *t)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bb5998295e..c1881ba10ac62 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3606,13 +3606,14 @@ extern const char *preempt_modes[];
#define SCHED_MM_CID_PERIOD_NS (100ULL * 1000000) /* 100ms */
#define MM_CID_SCAN_DELAY 100 /* 100ms */
+#define RSEQ_UNPREEMPTED_THRESHOLD SCHED_MM_CID_PERIOD_NS
extern raw_spinlock_t cid_lock;
extern int use_cid_lock;
extern void sched_mm_cid_migrate_from(struct task_struct *t);
extern void sched_mm_cid_migrate_to(struct rq *dst_rq, struct task_struct *t);
-extern void task_tick_mm_cid(struct rq *rq, struct task_struct *curr);
+extern void task_tick_mm_cid(struct rq *rq, struct task_struct *t);
extern void init_sched_mm_cid(struct task_struct *t);
static inline void __mm_cid_put(struct mm_struct *mm, int cid)
@@ -3822,6 +3823,7 @@ static inline int mm_cid_get(struct rq *rq, struct task_struct *t,
cid = __mm_cid_get(rq, t, mm);
__this_cpu_write(pcpu_cid->cid, cid);
__this_cpu_write(pcpu_cid->recent_cid, cid);
+ t->last_cid_reset = jiffies;
return cid;
}
@@ -3881,7 +3883,7 @@ static inline void switch_mm_cid(struct rq *rq,
static inline void switch_mm_cid(struct rq *rq, struct task_struct *prev, struct task_struct *next) { }
static inline void sched_mm_cid_migrate_from(struct task_struct *t) { }
static inline void sched_mm_cid_migrate_to(struct rq *dst_rq, struct task_struct *t) { }
-static inline void task_tick_mm_cid(struct rq *rq, struct task_struct *curr) { }
+static inline void task_tick_mm_cid(struct rq *rq, struct task_struct *t) { }
static inline void init_sched_mm_cid(struct task_struct *t) { }
#endif /* !CONFIG_SCHED_MM_CID */
--
2.49.0
* [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
2025-06-13 9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
@ 2025-06-13 9:12 ` Gabriele Monaco
2025-06-18 21:04 ` Shuah Khan
2 siblings, 1 reply; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-13 9:12 UTC (permalink / raw)
To: linux-kernel, Mathieu Desnoyers, Peter Zijlstra, Paul E. McKenney,
Shuah Khan, linux-kselftest
Cc: Gabriele Monaco, Ingo Molnar
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and in a timely manner.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should span all numbers between 0 and nproc.
At the end of this phase, a thread with a high enough mm_cid (>= nproc/2)
is selected to be the new leader; all other threads terminate.
After some time, the only running thread should see 0 as its mm_cid; if
that doesn't happen, the compaction mechanism didn't work and the test
fails.
The test never fails if only 1 core is available, in which case we
cannot test anything as the only available mm_cid is 0.
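The pass/fail condition boils down to the loop executed by the surviving
leader thread (see the full test below):

  for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
          if (rseq_current_mm_cid() == 0)
                  exit(EXIT_SUCCESS);     /* compaction moved the last thread's cid to 0 */
          usleep(RUNNER_PERIOD);
  }
  exit(EXIT_FAILURE);                     /* mm_cid never converged to 0 */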
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 0fda241fa62b0..b3920c59bf401 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 0d0a5fae59547..bc4d940f66d40 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,7 +17,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
- syscall_errors_test
+ syscall_errors_test mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.49.0
* Re: [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
2025-06-13 9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
@ 2025-06-18 21:04 ` Shuah Khan
2025-06-20 17:20 ` Gabriele Monaco
0 siblings, 1 reply; 11+ messages in thread
From: Shuah Khan @ 2025-06-18 21:04 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Paul E. McKenney, Shuah Khan, linux-kselftest
Cc: Ingo Molnar, Shuah Khan
On 6/13/25 03:12, Gabriele Monaco wrote:
> A task in the kernel (task_mm_cid_work) runs somewhat periodically to
> compact the mm_cid for each process. Add a test to validate that it runs
> correctly and timely.
>
> The test spawns 1 thread pinned to each CPU, then each thread, including
> the main one, runs in short bursts for some time. During this period, the
> mm_cids should be spanning all numbers between 0 and nproc.
>
> At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
> is selected to be the new leader, all other threads terminate.
>
> After some time, the only running thread should see 0 as mm_cid, if that
> doesn't happen, the compaction mechanism didn't work and the test fails.
>
> The test never fails if only 1 core is available, in which case, we
> cannot test anything as the only available mm_cid is 0.
>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Mathieu,
Let me know if you would like me to take this through my tree.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
thanks,
-- Shuah
* Re: [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
2025-06-18 21:04 ` Shuah Khan
@ 2025-06-20 17:20 ` Gabriele Monaco
2025-06-20 17:29 ` Mathieu Desnoyers
0 siblings, 1 reply; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-20 17:20 UTC (permalink / raw)
To: Shuah Khan
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra, Paul E. McKenney,
Shuah Khan, linux-kselftest, Ingo Molnar
2025-06-18T21:04:30Z Shuah Khan <skhan@linuxfoundation.org>:
> On 6/13/25 03:12, Gabriele Monaco wrote:
>> A task in the kernel (task_mm_cid_work) runs somewhat periodically to
>> compact the mm_cid for each process. Add a test to validate that it runs
>> correctly and timely.
>> The test spawns 1 thread pinned to each CPU, then each thread, including
>> the main one, runs in short bursts for some time. During this period, the
>> mm_cids should be spanning all numbers between 0 and nproc.
>> At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
>> is selected to be the new leader, all other threads terminate.
>> After some time, the only running thread should see 0 as mm_cid, if that
>> doesn't happen, the compaction mechanism didn't work and the test fails.
>> The test never fails if only 1 core is available, in which case, we
>> cannot test anything as the only available mm_cid is 0.
>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>
> Mathieu,
>
> Let me know if you would like me to take this through my tree.
>
> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Thanks for the Ack, just to add some context: the test is flaky without the previous patches, would it still be alright to pull it before them?
Thanks,
Gabriele
* Re: [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction
2025-06-20 17:20 ` Gabriele Monaco
@ 2025-06-20 17:29 ` Mathieu Desnoyers
0 siblings, 0 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2025-06-20 17:29 UTC (permalink / raw)
To: Gabriele Monaco, Shuah Khan
Cc: linux-kernel, Peter Zijlstra, Paul E. McKenney, Shuah Khan,
linux-kselftest, Ingo Molnar
On 2025-06-20 13:20, Gabriele Monaco wrote:
> 2025-06-18T21:04:30Z Shuah Khan <skhan@linuxfoundation.org>:
>
>> On 6/13/25 03:12, Gabriele Monaco wrote:
>>> A task in the kernel (task_mm_cid_work) runs somewhat periodically to
>>> compact the mm_cid for each process. Add a test to validate that it runs
>>> correctly and timely.
>>> The test spawns 1 thread pinned to each CPU, then each thread, including
>>> the main one, runs in short bursts for some time. During this period, the
>>> mm_cids should be spanning all numbers between 0 and nproc.
>>> At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
>>> is selected to be the new leader, all other threads terminate.
>>> After some time, the only running thread should see 0 as mm_cid, if that
>>> doesn't happen, the compaction mechanism didn't work and the test fails.
>>> The test never fails if only 1 core is available, in which case, we
>>> cannot test anything as the only available mm_cid is 0.
>>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>>
>> Mathieu,
>>
>> Let me know if you would like me to take this through my tree.
>>
>> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
>
> Thanks for the Ack, just to add some context: the test is flaky without the previous patches, would it still be alright to pull it before them?
We need Peter Zijlstra to act on merging the fix.
Peter ?
Thanks,
Mathieu
>
> Thanks,
> Gabriele
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
2025-06-13 9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
@ 2025-06-25 8:01 ` kernel test robot
2025-06-25 13:57 ` Mathieu Desnoyers
0 siblings, 1 reply; 11+ messages in thread
From: kernel test robot @ 2025-06-25 8:01 UTC (permalink / raw)
To: Gabriele Monaco
Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
Mathieu Desnoyers, Paul E. McKenney, Gabriele Monaco, Ingo Molnar,
oliver.sang
Hello,
kernel test robot noticed a 10.1% regression of hackbench.throughput on:
commit: f3de761c52148abfb1b4512914f64c7e1c737fc8 ("[RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct")
url: https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250613-171504
patch link: https://lore.kernel.org/all/20250613091229.21500-3-gmonaco@redhat.com/
patch subject: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
testcase: hackbench
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
iterations: 4
mode: process
ipc: pipe
cpufreq_governor: performance
In addition to that, the commit also has significant impact on the following tests:
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput 2.9% regression |
| test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | ipc=socket |
| | iterations=4 |
| | mode=process |
| | nr_threads=50% |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.shell_rtns_3.ops_per_sec 1.7% regression |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters | cpufreq_governor=performance |
| | test=shell_rtns_3 |
| | testtime=300s |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput 6.2% regression |
| test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | ipc=pipe |
| | iterations=4 |
| | mode=process |
| | nr_threads=800% |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.shell_rtns_1.ops_per_sec 2.1% regression |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters | cpufreq_governor=performance |
| | test=shell_rtns_1 |
| | testtime=300s |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput 11.8% improvement |
| test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | ipc=pipe |
| | iterations=4 |
| | mode=process |
| | nr_threads=50% |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.shell_rtns_2.ops_per_sec 2.2% regression |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters | cpufreq_governor=performance |
| | test=shell_rtns_2 |
| | testtime=300s |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim9: aim9.exec_test.ops_per_sec 2.6% regression |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters | cpufreq_governor=performance |
| | test=exec_test |
| | testtime=300s |
+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | aim7: aim7.jobs-per-min 5.5% regression |
| test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | disk=1BRD_48G |
| | fs=xfs |
| | load=600 |
| | test=sync_disk_rw |
+------------------+------------------------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506251555.de6720f7-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250625/202506251555.de6720f7-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
55140 ± 80% +229.2% 181547 ± 20% numa-meminfo.node1.Mapped
13048 ± 80% +248.2% 45431 ± 20% numa-vmstat.node1.nr_mapped
679.17 ± 22% -25.3% 507.33 ± 10% sched_debug.cfs_rq:/.util_est.max
4.287e+08 ± 3% +20.3% 5.158e+08 cpuidle..time
2953716 ± 13% +228.9% 9716185 ± 2% cpuidle..usage
91072 ± 12% +134.8% 213855 ± 7% meminfo.Mapped
8848637 +10.4% 9769875 ± 5% meminfo.Memused
0.67 ± 4% +0.1 0.78 ± 2% mpstat.cpu.all.irq%
0.03 ± 2% +0.0 0.03 ± 4% mpstat.cpu.all.soft%
4.17 ± 8% +596.0% 29.00 ± 31% mpstat.max_utilization.seconds
2950 -12.3% 2587 vmstat.procs.r
4557607 ± 2% +35.9% 6192548 vmstat.system.cs
397195 ± 5% +73.4% 688726 vmstat.system.in
1490153 -10.1% 1339340 hackbench.throughput
1424170 -8.7% 1299590 hackbench.throughput_avg
1490153 -10.1% 1339340 hackbench.throughput_best
1353181 ± 2% -10.1% 1216523 hackbench.throughput_worst
53158738 ± 3% +34.0% 71240022 hackbench.time.involuntary_context_switches
12177 -2.4% 11891 hackbench.time.percent_of_cpu_this_job_got
4482 +7.6% 4821 hackbench.time.system_time
798.92 +2.0% 815.24 hackbench.time.user_time
1.54e+08 ± 3% +46.6% 2.257e+08 hackbench.time.voluntary_context_switches
210335 +3.3% 217333 proc-vmstat.nr_anon_pages
23353 ± 14% +136.2% 55152 ± 7% proc-vmstat.nr_mapped
61825 ± 3% +6.6% 65928 ± 2% proc-vmstat.nr_page_table_pages
30859 +4.4% 32213 proc-vmstat.nr_slab_reclaimable
1294 ±177% +1657.1% 22743 ± 66% proc-vmstat.numa_hint_faults
1153 ±198% +1597.0% 19566 ± 79% proc-vmstat.numa_hint_faults_local
1.242e+08 -3.2% 1.202e+08 proc-vmstat.numa_hit
1.241e+08 -3.2% 1.201e+08 proc-vmstat.numa_local
2195 ±110% +2337.0% 53508 ± 55% proc-vmstat.numa_pte_updates
1.243e+08 -3.2% 1.203e+08 proc-vmstat.pgalloc_normal
875909 ± 2% +8.6% 951378 ± 2% proc-vmstat.pgfault
1.231e+08 -3.5% 1.188e+08 proc-vmstat.pgfree
6.903e+10 -5.6% 6.514e+10 perf-stat.i.branch-instructions
0.21 +0.0 0.26 perf-stat.i.branch-miss-rate%
89225177 ± 2% +38.3% 1.234e+08 perf-stat.i.branch-misses
25.64 ± 2% -5.7 19.95 ± 2% perf-stat.i.cache-miss-rate%
9.322e+08 ± 2% +22.8% 1.145e+09 perf-stat.i.cache-references
4553621 ± 2% +39.8% 6363761 perf-stat.i.context-switches
1.12 +4.5% 1.17 perf-stat.i.cpi
186890 ± 2% +143.9% 455784 perf-stat.i.cpu-migrations
2.787e+11 -4.9% 2.649e+11 perf-stat.i.instructions
0.91 -4.4% 0.87 perf-stat.i.ipc
36.79 ± 2% +44.9% 53.30 perf-stat.i.metric.K/sec
0.13 ± 2% +0.1 0.19 perf-stat.overall.branch-miss-rate%
24.44 ± 2% -4.7 19.74 ± 2% perf-stat.overall.cache-miss-rate%
1.12 +4.6% 1.17 perf-stat.overall.cpi
0.89 -4.4% 0.85 perf-stat.overall.ipc
6.755e+10 -5.4% 6.392e+10 perf-stat.ps.branch-instructions
87121352 ± 2% +38.5% 1.206e+08 perf-stat.ps.branch-misses
9.098e+08 ± 2% +23.1% 1.12e+09 perf-stat.ps.cache-references
4443812 ± 2% +39.9% 6218298 perf-stat.ps.context-switches
181595 ± 2% +144.5% 443985 perf-stat.ps.cpu-migrations
2.727e+11 -4.7% 2.599e+11 perf-stat.ps.instructions
1.21e+13 +4.3% 1.262e+13 perf-stat.total.instructions
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.ctx_resched.event_function.remote_function.generic_exec_single
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.ctx_resched.event_function.remote_function.generic_exec_single.smp_call_function_single
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.event_function.remote_function.generic_exec_single.smp_call_function_single.event_function_call
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.remote_function.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child
11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl
11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record.run_builtin
11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record
11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.__intel_pmu_enable_all
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.__x64_sys_ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp._perf_ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.ctx_resched
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.event_function
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.generic_exec_single
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.perf_event_for_each_child
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.perf_ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.remote_function
11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.__evlist__enable
11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_c2c__record
11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_evsel__enable_cpu
11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_evsel__run_ioctl
11.84 ± 91% -9.5 2.30 ±141% perf-profile.self.cycles-pp.__intel_pmu_enable_all
23.74 ±185% -98.6% 0.34 ±114% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
12.77 ± 80% -83.9% 2.05 ±138% perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
5.93 ± 69% -90.5% 0.56 ±105% perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
6.70 ±152% -94.5% 0.37 ±145% perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.82 ± 85% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.59 ±202% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
13.53 ± 11% -47.0% 7.18 ± 76% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
15.63 ± 17% -100.0% 0.00 perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
47.22 ± 77% -85.5% 6.87 ±144% perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
133.35 ±132% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
68.01 ±203% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
13.53 ± 11% -47.0% 7.18 ± 76% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
34.59 ± 3% -100.0% 0.00 perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
40.97 ± 8% -71.8% 11.55 ± 64% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
373.07 ±123% -99.8% 0.78 ±156% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
13.53 ± 11% -62.0% 5.14 ±107% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
120.97 ± 23% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
46.03 ± 30% -62.5% 17.27 ± 87% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
984.50 ± 14% -43.5% 556.24 ± 58% perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
339.42 ± 12% -97.3% 9.11 ± 54% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
8.00 ± 23% -85.4% 1.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
22.17 ± 49% -100.0% 0.00 perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
73.83 ± 20% -76.3% 17.50 ± 96% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
13.53 ± 11% -62.0% 5.14 ±107% perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
336.30 ± 5% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
23.74 ±185% -98.6% 0.34 ±114% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
14.48 ± 61% -74.1% 3.76 ±152% perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
6.48 ± 68% -91.3% 0.56 ±105% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
6.70 ±152% -94.5% 0.37 ±145% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
2.18 ± 75% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.79 ±165% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1.53 ±100% -97.5% 0.04 ± 84% perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
105.34 ± 26% -100.0% 0.00 perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
29.72 ± 40% -76.5% 7.00 ±102% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
32.21 ± 33% -65.7% 11.04 ± 85% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
984.49 ± 14% -43.5% 556.23 ± 58% perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
337.00 ± 12% -97.6% 8.11 ± 52% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
53.42 ± 59% -69.8% 16.15 ±162% perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
218.65 ± 83% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
82.52 ±162% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
10.89 ± 98% -98.8% 0.13 ±134% perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
334.02 ± 6% -100.0% 0.00 perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-12/performance/socket/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
161258 -12.6% 141018 ± 5% perf-c2c.HITM.total
6514 ± 3% +13.3% 7381 ± 3% uptime.idle
692218 +17.8% 815512 vmstat.system.in
4.747e+08 ± 7% +137.3% 1.127e+09 ± 21% cpuidle..time
5702271 ± 12% +503.6% 34419686 ± 13% cpuidle..usage
141191 ± 2% +10.3% 155768 ± 3% meminfo.PageTables
62180 +26.0% 78348 meminfo.Percpu
2.20 ± 14% +3.5 5.67 ± 20% mpstat.cpu.all.idle%
0.55 +0.2 0.72 ± 5% mpstat.cpu.all.irq%
0.04 ± 2% +0.0 0.06 ± 5% mpstat.cpu.all.soft%
448780 -2.9% 435554 hackbench.throughput
440656 -2.6% 429130 hackbench.throughput_avg
448780 -2.9% 435554 hackbench.throughput_best
425797 -2.2% 416584 hackbench.throughput_worst
90998790 -15.0% 77364427 ± 6% hackbench.time.involuntary_context_switches
12446 -3.9% 11960 hackbench.time.percent_of_cpu_this_job_got
16057 -1.4% 15825 hackbench.time.system_time
63421 -2.3% 61955 proc-vmstat.nr_kernel_stack
35455 ± 2% +10.0% 38991 ± 3% proc-vmstat.nr_page_table_pages
34542 +5.1% 36312 ± 2% proc-vmstat.nr_slab_reclaimable
151083 ± 16% +46.6% 221509 ± 17% proc-vmstat.numa_hint_faults
113731 ± 26% +64.7% 187314 ± 20% proc-vmstat.numa_hint_faults_local
133591 +3.1% 137709 proc-vmstat.numa_other
53696 ± 16% -28.6% 38362 ± 10% proc-vmstat.numa_pages_migrated
1053504 ± 2% +7.7% 1135052 ± 4% proc-vmstat.pgfault
2077549 ± 3% +8.5% 2254157 ± 4% proc-vmstat.pgfree
53696 ± 16% -28.6% 38362 ± 10% proc-vmstat.pgmigrate_success
4.941e+10 -2.6% 4.81e+10 perf-stat.i.branch-instructions
2.232e+08 -1.9% 2.189e+08 perf-stat.i.branch-misses
2.11e+09 -5.8% 1.989e+09 ± 2% perf-stat.i.cache-references
3.221e+11 -2.5% 3.141e+11 perf-stat.i.cpu-cycles
2.365e+11 -2.7% 2.303e+11 perf-stat.i.instructions
6787 ± 3% +8.0% 7327 ± 4% perf-stat.i.minor-faults
6789 ± 3% +8.0% 7329 ± 4% perf-stat.i.page-faults
4.904e+10 -2.5% 4.779e+10 perf-stat.ps.branch-instructions
2.215e+08 -1.8% 2.174e+08 perf-stat.ps.branch-misses
2.094e+09 -5.7% 1.974e+09 ± 2% perf-stat.ps.cache-references
3.197e+11 -2.4% 3.12e+11 perf-stat.ps.cpu-cycles
2.348e+11 -2.6% 2.288e+11 perf-stat.ps.instructions
6691 ± 3% +7.2% 7174 ± 4% perf-stat.ps.minor-faults
6693 ± 3% +7.2% 7176 ± 4% perf-stat.ps.page-faults
7475567 +16.4% 8699139 sched_debug.cfs_rq:/.avg_vruntime.avg
8752154 ± 3% +20.6% 10551563 ± 4% sched_debug.cfs_rq:/.avg_vruntime.max
211424 ± 12% +374.5% 1003211 ± 39% sched_debug.cfs_rq:/.avg_vruntime.stddev
19.44 ± 6% +29.4% 25.17 ± 5% sched_debug.cfs_rq:/.h_nr_queued.max
4.49 ± 4% +33.5% 5.99 ± 4% sched_debug.cfs_rq:/.h_nr_queued.stddev
19.33 ± 6% +29.0% 24.94 ± 5% sched_debug.cfs_rq:/.h_nr_runnable.max
4.47 ± 4% +33.4% 5.96 ± 3% sched_debug.cfs_rq:/.h_nr_runnable.stddev
6446 ±223% +885.4% 63529 ± 57% sched_debug.cfs_rq:/.left_deadline.avg
825119 ±223% +613.5% 5886958 ± 44% sched_debug.cfs_rq:/.left_deadline.max
72645 ±223% +713.6% 591074 ± 49% sched_debug.cfs_rq:/.left_deadline.stddev
6446 ±223% +885.5% 63527 ± 57% sched_debug.cfs_rq:/.left_vruntime.avg
825080 ±223% +613.5% 5886805 ± 44% sched_debug.cfs_rq:/.left_vruntime.max
72642 ±223% +713.7% 591058 ± 49% sched_debug.cfs_rq:/.left_vruntime.stddev
4202 ± 8% +1115.1% 51069 ± 61% sched_debug.cfs_rq:/.load.stddev
367.11 +20.2% 441.44 ± 17% sched_debug.cfs_rq:/.load_avg.max
7475567 +16.4% 8699139 sched_debug.cfs_rq:/.min_vruntime.avg
8752154 ± 3% +20.6% 10551563 ± 4% sched_debug.cfs_rq:/.min_vruntime.max
211424 ± 12% +374.5% 1003211 ± 39% sched_debug.cfs_rq:/.min_vruntime.stddev
0.17 ± 16% +39.8% 0.24 ± 6% sched_debug.cfs_rq:/.nr_queued.stddev
6446 ±223% +885.5% 63527 ± 57% sched_debug.cfs_rq:/.right_vruntime.avg
825080 ±223% +613.5% 5886805 ± 44% sched_debug.cfs_rq:/.right_vruntime.max
72642 ±223% +713.7% 591058 ± 49% sched_debug.cfs_rq:/.right_vruntime.stddev
752.39 ± 81% -81.4% 139.72 ± 53% sched_debug.cfs_rq:/.runnable_avg.min
2728 ± 3% +51.2% 4126 ± 8% sched_debug.cfs_rq:/.runnable_avg.stddev
265.50 ± 2% +12.3% 298.07 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
686.78 ± 7% +23.4% 847.76 ± 6% sched_debug.cfs_rq:/.util_est.stddev
19.44 ± 5% +29.7% 25.22 ± 4% sched_debug.cpu.nr_running.max
4.48 ± 5% +34.4% 6.02 ± 3% sched_debug.cpu.nr_running.stddev
67323 ± 14% +130.3% 155017 ± 29% sched_debug.cpu.nr_switches.stddev
-20.78 -18.2% -17.00 sched_debug.cpu.nr_uninterruptible.min
0.13 ±100% -85.8% 0.02 ±163% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
0.17 ±116% -97.8% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
22.92 ±110% -97.4% 0.59 ±137% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
8.10 ± 45% -78.0% 1.78 ±135% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
3.14 ± 19% -70.9% 0.91 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
39.05 ±149% -97.4% 1.01 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
15.77 ±203% -99.7% 0.04 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
1.27 ±177% -98.2% 0.02 ±190% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
0.20 ±140% -92.4% 0.02 ±201% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
86.63 ±221% -99.9% 0.05 ±184% perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.18 ± 75% -97.0% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
0.13 ± 34% -75.5% 0.03 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
0.26 ±108% -86.2% 0.04 ±142% perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
2.33 ± 11% -65.8% 0.80 ±107% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
0.18 ± 88% -91.1% 0.02 ±194% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
0.50 ±145% -92.5% 0.04 ±210% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
0.19 ±116% -98.5% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
0.24 ±128% -96.8% 0.01 ±180% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
0.99 ± 16% -58.0% 0.42 ±100% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
0.27 ±124% -97.5% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
1.08 ± 28% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.96 ± 93% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.53 ±182% -94.2% 0.03 ±158% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.84 ±160% -93.5% 0.05 ±100% perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
29.39 ±172% -94.0% 1.78 ±123% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
21.51 ± 60% -74.7% 5.45 ±118% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.77 ± 61% -81.3% 2.57 ±113% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
11.22 ± 33% -74.5% 2.86 ±107% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
1.99 ± 90% -90.1% 0.20 ±100% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
4.50 ±138% -94.9% 0.23 ±200% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
27.91 ±218% -99.6% 0.11 ±120% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
9.91 ± 51% -68.3% 3.15 ±124% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
10.18 ± 24% -62.4% 3.83 ±105% perf-sched.sch_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
1.16 ± 20% -62.7% 0.43 ±106% perf-sched.sch_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
0.27 ± 99% -92.0% 0.02 ±172% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
0.32 ±128% -98.9% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
0.88 ± 94% -86.7% 0.12 ±144% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
252.53 ±128% -98.4% 4.12 ±138% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
60.22 ± 58% -67.8% 19.37 ±146% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
168.93 ±209% -99.9% 0.15 ±100% perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
3.79 ±169% -98.6% 0.05 ±199% perf-sched.sch_delay.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
517.19 ±222% -99.9% 0.29 ±201% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.54 ± 82% -98.4% 0.01 ±141% perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
0.34 ± 57% -93.1% 0.02 ±203% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
0.64 ±141% -99.4% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
0.28 ±111% -97.2% 0.01 ±180% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
0.29 ±114% -97.6% 0.01 ±141% perf-sched.sch_delay.max.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
133.30 ± 46% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.53 ±135% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1.11 ± 85% -76.9% 0.26 ±202% perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
7.48 ±214% -99.0% 0.08 ±141% perf-sched.sch_delay.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
28.59 ±191% -99.0% 0.28 ±120% perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
285.16 ±145% -99.3% 1.94 ±111% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
143.71 ±128% -91.0% 12.97 ±134% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
107.10 ±162% -99.1% 0.95 ±190% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
352.73 ±216% -99.4% 2.06 ±118% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1169 ± 25% -58.7% 482.79 ±101% perf-sched.sch_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
1.80 ± 20% -58.5% 0.75 ±105% perf-sched.total_sch_delay.average.ms
5.09 ± 20% -58.0% 2.14 ±106% perf-sched.total_wait_and_delay.average.ms
20.86 ± 25% -82.0% 3.76 ±147% perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
8.10 ± 21% -69.1% 2.51 ±103% perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
22.82 ± 27% -66.9% 7.55 ±103% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
6.55 ± 13% -64.1% 2.35 ±108% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
139.95 ± 55% -64.0% 50.45 ±122% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
27.54 ± 61% -81.3% 5.15 ±113% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
27.75 ± 30% -73.3% 7.42 ±106% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
26.76 ± 25% -64.2% 9.57 ±107% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
29.39 ± 34% -67.3% 9.61 ±115% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
27.53 ± 25% -62.9% 10.21 ±105% perf-sched.wait_and_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
3.25 ± 20% -62.2% 1.23 ±106% perf-sched.wait_and_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
864.18 ± 4% -99.3% 6.27 ±103% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
141.47 ± 38% -72.9% 38.27 ±154% perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
2346 ± 25% -58.7% 969.53 ±101% perf-sched.wait_and_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
83.99 ±223% -100.0% 0.02 ±163% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
0.16 ±122% -97.7% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
12.76 ± 37% -81.6% 2.35 ±125% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
4.96 ± 22% -67.9% 1.59 ±104% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
75.22 ± 91% -96.4% 2.67 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
23.31 ±188% -98.8% 0.28 ±195% perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
14.93 ± 22% -68.0% 4.78 ±104% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1.29 ±178% -98.5% 0.02 ±185% perf-sched.wait_time.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
0.20 ±140% -92.5% 0.02 ±200% perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
87.29 ±221% -99.9% 0.05 ±184% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.18 ± 76% -97.0% 0.01 ±141% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
0.12 ± 33% -87.4% 0.02 ±212% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
4.22 ± 15% -63.3% 1.55 ±108% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
0.18 ± 88% -91.1% 0.02 ±194% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
0.50 ±145% -92.5% 0.04 ±210% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
0.19 ±116% -98.5% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
0.24 ±128% -96.8% 0.01 ±180% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
1.79 ± 27% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.98 ± 92% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
2.44 ±199% -98.1% 0.05 ±109% perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
125.16 ± 52% -64.6% 44.36 ±120% perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
13.77 ± 61% -81.3% 2.58 ±113% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
16.53 ± 29% -72.5% 4.55 ±106% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
3.11 ± 80% -80.7% 0.60 ±138% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
17.30 ± 23% -65.0% 6.05 ±107% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
50.76 ±143% -98.1% 0.97 ±101% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
19.48 ± 27% -66.8% 6.46 ±111% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
17.35 ± 25% -63.3% 6.37 ±106% perf-sched.wait_time.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
2.09 ± 21% -62.0% 0.79 ±107% perf-sched.wait_time.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
850.73 ± 6% -99.3% 5.76 ±102% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
168.00 ±223% -100.0% 0.02 ±172% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
0.32 ±131% -98.8% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
0.88 ± 94% -86.7% 0.12 ±144% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
83.05 ± 45% -75.0% 20.78 ±142% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
393.39 ± 76% -96.3% 14.60 ±223% perf-sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
3.87 ±170% -98.6% 0.05 ±199% perf-sched.wait_time.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
520.88 ±222% -99.9% 0.29 ±201% perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.54 ± 82% -98.4% 0.01 ±141% perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
0.34 ± 57% -93.1% 0.02 ±203% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
0.64 ±141% -99.4% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
0.28 ±111% -97.2% 0.01 ±180% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
210.15 ± 42% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
34.48 ±131% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1.11 ± 85% -76.9% 0.26 ±202% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
92.32 ±212% -99.7% 0.27 ±123% perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
3252 ± 21% -58.5% 1351 ±103% perf-sched.wait_time.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
1602 ± 28% -66.2% 541.12 ±100% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
530.17 ± 95% -98.5% 7.79 ±119% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1177 ± 25% -58.7% 486.74 ±101% perf-sched.wait_time.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
50.88 -1.4 49.53 perf-profile.calltrace.cycles-pp.read
45.95 -1.0 44.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
45.66 -1.0 44.64 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
3.44 ± 4% -0.8 2.66 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
3.32 ± 4% -0.8 2.56 ± 4% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg
3.28 ± 4% -0.8 2.52 ± 4% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable
3.48 ± 3% -0.6 2.83 ± 5% perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
3.52 ± 3% -0.6 2.87 ± 5% perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
3.45 ± 3% -0.6 2.80 ± 5% perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg
47.06 -0.6 46.45 perf-profile.calltrace.cycles-pp.write
4.26 ± 5% -0.6 3.69 perf-profile.calltrace.cycles-pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write
1.58 ± 3% -0.6 1.02 ± 8% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
1.31 ± 3% -0.5 0.85 ± 8% perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
1.25 ± 3% -0.4 0.81 ± 8% perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
0.84 ± 3% -0.2 0.60 ± 5% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
7.91 -0.2 7.68 perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
3.17 ± 2% -0.2 2.94 perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
7.80 -0.2 7.58 perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
7.58 -0.2 7.36 perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
1.22 ± 4% -0.2 1.02 ± 4% perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
1.18 ± 4% -0.2 0.99 ± 4% perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_timeout
0.87 -0.2 0.68 ± 8% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_timeout
1.14 ± 4% -0.2 0.95 ± 4% perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
0.90 -0.2 0.72 ± 7% perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
3.45 ± 3% -0.1 3.30 perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
1.96 -0.1 1.82 perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
1.97 -0.1 1.86 perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
2.35 -0.1 2.25 perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
2.58 -0.1 2.48 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
1.38 ± 4% -0.1 1.28 ± 2% perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
1.35 -0.1 1.25 perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
0.67 ± 7% -0.1 0.58 ± 3% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
2.59 -0.1 2.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
2.02 -0.1 1.96 perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
0.77 ± 3% -0.0 0.72 ± 2% perf-profile.calltrace.cycles-pp.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.65 ± 4% -0.0 0.60 ± 2% perf-profile.calltrace.cycles-pp.fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.74 -0.0 0.70 perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
1.04 -0.0 0.99 perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb
0.69 -0.0 0.65 ± 2% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter
0.82 -0.0 0.80 perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
0.57 -0.0 0.56 perf-profile.calltrace.cycles-pp.refill_obj_stock.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
0.80 ± 9% +0.2 1.01 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
2.50 ± 4% +0.3 2.82 ± 9% perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
2.64 ± 6% +0.4 3.06 ± 12% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic
2.73 ± 6% +0.4 3.16 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
2.87 ± 6% +0.4 3.30 ± 12% perf-profile.calltrace.cycles-pp.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
18.38 +0.6 18.93 perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
0.00 +0.7 0.70 ± 11% perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
0.00 +0.8 0.76 ± 16% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_stream_sendmsg.sock_write_iter.vfs_write
0.00 +1.5 1.46 ± 11% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
0.00 +1.5 1.46 ± 11% perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.00 +1.5 1.46 ± 11% perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.00 +1.5 1.50 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
0.00 +1.5 1.52 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.00 +1.6 1.61 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.18 ±141% +1.8 1.93 ± 11% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.18 ±141% +1.8 1.94 ± 11% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
0.18 ±141% +1.8 1.94 ± 11% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
0.18 ±141% +1.8 1.97 ± 11% perf-profile.calltrace.cycles-pp.common_startup_64
0.00 +2.0 1.96 ± 11% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
87.96 -1.4 86.57 perf-profile.children.cycles-pp.do_syscall_64
88.72 -1.4 87.33 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
51.44 -1.4 50.05 perf-profile.children.cycles-pp.read
4.55 ± 2% -0.8 3.74 ± 5% perf-profile.children.cycles-pp.schedule
3.76 ± 4% -0.7 3.02 ± 3% perf-profile.children.cycles-pp.__wake_up_common
3.64 ± 4% -0.7 2.92 ± 3% perf-profile.children.cycles-pp.autoremove_wake_function
3.60 ± 4% -0.7 2.90 ± 3% perf-profile.children.cycles-pp.try_to_wake_up
4.00 ± 2% -0.6 3.36 ± 4% perf-profile.children.cycles-pp.schedule_timeout
4.65 ± 2% -0.6 4.02 ± 4% perf-profile.children.cycles-pp.__schedule
47.64 -0.6 47.01 perf-profile.children.cycles-pp.write
4.58 ± 4% -0.5 4.06 perf-profile.children.cycles-pp.__wake_up_sync_key
1.45 ± 2% -0.4 1.00 ± 5% perf-profile.children.cycles-pp.exit_to_user_mode_loop
1.84 ± 3% -0.3 1.50 ± 3% perf-profile.children.cycles-pp.ttwu_do_activate
1.62 ± 2% -0.3 1.33 ± 3% perf-profile.children.cycles-pp.enqueue_task
1.53 ± 2% -0.3 1.26 ± 3% perf-profile.children.cycles-pp.enqueue_task_fair
1.40 -0.3 1.14 ± 6% perf-profile.children.cycles-pp.pick_next_task_fair
3.97 -0.2 3.73 perf-profile.children.cycles-pp.clear_bhb_loop
1.43 -0.2 1.19 ± 5% perf-profile.children.cycles-pp.__pick_next_task
0.75 ± 4% -0.2 0.52 ± 8% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
7.95 -0.2 7.72 perf-profile.children.cycles-pp.unix_stream_read_actor
7.84 -0.2 7.61 perf-profile.children.cycles-pp.skb_copy_datagram_iter
3.24 ± 2% -0.2 3.01 perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
7.63 -0.2 7.42 perf-profile.children.cycles-pp.__skb_datagram_iter
0.94 ± 4% -0.2 0.73 ± 4% perf-profile.children.cycles-pp.enqueue_entity
0.95 ± 8% -0.2 0.76 ± 4% perf-profile.children.cycles-pp.update_curr
1.37 ± 3% -0.2 1.18 ± 3% perf-profile.children.cycles-pp.dequeue_task_fair
1.34 ± 4% -0.2 1.16 ± 3% perf-profile.children.cycles-pp.try_to_block_task
4.50 -0.2 4.34 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
1.37 ± 3% -0.2 1.20 ± 3% perf-profile.children.cycles-pp.dequeue_entities
3.48 ± 3% -0.1 3.33 perf-profile.children.cycles-pp._copy_to_iter
0.91 -0.1 0.78 ± 3% perf-profile.children.cycles-pp.update_load_avg
4.85 -0.1 4.72 perf-profile.children.cycles-pp.__check_object_size
3.23 -0.1 3.11 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.54 ± 3% -0.1 0.42 ± 5% perf-profile.children.cycles-pp.switch_mm_irqs_off
1.40 ± 4% -0.1 1.30 ± 2% perf-profile.children.cycles-pp._copy_from_iter
2.02 -0.1 1.92 perf-profile.children.cycles-pp.its_return_thunk
0.43 ± 2% -0.1 0.32 ± 3% perf-profile.children.cycles-pp.switch_fpu_return
0.29 ± 2% -0.1 0.18 ± 6% perf-profile.children.cycles-pp.__enqueue_entity
1.46 ± 3% -0.1 1.36 ± 2% perf-profile.children.cycles-pp.fdget_pos
0.44 ± 3% -0.1 0.34 ± 5% perf-profile.children.cycles-pp.set_next_entity
0.42 ± 2% -0.1 0.32 ± 4% perf-profile.children.cycles-pp.pick_task_fair
0.31 ± 2% -0.1 0.24 ± 6% perf-profile.children.cycles-pp.reweight_entity
0.28 ± 2% -0.1 0.20 ± 7% perf-profile.children.cycles-pp.__dequeue_entity
1.96 -0.1 1.88 perf-profile.children.cycles-pp.obj_cgroup_charge_account
0.28 ± 2% -0.1 0.21 ± 3% perf-profile.children.cycles-pp.update_cfs_group
0.23 ± 2% -0.1 0.16 ± 5% perf-profile.children.cycles-pp.pick_eevdf
0.26 ± 2% -0.1 0.19 ± 4% perf-profile.children.cycles-pp.wakeup_preempt
1.46 -0.1 1.40 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.48 ± 2% -0.1 0.42 ± 5% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.30 -0.1 0.24 ± 4% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.82 -0.1 0.77 perf-profile.children.cycles-pp.__cond_resched
0.27 ± 2% -0.0 0.22 ± 4% perf-profile.children.cycles-pp.__update_load_avg_se
0.14 ± 3% -0.0 0.10 ± 7% perf-profile.children.cycles-pp.update_curr_se
0.79 -0.0 0.74 perf-profile.children.cycles-pp.mutex_lock
0.34 ± 3% -0.0 0.30 ± 5% perf-profile.children.cycles-pp.rseq_ip_fixup
0.15 ± 4% -0.0 0.11 ± 5% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.21 ± 3% -0.0 0.16 ± 4% perf-profile.children.cycles-pp.__switch_to
0.17 ± 4% -0.0 0.13 ± 7% perf-profile.children.cycles-pp.place_entity
0.22 -0.0 0.18 ± 2% perf-profile.children.cycles-pp.wake_affine
0.24 -0.0 0.20 ± 2% perf-profile.children.cycles-pp.check_stack_object
0.64 ± 2% -0.0 0.61 ± 3% perf-profile.children.cycles-pp.__virt_addr_valid
0.38 ± 2% -0.0 0.34 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler
0.18 ± 3% -0.0 0.14 ± 6% perf-profile.children.cycles-pp.update_rq_clock
0.66 -0.0 0.62 perf-profile.children.cycles-pp.rw_verify_area
0.19 -0.0 0.16 ± 4% perf-profile.children.cycles-pp.task_mm_cid_work
0.34 ± 3% -0.0 0.31 ± 2% perf-profile.children.cycles-pp.update_process_times
0.12 ± 8% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.detach_tasks
0.39 ± 3% -0.0 0.36 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.21 ± 3% -0.0 0.18 ± 6% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.18 ± 6% -0.0 0.15 ± 4% perf-profile.children.cycles-pp.task_tick_fair
0.25 ± 3% -0.0 0.22 ± 4% perf-profile.children.cycles-pp.rseq_get_rseq_cs
0.23 ± 5% -0.0 0.20 ± 3% perf-profile.children.cycles-pp.sched_tick
0.14 ± 3% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.check_preempt_wakeup_fair
0.11 ± 4% -0.0 0.08 ± 7% perf-profile.children.cycles-pp.update_min_vruntime
0.06 -0.0 0.03 ± 70% perf-profile.children.cycles-pp.update_curr_dl_se
0.14 ± 3% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.put_prev_entity
0.13 ± 5% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.task_h_load
0.68 -0.0 0.65 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.46 ± 2% -0.0 0.43 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt
0.52 -0.0 0.50 perf-profile.children.cycles-pp.scm_recv_unix
0.08 ± 4% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.__cgroup_account_cputime
0.11 ± 5% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.__switch_to_asm
0.46 ± 2% -0.0 0.44 ± 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.08 ± 8% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.activate_task
0.08 ± 8% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.detach_task
0.11 ± 5% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.os_xsave
0.13 ± 5% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.avg_vruntime
0.13 ± 4% -0.0 0.11 ± 5% perf-profile.children.cycles-pp.update_entity_lag
0.08 ± 4% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.__calc_delta
0.09 ± 5% -0.0 0.07 ± 8% perf-profile.children.cycles-pp.vruntime_eligible
0.34 ± 2% -0.0 0.32 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.30 -0.0 0.29 ± 2% perf-profile.children.cycles-pp.__build_skb_around
0.08 ± 5% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.rseq_update_cpu_node_id
0.15 -0.0 0.14 perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
0.07 ± 5% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret
0.38 ± 2% +0.0 0.40 ± 2% perf-profile.children.cycles-pp.mod_memcg_lruvec_state
0.27 ± 2% +0.0 0.30 ± 2% perf-profile.children.cycles-pp.prepare_task_switch
0.05 ± 7% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.handle_softirqs
0.06 +0.0 0.09 ± 11% perf-profile.children.cycles-pp.finish_wait
0.06 ± 7% +0.0 0.11 ± 6% perf-profile.children.cycles-pp.__irq_exit_rcu
0.06 ± 8% +0.1 0.11 ± 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.01 ±223% +0.1 0.07 ± 10% perf-profile.children.cycles-pp.ktime_get
0.54 ± 4% +0.1 0.61 perf-profile.children.cycles-pp.select_task_rq
0.00 +0.1 0.07 ± 10% perf-profile.children.cycles-pp.enqueue_dl_entity
0.12 ± 4% +0.1 0.19 ± 7% perf-profile.children.cycles-pp.get_any_partial
0.10 ± 9% +0.1 0.18 ± 5% perf-profile.children.cycles-pp.available_idle_cpu
0.00 +0.1 0.08 ± 9% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.dl_server_start
0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.dl_server_stop
0.46 ± 2% +0.1 0.54 ± 2% perf-profile.children.cycles-pp.select_task_rq_fair
0.00 +0.1 0.10 ± 10% perf-profile.children.cycles-pp.select_idle_core
0.09 ± 7% +0.1 0.20 ± 8% perf-profile.children.cycles-pp.select_idle_cpu
0.18 ± 4% +0.1 0.31 ± 6% perf-profile.children.cycles-pp.select_idle_sibling
0.00 +0.2 0.18 ± 4% perf-profile.children.cycles-pp.process_one_work
0.06 ± 13% +0.2 0.25 ± 9% perf-profile.children.cycles-pp.schedule_idle
0.44 ± 2% +0.2 0.64 ± 8% perf-profile.children.cycles-pp.prepare_to_wait
0.00 +0.2 0.21 ± 5% perf-profile.children.cycles-pp.kthread
0.00 +0.2 0.21 ± 5% perf-profile.children.cycles-pp.worker_thread
0.00 +0.2 0.21 ± 4% perf-profile.children.cycles-pp.ret_from_fork
0.00 +0.2 0.21 ± 4% perf-profile.children.cycles-pp.ret_from_fork_asm
0.11 ± 12% +0.3 0.36 ± 9% perf-profile.children.cycles-pp.sched_ttwu_pending
0.31 ± 35% +0.3 0.59 ± 11% perf-profile.children.cycles-pp.__cmd_record
0.26 ± 45% +0.3 0.54 ± 13% perf-profile.children.cycles-pp.perf_session__process_events
0.26 ± 45% +0.3 0.54 ± 13% perf-profile.children.cycles-pp.reader__read_event
0.26 ± 45% +0.3 0.54 ± 13% perf-profile.children.cycles-pp.record__finish_output
0.16 ± 11% +0.3 0.45 ± 9% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.14 ± 11% +0.3 0.45 ± 9% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.14 ± 60% +0.3 0.48 ± 17% perf-profile.children.cycles-pp.ordered_events__queue
0.14 ± 61% +0.3 0.48 ± 17% perf-profile.children.cycles-pp.queue_event
0.15 ± 59% +0.3 0.49 ± 16% perf-profile.children.cycles-pp.process_simple
0.16 ± 12% +0.4 0.54 ± 10% perf-profile.children.cycles-pp.sysvec_call_function_single
4.61 ± 3% +0.5 5.13 ± 8% perf-profile.children.cycles-pp.get_partial_node
5.57 ± 3% +0.6 6.12 ± 7% perf-profile.children.cycles-pp.___slab_alloc
18.44 +0.6 19.00 perf-profile.children.cycles-pp.sock_alloc_send_pskb
6.51 ± 3% +0.7 7.26 ± 9% perf-profile.children.cycles-pp.__put_partials
0.33 ± 14% +1.0 1.30 ± 11% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.34 ± 17% +1.1 1.47 ± 11% perf-profile.children.cycles-pp.pv_native_safe_halt
0.34 ± 17% +1.1 1.48 ± 11% perf-profile.children.cycles-pp.acpi_safe_halt
0.34 ± 17% +1.1 1.48 ± 11% perf-profile.children.cycles-pp.acpi_idle_do_entry
0.34 ± 17% +1.1 1.48 ± 11% perf-profile.children.cycles-pp.acpi_idle_enter
0.35 ± 17% +1.2 1.53 ± 11% perf-profile.children.cycles-pp.cpuidle_enter_state
0.35 ± 17% +1.2 1.54 ± 11% perf-profile.children.cycles-pp.cpuidle_enter
0.38 ± 17% +1.3 1.63 ± 11% perf-profile.children.cycles-pp.cpuidle_idle_call
0.45 ± 16% +1.5 1.94 ± 11% perf-profile.children.cycles-pp.start_secondary
0.46 ± 17% +1.5 1.96 ± 11% perf-profile.children.cycles-pp.do_idle
0.46 ± 17% +1.5 1.97 ± 11% perf-profile.children.cycles-pp.common_startup_64
0.46 ± 17% +1.5 1.97 ± 11% perf-profile.children.cycles-pp.cpu_startup_entry
13.76 ± 2% +1.7 15.44 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
12.09 ± 2% +1.9 14.00 ± 6% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
3.93 -0.2 3.69 perf-profile.self.cycles-pp.clear_bhb_loop
3.43 ± 3% -0.1 3.29 perf-profile.self.cycles-pp._copy_to_iter
0.50 ± 2% -0.1 0.39 ± 5% perf-profile.self.cycles-pp.switch_mm_irqs_off
1.37 ± 4% -0.1 1.27 ± 2% perf-profile.self.cycles-pp._copy_from_iter
0.28 ± 2% -0.1 0.18 ± 7% perf-profile.self.cycles-pp.__enqueue_entity
1.41 ± 3% -0.1 1.31 ± 2% perf-profile.self.cycles-pp.fdget_pos
2.51 -0.1 2.42 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
1.35 -0.1 1.28 perf-profile.self.cycles-pp.read
2.24 -0.1 2.17 perf-profile.self.cycles-pp.do_syscall_64
0.27 ± 3% -0.1 0.20 ± 3% perf-profile.self.cycles-pp.update_cfs_group
1.28 -0.1 1.22 perf-profile.self.cycles-pp.sock_write_iter
0.84 -0.1 0.77 perf-profile.self.cycles-pp.vfs_read
1.42 -0.1 1.36 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.20 -0.1 1.14 perf-profile.self.cycles-pp.__alloc_skb
0.18 ± 2% -0.1 0.13 ± 5% perf-profile.self.cycles-pp.pick_eevdf
1.04 -0.1 0.99 perf-profile.self.cycles-pp.its_return_thunk
0.29 ± 2% -0.1 0.24 ± 4% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.28 ± 5% -0.1 0.23 ± 6% perf-profile.self.cycles-pp.update_curr
0.13 ± 5% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.switch_fpu_return
0.20 ± 3% -0.0 0.15 ± 6% perf-profile.self.cycles-pp.__dequeue_entity
1.00 -0.0 0.95 perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
0.33 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.update_load_avg
0.88 -0.0 0.83 ± 2% perf-profile.self.cycles-pp.vfs_write
0.91 -0.0 0.86 perf-profile.self.cycles-pp.sock_read_iter
0.13 ± 3% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.update_curr_se
0.25 ± 2% -0.0 0.21 ± 4% perf-profile.self.cycles-pp.__update_load_avg_se
1.22 -0.0 1.18 perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
0.68 -0.0 0.63 perf-profile.self.cycles-pp.__check_object_size
0.78 ± 2% -0.0 0.74 perf-profile.self.cycles-pp.obj_cgroup_charge_account
0.20 ± 3% -0.0 0.16 ± 4% perf-profile.self.cycles-pp.__switch_to
0.15 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.try_to_wake_up
0.90 -0.0 0.86 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.76 ± 2% -0.0 0.73 perf-profile.self.cycles-pp.__check_heap_object
0.92 -0.0 0.89 ± 2% perf-profile.self.cycles-pp.__account_obj_stock
0.19 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.check_stack_object
0.40 ± 3% -0.0 0.37 perf-profile.self.cycles-pp.__schedule
0.60 ± 2% -0.0 0.56 ± 3% perf-profile.self.cycles-pp.__virt_addr_valid
0.71 -0.0 0.68 perf-profile.self.cycles-pp.__skb_datagram_iter
0.18 ± 4% -0.0 0.14 ± 5% perf-profile.self.cycles-pp.task_mm_cid_work
0.68 -0.0 0.65 perf-profile.self.cycles-pp.refill_obj_stock
0.34 -0.0 0.31 ± 2% perf-profile.self.cycles-pp.unix_stream_recvmsg
0.06 ± 7% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.enqueue_task
0.11 -0.0 0.08 perf-profile.self.cycles-pp.pick_task_fair
0.15 ± 2% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.enqueue_task_fair
0.20 ± 3% -0.0 0.17 ± 7% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.41 -0.0 0.38 perf-profile.self.cycles-pp.sock_recvmsg
0.10 -0.0 0.07 ± 6% perf-profile.self.cycles-pp.update_min_vruntime
0.13 ± 3% -0.0 0.10 perf-profile.self.cycles-pp.task_h_load
0.23 ± 3% -0.0 0.20 ± 6% perf-profile.self.cycles-pp.__get_user_8
0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.exit_to_user_mode_loop
0.39 ± 2% -0.0 0.37 ± 2% perf-profile.self.cycles-pp.rw_verify_area
0.11 ± 3% -0.0 0.09 ± 7% perf-profile.self.cycles-pp.os_xsave
0.12 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.pick_next_task_fair
0.35 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
0.46 -0.0 0.44 perf-profile.self.cycles-pp.mutex_lock
0.11 ± 4% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__switch_to_asm
0.10 ± 3% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.enqueue_entity
0.08 ± 7% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.place_entity
0.30 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.alloc_skb_with_frags
0.50 -0.0 0.48 perf-profile.self.cycles-pp.kfree
0.30 -0.0 0.28 perf-profile.self.cycles-pp.ksys_write
0.12 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.dequeue_entity
0.11 ± 4% -0.0 0.09 perf-profile.self.cycles-pp.prepare_to_wait
0.19 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.update_rq_clock_task
0.27 -0.0 0.25 ± 2% perf-profile.self.cycles-pp.__build_skb_around
0.08 ± 6% -0.0 0.06 ± 9% perf-profile.self.cycles-pp.vruntime_eligible
0.12 ± 4% -0.0 0.10 perf-profile.self.cycles-pp.__wake_up_common
0.27 -0.0 0.26 perf-profile.self.cycles-pp.kmalloc_reserve
0.48 -0.0 0.46 perf-profile.self.cycles-pp.unix_write_space
0.19 -0.0 0.18 ± 2% perf-profile.self.cycles-pp.skb_copy_datagram_iter
0.07 -0.0 0.06 ± 6% perf-profile.self.cycles-pp.__calc_delta
0.06 ± 6% -0.0 0.05 perf-profile.self.cycles-pp.__put_user_8
0.28 -0.0 0.27 perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.11 -0.0 0.10 perf-profile.self.cycles-pp.wait_for_unix_gc
0.05 +0.0 0.06 perf-profile.self.cycles-pp.__x64_sys_write
0.07 ± 5% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret
0.19 ± 7% +0.0 0.22 ± 4% perf-profile.self.cycles-pp.prepare_task_switch
0.10 ± 6% +0.1 0.17 ± 5% perf-profile.self.cycles-pp.available_idle_cpu
0.14 ± 61% +0.3 0.48 ± 17% perf-profile.self.cycles-pp.queue_event
0.19 ± 18% +0.7 0.85 ± 12% perf-profile.self.cycles-pp.pv_native_safe_halt
12.07 ± 2% +1.9 13.97 ± 6% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_3/aim9/300s
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
9156 +20.2% 11004 vmstat.system.cs
8715946 ± 6% -14.0% 7494314 ± 13% meminfo.DirectMap2M
10992 +85.4% 20381 meminfo.PageTables
318.58 -1.7% 313.01 aim9.shell_rtns_3.ops_per_sec
27145198 -2.1% 26576524 aim9.time.minor_page_faults
1049306 -1.8% 1030938 aim9.time.voluntary_context_switches
6173 ± 20% +74.0% 10742 ± 4% numa-meminfo.node0.PageTables
5702 ± 31% +55.1% 8844 ± 19% numa-meminfo.node0.Shmem
4803 ± 25% +100.6% 9636 ± 6% numa-meminfo.node1.PageTables
1538 ± 20% +73.7% 2673 ± 5% numa-vmstat.node0.nr_page_table_pages
1425 ± 31% +55.1% 2210 ± 19% numa-vmstat.node0.nr_shmem
1194 ± 25% +101.2% 2402 ± 6% numa-vmstat.node1.nr_page_table_pages
30413 +19.3% 36291 sched_debug.cpu.nr_switches.avg
84768 ± 6% +20.3% 101955 ± 4% sched_debug.cpu.nr_switches.max
25510 ± 13% +23.0% 31383 ± 3% sched_debug.cpu.nr_switches.stddev
2727 +85.8% 5066 proc-vmstat.nr_page_table_pages
19325131 -1.6% 19014535 proc-vmstat.numa_hit
19274656 -1.6% 18964467 proc-vmstat.numa_local
19877211 -1.6% 19563123 proc-vmstat.pgalloc_normal
28020416 -2.0% 27451741 proc-vmstat.pgfault
19829318 -1.6% 19508263 proc-vmstat.pgfree
2679 -1.6% 2636 proc-vmstat.unevictable_pgs_culled
0.03 ± 10% +30.9% 0.04 ± 2% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.02 ± 5% +26.2% 0.02 ± 3% perf-sched.total_sch_delay.average.ms
27.03 ± 2% -12.4% 23.66 perf-sched.total_wait_and_delay.average.ms
23171 +18.2% 27385 perf-sched.total_wait_and_delay.count.ms
27.01 ± 2% -12.5% 23.64 perf-sched.total_wait_time.average.ms
110.73 ± 4% -71.1% 31.98 perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1662 ± 2% +278.6% 6294 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
110.70 ± 4% -71.1% 31.94 perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
5.94 +0.1 6.00 perf-stat.i.branch-miss-rate%
9184 +20.2% 11041 perf-stat.i.context-switches
1.96 +1.6% 1.99 perf-stat.i.cpi
71.73 ± 4% +66.1% 119.11 ± 5% perf-stat.i.cpu-migrations
0.53 -1.4% 0.52 perf-stat.i.ipc
3.79 -2.0% 3.71 perf-stat.i.metric.K/sec
90919 -2.0% 89065 perf-stat.i.minor-faults
90919 -2.0% 89065 perf-stat.i.page-faults
6.00 +0.1 6.06 perf-stat.overall.branch-miss-rate%
1.79 +1.2% 1.81 perf-stat.overall.cpi
0.56 -1.2% 0.55 perf-stat.overall.ipc
9154 +20.2% 11004 perf-stat.ps.context-switches
71.49 ± 4% +66.1% 118.72 ± 5% perf-stat.ps.cpu-migrations
90616 -2.0% 88768 perf-stat.ps.minor-faults
90616 -2.0% 88768 perf-stat.ps.page-faults
8.89 -0.2 8.68 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
8.88 -0.2 8.66 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.47 ± 2% -0.2 3.29 perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.47 ± 2% -0.2 3.29 perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.51 ± 3% -0.2 3.33 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.47 ± 2% -0.2 3.29 perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
1.66 ± 2% -0.1 1.57 ± 4% perf-profile.calltrace.cycles-pp.setlocale
0.27 ±100% +0.3 0.61 ± 5% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.18 ±141% +0.4 0.60 ± 5% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
62.46 +0.6 63.01 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
0.09 ±223% +0.6 0.65 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
0.09 ±223% +0.6 0.65 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
49.01 +0.6 49.60 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
67.47 +0.7 68.17 perf-profile.calltrace.cycles-pp.common_startup_64
20.25 -0.7 19.58 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
20.21 -0.7 19.54 perf-profile.children.cycles-pp.do_syscall_64
6.54 -0.2 6.33 perf-profile.children.cycles-pp.asm_exc_page_fault
6.10 -0.2 5.90 perf-profile.children.cycles-pp.do_user_addr_fault
3.77 ± 3% -0.2 3.60 perf-profile.children.cycles-pp.x64_sys_call
3.62 ± 3% -0.2 3.46 perf-profile.children.cycles-pp.do_exit
2.63 ± 3% -0.2 2.48 ± 2% perf-profile.children.cycles-pp.__mmput
2.16 ± 2% -0.1 2.06 ± 3% perf-profile.children.cycles-pp.ksys_mmap_pgoff
1.66 ± 2% -0.1 1.57 ± 4% perf-profile.children.cycles-pp.setlocale
2.69 ± 2% -0.1 2.61 perf-profile.children.cycles-pp.do_pte_missing
0.77 ± 5% -0.1 0.70 ± 6% perf-profile.children.cycles-pp.tlb_finish_mmu
0.92 ± 2% -0.0 0.87 ± 4% perf-profile.children.cycles-pp.__irqentry_text_end
0.08 ± 10% -0.0 0.04 ± 71% perf-profile.children.cycles-pp.tick_nohz_tick_stopped
0.10 ± 11% -0.0 0.07 ± 21% perf-profile.children.cycles-pp.__percpu_counter_init_many
0.14 ± 9% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.strnlen
0.12 ± 11% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.mas_prev_slot
0.11 ± 12% +0.0 0.14 ± 9% perf-profile.children.cycles-pp.update_curr
0.19 ± 8% +0.0 0.22 ± 6% perf-profile.children.cycles-pp.enqueue_entity
0.10 ± 11% +0.0 0.13 ± 11% perf-profile.children.cycles-pp.__perf_event_task_sched_out
0.05 ± 46% +0.0 0.08 ± 13% perf-profile.children.cycles-pp.select_task_rq
0.13 ± 14% +0.0 0.17 ± 8% perf-profile.children.cycles-pp.perf_pmu_sched_task
0.20 ± 10% +0.0 0.24 ± 2% perf-profile.children.cycles-pp.try_to_wake_up
0.28 ± 9% +0.1 0.34 ± 9% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.04 ± 44% +0.1 0.11 ± 13% perf-profile.children.cycles-pp.__queue_work
0.30 ± 11% +0.1 0.38 ± 8% perf-profile.children.cycles-pp.ttwu_do_activate
0.30 ± 4% +0.1 0.38 ± 8% perf-profile.children.cycles-pp.__pick_next_task
0.22 ± 7% +0.1 0.29 ± 9% perf-profile.children.cycles-pp.try_to_block_task
0.02 ±141% +0.1 0.09 ± 10% perf-profile.children.cycles-pp.kick_pool
0.02 ± 99% +0.1 0.10 ± 19% perf-profile.children.cycles-pp.queue_work_on
0.25 ± 4% +0.1 0.35 ± 7% perf-profile.children.cycles-pp.sched_ttwu_pending
0.33 ± 6% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.29 ± 4% +0.1 0.39 ± 6% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.51 ± 6% +0.1 0.63 ± 6% perf-profile.children.cycles-pp.schedule_idle
0.46 ± 7% +0.1 0.58 ± 5% perf-profile.children.cycles-pp.schedule
0.88 ± 6% +0.2 1.04 ± 5% perf-profile.children.cycles-pp.ret_from_fork_asm
0.18 ± 6% +0.2 0.34 ± 8% perf-profile.children.cycles-pp.worker_thread
0.88 ± 6% +0.2 1.04 ± 5% perf-profile.children.cycles-pp.ret_from_fork
0.38 ± 8% +0.2 0.56 ± 10% perf-profile.children.cycles-pp.kthread
1.08 ± 3% +0.2 1.32 ± 2% perf-profile.children.cycles-pp.__schedule
66.15 +0.5 66.64 perf-profile.children.cycles-pp.cpuidle_idle_call
62.89 +0.6 63.47 perf-profile.children.cycles-pp.cpuidle_enter_state
63.00 +0.6 63.59 perf-profile.children.cycles-pp.cpuidle_enter
49.10 +0.6 49.69 perf-profile.children.cycles-pp.intel_idle
67.47 +0.7 68.17 perf-profile.children.cycles-pp.do_idle
67.47 +0.7 68.17 perf-profile.children.cycles-pp.common_startup_64
67.47 +0.7 68.17 perf-profile.children.cycles-pp.cpu_startup_entry
0.91 ± 2% -0.0 0.86 ± 4% perf-profile.self.cycles-pp.__irqentry_text_end
0.14 ± 11% +0.1 0.22 ± 11% perf-profile.self.cycles-pp.timerqueue_del
49.08 +0.6 49.68 perf-profile.self.cycles-pp.intel_idle
***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/800%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
3745213 ± 39% +108.1% 7794858 ± 12% cpuidle..usage
186670 +17.3% 218939 ± 2% meminfo.Percpu
5.00 +306.7% 20.33 ± 66% mpstat.max_utilization.seconds
9.35 ± 76% -4.5 4.80 ±141% perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
8.90 ± 75% -4.3 4.57 ±141% perf-profile.calltrace.cycles-pp.perf_session__deliver_event.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
3283 ± 7% -16.2% 2751 ± 5% sched_debug.cfs_rq:/.avg_vruntime.avg
3283 ± 7% -16.2% 2751 ± 5% sched_debug.cfs_rq:/.min_vruntime.avg
1522512 ± 6% +80.0% 2739797 ± 4% vmstat.system.cs
308726 ± 8% +60.5% 495472 ± 5% vmstat.system.in
467562 +3.7% 485068 ± 2% proc-vmstat.nr_kernel_stack
266084 +3.8% 276310 proc-vmstat.nr_slab_unreclaimable
1.375e+08 -2.0% 1.347e+08 proc-vmstat.numa_hit
1.373e+08 -2.0% 1.346e+08 proc-vmstat.numa_local
217472 ± 3% -28.1% 156410 proc-vmstat.numa_other
1.382e+08 -2.0% 1.354e+08 proc-vmstat.pgalloc_normal
1.375e+08 -2.0% 1.347e+08 proc-vmstat.pgfree
1514102 -6.2% 1420287 hackbench.throughput
1480357 -6.7% 1380775 hackbench.throughput_avg
1514102 -6.2% 1420287 hackbench.throughput_best
1436918 -7.9% 1323413 hackbench.throughput_worst
14551264 ± 13% +138.1% 34644707 ± 3% hackbench.time.involuntary_context_switches
9919 -1.6% 9762 hackbench.time.percent_of_cpu_this_job_got
4239 +4.5% 4428 hackbench.time.system_time
56365933 ± 6% +65.3% 93172066 ± 4% hackbench.time.voluntary_context_switches
65085618 +26.7% 82440571 ± 2% perf-stat.i.branch-misses
31.25 -1.6 29.66 perf-stat.i.cache-miss-rate%
2.469e+08 +8.9% 2.689e+08 perf-stat.i.cache-misses
7.519e+08 +15.9% 8.712e+08 perf-stat.i.cache-references
1353061 ± 7% +87.5% 2537450 ± 5% perf-stat.i.context-switches
2.269e+11 +3.5% 2.348e+11 perf-stat.i.cpu-cycles
134588 ± 13% +81.9% 244825 ± 8% perf-stat.i.cpu-migrations
13.60 ± 5% +70.5% 23.20 ± 5% perf-stat.i.metric.K/sec
1.26 +7.6% 1.35 perf-stat.overall.MPKI
0.11 ± 2% +0.0 0.14 ± 2% perf-stat.overall.branch-miss-rate%
34.12 -2.1 31.97 perf-stat.overall.cache-miss-rate%
1.17 +1.8% 1.19 perf-stat.overall.cpi
931.96 -5.3% 882.44 perf-stat.overall.cycles-between-cache-misses
0.85 -1.8% 0.84 perf-stat.overall.ipc
5.372e+10 -1.2% 5.31e+10 perf-stat.ps.branch-instructions
57783128 ± 2% +32.9% 76802898 ± 2% perf-stat.ps.branch-misses
2.696e+08 +7.2% 2.89e+08 perf-stat.ps.cache-misses
7.902e+08 +14.4% 9.039e+08 perf-stat.ps.cache-references
1288664 ± 7% +94.6% 2508227 ± 5% perf-stat.ps.context-switches
2.512e+11 +1.5% 2.55e+11 perf-stat.ps.cpu-cycles
122960 ± 14% +82.3% 224127 ± 9% perf-stat.ps.cpu-migrations
1.108e+13 +5.7% 1.171e+13 perf-stat.total.instructions
0.94 ±223% +5929.9% 56.62 ±121% perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
26.44 ± 81% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
100.25 ±141% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
9.01 ± 43% +1823.1% 173.24 ±106% perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
49.43 ± 14% +73.8% 85.93 ± 19% perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
130.63 ± 17% +135.8% 308.04 ± 28% perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
18.09 ± 30% +130.4% 41.70 ± 26% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
196.51 ± 21% +102.9% 398.77 ± 15% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
34.17 ± 39% +191.1% 99.46 ± 20% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
154.91 ±163% +1649.9% 2710 ± 91% perf-sched.sch_delay.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
0.94 ±223% +1.9e+05% 1743 ±120% perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
3.19 ±124% -91.9% 0.26 ±150% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
646.26 ± 94% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
282.66 ±139% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
63.17 ± 52% +2854.4% 1866 ±121% perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
1507 ± 35% +249.4% 5266 ± 47% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
3915 ± 67% +98.7% 7779 ± 16% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
53.31 ± 18% +79.9% 95.90 ± 23% perf-sched.total_sch_delay.average.ms
149.37 ± 18% +80.0% 268.92 ± 22% perf-sched.total_wait_and_delay.average.ms
96.07 ± 18% +80.1% 173.01 ± 21% perf-sched.total_wait_time.average.ms
244.53 ± 47% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
529.64 ± 20% +38.5% 733.60 ± 20% perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
136.52 ± 15% +73.7% 237.07 ± 18% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
373.41 ± 16% +136.3% 882.34 ± 27% perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
51.96 ± 29% +127.5% 118.22 ± 25% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
554.86 ± 23% +103.0% 1126 ± 14% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
298.52 ±136% +436.9% 1602 ± 27% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
556.66 ± 37% -97.1% 16.09 ± 47% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
707.67 ± 31% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
1358 ± 28% +4707.9% 65291 ± 27% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
12184 ± 5% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
1393 ±134% +379.9% 6685 ± 15% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
6927 ± 6% +119.8% 15224 ± 19% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
341.61 ± 21% +39.1% 475.15 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
51.39 ± 99% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
121.14 ±122% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
87.09 ± 15% +73.6% 151.14 ± 18% perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
242.78 ± 16% +136.6% 574.31 ± 27% perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
33.86 ± 29% +126.0% 76.52 ± 24% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
250.32 ±109% -89.4% 26.44 ±111% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
358.36 ± 25% +103.1% 727.72 ± 14% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
77.40 ± 47% +102.5% 156.70 ± 28% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
17.91 ± 42% -75.3% 4.42 ± 76% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
266.70 ±137% +431.6% 1417 ± 36% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
536.93 ± 40% -97.4% 13.81 ± 50% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
180.38 ±135% +2208.8% 4164 ± 71% perf-sched.wait_time.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
1028 ±129% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
312.94 ±123% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
418.66 ±132% -93.7% 26.44 ±111% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
1388 ±133% +379.7% 6660 ± 15% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
2022 ± 25% +164.9% 5358 ± 46% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_1/aim9/300s
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
11004 +86.2% 20490 meminfo.PageTables
121.33 ± 12% +18.8% 144.17 ± 5% perf-c2c.DRAM.remote
9155 +20.0% 10990 vmstat.system.cs
5129 ± 20% +107.2% 10631 ± 3% numa-meminfo.node0.PageTables
5864 ± 17% +67.3% 9811 ± 3% numa-meminfo.node1.PageTables
1278 ± 20% +107.9% 2658 ± 3% numa-vmstat.node0.nr_page_table_pages
1469 ± 17% +66.4% 2446 ± 3% numa-vmstat.node1.nr_page_table_pages
319.43 -2.1% 312.66 aim9.shell_rtns_1.ops_per_sec
27217846 -2.5% 26546962 aim9.time.minor_page_faults
1051878 -2.1% 1029547 aim9.time.voluntary_context_switches
30502 +18.6% 36187 sched_debug.cpu.nr_switches.avg
90327 ± 12% +22.7% 110866 ± 4% sched_debug.cpu.nr_switches.max
26316 ± 16% +25.5% 33021 ± 5% sched_debug.cpu.nr_switches.stddev
0.03 ± 7% +70.7% 0.05 ± 53% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.02 ± 3% +38.9% 0.02 ± 28% perf-sched.total_sch_delay.average.ms
27.43 ± 2% -14.5% 23.45 perf-sched.total_wait_and_delay.average.ms
23174 +18.0% 27340 perf-sched.total_wait_and_delay.count.ms
27.41 ± 2% -14.6% 23.42 perf-sched.total_wait_time.average.ms
115.38 ± 3% -71.9% 32.37 ± 2% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1656 ± 3% +280.2% 6299 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
115.35 ± 3% -72.0% 32.31 ± 2% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2737 +86.1% 5095 proc-vmstat.nr_page_table_pages
30460 +3.2% 31439 proc-vmstat.nr_shmem
27933 +1.8% 28432 proc-vmstat.nr_slab_unreclaimable
19466749 -2.5% 18980434 proc-vmstat.numa_hit
19414531 -2.5% 18927584 proc-vmstat.numa_local
20028107 -2.5% 19528806 proc-vmstat.pgalloc_normal
28087705 -2.4% 27417155 proc-vmstat.pgfault
19980173 -2.5% 19474402 proc-vmstat.pgfree
420074 -5.7% 396239 ± 8% proc-vmstat.pgreuse
2685 -1.9% 2633 proc-vmstat.unevictable_pgs_culled
5.48e+08 -1.2% 5.412e+08 perf-stat.i.branch-instructions
5.92 +0.1 6.00 perf-stat.i.branch-miss-rate%
9195 +19.9% 11021 perf-stat.i.context-switches
1.96 +1.7% 1.99 perf-stat.i.cpi
70.13 +73.4% 121.59 ± 8% perf-stat.i.cpu-migrations
2.725e+09 -1.3% 2.69e+09 perf-stat.i.instructions
0.53 -1.6% 0.52 perf-stat.i.ipc
3.80 -2.4% 3.71 perf-stat.i.metric.K/sec
91139 -2.4% 88949 perf-stat.i.minor-faults
91139 -2.4% 88949 perf-stat.i.page-faults
5.00 ± 44% +1.1 6.07 perf-stat.overall.branch-miss-rate%
1.49 ± 44% +21.9% 1.82 perf-stat.overall.cpi
7643 ± 44% +43.7% 10984 perf-stat.ps.context-switches
58.17 ± 44% +108.4% 121.21 ± 8% perf-stat.ps.cpu-migrations
2.06 ± 2% -0.2 1.87 ± 12% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.98 ± 7% -0.2 0.83 ± 12% perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
1.69 ± 2% -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.setlocale
0.58 ± 5% -0.1 0.44 ± 44% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__open64_nocancel.setlocale
0.72 ± 6% -0.1 0.60 ± 8% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
3.21 ± 2% -0.1 3.11 perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
0.70 ± 4% -0.1 0.62 ± 6% perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
1.52 ± 2% -0.1 1.44 ± 3% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.34 ± 3% -0.1 1.28 ± 3% perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.89 ± 3% -0.1 0.84 perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.17 ±141% +0.4 0.61 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
0.17 ±141% +0.4 0.61 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
65.10 +0.5 65.56 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
66.40 +0.6 67.00 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
66.46 +0.6 67.08 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
66.46 +0.6 67.08 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
67.63 +0.7 68.30 perf-profile.calltrace.cycles-pp.common_startup_64
20.14 -0.6 19.51 perf-profile.children.cycles-pp.do_syscall_64
20.20 -0.6 19.57 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
1.13 ± 5% -0.2 0.98 ± 9% perf-profile.children.cycles-pp.rcu_core
1.69 ± 2% -0.1 1.54 ± 2% perf-profile.children.cycles-pp.setlocale
0.84 ± 4% -0.1 0.71 ± 5% perf-profile.children.cycles-pp.rcu_do_batch
2.16 ± 2% -0.1 2.04 ± 3% perf-profile.children.cycles-pp.ksys_mmap_pgoff
1.15 ± 4% -0.1 1.04 ± 5% perf-profile.children.cycles-pp.__open64_nocancel
3.22 ± 2% -0.1 3.12 perf-profile.children.cycles-pp.exec_binprm
2.09 ± 2% -0.1 2.00 ± 2% perf-profile.children.cycles-pp.kernel_clone
0.88 ± 4% -0.1 0.79 ± 4% perf-profile.children.cycles-pp.mas_store_prealloc
2.19 -0.1 2.10 ± 3% perf-profile.children.cycles-pp.__x64_sys_openat
0.70 ± 4% -0.1 0.62 ± 6% perf-profile.children.cycles-pp.dup_mm
1.36 ± 3% -0.1 1.30 perf-profile.children.cycles-pp._Fork
0.56 ± 4% -0.1 0.50 ± 8% perf-profile.children.cycles-pp.dup_mmap
0.09 ± 16% -0.1 0.03 ± 70% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
0.31 ± 8% -0.1 0.25 ± 10% perf-profile.children.cycles-pp.strncpy_from_user
0.94 ± 3% -0.1 0.88 ± 2% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.41 ± 5% -0.0 0.36 ± 5% perf-profile.children.cycles-pp.irqtime_account_irq
0.18 ± 12% -0.0 0.14 ± 7% perf-profile.children.cycles-pp.tlb_remove_table_rcu
0.20 ± 7% -0.0 0.17 ± 9% perf-profile.children.cycles-pp.perf_event_task_tick
0.08 ± 14% -0.0 0.05 ± 49% perf-profile.children.cycles-pp.mas_update_gap
0.24 ± 5% -0.0 0.21 ± 5% perf-profile.children.cycles-pp.filemap_read
0.19 ± 7% -0.0 0.16 ± 8% perf-profile.children.cycles-pp.__call_rcu_common
0.22 ± 2% -0.0 0.19 ± 5% perf-profile.children.cycles-pp.mas_next_slot
0.09 ± 5% +0.0 0.12 ± 7% perf-profile.children.cycles-pp.__perf_event_task_sched_out
0.05 ± 47% +0.0 0.08 ± 10% perf-profile.children.cycles-pp.lru_gen_del_folio
0.10 ± 14% +0.0 0.12 ± 18% perf-profile.children.cycles-pp.__folio_mod_stat
0.12 ± 12% +0.0 0.16 ± 3% perf-profile.children.cycles-pp.perf_pmu_sched_task
0.20 ± 10% +0.0 0.24 ± 4% perf-profile.children.cycles-pp.prepare_task_switch
0.06 ± 47% +0.0 0.10 ± 11% perf-profile.children.cycles-pp.__queue_work
0.56 ± 5% +0.1 0.61 ± 4% perf-profile.children.cycles-pp.sched_balance_domains
0.04 ± 72% +0.1 0.09 ± 11% perf-profile.children.cycles-pp.kick_pool
0.04 ± 72% +0.1 0.09 ± 14% perf-profile.children.cycles-pp.queue_work_on
0.33 ± 6% +0.1 0.38 ± 7% perf-profile.children.cycles-pp.dequeue_entities
0.35 ± 6% +0.1 0.40 ± 7% perf-profile.children.cycles-pp.dequeue_task_fair
0.52 ± 6% +0.1 0.58 ± 5% perf-profile.children.cycles-pp.enqueue_task_fair
0.54 ± 7% +0.1 0.60 ± 5% perf-profile.children.cycles-pp.enqueue_task
0.28 ± 9% +0.1 0.35 ± 5% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.21 ± 4% +0.1 0.28 ± 12% perf-profile.children.cycles-pp.try_to_block_task
0.34 ± 4% +0.1 0.42 ± 3% perf-profile.children.cycles-pp.ttwu_do_activate
0.36 ± 3% +0.1 0.46 ± 6% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.28 ± 4% +0.1 0.38 ± 5% perf-profile.children.cycles-pp.sched_ttwu_pending
0.33 ± 2% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.46 ± 7% +0.1 0.56 ± 6% perf-profile.children.cycles-pp.schedule
0.48 ± 8% +0.1 0.61 ± 8% perf-profile.children.cycles-pp.timerqueue_del
0.18 ± 13% +0.1 0.32 ± 11% perf-profile.children.cycles-pp.worker_thread
0.38 ± 9% +0.2 0.52 ± 10% perf-profile.children.cycles-pp.kthread
1.10 ± 5% +0.2 1.25 ± 2% perf-profile.children.cycles-pp.__schedule
0.85 ± 8% +0.2 1.01 ± 7% perf-profile.children.cycles-pp.ret_from_fork
0.85 ± 8% +0.2 1.02 ± 7% perf-profile.children.cycles-pp.ret_from_fork_asm
63.15 +0.5 63.64 perf-profile.children.cycles-pp.cpuidle_enter
66.26 +0.5 66.77 perf-profile.children.cycles-pp.cpuidle_idle_call
66.46 +0.6 67.08 perf-profile.children.cycles-pp.start_secondary
67.63 +0.7 68.30 perf-profile.children.cycles-pp.common_startup_64
67.63 +0.7 68.30 perf-profile.children.cycles-pp.cpu_startup_entry
67.63 +0.7 68.30 perf-profile.children.cycles-pp.do_idle
1.20 ± 3% -0.1 1.12 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.09 ± 16% -0.1 0.03 ± 70% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
0.25 ± 6% -0.0 0.21 ± 12% perf-profile.self.cycles-pp.irqtime_account_irq
0.02 ±141% +0.0 0.06 ± 13% perf-profile.self.cycles-pp.prepend_path
0.13 ± 10% +0.1 0.24 ± 11% perf-profile.self.cycles-pp.timerqueue_del
***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
3.924e+08 ± 3% +55.1% 6.086e+08 ± 2% cpuidle..time
7504886 ± 11% +184.4% 21340245 ± 6% cpuidle..usage
13350305 -3.8% 12848570 vmstat.system.cs
1849619 +5.1% 1943754 vmstat.system.in
3.56 ± 5% +2.6 6.16 ± 7% mpstat.cpu.all.idle%
0.69 +0.2 0.90 ± 3% mpstat.cpu.all.irq%
0.03 ± 3% +0.0 0.04 ± 3% mpstat.cpu.all.soft%
18666 ± 9% +41.2% 26352 ± 6% perf-c2c.DRAM.remote
197041 -39.6% 118945 ± 5% perf-c2c.HITM.local
3178 ± 12% +37.2% 4361 ± 11% perf-c2c.HITM.remote
200219 -38.4% 123307 ± 5% perf-c2c.HITM.total
2842579 ± 11% +60.1% 4550025 ± 12% meminfo.Active
2842579 ± 11% +60.1% 4550025 ± 12% meminfo.Active(anon)
5535242 ± 5% +30.9% 7248257 ± 7% meminfo.Cached
3846718 ± 8% +44.0% 5539484 ± 9% meminfo.Committed_AS
9684149 ± 3% +20.5% 11666616 ± 4% meminfo.Memused
136127 ± 3% +14.2% 155524 meminfo.PageTables
62144 +22.8% 76336 meminfo.Percpu
2001586 ± 16% +85.6% 3714611 ± 14% meminfo.Shmem
9759598 ± 3% +20.0% 11714619 ± 4% meminfo.max_used_kB
710625 ± 11% +59.3% 1131770 ± 11% proc-vmstat.nr_active_anon
1383631 ± 5% +30.6% 1806419 ± 7% proc-vmstat.nr_file_pages
34220 ± 3% +13.9% 38987 proc-vmstat.nr_page_table_pages
500216 ± 16% +84.5% 923007 ± 14% proc-vmstat.nr_shmem
710625 ± 11% +59.3% 1131770 ± 11% proc-vmstat.nr_zone_active_anon
92308030 +8.7% 1.004e+08 proc-vmstat.numa_hit
92171407 +8.7% 1.002e+08 proc-vmstat.numa_local
133616 +2.7% 137265 proc-vmstat.numa_other
92394313 +8.7% 1.004e+08 proc-vmstat.pgalloc_normal
91035691 +7.8% 98094626 proc-vmstat.pgfree
867815 +11.8% 970369 hackbench.throughput
830278 +11.6% 926834 hackbench.throughput_avg
867815 +11.8% 970369 hackbench.throughput_best
760822 +14.2% 869145 hackbench.throughput_worst
72.87 -10.3% 65.36 hackbench.time.elapsed_time
72.87 -10.3% 65.36 hackbench.time.elapsed_time.max
2.493e+08 -17.7% 2.052e+08 hackbench.time.involuntary_context_switches
12357 -3.9% 11879 hackbench.time.percent_of_cpu_this_job_got
8029 -14.8% 6842 hackbench.time.system_time
976.58 -5.5% 923.21 hackbench.time.user_time
7.54e+08 -14.4% 6.451e+08 hackbench.time.voluntary_context_switches
5.598e+10 +6.6% 5.965e+10 perf-stat.i.branch-instructions
0.40 -0.0 0.38 perf-stat.i.branch-miss-rate%
8.36 ± 2% +4.6 12.98 ± 3% perf-stat.i.cache-miss-rate%
2.11e+09 -33.8% 1.396e+09 perf-stat.i.cache-references
13687653 -3.4% 13225338 perf-stat.i.context-switches
1.36 -7.9% 1.25 perf-stat.i.cpi
3.219e+11 -2.2% 3.147e+11 perf-stat.i.cpu-cycles
1915 ± 2% -6.6% 1788 ± 3% perf-stat.i.cycles-between-cache-misses
2.371e+11 +6.0% 2.512e+11 perf-stat.i.instructions
0.74 +8.5% 0.80 perf-stat.i.ipc
1.15 ± 14% -28.3% 0.82 ± 23% perf-stat.i.major-faults
115.09 -3.2% 111.40 perf-stat.i.metric.K/sec
0.37 -0.0 0.35 perf-stat.overall.branch-miss-rate%
8.15 ± 3% +4.6 12.74 ± 3% perf-stat.overall.cache-miss-rate%
1.36 -7.7% 1.25 perf-stat.overall.cpi
1875 ± 2% -5.5% 1772 ± 4% perf-stat.overall.cycles-between-cache-misses
0.74 +8.3% 0.80 perf-stat.overall.ipc
5.524e+10 +6.4% 5.877e+10 perf-stat.ps.branch-instructions
2.079e+09 -33.9% 1.375e+09 perf-stat.ps.cache-references
13486088 -3.4% 13020988 perf-stat.ps.context-switches
3.175e+11 -2.3% 3.101e+11 perf-stat.ps.cpu-cycles
2.34e+11 +5.8% 2.475e+11 perf-stat.ps.instructions
1.09 ± 14% -28.3% 0.78 ± 21% perf-stat.ps.major-faults
1.73e+13 -5.1% 1.642e+13 perf-stat.total.instructions
3527725 +10.7% 3905361 sched_debug.cfs_rq:/.avg_vruntime.avg
3975260 +14.1% 4535959 ± 6% sched_debug.cfs_rq:/.avg_vruntime.max
98657 ± 17% +84.9% 182407 ± 18% sched_debug.cfs_rq:/.avg_vruntime.stddev
11.83 ± 7% +17.6% 13.92 ± 5% sched_debug.cfs_rq:/.h_nr_queued.max
2.71 ± 5% +21.8% 3.30 ± 4% sched_debug.cfs_rq:/.h_nr_queued.stddev
11.75 ± 7% +17.7% 13.83 ± 6% sched_debug.cfs_rq:/.h_nr_runnable.max
2.68 ± 4% +21.2% 3.25 ± 5% sched_debug.cfs_rq:/.h_nr_runnable.stddev
4556 ±223% +691.0% 36039 ± 34% sched_debug.cfs_rq:/.left_deadline.avg
583131 ±223% +577.3% 3949548 ± 4% sched_debug.cfs_rq:/.left_deadline.max
51341 ±223% +622.0% 370695 ± 16% sched_debug.cfs_rq:/.left_deadline.stddev
4555 ±223% +691.0% 36035 ± 34% sched_debug.cfs_rq:/.left_vruntime.avg
583105 ±223% +577.3% 3949123 ± 4% sched_debug.cfs_rq:/.left_vruntime.max
51338 ±223% +622.0% 370651 ± 16% sched_debug.cfs_rq:/.left_vruntime.stddev
3527725 +10.7% 3905361 sched_debug.cfs_rq:/.min_vruntime.avg
3975260 +14.1% 4535959 ± 6% sched_debug.cfs_rq:/.min_vruntime.max
98657 ± 17% +84.9% 182407 ± 18% sched_debug.cfs_rq:/.min_vruntime.stddev
0.22 ± 5% +13.9% 0.25 ± 5% sched_debug.cfs_rq:/.nr_queued.stddev
4555 ±223% +691.0% 36035 ± 34% sched_debug.cfs_rq:/.right_vruntime.avg
583105 ±223% +577.3% 3949123 ± 4% sched_debug.cfs_rq:/.right_vruntime.max
51338 ±223% +622.0% 370651 ± 16% sched_debug.cfs_rq:/.right_vruntime.stddev
1336 ± 7% +50.8% 2014 ± 6% sched_debug.cfs_rq:/.runnable_avg.stddev
552.53 ± 8% +19.6% 660.87 ± 5% sched_debug.cfs_rq:/.util_est.avg
384.27 ± 9% +28.9% 495.43 ± 11% sched_debug.cfs_rq:/.util_est.stddev
1328 ± 17% +42.7% 1896 ± 13% sched_debug.cpu.curr->pid.stddev
11.75 ± 8% +19.1% 14.00 ± 6% sched_debug.cpu.nr_running.max
2.71 ± 5% +22.7% 3.33 ± 4% sched_debug.cpu.nr_running.stddev
76578 ± 9% +33.7% 102390 ± 5% sched_debug.cpu.nr_switches.stddev
62.25 ± 7% +17.9% 73.42 ± 7% sched_debug.cpu.nr_uninterruptible.max
8.11 ± 58% -82.0% 1.46 ± 47% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
12.04 ±104% -86.8% 1.58 ± 55% perf-sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
0.11 ±123% -95.3% 0.01 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
0.06 ±103% -93.6% 0.00 ±154% perf-sched.sch_delay.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
0.10 ±109% -93.9% 0.01 ±163% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
1.00 ± 21% -59.6% 0.40 ± 50% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
14.54 ± 14% -79.2% 3.02 ± 51% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
1.50 ± 84% -74.1% 0.39 ± 90% perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
1.13 ± 68% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.38 ± 97% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1.10 ± 17% -68.9% 0.34 ± 49% perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
42.25 ± 18% -71.7% 11.96 ± 53% perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
3.25 ± 17% -77.5% 0.73 ± 49% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
29.17 ± 33% -62.0% 11.09 ± 85% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
46.25 ± 15% -68.8% 14.43 ± 52% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
3.72 ± 70% -81.0% 0.70 ± 67% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
7.95 ± 55% -69.7% 2.41 ± 65% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
3.66 ±139% -97.1% 0.11 ± 58% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
3.05 ± 44% -91.9% 0.25 ± 57% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
29.96 ± 9% -83.6% 4.90 ± 48% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
26.20 ± 59% -88.9% 2.92 ± 66% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.14 ± 84% -91.2% 0.01 ±142% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
0.20 ±149% -97.5% 0.01 ±102% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
0.11 ±144% -96.6% 0.00 ±154% perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
0.19 ±118% -96.7% 0.01 ±163% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
274.64 ± 95% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.72 ±151% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
3135 ± 5% -48.6% 1611 ± 57% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
1320 ± 19% -78.6% 282.01 ± 74% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
265.55 ± 82% -77.9% 58.70 ±124% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
1850 ± 28% -59.1% 757.74 ± 68% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
766.85 ± 56% -68.0% 245.51 ± 51% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
1.77 ± 17% -71.9% 0.50 ± 49% perf-sched.total_sch_delay.average.ms
5.15 ± 17% -69.5% 1.57 ± 48% perf-sched.total_wait_and_delay.average.ms
3.38 ± 17% -68.2% 1.07 ± 48% perf-sched.total_wait_time.average.ms
5100 ± 3% -31.0% 3522 ± 47% perf-sched.total_wait_time.max.ms
27.42 ± 49% -85.2% 4.07 ± 47% perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
35.29 ± 80% -85.8% 5.00 ± 51% perf-sched.wait_and_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
42.28 ± 14% -79.4% 8.70 ± 51% perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
3.12 ± 17% -66.4% 1.05 ± 48% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
122.62 ± 18% -70.4% 36.26 ± 53% perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
250.26 ± 65% -94.2% 14.56 ± 55% perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.37 ± 17% -78.2% 2.05 ± 48% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
58.34 ± 33% -62.0% 22.18 ± 85% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
134.44 ± 15% -69.3% 41.24 ± 52% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
86.94 ± 6% -83.1% 14.68 ± 48% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
86.57 ± 39% -86.0% 12.14 ± 59% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
647.92 ± 48% -97.9% 13.86 ± 45% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
6386 ± 6% -46.8% 3397 ± 57% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
3868 ± 27% -60.4% 1531 ± 67% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
1647 ± 55% -67.7% 531.51 ± 50% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
5014 ± 5% -32.5% 3385 ± 47% perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
19.31 ± 47% -86.5% 2.61 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
23.25 ± 70% -85.3% 3.42 ± 52% perf-sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
18.33 ± 15% -42.0% 10.64 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.11 ±123% -95.3% 0.01 ±102% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
0.06 ±103% -93.6% 0.00 ±154% perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
0.10 ±109% -93.9% 0.01 ±163% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
1.70 ± 21% -52.6% 0.81 ± 48% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
27.74 ± 15% -79.5% 5.68 ± 51% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
2.17 ± 75% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.42 ± 97% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
2.02 ± 17% -65.1% 0.70 ± 48% perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
80.37 ± 18% -69.8% 24.31 ± 52% perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
210.13 ± 68% -95.1% 10.21 ± 55% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.12 ± 17% -78.5% 1.32 ± 48% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
29.17 ± 33% -62.0% 11.09 ± 85% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
88.19 ± 16% -69.6% 26.81 ± 52% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
13.77 ± 45% -65.7% 4.72 ± 53% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
104.64 ± 42% -76.4% 24.74 ±135% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
5.16 ± 29% -92.5% 0.39 ± 48% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
56.98 ± 5% -82.9% 9.77 ± 48% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
60.36 ± 32% -84.7% 9.22 ± 57% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
619.88 ± 43% -98.0% 12.52 ± 45% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.14 ± 84% -91.2% 0.01 ±142% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
740.14 ± 35% -68.5% 233.31 ± 83% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
0.20 ±149% -97.5% 0.01 ±102% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
0.11 ±144% -96.6% 0.00 ±154% perf-sched.wait_time.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
0.19 ±118% -96.7% 0.01 ±163% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
327.64 ± 71% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.72 ±151% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
3299 ± 6% -40.7% 1957 ± 51% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
436.75 ± 39% -76.9% 100.85 ± 98% perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
2112 ± 19% -62.3% 796.34 ± 63% perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
947.83 ± 46% -58.8% 390.83 ± 53% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
5014 ± 5% -32.5% 3385 ± 47% perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_2/aim9/300s
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
11036 +85.7% 20499 meminfo.PageTables
125.17 ± 8% +18.4% 148.17 ± 7% perf-c2c.HITM.local
30464 +18.7% 36160 sched_debug.cpu.nr_switches.avg
9166 +19.8% 10985 vmstat.system.cs
6623 ± 17% +60.8% 10652 ± 5% numa-meminfo.node0.PageTables
4414 ± 26% +123.2% 9853 ± 6% numa-meminfo.node1.PageTables
1653 ± 17% +60.1% 2647 ± 5% numa-vmstat.node0.nr_page_table_pages
1097 ± 26% +123.9% 2457 ± 6% numa-vmstat.node1.nr_page_table_pages
319.08 -2.2% 312.04 aim9.shell_rtns_2.ops_per_sec
27170926 -2.2% 26586121 aim9.time.minor_page_faults
1051038 -2.2% 1027732 aim9.time.voluntary_context_switches
2736 +86.4% 5101 proc-vmstat.nr_page_table_pages
28014 +1.3% 28378 proc-vmstat.nr_slab_unreclaimable
19332129 -1.5% 19048363 proc-vmstat.numa_hit
19283853 -1.5% 18996609 proc-vmstat.numa_local
19892794 -1.5% 19598065 proc-vmstat.pgalloc_normal
28044189 -2.1% 27457289 proc-vmstat.pgfault
19843766 -1.5% 19543091 proc-vmstat.pgfree
419715 -5.7% 395688 ± 8% proc-vmstat.pgreuse
2682 -2.0% 2628 proc-vmstat.unevictable_pgs_culled
0.07 ± 6% -30.5% 0.05 ± 22% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.03 ± 6% +36.0% 0.04 perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.07 ± 33% -57.5% 0.03 ± 53% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
0.02 ± 74% +112.0% 0.05 ± 36% perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
0.02 +24.1% 0.02 ± 2% perf-sched.total_sch_delay.average.ms
27.52 -14.0% 23.67 perf-sched.total_wait_and_delay.average.ms
23179 +18.3% 27421 perf-sched.total_wait_and_delay.count.ms
27.50 -14.0% 23.65 perf-sched.total_wait_time.average.ms
117.03 ± 3% -72.4% 32.27 ± 2% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1655 ± 2% +282.0% 6324 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.96 ± 29% +51.6% 1.45 ± 22% perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
117.00 ± 3% -72.5% 32.23 ± 2% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
5.93 +0.1 6.00 perf-stat.i.branch-miss-rate%
9189 +19.8% 11011 perf-stat.i.context-switches
1.96 +1.6% 1.99 perf-stat.i.cpi
71.21 +60.6% 114.39 ± 4% perf-stat.i.cpu-migrations
0.53 -1.5% 0.52 perf-stat.i.ipc
3.79 -2.1% 3.71 perf-stat.i.metric.K/sec
90998 -2.1% 89084 perf-stat.i.minor-faults
90998 -2.1% 89084 perf-stat.i.page-faults
5.99 +0.1 6.06 perf-stat.overall.branch-miss-rate%
1.79 +1.4% 1.82 perf-stat.overall.cpi
0.56 -1.3% 0.55 perf-stat.overall.ipc
9158 +19.8% 10974 perf-stat.ps.context-switches
70.99 +60.6% 114.02 ± 4% perf-stat.ps.cpu-migrations
90694 -2.1% 88787 perf-stat.ps.minor-faults
90695 -2.1% 88787 perf-stat.ps.page-faults
8.155e+11 -1.1% 8.065e+11 perf-stat.total.instructions
8.87 -0.3 8.55 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
8.86 -0.3 8.54 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.53 ± 2% -0.1 2.43 ± 2% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
2.54 -0.1 2.44 ± 2% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
2.49 -0.1 2.40 ± 2% perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
0.98 ± 5% -0.1 0.90 ± 5% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.70 ± 3% -0.1 0.62 ± 6% perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
0.18 ±141% +0.5 0.67 ± 6% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
0.18 ±141% +0.5 0.67 ± 6% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
0.00 +0.6 0.59 ± 7% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
62.48 +0.7 63.14 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
49.10 +0.7 49.78 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
67.62 +0.8 68.43 perf-profile.calltrace.cycles-pp.common_startup_64
20.14 -0.7 19.40 perf-profile.children.cycles-pp.do_syscall_64
20.18 -0.7 19.44 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
3.33 ± 2% -0.2 3.16 ± 2% perf-profile.children.cycles-pp.vm_mmap_pgoff
3.22 ± 2% -0.2 3.06 perf-profile.children.cycles-pp.do_mmap
3.51 ± 2% -0.1 3.38 perf-profile.children.cycles-pp.do_exit
3.52 ± 2% -0.1 3.38 perf-profile.children.cycles-pp.__x64_sys_exit_group
3.52 ± 2% -0.1 3.38 perf-profile.children.cycles-pp.do_group_exit
3.67 -0.1 3.54 perf-profile.children.cycles-pp.x64_sys_call
2.21 -0.1 2.09 ± 3% perf-profile.children.cycles-pp.__x64_sys_openat
2.07 ± 2% -0.1 1.94 ± 2% perf-profile.children.cycles-pp.path_openat
2.09 ± 2% -0.1 1.97 ± 2% perf-profile.children.cycles-pp.do_filp_open
2.19 -0.1 2.08 ± 3% perf-profile.children.cycles-pp.do_sys_openat2
1.50 ± 4% -0.1 1.39 ± 3% perf-profile.children.cycles-pp.copy_process
2.56 -0.1 2.46 ± 2% perf-profile.children.cycles-pp.exit_mm
2.55 -0.1 2.44 ± 2% perf-profile.children.cycles-pp.__mmput
2.51 ± 2% -0.1 2.41 ± 2% perf-profile.children.cycles-pp.exit_mmap
0.70 ± 3% -0.1 0.62 ± 6% perf-profile.children.cycles-pp.dup_mm
0.94 ± 4% -0.1 0.89 ± 2% perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
0.57 ± 3% -0.0 0.52 ± 4% perf-profile.children.cycles-pp.alloc_pages_noprof
0.20 ± 12% -0.0 0.15 ± 10% perf-profile.children.cycles-pp.perf_event_task_tick
0.18 ± 4% -0.0 0.14 ± 15% perf-profile.children.cycles-pp.xas_find
0.10 ± 12% -0.0 0.07 ± 24% perf-profile.children.cycles-pp.up_write
0.09 ± 6% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.tick_check_broadcast_expired
0.08 ± 12% +0.0 0.10 ± 8% perf-profile.children.cycles-pp.hrtimer_try_to_cancel
0.10 ± 13% +0.0 0.13 ± 5% perf-profile.children.cycles-pp.__perf_event_task_sched_out
0.20 ± 8% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.enqueue_entity
0.21 ± 9% +0.0 0.25 ± 4% perf-profile.children.cycles-pp.prepare_task_switch
0.03 ±101% +0.0 0.07 ± 16% perf-profile.children.cycles-pp.run_ksoftirqd
0.04 ± 71% +0.1 0.09 ± 15% perf-profile.children.cycles-pp.kick_pool
0.05 ± 47% +0.1 0.11 ± 16% perf-profile.children.cycles-pp.__queue_work
0.28 ± 5% +0.1 0.34 ± 7% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.50 +0.1 0.56 ± 2% perf-profile.children.cycles-pp.timerqueue_del
0.04 ± 71% +0.1 0.11 ± 17% perf-profile.children.cycles-pp.queue_work_on
0.51 ± 4% +0.1 0.58 ± 2% perf-profile.children.cycles-pp.enqueue_task_fair
0.32 ± 3% +0.1 0.40 ± 4% perf-profile.children.cycles-pp.ttwu_do_activate
0.53 ± 5% +0.1 0.61 ± 3% perf-profile.children.cycles-pp.enqueue_task
0.49 ± 4% +0.1 0.57 ± 6% perf-profile.children.cycles-pp.schedule
0.28 ± 6% +0.1 0.38 perf-profile.children.cycles-pp.sched_ttwu_pending
0.32 ± 5% +0.1 0.43 ± 2% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.35 ± 8% +0.1 0.47 ± 2% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.17 ± 10% +0.2 0.34 ± 12% perf-profile.children.cycles-pp.worker_thread
0.88 ± 3% +0.2 1.06 ± 4% perf-profile.children.cycles-pp.ret_from_fork
0.88 ± 3% +0.2 1.06 ± 4% perf-profile.children.cycles-pp.ret_from_fork_asm
0.39 ± 6% +0.2 0.59 ± 7% perf-profile.children.cycles-pp.kthread
66.24 +0.6 66.85 perf-profile.children.cycles-pp.cpuidle_idle_call
63.09 +0.6 63.73 perf-profile.children.cycles-pp.cpuidle_enter
62.97 +0.6 63.61 perf-profile.children.cycles-pp.cpuidle_enter_state
67.61 +0.8 68.43 perf-profile.children.cycles-pp.do_idle
67.62 +0.8 68.43 perf-profile.children.cycles-pp.common_startup_64
67.62 +0.8 68.43 perf-profile.children.cycles-pp.cpu_startup_entry
0.37 ± 11% -0.1 0.31 ± 3% perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
0.10 ± 13% -0.0 0.06 ± 50% perf-profile.self.cycles-pp.up_write
0.15 ± 4% +0.1 0.22 ± 8% perf-profile.self.cycles-pp.timerqueue_del
***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/exec_test/aim9/300s
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
12120 +76.7% 21422 meminfo.PageTables
8543 +26.9% 10840 vmstat.system.cs
6148 ± 11% +89.9% 11678 ± 5% numa-meminfo.node0.PageTables
5909 ± 11% +64.0% 9689 ± 7% numa-meminfo.node1.PageTables
1532 ± 10% +90.5% 2919 ± 5% numa-vmstat.node0.nr_page_table_pages
1468 ± 11% +65.2% 2426 ± 7% numa-vmstat.node1.nr_page_table_pages
2991 +78.0% 5323 proc-vmstat.nr_page_table_pages
32726750 -2.4% 31952115 proc-vmstat.pgfault
1228 -2.6% 1197 aim9.exec_test.ops_per_sec
11018 ± 2% +10.5% 12178 ± 2% aim9.time.involuntary_context_switches
31835059 -2.4% 31062527 aim9.time.minor_page_faults
736468 -2.9% 715310 aim9.time.voluntary_context_switches
0.28 ± 7% +11.3% 0.31 ± 6% sched_debug.cfs_rq:/.h_nr_queued.stddev
0.28 ± 7% +11.3% 0.31 ± 6% sched_debug.cfs_rq:/.nr_queued.stddev
356683 ± 16% +27.0% 453000 ± 9% sched_debug.cpu.avg_idle.min
27620 ± 7% +29.5% 35775 sched_debug.cpu.nr_switches.avg
84830 ± 14% +16.3% 98648 ± 4% sched_debug.cpu.nr_switches.max
4563 ± 26% +46.2% 6671 ± 26% sched_debug.cpu.nr_switches.min
0.03 ± 4% -67.3% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exec_release.exec_mm_release.exec_mmap
0.03 +11.2% 0.03 ± 2% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.05 ± 28% +61.3% 0.09 ± 21% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
0.10 ± 18% +18.8% 0.12 perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.02 ± 3% +18.3% 0.02 ± 2% perf-sched.total_sch_delay.average.ms
28.80 -19.8% 23.10 ± 3% perf-sched.total_wait_and_delay.average.ms
22332 +24.4% 27778 perf-sched.total_wait_and_delay.count.ms
28.78 -19.8% 23.07 ± 3% perf-sched.total_wait_time.average.ms
17.39 ± 10% -15.6% 14.67 ± 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
41.02 ± 4% -54.6% 18.64 ± 6% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
4795 ± 2% +122.5% 10668 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
17.35 ± 10% -15.7% 14.63 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.00 ±141% +400.0% 0.00 ± 44% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
40.99 ± 4% -54.6% 18.61 ± 6% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.00 ±149% +542.9% 0.03 ± 41% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
5.617e+08 -1.6% 5.529e+08 perf-stat.i.branch-instructions
5.76 +0.1 5.84 perf-stat.i.branch-miss-rate%
8562 +27.0% 10878 perf-stat.i.context-switches
1.87 +2.6% 1.92 perf-stat.i.cpi
78.02 ± 3% +11.8% 87.23 ± 2% perf-stat.i.cpu-migrations
2.792e+09 -1.6% 2.748e+09 perf-stat.i.instructions
0.55 -2.5% 0.54 perf-stat.i.ipc
4.42 -2.4% 4.31 perf-stat.i.metric.K/sec
106019 -2.4% 103509 perf-stat.i.minor-faults
106019 -2.4% 103509 perf-stat.i.page-faults
5.83 +0.1 5.91 perf-stat.overall.branch-miss-rate%
1.72 +2.3% 1.76 perf-stat.overall.cpi
0.58 -2.3% 0.57 perf-stat.overall.ipc
5.599e+08 -1.6% 5.511e+08 perf-stat.ps.branch-instructions
8534 +27.0% 10841 perf-stat.ps.context-switches
77.77 ± 3% +11.8% 86.96 ± 2% perf-stat.ps.cpu-migrations
2.783e+09 -1.6% 2.739e+09 perf-stat.ps.instructions
105666 -2.4% 103164 perf-stat.ps.minor-faults
105666 -2.4% 103164 perf-stat.ps.page-faults
8.386e+11 -1.6% 8.253e+11 perf-stat.total.instructions
7.79 -0.4 7.41 ± 2% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
7.75 -0.3 7.47 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
7.73 -0.3 7.46 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.68 ± 2% -0.2 2.52 ± 2% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
2.68 ± 2% -0.2 2.52 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.68 ± 2% -0.2 2.52 ± 2% perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.73 ± 2% -0.2 2.57 ± 2% perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.60 -0.1 2.46 ± 3% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
2.61 -0.1 2.47 ± 3% perf-profile.calltrace.cycles-pp.execve.exec_test
2.60 -0.1 2.46 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
2.60 -0.1 2.46 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve.exec_test
1.92 ± 3% -0.1 1.79 ± 2% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
1.92 ± 3% -0.1 1.80 ± 2% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
4.68 -0.1 4.57 perf-profile.calltrace.cycles-pp._Fork
1.88 ± 2% -0.1 1.77 ± 2% perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
2.76 -0.1 2.66 ± 2% perf-profile.calltrace.cycles-pp.exec_test
3.24 -0.1 3.16 perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.84 ± 4% -0.1 0.77 ± 5% perf-profile.calltrace.cycles-pp.wait4
0.88 ± 7% +0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
0.88 ± 7% +0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
0.88 ± 7% +0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
0.46 ± 45% +0.3 0.78 ± 5% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.17 ±141% +0.4 0.53 ± 4% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.18 ±141% +0.4 0.54 ± 2% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
66.08 +0.8 66.85 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
66.08 +0.8 66.85 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
66.02 +0.8 66.80 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
67.06 +0.9 68.00 perf-profile.calltrace.cycles-pp.common_startup_64
21.19 -0.9 20.30 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
21.15 -0.9 20.27 perf-profile.children.cycles-pp.do_syscall_64
7.92 -0.4 7.53 ± 2% perf-profile.children.cycles-pp.execve
7.94 -0.4 7.56 ± 2% perf-profile.children.cycles-pp.__x64_sys_execve
7.84 -0.4 7.46 ± 2% perf-profile.children.cycles-pp.do_execveat_common
5.51 -0.3 5.25 ± 2% perf-profile.children.cycles-pp.load_elf_binary
3.68 -0.2 3.49 ± 2% perf-profile.children.cycles-pp.__mmput
2.81 ± 2% -0.2 2.63 perf-profile.children.cycles-pp.__x64_sys_exit_group
2.80 ± 2% -0.2 2.62 ± 2% perf-profile.children.cycles-pp.do_exit
2.81 ± 2% -0.2 2.62 ± 2% perf-profile.children.cycles-pp.do_group_exit
2.93 ± 2% -0.2 2.76 ± 2% perf-profile.children.cycles-pp.x64_sys_call
3.60 -0.2 3.44 ± 2% perf-profile.children.cycles-pp.exit_mmap
5.66 -0.1 5.51 perf-profile.children.cycles-pp.__handle_mm_fault
1.94 ± 3% -0.1 1.82 ± 2% perf-profile.children.cycles-pp.exit_mm
2.64 -0.1 2.52 ± 3% perf-profile.children.cycles-pp.vm_mmap_pgoff
2.55 ± 2% -0.1 2.43 ± 3% perf-profile.children.cycles-pp.do_mmap
2.19 ± 2% -0.1 2.08 ± 3% perf-profile.children.cycles-pp.__mmap_region
2.27 -0.1 2.16 ± 2% perf-profile.children.cycles-pp.begin_new_exec
2.79 -0.1 2.69 ± 2% perf-profile.children.cycles-pp.exec_test
0.83 ± 4% -0.1 0.76 ± 6% perf-profile.children.cycles-pp.__mmap_prepare
0.86 ± 4% -0.1 0.78 ± 5% perf-profile.children.cycles-pp.wait4
0.52 ± 5% -0.1 0.45 ± 7% perf-profile.children.cycles-pp.kernel_wait4
0.50 ± 5% -0.1 0.43 ± 6% perf-profile.children.cycles-pp.do_wait
0.88 ± 3% -0.1 0.81 ± 2% perf-profile.children.cycles-pp.kmem_cache_free
0.51 ± 2% -0.1 0.46 ± 6% perf-profile.children.cycles-pp.setup_arg_pages
0.39 ± 2% -0.0 0.34 ± 8% perf-profile.children.cycles-pp.unlink_anon_vmas
0.08 ± 10% -0.0 0.04 ± 71% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
0.37 ± 5% -0.0 0.33 ± 3% perf-profile.children.cycles-pp.__memcg_slab_free_hook
0.21 ± 6% -0.0 0.17 ± 5% perf-profile.children.cycles-pp.user_path_at
0.21 ± 3% -0.0 0.18 ± 10% perf-profile.children.cycles-pp.__percpu_counter_sum
0.18 ± 7% -0.0 0.15 ± 5% perf-profile.children.cycles-pp.alloc_empty_file
0.33 ± 5% -0.0 0.30 perf-profile.children.cycles-pp.relocate_vma_down
0.04 ± 45% +0.0 0.08 ± 12% perf-profile.children.cycles-pp.__update_load_avg_se
0.14 ± 7% +0.0 0.18 ± 10% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.19 ± 9% +0.0 0.24 ± 7% perf-profile.children.cycles-pp.prepare_task_switch
0.02 ±142% +0.0 0.06 ± 23% perf-profile.children.cycles-pp.select_task_rq
0.03 ±100% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.task_contending
0.45 ± 7% +0.1 0.51 ± 3% perf-profile.children.cycles-pp.__pick_next_task
0.14 ± 22% +0.1 0.20 ± 10% perf-profile.children.cycles-pp.kick_pool
0.36 ± 4% +0.1 0.42 ± 4% perf-profile.children.cycles-pp.dequeue_entities
0.36 ± 4% +0.1 0.44 ± 5% perf-profile.children.cycles-pp.dequeue_task_fair
0.15 ± 20% +0.1 0.23 ± 10% perf-profile.children.cycles-pp.__queue_work
0.49 ± 5% +0.1 0.57 ± 7% perf-profile.children.cycles-pp.schedule_idle
0.14 ± 22% +0.1 0.23 ± 9% perf-profile.children.cycles-pp.queue_work_on
0.36 ± 3% +0.1 0.46 ± 9% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.47 ± 7% +0.1 0.57 ± 7% perf-profile.children.cycles-pp.timerqueue_del
0.30 ± 13% +0.1 0.42 ± 7% perf-profile.children.cycles-pp.ttwu_do_activate
0.23 ± 15% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.18 ± 14% +0.1 0.32 ± 3% perf-profile.children.cycles-pp.sched_ttwu_pending
0.19 ± 13% +0.1 0.34 ± 4% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.61 ± 3% +0.2 0.76 ± 5% perf-profile.children.cycles-pp.schedule
1.60 ± 4% +0.2 1.80 ± 2% perf-profile.children.cycles-pp.ret_from_fork_asm
1.60 ± 4% +0.2 1.80 ± 2% perf-profile.children.cycles-pp.ret_from_fork
0.88 ± 7% +0.2 1.09 ± 3% perf-profile.children.cycles-pp.kthread
1.22 ± 3% +0.2 1.45 ± 5% perf-profile.children.cycles-pp.__schedule
0.54 ± 8% +0.2 0.78 ± 5% perf-profile.children.cycles-pp.worker_thread
66.08 +0.8 66.85 perf-profile.children.cycles-pp.start_secondary
67.06 +0.9 68.00 perf-profile.children.cycles-pp.common_startup_64
67.06 +0.9 68.00 perf-profile.children.cycles-pp.cpu_startup_entry
67.06 +0.9 68.00 perf-profile.children.cycles-pp.do_idle
0.08 ± 10% -0.0 0.04 ± 71% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
0.04 ± 45% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.__update_load_avg_se
0.14 ± 10% +0.1 0.23 ± 11% perf-profile.self.cycles-pp.timerqueue_del
***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
gcc-12/performance/1BRD_48G/xfs/x86_64-rhel-9.4/600/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/sync_disk_rw/aim7
commit:
baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
baffb122772da116 f3de761c52148abfb1b4512914f
---------------- ---------------------------
%stddev %change %stddev
\ | \
344180 ± 6% -13.0% 299325 ± 9% meminfo.Mapped
9594 ±123% +191.8% 27995 ± 54% numa-meminfo.node1.PageTables
2399 ±123% +191.3% 6989 ± 54% numa-vmstat.node1.nr_page_table_pages
1860734 -5.2% 1763194 vmstat.io.bo
831686 +1.3% 842493 vmstat.system.cs
50372 -5.5% 47609 aim7.jobs-per-min
1435644 +11.5% 1600707 aim7.time.involuntary_context_switches
7242 +1.2% 7332 aim7.time.percent_of_cpu_this_job_got
5159 +7.1% 5526 aim7.time.system_time
33195986 +6.9% 35497140 aim7.time.voluntary_context_switches
40987 ± 10% -19.8% 32872 ± 9% sched_debug.cfs_rq:/.avg_vruntime.stddev
40987 ± 10% -19.8% 32872 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev
605972 ± 2% +14.5% 693922 ± 7% sched_debug.cpu.avg_idle.max
30974 ± 8% -20.9% 24498 ± 15% sched_debug.cpu.avg_idle.min
118758 ± 5% +22.0% 144899 ± 6% sched_debug.cpu.avg_idle.stddev
856253 +1.5% 869009 perf-stat.i.context-switches
3.06 +2.3% 3.13 perf-stat.i.cpi
164824 +7.7% 177546 perf-stat.i.cpu-migrations
7.93 +2.5% 8.13 perf-stat.i.metric.K/sec
3.41 +1.8% 3.47 perf-stat.overall.cpi
1355 +5.8% 1434 ± 4% perf-stat.overall.cycles-between-cache-misses
0.29 -1.8% 0.29 perf-stat.overall.ipc
845412 +1.6% 858925 perf-stat.ps.context-switches
162728 +7.8% 175475 perf-stat.ps.cpu-migrations
4.391e+12 +5.0% 4.609e+12 perf-stat.total.instructions
444798 +6.0% 471383 ± 5% proc-vmstat.nr_active_anon
28190 -2.8% 27402 proc-vmstat.nr_dirty
1231373 +2.3% 1259666 ± 2% proc-vmstat.nr_file_pages
63763 +0.9% 64355 proc-vmstat.nr_inactive_file
86758 ± 6% -12.9% 75546 ± 8% proc-vmstat.nr_mapped
10162 ± 2% +7.2% 10895 ± 3% proc-vmstat.nr_page_table_pages
265229 +10.4% 292795 ± 9% proc-vmstat.nr_shmem
444798 +6.0% 471383 ± 5% proc-vmstat.nr_zone_active_anon
63763 +0.9% 64355 proc-vmstat.nr_zone_inactive_file
28191 -2.8% 27400 proc-vmstat.nr_zone_write_pending
24349 +11.6% 27171 ± 8% proc-vmstat.pgreuse
0.02 ± 3% +11.3% 0.03 ± 2% perf-sched.sch_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
0.29 ± 17% -30.7% 0.20 ± 14% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
0.03 ± 10% +33.5% 0.04 ± 2% perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
0.21 ± 32% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.16 ± 16% +51.9% 0.24 ± 11% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.22 ± 19% +44.1% 0.32 ± 25% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
0.30 ± 28% -38.7% 0.18 ± 28% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
0.11 ± 5% +12.8% 0.12 ± 4% perf-sched.sch_delay.avg.ms.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
0.08 ± 4% +15.8% 0.09 ± 4% perf-sched.sch_delay.avg.ms.xlog_wait.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync
0.02 ± 3% +13.7% 0.02 ± 4% perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
0.01 ±223% +1289.5% 0.09 ±111% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
2.49 ± 40% -43.4% 1.41 ± 50% perf-sched.sch_delay.max.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
0.76 ± 7% +92.8% 1.46 ± 40% perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
0.65 ± 41% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.40 ± 64% +2968.7% 43.04 ± 13% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.63 ± 19% +89.8% 1.19 ± 51% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
28.67 ± 3% -11.2% 25.45 ± 5% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
0.80 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
5.76 ±107% +152.4% 14.53 ± 10% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
8441 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
18.67 ± 71% +108.0% 38.83 ± 5% perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_trans_commit
116.17 ±105% +1677.8% 2065 ± 5% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
424.79 ±151% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
28.51 ± 3% -11.2% 25.31 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
0.38 ± 59% -79.0% 0.08 ±107% perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
0.77 ± 9% -56.5% 0.34 ± 3% perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
1.80 ±138% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.13 ± 93% +133.2% 14.29 ± 10% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
1.00 ± 16% -48.1% 0.52 ± 20% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
0.92 ± 16% -62.0% 0.35 ± 14% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
0.26 ± 2% -59.8% 0.11 perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
0.24 ±223% +2180.2% 5.56 ± 83% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
1.25 ± 77% -79.8% 0.25 ±107% perf-sched.wait_time.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
1.78 ± 51% +958.6% 18.82 ±117% perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map
58.48 ± 6% -10.7% 52.22 ± 2% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.__flush_workqueue.xlog_cil_push_now.isra
10.87 ±192% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.63 ± 27% -63.9% 3.12 ± 29% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
2025-06-25 8:01 ` kernel test robot
@ 2025-06-25 13:57 ` Mathieu Desnoyers
2025-06-25 15:06 ` Gabriele Monaco
2025-07-02 13:58 ` Gabriele Monaco
0 siblings, 2 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2025-06-25 13:57 UTC (permalink / raw)
To: kernel test robot, Gabriele Monaco
Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
Paul E. McKenney, Ingo Molnar
On 2025-06-25 04:01, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed a 10.1% regression of hackbench.throughput on:
Hi Gabriele,
This is a significant regression. Can you investigate before it gets
merged?
Thanks,
Mathieu
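For reference, the 10.1% headline figure corresponds to the hackbench.throughput rows quoted below (baseline commit baffb12277 vs. f3de761c52). A minimal sketch of how such a %change is derived from the two means; the values are copied from the quoted table and the variable names are illustrative, not part of the LKP tooling:

    # Sketch only: reproduce the -10.1% hackbench.throughput change
    # from the baseline and patched means quoted in the report below.
    baseline = 1_490_153   # baffb122772da116, hackbench.throughput
    patched  = 1_339_340   # f3de761c52148abf, hackbench.throughput

    pct_change = (patched - baseline) / baseline * 100
    print(f"{pct_change:+.1f}%")   # -> -10.1%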
>
>
> commit: f3de761c52148abfb1b4512914f64c7e1c737fc8 ("[RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct")
> url: https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250613-171504
> patch link: https://lore.kernel.org/all/20250613091229.21500-3-gmonaco@redhat.com/
> patch subject: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
>
> testcase: hackbench
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
>
> nr_threads: 100%
> iterations: 4
> mode: process
> ipc: pipe
> cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput 2.9% regression |
> | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> | test parameters | cpufreq_governor=performance |
> | | ipc=socket |
> | | iterations=4 |
> | | mode=process |
> | | nr_threads=50% |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.shell_rtns_3.ops_per_sec 1.7% regression |
> | test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters | cpufreq_governor=performance |
> | | test=shell_rtns_3 |
> | | testtime=300s |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput 6.2% regression |
> | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> | test parameters | cpufreq_governor=performance |
> | | ipc=pipe |
> | | iterations=4 |
> | | mode=process |
> | | nr_threads=800% |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.shell_rtns_1.ops_per_sec 2.1% regression |
> | test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters | cpufreq_governor=performance |
> | | test=shell_rtns_1 |
> | | testtime=300s |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput 11.8% improvement |
> | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> | test parameters | cpufreq_governor=performance |
> | | ipc=pipe |
> | | iterations=4 |
> | | mode=process |
> | | nr_threads=50% |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.shell_rtns_2.ops_per_sec 2.2% regression |
> | test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters | cpufreq_governor=performance |
> | | test=shell_rtns_2 |
> | | testtime=300s |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim9: aim9.exec_test.ops_per_sec 2.6% regression |
> | test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> | test parameters | cpufreq_governor=performance |
> | | test=exec_test |
> | | testtime=300s |
> +------------------+------------------------------------------------------------------------------------------------+
> | testcase: change | aim7: aim7.jobs-per-min 5.5% regression |
> | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> | test parameters | cpufreq_governor=performance |
> | | disk=1BRD_48G |
> | | fs=xfs |
> | | load=600 |
> | | test=sync_disk_rw |
> +------------------+------------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202506251555.de6720f7-lkp@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250625/202506251555.de6720f7-lkp@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 55140 ± 80% +229.2% 181547 ± 20% numa-meminfo.node1.Mapped
> 13048 ± 80% +248.2% 45431 ± 20% numa-vmstat.node1.nr_mapped
> 679.17 ± 22% -25.3% 507.33 ± 10% sched_debug.cfs_rq:/.util_est.max
> 4.287e+08 ± 3% +20.3% 5.158e+08 cpuidle..time
> 2953716 ± 13% +228.9% 9716185 ± 2% cpuidle..usage
> 91072 ± 12% +134.8% 213855 ± 7% meminfo.Mapped
> 8848637 +10.4% 9769875 ± 5% meminfo.Memused
> 0.67 ± 4% +0.1 0.78 ± 2% mpstat.cpu.all.irq%
> 0.03 ± 2% +0.0 0.03 ± 4% mpstat.cpu.all.soft%
> 4.17 ± 8% +596.0% 29.00 ± 31% mpstat.max_utilization.seconds
> 2950 -12.3% 2587 vmstat.procs.r
> 4557607 ± 2% +35.9% 6192548 vmstat.system.cs
> 397195 ± 5% +73.4% 688726 vmstat.system.in
> 1490153 -10.1% 1339340 hackbench.throughput
> 1424170 -8.7% 1299590 hackbench.throughput_avg
> 1490153 -10.1% 1339340 hackbench.throughput_best
> 1353181 ± 2% -10.1% 1216523 hackbench.throughput_worst
> 53158738 ± 3% +34.0% 71240022 hackbench.time.involuntary_context_switches
> 12177 -2.4% 11891 hackbench.time.percent_of_cpu_this_job_got
> 4482 +7.6% 4821 hackbench.time.system_time
> 798.92 +2.0% 815.24 hackbench.time.user_time
> 1.54e+08 ± 3% +46.6% 2.257e+08 hackbench.time.voluntary_context_switches
> 210335 +3.3% 217333 proc-vmstat.nr_anon_pages
> 23353 ± 14% +136.2% 55152 ± 7% proc-vmstat.nr_mapped
> 61825 ± 3% +6.6% 65928 ± 2% proc-vmstat.nr_page_table_pages
> 30859 +4.4% 32213 proc-vmstat.nr_slab_reclaimable
> 1294 ±177% +1657.1% 22743 ± 66% proc-vmstat.numa_hint_faults
> 1153 ±198% +1597.0% 19566 ± 79% proc-vmstat.numa_hint_faults_local
> 1.242e+08 -3.2% 1.202e+08 proc-vmstat.numa_hit
> 1.241e+08 -3.2% 1.201e+08 proc-vmstat.numa_local
> 2195 ±110% +2337.0% 53508 ± 55% proc-vmstat.numa_pte_updates
> 1.243e+08 -3.2% 1.203e+08 proc-vmstat.pgalloc_normal
> 875909 ± 2% +8.6% 951378 ± 2% proc-vmstat.pgfault
> 1.231e+08 -3.5% 1.188e+08 proc-vmstat.pgfree
> 6.903e+10 -5.6% 6.514e+10 perf-stat.i.branch-instructions
> 0.21 +0.0 0.26 perf-stat.i.branch-miss-rate%
> 89225177 ± 2% +38.3% 1.234e+08 perf-stat.i.branch-misses
> 25.64 ± 2% -5.7 19.95 ± 2% perf-stat.i.cache-miss-rate%
> 9.322e+08 ± 2% +22.8% 1.145e+09 perf-stat.i.cache-references
> 4553621 ± 2% +39.8% 6363761 perf-stat.i.context-switches
> 1.12 +4.5% 1.17 perf-stat.i.cpi
> 186890 ± 2% +143.9% 455784 perf-stat.i.cpu-migrations
> 2.787e+11 -4.9% 2.649e+11 perf-stat.i.instructions
> 0.91 -4.4% 0.87 perf-stat.i.ipc
> 36.79 ± 2% +44.9% 53.30 perf-stat.i.metric.K/sec
> 0.13 ± 2% +0.1 0.19 perf-stat.overall.branch-miss-rate%
> 24.44 ± 2% -4.7 19.74 ± 2% perf-stat.overall.cache-miss-rate%
> 1.12 +4.6% 1.17 perf-stat.overall.cpi
> 0.89 -4.4% 0.85 perf-stat.overall.ipc
> 6.755e+10 -5.4% 6.392e+10 perf-stat.ps.branch-instructions
> 87121352 ± 2% +38.5% 1.206e+08 perf-stat.ps.branch-misses
> 9.098e+08 ± 2% +23.1% 1.12e+09 perf-stat.ps.cache-references
> 4443812 ± 2% +39.9% 6218298 perf-stat.ps.context-switches
> 181595 ± 2% +144.5% 443985 perf-stat.ps.cpu-migrations
> 2.727e+11 -4.7% 2.599e+11 perf-stat.ps.instructions
> 1.21e+13 +4.3% 1.262e+13 perf-stat.total.instructions
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.ctx_resched.event_function.remote_function.generic_exec_single
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.ctx_resched.event_function.remote_function.generic_exec_single.smp_call_function_single
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.event_function.remote_function.generic_exec_single.smp_call_function_single.event_function_call
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.remote_function.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record.run_builtin
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.__intel_pmu_enable_all
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.__x64_sys_ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp._perf_ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.ctx_resched
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.event_function
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.generic_exec_single
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.perf_event_for_each_child
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.perf_ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.remote_function
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.__evlist__enable
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_c2c__record
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_evsel__enable_cpu
> 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_evsel__run_ioctl
> 11.84 ± 91% -9.5 2.30 ±141% perf-profile.self.cycles-pp.__intel_pmu_enable_all
> 23.74 ±185% -98.6% 0.34 ±114% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
> 12.77 ± 80% -83.9% 2.05 ±138% perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> 5.93 ± 69% -90.5% 0.56 ±105% perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> 6.70 ±152% -94.5% 0.37 ±145% perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> 0.82 ± 85% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 8.59 ±202% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 13.53 ± 11% -47.0% 7.18 ± 76% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 15.63 ± 17% -100.0% 0.00 perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 47.22 ± 77% -85.5% 6.87 ±144% perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> 133.35 ±132% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 68.01 ±203% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 13.53 ± 11% -47.0% 7.18 ± 76% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 34.59 ± 3% -100.0% 0.00 perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 40.97 ± 8% -71.8% 11.55 ± 64% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> 373.07 ±123% -99.8% 0.78 ±156% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 13.53 ± 11% -62.0% 5.14 ±107% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 120.97 ± 23% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 46.03 ± 30% -62.5% 17.27 ± 87% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> 984.50 ± 14% -43.5% 556.24 ± 58% perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
> 339.42 ± 12% -97.3% 9.11 ± 54% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 8.00 ± 23% -85.4% 1.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 22.17 ± 49% -100.0% 0.00 perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 73.83 ± 20% -76.3% 17.50 ± 96% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> 13.53 ± 11% -62.0% 5.14 ±107% perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 336.30 ± 5% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 23.74 ±185% -98.6% 0.34 ±114% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
> 14.48 ± 61% -74.1% 3.76 ±152% perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> 6.48 ± 68% -91.3% 0.56 ±105% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> 6.70 ±152% -94.5% 0.37 ±145% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> 2.18 ± 75% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 10.79 ±165% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 1.53 ±100% -97.5% 0.04 ± 84% perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 105.34 ± 26% -100.0% 0.00 perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 29.72 ± 40% -76.5% 7.00 ±102% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> 32.21 ± 33% -65.7% 11.04 ± 85% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> 984.49 ± 14% -43.5% 556.23 ± 58% perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
> 337.00 ± 12% -97.6% 8.11 ± 52% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 53.42 ± 59% -69.8% 16.15 ±162% perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> 218.65 ± 83% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 82.52 ±162% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 10.89 ± 98% -98.8% 0.13 ±134% perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 334.02 ± 6% -100.0% 0.00 perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
>
>
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> gcc-12/performance/socket/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 161258 -12.6% 141018 ± 5% perf-c2c.HITM.total
> 6514 ± 3% +13.3% 7381 ± 3% uptime.idle
> 692218 +17.8% 815512 vmstat.system.in
> 4.747e+08 ± 7% +137.3% 1.127e+09 ± 21% cpuidle..time
> 5702271 ± 12% +503.6% 34419686 ± 13% cpuidle..usage
> 141191 ± 2% +10.3% 155768 ± 3% meminfo.PageTables
> 62180 +26.0% 78348 meminfo.Percpu
> 2.20 ± 14% +3.5 5.67 ± 20% mpstat.cpu.all.idle%
> 0.55 +0.2 0.72 ± 5% mpstat.cpu.all.irq%
> 0.04 ± 2% +0.0 0.06 ± 5% mpstat.cpu.all.soft%
> 448780 -2.9% 435554 hackbench.throughput
> 440656 -2.6% 429130 hackbench.throughput_avg
> 448780 -2.9% 435554 hackbench.throughput_best
> 425797 -2.2% 416584 hackbench.throughput_worst
> 90998790 -15.0% 77364427 ± 6% hackbench.time.involuntary_context_switches
> 12446 -3.9% 11960 hackbench.time.percent_of_cpu_this_job_got
> 16057 -1.4% 15825 hackbench.time.system_time
> 63421 -2.3% 61955 proc-vmstat.nr_kernel_stack
> 35455 ± 2% +10.0% 38991 ± 3% proc-vmstat.nr_page_table_pages
> 34542 +5.1% 36312 ± 2% proc-vmstat.nr_slab_reclaimable
> 151083 ± 16% +46.6% 221509 ± 17% proc-vmstat.numa_hint_faults
> 113731 ± 26% +64.7% 187314 ± 20% proc-vmstat.numa_hint_faults_local
> 133591 +3.1% 137709 proc-vmstat.numa_other
> 53696 ± 16% -28.6% 38362 ± 10% proc-vmstat.numa_pages_migrated
> 1053504 ± 2% +7.7% 1135052 ± 4% proc-vmstat.pgfault
> 2077549 ± 3% +8.5% 2254157 ± 4% proc-vmstat.pgfree
> 53696 ± 16% -28.6% 38362 ± 10% proc-vmstat.pgmigrate_success
> 4.941e+10 -2.6% 4.81e+10 perf-stat.i.branch-instructions
> 2.232e+08 -1.9% 2.189e+08 perf-stat.i.branch-misses
> 2.11e+09 -5.8% 1.989e+09 ± 2% perf-stat.i.cache-references
> 3.221e+11 -2.5% 3.141e+11 perf-stat.i.cpu-cycles
> 2.365e+11 -2.7% 2.303e+11 perf-stat.i.instructions
> 6787 ± 3% +8.0% 7327 ± 4% perf-stat.i.minor-faults
> 6789 ± 3% +8.0% 7329 ± 4% perf-stat.i.page-faults
> 4.904e+10 -2.5% 4.779e+10 perf-stat.ps.branch-instructions
> 2.215e+08 -1.8% 2.174e+08 perf-stat.ps.branch-misses
> 2.094e+09 -5.7% 1.974e+09 ± 2% perf-stat.ps.cache-references
> 3.197e+11 -2.4% 3.12e+11 perf-stat.ps.cpu-cycles
> 2.348e+11 -2.6% 2.288e+11 perf-stat.ps.instructions
> 6691 ± 3% +7.2% 7174 ± 4% perf-stat.ps.minor-faults
> 6693 ± 3% +7.2% 7176 ± 4% perf-stat.ps.page-faults
> 7475567 +16.4% 8699139 sched_debug.cfs_rq:/.avg_vruntime.avg
> 8752154 ± 3% +20.6% 10551563 ± 4% sched_debug.cfs_rq:/.avg_vruntime.max
> 211424 ± 12% +374.5% 1003211 ± 39% sched_debug.cfs_rq:/.avg_vruntime.stddev
> 19.44 ± 6% +29.4% 25.17 ± 5% sched_debug.cfs_rq:/.h_nr_queued.max
> 4.49 ± 4% +33.5% 5.99 ± 4% sched_debug.cfs_rq:/.h_nr_queued.stddev
> 19.33 ± 6% +29.0% 24.94 ± 5% sched_debug.cfs_rq:/.h_nr_runnable.max
> 4.47 ± 4% +33.4% 5.96 ± 3% sched_debug.cfs_rq:/.h_nr_runnable.stddev
> 6446 ±223% +885.4% 63529 ± 57% sched_debug.cfs_rq:/.left_deadline.avg
> 825119 ±223% +613.5% 5886958 ± 44% sched_debug.cfs_rq:/.left_deadline.max
> 72645 ±223% +713.6% 591074 ± 49% sched_debug.cfs_rq:/.left_deadline.stddev
> 6446 ±223% +885.5% 63527 ± 57% sched_debug.cfs_rq:/.left_vruntime.avg
> 825080 ±223% +613.5% 5886805 ± 44% sched_debug.cfs_rq:/.left_vruntime.max
> 72642 ±223% +713.7% 591058 ± 49% sched_debug.cfs_rq:/.left_vruntime.stddev
> 4202 ± 8% +1115.1% 51069 ± 61% sched_debug.cfs_rq:/.load.stddev
> 367.11 +20.2% 441.44 ± 17% sched_debug.cfs_rq:/.load_avg.max
> 7475567 +16.4% 8699139 sched_debug.cfs_rq:/.min_vruntime.avg
> 8752154 ± 3% +20.6% 10551563 ± 4% sched_debug.cfs_rq:/.min_vruntime.max
> 211424 ± 12% +374.5% 1003211 ± 39% sched_debug.cfs_rq:/.min_vruntime.stddev
> 0.17 ± 16% +39.8% 0.24 ± 6% sched_debug.cfs_rq:/.nr_queued.stddev
> 6446 ±223% +885.5% 63527 ± 57% sched_debug.cfs_rq:/.right_vruntime.avg
> 825080 ±223% +613.5% 5886805 ± 44% sched_debug.cfs_rq:/.right_vruntime.max
> 72642 ±223% +713.7% 591058 ± 49% sched_debug.cfs_rq:/.right_vruntime.stddev
> 752.39 ± 81% -81.4% 139.72 ± 53% sched_debug.cfs_rq:/.runnable_avg.min
> 2728 ± 3% +51.2% 4126 ± 8% sched_debug.cfs_rq:/.runnable_avg.stddev
> 265.50 ± 2% +12.3% 298.07 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
> 686.78 ± 7% +23.4% 847.76 ± 6% sched_debug.cfs_rq:/.util_est.stddev
> 19.44 ± 5% +29.7% 25.22 ± 4% sched_debug.cpu.nr_running.max
> 4.48 ± 5% +34.4% 6.02 ± 3% sched_debug.cpu.nr_running.stddev
> 67323 ± 14% +130.3% 155017 ± 29% sched_debug.cpu.nr_switches.stddev
> -20.78 -18.2% -17.00 sched_debug.cpu.nr_uninterruptible.min
> 0.13 ±100% -85.8% 0.02 ±163% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> 0.17 ±116% -97.8% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> 22.92 ±110% -97.4% 0.59 ±137% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
> 8.10 ± 45% -78.0% 1.78 ±135% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> 3.14 ± 19% -70.9% 0.91 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> 39.05 ±149% -97.4% 1.01 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
> 15.77 ±203% -99.7% 0.04 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> 1.27 ±177% -98.2% 0.02 ±190% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> 0.20 ±140% -92.4% 0.02 ±201% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
> 86.63 ±221% -99.9% 0.05 ±184% perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> 0.18 ± 75% -97.0% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> 0.13 ± 34% -75.5% 0.03 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 0.26 ±108% -86.2% 0.04 ±142% perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> 2.33 ± 11% -65.8% 0.80 ±107% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 0.18 ± 88% -91.1% 0.02 ±194% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> 0.50 ±145% -92.5% 0.04 ±210% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
> 0.19 ±116% -98.5% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> 0.24 ±128% -96.8% 0.01 ±180% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> 0.99 ± 16% -58.0% 0.42 ±100% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 0.27 ±124% -97.5% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
> 1.08 ± 28% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.96 ± 93% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 0.53 ±182% -94.2% 0.03 ±158% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 0.84 ±160% -93.5% 0.05 ±100% perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> 29.39 ±172% -94.0% 1.78 ±123% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 21.51 ± 60% -74.7% 5.45 ±118% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 13.77 ± 61% -81.3% 2.57 ±113% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 11.22 ± 33% -74.5% 2.86 ±107% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 1.99 ± 90% -90.1% 0.20 ±100% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> 4.50 ±138% -94.9% 0.23 ±200% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
> 27.91 ±218% -99.6% 0.11 ±120% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 9.91 ± 51% -68.3% 3.15 ±124% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 10.18 ± 24% -62.4% 3.83 ±105% perf-sched.sch_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
> 1.16 ± 20% -62.7% 0.43 ±106% perf-sched.sch_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 0.27 ± 99% -92.0% 0.02 ±172% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> 0.32 ±128% -98.9% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> 0.88 ± 94% -86.7% 0.12 ±144% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> 252.53 ±128% -98.4% 4.12 ±138% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
> 60.22 ± 58% -67.8% 19.37 ±146% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> 168.93 ±209% -99.9% 0.15 ±100% perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> 3.79 ±169% -98.6% 0.05 ±199% perf-sched.sch_delay.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> 517.19 ±222% -99.9% 0.29 ±201% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> 0.54 ± 82% -98.4% 0.01 ±141% perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> 0.34 ± 57% -93.1% 0.02 ±203% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> 0.64 ±141% -99.4% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> 0.28 ±111% -97.2% 0.01 ±180% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> 0.29 ±114% -97.6% 0.01 ±141% perf-sched.sch_delay.max.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
> 133.30 ± 46% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 12.53 ±135% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 1.11 ± 85% -76.9% 0.26 ±202% perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> 7.48 ±214% -99.0% 0.08 ±141% perf-sched.sch_delay.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 28.59 ±191% -99.0% 0.28 ±120% perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> 285.16 ±145% -99.3% 1.94 ±111% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 143.71 ±128% -91.0% 12.97 ±134% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> 107.10 ±162% -99.1% 0.95 ±190% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
> 352.73 ±216% -99.4% 2.06 ±118% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 1169 ± 25% -58.7% 482.79 ±101% perf-sched.sch_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 1.80 ± 20% -58.5% 0.75 ±105% perf-sched.total_sch_delay.average.ms
> 5.09 ± 20% -58.0% 2.14 ±106% perf-sched.total_wait_and_delay.average.ms
> 20.86 ± 25% -82.0% 3.76 ±147% perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> 8.10 ± 21% -69.1% 2.51 ±103% perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> 22.82 ± 27% -66.9% 7.55 ±103% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 6.55 ± 13% -64.1% 2.35 ±108% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 139.95 ± 55% -64.0% 50.45 ±122% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> 27.54 ± 61% -81.3% 5.15 ±113% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 27.75 ± 30% -73.3% 7.42 ±106% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 26.76 ± 25% -64.2% 9.57 ±107% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> 29.39 ± 34% -67.3% 9.61 ±115% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 27.53 ± 25% -62.9% 10.21 ±105% perf-sched.wait_and_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
> 3.25 ± 20% -62.2% 1.23 ±106% perf-sched.wait_and_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 864.18 ± 4% -99.3% 6.27 ±103% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 141.47 ± 38% -72.9% 38.27 ±154% perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> 2346 ± 25% -58.7% 969.53 ±101% perf-sched.wait_and_delay.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 83.99 ±223% -100.0% 0.02 ±163% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> 0.16 ±122% -97.7% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> 12.76 ± 37% -81.6% 2.35 ±125% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> 4.96 ± 22% -67.9% 1.59 ±104% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> 75.22 ± 91% -96.4% 2.67 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
> 23.31 ±188% -98.8% 0.28 ±195% perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> 14.93 ± 22% -68.0% 4.78 ±104% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 1.29 ±178% -98.5% 0.02 ±185% perf-sched.wait_time.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> 0.20 ±140% -92.5% 0.02 ±200% perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
> 87.29 ±221% -99.9% 0.05 ±184% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> 0.18 ± 76% -97.0% 0.01 ±141% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> 0.12 ± 33% -87.4% 0.02 ±212% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 4.22 ± 15% -63.3% 1.55 ±108% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 0.18 ± 88% -91.1% 0.02 ±194% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> 0.50 ±145% -92.5% 0.04 ±210% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
> 0.19 ±116% -98.5% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> 0.24 ±128% -96.8% 0.01 ±180% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> 1.79 ± 27% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.98 ± 92% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 2.44 ±199% -98.1% 0.05 ±109% perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> 125.16 ± 52% -64.6% 44.36 ±120% perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> 13.77 ± 61% -81.3% 2.58 ±113% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 16.53 ± 29% -72.5% 4.55 ±106% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 3.11 ± 80% -80.7% 0.60 ±138% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> 17.30 ± 23% -65.0% 6.05 ±107% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> 50.76 ±143% -98.1% 0.97 ±101% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 19.48 ± 27% -66.8% 6.46 ±111% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 17.35 ± 25% -63.3% 6.37 ±106% perf-sched.wait_time.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
> 2.09 ± 21% -62.0% 0.79 ±107% perf-sched.wait_time.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 850.73 ± 6% -99.3% 5.76 ±102% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 168.00 ±223% -100.0% 0.02 ±172% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> 0.32 ±131% -98.8% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> 0.88 ± 94% -86.7% 0.12 ±144% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> 83.05 ± 45% -75.0% 20.78 ±142% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> 393.39 ± 76% -96.3% 14.60 ±223% perf-sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
> 3.87 ±170% -98.6% 0.05 ±199% perf-sched.wait_time.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> 520.88 ±222% -99.9% 0.29 ±201% perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> 0.54 ± 82% -98.4% 0.01 ±141% perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> 0.34 ± 57% -93.1% 0.02 ±203% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> 0.64 ±141% -99.4% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> 0.28 ±111% -97.2% 0.01 ±180% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> 210.15 ± 42% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 34.48 ±131% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 1.11 ± 85% -76.9% 0.26 ±202% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> 92.32 ±212% -99.7% 0.27 ±123% perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> 3252 ± 21% -58.5% 1351 ±103% perf-sched.wait_time.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> 1602 ± 28% -66.2% 541.12 ±100% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 530.17 ± 95% -98.5% 7.79 ±119% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 1177 ± 25% -58.7% 486.74 ±101% perf-sched.wait_time.max.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 50.88 -1.4 49.53 perf-profile.calltrace.cycles-pp.read
> 45.95 -1.0 44.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
> 45.66 -1.0 44.64 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> 3.44 ± 4% -0.8 2.66 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
> 3.32 ± 4% -0.8 2.56 ± 4% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg
> 3.28 ± 4% -0.8 2.52 ± 4% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable
> 3.48 ± 3% -0.6 2.83 ± 5% perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 3.52 ± 3% -0.6 2.87 ± 5% perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
> 3.45 ± 3% -0.6 2.80 ± 5% perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg
> 47.06 -0.6 46.45 perf-profile.calltrace.cycles-pp.write
> 4.26 ± 5% -0.6 3.69 perf-profile.calltrace.cycles-pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write
> 1.58 ± 3% -0.6 1.02 ± 8% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
> 1.31 ± 3% -0.5 0.85 ± 8% perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 1.25 ± 3% -0.4 0.81 ± 8% perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> 0.84 ± 3% -0.2 0.60 ± 5% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> 7.91 -0.2 7.68 perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
> 3.17 ± 2% -0.2 2.94 perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
> 7.80 -0.2 7.58 perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 7.58 -0.2 7.36 perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
> 1.22 ± 4% -0.2 1.02 ± 4% perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
> 1.18 ± 4% -0.2 0.99 ± 4% perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_timeout
> 0.87 -0.2 0.68 ± 8% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_timeout
> 1.14 ± 4% -0.2 0.95 ± 4% perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
> 0.90 -0.2 0.72 ± 7% perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_stream_read_generic
> 3.45 ± 3% -0.1 3.30 perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> 1.96 -0.1 1.82 perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
> 1.97 -0.1 1.86 perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
> 2.35 -0.1 2.25 perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> 2.58 -0.1 2.48 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
> 1.38 ± 4% -0.1 1.28 ± 2% perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
> 1.35 -0.1 1.25 perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
> 0.67 ± 7% -0.1 0.58 ± 3% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
> 2.59 -0.1 2.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> 2.02 -0.1 1.96 perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 0.77 ± 3% -0.0 0.72 ± 2% perf-profile.calltrace.cycles-pp.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 0.65 ± 4% -0.0 0.60 ± 2% perf-profile.calltrace.cycles-pp.fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> 0.74 -0.0 0.70 perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
> 1.04 -0.0 0.99 perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb
> 0.69 -0.0 0.65 ± 2% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter
> 0.82 -0.0 0.80 perf-profile.calltrace.cycles-pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
> 0.57 -0.0 0.56 perf-profile.calltrace.cycles-pp.refill_obj_stock.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
> 0.80 ± 9% +0.2 1.01 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
> 2.50 ± 4% +0.3 2.82 ± 9% perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 2.64 ± 6% +0.4 3.06 ± 12% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic
> 2.73 ± 6% +0.4 3.16 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
> 2.87 ± 6% +0.4 3.30 ± 12% perf-profile.calltrace.cycles-pp.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> 18.38 +0.6 18.93 perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
> 0.00 +0.7 0.70 ± 11% perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
> 0.00 +0.8 0.76 ± 16% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_stream_sendmsg.sock_write_iter.vfs_write
> 0.00 +1.5 1.46 ± 11% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
> 0.00 +1.5 1.46 ± 11% perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> 0.00 +1.5 1.46 ± 11% perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 0.00 +1.5 1.50 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> 0.00 +1.5 1.52 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> 0.00 +1.6 1.61 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> 0.18 ±141% +1.8 1.93 ± 11% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> 0.18 ±141% +1.8 1.94 ± 11% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
> 0.18 ±141% +1.8 1.94 ± 11% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
> 0.18 ±141% +1.8 1.97 ± 11% perf-profile.calltrace.cycles-pp.common_startup_64
> 0.00 +2.0 1.96 ± 11% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter
> 87.96 -1.4 86.57 perf-profile.children.cycles-pp.do_syscall_64
> 88.72 -1.4 87.33 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 51.44 -1.4 50.05 perf-profile.children.cycles-pp.read
> 4.55 ± 2% -0.8 3.74 ± 5% perf-profile.children.cycles-pp.schedule
> 3.76 ± 4% -0.7 3.02 ± 3% perf-profile.children.cycles-pp.__wake_up_common
> 3.64 ± 4% -0.7 2.92 ± 3% perf-profile.children.cycles-pp.autoremove_wake_function
> 3.60 ± 4% -0.7 2.90 ± 3% perf-profile.children.cycles-pp.try_to_wake_up
> 4.00 ± 2% -0.6 3.36 ± 4% perf-profile.children.cycles-pp.schedule_timeout
> 4.65 ± 2% -0.6 4.02 ± 4% perf-profile.children.cycles-pp.__schedule
> 47.64 -0.6 47.01 perf-profile.children.cycles-pp.write
> 4.58 ± 4% -0.5 4.06 perf-profile.children.cycles-pp.__wake_up_sync_key
> 1.45 ± 2% -0.4 1.00 ± 5% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 1.84 ± 3% -0.3 1.50 ± 3% perf-profile.children.cycles-pp.ttwu_do_activate
> 1.62 ± 2% -0.3 1.33 ± 3% perf-profile.children.cycles-pp.enqueue_task
> 1.53 ± 2% -0.3 1.26 ± 3% perf-profile.children.cycles-pp.enqueue_task_fair
> 1.40 -0.3 1.14 ± 6% perf-profile.children.cycles-pp.pick_next_task_fair
> 3.97 -0.2 3.73 perf-profile.children.cycles-pp.clear_bhb_loop
> 1.43 -0.2 1.19 ± 5% perf-profile.children.cycles-pp.__pick_next_task
> 0.75 ± 4% -0.2 0.52 ± 8% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
> 7.95 -0.2 7.72 perf-profile.children.cycles-pp.unix_stream_read_actor
> 7.84 -0.2 7.61 perf-profile.children.cycles-pp.skb_copy_datagram_iter
> 3.24 ± 2% -0.2 3.01 perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
> 7.63 -0.2 7.42 perf-profile.children.cycles-pp.__skb_datagram_iter
> 0.94 ± 4% -0.2 0.73 ± 4% perf-profile.children.cycles-pp.enqueue_entity
> 0.95 ± 8% -0.2 0.76 ± 4% perf-profile.children.cycles-pp.update_curr
> 1.37 ± 3% -0.2 1.18 ± 3% perf-profile.children.cycles-pp.dequeue_task_fair
> 1.34 ± 4% -0.2 1.16 ± 3% perf-profile.children.cycles-pp.try_to_block_task
> 4.50 -0.2 4.34 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
> 1.37 ± 3% -0.2 1.20 ± 3% perf-profile.children.cycles-pp.dequeue_entities
> 3.48 ± 3% -0.1 3.33 perf-profile.children.cycles-pp._copy_to_iter
> 0.91 -0.1 0.78 ± 3% perf-profile.children.cycles-pp.update_load_avg
> 4.85 -0.1 4.72 perf-profile.children.cycles-pp.__check_object_size
> 3.23 -0.1 3.11 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 0.54 ± 3% -0.1 0.42 ± 5% perf-profile.children.cycles-pp.switch_mm_irqs_off
> 1.40 ± 4% -0.1 1.30 ± 2% perf-profile.children.cycles-pp._copy_from_iter
> 2.02 -0.1 1.92 perf-profile.children.cycles-pp.its_return_thunk
> 0.43 ± 2% -0.1 0.32 ± 3% perf-profile.children.cycles-pp.switch_fpu_return
> 0.29 ± 2% -0.1 0.18 ± 6% perf-profile.children.cycles-pp.__enqueue_entity
> 1.46 ± 3% -0.1 1.36 ± 2% perf-profile.children.cycles-pp.fdget_pos
> 0.44 ± 3% -0.1 0.34 ± 5% perf-profile.children.cycles-pp.set_next_entity
> 0.42 ± 2% -0.1 0.32 ± 4% perf-profile.children.cycles-pp.pick_task_fair
> 0.31 ± 2% -0.1 0.24 ± 6% perf-profile.children.cycles-pp.reweight_entity
> 0.28 ± 2% -0.1 0.20 ± 7% perf-profile.children.cycles-pp.__dequeue_entity
> 1.96 -0.1 1.88 perf-profile.children.cycles-pp.obj_cgroup_charge_account
> 0.28 ± 2% -0.1 0.21 ± 3% perf-profile.children.cycles-pp.update_cfs_group
> 0.23 ± 2% -0.1 0.16 ± 5% perf-profile.children.cycles-pp.pick_eevdf
> 0.26 ± 2% -0.1 0.19 ± 4% perf-profile.children.cycles-pp.wakeup_preempt
> 1.46 -0.1 1.40 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.48 ± 2% -0.1 0.42 ± 5% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
> 0.30 -0.1 0.24 ± 4% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> 0.82 -0.1 0.77 perf-profile.children.cycles-pp.__cond_resched
> 0.27 ± 2% -0.0 0.22 ± 4% perf-profile.children.cycles-pp.__update_load_avg_se
> 0.14 ± 3% -0.0 0.10 ± 7% perf-profile.children.cycles-pp.update_curr_se
> 0.79 -0.0 0.74 perf-profile.children.cycles-pp.mutex_lock
> 0.34 ± 3% -0.0 0.30 ± 5% perf-profile.children.cycles-pp.rseq_ip_fixup
> 0.15 ± 4% -0.0 0.11 ± 5% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> 0.21 ± 3% -0.0 0.16 ± 4% perf-profile.children.cycles-pp.__switch_to
> 0.17 ± 4% -0.0 0.13 ± 7% perf-profile.children.cycles-pp.place_entity
> 0.22 -0.0 0.18 ± 2% perf-profile.children.cycles-pp.wake_affine
> 0.24 -0.0 0.20 ± 2% perf-profile.children.cycles-pp.check_stack_object
> 0.64 ± 2% -0.0 0.61 ± 3% perf-profile.children.cycles-pp.__virt_addr_valid
> 0.38 ± 2% -0.0 0.34 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler
> 0.18 ± 3% -0.0 0.14 ± 6% perf-profile.children.cycles-pp.update_rq_clock
> 0.66 -0.0 0.62 perf-profile.children.cycles-pp.rw_verify_area
> 0.19 -0.0 0.16 ± 4% perf-profile.children.cycles-pp.task_mm_cid_work
> 0.34 ± 3% -0.0 0.31 ± 2% perf-profile.children.cycles-pp.update_process_times
> 0.12 ± 8% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.detach_tasks
> 0.39 ± 3% -0.0 0.36 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
> 0.21 ± 3% -0.0 0.18 ± 6% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> 0.18 ± 6% -0.0 0.15 ± 4% perf-profile.children.cycles-pp.task_tick_fair
> 0.25 ± 3% -0.0 0.22 ± 4% perf-profile.children.cycles-pp.rseq_get_rseq_cs
> 0.23 ± 5% -0.0 0.20 ± 3% perf-profile.children.cycles-pp.sched_tick
> 0.14 ± 3% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.check_preempt_wakeup_fair
> 0.11 ± 4% -0.0 0.08 ± 7% perf-profile.children.cycles-pp.update_min_vruntime
> 0.06 -0.0 0.03 ± 70% perf-profile.children.cycles-pp.update_curr_dl_se
> 0.14 ± 3% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.put_prev_entity
> 0.13 ± 5% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.task_h_load
> 0.68 -0.0 0.65 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.46 ± 2% -0.0 0.43 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt
> 0.52 -0.0 0.50 perf-profile.children.cycles-pp.scm_recv_unix
> 0.08 ± 4% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.__cgroup_account_cputime
> 0.11 ± 5% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.__switch_to_asm
> 0.46 ± 2% -0.0 0.44 ± 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> 0.08 ± 8% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.activate_task
> 0.08 ± 8% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.detach_task
> 0.11 ± 5% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.os_xsave
> 0.13 ± 5% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.avg_vruntime
> 0.13 ± 4% -0.0 0.11 ± 5% perf-profile.children.cycles-pp.update_entity_lag
> 0.08 ± 4% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.__calc_delta
> 0.09 ± 5% -0.0 0.07 ± 8% perf-profile.children.cycles-pp.vruntime_eligible
> 0.34 ± 2% -0.0 0.32 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> 0.30 -0.0 0.29 ± 2% perf-profile.children.cycles-pp.__build_skb_around
> 0.08 ± 5% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.rseq_update_cpu_node_id
> 0.15 -0.0 0.14 perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
> 0.07 ± 5% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret
> 0.38 ± 2% +0.0 0.40 ± 2% perf-profile.children.cycles-pp.mod_memcg_lruvec_state
> 0.27 ± 2% +0.0 0.30 ± 2% perf-profile.children.cycles-pp.prepare_task_switch
> 0.05 ± 7% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.handle_softirqs
> 0.06 +0.0 0.09 ± 11% perf-profile.children.cycles-pp.finish_wait
> 0.06 ± 7% +0.0 0.11 ± 6% perf-profile.children.cycles-pp.__irq_exit_rcu
> 0.06 ± 8% +0.1 0.11 ± 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist
> 0.01 ±223% +0.1 0.07 ± 10% perf-profile.children.cycles-pp.ktime_get
> 0.54 ± 4% +0.1 0.61 perf-profile.children.cycles-pp.select_task_rq
> 0.00 +0.1 0.07 ± 10% perf-profile.children.cycles-pp.enqueue_dl_entity
> 0.12 ± 4% +0.1 0.19 ± 7% perf-profile.children.cycles-pp.get_any_partial
> 0.10 ± 9% +0.1 0.18 ± 5% perf-profile.children.cycles-pp.available_idle_cpu
> 0.00 +0.1 0.08 ± 9% perf-profile.children.cycles-pp.hrtimer_start_range_ns
> 0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.dl_server_start
> 0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.dl_server_stop
> 0.46 ± 2% +0.1 0.54 ± 2% perf-profile.children.cycles-pp.select_task_rq_fair
> 0.00 +0.1 0.10 ± 10% perf-profile.children.cycles-pp.select_idle_core
> 0.09 ± 7% +0.1 0.20 ± 8% perf-profile.children.cycles-pp.select_idle_cpu
> 0.18 ± 4% +0.1 0.31 ± 6% perf-profile.children.cycles-pp.select_idle_sibling
> 0.00 +0.2 0.18 ± 4% perf-profile.children.cycles-pp.process_one_work
> 0.06 ± 13% +0.2 0.25 ± 9% perf-profile.children.cycles-pp.schedule_idle
> 0.44 ± 2% +0.2 0.64 ± 8% perf-profile.children.cycles-pp.prepare_to_wait
> 0.00 +0.2 0.21 ± 5% perf-profile.children.cycles-pp.kthread
> 0.00 +0.2 0.21 ± 5% perf-profile.children.cycles-pp.worker_thread
> 0.00 +0.2 0.21 ± 4% perf-profile.children.cycles-pp.ret_from_fork
> 0.00 +0.2 0.21 ± 4% perf-profile.children.cycles-pp.ret_from_fork_asm
> 0.11 ± 12% +0.3 0.36 ± 9% perf-profile.children.cycles-pp.sched_ttwu_pending
> 0.31 ± 35% +0.3 0.59 ± 11% perf-profile.children.cycles-pp.__cmd_record
> 0.26 ± 45% +0.3 0.54 ± 13% perf-profile.children.cycles-pp.perf_session__process_events
> 0.26 ± 45% +0.3 0.54 ± 13% perf-profile.children.cycles-pp.reader__read_event
> 0.26 ± 45% +0.3 0.54 ± 13% perf-profile.children.cycles-pp.record__finish_output
> 0.16 ± 11% +0.3 0.45 ± 9% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 0.14 ± 11% +0.3 0.45 ± 9% perf-profile.children.cycles-pp.__sysvec_call_function_single
> 0.14 ± 60% +0.3 0.48 ± 17% perf-profile.children.cycles-pp.ordered_events__queue
> 0.14 ± 61% +0.3 0.48 ± 17% perf-profile.children.cycles-pp.queue_event
> 0.15 ± 59% +0.3 0.49 ± 16% perf-profile.children.cycles-pp.process_simple
> 0.16 ± 12% +0.4 0.54 ± 10% perf-profile.children.cycles-pp.sysvec_call_function_single
> 4.61 ± 3% +0.5 5.13 ± 8% perf-profile.children.cycles-pp.get_partial_node
> 5.57 ± 3% +0.6 6.12 ± 7% perf-profile.children.cycles-pp.___slab_alloc
> 18.44 +0.6 19.00 perf-profile.children.cycles-pp.sock_alloc_send_pskb
> 6.51 ± 3% +0.7 7.26 ± 9% perf-profile.children.cycles-pp.__put_partials
> 0.33 ± 14% +1.0 1.30 ± 11% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
> 0.34 ± 17% +1.1 1.47 ± 11% perf-profile.children.cycles-pp.pv_native_safe_halt
> 0.34 ± 17% +1.1 1.48 ± 11% perf-profile.children.cycles-pp.acpi_safe_halt
> 0.34 ± 17% +1.1 1.48 ± 11% perf-profile.children.cycles-pp.acpi_idle_do_entry
> 0.34 ± 17% +1.1 1.48 ± 11% perf-profile.children.cycles-pp.acpi_idle_enter
> 0.35 ± 17% +1.2 1.53 ± 11% perf-profile.children.cycles-pp.cpuidle_enter_state
> 0.35 ± 17% +1.2 1.54 ± 11% perf-profile.children.cycles-pp.cpuidle_enter
> 0.38 ± 17% +1.3 1.63 ± 11% perf-profile.children.cycles-pp.cpuidle_idle_call
> 0.45 ± 16% +1.5 1.94 ± 11% perf-profile.children.cycles-pp.start_secondary
> 0.46 ± 17% +1.5 1.96 ± 11% perf-profile.children.cycles-pp.do_idle
> 0.46 ± 17% +1.5 1.97 ± 11% perf-profile.children.cycles-pp.common_startup_64
> 0.46 ± 17% +1.5 1.97 ± 11% perf-profile.children.cycles-pp.cpu_startup_entry
> 13.76 ± 2% +1.7 15.44 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 12.09 ± 2% +1.9 14.00 ± 6% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 3.93 -0.2 3.69 perf-profile.self.cycles-pp.clear_bhb_loop
> 3.43 ± 3% -0.1 3.29 perf-profile.self.cycles-pp._copy_to_iter
> 0.50 ± 2% -0.1 0.39 ± 5% perf-profile.self.cycles-pp.switch_mm_irqs_off
> 1.37 ± 4% -0.1 1.27 ± 2% perf-profile.self.cycles-pp._copy_from_iter
> 0.28 ± 2% -0.1 0.18 ± 7% perf-profile.self.cycles-pp.__enqueue_entity
> 1.41 ± 3% -0.1 1.31 ± 2% perf-profile.self.cycles-pp.fdget_pos
> 2.51 -0.1 2.42 perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> 1.35 -0.1 1.28 perf-profile.self.cycles-pp.read
> 2.24 -0.1 2.17 perf-profile.self.cycles-pp.do_syscall_64
> 0.27 ± 3% -0.1 0.20 ± 3% perf-profile.self.cycles-pp.update_cfs_group
> 1.28 -0.1 1.22 perf-profile.self.cycles-pp.sock_write_iter
> 0.84 -0.1 0.77 perf-profile.self.cycles-pp.vfs_read
> 1.42 -0.1 1.36 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 1.20 -0.1 1.14 perf-profile.self.cycles-pp.__alloc_skb
> 0.18 ± 2% -0.1 0.13 ± 5% perf-profile.self.cycles-pp.pick_eevdf
> 1.04 -0.1 0.99 perf-profile.self.cycles-pp.its_return_thunk
> 0.29 ± 2% -0.1 0.24 ± 4% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> 0.28 ± 5% -0.1 0.23 ± 6% perf-profile.self.cycles-pp.update_curr
> 0.13 ± 5% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.switch_fpu_return
> 0.20 ± 3% -0.0 0.15 ± 6% perf-profile.self.cycles-pp.__dequeue_entity
> 1.00 -0.0 0.95 perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
> 0.33 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.update_load_avg
> 0.88 -0.0 0.83 ± 2% perf-profile.self.cycles-pp.vfs_write
> 0.91 -0.0 0.86 perf-profile.self.cycles-pp.sock_read_iter
> 0.13 ± 3% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.update_curr_se
> 0.25 ± 2% -0.0 0.21 ± 4% perf-profile.self.cycles-pp.__update_load_avg_se
> 1.22 -0.0 1.18 perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
> 0.68 -0.0 0.63 perf-profile.self.cycles-pp.__check_object_size
> 0.78 ± 2% -0.0 0.74 perf-profile.self.cycles-pp.obj_cgroup_charge_account
> 0.20 ± 3% -0.0 0.16 ± 4% perf-profile.self.cycles-pp.__switch_to
> 0.15 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.try_to_wake_up
> 0.90 -0.0 0.86 perf-profile.self.cycles-pp.entry_SYSCALL_64
> 0.76 ± 2% -0.0 0.73 perf-profile.self.cycles-pp.__check_heap_object
> 0.92 -0.0 0.89 ± 2% perf-profile.self.cycles-pp.__account_obj_stock
> 0.19 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.check_stack_object
> 0.40 ± 3% -0.0 0.37 perf-profile.self.cycles-pp.__schedule
> 0.60 ± 2% -0.0 0.56 ± 3% perf-profile.self.cycles-pp.__virt_addr_valid
> 0.71 -0.0 0.68 perf-profile.self.cycles-pp.__skb_datagram_iter
> 0.18 ± 4% -0.0 0.14 ± 5% perf-profile.self.cycles-pp.task_mm_cid_work
> 0.68 -0.0 0.65 perf-profile.self.cycles-pp.refill_obj_stock
> 0.34 -0.0 0.31 ± 2% perf-profile.self.cycles-pp.unix_stream_recvmsg
> 0.06 ± 7% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.enqueue_task
> 0.11 -0.0 0.08 perf-profile.self.cycles-pp.pick_task_fair
> 0.15 ± 2% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.enqueue_task_fair
> 0.20 ± 3% -0.0 0.17 ± 7% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> 0.41 -0.0 0.38 perf-profile.self.cycles-pp.sock_recvmsg
> 0.10 -0.0 0.07 ± 6% perf-profile.self.cycles-pp.update_min_vruntime
> 0.13 ± 3% -0.0 0.10 perf-profile.self.cycles-pp.task_h_load
> 0.23 ± 3% -0.0 0.20 ± 6% perf-profile.self.cycles-pp.__get_user_8
> 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.exit_to_user_mode_loop
> 0.39 ± 2% -0.0 0.37 ± 2% perf-profile.self.cycles-pp.rw_verify_area
> 0.11 ± 3% -0.0 0.09 ± 7% perf-profile.self.cycles-pp.os_xsave
> 0.12 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.pick_next_task_fair
> 0.35 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
> 0.46 -0.0 0.44 perf-profile.self.cycles-pp.mutex_lock
> 0.11 ± 4% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__switch_to_asm
> 0.10 ± 3% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.enqueue_entity
> 0.08 ± 7% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.place_entity
> 0.30 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.alloc_skb_with_frags
> 0.50 -0.0 0.48 perf-profile.self.cycles-pp.kfree
> 0.30 -0.0 0.28 perf-profile.self.cycles-pp.ksys_write
> 0.12 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.dequeue_entity
> 0.11 ± 4% -0.0 0.09 perf-profile.self.cycles-pp.prepare_to_wait
> 0.19 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.update_rq_clock_task
> 0.27 -0.0 0.25 ± 2% perf-profile.self.cycles-pp.__build_skb_around
> 0.08 ± 6% -0.0 0.06 ± 9% perf-profile.self.cycles-pp.vruntime_eligible
> 0.12 ± 4% -0.0 0.10 perf-profile.self.cycles-pp.__wake_up_common
> 0.27 -0.0 0.26 perf-profile.self.cycles-pp.kmalloc_reserve
> 0.48 -0.0 0.46 perf-profile.self.cycles-pp.unix_write_space
> 0.19 -0.0 0.18 ± 2% perf-profile.self.cycles-pp.skb_copy_datagram_iter
> 0.07 -0.0 0.06 ± 6% perf-profile.self.cycles-pp.__calc_delta
> 0.06 ± 6% -0.0 0.05 perf-profile.self.cycles-pp.__put_user_8
> 0.28 -0.0 0.27 perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> 0.11 -0.0 0.10 perf-profile.self.cycles-pp.wait_for_unix_gc
> 0.05 +0.0 0.06 perf-profile.self.cycles-pp.__x64_sys_write
> 0.07 ± 5% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret
> 0.19 ± 7% +0.0 0.22 ± 4% perf-profile.self.cycles-pp.prepare_task_switch
> 0.10 ± 6% +0.1 0.17 ± 5% perf-profile.self.cycles-pp.available_idle_cpu
> 0.14 ± 61% +0.3 0.48 ± 17% perf-profile.self.cycles-pp.queue_event
> 0.19 ± 18% +0.7 0.85 ± 12% perf-profile.self.cycles-pp.pv_native_safe_halt
> 12.07 ± 2% +1.9 13.97 ± 6% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
>
>
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_3/aim9/300s
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 9156 +20.2% 11004 vmstat.system.cs
> 8715946 ± 6% -14.0% 7494314 ± 13% meminfo.DirectMap2M
> 10992 +85.4% 20381 meminfo.PageTables
> 318.58 -1.7% 313.01 aim9.shell_rtns_3.ops_per_sec
> 27145198 -2.1% 26576524 aim9.time.minor_page_faults
> 1049306 -1.8% 1030938 aim9.time.voluntary_context_switches
> 6173 ± 20% +74.0% 10742 ± 4% numa-meminfo.node0.PageTables
> 5702 ± 31% +55.1% 8844 ± 19% numa-meminfo.node0.Shmem
> 4803 ± 25% +100.6% 9636 ± 6% numa-meminfo.node1.PageTables
> 1538 ± 20% +73.7% 2673 ± 5% numa-vmstat.node0.nr_page_table_pages
> 1425 ± 31% +55.1% 2210 ± 19% numa-vmstat.node0.nr_shmem
> 1194 ± 25% +101.2% 2402 ± 6% numa-vmstat.node1.nr_page_table_pages
> 30413 +19.3% 36291 sched_debug.cpu.nr_switches.avg
> 84768 ± 6% +20.3% 101955 ± 4% sched_debug.cpu.nr_switches.max
> 25510 ± 13% +23.0% 31383 ± 3% sched_debug.cpu.nr_switches.stddev
> 2727 +85.8% 5066 proc-vmstat.nr_page_table_pages
> 19325131 -1.6% 19014535 proc-vmstat.numa_hit
> 19274656 -1.6% 18964467 proc-vmstat.numa_local
> 19877211 -1.6% 19563123 proc-vmstat.pgalloc_normal
> 28020416 -2.0% 27451741 proc-vmstat.pgfault
> 19829318 -1.6% 19508263 proc-vmstat.pgfree
> 2679 -1.6% 2636 proc-vmstat.unevictable_pgs_culled
> 0.03 ± 10% +30.9% 0.04 ± 2% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.02 ± 5% +26.2% 0.02 ± 3% perf-sched.total_sch_delay.average.ms
> 27.03 ± 2% -12.4% 23.66 perf-sched.total_wait_and_delay.average.ms
> 23171 +18.2% 27385 perf-sched.total_wait_and_delay.count.ms
> 27.01 ± 2% -12.5% 23.64 perf-sched.total_wait_time.average.ms
> 110.73 ± 4% -71.1% 31.98 perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 1662 ± 2% +278.6% 6294 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 110.70 ± 4% -71.1% 31.94 perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 5.94 +0.1 6.00 perf-stat.i.branch-miss-rate%
> 9184 +20.2% 11041 perf-stat.i.context-switches
> 1.96 +1.6% 1.99 perf-stat.i.cpi
> 71.73 ± 4% +66.1% 119.11 ± 5% perf-stat.i.cpu-migrations
> 0.53 -1.4% 0.52 perf-stat.i.ipc
> 3.79 -2.0% 3.71 perf-stat.i.metric.K/sec
> 90919 -2.0% 89065 perf-stat.i.minor-faults
> 90919 -2.0% 89065 perf-stat.i.page-faults
> 6.00 +0.1 6.06 perf-stat.overall.branch-miss-rate%
> 1.79 +1.2% 1.81 perf-stat.overall.cpi
> 0.56 -1.2% 0.55 perf-stat.overall.ipc
> 9154 +20.2% 11004 perf-stat.ps.context-switches
> 71.49 ± 4% +66.1% 118.72 ± 5% perf-stat.ps.cpu-migrations
> 90616 -2.0% 88768 perf-stat.ps.minor-faults
> 90616 -2.0% 88768 perf-stat.ps.page-faults
> 8.89 -0.2 8.68 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> 8.88 -0.2 8.66 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.47 ± 2% -0.2 3.29 perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.47 ± 2% -0.2 3.29 perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.51 ± 3% -0.2 3.33 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.47 ± 2% -0.2 3.29 perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
> 1.66 ± 2% -0.1 1.57 ± 4% perf-profile.calltrace.cycles-pp.setlocale
> 0.27 ±100% +0.3 0.61 ± 5% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> 0.18 ±141% +0.4 0.60 ± 5% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
> 62.46 +0.6 63.01 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> 0.09 ±223% +0.6 0.65 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> 0.09 ±223% +0.6 0.65 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> 49.01 +0.6 49.60 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 67.47 +0.7 68.17 perf-profile.calltrace.cycles-pp.common_startup_64
> 20.25 -0.7 19.58 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 20.21 -0.7 19.54 perf-profile.children.cycles-pp.do_syscall_64
> 6.54 -0.2 6.33 perf-profile.children.cycles-pp.asm_exc_page_fault
> 6.10 -0.2 5.90 perf-profile.children.cycles-pp.do_user_addr_fault
> 3.77 ± 3% -0.2 3.60 perf-profile.children.cycles-pp.x64_sys_call
> 3.62 ± 3% -0.2 3.46 perf-profile.children.cycles-pp.do_exit
> 2.63 ± 3% -0.2 2.48 ± 2% perf-profile.children.cycles-pp.__mmput
> 2.16 ± 2% -0.1 2.06 ± 3% perf-profile.children.cycles-pp.ksys_mmap_pgoff
> 1.66 ± 2% -0.1 1.57 ± 4% perf-profile.children.cycles-pp.setlocale
> 2.69 ± 2% -0.1 2.61 perf-profile.children.cycles-pp.do_pte_missing
> 0.77 ± 5% -0.1 0.70 ± 6% perf-profile.children.cycles-pp.tlb_finish_mmu
> 0.92 ± 2% -0.0 0.87 ± 4% perf-profile.children.cycles-pp.__irqentry_text_end
> 0.08 ± 10% -0.0 0.04 ± 71% perf-profile.children.cycles-pp.tick_nohz_tick_stopped
> 0.10 ± 11% -0.0 0.07 ± 21% perf-profile.children.cycles-pp.__percpu_counter_init_many
> 0.14 ± 9% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.strnlen
> 0.12 ± 11% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.mas_prev_slot
> 0.11 ± 12% +0.0 0.14 ± 9% perf-profile.children.cycles-pp.update_curr
> 0.19 ± 8% +0.0 0.22 ± 6% perf-profile.children.cycles-pp.enqueue_entity
> 0.10 ± 11% +0.0 0.13 ± 11% perf-profile.children.cycles-pp.__perf_event_task_sched_out
> 0.05 ± 46% +0.0 0.08 ± 13% perf-profile.children.cycles-pp.select_task_rq
> 0.13 ± 14% +0.0 0.17 ± 8% perf-profile.children.cycles-pp.perf_pmu_sched_task
> 0.20 ± 10% +0.0 0.24 ± 2% perf-profile.children.cycles-pp.try_to_wake_up
> 0.28 ± 9% +0.1 0.34 ± 9% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 0.04 ± 44% +0.1 0.11 ± 13% perf-profile.children.cycles-pp.__queue_work
> 0.30 ± 11% +0.1 0.38 ± 8% perf-profile.children.cycles-pp.ttwu_do_activate
> 0.30 ± 4% +0.1 0.38 ± 8% perf-profile.children.cycles-pp.__pick_next_task
> 0.22 ± 7% +0.1 0.29 ± 9% perf-profile.children.cycles-pp.try_to_block_task
> 0.02 ±141% +0.1 0.09 ± 10% perf-profile.children.cycles-pp.kick_pool
> 0.02 ± 99% +0.1 0.10 ± 19% perf-profile.children.cycles-pp.queue_work_on
> 0.25 ± 4% +0.1 0.35 ± 7% perf-profile.children.cycles-pp.sched_ttwu_pending
> 0.33 ± 6% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.flush_smp_call_function_queue
> 0.29 ± 4% +0.1 0.39 ± 6% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 0.51 ± 6% +0.1 0.63 ± 6% perf-profile.children.cycles-pp.schedule_idle
> 0.46 ± 7% +0.1 0.58 ± 5% perf-profile.children.cycles-pp.schedule
> 0.88 ± 6% +0.2 1.04 ± 5% perf-profile.children.cycles-pp.ret_from_fork_asm
> 0.18 ± 6% +0.2 0.34 ± 8% perf-profile.children.cycles-pp.worker_thread
> 0.88 ± 6% +0.2 1.04 ± 5% perf-profile.children.cycles-pp.ret_from_fork
> 0.38 ± 8% +0.2 0.56 ± 10% perf-profile.children.cycles-pp.kthread
> 1.08 ± 3% +0.2 1.32 ± 2% perf-profile.children.cycles-pp.__schedule
> 66.15 +0.5 66.64 perf-profile.children.cycles-pp.cpuidle_idle_call
> 62.89 +0.6 63.47 perf-profile.children.cycles-pp.cpuidle_enter_state
> 63.00 +0.6 63.59 perf-profile.children.cycles-pp.cpuidle_enter
> 49.10 +0.6 49.69 perf-profile.children.cycles-pp.intel_idle
> 67.47 +0.7 68.17 perf-profile.children.cycles-pp.do_idle
> 67.47 +0.7 68.17 perf-profile.children.cycles-pp.common_startup_64
> 67.47 +0.7 68.17 perf-profile.children.cycles-pp.cpu_startup_entry
> 0.91 ± 2% -0.0 0.86 ± 4% perf-profile.self.cycles-pp.__irqentry_text_end
> 0.14 ± 11% +0.1 0.22 ± 11% perf-profile.self.cycles-pp.timerqueue_del
> 49.08 +0.6 49.68 perf-profile.self.cycles-pp.intel_idle
>
>
>
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/800%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 3745213 ± 39% +108.1% 7794858 ± 12% cpuidle..usage
> 186670 +17.3% 218939 ± 2% meminfo.Percpu
> 5.00 +306.7% 20.33 ± 66% mpstat.max_utilization.seconds
> 9.35 ± 76% -4.5 4.80 ±141% perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
> 8.90 ± 75% -4.3 4.57 ±141% perf-profile.calltrace.cycles-pp.perf_session__deliver_event.__ordered_events__flush.perf_session__process_events.record__finish_output.__cmd_record
> 3283 ± 7% -16.2% 2751 ± 5% sched_debug.cfs_rq:/.avg_vruntime.avg
> 3283 ± 7% -16.2% 2751 ± 5% sched_debug.cfs_rq:/.min_vruntime.avg
> 1522512 ± 6% +80.0% 2739797 ± 4% vmstat.system.cs
> 308726 ± 8% +60.5% 495472 ± 5% vmstat.system.in
> 467562 +3.7% 485068 ± 2% proc-vmstat.nr_kernel_stack
> 266084 +3.8% 276310 proc-vmstat.nr_slab_unreclaimable
> 1.375e+08 -2.0% 1.347e+08 proc-vmstat.numa_hit
> 1.373e+08 -2.0% 1.346e+08 proc-vmstat.numa_local
> 217472 ± 3% -28.1% 156410 proc-vmstat.numa_other
> 1.382e+08 -2.0% 1.354e+08 proc-vmstat.pgalloc_normal
> 1.375e+08 -2.0% 1.347e+08 proc-vmstat.pgfree
> 1514102 -6.2% 1420287 hackbench.throughput
> 1480357 -6.7% 1380775 hackbench.throughput_avg
> 1514102 -6.2% 1420287 hackbench.throughput_best
> 1436918 -7.9% 1323413 hackbench.throughput_worst
> 14551264 ± 13% +138.1% 34644707 ± 3% hackbench.time.involuntary_context_switches
> 9919 -1.6% 9762 hackbench.time.percent_of_cpu_this_job_got
> 4239 +4.5% 4428 hackbench.time.system_time
> 56365933 ± 6% +65.3% 93172066 ± 4% hackbench.time.voluntary_context_switches
> 65085618 +26.7% 82440571 ± 2% perf-stat.i.branch-misses
> 31.25 -1.6 29.66 perf-stat.i.cache-miss-rate%
> 2.469e+08 +8.9% 2.689e+08 perf-stat.i.cache-misses
> 7.519e+08 +15.9% 8.712e+08 perf-stat.i.cache-references
> 1353061 ± 7% +87.5% 2537450 ± 5% perf-stat.i.context-switches
> 2.269e+11 +3.5% 2.348e+11 perf-stat.i.cpu-cycles
> 134588 ± 13% +81.9% 244825 ± 8% perf-stat.i.cpu-migrations
> 13.60 ± 5% +70.5% 23.20 ± 5% perf-stat.i.metric.K/sec
> 1.26 +7.6% 1.35 perf-stat.overall.MPKI
> 0.11 ± 2% +0.0 0.14 ± 2% perf-stat.overall.branch-miss-rate%
> 34.12 -2.1 31.97 perf-stat.overall.cache-miss-rate%
> 1.17 +1.8% 1.19 perf-stat.overall.cpi
> 931.96 -5.3% 882.44 perf-stat.overall.cycles-between-cache-misses
> 0.85 -1.8% 0.84 perf-stat.overall.ipc
> 5.372e+10 -1.2% 5.31e+10 perf-stat.ps.branch-instructions
> 57783128 ± 2% +32.9% 76802898 ± 2% perf-stat.ps.branch-misses
> 2.696e+08 +7.2% 2.89e+08 perf-stat.ps.cache-misses
> 7.902e+08 +14.4% 9.039e+08 perf-stat.ps.cache-references
> 1288664 ± 7% +94.6% 2508227 ± 5% perf-stat.ps.context-switches
> 2.512e+11 +1.5% 2.55e+11 perf-stat.ps.cpu-cycles
> 122960 ± 14% +82.3% 224127 ± 9% perf-stat.ps.cpu-migrations
> 1.108e+13 +5.7% 1.171e+13 perf-stat.total.instructions
> 0.94 ±223% +5929.9% 56.62 ±121% perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> 26.44 ± 81% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 100.25 ±141% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 9.01 ± 43% +1823.1% 173.24 ±106% perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> 49.43 ± 14% +73.8% 85.93 ± 19% perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> 130.63 ± 17% +135.8% 308.04 ± 28% perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 18.09 ± 30% +130.4% 41.70 ± 26% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 196.51 ± 21% +102.9% 398.77 ± 15% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 34.17 ± 39% +191.1% 99.46 ± 20% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> 154.91 ±163% +1649.9% 2710 ± 91% perf-sched.sch_delay.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 0.94 ±223% +1.9e+05% 1743 ±120% perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> 3.19 ±124% -91.9% 0.26 ±150% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 646.26 ± 94% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 282.66 ±139% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 63.17 ± 52% +2854.4% 1866 ±121% perf-sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
> 1507 ± 35% +249.4% 5266 ± 47% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 3915 ± 67% +98.7% 7779 ± 16% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 53.31 ± 18% +79.9% 95.90 ± 23% perf-sched.total_sch_delay.average.ms
> 149.37 ± 18% +80.0% 268.92 ± 22% perf-sched.total_wait_and_delay.average.ms
> 96.07 ± 18% +80.1% 173.01 ± 21% perf-sched.total_wait_time.average.ms
> 244.53 ± 47% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> 529.64 ± 20% +38.5% 733.60 ± 20% perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> 136.52 ± 15% +73.7% 237.07 ± 18% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> 373.41 ± 16% +136.3% 882.34 ± 27% perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 51.96 ± 29% +127.5% 118.22 ± 25% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 554.86 ± 23% +103.0% 1126 ± 14% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 298.52 ±136% +436.9% 1602 ± 27% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> 556.66 ± 37% -97.1% 16.09 ± 47% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 707.67 ± 31% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> 1358 ± 28% +4707.9% 65291 ± 27% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 12184 ± 5% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> 1393 ±134% +379.9% 6685 ± 15% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> 6927 ± 6% +119.8% 15224 ± 19% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 341.61 ± 21% +39.1% 475.15 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> 51.39 ± 99% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 121.14 ±122% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 87.09 ± 15% +73.6% 151.14 ± 18% perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> 242.78 ± 16% +136.6% 574.31 ± 27% perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 33.86 ± 29% +126.0% 76.52 ± 24% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 250.32 ±109% -89.4% 26.44 ±111% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
> 358.36 ± 25% +103.1% 727.72 ± 14% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 77.40 ± 47% +102.5% 156.70 ± 28% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> 17.91 ± 42% -75.3% 4.42 ± 76% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
> 266.70 ±137% +431.6% 1417 ± 36% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> 536.93 ± 40% -97.4% 13.81 ± 50% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 180.38 ±135% +2208.8% 4164 ± 71% perf-sched.wait_time.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 1028 ±129% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 312.94 ±123% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 418.66 ±132% -93.7% 26.44 ±111% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_common_interrupt.[unknown].[unknown]
> 1388 ±133% +379.7% 6660 ± 15% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> 2022 ± 25% +164.9% 5358 ± 46% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>
>
>
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_1/aim9/300s
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 11004 +86.2% 20490 meminfo.PageTables
> 121.33 ± 12% +18.8% 144.17 ± 5% perf-c2c.DRAM.remote
> 9155 +20.0% 10990 vmstat.system.cs
> 5129 ± 20% +107.2% 10631 ± 3% numa-meminfo.node0.PageTables
> 5864 ± 17% +67.3% 9811 ± 3% numa-meminfo.node1.PageTables
> 1278 ± 20% +107.9% 2658 ± 3% numa-vmstat.node0.nr_page_table_pages
> 1469 ± 17% +66.4% 2446 ± 3% numa-vmstat.node1.nr_page_table_pages
> 319.43 -2.1% 312.66 aim9.shell_rtns_1.ops_per_sec
> 27217846 -2.5% 26546962 aim9.time.minor_page_faults
> 1051878 -2.1% 1029547 aim9.time.voluntary_context_switches
> 30502 +18.6% 36187 sched_debug.cpu.nr_switches.avg
> 90327 ± 12% +22.7% 110866 ± 4% sched_debug.cpu.nr_switches.max
> 26316 ± 16% +25.5% 33021 ± 5% sched_debug.cpu.nr_switches.stddev
> 0.03 ± 7% +70.7% 0.05 ± 53% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.02 ± 3% +38.9% 0.02 ± 28% perf-sched.total_sch_delay.average.ms
> 27.43 ± 2% -14.5% 23.45 perf-sched.total_wait_and_delay.average.ms
> 23174 +18.0% 27340 perf-sched.total_wait_and_delay.count.ms
> 27.41 ± 2% -14.6% 23.42 perf-sched.total_wait_time.average.ms
> 115.38 ± 3% -71.9% 32.37 ± 2% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 1656 ± 3% +280.2% 6299 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 115.35 ± 3% -72.0% 32.31 ± 2% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 2737 +86.1% 5095 proc-vmstat.nr_page_table_pages
> 30460 +3.2% 31439 proc-vmstat.nr_shmem
> 27933 +1.8% 28432 proc-vmstat.nr_slab_unreclaimable
> 19466749 -2.5% 18980434 proc-vmstat.numa_hit
> 19414531 -2.5% 18927584 proc-vmstat.numa_local
> 20028107 -2.5% 19528806 proc-vmstat.pgalloc_normal
> 28087705 -2.4% 27417155 proc-vmstat.pgfault
> 19980173 -2.5% 19474402 proc-vmstat.pgfree
> 420074 -5.7% 396239 ± 8% proc-vmstat.pgreuse
> 2685 -1.9% 2633 proc-vmstat.unevictable_pgs_culled
> 5.48e+08 -1.2% 5.412e+08 perf-stat.i.branch-instructions
> 5.92 +0.1 6.00 perf-stat.i.branch-miss-rate%
> 9195 +19.9% 11021 perf-stat.i.context-switches
> 1.96 +1.7% 1.99 perf-stat.i.cpi
> 70.13 +73.4% 121.59 ± 8% perf-stat.i.cpu-migrations
> 2.725e+09 -1.3% 2.69e+09 perf-stat.i.instructions
> 0.53 -1.6% 0.52 perf-stat.i.ipc
> 3.80 -2.4% 3.71 perf-stat.i.metric.K/sec
> 91139 -2.4% 88949 perf-stat.i.minor-faults
> 91139 -2.4% 88949 perf-stat.i.page-faults
> 5.00 ± 44% +1.1 6.07 perf-stat.overall.branch-miss-rate%
> 1.49 ± 44% +21.9% 1.82 perf-stat.overall.cpi
> 7643 ± 44% +43.7% 10984 perf-stat.ps.context-switches
> 58.17 ± 44% +108.4% 121.21 ± 8% perf-stat.ps.cpu-migrations
> 2.06 ± 2% -0.2 1.87 ± 12% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.98 ± 7% -0.2 0.83 ± 12% perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 1.69 ± 2% -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.setlocale
> 0.58 ± 5% -0.1 0.44 ± 44% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__open64_nocancel.setlocale
> 0.72 ± 6% -0.1 0.60 ± 8% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interrupt
> 3.21 ± 2% -0.1 3.11 perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64
> 0.70 ± 4% -0.1 0.62 ± 6% perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
> 1.52 ± 2% -0.1 1.44 ± 3% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.34 ± 3% -0.1 1.28 ± 3% perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
> 0.89 ± 3% -0.1 0.84 perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
> 0.17 ±141% +0.4 0.61 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> 0.17 ±141% +0.4 0.61 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> 65.10 +0.5 65.56 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> 66.40 +0.6 67.00 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> 66.46 +0.6 67.08 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
> 66.46 +0.6 67.08 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
> 67.63 +0.7 68.30 perf-profile.calltrace.cycles-pp.common_startup_64
> 20.14 -0.6 19.51 perf-profile.children.cycles-pp.do_syscall_64
> 20.20 -0.6 19.57 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 1.13 ± 5% -0.2 0.98 ± 9% perf-profile.children.cycles-pp.rcu_core
> 1.69 ± 2% -0.1 1.54 ± 2% perf-profile.children.cycles-pp.setlocale
> 0.84 ± 4% -0.1 0.71 ± 5% perf-profile.children.cycles-pp.rcu_do_batch
> 2.16 ± 2% -0.1 2.04 ± 3% perf-profile.children.cycles-pp.ksys_mmap_pgoff
> 1.15 ± 4% -0.1 1.04 ± 5% perf-profile.children.cycles-pp.__open64_nocancel
> 3.22 ± 2% -0.1 3.12 perf-profile.children.cycles-pp.exec_binprm
> 2.09 ± 2% -0.1 2.00 ± 2% perf-profile.children.cycles-pp.kernel_clone
> 0.88 ± 4% -0.1 0.79 ± 4% perf-profile.children.cycles-pp.mas_store_prealloc
> 2.19 -0.1 2.10 ± 3% perf-profile.children.cycles-pp.__x64_sys_openat
> 0.70 ± 4% -0.1 0.62 ± 6% perf-profile.children.cycles-pp.dup_mm
> 1.36 ± 3% -0.1 1.30 perf-profile.children.cycles-pp._Fork
> 0.56 ± 4% -0.1 0.50 ± 8% perf-profile.children.cycles-pp.dup_mmap
> 0.09 ± 16% -0.1 0.03 ± 70% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
> 0.31 ± 8% -0.1 0.25 ± 10% perf-profile.children.cycles-pp.strncpy_from_user
> 0.94 ± 3% -0.1 0.88 ± 2% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
> 0.41 ± 5% -0.0 0.36 ± 5% perf-profile.children.cycles-pp.irqtime_account_irq
> 0.18 ± 12% -0.0 0.14 ± 7% perf-profile.children.cycles-pp.tlb_remove_table_rcu
> 0.20 ± 7% -0.0 0.17 ± 9% perf-profile.children.cycles-pp.perf_event_task_tick
> 0.08 ± 14% -0.0 0.05 ± 49% perf-profile.children.cycles-pp.mas_update_gap
> 0.24 ± 5% -0.0 0.21 ± 5% perf-profile.children.cycles-pp.filemap_read
> 0.19 ± 7% -0.0 0.16 ± 8% perf-profile.children.cycles-pp.__call_rcu_common
> 0.22 ± 2% -0.0 0.19 ± 5% perf-profile.children.cycles-pp.mas_next_slot
> 0.09 ± 5% +0.0 0.12 ± 7% perf-profile.children.cycles-pp.__perf_event_task_sched_out
> 0.05 ± 47% +0.0 0.08 ± 10% perf-profile.children.cycles-pp.lru_gen_del_folio
> 0.10 ± 14% +0.0 0.12 ± 18% perf-profile.children.cycles-pp.__folio_mod_stat
> 0.12 ± 12% +0.0 0.16 ± 3% perf-profile.children.cycles-pp.perf_pmu_sched_task
> 0.20 ± 10% +0.0 0.24 ± 4% perf-profile.children.cycles-pp.prepare_task_switch
> 0.06 ± 47% +0.0 0.10 ± 11% perf-profile.children.cycles-pp.__queue_work
> 0.56 ± 5% +0.1 0.61 ± 4% perf-profile.children.cycles-pp.sched_balance_domains
> 0.04 ± 72% +0.1 0.09 ± 11% perf-profile.children.cycles-pp.kick_pool
> 0.04 ± 72% +0.1 0.09 ± 14% perf-profile.children.cycles-pp.queue_work_on
> 0.33 ± 6% +0.1 0.38 ± 7% perf-profile.children.cycles-pp.dequeue_entities
> 0.35 ± 6% +0.1 0.40 ± 7% perf-profile.children.cycles-pp.dequeue_task_fair
> 0.52 ± 6% +0.1 0.58 ± 5% perf-profile.children.cycles-pp.enqueue_task_fair
> 0.54 ± 7% +0.1 0.60 ± 5% perf-profile.children.cycles-pp.enqueue_task
> 0.28 ± 9% +0.1 0.35 ± 5% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 0.21 ± 4% +0.1 0.28 ± 12% perf-profile.children.cycles-pp.try_to_block_task
> 0.34 ± 4% +0.1 0.42 ± 3% perf-profile.children.cycles-pp.ttwu_do_activate
> 0.36 ± 3% +0.1 0.46 ± 6% perf-profile.children.cycles-pp.flush_smp_call_function_queue
> 0.28 ± 4% +0.1 0.38 ± 5% perf-profile.children.cycles-pp.sched_ttwu_pending
> 0.33 ± 2% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 0.46 ± 7% +0.1 0.56 ± 6% perf-profile.children.cycles-pp.schedule
> 0.48 ± 8% +0.1 0.61 ± 8% perf-profile.children.cycles-pp.timerqueue_del
> 0.18 ± 13% +0.1 0.32 ± 11% perf-profile.children.cycles-pp.worker_thread
> 0.38 ± 9% +0.2 0.52 ± 10% perf-profile.children.cycles-pp.kthread
> 1.10 ± 5% +0.2 1.25 ± 2% perf-profile.children.cycles-pp.__schedule
> 0.85 ± 8% +0.2 1.01 ± 7% perf-profile.children.cycles-pp.ret_from_fork
> 0.85 ± 8% +0.2 1.02 ± 7% perf-profile.children.cycles-pp.ret_from_fork_asm
> 63.15 +0.5 63.64 perf-profile.children.cycles-pp.cpuidle_enter
> 66.26 +0.5 66.77 perf-profile.children.cycles-pp.cpuidle_idle_call
> 66.46 +0.6 67.08 perf-profile.children.cycles-pp.start_secondary
> 67.63 +0.7 68.30 perf-profile.children.cycles-pp.common_startup_64
> 67.63 +0.7 68.30 perf-profile.children.cycles-pp.cpu_startup_entry
> 67.63 +0.7 68.30 perf-profile.children.cycles-pp.do_idle
> 1.20 ± 3% -0.1 1.12 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.09 ± 16% -0.1 0.03 ± 70% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
> 0.25 ± 6% -0.0 0.21 ± 12% perf-profile.self.cycles-pp.irqtime_account_irq
> 0.02 ±141% +0.0 0.06 ± 13% perf-profile.self.cycles-pp.prepend_path
> 0.13 ± 10% +0.1 0.24 ± 11% perf-profile.self.cycles-pp.timerqueue_del
>
>
>
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 3.924e+08 ± 3% +55.1% 6.086e+08 ± 2% cpuidle..time
> 7504886 ± 11% +184.4% 21340245 ± 6% cpuidle..usage
> 13350305 -3.8% 12848570 vmstat.system.cs
> 1849619 +5.1% 1943754 vmstat.system.in
> 3.56 ± 5% +2.6 6.16 ± 7% mpstat.cpu.all.idle%
> 0.69 +0.2 0.90 ± 3% mpstat.cpu.all.irq%
> 0.03 ± 3% +0.0 0.04 ± 3% mpstat.cpu.all.soft%
> 18666 ± 9% +41.2% 26352 ± 6% perf-c2c.DRAM.remote
> 197041 -39.6% 118945 ± 5% perf-c2c.HITM.local
> 3178 ± 12% +37.2% 4361 ± 11% perf-c2c.HITM.remote
> 200219 -38.4% 123307 ± 5% perf-c2c.HITM.total
> 2842579 ± 11% +60.1% 4550025 ± 12% meminfo.Active
> 2842579 ± 11% +60.1% 4550025 ± 12% meminfo.Active(anon)
> 5535242 ± 5% +30.9% 7248257 ± 7% meminfo.Cached
> 3846718 ± 8% +44.0% 5539484 ± 9% meminfo.Committed_AS
> 9684149 ± 3% +20.5% 11666616 ± 4% meminfo.Memused
> 136127 ± 3% +14.2% 155524 meminfo.PageTables
> 62144 +22.8% 76336 meminfo.Percpu
> 2001586 ± 16% +85.6% 3714611 ± 14% meminfo.Shmem
> 9759598 ± 3% +20.0% 11714619 ± 4% meminfo.max_used_kB
> 710625 ± 11% +59.3% 1131770 ± 11% proc-vmstat.nr_active_anon
> 1383631 ± 5% +30.6% 1806419 ± 7% proc-vmstat.nr_file_pages
> 34220 ± 3% +13.9% 38987 proc-vmstat.nr_page_table_pages
> 500216 ± 16% +84.5% 923007 ± 14% proc-vmstat.nr_shmem
> 710625 ± 11% +59.3% 1131770 ± 11% proc-vmstat.nr_zone_active_anon
> 92308030 +8.7% 1.004e+08 proc-vmstat.numa_hit
> 92171407 +8.7% 1.002e+08 proc-vmstat.numa_local
> 133616 +2.7% 137265 proc-vmstat.numa_other
> 92394313 +8.7% 1.004e+08 proc-vmstat.pgalloc_normal
> 91035691 +7.8% 98094626 proc-vmstat.pgfree
> 867815 +11.8% 970369 hackbench.throughput
> 830278 +11.6% 926834 hackbench.throughput_avg
> 867815 +11.8% 970369 hackbench.throughput_best
> 760822 +14.2% 869145 hackbench.throughput_worst
> 72.87 -10.3% 65.36 hackbench.time.elapsed_time
> 72.87 -10.3% 65.36 hackbench.time.elapsed_time.max
> 2.493e+08 -17.7% 2.052e+08 hackbench.time.involuntary_context_switches
> 12357 -3.9% 11879 hackbench.time.percent_of_cpu_this_job_got
> 8029 -14.8% 6842 hackbench.time.system_time
> 976.58 -5.5% 923.21 hackbench.time.user_time
> 7.54e+08 -14.4% 6.451e+08 hackbench.time.voluntary_context_switches
> 5.598e+10 +6.6% 5.965e+10 perf-stat.i.branch-instructions
> 0.40 -0.0 0.38 perf-stat.i.branch-miss-rate%
> 8.36 ± 2% +4.6 12.98 ± 3% perf-stat.i.cache-miss-rate%
> 2.11e+09 -33.8% 1.396e+09 perf-stat.i.cache-references
> 13687653 -3.4% 13225338 perf-stat.i.context-switches
> 1.36 -7.9% 1.25 perf-stat.i.cpi
> 3.219e+11 -2.2% 3.147e+11 perf-stat.i.cpu-cycles
> 1915 ± 2% -6.6% 1788 ± 3% perf-stat.i.cycles-between-cache-misses
> 2.371e+11 +6.0% 2.512e+11 perf-stat.i.instructions
> 0.74 +8.5% 0.80 perf-stat.i.ipc
> 1.15 ± 14% -28.3% 0.82 ± 23% perf-stat.i.major-faults
> 115.09 -3.2% 111.40 perf-stat.i.metric.K/sec
> 0.37 -0.0 0.35 perf-stat.overall.branch-miss-rate%
> 8.15 ± 3% +4.6 12.74 ± 3% perf-stat.overall.cache-miss-rate%
> 1.36 -7.7% 1.25 perf-stat.overall.cpi
> 1875 ± 2% -5.5% 1772 ± 4% perf-stat.overall.cycles-between-cache-misses
> 0.74 +8.3% 0.80 perf-stat.overall.ipc
> 5.524e+10 +6.4% 5.877e+10 perf-stat.ps.branch-instructions
> 2.079e+09 -33.9% 1.375e+09 perf-stat.ps.cache-references
> 13486088 -3.4% 13020988 perf-stat.ps.context-switches
> 3.175e+11 -2.3% 3.101e+11 perf-stat.ps.cpu-cycles
> 2.34e+11 +5.8% 2.475e+11 perf-stat.ps.instructions
> 1.09 ± 14% -28.3% 0.78 ± 21% perf-stat.ps.major-faults
> 1.73e+13 -5.1% 1.642e+13 perf-stat.total.instructions
> 3527725 +10.7% 3905361 sched_debug.cfs_rq:/.avg_vruntime.avg
> 3975260 +14.1% 4535959 ± 6% sched_debug.cfs_rq:/.avg_vruntime.max
> 98657 ± 17% +84.9% 182407 ± 18% sched_debug.cfs_rq:/.avg_vruntime.stddev
> 11.83 ± 7% +17.6% 13.92 ± 5% sched_debug.cfs_rq:/.h_nr_queued.max
> 2.71 ± 5% +21.8% 3.30 ± 4% sched_debug.cfs_rq:/.h_nr_queued.stddev
> 11.75 ± 7% +17.7% 13.83 ± 6% sched_debug.cfs_rq:/.h_nr_runnable.max
> 2.68 ± 4% +21.2% 3.25 ± 5% sched_debug.cfs_rq:/.h_nr_runnable.stddev
> 4556 ±223% +691.0% 36039 ± 34% sched_debug.cfs_rq:/.left_deadline.avg
> 583131 ±223% +577.3% 3949548 ± 4% sched_debug.cfs_rq:/.left_deadline.max
> 51341 ±223% +622.0% 370695 ± 16% sched_debug.cfs_rq:/.left_deadline.stddev
> 4555 ±223% +691.0% 36035 ± 34% sched_debug.cfs_rq:/.left_vruntime.avg
> 583105 ±223% +577.3% 3949123 ± 4% sched_debug.cfs_rq:/.left_vruntime.max
> 51338 ±223% +622.0% 370651 ± 16% sched_debug.cfs_rq:/.left_vruntime.stddev
> 3527725 +10.7% 3905361 sched_debug.cfs_rq:/.min_vruntime.avg
> 3975260 +14.1% 4535959 ± 6% sched_debug.cfs_rq:/.min_vruntime.max
> 98657 ± 17% +84.9% 182407 ± 18% sched_debug.cfs_rq:/.min_vruntime.stddev
> 0.22 ± 5% +13.9% 0.25 ± 5% sched_debug.cfs_rq:/.nr_queued.stddev
> 4555 ±223% +691.0% 36035 ± 34% sched_debug.cfs_rq:/.right_vruntime.avg
> 583105 ±223% +577.3% 3949123 ± 4% sched_debug.cfs_rq:/.right_vruntime.max
> 51338 ±223% +622.0% 370651 ± 16% sched_debug.cfs_rq:/.right_vruntime.stddev
> 1336 ± 7% +50.8% 2014 ± 6% sched_debug.cfs_rq:/.runnable_avg.stddev
> 552.53 ± 8% +19.6% 660.87 ± 5% sched_debug.cfs_rq:/.util_est.avg
> 384.27 ± 9% +28.9% 495.43 ± 11% sched_debug.cfs_rq:/.util_est.stddev
> 1328 ± 17% +42.7% 1896 ± 13% sched_debug.cpu.curr->pid.stddev
> 11.75 ± 8% +19.1% 14.00 ± 6% sched_debug.cpu.nr_running.max
> 2.71 ± 5% +22.7% 3.33 ± 4% sched_debug.cpu.nr_running.stddev
> 76578 ± 9% +33.7% 102390 ± 5% sched_debug.cpu.nr_switches.stddev
> 62.25 ± 7% +17.9% 73.42 ± 7% sched_debug.cpu.nr_uninterruptible.max
> 8.11 ± 58% -82.0% 1.46 ± 47% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> 12.04 ±104% -86.8% 1.58 ± 55% perf-sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
> 0.11 ±123% -95.3% 0.01 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> 0.06 ±103% -93.6% 0.00 ±154% perf-sched.sch_delay.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> 0.10 ±109% -93.9% 0.01 ±163% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> 1.00 ± 21% -59.6% 0.40 ± 50% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> 14.54 ± 14% -79.2% 3.02 ± 51% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> 1.50 ± 84% -74.1% 0.39 ± 90% perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> 1.13 ± 68% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.38 ± 97% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 1.10 ± 17% -68.9% 0.34 ± 49% perf-sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> 42.25 ± 18% -71.7% 11.96 ± 53% perf-sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 3.25 ± 17% -77.5% 0.73 ± 49% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 29.17 ± 33% -62.0% 11.09 ± 85% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 46.25 ± 15% -68.8% 14.43 ± 52% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 3.72 ± 70% -81.0% 0.70 ± 67% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> 7.95 ± 55% -69.7% 2.41 ± 65% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> 3.66 ±139% -97.1% 0.11 ± 58% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> 3.05 ± 44% -91.9% 0.25 ± 57% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> 29.96 ± 9% -83.6% 4.90 ± 48% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> 26.20 ± 59% -88.9% 2.92 ± 66% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 0.14 ± 84% -91.2% 0.01 ±142% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
> 0.20 ±149% -97.5% 0.01 ±102% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> 0.11 ±144% -96.6% 0.00 ±154% perf-sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> 0.19 ±118% -96.7% 0.01 ±163% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> 274.64 ± 95% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.72 ±151% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 3135 ± 5% -48.6% 1611 ± 57% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 1320 ± 19% -78.6% 282.01 ± 74% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> 265.55 ± 82% -77.9% 58.70 ±124% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> 1850 ± 28% -59.1% 757.74 ± 68% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> 766.85 ± 56% -68.0% 245.51 ± 51% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 1.77 ± 17% -71.9% 0.50 ± 49% perf-sched.total_sch_delay.average.ms
> 5.15 ± 17% -69.5% 1.57 ± 48% perf-sched.total_wait_and_delay.average.ms
> 3.38 ± 17% -68.2% 1.07 ± 48% perf-sched.total_wait_time.average.ms
> 5100 ± 3% -31.0% 3522 ± 47% perf-sched.total_wait_time.max.ms
> 27.42 ± 49% -85.2% 4.07 ± 47% perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> 35.29 ± 80% -85.8% 5.00 ± 51% perf-sched.wait_and_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
> 42.28 ± 14% -79.4% 8.70 ± 51% perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> 3.12 ± 17% -66.4% 1.05 ± 48% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> 122.62 ± 18% -70.4% 36.26 ± 53% perf-sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 250.26 ± 65% -94.2% 14.56 ± 55% perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 9.37 ± 17% -78.2% 2.05 ± 48% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 58.34 ± 33% -62.0% 22.18 ± 85% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 134.44 ± 15% -69.3% 41.24 ± 52% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 86.94 ± 6% -83.1% 14.68 ± 48% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> 86.57 ± 39% -86.0% 12.14 ± 59% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 647.92 ± 48% -97.9% 13.86 ± 45% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 6386 ± 6% -46.8% 3397 ± 57% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 3868 ± 27% -60.4% 1531 ± 67% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> 1647 ± 55% -67.7% 531.51 ± 50% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 5014 ± 5% -32.5% 3385 ± 47% perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 19.31 ± 47% -86.5% 2.61 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> 23.25 ± 70% -85.3% 3.42 ± 52% perf-sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon_pipe_write
> 18.33 ± 15% -42.0% 10.64 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 0.11 ±123% -95.3% 0.01 ±102% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> 0.06 ±103% -93.6% 0.00 ±154% perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> 0.10 ±109% -93.9% 0.01 ±163% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> 1.70 ± 21% -52.6% 0.81 ± 48% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs_read.ksys_read
> 27.74 ± 15% -79.5% 5.68 ± 51% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vfs_write.ksys_write
> 2.17 ± 75% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.42 ± 97% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 2.02 ± 17% -65.1% 0.70 ± 48% perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
> 80.37 ± 18% -69.8% 24.31 ± 52% perf-sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
> 210.13 ± 68% -95.1% 10.21 ± 55% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.12 ± 17% -78.5% 1.32 ± 48% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 29.17 ± 33% -62.0% 11.09 ± 85% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 88.19 ± 16% -69.6% 26.81 ± 52% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 13.77 ± 45% -65.7% 4.72 ± 53% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> 104.64 ± 42% -76.4% 24.74 ±135% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
> 5.16 ± 29% -92.5% 0.39 ± 48% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> 56.98 ± 5% -82.9% 9.77 ± 48% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> 60.36 ± 32% -84.7% 9.22 ± 57% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 619.88 ± 43% -98.0% 12.52 ± 45% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.14 ± 84% -91.2% 0.01 ±142% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
> 740.14 ± 35% -68.5% 233.31 ± 83% perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> 0.20 ±149% -97.5% 0.01 ±102% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.map_vdso.load_elf_binary.exec_binprm
> 0.11 ±144% -96.6% 0.00 ±154% perf-sched.wait_time.max.ms.__cond_resched.filemap_read.__kernel_read.exec_binprm.bprm_execve
> 0.19 ±118% -96.7% 0.01 ±163% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_link
> 327.64 ± 71% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.72 ±151% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> 3299 ± 6% -40.7% 1957 ± 51% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 436.75 ± 39% -76.9% 100.85 ± 98% perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_read
> 2112 ± 19% -62.3% 796.34 ± 63% perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.anon_pipe_write
> 947.83 ± 46% -58.8% 390.83 ± 53% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 5014 ± 5% -32.5% 3385 ± 47% perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>
>
>
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_2/aim9/300s
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 11036 +85.7% 20499 meminfo.PageTables
> 125.17 ± 8% +18.4% 148.17 ± 7% perf-c2c.HITM.local
> 30464 +18.7% 36160 sched_debug.cpu.nr_switches.avg
> 9166 +19.8% 10985 vmstat.system.cs
> 6623 ± 17% +60.8% 10652 ± 5% numa-meminfo.node0.PageTables
> 4414 ± 26% +123.2% 9853 ± 6% numa-meminfo.node1.PageTables
> 1653 ± 17% +60.1% 2647 ± 5% numa-vmstat.node0.nr_page_table_pages
> 1097 ± 26% +123.9% 2457 ± 6% numa-vmstat.node1.nr_page_table_pages
> 319.08 -2.2% 312.04 aim9.shell_rtns_2.ops_per_sec
> 27170926 -2.2% 26586121 aim9.time.minor_page_faults
> 1051038 -2.2% 1027732 aim9.time.voluntary_context_switches
> 2736 +86.4% 5101 proc-vmstat.nr_page_table_pages
> 28014 +1.3% 28378 proc-vmstat.nr_slab_unreclaimable
> 19332129 -1.5% 19048363 proc-vmstat.numa_hit
> 19283853 -1.5% 18996609 proc-vmstat.numa_local
> 19892794 -1.5% 19598065 proc-vmstat.pgalloc_normal
> 28044189 -2.1% 27457289 proc-vmstat.pgfault
> 19843766 -1.5% 19543091 proc-vmstat.pgfree
> 419715 -5.7% 395688 ± 8% proc-vmstat.pgreuse
> 2682 -2.0% 2628 proc-vmstat.unevictable_pgs_culled
> 0.07 ± 6% -30.5% 0.05 ± 22% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 0.03 ± 6% +36.0% 0.04 perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.07 ± 33% -57.5% 0.03 ± 53% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
> 0.02 ± 74% +112.0% 0.05 ± 36% perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
> 0.02 +24.1% 0.02 ± 2% perf-sched.total_sch_delay.average.ms
> 27.52 -14.0% 23.67 perf-sched.total_wait_and_delay.average.ms
> 23179 +18.3% 27421 perf-sched.total_wait_and_delay.count.ms
> 27.50 -14.0% 23.65 perf-sched.total_wait_time.average.ms
> 117.03 ± 3% -72.4% 32.27 ± 2% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 1655 ± 2% +282.0% 6324 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.96 ± 29% +51.6% 1.45 ± 22% perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> 117.00 ± 3% -72.5% 32.23 ± 2% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 5.93 +0.1 6.00 perf-stat.i.branch-miss-rate%
> 9189 +19.8% 11011 perf-stat.i.context-switches
> 1.96 +1.6% 1.99 perf-stat.i.cpi
> 71.21 +60.6% 114.39 ± 4% perf-stat.i.cpu-migrations
> 0.53 -1.5% 0.52 perf-stat.i.ipc
> 3.79 -2.1% 3.71 perf-stat.i.metric.K/sec
> 90998 -2.1% 89084 perf-stat.i.minor-faults
> 90998 -2.1% 89084 perf-stat.i.page-faults
> 5.99 +0.1 6.06 perf-stat.overall.branch-miss-rate%
> 1.79 +1.4% 1.82 perf-stat.overall.cpi
> 0.56 -1.3% 0.55 perf-stat.overall.ipc
> 9158 +19.8% 10974 perf-stat.ps.context-switches
> 70.99 +60.6% 114.02 ± 4% perf-stat.ps.cpu-migrations
> 90694 -2.1% 88787 perf-stat.ps.minor-faults
> 90695 -2.1% 88787 perf-stat.ps.page-faults
> 8.155e+11 -1.1% 8.065e+11 perf-stat.total.instructions
> 8.87 -0.3 8.55 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> 8.86 -0.3 8.54 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.53 ± 2% -0.1 2.43 ± 2% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
> 2.54 -0.1 2.44 ± 2% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
> 2.49 -0.1 2.40 ± 2% perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
> 0.98 ± 5% -0.1 0.90 ± 5% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.70 ± 3% -0.1 0.62 ± 6% perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
> 0.18 ±141% +0.5 0.67 ± 6% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> 0.18 ±141% +0.5 0.67 ± 6% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> 0.00 +0.6 0.59 ± 7% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> 62.48 +0.7 63.14 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> 49.10 +0.7 49.78 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 67.62 +0.8 68.43 perf-profile.calltrace.cycles-pp.common_startup_64
> 20.14 -0.7 19.40 perf-profile.children.cycles-pp.do_syscall_64
> 20.18 -0.7 19.44 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 3.33 ± 2% -0.2 3.16 ± 2% perf-profile.children.cycles-pp.vm_mmap_pgoff
> 3.22 ± 2% -0.2 3.06 perf-profile.children.cycles-pp.do_mmap
> 3.51 ± 2% -0.1 3.38 perf-profile.children.cycles-pp.do_exit
> 3.52 ± 2% -0.1 3.38 perf-profile.children.cycles-pp.__x64_sys_exit_group
> 3.52 ± 2% -0.1 3.38 perf-profile.children.cycles-pp.do_group_exit
> 3.67 -0.1 3.54 perf-profile.children.cycles-pp.x64_sys_call
> 2.21 -0.1 2.09 ± 3% perf-profile.children.cycles-pp.__x64_sys_openat
> 2.07 ± 2% -0.1 1.94 ± 2% perf-profile.children.cycles-pp.path_openat
> 2.09 ± 2% -0.1 1.97 ± 2% perf-profile.children.cycles-pp.do_filp_open
> 2.19 -0.1 2.08 ± 3% perf-profile.children.cycles-pp.do_sys_openat2
> 1.50 ± 4% -0.1 1.39 ± 3% perf-profile.children.cycles-pp.copy_process
> 2.56 -0.1 2.46 ± 2% perf-profile.children.cycles-pp.exit_mm
> 2.55 -0.1 2.44 ± 2% perf-profile.children.cycles-pp.__mmput
> 2.51 ± 2% -0.1 2.41 ± 2% perf-profile.children.cycles-pp.exit_mmap
> 0.70 ± 3% -0.1 0.62 ± 6% perf-profile.children.cycles-pp.dup_mm
> 0.94 ± 4% -0.1 0.89 ± 2% perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
> 0.57 ± 3% -0.0 0.52 ± 4% perf-profile.children.cycles-pp.alloc_pages_noprof
> 0.20 ± 12% -0.0 0.15 ± 10% perf-profile.children.cycles-pp.perf_event_task_tick
> 0.18 ± 4% -0.0 0.14 ± 15% perf-profile.children.cycles-pp.xas_find
> 0.10 ± 12% -0.0 0.07 ± 24% perf-profile.children.cycles-pp.up_write
> 0.09 ± 6% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.tick_check_broadcast_expired
> 0.08 ± 12% +0.0 0.10 ± 8% perf-profile.children.cycles-pp.hrtimer_try_to_cancel
> 0.10 ± 13% +0.0 0.13 ± 5% perf-profile.children.cycles-pp.__perf_event_task_sched_out
> 0.20 ± 8% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.enqueue_entity
> 0.21 ± 9% +0.0 0.25 ± 4% perf-profile.children.cycles-pp.prepare_task_switch
> 0.03 ±101% +0.0 0.07 ± 16% perf-profile.children.cycles-pp.run_ksoftirqd
> 0.04 ± 71% +0.1 0.09 ± 15% perf-profile.children.cycles-pp.kick_pool
> 0.05 ± 47% +0.1 0.11 ± 16% perf-profile.children.cycles-pp.__queue_work
> 0.28 ± 5% +0.1 0.34 ± 7% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 0.50 +0.1 0.56 ± 2% perf-profile.children.cycles-pp.timerqueue_del
> 0.04 ± 71% +0.1 0.11 ± 17% perf-profile.children.cycles-pp.queue_work_on
> 0.51 ± 4% +0.1 0.58 ± 2% perf-profile.children.cycles-pp.enqueue_task_fair
> 0.32 ± 3% +0.1 0.40 ± 4% perf-profile.children.cycles-pp.ttwu_do_activate
> 0.53 ± 5% +0.1 0.61 ± 3% perf-profile.children.cycles-pp.enqueue_task
> 0.49 ± 4% +0.1 0.57 ± 6% perf-profile.children.cycles-pp.schedule
> 0.28 ± 6% +0.1 0.38 perf-profile.children.cycles-pp.sched_ttwu_pending
> 0.32 ± 5% +0.1 0.43 ± 2% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 0.35 ± 8% +0.1 0.47 ± 2% perf-profile.children.cycles-pp.flush_smp_call_function_queue
> 0.17 ± 10% +0.2 0.34 ± 12% perf-profile.children.cycles-pp.worker_thread
> 0.88 ± 3% +0.2 1.06 ± 4% perf-profile.children.cycles-pp.ret_from_fork
> 0.88 ± 3% +0.2 1.06 ± 4% perf-profile.children.cycles-pp.ret_from_fork_asm
> 0.39 ± 6% +0.2 0.59 ± 7% perf-profile.children.cycles-pp.kthread
> 66.24 +0.6 66.85 perf-profile.children.cycles-pp.cpuidle_idle_call
> 63.09 +0.6 63.73 perf-profile.children.cycles-pp.cpuidle_enter
> 62.97 +0.6 63.61 perf-profile.children.cycles-pp.cpuidle_enter_state
> 67.61 +0.8 68.43 perf-profile.children.cycles-pp.do_idle
> 67.62 +0.8 68.43 perf-profile.children.cycles-pp.common_startup_64
> 67.62 +0.8 68.43 perf-profile.children.cycles-pp.cpu_startup_entry
> 0.37 ± 11% -0.1 0.31 ± 3% perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> 0.10 ± 13% -0.0 0.06 ± 50% perf-profile.self.cycles-pp.up_write
> 0.15 ± 4% +0.1 0.22 ± 8% perf-profile.self.cycles-pp.timerqueue_del
>
>
>
> ***************************************************************************************************
> lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/exec_test/aim9/300s
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 12120 +76.7% 21422 meminfo.PageTables
> 8543 +26.9% 10840 vmstat.system.cs
> 6148 ± 11% +89.9% 11678 ± 5% numa-meminfo.node0.PageTables
> 5909 ± 11% +64.0% 9689 ± 7% numa-meminfo.node1.PageTables
> 1532 ± 10% +90.5% 2919 ± 5% numa-vmstat.node0.nr_page_table_pages
> 1468 ± 11% +65.2% 2426 ± 7% numa-vmstat.node1.nr_page_table_pages
> 2991 +78.0% 5323 proc-vmstat.nr_page_table_pages
> 32726750 -2.4% 31952115 proc-vmstat.pgfault
> 1228 -2.6% 1197 aim9.exec_test.ops_per_sec
> 11018 ± 2% +10.5% 12178 ± 2% aim9.time.involuntary_context_switches
> 31835059 -2.4% 31062527 aim9.time.minor_page_faults
> 736468 -2.9% 715310 aim9.time.voluntary_context_switches
> 0.28 ± 7% +11.3% 0.31 ± 6% sched_debug.cfs_rq:/.h_nr_queued.stddev
> 0.28 ± 7% +11.3% 0.31 ± 6% sched_debug.cfs_rq:/.nr_queued.stddev
> 356683 ± 16% +27.0% 453000 ± 9% sched_debug.cpu.avg_idle.min
> 27620 ± 7% +29.5% 35775 sched_debug.cpu.nr_switches.avg
> 84830 ± 14% +16.3% 98648 ± 4% sched_debug.cpu.nr_switches.max
> 4563 ± 26% +46.2% 6671 ± 26% sched_debug.cpu.nr_switches.min
> 0.03 ± 4% -67.3% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exec_release.exec_mm_release.exec_mmap
> 0.03 +11.2% 0.03 ± 2% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.05 ± 28% +61.3% 0.09 ± 21% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 0.10 ± 18% +18.8% 0.12 perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 0.02 ± 3% +18.3% 0.02 ± 2% perf-sched.total_sch_delay.average.ms
> 28.80 -19.8% 23.10 ± 3% perf-sched.total_wait_and_delay.average.ms
> 22332 +24.4% 27778 perf-sched.total_wait_and_delay.count.ms
> 28.78 -19.8% 23.07 ± 3% perf-sched.total_wait_time.average.ms
> 17.39 ± 10% -15.6% 14.67 ± 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 41.02 ± 4% -54.6% 18.64 ± 6% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 4795 ± 2% +122.5% 10668 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 17.35 ± 10% -15.7% 14.63 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 0.00 ±141% +400.0% 0.00 ± 44% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
> 40.99 ± 4% -54.6% 18.61 ± 6% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.00 ±149% +542.9% 0.03 ± 41% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
> 5.617e+08 -1.6% 5.529e+08 perf-stat.i.branch-instructions
> 5.76 +0.1 5.84 perf-stat.i.branch-miss-rate%
> 8562 +27.0% 10878 perf-stat.i.context-switches
> 1.87 +2.6% 1.92 perf-stat.i.cpi
> 78.02 ± 3% +11.8% 87.23 ± 2% perf-stat.i.cpu-migrations
> 2.792e+09 -1.6% 2.748e+09 perf-stat.i.instructions
> 0.55 -2.5% 0.54 perf-stat.i.ipc
> 4.42 -2.4% 4.31 perf-stat.i.metric.K/sec
> 106019 -2.4% 103509 perf-stat.i.minor-faults
> 106019 -2.4% 103509 perf-stat.i.page-faults
> 5.83 +0.1 5.91 perf-stat.overall.branch-miss-rate%
> 1.72 +2.3% 1.76 perf-stat.overall.cpi
> 0.58 -2.3% 0.57 perf-stat.overall.ipc
> 5.599e+08 -1.6% 5.511e+08 perf-stat.ps.branch-instructions
> 8534 +27.0% 10841 perf-stat.ps.context-switches
> 77.77 ± 3% +11.8% 86.96 ± 2% perf-stat.ps.cpu-migrations
> 2.783e+09 -1.6% 2.739e+09 perf-stat.ps.instructions
> 105666 -2.4% 103164 perf-stat.ps.minor-faults
> 105666 -2.4% 103164 perf-stat.ps.page-faults
> 8.386e+11 -1.6% 8.253e+11 perf-stat.total.instructions
> 7.79 -0.4 7.41 ± 2% perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
> 7.75 -0.3 7.47 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> 7.73 -0.3 7.46 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.68 ± 2% -0.2 2.52 ± 2% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
> 2.68 ± 2% -0.2 2.52 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.68 ± 2% -0.2 2.52 ± 2% perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.73 ± 2% -0.2 2.57 ± 2% perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.60 -0.1 2.46 ± 3% perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
> 2.61 -0.1 2.47 ± 3% perf-profile.calltrace.cycles-pp.execve.exec_test
> 2.60 -0.1 2.46 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
> 2.60 -0.1 2.46 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve.exec_test
> 1.92 ± 3% -0.1 1.79 ± 2% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
> 1.92 ± 3% -0.1 1.80 ± 2% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
> 4.68 -0.1 4.57 perf-profile.calltrace.cycles-pp._Fork
> 1.88 ± 2% -0.1 1.77 ± 2% perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
> 2.76 -0.1 2.66 ± 2% perf-profile.calltrace.cycles-pp.exec_test
> 3.24 -0.1 3.16 perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.84 ± 4% -0.1 0.77 ± 5% perf-profile.calltrace.cycles-pp.wait4
> 0.88 ± 7% +0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> 0.88 ± 7% +0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> 0.88 ± 7% +0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> 0.46 ± 45% +0.3 0.78 ± 5% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.17 ±141% +0.4 0.53 ± 4% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
> 0.18 ±141% +0.4 0.54 ± 2% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> 66.08 +0.8 66.85 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
> 66.08 +0.8 66.85 perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
> 66.02 +0.8 66.80 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> 67.06 +0.9 68.00 perf-profile.calltrace.cycles-pp.common_startup_64
> 21.19 -0.9 20.30 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 21.15 -0.9 20.27 perf-profile.children.cycles-pp.do_syscall_64
> 7.92 -0.4 7.53 ± 2% perf-profile.children.cycles-pp.execve
> 7.94 -0.4 7.56 ± 2% perf-profile.children.cycles-pp.__x64_sys_execve
> 7.84 -0.4 7.46 ± 2% perf-profile.children.cycles-pp.do_execveat_common
> 5.51 -0.3 5.25 ± 2% perf-profile.children.cycles-pp.load_elf_binary
> 3.68 -0.2 3.49 ± 2% perf-profile.children.cycles-pp.__mmput
> 2.81 ± 2% -0.2 2.63 perf-profile.children.cycles-pp.__x64_sys_exit_group
> 2.80 ± 2% -0.2 2.62 ± 2% perf-profile.children.cycles-pp.do_exit
> 2.81 ± 2% -0.2 2.62 ± 2% perf-profile.children.cycles-pp.do_group_exit
> 2.93 ± 2% -0.2 2.76 ± 2% perf-profile.children.cycles-pp.x64_sys_call
> 3.60 -0.2 3.44 ± 2% perf-profile.children.cycles-pp.exit_mmap
> 5.66 -0.1 5.51 perf-profile.children.cycles-pp.__handle_mm_fault
> 1.94 ± 3% -0.1 1.82 ± 2% perf-profile.children.cycles-pp.exit_mm
> 2.64 -0.1 2.52 ± 3% perf-profile.children.cycles-pp.vm_mmap_pgoff
> 2.55 ± 2% -0.1 2.43 ± 3% perf-profile.children.cycles-pp.do_mmap
> 2.19 ± 2% -0.1 2.08 ± 3% perf-profile.children.cycles-pp.__mmap_region
> 2.27 -0.1 2.16 ± 2% perf-profile.children.cycles-pp.begin_new_exec
> 2.79 -0.1 2.69 ± 2% perf-profile.children.cycles-pp.exec_test
> 0.83 ± 4% -0.1 0.76 ± 6% perf-profile.children.cycles-pp.__mmap_prepare
> 0.86 ± 4% -0.1 0.78 ± 5% perf-profile.children.cycles-pp.wait4
> 0.52 ± 5% -0.1 0.45 ± 7% perf-profile.children.cycles-pp.kernel_wait4
> 0.50 ± 5% -0.1 0.43 ± 6% perf-profile.children.cycles-pp.do_wait
> 0.88 ± 3% -0.1 0.81 ± 2% perf-profile.children.cycles-pp.kmem_cache_free
> 0.51 ± 2% -0.1 0.46 ± 6% perf-profile.children.cycles-pp.setup_arg_pages
> 0.39 ± 2% -0.0 0.34 ± 8% perf-profile.children.cycles-pp.unlink_anon_vmas
> 0.08 ± 10% -0.0 0.04 ± 71% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
> 0.37 ± 5% -0.0 0.33 ± 3% perf-profile.children.cycles-pp.__memcg_slab_free_hook
> 0.21 ± 6% -0.0 0.17 ± 5% perf-profile.children.cycles-pp.user_path_at
> 0.21 ± 3% -0.0 0.18 ± 10% perf-profile.children.cycles-pp.__percpu_counter_sum
> 0.18 ± 7% -0.0 0.15 ± 5% perf-profile.children.cycles-pp.alloc_empty_file
> 0.33 ± 5% -0.0 0.30 perf-profile.children.cycles-pp.relocate_vma_down
> 0.04 ± 45% +0.0 0.08 ± 12% perf-profile.children.cycles-pp.__update_load_avg_se
> 0.14 ± 7% +0.0 0.18 ± 10% perf-profile.children.cycles-pp.hrtimer_start_range_ns
> 0.19 ± 9% +0.0 0.24 ± 7% perf-profile.children.cycles-pp.prepare_task_switch
> 0.02 ±142% +0.0 0.06 ± 23% perf-profile.children.cycles-pp.select_task_rq
> 0.03 ±100% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.task_contending
> 0.45 ± 7% +0.1 0.51 ± 3% perf-profile.children.cycles-pp.__pick_next_task
> 0.14 ± 22% +0.1 0.20 ± 10% perf-profile.children.cycles-pp.kick_pool
> 0.36 ± 4% +0.1 0.42 ± 4% perf-profile.children.cycles-pp.dequeue_entities
> 0.36 ± 4% +0.1 0.44 ± 5% perf-profile.children.cycles-pp.dequeue_task_fair
> 0.15 ± 20% +0.1 0.23 ± 10% perf-profile.children.cycles-pp.__queue_work
> 0.49 ± 5% +0.1 0.57 ± 7% perf-profile.children.cycles-pp.schedule_idle
> 0.14 ± 22% +0.1 0.23 ± 9% perf-profile.children.cycles-pp.queue_work_on
> 0.36 ± 3% +0.1 0.46 ± 9% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 0.47 ± 7% +0.1 0.57 ± 7% perf-profile.children.cycles-pp.timerqueue_del
> 0.30 ± 13% +0.1 0.42 ± 7% perf-profile.children.cycles-pp.ttwu_do_activate
> 0.23 ± 15% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.flush_smp_call_function_queue
> 0.18 ± 14% +0.1 0.32 ± 3% perf-profile.children.cycles-pp.sched_ttwu_pending
> 0.19 ± 13% +0.1 0.34 ± 4% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 0.61 ± 3% +0.2 0.76 ± 5% perf-profile.children.cycles-pp.schedule
> 1.60 ± 4% +0.2 1.80 ± 2% perf-profile.children.cycles-pp.ret_from_fork_asm
> 1.60 ± 4% +0.2 1.80 ± 2% perf-profile.children.cycles-pp.ret_from_fork
> 0.88 ± 7% +0.2 1.09 ± 3% perf-profile.children.cycles-pp.kthread
> 1.22 ± 3% +0.2 1.45 ± 5% perf-profile.children.cycles-pp.__schedule
> 0.54 ± 8% +0.2 0.78 ± 5% perf-profile.children.cycles-pp.worker_thread
> 66.08 +0.8 66.85 perf-profile.children.cycles-pp.start_secondary
> 67.06 +0.9 68.00 perf-profile.children.cycles-pp.common_startup_64
> 67.06 +0.9 68.00 perf-profile.children.cycles-pp.cpu_startup_entry
> 67.06 +0.9 68.00 perf-profile.children.cycles-pp.do_idle
> 0.08 ± 10% -0.0 0.04 ± 71% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
> 0.04 ± 45% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.__update_load_avg_se
> 0.14 ± 10% +0.1 0.23 ± 11% perf-profile.self.cycles-pp.timerqueue_del
>
>
>
> ***************************************************************************************************
> lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> gcc-12/performance/1BRD_48G/xfs/x86_64-rhel-9.4/600/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/sync_disk_rw/aim7
>
> commit:
> baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
>
> baffb122772da116 f3de761c52148abfb1b4512914f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 344180 ± 6% -13.0% 299325 ± 9% meminfo.Mapped
> 9594 ±123% +191.8% 27995 ± 54% numa-meminfo.node1.PageTables
> 2399 ±123% +191.3% 6989 ± 54% numa-vmstat.node1.nr_page_table_pages
> 1860734 -5.2% 1763194 vmstat.io.bo
> 831686 +1.3% 842493 vmstat.system.cs
> 50372 -5.5% 47609 aim7.jobs-per-min
> 1435644 +11.5% 1600707 aim7.time.involuntary_context_switches
> 7242 +1.2% 7332 aim7.time.percent_of_cpu_this_job_got
> 5159 +7.1% 5526 aim7.time.system_time
> 33195986 +6.9% 35497140 aim7.time.voluntary_context_switches
> 40987 ± 10% -19.8% 32872 ± 9% sched_debug.cfs_rq:/.avg_vruntime.stddev
> 40987 ± 10% -19.8% 32872 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev
> 605972 ± 2% +14.5% 693922 ± 7% sched_debug.cpu.avg_idle.max
> 30974 ± 8% -20.9% 24498 ± 15% sched_debug.cpu.avg_idle.min
> 118758 ± 5% +22.0% 144899 ± 6% sched_debug.cpu.avg_idle.stddev
> 856253 +1.5% 869009 perf-stat.i.context-switches
> 3.06 +2.3% 3.13 perf-stat.i.cpi
> 164824 +7.7% 177546 perf-stat.i.cpu-migrations
> 7.93 +2.5% 8.13 perf-stat.i.metric.K/sec
> 3.41 +1.8% 3.47 perf-stat.overall.cpi
> 1355 +5.8% 1434 ± 4% perf-stat.overall.cycles-between-cache-misses
> 0.29 -1.8% 0.29 perf-stat.overall.ipc
> 845412 +1.6% 858925 perf-stat.ps.context-switches
> 162728 +7.8% 175475 perf-stat.ps.cpu-migrations
> 4.391e+12 +5.0% 4.609e+12 perf-stat.total.instructions
> 444798 +6.0% 471383 ± 5% proc-vmstat.nr_active_anon
> 28190 -2.8% 27402 proc-vmstat.nr_dirty
> 1231373 +2.3% 1259666 ± 2% proc-vmstat.nr_file_pages
> 63763 +0.9% 64355 proc-vmstat.nr_inactive_file
> 86758 ± 6% -12.9% 75546 ± 8% proc-vmstat.nr_mapped
> 10162 ± 2% +7.2% 10895 ± 3% proc-vmstat.nr_page_table_pages
> 265229 +10.4% 292795 ± 9% proc-vmstat.nr_shmem
> 444798 +6.0% 471383 ± 5% proc-vmstat.nr_zone_active_anon
> 63763 +0.9% 64355 proc-vmstat.nr_zone_inactive_file
> 28191 -2.8% 27400 proc-vmstat.nr_zone_write_pending
> 24349 +11.6% 27171 ± 8% proc-vmstat.pgreuse
> 0.02 ± 3% +11.3% 0.03 ± 2% perf-sched.sch_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> 0.29 ± 17% -30.7% 0.20 ± 14% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
> 0.03 ± 10% +33.5% 0.04 ± 2% perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> 0.21 ± 32% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.16 ± 16% +51.9% 0.24 ± 11% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 0.22 ± 19% +44.1% 0.32 ± 25% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> 0.30 ± 28% -38.7% 0.18 ± 28% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> 0.11 ± 5% +12.8% 0.12 ± 4% perf-sched.sch_delay.avg.ms.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
> 0.08 ± 4% +15.8% 0.09 ± 4% perf-sched.sch_delay.avg.ms.xlog_wait.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync
> 0.02 ± 3% +13.7% 0.02 ± 4% perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
> 0.01 ±223% +1289.5% 0.09 ±111% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
> 2.49 ± 40% -43.4% 1.41 ± 50% perf-sched.sch_delay.max.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
> 0.76 ± 7% +92.8% 1.46 ± 40% perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> 0.65 ± 41% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.40 ± 64% +2968.7% 43.04 ± 13% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 0.63 ± 19% +89.8% 1.19 ± 51% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 28.67 ± 3% -11.2% 25.45 ± 5% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
> 0.80 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> 5.76 ±107% +152.4% 14.53 ± 10% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 8441 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> 18.67 ± 71% +108.0% 38.83 ± 5% perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_trans_commit
> 116.17 ±105% +1677.8% 2065 ± 5% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 424.79 ±151% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> 28.51 ± 3% -11.2% 25.31 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
> 0.38 ± 59% -79.0% 0.08 ±107% perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
> 0.77 ± 9% -56.5% 0.34 ± 3% perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> 1.80 ±138% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.13 ± 93% +133.2% 14.29 ± 10% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 1.00 ± 16% -48.1% 0.52 ± 20% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> 0.92 ± 16% -62.0% 0.35 ± 14% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
> 0.26 ± 2% -59.8% 0.11 perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
> 0.24 ±223% +2180.2% 5.56 ± 83% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
> 1.25 ± 77% -79.8% 0.25 ±107% perf-sched.wait_time.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
> 1.78 ± 51% +958.6% 18.82 ±117% perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map
> 58.48 ± 6% -10.7% 52.22 ± 2% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.__flush_workqueue.xlog_cil_push_now.isra
> 10.87 ±192% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 8.63 ± 27% -63.9% 3.12 ± 29% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
2025-06-25 13:57 ` Mathieu Desnoyers
@ 2025-06-25 15:06 ` Gabriele Monaco
2025-07-02 13:58 ` Gabriele Monaco
1 sibling, 0 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-06-25 15:06 UTC (permalink / raw)
To: Mathieu Desnoyers, kernel test robot
Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
Paul E. McKenney, Ingo Molnar
On Wed, 2025-06-25 at 09:57 -0400, Mathieu Desnoyers wrote:
> On 2025-06-25 04:01, kernel test robot wrote:
> >
> > Hello,
> >
> > kernel test robot noticed a 10.1% regression of
> > hackbench.throughput on:
>
> Hi Gabriele,
>
> This is a significant regression. Can you investigate before it gets
> merged ?
>
Hi Mathieu,
I'll have a closer look at this next week.
For now let's keep this stalled.
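
For reference, while I dig into it, the flagged pipe/process hackbench run can be
roughly approximated locally with perf bench. This is only a sketch: the group and
loop counts below are guesses for that 128-CPU box, not the values from the lkp job
file.

    # Rough stand-in for the reported workload: process mode, pipe IPC.
    # -g/-l values are assumptions, not the lkp job parameters.
    perf stat -e context-switches,cpu-migrations,cycles,instructions -- \
        perf bench sched messaging -p -g 32 -l 2000

The report mostly points at the extra context switches and cpu migrations, so
comparing those counters between baffb12277 and f3de761c52 is a quick first check.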
Thanks,
Gabriele
> Thanks,
>
> Mathieu
>
> >
> >
> > commit: f3de761c52148abfb1b4512914f64c7e1c737fc8 ("[RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct")
> > url:
> > https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250613-171504
> > patch link:
> > https://lore.kernel.org/all/20250613091229.21500-3-gmonaco@redhat.com/
> > patch subject: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
> >
> > testcase: hackbench
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU
> > @ 2.00GHz (Ice Lake) with 256G memory
> > parameters:
> >
> > nr_threads: 100%
> > iterations: 4
> > mode: process
> > ipc: pipe
> > cpufreq_governor: performance
> >
> >
> > In addition to that, the commit also has significant impact on the
> > following tests:
> >
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput 2.9% regression |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | ipc=socket |
> > |                  | iterations=4 |
> > |                  | mode=process |
> > |                  | nr_threads=50% |
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.shell_rtns_3.ops_per_sec 1.7% regression |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | test=shell_rtns_3 |
> > |                  | testtime=300s |
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput 6.2% regression |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | ipc=pipe |
> > |                  | iterations=4 |
> > |                  | mode=process |
> > |                  | nr_threads=800% |
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.shell_rtns_1.ops_per_sec 2.1% regression |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | test=shell_rtns_1 |
> > |                  | testtime=300s |
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput 11.8% improvement |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | ipc=pipe |
> > |                  | iterations=4 |
> > |                  | mode=process |
> > |                  | nr_threads=50% |
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.shell_rtns_2.ops_per_sec 2.2% regression |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | test=shell_rtns_2 |
> > |                  | testtime=300s |
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | aim9: aim9.exec_test.ops_per_sec 2.6% regression |
> > | test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | test=exec_test |
> > |                  | testtime=300s |
> > +------------------+---------------------------------------------------------------------------------------------------+
> > | testcase: change | aim7: aim7.jobs-per-min 5.5% regression |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory |
> > | test parameters  | cpufreq_governor=performance |
> > |                  | disk=1BRD_48G |
> > |                  | fs=xfs |
> > |                  | load=600 |
> > |                  | test=sync_disk_rw |
> > +------------------+---------------------------------------------------------------------------------------------------+
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202506251555.de6720f7-lkp@intel.com
> >
> >
> > Details are as below:
> > --------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250625/202506251555.de6720f7-lkp@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> > gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 55140 ± 80% +229.2% 181547 ± 20% numa-meminfo.node1.Mapped
> > 13048 ± 80% +248.2% 45431 ± 20% numa-vmstat.node1.nr_mapped
> > 679.17 ± 22% -25.3% 507.33 ± 10% sched_debug.cfs_rq:/.util_est.max
> > 4.287e+08 ± 3% +20.3% 5.158e+08 cpuidle..time
> > 2953716 ± 13% +228.9% 9716185 ± 2% cpuidle..usage
> > 91072 ± 12% +134.8% 213855 ± 7% meminfo.Mapped
> > 8848637 +10.4% 9769875 ± 5% meminfo.Memused
> > 0.67 ± 4% +0.1 0.78 ± 2% mpstat.cpu.all.irq%
> > 0.03 ± 2% +0.0 0.03 ± 4% mpstat.cpu.all.soft%
> > 4.17 ± 8% +596.0% 29.00 ± 31% mpstat.max_utilization.seconds
> > 2950 -12.3% 2587 vmstat.procs.r
> > 4557607 ± 2% +35.9% 6192548 vmstat.system.cs
> > 397195 ± 5% +73.4% 688726 vmstat.system.in
> > 1490153 -10.1% 1339340 hackbench.throughput
> > 1424170 -8.7% 1299590 hackbench.throughput_avg
> > 1490153 -10.1% 1339340 hackbench.throughput_best
> > 1353181 ± 2% -10.1% 1216523 hackbench.throughput_worst
> > 53158738 ± 3% +34.0% 71240022 hackbench.time.involuntary_context_switches
> > 12177 -2.4% 11891 hackbench.time.percent_of_cpu_this_job_got
> > 4482 +7.6% 4821 hackbench.time.system_time
> > 798.92 +2.0% 815.24 hackbench.time.user_time
> > 1.54e+08 ± 3% +46.6% 2.257e+08 hackbench.time.voluntary_context_switches
> > 210335 +3.3% 217333 proc-vmstat.nr_anon_pages
> > 23353 ± 14% +136.2% 55152 ± 7% proc-vmstat.nr_mapped
> > 61825 ± 3% +6.6% 65928 ± 2% proc-vmstat.nr_page_table_pages
> > 30859 +4.4% 32213 proc-vmstat.nr_slab_reclaimable
> > 1294 ±177% +1657.1% 22743 ± 66% proc-vmstat.numa_hint_faults
> > 1153 ±198% +1597.0% 19566 ± 79% proc-vmstat.numa_hint_faults_local
> > 1.242e+08 -3.2% 1.202e+08 proc-vmstat.numa_hit
> > 1.241e+08 -3.2% 1.201e+08 proc-vmstat.numa_local
> > 2195 ±110% +2337.0% 53508 ± 55% proc-vmstat.numa_pte_updates
> > 1.243e+08 -3.2% 1.203e+08 proc-vmstat.pgalloc_normal
> > 875909 ± 2% +8.6% 951378 ± 2% proc-vmstat.pgfault
> > 1.231e+08 -3.5% 1.188e+08 proc-vmstat.pgfree
> > 6.903e+10 -5.6% 6.514e+10 perf-stat.i.branch-instructions
> > 0.21 +0.0 0.26 perf-stat.i.branch-miss-rate%
> > 89225177 ± 2% +38.3% 1.234e+08 perf-stat.i.branch-misses
> > 25.64 ± 2% -5.7 19.95 ± 2% perf-stat.i.cache-miss-rate%
> > 9.322e+08 ± 2% +22.8% 1.145e+09 perf-stat.i.cache-references
> > 4553621 ± 2% +39.8% 6363761 perf-stat.i.context-switches
> > 1.12 +4.5% 1.17 perf-stat.i.cpi
> > 186890 ± 2% +143.9% 455784 perf-stat.i.cpu-migrations
> > 2.787e+11 -4.9% 2.649e+11 perf-stat.i.instructions
> > 0.91 -4.4% 0.87 perf-stat.i.ipc
> > 36.79 ± 2% +44.9% 53.30 perf-stat.i.metric.K/sec
> > 0.13 ± 2% +0.1 0.19 perf-stat.overall.branch-miss-rate%
> > 24.44 ± 2% -4.7 19.74 ± 2% perf-stat.overall.cache-miss-rate%
> > 1.12 +4.6% 1.17 perf-stat.overall.cpi
> > 0.89 -4.4% 0.85 perf-stat.overall.ipc
> > 6.755e+10 -5.4% 6.392e+10 perf-stat.ps.branch-instructions
> > 87121352 ± 2% +38.5% 1.206e+08 perf-stat.ps.branch-misses
> > 9.098e+08 ± 2% +23.1% 1.12e+09 perf-stat.ps.cache-references
> > 4443812 ± 2% +39.9% 6218298 perf-stat.ps.context-switches
> > 181595 ± 2% +144.5% 443985 perf-stat.ps.cpu-migrations
> > 2.727e+11 -4.7% 2.599e+11 perf-stat.ps.instructions
> > 1.21e+13 +4.3% 1.262e+13 perf-stat.total.instructions
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.ctx_resched.event_function.remote_function.generic_exec_single
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.ctx_resched.event_function.remote_function.generic_exec_single.smp_call_function_single
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.event_function.remote_function.generic_exec_single.smp_call_function_single.event_function_call
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.ioctl.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.perf_event_for_each_child._perf_ioctl.perf_ioctl.__x64_sys_ioctl.do_syscall_64
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.perf_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.remote_function.generic_exec_single.smp_call_function_single.event_function_call.perf_event_for_each_child
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_for_each_child._perf_ioctl.perf_ioctl
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.perf_c2c__record.run_builtin.handle_internal_command
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record.run_builtin
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.cmd_record.perf_c2c__record.run_builtin.handle_internal_command.main
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_c2c__record.run_builtin.handle_internal_command.main
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record.perf_c2c__record
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.calltrace.cycles-pp.perf_evsel__run_ioctl.perf_evsel__enable_cpu.__evlist__enable.__cmd_record.cmd_record
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.__intel_pmu_enable_all
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.__x64_sys_ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp._perf_ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.ctx_resched
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.event_function
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.generic_exec_single
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.perf_event_for_each_child
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.perf_ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.children.cycles-pp.remote_function
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.__evlist__enable
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_c2c__record
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_evsel__enable_cpu
> > 11.84 ± 91% -8.4 3.49 ±154% perf-profile.children.cycles-pp.perf_evsel__run_ioctl
> > 11.84 ± 91% -9.5 2.30 ±141% perf-profile.self.cycles-pp.__intel_pmu_enable_all
> > 23.74 ±185% -98.6% 0.34 ±114% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
> > 12.77 ± 80% -83.9% 2.05 ±138% perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> > 5.93 ± 69% -90.5% 0.56 ±105% perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> > 6.70 ±152% -94.5% 0.37 ±145% perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> > 0.82 ± 85% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 8.59 ±202% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 13.53 ± 11% -47.0% 7.18 ± 76% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > 15.63 ± 17% -100.0% 0.00 perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 47.22 ± 77% -85.5% 6.87 ±144% perf-sched.sch_delay.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> > 133.35 ±132% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 68.01 ±203% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 13.53 ± 11% -47.0% 7.18 ± 76% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > 34.59 ± 3% -100.0% 0.00 perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 40.97 ± 8% -71.8% 11.55 ± 64% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
> > 373.07 ±123% -99.8% 0.78 ±156% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> > 13.53 ± 11% -62.0% 5.14 ±107% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > 120.97 ± 23% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 46.03 ± 30% -62.5% 17.27 ± 87% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> > 984.50 ± 14% -43.5% 556.24 ± 58% perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
> > 339.42 ± 12% -97.3% 9.11 ± 54% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> > 8.00 ± 23% -85.4% 1.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> > 22.17 ± 49% -100.0% 0.00 perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 73.83 ± 20% -76.3% 17.50 ± 96% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> > 13.53 ± 11% -62.0% 5.14 ±107% perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > 336.30 ± 5% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 23.74 ±185% -98.6% 0.34 ±114% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
> > 14.48 ± 61% -74.1% 3.76 ±152% perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> > 6.48 ± 68% -91.3% 0.56 ±105% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> > 6.70 ±152% -94.5% 0.37 ±145% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> > 2.18 ± 75% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 10.79 ±165% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 1.53 ±100% -97.5% 0.04 ± 84% perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> > 105.34 ± 26% -100.0% 0.00 perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 29.72 ± 40% -76.5% 7.00 ±102% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> > 32.21 ± 33% -65.7% 11.04 ± 85% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
> > 984.49 ± 14% -43.5% 556.23 ± 58% perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
> > 337.00 ± 12% -97.6% 8.11 ± 52% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> > 53.42 ± 59% -69.8% 16.15 ±162% perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> > 218.65 ± 83% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 82.52 ±162% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 10.89 ± 98% -98.8% 0.13 ±134% perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> > 334.02 ± 6% -100.0% 0.00 perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >
> >
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> > gcc-12/performance/socket/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 161258 -12.6% 141018 ± 5% perf-c2c.HITM.total
> > 6514 ± 3% +13.3% 7381 ± 3% uptime.idle
> > 692218 +17.8% 815512 vmstat.system.in
> > 4.747e+08 ± 7% +137.3% 1.127e+09 ± 21% cpuidle..time
> > 5702271 ± 12% +503.6% 34419686 ± 13% cpuidle..usage
> > 141191 ± 2% +10.3% 155768 ± 3% meminfo.PageTables
> > 62180 +26.0% 78348 meminfo.Percpu
> > 2.20 ± 14% +3.5 5.67 ± 20% mpstat.cpu.all.idle%
> > 0.55 +0.2 0.72 ± 5% mpstat.cpu.all.irq%
> > 0.04 ± 2% +0.0 0.06 ± 5% mpstat.cpu.all.soft%
> > 448780 -2.9% 435554 hackbench.throughput
> > 440656 -2.6% 429130 hackbench.throughput_avg
> > 448780 -2.9% 435554 hackbench.throughput_best
> > 425797 -2.2% 416584 hackbench.throughput_worst
> > 90998790 -15.0% 77364427 ± 6% hackbench.time.involuntary_context_switches
> > 12446 -3.9% 11960 hackbench.time.percent_of_cpu_this_job_got
> > 16057 -1.4% 15825 hackbench.time.system_time
> > 63421 -2.3% 61955 proc-vmstat.nr_kernel_stack
> > 35455 ± 2% +10.0% 38991 ± 3% proc-vmstat.nr_page_table_pages
> > 34542 +5.1% 36312 ± 2% proc-vmstat.nr_slab_reclaimable
> > 151083 ± 16% +46.6% 221509 ± 17% proc-vmstat.numa_hint_faults
> > 113731 ± 26% +64.7% 187314 ± 20% proc-vmstat.numa_hint_faults_local
> > 133591 +3.1% 137709 proc-vmstat.numa_other
> > 53696 ± 16% -28.6% 38362 ± 10% proc-vmstat.numa_pages_migrated
> > 1053504 ± 2% +7.7% 1135052 ± 4% proc-vmstat.pgfault
> > 2077549 ± 3% +8.5% 2254157 ± 4% proc-vmstat.pgfree
> > 53696 ± 16% -28.6% 38362 ± 10% proc-vmstat.pgmigrate_success
> > 4.941e+10 -2.6% 4.81e+10 perf-stat.i.branch-instructions
> > 2.232e+08 -1.9% 2.189e+08 perf-stat.i.branch-misses
> > 2.11e+09 -5.8% 1.989e+09 ± 2% perf-stat.i.cache-references
> > 3.221e+11 -2.5% 3.141e+11 perf-stat.i.cpu-cycles
> > 2.365e+11 -2.7% 2.303e+11 perf-stat.i.instructions
> > 6787 ± 3% +8.0% 7327 ± 4% perf-stat.i.minor-faults
> > 6789 ± 3% +8.0% 7329 ± 4% perf-stat.i.page-faults
> > 4.904e+10 -2.5% 4.779e+10 perf-stat.ps.branch-instructions
> > 2.215e+08 -1.8% 2.174e+08 perf-stat.ps.branch-misses
> > 2.094e+09 -5.7% 1.974e+09 ± 2% perf-stat.ps.cache-references
> > 3.197e+11 -2.4% 3.12e+11 perf-stat.ps.cpu-cycles
> > 2.348e+11 -2.6% 2.288e+11 perf-stat.ps.instructions
> > 6691 ± 3% +7.2% 7174 ± 4% perf-stat.ps.minor-faults
> > 6693 ± 3% +7.2% 7176 ± 4% perf-stat.ps.page-faults
> > 7475567 +16.4% 8699139 sched_debug.cfs_rq:/.avg_vruntime.avg
> > 8752154 ± 3% +20.6% 10551563 ± 4% sched_debug.cfs_rq:/.avg_vruntime.max
> > 211424 ± 12% +374.5% 1003211 ± 39% sched_debug.cfs_rq:/.avg_vruntime.stddev
> > 19.44 ± 6% +29.4% 25.17 ± 5% sched_debug.cfs_rq:/.h_nr_queued.max
> > 4.49 ± 4% +33.5% 5.99 ± 4% sched_debug.cfs_rq:/.h_nr_queued.stddev
> > 19.33 ± 6% +29.0% 24.94 ± 5% sched_debug.cfs_rq:/.h_nr_runnable.max
> > 4.47 ± 4% +33.4% 5.96 ± 3% sched_debug.cfs_rq:/.h_nr_runnable.stddev
> > 6446 ±223% +885.4% 63529 ± 57% sched_debug.cfs_rq:/.left_deadline.avg
> > 825119 ±223% +613.5% 5886958 ± 44% sched_debug.cfs_rq:/.left_deadline.max
> > 72645 ±223% +713.6% 591074 ± 49% sched_debug.cfs_rq:/.left_deadline.stddev
> > 6446 ±223% +885.5% 63527 ± 57% sched_debug.cfs_rq:/.left_vruntime.avg
> > 825080 ±223% +613.5% 5886805 ± 44% sched_debug.cfs_rq:/.left_vruntime.max
> > 72642 ±223% +713.7% 591058 ± 49% sched_debug.cfs_rq:/.left_vruntime.stddev
> > 4202 ± 8% +1115.1% 51069 ± 61% sched_debug.cfs_rq:/.load.stddev
> > 367.11 +20.2% 441.44 ± 17% sched_debug.cfs_rq:/.load_avg.max
> > 7475567 +16.4% 8699139 sched_debug.cfs_rq:/.min_vruntime.avg
> > 8752154 ± 3% +20.6% 10551563 ± 4% sched_debug.cfs_rq:/.min_vruntime.max
> > 211424 ± 12% +374.5% 1003211 ± 39% sched_debug.cfs_rq:/.min_vruntime.stddev
> > 0.17 ± 16% +39.8% 0.24 ± 6% sched_debug.cfs_rq:/.nr_queued.stddev
> > 6446 ±223% +885.5% 63527 ± 57% sched_debug.cfs_rq:/.right_vruntime.avg
> > 825080 ±223% +613.5% 5886805 ± 44% sched_debug.cfs_rq:/.right_vruntime.max
> > 72642 ±223% +713.7% 591058 ± 49% sched_debug.cfs_rq:/.right_vruntime.stddev
> > 752.39 ± 81% -81.4% 139.72 ± 53% sched_debug.cfs_rq:/.runnable_avg.min
> > 2728 ± 3% +51.2% 4126 ± 8% sched_debug.cfs_rq:/.runnable_avg.stddev
> > 265.50 ± 2% +12.3% 298.07 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
> > 686.78 ± 7% +23.4% 847.76 ± 6% sched_debug.cfs_rq:/.util_est.stddev
> > 19.44 ± 5% +29.7% 25.22 ± 4% sched_debug.cpu.nr_running.max
> > 4.48 ± 5% +34.4% 6.02 ± 3% sched_debug.cpu.nr_running.stddev
> > 67323 ± 14% +130.3% 155017 ± 29% sched_debug.cpu.nr_switches.stddev
> > -20.78 -18.2% -17.00 sched_debug.cpu.nr_uninterruptible.min
> > 0.13 ±100% -85.8% 0.02 ±163% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> > 0.17 ±116% -97.8% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> > 22.92 ±110% -97.4% 0.59 ±137% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
> > 8.10 ± 45% -78.0% 1.78 ±135% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> > 3.14 ± 19% -70.9% 0.91 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> > 39.05 ±149% -97.4% 1.01 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
> > 15.77 ±203% -99.7% 0.04 ±102% perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> > 1.27 ±177% -98.2% 0.02 ±190% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> > 0.20 ±140% -92.4% 0.02 ±201% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.path_openat
> > 86.63 ±221% -99.9% 0.05 ±184% perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> > 0.18 ± 75% -97.0% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> > 0.13 ± 34% -75.5% 0.03 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> > 0.26 ±108% -86.2% 0.04 ±142% perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
> > 2.33 ± 11% -65.8% 0.80 ±107% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> > 0.18 ± 88% -91.1% 0.02 ±194% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> > 0.50 ±145% -92.5% 0.04 ±210% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
> > 0.19 ±116% -98.5% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> > 0.24 ±128% -96.8% 0.01 ±180% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> > 0.99 ± 16% -58.0% 0.42 ±100% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> > 0.27 ±124% -97.5% 0.01 ±141% perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
> > 1.08 ± 28% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.96 ± 93% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 0.53 ±182% -94.2% 0.03 ±158% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> > 0.84 ±160% -93.5% 0.05 ±100% perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > 29.39 ±172% -94.0% 1.78 ±123% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > 21.51 ± 60% -74.7% 5.45 ±118% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 13.77 ± 61% -81.3% 2.57 ±113% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 11.22 ± 33% -74.5% 2.86 ±107% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> > 1.99 ± 90% -90.1% 0.20 ±100% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
> > 4.50 ±138% -94.9% 0.23 ±200% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
> > 27.91 ±218% -99.6% 0.11 ±120% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> > 9.91 ± 51% -68.3% 3.15 ±124% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> > 10.18 ± 24% -62.4% 3.83 ±105% perf-sched.sch_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
> > 1.16 ± 20% -62.7% 0.43 ±106% perf-sched.sch_delay.avg.ms.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
> > 0.27 ± 99% -92.0% 0.02 ±172% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> > 0.32 ±128% -98.9% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__get_user_pages.get_user_pages_remote.get_arg_page.copy_strings
> > 0.88 ± 94% -86.7% 0.12 ±144% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> > 252.53 ±128% -98.4% 4.12 ±138% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
> > 60.22 ± 58% -67.8% 19.37 ±146% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
> > 168.93 ±209% -99.9% 0.15 ±100% perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
> > 3.79 ±169% -98.6% 0.05 ±199% perf-sched.sch_delay.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
> > 517.19 ±222% -99.9% 0.29 ±201% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> > 0.54 ± 82% -98.4% 0.01 ±141% perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.path_openat
> > 0.34 ± 57% -93.1% 0.02 ±203% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
> > 0.64 ±141% -99.4% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.commit_merge
> > 0.28 ±111% -97.2% 0.01 ±180% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> > 0.29 ±114% -97.6% 0.01 ±141% perf-sched.sch_delay.max.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
> > 133.30 ± 46% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 12.53 ±135% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 1.11 ± 85% -76.9% 0.26 ±202% perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> > 7.48 ±214% -99.0% 0.08 ±141% perf-sched.sch_delay.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> > 28.59 ±191% -99.0% 0.28 ±120% perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > 285.16 ±145% -99.3% 1.94 ±111% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > 143.71 ±128% -91.0% 12.97 ±134% perf-
> > sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_f
> > unction_single.[unknown].[unknown]
> > 107.10 ±162% -99.1% 0.95 ±190% perf-
> > sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_ep
> > oll_wait.__x64_sys_epoll_wait
> > 352.73 ±216% -99.4% 2.06 ±118% perf-
> > sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_
> > completion_state.kernel_clone
> > 1169 ± 25% -58.7% 482.79 ±101% perf-
> > sched.sch_delay.max.ms.schedule_timeout.unix_stream_read_generic.un
> > ix_stream_recvmsg.sock_recvmsg
> > 1.80 ± 20% -58.5% 0.75 ±105% perf-
> > sched.total_sch_delay.average.ms
> > 5.09 ± 20% -58.0% 2.14 ±106% perf-
> > sched.total_wait_and_delay.average.ms
> > 20.86 ± 25% -82.0% 3.76 ±147% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.al
> > loc_slab_obj_exts.allocate_slab.___slab_alloc
> > 8.10 ± 21% -69.1% 2.51 ±103% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_cal
> > ler_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> > 22.82 ± 27% -66.9% 7.55 ±103% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine
> > _move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> > 6.55 ± 13% -64.1% 2.35 ±108% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_no
> > prof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> > 139.95 ± 55% -64.0% 50.45 ±122% perf-
> > sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.
> > ksys_read
> > 27.54 ± 61% -81.3% 5.15 ±113% perf-
> > sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_a
> > pic_timer_interrupt.[unknown]
> > 27.75 ± 30% -73.3% 7.42 ±106% perf-
> > sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_a
> > pic_timer_interrupt.[unknown].[unknown]
> > 26.76 ± 25% -64.2% 9.57 ±107% perf-
> > sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_r
> > eschedule_ipi.[unknown].[unknown]
> > 29.39 ± 34% -67.3% 9.61 ±115% perf-
> > sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp
> > _kthread.kthread
> > 27.53 ± 25% -62.9% 10.21 ±105% perf-
> > sched.wait_and_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.u
> > nix_stream_sendmsg.sock_write_iter
> > 3.25 ± 20% -62.2% 1.23 ±106% perf-
> > sched.wait_and_delay.avg.ms.schedule_timeout.unix_stream_read_gener
> > ic.unix_stream_recvmsg.sock_recvmsg
> > 864.18 ± 4% -99.3% 6.27 ±103% perf-
> > sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 141.47 ± 38% -72.9% 38.27 ±154% perf-
> > sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.al
> > loc_slab_obj_exts.allocate_slab.___slab_alloc
> > 2346 ± 25% -58.7% 969.53 ±101% perf-
> > sched.wait_and_delay.max.ms.schedule_timeout.unix_stream_read_gener
> > ic.unix_stream_recvmsg.sock_recvmsg
> > 83.99 ±223% -100.0% 0.02 ±163% perf-
> > sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.a
> > lloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> > 0.16 ±122% -97.7% 0.00 ±223% perf-
> > sched.wait_time.avg.ms.__cond_resched.__get_user_pages.get_user_pag
> > es_remote.get_arg_page.copy_strings
> > 12.76 ± 37% -81.6% 2.35 ±125% perf-
> > sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_s
> > lab_obj_exts.allocate_slab.___slab_alloc
> > 4.96 ± 22% -67.9% 1.59 ±104% perf-
> > sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_n
> > oprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> > 75.22 ± 91% -96.4% 2.67 ±223% perf-
> > sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vm
> > as.free_pgtables.exit_mmap
> > 23.31 ±188% -98.8% 0.28 ±195% perf-
> > sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_page
> > s.tlb_finish_mmu.exit_mmap.__mmput
> > 14.93 ± 22% -68.0% 4.78 ±104% perf-
> > sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move
> > _task.__set_cpus_allowed_ptr.__sched_setaffinity
> > 1.29 ±178% -98.5% 0.02 ±185% perf-
> > sched.wait_time.avg.ms.__cond_resched.down_read.acct_collect.do_exi
> > t.do_group_exit
> > 0.20 ±140% -92.5% 0.02 ±200% perf-
> > sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link
> > _path_walk.path_openat
> > 87.29 ±221% -99.9% 0.05 ±184% perf-
> > sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.d
> > o_syscall_64
> > 0.18 ± 76% -97.0% 0.01 ±141% perf-
> > sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk
> > .path_openat
> > 0.12 ± 33% -87.4% 0.02 ±212% perf-
> > sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_open
> > at.do_filp_open
> > 4.22 ± 15% -63.3% 1.55 ±108% perf-
> > sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.
> > __alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> > 0.18 ± 88% -91.1% 0.02 ±194% perf-
> > sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc
> > _empty_file.path_openat.do_filp_open
> > 0.50 ±145% -92.5% 0.04 ±210% perf-
> > sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getna
> > me_flags.part.0
> > 0.19 ±116% -98.5% 0.00 ±223% perf-
> > sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_a
> > lloc_nodes.mas_preallocate.commit_merge
> > 0.24 ±128% -96.8% 0.01 ±180% perf-
> > sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_ar
> > ea_dup.__split_vma.vms_gather_munmap_vmas
> > 1.79 ± 27% -100.0% 0.00 perf-
> > sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 1.98 ± 92% -100.0% 0.00 perf-
> > sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 2.44 ±199% -98.1% 0.05 ±109% perf-
> > sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.i
> > sra.0
> > 125.16 ± 52% -64.6% 44.36 ±120% perf-
> > sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_
> > read
> > 13.77 ± 61% -81.3% 2.58 ±113% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown]
> > 16.53 ± 29% -72.5% 4.55 ±106% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown].[unknown]
> > 3.11 ± 80% -80.7% 0.60 ±138% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_f
> > unction_single.[unknown].[unknown]
> > 17.30 ± 23% -65.0% 6.05 ±107% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_resche
> > dule_ipi.[unknown].[unknown]
> > 50.76 ±143% -98.1% 0.97 ±101% perf-
> > sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_
> > completion_state.kernel_clone
> > 19.48 ± 27% -66.8% 6.46 ±111% perf-
> > sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> > 17.35 ± 25% -63.3% 6.37 ±106% perf-
> > sched.wait_time.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_s
> > tream_sendmsg.sock_write_iter
> > 2.09 ± 21% -62.0% 0.79 ±107% perf-
> > sched.wait_time.avg.ms.schedule_timeout.unix_stream_read_generic.un
> > ix_stream_recvmsg.sock_recvmsg
> > 850.73 ± 6% -99.3% 5.76 ±102% perf-
> > sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 168.00 ±223% -100.0% 0.02 ±172% perf-
> > sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.a
> > lloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
> > 0.32 ±131% -98.8% 0.00 ±223% perf-
> > sched.wait_time.max.ms.__cond_resched.__get_user_pages.get_user_pag
> > es_remote.get_arg_page.copy_strings
> > 0.88 ± 94% -86.7% 0.12 ±144% perf-
> > sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_e
> > vent_mmap_event.perf_event_mmap.__mmap_region
> > 83.05 ± 45% -75.0% 20.78 ±142% perf-
> > sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_s
> > lab_obj_exts.allocate_slab.___slab_alloc
> > 393.39 ± 76% -96.3% 14.60 ±223% perf-
> > sched.wait_time.max.ms.__cond_resched.__put_anon_vma.unlink_anon_vm
> > as.free_pgtables.exit_mmap
> > 3.87 ±170% -98.6% 0.05 ±199% perf-
> > sched.wait_time.max.ms.__cond_resched.down_read.acct_collect.do_exi
> > t.do_group_exit
> > 520.88 ±222% -99.9% 0.29 ±201% perf-
> > sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.d
> > o_syscall_64
> > 0.54 ± 82% -98.4% 0.01 ±141% perf-
> > sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk
> > .path_openat
> > 0.34 ± 57% -93.1% 0.02 ±203% perf-
> > sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc
> > _empty_file.path_openat.do_filp_open
> > 0.64 ±141% -99.4% 0.00 ±223% perf-
> > sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_a
> > lloc_nodes.mas_preallocate.commit_merge
> > 0.28 ±111% -97.2% 0.01 ±180% perf-
> > sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_ar
> > ea_dup.__split_vma.vms_gather_munmap_vmas
> > 210.15 ± 42% -100.0% 0.00 perf-
> > sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 34.48 ±131% -100.0% 0.00 perf-
> > sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 1.11 ± 85% -76.9% 0.26 ±202% perf-
> > sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.par
> > t.0
> > 92.32 ±212% -99.7% 0.27 ±123% perf-
> > sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.i
> > sra.0
> > 3252 ± 21% -58.5% 1351 ±103% perf-
> > sched.wait_time.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_
> > read
> > 1602 ± 28% -66.2% 541.12 ±100% perf-
> > sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_S
> > YSCALL_64_after_hwframe.[unknown]
> > 530.17 ± 95% -98.5% 7.79 ±119% perf-
> > sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_
> > completion_state.kernel_clone
> > 1177 ± 25% -58.7% 486.74 ±101% perf-
> > sched.wait_time.max.ms.schedule_timeout.unix_stream_read_generic.un
> > ix_stream_recvmsg.sock_recvmsg
> > 50.88 -1.4 49.53 perf-
> > profile.calltrace.cycles-pp.read
> > 45.95 -1.0 44.92 perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
> > 45.66 -1.0 44.64 perf-
> > profile.calltrace.cycles-
> > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> > 3.44 ± 4% -0.8 2.66 ± 4% perf-
> > profile.calltrace.cycles-
> > pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_strea
> > m_sendmsg.sock_write_iter
> > 3.32 ± 4% -0.8 2.56 ± 4% perf-
> > profile.calltrace.cycles-
> > pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.soc
> > k_def_readable.unix_stream_sendmsg
> > 3.28 ± 4% -0.8 2.52 ± 4% perf-
> > profile.calltrace.cycles-
> > pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_
> > up_sync_key.sock_def_readable
> > 3.48 ± 3% -0.6 2.83 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.schedule.schedule_timeout.unix_stream_read_generic.unix_stream_r
> > ecvmsg.sock_recvmsg
> > 3.52 ± 3% -0.6 2.87 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.schedule_timeout.unix_stream_read_generic.unix_stream_recvmsg.so
> > ck_recvmsg.sock_read_iter
> > 3.45 ± 3% -0.6 2.80 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.__schedule.schedule.schedule_timeout.unix_stream_read_generic.un
> > ix_stream_recvmsg
> > 47.06 -0.6 46.45 perf-
> > profile.calltrace.cycles-pp.write
> > 4.26 ± 5% -0.6 3.69 perf-
> > profile.calltrace.cycles-
> > pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_wr
> > ite_iter.vfs_write
> > 1.58 ± 3% -0.6 1.02 ± 8% perf-
> > profile.calltrace.cycles-
> > pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_
> > up_common.__wake_up_sync_key
> > 1.31 ± 3% -0.5 0.85 ± 8% perf-
> > profile.calltrace.cycles-
> > pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_fun
> > ction.__wake_up_common
> > 1.25 ± 3% -0.4 0.81 ± 8% perf-
> > profile.calltrace.cycles-
> > pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.a
> > utoremove_wake_function
> > 0.84 ± 3% -0.2 0.60 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwfr
> > ame.read
> > 7.91 -0.2 7.68 perf-
> > profile.calltrace.cycles-
> > pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recv
> > msg.sock_recvmsg.sock_read_iter
> > 3.17 ± 2% -0.2 2.94 perf-
> > profile.calltrace.cycles-
> > pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.
> > vfs_write.ksys_write
> > 7.80 -0.2 7.58 perf-
> > profile.calltrace.cycles-
> > pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_g
> > eneric.unix_stream_recvmsg.sock_recvmsg
> > 7.58 -0.2 7.36 perf-
> > profile.calltrace.cycles-
> > pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_acto
> > r.unix_stream_read_generic.unix_stream_recvmsg
> > 1.22 ± 4% -0.2 1.02 ± 4% perf-
> > profile.calltrace.cycles-
> > pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stre
> > am_read_generic
> > 1.18 ± 4% -0.2 0.99 ± 4% perf-
> > profile.calltrace.cycles-
> > pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule
> > _timeout
> > 0.87 -0.2 0.68 ± 8% perf-
> > profile.calltrace.cycles-
> > pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedul
> > e_timeout
> > 1.14 ± 4% -0.2 0.95 ± 4% perf-
> > profile.calltrace.cycles-
> > pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.
> > schedule
> > 0.90 -0.2 0.72 ± 7% perf-
> > profile.calltrace.cycles-
> > pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_strea
> > m_read_generic
> > 3.45 ± 3% -0.1 3.30 perf-
> > profile.calltrace.cycles-
> > pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_st
> > ream_read_actor.unix_stream_read_generic
> > 1.96 -0.1 1.82 perf-
> > profile.calltrace.cycles-pp.clear_bhb_loop.read
> > 1.97 -0.1 1.86 perf-
> > profile.calltrace.cycles-pp.clear_bhb_loop.write
> > 2.35 -0.1 2.25 perf-
> > profile.calltrace.cycles-
> > pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.
> > kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> > 2.58 -0.1 2.48 perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64.read
> > 1.38 ± 4% -0.1 1.28 ± 2% perf-
> > profile.calltrace.cycles-
> > pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.
> > sock_write_iter.vfs_write
> > 1.35 -0.1 1.25 perf-
> > profile.calltrace.cycles-
> > pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_send
> > msg.sock_write_iter.vfs_write
> > 0.67 ± 7% -0.1 0.58 ± 3% perf-
> > profile.calltrace.cycles-
> > pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_t
> > ask.__schedule
> > 2.59 -0.1 2.50 perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64.write
> > 2.02 -0.1 1.96 perf-
> > profile.calltrace.cycles-
> > pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__allo
> > c_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> > 0.77 ± 3% -0.0 0.72 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwfram
> > e.write
> > 0.65 ± 4% -0.0 0.60 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > .read
> > 0.74 -0.0 0.70 perf-
> > profile.calltrace.cycles-
> > pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_rec
> > vmsg.sock_read_iter
> > 1.04 -0.0 0.99 perf-
> > profile.calltrace.cycles-
> > pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.__kmalloc
> > _node_track_caller_noprof.kmalloc_reserve.__alloc_skb
> > 0.69 -0.0 0.65 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.check_heap_object.__check_object_size.skb_copy_datagram_from_ite
> > r.unix_stream_sendmsg.sock_write_iter
> > 0.82 -0.0 0.80 perf-
> > profile.calltrace.cycles-
> > pp.obj_cgroup_charge_account.__memcg_slab_post_alloc_hook.kmem_cach
> > e_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
> > 0.57 -0.0 0.56 perf-
> > profile.calltrace.cycles-
> > pp.refill_obj_stock.__memcg_slab_free_hook.kmem_cache_free.unix_str
> > eam_read_generic.unix_stream_recvmsg
> > 0.80 ± 9% +0.2 1.01 ± 8% perf-
> > profile.calltrace.cycles-
> > pp._raw_spin_lock_irqsave.__wake_up_sync_key.sock_def_readable.unix
> > _stream_sendmsg.sock_write_iter
> > 2.50 ± 4% +0.3 2.82 ± 9% perf-
> > profile.calltrace.cycles-
> > pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb
> > _with_frags.sock_alloc_send_pskb
> > 2.64 ± 6% +0.4 3.06 ± 12% perf-
> > profile.calltrace.cycles-
> > pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_pa
> > rtials.kmem_cache_free.unix_stream_read_generic
> > 2.73 ± 6% +0.4 3.16 ± 12% perf-
> > profile.calltrace.cycles-
> > pp._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_strea
> > m_read_generic.unix_stream_recvmsg
> > 2.87 ± 6% +0.4 3.30 ± 12% perf-
> > profile.calltrace.cycles-
> > pp.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_str
> > eam_recvmsg.sock_recvmsg
> > 18.38 +0.6 18.93 perf-
> > profile.calltrace.cycles-
> > pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_wri
> > te.ksys_write
> > 0.00 +0.7 0.70 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_
> > enter.cpuidle_enter_state
> > 0.00 +0.8 0.76 ± 16% perf-
> > profile.calltrace.cycles-
> > pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_stream_send
> > msg.sock_write_iter.vfs_write
> > 0.00 +1.5 1.46 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_
> > state.cpuidle_enter
> > 0.00 +1.5 1.46 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_e
> > nter.cpuidle_idle_call
> > 0.00 +1.5 1.46 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_c
> > all.do_idle
> > 0.00 +1.5 1.50 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_
> > startup_entry
> > 0.00 +1.5 1.52 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_
> > secondary
> > 0.00 +1.6 1.61 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.comm
> > on_startup_64
> > 0.18 ±141% +1.8 1.93 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> > 0.18 ±141% +1.8 1.94 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.cpu_startup_entry.start_secondary.common_startup_64
> > 0.18 ±141% +1.8 1.94 ± 11% perf-
> > profile.calltrace.cycles-pp.start_secondary.common_startup_64
> > 0.18 ±141% +1.8 1.97 ± 11% perf-
> > profile.calltrace.cycles-pp.common_startup_64
> > 0.00 +2.0 1.96 ± 11% perf-
> > profile.calltrace.cycles-
> > pp.asm_sysvec_call_function_single.pv_native_safe_halt.acpi_safe_ha
> > lt.acpi_idle_do_entry.acpi_idle_enter
> > 87.96 -1.4 86.57 perf-
> > profile.children.cycles-pp.do_syscall_64
> > 88.72 -1.4 87.33 perf-
> > profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 51.44 -1.4 50.05 perf-
> > profile.children.cycles-pp.read
> > 4.55 ± 2% -0.8 3.74 ± 5% perf-
> > profile.children.cycles-pp.schedule
> > 3.76 ± 4% -0.7 3.02 ± 3% perf-
> > profile.children.cycles-pp.__wake_up_common
> > 3.64 ± 4% -0.7 2.92 ± 3% perf-
> > profile.children.cycles-pp.autoremove_wake_function
> > 3.60 ± 4% -0.7 2.90 ± 3% perf-
> > profile.children.cycles-pp.try_to_wake_up
> > 4.00 ± 2% -0.6 3.36 ± 4% perf-
> > profile.children.cycles-pp.schedule_timeout
> > 4.65 ± 2% -0.6 4.02 ± 4% perf-
> > profile.children.cycles-pp.__schedule
> > 47.64 -0.6 47.01 perf-
> > profile.children.cycles-pp.write
> > 4.58 ± 4% -0.5 4.06 perf-
> > profile.children.cycles-pp.__wake_up_sync_key
> > 1.45 ± 2% -0.4 1.00 ± 5% perf-
> > profile.children.cycles-pp.exit_to_user_mode_loop
> > 1.84 ± 3% -0.3 1.50 ± 3% perf-
> > profile.children.cycles-pp.ttwu_do_activate
> > 1.62 ± 2% -0.3 1.33 ± 3% perf-
> > profile.children.cycles-pp.enqueue_task
> > 1.53 ± 2% -0.3 1.26 ± 3% perf-
> > profile.children.cycles-pp.enqueue_task_fair
> > 1.40 -0.3 1.14 ± 6% perf-
> > profile.children.cycles-pp.pick_next_task_fair
> > 3.97 -0.2 3.73 perf-
> > profile.children.cycles-pp.clear_bhb_loop
> > 1.43 -0.2 1.19 ± 5% perf-
> > profile.children.cycles-pp.__pick_next_task
> > 0.75 ± 4% -0.2 0.52 ± 8% perf-
> > profile.children.cycles-pp.raw_spin_rq_lock_nested
> > 7.95 -0.2 7.72 perf-
> > profile.children.cycles-pp.unix_stream_read_actor
> > 7.84 -0.2 7.61 perf-
> > profile.children.cycles-pp.skb_copy_datagram_iter
> > 3.24 ± 2% -0.2 3.01 perf-
> > profile.children.cycles-pp.skb_copy_datagram_from_iter
> > 7.63 -0.2 7.42 perf-
> > profile.children.cycles-pp.__skb_datagram_iter
> > 0.94 ± 4% -0.2 0.73 ± 4% perf-
> > profile.children.cycles-pp.enqueue_entity
> > 0.95 ± 8% -0.2 0.76 ± 4% perf-
> > profile.children.cycles-pp.update_curr
> > 1.37 ± 3% -0.2 1.18 ± 3% perf-
> > profile.children.cycles-pp.dequeue_task_fair
> > 1.34 ± 4% -0.2 1.16 ± 3% perf-
> > profile.children.cycles-pp.try_to_block_task
> > 4.50 -0.2 4.34 perf-
> > profile.children.cycles-pp.__memcg_slab_post_alloc_hook
> > 1.37 ± 3% -0.2 1.20 ± 3% perf-
> > profile.children.cycles-pp.dequeue_entities
> > 3.48 ± 3% -0.1 3.33 perf-
> > profile.children.cycles-pp._copy_to_iter
> > 0.91 -0.1 0.78 ± 3% perf-
> > profile.children.cycles-pp.update_load_avg
> > 4.85 -0.1 4.72 perf-
> > profile.children.cycles-pp.__check_object_size
> > 3.23 -0.1 3.11 perf-
> > profile.children.cycles-pp.entry_SYSCALL_64
> > 0.54 ± 3% -0.1 0.42 ± 5% perf-
> > profile.children.cycles-pp.switch_mm_irqs_off
> > 1.40 ± 4% -0.1 1.30 ± 2% perf-
> > profile.children.cycles-pp._copy_from_iter
> > 2.02 -0.1 1.92 perf-
> > profile.children.cycles-pp.its_return_thunk
> > 0.43 ± 2% -0.1 0.32 ± 3% perf-
> > profile.children.cycles-pp.switch_fpu_return
> > 0.29 ± 2% -0.1 0.18 ± 6% perf-
> > profile.children.cycles-pp.__enqueue_entity
> > 1.46 ± 3% -0.1 1.36 ± 2% perf-
> > profile.children.cycles-pp.fdget_pos
> > 0.44 ± 3% -0.1 0.34 ± 5% perf-
> > profile.children.cycles-pp.set_next_entity
> > 0.42 ± 2% -0.1 0.32 ± 4% perf-
> > profile.children.cycles-pp.pick_task_fair
> > 0.31 ± 2% -0.1 0.24 ± 6% perf-
> > profile.children.cycles-pp.reweight_entity
> > 0.28 ± 2% -0.1 0.20 ± 7% perf-
> > profile.children.cycles-pp.__dequeue_entity
> > 1.96 -0.1 1.88 perf-
> > profile.children.cycles-pp.obj_cgroup_charge_account
> > 0.28 ± 2% -0.1 0.21 ± 3% perf-
> > profile.children.cycles-pp.update_cfs_group
> > 0.23 ± 2% -0.1 0.16 ± 5% perf-
> > profile.children.cycles-pp.pick_eevdf
> > 0.26 ± 2% -0.1 0.19 ± 4% perf-
> > profile.children.cycles-pp.wakeup_preempt
> > 1.46 -0.1 1.40 perf-
> > profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 0.48 ± 2% -0.1 0.42 ± 5% perf-
> > profile.children.cycles-pp.__rseq_handle_notify_resume
> > 0.30 -0.1 0.24 ± 4% perf-
> > profile.children.cycles-pp.restore_fpregs_from_fpstate
> > 0.82 -0.1 0.77 perf-
> > profile.children.cycles-pp.__cond_resched
> > 0.27 ± 2% -0.0 0.22 ± 4% perf-
> > profile.children.cycles-pp.__update_load_avg_se
> > 0.14 ± 3% -0.0 0.10 ± 7% perf-
> > profile.children.cycles-pp.update_curr_se
> > 0.79 -0.0 0.74 perf-
> > profile.children.cycles-pp.mutex_lock
> > 0.34 ± 3% -0.0 0.30 ± 5% perf-
> > profile.children.cycles-pp.rseq_ip_fixup
> > 0.15 ± 4% -0.0 0.11 ± 5% perf-
> > profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> > 0.21 ± 3% -0.0 0.16 ± 4% perf-
> > profile.children.cycles-pp.__switch_to
> > 0.17 ± 4% -0.0 0.13 ± 7% perf-
> > profile.children.cycles-pp.place_entity
> > 0.22 -0.0 0.18 ± 2% perf-
> > profile.children.cycles-pp.wake_affine
> > 0.24 -0.0 0.20 ± 2% perf-
> > profile.children.cycles-pp.check_stack_object
> > 0.64 ± 2% -0.0 0.61 ± 3% perf-
> > profile.children.cycles-pp.__virt_addr_valid
> > 0.38 ± 2% -0.0 0.34 ± 2% perf-
> > profile.children.cycles-pp.tick_nohz_handler
> > 0.18 ± 3% -0.0 0.14 ± 6% perf-
> > profile.children.cycles-pp.update_rq_clock
> > 0.66 -0.0 0.62 perf-
> > profile.children.cycles-pp.rw_verify_area
> > 0.19 -0.0 0.16 ± 4% perf-
> > profile.children.cycles-pp.task_mm_cid_work
> > 0.34 ± 3% -0.0 0.31 ± 2% perf-
> > profile.children.cycles-pp.update_process_times
> > 0.12 ± 8% -0.0 0.08 ± 11% perf-
> > profile.children.cycles-pp.detach_tasks
> > 0.39 ± 3% -0.0 0.36 ± 2% perf-
> > profile.children.cycles-pp.__hrtimer_run_queues
> > 0.21 ± 3% -0.0 0.18 ± 6% perf-
> > profile.children.cycles-pp.__update_load_avg_cfs_rq
> > 0.18 ± 6% -0.0 0.15 ± 4% perf-
> > profile.children.cycles-pp.task_tick_fair
> > 0.25 ± 3% -0.0 0.22 ± 4% perf-
> > profile.children.cycles-pp.rseq_get_rseq_cs
> > 0.23 ± 5% -0.0 0.20 ± 3% perf-
> > profile.children.cycles-pp.sched_tick
> > 0.14 ± 3% -0.0 0.11 ± 6% perf-
> > profile.children.cycles-pp.check_preempt_wakeup_fair
> > 0.11 ± 4% -0.0 0.08 ± 7% perf-
> > profile.children.cycles-pp.update_min_vruntime
> > 0.06 -0.0 0.03 ± 70% perf-
> > profile.children.cycles-pp.update_curr_dl_se
> > 0.14 ± 3% -0.0 0.12 ± 5% perf-
> > profile.children.cycles-pp.put_prev_entity
> > 0.13 ± 5% -0.0 0.10 ± 3% perf-
> > profile.children.cycles-pp.task_h_load
> > 0.68 -0.0 0.65 perf-
> > profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> > 0.46 ± 2% -0.0 0.43 ± 2% perf-
> > profile.children.cycles-pp.hrtimer_interrupt
> > 0.52 -0.0 0.50 perf-
> > profile.children.cycles-pp.scm_recv_unix
> > 0.08 ± 4% -0.0 0.06 ± 9% perf-
> > profile.children.cycles-pp.__cgroup_account_cputime
> > 0.11 ± 5% -0.0 0.09 ± 4% perf-
> > profile.children.cycles-pp.__switch_to_asm
> > 0.46 ± 2% -0.0 0.44 ± 2% perf-
> > profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> > 0.08 ± 8% -0.0 0.06 ± 9% perf-
> > profile.children.cycles-pp.activate_task
> > 0.08 ± 8% -0.0 0.06 ± 9% perf-
> > profile.children.cycles-pp.detach_task
> > 0.11 ± 5% -0.0 0.09 ± 7% perf-
> > profile.children.cycles-pp.os_xsave
> > 0.13 ± 5% -0.0 0.11 ± 6% perf-
> > profile.children.cycles-pp.avg_vruntime
> > 0.13 ± 4% -0.0 0.11 ± 5% perf-
> > profile.children.cycles-pp.update_entity_lag
> > 0.08 ± 4% -0.0 0.06 ± 7% perf-
> > profile.children.cycles-pp.__calc_delta
> > 0.09 ± 5% -0.0 0.07 ± 8% perf-
> > profile.children.cycles-pp.vruntime_eligible
> > 0.34 ± 2% -0.0 0.32 perf-
> > profile.children.cycles-pp._raw_spin_unlock_irqrestore
> > 0.30 -0.0 0.29 ± 2% perf-
> > profile.children.cycles-pp.__build_skb_around
> > 0.08 ± 5% -0.0 0.07 ± 6% perf-
> > profile.children.cycles-pp.rseq_update_cpu_node_id
> > 0.15 -0.0 0.14 perf-
> > profile.children.cycles-pp.security_socket_getpeersec_dgram
> > 0.07 ± 5% +0.0 0.09 ± 5% perf-
> > profile.children.cycles-pp.native_irq_return_iret
> > 0.38 ± 2% +0.0 0.40 ± 2% perf-
> > profile.children.cycles-pp.mod_memcg_lruvec_state
> > 0.27 ± 2% +0.0 0.30 ± 2% perf-
> > profile.children.cycles-pp.prepare_task_switch
> > 0.05 ± 7% +0.0 0.08 ± 8% perf-
> > profile.children.cycles-pp.handle_softirqs
> > 0.06 +0.0 0.09 ± 11% perf-
> > profile.children.cycles-pp.finish_wait
> > 0.06 ± 7% +0.0 0.11 ± 6% perf-
> > profile.children.cycles-pp.__irq_exit_rcu
> > 0.06 ± 8% +0.1 0.11 ± 8% perf-
> > profile.children.cycles-pp.ttwu_queue_wakelist
> > 0.01 ±223% +0.1 0.07 ± 10% perf-
> > profile.children.cycles-pp.ktime_get
> > 0.54 ± 4% +0.1 0.61 perf-
> > profile.children.cycles-pp.select_task_rq
> > 0.00 +0.1 0.07 ± 10% perf-
> > profile.children.cycles-pp.enqueue_dl_entity
> > 0.12 ± 4% +0.1 0.19 ± 7% perf-
> > profile.children.cycles-pp.get_any_partial
> > 0.10 ± 9% +0.1 0.18 ± 5% perf-
> > profile.children.cycles-pp.available_idle_cpu
> > 0.00 +0.1 0.08 ± 9% perf-
> > profile.children.cycles-pp.hrtimer_start_range_ns
> > 0.00 +0.1 0.08 ± 11% perf-
> > profile.children.cycles-pp.dl_server_start
> > 0.00 +0.1 0.08 ± 11% perf-
> > profile.children.cycles-pp.dl_server_stop
> > 0.46 ± 2% +0.1 0.54 ± 2% perf-
> > profile.children.cycles-pp.select_task_rq_fair
> > 0.00 +0.1 0.10 ± 10% perf-
> > profile.children.cycles-pp.select_idle_core
> > 0.09 ± 7% +0.1 0.20 ± 8% perf-
> > profile.children.cycles-pp.select_idle_cpu
> > 0.18 ± 4% +0.1 0.31 ± 6% perf-
> > profile.children.cycles-pp.select_idle_sibling
> > 0.00 +0.2 0.18 ± 4% perf-
> > profile.children.cycles-pp.process_one_work
> > 0.06 ± 13% +0.2 0.25 ± 9% perf-
> > profile.children.cycles-pp.schedule_idle
> > 0.44 ± 2% +0.2 0.64 ± 8% perf-
> > profile.children.cycles-pp.prepare_to_wait
> > 0.00 +0.2 0.21 ± 5% perf-
> > profile.children.cycles-pp.kthread
> > 0.00 +0.2 0.21 ± 5% perf-
> > profile.children.cycles-pp.worker_thread
> > 0.00 +0.2 0.21 ± 4% perf-
> > profile.children.cycles-pp.ret_from_fork
> > 0.00 +0.2 0.21 ± 4% perf-
> > profile.children.cycles-pp.ret_from_fork_asm
> > 0.11 ± 12% +0.3 0.36 ± 9% perf-
> > profile.children.cycles-pp.sched_ttwu_pending
> > 0.31 ± 35% +0.3 0.59 ± 11% perf-
> > profile.children.cycles-pp.__cmd_record
> > 0.26 ± 45% +0.3 0.54 ± 13% perf-
> > profile.children.cycles-pp.perf_session__process_events
> > 0.26 ± 45% +0.3 0.54 ± 13% perf-
> > profile.children.cycles-pp.reader__read_event
> > 0.26 ± 45% +0.3 0.54 ± 13% perf-
> > profile.children.cycles-pp.record__finish_output
> > 0.16 ± 11% +0.3 0.45 ± 9% perf-
> > profile.children.cycles-pp.__flush_smp_call_function_queue
> > 0.14 ± 11% +0.3 0.45 ± 9% perf-
> > profile.children.cycles-pp.__sysvec_call_function_single
> > 0.14 ± 60% +0.3 0.48 ± 17% perf-
> > profile.children.cycles-pp.ordered_events__queue
> > 0.14 ± 61% +0.3 0.48 ± 17% perf-
> > profile.children.cycles-pp.queue_event
> > 0.15 ± 59% +0.3 0.49 ± 16% perf-
> > profile.children.cycles-pp.process_simple
> > 0.16 ± 12% +0.4 0.54 ± 10% perf-
> > profile.children.cycles-pp.sysvec_call_function_single
> > 4.61 ± 3% +0.5 5.13 ± 8% perf-
> > profile.children.cycles-pp.get_partial_node
> > 5.57 ± 3% +0.6 6.12 ± 7% perf-
> > profile.children.cycles-pp.___slab_alloc
> > 18.44 +0.6 19.00 perf-
> > profile.children.cycles-pp.sock_alloc_send_pskb
> > 6.51 ± 3% +0.7 7.26 ± 9% perf-
> > profile.children.cycles-pp.__put_partials
> > 0.33 ± 14% +1.0 1.30 ± 11% perf-
> > profile.children.cycles-pp.asm_sysvec_call_function_single
> > 0.34 ± 17% +1.1 1.47 ± 11% perf-
> > profile.children.cycles-pp.pv_native_safe_halt
> > 0.34 ± 17% +1.1 1.48 ± 11% perf-
> > profile.children.cycles-pp.acpi_safe_halt
> > 0.34 ± 17% +1.1 1.48 ± 11% perf-
> > profile.children.cycles-pp.acpi_idle_do_entry
> > 0.34 ± 17% +1.1 1.48 ± 11% perf-
> > profile.children.cycles-pp.acpi_idle_enter
> > 0.35 ± 17% +1.2 1.53 ± 11% perf-
> > profile.children.cycles-pp.cpuidle_enter_state
> > 0.35 ± 17% +1.2 1.54 ± 11% perf-
> > profile.children.cycles-pp.cpuidle_enter
> > 0.38 ± 17% +1.3 1.63 ± 11% perf-
> > profile.children.cycles-pp.cpuidle_idle_call
> > 0.45 ± 16% +1.5 1.94 ± 11% perf-
> > profile.children.cycles-pp.start_secondary
> > 0.46 ± 17% +1.5 1.96 ± 11% perf-
> > profile.children.cycles-pp.do_idle
> > 0.46 ± 17% +1.5 1.97 ± 11% perf-
> > profile.children.cycles-pp.common_startup_64
> > 0.46 ± 17% +1.5 1.97 ± 11% perf-
> > profile.children.cycles-pp.cpu_startup_entry
> > 13.76 ± 2% +1.7 15.44 ± 5% perf-
> > profile.children.cycles-pp._raw_spin_lock_irqsave
> > 12.09 ± 2% +1.9 14.00 ± 6% perf-
> > profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > 3.93 -0.2 3.69 perf-
> > profile.self.cycles-pp.clear_bhb_loop
> > 3.43 ± 3% -0.1 3.29 perf-
> > profile.self.cycles-pp._copy_to_iter
> > 0.50 ± 2% -0.1 0.39 ± 5% perf-
> > profile.self.cycles-pp.switch_mm_irqs_off
> > 1.37 ± 4% -0.1 1.27 ± 2% perf-
> > profile.self.cycles-pp._copy_from_iter
> > 0.28 ± 2% -0.1 0.18 ± 7% perf-
> > profile.self.cycles-pp.__enqueue_entity
> > 1.41 ± 3% -0.1 1.31 ± 2% perf-
> > profile.self.cycles-pp.fdget_pos
> > 2.51 -0.1 2.42 perf-
> > profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> > 1.35 -0.1 1.28 perf-
> > profile.self.cycles-pp.read
> > 2.24 -0.1 2.17 perf-
> > profile.self.cycles-pp.do_syscall_64
> > 0.27 ± 3% -0.1 0.20 ± 3% perf-
> > profile.self.cycles-pp.update_cfs_group
> > 1.28 -0.1 1.22 perf-
> > profile.self.cycles-pp.sock_write_iter
> > 0.84 -0.1 0.77 perf-
> > profile.self.cycles-pp.vfs_read
> > 1.42 -0.1 1.36 perf-
> > profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 1.20 -0.1 1.14 perf-
> > profile.self.cycles-pp.__alloc_skb
> > 0.18 ± 2% -0.1 0.13 ± 5% perf-
> > profile.self.cycles-pp.pick_eevdf
> > 1.04 -0.1 0.99 perf-
> > profile.self.cycles-pp.its_return_thunk
> > 0.29 ± 2% -0.1 0.24 ± 4% perf-
> > profile.self.cycles-pp.restore_fpregs_from_fpstate
> > 0.28 ± 5% -0.1 0.23 ± 6% perf-
> > profile.self.cycles-pp.update_curr
> > 0.13 ± 5% -0.0 0.08 ± 5% perf-
> > profile.self.cycles-pp.switch_fpu_return
> > 0.20 ± 3% -0.0 0.15 ± 6% perf-
> > profile.self.cycles-pp.__dequeue_entity
> > 1.00 -0.0 0.95 perf-
> > profile.self.cycles-pp.kmem_cache_alloc_node_noprof
> > 0.33 -0.0 0.28 ± 2% perf-
> > profile.self.cycles-pp.update_load_avg
> > 0.88 -0.0 0.83 ± 2% perf-
> > profile.self.cycles-pp.vfs_write
> > 0.91 -0.0 0.86 perf-
> > profile.self.cycles-pp.sock_read_iter
> > 0.13 ± 3% -0.0 0.08 ± 4% perf-
> > profile.self.cycles-pp.update_curr_se
> > 0.25 ± 2% -0.0 0.21 ± 4% perf-
> > profile.self.cycles-pp.__update_load_avg_se
> > 1.22 -0.0 1.18 perf-
> > profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
> > 0.68 -0.0 0.63 perf-
> > profile.self.cycles-pp.__check_object_size
> > 0.78 ± 2% -0.0 0.74 perf-
> > profile.self.cycles-pp.obj_cgroup_charge_account
> > 0.20 ± 3% -0.0 0.16 ± 4% perf-
> > profile.self.cycles-pp.__switch_to
> > 0.15 ± 3% -0.0 0.11 ± 4% perf-
> > profile.self.cycles-pp.try_to_wake_up
> > 0.90 -0.0 0.86 perf-
> > profile.self.cycles-pp.entry_SYSCALL_64
> > 0.76 ± 2% -0.0 0.73 perf-
> > profile.self.cycles-pp.__check_heap_object
> > 0.92 -0.0 0.89 ± 2% perf-
> > profile.self.cycles-pp.__account_obj_stock
> > 0.19 ± 2% -0.0 0.16 ± 2% perf-
> > profile.self.cycles-pp.check_stack_object
> > 0.40 ± 3% -0.0 0.37 perf-
> > profile.self.cycles-pp.__schedule
> > 0.60 ± 2% -0.0 0.56 ± 3% perf-
> > profile.self.cycles-pp.__virt_addr_valid
> > 0.71 -0.0 0.68 perf-
> > profile.self.cycles-pp.__skb_datagram_iter
> > 0.18 ± 4% -0.0 0.14 ± 5% perf-
> > profile.self.cycles-pp.task_mm_cid_work
> > 0.68 -0.0 0.65 perf-
> > profile.self.cycles-pp.refill_obj_stock
> > 0.34 -0.0 0.31 ± 2% perf-
> > profile.self.cycles-pp.unix_stream_recvmsg
> > 0.06 ± 7% -0.0 0.03 ± 70% perf-
> > profile.self.cycles-pp.enqueue_task
> > 0.11 -0.0 0.08 perf-
> > profile.self.cycles-pp.pick_task_fair
> > 0.15 ± 2% -0.0 0.12 ± 3% perf-
> > profile.self.cycles-pp.enqueue_task_fair
> > 0.20 ± 3% -0.0 0.17 ± 7% perf-
> > profile.self.cycles-pp.__update_load_avg_cfs_rq
> > 0.41 -0.0 0.38 perf-
> > profile.self.cycles-pp.sock_recvmsg
> > 0.10 -0.0 0.07 ± 6% perf-
> > profile.self.cycles-pp.update_min_vruntime
> > 0.13 ± 3% -0.0 0.10 perf-
> > profile.self.cycles-pp.task_h_load
> > 0.23 ± 3% -0.0 0.20 ± 6% perf-
> > profile.self.cycles-pp.__get_user_8
> > 0.12 ± 4% -0.0 0.10 ± 3% perf-
> > profile.self.cycles-pp.exit_to_user_mode_loop
> > 0.39 ± 2% -0.0 0.37 ± 2% perf-
> > profile.self.cycles-pp.rw_verify_area
> > 0.11 ± 3% -0.0 0.09 ± 7% perf-
> > profile.self.cycles-pp.os_xsave
> > 0.12 ± 3% -0.0 0.10 ± 3% perf-
> > profile.self.cycles-pp.pick_next_task_fair
> > 0.35 -0.0 0.33 ± 2% perf-
> > profile.self.cycles-pp.skb_copy_datagram_from_iter
> > 0.46 -0.0 0.44 perf-
> > profile.self.cycles-pp.mutex_lock
> > 0.11 ± 4% -0.0 0.09 ± 4% perf-
> > profile.self.cycles-pp.__switch_to_asm
> > 0.10 ± 3% -0.0 0.08 ± 5% perf-
> > profile.self.cycles-pp.enqueue_entity
> > 0.08 ± 7% -0.0 0.06 ± 6% perf-
> > profile.self.cycles-pp.place_entity
> > 0.30 -0.0 0.28 ± 2% perf-
> > profile.self.cycles-pp.alloc_skb_with_frags
> > 0.50 -0.0 0.48 perf-
> > profile.self.cycles-pp.kfree
> > 0.30 -0.0 0.28 perf-
> > profile.self.cycles-pp.ksys_write
> > 0.12 ± 3% -0.0 0.10 ± 3% perf-
> > profile.self.cycles-pp.dequeue_entity
> > 0.11 ± 4% -0.0 0.09 perf-
> > profile.self.cycles-pp.prepare_to_wait
> > 0.19 ± 2% -0.0 0.17 perf-
> > profile.self.cycles-pp.update_rq_clock_task
> > 0.27 -0.0 0.25 ± 2% perf-
> > profile.self.cycles-pp.__build_skb_around
> > 0.08 ± 6% -0.0 0.06 ± 9% perf-
> > profile.self.cycles-pp.vruntime_eligible
> > 0.12 ± 4% -0.0 0.10 perf-
> > profile.self.cycles-pp.__wake_up_common
> > 0.27 -0.0 0.26 perf-
> > profile.self.cycles-pp.kmalloc_reserve
> > 0.48 -0.0 0.46 perf-
> > profile.self.cycles-pp.unix_write_space
> > 0.19 -0.0 0.18 ± 2% perf-
> > profile.self.cycles-pp.skb_copy_datagram_iter
> > 0.07 -0.0 0.06 ± 6% perf-
> > profile.self.cycles-pp.__calc_delta
> > 0.06 ± 6% -0.0 0.05 perf-
> > profile.self.cycles-pp.__put_user_8
> > 0.28 -0.0 0.27 perf-
> > profile.self.cycles-pp._raw_spin_unlock_irqrestore
> > 0.11 -0.0 0.10 perf-
> > profile.self.cycles-pp.wait_for_unix_gc
> > 0.05 +0.0 0.06 perf-
> > profile.self.cycles-pp.__x64_sys_write
> > 0.07 ± 5% +0.0 0.08 ± 5% perf-
> > profile.self.cycles-pp.native_irq_return_iret
> > 0.19 ± 7% +0.0 0.22 ± 4% perf-
> > profile.self.cycles-pp.prepare_task_switch
> > 0.10 ± 6% +0.1 0.17 ± 5% perf-
> > profile.self.cycles-pp.available_idle_cpu
> > 0.14 ± 61% +0.3 0.48 ± 17% perf-
> > profile.self.cycles-pp.queue_event
> > 0.19 ± 18% +0.7 0.85 ± 12% perf-
> > profile.self.cycles-pp.pv_native_safe_halt
> > 12.07 ± 2% +1.9 13.97 ± 6% perf-
> > profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >
> >
> >
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> > gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_3/aim9/300s
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 9156 +20.2% 11004 vmstat.system.cs
> > 8715946 ± 6% -14.0% 7494314 ± 13% meminfo.DirectMap2M
> > 10992 +85.4% 20381 meminfo.PageTables
> > 318.58 -1.7% 313.01
> > aim9.shell_rtns_3.ops_per_sec
> > 27145198 -2.1% 26576524
> > aim9.time.minor_page_faults
> > 1049306 -1.8% 1030938
> > aim9.time.voluntary_context_switches
> > 6173 ± 20% +74.0% 10742 ± 4% numa-
> > meminfo.node0.PageTables
> > 5702 ± 31% +55.1% 8844 ± 19% numa-
> > meminfo.node0.Shmem
> > 4803 ± 25% +100.6% 9636 ± 6% numa-
> > meminfo.node1.PageTables
> > 1538 ± 20% +73.7% 2673 ± 5% numa-
> > vmstat.node0.nr_page_table_pages
> > 1425 ± 31% +55.1% 2210 ± 19% numa-
> > vmstat.node0.nr_shmem
> > 1194 ± 25% +101.2% 2402 ± 6% numa-
> > vmstat.node1.nr_page_table_pages
> > 30413 +19.3% 36291
> > sched_debug.cpu.nr_switches.avg
> > 84768 ± 6% +20.3% 101955 ± 4%
> > sched_debug.cpu.nr_switches.max
> > 25510 ± 13% +23.0% 31383 ± 3%
> > sched_debug.cpu.nr_switches.stddev
> > 2727 +85.8% 5066 proc-
> > vmstat.nr_page_table_pages
> > 19325131 -1.6% 19014535 proc-vmstat.numa_hit
> > 19274656 -1.6% 18964467 proc-
> > vmstat.numa_local
> > 19877211 -1.6% 19563123 proc-
> > vmstat.pgalloc_normal
> > 28020416 -2.0% 27451741 proc-vmstat.pgfault
> > 19829318 -1.6% 19508263 proc-vmstat.pgfree
> > 2679 -1.6% 2636 proc-
> > vmstat.unevictable_pgs_culled
> > 0.03 ± 10% +30.9% 0.04 ± 2% perf-
> > sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 0.02 ± 5% +26.2% 0.02 ± 3% perf-
> > sched.total_sch_delay.average.ms
> > 27.03 ± 2% -12.4% 23.66 perf-
> > sched.total_wait_and_delay.average.ms
> > 23171 +18.2% 27385 perf-
> > sched.total_wait_and_delay.count.ms
> > 27.01 ± 2% -12.5% 23.64 perf-
> > sched.total_wait_time.average.ms
> > 110.73 ± 4% -71.1% 31.98 perf-
> > sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 1662 ± 2% +278.6% 6294 perf-
> > sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_
> > from_fork_asm
> > 110.70 ± 4% -71.1% 31.94 perf-
> > sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 5.94 +0.1 6.00 perf-stat.i.branch-
> > miss-rate%
> > 9184 +20.2% 11041 perf-stat.i.context-
> > switches
> > 1.96 +1.6% 1.99 perf-stat.i.cpi
> > 71.73 ± 4% +66.1% 119.11 ± 5% perf-stat.i.cpu-
> > migrations
> > 0.53 -1.4% 0.52 perf-stat.i.ipc
> > 3.79 -2.0% 3.71 perf-
> > stat.i.metric.K/sec
> > 90919 -2.0% 89065 perf-stat.i.minor-
> > faults
> > 90919 -2.0% 89065 perf-stat.i.page-
> > faults
> > 6.00 +0.1 6.06 perf-
> > stat.overall.branch-miss-rate%
> > 1.79 +1.2% 1.81 perf-
> > stat.overall.cpi
> > 0.56 -1.2% 0.55 perf-
> > stat.overall.ipc
> > 9154 +20.2% 11004 perf-
> > stat.ps.context-switches
> > 71.49 ± 4% +66.1% 118.72 ± 5% perf-stat.ps.cpu-
> > migrations
> > 90616 -2.0% 88768 perf-stat.ps.minor-
> > faults
> > 90616 -2.0% 88768 perf-stat.ps.page-
> > faults
> > 8.89 -0.2 8.68 perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 8.88 -0.2 8.66 perf-
> > profile.calltrace.cycles-
> > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 3.47 ± 2% -0.2 3.29 perf-
> > profile.calltrace.cycles-
> > pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64
> > _after_hwframe
> > 3.47 ± 2% -0.2 3.29 perf-
> > profile.calltrace.cycles-
> > pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.en
> > try_SYSCALL_64_after_hwframe
> > 3.51 ± 3% -0.2 3.33 perf-
> > profile.calltrace.cycles-
> > pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 3.47 ± 2% -0.2 3.29 perf-
> > profile.calltrace.cycles-
> > pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_sysca
> > ll_64
> > 1.66 ± 2% -0.1 1.57 ± 4% perf-
> > profile.calltrace.cycles-pp.setlocale
> > 0.27 ±100% +0.3 0.61 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_s
> > tartup_64
> > 0.18 ±141% +0.4 0.60 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_seconda
> > ry
> > 62.46 +0.6 63.01 perf-
> > profile.calltrace.cycles-
> > pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_
> > startup_entry
> > 0.09 ±223% +0.6 0.65 ± 7% perf-
> > profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> > 0.09 ±223% +0.6 0.65 ± 7% perf-
> > profile.calltrace.cycles-pp.ret_from_fork_asm
> > 49.01 +0.6 49.60 perf-
> > profile.calltrace.cycles-
> > pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.d
> > o_idle
> > 67.47 +0.7 68.17 perf-
> > profile.calltrace.cycles-pp.common_startup_64
> > 20.25 -0.7 19.58 perf-
> > profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 20.21 -0.7 19.54 perf-
> > profile.children.cycles-pp.do_syscall_64
> > 6.54 -0.2 6.33 perf-
> > profile.children.cycles-pp.asm_exc_page_fault
> > 6.10 -0.2 5.90 perf-
> > profile.children.cycles-pp.do_user_addr_fault
> > 3.77 ± 3% -0.2 3.60 perf-
> > profile.children.cycles-pp.x64_sys_call
> > 3.62 ± 3% -0.2 3.46 perf-
> > profile.children.cycles-pp.do_exit
> > 2.63 ± 3% -0.2 2.48 ± 2% perf-
> > profile.children.cycles-pp.__mmput
> > 2.16 ± 2% -0.1 2.06 ± 3% perf-
> > profile.children.cycles-pp.ksys_mmap_pgoff
> > 1.66 ± 2% -0.1 1.57 ± 4% perf-
> > profile.children.cycles-pp.setlocale
> > 2.69 ± 2% -0.1 2.61 perf-
> > profile.children.cycles-pp.do_pte_missing
> > 0.77 ± 5% -0.1 0.70 ± 6% perf-
> > profile.children.cycles-pp.tlb_finish_mmu
> > 0.92 ± 2% -0.0 0.87 ± 4% perf-
> > profile.children.cycles-pp.__irqentry_text_end
> > 0.08 ± 10% -0.0 0.04 ± 71% perf-
> > profile.children.cycles-pp.tick_nohz_tick_stopped
> > 0.10 ± 11% -0.0 0.07 ± 21% perf-
> > profile.children.cycles-pp.__percpu_counter_init_many
> > 0.14 ± 9% -0.0 0.11 ± 4% perf-
> > profile.children.cycles-pp.strnlen
> > 0.12 ± 11% -0.0 0.10 ± 8% perf-
> > profile.children.cycles-pp.mas_prev_slot
> > 0.11 ± 12% +0.0 0.14 ± 9% perf-
> > profile.children.cycles-pp.update_curr
> > 0.19 ± 8% +0.0 0.22 ± 6% perf-
> > profile.children.cycles-pp.enqueue_entity
> > 0.10 ± 11% +0.0 0.13 ± 11% perf-
> > profile.children.cycles-pp.__perf_event_task_sched_out
> > 0.05 ± 46% +0.0 0.08 ± 13% perf-
> > profile.children.cycles-pp.select_task_rq
> > 0.13 ± 14% +0.0 0.17 ± 8% perf-
> > profile.children.cycles-pp.perf_pmu_sched_task
> > 0.20 ± 10% +0.0 0.24 ± 2% perf-
> > profile.children.cycles-pp.try_to_wake_up
> > 0.28 ± 9% +0.1 0.34 ± 9% perf-
> > profile.children.cycles-pp.exit_to_user_mode_loop
> > 0.04 ± 44% +0.1 0.11 ± 13% perf-
> > profile.children.cycles-pp.__queue_work
> > 0.30 ± 11% +0.1 0.38 ± 8% perf-
> > profile.children.cycles-pp.ttwu_do_activate
> > 0.30 ± 4% +0.1 0.38 ± 8% perf-
> > profile.children.cycles-pp.__pick_next_task
> > 0.22 ± 7% +0.1 0.29 ± 9% perf-
> > profile.children.cycles-pp.try_to_block_task
> > 0.02 ±141% +0.1 0.09 ± 10% perf-
> > profile.children.cycles-pp.kick_pool
> > 0.02 ± 99% +0.1 0.10 ± 19% perf-
> > profile.children.cycles-pp.queue_work_on
> > 0.25 ± 4% +0.1 0.35 ± 7% perf-
> > profile.children.cycles-pp.sched_ttwu_pending
> > 0.33 ± 6% +0.1 0.43 ± 5% perf-
> > profile.children.cycles-pp.flush_smp_call_function_queue
> > 0.29 ± 4% +0.1 0.39 ± 6% perf-
> > profile.children.cycles-pp.__flush_smp_call_function_queue
> > 0.51 ± 6% +0.1 0.63 ± 6% perf-
> > profile.children.cycles-pp.schedule_idle
> > 0.46 ± 7% +0.1 0.58 ± 5% perf-
> > profile.children.cycles-pp.schedule
> > 0.88 ± 6% +0.2 1.04 ± 5% perf-
> > profile.children.cycles-pp.ret_from_fork_asm
> > 0.18 ± 6% +0.2 0.34 ± 8% perf-
> > profile.children.cycles-pp.worker_thread
> > 0.88 ± 6% +0.2 1.04 ± 5% perf-
> > profile.children.cycles-pp.ret_from_fork
> > 0.38 ± 8% +0.2 0.56 ± 10% perf-
> > profile.children.cycles-pp.kthread
> > 1.08 ± 3% +0.2 1.32 ± 2% perf-
> > profile.children.cycles-pp.__schedule
> > 66.15 +0.5 66.64 perf-
> > profile.children.cycles-pp.cpuidle_idle_call
> > 62.89 +0.6 63.47 perf-
> > profile.children.cycles-pp.cpuidle_enter_state
> > 63.00 +0.6 63.59 perf-
> > profile.children.cycles-pp.cpuidle_enter
> > 49.10 +0.6 49.69 perf-
> > profile.children.cycles-pp.intel_idle
> > 67.47 +0.7 68.17 perf-
> > profile.children.cycles-pp.do_idle
> > 67.47 +0.7 68.17 perf-
> > profile.children.cycles-pp.common_startup_64
> > 67.47 +0.7 68.17 perf-
> > profile.children.cycles-pp.cpu_startup_entry
> > 0.91 ± 2% -0.0 0.86 ± 4% perf-
> > profile.self.cycles-pp.__irqentry_text_end
> > 0.14 ± 11% +0.1 0.22 ± 11% perf-
> > profile.self.cycles-pp.timerqueue_del
> > 49.08 +0.6 49.68 perf-
> > profile.self.cycles-pp.intel_idle
> >
> >
> >
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> > gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/800%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 3745213 ± 39% +108.1% 7794858 ± 12% cpuidle..usage
> > 186670 +17.3% 218939 ± 2% meminfo.Percpu
> > 5.00 +306.7% 20.33 ± 66%
> > mpstat.max_utilization.seconds
> > 9.35 ± 76% -4.5 4.80 ±141% perf-
> > profile.calltrace.cycles-
> > pp.__ordered_events__flush.perf_session__process_events.record__fin
> > ish_output.__cmd_record
> > 8.90 ± 75% -4.3 4.57 ±141% perf-
> > profile.calltrace.cycles-
> > pp.perf_session__deliver_event.__ordered_events__flush.perf_session
> > __process_events.record__finish_output.__cmd_record
> > 3283 ± 7% -16.2% 2751 ± 5%
> > sched_debug.cfs_rq:/.avg_vruntime.avg
> > 3283 ± 7% -16.2% 2751 ± 5%
> > sched_debug.cfs_rq:/.min_vruntime.avg
> > 1522512 ± 6% +80.0% 2739797 ± 4% vmstat.system.cs
> > 308726 ± 8% +60.5% 495472 ± 5% vmstat.system.in
> > 467562 +3.7% 485068 ± 2% proc-
> > vmstat.nr_kernel_stack
> > 266084 +3.8% 276310 proc-
> > vmstat.nr_slab_unreclaimable
> > 1.375e+08 -2.0% 1.347e+08 proc-vmstat.numa_hit
> > 1.373e+08 -2.0% 1.346e+08 proc-
> > vmstat.numa_local
> > 217472 ± 3% -28.1% 156410 proc-
> > vmstat.numa_other
> > 1.382e+08 -2.0% 1.354e+08 proc-
> > vmstat.pgalloc_normal
> > 1.375e+08 -2.0% 1.347e+08 proc-vmstat.pgfree
> > 1514102 -6.2% 1420287 hackbench.throughput
> > 1480357 -6.7% 1380775
> > hackbench.throughput_avg
> > 1514102 -6.2% 1420287
> > hackbench.throughput_best
> > 1436918 -7.9% 1323413
> > hackbench.throughput_worst
> > 14551264 ± 13% +138.1% 34644707 ± 3%
> > hackbench.time.involuntary_context_switches
> > 9919 -1.6% 9762
> > hackbench.time.percent_of_cpu_this_job_got
> > 4239 +4.5% 4428
> > hackbench.time.system_time
> > 56365933 ± 6% +65.3% 93172066 ± 4%
> > hackbench.time.voluntary_context_switches
> > 65085618 +26.7% 82440571 ± 2% perf-stat.i.branch-
> > misses
> > 31.25 -1.6 29.66 perf-stat.i.cache-
> > miss-rate%
> > 2.469e+08 +8.9% 2.689e+08 perf-stat.i.cache-
> > misses
> > 7.519e+08 +15.9% 8.712e+08 perf-stat.i.cache-
> > references
> > 1353061 ± 7% +87.5% 2537450 ± 5% perf-stat.i.context-
> > switches
> > 2.269e+11 +3.5% 2.348e+11 perf-stat.i.cpu-
> > cycles
> > 134588 ± 13% +81.9% 244825 ± 8% perf-stat.i.cpu-
> > migrations
> > 13.60 ± 5% +70.5% 23.20 ± 5% perf-
> > stat.i.metric.K/sec
> > 1.26 +7.6% 1.35 perf-
> > stat.overall.MPKI
> > 0.11 ± 2% +0.0 0.14 ± 2% perf-
> > stat.overall.branch-miss-rate%
> > 34.12 -2.1 31.97 perf-
> > stat.overall.cache-miss-rate%
> > 1.17 +1.8% 1.19 perf-
> > stat.overall.cpi
> > 931.96 -5.3% 882.44 perf-
> > stat.overall.cycles-between-cache-misses
> > 0.85 -1.8% 0.84 perf-
> > stat.overall.ipc
> > 5.372e+10 -1.2% 5.31e+10 perf-stat.ps.branch-
> > instructions
> > 57783128 ± 2% +32.9% 76802898 ± 2% perf-stat.ps.branch-
> > misses
> > 2.696e+08 +7.2% 2.89e+08 perf-stat.ps.cache-
> > misses
> > 7.902e+08 +14.4% 9.039e+08 perf-stat.ps.cache-
> > references
> > 1288664 ± 7% +94.6% 2508227 ± 5% perf-
> > stat.ps.context-switches
> > 2.512e+11 +1.5% 2.55e+11 perf-stat.ps.cpu-
> > cycles
> > 122960 ± 14% +82.3% 224127 ± 9% perf-stat.ps.cpu-
> > migrations
> > 1.108e+13 +5.7% 1.171e+13 perf-
> > stat.total.instructions
> > 0.94 ±223% +5929.9% 56.62 ±121% perf-
> > sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_threa
> > d.kthread.ret_from_fork
> > 26.44 ± 81% -100.0% 0.00 perf-
> > sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 100.25 ±141% -100.0% 0.00 perf-
> > sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 9.01 ± 43% +1823.1% 173.24 ±106% perf-
> > sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_
> > read
> > 49.43 ± 14% +73.8% 85.93 ± 19% perf-
> > sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall
> > _64
> > 130.63 ± 17% +135.8% 308.04 ± 28% perf-
> > sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_sysc
> > all_64
> > 18.09 ± 30% +130.4% 41.70 ± 26% perf-
> > sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_S
> > YSCALL_64_after_hwframe.[unknown]
> > 196.51 ± 21% +102.9% 398.77 ± 15% perf-
> > sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown].[unknown]
> > 34.17 ± 39% +191.1% 99.46 ± 20% perf-
> > sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_f
> > unction_single.[unknown].[unknown]
> > 154.91 ±163% +1649.9% 2710 ± 91% perf-
> > sched.sch_delay.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksy
> > s_write.do_syscall_64
> > 0.94 ±223% +1.9e+05% 1743 ±120% perf-
> > sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_threa
> > d.kthread.ret_from_fork
> > 3.19 ±124% -91.9% 0.26 ±150% perf-
> > sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret
> > _from_fork.ret_from_fork_asm
> > 646.26 ± 94% -100.0% 0.00 perf-
> > sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 282.66 ±139% -100.0% 0.00 perf-
> > sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 63.17 ± 52% +2854.4% 1866 ±121% perf-
> > sched.sch_delay.max.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_
> > read
> > 1507 ± 35% +249.4% 5266 ± 47% perf-
> > sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> > 3915 ± 67% +98.7% 7779 ± 16% perf-
> > sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 53.31 ± 18% +79.9% 95.90 ± 23% perf-
> > sched.total_sch_delay.average.ms
> > 149.37 ± 18% +80.0% 268.92 ± 22% perf-
> > sched.total_wait_and_delay.average.ms
> > 96.07 ± 18% +80.1% 173.01 ± 21% perf-
> > sched.total_wait_time.average.ms
> > 244.53 ± 47% -100.0% 0.00 perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_rea
> > d.vfs_read.ksys_read
> > 529.64 ± 20% +38.5% 733.60 ± 20% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_wri
> > te.vfs_write.ksys_write
> > 136.52 ± 15% +73.7% 237.07 ± 18% perf-
> > sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_sy
> > scall_64
> > 373.41 ± 16% +136.3% 882.34 ± 27% perf-
> > sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do
> > _syscall_64
> > 51.96 ± 29% +127.5% 118.22 ± 25% perf-
> > sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.en
> > try_SYSCALL_64_after_hwframe.[unknown]
> > 554.86 ± 23% +103.0% 1126 ± 14% perf-
> > sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_a
> > pic_timer_interrupt.[unknown].[unknown]
> > 298.52 ±136% +436.9% 1602 ± 27% perf-
> > sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_sch
> > edule_timeout.constprop.0.do_poll
> > 556.66 ± 37% -97.1% 16.09 ± 47% perf-
> > sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 707.67 ± 31% -100.0% 0.00 perf-
> > sched.wait_and_delay.count.__cond_resched.mutex_lock.anon_pipe_read
> > .vfs_read.ksys_read
> > 1358 ± 28% +4707.9% 65291 ± 27% perf-
> > sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_
> > from_fork_asm
> > 12184 ± 5% -100.0% 0.00 perf-
> > sched.wait_and_delay.max.ms.__cond_resched.mutex_lock.anon_pipe_rea
> > d.vfs_read.ksys_read
> > 1393 ±134% +379.9% 6685 ± 15% perf-
> > sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_sch
> > edule_timeout.constprop.0.do_poll
> > 6927 ± 6% +119.8% 15224 ± 19% perf-
> > sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 341.61 ± 21% +39.1% 475.15 ± 20% perf-
> > sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vf
> > s_write.ksys_write
> > 51.39 ± 99% -100.0% 0.00 perf-
> > sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 121.14 ±122% -100.0% 0.00 perf-
> > sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 87.09 ± 15% +73.6% 151.14 ± 18% perf-
> > sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall
> > _64
> > 242.78 ± 16% +136.6% 574.31 ± 27% perf-
> > sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_sysc
> > all_64
> > 33.86 ± 29% +126.0% 76.52 ± 24% perf-
> > sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_S
> > YSCALL_64_after_hwframe.[unknown]
> > 250.32 ±109% -89.4% 26.44 ±111% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_common_interr
> > upt.[unknown].[unknown]
> > 358.36 ± 25% +103.1% 727.72 ± 14% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown].[unknown]
> > 77.40 ± 47% +102.5% 156.70 ± 28% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_f
> > unction_single.[unknown].[unknown]
> > 17.91 ± 42% -75.3% 4.42 ± 76% perf-
> > sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_fro
> > m_fork_asm
> > 266.70 ±137% +431.6% 1417 ± 36% perf-
> > sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule
> > _timeout.constprop.0.do_poll
> > 536.93 ± 40% -97.4% 13.81 ± 50% perf-
> > sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 180.38 ±135% +2208.8% 4164 ± 71% perf-
> > sched.wait_time.max.ms.__cond_resched.anon_pipe_write.vfs_write.ksy
> > s_write.do_syscall_64
> > 1028 ±129% -100.0% 0.00 perf-
> > sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 312.94 ±123% -100.0% 0.00 perf-
> > sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 418.66 ±132% -93.7% 26.44 ±111% perf-
> > sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_common_interr
> > upt.[unknown].[unknown]
> > 1388 ±133% +379.7% 6660 ± 15% perf-
> > sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule
> > _timeout.constprop.0.do_poll
> > 2022 ± 25% +164.9% 5358 ± 46% perf-
> > sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> >
> >
> >
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> > gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_1/aim9/300s
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 11004 +86.2% 20490 meminfo.PageTables
> > 121.33 ± 12% +18.8% 144.17 ± 5% perf-c2c.DRAM.remote
> > 9155 +20.0% 10990 vmstat.system.cs
> > 5129 ± 20% +107.2% 10631 ± 3% numa-
> > meminfo.node0.PageTables
> > 5864 ± 17% +67.3% 9811 ± 3% numa-
> > meminfo.node1.PageTables
> > 1278 ± 20% +107.9% 2658 ± 3% numa-
> > vmstat.node0.nr_page_table_pages
> > 1469 ± 17% +66.4% 2446 ± 3% numa-
> > vmstat.node1.nr_page_table_pages
> > 319.43 -2.1% 312.66
> > aim9.shell_rtns_1.ops_per_sec
> > 27217846 -2.5% 26546962
> > aim9.time.minor_page_faults
> > 1051878 -2.1% 1029547
> > aim9.time.voluntary_context_switches
> > 30502 +18.6% 36187
> > sched_debug.cpu.nr_switches.avg
> > 90327 ± 12% +22.7% 110866 ± 4%
> > sched_debug.cpu.nr_switches.max
> > 26316 ± 16% +25.5% 33021 ± 5%
> > sched_debug.cpu.nr_switches.stddev
> > 0.03 ± 7% +70.7% 0.05 ± 53% perf-
> > sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 0.02 ± 3% +38.9% 0.02 ± 28% perf-
> > sched.total_sch_delay.average.ms
> > 27.43 ± 2% -14.5% 23.45 perf-
> > sched.total_wait_and_delay.average.ms
> > 23174 +18.0% 27340 perf-
> > sched.total_wait_and_delay.count.ms
> > 27.41 ± 2% -14.6% 23.42 perf-
> > sched.total_wait_time.average.ms
> > 115.38 ± 3% -71.9% 32.37 ± 2% perf-
> > sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 1656 ± 3% +280.2% 6299 perf-
> > sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_
> > from_fork_asm
> > 115.35 ± 3% -72.0% 32.31 ± 2% perf-
> > sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 2737 +86.1% 5095 proc-
> > vmstat.nr_page_table_pages
> > 30460 +3.2% 31439 proc-vmstat.nr_shmem
> > 27933 +1.8% 28432 proc-
> > vmstat.nr_slab_unreclaimable
> > 19466749 -2.5% 18980434 proc-vmstat.numa_hit
> > 19414531 -2.5% 18927584 proc-
> > vmstat.numa_local
> > 20028107 -2.5% 19528806 proc-
> > vmstat.pgalloc_normal
> > 28087705 -2.4% 27417155 proc-vmstat.pgfault
> > 19980173 -2.5% 19474402 proc-vmstat.pgfree
> > 420074 -5.7% 396239 ± 8% proc-vmstat.pgreuse
> > 2685 -1.9% 2633 proc-
> > vmstat.unevictable_pgs_culled
> > 5.48e+08 -1.2% 5.412e+08 perf-stat.i.branch-
> > instructions
> > 5.92 +0.1 6.00 perf-stat.i.branch-
> > miss-rate%
> > 9195 +19.9% 11021 perf-stat.i.context-
> > switches
> > 1.96 +1.7% 1.99 perf-stat.i.cpi
> > 70.13 +73.4% 121.59 ± 8% perf-stat.i.cpu-
> > migrations
> > 2.725e+09 -1.3% 2.69e+09 perf-
> > stat.i.instructions
> > 0.53 -1.6% 0.52 perf-stat.i.ipc
> > 3.80 -2.4% 3.71 perf-
> > stat.i.metric.K/sec
> > 91139 -2.4% 88949 perf-stat.i.minor-
> > faults
> > 91139 -2.4% 88949 perf-stat.i.page-
> > faults
> > 5.00 ± 44% +1.1 6.07 perf-
> > stat.overall.branch-miss-rate%
> > 1.49 ± 44% +21.9% 1.82 perf-
> > stat.overall.cpi
> > 7643 ± 44% +43.7% 10984 perf-
> > stat.ps.context-switches
> > 58.17 ± 44% +108.4% 121.21 ± 8% perf-stat.ps.cpu-
> > migrations
> > 2.06 ± 2% -0.2 1.87 ± 12% perf-
> > profile.calltrace.cycles-
> > pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCAL
> > L_64_after_hwframe
> > 0.98 ± 7% -0.2 0.83 ± 12% perf-
> > profile.calltrace.cycles-
> > pp.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic_timer_interr
> > upt.asm_sysvec_apic_timer_interrupt
> > 1.69 ± 2% -0.1 1.54 ± 2% perf-
> > profile.calltrace.cycles-pp.setlocale
> > 0.58 ± 5% -0.1 0.44 ± 44% perf-
> > profile.calltrace.cycles-
> > pp.entry_SYSCALL_64_after_hwframe.__open64_nocancel.setlocale
> > 0.72 ± 6% -0.1 0.60 ± 8% perf-
> > profile.calltrace.cycles-
> > pp.rcu_do_batch.rcu_core.handle_softirqs.__irq_exit_rcu.sysvec_apic
> > _timer_interrupt
> > 3.21 ± 2% -0.1 3.11 perf-
> > profile.calltrace.cycles-
> > pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_s
> > yscall_64
> > 0.70 ± 4% -0.1 0.62 ± 6% perf-
> > profile.calltrace.cycles-
> > pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
> > 1.52 ± 2% -0.1 1.44 ± 3% perf-
> > profile.calltrace.cycles-
> > pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_aft
> > er_hwframe
> > 1.34 ± 3% -0.1 1.28 ± 3% perf-
> > profile.calltrace.cycles-
> > pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_6
> > 4
> > 0.89 ± 3% -0.1 0.84 perf-
> > profile.calltrace.cycles-
> > pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.
> > __sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
> > 0.17 ±141% +0.4 0.61 ± 7% perf-
> > profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> > 0.17 ±141% +0.4 0.61 ± 7% perf-
> > profile.calltrace.cycles-pp.ret_from_fork_asm
> > 65.10 +0.5 65.56 perf-
> > profile.calltrace.cycles-
> > pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.comm
> > on_startup_64
> > 66.40 +0.6 67.00 perf-
> > profile.calltrace.cycles-
> > pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> > 66.46 +0.6 67.08 perf-
> > profile.calltrace.cycles-pp.start_secondary.common_startup_64
> > 66.46 +0.6 67.08 perf-
> > profile.calltrace.cycles-
> > pp.cpu_startup_entry.start_secondary.common_startup_64
> > 67.63 +0.7 68.30 perf-
> > profile.calltrace.cycles-pp.common_startup_64
> > 20.14 -0.6 19.51 perf-
> > profile.children.cycles-pp.do_syscall_64
> > 20.20 -0.6 19.57 perf-
> > profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 1.13 ± 5% -0.2 0.98 ± 9% perf-
> > profile.children.cycles-pp.rcu_core
> > 1.69 ± 2% -0.1 1.54 ± 2% perf-
> > profile.children.cycles-pp.setlocale
> > 0.84 ± 4% -0.1 0.71 ± 5% perf-
> > profile.children.cycles-pp.rcu_do_batch
> > 2.16 ± 2% -0.1 2.04 ± 3% perf-
> > profile.children.cycles-pp.ksys_mmap_pgoff
> > 1.15 ± 4% -0.1 1.04 ± 5% perf-
> > profile.children.cycles-pp.__open64_nocancel
> > 3.22 ± 2% -0.1 3.12 perf-
> > profile.children.cycles-pp.exec_binprm
> > 2.09 ± 2% -0.1 2.00 ± 2% perf-
> > profile.children.cycles-pp.kernel_clone
> > 0.88 ± 4% -0.1 0.79 ± 4% perf-
> > profile.children.cycles-pp.mas_store_prealloc
> > 2.19 -0.1 2.10 ± 3% perf-
> > profile.children.cycles-pp.__x64_sys_openat
> > 0.70 ± 4% -0.1 0.62 ± 6% perf-
> > profile.children.cycles-pp.dup_mm
> > 1.36 ± 3% -0.1 1.30 perf-
> > profile.children.cycles-pp._Fork
> > 0.56 ± 4% -0.1 0.50 ± 8% perf-
> > profile.children.cycles-pp.dup_mmap
> > 0.09 ± 16% -0.1 0.03 ± 70% perf-
> > profile.children.cycles-pp.perf_adjust_freq_unthr_context
> > 0.31 ± 8% -0.1 0.25 ± 10% perf-
> > profile.children.cycles-pp.strncpy_from_user
> > 0.94 ± 3% -0.1 0.88 ± 2% perf-
> > profile.children.cycles-pp.perf_mux_hrtimer_handler
> > 0.41 ± 5% -0.0 0.36 ± 5% perf-
> > profile.children.cycles-pp.irqtime_account_irq
> > 0.18 ± 12% -0.0 0.14 ± 7% perf-
> > profile.children.cycles-pp.tlb_remove_table_rcu
> > 0.20 ± 7% -0.0 0.17 ± 9% perf-
> > profile.children.cycles-pp.perf_event_task_tick
> > 0.08 ± 14% -0.0 0.05 ± 49% perf-
> > profile.children.cycles-pp.mas_update_gap
> > 0.24 ± 5% -0.0 0.21 ± 5% perf-
> > profile.children.cycles-pp.filemap_read
> > 0.19 ± 7% -0.0 0.16 ± 8% perf-
> > profile.children.cycles-pp.__call_rcu_common
> > 0.22 ± 2% -0.0 0.19 ± 5% perf-
> > profile.children.cycles-pp.mas_next_slot
> > 0.09 ± 5% +0.0 0.12 ± 7% perf-
> > profile.children.cycles-pp.__perf_event_task_sched_out
> > 0.05 ± 47% +0.0 0.08 ± 10% perf-
> > profile.children.cycles-pp.lru_gen_del_folio
> > 0.10 ± 14% +0.0 0.12 ± 18% perf-
> > profile.children.cycles-pp.__folio_mod_stat
> > 0.12 ± 12% +0.0 0.16 ± 3% perf-
> > profile.children.cycles-pp.perf_pmu_sched_task
> > 0.20 ± 10% +0.0 0.24 ± 4% perf-
> > profile.children.cycles-pp.prepare_task_switch
> > 0.06 ± 47% +0.0 0.10 ± 11% perf-
> > profile.children.cycles-pp.__queue_work
> > 0.56 ± 5% +0.1 0.61 ± 4% perf-
> > profile.children.cycles-pp.sched_balance_domains
> > 0.04 ± 72% +0.1 0.09 ± 11% perf-
> > profile.children.cycles-pp.kick_pool
> > 0.04 ± 72% +0.1 0.09 ± 14% perf-
> > profile.children.cycles-pp.queue_work_on
> > 0.33 ± 6% +0.1 0.38 ± 7% perf-
> > profile.children.cycles-pp.dequeue_entities
> > 0.35 ± 6% +0.1 0.40 ± 7% perf-
> > profile.children.cycles-pp.dequeue_task_fair
> > 0.52 ± 6% +0.1 0.58 ± 5% perf-
> > profile.children.cycles-pp.enqueue_task_fair
> > 0.54 ± 7% +0.1 0.60 ± 5% perf-
> > profile.children.cycles-pp.enqueue_task
> > 0.28 ± 9% +0.1 0.35 ± 5% perf-
> > profile.children.cycles-pp.exit_to_user_mode_loop
> > 0.21 ± 4% +0.1 0.28 ± 12% perf-
> > profile.children.cycles-pp.try_to_block_task
> > 0.34 ± 4% +0.1 0.42 ± 3% perf-
> > profile.children.cycles-pp.ttwu_do_activate
> > 0.36 ± 3% +0.1 0.46 ± 6% perf-
> > profile.children.cycles-pp.flush_smp_call_function_queue
> > 0.28 ± 4% +0.1 0.38 ± 5% perf-
> > profile.children.cycles-pp.sched_ttwu_pending
> > 0.33 ± 2% +0.1 0.43 ± 5% perf-
> > profile.children.cycles-pp.__flush_smp_call_function_queue
> > 0.46 ± 7% +0.1 0.56 ± 6% perf-
> > profile.children.cycles-pp.schedule
> > 0.48 ± 8% +0.1 0.61 ± 8% perf-
> > profile.children.cycles-pp.timerqueue_del
> > 0.18 ± 13% +0.1 0.32 ± 11% perf-
> > profile.children.cycles-pp.worker_thread
> > 0.38 ± 9% +0.2 0.52 ± 10% perf-
> > profile.children.cycles-pp.kthread
> > 1.10 ± 5% +0.2 1.25 ± 2% perf-
> > profile.children.cycles-pp.__schedule
> > 0.85 ± 8% +0.2 1.01 ± 7% perf-
> > profile.children.cycles-pp.ret_from_fork
> > 0.85 ± 8% +0.2 1.02 ± 7% perf-
> > profile.children.cycles-pp.ret_from_fork_asm
> > 63.15 +0.5 63.64 perf-
> > profile.children.cycles-pp.cpuidle_enter
> > 66.26 +0.5 66.77 perf-
> > profile.children.cycles-pp.cpuidle_idle_call
> > 66.46 +0.6 67.08 perf-
> > profile.children.cycles-pp.start_secondary
> > 67.63 +0.7 68.30 perf-
> > profile.children.cycles-pp.common_startup_64
> > 67.63 +0.7 68.30 perf-
> > profile.children.cycles-pp.cpu_startup_entry
> > 67.63 +0.7 68.30 perf-
> > profile.children.cycles-pp.do_idle
> > 1.20 ± 3% -0.1 1.12 ± 4% perf-
> > profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 0.09 ± 16% -0.1 0.03 ± 70% perf-
> > profile.self.cycles-pp.perf_adjust_freq_unthr_context
> > 0.25 ± 6% -0.0 0.21 ± 12% perf-
> > profile.self.cycles-pp.irqtime_account_irq
> > 0.02 ±141% +0.0 0.06 ± 13% perf-
> > profile.self.cycles-pp.prepend_path
> > 0.13 ± 10% +0.1 0.24 ± 11% perf-
> > profile.self.cycles-pp.timerqueue_del
> >
> >
> >
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> > gcc-12/performance/pipe/4/x86_64-rhel-9.4/process/50%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 3.924e+08 ± 3% +55.1% 6.086e+08 ± 2% cpuidle..time
> > 7504886 ± 11% +184.4% 21340245 ± 6% cpuidle..usage
> > 13350305 -3.8% 12848570 vmstat.system.cs
> > 1849619 +5.1% 1943754 vmstat.system.in
> > 3.56 ± 5% +2.6 6.16 ± 7% mpstat.cpu.all.idle%
> > 0.69 +0.2 0.90 ± 3% mpstat.cpu.all.irq%
> > 0.03 ± 3% +0.0 0.04 ± 3% mpstat.cpu.all.soft%
> > 18666 ± 9% +41.2% 26352 ± 6% perf-c2c.DRAM.remote
> > 197041 -39.6% 118945 ± 5% perf-c2c.HITM.local
> > 3178 ± 12% +37.2% 4361 ± 11% perf-c2c.HITM.remote
> > 200219 -38.4% 123307 ± 5% perf-c2c.HITM.total
> > 2842579 ± 11% +60.1% 4550025 ± 12% meminfo.Active
> > 2842579 ± 11% +60.1% 4550025 ± 12% meminfo.Active(anon)
> > 5535242 ± 5% +30.9% 7248257 ± 7% meminfo.Cached
> > 3846718 ± 8% +44.0% 5539484 ± 9% meminfo.Committed_AS
> > 9684149 ± 3% +20.5% 11666616 ± 4% meminfo.Memused
> > 136127 ± 3% +14.2% 155524 meminfo.PageTables
> > 62144 +22.8% 76336 meminfo.Percpu
> > 2001586 ± 16% +85.6% 3714611 ± 14% meminfo.Shmem
> > 9759598 ± 3% +20.0% 11714619 ± 4% meminfo.max_used_kB
> > 710625 ± 11% +59.3% 1131770 ± 11% proc-
> > vmstat.nr_active_anon
> > 1383631 ± 5% +30.6% 1806419 ± 7% proc-
> > vmstat.nr_file_pages
> > 34220 ± 3% +13.9% 38987 proc-
> > vmstat.nr_page_table_pages
> > 500216 ± 16% +84.5% 923007 ± 14% proc-vmstat.nr_shmem
> > 710625 ± 11% +59.3% 1131770 ± 11% proc-
> > vmstat.nr_zone_active_anon
> > 92308030 +8.7% 1.004e+08 proc-vmstat.numa_hit
> > 92171407 +8.7% 1.002e+08 proc-
> > vmstat.numa_local
> > 133616 +2.7% 137265 proc-
> > vmstat.numa_other
> > 92394313 +8.7% 1.004e+08 proc-
> > vmstat.pgalloc_normal
> > 91035691 +7.8% 98094626 proc-vmstat.pgfree
> > 867815 +11.8% 970369 hackbench.throughput
> > 830278 +11.6% 926834
> > hackbench.throughput_avg
> > 867815 +11.8% 970369
> > hackbench.throughput_best
> > 760822 +14.2% 869145
> > hackbench.throughput_worst
> > 72.87 -10.3% 65.36
> > hackbench.time.elapsed_time
> > 72.87 -10.3% 65.36
> > hackbench.time.elapsed_time.max
> > 2.493e+08 -17.7% 2.052e+08
> > hackbench.time.involuntary_context_switches
> > 12357 -3.9% 11879
> > hackbench.time.percent_of_cpu_this_job_got
> > 8029 -14.8% 6842
> > hackbench.time.system_time
> > 976.58 -5.5% 923.21
> > hackbench.time.user_time
> > 7.54e+08 -14.4% 6.451e+08
> > hackbench.time.voluntary_context_switches
> > 5.598e+10 +6.6% 5.965e+10 perf-stat.i.branch-
> > instructions
> > 0.40 -0.0 0.38 perf-stat.i.branch-
> > miss-rate%
> > 8.36 ± 2% +4.6 12.98 ± 3% perf-stat.i.cache-
> > miss-rate%
> > 2.11e+09 -33.8% 1.396e+09 perf-stat.i.cache-
> > references
> > 13687653 -3.4% 13225338 perf-stat.i.context-
> > switches
> > 1.36 -7.9% 1.25 perf-stat.i.cpi
> > 3.219e+11 -2.2% 3.147e+11 perf-stat.i.cpu-
> > cycles
> > 1915 ± 2% -6.6% 1788 ± 3% perf-stat.i.cycles-
> > between-cache-misses
> > 2.371e+11 +6.0% 2.512e+11 perf-
> > stat.i.instructions
> > 0.74 +8.5% 0.80 perf-stat.i.ipc
> > 1.15 ± 14% -28.3% 0.82 ± 23% perf-stat.i.major-
> > faults
> > 115.09 -3.2% 111.40 perf-
> > stat.i.metric.K/sec
> > 0.37 -0.0 0.35 perf-
> > stat.overall.branch-miss-rate%
> > 8.15 ± 3% +4.6 12.74 ± 3% perf-
> > stat.overall.cache-miss-rate%
> > 1.36 -7.7% 1.25 perf-
> > stat.overall.cpi
> > 1875 ± 2% -5.5% 1772 ± 4% perf-
> > stat.overall.cycles-between-cache-misses
> > 0.74 +8.3% 0.80 perf-
> > stat.overall.ipc
> > 5.524e+10 +6.4% 5.877e+10 perf-stat.ps.branch-
> > instructions
> > 2.079e+09 -33.9% 1.375e+09 perf-stat.ps.cache-
> > references
> > 13486088 -3.4% 13020988 perf-
> > stat.ps.context-switches
> > 3.175e+11 -2.3% 3.101e+11 perf-stat.ps.cpu-
> > cycles
> > 2.34e+11 +5.8% 2.475e+11 perf-
> > stat.ps.instructions
> > 1.09 ± 14% -28.3% 0.78 ± 21% perf-stat.ps.major-
> > faults
> > 1.73e+13 -5.1% 1.642e+13 perf-
> > stat.total.instructions
> > 3527725 +10.7% 3905361
> > sched_debug.cfs_rq:/.avg_vruntime.avg
> > 3975260 +14.1% 4535959 ± 6%
> > sched_debug.cfs_rq:/.avg_vruntime.max
> > 98657 ± 17% +84.9% 182407 ± 18%
> > sched_debug.cfs_rq:/.avg_vruntime.stddev
> > 11.83 ± 7% +17.6% 13.92 ± 5%
> > sched_debug.cfs_rq:/.h_nr_queued.max
> > 2.71 ± 5% +21.8% 3.30 ± 4%
> > sched_debug.cfs_rq:/.h_nr_queued.stddev
> > 11.75 ± 7% +17.7% 13.83 ± 6%
> > sched_debug.cfs_rq:/.h_nr_runnable.max
> > 2.68 ± 4% +21.2% 3.25 ± 5%
> > sched_debug.cfs_rq:/.h_nr_runnable.stddev
> > 4556 ±223% +691.0% 36039 ± 34%
> > sched_debug.cfs_rq:/.left_deadline.avg
> > 583131 ±223% +577.3% 3949548 ± 4%
> > sched_debug.cfs_rq:/.left_deadline.max
> > 51341 ±223% +622.0% 370695 ± 16%
> > sched_debug.cfs_rq:/.left_deadline.stddev
> > 4555 ±223% +691.0% 36035 ± 34%
> > sched_debug.cfs_rq:/.left_vruntime.avg
> > 583105 ±223% +577.3% 3949123 ± 4%
> > sched_debug.cfs_rq:/.left_vruntime.max
> > 51338 ±223% +622.0% 370651 ± 16%
> > sched_debug.cfs_rq:/.left_vruntime.stddev
> > 3527725 +10.7% 3905361
> > sched_debug.cfs_rq:/.min_vruntime.avg
> > 3975260 +14.1% 4535959 ± 6%
> > sched_debug.cfs_rq:/.min_vruntime.max
> > 98657 ± 17% +84.9% 182407 ± 18%
> > sched_debug.cfs_rq:/.min_vruntime.stddev
> > 0.22 ± 5% +13.9% 0.25 ± 5%
> > sched_debug.cfs_rq:/.nr_queued.stddev
> > 4555 ±223% +691.0% 36035 ± 34%
> > sched_debug.cfs_rq:/.right_vruntime.avg
> > 583105 ±223% +577.3% 3949123 ± 4%
> > sched_debug.cfs_rq:/.right_vruntime.max
> > 51338 ±223% +622.0% 370651 ± 16%
> > sched_debug.cfs_rq:/.right_vruntime.stddev
> > 1336 ± 7% +50.8% 2014 ± 6%
> > sched_debug.cfs_rq:/.runnable_avg.stddev
> > 552.53 ± 8% +19.6% 660.87 ± 5%
> > sched_debug.cfs_rq:/.util_est.avg
> > 384.27 ± 9% +28.9% 495.43 ± 11%
> > sched_debug.cfs_rq:/.util_est.stddev
> > 1328 ± 17% +42.7% 1896 ± 13%
> > sched_debug.cpu.curr->pid.stddev
> > 11.75 ± 8% +19.1% 14.00 ± 6%
> > sched_debug.cpu.nr_running.max
> > 2.71 ± 5% +22.7% 3.33 ± 4%
> > sched_debug.cpu.nr_running.stddev
> > 76578 ± 9% +33.7% 102390 ± 5%
> > sched_debug.cpu.nr_switches.stddev
> > 62.25 ± 7% +17.9% 73.42 ± 7%
> > sched_debug.cpu.nr_uninterruptible.max
> > 8.11 ± 58% -82.0% 1.46 ± 47% perf-
> > sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.a
> > lloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> > 12.04 ±104% -86.8% 1.58 ± 55% perf-
> > sched.sch_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon
> > _pipe_write
> > 0.11 ±123% -95.3% 0.01 ±102% perf-
> > sched.sch_delay.avg.ms.__cond_resched.down_write_killable.map_vdso.
> > load_elf_binary.exec_binprm
> > 0.06 ±103% -93.6% 0.00 ±154% perf-
> > sched.sch_delay.avg.ms.__cond_resched.filemap_read.__kernel_read.ex
> > ec_binprm.bprm_execve
> > 0.10 ±109% -93.9% 0.01 ±163% perf-
> > sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_a
> > lloc_nodes.mas_preallocate.vma_link
> > 1.00 ± 21% -59.6% 0.40 ± 50% perf-
> > sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs
> > _read.ksys_read
> > 14.54 ± 14% -79.2% 3.02 ± 51% perf-
> > sched.sch_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vf
> > s_write.ksys_write
> > 1.50 ± 84% -74.1% 0.39 ± 90% perf-
> > sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem
> > _alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> > 1.13 ± 68% -100.0% 0.00 perf-
> > sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.38 ± 97% -100.0% 0.00 perf-
> > sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 1.10 ± 17% -68.9% 0.34 ± 49% perf-
> > sched.sch_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall
> > _64
> > 42.25 ± 18% -71.7% 11.96 ± 53% perf-
> > sched.sch_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_sysc
> > all_64
> > 3.25 ± 17% -77.5% 0.73 ± 49% perf-
> > sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_S
> > YSCALL_64_after_hwframe.[unknown]
> > 29.17 ± 33% -62.0% 11.09 ± 85% perf-
> > sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown]
> > 46.25 ± 15% -68.8% 14.43 ± 52% perf-
> > sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown].[unknown]
> > 3.72 ± 70% -81.0% 0.70 ± 67% perf-
> > sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_resche
> > dule_ipi.[unknown]
> > 7.95 ± 55% -69.7% 2.41 ± 65% perf-
> > sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_resche
> > dule_ipi.[unknown].[unknown]
> > 3.66 ±139% -97.1% 0.11 ± 58% perf-
> > sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule
> > _timeout.constprop.0.do_poll
> > 3.05 ± 44% -91.9% 0.25 ± 57% perf-
> > sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_read
> > 29.96 ± 9% -83.6% 4.90 ± 48% perf-
> > sched.sch_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_write
> > 26.20 ± 59% -88.9% 2.92 ± 66% perf-
> > sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> > 0.14 ± 84% -91.2% 0.01 ±142% perf-
> > sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.a
> > lloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
> > 0.20 ±149% -97.5% 0.01 ±102% perf-
> > sched.sch_delay.max.ms.__cond_resched.down_write_killable.map_vdso.
> > load_elf_binary.exec_binprm
> > 0.11 ±144% -96.6% 0.00 ±154% perf-
> > sched.sch_delay.max.ms.__cond_resched.filemap_read.__kernel_read.ex
> > ec_binprm.bprm_execve
> > 0.19 ±118% -96.7% 0.01 ±163% perf-
> > sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_a
> > lloc_nodes.mas_preallocate.vma_link
> > 274.64 ± 95% -100.0% 0.00 perf-
> > sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 3.72 ±151% -100.0% 0.00 perf-
> > sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 3135 ± 5% -48.6% 1611 ± 57% perf-
> > sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_S
> > YSCALL_64_after_hwframe.[unknown]
> > 1320 ± 19% -78.6% 282.01 ± 74% perf-
> > sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_resche
> > dule_ipi.[unknown]
> > 265.55 ± 82% -77.9% 58.70 ±124% perf-
> > sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_read
> > 1850 ± 28% -59.1% 757.74 ± 68% perf-
> > sched.sch_delay.max.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_write
> > 766.85 ± 56% -68.0% 245.51 ± 51% perf-
> > sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> > 1.77 ± 17% -71.9% 0.50 ± 49% perf-
> > sched.total_sch_delay.average.ms
> > 5.15 ± 17% -69.5% 1.57 ± 48% perf-
> > sched.total_wait_and_delay.average.ms
> > 3.38 ± 17% -68.2% 1.07 ± 48% perf-
> > sched.total_wait_time.average.ms
> > 5100 ± 3% -31.0% 3522 ± 47% perf-
> > sched.total_wait_time.max.ms
> > 27.42 ± 49% -85.2% 4.07 ± 47% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_nop
> > rof.alloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> > 35.29 ± 80% -85.8% 5.00 ± 51% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.__mutex_lock.constprop.0
> > .anon_pipe_write
> > 42.28 ± 14% -79.4% 8.70 ± 51% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock.anon_pipe_wri
> > te.vfs_write.ksys_write
> > 3.12 ± 17% -66.4% 1.05 ± 48% perf-
> > sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_sy
> > scall_64
> > 122.62 ± 18% -70.4% 36.26 ± 53% perf-
> > sched.wait_and_delay.avg.ms.anon_pipe_write.vfs_write.ksys_write.do
> > _syscall_64
> > 250.26 ± 65% -94.2% 14.56 ± 55% perf-
> > sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entr
> > y_SYSCALL_64_after_hwframe
> > 9.37 ± 17% -78.2% 2.05 ± 48% perf-
> > sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.en
> > try_SYSCALL_64_after_hwframe.[unknown]
> > 58.34 ± 33% -62.0% 22.18 ± 85% perf-
> > sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_a
> > pic_timer_interrupt.[unknown]
> > 134.44 ± 15% -69.3% 41.24 ± 52% perf-
> > sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_a
> > pic_timer_interrupt.[unknown].[unknown]
> > 86.94 ± 6% -83.1% 14.68 ± 48% perf-
> > sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.
> > constprop.0.anon_pipe_write
> > 86.57 ± 39% -86.0% 12.14 ± 59% perf-
> > sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp
> > _kthread.kthread
> > 647.92 ± 48% -97.9% 13.86 ± 45% perf-
> > sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 6386 ± 6% -46.8% 3397 ± 57% perf-
> > sched.wait_and_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.en
> > try_SYSCALL_64_after_hwframe.[unknown]
> > 3868 ± 27% -60.4% 1531 ± 67% perf-
> > sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.
> > constprop.0.anon_pipe_write
> > 1647 ± 55% -67.7% 531.51 ± 50% perf-
> > sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp
> > _kthread.kthread
> > 5014 ± 5% -32.5% 3385 ± 47% perf-
> > sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork
> > .ret_from_fork_asm
> > 19.31 ± 47% -86.5% 2.61 ± 49% perf-
> > sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.a
> > lloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> > 23.25 ± 70% -85.3% 3.42 ± 52% perf-
> > sched.wait_time.avg.ms.__cond_resched.__mutex_lock.constprop.0.anon
> > _pipe_write
> > 18.33 ± 15% -42.0% 10.64 ± 49% perf-
> > sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move
> > _task.__set_cpus_allowed_ptr.__sched_setaffinity
> > 0.11 ±123% -95.3% 0.01 ±102% perf-
> > sched.wait_time.avg.ms.__cond_resched.down_write_killable.map_vdso.
> > load_elf_binary.exec_binprm
> > 0.06 ±103% -93.6% 0.00 ±154% perf-
> > sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.ex
> > ec_binprm.bprm_execve
> > 0.10 ±109% -93.9% 0.01 ±163% perf-
> > sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_a
> > lloc_nodes.mas_preallocate.vma_link
> > 1.70 ± 21% -52.6% 0.81 ± 48% perf-
> > sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_read.vfs
> > _read.ksys_read
> > 27.74 ± 15% -79.5% 5.68 ± 51% perf-
> > sched.wait_time.avg.ms.__cond_resched.mutex_lock.anon_pipe_write.vf
> > s_write.ksys_write
> > 2.17 ± 75% -100.0% 0.00 perf-
> > sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.42 ± 97% -100.0% 0.00 perf-
> > sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 2.02 ± 17% -65.1% 0.70 ± 48% perf-
> > sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall
> > _64
> > 80.37 ± 18% -69.8% 24.31 ± 52% perf-
> > sched.wait_time.avg.ms.anon_pipe_write.vfs_write.ksys_write.do_sysc
> > all_64
> > 210.13 ± 68% -95.1% 10.21 ± 55% perf-
> > sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYS
> > CALL_64_after_hwframe
> > 6.12 ± 17% -78.5% 1.32 ± 48% perf-
> > sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_S
> > YSCALL_64_after_hwframe.[unknown]
> > 29.17 ± 33% -62.0% 11.09 ± 85% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown]
> > 88.19 ± 16% -69.6% 26.81 ± 52% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown].[unknown]
> > 13.77 ± 45% -65.7% 4.72 ± 53% perf-
> > sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_resche
> > dule_ipi.[unknown].[unknown]
> > 104.64 ± 42% -76.4% 24.74 ±135% perf-
> > sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_fro
> > m_fork_asm
> > 5.16 ± 29% -92.5% 0.39 ± 48% perf-
> > sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_read
> > 56.98 ± 5% -82.9% 9.77 ± 48% perf-
> > sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_write
> > 60.36 ± 32% -84.7% 9.22 ± 57% perf-
> > sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> > 619.88 ± 43% -98.0% 12.52 ± 45% perf-
> > sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 0.14 ± 84% -91.2% 0.01 ±142% perf-
> > sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.a
> > lloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
> > 740.14 ± 35% -68.5% 233.31 ± 83% perf-
> > sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.a
> > lloc_pages_mpol.alloc_pages_noprof.anon_pipe_write
> > 0.20 ±149% -97.5% 0.01 ±102% perf-
> > sched.wait_time.max.ms.__cond_resched.down_write_killable.map_vdso.
> > load_elf_binary.exec_binprm
> > 0.11 ±144% -96.6% 0.00 ±154% perf-
> > sched.wait_time.max.ms.__cond_resched.filemap_read.__kernel_read.ex
> > ec_binprm.bprm_execve
> > 0.19 ±118% -96.7% 0.01 ±163% perf-
> > sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_a
> > lloc_nodes.mas_preallocate.vma_link
> > 327.64 ± 71% -100.0% 0.00 perf-
> > sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mo
> > de_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 3.72 ±151% -100.0% 0.00 perf-
> > sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_t
> > o_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
> > 3299 ± 6% -40.7% 1957 ± 51% perf-
> > sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_S
> > YSCALL_64_after_hwframe.[unknown]
> > 436.75 ± 39% -76.9% 100.85 ± 98% perf-
> > sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_read
> > 2112 ± 19% -62.3% 796.34 ± 63% perf-
> > sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.const
> > prop.0.anon_pipe_write
> > 947.83 ± 46% -58.8% 390.83 ± 53% perf-
> > sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> > 5014 ± 5% -32.5% 3385 ± 47% perf-
> > sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_
> > from_fork_asm
> >
> >
> >
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> > gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/shell_rtns_2/aim9/300s
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 11036 +85.7% 20499 meminfo.PageTables
> > 125.17 ± 8% +18.4% 148.17 ± 7% perf-c2c.HITM.local
> > 30464 +18.7% 36160
> > sched_debug.cpu.nr_switches.avg
> > 9166 +19.8% 10985 vmstat.system.cs
> > 6623 ± 17% +60.8% 10652 ± 5% numa-
> > meminfo.node0.PageTables
> > 4414 ± 26% +123.2% 9853 ± 6% numa-
> > meminfo.node1.PageTables
> > 1653 ± 17% +60.1% 2647 ± 5% numa-
> > vmstat.node0.nr_page_table_pages
> > 1097 ± 26% +123.9% 2457 ± 6% numa-
> > vmstat.node1.nr_page_table_pages
> > 319.08 -2.2% 312.04
> > aim9.shell_rtns_2.ops_per_sec
> > 27170926 -2.2% 26586121
> > aim9.time.minor_page_faults
> > 1051038 -2.2% 1027732
> > aim9.time.voluntary_context_switches
> > 2736 +86.4% 5101 proc-
> > vmstat.nr_page_table_pages
> > 28014 +1.3% 28378 proc-
> > vmstat.nr_slab_unreclaimable
> > 19332129 -1.5% 19048363 proc-vmstat.numa_hit
> > 19283853 -1.5% 18996609 proc-
> > vmstat.numa_local
> > 19892794 -1.5% 19598065 proc-
> > vmstat.pgalloc_normal
> > 28044189 -2.1% 27457289 proc-vmstat.pgfault
> > 19843766 -1.5% 19543091 proc-vmstat.pgfree
> > 419715 -5.7% 395688 ± 8% proc-vmstat.pgreuse
> > 2682 -2.0% 2628 proc-
> > vmstat.unevictable_pgs_culled
> > 0.07 ± 6% -30.5% 0.05 ± 22% perf-
> > sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthr
> > ead.kthread
> > 0.03 ± 6% +36.0% 0.04 perf-
> > sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 0.07 ± 33% -57.5% 0.03 ± 53% perf-
> > sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_co
> > mpletion_state.kernel_clone.__x64_sys_vfork
> > 0.02 ± 74% +112.0% 0.05 ± 36% perf-
> > sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link
> > _path_walk.path_openat
> > 0.02 +24.1% 0.02 ± 2% perf-
> > sched.total_sch_delay.average.ms
> > 27.52 -14.0% 23.67 perf-
> > sched.total_wait_and_delay.average.ms
> > 23179 +18.3% 27421 perf-
> > sched.total_wait_and_delay.count.ms
> > 27.50 -14.0% 23.65 perf-
> > sched.total_wait_time.average.ms
> > 117.03 ± 3% -72.4% 32.27 ± 2% perf-
> > sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 1655 ± 2% +282.0% 6324 perf-
> > sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_
> > from_fork_asm
> > 0.96 ± 29% +51.6% 1.45 ± 22% perf-
> > sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.i
> > sra.0
> > 117.00 ± 3% -72.5% 32.23 ± 2% perf-
> > sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 5.93 +0.1 6.00 perf-stat.i.branch-
> > miss-rate%
> > 9189 +19.8% 11011 perf-stat.i.context-
> > switches
> > 1.96 +1.6% 1.99 perf-stat.i.cpi
> > 71.21 +60.6% 114.39 ± 4% perf-stat.i.cpu-
> > migrations
> > 0.53 -1.5% 0.52 perf-stat.i.ipc
> > 3.79 -2.1% 3.71 perf-
> > stat.i.metric.K/sec
> > 90998 -2.1% 89084 perf-stat.i.minor-
> > faults
> > 90998 -2.1% 89084 perf-stat.i.page-
> > faults
> > 5.99 +0.1 6.06 perf-
> > stat.overall.branch-miss-rate%
> > 1.79 +1.4% 1.82 perf-
> > stat.overall.cpi
> > 0.56 -1.3% 0.55 perf-
> > stat.overall.ipc
> > 9158 +19.8% 10974 perf-
> > stat.ps.context-switches
> > 70.99 +60.6% 114.02 ± 4% perf-stat.ps.cpu-
> > migrations
> > 90694 -2.1% 88787 perf-stat.ps.minor-
> > faults
> > 90695 -2.1% 88787 perf-stat.ps.page-
> > faults
> > 8.155e+11 -1.1% 8.065e+11 perf-
> > stat.total.instructions
> > 8.87 -0.3 8.55 perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 8.86 -0.3 8.54 perf-
> > profile.calltrace.cycles-
> > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 2.53 ± 2% -0.1 2.43 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
> > 2.54 -0.1 2.44 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
> > 2.49 -0.1 2.40 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
> > 0.98 ± 5% -0.1 0.90 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYS
> > CALL_64_after_hwframe
> > 0.70 ± 3% -0.1 0.62 ± 6% perf-
> > profile.calltrace.cycles-
> > pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
> > 0.18 ±141% +0.5 0.67 ± 6% perf-
> > profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> > 0.18 ±141% +0.5 0.67 ± 6% perf-
> > profile.calltrace.cycles-pp.ret_from_fork_asm
> > 0.00 +0.6 0.59 ± 7% perf-
> > profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> > 62.48 +0.7 63.14 perf-
> > profile.calltrace.cycles-
> > pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_
> > startup_entry
> > 49.10 +0.7 49.78 perf-
> > profile.calltrace.cycles-
> > pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.d
> > o_idle
> > 67.62 +0.8 68.43 perf-
> > profile.calltrace.cycles-pp.common_startup_64
> > 20.14 -0.7 19.40 perf-
> > profile.children.cycles-pp.do_syscall_64
> > 20.18 -0.7 19.44 perf-
> > profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 3.33 ± 2% -0.2 3.16 ± 2% perf-
> > profile.children.cycles-pp.vm_mmap_pgoff
> > 3.22 ± 2% -0.2 3.06 perf-
> > profile.children.cycles-pp.do_mmap
> > 3.51 ± 2% -0.1 3.38 perf-
> > profile.children.cycles-pp.do_exit
> > 3.52 ± 2% -0.1 3.38 perf-
> > profile.children.cycles-pp.__x64_sys_exit_group
> > 3.52 ± 2% -0.1 3.38 perf-
> > profile.children.cycles-pp.do_group_exit
> > 3.67 -0.1 3.54 perf-
> > profile.children.cycles-pp.x64_sys_call
> > 2.21 -0.1 2.09 ± 3% perf-
> > profile.children.cycles-pp.__x64_sys_openat
> > 2.07 ± 2% -0.1 1.94 ± 2% perf-
> > profile.children.cycles-pp.path_openat
> > 2.09 ± 2% -0.1 1.97 ± 2% perf-
> > profile.children.cycles-pp.do_filp_open
> > 2.19 -0.1 2.08 ± 3% perf-
> > profile.children.cycles-pp.do_sys_openat2
> > 1.50 ± 4% -0.1 1.39 ± 3% perf-
> > profile.children.cycles-pp.copy_process
> > 2.56 -0.1 2.46 ± 2% perf-
> > profile.children.cycles-pp.exit_mm
> > 2.55 -0.1 2.44 ± 2% perf-
> > profile.children.cycles-pp.__mmput
> > 2.51 ± 2% -0.1 2.41 ± 2% perf-
> > profile.children.cycles-pp.exit_mmap
> > 0.70 ± 3% -0.1 0.62 ± 6% perf-
> > profile.children.cycles-pp.dup_mm
> > 0.94 ± 4% -0.1 0.89 ± 2% perf-
> > profile.children.cycles-pp.__alloc_frozen_pages_noprof
> > 0.57 ± 3% -0.0 0.52 ± 4% perf-
> > profile.children.cycles-pp.alloc_pages_noprof
> > 0.20 ± 12% -0.0 0.15 ± 10% perf-
> > profile.children.cycles-pp.perf_event_task_tick
> > 0.18 ± 4% -0.0 0.14 ± 15% perf-
> > profile.children.cycles-pp.xas_find
> > 0.10 ± 12% -0.0 0.07 ± 24% perf-
> > profile.children.cycles-pp.up_write
> > 0.09 ± 6% -0.0 0.07 ± 11% perf-
> > profile.children.cycles-pp.tick_check_broadcast_expired
> > 0.08 ± 12% +0.0 0.10 ± 8% perf-
> > profile.children.cycles-pp.hrtimer_try_to_cancel
> > 0.10 ± 13% +0.0 0.13 ± 5% perf-
> > profile.children.cycles-pp.__perf_event_task_sched_out
> > 0.20 ± 8% +0.0 0.23 ± 4% perf-
> > profile.children.cycles-pp.enqueue_entity
> > 0.21 ± 9% +0.0 0.25 ± 4% perf-
> > profile.children.cycles-pp.prepare_task_switch
> > 0.03 ±101% +0.0 0.07 ± 16% perf-
> > profile.children.cycles-pp.run_ksoftirqd
> > 0.04 ± 71% +0.1 0.09 ± 15% perf-
> > profile.children.cycles-pp.kick_pool
> > 0.05 ± 47% +0.1 0.11 ± 16% perf-
> > profile.children.cycles-pp.__queue_work
> > 0.28 ± 5% +0.1 0.34 ± 7% perf-
> > profile.children.cycles-pp.exit_to_user_mode_loop
> > 0.50 +0.1 0.56 ± 2% perf-
> > profile.children.cycles-pp.timerqueue_del
> > 0.04 ± 71% +0.1 0.11 ± 17% perf-
> > profile.children.cycles-pp.queue_work_on
> > 0.51 ± 4% +0.1 0.58 ± 2% perf-
> > profile.children.cycles-pp.enqueue_task_fair
> > 0.32 ± 3% +0.1 0.40 ± 4% perf-
> > profile.children.cycles-pp.ttwu_do_activate
> > 0.53 ± 5% +0.1 0.61 ± 3% perf-
> > profile.children.cycles-pp.enqueue_task
> > 0.49 ± 4% +0.1 0.57 ± 6% perf-
> > profile.children.cycles-pp.schedule
> > 0.28 ± 6% +0.1 0.38 perf-
> > profile.children.cycles-pp.sched_ttwu_pending
> > 0.32 ± 5% +0.1 0.43 ± 2% perf-
> > profile.children.cycles-pp.__flush_smp_call_function_queue
> > 0.35 ± 8% +0.1 0.47 ± 2% perf-
> > profile.children.cycles-pp.flush_smp_call_function_queue
> > 0.17 ± 10% +0.2 0.34 ± 12% perf-
> > profile.children.cycles-pp.worker_thread
> > 0.88 ± 3% +0.2 1.06 ± 4% perf-
> > profile.children.cycles-pp.ret_from_fork
> > 0.88 ± 3% +0.2 1.06 ± 4% perf-
> > profile.children.cycles-pp.ret_from_fork_asm
> > 0.39 ± 6% +0.2 0.59 ± 7% perf-
> > profile.children.cycles-pp.kthread
> > 66.24 +0.6 66.85 perf-
> > profile.children.cycles-pp.cpuidle_idle_call
> > 63.09 +0.6 63.73 perf-
> > profile.children.cycles-pp.cpuidle_enter
> > 62.97 +0.6 63.61 perf-
> > profile.children.cycles-pp.cpuidle_enter_state
> > 67.61 +0.8 68.43 perf-
> > profile.children.cycles-pp.do_idle
> > 67.62 +0.8 68.43 perf-
> > profile.children.cycles-pp.common_startup_64
> > 67.62 +0.8 68.43 perf-
> > profile.children.cycles-pp.cpu_startup_entry
> > 0.37 ± 11% -0.1 0.31 ± 3% perf-
> > profile.self.cycles-pp.__memcg_slab_post_alloc_hook
> > 0.10 ± 13% -0.0 0.06 ± 50% perf-
> > profile.self.cycles-pp.up_write
> > 0.15 ± 4% +0.1 0.22 ± 8% perf-
> > profile.self.cycles-pp.timerqueue_del
> >
> >
> >
> > ***************************************************************************************************
> > lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> > gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/exec_test/aim9/300s
> >
> > commit:
> > baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> > f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 12120 +76.7% 21422 meminfo.PageTables
> > 8543 +26.9% 10840 vmstat.system.cs
> > 6148 ± 11% +89.9% 11678 ± 5% numa-
> > meminfo.node0.PageTables
> > 5909 ± 11% +64.0% 9689 ± 7% numa-
> > meminfo.node1.PageTables
> > 1532 ± 10% +90.5% 2919 ± 5% numa-
> > vmstat.node0.nr_page_table_pages
> > 1468 ± 11% +65.2% 2426 ± 7% numa-
> > vmstat.node1.nr_page_table_pages
> > 2991 +78.0% 5323 proc-
> > vmstat.nr_page_table_pages
> > 32726750 -2.4% 31952115 proc-vmstat.pgfault
> > 1228 -2.6% 1197
> > aim9.exec_test.ops_per_sec
> > 11018 ± 2% +10.5% 12178 ± 2%
> > aim9.time.involuntary_context_switches
> > 31835059 -2.4% 31062527
> > aim9.time.minor_page_faults
> > 736468 -2.9% 715310
> > aim9.time.voluntary_context_switches
> > 0.28 ± 7% +11.3% 0.31 ± 6%
> > sched_debug.cfs_rq:/.h_nr_queued.stddev
> > 0.28 ± 7% +11.3% 0.31 ± 6%
> > sched_debug.cfs_rq:/.nr_queued.stddev
> > 356683 ± 16% +27.0% 453000 ± 9%
> > sched_debug.cpu.avg_idle.min
> > 27620 ± 7% +29.5% 35775
> > sched_debug.cpu.nr_switches.avg
> > 84830 ± 14% +16.3% 98648 ± 4%
> > sched_debug.cpu.nr_switches.max
> > 4563 ± 26% +46.2% 6671 ± 26%
> > sched_debug.cpu.nr_switches.min
> > 0.03 ± 4% -67.3% 0.01 ±141% perf-
> > sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exec_release
> > .exec_mm_release.exec_mmap
> > 0.03 +11.2% 0.03 ± 2% perf-
> > sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 0.05 ± 28% +61.3% 0.09 ± 21% perf-
> > sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_t
> > imer_interrupt.[unknown].[unknown]
> > 0.10 ± 18% +18.8% 0.12 perf-
> > sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_
> > completion_state.kernel_clone
> > 0.02 ± 3% +18.3% 0.02 ± 2% perf-
> > sched.total_sch_delay.average.ms
> > 28.80 -19.8% 23.10 ± 3% perf-
> > sched.total_wait_and_delay.average.ms
> > 22332 +24.4% 27778 perf-
> > sched.total_wait_and_delay.count.ms
> > 28.78 -19.8% 23.07 ± 3% perf-
> > sched.total_wait_time.average.ms
> > 17.39 ± 10% -15.6% 14.67 ± 4% perf-
> > sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine
> > _move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> > 41.02 ± 4% -54.6% 18.64 ± 6% perf-
> > sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret
> > _from_fork_asm
> > 4795 ± 2% +122.5% 10668 perf-
> > sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_
> > from_fork_asm
> > 17.35 ± 10% -15.7% 14.63 ± 4% perf-
> > sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move
> > _task.__set_cpus_allowed_ptr.__sched_setaffinity
> > 0.00 ±141% +400.0% 0.00 ± 44% perf-
> > sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vf
> > s_open
> > 40.99 ± 4% -54.6% 18.61 ± 6% perf-
> > sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from
> > _fork_asm
> > 0.00 ±149% +542.9% 0.03 ± 41% perf-
> > sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vf
> > s_open
> > 5.617e+08 -1.6% 5.529e+08 perf-stat.i.branch-
> > instructions
> > 5.76 +0.1 5.84 perf-stat.i.branch-
> > miss-rate%
> > 8562 +27.0% 10878 perf-stat.i.context-
> > switches
> > 1.87 +2.6% 1.92 perf-stat.i.cpi
> > 78.02 ± 3% +11.8% 87.23 ± 2% perf-stat.i.cpu-
> > migrations
> > 2.792e+09 -1.6% 2.748e+09 perf-
> > stat.i.instructions
> > 0.55 -2.5% 0.54 perf-stat.i.ipc
> > 4.42 -2.4% 4.31 perf-
> > stat.i.metric.K/sec
> > 106019 -2.4% 103509 perf-stat.i.minor-
> > faults
> > 106019 -2.4% 103509 perf-stat.i.page-
> > faults
> > 5.83 +0.1 5.91 perf-
> > stat.overall.branch-miss-rate%
> > 1.72 +2.3% 1.76 perf-
> > stat.overall.cpi
> > 0.58 -2.3% 0.57 perf-
> > stat.overall.ipc
> > 5.599e+08 -1.6% 5.511e+08 perf-stat.ps.branch-
> > instructions
> > 8534 +27.0% 10841 perf-
> > stat.ps.context-switches
> > 77.77 ± 3% +11.8% 86.96 ± 2% perf-stat.ps.cpu-
> > migrations
> > 2.783e+09 -1.6% 2.739e+09 perf-
> > stat.ps.instructions
> > 105666 -2.4% 103164 perf-stat.ps.minor-
> > faults
> > 105666 -2.4% 103164 perf-stat.ps.page-
> > faults
> > 8.386e+11 -1.6% 8.253e+11 perf-
> > stat.total.instructions
> > 7.79 -0.4 7.41 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_
> > 64_after_hwframe.execve
> > 7.75 -0.3 7.47 perf-
> > profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 7.73 -0.3 7.46 perf-
> > profile.calltrace.cycles-
> > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 2.68 ± 2% -0.2 2.52 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_sysca
> > ll_64
> > 2.68 ± 2% -0.2 2.52 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64
> > _after_hwframe
> > 2.68 ± 2% -0.2 2.52 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.en
> > try_SYSCALL_64_after_hwframe
> > 2.73 ± 2% -0.2 2.57 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 2.60 -0.1 2.46 ± 3% perf-
> > profile.calltrace.cycles-
> > pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.ex
> > ecve.exec_test
> > 2.61 -0.1 2.47 ± 3% perf-
> > profile.calltrace.cycles-pp.execve.exec_test
> > 2.60 -0.1 2.46 ± 3% perf-
> > profile.calltrace.cycles-
> > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve.exec_test
> > 2.60 -0.1 2.46 ± 3% perf-
> > profile.calltrace.cycles-
> > pp.entry_SYSCALL_64_after_hwframe.execve.exec_test
> > 1.92 ± 3% -0.1 1.79 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
> > 1.92 ± 3% -0.1 1.80 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
> > 4.68 -0.1 4.57 perf-
> > profile.calltrace.cycles-pp._Fork
> > 1.88 ± 2% -0.1 1.77 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
> > 2.76 -0.1 2.66 ± 2% perf-
> > profile.calltrace.cycles-pp.exec_test
> > 3.24 -0.1 3.16 perf-
> > profile.calltrace.cycles-
> > pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYS
> > CALL_64_after_hwframe
> > 0.84 ± 4% -0.1 0.77 ± 5% perf-
> > profile.calltrace.cycles-pp.wait4
> > 0.88 ± 7% +0.2 1.09 ± 3% perf-
> > profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> > 0.88 ± 7% +0.2 1.09 ± 3% perf-
> > profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> > 0.88 ± 7% +0.2 1.09 ± 3% perf-
> > profile.calltrace.cycles-pp.ret_from_fork_asm
> > 0.46 ± 45% +0.3 0.78 ± 5% perf-
> > profile.calltrace.cycles-
> > pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> > 0.17 ±141% +0.4 0.53 ± 4% perf-
> > profile.calltrace.cycles-
> > pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_seconda
> > ry
> > 0.18 ±141% +0.4 0.54 ± 2% perf-
> > profile.calltrace.cycles-
> > pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.common_s
> > tartup_64
> > 66.08 +0.8 66.85 perf-
> > profile.calltrace.cycles-
> > pp.cpu_startup_entry.start_secondary.common_startup_64
> > 66.08 +0.8 66.85 perf-
> > profile.calltrace.cycles-pp.start_secondary.common_startup_64
> > 66.02 +0.8 66.80 perf-
> > profile.calltrace.cycles-
> > pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> > 67.06 +0.9 68.00 perf-
> > profile.calltrace.cycles-pp.common_startup_64
> > 21.19 -0.9 20.30 perf-
> > profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 21.15 -0.9 20.27 perf-
> > profile.children.cycles-pp.do_syscall_64
> > 7.92 -0.4 7.53 ± 2% perf-
> > profile.children.cycles-pp.execve
> > 7.94 -0.4 7.56 ± 2% perf-
> > profile.children.cycles-pp.__x64_sys_execve
> > 7.84 -0.4 7.46 ± 2% perf-
> > profile.children.cycles-pp.do_execveat_common
> > 5.51 -0.3 5.25 ± 2% perf-
> > profile.children.cycles-pp.load_elf_binary
> > 3.68 -0.2 3.49 ± 2% perf-
> > profile.children.cycles-pp.__mmput
> > 2.81 ± 2% -0.2 2.63 perf-
> > profile.children.cycles-pp.__x64_sys_exit_group
> > 2.80 ± 2% -0.2 2.62 ± 2% perf-
> > profile.children.cycles-pp.do_exit
> > 2.81 ± 2% -0.2 2.62 ± 2% perf-
> > profile.children.cycles-pp.do_group_exit
> > 2.93 ± 2% -0.2 2.76 ± 2% perf-
> > profile.children.cycles-pp.x64_sys_call
> > 3.60 -0.2 3.44 ± 2% perf-
> > profile.children.cycles-pp.exit_mmap
> > 5.66 -0.1 5.51 perf-
> > profile.children.cycles-pp.__handle_mm_fault
> > 1.94 ± 3% -0.1 1.82 ± 2% perf-
> > profile.children.cycles-pp.exit_mm
> >       2.64            -0.1        2.52 ±  3%  perf-profile.children.cycles-pp.vm_mmap_pgoff
> >       2.55 ±  2%      -0.1        2.43 ±  3%  perf-profile.children.cycles-pp.do_mmap
> >       2.19 ±  2%      -0.1        2.08 ±  3%  perf-profile.children.cycles-pp.__mmap_region
> >       2.27            -0.1        2.16 ±  2%  perf-profile.children.cycles-pp.begin_new_exec
> >       2.79            -0.1        2.69 ±  2%  perf-profile.children.cycles-pp.exec_test
> >       0.83 ±  4%      -0.1        0.76 ±  6%  perf-profile.children.cycles-pp.__mmap_prepare
> >       0.86 ±  4%      -0.1        0.78 ±  5%  perf-profile.children.cycles-pp.wait4
> >       0.52 ±  5%      -0.1        0.45 ±  7%  perf-profile.children.cycles-pp.kernel_wait4
> >       0.50 ±  5%      -0.1        0.43 ±  6%  perf-profile.children.cycles-pp.do_wait
> >       0.88 ±  3%      -0.1        0.81 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
> >       0.51 ±  2%      -0.1        0.46 ±  6%  perf-profile.children.cycles-pp.setup_arg_pages
> >       0.39 ±  2%      -0.0        0.34 ±  8%  perf-profile.children.cycles-pp.unlink_anon_vmas
> >       0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
> >       0.37 ±  5%      -0.0        0.33 ±  3%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
> >       0.21 ±  6%      -0.0        0.17 ±  5%  perf-profile.children.cycles-pp.user_path_at
> >       0.21 ±  3%      -0.0        0.18 ± 10%  perf-profile.children.cycles-pp.__percpu_counter_sum
> >       0.18 ±  7%      -0.0        0.15 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
> >       0.33 ±  5%      -0.0        0.30        perf-profile.children.cycles-pp.relocate_vma_down
> >       0.04 ± 45%      +0.0        0.08 ± 12%  perf-profile.children.cycles-pp.__update_load_avg_se
> >       0.14 ±  7%      +0.0        0.18 ± 10%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
> >       0.19 ±  9%      +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.prepare_task_switch
> >       0.02 ±142%      +0.0        0.06 ± 23%  perf-profile.children.cycles-pp.select_task_rq
> >       0.03 ±100%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.task_contending
> >       0.45 ±  7%      +0.1        0.51 ±  3%  perf-profile.children.cycles-pp.__pick_next_task
> >       0.14 ± 22%      +0.1        0.20 ± 10%  perf-profile.children.cycles-pp.kick_pool
> >       0.36 ±  4%      +0.1        0.42 ±  4%  perf-profile.children.cycles-pp.dequeue_entities
> >       0.36 ±  4%      +0.1        0.44 ±  5%  perf-profile.children.cycles-pp.dequeue_task_fair
> >       0.15 ± 20%      +0.1        0.23 ± 10%  perf-profile.children.cycles-pp.__queue_work
> >       0.49 ±  5%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.schedule_idle
> >       0.14 ± 22%      +0.1        0.23 ±  9%  perf-profile.children.cycles-pp.queue_work_on
> >       0.36 ±  3%      +0.1        0.46 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >       0.47 ±  7%      +0.1        0.57 ±  7%  perf-profile.children.cycles-pp.timerqueue_del
> >       0.30 ± 13%      +0.1        0.42 ±  7%  perf-profile.children.cycles-pp.ttwu_do_activate
> >       0.23 ± 15%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
> >       0.18 ± 14%      +0.1        0.32 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
> >       0.19 ± 13%      +0.1        0.34 ±  4%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> >       0.61 ±  3%      +0.2        0.76 ±  5%  perf-profile.children.cycles-pp.schedule
> >       1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork_asm
> >       1.60 ±  4%      +0.2        1.80 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
> >       0.88 ±  7%      +0.2        1.09 ±  3%  perf-profile.children.cycles-pp.kthread
> >       1.22 ±  3%      +0.2        1.45 ±  5%  perf-profile.children.cycles-pp.__schedule
> >       0.54 ±  8%      +0.2        0.78 ±  5%  perf-profile.children.cycles-pp.worker_thread
> >      66.08            +0.8       66.85        perf-profile.children.cycles-pp.start_secondary
> >      67.06            +0.9       68.00        perf-profile.children.cycles-pp.common_startup_64
> >      67.06            +0.9       68.00        perf-profile.children.cycles-pp.cpu_startup_entry
> >      67.06            +0.9       68.00        perf-profile.children.cycles-pp.do_idle
> >       0.08 ± 10%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
> >       0.04 ± 45%      +0.0        0.08 ± 10%  perf-profile.self.cycles-pp.__update_load_avg_se
> >       0.14 ± 10%      +0.1        0.23 ± 11%  perf-profile.self.cycles-pp.timerqueue_del
> >
> >
> >
> > ***************************************************************************************************
> > lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > =========================================================================================
> > compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> >   gcc-12/performance/1BRD_48G/xfs/x86_64-rhel-9.4/600/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/sync_disk_rw/aim7
> >
> > commit:
> >   baffb12277 ("sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes")
> >   f3de761c52 ("sched: Move task_mm_cid_work to mm work_struct")
> >
> > baffb122772da116 f3de761c52148abfb1b4512914f
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >     344180 ±  6%     -13.0%     299325 ±  9%  meminfo.Mapped
> >       9594 ±123%    +191.8%      27995 ± 54%  numa-meminfo.node1.PageTables
> >       2399 ±123%    +191.3%       6989 ± 54%  numa-vmstat.node1.nr_page_table_pages
> >    1860734            -5.2%    1763194        vmstat.io.bo
> >     831686            +1.3%     842493        vmstat.system.cs
> >      50372            -5.5%      47609        aim7.jobs-per-min
> >    1435644           +11.5%    1600707        aim7.time.involuntary_context_switches
> >       7242            +1.2%       7332        aim7.time.percent_of_cpu_this_job_got
> >       5159            +7.1%       5526        aim7.time.system_time
> >   33195986            +6.9%   35497140        aim7.time.voluntary_context_switches
> >      40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
> >      40987 ± 10%     -19.8%      32872 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >     605972 ±  2%     +14.5%     693922 ±  7%  sched_debug.cpu.avg_idle.max
> >      30974 ±  8%     -20.9%      24498 ± 15%  sched_debug.cpu.avg_idle.min
> >     118758 ±  5%     +22.0%     144899 ±  6%  sched_debug.cpu.avg_idle.stddev
> >     856253            +1.5%     869009        perf-stat.i.context-switches
> >       3.06            +2.3%       3.13        perf-stat.i.cpi
> >     164824            +7.7%     177546        perf-stat.i.cpu-migrations
> >       7.93            +2.5%       8.13        perf-stat.i.metric.K/sec
> >       3.41            +1.8%       3.47        perf-stat.overall.cpi
> >       1355            +5.8%       1434 ±  4%  perf-stat.overall.cycles-between-cache-misses
> >       0.29            -1.8%       0.29        perf-stat.overall.ipc
> >     845412            +1.6%     858925        perf-stat.ps.context-switches
> >     162728            +7.8%     175475        perf-stat.ps.cpu-migrations
> >  4.391e+12            +5.0%  4.609e+12        perf-stat.total.instructions
> >     444798            +6.0%     471383 ±  5%  proc-vmstat.nr_active_anon
> >      28190            -2.8%      27402        proc-vmstat.nr_dirty
> >    1231373            +2.3%    1259666 ±  2%  proc-vmstat.nr_file_pages
> >      63763            +0.9%      64355        proc-vmstat.nr_inactive_file
> >      86758 ±  6%     -12.9%      75546 ±  8%  proc-vmstat.nr_mapped
> >      10162 ±  2%      +7.2%      10895 ±  3%  proc-vmstat.nr_page_table_pages
> >     265229           +10.4%     292795 ±  9%  proc-vmstat.nr_shmem
> >     444798            +6.0%     471383 ±  5%  proc-vmstat.nr_zone_active_anon
> >      63763            +0.9%      64355        proc-vmstat.nr_zone_inactive_file
> >      28191            -2.8%      27400        proc-vmstat.nr_zone_write_pending
> >      24349           +11.6%      27171 ±  8%  proc-vmstat.pgreuse
> >       0.02 ±  3%     +11.3%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >       0.29 ± 17%     -30.7%       0.20 ± 14%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
> >       0.03 ± 10%     +33.5%       0.04 ±  2%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> >       0.21 ± 32%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.16 ± 16%     +51.9%       0.24 ± 11%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       0.22 ± 19%     +44.1%       0.32 ± 25%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> >       0.30 ± 28%     -38.7%       0.18 ± 28%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> >       0.11 ±  5%     +12.8%       0.12 ±  4%  perf-sched.sch_delay.avg.ms.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
> >       0.08 ±  4%     +15.8%       0.09 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait.xlog_force_lsn.xfs_log_force_seq.xfs_file_fsync
> >       0.02 ±  3%     +13.7%       0.02 ±  4%  perf-sched.sch_delay.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
> >       0.01 ±223%   +1289.5%       0.09 ±111%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
> >       2.49 ± 40%     -43.4%       1.41 ± 50%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.xfs_file_fsync.xfs_file_buffered_write.vfs_write
> >       0.76 ±  7%     +92.8%       1.46 ± 40%  perf-sched.sch_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
> >       0.65 ± 41%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       1.40 ± 64%   +2968.7%      43.04 ± 13%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       0.63 ± 19%     +89.8%       1.19 ± 51%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >      28.67 ±  3%     -11.2%      25.45 ±  5%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
> >       0.80 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >       5.76 ±107%    +152.4%      14.53 ± 10%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       8441          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >      18.67 ± 71%    +108.0%      38.83 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_trans_commit
> >     116.17 ±105%   +1677.8%       2065 ±  5%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >     424.79 ±151%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >      28.51 ±  3%     -11.2%      25.31 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__flush_workqueue.xlog_cil_push_now.isra
> >       0.38 ± 59%     -79.0%       0.08 ±107%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
> >       0.77 ±  9%     -56.5%       0.34 ±  3%  perf-sched.wait_time.avg.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_write_get_more_iclog_space
> >       1.80 ±138%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       6.13 ± 93%    +133.2%      14.29 ± 10%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       1.00 ± 16%     -48.1%       0.52 ± 20%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
> >       0.92 ± 16%     -62.0%       0.35 ± 14%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
> >       0.26 ±  2%     -59.8%       0.11        perf-sched.wait_time.avg.ms.xlog_wait_on_iclog.xlog_cil_push_work.process_one_work.worker_thread
> >       0.24 ±223%   +2180.2%       5.56 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.xlog_cil_ctx_alloc.xlog_cil_push_work.process_one_work
> >       1.25 ± 77%     -79.8%       0.25 ±107%  perf-sched.wait_time.max.ms.__cond_resched.down.xlog_write_iclog.xlog_state_release_iclog.xlog_state_get_iclog_space
> >       1.78 ± 51%    +958.6%      18.82 ±117%  perf-sched.wait_time.max.ms.__cond_resched.mempool_alloc_noprof.bio_alloc_bioset.iomap_writepage_map_blocks.iomap_writepage_map
> >      58.48 ±  6%     -10.7%      52.22 ±  2%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.__flush_workqueue.xlog_cil_push_now.isra
> >      10.87 ±192%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       8.63 ± 27%     -63.9%       3.12 ± 29%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xlog_cil_push_work
> >
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct
2025-06-25 13:57 ` Mathieu Desnoyers
2025-06-25 15:06 ` Gabriele Monaco
@ 2025-07-02 13:58 ` Gabriele Monaco
1 sibling, 0 replies; 11+ messages in thread
From: Gabriele Monaco @ 2025-07-02 13:58 UTC (permalink / raw)
To: Mathieu Desnoyers, kernel test robot
Cc: oe-lkp, lkp, linux-mm, linux-kernel, aubrey.li, yu.c.chen,
Andrew Morton, David Hildenbrand, Ingo Molnar, Peter Zijlstra,
Paul E. McKenney, Ingo Molnar
On Wed, 2025-06-25 at 09:57 -0400, Mathieu Desnoyers wrote:
> On 2025-06-25 04:01, kernel test robot wrote:
> >
> > Hello,
> >
> > kernel test robot noticed a 10.1% regression of
> > hackbench.throughput on:
>
> Hi Gabriele,
>
> This is a significant regression. Can you investigate before it gets
> merged ?
>
Hi Mathieu,
I ran some tests; the culprit for this performance regression seems to
be the interference from the more consistent `mm_cid` scans and the fact
that they now run from a `work_struct`, which adds some scheduling
overhead.
One solution could be to reduce the frequency: the scans currently run
(sporadically) about every 100ms; if the minimum delay is raised to 1s,
the test results look fine.
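For reference, a minimal sketch of what I mean, assuming the gating
constant stays where it is today (if I read the current code right,
that is MM_CID_SCAN_DELAY in include/linux/mm_types.h, in milliseconds):

	-#define MM_CID_SCAN_DELAY	100	/* 100ms */
	+#define MM_CID_SCAN_DELAY	1000	/* 1s */

Everything else would stay as in the patch; only the minimum spacing
between two consecutive scans of the same mm changes.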
However, I tried another approach that seems promising: the work_struct
gets scheduled relatively fast, which ends up causing a lot of
contention with the kworkers, whereas something like a timer_list is
less aggressive and gives similar reliability for the mm_cid scan calls
without the same performance impact.
At the moment I kept roughly the same structure as the patch and simply
used a timer delayed by 1 jiffy in place of the work_struct; a rough
sketch is below.
It may look cleaner to use the timer directly for the 100ms delay
instead of storing and checking the time, in effect running a scan
about 100ms after every rseq_handle_notify_resume.
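To make the idea concrete, this is roughly the shape I'm experimenting
with. It is an untested sketch rather than the posted patch:
mm_cid_timer, task_mm_cid_scan() and task_queue_mm_cid() are placeholder
names, and it assumes the scan is safe to run from timer (softirq)
context; if not, the callback would only queue the existing work item.

	#include <linux/timer.h>
	#include <linux/jiffies.h>
	#include <linux/mm_types.h>

	/* Timer callback: run the mm_cid compaction for this mm. */
	static void task_mm_cid_timer_fn(struct timer_list *t)
	{
		struct mm_struct *mm = container_of(t, struct mm_struct, mm_cid_timer);

		task_mm_cid_scan(mm);
	}

	/*
	 * Arming side, called from __rseq_handle_notify_resume() where the
	 * patch currently does queue_work(); the timer itself would be set
	 * up with timer_setup() in mm_init() and torn down at mmdrop().
	 */
	static void task_queue_mm_cid(struct mm_struct *mm)
	{
		/* One jiffy of delay avoids contending with the resuming task. */
		mod_timer(&mm->mm_cid_timer, jiffies + 1);
	}

The variant in the previous paragraph would simply arm the timer with
jiffies + msecs_to_jiffies(100) from the same call site and drop the
stored-timestamp check.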
What do you think?
Thanks,
Gabriele
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-07-02 14:00 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-13 9:12 [RESEND PATCH v13 0/3] sched: Restructure task_mm_cid_work for predictability Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 1/3] sched: Add prev_sum_exec_runtime support for RT, DL and SCX classes Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 2/3] sched: Move task_mm_cid_work to mm work_struct Gabriele Monaco
2025-06-25 8:01 ` kernel test robot
2025-06-25 13:57 ` Mathieu Desnoyers
2025-06-25 15:06 ` Gabriele Monaco
2025-07-02 13:58 ` Gabriele Monaco
2025-06-13 9:12 ` [RESEND PATCH v13 3/3] selftests/rseq: Add test for mm_cid compaction Gabriele Monaco
2025-06-18 21:04 ` Shuah Khan
2025-06-20 17:20 ` Gabriele Monaco
2025-06-20 17:29 ` Mathieu Desnoyers