* [PATCH v2 00/11] sched_ext: Implement core event counters
@ 2025-01-26 10:16 Changwoo Min
2025-01-26 10:16 ` [PATCH v2 01/11] sched_ext: Implement event counter infrastructure Changwoo Min
` (11 more replies)
0 siblings, 12 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
The sched_ext core often has to override the BPF scheduler's decisions,
and some of those events are interesting but not easily visible.
This patchset addresses the problem in the following manner:
- Introduce an infrastructure to collect such events in a scalable and
extensible way and to expose the collected events to the BPF scheduler
in a compatible way.
- Define seven events to be collected.
- Modify an scx scheduler to demonstrate the usage of the new BPF APIs.
ChangeLog: v1 -> v2
- Rename scx_event_stat and scx_bpf_event_stat() to scx_event_stats and
scx_bpf_event_stats().
- Rename event names following the convention of $COMPONENT_$EVENT.
- Rename event_stats to event_stats_cpu.
- Drop the enum scx_event_kind and related macros.
- Revise scx_add_event() to use this_cpu_add().
- Add __scx_add_event() to use __this_cpu_add().
- Move the event counter resetting code to the loading of a BPF scheduler.
- The bypass-related event is further categorized into three events:
BYPASS_ACTIVATE, BYPASS_DISPATCH, and BYPASS_DURATION.
- Revise SELECT_CPU_FALLBACK to capture the case where the chosen CPU is not
allowed.
- Move is_cpu_allowed() from core.c to sched.h to use in the
SELECT_CPU_FALLBACK code.
Changwoo Min (11):
sched_ext: Implement event counter infrastructure
sched: Move is_cpu_allowed() to the header
sched_ext: Add an event, SELECT_CPU_FALLBACK
sched_ext: Add an event, DISPATCH_LOCAL_DSQ_OFFLINE
sched_ext: Add an event, DISPATCH_KEEP_LAST
sched_ext: Add an event, ENQ_SKIP_EXITING
sched_ext: Add an event, BYPASS_ACTIVATE
sched_ext: Add an event, BYPASS_DISPATCH
sched_ext: Add an event, BYPASS_DURATION
sched_ext: Add scx_bpf_event_stats() and scx_read_event() for BPF
schedulers
sched_ext: Print core event count in scx_central scheduler
kernel/sched/core.c | 30 ----
kernel/sched/ext.c | 177 ++++++++++++++++++++++-
kernel/sched/sched.h | 30 ++++
tools/sched_ext/include/scx/common.bpf.h | 4 +
tools/sched_ext/scx_central.bpf.c | 21 +++
5 files changed, 231 insertions(+), 31 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v2 01/11] sched_ext: Implement event counter infrastructure
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-27 20:09 ` Tejun Heo
2025-01-26 10:16 ` [PATCH v2 02/11] sched: Move is_cpu_allowed() to the header Changwoo Min
` (10 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Collect the statistics of specific types of behavior in the sched_ext core,
which are not easily visible but still interesting to an scx scheduler.
An event type is defined in 'struct scx_event_stats.' When an event occurs,
its counter is accumulated using 'scx_add_event()' and '__scx_add_event()'
to the per-CPU 'struct scx_event_stats' for efficiency. 'scx_bpf_event_stats()'
aggregates all the per-CPU counters and exposes the system-wide counters.
For convenience and readability of the code, 'scx_agg_event()' and
'scx_dump_event()' are provided.
The collected events can be observed after a BPF scheduler is unloaded;
the per-CPU 'struct scx_event_stats' counters are reset when a new BPF
scheduler is loaded.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 103 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7081c7be5f62..dbf659d593a3 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1458,6 +1458,64 @@ static struct task_struct *scx_task_iter_next_locked(struct scx_task_iter *iter)
return p;
}
+/*
+ * Collection of event counters. Event types are placed in descending order.
+ */
+struct scx_event_stats {
+};
+
+/*
+ * The event counter is organized by a per-CPU variable to minimize the
+ * accounting overhead without synchronization. A system-wide view on the
+ * event counter is constructed when requested by scx_bpf_event_stats().
+ */
+static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
+
+/**
+ * scx_add_event - Increase an event counter for 'name' by 'cnt'
+ * @name: an event name defined in struct scx_event_stats
+ * @cnt: the number of times the event occurred
+ *
+ * This can be used when preemption is not disabled.
+ */
+#define scx_add_event(name, cnt) do { \
+ this_cpu_add(event_stats_cpu.name, cnt); \
+} while (0)
+
+/**
+ * __scx_add_event - Increase an event counter for 'name' by 'cnt'
+ * @name: an event name defined in struct scx_event_stats
+ * @cnt: the number of times the event occurred
+ *
+ * This should be used only when preemption is disabled.
+ */
+#define __scx_add_event(name, cnt) do { \
+ __this_cpu_add(event_stats_cpu.name, cnt); \
+} while (0)
+
+/**
+ * scx_agg_event - Aggregate an event counter 'kind' from 'src_e' to 'dst_e'
+ * @dst_e: destination event stats
+ * @src_e: source event stats
+ * @kind: a kind of event to be aggregated
+ */
+#define scx_agg_event(dst_e, src_e, kind) do { \
+ (dst_e)->kind += READ_ONCE((src_e)->kind); \
+} while(0)
+
+/**
+ * scx_dump_event - Dump an event 'kind' in 'events' to 's'
+ * @s: output seq_buf
+ * @events: event stats
+ * @kind: a kind of event to dump
+ */
+#define scx_dump_event(s, events, kind) do { \
+ dump_line(&(s), "%30s: %16llu", #kind, (events)->kind); \
+} while (0)
+
+
+static void scx_bpf_event_stats(struct scx_event_stats *events, size_t events__sz);
+
static enum scx_ops_enable_state scx_ops_enable_state(void)
{
return atomic_read(&scx_ops_enable_state_var);
@@ -5310,6 +5368,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
.at_jiffies = jiffies,
};
struct seq_buf s;
+ struct scx_event_stats events;
unsigned long flags;
char *buf;
int cpu;
@@ -5418,6 +5477,12 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
rq_unlock(rq, &rf);
}
+ dump_newline(&s);
+ dump_line(&s, "Event counters");
+ dump_line(&s, "--------------");
+
+ scx_bpf_event_stats(&events, sizeof(events));
+
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
memcpy(ei->dump + dump_len - sizeof(trunc_marker),
trunc_marker, sizeof(trunc_marker));
@@ -5525,6 +5590,15 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
mutex_lock(&scx_ops_enable_mutex);
+ /*
+ * Clear event counters so a new scx scheduler gets
+ * fresh event counter values.
+ */
+ for_each_possible_cpu(cpu) {
+ struct scx_event_stats *e = per_cpu_ptr(&event_stats_cpu, cpu);
+ memset(e, 0, sizeof(*e));
+ }
+
if (!scx_ops_helper) {
WRITE_ONCE(scx_ops_helper,
scx_create_rt_helper("sched_ext_ops_helper"));
@@ -7721,6 +7795,34 @@ __bpf_kfunc u64 scx_bpf_now(void)
return clock;
}
+/**
+ * scx_bpf_event_stats - Get system-wide event counters
+ * @events: output buffer from a BPF program
+ * @events__sz: @events len, must end in '__sz' for the verifier
+ */
+__bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
+ size_t events__sz)
+{
+ struct scx_event_stats e_sys, *e_cpu;
+ int cpu;
+
+ /* Aggregate per-CPU event counters into the system-wide counters. */
+ memset(&e_sys, 0, sizeof(e_sys));
+ for_each_possible_cpu(cpu) {
+ e_cpu = per_cpu_ptr(&event_stats_cpu, cpu);
+ }
+
+ /*
+ * We cannot entirely trust a BPF-provided size since a BPF program
+ * might be compiled against a different vmlinux.h, of which
+ * scx_event_stats would be larger (a newer vmlinux.h) or smaller
+ * (an older vmlinux.h). Hence, we use the smaller size to avoid
+ * memory corruption.
+ */
+ events__sz = min(events__sz, sizeof(*events));
+ memcpy(events, &e_sys, events__sz);
+}
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(scx_kfunc_ids_any)
@@ -7753,6 +7855,7 @@ BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
BTF_ID_FLAGS(func, scx_bpf_task_cgroup, KF_RCU | KF_ACQUIRE)
#endif
BTF_ID_FLAGS(func, scx_bpf_now)
+BTF_ID_FLAGS(func, scx_bpf_event_stats, KF_TRUSTED_ARGS)
BTF_KFUNCS_END(scx_kfunc_ids_any)
static const struct btf_kfunc_id_set scx_kfunc_set_any = {
--
2.48.1
* [PATCH v2 02/11] sched: Move is_cpu_allowed() to the header
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
2025-01-26 10:16 ` [PATCH v2 01/11] sched_ext: Implement event counter infrastructure Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-26 10:16 ` [PATCH v2 03/11] sched_ext: Add an event, SELECT_CPU_FALLBACK Changwoo Min
` (9 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
is_cpu_allowed() can be useful to specific scheduling policies
(e.g., sched_ext), so move it to the header.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/core.c | 30 ------------------------------
kernel/sched/sched.h | 30 ++++++++++++++++++++++++++++++
2 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 901170708e2a..eaaee182999c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2427,36 +2427,6 @@ static inline bool rq_has_pinned_tasks(struct rq *rq)
return rq->nr_pinned;
}
-/*
- * Per-CPU kthreads are allowed to run on !active && online CPUs, see
- * __set_cpus_allowed_ptr() and select_fallback_rq().
- */
-static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
-{
- /* When not in the task's cpumask, no point in looking further. */
- if (!task_allowed_on_cpu(p, cpu))
- return false;
-
- /* migrate_disabled() must be allowed to finish. */
- if (is_migration_disabled(p))
- return cpu_online(cpu);
-
- /* Non kernel threads are not allowed during either online or offline. */
- if (!(p->flags & PF_KTHREAD))
- return cpu_active(cpu);
-
- /* KTHREAD_IS_PER_CPU is always allowed. */
- if (kthread_is_per_cpu(p))
- return cpu_online(cpu);
-
- /* Regular kernel threads don't get to stay during offline. */
- if (cpu_dying(cpu))
- return false;
-
- /* But are allowed during online. */
- return cpu_online(cpu);
-}
-
/*
* This is how migration works:
*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 38e0e323dda2..30c401d940f4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2675,6 +2675,36 @@ static inline cpumask_t *alloc_user_cpus_ptr(int node)
#endif /* !CONFIG_SMP */
+/*
+ * Per-CPU kthreads are allowed to run on !active && online CPUs, see
+ * __set_cpus_allowed_ptr() and select_fallback_rq().
+ */
+static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
+{
+ /* When not in the task's cpumask, no point in looking further. */
+ if (!task_allowed_on_cpu(p, cpu))
+ return false;
+
+ /* migrate_disabled() must be allowed to finish. */
+ if (is_migration_disabled(p))
+ return cpu_online(cpu);
+
+ /* Non kernel threads are not allowed during either online or offline. */
+ if (!(p->flags & PF_KTHREAD))
+ return cpu_active(cpu);
+
+ /* KTHREAD_IS_PER_CPU is always allowed. */
+ if (kthread_is_per_cpu(p))
+ return cpu_online(cpu);
+
+ /* Regular kernel threads don't get to stay during offline. */
+ if (cpu_dying(cpu))
+ return false;
+
+ /* But are allowed during online. */
+ return cpu_online(cpu);
+}
+
#ifdef CONFIG_CPU_IDLE
static inline void idle_set_state(struct rq *rq,
--
2.48.1
* [PATCH v2 03/11] sched_ext: Add an event, SELECT_CPU_FALLBACK
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
2025-01-26 10:16 ` [PATCH v2 01/11] sched_ext: Implement event counter infrastructure Changwoo Min
2025-01-26 10:16 ` [PATCH v2 02/11] sched: Move is_cpu_allowed() to the header Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-27 20:01 ` Tejun Heo
2025-01-26 10:16 ` [PATCH v2 04/11] sched_ext: Add an event, DISPATCH_LOCAL_DSQ_OFFLINE Changwoo Min
` (8 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Add a core event, SELECT_CPU_FALLBACK, which represents how many times
ops.select_cpu() returns a CPU that the task can't use.
__scx_add_event() is used since the caller holds p->pi_lock,
so preemption is already disabled.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index dbf659d593a3..727b1f8b623e 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1462,6 +1462,11 @@ static struct task_struct *scx_task_iter_next_locked(struct scx_task_iter *iter)
* Collection of event counters. Event types are placed in descending order.
*/
struct scx_event_stats {
+ /*
+ * If ops.select_cpu() returns a CPU which can't be used by the task,
+ * the core scheduler code silently picks a fallback CPU.
+ */
+ u64 SELECT_CPU_FALLBACK;
};
/*
@@ -3663,6 +3668,10 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
cpu = SCX_CALL_OP_TASK_RET(SCX_KF_ENQUEUE | SCX_KF_SELECT_CPU,
select_cpu, p, prev_cpu, wake_flags);
*ddsp_taskp = NULL;
+
+ if (unlikely(!is_cpu_allowed(p, cpu)))
+ __scx_add_event(SELECT_CPU_FALLBACK, 1);
+
if (ops_cpu_valid(cpu, "from ops.select_cpu()"))
return cpu;
else
@@ -5482,6 +5491,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
dump_line(&s, "--------------");
scx_bpf_event_stats(&events, sizeof(events));
+ scx_dump_event(s, &events, SELECT_CPU_FALLBACK);
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
memcpy(ei->dump + dump_len - sizeof(trunc_marker),
@@ -7810,6 +7820,7 @@ __bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
memset(&e_sys, 0, sizeof(e_sys));
for_each_possible_cpu(cpu) {
e_cpu = per_cpu_ptr(&event_stats_cpu, cpu);
+ scx_agg_event(&e_sys, e_cpu, SELECT_CPU_FALLBACK);
}
/*
--
2.48.1
* [PATCH v2 04/11] sched_ext: Add an event, DISPATCH_LOCAL_DSQ_OFFLINE
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (2 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 03/11] sched_ext: Add an event, SELECT_CPU_FALLBACK Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-26 10:16 ` [PATCH v2 05/11] sched_ext: Add an event, DISPATCH_KEEP_LAST Changwoo Min
` (7 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Add a core event, DISPATCH_LOCAL_DSQ_OFFLINE, which represents how many
times a BPF scheduler tries to dispatch to an offlined local DSQ.
__scx_add_event() is used since the caller holds an rq lock,
so preemption is already disabled.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 727b1f8b623e..38cf7ea6c385 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1467,6 +1467,12 @@ struct scx_event_stats {
* the core scheduler code silently picks a fallback CPU.
*/
u64 SELECT_CPU_FALLBACK;
+
+ /*
+ * When dispatching to a local DSQ, the CPU may have gone offline in
+ * the meantime. In this case, the task is bounced to the global DSQ.
+ */
+ u64 DISPATCH_LOCAL_DSQ_OFFLINE;
};
/*
@@ -2654,6 +2660,7 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
if (unlikely(!task_can_run_on_remote_rq(p, dst_rq, true))) {
dispatch_enqueue(find_global_dsq(p), p,
enq_flags | SCX_ENQ_CLEAR_OPSS);
+ __scx_add_event(DISPATCH_LOCAL_DSQ_OFFLINE, 1);
return;
}
@@ -5492,6 +5499,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
scx_bpf_event_stats(&events, sizeof(events));
scx_dump_event(s, &events, SELECT_CPU_FALLBACK);
+ scx_dump_event(s, &events, DISPATCH_LOCAL_DSQ_OFFLINE);
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
memcpy(ei->dump + dump_len - sizeof(trunc_marker),
@@ -7821,6 +7829,7 @@ __bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
for_each_possible_cpu(cpu) {
e_cpu = per_cpu_ptr(&event_stats_cpu, cpu);
scx_agg_event(&e_sys, e_cpu, SELECT_CPU_FALLBACK);
+ scx_agg_event(&e_sys, e_cpu, DISPATCH_LOCAL_DSQ_OFFLINE);
}
/*
--
2.48.1
* [PATCH v2 05/11] sched_ext: Add an event, DISPATCH_KEEP_LAST
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (3 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 04/11] sched_ext: Add an event, DISPATCH_LOCAL_DSQ_OFFLINE Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-26 10:16 ` [PATCH v2 06/11] sched_ext: Add an event, ENQ_SKIP_EXITING Changwoo Min
` (6 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Add a core event, DISPATCH_KEEP_LAST, which represents how many times
a task continues to run without ops.enqueue() when SCX_OPS_ENQ_LAST
is not set.
__scx_add_event() is used since the caller holds an rq lock,
so preemption is already disabled.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 38cf7ea6c385..d3a1ef58202c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1473,6 +1473,12 @@ struct scx_event_stats {
* the meantime. In this case, the task is bounced to the global DSQ.
*/
u64 DISPATCH_LOCAL_DSQ_OFFLINE;
+
+ /*
+ * If SCX_OPS_ENQ_LAST is not set, the number of times that a task
+ * continued to run because there were no other tasks on the CPU.
+ */
+ u64 DISPATCH_KEEP_LAST;
};
/*
@@ -2915,6 +2921,7 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
if (prev_on_rq && (!static_branch_unlikely(&scx_ops_enq_last) ||
scx_rq_bypassing(rq))) {
rq->scx.flags |= SCX_RQ_BAL_KEEP;
+ __scx_add_event(DISPATCH_KEEP_LAST, 1);
goto has_tasks;
}
rq->scx.flags &= ~SCX_RQ_IN_BALANCE;
@@ -5500,6 +5507,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
scx_bpf_event_stats(&events, sizeof(events));
scx_dump_event(s, &events, SELECT_CPU_FALLBACK);
scx_dump_event(s, &events, DISPATCH_LOCAL_DSQ_OFFLINE);
+ scx_dump_event(s, &events, DISPATCH_KEEP_LAST);
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
memcpy(ei->dump + dump_len - sizeof(trunc_marker),
@@ -7830,6 +7838,7 @@ __bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
e_cpu = per_cpu_ptr(&event_stats_cpu, cpu);
scx_agg_event(&e_sys, e_cpu, SELECT_CPU_FALLBACK);
scx_agg_event(&e_sys, e_cpu, DISPATCH_LOCAL_DSQ_OFFLINE);
+ scx_agg_event(&e_sys, e_cpu, DISPATCH_KEEP_LAST);
}
/*
--
2.48.1
* [PATCH v2 06/11] sched_ext: Add an event, ENQ_SKIP_EXITING
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (4 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 05/11] sched_ext: Add an event, DISPATCH_KEEP_LAST Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-26 10:16 ` [PATCH v2 07/11] sched_ext: Add an event, BYPASS_ACTIVATE Changwoo Min
` (5 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Add a core event, ENQ_SKIP_EXITING, which represents how many times
an exiting task is enqueued to a local DSQ because SCX_OPS_ENQ_EXITING
is not set.
__scx_add_event() is used since the caller holds an rq lock,
so preemption is already disabled.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index d3a1ef58202c..6dc86bc06596 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1479,6 +1479,12 @@ struct scx_event_stats {
* continued to run because there were no other tasks on the CPU.
*/
u64 DISPATCH_KEEP_LAST;
+
+ /*
+ * If SCX_OPS_ENQ_EXITING is not set, the number of times that a task
+ * is dispatched to a local DSQ when exiting.
+ */
+ u64 ENQ_SKIP_EXITING;
};
/*
@@ -2086,8 +2092,10 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
/* see %SCX_OPS_ENQ_EXITING */
if (!static_branch_unlikely(&scx_ops_enq_exiting) &&
- unlikely(p->flags & PF_EXITING))
+ unlikely(p->flags & PF_EXITING)) {
+ __scx_add_event(ENQ_SKIP_EXITING, 1);
goto local;
+ }
if (!SCX_HAS_OP(enqueue))
goto global;
@@ -5508,6 +5516,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
scx_dump_event(s, &events, SELECT_CPU_FALLBACK);
scx_dump_event(s, &events, DISPATCH_LOCAL_DSQ_OFFLINE);
scx_dump_event(s, &events, DISPATCH_KEEP_LAST);
+ scx_dump_event(s, &events, ENQ_SKIP_EXITING);
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
memcpy(ei->dump + dump_len - sizeof(trunc_marker),
--
2.48.1
* [PATCH v2 07/11] sched_ext: Add an event, BYPASS_ACTIVATE
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (5 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 06/11] sched_ext: Add an event, ENQ_SKIP_EXITING Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-27 20:03 ` Tejun Heo
2025-01-26 10:16 ` [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH Changwoo Min
` (4 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Add a core event, BYPASS_ACTIVATE, which represents how many times
the bypass mode has been triggered.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 6dc86bc06596..bb363809e484 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1485,6 +1485,11 @@ struct scx_event_stats {
* is dispatched to a local DSQ when exiting.
*/
u64 ENQ_SKIP_EXITING;
+
+ /*
+ * The number of times the bypassing mode has been activated.
+ */
+ u64 BYPASS_ACTIVATE;
};
/*
@@ -3707,6 +3712,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
p->scx.slice = SCX_SLICE_DFL;
p->scx.ddsp_dsq_id = SCX_DSQ_LOCAL;
}
+
return cpu;
}
}
@@ -4922,6 +4928,7 @@ static void scx_ops_bypass(bool bypass)
WARN_ON_ONCE(scx_ops_bypass_depth <= 0);
if (scx_ops_bypass_depth != 1)
goto unlock;
+ scx_add_event(BYPASS_ACTIVATE, 1);
} else {
scx_ops_bypass_depth--;
WARN_ON_ONCE(scx_ops_bypass_depth < 0);
@@ -5517,6 +5524,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
scx_dump_event(s, &events, DISPATCH_LOCAL_DSQ_OFFLINE);
scx_dump_event(s, &events, DISPATCH_KEEP_LAST);
scx_dump_event(s, &events, ENQ_SKIP_EXITING);
+ scx_dump_event(s, &events, BYPASS_ACTIVATE);
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
memcpy(ei->dump + dump_len - sizeof(trunc_marker),
@@ -7848,6 +7856,7 @@ __bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
scx_agg_event(&e_sys, e_cpu, SELECT_CPU_FALLBACK);
scx_agg_event(&e_sys, e_cpu, DISPATCH_LOCAL_DSQ_OFFLINE);
scx_agg_event(&e_sys, e_cpu, DISPATCH_KEEP_LAST);
+ scx_agg_event(&e_sys, e_cpu, BYPASS_ACTIVATE);
}
/*
--
2.48.1
* [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (6 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 07/11] sched_ext: Add an event, BYPASS_ACTIVATE Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-27 20:05 ` Tejun Heo
2025-01-26 10:16 ` [PATCH v2 09/11] sched_ext: Add an event, BYPASS_DURATION Changwoo Min
` (3 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Add a core event, BYPASS_DISPATCH, which represents how many tasks
have been dispatched in the bypass mode.
__scx_add_event() is used since the caller holds an rq lock,
so preemption is already disabled.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index bb363809e484..9a680774c1fc 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1486,6 +1486,11 @@ struct scx_event_stats {
*/
u64 ENQ_SKIP_EXITING;
+ /*
+ * The number of tasks dispatched in the bypassing mode.
+ */
+ u64 BYPASS_DISPATCH;
+
/*
* The number of times the bypassing mode has been activated.
*/
@@ -2941,6 +2946,8 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
return false;
has_tasks:
+ if (scx_rq_bypassing(rq))
+ __scx_add_event(BYPASS_DISPATCH, 1);
rq->scx.flags &= ~SCX_RQ_IN_BALANCE;
return true;
}
@@ -5524,6 +5531,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
scx_dump_event(s, &events, DISPATCH_LOCAL_DSQ_OFFLINE);
scx_dump_event(s, &events, DISPATCH_KEEP_LAST);
scx_dump_event(s, &events, ENQ_SKIP_EXITING);
+ scx_dump_event(s, &events, BYPASS_DISPATCH);
scx_dump_event(s, &events, BYPASS_ACTIVATE);
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
@@ -7856,6 +7864,7 @@ __bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
scx_agg_event(&e_sys, e_cpu, SELECT_CPU_FALLBACK);
scx_agg_event(&e_sys, e_cpu, DISPATCH_LOCAL_DSQ_OFFLINE);
scx_agg_event(&e_sys, e_cpu, DISPATCH_KEEP_LAST);
+ scx_agg_event(&e_sys, e_cpu, BYPASS_DISPATCH);
scx_agg_event(&e_sys, e_cpu, BYPASS_ACTIVATE);
}
--
2.48.1
* [PATCH v2 09/11] sched_ext: Add an event, BYPASS_DURATION
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (7 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-26 10:16 ` [PATCH v2 10/11] sched_ext: Add scx_bpf_event_stats() and scx_read_event() for BPF schedulers Changwoo Min
` (2 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Add a core event, BYPASS_DURATION, which represents the total duration
of bypass modes in nanoseconds.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 9a680774c1fc..c71c89cfd1c8 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1486,6 +1486,11 @@ struct scx_event_stats {
*/
u64 ENQ_SKIP_EXITING;
+ /*
+ * The total duration of bypass modes in nanoseconds.
+ */
+ u64 BYPASS_DURATION;
+
/*
* The number of tasks dispatched in the bypassing mode.
*/
@@ -1547,6 +1552,12 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
} while (0)
+/*
+ * The last time the bypass mode started.
+ * This is used to measure BYPASS_DURATION.
+ */
+static unsigned long scx_bypass_timestamp;
+
static void scx_bpf_event_stats(struct scx_event_stats *events, size_t events__sz);
static enum scx_ops_enable_state scx_ops_enable_state(void)
@@ -4935,12 +4946,15 @@ static void scx_ops_bypass(bool bypass)
WARN_ON_ONCE(scx_ops_bypass_depth <= 0);
if (scx_ops_bypass_depth != 1)
goto unlock;
+ scx_bypass_timestamp = ktime_get_ns();
scx_add_event(BYPASS_ACTIVATE, 1);
} else {
scx_ops_bypass_depth--;
WARN_ON_ONCE(scx_ops_bypass_depth < 0);
if (scx_ops_bypass_depth != 0)
goto unlock;
+ scx_add_event(BYPASS_DURATION,
+ ktime_get_ns() - scx_bypass_timestamp);
}
atomic_inc(&scx_ops_breather_depth);
@@ -5531,6 +5545,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
scx_dump_event(s, &events, DISPATCH_LOCAL_DSQ_OFFLINE);
scx_dump_event(s, &events, DISPATCH_KEEP_LAST);
scx_dump_event(s, &events, ENQ_SKIP_EXITING);
+ scx_dump_event(s, &events, BYPASS_DURATION);
scx_dump_event(s, &events, BYPASS_DISPATCH);
scx_dump_event(s, &events, BYPASS_ACTIVATE);
@@ -7864,6 +7879,7 @@ __bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
scx_agg_event(&e_sys, e_cpu, SELECT_CPU_FALLBACK);
scx_agg_event(&e_sys, e_cpu, DISPATCH_LOCAL_DSQ_OFFLINE);
scx_agg_event(&e_sys, e_cpu, DISPATCH_KEEP_LAST);
+ scx_agg_event(&e_sys, e_cpu, BYPASS_DURATION);
scx_agg_event(&e_sys, e_cpu, BYPASS_DISPATCH);
scx_agg_event(&e_sys, e_cpu, BYPASS_ACTIVATE);
}
--
2.48.1
* [PATCH v2 10/11] sched_ext: Add scx_bpf_event_stats() and scx_read_event() for BPF schedulers
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (8 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 09/11] sched_ext: Add an event, BYPASS_DURATION Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-26 10:16 ` [PATCH v2 11/11] sched_ext: Print core event count in scx_central scheduler Changwoo Min
2025-01-27 20:10 ` [PATCH v2 00/11] sched_ext: Implement core event counters Tejun Heo
11 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
scx_bpf_event_stats() is added to the header files so that BPF schedulers
can use it. Also, scx_read_event() is added to read an event counter in a
backward-compatible way.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
tools/sched_ext/include/scx/common.bpf.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/include/scx/common.bpf.h
index f3e15e9efa76..71db653c8eb2 100644
--- a/tools/sched_ext/include/scx/common.bpf.h
+++ b/tools/sched_ext/include/scx/common.bpf.h
@@ -78,6 +78,10 @@ struct rq *scx_bpf_cpu_rq(s32 cpu) __ksym;
struct cgroup *scx_bpf_task_cgroup(struct task_struct *p) __ksym __weak;
u64 scx_bpf_now(void) __ksym __weak;
+void scx_bpf_event_stats(struct scx_event_stats *events, size_t events__sz) __ksym __weak;
+#define scx_read_event(e, name) \
+ (bpf_core_field_exists((e)->name) ? (e)->name : 0)
+
/*
* Use the following as @it__iter when calling scx_bpf_dsq_move[_vtime]() from
* within bpf_for_each() loops.
--
2.48.1
* [PATCH v2 11/11] sched_ext: Print core event count in scx_central scheduler
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (9 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 10/11] sched_ext: Add scx_bpf_event_stats() and scx_read_event() for BPF schedulers Changwoo Min
@ 2025-01-26 10:16 ` Changwoo Min
2025-01-27 20:08 ` Tejun Heo
2025-01-27 20:10 ` [PATCH v2 00/11] sched_ext: Implement core event counters Tejun Heo
11 siblings, 1 reply; 26+ messages in thread
From: Changwoo Min @ 2025-01-26 10:16 UTC (permalink / raw)
To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min
Modify the scx_central scheduler to print the core event counters
every second.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
tools/sched_ext/scx_central.bpf.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/tools/sched_ext/scx_central.bpf.c b/tools/sched_ext/scx_central.bpf.c
index 50bc1737c167..96b3c08132ea 100644
--- a/tools/sched_ext/scx_central.bpf.c
+++ b/tools/sched_ext/scx_central.bpf.c
@@ -256,6 +256,7 @@ static int central_timerfn(void *map, int *key, struct bpf_timer *timer)
u64 now = scx_bpf_now();
u64 nr_to_kick = nr_queued;
s32 i, curr_cpu;
+ struct scx_event_stats events;
curr_cpu = bpf_get_smp_processor_id();
if (timer_pinned && (curr_cpu != central_cpu)) {
@@ -291,6 +292,26 @@ static int central_timerfn(void *map, int *key, struct bpf_timer *timer)
bpf_timer_start(timer, TIMER_INTERVAL_NS, BPF_F_TIMER_CPU_PIN);
__sync_fetch_and_add(&nr_timers, 1);
+
+ /* print event counters every second */
+ if (nr_timers % 1000 == 0) {
+ scx_bpf_event_stats(&events, sizeof(events));
+
+ bpf_printk("%30s: %llu\n", "SELECT_CPU_FALLBACK",
+ scx_read_event(&events, SELECT_CPU_FALLBACK));
+ bpf_printk("%30s: %llu\n", "DISPATCH_LOCAL_DSQ_OFFLINE",
+ scx_read_event(&events, DISPATCH_LOCAL_DSQ_OFFLINE));
+ bpf_printk("%30s: %llu\n", "DISPATCH_KEEP_LAST",
+ scx_read_event(&events, DISPATCH_KEEP_LAST));
+ bpf_printk("%30s: %llu\n", "ENQ_SKIP_EXITING",
+ scx_read_event(&events, ENQ_SKIP_EXITING));
+ bpf_printk("%30s: %llu\n", "BYPASS_DURATION",
+ scx_read_event(&events, BYPASS_DURATION));
+ bpf_printk("%30s: %llu\n", "BYPASS_DISPATCH",
+ scx_read_event(&events, BYPASS_DISPATCH));
+ bpf_printk("%30s: %llu\n", "BYPASS_ACTIVATE",
+ scx_read_event(&events, BYPASS_ACTIVATE));
+ }
return 0;
}
--
2.48.1
* Re: [PATCH v2 03/11] sched_ext: Add an event, SELECT_CPU_FALLBACK
2025-01-26 10:16 ` [PATCH v2 03/11] sched_ext: Add an event, SELECT_CPU_FALLBACK Changwoo Min
@ 2025-01-27 20:01 ` Tejun Heo
2025-01-30 6:22 ` Changwoo Min
0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2025-01-27 20:01 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On Sun, Jan 26, 2025 at 07:16:06PM +0900, Changwoo Min wrote:
...
> struct scx_event_stats {
> + /*
> + * If ops.select_cpu() returns a CPU which can't be used by the task,
> + * the core scheduler code silently picks a fallback CPU.
> + */
> + u64 SELECT_CPU_FALLBACK;
As C has one global namespace, we're gonna have to prefix these. Hopefully,
something not too long. SCX_EV_?
> @@ -3663,6 +3668,10 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
> cpu = SCX_CALL_OP_TASK_RET(SCX_KF_ENQUEUE | SCX_KF_SELECT_CPU,
> select_cpu, p, prev_cpu, wake_flags);
> *ddsp_taskp = NULL;
> +
> + if (unlikely(!is_cpu_allowed(p, cpu)))
> + __scx_add_event(SELECT_CPU_FALLBACK, 1);
This is trying to guess what select_task_rq() is going to do and then count
that as an event, which doesn't seem great. Can't we just record the picked
CPU here into a field in p->scx and then compare that against cpu_of(rq) in
enqueue_task_scx()? That way, we can reliably count events where CPU
selection was modified by the fallback logic regardless of why or how that
happens.
Thanks.
--
tejun
* Re: [PATCH v2 07/11] sched_ext: Add an event, BYPASS_ACTIVATE
2025-01-26 10:16 ` [PATCH v2 07/11] sched_ext: Add an event, BYPASS_ACTIVATE Changwoo Min
@ 2025-01-27 20:03 ` Tejun Heo
2025-01-30 6:34 ` Changwoo Min
0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2025-01-27 20:03 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel
On Sun, Jan 26, 2025 at 07:16:10PM +0900, Changwoo Min wrote:
...
> @@ -3707,6 +3712,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
> p->scx.slice = SCX_SLICE_DFL;
> p->scx.ddsp_dsq_id = SCX_DSQ_LOCAL;
> }
> +
> return cpu;
> }
> }
Stray change?
Thanks.
--
tejun
* Re: [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH
2025-01-26 10:16 ` [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH Changwoo Min
@ 2025-01-27 20:05 ` Tejun Heo
2025-01-30 6:26 ` Changwoo Min
0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2025-01-27 20:05 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel
On Sun, Jan 26, 2025 at 07:16:11PM +0900, Changwoo Min wrote:
...
> has_tasks:
> + if (scx_rq_bypassing(rq))
> + __scx_add_event(BYPASS_DISPATCH, 1);
Can we do this at the sources even if that's a bit more code?
Thanks.
--
tejun
* Re: [PATCH v2 11/11] sched_ext: Print core event count in scx_central scheduler
2025-01-26 10:16 ` [PATCH v2 11/11] sched_ext: Print core event count in scx_central scheduler Changwoo Min
@ 2025-01-27 20:08 ` Tejun Heo
2025-01-30 6:28 ` Changwoo Min
0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2025-01-27 20:08 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel
On Sun, Jan 26, 2025 at 07:16:14PM +0900, Changwoo Min wrote:
> Modify the scx_central scheduler to print the core event counter
> every second.
Can you add it to scx_qmap too? That's used as the dumping ground for all
the niche features, so it'd be nice to have an example there too.
Thanks.
--
tejun
* Re: [PATCH v2 01/11] sched_ext: Implement event counter infrastructure
2025-01-26 10:16 ` [PATCH v2 01/11] sched_ext: Implement event counter infrastructure Changwoo Min
@ 2025-01-27 20:09 ` Tejun Heo
2025-01-30 6:12 ` Changwoo Min
0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2025-01-27 20:09 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On Sun, Jan 26, 2025 at 07:16:04PM +0900, Changwoo Min wrote:
...
> +/*
> + * scx_bpf_event_stats - Get system-wide event counters
> + * @events: output buffer from a BPF program
> + * @events__sz: @events len, must end in '__sz' for the verifier
> + */
> +__bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
> + size_t events__sz)
scx_bpf_events()?
Thanks.
--
tejun
* Re: [PATCH v2 00/11] sched_ext: Implement core event counters
2025-01-26 10:16 [PATCH v2 00/11] sched_ext: Implement core event counters Changwoo Min
` (10 preceding siblings ...)
2025-01-26 10:16 ` [PATCH v2 11/11] sched_ext: Print core event count in scx_central scheduler Changwoo Min
@ 2025-01-27 20:10 ` Tejun Heo
2025-01-30 6:28 ` Changwoo Min
11 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2025-01-27 20:10 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel
On Sun, Jan 26, 2025 at 07:16:03PM +0900, Changwoo Min wrote:
> The sched_ext core often has to override the BPF scheduler decisions,
> and some events could be interesting but not easily visible.
Generally looks good to me. Left several comments which should be
straightforward to address.
Thanks.
--
tejun
* Re: [PATCH v2 01/11] sched_ext: Implement event counter infrastructure
2025-01-27 20:09 ` Tejun Heo
@ 2025-01-30 6:12 ` Changwoo Min
0 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-30 6:12 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On 25. 1. 28. 05:09, Tejun Heo wrote:
> Hello,
>
> On Sun, Jan 26, 2025 at 07:16:04PM +0900, Changwoo Min wrote:
> ...
>> +/*
>> + * scx_bpf_event_stats - Get system-wide event counters
>> + * @events: output buffer from a BPF program
>> + * @events__sz: @events len, must end in '__sz' for the verifier
>> + */
>> +__bpf_kfunc void scx_bpf_event_stats(struct scx_event_stats *events,
>> + size_t events__sz)
>
> scx_bpf_events()?
That sounds simpler. I will change it in the next version.
Regards,
Changwoo Min
* Re: [PATCH v2 03/11] sched_ext: Add an event, SELECT_CPU_FALLBACK
2025-01-27 20:01 ` Tejun Heo
@ 2025-01-30 6:22 ` Changwoo Min
0 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-30 6:22 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On 25. 1. 28. 05:01, Tejun Heo wrote:
> Hello,
>
> On Sun, Jan 26, 2025 at 07:16:06PM +0900, Changwoo Min wrote:
> ...
>> struct scx_event_stats {
>> + /*
>> + * If ops.select_cpu() returns a CPU which can't be used by the task,
>> + * the core scheduler code silently picks a fallback CPU.
>> + */
>> + u64 SELECT_CPU_FALLBACK;
>
> As C has one global namespace, we're gonna have to prefix these. Hopefully,
> something not too long. SCX_EV_?
Sure, I will add SCX_EV_ as a prefix for all events.
>
>> @@ -3663,6 +3668,10 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
>> cpu = SCX_CALL_OP_TASK_RET(SCX_KF_ENQUEUE | SCX_KF_SELECT_CPU,
>> select_cpu, p, prev_cpu, wake_flags);
>> *ddsp_taskp = NULL;
>> +
>> + if (unlikely(!is_cpu_allowed(p, cpu)))
>> + __scx_add_event(SELECT_CPU_FALLBACK, 1);
>
> This is trying to guess what select_task_rq() is going to do and then count
> that as an event, which doesn't seem great. Can't we just record the picked
> CPU here into a field in p->scx and then compare that against cpu_of(rq) in
> enqueue_task_scx()? That way, we can reliably count events where CPU
> selection was modified by the fallback logic regardless of why or how that
> happens.
That sounds good. I will add a field (say, selected_cpu?) to struct
sched_ext_entity.
Regards,
Changwoo Min
* Re: [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH
2025-01-27 20:05 ` Tejun Heo
@ 2025-01-30 6:26 ` Changwoo Min
2025-01-30 12:25 ` Tejun Heo
0 siblings, 1 reply; 26+ messages in thread
From: Changwoo Min @ 2025-01-30 6:26 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On 25. 1. 28. 05:05, Tejun Heo wrote:
> On Sun, Jan 26, 2025 at 07:16:11PM +0900, Changwoo Min wrote:
> ...
>> has_tasks:
>> + if (scx_rq_bypassing(rq))
>> + __scx_add_event(BYPASS_DISPATCH, 1);
>
> Can we do this at the sources even if that's a bit more code?
Sure. I will remove scx_add_event() and __scx_add_event() and will use
this_cpu_add() and __this_cpu_add() directly.
Regards,
Changwoo Min
* Re: [PATCH v2 11/11] sched_ext: Print core event count in scx_central scheduler
2025-01-27 20:08 ` Tejun Heo
@ 2025-01-30 6:28 ` Changwoo Min
0 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-30 6:28 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On 25. 1. 28. 05:08, Tejun Heo wrote:
> On Sun, Jan 26, 2025 at 07:16:14PM +0900, Changwoo Min wrote:
>> Modify the scx_central scheduler to print the core event counter
>> every second.
>
> Can you add it to scx_qmap too? That's used as the dumping ground for all
> the niche features, so it'd be nice to have an example there too.
Sure, I will add the event dump feature to monitor_timerfn() in scx_qmap.
Regards,
Changwoo Min
* Re: [PATCH v2 00/11] sched_ext: Implement core event counters
2025-01-27 20:10 ` [PATCH v2 00/11] sched_ext: Implement core event counters Tejun Heo
@ 2025-01-30 6:28 ` Changwoo Min
0 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-30 6:28 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On 25. 1. 28. 05:10, Tejun Heo wrote:
> On Sun, Jan 26, 2025 at 07:16:03PM +0900, Changwoo Min wrote:
>> The sched_ext core often has to override the BPF scheduler decisions,
>> and some events could be interesting but not easily visible.
>
> Generally looks good to me. Left several comments which should be
> straightforward to address.
Thank you for the review! I will cook another version in a day or two.
Regards,
Changwoo Min
* Re: [PATCH v2 07/11] sched_ext: Add an event, BYPASS_ACTIVATE
2025-01-27 20:03 ` Tejun Heo
@ 2025-01-30 6:34 ` Changwoo Min
0 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-30 6:34 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On 25. 1. 28. 05:03, Tejun Heo wrote:
> On Sun, Jan 26, 2025 at 07:16:10PM +0900, Changwoo Min wrote:
> ...
>> @@ -3707,6 +3712,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
>> p->scx.slice = SCX_SLICE_DFL;
>> p->scx.ddsp_dsq_id = SCX_DSQ_LOCAL;
>> }
>> +
>> return cpu;
>> }
>> }
>
> Stray change?
My bad. Will fix it.
Regards,
Changwoo Min
* Re: [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH
2025-01-30 6:26 ` Changwoo Min
@ 2025-01-30 12:25 ` Tejun Heo
2025-01-31 1:13 ` Changwoo Min
0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2025-01-30 12:25 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel
On Thu, Jan 30, 2025 at 03:26:16PM +0900, Changwoo Min wrote:
> Hello,
>
> On 25. 1. 28. 05:05, Tejun Heo wrote:
> > On Sun, Jan 26, 2025 at 07:16:11PM +0900, Changwoo Min wrote:
> > ...
> > > has_tasks:
> > > + if (scx_rq_bypassing(rq))
> > > + __scx_add_event(BYPASS_DISPATCH, 1);
> >
> > Can we do this at the sources even if that's a bit more code?
>
> Sure. I will remove scx_add_event() and __scx_add_event() and will use
> this_cpu_add() and __this_cpu_add() directly.
Oh, that's not what I meant. I meant that in the code quoted above, the stat
is being incremented in a spot where most code paths converge and then the
specific stat condition is being tested again. It'd be better to update the
stat where the condition is detected initially.
Thanks.
--
tejun
* Re: [PATCH v2 08/11] sched_ext: Add an event, BYPASS_DISPATCH
2025-01-30 12:25 ` Tejun Heo
@ 2025-01-31 1:13 ` Changwoo Min
0 siblings, 0 replies; 26+ messages in thread
From: Changwoo Min @ 2025-01-31 1:13 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, arighi, kernel-dev, linux-kernel
Hello,
On 25. 1. 30. 21:25, Tejun Heo wrote:
> On Thu, Jan 30, 2025 at 03:26:16PM +0900, Changwoo Min wrote:
>> Hello,
>>
>> On 25. 1. 28. 05:05, Tejun Heo wrote:
>>> On Sun, Jan 26, 2025 at 07:16:11PM +0900, Changwoo Min wrote:
>>> ...
>>>> has_tasks:
>>>> + if (scx_rq_bypassing(rq))
>>>> + __scx_add_event(BYPASS_DISPATCH, 1);
>>>
>>> Can we do this at the sources even if that's a bit more code?
>>
>> Sure. I will remove scx_add_event() and __scx_add_event() and will use
>> this_cpu_add() and __this_cpu_add() directly.
>
> Oh, that's not what I meant. I meant that in the code quoted above, the stat
> is being incremented in a spot where most code paths converge and then the
> specific stat condition is being tested again. It'd be better to update the
> stat where the condition is detected initially.
Aha, I got it. Thanks for the clarification.
Regards,
Changwoo Min