public inbox for linux-kernel@vger.kernel.org
* [PATCH v4 0/2] sched_ext: Add trace point to sched_ext core events
@ 2025-03-04 10:48 Changwoo Min
  2025-03-04 10:48 ` [PATCH v4 1/2] sched_ext: Change the event type from u64 to s64 Changwoo Min
  2025-03-04 10:49 ` [PATCH v4 2/2] sched_ext: Add trace point to track sched_ext core events Changwoo Min
  0 siblings, 2 replies; 6+ messages in thread
From: Changwoo Min @ 2025-03-04 10:48 UTC (permalink / raw)
  To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min

Add tracing support to track sched_ext core events (/sched_ext/sched_ext_event)
to debug and monitor sched_ext schedulers. Also, change the core event type
from u64 to s64 to support negative event values.

ChangeLog v3 -> v4:
 - Replace a missed __u64 in a tracepoint definition with __s64.

ChangeLog v2 -> v3:
 - Change the type of @delta from __u64 to __s64 and make corresponding changes
   in scx_event_stats and scx_qmap.bpf.c.

ChangeLog v1 -> v2:
 - Rename @added field to @delta for clarity.
 - Rename sched_ext_add_event to sched_ext_event.
 - Drop the @offset field to avoid the potential misuse of non-portable numbers.

Changwoo Min (2):
  sched_ext: Change the event type from u64 to s64
  sched_ext: Add trace point to track sched_ext core events

 include/trace/events/sched_ext.h | 19 +++++++++++++++++++
 kernel/sched/ext.c               | 22 ++++++++++++----------
 tools/sched_ext/scx_qmap.bpf.c   | 16 ++++++++--------
 3 files changed, 39 insertions(+), 18 deletions(-)

-- 
2.48.1



* [PATCH v4 1/2] sched_ext: Change the event type from u64 to s64
  2025-03-04 10:48 [PATCH v4 0/2] sched_ext: Add trace point to sched_ext core events Changwoo Min
@ 2025-03-04 10:48 ` Changwoo Min
  2025-03-04 18:05   ` Tejun Heo
  2025-03-04 10:49 ` [PATCH v4 2/2] sched_ext: Add trace point to track sched_ext core events Changwoo Min
  1 sibling, 1 reply; 6+ messages in thread
From: Changwoo Min @ 2025-03-04 10:48 UTC (permalink / raw)
  To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min

The event count could be negative in the future,
so change the event type from u64 to s64.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
 kernel/sched/ext.c             | 20 ++++++++++----------
 tools/sched_ext/scx_qmap.bpf.c | 16 ++++++++--------
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 986b655911df..686629a860f3 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1489,53 +1489,53 @@ struct scx_event_stats {
 	 * If ops.select_cpu() returns a CPU which can't be used by the task,
 	 * the core scheduler code silently picks a fallback CPU.
 	 */
-	u64		SCX_EV_SELECT_CPU_FALLBACK;
+	s64		SCX_EV_SELECT_CPU_FALLBACK;
 
 	/*
 	 * When dispatching to a local DSQ, the CPU may have gone offline in
 	 * the meantime. In this case, the task is bounced to the global DSQ.
 	 */
-	u64		SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE;
+	s64		SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE;
 
 	/*
 	 * If SCX_OPS_ENQ_LAST is not set, the number of times that a task
 	 * continued to run because there were no other tasks on the CPU.
 	 */
-	u64		SCX_EV_DISPATCH_KEEP_LAST;
+	s64		SCX_EV_DISPATCH_KEEP_LAST;
 
 	/*
 	 * If SCX_OPS_ENQ_EXITING is not set, the number of times that a task
 	 * is dispatched to a local DSQ when exiting.
 	 */
-	u64		SCX_EV_ENQ_SKIP_EXITING;
+	s64		SCX_EV_ENQ_SKIP_EXITING;
 
 	/*
 	 * If SCX_OPS_ENQ_MIGRATION_DISABLED is not set, the number of times a
 	 * migration disabled task skips ops.enqueue() and is dispatched to its
 	 * local DSQ.
 	 */
-	u64		SCX_EV_ENQ_SKIP_MIGRATION_DISABLED;
+	s64		SCX_EV_ENQ_SKIP_MIGRATION_DISABLED;
 
 	/*
 	 * The total number of tasks enqueued (or pick_task-ed) with a
 	 * default time slice (SCX_SLICE_DFL).
 	 */
-	u64		SCX_EV_ENQ_SLICE_DFL;
+	s64		SCX_EV_ENQ_SLICE_DFL;
 
 	/*
 	 * The total duration of bypass modes in nanoseconds.
 	 */
-	u64		SCX_EV_BYPASS_DURATION;
+	s64		SCX_EV_BYPASS_DURATION;
 
 	/*
 	 * The number of tasks dispatched in the bypassing mode.
 	 */
-	u64		SCX_EV_BYPASS_DISPATCH;
+	s64		SCX_EV_BYPASS_DISPATCH;
 
 	/*
 	 * The number of times the bypassing mode has been activated.
 	 */
-	u64		SCX_EV_BYPASS_ACTIVATE;
+	s64		SCX_EV_BYPASS_ACTIVATE;
 };
 
 /*
@@ -1584,7 +1584,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
  * @kind: a kind of event to dump
  */
 #define scx_dump_event(s, events, kind) do {					\
-	dump_line(&(s), "%40s: %16llu", #kind, (events)->kind);			\
+	dump_line(&(s), "%40s: %16lld", #kind, (events)->kind);			\
 } while (0)
 
 
diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
index 45fd643d2ca0..26c40ca4f36c 100644
--- a/tools/sched_ext/scx_qmap.bpf.c
+++ b/tools/sched_ext/scx_qmap.bpf.c
@@ -776,21 +776,21 @@ static int monitor_timerfn(void *map, int *key, struct bpf_timer *timer)
 
 	__COMPAT_scx_bpf_events(&events, sizeof(events));
 
-	bpf_printk("%35s: %llu", "SCX_EV_SELECT_CPU_FALLBACK",
+	bpf_printk("%35s: %lld", "SCX_EV_SELECT_CPU_FALLBACK",
 		   scx_read_event(&events, SCX_EV_SELECT_CPU_FALLBACK));
-	bpf_printk("%35s: %llu", "SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE",
+	bpf_printk("%35s: %lld", "SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE",
 		   scx_read_event(&events, SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE));
-	bpf_printk("%35s: %llu", "SCX_EV_DISPATCH_KEEP_LAST",
+	bpf_printk("%35s: %lld", "SCX_EV_DISPATCH_KEEP_LAST",
 		   scx_read_event(&events, SCX_EV_DISPATCH_KEEP_LAST));
-	bpf_printk("%35s: %llu", "SCX_EV_ENQ_SKIP_EXITING",
+	bpf_printk("%35s: %lld", "SCX_EV_ENQ_SKIP_EXITING",
 		   scx_read_event(&events, SCX_EV_ENQ_SKIP_EXITING));
-	bpf_printk("%35s: %llu", "SCX_EV_ENQ_SLICE_DFL",
+	bpf_printk("%35s: %lld", "SCX_EV_ENQ_SLICE_DFL",
 		   scx_read_event(&events, SCX_EV_ENQ_SLICE_DFL));
-	bpf_printk("%35s: %llu", "SCX_EV_BYPASS_DURATION",
+	bpf_printk("%35s: %lld", "SCX_EV_BYPASS_DURATION",
 		   scx_read_event(&events, SCX_EV_BYPASS_DURATION));
-	bpf_printk("%35s: %llu", "SCX_EV_BYPASS_DISPATCH",
+	bpf_printk("%35s: %lld", "SCX_EV_BYPASS_DISPATCH",
 		   scx_read_event(&events, SCX_EV_BYPASS_DISPATCH));
-	bpf_printk("%35s: %llu", "SCX_EV_BYPASS_ACTIVATE",
+	bpf_printk("%35s: %lld", "SCX_EV_BYPASS_ACTIVATE",
 		   scx_read_event(&events, SCX_EV_BYPASS_ACTIVATE));
 
 	bpf_timer_start(timer, ONE_SEC_IN_NS, 0);
-- 
2.48.1



* [PATCH v4 2/2] sched_ext: Add trace point to track sched_ext core events
  2025-03-04 10:48 [PATCH v4 0/2] sched_ext: Add trace point to sched_ext core events Changwoo Min
  2025-03-04 10:48 ` [PATCH v4 1/2] sched_ext: Change the event type from u64 to s64 Changwoo Min
@ 2025-03-04 10:49 ` Changwoo Min
  2025-03-04 12:20   ` Andrea Righi
  1 sibling, 1 reply; 6+ messages in thread
From: Changwoo Min @ 2025-03-04 10:49 UTC (permalink / raw)
  To: tj, void, arighi; +Cc: kernel-dev, linux-kernel, Changwoo Min

Add tracing support to track sched_ext core events
(/sched_ext/sched_ext_event). This may be useful for debugging sched_ext
schedulers that trigger a particular event.

The trace point can be used like any other trace point, for example with
`perf trace` and in BPF programs, as follows:

======
$> sudo perf trace -e sched_ext:sched_ext_event --filter 'name == "SCX_EV_ENQ_SLICE_DFL"'
======

======
struct tp_sched_ext_event {
	struct trace_entry ent;
	u32 __data_loc_name;
	s64 delta;
};

SEC("tracepoint/sched_ext/sched_ext_event")
int rtp_add_event(struct tp_sched_ext_event *ctx)
{
	char event_name[128];
	unsigned short offset = ctx->__data_loc_name & 0xFFFF;
	bpf_probe_read_str((void *)event_name, 128, (char *)ctx + offset);

	bpf_printk("name %s   delta %lld", event_name, ctx->delta);
	return 0;
}
======

Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
 include/trace/events/sched_ext.h | 19 +++++++++++++++++++
 kernel/sched/ext.c               |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/include/trace/events/sched_ext.h b/include/trace/events/sched_ext.h
index fe19da7315a9..50e4b712735a 100644
--- a/include/trace/events/sched_ext.h
+++ b/include/trace/events/sched_ext.h
@@ -26,6 +26,25 @@ TRACE_EVENT(sched_ext_dump,
 	)
 );
 
+TRACE_EVENT(sched_ext_event,
+	    TP_PROTO(const char *name, __s64 delta),
+	    TP_ARGS(name, delta),
+
+	TP_STRUCT__entry(
+		__string(name, name)
+		__field(	__s64,		delta		)
+	),
+
+	TP_fast_assign(
+		__assign_str(name);
+		__entry->delta		= delta;
+	),
+
+	TP_printk("name %s delta %lld",
+		  __get_str(name), __entry->delta
+	)
+);
+
 #endif /* _TRACE_SCHED_EXT_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 686629a860f3..debcd1cf2de9 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1554,6 +1554,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
  */
 #define scx_add_event(name, cnt) do {						\
 	this_cpu_add(event_stats_cpu.name, cnt);				\
+	trace_sched_ext_event(#name, cnt);					\
 } while(0)
 
 /**
@@ -1565,6 +1566,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
  */
 #define __scx_add_event(name, cnt) do {						\
 	__this_cpu_add(event_stats_cpu.name, cnt);				\
+	trace_sched_ext_event(#name, cnt);					\
 } while(0)
 
 /**
-- 
2.48.1



* Re: [PATCH v4 2/2] sched_ext: Add trace point to track sched_ext core events
  2025-03-04 10:49 ` [PATCH v4 2/2] sched_ext: Add trace point to track sched_ext core events Changwoo Min
@ 2025-03-04 12:20   ` Andrea Righi
  2025-03-04 18:08     ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Andrea Righi @ 2025-03-04 12:20 UTC (permalink / raw)
  To: Changwoo Min; +Cc: tj, void, kernel-dev, linux-kernel

On Tue, Mar 04, 2025 at 07:49:00PM +0900, Changwoo Min wrote:
> Add tracing support to track sched_ext core events
> (/sched_ext/sched_ext_event). This may be useful for debugging sched_ext
> schedulers that trigger a particular event.
> 
> The trace point can be used as other trace points, so it can be used in,
> for example, `perf trace` and BPF programs, as follows:
> 
> ======
> $> sudo perf trace -e sched_ext:sched_ext_event --filter 'name == "SCX_EV_ENQ_SLICE_DFL"'
> ======
> 
> ======
> struct tp_sched_ext_event {
> 	struct trace_entry ent;
> 	u32 __data_loc_name;
> 	s64 delta;
> };
> 
> SEC("tracepoint/sched_ext/sched_ext_event")
> int rtp_add_event(struct tp_sched_ext_event *ctx)
> {
> 	char event_name[128];
> 	unsigned short offset = ctx->__data_loc_name & 0xFFFF;
> 	bpf_probe_read_str((void *)event_name, 128, (char *)ctx + offset);
> 
> 	bpf_printk("name %s   delta %lld", event_name, ctx->delta);
> 	return 0;
> }
> ======
> 
> Signed-off-by: Changwoo Min <changwoo@igalia.com>
> ---
>  include/trace/events/sched_ext.h | 19 +++++++++++++++++++
>  kernel/sched/ext.c               |  2 ++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/include/trace/events/sched_ext.h b/include/trace/events/sched_ext.h
> index fe19da7315a9..50e4b712735a 100644
> --- a/include/trace/events/sched_ext.h
> +++ b/include/trace/events/sched_ext.h
> @@ -26,6 +26,25 @@ TRACE_EVENT(sched_ext_dump,
>  	)
>  );
>  
> +TRACE_EVENT(sched_ext_event,
> +	    TP_PROTO(const char *name, __s64 delta),
> +	    TP_ARGS(name, delta),
> +
> +	TP_STRUCT__entry(
> +		__string(name, name)
> +		__field(	__s64,		delta		)

nit: there's an extra space/tab after delta.

But apart from that, LGTM.

Acked-by: Andrea Righi <arighi@nvidia.com>

-Andrea

> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(name);
> +		__entry->delta		= delta;
> +	),
> +
> +	TP_printk("name %s delta %lld",
> +		  __get_str(name), __entry->delta
> +	)
> +);
> +
>  #endif /* _TRACE_SCHED_EXT_H */
>  
>  /* This part must be outside protection */
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 686629a860f3..debcd1cf2de9 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -1554,6 +1554,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
>   */
>  #define scx_add_event(name, cnt) do {						\
>  	this_cpu_add(event_stats_cpu.name, cnt);				\
> +	trace_sched_ext_event(#name, cnt);					\
>  } while(0)
>  
>  /**
> @@ -1565,6 +1566,7 @@ static DEFINE_PER_CPU(struct scx_event_stats, event_stats_cpu);
>   */
>  #define __scx_add_event(name, cnt) do {						\
>  	__this_cpu_add(event_stats_cpu.name, cnt);				\
> +	trace_sched_ext_event(#name, cnt);					\
>  } while(0)
>  
>  /**
> -- 
> 2.48.1
> 


* Re: [PATCH v4 1/2] sched_ext: Change the event type from u64 to s64
  2025-03-04 10:48 ` [PATCH v4 1/2] sched_ext: Change the event type from u64 to s64 Changwoo Min
@ 2025-03-04 18:05   ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-03-04 18:05 UTC (permalink / raw)
  To: Changwoo Min; +Cc: void, arighi, kernel-dev, linux-kernel

On Tue, Mar 04, 2025 at 07:48:59PM +0900, Changwoo Min wrote:
> The event count could be negative in the future,
> so change the event type from u64 to s64.
> 
> Signed-off-by: Changwoo Min <changwoo@igalia.com>

Applied to sched_ext/for-6.15.

Thanks.

-- 
tejun


* Re: [PATCH v4 2/2] sched_ext: Add trace point to track sched_ext core events
  2025-03-04 12:20   ` Andrea Righi
@ 2025-03-04 18:08     ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-03-04 18:08 UTC (permalink / raw)
  To: Andrea Righi; +Cc: Changwoo Min, void, kernel-dev, linux-kernel

On Tue, Mar 04, 2025 at 01:20:17PM +0100, Andrea Righi wrote:
> On Tue, Mar 04, 2025 at 07:49:00PM +0900, Changwoo Min wrote:
> > Add tracing support to track sched_ext core events
> > (/sched_ext/sched_ext_event). This may be useful for debugging sched_ext
> > schedulers that trigger a particular event.
> > 
> > The trace point can be used as other trace points, so it can be used in,
> > for example, `perf trace` and BPF programs, as follows:
> > 
> > ======
> > $> sudo perf trace -e sched_ext:sched_ext_event --filter 'name == "SCX_EV_ENQ_SLICE_DFL"'
> > ======
> > 
> > ======
> > struct tp_sched_ext_event {
> > 	struct trace_entry ent;
> > 	u32 __data_loc_name;
> > 	s64 delta;
> > };
> > 
> > SEC("tracepoint/sched_ext/sched_ext_event")
> > int rtp_add_event(struct tp_sched_ext_event *ctx)
> > {
> > 	char event_name[128];
> > 	unsigned short offset = ctx->__data_loc_name & 0xFFFF;
> > 	bpf_probe_read_str((void *)event_name, 128, (char *)ctx + offset);
> > 
> > 	bpf_printk("name %s   delta %lld", event_name, ctx->delta);
> > 	return 0;
> > }
> > ======
> > 
> > Signed-off-by: Changwoo Min <changwoo@igalia.com>
> > ---
> >  include/trace/events/sched_ext.h | 19 +++++++++++++++++++
> >  kernel/sched/ext.c               |  2 ++
> >  2 files changed, 21 insertions(+)
> > 
> > diff --git a/include/trace/events/sched_ext.h b/include/trace/events/sched_ext.h
> > index fe19da7315a9..50e4b712735a 100644
> > --- a/include/trace/events/sched_ext.h
> > +++ b/include/trace/events/sched_ext.h
> > @@ -26,6 +26,25 @@ TRACE_EVENT(sched_ext_dump,
> >  	)
> >  );
> >  
> > +TRACE_EVENT(sched_ext_event,
> > +	    TP_PROTO(const char *name, __s64 delta),
> > +	    TP_ARGS(name, delta),
> > +
> > +	TP_STRUCT__entry(
> > +		__string(name, name)
> > +		__field(	__s64,		delta		)
> 
> nit: there's an extra space/tab after delta.

I think it's one of the common formatting styles for tp definitions. If we don't
like it, we can just change them in the future.

> But apart from that, LGTM.
> 
> Acked-by: Andrea Righi <arighi@nvidia.com>

Applied to sched_ext/for-6.15.

Thanks.

-- 
tejun


end of thread, other threads:[~2025-03-04 18:08 UTC | newest]

Thread overview: 6+ messages
2025-03-04 10:48 [PATCH v4 0/2] sched_ext: Add trace point to sched_ext core events Changwoo Min
2025-03-04 10:48 ` [PATCH v4 1/2] sched_ext: Change the event type from u64 to s64 Changwoo Min
2025-03-04 18:05   ` Tejun Heo
2025-03-04 10:49 ` [PATCH v4 2/2] sched_ext: Add trace point to track sched_ext core events Changwoo Min
2025-03-04 12:20   ` Andrea Righi
2025-03-04 18:08     ` Tejun Heo
