From: Ze Gao <zegao2021@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Ian Rogers <irogers@google.com>, Ingo Molnar <mingo@redhat.com>,
Jiri Olsa <jolsa@kernel.org>, Mark Rutland <mark.rutland@arm.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
linux-trace-kernel@vger.kernel.org,
linux-trace-devel@vger.kernel.org, Ze Gao <zegao@tencent.com>
Subject: [RFC PATCH v2 1/3] sched, tracing: add to report task state in symbolic chars
Date: Wed, 26 Jul 2023 20:16:16 +0800 [thread overview]
Message-ID: <20230726121618.19198-2-zegao@tencent.com> (raw)
In-Reply-To: <20230726121618.19198-1-zegao@tencent.com>
Internal representations of task state are likely to be changed or
ordered, and reporting them to userspace without exporting them as
part of API is not a good choice, which can easily break a userspace
observability tool as kernel evolves. For example, perf suffers from
this and still reports wrong states by this patch.
OTOH, some masqueraded state like TASK_REPORT_IDLE and TASK_REPORT_MAX
are also reported inadvertently, which confuses things even more.
So add a new variable in company with the old raw value to report task
state in symbolic char, which is self-explaining and no further
translation is needed, and also report priorities in 'short' to save
some buffer space. Of course this does not break any userspace tool.
Note for PREEMPT_ACTIVE, we introduce 'p' to report it and use the old
conventions for the rest.
Signed-off-by: Ze Gao <zegao@tencent.com>
---
include/trace/events/sched.h | 60 +++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 25 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fbb99a61f714..0fcf68f49e45 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -6,6 +6,7 @@
#define _TRACE_SCHED_H
#include <linux/kthread.h>
+#include <linux/sched.h>
#include <linux/sched/numa_balancing.h>
#include <linux/tracepoint.h>
#include <linux/binfmts.h>
@@ -214,6 +215,27 @@ static inline long __trace_sched_switch_state(bool preempt,
return state ? (1 << (state - 1)) : state;
}
+
+static inline char __trace_sched_switch_state_char(bool preempt,
+ unsigned int prev_state,
+ struct task_struct *p)
+{
+ long state;
+
+#ifdef CONFIG_SCHED_DEBUG
+ BUG_ON(p != current);
+#endif /* CONFIG_SCHED_DEBUG */
+
+ /*
+ * For PREEMPT_ACTIVE, we introduce 'p' to report it and use the old
+ * conventions for the rest.
+ */
+ if (preempt)
+ return 'p';
+
+ state = __task_state_index(prev_state, p->exit_state);
+ return task_index_to_char(state);
+}
#endif /* CREATE_TRACE_POINTS */
/*
@@ -231,41 +253,29 @@ TRACE_EVENT(sched_switch,
TP_STRUCT__entry(
__array( char, prev_comm, TASK_COMM_LEN )
__field( pid_t, prev_pid )
- __field( int, prev_prio )
- __field( long, prev_state )
+ __field( short, prev_prio )
+ __field( int, prev_state )
+ __field( char, prev_state_char )
__array( char, next_comm, TASK_COMM_LEN )
__field( pid_t, next_pid )
- __field( int, next_prio )
+ __field( short, next_prio )
),
TP_fast_assign(
memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
- __entry->prev_pid = prev->pid;
- __entry->prev_prio = prev->prio;
- __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev);
+ __entry->prev_pid = prev->pid;
+ __entry->prev_prio = (short) prev->prio;
+ __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev);
+ __entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev);
memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
- __entry->next_pid = next->pid;
- __entry->next_prio = next->prio;
+ __entry->next_pid = next->pid;
+ __entry->next_prio = (short) next->prio;
/* XXX SCHED_DEADLINE */
),
- TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d",
- __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
-
- (__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
- __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), "|",
- { TASK_INTERRUPTIBLE, "S" },
- { TASK_UNINTERRUPTIBLE, "D" },
- { __TASK_STOPPED, "T" },
- { __TASK_TRACED, "t" },
- { EXIT_DEAD, "X" },
- { EXIT_ZOMBIE, "Z" },
- { TASK_PARKED, "P" },
- { TASK_DEAD, "I" }) :
- "R",
-
- __entry->prev_state & TASK_REPORT_MAX ? "+" : "",
- __entry->next_comm, __entry->next_pid, __entry->next_prio)
+ TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%c ==> next_comm=%s next_pid=%d next_prio=%d",
+ __entry->prev_comm, __entry->prev_pid, __entry->prev_prio, __entry->prev_state_char, __entry->next_comm,
+ __entry->next_pid, __entry->next_prio)
);
/*
--
2.40.1
next prev parent reply other threads:[~2023-07-26 12:16 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-26 12:16 [RFC PATCH v2 0/3] report task state in symbolic chars from sched tracepoint Ze Gao
2023-07-26 12:16 ` Ze Gao [this message]
2023-07-31 9:36 ` [RFC PATCH v2 1/3] sched, tracing: add to report task state in symbolic chars Peter Zijlstra
2023-07-31 10:17 ` Ze Gao
2023-07-31 11:00 ` Ze Gao
2023-07-31 15:38 ` Steven Rostedt
2023-08-01 1:35 ` Ze Gao
2023-07-26 12:16 ` [RFC PATCH v2 2/3] perf sched: use the new prev_state_char instead in tracepoint sched_switch Ze Gao
2023-07-26 12:16 ` [RFC PATCH v2 3/3] libtraceevent: " Ze Gao
2023-07-31 15:40 ` Steven Rostedt
2023-08-01 1:36 ` Ze Gao
2023-07-28 17:29 ` [RFC PATCH v2 0/3] report task state in symbolic chars from sched tracepoint Ian Rogers
2023-07-31 2:09 ` Ze Gao
2023-07-31 3:52 ` Masami Hiramatsu
2023-07-31 6:19 ` Ze Gao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230726121618.19198-2-zegao@tencent.com \
--to=zegao2021@gmail.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=linux-trace-devel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=zegao@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).