From: Ze Gao <zegao2021@gmail.com>
To: Adrian Hunter <adrian.hunter@intel.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Ian Rogers <irogers@google.com>, Ingo Molnar <mingo@redhat.com>,
Jiri Olsa <jolsa@kernel.org>, Mark Rutland <mark.rutland@arm.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, Ze Gao <zegao@tencent.com>
Subject: [RFC PATCH v6 4/5] sched, tracing: add to report task state in symbolic chars
Date: Thu, 3 Aug 2023 04:33:51 -0400 [thread overview]
Message-ID: <20230803083352.1585-5-zegao@tencent.com> (raw)
In-Reply-To: <20230803083352.1585-1-zegao@tencent.com>
Internal representations of task state are likely to be changed
or ordered, and reporting them to userspace without exporting
them as part of API is basically wrong, which can easily break
a userspace observability tool as kernel evolves. For example,
perf suffers from this and still reports wrong states as of this
writing.
OTOH, some masqueraded states like TASK_REPORT_IDLE and
TASK_REPORT_MAX are also reported inadvertently, which confuses
things even more and most userspace tools do not even take them
into consideration.
So add a new variable in company with the old raw value to
report task state in symbolic chars, which are self-explaining
and no further translation is needed. Of course this does not
break any userspace tool.
Note for PREEMPT_ACTIVE, we introduce 'p' to report it and use
the old conventions for the rest.
Signed-off-by: Ze Gao <zegao@tencent.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
---
include/trace/events/sched.h | 44 ++++++++++++++++++++++--------------
1 file changed, 27 insertions(+), 17 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 43492daefa6f..ae5b486cc969 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -6,6 +6,7 @@
#define _TRACE_SCHED_H
#include <linux/kthread.h>
+#include <linux/sched.h>
#include <linux/sched/numa_balancing.h>
#include <linux/tracepoint.h>
#include <linux/binfmts.h>
@@ -214,6 +215,27 @@ static inline short __trace_sched_switch_state(bool preempt,
return state ? (1 << (state - 1)) : state;
}
+
+static inline char __trace_sched_switch_state_char(bool preempt,
+ unsigned int prev_state,
+ struct task_struct *p)
+{
+ long state;
+
+#ifdef CONFIG_SCHED_DEBUG
+ WARN_ON_ONCE(p != current);
+#endif /* CONFIG_SCHED_DEBUG */
+
+ /*
+ * For PREEMPT_ACTIVE, we introduce 'p' to report it and use the old
+ * conventions for the rest.
+ */
+ if (preempt)
+ return 'p';
+
+ state = __task_state_index(prev_state, p->exit_state);
+ return task_index_to_char(state);
+}
#endif /* CREATE_TRACE_POINTS */
/*
@@ -236,6 +258,7 @@ TRACE_EVENT(sched_switch,
__array( char, prev_comm, TASK_COMM_LEN )
__array( char, next_comm, TASK_COMM_LEN )
__field( short, prev_state )
+ __field( char, prev_state_char )
),
TP_fast_assign(
@@ -246,26 +269,13 @@ TRACE_EVENT(sched_switch,
memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
__entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev);
+ __entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev);
/* XXX SCHED_DEADLINE */
),
- TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d",
- __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
-
- (__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
- __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), "|",
- { TASK_INTERRUPTIBLE, "S" },
- { TASK_UNINTERRUPTIBLE, "D" },
- { __TASK_STOPPED, "T" },
- { __TASK_TRACED, "t" },
- { EXIT_DEAD, "X" },
- { EXIT_ZOMBIE, "Z" },
- { TASK_PARKED, "P" },
- { TASK_DEAD, "I" }) :
- "R",
-
- __entry->prev_state & TASK_REPORT_MAX ? "+" : "",
- __entry->next_comm, __entry->next_pid, __entry->next_prio)
+ TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%c ==> next_comm=%s next_pid=%d next_prio=%d",
+ __entry->prev_comm, __entry->prev_pid, __entry->prev_prio, __entry->prev_state_char, __entry->next_comm,
+ __entry->next_pid, __entry->next_prio)
);
/*
--
2.41.0
next prev parent reply other threads:[~2023-08-03 8:36 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-03 8:33 [RFC PATCH v6 0/5] fix task state report from sched tracepoint Ze Gao
2023-08-03 8:33 ` [RFC PATCH v6 1/5] perf sched: sync state char array with the kernel Ze Gao
2023-08-03 9:09 ` Steven Rostedt
2023-08-03 10:29 ` Ze Gao
2023-08-03 12:25 ` Ze Gao
2023-08-03 12:39 ` Ze Gao
2023-08-03 15:10 ` Steven Rostedt
2023-08-04 2:21 ` Ze Gao
2023-08-04 2:38 ` Ze Gao
2023-08-04 3:19 ` Ze Gao
2023-08-04 3:41 ` Steven Rostedt
2023-08-10 5:50 ` Ze Gao
2023-08-10 6:07 ` [PATCH] perf sched: parse task state from tracepoint print format Ze Gao
2023-08-11 17:28 ` Steven Rostedt
2023-08-14 2:28 ` Ze Gao
2023-08-03 8:33 ` [RFC PATCH v6 2/5] perf sched: reorganize sched-out task state report code Ze Gao
2023-08-03 9:10 ` Steven Rostedt
2023-08-03 12:37 ` Ze Gao
2023-08-03 8:33 ` [RFC PATCH v6 3/5] sched, tracing: reorganize fields of switch event struct Ze Gao
2023-08-03 8:53 ` Peter Zijlstra
2023-08-03 11:06 ` Ze Gao
2023-08-03 9:18 ` Steven Rostedt
2023-08-03 9:51 ` Peter Zijlstra
2023-08-03 14:45 ` Steven Rostedt
2023-08-03 12:54 ` Ze Gao
2023-08-03 12:57 ` Ze Gao
2023-08-23 2:52 ` kernel test robot
2023-08-03 8:33 ` Ze Gao [this message]
2023-08-03 8:59 ` [RFC PATCH v6 4/5] sched, tracing: add to report task state in symbolic chars Peter Zijlstra
2023-08-03 13:09 ` Ze Gao
2023-08-03 9:29 ` Steven Rostedt
2023-08-03 10:55 ` Ze Gao
2023-08-03 8:33 ` [RFC PATCH v6 5/5] perf sched: prefer to use prev_state_char introduced in sched_switch Ze Gao
2023-08-03 9:34 ` Steven Rostedt
2023-08-03 11:01 ` Ze Gao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230803083352.1585-5-zegao@tencent.com \
--to=zegao2021@gmail.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=zegao@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).