linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Ze Gao <zegao2021@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ian Rogers <irogers@google.com>, Ingo Molnar <mingo@redhat.com>,
	Jiri Olsa <jolsa@kernel.org>, Mark Rutland <mark.rutland@arm.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	linux-trace-devel@vger.kernel.org, Ze Gao <zegao@tencent.com>
Subject: Re: [RFC PATCH v3 3/6] sched, tracing: add to report task state in symbolic chars
Date: Tue, 1 Aug 2023 15:41:42 +0200	[thread overview]
Message-ID: <20230801134142.GC11704@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CAD8CoPA-cqe+hY8dHjO+6WhcL2VKy2Wq=rYSSkwUqBFRJ4ECWg@mail.gmail.com>

On Tue, Aug 01, 2023 at 09:03:51PM +0800, Ze Gao wrote:

> It's just the design that exporting internal details is fundamentally wrong.

This is tracing... it wasn't supposed to be ABI (although it somehow
ended up being one). But even then, things like PF_foo get exposed in
procfs but even that we change.

The whole point of tracing is to see internals in order to figure out
wth is going wrong.

> And even worse,  I did not see any userspace tool is aware of masqueraded
> states like TASK_REPORT_IDLE and TASK_REPORT_MAX and let alone
> parse it correctly.

That's probably because I never use tools, I just look at the raw trace
output -- sometimes an impromptu awk script. I'm pretty sure I ran with
something like the below when I did the freezer rewrite -- or perhaps I
just stuck in trace_printk() and didn't even bother with the
tracepoints, I can't remember.

> > Why do we need this character anyway, why not just print the state in
> > hex and leave it at that? These single character state things are a
> > relic, please just let them die.
> 
> I believe hex is ok only after having the reported task state mapping
> appear in the uapi headers, otherwise it's still useless to userspace
> especially for value like TASK_REPORT_IDLE and TASK_REPORT_MAX, which
> need to dig into the kernel to see what the hell is going on here.
> 
> Thoughts?

If you're tracing the kernel, you had better know what the kernel is
doing, otherwise you get to keep the pieces.

Anyway, if you're doing BPF then why do you care about the trace event
at all, just attach to the raw tracepoint and consume @preemt, @prev,
@next and @prev_state.

---
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fbb99a61f714..cb0c1251729e 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -186,36 +186,6 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new,
 	     TP_PROTO(struct task_struct *p),
 	     TP_ARGS(p));
 
-#ifdef CREATE_TRACE_POINTS
-static inline long __trace_sched_switch_state(bool preempt,
-					      unsigned int prev_state,
-					      struct task_struct *p)
-{
-	unsigned int state;
-
-#ifdef CONFIG_SCHED_DEBUG
-	BUG_ON(p != current);
-#endif /* CONFIG_SCHED_DEBUG */
-
-	/*
-	 * Preemption ignores task state, therefore preempted tasks are always
-	 * RUNNING (we will not have dequeued if state != RUNNING).
-	 */
-	if (preempt)
-		return TASK_REPORT_MAX;
-
-	/*
-	 * task_state_index() uses fls() and returns a value from 0-8 range.
-	 * Decrement it by 1 (except TASK_RUNNING state i.e 0) before using
-	 * it for left shift operation to get the correct task->state
-	 * mapping.
-	 */
-	state = __task_state_index(prev_state, p->exit_state);
-
-	return state ? (1 << (state - 1)) : state;
-}
-#endif /* CREATE_TRACE_POINTS */
-
 /*
  * Tracepoint for task switches, performed by the scheduler:
  */
@@ -242,29 +212,16 @@ TRACE_EVENT(sched_switch,
 		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
 		__entry->prev_pid	= prev->pid;
 		__entry->prev_prio	= prev->prio;
-		__entry->prev_state	= __trace_sched_switch_state(preempt, prev_state, prev);
+		__entry->prev_state	= prev_state | (preempt * TASK_STATE_MAX);
 		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
 		__entry->next_pid	= next->pid;
 		__entry->next_prio	= next->prio;
 		/* XXX SCHED_DEADLINE */
 	),
 
-	TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d",
+	TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%x ==> next_comm=%s next_pid=%d next_prio=%d",
 		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
-
-		(__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
-		  __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), "|",
-				{ TASK_INTERRUPTIBLE, "S" },
-				{ TASK_UNINTERRUPTIBLE, "D" },
-				{ __TASK_STOPPED, "T" },
-				{ __TASK_TRACED, "t" },
-				{ EXIT_DEAD, "X" },
-				{ EXIT_ZOMBIE, "Z" },
-				{ TASK_PARKED, "P" },
-				{ TASK_DEAD, "I" }) :
-		  "R",
-
-		__entry->prev_state & TASK_REPORT_MAX ? "+" : "",
+		__entry->prev_state,
 		__entry->next_comm, __entry->next_pid, __entry->next_prio)
 );
 

  reply	other threads:[~2023-08-01 13:42 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-01  9:01 [RFC PATCH v3 0/6] add to report task state in symbolic chars from sched tracepoint Ze Gao
2023-08-01  9:01 ` [RFC PATCH v3 1/6] perf sched: sync state char array with the kernel Ze Gao
2023-08-01  9:01 ` [RFC PATCH v3 2/6] perf sched: reorganize sched-out task state report code Ze Gao
2023-08-01  9:01 ` [RFC PATCH v3 3/6] sched, tracing: add to report task state in symbolic chars Ze Gao
2023-08-01 11:34   ` Peter Zijlstra
2023-08-01 13:03     ` Ze Gao
2023-08-01 13:41       ` Peter Zijlstra [this message]
2023-08-02  3:03         ` Ze Gao
2023-08-01 11:45   ` Peter Zijlstra
2023-08-01 13:08     ` Ze Gao
2023-08-01 14:27     ` Steven Rostedt
2023-08-01  9:01 ` [RFC PATCH v3 4/6] sched, tracing: reorganize fields of switch event struct Ze Gao
2023-08-01 11:46   ` Peter Zijlstra
2023-08-01 13:16     ` Ze Gao
2023-08-01 14:33     ` Steven Rostedt
2023-08-02  3:06       ` Ze Gao
2023-08-01 14:16   ` Steven Rostedt
2023-08-02  3:07     ` Ze Gao
2023-08-01  9:01 ` [RFC PATCH v3 5/6] perf sched: prefer to use prev_state_char introduced in sched_switch Ze Gao
2023-08-01  9:01 ` [RFC PATCH v3 6/6] libtraceevent: " Ze Gao
2023-08-01 14:19   ` Steven Rostedt
2023-08-02  3:08     ` Ze Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230801134142.GC11704@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=zegao2021@gmail.com \
    --cc=zegao@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).