From: Ze Gao <zegao2021@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Namhyung Kim <namhyung@kernel.org>,
Ian Rogers <irogers@google.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Jiri Olsa <jolsa@kernel.org>, Mark Rutland <mark.rutland@arm.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, Ze Gao <zegao@tencent.com>
Subject: Re: [RFC PATCH 1/3] sched, tracing: report task state in symbolic chars instead
Date: Wed, 26 Jul 2023 10:41:30 +0800 [thread overview]
Message-ID: <CAD8CoPBTCXiusUGNTEwcm9WJ+MEfrX01xsCWYAtA7MCpfaeqmQ@mail.gmail.com> (raw)
In-Reply-To: <20230725135938.1e056a02@rorschach.local.home>
On Wed, Jul 26, 2023 at 1:59 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 25 Jul 2023 15:22:52 +0800
> Ze Gao <zegao2021@gmail.com> wrote:
>
> > #ifdef CREATE_TRACE_POINTS
> > -static inline long __trace_sched_switch_state(bool preempt,
> > +static inline const char __trace_sched_switch_state(bool preempt,
>
> Does it really need to be "const"?
Not really since it's rvalue after all, will remove this.
> > unsigned int prev_state,
> > struct task_struct *p)
> > {
> > @@ -197,13 +198,6 @@ static inline long __trace_sched_switch_state(bool preempt,
> > BUG_ON(p != current);
> > #endif /* CONFIG_SCHED_DEBUG */
> >
> > - /*
> > - * Preemption ignores task state, therefore preempted tasks are always
> > - * RUNNING (we will not have dequeued if state != RUNNING).
> > - */
> > - if (preempt)
> > - return TASK_REPORT_MAX;
> > -
> > /*
> > * task_state_index() uses fls() and returns a value from 0-8 range.
> > * Decrement it by 1 (except TASK_RUNNING state i.e 0) before using
> > @@ -212,7 +206,16 @@ static inline long __trace_sched_switch_state(bool preempt,
> > */
> > state = __task_state_index(prev_state, p->exit_state);
> >
> > - return state ? (1 << (state - 1)) : state;
> > + /*
> > + * Preemption ignores task state, therefore preempted tasks are always
> > + * RUNNING (we will not have dequeued if state != RUNNING).
> > + * Here, we use 'p' to denote this case and only for this case.
> > + */
> > + if (preempt)
> > + return 'p';
> > +
> > +
> > + return task_index_to_char(state);
> > }
> > #endif /* CREATE_TRACE_POINTS */
> >
> > @@ -232,7 +235,7 @@ TRACE_EVENT(sched_switch,
> > __array( char, prev_comm, TASK_COMM_LEN )
> > __field( pid_t, prev_pid )
> > __field( int, prev_prio )
> > - __field( long, prev_state )
> > + __field( char, prev_state )
> > __array( char, next_comm, TASK_COMM_LEN )
> > __field( pid_t, next_pid )
> > __field( int, next_prio )
>
> This will break userspace. Just because you update libtraceevent
> doesn't mean that it will get to all the users of it. There's still
> tools that have the old method hard coded and doesn't use the library.
>
> Now, because the old tools still do the parsing of this format, we can
> add a new field called prev_state_char that will give you this. Now to
> save space, we should change prev_state to int (can't make it short as
> there's that test for "+" which does sometimes happen). I believe we
> can make prev_prio and next prio into shorts (and possibly chars!).
>
> -- Steve
>
Agreed, and my first attempt was to not break anything here by
introducing a new "char", which is kinda redundant, but is a good way
for smooth transition to decouple userspace from kernel on this weird
design(of ABI).
However, given the fact the in-tree perf still suffers this, I doubt many
other tools could spare from it. And the best guess is that they are
still partially broken silently and reports wrong and confusing states.
And this is basically why I came up with this RFC and deliberately
let them fail loudly to do a painful but quick recovery.
Anyway, if we want to keep status quo, I can send over another two
patches to fix perf and libtraceevent specifically and leave the tracepoint
alone. But seriously, imho we might have no chance to correct this.
and without a formal abi, the raw value (let alone the masqueraded ones)
is just meaningless for userspace and that leaves it intended for kernel
developers only.
Thanks,
Ze
next prev parent reply other threads:[~2023-07-26 2:41 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-25 7:22 [RFC PATCH 0/3] report task state in symbolic chars in sched tracepoints Ze Gao
2023-07-25 7:22 ` [RFC PATCH 1/3] sched, tracing: report task state in symbolic chars instead Ze Gao
2023-07-25 8:33 ` Peter Zijlstra
2023-07-25 10:53 ` Ze Gao
2023-07-25 13:31 ` Peter Zijlstra
2023-07-25 20:13 ` Steven Rostedt
2023-07-25 17:59 ` Steven Rostedt
2023-07-25 20:16 ` Peter Zijlstra
2023-07-25 21:55 ` Steven Rostedt
2023-07-26 2:48 ` Ze Gao
2023-07-26 2:41 ` Ze Gao [this message]
2023-07-25 7:22 ` [RFC PATCH 2/3] perf sched: sync with latest sched_switch tracepoint definition Ze Gao
2023-07-25 7:22 ` [RFC PATCH 3/3] libtraceevent: " Ze Gao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAD8CoPBTCXiusUGNTEwcm9WJ+MEfrX01xsCWYAtA7MCpfaeqmQ@mail.gmail.com \
--to=zegao2021@gmail.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=zegao@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).