* [PATCH v6 1/2] libtraceevent: sync state char array with the kernel
@ 2023-08-03 3:09 Ze Gao
2023-08-03 3:09 ` [PATCH v6 2/2] libtraceevent: prefer to use prev_state_char introduced in sched_switch Ze Gao
2023-12-06 22:55 ` [PATCH v6 1/2] libtraceevent: sync state char array with the kernel Steven Rostedt
0 siblings, 2 replies; 5+ messages in thread
From: Ze Gao @ 2023-08-03 3:09 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-trace-devel, Ze Gao
Update state char array to match the latest kernel
definitions.
Signed-off-by: Ze Gao <zegao@tencent.com>
---
plugins/plugin_sched_switch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
index 8752cae..e0986ac 100644
--- a/plugins/plugin_sched_switch.c
+++ b/plugins/plugin_sched_switch.c
@@ -11,7 +11,7 @@
static void write_state(struct trace_seq *s, int val)
{
- const char states[] = "SDTtZXxW";
+ const char states[] = "SDTtXZPI";
int found = 0;
int i;
--
2.41.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v6 2/2] libtraceevent: prefer to use prev_state_char introduced in sched_switch
2023-08-03 3:09 [PATCH v6 1/2] libtraceevent: sync state char array with the kernel Ze Gao
@ 2023-08-03 3:09 ` Ze Gao
2023-12-06 22:55 ` [PATCH v6 1/2] libtraceevent: sync state char array with the kernel Steven Rostedt
1 sibling, 0 replies; 5+ messages in thread
From: Ze Gao @ 2023-08-03 3:09 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-trace-devel, Ze Gao
Since the sched_switch tracepoint introduces a new variable to
report sched-out task state in symbolic char, we prefer to use
it to spare from knowing internal implementations in kernel.
Also we keep the old parsing logic intact but sync the state char
array with the latest kernel.
Signed-off-by: Ze Gao <zegao@tencent.com>
---
plugins/plugin_sched_switch.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
index e0986ac..d96bd22 100644
--- a/plugins/plugin_sched_switch.c
+++ b/plugins/plugin_sched_switch.c
@@ -11,7 +11,7 @@
static void write_state(struct trace_seq *s, int val)
{
- const char states[] = "SDTtXZPI";
+ const char states[] = "SDTtXZPIp";
int found = 0;
int i;
@@ -99,8 +99,14 @@ static int sched_switch_handler(struct trace_seq *s,
if (tep_get_field_val(s, event, "prev_prio", record, &val, 1) == 0)
trace_seq_printf(s, "[%d] ", (int) val);
- if (tep_get_field_val(s, event, "prev_state", record, &val, 1) == 0)
+ /* find if has prev_state_char, otherwise fallback to prev_state */
+ if (tep_find_field(event, "prev_state_char")) {
+ if (tep_get_field_val(s, event, "prev_state_char", record, &val, 1) == 0)
+ trace_seq_putc(s, (char) val);
+ }
+ else if (tep_get_field_val(s, event, "prev_state", record, &val, 1) == 0) {
write_state(s, val);
+ }
trace_seq_puts(s, " ==> ");
--
2.41.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v6 1/2] libtraceevent: sync state char array with the kernel
2023-08-03 3:09 [PATCH v6 1/2] libtraceevent: sync state char array with the kernel Ze Gao
2023-08-03 3:09 ` [PATCH v6 2/2] libtraceevent: prefer to use prev_state_char introduced in sched_switch Ze Gao
@ 2023-12-06 22:55 ` Steven Rostedt
2023-12-07 2:52 ` Ze Gao
1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2023-12-06 22:55 UTC (permalink / raw)
To: Ze Gao; +Cc: linux-trace-devel, Ze Gao
On Wed, 2 Aug 2023 23:09:54 -0400
Ze Gao <zegao2021@gmail.com> wrote:
> Update state char array to match the latest kernel
> definitions.
>
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
> plugins/plugin_sched_switch.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
> index 8752cae..e0986ac 100644
> --- a/plugins/plugin_sched_switch.c
> +++ b/plugins/plugin_sched_switch.c
> @@ -11,7 +11,7 @@
>
> static void write_state(struct trace_seq *s, int val)
> {
> - const char states[] = "SDTtZXxW";
> + const char states[] = "SDTtXZPI";
> int found = 0;
> int i;
>
Hi Ze,
As your update never made it into the kernel I never pulled these patches
in. But we do have an issue with the state getting out of sync with what
the kernel actually uses.
I did this "hack" but it seems to work. What I do is to look at the
sched_switch event print parsing arguments, find the __print_flags
arguments for the "prev_state" field. And then parse them, creating a new
output and I use that for the write_state() function.
I guess this should work for you too?
I still need to pull in this patch and use that as the default. I'll do
that before applying this one, and have your above update be the default.
Thanks!
-- Steve
diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
index 8752caea5c38..760d224f5507 100644
--- a/plugins/plugin_sched_switch.c
+++ b/plugins/plugin_sched_switch.c
@@ -9,13 +9,132 @@
#include "event-parse.h"
#include "trace-seq.h"
-static void write_state(struct trace_seq *s, int val)
+static const char *convert_sym(struct tep_print_flag_sym *sym)
{
- const char states[] = "SDTtZXxW";
+ static char save_states[33];
+
+ memset(save_states, 0, sizeof(save_states));
+
+ /* This is the flags for the prev_state_field, now make them into a string */
+ for (; sym; sym = sym->next) {
+ long bitmask = strtoul(sym->value, NULL, 0);
+ int i;
+
+ for (i = 0; !(bitmask & 1); i++)
+ bitmask >>= 1;
+
+ if (i > 32)
+ continue; // warn?
+
+ save_states[i] = sym->str[0];
+ }
+
+ return save_states;
+}
+
+static struct tep_print_arg_field *
+find_arg_field(struct tep_format_field *prev_state_field, struct tep_print_arg *arg)
+{
+ struct tep_print_arg_field *field;
+
+ if (!arg)
+ return NULL;
+
+ if (arg->type == TEP_PRINT_FIELD)
+ return &arg->field;
+
+ if (arg->type == TEP_PRINT_OP) {
+ field = find_arg_field(prev_state_field, arg->op.left);
+ if (field && field->field == prev_state_field)
+ return field;
+ field = find_arg_field(prev_state_field, arg->op.right);
+ if (field && field->field == prev_state_field)
+ return field;
+ }
+ return NULL;
+}
+
+static struct tep_print_flag_sym *
+test_flags(struct tep_format_field *prev_state_field, struct tep_print_arg *arg)
+{
+ struct tep_print_arg_field *field;
+
+ field = find_arg_field(prev_state_field, arg->flags.field);
+ if (!field)
+ return NULL;
+
+ return arg->flags.flags;
+}
+
+static struct tep_print_flag_sym *
+search_op(struct tep_format_field *prev_state_field, struct tep_print_arg *arg)
+{
+ struct tep_print_flag_sym *sym = NULL;
+
+ if (!arg)
+ return NULL;
+
+ if (arg->type == TEP_PRINT_OP) {
+ sym = search_op(prev_state_field, arg->op.left);
+ if (sym)
+ return sym;
+
+ sym = search_op(prev_state_field, arg->op.right);
+ if (sym)
+ return sym;
+ } else if (arg->type == TEP_PRINT_FLAGS) {
+ sym = test_flags(prev_state_field, arg);
+ }
+
+ return sym;
+}
+
+static const char *get_states(struct tep_format_field *prev_state_field)
+{
+ struct tep_print_flag_sym *sym;
+ struct tep_print_arg *arg;
+ struct tep_event *event;
+
+ event = prev_state_field->event;
+
+ /*
+ * Look at the event format fields, and search for where
+ * the prev_state is parsed via the format flags.
+ */
+ for (arg = event->print_fmt.args; arg; arg = arg->next) {
+ /*
+ * Currently, the __print_flags() for the prev_state
+ * is embedded in operations, so they too must be
+ * searched.
+ */
+ sym = search_op(prev_state_field, arg);
+ if (sym)
+ return convert_sym(sym);
+ }
+ return NULL;
+}
+
+static void write_state(struct trace_seq *s, struct tep_format_field *field,
+ struct tep_record *record)
+{
+ static struct tep_format_field *prev_state_field;
+ static const char *states;
+ unsigned long long val;
int found = 0;
+ int len;
int i;
- for (i = 0; i < (sizeof(states) - 1); i++) {
+ if (field != prev_state_field) {
+ states = get_states(field);
+ if (!states)
+ states = "SDTtZXxW";
+ prev_state_field = field;
+ }
+
+ tep_read_number_field(field, record->data, &val);
+
+ len = strlen(states);
+ for (i = 0; i < len; i++) {
if (!(val & (1 << i)))
continue;
@@ -99,8 +218,8 @@ static int sched_switch_handler(struct trace_seq *s,
if (tep_get_field_val(s, event, "prev_prio", record, &val, 1) == 0)
trace_seq_printf(s, "[%d] ", (int) val);
- if (tep_get_field_val(s, event, "prev_state", record, &val, 1) == 0)
- write_state(s, val);
+ field = tep_find_any_field(event, "prev_state");
+ write_state(s, field, record);
trace_seq_puts(s, " ==> ");
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v6 1/2] libtraceevent: sync state char array with the kernel
2023-12-06 22:55 ` [PATCH v6 1/2] libtraceevent: sync state char array with the kernel Steven Rostedt
@ 2023-12-07 2:52 ` Ze Gao
2023-12-07 13:20 ` Steven Rostedt
0 siblings, 1 reply; 5+ messages in thread
From: Ze Gao @ 2023-12-07 2:52 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-trace-devel, Ze Gao
On Thu, Dec 7, 2023 at 6:54 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 2 Aug 2023 23:09:54 -0400
> Ze Gao <zegao2021@gmail.com> wrote:
>
> > Update state char array to match the latest kernel
> > definitions.
> >
> > Signed-off-by: Ze Gao <zegao@tencent.com>
> > ---
> > plugins/plugin_sched_switch.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
> > index 8752cae..e0986ac 100644
> > --- a/plugins/plugin_sched_switch.c
> > +++ b/plugins/plugin_sched_switch.c
> > @@ -11,7 +11,7 @@
> >
> > static void write_state(struct trace_seq *s, int val)
> > {
> > - const char states[] = "SDTtZXxW";
> > + const char states[] = "SDTtXZPI";
> > int found = 0;
> > int i;
> >
>
> Hi Ze,
>
> As your update never made it into the kernel I never pulled these patches
> in. But we do have an issue with the state getting out of sync with what
> the kernel actually uses.
>
> I did this "hack" but it seems to work. What I do is to look at the
> sched_switch event print parsing arguments, find the __print_flags
> arguments for the "prev_state" field. And then parse them, creating a new
> output and I use that for the write_state() function.
Thanks for your updates, I remembered to reply with my "hacks" which
fix both libtraceevent and "perf sched timehist --state".
FYI, for libtraceevent in this thread:
https://lore.kernel.org/all/CAD8CoPDqZrd2HOUCVkrOq9JgJEQwQnqBn9zD1A3M4aKLZ4mLvg@mail.gmail.com
and here for perf:
https://lore.kernel.org/all/CAD8CoPDqZrd2HOUCVkrOq9JgJEQwQnqBn9zD1A3M4aKLZ4mLvg@mail.gmail.com/
My only concern at that time was the very hacky way to pinpoint the
TEP_PRINT_FLAGS due to the lack of knowledge of libtraceevent design.
So I was not actively seeking to push my fixes to the upstream.
> I guess this should work for you too?
I'll give it a try definitely.
> I still need to pull in this patch and use that as the default. I'll do
> that before applying this one, and have your above update be the default.
>
> Thanks!
> -- Steve
>
> diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
> index 8752caea5c38..760d224f5507 100644
> --- a/plugins/plugin_sched_switch.c
> +++ b/plugins/plugin_sched_switch.c
> @@ -9,13 +9,132 @@
> #include "event-parse.h"
> #include "trace-seq.h"
>
> -static void write_state(struct trace_seq *s, int val)
> +static const char *convert_sym(struct tep_print_flag_sym *sym)
> {
> - const char states[] = "SDTtZXxW";
> + static char save_states[33];
how about a gobal counter instead of fixed sized constant here,
and we can warn if it outbounds, u know, just in case.
> + memset(save_states, 0, sizeof(save_states));
> +
> + /* This is the flags for the prev_state_field, now make them into a string */
> + for (; sym; sym = sym->next) {
> + long bitmask = strtoul(sym->value, NULL, 0);
> + int i;
> +
> + for (i = 0; !(bitmask & 1); i++)
> + bitmask >>= 1;
> +
> + if (i > 32)
> + continue; // warn?
> +
> + save_states[i] = sym->str[0];
> + }
> +
> + return save_states;
> +}
> +
> +static struct tep_print_arg_field *
> +find_arg_field(struct tep_format_field *prev_state_field, struct tep_print_arg *arg)
> +{
> + struct tep_print_arg_field *field;
> +
> + if (!arg)
> + return NULL;
> +
> + if (arg->type == TEP_PRINT_FIELD)
> + return &arg->field;
> +
> + if (arg->type == TEP_PRINT_OP) {
> + field = find_arg_field(prev_state_field, arg->op.left);
> + if (field && field->field == prev_state_field)
> + return field;
> + field = find_arg_field(prev_state_field, arg->op.right);
> + if (field && field->field == prev_state_field)
> + return field;
> + }
> + return NULL;
> +}
> +
> +static struct tep_print_flag_sym *
> +test_flags(struct tep_format_field *prev_state_field, struct tep_print_arg *arg)
> +{
> + struct tep_print_arg_field *field;
> +
> + field = find_arg_field(prev_state_field, arg->flags.field);
> + if (!field)
> + return NULL;
> +
> + return arg->flags.flags;
> +}
> +
> +static struct tep_print_flag_sym *
> +search_op(struct tep_format_field *prev_state_field, struct tep_print_arg *arg)
> +{
> + struct tep_print_flag_sym *sym = NULL;
> +
> + if (!arg)
> + return NULL;
> +
> + if (arg->type == TEP_PRINT_OP) {
> + sym = search_op(prev_state_field, arg->op.left);
> + if (sym)
> + return sym;
> +
> + sym = search_op(prev_state_field, arg->op.right);
> + if (sym)
> + return sym;
> + } else if (arg->type == TEP_PRINT_FLAGS) {
> + sym = test_flags(prev_state_field, arg);
> + }
> +
> + return sym;
> +}
> +
> +static const char *get_states(struct tep_format_field *prev_state_field)
> +{
> + struct tep_print_flag_sym *sym;
> + struct tep_print_arg *arg;
> + struct tep_event *event;
> +
> + event = prev_state_field->event;
> +
> + /*
> + * Look at the event format fields, and search for where
> + * the prev_state is parsed via the format flags.
> + */
> + for (arg = event->print_fmt.args; arg; arg = arg->next) {
> + /*
> + * Currently, the __print_flags() for the prev_state
> + * is embedded in operations, so they too must be
> + * searched.
> + */
> + sym = search_op(prev_state_field, arg);
> + if (sym)
> + return convert_sym(sym);
> + }
> + return NULL;
> +}
> +
> +static void write_state(struct trace_seq *s, struct tep_format_field *field,
> + struct tep_record *record)
> +{
> + static struct tep_format_field *prev_state_field;
> + static const char *states;
> + unsigned long long val;
> int found = 0;
> + int len;
> int i;
>
> - for (i = 0; i < (sizeof(states) - 1); i++) {
> + if (field != prev_state_field) {
> + states = get_states(field);
I think we only need to build the task_state_char_array once,
and there are no other possibilities here. Correct me if I'm stupid :)
So here we can check if the char array is built, if so, we can
directly look it up in the mapping, right?
> + if (!states)
> + states = "SDTtZXxW";
I think I've done experiments on the very old kernel in which the TP_print
is brought in the sched_switch tracepoint. The "hack" works for it too,
so this should not happen. Anyway, I'll give it a try and keep this
thread updated
if anything goes wrong.
Regards,
Ze Gao
> + prev_state_field = field;
> + }
> +
> + tep_read_number_field(field, record->data, &val);
> +
> + len = strlen(states);
> + for (i = 0; i < len; i++) {
> if (!(val & (1 << i)))
> continue;
>
> @@ -99,8 +218,8 @@ static int sched_switch_handler(struct trace_seq *s,
> if (tep_get_field_val(s, event, "prev_prio", record, &val, 1) == 0)
> trace_seq_printf(s, "[%d] ", (int) val);
>
> - if (tep_get_field_val(s, event, "prev_state", record, &val, 1) == 0)
> - write_state(s, val);
> + field = tep_find_any_field(event, "prev_state");
> + write_state(s, field, record);
>
> trace_seq_puts(s, " ==> ");
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v6 1/2] libtraceevent: sync state char array with the kernel
2023-12-07 2:52 ` Ze Gao
@ 2023-12-07 13:20 ` Steven Rostedt
0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2023-12-07 13:20 UTC (permalink / raw)
To: Ze Gao; +Cc: linux-trace-devel, Ze Gao
On Thu, 7 Dec 2023 10:52:35 +0800
Ze Gao <zegao2021@gmail.com> wrote:
> On Thu, Dec 7, 2023 at 6:54 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 2 Aug 2023 23:09:54 -0400
> > Ze Gao <zegao2021@gmail.com> wrote:
> >
> > As your update never made it into the kernel I never pulled these patches
> > in. But we do have an issue with the state getting out of sync with what
> > the kernel actually uses.
> >
> > I did this "hack" but it seems to work. What I do is to look at the
> > sched_switch event print parsing arguments, find the __print_flags
> > arguments for the "prev_state" field. And then parse them, creating a new
> > output and I use that for the write_state() function.
>
> Thanks for your updates, I remembered to reply with my "hacks" which
> fix both libtraceevent and "perf sched timehist --state".
> FYI, for libtraceevent in this thread:
> https://lore.kernel.org/all/CAD8CoPDqZrd2HOUCVkrOq9JgJEQwQnqBn9zD1A3M4aKLZ4mLvg@mail.gmail.com
And somehow I missed this reply :-/
Sorry about that.
> and here for perf:
> https://lore.kernel.org/all/CAD8CoPDqZrd2HOUCVkrOq9JgJEQwQnqBn9zD1A3M4aKLZ4mLvg@mail.gmail.com/
>
> My only concern at that time was the very hacky way to pinpoint the
> TEP_PRINT_FLAGS due to the lack of knowledge of libtraceevent design.
> So I was not actively seeking to push my fixes to the upstream.
>
> > I guess this should work for you too?
>
> I'll give it a try definitely.
>
> > I still need to pull in this patch and use that as the default. I'll do
> > that before applying this one, and have your above update be the default.
> >
> > Thanks!
>
>
> > -- Steve
> >
> > diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
> > index 8752caea5c38..760d224f5507 100644
> > --- a/plugins/plugin_sched_switch.c
> > +++ b/plugins/plugin_sched_switch.c
> > @@ -9,13 +9,132 @@
> > #include "event-parse.h"
> > #include "trace-seq.h"
> >
> > -static void write_state(struct trace_seq *s, int val)
> > +static const char *convert_sym(struct tep_print_flag_sym *sym)
> > {
> > - const char states[] = "SDTtZXxW";
> > + static char save_states[33];
>
> how about a gobal counter instead of fixed sized constant here,
> and we can warn if it outbounds, u know, just in case.
I picked 33 to be 1 greater than 32 as that's the max size of prev_state on
32 bit systems. I should change that to be a macro though.
>
> > + memset(save_states, 0, sizeof(save_states));
> > +
> > + /* This is the flags for the prev_state_field, now make them into a string */
> > + for (; sym; sym = sym->next) {
> > + long bitmask = strtoul(sym->value, NULL, 0);
> > + int i;
> > +
> > + for (i = 0; !(bitmask & 1); i++)
> > + bitmask >>= 1;
> > +
> > + if (i > 32)
> > + continue; // warn?
> > +
> > + save_states[i] = sym->str[0];
> > + }
> > +
> > + return save_states;
> > +}
> > +
> > +static void write_state(struct trace_seq *s, struct tep_format_field *field,
> > + struct tep_record *record)
> > +{
> > + static struct tep_format_field *prev_state_field;
> > + static const char *states;
> > + unsigned long long val;
> > int found = 0;
> > + int len;
> > int i;
> >
> > - for (i = 0; i < (sizeof(states) - 1); i++) {
> > + if (field != prev_state_field) {
> > + states = get_states(field);
>
> I think we only need to build the task_state_char_array once,
> and there are no other possibilities here. Correct me if I'm stupid :)
libtracecmd can be used for multiple trace.dat files, that in theory can
have more than one version of sched_switch. And yes this can be called with
two different versions of prev_state.
trace-cmd report -i trace1.dat -i trace2.dat
This is commonly done to see host and guest trace data interleaved, where
the host and guest could have different versions of sched_switch.
> So here we can check if the char array is built, if so, we can
> directly look it up in the mapping, right?
>
> > + if (!states)
> > + states = "SDTtZXxW";
>
> I think I've done experiments on the very old kernel in which the TP_print
> is brought in the sched_switch tracepoint. The "hack" works for it too,
> so this should not happen. Anyway, I'll give it a try and keep this
I agree that "this should not happen" but I prefer to be safe than sorry,
and always try to write robust code. If in never happens we just waste a
branch, if it does happen, it can give some weird result. I don't know what
is in the future.
> thread updated
> if anything goes wrong.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-12-07 13:20 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-03 3:09 [PATCH v6 1/2] libtraceevent: sync state char array with the kernel Ze Gao
2023-08-03 3:09 ` [PATCH v6 2/2] libtraceevent: prefer to use prev_state_char introduced in sched_switch Ze Gao
2023-12-06 22:55 ` [PATCH v6 1/2] libtraceevent: sync state char array with the kernel Steven Rostedt
2023-12-07 2:52 ` Ze Gao
2023-12-07 13:20 ` Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).