* perf script, libperf: python binding bug (bytearrays vs. strings)
@ 2020-09-27 7:43 Hagen Paul Pfeifer
2020-09-27 9:20 ` Hagen Paul Pfeifer
2020-09-28 10:08 ` Jiri Olsa
0 siblings, 2 replies; 6+ messages in thread
From: Hagen Paul Pfeifer @ 2020-09-27 7:43 UTC (permalink / raw)
To: linux-kernel; +Cc: Jiri Olsa, Arnaldo Carvalho de Melo
Hallo Jiri, Arnaldo,
after updating Debian (probably with the advent of Python 3.8.5, guessing)
I get a wired behavior with python scripting. The error is that the python type
for prev_comm and next_comm are not strings anymore, rather bytearrays. Which
are incompatible types and scripts will not work anymore. NOTE: common_comm is
still fine (see swapper & mutex-thread-co), so they must be treated internal
differently compared to prev_comm and next_comm and possibly show a way to solve
this problem!
After bisecting the kernel (perf) even back to v5.6 the problem still exist.
Compiling perf with PYTHON=python2 do not show any problems - no problems in
the Python2 world. So I assume with Python 3.8.5 (or other helper library)
something changed internally. I assume the cause exists in perf forever but
is now triggered with the new Python3 version.
How to reproduce:
make PYTHON=python3
./perf record -e sched:sched_switch -a -- sleep 5
./perf script --gen-script py
./perf script -s ./perf-script.py
[..]
sched__sched_switch 7 563231.759525792 0 swapper prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'), next_pid=3447985, next_prio=120
Sample: {addr=0, cpu=7, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744072189289569, period=1, phys_addr=0, pid=0, tid=0, time=563231759525792, transaction=0, values=[(0, 0)], weight=0}
sched__sched_switch 7 563231.759582596 3447985 mutex-thread-co prev_comm=bytearray(b'mutex-thread-co\x00'), prev_pid=3447985, prev_prio=120, prev_state=, next_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), next_pid=0, next_prio=120
Sample: {addr=0, cpu=7, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744072189289569, period=1, phys_addr=0, pid=3447983, tid=3447985, time=563231759582596, transaction=0, values=[(0, 0)], weight=0}
See =bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00') - should be swapper/7
Note: the byte array has the length of 16 - exactly like the kernel
(TASK_COMM_LEN). I assume this is somehow copied directly into the variables
and not stringified anymore.
Even worse: I discovered bytearrays which are not correctly "memseted":
bytearray(b'chrome\x00sandbox\x00\x00')
chrome should be the comm name, but is 'chromesandbox' somehow. See the null
bytes in between.
Jiri, Arnaldo - I tried to fix this. But the Python binding magic for the
automatically generated events are hard to get comfy.
Hagen
PS: assume this fix is also kernel stable relevant.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf script, libperf: python binding bug (bytearrays vs. strings)
2020-09-27 7:43 perf script, libperf: python binding bug (bytearrays vs. strings) Hagen Paul Pfeifer
@ 2020-09-27 9:20 ` Hagen Paul Pfeifer
2020-09-28 10:08 ` Jiri Olsa
1 sibling, 0 replies; 6+ messages in thread
From: Hagen Paul Pfeifer @ 2020-09-27 9:20 UTC (permalink / raw)
To: linux-kernel; +Cc: Jiri Olsa, Arnaldo Carvalho de Melo
Short addendum: even under python2 not everything is perfect. It seems that
the "comm-name-memory-reuse" problem is a general problem - for python3 and
python2.
In the following example the prev_comm name should be chrome only, but is
"chromesandbox", common_comm is correctly "chrome". I believe that the memory
area for prev_comm & next_comm has not been fully memset'ed before
(memset(mem, 0, 16)).
That the string in Python3 is now a bytearray instead of a string is a
second, independent problem.
Python 2 example:
sched__sched_switch 2 563232.889356634 2924848 chrome prev_comm=chromesandbox, prev_pid=2924848, prev_prio=120, prev_state=S, next_comm=swapper/2, next_pid=0, next_prio=120
Sample: {addr=0, cpu=2, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744072189289569, period=1, phys_addr=0, pid=2924848, tid=2924848, time=563232889356634, transaction=0, values=[(0L, 0L)], weight=0}
Hagen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf script, libperf: python binding bug (bytearrays vs. strings)
2020-09-27 7:43 perf script, libperf: python binding bug (bytearrays vs. strings) Hagen Paul Pfeifer
2020-09-27 9:20 ` Hagen Paul Pfeifer
@ 2020-09-28 10:08 ` Jiri Olsa
2020-09-28 10:43 ` Hagen Paul Pfeifer
1 sibling, 1 reply; 6+ messages in thread
From: Jiri Olsa @ 2020-09-28 10:08 UTC (permalink / raw)
To: Hagen Paul Pfeifer; +Cc: linux-kernel, Jiri Olsa, Arnaldo Carvalho de Melo
On Sun, Sep 27, 2020 at 09:43:12AM +0200, Hagen Paul Pfeifer wrote:
> Hallo Jiri, Arnaldo,
>
> after updating Debian (probably with the advent of Python 3.8.5, guessing)
> I get a wired behavior with python scripting. The error is that the python type
> for prev_comm and next_comm are not strings anymore, rather bytearrays. Which
> are incompatible types and scripts will not work anymore. NOTE: common_comm is
> still fine (see swapper & mutex-thread-co), so they must be treated internal
> differently compared to prev_comm and next_comm and possibly show a way to solve
> this problem!
>
> After bisecting the kernel (perf) even back to v5.6 the problem still exist.
> Compiling perf with PYTHON=python2 do not show any problems - no problems in
> the Python2 world. So I assume with Python 3.8.5 (or other helper library)
> something changed internally. I assume the cause exists in perf forever but
> is now triggered with the new Python3 version.
>
> How to reproduce:
>
> make PYTHON=python3
> ./perf record -e sched:sched_switch -a -- sleep 5
> ./perf script --gen-script py
> ./perf script -s ./perf-script.py
>
> [..]
> sched__sched_switch 7 563231.759525792 0 swapper prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'), next_pid=3447985, next_prio=120
> Sample: {addr=0, cpu=7, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744072189289569, period=1, phys_addr=0, pid=0, tid=0, time=563231759525792, transaction=0, values=[(0, 0)], weight=0}
>
> sched__sched_switch 7 563231.759582596 3447985 mutex-thread-co prev_comm=bytearray(b'mutex-thread-co\x00'), prev_pid=3447985, prev_prio=120, prev_state=, next_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), next_pid=0, next_prio=120
> Sample: {addr=0, cpu=7, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744072189289569, period=1, phys_addr=0, pid=3447983, tid=3447985, time=563231759582596, transaction=0, values=[(0, 0)], weight=0}
>
>
> See =bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00') - should be swapper/7
>
>
> Note: the byte array has the length of 16 - exactly like the kernel
> (TASK_COMM_LEN). I assume this is somehow copied directly into the variables
> and not stringified anymore.
>
>
> Even worse: I discovered bytearrays which are not correctly "memseted":
> bytearray(b'chrome\x00sandbox\x00\x00')
>
> chrome should be the comm name, but is 'chromesandbox' somehow. See the null
> bytes in between.
>
> Jiri, Arnaldo - I tried to fix this. But the Python binding magic for the
> automatically generated events are hard to get comfy.
>
> Hagen
>
> PS: assume this fix is also kernel stable relevant.
>
patch below fixes it for me, but seems strange this was
working till now.. maybe you're the only one using this
with python3 ;-)
jirka
---
diff --git a/tools/perf/util/print_binary.c b/tools/perf/util/print_binary.c
index 599a1543871d..13fdc51c61d9 100644
--- a/tools/perf/util/print_binary.c
+++ b/tools/perf/util/print_binary.c
@@ -50,7 +50,7 @@ int is_printable_array(char *p, unsigned int len)
len--;
- for (i = 0; i < len; i++) {
+ for (i = 0; i < len && p[i]; i++) {
if (!isprint(p[i]) && !isspace(p[i]))
return 0;
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: perf script, libperf: python binding bug (bytearrays vs. strings)
2020-09-28 10:08 ` Jiri Olsa
@ 2020-09-28 10:43 ` Hagen Paul Pfeifer
2020-09-28 13:39 ` Jiri Olsa
0 siblings, 1 reply; 6+ messages in thread
From: Hagen Paul Pfeifer @ 2020-09-28 10:43 UTC (permalink / raw)
To: Jiri Olsa; +Cc: linux-kernel, Jiri Olsa, Arnaldo Carvalho de Melo
* Jiri Olsa | 2020-09-28 12:08:08 [+0200]:
>patch below fixes it for me, but seems strange this was
>working till now.. maybe you're the only one using this
>with python3 ;-)
and I thought python2 is obsolete and not maintained anymore ... ;-)
Anyway, the patch fixed everything: no more garbage for Python2 and Python3
as well as no bytearray type Python3!
Tested-by: Hagen Paul Pfeifer <hagen@jauu.net>
Thank you Jiri!
Probably this patch should be applied on stable too!? Not sure when the
problem was introduced.
Hagen
>jirka
>
>
>---
>diff --git a/tools/perf/util/print_binary.c b/tools/perf/util/print_binary.c
>index 599a1543871d..13fdc51c61d9 100644
>--- a/tools/perf/util/print_binary.c
>+++ b/tools/perf/util/print_binary.c
>@@ -50,7 +50,7 @@ int is_printable_array(char *p, unsigned int len)
>
> len--;
>
>- for (i = 0; i < len; i++) {
>+ for (i = 0; i < len && p[i]; i++) {
> if (!isprint(p[i]) && !isspace(p[i]))
> return 0;
> }
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf script, libperf: python binding bug (bytearrays vs. strings)
2020-09-28 10:43 ` Hagen Paul Pfeifer
@ 2020-09-28 13:39 ` Jiri Olsa
2020-09-28 19:19 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 6+ messages in thread
From: Jiri Olsa @ 2020-09-28 13:39 UTC (permalink / raw)
To: Hagen Paul Pfeifer; +Cc: linux-kernel, Jiri Olsa, Arnaldo Carvalho de Melo
On Mon, Sep 28, 2020 at 12:43:11PM +0200, Hagen Paul Pfeifer wrote:
> * Jiri Olsa | 2020-09-28 12:08:08 [+0200]:
>
> >patch below fixes it for me, but seems strange this was
> >working till now.. maybe you're the only one using this
> >with python3 ;-)
>
> and I thought python2 is obsolete and not maintained anymore ... ;-)
> Anyway, the patch fixed everything: no more garbage for Python2 and Python3
> as well as no bytearray type Python3!
>
> Tested-by: Hagen Paul Pfeifer <hagen@jauu.net>
>
> Thank you Jiri!
>
> Probably this patch should be applied on stable too!? Not sure when the
> problem was introduced.
great, I'll check on that and send full patch later, thanks
jirka
>
> Hagen
>
> >jirka
> >
> >
> >---
> >diff --git a/tools/perf/util/print_binary.c b/tools/perf/util/print_binary.c
> >index 599a1543871d..13fdc51c61d9 100644
> >--- a/tools/perf/util/print_binary.c
> >+++ b/tools/perf/util/print_binary.c
> >@@ -50,7 +50,7 @@ int is_printable_array(char *p, unsigned int len)
> >
> > len--;
> >
> >- for (i = 0; i < len; i++) {
> >+ for (i = 0; i < len && p[i]; i++) {
> > if (!isprint(p[i]) && !isspace(p[i]))
> > return 0;
> > }
> >
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: perf script, libperf: python binding bug (bytearrays vs. strings)
2020-09-28 13:39 ` Jiri Olsa
@ 2020-09-28 19:19 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-09-28 19:19 UTC (permalink / raw)
To: Jiri Olsa
Cc: Hagen Paul Pfeifer, linux-kernel, Jiri Olsa,
Arnaldo Carvalho de Melo
Em Mon, Sep 28, 2020 at 03:39:42PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 28, 2020 at 12:43:11PM +0200, Hagen Paul Pfeifer wrote:
> > * Jiri Olsa | 2020-09-28 12:08:08 [+0200]:
> >
> > >patch below fixes it for me, but seems strange this was
> > >working till now.. maybe you're the only one using this
> > >with python3 ;-)
> >
> > and I thought python2 is obsolete and not maintained anymore ... ;-)
> > Anyway, the patch fixed everything: no more garbage for Python2 and Python3
> > as well as no bytearray type Python3!
> >
> > Tested-by: Hagen Paul Pfeifer <hagen@jauu.net>
> >
> > Thank you Jiri!
> >
> > Probably this patch should be applied on stable too!? Not sure when the
> > problem was introduced.
>
> great, I'll check on that and send full patch later, thanks
Thanks, I'll do one more pull req for v5.9, will have that in.
Hagen, please consider sending a patch making using python3 the default,
with python2 left just for whoever still needs it.
Thanks!
- Arnaldo
> jirka
>
> >
> > Hagen
> >
> > >jirka
> > >
> > >
> > >---
> > >diff --git a/tools/perf/util/print_binary.c b/tools/perf/util/print_binary.c
> > >index 599a1543871d..13fdc51c61d9 100644
> > >--- a/tools/perf/util/print_binary.c
> > >+++ b/tools/perf/util/print_binary.c
> > >@@ -50,7 +50,7 @@ int is_printable_array(char *p, unsigned int len)
> > >
> > > len--;
> > >
> > >- for (i = 0; i < len; i++) {
> > >+ for (i = 0; i < len && p[i]; i++) {
> > > if (!isprint(p[i]) && !isspace(p[i]))
> > > return 0;
> > > }
> > >
> >
>
--
- Arnaldo
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-09-28 19:19 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-09-27 7:43 perf script, libperf: python binding bug (bytearrays vs. strings) Hagen Paul Pfeifer
2020-09-27 9:20 ` Hagen Paul Pfeifer
2020-09-28 10:08 ` Jiri Olsa
2020-09-28 10:43 ` Hagen Paul Pfeifer
2020-09-28 13:39 ` Jiri Olsa
2020-09-28 19:19 ` Arnaldo Carvalho de Melo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.