All of lore.kernel.org
 help / color / mirror / Atom feed
* perf script, libperf: python binding bug (bytearrays vs. strings)
@ 2020-09-27  7:43 Hagen Paul Pfeifer
  2020-09-27  9:20 ` Hagen Paul Pfeifer
  2020-09-28 10:08 ` Jiri Olsa
  0 siblings, 2 replies; 6+ messages in thread
From: Hagen Paul Pfeifer @ 2020-09-27  7:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jiri Olsa, Arnaldo Carvalho de Melo

Hallo Jiri, Arnaldo,

after updating Debian (probably with the advent of Python 3.8.5, guessing)
I get a wired behavior with python scripting. The error is that the python type
for prev_comm and next_comm are not strings anymore, rather bytearrays. Which
are incompatible types and scripts will not work anymore. NOTE: common_comm is
still fine (see swapper & mutex-thread-co), so they must be treated internal
differently compared to prev_comm and next_comm and possibly show a way to solve
this problem!

After bisecting the kernel (perf) even back to v5.6 the problem still exist.
Compiling perf with PYTHON=python2 do not show any problems - no problems in
the Python2 world. So I assume with Python 3.8.5 (or other helper library)
something changed internally. I assume the cause exists in perf forever but
is now triggered with the new Python3 version.

How to reproduce:

make PYTHON=python3
./perf record -e sched:sched_switch -a -- sleep 5
./perf script --gen-script py
./perf script -s ./perf-script.py

[..]
sched__sched_switch      7 563231.759525792        0 swapper              prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'), next_pid=3447985, next_prio=120
Sample: {addr=0, cpu=7, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744072189289569, period=1, phys_addr=0, pid=0, tid=0, time=563231759525792, transaction=0, values=[(0, 0)], weight=0}

sched__sched_switch      7 563231.759582596  3447985 mutex-thread-co      prev_comm=bytearray(b'mutex-thread-co\x00'), prev_pid=3447985, prev_prio=120, prev_state=, next_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), next_pid=0, next_prio=120
Sample: {addr=0, cpu=7, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744072189289569, period=1, phys_addr=0, pid=3447983, tid=3447985, time=563231759582596, transaction=0, values=[(0, 0)], weight=0}


See =bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00') - should be swapper/7


Note: the byte array has the length of 16 - exactly like the kernel
(TASK_COMM_LEN). I assume this is somehow copied directly into the variables
and not stringified anymore.


Even worse: I discovered bytearrays which are not correctly "memseted":
	bytearray(b'chrome\x00sandbox\x00\x00')

chrome should be the comm name, but is 'chromesandbox' somehow. See the null
bytes in between.

Jiri, Arnaldo - I tried to fix this. But the Python binding magic for the
automatically generated events are hard to get comfy.

Hagen

PS: assume this fix is also kernel stable relevant.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-28 19:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-09-27  7:43 perf script, libperf: python binding bug (bytearrays vs. strings) Hagen Paul Pfeifer
2020-09-27  9:20 ` Hagen Paul Pfeifer
2020-09-28 10:08 ` Jiri Olsa
2020-09-28 10:43   ` Hagen Paul Pfeifer
2020-09-28 13:39     ` Jiri Olsa
2020-09-28 19:19       ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.