[PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions

qemu-devel.nongnu.org archive mirror

* [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions
@ 2020-06-24 15:31 Ahmed Karaman
  2020-06-24 15:31 ` [PATCH v3 1/3] scripts/performance: Add topN_perf.py script Ahmed Karaman
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Ahmed Karaman @ 2020-06-24 15:31 UTC (permalink / raw)
  To: qemu-devel, aleksandar.qemu.devel, alex.bennee, eblake, rth,
	ldoktor, ehabkost, crosa
  Cc: Ahmed Karaman

Greetings,

As a part of the TCG Continuous Benchmarking project for GSoC this
year, detailed reports discussing different performance measurement
methodologies and analysis results will be sent here on the mailing
list.

The project's first report is currently being revised and will be
posted on the mailing list in the next few days.*
A section in this report will deal with measuring the top 25 executed
functions when running QEMU. It includes two Python scripts that
automatically perform this task.

This series adds these two scripts to a new performance directory
created under the scripts directory. It also adds a new
"Miscellaneous" section to the end of the MAINTAINERS file with a
"Performance Tools and Tests" subsection.

Previous version of the series:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg06147.html

*UPDATE: Report 1 was published on the mailing list on Monday the 22nd
of June.

Best regards,
Ahmed Karaman

v2->v3:
- Use a clearer "Syntax" and "Example of usage" in the script comment
  and commit message.
- Manually specify the instructions required to run Perf instead of
  relying on the stderr produced by Perf.
- Use more descriptive variable names.

Ahmed Karaman (3):
  scripts/performance: Add topN_perf.py script
  scripts/performance: Add topN_callgrind.py script
  MAINTAINERS: Add 'Performance Tools and Tests' subsection

 MAINTAINERS                           |   7 ++
 scripts/performance/topN_callgrind.py | 139 +++++++++++++++++++++++++
 scripts/performance/topN_perf.py      | 142 ++++++++++++++++++++++++++
 3 files changed, 288 insertions(+)
 create mode 100755 scripts/performance/topN_callgrind.py
 create mode 100755 scripts/performance/topN_perf.py

-- 
2.17.1


* [PATCH v3 1/3] scripts/performance: Add topN_perf.py script
  2020-06-24 15:31 [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Ahmed Karaman
@ 2020-06-24 15:31 ` Ahmed Karaman
  2020-06-25  9:45   ` Aleksandar Markovic
  2020-06-24 15:31 ` [PATCH v3 2/3] scripts/performance: Add topN_callgrind.py script Ahmed Karaman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Ahmed Karaman @ 2020-06-24 15:31 UTC (permalink / raw)
  To: qemu-devel, aleksandar.qemu.devel, alex.bennee, eblake, rth,
	ldoktor, ehabkost, crosa
  Cc: Ahmed Karaman

Syntax:
topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
                 <qemu executable> [<qemu executable options>] \
                 <target executable> [<target executable options>]

[-h] - Print the script arguments help message.
[-n] - Specify the number of top functions to print.
     - If this flag is not specified, the tool defaults to 25.

Example of usage:
topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
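
The way the "--" separator splits the script's own options from the
QEMU command line can be sketched as follows. This is only an
illustration built from the two add_argument() calls in the script;
the sysroot path passed to qemu-arm's -L option is hypothetical.

```python
import argparse

# Same two arguments as in the script: an optional -n and the
# command to profile, collected as a list of positional arguments.
parser = argparse.ArgumentParser()
parser.add_argument('-n', dest='top', type=int, default=25)
parser.add_argument('command', type=str, nargs='+')

# Everything after "--" is treated as positional, so it ends up in
# args.command even if it starts with a dash.
args = parser.parse_args(['-n', '20', '--', 'qemu-arm', 'coulomb_double-arm'])
print(args.top, args.command)

# Without -n the default of 25 applies. The -L path is hypothetical.
default_args = parser.parse_args(
    ['--', 'qemu-arm', '-L', '/hypothetical/sysroot', 'coulomb_double-arm'])
print(default_args.top, default_args.command)
```

Because of the separator, QEMU options such as -L are not mistaken
for options of the script itself.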

Example Output:
 No.  Percentage  Name                       Caller
----  ----------  -------------------------  -------------------------
   1      16.25%  float64_mul                qemu-x86_64
   2      12.01%  float64_sub                qemu-x86_64
   3      11.99%  float64_add                qemu-x86_64
   4       5.69%  helper_mulsd               qemu-x86_64
   5       4.68%  helper_addsd               qemu-x86_64
   6       4.43%  helper_lookup_tb_ptr       qemu-x86_64
   7       4.28%  helper_subsd               qemu-x86_64
   8       2.71%  f64_compare                qemu-x86_64
   9       2.71%  helper_ucomisd             qemu-x86_64
  10       1.04%  helper_pand_xmm            qemu-x86_64
  11       0.71%  float64_div                qemu-x86_64
  12       0.63%  helper_pxor_xmm            qemu-x86_64
  13       0.50%  0x00007f7b7004ef95         [JIT] tid 491
  14       0.50%  0x00007f7b70044e83         [JIT] tid 491
  15       0.36%  helper_por_xmm             qemu-x86_64
  16       0.32%  helper_cc_compute_all      qemu-x86_64
  17       0.30%  0x00007f7b700433f0         [JIT] tid 491
  18       0.30%  float64_compare_quiet      qemu-x86_64
  19       0.27%  soft_f64_addsub            qemu-x86_64
  20       0.26%  round_to_int               qemu-x86_64
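
How one line of the "perf report --stdio" output maps to the columns
above can be sketched as follows. The two sample lines are assumptions
reconstructed from the example output (overhead, command, shared
object, "[.]" marker, symbol); the real column layout may differ
between perf versions. The helper name is hypothetical.

```python
def parse_perf_report_line(line):
    """Split one non-comment line of perf report output into columns."""
    fields = line.split()
    percentage = fields[0]
    name = fields[-1]
    # Skip the command column and the "[.]" marker; what remains in
    # between is the shared object, which may contain spaces.
    shared_object = ' '.join(fields[2:-2])
    return percentage, name, shared_object


# Assumed sample lines, not captured from a real run.
sample_lines = [
    "    16.25%  qemu-x86_64  qemu-x86_64    [.] float64_mul",
    "     0.50%  qemu-x86_64  [JIT] tid 491  [.] 0x00007f7b7004ef95",
]

for index, line in enumerate(sample_lines, start=1):
    percentage, name, shared_object = parse_perf_report_line(line)
    print('{:>4}  {:>10}  {:<25}  {}'.format(
        index, percentage, name, shared_object))
```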

Signed-off-by: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
---
 scripts/performance/topN_perf.py | 142 +++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100755 scripts/performance/topN_perf.py

diff --git a/scripts/performance/topN_perf.py b/scripts/performance/topN_perf.py
new file mode 100755
index 0000000000..d2b939c375
--- /dev/null
+++ b/scripts/performance/topN_perf.py
@@ -0,0 +1,142 @@
+#!/usr/bin/env python3
+
+#  Print the top N most executed functions in QEMU using perf.
+#  Syntax:
+#  topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
+#           <qemu executable> [<qemu executable options>] \
+#           <target executable> [<target executable options>]
+#
+#  [-h] - Print the script arguments help message.
+#  [-n] - Specify the number of top functions to print.
+#       - If this flag is not specified, the tool defaults to 25.
+#
+#  Example of usage:
+#  topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
+#  Copyright (C) 2020  Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+
+
+# Parse the command line arguments
+parser = argparse.ArgumentParser(
+    usage='topN_perf.py [-h] [-n] <number of displayed top functions>  -- '
+          '<qemu executable> [<qemu executable options>] '
+          '<target executable> [<target executable options>]')
+
+parser.add_argument('-n', dest='top', type=int, default=25,
+                    help='Specify the number of top functions to print.')
+
+parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+
+args = parser.parse_args()
+
+# Extract the needed variables from the args
+command = args.command
+top = args.top
+
+# Ensure that perf is installed
+check_perf = subprocess.run(["which", "perf"], stdout=subprocess.DEVNULL)
+if check_perf.returncode:
+    sys.exit("Please install perf before running the script!")
+
+# Ensure the user has the privilege to run perf
+check_perf_executability = subprocess.run(["perf", "stat", "ls", "/"],
+                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+if check_perf_executability.returncode:
+    sys.exit(
+"""
+Error:
+You may not have permission to collect stats.
+
+Consider tweaking /proc/sys/kernel/perf_event_paranoid,
+which controls use of the performance events system by
+unprivileged users (without CAP_SYS_ADMIN).
+
+  -1: Allow use of (almost) all events by all users
+      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
+   0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
+      Disallow raw tracepoint access by users without CAP_SYS_ADMIN
+   1: Disallow CPU event access by users without CAP_SYS_ADMIN
+   2: Disallow kernel profiling by users without CAP_SYS_ADMIN
+
+To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
+   kernel.perf_event_paranoid = -1
+"""
+)
+
+# Run perf record
+perf_record = subprocess.run((["perf", "record"] + command),
+                             stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
+if perf_record.returncode:
+    os.unlink('perf.data')
+    sys.exit(perf_record.stderr.decode("utf-8"))
+
+# Save perf report output to perf_report.out
+with open("perf_report.out", "w") as output:
+    perf_report = subprocess.run(
+        ["perf", "report", "--stdio"], stdout=output, stderr=subprocess.PIPE)
+    if perf_report.returncode:
+        os.unlink('perf.data')
+        output.close()
+        os.unlink('perf_report.out')
+        sys.exit(perf_report.stderr.decode("utf-8"))
+
+# Read the reported data to functions[]
+functions = []
+with open("perf_report.out", "r") as data:
+    # Only read lines that are not comments (comments start with #)
+    # Only read lines that are not empty
+    functions = [line for line in data.readlines() if line and line[0]
+                 != '#' and line[0] != "\n"]
+
+# Limit the number of top functions to "top"
+number_of_top_functions = top if len(functions) > top else len(functions)
+
+# Store the data of the top functions in top_functions[]
+top_functions = functions[:number_of_top_functions]
+
+# Print table header
+print('{:>4}  {:>10}  {:<30}  {}\n{}  {}  {}  {}'.format('No.',
+                                                         'Percentage',
+                                                         'Name',
+                                                         'Caller',
+                                                         '-' * 4,
+                                                         '-' * 10,
+                                                         '-' * 30,
+                                                         '-' * 25))
+
+
+# Print top N functions
+for (index, function) in enumerate(top_functions, start=1):
+    function_data = function.split()
+    function_percentage = function_data[0]
+    function_name = function_data[-1]
+    function_caller = ' '.join(function_data[2:-2])
+    print('{:>4}  {:>10}  {:<30}  {}'.format(index,
+                                             function_percentage,
+                                             function_name,
+                                             function_caller))
+
+# Remove intermediate files
+os.unlink('perf.data')
+os.unlink('perf_report.out')
-- 
2.17.1




* [PATCH v3 2/3] scripts/performance: Add topN_callgrind.py script
  2020-06-24 15:31 [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Ahmed Karaman
  2020-06-24 15:31 ` [PATCH v3 1/3] scripts/performance: Add topN_perf.py script Ahmed Karaman
@ 2020-06-24 15:31 ` Ahmed Karaman
  2020-06-25  9:54   ` Aleksandar Markovic
  2020-06-24 15:31 ` [PATCH v3 3/3] MAINTAINERS: Add 'Performance Tools and Tests' subsection Ahmed Karaman
  2020-06-25  9:19 ` [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Aleksandar Markovic
  3 siblings, 1 reply; 9+ messages in thread
From: Ahmed Karaman @ 2020-06-24 15:31 UTC (permalink / raw)
  To: qemu-devel, aleksandar.qemu.devel, alex.bennee, eblake, rth,
	ldoktor, ehabkost, crosa
  Cc: Ahmed Karaman

Python script that prints the top N most executed functions in QEMU
using callgrind.

Syntax:
topN_callgrind.py [-h] [-n] <number of displayed top functions>  -- \
                      <qemu executable> [<qemu executable options>] \
                      <target executable> [<target executable options>]

[-h] - Print the script arguments help message.
[-n] - Specify the number of top functions to print.
     - If this flag is not specified, the tool defaults to 25.

Example of usage:
topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm

Example Output:
No.  Percentage Name                  Source File
----  --------- ------------------    ------------------------------
   1    24.577% 0x00000000082db000    ???
   2    20.467% float64_mul           <qemu>/fpu/softfloat.c
   3    14.720% float64_sub           <qemu>/fpu/softfloat.c
   4    13.864% float64_add           <qemu>/fpu/softfloat.c
   5     4.876% helper_mulsd          <qemu>/target/i386/ops_sse.h
   6     3.767% helper_subsd          <qemu>/target/i386/ops_sse.h
   7     3.549% helper_addsd          <qemu>/target/i386/ops_sse.h
   8     2.185% helper_ucomisd        <qemu>/target/i386/ops_sse.h
   9     1.667% helper_lookup_tb_ptr  <qemu>/include/exec/tb-lookup.h
  10     1.662% f64_compare           <qemu>/fpu/softfloat.c
  11     1.509% helper_lookup_tb_ptr  <qemu>/accel/tcg/tcg-runtime.c
  12     0.635% helper_lookup_tb_ptr  <qemu>/include/exec/exec-all.h
  13     0.616% float64_div           <qemu>/fpu/softfloat.c
  14     0.502% helper_pand_xmm       <qemu>/target/i386/ops_sse.h
  15     0.502% float64_mul           <qemu>/include/fpu/softfloat.h
  16     0.476% helper_lookup_tb_ptr  <qemu>/target/i386/cpu.h
  17     0.437% float64_compare_quiet <qemu>/fpu/softfloat.c
  18     0.414% helper_pxor_xmm       <qemu>/target/i386/ops_sse.h
  19     0.353% round_to_int          <qemu>/fpu/softfloat.c
  20     0.347% helper_cc_compute_all <qemu>/target/i386/cc_helper.c

Signed-off-by: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
---
 scripts/performance/topN_callgrind.py | 139 ++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100755 scripts/performance/topN_callgrind.py

diff --git a/scripts/performance/topN_callgrind.py b/scripts/performance/topN_callgrind.py
new file mode 100755
index 0000000000..6136f72a74
--- /dev/null
+++ b/scripts/performance/topN_callgrind.py
@@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+
+#  Print the top N most executed functions in QEMU using callgrind.
+#  Syntax:
+#  topN_callgrind.py [-h] [-n] <number of displayed top functions>  -- \
+#           <qemu executable> [<qemu executable options>] \
+#           <target executable> [<target executable options>]
+#
+#  [-h] - Print the script arguments help message.
+#  [-n] - Specify the number of top functions to print.
+#       - If this flag is not specified, the tool defaults to 25.
+#
+#  Example of usage:
+#  topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm
+#
+#  This file is a part of the project "TCG Continuous Benchmarking".
+#
+#  Copyright (C) 2020  Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
+#  Copyright (C) 2020  Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
+#
+#  This program is free software: you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation, either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+import argparse
+import os
+import subprocess
+import sys
+
+
+# Parse the command line arguments
+parser = argparse.ArgumentParser(
+    usage='topN_callgrind.py [-h] [-n] <number of displayed top functions>  -- '
+          '<qemu executable> [<qemu executable options>] '
+          '<target executable> [<target executable options>]')
+
+parser.add_argument('-n', dest='top', type=int, default=25,
+                    help='Specify the number of top functions to print.')
+
+parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
+
+args = parser.parse_args()
+
+# Extract the needed variables from the args
+command = args.command
+top = args.top
+
+# Ensure that valgrind is installed
+check_valgrind = subprocess.run(
+    ["which", "valgrind"], stdout=subprocess.DEVNULL)
+if check_valgrind.returncode:
+    sys.exit("Please install valgrind before running the script!")
+
+# Run callgrind
+callgrind = subprocess.run((["valgrind", "--tool=callgrind",
+                             "--callgrind-out-file=callgrind.data"] + command),
+                           stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
+if callgrind.returncode:
+    sys.exit(callgrind.stderr.decode("utf-8"))
+
+# Save callgrind_annotate output to callgrind_annotate.out
+with open("callgrind_annotate.out", "w") as output:
+    callgrind_annotate = subprocess.run(
+        ["callgrind_annotate", "callgrind.data"],
+        stdout=output,
+        stderr=subprocess.PIPE)
+    if callgrind_annotate.returncode:
+        os.unlink('callgrind.data')
+        output.close()
+        os.unlink('callgrind_annotate.out')
+        sys.exit(callgrind_annotate.stderr.decode("utf-8"))
+
+
+# Read the callgrind_annotate output to callgrind_data[]
+callgrind_data = []
+with open('callgrind_annotate.out', 'r') as data:
+    callgrind_data = data.readlines()
+
+# Line number with the total number of instructions
+total_instructions_line_number = 20
+
+# Get the total number of instructions
+total_instructions_line_data = callgrind_data[total_instructions_line_number]
+total_number_of_instructions = total_instructions_line_data.split(' ')[0]
+total_number_of_instructions = int(
+    total_number_of_instructions.replace(',', ''))
+
+# Line number with the top function
+first_func_line = 25
+
+# Number of functions recorded by callgrind, last two lines are always empty
+number_of_functions = len(callgrind_data) - first_func_line - 2
+
+# Limit the number of top functions to "top"
+number_of_top_functions = (top if number_of_functions >
+                           top else number_of_functions)
+
+# Store the data of the top functions in top_functions[]
+top_functions = callgrind_data[first_func_line:
+                               first_func_line + number_of_top_functions]
+
+# Print table header
+print('{:>4}  {:>10}  {:<30}  {}\n{}  {}  {}  {}'.format('No.',
+                                                         'Percentage',
+                                                         'Name',
+                                                         'Source File',
+                                                         '-' * 4,
+                                                         '-' * 10,
+                                                         '-' * 30,
+                                                         '-' * 30,
+                                                         ))
+
+# Print top N functions
+for (index, function) in enumerate(top_functions, start=1):
+    function_data = function.split()
+    # Calculate function percentage
+    function_instructions = float(function_data[0].replace(',', ''))
+    function_percentage = (function_instructions /
+                           total_number_of_instructions)*100
+    # Get function name and source files path
+    function_source_path, function_name = function_data[1].split(':')
+    # Print extracted data
+    print('{:>4}  {:>9.3f}%  {:<30}  {}'.format(index,
+                                                round(function_percentage, 3),
+                                                function_name,
+                                                function_source_path))
+
+# Remove intermediate files
+os.unlink('callgrind.data')
+os.unlink('callgrind_annotate.out')
-- 
2.17.1




* [PATCH v3 3/3] MAINTAINERS: Add 'Performance Tools and Tests' subsection
  2020-06-24 15:31 [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Ahmed Karaman
  2020-06-24 15:31 ` [PATCH v3 1/3] scripts/performance: Add topN_perf.py script Ahmed Karaman
  2020-06-24 15:31 ` [PATCH v3 2/3] scripts/performance: Add topN_callgrind.py script Ahmed Karaman
@ 2020-06-24 15:31 ` Ahmed Karaman
  2020-06-25 10:00   ` Aleksandar Markovic
  2020-06-25  9:19 ` [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Aleksandar Markovic
  3 siblings, 1 reply; 9+ messages in thread
From: Ahmed Karaman @ 2020-06-24 15:31 UTC (permalink / raw)
  To: qemu-devel, aleksandar.qemu.devel, alex.bennee, eblake, rth,
	ldoktor, ehabkost, crosa
  Cc: Ahmed Karaman

This commit creates a new 'Miscellaneous' section which hosts a new
'Performance Tools and Tests' subsection.
The subsection will contain the performance scripts and benchmarks
written as a part of the 'TCG Continuous Benchmarking' project.

Signed-off-by: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 955cc8dd5c..ee4bfc5fb1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2974,3 +2974,10 @@ M: Peter Maydell <peter.maydell@linaro.org>
 S: Maintained
 F: docs/conf.py
 F: docs/*/conf.py
+
+Miscellaneous
+-------------
+Performance Tools and Tests
+M: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
+S: Maintained
+F: scripts/performance/
-- 
2.17.1




* Re: [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions
  2020-06-24 15:31 [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Ahmed Karaman
                   ` (2 preceding siblings ...)
  2020-06-24 15:31 ` [PATCH v3 3/3] MAINTAINERS: Add 'Performance Tools and Tests' subsection Ahmed Karaman
@ 2020-06-25  9:19 ` Aleksandar Markovic
  3 siblings, 0 replies; 9+ messages in thread
From: Aleksandar Markovic @ 2020-06-25  9:19 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Eduardo Habkost, Alex Bennée,
	QEMU Developers, Cleber Rosa, Richard Henderson

On Wed, Jun 24, 2020 at 17:31, Ahmed Karaman
<ahmedkhaledkaraman@gmail.com> wrote:
>
> Greetings,
>
> As a part of the TCG Continuous Benchmarking project for GSoC this
> year, detailed reports discussing different performance measurement
> methodologies and analysis results will be sent here on the mailing
> list.
>
> The project's first report is currently being revised and will be
> posted on the mailing list in the next few days.*

Yes, I said that every version of a series (v2, v3, v4, ...) must carry
the same cover letter. But I didn't mean literally the same; I didn't
mean "identical".

The cover letter should always reflect the content, and should always
be a stand-alone letter, independent of previous versions, but it can
change in some details as the series or the circumstances change.

So, here, you should replace:

> The project's first report is currently being revised and will be
> posted on the mailing list in the next few days.*

With:

"Report 1 was published on the mailing list on the 22nd of June:

<insert here the link to the corresponding mailing list item>
"

> A section in this report will deal with measuring the top 25 executed
> functions when running QEMU. It includes two Python scripts that
> automatically perform this task.
>
> This series adds these two scripts to a new performance directory
> created under the scripts directory. It also adds a new
> "Miscellaneous" section to the end of the MAINTAINERS file with a
> "Performance Tools and Tests" subsection.
>
> Previous version of the series:
> https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg06147.html
>
> *UPDATE: Report 1 was published on the mailing list on Monday the 22nd
> of June.
>
> Best regards,
> Ahmed Karaman
>
> v2->v3:
> - Use a clearer "Syntax" and "Example of usage" in the script comment
>   and commit message.
> - Manually specify the instructions required to run Perf instead of
>   relying on the stderr produced by Perf.
> - Use more descriptive variable names.
>

The history must be complete, not only a "diff" against the previous
version. So, for v4, you should have something like this:

v3->v4:
   <you describe here difference between v3 and v4>

v2->v3:
   <you describe here difference between v2 and v3>

v1->v2:
   <you describe here difference between v1 and v2>

Thanks,
Aleksandar

> Ahmed Karaman (3):
>   scripts/performance: Add topN_perf.py script
>   scripts/performance: Add topN_callgrind.py script
>   MAINTAINERS: Add 'Performance Tools and Tests' subsection
>
>  MAINTAINERS                           |   7 ++
>  scripts/performance/topN_callgrind.py | 139 +++++++++++++++++++++++++
>  scripts/performance/topN_perf.py      | 142 ++++++++++++++++++++++++++
>  3 files changed, 288 insertions(+)
>  create mode 100755 scripts/performance/topN_callgrind.py
>  create mode 100755 scripts/performance/topN_perf.py
>
> --
> 2.17.1
>



* Re: [PATCH v3 1/3] scripts/performance: Add topN_perf.py script
  2020-06-24 15:31 ` [PATCH v3 1/3] scripts/performance: Add topN_perf.py script Ahmed Karaman
@ 2020-06-25  9:45   ` Aleksandar Markovic
  2020-06-26 14:18     ` Ahmed Karaman
  0 siblings, 1 reply; 9+ messages in thread
From: Aleksandar Markovic @ 2020-06-25  9:45 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Eduardo Habkost, Alex Bennée,
	QEMU Developers, Cleber Rosa, Richard Henderson

On Wed, Jun 24, 2020 at 17:32, Ahmed Karaman
<ahmedkhaledkaraman@gmail.com> wrote:
>
> Syntax:
> topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
>                  <qemu executable> [<qemu executable options>] \
>                  <target executable> [<target executable options>]
>
> [-h] - Print the script arguments help message.
> [-n] - Specify the number of top functions to print.
>      - If this flag is not specified, the tool defaults to 25.
>
> Example of usage:
> topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
>
> Example Output:
>  No.  Percentage  Name                       Caller
> ----  ----------  -------------------------  -------------------------
>    1      16.25%  float64_mul                qemu-x86_64
>    2      12.01%  float64_sub                qemu-x86_64
>    3      11.99%  float64_add                qemu-x86_64
>    4       5.69%  helper_mulsd               qemu-x86_64
>    5       4.68%  helper_addsd               qemu-x86_64
>    6       4.43%  helper_lookup_tb_ptr       qemu-x86_64
>    7       4.28%  helper_subsd               qemu-x86_64
>    8       2.71%  f64_compare                qemu-x86_64
>    9       2.71%  helper_ucomisd             qemu-x86_64
>   10       1.04%  helper_pand_xmm            qemu-x86_64
>   11       0.71%  float64_div                qemu-x86_64
>   12       0.63%  helper_pxor_xmm            qemu-x86_64
>   13       0.50%  0x00007f7b7004ef95         [JIT] tid 491
>   14       0.50%  0x00007f7b70044e83         [JIT] tid 491
>   15       0.36%  helper_por_xmm             qemu-x86_64
>   16       0.32%  helper_cc_compute_all      qemu-x86_64
>   17       0.30%  0x00007f7b700433f0         [JIT] tid 491
>   18       0.30%  float64_compare_quiet      qemu-x86_64
>   19       0.27%  soft_f64_addsub            qemu-x86_64
>   20       0.26%  round_to_int               qemu-x86_64
>
> Signed-off-by: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> ---
>  scripts/performance/topN_perf.py | 142 +++++++++++++++++++++++++++++++
>  1 file changed, 142 insertions(+)
>  create mode 100755 scripts/performance/topN_perf.py
>
> diff --git a/scripts/performance/topN_perf.py b/scripts/performance/topN_perf.py
> new file mode 100755
> index 0000000000..d2b939c375
> --- /dev/null
> +++ b/scripts/performance/topN_perf.py
> @@ -0,0 +1,142 @@
> +#!/usr/bin/env python3
> +
> +#  Print the top N most executed functions in QEMU using perf.
> +#  Syntax:
> +#  topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
> +#           <qemu executable> [<qemu executable options>] \
> +#           <target executable> [<target executable options>]
> +#
> +#  [-h] - Print the script arguments help message.
> +#  [-n] - Specify the number of top functions to print.
> +#       - If this flag is not specified, the tool defaults to 25.
> +#
> +#  Example of usage:
> +#  topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
> +#
> +#  This file is a part of the project "TCG Continuous Benchmarking".
> +#
> +#  Copyright (C) 2020  Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> +#  Copyright (C) 2020  Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
> +#
> +#  This program is free software: you can redistribute it and/or modify
> +#  it under the terms of the GNU General Public License as published by
> +#  the Free Software Foundation, either version 2 of the License, or
> +#  (at your option) any later version.
> +#
> +#  This program is distributed in the hope that it will be useful,
> +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
> +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +#  GNU General Public License for more details.
> +#
> +#  You should have received a copy of the GNU General Public License
> +#  along with this program. If not, see <https://www.gnu.org/licenses/>.
> +
> +import argparse
> +import os
> +import subprocess
> +import sys
> +
> +
> +# Parse the command line arguments
> +parser = argparse.ArgumentParser(
> +    usage='topN_perf.py [-h] [-n] <number of displayed top functions>  -- '
> +          '<qemu executable> [<qemu executable options>] '
> +          '<target executable> [<target executable options>]')
> +
> +parser.add_argument('-n', dest='top', type=int, default=25,
> +                    help='Specify the number of top functions to print.')
> +
> +parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
> +
> +args = parser.parse_args()
> +
> +# Extract the needed variables from the args
> +command = args.command
> +top = args.top
> +
> +# Ensure that perf is installed
> +check_perf = subprocess.run(["which", "perf"], stdout=subprocess.DEVNULL)
> +if check_perf.returncode:
> +    sys.exit("Please install perf before running the script!")

I would rename "check_perf" to "check_perf_presence". It is more
specific and clearer.

> +
> +# Ensure the user has the privilege to run perf
> +check_perf_executability = subprocess.run(["perf", "stat", "ls", "/"],
> +                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
> +if check_perf_executability.returncode:
> +    sys.exit(
> +"""
> +Error:
> +You may not have permission to collect stats.
> +
> +Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> +which controls use of the performance events system by
> +unprivileged users (without CAP_SYS_ADMIN).
> +
> +  -1: Allow use of (almost) all events by all users
> +      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> +   0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
> +      Disallow raw tracepoint access by users without CAP_SYS_ADMIN
> +   1: Disallow CPU event access by users without CAP_SYS_ADMIN
> +   2: Disallow kernel profiling by users without CAP_SYS_ADMIN
> +
> +To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
> +   kernel.perf_event_paranoid = -1
> +"""
> +)

Very good.

> +
> +# Run perf record
> +perf_record = subprocess.run((["perf", "record"] + command),
> +                             stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
> +if perf_record.returncode:
> +    os.unlink('perf.data')
> +    sys.exit(perf_record.stderr.decode("utf-8"))

Here, the file "perf.data" will be created in the current working
directory. If one existed prior to script execution, it will be
overwritten.

I think such "corruption" of the current working directory is not optimal.
It would be better if the script didn't touch the current working
directory at all (perhaps the user wants to keep a perf.data obtained
from some earlier experiment).

Therefore, I think it would be better if you set the output of "perf
record" to "/tmp/perf.data" rather than the default "perf.data".
"perf record" has an option to specify the output file:

       -o, --output=
           Output file name.
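
A minimal sketch of how the "-o" option could be wired in. The helper
name build_perf_record_command() and the example QEMU command are
hypothetical; the sketch only builds the argument list and does not run
perf.

```python
import os
import tempfile


def build_perf_record_command(command, data_path):
    """Return the argv for 'perf record' writing its data to data_path."""
    return ["perf", "record", "-o", data_path] + list(command)


# Hypothetical example values; nothing is executed here.
data_path = os.path.join(tempfile.gettempdir(), "perf.data")
argv = build_perf_record_command(["qemu-arm", "coulomb_double-arm"], data_path)
print(argv)
```

The resulting list can be passed to subprocess.run() in place of
(["perf", "record"] + command).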

> +
> +# Save perf report output to perf_report.out
> +with open("perf_report.out", "w") as output:
> +    perf_report = subprocess.run(
> +        ["perf", "report", "--stdio"], stdout=output, stderr=subprocess.PIPE)
> +    if perf_report.returncode:
> +        os.unlink('perf.data')
> +        output.close()
> +        os.unlink('perf_report.out')
> +        sys.exit(perf_report.stderr.decode("utf-8"))

For reasons similar to those described above, the input file should be
"/tmp/perf.data". The "perf report" option for the input file:

       -i, --input=
           Input file name.

Output file should be "/tmp/perf_report.out", not "perf_report.out".
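
Something along these lines could work for the report step. This is
only a sketch: build_perf_report_command() and save_perf_report() are
hypothetical helper names, and the paths are passed in by the caller.

```python
import subprocess


def build_perf_report_command(data_path):
    """Return the argv for 'perf report' reading its input from data_path."""
    return ["perf", "report", "-i", data_path, "--stdio"]


def save_perf_report(data_path, report_path):
    """Run 'perf report' and save its stdout to report_path."""
    with open(report_path, "w") as output:
        return subprocess.run(build_perf_report_command(data_path),
                              stdout=output, stderr=subprocess.PIPE)
```

For example, save_perf_report("/tmp/perf.data", "/tmp/perf_report.out")
would leave the current working directory untouched.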

> +
> +# Read the reported data to functions[]
> +functions = []
> +with open("perf_report.out", "r") as data:

"/tmp/perf_report.out"

> +    # Only read lines that are not comments (comments start with #)
> +    # Only read lines that are not empty
> +    functions = [line for line in data.readlines() if line and line[0]
> +                 != '#' and line[0] != "\n"]
> +
> +# Limit the number of top functions to "top"
> +number_of_top_functions = top if len(functions) > top else len(functions)
> +
> +# Store the data of the top functions in top_functions[]
> +top_functions = functions[:number_of_top_functions]
> +
> +# Print table header
> +print('{:>4}  {:>10}  {:<30}  {}\n{}  {}  {}  {}'.format('No.',
> +                                                         'Percentage',
> +                                                         'Name',

'Function Name' would be more ergonomic here.

> +                                                         'Caller',

Please replace 'Caller' with 'Invoked by'. 'Caller' implies a function
that directly calls the function in question. 'Invoked by' avoids such
confusion, and it just feels more appropriate here.

> +                                                         '-' * 4,
> +                                                         '-' * 10,
> +                                                         '-' * 30,
> +                                                         '-' * 25))
> +
> +
> +# Print top N functions
> +for (index, function) in enumerate(top_functions, start=1):
> +    function_data = function.split()
> +    function_percentage = function_data[0]
> +    function_name = function_data[-1]
> +    function_caller = ' '.join(function_data[2:-2])

function_invoker

> +    print('{:>4}  {:>10}  {:<30}  {}'.format(index,
> +                                             function_percentage,
> +                                             function_name,
> +                                             function_caller))

function_invoker
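
To show the renamed variable in context, here is a small sketch of the
parsing step as a function. The function name parse_perf_report_line()
is hypothetical, and the two sample rows are assumed to follow the
default "perf report --stdio" column layout (overhead, command, shared
object, symbol); they are not real measurements.

```python
def parse_perf_report_line(line):
    """Split one 'perf report --stdio' row into (percentage, invoker, name)."""
    function_data = line.split()
    function_percentage = function_data[0]
    function_name = function_data[-1]
    # Everything between the command column and the "[.]" marker.
    function_invoker = ' '.join(function_data[2:-2])
    return function_percentage, function_invoker, function_name


# Assumed sample rows (overhead, command, shared object, symbol).
sample_rows = [
    "    16.25%  qemu-x86_64  qemu-x86_64    [.] float64_mul",
    "     0.50%  qemu-x86_64  [JIT] tid 491  [.] 0x00007f7b7004ef95",
]
for row in sample_rows:
    print(parse_perf_report_line(row))
```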

> +
> +# Remove intermediate files
> +os.unlink('perf.data')
> +os.unlink('perf_report.out')

os.unlink('/tmp/perf.data')
os.unlink('/tmp/perf_report.out')
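
A possible variation on the same idea, offered only as a sketch: keep
both intermediate files in a private temporary directory so that they
cannot clash with existing files in /tmp and are removed automatically.
The variable names are hypothetical, and placeholder files stand in for
the real perf output.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as work_dir:
    perf_data_path = os.path.join(work_dir, "perf.data")
    perf_report_path = os.path.join(work_dir, "perf_report.out")
    # Stand-in for the perf steps: create the files so cleanup is visible.
    for path in (perf_data_path, perf_report_path):
        with open(path, "w") as stand_in:
            stand_in.write("placeholder\n")
    files_during = sorted(os.listdir(work_dir))

# The directory and both files are gone once the block exits.
directory_removed = not os.path.exists(work_dir)
print(files_during, directory_removed)
```

With this approach the explicit os.unlink() calls are not needed, even
when the script exits early on an error inside the "with" block.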


> --
> 2.17.1
>



* Re: [PATCH v3 2/3] scripts/performance: Add topN_callgrind.py script
  2020-06-24 15:31 ` [PATCH v3 2/3] scripts/performance: Add topN_callgrind.py script Ahmed Karaman
@ 2020-06-25  9:54   ` Aleksandar Markovic
  0 siblings, 0 replies; 9+ messages in thread
From: Aleksandar Markovic @ 2020-06-25  9:54 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Eduardo Habkost, Alex Bennée,
	QEMU Developers, Cleber Rosa, Richard Henderson

On Wed, Jun 24, 2020 at 17:32 Ahmed Karaman
<ahmedkhaledkaraman@gmail.com> wrote:
>
> Python script that prints the top N most executed functions in QEMU
> using callgrind.
>
> Syntax:
> topN_callgrind.py [-h] [-n] <number of displayed top functions>  -- \
>                       <qemu executable> [<qemu executable options>] \
>                       <target executable> [<target executable options>]
>
> [-h] - Print the script arguments help message.
> [-n] - Specify the number of top functions to print.
>      - If this flag is not specified, the tool defaults to 25.
>
> Example of usage:
> topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm
>
> Example Output:
> No.  Percentage Name                  Source File
> ----  --------- ------------------    ------------------------------
>    1    24.577% 0x00000000082db000    ???
>    2    20.467% float64_mul           <qemu>/fpu/softfloat.c
>    3    14.720% float64_sub           <qemu>/fpu/softfloat.c
>    4    13.864% float64_add           <qemu>/fpu/softfloat.c
>    5     4.876% helper_mulsd          <qemu>/target/i386/ops_sse.h
>    6     3.767% helper_subsd          <qemu>/target/i386/ops_sse.h
>    7     3.549% helper_addsd          <qemu>/target/i386/ops_sse.h
>    8     2.185% helper_ucomisd        <qemu>/target/i386/ops_sse.h
>    9     1.667% helper_lookup_tb_ptr  <qemu>/include/exec/tb-lookup.h
>   10     1.662% f64_compare           <qemu>/fpu/softfloat.c
>   11     1.509% helper_lookup_tb_ptr  <qemu>/accel/tcg/tcg-runtime.c
>   12     0.635% helper_lookup_tb_ptr  <qemu>/include/exec/exec-all.h
>   13     0.616% float64_div           <qemu>/fpu/softfloat.c
>   14     0.502% helper_pand_xmm       <qemu>/target/i386/ops_sse.h
>   15     0.502% float64_mul           <qemu>/include/fpu/softfloat.h
>   16     0.476% helper_lookup_tb_ptr  <qemu>/target/i386/cpu.h
>   17     0.437% float64_compare_quiet <qemu>/fpu/softfloat.c
>   18     0.414% helper_pxor_xmm       <qemu>/target/i386/ops_sse.h
>   19     0.353% round_to_int          <qemu>/fpu/softfloat.c
>   20     0.347% helper_cc_compute_all <qemu>/target/i386/cc_helper.c
>
> Signed-off-by: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> ---
>  scripts/performance/topN_callgrind.py | 139 ++++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
>  create mode 100755 scripts/performance/topN_callgrind.py
>
> diff --git a/scripts/performance/topN_callgrind.py b/scripts/performance/topN_callgrind.py
> new file mode 100755
> index 0000000000..6136f72a74
> --- /dev/null
> +++ b/scripts/performance/topN_callgrind.py
> @@ -0,0 +1,139 @@
> +#!/usr/bin/env python3
> +
> +#  Print the top N most executed functions in QEMU using callgrind.
> +#  Syntax:
> +#  topN_callgrind.py [-h] [-n] <number of displayed top functions>  -- \
> +#           <qemu executable> [<qemu executable options>] \
> +#           <target executable> [<target executable options>]
> +#
> +#  [-h] - Print the script arguments help message.
> +#  [-n] - Specify the number of top functions to print.
> +#       - If this flag is not specified, the tool defaults to 25.
> +#
> +#  Example of usage:
> +#  topN_callgrind.py -n 20 -- qemu-arm coulomb_double-arm
> +#
> +#  This file is a part of the project "TCG Continuous Benchmarking".
> +#
> +#  Copyright (C) 2020  Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> +#  Copyright (C) 2020  Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
> +#
> +#  This program is free software: you can redistribute it and/or modify
> +#  it under the terms of the GNU General Public License as published by
> +#  the Free Software Foundation, either version 2 of the License, or
> +#  (at your option) any later version.
> +#
> +#  This program is distributed in the hope that it will be useful,
> +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
> +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +#  GNU General Public License for more details.
> +#
> +#  You should have received a copy of the GNU General Public License
> +#  along with this program. If not, see <https://www.gnu.org/licenses/>.
> +
> +import argparse
> +import os
> +import subprocess
> +import sys
> +
> +
> +# Parse the command line arguments
> +parser = argparse.ArgumentParser(
> +    usage='topN_callgrind.py [-h] [-n] <number of displayed top functions>  -- '
> +          '<qemu executable> [<qemu executable options>] '
> +          '<target executable> [<target executable options>]')
> +
> +parser.add_argument('-n', dest='top', type=int, default=25,
> +                    help='Specify the number of top functions to print.')
> +
> +parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
> +
> +args = parser.parse_args()
> +
> +# Extract the needed variables from the args
> +command = args.command
> +top = args.top
> +
> +# Ensure that valgrind is installed
> +check_valgrind = subprocess.run(

check_valgrind_presence is better than check_valgrind.

> +    ["which", "valgrind"], stdout=subprocess.DEVNULL)
> +if check_valgrind.returncode:
> +    sys.exit("Please install valgrind before running the script!")
> +
> +# Run callgrind
> +callgrind = subprocess.run((["valgrind", "--tool=callgrind",
> +                             "--callgrind-out-file=callgrind.data"] + command),
> +                           stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)

As I described in my comments for the perf-related script, it is better to
use /tmp/callgrind.data, rather than just callgrind.data.
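
A short sketch of how the path could be threaded through both callgrind
steps. The helper names are hypothetical; only the argument lists are
built, nothing is executed.

```python
def build_callgrind_command(command, data_path):
    """Return the argv for running command under callgrind."""
    return ["valgrind", "--tool=callgrind",
            "--callgrind-out-file=" + data_path] + list(command)


def build_callgrind_annotate_command(data_path):
    """Return the argv for annotating the callgrind output file."""
    return ["callgrind_annotate", data_path]


# Hypothetical example values.
argv = build_callgrind_command(["qemu-arm", "coulomb_double-arm"],
                               "/tmp/callgrind.data")
print(argv)
```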

> +if callgrind.returncode:
> +    sys.exit(callgrind.stderr.decode("utf-8"))
> +
> +# Save callgrind_annotate output to callgrind_annotate.out
> +with open("callgrind_annotate.out", "w") as output:

/tmp/callgrind_annotate.out

> +    callgrind_annotate = subprocess.run(
> +        ["callgrind_annotate", "callgrind.data"],
> +        stdout=output,
> +        stderr=subprocess.PIPE)
> +    if callgrind_annotate.returncode:
> +        os.unlink('callgrind.data')
> +        output.close()
> +        os.unlink('callgrind_annotate.out')
> +        sys.exit(callgrind_annotate.stderr.decode("utf-8"))
> +
> +
> +# Read the callgrind_annotate output to callgrind_data[]
> +callgrind_data = []
> +with open('callgrind_annotate.out', 'r') as data:
> +    callgrind_data = data.readlines()
> +
> +# Line number with the total number of instructions
> +total_instructions_line_number = 20
> +
> +# Get the total number of instructions
> +total_instructions_line_data = callgrind_data[total_instructions_line_number]
> +total_number_of_instructions = total_instructions_line_data.split(' ')[0]
> +total_number_of_instructions = int(
> +    total_number_of_instructions.replace(',', ''))
> +
> +# Line number with the top function
> +first_func_line = 25
> +
> +# Number of functions recorded by callgrind, last two lines are always empty
> +number_of_functions = len(callgrind_data) - first_func_line - 2
> +
> +# Limit the number of top functions to "top"
> +number_of_top_functions = (top if number_of_functions >
> +                           top else number_of_functions)
> +
> +# Store the data of the top functions in top_functions[]
> +top_functions = callgrind_data[first_func_line:
> +                               first_func_line + number_of_top_functions]
> +
> +# Print table header
> +print('{:>4}  {:>10}  {:<30}  {}\n{}  {}  {}  {}'.format('No.',
> +                                                         'Percentage',
> +                                                         'Name',

Function Name

> +                                                         'Source File',
> +                                                         '-' * 4,
> +                                                         '-' * 10,
> +                                                         '-' * 30,
> +                                                         '-' * 30,
> +                                                         ))
> +
> +# Print top N functions
> +for (index, function) in enumerate(top_functions, start=1):
> +    function_data = function.split()
> +    # Calculate function percentage
> +    function_instructions = float(function_data[0].replace(',', ''))
> +    function_percentage = (function_instructions /
> +                           total_number_of_instructions)*100
> +    # Get function name and source files path
> +    function_source_path, function_name = function_data[1].split(':')

Please replace 'function_source_path' with the more accurate 'function_source_file'.
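
For illustration, the per-row parsing with the suggested name could look
like the sketch below. parse_callgrind_row() is a hypothetical helper,
and the sample row and total are made-up values that assume a
"<count>  <file>:<function> [<object>]" layout in the
callgrind_annotate output.

```python
def parse_callgrind_row(row, total_number_of_instructions):
    """Return (percentage, function_name, function_source_file) for one row."""
    function_data = row.split()
    # Instruction counts use ',' as a thousands separator.
    function_instructions = float(function_data[0].replace(',', ''))
    function_percentage = (function_instructions /
                           total_number_of_instructions) * 100
    function_source_file, function_name = function_data[1].split(':')
    return function_percentage, function_name, function_source_file


# Assumed sample row and total; real values come from callgrind_annotate.
sample_row = "2,000  /qemu/fpu/softfloat.c:float64_mul [/qemu/qemu-x86_64]"
print(parse_callgrind_row(sample_row, 8000))
```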

> +    # Print extracted data
> +    print('{:>4}  {:>9.3f}%  {:<30}  {}'.format(index,
> +                                                round(function_percentage, 3),
> +                                                function_name,
> +                                                function_source_path))
> +
> +# Remove intermediate files
> +os.unlink('callgrind.data')
> +os.unlink('callgrind_annotate.out')

os.unlink('/tmp/callgrind.data')
os.unlink('/tmp/callgrind_annotate.out')

Thanks,
Aleksandar

> --
> 2.17.1
>



* Re: [PATCH v3 3/3] MAINTAINERS: Add 'Performance Tools and Tests' subsection
  2020-06-24 15:31 ` [PATCH v3 3/3] MAINTAINERS: Add 'Performance Tools and Tests' subsection Ahmed Karaman
@ 2020-06-25 10:00   ` Aleksandar Markovic
  0 siblings, 0 replies; 9+ messages in thread
From: Aleksandar Markovic @ 2020-06-25 10:00 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Eduardo Habkost, Alex Bennée,
	QEMU Developers, Cleber Rosa, Richard Henderson

On Wed, Jun 24, 2020 at 17:32 Ahmed Karaman
<ahmedkhaledkaraman@gmail.com> wrote:
>
> This commit creates a new 'Miscellaneous' section which hosts a new
> 'Performance Tools and Tests' subsection.
> The subsection will contain the the performance scripts and benchmarks

Remove 'the the'.

> written as a part of the 'TCG Continuous Benchmarking' project.
>
> Signed-off-by: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> ---

Alex already gave you "Reviewed-by:". When this happens, you are
supposed to add that line in the bottom part of the commit message,
for all future versions of the patch.

The reason for this is that you indicate to Alex and others that he
already agreed, and he doesn't need to look at it again.

So, please add that line in v4. See other series on the list for
examples of how other people usually do the same thing.

Thanks,
Aleksandar

>  MAINTAINERS | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 955cc8dd5c..ee4bfc5fb1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2974,3 +2974,10 @@ M: Peter Maydell <peter.maydell@linaro.org>
>  S: Maintained
>  F: docs/conf.py
>  F: docs/*/conf.py
> +
> +Miscellaneous
> +-------------
> +Performance Tools and Tests
> +M: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> +S: Maintained
> +F: scripts/performance/
> --
> 2.17.1
>



* Re: [PATCH v3 1/3] scripts/performance: Add topN_perf.py script
  2020-06-25  9:45   ` Aleksandar Markovic
@ 2020-06-26 14:18     ` Ahmed Karaman
  0 siblings, 0 replies; 9+ messages in thread
From: Ahmed Karaman @ 2020-06-26 14:18 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: Lukáš Doktor, Eduardo Habkost, Alex Bennée,
	QEMU Developers, Cleber Rosa, Richard Henderson

On Thu, Jun 25, 2020 at 11:45 AM Aleksandar Markovic
<aleksandar.qemu.devel@gmail.com> wrote:
>
> On Wed, Jun 24, 2020 at 17:32 Ahmed Karaman
> <ahmedkhaledkaraman@gmail.com> wrote:
> >
> > Syntax:
> > topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
> >                  <qemu executable> [<qemu executable options>] \
> >                  <target executable> [<target executable options>]
> >
> > [-h] - Print the script arguments help message.
> > [-n] - Specify the number of top functions to print.
> >      - If this flag is not specified, the tool defaults to 25.
> >
> > Example of usage:
> > topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
> >
> > Example Output:
> >  No.  Percentage  Name                       Caller
> > ----  ----------  -------------------------  -------------------------
> >    1      16.25%  float64_mul                qemu-x86_64
> >    2      12.01%  float64_sub                qemu-x86_64
> >    3      11.99%  float64_add                qemu-x86_64
> >    4       5.69%  helper_mulsd               qemu-x86_64
> >    5       4.68%  helper_addsd               qemu-x86_64
> >    6       4.43%  helper_lookup_tb_ptr       qemu-x86_64
> >    7       4.28%  helper_subsd               qemu-x86_64
> >    8       2.71%  f64_compare                qemu-x86_64
> >    9       2.71%  helper_ucomisd             qemu-x86_64
> >   10       1.04%  helper_pand_xmm            qemu-x86_64
> >   11       0.71%  float64_div                qemu-x86_64
> >   12       0.63%  helper_pxor_xmm            qemu-x86_64
> >   13       0.50%  0x00007f7b7004ef95         [JIT] tid 491
> >   14       0.50%  0x00007f7b70044e83         [JIT] tid 491
> >   15       0.36%  helper_por_xmm             qemu-x86_64
> >   16       0.32%  helper_cc_compute_all      qemu-x86_64
> >   17       0.30%  0x00007f7b700433f0         [JIT] tid 491
> >   18       0.30%  float64_compare_quiet      qemu-x86_64
> >   19       0.27%  soft_f64_addsub            qemu-x86_64
> >   20       0.26%  round_to_int               qemu-x86_64
> >
> > Signed-off-by: Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> > ---
> >  scripts/performance/topN_perf.py | 142 +++++++++++++++++++++++++++++++
> >  1 file changed, 142 insertions(+)
> >  create mode 100755 scripts/performance/topN_perf.py
> >
> > diff --git a/scripts/performance/topN_perf.py b/scripts/performance/topN_perf.py
> > new file mode 100755
> > index 0000000000..d2b939c375
> > --- /dev/null
> > +++ b/scripts/performance/topN_perf.py
> > @@ -0,0 +1,142 @@
> > +#!/usr/bin/env python3
> > +
> > +#  Print the top N most executed functions in QEMU using perf.
> > +#  Syntax:
> > +#  topN_perf.py [-h] [-n] <number of displayed top functions>  -- \
> > +#           <qemu executable> [<qemu executable options>] \
> > +#           <target executable> [<target executable options>]
> > +#
> > +#  [-h] - Print the script arguments help message.
> > +#  [-n] - Specify the number of top functions to print.
> > +#       - If this flag is not specified, the tool defaults to 25.
> > +#
> > +#  Example of usage:
> > +#  topN_perf.py -n 20 -- qemu-arm coulomb_double-arm
> > +#
> > +#  This file is a part of the project "TCG Continuous Benchmarking".
> > +#
> > +#  Copyright (C) 2020  Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> > +#  Copyright (C) 2020  Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
> > +#
> > +#  This program is free software: you can redistribute it and/or modify
> > +#  it under the terms of the GNU General Public License as published by
> > +#  the Free Software Foundation, either version 2 of the License, or
> > +#  (at your option) any later version.
> > +#
> > +#  This program is distributed in the hope that it will be useful,
> > +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > +#  GNU General Public License for more details.
> > +#
> > +#  You should have received a copy of the GNU General Public License
> > +#  along with this program. If not, see <https://www.gnu.org/licenses/>.
> > +
> > +import argparse
> > +import os
> > +import subprocess
> > +import sys
> > +
> > +
> > +# Parse the command line arguments
> > +parser = argparse.ArgumentParser(
> > +    usage='topN_perf.py [-h] [-n] <number of displayed top functions >  -- '
> > +          '<qemu executable> [<qemu executable options>] '
> > +          '<target executable> [<target executable options>]')
> > +
> > +parser.add_argument('-n', dest='top', type=int, default=25,
> > +                    help='Specify the number of top functions to print.')
> > +
> > +parser.add_argument('command', type=str, nargs='+', help=argparse.SUPPRESS)
> > +
> > +args = parser.parse_args()
> > +
> > +# Extract the needed variables from the args
> > +command = args.command
> > +top = args.top
> > +
> > +# Ensure that perf is installed
> > +check_perf = subprocess.run(["which", "perf"], stdout=subprocess.DEVNULL)
> > +if check_perf.returncode:
> > +    sys.exit("Please install perf before running the script!")
>
> I would rename "check_perf" to "check_perf_presence". It is more
> specific, clearer.
>
> > +
> > +# Ensure the user has the privilege to run perf
> > +check_perf_executability = subprocess.run(["perf", "stat", "ls", "/"],
> > +                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
> > +if check_perf_executability.returncode:
> > +    sys.exit(
> > +"""
> > +Error:
> > +You may not have permission to collect stats.
> > +
> > +Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> > +which controls use of the performance events system by
> > +unprivileged users (without CAP_SYS_ADMIN).
> > +
> > +  -1: Allow use of (almost) all events by all users
> > +      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> > +   0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
> > +      Disallow raw tracepoint access by users without CAP_SYS_ADMIN
> > +   1: Disallow CPU event access by users without CAP_SYS_ADMIN
> > +   2: Disallow kernel profiling by users without CAP_SYS_ADMIN
> > +
> > +To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
> > +   kernel.perf_event_paranoid = -1
> > +"""
> > +)
>
> Very good.
>
> > +
> > +# Run perf record
> > +perf_record = subprocess.run((["perf", "record"] + command),
> > +                             stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
> > +if perf_record.returncode:
> > +    os.unlink('perf.data')
> > +    sys.exit(perf_record.stderr.decode("utf-8"))
>
> Here, the file "perf.data" will be created in the current working
> directory. If one existed prior to script execution, it will be
> overwritten.
>
> I think such "corruption" of the current working directory is not optimal.
> It would be better if the script didn't touch the current working
> directory at all (perhaps the user wants to keep a perf.data obtained
> from some past experiment).
>
> Therefore, I think it would be better to set the output of "perf
> record" to "/tmp/perf.data" rather than "perf.data", which is the default.
> "perf record" has an option to specify the output file:
>
>        -o, --output=
>            Output file name.
>
> > +
> > +# Save perf report output to perf_report.out
> > +with open("perf_report.out", "w") as output:
> > +    perf_report = subprocess.run(
> > +        ["perf", "report", "--stdio"], stdout=output, stderr=subprocess.PIPE)
> > +    if perf_report.returncode:
> > +        os.unlink('perf.data')
> > +        output.close()
> > +        os.unlink('perf_report.out')
> > +        sys.exit(perf_report.stderr.decode("utf-8"))
>
> For reasons similar to those described above, the input file should be
> "/tmp/perf.data". The "perf report" option for the input file:
>
>        -i, --input=
>            Input file name.
>
> Output file should be "/tmp/perf_report.out", not "perf_report.out".
>
> > +
> > +# Read the reported data to functions[]
> > +functions = []
> > +with open("perf_report.out", "r") as data:
>
> "/tmp/perf_report.out"
>
> > +    # Only read lines that are not comments (comments start with #)
> > +    # Only read lines that are not empty
> > +    functions = [line for line in data.readlines() if line and line[0]
> > +                 != '#' and line[0] != "\n"]
> > +
> > +# Limit the number of top functions to "top"
> > +number_of_top_functions = top if len(functions) > top else len(functions)
> > +
> > +# Store the data of the top functions in top_functions[]
> > +top_functions = functions[:number_of_top_functions]
> > +
> > +# Print table header
> > +print('{:>4}  {:>10}  {:<30}  {}\n{}  {}  {}  {}'.format('No.',
> > +                                                         'Percentage',
> > +                                                         'Name',
>
> 'Function Name' would be more ergonomic here.
>
> > +                                                         'Caller',
>
> Please replace 'Caller' with 'Invoked by'. 'Caller' implies a function
> that directly calls the function in question. 'Invoked by' avoids such
> confusion, and it just feels more appropriate here.
>
> > +                                                         '-' * 4,
> > +                                                         '-' * 10,
> > +                                                         '-' * 30,
> > +                                                         '-' * 25))
> > +
> > +
> > +# Print top N functions
> > +for (index, function) in enumerate(top_functions, start=1):
> > +    function_data = function.split()
> > +    function_percentage = function_data[0]
> > +    function_name = function_data[-1]
> > +    function_caller = ' '.join(function_data[2:-2])
>
> function_invoker
>
> > +    print('{:>4}  {:>10}  {:<30}  {}'.format(index,
> > +                                             function_percentage,
> > +                                             function_name,
> > +                                             function_caller))
>
> function_invoker
>
> > +
> > +# Remove intermediate files
> > +os.unlink('perf.data')
> > +os.unlink('perf_report.out')
>
> os.unlink('/tmp/perf.data')
> os.unlink('/tmp/perf_report.out')
>
>
> > --
> > 2.17.1
> >

Thanks Mr. Aleksandar. These are really valid points. I'll add these
updates in v4 of this series.

Best regards,
Ahmed Karaman



Thread overview: 9+ messages
2020-06-24 15:31 [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Ahmed Karaman
2020-06-24 15:31 ` [PATCH v3 1/3] scripts/performance: Add topN_perf.py script Ahmed Karaman
2020-06-25  9:45   ` Aleksandar Markovic
2020-06-26 14:18     ` Ahmed Karaman
2020-06-24 15:31 ` [PATCH v3 2/3] scripts/performance: Add topN_callgrind.py script Ahmed Karaman
2020-06-25  9:54   ` Aleksandar Markovic
2020-06-24 15:31 ` [PATCH v3 3/3] MAINTAINERS: Add 'Performance Tools and Tests' subsection Ahmed Karaman
2020-06-25 10:00   ` Aleksandar Markovic
2020-06-25  9:19 ` [PATCH v3 0/3] Add Scripts for Finding Top 25 Executed Functions Aleksandar Markovic
