* [PATCH 1/3] perf report: Fix OOM error in TUI mode on s390
2019-05-22 14:45 [PATCH 0/3] Fix OOM in perf report on s390 Thomas Richter
@ 2019-05-22 14:45 ` Thomas Richter
2019-05-22 18:08 ` Arnaldo Carvalho de Melo
2019-05-22 14:46 ` [PATCH 2/3] perf record: Fix kallsym map size for s390 Thomas Richter
2019-05-22 14:46 ` [PATCH 3/3] perf record: Fix s390 missing module symbol and warning for non-root users Thomas Richter
2 siblings, 1 reply; 7+ messages in thread
From: Thomas Richter @ 2019-05-22 14:45 UTC (permalink / raw)
To: linux-kernel, linux-perf-users, acme
Cc: brueckner, heiko.carstens, Thomas Richter
Debugging a OOM error using the TUI interface revealed this issue
on s390:
[tmricht@m83lp54 perf]$ cat /proc/kallsyms |sort
....
00000001119b7158 B radix_tree_node_cachep
00000001119b8000 B __bss_stop
00000001119b8000 B _end
000003ff80002850 t autofs_mount [autofs4]
000003ff80002868 t autofs_show_options [autofs4]
000003ff80002a98 t autofs_evict_inode [autofs4]
....
There is a huge gap between the last kernel symbol
__bss_stop/_end and the first kernel module symbol
autofs_mount (from autofs4 module).
After reading the kernel symbol table via functions:
dso__load()
+--> dso__load_kernel_sym()
+--> dso__load_kallsyms()
+--> __dso_load_kallsyms()
+--> symbols__fixup_end()
the symbol __bss_stop has a start address of 1119b8000 and
an end address of 3ff80002850, as can be seen by this debug statement:
symbols__fixup_end __bss_stop start:0x1119b8000 end:0x3ff80002850
The size of symbol __bss_stop is 0x3fe6e64a850 bytes!
It is the last kernel symbol and fills up the space until
the first kernel module symbol.
This size kills the TUI interface when executing the following
code:
process_sample_event()
hist_entry_iter__add()
hist_iter__report_callback()
hist_entry__inc_addr_samples()
symbol__inc_addr_samples(symbol = __bss_stop)
symbol__cycles_hist()
annotated_source__alloc_histograms(...,
symbol__size(sym),
...)
This function allocates memory to save sample histograms.
The symbol_size() marco is defined as sym->end - sym->start, which
results in above value of 0x3fe6e64a850 bytes and
the call to calloc() in annotated_source__alloc_histograms() fails.
Samples are generated when functions execute.
To fix this I suggest to allow histogram entries only for functions.
Therefore ignore symbol entries which are not of type STT_FUNC.
Output before:
[tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
-i ~/slow.data 2>/tmp/2
[tmricht@m83lp54 perf]$ tail -5 /tmp/2
__symbol__inc_addr_samples(875): ENOMEM! sym->name=__bss_stop,
start=0x1119b8000, addr=0x2aa0005eb08, end=0x3ff80002850,
func: 0
problem adding hist entry, skipping event
0x938b8 [0x8]: failed to process type: 68 [Cannot allocate memory]
[tmricht@m83lp54 perf]$
Output after:
[tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
-i ~/slow.data 2>/tmp/2
[tmricht@m83lp54 perf]$ tail -5 /tmp/2
symbol__inc_addr_samples map:0x1597830 start:0x110730000 end:0x3ff80002850
symbol__hists notes->src:0x2aa2a70 nr_hists:1
symbol__inc_addr_samples sym:unlink_anon_vmas src:0x2aa2a70
__symbol__inc_addr_samples: addr=0x11094c69e
0x11094c670 unlink_anon_vmas: period++ [addr: 0x11094c69e, 0x2e, evidx=0]
=> nr_samples: 1, period: 526008
[tmricht@m83lp54 perf]$
There is no error about failed memory allocation and the TUI interface
shows all entries.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
---
tools/perf/util/annotate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 0b8573fd9b05..005cad111586 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -929,7 +929,7 @@ static int symbol__inc_addr_samples(struct symbol *sym, struct map *map,
{
struct annotated_source *src;
- if (sym == NULL)
+ if (sym == NULL || sym->type != STT_FUNC)
return 0;
src = symbol__hists(sym, evsel->evlist->nr_entries);
if (src == NULL)
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/3] perf record: Fix kallsym map size for s390
2019-05-22 14:45 [PATCH 0/3] Fix OOM in perf report on s390 Thomas Richter
2019-05-22 14:45 ` [PATCH 1/3] perf report: Fix OOM error in TUI mode " Thomas Richter
@ 2019-05-22 14:46 ` Thomas Richter
2019-05-22 14:46 ` [PATCH 3/3] perf record: Fix s390 missing module symbol and warning for non-root users Thomas Richter
2 siblings, 0 replies; 7+ messages in thread
From: Thomas Richter @ 2019-05-22 14:46 UTC (permalink / raw)
To: linux-kernel, linux-perf-users, acme
Cc: brueckner, heiko.carstens, Thomas Richter
On s390 command
[root@m35lp76 perf]#./perf record -e cycles -- find ~
creates a perf.data file with a strange PERF_RECORD_MMAP for kallsyms:
[root@m35lp76 perf]# ./perf report -D | fgrep kernel.kallsyms
0 0x128 [0x50]: PERF_RECORD_MMAP -1/0:
[0x100000(0x3ff7ff027f0) @ 0x100000]: x [kernel.kallsyms]_text
[root@m35lp76 perf]#
The size of the kernel is 0x3ff7ff027f0 bytes which is simply wrong.
It is the difference between the kernel's _end symbol and the first module:
[root@m35lp76 perf]# cat /proc/kallsyms |sort| les
0000000002472000 B __bss_stop
0000000002472000 B _end
000003ff800027f0 T qdio_stop_irq [qdio]
000003ff80002898 t qdio_do_eqbs [qdio]
The root cause is the following function sequence:
cmd__record
__cmd_record
perf_session__new
perf_session__create_kernel_maps
machine__create_kernel_maps
machine__get_running_kernel_start
machine__update_kernel_mmap
machine__set_kernel_mmap;
Machine__get_running_kernel_start() searches /proc/kallsyms for the
start symbol and then calls machine__update_kernel_mmap() with ~0ULL
for the kernel's end address.
It relies on machine__set_kernel_mmap() to find the next map, which is
usually a module and then uses the first module's start address as
kernel end address.
This works nicely on x86 and similar plattforms but not on s390.
On s390 the kernel starts at address 0x10000 and modules are loaded
at kernel virtual address space somewhere around 0x3ff xxxx xxxx,
leaving a huge gap.
To fix this function machine__create_kernel_maps() also searches for
symbol _end as BSS symbol and if this symbol is found, use this
symbol's address a kernel end in machine__update_kernel_mmap().
Otherwise fall back to the previous method and use the first
kernel module's load address (as before).
Output before:
[root@m35lp76 perf]# ./perf record -e cycles -- find ~
[root@m35lp76 perf]# ./perf report -D | fgrep kernel.kallsyms
0 0x128 [0x50]: PERF_RECORD_MMAP -1/0:
[0x100000(0x3ff7ff027f0) @ 0x100000]:
x [kernel.kallsyms]_text
[root@m35lp76 perf]#
Output after:
[root@m35lp76 perf]# ./perf record -e cycles -- find ~
[root@m35lp76 perf]# ./perf report -D | fgrep kernel.kallsyms
0 0x128 [0x50]: PERF_RECORD_MMAP -1/0:
[0x100000(0x2372000) @ 0x100000]:
x [kernel.kallsyms]_text
[root@m35lp76 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
---
tools/perf/util/event.c | 4 ++--
tools/perf/util/machine.c | 29 +++++++++++++++++++++++------
2 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index d1ad6c419724..96b4cbdb655e 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -876,11 +876,11 @@ static int find_symbol_cb(void *arg, const char *name, char type,
/*
* Must be a function or at least an alias, as in PARISC64, where "_text" is
* an 'A' to the same address as "_stext".
+ * When searching for symbol _end allow symbol type 'B'.
*/
if (!(kallsyms__is_function(type) ||
- type == 'A') || strcmp(name, args->name))
+ type == 'A' || type == 'B') || strcmp(name, args->name))
return 0;
-
args->start = start;
return 1;
}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 28a9541c4835..c278c1fe6dd3 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -918,13 +918,18 @@ void machine__get_kallsyms_filename(struct machine *machine, char *buf,
}
const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL};
+static const char *const ref_sym_endnames[] = {"_end", NULL};
/* Figure out the start address of kernel map from /proc/kallsyms.
* Returns the name of the start symbol in *symbol_name. Pass in NULL as
* symbol_name if it's not that important.
+ *
+ * Get kernel end address for kernel map from /proc/kallsyms by
+ * searching for symbol _end.
*/
static int machine__get_running_kernel_start(struct machine *machine,
- const char **symbol_name, u64 *start)
+ const char **symbol_name,
+ u64 *start, u64 *kernel_end)
{
char filename[PATH_MAX];
int i, err = -1;
@@ -949,6 +954,16 @@ static int machine__get_running_kernel_start(struct machine *machine,
*symbol_name = name;
*start = addr;
+
+ /* Get kernel end address and store it when found */
+ for (i = 0; (name = ref_sym_endnames[i]) != NULL; i++) {
+ err = kallsyms__get_function_start(filename, name, &addr);
+ if (!err) {
+ *kernel_end = addr;
+ break;
+ }
+ }
+
return 0;
}
@@ -1441,7 +1456,7 @@ int machine__create_kernel_maps(struct machine *machine)
struct dso *kernel = machine__get_kernel(machine);
const char *name = NULL;
struct map *map;
- u64 addr = 0;
+ u64 addr = 0, e_addr = 0;
int ret;
if (kernel == NULL)
@@ -1460,7 +1475,7 @@ int machine__create_kernel_maps(struct machine *machine)
"continuing anyway...\n", machine->pid);
}
- if (!machine__get_running_kernel_start(machine, &name, &addr)) {
+ if (!machine__get_running_kernel_start(machine, &name, &addr, &e_addr)) {
if (name &&
map__set_kallsyms_ref_reloc_sym(machine->vmlinux_map, name, addr)) {
machine__destroy_kernel_maps(machine);
@@ -1472,15 +1487,17 @@ int machine__create_kernel_maps(struct machine *machine)
* we have a real start address now, so re-order the kmaps
* assume it's the last in the kmaps
*/
- machine__update_kernel_mmap(machine, addr, ~0ULL);
+ machine__update_kernel_mmap(machine, addr, e_addr ?: ~0ULL);
}
if (machine__create_extra_kernel_maps(machine, kernel))
pr_debug("Problems creating extra kernel maps, continuing anyway...\n");
- /* update end address of the kernel map using adjacent module address */
+ /* update end address of the kernel map using adjacent module address
+ * only when the kernel end could not be determined.
+ */
map = map__next(machine__kernel_map(machine));
- if (map)
+ if (!e_addr && map)
machine__set_kernel_mmap(machine, addr, map->start);
out_put:
dso__put(kernel);
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/3] perf record: Fix s390 missing module symbol and warning for non-root users
2019-05-22 14:45 [PATCH 0/3] Fix OOM in perf report on s390 Thomas Richter
2019-05-22 14:45 ` [PATCH 1/3] perf report: Fix OOM error in TUI mode " Thomas Richter
2019-05-22 14:46 ` [PATCH 2/3] perf record: Fix kallsym map size for s390 Thomas Richter
@ 2019-05-22 14:46 ` Thomas Richter
2019-05-22 18:03 ` Arnaldo Carvalho de Melo
2 siblings, 1 reply; 7+ messages in thread
From: Thomas Richter @ 2019-05-22 14:46 UTC (permalink / raw)
To: linux-kernel, linux-perf-users, acme
Cc: brueckner, heiko.carstens, Thomas Richter
Command 'perf record' and 'perf report' on a system without kernel
debuginfo packages uses /proc/kallsyms and /proc/modules to find
addresses for kernel and module symbols. On x86 this works for root
and non-root users.
On s390, when invoked as non-root user, many of the following warnings
are shown and module symbols are missing:
proc/{kallsyms,modules} inconsistency while looking for
"[sha1_s390]" module!
Command 'perf record' creates a list of module start addresses by
parsing the output of /proc/modules and creates a PERF_RECORD_MMAP
record for the kernel and each module. The following function call
sequence is executed:
machine__create_kernel_maps
machine__create_module
modules__parse
machine__create_module --> for each line in /proc/modules
arch__fix_module_text_start
Function arch__fix_module_text_start() is s390 specific. It opens
file /sys/module/<name>/sections/.text to extract the module's .text
section start address. On s390 the module loader prepends a header
before the first section, whereas on x86 the module's text section
address is identical the the module's load address.
However module section files are root readable only. For non-root the
read operation fails and machine__create_module() returns an error.
Command perf record does not generate any PERF_RECORD_MMAP record
for loaded modules. Later command perf report complains about missing
module maps.
To fix this function arch__fix_module_text_start() always returns
success. For root users there is no change, for non-root users
the module's load address is used as module's text start address
(the prepended header then counts as part of the text section).
This enable non-root users to use module symbols and avoid the
warning when perf report is executed.
Output before:
[tmricht@m83lp54 perf]$ ./perf report -D | fgrep MMAP
0 0x168 [0x50]: PERF_RECORD_MMAP ... x [kernel.kallsyms]_text
Output after:
[tmricht@m83lp54 perf]$ ./perf report -D | fgrep MMAP
0 0x168 [0x50]: PERF_RECORD_MMAP ... x [kernel.kallsyms]_text
0 0x1b8 [0x98]: PERF_RECORD_MMAP ... x /lib/modules/.../autofs4.ko.xz
0 0x250 [0xa8]: PERF_RECORD_MMAP ... x /lib/modules/.../sha_common.ko.xz
0 0x2f8 [0x98]: PERF_RECORD_MMAP ... x /lib/modules/.../des_generic.ko.xz
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
---
tools/perf/arch/s390/util/machine.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/perf/arch/s390/util/machine.c b/tools/perf/arch/s390/util/machine.c
index 0b2054007314..a19690a17291 100644
--- a/tools/perf/arch/s390/util/machine.c
+++ b/tools/perf/arch/s390/util/machine.c
@@ -5,16 +5,19 @@
#include "util.h"
#include "machine.h"
#include "api/fs/fs.h"
+#include "debug.h"
int arch__fix_module_text_start(u64 *start, const char *name)
{
+ u64 m_start = *start;
char path[PATH_MAX];
snprintf(path, PATH_MAX, "module/%.*s/sections/.text",
(int)strlen(name) - 2, name + 1);
-
- if (sysfs__read_ull(path, (unsigned long long *)start) < 0)
- return -1;
+ if (sysfs__read_ull(path, (unsigned long long *)start) < 0) {
+ pr_debug2("Using module %s start:%#lx\n", path, m_start);
+ *start = m_start;
+ }
return 0;
}
--
2.19.1
^ permalink raw reply related [flat|nested] 7+ messages in thread