linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
@ 2023-01-10  5:58 Ravi Bangoria
  2023-01-10  5:58 ` [RFC 1/4] perf tool: Simplify machine__create_modules() a bit Ravi Bangoria
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-10  5:58 UTC (permalink / raw)
  To: acme
  Cc: ravi.bangoria, jolsa, namhyung, irogers, kan.liang, peterz,
	mark.rutland, mingo, alexander.shishkin, james.clark,
	german.gomez, leo.yan, adrian.hunter, alexey.v.bayduraev,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	santosh.shukla

Kernel module elf contains executable code in non-".text" sections as
well, for ex: ".noinstr.text". Plus, kernel module's memory layout
differs from it's binary layout because .ko elf does not contain
program header table.

Perf tries to solve it by creating special maps for allocated (SHF_ALLOC)
elf sections, but perf uses elf addresses for map address range and thus
these special maps remains unused because no real ip falls into their
address range.

Solve this by preparing section specific special maps using addresses
provided by sysfs /sys/module/.../sections/. Also save these details in
PERF_RECORD_KMOD_SEC_MAP format in perf.data which can be consumed at
perf-report time.

Without patchset:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object           Symbol
    13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
     6.58%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e6
     6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
     1.98%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015171
     1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit
     1.04%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e2
     0.94%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015174

  Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object           Symbol
    14.22%  qemu-system-x86  [kvm_amd]               [k] __svm_vcpu_run
    13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
     6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
     1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit

With patchset:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object           Symbol
    13.44%  qemu-system-x86  [kvm-amd].noinstr.text  [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]               [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kvm]                   [k] vcpu_run
     3.65%  qemu-system-x86  [kvm]                   [k] kvm_cpuid

  Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object       Symbol
    13.44%  qemu-system-x86  [kernel.vmlinux]    [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]           [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kernel.vmlinux]    [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kernel.vmlinux]    [k] vcpu_run
     3.65%  qemu-system-x86  [kernel.vmlinux]    [k] kvm_cpuid

This is an RFC only series. TODOs:
 - I'm just recording module path in PERF_RECORD_KMOD_SEC_MAP. It's very
   much possible that, at perf report time, a module file exists at the
   same path but it's internal layout is different. I think I need to add
   some buildid check. Any ideas?
 - I've enabled host perf-record/report only. It doesn't work for guest
   modules because host does not have access to guest sysfs. I'm yet to
   figure out how to fix it. May be we can add --guest-mod-sysfs option.
   Any ideas?
 - Also, I'm currently assuming that module files are not compressed.
 - I've seen perf build failures when compiling with NO_LIBELF=1.
 - I've seen perf report not honoring --kallsyms in certain conditions.

Prepared on top of acme/perf/core (69b41ac87e4a6)

Ravi Bangoria (4):
  perf tool: Simplify machine__create_modules() a bit
  perf tool: Refactor perf_event__synthesize_modules()
  perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP
  perf tool: Fix non-".text" symbol resolution for kernel modules

 tools/lib/perf/Documentation/libperf.txt |   1 +
 tools/lib/perf/include/perf/event.h      |  25 +++
 tools/perf/builtin-annotate.c            |   1 +
 tools/perf/builtin-c2c.c                 |   1 +
 tools/perf/builtin-diff.c                |   1 +
 tools/perf/builtin-inject.c              |   1 +
 tools/perf/builtin-kmem.c                |   1 +
 tools/perf/builtin-mem.c                 |   1 +
 tools/perf/builtin-record.c              |  14 ++
 tools/perf/builtin-report.c              |   1 +
 tools/perf/builtin-script.c              |  13 ++
 tools/perf/builtin-trace.c               |   1 +
 tools/perf/util/build-id.c               |   1 +
 tools/perf/util/data-convert-bt.c        |   1 +
 tools/perf/util/data-convert-json.c      |   1 +
 tools/perf/util/event.c                  |  22 ++
 tools/perf/util/event.h                  |   5 +
 tools/perf/util/machine.c                | 268 ++++++++++++++++++++++-
 tools/perf/util/machine.h                |   2 +
 tools/perf/util/map.c                    |   1 +
 tools/perf/util/map.h                    |   4 +
 tools/perf/util/session.c                |  17 ++
 tools/perf/util/symbol-elf.c             |   9 +-
 tools/perf/util/symbol.c                 |   2 +-
 tools/perf/util/synthetic-events.c       | 136 +++++++++---
 tools/perf/util/tool.h                   |   3 +-
 26 files changed, 494 insertions(+), 39 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC 1/4] perf tool: Simplify machine__create_modules() a bit
  2023-01-10  5:58 [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
@ 2023-01-10  5:58 ` Ravi Bangoria
  2023-01-10  5:58 ` [RFC 2/4] perf tool: Refactor perf_event__synthesize_modules() Ravi Bangoria
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-10  5:58 UTC (permalink / raw)
  To: acme
  Cc: ravi.bangoria, jolsa, namhyung, irogers, kan.liang, peterz,
	mark.rutland, mingo, alexander.shishkin, james.clark,
	german.gomez, leo.yan, adrian.hunter, alexey.v.bayduraev,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	santosh.shukla

Tweak this function a bit. No need to return twice.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/machine.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 803c9d1803dd..ab2919bc0a0d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1613,10 +1613,8 @@ static int machine__create_modules(struct machine *machine)
 	if (modules__parse(modules, machine, machine__create_module))
 		return -1;
 
-	if (!machine__set_modules_path(machine))
-		return 0;
-
-	pr_debug("Problems setting modules path maps, continuing anyway...\n");
+	if (machine__set_modules_path(machine))
+		pr_debug("Problems setting modules path maps, continuing anyway...\n");
 
 	return 0;
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC 2/4] perf tool: Refactor perf_event__synthesize_modules()
  2023-01-10  5:58 [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
  2023-01-10  5:58 ` [RFC 1/4] perf tool: Simplify machine__create_modules() a bit Ravi Bangoria
@ 2023-01-10  5:58 ` Ravi Bangoria
  2023-01-10  5:58 ` [RFC 3/4] perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP Ravi Bangoria
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-10  5:58 UTC (permalink / raw)
  To: acme
  Cc: ravi.bangoria, jolsa, namhyung, irogers, kan.liang, peterz,
	mark.rutland, mingo, alexander.shishkin, james.clark,
	german.gomez, leo.yan, adrian.hunter, alexey.v.bayduraev,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	santosh.shukla

perf_event__synthesize_modules() synthesizes MMAP2 and MMAP events.
Split them into separate functions.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/synthetic-events.c | 77 +++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 29 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 3ab6a92b1a6d..2e145517f192 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -665,6 +665,50 @@ int perf_event__synthesize_cgroups(struct perf_tool *tool __maybe_unused,
 }
 #endif
 
+static void __perf_event__synthesize_modules_mmap2(struct machine *machine,
+						   struct map *map,
+						   union perf_event *event)
+{
+	size_t size = PERF_ALIGN(map->dso->long_name_len + 1, sizeof(u64));
+
+	event->mmap2.header.type = PERF_RECORD_MMAP2;
+	event->mmap2.header.size = sizeof(event->mmap2)
+				   - sizeof(event->mmap2.filename)
+				   + size
+				   + machine->id_hdr_size;
+
+	memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
+	event->mmap2.start = map->start;
+	event->mmap2.len   = map->end - map->start;
+	event->mmap2.pid   = machine->pid;
+
+	memcpy(event->mmap2.filename, map->dso->long_name,
+	       map->dso->long_name_len + 1);
+
+	perf_record_mmap2__read_build_id(&event->mmap2, machine, false);
+}
+
+static void __perf_event__synthesize_modules_mmap(struct machine *machine,
+						  struct map *map,
+						  union perf_event *event)
+{
+	size_t size = PERF_ALIGN(map->dso->long_name_len + 1, sizeof(u64));
+
+	event->mmap.header.type = PERF_RECORD_MMAP;
+	event->mmap.header.size = sizeof(event->mmap)
+				  - sizeof(event->mmap.filename)
+				  + size
+				  + machine->id_hdr_size;
+
+	memset(event->mmap.filename + size, 0, machine->id_hdr_size);
+	event->mmap.start = map->start;
+	event->mmap.len   = map->end - map->start;
+	event->mmap.pid   = machine->pid;
+
+	memcpy(event->mmap.filename, map->dso->long_name,
+	       map->dso->long_name_len + 1);
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
@@ -695,35 +739,10 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
 		if (!__map__is_kmodule(pos))
 			continue;
 
-		if (symbol_conf.buildid_mmap2) {
-			size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
-			event->mmap2.header.type = PERF_RECORD_MMAP2;
-			event->mmap2.header.size = (sizeof(event->mmap2) -
-						(sizeof(event->mmap2.filename) - size));
-			memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
-			event->mmap2.header.size += machine->id_hdr_size;
-			event->mmap2.start = pos->start;
-			event->mmap2.len   = pos->end - pos->start;
-			event->mmap2.pid   = machine->pid;
-
-			memcpy(event->mmap2.filename, pos->dso->long_name,
-			       pos->dso->long_name_len + 1);
-
-			perf_record_mmap2__read_build_id(&event->mmap2, machine, false);
-		} else {
-			size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
-			event->mmap.header.type = PERF_RECORD_MMAP;
-			event->mmap.header.size = (sizeof(event->mmap) -
-						(sizeof(event->mmap.filename) - size));
-			memset(event->mmap.filename + size, 0, machine->id_hdr_size);
-			event->mmap.header.size += machine->id_hdr_size;
-			event->mmap.start = pos->start;
-			event->mmap.len   = pos->end - pos->start;
-			event->mmap.pid   = machine->pid;
-
-			memcpy(event->mmap.filename, pos->dso->long_name,
-			       pos->dso->long_name_len + 1);
-		}
+		if (symbol_conf.buildid_mmap2)
+			__perf_event__synthesize_modules_mmap2(machine, pos, event);
+		else
+			__perf_event__synthesize_modules_mmap(machine, pos, event);
 
 		if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
 			rc = -1;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC 3/4] perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP
  2023-01-10  5:58 [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
  2023-01-10  5:58 ` [RFC 1/4] perf tool: Simplify machine__create_modules() a bit Ravi Bangoria
  2023-01-10  5:58 ` [RFC 2/4] perf tool: Refactor perf_event__synthesize_modules() Ravi Bangoria
@ 2023-01-10  5:58 ` Ravi Bangoria
  2023-01-16  6:14   ` Adrian Hunter
  2023-01-10  5:58 ` [RFC 4/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-10  5:58 UTC (permalink / raw)
  To: acme
  Cc: ravi.bangoria, jolsa, namhyung, irogers, kan.liang, peterz,
	mark.rutland, mingo, alexander.shishkin, james.clark,
	german.gomez, leo.yan, adrian.hunter, alexey.v.bayduraev,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	santosh.shukla

Introduce, perf tool only, synthetic event type PERF_RECORD_KMOD_SEC_MAP.
Also add stub code for it. This event will be used to save/restore kernel
module section maps to/from perf.data file. This is needed because kernel
module elfs does not contain program header table and thus there is no
easy way to find out how kernel would have loaded module sections in the
memory.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/lib/perf/Documentation/libperf.txt |  1 +
 tools/lib/perf/include/perf/event.h      | 25 ++++++++++++++++++++++++
 tools/perf/util/event.c                  |  1 +
 tools/perf/util/session.c                | 12 ++++++++++++
 tools/perf/util/tool.h                   |  3 ++-
 5 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/Documentation/libperf.txt b/tools/lib/perf/Documentation/libperf.txt
index a8f1a237931b..b62730b84cc5 100644
--- a/tools/lib/perf/Documentation/libperf.txt
+++ b/tools/lib/perf/Documentation/libperf.txt
@@ -211,6 +211,7 @@ SYNOPSIS
   struct perf_record_time_conv;
   struct perf_record_header_feature;
   struct perf_record_compressed;
+  struct perf_record_kmod_sec_maps;
 --
 
 DESCRIPTION
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index ad47d7b31046..404b23b6902b 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -438,6 +438,29 @@ struct perf_record_compressed {
 	char			 data[];
 };
 
+/* Kernel module elf section maps */
+struct perf_record_kmod_sec_map {
+	struct perf_event_header header;
+	/* Machine id. Same as synthesized PERF_RECORD_MMAP */
+	__u32			 pid;
+	/* Section start ip address */
+	__u64			 start;
+	/* Section length */
+	__u64			 len;
+	/* Section page offset in kernel module elf file */
+	__u64			 pgoff;
+	/* Section name length, including '\0' */
+	__u16			 sec_name_len;
+	/* Kernel module filename(path) length, including '\0' */
+	__u16			 filename_len;
+	/*
+	 * Section name and filename stored as: "sec_name\0filename\0". i.e:
+	 * data[0]: Section name
+	 * data[sec_name_len + 1]: File name
+	 */
+	char			 data[];
+};
+
 enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_USER_TYPE_START		= 64,
 	PERF_RECORD_HEADER_ATTR			= 64,
@@ -459,6 +482,7 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_HEADER_FEATURE		= 80,
 	PERF_RECORD_COMPRESSED			= 81,
 	PERF_RECORD_FINISHED_INIT		= 82,
+	PERF_RECORD_KMOD_SEC_MAP		= 83,
 	PERF_RECORD_HEADER_MAX
 };
 
@@ -499,6 +523,7 @@ union perf_event {
 	struct perf_record_time_conv		time_conv;
 	struct perf_record_header_feature	feat;
 	struct perf_record_compressed		pack;
+	struct perf_record_kmod_sec_map		kmod_sec_map;
 };
 
 #endif /* __LIBPERF_EVENT_H */
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 1fa14598b916..1b03061440bc 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -77,6 +77,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_HEADER_FEATURE]		= "FEATURE",
 	[PERF_RECORD_COMPRESSED]		= "COMPRESSED",
 	[PERF_RECORD_FINISHED_INIT]		= "FINISHED_INIT",
+	[PERF_RECORD_KMOD_SEC_MAP]		= "KMOD_SEC_MAP",
 };
 
 const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7c021c6cedb9..4f5165cd58de 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -563,6 +563,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->compressed = perf_session__process_compressed_event;
 	if (tool->finished_init == NULL)
 		tool->finished_init = process_event_op2_stub;
+	if (tool->kmod_sec_map == NULL)
+		tool->kmod_sec_map = process_event_stub;
 }
 
 static void swap_sample_id_all(union perf_event *event, void *data)
@@ -997,6 +999,12 @@ static void perf_event__time_conv_swap(union perf_event *event,
 	}
 }
 
+static void perf_event_kmod_sec_map_swap(union perf_event *event __maybe_unused,
+					  bool sample_id_all __maybe_unused)
+{
+	/* FIXME */
+}
+
 typedef void (*perf_event__swap_op)(union perf_event *event,
 				    bool sample_id_all);
 
@@ -1035,6 +1043,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_STAT_ROUND]	  = perf_event__stat_round_swap,
 	[PERF_RECORD_EVENT_UPDATE]	  = perf_event__event_update_swap,
 	[PERF_RECORD_TIME_CONV]		  = perf_event__time_conv_swap,
+	[PERF_RECORD_KMOD_SEC_MAP]	  = perf_event_kmod_sec_map_swap,
 	[PERF_RECORD_HEADER_MAX]	  = NULL,
 };
 
@@ -1727,6 +1736,9 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 		return err;
 	case PERF_RECORD_FINISHED_INIT:
 		return tool->finished_init(session, event);
+	case PERF_RECORD_KMOD_SEC_MAP:
+		/* Currently PERF_RECORD_KMOD_SEC_MAP is supported only for host */
+		return tool->kmod_sec_map(tool, event, &sample, &session->machines.host);
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index c957fb849ac6..8ea7fb85c196 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -60,7 +60,8 @@ struct perf_tool {
 			unthrottle,
 			ksymbol,
 			bpf,
-			text_poke;
+			text_poke,
+			kmod_sec_map;
 
 	event_attr_op	attr;
 	event_attr_op	event_update;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC 4/4] perf tool: Fix non-".text" symbol resolution for kernel modules
  2023-01-10  5:58 [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
                   ` (2 preceding siblings ...)
  2023-01-10  5:58 ` [RFC 3/4] perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP Ravi Bangoria
@ 2023-01-10  5:58 ` Ravi Bangoria
  2023-01-10  6:35 ` [RFC 0/4] " Adrian Hunter
  2023-01-16  4:21 ` Ravi Bangoria
  5 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-10  5:58 UTC (permalink / raw)
  To: acme
  Cc: ravi.bangoria, jolsa, namhyung, irogers, kan.liang, peterz,
	mark.rutland, mingo, alexander.shishkin, james.clark,
	german.gomez, leo.yan, adrian.hunter, alexey.v.bayduraev,
	linux-perf-users, linux-kernel, sandipan.das, ananth.narayan,
	santosh.shukla

Kernel module elf contains executable code in non-".text" sections as
well, for ex: ".noinstr.text". Plus, kernel module's memory layout
differs from it's binary layout because .ko elf does not contain
program header table.

Perf tries to solve it by creating special maps for allocated (SHF_ALLOC)
elf sections, but perf uses elf addresses for map address range and thus
these special maps remains unused because no real ip falls into their
address range.

Solve this by preparing section specific special maps using addresses
provided by sysfs /sys/module/.../sections/. Also save these details via
PERF_RECORD_KMOD_SEC_MAPs in perf.data which can be consumed at perf-
report time.

Without patch:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object           Symbol
    13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
     6.58%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e6
     6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
     1.98%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015171
     1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit
     1.04%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e2
     0.94%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015174

  Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object           Symbol
    14.22%  qemu-system-x86  [kvm_amd]               [k] __svm_vcpu_run
    13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
     6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
     1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit

With patch:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object           Symbol
    13.44%  qemu-system-x86  [kvm-amd].noinstr.text  [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]               [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kvm]                   [k] vcpu_run
     3.65%  qemu-system-x86  [kvm]                   [k] kvm_cpuid

  Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object       Symbol
    13.44%  qemu-system-x86  [kernel.vmlinux]    [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]           [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kernel.vmlinux]    [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kernel.vmlinux]    [k] vcpu_run
     3.65%  qemu-system-x86  [kernel.vmlinux]    [k] kvm_cpuid

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/builtin-annotate.c       |   1 +
 tools/perf/builtin-c2c.c            |   1 +
 tools/perf/builtin-diff.c           |   1 +
 tools/perf/builtin-inject.c         |   1 +
 tools/perf/builtin-kmem.c           |   1 +
 tools/perf/builtin-mem.c            |   1 +
 tools/perf/builtin-record.c         |  14 ++
 tools/perf/builtin-report.c         |   1 +
 tools/perf/builtin-script.c         |  13 ++
 tools/perf/builtin-trace.c          |   1 +
 tools/perf/util/build-id.c          |   1 +
 tools/perf/util/data-convert-bt.c   |   1 +
 tools/perf/util/data-convert-json.c |   1 +
 tools/perf/util/event.c             |  21 +++
 tools/perf/util/event.h             |   5 +
 tools/perf/util/machine.c           | 264 ++++++++++++++++++++++++++++
 tools/perf/util/machine.h           |   2 +
 tools/perf/util/map.c               |   1 +
 tools/perf/util/map.h               |   4 +
 tools/perf/util/session.c           |  11 +-
 tools/perf/util/symbol-elf.c        |   9 +-
 tools/perf/util/symbol.c            |   2 +-
 tools/perf/util/synthetic-events.c  |  61 ++++++-
 23 files changed, 408 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 90458ca6933f..7d315e5f69f6 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -506,6 +506,7 @@ int cmd_annotate(int argc, const char **argv)
 			.auxtrace_info	= perf_event__process_auxtrace_info,
 			.auxtrace	= perf_event__process_auxtrace,
 			.feature	= process_feature_event,
+			.kmod_sec_map	= perf_event__process_kmod_sec_map,
 			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 52d94c7dd836..65e97da8f1cf 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -392,6 +392,7 @@ static struct perf_c2c c2c = {
 		.auxtrace_info  = perf_event__process_auxtrace_info,
 		.auxtrace       = perf_event__process_auxtrace,
 		.auxtrace_error = perf_event__process_auxtrace_error,
+		.kmod_sec_map	= perf_event__process_kmod_sec_map,
 		.ordered_events	= true,
 		.ordering_requires_timestamps = true,
 	},
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index ed07cc6cca56..c4a60d9074b0 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -476,6 +476,7 @@ static struct perf_diff pdiff = {
 		.lost	= perf_event__process_lost,
 		.namespaces = perf_event__process_namespaces,
 		.cgroup = perf_event__process_cgroup,
+		.kmod_sec_map = perf_event__process_kmod_sec_map,
 		.ordered_events = true,
 		.ordering_requires_timestamps = true,
 	},
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 3f4e4dd5abf3..5665bf4b9613 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -2180,6 +2180,7 @@ int cmd_inject(int argc, const char **argv)
 			.finished_init	= perf_event__repipe_op2_synth,
 			.compressed	= perf_event__repipe_op4_synth,
 			.auxtrace	= perf_event__repipe_auxtrace,
+			.kmod_sec_map	= perf_event__repipe,
 		},
 		.input_name  = "-",
 		.samples = LIST_HEAD_INIT(inject.samples),
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index e20656c431a4..5e472f9fb39c 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -971,6 +971,7 @@ static struct perf_tool perf_kmem = {
 	.mmap		 = perf_event__process_mmap,
 	.mmap2		 = perf_event__process_mmap2,
 	.namespaces	 = perf_event__process_namespaces,
+	.kmod_sec_map	 = perf_event__process_kmod_sec_map,
 	.ordered_events	 = true,
 };
 
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index dedd612eae5e..a5443fb5baf1 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -459,6 +459,7 @@ int cmd_mem(int argc, const char **argv)
 			.auxtrace_info  = perf_event__process_auxtrace_info,
 			.auxtrace       = perf_event__process_auxtrace,
 			.auxtrace_error = perf_event__process_auxtrace_error,
+			.kmod_sec_map	= perf_event__process_kmod_sec_map,
 			.ordered_events	= true,
 		},
 		.input_name		 = "perf.data",
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 29dcd454b8e2..1e0285682813 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3252,6 +3252,19 @@ static int build_id__process_mmap2(struct perf_tool *tool, union perf_event *eve
 	return perf_event__process_mmap2(tool, event, sample, machine);
 }
 
+static int build_id__process_kmod_sec_map(struct perf_tool *tool, union perf_event *event,
+					  struct perf_sample *sample, struct machine *machine)
+{
+	/*
+	 * We already have the kernel maps, put in place via perf_session__create_kernel_maps()
+	 * no need to add them twice.
+	 */
+	if (!(event->header.misc & PERF_RECORD_MISC_USER))
+		return 0;
+
+	return perf_event__process_kmod_sec_map(tool, event, sample, machine);
+}
+
 static int process_timestamp_boundary(struct perf_tool *tool,
 				      union perf_event *event __maybe_unused,
 				      struct perf_sample *sample,
@@ -3320,6 +3333,7 @@ static struct record record = {
 		.mmap2		= build_id__process_mmap2,
 		.itrace_start	= process_timestamp_boundary,
 		.aux		= process_timestamp_boundary,
+		.kmod_sec_map	= build_id__process_kmod_sec_map,
 		.ordered_events	= true,
 	},
 };
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2ee2ecca208e..b7a9a8f72d63 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1212,6 +1212,7 @@ int cmd_report(int argc, const char **argv)
 			.auxtrace	 = perf_event__process_auxtrace,
 			.event_update	 = perf_event__process_event_update,
 			.feature	 = process_feature_event,
+			.kmod_sec_map	 = perf_event__process_kmod_sec_map,
 			.ordered_events	 = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 69394ac0a20d..711f77edee60 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2591,6 +2591,17 @@ static int process_mmap2_event(struct perf_tool *tool,
 			   event->mmap2.tid);
 }
 
+static int process_kmod_sec_map_event(struct perf_tool *tool,
+				      union perf_event *event,
+				      struct perf_sample *sample,
+				      struct machine *machine)
+{
+	if (perf_event__process_kmod_sec_map(tool, event, sample, machine) < 0)
+		return -1;
+
+	return print_event(tool, event, sample, machine, event->kmod_sec_map.pid, -1);
+}
+
 static int process_switch_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -2775,6 +2786,7 @@ static int __cmd_script(struct perf_script *script)
 	if (script->show_mmap_events) {
 		script->tool.mmap = process_mmap_event;
 		script->tool.mmap2 = process_mmap2_event;
+		script->tool.kmod_sec_map = process_kmod_sec_map_event;
 	}
 	if (script->show_switch_events || (scripting_ops && scripting_ops->process_switch))
 		script->tool.context_switch = process_switch_event;
@@ -3805,6 +3817,7 @@ int cmd_script(int argc, const char **argv)
 			.cpu_map	 = process_cpu_map_event,
 			.throttle	 = process_throttle_event,
 			.unthrottle	 = process_throttle_event,
+			.kmod_sec_map	 = perf_event__process_kmod_sec_map,
 			.ordered_events	 = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 86e06f136f40..f046a0b8862f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -4157,6 +4157,7 @@ static int trace__replay(struct trace *trace)
 	trace->tool.tracing_data  = perf_event__process_tracing_data;
 	trace->tool.build_id	  = perf_event__process_build_id;
 	trace->tool.namespaces	  = perf_event__process_namespaces;
+	trace->tool.kmod_sec_map  = perf_event__process_kmod_sec_map;
 
 	trace->tool.ordered_events = true;
 	trace->tool.ordering_requires_timestamps = true;
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index a839b30c981b..3d26781eb418 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -94,6 +94,7 @@ struct perf_tool build_id__mark_dso_hit_ops = {
 	.exit	= perf_event__exit_del_thread,
 	.attr		 = perf_event__process_attr,
 	.build_id	 = perf_event__process_build_id,
+	.kmod_sec_map	 = perf_event__process_kmod_sec_map,
 	.ordered_events	 = true,
 };
 
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index b842273458b8..200703c15cdc 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1621,6 +1621,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
 			.tracing_data    = perf_event__process_tracing_data,
 			.build_id        = perf_event__process_build_id,
 			.namespaces      = perf_event__process_namespaces,
+			.kmod_sec_map	 = perf_event__process_kmod_sec_map,
 			.ordered_events  = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/util/data-convert-json.c b/tools/perf/util/data-convert-json.c
index ba9d93ce9463..b0e5dbcaccc7 100644
--- a/tools/perf/util/data-convert-json.c
+++ b/tools/perf/util/data-convert-json.c
@@ -326,6 +326,7 @@ int bt_convert__perf2json(const char *input_name, const char *output_name,
 			.auxtrace_info  = perf_event__process_auxtrace_info,
 			.auxtrace       = perf_event__process_auxtrace,
 			.event_update   = perf_event__process_event_update,
+			.kmod_sec_map	= perf_event__process_kmod_sec_map,
 			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 1b03061440bc..b13418a773a7 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -297,6 +297,16 @@ size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 		       event->mmap.filename);
 }
 
+size_t perf_event__fprintf_kmod_sec_map(union perf_event *event, FILE *fp)
+{
+	struct perf_record_kmod_sec_map *ksm = &event->kmod_sec_map;
+
+	return fprintf(fp, " %d: [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64 "]: %c %s %s\n",
+		       ksm->pid, ksm->start, ksm->len, ksm->pgoff,
+		       (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) ? 'r' : 'x',
+		       ksm->data, ksm->data + ksm->sec_name_len + 2);
+}
+
 size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
 {
 	if (event->header.misc & PERF_RECORD_MISC_MMAP_BUILD_ID) {
@@ -379,6 +389,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused,
 	return machine__process_mmap2_event(machine, event, sample);
 }
 
+int perf_event__process_kmod_sec_map(struct perf_tool *tool __maybe_unused,
+				     union perf_event *event,
+				     struct perf_sample *sample __maybe_unused,
+				     struct machine *machine)
+{
+	return machine__process_kmod_sec_map_event(machine, event);
+}
+
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, "(%d:%d):(%d:%d)\n",
@@ -554,6 +572,9 @@ size_t perf_event__fprintf(union perf_event *event, struct machine *machine, FIL
 	case PERF_RECORD_AUX_OUTPUT_HW_ID:
 		ret += perf_event__fprintf_aux_output_hw_id(event, fp);
 		break;
+	case PERF_RECORD_KMOD_SEC_MAP:
+		ret += perf_event__fprintf_kmod_sec_map(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 6663a676eadc..90f2eff24f8d 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -310,6 +310,10 @@ int perf_event__process_mmap2(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct machine *machine);
+int perf_event__process_kmod_sec_map(struct perf_tool *tool,
+				     union perf_event *event,
+				     struct perf_sample *sample,
+				     struct machine *machine);
 int perf_event__process_fork(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -343,6 +347,7 @@ const char *perf_event__name(unsigned int id);
 size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_kmod_sec_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ab2919bc0a0d..a98cf3b4599b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -36,6 +36,7 @@
 #include <internal/lib.h> // page_size
 #include "cgroup.h"
 #include "arm64-frame-pointer-unwind-support.h"
+#include "api/fs/fs.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -1595,6 +1596,188 @@ static int machine__create_module(void *arg, const char *name, u64 start,
 	return 0;
 }
 
+/* Works only for host modules. */
+static int __machine_create_modules_sections(struct maps *kmaps, struct map *map,
+					     struct list_head *tmp_map_list)
+{
+	const char *sysfs = sysfs__mountpoint();
+	char path[PATH_MAX];
+	struct dirent *dent;
+	Elf_Scn *sec = NULL;
+	struct dso *sec_dso;
+	struct map *sec_map;
+	size_t shstrndx;
+	GElf_Shdr shdr;
+	char *mod_name;
+	char *sname;
+	int ret = 0;
+	u64 start;
+	int found;
+	DIR *dir;
+	Elf *elf;
+	int fd;
+
+	if (!sysfs)
+		return -1;
+
+	fd = open(map->dso->long_name, O_RDONLY); /* XXX: What if file is compressed? */
+	if (fd < 0)
+		return -errno;
+
+	/* [modulename] => modulename */
+	mod_name = strdup(&map->dso->short_name[1]);
+	if (!mod_name)
+		return -1;
+	mod_name[map->dso->short_name_len - 2] = '\0';
+
+	elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+	if (elf == NULL) {
+		ret = -1;
+		goto out_free_mod_name;
+	}
+
+	if (elf_getshdrstrndx(elf, &shstrndx) != 0) {
+		ret = -errno;
+		goto out_elf_end;
+	}
+
+	snprintf(path, PATH_MAX, "%s/module/%s/sections/", sysfs, mod_name);
+	dir = opendir(path);
+	if (!dir) {
+		ret = -errno;
+		goto out_elf_end;
+	}
+
+	while ((dent = readdir(dir))) {
+		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
+			continue;
+
+		found = 0;
+		sec = NULL;
+		while ((sec = elf_nextscn(elf, sec)) != NULL) {
+			gelf_getshdr(sec, &shdr);
+			sname = elf_strptr(elf, shstrndx, shdr.sh_name);
+			if (!sname)
+				continue;
+
+			if (strncmp(dent->d_name, sname, PATH_MAX))
+				continue;
+			found = 1;
+			break;
+		}
+
+		if (!found)
+			continue;
+
+		if (!(shdr.sh_flags & SHF_ALLOC))
+			continue;
+
+		/*
+		 * .init sections are removed once module init is done. But
+		 * their stale addresses remain in sysfs sections list. Later,
+		 * kernel can allocate the same address range to other module,
+		 * and in such cases, addresses provided by sysfs will overlap
+		 * across different modules. Ignore init sections.
+		 */
+		if (strstarts(dent->d_name, ".init"))
+			continue;
+
+		if (!strcmp(dent->d_name, ".text")) {
+			map->end = map->start + shdr.sh_size;
+			map->pgoff = shdr.sh_offset;
+			continue;
+		}
+
+		snprintf(path, PATH_MAX, "module/%s/sections/%s", mod_name, dent->d_name);
+
+		if (sysfs__read_ull(path, (unsigned long long *)&start) < 0)
+			continue;
+
+		sec_dso = dso__new(map->dso->long_name);
+		if (!sec_dso) {
+			ret = -1;
+			goto out;
+		}
+
+		sec_dso->kernel = map->dso->kernel;
+		sec_dso->symtab_type = map->dso->symtab_type;
+		sec_dso->long_name_len = map->dso->long_name_len;
+		sec_dso->short_name = NULL;
+		ret = asprintf((char **)&sec_dso->short_name, "%s%s",
+			       map->dso->short_name, sname);
+		if (ret < 0 || !sec_dso->short_name) {
+			ret = -1;
+			goto out;
+		}
+		sec_dso->short_name_allocated = true;
+
+		sec_map = map__new2(start, sec_dso);
+		dso__put(sec_dso);
+		if (!sec_map) {
+			ret = -1;
+			goto out;
+		}
+
+		sec_map->end = start + shdr.sh_size;
+		sec_map->pgoff = shdr.sh_offset;
+		sec_map->sec_map = true;
+
+		map__kmap(sec_map)->kmaps = kmaps;
+
+		/*
+		 * We are adding this new map in a temporary list. We need
+		 * to keep map->refcnt non-zero until we insert this map in
+		 * machine->kmaps. So don't call map__put() here. It should
+		 * be done by the caller.
+		 */
+		list_add(&sec_map->node, tmp_map_list);
+	}
+
+out:
+	closedir(dir);
+out_elf_end:
+	elf_end(elf);
+out_free_mod_name:
+	free(mod_name);
+	close(fd);
+	return ret;
+}
+
+static int machine_create_modules_sections(struct machine *machine)
+{
+	struct maps *kmaps = machine__kernel_maps(machine);
+	LIST_HEAD(tmp_map_list);
+	struct map *map, *tmp;
+
+	maps__for_each_entry(kmaps, map) {
+		if (!__map__is_kmodule(map) || map->sec_map)
+			continue;
+		/*
+		 * Although we haven't added any section specific maps to
+		 * machine->kmaps yet, we would have changed .text section
+		 * map's address range of prev modules. And we don't have
+		 * a way to revert those back if this function fails. So
+		 * just continue with debug message.
+		 */
+		if (__machine_create_modules_sections(kmaps, map, &tmp_map_list) < 0)
+			pr_debug("Couldn't prepare section maps of %s\n", map->dso->short_name);
+	}
+
+	list_for_each_entry_safe(map, tmp, &tmp_map_list, node) {
+		list_del(&map->node);
+		maps__insert(kmaps, map);
+		/*
+		 * Add it before we drop the reference to map, i.e. while we
+		 * still are sure to have a reference to this DSO via *map->dso.
+		 */
+		dsos__add(&machine->dsos, map->dso);
+		/* kmaps already got it */
+		map__put(map);
+	}
+
+	return 0;
+}
+
 static int machine__create_modules(struct machine *machine)
 {
 	const char *modules;
@@ -1616,6 +1799,13 @@ static int machine__create_modules(struct machine *machine)
 	if (machine__set_modules_path(machine))
 		pr_debug("Problems setting modules path maps, continuing anyway...\n");
 
+	/*
+	 * We don't have access to guest sysfs. So section specific maps
+	 * are created only for host modules.
+	 */
+	if (!machine__is_default_guest(machine))
+		return machine_create_modules_sections(machine);
+
 	return 0;
 }
 
@@ -1982,6 +2172,80 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 	return 0;
 }
 
+/* sec maps are saved only for host modules. */
+int machine__process_kmod_sec_map_event(struct machine *machine, union perf_event *event)
+{
+	struct perf_record_kmod_sec_map *ksm;
+	char *mod_name, *ptr;
+	struct maps *kmaps;
+	struct map *map;
+	struct dso *dso;
+	int ret;
+
+	if (dump_trace)
+		perf_event__fprintf_kmod_sec_map(event, stdout);
+
+	ksm = &event->kmod_sec_map;
+
+	mod_name = strrchr(ksm->data + ksm->sec_name_len + 1, '/') + 1;
+	ptr = strstr(mod_name, ".ko");
+	if (mod_name == (char *)1 || !ptr || ptr == mod_name)
+		goto err;
+
+	dso = dso__new(ksm->data + ksm->sec_name_len + 1);
+	if (!dso)
+		goto err;
+
+	dso->kernel = DSO_SPACE__KERNEL;
+	/* XXX: What about DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP? */
+	dso->symtab_type = DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE;
+
+	ret = asprintf((char **)&dso->short_name, "[%.*s]%.*s", (int)(ptr - mod_name),
+		       mod_name, ksm->sec_name_len - 1, ksm->data);
+	if (ret < 0 || !dso->short_name) {
+		dso__put(dso);
+		goto err;
+	}
+	dso->short_name_allocated = true;
+	dso->short_name_len = strlen(dso->short_name);
+
+	dso->long_name = strdup(ksm->data + ksm->sec_name_len + 1);
+	if (!dso->long_name) {
+		dso__put(dso);
+		goto err;
+	}
+	dso->long_name_allocated = true;
+	dso->long_name_len = ksm->filename_len - 1;
+
+	map = map__new2(ksm->start, dso);
+	dso__put(dso);
+	if (!map)
+		goto err;
+
+	map->end = map->start + ksm->len;
+	map->pgoff = ksm->pgoff;
+	map->sec_map = 1;
+
+	kmaps = machine__kernel_maps(machine);
+	map__kmap(map)->kmaps = kmaps;
+	maps__insert(kmaps, map);
+
+	/*
+	 * Add it before we drop the reference to sec_map, i.e.
+	 * while we still are sure to have a reference to this
+	 * DSO via *sec_map->dso.
+	 */
+	dsos__add(&machine->dsos, dso);
+	/* kmaps already got it */
+	map__put(map);
+
+	return 0;
+
+err:
+	dump_printf("problem processing PERF_RECORD_KMOD_SEC_MAP, skipping event.\n");
+	return -1;
+}
+
 static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock)
 {
 	struct threads *threads = machine__threads(machine, th->tid);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d034ecaf89c1..3524587f7c06 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -146,6 +146,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
 				 struct perf_sample *sample);
+int machine__process_kmod_sec_map_event(struct machine *machine,
+					union perf_event *event);
 int machine__process_ksymbol(struct machine *machine,
 			     union perf_event *event,
 			     struct perf_sample *sample);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index f3a3d9b3a40d..8325800581c5 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -114,6 +114,7 @@ void map__init(struct map *map, u64 start, u64 end, u64 pgoff, struct dso *dso)
 	RB_CLEAR_NODE(&map->rb_node);
 	map->erange_warned = false;
 	refcount_set(&map->refcnt, 1);
+	map->sec_map  = false;
 }
 
 struct map *map__new(struct machine *machine, u64 start, u64 len,
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 3dcfe06db6b3..4b3fa454ea24 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -20,6 +20,7 @@ struct map {
 		struct rb_node	rb_node;
 		struct list_head node;
 	};
+	/* Range of a map [start, end). */
 	u64			start;
 	u64			end;
 	bool			erange_warned:1;
@@ -36,6 +37,9 @@ struct map {
 	struct dso		*dso;
 	refcount_t		refcnt;
 	u32			flags;
+
+	/* Indicates whether a map is of kernel module's elf section. */
+	bool			sec_map:1;
 };
 
 struct kmap;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4f5165cd58de..e1a568a1dcd2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -999,10 +999,15 @@ static void perf_event__time_conv_swap(union perf_event *event,
 	}
 }
 
-static void perf_event_kmod_sec_map_swap(union perf_event *event __maybe_unused,
-					  bool sample_id_all __maybe_unused)
+static void perf_event_kmod_sec_map_swap(union perf_event *event,
+					 bool sample_id_all __maybe_unused)
 {
-	/* FIXME */
+	event->kmod_sec_map.pid = bswap_32(event->kmod_sec_map.pid);
+	event->kmod_sec_map.start = bswap_64(event->kmod_sec_map.start);
+	event->kmod_sec_map.len = bswap_64(event->kmod_sec_map.len);
+	event->kmod_sec_map.pgoff = bswap_64(event->kmod_sec_map.pgoff);
+	event->kmod_sec_map.sec_name_len = bswap_16(event->kmod_sec_map.sec_name_len);
+	event->kmod_sec_map.filename_len = bswap_16(event->kmod_sec_map.filename_len);
 }
 
 typedef void (*perf_event__swap_op)(union perf_event *event,
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 96767d1b3f1c..8d14faeebb8f 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1137,6 +1137,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 	struct maps *kmaps = kmap ? map__kmaps(map) : NULL;
 	struct map *curr_map = map;
 	struct dso *curr_dso = dso;
+	char *sec_map_name = NULL;
 	Elf_Data *symstrs, *secstrs, *secstrs_run, *secstrs_sym;
 	uint32_t nr_syms;
 	int err = -1;
@@ -1236,6 +1237,10 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		remap_kernel = true;
 		adjust_kernel_syms = dso->adjust_symbols;
 	}
+
+	if (map->sec_map)
+		sec_map_name = strchr(dso->short_name, ']') + 1;
+
 	elf_symtab__for_each_symbol(syms, nr_syms, idx, sym) {
 		struct symbol *f;
 		const char *elf_name = elf_sym__name(&sym, symstrs);
@@ -1315,6 +1320,8 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 			continue;
 
 		section_name = elf_sec__name(&shdr, secstrs);
+		if (sec_map_name && strcmp(sec_map_name, section_name))
+			continue;
 
 		/* On ARM, symbols for thumb functions have 1 added to
 		 * the symbol address as a flag - remove it */
@@ -1323,7 +1330,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		    (sym.st_value & 1))
 			--sym.st_value;
 
-		if (dso->kernel) {
+		if (dso->kernel && !map->sec_map) {
 			if (dso__process_kernel_symbol(dso, map, &sym, &shdr, kmaps, kmap, &curr_dso, &curr_map,
 						       section_name, adjust_kernel_syms, kmodule, &remap_kernel))
 				goto out_elf_end;
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a3a165ae933a..2407c047974c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1121,7 +1121,7 @@ static int do_validate_kcore_modules(const char *filename, struct maps *kmaps)
 	maps__for_each_entry(kmaps, old_map) {
 		struct module_info *mi;
 
-		if (!__map__is_kmodule(old_map)) {
+		if (!__map__is_kmodule(old_map) || old_map->sec_map) {
 			continue;
 		}
 
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 2e145517f192..f8a91113e93b 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -665,6 +665,40 @@ int perf_event__synthesize_cgroups(struct perf_tool *tool __maybe_unused,
 }
 #endif
 
+static void __perf_event__synthesize_modules_sec_map(struct machine *machine,
+						     struct map *map,
+						     union perf_event *event)
+{
+	size_t sec_name_len;
+	size_t size;
+	char *ptr;
+
+	ptr = strchr(map->dso->short_name, ']') + 1;
+	if (!*ptr)
+		return;
+
+	sec_name_len = strlen(ptr);
+	size = PERF_ALIGN(sec_name_len + map->dso->long_name_len + 2, sizeof(u64));
+
+	event->kmod_sec_map.header.type = PERF_RECORD_KMOD_SEC_MAP;
+	event->kmod_sec_map.header.size = sizeof(event->kmod_sec_map)
+					  + size
+					  + machine->id_hdr_size;
+
+	memset(event->kmod_sec_map.data + size, 0, machine->id_hdr_size);
+
+	event->kmod_sec_map.pid = machine->pid;
+	event->kmod_sec_map.start = map->start;
+	event->kmod_sec_map.len = map->end - map->start;
+	event->kmod_sec_map.pgoff = map->pgoff;
+	event->kmod_sec_map.sec_name_len = sec_name_len + 1;
+	event->kmod_sec_map.filename_len = map->dso->long_name_len + 1;
+
+	memcpy(event->kmod_sec_map.data, ptr, sec_name_len + 1);
+	memcpy(event->kmod_sec_map.data + sec_name_len + 2, map->dso->long_name,
+	       map->dso->long_name_len + 1);
+}
+
 static void __perf_event__synthesize_modules_mmap2(struct machine *machine,
 						   struct map *map,
 						   union perf_event *event)
@@ -715,7 +749,7 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
 	int rc = 0;
 	struct map *pos;
 	struct maps *maps = machine__kernel_maps(machine);
-	union perf_event *event;
+	union perf_event *event, *sec_event;
 	size_t size = symbol_conf.buildid_mmap2 ?
 			sizeof(event->mmap2) : sizeof(event->mmap);
 
@@ -726,30 +760,47 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
 		return -1;
 	}
 
+	/* Length of section name and filenames are capped at PATH_MAX */
+	sec_event = zalloc(sizeof(event->kmod_sec_map) + (2 * PATH_MAX) + machine->id_hdr_size);
+	if (sec_event == NULL) {
+		free(event);
+		pr_debug("Not enough memory synthesizing mmap event "
+			 "for kernel modules\n");
+		return -1;
+	}
+
 	/*
 	 * kernel uses 0 for user space maps, see kernel/perf_event.c
 	 * __perf_event_mmap
 	 */
-	if (machine__is_host(machine))
+	if (machine__is_host(machine)) {
 		event->header.misc = PERF_RECORD_MISC_KERNEL;
-	else
+		/* PERF_RECORD_KMOD_SEC_MAP is supported only on Host */
+		sec_event->header.misc = PERF_RECORD_MISC_KERNEL;
+	} else {
 		event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
+	}
 
 	maps__for_each_entry(maps, pos) {
 		if (!__map__is_kmodule(pos))
 			continue;
 
-		if (symbol_conf.buildid_mmap2)
+		if (pos->sec_map)
+			__perf_event__synthesize_modules_sec_map(machine, pos, sec_event);
+		else if (symbol_conf.buildid_mmap2)
 			__perf_event__synthesize_modules_mmap2(machine, pos, event);
 		else
 			__perf_event__synthesize_modules_mmap(machine, pos, event);
 
-		if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
+		rc = perf_tool__process_synth_event(tool, pos->sec_map ? sec_event : event,
+						    machine, process);
+		if (rc) {
 			rc = -1;
 			break;
 		}
 	}
 
+	free(sec_event);
 	free(event);
 	return rc;
 }
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
  2023-01-10  5:58 [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
                   ` (3 preceding siblings ...)
  2023-01-10  5:58 ` [RFC 4/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
@ 2023-01-10  6:35 ` Adrian Hunter
  2023-01-10  8:43   ` Ravi Bangoria
  2023-01-16  4:21 ` Ravi Bangoria
  5 siblings, 1 reply; 11+ messages in thread
From: Adrian Hunter @ 2023-01-10  6:35 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, namhyung, irogers, kan.liang, peterz, mark.rutland, mingo,
	alexander.shishkin, james.clark, german.gomez, leo.yan,
	alexey.v.bayduraev, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla

On 10/01/23 07:58, Ravi Bangoria wrote:
> Kernel module elf contains executable code in non-".text" sections as
> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
> differs from it's binary layout because .ko elf does not contain
> program header table.

Have you looked at using perf record --kcore option.

> 
> Perf tries to solve it by creating special maps for allocated (SHF_ALLOC)
> elf sections, but perf uses elf addresses for map address range and thus
> these special maps remains unused because no real ip falls into their
> address range.
> 
> Solve this by preparing section specific special maps using addresses
> provided by sysfs /sys/module/.../sections/. Also save these details in
> PERF_RECORD_KMOD_SEC_MAP format in perf.data which can be consumed at
> perf-report time.
> 
> Without patchset:
> 
>   # perf record -a -c 5000000
>   # perf report
>   Overhead  Command          Shared Object           Symbol
>     13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
>      6.58%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e6
>      6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
>      6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
>      4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
>      4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
>      3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
>      2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
>      1.98%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015171
>      1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit
>      1.04%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e2
>      0.94%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015174
> 
>   Same perf.data with kallsyms:
> 
>   # perf report --kallsyms=/proc/kallsyms
>   Overhead  Command          Shared Object           Symbol
>     14.22%  qemu-system-x86  [kvm_amd]               [k] __svm_vcpu_run
>     13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
>      6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
>      6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
>      4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
>      4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
>      3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
>      2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
>      1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit
> 
> With patchset:
> 
>   # perf record -a -c 5000000
>   # perf report
>   Overhead  Command          Shared Object           Symbol
>     13.44%  qemu-system-x86  [kvm-amd].noinstr.text  [k] __svm_vcpu_run
>     13.25%  qemu-system-x86  [unknown]               [.] 0x000055f4c6563973
>      7.13%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
>      6.00%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
>      5.13%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
>      4.83%  qemu-system-x86  [kvm]                   [k] vcpu_run
>      3.65%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
> 
>   Same perf.data with kallsyms:
> 
>   # perf report --kallsyms=/proc/kallsyms
>   Overhead  Command          Shared Object       Symbol
>     13.44%  qemu-system-x86  [kernel.vmlinux]    [k] __svm_vcpu_run
>     13.25%  qemu-system-x86  [unknown]           [.] 0x000055f4c6563973
>      7.13%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_gdt
>      6.00%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_tr_desc
>      5.13%  qemu-system-x86  [kernel.vmlinux]    [k] svm_vcpu_run
>      4.83%  qemu-system-x86  [kernel.vmlinux]    [k] vcpu_run
>      3.65%  qemu-system-x86  [kernel.vmlinux]    [k] kvm_cpuid
> 
> This is an RFC only series. TODOs:
>  - I'm just recording module path in PERF_RECORD_KMOD_SEC_MAP. It's very
>    much possible that, at perf report time, a module file exists at the
>    same path but it's internal layout is different. I think I need to add
>    some buildid check. Any ideas?
>  - I've enabled host perf-record/report only. It doesn't work for guest
>    modules because host does not have access to guest sysfs. I'm yet to
>    figure out how to fix it. May be we can add --guest-mod-sysfs option.
>    Any ideas?
>  - Also, I'm currently assuming that module files are not compressed.
>  - I've seen perf build failures when compiling with NO_LIBELF=1.
>  - I've seen perf report not honoring --kallsyms in certain conditions.
> 
> Prepared on top of acme/perf/core (69b41ac87e4a6)
> 
> Ravi Bangoria (4):
>   perf tool: Simplify machine__create_modules() a bit
>   perf tool: Refactor perf_event__synthesize_modules()
>   perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP
>   perf tool: Fix non-".text" symbol resolution for kernel modules
> 
>  tools/lib/perf/Documentation/libperf.txt |   1 +
>  tools/lib/perf/include/perf/event.h      |  25 +++
>  tools/perf/builtin-annotate.c            |   1 +
>  tools/perf/builtin-c2c.c                 |   1 +
>  tools/perf/builtin-diff.c                |   1 +
>  tools/perf/builtin-inject.c              |   1 +
>  tools/perf/builtin-kmem.c                |   1 +
>  tools/perf/builtin-mem.c                 |   1 +
>  tools/perf/builtin-record.c              |  14 ++
>  tools/perf/builtin-report.c              |   1 +
>  tools/perf/builtin-script.c              |  13 ++
>  tools/perf/builtin-trace.c               |   1 +
>  tools/perf/util/build-id.c               |   1 +
>  tools/perf/util/data-convert-bt.c        |   1 +
>  tools/perf/util/data-convert-json.c      |   1 +
>  tools/perf/util/event.c                  |  22 ++
>  tools/perf/util/event.h                  |   5 +
>  tools/perf/util/machine.c                | 268 ++++++++++++++++++++++-
>  tools/perf/util/machine.h                |   2 +
>  tools/perf/util/map.c                    |   1 +
>  tools/perf/util/map.h                    |   4 +
>  tools/perf/util/session.c                |  17 ++
>  tools/perf/util/symbol-elf.c             |   9 +-
>  tools/perf/util/symbol.c                 |   2 +-
>  tools/perf/util/synthetic-events.c       | 136 +++++++++---
>  tools/perf/util/tool.h                   |   3 +-
>  26 files changed, 494 insertions(+), 39 deletions(-)
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
  2023-01-10  6:35 ` [RFC 0/4] " Adrian Hunter
@ 2023-01-10  8:43   ` Ravi Bangoria
  2023-01-10  8:58     ` Ravi Bangoria
  0 siblings, 1 reply; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-10  8:43 UTC (permalink / raw)
  To: Adrian Hunter, acme
  Cc: jolsa, namhyung, irogers, kan.liang, peterz, mark.rutland, mingo,
	alexander.shishkin, james.clark, german.gomez, leo.yan,
	alexey.v.bayduraev, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla, Ravi Bangoria

Hi Adrian,

On 10-Jan-23 12:05 PM, Adrian Hunter wrote:
> On 10/01/23 07:58, Ravi Bangoria wrote:
>> Kernel module elf contains executable code in non-".text" sections as
>> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
>> differs from it's binary layout because .ko elf does not contain
>> program header table.
> 
> Have you looked at using perf record --kcore option.

Nice! We can also use --kallsyms with perf report and it resolves symbols
fine.

But what about normal perf record/report? Why I'm enforcing on normal perf-
record/report is because, generally user don't specify these options, esp if
he has root privileges, he expects symbol-resolution should just work fine.
But when he sees inconsistency in symbol-resolution of the same kernel module,
he will be clueless of what's missing. This patchset is trying to solve it,
although I too feel adding section specific maps to perf.data is overkill as
--kcore or --kallsyms can also resolve those symbols.

Thanks for your feedback,
Ravi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
  2023-01-10  8:43   ` Ravi Bangoria
@ 2023-01-10  8:58     ` Ravi Bangoria
  0 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-10  8:58 UTC (permalink / raw)
  To: Adrian Hunter, acme
  Cc: jolsa, namhyung, irogers, kan.liang, peterz, mark.rutland, mingo,
	alexander.shishkin, james.clark, german.gomez, leo.yan,
	alexey.v.bayduraev, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla, Ravi Bangoria

On 10-Jan-23 2:13 PM, Ravi Bangoria wrote:
> Hi Adrian,
> 
> On 10-Jan-23 12:05 PM, Adrian Hunter wrote:
>> On 10/01/23 07:58, Ravi Bangoria wrote:
>>> Kernel module elf contains executable code in non-".text" sections as
>>> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
>>> differs from it's binary layout because .ko elf does not contain
>>> program header table.
>>
>> Have you looked at using perf record --kcore option.
> 
> Nice! We can also use --kallsyms with perf report and it resolves symbols
> fine.
> 
> But what about normal perf record/report? Why I'm enforcing on normal perf-
> record/report is because, generally user don't specify these options, esp if
> he has root privileges, he expects symbol-resolution should just work fine.
> But when he sees inconsistency in symbol-resolution of the same kernel module,
> he will be clueless of what's missing. This patchset is trying to solve it,
> although I too feel adding section specific maps to perf.data is overkill as
> --kcore or --kallsyms can also resolve those symbols.

FWIW, what this patchset does is not new. Perf already creates (pseudo) maps
for module elf sections while parsing symbol table: dso__process_kernel_symbol().
But perf does it incorrectly so this patchset is trying to fix it.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
  2023-01-10  5:58 [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
                   ` (4 preceding siblings ...)
  2023-01-10  6:35 ` [RFC 0/4] " Adrian Hunter
@ 2023-01-16  4:21 ` Ravi Bangoria
  5 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-16  4:21 UTC (permalink / raw)
  To: acme
  Cc: jolsa, namhyung, irogers, kan.liang, peterz, mark.rutland, mingo,
	alexander.shishkin, james.clark, german.gomez, leo.yan,
	adrian.hunter, alexey.v.bayduraev, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla, Ravi Bangoria

Hi all,

On 10-Jan-23 11:28 AM, Ravi Bangoria wrote:
> Kernel module elf contains executable code in non-".text" sections as
> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
> differs from it's binary layout because .ko elf does not contain
> program header table.
> 
> Perf tries to solve it by creating special maps for allocated (SHF_ALLOC)
> elf sections, but perf uses elf addresses for map address range and thus
> these special maps remains unused because no real ip falls into their
> address range.
> 
> Solve this by preparing section specific special maps using addresses
> provided by sysfs /sys/module/.../sections/. Also save these details in
> PERF_RECORD_KMOD_SEC_MAP format in perf.data which can be consumed at
> perf-report time.

Do you guys feel this is worth to fix and I shall continue? Or --kcore /
--kallsyms workarounds are sufficient?

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 3/4] perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP
  2023-01-10  5:58 ` [RFC 3/4] perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP Ravi Bangoria
@ 2023-01-16  6:14   ` Adrian Hunter
  2023-01-16 13:34     ` Ravi Bangoria
  0 siblings, 1 reply; 11+ messages in thread
From: Adrian Hunter @ 2023-01-16  6:14 UTC (permalink / raw)
  To: Ravi Bangoria, acme
  Cc: jolsa, namhyung, irogers, kan.liang, peterz, mark.rutland, mingo,
	alexander.shishkin, james.clark, german.gomez, leo.yan,
	alexey.v.bayduraev, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla

On 10/01/23 07:58, Ravi Bangoria wrote:
> Introduce, perf tool only, synthetic event type PERF_RECORD_KMOD_SEC_MAP.
> Also add stub code for it. This event will be used to save/restore kernel
> module section maps to/from perf.data file. This is needed because kernel
> module elfs does not contain program header table and thus there is no
> easy way to find out how kernel would have loaded module sections in the
> memory.

Currently machine__addnew_module_map() adds a map for a
module, and then perf_event__synthesize_modules() creates
an MMAP/MMAP2 event for it.  Why can't we do that for the
sections?

> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  tools/lib/perf/Documentation/libperf.txt |  1 +
>  tools/lib/perf/include/perf/event.h      | 25 ++++++++++++++++++++++++
>  tools/perf/util/event.c                  |  1 +
>  tools/perf/util/session.c                | 12 ++++++++++++
>  tools/perf/util/tool.h                   |  3 ++-
>  5 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/perf/Documentation/libperf.txt b/tools/lib/perf/Documentation/libperf.txt
> index a8f1a237931b..b62730b84cc5 100644
> --- a/tools/lib/perf/Documentation/libperf.txt
> +++ b/tools/lib/perf/Documentation/libperf.txt
> @@ -211,6 +211,7 @@ SYNOPSIS
>    struct perf_record_time_conv;
>    struct perf_record_header_feature;
>    struct perf_record_compressed;
> +  struct perf_record_kmod_sec_maps;
>  --
>  
>  DESCRIPTION
> diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
> index ad47d7b31046..404b23b6902b 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -438,6 +438,29 @@ struct perf_record_compressed {
>  	char			 data[];
>  };
>  
> +/* Kernel module elf section maps */
> +struct perf_record_kmod_sec_map {
> +	struct perf_event_header header;
> +	/* Machine id. Same as synthesized PERF_RECORD_MMAP */
> +	__u32			 pid;
> +	/* Section start ip address */
> +	__u64			 start;
> +	/* Section length */
> +	__u64			 len;
> +	/* Section page offset in kernel module elf file */
> +	__u64			 pgoff;
> +	/* Section name length, including '\0' */
> +	__u16			 sec_name_len;
> +	/* Kernel module filename(path) length, including '\0' */
> +	__u16			 filename_len;
> +	/*
> +	 * Section name and filename stored as: "sec_name\0filename\0". i.e:
> +	 * data[0]: Section name
> +	 * data[sec_name_len + 1]: File name
> +	 */
> +	char			 data[];
> +};
> +
>  enum perf_user_event_type { /* above any possible kernel type */
>  	PERF_RECORD_USER_TYPE_START		= 64,
>  	PERF_RECORD_HEADER_ATTR			= 64,
> @@ -459,6 +482,7 @@ enum perf_user_event_type { /* above any possible kernel type */
>  	PERF_RECORD_HEADER_FEATURE		= 80,
>  	PERF_RECORD_COMPRESSED			= 81,
>  	PERF_RECORD_FINISHED_INIT		= 82,
> +	PERF_RECORD_KMOD_SEC_MAP		= 83,
>  	PERF_RECORD_HEADER_MAX
>  };
>  
> @@ -499,6 +523,7 @@ union perf_event {
>  	struct perf_record_time_conv		time_conv;
>  	struct perf_record_header_feature	feat;
>  	struct perf_record_compressed		pack;
> +	struct perf_record_kmod_sec_map		kmod_sec_map;
>  };
>  
>  #endif /* __LIBPERF_EVENT_H */
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 1fa14598b916..1b03061440bc 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -77,6 +77,7 @@ static const char *perf_event__names[] = {
>  	[PERF_RECORD_HEADER_FEATURE]		= "FEATURE",
>  	[PERF_RECORD_COMPRESSED]		= "COMPRESSED",
>  	[PERF_RECORD_FINISHED_INIT]		= "FINISHED_INIT",
> +	[PERF_RECORD_KMOD_SEC_MAP]		= "KMOD_SEC_MAP",
>  };
>  
>  const char *perf_event__name(unsigned int id)
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 7c021c6cedb9..4f5165cd58de 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -563,6 +563,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>  		tool->compressed = perf_session__process_compressed_event;
>  	if (tool->finished_init == NULL)
>  		tool->finished_init = process_event_op2_stub;
> +	if (tool->kmod_sec_map == NULL)
> +		tool->kmod_sec_map = process_event_stub;
>  }
>  
>  static void swap_sample_id_all(union perf_event *event, void *data)
> @@ -997,6 +999,12 @@ static void perf_event__time_conv_swap(union perf_event *event,
>  	}
>  }
>  
> +static void perf_event_kmod_sec_map_swap(union perf_event *event __maybe_unused,
> +					  bool sample_id_all __maybe_unused)
> +{
> +	/* FIXME */
> +}
> +
>  typedef void (*perf_event__swap_op)(union perf_event *event,
>  				    bool sample_id_all);
>  
> @@ -1035,6 +1043,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
>  	[PERF_RECORD_STAT_ROUND]	  = perf_event__stat_round_swap,
>  	[PERF_RECORD_EVENT_UPDATE]	  = perf_event__event_update_swap,
>  	[PERF_RECORD_TIME_CONV]		  = perf_event__time_conv_swap,
> +	[PERF_RECORD_KMOD_SEC_MAP]	  = perf_event_kmod_sec_map_swap,
>  	[PERF_RECORD_HEADER_MAX]	  = NULL,
>  };
>  
> @@ -1727,6 +1736,9 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>  		return err;
>  	case PERF_RECORD_FINISHED_INIT:
>  		return tool->finished_init(session, event);
> +	case PERF_RECORD_KMOD_SEC_MAP:
> +		/* Currently PERF_RECORD_KMOD_SEC_MAP is supported only for host */
> +		return tool->kmod_sec_map(tool, event, &sample, &session->machines.host);
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index c957fb849ac6..8ea7fb85c196 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -60,7 +60,8 @@ struct perf_tool {
>  			unthrottle,
>  			ksymbol,
>  			bpf,
> -			text_poke;
> +			text_poke,
> +			kmod_sec_map;
>  
>  	event_attr_op	attr;
>  	event_attr_op	event_update;


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 3/4] perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP
  2023-01-16  6:14   ` Adrian Hunter
@ 2023-01-16 13:34     ` Ravi Bangoria
  0 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2023-01-16 13:34 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: acme, jolsa, namhyung, irogers, kan.liang, peterz, mark.rutland,
	mingo, alexander.shishkin, james.clark, german.gomez, leo.yan,
	alexey.v.bayduraev, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla, Ravi Bangoria

Hi Adrian,

On 16-Jan-23 11:44 AM, Adrian Hunter wrote:
> On 10/01/23 07:58, Ravi Bangoria wrote:
>> Introduce, perf tool only, synthetic event type PERF_RECORD_KMOD_SEC_MAP.
>> Also add stub code for it. This event will be used to save/restore kernel
>> module section maps to/from perf.data file. This is needed because kernel
>> module elfs does not contain program header table and thus there is no
>> easy way to find out how kernel would have loaded module sections in the
>> memory.
> 
> Currently machine__addnew_module_map() adds a map for a
> module, and then perf_event__synthesize_modules() creates
> an MMAP/MMAP2 event for it.  Why can't we do that for the
> sections?

Do you mean why not use MMAP/MMAP2 for module sections as well?
I thought about that but:
- PERF_RECORD_MMAP has no field which can help segregate normal vs section
  specific maps.
- I can probably introduce special MAP_ flag and use PERF_RECORD_MMAP2. But
  that flag will be non-standard (perf tool only), which is ugly.
- I also need additional hacks while synthesizing section specific MMAP/
  MMAP2 (e.g.: store section name and file path in filename[]) and similar
  hacks while parsing them at perf report time.

So I felt introducing PERF_RECORD_KMOD_SEC_MAP is relatively cleaner approach.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-01-16 13:34 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-10  5:58 [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
2023-01-10  5:58 ` [RFC 1/4] perf tool: Simplify machine__create_modules() a bit Ravi Bangoria
2023-01-10  5:58 ` [RFC 2/4] perf tool: Refactor perf_event__synthesize_modules() Ravi Bangoria
2023-01-10  5:58 ` [RFC 3/4] perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP Ravi Bangoria
2023-01-16  6:14   ` Adrian Hunter
2023-01-16 13:34     ` Ravi Bangoria
2023-01-10  5:58 ` [RFC 4/4] perf tool: Fix non-".text" symbol resolution for kernel modules Ravi Bangoria
2023-01-10  6:35 ` [RFC 0/4] " Adrian Hunter
2023-01-10  8:43   ` Ravi Bangoria
2023-01-10  8:58     ` Ravi Bangoria
2023-01-16  4:21 ` Ravi Bangoria

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).