linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] tracing: Have boot instance use reserve_mem option and use fgraph tracer
@ 2024-08-13 17:11 Steven Rostedt
  2024-08-13 17:11 ` [PATCH 1/2] tracing: Allow boot instances to use reserve_mem boot memory Steven Rostedt
  2024-08-13 17:11 ` [PATCH 2/2] tracing/fgraph: Have fgraph handle previous boot function addresses Steven Rostedt
  0 siblings, 2 replies; 3+ messages in thread
From: Steven Rostedt @ 2024-08-13 17:11 UTC
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Ross Zwisler, Vincent Donnefort


Now that the "reserve_mem" kernel command line option is upstream, add a
patch to use it with the ring buffer boot up mappings. For example:

  reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace

This will allocate 12 megabytes at boot up, aligned to 4096 bytes, and
label the region "trace". A trace_instance with the name "boot_mapped" will
be created on top of that memory.
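
As a rough illustration of how the value after '@' can be told apart, here
is a user-space sketch modeled on the check that patch 1 adds to
enable_instances(): a value starting with a digit is treated as
"address:size", anything else as a reserve_mem label. The names
classify_instance(), parse_size(), struct instance_spec and the BACKING_*
values are hypothetical and exist only in this sketch; parse_size() is a
simplified stand-in for the kernel's memparse(), and the optional event
list after a comma is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch only */
enum backing { BACKING_NONE, BACKING_ADDRESS, BACKING_LABEL, BACKING_INVALID };

struct instance_spec {
	char name[32];             /* instance name, e.g. "boot_mapped" */
	char label[32];            /* reserve_mem label (BACKING_LABEL) */
	unsigned long long start;  /* physical start (BACKING_ADDRESS) */
	unsigned long long size;   /* size in bytes (BACKING_ADDRESS) */
};

/* Simplified stand-in for memparse(): a number with an optional K/M/G suffix */
static unsigned long long parse_size(const char *s, const char **end)
{
	char *e;
	unsigned long long v = strtoull(s, &e, 0);

	switch (*e) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; e++; break;
	}
	*end = e;
	return v;
}

/* Split "name[@backing]" and decide which kind of backing was requested */
static enum backing classify_instance(const char *arg, struct instance_spec *out)
{
	const char *at, *p;
	size_t len;

	assert(arg && out);
	memset(out, 0, sizeof(*out));

	at = strchr(arg, '@');
	len = at ? (size_t)(at - arg) : strlen(arg);
	if (len == 0 || len >= sizeof(out->name))
		return BACKING_INVALID;
	memcpy(out->name, arg, len);

	if (!at)
		return BACKING_NONE;	/* ordinary instance, no fixed memory */

	p = at + 1;
	if (isdigit((unsigned char)*p)) {
		/* Leading digit: expect "start:size" */
		out->start = parse_size(p, &p);
		if (!out->start || *p != ':')
			return BACKING_INVALID;
		out->size = parse_size(p + 1, &p);
		return out->size ? BACKING_ADDRESS : BACKING_INVALID;
	}

	/* Anything else is taken as a reserve_mem label to look up by name */
	len = strlen(p);
	if (len == 0 || len >= sizeof(out->label))
		return BACKING_INVALID;
	memcpy(out->label, p, len);
	return BACKING_LABEL;
}
```

With this sketch, "boot_mapped@trace" is classified as a label lookup and
"boot_mapped@0x284500000:12M" as an explicit address and size. A label that
happens to start with a digit would be taken as an address, which is also
how the isdigit() test in the patch behaves.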

The documentation has been updated to describe this. It notes that, due to
KASLR, the reserved memory may not land at the same location on every boot,
in which case the buffer content is not preserved. It also notes that the
layout of the ring buffer memory may change with new kernel versions, which
will clear the previous buffer.

Also, now that function graph tracing can be used by trace instances,
update its code so that it can be used with this boot mapped buffer. This
can give a nicer trace of a reboot:

           swapper/0-1       [000] d..1.   363.079162:  0)               |              lapic_shutdown() {
           swapper/0-1       [000] d..1.   363.079163:  0)               |                disable_local_APIC() {
           swapper/0-1       [000] d..1.   363.079163:  0) + 26.144 us   |                  clear_local_APIC.part.0();
           swapper/0-1       [000] d....   363.079192:  0) + 29.424 us   |                }
           swapper/0-1       [000] d....   363.079192:  0) + 30.376 us   |              }
           swapper/0-1       [000] d..1.   363.079193:  0)               |              restore_boot_irq_mode() {
           swapper/0-1       [000] d..1.   363.079194:  0)               |                native_restore_boot_irq_mode() {
           swapper/0-1       [000] d..1.   363.079194:  0) + 13.863 us   |                  disconnect_bsp_APIC();
           swapper/0-1       [000] d....   363.079209:  0) + 14.933 us   |                }
           swapper/0-1       [000] d....   363.079209:  0) + 16.009 us   |              }
           swapper/0-1       [000] d..1.   363.079210:  0)   0.694 us    |              hpet_disable();
           swapper/0-1       [000] d..1.   363.079211:  0)   0.511 us    |              iommu_shutdown_noop();
           swapper/0-1       [000] d....   363.079212:  0) # 3980.260 us |            }
           swapper/0-1       [000] d..1.   363.079212:  0)               |            native_machine_emergency_restart() {
           swapper/0-1       [000] d..1.   363.079213:  0)   0.495 us    |              tboot_shutdown();
           swapper/0-1       [000] d..1.   363.079230:  0)               |              acpi_reboot() {
           swapper/0-1       [000] d..1.   363.079231:  0)               |                acpi_reset() {
           swapper/0-1       [000] d..1.   363.079232:  0)               |                  acpi_os_write_port() {

This is based on top of:

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
     branch: ring-buffer/for-next

Which was supposed to go in the last merge window, but due to
miscommunication, it did not. As it has been in linux-next, I do not want to
rebase it, so instead I merged in v6.11-rc1 to get access to the
reserve_mem kernel command line parameter and applied these patches on top.


Steven Rostedt (1):
      tracing/fgraph: Have fgraph handle previous boot function addresses

Steven Rostedt (Google) (1):
      tracing: Allow boot instances to use reserve_mem boot memory

----
 Documentation/admin-guide/kernel-parameters.txt | 13 +++++++++++++
 kernel/trace/trace.c                            | 19 +++++++++++++------
 kernel/trace/trace_functions_graph.c            | 23 ++++++++++++++++++-----
 3 files changed, 44 insertions(+), 11 deletions(-)


* [PATCH 1/2] tracing: Allow boot instances to use reserve_mem boot memory
  2024-08-13 17:11 [PATCH 0/2] tracing: Have boot instance use reserve_mem option and use fgraph tracer Steven Rostedt
@ 2024-08-13 17:11 ` Steven Rostedt
  2024-08-13 17:11 ` [PATCH 2/2] tracing/fgraph: Have fgraph handle previous boot function addresses Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2024-08-13 17:11 UTC
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Ross Zwisler, Vincent Donnefort

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Allow boot instances to use memory reserved by the reserve_mem boot
option.

  reserve_mem=12M:4096:trace  trace_instance=boot_mapped@trace

The above will allocate 12 megs with 4096 alignment and label it "trace".
The second parameter will create a "boot_mapped" instance and use the
memory reserved and labeled as "trace" as the memory for the ring buffer.

That will create an instance called "boot_mapped":

  /sys/kernel/tracing/instances/boot_mapped

Note, because the ring buffer is using a defined memory range, it will
act just like a memory mapped ring buffer. It will not have a snapshot
buffer, as it can't swap out the buffer. The snapshot files, as well as any
tracers that use a snapshot, will not be present in the boot_mapped
instance.

Also note that reserve_mem is not reliable in acquiring the same physical
memory at each soft reboot. It is possible that KASLR could map the kernel
at the previous boot memory location, forcing reserve_mem to return a
different memory location. In this case, the previous ring buffer will be
lost.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 .../admin-guide/kernel-parameters.txt         | 13 +++++++++++++
 kernel/trace/trace.c                          | 19 +++++++++++++------
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c688bc6e9153..f91a68bc8e77 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6754,6 +6754,19 @@
 			memory at 0x284500000 that is 12Megs. The per CPU buffers of that
 			instance will be split up accordingly.
 
+			Alternatively, the memory can be reserved by the reserve_mem option:
+
+				reserve_mem=12M:4096:trace trace_instance=boot_map@trace
+
+			This will reserve 12 megabytes at boot up with a 4096 byte alignment
+			and place the ring buffer in this memory. Note that due to KASLR, the
+			memory may not be the same location each time, which will not preserve
+			the buffer content.
+
+			Also note that the layout of the ring buffer data may change between
+			kernel versions where the validator will fail and reset the ring buffer
+			if the layout is not the same as the previous kernel.
+
 	trace_options=[option-list]
 			[FTRACE] Enable or disable tracer options at boot.
 			The option-list is a comma delimited list of options
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8e5a4ca9fd70..c93a8dc69c69 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10465,22 +10465,20 @@ __init static void enable_instances(void)
 	str = boot_instance_info;
 
 	while ((curr_str = strsep(&str, "\t"))) {
-		unsigned long start = 0;
-		unsigned long size = 0;
+		phys_addr_t start = 0;
+		phys_addr_t size = 0;
 		unsigned long addr = 0;
 
 		tok = strsep(&curr_str, ",");
 		name = strsep(&tok, "@");
-		if (tok) {
+
+		if (tok && isdigit(*tok)) {
 			start = memparse(tok, &tok);
 			if (!start) {
 				pr_warn("Tracing: Invalid boot instance address for %s\n",
 					name);
 				continue;
 			}
-		}
-
-		if (start) {
 			if (*tok != ':') {
 				pr_warn("Tracing: No size specified for instance %s\n", name);
 				continue;
@@ -10492,6 +10490,15 @@ __init static void enable_instances(void)
 					name);
 				continue;
 			}
+		} else if (tok) {
+			if (!reserve_mem_find_by_name(tok, &start, &size)) {
+				start = 0;
+				pr_warn("Failed to map boot instance %s to %s\n", name, tok);
+				continue;
+			}
+		}
+
+		if (start) {
 			addr = map_pages(start, size);
 			if (addr) {
 				pr_info("Tracing: mapped boot instance %s at physical memory 0x%lx of size 0x%lx\n",
-- 
2.43.0




* [PATCH 2/2] tracing/fgraph: Have fgraph handle previous boot function addresses
  2024-08-13 17:11 [PATCH 0/2] tracing: Have boot instance use reserve_mem option and use fgraph tracer Steven Rostedt
  2024-08-13 17:11 ` [PATCH 1/2] tracing: Allow boot instances to use reserve_mem boot memory Steven Rostedt
@ 2024-08-13 17:11 ` Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2024-08-13 17:11 UTC
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Ross Zwisler, Vincent Donnefort

From: Steven Rostedt <rostedt@goodmis.org>

Update the function graph code to modify the function addresses for a
previous boot buffer so that it matches the current kallsyms (note this
does not handle module addresses, yet).
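
The adjustment itself is just an addition of the instance's text_delta to
each saved function address before it is printed with %ps, as the diff
below does. The following user-space sketch shows the idea with a made-up
symbol table. It assumes the delta is the difference between the current
and the previous kernel text base; the names cur_syms, lookup_sym() and
resolve_prev_boot(), and all of the addresses, are hypothetical and only
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical text base of the current boot */
#define CUR_TEXT 0xffffffff81000000ULL

struct sym {
	uint64_t addr;
	const char *name;
};

/* Made-up symbol table for the current boot, sorted by address */
static const struct sym cur_syms[] = {
	{ CUR_TEXT + 0x1000, "lapic_shutdown" },
	{ CUR_TEXT + 0x2000, "disable_local_APIC" },
	{ CUR_TEXT + 0x3000, "hpet_disable" },
};

/* Return the name of the last symbol at or below addr, or NULL if none */
static const char *lookup_sym(uint64_t addr)
{
	const char *name = NULL;
	size_t i;

	for (i = 0; i < sizeof(cur_syms) / sizeof(cur_syms[0]); i++) {
		if (cur_syms[i].addr > addr)
			break;
		name = cur_syms[i].name;
	}
	return name;
}

/*
 * Translate an address recorded in the previous boot into the current
 * boot's address space and resolve it. The subtraction is done on
 * unsigned values, so it works whether the kernel moved up or down.
 */
static const char *resolve_prev_boot(uint64_t saved, uint64_t prev_text,
				     uint64_t cur_text)
{
	uint64_t text_delta = cur_text - prev_text;

	return lookup_sym(saved + text_delta);
}
```

In this sketch, an address saved at offset 0x2010 from a previous text base
of 0xffffffff92000000 resolves to "disable_local_APIC" once the delta is
applied. Without the delta it would match nothing in the current table,
which corresponds to the raw hex addresses shown in the "before" output
below.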

After a reboot, instead of seeing:

 # trace-cmd show -B boot_mapped | tail -n30
       swapper/0-1       [000] d..2.    56.286470:  0)   0.481 us    |                    0xffffffff925da5c4();
       swapper/0-1       [000] d....    56.286471:  0)   4.065 us    |                  }
       swapper/0-1       [000] d....    56.286471:  0)   4.920 us    |                }
       swapper/0-1       [000] d..1.    56.286472:  0)               |                0xffffffff92536254() {
       swapper/0-1       [000] d..1.    56.286472:  0) + 28.974 us   |                  0xffffffff92534e30();
       swapper/0-1       [000] d....    56.286516:  0) + 43.881 us   |                }
       swapper/0-1       [000] d..1.    56.286517:  0)               |                0xffffffff925136c4() {
       swapper/0-1       [000] d..1.    56.286518:  0)               |                  0xffffffff92514a14() {
       swapper/0-1       [000] d..1.    56.286518:  0)   6.003 us    |                    0xffffffff92514200();
       swapper/0-1       [000] d....    56.286529:  0) + 11.510 us   |                  }
       swapper/0-1       [000] d....    56.286529:  0) + 12.895 us   |                }
       swapper/0-1       [000] d....    56.286530:  0) ! 382.884 us  |              }
       swapper/0-1       [000] d..1.    56.286530:  0)               |              0xffffffff92536444() {
       swapper/0-1       [000] d..1.    56.286531:  0)               |                0xffffffff92536254() {
       swapper/0-1       [000] d..1.    56.286531:  0) + 26.335 us   |                  0xffffffff92534e30();
       swapper/0-1       [000] d....    56.286560:  0) + 29.511 us   |                }
       swapper/0-1       [000] d....    56.286561:  0) + 30.452 us   |              }
       swapper/0-1       [000] d..1.    56.286562:  0)               |              0xffffffff9253c014() {
       swapper/0-1       [000] d..1.    56.286562:  0)               |                0xffffffff9253bed4() {
       swapper/0-1       [000] d..1.    56.286563:  0) + 13.465 us   |                  0xffffffff92536684();
       swapper/0-1       [000] d....    56.286577:  0) + 14.651 us   |                }
       swapper/0-1       [000] d....    56.286577:  0) + 15.821 us   |              }
       swapper/0-1       [000] d..1.    56.286578:  0)   0.667 us    |              0xffffffff92547074();
       swapper/0-1       [000] d..1.    56.286579:  0)   0.453 us    |              0xffffffff924f35c4();
       swapper/0-1       [000] d....    56.286580:  0) # 3906.348 us |            }
       swapper/0-1       [000] d..1.    56.286581:  0)               |            0xffffffff92531a14() {
       swapper/0-1       [000] d..1.    56.286581:  0)   0.518 us    |              0xffffffff92505cb4();
       swapper/0-1       [000] d..1.    56.286595:  0)               |              0xffffffff92db83c4() {
       swapper/0-1       [000] d..1.    56.286596:  0)               |                0xffffffff92dec2e4() {
       swapper/0-1       [000] d..1.    56.286597:  0)               |                  0xffffffff92db5304() {

It now shows:

 # trace-cmd show -B boot_mapped | tail -n30
       swapper/0-1       [000] d..2.   363.079099:  0)   0.483 us    |                    preempt_count_sub();
       swapper/0-1       [000] d....   363.079100:  0)   4.112 us    |                  }
       swapper/0-1       [000] d....   363.079101:  0)   4.979 us    |                }
       swapper/0-1       [000] d..1.   363.079101:  0)               |                disable_local_APIC() {
       swapper/0-1       [000] d..1.   363.079102:  0) + 29.153 us   |                  clear_local_APIC.part.0();
       swapper/0-1       [000] d....   363.079148:  0) + 46.517 us   |                }
       swapper/0-1       [000] d..1.   363.079149:  0)               |                mcheck_cpu_clear() {
       swapper/0-1       [000] d..1.   363.079149:  0)               |                  mce_intel_feature_clear() {
       swapper/0-1       [000] d..1.   363.079150:  0)   5.871 us    |                    lmce_supported();
       swapper/0-1       [000] d....   363.079161:  0) + 11.340 us   |                  }
       swapper/0-1       [000] d....   363.079161:  0) + 12.638 us   |                }
       swapper/0-1       [000] d....   363.079162:  0) ! 383.518 us  |              }
       swapper/0-1       [000] d..1.   363.079162:  0)               |              lapic_shutdown() {
       swapper/0-1       [000] d..1.   363.079163:  0)               |                disable_local_APIC() {
       swapper/0-1       [000] d..1.   363.079163:  0) + 26.144 us   |                  clear_local_APIC.part.0();
       swapper/0-1       [000] d....   363.079192:  0) + 29.424 us   |                }
       swapper/0-1       [000] d....   363.079192:  0) + 30.376 us   |              }
       swapper/0-1       [000] d..1.   363.079193:  0)               |              restore_boot_irq_mode() {
       swapper/0-1       [000] d..1.   363.079194:  0)               |                native_restore_boot_irq_mode() {
       swapper/0-1       [000] d..1.   363.079194:  0) + 13.863 us   |                  disconnect_bsp_APIC();
       swapper/0-1       [000] d....   363.079209:  0) + 14.933 us   |                }
       swapper/0-1       [000] d....   363.079209:  0) + 16.009 us   |              }
       swapper/0-1       [000] d..1.   363.079210:  0)   0.694 us    |              hpet_disable();
       swapper/0-1       [000] d..1.   363.079211:  0)   0.511 us    |              iommu_shutdown_noop();
       swapper/0-1       [000] d....   363.079212:  0) # 3980.260 us |            }
       swapper/0-1       [000] d..1.   363.079212:  0)               |            native_machine_emergency_restart() {
       swapper/0-1       [000] d..1.   363.079213:  0)   0.495 us    |              tboot_shutdown();
       swapper/0-1       [000] d..1.   363.079230:  0)               |              acpi_reboot() {
       swapper/0-1       [000] d..1.   363.079231:  0)               |                acpi_reset() {
       swapper/0-1       [000] d..1.   363.079232:  0)               |                  acpi_os_write_port() {

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_functions_graph.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 13d0387ac6a6..a569daaac4c4 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -544,6 +544,8 @@ print_graph_irq(struct trace_iterator *iter, unsigned long addr,
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *ent = iter->ent;
 
+	addr += iter->tr->text_delta;
+
 	if (addr < (unsigned long)__irqentry_text_start ||
 		addr >= (unsigned long)__irqentry_text_end)
 		return;
@@ -710,6 +712,7 @@ print_graph_entry_leaf(struct trace_iterator *iter,
 	struct ftrace_graph_ret *graph_ret;
 	struct ftrace_graph_ent *call;
 	unsigned long long duration;
+	unsigned long func;
 	int cpu = iter->cpu;
 	int i;
 
@@ -717,6 +720,8 @@ print_graph_entry_leaf(struct trace_iterator *iter,
 	call = &entry->graph_ent;
 	duration = graph_ret->rettime - graph_ret->calltime;
 
+	func = call->func + iter->tr->text_delta;
+
 	if (data) {
 		struct fgraph_cpu_data *cpu_data;
 
@@ -747,10 +752,10 @@ print_graph_entry_leaf(struct trace_iterator *iter,
 	 * enabled.
 	 */
 	if (flags & __TRACE_GRAPH_PRINT_RETVAL)
-		print_graph_retval(s, graph_ret->retval, true, (void *)call->func,
+		print_graph_retval(s, graph_ret->retval, true, (void *)func,
 				!!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
 	else
-		trace_seq_printf(s, "%ps();\n", (void *)call->func);
+		trace_seq_printf(s, "%ps();\n", (void *)func);
 
 	print_graph_irq(iter, graph_ret->func, TRACE_GRAPH_RET,
 			cpu, iter->ent->pid, flags);
@@ -766,6 +771,7 @@ print_graph_entry_nested(struct trace_iterator *iter,
 	struct ftrace_graph_ent *call = &entry->graph_ent;
 	struct fgraph_data *data = iter->private;
 	struct trace_array *tr = iter->tr;
+	unsigned long func;
 	int i;
 
 	if (data) {
@@ -788,7 +794,9 @@ print_graph_entry_nested(struct trace_iterator *iter,
 	for (i = 0; i < call->depth * TRACE_GRAPH_INDENT; i++)
 		trace_seq_putc(s, ' ');
 
-	trace_seq_printf(s, "%ps() {\n", (void *)call->func);
+	func = call->func + iter->tr->text_delta;
+
+	trace_seq_printf(s, "%ps() {\n", (void *)func);
 
 	if (trace_seq_has_overflowed(s))
 		return TRACE_TYPE_PARTIAL_LINE;
@@ -863,6 +871,8 @@ check_irq_entry(struct trace_iterator *iter, u32 flags,
 	int *depth_irq;
 	struct fgraph_data *data = iter->private;
 
+	addr += iter->tr->text_delta;
+
 	/*
 	 * If we are either displaying irqs, or we got called as
 	 * a graph event and private data does not exist,
@@ -990,11 +1000,14 @@ print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
 	unsigned long long duration = trace->rettime - trace->calltime;
 	struct fgraph_data *data = iter->private;
 	struct trace_array *tr = iter->tr;
+	unsigned long func;
 	pid_t pid = ent->pid;
 	int cpu = iter->cpu;
 	int func_match = 1;
 	int i;
 
+	func = trace->func + iter->tr->text_delta;
+
 	if (check_irq_return(iter, flags, trace->depth))
 		return TRACE_TYPE_HANDLED;
 
@@ -1033,7 +1046,7 @@ print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
 	 * function-retval option is enabled.
 	 */
 	if (flags & __TRACE_GRAPH_PRINT_RETVAL) {
-		print_graph_retval(s, trace->retval, false, (void *)trace->func,
+		print_graph_retval(s, trace->retval, false, (void *)func,
 			!!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
 	} else {
 		/*
@@ -1046,7 +1059,7 @@ print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
 		if (func_match && !(flags & TRACE_GRAPH_PRINT_TAIL))
 			trace_seq_puts(s, "}\n");
 		else
-			trace_seq_printf(s, "} /* %ps */\n", (void *)trace->func);
+			trace_seq_printf(s, "} /* %ps */\n", (void *)func);
 	}
 
 	/* Overrun */
-- 
2.43.0


