[PATCH v1] samples/bpf: Add a trace tool with perf PMU counters

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters
@ 2025-01-19 15:33 Leo Yan
  2025-01-20 16:18 ` Daniel Borkmann
  0 siblings, 1 reply; 6+ messages in thread
From: Leo Yan @ 2025-01-19 15:33 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	James Clark, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	linux-kernel, bpf
  Cc: Leo Yan

Developers might need to profile a program with fine-grained
granularity.  E.g., a user case is to account the CPU cycles for a small
program or for a specific function within the program.

This commit introduces a small tool with using eBPF program to read the
perf PMU counters for performance metrics.  As the first step, the four
counters are supported with the '-e' option: cycles, instructions,
branches, branch-misses.

The '-r' option is provided for support raw event number.  This option
is mutually exclusive to the '-e' option, users either pass a raw event
number or a counter name.

The tool enables the counters for the entire trace session in free-run
mode.  It reads the beginning values for counters when the profiled
program is scheduled in, and calculate the interval when the task is
scheduled out.  The advantage of this approach is to dismiss the
statistics noise (e.g. caused by the tool itself) as possible.

The tool can support function based tracing.  By using the '-f' option,
users can specify the traced function.  The eBPF program enables tracing
at the function entry and disables trace upon exit from the function.

The '-u' option can be specified for tracing user mode only.

Below are several usage examples.

Trace CPU cycles for the whole program:

  # ./trace_counter -e cycles -- /mnt/sort
  Or
  # ./trace_counter -e cycles /mnt/sort
  Create process for the workload.
  Enable the event cycles.
  Bubble sorting array of 3000 elements
  551 ms
  Finished the workload.
  Event (cycles) statistics:
   +-----------+------------------+
   | CPU[0000] |         29093250 |
   +-----------+------------------+
   | CPU[0002] |         75672820 |
   +-----------+------------------+
   | CPU[0006] |       1067458735 |
   +-----------+------------------+
     Total     :       1172224805

Trace branches for the user mode only:

  # ./trace_counter -e branches -u -- /mnt/sort
  Create process for the workload.
  Enable the event branches.
  Bubble sorting array of 3000 elements
  541 ms
  Finished the workload.
  Event (branches) statistics:
   +-----------+------------------+
   | CPU[0007] |         88112669 |
   +-----------+------------------+
     Total     :         88112669

Trace instructions for the 'bubble_sort' function:

  # ./trace_counter -f bubble_sort -e instructions -- /mnt/sort
  Create process for the workload.
  Enable the event instructions.
  Bubble sorting array of 3000 elements
  541 ms
  Finished the workload.
  Event (instructions) statistics:
   +-----------+------------------+
   | CPU[0006] |       1169810201 |
   +-----------+------------------+
     Total     :       1169810201
  Function (bubble_sort) duration statistics:
     Count     :                5
     Minimum   :        232009928
     Maximum   :        236742006
     Average   :        233962040

Trace the raw event '0x5' (L1D_TLB_REFILL):

  # ./trace_counter -r 0x5 -u -- /mnt/sort
  Create process for the workload.
  Enable the raw event 0x5.
  Bubble sorting array of 3000 elements
  540 ms
  Finished the workload.
  Event (0x5) statistics:
   +-----------+------------------+
   | CPU[0007] |              174 |
   +-----------+------------------+
     Total     :              174

Trace for the function and set CPU affinity for the profiled program:

  # ./trace_counter -f bubble_sort -x /mnt/sort -e cycles \
	-- taskset -c 2 /mnt/sort
  Create process for the workload.
  Enable the event cycles.
  Bubble sorting array of 3000 elements
  619 ms
  Finished the workload.
  Event (cycles) statistics:
   +-----------+------------------+
   | CPU[0002] |       1169913056 |
   +-----------+------------------+
     Total     :       1169913056
  Function (bubble_sort) duration statistics:
     Count     :                5
     Minimum   :        232054101
     Maximum   :        236769623
     Average   :        233982611

The command above sets the CPU affinity with taskset command.  The
profiled function 'bubble_sort' is in the executable '/mnt/sort' but not
in the taskset binary.  The '-x' option is used to tell the tool the
correct executable path.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 samples/bpf/Makefile             |   7 +-
 samples/bpf/trace_counter.bpf.c  | 222 +++++++++++++
 samples/bpf/trace_counter_user.c | 528 +++++++++++++++++++++++++++++++
 3 files changed, 756 insertions(+), 1 deletion(-)
 create mode 100644 samples/bpf/trace_counter.bpf.c
 create mode 100644 samples/bpf/trace_counter_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index dd9944a97b7e..59123c7f4443 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -23,6 +23,7 @@ tprogs-y += offwaketime
 tprogs-y += spintest
 tprogs-y += map_perf_test
 tprogs-y += xdp_router_ipv4
+tprogs-y += trace_counter
 tprogs-y += trace_event
 tprogs-y += sampleip
 tprogs-y += tc_l2_redirect
@@ -64,6 +65,7 @@ offwaketime-objs := offwaketime_user.o $(TRACE_HELPERS)
 spintest-objs := spintest_user.o $(TRACE_HELPERS)
 map_perf_test-objs := map_perf_test_user.o
 test_overhead-objs := test_overhead_user.o
+trace_counter-objs := trace_counter_user.o $(TRACE_HELPERS)
 trace_event-objs := trace_event_user.o $(TRACE_HELPERS)
 sampleip-objs := sampleip_user.o $(TRACE_HELPERS)
 tc_l2_redirect-objs := tc_l2_redirect_user.o
@@ -99,6 +101,7 @@ always-y += offwaketime.bpf.o
 always-y += spintest.bpf.o
 always-y += map_perf_test.bpf.o
 always-y += parse_varlen.o parse_simple.o parse_ldabs.o
+always-y += trace_counter.bpf.o
 always-y += trace_event_kern.o
 always-y += sampleip_kern.o
 always-y += lwt_len_hist.bpf.o
@@ -283,6 +286,7 @@ $(obj)/$(TRACE_HELPERS) $(obj)/$(CGROUP_HELPERS) $(obj)/$(XDP_SAMPLE): | libbpf_
 .PHONY: libbpf_hdrs
 
 $(obj)/xdp_router_ipv4_user.o: $(obj)/xdp_router_ipv4.skel.h
+$(obj)/trace_counter_user.o: $(obj)/trace_counter.skel.h
 
 $(obj)/tracex5.bpf.o: $(obj)/syscall_nrs.h
 $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
@@ -347,10 +351,11 @@ $(obj)/%.bpf.o: $(src)/%.bpf.c $(obj)/vmlinux.h $(src)/xdp_sample.bpf.h $(src)/x
 		-I$(LIBBPF_INCLUDE) $(CLANG_SYS_INCLUDES) \
 		-c $(filter %.bpf.c,$^) -o $@
 
-LINKED_SKELS := xdp_router_ipv4.skel.h
+LINKED_SKELS := xdp_router_ipv4.skel.h trace_counter.skel.h
 clean-files += $(LINKED_SKELS)
 
 xdp_router_ipv4.skel.h-deps := xdp_router_ipv4.bpf.o xdp_sample.bpf.o
+trace_counter.skel.h-deps := trace_counter.bpf.o
 
 LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.bpf.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
 
diff --git a/samples/bpf/trace_counter.bpf.c b/samples/bpf/trace_counter.bpf.c
new file mode 100644
index 000000000000..e7c1e93dcd83
--- /dev/null
+++ b/samples/bpf/trace_counter.bpf.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <linux/version.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__type(key, int);
+	__type(value, u32);
+	__uint(max_entries, 1);
+} counters SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, u64);
+	__uint(max_entries, 1);
+} values_begin SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, u64);
+	__uint(max_entries, 1);
+} values SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, u64);
+	__uint(max_entries, 1);
+} task_filter SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, u64);
+	__uint(max_entries, 4);
+} func_stat SEC(".maps");
+
+#define TRACE_MODE_FUNCTION	(1 << 0)
+
+const volatile int trace_mode;
+
+static volatile bool func_trace_enabled;
+static u64 func_duration;
+
+static bool trace_is_enabled(void)
+{
+	/* The trace_mode is zero, traces for whole program. */
+	if (!trace_mode)
+		return true;
+
+	/* Tracing for the function will wait until the function is hit. */
+	if ((trace_mode & TRACE_MODE_FUNCTION) && func_trace_enabled)
+		return true;
+
+	return false;
+}
+
+static bool task_is_traced(u32 pid)
+{
+	u32 *pid_filter;
+	int key = 0;
+
+	pid_filter = bpf_map_lookup_elem(&task_filter, &key);
+	if (!pid_filter)
+		return false;
+
+	if (*pid_filter == pid)
+		return true;
+
+	return false;
+}
+
+static int record_begin_values(void)
+{
+	u64 count, *start;
+	u32 cpu = bpf_get_smp_processor_id();
+	s64 error;
+
+	count = bpf_perf_event_read(&counters, 0);
+	error = (s64)count;
+	if (error <= -2 && error >= -22)
+		return 0;
+
+	start = bpf_map_lookup_elem(&values_begin, &cpu);
+	if (start)
+		*start = count;
+	else
+		bpf_map_update_elem(&values_begin, &cpu, &count, BPF_NOEXIST);
+
+	return 0;
+}
+
+static int record_end_values(void)
+{
+	u64 count, *start, *value, interval;
+	u32 cpu = bpf_get_smp_processor_id();
+	s64 error;
+
+	count = bpf_perf_event_read(&counters, 0);
+	error = (s64)count;
+	if (error <= -2 && error >= -22)
+		return 0;
+
+	start = bpf_map_lookup_elem(&values_begin, &cpu);
+	/* It must be wrong if failed to read out start values, bail out. */
+	if (!start || *start == -1)
+		return 0;
+
+	interval = count - *start;
+	if (!interval)
+		return 0;
+
+	/* Record the interval */
+	value = bpf_map_lookup_elem(&values, &cpu);
+	if (value)
+		*value = *value + interval;
+	else
+		bpf_map_update_elem(&values, &cpu, &interval, BPF_NOEXIST);
+
+	if (func_trace_enabled)
+		func_duration += interval;
+
+	*start = -1;
+	return 0;
+}
+
+static int stat_function(void)
+{
+	int key;
+	u64 *count, *min, *max, init = 1;
+
+	/* Update function entering count */
+	key = 0;
+	count = bpf_map_lookup_elem(&func_stat, &key);
+	if (count)
+		*count += 1;
+	else
+		bpf_map_update_elem(&func_stat, &key, &init, BPF_NOEXIST);
+
+	/* Update function minimum duration */
+	key = 1;
+	min = bpf_map_lookup_elem(&func_stat, &key);
+	if (min) {
+		if (func_duration < *min)
+			*min = func_duration;
+	} else {
+		bpf_map_update_elem(&func_stat, &key, &func_duration, BPF_NOEXIST);
+	}
+
+	/* Update function maximum duration */
+	key = 2;
+	max = bpf_map_lookup_elem(&func_stat, &key);
+	if (max) {
+		if (func_duration > *max)
+			*max = func_duration;
+	} else {
+		bpf_map_update_elem(&func_stat, &key, &func_duration, BPF_NOEXIST);
+	}
+
+	return 0;
+}
+
+SEC("kretprobe/finish_task_switch")
+int schedule_in(struct pt_regs *ctx)
+{
+	if (!trace_is_enabled())
+		return 0;
+
+	if (!task_is_traced(bpf_get_current_pid_tgid()))
+		return 0;
+
+	return record_begin_values();
+}
+
+SEC("tracepoint/sched/sched_switch")
+int schedule_out(struct trace_event_raw_sched_switch *ctx)
+{
+	if (!trace_is_enabled())
+		return 0;
+
+	if (!task_is_traced(ctx->prev_pid))
+		return 0;
+
+	return record_end_values();
+}
+
+SEC("uprobe/func")
+int BPF_PROG(func_begin)
+{
+	int key = 0;
+	u32 *trace_mode;
+
+	if (!task_is_traced(bpf_get_current_pid_tgid()))
+		return 0;
+
+	func_trace_enabled = true;
+	func_duration = 0;
+	return record_begin_values();
+}
+
+SEC("uretprobe/func")
+int BPF_PROG(func_exit)
+{
+	int key = 0;
+	u32 *trace_mode;
+
+	if (!task_is_traced(bpf_get_current_pid_tgid()))
+		return 0;
+
+	record_end_values();
+	func_trace_enabled = false;
+	return stat_function();
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/trace_counter_user.c b/samples/bpf/trace_counter_user.c
new file mode 100644
index 000000000000..7c28bf5df51e
--- /dev/null
+++ b/samples/bpf/trace_counter_user.c
@@ -0,0 +1,528 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Arm Limited. */
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <linux/bpf.h>
+#include <linux/limits.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/resource.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "perf-sys.h"
+#include "trace_helpers.h"
+#include "trace_counter.skel.h"
+
+#define ARRAY_SIZE(x)		(sizeof(x) / sizeof(*(x)))
+#define SAMPLE_PERIOD		0x7fffffffffffffffULL
+
+static struct trace_counter *skel;
+
+/* Set the flag TRACE_MODE_FUNCTION for function trace. */
+#define TRACE_MODE_FUNCTION	(1 << 0)
+
+/* The PID for the profiled program */
+static int pid;
+
+/*
+ * The PID for a pipe used for synchronization between the parent
+ * process and the forked program in the child process.
+ *
+ * The synchronization mechanism is borrowed from the perf tool.
+ */
+static int cork_fd;
+
+/*
+ * The PMU event for tracing.  Four events are supported:
+ *   cycles, instructions, branches, branch-misses.
+ */
+static char *event;
+
+/*
+ * The absolute path for executable file.
+ */
+static char *executable;
+
+/*
+ * The function name for function trace.
+ */
+static char *func;
+
+/* File descriptor for PMU events */
+static int pmu_fd = -1;
+
+/* The target CPU for PMU event */
+static int pmu_cpu = -1;
+
+static char exec_path[PATH_MAX + 1];
+
+/* Only trace user mode */
+static bool user_mode;
+
+/* CPU number */
+static int nr_cpus;
+
+/* Raw event number */
+static long raw_event = -1;
+
+static void usage(const char *cmd)
+{
+	printf("USAGE: %s [-f function] [-x binary_path] [-e event] -- command ...\n", cmd);
+	printf("       %s [-f function] [-e event] command\n", cmd);
+	printf("       %s [-h]\n", cmd);
+	printf("       -c cpu		# Traced CPU (default: -1 for all CPUs)\n");
+	printf("       -f function      # Traced function name\n");
+	printf("       -x executable    # Absolute path for executable for adding uprobes\n");
+	printf("       -e event         # Event name (default: cycles). Supported events:\n");
+	printf("                        # cycles, instructions, branches, branch-misses\n");
+	printf("       -r raw_event_num # Raw event number in hexadecimal format.\n");
+	printf("                        # The '-r' option is mutually exclusive to the '-e' option.\n");
+	printf("       -u               # Only trace user mode\n");
+	printf("       -h               # Help\n");
+}
+
+static void err_exit(int err)
+{
+	if (pid)
+		kill(pid, SIGKILL);
+	exit(err);
+}
+
+static void print_counter(void)
+{
+	int cpu, fd, key;
+	__u64 value, sum = 0, count, min, max, avg;
+
+	if (event)
+		printf("Event (%s) statistics:\n", event);
+	else
+		printf("Event (0x%lx) statistics:\n", raw_event);
+
+	printf(" +-----------+------------------+\n");
+
+	fd = bpf_map__fd(skel->maps.values);
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		if (bpf_map_lookup_elem(fd, &cpu, &value))
+			continue;
+
+		printf(" | CPU[%04d] | %16llu |\n", cpu, value);
+		printf(" +-----------+------------------+\n");
+		sum += value;
+	}
+
+	printf("   Total     : %16llu\n", sum);
+
+	if (func) {
+		fd = bpf_map__fd(skel->maps.func_stat);
+		key = 0;
+		if (bpf_map_lookup_elem(fd, &key, &count)) {
+			printf("failed to read out function count\n");
+			return;
+		}
+
+		key = 1;
+		if (bpf_map_lookup_elem(fd, &key, &min)) {
+			printf("failed to read out function min duration\n");
+			return;
+		}
+
+		key = 2;
+		if (bpf_map_lookup_elem(fd, &key, &max)) {
+			printf("failed to read out function max duration\n");
+			return;
+		}
+
+		avg = sum / count;
+		printf("Function (%s) duration statistics:\n"
+		       "   Count     : %16llu\n"
+		       "   Minimum   : %16llu\n"
+		       "   Maximum   : %16llu\n"
+		       "   Average   : %16llu\n",
+		       func, count, min, max, avg);
+	}
+}
+
+/*
+ * Prepare process for profiled workload.  The synchronization mechanism is
+ * borrowed from the perf tool for avoid accounting the loads by the tool
+ * itself.
+ *
+ * It creates a child process context for the profiled program and the child
+ * process waits on pipe (go_pipe[0]).  The parent process will enable PMU
+ * events and notify the child process to proceed.
+ */
+static int prepare_workload(char **argv)
+{
+	int child_ready_pipe[2], go_pipe[2];
+	char bf;
+	int key = 0, fd;
+
+	if (pipe(child_ready_pipe) < 0) {
+		printf("Failed to create 'ready' pipe\n");
+		return -1;
+	}
+
+	if (pipe(go_pipe) < 0) {
+		printf("Failed to create 'go' pipe\n");
+		goto out_close_ready_pipe;
+	}
+
+	printf("Create process for the workload.\n");
+
+	pid = fork();
+	if (pid < 0) {
+		printf("Failed to fork child process\n");
+		goto out_close_pipes;
+	}
+
+	/* Child process */
+	if (!pid) {
+		int ret;
+
+		close(child_ready_pipe[0]);
+		close(go_pipe[1]);
+
+		fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC);
+
+		/*
+		 * Tell the parent we're ready to go
+		 */
+		close(child_ready_pipe[1]);
+
+		/*
+		 * Wait until the parent tells us to go.
+		 */
+		ret = read(go_pipe[0], &bf, 1);
+		if (ret != 1) {
+			if (ret == -1)
+				printf("Unable to read go_pipe[0]\n");
+			exit(ret);
+		}
+
+		execvp(argv[0], (char **)argv);
+
+		/* Should not run to here */
+		exit(-1);
+	}
+
+	close(child_ready_pipe[1]);
+	close(go_pipe[0]);
+
+	/* Wait for child to settle */
+	if (read(child_ready_pipe[0], &bf, 1) == -1) {
+		printf("Unable to read ready pipe\n");
+		goto out_close_pipes;
+	}
+
+	fcntl(go_pipe[1], F_SETFD, FD_CLOEXEC);
+	cork_fd = go_pipe[1];
+	close(child_ready_pipe[0]);
+
+	fd = bpf_map__fd(skel->maps.task_filter);
+	/* Set task filter for only trace the target process */
+	assert(bpf_map_update_elem(fd, &key, &pid, BPF_ANY) == 0);
+	return 0;
+
+out_close_pipes:
+	close(go_pipe[0]);
+	close(go_pipe[1]);
+out_close_ready_pipe:
+	close(child_ready_pipe[0]);
+	close(child_ready_pipe[1]);
+	return -1;
+}
+
+static int run_workload(void)
+{
+	int ret, status;
+	char bf = 0;
+
+	if (cork_fd >= 0) {
+		/*
+		 * Remove the cork, let it rip!
+		 */
+		ret = write(cork_fd, &bf, 1);
+		close(cork_fd);
+		cork_fd = -1;
+
+		if (ret < 0) {
+			printf("Unable to write to cork pipe");
+			return ret;
+		}
+	}
+
+	waitpid(pid, &status, 0);
+	printf("Finished the workload.\n");
+
+	return 0;
+}
+
+static void disable_perf_event(void)
+{
+	if (pmu_fd < 0)
+		return;
+
+	ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE, 0);
+	close(pmu_fd);
+}
+
+static int enable_perf_event(void)
+{
+	int key = 0, fd;
+	struct perf_event_attr attr = {
+		.freq = 0,
+		.sample_period = SAMPLE_PERIOD,
+		.inherit = 0,
+		.type = PERF_TYPE_RAW,
+		.read_format = 0,
+		.sample_type = 0,
+		.enable_on_exec = 1,
+		.disabled = 1,
+	};
+
+	if (event) {
+		attr.type = PERF_TYPE_HARDWARE;
+		if (!strcmp(event, "cycles"))
+			attr.config = PERF_COUNT_HW_CPU_CYCLES;
+		else if (!strcmp(event, "instructions"))
+			attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		else if (!strcmp(event, "branches"))
+			attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		else if (!strcmp(event, "branch-misses"))
+			attr.config = PERF_COUNT_HW_BRANCH_MISSES;
+		else
+			return -EINVAL;
+		printf("Enable the event %s.\n", event);
+	} else {
+		attr.type = PERF_TYPE_RAW;
+		attr.config = raw_event;
+		printf("Enable the raw event 0x%lx.\n", raw_event);
+	}
+
+	if (user_mode) {
+		attr.exclude_kernel = 1;
+		attr.exclude_hv = 1;
+	}
+
+	pmu_fd = sys_perf_event_open(&attr, pid, pmu_cpu, -1, 0);
+	if (pmu_fd < 0) {
+		printf("sys_perf_event_open failed: %d\n", pmu_fd);
+		return pmu_fd;
+	}
+
+	fd = bpf_map__fd(skel->maps.counters);
+	assert(bpf_map_update_elem(fd, &key, &pmu_fd, BPF_ANY) == 0);
+	return 0;
+}
+
+static int parse_opts(int argc, char **argv)
+{
+	int opt;
+
+	nr_cpus = libbpf_num_possible_cpus();
+
+	while ((opt = getopt(argc, argv, "c:f:e:r:x:uh")) != -1) {
+		switch (opt) {
+		case 'c':
+			pmu_cpu = atoi(optarg);
+			break;
+		case 'f':
+			func = optarg;
+			break;
+		case 'e':
+			event = optarg;
+			break;
+		case 'r':
+			raw_event = strtol(optarg, NULL, 16);
+			break;
+		case 'x':
+			executable = optarg;
+			break;
+		case 'u':
+			user_mode = true;
+			break;
+		case 'h':
+		default:
+			usage(argv[0]);
+			return -EINVAL;
+		}
+	}
+
+	if (pmu_cpu != -1 && (pmu_cpu < 0 || pmu_cpu >= nr_cpus)) {
+		printf("Target CPU %d is out-of-range [0..%d].\n",
+		       pmu_cpu, nr_cpus - 1);
+		return -EINVAL;
+	}
+
+	if (event && raw_event != -1) {
+		printf("Only one of event name or raw event can be enabled.\n");
+		return -EINVAL;
+	}
+
+	if (!event && raw_event == -1)
+		event = "cycles";
+
+	if (event) {
+		if (strcmp(event, "cycles") && strcmp(event, "instructions") &&
+		    strcmp(event, "branches") && strcmp(event, "branch-misses")) {
+			printf("Invalid event name: %s\n", event);
+			printf("Supported events: cycles/instructions/branches/branch-misses\n");
+			return -EOPNOTSUPP;
+		}
+	} else {
+		if (raw_event == LONG_MIN || raw_event == LONG_MAX) {
+			printf("Invalid raw event number: %ld\n", raw_event);
+			return -EOPNOTSUPP;
+		}
+	}
+
+	if (func) {
+		if (!executable)
+			executable = argv[optind];
+
+		if (!executable) {
+			printf("Missed the executable path\n");
+			return -EINVAL;
+		}
+
+		if (!realpath(executable, exec_path)) {
+			printf("Unable to find the executable's absolute path.\n"
+			       "Use the '-x' option to specify it.\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+	struct bpf_link *links[4] = { NULL };
+	int error = 1;
+	LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts);
+	__u64 finish_task_switch_addr;
+	char *finish_task_switch_str = NULL;
+
+	if (parse_opts(argc, argv))
+		return 0;
+
+	/*
+	 * The kernel function "finish_task_switch" is an occasion for recording
+	 * a task scheduling out.  The compiler can perform interprocedural
+	 * scalar replacement (-fipa-sra) and the function can be altered (e.g.
+	 * "finish_task_switch.isra.0").  Search the kernel symbols to find out
+	 * the correct symbol name.
+	 */
+	if (!kallsyms_find("finish_task_switch", &finish_task_switch_addr)) {
+		finish_task_switch_str = "finish_task_switch";
+	} else if (!kallsyms_find("finish_task_switch.isra.0",
+				&finish_task_switch_addr)) {
+		finish_task_switch_str = "finish_task_switch.isra.0";
+	} else {
+		printf("Failed to find kernel symbol 'finish_task_switch'\n");
+		return -1;
+	}
+
+	signal(SIGINT, err_exit);
+	signal(SIGTERM, err_exit);
+
+	skel = trace_counter__open();
+	if (!skel) {
+		printf("Failed to open trace counter skeleton\n");
+		return -1;
+	}
+
+	bpf_map__set_max_entries(skel->maps.values_begin, nr_cpus);
+	bpf_map__set_max_entries(skel->maps.values, nr_cpus);
+
+	if (func)
+		skel->rodata->trace_mode = TRACE_MODE_FUNCTION;
+
+	if (trace_counter__load(skel)) {
+		printf("Failed to load trace counter skeleton\n");
+		goto cleanup;
+	}
+
+	/* Create a child process for profied program */
+	if (prepare_workload(&argv[optind]))
+		goto cleanup;
+
+	links[0] = bpf_program__attach(skel->progs.schedule_out);
+	if (libbpf_get_error(links[0])) {
+		printf("bpf_program__attach failed\n");
+		links[0] = NULL;
+		goto cleanup;
+	}
+
+	links[1] = bpf_program__attach_kprobe(skel->progs.schedule_in,
+					      true, finish_task_switch_str);
+	if (libbpf_get_error(links[1])) {
+		printf("bpf_program__attach_kprobe failed\n");
+		links[1] = NULL;
+		goto cleanup;
+	}
+
+	if (func) {
+		uprobe_opts.func_name = func;
+
+		uprobe_opts.retprobe = false;
+		links[2] = bpf_program__attach_uprobe_opts(skel->progs.func_begin,
+							   pid,
+							   exec_path,
+							   0 /* offset */,
+							   &uprobe_opts);
+		if (libbpf_get_error(links[2])) {
+			printf("Failed to attach func_begin uprobe\n");
+			links[2] = NULL;
+			goto cleanup;
+		}
+
+		uprobe_opts.retprobe = true;
+		links[3] = bpf_program__attach_uprobe_opts(skel->progs.func_exit,
+							   pid,
+							   exec_path,
+							   0 /* offset */,
+							   &uprobe_opts);
+		if (libbpf_get_error(links[3])) {
+			printf("Failed to attach func_exit uprobe\n");
+			links[3] = NULL;
+			goto cleanup;
+		}
+	}
+
+	/* Enable perf events */
+	if (enable_perf_event() < 0)
+		goto cleanup;
+
+	/* Excute the workload */
+	if (run_workload() < 0)
+		goto cleanup;
+
+	print_counter();
+	error = 0;
+
+cleanup:
+	/* Disable perf events */
+	disable_perf_event();
+
+	for (i = 0; i < ARRAY_SIZE(links); i++) {
+		if (!links[i])
+			continue;
+		bpf_link__destroy(links[i]);
+	}
+	trace_counter__destroy(skel);
+
+	err_exit(error);
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters
  2025-01-19 15:33 [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters Leo Yan
@ 2025-01-20 16:18 ` Daniel Borkmann
  2025-01-20 18:50   ` Leo Yan
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2025-01-20 16:18 UTC (permalink / raw)
  To: Leo Yan, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, James Clark,
	Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, linux-kernel,
	bpf, Quentin Monnet

Hi Leo,

On 1/19/25 4:33 PM, Leo Yan wrote:
> Developers might need to profile a program with fine-grained
> granularity.  E.g., a user case is to account the CPU cycles for a small
> program or for a specific function within the program.
> 
> This commit introduces a small tool with using eBPF program to read the
> perf PMU counters for performance metrics.  As the first step, the four
> counters are supported with the '-e' option: cycles, instructions,
> branches, branch-misses.
> 
> The '-r' option is provided for support raw event number.  This option
> is mutually exclusive to the '-e' option, users either pass a raw event
> number or a counter name.
> 
> The tool enables the counters for the entire trace session in free-run
> mode.  It reads the beginning values for counters when the profiled
> program is scheduled in, and calculate the interval when the task is
> scheduled out.  The advantage of this approach is to dismiss the
> statistics noise (e.g. caused by the tool itself) as possible.
> 
> The tool can support function based tracing.  By using the '-f' option,
> users can specify the traced function.  The eBPF program enables tracing
> at the function entry and disables trace upon exit from the function.
> 
> The '-u' option can be specified for tracing user mode only.
> 
> Below are several usage examples.
> 
> Trace CPU cycles for the whole program:
> 
>    # ./trace_counter -e cycles -- /mnt/sort
>    Or
>    # ./trace_counter -e cycles /mnt/sort
>    Create process for the workload.
>    Enable the event cycles.
>    Bubble sorting array of 3000 elements
>    551 ms
>    Finished the workload.
>    Event (cycles) statistics:
>     +-----------+------------------+
>     | CPU[0000] |         29093250 |
>     +-----------+------------------+
>     | CPU[0002] |         75672820 |
>     +-----------+------------------+
>     | CPU[0006] |       1067458735 |
>     +-----------+------------------+
>       Total     :       1172224805
> 
> Trace branches for the user mode only:
> 
>    # ./trace_counter -e branches -u -- /mnt/sort
>    Create process for the workload.
>    Enable the event branches.
>    Bubble sorting array of 3000 elements
>    541 ms
>    Finished the workload.
>    Event (branches) statistics:
>     +-----------+------------------+
>     | CPU[0007] |         88112669 |
>     +-----------+------------------+
>       Total     :         88112669
> 
> Trace instructions for the 'bubble_sort' function:
> 
>    # ./trace_counter -f bubble_sort -e instructions -- /mnt/sort
>    Create process for the workload.
>    Enable the event instructions.
>    Bubble sorting array of 3000 elements
>    541 ms
>    Finished the workload.
>    Event (instructions) statistics:
>     +-----------+------------------+
>     | CPU[0006] |       1169810201 |
>     +-----------+------------------+
>       Total     :       1169810201
>    Function (bubble_sort) duration statistics:
>       Count     :                5
>       Minimum   :        232009928
>       Maximum   :        236742006
>       Average   :        233962040
> 
> Trace the raw event '0x5' (L1D_TLB_REFILL):
> 
>    # ./trace_counter -r 0x5 -u -- /mnt/sort
>    Create process for the workload.
>    Enable the raw event 0x5.
>    Bubble sorting array of 3000 elements
>    540 ms
>    Finished the workload.
>    Event (0x5) statistics:
>     +-----------+------------------+
>     | CPU[0007] |              174 |
>     +-----------+------------------+
>       Total     :              174
> 
> Trace for the function and set CPU affinity for the profiled program:
> 
>    # ./trace_counter -f bubble_sort -x /mnt/sort -e cycles \
> 	-- taskset -c 2 /mnt/sort
>    Create process for the workload.
>    Enable the event cycles.
>    Bubble sorting array of 3000 elements
>    619 ms
>    Finished the workload.
>    Event (cycles) statistics:
>     +-----------+------------------+
>     | CPU[0002] |       1169913056 |
>     +-----------+------------------+
>       Total     :       1169913056
>    Function (bubble_sort) duration statistics:
>       Count     :                5
>       Minimum   :        232054101
>       Maximum   :        236769623
>       Average   :        233982611
> 
> The command above sets the CPU affinity with taskset command.  The
> profiled function 'bubble_sort' is in the executable '/mnt/sort' but not
> in the taskset binary.  The '-x' option is used to tell the tool the
> correct executable path.
> 
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
>   samples/bpf/Makefile             |   7 +-
>   samples/bpf/trace_counter.bpf.c  | 222 +++++++++++++
>   samples/bpf/trace_counter_user.c | 528 +++++++++++++++++++++++++++++++
>   3 files changed, 756 insertions(+), 1 deletion(-)
>   create mode 100644 samples/bpf/trace_counter.bpf.c
>   create mode 100644 samples/bpf/trace_counter_user.c

Thanks for this work! Few suggestions.. the contents of samples/bpf/ are in process of being
migrated into BPF selftests given they have been bit-rotting for quite some time, so we'd like
to migrate missing coverage into BPF CI (see test_progs in tools/testing/selftests/bpf/). That
could be one option, or an alternative is to extend bpftool for profiling BPF programs (see
47c09d6a9f67 ("bpftool: Introduce "prog profile" command")).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters
  2025-01-20 16:18 ` Daniel Borkmann
@ 2025-01-20 18:50   ` Leo Yan
  2025-01-20 19:38     ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Leo Yan @ 2025-01-20 18:50 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, James Clark,
	Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, linux-kernel,
	bpf, Quentin Monnet

Hi Daniel,

On Mon, Jan 20, 2025 at 05:18:23PM +0100, Daniel Borkmann wrote:
> 
> Hi Leo,
> 
> On 1/19/25 4:33 PM, Leo Yan wrote:
> > Developers might need to profile a program with fine-grained
> > granularity.  E.g., a user case is to account the CPU cycles for a small
> > program or for a specific function within the program.
> > 
> > This commit introduces a small tool with using eBPF program to read the
> > perf PMU counters for performance metrics.  As the first step, the four
> > counters are supported with the '-e' option: cycles, instructions,
> > branches, branch-misses.
> > 
> > The '-r' option is provided for support raw event number.  This option
> > is mutually exclusive to the '-e' option, users either pass a raw event
> > number or a counter name.
> > 
> > The tool enables the counters for the entire trace session in free-run
> > mode.  It reads the beginning values for counters when the profiled
> > program is scheduled in, and calculate the interval when the task is
> > scheduled out.  The advantage of this approach is to dismiss the
> > statistics noise (e.g. caused by the tool itself) as possible.

[...]

> Thanks for this work! Few suggestions.. the contents of samples/bpf/ are in process of being
> migrated into BPF selftests given they have been bit-rotting for quite some time, so we'd like
> to migrate missing coverage into BPF CI (see test_progs in tools/testing/selftests/bpf/). That
> could be one option, or an alternative is to extend bpftool for profiling BPF programs (see
> 47c09d6a9f67 ("bpftool: Introduce "prog profile" command")).

Thanks for suggestions!

The naming or info in the commit log might cause confuse.  The purpose
of this patch is to measure PMU counters for normal programs (not BPF
program specific).  To achieve profiling accuracy, it opens perf event
in thread mode, and support function based trace and can be userspace
mode only.

My understanding for bpftool is for eBPF program specific.  I looked
into a bit the commit 47c09d6a9f67, it is nature for integrating the
tracing feature for eBPF program specific.  My patch is for tracing
normal userspace programs, I am not sure if this is really wanted by
bpftool.  I would like to hear opinions from bpftool maintainer before
proceeding.

My program mainly uses eBPF attaching to uprobe.  selftest/bpf has
contained the related functionality testing, actually I refered the
test for writing my code :).  So maybe it is not quite useful for
merging the code as a test?

If both options are not ideal, I would spend time to land the
feature in perf tool - the perf tool has supported eBPF backend for
reading PMU counters, but it is absent function based profiling.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters
  2025-01-20 18:50   ` Leo Yan
@ 2025-01-20 19:38     ` Alexei Starovoitov
  2025-01-20 19:54       ` Daniel Borkmann
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2025-01-20 19:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	James Clark, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	LKML, bpf, Quentin Monnet

On Mon, Jan 20, 2025 at 10:50 AM Leo Yan <leo.yan@arm.com> wrote:
>
> Hi Daniel,
>
> On Mon, Jan 20, 2025 at 05:18:23PM +0100, Daniel Borkmann wrote:
> >
> > Hi Leo,
> >
> > On 1/19/25 4:33 PM, Leo Yan wrote:
> > > Developers might need to profile a program with fine-grained
> > > granularity.  E.g., a user case is to account the CPU cycles for a small
> > > program or for a specific function within the program.
> > >
> > > This commit introduces a small tool with using eBPF program to read the
> > > perf PMU counters for performance metrics.  As the first step, the four
> > > counters are supported with the '-e' option: cycles, instructions,
> > > branches, branch-misses.
> > >
> > > The '-r' option is provided for support raw event number.  This option
> > > is mutually exclusive to the '-e' option, users either pass a raw event
> > > number or a counter name.
> > >
> > > The tool enables the counters for the entire trace session in free-run
> > > mode.  It reads the beginning values for counters when the profiled
> > > program is scheduled in, and calculate the interval when the task is
> > > scheduled out.  The advantage of this approach is to dismiss the
> > > statistics noise (e.g. caused by the tool itself) as possible.
>
> [...]
>
> > Thanks for this work! Few suggestions.. the contents of samples/bpf/ are in process of being
> > migrated into BPF selftests given they have been bit-rotting for quite some time, so we'd like
> > to migrate missing coverage into BPF CI (see test_progs in tools/testing/selftests/bpf/). That
> > could be one option, or an alternative is to extend bpftool for profiling BPF programs (see
> > 47c09d6a9f67 ("bpftool: Introduce "prog profile" command")).
>
> Thanks for suggestions!
>
> The naming or info in the commit log might cause confuse.  The purpose
> of this patch is to measure PMU counters for normal programs (not BPF
> program specific).  To achieve profiling accuracy, it opens perf event
> in thread mode, and support function based trace and can be userspace
> mode only.
>
> My understanding for bpftool is for eBPF program specific.  I looked
> into a bit the commit 47c09d6a9f67, it is nature for integrating the
> tracing feature for eBPF program specific.  My patch is for tracing
> normal userspace programs, I am not sure if this is really wanted by
> bpftool.  I would like to hear opinions from bpftool maintainer before
> proceeding.
>
> My program mainly uses eBPF attaching to uprobe.  selftest/bpf has
> contained the related functionality testing, actually I refered the
> test for writing my code :).  So maybe it is not quite useful for
> merging the code as a test?
>
> If both options are not ideal, I would spend time to land the
> feature in perf tool - the perf tool has supported eBPF backend for
> reading PMU counters, but it is absent function based profiling.

We don't add tools to kernel repo. bpftool is an exception
because it's used during the selftest build.
'perf' is another exception for historical reasons.

This particular feature fits 'perf' the best.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters
  2025-01-20 19:38     ` Alexei Starovoitov
@ 2025-01-20 19:54       ` Daniel Borkmann
  2025-01-20 21:37         ` Leo Yan
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2025-01-20 19:54 UTC (permalink / raw)
  To: Alexei Starovoitov, Leo Yan
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, James Clark,
	Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers, LKML, bpf,
	Quentin Monnet

On 1/20/25 8:38 PM, Alexei Starovoitov wrote:
> On Mon, Jan 20, 2025 at 10:50 AM Leo Yan <leo.yan@arm.com> wrote:
>> On Mon, Jan 20, 2025 at 05:18:23PM +0100, Daniel Borkmann wrote:
>>> On 1/19/25 4:33 PM, Leo Yan wrote:
>>>> Developers might need to profile a program with fine-grained
>>>> granularity.  E.g., a user case is to account the CPU cycles for a small
>>>> program or for a specific function within the program.
>>>>
>>>> This commit introduces a small tool with using eBPF program to read the
>>>> perf PMU counters for performance metrics.  As the first step, the four
>>>> counters are supported with the '-e' option: cycles, instructions,
>>>> branches, branch-misses.
>>>>
>>>> The '-r' option is provided for support raw event number.  This option
>>>> is mutually exclusive to the '-e' option, users either pass a raw event
>>>> number or a counter name.
>>>>
>>>> The tool enables the counters for the entire trace session in free-run
>>>> mode.  It reads the beginning values for counters when the profiled
>>>> program is scheduled in, and calculate the interval when the task is
>>>> scheduled out.  The advantage of this approach is to dismiss the
>>>> statistics noise (e.g. caused by the tool itself) as possible.
>>
>> [...]
>>
>>> Thanks for this work! Few suggestions.. the contents of samples/bpf/ are in process of being
>>> migrated into BPF selftests given they have been bit-rotting for quite some time, so we'd like
>>> to migrate missing coverage into BPF CI (see test_progs in tools/testing/selftests/bpf/). That
>>> could be one option, or an alternative is to extend bpftool for profiling BPF programs (see
>>> 47c09d6a9f67 ("bpftool: Introduce "prog profile" command")).
>>
>> Thanks for suggestions!
>>
>> The naming or info in the commit log might cause confuse.  The purpose
>> of this patch is to measure PMU counters for normal programs (not BPF
>> program specific).  To achieve profiling accuracy, it opens perf event
>> in thread mode, and support function based trace and can be userspace
>> mode only.
>>
>> My understanding for bpftool is for eBPF program specific.  I looked
>> into a bit the commit 47c09d6a9f67, it is nature for integrating the
>> tracing feature for eBPF program specific.  My patch is for tracing
>> normal userspace programs, I am not sure if this is really wanted by
>> bpftool.  I would like to hear opinions from bpftool maintainer before
>> proceeding.

Yes, that suggestion was if it would have been applicable also
for the existing bpftool (BPF program) profiling functionality.

>> My program mainly uses eBPF attaching to uprobe.  selftest/bpf has
>> contained the related functionality testing, actually I refered the
>> test for writing my code :).  So maybe it is not quite useful for
>> merging the code as a test?
>>
>> If both options are not ideal, I would spend time to land the
>> feature in perf tool - the perf tool has supported eBPF backend for
>> reading PMU counters, but it is absent function based profiling.
> 
> We don't add tools to kernel repo. bpftool is an exception
> because it's used during the selftest build.
> 'perf' is another exception for historical reasons.
> 
> This particular feature fits 'perf' the best.

Agree, looks like perf is the best target for integration then.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters
  2025-01-20 19:54       ` Daniel Borkmann
@ 2025-01-20 21:37         ` Leo Yan
  0 siblings, 0 replies; 6+ messages in thread
From: Leo Yan @ 2025-01-20 21:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Alexei Starovoitov, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	James Clark, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	LKML, bpf, Quentin Monnet

On Mon, Jan 20, 2025 at 08:54:30PM +0100, Daniel Borkmann wrote:
> On 1/20/25 8:38 PM, Alexei Starovoitov wrote:
> > On Mon, Jan 20, 2025 at 10:50 AM Leo Yan <leo.yan@arm.com> wrote:

[...]

> > > My understanding for bpftool is for eBPF program specific.  I looked
> > > into a bit the commit 47c09d6a9f67, it is nature for integrating the
> > > tracing feature for eBPF program specific.  My patch is for tracing
> > > normal userspace programs, I am not sure if this is really wanted by
> > > bpftool.  I would like to hear opinions from bpftool maintainer before
> > > proceeding.
> 
> Yes, that suggestion was if it would have been applicable also
> for the existing bpftool (BPF program) profiling functionality.
> 
> > > My program mainly uses eBPF attaching to uprobe.  selftest/bpf has
> > > contained the related functionality testing, actually I refered the
> > > test for writing my code :).  So maybe it is not quite useful for
> > > merging the code as a test?
> > > 
> > > If both options are not ideal, I would spend time to land the
> > > feature in perf tool - the perf tool has supported eBPF backend for
> > > reading PMU counters, but it is absent function based profiling.
> > 
> > We don't add tools to kernel repo. bpftool is an exception
> > because it's used during the selftest build.
> > 'perf' is another exception for historical reasons.
> > 
> > This particular feature fits 'perf' the best.
> 
> Agree, looks like perf is the best target for integration then.

Thanks for suggestions, Alexei and Daniel.  It makes sense for me to
move to perf, and now I understand the policy for moving code from
samples/bpf.

It may be irrelevant to the patch itself.  I know we have great BPF
toolings (BCC/bpftrace, etc), but it would be a bit confused for me
that we don't have a offical repo to maintain C based BPF toolkits.

Sometimes, C based BPF tool is small and handy ...

Thanks,
Leo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-01-20 21:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-19 15:33 [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters Leo Yan
2025-01-20 16:18 ` Daniel Borkmann
2025-01-20 18:50   ` Leo Yan
2025-01-20 19:38     ` Alexei Starovoitov
2025-01-20 19:54       ` Daniel Borkmann
2025-01-20 21:37         ` Leo Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox