* [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table
@ 2025-07-28 4:12 Menglong Dong
2025-07-28 4:12 ` [PATCH bpf-next 1/4] fprobe: use rhashtable Menglong Dong
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Menglong Dong @ 2025-07-28 4:12 UTC (permalink / raw)
To: alexei.starovoitov, mhiramat
Cc: rostedt, mathieu.desnoyers, hca, revest, linux-kernel,
linux-trace-kernel, bpf
For now, the bucket count of the hash table that is used for fprobe_ip_table
is fixed at 256, which can cause huge overhead when a huge number of
functions are hooked.
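As a rough illustration (the exact numbers depend on the configuration): with
roughly 50k traceable kernel functions attached and only 256 fixed buckets,
each bucket chain holds about 50000 / 256 ≈ 195 nodes, so every lookup in the
entry handler walks ~195 entries on average. An rhashtable instead grows with
the element count and keeps the expected chain length close to constant.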
In this series, we use rhashtable for fprobe_ip_table to reduce the
overhead.
Meanwhile, we also add the benchmark testcase "kprobe-multi-all", which
will hook all the kernel functions during the testing. Before this series,
the performance is:
usermode-count : 875.380 ± 0.366M/s
kernel-count : 435.924 ± 0.461M/s
syscall-count : 31.004 ± 0.017M/s
fentry : 134.076 ± 1.752M/s
fexit : 68.319 ± 0.055M/s
fmodret : 71.530 ± 0.032M/s
rawtp : 202.751 ± 0.138M/s
tp : 79.562 ± 0.084M/s
kprobe : 55.587 ± 0.028M/s
kprobe-multi : 56.481 ± 0.043M/s
kprobe-multi-all: 6.283 ± 0.005M/s << look this
kretprobe : 22.378 ± 0.028M/s
kretprobe-multi: 28.205 ± 0.025M/s
With this series, the performance is:
usermode-count : 897.083 ± 5.347M/s
kernel-count : 431.638 ± 1.781M/s
syscall-count : 30.807 ± 0.057M/s
fentry : 134.803 ± 1.045M/s
fexit : 68.763 ± 0.018M/s
fmodret : 71.444 ± 0.052M/s
rawtp : 202.344 ± 0.149M/s
tp : 79.644 ± 0.376M/s
kprobe : 55.480 ± 0.108M/s
kprobe-multi : 57.302 ± 0.119M/s
kprobe-multi-all: 57.855 ± 0.144M/s << look this
kretprobe : 22.265 ± 0.023M/s
kretprobe-multi: 27.740 ± 0.023M/s
The benchmark result of "kprobe-multi-all" increases from 6.283M/s to 57.855M/s.
Menglong Dong (4):
fprobe: use rhashtable
selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
selftests/bpf: add benchmark testing for kprobe-multi-all
selftests/bpf: skip recursive functions for kprobe_multi
include/linux/fprobe.h | 2 +-
kernel/trace/fprobe.c | 144 ++++++-----
tools/testing/selftests/bpf/bench.c | 2 +
.../selftests/bpf/benchs/bench_trigger.c | 30 +++
.../selftests/bpf/benchs/run_bench_trigger.sh | 2 +-
.../bpf/prog_tests/kprobe_multi_test.c | 220 +----------------
tools/testing/selftests/bpf/trace_helpers.c | 230 ++++++++++++++++++
tools/testing/selftests/bpf/trace_helpers.h | 3 +
8 files changed, 351 insertions(+), 282 deletions(-)
--
2.50.1
* [PATCH bpf-next 1/4] fprobe: use rhashtable
2025-07-28 4:12 [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Menglong Dong
@ 2025-07-28 4:12 ` Menglong Dong
2025-07-28 13:13 ` Jiri Olsa
2025-07-29 3:43 ` kernel test robot
2025-07-28 4:12 ` [PATCH bpf-next 2/4] selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c Menglong Dong
` (4 subsequent siblings)
5 siblings, 2 replies; 11+ messages in thread
From: Menglong Dong @ 2025-07-28 4:12 UTC (permalink / raw)
To: alexei.starovoitov, mhiramat
Cc: rostedt, mathieu.desnoyers, hca, revest, linux-kernel,
linux-trace-kernel, bpf
For now, all the kernel functions that are hooked by fprobe are added to the
hash table "fprobe_ip_table". Its key is the function address, and its value
is "struct fprobe_hlist_node".
The bucket count of the hash table is FPROBE_IP_TABLE_SIZE, which is 256. This
means the overhead of a hash table lookup grows linearly once the number of
hooked functions exceeds 256. When we try to hook all the kernel functions,
the overhead becomes huge.
Therefore, replace the hash table with rhashtable to reduce the overhead.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/linux/fprobe.h | 2 +-
kernel/trace/fprobe.c | 144 +++++++++++++++++++++++------------------
2 files changed, 82 insertions(+), 64 deletions(-)
diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 702099f08929..0c9b239f5485 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -26,7 +26,7 @@ typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
* @fp: The fprobe which owns this.
*/
struct fprobe_hlist_node {
- struct hlist_node hlist;
+ struct rhash_head hlist;
unsigned long addr;
struct fprobe *fp;
};
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index ba7ff14f5339..b3e16303fc6a 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -12,6 +12,7 @@
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/sort.h>
+#include <linux/rhashtable.h>
#include <asm/fprobe.h>
@@ -41,47 +42,47 @@
* - RCU hlist traversal under disabling preempt
*/
static struct hlist_head fprobe_table[FPROBE_TABLE_SIZE];
-static struct hlist_head fprobe_ip_table[FPROBE_IP_TABLE_SIZE];
+static struct rhashtable fprobe_ip_table;
static DEFINE_MUTEX(fprobe_mutex);
-/*
- * Find first fprobe in the hlist. It will be iterated twice in the entry
- * probe, once for correcting the total required size, the second time is
- * calling back the user handlers.
- * Thus the hlist in the fprobe_table must be sorted and new probe needs to
- * be added *before* the first fprobe.
- */
-static struct fprobe_hlist_node *find_first_fprobe_node(unsigned long ip)
+static u32 fprobe_node_hashfn(const void *data, u32 len, u32 seed)
{
- struct fprobe_hlist_node *node;
- struct hlist_head *head;
+ return hash_ptr(*(unsigned long **)data, 32);
+}
- head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
- hlist_for_each_entry_rcu(node, head, hlist,
- lockdep_is_held(&fprobe_mutex)) {
- if (node->addr == ip)
- return node;
- }
- return NULL;
+static int fprobe_node_cmp(struct rhashtable_compare_arg *arg,
+ const void *ptr)
+{
+ unsigned long key = *(unsigned long *)arg->key;
+ const struct fprobe_hlist_node *n = ptr;
+
+ return n->addr != key;
+}
+
+static u32 fprobe_node_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+ const struct fprobe_hlist_node *n = data;
+
+ return hash_ptr((void *)n->addr, 32);
}
-NOKPROBE_SYMBOL(find_first_fprobe_node);
+
+static const struct rhashtable_params fprobe_rht_params = {
+ .head_offset = offsetof(struct fprobe_hlist_node, hlist),
+ .key_offset = offsetof(struct fprobe_hlist_node, addr),
+ .key_len = sizeof_field(struct fprobe_hlist_node, addr),
+ .hashfn = fprobe_node_hashfn,
+ .obj_hashfn = fprobe_node_obj_hashfn,
+ .obj_cmpfn = fprobe_node_cmp,
+ .automatic_shrinking = true,
+};
/* Node insertion and deletion requires the fprobe_mutex */
static void insert_fprobe_node(struct fprobe_hlist_node *node)
{
- unsigned long ip = node->addr;
- struct fprobe_hlist_node *next;
- struct hlist_head *head;
-
lockdep_assert_held(&fprobe_mutex);
- next = find_first_fprobe_node(ip);
- if (next) {
- hlist_add_before_rcu(&node->hlist, &next->hlist);
- return;
- }
- head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
- hlist_add_head_rcu(&node->hlist, head);
+ rhashtable_insert_fast(&fprobe_ip_table, &node->hlist,
+ fprobe_rht_params);
}
/* Return true if there are synonims */
@@ -92,9 +93,11 @@ static bool delete_fprobe_node(struct fprobe_hlist_node *node)
/* Avoid double deleting */
if (READ_ONCE(node->fp) != NULL) {
WRITE_ONCE(node->fp, NULL);
- hlist_del_rcu(&node->hlist);
+ rhashtable_remove_fast(&fprobe_ip_table, &node->hlist,
+ fprobe_rht_params);
}
- return !!find_first_fprobe_node(node->addr);
+ return !!rhashtable_lookup_fast(&fprobe_ip_table, &node->addr,
+ fprobe_rht_params);
}
/* Check existence of the fprobe */
@@ -249,25 +252,28 @@ static inline int __fprobe_kprobe_handler(unsigned long ip, unsigned long parent
static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
struct ftrace_regs *fregs)
{
- struct fprobe_hlist_node *node, *first;
+ struct rhash_lock_head __rcu *const *bkt;
+ struct fprobe_hlist_node *node;
unsigned long *fgraph_data = NULL;
unsigned long func = trace->func;
+ struct bucket_table *tbl;
+ struct rhash_head *head;
unsigned long ret_ip;
int reserved_words;
struct fprobe *fp;
+ unsigned int key;
int used, ret;
if (WARN_ON_ONCE(!fregs))
return 0;
- first = node = find_first_fprobe_node(func);
- if (unlikely(!first))
- return 0;
-
+ tbl = rht_dereference_rcu(fprobe_ip_table.tbl, &fprobe_ip_table);
+ key = rht_key_hashfn(&fprobe_ip_table, tbl, &func, fprobe_rht_params);
+ bkt = rht_bucket(tbl, key);
reserved_words = 0;
- hlist_for_each_entry_from_rcu(node, hlist) {
+ rht_for_each_entry_rcu_from(node, head, rht_ptr_rcu(bkt), tbl, key, hlist) {
if (node->addr != func)
- break;
+ continue;
fp = READ_ONCE(node->fp);
if (!fp || !fp->exit_handler)
continue;
@@ -278,13 +284,13 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
reserved_words +=
FPROBE_HEADER_SIZE_IN_LONG + SIZE_IN_LONG(fp->entry_data_size);
}
- node = first;
if (reserved_words) {
fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long));
if (unlikely(!fgraph_data)) {
- hlist_for_each_entry_from_rcu(node, hlist) {
+ rht_for_each_entry_rcu_from(node, head, rht_ptr_rcu(bkt),
+ tbl, key, hlist) {
if (node->addr != func)
- break;
+ continue;
fp = READ_ONCE(node->fp);
if (fp && !fprobe_disabled(fp))
fp->nmissed++;
@@ -299,12 +305,12 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
*/
ret_ip = ftrace_regs_get_return_address(fregs);
used = 0;
- hlist_for_each_entry_from_rcu(node, hlist) {
+ rht_for_each_entry_rcu_from(node, head, rht_ptr_rcu(bkt), tbl, key, hlist) {
int data_size;
void *data;
if (node->addr != func)
- break;
+ continue;
fp = READ_ONCE(node->fp);
if (!fp || fprobe_disabled(fp))
continue;
@@ -448,25 +454,21 @@ static int fprobe_addr_list_add(struct fprobe_addr_list *alist, unsigned long ad
return 0;
}
-static void fprobe_remove_node_in_module(struct module *mod, struct hlist_head *head,
- struct fprobe_addr_list *alist)
+static void fprobe_remove_node_in_module(struct module *mod, struct fprobe_hlist_node *node,
+ struct fprobe_addr_list *alist)
{
- struct fprobe_hlist_node *node;
int ret = 0;
- hlist_for_each_entry_rcu(node, head, hlist,
- lockdep_is_held(&fprobe_mutex)) {
- if (!within_module(node->addr, mod))
- continue;
- if (delete_fprobe_node(node))
- continue;
- /*
- * If failed to update alist, just continue to update hlist.
- * Therefore, at list user handler will not hit anymore.
- */
- if (!ret)
- ret = fprobe_addr_list_add(alist, node->addr);
- }
+ if (!within_module(node->addr, mod))
+ return;
+ if (delete_fprobe_node(node))
+ return;
+ /*
+ * If failed to update alist, just continue to update hlist.
+ * Therefore, at least the user handler will not hit anymore.
+ */
+ if (!ret)
+ ret = fprobe_addr_list_add(alist, node->addr);
}
/* Handle module unloading to manage fprobe_ip_table. */
@@ -474,8 +476,9 @@ static int fprobe_module_callback(struct notifier_block *nb,
unsigned long val, void *data)
{
struct fprobe_addr_list alist = {.size = FPROBE_IPS_BATCH_INIT};
+ struct fprobe_hlist_node *node;
+ struct rhashtable_iter iter;
struct module *mod = data;
- int i;
if (val != MODULE_STATE_GOING)
return NOTIFY_DONE;
@@ -486,8 +489,16 @@ static int fprobe_module_callback(struct notifier_block *nb,
return NOTIFY_DONE;
mutex_lock(&fprobe_mutex);
- for (i = 0; i < FPROBE_IP_TABLE_SIZE; i++)
- fprobe_remove_node_in_module(mod, &fprobe_ip_table[i], &alist);
+ rhashtable_walk_enter(&fprobe_ip_table, &iter);
+ do {
+ rhashtable_walk_start(&iter);
+
+ while ((node = rhashtable_walk_next(&iter)) && !IS_ERR(node))
+ fprobe_remove_node_in_module(mod, node, &alist);
+
+ rhashtable_walk_stop(&iter);
+ } while (node == ERR_PTR(-EAGAIN));
+ rhashtable_walk_exit(&iter);
if (alist.index < alist.size && alist.index > 0)
ftrace_set_filter_ips(&fprobe_graph_ops.ops,
@@ -819,3 +830,10 @@ int unregister_fprobe(struct fprobe *fp)
return ret;
}
EXPORT_SYMBOL_GPL(unregister_fprobe);
+
+static int __init fprobe_initcall(void)
+{
+ rhashtable_init(&fprobe_ip_table, &fprobe_rht_params);
+ return 0;
+}
+late_initcall(fprobe_initcall);
--
2.50.1
* [PATCH bpf-next 2/4] selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
2025-07-28 4:12 [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Menglong Dong
2025-07-28 4:12 ` [PATCH bpf-next 1/4] fprobe: use rhashtable Menglong Dong
@ 2025-07-28 4:12 ` Menglong Dong
2025-07-28 4:12 ` [PATCH bpf-next 3/4] selftests/bpf: skip recursive functions for kprobe_multi Menglong Dong
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-07-28 4:12 UTC (permalink / raw)
To: alexei.starovoitov, mhiramat
Cc: rostedt, mathieu.desnoyers, hca, revest, linux-kernel,
linux-trace-kernel, bpf
Sometimes we need to get all the kernel functions that can be traced, so
move get_syms() and get_addrs() from kprobe_multi_test.c to trace_helpers.c
and rename them to bpf_get_ksyms() and bpf_get_addrs().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
.../bpf/prog_tests/kprobe_multi_test.c | 220 +-----------------
tools/testing/selftests/bpf/trace_helpers.c | 214 +++++++++++++++++
tools/testing/selftests/bpf/trace_helpers.h | 3 +
3 files changed, 220 insertions(+), 217 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
index e19ef509ebf8..171706e78da8 100644
--- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
@@ -422,220 +422,6 @@ static void test_unique_match(void)
kprobe_multi__destroy(skel);
}
-static size_t symbol_hash(long key, void *ctx __maybe_unused)
-{
- return str_hash((const char *) key);
-}
-
-static bool symbol_equal(long key1, long key2, void *ctx __maybe_unused)
-{
- return strcmp((const char *) key1, (const char *) key2) == 0;
-}
-
-static bool is_invalid_entry(char *buf, bool kernel)
-{
- if (kernel && strchr(buf, '['))
- return true;
- if (!kernel && !strchr(buf, '['))
- return true;
- return false;
-}
-
-static bool skip_entry(char *name)
-{
- /*
- * We attach to almost all kernel functions and some of them
- * will cause 'suspicious RCU usage' when fprobe is attached
- * to them. Filter out the current culprits - arch_cpu_idle
- * default_idle and rcu_* functions.
- */
- if (!strcmp(name, "arch_cpu_idle"))
- return true;
- if (!strcmp(name, "default_idle"))
- return true;
- if (!strncmp(name, "rcu_", 4))
- return true;
- if (!strcmp(name, "bpf_dispatcher_xdp_func"))
- return true;
- if (!strncmp(name, "__ftrace_invalid_address__",
- sizeof("__ftrace_invalid_address__") - 1))
- return true;
- return false;
-}
-
-/* Do comparision by ignoring '.llvm.<hash>' suffixes. */
-static int compare_name(const char *name1, const char *name2)
-{
- const char *res1, *res2;
- int len1, len2;
-
- res1 = strstr(name1, ".llvm.");
- res2 = strstr(name2, ".llvm.");
- len1 = res1 ? res1 - name1 : strlen(name1);
- len2 = res2 ? res2 - name2 : strlen(name2);
-
- if (len1 == len2)
- return strncmp(name1, name2, len1);
- if (len1 < len2)
- return strncmp(name1, name2, len1) <= 0 ? -1 : 1;
- return strncmp(name1, name2, len2) >= 0 ? 1 : -1;
-}
-
-static int load_kallsyms_compare(const void *p1, const void *p2)
-{
- return compare_name(((const struct ksym *)p1)->name, ((const struct ksym *)p2)->name);
-}
-
-static int search_kallsyms_compare(const void *p1, const struct ksym *p2)
-{
- return compare_name(p1, p2->name);
-}
-
-static int get_syms(char ***symsp, size_t *cntp, bool kernel)
-{
- size_t cap = 0, cnt = 0;
- char *name = NULL, *ksym_name, **syms = NULL;
- struct hashmap *map;
- struct ksyms *ksyms;
- struct ksym *ks;
- char buf[256];
- FILE *f;
- int err = 0;
-
- ksyms = load_kallsyms_custom_local(load_kallsyms_compare);
- if (!ASSERT_OK_PTR(ksyms, "load_kallsyms_custom_local"))
- return -EINVAL;
-
- /*
- * The available_filter_functions contains many duplicates,
- * but other than that all symbols are usable in kprobe multi
- * interface.
- * Filtering out duplicates by using hashmap__add, which won't
- * add existing entry.
- */
-
- if (access("/sys/kernel/tracing/trace", F_OK) == 0)
- f = fopen("/sys/kernel/tracing/available_filter_functions", "r");
- else
- f = fopen("/sys/kernel/debug/tracing/available_filter_functions", "r");
-
- if (!f)
- return -EINVAL;
-
- map = hashmap__new(symbol_hash, symbol_equal, NULL);
- if (IS_ERR(map)) {
- err = libbpf_get_error(map);
- goto error;
- }
-
- while (fgets(buf, sizeof(buf), f)) {
- if (is_invalid_entry(buf, kernel))
- continue;
-
- free(name);
- if (sscanf(buf, "%ms$*[^\n]\n", &name) != 1)
- continue;
- if (skip_entry(name))
- continue;
-
- ks = search_kallsyms_custom_local(ksyms, name, search_kallsyms_compare);
- if (!ks) {
- err = -EINVAL;
- goto error;
- }
-
- ksym_name = ks->name;
- err = hashmap__add(map, ksym_name, 0);
- if (err == -EEXIST) {
- err = 0;
- continue;
- }
- if (err)
- goto error;
-
- err = libbpf_ensure_mem((void **) &syms, &cap,
- sizeof(*syms), cnt + 1);
- if (err)
- goto error;
-
- syms[cnt++] = ksym_name;
- }
-
- *symsp = syms;
- *cntp = cnt;
-
-error:
- free(name);
- fclose(f);
- hashmap__free(map);
- if (err)
- free(syms);
- return err;
-}
-
-static int get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel)
-{
- unsigned long *addr, *addrs, *tmp_addrs;
- int err = 0, max_cnt, inc_cnt;
- char *name = NULL;
- size_t cnt = 0;
- char buf[256];
- FILE *f;
-
- if (access("/sys/kernel/tracing/trace", F_OK) == 0)
- f = fopen("/sys/kernel/tracing/available_filter_functions_addrs", "r");
- else
- f = fopen("/sys/kernel/debug/tracing/available_filter_functions_addrs", "r");
-
- if (!f)
- return -ENOENT;
-
- /* In my local setup, the number of entries is 50k+ so Let us initially
- * allocate space to hold 64k entries. If 64k is not enough, incrementally
- * increase 1k each time.
- */
- max_cnt = 65536;
- inc_cnt = 1024;
- addrs = malloc(max_cnt * sizeof(long));
- if (addrs == NULL) {
- err = -ENOMEM;
- goto error;
- }
-
- while (fgets(buf, sizeof(buf), f)) {
- if (is_invalid_entry(buf, kernel))
- continue;
-
- free(name);
- if (sscanf(buf, "%p %ms$*[^\n]\n", &addr, &name) != 2)
- continue;
- if (skip_entry(name))
- continue;
-
- if (cnt == max_cnt) {
- max_cnt += inc_cnt;
- tmp_addrs = realloc(addrs, max_cnt);
- if (!tmp_addrs) {
- err = -ENOMEM;
- goto error;
- }
- addrs = tmp_addrs;
- }
-
- addrs[cnt++] = (unsigned long)addr;
- }
-
- *addrsp = addrs;
- *cntp = cnt;
-
-error:
- free(name);
- fclose(f);
- if (err)
- free(addrs);
- return err;
-}
-
static void do_bench_test(struct kprobe_multi_empty *skel, struct bpf_kprobe_multi_opts *opts)
{
long attach_start_ns, attach_end_ns;
@@ -670,7 +456,7 @@ static void test_kprobe_multi_bench_attach(bool kernel)
char **syms = NULL;
size_t cnt = 0;
- if (!ASSERT_OK(get_syms(&syms, &cnt, kernel), "get_syms"))
+ if (!ASSERT_OK(bpf_get_ksyms(&syms, &cnt, kernel), "bpf_get_ksyms"))
return;
skel = kprobe_multi_empty__open_and_load();
@@ -696,13 +482,13 @@ static void test_kprobe_multi_bench_attach_addr(bool kernel)
size_t cnt = 0;
int err;
- err = get_addrs(&addrs, &cnt, kernel);
+ err = bpf_get_addrs(&addrs, &cnt, kernel);
if (err == -ENOENT) {
test__skip();
return;
}
- if (!ASSERT_OK(err, "get_addrs"))
+ if (!ASSERT_OK(err, "bpf_get_addrs"))
return;
skel = kprobe_multi_empty__open_and_load();
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 81943c6254e6..d24baf244d1f 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -17,6 +17,7 @@
#include <linux/limits.h>
#include <libelf.h>
#include <gelf.h>
+#include "bpf/hashmap.h"
#include "bpf/libbpf_internal.h"
#define TRACEFS_PIPE "/sys/kernel/tracing/trace_pipe"
@@ -519,3 +520,216 @@ void read_trace_pipe(void)
{
read_trace_pipe_iter(trace_pipe_cb, NULL, 0);
}
+
+static size_t symbol_hash(long key, void *ctx __maybe_unused)
+{
+ return str_hash((const char *) key);
+}
+
+static bool symbol_equal(long key1, long key2, void *ctx __maybe_unused)
+{
+ return strcmp((const char *) key1, (const char *) key2) == 0;
+}
+
+static bool is_invalid_entry(char *buf, bool kernel)
+{
+ if (kernel && strchr(buf, '['))
+ return true;
+ if (!kernel && !strchr(buf, '['))
+ return true;
+ return false;
+}
+
+static bool skip_entry(char *name)
+{
+ /*
+ * We attach to almost all kernel functions and some of them
+ * will cause 'suspicious RCU usage' when fprobe is attached
+ * to them. Filter out the current culprits - arch_cpu_idle
+ * default_idle and rcu_* functions.
+ */
+ if (!strcmp(name, "arch_cpu_idle"))
+ return true;
+ if (!strcmp(name, "default_idle"))
+ return true;
+ if (!strncmp(name, "rcu_", 4))
+ return true;
+ if (!strcmp(name, "bpf_dispatcher_xdp_func"))
+ return true;
+ if (!strncmp(name, "__ftrace_invalid_address__",
+ sizeof("__ftrace_invalid_address__") - 1))
+ return true;
+ return false;
+}
+
+/* Do comparison by ignoring '.llvm.<hash>' suffixes. */
+static int compare_name(const char *name1, const char *name2)
+{
+ const char *res1, *res2;
+ int len1, len2;
+
+ res1 = strstr(name1, ".llvm.");
+ res2 = strstr(name2, ".llvm.");
+ len1 = res1 ? res1 - name1 : strlen(name1);
+ len2 = res2 ? res2 - name2 : strlen(name2);
+
+ if (len1 == len2)
+ return strncmp(name1, name2, len1);
+ if (len1 < len2)
+ return strncmp(name1, name2, len1) <= 0 ? -1 : 1;
+ return strncmp(name1, name2, len2) >= 0 ? 1 : -1;
+}
+
+static int load_kallsyms_compare(const void *p1, const void *p2)
+{
+ return compare_name(((const struct ksym *)p1)->name, ((const struct ksym *)p2)->name);
+}
+
+static int search_kallsyms_compare(const void *p1, const struct ksym *p2)
+{
+ return compare_name(p1, p2->name);
+}
+
+int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel)
+{
+ size_t cap = 0, cnt = 0;
+ char *name = NULL, *ksym_name, **syms = NULL;
+ struct hashmap *map;
+ struct ksyms *ksyms;
+ struct ksym *ks;
+ char buf[256];
+ FILE *f;
+ int err = 0;
+
+ ksyms = load_kallsyms_custom_local(load_kallsyms_compare);
+ if (!ksyms)
+ return -EINVAL;
+
+ /*
+ * The available_filter_functions contains many duplicates,
+ * but other than that all symbols are usable to trace.
+ * Filtering out duplicates by using hashmap__add, which won't
+ * add existing entry.
+ */
+
+ if (access("/sys/kernel/tracing/trace", F_OK) == 0)
+ f = fopen("/sys/kernel/tracing/available_filter_functions", "r");
+ else
+ f = fopen("/sys/kernel/debug/tracing/available_filter_functions", "r");
+
+ if (!f)
+ return -EINVAL;
+
+ map = hashmap__new(symbol_hash, symbol_equal, NULL);
+ if (IS_ERR(map)) {
+ err = libbpf_get_error(map);
+ goto error;
+ }
+
+ while (fgets(buf, sizeof(buf), f)) {
+ if (is_invalid_entry(buf, kernel))
+ continue;
+
+ free(name);
+ if (sscanf(buf, "%ms$*[^\n]\n", &name) != 1)
+ continue;
+ if (skip_entry(name))
+ continue;
+
+ ks = search_kallsyms_custom_local(ksyms, name, search_kallsyms_compare);
+ if (!ks) {
+ err = -EINVAL;
+ goto error;
+ }
+
+ ksym_name = ks->name;
+ err = hashmap__add(map, ksym_name, 0);
+ if (err == -EEXIST) {
+ err = 0;
+ continue;
+ }
+ if (err)
+ goto error;
+
+ err = libbpf_ensure_mem((void **) &syms, &cap,
+ sizeof(*syms), cnt + 1);
+ if (err)
+ goto error;
+
+ syms[cnt++] = ksym_name;
+ }
+
+ *symsp = syms;
+ *cntp = cnt;
+
+error:
+ free(name);
+ fclose(f);
+ hashmap__free(map);
+ if (err)
+ free(syms);
+ return err;
+}
+
+int bpf_get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel)
+{
+ unsigned long *addr, *addrs, *tmp_addrs;
+ int err = 0, max_cnt, inc_cnt;
+ char *name = NULL;
+ size_t cnt = 0;
+ char buf[256];
+ FILE *f;
+
+ if (access("/sys/kernel/tracing/trace", F_OK) == 0)
+ f = fopen("/sys/kernel/tracing/available_filter_functions_addrs", "r");
+ else
+ f = fopen("/sys/kernel/debug/tracing/available_filter_functions_addrs", "r");
+
+ if (!f)
+ return -ENOENT;
+
+ /* In my local setup, the number of entries is 50k+ so Let us initially
+ * allocate space to hold 64k entries. If 64k is not enough, incrementally
+ * increase 1k each time.
+ */
+ max_cnt = 65536;
+ inc_cnt = 1024;
+ addrs = malloc(max_cnt * sizeof(long));
+ if (addrs == NULL) {
+ err = -ENOMEM;
+ goto error;
+ }
+
+ while (fgets(buf, sizeof(buf), f)) {
+ if (is_invalid_entry(buf, kernel))
+ continue;
+
+ free(name);
+ if (sscanf(buf, "%p %ms$*[^\n]\n", &addr, &name) != 2)
+ continue;
+ if (skip_entry(name))
+ continue;
+
+ if (cnt == max_cnt) {
+ max_cnt += inc_cnt;
+ tmp_addrs = realloc(addrs, max_cnt);
+ if (!tmp_addrs) {
+ err = -ENOMEM;
+ goto error;
+ }
+ addrs = tmp_addrs;
+ }
+
+ addrs[cnt++] = (unsigned long)addr;
+ }
+
+ *addrsp = addrs;
+ *cntp = cnt;
+
+error:
+ free(name);
+ fclose(f);
+ if (err)
+ free(addrs);
+ return err;
+}
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
index 2ce873c9f9aa..9437bdd4afa5 100644
--- a/tools/testing/selftests/bpf/trace_helpers.h
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -41,4 +41,7 @@ ssize_t get_rel_offset(uintptr_t addr);
int read_build_id(const char *path, char *build_id, size_t size);
+int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel);
+int bpf_get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel);
+
#endif
--
2.50.1
* [PATCH bpf-next 3/4] selftests/bpf: skip recursive functions for kprobe_multi
2025-07-28 4:12 [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Menglong Dong
2025-07-28 4:12 ` [PATCH bpf-next 1/4] fprobe: use rhashtable Menglong Dong
2025-07-28 4:12 ` [PATCH bpf-next 2/4] selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c Menglong Dong
@ 2025-07-28 4:12 ` Menglong Dong
2025-07-28 4:12 ` [PATCH bpf-next 4/4] selftests/bpf: add benchmark testing for kprobe-multi-all Menglong Dong
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-07-28 4:12 UTC (permalink / raw)
To: alexei.starovoitov, mhiramat
Cc: rostedt, mathieu.desnoyers, hca, revest, linux-kernel,
linux-trace-kernel, bpf
Some functions recurse when they are hooked by kprobe_multi and impact the
benchmark results, so just skip them.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/testing/selftests/bpf/trace_helpers.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index d24baf244d1f..9da9da51b132 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -559,6 +559,22 @@ static bool skip_entry(char *name)
if (!strncmp(name, "__ftrace_invalid_address__",
sizeof("__ftrace_invalid_address__") - 1))
return true;
+
+ if (!strcmp(name, "migrate_disable"))
+ return true;
+ if (!strcmp(name, "migrate_enable"))
+ return true;
+ if (!strcmp(name, "rcu_read_unlock_strict"))
+ return true;
+ if (!strcmp(name, "preempt_count_add"))
+ return true;
+ if (!strcmp(name, "preempt_count_sub"))
+ return true;
+ if (!strcmp(name, "__rcu_read_lock"))
+ return true;
+ if (!strcmp(name, "__rcu_read_unlock"))
+ return true;
+
return false;
}
--
2.50.1
* [PATCH bpf-next 4/4] selftests/bpf: add benchmark testing for kprobe-multi-all
2025-07-28 4:12 [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Menglong Dong
` (2 preceding siblings ...)
2025-07-28 4:12 ` [PATCH bpf-next 3/4] selftests/bpf: skip recursive functions for kprobe_multi Menglong Dong
@ 2025-07-28 4:12 ` Menglong Dong
2025-07-28 7:28 ` [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Masami Hiramatsu
2025-07-28 13:14 ` Jiri Olsa
5 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-07-28 4:12 UTC (permalink / raw)
To: alexei.starovoitov, mhiramat
Cc: rostedt, mathieu.desnoyers, hca, revest, linux-kernel,
linux-trace-kernel, bpf
For now, the kprobe-multi benchmark hooks only a single function during
testing. Add the "kprobe-multi-all" benchmark, which hooks all the kernel
functions during the benchmark run.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/testing/selftests/bpf/bench.c | 2 ++
.../selftests/bpf/benchs/bench_trigger.c | 30 +++++++++++++++++++
.../selftests/bpf/benchs/run_bench_trigger.sh | 2 +-
3 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index ddd73d06a1eb..da971d8c5ae5 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -510,6 +510,7 @@ extern const struct bench bench_trig_kretprobe;
extern const struct bench bench_trig_kprobe_multi;
extern const struct bench bench_trig_kretprobe_multi;
extern const struct bench bench_trig_fentry;
+extern const struct bench bench_trig_kprobe_multi_all;
extern const struct bench bench_trig_fexit;
extern const struct bench bench_trig_fmodret;
extern const struct bench bench_trig_tp;
@@ -578,6 +579,7 @@ static const struct bench *benchs[] = {
&bench_trig_kprobe_multi,
&bench_trig_kretprobe_multi,
&bench_trig_fentry,
+ &bench_trig_kprobe_multi_all,
&bench_trig_fexit,
&bench_trig_fmodret,
&bench_trig_tp,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 82327657846e..be5fe88862a4 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -226,6 +226,35 @@ static void trigger_fentry_setup(void)
attach_bpf(ctx.skel->progs.bench_trigger_fentry);
}
+static void trigger_kprobe_multi_all_setup(void)
+{
+ LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
+ struct bpf_program *prog;
+ struct bpf_link *link;
+ char **syms = NULL;
+ size_t cnt = 0;
+
+ setup_ctx();
+ prog = ctx.skel->progs.bench_trigger_kprobe_multi;
+ bpf_program__set_autoload(prog, true);
+ load_ctx();
+
+ if (bpf_get_ksyms(&syms, &cnt, true)) {
+ printf("failed to get ksyms\n");
+ exit(1);
+ }
+
+ printf("found %zu ksyms\n", cnt);
+ opts.syms = (const char **) syms;
+ opts.cnt = cnt;
+ link = bpf_program__attach_kprobe_multi_opts(prog, NULL, &opts);
+ if (!link) {
+ printf("failed to attach bpf_program__attach_kprobe_multi_opts to all\n");
+ exit(1);
+ }
+ ctx.skel->links.bench_trigger_kprobe_multi = link;
+}
+
static void trigger_fexit_setup(void)
{
setup_ctx();
@@ -512,6 +541,7 @@ BENCH_TRIG_KERNEL(kretprobe, "kretprobe");
BENCH_TRIG_KERNEL(kprobe_multi, "kprobe-multi");
BENCH_TRIG_KERNEL(kretprobe_multi, "kretprobe-multi");
BENCH_TRIG_KERNEL(fentry, "fentry");
+BENCH_TRIG_KERNEL(kprobe_multi_all, "kprobe-multi-all");
BENCH_TRIG_KERNEL(fexit, "fexit");
BENCH_TRIG_KERNEL(fmodret, "fmodret");
BENCH_TRIG_KERNEL(tp, "tp");
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh b/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
index a690f5a68b6b..886b6ffc9742 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
@@ -6,7 +6,7 @@ def_tests=( \
usermode-count kernel-count syscall-count \
fentry fexit fmodret \
rawtp tp \
- kprobe kprobe-multi \
+ kprobe kprobe-multi kprobe-multi-all \
kretprobe kretprobe-multi \
)
--
2.50.1
* Re: [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table
2025-07-28 4:12 [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Menglong Dong
` (3 preceding siblings ...)
2025-07-28 4:12 ` [PATCH bpf-next 4/4] selftests/bpf: add benchmark testing for kprobe-multi-all Menglong Dong
@ 2025-07-28 7:28 ` Masami Hiramatsu
2025-07-28 13:14 ` Jiri Olsa
5 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu @ 2025-07-28 7:28 UTC (permalink / raw)
To: Menglong Dong
Cc: alexei.starovoitov, rostedt, mathieu.desnoyers, hca, revest,
linux-kernel, linux-trace-kernel, bpf
On Mon, 28 Jul 2025 12:12:47 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:
> For now, the budget of the hash table that is used for fprobe_ip_table is
> fixed, which is 256, and can cause huge overhead when the hooked functions
> is a huge quantity.
>
> In this series, we use rhashtable for fprobe_ip_table to reduce the
> overhead.
>
> Meanwhile, we also add the benchmark testcase "kprobe-multi-all", which
> will hook all the kernel functions during the testing. Before this series,
> the performance is:
> usermode-count : 875.380 ± 0.366M/s
> kernel-count : 435.924 ± 0.461M/s
> syscall-count : 31.004 ± 0.017M/s
> fentry : 134.076 ± 1.752M/s
> fexit : 68.319 ± 0.055M/s
> fmodret : 71.530 ± 0.032M/s
> rawtp : 202.751 ± 0.138M/s
> tp : 79.562 ± 0.084M/s
> kprobe : 55.587 ± 0.028M/s
> kprobe-multi : 56.481 ± 0.043M/s
> kprobe-multi-all: 6.283 ± 0.005M/s << look this
> kretprobe : 22.378 ± 0.028M/s
> kretprobe-multi: 28.205 ± 0.025M/s
>
> With this series, the performance is:
> usermode-count : 897.083 ± 5.347M/s
> kernel-count : 431.638 ± 1.781M/s
> syscall-count : 30.807 ± 0.057M/s
> fentry : 134.803 ± 1.045M/s
> fexit : 68.763 ± 0.018M/s
> fmodret : 71.444 ± 0.052M/s
> rawtp : 202.344 ± 0.149M/s
> tp : 79.644 ± 0.376M/s
> kprobe : 55.480 ± 0.108M/s
> kprobe-multi : 57.302 ± 0.119M/s
> kprobe-multi-all: 57.855 ± 0.144M/s << look this
> kretprobe : 22.265 ± 0.023M/s
> kretprobe-multi: 27.740 ± 0.023M/s
>
> The benchmark of "kprobe-multi-all" increase from 6.283M/s to 57.855M/s.
Wow, great improvement. Interesting. Let me review it.
Thanks!!
>
> Menglong Dong (4):
> fprobe: use rhashtable
> selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
> selftests/bpf: add benchmark testing for kprobe-multi-all
> selftests/bpf: skip recursive functions for kprobe_multi
>
> include/linux/fprobe.h | 2 +-
> kernel/trace/fprobe.c | 144 ++++++-----
> tools/testing/selftests/bpf/bench.c | 2 +
> .../selftests/bpf/benchs/bench_trigger.c | 30 +++
> .../selftests/bpf/benchs/run_bench_trigger.sh | 2 +-
> .../bpf/prog_tests/kprobe_multi_test.c | 220 +----------------
> tools/testing/selftests/bpf/trace_helpers.c | 230 ++++++++++++++++++
> tools/testing/selftests/bpf/trace_helpers.h | 3 +
> 8 files changed, 351 insertions(+), 282 deletions(-)
>
> --
> 2.50.1
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [PATCH bpf-next 1/4] fprobe: use rhashtable
2025-07-28 4:12 ` [PATCH bpf-next 1/4] fprobe: use rhashtable Menglong Dong
@ 2025-07-28 13:13 ` Jiri Olsa
2025-07-28 14:44 ` Menglong Dong
2025-07-29 3:43 ` kernel test robot
1 sibling, 1 reply; 11+ messages in thread
From: Jiri Olsa @ 2025-07-28 13:13 UTC (permalink / raw)
To: Menglong Dong
Cc: alexei.starovoitov, mhiramat, rostedt, mathieu.desnoyers, hca,
revest, linux-kernel, linux-trace-kernel, bpf
On Mon, Jul 28, 2025 at 12:12:48PM +0800, Menglong Dong wrote:
SNIP
> +static const struct rhashtable_params fprobe_rht_params = {
> + .head_offset = offsetof(struct fprobe_hlist_node, hlist),
> + .key_offset = offsetof(struct fprobe_hlist_node, addr),
> + .key_len = sizeof_field(struct fprobe_hlist_node, addr),
> + .hashfn = fprobe_node_hashfn,
> + .obj_hashfn = fprobe_node_obj_hashfn,
> + .obj_cmpfn = fprobe_node_cmp,
> + .automatic_shrinking = true,
> +};
>
> /* Node insertion and deletion requires the fprobe_mutex */
> static void insert_fprobe_node(struct fprobe_hlist_node *node)
> {
> - unsigned long ip = node->addr;
> - struct fprobe_hlist_node *next;
> - struct hlist_head *head;
> -
> lockdep_assert_held(&fprobe_mutex);
>
> - next = find_first_fprobe_node(ip);
> - if (next) {
> - hlist_add_before_rcu(&node->hlist, &next->hlist);
> - return;
> - }
> - head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
> - hlist_add_head_rcu(&node->hlist, head);
> + rhashtable_insert_fast(&fprobe_ip_table, &node->hlist,
> + fprobe_rht_params);
now that rhashtable_insert_fast can fail, I think insert_fprobe_node
needs to be able to fail as well
> }
>
> /* Return true if there are synonims */
> @@ -92,9 +93,11 @@ static bool delete_fprobe_node(struct fprobe_hlist_node *node)
> /* Avoid double deleting */
> if (READ_ONCE(node->fp) != NULL) {
> WRITE_ONCE(node->fp, NULL);
> - hlist_del_rcu(&node->hlist);
> + rhashtable_remove_fast(&fprobe_ip_table, &node->hlist,
> + fprobe_rht_params);
I guess this one can't fail in here.. ?
jirka
> }
> - return !!find_first_fprobe_node(node->addr);
> + return !!rhashtable_lookup_fast(&fprobe_ip_table, &node->addr,
> + fprobe_rht_params);
> }
>
SNIP
* Re: [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table
2025-07-28 4:12 [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Menglong Dong
` (4 preceding siblings ...)
2025-07-28 7:28 ` [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table Masami Hiramatsu
@ 2025-07-28 13:14 ` Jiri Olsa
2025-07-28 14:36 ` Menglong Dong
5 siblings, 1 reply; 11+ messages in thread
From: Jiri Olsa @ 2025-07-28 13:14 UTC (permalink / raw)
To: Menglong Dong
Cc: alexei.starovoitov, mhiramat, rostedt, mathieu.desnoyers, hca,
revest, linux-kernel, linux-trace-kernel, bpf
On Mon, Jul 28, 2025 at 12:12:47PM +0800, Menglong Dong wrote:
> For now, the budget of the hash table that is used for fprobe_ip_table is
> fixed, which is 256, and can cause huge overhead when the hooked functions
> is a huge quantity.
>
> In this series, we use rhashtable for fprobe_ip_table to reduce the
> overhead.
>
> Meanwhile, we also add the benchmark testcase "kprobe-multi-all", which
> will hook all the kernel functions during the testing. Before this series,
> the performance is:
> usermode-count : 875.380 ± 0.366M/s
> kernel-count : 435.924 ± 0.461M/s
> syscall-count : 31.004 ± 0.017M/s
> fentry : 134.076 ± 1.752M/s
> fexit : 68.319 ± 0.055M/s
> fmodret : 71.530 ± 0.032M/s
> rawtp : 202.751 ± 0.138M/s
> tp : 79.562 ± 0.084M/s
> kprobe : 55.587 ± 0.028M/s
> kprobe-multi : 56.481 ± 0.043M/s
> kprobe-multi-all: 6.283 ± 0.005M/s << look this
> kretprobe : 22.378 ± 0.028M/s
> kretprobe-multi: 28.205 ± 0.025M/s
>
> With this series, the performance is:
> usermode-count : 897.083 ± 5.347M/s
> kernel-count : 431.638 ± 1.781M/s
> syscall-count : 30.807 ± 0.057M/s
> fentry : 134.803 ± 1.045M/s
> fexit : 68.763 ± 0.018M/s
> fmodret : 71.444 ± 0.052M/s
> rawtp : 202.344 ± 0.149M/s
> tp : 79.644 ± 0.376M/s
> kprobe : 55.480 ± 0.108M/s
> kprobe-multi : 57.302 ± 0.119M/s
> kprobe-multi-all: 57.855 ± 0.144M/s << look this
nice, so we still trigger one function, but have all possible
functions attached, right?
thanks,
jirka
> kretprobe : 22.265 ± 0.023M/s
> kretprobe-multi: 27.740 ± 0.023M/s
>
> The benchmark of "kprobe-multi-all" increase from 6.283M/s to 57.855M/s.
>
> Menglong Dong (4):
> fprobe: use rhashtable
> selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
> selftests/bpf: add benchmark testing for kprobe-multi-all
> selftests/bpf: skip recursive functions for kprobe_multi
>
> include/linux/fprobe.h | 2 +-
> kernel/trace/fprobe.c | 144 ++++++-----
> tools/testing/selftests/bpf/bench.c | 2 +
> .../selftests/bpf/benchs/bench_trigger.c | 30 +++
> .../selftests/bpf/benchs/run_bench_trigger.sh | 2 +-
> .../bpf/prog_tests/kprobe_multi_test.c | 220 +----------------
> tools/testing/selftests/bpf/trace_helpers.c | 230 ++++++++++++++++++
> tools/testing/selftests/bpf/trace_helpers.h | 3 +
> 8 files changed, 351 insertions(+), 282 deletions(-)
>
> --
> 2.50.1
>
>
* Re: [PATCH bpf-next 0/4] fprobe: use rhashtable for fprobe_ip_table
2025-07-28 13:14 ` Jiri Olsa
@ 2025-07-28 14:36 ` Menglong Dong
0 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-07-28 14:36 UTC (permalink / raw)
To: Jiri Olsa
Cc: alexei.starovoitov, mhiramat, rostedt, mathieu.desnoyers, hca,
revest, linux-kernel, linux-trace-kernel, bpf
On Mon, Jul 28, 2025 at 9:14 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Jul 28, 2025 at 12:12:47PM +0800, Menglong Dong wrote:
> > For now, the budget of the hash table that is used for fprobe_ip_table is
> > fixed, which is 256, and can cause huge overhead when the hooked functions
> > is a huge quantity.
> >
> > In this series, we use rhashtable for fprobe_ip_table to reduce the
> > overhead.
> >
> > Meanwhile, we also add the benchmark testcase "kprobe-multi-all", which
> > will hook all the kernel functions during the testing. Before this series,
> > the performance is:
> > usermode-count : 875.380 ± 0.366M/s
> > kernel-count : 435.924 ± 0.461M/s
> > syscall-count : 31.004 ± 0.017M/s
> > fentry : 134.076 ± 1.752M/s
> > fexit : 68.319 ± 0.055M/s
> > fmodret : 71.530 ± 0.032M/s
> > rawtp : 202.751 ± 0.138M/s
> > tp : 79.562 ± 0.084M/s
> > kprobe : 55.587 ± 0.028M/s
> > kprobe-multi : 56.481 ± 0.043M/s
> > kprobe-multi-all: 6.283 ± 0.005M/s << look this
> > kretprobe : 22.378 ± 0.028M/s
> > kretprobe-multi: 28.205 ± 0.025M/s
> >
> > With this series, the performance is:
> > usermode-count : 897.083 ± 5.347M/s
> > kernel-count : 431.638 ± 1.781M/s
> > syscall-count : 30.807 ± 0.057M/s
> > fentry : 134.803 ± 1.045M/s
> > fexit : 68.763 ± 0.018M/s
> > fmodret : 71.444 ± 0.052M/s
> > rawtp : 202.344 ± 0.149M/s
> > tp : 79.644 ± 0.376M/s
> > kprobe : 55.480 ± 0.108M/s
> > kprobe-multi : 57.302 ± 0.119M/s
> > kprobe-multi-all: 57.855 ± 0.144M/s << look this
>
> nice, so the we still trigger one function, but having all possible
> functions attached, right?
Yes. The test case can be improved further. For now,
I attach the prog bench_trigger_kprobe_multi to all the kernel
functions and trigger the benchmark. There can be some noise,
as calls to any other kernel function also bump the benchmark
counter. However, it will not make much difference.
A better choice would be: attach an empty kprobe_multi prog to
all the kernel functions except bpf_get_numa_node_id, and
attach bench_trigger_kprobe_multi to bpf_get_numa_node_id only,
which would make the results more accurate.
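Just to make the idea concrete, a minimal sketch (illustration only; the
no-op prog "empty_prog" and the symbol filtering are assumptions, not code
from this series):
	/* drop the trigger symbol from the "attach everything" set */
	for (size_t i = 0; i < cnt; i++) {
		if (!strcmp(syms[i], "bpf_get_numa_node_id")) {
			syms[i] = syms[--cnt];
			break;
		}
	}
	opts.syms = (const char **)syms;
	opts.cnt = cnt;
	/* hypothetical no-op prog on every other kernel function */
	link = bpf_program__attach_kprobe_multi_opts(empty_prog, NULL, &opts);
	/* counting prog only on the function the benchmark actually calls */
	link = bpf_program__attach_kprobe_multi_opts(
			ctx.skel->progs.bench_trigger_kprobe_multi,
			"bpf_get_numa_node_id", NULL);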
>
> thanks,
> jirka
>
>
> > kretprobe : 22.265 ± 0.023M/s
> > kretprobe-multi: 27.740 ± 0.023M/s
> >
> > The benchmark of "kprobe-multi-all" increase from 6.283M/s to 57.855M/s.
> >
> > Menglong Dong (4):
> > fprobe: use rhashtable
> > selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
> > selftests/bpf: add benchmark testing for kprobe-multi-all
> > selftests/bpf: skip recursive functions for kprobe_multi
> >
> > include/linux/fprobe.h | 2 +-
> > kernel/trace/fprobe.c | 144 ++++++-----
> > tools/testing/selftests/bpf/bench.c | 2 +
> > .../selftests/bpf/benchs/bench_trigger.c | 30 +++
> > .../selftests/bpf/benchs/run_bench_trigger.sh | 2 +-
> > .../bpf/prog_tests/kprobe_multi_test.c | 220 +----------------
> > tools/testing/selftests/bpf/trace_helpers.c | 230 ++++++++++++++++++
> > tools/testing/selftests/bpf/trace_helpers.h | 3 +
> > 8 files changed, 351 insertions(+), 282 deletions(-)
> >
> > --
> > 2.50.1
> >
> >
* Re: [PATCH bpf-next 1/4] fprobe: use rhashtable
2025-07-28 13:13 ` Jiri Olsa
@ 2025-07-28 14:44 ` Menglong Dong
0 siblings, 0 replies; 11+ messages in thread
From: Menglong Dong @ 2025-07-28 14:44 UTC (permalink / raw)
To: Jiri Olsa
Cc: alexei.starovoitov, mhiramat, rostedt, mathieu.desnoyers, hca,
revest, linux-kernel, linux-trace-kernel, bpf
On Mon, Jul 28, 2025 at 9:13 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Jul 28, 2025 at 12:12:48PM +0800, Menglong Dong wrote:
>
> SNIP
>
> > +static const struct rhashtable_params fprobe_rht_params = {
> > + .head_offset = offsetof(struct fprobe_hlist_node, hlist),
> > + .key_offset = offsetof(struct fprobe_hlist_node, addr),
> > + .key_len = sizeof_field(struct fprobe_hlist_node, addr),
> > + .hashfn = fprobe_node_hashfn,
> > + .obj_hashfn = fprobe_node_obj_hashfn,
> > + .obj_cmpfn = fprobe_node_cmp,
> > + .automatic_shrinking = true,
> > +};
> >
> > /* Node insertion and deletion requires the fprobe_mutex */
> > static void insert_fprobe_node(struct fprobe_hlist_node *node)
> > {
> > - unsigned long ip = node->addr;
> > - struct fprobe_hlist_node *next;
> > - struct hlist_head *head;
> > -
> > lockdep_assert_held(&fprobe_mutex);
> >
> > - next = find_first_fprobe_node(ip);
> > - if (next) {
> > - hlist_add_before_rcu(&node->hlist, &next->hlist);
> > - return;
> > - }
> > - head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
> > - hlist_add_head_rcu(&node->hlist, head);
> > + rhashtable_insert_fast(&fprobe_ip_table, &node->hlist,
> > + fprobe_rht_params);
>
> onw that rhashtable_insert_fast can fail, I think insert_fprobe_node
> needs to be able to fail as well
You are right, insert_fprobe_node() should return an error, and the
callers should handle it properly.
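Something along these lines, I guess (just a sketch, not tested):
	/* let the rhashtable_insert_fast() error code propagate to the caller */
	static int insert_fprobe_node(struct fprobe_hlist_node *node)
	{
		lockdep_assert_held(&fprobe_mutex);
		return rhashtable_insert_fast(&fprobe_ip_table, &node->hlist,
					      fprobe_rht_params);
	}
and the register path would then unwind the nodes it already inserted when
it sees a non-zero return.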
>
> > }
> >
> > /* Return true if there are synonims */
> > @@ -92,9 +93,11 @@ static bool delete_fprobe_node(struct fprobe_hlist_node *node)
> > /* Avoid double deleting */
> > if (READ_ONCE(node->fp) != NULL) {
> > WRITE_ONCE(node->fp, NULL);
> > - hlist_del_rcu(&node->hlist);
> > + rhashtable_remove_fast(&fprobe_ip_table, &node->hlist,
> > + fprobe_rht_params);
>
> I guess this one can't fail in here.. ?
Yeah, the only failure case is when the entry doesn't exist in the hash
table.
BTW, the usage of rhltable is similar to rhashtable, and this comment
applies to the V2 as well.
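For reference, roughly what the rhltable flavor looks like (my own sketch
based on the generic rhltable API, not the actual V2 code):
	struct fprobe_hlist_node {
		struct rhlist_head hlist;	/* rhlist_head instead of rhash_head */
		unsigned long addr;
		struct fprobe *fp;
	};
	/* fprobe_ip_table becomes a struct rhltable; a lookup returns the
	 * whole chain of nodes that share the same addr */
	struct rhlist_head *list, *pos;
	struct fprobe_hlist_node *node;
	list = rhltable_lookup(&fprobe_ip_table, &func, fprobe_rht_params);
	rhl_for_each_entry_rcu(node, pos, list, hlist) {
		/* call back each fprobe attached to func */
	}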
Thanks!
Menglong Dong
>
> jirka
>
> > }
> > - return !!find_first_fprobe_node(node->addr);
> > + return !!rhashtable_lookup_fast(&fprobe_ip_table, &node->addr,
> > + fprobe_rht_params);
> > }
> >
>
> SNIP
* Re: [PATCH bpf-next 1/4] fprobe: use rhashtable
2025-07-28 4:12 ` [PATCH bpf-next 1/4] fprobe: use rhashtable Menglong Dong
2025-07-28 13:13 ` Jiri Olsa
@ 2025-07-29 3:43 ` kernel test robot
1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-07-29 3:43 UTC (permalink / raw)
To: Menglong Dong, alexei.starovoitov, mhiramat
Cc: llvm, oe-kbuild-all, rostedt, mathieu.desnoyers, hca, revest,
linux-kernel, linux-trace-kernel, bpf
Hi Menglong,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/fprobe-use-rhashtable/20250728-121631
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20250728041252.441040-2-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next 1/4] fprobe: use rhashtable
config: loongarch-allmodconfig (https://download.01.org/0day-ci/archive/20250729/202507291147.Fov8pl4N-lkp@intel.com/config)
compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250729/202507291147.Fov8pl4N-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507291147.Fov8pl4N-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from kernel/trace/fprobe.c:8:
>> include/linux/fprobe.h:29:20: error: field has incomplete type 'struct rhash_head'
29 | struct rhash_head hlist;
| ^
include/linux/fprobe.h:29:9: note: forward declaration of 'struct rhash_head'
29 | struct rhash_head hlist;
| ^
>> kernel/trace/fprobe.c:71:17: error: initializer element is not a compile-time constant
71 | .key_offset = offsetof(struct fprobe_hlist_node, addr),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/stddef.h:16:32: note: expanded from macro 'offsetof'
16 | #define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
--
In file included from kernel/trace/trace_fprobe.c:9:
>> include/linux/fprobe.h:29:20: error: field has incomplete type 'struct rhash_head'
29 | struct rhash_head hlist;
| ^
include/linux/fprobe.h:29:9: note: forward declaration of 'struct rhash_head'
29 | struct rhash_head hlist;
| ^
1 error generated.
vim +29 include/linux/fprobe.h
11
12 struct fprobe;
13 typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip,
14 unsigned long ret_ip, struct ftrace_regs *regs,
15 void *entry_data);
16
17 typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
18 unsigned long ret_ip, struct ftrace_regs *regs,
19 void *entry_data);
20
21 /**
22 * struct fprobe_hlist_node - address based hash list node for fprobe.
23 *
24 * @hlist: The hlist node for address search hash table.
25 * @addr: One of the probing address of @fp.
26 * @fp: The fprobe which owns this.
27 */
28 struct fprobe_hlist_node {
> 29 struct rhash_head hlist;
30 unsigned long addr;
31 struct fprobe *fp;
32 };
33
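Not part of the report, just a guess at the fix: since fprobe_hlist_node now
embeds struct rhash_head, include/linux/fprobe.h presumably needs the
rhashtable type definitions to be visible, e.g. something like:
	#include <linux/rhashtable-types.h>	/* makes struct rhash_head complete */
which should then also let the offsetof() initializer in fprobe.c become a
compile-time constant.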
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki