From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2455547D920 for ; Fri, 15 May 2026 11:32:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778844749; cv=none; b=FhphVt/oXOCzgcxOz4Z68k88naBMXZSVbX5nnVIg2BOZXweIFg9qRPXKRe10XE+1owkw/SmtepBiq5viMlrExaLUVnaRuvPijPeIXtu+BkwQJbw0BImVkIUDbW06PPpO2rHY/0nSLDaX9iISbuWGHV6jHfnTEF65QmnKUggSR6w= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778844749; c=relaxed/simple; bh=Gyo7X/zqPyaQW6BaRXNYoZ1jjssW6nD4TdQM6rF9TZA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=e2nALOoEwN5qv/9bC2kLh5Iti9z1KMpuYOou2QQPXaBhDmPgQE53H3yvueZ3z+yQrIlDBu5Hxkwn+LuZ3KgEMxFwp5xlZx+cFwDa+BXUXMLxKHk3C6/HuifUiHocBnMTfIdNqDK86tlmdZiIWNn377BVlKF0kFbgQFSyoOxdklQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=H+9xzzdg; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="H+9xzzdg" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EAC16C2BCC7; Fri, 15 May 2026 11:32:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778844748; bh=Gyo7X/zqPyaQW6BaRXNYoZ1jjssW6nD4TdQM6rF9TZA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=H+9xzzdg/06vJj59jf1gS/dxkeLM+WCb2b/hAkuFR9uwYfZMRX7DZNRp7QZTzJxd8 xTpp5qbIRC2kgewX/Y1HcICDZ4ltvay4svn+Aw57QUP1htWqRR4BWt/ZXMIc0UcYwd z5xI9NgpBttZj3KQQS2z5nbj1Io8egEY9Wnl4qYmaGHLcHwCZBtwEtQVl5zVPP187g 2Er5Tggx7plYBDyjxYlBqym9E30CT4PZFbtzP8+V7rXH6jgn6rdiczfVQct0oz/yTh y2bhREtappNbiqI4NxTx7I8yek69lfqx+BwdU1FqgTGfTe+tiUJJPQ6k/QZK17OXFV jkyiwQYnQDBwQ== From: Sasha Levin To: stable@vger.kernel.org Cc: Menglong Dong , Menglong Dong , "Masami Hiramatsu (Google)" , Sasha Levin Subject: [PATCH 6.18.y 1/4] tracing: fprobe: use rhltable for fprobe_ip_table Date: Fri, 15 May 2026 07:32:23 -0400 Message-ID: <20260515113226.2979191-1-sashal@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <2026051206-mocker-bonfire-ac1c@gregkh> References: <2026051206-mocker-bonfire-ac1c@gregkh> Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: Menglong Dong [ Upstream commit 0de4c70d04a46a3c266547dd4275ce25f623796a ] For now, all the kernel functions who are hooked by the fprobe will be added to the hash table "fprobe_ip_table". The key of it is the function address, and the value of it is "struct fprobe_hlist_node". The budget of the hash table is FPROBE_IP_TABLE_SIZE, which is 256. And this means the overhead of the hash table lookup will grow linearly if the count of the functions in the fprobe more than 256. When we try to hook all the kernel functions, the overhead will be huge. Therefore, replace the hash table with rhltable to reduce the overhead. Link: https://lore.kernel.org/all/20250819031825.55653-1-dongml2@chinatelecom.cn/ Signed-off-by: Menglong Dong Signed-off-by: Masami Hiramatsu (Google) Stable-dep-of: 845947aca681 ("tracing/fprobe: Remove fprobe from hash in failure path") Signed-off-by: Sasha Levin --- include/linux/fprobe.h | 3 +- kernel/trace/fprobe.c | 157 ++++++++++++++++++++++++----------------- 2 files changed, 93 insertions(+), 67 deletions(-) diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h index 7964db96e41a9..0a3bcd1718f37 100644 --- a/include/linux/fprobe.h +++ b/include/linux/fprobe.h @@ -7,6 +7,7 @@ #include #include #include +#include #include struct fprobe; @@ -26,7 +27,7 @@ typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip, * @fp: The fprobe which owns this. */ struct fprobe_hlist_node { - struct hlist_node hlist; + struct rhlist_head hlist; unsigned long addr; struct fprobe *fp; }; diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c index 43b27f07730c2..9db0a4e331132 100644 --- a/kernel/trace/fprobe.c +++ b/kernel/trace/fprobe.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -42,60 +43,67 @@ * - RCU hlist traversal under disabling preempt */ static struct hlist_head fprobe_table[FPROBE_TABLE_SIZE]; -static struct hlist_head fprobe_ip_table[FPROBE_IP_TABLE_SIZE]; +static struct rhltable fprobe_ip_table; static DEFINE_MUTEX(fprobe_mutex); -/* - * Find first fprobe in the hlist. It will be iterated twice in the entry - * probe, once for correcting the total required size, the second time is - * calling back the user handlers. - * Thus the hlist in the fprobe_table must be sorted and new probe needs to - * be added *before* the first fprobe. - */ -static struct fprobe_hlist_node *find_first_fprobe_node(unsigned long ip) +static u32 fprobe_node_hashfn(const void *data, u32 len, u32 seed) { - struct fprobe_hlist_node *node; - struct hlist_head *head; + return hash_ptr(*(unsigned long **)data, 32); +} - head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)]; - hlist_for_each_entry_rcu(node, head, hlist, - lockdep_is_held(&fprobe_mutex)) { - if (node->addr == ip) - return node; - } - return NULL; +static int fprobe_node_cmp(struct rhashtable_compare_arg *arg, + const void *ptr) +{ + unsigned long key = *(unsigned long *)arg->key; + const struct fprobe_hlist_node *n = ptr; + + return n->addr != key; } -NOKPROBE_SYMBOL(find_first_fprobe_node); -/* Node insertion and deletion requires the fprobe_mutex */ -static void insert_fprobe_node(struct fprobe_hlist_node *node) +static u32 fprobe_node_obj_hashfn(const void *data, u32 len, u32 seed) { - unsigned long ip = node->addr; - struct fprobe_hlist_node *next; - struct hlist_head *head; + const struct fprobe_hlist_node *n = data; + + return hash_ptr((void *)n->addr, 32); +} +static const struct rhashtable_params fprobe_rht_params = { + .head_offset = offsetof(struct fprobe_hlist_node, hlist), + .key_offset = offsetof(struct fprobe_hlist_node, addr), + .key_len = sizeof_field(struct fprobe_hlist_node, addr), + .hashfn = fprobe_node_hashfn, + .obj_hashfn = fprobe_node_obj_hashfn, + .obj_cmpfn = fprobe_node_cmp, + .automatic_shrinking = true, +}; + +/* Node insertion and deletion requires the fprobe_mutex */ +static int insert_fprobe_node(struct fprobe_hlist_node *node) +{ lockdep_assert_held(&fprobe_mutex); - next = find_first_fprobe_node(ip); - if (next) { - hlist_add_before_rcu(&node->hlist, &next->hlist); - return; - } - head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)]; - hlist_add_head_rcu(&node->hlist, head); + return rhltable_insert(&fprobe_ip_table, &node->hlist, fprobe_rht_params); } /* Return true if there are synonims */ static bool delete_fprobe_node(struct fprobe_hlist_node *node) { lockdep_assert_held(&fprobe_mutex); + bool ret; /* Avoid double deleting */ if (READ_ONCE(node->fp) != NULL) { WRITE_ONCE(node->fp, NULL); - hlist_del_rcu(&node->hlist); + rhltable_remove(&fprobe_ip_table, &node->hlist, + fprobe_rht_params); } - return !!find_first_fprobe_node(node->addr); + + rcu_read_lock(); + ret = !!rhltable_lookup(&fprobe_ip_table, &node->addr, + fprobe_rht_params); + rcu_read_unlock(); + + return ret; } /* Check existence of the fprobe */ @@ -247,9 +255,10 @@ static inline int __fprobe_kprobe_handler(unsigned long ip, unsigned long parent static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops, struct ftrace_regs *fregs) { - struct fprobe_hlist_node *node, *first; unsigned long *fgraph_data = NULL; unsigned long func = trace->func; + struct fprobe_hlist_node *node; + struct rhlist_head *head, *pos; unsigned long ret_ip; int reserved_words; struct fprobe *fp; @@ -258,14 +267,11 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops, if (WARN_ON_ONCE(!fregs)) return 0; - first = node = find_first_fprobe_node(func); - if (unlikely(!first)) - return 0; - + head = rhltable_lookup(&fprobe_ip_table, &func, fprobe_rht_params); reserved_words = 0; - hlist_for_each_entry_from_rcu(node, hlist) { + rhl_for_each_entry_rcu(node, pos, head, hlist) { if (node->addr != func) - break; + continue; fp = READ_ONCE(node->fp); if (!fp || !fp->exit_handler) continue; @@ -276,13 +282,12 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops, reserved_words += FPROBE_HEADER_SIZE_IN_LONG + SIZE_IN_LONG(fp->entry_data_size); } - node = first; if (reserved_words) { fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long)); if (unlikely(!fgraph_data)) { - hlist_for_each_entry_from_rcu(node, hlist) { + rhl_for_each_entry_rcu(node, pos, head, hlist) { if (node->addr != func) - break; + continue; fp = READ_ONCE(node->fp); if (fp && !fprobe_disabled(fp)) fp->nmissed++; @@ -297,12 +302,12 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops, */ ret_ip = ftrace_regs_get_return_address(fregs); used = 0; - hlist_for_each_entry_from_rcu(node, hlist) { + rhl_for_each_entry_rcu(node, pos, head, hlist) { int data_size; void *data; if (node->addr != func) - break; + continue; fp = READ_ONCE(node->fp); if (!fp || fprobe_disabled(fp)) continue; @@ -447,25 +452,21 @@ static int fprobe_addr_list_add(struct fprobe_addr_list *alist, unsigned long ad return 0; } -static void fprobe_remove_node_in_module(struct module *mod, struct hlist_head *head, - struct fprobe_addr_list *alist) +static void fprobe_remove_node_in_module(struct module *mod, struct fprobe_hlist_node *node, + struct fprobe_addr_list *alist) { - struct fprobe_hlist_node *node; int ret = 0; - hlist_for_each_entry_rcu(node, head, hlist, - lockdep_is_held(&fprobe_mutex)) { - if (!within_module(node->addr, mod)) - continue; - if (delete_fprobe_node(node)) - continue; - /* - * If failed to update alist, just continue to update hlist. - * Therefore, at list user handler will not hit anymore. - */ - if (!ret) - ret = fprobe_addr_list_add(alist, node->addr); - } + if (!within_module(node->addr, mod)) + return; + if (delete_fprobe_node(node)) + return; + /* + * If failed to update alist, just continue to update hlist. + * Therefore, at list user handler will not hit anymore. + */ + if (!ret) + ret = fprobe_addr_list_add(alist, node->addr); } /* Handle module unloading to manage fprobe_ip_table. */ @@ -473,8 +474,9 @@ static int fprobe_module_callback(struct notifier_block *nb, unsigned long val, void *data) { struct fprobe_addr_list alist = {.size = FPROBE_IPS_BATCH_INIT}; + struct fprobe_hlist_node *node; + struct rhashtable_iter iter; struct module *mod = data; - int i; if (val != MODULE_STATE_GOING) return NOTIFY_DONE; @@ -485,8 +487,16 @@ static int fprobe_module_callback(struct notifier_block *nb, return NOTIFY_DONE; mutex_lock(&fprobe_mutex); - for (i = 0; i < FPROBE_IP_TABLE_SIZE; i++) - fprobe_remove_node_in_module(mod, &fprobe_ip_table[i], &alist); + rhltable_walk_enter(&fprobe_ip_table, &iter); + do { + rhashtable_walk_start(&iter); + + while ((node = rhashtable_walk_next(&iter)) && !IS_ERR(node)) + fprobe_remove_node_in_module(mod, node, &alist); + + rhashtable_walk_stop(&iter); + } while (node == ERR_PTR(-EAGAIN)); + rhashtable_walk_exit(&iter); if (alist.index > 0) ftrace_set_filter_ips(&fprobe_graph_ops.ops, @@ -728,8 +738,16 @@ int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num) ret = fprobe_graph_add_ips(addrs, num); if (!ret) { add_fprobe_hash(fp); - for (i = 0; i < hlist_array->size; i++) - insert_fprobe_node(&hlist_array->array[i]); + for (i = 0; i < hlist_array->size; i++) { + ret = insert_fprobe_node(&hlist_array->array[i]); + if (ret) + break; + } + /* fallback on insert error */ + if (ret) { + for (i--; i >= 0; i--) + delete_fprobe_node(&hlist_array->array[i]); + } } if (ret) @@ -824,3 +842,10 @@ int unregister_fprobe(struct fprobe *fp) return ret; } EXPORT_SYMBOL_GPL(unregister_fprobe); + +static int __init fprobe_initcall(void) +{ + rhltable_init(&fprobe_ip_table, &fprobe_rht_params); + return 0; +} +late_initcall(fprobe_initcall); -- 2.53.0