* Re: [PATCH v2 1/2] tracing/hist: rebuild full_name on each hist_field_name() call
From: Pengpeng Hou @ 2026-04-08 2:15 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, Tom Zanussi,
linux-trace-kernel, linux-kernel, pengpeng
In-Reply-To: <20260401112224.85582-1-pengpeng@iscas.ac.cn>
Hi Steve,
Thanks for looking at this.
I believe this path is reachable.
field->system is set when a VAR_REF is created with an event-qualified
reference. The assignment happens in init_var_ref(), which is called
from create_var_ref().
The parser accepts fully qualified references of the form
system.event.$var
and parse_expr() passes the parsed system/event strings into
create_var_ref(). trace_action_create() also accepts explicitly
qualified action parameters in the same form and passes those strings
into create_var_ref().
This is also documented in Documentation/trace/histogram.rst:
fully-qualified name is of the form 'system.event_name.$var_name'
or 'system.event_name.field'.
The path is user-visible because reading back the trigger goes through
event_hist_trigger_print() -> hist_field_print() -> hist_field_name()
so creating a hist trigger that uses an inter-event variable reference
in fully qualified form is enough to exercise it.
So I don't think this is dead code. If you would prefer a different way
of handling the printed name, I can respin accordingly, but I don't
think the current branch is unreachable.
Thanks,
Pengpeng
^ permalink raw reply
* Re: [PATCH bpf-next v7 0/2] tracing: Fix kprobe attachment when module shadows vmlinux symbol
From: patchwork-bot+netdevbpf @ 2026-04-08 1:20 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, linux-trace-kernel, ast, daniel, andrii, jolsa, rostedt,
mhiramat, ihor.solodrai, emil, linux-open-source
In-Reply-To: <20260407203912.1787502-1-andrey.grodzovsky@crowdstrike.com>
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Tue, 7 Apr 2026 16:39:10 -0400 you wrote:
> When a kernel module exports a symbol with the same name as an existing
> vmlinux symbol, kprobe attachment fails with -EADDRNOTAVAIL because
> number_of_same_symbols() counts matches across both vmlinux and all
> loaded modules, returning a count greater than 1.
>
> This series takes a different approach from v1-v4, which implemented a
> libbpf-side fallback parsing /proc/kallsyms and retrying with the
> absolute address. That approach was rejected (Andrii Nakryiko, Ihor
> Solodrai) because ambiguous symbol resolution does not belong in libbpf.
>
> [...]
Here is the summary with links:
- [bpf-next,v7,1/2] tracing: Prefer vmlinux symbols over module symbols for unqualified kprobes
https://git.kernel.org/bpf/bpf-next/c/1870ddcd94b0
- [bpf-next,v7,2/2] selftests/bpf: Add tests for kprobe attachment with duplicate symbols
https://git.kernel.org/bpf/bpf-next/c/cea4323f1cfe
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH] tracing/fprobe: Check the same type fprobe on table as the unregistered one
From: Masami Hiramatsu @ 2026-04-08 1:07 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Steven Rostedt, Menglong Dong, Mathieu Desnoyers, jiang.biao,
linux-kernel, linux-trace-kernel
In-Reply-To: <177555745233.1441308.6535024692186959381.stgit@mhiramat.tok.corp.google.com>
Hi,
Sashiko reported another concern in module callback [1].
Let me fix it.
[1] https://sashiko.dev/#/patchset/177555745233.1441308.6535024692186959381.stgit%40mhiramat.tok.corp.google.com
Thanks,
On Tue, 7 Apr 2026 19:24:12 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> Commit 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
> introduced a different ftrace_ops for entry-only fprobes.
>
> However, when unregistering an fprobe, the kernel only checks if another
> fprobe exists at the same address, without checking which type of fprobe
> it is.
> If different fprobes are registered at the same address, the same address
> will be registered in both fgraph_ops and ftrace_ops, but only one of
> them will be deleted when unregistering. (the one removed first will not
> be deleted from the ops).
>
> This results in junk entries remaining in either fgraph_ops or ftrace_ops.
> For example:
> =======
> cd /sys/kernel/tracing
>
> # 'Add entry and exit events on the same place'
> echo 'f:event1 vfs_read' >> dynamic_events
> echo 'f:event2 vfs_read%return' >> dynamic_events
>
> # 'Enable both of them'
> echo 1 > events/fprobes/enable
> cat enabled_functions
> vfs_read (2) ->arch_ftrace_ops_list_func+0x0/0x210
>
> # 'Disable and remove exit event'
> echo 0 > events/fprobes/event2/enable
> echo -:event2 >> dynamic_events
>
> # 'Disable and remove all events'
> echo 0 > events/fprobes/enable
> echo > dynamic_events
>
> # 'Add another event'
> echo 'f:event3 vfs_open%return' > dynamic_events
> cat dynamic_events
> f:fprobes/event3 vfs_open%return
>
> echo 1 > events/fprobes/enable
> cat enabled_functions
> vfs_open (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
> vfs_read (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
> =======
>
> As you can see, an entry for the vfs_read remains.
>
> To fix this issue, when unregistering, the kernel should also check if
> there is the same type of fprobes still exist at the same address, and
> if not, delete its entry from either fgraph_ops or ftrace_ops.
>
> Fixes: 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
> Cc: stable@vger.kernel.org
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
> kernel/trace/fprobe.c | 77 +++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 62 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> index dcadf1d23b8a..7f75e6e4462c 100644
> --- a/kernel/trace/fprobe.c
> +++ b/kernel/trace/fprobe.c
> @@ -85,11 +85,9 @@ static int insert_fprobe_node(struct fprobe_hlist_node *node)
> return rhltable_insert(&fprobe_ip_table, &node->hlist, fprobe_rht_params);
> }
>
> -/* Return true if there are synonims */
> -static bool delete_fprobe_node(struct fprobe_hlist_node *node)
> +static void delete_fprobe_node(struct fprobe_hlist_node *node)
> {
> lockdep_assert_held(&fprobe_mutex);
> - bool ret;
>
> /* Avoid double deleting */
> if (READ_ONCE(node->fp) != NULL) {
> @@ -97,13 +95,6 @@ static bool delete_fprobe_node(struct fprobe_hlist_node *node)
> rhltable_remove(&fprobe_ip_table, &node->hlist,
> fprobe_rht_params);
> }
> -
> - rcu_read_lock();
> - ret = !!rhltable_lookup(&fprobe_ip_table, &node->addr,
> - fprobe_rht_params);
> - rcu_read_unlock();
> -
> - return ret;
> }
>
> /* Check existence of the fprobe */
> @@ -337,6 +328,32 @@ static bool fprobe_is_ftrace(struct fprobe *fp)
> return !fp->exit_handler;
> }
>
> +static bool fprobe_exists_on_hash(unsigned long ip, bool ftrace)
> +{
> + struct rhlist_head *head, *pos;
> + struct fprobe_hlist_node *node;
> + struct fprobe *fp;
> +
> + guard(rcu)();
> + head = rhltable_lookup(&fprobe_ip_table, &ip,
> + fprobe_rht_params);
> + if (!head)
> + return false;
> + /* We have to check the same type on the list. */
> + rhl_for_each_entry_rcu(node, pos, head, hlist) {
> + if (node->addr != ip)
> + break;
> + fp = READ_ONCE(node->fp);
> + if (likely(fp)) {
> + if ((!ftrace && fp->exit_handler) ||
> + (ftrace && !fp->exit_handler))
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> #ifdef CONFIG_MODULES
> static void fprobe_set_ips(unsigned long *ips, unsigned int cnt, int remove,
> int reset)
> @@ -360,6 +377,29 @@ static bool fprobe_is_ftrace(struct fprobe *fp)
> return false;
> }
>
> +static bool fprobe_exists_on_hash(unsigned long ip, bool ftrace __maybe_unused)
> +{
> + struct rhlist_head *head, *pos;
> + struct fprobe_hlist_node *node;
> + struct fprobe *fp;
> +
> + guard(rcu)();
> + head = rhltable_lookup(&fprobe_ip_table, &ip,
> + fprobe_rht_params);
> + if (!head)
> + return false;
> + /* We only need to check fp is there. */
> + rhl_for_each_entry_rcu(node, pos, head, hlist) {
> + if (node->addr != ip)
> + break;
> + fp = READ_ONCE(node->fp);
> + if (likely(fp))
> + return true;
> + }
> +
> + return false;
> +}
> +
> #ifdef CONFIG_MODULES
> static void fprobe_set_ips(unsigned long *ips, unsigned int cnt, int remove,
> int reset)
> @@ -574,15 +614,20 @@ static int fprobe_addr_list_add(struct fprobe_addr_list *alist, unsigned long ad
> static void fprobe_remove_node_in_module(struct module *mod, struct fprobe_hlist_node *node,
> struct fprobe_addr_list *alist)
> {
> + lockdep_assert_in_rcu_read_lock();
> +
> if (!within_module(node->addr, mod))
> return;
> - if (delete_fprobe_node(node))
> - return;
> +
> + delete_fprobe_node(node);
> /*
> - * If failed to update alist, just continue to update hlist.
> + * Ignore failure of updating alist, but continue to update hlist.
> * Therefore, at list user handler will not hit anymore.
> + * And don't care the type here, because all fprobes on the same
> + * address must be removed eventually.
> */
> - fprobe_addr_list_add(alist, node->addr);
> + if (!rhltable_lookup(&fprobe_ip_table, &node->addr, fprobe_rht_params))
> + fprobe_addr_list_add(alist, node->addr);
> }
>
> /* Handle module unloading to manage fprobe_ip_table. */
> @@ -943,7 +988,9 @@ int unregister_fprobe(struct fprobe *fp)
> /* Remove non-synonim ips from table and hash */
> count = 0;
> for (i = 0; i < hlist_array->size; i++) {
> - if (!delete_fprobe_node(&hlist_array->array[i]))
> + delete_fprobe_node(&hlist_array->array[i]);
> + if (!fprobe_exists_on_hash(hlist_array->array[i].addr,
> + fprobe_is_ftrace(fp)))
> addrs[count++] = hlist_array->array[i].addr;
> }
> del_fprobe_hash(fp);
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH v2 1/2] tracing/hist: rebuild full_name on each hist_field_name() call
From: Steven Rostedt @ 2026-04-08 1:05 UTC (permalink / raw)
To: Pengpeng Hou
Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
Tom Zanussi
In-Reply-To: <20260401112224.85582-1-pengpeng@iscas.ac.cn>
Tom,
On Wed, 1 Apr 2026 19:22:23 +0800
Pengpeng Hou <pengpeng@iscas.ac.cn> wrote:
> hist_field_name() uses a static MAX_FILTER_STR_VAL buffer for fully
> qualified variable-reference names, but it currently appends into that
> buffer with strcat() without rebuilding it first. As a result, repeated
> calls append a new "system.event.field" name onto the previous one,
> which can eventually run past the end of full_name.
>
> Build the name with snprintf() on each call and return NULL if the fully
> qualified name does not fit in MAX_FILTER_STR_VAL.
>
> Fixes: 067fe038e70f ("tracing: Add variable reference handling to hist triggers")
> Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> ---
> Changes since v1: https://lore.kernel.org/all/20260329030950.32503-1-pengpeng@iscas.ac.cn/
>
> - rebuild full_name on each call instead of falling back to field->name
> - return NULL on overflow as suggested
> - split out the snprintf() length check instead of using an inline if
>
> kernel/trace/trace_events_hist.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 73ea180cad55..f9c8a4f078ea 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1361,12 +1361,14 @@ static const char *hist_field_name(struct hist_field *field,
> field->flags & HIST_FIELD_FL_VAR_REF) {
> if (field->system) {
> static char full_name[MAX_FILTER_STR_VAL];
> + int len;
> +
> + len = snprintf(full_name, sizeof(full_name), "%s.%s.%s",
> + field->system, field->event_name,
> + field->name);
> + if (len >= sizeof(full_name))
> + return NULL;
>
> - strcat(full_name, field->system);
> - strcat(full_name, ".");
> - strcat(full_name, field->event_name);
> - strcat(full_name, ".");
> - strcat(full_name, field->name);
> field_name = full_name;
I wanted to test this but I can't find anything that triggers this path.
How does a field here get its ->system set?
If there's no way to hit this path, I much rather remove it than "fix" it.
-- Steve
> } else
> field_name = field->name;
^ permalink raw reply
* Re: [PATCH] selftests/ftrace: Account for fprobe attachment at creation
From: Masami Hiramatsu @ 2026-04-08 0:49 UTC (permalink / raw)
To: Cao Ruichuang
Cc: rostedt, mathieu.desnoyers, shuah, linux-kernel,
linux-trace-kernel, linux-kselftest
In-Reply-To: <20260407115751.96184-1-create0818@163.com>
On Tue, 7 Apr 2026 19:57:51 +0800
Cao Ruichuang <create0818@163.com> wrote:
> add_remove_fprobe.tc assumes that enabling an fprobe event is what adds
> its target function to enabled_functions.
>
> On the current kernel, the fprobe target already appears in
> enabled_functions as soon as the event is created, and enabling the
> event does not change that count again. That makes the test fail even
> though the event lifecycle itself works.
>
> Record the attachment baseline after creating the probe events and only
> check that enabling them keeps the expected functions attached. The
> cleanup checks still verify that removing the events returns
> enabled_functions to its original state.
>
Hmm, did you run this test with not clean environment?
It is not designed to do so.
(I think we should not check ocnt, or if there are any entries,
we should stop this test as unresolved.)
Thank you,
> Signed-off-by: Cao Ruichuang <create0818@163.com>
> ---
> .../test.d/dynevent/add_remove_fprobe.tc | 28 +++++++++++++------
> 1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
> index 47067a5e3..ff08bd1ac 100644
> --- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
> +++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
> @@ -26,23 +26,29 @@ grep -q myevent2 dynamic_events
> grep -q myevent3 dynamic_events
> test -d events/fprobes/myevent1
> test -d events/fprobes/myevent2
> -
> -echo 1 > events/fprobes/myevent1/enable
> -# Make sure the event is attached.
> grep -q $PLACE enabled_functions
> +grep -q $PLACE2 enabled_functions
> cnt=`cat enabled_functions | wc -l`
> -if [ $cnt -eq $ocnt ]; then
> +if [ $cnt -le $ocnt ]; then
> + exit_fail
> +fi
> +
> +echo 1 > events/fprobes/myevent1/enable
> +cnt1=`cat enabled_functions | wc -l`
> +if [ $cnt1 -ne $cnt ]; then
> exit_fail
> fi
>
> echo 1 > events/fprobes/myevent2/enable
> cnt2=`cat enabled_functions | wc -l`
> +if [ $cnt2 -ne $cnt1 ]; then
> + exit_fail
> +fi
>
> echo 1 > events/fprobes/myevent3/enable
> -# If the function is different, the attached function should be increased
> grep -q $PLACE2 enabled_functions
> cnt=`cat enabled_functions | wc -l`
> -if [ $cnt -eq $cnt2 ]; then
> +if [ $cnt -ne $cnt2 ]; then
> exit_fail
> fi
>
> @@ -62,11 +68,15 @@ if [ $cnt -ne $ocnt ]; then
> fi
>
> echo "f:myevent4 $PLACE" >> dynamic_events
> +grep -q $PLACE enabled_functions
> +cnt=`cat enabled_functions | wc -l`
> +if [ $cnt -le $ocnt ]; then
> + exit_fail
> +fi
>
> echo 1 > events/fprobes/myevent4/enable
> -# Should only have one enabled
> -cnt=`cat enabled_functions | wc -l`
> -if [ $cnt -ne $((ocnt + 1)) ]; then
> +cnt2=`cat enabled_functions | wc -l`
> +if [ $cnt2 -ne $cnt ]; then
> exit_fail
> fi
>
> --
> 2.39.5 (Apple Git-154)
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH 14/24] nfsd: add tracepoint to dir_event handler
From: Steven Rostedt @ 2026-04-08 0:41 UTC (permalink / raw)
To: Jeff Layton
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, NeilBrown, Olga Kornievskaia,
Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker,
Amir Goldstein, Calum Mackay, linux-fsdevel, linux-kernel,
linux-trace-kernel, linux-doc, linux-nfs
In-Reply-To: <20260407-dir-deleg-v1-14-aaf68c478abd@kernel.org>
On Tue, 07 Apr 2026 09:21:27 -0400
Jeff Layton <jlayton@kernel.org> wrote:
> Add some extra visibility around the fsnotify handlers.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/nfsd/nfs4state.c | 2 ++
> fs/nfsd/trace.h | 20 ++++++++++++++++++++
> 2 files changed, 22 insertions(+)
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-- Steve
^ permalink raw reply
* Re: [PATCH] selftests/ftrace: Drop invalid top-level local in test_ownership
From: Steven Rostedt @ 2026-04-08 0:37 UTC (permalink / raw)
To: CaoRuichuang, Shuah Khan
Cc: mhiramat, mathieu.desnoyers, shuah, linux-kernel,
linux-trace-kernel, linux-kselftest
In-Reply-To: <20260407102613.81419-1-create0818@163.com>
Shuah,
Care to take this through your tree. Probably could even add:
Cc: stable@vger.kernel.org
Fixes: 8b55572e51805 ("tracing/selftests: Add tracefs mount options test")
As well as:
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-- Steve
On Tue, 7 Apr 2026 18:26:13 +0800
CaoRuichuang <create0818@163.com> wrote:
> From: Cao Ruichuang <create0818@163.com>
>
> test_ownership.tc is sourced by ftracetest under /bin/sh.
>
> The script currently declares mount_point with local at file scope,
> which makes /bin/sh abort with "local: not in a function" before the
> test can reach the eventfs ownership checks.
>
> Replace the top-level local declaration with a normal shell variable so
> kernels that support the gid= tracefs mount option can run the test at
> all.
>
> Signed-off-by: Cao Ruichuang <create0818@163.com>
> ---
> tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
> index e71cc3ad0..6d00d3c0f 100644
> --- a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
> +++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
> @@ -6,7 +6,7 @@
> original_group=`stat -c "%g" .`
> original_owner=`stat -c "%u" .`
>
> -local mount_point=$(get_mount_point)
> +mount_point=$(get_mount_point)
>
> mount_options=$(get_mnt_options "$mount_point")
>
^ permalink raw reply
* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Sean Christopherson @ 2026-04-08 0:33 UTC (permalink / raw)
To: Michael Roth
Cc: Vishal Annapurve, Ackerley Tng, aik, andrew.jones, binbin.wu,
brauner, chao.p.peng, david, ira.weiny, jmattson, jthoughton,
oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
forkloop, pratyush, suzuki.poulose, aneesh.kumar, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm
In-Reply-To: <nmy5polcxfnn3hoircsiqarxmzlulwwq7w34okanccntp32v56@h2eac44agovv>
On Tue, Apr 07, 2026, Michael Roth wrote:
> On Tue, Apr 07, 2026 at 02:50:58PM -0700, Vishal Annapurve wrote:
> > On Tue, Apr 7, 2026 at 2:09 PM Michael Roth <michael.roth@amd.com> wrote:
> > >
> > > > TLDR:
> > > >
> > > > + Think of populate ioctls not as KVM touching memory, but platform
> > > > handling population.
> > > > + KVM code (kvm_gmem_populate) still doesn't touch memory contents
> > > > + post_populate is platform-specific code that handles loading into
> > > > private destination memory just to support legacy non-in-place
> > > > conversion.
> > > > + Don't complicate populate ioctls by doing conversion just to support
> > > > legacy use-cases where platform-specific code has to do copying on
> > > > the host.
> > >
> > > That's a good point: these are only considerations in the context of
> > > actually copying from src->dst, but with in-place conversion the
> > > primary/more-performant approach will be for userspace to initial
> > > directly. I.e. if we enforced that, then gmem could right ascertain that
> > > it isn't even writing to private pages via these hooks and any
> > > manipulation of that memory is purely on the part of the trusted entity
> > > handling initial encryption/etc.
> > >
> > > I understand that we decided to keep the option of allowing separate
> > > src/dst even with in-place conversion, but it doesn't seem worthwhile if
> > > that necessarily means we need to glue population+conversion together in
> > > 1 clumsy interface that needs to handle partial return/error responses to
> > > userspace (or potentially get stuck forever in the conversion path).
> >
> > I think ARM needs userspace to specify separate source and destination
> > memory ranges for initial population as ARM doesn't support in-place
> > memory encryption. [1]
> >
> > [1] https://lore.kernel.org/kvm/20260318155413.793430-25-steven.price@arm.com/
> >
> > >
> > > So I agree with Ackerley's proposal (which I guess is the same as what's
> > > in this series).
> > >
> > > However, 1 other alternative would be to do what was suggested on the
> > > call, but require userspace to subsequently handle the shared->private
> > > conversion. I think that would be workable too.
> >
> > IIUC, Converting memory ranges to private after it essentially is
> > treated as private by the KVM CC backend will expose the
> > implementation to the same risk of userspace being able to access
> > private memory and compromise host safety which guest_memfd was
> > invented to address.
>
> Doh, fair point. Doing conversion as part of the populate call would allow
> us to use the filemap write-lock to avoid userspace being able to fault
> in private (as tracked by trusted entity) pages before they are
> transitioned to private (as tracked by KVM), so it's safer than having
> userspace drive it.
>
> But obviously I still think Ackerley's original proposal has more
> upsides than the alternatives mentioned so far.
I'm a bit lost. What exactly is/was Ackerley's original proposal? If the answer
is "convert pages from shared=>private when populating via in-place conversion",
then I agree, because AFAICT, that's the only sane option.
^ permalink raw reply
* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Sean Christopherson @ 2026-04-08 0:30 UTC (permalink / raw)
To: Ackerley Tng
Cc: Michael Roth, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
david, ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta,
qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm
In-Reply-To: <CAEvNRgEtigp7+PVDkyu_DH947CUqDt312d+P+hWjjd2fHONiag@mail.gmail.com>
On Fri, Apr 03, 2026, Ackerley Tng wrote:
> Currently, in TDX's populate flow, KVM doesn't do any copying, it only
> instructs TDX to do the copying.
I disagree with this statement. For all intents and purposes, the TDX-Module is
firmware. If Intel had elected to implement TDX via XuCode, and presented it to
software as ISA (see SGX), then under the hood "firmware" would still be doing the
actual copy, but KVM would be execute some form of "copy" instruction.
Saying "KVM doesn't do any copying" is (very loosely) analogous to saying that
KVM doesn't copy anything when it does REP MOVSQ. It wasn't me your honor, Intel's
string engine did it!
I don't think it changes anything in practice, but I don't want to treat TDX
SEAMCALLs (or SNP PSP commands) as something completely different than what we
usually think of as "hardware".
^ permalink raw reply
* Re: [RFC PATCH 3/4] livepatch: Add "replaceable" attribute to klp_patch
From: Song Liu @ 2026-04-07 23:09 UTC (permalink / raw)
To: Petr Mladek
Cc: Yafang Shao, Joe Lawrence, Dylan Hatch, jpoimboe, jikos, mbenes,
rostedt, mhiramat, mathieu.desnoyers, kpsingh, mattbobrowski,
jolsa, ast, daniel, andrii, martin.lau, eddyz87, memxor,
yonghong.song, live-patching, linux-kernel, linux-trace-kernel,
bpf
In-Reply-To: <adUd0Mojbtrwmeod@pathway.suse.cz>
On Tue, Apr 7, 2026 at 8:08 AM Petr Mladek <pmladek@suse.com> wrote:
[...]
> > + * @replace: replace tag:
> > + * = 0: Atomic replace is disabled; however, this patch remains
> > + * eligible to be superseded by others.
>
> This is weird semantic. Which livepatch tag would be allowed to
> supersede it, please?
>
> Do we still need this category?
>
> > + * > 0: Atomic replace is enabled. Only existing patches with a
> > + * matching replace tag will be superseded.
> > * @list: list node for global list of actively used patches
> > * @kobj: kobject for sysfs resources
> > * @obj_list: dynamic list of the object entries
> > @@ -137,7 +141,7 @@ struct klp_patch {
> > struct module *mod;
> > struct klp_object *objs;
> > struct klp_state *states;
> > - bool replace;
> > + unsigned int replace;
>
> This already breaks the backward compatibility by changing the type
> and semantic of this field.
I was thinking if replace=0 means no replace, it is still backward
compatible. Did I miss something?
Thanks,
Song
> I would also change the name to better
> match the new semantic. What about using:
>
>
> * @replace_set: Livepatch using the same @replace_set will get
> atomically replaced, see also conflicts[*].
>
> unsigned int replace_set;
>
> [*] A livepatch module, livepatching an already livepatches function,
> can be loaded only when it has the same @replace_set number.
>
> By other words, two livepatches conflict when they have a different
> @replace_set number and have at least one livepatched version
> in common.
[...]
^ permalink raw reply
* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-04-07 22:09 UTC (permalink / raw)
To: Vishal Annapurve
Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
david, ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta,
qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm
In-Reply-To: <CAGtprH-kgRByFvvCYeWMXtsvpb6qpaWAo8k-3PEnioyPg-LEvA@mail.gmail.com>
On Tue, Apr 07, 2026 at 02:50:58PM -0700, Vishal Annapurve wrote:
> On Tue, Apr 7, 2026 at 2:09 PM Michael Roth <michael.roth@amd.com> wrote:
> >
> > > TLDR:
> > >
> > > + Think of populate ioctls not as KVM touching memory, but platform
> > > handling population.
> > > + KVM code (kvm_gmem_populate) still doesn't touch memory contents
> > > + post_populate is platform-specific code that handles loading into
> > > private destination memory just to support legacy non-in-place
> > > conversion.
> > > + Don't complicate populate ioctls by doing conversion just to support
> > > legacy use-cases where platform-specific code has to do copying on
> > > the host.
> >
> > That's a good point: these are only considerations in the context of
> > actually copying from src->dst, but with in-place conversion the
> > primary/more-performant approach will be for userspace to initial
> > directly. I.e. if we enforced that, then gmem could right ascertain that
> > it isn't even writing to private pages via these hooks and any
> > manipulation of that memory is purely on the part of the trusted entity
> > handling initial encryption/etc.
> >
> > I understand that we decided to keep the option of allowing separate
> > src/dst even with in-place conversion, but it doesn't seem worthwhile if
> > that necessarily means we need to glue population+conversion together in
> > 1 clumsy interface that needs to handle partial return/error responses to
> > userspace (or potentially get stuck forever in the conversion path).
>
> I think ARM needs userspace to specify separate source and destination
> memory ranges for initial population as ARM doesn't support in-place
> memory encryption. [1]
>
> [1] https://lore.kernel.org/kvm/20260318155413.793430-25-steven.price@arm.com/
>
> >
> > So I agree with Ackerley's proposal (which I guess is the same as what's
> > in this series).
> >
> > However, 1 other alternative would be to do what was suggested on the
> > call, but require userspace to subsequently handle the shared->private
> > conversion. I think that would be workable too.
>
> IIUC, Converting memory ranges to private after it essentially is
> treated as private by the KVM CC backend will expose the
> implementation to the same risk of userspace being able to access
> private memory and compromise host safety which guest_memfd was
> invented to address.
Doh, fair point. Doing conversion as part of the populate call would allow
us to use the filemap write-lock to avoid userspace being able to fault
in private (as tracked by trusted entity) pages before they are
transitioned to private (as tracked by KVM), so it's safer than having
userspace drive it.
But obviously I still think Ackerley's original proposal has more
upsides than the alternatives mentioned so far.
-Mike
>
> >
> > One other benefit to Ackerley's/current approach however is that it allows
> > us to potentially keep hugepages intact in the populate path, since
> > prep'ing/encrypting everything while it's in a shared state means gmem will
> > split the hugepage and all the firmware/RMP/etc. data structures will only
> > be able to handle individual 4K pages. I still suspect doing things like
> > encoding the initial 2MB OVMF image as a single hugepage might yield
> > enough benefit to explore this (at some point). So there's some niceness
> > in knowing that Ackerley's approach would allow for that eventually and
> > not require a complete rethink on these same topics.
> >
> > Thanks,
> >
> > Mike
> >
> > >
> > > >>>
> > > >>> [...snip...]
> > > >>>
^ permalink raw reply
* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Vishal Annapurve @ 2026-04-07 21:50 UTC (permalink / raw)
To: Michael Roth
Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
david, ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta,
qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm
In-Reply-To: <yvfwexsub7nrogh67hzcsupbrkzer6a7kbeao5tlq4elrzc2iz@xrwdjd7p32pp>
On Tue, Apr 7, 2026 at 2:09 PM Michael Roth <michael.roth@amd.com> wrote:
>
> > TLDR:
> >
> > + Think of populate ioctls not as KVM touching memory, but platform
> > handling population.
> > + KVM code (kvm_gmem_populate) still doesn't touch memory contents
> > + post_populate is platform-specific code that handles loading into
> > private destination memory just to support legacy non-in-place
> > conversion.
> > + Don't complicate populate ioctls by doing conversion just to support
> > legacy use-cases where platform-specific code has to do copying on
> > the host.
>
> That's a good point: these are only considerations in the context of
> actually copying from src->dst, but with in-place conversion the
> primary/more-performant approach will be for userspace to initial
> directly. I.e. if we enforced that, then gmem could right ascertain that
> it isn't even writing to private pages via these hooks and any
> manipulation of that memory is purely on the part of the trusted entity
> handling initial encryption/etc.
>
> I understand that we decided to keep the option of allowing separate
> src/dst even with in-place conversion, but it doesn't seem worthwhile if
> that necessarily means we need to glue population+conversion together in
> 1 clumsy interface that needs to handle partial return/error responses to
> userspace (or potentially get stuck forever in the conversion path).
I think ARM needs userspace to specify separate source and destination
memory ranges for initial population as ARM doesn't support in-place
memory encryption. [1]
[1] https://lore.kernel.org/kvm/20260318155413.793430-25-steven.price@arm.com/
>
> So I agree with Ackerley's proposal (which I guess is the same as what's
> in this series).
>
> However, 1 other alternative would be to do what was suggested on the
> call, but require userspace to subsequently handle the shared->private
> conversion. I think that would be workable too.
IIUC, Converting memory ranges to private after it essentially is
treated as private by the KVM CC backend will expose the
implementation to the same risk of userspace being able to access
private memory and compromise host safety which guest_memfd was
invented to address.
>
> One other benefit to Ackerley's/current approach however is that it allows
> us to potentially keep hugepages intact in the populate path, since
> prep'ing/encrypting everything while it's in a shared state means gmem will
> split the hugepage and all the firmware/RMP/etc. data structures will only
> be able to handle individual 4K pages. I still suspect doing things like
> encoding the initial 2MB OVMF image as a single hugepage might yield
> enough benefit to explore this (at some point). So there's some niceness
> in knowing that Ackerley's approach would allow for that eventually and
> not require a complete rethink on these same topics.
>
> Thanks,
>
> Mike
>
> >
> > >>>
> > >>> [...snip...]
> > >>>
^ permalink raw reply
* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-04-07 21:09 UTC (permalink / raw)
To: Ackerley Tng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, tabba, willy,
wyihan, yan.y.zhao, forkloop, pratyush, suzuki.poulose,
aneesh.kumar, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm
In-Reply-To: <CAEvNRgEtigp7+PVDkyu_DH947CUqDt312d+P+hWjjd2fHONiag@mail.gmail.com>
On Fri, Apr 03, 2026 at 07:50:16AM -0700, Ackerley Tng wrote:
> Ackerley
> dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnng <ackerleytng@google.com> writes:
>
> >
> > [...snip...]
> >
> > guest_memfd's populate will first check that the memory is shared, then
> > also set the memory to private after the populate.
> >
> > [...snip...]
> >
> Looking at this again, the above basically means that the entire
> conversion process needs to be performed within populate.
>
> In addition to setting the attributes in guest_memfd as private, for
> consistency, populate will also have to do all the associated
> operations, especially unmapping from the host, checking refcounts,
> and the list of work in conversion will only increase in future with
> direct map removal/restoration and huge page merging.
>
> The complexity of conversion also means possible errors (EAGAIN for
> elevated refcounts and ENOMEM when we're out of memory), and error
> information like the offset where the elevated refcount was.
>
> It doesn't look like there's room for this information to be plumbed out
> through the platform-specific ioctls, and even if we make space, it
> seems odd to have conversion-related error information returned through
> the platform-specific call.
>
>
> I agree with the goal of not having KVM touch private memory contents as
> tracked by guest_memfd, so I'd like to propose that we distinguish:
>
> 1. private as tracked by KVM (guest_memfd/vm_memory_attributes)
> 2. private as tracked by trusted entity
I think this is a good distinction to keep in mind, because if we adopt
the proposal from the call of having userspace set the destination memory
to shared prior to calling kvm_gmem_populate(), then they don't really
stay shared until gmem convert them to private: instead, they get set to
"private as tracked by trusted entity", but at the same time still have
'shared' memory attributes as far as KVM is concerned. Normally (SNP,
at least), the 'private (as tracked by KVM)' state is an intermediate state
on the way to 'private (as tracked by KVM + trusted entity)'.
So we introduce some inconsistencies on that side, in order to address
the inconsistency of kvm_gmem_populate() writing to 'private (as tracked
by KVM)' memory. But as you point out...
> + destination address: private (as tracked by guest_memfd)
> + source address: shared (as tracked by guest_memfd) or NULL
>
> KVM doesn't touch private memory contents, even in this case, because
> it's really a platform-specific ioctl that handles loading, and the
> platform does expect the destination is private for both TDX and SNP
> at the firmware boundary.
...yah, it's not really gmem that's writing to that memory, it's the
platform-specific hooks that 'prepare' the memory as part of population
and puts that in a 'private (as tracked by trusted entity)' state, just as
it's the platform-specific hooks that 'prepare' the memory as part of vCPU
page fault path at run-time and put them into a private (as tracked by
trusted entity). You could even imagine a naive CoCo implementation that
encrypts memory in-place at initial fault time via kvm_gmem_prepare()
hooks... we likely wouldn't insist on some new flow because this results
in gmem calling something that writes to 'private (as tracked by KVM)'
pages and would consider that to be more of a platform-specific
implementation detail that should be handled the same as other
architectures. And that seems like it would be roughly analogous to what
is being discussed here WRT the kvm_gmem_populate() path, so I think it
makes sense to continue to expecting the pages to be marked private in
advance of platform-specific preparation, whether that be via the
populate path or the runtime/fault-time path.
And for recent KVM,
; all the things we exp
about how the callbacks
As far as the copying goes,
By expecting 'private' (as tracked by KVM) as the initial state for
kvm_gmem_populate(), a lot of invariants about private memory (safe
refcount, directmap removal expectations, etc.) remain consistent even
in the populate path, where any special handling for private memory can be
accounted for in the same way rather than "shared, but..." or "private,
but...".
>
> Since SNP (platform-specific) only allows in-place launch update, and
> KVM had to provide an interface that allows a different source address
> for support before in-place conversion, then SNP has to continue
> supporting the to-be-deprecated path by doing the copying within
> platform-specific code.
>
> For consistency, guest_memfd can continue to check that it tracks the
> destination address as private, and sev_gmem_populate will then hide
> the copying code away just to support the legacy case.
>
>
> The flow before in-place conversion is
>
> 1. Load memory (shared or non-guest_memfd memory)
> 2. KVM_SEV_SNP_LAUNCH_UPDATE or KVM_TDX_INIT_MEM_REGION (destination:
> gfn for separate private memory, source: shared memory)
>
> The proposed flow for in-place conversion is
>
> 1. INIT_SHARED or convert to shared
> 2. Load memory while it is shared
> 3. Convert to private (PRESERVE, or new flag?)
> 4. KVM_SEV_SNP_LAUNCH_UPDATE or KVM_TDX_INIT_MEM_REGION (destination:
> gfn for converted private memory, source: NULL)
>
>
> TLDR:
>
> + Think of populate ioctls not as KVM touching memory, but platform
> handling population.
> + KVM code (kvm_gmem_populate) still doesn't touch memory contents
> + post_populate is platform-specific code that handles loading into
> private destination memory just to support legacy non-in-place
> conversion.
> + Don't complicate populate ioctls by doing conversion just to support
> legacy use-cases where platform-specific code has to do copying on
> the host.
That's a good point: these are only considerations in the context of
actually copying from src->dst, but with in-place conversion the
primary/more-performant approach will be for userspace to initial
directly. I.e. if we enforced that, then gmem could right ascertain that
it isn't even writing to private pages via these hooks and any
manipulation of that memory is purely on the part of the trusted entity
handling initial encryption/etc.
I understand that we decided to keep the option of allowing separate
src/dst even with in-place conversion, but it doesn't seem worthwhile if
that necessarily means we need to glue population+conversion together in
1 clumsy interface that needs to handle partial return/error responses to
userspace (or potentially get stuck forever in the conversion path).
So I agree with Ackerley's proposal (which I guess is the same as what's
in this series).
However, 1 other alternative would be to do what was suggested on the
call, but require userspace to subsequently handle the shared->private
conversion. I think that would be workable too.
One other benefit to Ackerley's/current approach however is that it allows
us to potentially keep hugepages intact in the populate path, since
prep'ing/encrypting everything while it's in a shared state means gmem will
split the hugepage and all the firmware/RMP/etc. data structures will only
be able to handle individual 4K pages. I still suspect doing things like
encoding the initial 2MB OVMF image as a single hugepage might yield
enough benefit to explore this (at some point). So there's some niceness
in knowing that Ackerley's approach would allow for that eventually and
not require a complete rethink on these same topics.
Thanks,
Mike
>
> >>>
> >>> [...snip...]
> >>>
^ permalink raw reply
* [PATCH bpf-next v7 2/2] selftests/bpf: Add tests for kprobe attachment with duplicate symbols
From: Andrey Grodzovsky @ 2026-04-07 20:39 UTC (permalink / raw)
To: bpf, linux-trace-kernel
Cc: ast, daniel, andrii, jolsa, rostedt, mhiramat, ihor.solodrai,
emil, linux-open-source
In-Reply-To: <20260407203912.1787502-1-andrey.grodzovsky@crowdstrike.com>
bpf_fentry_shadow_test exists in both vmlinux (net/bpf/test_run.c) and
bpf_testmod (bpf_testmod.c), creating a duplicate symbol condition when
bpf_testmod is loaded. Add subtests that verify kprobe behavior with
this duplicate symbol:
In attach_probe:
- dup-sym-{default,legacy,perf,link}: unqualified attach succeeds
across all four modes, preferring vmlinux over module shadow.
- MOD:SYM qualification attaches to the module version.
In kprobe_multi_test:
- dup_sym: kprobe_multi attach with kprobe and kretprobe succeeds.
bpf_fentry_shadow_test is not invoked via test_run, so tests verify
attach and detach succeed without triggering the probe.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
.../selftests/bpf/prog_tests/attach_probe.c | 69 +++++++++++++++++++
.../bpf/prog_tests/kprobe_multi_test.c | 40 +++++++++++
2 files changed, 109 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index 12a841afda68..e8c1a619e330 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -197,6 +197,66 @@ static void test_attach_kprobe_legacy_by_addr_reject(void)
test_attach_probe_manual__destroy(skel);
}
+/*
+ * bpf_fentry_shadow_test exists in both vmlinux (net/bpf/test_run.c) and
+ * bpf_testmod (bpf_testmod.c). When bpf_testmod is loaded the symbol is
+ * duplicated. Test that kprobe attachment handles this correctly:
+ * - Unqualified name ("bpf_fentry_shadow_test") attaches to vmlinux.
+ * - MOD:SYM name ("bpf_testmod:bpf_fentry_shadow_test") attaches to module.
+ *
+ * Note: bpf_fentry_shadow_test is not invoked via test_run, so we only
+ * verify that attach and detach succeed without triggering the probe.
+ */
+static void test_attach_probe_dup_sym(enum probe_attach_mode attach_mode)
+{
+ DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts);
+ struct bpf_link *kprobe_link, *kretprobe_link;
+ struct test_attach_probe_manual *skel;
+
+ skel = test_attach_probe_manual__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_dup_sym_open_and_load"))
+ return;
+
+ kprobe_opts.attach_mode = attach_mode;
+
+ /* Unqualified: should attach to vmlinux symbol */
+ kprobe_opts.retprobe = false;
+ kprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+ "bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kprobe_link, "attach_kprobe_vmlinux"))
+ goto cleanup;
+ bpf_link__destroy(kprobe_link);
+
+ kprobe_opts.retprobe = true;
+ kretprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+ "bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kretprobe_link, "attach_kretprobe_vmlinux"))
+ goto cleanup;
+ bpf_link__destroy(kretprobe_link);
+
+ /* MOD:SYM qualified: should attach to module symbol */
+ kprobe_opts.retprobe = false;
+ kprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+ "bpf_testmod:bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kprobe_link, "attach_kprobe_module"))
+ goto cleanup;
+ bpf_link__destroy(kprobe_link);
+
+ kprobe_opts.retprobe = true;
+ kretprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+ "bpf_testmod:bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kretprobe_link, "attach_kretprobe_module"))
+ goto cleanup;
+ bpf_link__destroy(kretprobe_link);
+
+cleanup:
+ test_attach_probe_manual__destroy(skel);
+}
+
/* attach uprobe/uretprobe long event name testings */
static void test_attach_uprobe_long_event_name(void)
{
@@ -559,6 +619,15 @@ void test_attach_probe(void)
if (test__start_subtest("kprobe-legacy-by-addr-reject"))
test_attach_kprobe_legacy_by_addr_reject();
+ if (test__start_subtest("dup-sym-default"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_DEFAULT);
+ if (test__start_subtest("dup-sym-legacy"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_LEGACY);
+ if (test__start_subtest("dup-sym-perf"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_PERF);
+ if (test__start_subtest("dup-sym-link"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_LINK);
+
if (test__start_subtest("auto"))
test_attach_probe_auto(skel);
if (test__start_subtest("kprobe-sleepable"))
diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
index 78c974d4ea33..20ddff6812e9 100644
--- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
@@ -633,6 +633,44 @@ static void test_attach_write_ctx(void)
}
#endif
+/*
+ * Test kprobe_multi handles shadow symbols (vmlinux + module duplicate).
+ * bpf_fentry_shadow_test exists in both vmlinux and bpf_testmod.
+ * kprobe_multi resolves via ftrace_lookup_symbols() which finds the
+ * vmlinux symbol first and stops, so this should always succeed.
+ */
+static void test_attach_probe_dup_sym(void)
+{
+ LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
+ const char *syms[1] = { "bpf_fentry_shadow_test" };
+ struct kprobe_multi *skel = NULL;
+ struct bpf_link *link1 = NULL, *link2 = NULL;
+
+ skel = kprobe_multi__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "kprobe_multi__open_and_load"))
+ goto cleanup;
+
+ skel->bss->pid = getpid();
+ opts.syms = syms;
+ opts.cnt = ARRAY_SIZE(syms);
+
+ link1 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe_manual,
+ NULL, &opts);
+ if (!ASSERT_OK_PTR(link1, "attach_kprobe_multi_dup_sym"))
+ goto cleanup;
+
+ opts.retprobe = true;
+ link2 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kretprobe_manual,
+ NULL, &opts);
+ if (!ASSERT_OK_PTR(link2, "attach_kretprobe_multi_dup_sym"))
+ goto cleanup;
+
+cleanup:
+ bpf_link__destroy(link2);
+ bpf_link__destroy(link1);
+ kprobe_multi__destroy(skel);
+}
+
void serial_test_kprobe_multi_bench_attach(void)
{
if (test__start_subtest("kernel"))
@@ -676,5 +714,7 @@ void test_kprobe_multi_test(void)
test_unique_match();
if (test__start_subtest("attach_write_ctx"))
test_attach_write_ctx();
+ if (test__start_subtest("dup_sym"))
+ test_attach_probe_dup_sym();
RUN_TESTS(kprobe_multi_verifier);
}
--
2.34.1
^ permalink raw reply related
* [PATCH bpf-next v7 0/2] tracing: Fix kprobe attachment when module shadows vmlinux symbol
From: Andrey Grodzovsky @ 2026-04-07 20:39 UTC (permalink / raw)
To: bpf, linux-trace-kernel
Cc: ast, daniel, andrii, jolsa, rostedt, mhiramat, ihor.solodrai,
emil, linux-open-source
When a kernel module exports a symbol with the same name as an existing
vmlinux symbol, kprobe attachment fails with -EADDRNOTAVAIL because
number_of_same_symbols() counts matches across both vmlinux and all
loaded modules, returning a count greater than 1.
This series takes a different approach from v1-v4, which implemented a
libbpf-side fallback parsing /proc/kallsyms and retrying with the
absolute address. That approach was rejected (Andrii Nakryiko, Ihor
Solodrai) because ambiguous symbol resolution does not belong in libbpf.
Following Ihor's suggestion, this series fixes the root cause in the
kernel: when an unqualified symbol name is given and the symbol is found
in vmlinux, prefer the vmlinux symbol and do not scan loaded modules.
This makes the skeleton auto-attach path work transparently with no
libbpf changes needed.
Patch 1: Kernel fix - return vmlinux-only count from
number_of_same_symbols() when the symbol is found in vmlinux,
preventing module shadows from causing -EADDRNOTAVAIL.
Patch 2: Selftests using bpf_fentry_shadow_test which exists in both
vmlinux and bpf_testmod - tests unqualified (vmlinux) and
MOD:SYM (module) attachment across all four attach modes, plus
kprobe_multi with the duplicate symbol.
Changes since v6 [1]:
- Fix comment style: use /* on its own line instead of networking-style
/* text on opener line (Alexei Starovoitov).
[1] https://lore.kernel.org/bpf/20260407165145.1651061-1-andrey.grodzovsky@crowdstrike.com/
Andrey Grodzovsky (2):
tracing: Prefer vmlinux symbols over module symbols for unqualified
kprobes
selftests/bpf: Add tests for kprobe attachment with duplicate symbols
kernel/trace/trace_kprobe.c | 8 +++
.../selftests/bpf/prog_tests/attach_probe.c | 69 +++++++++++++++++++
.../bpf/prog_tests/kprobe_multi_test.c | 40 +++++++++++
3 files changed, 117 insertions(+)
--
2.34.1
^ permalink raw reply
* [PATCH bpf-next v7 1/2] tracing: Prefer vmlinux symbols over module symbols for unqualified kprobes
From: Andrey Grodzovsky @ 2026-04-07 20:39 UTC (permalink / raw)
To: bpf, linux-trace-kernel
Cc: ast, daniel, andrii, jolsa, rostedt, mhiramat, ihor.solodrai,
emil, linux-open-source
In-Reply-To: <20260407203912.1787502-1-andrey.grodzovsky@crowdstrike.com>
When an unqualified kprobe target exists in both vmlinux and a loaded
module, number_of_same_symbols() returns a count greater than 1,
causing kprobe attachment to fail with -EADDRNOTAVAIL even though the
vmlinux symbol is unambiguous.
When no module qualifier is given and the symbol is found in vmlinux,
return the vmlinux-only count without scanning loaded modules. This
preserves the existing behavior for all other cases:
- Symbol only in a module: vmlinux count is 0, falls through to module
scan as before.
- Symbol qualified with MOD:SYM: mod != NULL, unchanged path.
- Symbol ambiguous within vmlinux itself: count > 1 is returned as-is.
Fixes: 926fe783c8a6 ("tracing/kprobes: Fix symbol counting logic by looking at modules as well")
Fixes: 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads")
Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
kernel/trace/trace_kprobe.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index a5dbb72528e0..058724c41c46 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -765,6 +765,14 @@ static unsigned int number_of_same_symbols(const char *mod, const char *func_nam
if (!mod)
kallsyms_on_each_match_symbol(count_symbols, func_name, &ctx.count);
+ /*
+ * If the symbol is found in vmlinux, use vmlinux resolution only.
+ * This prevents module symbols from shadowing vmlinux symbols
+ * and causing -EADDRNOTAVAIL for unqualified kprobe targets.
+ */
+ if (!mod && ctx.count > 0)
+ return ctx.count;
+
module_kallsyms_on_each_symbol(mod, count_mod_symbols, &ctx);
return ctx.count;
--
2.34.1
^ permalink raw reply related
* Re: [PATCH bpf-next v6 1/2] tracing: Prefer vmlinux symbols over module symbols for unqualified kprobes
From: Alexei Starovoitov @ 2026-04-07 19:34 UTC (permalink / raw)
To: Andrey Grodzovsky
Cc: bpf, linux-trace-kernel, Jiri Olsa, Ihor Solodrai,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Steven Rostedt, Masami Hiramatsu, Emil Tsalapatis,
DL Linux Open Source Team
In-Reply-To: <20260407165145.1651061-2-andrey.grodzovsky@crowdstrike.com>
On Tue, Apr 7, 2026 at 9:52 AM Andrey Grodzovsky
<andrey.grodzovsky@crowdstrike.com> wrote:
>
> When an unqualified kprobe target exists in both vmlinux and a loaded
> module, number_of_same_symbols() returns a count greater than 1,
> causing kprobe attachment to fail with -EADDRNOTAVAIL even though the
> vmlinux symbol is unambiguous.
>
> When no module qualifier is given and the symbol is found in vmlinux,
> return the vmlinux-only count without scanning loaded modules. This
> preserves the existing behavior for all other cases:
> - Symbol only in a module: vmlinux count is 0, falls through to module
> scan as before.
> - Symbol qualified with MOD:SYM: mod != NULL, unchanged path.
> - Symbol ambiguous within vmlinux itself: count > 1 is returned as-is.
>
> Fixes: 926fe783c8a6 ("tracing/kprobes: Fix symbol counting logic by looking at modules as well")
> Fixes: 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads")
> Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
> ---
> kernel/trace/trace_kprobe.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index a5dbb72528e0..99c41ea8b6d7 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -765,6 +765,13 @@ static unsigned int number_of_same_symbols(const char *mod, const char *func_nam
> if (!mod)
> kallsyms_on_each_match_symbol(count_symbols, func_name, &ctx.count);
>
> + /* If the symbol is found in vmlinux, use vmlinux resolution only.
> + * This prevents module symbols from shadowing vmlinux symbols
> + * and causing -EADDRNOTAVAIL for unqualified kprobe targets.
> + */
Please do not use networking style comments in the new code.
Same applies to selftests.
pw-bot: cr
^ permalink raw reply
* [RFC 2/2] openrisc: Add KProbes
From: Sahil Siddiq @ 2026-04-07 18:56 UTC (permalink / raw)
To: jonas, stefan.kristiansson, shorne, naveen, davem, mhiramat
Cc: peterz, jpoimboe, jbaron, rostedt, ardb, chenmiao.ku, johannes,
nsc, masahiroy, tytso, linux-openrisc, linux-kernel,
linux-trace-kernel, Sahil Siddiq
In-Reply-To: <20260407185650.79816-1-sahilcdq0@gmail.com>
Add KProbes support for OpenRISC. This work is primarily based
on similar work done for LoongArch, MIPS and RISC-V.
KProbes make it possible to trap at almost any address in the
kernel to collect performance/debugging info.
Signed-off-by: Sahil Siddiq <sahilcdq0@gmail.com>
---
arch/openrisc/Kconfig | 1 +
arch/openrisc/configs/or1ksim_defconfig | 2 +
arch/openrisc/configs/virt_defconfig | 2 +
arch/openrisc/include/asm/asm.h | 22 ++
arch/openrisc/include/asm/break.h | 19 ++
arch/openrisc/include/asm/kprobes.h | 76 +++++
arch/openrisc/kernel/Makefile | 1 +
arch/openrisc/kernel/entry.S | 16 +
arch/openrisc/kernel/kprobes.c | 381 ++++++++++++++++++++++++
arch/openrisc/kernel/traps.c | 26 ++
arch/openrisc/lib/memcpy.c | 2 +
arch/openrisc/lib/memset.S | 4 +
arch/openrisc/mm/fault.c | 5 +
samples/kprobes/kprobe_example.c | 8 +
14 files changed, 565 insertions(+)
create mode 100644 arch/openrisc/include/asm/asm.h
create mode 100644 arch/openrisc/include/asm/break.h
create mode 100644 arch/openrisc/include/asm/kprobes.h
create mode 100644 arch/openrisc/kernel/kprobes.c
diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 9156635dd264..d240533b424b 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -27,6 +27,7 @@ config OPENRISC
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_PCI
+ select HAVE_KPROBES
select HAVE_UID16
select HAVE_PAGE_SIZE_8KB
select HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/arch/openrisc/configs/or1ksim_defconfig b/arch/openrisc/configs/or1ksim_defconfig
index 769705ac24d5..24d2915e7609 100644
--- a/arch/openrisc/configs/or1ksim_defconfig
+++ b/arch/openrisc/configs/or1ksim_defconfig
@@ -10,7 +10,9 @@ CONFIG_EXPERT=y
# CONFIG_KALLSYMS is not set
CONFIG_BUILTIN_DTB_NAME="or1ksim"
CONFIG_HZ_100=y
+CONFIG_OPENRISC=y
CONFIG_JUMP_LABEL=y
+CONFIG_KPROBES=y
CONFIG_MODULES=y
# CONFIG_BLOCK is not set
CONFIG_SLUB_TINY=y
diff --git a/arch/openrisc/configs/virt_defconfig b/arch/openrisc/configs/virt_defconfig
index 0b9979b35ca8..2eccb506032f 100644
--- a/arch/openrisc/configs/virt_defconfig
+++ b/arch/openrisc/configs/virt_defconfig
@@ -11,8 +11,10 @@ CONFIG_OPENRISC_HAVE_INST_SEXT=y
CONFIG_NR_CPUS=8
CONFIG_SMP=y
CONFIG_HZ_100=y
+CONFIG_OPENRISC=y
# CONFIG_OPENRISC_NO_SPR_SR_DSX is not set
CONFIG_JUMP_LABEL=y
+CONFIG_KPROBES=y
# CONFIG_COMPAT_BRK is not set
CONFIG_NET=y
CONFIG_PACKET=y
diff --git a/arch/openrisc/include/asm/asm.h b/arch/openrisc/include/asm/asm.h
new file mode 100644
index 000000000000..1a9c8bbb4430
--- /dev/null
+++ b/arch/openrisc/include/asm/asm.h
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Macros for OpenRISC asm
+ *
+ * Linux architectural port borrowing nearly verbatim from
+ * LoongArch and Arm. All original copyrights apply as per
+ * the original source declaration.
+ */
+
+#ifndef __ASM_ASM_H
+#define __ASM_ASM_H
+
+#ifdef CONFIG_KPROBES
+#define _ASM_NOKPROBE(symbol) \
+ .pushsection "_kprobe_blacklist", "aw"; \
+ .long symbol; \
+ .popsection
+#else
+#define _ASM_NOKPROBE(symbol)
+#endif
+
+#endif /* __ASM_ASM_H */
diff --git a/arch/openrisc/include/asm/break.h b/arch/openrisc/include/asm/break.h
new file mode 100644
index 000000000000..4bd04f4dd17a
--- /dev/null
+++ b/arch/openrisc/include/asm/break.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * OpenRISC trap codes used internally by the kernel
+ *
+ * Linux architectural port borrowing liberally from similar works of
+ * others. All original copyrights apply as per the original source
+ * declaration.
+ *
+ * Modifications for the OpenRISC architecture:
+ * Copyright (C) 2026 Sahil Siddiq <sahilcdq0@gmail.com>
+ */
+
+#ifndef __ASM_BREAK_H
+#define __ASM_BREAK_H
+
+#define BRK_KPROBE_BP 512 /* Kprobe break */
+#define BRK_KPROBE_SSTEPBP 1024 /* Kprobe single-step software implementation */
+
+#endif /* __ASM_BREAK_H */
diff --git a/arch/openrisc/include/asm/kprobes.h b/arch/openrisc/include/asm/kprobes.h
new file mode 100644
index 000000000000..50b6dc6d5a0c
--- /dev/null
+++ b/arch/openrisc/include/asm/kprobes.h
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * OpenRISC Linux
+ *
+ * Linux architectural port borrowing liberally from similar works of
+ * others. All original copyrights apply as per the original source
+ * declaration.
+ *
+ * OpenRISC implementation:
+ * Copyright (C) 2026 Sahil Siddiq <sahilcdq0@gmail.com>
+ */
+
+#ifndef __ASM_OPENRISC_KPROBES_H
+#define __ASM_OPENRISC_KPROBES_H
+
+#include <asm-generic/kprobes.h>
+
+#ifdef CONFIG_KPROBES
+#include <asm/break.h>
+#include <asm/cacheflush.h>
+
+#define __ARCH_WANT_KPROBES_INSN_SLOT
+
+struct pt_regs;
+struct kprobe;
+
+typedef u32 kprobe_opcode_t;
+
+/*
+ * MAX_INSN_SIZE is used as the number of slots in an executable
+ * page for single-stepping out of line (SSOL). We need two slots
+ * since we single-step using software breakpoints. The probed
+ * instruction is placed in the first slot with a breakpoint
+ * instruction in the second slot.
+ */
+#define MAX_INSN_SIZE 2
+
+/* Architecture specific copy of original instruction */
+struct arch_specific_insn {
+ /* copy of original instruction */
+ kprobe_opcode_t *insn;
+ /* address of next instruction in case of SSOL */
+ unsigned long restore;
+};
+
+struct prev_kprobe {
+ struct kprobe *kp;
+ unsigned int status;
+};
+
+/* per-cpu kprobe control block */
+struct kprobe_ctlblk {
+ unsigned int kprobe_status;
+ unsigned long irq_flags;
+ struct prev_kprobe prev_kprobe;
+};
+
+#define flush_insn_slot(p) do { } while (0)
+#define kretprobe_blacklist_size 0
+
+void arch_remove_kprobe(struct kprobe *p);
+int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
+bool kprobe_breakpoint_handler(struct pt_regs *regs);
+bool kprobe_singlestep_handler(struct pt_regs *regs);
+#else /* !CONFIG_KPROBES */
+static inline bool kprobe_breakpoint_handler(struct pt_regs *regs)
+{
+ return false;
+}
+
+static inline bool kprobe_singlestep_handler(struct pt_regs *regs)
+{
+ return false;
+}
+#endif /* CONFIG_KPROBES */
+#endif /* __ASM_OPENRISC_KPROBES_H */
diff --git a/arch/openrisc/kernel/Makefile b/arch/openrisc/kernel/Makefile
index 150779fbf010..2ac824867963 100644
--- a/arch/openrisc/kernel/Makefile
+++ b/arch/openrisc/kernel/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_SMP) += smp.o sync-timer.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_OF) += prom.o
+obj-$(CONFIG_KPROBES) += kprobes.o
obj-y += patching.o
clean:
diff --git a/arch/openrisc/kernel/entry.S b/arch/openrisc/kernel/entry.S
index c7e90b09645e..cd28bf1f7a3b 100644
--- a/arch/openrisc/kernel/entry.S
+++ b/arch/openrisc/kernel/entry.S
@@ -15,6 +15,7 @@
#include <linux/linkage.h>
#include <linux/pgtable.h>
+#include <asm/asm.h>
#include <asm/processor.h>
#include <asm/unistd.h>
#include <asm/thread_info.h>
@@ -640,6 +641,7 @@ ENTRY(_sys_call_handler)
/* r30 is the only register we clobber in the fast path */
/* r30 already saved */
/* l.sw PT_GPR30(r1),r30 */
+_ASM_NOKPROBE(_sys_call_handler)
_syscall_check_trace_enter:
/* syscalls run with interrupts enabled */
@@ -652,12 +654,14 @@ _syscall_check_trace_enter:
l.sfne r30,r0
l.bf _syscall_trace_enter
l.nop
+_ASM_NOKPROBE(_syscall_check_trace_enter)
_syscall_check:
/* Ensure that the syscall number is reasonable */
l.sfgeui r11,__NR_syscalls
l.bf _syscall_badsys
l.nop
+_ASM_NOKPROBE(_syscall_check)
_syscall_call:
l.movhi r29,hi(sys_call_table)
@@ -668,12 +672,14 @@ _syscall_call:
l.jalr r29
l.nop
+_ASM_NOKPROBE(_syscall_call)
_syscall_return:
/* All syscalls return here... just pay attention to ret_from_fork
* which does it in a round-about way.
*/
l.sw PT_GPR11(r1),r11 // save return value
+_ASM_NOKPROBE(_syscall_return)
#if 0
_syscall_debug:
@@ -708,6 +714,7 @@ _syscall_check_trace_leave:
l.sfne r30,r0
l.bf _syscall_trace_leave
l.nop
+_ASM_NOKPROBE(_syscall_check_trace_leave)
/* This is where the exception-return code begins... interrupts need to be
* disabled the rest of the way here because we can't afford to miss any
@@ -744,6 +751,7 @@ _syscall_check_work:
/* _work_pending needs to be called with interrupts disabled */
l.j _work_pending
l.nop
+_ASM_NOKPROBE(_syscall_check_work)
_syscall_resume_userspace:
// ENABLE_INTERRUPTS(r29)
@@ -800,6 +808,7 @@ _syscall_resume_userspace:
l.mtspr r0,r13,SPR_EPCR_BASE
l.mtspr r0,r15,SPR_ESR_BASE
l.rfe
+_ASM_NOKPROBE(_syscall_resume_userspace)
/* End of hot path!
* Keep the below tracing and error handling out of the hot path...
@@ -829,6 +838,7 @@ _syscall_trace_enter:
l.j _syscall_check
l.lwz r8,PT_GPR8(r1)
+_ASM_NOKPROBE(_syscall_trace_enter)
_syscall_trace_leave:
l.jal do_syscall_trace_leave
@@ -836,6 +846,7 @@ _syscall_trace_leave:
l.j _syscall_check_work
l.nop
+_ASM_NOKPROBE(_syscall_trace_leave)
_syscall_badsys:
/* Here we effectively pretend to have executed an imaginary
@@ -845,6 +856,7 @@ _syscall_badsys:
*/
l.j _syscall_return
l.addi r11,r0,-ENOSYS
+_ASM_NOKPROBE(_syscall_badsys)
/******* END SYSCALL HANDLING *******/
@@ -870,6 +882,7 @@ EXCEPTION_ENTRY(_trap_handler)
l.j _ret_from_exception
l.nop
+_ASM_NOKPROBE(_trap_handler)
/* ---[ 0xf00: Reserved exception ]-------------------------------------- */
@@ -1004,6 +1017,8 @@ ENTRY(_ret_from_exception)
l.nop
l.j _resume_userspace
l.nop
+_ASM_NOKPROBE(_ret_from_exception)
+_ASM_NOKPROBE(_ret_from_intr)
ENTRY(ret_from_fork)
l.jal schedule_tail
@@ -1038,6 +1053,7 @@ ENTRY(ret_from_fork)
l.j _syscall_return
l.nop
+_ASM_NOKPROBE(ret_from_fork)
/* ========================================================[ switch ] === */
diff --git a/arch/openrisc/kernel/kprobes.c b/arch/openrisc/kernel/kprobes.c
new file mode 100644
index 000000000000..f9073a1cb6eb
--- /dev/null
+++ b/arch/openrisc/kernel/kprobes.c
@@ -0,0 +1,381 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Kernel probes (KProbes) for OpenRISC
+ *
+ * Linux architectural port borrowing liberally from similar works of
+ * others. All original copyrights apply as per the original source
+ * declaration.
+ *
+ * OpenRISC implementation:
+ * Copyright (C) 2026 Sahil Siddiq <sahilcdq0@gmail.com>
+ */
+
+#include <linux/kprobes.h>
+#include <asm/insn-def.h>
+#include <asm/text-patching.h>
+
+DEFINE_PER_CPU(struct kprobe *, current_kprobe);
+DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
+
+#define KPROBE_BP_INSN __emit_trap(BRK_KPROBE_BP)
+#define KPROBE_SSTEPBP_INSN __emit_trap(BRK_KPROBE_SSTEPBP)
+
+static bool insn_not_supported(union openrisc_instruction insn)
+{
+ switch (insn.word) {
+ case INSN_CSYNC:
+ case INSN_MSYNC:
+ case INSN_PSYNC:
+ return true;
+ }
+
+ if ((insn.word & 0xffff0000) == OPCODE_SYS)
+ return true;
+
+ if ((insn.word & 0xfc01ffff) == OPCODE_MACRC)
+ return true;
+
+ switch (insn.opcodes_6bit.opcode) {
+ case l_rfe:
+ case l_lwa:
+ case l_mfspr:
+ case l_mtspr:
+ case l_swa:
+ return true;
+ }
+
+ return false;
+}
+NOKPROBE_SYMBOL(insn_not_supported)
+
+static bool is_branch_insn(union openrisc_instruction insn)
+{
+ switch (insn.opcodes_6bit.opcode) {
+ case l_j:
+ case l_jal:
+ case l_bnf:
+ case l_bf:
+ case l_jr:
+ case l_jalr:
+ return true;
+ }
+
+ return false;
+}
+NOKPROBE_SYMBOL(is_branch_insn)
+
+static bool is_pc_insn(union openrisc_instruction insn)
+{
+ if (insn.opcodes_6bit.opcode == l_adrp)
+ return true;
+
+ return false;
+}
+NOKPROBE_SYMBOL(is_pc_insn)
+
+static bool insns_need_simulation(union openrisc_instruction insn, bool *exec_delay_slot)
+{
+ if (is_branch_insn(insn)) {
+ *exec_delay_slot = has_delay_slot();
+ return true;
+ }
+
+ if (is_pc_insn(insn)) {
+ *exec_delay_slot = false;
+ return true;
+ }
+
+ *exec_delay_slot = false;
+ return false;
+}
+NOKPROBE_SYMBOL(insns_need_simulation)
+
+int arch_prepare_kprobe(struct kprobe *p)
+{
+ union openrisc_instruction insn;
+ union openrisc_instruction prev_insn;
+ unsigned int cpucfgr = mfspr(SPR_CPUCFGR);
+ bool ss_delay_slot = false, exec_delay_slot = false;
+
+ /* Attempt to probe at unaligned address */
+ if ((unsigned long)p->addr & 0x3)
+ return -EILSEQ;
+
+ p->opcode = *p->addr;
+ insn.word = p->opcode;
+
+ if (insn_not_supported(insn)) {
+ pr_notice("Can't insert KProbe at blacklisted instruction.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_kernel_nofault(&prev_insn, p->addr - 1,
+ OPENRISC_INSN_SIZE) == 0 &&
+ insns_need_simulation(prev_insn, &exec_delay_slot) && exec_delay_slot) {
+ pr_notice("Can't insert KProbe in delay slot.\n");
+ return -EINVAL;
+ }
+
+ if (insns_need_simulation(insn, &exec_delay_slot) && !exec_delay_slot) {
+ p->ainsn.insn = NULL;
+ p->ainsn.restore = 0;
+ } else {
+ /*
+ * Single step probed instruction or, in case of branch instructions, single
+ * step instruction in delay slot.
+ */
+ p->ainsn.insn = get_insn_slot();
+ if (!p->ainsn.insn)
+ return -ENOMEM;
+
+ if (exec_delay_slot) {
+ patch_insn_write(p->ainsn.insn, *(p->addr + 1));
+ p->ainsn.restore = 0;
+ } else {
+ patch_insn_write(p->ainsn.insn, p->opcode);
+ p->ainsn.restore = (unsigned long)p->addr + OPENRISC_INSN_SIZE;
+ }
+ patch_insn_write(&p->ainsn.insn[1], KPROBE_SSTEPBP_INSN);
+ }
+
+ return 0;
+}
+NOKPROBE_SYMBOL(arch_prepare_kprobe);
+
+void arch_arm_kprobe(struct kprobe *p)
+{
+ patch_insn_write(p->addr, KPROBE_BP_INSN);
+ flush_insn_slot(p);
+}
+NOKPROBE_SYMBOL(arch_arm_kprobe);
+
+void arch_disarm_kprobe(struct kprobe *p)
+{
+ patch_insn_write(p->addr, p->opcode);
+ flush_insn_slot(p);
+}
+NOKPROBE_SYMBOL(arch_disarm_kprobe);
+
+void arch_remove_kprobe(struct kprobe *p)
+{
+ if (p->ainsn.insn) {
+ free_insn_slot(p->ainsn.insn, 0);
+ p->ainsn.insn = NULL;
+ }
+}
+NOKPROBE_SYMBOL(arch_remove_kprobe);
+
+static void set_current_kprobe(struct kprobe *p)
+{
+ __this_cpu_write(current_kprobe, p);
+}
+NOKPROBE_SYMBOL(set_current_kprobe);
+
+static void save_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ kcb->prev_kprobe.kp = kprobe_running();
+ kcb->prev_kprobe.status = kcb->kprobe_status;
+}
+NOKPROBE_SYMBOL(save_previous_kprobe);
+
+static void restore_previous_kprobe(struct kprobe_ctlblk *kcb)
+{
+ kcb->kprobe_status = kcb->prev_kprobe.status;
+ set_current_kprobe(kcb->prev_kprobe.kp);
+}
+NOKPROBE_SYMBOL(restore_previous_kprobe);
+
+static void post_kprobe_handler(struct kprobe *cur, struct kprobe_ctlblk *kcb,
+ struct pt_regs *regs)
+{
+ if (cur->ainsn.restore != 0)
+ instruction_pointer_set(regs, (unsigned long)cur->ainsn.restore);
+
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ preempt_enable_no_resched();
+ return;
+ }
+
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+
+ if (cur->post_handler)
+ cur->post_handler(cur, regs, 0);
+
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+}
+NOKPROBE_SYMBOL(post_kprobe_handler);
+
+static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb, int reenter)
+{
+ union openrisc_instruction insn;
+
+ if (reenter) {
+ save_previous_kprobe(kcb);
+ set_current_kprobe(p);
+ kcb->kprobe_status = KPROBE_REENTER;
+ } else {
+ kcb->kprobe_status = KPROBE_HIT_SS;
+ }
+
+ /* Emulate instruction if required. */
+ insn.word = p->opcode;
+ if (is_branch_insn(insn)) {
+ simulate_branch(regs, insn.word, has_delay_slot());
+ /* Save target addr before updating PC to SSOL slot */
+ p->ainsn.restore = regs->pc;
+ } else if (is_pc_insn(insn)) {
+ simulate_pc(regs, insn.word);
+ /* No SSOL is required here, so call post-process immediately. */
+ post_kprobe_handler(p, kcb, regs);
+ return;
+ }
+
+ if (p->ainsn.insn) {
+ /* Disable IRQs before single stepping */
+ local_irq_save(kcb->irq_flags);
+ instruction_pointer_set(regs, (unsigned long)p->ainsn.insn);
+ } else {
+ /*
+ * The instruction is a branch but delay slots are disabled.
+ * Simply call the post-handler.
+ */
+ post_kprobe_handler(p, kcb, regs);
+ }
+}
+NOKPROBE_SYMBOL(setup_singlestep);
+
+static bool reenter_kprobe(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+ switch (kcb->kprobe_status) {
+ case KPROBE_HIT_SS:
+ case KPROBE_HIT_SSDONE:
+ case KPROBE_HIT_ACTIVE:
+ kprobes_inc_nmissed_count(p);
+ setup_singlestep(p, regs, kcb, 1);
+ break;
+ case KPROBE_REENTER:
+ pr_warn("Unrecoverable KProbe detected.\n");
+ dump_kprobe(p);
+ WARN_ON_ONCE(1);
+ break;
+ default:
+ WARN_ON(1);
+ return false;
+ }
+
+ return true;
+}
+NOKPROBE_SYMBOL(reenter_kprobe);
+
+bool kprobe_breakpoint_handler(struct pt_regs *regs)
+{
+ struct kprobe *p, *curr_kprobe;
+ struct kprobe_ctlblk *kcb;
+ kprobe_opcode_t *addr = (kprobe_opcode_t *)regs->pc;
+
+ /*
+ * We don't want to be preempted for the entire
+ * duration of kprobe processing.
+ */
+ preempt_disable();
+
+ kcb = get_kprobe_ctlblk();
+ curr_kprobe = kprobe_running();
+ p = get_kprobe(addr);
+ if (p) {
+ if (curr_kprobe) {
+ if (reenter_kprobe(p, regs, kcb))
+ return true;
+ } else {
+ /* Probe hit */
+ set_current_kprobe(p);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ /*
+ * If there's no pre-handler or it returns 0, continue
+ * processing the KProbe as usual. A non-zero return
+ * value implies the saved registers have been modified
+ * and the execution path might deviate from what's
+ * expected. Reset the KProbe and return.
+ */
+ if (!p->pre_handler || !p->pre_handler(p, regs)) {
+ setup_singlestep(p, regs, kcb, 0);
+ } else {
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ }
+ return true;
+ }
+ }
+
+ /*
+ * The breakpoint instruction was removed right after we hit it
+ * possibly by another cpu. If the original instruction was also
+ * a trap instruction then return to the trap handler for further
+ * processing, else no further processing is required.
+ */
+ if ((*addr & 0xffff0000) != OPCODE_TRAP) {
+ preempt_enable_no_resched();
+ return true;
+ }
+
+ preempt_enable_no_resched();
+ return false;
+}
+NOKPROBE_SYMBOL(kprobe_breakpoint_handler);
+
+bool kprobe_singlestep_handler(struct pt_regs *regs)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ unsigned long addr = instruction_pointer(regs);
+
+ if (cur && (kcb->kprobe_status & (KPROBE_HIT_SS | KPROBE_REENTER)) &&
+ ((unsigned long)&cur->ainsn.insn[1] == addr)) {
+ /* Single stepping has been completed. Enable IRQs. */
+ local_irq_restore(kcb->irq_flags);
+ post_kprobe_handler(cur, kcb, regs);
+ return true;
+ }
+
+ preempt_enable_no_resched();
+ return false;
+}
+NOKPROBE_SYMBOL(kprobe_singlestep_handler);
+
+int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+ struct kprobe *cur = kprobe_running();
+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+ switch (kcb->kprobe_status) {
+ case KPROBE_HIT_SS:
+ case KPROBE_REENTER:
+ /*
+ * The instruction being single-stepped caused a page fault.
+ * Reset the current KProbe and set PC to the probe's address.
+ * Then allow the page fault handler to continue as usual.
+ */
+ instruction_pointer_set(regs, (unsigned long)cur->addr);
+
+ if (kcb->kprobe_status == KPROBE_REENTER) {
+ restore_previous_kprobe(kcb);
+ } else {
+ local_irq_restore(kcb->irq_flags);
+ reset_current_kprobe();
+ preempt_enable_no_resched();
+ }
+ break;
+ }
+
+ return 0;
+}
+NOKPROBE_SYMBOL(kprobe_fault_handler);
+
+int __init arch_init_kprobes(void)
+{
+ return 0;
+}
diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index ee87a3af34fc..ae6bfcca388e 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -30,10 +30,12 @@
#include <linux/kallsyms.h>
#include <linux/uaccess.h>
+#include <asm/break.h>
#include <asm/bug.h>
#include <asm/fpu.h>
#include <asm/insn-def.h>
#include <asm/io.h>
+#include <asm/kprobes.h>
#include <asm/processor.h>
#include <asm/unwinder.h>
#include <asm/sections.h>
@@ -216,10 +218,34 @@ asmlinkage void do_trap(struct pt_regs *regs, unsigned long address)
if (user_mode(regs)) {
force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
} else {
+ unsigned int trap = *(unsigned int *)regs->pc, cpucfgr = mfspr(SPR_CPUCFGR), bcode;
+
+ /*
+ * Trap instruction was probably removed and no further processing
+ * is required.
+ */
+ if ((trap & 0xffff0000) != OPCODE_TRAP)
+ return;
+
+ bcode = (trap & 0xffff);
+ switch (bcode) {
+ case BRK_KPROBE_BP:
+ if (kprobe_breakpoint_handler(regs))
+ return;
+ break;
+ case BRK_KPROBE_SSTEPBP:
+ if (kprobe_singlestep_handler(regs))
+ return;
+ break;
+ default:
+ break;
+ }
+
pr_emerg("KERNEL: Illegal trap exception 0x%.8lx\n", regs->pc);
die("Die:", regs, SIGILL);
}
}
+NOKPROBE_SYMBOL(do_trap)
asmlinkage void do_unaligned_access(struct pt_regs *regs, unsigned long address)
{
diff --git a/arch/openrisc/lib/memcpy.c b/arch/openrisc/lib/memcpy.c
index e2af9b510804..701262a40134 100644
--- a/arch/openrisc/lib/memcpy.c
+++ b/arch/openrisc/lib/memcpy.c
@@ -16,6 +16,7 @@
#include <linux/export.h>
+#include <linux/kprobes.h>
#include <linux/string.h>
#ifdef CONFIG_OR1K_1200
@@ -122,4 +123,5 @@ void *memcpy(void *dest, __const void *src, __kernel_size_t n)
}
#endif
+NOKPROBE_SYMBOL(memcpy);
EXPORT_SYMBOL(memcpy);
diff --git a/arch/openrisc/lib/memset.S b/arch/openrisc/lib/memset.S
index c3ac2a8b68d3..14e63af12d73 100644
--- a/arch/openrisc/lib/memset.S
+++ b/arch/openrisc/lib/memset.S
@@ -9,6 +9,8 @@
* Copyright (C) 2015 Olof Kindgren <olof.kindgren@gmail.com>
*/
+#include <asm/asm.h>
+
.global memset
.type memset, @function
memset:
@@ -92,3 +94,5 @@ memset:
4: l.jr r9
l.ori r11, r3, 0
+
+_ASM_NOKPROBE(memset)
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 29e232d78d82..5263a832562f 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -14,6 +14,7 @@
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/extable.h>
+#include <linux/kprobes.h>
#include <linux/sched/signal.h>
#include <linux/perf_event.h>
@@ -55,6 +56,9 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
tsk = current;
+ if (kprobe_page_fault(regs, vector))
+ return;
+
/*
* We fault-in kernel-space virtual memory on-demand. The
* 'reference' page table is init_mm.pgd.
@@ -351,3 +355,4 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
return;
}
}
+NOKPROBE_SYMBOL(do_page_fault)
diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 53ec6c8b8c40..84e26ebef70b 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -59,6 +59,10 @@ static int __kprobes handler_pre(struct kprobe *p, struct pt_regs *regs)
pr_info("<%s> p->addr = 0x%p, era = 0x%lx, estat = 0x%lx\n",
p->symbol_name, p->addr, regs->csr_era, regs->csr_estat);
#endif
+#ifdef CONFIG_OPENRISC
+ pr_info("<%s> p->addr = 0x%p, pc = 0x%lx, status = 0x%lx\n",
+ p->symbol_name, p->addr, regs->pc, regs->sr);
+#endif
/* A dump_stack() here will give a stack backtrace */
return 0;
@@ -100,6 +104,10 @@ static void __kprobes handler_post(struct kprobe *p, struct pt_regs *regs,
pr_info("<%s> p->addr = 0x%p, estat = 0x%lx\n",
p->symbol_name, p->addr, regs->csr_estat);
#endif
+#ifdef CONFIG_OPENRISC
+ pr_info("<%s> p->addr = 0x%p, status = 0x%lx\n",
+ p->symbol_name, p->addr, regs->sr);
+#endif
}
static int __init kprobe_init(void)
--
2.53.0
^ permalink raw reply related
* [RFC 1/2] openrisc: Add utilities and clean up simulation of instructions
From: Sahil Siddiq @ 2026-04-07 18:56 UTC (permalink / raw)
To: jonas, stefan.kristiansson, shorne, naveen, davem, mhiramat
Cc: peterz, jpoimboe, jbaron, rostedt, ardb, chenmiao.ku, johannes,
nsc, masahiroy, tytso, linux-openrisc, linux-kernel,
linux-trace-kernel, Sahil Siddiq
In-Reply-To: <20260407185650.79816-1-sahilcdq0@gmail.com>
Introduce new instruction-related utilities and macros for OpenRISC.
This is in preparation for patches that add tracing support such as
KProbes.
Simulate l.adrp. Fix bugs in simulation of l.jal and l.jalr. Earlier,
PC was being updated and then saved in the link register r9, resulting
in a corrupted page table (bad page map in process). Instead, update
PC after storing it in r9.
Move instruction simulation to its own file to enable reuse. Clean it
up and replace hardcoded values with computed expressions.
Link: https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.4-rev0.pdf
Signed-off-by: Sahil Siddiq <sahilcdq0@gmail.com>
---
arch/openrisc/include/asm/insn-def.h | 61 +++++++++++++++++++++--
arch/openrisc/include/asm/spr_defs.h | 1 +
arch/openrisc/kernel/Makefile | 2 +-
arch/openrisc/kernel/insn.c | 74 ++++++++++++++++++++++++++++
arch/openrisc/kernel/jump_label.c | 2 +-
arch/openrisc/kernel/traps.c | 41 +--------------
6 files changed, 136 insertions(+), 45 deletions(-)
create mode 100644 arch/openrisc/kernel/insn.c
diff --git a/arch/openrisc/include/asm/insn-def.h b/arch/openrisc/include/asm/insn-def.h
index 1e0c028a5b95..c98f9770c52e 100644
--- a/arch/openrisc/include/asm/insn-def.h
+++ b/arch/openrisc/include/asm/insn-def.h
@@ -3,13 +3,66 @@
* Copyright (C) 2025 Chen Miao
*/
+#include <asm/spr.h>
+#include <asm/spr_defs.h>
+
#ifndef __ASM_OPENRISC_INSN_DEF_H
#define __ASM_OPENRISC_INSN_DEF_H
-/* or1k instructions are always 32 bits. */
-#define OPENRISC_INSN_SIZE 4
-
/* or1k nop instruction code */
-#define OPENRISC_INSN_NOP 0x15000000U
+#define INSN_NOP 0x15000000U
+
+#define INSN_CSYNC 0x23000000U
+#define INSN_MSYNC 0x22000000U
+#define INSN_PSYNC 0x22800000U
+
+#define OPCODE_TRAP 0x21000000U
+#define OPCODE_SYS 0x20000000U
+#define OPCODE_MACRC 0x18010000U
+
+struct pt_regs;
+
+enum six_bit_opcodes {
+ l_rfe = 0x09,
+ l_lwa = 0x1b,
+ l_mfspr = 0x2d,
+ l_mtspr = 0x30,
+ l_swa = 0x33,
+ l_j = 0x00,
+ l_jal = 0x01,
+ l_adrp = 0x02,
+ l_bnf = 0x03,
+ l_bf = 0x04,
+ l_jr = 0x11,
+ l_jalr = 0x12,
+};
+
+struct insn {
+ unsigned int opcode: 6;
+ unsigned int operands: 26;
+};
+
+union openrisc_instruction {
+ unsigned int word;
+ struct insn opcodes_6bit;
+};
+
+#define OPENRISC_INSN_SIZE (sizeof(union openrisc_instruction))
+
+/* Helpers for working with l.trap */
+static inline unsigned long __emit_trap(unsigned int code)
+{
+ return (code & 0xffff) | OPCODE_TRAP;
+}
+
+static inline bool has_delay_slot(void)
+{
+ unsigned int cpucfgr = mfspr(SPR_CPUCFGR);
+
+ return !(cpucfgr & SPR_CPUCFGR_ND);
+}
+
+void simulate_pc(struct pt_regs *regs, unsigned int jmp);
+void simulate_branch(struct pt_regs *regs, unsigned int jmp, bool has_delay_slot);
#endif /* __ASM_OPENRISC_INSN_DEF_H */
diff --git a/arch/openrisc/include/asm/spr_defs.h b/arch/openrisc/include/asm/spr_defs.h
index f0b6b492e9f4..5d13a9b96263 100644
--- a/arch/openrisc/include/asm/spr_defs.h
+++ b/arch/openrisc/include/asm/spr_defs.h
@@ -179,6 +179,7 @@
#define SPR_CPUCFGR_OF32S 0x00000080 /* ORFPX32 supported */
#define SPR_CPUCFGR_OF64S 0x00000100 /* ORFPX64 supported */
#define SPR_CPUCFGR_OV64S 0x00000200 /* ORVDX64 supported */
+#define SPR_CPUCFGR_ND 0x00000400 /* No delay slot */
#define SPR_CPUCFGR_RES 0xfffffc00 /* Reserved */
/*
diff --git a/arch/openrisc/kernel/Makefile b/arch/openrisc/kernel/Makefile
index 19e0eb94f2eb..150779fbf010 100644
--- a/arch/openrisc/kernel/Makefile
+++ b/arch/openrisc/kernel/Makefile
@@ -5,7 +5,7 @@
always-$(KBUILD_BUILTIN) := vmlinux.lds
-obj-y := head.o setup.o or32_ksyms.o process.o dma.o \
+obj-y := head.o insn.o setup.o or32_ksyms.o process.o dma.o \
traps.o time.o irq.o entry.o ptrace.o signal.o \
sys_call_table.o unwinder.o cacheinfo.o
diff --git a/arch/openrisc/kernel/insn.c b/arch/openrisc/kernel/insn.c
new file mode 100644
index 000000000000..2c97eceee6d7
--- /dev/null
+++ b/arch/openrisc/kernel/insn.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * OpenRISC instruction utils
+ *
+ * Linux architectural port borrowing liberally from similar works of
+ * others. All original copyrights apply as per the original source
+ * declaration.
+ *
+ * OpenRISC implementation:
+ * Copyright (C) 2026 Sahil Siddiq <sahilcdq0@gmail.com>
+ */
+
+#include <linux/ptrace.h>
+#include <asm/insn-def.h>
+
+void simulate_pc(struct pt_regs *regs, unsigned int jmp)
+{
+ int displacement;
+ unsigned int rd, op;
+
+ displacement = sign_extend32(((jmp) & 0x7ffff) << 13, 31);
+ rd = (jmp & 0x3ffffff) >> 21;
+ op = jmp >> 26;
+
+ switch (op) {
+ case l_adrp:
+ regs->gpr[rd] = displacement + (regs->pc & (-8192));
+ return;
+ default:
+ break;
+ }
+}
+
+void simulate_branch(struct pt_regs *regs, unsigned int jmp_insn, bool has_delay_slot)
+{
+ int displacement;
+ unsigned int rb, op, jmp;
+
+ displacement = sign_extend32(((jmp_insn) & 0x3ffffff) << 2, 27);
+ rb = (jmp_insn & 0x0000ffff) >> 11;
+ op = jmp_insn >> 26;
+ jmp = has_delay_slot ? 2 * OPENRISC_INSN_SIZE : OPENRISC_INSN_SIZE;
+
+ switch (op) {
+ case l_j: /* l.j */
+ regs->pc += displacement;
+ return;
+ case l_jal: /* l.jal */
+ regs->gpr[9] = regs->pc + jmp;
+ regs->pc += displacement;
+ return;
+ case l_bnf: /* l.bnf */
+ if (regs->sr & SPR_SR_F)
+ regs->pc += jmp;
+ else
+ regs->pc += displacement;
+ return;
+ case l_bf: /* l.bf */
+ if (regs->sr & SPR_SR_F)
+ regs->pc += displacement;
+ else
+ regs->pc += jmp;
+ return;
+ case l_jr: /* l.jr */
+ regs->pc = regs->gpr[rb];
+ return;
+ case l_jalr: /* l.jalr */
+ regs->gpr[9] = regs->pc + jmp;
+ regs->pc = regs->gpr[rb];
+ return;
+ default:
+ break;
+ }
+}
diff --git a/arch/openrisc/kernel/jump_label.c b/arch/openrisc/kernel/jump_label.c
index ab7137c23b46..fe082eb847a4 100644
--- a/arch/openrisc/kernel/jump_label.c
+++ b/arch/openrisc/kernel/jump_label.c
@@ -34,7 +34,7 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry,
insn = offset;
} else {
- insn = OPENRISC_INSN_NOP;
+ insn = INSN_NOP;
}
if (early_boot_irqs_disabled)
diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index c195be9cc9fc..ee87a3af34fc 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -32,6 +32,7 @@
#include <asm/bug.h>
#include <asm/fpu.h>
+#include <asm/insn-def.h>
#include <asm/io.h>
#include <asm/processor.h>
#include <asm/unwinder.h>
@@ -269,47 +270,9 @@ static inline int in_delay_slot(struct pt_regs *regs)
static inline void adjust_pc(struct pt_regs *regs, unsigned long address)
{
- int displacement;
- unsigned int rb, op, jmp;
-
if (unlikely(in_delay_slot(regs))) {
/* In delay slot, instruction at pc is a branch, simulate it */
- jmp = *((unsigned int *)regs->pc);
-
- displacement = sign_extend32(((jmp) & 0x3ffffff) << 2, 27);
- rb = (jmp & 0x0000ffff) >> 11;
- op = jmp >> 26;
-
- switch (op) {
- case 0x00: /* l.j */
- regs->pc += displacement;
- return;
- case 0x01: /* l.jal */
- regs->pc += displacement;
- regs->gpr[9] = regs->pc + 8;
- return;
- case 0x03: /* l.bnf */
- if (regs->sr & SPR_SR_F)
- regs->pc += 8;
- else
- regs->pc += displacement;
- return;
- case 0x04: /* l.bf */
- if (regs->sr & SPR_SR_F)
- regs->pc += displacement;
- else
- regs->pc += 8;
- return;
- case 0x11: /* l.jr */
- regs->pc = regs->gpr[rb];
- return;
- case 0x12: /* l.jalr */
- regs->pc = regs->gpr[rb];
- regs->gpr[9] = regs->pc + 8;
- return;
- default:
- break;
- }
+ simulate_branch(regs, *((unsigned int *)regs->pc), has_delay_slot());
} else {
regs->pc += 4;
}
--
2.53.0
^ permalink raw reply related
* [RFC 0/2] openrisc: Add support for KProbes
From: Sahil Siddiq @ 2026-04-07 18:56 UTC (permalink / raw)
To: jonas, stefan.kristiansson, shorne, naveen, davem, mhiramat
Cc: peterz, jpoimboe, jbaron, rostedt, ardb, chenmiao.ku, johannes,
nsc, masahiroy, tytso, linux-openrisc, linux-kernel,
linux-trace-kernel, Sahil Siddiq
Hi,
This series adds basic support for KProbes on OpenRISC. There are
a few changes that I would still like to add and test before this
can be considered for merging. I was hoping to get some feedback on
the changes made so far. The implementation in this series is based
on KProbes for LoongArch, MIPS and RISC-V.
The current state of the series allows traps to be inserted dynamically
in the kernel. A KProbe can be inserted via a kernel module whose
init/exit functions are used to register/unregister a KProbe. A pre-
handler and post-handler can also be provisioned in the module, which
are associated with the KProbe and triggered when the probe is hit. See
the documentation on KProbes for a detailed explanation [1].
The following are yet to be implemented for OpenRISC:
1. kretprobes
2. kprobe-based event tracing
3. ftrace, and kprobe features that depend on ftrace (particularly,
dynamic tracing)
I hope to submit a patch for kretprobes soon (possibly in a revision of
this series).
I wrote a couple of kernel modules to test these changes. They can be found
here [2]. I also ran test_kprobes located at ./lib/tests/ against these
changes [3]. The results are as shown below:
/home # insmod test_kprobes.ko
KTAP version 1
1..1
KTAP version 1
# Subtest: kprobes_test
# module: test_kprobes
1..3
ok 1 test_kprobe
ok 2 test_kprobes
ok 3 test_kprobe_missed
# kprobes_test: pass:3 fail:0 skip:0 total:3
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 kprobes_test
/home #
When compiling the kernel, the following options should be enabled:
1. CONFIG_HAVE_KPROBES=y
2. CONFIG_KPROBES=y
Also ensure that CONFIG_KPROBE_EVENTS is disabled.
To compile /lib/tests/test_kprobes.c, add the following to .config:
1. CONFIG_KUNIT=y
2. CONFIG_DEBUG_KERNEL=y
3. CONFIG_KPROBES_SANITY_TEST=m
The first commit cleans up and reorganizes existing functions, fixes
a few issues with instruction simulation, and introduces new structures
and macros that will be used by KProbes and other tracing facilities
in the future.
The second commit adds support for KProbes. Currently, I have
implemented this in such a way that KProbes can't be used to probe
a few "blacklisted" instructions. Probes can't be inserted in a delay
slot either (similar to MIPS). I have also added a few asm functions
to the blacklist that I think should not be probed. For e.g., "memset"
and "_trap_handler" have been blacklisted because probing them causes
the kernel to hang. However, I am not sure if other functions in "entry.S"
need to be added as well to the blacklist.
Thanks,
Sahil
[1] https://www.kernel.org/doc/html/latest/trace/kprobes.html
[2] https://github.com/valdaarhun/or-dev/tree/main/home
[3] https://github.com/openrisc/linux/blob/for-next/lib/tests/test_kprobes.c
Sahil Siddiq (2):
openrisc: Add utilities and clean up simulation of instructions
openrisc: Add KProbes
arch/openrisc/Kconfig | 1 +
arch/openrisc/configs/or1ksim_defconfig | 2 +
arch/openrisc/configs/virt_defconfig | 2 +
arch/openrisc/include/asm/asm.h | 22 ++
arch/openrisc/include/asm/break.h | 19 ++
arch/openrisc/include/asm/insn-def.h | 61 +++-
arch/openrisc/include/asm/kprobes.h | 76 +++++
arch/openrisc/include/asm/spr_defs.h | 1 +
arch/openrisc/kernel/Makefile | 3 +-
arch/openrisc/kernel/entry.S | 16 +
arch/openrisc/kernel/insn.c | 74 +++++
arch/openrisc/kernel/jump_label.c | 2 +-
arch/openrisc/kernel/kprobes.c | 381 ++++++++++++++++++++++++
arch/openrisc/kernel/traps.c | 67 ++---
arch/openrisc/lib/memcpy.c | 2 +
arch/openrisc/lib/memset.S | 4 +
arch/openrisc/mm/fault.c | 5 +
samples/kprobes/kprobe_example.c | 8 +
18 files changed, 701 insertions(+), 45 deletions(-)
create mode 100644 arch/openrisc/include/asm/asm.h
create mode 100644 arch/openrisc/include/asm/break.h
create mode 100644 arch/openrisc/include/asm/kprobes.h
create mode 100644 arch/openrisc/kernel/insn.c
create mode 100644 arch/openrisc/kernel/kprobes.c
--
2.53.0
^ permalink raw reply
* Re: [PATCH bpf-next v6 1/2] tracing: Prefer vmlinux symbols over module symbols for unqualified kprobes
From: Ihor Solodrai @ 2026-04-07 18:42 UTC (permalink / raw)
To: Andrey Grodzovsky, bpf, linux-trace-kernel, jolsa
Cc: ast, daniel, andrii, rostedt, mhiramat, emil, linux-open-source
In-Reply-To: <20260407165145.1651061-2-andrey.grodzovsky@crowdstrike.com>
On 4/7/26 9:51 AM, Andrey Grodzovsky wrote:
> When an unqualified kprobe target exists in both vmlinux and a loaded
> module, number_of_same_symbols() returns a count greater than 1,
> causing kprobe attachment to fail with -EADDRNOTAVAIL even though the
> vmlinux symbol is unambiguous.
>
> When no module qualifier is given and the symbol is found in vmlinux,
> return the vmlinux-only count without scanning loaded modules. This
> preserves the existing behavior for all other cases:
> - Symbol only in a module: vmlinux count is 0, falls through to module
> scan as before.
> - Symbol qualified with MOD:SYM: mod != NULL, unchanged path.
> - Symbol ambiguous within vmlinux itself: count > 1 is returned as-is.
>
> Fixes: 926fe783c8a6 ("tracing/kprobes: Fix symbol counting logic by looking at modules as well")
> Fixes: 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads")
> Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
> ---
> kernel/trace/trace_kprobe.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index a5dbb72528e0..99c41ea8b6d7 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -765,6 +765,13 @@ static unsigned int number_of_same_symbols(const char *mod, const char *func_nam
> if (!mod)
> kallsyms_on_each_match_symbol(count_symbols, func_name, &ctx.count);
>
> + /* If the symbol is found in vmlinux, use vmlinux resolution only.
> + * This prevents module symbols from shadowing vmlinux symbols
> + * and causing -EADDRNOTAVAIL for unqualified kprobe targets.
> + */
> + if (!mod && ctx.count > 0)
> + return ctx.count;
Nice to see this actually works.
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> +
> module_kallsyms_on_each_symbol(mod, count_mod_symbols, &ctx);
>
> return ctx.count;
^ permalink raw reply
* Re: [PATCH] selftests/ftrace: Check exact trace_marker_raw payload lengths
From: Steven Rostedt @ 2026-04-07 16:54 UTC (permalink / raw)
To: CaoRuichuang
Cc: mhiramat, mathieu.desnoyers, shuah, linux-kernel,
linux-trace-kernel, linux-kselftest
In-Reply-To: <20260407101245.78988-1-create0818@163.com>
On Tue, 7 Apr 2026 18:12:45 +0800
CaoRuichuang <create0818@163.com> wrote:
> From: Cao Ruichuang <create0818@163.com>
>
> trace_marker_raw.tc currently depends on awk strtonum() and
> assumes that the printed raw-data byte count is rounded up to four
> bytes.
>
> That makes the test fail on systems that use mawk, and it no longer
> matches the raw_data trace output we want to validate after preserving
> true payload lengths for long records.
>
> Rewrite the test to capture a small sequence of raw marker writes and
> check the exact number of printed payload bytes in order. While doing
> that, use od for the endian probe, switch to a fixed raw marker id so
> the test only varies payload length, and disable pause-on-trace while
> streaming trace_pipe.
>
The tests are to validate the code and if the tests fail when the code is
working as designed, it is the test that is at fault, and the fix is to fix
the tests. Not to make the code match the test.
-- Steve
^ permalink raw reply
* [PATCH bpf-next v6 0/2] tracing: Fix kprobe attachment when module shadows vmlinux symbol
From: Andrey Grodzovsky @ 2026-04-07 16:51 UTC (permalink / raw)
To: bpf, linux-trace-kernel, jolsa, ihor.solodrai
Cc: ast, daniel, andrii, rostedt, mhiramat, emil, linux-open-source
When a kernel module exports a symbol with the same name as an existing
vmlinux symbol, kprobe attachment fails with -EADDRNOTAVAIL because
number_of_same_symbols() counts matches across both vmlinux and all
loaded modules, returning a count greater than 1.
This series takes a different approach from v1-v5, which implemented a
libbpf-side fallback parsing /proc/kallsyms and retrying with the
absolute address. That approach was rejected (Andrii Nakryiko, Ihor
Solodrai) because ambiguous symbol resolution does not belong in libbpf,
and because it did not cover the kprobe_multi path.
Following Ihor's suggestion, this series fixes the root cause in the
kernel: when an unqualified symbol name is given and the symbol is found
in vmlinux, prefer the vmlinux symbol and do not scan loaded modules.
This makes the skeleton auto-attach path work transparently with no
libbpf changes needed.
Patch 1: Kernel fix - return vmlinux-only count from
number_of_same_symbols() when the symbol is found in vmlinux,
preventing module shadows from causing -EADDRNOTAVAIL.
Patch 2: Selftests using bpf_fentry_shadow_test which exists in both
vmlinux and bpf_testmod - tests unqualified (vmlinux) and
MOD:SYM (module) attachment across all four attach modes, plus
kprobe_multi with the duplicate symbol.
Changes since v5 [1]:
- Selftest: use existing bpf_fentry_shadow_test (vmlinux + bpf_testmod)
instead of a new bpf_testmod_dup_sym.ko test module (Jiri Olsa).
- Selftest: add MOD:SYM qualification test for module shadow attachment
(Jiri Olsa).
- Selftest: add kprobe_multi dup_sym test in kprobe_multi_test.c
(Jiri Olsa).
[1] https://lore.kernel.org/bpf/20260406193158.754498-1-andrey.grodzovsky@crowdstrike.com/
Andrey Grodzovsky (2):
tracing: Prefer vmlinux symbols over module symbols for unqualified
kprobes
selftests/bpf: Add tests for kprobe attachment with duplicate symbols
kernel/trace/trace_kprobe.c | 7 ++
.../selftests/bpf/prog_tests/attach_probe.c | 68 +++++++++++++++++++
.../bpf/prog_tests/kprobe_multi_test.c | 39 +++++++++++
3 files changed, 114 insertions(+)
--
2.34.1
^ permalink raw reply
* [PATCH bpf-next v6 1/2] tracing: Prefer vmlinux symbols over module symbols for unqualified kprobes
From: Andrey Grodzovsky @ 2026-04-07 16:51 UTC (permalink / raw)
To: bpf, linux-trace-kernel, jolsa, ihor.solodrai
Cc: ast, daniel, andrii, rostedt, mhiramat, emil, linux-open-source
In-Reply-To: <20260407165145.1651061-1-andrey.grodzovsky@crowdstrike.com>
When an unqualified kprobe target exists in both vmlinux and a loaded
module, number_of_same_symbols() returns a count greater than 1,
causing kprobe attachment to fail with -EADDRNOTAVAIL even though the
vmlinux symbol is unambiguous.
When no module qualifier is given and the symbol is found in vmlinux,
return the vmlinux-only count without scanning loaded modules. This
preserves the existing behavior for all other cases:
- Symbol only in a module: vmlinux count is 0, falls through to module
scan as before.
- Symbol qualified with MOD:SYM: mod != NULL, unchanged path.
- Symbol ambiguous within vmlinux itself: count > 1 is returned as-is.
Fixes: 926fe783c8a6 ("tracing/kprobes: Fix symbol counting logic by looking at modules as well")
Fixes: 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads")
Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
kernel/trace/trace_kprobe.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index a5dbb72528e0..99c41ea8b6d7 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -765,6 +765,13 @@ static unsigned int number_of_same_symbols(const char *mod, const char *func_nam
if (!mod)
kallsyms_on_each_match_symbol(count_symbols, func_name, &ctx.count);
+ /* If the symbol is found in vmlinux, use vmlinux resolution only.
+ * This prevents module symbols from shadowing vmlinux symbols
+ * and causing -EADDRNOTAVAIL for unqualified kprobe targets.
+ */
+ if (!mod && ctx.count > 0)
+ return ctx.count;
+
module_kallsyms_on_each_symbol(mod, count_mod_symbols, &ctx);
return ctx.count;
--
2.34.1
^ permalink raw reply related
* [PATCH bpf-next v6 2/2] selftests/bpf: Add tests for kprobe attachment with duplicate symbols
From: Andrey Grodzovsky @ 2026-04-07 16:51 UTC (permalink / raw)
To: bpf, linux-trace-kernel, jolsa, ihor.solodrai
Cc: ast, daniel, andrii, rostedt, mhiramat, emil, linux-open-source
In-Reply-To: <20260407165145.1651061-1-andrey.grodzovsky@crowdstrike.com>
bpf_fentry_shadow_test exists in both vmlinux (net/bpf/test_run.c) and
bpf_testmod (bpf_testmod.c), creating a duplicate symbol condition when
bpf_testmod is loaded. Add subtests that verify kprobe behavior with
this duplicate symbol:
In attach_probe:
- dup-sym-{default,legacy,perf,link}: unqualified attach succeeds
across all four modes, preferring vmlinux over module shadow.
- dup-sym-module-{default,legacy,perf,link}: MOD:SYM qualification
(bpf_testmod:bpf_fentry_shadow_test) attaches to the module version.
In kprobe_multi_test:
- dup_sym: kprobe_multi attach with kprobe and kretprobe succeeds.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
.../selftests/bpf/prog_tests/attach_probe.c | 68 +++++++++++++++++++
.../bpf/prog_tests/kprobe_multi_test.c | 39 +++++++++++
2 files changed, 107 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index 12a841afda68..2d52547c395e 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -197,6 +197,65 @@ static void test_attach_kprobe_legacy_by_addr_reject(void)
test_attach_probe_manual__destroy(skel);
}
+/* bpf_fentry_shadow_test exists in both vmlinux (net/bpf/test_run.c) and
+ * bpf_testmod (bpf_testmod.c). When bpf_testmod is loaded the symbol is
+ * duplicated. Test that kprobe attachment handles this correctly:
+ * - Unqualified name ("bpf_fentry_shadow_test") attaches to vmlinux.
+ * - MOD:SYM name ("bpf_testmod:bpf_fentry_shadow_test") attaches to module.
+ *
+ * Note: bpf_fentry_shadow_test is not invoked via test_run, so we only
+ * verify that attach and detach succeed without triggering the probe.
+ */
+static void test_attach_probe_dup_sym(enum probe_attach_mode attach_mode)
+{
+ DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts);
+ struct bpf_link *kprobe_link, *kretprobe_link;
+ struct test_attach_probe_manual *skel;
+
+ skel = test_attach_probe_manual__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_dup_sym_open_and_load"))
+ return;
+
+ kprobe_opts.attach_mode = attach_mode;
+
+ /* Unqualified: should attach to vmlinux symbol */
+ kprobe_opts.retprobe = false;
+ kprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+ "bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kprobe_link, "attach_kprobe_vmlinux"))
+ goto cleanup;
+ bpf_link__destroy(kprobe_link);
+
+ kprobe_opts.retprobe = true;
+ kretprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+ "bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kretprobe_link, "attach_kretprobe_vmlinux"))
+ goto cleanup;
+ bpf_link__destroy(kretprobe_link);
+
+ /* MOD:SYM qualified: should attach to module symbol */
+ kprobe_opts.retprobe = false;
+ kprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+ "bpf_testmod:bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kprobe_link, "attach_kprobe_module"))
+ goto cleanup;
+ bpf_link__destroy(kprobe_link);
+
+ kprobe_opts.retprobe = true;
+ kretprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+ "bpf_testmod:bpf_fentry_shadow_test",
+ &kprobe_opts);
+ if (!ASSERT_OK_PTR(kretprobe_link, "attach_kretprobe_module"))
+ goto cleanup;
+ bpf_link__destroy(kretprobe_link);
+
+cleanup:
+ test_attach_probe_manual__destroy(skel);
+}
+
/* attach uprobe/uretprobe long event name testings */
static void test_attach_uprobe_long_event_name(void)
{
@@ -559,6 +618,15 @@ void test_attach_probe(void)
if (test__start_subtest("kprobe-legacy-by-addr-reject"))
test_attach_kprobe_legacy_by_addr_reject();
+ if (test__start_subtest("dup-sym-default"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_DEFAULT);
+ if (test__start_subtest("dup-sym-legacy"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_LEGACY);
+ if (test__start_subtest("dup-sym-perf"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_PERF);
+ if (test__start_subtest("dup-sym-link"))
+ test_attach_probe_dup_sym(PROBE_ATTACH_MODE_LINK);
+
if (test__start_subtest("auto"))
test_attach_probe_auto(skel);
if (test__start_subtest("kprobe-sleepable"))
diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
index 78c974d4ea33..d89650e31fd3 100644
--- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
@@ -633,6 +633,43 @@ static void test_attach_write_ctx(void)
}
#endif
+/* Test kprobe_multi handles shadow symbols (vmlinux + module duplicate).
+ * bpf_fentry_shadow_test exists in both vmlinux and bpf_testmod.
+ * kprobe_multi resolves via ftrace_lookup_symbols() which finds the
+ * vmlinux symbol first and stops, so this should always succeed.
+ */
+static void test_attach_probe_dup_sym(void)
+{
+ LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
+ const char *syms[1] = { "bpf_fentry_shadow_test" };
+ struct kprobe_multi *skel = NULL;
+ struct bpf_link *link1 = NULL, *link2 = NULL;
+
+ skel = kprobe_multi__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "kprobe_multi__open_and_load"))
+ goto cleanup;
+
+ skel->bss->pid = getpid();
+ opts.syms = syms;
+ opts.cnt = ARRAY_SIZE(syms);
+
+ link1 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe_manual,
+ NULL, &opts);
+ if (!ASSERT_OK_PTR(link1, "attach_kprobe_multi_dup_sym"))
+ goto cleanup;
+
+ opts.retprobe = true;
+ link2 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kretprobe_manual,
+ NULL, &opts);
+ if (!ASSERT_OK_PTR(link2, "attach_kretprobe_multi_dup_sym"))
+ goto cleanup;
+
+cleanup:
+ bpf_link__destroy(link2);
+ bpf_link__destroy(link1);
+ kprobe_multi__destroy(skel);
+}
+
void serial_test_kprobe_multi_bench_attach(void)
{
if (test__start_subtest("kernel"))
@@ -676,5 +713,7 @@ void test_kprobe_multi_test(void)
test_unique_match();
if (test__start_subtest("attach_write_ctx"))
test_attach_write_ctx();
+ if (test__start_subtest("dup_sym"))
+ test_attach_probe_dup_sym();
RUN_TESTS(kprobe_multi_verifier);
}
--
2.34.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox