* Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Balbir Singh @ 2026-06-03 5:00 UTC
To: Gregory Price
Cc: lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
terry.bowman
In-Reply-To: <ah6bDNxlB1zBUnzN@gourry-fedora-PF4VCD3F>
On Tue, Jun 02, 2026 at 09:57:48AM +0100, Gregory Price wrote:
> On Tue, Jun 02, 2026 at 12:16:50PM +1000, Balbir Singh wrote:
> > On Sun, May 24, 2026 at 09:50:06PM -0400, Gregory Price wrote:
> > >
> > > I'm debating on whether to include OPS_MEMPOLICY in the initial version
> > > if only because it's not intuitive how it interacts with pagecache. That
> > > needs more time to bake.
> > >
> >
> > It makes sense to look at it and then decide if it makes sense.
> >
>
> I am thinking I will ship without any OPS flags at all for now and then
> have the introduction of ops as a separate series.
>
> > > alloc_pages_node() is the kernel interface
> >
> > I was thinking we wouldn't need explicit flags and that allocations would
> > happen from user space using __GFP_THISNODE to the node or via a nodemask
> > based on nodes of interest. Is there a reason to add this flag, a system
> > might have more than one source of N_MEMORY_PRIVATE?
> >
>
> There's a few things to unpack here. I discussed this many times on
> list and at LSF, but to reiterate.
>
> 1) __GFP_THISNODE is insufficient to enforce isolation and otherwise
> not particularly useful. Additionally, from userland, it's not
> something you can actually set.
I was thinking mbind()/mempolicy() is how we get to it. It already
accepts a nodemask.
>
> for node in possible_nodes:
> alloc_pages_node(private_node, __GFP_THISNODE)
>
> In fact it's the opposite semantic of what we want.
> THISNODE says: "Do not fall back to OTHER nodes".
>
That's why we need to control the fallback nodes carefully for
N_MEMORY_PRIVATE
> The semantic we want is "Do not allow allocations from private
> nodes UNLESS we specifically request" (__GFP_PRIVATE).
>
> __GFP_THISNODE does not actually buy you anything here, AND it's
> worse, in the scenario where a private node makes its way into the
> preferred slot (via possible_nodes or some other nodemask), the
> allocator cannot fall back to a node it can access.
>
> __GFP_THISNODE cannot be overloaded to do anything useful here.
Let me clarify, I meant to say, let's use a nodemask for allocation
and __GFP_THISNODE gets us to the node we desire, if that is the only
node. My earlier comment might not have been clear.
>
> 2) We're trying not to expose *ANY* userland APIs for this, at all.
>
> The ultimate goal here should be one of two things:
>
> 1) fd = open(/dev/xxx, ...);
> mem = mmap(fd, ...);
> mem[0] = 0xDEADBEEF; /* Fault device page into page table */
>
> In this case, the driver is responsible for doing the
> alloc_pages_node() call.
>
> or
>
> 2) mem = mmap(NULL, ..., ANON);
> mbind(mem, ..., private_node);
> mem[0] = 0xDEADBEEF; /* Fault device page into page table */
>
> in this case mempolicy.c is responsible for doing the
> alloc_pages_node() call via the _mpol() alloc variants.
>
> Additional OPT flags (reclaim, compaction, whatever), would
> (optionally) allow mm/ to operate on the device memory with, for
> example, mmu_notifier callbacks to tell the device to invalidate
> whatever it's caching about that page.
>
> This would all be relatively transparent to userland; all userland
> "knows" is that it's getting memory from a device (/dev/xxx) or a
> node it's otherwise aware of hosting device memory somehow.
>
Why not use the mbind() APIs? Do we want to gate allocation/privileges
via a /dev?
Balbir
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Andy Shevchenko @ 2026-06-03 5:46 UTC
To: Masami Hiramatsu
Cc: Arnd Bergmann, Steven Rostedt, Andrew Morton, Petr Mladek,
Nathan Chancellor, Arnd Bergmann, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Rasmus Villemoes,
Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka, linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <20260603105842.1e0ef8cb4a55cb776d6a4971@kernel.org>
On Wed, Jun 03, 2026 at 10:58:42AM +0900, Masami Hiramatsu wrote:
> On Tue, 2 Jun 2026 17:07:05 +0200
> Arnd Bergmann <arnd@kernel.org> wrote:
...
> I think this is a slightly confusing name. What about vsnprintf_nocheck()?
What check? If you want to be more precise: vsnprintf_no_printf_attr() or
vsnprintf_no_format_check(). But neither seems like a good choice to me.
(I have a slight preference for the latter, no_format_check.)
--
With Best Regards,
Andy Shevchenko
* [PATCH 1/2] tracing/synthetic: Free pending field on error path
From: Yu Peng @ 2026-06-03 6:25 UTC
To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel, Yu Peng
Some __create_synth_event() error paths run after parse_synth_field()
succeeds but before the field is stored in fields[]. The common cleanup
then misses the field. Free it before freeing argv.
Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
kernel/trace/trace_events_synth.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index e6871230bde96..cdd5b93328358 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -1446,13 +1446,13 @@ static int __create_synth_event(const char *name, const char *raw_fields)
if (cmd_version > 1 && n_fields_this_loop >= 1) {
synth_err(SYNTH_ERR_INVALID_CMD, errpos(field_str));
ret = -EINVAL;
- goto err_free_arg;
+ goto err_free_field;
}
if (n_fields == SYNTH_FIELDS_MAX) {
synth_err(SYNTH_ERR_TOO_MANY_FIELDS, 0);
ret = -EINVAL;
- goto err_free_arg;
+ goto err_free_field;
}
fields[n_fields++] = field;
@@ -1491,6 +1491,8 @@ static int __create_synth_event(const char *name, const char *raw_fields)
kfree(saved_fields);
return ret;
+ err_free_field:
+ free_synth_field(field);
err_free_arg:
argv_free(argv);
err:
--
2.43.0
* [PATCH 2/2] tracing/synthetic: Free type string on error path
From: Yu Peng @ 2026-06-03 6:25 UTC
To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel, Yu Peng
In-Reply-To: <20260603062533.1096320-1-pengyu@kylinos.cn>
parse_synth_field() builds a "__data_loc ..." type string before
assigning it to field->type. If the seq_buf check fails, the common
cleanup cannot free the temporary string. Free it before leaving.
Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
kernel/trace/trace_events_synth.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index cdd5b93328358..dc15658a887cb 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -839,8 +839,10 @@ static struct synth_field *parse_synth_field(int argc, char **argv,
seq_buf_puts(&s, "__data_loc ");
seq_buf_puts(&s, field->type);
- if (WARN_ON_ONCE(!seq_buf_buffer_left(&s)))
+ if (WARN_ON_ONCE(!seq_buf_buffer_left(&s))) {
+ kfree(type);
goto free;
+ }
s.buffer[s.len] = '\0';
kfree(field->type);
--
2.43.0
* Re: [syzbot] [trace?] KASAN: use-after-free Write in ring_buffer_read_page
From: Aleksandr Nogikh @ 2026-06-03 6:38 UTC
To: Masami Hiramatsu, Alexander Potapenko
Cc: Steven Rostedt, syzbot, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, syzkaller-bugs
In-Reply-To: <20260603103445.236f260a3c5eafe140055761@kernel.org>
On Wed, Jun 3, 2026 at 3:34 AM 'Masami Hiramatsu' via syzkaller-bugs
<syzkaller-bugs@googlegroups.com> wrote:
>
> On Tue, 2 Jun 2026 12:28:29 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Tue, 02 Jun 2026 06:45:31 -0700
> > syzbot <syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com> wrote:
> >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: e7ae89a0c97c Linux 7.1-rc5
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=16f06e2e580000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=58acee1ac5406016
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
> > > compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Looks like the test was doing something really weird to trigger this.
> > Without a reproducer, it's pretty much impossible to find out what
> > happened. Maybe AI could do it?
> >
>
> Does the "I don't have any reproducer for this issue yet." means
> this is not reproducible even if it runs completely same sequence
> in the console output? If so, might this be a timing related issue?
> (e.g. read v.s. write-event)
Yes, syzbot normally replays the sequence of the last programs executed
on the crashed VM to find a reproducer and, in many cases, they no
longer crash the kernel.
In the meantime, syzbot's AI bug reproduction functionality has found
a C reproducer for a KASAN crash in the kernel/trace's ring buffer,
although with a slightly different stack trace:
https://syzkaller.appspot.com/ai_job?id=b2620161-1632-4d4e-9314-114a8a5e79ef
Cc Alexander Potapenko
>
> Thanks,
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
* Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Gregory Price @ 2026-06-03 7:02 UTC
To: Balbir Singh
Cc: lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
terry.bowman
In-Reply-To: <ah-0CyZurn5D1ezY@parvat>
On Wed, Jun 03, 2026 at 03:00:01PM +1000, Balbir Singh wrote:
> On Tue, Jun 02, 2026 at 09:57:48AM +0100, Gregory Price wrote:
> > On Tue, Jun 02, 2026 at 12:16:50PM +1000, Balbir Singh wrote:
> > >
> > > I was thinking we wouldn't need explicit flags and that allocations would
> > > happen from user space using __GFP_THISNODE to the node or via a nodemask
> > > based on nodes of interest. Is there a reason to add this flag, a system
> > > might have more than one source of N_MEMORY_PRIVATE?
> > >
> >
> > There's a few things to unpack here. I discussed this many times on
> > list and at LSF, but to reiterate.
> >
> > 1) __GFP_THISNODE is insufficient to enforce isolation and otherwise
> > not particularly useful. Additionally, from userland, it's not
> > something you can actually set.
>
> I was thinking mbind()/mempolicy() is how we get to it. It already
> accepts a nodemask.
>
First let me say: I want to enable mbind access to these nodes.
But let me caveat: I think that needs more time to develop, and
in the meantime, we can enable the /dev/xxx pattern somewhat trivially.
Let me address a few things about mbind/mempolicy and how they
interact with page_alloc.c. I gave this overview at LSF, but I don't
remember whether I posted it in any of my follow-ups.
1) Fallback lists are filtered by nodemask, the nodemask does not replace
the fallback list.
Here is how the page allocator fallback lists and nodemasks interact:
Fallbacks A: A B
Fallbacks B: B A
Fallbacks C: C A B (Private)
Fallbacks D: D B A (Private)
Let's say you pass:
alloc_pages_node(C, ..., nodemask(A,C,D))
So we get
Fallback(C,A,B) & nodemask(A,C,D) -> iterate(C,A)
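The filtering described above can be modeled in a few lines of userspace C.
This is only a sketch under assumptions: the node IDs, the fallbacks[] table
and filter_fallbacks() are made up for illustration and are not the kernel's
zonelist code, and the nodemask is a plain bitmask rather than a nodemask_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node IDs matching the A/B/C/D example above. */
enum { NODE_A, NODE_B, NODE_C, NODE_D, NR_NODES };

/* Per-node fallback order, terminated by -1 (illustrative only). */
static const int fallbacks[NR_NODES][NR_NODES + 1] = {
	[NODE_A] = { NODE_A, NODE_B, -1 },
	[NODE_B] = { NODE_B, NODE_A, -1 },
	[NODE_C] = { NODE_C, NODE_A, NODE_B, -1 },	/* private */
	[NODE_D] = { NODE_D, NODE_B, NODE_A, -1 },	/* private */
};

/*
 * Walk the preferred node's fallback list and keep only the nodes that
 * are set in the nodemask. The mask filters the list; it never adds
 * nodes to it. Returns the number of candidates written to out[].
 */
static size_t filter_fallbacks(int preferred, unsigned int nodemask,
			       int out[NR_NODES])
{
	size_t n = 0;

	assert(preferred >= 0 && preferred < NR_NODES);
	for (const int *p = fallbacks[preferred]; *p != -1; p++)
		if (nodemask & (1u << *p))
			out[n++] = *p;
	return n;
}
```

With preferred node C and a mask of {A,C,D}, the walk yields C and then A.
D is in the mask but is never visited, because it is not on C's fallback list.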
If we wanted to change this behavior, realistically we'd be looking for
a way to add specific nodes to certain fallback lists - rather than
modify the nodemask interaction in some way.
I think this is out of scope for the first iteration - so supporting
anything other than mbind() from the start is just pointless.
The only feasible mempolicy you can apply is single-node bind, so
realistically you can only support mbind.
2) full mempolicy support doesn't really make sense
task mempolicy PROBABLY should never really touch private nodes,
while VMA policy certainly can. Assuming we're able to support
multi-private-node masks, none of the non-bind mempolicies even
make sense for most private nodes (interleave? weighted interleave?)
I haven't worked through all the implications of a task policy having
a private node attached, but the longer I think about it, the less it
makes sense to just support this outright.
3) Introducing mbind support is not just a simple nodemask on a VMA.
It also implies migration, cgroup/cpuset, and UAPI interactions.
a) migration:
mbind/mempolicy can and will engage migration when it is called
with certain flags. Migration has subtle LRU interactions, but
the patch set I have at least allows this to work.
b) cgroup/cpuset:
cpuset.mems rebinding will cause private nodes to be quietly
rebound to non-private nodes within a nodemask.
c) between A and B - we really want MPOL_F_STATIC to be required
for mbind to be applied to a private node so that it is never
forcefully remapped.
That's a UAPI semantic change specific for private nodes we
should really take time to consider.
4) File VMA interactions don't entirely make sense with mbind
In theory you might want:
fd = open("somefile", ...);
mem = mmap(fd, ...);
mbind(mem, ..., private_node);
for page in mem:
mem[page_off] /* fault file into private memory */
In reality: This does not work the way you want.
I went digging and we need a few mild extensions to allow
migration on mbind to work for pagecache pages, and the fault
path does not always respect the VMA mempolicy.
You also start getting into the question of "what happens when
the node is out of memory and you don't have reclaim support?".
The OOM implications jump out at you pretty aggressively.
Moreover other tasks can force the page cache pages to be moved
as well. So the programming model here just kind of sucks.
Works great for anon memory though :]
For all these reasons, I think mbind/mempolicy support for
private nodes needs to be brought in as follow-up work - not
introduced as part of the baseline set.
> >
> > for node in possible_nodes:
> > alloc_pages_node(private_node, __GFP_THISNODE)
> >
> > In fact it's the opposite semantic of what we want.
> > THISNODE says: "Do not fall back to OTHER nodes".
> >
>
> That's why we need to control the fallback nodes carefully for
> N_MEMORY_PRIVATE
>
My point is that __GFP_THISNODE is not actually useful.
If we go by nodemask, submitting a single-node nodemask is the
equivalent of an empty fallback list.
If we gate access to a private node by __GFP_THISNODE... this is the
same as just providing a single-node nodelist (putting aside the OOM
implications for a moment).
And it doesn't even buy you any new filtering ability against existing
nodemask iterators that may already utilize __GFP_THISNODE. i.e.
for node in online_nodes:
alloc_pages_node(node, __GFP_THISNODE, ...)
/* Alloc per-node resources */
This pattern is undesirable, but completely valid.
So overloading/requiring __GFP_THISNODE is just not useful.
I will follow up soon with a new version that limits the private node
interface to just nodemask and fallback list controls.
I need to test a few more things related to removing normal nodes from
private node fallbacks before I feel comfortable shipping without
__GFP_PRIVATE.
> > The semantic we want is "Do not allow allocations from private
> > nodes UNLESS we specifically request" (__GFP_PRIVATE).
> >
> > __GFP_THISNODE does not actually buy you anything here, AND it's
> > worse, in the scenario where a private node makes its way into the
> > preferred slot (via possible_nodes or some other nodemask), the
> > allocator cannot fall back to a node it can access.
> >
> > __GFP_THISNODE cannot be overloaded to do anything useful here.
>
> Let me clarify, I meant to say, let's use a nodemask for allocation
> and __GFP_THISNODE gets us to the node we desire, if that is the only
> node. My earlier comment might not have been clear.
>
My point was that __GFP_THISNODE is pointless and reduces to providing a
single-node nodemask anyway.
The contention over __GFP_PRIVATE is a bit ideological - do we want:
1) A hard guarantee that allocations to a private node are controlled
(__GFP_PRIVATE implies the caller knows what it's doing)
or
2) A soft guarantee (fallback list isolation only), and needing to
deal with undesired behavior that's "not technically a bug"
associated with existing users of global nodemasks (possible,
online, etc).
I am arguing for #1 - the community has argued for #2 and "fixing
existing nodemask users". I think we can ship #2 and pivot to #1 if we
find fixing existing users is infeasible or too much of a maintenance
burden.
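To make the difference between the two options concrete, here is a small
userspace model. It is a sketch under stated assumptions, not the proposed
kernel change: F_THISNODE and F_PRIVATE are stand-ins for __GFP_THISNODE and
__GFP_PRIVATE, and the node table and helper names are invented. It counts how
many private nodes a "for each node, allocate with THISNODE" loop would reach
under each policy.

```c
#include <assert.h>
#include <stdbool.h>

#define F_THISNODE 0x1u	/* stand-in for __GFP_THISNODE */
#define F_PRIVATE  0x2u	/* stand-in for the proposed __GFP_PRIVATE */

enum { NR_NODES = 4 };

/* Hypothetical layout: nodes 2 and 3 are private. */
static const bool node_is_private[NR_NODES] = { false, false, true, true };

/* Option 2 (soft): naming a node directly is enough to reach it. */
static bool soft_allows(int node, unsigned int flags)
{
	(void)flags;
	assert(node >= 0 && node < NR_NODES);
	return true;
}

/* Option 1 (hard): private nodes additionally require F_PRIVATE. */
static bool hard_allows(int node, unsigned int flags)
{
	assert(node >= 0 && node < NR_NODES);
	return !node_is_private[node] || (flags & F_PRIVATE);
}

/* Model of an existing per-node loop; returns private nodes reached. */
static int private_hits(bool (*allows)(int, unsigned int),
			unsigned int flags)
{
	int hits = 0;

	for (int node = 0; node < NR_NODES; node++)
		if (node_is_private[node] && allows(node, flags))
			hits++;
	return hits;
}
```

In this model an unmodified loop passing only F_THISNODE reaches both private
nodes under the soft policy and none under the hard policy, which is the
"not technically a bug" behavior that option 2 has to fix caller by caller.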
>
> Why not use mbind() API's? Do we want to gate allocation/privileges
> via a /dev?
>
We want to eventually enable it, but we really need to treat these
extensions as a separate step from the base so that the UAPI
implications are given proper scrutiny.
In the short term, /dev/xxx and driver-local/service-local control
of a node is still very useful.
For example, for my compressed memory work, I have found that if
implemented as a swap backend - the kernel can manage the node without
any UAPI implications at all :].
A driver managing memory on a private node could do the same.
~Gregory
* Re: [PATCH 1/8] scripts/sorttable: Handle RISC-V patchable ftrace entries
From: Chen Pei @ 2026-06-03 7:14 UTC
To: wanghan
Cc: acme, alex, andybnac, aou, bjorn, catalin.marinas, conor.dooley,
cp0613, debug, jikos, joe.lawrence, jpoimboe, linux-kernel,
linux-kselftest, linux-perf-users, linux-riscv,
linux-trace-kernel, live-patching, mark.rutland, mbenes, mhiramat,
mingo, namhyung, palmer, peterz, pjw, pmladek, puranjay, rostedt,
shuah
In-Reply-To: <20260527123530.2593918-2-wanghan@linux.alibaba.com>
On Wed, 27 May 2026 20:35:23 +0800, wanghan@linux.alibaba.com wrote:
> On an affected RISC-V QEMU boot with both CONFIG_FTRACE_SORT_STARTUP_TEST
> and CONFIG_FTRACE_STARTUP_TEST enabled, the sort check still passes
> while ftrace reports zero usable entries and the early selftests fail:
>
> [ 0.000000] ftrace section at ffffffff8101da98 sorted properly
> [ 0.000000] ftrace: allocating 0 entries in 128 pages
> [ 0.054999] Testing tracer function: .. no entries found ..FAILED!
> [ 0.172407] tracer: function failed selftest, disabling
> [ 0.178186] Failed to init function_graph tracer, init returned -19
>
> Handle RISC-V like arm64 for the function-range check and allow
> patchable entries up to 8 bytes before the function address.
>
> With this fix, a RISC-V QEMU smoke boot with ftrace startup tests shows
> the vmlinux ftrace table is populated and dynamic ftrace still works:
>
> [ 0.000000] ftrace: allocating 46749 entries in 184 pages
> [ 0.051115] Testing tracer function: PASSED
> [ 1.283782] Testing dynamic ftrace: PASSED
> [ 6.275456] Testing tracer function_graph: PASSED
>
> Fixes: 0ca1724b56af ("riscv: ftrace: select HAVE_BUILDTIME_MCOUNT_SORT")
Oops, sorry for missing that. Thanks for the quick fix!
Reviewed-by: Chen Pei <cp0613@linux.alibaba.com>
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Rasmus Villemoes @ 2026-06-03 7:15 UTC
To: Arnd Bergmann
Cc: Andy Shevchenko, Arnd Bergmann, Steven Rostedt, Masami Hiramatsu,
Andrew Morton, Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Sergey Senozhatsky,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <35c1ba62-e74d-4abc-aa73-ccd35968ff89@app.fastmail.com>
On Tue, Jun 02 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
>> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>>>
>>> A number of tracing headers turn off -Wsuggest-attribute=format for
>>> gcc, but they don't turn it off for clang, so the same warning still
>>> happens on new versions of clang that support the format attribute.
>>>
>>> To avoid duplicating the same thing in each tracing header, as well
>>> as changing all of them to also turn it off for clang, add a new
>>> __vsnprintf() helper that is not annotated this way in linux/sprintf.h
>>> but is defined to work the same way as the regular vsprintf.
>>
>> vsprintf()
>
> Fixed now
>
>> Why the __printf() annotation is in the C file and not here?
>> Is this all about headers as the second paragraph in the commit message
>> explains?
>> I would add a comment to explain it here, otherwise we might see false
>> patches to "make things consistent" in a wrong way.
>
> I've tried to come up with a kerneldoc comment now, similar to
> the one for the vsnprintf() function, and added a separate prototype
> in the header. Does this address your concern?
>
> Arnd
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 3caf0796f54d..7c696aea2ed3 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2975,7 +2975,23 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> }
> EXPORT_SYMBOL(vsnprintf);
>
> -int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> +/**
> + * __vsnprintf - vsnprintf() wrapper without __printf() attribute
> + * @buf: The buffer to place the result into
> + * @size: The size of the buffer, including the trailing null space
> + * @fmt_str: The format string to use
> + * @args: Arguments for the format string
> + *
> + * This has the exact same behavior as vsnprintf() but can be used in call
> + * sites that are missing a __printf() annotation, e.g. because they
> + * get a 'va_format' argument instead of format and varargs.
> + *
> + * For this to work, the attribute is added to the declaration here but
> + * not in the header.
> + */
> +int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args);
> +
> +int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> {
> return vsnprintf(buf, size, fmt_str, args);
> }
May I suggest a different approach, that avoids having that extra
function emitted (which presumably compiles to a single jump
instruction, but still, with retpoline and CFI and all that it all adds
up): Keep the declaration of __vsnprintf() in the header without the
__printf() attribute, but then do
int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
__alias(vsnprintf);
in vsprintf.c. Aside from reusing the same entry point, I could well
imagine a compiler some day complaining about seeing the printf
attribute applied in a local extra declaration but not having it in the
header file.
Presumably it will need its own EXPORT_SYMBOL if any of the intended
users are modular, and it certainly still needs a comment.
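A userspace approximation of the alias idea, for illustration only: the
demo_* names are hypothetical, the GCC/Clang alias attribute on an ELF target
is assumed, and the kernel's __alias() and __printf() macros are spelled out
as plain attributes. Compilers may diagnose attribute mismatches between an
alias and its target, so treat this as a sketch rather than a drop-in.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format-checked entry point (hypothetical name). */
__attribute__((format(printf, 3, 0)))
int demo_vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
	return vsnprintf(buf, size, fmt, args);
}

/*
 * Second symbol for the same code, declared without the format
 * attribute. No extra function body is emitted for it.
 */
int demo_vsnprintf_nofmt(char *buf, size_t size, const char *fmt,
			 va_list args)
	__attribute__((alias("demo_vsnprintf")));

/* Variadic helper so the unannotated alias can be exercised. */
static int call_nofmt(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(buf != NULL || size == 0);
	va_start(ap, fmt);
	ret = demo_vsnprintf_nofmt(buf, size, fmt, ap);
	va_end(ap);
	return ret;
}
```

Calling call_nofmt(buf, sizeof(buf), "%d-%s", 7, "x") goes through the alias
and formats exactly as the annotated function would.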
Rasmus
* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: David Hildenbrand (Arm) @ 2026-06-03 8:05 UTC
To: Lance Yang, Nico Pache
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat, mhocko,
peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
zokeefe
In-Reply-To: <185f5699-3797-4300-8c54-bb99fc2a45e0@linux.dev>
On 6/2/26 17:44, Lance Yang wrote:
>
>
> On 2026/6/2 18:58, Nico Pache wrote:
>> On Sun, May 31, 2026 at 1:19 AM Lance Yang <lance.yang@linux.dev> wrote:
>>>
>>>
>>> [...]
>>>
>>> Hmm ... don't we lose the allocation-failure result here?
>>>
>>> Previously collapse_scan_pmd() propagated SCAN_ALLOC_HUGE_PAGE_FAIL from
>>> collapse_huge_page(), so khugepaged would call khugepaged_alloc_sleep()
>>> in khugepaged_do_scan().
>>>
>>> Now if allocation fails and nr_collapsed stays 0, we just return
>>> SCAN_FAIL. So we won't back off via khugepaged_alloc_sleep() anymore?
>>
>> Ok I did the error propagation! I think I handled both of these cases
>> you brought up pretty easily.
>
> Thanks.
>
>> However I don't know what to do in the following case: We successfully
>> collapsed some portion of the PMD, but during that process, we also
>> hit an allocation failure. Is it best to back off entirely? or can we
>> treat some forward progress as a sign we can continue trying collapses
>> without sleeping.
>>
>> Basically, do we prioritize SCAN_ALLOC_HUGE_PAGE_FAIL or the
>> successful collapses as the returned value?
>
> Thinking out loud, forward progress should win here, the allocation
> failure only matter if we made no progress at all?
Agreed; as a first approach, letting forward progress win makes sense.
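The priority being discussed could be expressed roughly as follows. This is a
simplified userspace sketch, not the khugepaged code: the DEMO_* values only
mirror the names of the kernel's scan results, and pick_result() is a
hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's scan result codes. */
enum demo_scan_result {
	DEMO_SCAN_FAIL,
	DEMO_SCAN_SUCCEED,
	DEMO_SCAN_ALLOC_HUGE_PAGE_FAIL,
};

/*
 * Forward progress wins: report the allocation failure (and thus back
 * off) only when nothing at all was collapsed.
 */
static enum demo_scan_result pick_result(int nr_collapsed, bool alloc_failed)
{
	assert(nr_collapsed >= 0);
	if (nr_collapsed > 0)
		return DEMO_SCAN_SUCCEED;
	if (alloc_failed)
		return DEMO_SCAN_ALLOC_HUGE_PAGE_FAIL;
	return DEMO_SCAN_FAIL;
}
```

Under this ordering a partial collapse that also hit an allocation failure is
still reported as success, so the caller would not sleep.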
--
Cheers,
David
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Arnd Bergmann @ 2026-06-03 8:41 UTC
To: Rasmus Villemoes
Cc: Andy Shevchenko, Arnd Bergmann, Steven Rostedt, Masami Hiramatsu,
Andrew Morton, Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Sergey Senozhatsky,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <875x40hz7k.fsf@prevas.dk>
On Wed, Jun 3, 2026, at 09:15, Rasmus Villemoes wrote:
> On Tue, Jun 02 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
>> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
>>> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>
> May I suggest a different approach, that avoids having that extra
> function emitted (which presumably compiles to a single jump
> instruction, but still, with retpoline and CFI and all that it all adds
> up): Keep the declaration of __vsnprintf() in the header without the
> __printf() attribute, but then do
>
> int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> __alias(vsnprintf);
>
> in vsprintf.c. Aside from reusing the same entry point, I could well
> imagine a compiler some day complaining about seeing the printf
> attribute applied in a local extra declaration but not having it in the
> header file.
>
> Presumably it will need its own EXPORT_SYMBOL if any of the intended
> users are modular, and it certainly still needs a comment.
I had tried that earlier but given up because the attributes have to
match exactly.
This definition works with all currently supported versions of gcc,
but may have to change when there is a new version that adds
even more attributes:
int
__printf(3, 0)
__attribute__((nothrow))
__attribute__((nonnull(1)))
__vsnprintf(char *__restrict buf, size_t size,
const char * __restrict fmt_str, va_list args)
__alias(vsnprintf);
We'd probably want to also add __nothrow and __nonnull macros
in linux/compiler-attributes.h if we do this.
For reference, see below for the alternative idea I had
that avoids adding the __vsnprintf() alias altogether by
passing down the va_format using "%pV".
I don't think I actually got this one right in the end
since I only build-tested it, but I expect it could be done
if someone is able to test and fix all the corner cases
properly.
Arnd
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 4715330c7b6b..8e44fc3e60b0 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -956,14 +956,11 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
* gcc warns that you can not use a va_list in an inlined
* function. But lets me make it into a macro :-/
*/
-#define __trace_event_vstr_len(fmt, va) \
+#define __trace_event_vstr_len(vf) \
({ \
- va_list __ap; \
int __ret; \
\
- va_copy(__ap, *(va)); \
- __ret = __vsnprintf(NULL, 0, fmt, __ap) + 1; \
- va_end(__ap); \
+ __ret = snprintf(NULL, 0, "%pV", vf) + 1; \
\
min(__ret, TRACE_EVENT_STR_MAX); \
})
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 1a05fc153353..2f3ee3632e77 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -143,20 +143,20 @@
* saved string into the "foo" field.
*
* __vstring: This is similar to __string() but instead of taking a
- * dynamic length, it takes a variable list va_list 'va' variable.
+ * dynamic length, it takes a variable list va_format 'vaf' variable.
* Some event callers already have a message from parameters saved
- * in a va_list. Passing in the format and the va_list variable
- * will save just enough on the ring buffer for that string.
- * Note, the va variable used is a pointer to a va_list, not
- * to the va_list directly.
+ * in a va_format. Passing in the va_format variable will save just
+ * enough on the ring buffer for that string.
*
- * (va_list *va)
+ * (va_format *vaf)
*
- * __vstring(foo, fmt, va) is similar to: vsnprintf(foo, fmt, va)
+ * __vstring(foo, vaf) is similar to:
+ *
+ * vsnprintf(foo, "%pV", vaf)
*
* To assign the string, use the helper macro __assign_vstr().
*
- * __assign_vstr(foo, fmt, va);
+ * __assign_vstr(foo, vaf);
*
* In most cases, the __assign_vstr() macro will take the same
* parameters as the __vstring() macro had to declare the string.
@@ -292,9 +292,9 @@ TRACE_EVENT(foo_bar,
TP_PROTO(const char *foo, int bar, const int *lst,
const char *string, const struct cpumask *mask,
- const char *fmt, va_list *va),
+ struct va_format *vaf),
- TP_ARGS(foo, bar, lst, string, mask, fmt, va),
+ TP_ARGS(foo, bar, lst, string, mask, vaf),
TP_STRUCT__entry(
__array( char, foo, 10 )
@@ -303,7 +303,7 @@ TRACE_EVENT(foo_bar,
__string( str, string )
__bitmask( cpus, num_possible_cpus() )
__cpumask( cpum )
- __vstring( vstr, fmt, va )
+ __vstring( vstr, vaf )
__string_len( lstr, foo, bar / 2 < strlen(foo) ? bar / 2 : strlen(foo) )
),
@@ -314,7 +314,7 @@ TRACE_EVENT(foo_bar,
__length_of(lst) * sizeof(int));
__assign_str(str);
__assign_str(lstr);
- __assign_vstr(vstr, fmt, va);
+ __assign_vstr(vstr, vaf);
__assign_bitmask(cpus, cpumask_bits(mask), num_possible_cpus());
__assign_cpumask(cpum, cpumask_bits(mask));
),
diff --git a/include/trace/stages/stage6_event_callback.h b/include/trace/stages/stage6_event_callback.h
index 7d6a6ca6e779..2a4611b20afa 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -28,7 +28,7 @@
#define __string_len(item, src, len) __dynamic_array(char, item, -1)
#undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
#undef __assign_str
#define __assign_str(dst) \
@@ -41,13 +41,8 @@
} while (0)
#undef __assign_vstr
-#define __assign_vstr(dst, fmt, va) \
- do { \
- va_list __cp_va; \
- va_copy(__cp_va, *(va)); \
- __vsnprintf(__get_str(dst), TRACE_EVENT_STR_MAX, fmt, __cp_va); \
- va_end(__cp_va); \
- } while (0)
+#define __assign_vstr(dst, vf) \
+ snprintf(__get_str(dst), TRACE_EVENT_STR_MAX, "%pV", vf);
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)
diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
index 05c4f1354269..c96144d516db 100644
--- a/drivers/infiniband/hw/hfi1/trace_dbg.h
+++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
@@ -26,10 +26,10 @@ DECLARE_EVENT_CLASS(hfi1_trace_template,
TP_PROTO(const char *function, struct va_format *vaf),
TP_ARGS(function, vaf),
TP_STRUCT__entry(__string(function, function)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(__assign_str(function);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("(%s) %s",
__get_str(function),
diff --git a/drivers/net/wireless/ath/ath10k/trace.h b/drivers/net/wireless/ath/ath10k/trace.h
index 68b78ca17eaa..c258ad7de79e 100644
--- a/drivers/net/wireless/ath/ath10k/trace.h
+++ b/drivers/net/wireless/ath/ath10k/trace.h
@@ -52,12 +52,12 @@ DECLARE_EVENT_CLASS(ath10k_log_event,
TP_STRUCT__entry(
__string(device, dev_name(ar->dev))
__string(driver, dev_driver_string(ar->dev))
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(device);
__assign_str(driver);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk(
"%s %s %s",
@@ -89,13 +89,13 @@ TRACE_EVENT(ath10k_log_dbg,
__string(device, dev_name(ar->dev))
__string(driver, dev_driver_string(ar->dev))
__field(unsigned int, level)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(device);
__assign_str(driver);
__entry->level = level;
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk(
"%s %s %s",
diff --git a/drivers/net/wireless/ath/ath11k/trace.h b/drivers/net/wireless/ath/ath11k/trace.h
index 75246b0a82e3..0ac14b72deac 100644
--- a/drivers/net/wireless/ath/ath11k/trace.h
+++ b/drivers/net/wireless/ath/ath11k/trace.h
@@ -127,12 +127,12 @@ DECLARE_EVENT_CLASS(ath11k_log_event,
TP_STRUCT__entry(
__string(device, dev_name(ab->dev))
__string(driver, dev_driver_string(ab->dev))
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(device);
__assign_str(driver);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk(
"%s %s %s",
diff --git a/drivers/net/wireless/ath/ath6kl/trace.h b/drivers/net/wireless/ath/ath6kl/trace.h
index 8577aa459c58..d46fe6b675f9 100644
--- a/drivers/net/wireless/ath/ath6kl/trace.h
+++ b/drivers/net/wireless/ath/ath6kl/trace.h
@@ -253,10 +253,10 @@ DECLARE_EVENT_CLASS(ath6kl_log_event,
TP_PROTO(struct va_format *vaf),
TP_ARGS(vaf),
TP_STRUCT__entry(
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
);
@@ -281,11 +281,11 @@ TRACE_EVENT(ath6kl_log_dbg,
TP_ARGS(level, vaf),
TP_STRUCT__entry(
__field(unsigned int, level)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__entry->level = level;
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
);
diff --git a/drivers/net/wireless/ath/trace.h b/drivers/net/wireless/ath/trace.h
index 82aac0a4baff..298a56349ea7 100644
--- a/drivers/net/wireless/ath/trace.h
+++ b/drivers/net/wireless/ath/trace.h
@@ -40,13 +40,13 @@ TRACE_EVENT(ath_log,
TP_STRUCT__entry(
__string(device, wiphy_name(wiphy))
__string(driver, KBUILD_MODNAME)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(device);
__assign_str(driver);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk(
diff --git a/drivers/net/wireless/ath/wil6210/trace.h b/drivers/net/wireless/ath/wil6210/trace.h
index 201f44612c31..7eb6ca2b0cb6 100644
--- a/drivers/net/wireless/ath/wil6210/trace.h
+++ b/drivers/net/wireless/ath/wil6210/trace.h
@@ -70,10 +70,10 @@ DECLARE_EVENT_CLASS(wil6210_log_event,
TP_PROTO(struct va_format *vaf),
TP_ARGS(vaf),
TP_STRUCT__entry(
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h
index 6c4e00e9ccd1..66b179adb80c 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h
@@ -33,11 +33,11 @@ TRACE_EVENT(brcmf_err,
TP_ARGS(func, vaf),
TP_STRUCT__entry(
__string(func, func)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(func);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s: %s", __get_str(func), __get_str(msg))
);
@@ -48,12 +48,12 @@ TRACE_EVENT(brcmf_dbg,
TP_STRUCT__entry(
__field(u32, level)
__string(func, func)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__entry->level = level;
__assign_str(func);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s: %s", __get_str(func), __get_str(msg))
);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h
index dc296d8bf775..369171af1a30 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h
@@ -28,10 +28,10 @@ DECLARE_EVENT_CLASS(brcms_msg_event,
TP_PROTO(struct va_format *vaf),
TP_ARGS(vaf),
TP_STRUCT__entry(
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
);
@@ -62,12 +62,12 @@ TRACE_EVENT(brcms_dbg,
TP_STRUCT__entry(
__field(u32, level)
__string(func, func)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__entry->level = level;
__assign_str(func);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s: %s", __get_str(func), __get_str(msg))
);
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h
index 0db1fa5477af..80cfb9fc8ad8 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h
@@ -18,10 +18,10 @@ DECLARE_EVENT_CLASS(iwlwifi_msg_event,
TP_PROTO(struct va_format *vaf),
TP_ARGS(vaf),
TP_STRUCT__entry(
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
);
@@ -53,12 +53,12 @@ TRACE_EVENT(iwlwifi_dbg,
TP_STRUCT__entry(
__field(u32, level)
__string(function, function)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__entry->level = level;
__assign_str(function);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
);
diff --git a/drivers/usb/chipidea/trace.h b/drivers/usb/chipidea/trace.h
index 1875419cd17f..9ec0df074872 100644
--- a/drivers/usb/chipidea/trace.h
+++ b/drivers/usb/chipidea/trace.h
@@ -28,11 +28,11 @@ TRACE_EVENT(ci_log,
TP_ARGS(ci, vaf),
TP_STRUCT__entry(
__string(name, dev_name(ci->dev))
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(name);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s: %s", __get_str(name), __get_str(msg))
);
diff --git a/drivers/usb/host/xhci-trace.h b/drivers/usb/host/xhci-trace.h
index 724cba2dbb78..575c02109b4b 100644
--- a/drivers/usb/host/xhci-trace.h
+++ b/drivers/usb/host/xhci-trace.h
@@ -28,9 +28,9 @@
DECLARE_EVENT_CLASS(xhci_log_msg,
TP_PROTO(struct va_format *vaf),
TP_ARGS(vaf),
- TP_STRUCT__entry(__vstring(msg, vaf->fmt, vaf->va)),
+ TP_STRUCT__entry(__vstring(msg, vaf)),
TP_fast_assign(
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
);
diff --git a/drivers/usb/mtu3/mtu3_trace.h b/drivers/usb/mtu3/mtu3_trace.h
index 89870175d635..56c9263a99d8 100644
--- a/drivers/usb/mtu3/mtu3_trace.h
+++ b/drivers/usb/mtu3/mtu3_trace.h
@@ -23,11 +23,11 @@ TRACE_EVENT(mtu3_log,
TP_ARGS(dev, vaf),
TP_STRUCT__entry(
__string(name, dev_name(dev))
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(name);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s: %s", __get_str(name), __get_str(msg))
);
diff --git a/drivers/usb/musb/musb_trace.h b/drivers/usb/musb/musb_trace.h
index 726e6697d475..7dba44b0496d 100644
--- a/drivers/usb/musb/musb_trace.h
+++ b/drivers/usb/musb/musb_trace.h
@@ -28,11 +28,11 @@ TRACE_EVENT(musb_log,
TP_ARGS(musb, vaf),
TP_STRUCT__entry(
__string(name, dev_name(musb->controller))
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(name);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s: %s", __get_str(name), __get_str(msg))
);
diff --git a/include/trace/events/iscsi.h b/include/trace/events/iscsi.h
index 990fd154f586..2e2667658b51 100644
--- a/include/trace/events/iscsi.h
+++ b/include/trace/events/iscsi.h
@@ -26,12 +26,12 @@ DECLARE_EVENT_CLASS(iscsi_log_msg,
TP_STRUCT__entry(
__string(dname, dev_name(dev) )
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(dname);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s: %s",__get_str(dname), __get_str(msg)
diff --git a/include/trace/events/qla.h b/include/trace/events/qla.h
index 74a7534b99b6..554ae9a623c6 100644
--- a/include/trace/events/qla.h
+++ b/include/trace/events/qla.h
@@ -17,11 +17,11 @@ DECLARE_EVENT_CLASS(qla_log_event,
TP_STRUCT__entry(
__string(buf, buf)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(buf);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s %s", __get_str(buf), __get_str(msg))
diff --git a/include/trace/stages/stage1_struct_define.h b/include/trace/stages/stage1_struct_define.h
index 69e0dae453bf..0ae49a935d16 100644
--- a/include/trace/stages/stage1_struct_define.h
+++ b/include/trace/stages/stage1_struct_define.h
@@ -27,7 +27,7 @@
#define __string_len(item, src, len) __dynamic_array(char, item, -1)
#undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(char, item, -1)
diff --git a/include/trace/stages/stage2_data_offsets.h b/include/trace/stages/stage2_data_offsets.h
index 8b0cff06d346..5c6dc3092e07 100644
--- a/include/trace/stages/stage2_data_offsets.h
+++ b/include/trace/stages/stage2_data_offsets.h
@@ -33,7 +33,7 @@
#define __string_len(item, src, len) __dynamic_array(char, item, -1)
#undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)
diff --git a/include/trace/stages/stage4_event_fields.h b/include/trace/stages/stage4_event_fields.h
index b6f679ae21aa..77f74d509760 100644
--- a/include/trace/stages/stage4_event_fields.h
+++ b/include/trace/stages/stage4_event_fields.h
@@ -42,7 +42,7 @@
#define __string_len(item, src, len) __dynamic_array(char, item, -1)
#undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
#undef __bitmask
#define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)
diff --git a/include/trace/stages/stage5_get_offsets.h b/include/trace/stages/stage5_get_offsets.h
index c6a62dfb18ef..1ce5ca15a8ed 100644
--- a/include/trace/stages/stage5_get_offsets.h
+++ b/include/trace/stages/stage5_get_offsets.h
@@ -65,8 +65,8 @@ static inline const char *__string_src(const char *str)
__data_offsets->item##_ptr_ = src;
#undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, \
- __trace_event_vstr_len(fmt, ap))
+#define __vstring(item, vf) __dynamic_array(char, item, \
+ __trace_event_vstr_len(vf))
#undef __rel_dynamic_array
#define __rel_dynamic_array(type, item, len) \
diff --git a/net/batman-adv/trace.h b/net/batman-adv/trace.h
index 7da692ec38e9..ac88789330a3 100644
--- a/net/batman-adv/trace.h
+++ b/net/batman-adv/trace.h
@@ -36,13 +36,13 @@ TRACE_EVENT(batadv_dbg,
TP_STRUCT__entry(
__string(device, bat_priv->mesh_iface->name)
__string(driver, KBUILD_MODNAME)
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
__assign_str(device);
__assign_str(driver);
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk(
diff --git a/net/mac80211/trace_msg.h b/net/mac80211/trace_msg.h
index aea4ce55c5ac..0de50dfa13ed 100644
--- a/net/mac80211/trace_msg.h
+++ b/net/mac80211/trace_msg.h
@@ -22,11 +22,11 @@ DECLARE_EVENT_CLASS(mac80211_msg_event,
TP_ARGS(vaf),
TP_STRUCT__entry(
- __vstring(msg, vaf->fmt, vaf->va)
+ __vstring(msg, vaf)
),
TP_fast_assign(
- __assign_vstr(msg, vaf->fmt, vaf->va);
+ __assign_vstr(msg, vaf);
),
TP_printk("%s", __get_str(msg))
diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index ecc7db237f2e..07096eadfb7b 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -23,6 +23,7 @@ static void do_simple_thread_func(int cnt, const char *fmt, ...)
{
unsigned long bitmask[1] = {0xdeadbeefUL};
va_list va;
+ struct va_format vf = { .fmt = fmt };
int array[6];
int len = cnt % 5;
int i;
@@ -35,10 +36,11 @@ static void do_simple_thread_func(int cnt, const char *fmt, ...)
array[i] = 0;
va_start(va, fmt);
+ vf.va = &va;
/* Silly tracepoints */
trace_foo_bar("hello", cnt, array, random_strings[len],
- current->cpus_ptr, fmt, &va);
+ current->cpus_ptr, &vf);
va_end(va);
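
For readers following the API change, here is a minimal user-space sketch of the idea behind it. It is not the kernel implementation. One structure carries both the format string and a pointer to the va_list, the required length is computed on a copy of the list, and the string is then formatted on a second copy. The names struct va_format_model, vstr_len(), assign_vstr() and log_msg() are hypothetical and exist only for this illustration; the kernel uses struct va_format with the "%pV" specifier, which plain libc does not support.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's struct va_format (assumption). */
struct va_format_model {
	const char *fmt;
	va_list *va;
};

/* Bytes needed for the formatted message, including the trailing NUL. */
static int vstr_len(const struct va_format_model *vf)
{
	va_list ap;
	int len;

	va_copy(ap, *vf->va);	/* never consume the caller's va_list */
	len = vsnprintf(NULL, 0, vf->fmt, ap);
	va_end(ap);
	return len + 1;
}

/* Format into dst; works after vstr_len() because each call uses a copy. */
static void assign_vstr(char *dst, size_t size,
			const struct va_format_model *vf)
{
	va_list ap;

	va_copy(ap, *vf->va);
	vsnprintf(dst, size, vf->fmt, ap);
	va_end(ap);
}

/* Mirrors the sample caller: wrap fmt and &va, then pass a single pointer. */
static char *log_msg(const char *fmt, ...)
{
	struct va_format_model vf = { .fmt = fmt };
	va_list va;
	char *buf;
	int len;

	va_start(va, fmt);
	vf.va = &va;
	len = vstr_len(&vf);
	buf = malloc(len);
	if (buf)
		assign_vstr(buf, len, &vf);
	va_end(va);
	return buf;	/* caller frees */
}
```

The point of the sketch is that sizing and assignment each need their own copy of the argument list, which is why passing one pointer to the wrapping structure is enough for both steps.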
^ permalink raw reply related
* Re: [syzbot] [trace?] KASAN: use-after-free Write in ring_buffer_read_page
From: Alexander Potapenko @ 2026-06-03 8:49 UTC (permalink / raw)
To: Aleksandr Nogikh
Cc: Masami Hiramatsu, Steven Rostedt, syzbot, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, syzkaller-bugs
In-Reply-To: <CANp29Y55QBfKT=FLpn=trH5Tmxj2P_7H7yhJG_xXCbCdR3Lv_A@mail.gmail.com>
On Wed, Jun 3, 2026 at 8:38 AM Aleksandr Nogikh <nogikh@google.com> wrote:
>
> On Wed, Jun 3, 2026 at 3:34 AM 'Masami Hiramatsu' via syzkaller-bugs
> <syzkaller-bugs@googlegroups.com> wrote:
> >
> > On Tue, 2 Jun 2026 12:28:29 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > On Tue, 02 Jun 2026 06:45:31 -0700
> > > syzbot <syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com> wrote:
> > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: e7ae89a0c97c Linux 7.1-rc5
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16f06e2e580000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=58acee1ac5406016
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
> > > > compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Looks like the test was doing something really weird to trigger this.
> > > Without a reproducer, it's pretty much impossible to find out what
> > > happened. Maybe AI could do it?
> > >
> >
> > Does "I don't have any reproducer for this issue yet." mean that
> > this is not reproducible even if it runs exactly the same sequence
> > as in the console output? If so, might this be a timing-related issue?
> > (e.g. read vs. write-event)
>
> Yes, syzbot normally re-plays the sequence of last programs executed
> on the crashed VM to find a reproducer, and, in many cases, they no
> longer crash the kernel.
>
> In the meantime, syzbot's AI bug reproduction functionality has found
> a C reproducer for a KASAN crash in the kernel/trace's ring buffer,
> although with a slightly different stack trace:
> https://syzkaller.appspot.com/ai_job?id=b2620161-1632-4d4e-9314-114a8a5e79ef
>
> Cc Alexander Potapenko
Yes, the bug that the AI reproduced manifests with a different stack:
BUG: KASAN: slab-use-after-free in instrument_copy_to_user include/linux/instrumented.h:129 [inline]
BUG: KASAN: slab-use-after-free in _inline_copy_to_user include/linux/uaccess.h:205 [inline]
BUG: KASAN: slab-use-after-free in _copy_to_user+0x79/0xb0 lib/usercopy.c:26
Read of size 12288 at addr ffff888180423000 by task syz-executor144/5941
CPU: 1 UID: 0 PID: 5941 Comm: syz-executor144 Not tainted syzkaller #1 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description+0x55/0x1e0 mm/kasan/report.c:378
print_report+0x58/0x70 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x264/0x2c0 mm/kasan/generic.c:200
instrument_copy_to_user include/linux/instrumented.h:129 [inline]
_inline_copy_to_user include/linux/uaccess.h:205 [inline]
_copy_to_user+0x79/0xb0 lib/usercopy.c:26
copy_to_user include/linux/uaccess.h:236 [inline]
tracing_buffers_read+0x4cd/0xd60 kernel/trace/trace.c:7158
vfs_read+0x20c/0xa70 fs/read_write.c:572
ksys_read+0x150/0x270 fs/read_write.c:717
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f9facd00cde
Code: 08 0f 85 f5 e2 ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 90 41 57 41 56 4d 89 c6 41 55 4d 89 cd 41 54 55 53 48 83 ec 08
RSP: 002b:00007f9fabc9e198 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f9fabca26c0 RCX: 00007f9facd00cde
RDX: 0000000000004000 RSI: 00007f9fabc9e200 RDI: 0000000000000006
RBP: 00007f9fabc9e200 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9facd7ab20
R13: 0000000000000000 R14: 00007ffd6ed73110 R15: 00007ffd6ed731f8
</TASK>
Quoting the AI itself:
"""
The reproducer successfully triggered a KASAN use-after-free crash in
the tracing subsystem. Although the exact crash signature differs
slightly (a read in `_copy_to_user` called from `tracing_buffers_read`
vs a write in `ring_buffer_read_page` called from
`tracing_buffers_read`), both crashes are use-after-free bugs on ring
buffer pages accessed during `tracing_buffers_read`. The reproduced
crash shows the page being freed by `ring_buffer_subbuf_order_set`
(via `buffer_subbuf_size_write`) while being concurrently accessed by
`tracing_buffers_read`. This confirms the underlying race condition
between reading the trace buffers and modifying the buffer size/order
has been successfully reproduced.
"""
I took a glance at the reports, and the above makes sense: we just
happen to access filp->private_data->spare at different times after it
has been freed.
PS. Please bear with repro-c; it is still taking its first steps.
The reproducer contains some dead code, and the results are hard to navigate.
At some point we'll probably be able to link AI-generated repros from
the original bugs.
^ permalink raw reply
* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Suzuki K Poulose @ 2026-06-03 8:58 UTC (permalink / raw)
To: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, aneesh.kumar, liam, Paolo Bonzini, Sean Christopherson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <CAEvNRgE1dCVAxJWd_hyFa8N=m9JLfn97ip9tAmvHxspWJ50oGg@mail.gmail.com>
On 02/06/2026 23:41, Ackerley Tng wrote:
> Suzuki K Poulose <suzuki.poulose@arm.com> writes:
>
>>
>> [...snip...]
>>
>>>> @@ -914,7 +916,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct
>>>> kvm_memory_slot *slot,
>>>> folio_mark_uptodate(folio);
>>>> }
>>>> - r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>>>> + if (kvm_gmem_is_private_mem(inode, index))
>>>
>>> Don't we need to make sure the entire folio is private ? Not just the
>>> page at the index ?
>>> if (kvm_gmem_range_is_private(, index, folio_nr_pages(folio)) ?
>
> I was thinking to fix this when I do huge pages, for now guest_memfd is
> always just PAGE_SIZE, so just looking up index is fine.
>
> Is that okay?
That's fine, but it would be good to enforce that here, so that we don't
miss it when we add support for multi-page folios.
>
>>
>> Or rather, we should go through the individual pages and apply the
>> prepare for ones that are private ?
>>
>> Suzuki
>>
>
> IIRC the plan was to make kvm_gmem_prepare_folio() idempotent, as in, if
> a page is already private, just skip. Currently sev_gmem_prepare() does
> a pr_debug(), which I guess is technically still idempotent.
>
> I'm thinking that the information that needs tracking to make
> .gmem_prepare() idempotent should be tracked by arch code.
>
> Does this work for ARM CCA?
We don't hook into the prepare callback yet, but we plan to. We should
be able to handle the pages that are already private. (For CCA context,
RMI_GRANULE_DELEGATE_RANGE can skip over already REALM pages). So this
should be fine.
My point is, in a given folio, there may be pages that are shared.
Like you said, this could be dealt with when we support hugepages.
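
To make the two options discussed above concrete, here is a small user-space model, offered only as a sketch. It is not guest_memfd code: model_private[], model_range_is_private() and model_count_private() are hypothetical names, and the per-page attribute is reduced to a plain boolean array. The first helper corresponds to requiring that the whole folio is private; the second corresponds to walking the pages and preparing only the private ones.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 64

/* Hypothetical per-page attribute store: true means the page is private. */
static bool model_private[MODEL_NR_PAGES];

/* True only if every page in [index, index + nr) is private. */
static bool model_range_is_private(size_t index, size_t nr)
{
	if (nr == 0 || index >= MODEL_NR_PAGES || nr > MODEL_NR_PAGES - index)
		return false;
	for (size_t i = 0; i < nr; i++) {
		if (!model_private[index + i])
			return false;
	}
	return true;
}

/* Per-page alternative: count the pages in the range that need preparing. */
static size_t model_count_private(size_t index, size_t nr)
{
	size_t count = 0;

	for (size_t i = 0; i < nr && index + i < MODEL_NR_PAGES; i++)
		count += model_private[index + i];
	return count;
}
```

With single-page folios both helpers agree, which is why checking only the index is sufficient today; they diverge as soon as a folio can mix private and shared pages.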
Suzuki
>
>>>
>>> [...snip...]
>>>
^ permalink raw reply
* Re: [PATCH v2 07/13] rv: Simply hybrid automata monitors's clock variables
From: Gabriele Monaco @ 2026-06-03 9:27 UTC (permalink / raw)
To: Nam Cao
Cc: Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <ed1719ebe4af8872673af4264fdbf9ad96425b7f.1779956342.git.namcao@linutronix.de>
s/Simply/Simplify/ from the patch title.
On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> /*
> @@ -389,14 +357,14 @@ static inline void ha_setup_timer(struct
> ha_monitor *ha_mon)
> static inline void ha_start_timer_jiffy(struct ha_monitor *ha_mon,
> enum envs env,
> u64 expire, u64 time_ns)
> {
> - u64 passed = ha_invariant_passed_jiffy(ha_mon, env, expire,
> time_ns);
> + u64 passed = ha_invariant_passed_jiffy(ha_mon, env,
> time_ns);
>
> mod_timer(&ha_mon->timer, get_jiffies_64() + expire -
> passed);
> }
> static inline void ha_start_timer_ns(struct ha_monitor *ha_mon, enum
> envs env,
> u64 expire, u64 time_ns)
> {
> - u64 passed = ha_invariant_passed_ns(ha_mon, env, expire,
> time_ns);
> + u64 passed = ha_invariant_passed_ns(ha_mon, env, time_ns);
>
> ha_start_timer_jiffy(ha_mon, ENV_MAX_STORED,
> nsecs_to_jiffies(expire - passed +
> TICK_NSEC - 1), time_ns);
> @@ -438,7 +406,7 @@ static inline void ha_start_timer_ns(struct
> ha_monitor *ha_mon, enum envs env,
> u64 expire, u64 time_ns)
> {
> int mode = HRTIMER_MODE_REL_HARD;
> - u64 passed = ha_invariant_passed_ns(ha_mon, env, expire,
> time_ns);
> + u64 passed = ha_invariant_passed_ns(ha_mon, env, time_ns);
>
You also need to remove expire from the ha_invariant_passed_jiffy call in
the hrtimer flavour (just set HA_TIMER_HRTIMER in stall and you'll see it
won't compile). Jiffy-granularity monitors with hrtimers are an
unlikely use case but still supported.
Other than that it looks good.
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Thanks,
Gabriele
> if (RV_MON_TYPE == RV_MON_PER_CPU)
> mode |= HRTIMER_MODE_PINNED;
> diff --git a/kernel/trace/rv/monitors/nomiss/nomiss.c
> b/kernel/trace/rv/monitors/nomiss/nomiss.c
> index a0b5641a1858..19d0e9aa4d58 100644
> --- a/kernel/trace/rv/monitors/nomiss/nomiss.c
> +++ b/kernel/trace/rv/monitors/nomiss/nomiss.c
> @@ -57,24 +57,12 @@ static inline bool ha_verify_invariants(struct
> ha_monitor *ha_mon,
> enum states next_state, u64
> time_ns)
> {
> if (curr_state == ready_nomiss)
> - return ha_check_invariant_ns(ha_mon, clk_nomiss,
> time_ns);
> + return ha_check_invariant_ns(ha_mon, clk_nomiss,
> time_ns, DEADLINE_NS(ha_mon));
> else if (curr_state == running_nomiss)
> - return ha_check_invariant_ns(ha_mon, clk_nomiss,
> time_ns);
> + return ha_check_invariant_ns(ha_mon, clk_nomiss,
> time_ns, DEADLINE_NS(ha_mon));
> return true;
> }
>
> -static inline void ha_convert_inv_guard(struct ha_monitor *ha_mon,
> - enum states curr_state, enum
> events event,
> - enum states next_state, u64
> time_ns)
> -{
> - if (curr_state == next_state)
> - return;
> - if (curr_state == ready_nomiss)
> - ha_inv_to_guard(ha_mon, clk_nomiss,
> DEADLINE_NS(ha_mon), time_ns);
> - else if (curr_state == running_nomiss)
> - ha_inv_to_guard(ha_mon, clk_nomiss,
> DEADLINE_NS(ha_mon), time_ns);
> -}
> -
> static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
> enum states curr_state, enum
> events event,
> enum states next_state, u64
> time_ns)
> @@ -122,8 +110,6 @@ static bool ha_verify_constraint(struct
> ha_monitor *ha_mon,
> if (!ha_verify_invariants(ha_mon, curr_state, event,
> next_state, time_ns))
> return false;
>
> - ha_convert_inv_guard(ha_mon, curr_state, event, next_state,
> time_ns);
> -
> if (!ha_verify_guards(ha_mon, curr_state, event, next_state,
> time_ns))
> return false;
>
> diff --git a/kernel/trace/rv/monitors/stall/stall.c
> b/kernel/trace/rv/monitors/stall/stall.c
> index 9ccfda6b0e73..1aa65d7e690d 100644
> --- a/kernel/trace/rv/monitors/stall/stall.c
> +++ b/kernel/trace/rv/monitors/stall/stall.c
> @@ -38,7 +38,7 @@ static inline bool ha_verify_invariants(struct
> ha_monitor *ha_mon,
> enum states next_state, u64
> time_ns)
> {
> if (curr_state == enqueued_stall)
> - return ha_check_invariant_jiffy(ha_mon, clk_stall,
> time_ns);
> + return ha_check_invariant_jiffy(ha_mon, clk_stall,
> time_ns, threshold_jiffies);
> return true;
> }
>
^ permalink raw reply
* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: David Hildenbrand (Arm) @ 2026-06-03 9:55 UTC (permalink / raw)
To: Nico Pache
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
Usama Arif, usamaarif642
In-Reply-To: <19639b08-5bf1-4974-9635-c458d512fa38@redhat.com>
On 6/2/26 19:23, Nico Pache wrote:
>
>
> On 6/1/26 7:15 AM, David Hildenbrand (Arm) wrote:
>>>
>>> So I looked into your items below. It seems logical, and I think it
>>> works the same way; however, your method seems slightly harder to
>>> understand due to all the edge cases and more error-prone to future
>>> changes (the stack holds implicit knowledge of the offset/order that
>>> must now be tracked in the edge cases).
>>>
>>> Given the stack is 24 bytes, I'm not sure if the extra complexity is
>>> worth saving that small amount of memory. Although we would also be
>>> getting rid of (3?) functions, so both approaches have pros and cons.
>>
>> I consider a simple forward loop over the offset ... less complexity compared to
>> a stack structure :)
>>
>>>
>>> I will implement a patch comparing your solution against mine and send
>>> it here, then we can decide which approach is better.
>>
>> Right, throw it over the fence and I'll see how to improve it further.
>
> OK, here's what the diff looks like on top of my V19.
>
> you can access the tree here https://gitlab.com/npache/linux/-/commits/mthp-v19?ref_type=heads for easier review.
>
> So far I have no problem with this approach; it turned out cleaner than I thought. I did some light testing and will put it through the wringer more tomorrow.
It's very clean.
Almost too nice to be true ;)
[...]
> unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
> enum scan_result last_result = SCAN_FAIL;
> - int collapsed = 0, stack_size = 0;
> + int collapsed = 0;
> bool alloc_failed = false;
> unsigned long collapse_address;
> - struct mthp_range range;
> - u16 offset;
> - u8 order;
> + unsigned int offset = 0;
> + unsigned int order = HPAGE_PMD_ORDER;
In include/linux/huge_mm.h we have
highest_order()
and
next_order()
They essentially allow you to get rid of the test_bit() and just jump to the
next enabled order right away.
I assume with only a handful of enabled_orders, that might be much more efficient.
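
As a rough illustration of that iteration pattern, here is a user-space sketch. It assumes GCC or Clang for __builtin_clzl(), and walk_orders() is a hypothetical helper that exists only for this example. highest_order() and next_order() are modelled on the huge_mm.h helpers, with the kernel's fls_long() and BIT() replaced by plain C.

```c
/* Number of bits in an unsigned long on the build target. */
#define MODEL_BITS_PER_LONG ((int)(8 * sizeof(unsigned long)))

/* Index of the highest set bit, or -1 if no order is enabled. */
static int highest_order(unsigned long orders)
{
	if (!orders)
		return -1;
	return MODEL_BITS_PER_LONG - 1 - __builtin_clzl(orders);
}

/* Drop 'prev' from the mask and return the next lower enabled order. */
static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return highest_order(*orders);
}

/*
 * Visit only the enabled orders, highest first, and record them in 'out'.
 * Returns how many orders were visited (at most 'max').
 */
static int walk_orders(unsigned long enabled_orders, int *out, int max)
{
	unsigned long orders = enabled_orders;
	int n = 0;

	for (int order = highest_order(orders); orders && n < max;
	     order = next_order(&orders, order))
		out[n++] = order;
	return n;
}
```

Compared with decrementing the order one step at a time and calling test_bit() on each, the loop body here runs once per enabled order, so a mask with three bits set costs three iterations regardless of how far apart the orders are.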
I tried to optimize it and ended up with the following, which is completely untested.
I think it might make sense to defer that and start with the simple approach you have.
I do wonder, though, about the last hunk below: should we bail out early if
enabled_orders is suddenly 0?
From 0d8ff955b3071f354b7fc9b627820fa374fa99dc Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@kernel.org>
Date: Wed, 3 Jun 2026 11:52:44 +0200
Subject: [PATCH] tmp
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
include/linux/huge_mm.h | 5 ++
mm/khugepaged.c | 132 ++++++++++++++++++++++------------------
2 files changed, 78 insertions(+), 59 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 48496f09909b..099318bc1181 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -205,6 +205,11 @@ static inline int highest_order(unsigned long orders)
return fls_long(orders) - 1;
}
+static inline int smallest_order(unsigned long orders)
+{
+ return __ffs(orders);
+}
+
static inline int next_order(unsigned long *orders, int prev)
{
*orders &= ~BIT(prev);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6de935e76ceb..49be9d1a88cb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -99,8 +99,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
static struct kmem_cache *mm_slot_cache __ro_after_init;
-#define KHUGEPAGED_MIN_MTHP_ORDER 2
-
struct collapse_control {
bool is_khugepaged;
@@ -1454,76 +1452,86 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
*/
static enum scan_result mthp_collapse(struct mm_struct *mm,
unsigned long address, int referenced, int unmapped,
- struct collapse_control *cc, unsigned long enabled_orders)
+ struct collapse_control *cc, const unsigned long enabled_orders)
{
- unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
enum scan_result last_result = SCAN_FAIL;
int collapsed = 0;
bool alloc_failed = false;
unsigned long collapse_address;
unsigned int offset = 0;
- unsigned int order = HPAGE_PMD_ORDER;
+ /* We cannot collapse anon folios to order-1 or order-0. */
+ VM_WARN_ON_ONCE(!enabled_orders || (enabled_orders & 0x3));
while (offset < HPAGE_PMD_NR) {
- nr_ptes = 1UL << order;
-
- if (!test_bit(order, &enabled_orders))
- goto next_order;
-
- max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
- nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
- offset + nr_ptes);
-
- if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
- enum scan_result ret;
-
- collapse_address = address + offset * PAGE_SIZE;
- ret = collapse_huge_page(mm, collapse_address, referenced,
- unmapped, cc, order);
-
- switch (ret) {
- /* Cases where we continue to next collapse candidate */
- case SCAN_SUCCEED:
- collapsed += nr_ptes;
- fallthrough;
- case SCAN_PTE_MAPPED_HUGEPAGE:
- goto next_offset;
- /* Cases where lower orders might still succeed */
- case SCAN_ALLOC_HUGE_PAGE_FAIL:
- alloc_failed = true;
- fallthrough;
- case SCAN_LACK_REFERENCED_PAGE:
- case SCAN_EXCEED_NONE_PTE:
- case SCAN_EXCEED_SWAP_PTE:
- case SCAN_EXCEED_SHARED_PTE:
- case SCAN_PAGE_LOCK:
- case SCAN_PAGE_COUNT:
- case SCAN_PAGE_NULL:
- case SCAN_DEL_PAGE_LRU:
- case SCAN_PTE_NON_PRESENT:
- case SCAN_PTE_UFFD_WP:
- case SCAN_PAGE_LAZYFREE:
- last_result = ret;
- goto next_order;
- /* Cases where no further collapse is possible */
- case SCAN_PMD_MAPPED:
- fallthrough;
- default:
- last_result = ret;
- goto done;
+ /*
+ * We can only collapse to a maximum order for a given offset.
+ * So ignore all orders that do not apply to the current
+ * offset, then see if any order to collapse to remains.
+ */
+ unsigned long orders = enabled_orders & GENMASK(__ffs(offset), 0);
+ unsigned int order = highest_order(orders);
+
+ while (order) {
+ const unsigned int nr_ptes = 1UL << order;
+ unsigned int nr_occupied_ptes, max_ptes_none;
+
+ max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
+ nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
+ offset + nr_ptes);
+
+ if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
+ enum scan_result ret;
+
+ collapse_address = address + offset * PAGE_SIZE;
+ ret = collapse_huge_page(mm, collapse_address, referenced,
+ unmapped, cc, order);
+
+ switch (ret) {
+ /* Cases where we continue to next collapse candidate */
+ case SCAN_SUCCEED:
+ collapsed += nr_ptes;
+ fallthrough;
+ case SCAN_PTE_MAPPED_HUGEPAGE:
+ goto next_offset;
+ /* Cases where lower orders might still succeed */
+ case SCAN_ALLOC_HUGE_PAGE_FAIL:
+ alloc_failed = true;
+ fallthrough;
+ case SCAN_LACK_REFERENCED_PAGE:
+ case SCAN_EXCEED_NONE_PTE:
+ case SCAN_EXCEED_SWAP_PTE:
+ case SCAN_EXCEED_SHARED_PTE:
+ case SCAN_PAGE_LOCK:
+ case SCAN_PAGE_COUNT:
+ case SCAN_PAGE_NULL:
+ case SCAN_DEL_PAGE_LRU:
+ case SCAN_PTE_NON_PRESENT:
+ case SCAN_PTE_UFFD_WP:
+ case SCAN_PAGE_LAZYFREE:
+ last_result = ret;
+ break;
+ /* Cases where no further collapse is possible */
+ case SCAN_PMD_MAPPED:
+ fallthrough;
+ default:
+ last_result = ret;
+ goto done;
+ }
}
- }
-next_order:
- if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
- (BIT(order) - 1) & enabled_orders) {
- order = order - 1;
- continue;
+ order = next_order(&orders, order);
}
+
next_offset:
- offset += nr_ptes;
- order = min_t(int, __ffs(offset), HPAGE_PMD_ORDER);
+ /*
+ * Continue with the next collapse candidate. If we do not
+ * have an order, skip to next smallest mTHP we can collapse to.
+ */
+ if (order)
+ offset += 1UL << order;
+ else
+ offset = ALIGN(offset + 1, smallest_order(enabled_orders));
}
done:
if (collapsed)
@@ -1567,6 +1575,12 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
+ if (unlikely(!enabled_orders)) {
+ cc->progress++;
+ result = SCAN_SUCCEED;
+ goto out;
+ }
+
/*
* If PMD is the only enabled order, enforce max_ptes_none, otherwise
* scan all pages to populate the bitmap for mTHP collapse.
--
2.43.0
--
Cheers,
David
* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: David Hildenbrand (Arm) @ 2026-06-03 10:00 UTC (permalink / raw)
To: Nico Pache
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
Usama Arif, usamaarif642
In-Reply-To: <19639b08-5bf1-4974-9635-c458d512fa38@redhat.com>
> next_order:
> - if ((BIT(order) - 1) & enabled_orders) {
> - const u8 next_order = order - 1;
> - const u16 mid_offset = offset + (nr_ptes / 2);
> -
> - collapse_mthp_stack_push(cc, &stack_size, mid_offset,
> - next_order);
> - collapse_mthp_stack_push(cc, &stack_size, offset,
> - next_order);
> + if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
> + (BIT(order) - 1) & enabled_orders) {
Why not a test_bit() ?
But, wouldn't you want to skip orders that are not enabled and try with the next
smaller one in any case before you advance the offset?
--
Cheers,
David
* [PATCHv7 bpf-next 00/29] bpf: tracing_multi link
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Hengqi Chen, bpf, linux-trace-kernel, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Menglong Dong,
Steven Rostedt
hi,
adding tracing_multi link support that allows fast attachment
of a tracing program to many functions.
RFC: https://lore.kernel.org/bpf/20260203093819.2105105-1-jolsa@kernel.org/
v1: https://lore.kernel.org/bpf/20260220100649.628307-1-jolsa@kernel.org/
v2: https://lore.kernel.org/bpf/20260304222141.497203-1-jolsa@kernel.org/
v3: https://lore.kernel.org/bpf/20260316075138.465430-1-jolsa@kernel.org/
v4: https://lore.kernel.org/bpf/20260324081846.2334094-1-jolsa@kernel.org/
v5: https://lore.kernel.org/bpf/20260417192502.194548-1-jolsa@kernel.org/
v6: https://lore.kernel.org/bpf/20260527113951.46265-1-jolsa@kernel.org/
v7 changes:
- added ftrace_hash_count stub for !CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS case [sashiko]
- selftests fixes [sashiko]
- use hash_ptr in select_trampoline_lock [sashiko]
- changed the check duplicate logic in check_dup_ids [sashiko]
- use sort_r_nonatomic in check_dup_ids [sashiko]
- added BPF_TRACE_FSESSION_MULTI to can_be_sleepable,
plus added testcase for sleepable fsession
- make bpf_tracing_multi_opts pointer fields const
- add ___migrate_enable to trace_blacklist
v6 changes:
- move ftrace_hash_count declaration under CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS [sashiko]
- fix ftrace_hash_remove check/deref [sashiko]
- disable context access for multi programs by using stub function with no arguments
for verification [sashiko]
- add __used for bpf_multi_func, and removed arguments, we do not allow direct access [sashiko]
- rebased on latest loongarch changes, fix ppc build
- guard update_ftrace_direct_del with ftrace_hash_count on rollback [sashiko]
- fix noreturn attachment condition in bpf_check_attach_btf_id_multi [sashiko]
- fail early on multiple same IDs provided by user [sashiko]
- fix selftests error paths [sashiko]
- add MAX_RESOLVE_DEPTH check to btf_get_type_size [sashiko]
- use btf__pointer_size [sashiko]
- fixed compilation on powerpc [sashiko]
- added verifier fails selftest
- after discussing with Song, it was determined that cleaning up FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER
is not strictly necessary — keeping the trampoline in the ipmodify_enabled state is acceptable.
The race condition this introduces remains unlikely, so the concern raised in [1] will not be
addressed at this time.
[1] https://lore.kernel.org/bpf/aec7bAbGlnEo3R1g@krava/
v5 changes:
- add dedicated hashes used for detach, so there's no need to allocate
them on detach [sashiko]
- safely release old trampoline images [sashiko]
- add cond_resched() to couple of loops [sashiko]
- validate attr->link_create.target_fd [sashiko]
- allow only bpf_get_func_ret() for return value retrieval [sashiko]
- do not allow attachment of fexit/fsession_multi for noreturn functions [sashiko]
- fixed double free/close in libbpf btf cleanup, in separate patch [sashiko]
- make btf_type_is_traceable_func closer to btf_distill_func_proto [sashiko]
- add prog->attach_btf_obj_fd check to collect_func_ids_by_glob,
to check we don't load module programs for kernel [sashiko]
- make sure program is loaded in bpf_program__attach_tracing_multi [sashiko]
- several selftests fixes [sashiko]
- add attach_type to fdinfo output [Leon Hwang]
- selftests cleanup fixes [Leon Hwang]
v4 changes:
- unlink rollback fix (added ftrace_hash_count) [bot]
- use const for some bpf_link_create_opts tracing_multi members [bot]
- adding missing comment for lockdep keys [bot]
- selftest error path fixes (leaks) and other assorted test fixes [Leon Hwang]
- several compile fixes wrt CONFIG_BPF_SYSCALL and CONFIG_BPF_JIT [kernel test robot]
- make ftrace_hash_clear global, because it's needed in rollback
v3 changes:
- fix module parsing [Leon Hwang]
- use function traceable check from libbpf [Leon Hwang]
- use ptr_to_u64 and fix/updated few comments [ci]
- display cookies as decimal numbers [ci]
- added link_create.flags check [ci]
- fix error path in bpf_trampoline_multi_detach [ci]
- make fentry/fexit.multi not extendable [ci]
- add missing OPTS_VALID to bpf_program__attach_tracing_multi [ci]
v2 changes:
- allocate data.unreg in bpf_trampoline_multi_attach for rollback path [ci]
and fixed link count setup in rollback path [ci]
- several small assorted fixes [ci]
- added loongarch and powerpc changes for struct bpf_tramp_node change
- added support to attach functions from modules
- added tests for sleepable programs
- added rollback tests
v1 changes:
- added ftrace_hash_count as wrapper for hash_count [Steven]
- added trampoline mutex pool [Andrii]
- reworked 'struct bpf_tramp_node' separation [Andrii]
- the 'struct bpf_tramp_node' now holds pointer to bpf_link,
which is similar to what we do for uprobe_multi;
I understand it's not a fundamental change compared to previous
version which used bpf_prog pointer instead, but I don't see better
way of doing this.. I'm happy to discuss this further if there's
better idea
- reworked 'struct bpf_fsession_link' based on bpf_tramp_node
- made btf__find_by_glob_kind function internal helper [Andrii]
- many small assorted fixes [Andrii,CI]
- added session support [Leon Hwang]
- added cookies support
- added more tests
Note I plan to send linkinfo support separately, the patchset is big enough.
thanks,
jirka
Cc: Hengqi Chen <hengqi.chen@gmail.com>
---
Jiri Olsa (29):
ftrace: Add ftrace_hash_count function
ftrace: Add ftrace_hash_remove function
ftrace: Add add_ftrace_hash_entry function
bpf: Use mutex lock pool for bpf trampolines
bpf: Add struct bpf_trampoline_ops object
bpf: Move trampoline image setup into bpf_trampoline_ops callbacks
bpf: Add bpf_trampoline_add/remove_prog functions
bpf: Add struct bpf_tramp_node object
bpf: Factor fsession link to use struct bpf_tramp_node
bpf: Add multi tracing attach types
bpf: Move sleepable verification code to btf_id_allow_sleepable
bpf: Add bpf_trampoline_multi_attach/detach functions
bpf: Add support for tracing multi link
bpf: Add support for tracing_multi link cookies
bpf: Add support for tracing_multi link session
bpf: Add support for tracing_multi link fdinfo
libbpf: Add bpf_object_cleanup_btf function
libbpf: Add bpf_link_create support for tracing_multi link
libbpf: Add btf_type_is_traceable_func function
libbpf: Add support to create tracing multi link
selftests/bpf: Add tracing multi skel/pattern/ids attach tests
selftests/bpf: Add tracing multi skel/pattern/ids module attach tests
selftests/bpf: Add tracing multi intersect tests
selftests/bpf: Add tracing multi cookies test
selftests/bpf: Add tracing multi session test
selftests/bpf: Add tracing multi attach fails test
selftests/bpf: Add tracing multi verifier fails test
selftests/bpf: Add tracing multi attach benchmark test
selftests/bpf: Add tracing multi attach rollback tests
arch/arm64/net/bpf_jit_comp.c | 58 ++--
arch/loongarch/net/bpf_jit.c | 52 ++--
arch/powerpc/net/bpf_jit_comp.c | 54 ++--
arch/riscv/net/bpf_jit_comp64.c | 52 ++--
arch/s390/net/bpf_jit_comp.c | 44 +--
arch/x86/net/bpf_jit_comp.c | 54 ++--
include/linux/bpf.h | 117 ++++++--
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 4 +
include/linux/btf_ids.h | 1 +
include/linux/ftrace.h | 9 +
include/linux/trace_events.h | 6 +
include/uapi/linux/bpf.h | 9 +
kernel/bpf/bpf_struct_ops.c | 27 +-
kernel/bpf/fixups.c | 2 +
kernel/bpf/syscall.c | 83 +++---
kernel/bpf/trampoline.c | 670 +++++++++++++++++++++++++++++++++----------
kernel/bpf/verifier.c | 183 ++++++++++--
kernel/trace/bpf_trace.c | 204 +++++++++++++-
kernel/trace/ftrace.c | 35 ++-
net/bpf/bpf_dummy_struct_ops.c | 14 +-
net/bpf/test_run.c | 3 +
tools/include/uapi/linux/bpf.h | 10 +
tools/lib/bpf/bpf.c | 9 +
tools/lib/bpf/bpf.h | 5 +
tools/lib/bpf/libbpf.c | 375 ++++++++++++++++++++++++-
tools/lib/bpf/libbpf.h | 15 +
tools/lib/bpf/libbpf.map | 1 +
tools/lib/bpf/libbpf_internal.h | 1 +
tools/testing/selftests/bpf/Makefile | 9 +-
tools/testing/selftests/bpf/prog_tests/tracing_multi.c | 936 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tools/testing/selftests/bpf/progs/tracing_multi_attach.c | 39 +++
tools/testing/selftests/bpf/progs/tracing_multi_attach_module.c | 25 ++
tools/testing/selftests/bpf/progs/tracing_multi_bench.c | 12 +
tools/testing/selftests/bpf/progs/tracing_multi_check.c | 214 ++++++++++++++
tools/testing/selftests/bpf/progs/tracing_multi_fail.c | 18 ++
tools/testing/selftests/bpf/progs/tracing_multi_intersect_attach.c | 41 +++
tools/testing/selftests/bpf/progs/tracing_multi_rollback.c | 43 +++
tools/testing/selftests/bpf/progs/tracing_multi_session_attach.c | 65 +++++
tools/testing/selftests/bpf/progs/tracing_multi_verifier.c | 31 ++
tools/testing/selftests/bpf/trace_helpers.c | 7 +-
tools/testing/selftests/bpf/trace_helpers.h | 1 +
42 files changed, 3110 insertions(+), 429 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_attach.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_attach_module.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_bench.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_check.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_intersect_attach.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_rollback.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_session_attach.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_verifier.c
* [PATCHv7 bpf-next 01/29] ftrace: Add ftrace_hash_count function
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Adding external ftrace_hash_count function so we can get the hash
count outside of the ftrace object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/ftrace.h | 7 +++++++
kernel/trace/ftrace.c | 7 ++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 28b30c6f1031..02c24bf766ce 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -551,6 +551,8 @@ int update_ftrace_direct_mod(struct ftrace_ops *ops, struct ftrace_hash *hash, b
void ftrace_stub_direct_tramp(void);
+unsigned long ftrace_hash_count(struct ftrace_hash *hash);
+
#else
struct ftrace_ops;
static inline unsigned long ftrace_find_rec_direct(unsigned long ip)
@@ -590,6 +592,11 @@ static inline int update_ftrace_direct_mod(struct ftrace_ops *ops, struct ftrace
return -ENODEV;
}
+static inline unsigned long ftrace_hash_count(struct ftrace_hash *hash)
+{
+ return 0;
+}
+
/*
* This must be implemented by the architecture.
* It is the way the ftrace direct_ops helper, when called
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b2611de3f594..57ab01fd00bd 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6288,11 +6288,16 @@ int modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
}
EXPORT_SYMBOL_GPL(modify_ftrace_direct);
-static unsigned long hash_count(struct ftrace_hash *hash)
+static inline unsigned long hash_count(struct ftrace_hash *hash)
{
return hash ? hash->count : 0;
}
+unsigned long ftrace_hash_count(struct ftrace_hash *hash)
+{
+ return hash_count(hash);
+}
+
/**
* hash_add - adds two struct ftrace_hash and returns the result
* @a: struct ftrace_hash object
--
2.54.0
* [PATCHv7 bpf-next 02/29] ftrace: Add ftrace_hash_remove function
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Adding ftrace_hash_remove function that removes all entries
from struct ftrace_hash object without freeing them.
It will be used in following changes where entries are allocated
as part of another structure and are freed separately.
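To illustrate the "remove without freeing" idea, here is a minimal userspace sketch. It is not the ftrace code: struct entry, struct hash, struct owner, hash_add() and hash_remove_all() are hypothetical names, and a plain singly linked bucket list stands in for the kernel hlist. The point is only that entries embedded in a larger object are unlinked and the owner keeps responsibility for the memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash entry, embedded in a larger owner structure. */
struct entry {
	struct entry *next;
	unsigned long ip;
};

struct hash {
	unsigned int size_bits;
	unsigned long count;
	struct entry **buckets;
};

/* The owner allocates and frees the entry together with its own data. */
struct owner {
	int id;
	struct entry entry;
};

static void hash_add(struct hash *h, struct entry *e)
{
	unsigned long idx = e->ip & ((1UL << h->size_bits) - 1);

	e->next = h->buckets[idx];
	h->buckets[idx] = e;
	h->count++;
}

/* Unlink every entry but do not free it; the owner still holds the memory. */
static void hash_remove_all(struct hash *h)
{
	unsigned long i, size;

	if (!h || !h->count)
		return;
	size = 1UL << h->size_bits;
	for (i = 0; i < size; i++) {
		struct entry *e = h->buckets[i];

		while (e) {
			struct entry *next = e->next;

			e->next = NULL;
			h->count--;
			e = next;
		}
		h->buckets[i] = NULL;
	}
	/* Mirrors the sanity check at the end of the kernel function. */
	assert(h->count == 0);
}
```

After hash_remove_all() the hash is empty, while every struct owner (and the entry inside it) is still valid and can be reused or freed by whoever allocated it.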
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/ftrace.h | 1 +
kernel/trace/ftrace.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 02c24bf766ce..b55ec9b25bb3 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -415,6 +415,7 @@ struct ftrace_hash *alloc_ftrace_hash(int size_bits);
void free_ftrace_hash(struct ftrace_hash *hash);
struct ftrace_func_entry *add_ftrace_hash_entry_direct(struct ftrace_hash *hash,
unsigned long ip, unsigned long direct);
+void ftrace_hash_remove(struct ftrace_hash *hash);
/* The hash used to know what functions callbacks trace */
struct ftrace_ops_hash {
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 57ab01fd00bd..45548b0200eb 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1249,6 +1249,25 @@ remove_hash_entry(struct ftrace_hash *hash,
hash->count--;
}
+void ftrace_hash_remove(struct ftrace_hash *hash)
+{
+ struct ftrace_func_entry *entry;
+ struct hlist_head *hhd;
+ struct hlist_node *tn;
+ int size;
+ int i;
+
+ if (!hash || !hash->count)
+ return;
+ size = 1 << hash->size_bits;
+ for (i = 0; i < size; i++) {
+ hhd = &hash->buckets[i];
+ hlist_for_each_entry_safe(entry, tn, hhd, hlist)
+ remove_hash_entry(hash, entry);
+ }
+ FTRACE_WARN_ON(hash->count);
+}
+
static void ftrace_hash_clear(struct ftrace_hash *hash)
{
struct hlist_head *hhd;
--
2.54.0
* [PATCHv7 bpf-next 03/29] ftrace: Add add_ftrace_hash_entry function
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Renaming __add_hash_entry to add_ftrace_hash_entry and making it global;
it will be used in following changes outside of ftrace.c.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/ftrace.h | 1 +
kernel/trace/ftrace.c | 9 ++++-----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index b55ec9b25bb3..02bc5027523a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -415,6 +415,7 @@ struct ftrace_hash *alloc_ftrace_hash(int size_bits);
void free_ftrace_hash(struct ftrace_hash *hash);
struct ftrace_func_entry *add_ftrace_hash_entry_direct(struct ftrace_hash *hash,
unsigned long ip, unsigned long direct);
+void add_ftrace_hash_entry(struct ftrace_hash *hash, struct ftrace_func_entry *entry);
void ftrace_hash_remove(struct ftrace_hash *hash);
/* The hash used to know what functions callbacks trace */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 45548b0200eb..f93e34dd2328 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1198,8 +1198,7 @@ ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip)
return __ftrace_lookup_ip(hash, ip);
}
-static void __add_hash_entry(struct ftrace_hash *hash,
- struct ftrace_func_entry *entry)
+void add_ftrace_hash_entry(struct ftrace_hash *hash, struct ftrace_func_entry *entry)
{
struct hlist_head *hhd;
unsigned long key;
@@ -1221,7 +1220,7 @@ add_ftrace_hash_entry_direct(struct ftrace_hash *hash, unsigned long ip, unsigne
entry->ip = ip;
entry->direct = direct;
- __add_hash_entry(hash, entry);
+ add_ftrace_hash_entry(hash, entry);
return entry;
}
@@ -1477,7 +1476,7 @@ static struct ftrace_hash *__move_hash(struct ftrace_hash *src, int size)
hhd = &src->buckets[i];
hlist_for_each_entry_safe(entry, tn, hhd, hlist) {
remove_hash_entry(src, entry);
- __add_hash_entry(new_hash, entry);
+ add_ftrace_hash_entry(new_hash, entry);
}
}
return new_hash;
@@ -5360,7 +5359,7 @@ int ftrace_func_mapper_add_ip(struct ftrace_func_mapper *mapper,
map->entry.ip = ip;
map->data = data;
- __add_hash_entry(&mapper->hash, &map->entry);
+ add_ftrace_hash_entry(&mapper->hash, &map->entry);
return 0;
}
--
2.54.0
* [PATCHv7 bpf-next 04/29] bpf: Use mutex lock pool for bpf trampolines
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Adding mutex lock pool that replaces bpf trampolines mutex.
For tracing_multi link coming in following changes we need to lock all
the involved trampolines during the attachment. This could mean thousands
of mutex locks, which is not convenient.
As suggested by Andrii, we can replace the bpf trampoline mutex with a mutex
pool, where each trampoline is hashed to one of the locks from the pool.
It's better to lock all the pool mutexes (32 at the moment) than
thousands of them.
There is a limit of 48 (MAX_LOCK_DEPTH) locks that a task may hold
simultaneously, so we keep 32 mutexes (5 bits) in the pool; that way
lockdep won't complain when we lock them all in following changes.
Removing the mutex_is_locked in bpf_trampoline_put, because we removed
the mutex from bpf_trampoline.
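The pool idea can be sketched in userspace roughly as follows. This is an illustration under assumptions, not the kernel code: pthread mutexes replace kernel mutexes, all pool_* names are hypothetical, the multiplicative pointer hash only approximates what hash_ptr() does on 64-bit, and the ascending lock order in pool_lock_all() is an assumption of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define POOL_BITS 5
#define POOL_SIZE (1u << POOL_BITS)	/* 32 locks, below the 48 lock limit */

static pthread_mutex_t pool[POOL_SIZE];

static void pool_init(void)
{
	unsigned int i;

	for (i = 0; i < POOL_SIZE; i++)
		pthread_mutex_init(&pool[i], NULL);
}

/* Map an object address to a slot using the top POOL_BITS of a 64-bit hash. */
static unsigned int pool_slot(const void *obj)
{
	uint64_t h = (uint64_t)(uintptr_t)obj * 0x61C8864680B583EBull;

	return (unsigned int)(h >> (64 - POOL_BITS));
}

static pthread_mutex_t *pool_select(const void *obj)
{
	return &pool[pool_slot(obj)];
}

/* Per-object lock: different objects may share a slot and then serialize. */
static void pool_lock(const void *obj)
{
	pthread_mutex_lock(pool_select(obj));
}

static void pool_unlock(const void *obj)
{
	pthread_mutex_unlock(pool_select(obj));
}

/* Take every pool lock in a fixed order, e.g. for a multi-object update. */
static void pool_lock_all(void)
{
	unsigned int i;

	for (i = 0; i < POOL_SIZE; i++)
		pthread_mutex_lock(&pool[i]);
}

static void pool_unlock_all(void)
{
	unsigned int i;

	for (i = POOL_SIZE; i > 0; i--)
		pthread_mutex_unlock(&pool[i - 1]);
}
```

The trade-off is that two unrelated objects hashing to the same slot contend on one lock, in exchange for a bounded number of locks when everything has to be locked at once.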
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/bpf.h | 2 --
kernel/bpf/trampoline.c | 77 ++++++++++++++++++++++++++++-------------
2 files changed, 53 insertions(+), 26 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8599b451dd7a..fd0d873219d2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1352,8 +1352,6 @@ struct bpf_trampoline {
/* hlist for trampoline_ip_table */
struct hlist_node hlist_ip;
struct ftrace_ops *fops;
- /* serializes access to fields of this trampoline */
- struct mutex mutex;
refcount_t refcnt;
u32 flags;
u64 key;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index a4298a25d4ba..c0b4732627be 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -30,6 +30,35 @@ static struct hlist_head trampoline_ip_table[TRAMPOLINE_TABLE_SIZE];
/* serializes access to trampoline tables */
static DEFINE_MUTEX(trampoline_mutex);
+/*
+ * Keep 32 trampoline locks (5 bits) in the pool so trampoline_lock_all()
+ * stays below MAX_LOCK_DEPTH. Each pool slot has a distinct lockdep
+ * class because trampoline_lock_all() takes all pool mutexes at once;
+ * otherwise lockdep would report recursive locking on same-class mutexes.
+ */
+#define TRAMPOLINE_LOCKS_BITS 5
+#define TRAMPOLINE_LOCKS_TABLE_SIZE (1 << TRAMPOLINE_LOCKS_BITS)
+
+static struct {
+ struct mutex mutex;
+ struct lock_class_key key;
+} trampoline_locks[TRAMPOLINE_LOCKS_TABLE_SIZE];
+
+static struct mutex *select_trampoline_lock(struct bpf_trampoline *tr)
+{
+ return &trampoline_locks[hash_ptr(tr, TRAMPOLINE_LOCKS_BITS)].mutex;
+}
+
+static void trampoline_lock(struct bpf_trampoline *tr)
+{
+ mutex_lock(select_trampoline_lock(tr));
+}
+
+static void trampoline_unlock(struct bpf_trampoline *tr)
+{
+ mutex_unlock(select_trampoline_lock(tr));
+}
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
@@ -69,9 +98,9 @@ static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, unsigned long ip,
if (cmd == FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF) {
/* This is called inside register_ftrace_direct_multi(), so
- * tr->mutex is already locked.
+ * trampoline's mutex is already locked.
*/
- lockdep_assert_held_once(&tr->mutex);
+ lockdep_assert_held_once(select_trampoline_lock(tr));
/* Instead of updating the trampoline here, we propagate
* -EAGAIN to register_ftrace_direct(). Then we can
@@ -91,7 +120,7 @@ static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, unsigned long ip,
}
/* The normal locking order is
- * tr->mutex => direct_mutex (ftrace.c) => ftrace_lock (ftrace.c)
+ * select_trampoline_lock(tr) => direct_mutex (ftrace.c) => ftrace_lock (ftrace.c)
*
* The following two commands are called from
*
@@ -99,12 +128,12 @@ static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, unsigned long ip,
* cleanup_direct_functions_after_ipmodify
*
* In both cases, direct_mutex is already locked. Use
- * mutex_trylock(&tr->mutex) to avoid deadlock in race condition
- * (something else is making changes to this same trampoline).
+ * mutex_trylock(select_trampoline_lock(tr)) to avoid deadlock in race condition
+ * (something else holds the same pool lock).
*/
- if (!mutex_trylock(&tr->mutex)) {
- /* sleep 1 ms to make sure whatever holding tr->mutex makes
- * some progress.
+ if (!mutex_trylock(select_trampoline_lock(tr))) {
+ /* sleep 1 ms to make sure whatever holding select_trampoline_lock(tr)
+ * makes some progress.
*/
msleep(1);
return -EAGAIN;
@@ -129,7 +158,7 @@ static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, unsigned long ip,
break;
}
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
return ret;
}
#endif
@@ -359,7 +388,6 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key, unsigned long ip)
head = &trampoline_ip_table[hash_64(tr->ip, TRAMPOLINE_HASH_BITS)];
hlist_add_head(&tr->hlist_ip, head);
refcount_set(&tr->refcnt, 1);
- mutex_init(&tr->mutex);
for (i = 0; i < BPF_TRAMP_MAX; i++)
INIT_HLIST_HEAD(&tr->progs_hlist[i]);
out:
@@ -843,9 +871,9 @@ int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
{
int err;
- mutex_lock(&tr->mutex);
+ trampoline_lock(tr);
err = __bpf_trampoline_link_prog(link, tr, tgt_prog);
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
return err;
}
@@ -886,9 +914,9 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
{
int err;
- mutex_lock(&tr->mutex);
+ trampoline_lock(tr);
err = __bpf_trampoline_unlink_prog(link, tr, tgt_prog);
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
return err;
}
@@ -998,12 +1026,12 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
if (!tr)
return -ENOMEM;
- mutex_lock(&tr->mutex);
+ trampoline_lock(tr);
shim_link = cgroup_shim_find(tr, bpf_func);
if (shim_link && !IS_ERR(bpf_link_inc_not_zero(&shim_link->link.link))) {
/* Reusing existing shim attached by the other program. */
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
bpf_trampoline_put(tr); /* bpf_trampoline_get above */
return 0;
}
@@ -1023,16 +1051,16 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
shim_link->trampoline = tr;
/* note, we're still holding tr refcnt from above */
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
return 0;
err:
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
if (shim_link)
bpf_link_put(&shim_link->link.link);
- /* have to release tr while _not_ holding its mutex */
+ /* have to release tr while _not_ holding pool mutex for trampoline */
bpf_trampoline_put(tr); /* bpf_trampoline_get above */
return err;
@@ -1053,9 +1081,9 @@ void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog)
if (WARN_ON_ONCE(!tr))
return;
- mutex_lock(&tr->mutex);
+ trampoline_lock(tr);
shim_link = cgroup_shim_find(tr, bpf_func);
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
if (shim_link)
bpf_link_put(&shim_link->link.link);
@@ -1073,14 +1101,14 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key,
if (!tr)
return NULL;
- mutex_lock(&tr->mutex);
+ trampoline_lock(tr);
if (tr->func.addr)
goto out;
memcpy(&tr->func.model, &tgt_info->fmodel, sizeof(tgt_info->fmodel));
tr->func.addr = (void *)tgt_info->tgt_addr;
out:
- mutex_unlock(&tr->mutex);
+ trampoline_unlock(tr);
return tr;
}
@@ -1093,7 +1121,6 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
mutex_lock(&trampoline_mutex);
if (!refcount_dec_and_test(&tr->refcnt))
goto out;
- WARN_ON_ONCE(mutex_is_locked(&tr->mutex));
for (i = 0; i < BPF_TRAMP_MAX; i++)
if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[i])))
@@ -1379,6 +1406,8 @@ static int __init init_trampolines(void)
INIT_HLIST_HEAD(&trampoline_key_table[i]);
for (i = 0; i < TRAMPOLINE_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&trampoline_ip_table[i]);
+ for (i = 0; i < TRAMPOLINE_LOCKS_TABLE_SIZE; i++)
+ __mutex_init(&trampoline_locks[i].mutex, "trampoline_lock", &trampoline_locks[i].key);
return 0;
}
late_initcall(init_trampolines);
--
2.54.0
* [PATCHv7 bpf-next 05/29] bpf: Add struct bpf_trampoline_ops object
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
In following changes we will need to override ftrace direct attachment
behaviour. In order to do that we are adding struct bpf_trampoline_ops
object that defines callbacks for ftrace direct attachment:
register_fentry
unregister_fentry
modify_fentry
The new struct bpf_trampoline_ops object is passed as an argument to
__bpf_trampoline_link/unlink_prog functions.
At the moment the default trampoline_ops is set to the current ftrace
direct attachment functions, so there's no functional change for the
current code.
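The callback-table pattern can be sketched in plain C as below. This is a simplified illustration, not the patch itself: struct tramp, struct tramp_ops, tramp_update() and the counting_* callbacks are hypothetical, and the signatures are trimmed compared to the real bpf_trampoline_ops (no orig_flags or lock_direct_mutex). It shows how a caller-supplied ops pointer plus an opaque data pointer lets another attachment path replace the default behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_trampoline. */
struct tramp {
	void *image;	/* currently attached image, or NULL */
};

/* Callback table modeled on the three callbacks named above. */
struct tramp_ops {
	int (*register_fentry)(struct tramp *tr, void *new_addr, void *data);
	int (*unregister_fentry)(struct tramp *tr, void *old_addr, void *data);
	int (*modify_fentry)(struct tramp *tr, void *old_addr, void *new_addr,
			     void *data);
};

/* Default ops: apply the change immediately and ignore data. */
static int default_register(struct tramp *tr, void *new_addr, void *data)
{
	(void)data;
	tr->image = new_addr;
	return 0;
}

static int default_unregister(struct tramp *tr, void *old_addr, void *data)
{
	(void)data;
	if (tr->image != old_addr)
		return -1;
	tr->image = NULL;
	return 0;
}

static int default_modify(struct tramp *tr, void *old_addr, void *new_addr,
			  void *data)
{
	(void)data;
	if (tr->image != old_addr)
		return -1;
	tr->image = new_addr;
	return 0;
}

static const struct tramp_ops default_ops = {
	.register_fentry = default_register,
	.unregister_fentry = default_unregister,
	.modify_fentry = default_modify,
};

/* Alternative ops: only count requests in *data, leaving tr untouched. */
static int counting_register(struct tramp *tr, void *new_addr, void *data)
{
	(void)tr; (void)new_addr;
	++*(int *)data;
	return 0;
}

static int counting_unregister(struct tramp *tr, void *old_addr, void *data)
{
	(void)tr; (void)old_addr;
	++*(int *)data;
	return 0;
}

static int counting_modify(struct tramp *tr, void *old_addr, void *new_addr,
			   void *data)
{
	(void)tr; (void)old_addr; (void)new_addr;
	++*(int *)data;
	return 0;
}

static const struct tramp_ops counting_ops = {
	.register_fentry = counting_register,
	.unregister_fentry = counting_unregister,
	.modify_fentry = counting_modify,
};

/* Pick the callback from the old/new state, as an update path might. */
static int tramp_update(struct tramp *tr, void *new_addr,
			const struct tramp_ops *ops, void *data)
{
	void *old_addr = tr->image;

	if (!old_addr && !new_addr)
		return 0;
	if (!old_addr)
		return ops->register_fentry(tr, new_addr, data);
	if (!new_addr)
		return ops->unregister_fentry(tr, old_addr, data);
	return ops->modify_fentry(tr, old_addr, new_addr, data);
}
```

Passing default_ops with NULL data keeps the existing behaviour, while a different ops table can collect the same requests through data and act on them later.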
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/bpf/trampoline.c | 59 ++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 18 deletions(-)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index c0b4732627be..5c943832fb9d 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -59,8 +59,18 @@ static void trampoline_unlock(struct bpf_trampoline *tr)
mutex_unlock(select_trampoline_lock(tr));
}
+struct bpf_trampoline_ops {
+ int (*register_fentry)(struct bpf_trampoline *tr, void *new_addr, void *data);
+ int (*unregister_fentry)(struct bpf_trampoline *tr, u32 orig_flags, void *old_addr,
+ void *data);
+ int (*modify_fentry)(struct bpf_trampoline *tr, u32 orig_flags, void *old_addr,
+ void *new_addr, bool lock_direct_mutex, void *data);
+};
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
-static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex,
+ const struct bpf_trampoline_ops *ops, void *data);
+static const struct bpf_trampoline_ops trampoline_ops;
#ifdef CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS
static struct bpf_trampoline *direct_ops_ip_lookup(struct ftrace_ops *ops, unsigned long ip)
@@ -145,13 +155,15 @@ static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, unsigned long ip,
if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
!(tr->flags & BPF_TRAMP_F_ORIG_STACK))
- ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+ ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */,
+ &trampoline_ops, NULL);
break;
case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
tr->flags &= ~BPF_TRAMP_F_SHARE_IPMODIFY;
if (tr->flags & BPF_TRAMP_F_ORIG_STACK)
- ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+ ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */,
+ &trampoline_ops, NULL);
break;
default:
ret = -EINVAL;
@@ -415,7 +427,7 @@ static int bpf_trampoline_update_fentry(struct bpf_trampoline *tr, u32 orig_flag
}
static int unregister_fentry(struct bpf_trampoline *tr, u32 orig_flags,
- void *old_addr)
+ void *old_addr, void *data __maybe_unused)
{
int ret;
@@ -429,7 +441,7 @@ static int unregister_fentry(struct bpf_trampoline *tr, u32 orig_flags,
static int modify_fentry(struct bpf_trampoline *tr, u32 orig_flags,
void *old_addr, void *new_addr,
- bool lock_direct_mutex)
+ bool lock_direct_mutex, void *data __maybe_unused)
{
int ret;
@@ -443,7 +455,7 @@ static int modify_fentry(struct bpf_trampoline *tr, u32 orig_flags,
}
/* first time registering */
-static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
+static int register_fentry(struct bpf_trampoline *tr, void *new_addr, void *data __maybe_unused)
{
void *ip = tr->func.addr;
unsigned long faddr;
@@ -465,6 +477,12 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
return ret;
}
+static const struct bpf_trampoline_ops trampoline_ops = {
+ .register_fentry = register_fentry,
+ .unregister_fentry = unregister_fentry,
+ .modify_fentry = modify_fentry,
+};
+
static struct bpf_tramp_links *
bpf_trampoline_get_progs(const struct bpf_trampoline *tr, int *total, bool *ip_arg)
{
@@ -632,7 +650,8 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, int size)
return ERR_PTR(err);
}
-static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex,
+ const struct bpf_trampoline_ops *ops, void *data)
{
struct bpf_tramp_image *im;
struct bpf_tramp_links *tlinks;
@@ -645,7 +664,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
return PTR_ERR(tlinks);
if (total == 0) {
- err = unregister_fentry(tr, orig_flags, tr->cur_image->image);
+ err = ops->unregister_fentry(tr, orig_flags, tr->cur_image->image, data);
bpf_tramp_image_put(tr->cur_image);
tr->cur_image = NULL;
goto out;
@@ -715,11 +734,11 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
if (tr->cur_image)
/* progs already running at this address */
- err = modify_fentry(tr, orig_flags, tr->cur_image->image,
- im->image, lock_direct_mutex);
+ err = ops->modify_fentry(tr, orig_flags, tr->cur_image->image,
+ im->image, lock_direct_mutex, data);
else
/* first time registering */
- err = register_fentry(tr, im->image);
+ err = ops->register_fentry(tr, im->image, data);
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
if (err == -EAGAIN) {
@@ -793,7 +812,9 @@ static int bpf_freplace_check_tgt_prog(struct bpf_prog *tgt_prog)
static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
- struct bpf_prog *tgt_prog)
+ struct bpf_prog *tgt_prog,
+ const struct bpf_trampoline_ops *ops,
+ void *data)
{
struct bpf_fsession_link *fslink = NULL;
enum bpf_tramp_prog_type kind;
@@ -851,7 +872,7 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
} else {
tr->progs_cnt[kind]++;
}
- err = bpf_trampoline_update(tr, true /* lock_direct_mutex */);
+ err = bpf_trampoline_update(tr, true /* lock_direct_mutex */, ops, data);
if (err) {
hlist_del_init(&link->tramp_hlist);
if (kind == BPF_TRAMP_FSESSION) {
@@ -872,14 +893,16 @@ int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
int err;
trampoline_lock(tr);
- err = __bpf_trampoline_link_prog(link, tr, tgt_prog);
+ err = __bpf_trampoline_link_prog(link, tr, tgt_prog, &trampoline_ops, NULL);
trampoline_unlock(tr);
return err;
}
static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
- struct bpf_prog *tgt_prog)
+ struct bpf_prog *tgt_prog,
+ const struct bpf_trampoline_ops *ops,
+ void *data)
{
enum bpf_tramp_prog_type kind;
int err;
@@ -904,7 +927,7 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
}
hlist_del_init(&link->tramp_hlist);
tr->progs_cnt[kind]--;
- return bpf_trampoline_update(tr, true /* lock_direct_mutex */);
+ return bpf_trampoline_update(tr, true /* lock_direct_mutex */, ops, data);
}
/* bpf_trampoline_unlink_prog() should never fail. */
@@ -915,7 +938,7 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
int err;
trampoline_lock(tr);
- err = __bpf_trampoline_unlink_prog(link, tr, tgt_prog);
+ err = __bpf_trampoline_unlink_prog(link, tr, tgt_prog, &trampoline_ops, NULL);
trampoline_unlock(tr);
return err;
}
@@ -1044,7 +1067,7 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
goto err;
}
- err = __bpf_trampoline_link_prog(&shim_link->link, tr, NULL);
+ err = __bpf_trampoline_link_prog(&shim_link->link, tr, NULL, &trampoline_ops, NULL);
if (err)
goto err;
--
2.54.0
* [PATCHv7 bpf-next 06/29] bpf: Move trampoline image setup into bpf_trampoline_ops callbacks
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Move the trampoline image setup into the bpf_trampoline_ops callbacks,
so that the multi attachment coming in the following changes can handle
images differently.
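A minimal user-space sketch of that ownership change, under simplifying assumptions: each callback commits the new image to the trampoline only after the attach step succeeds, and leaves the current image untouched on failure. The types struct image and struct tramp, the poke() helper and the sketch_* function names are hypothetical; the kernel callbacks take more arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted image; stands in for struct bpf_tramp_image. */
struct image {
	int refs;
};

/* Hypothetical trampoline; poke_err simulates the attach step failing. */
struct tramp {
	struct image *cur_image;
	int poke_err;
};

static void image_put(struct image *im)
{
	im->refs--;
}

/* Stands in for the ftrace/text-poke attach step. */
static int poke(struct tramp *tr)
{
	return tr->poke_err;
}

/* First attachment: take over the image only on success. */
int sketch_register(struct tramp *tr, struct image *im)
{
	int ret = poke(tr);

	if (ret)
		return ret;
	tr->cur_image = im;
	return 0;
}

/* Replacement: drop the old image and install the new one on success. */
int sketch_modify(struct tramp *tr, struct image *im)
{
	int ret;

	assert(tr->cur_image != NULL);
	ret = poke(tr);
	if (ret)
		return ret;	/* cur_image is left as it was */
	image_put(tr->cur_image);
	tr->cur_image = im;
	return 0;
}

/* Detach: drop the current image on success. */
int sketch_unregister(struct tramp *tr)
{
	int ret;

	assert(tr->cur_image != NULL);
	ret = poke(tr);
	if (ret)
		return ret;
	image_put(tr->cur_image);
	tr->cur_image = NULL;
	return 0;
}
```

With the state handling inside the callbacks, a different ops table is free to track images in its own way, which is the stated motivation for the move.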
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/bpf/trampoline.c | 66 ++++++++++++++++++++++++-----------------
1 file changed, 38 insertions(+), 28 deletions(-)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 5c943832fb9d..1006031ea021 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -60,11 +60,10 @@ static void trampoline_unlock(struct bpf_trampoline *tr)
}
struct bpf_trampoline_ops {
- int (*register_fentry)(struct bpf_trampoline *tr, void *new_addr, void *data);
- int (*unregister_fentry)(struct bpf_trampoline *tr, u32 orig_flags, void *old_addr,
- void *data);
- int (*modify_fentry)(struct bpf_trampoline *tr, u32 orig_flags, void *old_addr,
- void *new_addr, bool lock_direct_mutex, void *data);
+ int (*register_fentry)(struct bpf_trampoline *tr, struct bpf_tramp_image *im, void *data);
+ int (*unregister_fentry)(struct bpf_trampoline *tr, u32 orig_flags, void *data);
+ int (*modify_fentry)(struct bpf_trampoline *tr, u32 orig_flags, struct bpf_tramp_image *im,
+ bool lock_direct_mutex, void *data);
};
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
@@ -426,9 +425,11 @@ static int bpf_trampoline_update_fentry(struct bpf_trampoline *tr, u32 orig_flag
return bpf_arch_text_poke(ip, old_t, new_t, old_addr, new_addr);
}
-static int unregister_fentry(struct bpf_trampoline *tr, u32 orig_flags,
- void *old_addr, void *data __maybe_unused)
+static void bpf_tramp_image_put(struct bpf_tramp_image *im);
+
+static int unregister_fentry(struct bpf_trampoline *tr, u32 orig_flags, void *data __maybe_unused)
{
+ void *old_addr = tr->cur_image->image;
int ret;
if (tr->func.ftrace_managed)
@@ -436,13 +437,19 @@ static int unregister_fentry(struct bpf_trampoline *tr, u32 orig_flags,
else
ret = bpf_trampoline_update_fentry(tr, orig_flags, old_addr, NULL);
- return ret;
+ if (ret)
+ return ret;
+
+ bpf_tramp_image_put(tr->cur_image);
+ tr->cur_image = NULL;
+ return 0;
}
-static int modify_fentry(struct bpf_trampoline *tr, u32 orig_flags,
- void *old_addr, void *new_addr,
+static int modify_fentry(struct bpf_trampoline *tr, u32 orig_flags, struct bpf_tramp_image *im,
bool lock_direct_mutex, void *data __maybe_unused)
{
+ void *old_addr = tr->cur_image->image;
+ void *new_addr = im->image;
int ret;
if (tr->func.ftrace_managed) {
@@ -451,12 +458,20 @@ static int modify_fentry(struct bpf_trampoline *tr, u32 orig_flags,
ret = bpf_trampoline_update_fentry(tr, orig_flags, old_addr,
new_addr);
}
- return ret;
+
+ if (ret)
+ return ret;
+
+ bpf_tramp_image_put(tr->cur_image);
+ tr->cur_image = im;
+ return 0;
}
/* first time registering */
-static int register_fentry(struct bpf_trampoline *tr, void *new_addr, void *data __maybe_unused)
+static int register_fentry(struct bpf_trampoline *tr, struct bpf_tramp_image *im,
+ void *data __maybe_unused)
{
+ void *new_addr = im->image;
void *ip = tr->func.addr;
unsigned long faddr;
int ret;
@@ -474,7 +489,11 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr, void *data
ret = bpf_trampoline_update_fentry(tr, 0, NULL, new_addr);
}
- return ret;
+ if (ret)
+ return ret;
+
+ tr->cur_image = im;
+ return 0;
}
static const struct bpf_trampoline_ops trampoline_ops = {
@@ -664,9 +683,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
return PTR_ERR(tlinks);
if (total == 0) {
- err = ops->unregister_fentry(tr, orig_flags, tr->cur_image->image, data);
- bpf_tramp_image_put(tr->cur_image);
- tr->cur_image = NULL;
+ err = ops->unregister_fentry(tr, orig_flags, data);
goto out;
}
@@ -734,11 +751,10 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
if (tr->cur_image)
/* progs already running at this address */
- err = ops->modify_fentry(tr, orig_flags, tr->cur_image->image,
- im->image, lock_direct_mutex, data);
+ err = ops->modify_fentry(tr, orig_flags, im, lock_direct_mutex, data);
else
/* first time registering */
- err = ops->register_fentry(tr, im->image, data);
+ err = ops->register_fentry(tr, im, data);
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
if (err == -EAGAIN) {
@@ -750,22 +766,16 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
goto again;
}
#endif
- if (err)
- goto out_free;
- if (tr->cur_image)
- bpf_tramp_image_put(tr->cur_image);
- tr->cur_image = im;
+out_free:
+ if (err)
+ bpf_tramp_image_free(im);
out:
/* If any error happens, restore previous flags */
if (err)
tr->flags = orig_flags;
kfree(tlinks);
return err;
-
-out_free:
- bpf_tramp_image_free(im);
- goto out;
}
static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
--
2.54.0
* [PATCHv7 bpf-next 07/29] bpf: Add bpf_trampoline_add/remove_prog functions
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Split the bpf_trampoline_add_prog/bpf_trampoline_remove_prog functions
out of the __bpf_trampoline_link_prog/__bpf_trampoline_unlink_prog
functions, so that the following changes can add/remove trampoline
programs without updating the image. No functional change is intended.
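The resulting control flow can be sketched in user space as follows, assuming a much simplified trampoline that only keeps per-kind counters. The names struct tramp, add_prog, remove_prog, update, link_prog, unlink_prog and the MAX_PROGS limit are hypothetical; the point is only that bookkeeping and image update are separate steps, with the bookkeeping rolled back when the update fails.

```c
#include <assert.h>

enum kind { KIND_FENTRY, KIND_FEXIT, KIND_MAX };

#define MAX_PROGS 2	/* hypothetical limit, for illustration only */

/* Hypothetical trampoline; update_err simulates a failing image update. */
struct tramp {
	int progs_cnt[KIND_MAX];
	int update_err;
};

/* Bookkeeping only: no image update happens here. */
static int add_prog(struct tramp *tr, enum kind kind, int cnt)
{
	if (cnt >= MAX_PROGS)
		return -1;
	tr->progs_cnt[kind]++;
	return 0;
}

static void remove_prog(struct tramp *tr, enum kind kind)
{
	assert(tr->progs_cnt[kind] > 0);
	tr->progs_cnt[kind]--;
}

/* Stands in for the image regeneration and attach step. */
static int update(struct tramp *tr)
{
	return tr->update_err;
}

/* Link = add + update, undoing the add if the update fails. */
int link_prog(struct tramp *tr, enum kind kind)
{
	int cnt = 0, i, err;

	for (i = 0; i < KIND_MAX; i++)
		cnt += tr->progs_cnt[i];

	err = add_prog(tr, kind, cnt);
	if (err)
		return err;
	err = update(tr);
	if (err)
		remove_prog(tr, kind);
	return err;
}

/* Unlink = remove + update. */
int unlink_prog(struct tramp *tr, enum kind kind)
{
	remove_prog(tr, kind);
	return update(tr);
}
```

A caller that wants to batch several additions before a single update could call the add/remove helpers directly, which is the use the commit message points to.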
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/bpf/trampoline.c | 108 +++++++++++++++++++++++-----------------
1 file changed, 61 insertions(+), 47 deletions(-)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 1006031ea021..701138ef424a 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -820,41 +820,16 @@ static int bpf_freplace_check_tgt_prog(struct bpf_prog *tgt_prog)
return 0;
}
-static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
- struct bpf_trampoline *tr,
- struct bpf_prog *tgt_prog,
- const struct bpf_trampoline_ops *ops,
- void *data)
+static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
+ struct bpf_tramp_link *link,
+ int cnt)
{
struct bpf_fsession_link *fslink = NULL;
enum bpf_tramp_prog_type kind;
struct bpf_tramp_link *link_exiting;
struct hlist_head *prog_list;
- int err = 0;
- int cnt = 0, i;
kind = bpf_attach_type_to_tramp(link->link.prog);
- if (tr->extension_prog)
- /* cannot attach fentry/fexit if extension prog is attached.
- * cannot overwrite extension prog either.
- */
- return -EBUSY;
-
- for (i = 0; i < BPF_TRAMP_MAX; i++)
- cnt += tr->progs_cnt[i];
-
- if (kind == BPF_TRAMP_REPLACE) {
- /* Cannot attach extension if fentry/fexit are in use. */
- if (cnt)
- return -EBUSY;
- err = bpf_freplace_check_tgt_prog(tgt_prog);
- if (err)
- return err;
- tr->extension_prog = link->link.prog;
- return bpf_arch_text_poke(tr->func.addr, BPF_MOD_NOP,
- BPF_MOD_JUMP, NULL,
- link->link.prog->bpf_func);
- }
if (kind == BPF_TRAMP_FSESSION) {
prog_list = &tr->progs_hlist[BPF_TRAMP_FENTRY];
cnt++;
@@ -882,17 +857,64 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
} else {
tr->progs_cnt[kind]++;
}
- err = bpf_trampoline_update(tr, true /* lock_direct_mutex */, ops, data);
- if (err) {
- hlist_del_init(&link->tramp_hlist);
- if (kind == BPF_TRAMP_FSESSION) {
- tr->progs_cnt[BPF_TRAMP_FENTRY]--;
- hlist_del_init(&fslink->fexit.tramp_hlist);
- tr->progs_cnt[BPF_TRAMP_FEXIT]--;
- } else {
- tr->progs_cnt[kind]--;
- }
+ return 0;
+}
+
+static void bpf_trampoline_remove_prog(struct bpf_trampoline *tr,
+ struct bpf_tramp_link *link)
+{
+ struct bpf_fsession_link *fslink;
+ enum bpf_tramp_prog_type kind;
+
+ kind = bpf_attach_type_to_tramp(link->link.prog);
+ if (kind == BPF_TRAMP_FSESSION) {
+ fslink = container_of(link, struct bpf_fsession_link, link.link);
+ hlist_del_init(&fslink->fexit.tramp_hlist);
+ tr->progs_cnt[BPF_TRAMP_FEXIT]--;
+ kind = BPF_TRAMP_FENTRY;
+ }
+ hlist_del_init(&link->tramp_hlist);
+ tr->progs_cnt[kind]--;
+}
+
+static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
+ struct bpf_trampoline *tr,
+ struct bpf_prog *tgt_prog,
+ const struct bpf_trampoline_ops *ops,
+ void *data)
+{
+ enum bpf_tramp_prog_type kind;
+ int err = 0;
+ int cnt = 0, i;
+
+ kind = bpf_attach_type_to_tramp(link->link.prog);
+ if (tr->extension_prog)
+ /* cannot attach fentry/fexit if extension prog is attached.
+ * cannot overwrite extension prog either.
+ */
+ return -EBUSY;
+
+ for (i = 0; i < BPF_TRAMP_MAX; i++)
+ cnt += tr->progs_cnt[i];
+
+ if (kind == BPF_TRAMP_REPLACE) {
+ /* Cannot attach extension if fentry/fexit are in use. */
+ if (cnt)
+ return -EBUSY;
+ err = bpf_freplace_check_tgt_prog(tgt_prog);
+ if (err)
+ return err;
+ tr->extension_prog = link->link.prog;
+ return bpf_arch_text_poke(tr->func.addr, BPF_MOD_NOP,
+ BPF_MOD_JUMP, NULL,
+ link->link.prog->bpf_func);
}
+ err = bpf_trampoline_add_prog(tr, link, cnt);
+ if (err)
+ return err;
+ err = bpf_trampoline_update(tr, true /* lock_direct_mutex */, ops, data);
+ if (err)
+ bpf_trampoline_remove_prog(tr, link);
return err;
}
@@ -927,16 +949,8 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
guard(mutex)(&tgt_prog->aux->ext_mutex);
tgt_prog->aux->is_extended = false;
return err;
- } else if (kind == BPF_TRAMP_FSESSION) {
- struct bpf_fsession_link *fslink =
- container_of(link, struct bpf_fsession_link, link.link);
-
- hlist_del_init(&fslink->fexit.tramp_hlist);
- tr->progs_cnt[BPF_TRAMP_FEXIT]--;
- kind = BPF_TRAMP_FENTRY;
}
- hlist_del_init(&link->tramp_hlist);
- tr->progs_cnt[kind]--;
+ bpf_trampoline_remove_prog(tr, link);
return bpf_trampoline_update(tr, true /* lock_direct_mutex */, ops, data);
}
--
2.54.0
* [PATCHv7 bpf-next 08/29] bpf: Add struct bpf_tramp_node object
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Hengqi Chen, bpf, linux-trace-kernel, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Menglong Dong,
Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Add struct bpf_tramp_node to decouple the link from the trampoline
attachment info.
At the moment the object for attaching a bpf program to the trampoline is
'struct bpf_tramp_link':
struct bpf_tramp_link {
struct bpf_link link;
struct hlist_node tramp_hlist;
u64 cookie;
};
The link holds the bpf_prog pointer, which forces a one link to one program
binding. In the following changes we want to attach a program to multiple
trampolines while keeping just one bpf_link object.
To allow that, split struct bpf_tramp_link into:
struct bpf_tramp_link {
struct bpf_link link;
struct bpf_tramp_node node;
};
struct bpf_tramp_node {
struct bpf_link *link;
struct hlist_node tramp_hlist;
u64 cookie;
};
The 'struct bpf_tramp_link' defines the standard single trampoline link
and 'struct bpf_tramp_node' is the trampoline attachment object with
a pointer to the bpf_link object.
This will allow us to define a link for multiple trampolines, like:
struct bpf_tracing_multi_link {
struct bpf_link link;
...
int nodes_cnt;
struct bpf_tracing_multi_node nodes[] __counted_by(nodes_cnt);
};
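To make the split concrete, here is a small user-space sketch, assuming reduced versions of the structures above: every node carries a back pointer to its link, so code that walks nodes reaches the program through node->link whether the node is embedded in a single link or sits in an array owned by a multi link. The types and helpers below (struct link, struct tramp_node, tramp_link_init, multi_link_alloc, node_prog_id) are hypothetical simplifications; in particular the multi link here reuses struct tramp_node rather than the bpf_tracing_multi_node type named above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced bpf_link: only identifies the program. */
struct link {
	int prog_id;
};

/* Attachment object: back pointer to the link plus per-attachment data. */
struct tramp_node {
	struct link *link;
	unsigned long long cookie;
};

/* Single trampoline link: one link with one embedded node. */
struct tramp_link {
	struct link link;
	struct tramp_node node;
};

/* Multi link: one link shared by several nodes. */
struct multi_link {
	struct link link;
	int nodes_cnt;
	struct tramp_node nodes[];
};

/* Node users never need to know which kind of link owns the node. */
int node_prog_id(const struct tramp_node *node)
{
	return node->link->prog_id;
}

void tramp_link_init(struct tramp_link *tl, int prog_id,
		     unsigned long long cookie)
{
	tl->link.prog_id = prog_id;
	tl->node.link = &tl->link;
	tl->node.cookie = cookie;
}

/* Returns NULL on allocation failure; the caller frees with free(). */
struct multi_link *multi_link_alloc(int prog_id, int nodes_cnt)
{
	struct multi_link *ml;
	int i;

	assert(nodes_cnt >= 0);
	ml = calloc(1, sizeof(*ml) + (size_t)nodes_cnt * sizeof(ml->nodes[0]));
	if (!ml)
		return NULL;
	ml->link.prog_id = prog_id;
	ml->nodes_cnt = nodes_cnt;
	for (i = 0; i < nodes_cnt; i++)
		ml->nodes[i].link = &ml->link;
	return ml;
}
```

This mirrors the change visible in the JIT hunks of this patch, where l->link.prog becomes node->link->prog: the embedded link is replaced by a pointer dereference.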
Cc: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/arm64/net/bpf_jit_comp.c | 58 ++++++++--------
arch/loongarch/net/bpf_jit.c | 52 +++++++--------
arch/powerpc/net/bpf_jit_comp.c | 54 +++++++--------
arch/riscv/net/bpf_jit_comp64.c | 52 +++++++--------
arch/s390/net/bpf_jit_comp.c | 44 ++++++------
arch/x86/net/bpf_jit_comp.c | 54 +++++++--------
include/linux/bpf.h | 60 ++++++++++-------
kernel/bpf/bpf_struct_ops.c | 27 ++++----
kernel/bpf/syscall.c | 39 ++++++-----
kernel/bpf/trampoline.c | 115 ++++++++++++++++----------------
net/bpf/bpf_dummy_struct_ops.c | 14 ++--
11 files changed, 294 insertions(+), 275 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index b4abc3138f37..f6bcc0e1a950 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -2335,24 +2335,24 @@ bool bpf_jit_supports_subprog_tailcalls(void)
return true;
}
-static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
+static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_node *node,
int bargs_off, int retval_off, int run_ctx_off,
bool save_ret)
{
__le32 *branch;
u64 enter_prog;
u64 exit_prog;
- struct bpf_prog *p = l->link.prog;
+ struct bpf_prog *p = node->link->prog;
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
enter_prog = (u64)bpf_trampoline_enter(p);
exit_prog = (u64)bpf_trampoline_exit(p);
- if (l->cookie == 0) {
+ if (node->cookie == 0) {
/* if cookie is zero, one instruction is enough to store it */
emit(A64_STR64I(A64_ZR, A64_SP, run_ctx_off + cookie_off), ctx);
} else {
- emit_a64_mov_i64(A64_R(10), l->cookie, ctx);
+ emit_a64_mov_i64(A64_R(10), node->cookie, ctx);
emit(A64_STR64I(A64_R(10), A64_SP, run_ctx_off + cookie_off),
ctx);
}
@@ -2402,7 +2402,7 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
emit_call(exit_prog, ctx);
}
-static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
+static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_nodes *tn,
int bargs_off, int retval_off, int run_ctx_off,
__le32 **branches)
{
@@ -2412,8 +2412,8 @@ static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
* Set this to 0 to avoid confusing the program.
*/
emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
- for (i = 0; i < tl->nr_links; i++) {
- invoke_bpf_prog(ctx, tl->links[i], bargs_off, retval_off,
+ for (i = 0; i < tn->nr_nodes; i++) {
+ invoke_bpf_prog(ctx, tn->nodes[i], bargs_off, retval_off,
run_ctx_off, true);
/* if (*(u64 *)(sp + retval_off) != 0)
* goto do_fexit;
@@ -2544,10 +2544,10 @@ static void restore_args(struct jit_ctx *ctx, int bargs_off, int nregs)
}
}
-static bool is_struct_ops_tramp(const struct bpf_tramp_links *fentry_links)
+static bool is_struct_ops_tramp(const struct bpf_tramp_nodes *fentry_nodes)
{
- return fentry_links->nr_links == 1 &&
- fentry_links->links[0]->link.type == BPF_LINK_TYPE_STRUCT_OPS;
+ return fentry_nodes->nr_nodes == 1 &&
+ fentry_nodes->nodes[0]->link->type == BPF_LINK_TYPE_STRUCT_OPS;
}
static void store_func_meta(struct jit_ctx *ctx, u64 func_meta, int func_meta_off)
@@ -2568,7 +2568,7 @@ static void store_func_meta(struct jit_ctx *ctx, u64 func_meta, int func_meta_of
*
*/
static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
- struct bpf_tramp_links *tlinks, void *func_addr,
+ struct bpf_tramp_nodes *tnodes, void *func_addr,
const struct btf_func_model *m,
const struct arg_aux *a,
u32 flags)
@@ -2584,14 +2584,14 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
int run_ctx_off;
int oargs_off;
int nfuncargs;
- struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
- struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
- struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+ struct bpf_tramp_nodes *fentry = &tnodes[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes *fexit = &tnodes[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_nodes *fmod_ret = &tnodes[BPF_TRAMP_MODIFY_RETURN];
bool save_ret;
__le32 **branches = NULL;
bool is_struct_ops = is_struct_ops_tramp(fentry);
int cookie_off, cookie_cnt, cookie_bargs_off;
- int fsession_cnt = bpf_fsession_cnt(tlinks);
+ int fsession_cnt = bpf_fsession_cnt(tnodes);
u64 func_meta;
/* trampoline stack layout:
@@ -2637,7 +2637,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
cookie_off = stack_size;
/* room for session cookies */
- cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
+ cookie_cnt = bpf_fsession_cookie_cnt(tnodes);
stack_size += cookie_cnt * 8;
ip_off = stack_size;
@@ -2734,20 +2734,20 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
}
cookie_bargs_off = (bargs_off - cookie_off) / 8;
- for (i = 0; i < fentry->nr_links; i++) {
- if (bpf_prog_calls_session_cookie(fentry->links[i])) {
+ for (i = 0; i < fentry->nr_nodes; i++) {
+ if (bpf_prog_calls_session_cookie(fentry->nodes[i])) {
u64 meta = func_meta | (cookie_bargs_off << BPF_TRAMP_COOKIE_INDEX_SHIFT);
store_func_meta(ctx, meta, func_meta_off);
cookie_bargs_off--;
}
- invoke_bpf_prog(ctx, fentry->links[i], bargs_off,
+ invoke_bpf_prog(ctx, fentry->nodes[i], bargs_off,
retval_off, run_ctx_off,
flags & BPF_TRAMP_F_RET_FENTRY_RET);
}
- if (fmod_ret->nr_links) {
- branches = kcalloc(fmod_ret->nr_links, sizeof(__le32 *),
+ if (fmod_ret->nr_nodes) {
+ branches = kcalloc(fmod_ret->nr_nodes, sizeof(__le32 *),
GFP_KERNEL);
if (!branches)
return -ENOMEM;
@@ -2771,7 +2771,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
}
/* update the branches saved in invoke_bpf_mod_ret with cbnz */
- for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) {
+ for (i = 0; i < fmod_ret->nr_nodes && ctx->image != NULL; i++) {
int offset = &ctx->image[ctx->idx] - branches[i];
*branches[i] = cpu_to_le32(A64_CBNZ(1, A64_R(10), offset));
}
@@ -2782,14 +2782,14 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
store_func_meta(ctx, func_meta, func_meta_off);
cookie_bargs_off = (bargs_off - cookie_off) / 8;
- for (i = 0; i < fexit->nr_links; i++) {
- if (bpf_prog_calls_session_cookie(fexit->links[i])) {
+ for (i = 0; i < fexit->nr_nodes; i++) {
+ if (bpf_prog_calls_session_cookie(fexit->nodes[i])) {
u64 meta = func_meta | (cookie_bargs_off << BPF_TRAMP_COOKIE_INDEX_SHIFT);
store_func_meta(ctx, meta, func_meta_off);
cookie_bargs_off--;
}
- invoke_bpf_prog(ctx, fexit->links[i], bargs_off, retval_off,
+ invoke_bpf_prog(ctx, fexit->nodes[i], bargs_off, retval_off,
run_ctx_off, false);
}
@@ -2847,7 +2847,7 @@ bool bpf_jit_supports_fsession(void)
}
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *func_addr)
+ struct bpf_tramp_nodes *tnodes, void *func_addr)
{
struct jit_ctx ctx = {
.image = NULL,
@@ -2861,7 +2861,7 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
if (ret < 0)
return ret;
- ret = prepare_trampoline(&ctx, &im, tlinks, func_addr, m, &aaux, flags);
+ ret = prepare_trampoline(&ctx, &im, tnodes, func_addr, m, &aaux, flags);
if (ret < 0)
return ret;
@@ -2885,7 +2885,7 @@ int arch_protect_bpf_trampoline(void *image, unsigned int size)
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *ro_image_end, const struct btf_func_model *m,
- u32 flags, struct bpf_tramp_links *tlinks,
+ u32 flags, struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
u32 size = ro_image_end - ro_image;
@@ -2912,7 +2912,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
ret = calc_arg_aux(m, &aaux);
if (ret)
goto out;
- ret = prepare_trampoline(&ctx, im, tlinks, func_addr, m, &aaux, flags);
+ ret = prepare_trampoline(&ctx, im, tnodes, func_addr, m, &aaux, flags);
if (ret > 0 && validate_code(&ctx) < 0) {
ret = -EINVAL;
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index 24913dc7f4e8..058ffbbaad85 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1674,17 +1674,17 @@ static void restore_stk_args(struct jit_ctx *ctx, int nr_stk_args, int args_off,
}
}
-static int invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
+static int invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_node *n,
int args_off, int retval_off, int run_ctx_off, bool save_ret)
{
int ret;
u32 *branch;
- struct bpf_prog *p = l->link.prog;
+ struct bpf_prog *p = n->link->prog;
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
- if (l->cookie)
+ if (n->cookie)
emit_store_stack_imm64(ctx, LOONGARCH_GPR_T1,
- -run_ctx_off + cookie_off, l->cookie);
+ -run_ctx_off + cookie_off, n->cookie);
else
emit_insn(ctx, std, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_FP, -run_ctx_off + cookie_off);
@@ -1737,22 +1737,22 @@ static int invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
return ret;
}
-static int invoke_bpf(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
+static int invoke_bpf(struct jit_ctx *ctx, struct bpf_tramp_nodes *tn,
int args_off, int retval_off, int run_ctx_off,
int func_meta_off, bool save_ret, u64 func_meta, int cookie_off)
{
int i, cur_cookie = (cookie_off - args_off) / 8;
- for (i = 0; i < tl->nr_links; i++) {
+ for (i = 0; i < tn->nr_nodes; i++) {
int err;
- if (bpf_prog_calls_session_cookie(tl->links[i])) {
+ if (bpf_prog_calls_session_cookie(tn->nodes[i])) {
u64 meta = func_meta | ((u64)cur_cookie << BPF_TRAMP_COOKIE_INDEX_SHIFT);
emit_store_stack_imm64(ctx, LOONGARCH_GPR_T1, -func_meta_off, meta);
cur_cookie--;
}
- err = invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off, run_ctx_off, save_ret);
+ err = invoke_bpf_prog(ctx, tn->nodes[i], args_off, retval_off, run_ctx_off, save_ret);
if (err)
return err;
}
@@ -1807,7 +1807,7 @@ static void sign_extend(struct jit_ctx *ctx, int rd, int rj, u8 size, bool sign)
}
static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
- const struct btf_func_model *m, struct bpf_tramp_links *tlinks,
+ const struct btf_func_model *m, struct bpf_tramp_nodes *tnodes,
void *func_addr, u32 flags)
{
int i, ret, save_ret;
@@ -1817,9 +1817,9 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
unsigned long long func_meta;
bool is_struct_ops = flags & BPF_TRAMP_F_INDIRECT;
void *orig_call = func_addr;
- struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
- struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
- struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+ struct bpf_tramp_nodes *fentry = &tnodes[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes *fexit = &tnodes[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_nodes *fmod_ret = &tnodes[BPF_TRAMP_MODIFY_RETURN];
u32 **branches = NULL;
/*
@@ -1898,7 +1898,7 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
ip_off = stack_size;
}
- cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
+ cookie_cnt = bpf_fsession_cookie_cnt(tnodes);
/* Room for session cookies */
stack_size += cookie_cnt * 8;
@@ -1969,7 +1969,7 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
store_args(ctx, nr_arg_slots, args_off);
- if (bpf_fsession_cnt(tlinks)) {
+ if (bpf_fsession_cnt(tnodes)) {
/* clear all session cookies' value */
for (i = 0; i < cookie_cnt; i++)
emit_insn(ctx, std, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_FP, -cookie_off + 8 * i);
@@ -1994,20 +1994,20 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
return ret;
}
- if (fentry->nr_links) {
+ if (fentry->nr_nodes) {
ret = invoke_bpf(ctx, fentry, args_off, retval_off, run_ctx_off, func_meta_off,
flags & BPF_TRAMP_F_RET_FENTRY_RET, func_meta, cookie_off);
if (ret)
return ret;
}
- if (fmod_ret->nr_links) {
- branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *), GFP_KERNEL);
+ if (fmod_ret->nr_nodes) {
+ branches = kcalloc(fmod_ret->nr_nodes, sizeof(u32 *), GFP_KERNEL);
if (!branches)
return -ENOMEM;
emit_insn(ctx, std, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_FP, -retval_off);
- for (i = 0; i < fmod_ret->nr_links; i++) {
- ret = invoke_bpf_prog(ctx, fmod_ret->links[i],
+ for (i = 0; i < fmod_ret->nr_nodes; i++) {
+ ret = invoke_bpf_prog(ctx, fmod_ret->nodes[i],
args_off, retval_off, run_ctx_off, true);
if (ret)
goto out;
@@ -2035,17 +2035,17 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
emit_insn(ctx, nop);
}
- for (i = 0; ctx->image && i < fmod_ret->nr_links; i++) {
+ for (i = 0; ctx->image && i < fmod_ret->nr_nodes; i++) {
int offset = (void *)(&ctx->image[ctx->idx]) - (void *)branches[i];
*branches[i] = larch_insn_gen_bne(LOONGARCH_GPR_T1, LOONGARCH_GPR_ZERO, offset);
}
/* Set "is_return" flag for fsession */
func_meta |= (1ULL << BPF_TRAMP_IS_RETURN_SHIFT);
- if (bpf_fsession_cnt(tlinks))
+ if (bpf_fsession_cnt(tnodes))
emit_store_stack_imm64(ctx, LOONGARCH_GPR_T1, -func_meta_off, func_meta);
- if (fexit->nr_links) {
+ if (fexit->nr_nodes) {
ret = invoke_bpf(ctx, fexit, args_off, retval_off, run_ctx_off,
func_meta_off, false, func_meta, cookie_off);
if (ret)
@@ -2115,7 +2115,7 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *ro_image_end, const struct btf_func_model *m,
- u32 flags, struct bpf_tramp_links *tlinks, void *func_addr)
+ u32 flags, struct bpf_tramp_nodes *tnodes, void *func_addr)
{
int ret, size;
void *image, *tmp;
@@ -2131,7 +2131,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
ctx.idx = 0;
jit_fill_hole(image, (unsigned int)(ro_image_end - ro_image));
- ret = __arch_prepare_bpf_trampoline(&ctx, im, m, tlinks, func_addr, flags);
+ ret = __arch_prepare_bpf_trampoline(&ctx, im, m, tnodes, func_addr, flags);
if (ret < 0)
goto out;
@@ -2152,7 +2152,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
}
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *func_addr)
+ struct bpf_tramp_nodes *tnodes, void *func_addr)
{
int ret;
struct jit_ctx ctx;
@@ -2161,7 +2161,7 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
ctx.image = NULL;
ctx.idx = 0;
- ret = __arch_prepare_bpf_trampoline(&ctx, &im, m, tlinks, func_addr, flags);
+ ret = __arch_prepare_bpf_trampoline(&ctx, &im, m, tnodes, func_addr, flags);
return ret < 0 ? ret : ret * LOONGARCH_INSN_SIZE;
}
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 53ab97ad6074..6351a187ca61 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -597,22 +597,22 @@ int arch_protect_bpf_trampoline(void *image, unsigned int size)
}
static int invoke_bpf_prog(u32 *image, u32 *ro_image, struct codegen_context *ctx,
- struct bpf_tramp_link *l, int regs_off, int retval_off,
+ struct bpf_tramp_node *n, int regs_off, int retval_off,
int run_ctx_off, bool save_ret)
{
- struct bpf_prog *p = l->link.prog;
+ struct bpf_prog *p = n->link->prog;
ppc_inst_t branch_insn;
u32 jmp_idx;
int ret = 0;
/* Save cookie */
if (IS_ENABLED(CONFIG_PPC64)) {
- PPC_LI64(_R3, l->cookie);
+ PPC_LI64(_R3, n->cookie);
EMIT(PPC_RAW_STD(_R3, _R1, run_ctx_off + offsetof(struct bpf_tramp_run_ctx,
bpf_cookie)));
} else {
- PPC_LI32(_R3, l->cookie >> 32);
- PPC_LI32(_R4, l->cookie);
+ PPC_LI32(_R3, n->cookie >> 32);
+ PPC_LI32(_R4, n->cookie);
EMIT(PPC_RAW_STW(_R3, _R1,
run_ctx_off + offsetof(struct bpf_tramp_run_ctx, bpf_cookie)));
EMIT(PPC_RAW_STW(_R4, _R1,
@@ -679,7 +679,7 @@ static int invoke_bpf_prog(u32 *image, u32 *ro_image, struct codegen_context *ct
}
static int invoke_bpf_mod_ret(u32 *image, u32 *ro_image, struct codegen_context *ctx,
- struct bpf_tramp_links *tl, int regs_off, int retval_off,
+ struct bpf_tramp_nodes *tn, int regs_off, int retval_off,
int run_ctx_off, u32 *branches)
{
int i;
@@ -690,8 +690,8 @@ static int invoke_bpf_mod_ret(u32 *image, u32 *ro_image, struct codegen_context
*/
EMIT(PPC_RAW_LI(_R3, 0));
EMIT(PPC_RAW_STL(_R3, _R1, retval_off));
- for (i = 0; i < tl->nr_links; i++) {
- if (invoke_bpf_prog(image, ro_image, ctx, tl->links[i], regs_off, retval_off,
+ for (i = 0; i < tn->nr_nodes; i++) {
+ if (invoke_bpf_prog(image, ro_image, ctx, tn->nodes[i], regs_off, retval_off,
run_ctx_off, true))
return -EINVAL;
@@ -807,18 +807,18 @@ static void bpf_trampoline_restore_args_stack(u32 *image, struct codegen_context
static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_image,
void *rw_image_end, void *ro_image,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
int regs_off, func_meta_off, ip_off, run_ctx_off, retval_off;
int nvr_off, alt_lr_off, r4_off = 0;
- struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
- struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
- struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_nodes *fmod_ret = &tnodes[BPF_TRAMP_MODIFY_RETURN];
+ struct bpf_tramp_nodes *fentry = &tnodes[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes *fexit = &tnodes[BPF_TRAMP_FEXIT];
int i, ret, nr_regs, retaddr_off, bpf_frame_size = 0;
struct codegen_context codegen_ctx, *ctx;
int cookie_off, cookie_cnt, cookie_ctx_off;
- int fsession_cnt = bpf_fsession_cnt(tlinks);
+ int fsession_cnt = bpf_fsession_cnt(tnodes);
u64 func_meta;
u32 *image = (u32 *)rw_image;
ppc_inst_t branch_insn;
@@ -893,7 +893,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
/* room for session cookies */
cookie_off = bpf_frame_size;
- cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
+ cookie_cnt = bpf_fsession_cookie_cnt(tnodes);
bpf_frame_size += cookie_cnt * 8;
/* Room for IP address argument */
@@ -1030,21 +1030,21 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
cookie_ctx_off = (regs_off - cookie_off) / 8;
- for (i = 0; i < fentry->nr_links; i++) {
- if (bpf_prog_calls_session_cookie(fentry->links[i])) {
+ for (i = 0; i < fentry->nr_nodes; i++) {
+ if (bpf_prog_calls_session_cookie(fentry->nodes[i])) {
u64 meta = func_meta | (cookie_ctx_off << BPF_TRAMP_COOKIE_INDEX_SHIFT);
store_func_meta(image, ctx, meta, func_meta_off);
cookie_ctx_off--;
}
- if (invoke_bpf_prog(image, ro_image, ctx, fentry->links[i], regs_off, retval_off,
+ if (invoke_bpf_prog(image, ro_image, ctx, fentry->nodes[i], regs_off, retval_off,
run_ctx_off, flags & BPF_TRAMP_F_RET_FENTRY_RET))
return -EINVAL;
}
- if (fmod_ret->nr_links) {
- branches = kcalloc(fmod_ret->nr_links, sizeof(u32), GFP_KERNEL);
+ if (fmod_ret->nr_nodes) {
+ branches = kcalloc(fmod_ret->nr_nodes, sizeof(u32), GFP_KERNEL);
if (!branches)
return -ENOMEM;
@@ -1093,7 +1093,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
}
/* Update branches saved in invoke_bpf_mod_ret with address of do_fexit */
- for (i = 0; i < fmod_ret->nr_links && image; i++) {
+ for (i = 0; i < fmod_ret->nr_nodes && image; i++) {
if (create_cond_branch(&branch_insn, &image[branches[i]],
(unsigned long)&image[ctx->idx], COND_NE << 16)) {
ret = -EINVAL;
@@ -1110,15 +1110,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
cookie_ctx_off = (regs_off - cookie_off) / 8;
- for (i = 0; i < fexit->nr_links; i++) {
- if (bpf_prog_calls_session_cookie(fexit->links[i])) {
+ for (i = 0; i < fexit->nr_nodes; i++) {
+ if (bpf_prog_calls_session_cookie(fexit->nodes[i])) {
u64 meta = func_meta | (cookie_ctx_off << BPF_TRAMP_COOKIE_INDEX_SHIFT);
store_func_meta(image, ctx, meta, func_meta_off);
cookie_ctx_off--;
}
- if (invoke_bpf_prog(image, ro_image, ctx, fexit->links[i], regs_off, retval_off,
+ if (invoke_bpf_prog(image, ro_image, ctx, fexit->nodes[i], regs_off, retval_off,
run_ctx_off, false)) {
ret = -EINVAL;
goto cleanup;
@@ -1185,18 +1185,18 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
}
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *func_addr)
+ struct bpf_tramp_nodes *tnodes, void *func_addr)
{
struct bpf_tramp_image im;
int ret;
- ret = __arch_prepare_bpf_trampoline(&im, NULL, NULL, NULL, m, flags, tlinks, func_addr);
+ ret = __arch_prepare_bpf_trampoline(&im, NULL, NULL, NULL, m, flags, tnodes, func_addr);
return ret;
}
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
u32 size = image_end - image;
@@ -1212,7 +1212,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
return -ENOMEM;
ret = __arch_prepare_bpf_trampoline(im, rw_image, rw_image + size, image, m,
- flags, tlinks, func_addr);
+ flags, tnodes, func_addr);
if (ret < 0)
goto out;
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 2f1109dbf105..461b902a5f92 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -934,15 +934,15 @@ static void emit_store_stack_imm64(u8 reg, int stack_off, u64 imm64,
emit_sd(RV_REG_FP, stack_off, reg, ctx);
}
-static int invoke_bpf_prog(struct bpf_tramp_link *l, int args_off, int retval_off,
+static int invoke_bpf_prog(struct bpf_tramp_node *node, int args_off, int retval_off,
int run_ctx_off, bool save_ret, struct rv_jit_context *ctx)
{
int ret, branch_off;
- struct bpf_prog *p = l->link.prog;
+ struct bpf_prog *p = node->link->prog;
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
- if (l->cookie)
- emit_store_stack_imm64(RV_REG_T1, -run_ctx_off + cookie_off, l->cookie, ctx);
+ if (node->cookie)
+ emit_store_stack_imm64(RV_REG_T1, -run_ctx_off + cookie_off, node->cookie, ctx);
else
emit_sd(RV_REG_FP, -run_ctx_off + cookie_off, RV_REG_ZERO, ctx);
@@ -996,22 +996,22 @@ static int invoke_bpf_prog(struct bpf_tramp_link *l, int args_off, int retval_of
return ret;
}
-static int invoke_bpf(struct bpf_tramp_links *tl, int args_off, int retval_off,
+static int invoke_bpf(struct bpf_tramp_nodes *tn, int args_off, int retval_off,
int run_ctx_off, int func_meta_off, bool save_ret, u64 func_meta,
int cookie_off, struct rv_jit_context *ctx)
{
int i, cur_cookie = (cookie_off - args_off) / 8;
- for (i = 0; i < tl->nr_links; i++) {
+ for (i = 0; i < tn->nr_nodes; i++) {
int err;
- if (bpf_prog_calls_session_cookie(tl->links[i])) {
+ if (bpf_prog_calls_session_cookie(tn->nodes[i])) {
u64 meta = func_meta | ((u64)cur_cookie << BPF_TRAMP_COOKIE_INDEX_SHIFT);
emit_store_stack_imm64(RV_REG_T1, -func_meta_off, meta, ctx);
cur_cookie--;
}
- err = invoke_bpf_prog(tl->links[i], args_off, retval_off, run_ctx_off,
+ err = invoke_bpf_prog(tn->nodes[i], args_off, retval_off, run_ctx_off,
save_ret, ctx);
if (err)
return err;
@@ -1021,7 +1021,7 @@ static int invoke_bpf(struct bpf_tramp_links *tl, int args_off, int retval_off,
static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
const struct btf_func_model *m,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr, u32 flags,
struct rv_jit_context *ctx)
{
@@ -1030,9 +1030,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
int stack_size = 0, nr_arg_slots = 0;
int retval_off, args_off, func_meta_off, ip_off, run_ctx_off, sreg_off, stk_arg_off;
int cookie_off, cookie_cnt;
- struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
- struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
- struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+ struct bpf_tramp_nodes *fentry = &tnodes[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes *fexit = &tnodes[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_nodes *fmod_ret = &tnodes[BPF_TRAMP_MODIFY_RETURN];
bool is_struct_ops = flags & BPF_TRAMP_F_INDIRECT;
void *orig_call = func_addr;
bool save_ret;
@@ -1115,7 +1115,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
ip_off = stack_size;
}
- cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
+ cookie_cnt = bpf_fsession_cookie_cnt(tnodes);
/* room for session cookies */
stack_size += cookie_cnt * 8;
cookie_off = stack_size;
@@ -1172,7 +1172,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
store_args(nr_arg_slots, args_off, ctx);
- if (bpf_fsession_cnt(tlinks)) {
+ if (bpf_fsession_cnt(tnodes)) {
/* clear all session cookies' value */
for (i = 0; i < cookie_cnt; i++)
emit_sd(RV_REG_FP, -cookie_off + 8 * i, RV_REG_ZERO, ctx);
@@ -1187,22 +1187,22 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
return ret;
}
- if (fentry->nr_links) {
+ if (fentry->nr_nodes) {
ret = invoke_bpf(fentry, args_off, retval_off, run_ctx_off, func_meta_off,
flags & BPF_TRAMP_F_RET_FENTRY_RET, func_meta, cookie_off, ctx);
if (ret)
return ret;
}
- if (fmod_ret->nr_links) {
- branches_off = kzalloc_objs(int, fmod_ret->nr_links);
+ if (fmod_ret->nr_nodes) {
+ branches_off = kzalloc_objs(int, fmod_ret->nr_nodes);
if (!branches_off)
return -ENOMEM;
/* cleanup to avoid garbage return value confusion */
emit_sd(RV_REG_FP, -retval_off, RV_REG_ZERO, ctx);
- for (i = 0; i < fmod_ret->nr_links; i++) {
- ret = invoke_bpf_prog(fmod_ret->links[i], args_off, retval_off,
+ for (i = 0; i < fmod_ret->nr_nodes; i++) {
+ ret = invoke_bpf_prog(fmod_ret->nodes[i], args_off, retval_off,
run_ctx_off, true, ctx);
if (ret)
goto out;
@@ -1230,7 +1230,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
}
/* update branches saved in invoke_bpf_mod_ret with bnez */
- for (i = 0; ctx->insns && i < fmod_ret->nr_links; i++) {
+ for (i = 0; ctx->insns && i < fmod_ret->nr_nodes; i++) {
offset = ninsns_rvoff(ctx->ninsns - branches_off[i]);
insn = rv_bne(RV_REG_T1, RV_REG_ZERO, offset >> 1);
*(u32 *)(ctx->insns + branches_off[i]) = insn;
@@ -1238,10 +1238,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
/* set "is_return" flag for fsession */
func_meta |= (1ULL << BPF_TRAMP_IS_RETURN_SHIFT);
- if (bpf_fsession_cnt(tlinks))
+ if (bpf_fsession_cnt(tnodes))
emit_store_stack_imm64(RV_REG_T1, -func_meta_off, func_meta, ctx);
- if (fexit->nr_links) {
+ if (fexit->nr_nodes) {
ret = invoke_bpf(fexit, args_off, retval_off, run_ctx_off, func_meta_off,
false, func_meta, cookie_off, ctx);
if (ret)
@@ -1305,7 +1305,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
}
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *func_addr)
+ struct bpf_tramp_nodes *tnodes, void *func_addr)
{
struct bpf_tramp_image im;
struct rv_jit_context ctx;
@@ -1314,7 +1314,7 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
ctx.ninsns = 0;
ctx.insns = NULL;
ctx.ro_insns = NULL;
- ret = __arch_prepare_bpf_trampoline(&im, m, tlinks, func_addr, flags, &ctx);
+ ret = __arch_prepare_bpf_trampoline(&im, m, tnodes, func_addr, flags, &ctx);
return ret < 0 ? ret : ninsns_rvoff(ctx.ninsns);
}
@@ -1331,7 +1331,7 @@ void arch_free_bpf_trampoline(void *image, unsigned int size)
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *ro_image_end, const struct btf_func_model *m,
- u32 flags, struct bpf_tramp_links *tlinks,
+ u32 flags, struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
int ret;
@@ -1346,7 +1346,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
ctx.ninsns = 0;
ctx.insns = image;
ctx.ro_insns = ro_image;
- ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx);
+ ret = __arch_prepare_bpf_trampoline(im, m, tnodes, func_addr, flags, &ctx);
if (ret < 0)
goto out;
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 14eaaa5b2185..31749c0362ca 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -2537,19 +2537,19 @@ static void emit_store_stack_imm64(struct bpf_jit *jit, int tmp_reg, int stack_o
static int invoke_bpf_prog(struct bpf_tramp_jit *tjit,
const struct btf_func_model *m,
- struct bpf_tramp_link *tlink, bool save_ret)
+ struct bpf_tramp_node *node, bool save_ret)
{
struct bpf_jit *jit = &tjit->common;
int cookie_off = tjit->run_ctx_off +
offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
- struct bpf_prog *p = tlink->link.prog;
+ struct bpf_prog *p = node->link->prog;
int patch;
/*
- * run_ctx.cookie = tlink->cookie;
+ * run_ctx.cookie = node->cookie;
*/
- emit_store_stack_imm64(jit, REG_W0, cookie_off, tlink->cookie);
+ emit_store_stack_imm64(jit, REG_W0, cookie_off, node->cookie);
/*
* if ((start = __bpf_prog_enter(p, &run_ctx)) == 0)
@@ -2609,20 +2609,20 @@ static int invoke_bpf_prog(struct bpf_tramp_jit *tjit,
static int invoke_bpf(struct bpf_tramp_jit *tjit,
const struct btf_func_model *m,
- struct bpf_tramp_links *tl, bool save_ret,
+ struct bpf_tramp_nodes *tn, bool save_ret,
u64 func_meta, int cookie_off)
{
int i, cur_cookie = (tjit->bpf_args_off - cookie_off) / sizeof(u64);
struct bpf_jit *jit = &tjit->common;
- for (i = 0; i < tl->nr_links; i++) {
- if (bpf_prog_calls_session_cookie(tl->links[i])) {
+ for (i = 0; i < tn->nr_nodes; i++) {
+ if (bpf_prog_calls_session_cookie(tn->nodes[i])) {
u64 meta = func_meta | ((u64)cur_cookie << BPF_TRAMP_COOKIE_INDEX_SHIFT);
emit_store_stack_imm64(jit, REG_0, tjit->func_meta_off, meta);
cur_cookie--;
}
- if (invoke_bpf_prog(tjit, m, tl->links[i], save_ret))
+ if (invoke_bpf_prog(tjit, m, tn->nodes[i], save_ret))
return -EINVAL;
}
@@ -2651,12 +2651,12 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
struct bpf_tramp_jit *tjit,
const struct btf_func_model *m,
u32 flags,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
- struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
- struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
- struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_nodes *fmod_ret = &tnodes[BPF_TRAMP_MODIFY_RETURN];
+ struct bpf_tramp_nodes *fentry = &tnodes[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes *fexit = &tnodes[BPF_TRAMP_FEXIT];
int nr_bpf_args, nr_reg_args, nr_stack_args;
int cookie_cnt, cookie_off, fsession_cnt;
struct bpf_jit *jit = &tjit->common;
@@ -2693,8 +2693,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
return -ENOTSUPP;
}
- cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
- fsession_cnt = bpf_fsession_cnt(tlinks);
+ cookie_cnt = bpf_fsession_cookie_cnt(tnodes);
+ fsession_cnt = bpf_fsession_cnt(tnodes);
/*
* Calculate the stack layout.
@@ -2829,7 +2829,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
func_meta, cookie_off))
return -EINVAL;
- if (fmod_ret->nr_links) {
+ if (fmod_ret->nr_nodes) {
/*
* retval = 0;
*/
@@ -2838,8 +2838,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
_EMIT6(0xd707f000 | tjit->retval_off,
0xf000 | tjit->retval_off);
- for (i = 0; i < fmod_ret->nr_links; i++) {
- if (invoke_bpf_prog(tjit, m, fmod_ret->links[i], true))
+ for (i = 0; i < fmod_ret->nr_nodes; i++) {
+ if (invoke_bpf_prog(tjit, m, fmod_ret->nodes[i], true))
return -EINVAL;
/*
@@ -2964,7 +2964,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
}
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *orig_call)
+ struct bpf_tramp_nodes *tnodes, void *orig_call)
{
struct bpf_tramp_image im;
struct bpf_tramp_jit tjit;
@@ -2973,14 +2973,14 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
memset(&tjit, 0, sizeof(tjit));
ret = __arch_prepare_bpf_trampoline(&im, &tjit, m, flags,
- tlinks, orig_call);
+ tnodes, orig_call);
return ret < 0 ? ret : tjit.common.prg;
}
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
void *image_end, const struct btf_func_model *m,
- u32 flags, struct bpf_tramp_links *tlinks,
+ u32 flags, struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
struct bpf_tramp_jit tjit;
@@ -2989,7 +2989,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
/* Compute offsets, check whether the code fits. */
memset(&tjit, 0, sizeof(tjit));
ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags,
- tlinks, func_addr);
+ tnodes, func_addr);
if (ret < 0)
return ret;
@@ -3003,7 +3003,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
tjit.common.prg = 0;
tjit.common.prg_buf = image;
ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags,
- tlinks, func_addr);
+ tnodes, func_addr);
return ret < 0 ? ret : tjit.common.prg;
}
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a0c541a441cf..054e043ffcd2 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3104,15 +3104,15 @@ static void restore_regs(const struct btf_func_model *m, u8 **prog,
}
static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
- struct bpf_tramp_link *l, int stack_size,
+ struct bpf_tramp_node *node, int stack_size,
int run_ctx_off, bool save_ret,
void *image, void *rw_image)
{
u8 *prog = *pprog;
u8 *jmp_insn;
int ctx_cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
- struct bpf_prog *p = l->link.prog;
- u64 cookie = l->cookie;
+ struct bpf_prog *p = node->link->prog;
+ u64 cookie = node->cookie;
/* mov rdi, cookie */
emit_mov_imm64(&prog, BPF_REG_1, (long) cookie >> 32, (u32) (long) cookie);
@@ -3219,7 +3219,7 @@ static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond)
}
static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
- struct bpf_tramp_links *tl, int stack_size,
+ struct bpf_tramp_nodes *tl, int stack_size,
int run_ctx_off, int func_meta_off, bool save_ret,
void *image, void *rw_image, u64 func_meta,
int cookie_off)
@@ -3227,13 +3227,13 @@ static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
int i, cur_cookie = (cookie_off - stack_size) / 8;
u8 *prog = *pprog;
- for (i = 0; i < tl->nr_links; i++) {
- if (tl->links[i]->link.prog->call_session_cookie) {
+ for (i = 0; i < tl->nr_nodes; i++) {
+ if (tl->nodes[i]->link->prog->call_session_cookie) {
emit_store_stack_imm64(&prog, BPF_REG_0, -func_meta_off,
func_meta | (cur_cookie << BPF_TRAMP_COOKIE_INDEX_SHIFT));
cur_cookie--;
}
- if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size,
+ if (invoke_bpf_prog(m, &prog, tl->nodes[i], stack_size,
run_ctx_off, save_ret, image, rw_image))
return -EINVAL;
}
@@ -3242,7 +3242,7 @@ static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
}
static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
- struct bpf_tramp_links *tl, int stack_size,
+ struct bpf_tramp_nodes *tl, int stack_size,
int run_ctx_off, u8 **branches,
void *image, void *rw_image)
{
@@ -3254,8 +3254,8 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
*/
emit_mov_imm32(&prog, false, BPF_REG_0, 0);
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
- for (i = 0; i < tl->nr_links; i++) {
- if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, true,
+ for (i = 0; i < tl->nr_nodes; i++) {
+ if (invoke_bpf_prog(m, &prog, tl->nodes[i], stack_size, run_ctx_off, true,
image, rw_image))
return -EINVAL;
@@ -3346,14 +3346,14 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_image,
void *rw_image_end, void *image,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
int i, ret, nr_regs = m->nr_args, stack_size = 0;
int regs_off, func_meta_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
- struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
- struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
- struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+ struct bpf_tramp_nodes *fentry = &tnodes[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes *fexit = &tnodes[BPF_TRAMP_FEXIT];
+ struct bpf_tramp_nodes *fmod_ret = &tnodes[BPF_TRAMP_MODIFY_RETURN];
void *orig_call = func_addr;
int cookie_off, cookie_cnt;
u8 **branches = NULL;
@@ -3425,7 +3425,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
ip_off = stack_size;
- cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
+ cookie_cnt = bpf_fsession_cookie_cnt(tnodes);
/* room for session cookies */
stack_size += cookie_cnt * 8;
cookie_off = stack_size;
@@ -3518,7 +3518,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
}
}
- if (bpf_fsession_cnt(tlinks)) {
+ if (bpf_fsession_cnt(tnodes)) {
/* clear all the session cookies' value */
for (int i = 0; i < cookie_cnt; i++)
emit_store_stack_imm64(&prog, BPF_REG_0, -cookie_off + 8 * i, 0);
@@ -3526,15 +3526,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
emit_store_stack_imm64(&prog, BPF_REG_0, -8, 0);
}
- if (fentry->nr_links) {
+ if (fentry->nr_nodes) {
if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off, func_meta_off,
flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image,
func_meta, cookie_off))
return -EINVAL;
}
- if (fmod_ret->nr_links) {
- branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
+ if (fmod_ret->nr_nodes) {
+ branches = kcalloc(fmod_ret->nr_nodes, sizeof(u8 *),
GFP_KERNEL);
if (!branches)
return -ENOMEM;
@@ -3573,7 +3573,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
emit_nops(&prog, X86_PATCH_SIZE);
}
- if (fmod_ret->nr_links) {
+ if (fmod_ret->nr_nodes) {
/* From Intel 64 and IA-32 Architectures Optimization
* Reference Manual, 3.4.1.4 Code Alignment, Assembly/Compiler
* Coding Rule 11: All branch targets should be 16-byte
@@ -3583,7 +3583,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
/* Update the branches saved in invoke_bpf_mod_ret with the
* aligned address of do_fexit.
*/
- for (i = 0; i < fmod_ret->nr_links; i++) {
+ for (i = 0; i < fmod_ret->nr_nodes; i++) {
emit_cond_near_jump(&branches[i], image + (prog - (u8 *)rw_image),
image + (branches[i] - (u8 *)rw_image), X86_JNE);
}
@@ -3591,10 +3591,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
/* set the "is_return" flag for fsession */
func_meta |= (1ULL << BPF_TRAMP_IS_RETURN_SHIFT);
- if (bpf_fsession_cnt(tlinks))
+ if (bpf_fsession_cnt(tnodes))
emit_store_stack_imm64(&prog, BPF_REG_0, -func_meta_off, func_meta);
- if (fexit->nr_links) {
+ if (fexit->nr_nodes) {
if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off, func_meta_off,
false, image, rw_image, func_meta, cookie_off)) {
ret = -EINVAL;
@@ -3668,7 +3668,7 @@ int arch_protect_bpf_trampoline(void *image, unsigned int size)
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
void *rw_image, *tmp;
@@ -3683,7 +3683,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
return -ENOMEM;
ret = __arch_prepare_bpf_trampoline(im, rw_image, rw_image + size, image, m,
- flags, tlinks, func_addr);
+ flags, tnodes, func_addr);
if (ret < 0)
goto out;
@@ -3696,7 +3696,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
}
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *func_addr)
+ struct bpf_tramp_nodes *tnodes, void *func_addr)
{
struct bpf_tramp_image im;
void *image;
@@ -3714,7 +3714,7 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
return -ENOMEM;
ret = __arch_prepare_bpf_trampoline(&im, image, image + PAGE_SIZE, image,
- m, flags, tlinks, func_addr);
+ m, flags, tnodes, func_addr);
bpf_jit_free_exec(image);
return ret;
}
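For readers following the invoke_bpf() hunks above: each arch packs a
session cookie slot index into func_meta before calling a program that
uses session cookies, and sets an "is_return" bit before the fexit pass.
A minimal standalone sketch of that packing follows. It uses the shift
values 8 and 63 from include/linux/bpf.h. The helper names and the base
value 3 are hypothetical and only serve the illustration.

```c
#include <stdint.h>

#define BPF_TRAMP_COOKIE_INDEX_SHIFT	8
#define BPF_TRAMP_IS_RETURN_SHIFT	63

/* OR a cookie slot index into the metadata word (hypothetical helper). */
static uint64_t pack_cookie_index(uint64_t func_meta, int cur_cookie)
{
	/* Widen before shifting so the arithmetic is done in 64 bits. */
	return func_meta | ((uint64_t)cur_cookie << BPF_TRAMP_COOKIE_INDEX_SHIFT);
}

/* Set the "is_return" flag used for the fexit side (hypothetical helper). */
static uint64_t mark_is_return(uint64_t func_meta)
{
	return func_meta | (1ULL << BPF_TRAMP_IS_RETURN_SHIFT);
}

/* Entry side: base value 3 (made up), cookie slot 2. */
uint64_t demo_entry_meta(void)
{
	return pack_cookie_index(3, 2);
}

/* Exit side: same word with the is_return bit set first. */
uint64_t demo_exit_meta(void)
{
	return pack_cookie_index(mark_is_return(3), 2);
}
```

The rename from links to nodes does not touch this encoding. Only the
container that is walked to decide whether a program needs a cookie
slot changes.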
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fd0d873219d2..27d13d6c14be 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1250,9 +1250,9 @@ enum {
#define BPF_TRAMP_COOKIE_INDEX_SHIFT 8
#define BPF_TRAMP_IS_RETURN_SHIFT 63
-struct bpf_tramp_links {
- struct bpf_tramp_link *links[BPF_MAX_TRAMP_LINKS];
- int nr_links;
+struct bpf_tramp_nodes {
+ struct bpf_tramp_node *nodes[BPF_MAX_TRAMP_LINKS];
+ int nr_nodes;
};
struct bpf_tramp_run_ctx;
@@ -1280,13 +1280,13 @@ struct bpf_tramp_run_ctx;
struct bpf_tramp_image;
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr);
void *arch_alloc_bpf_trampoline(unsigned int size);
void arch_free_bpf_trampoline(void *image, unsigned int size);
int __must_check arch_protect_bpf_trampoline(void *image, unsigned int size);
int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *func_addr);
+ struct bpf_tramp_nodes *tnodes, void *func_addr);
u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
struct bpf_tramp_run_ctx *run_ctx);
@@ -1470,10 +1470,10 @@ static inline int bpf_dynptr_check_off_len(const struct bpf_dynptr_kern *ptr, u6
}
#ifdef CONFIG_BPF_JIT
-int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
+int bpf_trampoline_link_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog);
-int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
+int bpf_trampoline_unlink_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog);
struct bpf_trampoline *bpf_trampoline_get(u64 key,
@@ -1560,13 +1560,13 @@ bool bpf_insn_is_indirect_target(const struct bpf_verifier_env *env, const struc
int insn_idx);
u16 bpf_out_stack_arg_cnt(const struct bpf_verifier_env *env, const struct bpf_prog *prog);
#else
-static inline int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
+static inline int bpf_trampoline_link_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
return -ENOTSUPP;
}
-static inline int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
+static inline int bpf_trampoline_unlink_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
@@ -1890,12 +1890,17 @@ struct bpf_link_ops {
__poll_t (*poll)(struct file *file, struct poll_table_struct *pts);
};
-struct bpf_tramp_link {
- struct bpf_link link;
+struct bpf_tramp_node {
+ struct bpf_link *link;
struct hlist_node tramp_hlist;
u64 cookie;
};
+struct bpf_tramp_link {
+ struct bpf_link link;
+ struct bpf_tramp_node node;
+};
+
struct bpf_shim_tramp_link {
struct bpf_tramp_link link;
struct bpf_trampoline *trampoline;
@@ -2113,8 +2118,8 @@ void bpf_struct_ops_put(const void *kdata);
int bpf_struct_ops_supported(const struct bpf_struct_ops *st_ops, u32 moff);
int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
void *value);
-int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
- struct bpf_tramp_link *link,
+int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_nodes *tnodes,
+ struct bpf_tramp_node *node,
const struct btf_func_model *model,
void *stub_func,
void **image, u32 *image_off,
@@ -2209,31 +2214,31 @@ static inline void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_op
#endif
-static inline int bpf_fsession_cnt(struct bpf_tramp_links *links)
+static inline int bpf_fsession_cnt(struct bpf_tramp_nodes *nodes)
{
- struct bpf_tramp_links fentries = links[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes fentries = nodes[BPF_TRAMP_FENTRY];
int cnt = 0;
- for (int i = 0; i < links[BPF_TRAMP_FENTRY].nr_links; i++) {
- if (fentries.links[i]->link.prog->expected_attach_type == BPF_TRACE_FSESSION)
+ for (int i = 0; i < nodes[BPF_TRAMP_FENTRY].nr_nodes; i++) {
+ if (fentries.nodes[i]->link->prog->expected_attach_type == BPF_TRACE_FSESSION)
cnt++;
}
return cnt;
}
-static inline bool bpf_prog_calls_session_cookie(struct bpf_tramp_link *link)
+static inline bool bpf_prog_calls_session_cookie(struct bpf_tramp_node *node)
{
- return link->link.prog->call_session_cookie;
+ return node->link->prog->call_session_cookie;
}
-static inline int bpf_fsession_cookie_cnt(struct bpf_tramp_links *links)
+static inline int bpf_fsession_cookie_cnt(struct bpf_tramp_nodes *nodes)
{
- struct bpf_tramp_links fentries = links[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_nodes fentries = nodes[BPF_TRAMP_FENTRY];
int cnt = 0;
- for (int i = 0; i < links[BPF_TRAMP_FENTRY].nr_links; i++) {
- if (bpf_prog_calls_session_cookie(fentries.links[i]))
+ for (int i = 0; i < nodes[BPF_TRAMP_FENTRY].nr_nodes; i++) {
+ if (bpf_prog_calls_session_cookie(fentries.nodes[i]))
cnt++;
}
@@ -2781,6 +2786,9 @@ void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
void bpf_link_init_sleepable(struct bpf_link *link, enum bpf_link_type type,
const struct bpf_link_ops *ops, struct bpf_prog *prog,
enum bpf_attach_type attach_type, bool sleepable);
+void bpf_tramp_link_init(struct bpf_tramp_link *link, enum bpf_link_type type,
+ const struct bpf_link_ops *ops, struct bpf_prog *prog,
+ enum bpf_attach_type attach_type, u64 cookie);
int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
int bpf_link_settle(struct bpf_link_primer *primer);
void bpf_link_cleanup(struct bpf_link_primer *primer);
@@ -3204,6 +3212,12 @@ static inline void bpf_link_init_sleepable(struct bpf_link *link, enum bpf_link_
{
}
+static inline void bpf_tramp_link_init(struct bpf_tramp_link *link, enum bpf_link_type type,
+ const struct bpf_link_ops *ops, struct bpf_prog *prog,
+ enum bpf_attach_type attach_type, u64 cookie)
+{
+}
+
static inline int bpf_link_prime(struct bpf_link *link,
struct bpf_link_primer *primer)
{
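To make the include/linux/bpf.h change above easier to picture: struct
bpf_tramp_link now embeds both a struct bpf_link and a struct
bpf_tramp_node, the node carries a pointer back to the link, and the
arch code walks an array of nodes instead of links. A minimal userspace
sketch of that layout follows. The types are stripped-down stand-ins,
MAX_NODES is a hypothetical replacement for BPF_MAX_TRAMP_LINKS, and the
body of tramp_link_init() is only an assumption about what
bpf_tramp_link_init() does, since its implementation is not part of the
hunks shown here.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct bpf_prog { int call_session_cookie; };
struct bpf_link { struct bpf_prog *prog; };

struct bpf_tramp_node {
	struct bpf_link *link;	/* pointer to the owning link */
	uint64_t cookie;
};

struct bpf_tramp_link {
	struct bpf_link link;
	struct bpf_tramp_node node;
};

#define MAX_NODES 4	/* hypothetical; the kernel uses BPF_MAX_TRAMP_LINKS */

struct bpf_tramp_nodes {
	struct bpf_tramp_node *nodes[MAX_NODES];
	int nr_nodes;
};

/* Assumed behaviour of bpf_tramp_link_init(): wire the node to its link. */
static void tramp_link_init(struct bpf_tramp_link *tl, struct bpf_prog *prog,
			    uint64_t cookie)
{
	tl->link.prog = prog;
	tl->node.link = &tl->link;
	tl->node.cookie = cookie;
}

/* Same walk as bpf_fsession_cookie_cnt(), over a single node array. */
static int count_session_cookie_users(const struct bpf_tramp_nodes *tn)
{
	int cnt = 0;

	for (int i = 0; i < tn->nr_nodes; i++) {
		if (tn->nodes[i]->link->prog->call_session_cookie)
			cnt++;
	}
	return cnt;
}

/* Two links, only one of which uses a session cookie. */
int demo_count_session_cookie_users(void)
{
	struct bpf_prog with_cookie = { .call_session_cookie = 1 };
	struct bpf_prog without_cookie = { .call_session_cookie = 0 };
	struct bpf_tramp_link la, lb;
	struct bpf_tramp_nodes fentry = { .nr_nodes = 0 };

	tramp_link_init(&la, &with_cookie, 0x1234);
	tramp_link_init(&lb, &without_cookie, 0);
	fentry.nodes[fentry.nr_nodes++] = &la.node;
	fentry.nodes[fentry.nr_nodes++] = &lb.node;

	return count_session_cookie_users(&fentry);
}
```

The point of the indirection is visible in the accessors: where the old
code wrote l->link.prog, the new code writes node->link->prog, so a node
no longer has to be the first member of the object that owns the link.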
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 5e51c1211673..51b16e5f5534 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -594,8 +594,8 @@ const struct bpf_link_ops bpf_struct_ops_link_lops = {
.dealloc = bpf_struct_ops_link_dealloc,
};
-int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
- struct bpf_tramp_link *link,
+int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_nodes *tnodes,
+ struct bpf_tramp_node *node,
const struct btf_func_model *model,
void *stub_func,
void **_image, u32 *_image_off,
@@ -605,13 +605,13 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
void *image = *_image;
int size;
- tlinks[BPF_TRAMP_FENTRY].links[0] = link;
- tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
+ tnodes[BPF_TRAMP_FENTRY].nodes[0] = node;
+ tnodes[BPF_TRAMP_FENTRY].nr_nodes = 1;
if (model->ret_size > 0)
flags |= BPF_TRAMP_F_RET_FENTRY_RET;
- size = arch_bpf_trampoline_size(model, flags, tlinks, stub_func);
+ size = arch_bpf_trampoline_size(model, flags, tnodes, stub_func);
if (size <= 0)
return size ? : -EFAULT;
@@ -628,7 +628,7 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
size = arch_prepare_bpf_trampoline(NULL, image + image_off,
image + image_off + size,
- model, flags, tlinks, stub_func);
+ model, flags, tnodes, stub_func);
if (size <= 0) {
if (image != *_image)
bpf_struct_ops_image_free(image);
@@ -693,7 +693,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
const struct btf_type *module_type;
const struct btf_member *member;
const struct btf_type *t = st_ops_desc->type;
- struct bpf_tramp_links *tlinks;
+ struct bpf_tramp_nodes *tnodes;
void *udata, *kdata;
int prog_fd, err;
u32 i, trampoline_start, image_off = 0;
@@ -720,8 +720,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
if (uvalue->common.state || refcount_read(&uvalue->common.refcnt))
return -EINVAL;
- tlinks = kzalloc_objs(*tlinks, BPF_TRAMP_MAX);
- if (!tlinks)
+ tnodes = kzalloc_objs(*tnodes, BPF_TRAMP_MAX);
+ if (!tnodes)
return -ENOMEM;
uvalue = (struct bpf_struct_ops_value *)st_map->uvalue;
@@ -817,8 +817,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
err = -ENOMEM;
goto reset_unlock;
}
- bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS,
- &bpf_struct_ops_link_lops, prog, prog->expected_attach_type);
+ bpf_tramp_link_init(link, BPF_LINK_TYPE_STRUCT_OPS,
+ &bpf_struct_ops_link_lops, prog, prog->expected_attach_type, 0);
+
*plink++ = &link->link;
/* Poison pointer on error instead of return for backward compatibility */
@@ -832,7 +833,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
*pksym++ = ksym;
trampoline_start = image_off;
- err = bpf_struct_ops_prepare_trampoline(tlinks, link,
+ err = bpf_struct_ops_prepare_trampoline(tnodes, &link->node,
&st_ops->func_models[i],
*(void **)(st_ops->cfi_stubs + moff),
&image, &image_off,
@@ -911,7 +912,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
memset(uvalue, 0, map->value_size);
memset(kvalue, 0, map->value_size);
unlock:
- kfree(tlinks);
+ kfree(tnodes);
mutex_unlock(&st_map->lock);
if (!err)
bpf_struct_ops_map_add_ksyms(st_map);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 625a4366fe6d..7bb2271072e9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3263,6 +3263,15 @@ void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
bpf_link_init_sleepable(link, type, ops, prog, attach_type, false);
}
+void bpf_tramp_link_init(struct bpf_tramp_link *link, enum bpf_link_type type,
+ const struct bpf_link_ops *ops, struct bpf_prog *prog,
+ enum bpf_attach_type attach_type, u64 cookie)
+{
+ bpf_link_init(&link->link, type, ops, prog, attach_type);
+ link->node.link = &link->link;
+ link->node.cookie = cookie;
+}
+
static void bpf_link_free_id(int id)
{
if (!id)
@@ -3570,7 +3579,7 @@ static void bpf_tracing_link_release(struct bpf_link *link)
struct bpf_tracing_link *tr_link =
container_of(link, struct bpf_tracing_link, link.link);
- WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link,
+ WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link.node,
tr_link->trampoline,
tr_link->tgt_prog));
@@ -3583,8 +3592,7 @@ static void bpf_tracing_link_release(struct bpf_link *link)
static void bpf_tracing_link_dealloc(struct bpf_link *link)
{
- struct bpf_tracing_link *tr_link =
- container_of(link, struct bpf_tracing_link, link.link);
+ struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link);
kfree(tr_link);
}
@@ -3592,8 +3600,8 @@ static void bpf_tracing_link_dealloc(struct bpf_link *link)
static void bpf_tracing_link_show_fdinfo(const struct bpf_link *link,
struct seq_file *seq)
{
- struct bpf_tracing_link *tr_link =
- container_of(link, struct bpf_tracing_link, link.link);
+ struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link);
+
u32 target_btf_id, target_obj_id;
bpf_trampoline_unpack_key(tr_link->trampoline->key,
@@ -3606,17 +3614,16 @@ static void bpf_tracing_link_show_fdinfo(const struct bpf_link *link,
link->attach_type,
target_obj_id,
target_btf_id,
- tr_link->link.cookie);
+ tr_link->link.node.cookie);
}
static int bpf_tracing_link_fill_link_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
- struct bpf_tracing_link *tr_link =
- container_of(link, struct bpf_tracing_link, link.link);
+ struct bpf_tracing_link *tr_link = container_of(link, struct bpf_tracing_link, link.link);
info->tracing.attach_type = link->attach_type;
- info->tracing.cookie = tr_link->link.cookie;
+ info->tracing.cookie = tr_link->link.node.cookie;
bpf_trampoline_unpack_key(tr_link->trampoline->key,
&info->tracing.target_obj_id,
&info->tracing.target_btf_id);
@@ -3703,9 +3710,9 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
fslink = kzalloc_obj(*fslink, GFP_USER);
if (fslink) {
- bpf_link_init(&fslink->fexit.link, BPF_LINK_TYPE_TRACING,
- &bpf_tracing_link_lops, prog, attach_type);
- fslink->fexit.cookie = bpf_cookie;
+ bpf_tramp_link_init(&fslink->fexit, BPF_LINK_TYPE_TRACING,
+ &bpf_tracing_link_lops, prog, attach_type,
+ bpf_cookie);
link = &fslink->link;
} else {
link = NULL;
@@ -3717,10 +3724,8 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
err = -ENOMEM;
goto out_put_prog;
}
- bpf_link_init(&link->link.link, BPF_LINK_TYPE_TRACING,
- &bpf_tracing_link_lops, prog, attach_type);
-
- link->link.cookie = bpf_cookie;
+ bpf_tramp_link_init(&link->link, BPF_LINK_TYPE_TRACING,
+ &bpf_tracing_link_lops, prog, attach_type, bpf_cookie);
mutex_lock(&prog->aux->dst_mutex);
@@ -3823,7 +3828,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
if (err)
goto out_unlock;
- err = bpf_trampoline_link_prog(&link->link, tr, tgt_prog);
+ err = bpf_trampoline_link_prog(&link->link.node, tr, tgt_prog);
if (err) {
bpf_link_cleanup(&link_primer);
link = NULL;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 701138ef424a..6a45c09fc0d8 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -502,30 +502,29 @@ static const struct bpf_trampoline_ops trampoline_ops = {
.modify_fentry = modify_fentry,
};
-static struct bpf_tramp_links *
+static struct bpf_tramp_nodes *
bpf_trampoline_get_progs(const struct bpf_trampoline *tr, int *total, bool *ip_arg)
{
- struct bpf_tramp_link *link;
- struct bpf_tramp_links *tlinks;
- struct bpf_tramp_link **links;
+ struct bpf_tramp_node *node, **nodes;
+ struct bpf_tramp_nodes *tnodes;
int kind;
*total = 0;
- tlinks = kzalloc_objs(*tlinks, BPF_TRAMP_MAX);
- if (!tlinks)
+ tnodes = kzalloc_objs(*tnodes, BPF_TRAMP_MAX);
+ if (!tnodes)
return ERR_PTR(-ENOMEM);
for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
- tlinks[kind].nr_links = tr->progs_cnt[kind];
+ tnodes[kind].nr_nodes = tr->progs_cnt[kind];
*total += tr->progs_cnt[kind];
- links = tlinks[kind].links;
+ nodes = tnodes[kind].nodes;
- hlist_for_each_entry(link, &tr->progs_hlist[kind], tramp_hlist) {
- *ip_arg |= link->link.prog->call_get_func_ip;
- *links++ = link;
+ hlist_for_each_entry(node, &tr->progs_hlist[kind], tramp_hlist) {
+ *ip_arg |= node->link->prog->call_get_func_ip;
+ *nodes++ = node;
}
}
- return tlinks;
+ return tnodes;
}
static void bpf_tramp_image_free(struct bpf_tramp_image *im)
@@ -673,14 +672,14 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
const struct bpf_trampoline_ops *ops, void *data)
{
struct bpf_tramp_image *im;
- struct bpf_tramp_links *tlinks;
+ struct bpf_tramp_nodes *tnodes;
u32 orig_flags = tr->flags;
bool ip_arg = false;
int err, total, size;
- tlinks = bpf_trampoline_get_progs(tr, &total, &ip_arg);
- if (IS_ERR(tlinks))
- return PTR_ERR(tlinks);
+ tnodes = bpf_trampoline_get_progs(tr, &total, &ip_arg);
+ if (IS_ERR(tnodes))
+ return PTR_ERR(tnodes);
if (total == 0) {
err = ops->unregister_fentry(tr, orig_flags, data);
@@ -690,8 +689,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
- if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
- tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
+ if (tnodes[BPF_TRAMP_FEXIT].nr_nodes ||
+ tnodes[BPF_TRAMP_MODIFY_RETURN].nr_nodes) {
/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
* should not be set together.
*/
@@ -722,7 +721,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
#endif
size = arch_bpf_trampoline_size(&tr->func.model, tr->flags,
- tlinks, tr->func.addr);
+ tnodes, tr->func.addr);
if (size < 0) {
err = size;
goto out;
@@ -740,7 +739,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
}
err = arch_prepare_bpf_trampoline(im, im->image, im->image + size,
- &tr->func.model, tr->flags, tlinks,
+ &tr->func.model, tr->flags, tnodes,
tr->func.addr);
if (err < 0)
goto out_free;
@@ -774,7 +773,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
/* If any error happens, restore previous flags */
if (err)
tr->flags = orig_flags;
- kfree(tlinks);
+ kfree(tnodes);
return err;
}
@@ -821,15 +820,15 @@ static int bpf_freplace_check_tgt_prog(struct bpf_prog *tgt_prog)
}
static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
- struct bpf_tramp_link *link,
+ struct bpf_tramp_node *node,
int cnt)
{
struct bpf_fsession_link *fslink = NULL;
enum bpf_tramp_prog_type kind;
- struct bpf_tramp_link *link_exiting;
+ struct bpf_tramp_node *node_existing;
struct hlist_head *prog_list;
- kind = bpf_attach_type_to_tramp(link->link.prog);
+ kind = bpf_attach_type_to_tramp(node->link->prog);
if (kind == BPF_TRAMP_FSESSION) {
prog_list = &tr->progs_hlist[BPF_TRAMP_FENTRY];
cnt++;
@@ -838,21 +837,21 @@ static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
}
if (cnt >= BPF_MAX_TRAMP_LINKS)
return -E2BIG;
- if (!hlist_unhashed(&link->tramp_hlist))
+ if (!hlist_unhashed(&node->tramp_hlist))
/* prog already linked */
return -EBUSY;
- hlist_for_each_entry(link_exiting, prog_list, tramp_hlist) {
- if (link_exiting->link.prog != link->link.prog)
+ hlist_for_each_entry(node_existing, prog_list, tramp_hlist) {
+ if (node_existing->link->prog != node->link->prog)
continue;
/* prog already linked */
return -EBUSY;
}
- hlist_add_head(&link->tramp_hlist, prog_list);
+ hlist_add_head(&node->tramp_hlist, prog_list);
if (kind == BPF_TRAMP_FSESSION) {
tr->progs_cnt[BPF_TRAMP_FENTRY]++;
- fslink = container_of(link, struct bpf_fsession_link, link.link);
- hlist_add_head(&fslink->fexit.tramp_hlist, &tr->progs_hlist[BPF_TRAMP_FEXIT]);
+ fslink = container_of(node, struct bpf_fsession_link, link.link.node);
+ hlist_add_head(&fslink->fexit.node.tramp_hlist, &tr->progs_hlist[BPF_TRAMP_FEXIT]);
tr->progs_cnt[BPF_TRAMP_FEXIT]++;
} else {
tr->progs_cnt[kind]++;
@@ -861,23 +860,23 @@ static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
}
static void bpf_trampoline_remove_prog(struct bpf_trampoline *tr,
- struct bpf_tramp_link *link)
+ struct bpf_tramp_node *node)
{
struct bpf_fsession_link *fslink;
enum bpf_tramp_prog_type kind;
- kind = bpf_attach_type_to_tramp(link->link.prog);
+ kind = bpf_attach_type_to_tramp(node->link->prog);
if (kind == BPF_TRAMP_FSESSION) {
- fslink = container_of(link, struct bpf_fsession_link, link.link);
- hlist_del_init(&fslink->fexit.tramp_hlist);
+ fslink = container_of(node, struct bpf_fsession_link, link.link.node);
+ hlist_del_init(&fslink->fexit.node.tramp_hlist);
tr->progs_cnt[BPF_TRAMP_FEXIT]--;
kind = BPF_TRAMP_FENTRY;
}
- hlist_del_init(&link->tramp_hlist);
+ hlist_del_init(&node->tramp_hlist);
tr->progs_cnt[kind]--;
}
-static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
+static int __bpf_trampoline_link_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog,
const struct bpf_trampoline_ops *ops,
@@ -887,7 +886,7 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
int err = 0;
int cnt = 0, i;
- kind = bpf_attach_type_to_tramp(link->link.prog);
+ kind = bpf_attach_type_to_tramp(node->link->prog);
if (tr->extension_prog)
/* cannot attach fentry/fexit if extension prog is attached.
* cannot overwrite extension prog either.
@@ -904,33 +903,33 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
err = bpf_freplace_check_tgt_prog(tgt_prog);
if (err)
return err;
- tr->extension_prog = link->link.prog;
+ tr->extension_prog = node->link->prog;
return bpf_arch_text_poke(tr->func.addr, BPF_MOD_NOP,
BPF_MOD_JUMP, NULL,
- link->link.prog->bpf_func);
+ node->link->prog->bpf_func);
}
- err = bpf_trampoline_add_prog(tr, link, cnt);
+ err = bpf_trampoline_add_prog(tr, node, cnt);
if (err)
return err;
err = bpf_trampoline_update(tr, true /* lock_direct_mutex */, ops, data);
if (err)
- bpf_trampoline_remove_prog(tr, link);
+ bpf_trampoline_remove_prog(tr, node);
return err;
}
-int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
+int bpf_trampoline_link_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
int err;
trampoline_lock(tr);
- err = __bpf_trampoline_link_prog(link, tr, tgt_prog, &trampoline_ops, NULL);
+ err = __bpf_trampoline_link_prog(node, tr, tgt_prog, &trampoline_ops, NULL);
trampoline_unlock(tr);
return err;
}
-static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
+static int __bpf_trampoline_unlink_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog,
const struct bpf_trampoline_ops *ops,
@@ -939,7 +938,7 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
enum bpf_tramp_prog_type kind;
int err;
- kind = bpf_attach_type_to_tramp(link->link.prog);
+ kind = bpf_attach_type_to_tramp(node->link->prog);
if (kind == BPF_TRAMP_REPLACE) {
WARN_ON_ONCE(!tr->extension_prog);
err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP,
@@ -950,19 +949,19 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
tgt_prog->aux->is_extended = false;
return err;
}
- bpf_trampoline_remove_prog(tr, link);
+ bpf_trampoline_remove_prog(tr, node);
return bpf_trampoline_update(tr, true /* lock_direct_mutex */, ops, data);
}
/* bpf_trampoline_unlink_prog() should never fail. */
-int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
+int bpf_trampoline_unlink_prog(struct bpf_tramp_node *node,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
int err;
trampoline_lock(tr);
- err = __bpf_trampoline_unlink_prog(link, tr, tgt_prog, &trampoline_ops, NULL);
+ err = __bpf_trampoline_unlink_prog(node, tr, tgt_prog, &trampoline_ops, NULL);
trampoline_unlock(tr);
return err;
}
@@ -977,7 +976,7 @@ static void bpf_shim_tramp_link_release(struct bpf_link *link)
if (!shim_link->trampoline)
return;
- WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link, shim_link->trampoline, NULL));
+ WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link.node, shim_link->trampoline, NULL));
bpf_trampoline_put(shim_link->trampoline);
}
@@ -1023,8 +1022,8 @@ static struct bpf_shim_tramp_link *cgroup_shim_alloc(const struct bpf_prog *prog
p->type = BPF_PROG_TYPE_LSM;
p->expected_attach_type = BPF_LSM_MAC;
bpf_prog_inc(p);
- bpf_link_init(&shim_link->link.link, BPF_LINK_TYPE_UNSPEC,
- &bpf_shim_tramp_link_lops, p, attach_type);
+ bpf_tramp_link_init(&shim_link->link, BPF_LINK_TYPE_UNSPEC,
+ &bpf_shim_tramp_link_lops, p, attach_type, 0);
bpf_cgroup_atype_get(p->aux->attach_btf_id, cgroup_atype);
return shim_link;
@@ -1033,15 +1032,15 @@ static struct bpf_shim_tramp_link *cgroup_shim_alloc(const struct bpf_prog *prog
static struct bpf_shim_tramp_link *cgroup_shim_find(struct bpf_trampoline *tr,
bpf_func_t bpf_func)
{
- struct bpf_tramp_link *link;
+ struct bpf_tramp_node *node;
int kind;
for (kind = 0; kind < BPF_TRAMP_MAX; kind++) {
- hlist_for_each_entry(link, &tr->progs_hlist[kind], tramp_hlist) {
- struct bpf_prog *p = link->link.prog;
+ hlist_for_each_entry(node, &tr->progs_hlist[kind], tramp_hlist) {
+ struct bpf_prog *p = node->link->prog;
if (p->bpf_func == bpf_func)
- return container_of(link, struct bpf_shim_tramp_link, link);
+ return container_of(node, struct bpf_shim_tramp_link, link.node);
}
}
@@ -1091,7 +1090,7 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
goto err;
}
- err = __bpf_trampoline_link_prog(&shim_link->link, tr, NULL, &trampoline_ops, NULL);
+ err = __bpf_trampoline_link_prog(&shim_link->link.node, tr, NULL, &trampoline_ops, NULL);
if (err)
goto err;
@@ -1406,7 +1405,7 @@ bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog)
int __weak
arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks,
+ struct bpf_tramp_nodes *tnodes,
void *func_addr)
{
return -ENOTSUPP;
@@ -1440,7 +1439,7 @@ int __weak arch_protect_bpf_trampoline(void *image, unsigned int size)
}
int __weak arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
- struct bpf_tramp_links *tlinks, void *func_addr)
+ struct bpf_tramp_nodes *tnodes, void *func_addr)
{
return -ENOTSUPP;
}
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index ae5a54c350b9..191a6b3ee254 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -132,7 +132,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
const struct bpf_struct_ops *st_ops = &bpf_bpf_dummy_ops;
const struct btf_type *func_proto;
struct bpf_dummy_ops_test_args *args;
- struct bpf_tramp_links *tlinks = NULL;
+ struct bpf_tramp_nodes *tnodes = NULL;
struct bpf_tramp_link *link = NULL;
void *image = NULL;
unsigned int op_idx;
@@ -158,8 +158,8 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
if (err)
goto out;
- tlinks = kzalloc_objs(*tlinks, BPF_TRAMP_MAX);
- if (!tlinks) {
+ tnodes = kzalloc_objs(*tnodes, BPF_TRAMP_MAX);
+ if (!tnodes) {
err = -ENOMEM;
goto out;
}
@@ -171,11 +171,11 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
}
/* prog doesn't take the ownership of the reference from caller */
bpf_prog_inc(prog);
- bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_link_lops, prog,
- prog->expected_attach_type);
+ bpf_tramp_link_init(link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_link_lops,
+ prog, prog->expected_attach_type, 0);
op_idx = prog->expected_attach_type;
- err = bpf_struct_ops_prepare_trampoline(tlinks, link,
+ err = bpf_struct_ops_prepare_trampoline(tnodes, &link->node,
&st_ops->func_models[op_idx],
&dummy_ops_test_ret_function,
&image, &image_off,
@@ -198,7 +198,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
bpf_struct_ops_image_free(image);
if (link)
bpf_link_put(&link->link);
- kfree(tlinks);
+ kfree(tnodes);
return err;
}
--
2.54.0
* [PATCHv7 bpf-next 09/29] bpf: Factor fsession link to use struct bpf_tramp_node
From: Jiri Olsa @ 2026-06-03 11:05 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260603110554.29590-1-jolsa@kernel.org>
Now that the trampoline attachment object (bpf_tramp_node) is split from
the link object (bpf_tramp_link), we can use bpf_tramp_node as the fsession's
fexit attachment object and get rid of the bpf_fsession_link object.
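
As a rough illustration of the resulting layout (a userspace sketch, not
the kernel code): the struct names follow this patch, but the fields are
trimmed to the minimum, and tracing_link_init() and node_to_tracing_link()
are hypothetical helpers introduced only for this example. The fexit node
is embedded directly in bpf_tracing_link and points back at the same
bpf_link as the primary node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins: the real kernel structs carry many more fields. */
struct bpf_link { int id; };

struct bpf_tramp_node {
	struct bpf_link *link;	/* back-pointer to the owning link */
	uint64_t cookie;
};

struct bpf_tramp_link {
	struct bpf_link link;
	struct bpf_tramp_node node;
};

struct bpf_tracing_link {
	struct bpf_tramp_link link;
	struct bpf_tramp_node fexit;	/* only initialized for fsession */
};

/* Hypothetical helper: mirrors the init sequence in the attach path. */
static void tracing_link_init(struct bpf_tracing_link *tl, int id,
			      uint64_t cookie, int is_fsession)
{
	tl->link.link.id = id;
	tl->link.node.link = &tl->link.link;
	tl->link.node.cookie = cookie;
	if (is_fsession) {
		/* The fexit node shares the link and cookie of the entry node. */
		tl->fexit.link = &tl->link.link;
		tl->fexit.cookie = cookie;
	}
}

/* Hypothetical helper: recover the tracing link from its primary node. */
static struct bpf_tracing_link *node_to_tracing_link(struct bpf_tramp_node *node)
{
	assert(node != NULL);
	return container_of(node, struct bpf_tracing_link, link.node);
}
```

With this layout a single allocation covers both attachment points, so
the trampoline code can reach the fexit node from the entry node with one
container_of() on bpf_tracing_link instead of a separate wrapper struct.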
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/bpf.h | 6 +-----
kernel/bpf/syscall.c | 21 ++++++---------------
kernel/bpf/trampoline.c | 12 ++++++------
3 files changed, 13 insertions(+), 26 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 27d13d6c14be..4764b4aa7081 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1908,15 +1908,11 @@ struct bpf_shim_tramp_link {
struct bpf_tracing_link {
struct bpf_tramp_link link;
+ struct bpf_tramp_node fexit;
struct bpf_trampoline *trampoline;
struct bpf_prog *tgt_prog;
};
-struct bpf_fsession_link {
- struct bpf_tracing_link link;
- struct bpf_tramp_link fexit;
-};
-
struct bpf_raw_tp_link {
struct bpf_link link;
struct bpf_raw_event_map *btp;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7bb2271072e9..f308ebdab750 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3705,21 +3705,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
key = bpf_trampoline_compute_key(tgt_prog, NULL, btf_id);
}
- if (prog->expected_attach_type == BPF_TRACE_FSESSION) {
- struct bpf_fsession_link *fslink;
-
- fslink = kzalloc_obj(*fslink, GFP_USER);
- if (fslink) {
- bpf_tramp_link_init(&fslink->fexit, BPF_LINK_TYPE_TRACING,
- &bpf_tracing_link_lops, prog, attach_type,
- bpf_cookie);
- link = &fslink->link;
- } else {
- link = NULL;
- }
- } else {
- link = kzalloc_obj(*link, GFP_USER);
- }
+ link = kzalloc_obj(*link, GFP_USER);
if (!link) {
err = -ENOMEM;
goto out_put_prog;
@@ -3727,6 +3713,11 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
bpf_tramp_link_init(&link->link, BPF_LINK_TYPE_TRACING,
&bpf_tracing_link_lops, prog, attach_type, bpf_cookie);
+ if (prog->expected_attach_type == BPF_TRACE_FSESSION) {
+ link->fexit.link = &link->link.link;
+ link->fexit.cookie = bpf_cookie;
+ }
+
mutex_lock(&prog->aux->dst_mutex);
/* There are a few possible cases here:
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 6a45c09fc0d8..5776d2b8e36e 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -823,7 +823,7 @@ static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
struct bpf_tramp_node *node,
int cnt)
{
- struct bpf_fsession_link *fslink = NULL;
+ struct bpf_tracing_link *tr_link = NULL;
enum bpf_tramp_prog_type kind;
struct bpf_tramp_node *node_existing;
struct hlist_head *prog_list;
@@ -850,8 +850,8 @@ static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
hlist_add_head(&node->tramp_hlist, prog_list);
if (kind == BPF_TRAMP_FSESSION) {
tr->progs_cnt[BPF_TRAMP_FENTRY]++;
- fslink = container_of(node, struct bpf_fsession_link, link.link.node);
- hlist_add_head(&fslink->fexit.node.tramp_hlist, &tr->progs_hlist[BPF_TRAMP_FEXIT]);
+ tr_link = container_of(node, struct bpf_tracing_link, link.node);
+ hlist_add_head(&tr_link->fexit.tramp_hlist, &tr->progs_hlist[BPF_TRAMP_FEXIT]);
tr->progs_cnt[BPF_TRAMP_FEXIT]++;
} else {
tr->progs_cnt[kind]++;
@@ -862,13 +862,13 @@ static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
static void bpf_trampoline_remove_prog(struct bpf_trampoline *tr,
struct bpf_tramp_node *node)
{
- struct bpf_fsession_link *fslink;
+ struct bpf_tracing_link *tr_link;
enum bpf_tramp_prog_type kind;
kind = bpf_attach_type_to_tramp(node->link->prog);
if (kind == BPF_TRAMP_FSESSION) {
- fslink = container_of(node, struct bpf_fsession_link, link.link.node);
- hlist_del_init(&fslink->fexit.node.tramp_hlist);
+ tr_link = container_of(node, struct bpf_tracing_link, link.node);
+ hlist_del_init(&tr_link->fexit.tramp_hlist);
tr->progs_cnt[BPF_TRAMP_FEXIT]--;
kind = BPF_TRAMP_FENTRY;
}
--
2.54.0