* [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
@ 2025-01-16 12:49 Theodore Ts'o
2025-01-16 16:53 ` Al Viro
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Theodore Ts'o @ 2025-01-16 12:49 UTC (permalink / raw)
To: lsf-pc; +Cc: Linux Filesystem Development List, bpf
Historically, we have avoided adding tracepoints to the VFS because of
concerns that tracepoints would be considered a userspace-level
interface, and would therefore potentially constrain our ability to
improve an interface which has been extremely performance critical.
I'd like to discuss whether in 2025, it's time to reconsider our
reticence in adding tracepoints in the VFS layer. First, while there
has been a single incident of a tracepoint being used by programs that
were distributed far and wide (powertop) such that we had to revert a
change to a tracepoint that broke it --- that was *14* years ago,
in 2011. Across multiple other subsystems, many of
which have added an extensive number of tracepoints, there has been
only a single problem in over a decade, so I'd like to suggest that
this concern may not have been as serious as we had first
thought.
In practice, most tracepoints are used by system administrators and
they have to deal with enough changes that break backwards
compatibility (e.g., bash 3 -> bash 4, bash 4 -> bash 5, python 2.7 ->
python 3, etc.) that the ones who really care end up using an
enterprise distribution, which goes to extreme lengths to maintain the
stable ABI nonsense. Maintaining tracepoints shouldn't be a big deal
for them.
Secondly, we've had a very long time to let the dentry interface
mature, and so (a) the fundamental architecture of the dcache hasn't
been changing as much in the past few years, and (b) we should have
enough understanding of the interface to understand where we could put
tracepoints (e.g., close to the syscall interface) which would make it
much less likely that there would be any need to make
backwards-incompatible changes to tracepoints.
The benefits of this would be to make it much easier for users,
developers, and kernel developers to use BPF to probe file
system-related activities. Today, people who want to do these sorts
of things need to use fs-specific tracepoints (for example, ext4 has a
very large number of tracepoints which can be used for this purpose)
but this locks users into a single file system and makes it harder for
them to switch to a different file system, or if they want to use
different file systems for different use cases.
I'd like to propose that we experiment with adding tracepoints in
early 2025, so that at the end of the year the year-end 2025 LTS
kernels will have tracepoints that we are confident will be fit for
purpose for BPF users.
Thanks,
- Ted
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 12:49 [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs? Theodore Ts'o
@ 2025-01-16 16:53 ` Al Viro
2025-01-16 17:29 ` [Lsf-pc] " Jan Kara
2025-01-16 17:20 ` Jan Kara
2025-01-16 21:18 ` Dave Chinner
2 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2025-01-16 16:53 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: lsf-pc, Linux Filesystem Development List, bpf
On Thu, Jan 16, 2025 at 07:49:49AM -0500, Theodore Ts'o wrote:
> Secondly, we've had a very long time to let the dentry interface
> mature, and so (a) the fundamental architecture of the dcache hasn't
> been changing as much in the past few years, and (b) we should have
> enough understanding of the interface to understand where we could put
> tracepoints (e.g., close to the syscall interface) which would make it
> much less likely that there would be any need to make
> backwards-incompatible changes to tracepoints.
FWIW, earlier this week I'd been going through the piles of tracepoints
playing with ->d_name. Mature interface or not, they do manage to
fuck that up...
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 12:49 [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs? Theodore Ts'o
2025-01-16 16:53 ` Al Viro
@ 2025-01-16 17:20 ` Jan Kara
2025-01-20 15:43 ` Christian Brauner
2025-01-16 21:18 ` Dave Chinner
2 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2025-01-16 17:20 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: lsf-pc, Linux Filesystem Development List, bpf
On Thu 16-01-25 07:49:49, Theodore Ts'o wrote:
> Historically, we have avoided adding tracepoints to the VFS because of
> concerns that tracepoints would be considered a userspace-level
> interface, and would therefore potentially constrain our ability to
> improve an interface which has been extremely performance critical.
>
> I'd like to discuss whether in 2025, it's time to reconsider our
> reticence in adding tracepoints in the VFS layer. First, while there
> has been a single incident of a tracepoint being used by programs that
> were distributed far and wide (powertop) such that we had to revert a
> change to a tracepoint that broke it --- that was *14* years ago,
> in 2011. Across multiple other subsystems, many of
> which have added an extensive number of tracepoints, there has been
> only a single problem in over a decade, so I'd like to suggest that
> this concern may not have been as serious as we had first
> thought.
>
> In practice, most tracepoints are used by system administrators and
> they have to deal with enough changes that break backwards
> compatibility (e.g., bash 3 -> bash 4, bash 4 -> bash 5, python 2.7 ->
> python 3, etc.) that the ones who really care end up using an
> enterprise distribution, which goes to extreme lengths to maintain the
> stable ABI nonsense. Maintaining tracepoints shouldn't be a big deal
> for them.
>
> Secondly, we've had a very long time to let the dentry interface
> mature, and so (a) the fundamental architecture of the dcache hasn't
> been changing as much in the past few years, and (b) we should have
> enough understanding of the interface to understand where we could put
> tracepoints (e.g., close to the syscall interface) which would make it
> much less likely that there would be any need to make
> backwards-incompatible changes to tracepoints.
>
> The benefits of this would be to make it much easier for users,
> developers, and kernel developers to use BPF to probe file
> system-related activities. Today, people who want to do these sorts
> of things need to use fs-specific tracepoints (for example, ext4 has a
> very large number of tracepoints which can be used for this purpose)
> but this locks users into a single file system and makes it harder for
> them to switch to a different file system, or if they want to use
> different file systems for different use cases.
>
> I'd like to propose that we experiment with adding tracepoints in
> early 2025, so that at the end of the year the year-end 2025 LTS
> kernels will have tracepoints that we are confident will be fit for
> purpose for BPF users.
So I personally have nothing against tracepoints in VFS. Occasionally they
are useful and so far userspace has pretty much accepted the fact that
they are a moving target. That being said with BPF and all the tooling
around it (bcc, bpftrace) userspace has in my experience very much adapted
to just attaching BPF programs to random functions through kprobes so they
are not even relying that much on tracepoints anymore. Just look through
bcc scripts collection... I have myself adapted to the lack of tracepoints
in VFS by just using kprobes. The learning curve is a bit steeper but after
that it's not a big deal. I'm watching with a bit of concern developments
like BTF which try to provide some illusion of stability where there isn't
much of it. So some tool could spread wide enough without getting regularly
broken that breaking it will become a problem. But that is not really the
topic of this discussion.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 16:53 ` Al Viro
@ 2025-01-16 17:29 ` Jan Kara
0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2025-01-16 17:29 UTC (permalink / raw)
To: Al Viro; +Cc: Theodore Ts'o, lsf-pc, Linux Filesystem Development List, bpf
On Thu 16-01-25 16:53:21, Al Viro wrote:
> On Thu, Jan 16, 2025 at 07:49:49AM -0500, Theodore Ts'o wrote:
>
> > Secondly, we've had a very long time to let the dentry interface
> > mature, and so (a) the fundamental architecture of the dcache hasn't
> > been changing as much in the past few years, and (b) we should have
> > enough understanding of the interface to understand where we could put
> > tracepoints (e.g., close to the syscall interface) which would make it
> > much less likely that there would be any need to make
> > backwards-incompatible changes to tracepoints.
>
> FWIW, earlier this week I'd been going through the piles of tracepoints
> playing with ->d_name. Mature interface or not, they do manage to
> fuck that up...
Well, tracepoints are like any other rarely executed kernel code. The bugs
do accumulate there with higher probability due to lack of testing. But I
guess that's not strong enough reason to refuse them.
I remember you were refusing tracepoints in VFS in the past on the grounds
that it could make code changes harder due to concerns of breaking
tracepoint users. That is a fair concern but I guess it is also a fair
question whether we shouldn't reconsider this decision given how the rest
of the Linux kernel and the tracing ecosystem around it evolves...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 12:49 [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs? Theodore Ts'o
2025-01-16 16:53 ` Al Viro
2025-01-16 17:20 ` Jan Kara
@ 2025-01-16 21:18 ` Dave Chinner
2025-01-16 21:43 ` Andrii Nakryiko
2025-01-18 3:07 ` Daniel Xu
2 siblings, 2 replies; 13+ messages in thread
From: Dave Chinner @ 2025-01-16 21:18 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: lsf-pc, Linux Filesystem Development List, bpf
On Thu, Jan 16, 2025 at 07:49:49AM -0500, Theodore Ts'o wrote:
> Historically, we have avoided adding tracepoints to the VFS because of
> concerns that tracepoints would be considered a userspace-level
> interface, and would therefore potentially constrain our ability to
> improve an interface which has been extremely performance critical.
Yes, the lack of tracepoints in the VFS is a fairly significant
issue when it comes to runtime debugging of production systems...
> I'd like to discuss whether in 2025, it's time to reconsider our
> reticence in adding tracepoints in the VFS layer. First, while there
> has been a single incident of a tracepoint being used by programs that
> were distributed far and wide (powertop) such that we had to revert a
> change to a tracepoint that broke it --- that was *14* years ago,
> in 2011.
Yes, that was a big mistake in multiple ways. Firstly, the app using
a tracepoint in this way. The second mistake was the response that
"tracepoints should be stable API" based on the abuse of a single
tracepoint.
We had extensive tracepoint coverage in subsystems *before* this
happened. In XFS, we had already converted hundreds of existing
debug-build-only tracing calls to use tracepoints based on the
understanding that tracepoints were *not* considered stable user
interfaces.
The fact that existing subsystem tracepoints already exposed the
internal implementation of objects like struct inode, struct file,
superblocks, etc simply wasn't considered when tracepoints were
declared "stable".
The fact is that it is simply not possible to maintain any sort of
useful introspection with the tracepoint infrastructure without
exposing internal implementation details that can change from kernel
to kernel.
> Across multiple other subsystems, many of
> which have added an extensive number of tracepoints, there has been
> only a single problem in over a decade, so I'd like to suggest that
> this concern may not have been as serious as we had first
> thought.
Yes, these subsystems still operate under the "tracepoints are not
stable" understanding. The reality is that userspace has *never*
been able to rely on tracepoints being stable across multiple kernel
releases, regardless of what anyone else (including Linus) says is
the policy.
> I'd like to propose that we experiment with adding tracepoints in
> early 2025, so that at the end of the year the year-end 2025 LTS
> kernels will have tracepoints that we are confident will be fit for
> purpose for BPF users.
Why does BPF even need tracepoints? BPF code should be using kprobes
to hook into the running kernel to monitor it, yes?
Regardless of BPF, why not just send patches to add the tracepoints
you want?
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 21:18 ` Dave Chinner
@ 2025-01-16 21:43 ` Andrii Nakryiko
2025-01-17 2:20 ` Al Viro
2025-01-20 15:42 ` Christian Brauner
2025-01-18 3:07 ` Daniel Xu
1 sibling, 2 replies; 13+ messages in thread
From: Andrii Nakryiko @ 2025-01-16 21:43 UTC (permalink / raw)
To: Dave Chinner
Cc: Theodore Ts'o, lsf-pc, Linux Filesystem Development List, bpf
On Thu, Jan 16, 2025 at 1:18 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Jan 16, 2025 at 07:49:49AM -0500, Theodore Ts'o wrote:
> > Historically, we have avoided adding tracepoints to the VFS because of
> > concerns that tracepoints would be considered a userspace-level
> > interface, and would therefore potentially constrain our ability to
> > improve an interface which has been extremely performance critical.
>
> Yes, the lack of tracepoints in the VFS is a fairly significant
> issue when it comes to runtime debugging of production systems...
>
> > I'd like to discuss whether in 2025, it's time to reconsider our
> > reticence in adding tracepoints in the VFS layer. First, while there
> > has been a single incident of a tracepoint being used by programs that
> > were distributed far and wide (powertop) such that we had to revert a
> > change to a tracepoint that broke it --- that was *14* years ago,
> > in 2011.
>
> Yes, that was a big mistake in multiple ways. Firstly, the app using
> a tracepoint in this way. The second mistake was the response that
> "tracepoints should be stable API" based on the abuse of a single
> tracepoint.
>
> We had extensive tracepoint coverage in subsystems *before* this
> happened. In XFS, we had already converted hundreds of existing
> debug-build-only tracing calls to use tracepoints based on the
> understanding that tracepoints were *not* considered stable user
> interfaces.
>
> The fact that existing subsystem tracepoints already exposed the
> internal implementation of objects like struct inode, struct file,
> superblocks, etc simply wasn't considered when tracepoints were
> declared "stable".
>
> The fact is that it is simply not possible to maintain any sort of
> useful introspection with the tracepoint infrastructure without
> exposing internal implementation details that can change from kernel
> to kernel.
>
> > Across multiple other subsystems, many of
> > which have added an extensive number of tracepoints, there has been
> > only a single problem in over a decade, so I'd like to suggest that
> > this concern may not have been as serious as we had first
> > thought.
>
> Yes, these subsystems still operate under the "tracepoints are not
> stable" understanding. The reality is that userspace has *never*
> been able to rely on tracepoints being stable across multiple kernel
> releases, regardless of what anyone else (including Linus) says is
> the policy.
>
> > I'd like to propose that we experiment with adding tracepoints in
> > early 2025, so that at the end of the year the year-end 2025 LTS
> > kernels will have tracepoints that we are confident will be fit for
> > purpose for BPF users.
>
> Why does BPF even need tracepoints? BPF code should be using kprobes
> to hook into the running kernel to monitor it, yes?
This is way more nuanced than that. There are at least a few
advantages that tracepoints have over kprobes, even if both are usable
(and useful) with BPF:
- functions one wants to kprobe very often get inlined by the compiler
(especially static functions), making them unusable as kprobe targets
(kprobing inlined functions comes with a huge set of additional hurdles
and problems, which we don't have to go into here). This is probably
the biggest issue in practice, and the one where tracepoints are way-way
better.
- raw performance: tracepoints are *significantly* faster than
kprobes (like 2-3x less overhead, [0])
- relative stability of tracepoints in terms of naming, semantics,
arguments. While not stable APIs, tracepoints are "more stable" in
practice due to more deliberate and strategic placement (usually), so
they tend to get renamed or changed much less frequently.
So, as far as BPF is concerned, tracepoints are still preferable to
kprobes for something like VFS, and just because BPF can be used with
kprobes easily doesn't mean BPF users don't need useful tracepoints.
[0] https://patchwork.kernel.org/project/netdevbpf/patch/20240326162151.3981687-3-andrii@kernel.org/
>
> Regardless of BPF, why not just send patches to add the tracepoints
> you want?
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 21:43 ` Andrii Nakryiko
@ 2025-01-17 2:20 ` Al Viro
2025-01-17 18:33 ` Andrii Nakryiko
2025-01-20 15:42 ` Christian Brauner
1 sibling, 1 reply; 13+ messages in thread
From: Al Viro @ 2025-01-17 2:20 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Dave Chinner, Theodore Ts'o, lsf-pc,
Linux Filesystem Development List, bpf
On Thu, Jan 16, 2025 at 01:43:39PM -0800, Andrii Nakryiko wrote:
> - relative stability of tracepoints in terms of naming, semantics,
> arguments. While not stable APIs, tracepoints are "more stable" in
> practice due to more deliberate and strategic placement (usually), so
> they tend to get renamed or changed much less frequently.
>
> So, as far as BPF is concerned, tracepoints are still preferable to
> kprobes for something like VFS, and just because BPF can be used with
> kprobes easily doesn't mean BPF users don't need useful tracepoints.
The problem is, exact same reasons invite their use by LSM-in-BPF and
similar projects, and once that happens, the rules regarding stability
will bite and bite _hard_.
And from what I've seen from the same LSM-in-BPF folks, it won't stay
within relatively stable areas - not for long, anyway.
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-17 2:20 ` Al Viro
@ 2025-01-17 18:33 ` Andrii Nakryiko
0 siblings, 0 replies; 13+ messages in thread
From: Andrii Nakryiko @ 2025-01-17 18:33 UTC (permalink / raw)
To: Al Viro
Cc: Dave Chinner, Theodore Ts'o, lsf-pc,
Linux Filesystem Development List, bpf
On Thu, Jan 16, 2025 at 6:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Jan 16, 2025 at 01:43:39PM -0800, Andrii Nakryiko wrote:
>
> > - relative stability of tracepoints in terms of naming, semantics,
> > arguments. While not stable APIs, tracepoints are "more stable" in
> > practice due to more deliberate and strategic placement (usually), so
> > they tend to get renamed or changed much less frequently.
> >
> > So, as far as BPF is concerned, tracepoints are still preferable to
> > kprobes for something like VFS, and just because BPF can be used with
> > kprobes easily doesn't mean BPF users don't need useful tracepoints.
>
> The problem is, exact same reasons invite their use by LSM-in-BPF and
> similar projects, and once that happens, the rules regarding stability
> will bite and bite _hard_.
Not clear what you mean by "their use"... Use of tracepoint by
LSM-in-BPF? Sure, to augment information gathering, perhaps, if there
is no more suitable LSM hook. But tracepoints don't allow you to make
decisions, that's the biggest difference between LSM hooks and
tracepoints from BPF POV (IMO): LSMs allow decision making,
tracepoints are read-only.
Or you mean use of LSM hooks by BPF because they are more stable
semantically? If so, yes, sure, that's a good property. Still, neither
tracepoint nor BPF LSM hooks are truly stable APIs, and users are
prepared and expected to work around that.
So, again, from BPF and BPF users' POV, neither tracepoint nor LSM
provides or guarantees API stability (though, in practice, they are,
thankfully, pretty semantically stable, which reduces the amount of
pain, of course).
>
> And from what I've seen from the same LSM-in-BPF folks, it won't stay
> within relatively stable areas - not for long, anyway.
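For readers less familiar with the distinction Andrii draws, here is a minimal userspace sketch of it in C. Every name in it (struct open_event, tp_probe_fn, lsm_hook_fn, fake_open, deny_etc_shadow) is hypothetical, and none of it is kernel or libbpf API. A tracepoint-style probe receives a const event and returns void, so it can record but not veto. An LSM-style hook returns 0 to allow or a negative errno to deny.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event handed to both kinds of callback. */
struct open_event {
	const char *path;
};

/* Tracepoint-style probe: observes a const event, returns nothing. */
typedef void (*tp_probe_fn)(const struct open_event *ev);

/* LSM-style hook: returns 0 to allow, a negative errno to deny. */
typedef int (*lsm_hook_fn)(const struct open_event *ev);

static int probe_hits;

/* Example probe: it can count events but has no way to veto them. */
static void count_probe(const struct open_event *ev)
{
	(void)ev;
	probe_hits++;
}

/* Example hook with a hypothetical policy: deny one specific path. */
static int deny_etc_shadow(const struct open_event *ev)
{
	return strcmp(ev->path, "/etc/shadow") == 0 ? -EACCES : 0;
}

/* Hypothetical open path: ask the hook for a verdict, fire the probe,
 * and return the verdict. The probe's lack of a return value means it
 * cannot change the outcome. */
static int fake_open(const char *path, lsm_hook_fn hook, tp_probe_fn probe)
{
	struct open_event ev = { .path = path };
	int err = hook ? hook(&ev) : 0;

	if (probe)
		probe(&ev);
	return err;
}
```

In this model the probe fires on every call regardless of the verdict, while only the hook's return value decides the outcome of fake_open().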
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 21:18 ` Dave Chinner
2025-01-16 21:43 ` Andrii Nakryiko
@ 2025-01-18 3:07 ` Daniel Xu
2025-01-18 3:37 ` Al Viro
1 sibling, 1 reply; 13+ messages in thread
From: Daniel Xu @ 2025-01-18 3:07 UTC (permalink / raw)
To: Dave Chinner
Cc: Theodore Ts'o, lsf-pc, Linux Filesystem Development List, bpf
Hi Dave,
On Fri, Jan 17, 2025 at 08:18:37AM +1100, Dave Chinner wrote:
> On Thu, Jan 16, 2025 at 07:49:49AM -0500, Theodore Ts'o wrote:
> > Historically, we have avoided adding tracepoints to the VFS because of
> > concerns that tracepoints would be considered a userspace-level
> > interface, and would therefore potentially constrain our ability to
> > improve an interface which has been extremely performance critical.
>
> Yes, the lack of tracepoints in the VFS is a fairly significant
> issue when it comes to runtime debugging of production systems...
>
> > I'd like to discuss whether in 2025, it's time to reconsider our
> > reticence in adding tracepoints in the VFS layer. First, while there
> > has been a single incident of a tracepoint being used by programs that
> > were distributed far and wide (powertop) such that we had to revert a
> > change to a tracepoint that broke it --- that was *14* years ago,
> > in 2011.
>
> Yes, that was a big mistake in multiple ways. Firstly, the app using
> a tracepoint in this way. The second mistake was the response that
> "tracepoints should be stable API" based on the abuse of a single
> tracepoint.
>
> We had extensive tracepoint coverage in subsystems *before* this
> happened. In XFS, we had already converted hundreds of existing
> debug-build-only tracing calls to use tracepoints based on the
> understanding that tracepoints were *not* considered stable user
> interfaces.
>
> The fact that existing subsystem tracepoints already exposed the
> internal implementation of objects like struct inode, struct file,
> superblocks, etc simply wasn't considered when tracepoints were
> declared "stable".
>
> The fact is that it is simply not possible to maintain any sort of
> useful introspection with the tracepoint infrastructure without
> exposing internal implementation details that can change from kernel
> to kernel.
>
> > Across multiple other subsystems, many of
> > which have added an extensive number of tracepoints, there has been
> > only a single problem in over a decade, so I'd like to suggest that
> > this concern may not have been as serious as we had first
> > thought.
>
> Yes, these subsystems still operate under the "tracepoints are not
> stable" understanding. The reality is that userspace has *never*
> been able to rely on tracepoints being stable across multiple kernel
> releases, regardless of what anyone else (including Linus) says is
> the policy.
As a (relatively) long time bpftrace developer, I've always been
fairly consistent with users new to linux tracing that tracepoints
are _not_ guaranteed to be stable and they exist on the stability
spectrum somewhere between kprobes/fentry and uapi.
IIRC from the cases I've seen where tracepoints shift, users just adjust
their scripts. I don't remember having seen anyone both think that it's
the kernel's fault and then go complain on list.
I'm happy to adjust any of bpftrace's public facing docs to make that
reality more clear if it'll help.
>
> > I'd like to propose that we experiment with adding tracepoints in
> > early 2025, so that at the end of the year the year-end 2025 LTS
> > kernels will have tracepoints that we are confident will be fit for
> > purpose for BPF users.
>
> Why does BPF even need tracepoints? BPF code should be using kprobes
> to hook into the running kernel to monitor it, yes?
In addition to the points Andrii makes below, tracepoints also have a
nice documenting property. They tend to get added to "places of
interest". They're a great starting point for non-kernel developers to
dig into kernel internals. Oftentimes, tracepoint naming (as well as the
exported fields) provides helpful hints.
At least for me, if I'm mucking around new places (mostly net/) I'll
tend to go look at the tracepoints to find the interesting codepaths.
[..]
Thanks,
Daniel
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-18 3:07 ` Daniel Xu
@ 2025-01-18 3:37 ` Al Viro
0 siblings, 0 replies; 13+ messages in thread
From: Al Viro @ 2025-01-18 3:37 UTC (permalink / raw)
To: Daniel Xu
Cc: Dave Chinner, Theodore Ts'o, lsf-pc,
Linux Filesystem Development List, bpf
On Fri, Jan 17, 2025 at 08:07:48PM -0700, Daniel Xu wrote:
> In addition to the points Andrii makes below, tracepoints also have a
> nice documenting property. They tend to get added to "places of
> interest". They're a great starting point for non-kernel developers to
> dig into kernel internals. Oftentimes, tracepoint naming (as well as the
> exported fields) provides helpful hints.
>
> At least for me, if I'm mucking around new places (mostly net/) I'll
> tend to go look at the tracepoints to find the interesting codepaths.
Here's one for you:
trace_ocfs2_file_splice_read(inode, in, in->f_path.dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
in->f_path.dentry->d_name.len,
in->f_path.dentry->d_name.name,
flags);
The trouble is, what happens if your ->splice_read() races
with rename()? Yes, it is allowed to happen in parallel with
splice(2). Or with read(2), for that matter. Or close(2) (and
dup2(2) or exit(2) of something that happens to have the file
opened).
What happens is that
* you get len and name that might not match each other - you might
see len being 200 and name pointing to 40-byte array inside dentry.
* you get name that is not guaranteed to be *there* - you might
pick one before rename and have it freed and reused by the time you
try to access it.
* you get name that points to a string that might be modified
by another CPU right under you (for short names).
Doing that inside ->mkdir() - sure, no problem, the name _is_ stable
there. Doing that inside ->lookup() - fine on the entry, maybe not
safe on the way out.
In filesystems it's living dangerously, but as long as you know what
you are doing you can get away with that (ocfs2 folks hadn't, but
it's not just ocfs2 - similar tracepoints exist for nfs, etc.)...
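The failure modes listed above can be illustrated with a small userspace model in C. This is a sketch under stated assumptions, not kernel code: struct fake_dentry, fake_dentry_init(), fake_rename() and fake_snapshot() are hypothetical, and a pthread mutex stands in for the dentry's d_lock. What it shows is that the tracepoint side reads len and name under the same lock the rename path takes and copies them into its own buffer, so in this model it never records a mismatched pair and never keeps a pointer to a name that can be freed. (The kernel does have take_dentry_name_snapshot() for a loosely similar purpose; how real tracepoints should stabilize d_name is not something this model settles.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dentry; NOT the kernel's struct dentry. */
struct fake_dentry {
	pthread_mutex_t lock;	/* stands in for d_lock */
	size_t len;		/* stands in for d_name.len */
	char *name;		/* stands in for d_name.name */
};

/* Hypothetical fixed-size copy that a tracepoint could record. */
#define SNAP_MAX 64
struct name_snap {
	size_t len;
	char buf[SNAP_MAX];
};

static int fake_dentry_init(struct fake_dentry *d, const char *name)
{
	d->name = strdup(name);
	if (!d->name)
		return -1;
	d->len = strlen(name);
	return pthread_mutex_init(&d->lock, NULL);
}

/* "rename": publish the new name and length together under the lock,
 * then free the old name once the lock is dropped. */
static int fake_rename(struct fake_dentry *d, const char *newname)
{
	char *copy = strdup(newname);
	char *old;

	if (!copy)
		return -1;
	pthread_mutex_lock(&d->lock);
	old = d->name;
	d->name = copy;
	d->len = strlen(copy);
	pthread_mutex_unlock(&d->lock);
	/* A reader that loaded 'old' without the lock may now touch freed memory. */
	free(old);
	return 0;
}

/* Tracepoint-side snapshot: read len and name under the same lock as
 * rename, copy into our own buffer (truncating if needed), and never
 * retain a pointer into the live name. */
static void fake_snapshot(struct fake_dentry *d, struct name_snap *s)
{
	pthread_mutex_lock(&d->lock);
	s->len = d->len < SNAP_MAX - 1 ? d->len : SNAP_MAX - 1;
	memcpy(s->buf, d->name, s->len);
	s->buf[s->len] = '\0';
	pthread_mutex_unlock(&d->lock);
}
```

By contrast, a tracepoint that loads d_name.len and d_name.name as two separate unlocked reads, as in the ocfs2 example, gets none of these guarantees.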
* Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 21:43 ` Andrii Nakryiko
2025-01-17 2:20 ` Al Viro
@ 2025-01-20 15:42 ` Christian Brauner
1 sibling, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2025-01-20 15:42 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Dave Chinner, Theodore Ts'o, lsf-pc,
Linux Filesystem Development List, bpf
> - relative stability of tracepoints in terms of naming, semantics,
> arguments. While not stable APIs, tracepoints are "more stable" in
> practice due to more deliberate and strategic placement (usually), so
> they tend to get renamed or changed much less frequently.
I will support tracepoints in the VFS. It would be very useful to have
them.
But we will clearly document that we retain the right to change them at
any time. Tracepoints will not become a burden for refactorings or
rewrites that tend to happen not that infrequently.
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-16 17:20 ` Jan Kara
@ 2025-01-20 15:43 ` Christian Brauner
2025-01-20 17:15 ` Jan Kara
0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2025-01-20 15:43 UTC (permalink / raw)
To: Jan Kara
Cc: Theodore Ts'o, lsf-pc, Linux Filesystem Development List, bpf
> that it's not a big deal. I'm watching with a bit of concern developments
> like BTF which try to provide some illusion of stability where there isn't
> much of it. So some tool could spread wide enough without getting regularly
> broken that breaking it will become a problem. But that is not really the
> topic of this discussion.
We've stated over and over and will document that we give no stability
guarantees in that regard.
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?
2025-01-20 15:43 ` Christian Brauner
@ 2025-01-20 17:15 ` Jan Kara
0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2025-01-20 17:15 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Theodore Ts'o, lsf-pc,
Linux Filesystem Development List, bpf
On Mon 20-01-25 16:43:31, Christian Brauner wrote:
> > that it's not a big deal. I'm watching with a bit of concern developments
> > like BTF which try to provide some illusion of stability where there isn't
> > much of it. So some tool could spread wide enough without getting regularly
> > broken that breaking it will become a problem. But that is not really the
> > topic of this discussion.
>
> We've stated over and over and will document that we give no stability
> guarantees in that regard.
I'm fully in support of stating that and documenting that because setting
the expectation is important. And I'm also in support of adding tracepoints
to VFS. As Ted wrote, so far both kernel and userspace parts of tracing
were able to live together smoothly (at least from the kernel side ;)).
But I've also heard Linus explicitly saying something along the lines that
if a change in a trace point breaks real users, he's going to revert that
change no matter what you've documented. So we have to take that
possibility into account as well.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Thread overview: 13+ messages
2025-01-16 12:49 [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs? Theodore Ts'o
2025-01-16 16:53 ` Al Viro
2025-01-16 17:29 ` [Lsf-pc] " Jan Kara
2025-01-16 17:20 ` Jan Kara
2025-01-20 15:43 ` Christian Brauner
2025-01-20 17:15 ` Jan Kara
2025-01-16 21:18 ` Dave Chinner
2025-01-16 21:43 ` Andrii Nakryiko
2025-01-17 2:20 ` Al Viro
2025-01-17 18:33 ` Andrii Nakryiko
2025-01-20 15:42 ` Christian Brauner
2025-01-18 3:07 ` Daniel Xu
2025-01-18 3:37 ` Al Viro