* [PATCH v3 0/2] eventfs: Create dentries and inodes at dir open
@ 2024-01-16 22:55 Steven Rostedt
2024-01-16 22:55 ` [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same Steven Rostedt
2024-01-16 22:55 ` [PATCH v3 2/2] eventfs: Do not create dentries nor inodes in iterate_shared Steven Rostedt
0 siblings, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2024-01-16 22:55 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Linus Torvalds,
Christian Brauner, Al Viro, Ajay Kaher, linux-fsdevel
[ The subject is still wrong, but it is kept to match v2; see patch 2 for the correct subject ]
Changes since v2: https://lore.kernel.org/all/20240116211217.968123837@goodmis.org/
Implemented Linus's suggestion to simply change the iterate_shared callback
to use the hard-coded inode numbers.
Steven Rostedt (Google) (2):
eventfs: Have the inodes all for files and directories all be the same
eventfs: Do not create dentries nor inodes in iterate_shared
----
fs/tracefs/event_inode.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
* [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same
2024-01-16 22:55 [PATCH v3 0/2] eventfs: Create dentries and inodes at dir open Steven Rostedt
@ 2024-01-16 22:55 ` Steven Rostedt
2024-01-22 21:59 ` Darrick J. Wong
2024-01-16 22:55 ` [PATCH v3 2/2] eventfs: Do not create dentries nor inodes in iterate_shared Steven Rostedt
1 sibling, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2024-01-16 22:55 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Linus Torvalds,
Christian Brauner, Al Viro, Ajay Kaher, linux-fsdevel
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
The dentries and inodes are created in the readdir for the sole purpose of
getting a consistent inode number. Linus stated that this is unnecessary,
and that all inodes can have the same inode number. For a virtual file
system, inode numbers are pretty meaningless.
Instead, use a single unique inode number for all files and another one for
all directories.
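The point of the fixed numbers is that getdents() and stat() agree without readdir having to instantiate anything. A minimal userspace model, not kernel code: `model_inode`, `model_readdir_ino()` and `model_lookup()` are hypothetical names, it assumes lookup still creates inodes on demand, and only the two constants are the values this patch defines.

```c
#include <stdbool.h>

/* The two values defined by this patch. */
#define EVENTFS_FILE_INODE_INO 0x12c4e37UL
#define EVENTFS_DIR_INODE_INO  0x134b2f5UL

/* Hypothetical stand-in for the inode that lookup creates on demand. */
struct model_inode {
	bool is_dir;
	unsigned long i_ino;
};

/* Inode number readdir can report without creating a dentry or inode. */
static unsigned long model_readdir_ino(bool is_dir)
{
	return is_dir ? EVENTFS_DIR_INODE_INO : EVENTFS_FILE_INODE_INO;
}

/* Inode number lookup assigns when it does instantiate the inode,
 * mirroring the create_file()/create_dir() hunks of this patch. */
static void model_lookup(struct model_inode *inode, bool is_dir)
{
	inode->is_dir = is_dir;
	inode->i_ino = is_dir ? EVENTFS_DIR_INODE_INO : EVENTFS_FILE_INODE_INO;
}
```

Whatever order readdir and lookup run in, both derive the number from the entry type alone, so no per-entry state is needed to keep them consistent.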
Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index fdff53d5a1f8..5edf0b96758b 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -32,6 +32,10 @@
*/
static DEFINE_MUTEX(eventfs_mutex);
+/* Choose something "unique" ;-) */
+#define EVENTFS_FILE_INODE_INO 0x12c4e37
+#define EVENTFS_DIR_INODE_INO 0x134b2f5
+
/*
* The eventfs_inode (ei) itself is protected by SRCU. It is released from
* its parent's list and will have is_freed set (under eventfs_mutex).
@@ -352,6 +356,9 @@ static struct dentry *create_file(const char *name, umode_t mode,
inode->i_fop = fop;
inode->i_private = data;
+ /* All files will have the same inode number */
+ inode->i_ino = EVENTFS_FILE_INODE_INO;
+
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
d_instantiate(dentry, inode);
@@ -388,6 +395,9 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
inode->i_op = &eventfs_root_dir_inode_operations;
inode->i_fop = &eventfs_file_operations;
+ /* All directories will have the same inode number */
+ inode->i_ino = EVENTFS_DIR_INODE_INO;
+
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
--
2.43.0
* [PATCH v3 2/2] eventfs: Do not create dentries nor inodes in iterate_shared
2024-01-16 22:55 [PATCH v3 0/2] eventfs: Create dentries and inodes at dir open Steven Rostedt
2024-01-16 22:55 ` [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same Steven Rostedt
@ 2024-01-16 22:55 ` Steven Rostedt
1 sibling, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2024-01-16 22:55 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Linus Torvalds,
Christian Brauner, Al Viro, Ajay Kaher, linux-fsdevel,
kernel test robot
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
The original eventfs code added a wrapper around the dcache_readdir open
callback that created all the dentries and inodes at open and incremented
their ref counts. A wrapper was also added around the dcache_readdir
release function to decrement the ref counts of all those created inodes
and dentries. But this proved to be buggy[1]: if a kprobe was created
during a directory read, its dentry would be created between the open and
the release. Because the release decremented the ref counts of all files
and directories, that included the kprobe directory, which did not exist
at open time and so never had its ref count incremented. This would cause
the ref count to go negative and later crash the kernel.
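To make the imbalance concrete, here is a minimal userspace model, not the eventfs code: `model_dir`, `buggy_open()`, `add_entry()` and `buggy_release()` are hypothetical, and a "reference" is just an int counter.

```c
#include <stddef.h>

#define MODEL_MAX_ENTRIES 8

/* Hypothetical model of one eventfs directory's entries and ref counts. */
struct model_dir {
	int refcount[MODEL_MAX_ENTRIES];
	size_t nr;	/* number of entries that currently exist */
};

/* Original open wrapper: take a reference on every entry present now. */
static void buggy_open(struct model_dir *d)
{
	for (size_t i = 0; i < d->nr; i++)
		d->refcount[i]++;
}

/* A new entry (e.g. a kprobe directory) appears between open and release. */
static int add_entry(struct model_dir *d)
{
	if (d->nr >= MODEL_MAX_ENTRIES)
		return -1;
	d->refcount[d->nr++] = 0;
	return 0;
}

/* Original release wrapper: drop a reference on every entry present now,
 * including entries that open never took a reference on. */
static void buggy_release(struct model_dir *d)
{
	for (size_t i = 0; i < d->nr; i++)
		d->refcount[i]--;
}
```

Running open, then add_entry(), then release on a directory with two entries leaves the two original entries at 0 and the new one at -1, which is the negative ref count described above.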
To solve this, the dentries and inodes that were created, and had their ref
counts incremented, in open needed to be saved. That list needed to be
passed from the open to the release, so that the release would only
decrement the ref counts of the entries that were incremented in the open.
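The same kind of userspace model, again with hypothetical names (`pinned_list`, `fixed_open()`, `fixed_release()`), sketches that handoff: open records exactly which entries it pinned, and release drops only those. It deliberately leaves open where the `pinned_list` pointer lives between the two calls, since that is the file->private_data question below.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_ENTRIES 8

/* Hypothetical model of one eventfs directory's entries and ref counts. */
struct model_dir {
	int refcount[MODEL_MAX_ENTRIES];
	size_t nr;
};

/* Hypothetical record of exactly which entries open took a reference on. */
struct pinned_list {
	size_t nr;
	size_t idx[MODEL_MAX_ENTRIES];
};

/* open: take references and remember which entries were pinned. */
static struct pinned_list *fixed_open(struct model_dir *d)
{
	struct pinned_list *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->nr = 0;
	for (size_t i = 0; i < d->nr; i++) {
		d->refcount[i]++;
		p->idx[p->nr++] = i;
	}
	return p;
}

/* A new entry appears between open and release. */
static int add_entry(struct model_dir *d)
{
	if (d->nr >= MODEL_MAX_ENTRIES)
		return -1;
	d->refcount[d->nr++] = 0;
	return 0;
}

/* release: drop only the references recorded at open, then free the list. */
static void fixed_release(struct model_dir *d, struct pinned_list *p)
{
	for (size_t i = 0; i < p->nr; i++)
		d->refcount[p->idx[i]]--;
	free(p);
}
```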
Unfortunately, the dcache_readdir logic was already using
file->private_data, which is the only field that can be used to pass
information from the open to the release. So eventfs created another
descriptor with a void pointer that saved the dcache_readdir pointer, and
it wrapped all the callbacks so that it could save the list of entries
whose ref counts were incremented in the open and pass it to the release.
The wrapped callbacks would simply put back the dcache_readdir pointer and
call the functions dcache_readdir used, so that it could still use its
data[2].
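A rough userspace sketch of that wrapping scheme, assuming a stripped-down `model_file` in place of struct file; `readdir_wrapper`, `wrap_open()`, `wrap_call()`, `wrap_release()` and `dcache_peek()` are all hypothetical and only illustrate the pointer swapping, not the actual eventfs code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct file; only private_data matters here. */
struct model_file {
	void *private_data;
};

/* Hypothetical descriptor that takes over file->private_data. */
struct readdir_wrapper {
	void *dcache_data;	/* what dcache_readdir's open stored */
	void *pinned;		/* entries whose ref counts open incremented */
};

/* After dcache_readdir's open ran: stash its pointer, install the wrapper. */
static int wrap_open(struct model_file *file, void *pinned)
{
	struct readdir_wrapper *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->dcache_data = file->private_data;
	w->pinned = pinned;
	file->private_data = w;
	return 0;
}

/* Each wrapped callback puts the original pointer back around the call. */
static void *wrap_call(struct model_file *file,
		       void *(*dcache_cb)(struct model_file *))
{
	struct readdir_wrapper *w = file->private_data;
	void *ret;

	file->private_data = w->dcache_data;
	ret = dcache_cb(file);
	file->private_data = w;
	return ret;
}

/* Release: restore dcache_readdir's pointer and hand back the pinned list. */
static void *wrap_release(struct model_file *file)
{
	struct readdir_wrapper *w = file->private_data;
	void *pinned = w->pinned;

	file->private_data = w->dcache_data;
	free(w);
	return pinned;
}

/* Stand-in for a dcache_readdir callback that reads its own private data. */
static void *dcache_peek(struct model_file *file)
{
	return file->private_data;
}
```

Every callback has to go through wrap_call() so that dcache_readdir never sees the wrapper; that swapping of file->private_data is the "hijacking" objected to below.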
But Linus had an issue with the "hijacking" of file->private_data
(unfortunately this discussion was on a security list, so there is no
public link). We finally agreed to do everything within the iterate_shared
callback and leave dcache_readdir out of it[3]. All the information needed
for getdents() could be created then.
But this ended up being buggy too[4]. The iterate_shared callback was not
the right place to create the dentries and inodes. Even Christian Brauner
had issues with that[5].
The next attempt was to go back to creating the inodes and dentries at
open, store the information in an array in file->private_data, and pass
that information to the other callbacks[6].
The difference between that and the original method is that it does not
use dcache_readdir. It also does not increment the ref counts of the
dentries and pass them along. Instead, it creates an array of structures
that save each dentry's name and inode number. That information is used in
the iterate_shared callback, and the array is freed in the directory
release. The dentries and inodes created in the open are not used by the
iterate_shared or release callbacks; only their names and inode numbers are.
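A minimal userspace sketch of that array-based approach, assuming the names and inode numbers are already known at open; `entry_info`, `dir_snapshot`, `snapshot_open()`, `snapshot_iterate()`, `snapshot_release()` and `count_emit()` are hypothetical, and the emit callback only imitates the dir_emit() convention of returning false when the buffer is full.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-entry record: just a name and an inode number. */
struct entry_info {
	char *name;
	unsigned long ino;
};

/* Hypothetical array kept in file->private_data between open and release. */
struct dir_snapshot {
	size_t nr;
	struct entry_info *entries;
};

static char *copy_name(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/* release: free the array and every saved name (free(NULL) is a no-op). */
static void snapshot_release(struct dir_snapshot *s)
{
	if (!s)
		return;
	for (size_t i = 0; i < s->nr; i++)
		free(s->entries[i].name);
	free(s->entries);
	free(s);
}

/* open: record name and inode number for each entry; no dentry refs kept. */
static struct dir_snapshot *snapshot_open(const char *const *names,
					  const unsigned long *inos, size_t nr)
{
	struct dir_snapshot *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->entries = calloc(nr ? nr : 1, sizeof(*s->entries));
	if (!s->entries) {
		free(s);
		return NULL;
	}
	s->nr = nr;
	for (size_t i = 0; i < nr; i++) {
		s->entries[i].name = copy_name(names[i]);
		s->entries[i].ino = inos[i];
		if (!s->entries[i].name) {
			snapshot_release(s);
			return NULL;
		}
	}
	return s;
}

/* iterate_shared: emit saved entries from *pos; emit() returns 0 when full. */
static void snapshot_iterate(const struct dir_snapshot *s, size_t *pos,
			     int (*emit)(void *ctx, const char *name,
					 unsigned long ino),
			     void *ctx)
{
	for (; *pos < s->nr; (*pos)++)
		if (!emit(ctx, s->entries[*pos].name, s->entries[*pos].ino))
			return;
}

/* Example emitter that only counts the entries it receives. */
static int count_emit(void *ctx, const char *name, unsigned long ino)
{
	(void)name;
	(void)ino;
	++*(size_t *)ctx;
	return 1;
}
```

Only plain strings and numbers cross from open to iterate_shared to release, so no dentry reference can be leaked or over-released; the cost is the allocation and copying at every open.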
Linus did not like that either[7] and just wanted to remove the dentries
being created in iterate_shared and use the hard-coded inode numbers.
[ All this while Linus enjoyed an unexpected vacation during the merge
window due to lack of power. ]
[1] https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/
[2] https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home/
[3] https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org/
[4] https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/
[5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/
[6] https://lore.kernel.org/all/20240116114711.7e8637be@gandalf.local.home/
[7] https://lore.kernel.org/all/20240116170154.5bf0a250@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.573784051@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Fixes: 493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401152142.bfc28861-oliver.sang@intel.com
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 5edf0b96758b..10580d6b5012 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -727,8 +727,6 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
struct eventfs_inode *ei_child;
struct tracefs_inode *ti;
struct eventfs_inode *ei;
- struct dentry *ei_dentry = NULL;
- struct dentry *dentry;
const char *name;
umode_t mode;
int idx;
@@ -749,11 +747,11 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
mutex_lock(&eventfs_mutex);
ei = READ_ONCE(ti->private);
- if (ei && !ei->is_freed)
- ei_dentry = READ_ONCE(ei->dentry);
+ if (ei && ei->is_freed)
+ ei = NULL;
mutex_unlock(&eventfs_mutex);
- if (!ei || !ei_dentry)
+ if (!ei)
goto out;
/*
@@ -780,11 +778,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
if (r <= 0)
continue;
- dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
- if (!dentry)
- goto out;
- ino = dentry->d_inode->i_ino;
- dput(dentry);
+ ino = EVENTFS_FILE_INODE_INO;
if (!dir_emit(ctx, name, strlen(name), ino, DT_REG))
goto out;
@@ -808,11 +802,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
name = ei_child->name;
- dentry = create_dir_dentry(ei, ei_child, ei_dentry);
- if (!dentry)
- goto out_dec;
- ino = dentry->d_inode->i_ino;
- dput(dentry);
+ ino = EVENTFS_DIR_INODE_INO;
if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR))
goto out_dec;
--
2.43.0
* Re: [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same
2024-01-16 22:55 ` [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same Steven Rostedt
@ 2024-01-22 21:59 ` Darrick J. Wong
2024-01-22 22:02 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2024-01-22 21:59 UTC
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Linus Torvalds, Christian Brauner, Al Viro,
Ajay Kaher, linux-fsdevel
On Tue, Jan 16, 2024 at 05:55:32PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> The dentries and inodes are created in the readdir for the sole purpose of
> getting a consistent inode number. Linus stated that is unnecessary, and
> that all inodes can have the same inode number. For a virtual file system
> they are pretty meaningless.
>
> Instead use a single unique inode number for all files and one for all
> directories.
>
> Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/
> Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org
>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Al Viro <viro@ZenIV.linux.org.uk>
> Cc: Ajay Kaher <ajay.kaher@broadcom.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> fs/tracefs/event_inode.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index fdff53d5a1f8..5edf0b96758b 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -32,6 +32,10 @@
> */
> static DEFINE_MUTEX(eventfs_mutex);
>
> +/* Choose something "unique" ;-) */
> +#define EVENTFS_FILE_INODE_INO 0x12c4e37
> +#define EVENTFS_DIR_INODE_INO 0x134b2f5
> +
> /*
> * The eventfs_inode (ei) itself is protected by SRCU. It is released from
> * its parent's list and will have is_freed set (under eventfs_mutex).
> @@ -352,6 +356,9 @@ static struct dentry *create_file(const char *name, umode_t mode,
> inode->i_fop = fop;
> inode->i_private = data;
>
> + /* All files will have the same inode number */
> + inode->i_ino = EVENTFS_FILE_INODE_INO;
> +
> ti = get_tracefs(inode);
> ti->flags |= TRACEFS_EVENT_INODE;
> d_instantiate(dentry, inode);
> @@ -388,6 +395,9 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
> inode->i_op = &eventfs_root_dir_inode_operations;
> inode->i_fop = &eventfs_file_operations;
>
> + /* All directories will have the same inode number */
> + inode->i_ino = EVENTFS_DIR_INODE_INO;
Regrettably, this leads to find failing on 6.8-rc1 (see xfs/55[89] in
fstests):
# find /sys/kernel/debug/tracing/ >/dev/null
find: File system loop detected; ‘/sys/kernel/debug/tracing/events/initcall/initcall_finish’ is part of the same file system loop as ‘/sys/kernel/debug/tracing/events/initcall’.
find: File system loop detected; ‘/sys/kernel/debug/tracing/events/initcall/initcall_start’ is part of the same file system loop as ‘/sys/kernel/debug/tracing/events/initcall’.
find: File system loop detected; ‘/sys/kernel/debug/tracing/events/initcall/initcall_level’ is part of the same file system loop as ‘/sys/kernel/debug/tracing/events/initcall’.
There were no such reports on 6.7.0; AFAICT find(1) is tripping over
parent and child subdirectories having the same dev/i_ino. Changing this
line to the following:
/* All directories will NOT have the same inode number */
inode->i_ino = (unsigned long)inode;
makes the messages about filesystem loops go away, though I don't think
leaking raw kernel pointers is an awesome idea.
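A simplified model of the check that trips here, assuming find flags a directory whose (st_dev, st_ino) pair equals that of a directory already on the current path (GNU find's real cycle detection, via fts, is more involved). `dev_ino` and `looks_like_loop()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical (st_dev, st_ino) pair as stat(2) would report it. */
struct dev_ino {
	unsigned long dev;
	unsigned long ino;
};

/* Simplified loop check: a directory whose (dev, ino) equals that of a
 * directory already on the current path is treated as a loop. */
static bool looks_like_loop(const struct dev_ino *path, size_t depth,
			    struct dev_ino child)
{
	for (size_t i = 0; i < depth; i++)
		if (path[i].dev == child.dev && path[i].ino == child.ino)
			return true;
	return false;
}
```

With every eventfs directory reporting EVENTFS_DIR_INODE_INO, events/initcall/initcall_finish matches its parent events/initcall, which is exactly the pair named in the find output above.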
--D
> +
> ti = get_tracefs(inode);
> ti->flags |= TRACEFS_EVENT_INODE;
>
> --
> 2.43.0
>
>
>
* Re: [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same
2024-01-22 21:59 ` Darrick J. Wong
@ 2024-01-22 22:02 ` Linus Torvalds
2024-01-22 23:03 ` Darrick J. Wong
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2024-01-22 22:02 UTC
To: Darrick J. Wong
Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
Christian Brauner, Al Viro, Ajay Kaher, linux-fsdevel
On Mon, 22 Jan 2024 at 13:59, Darrick J. Wong <djwong@kernel.org> wrote:
>
> though I don't think
> leaking raw kernel pointers is an awesome idea.
Yeah, I wasn't all that comfortable even with trying to hash it
(because I think the number of source bits is small enough that even
with a crypto hash, it's trivially brute-forceable).
See
https://lore.kernel.org/all/20240122152748.46897388@gandalf.local.home/
for the current patch under discussion (and it contains a link _to_
said discussion).
Linus
* Re: [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same
2024-01-22 22:02 ` Linus Torvalds
@ 2024-01-22 23:03 ` Darrick J. Wong
0 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2024-01-22 23:03 UTC
To: Linus Torvalds
Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
Christian Brauner, Al Viro, Ajay Kaher, linux-fsdevel
On Mon, Jan 22, 2024 at 02:02:28PM -0800, Linus Torvalds wrote:
> On Mon, 22 Jan 2024 at 13:59, Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > though I don't think
> > leaking raw kernel pointers is an awesome idea.
>
> Yeah, I wasn't all that comfortable even with trying to hash it
> (because I think the number of source bits is small enough that even
> with a crypto hash, it's trivially brute-forceable).
>
> See
>
> https://lore.kernel.org/all/20240122152748.46897388@gandalf.local.home/
>
> for the current patch under discussion (and it contains a link _to_
> said discussion).
Ah, cool, thank you!
--D
> Linus