* [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6
@ 2023-11-05 15:56 Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 1/5] tracing: Have trace_event_file have ref counters Steven Rostedt
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Steven Rostedt @ 2023-11-05 15:56 UTC (permalink / raw)
To: linux-kernel, stable, gregkh
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton
Greg,
Friday before the merge window opened, I received a bug report
for the eventfs code that was in linux-next. I spent the next
5 days debugging it and not only fixing it, but it led to finding
other bugs in the code. Several of these other bugs happen to
also affect the 6.6 kernel.
The eventfs code was written in two parts to lower the complexity.
The first part added just the dynamic creation of the eventfs
file system and that was added to 6.6.
The second part went further and removed the one-to-one mapping between
dentry/inode and meta data, as all events have the same files. It replaced
the meta data for each file with callbacks, which caused quite a bit of
code churn.
As the merge window was already open, when I finished all the fixes
I just sent those fixes on top of the linux-next changes along with
my pull request. That means, there are 5 commits that are marked
stable (or should be marked for stable) that need to be applied to
6.6 but require a bit of tweaking or even a new way of implementing the fix!
After sending the pull request, I then checked out 6.6 an took those
5 changes and fixed them up on top of it. I ran them through all my
tests that I use to send to Linus.
So these should be as good as the versions of the patches in Linus's tree.
I waited until Linus pulled in those changes to send this series out.
-- Steve
Steven Rostedt (Google) (5):
tracing: Have trace_event_file have ref counters
eventfs: Remove "is_freed" union with rcu head
eventfs: Save ownership and mode
eventfs: Delete eventfs_inode when the last dentry is freed
eventfs: Use simple_recursive_removal() to clean up dentries
----
fs/tracefs/event_inode.c | 288 +++++++++++++++++++++++--------------
include/linux/trace_events.h | 4 +
kernel/trace/trace.c | 15 ++
kernel/trace/trace.h | 3 +
kernel/trace/trace_events.c | 31 +++-
kernel/trace/trace_events_filter.c | 3 +
6 files changed, 231 insertions(+), 113 deletions(-)
^ permalink raw reply [flat|nested] 14+ messages in thread
* [v6.6][PATCH 1/5] tracing: Have trace_event_file have ref counters
2023-11-05 15:56 [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Steven Rostedt
@ 2023-11-05 15:56 ` Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 2/5] eventfs: Remove "is_freed" union with rcu head Steven Rostedt
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2023-11-05 15:56 UTC (permalink / raw)
To: linux-kernel, stable, gregkh
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Beau Belgrave
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream
The following can crash the kernel:
# cd /sys/kernel/tracing
# echo 'p:sched schedule' > kprobe_events
# exec 5>>events/kprobes/sched/enable
# > kprobe_events
# exec 5>&-
The above commands:
1. Change directory to the tracefs directory
2. Create a kprobe event (doesn't matter what one)
3. Open bash file descriptor 5 on the enable file of the kprobe event
4. Delete the kprobe event (removes the files too)
5. Close the bash file descriptor 5
The above causes a crash!
BUG: kernel NULL pointer dereference, address: 0000000000000028
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:tracing_release_file_tr+0xc/0x50
What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.
But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.
To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.
Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave <beaub@linux.microsoft.com>
Tested-by: Beau Belgrave <beaub@linux.microsoft.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/trace_events.h | 4 ++++
kernel/trace/trace.c | 15 +++++++++++++++
kernel/trace/trace.h | 3 +++
kernel/trace/trace_events.c | 31 ++++++++++++++++++++++++++----
kernel/trace/trace_events_filter.c | 3 +++
5 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 21ae37e49319..cf9f0c61796e 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -492,6 +492,7 @@ enum {
EVENT_FILE_FL_TRIGGER_COND_BIT,
EVENT_FILE_FL_PID_FILTER_BIT,
EVENT_FILE_FL_WAS_ENABLED_BIT,
+ EVENT_FILE_FL_FREED_BIT,
};
extern struct trace_event_file *trace_get_event_file(const char *instance,
@@ -630,6 +631,7 @@ extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...);
* TRIGGER_COND - When set, one or more triggers has an associated filter
* PID_FILTER - When set, the event is filtered based on pid
* WAS_ENABLED - Set when enabled to know to clear trace on module removal
+ * FREED - File descriptor is freed, all fields should be considered invalid
*/
enum {
EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT),
@@ -643,6 +645,7 @@ enum {
EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT),
EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT),
EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT),
+ EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT),
};
struct trace_event_file {
@@ -671,6 +674,7 @@ struct trace_event_file {
* caching and such. Which is mostly OK ;-)
*/
unsigned long flags;
+ atomic_t ref; /* ref count for opened files */
atomic_t sm_ref; /* soft-mode reference counter */
atomic_t tm_ref; /* trigger-mode reference counter */
};
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index abaaf516fcae..a40d6baf101f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4986,6 +4986,20 @@ int tracing_open_file_tr(struct inode *inode, struct file *filp)
if (ret)
return ret;
+ mutex_lock(&event_mutex);
+
+ /* Fail if the file is marked for removal */
+ if (file->flags & EVENT_FILE_FL_FREED) {
+ trace_array_put(file->tr);
+ ret = -ENODEV;
+ } else {
+ event_file_get(file);
+ }
+
+ mutex_unlock(&event_mutex);
+ if (ret)
+ return ret;
+
filp->private_data = inode->i_private;
return 0;
@@ -4996,6 +5010,7 @@ int tracing_release_file_tr(struct inode *inode, struct file *filp)
struct trace_event_file *file = inode->i_private;
trace_array_put(file->tr);
+ event_file_put(file);
return 0;
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 77debe53f07c..d608f6128704 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1664,6 +1664,9 @@ extern void event_trigger_unregister(struct event_command *cmd_ops,
char *glob,
struct event_trigger_data *trigger_data);
+extern void event_file_get(struct trace_event_file *file);
+extern void event_file_put(struct trace_event_file *file);
+
/**
* struct event_trigger_ops - callbacks for trace event triggers
*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f49d6ddb6342..82cb22ad6d61 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -990,13 +990,35 @@ static void remove_subsystem(struct trace_subsystem_dir *dir)
}
}
+void event_file_get(struct trace_event_file *file)
+{
+ atomic_inc(&file->ref);
+}
+
+void event_file_put(struct trace_event_file *file)
+{
+ if (WARN_ON_ONCE(!atomic_read(&file->ref))) {
+ if (file->flags & EVENT_FILE_FL_FREED)
+ kmem_cache_free(file_cachep, file);
+ return;
+ }
+
+ if (atomic_dec_and_test(&file->ref)) {
+ /* Count should only go to zero when it is freed */
+ if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED)))
+ return;
+ kmem_cache_free(file_cachep, file);
+ }
+}
+
static void remove_event_file_dir(struct trace_event_file *file)
{
eventfs_remove(file->ef);
list_del(&file->list);
remove_subsystem(file->system);
free_event_filter(file->filter);
- kmem_cache_free(file_cachep, file);
+ file->flags |= EVENT_FILE_FL_FREED;
+ event_file_put(file);
}
/*
@@ -1369,7 +1391,7 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
flags = file->flags;
mutex_unlock(&event_mutex);
- if (!file)
+ if (!file || flags & EVENT_FILE_FL_FREED)
return -ENODEV;
if (flags & EVENT_FILE_FL_ENABLED &&
@@ -1407,7 +1429,7 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
ret = -ENODEV;
mutex_lock(&event_mutex);
file = event_file_data(filp);
- if (likely(file))
+ if (likely(file && !(file->flags & EVENT_FILE_FL_FREED)))
ret = ftrace_event_enable_disable(file, val);
mutex_unlock(&event_mutex);
break;
@@ -1681,7 +1703,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
mutex_lock(&event_mutex);
file = event_file_data(filp);
- if (file)
+ if (file && !(file->flags & EVENT_FILE_FL_FREED))
print_event_filter(file, s);
mutex_unlock(&event_mutex);
@@ -2803,6 +2825,7 @@ trace_create_new_event(struct trace_event_call *call,
atomic_set(&file->tm_ref, 0);
INIT_LIST_HEAD(&file->triggers);
list_add(&file->list, &tr->events);
+ event_file_get(file);
return file;
}
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 33264e510d16..0c611b281a5b 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -2349,6 +2349,9 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string)
struct event_filter *filter = NULL;
int err;
+ if (file->flags & EVENT_FILE_FL_FREED)
+ return -ENODEV;
+
if (!strcmp(strstrip(filter_string), "0")) {
filter_disable(file);
filter = event_filter(file);
--
2.42.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [v6.6][PATCH 2/5] eventfs: Remove "is_freed" union with rcu head
2023-11-05 15:56 [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 1/5] tracing: Have trace_event_file have ref counters Steven Rostedt
@ 2023-11-05 15:56 ` Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 3/5] eventfs: Save ownership and mode Steven Rostedt
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2023-11-05 15:56 UTC (permalink / raw)
To: linux-kernel, stable, gregkh
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Ajay Kaher
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
commit f2f496370afcbc5227d7002da28c74b91fed12ff upstream
The eventfs_inode->is_freed was a union with the rcu_head with the
assumption that when it was on the srcu list the head would contain a
pointer which would make "is_freed" true. But that was a wrong assumption
as the rcu head is a single link list where the last element is NULL.
Instead, split the nr_entries integer so that "is_freed" is one bit and
the nr_entries is the next 31 bits. As there shouldn't be more than 10
(currently there's at most 5 to 7 depending on the config), this should
not be a problem.
Link: https://lkml.kernel.org/r/20231101172649.049758712@goodmis.org
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 8c8d64e76103..a64d8fa39e54 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -38,6 +38,7 @@ struct eventfs_inode {
* @fop: file_operations for file or directory
* @iop: inode_operations for file or directory
* @data: something that the caller will want to get to later on
+ * @is_freed: Flag set if the eventfs is on its way to be freed
* @mode: the permission that the file or directory should have
*/
struct eventfs_file {
@@ -52,15 +53,14 @@ struct eventfs_file {
* Union - used for deletion
* @del_list: list of eventfs_file to delete
* @rcu: eventfs_file to delete in RCU
- * @is_freed: node is freed if one of the above is set
*/
union {
struct list_head del_list;
struct rcu_head rcu;
- unsigned long is_freed;
};
void *data;
- umode_t mode;
+ unsigned int is_freed:1;
+ unsigned int mode:31;
};
static DEFINE_MUTEX(eventfs_mutex);
@@ -814,6 +814,8 @@ static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head,
}
}
+ ef->is_freed = 1;
+
list_del_rcu(&ef->list);
list_add_tail(&ef->del_list, head);
}
--
2.42.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [v6.6][PATCH 3/5] eventfs: Save ownership and mode
2023-11-05 15:56 [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 1/5] tracing: Have trace_event_file have ref counters Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 2/5] eventfs: Remove "is_freed" union with rcu head Steven Rostedt
@ 2023-11-05 15:56 ` Steven Rostedt
2023-11-12 10:41 ` NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing` Milian Wolff
2023-11-05 15:56 ` [v6.6][PATCH 4/5] eventfs: Delete eventfs_inode when the last dentry is freed Steven Rostedt
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2023-11-05 15:56 UTC (permalink / raw)
To: linux-kernel, stable, gregkh
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Ajay Kaher
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
commit 28e12c09f5aa081b2d13d1340e3610070b6c624d upstream
Now that inodes and dentries are created on the fly, they are also
reclaimed on memory pressure. Since the ownership and file mode are saved
in the inode, if they are freed, any changes to the ownership and mode
will be lost.
To counter this, if the user changes the permissions or ownership, save
them, and when creating the inodes again, restore those changes.
Link: https://lkml.kernel.org/r/20231101172649.691841445@goodmis.org
Cc: stable@vger.kernel.org
Cc: Ajay Kaher <akaher@vmware.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 107 +++++++++++++++++++++++++++++++++------
1 file changed, 91 insertions(+), 16 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index a64d8fa39e54..6a3f7502310c 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -40,6 +40,8 @@ struct eventfs_inode {
* @data: something that the caller will want to get to later on
* @is_freed: Flag set if the eventfs is on its way to be freed
* @mode: the permission that the file or directory should have
+ * @uid: saved uid if changed
+ * @gid: saved gid if changed
*/
struct eventfs_file {
const char *name;
@@ -61,11 +63,22 @@ struct eventfs_file {
void *data;
unsigned int is_freed:1;
unsigned int mode:31;
+ kuid_t uid;
+ kgid_t gid;
};
static DEFINE_MUTEX(eventfs_mutex);
DEFINE_STATIC_SRCU(eventfs_srcu);
+/* Mode is unsigned short, use the upper bits for flags */
+enum {
+ EVENTFS_SAVE_MODE = BIT(16),
+ EVENTFS_SAVE_UID = BIT(17),
+ EVENTFS_SAVE_GID = BIT(18),
+};
+
+#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1)
+
static struct dentry *eventfs_root_lookup(struct inode *dir,
struct dentry *dentry,
unsigned int flags);
@@ -73,8 +86,54 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx);
static int eventfs_release(struct inode *inode, struct file *file);
+static void update_attr(struct eventfs_file *ef, struct iattr *iattr)
+{
+ unsigned int ia_valid = iattr->ia_valid;
+
+ if (ia_valid & ATTR_MODE) {
+ ef->mode = (ef->mode & ~EVENTFS_MODE_MASK) |
+ (iattr->ia_mode & EVENTFS_MODE_MASK) |
+ EVENTFS_SAVE_MODE;
+ }
+ if (ia_valid & ATTR_UID) {
+ ef->mode |= EVENTFS_SAVE_UID;
+ ef->uid = iattr->ia_uid;
+ }
+ if (ia_valid & ATTR_GID) {
+ ef->mode |= EVENTFS_SAVE_GID;
+ ef->gid = iattr->ia_gid;
+ }
+}
+
+static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
+ struct iattr *iattr)
+{
+ struct eventfs_file *ef;
+ int ret;
+
+ mutex_lock(&eventfs_mutex);
+ ef = dentry->d_fsdata;
+ /* The LSB is set when the eventfs_inode is being freed */
+ if (((unsigned long)ef & 1UL) || ef->is_freed) {
+ /* Do not allow changes if the event is about to be removed. */
+ mutex_unlock(&eventfs_mutex);
+ return -ENODEV;
+ }
+
+ ret = simple_setattr(idmap, dentry, iattr);
+ if (!ret)
+ update_attr(ef, iattr);
+ mutex_unlock(&eventfs_mutex);
+ return ret;
+}
+
static const struct inode_operations eventfs_root_dir_inode_operations = {
.lookup = eventfs_root_lookup,
+ .setattr = eventfs_set_attr,
+};
+
+static const struct inode_operations eventfs_file_inode_operations = {
+ .setattr = eventfs_set_attr,
};
static const struct file_operations eventfs_file_operations = {
@@ -85,10 +144,20 @@ static const struct file_operations eventfs_file_operations = {
.release = eventfs_release,
};
+static void update_inode_attr(struct inode *inode, struct eventfs_file *ef)
+{
+ inode->i_mode = ef->mode & EVENTFS_MODE_MASK;
+
+ if (ef->mode & EVENTFS_SAVE_UID)
+ inode->i_uid = ef->uid;
+
+ if (ef->mode & EVENTFS_SAVE_GID)
+ inode->i_gid = ef->gid;
+}
+
/**
* create_file - create a file in the tracefs filesystem
- * @name: the name of the file to create.
- * @mode: the permission that the file should have.
+ * @ef: the eventfs_file
* @parent: parent dentry for this file.
* @data: something that the caller will want to get to later on.
* @fop: struct file_operations that should be used for this file.
@@ -104,7 +173,7 @@ static const struct file_operations eventfs_file_operations = {
* If tracefs is not enabled in the kernel, the value -%ENODEV will be
* returned.
*/
-static struct dentry *create_file(const char *name, umode_t mode,
+static struct dentry *create_file(struct eventfs_file *ef,
struct dentry *parent, void *data,
const struct file_operations *fop)
{
@@ -112,13 +181,13 @@ static struct dentry *create_file(const char *name, umode_t mode,
struct dentry *dentry;
struct inode *inode;
- if (!(mode & S_IFMT))
- mode |= S_IFREG;
+ if (!(ef->mode & S_IFMT))
+ ef->mode |= S_IFREG;
- if (WARN_ON_ONCE(!S_ISREG(mode)))
+ if (WARN_ON_ONCE(!S_ISREG(ef->mode)))
return NULL;
- dentry = eventfs_start_creating(name, parent);
+ dentry = eventfs_start_creating(ef->name, parent);
if (IS_ERR(dentry))
return dentry;
@@ -127,7 +196,10 @@ static struct dentry *create_file(const char *name, umode_t mode,
if (unlikely(!inode))
return eventfs_failed_creating(dentry);
- inode->i_mode = mode;
+ /* If the user updated the directory's attributes, use them */
+ update_inode_attr(inode, ef);
+
+ inode->i_op = &eventfs_file_inode_operations;
inode->i_fop = fop;
inode->i_private = data;
@@ -140,7 +212,7 @@ static struct dentry *create_file(const char *name, umode_t mode,
/**
* create_dir - create a dir in the tracefs filesystem
- * @name: the name of the file to create.
+ * @ei: the eventfs_inode that represents the directory to create
* @parent: parent dentry for this file.
* @data: something that the caller will want to get to later on.
*
@@ -155,13 +227,14 @@ static struct dentry *create_file(const char *name, umode_t mode,
* If tracefs is not enabled in the kernel, the value -%ENODEV will be
* returned.
*/
-static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
+static struct dentry *create_dir(struct eventfs_file *ef,
+ struct dentry *parent, void *data)
{
struct tracefs_inode *ti;
struct dentry *dentry;
struct inode *inode;
- dentry = eventfs_start_creating(name, parent);
+ dentry = eventfs_start_creating(ef->name, parent);
if (IS_ERR(dentry))
return dentry;
@@ -169,7 +242,8 @@ static struct dentry *create_dir(const char *name, struct dentry *parent, void *
if (unlikely(!inode))
return eventfs_failed_creating(dentry);
- inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
+ update_inode_attr(inode, ef);
+
inode->i_op = &eventfs_root_dir_inode_operations;
inode->i_fop = &eventfs_file_operations;
inode->i_private = data;
@@ -306,10 +380,9 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
inode_lock(parent->d_inode);
if (ef->ei)
- dentry = create_dir(ef->name, parent, ef->data);
+ dentry = create_dir(ef, parent, ef->data);
else
- dentry = create_file(ef->name, ef->mode, parent,
- ef->data, ef->fop);
+ dentry = create_file(ef, parent, ef->data, ef->fop);
if (!lookup)
inode_unlock(parent->d_inode);
@@ -475,6 +548,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
if (d) {
struct dentry **tmp;
+
tmp = krealloc(dentries, sizeof(d) * (cnt + 2), GFP_KERNEL);
if (!tmp)
break;
@@ -549,13 +623,14 @@ static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
return ERR_PTR(-ENOMEM);
}
INIT_LIST_HEAD(&ef->ei->e_top_files);
+ ef->mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
} else {
ef->ei = NULL;
+ ef->mode = mode;
}
ef->iop = iop;
ef->fop = fop;
- ef->mode = mode;
ef->data = data;
return ef;
}
--
2.42.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [v6.6][PATCH 4/5] eventfs: Delete eventfs_inode when the last dentry is freed
2023-11-05 15:56 [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Steven Rostedt
` (2 preceding siblings ...)
2023-11-05 15:56 ` [v6.6][PATCH 3/5] eventfs: Save ownership and mode Steven Rostedt
@ 2023-11-05 15:56 ` Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 5/5] eventfs: Use simple_recursive_removal() to clean up dentries Steven Rostedt
2023-11-06 11:40 ` [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Greg KH
5 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2023-11-05 15:56 UTC (permalink / raw)
To: linux-kernel, stable, gregkh
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Ajay Kaher
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
commit 020010fbfa202aa528a52743eba4ab0da3400a4e upstream
There exists a race between holding a reference of an eventfs_inode dentry
and the freeing of the eventfs_inode. If user space has a dentry held long
enough, it may still be able to access the dentry's eventfs_inode after it
has been freed.
To prevent this, have he eventfs_inode freed via the last dput() (or via
RCU if the eventfs_inode does not have a dentry).
This means reintroducing the eventfs_inode del_list field at a temporary
place to put the eventfs_inode. It needs to mark it as freed (via the
list) but also must invalidate the dentry immediately as the return from
eventfs_remove_dir() expects that they are. But the dentry invalidation
must not be called under the eventfs_mutex, so it must be done after the
eventfs_inode is marked as free (put on a deletion list).
Link: https://lkml.kernel.org/r/20231101172650.123479767@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 150 +++++++++++++++++++--------------------
1 file changed, 74 insertions(+), 76 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 6a3f7502310c..7aa92b8ebc51 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -53,10 +53,12 @@ struct eventfs_file {
const struct inode_operations *iop;
/*
* Union - used for deletion
+ * @llist: for calling dput() if needed after RCU
* @del_list: list of eventfs_file to delete
* @rcu: eventfs_file to delete in RCU
*/
union {
+ struct llist_node llist;
struct list_head del_list;
struct rcu_head rcu;
};
@@ -113,8 +115,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
mutex_lock(&eventfs_mutex);
ef = dentry->d_fsdata;
- /* The LSB is set when the eventfs_inode is being freed */
- if (((unsigned long)ef & 1UL) || ef->is_freed) {
+ if (ef->is_freed) {
/* Do not allow changes if the event is about to be removed. */
mutex_unlock(&eventfs_mutex);
return -ENODEV;
@@ -258,6 +259,13 @@ static struct dentry *create_dir(struct eventfs_file *ef,
return eventfs_end_creating(dentry);
}
+static void free_ef(struct eventfs_file *ef)
+{
+ kfree(ef->name);
+ kfree(ef->ei);
+ kfree(ef);
+}
+
/**
* eventfs_set_ef_status_free - set the ef->status to free
* @ti: the tracefs_inode of the dentry
@@ -270,34 +278,20 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
{
struct tracefs_inode *ti_parent;
struct eventfs_inode *ei;
- struct eventfs_file *ef, *tmp;
+ struct eventfs_file *ef;
/* The top level events directory may be freed by this */
if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) {
- LIST_HEAD(ef_del_list);
-
mutex_lock(&eventfs_mutex);
-
ei = ti->private;
- /* Record all the top level files */
- list_for_each_entry_srcu(ef, &ei->e_top_files, list,
- lockdep_is_held(&eventfs_mutex)) {
- list_add_tail(&ef->del_list, &ef_del_list);
- }
-
/* Nothing should access this, but just in case! */
ti->private = NULL;
-
mutex_unlock(&eventfs_mutex);
- /* Now safely free the top level files and their children */
- list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
- list_del(&ef->del_list);
- eventfs_remove(ef);
- }
-
- kfree(ei);
+ ef = dentry->d_fsdata;
+ if (ef)
+ free_ef(ef);
return;
}
@@ -311,16 +305,13 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
if (!ef)
goto out;
- /*
- * If ef was freed, then the LSB bit is set for d_fsdata.
- * But this should not happen, as it should still have a
- * ref count that prevents it. Warn in case it does.
- */
- if (WARN_ON_ONCE((unsigned long)ef & 1))
- goto out;
+ if (ef->is_freed) {
+ free_ef(ef);
+ } else {
+ ef->dentry = NULL;
+ }
dentry->d_fsdata = NULL;
- ef->dentry = NULL;
out:
mutex_unlock(&eventfs_mutex);
}
@@ -847,13 +838,53 @@ int eventfs_add_file(const char *name, umode_t mode,
return 0;
}
-static void free_ef(struct rcu_head *head)
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
+{
+ struct eventfs_file *ef, *tmp;
+ struct llist_node *llnode;
+
+ llnode = llist_del_all(&free_list);
+ llist_for_each_entry_safe(ef, tmp, llnode, llist) {
+ /* This should only get here if it had a dentry */
+ if (!WARN_ON_ONCE(!ef->dentry))
+ dput(ef->dentry);
+ }
+}
+
+static DECLARE_WORK(eventfs_work, eventfs_workfn);
+
+static void free_rcu_ef(struct rcu_head *head)
{
struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
- kfree(ef->name);
- kfree(ef->ei);
- kfree(ef);
+ if (ef->dentry) {
+ /* Do not free the ef until all references of dentry are gone */
+ if (llist_add(&ef->llist, &free_list))
+ queue_work(system_unbound_wq, &eventfs_work);
+ return;
+ }
+
+ free_ef(ef);
+}
+
+static void unhook_dentry(struct dentry *dentry)
+{
+ if (!dentry)
+ return;
+
+ /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+ dget(dentry);
+
+ dentry->d_fsdata = NULL;
+ d_invalidate(dentry);
+ mutex_lock(&eventfs_mutex);
+ /* dentry should now have at least a single reference */
+ WARN_ONCE((int)d_count(dentry) < 1,
+ "dentry %px (%s) less than one reference (%d) after invalidate\n",
+ dentry, dentry->d_name.name, d_count(dentry));
+ mutex_unlock(&eventfs_mutex);
}
/**
@@ -905,58 +936,25 @@ void eventfs_remove(struct eventfs_file *ef)
{
struct eventfs_file *tmp;
LIST_HEAD(ef_del_list);
- struct dentry *dentry_list = NULL;
- struct dentry *dentry;
if (!ef)
return;
+ /*
+ * Move the deleted eventfs_inodes onto the ei_del_list
+ * which will also set the is_freed value. Note, this has to be
+ * done under the eventfs_mutex, but the deletions of
+ * the dentries must be done outside the eventfs_mutex.
+ * Hence moving them to this temporary list.
+ */
mutex_lock(&eventfs_mutex);
eventfs_remove_rec(ef, &ef_del_list, 0);
- list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
- if (ef->dentry) {
- unsigned long ptr = (unsigned long)dentry_list;
-
- /* Keep the dentry from being freed yet */
- dget(ef->dentry);
-
- /*
- * Paranoid: The dget() above should prevent the dentry
- * from being freed and calling eventfs_set_ef_status_free().
- * But just in case, set the link list LSB pointer to 1
- * and have eventfs_set_ef_status_free() check that to
- * make sure that if it does happen, it will not think
- * the d_fsdata is an event_file.
- *
- * For this to work, no event_file should be allocated
- * on a odd space, as the ef should always be allocated
- * to be at least word aligned. Check for that too.
- */
- WARN_ON_ONCE(ptr & 1);
-
- ef->dentry->d_fsdata = (void *)(ptr | 1);
- dentry_list = ef->dentry;
- ef->dentry = NULL;
- }
- call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
- }
mutex_unlock(&eventfs_mutex);
- while (dentry_list) {
- unsigned long ptr;
-
- dentry = dentry_list;
- ptr = (unsigned long)dentry->d_fsdata & ~1UL;
- dentry_list = (struct dentry *)ptr;
- dentry->d_fsdata = NULL;
- d_invalidate(dentry);
- mutex_lock(&eventfs_mutex);
- /* dentry should now have at least a single reference */
- WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %p less than one reference (%d) after invalidate\n",
- dentry, d_count(dentry));
- mutex_unlock(&eventfs_mutex);
- dput(dentry);
+ list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
+ unhook_dentry(ef->dentry);
+ list_del(&ef->del_list);
+ call_srcu(&eventfs_srcu, &ef->rcu, free_rcu_ef);
}
}
--
2.42.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [v6.6][PATCH 5/5] eventfs: Use simple_recursive_removal() to clean up dentries
2023-11-05 15:56 [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Steven Rostedt
` (3 preceding siblings ...)
2023-11-05 15:56 ` [v6.6][PATCH 4/5] eventfs: Delete eventfs_inode when the last dentry is freed Steven Rostedt
@ 2023-11-05 15:56 ` Steven Rostedt
2023-11-06 11:40 ` [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Greg KH
5 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2023-11-05 15:56 UTC (permalink / raw)
To: linux-kernel, stable, gregkh
Cc: Masami Hiramatsu, Mark Rutland, Andrew Morton, Al Viro
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
commit 407c6726ca71b33330d2d6345d9ea7ebc02575e9 upstream
Looking at how dentry is removed via the tracefs system, I found that
eventfs does not do everything that it did under tracefs. The tracefs
removal of a dentry calls simple_recursive_removal() that does a lot more
than a simple d_invalidate().
As it should be a requirement that any eventfs_inode that has a dentry, so
does its parent. When removing a eventfs_inode, if it has a dentry, a call
to simple_recursive_removal() on that dentry should clean up all the
dentries underneath it.
Add WARN_ON_ONCE() to check for the parent having a dentry if any children
do.
Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/
Link: https://lkml.kernel.org/r/20231101172650.552471568@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
fs/tracefs/event_inode.c | 71 +++++++++++++++++++---------------------
1 file changed, 33 insertions(+), 38 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 7aa92b8ebc51..5fcfb634fec2 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -54,12 +54,10 @@ struct eventfs_file {
/*
* Union - used for deletion
* @llist: for calling dput() if needed after RCU
- * @del_list: list of eventfs_file to delete
* @rcu: eventfs_file to delete in RCU
*/
union {
struct llist_node llist;
- struct list_head del_list;
struct rcu_head rcu;
};
void *data;
@@ -276,7 +274,6 @@ static void free_ef(struct eventfs_file *ef)
*/
void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
{
- struct tracefs_inode *ti_parent;
struct eventfs_inode *ei;
struct eventfs_file *ef;
@@ -297,10 +294,6 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
mutex_lock(&eventfs_mutex);
- ti_parent = get_tracefs(dentry->d_parent->d_inode);
- if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
- goto out;
-
ef = dentry->d_fsdata;
if (!ef)
goto out;
@@ -873,30 +866,29 @@ static void unhook_dentry(struct dentry *dentry)
{
if (!dentry)
return;
-
- /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+ /*
+ * Need to add a reference to the dentry that is expected by
+ * simple_recursive_removal(), which will include a dput().
+ */
dget(dentry);
- dentry->d_fsdata = NULL;
- d_invalidate(dentry);
- mutex_lock(&eventfs_mutex);
- /* dentry should now have at least a single reference */
- WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %px (%s) less than one reference (%d) after invalidate\n",
- dentry, dentry->d_name.name, d_count(dentry));
- mutex_unlock(&eventfs_mutex);
+ /*
+ * Also add a reference for the dput() in eventfs_workfn().
+ * That is required as that dput() will free the ei after
+ * the SRCU grace period is over.
+ */
+ dget(dentry);
}
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ef: eventfs_file to be removed.
- * @head: to create list of eventfs_file to be deleted
* @level: to check recursion depth
*
* The helper function eventfs_remove_rec() is used to clean up and free the
* associated data from eventfs for both of the added functions.
*/
-static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
+static void eventfs_remove_rec(struct eventfs_file *ef, int level)
{
struct eventfs_file *ef_child;
@@ -916,14 +908,16 @@ static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head,
/* search for nested folders or files */
list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
lockdep_is_held(&eventfs_mutex)) {
- eventfs_remove_rec(ef_child, head, level + 1);
+ eventfs_remove_rec(ef_child, level + 1);
}
}
ef->is_freed = 1;
+ unhook_dentry(ef->dentry);
+
list_del_rcu(&ef->list);
- list_add_tail(&ef->del_list, head);
+ call_srcu(&eventfs_srcu, &ef->rcu, free_rcu_ef);
}
/**
@@ -934,28 +928,22 @@ static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head,
*/
void eventfs_remove(struct eventfs_file *ef)
{
- struct eventfs_file *tmp;
- LIST_HEAD(ef_del_list);
+ struct dentry *dentry;
if (!ef)
return;
- /*
- * Move the deleted eventfs_inodes onto the ei_del_list
- * which will also set the is_freed value. Note, this has to be
- * done under the eventfs_mutex, but the deletions of
- * the dentries must be done outside the eventfs_mutex.
- * Hence moving them to this temporary list.
- */
mutex_lock(&eventfs_mutex);
- eventfs_remove_rec(ef, &ef_del_list, 0);
+ dentry = ef->dentry;
+ eventfs_remove_rec(ef, 0);
mutex_unlock(&eventfs_mutex);
- list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
- unhook_dentry(ef->dentry);
- list_del(&ef->del_list);
- call_srcu(&eventfs_srcu, &ef->rcu, free_rcu_ef);
- }
+ /*
+ * If any of the ei children has a dentry, then the ei itself
+ * must have a dentry.
+ */
+ if (dentry)
+ simple_recursive_removal(dentry, NULL);
}
/**
@@ -966,6 +954,8 @@ void eventfs_remove(struct eventfs_file *ef)
*/
void eventfs_remove_events_dir(struct dentry *dentry)
{
+ struct eventfs_file *ef_child;
+ struct eventfs_inode *ei;
struct tracefs_inode *ti;
if (!dentry || !dentry->d_inode)
@@ -975,6 +965,11 @@ void eventfs_remove_events_dir(struct dentry *dentry)
if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
return;
- d_invalidate(dentry);
- dput(dentry);
+ mutex_lock(&eventfs_mutex);
+ ei = ti->private;
+ list_for_each_entry_srcu(ef_child, &ei->e_top_files, list,
+ lockdep_is_held(&eventfs_mutex)) {
+ eventfs_remove_rec(ef_child, 0);
+ }
+ mutex_unlock(&eventfs_mutex);
}
--
2.42.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6
2023-11-05 15:56 [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Steven Rostedt
` (4 preceding siblings ...)
2023-11-05 15:56 ` [v6.6][PATCH 5/5] eventfs: Use simple_recursive_removal() to clean up dentries Steven Rostedt
@ 2023-11-06 11:40 ` Greg KH
2023-11-06 20:12 ` Steven Rostedt
5 siblings, 1 reply; 14+ messages in thread
From: Greg KH @ 2023-11-06 11:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, stable, Masami Hiramatsu, Mark Rutland,
Andrew Morton
On Sun, Nov 05, 2023 at 10:56:30AM -0500, Steven Rostedt wrote:
>
> Greg,
>
> Friday before the merge window opened, I received a bug report
> for the eventfs code that was in linux-next. I spent the next
> 5 days debugging it and not only fixing it, but it led to finding
> other bugs in the code. Several of these other bugs happen to
> also affect the 6.6 kernel.
>
> The eventfs code was written in two parts to lower the complexity.
> The first part added just the dynamic creation of the eventfs
> file system and that was added to 6.6.
>
> The second part went further and removed the one-to-one mapping between
> dentry/inode and meta data, as all events have the same files. It replaced
> the meta data for each file with callbacks, which caused quite a bit of
> code churn.
>
> As the merge window was already open, when I finished all the fixes
> I just sent those fixes on top of the linux-next changes along with
> my pull request. That means, there are 5 commits that are marked
> stable (or should be marked for stable) that need to be applied to
> 6.6 but require a bit of tweaking or even a new way of implementing the fix!
>
> After sending the pull request, I then checked out 6.6 an took those
> 5 changes and fixed them up on top of it. I ran them through all my
> tests that I use to send to Linus.
>
> So these should be as good as the versions of the patches in Linus's tree.
> I waited until Linus pulled in those changes to send this series out.
All now queued up. Note, patch 1/6 needs to go to older kernels as
well, according to your Fixes: tag, so if you could provide backports
for them as well that would be great.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6
2023-11-06 11:40 ` [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Greg KH
@ 2023-11-06 20:12 ` Steven Rostedt
0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2023-11-06 20:12 UTC (permalink / raw)
To: Greg KH; +Cc: linux-kernel, stable, Masami Hiramatsu, Mark Rutland,
Andrew Morton
On Mon, 6 Nov 2023 12:40:52 +0100
Greg KH <gregkh@linuxfoundation.org> wrote:
> All now queued up. Note, patch 1/6 needs to go to older kernels as
> well, according to your Fixes: tag, so if you could provide backports
> for them as well that would be great.
Done.
-- Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
* NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing`
2023-11-05 15:56 ` [v6.6][PATCH 3/5] eventfs: Save ownership and mode Steven Rostedt
@ 2023-11-12 10:41 ` Milian Wolff
2023-11-12 12:14 ` Steven Rostedt
2023-11-14 13:38 ` Linux regression tracking #adding (Thorsten Leemhuis)
0 siblings, 2 replies; 14+ messages in thread
From: Milian Wolff @ 2023-11-12 10:41 UTC (permalink / raw)
To: rostedt
Cc: akaher, akpm, gregkh, linux-kernel, mark.rutland, mhiramat,
Milian Wolff
Hey all,
this patch seems to have introduced a kernel bug causing
a NULL pointer dereference when one runs:
sudo chown -R root:tracing /sys/kernel/debug/tracing/
See the archlinux bug report I created initially for some more information:
https://bugs.archlinux.org/task/80230
With 6.6.1 and 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b reverted,
the above `chmod` command works. With a normal 6.6.1 build, or re-applying
the patch again, the command fails and `dmesg` shows:
```
[ 60.723813] BUG: kernel NULL pointer dereference, address: 0000000000000058
[ 60.723817] #PF: supervisor read access in kernel mode
[ 60.723819] #PF: error_code(0x0000) - not-present page
[ 60.723820] PGD 0 P4D 0
[ 60.723821] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 60.723823] CPU: 5 PID: 2830 Comm: chown Tainted: P OE 6.6.1-arch1-1 #1 be166a630cd909acf8820643140e9106c6ea80e6
[ 60.723825] Hardware name: LENOVO 20Y30018GE/20Y30018GE, BIOS N40ET42W (1.24 ) 07/26/2023
[ 60.723826] RIP: 0010:eventfs_set_attr+0x28/0xd0
[ 60.723830] Code: 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 48 89 fd 48 c7 c7 a0 b7 b2 b2 53 48 89 f3 e8 8c 16 8c 00 4c 8b 6b 78 <41> f6 45 58 01 0f 85 87 00 00 00 48 89 de 4c 89 e2 48 89 ef e8 4f
[ 60.723830] RSP: 0018:ffffc90007fdbd38 EFLAGS: 00010246
[ 60.723832] RAX: 0000000000000000 RBX: ffff88810047e180 RCX: 8000000000000000
[ 60.723832] RDX: ffff888174b6ce00 RSI: ffff88810047e180 RDI: ffffffffb2b2b7a0
[ 60.723833] RBP: ffffffffb2b20620 R08: 0000000000000000 R09: ffffffffb2a4a488
[ 60.723834] R10: 00000000654f505f R11: 0000000018432441 R12: ffffc90007fdbe00
[ 60.723835] R13: 0000000000000000 R14: ffff88810047e180 R15: ffff8881004c02a8
[ 60.723835] FS: 00007f4ac6f4c740(0000) GS:ffff88901f540000(0000) knlGS:0000000000000000
[ 60.723836] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.723837] CR2: 0000000000000058 CR3: 000000021ad48001 CR4: 0000000000f70ee0
[ 60.723838] PKRU: 55555554
[ 60.723839] Call Trace:
[ 60.723840] <TASK>
[ 60.723842] ? __die+0x23/0x70
[ 60.723845] ? page_fault_oops+0x171/0x4e0
[ 60.723847] ? generic_permission+0x39/0x220
[ 60.723850] ? exc_page_fault+0x7f/0x180
[ 60.723853] ? asm_exc_page_fault+0x26/0x30
[ 60.723857] ? eventfs_set_attr+0x28/0xd0
[ 60.723858] ? eventfs_set_attr+0x24/0xd0
[ 60.723859] notify_change+0x1f2/0x4b0
[ 60.723862] ? chown_common+0x222/0x230
[ 60.723863] chown_common+0x222/0x230
[ 60.723865] do_fchownat+0xa3/0x100
[ 60.723866] __x64_sys_fchownat+0x1f/0x30
[ 60.723867] do_syscall_64+0x5d/0x90
[ 60.723869] ? syscall_exit_to_user_mode+0x2b/0x40
[ 60.723871] ? do_syscall_64+0x6c/0x90
[ 60.723872] ? do_syscall_64+0x6c/0x90
[ 60.723874] ? do_syscall_64+0x6c/0x90
[ 60.723875] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 60.723877] RIP: 0033:0x7f4ac704e2ce
[ 60.723904] Code: 48 8b 0d 65 8a 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 04 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 32 8a 0d 00 f7 d8 64 89 01 48
[ 60.723905] RSP: 002b:00007ffed0b735b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000104
[ 60.723906] RAX: ffffffffffffffda RBX: 00005625366cfd80 RCX: 00007f4ac704e2ce
[ 60.723907] RDX: 0000000000000000 RSI: 00005625366cfe10 RDI: 0000000000000004
[ 60.723908] RBP: 0000000000000001 R08: 0000000000000100 R09: 0000000000000007
[ 60.723908] R10: 00000000000003e9 R11: 0000000000000246 R12: 00005625366cb9c0
[ 60.723909] R13: 00005625366cfd10 R14: 00005625366cfe10 R15: 0000000000000000
[ 60.723910] </TASK>
[ 60.723910] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm ccm cmac algif_hash algif_skcipher af_alg bnep nvidia_drm(POE) nvidia_modeset(POE) snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes nvidia_uvm(POE) snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic intel_tcc_cooling snd_sof_pci_intel_tgl x86_pkg_temp_thermal intel_powerclamp snd_sof_intel_hda_common soundwire_intel coretemp snd_sof_intel_hda_mlink nvidia(POE) soundwire_cadence snd_sof_intel_hda kvm_intel snd_sof_pci snd_sof_xtensa_dsp snd_sof kvm snd_sof_utils snd_soc_hdac_hda vfat snd_hda_ext_core fat snd_soc_acpi_intel_match irqbypass snd_soc_acpi iwlmvm crct10dif_pclmul joydev soundwire_generic_allocation mousedev soundwire_bus crc32_pclmul snd_hda_codec_hdmi snd_soc_core polyval_clmulni polyval_generic snd_compress gf128mul mac80211 ac97_bus ghash_clmulni_intel snd_pcm_dmaengine sha512_ssse3 btusb aesni_intel btrtl uvcvideo snd_hda_intel crypto_simd btintel videobuf2_vmalloc
[ 60.723940] snd_intel_dspcfg cryptd libarc4 btbcm uvc snd_intel_sdw_acpi btmtk videobuf2_memops iTCO_wdt hid_multitouch snd_hda_codec intel_pmc_bxt videobuf2_v4l2 mei_hdcp mei_wdt mei_pxp rapl ee1004 iTCO_vendor_support intel_rapl_msr snd_hda_core bluetooth processor_thermal_device_pci_legacy videodev processor_thermal_device iwlwifi processor_thermal_rfim spi_nor videobuf2_common snd_hwdep intel_cstate think_lmi processor_thermal_mbox intel_uncore psmouse pcspkr cfg80211 mc ecdh_generic mtd firmware_attributes_class wmi_bmof mei_me ucsi_acpi processor_thermal_rapl snd_pcm i2c_i801 intel_lpss_pci typec_ucsi intel_rapl_common i2c_smbus intel_lpss thunderbolt i2c_hid_acpi mei snd_timer typec intel_soc_dts_iosf idma64 i2c_hid int3403_thermal roles int340x_thermal_zone int3400_thermal acpi_tad acpi_pad acpi_thermal_rel mac_hid i2c_dev crypto_user fuse acpi_call(OE) dm_mod loop ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 thinkpad_acpi rtsx_pci_sdmmc ledtrig_audio i2c_algo_bit mmc_core serio_raw
[ 60.723972] platform_profile drm_buddy ttm snd atkbd nvme intel_gtt libps2 vivaldi_fmap soundcore drm_display_helper nvme_core xhci_pci crc32c_intel spi_intel_pci rfkill rtsx_pci spi_intel xhci_pci_renesas cec nvme_common video i8042 serio wmi
[ 60.723982] CR2: 0000000000000058
[ 60.723983] ---[ end trace 0000000000000000 ]---
[ 60.723984] RIP: 0010:eventfs_set_attr+0x28/0xd0
[ 60.723985] Code: 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 48 89 fd 48 c7 c7 a0 b7 b2 b2 53 48 89 f3 e8 8c 16 8c 00 4c 8b 6b 78 <41> f6 45 58 01 0f 85 87 00 00 00 48 89 de 4c 89 e2 48 89 ef e8 4f
[ 60.723986] RSP: 0018:ffffc90007fdbd38 EFLAGS: 00010246
[ 60.723987] RAX: 0000000000000000 RBX: ffff88810047e180 RCX: 8000000000000000
[ 60.723987] RDX: ffff888174b6ce00 RSI: ffff88810047e180 RDI: ffffffffb2b2b7a0
[ 60.723988] RBP: ffffffffb2b20620 R08: 0000000000000000 R09: ffffffffb2a4a488
[ 60.723989] R10: 00000000654f505f R11: 0000000018432441 R12: ffffc90007fdbe00
[ 60.723989] R13: 0000000000000000 R14: ffff88810047e180 R15: ffff8881004c02a8
[ 60.723990] FS: 00007f4ac6f4c740(0000) GS:ffff88901f540000(0000) knlGS:0000000000000000
[ 60.723991] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.723991] CR2: 0000000000000058 CR3: 000000021ad48001 CR4: 0000000000f70ee0
[ 60.723992] PKRU: 55555554
[ 60.723992] note: chown[2830] exited with irqs disabled
```
Thanks, if you need more information just let me know.
Cheers
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing`
2023-11-12 10:41 ` NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing` Milian Wolff
@ 2023-11-12 12:14 ` Steven Rostedt
2023-11-12 14:26 ` Steven Rostedt
2023-11-14 13:38 ` Linux regression tracking #adding (Thorsten Leemhuis)
1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2023-11-12 12:14 UTC (permalink / raw)
To: Milian Wolff; +Cc: akaher, akpm, gregkh, linux-kernel, mark.rutland, mhiramat
On Sun, 12 Nov 2023 11:41:58 +0100
Milian Wolff <milian.wolff@kdab.com> wrote:
> Hey all,
>
> this patch seems to have introduced a kernel bug causing
> a NULL pointer dereference when one runs:
>
> sudo chown -R root:tracing /sys/kernel/debug/tracing/
>
> See the archlinux bug report I created initially for some more information:
> https://bugs.archlinux.org/task/80230
>
> With 6.6.1 and 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b reverted,
> the above `chmod` command works. With a normal 6.6.1 build, or re-applying
> the patch again, the command fails and `dmesg` shows:
Thanks for the report. I'll work on it on my way to Plumbers.
-- Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing`
2023-11-12 12:14 ` Steven Rostedt
@ 2023-11-12 14:26 ` Steven Rostedt
0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2023-11-12 14:26 UTC (permalink / raw)
To: Milian Wolff; +Cc: akaher, akpm, gregkh, linux-kernel, mark.rutland, mhiramat
On Sun, 12 Nov 2023 07:14:39 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> > With 6.6.1 and 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b reverted,
> > the above `chmod` command works. With a normal 6.6.1 build, or re-applying
> > the patch again, the command fails and `dmesg` shows:
>
> Thanks for the report. I'll work on it on my way to Plumbers.
Can you test this patch?
Note, this code was rewritten for 6.7 so it probably doesn't affect
that tree, but I'm going to test to make sure, just in case.
Also, this shows I need to add a selftest to cover this case.
Thanks,
-- Steve
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 5fcfb634fec2..efbdc47c74dc 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -113,14 +113,14 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
mutex_lock(&eventfs_mutex);
ef = dentry->d_fsdata;
- if (ef->is_freed) {
+ if (ef && ef->is_freed) {
/* Do not allow changes if the event is about to be removed. */
mutex_unlock(&eventfs_mutex);
return -ENODEV;
}
ret = simple_setattr(idmap, dentry, iattr);
- if (!ret)
+ if (!ret && ef)
update_attr(ef, iattr);
mutex_unlock(&eventfs_mutex);
return ret;
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing`
2023-11-12 10:41 ` NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing` Milian Wolff
2023-11-12 12:14 ` Steven Rostedt
@ 2023-11-14 13:38 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-11-14 13:55 ` Steven Rostedt
1 sibling, 1 reply; 14+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-11-14 13:38 UTC (permalink / raw)
To: Milian Wolff, rostedt
Cc: akaher, akpm, gregkh, linux-kernel, mark.rutland, mhiramat
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]
On 12.11.23 11:41, Milian Wolff wrote:
>
> this patch seems to have introduced a kernel bug causing
> a NULL pointer dereference when one runs:
>
> sudo chown -R root:tracing /sys/kernel/debug/tracing/
>
> See the archlinux bug report I created initially for some more information:
> https://bugs.archlinux.org/task/80230
>
> With 6.6.1 and 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b reverted,
> the above `chmod` command works. With a normal 6.6.1 build, or re-applying
> the patch again, the command fails and `dmesg` shows:
Steven is already working on this, but to ensure the issue doesn't fall
through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel
regression tracking bot (and from the context I assume it happens in
mainline as well)
#regzbot ^introduced 28e12c09f5aa081b2
#regzbot title eventfs: NULL pointer dereference regression when running
`chmod -R root:tracing /sys/kernel/debug/tracing`
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing`
2023-11-14 13:38 ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-11-14 13:55 ` Steven Rostedt
2023-11-14 15:06 ` Thorsten Leemhuis
0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2023-11-14 13:55 UTC (permalink / raw)
To: Linux regression tracking #adding (Thorsten Leemhuis)
Cc: Linux regressions mailing list, Milian Wolff, akaher, akpm,
gregkh, linux-kernel, mark.rutland, mhiramat
On Tue, 14 Nov 2023 14:38:57 +0100
"Linux regression tracking #adding (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.
> See link in footer if these mails annoy you.]
>
> On 12.11.23 11:41, Milian Wolff wrote:
> >
> > this patch seems to have introduced a kernel bug causing
> > a NULL pointer dereference when one runs:
> >
> > sudo chown -R root:tracing /sys/kernel/debug/tracing/
> >
> > See the archlinux bug report I created initially for some more information:
> > https://bugs.archlinux.org/task/80230
> >
> > With 6.6.1 and 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b reverted,
> > the above `chmod` command works. With a normal 6.6.1 build, or re-applying
> > the patch again, the command fails and `dmesg` shows:
>
> Steven is already working on this, but to ensure the issue doesn't fall
> through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel
> regression tracking bot (and from the context I assume it happens in
> mainline as well)
Note, the code in question was rewritten in 6.7 and the bug does not
exist there. It only exists in 6.6 and I already sent Greg the patch,
and he told me that it's in his queue.
It's only a regression in 6.6 and not in mainline.
-- Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing`
2023-11-14 13:55 ` Steven Rostedt
@ 2023-11-14 15:06 ` Thorsten Leemhuis
0 siblings, 0 replies; 14+ messages in thread
From: Thorsten Leemhuis @ 2023-11-14 15:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: Linux regressions mailing list, Milian Wolff, akaher, akpm,
gregkh, linux-kernel, mark.rutland, mhiramat
On 14.11.23 14:55, Steven Rostedt wrote:
> On Tue, 14 Nov 2023 14:38:57 +0100
> "Linux regression tracking #adding (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:
>
>> [TLDR: I'm adding this report to the list of tracked Linux kernel
>> regressions; the text you find below is based on a few templates
>> paragraphs you might have encountered already in similar form.
>> See link in footer if these mails annoy you.]
>>
>> On 12.11.23 11:41, Milian Wolff wrote:
>>>
>>> this patch seems to have introduced a kernel bug causing
>>> a NULL pointer dereference when one runs:
>>>
>>> sudo chown -R root:tracing /sys/kernel/debug/tracing/
>>>
>>> See the archlinux bug report I created initially for some more information:
>>> https://bugs.archlinux.org/task/80230
>>>
>>> With 6.6.1 and 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b reverted,
>>> the above `chmod` command works. With a normal 6.6.1 build, or re-applying
>>> the patch again, the command fails and `dmesg` shows:
>>
>> Steven is already working on this, but to ensure the issue doesn't fall
>> through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel
>> regression tracking bot (and from the context I assume it happens in
>> mainline as well)
>
> Note, the code in question was rewritten in 6.7 and the bug does not
> exist there. It only exists in 6.6 and I already sent Greg the patch,
> and he told me that it's in his queue.
>
> It's only a regression in 6.6 and not in mainline.
Ahh, sorry fot getting this wrong and thx for letting me know.
#regzbot introduced 9aaee3eebc91dd9ccebf6b6bc8a5f59d04ef718b
#regzbot fix: eventfs: Check for NULL ef in eventfs_set_attr()
Ciao, Thorsten
P.S.: Enjoy LPC, hope to be there again next year.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-11-14 15:06 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-05 15:56 [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 1/5] tracing: Have trace_event_file have ref counters Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 2/5] eventfs: Remove "is_freed" union with rcu head Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 3/5] eventfs: Save ownership and mode Steven Rostedt
2023-11-12 10:41 ` NULL pointer dereference regression when running `chmod -R root:tracing /sys/kernel/debug/tracing` Milian Wolff
2023-11-12 12:14 ` Steven Rostedt
2023-11-12 14:26 ` Steven Rostedt
2023-11-14 13:38 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-11-14 13:55 ` Steven Rostedt
2023-11-14 15:06 ` Thorsten Leemhuis
2023-11-05 15:56 ` [v6.6][PATCH 4/5] eventfs: Delete eventfs_inode when the last dentry is freed Steven Rostedt
2023-11-05 15:56 ` [v6.6][PATCH 5/5] eventfs: Use simple_recursive_removal() to clean up dentries Steven Rostedt
2023-11-06 11:40 ` [v6.6][PATCH 0/5] tracing: Backport of eventfs fixes for v6.6 Greg KH
2023-11-06 20:12 ` Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox