From: Heiko Carstens <hca@linux.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Ajay Kaher <akaher@vmware.com>,
chinglinyu@google.com, lkp@intel.com, namit@vmware.com,
oe-lkp@lists.linux.dev, amakhalov@vmware.com,
er.ajay.kaher@gmail.com, srivatsa@csail.mit.edu,
tkundu@vmware.com, vsirnapalli@vmware.com,
linux-s390@vger.kernel.org
Subject: Re: [PATCH v5] eventfs: Remove eventfs_file and just use eventfs_inode
Date: Fri, 17 Nov 2023 15:23:35 +0100
Message-ID: <20231117142335.9674-A-hca@linux.ibm.com>
In-Reply-To: <20231004165007.43d79161@gandalf.local.home>
Hi Steven,
On Wed, Oct 04, 2023 at 04:50:07PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> Instead of having a descriptor for every file represented in the eventfs
> directory, only have the directory itself represented. Change the API to
> send in a list of entries that represent all the files in the directory
> (but not other directories). The entry list contains a name and a callback
> function that will be used to create the files when they are accessed.
...
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ajay Kaher <akaher@vmware.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Changes since v4: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home/
>
> - Get the ei->dentry within the eventfs_mutex to keep consistency during the lookup.
>
> fs/tracefs/event_inode.c | 847 ++++++++++++++++++-----------------
> fs/tracefs/inode.c | 2 +-
> fs/tracefs/internal.h | 37 +-
> include/linux/trace_events.h | 2 +-
> include/linux/tracefs.h | 29 +-
> kernel/trace/trace.c | 7 +-
> kernel/trace/trace.h | 4 +-
> kernel/trace/trace_events.c | 313 +++++++++----
> 8 files changed, 705 insertions(+), 536 deletions(-)
I think this patch occasionally causes crashes when running the ftrace
selftests. In particular, I suspect a bug in the error handling of this
function (see below for the call trace):
> +static struct dentry *
> +create_file_dentry(struct eventfs_inode *ei, struct dentry **e_dentry,
> + struct dentry *parent, const char *name, umode_t mode, void *data,
> + const struct file_operations *fops, bool lookup)
> +{
> + struct dentry *dentry;
> + bool invalidate = false;
> +
> + mutex_lock(&eventfs_mutex);
> + /* If the e_dentry already has a dentry, use it */
> + if (*e_dentry) {
> + /* lookup does not need to up the ref count */
> + if (!lookup)
> + dget(*e_dentry);
> + mutex_unlock(&eventfs_mutex);
> + return *e_dentry;
> + }
> + mutex_unlock(&eventfs_mutex);
> +
> + /* The lookup already has the parent->d_inode locked */
> + if (!lookup)
> + inode_lock(parent->d_inode);
> +
> + dentry = create_file(name, mode, parent, data, fops);
> +
> + if (!lookup)
> + inode_unlock(parent->d_inode);
> +
> + mutex_lock(&eventfs_mutex);
> +
> + if (IS_ERR_OR_NULL(dentry)) {
> + /*
> + * When the mutex was released, something else could have
> + * created the dentry for this e_dentry. In which case
> + * use that one.
> + *
> + * Note, with the mutex held, the e_dentry cannot have content
> + * and the ei->is_freed be true at the same time.
> + */
> + WARN_ON_ONCE(ei->is_freed);
> + dentry = *e_dentry;
> + /* The lookup does not need to up the dentry refcount */
> + if (dentry && !lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> +
> + if (!*e_dentry && !ei->is_freed) {
> + *e_dentry = dentry;
> + dentry->d_fsdata = ei;
> + } else {
> + /*
> + * Should never happen unless we get here due to being freed.
> + * Otherwise it means two dentries exist with the same name.
> + */
> + WARN_ON_ONCE(!ei->is_freed);
> + invalidate = true;
> + }
> + mutex_unlock(&eventfs_mutex);
> +
> + if (invalidate)
> + d_invalidate(dentry);
> +
> + if (lookup || invalidate)
> + dput(dentry);
> +
> + return invalidate ? NULL : dentry;
> +}
We sometimes see crashes like this:
specification exception: 0006 ilc:2 [#1] SMP
CPU: 6 PID: 38815 Comm: ls Not tainted 6.7.0-20231116.rc1.git1.a7e756a5bb26.300.vr.fc38.s390x #1
Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
Krnl PSW : 0704c00180000000 000001682304bb00 (d_invalidate+0x30/0x110)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 000000e200000000 0000000000000047 000000e200000007
0000000000000000 ffffff7c197bf000 000000e2f13b0b20 000000e25bfae180
000000e2f2536000 ffffffffffffffef 0000000000000000 ffffffffffffffef
000003ff95cacf98 000000e2f29323f0 000000e827c1fa18 000000e827c1f9d0
Krnl Code: 000001682304baf4: a7180000 lhi %r1,0
000001682304baf8: 583003ac l %r3,940
#000001682304bafc: ba13b058 cs %r1,%r3,88(%r11)
>000001682304bb00: ec16006b007e cij %r1,0,6,000001682304bbd6
000001682304bb06: e310b0100002 ltg %r1,16(%r11)
000001682304bb0c: a784004e brc 8,000001682304bba8
000001682304bb10: b904002b lgr %r2,%r11
000001682304bb14: c0e5ffffe67e brasl %r14,0000016823048810
Call Trace:
[<000001682304bb00>] d_invalidate+0x30/0x110
[<000001682329147a>] create_dir_dentry+0xe2/0x200
[<000001682329190a>] dcache_dir_open_wrapper+0x102/0x3e8
[<000001682301fb8a>] do_dentry_open+0x24a/0x568
[<0000016823038836>] do_open+0x2de/0x448
[<000001682303cb58>] path_openat+0x110/0x2b0
[<000001682303d688>] do_filp_open+0x90/0x130
[<0000016823022960>] do_sys_openat2+0xa8/0xd8
[<0000016823022b50>] do_sys_open+0x58/0x90
[<00000168239c9edc>] __do_syscall+0x1d4/0x200
[<00000168239db1f8>] system_call+0x70/0x98
Last Breaking-Event-Address:
[<0000016823291474>] create_dir_dentry+0xdc/0x200
Kernel panic - not syncing: Fatal exception: panic_on_oops
Note that the compare and swap instruction within d_invalidate() generates
a specification exception because it operates on an invalid address
(0xffffffffffffffef), which happens to be -EEXIST. So my assumption is that
create_dir_dentry() has incorrect error handling and passes -EEXIST instead
of a valid dentry pointer to d_invalidate().
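
To illustrate the arithmetic behind that assumption, here is a minimal
userspace sketch of the kernel's error-pointer convention. The helpers are
simplified stand-ins for those in include/linux/err.h; struct fake_dentry
and guarded_invalidate() are hypothetical names invented for this example
and are not the eventfs code or a proposed fix.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the helpers in the kernel's include/linux/err.h.
 * Simplified for illustration; the arithmetic is the same.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical stand-in for a dentry; only here so there is something to touch. */
struct fake_dentry {
	int invalidated;
};

/*
 * Hypothetical guarded wrapper: refuses to dereference an error pointer.
 * Returns 0 on success, or a negative errno if nothing was invalidated.
 */
static inline long guarded_invalidate(struct fake_dentry *dentry)
{
	if (!dentry)
		return -ENOENT;
	if (IS_ERR(dentry))
		return PTR_ERR(dentry);
	dentry->invalidated = 1;
	return 0;
}
```

On a 64-bit system, where EEXIST is 17, ERR_PTR(-EEXIST) has the value
0xffffffffffffffef, which matches %r9 and %r11 in the register dump above.
A caller that checks IS_ERR() before handing the pointer on, as
guarded_invalidate() does here, gets -EEXIST back instead of dereferencing
that address.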
But I'll leave it up to you to figure this out :)