From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
NeilBrown <neil@brown.name>
Subject: Re: [RFC] a possible way of reducing the PITA of ->d_name audits
Date: Mon, 8 Sep 2025 04:57:08 +0100 [thread overview]
Message-ID: <20250908035708.GH31600@ZenIV> (raw)
In-Reply-To: <20250908025135.GG31600@ZenIV>
On Mon, Sep 08, 2025 at 03:51:35AM +0100, Al Viro wrote:
> Most of the uses *are* done to stable dentries; it's just that we have no
> way to tell which ones are like that.
Random example to get the taste of that joy:
static void
xfs_dentry_to_name(
struct xfs_name *namep,
struct dentry *dentry)
{
namep->name = dentry->d_name.name;
namep->len = dentry->d_name.len;
namep->type = XFS_DIR3_FT_UNKNOWN;
}
OK, fetches from ->d_name. Callers:
xfs_cleanup_inode()
xfs_generic_create()
xfs_vn_mknod()
== xfs_dir_inode_operations.mknod
== xfs_dir_ci_inode_operations.mknod
xfs_vn_create()
== xfs_dir_inode_operations.create
== xfs_dir_ci_inode_operations.create
xfs_vn_mkdir()
== xfs_dir_inode_operations.mkdir
== xfs_dir_ci_inode_operations.mkdir
xfs_vn_tmpfile() # WTF?
xfs_vn_symlink()
== xfs_dir_inode_operations.symlink
== xfs_dir_ci_inode_operations.symlink
xfs_vn_lookup()
== xfs_dir_inode_operations.lookup
xfs_vn_ci_lookup()
== xfs_dir_ci_inode_operations.lookup
xfs_vn_unlink()
== xfs_dir_inode_operations.unlink
== xfs_dir_inode_operations.rmdir
== xfs_dir_ci_inode_operations.unlink
== xfs_dir_ci_inode_operations.rmdir
(+ checking that in all cases dentry has come from the method argument)
WTF is going on with xfs_vn_tmpfile()? It doesn't *have* any useful
name... looking... Aha. vfs_generic_create(_, _, _, _, _, p) only calls
xfs_cleanup_inode() in case when p is NULL; xfs_vn_tmpfile() is called
only as ->tmpfile(), and the only caller of that is this:
file->f_path.mnt = parentpath->mnt;
file->f_path.dentry = child;
mode = vfs_prepare_mode(idmap, dir, mode, mode, mode);
error = dir->i_op->tmpfile(idmap, dir, file, mode);
so the method never gets called with NULL as the 3rd argument. Safe...
And that's just one example - two grep hits. Right next to them,
static int
xfs_dentry_mode_to_name(
struct xfs_name *namep,
struct dentry *dentry,
int mode)
{
namep->name = dentry->d_name.name;
namep->len = dentry->d_name.len;
namep->type = xfs_mode_to_ftype(mode);
if (unlikely(namep->type == XFS_DIR3_FT_UNKNOWN))
return -EFSCORRUPTED;
return 0;
}
Callers:
xfs_generic_create()
same callers as above, but this time it's *not* conditional:
------------------------------------------------------------------
/* Verify mode is valid also for tmpfile case */
error = xfs_dentry_mode_to_name(&name, dentry, args.mode);
if (unlikely(error))
goto out_free_acl;
------------------------------------------------------------------
presumably it's "we may fetch shite for tmpfile, but in that case
we won't use that shite".
xfs_vn_link()
== xfs_dir_inode_operations.link
== xfs_dir_ci_inode_operations.link
xfs_vn_symlink()
seen above, same dentry as above
xfs_vn_rename() # for odentry
== xfs_dir_inode_operations.rename
== xfs_dir_ci_inode_operations.rename
xfs_vn_rename() # for ndentry
seen above
That's not all for xfs, though - there's also
error = xfs_inode_init_security(inode, dir, &dentry->d_name);
in the same xfs_generic_create() - and also called for tmpfile case,
AFAISC. Which is quite likely a bug - ->d_name is stable there, all
right, but at that stage it's "/"; what selinux (the only thing that
cares about the basename of object being created) would do to that
is an interesting question, might depend upon the policy. Non-tmpfile
callers are OK, as seen above.
Another one:
error = xfs_inode_init_security(inode, dir, &dentry->d_name);
in xfs_vn_symlink(), safe per above.
Another:
if (dentry->d_name.len >= MAXNAMELEN)
return ERR_PTR(-ENAMETOOLONG);
in xfs_vn_lookup() and xfs_vn_ci_lookup(). Safe.
... and finally there's this, in all its foul glory:
DECLARE_EVENT_CLASS(xrep_dentry_class,
TP_PROTO(struct xfs_mount *mp, const struct dentry *dentry),
TP_ARGS(mp, dentry),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(unsigned int, flags)
__field(unsigned long, ino)
__field(bool, positive)
__field(unsigned long, parent_ino)
__field(unsigned int, namelen)
__dynamic_array(char, name, dentry->d_name.len)
),
TP_fast_assign(
__entry->dev = mp->m_super->s_dev;
__entry->flags = dentry->d_flags;
__entry->positive = d_is_positive(dentry);
if (dentry->d_parent && d_inode(dentry->d_parent))
__entry->parent_ino = d_inode(dentry->d_parent)->i_ino;
else
__entry->parent_ino = -1UL;
__entry->ino = d_inode(dentry) ? d_inode(dentry)->i_ino : 0;
__entry->namelen = dentry->d_name.len;
memcpy(__get_str(name), dentry->d_name.name, dentry->d_name.len);
),
TP_printk("dev %d:%d flags 0x%x positive? %d parent_ino 0x%lx ino 0x%lx name '%.*s'",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->flags,
__entry->positive,
__entry->parent_ino,
__entry->ino,
__entry->namelen,
__get_str(name))
);
#define DEFINE_REPAIR_DENTRY_EVENT(name) \
DEFINE_EVENT(xrep_dentry_class, name, \
TP_PROTO(struct xfs_mount *mp, const struct dentry *dentry), \
TP_ARGS(mp, dentry))
DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_check_child);
DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_invalidate_child);
DEFINE_REPAIR_DENTRY_EVENT(xrep_dirtree_delete_child);
used by
trace_xrep_adoption_check_child(sc->mp, d_child);
in xrep_adoption_check_dcache(), two calls of
trace_xrep_adoption_invalidate_child(sc->mp, d_child);
in xrep_adoption_zap_dcache() and
trace_xrep_dirtree_delete_child(dp->i_mount, child_dentry);
in xrep_dirtree_purge_dentry()
The last 3 are *not* stable - fuck knows if they can happen in parallel with
lookups from other threads (those can end up moving dentries on sufficiently
buggered filesystem), but IMO these deserve take_dentry_name_snapshot()
treatment - if tracepoint is active, that is.
The rest all get stable dentries; I would really prefer to have that checked
by compiler, with sufficient annotations given to it. In this case -
struct stable_dentry arguments for lookup/create/mkdir/mknod/symlink/unlink/rmdir,
rename and link as part of calling conventions change
+ stable_dentry as argument of xfs_generic_create(), xfs_dentry_mode_to_name(),
xfs_dentry_to_name() and xfs_cleanup_inode()
+ claim of stability in xfs_vn_tmpfile():
STATIC int
xfs_vn_tmpfile(
struct mnt_idmap *idmap,
struct inode *dir,
struct file *file,
umode_t mode)
{
int err = xfs_generic_create(idmap, dir,
claim_stability(file->f_path.dentry), mode, 0, file);
// at this point in ->tmpfile() dentry is nameless and negative;
// nothing can move it until we get to finish_open_...()
return finish_open_simple(file, err);
}
That way xfs hits will be down to that claim_stability() and the obscenity in
trace.h - until the users of the latter get wrapped into something that would
take snapshots and pass those instead of messing with ->d_name. Considering
the fun quoted above, not having to repeat that digging is something I'd
count as a win...
next prev parent reply other threads:[~2025-09-08 3:57 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-07 20:32 [RFC] a possible way of reducing the PITA of ->d_name audits Al Viro
2025-09-07 21:51 ` Linus Torvalds
2025-09-08 0:06 ` Al Viro
2025-09-08 0:47 ` Linus Torvalds
2025-09-08 2:51 ` Al Viro
2025-09-08 3:57 ` Al Viro [this message]
2025-09-08 4:50 ` NeilBrown
2025-09-08 5:19 ` Al Viro
2025-09-08 6:25 ` NeilBrown
2025-09-08 9:05 ` Al Viro
2025-09-10 2:45 ` NeilBrown
2025-09-10 7:24 ` Al Viro
2025-09-10 22:52 ` NeilBrown
2025-09-12 5:49 ` ->atomic_open() fun (was Re: [RFC] a possible way of reducing the PITA of ->d_name audits) Al Viro
2025-09-12 8:23 ` Miklos Szeredi
2025-09-12 18:29 ` Al Viro
2025-09-12 19:22 ` Miklos Szeredi
2025-09-12 20:36 ` Al Viro
2025-09-12 20:50 ` Al Viro
2025-09-13 3:36 ` NeilBrown
2025-09-13 5:07 ` Al Viro
2025-09-13 5:50 ` NeilBrown
2025-09-14 19:01 ` Miklos Szeredi
2025-09-14 19:50 ` Al Viro
2025-09-14 20:05 ` Miklos Szeredi
2025-09-15 8:54 ` Bernd Schubert
2025-09-12 18:55 ` Al Viro
2025-09-12 18:59 ` [PATCH 1/9] allow finish_no_open(file, ERR_PTR(-E...)) Al Viro
2025-09-12 18:59 ` [PATCH 2/9] 9p: simplify v9fs_vfs_atomic_open() Al Viro
2025-09-12 18:59 ` [PATCH 3/9] 9p: simplify v9fs_vfs_atomic_open_dotl() Al Viro
2025-09-12 18:59 ` [PATCH 4/9] simplify cifs_atomic_open() Al Viro
2025-09-12 18:59 ` [PATCH 5/9] simplify vboxsf_dir_atomic_open() Al Viro
2025-09-12 18:59 ` [PATCH 6/9] simplify nfs_atomic_open_v23() Al Viro
2025-09-12 18:59 ` [PATCH 7/9] simplify fuse_atomic_open() Al Viro
2025-09-12 18:59 ` [PATCH 8/9] simplify gfs2_atomic_open() Al Viro
2025-09-12 18:59 ` [PATCH 9/9] slightly simplify nfs_atomic_open() Al Viro
2025-09-12 22:23 ` [PATCH 1/9] allow finish_no_open(file, ERR_PTR(-E...)) Linus Torvalds
2025-09-13 3:34 ` NeilBrown
2025-09-13 21:28 ` [RFC] a possible way of reducing the PITA of ->d_name audits Al Viro
2025-09-14 1:05 ` NeilBrown
2025-09-14 1:37 ` Al Viro
2025-09-14 5:56 ` Al Viro
2025-09-14 23:07 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250908035708.GH31600@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=neil@brown.name \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox