Re: [RFC] a possible way of reducing the PITA of ->d_name audits

From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
	NeilBrown <neil@brown.name>
Subject: Re: [RFC] a possible way of reducing the PITA of ->d_name audits
Date: Mon, 8 Sep 2025 04:57:08 +0100
Message-ID: <20250908035708.GH31600@ZenIV>
In-Reply-To: <20250908025135.GG31600@ZenIV>

On Mon, Sep 08, 2025 at 03:51:35AM +0100, Al Viro wrote:
 
> Most of the uses *are* done to stable dentries; it's just that we have no
> way to tell which ones are like that.

Random example to get the taste of that joy:

static void
xfs_dentry_to_name( 
        struct xfs_name *namep,
        struct dentry   *dentry)
{
        namep->name = dentry->d_name.name;
        namep->len = dentry->d_name.len;
        namep->type = XFS_DIR3_FT_UNKNOWN;
}

OK, fetches from ->d_name.  Callers:
	xfs_cleanup_inode()
		xfs_generic_create()
			xfs_vn_mknod()
				== xfs_dir_inode_operations.mknod
				== xfs_dir_ci_inode_operations.mknod
			xfs_vn_create()
				== xfs_dir_inode_operations.create
				== xfs_dir_ci_inode_operations.create
			xfs_vn_mkdir()
				== xfs_dir_inode_operations.mkdir
				== xfs_dir_ci_inode_operations.mkdir
			xfs_vn_tmpfile()	# WTF?
		xfs_vn_symlink()
			== xfs_dir_inode_operations.symlink
			== xfs_dir_ci_inode_operations.symlink
	xfs_vn_lookup()
		== xfs_dir_inode_operations.lookup
	xfs_vn_ci_lookup()
		== xfs_dir_ci_inode_operations.lookup
	xfs_vn_unlink()
		== xfs_dir_inode_operations.unlink
		== xfs_dir_inode_operations.rmdir
		== xfs_dir_ci_inode_operations.unlink
		== xfs_dir_ci_inode_operations.rmdir
(+ checking that in all cases dentry has come from the method argument)
WTF is going on with xfs_vn_tmpfile()?  It doesn't *have* any useful
name... looking... Aha.  xfs_generic_create(_, _, _, _, _, p) only calls
xfs_cleanup_inode() in case when p is NULL; xfs_vn_tmpfile() is called
only as ->tmpfile(), and the only caller of that is this:
        file->f_path.mnt = parentpath->mnt;
        file->f_path.dentry = child;
        mode = vfs_prepare_mode(idmap, dir, mode, mode, mode);
        error = dir->i_op->tmpfile(idmap, dir, file, mode);
so the method never gets called with NULL as the 3rd argument.  Safe...

And that's just one example - two grep hits.  Right next to them,
static int
xfs_dentry_mode_to_name(
        struct xfs_name *namep,
        struct dentry   *dentry,
        int             mode)
{
        namep->name = dentry->d_name.name;
        namep->len = dentry->d_name.len;
        namep->type = xfs_mode_to_ftype(mode);
 
        if (unlikely(namep->type == XFS_DIR3_FT_UNKNOWN))
                return -EFSCORRUPTED;
 
        return 0;
}

Callers:
	xfs_generic_create()
		same callers as above, but this time it's *not* conditional:
------------------------------------------------------------------
        /* Verify mode is valid also for tmpfile case */
        error = xfs_dentry_mode_to_name(&name, dentry, args.mode);
        if (unlikely(error))
                goto out_free_acl;
------------------------------------------------------------------
		presumably it's "we may fetch shite for tmpfile, but in that case
		we won't use that shite".
	xfs_vn_link()
		== xfs_dir_inode_operations.link
		== xfs_dir_ci_inode_operations.link
	xfs_vn_symlink()
		seen above, same dentry as above
	xfs_vn_rename()	# for odentry
		== xfs_dir_inode_operations.rename
		== xfs_dir_ci_inode_operations.rename
	xfs_vn_rename() # for ndentry
		seen above

That's not all for xfs, though - there's also
        error = xfs_inode_init_security(inode, dir, &dentry->d_name);
in the same xfs_generic_create() - and also called for tmpfile case,
AFAISC.  Which is quite likely a bug - ->d_name is stable there, all
right, but at that stage it's "/"; what selinux (the only thing that
cares about the basename of object being created) would do to that
is an interesting question, might depend upon the policy.  Non-tmpfile
callers are OK, as seen above.
Another one:
        error = xfs_inode_init_security(inode, dir, &dentry->d_name);
in xfs_vn_symlink(), safe per above.
Another:
        if (dentry->d_name.len >= MAXNAMELEN)
                return ERR_PTR(-ENAMETOOLONG);
in xfs_vn_lookup() and xfs_vn_ci_lookup().  Safe.

... and finally there's this, in all its foul glory:
DECLARE_EVENT_CLASS(xrep_dentry_class,
        TP_PROTO(struct xfs_mount *mp, const struct dentry *dentry),
        TP_ARGS(mp, dentry),
        TP_STRUCT__entry(
                __field(dev_t, dev)
                __field(unsigned int, flags)
                __field(unsigned long, ino)
                __field(bool, positive)
                __field(unsigned long, parent_ino)
                __field(unsigned int, namelen)
                __dynamic_array(char, name, dentry->d_name.len)
        ),
        TP_fast_assign(
                __entry->dev = mp->m_super->s_dev;
                __entry->flags = dentry->d_flags;
                __entry->positive = d_is_positive(dentry);
                if (dentry->d_parent && d_inode(dentry->d_parent))
                        __entry->parent_ino = d_inode(dentry->d_parent)->i_ino;
                else
                        __entry->parent_ino = -1UL;
                __entry->ino = d_inode(dentry) ? d_inode(dentry)->i_ino : 0;
                __entry->namelen = dentry->d_name.len;
                memcpy(__get_str(name), dentry->d_name.name, dentry->d_name.len);
        ),
        TP_printk("dev %d:%d flags 0x%x positive? %d parent_ino 0x%lx ino 0x%lx name '%.*s'",
                  MAJOR(__entry->dev), MINOR(__entry->dev),
                  __entry->flags,
                  __entry->positive,
                  __entry->parent_ino,
                  __entry->ino,
                  __entry->namelen,
                  __get_str(name))
);
#define DEFINE_REPAIR_DENTRY_EVENT(name) \
DEFINE_EVENT(xrep_dentry_class, name, \
        TP_PROTO(struct xfs_mount *mp, const struct dentry *dentry), \
        TP_ARGS(mp, dentry))
DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_check_child);
DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_invalidate_child);
DEFINE_REPAIR_DENTRY_EVENT(xrep_dirtree_delete_child);

used by
                trace_xrep_adoption_check_child(sc->mp, d_child);
in xrep_adoption_check_dcache(), two calls of
                trace_xrep_adoption_invalidate_child(sc->mp, d_child);
in xrep_adoption_zap_dcache() and 
        trace_xrep_dirtree_delete_child(dp->i_mount, child_dentry);
in xrep_dirtree_purge_dentry()

The last 3 are *not* stable - fuck knows if they can happen in parallel with
lookups from other threads (those can end up moving dentries on sufficiently
buggered filesystem), but IMO these deserve take_dentry_name_snapshot()
treatment - if tracepoint is active, that is.
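
To make the "snapshot, and only if the tracepoint is active" idea concrete, here is a rough userspace model of the pattern.  Everything prefixed with model_ is a made-up stand-in: the real take_dentry_name_snapshot() does its copying under the dentry's locking and cannot fail, the real check would be whatever "is this event enabled" test the tracepoint provides, and the reduced struct qstr/struct dentry below are not the kernel definitions.  The point is only the shape: copy the name once, hand the copy to the event, release it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reduced stand-ins, NOT the kernel structures. */
struct qstr {
	unsigned int len;
	const char *name;
};

struct dentry {
	struct qstr d_name;
};

/* Model of a name snapshot: a private copy that cannot change under us. */
struct name_snapshot {
	struct qstr name;
	char *buf;
};

/* Assumption: plain malloc + copy; the kernel helper does this differently. */
static int model_take_name_snapshot(struct name_snapshot *snap,
				    const struct dentry *dentry)
{
	unsigned int len = dentry->d_name.len;

	snap->buf = malloc(len + 1);
	if (!snap->buf)
		return -1;
	memcpy(snap->buf, dentry->d_name.name, len);
	snap->buf[len] = '\0';
	snap->name.name = snap->buf;
	snap->name.len = len;
	return 0;
}

static void model_release_name_snapshot(struct name_snapshot *snap)
{
	free(snap->buf);
	snap->buf = NULL;
	snap->name.name = NULL;
	snap->name.len = 0;
}

static int model_trace_enabled;		/* stands in for "tracepoint is active" */
static char model_trace_buf[128];	/* stands in for the trace ring buffer */

/* Hypothetical event that takes a stable name instead of a dentry. */
static void model_trace_child(const struct qstr *name)
{
	assert(name->name != NULL);
	snprintf(model_trace_buf, sizeof(model_trace_buf), "name '%.*s'",
		 (int)name->len, name->name);
}

/* The wrapper: pay for the snapshot only when somebody is listening. */
static void model_report_child(const struct dentry *child)
{
	struct name_snapshot snap;

	if (!model_trace_enabled)
		return;
	if (model_take_name_snapshot(&snap, child))
		return;
	model_trace_child(&snap.name);
	model_release_name_snapshot(&snap);
}
```

With something along those lines the event class itself never looks at ->d_name; it only sees the length and pointer from the snapshot.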

The rest all get stable dentries; I would really prefer to have that checked
by compiler, with sufficient annotations given to it.  In this case -
struct stable_dentry arguments for lookup/create/mkdir/mknod/symlink/unlink/rmdir,
rename and link as part of calling conventions change
+ stable_dentry as argument of xfs_generic_create(), xfs_dentry_mode_to_name(),
xfs_dentry_to_name() and xfs_cleanup_inode()
+ claim of stability in xfs_vn_tmpfile():
STATIC int
xfs_vn_tmpfile(
        struct mnt_idmap        *idmap,
        struct inode            *dir,
        struct file             *file,
        umode_t                 mode)
{
	int err = xfs_generic_create(idmap, dir,
			claim_stability(file->f_path.dentry), mode, 0, file);
	// at this point in ->tmpfile() dentry is nameless and negative;
	// nothing can move it until we get to finish_open_...()
        return finish_open_simple(file, err);
}

That way xfs hits will be down to that claim_stability() and the obscenity in
trace.h - until the users of the latter get wrapped into something that would
take snapshots and pass those instead of messing with ->d_name.  Considering
the fun quoted above, not having to repeat that digging is something I'd
count as a win...
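
For illustration, a minimal userspace sketch of how a wrapper type gets the compiler to do that checking.  The names stable_dentry and claim_stability() are the ones used above, but the layout shown here (a struct holding just the pointer) is an assumption, and struct model_name and model_dentry_to_name() are hypothetical stand-ins for struct xfs_name and xfs_dentry_to_name().  The structures are cut down to the fields touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-ins, NOT the kernel structures. */
struct qstr {
	unsigned int len;
	const char *name;
};

struct dentry {
	struct qstr d_name;
};

/*
 * Assumed layout: a distinct type wrapping the pointer, so that a bare
 * struct dentry * is not accepted where stability has been promised.
 */
struct stable_dentry {
	struct dentry *dentry;
};

/* The one greppable place where code asserts "nothing can rename this". */
static inline struct stable_dentry claim_stability(struct dentry *dentry)
{
	struct stable_dentry s = { dentry };

	return s;
}

/* Hypothetical analogue of struct xfs_name. */
struct model_name {
	const char *name;
	unsigned int len;
};

/*
 * Hypothetical analogue of xfs_dentry_to_name(): it may read ->d_name
 * without further thought, because the argument type says the caller
 * has a stable dentry.
 */
static void model_dentry_to_name(struct model_name *namep,
				 struct stable_dentry child)
{
	assert(child.dentry != NULL);
	namep->name = child.dentry->d_name.name;
	namep->len = child.dentry->d_name.len;
}
```

Calling model_dentry_to_name(&n, some_dentry_pointer) fails to compile with an incompatible-type error; the caller has to either receive a struct stable_dentry from its own caller or spell out claim_stability() explicitly, which is exactly what an audit would then grep for.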
