Re: [RFC] a possible way of reducing the PITA of ->d_name audits


From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
	NeilBrown <neil@brown.name>
Subject: Re: [RFC] a possible way of reducing the PITA of ->d_name audits
Date: Mon, 8 Sep 2025 04:57:08 +0100
Message-ID: <20250908035708.GH31600@ZenIV>
In-Reply-To: <20250908025135.GG31600@ZenIV>

On Mon, Sep 08, 2025 at 03:51:35AM +0100, Al Viro wrote:
 
> Most of the uses *are* done to stable dentries; it's just that we have no
> way to tell which ones are like that.

Random example to get the taste of that joy:

static void
xfs_dentry_to_name( 
        struct xfs_name *namep,
        struct dentry   *dentry)
{
        namep->name = dentry->d_name.name;
        namep->len = dentry->d_name.len;
        namep->type = XFS_DIR3_FT_UNKNOWN;
}

OK, fetches from ->d_name.  Callers:
	xfs_cleanup_inode()
		xfs_generic_create()
			xfs_vn_mknod()
				== xfs_dir_inode_operations.mknod
				== xfs_dir_ci_inode_operations.mknod
			xfs_vn_create()
				== xfs_dir_inode_operations.create
				== xfs_dir_ci_inode_operations.create
			xfs_vn_mkdir()
				== xfs_dir_inode_operations.mkdir
				== xfs_dir_ci_inode_operations.mkdir
			xfs_vn_tmpfile()	# WTF?
		xfs_vn_symlink()
			== xfs_dir_inode_operations.symlink
			== xfs_dir_ci_inode_operations.symlink
	xfs_vn_lookup()
		== xfs_dir_inode_operations.lookup
	xfs_vn_ci_lookup()
		== xfs_dir_ci_inode_operations.lookup
	xfs_vn_unlink()
		== xfs_dir_inode_operations.unlink
		== xfs_dir_inode_operations.rmdir
		== xfs_dir_ci_inode_operations.unlink
		== xfs_dir_ci_inode_operations.rmdir
(+ checking that in all cases dentry has come from the method argument)
WTF is going on with xfs_vn_tmpfile()?  It doesn't *have* any useful
name... looking... Aha.  xfs_generic_create(_, _, _, _, _, p) only calls
xfs_cleanup_inode() when p is NULL; xfs_vn_tmpfile() is called
only as ->tmpfile(), and the only caller of that is this:
        file->f_path.mnt = parentpath->mnt;
        file->f_path.dentry = child;
        mode = vfs_prepare_mode(idmap, dir, mode, mode, mode);
        error = dir->i_op->tmpfile(idmap, dir, file, mode);
so the method never gets called with NULL as the 3rd argument, and
xfs_vn_tmpfile() passes that argument on as p.  Safe...

And that's just one example - two grep hits.  Right next to them,
static int
xfs_dentry_mode_to_name(
        struct xfs_name *namep,
        struct dentry   *dentry,
        int             mode)
{
        namep->name = dentry->d_name.name;
        namep->len = dentry->d_name.len;
        namep->type = xfs_mode_to_ftype(mode);
 
        if (unlikely(namep->type == XFS_DIR3_FT_UNKNOWN))
                return -EFSCORRUPTED;
 
        return 0;
}

Callers:
	xfs_generic_create()
		same callers as above, but this time it's *not* conditional:
------------------------------------------------------------------
        /* Verify mode is valid also for tmpfile case */
        error = xfs_dentry_mode_to_name(&name, dentry, args.mode);
        if (unlikely(error))
                goto out_free_acl;
------------------------------------------------------------------
		presumably it's "we may fetch shite for tmpfile, but in that case
		we won't use that shite".
	xfs_vn_link()
		== xfs_dir_inode_operations.link
		== xfs_dir_ci_inode_operations.link
	xfs_vn_symlink()
		seen above, same dentry as above
	xfs_vn_rename()	# for odentry
		== xfs_dir_inode_operations.rename
		== xfs_dir_ci_inode_operations.rename
	xfs_vn_rename() # for ndentry
		seen above

That's not all for xfs, though - there's also
        error = xfs_inode_init_security(inode, dir, &dentry->d_name);
in the same xfs_generic_create() - and also called for tmpfile case,
AFAISC.  Which is quite likely a bug - ->d_name is stable there, all
right, but at that stage it's "/"; what selinux (the only thing that
cares about the basename of object being created) would do to that
is an interesting question, might depend upon the policy.  Non-tmpfile
callers are OK, as seen above.
Another one:
        error = xfs_inode_init_security(inode, dir, &dentry->d_name);
in xfs_vn_symlink(), safe per above.
Another:
        if (dentry->d_name.len >= MAXNAMELEN)
                return ERR_PTR(-ENAMETOOLONG);
in xfs_vn_lookup() and xfs_vn_ci_lookup().  Safe.

... and finally there's this, in all its foul glory:
DECLARE_EVENT_CLASS(xrep_dentry_class,
        TP_PROTO(struct xfs_mount *mp, const struct dentry *dentry),
        TP_ARGS(mp, dentry),
        TP_STRUCT__entry(
                __field(dev_t, dev)
                __field(unsigned int, flags)
                __field(unsigned long, ino)
                __field(bool, positive)
                __field(unsigned long, parent_ino)
                __field(unsigned int, namelen)
                __dynamic_array(char, name, dentry->d_name.len)
        ),
        TP_fast_assign(
                __entry->dev = mp->m_super->s_dev;
                __entry->flags = dentry->d_flags;
                __entry->positive = d_is_positive(dentry);
                if (dentry->d_parent && d_inode(dentry->d_parent))
                        __entry->parent_ino = d_inode(dentry->d_parent)->i_ino;
                else
                        __entry->parent_ino = -1UL;
                __entry->ino = d_inode(dentry) ? d_inode(dentry)->i_ino : 0;
                __entry->namelen = dentry->d_name.len;
                memcpy(__get_str(name), dentry->d_name.name, dentry->d_name.len);
        ),
        TP_printk("dev %d:%d flags 0x%x positive? %d parent_ino 0x%lx ino 0x%lx name '%.*s'",
                  MAJOR(__entry->dev), MINOR(__entry->dev),
                  __entry->flags,
                  __entry->positive,
                  __entry->parent_ino,
                  __entry->ino,
                  __entry->namelen,
                  __get_str(name))
);
#define DEFINE_REPAIR_DENTRY_EVENT(name) \
DEFINE_EVENT(xrep_dentry_class, name, \
        TP_PROTO(struct xfs_mount *mp, const struct dentry *dentry), \
        TP_ARGS(mp, dentry))
DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_check_child);
DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_invalidate_child);
DEFINE_REPAIR_DENTRY_EVENT(xrep_dirtree_delete_child);

used by
                trace_xrep_adoption_check_child(sc->mp, d_child);
in xrep_adoption_check_dcache(), two calls of
                trace_xrep_adoption_invalidate_child(sc->mp, d_child);
in xrep_adoption_zap_dcache() and 
        trace_xrep_dirtree_delete_child(dp->i_mount, child_dentry);
in xrep_dirtree_purge_dentry()

The last 3 are *not* stable - fuck knows if they can happen in parallel with
lookups from other threads (those can end up moving dentries on sufficiently
buggered filesystem), but IMO these deserve take_dentry_name_snapshot()
treatment - if tracepoint is active, that is.
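
A minimal userspace sketch of what the "snapshot, and only if the tracepoint is active" treatment amounts to. Everything prefixed toy_ is a hypothetical stand-in, not a kernel type or function; the real take_dentry_name_snapshot() and release_dentry_name_snapshot() do the copying with proper locking, which this model leaves out.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are the kernel's types. */
struct toy_dentry {
	char *name;	/* may be replaced by a concurrent rename */
	size_t len;
};

struct toy_snapshot {
	char *name;	/* private copy, unaffected by later renames */
	size_t len;
};

/* Assumed flag modelling "is the tracepoint active?" */
static int toy_trace_enabled;

/* Copy the name once; later readers never touch d->name again.
 * A real implementation would do this under the lock rename takes. */
static int toy_take_snapshot(struct toy_snapshot *s, const struct toy_dentry *d)
{
	s->name = malloc(d->len + 1);
	if (!s->name)
		return -1;
	memcpy(s->name, d->name, d->len);
	s->name[d->len] = '\0';
	s->len = d->len;
	return 0;
}

static void toy_release_snapshot(struct toy_snapshot *s)
{
	free(s->name);
	s->name = NULL;
	s->len = 0;
}

/* Simulated rename: swaps the name out from under any reader. */
static int toy_rename(struct toy_dentry *d, const char *new_name)
{
	size_t len = strlen(new_name);
	char *copy = malloc(len + 1);

	if (!copy)
		return -1;
	memcpy(copy, new_name, len + 1);
	free(d->name);
	d->name = copy;
	d->len = len;
	return 0;
}

/* Returns 1 if an event was formatted, 0 if tracing is off, -1 on error. */
static int toy_trace_child(const struct toy_dentry *d, char *out, size_t outlen)
{
	struct toy_snapshot s;

	if (!toy_trace_enabled)
		return 0;	/* no snapshot cost when nobody is listening */
	if (toy_take_snapshot(&s, d))
		return -1;
	snprintf(out, outlen, "name '%.*s'", (int)s.len, s.name);
	toy_release_snapshot(&s);
	return 1;
}
```

The point of the shape is that the event formatter only ever sees the snapshot, so a grep for ->d_name in the tracing code would come up empty.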

The rest all get stable dentries; I would really prefer to have that checked
by compiler, with sufficient annotations given to it.  In this case -
struct stable_dentry arguments for lookup/create/mkdir/mknod/symlink/unlink/rmdir,
rename and link as part of calling conventions change
+ stable_dentry as argument of xfs_generic_create(), xfs_dentry_mode_to_name(),
xfs_dentry_to_name() and xfs_cleanup_inode()
+ claim of stability in xfs_vn_tmpfile():
STATIC int
xfs_vn_tmpfile(
        struct mnt_idmap        *idmap,
        struct inode            *dir,
        struct file             *file,
        umode_t                 mode)
{
	int err = xfs_generic_create(idmap, dir,
			claim_stability(file->f_path.dentry), mode, 0, file);
	// at this point in ->tmpfile() dentry is nameless and negative;
	// nothing can move it until we get to finish_open_...()
        return finish_open_simple(file, err);
}

That way xfs hits will be down to that claim_stability() and the obscenity in
trace.h - until the users of the latter get wrapped into something that would
take snapshots and pass those instead of messing with ->d_name.  Considering
the fun quoted above, not having to repeat that digging is something I'd
count as a win...
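
To make the "checked by compiler" part concrete, here is a rough userspace model of the wrapper-type idea. Only the names stable_dentry and claim_stability() come from the text above; the struct layout, the model_ types and the helper are assumptions made up for illustration, and the real thing may well look different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct qstr / struct dentry. */
struct model_qstr {
	const char *name;
	size_t len;
};

struct model_dentry {
	struct model_qstr d_name;
};

/* Distinct type: a plain dentry pointer does not convert to it implicitly.
 * (Assumed layout; only the type name is taken from the mail.) */
struct stable_dentry {
	struct model_dentry *dentry;
};

/* The one greppable place where a caller asserts stability by hand. */
static inline struct stable_dentry claim_stability(struct model_dentry *d)
{
	return (struct stable_dentry){ .dentry = d };
}

/* Stand-in for struct xfs_name. */
struct model_name {
	const char *name;
	size_t len;
};

/* Helpers that read ->d_name accept only the wrapped type, so passing
 * a raw struct model_dentry * here is a compile-time error. */
static void model_dentry_to_name(struct model_name *namep,
				 struct stable_dentry sd)
{
	namep->name = sd.dentry->d_name.name;
	namep->len = sd.dentry->d_name.len;
}
```

With that in place, model_dentry_to_name(&n, claim_stability(&d)) compiles while model_dentry_to_name(&n, &d) does not, so an audit only has to look at the claim_stability() call sites rather than at every ->d_name fetch.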
