From: Dave Chinner <david@fromorbit.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
brauner@kernel.org, viro@zeniv.linux.org.uk,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, kernel-team@fb.com, amir73il@gmail.com,
linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-xfs@vger.kernel.org, ceph-devel@vger.kernel.org,
linux-unionfs@vger.kernel.org
Subject: Re: [PATCH v7 03/14] fs: provide accessors for ->i_state
Date: Thu, 16 Oct 2025 09:06:45 +1100 [thread overview]
Message-ID: <aPAa9fz-4OG_9pVX@dread.disaster.area> (raw)
In-Reply-To: <CAGudoHFpoo0Qm=b4Z85tbJJmhh+vmSHuNnm3pVaLaQsmX9mURg@mail.gmail.com>
On Wed, Oct 15, 2025 at 07:46:39AM +0200, Mateusz Guzik wrote:
> On Wed, Oct 15, 2025 at 12:24 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Oct 10, 2025 at 05:51:06PM +0200, Mateusz Guzik wrote:
> > > On Fri, Oct 10, 2025 at 4:44 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Thu 09-10-25 09:59:17, Mateusz Guzik wrote:
> > > > > +static inline void inode_state_set_raw(struct inode *inode,
> > > > > + enum inode_state_flags_enum flags)
> > > > > +{
> > > > > + WRITE_ONCE(inode->i_state, inode->i_state | flags);
> > > > > +}
> > > >
> > > > I think this shouldn't really exist as it is dangerous to use and if we
> > > > deal with XFS, nobody will actually need this function.
> > > >
> > >
> > > That's not strictly true, unless you mean code outside of fs/inode.c
> > >
> > > First, something is still needed to clear out the state in
> > > inode_init_always_gfp().
> > >
> > > Afterwards there are few spots which further modify it without the
> > > spinlock held (for example see insert_inode_locked4()).
> > >
> > > My take on the situation is that the current I_NEW et al handling is
> > > crap and the inode hash api is also crap.
> >
> > The inode hash implementation is crap, too. The historically poor
> > scalability characteristics of the VFS inode cache is the primary
> > reason we've never considered ever trying to port XFS to use it,
> > even if we ignore all the inode lifecycle issues that would have to
> > be solved first...
> >
>
> I don't know of anyone defending the inode hash tho. The performance
> of the thing was already bashed a few times, I did not see anyone
> dunking on the API ;)
I think it goes without saying that the amount of
similar-but-slightly-different-and-inconsistently-named functions
that have simply grown organically as individual fs needs have
occurred has resulted in a bit of a mess that nobody really wants to
tackle... :/
> > > For starters freshly allocated inodes should not be starting with 0,
> > > but with I_NEW.
> >
> > Not all inodes are cached filesystem inodes. e.g. anonymous inodes
> > are initialised to inode->i_state = I_DIRTY. pipe inodes also start
> > at I_DIRTY. socket inodes don't touch i_state at init, so they
> > essentially init i_state = 0....
> >
> > IOWs, the initial inode state depends on what the inode is being
> > used for, and I_NEW is only relevant to inodes that are cached and
> > can be found before the filesystem has fully initialised the VFS
> > inode.
> >
>
> Well it is true that currently the I_NEW flag is there to help out
> entities like the hash inode hash.
>
> I'm looking to change it into a generic indicator of an uninitialized
> inode. This is completely harmless for the consumers which currently
> operate on inodes which never had the flag.
>
> Here is one use: I'd like to introduce a mandatory routine to call
> when the filesystem at hand claims the inode is ready to use.
I like the idea, but I don't think that overloading I_NEW is the
right thing to do nor is it that simple.
e.g. We added the I_CREATING state years ago as a subset of I_NEW so
that VFS inodes being instantiated can't be found -at all- by the
open-by-handle interface doing direct inode hash lookups. However,
only some of the inode hash APIs add this flag, and only overlay as
a filesystem adds it in certain circumstances.
IOWs, even during initialisation, different filesystems need to
behave differently w.r.t. how the core VFS performs various
operations on the inode during the initialisation stage...
FWIW, XFS has the XFS_INEW state that wraps around the outside of
the VFS inode initialisation process that prevents it from being
found via any type of inode cache lookup (internal or external)
until the inode is fully initialised.
IOWs, features that XFS has
supported for 25+ years (like open-by-handle) is supported natively
by the XFS inode cache and the XFS inode life cycle state machine.
In contrast, The way the VFS inode cache handles stuff like this is
very much a hacked-in "oops we didn't think of that" after-thought
that doesn't actually cover all the different APIs or filesystems...
> Said routine would have 2 main purposes:
> - validate the state of the inode (for example that a valid mode is
> set; this would have caught some of the syzkaller bugs from the get
> go)
I think that's going to be harder than it sounds (speaking as the
architect of the comprehensive on-disk metadata validation
infrastructure in XFS).
> - pre-compute a bunch of stuff, for example see this crapper:
>
> static inline int do_inode_permission(struct mnt_idmap *idmap,
> struct inode *inode, int mask)
> {
> if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
> if (likely(inode->i_op->permission))
> return inode->i_op->permission(idmap, inode,
> mask);
>
> /* This gets set once for the inode lifetime */
> spin_lock(&inode->i_lock);
> inode->i_opflags |= IOP_FASTPERM;
> spin_unlock(&inode->i_lock);
> }
> return generic_permission(idmap, inode, mask);
> }
Yup, that would be useful.
> Note unlock_new_inode() and similar are not mandatory to call.
To a point. i.e. if you are using a VFS inode hash implemtation that
sets I_NEW, then it is definitely mandatory to call
unlock_new_inode(). Documentation/filesystems/porting.rst even says
that.
However, if you aren't using a VFS inode cache implemenation that
sets I_NEW, then you've got to set it yourself and clear it
appropriately so the rest of the VFS functionality does the right
thing whilst the inode is published but still being initialised.
e.g. putting an inode still undergoing initialisation on the
sb->s_inodes list without it being marked as I_NEW is, quite simply,
a bug.
Hence it may not be mandatory to use unlock_new_inode(), but if you
are publishing a partially initialised inode on any VFS list or
cache, you still need to be doing the right thing w.r.t. locking,
I_NEW, I_CREATING and calling unlock_new_inode() during inode
initialisation and cache lookups.
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2025-10-15 22:06 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-09 7:59 [PATCH v7 00/14] hide ->i_state behind accessors Mateusz Guzik
2025-10-09 7:59 ` [PATCH v7 01/14] fs: move wait_on_inode() from writeback.h to fs.h Mateusz Guzik
2025-10-10 13:47 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 02/14] fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb Mateusz Guzik
2025-10-10 13:48 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 03/14] fs: provide accessors for ->i_state Mateusz Guzik
2025-10-10 13:51 ` Jan Kara
2025-10-10 14:44 ` Jan Kara
2025-10-10 15:51 ` Mateusz Guzik
2025-10-14 22:24 ` Dave Chinner
2025-10-15 5:46 ` Mateusz Guzik
2025-10-15 22:06 ` Dave Chinner [this message]
2025-10-20 9:43 ` Jan Kara
2025-10-14 21:57 ` Dave Chinner
2025-10-09 7:59 ` [PATCH v7 04/14] Coccinelle-based conversion to use ->i_state accessors Mateusz Guzik
2025-10-10 14:07 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 05/14] Manual conversion to use ->i_state accessors of all places not covered by coccinelle Mateusz Guzik
2025-10-10 14:10 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 06/14] btrfs: use the new ->i_state accessors Mateusz Guzik
2025-10-10 14:11 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 07/14] ceph: " Mateusz Guzik
2025-10-10 14:12 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 08/14] smb: " Mateusz Guzik
2025-10-10 14:15 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 09/14] f2fs: " Mateusz Guzik
2025-10-10 14:16 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 10/14] gfs2: " Mateusz Guzik
2025-10-10 14:17 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 11/14] overlayfs: " Mateusz Guzik
2025-10-10 14:18 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 12/14] nilfs2: " Mateusz Guzik
2025-10-10 14:18 ` Jan Kara
2025-10-09 7:59 ` [PATCH v7 13/14] xfs: " Mateusz Guzik
2025-10-10 14:41 ` Jan Kara
2025-10-10 15:40 ` Mateusz Guzik
2025-10-15 0:02 ` Dave Chinner
2025-10-15 2:10 ` Mateusz Guzik
2025-10-09 7:59 ` [PATCH v7 14/14] fs: make plain ->i_state access fail to compile Mateusz Guzik
2025-10-10 14:26 ` Jan Kara
2025-10-10 11:24 ` [PATCH v7 00/14] hide ->i_state behind accessors Christian Brauner
2025-10-10 11:29 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPAa9fz-4OG_9pVX@dread.disaster.area \
--to=david@fromorbit.com \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=ceph-devel@vger.kernel.org \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mjguzik@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox