linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org,
	kernel-team@fb.com, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 02/50] fs: make the i_state flags an enum
Date: Fri, 22 Aug 2025 09:31:51 -0400	[thread overview]
Message-ID: <20250822133151.GB927384@perftesting> (raw)
In-Reply-To: <20250822-orcas-bemannten-728c9946b160@brauner>

On Fri, Aug 22, 2025 at 01:08:07PM +0200, Christian Brauner wrote:
> On Thu, Aug 21, 2025 at 04:18:13PM -0400, Josef Bacik wrote:
> > Adjusting i_state flags always means updating the values manually. Bring
> > these forward into the 2020's and make a nice clean macro for defining
> > the i_state values as an enum, providing __ variants for the cases where
> > we need the bit position instead of the actual value, and leaving the
> > actual NAME as the 1U << bit value.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  include/linux/fs.h | 234 +++++++++++++++++++++++----------------------
> >  1 file changed, 122 insertions(+), 112 deletions(-)
> > 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9a1ce67eed33..e741dc453c2c 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -665,6 +665,127 @@ is_uncached_acl(struct posix_acl *acl)
> >  #define IOP_MGTIME	0x0020
> >  #define IOP_CACHED_LINK	0x0040
> >  
> > +/*
> > + * Inode state bits.  Protected by inode->i_lock
> > + *
> > + * Four bits determine the dirty state of the inode: I_DIRTY_SYNC,
> > + * I_DIRTY_DATASYNC, I_DIRTY_PAGES, and I_DIRTY_TIME.
> > + *
> > + * Four bits define the lifetime of an inode.  Initially, inodes are I_NEW,
> > + * until that flag is cleared.  I_WILL_FREE, I_FREEING and I_CLEAR are set at
> > + * various stages of removing an inode.
> > + *
> > + * Two bits are used for locking and completion notification, I_NEW and I_SYNC.
> > + *
> > + * I_DIRTY_SYNC		Inode is dirty, but doesn't have to be written on
> > + *			fdatasync() (unless I_DIRTY_DATASYNC is also set).
> > + *			Timestamp updates are the usual cause.
> > + * I_DIRTY_DATASYNC	Data-related inode changes pending.  We keep track of
> > + *			these changes separately from I_DIRTY_SYNC so that we
> > + *			don't have to write inode on fdatasync() when only
> > + *			e.g. the timestamps have changed.
> > + * I_DIRTY_PAGES	Inode has dirty pages.  Inode itself may be clean.
> > + * I_DIRTY_TIME		The inode itself has dirty timestamps, and the
> > + *			lazytime mount option is enabled.  We keep track of this
> > + *			separately from I_DIRTY_SYNC in order to implement
> > + *			lazytime.  This gets cleared if I_DIRTY_INODE
> > + *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> > + *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> > + *			in place because writeback might already be in progress
> > + *			and we don't want to lose the time update
> > + * I_NEW		Serves as both a mutex and completion notification.
> > + *			New inodes set I_NEW.  If two processes both create
> > + *			the same inode, one of them will release its inode and
> > + *			wait for I_NEW to be released before returning.
> > + *			Inodes in I_WILL_FREE, I_FREEING or I_CLEAR state can
> > + *			also cause waiting on I_NEW, without I_NEW actually
> > + *			being set.  find_inode() uses this to prevent returning
> > + *			nearly-dead inodes.
> > + * I_WILL_FREE		Must be set when calling write_inode_now() if i_count
> > + *			is zero.  I_FREEING must be set when I_WILL_FREE is
> > + *			cleared.
> > + * I_FREEING		Set when inode is about to be freed but still has dirty
> > + *			pages or buffers attached or the inode itself is still
> > + *			dirty.
> > + * I_CLEAR		Added by clear_inode().  In this state the inode is
> > + *			clean and can be destroyed.  Inode keeps I_FREEING.
> > + *
> > + *			Inodes that are I_WILL_FREE, I_FREEING or I_CLEAR are
> > + *			prohibited for many purposes.  iget() must wait for
> > + *			the inode to be completely released, then create it
> > + *			anew.  Other functions will just ignore such inodes,
> > + *			if appropriate.  I_NEW is used for waiting.
> > + *
> > + * I_SYNC		Writeback of inode is running. The bit is set during
> > + *			data writeback, and cleared with a wakeup on the bit
> > + *			address once it is done. The bit is also used to pin
> > + *			the inode in memory for flusher thread.
> > + *
> > + * I_REFERENCED		Marks the inode as recently references on the LRU list.
> > + *
> > + * I_WB_SWITCH		Cgroup bdi_writeback switching in progress.  Used to
> > + *			synchronize competing switching instances and to tell
> > + *			wb stat updates to grab the i_pages lock.  See
> > + *			inode_switch_wbs_work_fn() for details.
> > + *
> > + * I_OVL_INUSE		Used by overlayfs to get exclusive ownership on upper
> > + *			and work dirs among overlayfs mounts.
> > + *
> > + * I_CREATING		New object's inode in the middle of setting up.
> > + *
> > + * I_DONTCACHE		Evict inode as soon as it is not used anymore.
> > + *
> > + * I_SYNC_QUEUED	Inode is queued in b_io or b_more_io writeback lists.
> > + *			Used to detect that mark_inode_dirty() should not move
> > + *			inode between dirty lists.
> > + *
> > + * I_PINNING_FSCACHE_WB	Inode is pinning an fscache object for writeback.
> > + *
> > + * I_LRU_ISOLATING	Inode is pinned being isolated from LRU without holding
> > + *			i_count.
> > + *
> > + * Q: What is the difference between I_WILL_FREE and I_FREEING?
> > + *
> > + * __I_{SYNC,NEW,LRU_ISOLATING} are used to derive unique addresses to wait
> > + * upon. There's one free address left.
> > + */
> > +
> > +/*
> > + * As simple macro to define the inode state bits, __NAME will be the bit value
> > + * (0, 1, 2, ...), and NAME will be the bit mask (1U << __NAME). The __NAME_SEQ
> > + * is used to reset the sequence number so the next name gets the next bit value
> > + * in the sequence.
> > + */
> > +#define INODE_BIT(name)			\
> > +	__ ## name,			\
> > +	name = (1U << __ ## name),	\
> > +	__ ## name ## _SEQ = __ ## name
> 
> I'm not sure if this is the future we want :D
> I think it's harder to parse than what we have now.
> 
> > +
> > +enum inode_state_bits {
> > +	INODE_BIT(I_NEW),
> > +	INODE_BIT(I_SYNC),
> > +	INODE_BIT(I_LRU_ISOLATING),
> > +	INODE_BIT(I_DIRTY_SYNC),
> > +	INODE_BIT(I_DIRTY_DATASYNC),
> > +	INODE_BIT(I_DIRTY_PAGES),
> > +	INODE_BIT(I_WILL_FREE),
> > +	INODE_BIT(I_FREEING),
> > +	INODE_BIT(I_CLEAR),
> > +	INODE_BIT(I_REFERENCED),
> > +	INODE_BIT(I_LINKABLE),
> > +	INODE_BIT(I_DIRTY_TIME),
> > +	INODE_BIT(I_WB_SWITCH),
> > +	INODE_BIT(I_OVL_INUSE),
> > +	INODE_BIT(I_CREATING),
> > +	INODE_BIT(I_DONTCACHE),
> > +	INODE_BIT(I_SYNC_QUEUED),
> > +	INODE_BIT(I_PINNING_NETFS_WB),
> > +};
> 
> Good idea but I really dislike this macro indirection.
> Can't we just do the really boring?
> 
> enum inode_state_bits {
> 	__I_BIT_NEW		= 0U
> 	__I_BIT_SYNC		= 1U
> 	__I_BIT_LRU_ISOLATING	= 2U
> }
> 
> enum inode_state_flags_t {
> 	I_NEW			= (1U << __I_BIT_NEW)
> 	I_SYNC			= (1U << __I_BIT_SYNC)
> 	I_LRU_ISOLATING		= (1U << __I_BIT_LRU_ISOLATING)
> 	I_DIRTY_SYNC		= (1U << 3)
> 	I_DIRTY_DATASYNC	= (1U << 4)
> 	I_DIRTY_PAGES		= (1U << 5)
> 	I_WILL_FREE		= (1U << 6)
> 	I_FREEING		= (1U << 7)
> 	I_CLEAR			= (1U << 8)
> 	I_REFERENCED		= (1U << 9)
> 	I_LINKABLE		= (1U << 10)
> 	I_DIRTY_TIME		= (1U << 11)
> 	I_WB_SWITCH		= (1U << 12)
> 	I_OVL_INUSE		= (1U << 13)
> 	I_CREATING		= (1U << 14)
> 	I_DONTCACHE		= (1U << 15)
> 	I_SYNC_QUEUED		= (1U << 16)
> 	I_PINNING_NETFS_WB	= (1U << 17)
> };
> 
> Note that inode_state_wait_address() and that only works on four bits so
> we can't really use higher bits anyway without switching back to a
> scheme where we have to use unsigned long and waste for bytes for
> nothing on 64 bit.
> 
> With that out of the way,
> 
> Reviewed-by: Christian Brauner <brauner@kernel.org>

Yup totally, I'll fix this and add your RB. Thanks!

Josef

  reply	other threads:[~2025-08-22 13:31 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-21 20:18 [PATCH 00/50] fs: rework inode reference counting Josef Bacik
2025-08-21 20:18 ` [PATCH 01/50] fs: add an i_obj_count refcount to the inode Josef Bacik
2025-08-21 20:18 ` [PATCH 02/50] fs: make the i_state flags an enum Josef Bacik
2025-08-22 11:08   ` Christian Brauner
2025-08-22 13:31     ` Josef Bacik [this message]
2025-08-22 14:36     ` David Sterba
2025-08-22 11:18   ` Sun YangKai
2025-08-22 11:42     ` [PATCH 02/50] " Alan Huang
2025-08-22 12:11       ` Sun YangKai
2025-08-22 14:40     ` [PATCH 02/50] fs: " Josef Bacik
2025-08-21 20:18 ` [PATCH 03/50] fs: hold an i_obj_count reference in wait_sb_inodes Josef Bacik
2025-08-21 20:18 ` [PATCH 04/50] fs: hold an i_obj_count reference for the i_wb_list Josef Bacik
2025-08-22 11:27   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 05/50] fs: hold an i_obj_count reference for the i_io_list Josef Bacik
2025-08-21 20:18 ` [PATCH 06/50] fs: hold an i_obj_count reference in writeback_sb_inodes Josef Bacik
2025-08-22 12:20   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 07/50] fs: hold an i_obj_count reference while on the hashtable Josef Bacik
2025-08-21 20:18 ` [PATCH 08/50] fs: hold an i_obj_count reference while on the LRU list Josef Bacik
2025-08-21 20:18 ` [PATCH 09/50] fs: hold an i_obj_count reference while on the sb inode list Josef Bacik
2025-08-21 20:18 ` [PATCH 10/50] fs: stop accessing ->i_count directly in f2fs and gfs2 Josef Bacik
2025-08-22 12:38   ` (subset) " Christian Brauner
2025-08-21 20:18 ` [PATCH 11/50] fs: hold an i_obj_count when we have an i_count reference Josef Bacik
2025-08-21 20:18 ` [PATCH 12/50] fs: rework iput logic Josef Bacik
2025-08-22 12:54   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 13/50] fs: add an I_LRU flag to the inode Josef Bacik
2025-08-21 20:18 ` [PATCH 14/50] fs: maintain a list of pinned inodes Josef Bacik
2025-08-22 14:55   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 15/50] fs: delete the inode from the LRU list on lookup Josef Bacik
2025-08-22 15:27   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 16/50] fs: change evict_inodes to use iput instead of evict directly Josef Bacik
2025-08-25  9:07   ` Christian Brauner
2025-08-25 19:35     ` Josef Bacik
2025-08-26  9:56       ` Christian Brauner
2025-08-21 20:18 ` [PATCH 17/50] fs: hold a full ref while the inode is on a LRU Josef Bacik
2025-08-25  9:20   ` Christian Brauner
2025-08-25 10:40     ` Christian Brauner
2025-08-21 20:18 ` [PATCH 18/50] fs: disallow 0 reference count inodes Josef Bacik
2025-08-25 10:54   ` Christian Brauner
2025-08-25 19:26     ` Josef Bacik
2025-08-26  9:28       ` Christian Brauner
2025-08-21 20:18 ` [PATCH 19/50] fs: make evict_inodes add to the dispose list under the i_lock Josef Bacik
2025-08-21 20:18 ` [PATCH 20/50] fs: convert i_count to refcount_t Josef Bacik
2025-08-22 12:10   ` Amir Goldstein
2025-08-22 13:56   ` kernel test robot
2025-08-25 11:03   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 21/50] fs: use refcount_inc_not_zero in igrab Josef Bacik
2025-08-25 11:21   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 22/50] fs: use inode_tryget in find_inode* Josef Bacik
2025-08-25 11:26   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 23/50] fs: update find_inode_*rcu to check the i_count count Josef Bacik
2025-08-25 11:27   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 24/50] fs: use igrab in insert_inode_locked Josef Bacik
2025-08-21 20:18 ` [PATCH 25/50] fs: remove I_WILL_FREE|I_FREEING check from __inode_add_lru Josef Bacik
2025-08-21 20:18 ` [PATCH 26/50] fs: remove I_WILL_FREE|I_FREEING check in inode_pin_lru_isolating Josef Bacik
2025-08-21 20:18 ` [PATCH 27/50] fs: use inode_tryget in evict_inodes Josef Bacik
2025-08-25 11:43   ` Christian Brauner
2025-08-25 18:22     ` Josef Bacik
2025-08-21 20:18 ` [PATCH 28/50] fs: change evict_dentries_for_decrypted_inodes to use refcount Josef Bacik
2025-08-21 20:18 ` [PATCH 29/50] block: use igrab in sync_bdevs Josef Bacik
2025-08-21 20:18 ` [PATCH 30/50] bcachefs: use the refcount instead of I_WILL_FREE|I_FREEING Josef Bacik
2025-08-21 20:18 ` [PATCH 31/50] btrfs: don't check I_WILL_FREE|I_FREEING Josef Bacik
2025-08-21 20:18 ` [PATCH 32/50] fs: use igrab in drop_pagecache_sb Josef Bacik
2025-08-21 20:18 ` [PATCH 33/50] fs: stop checking I_FREEING in d_find_alias_rcu Josef Bacik
2025-08-21 20:18 ` [PATCH 34/50] ext4: stop checking I_WILL_FREE|IFREEING in ext4_check_map_extents_env Josef Bacik
2025-08-21 20:18 ` [PATCH 35/50] fs: remove I_WILL_FREE|I_FREEING from fs-writeback.c Josef Bacik
2025-08-25 11:46   ` Christian Brauner
2025-08-21 20:18 ` [PATCH 36/50] gfs2: remove I_WILL_FREE|I_FREEING usage Josef Bacik
2025-08-21 20:18 ` [PATCH 37/50] fs: remove I_WILL_FREE|I_FREEING check from dquot.c Josef Bacik
2025-08-21 20:18 ` [PATCH 38/50] notify: remove I_WILL_FREE|I_FREEING checks in fsnotify_unmount_inodes Josef Bacik
2025-08-21 20:18 ` [PATCH 39/50] xfs: remove I_FREEING check Josef Bacik
2025-08-21 20:18 ` [PATCH 40/50] landlock: remove I_FREEING|I_WILL_FREE check Josef Bacik
2025-08-21 20:18 ` [PATCH 41/50] fs: change inode_is_dirtytime_only to use refcount Josef Bacik
2025-08-21 20:18 ` [PATCH 42/50] btrfs: remove references to I_FREEING Josef Bacik
2025-08-21 20:18 ` [PATCH 43/50] ext4: remove reference to I_FREEING in inode.c Josef Bacik
2025-08-21 20:18 ` [PATCH 44/50] ext4: remove reference to I_FREEING in orphan.c Josef Bacik
2025-08-21 20:18 ` [PATCH 45/50] pnfs: use i_count refcount to determine if the inode is going away Josef Bacik
2025-08-21 20:18 ` [PATCH 46/50] fs: remove some spurious I_FREEING references in inode.c Josef Bacik
2025-08-21 20:18 ` [PATCH 47/50] xfs: remove reference to I_FREEING|I_WILL_FREE Josef Bacik
2025-08-21 20:18 ` [PATCH 48/50] ocfs2: do not set I_WILL_FREE Josef Bacik
2025-08-21 20:19 ` [PATCH 49/50] fs: remove I_FREEING|I_WILL_FREE Josef Bacik
2025-08-25 11:53   ` Christian Brauner
2025-08-21 20:19 ` [PATCH 50/50] fs: add documentation explaining the reference count rules for inodes Josef Bacik
2025-08-25 11:56   ` Christian Brauner
2025-08-22 10:51 ` [PATCH 00/50] fs: rework inode reference counting Christian Brauner
2025-08-22 13:30   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250822133151.GB927384@perftesting \
    --to=josef@toxicpanda.com \
    --cc=brauner@kernel.org \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).