From: Josef Bacik <josef@toxicpanda.com>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org,
kernel-team@fb.com, linux-ext4@vger.kernel.org,
linux-xfs@vger.kernel.org, viro@zeniv.linux.org.uk,
amir73il@gmail.com
Subject: Re: [PATCH v2 15/54] fs: maintain a list of pinned inodes
Date: Wed, 27 Aug 2025 12:07:56 -0400
Message-ID: <20250827160756.GA2272053@perftesting>
In-Reply-To: <20250827-gelandet-heizt-1f250f77bfc8@brauner>
On Wed, Aug 27, 2025 at 05:20:17PM +0200, Christian Brauner wrote:
> On Tue, Aug 26, 2025 at 11:39:15AM -0400, Josef Bacik wrote:
> > Currently we rely on dirty inodes and inodes with cached pages simply
> > being left hanging around on the system outside of an LRU. The only
> > thing that makes sure these inodes are eventually reclaimed is that
> > dirty writeback grabs a reference on the inode and then iputs it when
> > it's done, potentially putting it on the LRU. For the cached case, the
> > page cache deletion path calls inode_add_lru() when the inode no
> > longer has cached pages, so that the inode object can eventually be
> > freed. In the unmount case we walk all inodes and free them, so this
> > all works out fine.
> >
> > But we want to eliminate 0 i_count objects as a concept, so we need a
> > mechanism to hold a reference on these pinned inodes. To that end, add a
> > list to the super block that contains any inodes that are cached for one
> > reason or another.
> >
> > When we call inode_add_lru(), if the inode falls into one of these
> > categories, we will add it to the cached inode list and hold an
> > i_obj_count reference. If the inode does not fall into one of these
> > categories, it will be moved to the normal LRU, which already holds an
> > i_obj_count reference.
> >
> > In the dirty case, we will delete it from the LRU if it is on one, and
> > the iput() after the writeout will make sure it's placed onto the
> > correct list at that point.
> >
> > In the page cache case, the inode is migrated when inode_add_lru() is
> > called while deleting pages from the page cache.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
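For readers following the reference accounting, here is a minimal userspace model of the routing described above. It is a sketch under stated assumptions, not the code from the patch: the list is hand-rolled rather than the kernel's list_head, and every model_* name and field is hypothetical. Whichever list an inode sits on owns exactly one reference (obj_count stands in for i_obj_count); migrating between the cached list and the LRU keeps that reference, and only leaving both lists drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hand-rolled circular doubly linked list, loosely modeled on the
 * kernel's struct list_head; this is NOT the kernel implementation. */
struct list {
	struct list *prev, *next;
};

static void list_init(struct list *l) { l->prev = l->next = l; }

/* A self-linked node is on no list (same convention as the kernel). */
static bool list_empty(const struct list *l) { return l->next == l; }

static void list_add(struct list *node, struct list *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del_init(struct list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	list_init(node);
}

/* Hypothetical, stripped-down inode and superblock for this model. */
struct model_inode {
	int obj_count;          /* stands in for i_obj_count */
	bool dirty;             /* inode has dirty state */
	unsigned long nrpages;  /* pages still in the page cache */
	bool on_cached;         /* true: cached list, false: LRU (if linked) */
	struct list lru;        /* links into at most one of the two lists */
};

struct model_sb {
	struct list lru;        /* ordinary reclaimable LRU */
	struct list cached;     /* pinned inodes: dirty or with cached pages */
};

static void model_sb_init(struct model_sb *sb)
{
	list_init(&sb->lru);
	list_init(&sb->cached);
}

/* A new inode starts with the caller's single reference, on no list. */
static void model_inode_init(struct model_inode *inode)
{
	inode->obj_count = 1;
	inode->dirty = false;
	inode->nrpages = 0;
	inode->on_cached = false;
	list_init(&inode->lru);
}

/* Hypothetical analogue of inode_add_lru(): route to cached list or LRU. */
static void model_add_lru(struct model_sb *sb, struct model_inode *inode)
{
	bool pinned = inode->dirty || inode->nrpages > 0;

	if (list_empty(&inode->lru))
		inode->obj_count++;          /* the list takes a reference */
	else
		list_del_init(&inode->lru);  /* migrating keeps the reference */

	list_add(&inode->lru, pinned ? &sb->cached : &sb->lru);
	inode->on_cached = pinned;
}

/* Dirty case: take the inode off the normal LRU and drop that list's
 * reference; the iput() after writeout re-routes it via model_add_lru(). */
static void model_mark_dirty(struct model_inode *inode)
{
	inode->dirty = true;
	if (!list_empty(&inode->lru) && !inode->on_cached) {
		list_del_init(&inode->lru);
		inode->obj_count--;
		assert(inode->obj_count > 0);  /* writeback still holds one */
	}
}
```

For example, an inode with pages in the page cache lands on the cached list and its obj_count goes from 1 to 2; once its pages are gone, calling model_add_lru() again moves it to the LRU without changing the count.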
>
> Ok, I'm trying to wrap my head around the justification for this new
> list. Currently we have inodes with zero reference counts that aren't
> on any LRU. They just appear on sb->s_inodes and are e.g., dealt with
> during umount (sync_filesystem() followed by evict_inodes()).
>
> So they're either dealt with by writeback or by the page cache and are
> eventually put on the regular LRU or the filesystem shuts down before
> that happens.
>
> They're easy to handle and recognize because their inode->i_count is
> zero.
>
> Now you make the LRUs hold a full reference so the inode can be grabbed
> from the LRU again, avoiding the zombie resurrection from zero. So to
> recognize inodes that are pinned internally due to being dirty or having
> pagecache pages attached, you need to track them in a new list;
> otherwise you can't really differentiate them or know when to move them
> onto the LRU after writeback and the pagecache are done with them.
>

Exactly. We need to put them somewhere so we can account for their reference.
We could technically use a flag instead of a list: the flag would indicate
that the inode is pinned, and the flag would have a full reference associated
with it.

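The flag-only alternative can be modeled the same way. Again this is a hypothetical userspace sketch, not kernel code: MODEL_LRU_CACHED only echoes the I_LRU_CACHED name used in this thread, count stands in for the full reference, and a plain array stands in for ->s_inodes. Setting the flag takes the reference and clearing it drops the reference, so no extra list is needed, but finding the pinned inodes means walking every inode of the superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the name only echoes I_LRU_CACHED. */
#define MODEL_LRU_CACHED 0x1u

/* Hypothetical, stripped-down inode for this model. */
struct model_inode {
	unsigned int state;     /* stands in for i_state */
	int count;              /* stands in for the full reference count */
	bool dirty;             /* inode has dirty state */
	unsigned long nrpages;  /* pages still in the page cache */
};

static bool model_is_pinned(const struct model_inode *inode)
{
	return inode->dirty || inode->nrpages > 0;
}

/* Set the flag and take the reference it represents; idempotent. */
static void model_pin(struct model_inode *inode)
{
	if (inode->state & MODEL_LRU_CACHED)
		return;
	inode->state |= MODEL_LRU_CACHED;
	inode->count++;
}

/* Clear the flag and drop its reference; returns true if it was set. */
static bool model_unpin(struct model_inode *inode)
{
	if (!(inode->state & MODEL_LRU_CACHED))
		return false;
	inode->state &= ~MODEL_LRU_CACHED;
	inode->count--;
	assert(inode->count >= 0);
	return true;
}

/* Without a dedicated list, finding pinned inodes means walking every
 * inode of the superblock (an array here, standing in for ->s_inodes). */
static size_t model_count_pinned(struct model_inode *const *inodes, size_t n)
{
	size_t pinned = 0;

	for (size_t i = 0; i < n; i++)
		if (inodes[i]->state & MODEL_LRU_CACHED)
			pinned++;
	return pinned;
}
```

The trade-off is the one discussed here: the flag avoids maintaining a second list, while a dedicated list lets a debugger (or a drgn script) enumerate pinned inodes directly instead of filtering all of ->s_inodes.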
I did it this way because if I had a nickel for every time I needed to figure
out where a zombie inode was and had to do the most grotesque drgn magic to find
it, I'd have like 15 cents, which isn't a lot, but it's weird that it's happened
3 times. Having a list makes it much easier to find these inodes when debugging.

But again, we have ->s_inodes, and I could just scan that list and look for
I_LRU_CACHED. We'd still need to hold a full reference for the flag, but it
would eliminate the need for another list, if that's preferable? Thanks,
Josef