From: Christian Brauner <brauner@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org,
kernel-team@fb.com, linux-ext4@vger.kernel.org,
linux-xfs@vger.kernel.org, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 16/50] fs: change evict_inodes to use iput instead of evict directly
Date: Tue, 26 Aug 2025 11:56:12 +0200 [thread overview]
Message-ID: <20250826-begnadet-vorarbeit-901ad1e2bfa0@brauner> (raw)
In-Reply-To: <20250825193502.GB1310133@perftesting>
On Mon, Aug 25, 2025 at 03:35:02PM -0400, Josef Bacik wrote:
> On Mon, Aug 25, 2025 at 11:07:55AM +0200, Christian Brauner wrote:
> > On Thu, Aug 21, 2025 at 04:18:27PM -0400, Josef Bacik wrote:
> > > At evict_inodes() time, we no longer have SB_ACTIVE set, so we can
> > > easily go through the normal iput path to clear any inodes. Update
> >
> > I'm a bit lost why SB_ACTIVE is used here as a justification to call
> > iput(). I think it's because iput_final() would somehow add it back to
> > the LRU if SB_ACTIVE was still set and the filesystem somehow would
> > indicate it wouldn't want to drop the inode.
> >
> > I'm confused where that would even happen. IOW, which filesystem would
> > indicate "don't drop the inode" even though it's about to vanish. But
> > anyway, that's probably not important because...
> >
> > > dispose_list() to check how we need to free the inode, and then grab a
> > > full reference to the inode while we're looping through the remaining
> > > inodes, and simply iput them at the end.
> > >
> > > Since we're just calling iput we don't really care about the i_count on
> > > the inode at the current time. Remove the i_count checks and just call
> > > iput on every inode we find.
> > >
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > ---
> > > fs/inode.c | 26 +++++++++++---------------
> > > 1 file changed, 11 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 72981b890ec6..80ad327746a7 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -933,7 +933,7 @@ static void evict(struct inode *inode)
> > > * Dispose-list gets a local list with local inodes in it, so it doesn't
> > > * need to worry about list corruption and SMP locks.
> > > */
> > > -static void dispose_list(struct list_head *head)
> > > +static void dispose_list(struct list_head *head, bool for_lru)
> > > {
> > > while (!list_empty(head)) {
> > > struct inode *inode;
> > > @@ -941,8 +941,12 @@ static void dispose_list(struct list_head *head)
> > > inode = list_first_entry(head, struct inode, i_lru);
> > > list_del_init(&inode->i_lru);
> > >
> > > - evict(inode);
> > > - iobj_put(inode);
> > > + if (for_lru) {
> > > + evict(inode);
> > > + iobj_put(inode);
> > > + } else {
> > > + iput(inode);
> > > + }
> >
> > ... Afaict, if we end up in dispose_list() we came from one of two
> > locations:
> >
> > (1) prune_icache_sb()
> > In which case inode_lru_isolate() will have only returned inodes
> > that prior to your changes would have inode->i_count zero.
> >
> > (2) evict_inodes()
> > Similar story, this only hits inodes with inode->i_count zero.
> >
> > With your change you're adding an increment from zero for (2) via
> > __iget() so that you always end up with a full refcount, and that is
> > backing your changes to dispose_list() later.
> >
> > I don't see the same done for (1) though and so your later call to
> > iput() drops the reference below zero? It's accidently benign because
> > iiuc atomic_dec_and_test() will simply tell you that reference count
> > didn't go to zero and so iput() will back off. But still this should be
> > fixed if I'm right.
>
> Because (1) at this point doesn't have a full reference, it only has an
> i_obj_count reference. The next patch converts this, and removes this bit. I did
> it this way to clearly mark the change in behavior.
Ah, right, I forgot to take the the boolean argument into account.
You're passing that as true in (1). Sure and then sorry about the noise.
> prune_icache_sb() will call dispose_list(&list, true), which will do the
> evict(inode) and iobj_put(inode). This is correct because the inodes on the list
> from prune_icache_sb() will have an i_count == and have I_WILL_FREE set, so it
> will never have it's i_count increased to 1.
>
> The change here is to change evict_inodes() to simply call iput(), as it calls
> dispose_list(&list, false). We will increase the i_count to 1 from zero via
> __iget(), which at this point in the series is completely correct behavior. Then
> we will call iput() which will drop the i_count back to zero, and then call
> iput_final, and since SB_ACTIVE is not set, it will call evict(inode) and clean
> everything up properly.
>
> >
> > The conversion to iput() is introducing a lot of subtlety in the middle
> > of the series. If I'm right then the iput() is a always a nop because in
> > all cases it was an increment from zero. But it isn't really a nop
> > because we still do stuff like call ->drop_inode() again. Maybe it's
> > fine because no filesystem would have issues with this but I wouldn't
> > count on it and also it feels rather unclean to do it this way.
>
> So I'm definitely introducing another call to ->drop_inode() here, but
> ->drop_inode() has always been a "do we want to keep this inode on the LRU"
> call, calling it again doesn't really change anything.
Right, the first time the inode reference count drops to zero we call
iput_final() and then ->drop_inode() which tells us to put it on the
LRU. And we leave it otherwise in tact. This is the part I forgot.
Once you add it to the LRU you'll have incremented i_obj_count either
when you added it to the cached LRU and moving it from there to the
"actual" LRU or you'll take a new one.
And then dispose_list() for (1) will just need to put the i_obj_count.
Thanks!
> That being said it is a subtle functional change. I put it here specifically
> because it is a functional change. If it bites us in the ass in some unforseen
> way we'll be able to bisect it down to here and then we can all laugh at Josef
> because he missed something.
Yeah, it's good. It's just very confusing because for a brief time we
have to understand two reference counts that are first coupled almost
1:1 and then slowly decoupled which makes this particularly nasty...
> > So, under the assumption, that after the increment from zero you did, we
> > really only have a blatant zombie inode on our hands and we only need to
> > get rid of the i_count we took make that explicit and do:
> >
> > if (for_lru) {
> > evict(inode);
> > iobj_put(inode);
> > } else {
> > /* This inode was always incremented from zero.
> > * Get rid of that reference without doing anything else.
> > */
> > WARN_ON_ONCE(!atomic_dec_and_test(&inode->i_count));
> > }
>
> We still need the evict() to actually free the inode. We're just getting there
> via iput_final() now instead of directly calling evict().
Right, thanks!
>
> >
> > Btw, for the iobj_put() above, I assume that we're not guaranteed that
> > i_obj_count == 1?
>
> Right, it's purely dropping the LRU list i_obj_count reference. Thanks,
Thanks for the comments!
next prev parent reply other threads:[~2025-08-26 9:56 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-21 20:18 [PATCH 00/50] fs: rework inode reference counting Josef Bacik
2025-08-21 20:18 ` [PATCH 01/50] fs: add an i_obj_count refcount to the inode Josef Bacik
2025-08-21 20:18 ` [PATCH 02/50] fs: make the i_state flags an enum Josef Bacik
2025-08-22 11:08 ` Christian Brauner
2025-08-22 13:31 ` Josef Bacik
2025-08-22 14:36 ` David Sterba
2025-08-22 11:18 ` Sun YangKai
2025-08-22 11:42 ` [PATCH 02/50] " Alan Huang
2025-08-22 12:11 ` Sun YangKai
2025-08-22 14:40 ` [PATCH 02/50] fs: " Josef Bacik
2025-08-21 20:18 ` [PATCH 03/50] fs: hold an i_obj_count reference in wait_sb_inodes Josef Bacik
2025-08-21 20:18 ` [PATCH 04/50] fs: hold an i_obj_count reference for the i_wb_list Josef Bacik
2025-08-22 11:27 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 05/50] fs: hold an i_obj_count reference for the i_io_list Josef Bacik
2025-08-21 20:18 ` [PATCH 06/50] fs: hold an i_obj_count reference in writeback_sb_inodes Josef Bacik
2025-08-22 12:20 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 07/50] fs: hold an i_obj_count reference while on the hashtable Josef Bacik
2025-08-21 20:18 ` [PATCH 08/50] fs: hold an i_obj_count reference while on the LRU list Josef Bacik
2025-08-21 20:18 ` [PATCH 09/50] fs: hold an i_obj_count reference while on the sb inode list Josef Bacik
2025-08-21 20:18 ` [PATCH 10/50] fs: stop accessing ->i_count directly in f2fs and gfs2 Josef Bacik
2025-08-22 12:38 ` (subset) " Christian Brauner
2025-08-21 20:18 ` [PATCH 11/50] fs: hold an i_obj_count when we have an i_count reference Josef Bacik
2025-08-21 20:18 ` [PATCH 12/50] fs: rework iput logic Josef Bacik
2025-08-22 12:54 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 13/50] fs: add an I_LRU flag to the inode Josef Bacik
2025-08-21 20:18 ` [PATCH 14/50] fs: maintain a list of pinned inodes Josef Bacik
2025-08-22 14:55 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 15/50] fs: delete the inode from the LRU list on lookup Josef Bacik
2025-08-22 15:27 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 16/50] fs: change evict_inodes to use iput instead of evict directly Josef Bacik
2025-08-25 9:07 ` Christian Brauner
2025-08-25 19:35 ` Josef Bacik
2025-08-26 9:56 ` Christian Brauner [this message]
2025-08-21 20:18 ` [PATCH 17/50] fs: hold a full ref while the inode is on a LRU Josef Bacik
2025-08-25 9:20 ` Christian Brauner
2025-08-25 10:40 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 18/50] fs: disallow 0 reference count inodes Josef Bacik
2025-08-25 10:54 ` Christian Brauner
2025-08-25 19:26 ` Josef Bacik
2025-08-26 9:28 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 19/50] fs: make evict_inodes add to the dispose list under the i_lock Josef Bacik
2025-08-21 20:18 ` [PATCH 20/50] fs: convert i_count to refcount_t Josef Bacik
2025-08-22 12:10 ` Amir Goldstein
2025-08-22 13:56 ` kernel test robot
2025-08-25 11:03 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 21/50] fs: use refcount_inc_not_zero in igrab Josef Bacik
2025-08-25 11:21 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 22/50] fs: use inode_tryget in find_inode* Josef Bacik
2025-08-25 11:26 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 23/50] fs: update find_inode_*rcu to check the i_count count Josef Bacik
2025-08-25 11:27 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 24/50] fs: use igrab in insert_inode_locked Josef Bacik
2025-08-21 20:18 ` [PATCH 25/50] fs: remove I_WILL_FREE|I_FREEING check from __inode_add_lru Josef Bacik
2025-08-21 20:18 ` [PATCH 26/50] fs: remove I_WILL_FREE|I_FREEING check in inode_pin_lru_isolating Josef Bacik
2025-08-21 20:18 ` [PATCH 27/50] fs: use inode_tryget in evict_inodes Josef Bacik
2025-08-25 11:43 ` Christian Brauner
2025-08-25 18:22 ` Josef Bacik
2025-08-21 20:18 ` [PATCH 28/50] fs: change evict_dentries_for_decrypted_inodes to use refcount Josef Bacik
2025-08-21 20:18 ` [PATCH 29/50] block: use igrab in sync_bdevs Josef Bacik
2025-08-21 20:18 ` [PATCH 30/50] bcachefs: use the refcount instead of I_WILL_FREE|I_FREEING Josef Bacik
2025-08-21 20:18 ` [PATCH 31/50] btrfs: don't check I_WILL_FREE|I_FREEING Josef Bacik
2025-08-21 20:18 ` [PATCH 32/50] fs: use igrab in drop_pagecache_sb Josef Bacik
2025-08-21 20:18 ` [PATCH 33/50] fs: stop checking I_FREEING in d_find_alias_rcu Josef Bacik
2025-08-21 20:18 ` [PATCH 34/50] ext4: stop checking I_WILL_FREE|IFREEING in ext4_check_map_extents_env Josef Bacik
2025-08-21 20:18 ` [PATCH 35/50] fs: remove I_WILL_FREE|I_FREEING from fs-writeback.c Josef Bacik
2025-08-25 11:46 ` Christian Brauner
2025-08-21 20:18 ` [PATCH 36/50] gfs2: remove I_WILL_FREE|I_FREEING usage Josef Bacik
2025-08-21 20:18 ` [PATCH 37/50] fs: remove I_WILL_FREE|I_FREEING check from dquot.c Josef Bacik
2025-08-21 20:18 ` [PATCH 38/50] notify: remove I_WILL_FREE|I_FREEING checks in fsnotify_unmount_inodes Josef Bacik
2025-08-21 20:18 ` [PATCH 39/50] xfs: remove I_FREEING check Josef Bacik
2025-08-21 20:18 ` [PATCH 40/50] landlock: remove I_FREEING|I_WILL_FREE check Josef Bacik
2025-08-21 20:18 ` [PATCH 41/50] fs: change inode_is_dirtytime_only to use refcount Josef Bacik
2025-08-21 20:18 ` [PATCH 42/50] btrfs: remove references to I_FREEING Josef Bacik
2025-08-21 20:18 ` [PATCH 43/50] ext4: remove reference to I_FREEING in inode.c Josef Bacik
2025-08-21 20:18 ` [PATCH 44/50] ext4: remove reference to I_FREEING in orphan.c Josef Bacik
2025-08-21 20:18 ` [PATCH 45/50] pnfs: use i_count refcount to determine if the inode is going away Josef Bacik
2025-08-21 20:18 ` [PATCH 46/50] fs: remove some spurious I_FREEING references in inode.c Josef Bacik
2025-08-21 20:18 ` [PATCH 47/50] xfs: remove reference to I_FREEING|I_WILL_FREE Josef Bacik
2025-08-21 20:18 ` [PATCH 48/50] ocfs2: do not set I_WILL_FREE Josef Bacik
2025-08-21 20:19 ` [PATCH 49/50] fs: remove I_FREEING|I_WILL_FREE Josef Bacik
2025-08-25 11:53 ` Christian Brauner
2025-08-21 20:19 ` [PATCH 50/50] fs: add documentation explaining the reference count rules for inodes Josef Bacik
2025-08-25 11:56 ` Christian Brauner
2025-08-22 10:51 ` [PATCH 00/50] fs: rework inode reference counting Christian Brauner
2025-08-22 13:30 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250826-begnadet-vorarbeit-901ad1e2bfa0@brauner \
--to=brauner@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox