All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev,
	torvalds@linux-foundation.org
Subject: Re: [RFC PATCH 0/7] vfs: improving inode cache iteration scalability
Date: Thu, 3 Oct 2024 23:35:56 +1000	[thread overview]
Message-ID: <Zv6dvCpGIqOr9IPw@dread.disaster.area> (raw)
In-Reply-To: <20241003124619.wfgozqj4yoyl4xbu@quack3>

On Thu, Oct 03, 2024 at 02:46:19PM +0200, Jan Kara wrote:
> On Thu 03-10-24 05:18:30, Christoph Hellwig wrote:
> > On Thu, Oct 03, 2024 at 01:45:55PM +0200, Jan Kara wrote:
> > > /* Find next inode on the inode list eligible for processing */
> > > #define sb_inode_iter_next(sb, inode, old_inode, inode_eligible) 	\
> > > ({									\
> > > 	struct inode *ret = NULL;					\
> > 
> > <snip>
> > 
> > > 	ret;								\
> > > })
> > 
> > How is this going to interact with calling into the file system
> > to do the interaction, which is kinda the point of this series?
> 
> Yeah, I was concentrated on the VFS bits and forgot why Dave wrote this
> series in the first place. So this style of iterator isn't useful for what
> Dave wants to achieve. Sorry for the noise. Still the possibility to have a
> callback under inode->i_lock being able to do stuff and decide whether we

I did that first, and turned into an utter mess the moment we step
outside the existing iterator mechanism.

I implemented a separate XFS icwalk function because having to hold
the inode->i_lock between inode lookup and the callback function
means we cannot do batched inode lookups from the radix tree.

The existing icwalk code grabs 32 inodes at a time from the radix
tree and validates them all, then runs the callback on them one at a
time, then it drops them all.

If the VFS inode callback requires the inode i_lock to be held and
be atomic with the initial state checks, then we have to nest 32
spinlocks in what is effectively a random lock order.

So I implemented a non-batched icwalk method, and it didn't get that
much cleaner. It wasn't until I dropped the inode->i_lock from the
callback API that everything cleaned up and the offload mechanism
started to make sense. And it was this change that also makes it
possible for XFs to use it's existing batched lookup mechanisms
instead of the special case implementation that I wrote for this
patch set.

IOWs, we can't hold the inode->i_lock across lookup validation to
callback if we want to provide freedom of implementation to the
filesystem specific code. We aren't really concerned about
performance of traversals, so I went with freedom of implementation
over clunky locking semantics to optimise away a couple of atomic
ops per inode for iget/iput calls.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2024-10-03 13:35 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-02  1:33 [RFC PATCH 0/7] vfs: improving inode cache iteration scalability Dave Chinner
2024-10-02  1:33 ` [PATCH 1/7] vfs: replace invalidate_inodes() with evict_inodes() Dave Chinner
2024-10-03  7:07   ` Christoph Hellwig
2024-10-03  9:20   ` Jan Kara
2024-10-02  1:33 ` [PATCH 2/7] vfs: add inode iteration superblock method Dave Chinner
2024-10-03  7:12   ` Christoph Hellwig
2024-10-03 10:35     ` Dave Chinner
2024-10-04  9:53   ` kernel test robot
2024-10-02  1:33 ` [PATCH 3/7] vfs: convert vfs inode iterators to super_iter_inodes_unsafe() Dave Chinner
2024-10-03  7:14   ` Christoph Hellwig
2024-10-03 10:45     ` Dave Chinner
2024-10-04 10:55   ` kernel test robot
2024-10-02  1:33 ` [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes() Dave Chinner
2024-10-03  7:23   ` lsm sb_delete hook, was " Christoph Hellwig
2024-10-03  7:38     ` Christoph Hellwig
2024-10-03 11:57       ` Jan Kara
2024-10-03 12:11         ` Christoph Hellwig
2024-10-03 12:26           ` Jan Kara
2024-10-03 12:39             ` Christoph Hellwig
2024-10-03 12:56               ` Jan Kara
2024-10-03 13:04                 ` Christoph Hellwig
2024-10-03 13:59                 ` Dave Chinner
2024-10-03 16:17                   ` Jan Kara
2024-10-04  0:46                     ` Dave Chinner
2024-10-04  7:21                       ` Christian Brauner
2024-10-04 12:14                         ` Christoph Hellwig
2024-10-04 13:49                           ` Jan Kara
2024-10-04 18:15                             ` Paul Moore
2024-10-04 22:57                         ` Dave Chinner
2024-10-05 15:21                           ` Mickaël Salaün
2024-10-05 16:03                             ` Mickaël Salaün
2024-10-05 16:03                             ` Paul Moore
2024-10-07 20:37         ` Linus Torvalds
2024-10-07 23:33           ` Dave Chinner
2024-10-08  0:28             ` Linus Torvalds
2024-10-08  0:54               ` Linus Torvalds
2024-10-09  9:49                 ` Jan Kara
2024-10-08 12:59               ` Mickaël Salaün
2024-10-09  0:21                 ` Dave Chinner
2024-10-09  9:23                   ` Mickaël Salaün
2024-10-08  8:57             ` Amir Goldstein
2024-10-08 11:23               ` Jan Kara
2024-10-08 12:16                 ` Christian Brauner
2024-10-09  0:03                   ` Dave Chinner
2024-10-08 23:44                 ` Dave Chinner
2024-10-09  6:10                   ` Amir Goldstein
2024-10-09 14:18                   ` Jan Kara
2024-10-02  1:33 ` [PATCH 5/7] vfs: add inode iteration superblock method Dave Chinner
2024-10-03  7:24   ` Christoph Hellwig
2024-10-02  1:33 ` [PATCH 6/7] xfs: implement sb->iter_vfs_inodes Dave Chinner
2024-10-03  7:30   ` Christoph Hellwig
2024-10-02  1:33 ` [PATCH 7/7] bcachefs: " Dave Chinner
2024-10-02 10:00 ` [RFC PATCH 0/7] vfs: improving inode cache iteration scalability Christian Brauner
2024-10-02 12:34   ` Dave Chinner
2024-10-02 19:29     ` Kent Overstreet
2024-10-02 22:23       ` Dave Chinner
2024-10-02 23:20         ` Kent Overstreet
2024-10-03  1:41           ` Dave Chinner
2024-10-03  2:24             ` Kent Overstreet
2024-10-03  9:17             ` Jan Kara
2024-10-03  9:59               ` Dave Chinner
2024-10-02 19:49     ` Linus Torvalds
2024-10-02 20:28       ` Kent Overstreet
2024-10-02 23:17         ` Dave Chinner
2024-10-03  1:22           ` Kent Overstreet
2024-10-03  2:20             ` Dave Chinner
2024-10-03  2:42               ` Kent Overstreet
2024-10-03 11:45 ` Jan Kara
2024-10-03 12:18   ` Christoph Hellwig
2024-10-03 12:46     ` Jan Kara
2024-10-03 13:35       ` Dave Chinner [this message]
2024-10-03 13:03   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zv6dvCpGIqOr9IPw@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.