From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: "Jan Kara" <jack@suse.cz>,
"Christoph Hellwig" <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev,
torvalds@linux-foundation.org,
"Mickaël Salaün" <mic@linux.microsoft.com>,
"Jann Horn" <jannh@google.com>, "Serge Hallyn" <serge@hallyn.com>,
"Kees Cook" <keescook@chromium.org>,
linux-security-module@vger.kernel.org,
"Amir Goldstein" <amir73il@gmail.com>
Subject: Re: lsm sb_delete hook, was Re: [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes()
Date: Thu, 3 Oct 2024 18:17:31 +0200
Message-ID: <20241003161731.kwveypqzu4bivesv@quack3>
In-Reply-To: <Zv6jV40xKIJYuePA@dread.disaster.area>
On Thu 03-10-24 23:59:51, Dave Chinner wrote:
> On Thu, Oct 03, 2024 at 02:56:50PM +0200, Jan Kara wrote:
> > On Thu 03-10-24 05:39:23, Christoph Hellwig wrote:
> > > @@ -789,11 +789,23 @@ static bool dispose_list(struct list_head *head)
> > >   */
> > >  static int evict_inode_fn(struct inode *inode, void *data)
> > >  {
> > > +	struct super_block *sb = inode->i_sb;
> > >  	struct list_head *dispose = data;
> > > +	bool post_unmount = !(sb->s_flags & SB_ACTIVE);
> > >  
> > >  	spin_lock(&inode->i_lock);
> > > -	if (atomic_read(&inode->i_count) ||
> > > -	    (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))) {
> > > +	if (atomic_read(&inode->i_count)) {
> > > +		spin_unlock(&inode->i_lock);
> > > +
> > > +		/* for each watch, send FS_UNMOUNT and then remove it */
> > > +		if (post_unmount && fsnotify_sb_info(sb)) {
> > > +			fsnotify_inode(inode, FS_UNMOUNT);
> > > +			fsnotify_inode_delete(inode);
> > > +		}
> >
> > This will not work because you are in the unsafe iterator, holding
> > sb->s_inode_list_lock. To be able to call into fsnotify, you need to do the
> > iget / iput dance and release s_inode_list_lock, which does not work
> > when a filesystem has its own inode iterator AFAICT... That's why I've
> > called it a layering violation.
>
> The whole point of the iget/iput dance is to stabilise the
> s_inodes list iteration whilst it is unlocked - the actual fsnotify
> calls don't need an inode reference to work correctly.
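The locking rule discussed here can be illustrated with a minimal, single-threaded userspace sketch of the iget/iput dance. All names below (struct obj, struct owner, the toy spinlock, obj_put(), walk_pinned() and friends) are hypothetical stand-ins for struct inode, struct super_block, i_lock and s_inode_list_lock; this is not the fs/inode.c code. The sketch pins the current object so its list linkage stays valid while the list lock is dropped for work that may sleep, and it puts the previous object only after dropping the lock, because the put may itself need the list lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded "spinlock" (hypothetical): it only records whether it
 * is held, so the model can assert that sleeping work runs with no lock held. */
struct toy_spinlock { int held; };
static int locks_held;

static void toy_spin_lock(struct toy_spinlock *l)
{
	assert(!l->held);
	l->held = 1;
	locks_held++;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	assert(l->held);
	l->held = 0;
	locks_held--;
}

/* Hypothetical stand-ins for struct inode and struct super_block. */
struct obj {
	struct obj *prev, *next;   /* circular list, like inode->i_sb_list */
	struct toy_spinlock lock;  /* like inode->i_lock */
	int refcount;              /* like inode->i_count */
	int notified;
};

struct owner {
	struct obj head;                 /* like sb->s_inodes */
	struct toy_spinlock list_lock;   /* like sb->s_inode_list_lock */
};

static void owner_init(struct owner *ow)
{
	ow->head.prev = ow->head.next = &ow->head;
	ow->list_lock.held = 0;
}

static void obj_add(struct owner *ow, struct obj *o, int refcount)
{
	o->refcount = refcount;
	o->notified = 0;
	o->lock.held = 0;
	o->prev = ow->head.prev;
	o->next = &ow->head;
	ow->head.prev->next = o;
	ow->head.prev = o;
}

/* Like iput(): dropping the last reference unlinks the object, which needs
 * the list lock, so this must never be called with the list lock held. */
static void obj_put(struct owner *ow, struct obj *o)
{
	toy_spin_lock(&ow->list_lock);
	toy_spin_lock(&o->lock);
	if (--o->refcount == 0) {
		o->prev->next = o->next;
		o->next->prev = o->prev;
	}
	toy_spin_unlock(&o->lock);
	toy_spin_unlock(&ow->list_lock);
}

/* Stand-in for work that may sleep (e.g. the fsnotify calls). */
static void mark_notified(struct obj *o)
{
	assert(locks_held == 0);
	o->notified = 1;
}

/* Returns the number of objects handed to work(). */
static int walk_pinned(struct owner *ow, void (*work)(struct obj *))
{
	struct obj *o, *to_put = NULL;
	int visited = 0;

	toy_spin_lock(&ow->list_lock);
	for (o = ow->head.next; o != &ow->head; o = o->next) {
		toy_spin_lock(&o->lock);
		if (o->refcount == 0) {          /* already being torn down */
			toy_spin_unlock(&o->lock);
			continue;
		}
		o->refcount++;                   /* like __iget(): pin o */
		toy_spin_unlock(&o->lock);
		toy_spin_unlock(&ow->list_lock);

		if (to_put)
			obj_put(ow, to_put);     /* put the previous one, unlocked */
		work(o);
		to_put = o;
		visited++;

		toy_spin_lock(&ow->list_lock);  /* o is pinned: o->next is valid */
	}
	toy_spin_unlock(&ow->list_lock);
	if (to_put)
		obj_put(ow, to_put);
	return visited;
}
```

For example, after owner_init() and three obj_add() calls where the middle object has a zero refcount (standing in for an inode that is already being freed), walk_pinned(&ow, mark_notified) notifies only the other two and leaves their refcounts where they started.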
>
> IOWs, we don't need to run the fsnotify stuff right here - we can
> defer that like we do with the dispose list for all the inodes we
> mark as I_FREEING here.
>
> So if we pass a structure:
>
> struct evict_inode_args {
> 	struct list_head dispose;
> 	struct list_head fsnotify;
> };
>
> If we use __iget() instead of requiring an inode state flag to keep
> the inode off the LRU for the fsnotify cleanup, then the code
> fragment above becomes:
>
> 	if (atomic_read(&inode->i_count)) {
> 		if (post_unmount && fsnotify_sb_info(sb)) {
> 			__iget(inode);
> 			inode_lru_list_del(inode);
> 			spin_unlock(&inode->i_lock);
> 			list_add(&inode->i_lru, &args->fsnotify);
> 		}
Nit: you need to release i_lock in the else branch here. Otherwise, interesting
idea. Yes, something like this could work even in the unsafe iterator.
> 		return INO_ITER_DONE;
> 	}
>
> And then once we return to evict_inodes(), we do this:
>
> 	while (!list_empty(&args->fsnotify)) {
> 		struct inode *inode;
>
> 		inode = list_first_entry(&args->fsnotify, struct inode, i_lru);
> 		list_del_init(&inode->i_lru);
>
> 		fsnotify_inode(inode, FS_UNMOUNT);
> 		fsnotify_inode_delete(inode);
> 		iput(inode);
> 		cond_resched();
> 	}
>
> And so now all the fsnotify cleanup is done outside the traversal in
> one large batch from evict_inodes().
Yup.
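The two-phase scheme agreed on above can be sketched as a single-threaded userspace model. The names (struct obj, struct owner, struct walk_args, collect_fn(), evict_model()) are hypothetical stand-ins for struct inode, struct super_block, struct evict_inode_args, evict_inode_fn() and evict_inodes(). The callback runs with the list lock held, so it only takes a reference and queues the object, releasing the per-object lock on every path (the point of the nit above); the work that may sleep runs afterwards with no locks held. Only the fsnotify part is modelled; the dispose list for unused inodes is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded "spinlock" (hypothetical): it only records whether it
 * is held, so the model can assert that sleeping work runs with no lock held. */
struct toy_spinlock { int held; };
static int locks_held;

static void toy_spin_lock(struct toy_spinlock *l)
{
	assert(!l->held);
	l->held = 1;
	locks_held++;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	assert(l->held);
	l->held = 0;
	locks_held--;
}

/* Hypothetical stand-ins for struct inode and struct super_block. */
struct obj {
	struct obj *next;          /* owner's list, like sb->s_inodes */
	struct obj *defer_next;    /* deferred list, like reusing i_lru */
	struct toy_spinlock lock;  /* like inode->i_lock */
	int refcount;              /* like inode->i_count */
	int watched;               /* like "has fsnotify marks" */
	int notified;
};

struct owner {
	struct obj *objs;
	struct toy_spinlock list_lock;   /* like sb->s_inode_list_lock */
};

/* Per-walk state handed to the callback, like struct evict_inode_args. */
struct walk_args {
	struct obj *deferred;
};

/* Phase 1: runs under the list lock, so it only pins and queues. */
static void collect_fn(struct obj *o, struct walk_args *args)
{
	toy_spin_lock(&o->lock);
	if (o->refcount > 0 && o->watched) {
		o->refcount++;                   /* like __iget() */
		o->defer_next = args->deferred;
		args->deferred = o;
	}
	toy_spin_unlock(&o->lock);               /* released on every path */
}

/* Stand-in for fsnotify_inode() + fsnotify_inode_delete(): may sleep. */
static void notify(struct obj *o)
{
	assert(locks_held == 0);
	o->notified = 1;
}

static void obj_put(struct obj *o)               /* like iput() */
{
	toy_spin_lock(&o->lock);
	assert(o->refcount > 0);
	o->refcount--;
	toy_spin_unlock(&o->lock);
}

/* Returns the number of objects notified. */
static int evict_model(struct owner *ow)
{
	struct walk_args args = { .deferred = NULL };
	struct obj *o;
	int n = 0;

	toy_spin_lock(&ow->list_lock);
	for (o = ow->objs; o; o = o->next)
		collect_fn(o, &args);
	toy_spin_unlock(&ow->list_lock);

	/* Phase 2: one batch, no locks held. */
	while (args.deferred) {
		o = args.deferred;
		args.deferred = o->defer_next;
		o->defer_next = NULL;
		notify(o);
		obj_put(o);
		n++;
	}
	return n;
}
```

With three objects where only one is both in use and watched, evict_model() notifies exactly that one and then drops the extra reference it took, so that object's refcount returns to its original value.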
> As for the landlock code, I think it needs to have its own internal
> tracking mechanism and not search the sb inode list for inodes that
> it holds references to. LSM cleanup should be run before we
> get to tearing down the inode cache, not after....
Well, I think LSM cleanup could in principle be handled together with the
fsnotify cleanup, but I didn't check the details.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR