From: "Mickaël Salaün" <mic@digikod.net>
To: Dave Chinner <david@fromorbit.com>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>,
"Jan Kara" <jack@suse.cz>,
"Christoph Hellwig" <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev,
"Jann Horn" <jannh@google.com>, "Serge Hallyn" <serge@hallyn.com>,
"Kees Cook" <keescook@chromium.org>,
linux-security-module@vger.kernel.org,
"Amir Goldstein" <amir73il@gmail.com>,
"Günther Noack" <gnoack@google.com>,
"Christian Brauner" <brauner@kernel.org>
Subject: Re: lsm sb_delete hook, was Re: [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes()
Date: Wed, 9 Oct 2024 11:23:06 +0200
Message-ID: <20241009.ahqu6AeW3cow@digikod.net>
In-Reply-To: <ZwXMdqxz5PWNjW3C@dread.disaster.area>
On Wed, Oct 09, 2024 at 11:21:10AM +1100, Dave Chinner wrote:
> On Tue, Oct 08, 2024 at 02:59:07PM +0200, Mickaël Salaün wrote:
> > On Mon, Oct 07, 2024 at 05:28:57PM -0700, Linus Torvalds wrote:
> > > On Mon, 7 Oct 2024 at 16:33, Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > There may be other inode references being held that make
> > > > the inode live longer than the dentry cache. When should the
> > > > fsnotify marks be removed from the inode in that case? Do they need
> > > > to remain until, e.g, writeback completes?
> > >
> > > Note that my idea is to just remove the fsnotify marks when the dentry
> > > discards the inode.
> > >
> > > That means that yes, the inode may still have a lifetime after the
> > > dentry (because of other references, _or_ just because I_DONTCACHE
> > > isn't set and we keep caching the inode).
> > >
> > > BUT - fsnotify won't care. There won't be any fsnotify marks on that
> > > inode any more, and without a dentry that points to it, there's no way
> > > to add such marks.
> > >
> > > (A new dentry may be re-attached to such an inode, and then fsnotify
> > > could re-add new marks, but that doesn't change anything - the next
> > > time the dentry is detached, the marks would go away again).
> > >
> > > And yes, this changes the timing on when fsnotify events happen, but
> > > what I'm actually hoping for is that Jan will agree that it doesn't
> > > actually matter semantically.
> > >
> > > > > Then at umount time, the dentry shrinking will deal with all live
> > > > > dentries, and at most the fsnotify layer would send the FS_UNMOUNT to
> > > > > just the root dentry inodes?
> > > >
> > > > I don't think even that is necessary, because
> > > > shrink_dcache_for_umount() drops the sb->s_root dentry after
> > > > trimming the dentry tree. Hence the dcache drop would cleanup all
> > > > inode references, roots included.
> > >
> > > Ahh - even better.
> > >
> > > I didn't actually look very closely at the actual umount path, I was
> > > looking just at the fsnotify_inoderemove() place in
> > > dentry_unlink_inode() and went "couldn't we do _this_ instead?"
> > >
> > > > > Wouldn't that make things much cleaner, and remove at least *one* odd
> > > > > use of the nasty s_inodes list?
> > > >
> > > > Yes, it would, but someone who knows exactly when the fsnotify
> > > > marks can be removed needs to chime in here...
> > >
> > > Yup. Honza?
> > >
> > > (Aside: I don't actually know if you prefer Jan or Honza, so I use
> > > both randomly and interchangeably?)
> > >
> > > > > I have this feeling that maybe we can just remove the other users too
> > > > > using similar models. I think the LSM layer use (in landlock) is bogus
> > > > > for exactly the same reason - there's really no reason to keep things
> > > > > around for a random cached inode without a dentry.
> > > >
> > > > Perhaps, but I'm not sure what the landlock code is actually trying
> > > > to do.
> >
> > In Landlock, inodes (see landlock_object) may be referenced by several
> > rulesets, either tied to a task's cred or a ruleset's file descriptor.
> > A ruleset may outlive its referenced inodes, and this should not block
> > related umounts. security_sb_delete() is used to gracefully release
> > such references.
>
> Ah, there's the problem. The ruleset is persistent, not the inode.
> Like fsnotify, the life cycle and reference counting is upside down.
> The inode should cache the ruleset rather than the ruleset pinning
> the inode.
A ruleset needs to take a reference to the inode, as for an opened file,
and keep it "alive" as long as it may be re-used by user space (i.e. as
long as the superblock exists). One of the goals of a ruleset is to
identify inodes as long as they are accessible. When a sandboxed
process requests to open a file, the request is checked against the
inodes referenced by its sandbox's ruleset (in a nutshell).
In practice, a ruleset references a set of struct landlock_object, each
of which references either an inode or nothing (if the inode vanished).
There is at most one landlock_object per inode. This makes it possible
to have a dynamic N:M mapping between rulesets and inodes, which enables
a ruleset to be deleted before its referenced inodes, or the other way
around.
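To make the lifetime rules concrete, here is a minimal user-space sketch
of that indirection. It is not the kernel code: every model_* name is
hypothetical, the plain int counters stand in for the real reference
counts, and there is no locking or RCU. It only shows that the link
between object and inode can be broken from either side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct model_object;

struct model_inode {
	struct model_object *object;	/* at most one object per inode, or NULL */
	int refs;			/* stands in for the VFS inode refcount */
};

struct model_object {
	struct model_inode *inode;	/* NULL once the inode has vanished */
	int usage;			/* number of ruleset references */
};

/* Return the inode's single object, creating it and pinning the inode on first use. */
struct model_object *model_get_object(struct model_inode *inode)
{
	struct model_object *obj = inode->object;

	if (!obj) {
		obj = calloc(1, sizeof(*obj));
		if (!obj)
			return NULL;
		obj->inode = inode;
		inode->object = obj;
		inode->refs++;		/* the object holds one inode reference */
	}
	obj->usage++;
	return obj;
}

/* Inode side goes first (e.g. at umount): break the link, drop the pin. */
void model_release_inode(struct model_inode *inode)
{
	struct model_object *obj = inode->object;

	if (!obj)
		return;
	obj->inode = NULL;
	inode->object = NULL;
	inode->refs--;
}

/* Ruleset side: drop one reference; the last one frees the object. Returns 1 if freed. */
int model_put_object(struct model_object *obj)
{
	assert(obj->usage > 0);
	if (--obj->usage > 0)
		return 0;
	if (obj->inode)
		model_release_inode(obj->inode);
	free(obj);
	return 1;
}
```

In this model, two rulesets calling model_get_object() on the same inode
share one object and pin the inode once. Calling model_release_inode()
lets the inode go while both rulesets still hold a (now empty) object.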
>
> See my reply to Jan about fsnotify.
>
> > > Yeah, I wouldn't be surprised if it's just confused - it's very odd.
> > >
> > > But I'd be perfectly happy just removing one use at a time - even if
> > > we keep the s_inodes list around because of other users, it would
> > > still be "one less thing".
> > >
> > > > Hence, to me, the lifecycle and reference counting of inode related
> > > > objects in landlock doesn't seem quite right, and the use of the
> > > > security_sb_delete() callout appears to be papering over an internal
> > > > lifecycle issue.
> > > >
> > > > I'd love to get rid of it altogether.
> >
> > I'm not sure to fully understand the implications for now, but it would
> > definitely be good to simplify this lifetime management. The only
> > requirement for Landlock is that inodes references should live as long
> > as the related inodes are accessible by user space or already in use.
> > The sooner these references are removed from related ruleset, the
> > better.
>
> I'm missing something. Inodes are accessible to users even when
> they are not in cache - we just read them from disk and instantiate
> a new VFS inode.
>
> So how do you attach the correct ruleset to a newly instantiated
> inode?
We can see a Landlock ruleset as a set of weakly opened files/inodes. A
Landlock ruleset calls iget() to keep the related VFS inodes alive, which
means that when user space opens a file pointing to the same inode, the
same VFS inode will be re-used and then we can match it against a ruleset.
>
> i.e. If you can find the ruleset for any given inode that is brought
> into cache (e.g. opening an existing, uncached file), then why do
> you need to take inode references so they are never evicted?
A landlock_object only keeps a reference to an inode, not to the rulesets
pointing to it:
* inode -> 1 landlock_object or NULL
* landlock_object -> 1 inode or NULL
* ruleset -> N landlock_object
There are mainly two different operations:
1. Match 1 inode against a set of N inode references (i.e. a ruleset).
2. Drop the references of N rulesets (in practice 1 intermediate
landlock_object) pointing to 1 inode.
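The two operations can be sketched as follows. Again this is only an
illustration under assumptions: the model_* names are hypothetical, a
linear scan over an array stands in for whatever lookup structure the
real code uses, and reference counting and locking are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct model_inode;

struct model_object {
	struct model_inode *inode;	/* 1 inode or NULL */
};

struct model_inode {
	struct model_object *object;	/* 1 object or NULL */
};

struct model_ruleset {
	struct model_object **objects;	/* N object references */
	size_t count;
};

/* Operation 1: match one inode against the N references of a ruleset. */
bool model_ruleset_matches(const struct model_ruleset *rs,
			   const struct model_inode *inode)
{
	if (!inode->object)
		return false;	/* no object, so no ruleset can reference it */
	for (size_t i = 0; i < rs->count; i++)
		if (rs->objects[i] == inode->object)
			return true;
	return false;
}

/*
 * Operation 2: drop what N rulesets hold on one inode. Only the single
 * intermediate object is touched; the rulesets themselves are not walked.
 */
void model_drop_inode(struct model_inode *inode)
{
	if (!inode->object)
		return;
	inode->object->inode = NULL;
	inode->object = NULL;
}
```

The point of the intermediate object is visible in model_drop_inode():
its cost does not depend on how many rulesets reference the inode, and
afterwards none of them matches it any more.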
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>