linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Mickaël Salaün" <mic@digikod.net>
To: Dave Chinner <david@fromorbit.com>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>,
	"Jan Kara" <jack@suse.cz>,
	"Christoph Hellwig" <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-bcachefs@vger.kernel.org, kent.overstreet@linux.dev,
	"Jann Horn" <jannh@google.com>, "Serge Hallyn" <serge@hallyn.com>,
	"Kees Cook" <keescook@chromium.org>,
	linux-security-module@vger.kernel.org,
	"Amir Goldstein" <amir73il@gmail.com>,
	"Günther Noack" <gnoack@google.com>,
	"Christian Brauner" <brauner@kernel.org>
Subject: Re: lsm sb_delete hook, was Re: [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes()
Date: Wed, 9 Oct 2024 11:23:06 +0200	[thread overview]
Message-ID: <20241009.ahqu6AeW3cow@digikod.net> (raw)
In-Reply-To: <ZwXMdqxz5PWNjW3C@dread.disaster.area>

On Wed, Oct 09, 2024 at 11:21:10AM +1100, Dave Chinner wrote:
> On Tue, Oct 08, 2024 at 02:59:07PM +0200, Mickaël Salaün wrote:
> > On Mon, Oct 07, 2024 at 05:28:57PM -0700, Linus Torvalds wrote:
> > > On Mon, 7 Oct 2024 at 16:33, Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > There may be other inode references being held that make
> > > > the inode live longer than the dentry cache. When should the
> > > > fsnotify marks be removed from the inode in that case? Do they need
> > > > to remain until, e.g, writeback completes?
> > > 
> > > Note that my idea is to just remove the fsnotify marks when the dentry
> > > discards the inode.
> > > 
> > > That means that yes, the inode may still have a lifetime after the
> > > dentry (because of other references, _or_ just because I_DONTCACHE
> > > isn't set and we keep caching the inode).
> > > 
> > > BUT - fsnotify won't care. There won't be any fsnotify marks on that
> > > inode any more, and without a dentry that points to it, there's no way
> > > to add such marks.
> > > 
> > > (A new dentry may be re-attached to such an inode, and then fsnotify
> > > could re-add new marks, but that doesn't change anything - the next
> > > time the dentry is detached, the marks would go away again).
> > > 
> > > And yes, this changes the timing on when fsnotify events happen, but
> > > what I'm actually hoping for is that Jan will agree that it doesn't
> > > actually matter semantically.
> > > 
> > > > > Then at umount time, the dentry shrinking will deal with all live
> > > > > dentries, and at most the fsnotify layer would send the FS_UNMOUNT to
> > > > > just the root dentry inodes?
> > > >
> > > > I don't think even that is necessary, because
> > > > shrink_dcache_for_umount() drops the sb->s_root dentry after
> > > > trimming the dentry tree. Hence the dcache drop would cleanup all
> > > > inode references, roots included.
> > > 
> > > Ahh - even better.
> > > 
> > > I didn't actually look very closely at the actual umount path, I was
> > > looking just at the fsnotify_inoderemove() place in
> > > dentry_unlink_inode() and went "couldn't we do _this_ instead?"
> > > 
> > > > > Wouldn't that make things much cleaner, and remove at least *one* odd
> > > > > use of the nasty s_inodes list?
> > > >
> > > > Yes, it would, but someone who knows exactly when the fsnotify
> > > > marks can be removed needs to chime in here...
> > > 
> > > Yup. Honza?
> > > 
> > > (Aside: I don't actually know if you prefer Jan or Honza, so I use
> > > both randomly and interchangeably?)
> > > 
> > > > > I have this feeling that maybe we can just remove the other users too
> > > > > using similar models. I think the LSM layer use (in landlock) is bogus
> > > > > for exactly the same reason - there's really no reason to keep things
> > > > > around for a random cached inode without a dentry.
> > > >
> > > > Perhaps, but I'm not sure what the landlock code is actually trying
> > > > to do.
> > 
> > In Landlock, inodes (see landlock_object) may be referenced by several
> > rulesets, either tied to a task's cred or a ruleset's file descriptor.
> > A ruleset may outlive its referenced inodes, and this should not block
> > related umounts.  security_sb_delete() is used to gracefully release
> > such references.
> 
> Ah, there's the problem. The ruleset is persistent, not the inode.
> Like fsnotify, the life cycle and reference counting is upside down.
> The inode should cache the ruleset rather than the ruleset pinning
> the inode.

A ruleset needs to takes a reference to the inode as for an opened file
and keep it "alive" as long as it may be re-used by user space (i.e. as
long as the superblock exists).  One of the goal of a ruleset is to
identify inodes as long as they are accessible.  When a sandboxed
process request to open a file, its sandbox's ruleset checks against the
referenced inodes (in a nutshell).

In practice, rulesets reference a set of struct landlock_object which
references an inode or not (if it vanished).  There is only one
landlock_object referenced per inode.  This makes it possible to have a
dynamic N:M mapping between rulesets and inodes which enables a ruleset
to be deleted before its referenced inodes, or the other way around.

> 
> See my reply to Jan about fsnotify.
> 
> > > Yeah, I wouldn't be surprised if it's just confused - it's very odd.
> > > 
> > > But I'd be perfectly happy just removing one use at a time - even if
> > > we keep the s_inodes list around because of other users, it would
> > > still be "one less thing".
> > > 
> > > > Hence, to me, the lifecycle and reference counting of inode related
> > > > objects in landlock doesn't seem quite right, and the use of the
> > > > security_sb_delete() callout appears to be papering over an internal
> > > > lifecycle issue.
> > > >
> > > > I'd love to get rid of it altogether.
> > 
> > I'm not sure to fully understand the implications for now, but it would
> > definitely be good to simplify this lifetime management.  The only
> > requirement for Landlock is that inodes references should live as long
> > as the related inodes are accessible by user space or already in use.
> > The sooner these references are removed from related ruleset, the
> > better.
> 
> I'm missing something.  Inodes are accessible to users even when
> they are not in cache - we just read them from disk and instantiate
> a new VFS inode.
> 
> So how do you attach the correct ruleset to a newly instantiated
> inode?

We can see a Landlock ruleset as a set of weakly opened files/inodes.  A
Landolck ruleset call iget() to keep the related VFS inodes alive, which
means that when user space opens a file pointing to the same inode, the
same VFS inode will be re-used and then we can match it against a ruleset.

> 
> i.e. If you can find the ruleset for any given inode that is brought
> into cache (e.g. opening an existing, uncached file), then why do
> you need to take inode references so they are never evicted?

A landlock_object only keep a reference to an inode, not to the rulesets
pointing to it:
* inode -> 1 landlock_object or NULL
* landlock_object -> 1 inode or NULL
* ruleset -> N landlock_object

There are mainly two different operations:
1. Match 1 inode against a set of N inode references (i.e. a ruleset).
2. Drop the references of N rulesets (in practice 1 intermediate
   landlock_object) pointing to 1 inode.

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

  reply	other threads:[~2024-10-09  9:29 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-02  1:33 [RFC PATCH 0/7] vfs: improving inode cache iteration scalability Dave Chinner
2024-10-02  1:33 ` [PATCH 1/7] vfs: replace invalidate_inodes() with evict_inodes() Dave Chinner
2024-10-03  7:07   ` Christoph Hellwig
2024-10-03  9:20   ` Jan Kara
2024-10-02  1:33 ` [PATCH 2/7] vfs: add inode iteration superblock method Dave Chinner
2024-10-03  7:12   ` Christoph Hellwig
2024-10-03 10:35     ` Dave Chinner
2024-10-04  9:53   ` kernel test robot
2024-10-02  1:33 ` [PATCH 3/7] vfs: convert vfs inode iterators to super_iter_inodes_unsafe() Dave Chinner
2024-10-03  7:14   ` Christoph Hellwig
2024-10-03 10:45     ` Dave Chinner
2024-10-04 10:55   ` kernel test robot
2024-10-02  1:33 ` [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes() Dave Chinner
2024-10-03  7:23   ` lsm sb_delete hook, was " Christoph Hellwig
2024-10-03  7:38     ` Christoph Hellwig
2024-10-03 11:57       ` Jan Kara
2024-10-03 12:11         ` Christoph Hellwig
2024-10-03 12:26           ` Jan Kara
2024-10-03 12:39             ` Christoph Hellwig
2024-10-03 12:56               ` Jan Kara
2024-10-03 13:04                 ` Christoph Hellwig
2024-10-03 13:59                 ` Dave Chinner
2024-10-03 16:17                   ` Jan Kara
2024-10-04  0:46                     ` Dave Chinner
2024-10-04  7:21                       ` Christian Brauner
2024-10-04 12:14                         ` Christoph Hellwig
2024-10-04 13:49                           ` Jan Kara
2024-10-04 18:15                             ` Paul Moore
2024-10-04 22:57                         ` Dave Chinner
2024-10-05 15:21                           ` Mickaël Salaün
2024-10-05 16:03                             ` Mickaël Salaün
2024-10-05 16:03                             ` Paul Moore
2024-10-07 20:37         ` Linus Torvalds
2024-10-07 23:33           ` Dave Chinner
2024-10-08  0:28             ` Linus Torvalds
2024-10-08  0:54               ` Linus Torvalds
2024-10-09  9:49                 ` Jan Kara
2024-10-08 12:59               ` Mickaël Salaün
2024-10-09  0:21                 ` Dave Chinner
2024-10-09  9:23                   ` Mickaël Salaün [this message]
2024-10-08  8:57             ` Amir Goldstein
2024-10-08 11:23               ` Jan Kara
2024-10-08 12:16                 ` Christian Brauner
2024-10-09  0:03                   ` Dave Chinner
2024-10-08 23:44                 ` Dave Chinner
2024-10-09  6:10                   ` Amir Goldstein
2024-10-09 14:18                   ` Jan Kara
2024-10-02  1:33 ` [PATCH 5/7] vfs: add inode iteration superblock method Dave Chinner
2024-10-03  7:24   ` Christoph Hellwig
2024-10-02  1:33 ` [PATCH 6/7] xfs: implement sb->iter_vfs_inodes Dave Chinner
2024-10-03  7:30   ` Christoph Hellwig
2024-10-02  1:33 ` [PATCH 7/7] bcachefs: " Dave Chinner
2024-10-02 10:00 ` [RFC PATCH 0/7] vfs: improving inode cache iteration scalability Christian Brauner
2024-10-02 12:34   ` Dave Chinner
2024-10-02 19:29     ` Kent Overstreet
2024-10-02 22:23       ` Dave Chinner
2024-10-02 23:20         ` Kent Overstreet
2024-10-03  1:41           ` Dave Chinner
2024-10-03  2:24             ` Kent Overstreet
2024-10-03  9:17             ` Jan Kara
2024-10-03  9:59               ` Dave Chinner
2024-10-02 19:49     ` Linus Torvalds
2024-10-02 20:28       ` Kent Overstreet
2024-10-02 23:17         ` Dave Chinner
2024-10-03  1:22           ` Kent Overstreet
2024-10-03  2:20             ` Dave Chinner
2024-10-03  2:42               ` Kent Overstreet
2024-10-03 11:45 ` Jan Kara
2024-10-03 12:18   ` Christoph Hellwig
2024-10-03 12:46     ` Jan Kara
2024-10-03 13:35       ` Dave Chinner
2024-10-03 13:03   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241009.ahqu6AeW3cow@digikod.net \
    --to=mic@digikod.net \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=david@fromorbit.com \
    --cc=gnoack@google.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=serge@hallyn.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).