All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Wedson Almeida Filho <wedsonaf@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Benno Lossin <benno.lossin@proton.me>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Wedson Almeida Filho <walmeida@microsoft.com>,
	Marco Elver <elver@google.com>
Subject: Re: [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root`
Date: Mon, 30 Oct 2023 13:29:47 +1100	[thread overview]
Message-ID: <ZT8VG0MTKNNpGDOC@dread.disaster.area> (raw)
In-Reply-To: <CANeycqq49Ubj-3BKcUaMOKeEwFastZzC17z_uk_VE3RDDv2wfw@mail.gmail.com>

On Mon, Oct 23, 2023 at 09:55:08AM -0300, Wedson Almeida Filho wrote:
> On Mon, 23 Oct 2023 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Oct 21, 2023 at 12:33:57PM -0700, Boqun Feng wrote:
> > > On Sat, Oct 21, 2023 at 06:01:02PM +0100, Matthew Wilcox wrote:
> > > > I'm only an expert on the page cache, not the rest of the VFS.  So
> > > > what are the rules around modifying i_state for the VFS?
> > >
> > > Agreed, same question here.
> >
> > inode->i_state should only be modified under inode->i_lock.
> >
> > And in most situations, you have to hold the inode->i_lock to read
> > state flags as well so that reads are serialised against
> > modifications which are typically non-atomic RMW operations.
> >
> > There is, I think, one main exception to read side locking and this
> > is find_inode_rcu() which does an unlocked check for I_WILL_FREE |
> > I_FREEING. In this case, the inode->i_state updates in iput_final()
> > use WRITE_ONCE under the inode->i_lock to provide the necessary
> > semantics for the unlocked READ_ONCE() done under rcu_read_lock().
> >
> > IOWs, if you follow the general rule that any inode->i_state access
> > (read or write) needs to hold inode->i_lock, you probably won't
> > screw up.
> 
> I don't see filesystems doing this though. In particular, see
> iget_locked() -- if a new inode is returned, then it is locked, but if
> a cached one is found, it's not locked.

I did say "if you follow the general rule".

And where there is a "general rule" there is the implication that
there are special cases where the "general rule" doesn't get
applied, yes? :)

I_NEW is the exception to the general rule, and very few people
writing filesystems actually know about it let alone care about
it...

> So we're in this situation where a returned inode may or may not be
> locked. And the way to determine if it's locked or not is to read
> i_state.
> 
> Here are examples of kernfs, ext2, ext4 and squashfs doing it:
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/kernfs/inode.c#L252
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext2/inode.c#L1392
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/ext4/inode.c#L4707
> https://elixir.bootlin.com/linux/v6.6-rc7/source/fs/squashfs/inode.c#L82
> 
> They all call iget_locked(), and if I_NEW is set, they initialise the
> inode and unlock it with unlock_new_inode(); otherwise they just
> return the unlocked inode.

All of them are perfectly fine.

I_NEW is the bit we use to synchronise inode initialisation - we
have to ensure there is only a single initialisation running while
there are concurrent lookups that can find the inode whilst it is
being initialised. We cannot hold a spin lock over inode
initialisation (it may have to do IO!), so we set the I_NEW flag
under the i_lock and the inode_hash_lock during hash insertion so
that they are set atomically from the hash lookup POV. If the inode
is then found in cache, wait_on_inode() does the serialisation
against the running initialisation indicated by the __I_NEW bit in
the i_state word.

Hence if the caller of iget_locked() ever sees I_NEW, it is
guaranteed to have exclusive access to the inode and -must- first
initialise the inode and then call unlock_new_inode() when it has
completed. It doesn't need to hold inode->i_lock in this case
because there's nothing it needs to serialise against as
iget_locked() has already done all that work.

If the inode is found in cache by iget_locked, then the
wait_on_inode() call is guaranteed to ensure that I_NEW is not set
when it returns. The atomic bit operations on __I_NEW and the memory
barriers in unlock_new_inode() plays an important part in this
dance, and they guarantee that I_NEW has been cleared before
iget_locked() returns. No need for inode->i_lock to be held in this
case, either, because iget_locked() did all the serialisation for
us.

This special dance is an optimisation that avoids the need to take
inode->i_lock in the inode lookup fast path just to check I_NEW. It
is an exception to the general rule but internal it uses
inode->i_lock in the places it is needed to ensure anything using
the general rule about accessing i_state still behaves correctly.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2023-10-30  2:29 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-18 12:24 [RFC PATCH 00/19] Rust abstractions for VFS Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 01/19] rust: fs: add registration/unregistration of file systems Wedson Almeida Filho
2023-10-18 15:38   ` Benno Lossin
2024-01-10 18:32     ` Wedson Almeida Filho
2024-01-25  9:15       ` Benno Lossin
2023-10-18 12:25 ` [RFC PATCH 02/19] rust: fs: introduce the `module_fs` macro Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 03/19] samples: rust: add initial ro file system sample Wedson Almeida Filho
2023-10-28 16:18   ` Alice Ryhl
2024-01-10 18:25     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 04/19] rust: fs: introduce `FileSystem::super_params` Wedson Almeida Filho
2023-10-18 16:34   ` Benno Lossin
2023-10-28 16:39     ` Alice Ryhl
2023-10-30  8:21       ` Benno Lossin
2023-10-30 21:36         ` Alice Ryhl
2023-10-20 15:04   ` Ariel Miculas (amiculas)
2024-01-03 12:25   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 05/19] rust: fs: introduce `INode<T>` Wedson Almeida Filho
2023-10-28 18:00   ` Alice Ryhl
2024-01-03 12:45     ` Andreas Hindborg (Samsung)
2024-01-03 12:54   ` Andreas Hindborg (Samsung)
2024-01-04  5:20     ` Darrick J. Wong
2024-01-04  9:57       ` Andreas Hindborg (Samsung)
2024-01-10  9:45     ` Benno Lossin
2024-01-04  5:14   ` Darrick J. Wong
2024-01-24 18:17     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 06/19] rust: fs: introduce `FileSystem::init_root` Wedson Almeida Filho
2023-10-19 14:30   ` Benno Lossin
2023-10-20  0:52     ` Boqun Feng
2023-10-21 13:48       ` Benno Lossin
2023-10-21 15:57         ` Boqun Feng
2023-10-21 17:01           ` Matthew Wilcox
2023-10-21 19:33             ` Boqun Feng
2023-10-23  5:29               ` Dave Chinner
2023-10-23 12:55                 ` Wedson Almeida Filho
2023-10-30  2:29                   ` Dave Chinner [this message]
2023-10-31 20:49                     ` Wedson Almeida Filho
2023-11-08  4:54                       ` Dave Chinner
2023-11-08  6:15                         ` Wedson Almeida Filho
2023-10-20  0:30   ` Boqun Feng
2023-10-23 12:36     ` Wedson Almeida Filho
2024-01-03 13:29   ` Andreas Hindborg (Samsung)
2024-01-24  4:07     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 07/19] rust: fs: introduce `FileSystem::read_dir` Wedson Almeida Filho
2023-10-21  8:33   ` Benno Lossin
2024-01-03 14:09   ` Andreas Hindborg (Samsung)
2024-01-21 21:00   ` Askar Safin
2024-01-21 21:51     ` Dave Chinner
2023-10-18 12:25 ` [RFC PATCH 08/19] rust: fs: introduce `FileSystem::lookup` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 09/19] rust: folio: introduce basic support for folios Wedson Almeida Filho
2023-10-18 17:17   ` Matthew Wilcox
2023-10-18 18:32     ` Wedson Almeida Filho
2023-10-18 19:21       ` Matthew Wilcox
2023-10-19 13:25         ` Wedson Almeida Filho
2023-10-20  4:11           ` Matthew Wilcox
2023-10-20 15:17             ` Matthew Wilcox
2023-10-23 12:32               ` Wedson Almeida Filho
2023-10-23 10:48             ` Andreas Hindborg (Samsung)
2023-10-23 14:28               ` Matthew Wilcox
2023-10-24 15:04                 ` Ariel Miculas (amiculas)
2023-10-23 12:29             ` Wedson Almeida Filho
2023-10-21  9:21   ` Benno Lossin
2023-10-18 12:25 ` [RFC PATCH 10/19] rust: fs: introduce `FileSystem::read_folio` Wedson Almeida Filho
2023-11-07 22:18   ` Matthew Wilcox
2023-11-07 22:22     ` Al Viro
2023-11-08  0:35       ` Wedson Almeida Filho
2023-11-08  0:56         ` Al Viro
2023-11-08  2:39           ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 11/19] rust: fs: introduce `FileSystem::read_xattr` Wedson Almeida Filho
2023-10-18 13:06   ` Ariel Miculas (amiculas)
2023-10-19 13:35     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 12/19] rust: fs: introduce `FileSystem::statfs` Wedson Almeida Filho
2024-01-03 14:13   ` Andreas Hindborg (Samsung)
2024-01-04  5:33   ` Darrick J. Wong
2024-01-24  4:24     ` Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 13/19] rust: fs: introduce more inode types Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 14/19] rust: fs: add per-superblock data Wedson Almeida Filho
2023-10-25 15:51   ` Ariel Miculas (amiculas)
2023-10-26 13:46   ` Ariel Miculas (amiculas)
2024-01-03 14:16   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 15/19] rust: fs: add basic support for fs buffer heads Wedson Almeida Filho
2024-01-03 14:17   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 16/19] rust: fs: allow file systems backed by a block device Wedson Almeida Filho
2023-10-21 13:39   ` Benno Lossin
2024-01-24  4:14     ` Wedson Almeida Filho
2024-01-03 14:38   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 17/19] rust: fs: allow per-inode data Wedson Almeida Filho
2023-10-21 13:57   ` Benno Lossin
2024-01-03 14:39   ` Andreas Hindborg (Samsung)
2023-10-18 12:25 ` [RFC PATCH 18/19] rust: fs: export file type from mode constants Wedson Almeida Filho
2023-10-18 12:25 ` [RFC PATCH 19/19] tarfs: introduce tar fs Wedson Almeida Filho
2023-10-18 16:57   ` Matthew Wilcox
2023-10-18 17:05     ` Wedson Almeida Filho
2023-10-18 17:20       ` Matthew Wilcox
2023-10-18 18:07         ` Wedson Almeida Filho
2024-01-24  5:05   ` Matthew Wilcox
2024-01-24  5:23     ` Matthew Wilcox
2024-01-24 18:26       ` Wedson Almeida Filho
2024-01-24 21:05         ` Dave Chinner
2024-01-24 21:28           ` Matthew Wilcox
2024-01-24  5:34     ` Gao Xiang
2023-10-18 13:40 ` [RFC PATCH 00/19] Rust abstractions for VFS Ariel Miculas (amiculas)
2023-10-18 17:12   ` Wedson Almeida Filho
2023-10-29 20:31 ` Matthew Wilcox
2023-10-31 20:14   ` Wedson Almeida Filho
2024-01-03 18:02     ` Matthew Wilcox
2024-01-03 19:04       ` Wedson Almeida Filho
2024-01-03 19:53         ` Al Viro
2024-01-03 20:38           ` Kent Overstreet
2024-01-04  1:49         ` Matthew Wilcox
2024-01-09 18:25           ` Wedson Almeida Filho
2024-01-09 19:30             ` Matthew Wilcox
2024-01-03 19:14       ` Kent Overstreet
2024-01-03 20:41         ` Al Viro
2024-01-09 19:13           ` Wedson Almeida Filho
2024-01-09 19:25             ` Matthew Wilcox
2024-01-09 19:32               ` Greg Kroah-Hartman
2024-01-10  7:49                 ` Wedson Almeida Filho
2024-01-10  7:57                   ` Greg Kroah-Hartman
2024-01-10 12:56                   ` Matthew Wilcox
2024-01-09 22:19               ` Dave Chinner
2024-01-10 19:19                 ` Kent Overstreet
2024-01-24 13:08                   ` FUJITA Tomonori
2024-01-24 19:49                     ` Kent Overstreet
2024-01-05  0:04         ` David Howells
2024-01-05 15:54           ` Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZT8VG0MTKNNpGDOC@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=benno.lossin@proton.me \
    --cc=boqun.feng@gmail.com \
    --cc=brauner@kernel.org \
    --cc=elver@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walmeida@microsoft.com \
    --cc=wedsonaf@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.