From: Al Viro <viro@zeniv.linux.org.uk>
To: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Fenghua Yu <fenghua.yu@intel.com>,
Reinette Chatre <reinette.chatre@intel.com>,
Miquel Raynal <miquel.raynal@bootlin.com>,
Richard Weinberger <richard@nod.at>,
Vignesh Raghavendra <vigneshr@ti.com>,
Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
Tejun Heo <tj@kernel.org>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Anna Schumaker <anna@kernel.org>,
Kees Cook <keescook@chromium.org>,
Damien Le Moal <dlemoal@kernel.org>,
Naohiro Aota <naohiro.aota@wdc.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-hardening@vger.kernel.org,
cgroups@vger.kernel.org, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super
Date: Thu, 14 Sep 2023 17:58:05 +0100 [thread overview]
Message-ID: <20230914165805.GJ800259@ZenIV> (raw)
In-Reply-To: <20230914-munkeln-pelzmantel-3e3a761acb72@brauner>
On Thu, Sep 14, 2023 at 04:02:25PM +0200, Christian Brauner wrote:
> Yes, you're right that making the superblock and not the filesytem type
> the bd_holder changes the logic and we are aware of that of course. And
> it requires changes such as moving additional block device closing from
> where some callers currently do it.
Details, please?
> But the filesytem type is not a very useful holder itself and has other
> drawbacks. The obvious one being that it requires us to wade through all
> superblocks on the system trying to find the superblock associated with
> a given block device continously grabbing and dropping sb_lock and
> s_umount. None of that is very pleasant nor elegant and it is for sure
> not very easy to understand (Plus, it's broken for btrfs freezing and
> syncing via block level ioctls.).
"Constantly" is a bit of a stretch - IIRC, we grabbed sb_lock once, then
went through the list comparing ->s_bdev (without any extra locking),
then bumped ->s_count on the found superblock, dropped sb_lock,
grabbed ->s_umount on the sucker and verified it's still alive.
Repeated grabbing of any lock happened only on a race with fs shutdown;
normal case is one spin_lock, one spin_unlock, one down_read().
Oh, well...
> Using the superblock as holder makes this go away and is overall a lot
> more useful and intuitive and can be extended to filesystems with
> multiple devices (Of which we apparently are bound to get more.).
>
> So I think this change is worth the pain.
>
> It's a fair point that these lifetime rules should be documented in
> Documentation/filesystems/. The old lifetime documentation is too sparse
> to be useful though.
What *are* these lifetime rules? Seriously, you have 3 chunks of
fs-dependent actions at the moment:
* the things needed to get rid of internal references pinning
inodes/dentries + whatever else we need done before generic_shutdown_super()
* the stuff to be done between generic_shutdown_super() and
making the sucker invisible to sget()/sget_fc()
* the stuff that must be done after we are sure that sget
callbacks won't be looking at this instance.
Note that Christoph's series has mashed (2) and (3) together, resulting
in UAF in a bunch of places. And I'm dead serious about
Documentation/filesystems/porting being the right place; any development
tree of any filesystem (in-tree one or not) will have to go through the
changes and figure out WTF to do with their existing code. We are
going to play whack-a-mole for at least several years as development
branches get rebased and merged.
Incidentally, I'm going to add a (belated by 10 years) chunk in porting.rst
re making sure that anything in superblock that might be needed by methods
called in RCU mode should *not* be freed without an RCU delay... Should've
done that back in 3.12 merge window when RCU'd vfsmounts went in; as it
is, today we have several filesystems with exact same kind of breakage.
hfsplus and affs breakage had been there in 3.13 (missed those two), exfat
and ntfs3 - introduced later, by initial merges of filesystems in question.
Missed on review...
Hell knows - perhaps Documentation/filesystems/whack-a-mole might be a good
idea...
next prev parent reply other threads:[~2023-09-14 16:58 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-13 11:09 split up ->kill_sb Christoph Hellwig
2023-09-13 11:09 ` [PATCH 01/19] fs: reflow deactivate_locked_super Christoph Hellwig
2023-09-13 16:35 ` Christian Brauner
2023-09-26 9:24 ` Christoph Hellwig
2023-09-13 11:09 ` [PATCH 02/19] fs: make ->kill_sb optional Christoph Hellwig
2023-09-13 11:09 ` [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super Christoph Hellwig
2023-09-13 23:27 ` Al Viro
2023-09-14 2:37 ` Al Viro
2023-09-14 5:38 ` Al Viro
2023-09-14 7:56 ` Christian Brauner
2023-09-26 9:31 ` Christoph Hellwig
2023-09-14 14:02 ` Christian Brauner
2023-09-14 16:58 ` Al Viro [this message]
2023-09-14 19:23 ` Al Viro
2023-09-15 7:40 ` Christian Brauner
2023-09-15 9:44 ` Christian Brauner
2023-09-15 14:12 ` Christian Brauner
2023-09-15 14:28 ` Al Viro
2023-09-15 14:33 ` Al Viro
2023-09-15 14:40 ` Christian Brauner
2023-09-26 9:41 ` Christoph Hellwig
2023-09-26 9:38 ` Christoph Hellwig
2023-09-26 21:25 ` Al Viro
2023-09-27 22:29 ` Al Viro
2023-10-02 6:46 ` Christoph Hellwig
2023-10-09 21:57 ` Al Viro
2023-10-10 8:44 ` Christian Brauner
2023-10-17 19:50 ` Al Viro
2023-09-13 11:09 ` [PATCH 04/19] NFS: remove the s_dev field from struct nfs_server Christoph Hellwig
2023-09-13 11:09 ` [PATCH 05/19] fs: assign an anon dev_t in common code Christoph Hellwig
2023-09-14 0:34 ` Al Viro
2023-09-13 11:10 ` [PATCH 06/19] qibfs: use simple_release_fs Christoph Hellwig
2023-09-18 11:41 ` Leon Romanovsky
2023-09-13 11:10 ` [PATCH 07/19] hypfs: use d_genocide to kill fs entries Christoph Hellwig
2023-09-13 11:10 ` [PATCH 08/19] pstore: shrink the pstore_sb_lock critical section in pstore_kill_sb Christoph Hellwig
2023-09-13 22:07 ` Kees Cook
2023-09-13 11:10 ` [PATCH 09/19] zonefs: remove duplicate cleanup in zonefs_fill_super Christoph Hellwig
2023-09-14 0:33 ` Damien Le Moal
2023-09-14 0:49 ` Al Viro
2023-09-13 11:10 ` [PATCH 10/19] USB: gadget/legacy: remove sb_mutex Christoph Hellwig
2023-09-13 16:10 ` Alan Stern
2023-09-26 9:24 ` Christoph Hellwig
2023-09-14 10:22 ` Sergey Shtylyov
2023-09-13 11:10 ` [PATCH 11/19] fs: add new shutdown_sb and free_sb methods Christoph Hellwig
2023-09-14 2:07 ` Al Viro
2023-09-13 11:10 ` [PATCH 12/19] fs: convert kill_litter_super to litter_shutdown_sb Christoph Hellwig
2023-09-13 22:07 ` Kees Cook
2023-09-13 11:10 ` [PATCH 13/19] fs: convert kill_block_super to block_free_sb Christoph Hellwig
2023-09-14 2:29 ` Al Viro
2023-09-13 11:10 ` [PATCH 14/19] jffs2: convert to ->shutdown_sb and ->free_sb Christoph Hellwig
2023-09-13 11:10 ` [PATCH 15/19] kernfs: split ->kill_sb Christoph Hellwig
2023-09-18 15:24 ` Michal Koutný
2023-09-13 11:10 ` [PATCH 16/19] x86/resctrl: release rdtgroup_mutex and the CPU hotplug lock in rdt_shutdown_sb Christoph Hellwig
2023-09-13 11:10 ` [PATCH 17/19] NFS: move nfs_kill_super to fs_context.c Christoph Hellwig
2023-09-13 11:10 ` [PATCH 18/19] fs: simple ->shutdown_sb and ->free_sb conversions Christoph Hellwig
2023-09-13 11:10 ` [PATCH 19/19] fs: remove ->kill_sb Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230914165805.GJ800259@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=agordeev@linux.ibm.com \
--cc=anna@kernel.org \
--cc=brauner@kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=dennis.dalessandro@cornelisnetworks.com \
--cc=dlemoal@kernel.org \
--cc=fenghua.yu@intel.com \
--cc=gor@linux.ibm.com \
--cc=gregkh@linuxfoundation.org \
--cc=hca@linux.ibm.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=keescook@chromium.org \
--cc=linux-hardening@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
--cc=miquel.raynal@bootlin.com \
--cc=naohiro.aota@wdc.com \
--cc=reinette.chatre@intel.com \
--cc=richard@nod.at \
--cc=tj@kernel.org \
--cc=trond.myklebust@hammerspace.com \
--cc=vigneshr@ti.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).