Linux cgroups development
 help / color / mirror / Atom feed
From: Christian Brauner <brauner-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
	Heiko Carstens <hca-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>,
	Vasily Gorbik <gor-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>,
	Alexander Gordeev
	<agordeev-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>,
	Fenghua Yu <fenghua.yu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Reinette Chatre
	<reinette.chatre-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Miquel Raynal
	<miquel.raynal-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org>,
	Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org>,
	Vignesh Raghavendra <vigneshr-l0cyMroinI0@public.gmane.org>,
	Dennis Dalessandro
	<dennis.dalessandro-ntyVByD3zXaTtA8H5PvdGFaTQe2KTcn/@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Trond Myklebust
	<trond.myklebust-F/q8l9xzQnoyLce1RVWEUA@public.gmane.org>,
	Anna Schumaker <anna-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>,
	Damien Le Moal <dlemoal-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Naohiro Aota <naohiro.aota-Sjgp3cTcYWE@public.gmane.org>,
	Greg Kroah-Hartman
	<gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-s390-u79uwXL29TaqPxH82wqD4g@public.gmane.org
Subject: Re: [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super
Date: Fri, 15 Sep 2023 16:12:07 +0200	[thread overview]
Message-ID: <20230915-zweit-frech-0e06394208a3@brauner> (raw)
In-Reply-To: <20230915-elstern-etatplanung-906c6780af19@brauner>

> > tree of any filesystem (in-tree one or not) will have to go through the
> > changes and figure out WTF to do with their existing code.  We are
> > going to play whack-a-mole for at least several years as development
> > branches get rebased and merged.
> 
> Let me write something up.

So here I've written two porting.rst patches that aim to reflect the
current state of things (They do _not_ reflect what's in Christoph's
series here as that'ss again pretty separate and will require additional
spelling out.).

I'm adding explanation for both the old and new logic fwiw. I hope to
upstream these docs soon so we all have something to point to.

From 200666901f53db74edf309d48e3c74fd275a822a Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Fri, 15 Sep 2023 16:01:02 +0200
Subject: [PATCH 1/2] porting: document new block device opening order

Signed-off-by: Christian Brauner <brauner-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 Documentation/filesystems/porting.rst | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index deac4e973ddc..f436b64b77bf 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -949,3 +949,27 @@ mmap_lock held.  All in-tree users have been audited and do not seem to
 depend on the mmap_lock being held, but out of tree users should verify
 for themselves.  If they do need it, they can return VM_FAULT_RETRY to
 be called with the mmap_lock held.
+
+---
+
+**mandatory**
+
+The order of opening block devices and matching or creating superblocks has
+changed.
+
+The old logic opened block devices first and then tried to find a
+suitable superblock to reuse based on the block device pointer.
+
+The new logic finds or creates a superblock first, opening block devices
+afterwards. Since opening block devices cannot happen under s_umount because of
+lock ordering requirements s_umount is now dropped while opening block
+devices and reacquired before calling fill_super().
+
+In the old logic concurrent mounters would find the superblock on the list of
+active superblock for the filesystem type. Since the first opener of the block
+device would hold s_umount they would wait until the superblock became either
+born or died prematurely due to initialization failure.
+
+Since the new logic drops s_umount concurrent mounters could grab s_umount and
+would spin. Instead they are now made to wait using an explicit wait-wake
+mechanism without having to hold s_umount.
-- 
2.34.1

From 1f09898322b4402219d8d3219d399c9e56a76bae Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Fri, 15 Sep 2023 16:01:40 +0200
Subject: [PATCH 2/2] porting: document superblock as block device holder

Signed-off-by: Christian Brauner <brauner-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 Documentation/filesystems/porting.rst | 79 +++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index f436b64b77bf..fefefaf289b4 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -973,3 +973,82 @@ born or died prematurely due to initialization failure.
 Since the new logic drops s_umount concurrent mounters could grab s_umount and
 would spin. Instead they are now made to wait using an explicit wait-wake
 mechanism without having to hold s_umount.
+
+---
+
+**mandatory**
+
+The holder of a block device is now the superblock.
+
+The holder of a block device used to be the file_system_type which wasn't
+particularly useful. It wasn't possible to go from block device to owning
+superblock without matching on the device pointer stored in the superblock.
+This mechanism would only work for a single device so the block layer couldn't
+find the owning superblock associated with additional devices.
+
+In the old mechanism reusing or creating a superblock for racing mount(2) and
+umount(2) relied on the file_system_type as the holder. This was severly
+underdocumented however:
+
+(1) If the concurrent mount(2) managed to grab an active reference before the
+    umount(2) dropped the last active reference in deactivate_locked_super()
+    the mounter would simply reuse the existing superblock.
+
+(2) If the mounter came after deactivate_locked_super() but before
+    the superblock had been removed from the list of superblocks of the
+    filesystem type the mounter would wait until the superblock was shutdown
+    and allocated a new superblock.
+
+(3) If the mounter came after deactivate_locked_super() and after
+    the superblock had been removed from the list of superblocks of the
+    filesystem type the mounter would allocate a new superblock.
+
+Because the holder of the block device was the filesystem type any concurrent
+mounter could open the block device without risking seeing EBUSY because the
+block device was still in use.
+
+Making the superblock the owner of the block device changes this as the holder
+is now a unique superblock and not shared among all superblocks of the
+filesystem type. So a concurrent mounter in (2) could suddenly see EBUSY when
+trying to open a block device whose holder was a different superblock.
+
+The new logic thus waits until the superblock and the devices are shutdown in
+->kill_sb(). Removal of the superblock from the list of superblocks of the
+filesystem type is now moved to a later point when the devices are closed:
+
+(1) Any concurrent mounter managing to grab an active reference on an existing
+    superblock is made to wait until the superblock is either ready or until
+    the superblock and all devices are shutdown in ->kill_sb().
+
+(2) If the mounter came after deactivate_locked_super() but before
+    the superblock had been removed from the list of superblocks of the
+    filesystem type the mounter is made to wait until the superblock and the
+    devices are shut down in ->kill_sb() and the superblock is removed from the
+    list of superblocks of the filesystem type.
+
+(3) This case is now collapsed into (2) as the superblock is left on the list
+    of superblocks of the filesystem type until all devices are shutdown in
+    ->kill_sb().
+
+As this is a VFS level change it has no practical consequences for filesystems
+other than that all of them must use one of the provided kill_litter_super(),
+kill_anon_super(), or kill_block_super() helpers.
+
+Filesystems that reuse superblocks based on non-static keys such as
+sb->s_fs_info must ensure that these keys remain valid across kill_*_super()
+calls. The expected pattern is::
+
+	static struct file_system_type some_fs_type = {
+		.name 		= "somefs",
+		.kill_sb 	= some_fs_kill_sb,
+	};
+
+	static void some_fs_kill_sb(struct super_block *sb)
+	{
+		struct some_fs_info *info = sb->s_fs_info;
+
+		kill_*_super(sb);
+		kfree(info);
+	}
+
+It's best practice to never deviate from this pattern.
-- 
2.34.1


  reply	other threads:[~2023-09-15 14:12 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-13 11:09 split up ->kill_sb Christoph Hellwig
2023-09-13 11:09 ` [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super Christoph Hellwig
     [not found]   ` <20230913111013.77623-4-hch-jcswGhMUV9g@public.gmane.org>
2023-09-13 23:27     ` Al Viro
2023-09-14  2:37       ` Al Viro
2023-09-14  5:38         ` Al Viro
2023-09-14  7:56           ` Christian Brauner
2023-09-26  9:31             ` Christoph Hellwig
2023-09-26  9:31               ` Christoph Hellwig
2023-09-14 14:02           ` Christian Brauner
2023-09-14 16:58             ` Al Viro
2023-09-14 19:23               ` Al Viro
2023-09-15  7:40                 ` Christian Brauner
2023-09-15  9:44               ` Christian Brauner
2023-09-15 14:12                 ` Christian Brauner [this message]
2023-09-15 14:28                   ` Al Viro
2023-09-15 14:33                     ` Al Viro
2023-09-15 14:40                     ` Christian Brauner
2023-09-26  9:41           ` Christoph Hellwig
2023-09-26  9:41             ` Christoph Hellwig
2023-09-26  9:38       ` Christoph Hellwig
2023-09-26  9:38         ` Christoph Hellwig
2023-09-26 21:25         ` Al Viro
2023-09-27 22:29           ` Al Viro
2023-10-02  6:46           ` Christoph Hellwig
2023-10-09 21:57             ` Al Viro
2023-10-10  8:44               ` Christian Brauner
2023-10-17 19:50                 ` Al Viro
     [not found] ` <20230913111013.77623-1-hch-jcswGhMUV9g@public.gmane.org>
2023-09-13 11:09   ` [PATCH 01/19] fs: reflow deactivate_locked_super Christoph Hellwig
     [not found]     ` <20230913111013.77623-2-hch-jcswGhMUV9g@public.gmane.org>
2023-09-13 16:35       ` Christian Brauner
2023-09-26  9:24         ` Christoph Hellwig
2023-09-26  9:24           ` Christoph Hellwig
2023-09-13 11:09   ` [PATCH 02/19] fs: make ->kill_sb optional Christoph Hellwig
2023-09-13 11:09   ` [PATCH 04/19] NFS: remove the s_dev field from struct nfs_server Christoph Hellwig
2023-09-13 11:09   ` [PATCH 05/19] fs: assign an anon dev_t in common code Christoph Hellwig
     [not found]     ` <20230913111013.77623-6-hch-jcswGhMUV9g@public.gmane.org>
2023-09-14  0:34       ` Al Viro
2023-09-13 11:10   ` [PATCH 06/19] qibfs: use simple_release_fs Christoph Hellwig
     [not found]     ` <20230913111013.77623-7-hch-jcswGhMUV9g@public.gmane.org>
2023-09-18 11:41       ` Leon Romanovsky
2023-09-13 11:10   ` [PATCH 07/19] hypfs: use d_genocide to kill fs entries Christoph Hellwig
2023-09-13 11:10   ` [PATCH 08/19] pstore: shrink the pstore_sb_lock critical section in pstore_kill_sb Christoph Hellwig
     [not found]     ` <20230913111013.77623-9-hch-jcswGhMUV9g@public.gmane.org>
2023-09-13 22:07       ` Kees Cook
2023-09-13 11:10   ` [PATCH 09/19] zonefs: remove duplicate cleanup in zonefs_fill_super Christoph Hellwig
     [not found]     ` <20230913111013.77623-10-hch-jcswGhMUV9g@public.gmane.org>
2023-09-14  0:33       ` Damien Le Moal
2023-09-14  0:49       ` Al Viro
2023-09-13 11:10   ` [PATCH 10/19] USB: gadget/legacy: remove sb_mutex Christoph Hellwig
     [not found]     ` <20230913111013.77623-11-hch-jcswGhMUV9g@public.gmane.org>
2023-09-13 16:10       ` Alan Stern
     [not found]         ` <7f839be1-4898-41ad-8eda-10d5a0350bdf-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org>
2023-09-26  9:24           ` Christoph Hellwig
2023-09-26  9:24             ` Christoph Hellwig
2023-09-14 10:22     ` Sergey Shtylyov
2023-09-13 11:10   ` [PATCH 11/19] fs: add new shutdown_sb and free_sb methods Christoph Hellwig
     [not found]     ` <20230913111013.77623-12-hch-jcswGhMUV9g@public.gmane.org>
2023-09-14  2:07       ` Al Viro
2023-09-13 11:10   ` [PATCH 12/19] fs: convert kill_litter_super to litter_shutdown_sb Christoph Hellwig
     [not found]     ` <20230913111013.77623-13-hch-jcswGhMUV9g@public.gmane.org>
2023-09-13 22:07       ` Kees Cook
2023-09-13 11:10   ` [PATCH 13/19] fs: convert kill_block_super to block_free_sb Christoph Hellwig
     [not found]     ` <20230913111013.77623-14-hch-jcswGhMUV9g@public.gmane.org>
2023-09-14  2:29       ` Al Viro
2023-09-13 11:10   ` [PATCH 14/19] jffs2: convert to ->shutdown_sb and ->free_sb Christoph Hellwig
2023-09-13 11:10   ` [PATCH 15/19] kernfs: split ->kill_sb Christoph Hellwig
     [not found]     ` <20230913111013.77623-16-hch-jcswGhMUV9g@public.gmane.org>
2023-09-18 15:24       ` Michal Koutný
2023-09-13 11:10   ` [PATCH 16/19] x86/resctrl: release rdtgroup_mutex and the CPU hotplug lock in rdt_shutdown_sb Christoph Hellwig
2023-09-13 11:10   ` [PATCH 17/19] NFS: move nfs_kill_super to fs_context.c Christoph Hellwig
2023-09-13 11:10   ` [PATCH 18/19] fs: simple ->shutdown_sb and ->free_sb conversions Christoph Hellwig
2023-09-13 11:10   ` [PATCH 19/19] fs: remove ->kill_sb Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230915-zweit-frech-0e06394208a3@brauner \
    --to=brauner-dgejt+ai2ygdnm+yrofe0a@public.gmane.org \
    --cc=agordeev-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org \
    --cc=anna-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=dennis.dalessandro-ntyVByD3zXaTtA8H5PvdGFaTQe2KTcn/@public.gmane.org \
    --cc=dlemoal-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=fenghua.yu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=gor-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org \
    --cc=gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org \
    --cc=hca-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org \
    --cc=hch-jcswGhMUV9g@public.gmane.org \
    --cc=jack-AlSwsSmVLrQ@public.gmane.org \
    --cc=keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-s390-u79uwXL29TaqPxH82wqD4g@public.gmane.org \
    --cc=linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=miquel.raynal-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org \
    --cc=naohiro.aota-Sjgp3cTcYWE@public.gmane.org \
    --cc=reinette.chatre-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=richard-/L3Ra7n9ekc@public.gmane.org \
    --cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=trond.myklebust-F/q8l9xzQnoyLce1RVWEUA@public.gmane.org \
    --cc=vigneshr-l0cyMroinI0@public.gmane.org \
    --cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox