From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()
Date: Wed, 1 May 2019 02:59:05 +0100 [thread overview]
Message-ID: <20190501015904.GP23075@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190430040043.GH23075@ZenIV.linux.org.uk>
On Tue, Apr 30, 2019 at 05:00:43AM +0100, Al Viro wrote:
> Where would you put that synchronize_rcu()? Doing that before ->put_super()
> is too early - inode references might be dropped in there. OTOH, doing
> that after that point means that while struct super_block itself will be
> there, any number of data structures hanging from it might be not.
>
> So we are still very limited in what we can do inside ->free_inode()
> instance *and* we get bunch of synchronize_rcu() for no good reason.
>
> Note that for normal lockless accesses (lockless ->d_revalidate(), ->d_hash(),
> etc.) we are just fine with having struct super_block freeing RCU-delayed
> (along with any data structures we might need) - the superblock had
> been seen at some point after we'd taken rcu_read_lock(), so its
> freeing won't happen until we drop it. So we don't need synchronize_rcu()
> for that.
>
> Here the problem is that we are dealing with another RCU callback;
> synchronize_rcu() would be needed for it, but it will only protect that
> intermediate dereference of ->i_sb; any rcu-delayed stuff scheduled
> from inside ->put_super() would not be ordered wrt ->free_inode().
> And if we are doing that just for the sake of that one dereference,
> we might as well do it before scheduling i_callback().
>
> PS: we *are* guaranteed that module will still be there (unregister_filesystem()
> does synchronize_rcu() and rcu_barrier() is done before kmem_cache_destroy()
> in assorted exit_foo_fs()).
After playing with that for a while, I think that adding barriers on
superblock freeing (or shutdown) should wait, assuming we do them at
all.
Right now no ->free_inode() instances look at superblock or anything
associated with it; moreover, there's no good candidate code that
could be moved there and would benefit from such access. So we
don't have any material to see what could be useful to protect.
Access to ->i_sb->s_op->free_inode itself is the only exception and
moving that to before the rcu delay is both less invasive and a _lot_
more robust than playing with synchronize_rcu(). We can do that
without growing struct inode or storing it for long periods -
->i_fop is only accessed for struct inode with positive refcount,
so we can put that into anon union with the ->free_inode value,
setting it just before we schedule execution of i_callback()
(and before the direct call of the same in alloc_inode() failure
exit).
IMO the following is the sane incremental for the coming window purposes;
if we get a convincing case for ->free_inode() doing something that could
benefit from being ordered wrt parts of fs shutdown, we can always deal
with synchronize_rcu() later. Existing instances will be fine, and IMO
separating RCU-delayed parts of inode destruction from the rest is
worthwhile on its own.
Objections?
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 9d80f9e0855e..b8d3ddd8b8db 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -655,3 +655,11 @@ in your dentry operations instead.
* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
* combination of NULL ->destroy_inode and NULL ->free_inode is
treated as NULL/free_inode_nonrcu, to preserve the compatibility.
+
+ Note that the callback (be it via ->free_inode() or explicit call_rcu()
+ in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
+ as the matter of fact, the superblock and all associated structures
+ might be already gone. The filesystem driver is guaranteed to be still
+ there, but that's it. Freeing memory in the callback is fine; doing
+ more than that is possible, but requires a lot of care and is best
+ avoided.
diff --git a/fs/inode.c b/fs/inode.c
index fb45590d284e..627e1766503a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -211,8 +211,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
static void i_callback(struct rcu_head *head)
{
struct inode *inode = container_of(head, struct inode, i_rcu);
- if (inode->i_sb->s_op->free_inode)
- inode->i_sb->s_op->free_inode(inode);
+ if (inode->free_inode)
+ inode->free_inode(inode);
else
free_inode_nonrcu(inode);
}
@@ -236,6 +236,7 @@ static struct inode *alloc_inode(struct super_block *sb)
if (!ops->free_inode)
return NULL;
}
+ inode->free_inode = ops->free_inode;
i_callback(&inode->i_rcu);
return NULL;
}
@@ -276,6 +277,7 @@ static void destroy_inode(struct inode *inode)
if (!ops->free_inode)
return;
}
+ inode->free_inode = ops->free_inode;
call_rcu(&inode->i_rcu, i_callback);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2e9b9f87caca..92732286b748 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -694,7 +694,10 @@ struct inode {
#ifdef CONFIG_IMA
atomic_t i_readcount; /* struct files open RO */
#endif
- const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
+ union {
+ const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
+ void (*free_inode)(struct inode *);
+ };
struct file_lock_context *i_flctx;
struct address_space i_data;
struct list_head i_devices;
next prev parent reply other threads:[~2019-05-01 1:59 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-16 17:49 [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode() Al Viro
2019-04-16 17:52 ` [RFC PATCH 01/62] securityfs: fix use-after-free on symlink traversal Al Viro
2019-04-16 17:52 ` [RFC PATCH 02/62] apparmorfs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 03/62] new inode method: ->free_inode() Al Viro
2019-04-16 17:52 ` [RFC PATCH 04/62] spufs: switch to ->free_inode() Al Viro
2019-04-16 17:52 ` [RFC PATCH 05/62] erofs: " Al Viro
2019-04-18 14:01 ` Gao Xiang
2019-04-18 14:01 ` Gao Xiang
2019-04-16 17:52 ` [RFC PATCH 06/62] 9p: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 07/62] adfs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 08/62] affs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 09/62] befs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 10/62] bfs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 11/62] bdev: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 12/62] cifs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 13/62] debugfs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 14/62] efs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 15/62] ext2: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 16/62] f2fs: " Al Viro
2019-04-20 2:52 ` Chao Yu
2019-04-16 17:52 ` [RFC PATCH 17/62] fat: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 18/62] freevxfs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 19/62] gfs2: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 20/62] hfs: " Al Viro
2019-04-16 17:52 ` [RFC PATCH 21/62] hfsplus: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 22/62] hostfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 23/62] hpfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 24/62] isofs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 25/62] jffs2: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 26/62] minix: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 27/62] nfs{,4}: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 28/62] nilfs2: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 29/62] dlmfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 30/62] ocfs2: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 31/62] openpromfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 32/62] procfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 33/62] qnx4: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 34/62] qnx6: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 35/62] reiserfs: convert " Al Viro
2019-04-16 17:53 ` [RFC PATCH 36/62] romfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 37/62] squashfs: switch " Al Viro
2019-04-16 17:53 ` [RFC PATCH 38/62] ubifs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 39/62] udf: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 40/62] sysv: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 41/62] coda: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 42/62] ufs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 43/62] mqueue: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 44/62] bpf: " Al Viro
2019-04-16 18:07 ` Alexei Starovoitov
2019-04-16 21:34 ` Song Liu
2019-04-16 17:53 ` [RFC PATCH 45/62] rpcpipe: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 46/62] apparmor: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 47/62] securityfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 48/62] ntfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 49/62] dax: make use of ->free_inode() Al Viro
2019-04-18 12:16 ` Jan Kara
2019-04-18 16:58 ` Dan Williams
2019-04-16 17:53 ` [RFC PATCH 50/62] afs: switch to " Al Viro
2019-04-16 17:53 ` [RFC PATCH 51/62] btrfs: use ->free_inode() Al Viro
2019-04-16 17:53 ` [RFC PATCH 52/62] ceph: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 53/62] ecryptfs: make use of ->free_inode() Al Viro
2019-04-16 17:53 ` [RFC PATCH 54/62] ext4: " Al Viro
2019-04-18 12:10 ` Jan Kara
2019-04-16 17:53 ` [RFC PATCH 55/62] fuse: switch to ->free_inode() Al Viro
2019-04-16 17:53 ` [RFC PATCH 56/62] jfs: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 57/62] overlayfs: make use of ->free_inode() Al Viro
2019-04-16 17:53 ` [RFC PATCH 58/62] hugetlb: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 59/62] shmem: " Al Viro
2019-04-16 17:53 ` [RFC PATCH 60/62] orangefs: " Al Viro
2019-04-22 21:14 ` Mike Marshall
2019-04-22 21:56 ` Linus Torvalds
2019-04-22 23:10 ` Al Viro
2019-04-22 23:17 ` Mike Marshall
2019-04-16 17:53 ` [RFC PATCH 61/62] sockfs: switch to ->free_inode() Al Viro
2019-04-16 17:53 ` [RFC PATCH 62/62] coallocate socket->wq with socket itself Al Viro
2019-04-16 18:01 ` [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode() Linus Torvalds
2019-04-30 3:09 ` Al Viro
[not found] ` <CAHk-=wiMvCR0iENUVorfU-3EMC7G8RNSeHSQrz9tndP1uSg2BQ@mail.gmail.com>
2019-04-30 4:00 ` Al Viro
2019-05-01 1:59 ` Al Viro [this message]
2019-04-30 4:18 ` Andreas Dilger
2019-04-30 4:26 ` Al Viro
2019-04-30 5:26 ` Andreas Dilger
2019-04-17 15:55 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190501015904.GP23075@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.