From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Jan Kara <jack@suse.cz>, Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Stephen Brennan <stephen.s.brennan@oracle.com>
Subject: [PATCH 1/2] fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem
Date: Mon, 17 Oct 2022 21:12:32 -0700 [thread overview]
Message-ID: <20221018041233.376977-2-stephen.s.brennan@oracle.com> (raw)
In-Reply-To: <20221018041233.376977-1-stephen.s.brennan@oracle.com>
When an inode is interested in events on its children, it must set
DCACHE_FSNOTIFY_PARENT_WATCHED flag on all its children. Currently, when
the fsnotify connector is removed and i_fsnotify_mask becomes zero, we
lazily allow __fsnotify_parent() to do this the next time we see an
event on a child.
However, if the list of children is very long (e.g., in the millions),
and lots of activity is occurring on the directory, then it's possible
for many CPUs to end up blocked on the inode spinlock in
__fsnotify_update_child_flags(). Each CPU will then redundantly iterate
over the very long list of children. This situation can cause soft
lockups.
To avoid this, stop lazily updating child flags in __fsnotify_parent().
Protect the child flag update with i_rwsem held exclusive, to ensure
that we only iterate over the child list when it's absolutely necessary,
and even then, only once.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
Notes:
It seems that there are two implementation options for this, regarding
what i_rwsem protects:
1. Both updates to i_fsnotify_mask, and the child dentry flags, or
2. Only updates to the child dentry flags
I wanted to do #1, but it got really tricky with fsnotify_put_mark(). We
don't want to hold the inode lock whenever we decrement the refcount,
but if we don't, then we're stuck holding a spinlock when the refcount
goes to zero, and we need to grab the inode rwsem to synchronize the
update to the child flags. I'm sure there's a way around this, but I
didn't keep going with it.
With #1, as currently implemented, we have the unfortunate effect of
that a mark can be added, can see that no update is required, and
return, despite the fact that the flag update is still in progress on a
different CPU/thread. From our discussion, that seems to be the current
status quo, but I wanted to explicitly point that out. If we want to
move to #1, it should be possible with some work.
fs/notify/fsnotify.c | 12 ++++++++--
fs/notify/mark.c | 55 ++++++++++++++++++++++++++++++++++----------
2 files changed, 53 insertions(+), 14 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 7974e91ffe13..e887a195983b 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -207,8 +207,16 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
parent = dget_parent(dentry);
p_inode = parent->d_inode;
p_mask = fsnotify_inode_watches_children(p_inode);
- if (unlikely(parent_watched && !p_mask))
- __fsnotify_update_child_dentry_flags(p_inode);
+ if (unlikely(parent_watched && !p_mask)) {
+ /*
+ * Flag would be cleared soon by
+ * __fsnotify_update_child_dentry_flags(), but as an
+ * optimization, clear it now.
+ */
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
+ spin_unlock(&dentry->d_lock);
+ }
/*
* Include parent/name in notification either if some notification
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index c74ef947447d..da9f944fcbbb 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -184,15 +184,36 @@ static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
*/
void fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
{
+ struct inode *inode = NULL;
+ int watched_before, watched_after;
+
if (!conn)
return;
- spin_lock(&conn->lock);
- __fsnotify_recalc_mask(conn);
- spin_unlock(&conn->lock);
- if (conn->type == FSNOTIFY_OBJ_TYPE_INODE)
- __fsnotify_update_child_dentry_flags(
- fsnotify_conn_inode(conn));
+ if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
+ /*
+ * For inodes, we may need to update flags on the child
+ * dentries. To ensure these updates occur exactly once,
+ * synchronize the recalculation with the inode mutex.
+ */
+ inode = fsnotify_conn_inode(conn);
+ spin_lock(&conn->lock);
+ watched_before = fsnotify_inode_watches_children(inode);
+ __fsnotify_recalc_mask(conn);
+ watched_after = fsnotify_inode_watches_children(inode);
+ spin_unlock(&conn->lock);
+
+ inode_lock(inode);
+ if ((watched_before && !watched_after) ||
+ (!watched_before && watched_after)) {
+ __fsnotify_update_child_dentry_flags(inode);
+ }
+ inode_unlock(inode);
+ } else {
+ spin_lock(&conn->lock);
+ __fsnotify_recalc_mask(conn);
+ spin_unlock(&conn->lock);
+ }
}
/* Free all connectors queued for freeing once SRCU period ends */
@@ -295,6 +316,8 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
struct fsnotify_mark_connector *conn = READ_ONCE(mark->connector);
void *objp = NULL;
unsigned int type = FSNOTIFY_OBJ_TYPE_DETACHED;
+ struct inode *inode = NULL;
+ int watched_before, watched_after;
bool free_conn = false;
/* Catch marks that were actually never attached to object */
@@ -311,17 +334,31 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
if (!refcount_dec_and_lock(&mark->refcnt, &conn->lock))
return;
+ if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
+ inode = fsnotify_conn_inode(conn);
+ watched_before = fsnotify_inode_watches_children(inode);
+ }
+
hlist_del_init_rcu(&mark->obj_list);
if (hlist_empty(&conn->list)) {
objp = fsnotify_detach_connector_from_object(conn, &type);
free_conn = true;
+ watched_after = 0;
} else {
objp = __fsnotify_recalc_mask(conn);
type = conn->type;
+ watched_after = fsnotify_inode_watches_children(inode);
}
WRITE_ONCE(mark->connector, NULL);
spin_unlock(&conn->lock);
+ if (inode) {
+ inode_lock(inode);
+ if (watched_before && !watched_after)
+ __fsnotify_update_child_dentry_flags(inode);
+ inode_unlock(inode);
+ }
+
fsnotify_drop_object(type, objp);
if (free_conn) {
@@ -331,12 +368,6 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
spin_unlock(&destroy_lock);
queue_work(system_unbound_wq, &connector_reaper_work);
}
- /*
- * Note that we didn't update flags telling whether inode cares about
- * what's happening with children. We update these flags from
- * __fsnotify_parent() lazily when next event happens on one of our
- * children.
- */
spin_lock(&destroy_lock);
list_add(&mark->g_list, &destroy_list);
spin_unlock(&destroy_lock);
--
2.34.1
next prev parent reply other threads:[~2022-10-18 4:12 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-13 22:27 [RFC] fsnotify: allow sleepable child dentry flag update Stephen Brennan
2022-10-13 23:51 ` Al Viro
2022-11-01 21:47 ` Stephen Brennan
2022-10-14 8:01 ` Amir Goldstein
2022-10-17 7:59 ` Stephen Brennan
2022-10-17 11:44 ` Amir Goldstein
2022-10-17 16:59 ` Stephen Brennan
2022-10-17 17:42 ` Amir Goldstein
2022-10-17 9:09 ` Jan Kara
2022-10-18 4:12 ` [PATCH 0/2] fsnotify: fix softlockups iterating over d_subdirs Stephen Brennan
2022-10-18 4:12 ` Stephen Brennan [this message]
2022-10-18 7:39 ` [PATCH 1/2] fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem Amir Goldstein
2022-10-21 0:33 ` Stephen Brennan
2022-10-21 7:22 ` Amir Goldstein
2022-10-18 4:12 ` [PATCH 2/2] fsnotify: allow sleepable child flag update Stephen Brennan
2022-10-18 5:36 ` Amir Goldstein
2022-10-27 7:50 ` kernel test robot
2022-10-27 8:44 ` Yujie Liu
2022-10-27 22:12 ` Stephen Brennan
2022-10-18 8:07 ` [PATCH 0/2] fsnotify: fix softlockups iterating over d_subdirs Amir Goldstein
2022-10-18 23:52 ` Stephen Brennan
2022-10-19 5:33 ` Amir Goldstein
2022-10-27 22:06 ` Stephen Brennan
2022-10-28 8:58 ` Amir Goldstein
2022-10-21 1:03 ` [PATCH v2 0/3] " Stephen Brennan
2022-10-21 1:03 ` [PATCH v2 1/3] fsnotify: Use d_find_any_alias to get dentry associated with inode Stephen Brennan
2022-10-21 9:25 ` Amir Goldstein
2022-10-21 1:03 ` [PATCH v2 2/3] fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem Stephen Brennan
2022-10-21 4:01 ` kernel test robot
2022-10-21 8:22 ` Amir Goldstein
2022-10-21 9:18 ` Amir Goldstein
2022-10-25 18:02 ` Stephen Brennan
2022-10-26 5:41 ` Amir Goldstein
2022-10-21 9:17 ` Christian Brauner
2022-10-21 9:21 ` Amir Goldstein
2022-10-21 1:03 ` [PATCH v2 3/3] fsnotify: allow sleepable child flag update Stephen Brennan
2022-10-28 0:10 ` [PATCH v3 0/3] fsnotify: fix softlockups iterating over d_subdirs Stephen Brennan
2022-10-28 0:10 ` [PATCH v3 1/3] fsnotify: Use d_find_any_alias to get dentry associated with inode Stephen Brennan
2022-11-10 1:12 ` Stephen Brennan
2022-10-28 0:10 ` [PATCH v3 2/3] fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem Stephen Brennan
2022-10-28 9:11 ` Amir Goldstein
2022-11-10 0:03 ` kernel test robot
2022-11-10 1:06 ` Stephen Brennan
2022-10-28 0:10 ` [PATCH v3 3/3] fsnotify: allow sleepable child flag update Stephen Brennan
2022-10-28 9:32 ` Amir Goldstein
2022-11-01 21:25 ` Stephen Brennan
2022-11-01 17:51 ` [PATCH v3 0/3] fsnotify: fix softlockups iterating over d_subdirs Jan Kara
2022-11-01 20:48 ` Stephen Brennan
2022-11-02 8:55 ` Amir Goldstein
2022-11-10 20:04 ` Stephen Brennan
[not found] ` <CAOQ4uxjRVRjTNJ-2CSX9QwLVC9oQN9r4GHqCn=XZrisZo6DN2w@mail.gmail.com>
[not found] ` <87eduafg6d.fsf@oracle.com>
2022-11-11 7:56 ` Amir Goldstein
2022-11-02 17:52 ` Jan Kara
2022-11-04 23:33 ` Stephen Brennan
2022-11-07 11:56 ` Jan Kara
2022-11-11 22:06 ` [PATCH v4 0/5] " Stephen Brennan
2022-11-11 22:06 ` [PATCH v4 1/5] fsnotify: clear PARENT_WATCHED flags lazily Stephen Brennan
2022-11-11 22:06 ` [PATCH v4 2/5] fsnotify: Use d_find_any_alias to get dentry associated with inode Stephen Brennan
2022-11-12 8:53 ` Amir Goldstein
2022-11-11 22:06 ` [PATCH v4 3/5] dnotify: move fsnotify_recalc_mask() outside spinlock Stephen Brennan
2022-11-12 9:06 ` Amir Goldstein
2022-11-11 22:06 ` [PATCH v4 4/5] fsnotify: allow sleepable child flag update Stephen Brennan
2022-11-12 10:00 ` Amir Goldstein
2022-11-15 7:10 ` kernel test robot
2022-11-11 22:06 ` [PATCH v4 5/5] fsnotify: require inode lock held during " Stephen Brennan
2022-11-12 9:42 ` Amir Goldstein
2022-11-11 22:08 ` [PATCH v4 0/5] fsnotify: fix softlockups iterating over d_subdirs Stephen Brennan
2022-11-22 11:50 ` Jan Kara
2022-11-22 14:03 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221018041233.376977-2-stephen.s.brennan@oracle.com \
--to=stephen.s.brennan@oracle.com \
--cc=amir73il@gmail.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).