linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/1] fsnotify: clear PARENT_WATCHED flags lazily
@ 2024-05-10 22:18 Stephen Brennan
  2024-05-10 22:19 ` [PATCH 1/1] " Stephen Brennan
  2024-05-11  3:42 ` [PATCH 0/1] " Amir Goldstein
  0 siblings, 2 replies; 6+ messages in thread
From: Stephen Brennan @ 2024-05-10 22:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: Amir Goldstein, linux-fsdevel, linux-kernel, Stephen Brennan

Hi Amir, Jan, et al,

It's been a while since I worked with you on the patch series[1] that aimed to
make __fsnotify_update_child_dentry_flags() a sleepable function. That work got
to a point that it was close to ready, but there were some locking issues which
Jan found, and the kernel test robot reported, and I didn't find myself able to
tackle them in the amount of time I had.

But looking back on that series, I think I threw out the baby with the
bathwater. While I may not have resolved the locking issues associated with the
larger change, there was one patch which Amir shared, that probably resolves
more than 90% of the issues that people may see. I'm sending that here, since it
still applies to the latest master branch, and I think it's a very good idea.

To refresh you, the underlying issue I was trying to resolve was when
directories have many dentries (frequently, a ton of negative dentries), the
__fsnotify_update_child_dentry_flags() operation can take a while, and it
happens under spinlock.

Case #1 - if the directory has tens of millions of dentries, then you could get
a soft lockup from a single call to this function. I have seen some cases where
a single directory had this many dentries, but it's pretty rare.

Case #2 - suppose you have a system with many CPUs and a busy directory. Suppose
the directory watch is removed. The caller will begin executing
__fsnotify_update_child_dentry_flags() to clear the PARENT_WATCHED flag, but in
parallel, many other CPUs could wind up in __fsnotify_parent() and decide that
they, too, must call __fsnotify_update_child_dentry_flags() to clear the flags.
These CPUs will all spin waiting their turn, at which point they'll re-do the
long (and likely, useless) call. Even if the original call only took a second or
two, if you have a dozen or so CPUs that end up in that call, some CPUs will
spin a long time.
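
As a back-of-the-envelope illustration of case #2 (a simplified model, not a
measurement): assume the lock is handed out in FIFO order and every queued CPU
redoes a walk of the same length. The last CPU in line then spins for roughly
(n - 1) walks before it even starts its own. The function name and the fairness
assumption below are hypothetical and only serve the illustration.

```c
#include <assert.h>

/*
 * Hypothetical model: n_cpus all want the same lock, each holds it for
 * walk_secs, and the lock is granted in FIFO order. Returns how long the
 * last CPU in the queue spins before it gets the lock.
 */
static double worst_case_wait_secs(unsigned int n_cpus, double walk_secs)
{
	assert(walk_secs >= 0.0);

	if (n_cpus == 0)
		return 0.0;

	return (double)(n_cpus - 1) * walk_secs;
}
```

With a dozen CPUs and a two-second walk, worst_case_wait_secs(12, 2.0) gives 22
seconds of spinning for the unluckiest CPU, which is around the soft lockup
watchdog's usual default of 20 seconds.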

Amir's patch to clear PARENT_WATCHED flags lazily resolves that easily. In
__fsnotify_parent(), if callers notice that the parent is no longer watching,
they merely update the flags for the current dentry (not all the other
children). The fsnotify_recalc_mask() function further avoids excess calls by
only updating children if the parent started watching. This easily handles case
#2 above. Perhaps case #1 could still cause issues, for the cases of truly huge
dentry counts, but we shouldn't let "perfect" get in the way of "good enough" :)
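
The idea can be sketched in a small single-threaded userspace model. This is
not the kernel code: the model_* names, the PARENT_WATCHED stand-in and the
child_visits counter are assumptions made up for the illustration, and all
locking is left out. It only shows which paths touch how many children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for DCACHE_FSNOTIFY_PARENT_WATCHED; the value is arbitrary. */
#define PARENT_WATCHED 0x1u

struct model_dir {
	bool watches_children;		/* does the parent want child events? */
	unsigned long child_visits;	/* how many child flag updates we did */
};

struct model_child {
	unsigned int flags;
};

/*
 * Mark update path: walk all children only when the parent goes from
 * "not watching" to "watching". Stopping a watch touches no children.
 */
static void model_set_watching(struct model_dir *dir, struct model_child *kids,
			       size_t nr_kids, bool watch)
{
	bool was_watching = dir->watches_children;
	size_t i;

	assert(kids != NULL || nr_kids == 0);
	dir->watches_children = watch;

	if (was_watching || !watch)
		return;

	for (i = 0; i < nr_kids; i++) {
		kids[i].flags |= PARENT_WATCHED;
		dir->child_visits++;
	}
}

/*
 * Event path: returns true if the event should be reported to the parent.
 * A stale PARENT_WATCHED flag is cleared for this one child only.
 */
static bool model_event(struct model_dir *dir, struct model_child *kid)
{
	if (!(kid->flags & PARENT_WATCHED))
		return false;

	if (!dir->watches_children) {
		kid->flags &= ~PARENT_WATCHED;	/* lazy clear, one child */
		dir->child_visits++;
		return false;
	}

	return true;
}
```

In this model, removing the watch costs nothing up front, and each child pays
for exactly one flag update the next time it sees an event, instead of every
CPU redoing the full walk.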


Thanks,
Stephen

[1]: https://lore.kernel.org/all/20221013222719.277923-1-stephen.s.brennan@oracle.com/

Amir Goldstein (1):
  fsnotify: clear PARENT_WATCHED flags lazily

 fs/notify/fsnotify.c             | 26 ++++++++++++++++++++------
 fs/notify/fsnotify.h             |  3 ++-
 fs/notify/mark.c                 | 32 +++++++++++++++++++++++++++++---
 include/linux/fsnotify_backend.h |  8 +++++---
 4 files changed, 56 insertions(+), 13 deletions(-)

-- 
2.43.0



* [PATCH 1/1] fsnotify: clear PARENT_WATCHED flags lazily
  2024-05-10 22:18 [PATCH 0/1] fsnotify: clear PARENT_WATCHED flags lazily Stephen Brennan
@ 2024-05-10 22:19 ` Stephen Brennan
  2024-05-11  3:42 ` [PATCH 0/1] " Amir Goldstein
  1 sibling, 0 replies; 6+ messages in thread
From: Stephen Brennan @ 2024-05-10 22:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: Amir Goldstein, linux-fsdevel, linux-kernel, Stephen Brennan

From: Amir Goldstein <amir73il@gmail.com>

Call fsnotify_update_children_dentry_flags() to set PARENT_WATCHED flags
only when parent starts watching children.

When parent stops watching children, clear false positive PARENT_WATCHED
flags lazily in __fsnotify_parent() for each accessed child.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 fs/notify/fsnotify.c             | 26 ++++++++++++++++++++------
 fs/notify/fsnotify.h             |  3 ++-
 fs/notify/mark.c                 | 32 +++++++++++++++++++++++++++++---
 include/linux/fsnotify_backend.h |  8 +++++---
 4 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 2fc105a72a8f6..86d332baaba21 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -103,17 +103,13 @@ void fsnotify_sb_delete(struct super_block *sb)
  * parent cares.  Thus when an event happens on a child it can quickly tell
  * if there is a need to find a parent and send the event to the parent.
  */
-void __fsnotify_update_child_dentry_flags(struct inode *inode)
+void fsnotify_update_children_dentry_flags(struct inode *inode, bool watched)
 {
 	struct dentry *alias;
-	int watched;
 
 	if (!S_ISDIR(inode->i_mode))
 		return;
 
-	/* determine if the children should tell inode about their events */
-	watched = fsnotify_inode_watches_children(inode);
-
 	spin_lock(&inode->i_lock);
 	/* run all of the dentries associated with this inode.  Since this is a
 	 * directory, there damn well better only be one item on this list */
@@ -140,6 +136,24 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
 	spin_unlock(&inode->i_lock);
 }
 
+/*
+ * Lazily clear false positive PARENT_WATCHED flag for child whose parent had
+ * stopped watching children.
+ */
+static void fsnotify_update_child_dentry_flags(struct inode *inode,
+					       struct dentry *dentry)
+{
+	spin_lock(&dentry->d_lock);
+	/*
+	 * d_lock is a sufficient barrier to prevent observing a non-watched
+	 * parent state from before the fsnotify_update_children_dentry_flags()
+	 * or fsnotify_update_flags() call that had set PARENT_WATCHED.
+	 */
+	if (!fsnotify_inode_watches_children(inode))
+		dentry->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
+	spin_unlock(&dentry->d_lock);
+}
+
 /* Are inode/sb/mount interested in parent and name info with this event? */
 static bool fsnotify_event_needs_parent(struct inode *inode, __u32 mnt_mask,
 					__u32 mask)
@@ -214,7 +228,7 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 	p_inode = parent->d_inode;
 	p_mask = fsnotify_inode_watches_children(p_inode);
 	if (unlikely(parent_watched && !p_mask))
-		__fsnotify_update_child_dentry_flags(p_inode);
+		fsnotify_update_child_dentry_flags(p_inode, dentry);
 
 	/*
 	 * Include parent/name in notification either if some notification
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index fde74eb333cc9..bce9be36d06b5 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -74,7 +74,8 @@ static inline void fsnotify_clear_marks_by_sb(struct super_block *sb)
  * update the dentry->d_flags of all of inode's children to indicate if inode cares
  * about events that happen to its children.
  */
-extern void __fsnotify_update_child_dentry_flags(struct inode *inode);
+extern void fsnotify_update_children_dentry_flags(struct inode *inode,
+						  bool watched);
 
 extern struct kmem_cache *fsnotify_mark_connector_cachep;
 
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d6944ff86ffab..07cd66dc42fd6 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -176,6 +176,24 @@ static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 	return fsnotify_update_iref(conn, want_iref);
 }
 
+static bool fsnotify_conn_watches_children(
+					struct fsnotify_mark_connector *conn)
+{
+	if (conn->type != FSNOTIFY_OBJ_TYPE_INODE)
+		return false;
+
+	return fsnotify_inode_watches_children(fsnotify_conn_inode(conn));
+}
+
+static void fsnotify_conn_set_children_dentry_flags(
+					struct fsnotify_mark_connector *conn)
+{
+	if (conn->type != FSNOTIFY_OBJ_TYPE_INODE)
+		return;
+
+	fsnotify_update_children_dentry_flags(fsnotify_conn_inode(conn), true);
+}
+
 /*
  * Calculate mask of events for a list of marks. The caller must make sure
  * connector and connector->obj cannot disappear under us.  Callers achieve
@@ -184,15 +202,23 @@ static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
  */
 void fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 {
+	bool update_children;
+
 	if (!conn)
 		return;
 
 	spin_lock(&conn->lock);
+	update_children = !fsnotify_conn_watches_children(conn);
 	__fsnotify_recalc_mask(conn);
+	update_children &= fsnotify_conn_watches_children(conn);
 	spin_unlock(&conn->lock);
-	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE)
-		__fsnotify_update_child_dentry_flags(
-					fsnotify_conn_inode(conn));
+	/*
+	 * Set children's PARENT_WATCHED flags only if parent started watching.
+	 * When parent stops watching, we clear false positive PARENT_WATCHED
+	 * flags lazily in __fsnotify_parent().
+	 */
+	if (update_children)
+		fsnotify_conn_set_children_dentry_flags(conn);
 }
 
 /* Free all connectors queued for freeing once SRCU period ends */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 8f40c349b2283..59e6b8e98a4c1 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -563,12 +563,14 @@ static inline __u32 fsnotify_parent_needed_mask(__u32 mask)
 
 static inline int fsnotify_inode_watches_children(struct inode *inode)
 {
+	__u32 parent_mask = READ_ONCE(inode->i_fsnotify_mask);
+
 	/* FS_EVENT_ON_CHILD is set if the inode may care */
-	if (!(inode->i_fsnotify_mask & FS_EVENT_ON_CHILD))
+	if (!(parent_mask & FS_EVENT_ON_CHILD))
 		return 0;
 	/* this inode might care about child events, does it care about the
 	 * specific set of events that can happen on a child? */
-	return inode->i_fsnotify_mask & FS_EVENTS_POSS_ON_CHILD;
+	return parent_mask & FS_EVENTS_POSS_ON_CHILD;
 }
 
 /*
@@ -582,7 +584,7 @@ static inline void fsnotify_update_flags(struct dentry *dentry)
 	/*
 	 * Serialisation of setting PARENT_WATCHED on the dentries is provided
 	 * by d_lock. If inotify_inode_watched changes after we have taken
-	 * d_lock, the following __fsnotify_update_child_dentry_flags call will
+	 * d_lock, the following fsnotify_update_children_dentry_flags call will
 	 * find our entry, so it will spin until we complete here, and update
 	 * us with the new state.
 	 */
-- 
2.43.0



* Re: [PATCH 0/1] fsnotify: clear PARENT_WATCHED flags lazily
  2024-05-10 22:18 [PATCH 0/1] fsnotify: clear PARENT_WATCHED flags lazily Stephen Brennan
  2024-05-10 22:19 ` [PATCH 1/1] " Stephen Brennan
@ 2024-05-11  3:42 ` Amir Goldstein
  2024-05-14  0:04   ` Stephen Brennan
  1 sibling, 1 reply; 6+ messages in thread
From: Amir Goldstein @ 2024-05-11  3:42 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: Jan Kara, linux-fsdevel, linux-kernel

On Fri, May 10, 2024 at 6:21 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Hi Amir, Jan, et al,

Hi Stephen,

The story sounds good :)
The only thing I am worried about: was case #2 tested to show that
the patch really improves things in practice and not only in theory?

I am not asking you to write a test for this, or even a reproducer,
just for evidence collected from a case where the improvement is observed
and measurable.

Thanks,
Amir.



* Re: [PATCH 0/1] fsnotify: clear PARENT_WATCHED flags lazily
  2024-05-11  3:42 ` [PATCH 0/1] " Amir Goldstein
@ 2024-05-14  0:04   ` Stephen Brennan
  2024-05-15 17:15     ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Brennan @ 2024-05-14  0:04 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel, linux-kernel

Amir Goldstein <amir73il@gmail.com> writes:

> The story sounds good :)
> The only thing I am worried about: was case #2 tested to show that
> the patch really improves things in practice and not only in theory?
>
> I am not asking you to write a test for this, or even a reproducer,
> just for evidence collected from a case where the improvement is observed
> and measurable.

I had not done so when you sent this, but I should have done it
beforehand. In any case, now I have. I got my hands on a 384-CPU machine
and extended my negative dentry creation tool so that it can run a
workload in which it constantly runs "open()" followed by "close()" on
1000 files in the same directory, per thread (so a total of 384,000
files, a large but not unreasonable amount of dentries).

Then I simply run "inotifywait /path/to/dir" a few times. Without the
patch, softlockups are easy to reproduce. With the patch, I haven't been
able to get a single soft lockup.

https://github.com/brenns10/kernel_stuff/tree/master/negdentcreate

    make
    mkdir test

    # create 384k files inside "test"
    ./negdentcreate -p test -c 384000 -t 384 -o create

    # start a loop opening and closing those files
    ./negdentcreate -p test -c 384000 -t 384 -o open -l

    # in another window:
    inotifywait test
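
For readers without the tool at hand, one pass of the open/close workload over
a slice of files might look roughly like the sketch below. This is not the
source of negdentcreate: the open_close_pass() name and the "file-%u" naming
scheme are hypothetical, and a real run would call it in a loop from each
thread with that thread's own slice.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * One pass over a slice of files in dir: open() then close() each one.
 * The "file-%u" naming is an assumption for this sketch, not the real
 * tool's scheme. Returns the number of files opened successfully.
 */
static unsigned int open_close_pass(const char *dir, unsigned int first,
				    unsigned int count)
{
	char path[4096];
	unsigned int i, opened = 0;

	assert(dir != NULL);

	for (i = 0; i < count; i++) {
		int len = snprintf(path, sizeof(path), "%s/file-%u",
				   dir, first + i);
		int fd;

		if (len < 0 || (size_t)len >= sizeof(path))
			continue;	/* path did not fit, skip it */

		fd = open(path, O_RDONLY);
		if (fd < 0)
			continue;	/* missing file, nothing to close */

		opened++;
		close(fd);
	}

	return opened;
}
```

Each successful open() and close() generates an fsnotify event on the child,
which is what sends many CPUs into __fsnotify_parent() at the same time once
the directory is watched.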

Stephen



* Re: [PATCH 0/1] fsnotify: clear PARENT_WATCHED flags lazily
  2024-05-14  0:04   ` Stephen Brennan
@ 2024-05-15 17:15     ` Jan Kara
  2024-05-15 22:14       ` Stephen Brennan
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2024-05-15 17:15 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: Amir Goldstein, Jan Kara, linux-fsdevel, linux-kernel

On Mon 13-05-24 17:04:12, Stephen Brennan wrote:
> I had not done so when you sent this, but I should have done it
> beforehand. In any case, now I have. I got my hands on a 384-CPU machine
> and extended my negative dentry creation tool so that it can run a
> workload in which it constantly runs "open()" followed by "close()" on
> 1000 files in the same directory, per thread (so a total of 384,000
> files, a large but not unreasonable amount of dentries).
> 
> Then I simply run "inotifywait /path/to/dir" a few times. Without the
> patch, softlockups are easy to reproduce. With the patch, I haven't been
> able to get a single soft lockup.

Thanks for the patch and for testing! I've added your patch to my tree (not
for this merge window though) with a cosmetic tweak that instead of
fsnotify_update_child_dentry_flags() we just have
fsnotify_clear_child_dentry_flag() and fsnotify_set_children_dentry_flags()
functions to make naming somewhat clearer.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 0/1] fsnotify: clear PARENT_WATCHED flags lazily
  2024-05-15 17:15     ` Jan Kara
@ 2024-05-15 22:14       ` Stephen Brennan
  0 siblings, 0 replies; 6+ messages in thread
From: Stephen Brennan @ 2024-05-15 22:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: Amir Goldstein, Jan Kara, linux-fsdevel, linux-kernel

Jan Kara <jack@suse.cz> writes:
> Thanks for the patch and for testing! I've added your patch to my tree (not
> for this merge window though) with a cosmetic tweak that instead of
> fsnotify_update_child_dentry_flags() we just have
> fsnotify_clear_child_dentry_flag() and fsnotify_set_children_dentry_flags()
> functions to make naming somewhat clearer.

Thank you Jan! I agree that change will make it clearer when reading
code and stack traces :)

-Stephen
