All of lore.kernel.org
 help / color / mirror / Atom feed
* shrink_dcache_parent contention can make it stall for hours
@ 2025-10-02 11:54 Miroslav Crnić
  2025-10-02 12:35 ` Al Viro
  0 siblings, 1 reply; 4+ messages in thread
From: Miroslav Crnić @ 2025-10-02 11:54 UTC (permalink / raw)
  To: linux-fsdevel

The current code can hit a substantial lock starvation on
parent->d_lock when shrinking a large amount of children.

Scenario:
- Parent passed to function has a large amount of children.
- shrink_dcache_parent is called concurrently from multiple threads.
- Thread one collects all children into a dispose list.
  To delete each one it needs to grab parent->d_lock.
- Other threads keep looping over children.
  This holds parent->d_lock stopping deletion work of the first thread.
- Work stealing does not help as it steals at most 1 item at a time.

We hit the scenario above in production in a cluster running a distributed
file system (TernFS). If you have ~100k children and 8 threads calling
shrink_dcache_parent can result in all threads being stuck in
shrink_dcache_parent for multiple hours.

I think there are two possible approaches to remove this pathological behavior:

Option 1:
Don't differentiate between work stealing and own work.
Collect items for work stealing in the disposal list along with regular items.
Fight out the contention per item with try_lock and continue through the list.

Option 2:
Change shrink_dcache_parent to collect a dispose list for one unique parent.
Split out unlinking from parent from __dentry_kill.
Run over the disposal list in 2 passes.
First pass does everything except unlink parent.
Second pass does all unlinking under a single parent->d_lock grab.

I'm happy to submit a patch for either approach, but wanted to validate the
proposed solutions first.

Thanks
// Miroslav

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: shrink_dcache_parent contention can make it stall for hours
  2025-10-02 11:54 shrink_dcache_parent contention can make it stall for hours Miroslav Crnić
@ 2025-10-02 12:35 ` Al Viro
  2025-10-02 12:54   ` Miroslav Crnić
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2025-10-02 12:35 UTC (permalink / raw)
  To: Miroslav Crnić; +Cc: linux-fsdevel

On Thu, Oct 02, 2025 at 12:54:38PM +0100, Miroslav Crnić wrote:

> Option 1:
> Don't differentiate between work stealing and own work.
> Collect items for work stealing in the disposal list along with regular items.
> Fight out the contention per item with try_lock and continue through the list.

Currently disposal list is modified only by the thread that has created it;
which lock (if any) do you have in mind for protecting those?

> Option 2:
> Change shrink_dcache_parent to collect a dispose list for one unique parent.
> Split out unlinking from parent from __dentry_kill.
> Run over the disposal list in 2 passes.
> First pass does everything except unlink parent.
> Second pass does all unlinking under a single parent->d_lock grab.

Do you mean dentry_unlist()?  The problem is, how do you prevent those parents
looking inexplicably busy?  Cross-fs disposal list existing in parallel with fs
shutdown is very much possible; shrink_dcache_for_umount() should quietly steal
everything relevant from that and move on, leaving the empty husks on the
list to by freed by list owner once it gets to them.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: shrink_dcache_parent contention can make it stall for hours
  2025-10-02 12:35 ` Al Viro
@ 2025-10-02 12:54   ` Miroslav Crnić
  2025-10-02 20:43     ` Miroslav Crnić
  0 siblings, 1 reply; 4+ messages in thread
From: Miroslav Crnić @ 2025-10-02 12:54 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

> Currently disposal list is modified only by the thread that has created it;
> which lock (if any) do you have in mind for protecting those?

You are completely right here. It would require additional locking.
In that case, I would suggest something different.
Skipping select_collect entirely and calling select_collect2 instead
would be better.
It would need to be modified such that it populates select_data.found
The no contention path should result in similar results as all items
will be collected in dispose list. The contention path would stop
sooner, on the first
worksteal item and prevent holding the parent->d_lock over the whole iteration.

> Do you mean dentry_unlist()?  The problem is, how do you prevent those parents
> looking inexplicably busy?  Cross-fs disposal list existing in parallel with fs
> shutdown is very much possible; shrink_dcache_for_umount() should quietly steal
> everything relevant from that and move on, leaving the empty husks on the
> list to by freed by list owner once it gets to them.

Right, this is not an option then.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: shrink_dcache_parent contention can make it stall for hours
  2025-10-02 12:54   ` Miroslav Crnić
@ 2025-10-02 20:43     ` Miroslav Crnić
  0 siblings, 0 replies; 4+ messages in thread
From: Miroslav Crnić @ 2025-10-02 20:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

> Do you mean dentry_unlist()?  The problem is, how do you prevent those parents
> looking inexplicably busy?  Cross-fs disposal list existing in parallel with fs
> shutdown is very much possible; shrink_dcache_for_umount() should quietly steal
> everything relevant from that and move on, leaving the empty husks on the
> list to by freed by list owner once it gets to them.

Coming back to this after reading the code. What you are saying here
is not in line
with what the code does.  shrink_dcache_for_umount() will get stuck in
shrink_dcache_parent along with the rest of things looping there.
Nothing leaves that function until select_data.found == 0 which means
all things need to be unlinked as select_collect counts in select_data.found
things with ref_count < 0. (among other things it counts)
I am not saying select_collect should not count it.
I am saying that following is bad:

"Looping through a large list of already claimed work while holding a
lock needed for
this work to complete and then only claiming at most 1 item from there
is extremely
bad."

If we can agree that shrink_dcache_parent does the above we can start thinking
about what solution would be better.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-10-02 20:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-02 11:54 shrink_dcache_parent contention can make it stall for hours Miroslav Crnić
2025-10-02 12:35 ` Al Viro
2025-10-02 12:54   ` Miroslav Crnić
2025-10-02 20:43     ` Miroslav Crnić

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.