* shrink_dcache_parent contention can make it stall for hours
@ 2025-10-02 11:54 Miroslav Crnić
  2025-10-02 12:35 ` Al Viro
  0 siblings, 1 reply; 4+ messages in thread
From: Miroslav Crnić @ 2025-10-02 11:54 UTC
  To: linux-fsdevel

The current code can suffer substantial lock starvation on
parent->d_lock when shrinking a large number of children.

Scenario:
- The parent passed to the function has a large number of children.
- shrink_dcache_parent is called concurrently from multiple threads.
- Thread one collects all children into a dispose list.
  To delete each one it needs to grab parent->d_lock.
- Other threads keep looping over children.
  Each pass holds parent->d_lock, which blocks the first thread's deletion work.
- Work stealing does not help, as it steals at most one item at a time.
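
To give a feel for how the scenario above scales, here is a rough userspace
model in C. It is not kernel code and not a measurement. The function name
model_walker_visits and the interleaving it assumes (every other thread
completes one full scan of the remaining children between two consecutive
unlinks) are hypothetical and deliberately pessimistic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: count the child visits that walker
 * threads perform while holding the parent lock, ASSUMING every walker
 * completes one full scan of the remaining children between two
 * consecutive unlinks done by the disposing thread.
 */
static unsigned long long model_walker_visits(size_t children, size_t walkers)
{
	unsigned long long visits = 0;

	/* One iteration per unlink; the list shrinks by one child each time. */
	for (size_t remaining = children; remaining > 0; remaining--)
		visits += (unsigned long long)walkers * remaining;

	return visits;
}
```

Under this assumption the work grows quadratically with the number of
children: 100000 children and 7 competing walkers come out at roughly 3.5e10
child visits made under the lock. Real interleavings will differ, so treat
this as an illustration of the growth rate only.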

We hit the scenario above in production in a cluster running a distributed
file system (TernFS). With ~100k children and 8 threads calling
shrink_dcache_parent, all threads can end up stuck in
shrink_dcache_parent for multiple hours.

I think there are two possible approaches to remove this pathological behavior:

Option 1:
Don't differentiate between work stealing and own work.
Collect items for work stealing in the disposal list along with regular items.
Fight out the contention per item with try_lock and continue through the list.
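
A minimal userspace sketch of the trylock-and-continue idea in Option 1 might
look like the following. Everything here is a hypothetical stand-in
(model_lock, model_item, model_dispose_pass); it uses a C11 atomic_flag in
place of the kernel's spinlock and does not reflect the actual dcache code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's dentry or spinlock. */
struct model_lock {
	atomic_flag held;
};

struct model_item {
	struct model_lock *parent_lock;	/* lock needed to unlink this item */
	bool killed;
};

static bool model_trylock(struct model_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void model_unlock(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * One pass over a dispose list: kill every item whose parent lock is free
 * right now and skip the rest instead of waiting. Returns the number of
 * items killed in this pass; the caller repeats until nothing is left.
 */
static size_t model_dispose_pass(struct model_item *items, size_t n)
{
	size_t killed = 0;

	for (size_t i = 0; i < n; i++) {
		if (items[i].killed)
			continue;
		if (!model_trylock(items[i].parent_lock))
			continue;	/* contended: move on to the next item */
		items[i].killed = true;	/* stands in for the real kill */
		model_unlock(items[i].parent_lock);
		killed++;
	}
	return killed;
}
```

The point of the sketch is that a contended item costs one failed trylock
rather than a wait, so the thread keeps making progress through the rest of
the list.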

Option 2:
Change shrink_dcache_parent to collect a dispose list for one unique parent.
Split the unlinking from the parent out of __dentry_kill.
Run over the disposal list in two passes.
The first pass does everything except unlinking from the parent.
The second pass does all the unlinking under a single parent->d_lock grab.
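
The two-pass shape of Option 2 can be sketched in userspace C as below. The
types and the function model_dispose_two_pass are hypothetical, the lock is
modeled only as a counter of acquisitions, and whether __dentry_kill can
actually be split this way is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's dentry structures. */
struct model_parent {
	unsigned long lock_grabs;	/* how often the parent lock was taken */
	size_t nr_children;		/* children still linked to the parent */
};

struct model_child {
	bool torn_down;			/* everything except the unlink is done */
	bool linked;			/* still on the parent's child list */
};

/* Dispose of a list whose entries all belong to the same parent. */
static void model_dispose_two_pass(struct model_parent *p,
				   struct model_child *list, size_t n)
{
	/* Pass 1: per-child teardown that does not need the parent lock. */
	for (size_t i = 0; i < n; i++)
		list[i].torn_down = true;

	/* Pass 2: one lock grab covers every unlink. */
	p->lock_grabs++;		/* lock(parent) */
	for (size_t i = 0; i < n; i++) {
		if (!list[i].linked)
			continue;
		assert(p->nr_children > 0);
		list[i].linked = false;
		p->nr_children--;
	}
	/* unlock(parent) */
}
```

In this model the number of parent lock acquisitions is one per dispose list
instead of one per child, which is what would remove the starvation window.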

I'm happy to submit a patch for either approach, but wanted to validate the
proposed solutions first.

Thanks
// Miroslav
