Linux bcachefs list
* [BUG] bcachefs fallocate btree lock contention
@ 2023-07-21 13:28 Brian Foster
  2023-07-21 20:49 ` Kent Overstreet
  2023-07-21 21:02 ` Kent Overstreet
From: Brian Foster @ 2023-07-21 13:28 UTC (permalink / raw)
  To: linux-bcachefs

Hi all,

When testing the recent write buffer journaling series, I hit several
fstests (e.g. generic/013) that appeared to be hung in a livelock in
fsstress. On further digging, they were stuck doing fallocates and
spinning heavily on transaction restarts. I don't think these tests are
stuck indefinitely; rather, this manifests as excessively long runtimes
for tests that involve concurrent fsstress runs. I was eventually able
to reproduce the same behavior without the write buffer patches, so it
doesn't appear to be related to that series.

I think the issue is basically that if multiple fallocates are running
against independent inodes that might update the same extent btree node,
the __bchfs_fallocate() loop can get into a tight spin because the lock
cycling around bch2_clamp_data_hole() contends with node updates. This
is where I see most restarts: I've seen upwards of 100k restarts and
single fallocate latencies of tens of seconds. This is pretty trivial to
reproduce by running concurrent sequential fallocates to different files
(8x or so on my test vm pretty much grinds things to a halt) [1].

One question that comes to mind: why do we cycle locks here? Is this a
lock ordering requirement between folio locks and btree node locks?

To test the above, I ran with a quick hack to check for pagecache pages
before we decide to clamp the range during fallocate. This speeds up the
test significantly and pretty much removes the bottleneck. This only
handles the simple case and doesn't quite feel like the proper fix to
me, but since I'm low on time I threw it up on CI [2] for reference and
to get a test cycle. I'm heading on vacation for the next week+, so I
wanted to throw this up on the list so at least folks are aware of it if
any excessive test latencies are observed.

Any thoughts are appreciated in the meantime. I'll pick it up once I'm
back.

Brian

[1] Example sequential fallocate reproducer. Run against multiple files:

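file=$1		# target file, one per concurrent instance
offset=0
while true; do
        # preallocate the next 512k chunk, then advance the offset (in KiB)
        xfs_io -fc "falloc ${offset}k 512k" "$file"
        offset=$((offset + 512))
done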
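
For convenience, the "run against multiple files" step can be sketched as
a small driver. This is only a rough sketch and not part of the test run
described above: the function names, the falloc.N file names, and the
bounded iteration count are made up for illustration, and the fallback
to util-linux fallocate when xfs_io is not installed is an assumption.

```shell
#!/bin/sh
# Hypothetical driver: runs the sequential fallocate loop against several
# files in parallel, with a bounded chunk count so that it terminates.

# Preallocate one 512k chunk. $1 = file, $2 = offset in KiB.
falloc_chunk() {
    if command -v xfs_io >/dev/null 2>&1; then
        xfs_io -fc "falloc ${2}k 512k" "$1"
    else
        # Assumed fallback: util-linux fallocate, same offset and length.
        fallocate -o "${2}KiB" -l 512KiB "$1"
    fi
}

# Sequentially fallocate $2 chunks into file $1 (runs in a subshell).
falloc_loop() (
    file=$1 count=$2 offset=0 i=0
    while [ "$i" -lt "$count" ]; do
        # Give up on this file if the command exits non-zero.
        falloc_chunk "$file" "$offset" || exit 1
        offset=$((offset + 512))
        i=$((i + 1))
    done
)

# Start $2 loops in parallel under directory $1, $3 chunks per file,
# then wait for all of them.
run_concurrent() (
    dir=$1 nfiles=$2 count=$3 n=0 pids="" rc=0
    while [ "$n" -lt "$nfiles" ]; do
        falloc_loop "$dir/falloc.$n" "$count" &
        pids="$pids $!"
        n=$((n + 1))
    done
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    exit "$rc"
)

# Usage: <script> <dir> [nfiles, default 8] [chunks per file, default 1000]
if [ "$#" -ge 1 ]; then
    run_concurrent "$1" "${2:-8}" "${3:-1000}"
fi
```

Running it under time(1) against a directory on the filesystem under
test, with 8 files, should roughly correspond to the 8x case mentioned
above. The chunk count only bounds the runtime; the original loop runs
until interrupted.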

[2] https://evilpiepirate.org/~testdashboard/ci?branch=bfoster&commit=d755bfd22fe0fabf8def3bfa0b758864538f79cd