From: Brian Foster <bfoster@redhat.com>
To: linux-bcachefs@vger.kernel.org
Subject: [BUG] bcachefs fallocate btree lock contention
Date: Fri, 21 Jul 2023 09:28:39 -0400
Message-ID: <ZLqIB3k600oAS3G1@bfoster>

Hi all,

When testing the recent write buffer journaling series, I hit several
fstests (e.g. generic/013) that seemed pretty much hung up in a
livelock in fsstress. On some further digging, it appears they were
stuck doing fallocates and were spinning heavily on transaction
restarts. I don't think these tests are stuck indefinitely, but rather
this manifests as some excessively long runtimes for tests that involve
concurrent fsstress runs. I was eventually able to reproduce the same
behavior without the write buffer patches, so it doesn't appear to be
related.

I think the issue is basically that if multiple fallocates are running
against independent inodes that might update the same extent btree node,
the __bchfs_fallocate() loop can get into a tight spin due to the lock
cycling around bch2_clamp_data_hole() contending with node updates. This
is where I see most restarts, and I've seen upwards of 100k restarts
and single fallocate latencies of tens of seconds. This is pretty
trivial to reproduce by just running concurrent sequential fallocates to
different files (8x or so on my test vm pretty much grinds things to a
halt) [1].
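
To put a number on the latency described above, a minimal sketch along
these lines could time each call, using the same xfs_io falloc
invocation as the reproducer in [1]. The slowest_falloc name and its
arguments are hypothetical, and date +%s only resolves whole seconds,
which is coarse but enough to spot multi-second stalls:

```shell
# Sketch only: slowest_falloc is a hypothetical helper, not part of fstests.
# Usage: slowest_falloc <file> <count>
# Issues <count> sequential 512k fallocates and reports the slowest one.
slowest_falloc() {
        file=$1; count=$2; offset=0; i=0; max=0
        while [ "$i" -lt "$count" ]; do
                start=$(date +%s)
                xfs_io -fc "falloc ${offset}k 512k" "$file" || return 1
                # Whole-second resolution; sub-second calls show up as 0.
                elapsed=$(( $(date +%s) - start ))
                if [ "$elapsed" -gt "$max" ]; then
                        max=$elapsed
                fi
                offset=$((offset + 512))
                i=$((i + 1))
        done
        echo "$file: slowest fallocate took ${max}s over $count calls"
}
```

Running this against one file while other fallocate loops hammer
different files on the same filesystem should show whether individual
calls are stalling.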

One question that comes to mind: why do we cycle locks here? Is this a
lock ordering requirement between folio locks and btree node locks?

To test the above, I ran with a quick hack to check for pagecache pages
before we decide to clamp the range during fallocate. This speeds up the
test significantly and pretty much removes the bottleneck. This only
handles the simple case and doesn't quite feel like the proper fix to
me, but since I'm low on time I threw it up on CI [2] for reference and
to get a test cycle. I'm heading on vacation for the next week+, so I
wanted to throw this up on the list so at least folks are aware of it if
any excessive test latencies are observed.

Any thoughts are appreciated in the meantime. I'll pick it up once I'm
back.

Brian

[1] Example sequential fallocate reproducer. Run against multiple files:

file=$1         # target file (assumed argument); one per instance
offset=0
while true; do
        xfs_io -fc "falloc ${offset}k 512k" "$file"
        offset=$((offset + 512))
done
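
A bounded, concurrent variant of [1] might look like the sketch below.
It starts one sequential fallocate loop per file in the background and
waits for all of them. The helper names (falloc_loop, run_concurrent),
the file.N naming, and the iteration count are assumptions for
illustration, not part of the original test setup:

```shell
# Sketch only: helper names and the file.N layout are hypothetical.

# Allocate <count> consecutive 512k chunks in <file>, starting at offset 0.
falloc_loop() {
        file=$1; count=$2; offset=0; i=0
        while [ "$i" -lt "$count" ]; do
                xfs_io -fc "falloc ${offset}k 512k" "$file" || return 1
                offset=$((offset + 512))
                i=$((i + 1))
        done
}

# Start one loop per file in <dir>, all in parallel, then wait for them.
# Returns non-zero if any loop failed.
run_concurrent() {
        dir=$1; nfiles=$2; count=$3; n=0; pids=""; rc=0
        while [ "$n" -lt "$nfiles" ]; do
                falloc_loop "$dir/file.$n" "$count" &
                pids="$pids $!"
                n=$((n + 1))
        done
        for pid in $pids; do
                wait "$pid" || rc=1
        done
        return "$rc"
}
```

For example, "time run_concurrent /mnt/test 8 1000" would approximate
the 8x case mentioned above, with /mnt/test standing in for wherever
the filesystem under test is mounted.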

[2] https://evilpiepirate.org/~testdashboard/ci?branch=bfoster&commit=d755bfd22fe0fabf8def3bfa0b758864538f79cd

