* [BUG] bcachefs fio lockup via generic/703
@ 2024-02-28 19:47 Brian Foster
2024-02-28 20:03 ` Kent Overstreet
0 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2024-02-28 19:47 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Hi Kent,
Firstly, I confirmed that today's master seems to avoid the splat I sent
previously (re: your comment about a reverse journal replay patch or
some such).
I still reproduce the stall issue on this system. After peeling away at
it, I was eventually able to reproduce without the drop writes
(dm-flakey) behavior from the test, and with fio using either the libaio
or sync I/O engine options. The sync I/O mode fortunately provides a
more useful stack trace:
# cat /proc/177747/stack
[<0>] bch2_dio_write_flush+0x122/0x160 [bcachefs]
[<0>] bch2_direct_write+0xb53/0xce0 [bcachefs]
[<0>] bch2_write_iter+0x142/0xc70 [bcachefs]
[<0>] vfs_write+0x29b/0x470
[<0>] ksys_write+0x6f/0xf0
[<0>] do_syscall_64+0x86/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
... which resolves down to the closure_sync() call in
bch2_dio_write_flush(). The problem seems to go away if I remove the
preceding journal flush from that function. This seems to rule out
io_uring/aio and instead suggest that we're getting stuck somehow
waiting on a journal flush.
Based on that I went back to the first commit before 746a33c96b7a0
("bcachefs: better journal pipelining"). With that, I can run hundreds
of iterations of generic/703 without a problem, so this appears to be a
regression associated with the journal pipeline improvements. I'm
currently re-running on the last known good commit with my test tweaks
backed out (i.e. so back to io_uring and drop writes) just to
corroborate that it's the same problem, but so far it's running as
expected...
Brian
^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] bcachefs fio lockup via generic/703
  2024-02-28 19:47 [BUG] bcachefs fio lockup via generic/703 Brian Foster
@ 2024-02-28 20:03 ` Kent Overstreet
  2024-02-28 20:13   ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Kent Overstreet @ 2024-02-28 20:03 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs

On Wed, Feb 28, 2024 at 02:47:26PM -0500, Brian Foster wrote:
> Hi Kent,
[...]
> ... which resolves down to the closure_sync() call in
> bch2_dio_write_flush(). The problem seems to go away if I remove the
> preceding journal flush from that function. This seems to rule out
> io_uring/aio and instead suggest that we're getting stuck somehow
> waiting on a journal flush.
[...]

So I suppose the journal must be getting stuck, and a journal write
isn't completing - what does sysfs internal/journal_debug say when it
happens?
* Re: [BUG] bcachefs fio lockup via generic/703
  2024-02-28 20:03 ` Kent Overstreet
@ 2024-02-28 20:13   ` Brian Foster
  2024-02-28 23:43     ` Kent Overstreet
  2024-02-29  0:02     ` Kent Overstreet
  0 siblings, 2 replies; 8+ messages in thread
From: Brian Foster @ 2024-02-28 20:13 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcachefs

On Wed, Feb 28, 2024 at 03:03:06PM -0500, Kent Overstreet wrote:
> On Wed, Feb 28, 2024 at 02:47:26PM -0500, Brian Foster wrote:
[...]
>
> So I suppose the journal must be getting stuck, and a journal write
> isn't completing - what does sysfs internal/journal_debug say when it
> happens?
>

dirty journal entries: 0/32768
seq: 11
seq_ondisk: 11
last_seq: 12
last_seq_ondisk: 11
flushed_seq_ondisk: 11
watermark: stripe
each entry reserved: 346
nr flush writes: 10
nr noflush writes: 1
average write size: 10.9 KiB
nr direct reclaim: 0
nr background reclaim: 25
reclaim kicked: 0
reclaim runs in: 0 ms
current entry sectors: 512
current entry error: ok
current entry: closed
unwritten entries:
last buf closed
replay done: 1
space:
	discarded	512:244736
	clean ondisk	512:244736
	clean		512:244736
	total		512:245760
dev 0:
	nr		480
	bucket size	512
	available	478:266
	discard_idx	1
	dirty_ondisk	1 (seq 11)
	dirty_idx	1 (seq 11)
	cur_idx		1 (seq 11)
* Re: [BUG] bcachefs fio lockup via generic/703
  2024-02-28 20:13 ` Brian Foster
@ 2024-02-28 23:43   ` Kent Overstreet
  2024-02-29  0:02   ` Kent Overstreet
  1 sibling, 0 replies; 8+ messages in thread
From: Kent Overstreet @ 2024-02-28 23:43 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs

On Wed, Feb 28, 2024 at 03:13:04PM -0500, Brian Foster wrote:
> On Wed, Feb 28, 2024 at 03:03:06PM -0500, Kent Overstreet wrote:
> > So I suppose the journal must be getting stuck, and a journal write
> > isn't completing - what does sysfs internal/journal_debug say when it
> > happens?
>
> dirty journal entries: 0/32768
> seq: 11
> seq_ondisk: 11
> last_seq: 12
> last_seq_ondisk: 11
> flushed_seq_ondisk: 11
[...]

Can you enable closure debugging (in kconfig)? That will show
outstanding closures, and info on what they're waiting on, in debugfs.

bch2_dio_write_flush() is doing a journal flush and waiting on it to
complete, but the journal isn't doing anything, so... lost wakeup,
perhaps?

Another thing to check for would be if we have waiters on journal_bufs
that aren't in use - maybe add some simple assertions in
journal_debug_to_text() -> journal_bufs_to_text(): if a buf isn't
active and isn't in the range that's being printed, just make sure
buf->wait is NULL.

And journal_buf_to_text() doesn't seem to print out whether the
waitlist has waiters - that's an oversight...
* Re: [BUG] bcachefs fio lockup via generic/703
  2024-02-28 20:13 ` Brian Foster
  2024-02-28 23:43   ` Kent Overstreet
@ 2024-02-29  0:02   ` Kent Overstreet
  2024-02-29  2:59     ` Kent Overstreet
  1 sibling, 1 reply; 8+ messages in thread
From: Kent Overstreet @ 2024-02-29 0:02 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs

On Wed, Feb 28, 2024 at 03:13:04PM -0500, Brian Foster wrote:
> On Wed, Feb 28, 2024 at 03:03:06PM -0500, Kent Overstreet wrote:
> > So I suppose the journal must be getting stuck, and a journal write
> > isn't completing - what does sysfs internal/journal_debug say when it
> > happens?
>
> dirty journal entries: 0/32768
> seq: 11
> seq_ondisk: 11
> flushed_seq_ondisk: 11
[...]

I might see it - journal_last_unwritten_seq() is now wrong, it's
assuming that j->seq_ondisk + 1 hasn't been submitted yet.

bch2_journal_flush_seq_async() then sets that journal buf to "must be a
flush write", but if it's already been submitted - whoops

But your journal_debug says seq_ondisk == flushed_seq_ondisk, so maybe
that's not actually it here?
* Re: [BUG] bcachefs fio lockup via generic/703
  2024-02-29  0:02 ` Kent Overstreet
@ 2024-02-29  2:59   ` Kent Overstreet
  2024-02-29 15:55     ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Kent Overstreet @ 2024-02-29 2:59 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs

On Wed, Feb 28, 2024 at 07:02:39PM -0500, Kent Overstreet wrote:
> I might see it - journal_last_unwritten_seq() is now wrong, it's
> assuming that j->seq_ondisk + 1 hasn't been submitted yet.
>
> bch2_journal_flush_seq_async() then sets that journal buf to "must be a
> flush write", but if it's already been submitted - whoops

On further reading, bch2_journal_flush_seq_async() looks correct, but
bch2_journal_noflush_seq() was wrong - the following patch is in
-testing:

From 3c8f22258ab39992ac8a4b077610a30c733f068d Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kent.overstreet@linux.dev>
Date: Wed, 28 Feb 2024 21:56:57 -0500
Subject: [PATCH] bcachefs: Fix bch2_journal_noflush_seq()

Improved journal pipelining broke journal_noflush_seq(); it implicitly
assumed only the oldest outstanding journal buf could be in flight, but
that's no longer true.

Make this more straightforward by just setting buf->must_flush whenever
we know a journal buf is going to be a flush write.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
index ed1724b52204..3dc1534d1d3c 100644
--- a/fs/bcachefs/journal.c
+++ b/fs/bcachefs/journal.c
@@ -715,7 +715,7 @@ int bch2_journal_flush_seq_async(struct journal *j, u64 seq,
 		return ret;
 	seq = res.seq;

-	buf = j->buf + (seq & JOURNAL_BUF_MASK);
+	buf = journal_seq_to_buf(j, seq);
 	buf->must_flush = true;

 	if (!buf->flush_time) {
@@ -733,8 +733,8 @@ int bch2_journal_flush_seq_async(struct journal *j, u64 seq,
 	}

 	/*
-	 * if write was kicked off without a flush, flush the next sequence
-	 * number instead
+	 * if write was kicked off without a flush, or if we promised it
+	 * wouldn't be a flush, flush the next sequence number instead
 	 */
 	buf = journal_seq_to_buf(j, seq);
 	if (buf->noflush) {
@@ -812,8 +812,8 @@ bool bch2_journal_noflush_seq(struct journal *j, u64 seq)
 	     unwritten_seq++) {
 		struct journal_buf *buf = journal_seq_to_buf(j, unwritten_seq);

-		/* journal write is already in flight, and was a flush write: */
-		if (unwritten_seq == journal_last_unwritten_seq(j) && !buf->noflush)
+		/* journal flush already in flight, or flush requested */
+		if (buf->must_flush)
 			goto out;

 		buf->noflush = true;
diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c
index 4557b5c1f7e1..77e93ad875d6 100644
--- a/fs/bcachefs/journal_io.c
+++ b/fs/bcachefs/journal_io.c
@@ -1935,6 +1935,7 @@ static int bch2_journal_write_pick_flush(struct journal *j, struct journal_buf *
 		j->nr_noflush_writes++;
 	} else {
+		w->must_flush = true;
 		j->last_flush_write = jiffies;
 		j->nr_flush_writes++;

 		clear_bit(JOURNAL_NEED_FLUSH_WRITE, &j->flags);
* Re: [BUG] bcachefs fio lockup via generic/703
  2024-02-29  2:59 ` Kent Overstreet
@ 2024-02-29 15:55   ` Brian Foster
  2024-02-29 16:24     ` Kent Overstreet
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2024-02-29 15:55 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcachefs

On Wed, Feb 28, 2024 at 09:59:12PM -0500, Kent Overstreet wrote:
> On further reading, journal_flush_seq_async() looks correct, but
> bch2_journal_noflush_seq() was wrong - the following patch is in
> -testing:
[...]

So I can still reproduce the issue on master, which includes commit
3c8f22258ab ("bcachefs: Fix bch2_journal_noflush_seq()"). Just for
reference, the related debug info is as follows:

# cat /proc/13918/stack
[<0>] bch2_dio_write_flush+0x15b/0x190 [bcachefs]
[<0>] bch2_direct_write+0xb75/0xd30 [bcachefs]
[<0>] bch2_write_iter+0x4c/0xf10 [bcachefs]
[<0>] vfs_write+0x29b/0x470
[<0>] ksys_write+0x6f/0xf0
[<0>] do_syscall_64+0x86/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

# cat journal_debug
dirty journal entries: 0/32768
seq: 72
seq_ondisk: 72
last_seq: 73
last_seq_ondisk: 72
flushed_seq_ondisk: 72
watermark: stripe
each entry reserved: 361
nr flush writes: 65
nr noflush writes: 0
average write size: 2.14 KiB
nr direct reclaim: 0
nr background reclaim: 66
reclaim kicked: 0
reclaim runs in: 0 ms
blocked: 0
current entry sectors: 512
current entry error: ok
current entry: closed
unwritten entries:
last buf closed
replay done: 1
space:
	discarded	512:244736
	clean ondisk	512:244736
	clean		512:244736
	total		512:245760
dev 0:
	nr		480
	bucket size	512
	available	478:188
	discard_idx	1
	dirty_ondisk	1 (seq 72)
	dirty_idx	1 (seq 72)
	cur_idx		1 (seq 72)

# cat closures
0000000044781eb9: __closure_sync+0x49/0x180 -> closure_sync_fn+0x0/0x30 p 0000000000000000 r 1
	W bch2_journal_flush_seq_async.part.0+0xed/0x590 [bcachefs]

000000001b9a5601: bch2_fs_open+0x538/0x15e0 [bcachefs] -> 0x0 p 0000000000000000 r 1 R

Brian
* Re: [BUG] bcachefs fio lockup via generic/703
  2024-02-29 15:55 ` Brian Foster
@ 2024-02-29 16:24   ` Kent Overstreet
  0 siblings, 0 replies; 8+ messages in thread
From: Kent Overstreet @ 2024-02-29 16:24 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-bcachefs

On Thu, Feb 29, 2024 at 10:55:46AM -0500, Brian Foster wrote:
> So I can still reproduce the issue on master, which includes commit
> 3c8f22258ab ("bcachefs: Fix bch2_journal_noflush_seq()"). Just for
> reference, the related debug info is as follows:
>
> # cat /proc/13918/stack
> [<0>] bch2_dio_write_flush+0x15b/0x190 [bcachefs]
[...]
>
> # cat closures
> 0000000044781eb9: __closure_sync+0x49/0x180 -> closure_sync_fn+0x0/0x30 p 0000000000000000 r 1
> 	W bch2_journal_flush_seq_async.part.0+0xed/0x590 [bcachefs]
>
> 000000001b9a5601: bch2_fs_open+0x538/0x15e0 [bcachefs] -> 0x0 p 0000000000000000 r 1 R

So, definitely some sort of lost wakeup, or we accidentally waited on a
journal buf that wasn't actually outstanding