public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] bcachefs: journal bug fixes
@ 2023-08-31 11:07 Brian Foster
  2023-08-31 11:07 ` [PATCH 1/3] bcachefs: restart journal reclaim thread on ro->rw transitions Brian Foster
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Brian Foster @ 2023-08-31 11:07 UTC (permalink / raw)
  To: linux-bcachefs

Hi all,

This series includes a couple bug fixes that fell out of ongoing
experimentation with freeze support. I think patch 1 is pretty
self-explanatory, patch 2 is a prepatory patch, and patch 3 addresses a
race in the journaling code.

All patches are available in my current test branch, with CI test
results below [1]. Note that branch also includes a patch to enable
freeze, but I'd rather not see that one land quite yet (which is why
it's not included here). It's so far only seen the type of testing
intended to shake these sorts of peripheral issues out, and I need to
run some more testing to confirm sane behavior.

Thoughts, reviews, flames appreciated.

Brian

[1] https://evilpiepirate.org/~testdashboard/ci?branch=bfoster

Brian Foster (3):
  bcachefs: restart journal reclaim thread on ro->rw transitions
  bcachefs: prepare journal buf put to handle pin put
  bcachefs: fix race between journal entry close and pin set

 fs/bcachefs/journal.c | 13 +------------
 fs/bcachefs/journal.h | 39 +++++++++++++++++++++++++++++++++------
 fs/bcachefs/super.c   |  4 ++++
 3 files changed, 38 insertions(+), 18 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] bcachefs: restart journal reclaim thread on ro->rw transitions
  2023-08-31 11:07 [PATCH 0/3] bcachefs: journal bug fixes Brian Foster
@ 2023-08-31 11:07 ` Brian Foster
  2023-08-31 11:07 ` [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put Brian Foster
  2023-08-31 11:07 ` [PATCH 3/3] bcachefs: fix race between journal entry close and pin set Brian Foster
  2 siblings, 0 replies; 14+ messages in thread
From: Brian Foster @ 2023-08-31 11:07 UTC (permalink / raw)
  To: linux-bcachefs

Commit c2d5ff36065a4 ("bcachefs: Start journal reclaim thread
earlier") tweaked reclaim thread management to start a bit earlier
in the mount sequence by moving the start call from
__bch2_fs_read_write() to bch2_fs_journal_start(). This has the side
effect of never starting the reclaim thread on a ro->rw transition,
which can be observed by monitoring reclaim behavior via the
journal_reclaim tracepoints. I.e. once an fs has remounted ro->rw,
we only ever rely on direct reclaim from that point forward.

Since bch2_journal_reclaim_start() properly handles the case where
the reclaim thread has already been created, restore the start call
in the read-write helper. This allows the reclaim thread to start
early when appropriate and also exit/restart on remounts or freeze
cycles. In the latter case it may be possible to simply allow the
task to freeze rather than destroy it, but for now just fix the
immediate bug.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/bcachefs/super.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index 60424865980d..29cd71445a94 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -421,6 +421,10 @@ static int __bch2_fs_read_write(struct bch_fs *c, bool early)
 		return ret;
 	}
 
+	ret = bch2_journal_reclaim_start(&c->journal);
+	if (ret)
+		goto err;
+
 	if (!early) {
 		ret = bch2_fs_read_write_late(c);
 		if (ret)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put
  2023-08-31 11:07 [PATCH 0/3] bcachefs: journal bug fixes Brian Foster
  2023-08-31 11:07 ` [PATCH 1/3] bcachefs: restart journal reclaim thread on ro->rw transitions Brian Foster
@ 2023-08-31 11:07 ` Brian Foster
  2023-09-03 21:09   ` Kent Overstreet
  2023-08-31 11:07 ` [PATCH 3/3] bcachefs: fix race between journal entry close and pin set Brian Foster
  2 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-08-31 11:07 UTC (permalink / raw)
  To: linux-bcachefs

bcachefs freeze testing has uncovered some raciness between journal
entry open/close and pin list reference count management. The
details of the problem are described in a separate patch. In
preparation for the associated fix, refactor the journal buffer put
path a bit to allow it to eventually handle dropping the pin list
reference currently held by an open journal entry. Open code the
journal write dispatch since it is essentially a single line and
instead factor out the journal state update to be reused, consistent
with other journal state update helpers. No functional changes in
this patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/bcachefs/journal.c |  9 ---------
 fs/bcachefs/journal.h | 22 +++++++++++++++++-----
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
index 055920c26da6..76ebcdb3e3b4 100644
--- a/fs/bcachefs/journal.c
+++ b/fs/bcachefs/journal.c
@@ -132,15 +132,6 @@ journal_error_check_stuck(struct journal *j, int error, unsigned flags)
 	return stuck;
 }
 
-/* journal entry close/open: */
-
-void __bch2_journal_buf_put(struct journal *j)
-{
-	struct bch_fs *c = container_of(j, struct bch_fs, journal);
-
-	closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
-}
-
 /*
  * Returns true if journal entry is now closed:
  *
diff --git a/fs/bcachefs/journal.h b/fs/bcachefs/journal.h
index 008a2e25a4fa..c7a31da077c9 100644
--- a/fs/bcachefs/journal.h
+++ b/fs/bcachefs/journal.h
@@ -252,9 +252,10 @@ static inline bool journal_entry_empty(struct jset *j)
 	return true;
 }
 
-void __bch2_journal_buf_put(struct journal *);
-
-static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
+/*
+ * Drop reference on a buffer index and return true if the count has hit zero.
+ */
+static inline union journal_res_state journal_state_buf_put(struct journal *j, unsigned idx)
 {
 	union journal_res_state s;
 
@@ -264,9 +265,20 @@ static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
 				    .buf2_count = idx == 2,
 				    .buf3_count = idx == 3,
 				    }).v, &j->reservations.counter);
+	return s;
+}
 
-	if (!journal_state_count(s, idx) && idx == s.unwritten_idx)
-		__bch2_journal_buf_put(j);
+void bch2_journal_write(struct closure *);
+static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
+{
+	struct bch_fs *c = container_of(j, struct bch_fs, journal);
+	union journal_res_state s;
+
+	s = journal_state_buf_put(j, idx);
+	if (!journal_state_count(s, idx)) {
+		if (idx == s.unwritten_idx)
+			closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
+	}
 }
 
 /*
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-08-31 11:07 [PATCH 0/3] bcachefs: journal bug fixes Brian Foster
  2023-08-31 11:07 ` [PATCH 1/3] bcachefs: restart journal reclaim thread on ro->rw transitions Brian Foster
  2023-08-31 11:07 ` [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put Brian Foster
@ 2023-08-31 11:07 ` Brian Foster
  2023-09-03 21:18   ` Kent Overstreet
  2 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-08-31 11:07 UTC (permalink / raw)
  To: linux-bcachefs

bcachefs freeze testing via fstests generic/390 occasionally
reproduces the following BUG from bch2_fs_read_only():

  BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));

This indicates that one or more dirty key cache keys still exist
after the attempt to flush and quiesce the fs. The sequence that
leads to this problem actually occurs on unfreeze (ro->rw), and
looks something like the following:

- Task A begins a transaction commit and acquires journal_res for
  the current seq. This transaction intends to perform key cache
  insertion.
- Task B begins a bch2_journal_flush() via bch2_sync_fs(). This ends
  up in journal_entry_want_write(), which closes the current journal
  entry and drops the reference to the pin list created on entry open.
  The pin put pops the front of the journal via fast reclaim since the
  reference count has dropped to 0.
- Task A attempts to set the journal pin for the associated cached
  key, but bch2_journal_pin_set() skips the pin insert because the
  seq of the transaction reservation is behind the front of the pin
  list fifo.

The end result is that the pin associated with the cached key is not
added, which prevents a subsequent reclaim from processing the key
and thus leaves it dangling at freeze time. The fundamental cause of
this problem is that the front of the journal is allowed to pop
before a transaction with outstanding reservation on the associated
journal seq is able to add a pin. The count for the pin list
associated with the seq drops to zero and is prematurely reclaimed
as a result.

The logical fix for this problem lies in how the journal buffer is
managed in similar scenarios where the entry might have been closed
before a transaction with outstanding reservations happens to be
committed.

When a journal entry is opened, the current sequence number is
bumped, the associated pin list is initialized with a reference
count of 1, and the journal buffer reference count is bumped (via
journal_state_inc()). When a journal reservation is acquired, the
reservation also acquires a reference on the associated buffer. If
the journal entry is closed in the meantime, it drops both the pin
and buffer references held by the open entry, but the buffer still
has references held by outstanding reservation. After the associated
transaction commits, the reservation release drops the associated
buffer references and the buffer is written out once the reference
count has dropped to zero.

The fundamental problem here is that the lifecycle of the pin list
reference held by an open journal entry is too short to cover
the processing of transactions with outstanding reservations. The
simplest way to address this is to expand the pin list reference
to the lifecycle of the buffer vs. the shorter lifecycle of the
open journal entry. This allows a journal entry to closer for
whatever reason, but ensures the pin list for the seq with
outstanding reservation cannot be popped and reclaimed before all
outstanding reservations have been released.

Move the pin put from journal entry close to where final reference
count processing of the journal buffer occurs. Create a duplicate
helper to cover the case where the caller doesn't already hold the
journal lock. This allows generic/390 to pass reliably.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/bcachefs/journal.c |  4 +---
 fs/bcachefs/journal.h | 19 +++++++++++++++++--
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
index 76ebcdb3e3b4..21f97d48931d 100644
--- a/fs/bcachefs/journal.c
+++ b/fs/bcachefs/journal.c
@@ -195,13 +195,11 @@ static void __journal_entry_close(struct journal *j, unsigned closed_val)
 	buf->data->last_seq	= cpu_to_le64(buf->last_seq);
 	BUG_ON(buf->last_seq > le64_to_cpu(buf->data->seq));
 
-	__bch2_journal_pin_put(j, le64_to_cpu(buf->data->seq));
-
 	cancel_delayed_work(&j->write_work);
 
 	bch2_journal_space_available(j);
 
-	bch2_journal_buf_put(j, old.idx);
+	__bch2_journal_buf_put(j, old.idx, le64_to_cpu(buf->data->seq));
 }
 
 void bch2_journal_halt(struct journal *j)
diff --git a/fs/bcachefs/journal.h b/fs/bcachefs/journal.h
index c7a31da077c9..5dea5d46c616 100644
--- a/fs/bcachefs/journal.h
+++ b/fs/bcachefs/journal.h
@@ -112,6 +112,7 @@
 #include <linux/hash.h>
 
 #include "journal_types.h"
+#include "journal_reclaim.h"
 
 struct bch_fs;
 
@@ -269,13 +270,27 @@ static inline union journal_res_state journal_state_buf_put(struct journal *j, u
 }
 
 void bch2_journal_write(struct closure *);
-static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
+static inline void __bch2_journal_buf_put(struct journal *j, unsigned idx, u64 seq)
 {
 	struct bch_fs *c = container_of(j, struct bch_fs, journal);
 	union journal_res_state s;
 
 	s = journal_state_buf_put(j, idx);
 	if (!journal_state_count(s, idx)) {
+		__bch2_journal_pin_put(j, seq);
+		if (idx == s.unwritten_idx)
+			closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
+	}
+}
+
+static inline void bch2_journal_buf_put(struct journal *j, unsigned idx, u64 seq)
+{
+	struct bch_fs *c = container_of(j, struct bch_fs, journal);
+	union journal_res_state s;
+
+	s = journal_state_buf_put(j, idx);
+	if (!journal_state_count(s, idx)) {
+		bch2_journal_pin_put(j, seq);
 		if (idx == s.unwritten_idx)
 			closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
 	}
@@ -298,7 +313,7 @@ static inline void bch2_journal_res_put(struct journal *j,
 				       BCH_JSET_ENTRY_btree_keys,
 				       0, 0, 0);
 
-	bch2_journal_buf_put(j, res->idx);
+	bch2_journal_buf_put(j, res->idx, res->seq);
 
 	res->ref = 0;
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put
  2023-08-31 11:07 ` [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put Brian Foster
@ 2023-09-03 21:09   ` Kent Overstreet
  2023-09-05 12:41     ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2023-09-03 21:09 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

On Thu, Aug 31, 2023 at 07:07:33AM -0400, Brian Foster wrote:
> bcachefs freeze testing has uncovered some raciness between journal
> entry open/close and pin list reference count management. The
> details of the problem are described in a separate patch. In
> preparation for the associated fix, refactor the journal buffer put
> path a bit to allow it to eventually handle dropping the pin list
> reference currently held by an open journal entry. Open code the
> journal write dispatch since it is essentially a single line and
> instead factor out the journal state update to be reused, consistent
> with other journal state update helpers. No functional changes in
> this patch.

So, I don't want to take this patch as is because even though
__bch2_journal_buf_put() is small, you're putting it into an inline
fastpath that we want to be as small and fast as possible - the main
transaction commit path.

journal_state_buf_put() is fine though, happy to take that part of the
patch.

> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/bcachefs/journal.c |  9 ---------
>  fs/bcachefs/journal.h | 22 +++++++++++++++++-----
>  2 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
> index 055920c26da6..76ebcdb3e3b4 100644
> --- a/fs/bcachefs/journal.c
> +++ b/fs/bcachefs/journal.c
> @@ -132,15 +132,6 @@ journal_error_check_stuck(struct journal *j, int error, unsigned flags)
>  	return stuck;
>  }
>  
> -/* journal entry close/open: */
> -
> -void __bch2_journal_buf_put(struct journal *j)
> -{
> -	struct bch_fs *c = container_of(j, struct bch_fs, journal);
> -
> -	closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
> -}
> -
>  /*
>   * Returns true if journal entry is now closed:
>   *
> diff --git a/fs/bcachefs/journal.h b/fs/bcachefs/journal.h
> index 008a2e25a4fa..c7a31da077c9 100644
> --- a/fs/bcachefs/journal.h
> +++ b/fs/bcachefs/journal.h
> @@ -252,9 +252,10 @@ static inline bool journal_entry_empty(struct jset *j)
>  	return true;
>  }
>  
> -void __bch2_journal_buf_put(struct journal *);
> -
> -static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> +/*
> + * Drop reference on a buffer index and return true if the count has hit zero.
> + */
> +static inline union journal_res_state journal_state_buf_put(struct journal *j, unsigned idx)
>  {
>  	union journal_res_state s;
>  
> @@ -264,9 +265,20 @@ static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
>  				    .buf2_count = idx == 2,
>  				    .buf3_count = idx == 3,
>  				    }).v, &j->reservations.counter);
> +	return s;
> +}
>  
> -	if (!journal_state_count(s, idx) && idx == s.unwritten_idx)
> -		__bch2_journal_buf_put(j);
> +void bch2_journal_write(struct closure *);
> +static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> +{
> +	struct bch_fs *c = container_of(j, struct bch_fs, journal);
> +	union journal_res_state s;
> +
> +	s = journal_state_buf_put(j, idx);
> +	if (!journal_state_count(s, idx)) {
> +		if (idx == s.unwritten_idx)
> +			closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
> +	}
>  }
>  
>  /*
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-08-31 11:07 ` [PATCH 3/3] bcachefs: fix race between journal entry close and pin set Brian Foster
@ 2023-09-03 21:18   ` Kent Overstreet
  2023-09-04  2:29     ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2023-09-03 21:18 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

On Thu, Aug 31, 2023 at 07:07:34AM -0400, Brian Foster wrote:
> bcachefs freeze testing via fstests generic/390 occasionally
> reproduces the following BUG from bch2_fs_read_only():
> 
>   BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
> 
> This indicates that one or more dirty key cache keys still exist
> after the attempt to flush and quiesce the fs. The sequence that
> leads to this problem actually occurs on unfreeze (ro->rw), and
> looks something like the following:
> 
> - Task A begins a transaction commit and acquires journal_res for
>   the current seq. This transaction intends to perform key cache
>   insertion.
> - Task B begins a bch2_journal_flush() via bch2_sync_fs(). This ends
>   up in journal_entry_want_write(), which closes the current journal
>   entry and drops the reference to the pin list created on entry open.
>   The pin put pops the front of the journal via fast reclaim since the
>   reference count has dropped to 0.
> - Task A attempts to set the journal pin for the associated cached
>   key, but bch2_journal_pin_set() skips the pin insert because the
>   seq of the transaction reservation is behind the front of the pin
>   list fifo.
> 
> The end result is that the pin associated with the cached key is not
> added, which prevents a subsequent reclaim from processing the key
> and thus leaves it dangling at freeze time. The fundamental cause of
> this problem is that the front of the journal is allowed to pop
> before a transaction with outstanding reservation on the associated
> journal seq is able to add a pin. The count for the pin list
> associated with the seq drops to zero and is prematurely reclaimed
> as a result.
> 
> The logical fix for this problem lies in how the journal buffer is
> managed in similar scenarios where the entry might have been closed
> before a transaction with outstanding reservations happens to be
> committed.
> 
> When a journal entry is opened, the current sequence number is
> bumped, the associated pin list is initialized with a reference
> count of 1, and the journal buffer reference count is bumped (via
> journal_state_inc()). When a journal reservation is acquired, the
> reservation also acquires a reference on the associated buffer. If
> the journal entry is closed in the meantime, it drops both the pin
> and buffer references held by the open entry, but the buffer still
> has references held by outstanding reservation. After the associated
> transaction commits, the reservation release drops the associated
> buffer references and the buffer is written out once the reference
> count has dropped to zero.
> 
> The fundamental problem here is that the lifecycle of the pin list
> reference held by an open journal entry is too short to cover
> the processing of transactions with outstanding reservations. The
> simplest way to address this is to expand the pin list reference
> to the lifecycle of the buffer vs. the shorter lifecycle of the
> open journal entry. This allows a journal entry to closer for
> whatever reason, but ensures the pin list for the seq with
> outstanding reservation cannot be popped and reclaimed before all
> outstanding reservations have been released.
> 
> Move the pin put from journal entry close to where final reference
> count processing of the journal buffer occurs. Create a duplicate
> helper to cover the case where the caller doesn't already hold the
> journal lock. This allows generic/390 to pass reliably.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Good find :)

Your fix makes sense, but I wonder if it might be simpler and better to
fix this in bch2_journal_reclaim_fast().

We could just change it to check if the pin list is for a journal buf
that's still open and has a nonzero refcount.

It seems on the surface to be a bug where just fixing the lifetime
refcounts is the right approach, but I think that ends up being the
trickier approach here, since bch2_journal_buf_put() now has a couple
things it's checking at different stages of the journal buf lifetime.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-09-03 21:18   ` Kent Overstreet
@ 2023-09-04  2:29     ` Kent Overstreet
  2023-09-05 12:59       ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2023-09-04  2:29 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

On Sun, Sep 03, 2023 at 05:18:12PM -0400, Kent Overstreet wrote:
> On Thu, Aug 31, 2023 at 07:07:34AM -0400, Brian Foster wrote:
> > bcachefs freeze testing via fstests generic/390 occasionally
> > reproduces the following BUG from bch2_fs_read_only():
> > 
> >   BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
>
> Your fix makes sense, but I wonder if it might be simpler and better to
> fix this in bch2_journal_reclaim_fast().

What do you think of this fix?

diff --git a/fs/bcachefs/journal_reclaim.c b/fs/bcachefs/journal_reclaim.c
index 10e1860dad..021c56ac67 100644
--- a/fs/bcachefs/journal_reclaim.c
+++ b/fs/bcachefs/journal_reclaim.c
@@ -303,6 +303,12 @@ static void bch2_journal_reclaim_fast(struct journal *j)
 	 */
 	while (!fifo_empty(&j->pin) &&
 	       !atomic_read(&fifo_peek_front(&j->pin).count)) {
+		u64 seq = journal_last_seq(j);
+
+		if (seq > j->seq_ondisk &&
+		    journal_state_count(j->reservations, seq & JOURNAL_BUF_MASK))
+			break;
+
 		fifo_pop(&j->pin, temp);
 		popped = true;
 	}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put
  2023-09-03 21:09   ` Kent Overstreet
@ 2023-09-05 12:41     ` Brian Foster
  2023-09-10  1:55       ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-09-05 12:41 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

On Sun, Sep 03, 2023 at 05:09:04PM -0400, Kent Overstreet wrote:
> On Thu, Aug 31, 2023 at 07:07:33AM -0400, Brian Foster wrote:
> > bcachefs freeze testing has uncovered some raciness between journal
> > entry open/close and pin list reference count management. The
> > details of the problem are described in a separate patch. In
> > preparation for the associated fix, refactor the journal buffer put
> > path a bit to allow it to eventually handle dropping the pin list
> > reference currently held by an open journal entry. Open code the
> > journal write dispatch since it is essentially a single line and
> > instead factor out the journal state update to be reused, consistent
> > with other journal state update helpers. No functional changes in
> > this patch.
> 
> So, I don't want to take this patch as is because even though
> __bch2_journal_buf_put() is small, you're putting it into an inline
> fastpath that we want to be as small and fast as possible - the main
> transaction commit path.
> 

Can you elaborate on what you mean here? ISTM all this patch does is
replace a function call to __bch2_journal_buf_put() that already
exists..

Brian

> journal_state_buf_put() is fine though, happy to take that part of the
> patch.
> 
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/bcachefs/journal.c |  9 ---------
> >  fs/bcachefs/journal.h | 22 +++++++++++++++++-----
> >  2 files changed, 17 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
> > index 055920c26da6..76ebcdb3e3b4 100644
> > --- a/fs/bcachefs/journal.c
> > +++ b/fs/bcachefs/journal.c
> > @@ -132,15 +132,6 @@ journal_error_check_stuck(struct journal *j, int error, unsigned flags)
> >  	return stuck;
> >  }
> >  
> > -/* journal entry close/open: */
> > -
> > -void __bch2_journal_buf_put(struct journal *j)
> > -{
> > -	struct bch_fs *c = container_of(j, struct bch_fs, journal);
> > -
> > -	closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
> > -}
> > -
> >  /*
> >   * Returns true if journal entry is now closed:
> >   *
> > diff --git a/fs/bcachefs/journal.h b/fs/bcachefs/journal.h
> > index 008a2e25a4fa..c7a31da077c9 100644
> > --- a/fs/bcachefs/journal.h
> > +++ b/fs/bcachefs/journal.h
> > @@ -252,9 +252,10 @@ static inline bool journal_entry_empty(struct jset *j)
> >  	return true;
> >  }
> >  
> > -void __bch2_journal_buf_put(struct journal *);
> > -
> > -static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> > +/*
> > + * Drop reference on a buffer index and return true if the count has hit zero.
> > + */
> > +static inline union journal_res_state journal_state_buf_put(struct journal *j, unsigned idx)
> >  {
> >  	union journal_res_state s;
> >  
> > @@ -264,9 +265,20 @@ static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> >  				    .buf2_count = idx == 2,
> >  				    .buf3_count = idx == 3,
> >  				    }).v, &j->reservations.counter);
> > +	return s;
> > +}
> >  
> > -	if (!journal_state_count(s, idx) && idx == s.unwritten_idx)
> > -		__bch2_journal_buf_put(j);
> > +void bch2_journal_write(struct closure *);
> > +static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> > +{
> > +	struct bch_fs *c = container_of(j, struct bch_fs, journal);
> > +	union journal_res_state s;
> > +
> > +	s = journal_state_buf_put(j, idx);
> > +	if (!journal_state_count(s, idx)) {
> > +		if (idx == s.unwritten_idx)
> > +			closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
> > +	}
> >  }
> >  
> >  /*
> > -- 
> > 2.41.0
> > 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-09-04  2:29     ` Kent Overstreet
@ 2023-09-05 12:59       ` Brian Foster
  2023-09-06 19:07         ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-09-05 12:59 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

On Sun, Sep 03, 2023 at 10:29:28PM -0400, Kent Overstreet wrote:
> On Sun, Sep 03, 2023 at 05:18:12PM -0400, Kent Overstreet wrote:
> > On Thu, Aug 31, 2023 at 07:07:34AM -0400, Brian Foster wrote:
> > > bcachefs freeze testing via fstests generic/390 occasionally
> > > reproduces the following BUG from bch2_fs_read_only():
> > > 
> > >   BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
> >
> > Your fix makes sense, but I wonder if it might be simpler and better to
> > fix this in bch2_journal_reclaim_fast().
> 
> What do you think of this fix?
> 
> diff --git a/fs/bcachefs/journal_reclaim.c b/fs/bcachefs/journal_reclaim.c
> index 10e1860dad..021c56ac67 100644
> --- a/fs/bcachefs/journal_reclaim.c
> +++ b/fs/bcachefs/journal_reclaim.c
> @@ -303,6 +303,12 @@ static void bch2_journal_reclaim_fast(struct journal *j)
>  	 */
>  	while (!fifo_empty(&j->pin) &&
>  	       !atomic_read(&fifo_peek_front(&j->pin).count)) {
> +		u64 seq = journal_last_seq(j);
> +
> +		if (seq > j->seq_ondisk &&
> +		    journal_state_count(j->reservations, seq & JOURNAL_BUF_MASK))
> +			break;
> +
>  		fifo_pop(&j->pin, temp);
>  		popped = true;
>  	}
> 

So my first reaction once I figured out what was going on was to
consider something like this, because it seemed like the issue was
premature release of the journal tail. After digging into that a bit it
didn't quite feel right, which is what had me digging further into why
the ref count was zero on a seq with outstanding reservation.

I think I see what the above is doing, but I really don't see how this
is simplified in any way. Maybe the code changes are slightly smaller,
but IMO the bigger picture is far easier to reason about when the
reference counts are properly honored. I.e., the outstanding
reservations hold references on the journal buffer associated with the
seq, and the active journal buffer holds a reference on the pin fifo.

Conversely, the above strikes me as unnecessarily clever based on the
fact that just to review it I need to grok behavior in several different
contexts (res and seq management, on disk seq updates, etc.) to convince
myself whether it's safe, and even then I'm not totally sure I have
enough context to say whether the state count check can't ever break on
something like a reused (for a new seq) journal buffer. Heh, the more I
think about that the more I have to ask: why not just make the reference
count work? ;P

FWIW, one thing I disliked about the original patch was the need to
mostly duplicate the buf put helper due to the locking context quirk. I
was trying to avoid that, but I ran out of ideas before I wanted to move
on actually fixing the bug. My preference would be to address the
reference counting issue as is (to preserve design simplicity), and then
maybe think a bit harder about cleaning up the res put implementation if
the primary concern is that we feel like this starts to make things a
bit convoluted..

Brian


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-09-05 12:59       ` Brian Foster
@ 2023-09-06 19:07         ` Brian Foster
  2023-09-10  2:03           ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-09-06 19:07 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

On Tue, Sep 05, 2023 at 08:59:02AM -0400, Brian Foster wrote:
> On Sun, Sep 03, 2023 at 10:29:28PM -0400, Kent Overstreet wrote:
> > On Sun, Sep 03, 2023 at 05:18:12PM -0400, Kent Overstreet wrote:
> > > On Thu, Aug 31, 2023 at 07:07:34AM -0400, Brian Foster wrote:
> > > > bcachefs freeze testing via fstests generic/390 occasionally
> > > > reproduces the following BUG from bch2_fs_read_only():
> > > > 
> > > >   BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
> > >
> > > Your fix makes sense, but I wonder if it might be simpler and better to
> > > fix this in bch2_journal_reclaim_fast().
...
> 
> FWIW, one thing I disliked about the original patch was the need to
> mostly duplicate the buf put helper due to the locking context quirk. I
> was trying to avoid that, but I ran out of ideas before I wanted to move
> on actually fixing the bug. My preference would be to address the
> reference counting issue as is (to preserve design simplicity), and then
> maybe think a bit harder about cleaning up the res put implementation if
> the primary concern is that we feel like this starts to make things a
> bit convoluted..
> 

Another thought that comes to mind is to perhaps allow the journal_res
to hold a reference to the pin fifo for the associated seq.. The idea
would be we could continue to hold a reference during the open/close
journal buffer lifecycle, but a res of the same seq would acquire an
additional reference as well to keep the tail from popping before a
transaction can actually set a pin (i.e. essentially parallel to the
buffer reference).

In a sense the behavior in this patch is kind of an optimization of that
approach, but I think implementing it directly would eliminate the need
to mostly duplicate the put helper, perhaps making things a bit more
natural. We could still fall into bch2_journal_reclaim_fast() from
res_put(), but now only for cases where the reservation outlives the
active journal buffer (due to a journal flush or whatever). That also
still addresses the race by properly using the reference count. I'd
probably have to try it to be sure I'm not missing something, but...
thoughts?

Brian


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put
  2023-09-05 12:41     ` Brian Foster
@ 2023-09-10  1:55       ` Kent Overstreet
  0 siblings, 0 replies; 14+ messages in thread
From: Kent Overstreet @ 2023-09-10  1:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

On Tue, Sep 05, 2023 at 08:41:44AM -0400, Brian Foster wrote:
> On Sun, Sep 03, 2023 at 05:09:04PM -0400, Kent Overstreet wrote:
> > On Thu, Aug 31, 2023 at 07:07:33AM -0400, Brian Foster wrote:
> > > bcachefs freeze testing has uncovered some raciness between journal
> > > entry open/close and pin list reference count management. The
> > > details of the problem are described in a separate patch. In
> > > preparation for the associated fix, refactor the journal buffer put
> > > path a bit to allow it to eventually handle dropping the pin list
> > > reference currently held by an open journal entry. Open code the
> > > journal write dispatch since it is essentially a single line and
> > > instead factor out the journal state update to be reused, consistent
> > > with other journal state update helpers. No functional changes in
> > > this patch.
> > 
> > So, I don't want to take this patch as is because even though
> > __bch2_journal_buf_put() is small, you're putting it into an inline
> > fastpath that we want to be as small and fast as possible - the main
> > transaction commit path.
> > 
> 
> Can you elaborate on what you mean here? ISTM all this patch does is
> replace a function call to __bch2_journal_buf_put() that already
> exists..

Yes, you made the code in __bch2_journal_buf_put() inline. Since that's
the slowpath, we don't want it inline...

> 
> Brian
> 
> > journal_state_buf_put() is fine though, happy to take that part of the
> > patch.
> > 
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/bcachefs/journal.c |  9 ---------
> > >  fs/bcachefs/journal.h | 22 +++++++++++++++++-----
> > >  2 files changed, 17 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c
> > > index 055920c26da6..76ebcdb3e3b4 100644
> > > --- a/fs/bcachefs/journal.c
> > > +++ b/fs/bcachefs/journal.c
> > > @@ -132,15 +132,6 @@ journal_error_check_stuck(struct journal *j, int error, unsigned flags)
> > >  	return stuck;
> > >  }
> > >  
> > > -/* journal entry close/open: */
> > > -
> > > -void __bch2_journal_buf_put(struct journal *j)
> > > -{
> > > -	struct bch_fs *c = container_of(j, struct bch_fs, journal);
> > > -
> > > -	closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
> > > -}
> > > -
> > >  /*
> > >   * Returns true if journal entry is now closed:
> > >   *
> > > diff --git a/fs/bcachefs/journal.h b/fs/bcachefs/journal.h
> > > index 008a2e25a4fa..c7a31da077c9 100644
> > > --- a/fs/bcachefs/journal.h
> > > +++ b/fs/bcachefs/journal.h
> > > @@ -252,9 +252,10 @@ static inline bool journal_entry_empty(struct jset *j)
> > >  	return true;
> > >  }
> > >  
> > > -void __bch2_journal_buf_put(struct journal *);
> > > -
> > > -static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> > > +/*
> > > + * Drop reference on a buffer index and return true if the count has hit zero.
> > > + */
> > > +static inline union journal_res_state journal_state_buf_put(struct journal *j, unsigned idx)
> > >  {
> > >  	union journal_res_state s;
> > >  
> > > @@ -264,9 +265,20 @@ static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> > >  				    .buf2_count = idx == 2,
> > >  				    .buf3_count = idx == 3,
> > >  				    }).v, &j->reservations.counter);
> > > +	return s;
> > > +}
> > >  
> > > -	if (!journal_state_count(s, idx) && idx == s.unwritten_idx)
> > > -		__bch2_journal_buf_put(j);
> > > +void bch2_journal_write(struct closure *);
> > > +static inline void bch2_journal_buf_put(struct journal *j, unsigned idx)
> > > +{
> > > +	struct bch_fs *c = container_of(j, struct bch_fs, journal);
> > > +	union journal_res_state s;
> > > +
> > > +	s = journal_state_buf_put(j, idx);
> > > +	if (!journal_state_count(s, idx)) {
> > > +		if (idx == s.unwritten_idx)
> > > +			closure_call(&j->io, bch2_journal_write, c->io_complete_wq, NULL);
> > > +	}
> > >  }
> > >  
> > >  /*
> > > -- 
> > > 2.41.0
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-09-06 19:07         ` Brian Foster
@ 2023-09-10  2:03           ` Kent Overstreet
  2023-09-12 18:54             ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2023-09-10  2:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

On Wed, Sep 06, 2023 at 03:07:58PM -0400, Brian Foster wrote:
> On Tue, Sep 05, 2023 at 08:59:02AM -0400, Brian Foster wrote:
> > On Sun, Sep 03, 2023 at 10:29:28PM -0400, Kent Overstreet wrote:
> > > On Sun, Sep 03, 2023 at 05:18:12PM -0400, Kent Overstreet wrote:
> > > > On Thu, Aug 31, 2023 at 07:07:34AM -0400, Brian Foster wrote:
> > > > > bcachefs freeze testing via fstests generic/390 occasionally
> > > > > reproduces the following BUG from bch2_fs_read_only():
> > > > > 
> > > > >   BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
> > > >
> > > > Your fix makes sense, but I wonder if it might be simpler and better to
> > > > fix this in bch2_journal_reclaim_fast().
> ...
> > 
> > FWIW, one thing I disliked about the original patch was the need to
> > mostly duplicate the buf put helper due to the locking context quirk. I
> > was trying to avoid that, but I ran out of ideas before I wanted to move
> > on actually fixing the bug. My preference would be to address the
> > reference counting issue as is (to preserve design simplicity), and then
> > maybe think a bit harder about cleaning up the res put implementation if
> > the primary concern is that we feel like this starts to make things a
> > bit convoluted..
> > 
> 
> Another thought that comes to mind is to perhaps allow the journal_res
> to hold a reference to the pin fifo for the associated seq.. The idea
> would be we could continue to hold a reference during the open/close
> journal buffer lifecycle, but a res of the same seq would acquire an
> additional reference as well to keep the tail from popping before a
> transaction can actually set a pin (i.e. essentially parallel to the
> buffer reference).

Yeah that would probably be cleanest - but it would be much too heavy.

> In a sense the behavior in this patch is kind of an optimization of that
> approach, but I think implementing it directly would eliminate the need
> to mostly duplicate the put helper, perhaps making things a bit more
> natural. We could still fall into bch2_journal_reclaim_fast() from
> res_put(), but now only for cases where the reservation outlives the
> active journal buffer (due to a journal flush or whatever). That also
> still addresses the race by properly using the reference count. I'd
> probably have to try it to be sure I'm not missing something, but...
> thoughts?

We can do your approach if that's what you feel is going to be cleanest,
but if we go that way here's my two main concerns:

 - we can't put more slowpath code into the inline fastpath, remember
   this is all inlined into the main __bch2_trans_commit() path

 - your approach (the optimized version) means there's a new state
   transition where we do work, i.e. when journal_state_count() hits 0.

   So we have to make sure that that refcount isn't hitting 0 multiple
   times. The appropriate assertion already exists in
   journal_res_get_fast() - for a refcount to hit 0 multiple times it
   must leave 0 while in use, and we check for that.

   Let's just add a comment by that EBUG_ON(!journal_state_count(new,
   new.idx)) line indicating why it's now important.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-09-10  2:03           ` Kent Overstreet
@ 2023-09-12 18:54             ` Brian Foster
  2023-09-12 19:15               ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-09-12 18:54 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

On Sat, Sep 09, 2023 at 10:03:04PM -0400, Kent Overstreet wrote:
> On Wed, Sep 06, 2023 at 03:07:58PM -0400, Brian Foster wrote:
> > On Tue, Sep 05, 2023 at 08:59:02AM -0400, Brian Foster wrote:
> > > On Sun, Sep 03, 2023 at 10:29:28PM -0400, Kent Overstreet wrote:
> > > > On Sun, Sep 03, 2023 at 05:18:12PM -0400, Kent Overstreet wrote:
> > > > > On Thu, Aug 31, 2023 at 07:07:34AM -0400, Brian Foster wrote:
> > > > > > bcachefs freeze testing via fstests generic/390 occasionally
> > > > > > reproduces the following BUG from bch2_fs_read_only():
> > > > > > 
> > > > > >   BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
> > > > >
> > > > > Your fix makes sense, but I wonder if it might be simpler and better to
> > > > > fix this in bch2_journal_reclaim_fast().
> > ...
> > > 
> > > FWIW, one thing I disliked about the original patch was the need to
> > > mostly duplicate the buf put helper due to the locking context quirk. I
> > > was trying to avoid that, but I ran out of ideas before I wanted to move
> > > on actually fixing the bug. My preference would be to address the
> > > reference counting issue as is (to preserve design simplicity), and then
> > > maybe think a bit harder about cleaning up the res put implementation if
> > > the primary concern is that we feel like this starts to make things a
> > > bit convoluted..
> > > 
> > 
> > Another thought that comes to mind is to perhaps allow the journal_res
> > to hold a reference to the pin fifo for the associated seq.. The idea
> > would be we could continue to hold a reference during the open/close
> > journal buffer lifecycle, but a res of the same seq would acquire an
> > additional reference as well to keep the tail from popping before a
> > transaction can actually set a pin (i.e. essentially parallel to the
> > buffer reference).
> 
> Yeah that would probably be cleanest - but it would be much too heavy.
> 

I'm curious how so? Additional atomics? 

If so, we could also consider something that allows a pin list ref on
the reservation to simply transfer to a pin set via the transaction.
That might not add any new overhead over the current code, but would
require some plumbing.

> > In a sense the behavior in this patch is kind of an optimization of that
> > approach, but I think implementing it directly would eliminate the need
> > to mostly duplicate the put helper, perhaps making things a bit more
> > natural. We could still fall into bch2_journal_reclaim_fast() from
> > res_put(), but now only for cases where the reservation outlives the
> > active journal buffer (due to a journal flush or whatever). That also
> > still addresses the race by properly using the reference count. I'd
> > probably have to try it to be sure I'm not missing something, but...
> > thoughts?
> 
> We can do your approach if that's what you feel is going to be cleanest,
> but if we go that way here's my two main concerns:
> 

Ok..

FWIW, it's not so much that I think it's the cleanest (I agree that the
variant discussed above probably ends up cleaner code-wise), but rather
that it provides sufficient confidence we won't just have to revisit the
same underlying problem the next time some bit of code assumes that an
outstanding reservation guarantees a usable pin list.

>  - we can't put more slowpath code into the inline fastpath, remember
>    this is all inlined into the main __bch2_trans_commit() path
> 

Sure. I missed that closure_call() was itself inlined. I think that is
what initially confused me wrt to the feedback on patch 2. I'll try to
keep that helper external and perhaps just rename it.

>  - your approach (the optimized version) means there's a new state
>    transition where we do work, i.e. when journal_state_count() hits 0.
> 

Indeed.

>    So we have to make sure that that refcount isn't hitting 0 multiple
>    times. The appropriate assertion already exists in
>    journal_res_get_fast() - for a refcount to hit 0 multiple times it
>    must leave 0 while in use, and we check for that.
> 
>    Let's just add a comment by that EBUG_ON(!journal_state_count(new,
>    new.idx)) line indicating why it's now important.
> 

Sounds reasonable. I'll try to come up with something. Thanks.

Brian


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] bcachefs: fix race between journal entry close and pin set
  2023-09-12 18:54             ` Brian Foster
@ 2023-09-12 19:15               ` Kent Overstreet
  0 siblings, 0 replies; 14+ messages in thread
From: Kent Overstreet @ 2023-09-12 19:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

On Tue, Sep 12, 2023 at 02:54:34PM -0400, Brian Foster wrote:
> On Sat, Sep 09, 2023 at 10:03:04PM -0400, Kent Overstreet wrote:
> > On Wed, Sep 06, 2023 at 03:07:58PM -0400, Brian Foster wrote:
> > > On Tue, Sep 05, 2023 at 08:59:02AM -0400, Brian Foster wrote:
> > > > On Sun, Sep 03, 2023 at 10:29:28PM -0400, Kent Overstreet wrote:
> > > > > On Sun, Sep 03, 2023 at 05:18:12PM -0400, Kent Overstreet wrote:
> > > > > > On Thu, Aug 31, 2023 at 07:07:34AM -0400, Brian Foster wrote:
> > > > > > > bcachefs freeze testing via fstests generic/390 occasionally
> > > > > > > reproduces the following BUG from bch2_fs_read_only():
> > > > > > > 
> > > > > > >   BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
> > > > > >
> > > > > > Your fix makes sense, but I wonder if it might be simpler and better to
> > > > > > fix this in bch2_journal_reclaim_fast().
> > > ...
> > > > 
> > > > FWIW, one thing I disliked about the original patch was the need to
> > > > mostly duplicate the buf put helper due to the locking context quirk. I
> > > > was trying to avoid that, but I ran out of ideas before I wanted to move
> > > > on actually fixing the bug. My preference would be to address the
> > > > reference counting issue as is (to preserve design simplicity), and then
> > > > maybe think a bit harder about cleaning up the res put implementation if
> > > > the primary concern is that we feel like this starts to make things a
> > > > bit convoluted..
> > > > 
> > > 
> > > Another thought that comes to mind is to perhaps allow the journal_res
> > > to hold a reference to the pin fifo for the associated seq.. The idea
> > > would be we could continue to hold a reference during the open/close
> > > journal buffer lifecycle, but a res of the same seq would acquire an
> > > additional reference as well to keep the tail from popping before a
> > > transaction can actually set a pin (i.e. essentially parallel to the
> > > buffer reference).
> > 
> > Yeah that would probably be cleanest - but it would be much too heavy.
> > 
> 
> I'm curious how so? Additional atomics? 

Not just that, getting a journal reservation is lockless but getting a
journal pin is not.

> >    So we have to make sure that that refcount isn't hitting 0 multiple
> >    times. The appropriate assertion already exists in
> >    journal_res_get_fast() - for a refcount to hit 0 multiple times it
> >    must leave 0 while in use, and we check for that.
> > 
> >    Let's just add a comment by that EBUG_ON(!journal_state_count(new,
> >    new.idx)) line indicating why it's now important.
> > 
> 
> Sounds reasonable. I'll try to come up with something. Thanks.

Sounds good

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-09-12 19:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-31 11:07 [PATCH 0/3] bcachefs: journal bug fixes Brian Foster
2023-08-31 11:07 ` [PATCH 1/3] bcachefs: restart journal reclaim thread on ro->rw transitions Brian Foster
2023-08-31 11:07 ` [PATCH 2/3] bcachefs: prepare journal buf put to handle pin put Brian Foster
2023-09-03 21:09   ` Kent Overstreet
2023-09-05 12:41     ` Brian Foster
2023-09-10  1:55       ` Kent Overstreet
2023-08-31 11:07 ` [PATCH 3/3] bcachefs: fix race between journal entry close and pin set Brian Foster
2023-09-03 21:18   ` Kent Overstreet
2023-09-04  2:29     ` Kent Overstreet
2023-09-05 12:59       ` Brian Foster
2023-09-06 19:07         ` Brian Foster
2023-09-10  2:03           ` Kent Overstreet
2023-09-12 18:54             ` Brian Foster
2023-09-12 19:15               ` Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox