public inbox for linux-bcache@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Revert "bcache: fix improper use of bi_end_io"
@ 2026-01-13  6:09 colyli
  2026-01-13  8:07 ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: colyli @ 2026-01-13  6:09 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-bcache, Coly Li, Christoph Hellwig,
	Shida Zhang, Kent Overstreet

From: Coly Li <colyli@fnnas.com>

This reverts commit 53280e398471f0bddbb17b798a63d41264651325.

The above commit tries to address the race in bio chain handling,
but it seems in detached_dev_end_io() simply using bio_endio() to
replace bio->bi_end_io() may introduce potential regression.

This patch revert the commit, let's wait for better fix from Shida.

Signed-off-by: Coly Li <colyli@fnnas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Shida Zhang <zhangshida@kylinos.cn>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
---
 drivers/md/bcache/request.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 82fdea7dea7a..af345dc6fde1 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1104,7 +1104,7 @@ static void detached_dev_end_io(struct bio *bio)
 	}
 
 	kfree(ddip);
-	bio_endio(bio);
+	bio->bi_end_io(bio);
 }
 
 static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
@@ -1121,7 +1121,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
 	ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
 	if (!ddip) {
 		bio->bi_status = BLK_STS_RESOURCE;
-		bio_endio(bio);
+		bio->bi_end_io(bio);
 		return;
 	}
 
@@ -1136,7 +1136,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
 
 	if ((bio_op(bio) == REQ_OP_DISCARD) &&
 	    !bdev_max_discard_sectors(dc->bdev))
-		detached_dev_end_io(bio);
+		bio->bi_end_io(bio);
 	else
 		submit_bio_noacct(bio);
 }
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  6:09 [PATCH] Revert "bcache: fix improper use of bi_end_io" colyli
@ 2026-01-13  8:07 ` Christoph Hellwig
  2026-01-13  8:30   ` Kent Overstreet
  2026-01-13 16:22   ` Coly Li
  0 siblings, 2 replies; 18+ messages in thread
From: Christoph Hellwig @ 2026-01-13  8:07 UTC (permalink / raw)
  To: colyli
  Cc: axboe, linux-block, linux-bcache, Christoph Hellwig, Shida Zhang,
	Kent Overstreet

On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote:
> From: Coly Li <colyli@fnnas.com>
> 
> This reverts commit 53280e398471f0bddbb17b798a63d41264651325.
> 
> The above commit tries to address the race in bio chain handling,
> but it seems in detached_dev_end_io() simply using bio_endio() to
> replace bio->bi_end_io() may introduce potential regression.
> 
> This patch revert the commit, let's wait for better fix from Shida.

That's a pretty vague commit message for reverting a clear API
violation that has caused trouble.  What is the story here?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  8:07 ` Christoph Hellwig
@ 2026-01-13  8:30   ` Kent Overstreet
  2026-01-13  8:34     ` Christoph Hellwig
  2026-01-13 16:22   ` Coly Li
  1 sibling, 1 reply; 18+ messages in thread
From: Kent Overstreet @ 2026-01-13  8:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang

On Tue, Jan 13, 2026 at 12:07:54AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote:
> > From: Coly Li <colyli@fnnas.com>
> > 
> > This reverts commit 53280e398471f0bddbb17b798a63d41264651325.
> > 
> > The above commit tries to address the race in bio chain handling,
> > but it seems in detached_dev_end_io() simply using bio_endio() to
> > replace bio->bi_end_io() may introduce potential regression.
> > 
> > This patch revert the commit, let's wait for better fix from Shida.
> 
> That's a pretty vague commit message for reverting a clear API
> violation that has caused trouble.  What is the story here?

Christoph, you can't call bio_endio() on the same bio twice. You should
know this. Calling a bare bi_end_io function is the correct thing to do
when we're getting called from bio_endio().

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  8:30   ` Kent Overstreet
@ 2026-01-13  8:34     ` Christoph Hellwig
  2026-01-13  8:39       ` Kent Overstreet
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2026-01-13  8:34 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, colyli, axboe, linux-block, linux-bcache,
	Shida Zhang

On Tue, Jan 13, 2026 at 03:30:41AM -0500, Kent Overstreet wrote:
> On Tue, Jan 13, 2026 at 12:07:54AM -0800, Christoph Hellwig wrote:
> > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote:
> > > From: Coly Li <colyli@fnnas.com>
> > > 
> > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325.
> > > 
> > > The above commit tries to address the race in bio chain handling,
> > > but it seems in detached_dev_end_io() simply using bio_endio() to
> > > replace bio->bi_end_io() may introduce potential regression.
> > > 
> > > This patch revert the commit, let's wait for better fix from Shida.
> > 
> > That's a pretty vague commit message for reverting a clear API
> > violation that has caused trouble.  What is the story here?
> 
> Christoph, you can't call bio_endio() on the same bio twice. You should
> know this. Calling a bare bi_end_io function is the correct thing to do
> when we're getting called from bio_endio().

Hi Kent,

indeed, calling bio_endio() twice is a very bad idea.  Nothing in the
quoted commit log indicates that is the case, though.  If that is
the problem it needs to be fixed, but calling ->bi_end_io directly
is not the proper fix either.  That's eaxtly why I'm asking for the
story behind this.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  8:34     ` Christoph Hellwig
@ 2026-01-13  8:39       ` Kent Overstreet
  2026-01-13  8:58         ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Kent Overstreet @ 2026-01-13  8:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang

On Tue, Jan 13, 2026 at 12:34:36AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 03:30:41AM -0500, Kent Overstreet wrote:
> > On Tue, Jan 13, 2026 at 12:07:54AM -0800, Christoph Hellwig wrote:
> > > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote:
> > > > From: Coly Li <colyli@fnnas.com>
> > > > 
> > > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325.
> > > > 
> > > > The above commit tries to address the race in bio chain handling,
> > > > but it seems in detached_dev_end_io() simply using bio_endio() to
> > > > replace bio->bi_end_io() may introduce potential regression.
> > > > 
> > > > This patch revert the commit, let's wait for better fix from Shida.
> > > 
> > > That's a pretty vague commit message for reverting a clear API
> > > violation that has caused trouble.  What is the story here?
> > 
> > Christoph, you can't call bio_endio() on the same bio twice. You should
> > know this. Calling a bare bi_end_io function is the correct thing to do
> > when we're getting called from bio_endio().
> 
> Hi Kent,
> 
> indeed, calling bio_endio() twice is a very bad idea.  Nothing in the
> quoted commit log indicates that is the case, though.  If that is
> the problem it needs to be fixed, but calling ->bi_end_io directly
> is not the proper fix either.  That's eaxtly why I'm asking for the
> story behind this.

No. The original (buggy) patch has your name on it in the suggested-by,
you should have done your homework.

This almost made it into a bunch of stable releases, where it would have
exploded as soon as it hit actual users, and I would've gotten some of
those bug reports.

You need to reexamine your priorities here.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  8:39       ` Kent Overstreet
@ 2026-01-13  8:58         ` Christoph Hellwig
  2026-01-13  9:27           ` Kent Overstreet
  2026-01-13 16:18           ` Coly Li
  0 siblings, 2 replies; 18+ messages in thread
From: Christoph Hellwig @ 2026-01-13  8:58 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, colyli, axboe, linux-block, linux-bcache,
	Shida Zhang

On Tue, Jan 13, 2026 at 03:39:28AM -0500, Kent Overstreet wrote:
> No. The original (buggy) patch has your name on it in the suggested-by,
> you should have done your homework.

How about you stop being a dick?  And if you're talking about homework
do you own and lookup that thread, and help implementing the suggestions
for fixing an API abuse, or at least reviewing them instead of angrily
shouting at someone suggesting to fix things.

> You need to reexamine your priorities here.

Maybe my priority should be to better ignore you..

Anyway, Coly: I think the issue is that bcache tries to do a silly
shortcut and reuses the bio for multiple layers of I/O which still trying
to hook into completions for both.  Something like the (untested) patch
below fixes that by cloning the bio and making everything work as
expected.  I'm not sure how dead lock safe even the original version is,
and my suspicion is that it needs a bio_set.  Of course even suggesting
something will probably get me in trouble so take it with a grain of
salt.

---
From 1a2336f617f2e351564ec20e4db9727584e04aa9 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 13 Jan 2026 09:53:34 +0100
Subject: bcache: clone bio in detached_dev_do_request

Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/bcache/request.c | 72 ++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 82fdea7dea7a..9e7b59121313 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata)
 }
 
 struct detached_dev_io_private {
-	struct bcache_device	*d;
 	unsigned long		start_time;
-	bio_end_io_t		*bi_end_io;
-	void			*bi_private;
-	struct block_device	*orig_bdev;
+	struct bio		*orig_bio;
+	struct bio		bio;
 };
 
 static void detached_dev_end_io(struct bio *bio)
 {
-	struct detached_dev_io_private *ddip;
-
-	ddip = bio->bi_private;
-	bio->bi_end_io = ddip->bi_end_io;
-	bio->bi_private = ddip->bi_private;
+	struct detached_dev_io_private *ddip =
+		container_of(bio, struct detached_dev_io_private, bio);
+	struct bio *orig_bio = ddip->orig_bio;
 
 	/* Count on the bcache device */
-	bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev);
+	bio_end_io_acct(orig_bio, ddip->start_time);
 
 	if (bio->bi_status) {
-		struct cached_dev *dc = container_of(ddip->d,
-						     struct cached_dev, disk);
+		struct cached_dev *dc = bio->bi_private;
+
 		/* should count I/O error for backing device here */
 		bch_count_backing_io_errors(dc, bio);
+		orig_bio->bi_status = bio->bi_status;
 	}
 
 	kfree(ddip);
-	bio_endio(bio);
+	bio_endio(orig_bio);
 }
 
-static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
-		struct block_device *orig_bdev, unsigned long start_time)
+static void detached_dev_do_request(struct bcache_device *d,
+		struct bio *orig_bio, unsigned long start_time)
 {
 	struct detached_dev_io_private *ddip;
 	struct cached_dev *dc = container_of(d, struct cached_dev, disk);
 
+	if (bio_op(orig_bio) == REQ_OP_DISCARD &&
+	    !bdev_max_discard_sectors(dc->bdev)) {
+		bio_endio(orig_bio);
+		return;
+	}
+
 	/*
 	 * no need to call closure_get(&dc->disk.cl),
 	 * because upper layer had already opened bcache device,
 	 * which would call closure_get(&dc->disk.cl)
 	 */
 	ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
-	if (!ddip) {
-		bio->bi_status = BLK_STS_RESOURCE;
-		bio_endio(bio);
-		return;
-	}
-
-	ddip->d = d;
+	if (!ddip)
+		goto enomem;
+	if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO))
+		goto free_ddip;
 	/* Count on the bcache device */
-	ddip->orig_bdev = orig_bdev;
 	ddip->start_time = start_time;
-	ddip->bi_end_io = bio->bi_end_io;
-	ddip->bi_private = bio->bi_private;
-	bio->bi_end_io = detached_dev_end_io;
-	bio->bi_private = ddip;
-
-	if ((bio_op(bio) == REQ_OP_DISCARD) &&
-	    !bdev_max_discard_sectors(dc->bdev))
-		detached_dev_end_io(bio);
-	else
-		submit_bio_noacct(bio);
+	ddip->orig_bio = orig_bio;
+	ddip->bio.bi_end_io = detached_dev_end_io;
+	ddip->bio.bi_private = dc;
+	submit_bio_noacct(&ddip->bio);
+	return;
+free_ddip:
+	kfree(ddip);
+enomem:
+	orig_bio->bi_status = BLK_STS_RESOURCE;
+	bio_endio(orig_bio);
 }
 
 static void quit_max_writeback_rate(struct cache_set *c,
@@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio)
 
 	start_time = bio_start_io_acct(bio);
 
-	bio_set_dev(bio, dc->bdev);
 	bio->bi_iter.bi_sector += dc->sb.data_offset;
 
 	if (cached_dev_get(dc)) {
+		bio_set_dev(bio, dc->bdev);
 		s = search_alloc(bio, d, orig_bdev, start_time);
 		trace_bcache_request_start(s->d, bio);
 
@@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio)
 			else
 				cached_dev_read(dc, s);
 		}
-	} else
+	} else {
 		/* I/O request sent to backing device */
-		detached_dev_do_request(d, bio, orig_bdev, start_time);
+		detached_dev_do_request(d, bio, start_time);
+	}
 }
 
 static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  8:58         ` Christoph Hellwig
@ 2026-01-13  9:27           ` Kent Overstreet
  2026-01-13 15:00             ` Christoph Hellwig
  2026-01-13 16:18           ` Coly Li
  1 sibling, 1 reply; 18+ messages in thread
From: Kent Overstreet @ 2026-01-13  9:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang

On Tue, Jan 13, 2026 at 12:58:46AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 03:39:28AM -0500, Kent Overstreet wrote:
> > No. The original (buggy) patch has your name on it in the suggested-by,
> > you should have done your homework.
> 
> How about you stop being a dick?  And if you're talking about homework
> do you own and lookup that thread, and help implementing the suggestions
> for fixing an API abuse, or at least reviewing them instead of angrily
> shouting at someone suggesting to fix things.

It's not being a dick to tell someone they need to slow down and be more
careful when they're doing something broken and arguing over the revert.

The priority is keeping things working for the end user. Show me that
you understand that point, and can slow down and be more careful - so I
don't have to take time out of my day when I've got other things that I
need to be working on - and I'll be able to engage in a much more
friendly manner.

> Anyway, Coly: I think the issue is that bcache tries to do a silly
> shortcut and reuses the bio for multiple layers of I/O which still trying
> to hook into completions for both.  Something like the (untested) patch
> below fixes that by cloning the bio and making everything work as
> expected.  I'm not sure how dead lock safe even the original version is,
> and my suspicion is that it needs a bio_set.  Of course even suggesting
> something will probably get me in trouble so take it with a grain of
> salt.

Cloning the bio is completely fine - that's a cleaner and more modern
approach.

I'd still like to know why you consider the bare bi_end_io calls
problematic. If it's just that you want the code clean so you can grep
for actual issues, that's an acceptable answer. Historically that's been
fine, and I haven't seen it cause issues, so I would like to be
enlightened on that point.

On a side note, I just yesterday had to shoot down another broken "fix"
for memory allocation profiling - where it, again, turned out that the
submitter hadn't confirmed anything he was basing his patch on by
testing.

One of the big factors in why I was ok with continuing bcachefs
development outside the kernel is that when getting patch submissions
from the kernel community, I repeatedly had to tell people that yes,
testing their code is, in fact their responsibility, and I even had
people argue with me over it - and at some point I realized that this
has literally never been an issue with submissions from people outside
the kernel community.

This is something that really needs to change.

> 
> ---
> From 1a2336f617f2e351564ec20e4db9727584e04aa9 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Tue, 13 Jan 2026 09:53:34 +0100
> Subject: bcache: clone bio in detached_dev_do_request
> 
> Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/md/bcache/request.c | 72 ++++++++++++++++++-------------------
>  1 file changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 82fdea7dea7a..9e7b59121313 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata)
>  }
>  
>  struct detached_dev_io_private {
> -	struct bcache_device	*d;
>  	unsigned long		start_time;
> -	bio_end_io_t		*bi_end_io;
> -	void			*bi_private;
> -	struct block_device	*orig_bdev;
> +	struct bio		*orig_bio;
> +	struct bio		bio;
>  };
>  
>  static void detached_dev_end_io(struct bio *bio)
>  {
> -	struct detached_dev_io_private *ddip;
> -
> -	ddip = bio->bi_private;
> -	bio->bi_end_io = ddip->bi_end_io;
> -	bio->bi_private = ddip->bi_private;
> +	struct detached_dev_io_private *ddip =
> +		container_of(bio, struct detached_dev_io_private, bio);
> +	struct bio *orig_bio = ddip->orig_bio;
>  
>  	/* Count on the bcache device */
> -	bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev);
> +	bio_end_io_acct(orig_bio, ddip->start_time);
>  
>  	if (bio->bi_status) {
> -		struct cached_dev *dc = container_of(ddip->d,
> -						     struct cached_dev, disk);
> +		struct cached_dev *dc = bio->bi_private;
> +
>  		/* should count I/O error for backing device here */
>  		bch_count_backing_io_errors(dc, bio);
> +		orig_bio->bi_status = bio->bi_status;
>  	}
>  
>  	kfree(ddip);
> -	bio_endio(bio);
> +	bio_endio(orig_bio);
>  }
>  
> -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> -		struct block_device *orig_bdev, unsigned long start_time)
> +static void detached_dev_do_request(struct bcache_device *d,
> +		struct bio *orig_bio, unsigned long start_time)
>  {
>  	struct detached_dev_io_private *ddip;
>  	struct cached_dev *dc = container_of(d, struct cached_dev, disk);
>  
> +	if (bio_op(orig_bio) == REQ_OP_DISCARD &&
> +	    !bdev_max_discard_sectors(dc->bdev)) {
> +		bio_endio(orig_bio);
> +		return;
> +	}
> +
>  	/*
>  	 * no need to call closure_get(&dc->disk.cl),
>  	 * because upper layer had already opened bcache device,
>  	 * which would call closure_get(&dc->disk.cl)
>  	 */
>  	ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
> -	if (!ddip) {
> -		bio->bi_status = BLK_STS_RESOURCE;
> -		bio_endio(bio);
> -		return;
> -	}
> -
> -	ddip->d = d;
> +	if (!ddip)
> +		goto enomem;
> +	if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO))
> +		goto free_ddip;
>  	/* Count on the bcache device */
> -	ddip->orig_bdev = orig_bdev;
>  	ddip->start_time = start_time;
> -	ddip->bi_end_io = bio->bi_end_io;
> -	ddip->bi_private = bio->bi_private;
> -	bio->bi_end_io = detached_dev_end_io;
> -	bio->bi_private = ddip;
> -
> -	if ((bio_op(bio) == REQ_OP_DISCARD) &&
> -	    !bdev_max_discard_sectors(dc->bdev))
> -		detached_dev_end_io(bio);
> -	else
> -		submit_bio_noacct(bio);
> +	ddip->orig_bio = orig_bio;
> +	ddip->bio.bi_end_io = detached_dev_end_io;
> +	ddip->bio.bi_private = dc;
> +	submit_bio_noacct(&ddip->bio);
> +	return;
> +free_ddip:
> +	kfree(ddip);
> +enomem:
> +	orig_bio->bi_status = BLK_STS_RESOURCE;
> +	bio_endio(orig_bio);
>  }
>  
>  static void quit_max_writeback_rate(struct cache_set *c,
> @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio)
>  
>  	start_time = bio_start_io_acct(bio);
>  
> -	bio_set_dev(bio, dc->bdev);
>  	bio->bi_iter.bi_sector += dc->sb.data_offset;
>  
>  	if (cached_dev_get(dc)) {
> +		bio_set_dev(bio, dc->bdev);
>  		s = search_alloc(bio, d, orig_bdev, start_time);
>  		trace_bcache_request_start(s->d, bio);
>  
> @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio)
>  			else
>  				cached_dev_read(dc, s);
>  		}
> -	} else
> +	} else {
>  		/* I/O request sent to backing device */
> -		detached_dev_do_request(d, bio, orig_bdev, start_time);
> +		detached_dev_do_request(d, bio, start_time);
> +	}
>  }
>  
>  static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode,
> -- 
> 2.47.3
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  9:27           ` Kent Overstreet
@ 2026-01-13 15:00             ` Christoph Hellwig
  2026-01-13 15:30               ` Kent Overstreet
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2026-01-13 15:00 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, colyli, axboe, linux-block, linux-bcache,
	Shida Zhang

On Tue, Jan 13, 2026 at 04:27:19AM -0500, Kent Overstreet wrote:
> It's not being a dick to tell someone they need to slow down and be more
> careful when they're doing something broken and arguing over the revert.

Yes, you are.  I did not actually do anything here but point out that
bcache is broken.  If that fix made things worse it would be useful
if one of the maintainers or a CI system spots it.  You need to do your
job as maintainer instead of constantly attacking others.

Start with that, and then just try being a little nice to people.  Most
folks will respond nicely to questions, but less so when you start out
with an attack.  And until you learn that I have absolutely no interest
in wasting any further time on arguing with you in any form.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13 15:00             ` Christoph Hellwig
@ 2026-01-13 15:30               ` Kent Overstreet
  0 siblings, 0 replies; 18+ messages in thread
From: Kent Overstreet @ 2026-01-13 15:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang

On Tue, Jan 13, 2026 at 07:00:03AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 04:27:19AM -0500, Kent Overstreet wrote:
> > It's not being a dick to tell someone they need to slow down and be more
> > careful when they're doing something broken and arguing over the revert.
> 
> Yes, you are.  I did not actually do anything here but point out that
> bcache is broken.

There've been no bug reports, people are using the code and it works.
You keep claiming breakage, and I asked you to explain what you thought
was wrong with the bare bi_end_io call - but you still haven't, and I'm
not buying it because I know what the code in bio_endio() does - I wrote
some of it, after all, and none of it expects to be called twice on the
same bio.

You're making changes in code you didn't write, bypassing the
maintainers to get it in, without providing any real justification -
that's reckless. And you have a habit of calling things "broken"
without a good reason, so please cut that out.

If, as you claim, calling a bare bi_end_io function in this context is a
problem, you better be able to explain why.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  8:58         ` Christoph Hellwig
  2026-01-13  9:27           ` Kent Overstreet
@ 2026-01-13 16:18           ` Coly Li
  2026-01-13 16:26             ` Christoph Hellwig
  1 sibling, 1 reply; 18+ messages in thread
From: Coly Li @ 2026-01-13 16:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kent Overstreet, axboe, linux-block, linux-bcache, Shida Zhang

On Tue, Jan 13, 2026 at 12:58:46AM +0800, Christoph Hellwig wrote:
> Anyway, Coly: I think the issue is that bcache tries to do a silly
> shortcut and reuses the bio for multiple layers of I/O which still trying
> to hook into completions for both.  Something like the (untested) patch
> below fixes that by cloning the bio and making everything work as
> expected.  I'm not sure how dead lock safe even the original version is,
> and my suspicion is that it needs a bio_set.  Of course even suggesting
> something will probably get me in trouble so take it with a grain of
> salt.
> 
> ---
> From 1a2336f617f2e351564ec20e4db9727584e04aa9 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Tue, 13 Jan 2026 09:53:34 +0100
> Subject: bcache: clone bio in detached_dev_do_request
> 
> Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de>

Hi Christoph,

This cloned bio method looks good. Could you please post a formal patch?
Then I may replace the revert commit with your patch.

Thanks.

Coly Li

> ---
>  drivers/md/bcache/request.c | 72 ++++++++++++++++++-------------------
>  1 file changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 82fdea7dea7a..9e7b59121313 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata)
>  }
>  
>  struct detached_dev_io_private {
> -	struct bcache_device	*d;
>  	unsigned long		start_time;
> -	bio_end_io_t		*bi_end_io;
> -	void			*bi_private;
> -	struct block_device	*orig_bdev;
> +	struct bio		*orig_bio;
> +	struct bio		bio;
>  };
>  
>  static void detached_dev_end_io(struct bio *bio)
>  {
> -	struct detached_dev_io_private *ddip;
> -
> -	ddip = bio->bi_private;
> -	bio->bi_end_io = ddip->bi_end_io;
> -	bio->bi_private = ddip->bi_private;
> +	struct detached_dev_io_private *ddip =
> +		container_of(bio, struct detached_dev_io_private, bio);
> +	struct bio *orig_bio = ddip->orig_bio;
>  
>  	/* Count on the bcache device */
> -	bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev);
> +	bio_end_io_acct(orig_bio, ddip->start_time);
>  
>  	if (bio->bi_status) {
> -		struct cached_dev *dc = container_of(ddip->d,
> -						     struct cached_dev, disk);
> +		struct cached_dev *dc = bio->bi_private;
> +
>  		/* should count I/O error for backing device here */
>  		bch_count_backing_io_errors(dc, bio);
> +		orig_bio->bi_status = bio->bi_status;
>  	}
>  
>  	kfree(ddip);
> -	bio_endio(bio);
> +	bio_endio(orig_bio);
>  }
>  
> -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> -		struct block_device *orig_bdev, unsigned long start_time)
> +static void detached_dev_do_request(struct bcache_device *d,
> +		struct bio *orig_bio, unsigned long start_time)
>  {
>  	struct detached_dev_io_private *ddip;
>  	struct cached_dev *dc = container_of(d, struct cached_dev, disk);
>  
> +	if (bio_op(orig_bio) == REQ_OP_DISCARD &&
> +	    !bdev_max_discard_sectors(dc->bdev)) {
> +		bio_endio(orig_bio);
> +		return;
> +	}
> +
>  	/*
>  	 * no need to call closure_get(&dc->disk.cl),
>  	 * because upper layer had already opened bcache device,
>  	 * which would call closure_get(&dc->disk.cl)
>  	 */
>  	ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
> -	if (!ddip) {
> -		bio->bi_status = BLK_STS_RESOURCE;
> -		bio_endio(bio);
> -		return;
> -	}
> -
> -	ddip->d = d;
> +	if (!ddip)
> +		goto enomem;
> +	if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO))
> +		goto free_ddip;
>  	/* Count on the bcache device */
> -	ddip->orig_bdev = orig_bdev;
>  	ddip->start_time = start_time;
> -	ddip->bi_end_io = bio->bi_end_io;
> -	ddip->bi_private = bio->bi_private;
> -	bio->bi_end_io = detached_dev_end_io;
> -	bio->bi_private = ddip;
> -
> -	if ((bio_op(bio) == REQ_OP_DISCARD) &&
> -	    !bdev_max_discard_sectors(dc->bdev))
> -		detached_dev_end_io(bio);
> -	else
> -		submit_bio_noacct(bio);
> +	ddip->orig_bio = orig_bio;
> +	ddip->bio.bi_end_io = detached_dev_end_io;
> +	ddip->bio.bi_private = dc;
> +	submit_bio_noacct(&ddip->bio);
> +	return;
> +free_ddip:
> +	kfree(ddip);
> +enomem:
> +	orig_bio->bi_status = BLK_STS_RESOURCE;
> +	bio_endio(orig_bio);
>  }
>  
>  static void quit_max_writeback_rate(struct cache_set *c,
> @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio)
>  
>  	start_time = bio_start_io_acct(bio);
>  
> -	bio_set_dev(bio, dc->bdev);
>  	bio->bi_iter.bi_sector += dc->sb.data_offset;
>  
>  	if (cached_dev_get(dc)) {
> +		bio_set_dev(bio, dc->bdev);
>  		s = search_alloc(bio, d, orig_bdev, start_time);
>  		trace_bcache_request_start(s->d, bio);
>  
> @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio)
>  			else
>  				cached_dev_read(dc, s);
>  		}
> -	} else
> +	} else {
>  		/* I/O request sent to backing device */
> -		detached_dev_do_request(d, bio, orig_bdev, start_time);
> +		detached_dev_do_request(d, bio, start_time);
> +	}
>  }
>  
>  static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode,
> -- 
> 2.47.3

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13  8:07 ` Christoph Hellwig
  2026-01-13  8:30   ` Kent Overstreet
@ 2026-01-13 16:22   ` Coly Li
  2026-01-13 16:25     ` Christoph Hellwig
  1 sibling, 1 reply; 18+ messages in thread
From: Coly Li @ 2026-01-13 16:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, linux-block, linux-bcache, Shida Zhang, Kent Overstreet

On Tue, Jan 13, 2026 at 12:07:54AM +0800, Christoph Hellwig wrote:
> On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote:
> > From: Coly Li <colyli@fnnas.com>
> > 
> > This reverts commit 53280e398471f0bddbb17b798a63d41264651325.
> > 
> > The above commit tries to address the race in bio chain handling,
> > but it seems in detached_dev_end_io() simply using bio_endio() to
> > replace bio->bi_end_io() may introduce potential regression.
> > 
> > This patch revert the commit, let's wait for better fix from Shida.
> 
> That's a pretty vague commit message for reverting a clear API
> violation that has caused trouble.  What is the story here?
> 

The discussion happens in stable mailing list. Also I admitted my fault.
Though I didn't ack commit 53280e398471 ("bcache: fix improper use of
bi_end_io"), I read it and overlooked the bio_endio() duplicated entry
issue.

Coly Li

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13 16:22   ` Coly Li
@ 2026-01-13 16:25     ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2026-01-13 16:25 UTC (permalink / raw)
  To: Coly Li
  Cc: Christoph Hellwig, axboe, linux-block, linux-bcache, Shida Zhang,
	Kent Overstreet

On Wed, Jan 14, 2026 at 12:22:56AM +0800, Coly Li wrote:
> On Tue, Jan 13, 2026 at 12:07:54AM +0800, Christoph Hellwig wrote:
> > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote:
> > > From: Coly Li <colyli@fnnas.com>
> > > 
> > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325.
> > > 
> > > The above commit tries to address the race in bio chain handling,
> > > but it seems in detached_dev_end_io() simply using bio_endio() to
> > > replace bio->bi_end_io() may introduce potential regression.
> > > 
> > > This patch revert the commit, let's wait for better fix from Shida.
> > 
> > That's a pretty vague commit message for reverting a clear API
> > violation that has caused trouble.  What is the story here?
> > 
> 
> The discussion happens in stable mailing list. Also I admitted my fault.
> Though I didn't ack commit 53280e398471 ("bcache: fix improper use of
> bi_end_io"), I read it and overlooked the bio_endio() duplicated entry
> issue.

Please try to document the rationale in the commit logs, the current
one was extremely vague.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13 16:18           ` Coly Li
@ 2026-01-13 16:26             ` Christoph Hellwig
  2026-01-13 16:34               ` Kent Overstreet
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2026-01-13 16:26 UTC (permalink / raw)
  To: Coly Li
  Cc: Christoph Hellwig, Kent Overstreet, axboe, linux-block,
	linux-bcache, Shida Zhang

On Wed, Jan 14, 2026 at 12:18:45AM +0800, Coly Li wrote:
> Hi Christoph,
> 
> This cloned bio method looks good. Could you please post a formal patch?
> Then I may replace the revert commit with your patch.

As usual with bcache I don't really know how to test it to confidently
submit it.  Is there an official test suite I can run now.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13 16:26             ` Christoph Hellwig
@ 2026-01-13 16:34               ` Kent Overstreet
  2026-01-19  9:51                 ` Daniel Wagner
  0 siblings, 1 reply; 18+ messages in thread
From: Kent Overstreet @ 2026-01-13 16:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Coly Li, axboe, linux-block, linux-bcache, Shida Zhang

On Tue, Jan 13, 2026 at 08:26:07AM -0800, Christoph Hellwig wrote:
> On Wed, Jan 14, 2026 at 12:18:45AM +0800, Coly Li wrote:
> > Hi Christoph,
> > 
> > This cloned bio method looks good. Could you please post a formal patch?
> > Then I may replace the revert commit with your patch.
> 
> As usual with bcache I don't really know how to test it to confidently
> submit it.  Is there an official test suite I can run now.

Coly and I were just on a call discussing updating my old test suite. I
haven't used the bcache tests in > 10 years so they do need to be
updated, but the harness and related tooling is well supported both for
local development and has full CI.

https://evilpiepirate.org/git/ktest.git/tree/README.md
https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/

There's fio verify testing in various modes, reboot tests - there was
fault injection when I was using it but the fault injection code never
got merged into mainline.

I wish I could do the updating myself, but I'm neck deep in my own code,
but as I was just telling Coly I'm always available on IRC to answer
questions - irc.oftc.net #bcache.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-13 16:34               ` Kent Overstreet
@ 2026-01-19  9:51                 ` Daniel Wagner
  2026-01-19 10:18                   ` Kent Overstreet
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Wagner @ 2026-01-19  9:51 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache,
	Shida Zhang

On Tue, Jan 13, 2026 at 11:34:18AM -0500, Kent Overstreet wrote:
> Coly and I were just on a call discussing updating my old test suite. I
> haven't used the bcache tests in > 10 years so they do need to be
> updated, but the harness and related tooling is well supported both for
> local development and has full CI.
> 
> https://evilpiepirate.org/git/ktest.git/tree/README.md
> https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/

I just quickly look at the tests and I got the impression some of those
tests could be added to blktests. blktests is run by various people,
thus bcache would get some basic test exposure, e.g. in linux-next.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-19  9:51                 ` Daniel Wagner
@ 2026-01-19 10:18                   ` Kent Overstreet
  2026-01-19 10:34                     ` Daniel Wagner
  0 siblings, 1 reply; 18+ messages in thread
From: Kent Overstreet @ 2026-01-19 10:18 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache,
	Shida Zhang

On Mon, Jan 19, 2026 at 10:51:46AM +0100, Daniel Wagner wrote:
> On Tue, Jan 13, 2026 at 11:34:18AM -0500, Kent Overstreet wrote:
> > Coly and I were just on a call discussing updating my old test suite. I
> > haven't used the bcache tests in > 10 years so they do need to be
> > updated, but the harness and related tooling is well supported both for
> > local development and has full CI.
> > 
> > https://evilpiepirate.org/git/ktest.git/tree/README.md
> > https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/
> 
> I just quickly look at the tests and I got the impression some of those
> tests could be added to blktests. blktests is run by various people,
> thus bcache would get some basic test exposure, e.g. in linux-next.

ktest has features that blktests/fstests don't - it's a full testrunner,
with a CI and test dashboard, with subtest level sharding that runs on
entire cluster.

What would make sense would be for ktest to wrap blktests, like it
already does fstests.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-19 10:18                   ` Kent Overstreet
@ 2026-01-19 10:34                     ` Daniel Wagner
  2026-01-19 15:57                       ` Daniel Wagner
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Wagner @ 2026-01-19 10:34 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache,
	Shida Zhang

On Mon, Jan 19, 2026 at 05:18:15AM -0500, Kent Overstreet wrote:
> > > https://evilpiepirate.org/git/ktest.git/tree/README.md
> > > https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/
> > 
> > I just quickly look at the tests and I got the impression some of those
> > tests could be added to blktests. blktests is run by various people,
> > thus bcache would get some basic test exposure, e.g. in linux-next.
> 
> ktest has features that blktests/fstests don't - it's a full testrunner,
> with a CI and test dashboard, with subtest level sharding that runs on
> entire cluster.

That's why I said some of tests could be added directly to blktests,
e.g.

test_main()
{
    setup_tracing 'bcache:*'

    setup_bcache
    cset_uuid=$(ls -d /sys/fs/bcache/*-*-* | sed -e 's/.*\///')

    (
     	for i in $(seq 1 3); do
	    sleep 5
	    echo > /sys/block/bcache0/bcache/detach
	    echo "detach done"
	    sleep 5
	    echo $cset_uuid > /sys/block/bcache0/bcache/attach
	    echo "attach done"
	done
    )&

    run_antagonist
    run_fio
    stop_bcache
}

seems something which could also run in blktests. Sure, blktests doesn't
have all the features ktest has, but that is besides the point. The
bcache test suite is surely a good thing. Lately, blktests is getting
traction and run by more people and QA teams. My whole comment is that
adding some explicit bcache tests will get bcache more test exposure.

> What would make sense would be for ktest to wrap blktests, like it
> already does fstests.

Sure, not a problem.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io"
  2026-01-19 10:34                     ` Daniel Wagner
@ 2026-01-19 15:57                       ` Daniel Wagner
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Wagner @ 2026-01-19 15:57 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache,
	Shida Zhang

On Mon, Jan 19, 2026 at 11:34:15AM +0100, Daniel Wagner wrote:
> That's why I said some of tests could be added directly to blktests,

FWIW, I've started to work on an initial test case for bcache in
blktests. I shamelessly stole some helper/setup stuff from ktest to get
things rolling. But I think I am going to replace this as it doesn't
really match the blktests style for helpers and in the end it's not that
much we need to get it working anyway:

   https://github.com/igaw/blktests/pull/new/bcache

In case anyone wants to play with, you need to define
TEST_CASE_DEV_ARRAY:

# cat /root/config
TEST_CASE_DEV_ARRAY[bcache/*]="/dev/nvme0n1 /dev/nvme1n1 /dev/vda /dev/vdb"

# ./check -c /root/config bcache

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2026-01-19 15:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-13  6:09 [PATCH] Revert "bcache: fix improper use of bi_end_io" colyli
2026-01-13  8:07 ` Christoph Hellwig
2026-01-13  8:30   ` Kent Overstreet
2026-01-13  8:34     ` Christoph Hellwig
2026-01-13  8:39       ` Kent Overstreet
2026-01-13  8:58         ` Christoph Hellwig
2026-01-13  9:27           ` Kent Overstreet
2026-01-13 15:00             ` Christoph Hellwig
2026-01-13 15:30               ` Kent Overstreet
2026-01-13 16:18           ` Coly Li
2026-01-13 16:26             ` Christoph Hellwig
2026-01-13 16:34               ` Kent Overstreet
2026-01-19  9:51                 ` Daniel Wagner
2026-01-19 10:18                   ` Kent Overstreet
2026-01-19 10:34                     ` Daniel Wagner
2026-01-19 15:57                       ` Daniel Wagner
2026-01-13 16:22   ` Coly Li
2026-01-13 16:25     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox