* [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io
@ 2026-01-15 7:48 zhangshida
2026-01-15 8:06 ` Stephen Zhang
0 siblings, 1 reply; 13+ messages in thread
From: zhangshida @ 2026-01-15 7:48 UTC (permalink / raw)
To: colyli, kent.overstreet, axboe, sashal
Cc: linux-bcache, linux-kernel, zhangshida, starzhangzsd
From: Shida Zhang <zhangshida@kylinos.cn>
Commit 53280e398471 ("bcache: fix improper use of bi_end_io") attempted
to fix up bio completions by replacing manual bi_end_io calls with
bio_endio(). However, it introduced a double-completion bug in the
detached_dev path.
In a normal completion path, the call stack is:
blk_update_request
bio_endio(bio)
bio->bi_end_io(bio) -> detached_dev_end_io
bio_endio(bio) <- BUG: second call
To fix this, detached_dev_end_io() must manually call the next completion
handler in the chain.
However, in detached_dev_do_request(), if a discard is unsupported, the
bio is rejected before being submitted to the lower level. In this case,
we can use the standard bio_endio().
detached_dev_do_request
bio_endio(bio) <- Correct: starts completion for
unsubmitted bio
Fixes: 53280e398471 ("bcache: fix improper use of bi_end_io")
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
---
drivers/md/bcache/request.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 82fdea7dea7..ec712b5879f 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1104,7 +1104,14 @@ static void detached_dev_end_io(struct bio *bio)
}
kfree(ddip);
- bio_endio(bio);
+ /*
+ * This is an exception where bio_endio() cannot be used.
+ * We are already called from within a bio_endio() stack;
+ * calling it again here would result in a double-completion
+ * (decrementing bi_remaining twice). We must call the
+ * original completion routine directly.
+ */
+ bio->bi_end_io(bio);
}
static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
@@ -1136,7 +1143,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
if ((bio_op(bio) == REQ_OP_DISCARD) &&
!bdev_max_discard_sectors(dc->bdev))
- detached_dev_end_io(bio);
+ bio_endio(bio);
else
submit_bio_noacct(bio);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-15 7:48 [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io zhangshida @ 2026-01-15 8:06 ` Stephen Zhang 2026-01-15 8:29 ` Christoph Hellwig 2026-01-15 8:59 ` Kent Overstreet 0 siblings, 2 replies; 13+ messages in thread From: Stephen Zhang @ 2026-01-15 8:06 UTC (permalink / raw) To: colyli, kent.overstreet, axboe, sashal Cc: linux-bcache, linux-kernel, zhangshida zhangshida <starzhangzsd@gmail.com> 于2026年1月15日周四 15:48写道: > > From: Shida Zhang <zhangshida@kylinos.cn> > > Commit 53280e398471 ("bcache: fix improper use of bi_end_io") attempted > to fix up bio completions by replacing manual bi_end_io calls with > bio_endio(). However, it introduced a double-completion bug in the > detached_dev path. > > In a normal completion path, the call stack is: > blk_update_request > bio_endio(bio) > bio->bi_end_io(bio) -> detached_dev_end_io > bio_endio(bio) <- BUG: second call > > To fix this, detached_dev_end_io() must manually call the next completion > handler in the chain. > > However, in detached_dev_do_request(), if a discard is unsupported, the > bio is rejected before being submitted to the lower level. In this case, > we can use the standard bio_endio(). > > detached_dev_do_request > bio_endio(bio) <- Correct: starts completion for > unsubmitted bio > > Fixes: 53280e398471 ("bcache: fix improper use of bi_end_io") > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > --- > drivers/md/bcache/request.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > index 82fdea7dea7..ec712b5879f 100644 > --- a/drivers/md/bcache/request.c > +++ b/drivers/md/bcache/request.c > @@ -1104,7 +1104,14 @@ static void detached_dev_end_io(struct bio *bio) > } > > kfree(ddip); > - bio_endio(bio); > + /* > + * This is an exception where bio_endio() cannot be used. > + * We are already called from within a bio_endio() stack; > + * calling it again here would result in a double-completion > + * (decrementing bi_remaining twice). We must call the > + * original completion routine directly. > + */ > + bio->bi_end_io(bio); > } > > static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > @@ -1136,7 +1143,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > if ((bio_op(bio) == REQ_OP_DISCARD) && > !bdev_max_discard_sectors(dc->bdev)) > - detached_dev_end_io(bio); > + bio_endio(bio); > else > submit_bio_noacct(bio); > } > -- > 2.34.1 > Hi, My apologies for the late reply due to a delay in checking my working inbox. I see the issue mentioned in: https://lore.kernel.org/all/aWU2mO5v6RezmIpZ@moria.home.lan/ this was indeed an oversight on my part. To resolve this quickly, I've prepared a direct fix for the double-completion bug. I hope this is better than a full revert. Thank, Shida ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-15 8:06 ` Stephen Zhang @ 2026-01-15 8:29 ` Christoph Hellwig 2026-01-18 11:49 ` Coly Li 2026-01-15 8:59 ` Kent Overstreet 1 sibling, 1 reply; 13+ messages in thread From: Christoph Hellwig @ 2026-01-15 8:29 UTC (permalink / raw) To: Stephen Zhang Cc: colyli, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida Can you please test this patch? commit d14f13516f60424f3cffc6d1837be566e360a8a3 Author: Christoph Hellwig <hch@lst.de> Date: Tue Jan 13 09:53:34 2026 +0100 bcache: clone bio in detached_dev_do_request Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 82fdea7dea7a..9e7b59121313 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata) } struct detached_dev_io_private { - struct bcache_device *d; unsigned long start_time; - bio_end_io_t *bi_end_io; - void *bi_private; - struct block_device *orig_bdev; + struct bio *orig_bio; + struct bio bio; }; static void detached_dev_end_io(struct bio *bio) { - struct detached_dev_io_private *ddip; - - ddip = bio->bi_private; - bio->bi_end_io = ddip->bi_end_io; - bio->bi_private = ddip->bi_private; + struct detached_dev_io_private *ddip = + container_of(bio, struct detached_dev_io_private, bio); + struct bio *orig_bio = ddip->orig_bio; /* Count on the bcache device */ - bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev); + bio_end_io_acct(orig_bio, ddip->start_time); if (bio->bi_status) { - struct cached_dev *dc = container_of(ddip->d, - struct cached_dev, disk); + struct cached_dev *dc = bio->bi_private; + /* should count I/O error for backing device here */ bch_count_backing_io_errors(dc, bio); + orig_bio->bi_status = bio->bi_status; } kfree(ddip); - bio_endio(bio); + bio_endio(orig_bio); } -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, - struct block_device *orig_bdev, unsigned long start_time) +static void detached_dev_do_request(struct bcache_device *d, + struct bio *orig_bio, unsigned long start_time) { struct detached_dev_io_private *ddip; struct cached_dev *dc = container_of(d, struct cached_dev, disk); + if (bio_op(orig_bio) == REQ_OP_DISCARD && + !bdev_max_discard_sectors(dc->bdev)) { + bio_endio(orig_bio); + return; + } + /* * no need to call closure_get(&dc->disk.cl), * because upper layer had already opened bcache device, * which would call closure_get(&dc->disk.cl) */ ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); - if (!ddip) { - bio->bi_status = BLK_STS_RESOURCE; - bio_endio(bio); - return; - } - - ddip->d = d; + if (!ddip) + goto enomem; + if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO)) + goto free_ddip; /* Count on the bcache device */ - ddip->orig_bdev = orig_bdev; ddip->start_time = start_time; - ddip->bi_end_io = bio->bi_end_io; - ddip->bi_private = bio->bi_private; - bio->bi_end_io = detached_dev_end_io; - bio->bi_private = ddip; - - if ((bio_op(bio) == REQ_OP_DISCARD) && - !bdev_max_discard_sectors(dc->bdev)) - detached_dev_end_io(bio); - else - submit_bio_noacct(bio); + ddip->orig_bio = orig_bio; + ddip->bio.bi_end_io = detached_dev_end_io; + ddip->bio.bi_private = dc; + submit_bio_noacct(&ddip->bio); + return; +free_ddip: + kfree(ddip); +enomem: + orig_bio->bi_status = BLK_STS_RESOURCE; + bio_endio(orig_bio); } static void quit_max_writeback_rate(struct cache_set *c, @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio) start_time = bio_start_io_acct(bio); - bio_set_dev(bio, dc->bdev); bio->bi_iter.bi_sector += dc->sb.data_offset; if (cached_dev_get(dc)) { + bio_set_dev(bio, dc->bdev); s = search_alloc(bio, d, orig_bdev, start_time); trace_bcache_request_start(s->d, bio); @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio) else cached_dev_read(dc, s); } - } else + } else { /* I/O request sent to backing device */ - detached_dev_do_request(d, bio, orig_bdev, start_time); + detached_dev_do_request(d, bio, start_time); + } } static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode, ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-15 8:29 ` Christoph Hellwig @ 2026-01-18 11:49 ` Coly Li 2026-01-19 6:43 ` Christoph Hellwig 2026-01-19 6:53 ` Stephen Zhang 0 siblings, 2 replies; 13+ messages in thread From: Coly Li @ 2026-01-18 11:49 UTC (permalink / raw) To: Christoph Hellwig Cc: Stephen Zhang, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida On Thu, Jan 15, 2026 at 12:29:15AM +0800, Christoph Hellwig wrote: > Can you please test this patch? > Shida, can you also test it and confirm? We need to get the fix in within 6.19 cycle. Yes, we need to make a dicision ASAP. I prefer the clone bio solution, and it looks fine to me. Thanks in advance. Coly Li > commit d14f13516f60424f3cffc6d1837be566e360a8a3 > Author: Christoph Hellwig <hch@lst.de> > Date: Tue Jan 13 09:53:34 2026 +0100 > > bcache: clone bio in detached_dev_do_request > > Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de> > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > index 82fdea7dea7a..9e7b59121313 100644 > --- a/drivers/md/bcache/request.c > +++ b/drivers/md/bcache/request.c > @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata) > } > > struct detached_dev_io_private { > - struct bcache_device *d; > unsigned long start_time; > - bio_end_io_t *bi_end_io; > - void *bi_private; > - struct block_device *orig_bdev; > + struct bio *orig_bio; > + struct bio bio; > }; > > static void detached_dev_end_io(struct bio *bio) > { > - struct detached_dev_io_private *ddip; > - > - ddip = bio->bi_private; > - bio->bi_end_io = ddip->bi_end_io; > - bio->bi_private = ddip->bi_private; > + struct detached_dev_io_private *ddip = > + container_of(bio, struct detached_dev_io_private, bio); > + struct bio *orig_bio = ddip->orig_bio; > > /* Count on the bcache device */ > - bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev); > + bio_end_io_acct(orig_bio, ddip->start_time); > > if (bio->bi_status) { > - struct cached_dev *dc = container_of(ddip->d, > - struct cached_dev, disk); > + struct cached_dev *dc = bio->bi_private; > + > /* should count I/O error for backing device here */ > bch_count_backing_io_errors(dc, bio); > + orig_bio->bi_status = bio->bi_status; > } > > kfree(ddip); > - bio_endio(bio); > + bio_endio(orig_bio); > } > > -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > - struct block_device *orig_bdev, unsigned long start_time) > +static void detached_dev_do_request(struct bcache_device *d, > + struct bio *orig_bio, unsigned long start_time) > { > struct detached_dev_io_private *ddip; > struct cached_dev *dc = container_of(d, struct cached_dev, disk); > > + if (bio_op(orig_bio) == REQ_OP_DISCARD && > + !bdev_max_discard_sectors(dc->bdev)) { > + bio_endio(orig_bio); > + return; > + } > + > /* > * no need to call closure_get(&dc->disk.cl), > * because upper layer had already opened bcache device, > * which would call closure_get(&dc->disk.cl) > */ > ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); > - if (!ddip) { > - bio->bi_status = BLK_STS_RESOURCE; > - bio_endio(bio); > - return; > - } > - > - ddip->d = d; > + if (!ddip) > + goto enomem; > + if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO)) > + goto free_ddip; > /* Count on the bcache device */ > - ddip->orig_bdev = orig_bdev; > ddip->start_time = start_time; > - ddip->bi_end_io = bio->bi_end_io; > - ddip->bi_private = bio->bi_private; > - bio->bi_end_io = detached_dev_end_io; > - bio->bi_private = ddip; > - > - if ((bio_op(bio) == REQ_OP_DISCARD) && > - !bdev_max_discard_sectors(dc->bdev)) > - detached_dev_end_io(bio); > - else > - submit_bio_noacct(bio); > + ddip->orig_bio = orig_bio; > + ddip->bio.bi_end_io = detached_dev_end_io; > + ddip->bio.bi_private = dc; > + submit_bio_noacct(&ddip->bio); > + return; > +free_ddip: > + kfree(ddip); > +enomem: > + orig_bio->bi_status = BLK_STS_RESOURCE; > + bio_endio(orig_bio); > } > > static void quit_max_writeback_rate(struct cache_set *c, > @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio) > > start_time = bio_start_io_acct(bio); > > - bio_set_dev(bio, dc->bdev); > bio->bi_iter.bi_sector += dc->sb.data_offset; > > if (cached_dev_get(dc)) { > + bio_set_dev(bio, dc->bdev); > s = search_alloc(bio, d, orig_bdev, start_time); > trace_bcache_request_start(s->d, bio); > > @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio) > else > cached_dev_read(dc, s); > } > - } else > + } else { > /* I/O request sent to backing device */ > - detached_dev_do_request(d, bio, orig_bdev, start_time); > + detached_dev_do_request(d, bio, start_time); > + } > } > > static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode, ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-18 11:49 ` Coly Li @ 2026-01-19 6:43 ` Christoph Hellwig 2026-01-19 8:19 ` Coly Li 2026-01-19 6:53 ` Stephen Zhang 1 sibling, 1 reply; 13+ messages in thread From: Christoph Hellwig @ 2026-01-19 6:43 UTC (permalink / raw) To: Coly Li Cc: Christoph Hellwig, Stephen Zhang, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida On Sun, Jan 18, 2026 at 07:49:36PM +0800, Coly Li wrote: > Shida, > can you also test it and confirm? We need to get the fix in within 6.19 cycle. > > Yes, we need to make a dicision ASAP. > I prefer the clone bio solution, and it looks fine to me. Do you have any maintainer QA that exercises this? For basically any other maintained subsystem we'd have a maintainer jump on verity fixes like this ASAP and test them. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-19 6:43 ` Christoph Hellwig @ 2026-01-19 8:19 ` Coly Li 2026-01-19 8:29 ` Christoph Hellwig 0 siblings, 1 reply; 13+ messages in thread From: Coly Li @ 2026-01-19 8:19 UTC (permalink / raw) To: Christoph Hellwig Cc: Stephen Zhang, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida On Sun, Jan 18, 2026 at 10:43:55PM +0800, Christoph Hellwig wrote: > On Sun, Jan 18, 2026 at 07:49:36PM +0800, Coly Li wrote: > > Shida, > > can you also test it and confirm? We need to get the fix in within 6.19 cycle. > > > > Yes, we need to make a dicision ASAP. > > I prefer the clone bio solution, and it looks fine to me. > > Do you have any maintainer QA that exercises this? For basically any Normally I do testing by following methods, 1) normal file system testing on it 2) long time I/O pressure e.g. at least 48 hours 3) deploy on my working machines with real workload for weeks Current tests are not automatically. And I will integreate Kent's test tools in. > other maintained subsystem we'd have a maintainer jump on verity fixes > like this ASAP and test them. > Before I work on it, I want to know the author already ran and tested the patch firstly. Normally I won't test any patches posted on mailing list. Thanks. Coly Li ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-19 8:19 ` Coly Li @ 2026-01-19 8:29 ` Christoph Hellwig 2026-01-19 8:51 ` Coly Li 0 siblings, 1 reply; 13+ messages in thread From: Christoph Hellwig @ 2026-01-19 8:29 UTC (permalink / raw) To: Coly Li Cc: Christoph Hellwig, Stephen Zhang, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida On Mon, Jan 19, 2026 at 04:19:38PM +0800, Coly Li wrote: > Before I work on it, I want to know the author already ran and tested the > patch firstly. Normally I won't test any patches posted on mailing list. Without a published test suite that can be used for that, that is a very high demand. Maintainers of code that can't easily be tested by a drive by contributor (or contributor to the parent subsystems for that matter) need to help with testing, otherwise we would not get anything done in the kernel. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-19 8:29 ` Christoph Hellwig @ 2026-01-19 8:51 ` Coly Li 0 siblings, 0 replies; 13+ messages in thread From: Coly Li @ 2026-01-19 8:51 UTC (permalink / raw) To: Christoph Hellwig Cc: Stephen Zhang, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida > 2026年1月19日 16:29,Christoph Hellwig <hch@infradead.org> 写道: > > On Mon, Jan 19, 2026 at 04:19:38PM +0800, Coly Li wrote: >> Before I work on it, I want to know the author already ran and tested the >> patch firstly. Normally I won't test any patches posted on mailing list. > > Without a published test suite that can be used for that, that is a very > high demand. Maintainers of code that can't easily be tested by a > drive by contributor (or contributor to the parent subsystems for that > matter) need to help with testing, otherwise we would not get anything > done in the kernel. > OK, let me integrate Kent’s test cases and my scripts, and show it in git.kernel.org. You ask this for times, I should handle it ASAP. But it won’t be the blocker of proceeding this fix. Anyway, there is no conflict with asking “do you test your patch” from me. Coly Li ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-18 11:49 ` Coly Li 2026-01-19 6:43 ` Christoph Hellwig @ 2026-01-19 6:53 ` Stephen Zhang 2026-01-19 8:04 ` Christoph Hellwig 1 sibling, 1 reply; 13+ messages in thread From: Stephen Zhang @ 2026-01-19 6:53 UTC (permalink / raw) To: Coly Li Cc: Christoph Hellwig, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida Coly Li <colyli@fnnas.com> 于2026年1月18日周日 19:49写道: > > On Thu, Jan 15, 2026 at 12:29:15AM +0800, Christoph Hellwig wrote: > > Can you please test this patch? > > > > Shida, > can you also test it and confirm? We need to get the fix in within 6.19 cycle. > > Yes, we need to make a dicision ASAP. > I prefer the clone bio solution, and it looks fine to me. > > Thanks in advance. > > Coly Li > > > > > > commit d14f13516f60424f3cffc6d1837be566e360a8a3 > > Author: Christoph Hellwig <hch@lst.de> > > Date: Tue Jan 13 09:53:34 2026 +0100 > > > > bcache: clone bio in detached_dev_do_request > > > > Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de> > > Thank you, Coly and Christoph, for giving me the opportunity to continue your outstanding work on this patch. If given the chance to complete the next steps, I will begin by adding a proper commit message: bcache: use bio cloning for detached device requests Previously, bcache hijacked the bi_end_io and bi_private fields of the incoming bio when the backing device was in a detached state. This is fragile and breaks if the bio is needed to be processed by other layers. This patch transitions to using a cloned bio embedded within a private structure. This ensures the original bio's metadata remains untouched. Co-developed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Additionally, I would like to conduct a thorough code review to identify any potential issues that may not be easily caught through normal testing. > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > > index 82fdea7dea7a..9e7b59121313 100644 > > --- a/drivers/md/bcache/request.c > > +++ b/drivers/md/bcache/request.c > > @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata) > > } > > > > struct detached_dev_io_private { > > - struct bcache_device *d; > > unsigned long start_time; > > - bio_end_io_t *bi_end_io; > > - void *bi_private; > > - struct block_device *orig_bdev; > > + struct bio *orig_bio; > > + struct bio bio; > > }; > > > > static void detached_dev_end_io(struct bio *bio) > > { > > - struct detached_dev_io_private *ddip; > > - > > - ddip = bio->bi_private; > > - bio->bi_end_io = ddip->bi_end_io; > > - bio->bi_private = ddip->bi_private; > > + struct detached_dev_io_private *ddip = > > + container_of(bio, struct detached_dev_io_private, bio); > > + struct bio *orig_bio = ddip->orig_bio; > > > > /* Count on the bcache device */ > > - bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev); > > + bio_end_io_acct(orig_bio, ddip->start_time); > > > > if (bio->bi_status) { > > - struct cached_dev *dc = container_of(ddip->d, > > - struct cached_dev, disk); > > + struct cached_dev *dc = bio->bi_private; > > + > > /* should count I/O error for backing device here */ > > bch_count_backing_io_errors(dc, bio); > > + orig_bio->bi_status = bio->bi_status; > > } > > bio_init_clone must be paired with a bio_uninit() call before the memory is freed? + bio_uninit(bio); Thanks, Shida > > kfree(ddip); > > - bio_endio(bio); > > + bio_endio(orig_bio); > > } > > > > -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > - struct block_device *orig_bdev, unsigned long start_time) > > +static void detached_dev_do_request(struct bcache_device *d, > > + struct bio *orig_bio, unsigned long start_time) > > { > > struct detached_dev_io_private *ddip; > > struct cached_dev *dc = container_of(d, struct cached_dev, disk); > > > > + if (bio_op(orig_bio) == REQ_OP_DISCARD && > > + !bdev_max_discard_sectors(dc->bdev)) { > > + bio_endio(orig_bio); > > + return; > > + } > > + > > /* > > * no need to call closure_get(&dc->disk.cl), > > * because upper layer had already opened bcache device, > > * which would call closure_get(&dc->disk.cl) > > */ > > ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); > > - if (!ddip) { > > - bio->bi_status = BLK_STS_RESOURCE; > > - bio_endio(bio); > > - return; > > - } > > - > > - ddip->d = d; > > + if (!ddip) > > + goto enomem; > > + if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO)) > > + goto free_ddip; > > /* Count on the bcache device */ > > - ddip->orig_bdev = orig_bdev; > > ddip->start_time = start_time; > > - ddip->bi_end_io = bio->bi_end_io; > > - ddip->bi_private = bio->bi_private; > > - bio->bi_end_io = detached_dev_end_io; > > - bio->bi_private = ddip; > > - > > - if ((bio_op(bio) == REQ_OP_DISCARD) && > > - !bdev_max_discard_sectors(dc->bdev)) > > - detached_dev_end_io(bio); > > - else > > - submit_bio_noacct(bio); > > + ddip->orig_bio = orig_bio; > > + ddip->bio.bi_end_io = detached_dev_end_io; > > + ddip->bio.bi_private = dc; > > + submit_bio_noacct(&ddip->bio); > > + return; > > +free_ddip: > > + kfree(ddip); > > +enomem: > > + orig_bio->bi_status = BLK_STS_RESOURCE; > > + bio_endio(orig_bio); > > } > > > > static void quit_max_writeback_rate(struct cache_set *c, > > @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio) > > > > start_time = bio_start_io_acct(bio); > > > > - bio_set_dev(bio, dc->bdev); > > bio->bi_iter.bi_sector += dc->sb.data_offset; > > > > if (cached_dev_get(dc)) { > > + bio_set_dev(bio, dc->bdev); > > s = search_alloc(bio, d, orig_bdev, start_time); > > trace_bcache_request_start(s->d, bio); > > > > @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio) > > else > > cached_dev_read(dc, s); > > } > > - } else > > + } else { > > /* I/O request sent to backing device */ > > - detached_dev_do_request(d, bio, orig_bdev, start_time); > > + detached_dev_do_request(d, bio, start_time); > > + } > > } > > > > static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode, ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-19 6:53 ` Stephen Zhang @ 2026-01-19 8:04 ` Christoph Hellwig 0 siblings, 0 replies; 13+ messages in thread From: Christoph Hellwig @ 2026-01-19 8:04 UTC (permalink / raw) To: Stephen Zhang Cc: Coly Li, Christoph Hellwig, kent.overstreet, axboe, sashal, linux-bcache, linux-kernel, zhangshida On Mon, Jan 19, 2026 at 02:53:53PM +0800, Stephen Zhang wrote: > > > if (bio->bi_status) { > > > - struct cached_dev *dc = container_of(ddip->d, > > > - struct cached_dev, disk); > > > + struct cached_dev *dc = bio->bi_private; > > > + > > > /* should count I/O error for backing device here */ > > > bch_count_backing_io_errors(dc, bio); > > > + orig_bio->bi_status = bio->bi_status; > > > } > > > > > bio_init_clone must be paired with a bio_uninit() call before the > memory is freed? Yes. At least if you don't want leaks when using blk-group. But as stated in my initial mail, as far as I can tell this path really needs a bio_set to avoid spurious failures under memory pressure. In which case we'd then do a bio_put in the end_io path. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-15 8:06 ` Stephen Zhang 2026-01-15 8:29 ` Christoph Hellwig @ 2026-01-15 8:59 ` Kent Overstreet 2026-01-15 9:17 ` Stephen Zhang 1 sibling, 1 reply; 13+ messages in thread From: Kent Overstreet @ 2026-01-15 8:59 UTC (permalink / raw) To: Stephen Zhang Cc: colyli, axboe, sashal, linux-bcache, linux-kernel, zhangshida On Thu, Jan 15, 2026 at 04:06:53PM +0800, Stephen Zhang wrote: > zhangshida <starzhangzsd@gmail.com> 于2026年1月15日周四 15:48写道: > > > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > Commit 53280e398471 ("bcache: fix improper use of bi_end_io") attempted > > to fix up bio completions by replacing manual bi_end_io calls with > > bio_endio(). However, it introduced a double-completion bug in the > > detached_dev path. > > > > In a normal completion path, the call stack is: > > blk_update_request > > bio_endio(bio) > > bio->bi_end_io(bio) -> detached_dev_end_io > > bio_endio(bio) <- BUG: second call > > > > To fix this, detached_dev_end_io() must manually call the next completion > > handler in the chain. > > > > However, in detached_dev_do_request(), if a discard is unsupported, the > > bio is rejected before being submitted to the lower level. In this case, > > we can use the standard bio_endio(). > > > > detached_dev_do_request > > bio_endio(bio) <- Correct: starts completion for > > unsubmitted bio > > > > Fixes: 53280e398471 ("bcache: fix improper use of bi_end_io") > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > --- > > drivers/md/bcache/request.c | 11 +++++++++-- > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > > index 82fdea7dea7..ec712b5879f 100644 > > --- a/drivers/md/bcache/request.c > > +++ b/drivers/md/bcache/request.c > > @@ -1104,7 +1104,14 @@ static void detached_dev_end_io(struct bio *bio) > > } > > > > kfree(ddip); > > - bio_endio(bio); > > + /* > > + * This is an exception where bio_endio() cannot be used. > > + * We are already called from within a bio_endio() stack; > > + * calling it again here would result in a double-completion > > + * (decrementing bi_remaining twice). We must call the > > + * original completion routine directly. > > + */ > > + bio->bi_end_io(bio); > > } > > > > static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > @@ -1136,7 +1143,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > > > if ((bio_op(bio) == REQ_OP_DISCARD) && > > !bdev_max_discard_sectors(dc->bdev)) > > - detached_dev_end_io(bio); > > + bio_endio(bio); > > else > > submit_bio_noacct(bio); > > } > > -- > > 2.34.1 > > > > Hi, > > My apologies for the late reply due to a delay in checking my working inbox. > I see the issue mentioned in: > https://lore.kernel.org/all/aWU2mO5v6RezmIpZ@moria.home.lan/ > this was indeed an oversight on my part. > > To resolve this quickly, I've prepared a direct fix for the > double-completion bug. > I hope this is better than a full revert. In general, it's just safer, simpler and saner to revert, reverting a patch is not something to be avoided. If there's _any_ new trickyness required in the fix, it's better to just revert than rush things. I revert or kick patches out - including my own - all the time. That said, this patch is good, you've got a comment explaining what's going on. Christoph's version of just always cloning the bio is definitely cleaner, but that's a bigger change, ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-15 8:59 ` Kent Overstreet @ 2026-01-15 9:17 ` Stephen Zhang 2026-01-15 9:28 ` Kent Overstreet 0 siblings, 1 reply; 13+ messages in thread From: Stephen Zhang @ 2026-01-15 9:17 UTC (permalink / raw) To: Kent Overstreet Cc: colyli, axboe, sashal, linux-bcache, linux-kernel, zhangshida, Christoph Hellwig Kent Overstreet <kent.overstreet@linux.dev> 于2026年1月15日周四 16:59写道: > > On Thu, Jan 15, 2026 at 04:06:53PM +0800, Stephen Zhang wrote: > > zhangshida <starzhangzsd@gmail.com> 于2026年1月15日周四 15:48写道: > > > > > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > > > Commit 53280e398471 ("bcache: fix improper use of bi_end_io") attempted > > > to fix up bio completions by replacing manual bi_end_io calls with > > > bio_endio(). However, it introduced a double-completion bug in the > > > detached_dev path. > > > > > > In a normal completion path, the call stack is: > > > blk_update_request > > > bio_endio(bio) > > > bio->bi_end_io(bio) -> detached_dev_end_io > > > bio_endio(bio) <- BUG: second call > > > > > > To fix this, detached_dev_end_io() must manually call the next completion > > > handler in the chain. > > > > > > However, in detached_dev_do_request(), if a discard is unsupported, the > > > bio is rejected before being submitted to the lower level. In this case, > > > we can use the standard bio_endio(). > > > > > > detached_dev_do_request > > > bio_endio(bio) <- Correct: starts completion for > > > unsubmitted bio > > > > > > Fixes: 53280e398471 ("bcache: fix improper use of bi_end_io") > > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > > --- > > > drivers/md/bcache/request.c | 11 +++++++++-- > > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > > > index 82fdea7dea7..ec712b5879f 100644 > > > --- a/drivers/md/bcache/request.c > > > +++ b/drivers/md/bcache/request.c > > > @@ -1104,7 +1104,14 @@ static void detached_dev_end_io(struct bio *bio) > > > } > > > > > > kfree(ddip); > > > - bio_endio(bio); > > > + /* > > > + * This is an exception where bio_endio() cannot be used. > > > + * We are already called from within a bio_endio() stack; > > > + * calling it again here would result in a double-completion > > > + * (decrementing bi_remaining twice). We must call the > > > + * original completion routine directly. > > > + */ > > > + bio->bi_end_io(bio); > > > } > > > > > > static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > > @@ -1136,7 +1143,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > > > > > if ((bio_op(bio) == REQ_OP_DISCARD) && > > > !bdev_max_discard_sectors(dc->bdev)) > > > - detached_dev_end_io(bio); > > > + bio_endio(bio); > > > else > > > submit_bio_noacct(bio); > > > } > > > -- > > > 2.34.1 > > > > > > > Hi, > > > > My apologies for the late reply due to a delay in checking my working inbox. > > I see the issue mentioned in: > > https://lore.kernel.org/all/aWU2mO5v6RezmIpZ@moria.home.lan/ > > this was indeed an oversight on my part. > > > > To resolve this quickly, I've prepared a direct fix for the > > double-completion bug. > > I hope this is better than a full revert. > > In general, it's just safer, simpler and saner to revert, reverting a > patch is not something to be avoided. If there's _any_ new trickyness > required in the fix, it's better to just revert than rush things. > > I revert or kick patches out - including my own - all the time. > > That said, this patch is good, you've got a comment explaining what's > going on. Christoph's version of just always cloning the bio is > definitely cleaner, but that's a bigger change, Thank you for the feedback. I sincerely hope that Christoph's version can resolve this issue properly, and that it helps remedy the regression I introduced. I appreciate everyone's patience and the efforts to address this. Let me know if there's anything further needed from my side. Best regards, Shida ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io 2026-01-15 9:17 ` Stephen Zhang @ 2026-01-15 9:28 ` Kent Overstreet 0 siblings, 0 replies; 13+ messages in thread From: Kent Overstreet @ 2026-01-15 9:28 UTC (permalink / raw) To: Stephen Zhang Cc: colyli, axboe, sashal, linux-bcache, linux-kernel, zhangshida, Christoph Hellwig On Thu, Jan 15, 2026 at 05:17:39PM +0800, Stephen Zhang wrote: > Kent Overstreet <kent.overstreet@linux.dev> 于2026年1月15日周四 16:59写道: > > > > On Thu, Jan 15, 2026 at 04:06:53PM +0800, Stephen Zhang wrote: > > > zhangshida <starzhangzsd@gmail.com> 于2026年1月15日周四 15:48写道: > > > > > > > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > > > > > Commit 53280e398471 ("bcache: fix improper use of bi_end_io") attempted > > > > to fix up bio completions by replacing manual bi_end_io calls with > > > > bio_endio(). However, it introduced a double-completion bug in the > > > > detached_dev path. > > > > > > > > In a normal completion path, the call stack is: > > > > blk_update_request > > > > bio_endio(bio) > > > > bio->bi_end_io(bio) -> detached_dev_end_io > > > > bio_endio(bio) <- BUG: second call > > > > > > > > To fix this, detached_dev_end_io() must manually call the next completion > > > > handler in the chain. > > > > > > > > However, in detached_dev_do_request(), if a discard is unsupported, the > > > > bio is rejected before being submitted to the lower level. In this case, > > > > we can use the standard bio_endio(). > > > > > > > > detached_dev_do_request > > > > bio_endio(bio) <- Correct: starts completion for > > > > unsubmitted bio > > > > > > > > Fixes: 53280e398471 ("bcache: fix improper use of bi_end_io") > > > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > > > --- > > > > drivers/md/bcache/request.c | 11 +++++++++-- > > > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > > > > index 82fdea7dea7..ec712b5879f 100644 > > > > --- a/drivers/md/bcache/request.c > > > > +++ b/drivers/md/bcache/request.c > > > > @@ -1104,7 +1104,14 @@ static void detached_dev_end_io(struct bio *bio) > > > > } > > > > > > > > kfree(ddip); > > > > - bio_endio(bio); > > > > + /* > > > > + * This is an exception where bio_endio() cannot be used. > > > > + * We are already called from within a bio_endio() stack; > > > > + * calling it again here would result in a double-completion > > > > + * (decrementing bi_remaining twice). We must call the > > > > + * original completion routine directly. > > > > + */ > > > > + bio->bi_end_io(bio); > > > > } > > > > > > > > static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > > > @@ -1136,7 +1143,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > > > > > > > > if ((bio_op(bio) == REQ_OP_DISCARD) && > > > > !bdev_max_discard_sectors(dc->bdev)) > > > > - detached_dev_end_io(bio); > > > > + bio_endio(bio); > > > > else > > > > submit_bio_noacct(bio); > > > > } > > > > -- > > > > 2.34.1 > > > > > > > > > > Hi, > > > > > > My apologies for the late reply due to a delay in checking my working inbox. > > > I see the issue mentioned in: > > > https://lore.kernel.org/all/aWU2mO5v6RezmIpZ@moria.home.lan/ > > > this was indeed an oversight on my part. > > > > > > To resolve this quickly, I've prepared a direct fix for the > > > double-completion bug. > > > I hope this is better than a full revert. > > > > In general, it's just safer, simpler and saner to revert, reverting a > > patch is not something to be avoided. If there's _any_ new trickyness > > required in the fix, it's better to just revert than rush things. > > > > I revert or kick patches out - including my own - all the time. > > > > That said, this patch is good, you've got a comment explaining what's > > going on. Christoph's version of just always cloning the bio is > > definitely cleaner, but that's a bigger change, > > Thank you for the feedback. > > I sincerely hope that Christoph's version can resolve this issue properly, and > that it helps remedy the regression I introduced. I appreciate everyone's > patience and the efforts to address this. > > Let me know if there's anything further needed from my side. Thanks for being attentive, no worries about any of it. It looks like from your patch there was an actual bug you were trying to fix - bio_endio() not being called at all in this case > > > > if ((bio_op(bio) == REQ_OP_DISCARD) && > > > > !bdev_max_discard_sectors(dc->bdev)) That would have been good to highlight up front. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-01-19 8:52 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-15 7:48 [PATCH] bcache: fix double bio_endio completion in detached_dev_end_io zhangshida 2026-01-15 8:06 ` Stephen Zhang 2026-01-15 8:29 ` Christoph Hellwig 2026-01-18 11:49 ` Coly Li 2026-01-19 6:43 ` Christoph Hellwig 2026-01-19 8:19 ` Coly Li 2026-01-19 8:29 ` Christoph Hellwig 2026-01-19 8:51 ` Coly Li 2026-01-19 6:53 ` Stephen Zhang 2026-01-19 8:04 ` Christoph Hellwig 2026-01-15 8:59 ` Kent Overstreet 2026-01-15 9:17 ` Stephen Zhang 2026-01-15 9:28 ` Kent Overstreet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox