* [PATCH] Revert "bcache: fix improper use of bi_end_io" @ 2026-01-13 6:09 colyli 2026-01-13 8:07 ` Christoph Hellwig 0 siblings, 1 reply; 18+ messages in thread From: colyli @ 2026-01-13 6:09 UTC (permalink / raw) To: axboe Cc: linux-block, linux-bcache, Coly Li, Christoph Hellwig, Shida Zhang, Kent Overstreet From: Coly Li <colyli@fnnas.com> This reverts commit 53280e398471f0bddbb17b798a63d41264651325. The above commit tries to address the race in bio chain handling, but it seems in detached_dev_end_io() simply using bio_endio() to replace bio->bi_end_io() may introduce potential regression. This patch revert the commit, let's wait for better fix from Shida. Signed-off-by: Coly Li <colyli@fnnas.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Shida Zhang <zhangshida@kylinos.cn> Cc: Kent Overstreet <kent.overstreet@linux.dev> --- drivers/md/bcache/request.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 82fdea7dea7a..af345dc6fde1 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -1104,7 +1104,7 @@ static void detached_dev_end_io(struct bio *bio) } kfree(ddip); - bio_endio(bio); + bio->bi_end_io(bio); } static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, @@ -1121,7 +1121,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); if (!ddip) { bio->bi_status = BLK_STS_RESOURCE; - bio_endio(bio); + bio->bi_end_io(bio); return; } @@ -1136,7 +1136,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, if ((bio_op(bio) == REQ_OP_DISCARD) && !bdev_max_discard_sectors(dc->bdev)) - detached_dev_end_io(bio); + bio->bi_end_io(bio); else submit_bio_noacct(bio); } -- 2.47.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 6:09 [PATCH] Revert "bcache: fix improper use of bi_end_io" colyli @ 2026-01-13 8:07 ` Christoph Hellwig 2026-01-13 8:30 ` Kent Overstreet 2026-01-13 16:22 ` Coly Li 0 siblings, 2 replies; 18+ messages in thread From: Christoph Hellwig @ 2026-01-13 8:07 UTC (permalink / raw) To: colyli Cc: axboe, linux-block, linux-bcache, Christoph Hellwig, Shida Zhang, Kent Overstreet On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote: > From: Coly Li <colyli@fnnas.com> > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325. > > The above commit tries to address the race in bio chain handling, > but it seems in detached_dev_end_io() simply using bio_endio() to > replace bio->bi_end_io() may introduce potential regression. > > This patch revert the commit, let's wait for better fix from Shida. That's a pretty vague commit message for reverting a clear API violation that has caused trouble. What is the story here? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 8:07 ` Christoph Hellwig @ 2026-01-13 8:30 ` Kent Overstreet 2026-01-13 8:34 ` Christoph Hellwig 2026-01-13 16:22 ` Coly Li 1 sibling, 1 reply; 18+ messages in thread From: Kent Overstreet @ 2026-01-13 8:30 UTC (permalink / raw) To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 12:07:54AM -0800, Christoph Hellwig wrote: > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote: > > From: Coly Li <colyli@fnnas.com> > > > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325. > > > > The above commit tries to address the race in bio chain handling, > > but it seems in detached_dev_end_io() simply using bio_endio() to > > replace bio->bi_end_io() may introduce potential regression. > > > > This patch revert the commit, let's wait for better fix from Shida. > > That's a pretty vague commit message for reverting a clear API > violation that has caused trouble. What is the story here? Christoph, you can't call bio_endio() on the same bio twice. You should know this. Calling a bare bi_end_io function is the correct thing to do when we're getting called from bio_endio(). ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 8:30 ` Kent Overstreet @ 2026-01-13 8:34 ` Christoph Hellwig 2026-01-13 8:39 ` Kent Overstreet 0 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2026-01-13 8:34 UTC (permalink / raw) To: Kent Overstreet Cc: Christoph Hellwig, colyli, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 03:30:41AM -0500, Kent Overstreet wrote: > On Tue, Jan 13, 2026 at 12:07:54AM -0800, Christoph Hellwig wrote: > > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote: > > > From: Coly Li <colyli@fnnas.com> > > > > > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325. > > > > > > The above commit tries to address the race in bio chain handling, > > > but it seems in detached_dev_end_io() simply using bio_endio() to > > > replace bio->bi_end_io() may introduce potential regression. > > > > > > This patch revert the commit, let's wait for better fix from Shida. > > > > That's a pretty vague commit message for reverting a clear API > > violation that has caused trouble. What is the story here? > > Christoph, you can't call bio_endio() on the same bio twice. You should > know this. Calling a bare bi_end_io function is the correct thing to do > when we're getting called from bio_endio(). Hi Kent, indeed, calling bio_endio() twice is a very bad idea. Nothing in the quoted commit log indicates that is the case, though. If that is the problem it needs to be fixed, but calling ->bi_end_io directly is not the proper fix either. That's eaxtly why I'm asking for the story behind this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 8:34 ` Christoph Hellwig @ 2026-01-13 8:39 ` Kent Overstreet 2026-01-13 8:58 ` Christoph Hellwig 0 siblings, 1 reply; 18+ messages in thread From: Kent Overstreet @ 2026-01-13 8:39 UTC (permalink / raw) To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 12:34:36AM -0800, Christoph Hellwig wrote: > On Tue, Jan 13, 2026 at 03:30:41AM -0500, Kent Overstreet wrote: > > On Tue, Jan 13, 2026 at 12:07:54AM -0800, Christoph Hellwig wrote: > > > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote: > > > > From: Coly Li <colyli@fnnas.com> > > > > > > > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325. > > > > > > > > The above commit tries to address the race in bio chain handling, > > > > but it seems in detached_dev_end_io() simply using bio_endio() to > > > > replace bio->bi_end_io() may introduce potential regression. > > > > > > > > This patch revert the commit, let's wait for better fix from Shida. > > > > > > That's a pretty vague commit message for reverting a clear API > > > violation that has caused trouble. What is the story here? > > > > Christoph, you can't call bio_endio() on the same bio twice. You should > > know this. Calling a bare bi_end_io function is the correct thing to do > > when we're getting called from bio_endio(). > > Hi Kent, > > indeed, calling bio_endio() twice is a very bad idea. Nothing in the > quoted commit log indicates that is the case, though. If that is > the problem it needs to be fixed, but calling ->bi_end_io directly > is not the proper fix either. That's eaxtly why I'm asking for the > story behind this. No. The original (buggy) patch has your name on it in the suggested-by, you should have done your homework. This almost made it into a bunch of stable releases, where it would have exploded as soon as it hit actual users, and I would've gotten some of those bug reports. You need to reexamine your priorities here. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 8:39 ` Kent Overstreet @ 2026-01-13 8:58 ` Christoph Hellwig 2026-01-13 9:27 ` Kent Overstreet 2026-01-13 16:18 ` Coly Li 0 siblings, 2 replies; 18+ messages in thread From: Christoph Hellwig @ 2026-01-13 8:58 UTC (permalink / raw) To: Kent Overstreet Cc: Christoph Hellwig, colyli, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 03:39:28AM -0500, Kent Overstreet wrote: > No. The original (buggy) patch has your name on it in the suggested-by, > you should have done your homework. How about you stop being a dick? And if you're talking about homework do you own and lookup that thread, and help implementing the suggestions for fixing an API abuse, or at least reviewing them instead of angrily shouting at someone suggesting to fix things. > You need to reexamine your priorities here. Maybe my priority should be to better ignore you.. Anyway, Coly: I think the issue is that bcache tries to do a silly shortcut and reuses the bio for multiple layers of I/O which still trying to hook into completions for both. Something like the (untested) patch below fixes that by cloning the bio and making everything work as expected. I'm not sure how dead lock safe even the original version is, and my suspicion is that it needs a bio_set. Of course even suggesting something will probably get me in trouble so take it with a grain of salt. --- From 1a2336f617f2e351564ec20e4db9727584e04aa9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Tue, 13 Jan 2026 09:53:34 +0100 Subject: bcache: clone bio in detached_dev_do_request Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/md/bcache/request.c | 72 ++++++++++++++++++------------------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 82fdea7dea7a..9e7b59121313 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata) } struct detached_dev_io_private { - struct bcache_device *d; unsigned long start_time; - bio_end_io_t *bi_end_io; - void *bi_private; - struct block_device *orig_bdev; + struct bio *orig_bio; + struct bio bio; }; static void detached_dev_end_io(struct bio *bio) { - struct detached_dev_io_private *ddip; - - ddip = bio->bi_private; - bio->bi_end_io = ddip->bi_end_io; - bio->bi_private = ddip->bi_private; + struct detached_dev_io_private *ddip = + container_of(bio, struct detached_dev_io_private, bio); + struct bio *orig_bio = ddip->orig_bio; /* Count on the bcache device */ - bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev); + bio_end_io_acct(orig_bio, ddip->start_time); if (bio->bi_status) { - struct cached_dev *dc = container_of(ddip->d, - struct cached_dev, disk); + struct cached_dev *dc = bio->bi_private; + /* should count I/O error for backing device here */ bch_count_backing_io_errors(dc, bio); + orig_bio->bi_status = bio->bi_status; } kfree(ddip); - bio_endio(bio); + bio_endio(orig_bio); } -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, - struct block_device *orig_bdev, unsigned long start_time) +static void detached_dev_do_request(struct bcache_device *d, + struct bio *orig_bio, unsigned long start_time) { struct detached_dev_io_private *ddip; struct cached_dev *dc = container_of(d, struct cached_dev, disk); + if (bio_op(orig_bio) == REQ_OP_DISCARD && + !bdev_max_discard_sectors(dc->bdev)) { + bio_endio(orig_bio); + return; + } + /* * no need to call closure_get(&dc->disk.cl), * because upper layer had already opened bcache device, * which would call closure_get(&dc->disk.cl) */ ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); - if (!ddip) { - bio->bi_status = BLK_STS_RESOURCE; - bio_endio(bio); - return; - } - - ddip->d = d; + if (!ddip) + goto enomem; + if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO)) + goto free_ddip; /* Count on the bcache device */ - ddip->orig_bdev = orig_bdev; ddip->start_time = start_time; - ddip->bi_end_io = bio->bi_end_io; - ddip->bi_private = bio->bi_private; - bio->bi_end_io = detached_dev_end_io; - bio->bi_private = ddip; - - if ((bio_op(bio) == REQ_OP_DISCARD) && - !bdev_max_discard_sectors(dc->bdev)) - detached_dev_end_io(bio); - else - submit_bio_noacct(bio); + ddip->orig_bio = orig_bio; + ddip->bio.bi_end_io = detached_dev_end_io; + ddip->bio.bi_private = dc; + submit_bio_noacct(&ddip->bio); + return; +free_ddip: + kfree(ddip); +enomem: + orig_bio->bi_status = BLK_STS_RESOURCE; + bio_endio(orig_bio); } static void quit_max_writeback_rate(struct cache_set *c, @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio) start_time = bio_start_io_acct(bio); - bio_set_dev(bio, dc->bdev); bio->bi_iter.bi_sector += dc->sb.data_offset; if (cached_dev_get(dc)) { + bio_set_dev(bio, dc->bdev); s = search_alloc(bio, d, orig_bdev, start_time); trace_bcache_request_start(s->d, bio); @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio) else cached_dev_read(dc, s); } - } else + } else { /* I/O request sent to backing device */ - detached_dev_do_request(d, bio, orig_bdev, start_time); + detached_dev_do_request(d, bio, start_time); + } } static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode, -- 2.47.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 8:58 ` Christoph Hellwig @ 2026-01-13 9:27 ` Kent Overstreet 2026-01-13 15:00 ` Christoph Hellwig 2026-01-13 16:18 ` Coly Li 1 sibling, 1 reply; 18+ messages in thread From: Kent Overstreet @ 2026-01-13 9:27 UTC (permalink / raw) To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 12:58:46AM -0800, Christoph Hellwig wrote: > On Tue, Jan 13, 2026 at 03:39:28AM -0500, Kent Overstreet wrote: > > No. The original (buggy) patch has your name on it in the suggested-by, > > you should have done your homework. > > How about you stop being a dick? And if you're talking about homework > do you own and lookup that thread, and help implementing the suggestions > for fixing an API abuse, or at least reviewing them instead of angrily > shouting at someone suggesting to fix things. It's not being a dick to tell someone they need to slow down and be more careful when they're doing something broken and arguing over the revert. The priority is keeping things working for the end user. Show me that you understand that point, and can slow down and be more careful - so I don't have to take time out of my day when I've got other things that I need to be working on - and I'll be able to engage in a much more friendly manner. > Anyway, Coly: I think the issue is that bcache tries to do a silly > shortcut and reuses the bio for multiple layers of I/O which still trying > to hook into completions for both. Something like the (untested) patch > below fixes that by cloning the bio and making everything work as > expected. I'm not sure how dead lock safe even the original version is, > and my suspicion is that it needs a bio_set. Of course even suggesting > something will probably get me in trouble so take it with a grain of > salt. Cloning the bio is completely fine - that's a cleaner and more modern approach. I'd still like to know why you consider the bare bi_end_io calls problematic. If it's just that you want the code clean so you can grep for actual issues, that's an acceptable answer. Historically that's been fine, and I haven't seen it cause issues, so I would like to be enlightened on that point. On a side note, I just yesterday had to shoot down another broken "fix" for memory allocation profiling - where it, again, turned out that the submitter hadn't confirmed anything he was basing his patch on by testing. One of the big factors in why I was ok with continuing bcachefs development outside the kernel is that when getting patch submissions from the kernel community, I repeatedly had to tell people that yes, testing their code is, in fact their responsibility, and I even had people argue with me over it - and at some point I realized that this has literally never been an issue with submissions from people outside the kernel community. This is something that really needs to change. > > --- > From 1a2336f617f2e351564ec20e4db9727584e04aa9 Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig <hch@lst.de> > Date: Tue, 13 Jan 2026 09:53:34 +0100 > Subject: bcache: clone bio in detached_dev_do_request > > Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/md/bcache/request.c | 72 ++++++++++++++++++------------------- > 1 file changed, 36 insertions(+), 36 deletions(-) > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > index 82fdea7dea7a..9e7b59121313 100644 > --- a/drivers/md/bcache/request.c > +++ b/drivers/md/bcache/request.c > @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata) > } > > struct detached_dev_io_private { > - struct bcache_device *d; > unsigned long start_time; > - bio_end_io_t *bi_end_io; > - void *bi_private; > - struct block_device *orig_bdev; > + struct bio *orig_bio; > + struct bio bio; > }; > > static void detached_dev_end_io(struct bio *bio) > { > - struct detached_dev_io_private *ddip; > - > - ddip = bio->bi_private; > - bio->bi_end_io = ddip->bi_end_io; > - bio->bi_private = ddip->bi_private; > + struct detached_dev_io_private *ddip = > + container_of(bio, struct detached_dev_io_private, bio); > + struct bio *orig_bio = ddip->orig_bio; > > /* Count on the bcache device */ > - bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev); > + bio_end_io_acct(orig_bio, ddip->start_time); > > if (bio->bi_status) { > - struct cached_dev *dc = container_of(ddip->d, > - struct cached_dev, disk); > + struct cached_dev *dc = bio->bi_private; > + > /* should count I/O error for backing device here */ > bch_count_backing_io_errors(dc, bio); > + orig_bio->bi_status = bio->bi_status; > } > > kfree(ddip); > - bio_endio(bio); > + bio_endio(orig_bio); > } > > -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > - struct block_device *orig_bdev, unsigned long start_time) > +static void detached_dev_do_request(struct bcache_device *d, > + struct bio *orig_bio, unsigned long start_time) > { > struct detached_dev_io_private *ddip; > struct cached_dev *dc = container_of(d, struct cached_dev, disk); > > + if (bio_op(orig_bio) == REQ_OP_DISCARD && > + !bdev_max_discard_sectors(dc->bdev)) { > + bio_endio(orig_bio); > + return; > + } > + > /* > * no need to call closure_get(&dc->disk.cl), > * because upper layer had already opened bcache device, > * which would call closure_get(&dc->disk.cl) > */ > ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); > - if (!ddip) { > - bio->bi_status = BLK_STS_RESOURCE; > - bio_endio(bio); > - return; > - } > - > - ddip->d = d; > + if (!ddip) > + goto enomem; > + if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO)) > + goto free_ddip; > /* Count on the bcache device */ > - ddip->orig_bdev = orig_bdev; > ddip->start_time = start_time; > - ddip->bi_end_io = bio->bi_end_io; > - ddip->bi_private = bio->bi_private; > - bio->bi_end_io = detached_dev_end_io; > - bio->bi_private = ddip; > - > - if ((bio_op(bio) == REQ_OP_DISCARD) && > - !bdev_max_discard_sectors(dc->bdev)) > - detached_dev_end_io(bio); > - else > - submit_bio_noacct(bio); > + ddip->orig_bio = orig_bio; > + ddip->bio.bi_end_io = detached_dev_end_io; > + ddip->bio.bi_private = dc; > + submit_bio_noacct(&ddip->bio); > + return; > +free_ddip: > + kfree(ddip); > +enomem: > + orig_bio->bi_status = BLK_STS_RESOURCE; > + bio_endio(orig_bio); > } > > static void quit_max_writeback_rate(struct cache_set *c, > @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio) > > start_time = bio_start_io_acct(bio); > > - bio_set_dev(bio, dc->bdev); > bio->bi_iter.bi_sector += dc->sb.data_offset; > > if (cached_dev_get(dc)) { > + bio_set_dev(bio, dc->bdev); > s = search_alloc(bio, d, orig_bdev, start_time); > trace_bcache_request_start(s->d, bio); > > @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio) > else > cached_dev_read(dc, s); > } > - } else > + } else { > /* I/O request sent to backing device */ > - detached_dev_do_request(d, bio, orig_bdev, start_time); > + detached_dev_do_request(d, bio, start_time); > + } > } > > static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode, > -- > 2.47.3 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 9:27 ` Kent Overstreet @ 2026-01-13 15:00 ` Christoph Hellwig 2026-01-13 15:30 ` Kent Overstreet 0 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2026-01-13 15:00 UTC (permalink / raw) To: Kent Overstreet Cc: Christoph Hellwig, colyli, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 04:27:19AM -0500, Kent Overstreet wrote: > It's not being a dick to tell someone they need to slow down and be more > careful when they're doing something broken and arguing over the revert. Yes, you are. I did not actually do anything here but point out that bcache is broken. If that fix made things worse it would be useful if one of the maintainers or a CI system spots it. You need to do your job as maintainer instead of constantly attacking others. Start with that, and then just try being a little nice to people. Most folks will respond nicely to questions, but less so when you start out with an attack. And until you learn that I have absolutely no interest in wasting any further time on arguing with you in any form. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 15:00 ` Christoph Hellwig @ 2026-01-13 15:30 ` Kent Overstreet 0 siblings, 0 replies; 18+ messages in thread From: Kent Overstreet @ 2026-01-13 15:30 UTC (permalink / raw) To: Christoph Hellwig; +Cc: colyli, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 07:00:03AM -0800, Christoph Hellwig wrote: > On Tue, Jan 13, 2026 at 04:27:19AM -0500, Kent Overstreet wrote: > > It's not being a dick to tell someone they need to slow down and be more > > careful when they're doing something broken and arguing over the revert. > > Yes, you are. I did not actually do anything here but point out that > bcache is broken. There've been no bug reports, people are using the code and it works. You keep claiming breakage, and I asked you to explain what you thought was wrong with the bare bi_end_io call - but you still haven't, and I'm not buying it because I know what the code in bio_endio() does - I wrote some of it, after all, and none of it expects to be called twice on the same bio. You're making changes in code you didn't write, bypassing the maintainers to get it in, without providing any real justification - that's reckless. And you have a habit of calling things "broken" without a good reason, so please cut that out. If, as you claim, calling a bare bi_end_io function in this context is a problem, you better be able to explain why. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 8:58 ` Christoph Hellwig 2026-01-13 9:27 ` Kent Overstreet @ 2026-01-13 16:18 ` Coly Li 2026-01-13 16:26 ` Christoph Hellwig 1 sibling, 1 reply; 18+ messages in thread From: Coly Li @ 2026-01-13 16:18 UTC (permalink / raw) To: Christoph Hellwig Cc: Kent Overstreet, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 12:58:46AM +0800, Christoph Hellwig wrote: > Anyway, Coly: I think the issue is that bcache tries to do a silly > shortcut and reuses the bio for multiple layers of I/O which still trying > to hook into completions for both. Something like the (untested) patch > below fixes that by cloning the bio and making everything work as > expected. I'm not sure how dead lock safe even the original version is, > and my suspicion is that it needs a bio_set. Of course even suggesting > something will probably get me in trouble so take it with a grain of > salt. > > --- > From 1a2336f617f2e351564ec20e4db9727584e04aa9 Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig <hch@lst.de> > Date: Tue, 13 Jan 2026 09:53:34 +0100 > Subject: bcache: clone bio in detached_dev_do_request > > Not-yet-Signed-off-by: Christoph Hellwig <hch@lst.de> Hi Christoph, This cloned bio method looks good. Could you please post a formal patch? Then I may replace the revert commit with your patch. Thanks. Coly Li > --- > drivers/md/bcache/request.c | 72 ++++++++++++++++++------------------- > 1 file changed, 36 insertions(+), 36 deletions(-) > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > index 82fdea7dea7a..9e7b59121313 100644 > --- a/drivers/md/bcache/request.c > +++ b/drivers/md/bcache/request.c > @@ -1078,67 +1078,66 @@ static CLOSURE_CALLBACK(cached_dev_nodata) > } > > struct detached_dev_io_private { > - struct bcache_device *d; > unsigned long start_time; > - bio_end_io_t *bi_end_io; > - void *bi_private; > - struct block_device *orig_bdev; > + struct bio *orig_bio; > + struct bio bio; > }; > > static void detached_dev_end_io(struct bio *bio) > { > - struct detached_dev_io_private *ddip; > - > - ddip = bio->bi_private; > - bio->bi_end_io = ddip->bi_end_io; > - bio->bi_private = ddip->bi_private; > + struct detached_dev_io_private *ddip = > + container_of(bio, struct detached_dev_io_private, bio); > + struct bio *orig_bio = ddip->orig_bio; > > /* Count on the bcache device */ > - bio_end_io_acct_remapped(bio, ddip->start_time, ddip->orig_bdev); > + bio_end_io_acct(orig_bio, ddip->start_time); > > if (bio->bi_status) { > - struct cached_dev *dc = container_of(ddip->d, > - struct cached_dev, disk); > + struct cached_dev *dc = bio->bi_private; > + > /* should count I/O error for backing device here */ > bch_count_backing_io_errors(dc, bio); > + orig_bio->bi_status = bio->bi_status; > } > > kfree(ddip); > - bio_endio(bio); > + bio_endio(orig_bio); > } > > -static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, > - struct block_device *orig_bdev, unsigned long start_time) > +static void detached_dev_do_request(struct bcache_device *d, > + struct bio *orig_bio, unsigned long start_time) > { > struct detached_dev_io_private *ddip; > struct cached_dev *dc = container_of(d, struct cached_dev, disk); > > + if (bio_op(orig_bio) == REQ_OP_DISCARD && > + !bdev_max_discard_sectors(dc->bdev)) { > + bio_endio(orig_bio); > + return; > + } > + > /* > * no need to call closure_get(&dc->disk.cl), > * because upper layer had already opened bcache device, > * which would call closure_get(&dc->disk.cl) > */ > ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); > - if (!ddip) { > - bio->bi_status = BLK_STS_RESOURCE; > - bio_endio(bio); > - return; > - } > - > - ddip->d = d; > + if (!ddip) > + goto enomem; > + if (bio_init_clone(dc->bdev, &ddip->bio, orig_bio, GFP_NOIO)) > + goto free_ddip; > /* Count on the bcache device */ > - ddip->orig_bdev = orig_bdev; > ddip->start_time = start_time; > - ddip->bi_end_io = bio->bi_end_io; > - ddip->bi_private = bio->bi_private; > - bio->bi_end_io = detached_dev_end_io; > - bio->bi_private = ddip; > - > - if ((bio_op(bio) == REQ_OP_DISCARD) && > - !bdev_max_discard_sectors(dc->bdev)) > - detached_dev_end_io(bio); > - else > - submit_bio_noacct(bio); > + ddip->orig_bio = orig_bio; > + ddip->bio.bi_end_io = detached_dev_end_io; > + ddip->bio.bi_private = dc; > + submit_bio_noacct(&ddip->bio); > + return; > +free_ddip: > + kfree(ddip); > +enomem: > + orig_bio->bi_status = BLK_STS_RESOURCE; > + bio_endio(orig_bio); > } > > static void quit_max_writeback_rate(struct cache_set *c, > @@ -1214,10 +1213,10 @@ void cached_dev_submit_bio(struct bio *bio) > > start_time = bio_start_io_acct(bio); > > - bio_set_dev(bio, dc->bdev); > bio->bi_iter.bi_sector += dc->sb.data_offset; > > if (cached_dev_get(dc)) { > + bio_set_dev(bio, dc->bdev); > s = search_alloc(bio, d, orig_bdev, start_time); > trace_bcache_request_start(s->d, bio); > > @@ -1237,9 +1236,10 @@ void cached_dev_submit_bio(struct bio *bio) > else > cached_dev_read(dc, s); > } > - } else > + } else { > /* I/O request sent to backing device */ > - detached_dev_do_request(d, bio, orig_bdev, start_time); > + detached_dev_do_request(d, bio, start_time); > + } > } > > static int cached_dev_ioctl(struct bcache_device *d, blk_mode_t mode, > -- > 2.47.3 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 16:18 ` Coly Li @ 2026-01-13 16:26 ` Christoph Hellwig 2026-01-13 16:34 ` Kent Overstreet 0 siblings, 1 reply; 18+ messages in thread From: Christoph Hellwig @ 2026-01-13 16:26 UTC (permalink / raw) To: Coly Li Cc: Christoph Hellwig, Kent Overstreet, axboe, linux-block, linux-bcache, Shida Zhang On Wed, Jan 14, 2026 at 12:18:45AM +0800, Coly Li wrote: > Hi Christoph, > > This cloned bio method looks good. Could you please post a formal patch? > Then I may replace the revert commit with your patch. As usual with bcache I don't really know how to test it to confidently submit it. Is there an official test suite I can run now. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 16:26 ` Christoph Hellwig @ 2026-01-13 16:34 ` Kent Overstreet 2026-01-19 9:51 ` Daniel Wagner 0 siblings, 1 reply; 18+ messages in thread From: Kent Overstreet @ 2026-01-13 16:34 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Coly Li, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 08:26:07AM -0800, Christoph Hellwig wrote: > On Wed, Jan 14, 2026 at 12:18:45AM +0800, Coly Li wrote: > > Hi Christoph, > > > > This cloned bio method looks good. Could you please post a formal patch? > > Then I may replace the revert commit with your patch. > > As usual with bcache I don't really know how to test it to confidently > submit it. Is there an official test suite I can run now. Coly and I were just on a call discussing updating my old test suite. I haven't used the bcache tests in > 10 years so they do need to be updated, but the harness and related tooling is well supported both for local development and has full CI. https://evilpiepirate.org/git/ktest.git/tree/README.md https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/ There's fio verify testing in various modes, reboot tests - there was fault injection when I was using it but the fault injection code never got merged into mainline. I wish I could do the updating myself, but I'm neck deep in my own code, but as I was just telling Coly I'm always available on IRC to answer questions - irc.oftc.net #bcache. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 16:34 ` Kent Overstreet @ 2026-01-19 9:51 ` Daniel Wagner 2026-01-19 10:18 ` Kent Overstreet 0 siblings, 1 reply; 18+ messages in thread From: Daniel Wagner @ 2026-01-19 9:51 UTC (permalink / raw) To: Kent Overstreet Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache, Shida Zhang On Tue, Jan 13, 2026 at 11:34:18AM -0500, Kent Overstreet wrote: > Coly and I were just on a call discussing updating my old test suite. I > haven't used the bcache tests in > 10 years so they do need to be > updated, but the harness and related tooling is well supported both for > local development and has full CI. > > https://evilpiepirate.org/git/ktest.git/tree/README.md > https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/ I just quickly look at the tests and I got the impression some of those tests could be added to blktests. blktests is run by various people, thus bcache would get some basic test exposure, e.g. in linux-next. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-19 9:51 ` Daniel Wagner @ 2026-01-19 10:18 ` Kent Overstreet 2026-01-19 10:34 ` Daniel Wagner 0 siblings, 1 reply; 18+ messages in thread From: Kent Overstreet @ 2026-01-19 10:18 UTC (permalink / raw) To: Daniel Wagner Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache, Shida Zhang On Mon, Jan 19, 2026 at 10:51:46AM +0100, Daniel Wagner wrote: > On Tue, Jan 13, 2026 at 11:34:18AM -0500, Kent Overstreet wrote: > > Coly and I were just on a call discussing updating my old test suite. I > > haven't used the bcache tests in > 10 years so they do need to be > > updated, but the harness and related tooling is well supported both for > > local development and has full CI. > > > > https://evilpiepirate.org/git/ktest.git/tree/README.md > > https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/ > > I just quickly look at the tests and I got the impression some of those > tests could be added to blktests. blktests is run by various people, > thus bcache would get some basic test exposure, e.g. in linux-next. ktest has features that blktests/fstests don't - it's a full testrunner, with a CI and test dashboard, with subtest level sharding that runs on entire cluster. What would make sense would be for ktest to wrap blktests, like it already does fstests. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-19 10:18 ` Kent Overstreet @ 2026-01-19 10:34 ` Daniel Wagner 2026-01-19 15:57 ` Daniel Wagner 0 siblings, 1 reply; 18+ messages in thread From: Daniel Wagner @ 2026-01-19 10:34 UTC (permalink / raw) To: Kent Overstreet Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache, Shida Zhang On Mon, Jan 19, 2026 at 05:18:15AM -0500, Kent Overstreet wrote: > > > https://evilpiepirate.org/git/ktest.git/tree/README.md > > > https://evilpiepirate.org/git/ktest.git/tree/tests/bcache/ > > > > I just quickly look at the tests and I got the impression some of those > > tests could be added to blktests. blktests is run by various people, > > thus bcache would get some basic test exposure, e.g. in linux-next. > > ktest has features that blktests/fstests don't - it's a full testrunner, > with a CI and test dashboard, with subtest level sharding that runs on > entire cluster. That's why I said some of tests could be added directly to blktests, e.g. test_main() { setup_tracing 'bcache:*' setup_bcache cset_uuid=$(ls -d /sys/fs/bcache/*-*-* | sed -e 's/.*\///') ( for i in $(seq 1 3); do sleep 5 echo > /sys/block/bcache0/bcache/detach echo "detach done" sleep 5 echo $cset_uuid > /sys/block/bcache0/bcache/attach echo "attach done" done )& run_antagonist run_fio stop_bcache } seems something which could also run in blktests. Sure, blktests doesn't have all the features ktest has, but that is besides the point. The bcache test suite is surely a good thing. Lately, blktests is getting traction and run by more people and QA teams. My whole comment is that adding some explicit bcache tests will get bcache more test exposure. > What would make sense would be for ktest to wrap blktests, like it > already does fstests. Sure, not a problem. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-19 10:34 ` Daniel Wagner @ 2026-01-19 15:57 ` Daniel Wagner 0 siblings, 0 replies; 18+ messages in thread From: Daniel Wagner @ 2026-01-19 15:57 UTC (permalink / raw) To: Kent Overstreet Cc: Christoph Hellwig, Coly Li, axboe, linux-block, linux-bcache, Shida Zhang On Mon, Jan 19, 2026 at 11:34:15AM +0100, Daniel Wagner wrote: > That's why I said some of tests could be added directly to blktests, FWIW, I've started to work on an initial test case for bcache in blktests. I shamelessly stole some helper/setup stuff from ktest to get things rolling. But I think I am going to replace this as it doesn't really match the blktests style for helpers and in the end it's not that much we need to get it working anyway: https://github.com/igaw/blktests/pull/new/bcache In case anyone wants to play with, you need to define TEST_CASE_DEV_ARRAY: # cat /root/config TEST_CASE_DEV_ARRAY[bcache/*]="/dev/nvme0n1 /dev/nvme1n1 /dev/vda /dev/vdb" # ./check -c /root/config bcache ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 8:07 ` Christoph Hellwig 2026-01-13 8:30 ` Kent Overstreet @ 2026-01-13 16:22 ` Coly Li 2026-01-13 16:25 ` Christoph Hellwig 1 sibling, 1 reply; 18+ messages in thread From: Coly Li @ 2026-01-13 16:22 UTC (permalink / raw) To: Christoph Hellwig Cc: axboe, linux-block, linux-bcache, Shida Zhang, Kent Overstreet On Tue, Jan 13, 2026 at 12:07:54AM +0800, Christoph Hellwig wrote: > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote: > > From: Coly Li <colyli@fnnas.com> > > > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325. > > > > The above commit tries to address the race in bio chain handling, > > but it seems in detached_dev_end_io() simply using bio_endio() to > > replace bio->bi_end_io() may introduce potential regression. > > > > This patch revert the commit, let's wait for better fix from Shida. > > That's a pretty vague commit message for reverting a clear API > violation that has caused trouble. What is the story here? > The discussion happens in stable mailing list. Also I admitted my fault. Though I didn't ack commit 53280e398471 ("bcache: fix improper use of bi_end_io"), I read it and overlooked the bio_endio() duplicated entry issue. Coly Li ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Revert "bcache: fix improper use of bi_end_io" 2026-01-13 16:22 ` Coly Li @ 2026-01-13 16:25 ` Christoph Hellwig 0 siblings, 0 replies; 18+ messages in thread From: Christoph Hellwig @ 2026-01-13 16:25 UTC (permalink / raw) To: Coly Li Cc: Christoph Hellwig, axboe, linux-block, linux-bcache, Shida Zhang, Kent Overstreet On Wed, Jan 14, 2026 at 12:22:56AM +0800, Coly Li wrote: > On Tue, Jan 13, 2026 at 12:07:54AM +0800, Christoph Hellwig wrote: > > On Tue, Jan 13, 2026 at 02:09:39PM +0800, colyli@fnnas.com wrote: > > > From: Coly Li <colyli@fnnas.com> > > > > > > This reverts commit 53280e398471f0bddbb17b798a63d41264651325. > > > > > > The above commit tries to address the race in bio chain handling, > > > but it seems in detached_dev_end_io() simply using bio_endio() to > > > replace bio->bi_end_io() may introduce potential regression. > > > > > > This patch revert the commit, let's wait for better fix from Shida. > > > > That's a pretty vague commit message for reverting a clear API > > violation that has caused trouble. What is the story here? > > > > The discussion happens in stable mailing list. Also I admitted my fault. > Though I didn't ack commit 53280e398471 ("bcache: fix improper use of > bi_end_io"), I read it and overlooked the bio_endio() duplicated entry > issue. Please try to document the rationale in the commit logs, the current one was extremely vague. ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-01-19 15:57 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-13 6:09 [PATCH] Revert "bcache: fix improper use of bi_end_io" colyli 2026-01-13 8:07 ` Christoph Hellwig 2026-01-13 8:30 ` Kent Overstreet 2026-01-13 8:34 ` Christoph Hellwig 2026-01-13 8:39 ` Kent Overstreet 2026-01-13 8:58 ` Christoph Hellwig 2026-01-13 9:27 ` Kent Overstreet 2026-01-13 15:00 ` Christoph Hellwig 2026-01-13 15:30 ` Kent Overstreet 2026-01-13 16:18 ` Coly Li 2026-01-13 16:26 ` Christoph Hellwig 2026-01-13 16:34 ` Kent Overstreet 2026-01-19 9:51 ` Daniel Wagner 2026-01-19 10:18 ` Kent Overstreet 2026-01-19 10:34 ` Daniel Wagner 2026-01-19 15:57 ` Daniel Wagner 2026-01-13 16:22 ` Coly Li 2026-01-13 16:25 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox