* [PATCH v4 0/3] Fix bio chain related issues
@ 2025-12-01 9:04 zhangshida
2025-12-01 9:04 ` [PATCH v4 1/3] bcache: fix improper use of bi_end_io zhangshida
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: zhangshida @ 2025-12-01 9:04 UTC (permalink / raw)
To: Johannes.Thumshirn, hch, agruenba, ming.lei, hsiangkao, csander,
colyli
Cc: linux-block, linux-bcache, linux-kernel, zhangshida, starzhangzsd
From: Shida Zhang <zhangshida@kylinos.cn>
This series addresses incorrect usage of bio_chain_endio().
Note: Patch 2 must depends on changes introduced in patch 1. Therefore,
patch 1 is still included in this series even though Coly suggested
sending it directly to the bcache mailing list:
https://lore.kernel.org/all/20251201082611.2703889-1-zhangshida@kylinos.cn/
v4:
- Removed unnecessary cleanups from the series.
v3:
- Remove the dead code in bio_chain_endio and drop patch 1 in v2
- Refined the __bio_chain_endio changes with minor modifications (was
patch 02 in v2).
- Dropped cleanup patches 06 and 12 from v2 due to an incorrect 'prev'
and 'new' order.
https://lore.kernel.org/all/20251129090122.2457896-1-zhangshida@kylinos.cn/
v2:
- Added fix for bcache.
- Added BUG_ON() in bio_chain_endio().
- Enhanced commit messages for each patch
https://lore.kernel.org/all/20251128083219.2332407-1-zhangshida@kylinos.cn/
v1:
https://lore.kernel.org/all/20251121081748.1443507-1-zhangshida@kylinos.cn/
Shida Zhang (3):
bcache: fix improper use of bi_end_io
block: prohibit calls to bio_chain_endio
block: prevent race condition on bi_status in __bio_chain_endio
block/bio.c | 11 ++++++++---
drivers/md/bcache/request.c | 6 +++---
2 files changed, 11 insertions(+), 6 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH v4 1/3] bcache: fix improper use of bi_end_io 2025-12-01 9:04 [PATCH v4 0/3] Fix bio chain related issues zhangshida @ 2025-12-01 9:04 ` zhangshida 2025-12-01 9:04 ` [PATCH v4 2/3] block: prohibit calls to bio_chain_endio zhangshida 2025-12-01 9:04 ` [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio zhangshida 2 siblings, 0 replies; 15+ messages in thread From: zhangshida @ 2025-12-01 9:04 UTC (permalink / raw) To: Johannes.Thumshirn, hch, agruenba, ming.lei, hsiangkao, csander, colyli Cc: linux-block, linux-bcache, linux-kernel, zhangshida, starzhangzsd From: Shida Zhang <zhangshida@kylinos.cn> Don't call bio->bi_end_io() directly. Use the bio_endio() helper function instead, which handles completion more safely and uniformly. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- drivers/md/bcache/request.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index af345dc6fde..82fdea7dea7 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -1104,7 +1104,7 @@ static void detached_dev_end_io(struct bio *bio) } kfree(ddip); - bio->bi_end_io(bio); + bio_endio(bio); } static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, @@ -1121,7 +1121,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); if (!ddip) { bio->bi_status = BLK_STS_RESOURCE; - bio->bi_end_io(bio); + bio_endio(bio); return; } @@ -1136,7 +1136,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio, if ((bio_op(bio) == REQ_OP_DISCARD) && !bdev_max_discard_sectors(dc->bdev)) - bio->bi_end_io(bio); + detached_dev_end_io(bio); else submit_bio_noacct(bio); } -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 2/3] block: prohibit calls to bio_chain_endio 2025-12-01 9:04 [PATCH v4 0/3] Fix bio chain related issues zhangshida 2025-12-01 9:04 ` [PATCH v4 1/3] bcache: fix improper use of bi_end_io zhangshida @ 2025-12-01 9:04 ` zhangshida 2025-12-01 9:34 ` Andreas Gruenbacher 2025-12-01 9:04 ` [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio zhangshida 2 siblings, 1 reply; 15+ messages in thread From: zhangshida @ 2025-12-01 9:04 UTC (permalink / raw) To: Johannes.Thumshirn, hch, agruenba, ming.lei, hsiangkao, csander, colyli Cc: linux-block, linux-bcache, linux-kernel, zhangshida, starzhangzsd, Christoph Hellwig From: Shida Zhang <zhangshida@kylinos.cn> Now that all potential callers of bio_chain_endio have been eliminated, completely prohibit any future calls to this function. Suggested-by: Ming Lei <ming.lei@redhat.com> Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- block/bio.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/block/bio.c b/block/bio.c index b3a79285c27..1b5e4577f4c 100644 --- a/block/bio.c +++ b/block/bio.c @@ -320,9 +320,13 @@ static struct bio *__bio_chain_endio(struct bio *bio) return parent; } +/** + * This function should only be used as a flag and must never be called. + * If execution reaches here, it indicates a serious programming error. + */ static void bio_chain_endio(struct bio *bio) { - bio_endio(__bio_chain_endio(bio)); + BUG_ON(1); } /** -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/3] block: prohibit calls to bio_chain_endio 2025-12-01 9:04 ` [PATCH v4 2/3] block: prohibit calls to bio_chain_endio zhangshida @ 2025-12-01 9:34 ` Andreas Gruenbacher 2025-12-04 1:17 ` Stephen Zhang 0 siblings, 1 reply; 15+ messages in thread From: Andreas Gruenbacher @ 2025-12-01 9:34 UTC (permalink / raw) To: zhangshida Cc: Johannes.Thumshirn, hch, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida, Christoph Hellwig On Mon, Dec 1, 2025 at 10:05 AM zhangshida <starzhangzsd@gmail.com> wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > Now that all potential callers of bio_chain_endio have been > eliminated, completely prohibit any future calls to this function. > > Suggested-by: Ming Lei <ming.lei@redhat.com> > Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> > Suggested-by: Christoph Hellwig <hch@infradead.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > --- > block/bio.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/block/bio.c b/block/bio.c > index b3a79285c27..1b5e4577f4c 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -320,9 +320,13 @@ static struct bio *__bio_chain_endio(struct bio *bio) > return parent; > } > > +/** > + * This function should only be used as a flag and must never be called. > + * If execution reaches here, it indicates a serious programming error. > + */ > static void bio_chain_endio(struct bio *bio) > { > - bio_endio(__bio_chain_endio(bio)); > + BUG_ON(1); Just 'BUG()'. > } > > /** > -- > 2.34.1 > Andreas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/3] block: prohibit calls to bio_chain_endio 2025-12-01 9:34 ` Andreas Gruenbacher @ 2025-12-04 1:17 ` Stephen Zhang 0 siblings, 0 replies; 15+ messages in thread From: Stephen Zhang @ 2025-12-04 1:17 UTC (permalink / raw) To: Andreas Gruenbacher Cc: Johannes.Thumshirn, hch, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida, Christoph Hellwig Andreas Gruenbacher <agruenba@redhat.com> 于2025年12月1日周一 17:34写道: > > On Mon, Dec 1, 2025 at 10:05 AM zhangshida <starzhangzsd@gmail.com> wrote: > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > Now that all potential callers of bio_chain_endio have been > > eliminated, completely prohibit any future calls to this function. > > > > Suggested-by: Ming Lei <ming.lei@redhat.com> > > Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> > > Suggested-by: Christoph Hellwig <hch@infradead.org> > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > --- > > block/bio.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/block/bio.c b/block/bio.c > > index b3a79285c27..1b5e4577f4c 100644 > > --- a/block/bio.c > > +++ b/block/bio.c > > @@ -320,9 +320,13 @@ static struct bio *__bio_chain_endio(struct bio *bio) > > return parent; > > } > > > > +/** > > + * This function should only be used as a flag and must never be called. > > + * If execution reaches here, it indicates a serious programming error. > > + */ > > static void bio_chain_endio(struct bio *bio) > > { > > - bio_endio(__bio_chain_endio(bio)); > > + BUG_ON(1); > > Just 'BUG()'. > Yeah, that’s much clearer now. I'll resend the updated version. Thanks, Shida > > } > > > > /** > > > -- > > 2.34.1 > > > > Andreas > ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-01 9:04 [PATCH v4 0/3] Fix bio chain related issues zhangshida 2025-12-01 9:04 ` [PATCH v4 1/3] bcache: fix improper use of bi_end_io zhangshida 2025-12-01 9:04 ` [PATCH v4 2/3] block: prohibit calls to bio_chain_endio zhangshida @ 2025-12-01 9:04 ` zhangshida 2025-12-01 10:22 ` Andreas Gruenbacher 2 siblings, 1 reply; 15+ messages in thread From: zhangshida @ 2025-12-01 9:04 UTC (permalink / raw) To: Johannes.Thumshirn, hch, agruenba, ming.lei, hsiangkao, csander, colyli Cc: linux-block, linux-bcache, linux-kernel, zhangshida, starzhangzsd, Christoph Hellwig From: Shida Zhang <zhangshida@kylinos.cn> Andreas point out that multiple completions can race setting bi_status. The check (parent->bi_status) and the subsequent write are not an atomic operation. The value of parent->bi_status could have changed between the time you read it for the if check and the time you write to it. So we use cmpxchg to fix the race, as suggested by Christoph. Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- block/bio.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/block/bio.c b/block/bio.c index 1b5e4577f4c..097c1cd2054 100644 --- a/block/bio.c +++ b/block/bio.c @@ -314,8 +314,9 @@ static struct bio *__bio_chain_endio(struct bio *bio) { struct bio *parent = bio->bi_private; - if (bio->bi_status && !parent->bi_status) - parent->bi_status = bio->bi_status; + if (bio->bi_status) + cmpxchg(&parent->bi_status, 0, bio->bi_status); + bio_put(bio); return parent; } -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-01 9:04 ` [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio zhangshida @ 2025-12-01 10:22 ` Andreas Gruenbacher 2025-12-01 11:25 ` Christoph Hellwig 0 siblings, 1 reply; 15+ messages in thread From: Andreas Gruenbacher @ 2025-12-01 10:22 UTC (permalink / raw) To: zhangshida Cc: Johannes.Thumshirn, hch, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida, Christoph Hellwig On Mon, Dec 1, 2025 at 10:05 AM zhangshida <starzhangzsd@gmail.com> wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > Andreas point out that multiple completions can race setting > bi_status. > > The check (parent->bi_status) and the subsequent write are not an > atomic operation. The value of parent->bi_status could have changed > between the time you read it for the if check and the time you write > to it. So we use cmpxchg to fix the race, as suggested by Christoph. > > Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> > Suggested-by: Christoph Hellwig <hch@infradead.org> > Suggested-by: Caleb Sander Mateos <csander@purestorage.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > --- > block/bio.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index 1b5e4577f4c..097c1cd2054 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -314,8 +314,9 @@ static struct bio *__bio_chain_endio(struct bio *bio) > { > struct bio *parent = bio->bi_private; > > - if (bio->bi_status && !parent->bi_status) > - parent->bi_status = bio->bi_status; > + if (bio->bi_status) > + cmpxchg(&parent->bi_status, 0, bio->bi_status); Hmm. I don't think cmpxchg() actually is of any value here: for all the chained bios, bi_status is initialized to 0, and it is only set again (to a non-0 value) when a failure occurs. When there are multiple failures, we only need to make sure that one of those failures is eventually reported, but for that, a simple assignment is enough here. The cmpxchg() won't guarantee that a specific error value will survive; it all still depends on the timing. The cmpxchg() only makes it look like something special is happening here with respect to ordering. > + > bio_put(bio); > return parent; > } > -- > 2.34.1 > Thanks, Andreas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-01 10:22 ` Andreas Gruenbacher @ 2025-12-01 11:25 ` Christoph Hellwig 2025-12-01 13:07 ` Andreas Gruenbacher 0 siblings, 1 reply; 15+ messages in thread From: Christoph Hellwig @ 2025-12-01 11:25 UTC (permalink / raw) To: Andreas Gruenbacher Cc: zhangshida, Johannes.Thumshirn, hch, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida, Christoph Hellwig On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > - if (bio->bi_status && !parent->bi_status) > > - parent->bi_status = bio->bi_status; > > + if (bio->bi_status) > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > Hmm. I don't think cmpxchg() actually is of any value here: for all > the chained bios, bi_status is initialized to 0, and it is only set > again (to a non-0 value) when a failure occurs. When there are > multiple failures, we only need to make sure that one of those > failures is eventually reported, but for that, a simple assignment is > enough here. A simple assignment doesn't guarantee atomicy. It also overrides earlier with later status codes, which might not be desirable. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-01 11:25 ` Christoph Hellwig @ 2025-12-01 13:07 ` Andreas Gruenbacher 2025-12-02 5:48 ` Christoph Hellwig 0 siblings, 1 reply; 15+ messages in thread From: Andreas Gruenbacher @ 2025-12-01 13:07 UTC (permalink / raw) To: Christoph Hellwig Cc: zhangshida, Johannes.Thumshirn, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida, Christoph Hellwig On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@infradead.org> wrote: > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > > - if (bio->bi_status && !parent->bi_status) > > > - parent->bi_status = bio->bi_status; > > > + if (bio->bi_status) > > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > the chained bios, bi_status is initialized to 0, and it is only set > > again (to a non-0 value) when a failure occurs. When there are > > multiple failures, we only need to make sure that one of those > > failures is eventually reported, but for that, a simple assignment is > > enough here. > > A simple assignment doesn't guarantee atomicy. Well, we've already discussed that bi_status is a single byte and so tearing won't be an issue. Otherwise, WRITE_ONCE() would still be enough here. > It also overrides earlier with later status codes, which might not be desirable. In an A -> B bio chain, we have two competing bi_status writers: (1) when the A fails, B->bi_status will be updated using cmpxchg(), (2) when B fails, bi_status will be assigned a non-0 value. In that scenario, B failures will always win over A failures. In addition, when we have an A -> B -> C bio chain, we still won't get "first failure wins" semantics because a failure of A will only be propagated to C once B completes as well. To "fix" that, we'd have to "chain" all bios to the same parent instead. But I don't think any of that is really needed. Andreas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-01 13:07 ` Andreas Gruenbacher @ 2025-12-02 5:48 ` Christoph Hellwig 2025-12-02 21:15 ` Andreas Gruenbacher 0 siblings, 1 reply; 15+ messages in thread From: Christoph Hellwig @ 2025-12-02 5:48 UTC (permalink / raw) To: Andreas Gruenbacher Cc: zhangshida, Johannes.Thumshirn, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida On Mon, Dec 01, 2025 at 02:07:07PM +0100, Andreas Gruenbacher wrote: > On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@infradead.org> wrote: > > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > > > - if (bio->bi_status && !parent->bi_status) > > > > - parent->bi_status = bio->bi_status; > > > > + if (bio->bi_status) > > > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > > the chained bios, bi_status is initialized to 0, and it is only set > > > again (to a non-0 value) when a failure occurs. When there are > > > multiple failures, we only need to make sure that one of those > > > failures is eventually reported, but for that, a simple assignment is > > > enough here. > > > > A simple assignment doesn't guarantee atomicy. > > Well, we've already discussed that bi_status is a single byte and so > tearing won't be an issue. Otherwise, WRITE_ONCE() would still be > enough here. No. At least older alpha can tear byte updates as they need a read-modify-write cycle. But even on normal x86 the check and the update would be racy. The cmpxchg makes the intentions very clear, works everywhere and given it only happens in the error path does not create any fast path overhead. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-02 5:48 ` Christoph Hellwig @ 2025-12-02 21:15 ` Andreas Gruenbacher 2025-12-03 1:51 ` Stephen Zhang 0 siblings, 1 reply; 15+ messages in thread From: Andreas Gruenbacher @ 2025-12-02 21:15 UTC (permalink / raw) To: Christoph Hellwig Cc: zhangshida, Johannes.Thumshirn, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida On Tue, Dec 2, 2025 at 6:48 AM Christoph Hellwig <hch@lst.de> wrote: > On Mon, Dec 01, 2025 at 02:07:07PM +0100, Andreas Gruenbacher wrote: > > On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@infradead.org> wrote: > > > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > > > > - if (bio->bi_status && !parent->bi_status) > > > > > - parent->bi_status = bio->bi_status; > > > > > + if (bio->bi_status) > > > > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > > > > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > > > the chained bios, bi_status is initialized to 0, and it is only set > > > > again (to a non-0 value) when a failure occurs. When there are > > > > multiple failures, we only need to make sure that one of those > > > > failures is eventually reported, but for that, a simple assignment is > > > > enough here. > > > > > > A simple assignment doesn't guarantee atomicy. > > > > Well, we've already discussed that bi_status is a single byte and so > > tearing won't be an issue. Otherwise, WRITE_ONCE() would still be > > enough here. > > No. At least older alpha can tear byte updates as they need a > read-modify-write cycle. I know this used to be a thing in the past, but to see that none of that is relevant anymore today, have a look at where [*] quotes the C11 standard: memory location either an object of scalar type, or a maximal sequence of adjacent bit-fields all having nonzero width NOTE 1: Two threads of execution can update and access separate memory locations without interfering with each other. NOTE 2: A bit-field and an adjacent non-bit-field member are in separate memory locations. The same applies to two bit-fields, if one is declared inside a nested structure declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field member declaration. It is not safe to concurrently update two bit-fields in the same structure if all members declared between them are also bit-fields, no matter what the sizes of those intervening bit-fields happen to be. [*] Documentation/memory-barriers.txt > But even on normal x86 the check and the update would be racy. There is no check and update (RMW), though. Quoting what I wrote earlier in this thread: On Mon, Dec 1, 2025 at 11:22 AM Andreas Gruenbacher <agruenba@redhat.com> wrote: > Hmm. I don't think cmpxchg() actually is of any value here: for all > the chained bios, bi_status is initialized to 0, and it is only set > again (to a non-0 value) when a failure occurs. When there are > multiple failures, we only need to make sure that one of those > failures is eventually reported, but for that, a simple assignment is > enough here. The cmpxchg() won't guarantee that a specific error value > will survive; it all still depends on the timing. The cmpxchg() only > makes it look like something special is happening here with respect to > ordering. So with or without the cmpxchg(), if there are multiple errors, we won't know which bi_status code will survive, but we do know that we will end up with one of those error codes. Andreas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-02 21:15 ` Andreas Gruenbacher @ 2025-12-03 1:51 ` Stephen Zhang 2025-12-03 3:09 ` Stephen Zhang 0 siblings, 1 reply; 15+ messages in thread From: Stephen Zhang @ 2025-12-03 1:51 UTC (permalink / raw) To: Andreas Gruenbacher Cc: Christoph Hellwig, Johannes.Thumshirn, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida Andreas Gruenbacher <agruenba@redhat.com> 于2025年12月3日周三 05:15写道: > > On Tue, Dec 2, 2025 at 6:48 AM Christoph Hellwig <hch@lst.de> wrote: > > On Mon, Dec 01, 2025 at 02:07:07PM +0100, Andreas Gruenbacher wrote: > > > On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@infradead.org> wrote: > > > > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > > > > > - if (bio->bi_status && !parent->bi_status) > > > > > > - parent->bi_status = bio->bi_status; > > > > > > + if (bio->bi_status) > > > > > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > > > > > > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > > > > the chained bios, bi_status is initialized to 0, and it is only set > > > > > again (to a non-0 value) when a failure occurs. When there are > > > > > multiple failures, we only need to make sure that one of those > > > > > failures is eventually reported, but for that, a simple assignment is > > > > > enough here. > > > > > > > > A simple assignment doesn't guarantee atomicy. > > > > > > Well, we've already discussed that bi_status is a single byte and so > > > tearing won't be an issue. Otherwise, WRITE_ONCE() would still be > > > enough here. > > > > No. At least older alpha can tear byte updates as they need a > > read-modify-write cycle. > > I know this used to be a thing in the past, but to see that none of > that is relevant anymore today, have a look at where [*] quotes the > C11 standard: > > memory location > either an object of scalar type, or a maximal sequence > of adjacent bit-fields all having nonzero width > > NOTE 1: Two threads of execution can update and access > separate memory locations without interfering with > each other. > > NOTE 2: A bit-field and an adjacent non-bit-field member > are in separate memory locations. The same applies > to two bit-fields, if one is declared inside a nested > structure declaration and the other is not, or if the two > are separated by a zero-length bit-field declaration, > or if they are separated by a non-bit-field member > declaration. It is not safe to concurrently update two > bit-fields in the same structure if all members declared > between them are also bit-fields, no matter what the > sizes of those intervening bit-fields happen to be. > > [*] Documentation/memory-barriers.txt > > > But even on normal x86 the check and the update would be racy. > > There is no check and update (RMW), though. Quoting what I wrote > earlier in this thread: > > On Mon, Dec 1, 2025 at 11:22 AM Andreas Gruenbacher <agruenba@redhat.com> wrote: > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > the chained bios, bi_status is initialized to 0, and it is only set > > again (to a non-0 value) when a failure occurs. When there are > > multiple failures, we only need to make sure that one of those > > failures is eventually reported, but for that, a simple assignment is > > enough here. The cmpxchg() won't guarantee that a specific error value > > will survive; it all still depends on the timing. The cmpxchg() only > > makes it look like something special is happening here with respect to > > ordering. > > So with or without the cmpxchg(), if there are multiple errors, we > won't know which bi_status code will survive, but we do know that we > will end up with one of those error codes. > Thank you for sharing your insights—I found the discussion very enlightening. While I agree with Andreas’s perspective, I also very much appreciate the clarity and precision offered by the cmpxchg() approach. That’s why when Christoph suggested it, I was happy to incorporate it into the code. But a cmpxchg is a little bit redundant here. so we will change it to the simple assignment: - if (bio->bi_status && !parent->bi_status) parent->bi_status = bio->bi_status; + if (bio->bi_status) parent->bi_status = bio->bi_status; I will integrate this discussion into the commit message, it is very insightful. Thanks, Shida > Andreas > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-03 1:51 ` Stephen Zhang @ 2025-12-03 3:09 ` Stephen Zhang 2025-12-03 4:34 ` Caleb Sander Mateos 2025-12-03 6:14 ` Christoph Hellwig 0 siblings, 2 replies; 15+ messages in thread From: Stephen Zhang @ 2025-12-03 3:09 UTC (permalink / raw) To: Andreas Gruenbacher Cc: Christoph Hellwig, Johannes.Thumshirn, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida Stephen Zhang <starzhangzsd@gmail.com> 于2025年12月3日周三 09:51写道: > > Andreas Gruenbacher <agruenba@redhat.com> 于2025年12月3日周三 05:15写道: > > > > On Tue, Dec 2, 2025 at 6:48 AM Christoph Hellwig <hch@lst.de> wrote: > > > On Mon, Dec 01, 2025 at 02:07:07PM +0100, Andreas Gruenbacher wrote: > > > > On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@infradead.org> wrote: > > > > > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > > > > > > - if (bio->bi_status && !parent->bi_status) > > > > > > > - parent->bi_status = bio->bi_status; > > > > > > > + if (bio->bi_status) > > > > > > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > > > > > > > > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > > > > > the chained bios, bi_status is initialized to 0, and it is only set > > > > > > again (to a non-0 value) when a failure occurs. When there are > > > > > > multiple failures, we only need to make sure that one of those > > > > > > failures is eventually reported, but for that, a simple assignment is > > > > > > enough here. > > > > > > > > > > A simple assignment doesn't guarantee atomicy. > > > > > > > > Well, we've already discussed that bi_status is a single byte and so > > > > tearing won't be an issue. Otherwise, WRITE_ONCE() would still be > > > > enough here. > > > > > > No. At least older alpha can tear byte updates as they need a > > > read-modify-write cycle. > > > > I know this used to be a thing in the past, but to see that none of > > that is relevant anymore today, have a look at where [*] quotes the > > C11 standard: > > > > memory location > > either an object of scalar type, or a maximal sequence > > of adjacent bit-fields all having nonzero width > > > > NOTE 1: Two threads of execution can update and access > > separate memory locations without interfering with > > each other. > > > > NOTE 2: A bit-field and an adjacent non-bit-field member > > are in separate memory locations. The same applies > > to two bit-fields, if one is declared inside a nested > > structure declaration and the other is not, or if the two > > are separated by a zero-length bit-field declaration, > > or if they are separated by a non-bit-field member > > declaration. It is not safe to concurrently update two > > bit-fields in the same structure if all members declared > > between them are also bit-fields, no matter what the > > sizes of those intervening bit-fields happen to be. > > > > [*] Documentation/memory-barriers.txt > > > > > But even on normal x86 the check and the update would be racy. > > > > There is no check and update (RMW), though. Quoting what I wrote > > earlier in this thread: > > > > On Mon, Dec 1, 2025 at 11:22 AM Andreas Gruenbacher <agruenba@redhat.com> wrote: > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > > the chained bios, bi_status is initialized to 0, and it is only set > > > again (to a non-0 value) when a failure occurs. When there are > > > multiple failures, we only need to make sure that one of those > > > failures is eventually reported, but for that, a simple assignment is > > > enough here. The cmpxchg() won't guarantee that a specific error value > > > will survive; it all still depends on the timing. The cmpxchg() only > > > makes it look like something special is happening here with respect to > > > ordering. > > > > So with or without the cmpxchg(), if there are multiple errors, we > > won't know which bi_status code will survive, but we do know that we > > will end up with one of those error codes. > > > > Thank you for sharing your insights—I found the discussion very enlightening. > > While I agree with Andreas’s perspective, I also very much appreciate > the clarity > and precision offered by the cmpxchg() approach. That’s why when Christoph > suggested it, I was happy to incorporate it into the code. > > But a cmpxchg is a little bit redundant here. > so we will change it to the simple assignment: > > - if (bio->bi_status && !parent->bi_status) > parent->bi_status = bio->bi_status; > + if (bio->bi_status) > parent->bi_status = bio->bi_status; > > I will integrate this discussion into the commit message, it is very insightful. > Hi, I’ve been reconsidering the two approaches for the upcoming patch revision. Essentially, we’re comparing two methods: A: if (bio->bi_status) parent->bi_status = bio->bi_status; B: if (bio->bi_status) cmpxchg(&parent->bi_status, 0, bio->bi_status); Both appear correct, but B seems a little bit redundant here. Upon further reflection, I’ve noticed a subtle difference: A unconditionally writes to parent->bi_status when bio->bi_status is non-zero, regardless of the current value of parent->bi_status. B uses cmpxchg to only update parent->bi_status if it is still zero. Thus, B could avoid unnecessary writes in cases where parent->bi_status has already been set to a non-zero value. Do you think this optimization would be beneficial in practice, or is the difference negligible? Thanks, Shida > Thanks, > Shida > > > Andreas > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-03 3:09 ` Stephen Zhang @ 2025-12-03 4:34 ` Caleb Sander Mateos 2025-12-03 6:14 ` Christoph Hellwig 1 sibling, 0 replies; 15+ messages in thread From: Caleb Sander Mateos @ 2025-12-03 4:34 UTC (permalink / raw) To: Stephen Zhang Cc: Andreas Gruenbacher, Christoph Hellwig, Johannes.Thumshirn, ming.lei, hsiangkao, colyli, linux-block, linux-bcache, linux-kernel, zhangshida On Tue, Dec 2, 2025 at 7:10 PM Stephen Zhang <starzhangzsd@gmail.com> wrote: > > Stephen Zhang <starzhangzsd@gmail.com> 于2025年12月3日周三 09:51写道: > > > > Andreas Gruenbacher <agruenba@redhat.com> 于2025年12月3日周三 05:15写道: > > > > > > On Tue, Dec 2, 2025 at 6:48 AM Christoph Hellwig <hch@lst.de> wrote: > > > > On Mon, Dec 01, 2025 at 02:07:07PM +0100, Andreas Gruenbacher wrote: > > > > > On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@infradead.org> wrote: > > > > > > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote: > > > > > > > > - if (bio->bi_status && !parent->bi_status) > > > > > > > > - parent->bi_status = bio->bi_status; > > > > > > > > + if (bio->bi_status) > > > > > > > > + cmpxchg(&parent->bi_status, 0, bio->bi_status); > > > > > > > > > > > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > > > > > > the chained bios, bi_status is initialized to 0, and it is only set > > > > > > > again (to a non-0 value) when a failure occurs. When there are > > > > > > > multiple failures, we only need to make sure that one of those > > > > > > > failures is eventually reported, but for that, a simple assignment is > > > > > > > enough here. > > > > > > > > > > > > A simple assignment doesn't guarantee atomicy. > > > > > > > > > > Well, we've already discussed that bi_status is a single byte and so > > > > > tearing won't be an issue. Otherwise, WRITE_ONCE() would still be > > > > > enough here. > > > > > > > > No. At least older alpha can tear byte updates as they need a > > > > read-modify-write cycle. > > > > > > I know this used to be a thing in the past, but to see that none of > > > that is relevant anymore today, have a look at where [*] quotes the > > > C11 standard: > > > > > > memory location > > > either an object of scalar type, or a maximal sequence > > > of adjacent bit-fields all having nonzero width > > > > > > NOTE 1: Two threads of execution can update and access > > > separate memory locations without interfering with > > > each other. > > > > > > NOTE 2: A bit-field and an adjacent non-bit-field member > > > are in separate memory locations. The same applies > > > to two bit-fields, if one is declared inside a nested > > > structure declaration and the other is not, or if the two > > > are separated by a zero-length bit-field declaration, > > > or if they are separated by a non-bit-field member > > > declaration. It is not safe to concurrently update two > > > bit-fields in the same structure if all members declared > > > between them are also bit-fields, no matter what the > > > sizes of those intervening bit-fields happen to be. > > > > > > [*] Documentation/memory-barriers.txt > > > > > > > But even on normal x86 the check and the update would be racy. > > > > > > There is no check and update (RMW), though. Quoting what I wrote > > > earlier in this thread: > > > > > > On Mon, Dec 1, 2025 at 11:22 AM Andreas Gruenbacher <agruenba@redhat.com> wrote: > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all > > > > the chained bios, bi_status is initialized to 0, and it is only set > > > > again (to a non-0 value) when a failure occurs. When there are > > > > multiple failures, we only need to make sure that one of those > > > > failures is eventually reported, but for that, a simple assignment is > > > > enough here. The cmpxchg() won't guarantee that a specific error value > > > > will survive; it all still depends on the timing. The cmpxchg() only > > > > makes it look like something special is happening here with respect to > > > > ordering. > > > > > > So with or without the cmpxchg(), if there are multiple errors, we > > > won't know which bi_status code will survive, but we do know that we > > > will end up with one of those error codes. > > > > > > > Thank you for sharing your insights—I found the discussion very enlightening. > > > > While I agree with Andreas’s perspective, I also very much appreciate > > the clarity > > and precision offered by the cmpxchg() approach. That’s why when Christoph > > suggested it, I was happy to incorporate it into the code. > > > > But a cmpxchg is a little bit redundant here. > > so we will change it to the simple assignment: > > > > - if (bio->bi_status && !parent->bi_status) > > parent->bi_status = bio->bi_status; > > + if (bio->bi_status) > > parent->bi_status = bio->bi_status; > > > > I will integrate this discussion into the commit message, it is very insightful. > > > > Hi, > > I’ve been reconsidering the two approaches for the upcoming patch revision. > Essentially, we’re comparing two methods: > A: > if (bio->bi_status) > parent->bi_status = bio->bi_status; > B: > if (bio->bi_status) > cmpxchg(&parent->bi_status, 0, bio->bi_status); > > Both appear correct, but B seems a little bit redundant here. > Upon further reflection, I’ve noticed a subtle difference: > A unconditionally writes to parent->bi_status when bio->bi_status is non-zero, > regardless of the current value of parent->bi_status. > B uses cmpxchg to only update parent->bi_status if it is still zero. > > Thus, B could avoid unnecessary writes in cases where parent->bi_status has > already been set to a non-zero value. > > Do you think this optimization would be beneficial in practice, or is > the difference > negligible? cmpxchg() is much more expensive than a plain write, as it compiles into an atomic instruction (or load-linked/store-conditional pair on architectures that don't provide such an instruction). This requires the processor to take exclusive access of the cache line for the duration of the operation (or check whether the cache line has been invalidated during the operation). On x86, cmpxchg() marks the cache line as modified regardless of whether the compare succeeds and the value is updated. So it doesn't actually avoid the cost of a write. However, the original code's check of parent->bi_status before writing to it *does* avoid the write if parent->bi_status is already nonzero. So the optimization you're looking for is already implemented :) If __bio_chain_endio() can be called concurrently from multiple threads accessing the same parent bio, it should be using WRITE_ONCE() (and READ_ONCE(), if applicable) to access parent->bi_status to avoid the data race. On x86 and ARM these macros produce the same instruction as a normal write, but they may be required on other architectures to avoid tearing, as well as to prevent the compiler from adding or removing memory accesses based on the assumption that the values aren't being accessed concurrently by other threads. Best, Caleb ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio 2025-12-03 3:09 ` Stephen Zhang 2025-12-03 4:34 ` Caleb Sander Mateos @ 2025-12-03 6:14 ` Christoph Hellwig 1 sibling, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2025-12-03 6:14 UTC (permalink / raw) To: Stephen Zhang Cc: Andreas Gruenbacher, Christoph Hellwig, Johannes.Thumshirn, ming.lei, hsiangkao, csander, colyli, linux-block, linux-bcache, linux-kernel, zhangshida On Wed, Dec 03, 2025 at 11:09:36AM +0800, Stephen Zhang wrote: > > I’ve been reconsidering the two approaches for the upcoming patch revision. > Essentially, we’re comparing two methods: > A: > if (bio->bi_status) > parent->bi_status = bio->bi_status; > B: > if (bio->bi_status) > cmpxchg(&parent->bi_status, 0, bio->bi_status); > > Both appear correct, but B seems a little bit redundant here. A is not correct. You at least needs READ_ONCE/WRITE_ONCE here. B solves all these issues. > Upon further reflection, I’ve noticed a subtle difference: > A unconditionally writes to parent->bi_status when bio->bi_status is non-zero, > regardless of the current value of parent->bi_status. > B uses cmpxchg to only update parent->bi_status if it is still zero. > > Thus, B could avoid unnecessary writes in cases where parent->bi_status has > already been set to a non-zero value. The unnecessary writes don't really matter, we're in an error slow path here. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-12-04 1:18 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-12-01 9:04 [PATCH v4 0/3] Fix bio chain related issues zhangshida 2025-12-01 9:04 ` [PATCH v4 1/3] bcache: fix improper use of bi_end_io zhangshida 2025-12-01 9:04 ` [PATCH v4 2/3] block: prohibit calls to bio_chain_endio zhangshida 2025-12-01 9:34 ` Andreas Gruenbacher 2025-12-04 1:17 ` Stephen Zhang 2025-12-01 9:04 ` [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio zhangshida 2025-12-01 10:22 ` Andreas Gruenbacher 2025-12-01 11:25 ` Christoph Hellwig 2025-12-01 13:07 ` Andreas Gruenbacher 2025-12-02 5:48 ` Christoph Hellwig 2025-12-02 21:15 ` Andreas Gruenbacher 2025-12-03 1:51 ` Stephen Zhang 2025-12-03 3:09 ` Stephen Zhang 2025-12-03 4:34 ` Caleb Sander Mateos 2025-12-03 6:14 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox