linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] blk-mq: fix race condition in active queue accounting
@ 2023-05-22  0:43 Tian Lan
  2023-05-22  1:08 ` Ming Lei
  2023-05-22  1:15 ` Damien Le Moal
  0 siblings, 2 replies; 16+ messages in thread
From: Tian Lan @ 2023-05-22  0:43 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, Tian Lan

From: Tian Lan <tian.lan@twosigma.com>

If multiple CPUs are sharing the same hardware queue, it can
cause leak in the active queue counter tracking when __blk_mq_tag_busy()
is executed simultaneously.

Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
---
 block/blk-mq-tag.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d6af9d431dc6..07372032238a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 	if (blk_mq_is_shared_tags(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
 
-		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
+		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
 			return;
-		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
+		}
 	} else {
-		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
+		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) {
 			return;
-		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
+		}
 	}
 
 	users = atomic_inc_return(&hctx->tags->active_queues);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  0:43 [PATCH 1/1] blk-mq: fix race condition in active queue accounting Tian Lan
@ 2023-05-22  1:08 ` Ming Lei
  2023-05-22  2:12   ` Tian Lan
  2023-05-22  1:15 ` Damien Le Moal
  1 sibling, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-05-22  1:08 UTC (permalink / raw)
  To: Tian Lan; +Cc: axboe, linux-block, Tian Lan, Liu Song, Ming Lei

On Mon, May 22, 2023 at 8:45 AM Tian Lan <tilan7663@gmail.com> wrote:
>
> From: Tian Lan <tian.lan@twosigma.com>
>
> If multiple CPUs are sharing the same hardware queue, it can
> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
>
> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  0:43 [PATCH 1/1] blk-mq: fix race condition in active queue accounting Tian Lan
  2023-05-22  1:08 ` Ming Lei
@ 2023-05-22  1:15 ` Damien Le Moal
  2023-05-22  1:23   ` Ming Lei
  1 sibling, 1 reply; 16+ messages in thread
From: Damien Le Moal @ 2023-05-22  1:15 UTC (permalink / raw)
  To: Tian Lan, axboe; +Cc: linux-block, Tian Lan

On 5/22/23 09:43, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> If multiple CPUs are sharing the same hardware queue, it can
> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
> 
> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
>  block/blk-mq-tag.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d6af9d431dc6..07372032238a 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  	if (blk_mq_is_shared_tags(hctx->flags)) {
>  		struct request_queue *q = hctx->queue;
>  
> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {

This is weird. test_and_set_bit() returns the bit old value, so shouldn't this be:

		if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
			return;

?

>  			return;
> -		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
> +		}
>  	} else {
> -		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
> +		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) {
>  			return;
> -		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
> +		}

Same here. And given that this pattern is the same for the if and the else, this
entire hunk can likely be simplified.

>  	}
>  
>  	users = atomic_inc_return(&hctx->tags->active_queues);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  1:15 ` Damien Le Moal
@ 2023-05-22  1:23   ` Ming Lei
  2023-05-22  1:29     ` Damien Le Moal
  2023-05-22  2:05     ` Jens Axboe
  0 siblings, 2 replies; 16+ messages in thread
From: Ming Lei @ 2023-05-22  1:23 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Tian Lan, axboe, linux-block, Tian Lan

On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:
> On 5/22/23 09:43, Tian Lan wrote:
> > From: Tian Lan <tian.lan@twosigma.com>
> > 
> > If multiple CPUs are sharing the same hardware queue, it can
> > cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> > is executed simultaneously.
> > 
> > Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> > Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> > ---
> >  block/blk-mq-tag.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index d6af9d431dc6..07372032238a 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> >  	if (blk_mq_is_shared_tags(hctx->flags)) {
> >  		struct request_queue *q = hctx->queue;
> >  
> > -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> > +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> > +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
> 
> This is weird. test_and_set_bit() returns the bit old value, so shouldn't this be:
> 
> 		if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> 			return;
> 
> ?

It is one micro optimization since test_and_set_bit is much heavier than
test_bit, so test_and_set_bit() is just needed in the 1st time.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  1:23   ` Ming Lei
@ 2023-05-22  1:29     ` Damien Le Moal
  2023-05-22  2:07       ` Jens Axboe
  2023-05-22  2:17       ` Ming Lei
  2023-05-22  2:05     ` Jens Axboe
  1 sibling, 2 replies; 16+ messages in thread
From: Damien Le Moal @ 2023-05-22  1:29 UTC (permalink / raw)
  To: Ming Lei; +Cc: Tian Lan, axboe, linux-block, Tian Lan

On 5/22/23 10:23, Ming Lei wrote:
> On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:
>> On 5/22/23 09:43, Tian Lan wrote:
>>> From: Tian Lan <tian.lan@twosigma.com>
>>>
>>> If multiple CPUs are sharing the same hardware queue, it can
>>> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
>>> is executed simultaneously.
>>>
>>> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
>>> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
>>> ---
>>>  block/blk-mq-tag.c | 10 ++++++----
>>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>> index d6af9d431dc6..07372032238a 100644
>>> --- a/block/blk-mq-tag.c
>>> +++ b/block/blk-mq-tag.c
>>> @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>>>  	if (blk_mq_is_shared_tags(hctx->flags)) {
>>>  		struct request_queue *q = hctx->queue;
>>>  
>>> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>>> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
>>> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
>>
>> This is weird. test_and_set_bit() returns the bit old value, so shouldn't this be:
>>
>> 		if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>> 			return;
>>
>> ?
> 
> It is one micro optimization since test_and_set_bit is much heavier than
> test_bit, so test_and_set_bit() is just needed in the 1st time.

But having the 2 calls test_bit + test_and_set_bit creates a race, doesn't it ?
What if another cpu clears the bit between these 2 calls ?

At the very least, the patch should remove the curly braces around that if.

> 
> Thanks,
> Ming
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  1:23   ` Ming Lei
  2023-05-22  1:29     ` Damien Le Moal
@ 2023-05-22  2:05     ` Jens Axboe
  1 sibling, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-05-22  2:05 UTC (permalink / raw)
  To: Ming Lei, Damien Le Moal; +Cc: Tian Lan, linux-block, Tian Lan

On 5/21/23 7:23?PM, Ming Lei wrote:
> On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:
>> On 5/22/23 09:43, Tian Lan wrote:
>>> From: Tian Lan <tian.lan@twosigma.com>
>>>
>>> If multiple CPUs are sharing the same hardware queue, it can
>>> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
>>> is executed simultaneously.
>>>
>>> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
>>> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
>>> ---
>>>  block/blk-mq-tag.c | 10 ++++++----
>>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>> index d6af9d431dc6..07372032238a 100644
>>> --- a/block/blk-mq-tag.c
>>> +++ b/block/blk-mq-tag.c
>>> @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>>>  	if (blk_mq_is_shared_tags(hctx->flags)) {
>>>  		struct request_queue *q = hctx->queue;
>>>  
>>> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>>> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
>>> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
>>
>> This is weird. test_and_set_bit() returns the bit old value, so shouldn't this be:
>>
>> 		if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>> 			return;
>>
>> ?
> 
> It is one micro optimization since test_and_set_bit is much heavier
> than test_bit, so test_and_set_bit() is just needed in the 1st time.

It's an optimization, but it's certainly not a micro one. If the common
case is always hitting that, then test_and_set_bit() will repeatedly
dirty that cacheline. And obviously it's useless to do that if the bit
is already set. This makes it a pretty nice optimization and definitely
outside the realm of "micro optimization" as it can have quite a large
impact. I used that in various spots in blk-mq, which I suspect is where
the inspiration for this one came too.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  1:29     ` Damien Le Moal
@ 2023-05-22  2:07       ` Jens Axboe
  2023-05-22  2:17       ` Ming Lei
  1 sibling, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-05-22  2:07 UTC (permalink / raw)
  To: Damien Le Moal, Ming Lei; +Cc: Tian Lan, linux-block, Tian Lan

On 5/21/23 7:29?PM, Damien Le Moal wrote:
> On 5/22/23 10:23, Ming Lei wrote:
>> On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:
>>> On 5/22/23 09:43, Tian Lan wrote:
>>>> From: Tian Lan <tian.lan@twosigma.com>
>>>>
>>>> If multiple CPUs are sharing the same hardware queue, it can
>>>> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
>>>> is executed simultaneously.
>>>>
>>>> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
>>>> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
>>>> ---
>>>>  block/blk-mq-tag.c | 10 ++++++----
>>>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>>> index d6af9d431dc6..07372032238a 100644
>>>> --- a/block/blk-mq-tag.c
>>>> +++ b/block/blk-mq-tag.c
>>>> @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>>>>  	if (blk_mq_is_shared_tags(hctx->flags)) {
>>>>  		struct request_queue *q = hctx->queue;
>>>>  
>>>> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>>>> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
>>>> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
>>>
>>> This is weird. test_and_set_bit() returns the bit old value, so shouldn't this be:
>>>
>>> 		if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>>> 			return;
>>>
>>> ?
>>
>> It is one micro optimization since test_and_set_bit is much heavier than
>> test_bit, so test_and_set_bit() is just needed in the 1st time.
> 
> But having the 2 calls test_bit + test_and_set_bit creates a race,
> doesn't it ? What if another cpu clears the bit between these 2 calls
> ?

How so? If the bit is already set or you're racing with it being set or
cleared, that race already exists before. This simply prevent
unnecessary dirtying of that cacheline.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  1:08 ` Ming Lei
@ 2023-05-22  2:12   ` Tian Lan
  2023-05-22  9:46     ` John Garry
  0 siblings, 1 reply; 16+ messages in thread
From: Tian Lan @ 2023-05-22  2:12 UTC (permalink / raw)
  To: ming.lei; +Cc: axboe, linux-block, liusong, tian.lan, tilan7663

From: Tian Lan <tian.lan@twosigma.com>

If multiple CPUs are sharing the same hardware queue, it can
cause leak in the active queue counter tracking when __blk_mq_tag_busy()
is executed simultaneously.

Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d6af9d431dc6..0db6c31d3f4a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -42,13 +42,13 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 	if (blk_mq_is_shared_tags(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
 
-		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
+		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
 			return;
-		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
 	} else {
-		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
+		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 			return;
-		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
 	}
 
 	users = atomic_inc_return(&hctx->tags->active_queues);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  1:29     ` Damien Le Moal
  2023-05-22  2:07       ` Jens Axboe
@ 2023-05-22  2:17       ` Ming Lei
  2023-05-22  2:26         ` Damien Le Moal
  1 sibling, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-05-22  2:17 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Tian Lan, axboe, linux-block, Tian Lan

On Mon, May 22, 2023 at 10:29:05AM +0900, Damien Le Moal wrote:
> On 5/22/23 10:23, Ming Lei wrote:
> > On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:
> >> On 5/22/23 09:43, Tian Lan wrote:
> >>> From: Tian Lan <tian.lan@twosigma.com>
> >>>
> >>> If multiple CPUs are sharing the same hardware queue, it can
> >>> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> >>> is executed simultaneously.
> >>>
> >>> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> >>> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> >>> ---
> >>>  block/blk-mq-tag.c | 10 ++++++----
> >>>  1 file changed, 6 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> >>> index d6af9d431dc6..07372032238a 100644
> >>> --- a/block/blk-mq-tag.c
> >>> +++ b/block/blk-mq-tag.c
> >>> @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> >>>  	if (blk_mq_is_shared_tags(hctx->flags)) {
> >>>  		struct request_queue *q = hctx->queue;
> >>>  
> >>> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> >>> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> >>> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
> >>
> >> This is weird. test_and_set_bit() returns the bit old value, so shouldn't this be:
> >>
> >> 		if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> >> 			return;
> >>
> >> ?
> > 
> > It is one micro optimization since test_and_set_bit is much heavier than
> > test_bit, so test_and_set_bit() is just needed in the 1st time.
> 
> But having the 2 calls test_bit + test_and_set_bit creates a race, doesn't it ?
> What if another cpu clears the bit between these 2 calls ?

If test_bit() returns 0, there isn't such race since both sides are atomic OP.

If test_bit() returns 1:

1) __blk_mq_tag_busy() vs. __blk_mq_tag_busy()
- both does nothing

2) __blk_mq_tag_busy() vs. __blk_mq_tag_idle()
- hctx_may_queue() is always following __blk_mq_tag_busy()
- hctx_may_queue() returns true in case that this flag is cleared
- current __blk_mq_tag_busy() does nothing, the following allocation
works fine given hctx_may_queue() returns true
- new __blk_mq_tag_busy() will setup everything as fine


Thanks,
Ming


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  2:17       ` Ming Lei
@ 2023-05-22  2:26         ` Damien Le Moal
  0 siblings, 0 replies; 16+ messages in thread
From: Damien Le Moal @ 2023-05-22  2:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Tian Lan, axboe, linux-block, Tian Lan

On 5/22/23 11:17, Ming Lei wrote:
> On Mon, May 22, 2023 at 10:29:05AM +0900, Damien Le Moal wrote:
>> On 5/22/23 10:23, Ming Lei wrote:
>>> On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:
>>>> On 5/22/23 09:43, Tian Lan wrote:
>>>>> From: Tian Lan <tian.lan@twosigma.com>
>>>>>
>>>>> If multiple CPUs are sharing the same hardware queue, it can
>>>>> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
>>>>> is executed simultaneously.
>>>>>
>>>>> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
>>>>> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
>>>>> ---
>>>>>  block/blk-mq-tag.c | 10 ++++++----
>>>>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>>>>> index d6af9d431dc6..07372032238a 100644
>>>>> --- a/block/blk-mq-tag.c
>>>>> +++ b/block/blk-mq-tag.c
>>>>> @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>>>>>  	if (blk_mq_is_shared_tags(hctx->flags)) {
>>>>>  		struct request_queue *q = hctx->queue;
>>>>>  
>>>>> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>>>>> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
>>>>> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
>>>>
>>>> This is weird. test_and_set_bit() returns the bit old value, so shouldn't this be:
>>>>
>>>> 		if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>>>> 			return;
>>>>
>>>> ?
>>>
>>> It is one micro optimization since test_and_set_bit is much heavier than
>>> test_bit, so test_and_set_bit() is just needed in the 1st time.
>>
>> But having the 2 calls test_bit + test_and_set_bit creates a race, doesn't it ?
>> What if another cpu clears the bit between these 2 calls ?
> 
> If test_bit() returns 0, there isn't such race since both sides are atomic OP.
> 
> If test_bit() returns 1:
> 
> 1) __blk_mq_tag_busy() vs. __blk_mq_tag_busy()
> - both does nothing
> 
> 2) __blk_mq_tag_busy() vs. __blk_mq_tag_idle()
> - hctx_may_queue() is always following __blk_mq_tag_busy()
> - hctx_may_queue() returns true in case that this flag is cleared
> - current __blk_mq_tag_busy() does nothing, the following allocation
> works fine given hctx_may_queue() returns true
> - new __blk_mq_tag_busy() will setup everything as fine

OK. Thanks. It would be nice to update the function comment (which is not very
clear due to grammar issues) to document this unusual use of the bit functions,
including the fact that it is an optimization to avoid dirtying cache lines.

> 
> 
> Thanks,
> Ming
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  2:12   ` Tian Lan
@ 2023-05-22  9:46     ` John Garry
  2023-05-22 21:05       ` Tian Lan
  0 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2023-05-22  9:46 UTC (permalink / raw)
  To: Tian Lan, ming.lei; +Cc: axboe, linux-block, liusong, tian.lan

On 22/05/2023 03:12, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> If multiple CPUs are sharing the same hardware queue, it can
> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
> 
> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq-tag.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d6af9d431dc6..0db6c31d3f4a 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -42,13 +42,13 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>   	if (blk_mq_is_shared_tags(hctx->flags)) {
>   		struct request_queue *q = hctx->queue;
>   
> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))


How about add a small comment why we have the test_bit()? It seems to be 
asked every so often, like: 
https://lore.kernel.org/linux-block/6e2db454-b2f7-16a1-acc6-999ee09fdf10@suse.de/#t

>   			return;
> -		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
>   	} else {
> -		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
> +		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>   			return;
> -		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
>   	}
>   
>   	users = atomic_inc_return(&hctx->tags->active_queues);


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22  9:46     ` John Garry
@ 2023-05-22 21:05       ` Tian Lan
  2023-05-23  2:37         ` Damien Le Moal
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Tian Lan @ 2023-05-22 21:05 UTC (permalink / raw)
  To: john.g.garry; +Cc: axboe, linux-block, liusong, ming.lei, tian.lan, tilan7663

From: Tian Lan <tian.lan@twosigma.com>

If multiple CPUs are sharing the same hardware queue, it can
cause leak in the active queue counter tracking when __blk_mq_tag_busy()
is executed simultaneously.

Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d6af9d431dc6..dfd81cab5788 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -39,16 +39,20 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
 	unsigned int users;
 
+	/*
+	 * calling test_bit() prior to test_and_set_bit() is intentional,
+	 * it avoids dirtying the cacheline if the queue is already active.
+	 */
 	if (blk_mq_is_shared_tags(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
 
-		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
+		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
 			return;
-		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
 	} else {
-		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
+		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 			return;
-		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
 	}
 
 	users = atomic_inc_return(&hctx->tags->active_queues);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22 21:05       ` Tian Lan
@ 2023-05-23  2:37         ` Damien Le Moal
  2023-05-23  8:22         ` John Garry
  2023-05-23 17:13         ` Jens Axboe
  2 siblings, 0 replies; 16+ messages in thread
From: Damien Le Moal @ 2023-05-23  2:37 UTC (permalink / raw)
  To: Tian Lan, john.g.garry; +Cc: axboe, linux-block, liusong, ming.lei, tian.lan

On 5/23/23 06:05, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> If multiple CPUs are sharing the same hardware queue, it can
> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
> 
> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> ---
>  block/blk-mq-tag.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d6af9d431dc6..dfd81cab5788 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -39,16 +39,20 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
>  	unsigned int users;
>  
> +	/*
> +	 * calling test_bit() prior to test_and_set_bit() is intentional,
> +	 * it avoids dirtying the cacheline if the queue is already active.
> +	 */
>  	if (blk_mq_is_shared_tags(hctx->flags)) {
>  		struct request_queue *q = hctx->queue;
>  
> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>  			return;
> -		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
>  	} else {
> -		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
> +		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  			return;
> -		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
>  	}
>  
>  	users = atomic_inc_return(&hctx->tags->active_queues);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22 21:05       ` Tian Lan
  2023-05-23  2:37         ` Damien Le Moal
@ 2023-05-23  8:22         ` John Garry
  2023-05-24  1:48           ` Keith Busch
  2023-05-23 17:13         ` Jens Axboe
  2 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2023-05-23  8:22 UTC (permalink / raw)
  To: Tian Lan, axboe; +Cc: linux-block, liusong, ming.lei, tian.lan

On 22/05/2023 22:05, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
> 
> If multiple CPUs are sharing the same hardware queue, it can
> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
> 
> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Reviewed-by: John Garry <john.g.garry@oracle.com>

> ---
>   block/blk-mq-tag.c | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d6af9d431dc6..dfd81cab5788 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -39,16 +39,20 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>   {
>   	unsigned int users;
>   
> +	/*
> +	 * calling test_bit() prior to test_and_set_bit() is intentional,
> +	 * it avoids dirtying the cacheline if the queue is already active.
> +	 */
>   	if (blk_mq_is_shared_tags(hctx->flags)) {
>   		struct request_queue *q = hctx->queue;
>   
> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))

We could also add a bitops wrapper function for this. I don't know what 
name to use, maybe test_and_set_bit_fast_but_racy() - I'm half joking 
about the name

>   			return;
> -		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
>   	} else {
> -		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
> +		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>   			return;
> -		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
>   	}
>   
>   	users = atomic_inc_return(&hctx->tags->active_queues);


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-22 21:05       ` Tian Lan
  2023-05-23  2:37         ` Damien Le Moal
  2023-05-23  8:22         ` John Garry
@ 2023-05-23 17:13         ` Jens Axboe
  2 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-05-23 17:13 UTC (permalink / raw)
  To: john.g.garry, Tian Lan; +Cc: linux-block, liusong, ming.lei, tian.lan


On Mon, 22 May 2023 17:05:55 -0400, Tian Lan wrote:
> If multiple CPUs are sharing the same hardware queue, it can
> cause leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
> 
> 

Applied, thanks!

[1/1] blk-mq: fix race condition in active queue accounting
      (no commit info)

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/1] blk-mq: fix race condition in active queue accounting
  2023-05-23  8:22         ` John Garry
@ 2023-05-24  1:48           ` Keith Busch
  0 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2023-05-24  1:48 UTC (permalink / raw)
  To: John Garry; +Cc: Tian Lan, axboe, linux-block, liusong, ming.lei, tian.lan

On Tue, May 23, 2023 at 09:22:13AM +0100, John Garry wrote:
> On 22/05/2023 22:05, Tian Lan wrote:
> > +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> > +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> 
> We could also add a bitops wrapper function for this. I don't know what name
> to use, maybe test_and_set_bit_fast_but_racy() - I'm half joking about the
> name

I don't think this pattern is any more racy than a solo
test_and_set_bit(). Maybe "test_then_test_and_set_bit()"?

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-05-24  1:49 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-22  0:43 [PATCH 1/1] blk-mq: fix race condition in active queue accounting Tian Lan
2023-05-22  1:08 ` Ming Lei
2023-05-22  2:12   ` Tian Lan
2023-05-22  9:46     ` John Garry
2023-05-22 21:05       ` Tian Lan
2023-05-23  2:37         ` Damien Le Moal
2023-05-23  8:22         ` John Garry
2023-05-24  1:48           ` Keith Busch
2023-05-23 17:13         ` Jens Axboe
2023-05-22  1:15 ` Damien Le Moal
2023-05-22  1:23   ` Ming Lei
2023-05-22  1:29     ` Damien Le Moal
2023-05-22  2:07       ` Jens Axboe
2023-05-22  2:17       ` Ming Lei
2023-05-22  2:26         ` Damien Le Moal
2023-05-22  2:05     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).