All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dm-zoned: Avoid metadata flush writeback throttling
@ 2017-07-24  7:44 Damien Le Moal
  2017-07-24  8:03 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2017-07-24  7:44 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Alasdair Kergon; +Cc: Bart Van Assche, Mikulas Patocka

Avoid metadata flush to be blocked by the writeback throttling code in
wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
load with all chunk works active. In such situation, a deadlock can
happen if the flush work is blocked by the throttling code while holding
the metadata lock: as the chunk works will wait for the metadata lock
too, no progress can be made.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-zoned-metadata.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 884ff7c170a0..f694fb98b002 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
 	bio->bi_bdev = zmd->dev->bdev;
 	bio->bi_private = mblk;
 	bio->bi_end_io = dmz_mblock_bio_end_io;
-	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
+	bio_set_op_attrs(bio, REQ_OP_WRITE,
+			 REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE);
 	bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0);
 	submit_bio(bio);
 }
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] dm-zoned: Avoid metadata flush writeback throttling
  2017-07-24  7:44 [PATCH] dm-zoned: Avoid metadata flush writeback throttling Damien Le Moal
@ 2017-07-24  8:03 ` Christoph Hellwig
  2017-07-24  9:16   ` Damien Le Moal
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-07-24  8:03 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Bart Van Assche, dm-devel, Mikulas Patocka, Alasdair Kergon,
	Mike Snitzer

On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote:
> Avoid metadata flush to be blocked by the writeback throttling code in
> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
> load with all chunk works active. In such situation, a deadlock can
> happen if the flush work is blocked by the throttling code while holding
> the metadata lock: as the chunk works will wait for the metadata lock
> too, no progress can be made.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/md/dm-zoned-metadata.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 884ff7c170a0..f694fb98b002 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
>  	bio->bi_bdev = zmd->dev->bdev;
>  	bio->bi_private = mblk;
>  	bio->bi_end_io = dmz_mblock_bio_end_io;
> -	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
> +	bio_set_op_attrs(bio, REQ_OP_WRITE,
> +			 REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE);

Please just assign bi_opf directly instea dof using bio_set_op_attrs
in new code.

Also what do you need REQ_IDLE for?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dm-zoned: Avoid metadata flush writeback throttling
  2017-07-24  8:03 ` Christoph Hellwig
@ 2017-07-24  9:16   ` Damien Le Moal
  2017-09-11 15:05     ` Mike Snitzer
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2017-07-24  9:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, dm-devel, Mikulas Patocka, Alasdair Kergon,
	Mike Snitzer



On 7/24/17 17:03, Christoph Hellwig wrote:
> On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote:
>> Avoid metadata flush to be blocked by the writeback throttling code in
>> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
>> load with all chunk works active. In such situation, a deadlock can
>> happen if the flush work is blocked by the throttling code while holding
>> the metadata lock: as the chunk works will wait for the metadata lock
>> too, no progress can be made.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  drivers/md/dm-zoned-metadata.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 884ff7c170a0..f694fb98b002 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
>>  	bio->bi_bdev = zmd->dev->bdev;
>>  	bio->bi_private = mblk;
>>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>> -	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
>> +	bio_set_op_attrs(bio, REQ_OP_WRITE,
>> +			 REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE);
> 
> Please just assign bi_opf directly instea dof using bio_set_op_attrs
> in new code.

OK. Will do.

> Also what do you need REQ_IDLE for?

It is not really needed here, but the wbt_should_throttle() code test is:

if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE))
	return false;

No if REQ_IDLE is not set, wbt will trigger.

Having a req flag that says "no wb throttling please" would be a lot
cleaner...

-- 
Damien Le Moal,
Western Digital

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dm-zoned: Avoid metadata flush writeback throttling
  2017-07-24  9:16   ` Damien Le Moal
@ 2017-09-11 15:05     ` Mike Snitzer
  2017-09-12  5:46       ` Damien Le Moal
  2017-09-12  6:12       ` Mikulas Patocka
  0 siblings, 2 replies; 9+ messages in thread
From: Mike Snitzer @ 2017-09-11 15:05 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Mikulas Patocka,
	Alasdair Kergon

On Mon, Jul 24 2017 at  5:16am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> 
> 
> On 7/24/17 17:03, Christoph Hellwig wrote:
> > On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote:
> >> Avoid metadata flush to be blocked by the writeback throttling code in
> >> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
> >> load with all chunk works active. In such situation, a deadlock can
> >> happen if the flush work is blocked by the throttling code while holding
> >> the metadata lock: as the chunk works will wait for the metadata lock
> >> too, no progress can be made.
> >>
> >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >> ---
> >>  drivers/md/dm-zoned-metadata.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> >> index 884ff7c170a0..f694fb98b002 100644
> >> --- a/drivers/md/dm-zoned-metadata.c
> >> +++ b/drivers/md/dm-zoned-metadata.c
> >> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
> >>  	bio->bi_bdev = zmd->dev->bdev;
> >>  	bio->bi_private = mblk;
> >>  	bio->bi_end_io = dmz_mblock_bio_end_io;
> >> -	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
> >> +	bio_set_op_attrs(bio, REQ_OP_WRITE,
> >> +			 REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE);
> > 
> > Please just assign bi_opf directly instea dof using bio_set_op_attrs
> > in new code.
> 
> OK. Will do.
> 
> > Also what do you need REQ_IDLE for?
> 
> It is not really needed here, but the wbt_should_throttle() code test is:
> 
> if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE))
> 	return false;
> 
> So if REQ_IDLE is not set, wbt will trigger.
> 
> Having a req flag that says "no wb throttling please" would be a lot
> cleaner...

Did you have a new version of this patch?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dm-zoned: Avoid metadata flush writeback throttling
  2017-09-11 15:05     ` Mike Snitzer
@ 2017-09-12  5:46       ` Damien Le Moal
  2017-09-12  6:12       ` Mikulas Patocka
  1 sibling, 0 replies; 9+ messages in thread
From: Damien Le Moal @ 2017-09-12  5:46 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Mikulas Patocka,
	Alasdair Kergon

Mike,

On 9/12/17 00:05, Mike Snitzer wrote:
> On Mon, Jul 24 2017 at  5:16am -0400,
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> 
>>
>>
>> On 7/24/17 17:03, Christoph Hellwig wrote:
>>> On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote:
>>>> Avoid metadata flush to be blocked by the writeback throttling code in
>>>> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
>>>> load with all chunk works active. In such situation, a deadlock can
>>>> happen if the flush work is blocked by the throttling code while holding
>>>> the metadata lock: as the chunk works will wait for the metadata lock
>>>> too, no progress can be made.
>>>>
>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>> ---
>>>>  drivers/md/dm-zoned-metadata.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>>>> index 884ff7c170a0..f694fb98b002 100644
>>>> --- a/drivers/md/dm-zoned-metadata.c
>>>> +++ b/drivers/md/dm-zoned-metadata.c
>>>> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
>>>>  	bio->bi_bdev = zmd->dev->bdev;
>>>>  	bio->bi_private = mblk;
>>>>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>>>> -	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
>>>> +	bio_set_op_attrs(bio, REQ_OP_WRITE,
>>>> +			 REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE);
>>>
>>> Please just assign bi_opf directly instea dof using bio_set_op_attrs
>>> in new code.
>>
>> OK. Will do.
>>
>>> Also what do you need REQ_IDLE for?
>>
>> It is not really needed here, but the wbt_should_throttle() code test is:
>>
>> if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE))
>> 	return false;
>>
>> So if REQ_IDLE is not set, wbt will trigger.
>>
>> Having a req flag that says "no wb throttling please" would be a lot
>> cleaner...
> 
> Did you have a new version of this patch?
> 
> Thanks,
> Mike

My apologies, I forgot about this one. Sending right away.

Best regards.

-- 
Damien Le Moal,
Western Digital

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dm-zoned: Avoid metadata flush writeback throttling
  2017-09-11 15:05     ` Mike Snitzer
  2017-09-12  5:46       ` Damien Le Moal
@ 2017-09-12  6:12       ` Mikulas Patocka
  2017-09-12  8:01         ` Damien Le Moal
  1 sibling, 1 reply; 9+ messages in thread
From: Mikulas Patocka @ 2017-09-12  6:12 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Bart Van Assche, Damien Le Moal, dm-devel,
	Alasdair Kergon



On Mon, 11 Sep 2017, Mike Snitzer wrote:

> On Mon, Jul 24 2017 at  5:16am -0400,
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> 
> > 
> > 
> > On 7/24/17 17:03, Christoph Hellwig wrote:
> > > On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote:
> > >> Avoid metadata flush to be blocked by the writeback throttling code in
> > >> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
> > >> load with all chunk works active. In such situation, a deadlock can
> > >> happen if the flush work is blocked by the throttling code while holding
> > >> the metadata lock: as the chunk works will wait for the metadata lock
> > >> too, no progress can be made.

Could you explain, what's the specific problem here?

The writeback throttling code is executed at request level and the 
dm-zoned target working above it, at bio level. So I don't see how 
dm-zoned progress could be blocked by writeback throttling.

Maybe you have some other bug in your code and this patch just masks it?

Mikulas

> > >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > >> ---
> > >>  drivers/md/dm-zoned-metadata.c | 3 ++-
> > >>  1 file changed, 2 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> > >> index 884ff7c170a0..f694fb98b002 100644
> > >> --- a/drivers/md/dm-zoned-metadata.c
> > >> +++ b/drivers/md/dm-zoned-metadata.c
> > >> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
> > >>  	bio->bi_bdev = zmd->dev->bdev;
> > >>  	bio->bi_private = mblk;
> > >>  	bio->bi_end_io = dmz_mblock_bio_end_io;
> > >> -	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
> > >> +	bio_set_op_attrs(bio, REQ_OP_WRITE,
> > >> +			 REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE);
> > > 
> > > Please just assign bi_opf directly instea dof using bio_set_op_attrs
> > > in new code.
> > 
> > OK. Will do.
> > 
> > > Also what do you need REQ_IDLE for?
> > 
> > It is not really needed here, but the wbt_should_throttle() code test is:
> > 
> > if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE))
> > 	return false;
> > 
> > So if REQ_IDLE is not set, wbt will trigger.
> > 
> > Having a req flag that says "no wb throttling please" would be a lot
> > cleaner...
> 
> Did you have a new version of this patch?
> 
> Thanks,
> Mike
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dm-zoned: Avoid metadata flush writeback throttling
  2017-09-12  6:12       ` Mikulas Patocka
@ 2017-09-12  8:01         ` Damien Le Moal
  2017-09-12 10:32           ` Mikulas Patocka
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2017-09-12  8:01 UTC (permalink / raw)
  To: Mikulas Patocka, Mike Snitzer
  Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Alasdair Kergon

Mikulas,

On 9/12/17 15:12, Mikulas Patocka wrote:
> 
> 
> On Mon, 11 Sep 2017, Mike Snitzer wrote:
> 
>> On Mon, Jul 24 2017 at  5:16am -0400,
>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>>
>>>
>>>
>>> On 7/24/17 17:03, Christoph Hellwig wrote:
>>>> On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote:
>>>>> Avoid metadata flush to be blocked by the writeback throttling code in
>>>>> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
>>>>> load with all chunk works active. In such situation, a deadlock can
>>>>> happen if the flush work is blocked by the throttling code while holding
>>>>> the metadata lock: as the chunk works will wait for the metadata lock
>>>>> too, no progress can be made.
> 
> Could you explain, what's the specific problem here?
> 
> The writeback throttling code is executed at request level and the 
> dm-zoned target working above it, at bio level. So I don't see how 
> dm-zoned progress could be blocked by writeback throttling.

Request layer? All the code in block/blk-wbt.c acts on BIOs, in
particular the function wbt_wait() which is called before get_request()
in blk_queue_bio(), which is q->make_request_fn, so all of this happens
before a request even exists for the BIO. wbt_wait() calls
wbt_should_throttle() for the BIO to queue and will cause
blk_queue_bio() to sleep if that functions returns true and the number
of in-flight BIOs exceeds the set limit (wb_max, wb_normal or
wb_background).

Are we talking about a different throttling here ? Or am I entirely
missing something ?

> Maybe you have some other bug in your code and this patch just masks it?

Doing more testing with the current 4.13, I am now failing to reproduce
the hang. So you are right, it likely was something else.

In fact, analyzing writeback throttling more carefully, I realized that
the BIOs received by the chunk work and the metadata flush work BIOs are
throttled against different in-flight counters as the former BIOs are
issued to the target device queue while the latter are issued to the
target backing dev queue. As a result, one should not be blocking the
other. Chunk works may have to wait for metadata flush to complete
first, but that is before these works issue BIOs on the target bdev so
they cannot in turn block flush on a throttling condition.

Thanks for asking these hard questions!

Mike,

I will keep an eye open for issues but everything seems OK for now. So
let's drop this patch. My apologies for the noise.
Since bio_set_op_attrs() is deprecated, I will send a patch to remove
these calls.

Best regards.

-- 
Damien Le Moal,
Western Digital

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dm-zoned: Avoid metadata flush writeback throttling
  2017-09-12  8:01         ` Damien Le Moal
@ 2017-09-12 10:32           ` Mikulas Patocka
  2017-09-13  0:38             ` Damien Le Moal
  0 siblings, 1 reply; 9+ messages in thread
From: Mikulas Patocka @ 2017-09-12 10:32 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Alasdair Kergon,
	Mike Snitzer



On Tue, 12 Sep 2017, Damien Le Moal wrote:

> Mikulas,
> 
> On 9/12/17 15:12, Mikulas Patocka wrote:
> > 
> > 
> > On Mon, 11 Sep 2017, Mike Snitzer wrote:
> > 
> >> On Mon, Jul 24 2017 at  5:16am -0400,
> >> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> >>
> >>>
> >>>
> >>> On 7/24/17 17:03, Christoph Hellwig wrote:
> >>>> On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote:
> >>>>> Avoid metadata flush to be blocked by the writeback throttling code in
> >>>>> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write
> >>>>> load with all chunk works active. In such situation, a deadlock can
> >>>>> happen if the flush work is blocked by the throttling code while holding
> >>>>> the metadata lock: as the chunk works will wait for the metadata lock
> >>>>> too, no progress can be made.
> > 
> > Could you explain, what's the specific problem here?
> > 
> > The writeback throttling code is executed at request level and the 
> > dm-zoned target working above it, at bio level. So I don't see how 
> > dm-zoned progress could be blocked by writeback throttling.
> 
> Request layer? All the code in block/blk-wbt.c acts on BIOs, in
> particular the function wbt_wait() which is called before get_request()
> in blk_queue_bio(), which is q->make_request_fn, so all of this happens

blk_queue_bio() is only called for request-based drivers (i.e. physical 
disk drivers), not for bio-based device mapper drivers. So, bios incoming 
to your target will never be delayed because of throttling, only the 
outgoing bios will be delayed.

> before a request even exists for the BIO. wbt_wait() calls
> wbt_should_throttle() for the BIO to queue and will cause
> blk_queue_bio() to sleep if that functions returns true and the number
> of in-flight BIOs exceeds the set limit (wb_max, wb_normal or
> wb_background).
> 
> Are we talking about a different throttling here ? Or am I entirely
> missing something ?

So, you send some bios to the physical disk driver. The physical disk 
driver eventually starts throttling and blocks in the function 
blk_queue_bio(). If your driver deadlocks because the underlying disk 
driver blocks, then it is a bug in your driver.

The patch that you sent seems to be just a workaround, not a real fix. The 
underlying disk driver may block for multiple reasons. For example, if 
you set "echo 4 >/sys/block/sda/queue/nr_requests" on the disk, it will 
allow just 4 outstanding requests and then block. The disk driver can also 
block at random times, if there is temporary memory shortage and the 
request structure can't be allocated.

> > Maybe you have some other bug in your code and this patch just masks it?
> 
> Doing more testing with the current 4.13, I am now failing to reproduce
> the hang. So you are right, it likely was something else.
> 
> In fact, analyzing writeback throttling more carefully, I realized that
> the BIOs received by the chunk work and the metadata flush work BIOs are
> throttled against different in-flight counters as the former BIOs are
> issued to the target device queue while the latter are issued to the
> target backing dev queue. As a result, one should not be blocking the
> other. Chunk works may have to wait for metadata flush to complete
> first, but that is before these works issue BIOs on the target bdev so
> they cannot in turn block flush on a throttling condition.
> 
> Thanks for asking these hard questions!

The function submit_bio() or generic_make_request() can block anytime. If 
you driver assumes that it can submit multiple bios without blocking, it 
is a bug in your driver and you need to fix it.

I suggest that you try "echo 4 >/sys/block/sda/queue/nr_requests" for the 
underlying disk and then try to reproduce the deadlock, analyze it and fix 
it.

Mikulas

> Mike,
> 
> I will keep an eye open for issues but everything seems OK for now. So
> let's drop this patch. My apologies for the noise.
> Since bio_set_op_attrs() is deprecated, I will send a patch to remove
> these calls.
> 
> Best regards.
> 
> -- 
> Damien Le Moal,
> Western Digital
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dm-zoned: Avoid metadata flush writeback throttling
  2017-09-12 10:32           ` Mikulas Patocka
@ 2017-09-13  0:38             ` Damien Le Moal
  0 siblings, 0 replies; 9+ messages in thread
From: Damien Le Moal @ 2017-09-13  0:38 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Alasdair Kergon,
	Mike Snitzer

Mikulas,

On 9/12/17 19:32, Mikulas Patocka wrote:
>>> The writeback throttling code is executed at request level and the 
>>> dm-zoned target working above it, at bio level. So I don't see how 
>>> dm-zoned progress could be blocked by writeback throttling.
>>
>> Request layer? All the code in block/blk-wbt.c acts on BIOs, in
>> particular the function wbt_wait() which is called before get_request()
>> in blk_queue_bio(), which is q->make_request_fn, so all of this happens
> 
> blk_queue_bio() is only called for request-based drivers (i.e. physical 
> disk drivers), not for bio-based device mapper drivers. So, bios incoming 
> to your target will never be delayed because of throttling, only the 
> outgoing bios will be delayed.

Arrg ! Of course ! What an idiot I am. The dm target queue request_fn
function is different from that of the physical disks. You are
absolutely correct and my analysis was just plain wrong.

>> before a request even exists for the BIO. wbt_wait() calls
>> wbt_should_throttle() for the BIO to queue and will cause
>> blk_queue_bio() to sleep if that functions returns true and the number
>> of in-flight BIOs exceeds the set limit (wb_max, wb_normal or
>> wb_background).
>>
>> Are we talking about a different throttling here ? Or am I entirely
>> missing something ?
> 
> So, you send some bios to the physical disk driver. The physical disk 
> driver eventually starts throttling and blocks in the function 
> blk_queue_bio(). If your driver deadlocks because the underlying disk 
> driver blocks, then it is a bug in your driver.
> 
> The patch that you sent seems to be just a workaround, not a real fix. The 
> underlying disk driver may block for multiple reasons. For example, if 
> you set "echo 4 >/sys/block/sda/queue/nr_requests" on the disk, it will 
> allow just 4 outstanding requests and then block. The disk driver can also 
> block at random times, if there is temporary memory shortage and the 
> request structure can't be allocated.

dm-zoned chunk works which process target incoming BIOs and issue BIOs
to the physical drive do not wait for the completion of the physical
BIOs. Only the flush work does for BIOs handling metadata. But the flush
work and the chunk works cannot execute simultaneously (there is a
semaphore to prevent that). So no amount of blocking on BIO issuing (by
wb throttling, memory or request allocation or any other reason) can
cause a deadlock between chunk works and flush. Progress is always possible.

>>> Maybe you have some other bug in your code and this patch just masks it?
>>
>> Doing more testing with the current 4.13, I am now failing to reproduce
>> the hang. So you are right, it likely was something else.
>>
>> In fact, analyzing writeback throttling more carefully, I realized that
>> the BIOs received by the chunk work and the metadata flush work BIOs are
>> throttled against different in-flight counters as the former BIOs are
>> issued to the target device queue while the latter are issued to the
>> target backing dev queue. As a result, one should not be blocking the
>> other. Chunk works may have to wait for metadata flush to complete
>> first, but that is before these works issue BIOs on the target bdev so
>> they cannot in turn block flush on a throttling condition.
>>
>> Thanks for asking these hard questions!
> 
> The function submit_bio() or generic_make_request() can block anytime. If 
> you driver assumes that it can submit multiple bios without blocking, it 
> is a bug in your driver and you need to fix it.
> 
> I suggest that you try "echo 4 >/sys/block/sda/queue/nr_requests" for the 
> underlying disk and then try to reproduce the deadlock, analyze it and fix 
> it.

See above. I got it.
I went back to the initial code & test conditions I used when I first
reported the problem and sent the patch. I could recreate the problem.
It turns out that the real cause is zone write locking in the scsi-mq
path which can cause dispatch deadlock. If such deadlock occurs, the
chunk work issued BIOs do not complete and a flush work run trying to
sync metadata end up being blocked on wb throttling, but only because
the chunk work BIOs are still in-flight. The real cause is not wb
throttling, but ZBC+scsi-mq failing to make progress on the issued
requests. The patch I sent is indeed useless. WB throttling is just fine
with dm-zoned.

I am working on fixing ZBC on scsi-mq. But that is different from the dm
path and no patch is needed there.

Thank you for all your comments.

Best regards.

-- 
Damien Le Moal,
Western Digital

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-09-13  0:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-24  7:44 [PATCH] dm-zoned: Avoid metadata flush writeback throttling Damien Le Moal
2017-07-24  8:03 ` Christoph Hellwig
2017-07-24  9:16   ` Damien Le Moal
2017-09-11 15:05     ` Mike Snitzer
2017-09-12  5:46       ` Damien Le Moal
2017-09-12  6:12       ` Mikulas Patocka
2017-09-12  8:01         ` Damien Le Moal
2017-09-12 10:32           ` Mikulas Patocka
2017-09-13  0:38             ` Damien Le Moal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.