* [PATCH] dm-zoned: Avoid metadata flush writeback throttling @ 2017-07-24 7:44 Damien Le Moal 2017-07-24 8:03 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: Damien Le Moal @ 2017-07-24 7:44 UTC (permalink / raw) To: dm-devel, Mike Snitzer, Alasdair Kergon; +Cc: Bart Van Assche, Mikulas Patocka Avoid metadata flush to be blocked by the writeback throttling code in wbt_wait(). Otherwise, we can end up with a deadlock under heavy write load with all chunk works active. In such situation, a deadlock can happen if the flush work is blocked by the throttling code while holding the metadata lock: as the chunk works will wait for the metadata lock too, no progress can be made. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> --- drivers/md/dm-zoned-metadata.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c index 884ff7c170a0..f694fb98b002 100644 --- a/drivers/md/dm-zoned-metadata.c +++ b/drivers/md/dm-zoned-metadata.c @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, bio->bi_bdev = zmd->dev->bdev; bio->bi_private = mblk; bio->bi_end_io = dmz_mblock_bio_end_io; - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); + bio_set_op_attrs(bio, REQ_OP_WRITE, + REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE); bio_add_page(bio, mblk->page, DMZ_BLOCK_SIZE, 0); submit_bio(bio); } -- 2.13.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] dm-zoned: Avoid metadata flush writeback throttling 2017-07-24 7:44 [PATCH] dm-zoned: Avoid metadata flush writeback throttling Damien Le Moal @ 2017-07-24 8:03 ` Christoph Hellwig 2017-07-24 9:16 ` Damien Le Moal 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2017-07-24 8:03 UTC (permalink / raw) To: Damien Le Moal Cc: Bart Van Assche, dm-devel, Mikulas Patocka, Alasdair Kergon, Mike Snitzer On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote: > Avoid metadata flush to be blocked by the writeback throttling code in > wbt_wait(). Otherwise, we can end up with a deadlock under heavy write > load with all chunk works active. In such situation, a deadlock can > happen if the flush work is blocked by the throttling code while holding > the metadata lock: as the chunk works will wait for the metadata lock > too, no progress can be made. > > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> > --- > drivers/md/dm-zoned-metadata.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c > index 884ff7c170a0..f694fb98b002 100644 > --- a/drivers/md/dm-zoned-metadata.c > +++ b/drivers/md/dm-zoned-metadata.c > @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, > bio->bi_bdev = zmd->dev->bdev; > bio->bi_private = mblk; > bio->bi_end_io = dmz_mblock_bio_end_io; > - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); > + bio_set_op_attrs(bio, REQ_OP_WRITE, > + REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE); Please just assign bi_opf directly instea dof using bio_set_op_attrs in new code. Also what do you need REQ_IDLE for? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] dm-zoned: Avoid metadata flush writeback throttling 2017-07-24 8:03 ` Christoph Hellwig @ 2017-07-24 9:16 ` Damien Le Moal 2017-09-11 15:05 ` Mike Snitzer 0 siblings, 1 reply; 9+ messages in thread From: Damien Le Moal @ 2017-07-24 9:16 UTC (permalink / raw) To: Christoph Hellwig Cc: Bart Van Assche, dm-devel, Mikulas Patocka, Alasdair Kergon, Mike Snitzer On 7/24/17 17:03, Christoph Hellwig wrote: > On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote: >> Avoid metadata flush to be blocked by the writeback throttling code in >> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write >> load with all chunk works active. In such situation, a deadlock can >> happen if the flush work is blocked by the throttling code while holding >> the metadata lock: as the chunk works will wait for the metadata lock >> too, no progress can be made. >> >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> >> --- >> drivers/md/dm-zoned-metadata.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c >> index 884ff7c170a0..f694fb98b002 100644 >> --- a/drivers/md/dm-zoned-metadata.c >> +++ b/drivers/md/dm-zoned-metadata.c >> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, >> bio->bi_bdev = zmd->dev->bdev; >> bio->bi_private = mblk; >> bio->bi_end_io = dmz_mblock_bio_end_io; >> - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); >> + bio_set_op_attrs(bio, REQ_OP_WRITE, >> + REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE); > > Please just assign bi_opf directly instea dof using bio_set_op_attrs > in new code. OK. Will do. > Also what do you need REQ_IDLE for? It is not really needed here, but the wbt_should_throttle() code test is: if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) return false; No if REQ_IDLE is not set, wbt will trigger. Having a req flag that says "no wb throttling please" would be a lot cleaner... -- Damien Le Moal, Western Digital ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dm-zoned: Avoid metadata flush writeback throttling 2017-07-24 9:16 ` Damien Le Moal @ 2017-09-11 15:05 ` Mike Snitzer 2017-09-12 5:46 ` Damien Le Moal 2017-09-12 6:12 ` Mikulas Patocka 0 siblings, 2 replies; 9+ messages in thread From: Mike Snitzer @ 2017-09-11 15:05 UTC (permalink / raw) To: Damien Le Moal Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Mikulas Patocka, Alasdair Kergon On Mon, Jul 24 2017 at 5:16am -0400, Damien Le Moal <damien.lemoal@wdc.com> wrote: > > > On 7/24/17 17:03, Christoph Hellwig wrote: > > On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote: > >> Avoid metadata flush to be blocked by the writeback throttling code in > >> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write > >> load with all chunk works active. In such situation, a deadlock can > >> happen if the flush work is blocked by the throttling code while holding > >> the metadata lock: as the chunk works will wait for the metadata lock > >> too, no progress can be made. > >> > >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> > >> --- > >> drivers/md/dm-zoned-metadata.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c > >> index 884ff7c170a0..f694fb98b002 100644 > >> --- a/drivers/md/dm-zoned-metadata.c > >> +++ b/drivers/md/dm-zoned-metadata.c > >> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, > >> bio->bi_bdev = zmd->dev->bdev; > >> bio->bi_private = mblk; > >> bio->bi_end_io = dmz_mblock_bio_end_io; > >> - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); > >> + bio_set_op_attrs(bio, REQ_OP_WRITE, > >> + REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE); > > > > Please just assign bi_opf directly instea dof using bio_set_op_attrs > > in new code. > > OK. Will do. > > > Also what do you need REQ_IDLE for? > > It is not really needed here, but the wbt_should_throttle() code test is: > > if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) > return false; > > So if REQ_IDLE is not set, wbt will trigger. > > Having a req flag that says "no wb throttling please" would be a lot > cleaner... Did you have a new version of this patch? Thanks, Mike ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dm-zoned: Avoid metadata flush writeback throttling 2017-09-11 15:05 ` Mike Snitzer @ 2017-09-12 5:46 ` Damien Le Moal 2017-09-12 6:12 ` Mikulas Patocka 1 sibling, 0 replies; 9+ messages in thread From: Damien Le Moal @ 2017-09-12 5:46 UTC (permalink / raw) To: Mike Snitzer Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Mikulas Patocka, Alasdair Kergon Mike, On 9/12/17 00:05, Mike Snitzer wrote: > On Mon, Jul 24 2017 at 5:16am -0400, > Damien Le Moal <damien.lemoal@wdc.com> wrote: > >> >> >> On 7/24/17 17:03, Christoph Hellwig wrote: >>> On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote: >>>> Avoid metadata flush to be blocked by the writeback throttling code in >>>> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write >>>> load with all chunk works active. In such situation, a deadlock can >>>> happen if the flush work is blocked by the throttling code while holding >>>> the metadata lock: as the chunk works will wait for the metadata lock >>>> too, no progress can be made. >>>> >>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> >>>> --- >>>> drivers/md/dm-zoned-metadata.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c >>>> index 884ff7c170a0..f694fb98b002 100644 >>>> --- a/drivers/md/dm-zoned-metadata.c >>>> +++ b/drivers/md/dm-zoned-metadata.c >>>> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, >>>> bio->bi_bdev = zmd->dev->bdev; >>>> bio->bi_private = mblk; >>>> bio->bi_end_io = dmz_mblock_bio_end_io; >>>> - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); >>>> + bio_set_op_attrs(bio, REQ_OP_WRITE, >>>> + REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE); >>> >>> Please just assign bi_opf directly instea dof using bio_set_op_attrs >>> in new code. >> >> OK. Will do. >> >>> Also what do you need REQ_IDLE for? >> >> It is not really needed here, but the wbt_should_throttle() code test is: >> >> if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) >> return false; >> >> So if REQ_IDLE is not set, wbt will trigger. >> >> Having a req flag that says "no wb throttling please" would be a lot >> cleaner... > > Did you have a new version of this patch? > > Thanks, > Mike My apologies, I forgot about this one. Sending right away. Best regards. -- Damien Le Moal, Western Digital ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dm-zoned: Avoid metadata flush writeback throttling 2017-09-11 15:05 ` Mike Snitzer 2017-09-12 5:46 ` Damien Le Moal @ 2017-09-12 6:12 ` Mikulas Patocka 2017-09-12 8:01 ` Damien Le Moal 1 sibling, 1 reply; 9+ messages in thread From: Mikulas Patocka @ 2017-09-12 6:12 UTC (permalink / raw) To: Mike Snitzer Cc: Christoph Hellwig, Bart Van Assche, Damien Le Moal, dm-devel, Alasdair Kergon On Mon, 11 Sep 2017, Mike Snitzer wrote: > On Mon, Jul 24 2017 at 5:16am -0400, > Damien Le Moal <damien.lemoal@wdc.com> wrote: > > > > > > > On 7/24/17 17:03, Christoph Hellwig wrote: > > > On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote: > > >> Avoid metadata flush to be blocked by the writeback throttling code in > > >> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write > > >> load with all chunk works active. In such situation, a deadlock can > > >> happen if the flush work is blocked by the throttling code while holding > > >> the metadata lock: as the chunk works will wait for the metadata lock > > >> too, no progress can be made. Could you explain, what's the specific problem here? The writeback throttling code is executed at request level and the dm-zoned target working above it, at bio level. So I don't see how dm-zoned progress could be blocked by writeback throttling. Maybe you have some other bug in your code and this patch just masks it? Mikulas > > >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> > > >> --- > > >> drivers/md/dm-zoned-metadata.c | 3 ++- > > >> 1 file changed, 2 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c > > >> index 884ff7c170a0..f694fb98b002 100644 > > >> --- a/drivers/md/dm-zoned-metadata.c > > >> +++ b/drivers/md/dm-zoned-metadata.c > > >> @@ -567,7 +567,8 @@ static void dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk, > > >> bio->bi_bdev = zmd->dev->bdev; > > >> bio->bi_private = mblk; > > >> bio->bi_end_io = dmz_mblock_bio_end_io; > > >> - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO); > > >> + bio_set_op_attrs(bio, REQ_OP_WRITE, > > >> + REQ_META | REQ_PRIO | REQ_SYNC | REQ_IDLE); > > > > > > Please just assign bi_opf directly instea dof using bio_set_op_attrs > > > in new code. > > > > OK. Will do. > > > > > Also what do you need REQ_IDLE for? > > > > It is not really needed here, but the wbt_should_throttle() code test is: > > > > if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) > > return false; > > > > So if REQ_IDLE is not set, wbt will trigger. > > > > Having a req flag that says "no wb throttling please" would be a lot > > cleaner... > > Did you have a new version of this patch? > > Thanks, > Mike > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dm-zoned: Avoid metadata flush writeback throttling 2017-09-12 6:12 ` Mikulas Patocka @ 2017-09-12 8:01 ` Damien Le Moal 2017-09-12 10:32 ` Mikulas Patocka 0 siblings, 1 reply; 9+ messages in thread From: Damien Le Moal @ 2017-09-12 8:01 UTC (permalink / raw) To: Mikulas Patocka, Mike Snitzer Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Alasdair Kergon Mikulas, On 9/12/17 15:12, Mikulas Patocka wrote: > > > On Mon, 11 Sep 2017, Mike Snitzer wrote: > >> On Mon, Jul 24 2017 at 5:16am -0400, >> Damien Le Moal <damien.lemoal@wdc.com> wrote: >> >>> >>> >>> On 7/24/17 17:03, Christoph Hellwig wrote: >>>> On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote: >>>>> Avoid metadata flush to be blocked by the writeback throttling code in >>>>> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write >>>>> load with all chunk works active. In such situation, a deadlock can >>>>> happen if the flush work is blocked by the throttling code while holding >>>>> the metadata lock: as the chunk works will wait for the metadata lock >>>>> too, no progress can be made. > > Could you explain, what's the specific problem here? > > The writeback throttling code is executed at request level and the > dm-zoned target working above it, at bio level. So I don't see how > dm-zoned progress could be blocked by writeback throttling. Request layer? All the code in block/blk-wbt.c acts on BIOs, in particular the function wbt_wait() which is called before get_request() in blk_queue_bio(), which is q->make_request_fn, so all of this happens before a request even exists for the BIO. wbt_wait() calls wbt_should_throttle() for the BIO to queue and will cause blk_queue_bio() to sleep if that functions returns true and the number of in-flight BIOs exceeds the set limit (wb_max, wb_normal or wb_background). Are we talking about a different throttling here ? Or am I entirely missing something ? > Maybe you have some other bug in your code and this patch just masks it? Doing more testing with the current 4.13, I am now failing to reproduce the hang. So you are right, it likely was something else. In fact, analyzing writeback throttling more carefully, I realized that the BIOs received by the chunk work and the metadata flush work BIOs are throttled against different in-flight counters as the former BIOs are issued to the target device queue while the latter are issued to the target backing dev queue. As a result, one should not be blocking the other. Chunk works may have to wait for metadata flush to complete first, but that is before these works issue BIOs on the target bdev so they cannot in turn block flush on a throttling condition. Thanks for asking these hard questions! Mike, I will keep an eye open for issues but everything seems OK for now. So let's drop this patch. My apologies for the noise. Since bio_set_op_attrs() is deprecated, I will send a patch to remove these calls. Best regards. -- Damien Le Moal, Western Digital ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dm-zoned: Avoid metadata flush writeback throttling 2017-09-12 8:01 ` Damien Le Moal @ 2017-09-12 10:32 ` Mikulas Patocka 2017-09-13 0:38 ` Damien Le Moal 0 siblings, 1 reply; 9+ messages in thread From: Mikulas Patocka @ 2017-09-12 10:32 UTC (permalink / raw) To: Damien Le Moal Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Alasdair Kergon, Mike Snitzer On Tue, 12 Sep 2017, Damien Le Moal wrote: > Mikulas, > > On 9/12/17 15:12, Mikulas Patocka wrote: > > > > > > On Mon, 11 Sep 2017, Mike Snitzer wrote: > > > >> On Mon, Jul 24 2017 at 5:16am -0400, > >> Damien Le Moal <damien.lemoal@wdc.com> wrote: > >> > >>> > >>> > >>> On 7/24/17 17:03, Christoph Hellwig wrote: > >>>> On Mon, Jul 24, 2017 at 04:44:22PM +0900, Damien Le Moal wrote: > >>>>> Avoid metadata flush to be blocked by the writeback throttling code in > >>>>> wbt_wait(). Otherwise, we can end up with a deadlock under heavy write > >>>>> load with all chunk works active. In such situation, a deadlock can > >>>>> happen if the flush work is blocked by the throttling code while holding > >>>>> the metadata lock: as the chunk works will wait for the metadata lock > >>>>> too, no progress can be made. > > > > Could you explain, what's the specific problem here? > > > > The writeback throttling code is executed at request level and the > > dm-zoned target working above it, at bio level. So I don't see how > > dm-zoned progress could be blocked by writeback throttling. > > Request layer? All the code in block/blk-wbt.c acts on BIOs, in > particular the function wbt_wait() which is called before get_request() > in blk_queue_bio(), which is q->make_request_fn, so all of this happens blk_queue_bio() is only called for request-based drivers (i.e. physical disk drivers), not for bio-based device mapper drivers. So, bios incoming to your target will never be delayed because of throttling, only the outgoing bios will be delayed. > before a request even exists for the BIO. wbt_wait() calls > wbt_should_throttle() for the BIO to queue and will cause > blk_queue_bio() to sleep if that functions returns true and the number > of in-flight BIOs exceeds the set limit (wb_max, wb_normal or > wb_background). > > Are we talking about a different throttling here ? Or am I entirely > missing something ? So, you send some bios to the physical disk driver. The physical disk driver eventually starts throttling and blocks in the function blk_queue_bio(). If your driver deadlocks because the underlying disk driver blocks, then it is a bug in your driver. The patch that you sent seems to be just a workaround, not a real fix. The underlying disk driver may block for multiple reasons. For example, if you set "echo 4 >/sys/block/sda/queue/nr_requests" on the disk, it will allow just 4 outstanding requests and then block. The disk driver can also block at random times, if there is temporary memory shortage and the request structure can't be allocated. > > Maybe you have some other bug in your code and this patch just masks it? > > Doing more testing with the current 4.13, I am now failing to reproduce > the hang. So you are right, it likely was something else. > > In fact, analyzing writeback throttling more carefully, I realized that > the BIOs received by the chunk work and the metadata flush work BIOs are > throttled against different in-flight counters as the former BIOs are > issued to the target device queue while the latter are issued to the > target backing dev queue. As a result, one should not be blocking the > other. Chunk works may have to wait for metadata flush to complete > first, but that is before these works issue BIOs on the target bdev so > they cannot in turn block flush on a throttling condition. > > Thanks for asking these hard questions! The function submit_bio() or generic_make_request() can block anytime. If you driver assumes that it can submit multiple bios without blocking, it is a bug in your driver and you need to fix it. I suggest that you try "echo 4 >/sys/block/sda/queue/nr_requests" for the underlying disk and then try to reproduce the deadlock, analyze it and fix it. Mikulas > Mike, > > I will keep an eye open for issues but everything seems OK for now. So > let's drop this patch. My apologies for the noise. > Since bio_set_op_attrs() is deprecated, I will send a patch to remove > these calls. > > Best regards. > > -- > Damien Le Moal, > Western Digital > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: dm-zoned: Avoid metadata flush writeback throttling 2017-09-12 10:32 ` Mikulas Patocka @ 2017-09-13 0:38 ` Damien Le Moal 0 siblings, 0 replies; 9+ messages in thread From: Damien Le Moal @ 2017-09-13 0:38 UTC (permalink / raw) To: Mikulas Patocka Cc: Christoph Hellwig, Bart Van Assche, dm-devel, Alasdair Kergon, Mike Snitzer Mikulas, On 9/12/17 19:32, Mikulas Patocka wrote: >>> The writeback throttling code is executed at request level and the >>> dm-zoned target working above it, at bio level. So I don't see how >>> dm-zoned progress could be blocked by writeback throttling. >> >> Request layer? All the code in block/blk-wbt.c acts on BIOs, in >> particular the function wbt_wait() which is called before get_request() >> in blk_queue_bio(), which is q->make_request_fn, so all of this happens > > blk_queue_bio() is only called for request-based drivers (i.e. physical > disk drivers), not for bio-based device mapper drivers. So, bios incoming > to your target will never be delayed because of throttling, only the > outgoing bios will be delayed. Arrg ! Of course ! What an idiot I am. The dm target queue request_fn function is different from that of the physical disks. You are absolutely correct and my analysis was just plain wrong. >> before a request even exists for the BIO. wbt_wait() calls >> wbt_should_throttle() for the BIO to queue and will cause >> blk_queue_bio() to sleep if that functions returns true and the number >> of in-flight BIOs exceeds the set limit (wb_max, wb_normal or >> wb_background). >> >> Are we talking about a different throttling here ? Or am I entirely >> missing something ? > > So, you send some bios to the physical disk driver. The physical disk > driver eventually starts throttling and blocks in the function > blk_queue_bio(). If your driver deadlocks because the underlying disk > driver blocks, then it is a bug in your driver. > > The patch that you sent seems to be just a workaround, not a real fix. The > underlying disk driver may block for multiple reasons. For example, if > you set "echo 4 >/sys/block/sda/queue/nr_requests" on the disk, it will > allow just 4 outstanding requests and then block. The disk driver can also > block at random times, if there is temporary memory shortage and the > request structure can't be allocated. dm-zoned chunk works which process target incoming BIOs and issue BIOs to the physical drive do not wait for the completion of the physical BIOs. Only the flush work does for BIOs handling metadata. But the flush work and the chunk works cannot execute simultaneously (there is a semaphore to prevent that). So no amount of blocking on BIO issuing (by wb throttling, memory or request allocation or any other reason) can cause a deadlock between chunk works and flush. Progress is always possible. >>> Maybe you have some other bug in your code and this patch just masks it? >> >> Doing more testing with the current 4.13, I am now failing to reproduce >> the hang. So you are right, it likely was something else. >> >> In fact, analyzing writeback throttling more carefully, I realized that >> the BIOs received by the chunk work and the metadata flush work BIOs are >> throttled against different in-flight counters as the former BIOs are >> issued to the target device queue while the latter are issued to the >> target backing dev queue. As a result, one should not be blocking the >> other. Chunk works may have to wait for metadata flush to complete >> first, but that is before these works issue BIOs on the target bdev so >> they cannot in turn block flush on a throttling condition. >> >> Thanks for asking these hard questions! > > The function submit_bio() or generic_make_request() can block anytime. If > you driver assumes that it can submit multiple bios without blocking, it > is a bug in your driver and you need to fix it. > > I suggest that you try "echo 4 >/sys/block/sda/queue/nr_requests" for the > underlying disk and then try to reproduce the deadlock, analyze it and fix > it. See above. I got it. I went back to the initial code & test conditions I used when I first reported the problem and sent the patch. I could recreate the problem. It turns out that the real cause is zone write locking in the scsi-mq path which can cause dispatch deadlock. If such deadlock occurs, the chunk work issued BIOs do not complete and a flush work run trying to sync metadata end up being blocked on wb throttling, but only because the chunk work BIOs are still in-flight. The real cause is not wb throttling, but ZBC+scsi-mq failing to make progress on the issued requests. The patch I sent is indeed useless. WB throttling is just fine with dm-zoned. I am working on fixing ZBC on scsi-mq. But that is different from the dm path and no patch is needed there. Thank you for all your comments. Best regards. -- Damien Le Moal, Western Digital ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-09-13 0:38 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-07-24 7:44 [PATCH] dm-zoned: Avoid metadata flush writeback throttling Damien Le Moal 2017-07-24 8:03 ` Christoph Hellwig 2017-07-24 9:16 ` Damien Le Moal 2017-09-11 15:05 ` Mike Snitzer 2017-09-12 5:46 ` Damien Le Moal 2017-09-12 6:12 ` Mikulas Patocka 2017-09-12 8:01 ` Damien Le Moal 2017-09-12 10:32 ` Mikulas Patocka 2017-09-13 0:38 ` Damien Le Moal
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.