* Upcoming merge window
@ 2018-12-17 18:28 Jens Axboe
2018-12-17 23:16 ` Bart Van Assche
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2018-12-17 18:28 UTC (permalink / raw)
To: linux-block@vger.kernel.org
Hi,
As I'm sure you're all aware, the merge window is coming up. This time
it happens to coincide with that is a holiday for most. My plan is to
send in an EARLY pull request to Linus, Thursday at the latest. If you're
sitting on anything that should go in with the initial merge, then I need
to have it ASAP.
I'll do a later pull about a week in with things that were missed, but
I'm really hoping to make that fixes only. Any driver updates etc should
go in now.
Thanks,
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upcoming merge window
2018-12-17 18:28 Upcoming merge window Jens Axboe
@ 2018-12-17 23:16 ` Bart Van Assche
2018-12-17 23:27 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2018-12-17 23:16 UTC (permalink / raw)
To: Jens Axboe, linux-block@vger.kernel.org
On Mon, 2018-12-17 at 11:28 -0700, Jens Axboe wrote:
> As I'm sure you're all aware, the merge window is coming up. This time
> it happens to coincide with that is a holiday for most. My plan is to
> send in an EARLY pull request to Linus, Thursday at the latest. If you're
> sitting on anything that should go in with the initial merge, then I need
> to have it ASAP.
>
> I'll do a later pull about a week in with things that were missed, but
> I'm really hoping to make that fixes only. Any driver updates etc should
> go in now.
Hi Jens,
If I run blktests/srp/002 against Linus' master branch then that test passes,
no matter how many times I run that test. If I run that test against your
for-next branch however (commit 6a252f2772c0) then that test hangs. The output
of my list-pending-block-requests script is as follows when the hang occurs:
dm-0
dm-1
dm-1/hctx0/active:0
dm-1/hctx0/flags:alloc_policy=FIFO SHOULD_MERGE|SG_MERGE
dm-1/hctx0/tags:nr_tags=2048
dm-1/hctx0/tags:nr_reserved_tags=0
dm-1/hctx0/tags:active_queues=0
dm-1/hctx0/tags:bitmap_tags:
dm-1/hctx0/tags:depth=2048
dm-1/hctx0/tags:busy=798
dm-1/hctx0/tags:cleared=935
dm-1/hctx0/tags:bits_per_word=64
dm-1/hctx0/tags:map_nr=32
dm-1/hctx0/tags:alloc_hint={682, 463, 595, 1923, 841, 628}
dm-1/hctx0/tags:wake_batch=8
dm-1/hctx0/tags:wake_index=0
dm-1/hctx0/tags:ws_active=0
dm-1/hctx0/tags:ws={
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags: {.wait_cnt=8, .wait=inactive},
dm-1/hctx0/tags:}
dm-1/hctx0/tags:round_robin=0
dm-1/hctx0/tags:min_shallow_depth=4294967295
dm-1/hctx0/tags_bitmap:00000000: ff03 feff ffff ffff ffff ffff ffff ffff
dm-1/hctx0/tags_bitmap:00000010: ffff ffff ffff ffff ffff 1f70 0040 0100
dm-1/hctx0/tags_bitmap:00000020: ffff ffef f7bf ffe7 ffff ffff ffff ffdf
dm-1/hctx0/tags_bitmap:00000030: ffff ffff ffff ffff ffff ffff ffff ffff
dm-1/hctx0/tags_bitmap:00000040: ffff ffff ffff ff7f ffff ffff ffff ffff
dm-1/hctx0/tags_bitmap:00000050: ffff ffff ffff ffff ffff ffff ffff ffff
dm-1/hctx0/tags_bitmap:00000060: f3ff 1fe0 ffff f19f ffff ffff ffff ffff
dm-1/hctx0/tags_bitmap:00000070: ffff fff7 9bff fff7 0fec 1300 00e0 1f01
dm-1/hctx0/tags_bitmap:00000080: ffff ffff ffff e189 ff0f 90ff ffff ffff
dm-1/hctx0/tags_bitmap:00000090: ffff ffff ffff ffff ffff 0300 00fd 0780
dm-1/hctx0/tags_bitmap:000000a0: ffff ffff ffff ffff ff0f 0000 0000 5400
dm-1/hctx0/tags_bitmap:000000b0: ffff ffff ffff ffff ffff ffff ffff ff0f
dm-1/hctx0/tags_bitmap:000000c0: ffff ffff ffff ff3f ffff ffff ffff 7f1c
dm-1/hctx0/tags_bitmap:000000d0: ffff ffff ffff ffff ffff ffff ffff ffff
dm-1/hctx0/tags_bitmap:000000e0: ffff ffff 0f00 0040 1de0 1400 3cdf 0300
dm-1/hctx0/tags_bitmap:000000f0: ffff ffff 1f1c 7cfb ffff ffff ffff 1fef
dm-1/state:SAME_COMP|NONROT|IO_STAT|INIT_DONE|STATS|REGISTERED|QUIESCED
dm-1/queue/scheduler:[none] bfq kyber mq-deadline
dm-2
loop0
loop1
loop2
loop3
loop4
loop5
loop6
loop7
nullb0
nullb1
sda
sdb
sdc
sdd
sde
sdf
sdg
sdh
vda
Bart.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upcoming merge window
2018-12-17 23:16 ` Bart Van Assche
@ 2018-12-17 23:27 ` Jens Axboe
2018-12-17 23:49 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2018-12-17 23:27 UTC (permalink / raw)
To: Bart Van Assche, linux-block@vger.kernel.org
On 12/17/18 4:16 PM, Bart Van Assche wrote:
> On Mon, 2018-12-17 at 11:28 -0700, Jens Axboe wrote:
>> As I'm sure you're all aware, the merge window is coming up. This time
>> it happens to coincide with that is a holiday for most. My plan is to
>> send in an EARLY pull request to Linus, Thursday at the latest. If you're
>> sitting on anything that should go in with the initial merge, then I need
>> to have it ASAP.
>>
>> I'll do a later pull about a week in with things that were missed, but
>> I'm really hoping to make that fixes only. Any driver updates etc should
>> go in now.
>
> Hi Jens,
>
> If I run blktests/srp/002 against Linus' master branch then that test passes,
> no matter how many times I run that test. If I run that test against your
> for-next branch however (commit 6a252f2772c0) then that test hangs. The output
> of my list-pending-block-requests script is as follows when the hang occurs:
Ugh, I'll try and run that here again, that test is unfortunately such a pain
to run and requires me to manually install multipath libs (and remember to
uninstall before rebooting, or udev fails?).
I'll take a look!
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upcoming merge window
2018-12-17 23:27 ` Jens Axboe
@ 2018-12-17 23:49 ` Jens Axboe
2018-12-18 0:16 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2018-12-17 23:49 UTC (permalink / raw)
To: Bart Van Assche, linux-block@vger.kernel.org; +Cc: Ming Lei, Mike Snitzer
On 12/17/18 4:27 PM, Jens Axboe wrote:
> On 12/17/18 4:16 PM, Bart Van Assche wrote:
>> On Mon, 2018-12-17 at 11:28 -0700, Jens Axboe wrote:
>>> As I'm sure you're all aware, the merge window is coming up. This time
>>> it happens to coincide with that is a holiday for most. My plan is to
>>> send in an EARLY pull request to Linus, Thursday at the latest. If you're
>>> sitting on anything that should go in with the initial merge, then I need
>>> to have it ASAP.
>>>
>>> I'll do a later pull about a week in with things that were missed, but
>>> I'm really hoping to make that fixes only. Any driver updates etc should
>>> go in now.
>>
>> Hi Jens,
>>
>> If I run blktests/srp/002 against Linus' master branch then that test passes,
>> no matter how many times I run that test. If I run that test against your
>> for-next branch however (commit 6a252f2772c0) then that test hangs. The output
>> of my list-pending-block-requests script is as follows when the hang occurs:
>
> Ugh, I'll try and run that here again, that test is unfortunately such a pain
> to run and requires me to manually install multipath libs (and remember to
> uninstall before rebooting, or udev fails?).
>
> I'll take a look!
Looks like what Ming was talking about. CC'ing Ming and Mike. Lots of
kworkers are stuck like this:
[ 252.310187] kworker/2:19 D14072 8147 2 0x80000000
[ 252.316803] Workqueue: dio/dm-2 dio_aio_complete_work
[ 252.322925] Call Trace:
[ 252.326137] ? __schedule+0x231/0x5f0
[ 252.330703] schedule+0x2a/0x80
[ 252.334689] rwsem_down_write_failed+0x204/0x320
[ 252.340330] ? generic_make_request_checks+0x55/0x370
[ 252.346542] ? call_rwsem_down_write_failed+0x13/0x20
[ 252.352669] call_rwsem_down_write_failed+0x13/0x20
[ 252.358601] down_write+0x1b/0x30
[ 252.362781] __generic_file_fsync+0x3e/0xb0
[ 252.367933] ext4_sync_file+0xcc/0x2e0
[ 252.372599] dio_complete+0x1c4/0x210
[ 252.377168] process_one_work+0x1cb/0x350
[ 252.382915] worker_thread+0x28/0x3c0
[ 252.387482] ? process_one_work+0x350/0x350
[ 252.392632] kthread+0x107/0x120
[ 252.396717] ? kthread_park+0x80/0x80
[ 252.401285] ret_from_fork+0x1f/0x30
Where did this regression come from? This was passing just fine
recently.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upcoming merge window
2018-12-17 23:49 ` Jens Axboe
@ 2018-12-18 0:16 ` Jens Axboe
2018-12-18 0:26 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2018-12-18 0:16 UTC (permalink / raw)
To: Bart Van Assche, linux-block@vger.kernel.org; +Cc: Ming Lei, Mike Snitzer
On 12/17/18 4:49 PM, Jens Axboe wrote:
> On 12/17/18 4:27 PM, Jens Axboe wrote:
>> On 12/17/18 4:16 PM, Bart Van Assche wrote:
>>> On Mon, 2018-12-17 at 11:28 -0700, Jens Axboe wrote:
>>>> As I'm sure you're all aware, the merge window is coming up. This time
>>>> it happens to coincide with that is a holiday for most. My plan is to
>>>> send in an EARLY pull request to Linus, Thursday at the latest. If you're
>>>> sitting on anything that should go in with the initial merge, then I need
>>>> to have it ASAP.
>>>>
>>>> I'll do a later pull about a week in with things that were missed, but
>>>> I'm really hoping to make that fixes only. Any driver updates etc should
>>>> go in now.
>>>
>>> Hi Jens,
>>>
>>> If I run blktests/srp/002 against Linus' master branch then that test passes,
>>> no matter how many times I run that test. If I run that test against your
>>> for-next branch however (commit 6a252f2772c0) then that test hangs. The output
>>> of my list-pending-block-requests script is as follows when the hang occurs:
>>
>> Ugh, I'll try and run that here again, that test is unfortunately such a pain
>> to run and requires me to manually install multipath libs (and remember to
>> uninstall before rebooting, or udev fails?).
>>
>> I'll take a look!
>
> Looks like what Ming was talking about. CC'ing Ming and Mike. Lots of
> kworkers are stuck like this:
>
> [ 252.310187] kworker/2:19 D14072 8147 2 0x80000000
> [ 252.316803] Workqueue: dio/dm-2 dio_aio_complete_work
> [ 252.322925] Call Trace:
> [ 252.326137] ? __schedule+0x231/0x5f0
> [ 252.330703] schedule+0x2a/0x80
> [ 252.334689] rwsem_down_write_failed+0x204/0x320
> [ 252.340330] ? generic_make_request_checks+0x55/0x370
> [ 252.346542] ? call_rwsem_down_write_failed+0x13/0x20
> [ 252.352669] call_rwsem_down_write_failed+0x13/0x20
> [ 252.358601] down_write+0x1b/0x30
> [ 252.362781] __generic_file_fsync+0x3e/0xb0
> [ 252.367933] ext4_sync_file+0xcc/0x2e0
> [ 252.372599] dio_complete+0x1c4/0x210
> [ 252.377168] process_one_work+0x1cb/0x350
> [ 252.382915] worker_thread+0x28/0x3c0
> [ 252.387482] ? process_one_work+0x350/0x350
> [ 252.392632] kthread+0x107/0x120
> [ 252.396717] ? kthread_park+0x80/0x80
> [ 252.401285] ret_from_fork+0x1f/0x30
>
> Where did this regression come from? This was passing just fine
> recently.
Looks like this is the offending commit:
commit c4576aed8d85d808cd6443bda58393d525207d01
Author: Mike Snitzer <snitzer@redhat.com>
Date: Tue Dec 11 09:10:26 2018 -0500
dm: fix request-based dm's use of dm_wait_for_completion
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upcoming merge window
2018-12-18 0:16 ` Jens Axboe
@ 2018-12-18 0:26 ` Jens Axboe
2018-12-18 3:45 ` Mike Snitzer
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2018-12-18 0:26 UTC (permalink / raw)
To: Bart Van Assche, linux-block@vger.kernel.org; +Cc: Ming Lei, Mike Snitzer
On 12/17/18 5:16 PM, Jens Axboe wrote:
> On 12/17/18 4:49 PM, Jens Axboe wrote:
>> On 12/17/18 4:27 PM, Jens Axboe wrote:
>>> On 12/17/18 4:16 PM, Bart Van Assche wrote:
>>>> On Mon, 2018-12-17 at 11:28 -0700, Jens Axboe wrote:
>>>>> As I'm sure you're all aware, the merge window is coming up. This time
>>>>> it happens to coincide with that is a holiday for most. My plan is to
>>>>> send in an EARLY pull request to Linus, Thursday at the latest. If you're
>>>>> sitting on anything that should go in with the initial merge, then I need
>>>>> to have it ASAP.
>>>>>
>>>>> I'll do a later pull about a week in with things that were missed, but
>>>>> I'm really hoping to make that fixes only. Any driver updates etc should
>>>>> go in now.
>>>>
>>>> Hi Jens,
>>>>
>>>> If I run blktests/srp/002 against Linus' master branch then that test passes,
>>>> no matter how many times I run that test. If I run that test against your
>>>> for-next branch however (commit 6a252f2772c0) then that test hangs. The output
>>>> of my list-pending-block-requests script is as follows when the hang occurs:
>>>
>>> Ugh, I'll try and run that here again, that test is unfortunately such a pain
>>> to run and requires me to manually install multipath libs (and remember to
>>> uninstall before rebooting, or udev fails?).
>>>
>>> I'll take a look!
>>
>> Looks like what Ming was talking about. CC'ing Ming and Mike. Lots of
>> kworkers are stuck like this:
>>
>> [ 252.310187] kworker/2:19 D14072 8147 2 0x80000000
>> [ 252.316803] Workqueue: dio/dm-2 dio_aio_complete_work
>> [ 252.322925] Call Trace:
>> [ 252.326137] ? __schedule+0x231/0x5f0
>> [ 252.330703] schedule+0x2a/0x80
>> [ 252.334689] rwsem_down_write_failed+0x204/0x320
>> [ 252.340330] ? generic_make_request_checks+0x55/0x370
>> [ 252.346542] ? call_rwsem_down_write_failed+0x13/0x20
>> [ 252.352669] call_rwsem_down_write_failed+0x13/0x20
>> [ 252.358601] down_write+0x1b/0x30
>> [ 252.362781] __generic_file_fsync+0x3e/0xb0
>> [ 252.367933] ext4_sync_file+0xcc/0x2e0
>> [ 252.372599] dio_complete+0x1c4/0x210
>> [ 252.377168] process_one_work+0x1cb/0x350
>> [ 252.382915] worker_thread+0x28/0x3c0
>> [ 252.387482] ? process_one_work+0x350/0x350
>> [ 252.392632] kthread+0x107/0x120
>> [ 252.396717] ? kthread_park+0x80/0x80
>> [ 252.401285] ret_from_fork+0x1f/0x30
>>
>> Where did this regression come from? This was passing just fine
>> recently.
>
> Looks like this is the offending commit:
>
> commit c4576aed8d85d808cd6443bda58393d525207d01
> Author: Mike Snitzer <snitzer@redhat.com>
> Date: Tue Dec 11 09:10:26 2018 -0500
>
> dm: fix request-based dm's use of dm_wait_for_completion
Yep confirmed, reverted that on top and it passes. dm-2 has plenty of
requests that are allocated and pending dispatch, so the md_in_flight()
will return true. Mike, should it be checking for allocated requests or
in-flight?
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Upcoming merge window
2018-12-18 0:26 ` Jens Axboe
@ 2018-12-18 3:45 ` Mike Snitzer
2018-12-18 4:13 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Mike Snitzer @ 2018-12-18 3:45 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, linux-block@vger.kernel.org, Ming Lei
On Mon, Dec 17 2018 at 7:26pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:
> On 12/17/18 5:16 PM, Jens Axboe wrote:
> > On 12/17/18 4:49 PM, Jens Axboe wrote:
> >> On 12/17/18 4:27 PM, Jens Axboe wrote:
> >>> On 12/17/18 4:16 PM, Bart Van Assche wrote:
> >>>> On Mon, 2018-12-17 at 11:28 -0700, Jens Axboe wrote:
> >>>>> As I'm sure you're all aware, the merge window is coming up. This time
> >>>>> it happens to coincide with that is a holiday for most. My plan is to
> >>>>> send in an EARLY pull request to Linus, Thursday at the latest. If you're
> >>>>> sitting on anything that should go in with the initial merge, then I need
> >>>>> to have it ASAP.
> >>>>>
> >>>>> I'll do a later pull about a week in with things that were missed, but
> >>>>> I'm really hoping to make that fixes only. Any driver updates etc should
> >>>>> go in now.
> >>>>
> >>>> Hi Jens,
> >>>>
> >>>> If I run blktests/srp/002 against Linus' master branch then that test passes,
> >>>> no matter how many times I run that test. If I run that test against your
> >>>> for-next branch however (commit 6a252f2772c0) then that test hangs. The output
> >>>> of my list-pending-block-requests script is as follows when the hang occurs:
> >>>
> >>> Ugh, I'll try and run that here again, that test is unfortunately such a pain
> >>> to run and requires me to manually install multipath libs (and remember to
> >>> uninstall before rebooting, or udev fails?).
> >>>
> >>> I'll take a look!
> >>
> >> Looks like what Ming was talking about. CC'ing Ming and Mike. Lots of
> >> kworkers are stuck like this:
> >>
> >> [ 252.310187] kworker/2:19 D14072 8147 2 0x80000000
> >> [ 252.316803] Workqueue: dio/dm-2 dio_aio_complete_work
> >> [ 252.322925] Call Trace:
> >> [ 252.326137] ? __schedule+0x231/0x5f0
> >> [ 252.330703] schedule+0x2a/0x80
> >> [ 252.334689] rwsem_down_write_failed+0x204/0x320
> >> [ 252.340330] ? generic_make_request_checks+0x55/0x370
> >> [ 252.346542] ? call_rwsem_down_write_failed+0x13/0x20
> >> [ 252.352669] call_rwsem_down_write_failed+0x13/0x20
> >> [ 252.358601] down_write+0x1b/0x30
> >> [ 252.362781] __generic_file_fsync+0x3e/0xb0
> >> [ 252.367933] ext4_sync_file+0xcc/0x2e0
> >> [ 252.372599] dio_complete+0x1c4/0x210
> >> [ 252.377168] process_one_work+0x1cb/0x350
> >> [ 252.382915] worker_thread+0x28/0x3c0
> >> [ 252.387482] ? process_one_work+0x350/0x350
> >> [ 252.392632] kthread+0x107/0x120
> >> [ 252.396717] ? kthread_park+0x80/0x80
> >> [ 252.401285] ret_from_fork+0x1f/0x30
> >>
> >> Where did this regression come from? This was passing just fine
> >> recently.
> >
> > Looks like this is the offending commit:
> >
> > commit c4576aed8d85d808cd6443bda58393d525207d01
> > Author: Mike Snitzer <snitzer@redhat.com>
> > Date: Tue Dec 11 09:10:26 2018 -0500
> >
> > dm: fix request-based dm's use of dm_wait_for_completion
>
> Yep confirmed, reverted that on top and it passes. dm-2 has plenty of
> requests that are allocated and pending dispatch, so the md_in_flight()
> will return true. Mike, should it be checking for allocated requests or
> in-flight?
I thought we could just check for allocated (as blk_mq_check_busy() does
now) but clearly that is too broad a scope because I tested your
suggestion and it allows the srp/002 test to pass:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6847f014606b..edbf4bb1b3e8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -812,7 +812,7 @@ static bool blk_mq_check_busy(struct blk_mq_hw_ctx *hctx, struct request *rq,
* If we find a request, we know the queue is busy. Return false
* to stop the iteration.
*/
- if (rq->q == hctx->queue) {
+ if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) {
bool *busy = priv;
*busy = true;
blk_mq_check_busy() was introduced for DM to user as a replacement for
its own inflight accounting it was doing:
ae879912 blk-mq: provide a helper to check if a queue is busy
So nothing else is currently calling it, but if you'd prefer to rename
the functions to reflect the narrower MQ_RQ_IN_FLIGHT check that is fine
by me (e.g. blk_mq_check_inflight and blk_mq_queue_has_inflight).
Mike
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Upcoming merge window
2018-12-18 3:45 ` Mike Snitzer
@ 2018-12-18 4:13 ` Jens Axboe
0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2018-12-18 4:13 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Bart Van Assche, linux-block@vger.kernel.org, Ming Lei
On 12/17/18 8:45 PM, Mike Snitzer wrote:
> On Mon, Dec 17 2018 at 7:26pm -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
>
>> On 12/17/18 5:16 PM, Jens Axboe wrote:
>>> On 12/17/18 4:49 PM, Jens Axboe wrote:
>>>> On 12/17/18 4:27 PM, Jens Axboe wrote:
>>>>> On 12/17/18 4:16 PM, Bart Van Assche wrote:
>>>>>> On Mon, 2018-12-17 at 11:28 -0700, Jens Axboe wrote:
>>>>>>> As I'm sure you're all aware, the merge window is coming up. This time
>>>>>>> it happens to coincide with that is a holiday for most. My plan is to
>>>>>>> send in an EARLY pull request to Linus, Thursday at the latest. If you're
>>>>>>> sitting on anything that should go in with the initial merge, then I need
>>>>>>> to have it ASAP.
>>>>>>>
>>>>>>> I'll do a later pull about a week in with things that were missed, but
>>>>>>> I'm really hoping to make that fixes only. Any driver updates etc should
>>>>>>> go in now.
>>>>>>
>>>>>> Hi Jens,
>>>>>>
>>>>>> If I run blktests/srp/002 against Linus' master branch then that test passes,
>>>>>> no matter how many times I run that test. If I run that test against your
>>>>>> for-next branch however (commit 6a252f2772c0) then that test hangs. The output
>>>>>> of my list-pending-block-requests script is as follows when the hang occurs:
>>>>>
>>>>> Ugh, I'll try and run that here again, that test is unfortunately such a pain
>>>>> to run and requires me to manually install multipath libs (and remember to
>>>>> uninstall before rebooting, or udev fails?).
>>>>>
>>>>> I'll take a look!
>>>>
>>>> Looks like what Ming was talking about. CC'ing Ming and Mike. Lots of
>>>> kworkers are stuck like this:
>>>>
>>>> [ 252.310187] kworker/2:19 D14072 8147 2 0x80000000
>>>> [ 252.316803] Workqueue: dio/dm-2 dio_aio_complete_work
>>>> [ 252.322925] Call Trace:
>>>> [ 252.326137] ? __schedule+0x231/0x5f0
>>>> [ 252.330703] schedule+0x2a/0x80
>>>> [ 252.334689] rwsem_down_write_failed+0x204/0x320
>>>> [ 252.340330] ? generic_make_request_checks+0x55/0x370
>>>> [ 252.346542] ? call_rwsem_down_write_failed+0x13/0x20
>>>> [ 252.352669] call_rwsem_down_write_failed+0x13/0x20
>>>> [ 252.358601] down_write+0x1b/0x30
>>>> [ 252.362781] __generic_file_fsync+0x3e/0xb0
>>>> [ 252.367933] ext4_sync_file+0xcc/0x2e0
>>>> [ 252.372599] dio_complete+0x1c4/0x210
>>>> [ 252.377168] process_one_work+0x1cb/0x350
>>>> [ 252.382915] worker_thread+0x28/0x3c0
>>>> [ 252.387482] ? process_one_work+0x350/0x350
>>>> [ 252.392632] kthread+0x107/0x120
>>>> [ 252.396717] ? kthread_park+0x80/0x80
>>>> [ 252.401285] ret_from_fork+0x1f/0x30
>>>>
>>>> Where did this regression come from? This was passing just fine
>>>> recently.
>>>
>>> Looks like this is the offending commit:
>>>
>>> commit c4576aed8d85d808cd6443bda58393d525207d01
>>> Author: Mike Snitzer <snitzer@redhat.com>
>>> Date: Tue Dec 11 09:10:26 2018 -0500
>>>
>>> dm: fix request-based dm's use of dm_wait_for_completion
>>
>> Yep confirmed, reverted that on top and it passes. dm-2 has plenty of
>> requests that are allocated and pending dispatch, so the md_in_flight()
>> will return true. Mike, should it be checking for allocated requests or
>> in-flight?
>
> I thought we could just check for allocated (as blk_mq_check_busy() does
> now) but clearly that is too broad a scope because I tested your
> suggestion and it allows the srp/002 test to pass:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6847f014606b..edbf4bb1b3e8 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -812,7 +812,7 @@ static bool blk_mq_check_busy(struct blk_mq_hw_ctx *hctx, struct request *rq,
> * If we find a request, we know the queue is busy. Return false
> * to stop the iteration.
> */
> - if (rq->q == hctx->queue) {
> + if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) {
> bool *busy = priv;
>
> *busy = true;
>
> blk_mq_check_busy() was introduced for DM to user as a replacement for
> its own inflight accounting it was doing:
> ae879912 blk-mq: provide a helper to check if a queue is busy
>
> So nothing else is currently calling it, but if you'd prefer to rename
> the functions to reflect the narrower MQ_RQ_IN_FLIGHT check that is fine
> by me (e.g. blk_mq_check_inflight and blk_mq_queue_has_inflight).
I agree, let's do the fix and rename it to inflight instead, since that
now reflects what it does.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-12-18 4:13 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-17 18:28 Upcoming merge window Jens Axboe
2018-12-17 23:16 ` Bart Van Assche
2018-12-17 23:27 ` Jens Axboe
2018-12-17 23:49 ` Jens Axboe
2018-12-18 0:16 ` Jens Axboe
2018-12-18 0:26 ` Jens Axboe
2018-12-18 3:45 ` Mike Snitzer
2018-12-18 4:13 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox