* Re: WARNING triggers at blk_mq_update_nr_hw_queues during nvme_reset_work
2017-05-30 17:55 ` WARNING triggers at blk_mq_update_nr_hw_queues during nvme_reset_work Keith Busch
@ 2017-05-30 18:09 ` Bart Van Assche
2017-05-30 18:26 ` Jens Axboe
2017-05-30 18:30 ` Gabriel Krisman Bertazi
2 siblings, 0 replies; 4+ messages in thread
From: Bart Van Assche @ 2017-05-30 18:09 UTC
To: keith.busch@intel.com, krisman@collabora.co.uk
Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
axboe@fb.com
On Tue, 2017-05-30 at 13:55 -0400, Keith Busch wrote:
> On Tue, May 30, 2017 at 02:00:44PM -0300, Gabriel Krisman Bertazi wrote:
> > Since the merge window for 4.12, one of the machines in Intel's CI
> > started to hit the WARN_ON below at blk_mq_update_nr_hw_queues during an
> > nvme_reset_work. The issue persists with the latest 4.12-rc3, and full
> > dmesg from boot, up to the moment where the WARN_ON triggers is
> > available at the following link:
> >
> > https://intel-gfx-ci.01.org/CI/CI_DRM_2672/fi-kbl-7500u/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
> >
> > Please notice that the test we do in the CI involves putting the
> > machine to sleep (PM), and the issue triggers when resuming execution.
> >
> > I have not been able to get my hands on the machine yet to do an actual
> > bisect, but I'm wondering if you guys might have an idea of what is
> > wrong.
> >
> > Any help is appreciated :)
>
> Hi Gabriel,
>
> This appears to be new behavior in blk-mq's tag set update with commit
> 705cda97e. This is asserting a lock is held, but none of the drivers
> that call the export take that lock.
>
> I think the below should fix it (CC'ing block list and developers).
>
> ---
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f2224ffd..1bccced 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2641,7 +2641,8 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  	return ret;
>  }
> 
> -void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
> +static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
> +							int nr_hw_queues)
>  {
>  	struct request_queue *q;
> 
> @@ -2665,6 +2666,13 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_unfreeze_queue(q);
>  }
> +
> +void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
> +{
> +	mutex_lock(&set->tag_list_lock);
> +	__blk_mq_update_nr_hw_queues(set, nr_hw_queues);
> +	mutex_unlock(&set->tag_list_lock);
> +}
>  EXPORT_SYMBOL_GPL(blk_mq_update_nr_hw_queues);
These changes look fine to me, hence:
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
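For readers following the locking change, here is a minimal userspace sketch of the pattern in the quoted patch: an unlocked helper that asserts the lock is held, and an exported entry point that takes the lock on the caller's behalf. This is not kernel code. All names (struct tag_set, tag_list_locked, tag_set_init, tag_set_update_nr_hw_queues and its _locked helper) are hypothetical, a pthread mutex stands in for the kernel mutex, and a plain bool is only a crude stand-in for what lockdep_assert_held() tracks. The kernel's __ prefix is replaced by a _locked suffix because such identifiers are reserved in userspace C.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct blk_mq_tag_set. */
struct tag_set {
	pthread_mutex_t tag_list_lock;
	bool tag_list_locked;	/* written only by the current lock holder */
	int nr_hw_queues;
};

static void tag_set_init(struct tag_set *set, int nr_hw_queues)
{
	pthread_mutex_init(&set->tag_list_lock, NULL);
	set->tag_list_locked = false;
	set->nr_hw_queues = nr_hw_queues;
}

/* Unlocked helper: the caller must already hold tag_list_lock. */
static void tag_set_update_nr_hw_queues_locked(struct tag_set *set,
					       int nr_hw_queues)
{
	/* Rough analogue of lockdep_assert_held(): fail if called unlocked. */
	assert(set->tag_list_locked);
	set->nr_hw_queues = nr_hw_queues;
}

/* Public entry point: takes the lock so that callers do not have to. */
static void tag_set_update_nr_hw_queues(struct tag_set *set, int nr_hw_queues)
{
	pthread_mutex_lock(&set->tag_list_lock);
	set->tag_list_locked = true;
	tag_set_update_nr_hw_queues_locked(set, nr_hw_queues);
	set->tag_list_locked = false;
	pthread_mutex_unlock(&set->tag_list_lock);
}
```

A driver-like caller would initialize the set with tag_set_init() and then call tag_set_update_nr_hw_queues(&set, 4) without touching tag_list_lock itself. Calling the _locked helper directly without the lock trips the assertion, which loosely mirrors the WARN_ON reported in this thread.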
* Re: WARNING triggers at blk_mq_update_nr_hw_queues during nvme_reset_work
From: Jens Axboe @ 2017-05-30 18:26 UTC
To: Keith Busch, Gabriel Krisman Bertazi
Cc: linux-nvme, Bart Van Assche, linux-block
On 05/30/2017 11:55 AM, Keith Busch wrote:
> On Tue, May 30, 2017 at 02:00:44PM -0300, Gabriel Krisman Bertazi wrote:
>> Since the merge window for 4.12, one of the machines in Intel's CI
>> started to hit the WARN_ON below at blk_mq_update_nr_hw_queues during an
>> nvme_reset_work. The issue persists with the latest 4.12-rc3, and full
>> dmesg from boot, up to the moment where the WARN_ON triggers is
>> available at the following link:
>>
>> https://intel-gfx-ci.01.org/CI/CI_DRM_2672/fi-kbl-7500u/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
>>
>> Please notice that the test we do in the CI involves putting the
>> machine to sleep (PM), and the issue triggers when resuming execution.
>>
>> I have not been able to get my hands on the machine yet to do an actual
>> bisect, but I'm wondering if you guys might have an idea of what is
>> wrong.
>>
>> Any help is appreciated :)
>
> Hi Gabriel,
>
> This appears to be new behavior in blk-mq's tag set update with commit
> 705cda97e. This is asserting a lock is held, but none of the drivers
> that call the export take that lock.
Ugh yes, that was a little sloppy... Would you mind sending this as
a proper patch? Then I'll queue it up for 4.12.
--
Jens Axboe
* Re: WARNING triggers at blk_mq_update_nr_hw_queues during nvme_reset_work
From: Gabriel Krisman Bertazi @ 2017-05-30 18:30 UTC
To: Keith Busch; +Cc: linux-nvme, Jens Axboe, Bart Van Assche, linux-block
Keith Busch <keith.busch@intel.com> writes:
> On Tue, May 30, 2017 at 02:00:44PM -0300, Gabriel Krisman Bertazi wrote:
>> Since the merge window for 4.12, one of the machines in Intel's CI
>> started to hit the WARN_ON below at blk_mq_update_nr_hw_queues during an
>> nvme_reset_work. The issue persists with the latest 4.12-rc3, and full
>> dmesg from boot, up to the moment where the WARN_ON triggers is
>> available at the following link:
>>
>> https://intel-gfx-ci.01.org/CI/CI_DRM_2672/fi-kbl-7500u/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
>>
>> Please notice that the test we do in the CI involves putting the
>> machine to sleep (PM), and the issue triggers when resuming execution.
>>
>> I have not been able to get my hands on the machine yet to do an actual
>> bisect, but I'm wondering if you guys might have an idea of what is
>> wrong.
>>
>> Any help is appreciated :)
>
> Hi Gabriel,
>
> This appears to be new behavior in blk-mq's tag set update with commit
> 705cda97e. This is asserting a lock is held, but none of the drivers
> that call the export take that lock.
>
> I think the below should fix it (CC'ing block list and developers).
>
Thanks for the quick fix, Keith. I'm running it against the CI to
confirm it fixes the issue and will send you my tested-by once the job
is completed.
--
Gabriel Krisman Bertazi