Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

Linux virtualization list

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
       [not found] <9c5eec5d-f542-4d76-6933-6fe31203ce09@de.ibm.com>
@ 2017-11-20 19:20 ` Bart Van Assche
       [not found] ` <1511205644.2396.32.camel@wdc.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-11-20 19:20 UTC
  To: virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com,
	borntraeger@de.ibm.com, axboe@kernel.dk, jasowang@redhat.com

On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
> This is 
> 
> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*

Did you really try to figure out when the code that reported the warning
was introduced? I think that warning was introduced through the following
commit:

commit fd1270d5df6a005e1248e87042159a799cc4b2c9
Date:   Wed Apr 16 09:23:48 2014 -0600

    blk-mq: don't use preempt_count() to check for right CPU
     
    UP or CONFIG_PREEMPT_NONE will return 0, and what we really
    want to check is whether or not we are on the right CPU.
    So don't make PREEMPT part of this, just test the CPU in
    the mask directly.

Anyway, I think that warning is appropriate and useful. So the next step
is to figure out what work item was involved and why that work item got
executed on the wrong CPU.

Bart.


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
       [not found] ` <1511205644.2396.32.camel@wdc.com>
@ 2017-11-20 19:29   ` Christian Borntraeger
       [not found]   ` <04526c98-ffc5-1eca-3aa8-50f9212c4323@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-20 19:29 UTC
  To: Bart Van Assche, virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, axboe@kernel.dk,
	jasowang@redhat.com



On 11/20/2017 08:20 PM, Bart Van Assche wrote:
> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>> This is 
>>
>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
> 
> Did you really try to figure out when the code that reported the warning
> was introduced? I think that warning was introduced through the following
> commit:

This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.

> 
> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
> Date:   Wed Apr 16 09:23:48 2014 -0600
> 
>     blk-mq: don't use preempt_count() to check for right CPU
>      
>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>     want to check is whether or not we are on the right CPU.
>     So don't make PREEMPT part of this, just test the CPU in
>     the mask directly.
> 
> Anyway, I think that warning is appropriate and useful. So the next step
> is to figure out what work item was involved and why that work item got
> executed on the wrong CPU.

It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
says: "no this is not a known issue" then :-)
I will try to take a dump to find out the work item


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
       [not found]   ` <04526c98-ffc5-1eca-3aa8-50f9212c4323@de.ibm.com>
@ 2017-11-20 19:42     ` Jens Axboe
       [not found]     ` <5c9f2228-0a8b-8225-7038-e6cb3f31ca0b@kernel.dk>
  1 sibling, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-20 19:42 UTC
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com

On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>> This is 
>>>
>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>
>> Did you really try to figure out when the code that reported the warning
>> was introduced? I think that warning was introduced through the following
>> commit:
> 
> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
> 
>>
>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>
>>     blk-mq: don't use preempt_count() to check for right CPU
>>      
>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>     want to check is whether or not we are on the right CPU.
>>     So don't make PREEMPT part of this, just test the CPU in
>>     the mask directly.
>>
>> Anyway, I think that warning is appropriate and useful. So the next step
>> is to figure out what work item was involved and why that work item got
>> executed on the wrong CPU.
> 
> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
> says: "no this is not a known issue" then :-)
> I will try to take a dump to find out the work item

blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
and we reconfigure the mappings. So I don't think the above is unexpected,
if you are doing CPU hot unplug while running a fio job.

While it's a bit annoying that we trigger the WARN_ON() for a condition
that can happen, we're basically interested in it if it triggers for
normal operations.

-- 
Jens Axboe


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
       [not found]     ` <5c9f2228-0a8b-8225-7038-e6cb3f31ca0b@kernel.dk>
@ 2017-11-20 20:49       ` Christian Borntraeger
       [not found]       ` <2e44dbd3-2f90-c267-560c-91d1d4b0e892@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-20 20:49 UTC
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org



On 11/20/2017 08:42 PM, Jens Axboe wrote:
> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>> This is 
>>>>
>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>
>>> Did you really try to figure out when the code that reported the warning
>>> was introduced? I think that warning was introduced through the following
>>> commit:
>>
>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>
>>>
>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>
>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>      
>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>     want to check is whether or not we are on the right CPU.
>>>     So don't make PREEMPT part of this, just test the CPU in
>>>     the mask directly.
>>>
>>> Anyway, I think that warning is appropriate and useful. So the next step
>>> is to figure out what work item was involved and why that work item got
>>> executed on the wrong CPU.
>>
>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>> says: "no this is not a known issue" then :-)
>> I will try to take a dump to find out the work item
> 
> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
> and we reconfigure the mappings. So I don't think the above is unexpected,
> if you are doing CPU hot unplug while running a fio job.

I did a cpu hot plug (adding a CPU) and I started fio AFTER that.

 
> While it's a bit annoying that we trigger the WARN_ON() for a condition
> that can happen, we're basically interested in it if it triggers for
> normal operations.

I think we should never trigger a WARN_ON on conditions that can happen. I know some
folks enable panic_on_warn to detect/avoid data integrity issues. FWIW, this also seems
to happen with 4.13 and 4.12.
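
For reference, whether a given system would escalate such a warning to a panic can be read from the kernel.panic_on_warn sysctl. The following is only a minimal sketch: the function name panic_on_warn_enabled and its optional path argument are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if panic_on_warn is set to 1.
# $1 = sysctl file to read; defaults to the standard procfs location.
panic_on_warn_enabled() {
    sysctl_file=${1:-/proc/sys/kernel/panic_on_warn}
    [ "$(cat "$sysctl_file" 2>/dev/null)" = "1" ]
}

if panic_on_warn_enabled; then
    echo "panic_on_warn=1: a WARN_ON() will panic this system"
else
    echo "panic_on_warn is not enabled (or not readable)"
fi
```

On a machine where this reports panic_on_warn=1, the hotplug scenario described in this thread would be expected to bring the guest down rather than just log a backtrace.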


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
       [not found]       ` <2e44dbd3-2f90-c267-560c-91d1d4b0e892@de.ibm.com>
@ 2017-11-20 20:52         ` Jens Axboe
  2017-11-21  8:35           ` Christian Borntraeger
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-11-20 20:52 UTC
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org

On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>> This is 
>>>>>
>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>
>>>> Did you really try to figure out when the code that reported the warning
>>>> was introduced? I think that warning was introduced through the following
>>>> commit:
>>>
>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>
>>>>
>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>
>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>      
>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>     want to check is whether or not we are on the right CPU.
>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>     the mask directly.
>>>>
>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>> is to figure out what work item was involved and why that work item got
>>>> executed on the wrong CPU.
>>>
>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>> says: "no this is not a known issue" then :-)
>>> I will try to take a dump to find out the work item
>>
>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>> and we reconfigure the mappings. So I don't think the above is unexpected,
>> if you are doing CPU hot unplug while running a fio job.
> 
> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.

OK, that's different, we should not be triggering a warning for that.
What does your machine/virtblk topology look like in terms of CPUS,
nr of queues for virtblk, etc?

You can probably get this info the easiest by just doing a:

# find /sys/kernel/debug/block/virtX

replace virtX with your virtblk device name. Generate this info both
before and after the hotplug event.
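
One possible way to capture that before/after comparison is sketched below. It is an illustration only: the helper names snapshot_blk_debugfs and new_entries, the device name vda, and the snapshot file names are assumptions, and debugfs is assumed to be mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Hypothetical helper: list the blk-mq debugfs tree of one device, sorted so
# that two snapshots can be compared line by line.
# $1 = debugfs directory of the device (e.g. /sys/kernel/debug/block/vda)
snapshot_blk_debugfs() {
    find "$1" | LC_ALL=C sort
}

# Hypothetical helper: print the entries that appear only in the second
# snapshot, i.e. what the hotplug event added.
# $1 = snapshot taken before the hotplug, $2 = snapshot taken after it
new_entries() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Example usage (as root; the device name vda is an assumption):
#   snapshot_blk_debugfs /sys/kernel/debug/block/vda > before.txt
#   ... hotplug the CPU ...
#   snapshot_blk_debugfs /sys/kernel/debug/block/vda > after.txt
#   new_entries before.txt after.txt
```

An empty result from new_entries would indicate that the debugfs tree did not change across the hotplug event.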

>> While it's a bit annoying that we trigger the WARN_ON() for a condition
>> that can happen, we're basically interested in it if it triggers for
>> normal operations.
> 
> I think we should never trigger a WARN_ON on conditions that can
> happen. I know some folks enable panic_on_warn to detect/avoid data
> integrity issues. FWIW, this also seems to happen with 4.13 and 4.12.

It's not supposed to happen for your case, so I'd say it's been useful.
It's not a critical thing, but it is something that should not trigger
and we need to look into why it did, and fix it up.

-- 
Jens Axboe


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-20 20:52         ` Jens Axboe
@ 2017-11-21  8:35           ` Christian Borntraeger
  2017-11-21  9:50             ` Christian Borntraeger
       [not found]             ` <15f232d2-2aaa-df7c-57e8-2f710e051e84@de.ibm.com>
  0 siblings, 2 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21  8:35 UTC
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org



On 11/20/2017 09:52 PM, Jens Axboe wrote:
> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>> This is 
>>>>>>
>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>>
>>>>> Did you really try to figure out when the code that reported the warning
>>>>> was introduced? I think that warning was introduced through the following
>>>>> commit:
>>>>
>>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>>
>>>>>
>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>
>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>      
>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>     want to check is whether or not we are on the right CPU.
>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>     the mask directly.
>>>>>
>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>> is to figure out what work item was involved and why that work item got
>>>>> executed on the wrong CPU.
>>>>
>>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>>> says: "no this is not a known issue" then :-)
>>>> I will try to take a dump to find out the work item
>>>
>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>> if you are doing CPU hot unplug while running a fio job.
>>
>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
> 
> OK, that's different, we should not be triggering a warning for that.
> What does your machine/virtblk topology look like in terms of CPUS,
> nr of queues for virtblk, etc?

FWIW, 4.11 does work, 4.12 and later is broken.

> 
> You can probably get this info the easiest by just doing a:
> 
> # find /sys/kernel/debug/block/virtX
> 
> replace virtX with your virtblk device name. Generate this info both
> before and after the hotplug event.

It happens in all variants (1 cpu to 2 or 16 to 17 and independent of the
number of disks).

What I can see is that the block layer does not yet see the new CPU:

[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat

--> in host virsh setvcpu zhyp137 2

[root@zhyp137 ~]# chcpu -e 1
CPU 1 enabled
[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat



If I start with 2 CPUs from the beginning, it looks like the following (all cpu1 entries are new):

[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu1
/sys/kernel/debug/block/vda/hctx0/cpu1/completed
/sys/kernel/debug/block/vda/hctx0/cpu1/merged
/sys/kernel/debug/block/vda/hctx0/cpu1/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu1/rq_list
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat
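
The comparison above can also be scripted. The sketch below assumes, as the listings suggest, that each CPU known to blk-mq shows up as a cpuN directory under some hctx directory; the helper name report_unmapped_cpus and the way the expected CPU numbers are passed in are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: for each expected CPU number, report it if no
# hctx*/cpuN directory exists in the device's blk-mq debugfs tree.
# $1 = debugfs directory of the device, remaining args = CPU numbers
report_unmapped_cpus() {
    dev_dir=$1
    shift
    for cpu in "$@"; do
        found=no
        # An unmatched glob stays literal and simply fails the -d test.
        for ctx_dir in "$dev_dir"/hctx*/cpu"$cpu"; do
            if [ -d "$ctx_dir" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "cpu$cpu: no ctx directory under $dev_dir"
        fi
    done
}

# Example usage after hotplugging CPU 1 (device name vda is an assumption):
#   report_unmapped_cpus /sys/kernel/debug/block/vda 0 1
```

In the hotplug case shown earlier, this would be expected to report cpu1, while the boot-with-2-CPUs case would produce no output.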


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk
  2017-11-21  8:35           ` Christian Borntraeger
@ 2017-11-21  9:50             ` Christian Borntraeger
       [not found]             ` <15f232d2-2aaa-df7c-57e8-2f710e051e84@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21  9:50 UTC
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org



On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>>> This is 
>>>>>>>
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>>>
>>>>>> Did you really try to figure out when the code that reported the warning
>>>>>> was introduced? I think that warning was introduced through the following
>>>>>> commit:
>>>>>
>>>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>>>
>>>>>>
>>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>>
>>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>>      
>>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>>     want to check is whether or not we are on the right CPU.
>>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>>     the mask directly.
>>>>>>
>>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>>> is to figure out what work item was involved and why that work item got
>>>>>> executed on the wrong CPU.
>>>>>
>>>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>>>> says: "no this is not a known issue" then :-)
>>>>> I will try to take a dump to find out the work item
>>>>
>>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>>> if you are doing CPU hot unplug while running a fio job.
>>>
>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>
>> OK, that's different, we should not be triggering a warning for that.
>> What does your machine/virtblk topology look like in terms of CPUS,
>> nr of queues for virtblk, etc?
> 
> FWIW, 4.11 does work, 4.12 and later is broken.

In fact: 4.12 is fine, 4.12.14 is broken.


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]             ` <15f232d2-2aaa-df7c-57e8-2f710e051e84@de.ibm.com>
@ 2017-11-21 10:14               ` Christian Borntraeger
       [not found]               ` <c8bd769e-9742-205d-11b0-469428a8579c@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21 10:14 UTC
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig



On 11/21/2017 10:50 AM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>>>>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>>>>>>> This is 
>>>>>>>>
>>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
>>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
>>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
>>>>>>>> 6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
>>>>>>>> b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
>>>>>>>
>>>>>>> Did you really try to figure out when the code that reported the warning
>>>>>>> was introduced? I think that warning was introduced through the following
>>>>>>> commit:
>>>>>>
>>>>>> This was more a cut'n'paste to show which warning triggered since line numbers are somewhat volatile.
>>>>>>
>>>>>>>
>>>>>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>>>>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>>>>>
>>>>>>>     blk-mq: don't use preempt_count() to check for right CPU
>>>>>>>      
>>>>>>>     UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>>>>>>     want to check is whether or not we are on the right CPU.
>>>>>>>     So don't make PREEMPT part of this, just test the CPU in
>>>>>>>     the mask directly.
>>>>>>>
>>>>>>> Anyway, I think that warning is appropriate and useful. So the next step
>>>>>>> is to figure out what work item was involved and why that work item got
>>>>>>> executed on the wrong CPU.
>>>>>>
>>>>>> It seems to be related to virtio-blk (is triggered by fio on such disks). Your comment basically
>>>>>> says: "no this is not a known issue" then :-)
>>>>>> I will try to take a dump to find out the work item
>>>>>
>>>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>>>> if you are doing CPU hot unplug while running a fio job.
>>>>
>>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>>
>>> OK, that's different, we should not be triggering a warning for that.
>>> What does your machine/virtblk topology look like in terms of CPUS,
>>> nr of queues for virtblk, etc?
>>
>> FWIW, 4.11 does work, 4.12 and later is broken.
> 
> In fact: 4.12 is fine, 4.12.14 is broken.


Bisect points to

1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jun 26 12:20:57 2017 +0200

    blk-mq: Create hctx for each present CPU
    
    commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
    
    Currently we only create hctx for online CPUs, which can lead to a lot
    of churn due to frequent soft offline / online operations.  Instead
    allocate one for each present CPU to avoid this and dramatically simplify
    the code.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jens Axboe <axboe@kernel.dk>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: linux-block@vger.kernel.org
    Cc: linux-nvme@lists.infradead.org
    Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
    Cc: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 a61cb023014a7b7a6b9f24ea04fe8ab22299e706 059ba6dc3290c74e0468937348e580cd53f963e7 M	block
:040000 040000 432e719d7e738ffcddfb8fc964544d3b3e0a68f7 f4572aa21b249a851a1b604c148eea109e93b30d M	include





Adding Christoph. FWIW, your patch triggers the following on 4.14 when doing a CPU hotplug
(adding a CPU) and then accessing a virtio-blk device.


[  747.652408] ------------[ cut here ]------------
[  747.652410] WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 __blk_mq_run_hw_queue+0xd4/0x100
[  747.652410] Modules linked in: dm_multipath
[  747.652412] CPU: 4 PID: 2895 Comm: kworker/4:1H Tainted: G        W       4.14.0+ #191
[  747.652412] Hardware name: IBM 2964 NC9 704 (KVM/Linux)
[  747.652414] Workqueue: kblockd blk_mq_run_work_fn
[  747.652414] task: 0000000060680000 task.stack: 000000005ea30000
[  747.652415] Krnl PSW : 0704f00180000000 0000000000505864 (__blk_mq_run_hw_queue+0xd4/0x100)
[  747.652417]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  747.652417] Krnl GPRS: 0000000000000010 00000000000000ff 000000005cbec400 0000000000000000
[  747.652418]            0000000063709120 0000000000000000 0000000063709500 0000000059fa44b0
[  747.652418]            0000000059fa4480 0000000000000000 000000006370f700 0000000063709100
[  747.652419]            000000005cbec500 0000000000970948 000000005ea33d80 000000005ea33d48
[  747.652423] Krnl Code: 0000000000505854: ebaff0a00004        lmg     %r10,%r15,160(%r15)
           000000000050585a: c0f4ffe690d3       brcl    15,1d7a00
          #0000000000505860: a7f40001           brc     15,505862
          >0000000000505864: 581003b0           l       %r1,944
           0000000000505868: c01b001fff00       nilf    %r1,2096896
           000000000050586e: a784ffdb           brc     8,505824
           0000000000505872: a7f40001           brc     15,505874
           0000000000505876: 9120218f           tm      399(%r2),32
[  747.652435] Call Trace:
[  747.652435] ([<0000000063709600>] 0x63709600)
[  747.652436]  [<0000000000187bcc>] process_one_work+0x264/0x4b8 
[  747.652438]  [<0000000000187e78>] worker_thread+0x58/0x4f8 
[  747.652439]  [<000000000018ee94>] kthread+0x144/0x168 
[  747.652439]  [<00000000008f8a62>] kernel_thread_starter+0x6/0xc 
[  747.652440]  [<00000000008f8a5c>] kernel_thread_starter+0x0/0xc 
[  747.652440] Last Breaking-Event-Address:
[  747.652441]  [<0000000000505860>] __blk_mq_run_hw_queue+0xd0/0x100
[  747.652442] ---[ end trace 4a001a80379b18ba ]---
[  747.652450] ------------[ cut here ]------------

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]               ` <c8bd769e-9742-205d-11b0-469428a8579c@de.ibm.com>
@ 2017-11-21 17:27                 ` Jens Axboe
       [not found]                 ` <b7b3cf4b-837e-f9ba-61c0-4f9ddd8b9a95@kernel.dk>
  1 sibling, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 17:27 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig

On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
> Bisect points to
> 
> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Mon Jun 26 12:20:57 2017 +0200
> 
>     blk-mq: Create hctx for each present CPU
>     
>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>     
>     Currently we only create hctx for online CPUs, which can lead to a lot
>     of churn due to frequent soft offline / online operations.  Instead
>     allocate one for each present CPU to avoid this and dramatically simplify
>     the code.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>     Cc: Keith Busch <keith.busch@intel.com>
>     Cc: linux-block@vger.kernel.org
>     Cc: linux-nvme@lists.infradead.org
>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>     Cc: Mike Galbraith <efault@gmx.de>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

I wonder if we're simply not getting the masks updated correctly. I'll
take a look.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                 ` <b7b3cf4b-837e-f9ba-61c0-4f9ddd8b9a95@kernel.dk>
@ 2017-11-21 18:09                   ` Jens Axboe
  2017-11-21 18:12                     ` Christian Borntraeger
       [not found]                     ` <1aeecf2e-a68e-4c18-5912-2473f457e6ea@de.ibm.com>
  0 siblings, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 18:09 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig

On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>
>>     blk-mq: Create hctx for each present CPU
>>     
>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>     
>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>     of churn due to frequent soft offline / online operations.  Instead
>>     allocate one for each present CPU to avoid this and dramatically simplify
>>     the code.
>>     
>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>     Cc: Keith Busch <keith.busch@intel.com>
>>     Cc: linux-block@vger.kernel.org
>>     Cc: linux-nvme@lists.infradead.org
>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>     Cc: Mike Galbraith <efault@gmx.de>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

Can't make it trigger here. We do init for each present CPU, which means
that if I offline a few CPUs here and register a queue, those still show
up as present (just offline) and get mapped accordingly.

From the looks of it, your setup is different. If the CPU doesn't show
up as present and it gets hotplugged, then I can see how this condition
would trigger. What environment are you running this in? We might have
to re-introduce the cpu hotplug notifier, right now we just monitor
for a dead cpu and handle that.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:09                   ` Jens Axboe
@ 2017-11-21 18:12                     ` Christian Borntraeger
       [not found]                     ` <1aeecf2e-a68e-4c18-5912-2473f457e6ea@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21 18:12 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig



On 11/21/2017 07:09 PM, Jens Axboe wrote:
> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>> Bisect points to
>>>
>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>> Author: Christoph Hellwig <hch@lst.de>
>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>
>>>     blk-mq: Create hctx for each present CPU
>>>     
>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>     
>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>     of churn due to frequent soft offline / online operations.  Instead
>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>     the code.
>>>     
>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>     Cc: linux-block@vger.kernel.org
>>>     Cc: linux-nvme@lists.infradead.org
>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>> I wonder if we're simply not getting the masks updated correctly. I'll
>> take a look.
> 
> Can't make it trigger here. We do init for each present CPU, which means
> that if I offline a few CPUs here and register a queue, those still show
> up as present (just offline) and get mapped accordingly.
> 
> From the looks of it, your setup is different. If the CPU doesn't show
> up as present and it gets hotplugged, then I can see how this condition
> would trigger. What environment are you running this in? We might have
> to re-introduce the cpu hotplug notifier, right now we just monitor
> for a dead cpu and handle that.

I am not doing a hot unplug and then a replug; I use KVM and add a previously
not available CPU.

In libvirt/virsh speak:
  <vcpu placement='static' current='1'>4</vcpu>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                     ` <1aeecf2e-a68e-4c18-5912-2473f457e6ea@de.ibm.com>
@ 2017-11-21 18:27                       ` Jens Axboe
  2017-11-21 18:39                         ` Jens Axboe
  2017-11-23 14:02                       ` Christoph Hellwig
       [not found]                       ` <20171123140208.GA28914@lst.de>
  2 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 18:27 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig

On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>> Bisect points to
>>>>
>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>> Author: Christoph Hellwig <hch@lst.de>
>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>
>>>>     blk-mq: Create hctx for each present CPU
>>>>     
>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>     
>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>     the code.
>>>>     
>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>     Cc: linux-block@vger.kernel.org
>>>>     Cc: linux-nvme@lists.infradead.org
>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>
>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>> take a look.
>>
>> Can't make it trigger here. We do init for each present CPU, which means
>> that if I offline a few CPUs here and register a queue, those still show
>> up as present (just offline) and get mapped accordingly.
>>
>> From the looks of it, your setup is different. If the CPU doesn't show
>> up as present and it gets hotplugged, then I can see how this condition
>> would trigger. What environment are you running this in? We might have
>> to re-introduce the cpu hotplug notifier, right now we just monitor
>> for a dead cpu and handle that.
> 
> I am not doing a hot unplug and then a replug; I use KVM and add a previously
> not available CPU.
> 
> in libvirt/virsh speak:
>   <vcpu placement='static' current='1'>4</vcpu>

So that's why we run into problems. It's not present when we load the device,
but becomes present and online afterwards.

Christoph, we used to handle this just fine, your patch broke it.

I'll see if I can come up with an appropriate fix.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:27                       ` Jens Axboe
@ 2017-11-21 18:39                         ` Jens Axboe
  2017-11-21 19:15                           ` Christian Borntraeger
       [not found]                           ` <ba994ec6-8db6-f77e-ac73-92e3f6b0135a@de.ibm.com>
  0 siblings, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 18:39 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig

On 11/21/2017 11:27 AM, Jens Axboe wrote:
> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>> Bisect points to
>>>>>
>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>
>>>>>     blk-mq: Create hctx for each present CPU
>>>>>     
>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>     
>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>     the code.
>>>>>     
>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>     Cc: linux-block@vger.kernel.org
>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>
>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>> take a look.
>>>
>>> Can't make it trigger here. We do init for each present CPU, which means
>>> that if I offline a few CPUs here and register a queue, those still show
>>> up as present (just offline) and get mapped accordingly.
>>>
>>> From the looks of it, your setup is different. If the CPU doesn't show
>>> up as present and it gets hotplugged, then I can see how this condition
>>> would trigger. What environment are you running this in? We might have
>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>> for a dead cpu and handle that.
>>
>> I am not doing a hot unplug and then a replug; I use KVM and add a previously
>> not available CPU.
>>
>> in libvirt/virsh speak:
>>   <vcpu placement='static' current='1'>4</vcpu>
> 
> So that's why we run into problems. It's not present when we load the device,
> but becomes present and online afterwards.
> 
> Christoph, we used to handle this just fine, your patch broke it.
> 
> I'll see if I can come up with an appropriate fix.

Can you try the below?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index b600463791ec..ab3a66e7bd03 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,6 +40,7 @@
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
+static void blk_mq_map_swqueue(struct request_queue *q);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1947,6 +1950,15 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
+	blk_mq_map_swqueue(hctx->queue);
+	return 0;
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -1958,7 +1970,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	struct blk_mq_ctx *ctx;
 	LIST_HEAD(tmp);
 
-	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 
 	spin_lock(&ctx->lock);
@@ -1981,8 +1993,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 {
-	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
-					    &hctx->cpuhp_dead);
+	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 }
 
 /* hctx->ctxs will be freed in queue's release handler */
@@ -2039,7 +2050,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	hctx->queue = q;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
 
-	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
+	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 
 	hctx->tags = set->tags[hctx_idx];
 
@@ -2974,7 +2987,8 @@ static int __init blk_mq_init(void)
 	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
 			(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
 
-	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
+	cpuhp_setup_state_multi(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				blk_mq_hctx_notify_prepare,
 				blk_mq_hctx_notify_dead);
 	return 0;
 }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95c9a5c862e2..a6f03e9464fb 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -52,7 +52,7 @@ struct blk_mq_hw_ctx {
 
 	atomic_t		nr_active;
 
-	struct hlist_node	cpuhp_dead;
+	struct hlist_node	cpuhp;
 	struct kobject		kobj;
 
 	unsigned long		poll_considered;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index ec32c4c5eb30..28b0fc9229c8 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -48,7 +48,7 @@ enum cpuhp_state {
 	CPUHP_BLOCK_SOFTIRQ_DEAD,
 	CPUHP_ACPI_CPUDRV_DEAD,
 	CPUHP_S390_PFAULT_DEAD,
-	CPUHP_BLK_MQ_DEAD,
+	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_FS_BUFF_DEAD,
 	CPUHP_PRINTK_DEAD,
 	CPUHP_MM_MEMCQ_DEAD,

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 18:39                         ` Jens Axboe
@ 2017-11-21 19:15                           ` Christian Borntraeger
       [not found]                           ` <ba994ec6-8db6-f77e-ac73-92e3f6b0135a@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21 19:15 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig



On 11/21/2017 07:39 PM, Jens Axboe wrote:
> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>> Bisect points to
>>>>>>
>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>
>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>     
>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>     
>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>     the code.
>>>>>>     
>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>
>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>> take a look.
>>>>
>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>> that if I offline a few CPUs here and register a queue, those still show
>>>> up as present (just offline) and get mapped accordingly.
>>>>
>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>> up as present and it gets hotplugged, then I can see how this condition
>>>> would trigger. What environment are you running this in? We might have
>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>> for a dead cpu and handle that.
>>>
>>> I am not doing a hot unplug and then a replug; I use KVM and add a previously
>>> not available CPU.
>>>
>>> in libvirt/virsh speak:
>>>   <vcpu placement='static' current='1'>4</vcpu>
>>
>> So that's why we run into problems. It's not present when we load the device,
>> but becomes present and online afterwards.
>>
>> Christoph, we used to handle this just fine, your patch broke it.
>>
>> I'll see if I can come up with an appropriate fix.
> 
> Can you try the below?


It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:


output with 2 cpus:
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat

> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b600463791ec..ab3a66e7bd03 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -40,6 +40,7 @@
>  static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
>  static void blk_mq_poll_stats_start(struct request_queue *q);
>  static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> +static void blk_mq_map_swqueue(struct request_queue *q);
> 
>  static int blk_mq_poll_stats_bkt(const struct request *rq)
>  {
> @@ -1947,6 +1950,15 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  	return -ENOMEM;
>  }
> 
> +static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct blk_mq_hw_ctx *hctx;
> +
> +	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
> +	blk_mq_map_swqueue(hctx->queue);
> +	return 0;
> +}
> +
>  /*
>   * 'cpu' is going away. splice any existing rq_list entries from this
>   * software queue to the hw queue dispatch list, and ensure that it
> @@ -1958,7 +1970,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
>  	struct blk_mq_ctx *ctx;
>  	LIST_HEAD(tmp);
> 
> -	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
> +	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
>  	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
> 
>  	spin_lock(&ctx->lock);
> @@ -1981,8 +1993,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
> 
>  static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
>  {
> -	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
> -					    &hctx->cpuhp_dead);
> +	cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
>  }
> 
>  /* hctx->ctxs will be freed in queue's release handler */
> @@ -2039,7 +2050,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
>  	hctx->queue = q;
>  	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
> 
> -	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
> +	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
> 
>  	hctx->tags = set->tags[hctx_idx];
> 
> @@ -2974,7 +2987,8 @@ static int __init blk_mq_init(void)
>  	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
>  			(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
> 
> -	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
> +	cpuhp_setup_state_multi(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
> +				blk_mq_hctx_notify_prepare,
>  				blk_mq_hctx_notify_dead);
>  	return 0;
>  }
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 95c9a5c862e2..a6f03e9464fb 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -52,7 +52,7 @@ struct blk_mq_hw_ctx {
> 
>  	atomic_t		nr_active;
> 
> -	struct hlist_node	cpuhp_dead;
> +	struct hlist_node	cpuhp;
>  	struct kobject		kobj;
> 
>  	unsigned long		poll_considered;
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index ec32c4c5eb30..28b0fc9229c8 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -48,7 +48,7 @@ enum cpuhp_state {
>  	CPUHP_BLOCK_SOFTIRQ_DEAD,
>  	CPUHP_ACPI_CPUDRV_DEAD,
>  	CPUHP_S390_PFAULT_DEAD,
> -	CPUHP_BLK_MQ_DEAD,
> +	CPUHP_BLK_MQ_PREPARE,
>  	CPUHP_FS_BUFF_DEAD,
>  	CPUHP_PRINTK_DEAD,
>  	CPUHP_MM_MEMCQ_DEAD,
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                           ` <ba994ec6-8db6-f77e-ac73-92e3f6b0135a@de.ibm.com>
@ 2017-11-21 19:30                             ` Jens Axboe
       [not found]                             ` <ae02b9c5-9a2e-cb8b-7828-475b3c0b1cb9@kernel.dk>
  1 sibling, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 19:30 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig

On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>> Bisect points to
>>>>>>>
>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>
>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>     
>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>     
>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>     the code.
>>>>>>>     
>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>
>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>> take a look.
>>>>>
>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>> up as present (just offline) and get mapped accordingly.
>>>>>
>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>> would trigger. What environment are you running this in? We might have
>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>> for a dead cpu and handle that.
>>>>
>>>> I am not doing a hot unplug and then a replug; I use KVM and add a previously
>>>> not available CPU.
>>>>
>>>> in libvirt/virsh speak:
>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>
>>> So that's why we run into problems. It's not present when we load the device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
> 
> 
> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
> 
> 
> output with 2 cpus:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat

Try this, basically just a revert.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
 
-		/* If the cpu isn't present, the cpu is mapped to first hctx */
-		if (!cpu_present(i))
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpu_online(i))
 			continue;
 
 		hctx = blk_mq_map_queue(q, i);
@@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	}
 }
 
-static void blk_mq_map_swqueue(struct request_queue *q)
+static void blk_mq_map_swqueue(struct request_queue *q,
+			       const struct cpumask *online_mask)
 {
 	unsigned int i, hctx_idx;
 	struct blk_mq_hw_ctx *hctx;
@@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	}
 
 	/*
-	 * Map software to hardware queues.
-	 *
-	 * If the cpu isn't present, the cpu is mapped to first hctx.
+	 * Map software to hardware queues
 	 */
-	for_each_present_cpu(i) {
+	for_each_possible_cpu(i) {
+		/* If the cpu isn't online, the cpu is mapped to first hctx */
+		if (!cpumask_test_cpu(i, online_mask))
+			continue;
+
 		hctx_idx = q->mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
@@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		blk_queue_softirq_done(q, set->ops->complete);
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
+
+	get_online_cpus();
+	mutex_lock(&all_q_mutex);
+
+	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, cpu_online_mask);
+
+	mutex_unlock(&all_q_mutex);
+	put_online_cpus();
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
@@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
 {
 	struct blk_mq_tag_set	*set = q->tag_set;
 
+	mutex_lock(&all_q_mutex);
+	list_del_init(&q->all_q_node);
+	mutex_unlock(&all_q_mutex);
+
 	blk_mq_del_queue_tag_set(q);
+
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q)
+static void blk_mq_queue_reinit(struct request_queue *q,
+				const struct cpumask *online_mask)
 {
 	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
 
@@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
 	 * we should change hctx numa_node according to the new topology (this
 	 * involves freeing and re-allocating memory, worth doing?)
 	 */
-	blk_mq_map_swqueue(q);
+	blk_mq_map_swqueue(q, online_mask);
 
 	blk_mq_sysfs_register(q);
 	blk_mq_debugfs_register_hctxs(q);
 }
 
+/*
+ * New online cpumask which is going to be set in this hotplug event.
+ * Declare this cpumask as global, as cpu-hotplug operations are invoked
+ * one-by-one and dynamically allocating this could result in a failure.
+ */
+static struct cpumask cpuhp_online_new;
+
+static void blk_mq_queue_reinit_work(void)
+{
+	struct request_queue *q;
+
+	mutex_lock(&all_q_mutex);
+	/*
+	 * We need to freeze and reinit all existing queues.  Freezing
+	 * involves synchronous wait for an RCU grace period and doing it
+	 * one by one may take a long time.  Start freezing all queues in
+	 * one swoop and then wait for the completions so that freezing can
+	 * take place in parallel.
+	 */
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_freeze_queue_start(q);
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_freeze_queue_wait(q);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_queue_reinit(q, &cpuhp_online_new);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_unfreeze_queue(q);
+
+	mutex_unlock(&all_q_mutex);
+}
+
+static int blk_mq_queue_reinit_dead(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
+/*
+ * Before hotadded cpu starts handling requests, new mappings must be
+ * established.  Otherwise, these requests in hw queue might never be
+ * dispatched.
+ *
+ * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
+ * for CPU0, and ctx1 for CPU1).
+ *
+ * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
+ * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
+ *
+ * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
+ * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
+ * But hctx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
+ * ignored.
+ */
+static int blk_mq_queue_reinit_prepare(unsigned int cpu)
+{
+	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
+	cpumask_set_cpu(cpu, &cpuhp_online_new);
+	blk_mq_queue_reinit_work();
+	return 0;
+}
+
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
@@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_realloc_hw_ctxs(set, q);
-		blk_mq_queue_reinit(q);
+		blk_mq_queue_reinit(q, cpu_online_mask);
 	}
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 	return __blk_mq_poll(hctx, rq);
 }
 
+void blk_mq_disable_hotplug(void)
+{
+	mutex_lock(&all_q_mutex);
+}
+
+void blk_mq_enable_hotplug(void)
+{
+	mutex_unlock(&all_q_mutex);
+}
+
 static int __init blk_mq_init(void)
 {
 	/*
@@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
 
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
+
+	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
+				  blk_mq_queue_reinit_prepare,
+				  blk_mq_queue_reinit_dead);
 	return 0;
 }
 subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff5bf62..83b13ef1915e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
+/*
+ * CPU hotplug helpers
+ */
+void blk_mq_enable_hotplug(void);
+void blk_mq_disable_hotplug(void);
 
 /*
  * CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 201ab7267986..c31d4e3bf6d0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -76,6 +76,7 @@ enum cpuhp_state {
 	CPUHP_XEN_EVTCHN_PREPARE,
 	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
 	CPUHP_SH_SH3X_PREPARE,
+	CPUHP_BLK_MQ_PREPARE,
 	CPUHP_NET_FLOW_PREPARE,
 	CPUHP_TOPOLOGY_PREPARE,
 	CPUHP_NET_IUCV_PREPARE,

-- 
Jens Axboe


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                             ` <ae02b9c5-9a2e-cb8b-7828-475b3c0b1cb9@kernel.dk>
@ 2017-11-21 20:12                               ` Christian Borntraeger
       [not found]                               ` <c438db5f-f4f1-69f8-37f3-e91eae29fa25@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:12 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig



On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>> Bisect points to
>>>>>>>>
>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>
>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>     
>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>     
>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>     the code.
>>>>>>>>     
>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>
>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>> take a look.
>>>>>>
>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>
>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>> would trigger. What environment are you running this in? We might have
>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>> for a dead cpu and handle that.
>>>>>
>>>>> I am not doing a hot unplug and then a replug; I use KVM and add a previously
>>>>> not available CPU.
>>>>>
>>>>> in libvirt/virsh speak:
>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>
>>>> So that's why we run into problems. It's not present when we load the device,
>>>> but becomes present and online afterwards.
>>>>
>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>
>>>> I'll see if I can come up with an appropriate fix.
>>>
>>> Can you try the below?
>>
>>
>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>
>>
>> output with 2 cpus:
>> /sys/kernel/debug/block/vda
>> /sys/kernel/debug/block/vda/hctx0
>> /sys/kernel/debug/block/vda/hctx0/cpu0
>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>> /sys/kernel/debug/block/vda/hctx0/active
>> /sys/kernel/debug/block/vda/hctx0/run
>> /sys/kernel/debug/block/vda/hctx0/queued
>> /sys/kernel/debug/block/vda/hctx0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/io_poll
>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/tags
>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>> /sys/kernel/debug/block/vda/hctx0/busy
>> /sys/kernel/debug/block/vda/hctx0/dispatch
>> /sys/kernel/debug/block/vda/hctx0/flags
>> /sys/kernel/debug/block/vda/hctx0/state
>> /sys/kernel/debug/block/vda/sched
>> /sys/kernel/debug/block/vda/sched/dispatch
>> /sys/kernel/debug/block/vda/sched/starved
>> /sys/kernel/debug/block/vda/sched/batching
>> /sys/kernel/debug/block/vda/sched/write_next_rq
>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>> /sys/kernel/debug/block/vda/sched/read_next_rq
>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>> /sys/kernel/debug/block/vda/write_hints
>> /sys/kernel/debug/block/vda/state
>> /sys/kernel/debug/block/vda/requeue_list
>> /sys/kernel/debug/block/vda/poll_stat
> 
> Try this, basically just a revert.

Yes, seems to work.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Do you know why the original commit made it into 4.12 stable? After all
it has no Fixes tag and no Cc: stable.


> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..bc1950fa9ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,6 +37,9 @@
>  #include "blk-wbt.h"
>  #include "blk-mq-sched.h"
> 
> +static DEFINE_MUTEX(all_q_mutex);
> +static LIST_HEAD(all_q_list);
> +
>  static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
>  static void blk_mq_poll_stats_start(struct request_queue *q);
>  static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> @@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>  		INIT_LIST_HEAD(&__ctx->rq_list);
>  		__ctx->queue = q;
> 
> -		/* If the cpu isn't present, the cpu is mapped to first hctx */
> -		if (!cpu_present(i))
> +		/* If the cpu isn't online, the cpu is mapped to first hctx */
> +		if (!cpu_online(i))
>  			continue;
> 
>  		hctx = blk_mq_map_queue(q, i);
> @@ -2158,7 +2161,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>  	}
>  }
> 
> -static void blk_mq_map_swqueue(struct request_queue *q)
> +static void blk_mq_map_swqueue(struct request_queue *q,
> +			       const struct cpumask *online_mask)
>  {
>  	unsigned int i, hctx_idx;
>  	struct blk_mq_hw_ctx *hctx;
> @@ -2176,11 +2180,13 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  	}
> 
>  	/*
> -	 * Map software to hardware queues.
> -	 *
> -	 * If the cpu isn't present, the cpu is mapped to first hctx.
> +	 * Map software to hardware queues
>  	 */
> -	for_each_present_cpu(i) {
> +	for_each_possible_cpu(i) {
> +		/* If the cpu isn't online, the cpu is mapped to first hctx */
> +		if (!cpumask_test_cpu(i, online_mask))
> +			continue;
> +
>  		hctx_idx = q->mq_map[i];
>  		/* unmapped hw queue can be remapped after CPU topo changed */
>  		if (!set->tags[hctx_idx] &&
> @@ -2495,8 +2501,16 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  		blk_queue_softirq_done(q, set->ops->complete);
> 
>  	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
> +
> +	get_online_cpus();
> +	mutex_lock(&all_q_mutex);
> +
> +	list_add_tail(&q->all_q_node, &all_q_list);
>  	blk_mq_add_queue_tag_set(set, q);
> -	blk_mq_map_swqueue(q);
> +	blk_mq_map_swqueue(q, cpu_online_mask);
> +
> +	mutex_unlock(&all_q_mutex);
> +	put_online_cpus();
> 
>  	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
>  		int ret;
> @@ -2522,12 +2536,18 @@ void blk_mq_free_queue(struct request_queue *q)
>  {
>  	struct blk_mq_tag_set	*set = q->tag_set;
> 
> +	mutex_lock(&all_q_mutex);
> +	list_del_init(&q->all_q_node);
> +	mutex_unlock(&all_q_mutex);
> +
>  	blk_mq_del_queue_tag_set(q);
> +
>  	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
>  }
> 
>  /* Basically redo blk_mq_init_queue with queue frozen */
> -static void blk_mq_queue_reinit(struct request_queue *q)
> +static void blk_mq_queue_reinit(struct request_queue *q,
> +				const struct cpumask *online_mask)
>  {
>  	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
> 
> @@ -2539,12 +2559,76 @@ static void blk_mq_queue_reinit(struct request_queue *q)
>  	 * we should change hctx numa_node according to the new topology (this
>  	 * involves freeing and re-allocating memory, worth doing?)
>  	 */
> -	blk_mq_map_swqueue(q);
> +	blk_mq_map_swqueue(q, online_mask);
> 
>  	blk_mq_sysfs_register(q);
>  	blk_mq_debugfs_register_hctxs(q);
>  }
> 
> +/*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumask as global, as cpu-hotplug operations are invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask cpuhp_online_new;
> +
> +static void blk_mq_queue_reinit_work(void)
> +{
> +	struct request_queue *q;
> +
> +	mutex_lock(&all_q_mutex);
> +	/*
> +	 * We need to freeze and reinit all existing queues.  Freezing
> +	 * involves synchronous wait for an RCU grace period and doing it
> +	 * one by one may take a long time.  Start freezing all queues in
> +	 * one swoop and then wait for the completions so that freezing can
> +	 * take place in parallel.
> +	 */
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_freeze_queue_start(q);
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_freeze_queue_wait(q);
> +
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_queue_reinit(q, &cpuhp_online_new);
> +
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_unfreeze_queue(q);
> +
> +	mutex_unlock(&all_q_mutex);
> +}
> +
> +static int blk_mq_queue_reinit_dead(unsigned int cpu)
> +{
> +	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> +	blk_mq_queue_reinit_work();
> +	return 0;
> +}
> +
> +/*
> + * Before hotadded cpu starts handling requests, new mappings must be
> + * established.  Otherwise, these requests in hw queue might never be
> + * dispatched.
> + *
> + * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
> + * for CPU0, and ctx1 for CPU1).
> + *
> + * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
> + * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
> + *
> + * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
> + * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
> + * But hctx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
> + * ignored.
> + */
> +static int blk_mq_queue_reinit_prepare(unsigned int cpu)
> +{
> +	cpumask_copy(&cpuhp_online_new, cpu_online_mask);
> +	cpumask_set_cpu(cpu, &cpuhp_online_new);
> +	blk_mq_queue_reinit_work();
> +	return 0;
> +}
> +
>  static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>  {
>  	int i;
> @@ -2757,7 +2841,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  	blk_mq_update_queue_map(set);
>  	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>  		blk_mq_realloc_hw_ctxs(set, q);
> -		blk_mq_queue_reinit(q);
> +		blk_mq_queue_reinit(q, cpu_online_mask);
>  	}
> 
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
> @@ -2966,6 +3050,16 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
>  	return __blk_mq_poll(hctx, rq);
>  }
> 
> +void blk_mq_disable_hotplug(void)
> +{
> +	mutex_lock(&all_q_mutex);
> +}
> +
> +void blk_mq_enable_hotplug(void)
> +{
> +	mutex_unlock(&all_q_mutex);
> +}
> +
>  static int __init blk_mq_init(void)
>  {
>  	/*
> @@ -2976,6 +3070,10 @@ static int __init blk_mq_init(void)
> 
>  	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
>  				blk_mq_hctx_notify_dead);
> +
> +	cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
> +				  blk_mq_queue_reinit_prepare,
> +				  blk_mq_queue_reinit_dead);
>  	return 0;
>  }
>  subsys_initcall(blk_mq_init);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 6c7c3ff5bf62..83b13ef1915e 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -59,6 +59,11 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
>  void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
>  				struct list_head *list);
> +/*
> + * CPU hotplug helpers
> + */
> +void blk_mq_enable_hotplug(void);
> +void blk_mq_disable_hotplug(void);
> 
>  /*
>   * CPU -> queue mappings
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 201ab7267986..c31d4e3bf6d0 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -76,6 +76,7 @@ enum cpuhp_state {
>  	CPUHP_XEN_EVTCHN_PREPARE,
>  	CPUHP_ARM_SHMOBILE_SCU_PREPARE,
>  	CPUHP_SH_SH3X_PREPARE,
> +	CPUHP_BLK_MQ_PREPARE,
>  	CPUHP_NET_FLOW_PREPARE,
>  	CPUHP_TOPOLOGY_PREPARE,
>  	CPUHP_NET_IUCV_PREPARE,
> 


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                               ` <c438db5f-f4f1-69f8-37f3-e91eae29fa25@de.ibm.com>
@ 2017-11-21 20:14                                 ` Jens Axboe
  2017-11-21 20:19                                   ` Christian Borntraeger
       [not found]                                   ` <276625a9-44fb-719d-9281-caacefdbb99f@de.ibm.com>
  0 siblings, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 20:14 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig

On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>> Bisect points to
>>>>>>>>>
>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>
>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>     
>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>     
>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>     the code.
>>>>>>>>>     
>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>
>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>> take a look.
>>>>>>>
>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>
>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>> for a dead cpu and handle that.
>>>>>>
>>>>>> I am not doing a hot unplug and then a replug; I use KVM and add a previously
>>>>>> not available CPU.
>>>>>>
>>>>>> in libvirt/virsh speak:
>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>
>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>> but becomes present and online afterwards.
>>>>>
>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>
>>>>> I'll see if I can come up with an appropriate fix.
>>>>
>>>> Can you try the below?
>>>
>>>
>>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>>
>>>
>>> output with 2 cpus:
>>> /sys/kernel/debug/block/vda
>>> /sys/kernel/debug/block/vda/hctx0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>> /sys/kernel/debug/block/vda/hctx0/active
>>> /sys/kernel/debug/block/vda/hctx0/run
>>> /sys/kernel/debug/block/vda/hctx0/queued
>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/tags
>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>> /sys/kernel/debug/block/vda/hctx0/busy
>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>> /sys/kernel/debug/block/vda/hctx0/flags
>>> /sys/kernel/debug/block/vda/hctx0/state
>>> /sys/kernel/debug/block/vda/sched
>>> /sys/kernel/debug/block/vda/sched/dispatch
>>> /sys/kernel/debug/block/vda/sched/starved
>>> /sys/kernel/debug/block/vda/sched/batching
>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>> /sys/kernel/debug/block/vda/write_hints
>>> /sys/kernel/debug/block/vda/state
>>> /sys/kernel/debug/block/vda/requeue_list
>>> /sys/kernel/debug/block/vda/poll_stat
>>
>> Try this, basically just a revert.
> 
> Yes, seems to work.
> 
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Great, thanks for testing.

> Do you know why the original commit made it into 4.12 stable? After all
> it has no Fixes tag and no Cc: stable.

I was wondering the same thing when you said it was in 4.12 stable and
not in the 4.12 release. That patch should absolutely not have gone into
stable; it's not marked as such and it's not fixing a problem that is
stable-worthy. In fact, it's causing a regression...

Greg? The upstream commit is mentioned higher up, near the start of the email.

-- 
Jens Axboe


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-11-21 20:14                                 ` Jens Axboe
@ 2017-11-21 20:19                                   ` Christian Borntraeger
       [not found]                                   ` <276625a9-44fb-719d-9281-caacefdbb99f@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:19 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig,
	Greg Kroah-Hartman, stable


On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>> Bisect points to
>>>>>>>>>>
>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>
>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>     
>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>     
>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>     the code.
>>>>>>>>>>     
>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>
>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>> take a look.
>>>>>>>>
>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>
>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>> for a dead cpu and handle that.
>>>>>>>
>>>>>>> I am not doing a hot unplug and then a replug; I use KVM and add a previously
>>>>>>> not available CPU.
>>>>>>>
>>>>>>> in libvirt/virsh speak:
>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>
>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>> but becomes present and online afterwards.
>>>>>>
>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>
>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>
>>>>> Can you try the below?
>>>>
>>>>
>>>> It does prevent the crash, but it seems that the new CPU is not "used" after the hotplug for mq:
>>>>
>>>>
>>>> output with 2 cpus:
>>>> /sys/kernel/debug/block/vda
>>>> /sys/kernel/debug/block/vda/hctx0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>> /sys/kernel/debug/block/vda/sched
>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>> /sys/kernel/debug/block/vda/sched/starved
>>>> /sys/kernel/debug/block/vda/sched/batching
>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>> /sys/kernel/debug/block/vda/write_hints
>>>> /sys/kernel/debug/block/vda/state
>>>> /sys/kernel/debug/block/vda/requeue_list
>>>> /sys/kernel/debug/block/vda/poll_stat
>>>
>>> Try this, basically just a revert.
>>
>> Yes, seems to work.
>>
>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> Great, thanks for testing.
> 
>> Do you know why the original commit made it into 4.12 stable? After all
>> it has no Fixes tag and no cc stable-
> 
> I was wondering the same thing when you said it was in 4.12.stable and
> not in 4.12 release. That patch should absolutely not have gone into
> stable, it's not marked as such and it's not fixing a problem that is
> stable worthy. In fact, it's causing a regression...
> 
> Greg? Upstream commit is mentioned higher up, start of the email.
> 


Forgot to cc Greg?


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                                   ` <276625a9-44fb-719d-9281-caacefdbb99f@de.ibm.com>
@ 2017-11-21 20:21                                     ` Jens Axboe
       [not found]                                     ` <eceae481-cf9a-0429-7a15-c363a8e7bc2a@kernel.dk>
  1 sibling, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 20:21 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig,
	Greg Kroah-Hartman, stable

On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
> 
> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>> Bisect points to
>>>>>>>>>>>
>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>
>>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>>     
>>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>     
>>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>     the code.
>>>>>>>>>>>     
>>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>
>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>> take a look.
>>>>>>>>>
>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>
>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>
>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>> not available CPU.
>>>>>>>>
>>>>>>>> in libvirt/virsh speak:
>>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>
>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>> but becomes present and online afterwards.
>>>>>>>
>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>
>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>
>>>>>> Can you try the below?
>>>>>
>>>>>
>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>
>>>>>
>>>>> output with 2 cpus:
>>>>> /sys/kernel/debug/block/vda
>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>> /sys/kernel/debug/block/vda/sched
>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>> /sys/kernel/debug/block/vda/state
>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>
>>>> Try this, basically just a revert.
>>>
>>> Yes, seems to work.
>>>
>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>
>> Great, thanks for testing.
>>
>>> Do you know why the original commit made it into 4.12 stable? After all
>>> it has no Fixes tag and no cc stable-
>>
>> I was wondering the same thing when you said it was in 4.12.stable and
>> not in 4.12 release. That patch should absolutely not have gone into
>> stable, it's not marked as such and it's not fixing a problem that is
>> stable worthy. In fact, it's causing a regression...
>>
>> Greg? Upstream commit is mentioned higher up, start of the email.
>>
> 
> 
> Forgot to cc Greg?

I did, thanks for doing that. Now I wonder how to mark this patch,
as we should revert it from kernels that have the bad commit. 4.12
is fine, 4.12.later-stable is not.

-- 
Jens Axboe


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                                     ` <eceae481-cf9a-0429-7a15-c363a8e7bc2a@kernel.dk>
@ 2017-11-21 20:31                                       ` Christian Borntraeger
       [not found]                                       ` <1ddd1cd4-2862-849e-7849-82544bcb86be@de.ibm.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2017-11-21 20:31 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig,
	Greg Kroah-Hartman, stable



On 11/21/2017 09:21 PM, Jens Axboe wrote:
> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>
>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>
>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>
>>>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>     
>>>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>     
>>>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>>     the code.
>>>>>>>>>>>>     
>>>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>
>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>> take a look.
>>>>>>>>>>
>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>
>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>
>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>> not available CPU.
>>>>>>>>>
>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>
>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>> but becomes present and online afterwards.
>>>>>>>>
>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>
>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>
>>>>>>> Can you try the below?
>>>>>>
>>>>>>
>>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>>
>>>>>>
>>>>>> output with 2 cpus:
>>>>>> /sys/kernel/debug/block/vda
>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>> /sys/kernel/debug/block/vda/state
>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>
>>>>> Try this, basically just a revert.
>>>>
>>>> Yes, seems to work.
>>>>
>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>
>>> Great, thanks for testing.
>>>
>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>> it has no Fixes tag and no cc stable-
>>>
>>> I was wondering the same thing when you said it was in 4.12.stable and
>>> not in 4.12 release. That patch should absolutely not have gone into
>>> stable, it's not marked as such and it's not fixing a problem that is
>>> stable worthy. In fact, it's causing a regression...
>>>
>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>
>>
>>
>> Forgot to cc Greg?
> 
> I did, thanks for doing that. Now I wonder how to mark this patch,
> as we should revert it from kernels that have the bad commit. 4.12
> is fine, 4.12.later-stable is not.
> 

I think we should tag it with:

Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")

which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                                       ` <1ddd1cd4-2862-849e-7849-82544bcb86be@de.ibm.com>
@ 2017-11-21 20:39                                         ` Jens Axboe
       [not found]                                         ` <08e6f35a-4f49-973e-99f7-6087b44337c4@kernel.dk>
  1 sibling, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-21 20:39 UTC (permalink / raw)
  To: Christian Borntraeger, Bart Van Assche,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com,
	linux-kernel@vger.kernel.org, Christoph Hellwig,
	Greg Kroah-Hartman, stable

On 11/21/2017 01:31 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 09:21 PM, Jens Axboe wrote:
>> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>>
>>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>>>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>>>>>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>>>>>>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>>>>>>>>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>>>>>>>>>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>>>>>>>>>>>> Bisect points to
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>>>>>>>>>>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>>>>>>>>>>>> Author: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>>>>>>>>>>>
>>>>>>>>>>>>>     blk-mq: Create hctx for each present CPU
>>>>>>>>>>>>>     
>>>>>>>>>>>>>     commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>>>>>>>>>>>>     
>>>>>>>>>>>>>     Currently we only create hctx for online CPUs, which can lead to a lot
>>>>>>>>>>>>>     of churn due to frequent soft offline / online operations.  Instead
>>>>>>>>>>>>>     allocate one for each present CPU to avoid this and dramatically simplify
>>>>>>>>>>>>>     the code.
>>>>>>>>>>>>>     
>>>>>>>>>>>>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>>>>>>>     Reviewed-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>>     Cc: Keith Busch <keith.busch@intel.com>
>>>>>>>>>>>>>     Cc: linux-block@vger.kernel.org
>>>>>>>>>>>>>     Cc: linux-nvme@lists.infradead.org
>>>>>>>>>>>>>     Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
>>>>>>>>>>>>>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>>>>>>>     Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
>>>>>>>>>>>>>     Cc: Mike Galbraith <efault@gmx.de>
>>>>>>>>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>>>>>>>>>>> take a look.
>>>>>>>>>>>
>>>>>>>>>>> Can't make it trigger here. We do init for each present CPU, which means
>>>>>>>>>>> that if I offline a few CPUs here and register a queue, those still show
>>>>>>>>>>> up as present (just offline) and get mapped accordingly.
>>>>>>>>>>>
>>>>>>>>>>> From the looks of it, your setup is different. If the CPU doesn't show
>>>>>>>>>>> up as present and it gets hotplugged, then I can see how this condition
>>>>>>>>>>> would trigger. What environment are you running this in? We might have
>>>>>>>>>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>>>>>>>>>> for a dead cpu and handle that.
>>>>>>>>>>
>>>>>>>>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>>>>>>>>> not available CPU.
>>>>>>>>>>
>>>>>>>>>> in libvirt/virsh speak:
>>>>>>>>>>   <vcpu placement='static' current='1'>4</vcpu>
>>>>>>>>>
>>>>>>>>> So that's why we run into problems. It's not present when we load the device,
>>>>>>>>> but becomes present and online afterwards.
>>>>>>>>>
>>>>>>>>> Christoph, we used to handle this just fine, your patch broke it.
>>>>>>>>>
>>>>>>>>> I'll see if I can come up with an appropriate fix.
>>>>>>>>
>>>>>>>> Can you try the below?
>>>>>>>
>>>>>>>
>>>>>>> It does prevent the crash but it seems that the new CPU is not "used " after the hotplug for mq:
>>>>>>>
>>>>>>>
>>>>>>> output with 2 cpus:
>>>>>>> /sys/kernel/debug/block/vda
>>>>>>> /sys/kernel/debug/block/vda/hctx0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>>>>>> /sys/kernel/debug/block/vda/hctx0/active
>>>>>>> /sys/kernel/debug/block/vda/hctx0/run
>>>>>>> /sys/kernel/debug/block/vda/hctx0/queued
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>>>>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>>>>>> /sys/kernel/debug/block/vda/hctx0/tags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>>>>>> /sys/kernel/debug/block/vda/hctx0/busy
>>>>>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>>>>>> /sys/kernel/debug/block/vda/hctx0/flags
>>>>>>> /sys/kernel/debug/block/vda/hctx0/state
>>>>>>> /sys/kernel/debug/block/vda/sched
>>>>>>> /sys/kernel/debug/block/vda/sched/dispatch
>>>>>>> /sys/kernel/debug/block/vda/sched/starved
>>>>>>> /sys/kernel/debug/block/vda/sched/batching
>>>>>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>>>>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>>>>>> /sys/kernel/debug/block/vda/write_hints
>>>>>>> /sys/kernel/debug/block/vda/state
>>>>>>> /sys/kernel/debug/block/vda/requeue_list
>>>>>>> /sys/kernel/debug/block/vda/poll_stat
>>>>>>
>>>>>> Try this, basically just a revert.
>>>>>
>>>>> Yes, seems to work.
>>>>>
>>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>>
>>>> Great, thanks for testing.
>>>>
>>>>> Do you know why the original commit made it into 4.12 stable? After all
>>>>> it has no Fixes tag and no cc stable-
>>>>
>>>> I was wondering the same thing when you said it was in 4.12.stable and
>>>> not in 4.12 release. That patch should absolutely not have gone into
>>>> stable, it's not marked as such and it's not fixing a problem that is
>>>> stable worthy. In fact, it's causing a regression...
>>>>
>>>> Greg? Upstream commit is mentioned higher up, start of the email.
>>>>
>>>
>>>
>>> Forgot to cc Greg?
>>
>> I did, thanks for doing that. Now I wonder how to mark this patch,
>> as we should revert it from kernels that have the bad commit. 4.12
>> is fine, 4.12.later-stable is not.
>>
> 
> I think we should tag it with:
> 
> Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
> 
> which should bring it into 4.13 stable and 4.14 stable. 4.12 stable seems EOL anyway.

Yeah, I think so too. But thinking more about this, I'm pretty sure this
adds a bad lock dependency with hotplug. Need to verify so we ensure we
don't introduce a potential deadlock here...

-- 
Jens Axboe


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                                         ` <08e6f35a-4f49-973e-99f7-6087b44337c4@kernel.dk>
@ 2017-11-22  7:28                                           ` Christoph Hellwig
       [not found]                                           ` <20171122072857.GA19338@lst.de>
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2017-11-22  7:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: mst@redhat.com, Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
	stable, virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, Bart Van Assche, Christoph Hellwig

Jens, please don't just revert the commit in your for-linus tree.

On its own this will totally mess up the interrupt assignments.  Give
me a bit of time to sort this out properly.


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                                           ` <20171122072857.GA19338@lst.de>
@ 2017-11-22 14:46                                             ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-11-22 14:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mst@redhat.com, Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
	stable, virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, Bart Van Assche

On 11/22/2017 12:28 AM, Christoph Hellwig wrote:
> Jens, please don't just revert the commit in your for-linus tree.
> 
> On its own this will totally mess up the interrupt assignments.  Give
> me a bit of time to sort this out properly.

I wasn't going to push it until I heard otherwise. I'll just pop it
off, for-linus isn't a stable branch.

-- 
Jens Axboe


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                     ` <1aeecf2e-a68e-4c18-5912-2473f457e6ea@de.ibm.com>
  2017-11-21 18:27                       ` Jens Axboe
@ 2017-11-23 14:02                       ` Christoph Hellwig
       [not found]                       ` <20171123140208.GA28914@lst.de>
  2 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:02 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, mst@redhat.com, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, Bart Van Assche, Christoph Hellwig

I can't reproduce it in my VM by adding a new CPU.  Do you have
an interesting blk-mq setup, like actually using multiple queues?
I'll give that a spin next.


* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                       ` <20171123140208.GA28914@lst.de>
@ 2017-11-23 14:08                         ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2017-11-23 14:08 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Jens Axboe, mst@redhat.com, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, Bart Van Assche, Christoph Hellwig

Ok, it helps to make sure we're actually doing I/O from the CPU,
I've reproduced it now.
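
For readers following along, the condition can be modelled in a few lines
of userspace C. This is only a sketch under stated assumptions, not the
kernel code path: model_hctx, model_hctx_init, model_would_warn and the
32-bit masks are hypothetical names invented for illustration. It assumes
the queue's CPU mask is built once from the CPUs present at device setup,
as described earlier in the thread, and it mirrors the shape of the
WARN_ON at block/blk-mq.c:1144.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_hctx {
	uint32_t cpumask;	/* CPUs mapped to this hardware queue */
	unsigned int next_cpu;	/* CPU the next queue run is targeted at */
};

/* Test one bit of a 32-bit CPU mask. */
static bool cpu_in_mask(uint32_t mask, unsigned int cpu)
{
	assert(cpu < 32);
	return (mask & (UINT32_C(1) << cpu)) != 0;
}

/* Assumption: the mask is built once, from the CPUs present at setup. */
static struct model_hctx model_hctx_init(uint32_t present_mask)
{
	struct model_hctx h = { .cpumask = present_mask, .next_cpu = 0 };
	return h;
}

/*
 * Same shape as the quoted WARN_ON: the CPU running the queue is not in
 * the queue's mask, while next_cpu is online.
 */
static bool model_would_warn(const struct model_hctx *h,
			     unsigned int running_cpu, uint32_t online_mask)
{
	return !cpu_in_mask(h->cpumask, running_cpu) &&
	       cpu_in_mask(online_mask, h->next_cpu);
}
```

With a mask built while only CPU 0 was present (0x1), a run from CPU 0
passes the check, whereas a run from a later-added CPU 3 trips it as long
as next_cpu is online. That would be consistent with needing I/O from the
new CPU to reproduce the warning.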


end of thread: 2017-11-23 14:08 UTC

Thread overview: 25+ messages
     [not found] <9c5eec5d-f542-4d76-6933-6fe31203ce09@de.ibm.com>
2017-11-20 19:20 ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk Bart Van Assche
     [not found] ` <1511205644.2396.32.camel@wdc.com>
2017-11-20 19:29   ` Christian Borntraeger
     [not found]   ` <04526c98-ffc5-1eca-3aa8-50f9212c4323@de.ibm.com>
2017-11-20 19:42     ` Jens Axboe
     [not found]     ` <5c9f2228-0a8b-8225-7038-e6cb3f31ca0b@kernel.dk>
2017-11-20 20:49       ` Christian Borntraeger
     [not found]       ` <2e44dbd3-2f90-c267-560c-91d1d4b0e892@de.ibm.com>
2017-11-20 20:52         ` Jens Axboe
2017-11-21  8:35           ` Christian Borntraeger
2017-11-21  9:50             ` Christian Borntraeger
     [not found]             ` <15f232d2-2aaa-df7c-57e8-2f710e051e84@de.ibm.com>
2017-11-21 10:14               ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable) Christian Borntraeger
     [not found]               ` <c8bd769e-9742-205d-11b0-469428a8579c@de.ibm.com>
2017-11-21 17:27                 ` Jens Axboe
     [not found]                 ` <b7b3cf4b-837e-f9ba-61c0-4f9ddd8b9a95@kernel.dk>
2017-11-21 18:09                   ` Jens Axboe
2017-11-21 18:12                     ` Christian Borntraeger
     [not found]                     ` <1aeecf2e-a68e-4c18-5912-2473f457e6ea@de.ibm.com>
2017-11-21 18:27                       ` Jens Axboe
2017-11-21 18:39                         ` Jens Axboe
2017-11-21 19:15                           ` Christian Borntraeger
     [not found]                           ` <ba994ec6-8db6-f77e-ac73-92e3f6b0135a@de.ibm.com>
2017-11-21 19:30                             ` Jens Axboe
     [not found]                             ` <ae02b9c5-9a2e-cb8b-7828-475b3c0b1cb9@kernel.dk>
2017-11-21 20:12                               ` Christian Borntraeger
     [not found]                               ` <c438db5f-f4f1-69f8-37f3-e91eae29fa25@de.ibm.com>
2017-11-21 20:14                                 ` Jens Axboe
2017-11-21 20:19                                   ` Christian Borntraeger
     [not found]                                   ` <276625a9-44fb-719d-9281-caacefdbb99f@de.ibm.com>
2017-11-21 20:21                                     ` Jens Axboe
     [not found]                                     ` <eceae481-cf9a-0429-7a15-c363a8e7bc2a@kernel.dk>
2017-11-21 20:31                                       ` Christian Borntraeger
     [not found]                                       ` <1ddd1cd4-2862-849e-7849-82544bcb86be@de.ibm.com>
2017-11-21 20:39                                         ` Jens Axboe
     [not found]                                         ` <08e6f35a-4f49-973e-99f7-6087b44337c4@kernel.dk>
2017-11-22  7:28                                           ` Christoph Hellwig
     [not found]                                           ` <20171122072857.GA19338@lst.de>
2017-11-22 14:46                                             ` Jens Axboe
2017-11-23 14:02                       ` Christoph Hellwig
     [not found]                       ` <20171123140208.GA28914@lst.de>
2017-11-23 14:08                         ` Christoph Hellwig
