linux-s390.vger.kernel.org archive mirror
* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
       [not found]                   ` <20171204162108.GA12482@lst.de>
@ 2017-12-06 12:25                     ` Christian Borntraeger
  2017-12-06 23:29                       ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2017-12-06 12:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, Stefan Haberland,
	linux-s390, Martin Schwidefsky

On 12/04/2017 05:21 PM, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
>> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
>> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
>> Seems that this is the place where the system stops. (see the sysrq-t output
>> at the bottom).
> 
> Can you check which of the patches in the tree is the culprit?


From this branch

    git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix

commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
    blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and 
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
   genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
s390 irq handling code).


Some history:
I got this warning
"WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)"
since 4.13 (and also in 4.12 stable)
on CPU hotplug of previously unavailable CPUs (real hotplug, no offline/online)

This was introduced with 

 blk-mq: Create hctx for each present CPU
    commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 

Christoph is currently working on a fix. The fixed kernel boots with virtio-blk and
no longer triggers the warning, but it hangs (outstanding I/O) with DASD disks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-06 12:25                     ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable) Christian Borntraeger
@ 2017-12-06 23:29                       ` Christoph Hellwig
  2017-12-07  9:20                         ` Christian Borntraeger
  2017-12-18 13:56                         ` Stefan Haberland
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2017-12-06 23:29 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, Stefan Haberland, linux-s390, Martin Schwidefsky

On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>     blk-mq: create a blk_mq_ctx for each possible CPU
> does not boot on DASD and 
> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>    genirq/affinity: assign vectors to all possible CPUs
> does boot with DASD disks.
> 
> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
> s390 irq handling code).

That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
you either.

I wonder if there is some weird thing about nr_cpu_ids in s390?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-06 23:29                       ` Christoph Hellwig
@ 2017-12-07  9:20                         ` Christian Borntraeger
  2017-12-14 17:32                           ` Christian Borntraeger
  2017-12-18 13:56                         ` Stefan Haberland
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2017-12-07  9:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, Stefan Haberland,
	linux-s390, Martin Schwidefsky



On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>     blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and 
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>    genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
> 
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
> 
> I guess we'd really want:
> 
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
> 
> before this commit, but it seems like the whole stack didn't work for
> you either.
> 
> I wonder if there is some weird thing about nr_cpu_ids in s390?

The problem starts as soon as NR_CPUS is larger than the number
of real CPUs.

A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is more than one non-online CPU?

E.g., don't we need something like this (whitespace and indentation damaged)?

@@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
        if (--hctx->next_cpu_batch <= 0) {
                int next_cpu;
 
+               do  {
                next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
-               if (!cpu_online(next_cpu))
-                       next_cpu = cpumask_next(next_cpu, hctx->cpumask);
                if (next_cpu >= nr_cpu_ids)
                        next_cpu = cpumask_first(hctx->cpumask);
+               } while (!cpu_online(next_cpu));
 
                hctx->next_cpu = next_cpu;
                hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;

It does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS).
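
To illustrate the concern outside the kernel, here is a minimal userspace sketch in plain C. It is not kernel code: the cpumasks are modelled as bool arrays, and MAX_CPUS, mask_next(), mask_first(), next_cpu_single_skip() and next_cpu_loop() are hypothetical names used only for this example. Unlike the do/while in the diff above, the loop variant here is bounded, an assumption added so that it terminates when no CPU in the mask is online.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8	/* stand-in for nr_cpu_ids in this model */

/* Next set index strictly after prev, or MAX_CPUS if there is none
 * (modelled on how cpumask_next() signals "no more bits"). */
static int mask_next(int prev, const bool mask[MAX_CPUS])
{
	for (int cpu = prev + 1; cpu < MAX_CPUS; cpu++)
		if (mask[cpu])
			return cpu;
	return MAX_CPUS;
}

/* First set index, or MAX_CPUS if the mask is empty. */
static int mask_first(const bool mask[MAX_CPUS])
{
	return mask_next(-1, mask);
}

/* Variant that skips at most one offline CPU. */
static int next_cpu_single_skip(int cur, const bool mask[MAX_CPUS],
				const bool online[MAX_CPUS])
{
	int next = mask_next(cur, mask);

	/* The range check keeps the model from indexing past the array. */
	if (next < MAX_CPUS && !online[next])
		next = mask_next(next, mask);
	if (next >= MAX_CPUS)
		next = mask_first(mask);
	return next;
}

/* Variant that keeps walking until it finds an online CPU.
 * Returns -1 if no CPU in the mask is online. */
static int next_cpu_loop(int cur, const bool mask[MAX_CPUS],
			 const bool online[MAX_CPUS])
{
	int next = cur;

	assert(cur >= -1 && cur < MAX_CPUS);
	for (int tries = 0; tries < MAX_CPUS; tries++) {
		next = mask_next(next, mask);
		if (next >= MAX_CPUS)
			next = mask_first(mask);	/* wrap around */
		if (next >= MAX_CPUS)
			return -1;			/* empty mask */
		if (online[next])
			return next;
	}
	return -1;
}
```

With CPUs 0-3 in the mask and only CPUs 0 and 3 online, starting from CPU 0 the single-skip variant lands on CPU 2, which is offline, while the loop variant reaches CPU 3.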

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-07  9:20                         ` Christian Borntraeger
@ 2017-12-14 17:32                           ` Christian Borntraeger
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2017-12-14 17:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Bart Van Assche, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, Stefan Haberland,
	linux-s390, Martin Schwidefsky

Independent of the issues with the DASD disks, this also does not seem to enable
additional hardware queues.

With CPUs 0 and 1 (and 248 CPUs max),
I get CPUs 0 and 2-247 attached to hardware context 0 and
CPU 1 attached to hardware context 1.

If I now add a CPU, this does not change anything. Hardware contexts 2, 3, 4,
etc. all have no CPU, and hardware context 0 keeps sitting on all CPUs (except 1).
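
A simple model can reproduce the layout described above. The sketch below is an assumption for illustration only, not the mapping code of the tested branch: it supposes that CPUs 0 and 1 are the only online CPUs, that online CPUs are spread round-robin over the queues, and that every non-online CPU falls back to queue 0. NR_POSSIBLE, map_queues() and cpus_on_queue() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_POSSIBLE 248	/* "248 CPUs max" from the report above */

/* Hypothetical mapping policy: online CPUs are distributed round-robin
 * over the queues, every non-online CPU is parked on queue 0. */
static void map_queues(int map[NR_POSSIBLE], const bool online[NR_POSSIBLE],
		       int nr_queues)
{
	int next_queue = 0;

	assert(nr_queues > 0);
	for (int cpu = 0; cpu < NR_POSSIBLE; cpu++) {
		if (!online[cpu]) {
			map[cpu] = 0;
			continue;
		}
		map[cpu] = next_queue;
		next_queue = (next_queue + 1) % nr_queues;
	}
}

/* Count how many CPUs are mapped to a given queue. */
static int cpus_on_queue(const int map[NR_POSSIBLE], int queue)
{
	int count = 0;

	for (int cpu = 0; cpu < NR_POSSIBLE; cpu++)
		if (map[cpu] == queue)
			count++;
	return count;
}
```

In this model, a CPU that comes online later stays on queue 0 until map_queues() is run again, which matches the symptom that hardware contexts 2, 3, 4, etc. remain without a CPU. Whether a missing remap is the actual cause here is not established by this thread.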





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-06 23:29                       ` Christoph Hellwig
  2017-12-07  9:20                         ` Christian Borntraeger
@ 2017-12-18 13:56                         ` Stefan Haberland
  2017-12-20 15:47                           ` Christian Borntraeger
  1 sibling, 1 reply; 12+ messages in thread
From: Stefan Haberland @ 2017-12-18 13:56 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Borntraeger
  Cc: Jens Axboe, Bart Van Assche, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, linux-s390,
	Martin Schwidefsky

On 07.12.2017 00:29, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068    -> bad
>>      blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc    -> good
>>     genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> you either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?

I tried this on my system, and the blk-mq-hotplug-fix branch does not
boot for me either.
The disks get up and running and I/O works fine. At least the partition
detection and EXT4-fs mount work.

But at some point the disks do not get any more requests.

I currently have no clue why.
I took a dump and had a look at the disk states, and they are fine. No
errors in the logs or in our debug entries. Just empty DASD devices
waiting to be called for I/O requests.

Do you have anything I could have a look at?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-18 13:56                         ` Stefan Haberland
@ 2017-12-20 15:47                           ` Christian Borntraeger
  2018-01-11  9:13                             ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2017-12-20 15:47 UTC (permalink / raw)
  To: Stefan Haberland, Christoph Hellwig, Jens Axboe
  Cc: Bart Van Assche, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, linux-s390,
	Martin Schwidefsky

On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> I tried this on my system, and the blk-mq-hotplug-fix branch does not boot for me either.
> The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount work.
> 
> But at some point the disks do not get any more requests.
> 
> I currently have no clue why.
> I took a dump and had a look at the disk states, and they are fine. No errors in the logs or in our debug entries. Just empty DASD devices waiting to be called for I/O requests.
> 
> Do you have anything I could have a look at?

Jens, Christoph, so what do we do about this?
To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
- Jens' quick revert fixed the issue and did not break DASD support, but has some issues
with interrupt affinity.
- Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
without hotplug).

Christian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2017-12-20 15:47                           ` Christian Borntraeger
@ 2018-01-11  9:13                             ` Ming Lei
  2018-01-11  9:26                               ` Stefan Haberland
                                                 ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ming Lei @ 2018-01-11  9:13 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Stefan Haberland, Christoph Hellwig, Jens Axboe, Bart Van Assche,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, linux-s390, Martin Schwidefsky

On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
> Jens, Christoph, so what do we do about this?
> To summarize:
> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
> - Jens' quick revert fixed the issue and did not break DASD support, but has some issues
> with interrupt affinity.
> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
> without hotplug).

Hello,

This is a valid use case for VMs; I think we need to fix it.

It looks like there is an issue in the fourth patch ("blk-mq: only select online
CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the other
three patches are the same as Christoph's:

	https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix

gitweb:
	https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix

Could you test it and provide feedback?

BTW, if it doesn't help with this issue, could you boot from a normal disk first
and dump the blk-mq debugfs of the DASD later?

Thanks, 
Ming

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11  9:13                             ` Ming Lei
@ 2018-01-11  9:26                               ` Stefan Haberland
  2018-01-11 11:44                               ` Christian Borntraeger
  2018-01-11 17:46                               ` Christoph Hellwig
  2 siblings, 0 replies; 12+ messages in thread
From: Stefan Haberland @ 2018-01-11  9:26 UTC (permalink / raw)
  To: Ming Lei, Christian Borntraeger
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, linux-s390, Martin Schwidefsky

On 11.01.2018 10:13, Ming Lei wrote:
> Hello,
>
> This is a valid use case for VMs; I think we need to fix it.
>
> It looks like there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the other
> three patches are the same as Christoph's:
>
> 	https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
> 	https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide feedback?
>
> BTW, if it doesn't help with this issue, could you boot from a normal disk first
> and dump the blk-mq debugfs of the DASD later?
>
> Thanks,
> Ming
>

Hi,

Thanks for the patch. I had suspected pretty much the same place.
I will test it ASAP.

Regards,
Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11  9:13                             ` Ming Lei
  2018-01-11  9:26                               ` Stefan Haberland
@ 2018-01-11 11:44                               ` Christian Borntraeger
  2018-01-11 13:17                                 ` Stefan Haberland
  2018-01-11 17:46                               ` Christoph Hellwig
  2 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2018-01-11 11:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Stefan Haberland, Christoph Hellwig, Jens Axboe, Bart Van Assche,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, linux-s390, Martin Schwidefsky



On 01/11/2018 10:13 AM, Ming Lei wrote:
> Hello,
> 
> This is a valid use case for VMs; I think we need to fix it.
> 
> It looks like there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the other
> three patches are the same as Christoph's:
> 
> 	https://github.com/ming1/linux.git  v4.15-rc-block-for-next-cpuhot-fix
> 
> gitweb:
> 	https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
> 
> Could you test it and provide feedback?
> 
> BTW, if it doesn't help with this issue, could you boot from a normal disk first
> and dump the blk-mq debugfs of the DASD later?

That kernel seems to boot fine on my system with DASD disks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11 11:44                               ` Christian Borntraeger
@ 2018-01-11 13:17                                 ` Stefan Haberland
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Haberland @ 2018-01-11 13:17 UTC (permalink / raw)
  To: Christian Borntraeger, Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, linux-s390, Martin Schwidefsky

On 11.01.2018 12:44, Christian Borntraeger wrote:
> That kernel seems to boot fine on my system with DASD disks.

I did some regression testing and it works quite well. Boot works, and
attaching CPUs during runtime on z/VM and enabling them in Linux works
as well.
I also ran some DASD online/offline and CPU enable/disable loops.

Regards,
Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11  9:13                             ` Ming Lei
  2018-01-11  9:26                               ` Stefan Haberland
  2018-01-11 11:44                               ` Christian Borntraeger
@ 2018-01-11 17:46                               ` Christoph Hellwig
  2018-01-12  1:16                                 ` Ming Lei
  2 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2018-01-11 17:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christian Borntraeger, Stefan Haberland, Christoph Hellwig,
	Jens Axboe, Bart Van Assche, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, linux-s390,
	Martin Schwidefsky

Thanks for looking into this, Ming. I had missed it in my current
work overload.  Can you send the updated series to Jens?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
  2018-01-11 17:46                               ` Christoph Hellwig
@ 2018-01-12  1:16                                 ` Ming Lei
  0 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2018-01-12  1:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Borntraeger, Stefan Haberland, Jens Axboe,
	Bart Van Assche, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Thomas Gleixner, linux-s390,
	Martin Schwidefsky

On Thu, Jan 11, 2018 at 06:46:54PM +0100, Christoph Hellwig wrote:
> Thanks for looking into this, Ming. I had missed it in my current
> work overload.  Can you send the updated series to Jens?

OK, I will post it soon.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 12+ messages in thread

Thread overview: 12+ messages
     [not found] <a1d51b13-c034-e0c8-9428-e360b2c898c3@kernel.dk>
     [not found] ` <20171123143453.GA29715@lst.de>
     [not found]   ` <cdca47a9-05cc-2676-57fe-904ce8ad5fbc@de.ibm.com>
     [not found]     ` <20171123182542.GA2680@lst.de>
     [not found]       ` <899f1638-cca4-28e6-3225-51505a053d45@de.ibm.com>
     [not found]         ` <20171123183232.GA2845@lst.de>
     [not found]           ` <92ef1aae-90b5-f14f-390e-bfab97899431@de.ibm.com>
     [not found]             ` <419d8565-9cbe-16ac-3d5d-5945098694bc@de.ibm.com>
     [not found]               ` <20171127155409.GA6937@lst.de>
     [not found]                 ` <d0f39408-b697-8d1a-2cce-b833cb8fa118@de.ibm.com>
     [not found]                   ` <20171204162108.GA12482@lst.de>
2017-12-06 12:25                     ` 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable) Christian Borntraeger
2017-12-06 23:29                       ` Christoph Hellwig
2017-12-07  9:20                         ` Christian Borntraeger
2017-12-14 17:32                           ` Christian Borntraeger
2017-12-18 13:56                         ` Stefan Haberland
2017-12-20 15:47                           ` Christian Borntraeger
2018-01-11  9:13                             ` Ming Lei
2018-01-11  9:26                               ` Stefan Haberland
2018-01-11 11:44                               ` Christian Borntraeger
2018-01-11 13:17                                 ` Stefan Haberland
2018-01-11 17:46                               ` Christoph Hellwig
2018-01-12  1:16                                 ` Ming Lei
