From: John Garry <john.garry@huawei.com>
To: Ming Lei <tom.leiming@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Marc Zyngier <maz@kernel.org>, "Ming Lei" <ming.lei@redhat.com>,
Jens Axboe <axboe@kernel.dk>,
linux-block <linux-block@vger.kernel.org>,
Bart Van Assche <bvanassche@acm.org>,
"Hannes Reinecke" <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
"chenxiang (M)" <chenxiang66@hisilicon.com>,
Keith Busch <kbusch@kernel.org>,
"liudongdong (C)" <liudongdong3@huawei.com>,
wanghuiqiang <wanghuiqiang@huawei.com>,
"Wangzhou (B)" <wangzhou1@hisilicon.com>
Subject: Re: [PATCH V5 0/6] blk-mq: improvement CPU hotplug
Date: Mon, 3 Feb 2020 12:56:21 +0000 [thread overview]
Message-ID: <b0f35177-70f3-541d-996b-ebb364634225@huawei.com> (raw)
In-Reply-To: <CACVXFVOijCDjFa339Dyxnp9_0W5UjDyF-a42Dmo-6pogu+rp5Q@mail.gmail.com>
>>>
>>> [  519.858094] nvme nvme1: controller is down; will reset:
>>> CSTS=0xffffffff, PCI_STATUS=0xffff
>>>
>>> And some NVMe error also coincides with the hang. Another run has this:
>>>
>>> [ 247.015206] pcieport 0000:00:08.0: can't change power state from
>>> D3cold to D0 (config space inaccessible)
>>>
>>> I did test v5.4 previously and did not see this, but that would have
>>> included the v4 patchset in the $subject. I'll test v5.4 without that
>>> patchset now.
>>
>> So v5.4 does have this issue also:
>
> v5.5?
I am saying that both v5.4 and v5.5 have the issue. Below is the kernel
hang snippet for v5.4.
>
>>
>> [ 705.669512] psci: CPU24 killed (polled 0 ms)
>> [ 705.966753] CPU25: shutdown
>> [ 705.969548] psci: CPU25 killed (polled 0 ms)
>> [  706.250771] CPU26: shutdown
>> [ 706.253565] psci: CPU26 killed (polled 0 ms)
>> [ 706.514728] CPU27: shutdown
>> [ 706.517523] psci: CPU27 killed (polled 0 ms)
>> [ 706.826708] CPU28: shutdown
>> [ 706.829502] psci: CPU28 killed (polled 0 ms)
>> [  707.130916] CPU29: shutdown
>> [ 707.133714] psci: CPU29 killed (polled 0 ms)
>> [ 707.439066] CPU30: shutdown
>> [ 707.441870] psci: CPU30 killed (polled 0 ms)
>> [ 707.706727] CPU31: shutdown
>> [ 707.709526] psci: CPU31 killed (polled 0 ms)
>> [ 708.521853] pcieport 0000:00:08.0: can't change power state from
>> D3cold to D0 (config space inaccessible)
>> [  728.741808] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [ 728.747895] rcu: 48-...0: (0 ticks this GP)
>> idle=b3e/1/0x4000000000000000 softirq=5548/5548 fqs=2626
>> [ 728.757197] (detected by 63, t=5255 jiffies, g=40989, q=1890)
>> [ 728.763018] Task dump for CPU 48:
>> [ 728.766321] irqbalance R running task 0 1272 1
>> 0x00000002
>> [ 728.773358] Call trace:
>> [ 728.775801] __switch_to+0xbc/0x218
>> [ 728.779283] gic_set_affinity+0x16c/0x1d8
>> [ 728.783282] irq_do_set_affinity+0x30/0xd0
>> [ 728.787365] irq_set_affinity_locked+0xc8/0xf0
>> [ 728.791796] __irq_set_affinity+0x4c/0x80
>> [ 728.795794] write_irq_affinity.isra.7+0x104/0x120
>> [ 728.800572] irq_affinity_proc_write+0x1c/0x28
>> [ 728.805008] proc_reg_write+0x78/0xb8
>> [ 728.808660] __vfs_write+0x18/0x38
>> [ 728.812050] vfs_write+0xb4/0x1e0
>> [ 728.815352] ksys_write+0x68/0xf8
>> [ 728.818655] __arm64_sys_write+0x18/0x20
>> [ 728.822567] el0_svc_common.constprop.2+0x64/0x160
>> [ 728.827345] el0_svc_handler+0x20/0x80
>> [ 728.831082] el0_sync_handler+0xe4/0x188
>> [ 728.834991] el0_sync+0x140/0x180
>> [ 738.993844] nvme nvme1: controller is down; will reset:
>> CSTS=0xffffffff, PCI_STATUS=0xffff
>> [  791.761810] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [ 791.767897] rcu: 48-...0: (0 ticks this GP)
>> idle=b3e/1/0x4000000000000000 softirq=5548/5548 fqs=10495
>> [ 791.777281] (detected by 54, t=21010 jiffies, g=40989, q=2396)
>> [ 791.783189] Task dump for CPU 48:
>> [ 791.786491] irqbalance R running task 0 1272 1
>> 0x00000002
>> [ 791.793528] Call trace:
>> [ 791.795964] __switch_to+0xbc/0x218
>> [ 791.799441] gic_set_affinity+0x16c/0x1d8
>> [ 791.803439] irq_do_set_affinity+0x30/0xd0
>> [ 791.807522] irq_set_affinity_locked+0xc8/0xf0
>> [ 791.811953] __irq_set_affinity+0x4c/0x80
>> [ 791.815949] write_irq_affinity.isra.7+0x104/0x120
>> [ 791.820727] irq_affinity_proc_write+0x1c/0x28
>> [ 791.825158] proc_reg_write+0x78/0xb8
>> [ 791.828808] __vfs_write+0x18/0x38
>> [ 791.832197] vfs_write+0xb4/0x1e0
>> [ 791.835500] ksys_write+0x68/0xf8
>> [ 791.838802] __arm64_sys_write+0x18/0x20
>> [ 791.842712] el0_svc_common.constprop.2+0x64/0x160
>> [ 791.847490] el0_svc_handler+0x20/0x80
>> [ 791.851226] el0_sync_handler+0xe4/0x188
>> [ 791.855135] el0_sync+0x140/0x180
>> Jobs: 6 (f=6): [R(6)][0.0%][r=0KiB/s
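The stall trace above shows irqbalance blocked inside gic_set_affinity()
while writing an affinity mask through /proc/irq/<N>/smp_affinity. For
reference, a minimal sketch of the kind of write it performs (the helper
name, IRQ number, and CPU numbers are illustrative, not from this system):

```shell
# Build the hex affinity mask for a single CPU, in the format written to
# /proc/irq/<N>/smp_affinity. The naive shift is valid for CPUs 0-62.
cpu_to_mask() {
    printf '%x' $((1 << $1))
}

# irqbalance-style write (IRQ 100 is a placeholder; do not run blindly --
# this is the write path that stalls in the trace above):
# echo "$(cpu_to_mask 48)" > /proc/irq/100/smp_affinity
cpu_to_mask 48   # prints 1000000000000
```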
>
> Can you trigger it after disabling irqbalance?
No. I tested by killing the irqbalance process, and it then ran for 25
minutes without issue.
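For concreteness, the shape of that check was something like the
following dry-run sketch: stop irqbalance, then rerun the CPU hotplug
loop under fio. The helper name and CPU range are mine, not from the
actual test scripts, and the offline commands are printed rather than
executed, since offlining CPUs is destructive:

```shell
# Print (rather than execute) the sysfs writes that offline a CPU range;
# pipe to sh on a test machine to actually run them.
hotplug_cmds() {
    for cpu in $(seq "$1" "$2"); do
        echo "echo 0 > /sys/devices/system/cpu/cpu$cpu/online"
    done
}

# With irqbalance stopped first (e.g. pkill irqbalance), this hotplug
# pattern no longer reproduced the stall:
hotplug_cmds 24 31
```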
Thanks,
John
Thread overview: 26+ messages
2020-01-15 11:44 [PATCH V5 0/6] blk-mq: improvement CPU hotplug Ming Lei
2020-01-15 11:44 ` [PATCH 1/6] blk-mq: add new state of BLK_MQ_S_INACTIVE Ming Lei
2020-01-15 11:44 ` [PATCH 2/6] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
2020-01-15 11:44 ` [PATCH 3/6] blk-mq: stop to handle IO and drain IO before hctx becomes inactive Ming Lei
2020-01-15 11:44 ` [PATCH 4/6] blk-mq: re-submit IO in case that hctx is inactive Ming Lei
2020-01-15 11:44 ` [PATCH 5/6] blk-mq: handle requests dispatched from IO scheduler in case of inactive hctx Ming Lei
2020-01-15 11:44 ` [PATCH 6/6] block: deactivate hctx when all its CPUs are offline when running queue Ming Lei
2020-01-15 17:00 ` [PATCH V5 0/6] blk-mq: improvement CPU hotplug John Garry
2020-01-20 13:23 ` John Garry
2020-01-31 10:04 ` Ming Lei
2020-01-31 10:24 ` John Garry
2020-01-31 10:58 ` Ming Lei
2020-01-31 17:51 ` John Garry
2020-01-31 18:02 ` John Garry
2020-02-01 1:31 ` Ming Lei
2020-02-01 11:05 ` Marc Zyngier
2020-02-01 11:31 ` Thomas Gleixner
2020-02-03 10:30 ` John Garry
2020-02-03 10:49 ` John Garry
2020-02-03 10:59 ` Ming Lei
2020-02-03 12:56 ` John Garry [this message]
2020-02-03 15:43 ` Marc Zyngier
2020-02-03 18:16 ` John Garry
2020-02-05 14:08 ` John Garry
2020-02-05 14:23 ` Marc Zyngier
2020-02-07 10:56 ` John Garry