Re: [PATCH V5 0/6] blk-mq: improvement CPU hotplug

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marc Zyngier <maz@kernel.org>
To: Ming Lei <tom.leiming@gmail.com>, John Garry <john.garry@huawei.com>
Cc: Ming Lei <ming.lei@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>,
	Bart Van Assche <bvanassche@acm.org>,
	Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	"chenxiang (M)" <chenxiang66@hisilicon.com>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCH V5 0/6] blk-mq: improvement CPU hotplug
Date: Sat, 1 Feb 2020 11:05:39 +0000	[thread overview]
Message-ID: <20200201110539.03db5434@why> (raw)
In-Reply-To: <CACVXFVNiBOBdtwuW=q4aSmUMAnn6Gfpg6BGhcQu44s58NZ08Ww@mail.gmail.com>

On Sat, 1 Feb 2020 09:31:17 +0800
Ming Lei <tom.leiming@gmail.com> wrote:

> On Sat, Feb 1, 2020 at 2:02 AM John Garry <john.garry@huawei.com> wrote:
> >
> > On 31/01/2020 17:51, John Garry wrote:  
> > > +- Keith  
> >
> > + Marc, on rare chance that this is related to the GIC
> >  
> > >
> > > On 31/01/2020 10:58, Ming Lei wrote:  
> > >> On Fri, Jan 31, 2020 at 6:24 PM John Garry <john.garry@huawei.com> wrote:  
> > >>>  
> > >>>>> [  141.976109] Call trace:
> > >>>>> [  141.978550]  __switch_to+0xbc/0x218
> > >>>>> [  141.982029]  blk_mq_run_work_fn+0x1c/0x28
> > >>>>> [  141.986027]  process_one_work+0x1e0/0x358
> > >>>>> [  141.990025]  worker_thread+0x40/0x488
> > >>>>> [  141.993678]  kthread+0x118/0x120
> > >>>>> [  141.996897]  ret_from_fork+0x10/0x18  
> > >>>>
> > >>>> Hi John,
> > >>>>
> > >>>> Thanks for your test!
> > >>>>  
> > >>>
> > >>> Hi Ming,
> > >>>  
> > >>>> Could you test the following patchset and only the last one is changed?
> > >>>>
> > >>>> https://github.com/ming1/linux/commits/my_for_5.6_block  
> > >>>
> > >>> For SCSI testing, I will ask my colleague Xiang Chen to test when he
> > >>> returns to work. So I did not see this issue for my SCSI testing for
> > >>> your original v5, but I was only using 1x as opposed to maybe 20x SAS
> > >>> disks.
> > >>>
> > >>> BTW, did you test NVMe? For some reason I could not trigger a scenario
> > >>> where we're draining the outstanding requests for a queue which is being
> > >>> deactivated - I mean, the queues were always already quiesced.  
> > >>
> > >> I run cpu hotplug test on both NVMe and SCSI in KVM, and fio just runs
> > >> as expected.
> > >>
> > >> NVMe is often 1:1 mapping, so it might be a bit difficult to trigger
> > >> draining in-flight IOs.  
> > >
> > > I generally have a 4:1 mapping for my NVMe cards - so many CPUs.
> > >
> > > But more alarming is that I see this on every run:
> > >
> > > [  246.378744] psci: CPU6 killed (polled 0 ms)
> > > [  246.427955] irq_migrate_all_off_this_cpu: 3 callbacks suppressed
> > > [  246.427956] IRQ 822: no longer affine to CPU7
> > > [  246.438586] CPU7: shutdown
> > > [  246.441300] psci: CPU7 killed (polled 0 ms)
> > > [  247.015206] pcieport 0000:00:08.0: can't change power state from
> > > D3cold to D0 (config space inaccessible)
> > > Jobs: 6[  267.503159] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  267.509244] rcu: 25-...0: (1 GPs behind)
> > > idle=11e/1/0x4000000000000000 softirq=1175/1177 fqs=2626
> > > [  267.518283] (detected by 0, t=5255 jiffies, g=14001, q=1787)
> > > [  267.524016] Task dump for CPU 25:
> > > [  267.527318] irqbalance      R  running task        0  1275      1
> > > 0x00000002
> > > [  267.534354] Call trace:
> > > [  267.536790]  __switch_to+0xbc/0x218
> > > [  267.540266]  0xffff800014c53d68
> > > Jobs: 6 (f=6): [R(6)][96.0%][r=0KiB/s[  278.195193] nvme nvme1:
> > > controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
> > > ,w=0KiB/s][r=0,w=0 IOPS][eta 00m:[  278.205197] nvme nvme0: controller
> > > is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
> > > Jobs: 6 (f=6): [R(6)][0.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IO[  330.523157]
> > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  330.529242] rcu: 25-...0: (1 GPs behind)
> > > idle=11e/1/0x4000000000000000 softirq=1175/1177 fqs=10503
> > > [  330.538366] (detected by 0, t=21010 jiffies, g=14001, q=2318)
> > > [  330.544186] Task dump for CPU 25:
> > > [  330.547488] irqbalance      R  running task        0  1275      1
> > > 0x00000002
> > > [  330.554523] Call trace:
> > > [  330.556958]  __switch_to+0xbc/0x218
> > > [  330.560434]  0xffff800014c53d68
> > > [  393.543155] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  393.549242] rcu: 25-...0: (1 GPs behind)
> > > idle=11e/1/0x4000000000000000 softirq=1175/1177 fqs=18379
> > > [  393.558366] (detected by 8, t=36765 jiffies, g=14001, q=2385)
> > > [  393.564187] Task dump for CPU 25:
> > > [  393.567488] irqbalance      R  running task        0  1275      1
> > > 0x00000002
> > > [  393.574523] Call trace:
> > > [  393.576960]  __switch_to+0xbc/0x218
> > > [  393.580435]  0xffff800014c53d68
> > > [  456.567156] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  456.573246] rcu: 25-...0: (1 GPs behind)
> > > idle=11e/1/0x4000000000000000 softirq=1175/1177 fqs=26255
> > > [  456.582389] (detected by 90, t=52520 jiffies, g=14001, q=2397)
> > > [  456.588296] Task dump for CPU 25:
> > > [  456.591601] irqbalance      R  running task        0  1275      1
> > > 0x00000002
> > > [  456.598638] Call trace:
> > > [  456.601078]  __switch_to+0xbc/0x218
> > > [  456.604556]  0xffff800014c53d68
> > > [  519.587152] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  519.593235] rcu: 25-...0: (1 GPs behind)
> > > idle=11e/1/0x4000000000000000 softirq=1175/1177 fqs=34131
> > > [  519.602357] (detected by 90, t=68275 jiffies, g=14001, q=2405)
> > > [  519.608264] Task dump for CPU 25:
> > > [  519.611566] irqbalance      R  running task        0  1275      1
> > > 0x00000002
> > > [  519.618601] Call trace:
> > > [  519.621036]  __switch_to+0xbc/0x218
> > > [  519.624511]  0xffff800014c53d68
> > > [  582.603153] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  582.609236] rcu: 25-...0: (1 GPs behind)
> > > idle=11e/1/0x4000000000000000 softirq=1175/1177 fqs=42006
> > > [  582.618358] (detected by 90, t=84029 jiffies, g=14001, q=2409)
> > > [  582.624265] Task dump for CPU 25:
> > > [  582.627567] irqbalance      R  running task        0  1275      1
> > > 0x00000002
> > > [  582.634601] Call trace:
> > > [  582.637036]  __switch_to+0xbc/0x218
> > > [  582.640511]  0xffff800014c53d68
> > >
> > >
> > > I tested v5.5 and I cannot see this, but your branch is based in 5.5-rc2
> > > with lots of other stuff on top. The only thing I changed was
> > > nvme.use_threaded_interrupts = 1.  
> >
> > Uhh, now I do with v5.5:
> >
> > [  705.354601] CPU22: shutdown
> > [  705.357395] psci: CPU22 killed (polled 0 ms)
> > [  705.402837] CPU23: shutdown
> > [  705.405630] psci: CPU23 killed (polled 0 ms)
> > [  705.666716] CPU24: shutdown
> > [  705.669512] psci: CPU24 killed (polled 0 ms)
> > [  705.966753] CPU25: shutdown
> > [  705.969548] psci: CPU25 killed (polled 0 ms)
> > [  706.250771] CPU26: shutdown=2347MiB/s,w=0KiB/s][r=601k,w=0 IOPS][eta
> > 00m:13s]
> > [  706.253565] psci: CPU26 killed (polled 0 ms)
> > [  706.514728] CPU27: shutdown
> > [  706.517523] psci: CPU27 killed (polled 0 ms)
> > [  706.826708] CPU28: shutdown
> > [  706.829502] psci: CPU28 killed (polled 0 ms)
> > [  707.130916] CPU29: shutdown=2134MiB/s,w=0KiB/s][r=546k,w=0 IOPS][eta
> > 00m:12s]
> > [  707.133714] psci: CPU29 killed (polled 0 ms)
> > [  707.439066] CPU30: shutdown
> > [  707.441870] psci: CPU30 killed (polled 0 ms)
> > [  707.706727] CPU31: shutdown
> > [  707.709526] psci: CPU31 killed (polled 0 ms)
> > [  708.521853] pcieport 0000:00:08.0: can't change power state from
> > D3cold to D0 (config space inaccessible)
> > [  728.741808] rcu: INFO: rcu_preempt detected stalls on
> > CPUs/tasks:80d:00h:35m:42s]
> > [  728.747895] rcu: 48-...0: (0 ticks this GP)
> > idle=b3e/1/0x4000000000000000 softirq=5548/5548 fqs=2626
> > [  728.757197] (detected by 63, t=5255 jiffies, g=40989, q=1890)
> > [  728.763018] Task dump for CPU 48:
> > [  728.766321] irqbalance      R  running task        0  1272      1
> > 0x00000002
> > [  728.773358] Call trace:
> > [  728.775801]  __switch_to+0xbc/0x218
> > [  728.779283]  gic_set_affinity+0x16c/0x1d8  
> ...
> 
> gic_set_affinity shouldn't have switched out, so looks like one gic
> issue.

Given that gic_set_affinity doesn't sleep, this looks pretty unlikely.
And __irq_set_affinity() holds a spinlock with irq disabled, so I can't
really explain how you'd get there. I've just booted a lockdep enabled
v5.5 on my D05, moved SPIs around (because that's the only way to reach
this code), and nothing caught fire.

Either the stack trace isn't reliable (when I read things like
"80d:00h:35m:42s" in the trace, I'm a bit suspicious), or CPU hotplug is
doing something really funky here.

	M.
-- 
Jazz is not dead. It just smells funny...

next prev parent reply	other threads:[~2020-02-01 11:05 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-15 11:44 [PATCH V5 0/6] blk-mq: improvement CPU hotplug Ming Lei
2020-01-15 11:44 ` [PATCH 1/6] blk-mq: add new state of BLK_MQ_S_INACTIVE Ming Lei
2020-01-15 11:44 ` [PATCH 2/6] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
2020-01-15 11:44 ` [PATCH 3/6] blk-mq: stop to handle IO and drain IO before hctx becomes inactive Ming Lei
2020-01-15 11:44 ` [PATCH 4/6] blk-mq: re-submit IO in case that hctx is inactive Ming Lei
2020-01-15 11:44 ` [PATCH 5/6] blk-mq: handle requests dispatched from IO scheduler in case of inactive hctx Ming Lei
2020-01-15 11:44 ` [PATCH 6/6] block: deactivate hctx when all its CPUs are offline when running queue Ming Lei
2020-01-15 17:00 ` [PATCH V5 0/6] blk-mq: improvement CPU hotplug John Garry
2020-01-20 13:23   ` John Garry
2020-01-31 10:04     ` Ming Lei
2020-01-31 10:24       ` John Garry
2020-01-31 10:58         ` Ming Lei
2020-01-31 17:51           ` John Garry
2020-01-31 18:02             ` John Garry
2020-02-01  1:31               ` Ming Lei
2020-02-01 11:05                 ` Marc Zyngier [this message]
2020-02-01 11:31                   ` Thomas Gleixner
2020-02-03 10:30                     ` John Garry
2020-02-03 10:49                       ` John Garry
2020-02-03 10:59                         ` Ming Lei
2020-02-03 12:56                           ` John Garry
2020-02-03 15:43                             ` Marc Zyngier
2020-02-03 18:16                               ` John Garry
2020-02-05 14:08                                 ` John Garry
2020-02-05 14:23                                   ` Marc Zyngier
2020-02-07 10:56           ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200201110539.03db5434@why \
    --to=maz@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=chenxiang66@hisilicon.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=john.garry@huawei.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tom.leiming@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.