linux-nvme.lists.infradead.org archive mirror
* [LSF/MM TOPIC] Two blk-mq related topics
@ 2018-01-29 15:46 Ming Lei
  2018-01-29 20:40 ` Mike Snitzer
  2018-01-29 20:56 ` James Bottomley
  0 siblings, 2 replies; 13+ messages in thread
From: Ming Lei @ 2018-01-29 15:46 UTC


Hi guys,

Two blk-mq related topics

1. blk-mq vs. CPU hotplug & IRQ vectors spread on CPUs

We have made three big changes in this area so far; each time some issues
were fixed, but new ones were introduced:

1) freeze all queues in the CPU hotplug handler
- issues: queue dependencies, such as loop-mq/dm vs. underlying queues and
the NVMe admin queue vs. namespace queues, mean that freezing all these
queues in the CPU hotplug handler may cause IO hangs.

2) IRQ vectors spread on all present CPUs
- fixes the issues in 1)
- new issues introduced: physical CPU hotplug isn't supported, and a blk-mq
warning is triggered during dispatch

3) IRQ vectors spread on all possible CPUs
- supports physical CPU hotplug
- the warning in __blk_mq_run_hw_queue() may still be triggered if a CPU
  goes offline/online between blk_mq_hctx_next_cpu() and the run of
  __blk_mq_run_hw_queue()
- new issues introduced: the queue mapping may be distorted completely; a
patch has been sent out (https://marc.info/?t=151603230900002&r=1&w=2), but
the approach may need further discussion. Drivers (such as NVMe) may need
to pass 'num_possible_cpus()' as the max number of vectors when allocating
IRQ vectors, and some drivers (NVMe) use hard-coded hw queue indexes
directly, which becomes very fragile since a hw queue may be inactive from
the beginning (see the sketch after this list).
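
To make the "inactive from the beginning" case concrete, here is a small
standalone C sketch (not the kernel's actual mapping code; the CPU counts
and the contiguous grouping are made up for illustration) showing what can
happen when all possible CPUs are spread over hw queues while only part of
them are online:

#include <stdio.h>
#include <stdbool.h>

#define NR_POSSIBLE_CPUS 8	/* made-up: includes not-yet-present CPUs */
#define NR_ONLINE_CPUS   4	/* made-up: CPUs actually online at boot */
#define NR_HW_QUEUES     4

int main(void)
{
	int mq_map[NR_POSSIBLE_CPUS];
	bool has_online_cpu[NR_HW_QUEUES] = { false };
	int cpu, q;

	/* Group possible CPUs onto hw queues in contiguous blocks. */
	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
		mq_map[cpu] = cpu / (NR_POSSIBLE_CPUS / NR_HW_QUEUES);

	/* A hw queue is only usable if at least one mapped CPU is online. */
	for (cpu = 0; cpu < NR_ONLINE_CPUS; cpu++)
		has_online_cpu[mq_map[cpu]] = true;

	for (q = 0; q < NR_HW_QUEUES; q++)
		printf("hctx %d: %s\n", q, has_online_cpu[q] ?
		       "active" : "inactive from the beginning");

	return 0;
}

With these numbers, hctx 2 and hctx 3 have no online CPU at all, so a
driver that hard-codes one of those hw queue indexes submits to a queue
nothing can run on.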

Also, starting from 2), another issue is that an IO completion may not be
delivered to any CPU: for example, an IO may be dispatched to a hw queue
just before (or after) all CPUs mapped to that hctx go offline, after which
the IRQ vector of the hw queue is shut down. For now we seem to depend on
the timeout handler to deal with this situation; is there a better way to
solve it? (One possible direction is sketched below.)
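
One direction for discussion (purely illustrative; the helpers below are
hypothetical, not existing blk-mq functions) is to quiesce and drain a hw
queue before its last mapped CPU disappears, so nothing is left in flight
when the IRQ vector is shut down, instead of recovering afterwards via the
timeout handler. A standalone C sketch of that ordering:

#include <stdio.h>
#include <stdbool.h>

static bool hctx_has_other_online_cpu(int hctx, int dying_cpu)
{
	/* Stub: pretend the dying CPU is the last online one for this hctx. */
	(void)hctx;
	(void)dying_cpu;
	return false;
}

static void hctx_quiesce_and_drain(int hctx)
{
	printf("hctx %d: stop dispatch, wait for in-flight IO\n", hctx);
}

static void hctx_shutdown_irq_vector(int hctx)
{
	printf("hctx %d: IRQ vector shut down, nothing left to complete\n", hctx);
}

static void cpu_going_offline(int dying_cpu, int hctx)
{
	if (hctx_has_other_online_cpu(hctx, dying_cpu))
		return;		/* another mapped CPU can still take the IRQ */

	hctx_quiesce_and_drain(hctx);
	hctx_shutdown_irq_vector(hctx);
}

int main(void)
{
	cpu_going_offline(3, 1);	/* made-up CPU and hctx numbers */
	return 0;
}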

2. When to enable SCSI_MQ at default again?

SCSI_MQ was first merged in v3.17, but disabled by default. In v4.13-rc1 it
was enabled by default, but that patch was reverted in v4.13-rc7, so it is
disabled by default again.

Now both the originally reported PM issue (actually SCSI quiesce) and the
sequential IO performance issue have been addressed, and the MQ IO
schedulers are ready for traditional disks too. Are there other issues that
need to be addressed before enabling SCSI_MQ by default? When can we do
that again?
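
For reference, the existing opt-in knobs (assuming the options as they
stand in this time frame: the CONFIG_SCSI_MQ_DEFAULT Kconfig option and the
scsi_mod module parameter) look like:

# build-time default
CONFIG_SCSI_MQ_DEFAULT=y

# or per boot, on the kernel command line
scsi_mod.use_blk_mq=1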

Last time, the two issues were reported during the v4.13 development cycle
only after it was enabled by default, which suggests that as long as
SCSI_MQ isn't enabled by default, it won't be exercised and tested
completely.

So if we keep it disabled by default, it may never be exposed to full
testing and production environments.


Thanks,
Ming



Thread overview: 13+ messages
2018-01-29 15:46 [LSF/MM TOPIC] Two blk-mq related topics Ming Lei
2018-01-29 20:40 ` Mike Snitzer
2018-01-30  1:27   ` Ming Lei
2018-01-29 20:56 ` James Bottomley
2018-01-29 21:00   ` Jens Axboe
2018-01-29 23:46     ` James Bottomley
2018-01-30  1:47       ` Jens Axboe
2018-01-30 10:08     ` Johannes Thumshirn
2018-01-30 10:50       ` Mel Gorman
2018-01-30  1:24   ` Ming Lei
2018-01-30  8:33     ` Martin Steigerwald
2018-01-30 10:33     ` John Garry
2018-02-07 10:55       ` John Garry
