* [LSF/MM TOPIC] Two blk-mq related topics
@ 2018-01-29 15:46 ` Ming Lei
  0 siblings, 0 replies; 31+ messages in thread
From: Ming Lei @ 2018-01-29 15:46 UTC (permalink / raw)
  To: lsf-pc, Linux-scsi, linux-block, linux-nvme

Hi guys,

Two blk-mq related topics

1. blk-mq vs. CPU hotplug & IRQ vectors spread on CPUs

We have made three big changes in this area before; each time some issues
were fixed, but new ones were introduced:

1) freeze all queues in the CPU hotplug handler
- issues: queue dependencies such as loop-mq/dm vs. underlying queues and
  the NVMe admin queue vs. namespace queues; IO hangs may be caused while
  freezing all these queues in the CPU hotplug handler.

2) IRQ vectors spread on all present CPUs
- fixes the issues in 1)
- new issues introduced: physical CPU hotplug isn't supported, and a blk-mq
  warning is triggered during dispatch

3) IRQ vectors spread on all possible CPUs
- can support physical CPU hotplug
- the warning in __blk_mq_run_hw_queue() may still be triggered if CPU
  offline/online happens between blk_mq_hctx_next_cpu() and running
  __blk_mq_run_hw_queue()
- new issues introduced: queue mapping may be distorted completely; a patch
  has been sent out (https://marc.info/?t=151603230900002&r=1&w=2), but the
  approach may need further discussion. Drivers (such as NVMe) may need to
  pass 'num_possible_cpus()' as the max vectors when allocating IRQ vectors.
  Some drivers (NVMe) use a hard-coded hw queue index directly, which becomes
  very fragile, since the hw queue may be inactive from the beginning.

Also, starting from 2), another issue is that IO completions may not be
delivered to any CPU. For example, IO may be dispatched to a hw queue just
before (or after) all CPUs mapped to the hctx go offline, and then the IRQ
vector of the hw queue can be shut down. For now it seems we depend on the
timeout handler to deal with this situation; is there a better way to solve
this issue?

2. When to enable SCSI_MQ by default again?

SCSI_MQ was first enabled in V3.17, but disabled by default. In V4.13-rc1 it
was enabled by default, but the patch was later reverted in V4.13-rc7, so it
became disabled by default again.

Now both the originally reported PM issue (actually SCSI quiesce) and the
sequential IO performance issue have been addressed, and the MQ IO schedulers
are ready for traditional disks too. Are there other issues to be addressed
before enabling SCSI_MQ by default? When can we do that again?

Last time, the two issues were reported during the V4.13 dev cycle just when
it was enabled by default. It seems that if SCSI_MQ isn't enabled by default,
it won't be run and tested completely.

So if we continue to disable it by default, it may never be exposed to full
test/production environments.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 31+ messages in thread
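
Editorial illustration, not part of the original mail: the difference between
spreading over "present" and "possible" CPUs in topic 1 can be sketched with a
toy model. Everything below is an assumption made for illustration: the helper
names spread_cpus() and inactive_queues() are hypothetical, the 8-possible /
4-online CPU layout is invented, and the contiguous-group spread is much
simpler than the affinity spreading the kernel really does. It only shows why
a hw queue can end up with no online CPU when vectors are spread over all
possible CPUs.

```python
from typing import Dict, List, Set


def spread_cpus(cpus: List[int], nr_queues: int) -> Dict[int, List[int]]:
    """Toy spread: deal CPUs out to hw queues in contiguous, near-equal groups.

    This is NOT the kernel algorithm, only a stand-in for "spread vectors
    over this set of CPUs".
    """
    queues: Dict[int, List[int]] = {q: [] for q in range(nr_queues)}
    ordered = sorted(cpus)
    for i, cpu in enumerate(ordered):
        queues[i * nr_queues // len(ordered)].append(cpu)
    return queues


def inactive_queues(mapping: Dict[int, List[int]], online: Set[int]) -> List[int]:
    """Return the hw queues that have no online CPU mapped to them."""
    return [q for q, cpus in mapping.items()
            if not any(cpu in online for cpu in cpus)]


# Hypothetical machine: 8 possible CPUs, only CPUs 0-3 present and online.
possible = list(range(8))
online = {0, 1, 2, 3}

# Approach 3): spread over all possible CPUs.
by_possible = spread_cpus(possible, nr_queues=4)
print(by_possible)
print(inactive_queues(by_possible, online))   # -> [2, 3]

# Approach 2): spread over present CPUs only; no queue is inactive, but a
# CPU hot-added later (say CPU 4) has no queue mapping at all.
by_present = spread_cpus(sorted(online), nr_queues=4)
print(inactive_queues(by_present, online))    # -> []
```

In this toy layout, half of the hw queues are inactive from the start when
spreading over possible CPUs, which is the situation where a hard-coded hw
queue index in a driver becomes fragile.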
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 15:46 ` Ming Lei
@ 2018-01-29 20:40   ` Mike Snitzer
  -1 siblings, 0 replies; 31+ messages in thread
From: Mike Snitzer @ 2018-01-29 20:40 UTC (permalink / raw)
  To: Ming Lei; +Cc: lsf-pc, Linux-scsi, linux-block, linux-nvme

On Mon, Jan 29 2018 at 10:46am -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> 2. When to enable SCSI_MQ at default again?
> 
> SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In V4.13-rc1,
> it is enabled at default, but later the patch is reverted in V4.13-rc7, and
> becomes disabled at default too.
> 
> Now both the original reported PM issue(actually SCSI quiesce) and the
> sequential IO performance issue have been addressed. And MQ IO schedulers
> are ready too for traditional disks. Are there other issues to be addressed
> for enabling SCSI_MQ at default? When can we do that again?
> 
> Last time, the two issues were reported during V4.13 dev cycle just when it is
> enabled at default, that seems if SCSI_MQ isn't enabled at default, it wouldn't
> be exposed to run/tested completely & fully.
> 
> So if we continue to disable it at default, maybe it can never be exposed to
> full test/production environment.

I was going to propose revisiting this as well.

I'd really like to see all the old .request_fn block core code removed.

But maybe we take a first step of enabling:
CONFIG_SCSI_MQ_DEFAULT=Y
CONFIG_DM_MQ_DEFAULT=Y

Thanks,
Mike

^ permalink raw reply	[flat|nested] 31+ messages in thread
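
Editorial illustration, not part of the original mail: a minimal sketch of
checking whether a kernel configuration already has the two defaults Mike
mentions. The helper name mq_defaults() and the sample text are hypothetical;
the only things assumed about the real world are that Kconfig writes enabled
boolean options into .config as `NAME=y` (lowercase) and disabled ones as
`# NAME is not set`. Independently of the build-time default, kernels of this
era also accept scsi_mod.use_blk_mq= and dm_mod.use_blk_mq= on the command
line to pick the path at boot.

```python
from typing import Dict

# The two Kconfig symbols named in the mail above.
WANTED = ("CONFIG_SCSI_MQ_DEFAULT", "CONFIG_DM_MQ_DEFAULT")


def mq_defaults(config_text: str) -> Dict[str, bool]:
    """Report which of the WANTED options are set to 'y' in .config text."""
    state = {name: False for name in WANTED}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if name in state:
            state[name] = (value == "y")
    return state


# Hypothetical .config excerpt: scsi-mq on by default, dm-mq still off.
sample = """\
CONFIG_SCSI=y
CONFIG_SCSI_MQ_DEFAULT=y
# CONFIG_DM_MQ_DEFAULT is not set
"""
print(mq_defaults(sample))
```

On a real system the text would typically come from the build tree's .config
or a distribution-provided config file; where that file lives is outside what
this thread states.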
* Re: [LSF/MM TOPIC] Two blk-mq related topics
@ 2018-01-30  1:27 ` Ming Lei
  0 siblings, 0 replies; 31+ messages in thread
From: Ming Lei @ 2018-01-30 1:27 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: lsf-pc, Linux-scsi, linux-block, linux-nvme

On Mon, Jan 29, 2018 at 03:40:31PM -0500, Mike Snitzer wrote:
> On Mon, Jan 29 2018 at 10:46am -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > 2. When to enable SCSI_MQ at default again?
> > 
> > SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In V4.13-rc1,
> > it is enabled at default, but later the patch is reverted in V4.13-rc7, and
> > becomes disabled at default too.
> > 
> > Now both the original reported PM issue(actually SCSI quiesce) and the
> > sequential IO performance issue have been addressed. And MQ IO schedulers
> > are ready too for traditional disks. Are there other issues to be addressed
> > for enabling SCSI_MQ at default? When can we do that again?
> > 
> > Last time, the two issues were reported during V4.13 dev cycle just when it is
> > enabled at default, that seems if SCSI_MQ isn't enabled at default, it wouldn't
> > be exposed to run/tested completely & fully.
> > 
> > So if we continue to disable it at default, maybe it can never be exposed to
> > full test/production environment.
> 
> I was going to propose revisiting this as well.
> 
> I'd really like to see all the old .request_fn block core code removed.

Yeah, that should be the final goal, but it may take a while.

> 
> But maybe we take a first step of enabling:
> CONFIG_SCSI_MQ_DEFAULT=Y
> CONFIG_DM_MQ_DEFAULT=Y

Maybe you can remove the legacy path from DM_RQ first, and take your original
approach of allowing DM/MQ over a legacy underlying driver; it seems we
discussed this topic before :-)

Thanks,
Ming

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 15:46 ` Ming Lei
@ 2018-01-29 20:56   ` James Bottomley
  -1 siblings, 0 replies; 31+ messages in thread
From: James Bottomley @ 2018-01-29 20:56 UTC (permalink / raw)
  To: Ming Lei, lsf-pc, Linux-scsi, linux-block, linux-nvme

On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
[...]
> 2. When to enable SCSI_MQ at default again?

I'm not sure there's much to discuss ... I think the basic answer is as
soon as Christoph wants to try it again.

> SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
> V4.13-rc1, it is enabled at default, but later the patch is reverted
> in V4.13-rc7, and becomes disabled at default too.
> 
> Now both the original reported PM issue(actually SCSI quiesce) and
> the sequential IO performance issue have been addressed.

Is the blocker bug just not closed because no-one thought to do it:

https://bugzilla.kernel.org/show_bug.cgi?id=178381

(we have confirmed that this issue is now fixed with the original
reporter?)

And did the Huawei guy (Jonathan Cameron) confirm his performance issue
was fixed (I don't think I saw email that he did)?

James

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 20:56   ` James Bottomley
@ 2018-01-29 21:00     ` Jens Axboe
  -1 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2018-01-29 21:00 UTC (permalink / raw)
  To: James Bottomley, Ming Lei, lsf-pc, Linux-scsi, linux-block, linux-nvme

On 1/29/18 1:56 PM, James Bottomley wrote:
> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> [...]
>> 2. When to enable SCSI_MQ at default again?
> 
> I'm not sure there's much to discuss ... I think the basic answer is as
> soon as Christoph wants to try it again.

FWIW, internally I've been running various IO intensive workloads on
what is essentially 4.12 upstream with scsi-mq the default (with
mq-deadline as the scheduler) and comparing IO workloads with a
previous 4.6 kernel (without scsi-mq), and things are looking
great.

We're never going to iron out the last kinks with it being off
by default, I think we should attempt to flip the switch again
for 4.16.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 21:00     ` Jens Axboe
@ 2018-01-29 23:46       ` James Bottomley
  -1 siblings, 0 replies; 31+ messages in thread
From: James Bottomley @ 2018-01-29 23:46 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei, lsf-pc, Linux-scsi, linux-block, linux-nvme

On Mon, 2018-01-29 at 14:00 -0700, Jens Axboe wrote:
> On 1/29/18 1:56 PM, James Bottomley wrote:
> > 
> > On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> > [...]
> > > 
> > > 2. When to enable SCSI_MQ at default again?
> > 
> > I'm not sure there's much to discuss ... I think the basic answer
> > is as soon as Christoph wants to try it again.
> 
> FWIW, internally I've been running various IO intensive workloads on
> what is essentially 4.12 upstream with scsi-mq the default (with
> mq-deadline as the scheduler) and comparing IO workloads with a
> previous 4.6 kernel (without scsi-mq), and things are looking
> great.
> 
> We're never going to iron out the last kinks with it being off
> by default, I think we should attempt to flip the switch again
> for 4.16.

Absolutely, I agree we turn it on ASAP. I just don't want to be on the
receiving end of Linus' flamethrower because a bug we already had
reported against scsi-mq caused problems. Get confirmation from the
original reporters (or as close to it as you can) that their problems
are fixed and we're good to go; he won't kick us nearly as hard for new
bugs that turn up.

James

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 23:46       ` James Bottomley
@ 2018-01-30  1:47         ` Jens Axboe
  -1 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2018-01-30 1:47 UTC (permalink / raw)
  To: James Bottomley, Ming Lei, lsf-pc, Linux-scsi, linux-block, linux-nvme

On 1/29/18 4:46 PM, James Bottomley wrote:
> On Mon, 2018-01-29 at 14:00 -0700, Jens Axboe wrote:
>> On 1/29/18 1:56 PM, James Bottomley wrote:
>>>
>>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>>> [...]
>>>>
>>>> 2. When to enable SCSI_MQ at default again?
>>>
>>> I'm not sure there's much to discuss ... I think the basic answer
>>> is as soon as Christoph wants to try it again.
>>
>> FWIW, internally I've been running various IO intensive workloads on
>> what is essentially 4.12 upstream with scsi-mq the default (with
>> mq-deadline as the scheduler) and comparing IO workloads with a
>> previous 4.6 kernel (without scsi-mq), and things are looking
>> great.
>>
>> We're never going to iron out the last kinks with it being off
>> by default, I think we should attempt to flip the switch again
>> for 4.16.
> 
> Absolutely, I agree we turn it on ASAP. I just don't want to be on the
> receiving end of Linus' flamethrower because a bug we already had
> reported against scsi-mq caused problems. Get confirmation from the
> original reporters (or as close to it as you can) that their problems
> are fixed and we're good to go; he won't kick us nearly as hard for new
> bugs that turn up.

I agree, the functional issues definitely have to be verified to be
resolved. Various performance hitches we can dive into if they
crop up, but reintroducing some random suspend regression is not
acceptable.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
@ 2018-01-30 10:08 ` Johannes Thumshirn
  0 siblings, 0 replies; 31+ messages in thread
From: Johannes Thumshirn @ 2018-01-30 10:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James Bottomley, Ming Lei, lsf-pc, Linux-scsi, linux-block,
	linux-nvme, Mel Gorman

[+Cc Mel]
Jens Axboe <axboe@kernel.dk> writes:
> On 1/29/18 1:56 PM, James Bottomley wrote:
>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>> [...]
>>> 2. When to enable SCSI_MQ at default again?
>>
>> I'm not sure there's much to discuss ... I think the basic answer is as
>> soon as Christoph wants to try it again.
>
> FWIW, internally I've been running various IO intensive workloads on
> what is essentially 4.12 upstream with scsi-mq the default (with
> mq-deadline as the scheduler) and comparing IO workloads with a
> previous 4.6 kernel (without scsi-mq), and things are looking
> great.
>
> We're never going to iron out the last kinks with it being off
> by default, I think we should attempt to flip the switch again
> for 4.16.

The 4.12 sounds interesting. I remember Mel ran some tests with 4.12 as
we were considering flipping the config option for SLES, and they showed
several road blocks.

I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.

But I'm definitely interested in this discussion and can possibly even
share some benchmark results from our FC Lab.

Byte,
        Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-30 10:08 ` Johannes Thumshirn
@ 2018-01-30 10:50   ` Mel Gorman
  -1 siblings, 0 replies; 31+ messages in thread
From: Mel Gorman @ 2018-01-30 10:50 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Jens Axboe, James Bottomley, Ming Lei, lsf-pc, Linux-scsi,
	linux-block, linux-nvme

On Tue, Jan 30, 2018 at 11:08:28AM +0100, Johannes Thumshirn wrote:
> [+Cc Mel]
> Jens Axboe <axboe@kernel.dk> writes:
> > On 1/29/18 1:56 PM, James Bottomley wrote:
> >> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> >> [...]
> >>> 2. When to enable SCSI_MQ at default again?
> >>
> >> I'm not sure there's much to discuss ... I think the basic answer is as
> >> soon as Christoph wants to try it again.
> >
> > FWIW, internally I've been running various IO intensive workloads on
> > what is essentially 4.12 upstream with scsi-mq the default (with
> > mq-deadline as the scheduler) and comparing IO workloads with a
> > previous 4.6 kernel (without scsi-mq), and things are looking
> > great.
> >
> > We're never going to iron out the last kinks with it being off
> > by default, I think we should attempt to flip the switch again
> > for 4.16.
> 
> The 4.12 sounds interesting. I remember Mel ran some test with 4.12 as
> we where considering to flip the config option for SLES and it showed
> several road blocks.
> 

Mostly due to slow storage and BFQ where mq-deadline was not a universal
win as an alternative default. I don't have current data and I archived
what I had, but it was based on 4.13-rc7 at the time and BFQ has changed
a lot since so it would need to be redone.

> I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.
> 

No, it hasn't. Grid time for performance testing has been tight during
the last few months to say the least.

> But I'm definitively interested in this discussion and can even possibly
> share some benchmark results we did in our FC Lab.
> 

If you remind me, I may be able to re-execute the tests in a 4.16-rcX
before LSF/MM so you have other data to work with. Unfortunately, I'll
not be able to make LSF/MM this time due to personal commitments that
conflict and are unmovable.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 20:56 ` James Bottomley
@ 2018-01-30 1:24 ` Ming Lei
  -1 siblings, 0 replies; 31+ messages in thread
From: Ming Lei @ 2018-01-30 1:24 UTC (permalink / raw)
To: James Bottomley, John Garry; +Cc: lsf-pc, Linux-scsi, linux-block, linux-nvme

On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> [...]
> > 2. When to enable SCSI_MQ at default again?
>
> I'm not sure there's much to discuss ... I think the basic answer is as
> soon as Christoph wants to try it again.

I guess Christoph still needs to evaluate whether there are existing issues
or blockers before trying it again. And more input may come from F2F
discussion, IMHO.

>
> > SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
> > V4.13-rc1, it is enabled at default, but later the patch is reverted
> > in V4.13-rc7, and becomes disabled at default too.
> >
> > Now both the original reported PM issue(actually SCSI quiesce) and
> > the sequential IO performance issue have been addressed.
>
> Is the blocker bug just not closed because no-one thought to do it:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=178381
>
> (we have confirmed that this issue is now fixed with the original
> reporter?)

From a developer's view, this issue is fixed by the following commit:
3a0a52997 (block, scsi: Make SCSI quiesce and resume work reliably),
and the fix has been verified by the reporter on the kernel list.

>
> And did the Huawei guy (Jonathan Cameron) confirm his performance issue
> was fixed (I don't think I saw email that he did)?

Last time I talked with John Garry about the issue, the merged .get_budget
based patch had improved IO performance a lot, but there was still a small
gap compared with the legacy path. It seems to be a driver-specific issue;
I remember that removing a lock in the driver improved performance
considerably.

Garry, could you provide a further update on this issue?

Thanks,
Ming

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
@ 2018-01-30 8:33 ` Martin Steigerwald
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Steigerwald @ 2018-01-30 8:33 UTC (permalink / raw)
To: Ming Lei
Cc: James Bottomley, John Garry, lsf-pc, Linux-scsi, linux-block, linux-nvme

Ming Lei - 30.01.18, 02:24:
> > > SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
> > > V4.13-rc1, it is enabled at default, but later the patch is reverted
> > > in V4.13-rc7, and becomes disabled at default too.
> > >
> > > Now both the original reported PM issue(actually SCSI quiesce) and
> > > the sequential IO performance issue have been addressed.
> >
> > Is the blocker bug just not closed because no-one thought to do it:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=178381
> >
> > (we have confirmed that this issue is now fixed with the original
> > reporter?)
>
> From a developer view, this issue is fixed by the following commit:
> 3a0a52997(block, scsi: Make SCSI quiesce and resume work reliably),
> and it is verified by kernel list reporter.

I have never seen any suspend / hibernate related issues with blk-mq + bfq
since then, using a heavily utilized BTRFS dual SSD RAID 1.

% egrep "MQ|BFQ" /boot/config-4.15.0-tp520-btrfstrim+
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_MQ_IOSCHED_DEADLINE=m
CONFIG_MQ_IOSCHED_KYBER=m
CONFIG_IOSCHED_BFQ=m
CONFIG_BFQ_GROUP_IOSCHED=y
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_SCSI_MQ_DEFAULT is not set
# CONFIG_DM_MQ_DEFAULT is not set
CONFIG_DM_CACHE_SMQ=m

% cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.15.0-tp520-btrfstrim+ root=UUID=[…] ro
rootflags=subvol=debian resume=/dev/mapper/sata-swap init=/bin/systemd
thinkpad_acpi.fan_control=1 systemd.restore_state=0 scsi_mod.use_blk_mq=1

% cat /sys/block/sda/queue/scheduler
[bfq] none

Thanks,
-- 
Martin

^ permalink raw reply [flat|nested] 31+ messages in thread
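For anyone wanting to repeat the checks Martin shows on their own machine, here is a minimal sketch in POSIX shell. The function names active_scheduler and report_scsi_mq are made up for illustration. The sketch assumes the sysfs scheduler file marks the active scheduler with square brackets, as in the "[bfq] none" output above, and that /sys/module/scsi_mod/parameters/use_blk_mq is readable on kernels that accept the scsi_mod.use_blk_mq boot parameter; on kernels without that parameter it simply reports it as unavailable.

```shell
#!/bin/sh
# Hypothetical helper: print the active I/O scheduler from one sysfs
# "scheduler" line. The active entry is the one in square brackets,
# e.g. "[bfq] none" -> "bfq". Prints nothing if no entry is bracketed.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: report the scsi-mq switch and the active
# scheduler for one disk (default: sda). Both paths are assumptions
# about the running kernel and are checked before being read.
report_scsi_mq() {
    dev="${1:-sda}"
    param=/sys/module/scsi_mod/parameters/use_blk_mq
    sched="/sys/block/$dev/queue/scheduler"

    if [ -r "$param" ]; then
        echo "scsi_mod.use_blk_mq: $(cat "$param")"
    else
        echo "scsi_mod.use_blk_mq: not available on this kernel"
    fi

    if [ -r "$sched" ]; then
        echo "$dev scheduler: $(active_scheduler "$(cat "$sched")")"
    else
        echo "$dev scheduler: not available"
    fi
}
```

Calling `report_scsi_mq sda` after sourcing the file prints two lines; what they contain depends on the machine, so no expected output is given here.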
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-30 1:24 ` Ming Lei
  (?)
@ 2018-01-30 10:33 ` John Garry
  -1 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2018-01-30 10:33 UTC (permalink / raw)
To: Ming Lei, James Bottomley
Cc: lsf-pc, Linux-scsi, linux-block, linux-nvme, Linuxarm

On 30/01/2018 01:24, Ming Lei wrote:
> On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>> [...]
>>> 2. When to enable SCSI_MQ at default again?
>>
>> I'm not sure there's much to discuss ... I think the basic answer is as
>> soon as Christoph wants to try it again.
>
> I guess Christoph still need to evaluate if there are existed issues or
> blockers before trying it again. And more input may be got from F2F
> discussion, IMHO.
>
>>
>>> SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
>>> V4.13-rc1, it is enabled at default, but later the patch is reverted
>>> in V4.13-rc7, and becomes disabled at default too.
>>>
>>> Now both the original reported PM issue(actually SCSI quiesce) and
>>> the sequential IO performance issue have been addressed.
>>
>> Is the blocker bug just not closed because no-one thought to do it:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=178381
>>
>> (we have confirmed that this issue is now fixed with the original
>> reporter?)
>
> From a developer view, this issue is fixed by the following commit:
> 3a0a52997(block, scsi: Make SCSI quiesce and resume work reliably),
> and it is verified by kernel list reporter.
>
>>
>> And did the Huawei guy (Jonathan Cameron) confirm his performance issue
>> was fixed (I don't think I saw email that he did)?
>
> Last time I talked with John Garry about the issue, and the merged .get_budget
> based patch improves much on the IO performance, but there is still a bit gap
> compared with legacy path. Seems a driver specific issue, remembered that removing
> a driver's lock can improve performance much.
>
> Garry, could you provide further update on this issue?

Hi Ming,

From our testing with experimental changes to our driver to support SCSI
mq, we were getting almost on-par performance with the legacy path. But
without these changes, MQ was hurting performance (and I would not
necessarily say it was a driver issue).

We can retest from today's mainline and see where we are.

BTW, have you got performance figures for many other single queue HBAs
with and without CONFIG_SCSI_MQ_DEFAULT=y?

Thanks,
John

>
> Thanks,
> Ming
>
> .
>

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-30 10:33 ` John Garry
  (?)
@ 2018-02-07 10:55 ` John Garry
  -1 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2018-02-07 10:55 UTC (permalink / raw)
To: Ming Lei, James Bottomley
Cc: linux-block, lsf-pc, linux-nvme, Linux-scsi, Linuxarm

On 30/01/2018 10:33, John Garry wrote:
> On 30/01/2018 01:24, Ming Lei wrote:
>> On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
>>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>>> [...]
>>>> 2. When to enable SCSI_MQ at default again?
>>>
>>> I'm not sure there's much to discuss ... I think the basic answer is as
>>> soon as Christoph wants to try it again.
>>
>> I guess Christoph still need to evaluate if there are existed issues or
>> blockers before trying it again. And more input may be got from F2F
>> discussion, IMHO.
>>
>>>
>>>> SCSI_MQ is enabled on V3.17 firstly, but disabled at default. In
>>>> V4.13-rc1, it is enabled at default, but later the patch is reverted
>>>> in V4.13-rc7, and becomes disabled at default too.
>>>>
>>>> Now both the original reported PM issue(actually SCSI quiesce) and
>>>> the sequential IO performance issue have been addressed.
>>>
>>> Is the blocker bug just not closed because no-one thought to do it:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=178381
>>>
>>> (we have confirmed that this issue is now fixed with the original
>>> reporter?)
>>
>> From a developer view, this issue is fixed by the following commit:
>> 3a0a52997(block, scsi: Make SCSI quiesce and resume work reliably),
>> and it is verified by kernel list reporter.
>>
>>>
>>> And did the Huawei guy (Jonathan Cameron) confirm his performance issue
>>> was fixed (I don't think I saw email that he did)?
>>
>> Last time I talked with John Garry about the issue, and the merged
>> .get_budget based patch improves much on the IO performance, but there
>> is still a bit gap compared with legacy path. Seems a driver specific
>> issue, remembered that removing a driver's lock can improve performance
>> much.
>>
>> Garry, could you provide further update on this issue?
>
> Hi Ming,
>
> From our testing with experimental changes to our driver to support SCSI
> mq we were almost getting on par performance with legacy path. But
> without these MQ was hitting performance (and I would not necessarily
> say it was a driver issue).
>
> We can retest from today's mainline and see where we are.
>
> BTW, Have you got performance figures for many other single queue HBAs
> with and without CONFIG_SCSI_MQ_DEFAULT=Y?

We finally got around to retesting this (on hisi_sas controller). So the
results are generally ok, in that we are now not seeing such big
performance drops in our hardware for enabling SCSI MQ - in some
scenarios the performance is better. Generally fio rw mode is better.

Anyway, for what it's worth, it's a green light from us to set SCSI MQ
on by default.

John

>
> Thanks,
> John
>
>>
>> Thanks,
>> Ming
>>
>> .
>>
>
>
> _______________________________________________
> Linuxarm mailing list
> Linuxarm@huawei.com
> http://hulk.huawei.com/mailman/listinfo/linuxarm
>
> .
>

^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2018-02-07 10:55 UTC | newest]

Thread overview: 31+ messages
2018-01-29 15:46 [LSF/MM TOPIC] Two blk-mq related topics Ming Lei
2018-01-29 20:40 ` Mike Snitzer
2018-01-30 1:27 ` [Lsf-pc] " Ming Lei
2018-01-29 20:56 ` James Bottomley
2018-01-29 21:00 ` Jens Axboe
2018-01-29 23:46 ` James Bottomley
2018-01-30 1:47 ` Jens Axboe
2018-01-30 10:08 ` Johannes Thumshirn
2018-01-30 10:50 ` Mel Gorman
2018-01-30 1:24 ` Ming Lei
2018-01-30 8:33 ` [Lsf-pc] " Martin Steigerwald
2018-01-30 10:33 ` John Garry
2018-02-07 10:55 ` John Garry