linux-nvme.lists.infradead.org archive mirror
* [LSF/MM TOPIC] Two blk-mq related topics
@ 2018-01-29 15:46 Ming Lei
  2018-01-29 20:40 ` Mike Snitzer
  2018-01-29 20:56 ` James Bottomley
  0 siblings, 2 replies; 13+ messages in thread
From: Ming Lei @ 2018-01-29 15:46 UTC


Hi guys,

Two blk-mq related topics

1. blk-mq vs. CPU hotplug & IRQ vectors spread on CPUs

We have made three big changes in this area before; each time some issues
were fixed, but new ones were introduced:

1) freeze all queues in the CPU hotplug handler
- issues: queue dependencies (such as loop-mq/dm on top of underlying
queues, or the NVMe admin queue vs. namespace queues) mean that freezing
all of these queues in the CPU hotplug handler may cause IO hangs.

2) spread IRQ vectors on all present CPUs
- fixes the issues in 1)
- new issues introduced: physical CPU hotplug isn't supported, and blk-mq
warnings are triggered during dispatch

3) spread IRQ vectors on all possible CPUs
- supports physical CPU hotplug
- the warning in __blk_mq_run_hw_queue() may still be triggered if a CPU
  goes offline/online between blk_mq_hctx_next_cpu() and
  __blk_mq_run_hw_queue()
- new issues introduced: the queue mapping may be distorted completely; a
patch has been sent out (https://marc.info/?t=151603230900002&r=1&w=2), but
the approach may need further discussion. Drivers (such as NVMe) may need
to pass 'num_possible_cpus()' as the max vector count when allocating IRQ
vectors, and some drivers (NVMe) use hard-coded hw queue indexes directly,
which becomes very fragile, since a hw queue may be inactive from the
beginning (see the sketch below).
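
To make the mapping problem concrete, below is a minimal userspace model
(plain C; the CPU counts, the block mapping, and all names are
illustrative assumptions, not kernel code) of how spreading vectors over
all possible CPUs can leave a hw queue inactive from the beginning:

/*
 * Illustrative model, all values are assumptions: 8 possible CPUs, of
 * which only the first 4 are online, and 4 hw queues. Consecutive
 * possible CPUs share one hw queue, so queues mapped only from
 * not-yet-online CPUs start out inactive.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_POSSIBLE_CPUS 8
#define NR_HW_QUEUES     4

int main(void)
{
	bool online[NR_POSSIBLE_CPUS] = { true, true, true, true };
	int map[NR_POSSIBLE_CPUS];

	/* block mapping: cpu 0-1 -> q0, 2-3 -> q1, 4-5 -> q2, 6-7 -> q3 */
	for (int cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
		map[cpu] = cpu / (NR_POSSIBLE_CPUS / NR_HW_QUEUES);

	for (int q = 0; q < NR_HW_QUEUES; q++) {
		bool active = false;

		for (int cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
			if (map[cpu] == q && online[cpu])
				active = true;
		printf("hw queue %d: %s\n", q, active ? "active" : "inactive");
	}
	return 0;
}

Here hw queues 2 and 3 print "inactive"; a driver that submits by a
hard-coded hw queue index would still pick them, which is exactly the
fragility described above.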

Also, starting from 2), another issue is that an IO completion may not be
delivered to any CPU: for example, an IO may be dispatched to a hw queue
just before (or after) all CPUs mapped to that hctx go offline, after which
the IRQ vector of the hw queue can be shut down. Currently we seem to
depend on the timeout handler to deal with this situation; is there a
better way to solve this issue?
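
The same kind of userspace model (again plain C with illustrative names;
a sketch of the race, not the kernel implementation) shows the window in
which all CPUs mapped to an hctx go offline after dispatch but before
completion:

/*
 * Illustrative sketch: an hctx with two mapped CPUs and a round-robin
 * "next cpu" cursor, loosely modeled on what blk_mq_hctx_next_cpu()
 * does; everything here is an assumption for demonstration.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

static bool online[NR_CPUS] = { true, true, true, true };

/* pick the next online CPU mapped to the hctx, round-robin */
static int next_cpu(const int *mapped, int nr_mapped, int *rr)
{
	for (int i = 0; i < nr_mapped; i++) {
		int cpu = mapped[(*rr + i) % nr_mapped];

		if (online[cpu]) {
			*rr = (*rr + i + 1) % nr_mapped;
			return cpu;
		}
	}
	return -1;	/* no online CPU: the completion has nowhere to go */
}

int main(void)
{
	int mapped[] = { 2, 3 };	/* CPUs mapped to this hctx */
	int rr = 0;

	printf("dispatch on cpu %d\n", next_cpu(mapped, 2, &rr));

	/* both mapped CPUs go offline while the request is in flight */
	online[2] = online[3] = false;
	printf("completion target: %d (-1: wait for the timeout handler)\n",
	       next_cpu(mapped, 2, &rr));
	return 0;
}

The -1 case is the situation described above, where today only the
timeout handler recovers the request.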

2. When to enable SCSI_MQ by default again?

SCSI_MQ was first introduced in v3.17, but disabled by default. In
v4.13-rc1 it was enabled by default, but the change was reverted in
v4.13-rc7, so it is disabled by default again.

Now both the originally reported PM issue (actually SCSI quiesce) and the
sequential IO performance issue have been addressed, and MQ IO schedulers
are ready for traditional disks too. Are there other issues that need to
be addressed before enabling SCSI_MQ by default? When can we do that again?

Last time, the two issues were reported during the v4.13 dev cycle just
after SCSI_MQ was enabled by default, which suggests that as long as it
isn't enabled by default, it won't be exercised and tested fully.

So if we continue to disable it by default, it may never be exposed to
full test/production environments.


Thanks,
Ming


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 15:46 [LSF/MM TOPIC] Two blk-mq related topics Ming Lei
@ 2018-01-29 20:40 ` Mike Snitzer
  2018-01-30  1:27   ` Ming Lei
  2018-01-29 20:56 ` James Bottomley
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2018-01-29 20:40 UTC


On Mon, Jan 29 2018 at 10:46am -0500,
Ming Lei <ming.lei@redhat.com> wrote:
 
> 2. When to enable SCSI_MQ by default again?
> 
> SCSI_MQ was first introduced in v3.17, but disabled by default. In
> v4.13-rc1 it was enabled by default, but the change was reverted in
> v4.13-rc7, so it is disabled by default again.
> 
> Now both the originally reported PM issue (actually SCSI quiesce) and the
> sequential IO performance issue have been addressed, and MQ IO schedulers
> are ready for traditional disks too. Are there other issues that need to
> be addressed before enabling SCSI_MQ by default? When can we do that
> again?
> 
> Last time, the two issues were reported during the v4.13 dev cycle just
> after SCSI_MQ was enabled by default, which suggests that as long as it
> isn't enabled by default, it won't be exercised and tested fully.
> 
> So if we continue to disable it by default, it may never be exposed to
> full test/production environments.

I was going to propose revisiting this as well.

I'd really like to see all the old .request_fn block core code removed.

But maybe we take a first step of enabling:
CONFIG_SCSI_MQ_DEFAULT=Y
CONFIG_DM_MQ_DEFAULT=Y

Thanks,
Mike


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 15:46 [LSF/MM TOPIC] Two blk-mq related topics Ming Lei
  2018-01-29 20:40 ` Mike Snitzer
@ 2018-01-29 20:56 ` James Bottomley
  2018-01-29 21:00   ` Jens Axboe
  2018-01-30  1:24   ` Ming Lei
  1 sibling, 2 replies; 13+ messages in thread
From: James Bottomley @ 2018-01-29 20:56 UTC


On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
[...]
> 2. When to enable SCSI_MQ by default again?

I'm not sure there's much to discuss ... I think the basic answer is as
soon as Christoph wants to try it again.

> SCSI_MQ was first introduced in v3.17, but disabled by default. In
> v4.13-rc1 it was enabled by default, but the change was reverted in
> v4.13-rc7, so it is disabled by default again.
> 
> Now both the originally reported PM issue (actually SCSI quiesce) and
> the sequential IO performance issue have been addressed.

Is the blocker bug just not closed because no one thought to do it:

https://bugzilla.kernel.org/show_bug.cgi?id=178381

(have we confirmed with the original reporter that this issue is now
fixed?)

And did the Huawei guy (Jonathan Cameron) confirm his performance issue
was fixed (I don't think I saw an email saying he did)?

James


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 20:56 ` James Bottomley
@ 2018-01-29 21:00   ` Jens Axboe
  2018-01-29 23:46     ` James Bottomley
  2018-01-30 10:08     ` Johannes Thumshirn
  2018-01-30  1:24   ` Ming Lei
  1 sibling, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2018-01-29 21:00 UTC


On 1/29/18 1:56 PM, James Bottomley wrote:
> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> [...]
>> 2. When to enable SCSI_MQ by default again?
> 
> I'm not sure there's much to discuss ... I think the basic answer is as
> soon as Christoph wants to try it again.

FWIW, internally I've been running various IO intensive workloads on
what is essentially 4.12 upstream with scsi-mq the default (with
mq-deadline as the scheduler), comparing against a previous 4.6 kernel
(without scsi-mq), and things are looking great.

We're never going to iron out the last kinks with it off by default,
so I think we should attempt to flip the switch again for 4.16.

-- 
Jens Axboe


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 21:00   ` Jens Axboe
@ 2018-01-29 23:46     ` James Bottomley
  2018-01-30  1:47       ` Jens Axboe
  2018-01-30 10:08     ` Johannes Thumshirn
  1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2018-01-29 23:46 UTC


On Mon, 2018-01-29 at 14:00 -0700, Jens Axboe wrote:
> On 1/29/18 1:56 PM, James Bottomley wrote:
> > 
> > On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> > [...]
> > > 
> > > 2. When to enable SCSI_MQ by default again?
> > 
> > I'm not sure there's much to discuss ... I think the basic answer
> > is as soon as Christoph wants to try it again.
> 
> FWIW, internally I've been running various IO intensive workloads on
> what is essentially 4.12 upstream with scsi-mq the default (with
> mq-deadline as the scheduler), comparing against a previous 4.6 kernel
> (without scsi-mq), and things are looking great.
> 
> We're never going to iron out the last kinks with it off by default,
> so I think we should attempt to flip the switch again for 4.16.

Absolutely, I agree we should turn it on ASAP. I just don't want to be on
the receiving end of Linus' flamethrower because a bug we already had
reported against scsi-mq caused problems. Get confirmation from the
original reporters (or as close to it as you can) that their problems are
fixed and we're good to go; he won't kick us nearly as hard for new bugs
that turn up.

James


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 20:56 ` James Bottomley
  2018-01-29 21:00   ` Jens Axboe
@ 2018-01-30  1:24   ` Ming Lei
  2018-01-30  8:33     ` Martin Steigerwald
  2018-01-30 10:33     ` John Garry
  1 sibling, 2 replies; 13+ messages in thread
From: Ming Lei @ 2018-01-30  1:24 UTC


On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> [...]
> > 2. When to enable SCSI_MQ by default again?
> 
> I'm not sure there's much to discuss ... I think the basic answer is as
> soon as Christoph wants to try it again.

I guess Christoph still needs to evaluate whether there are existing
issues or blockers before trying it again. And more input may come from
F2F discussion, IMHO.

> 
> > SCSI_MQ was first introduced in v3.17, but disabled by default. In
> > v4.13-rc1 it was enabled by default, but the change was reverted in
> > v4.13-rc7, so it is disabled by default again.
> > 
> > Now both the originally reported PM issue (actually SCSI quiesce) and
> > the sequential IO performance issue have been addressed.
> 
> Is the blocker bug just not closed because no one thought to do it:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=178381
> 
> (have we confirmed with the original reporter that this issue is now
> fixed?)

From a developer's view, this issue is fixed by commit 3a0a52997 ("block,
scsi: Make SCSI quiesce and resume work reliably"), and the fix has been
verified by the reporter on the kernel list.

> 
> And did the Huawei guy (Jonathan Cameron) confirm his performance issue
> was fixed (I don't think I saw an email saying he did)?

The last time I talked with John Garry about this issue, the merged
.get_budget-based patch had improved IO performance a lot, but there was
still a small gap compared with the legacy path. It seems to be a
driver-specific issue; I remember that removing a driver lock improved
performance considerably.
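
For reference, here is a userspace sketch of the idea behind the budget
mechanism (the struct, function names, and queue_depth value are
illustrative assumptions, not the kernel's blk_mq_ops API): dispatch only
proceeds while the device still has budget, so blk-mq stops dequeueing
instead of over-dispatching to a saturated device:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* illustrative sketch of a per-device budget, not the kernel API */
struct device_budget {
	atomic_int busy;	/* requests currently dispatched */
	int queue_depth;	/* what the device can absorb */
};

static bool get_budget(struct device_budget *b)
{
	if (atomic_fetch_add(&b->busy, 1) >= b->queue_depth) {
		atomic_fetch_sub(&b->busy, 1);	/* over budget: back off */
		return false;
	}
	return true;
}

static void put_budget(struct device_budget *b)
{
	atomic_fetch_sub(&b->busy, 1);	/* request completed or requeued */
}

int main(void)
{
	struct device_budget b = { .busy = 0, .queue_depth = 2 };

	/* the third dispatch is deferred until a completion returns budget */
	for (int i = 0; i < 3; i++)
		printf("dispatch %d: %s\n", i, get_budget(&b) ? "ok" : "deferred");
	put_budget(&b);
	printf("retry: %s\n", get_budget(&b) ? "ok" : "deferred");
	return 0;
}

This is roughly what lets the SCSI path stop dequeueing when a device is
busy, rather than piling up requests it cannot dispatch.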

Garry, could you provide a further update on this issue?

Thanks,
Ming


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 20:40 ` Mike Snitzer
@ 2018-01-30  1:27   ` Ming Lei
  0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2018-01-30  1:27 UTC


On Mon, Jan 29, 2018 at 03:40:31PM -0500, Mike Snitzer wrote:
> On Mon, Jan 29 2018 at 10:46am -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
>  
> > 2. When to enable SCSI_MQ by default again?
> > 
> > SCSI_MQ was first introduced in v3.17, but disabled by default. In
> > v4.13-rc1 it was enabled by default, but the change was reverted in
> > v4.13-rc7, so it is disabled by default again.
> > 
> > Now both the originally reported PM issue (actually SCSI quiesce) and
> > the sequential IO performance issue have been addressed, and MQ IO
> > schedulers are ready for traditional disks too. Are there other issues
> > that need to be addressed before enabling SCSI_MQ by default? When can
> > we do that again?
> > 
> > Last time, the two issues were reported during the v4.13 dev cycle just
> > after SCSI_MQ was enabled by default, which suggests that as long as it
> > isn't enabled by default, it won't be exercised and tested fully.
> > 
> > So if we continue to disable it by default, it may never be exposed to
> > full test/production environments.
> 
> I was going to propose revisiting this as well.
> 
> I'd really like to see all the old .request_fn block core code removed.

Yeah, that should be the final goal, but it may take a while.

> 
> But maybe we take a first step of enabling:
> CONFIG_SCSI_MQ_DEFAULT=Y
> CONFIG_DM_MQ_DEFAULT=Y

Maybe you can remove the legacy path from DM_RQ first, and take your
original approach of allowing DM/MQ on top of legacy underlying drivers;
it seems we discussed this topic before. :-)

Thanks,
Ming


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 23:46     ` James Bottomley
@ 2018-01-30  1:47       ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2018-01-30  1:47 UTC


On 1/29/18 4:46 PM, James Bottomley wrote:
> On Mon, 2018-01-29 at 14:00 -0700, Jens Axboe wrote:
>> On 1/29/18 1:56 PM, James Bottomley wrote:
>>>
>>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>>> [...]
>>>>
>>>> 2. When to enable SCSI_MQ by default again?
>>>
>>> I'm not sure there's much to discuss ... I think the basic answer
>>> is as soon as Christoph wants to try it again.
>>
>> FWIW, internally I've been running various IO intensive workloads on
>> what is essentially 4.12 upstream with scsi-mq the default (with
>> mq-deadline as the scheduler), comparing against a previous 4.6 kernel
>> (without scsi-mq), and things are looking great.
>>
>> We're never going to iron out the last kinks with it off by default,
>> so I think we should attempt to flip the switch again for 4.16.
> 
> Absolutely, I agree we should turn it on ASAP. I just don't want to be on
> the receiving end of Linus' flamethrower because a bug we already had
> reported against scsi-mq caused problems. Get confirmation from the
> original reporters (or as close to it as you can) that their problems
> are fixed and we're good to go; he won't kick us nearly as hard for new
> bugs that turn up.

I agree, the functional issues definitely have to be verified as
resolved. Performance hitches we can dive into if they crop up, but
reintroducing some random suspend regression is not acceptable.

-- 
Jens Axboe


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-30  1:24   ` Ming Lei
@ 2018-01-30  8:33     ` Martin Steigerwald
  2018-01-30 10:33     ` John Garry
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Steigerwald @ 2018-01-30  8:33 UTC


Ming Lei - 30.01.18, 02:24:
> > > SCSI_MQ was first introduced in v3.17, but disabled by default. In
> > > v4.13-rc1 it was enabled by default, but the change was reverted in
> > > v4.13-rc7, so it is disabled by default again.
> > > 
> > > Now both the originally reported PM issue (actually SCSI quiesce) and
> > > the sequential IO performance issue have been addressed.
> > 
> > Is the blocker bug just not closed because no one thought to do it:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=178381
> > 
> > (have we confirmed with the original reporter that this issue is now
> > fixed?)
> 
> From a developer's view, this issue is fixed by commit 3a0a52997 ("block,
> scsi: Make SCSI quiesce and resume work reliably"), and the fix has been
> verified by the reporter on the kernel list.

I have never seen any suspend/hibernate related issues with blk-mq + BFQ
since then, on a heavily utilized BTRFS dual-SSD RAID 1.

% egrep "MQ|BFQ" /boot/config-4.15.0-tp520-btrfstrim+
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_MQ_IOSCHED_DEADLINE=m
CONFIG_MQ_IOSCHED_KYBER=m
CONFIG_IOSCHED_BFQ=m
CONFIG_BFQ_GROUP_IOSCHED=y
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_SCSI_MQ_DEFAULT is not set
# CONFIG_DM_MQ_DEFAULT is not set
CONFIG_DM_CACHE_SMQ=m

% cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-4.15.0-tp520-btrfstrim+ root=UUID=[?] ro 
rootflags=subvol=debian resume=/dev/mapper/sata-swap init=/bin/systemd 
thinkpad_acpi.fan_control=1 systemd.restore_state=0 scsi_mod.use_blk_mq=1

% cat /sys/block/sda/queue/scheduler 
[bfq] none

Thanks,
-- 
Martin


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-29 21:00   ` Jens Axboe
  2018-01-29 23:46     ` James Bottomley
@ 2018-01-30 10:08     ` Johannes Thumshirn
  2018-01-30 10:50       ` Mel Gorman
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Thumshirn @ 2018-01-30 10:08 UTC


[+Cc Mel]
Jens Axboe <axboe at kernel.dk> writes:
> On 1/29/18 1:56 PM, James Bottomley wrote:
>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>> [...]
>>> 2. When to enable SCSI_MQ by default again?
>> 
>> I'm not sure there's much to discuss ... I think the basic answer is as
>> soon as Christoph wants to try it again.
>
> FWIW, internally I've been running various IO intensive workloads on
> what is essentially 4.12 upstream with scsi-mq the default (with
> mq-deadline as the scheduler), comparing against a previous 4.6 kernel
> (without scsi-mq), and things are looking great.
>
> We're never going to iron out the last kinks with it off by default,
> so I think we should attempt to flip the switch again for 4.16.

The 4.12 results sound interesting. I remember Mel ran some tests with
4.12 when we were considering flipping the config option for SLES, and
they showed several roadblocks.

I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.

But I'm definitely interested in this discussion and can even possibly
share some benchmark results we did in our FC lab.

Byte,
        Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-30  1:24   ` Ming Lei
  2018-01-30  8:33     ` Martin Steigerwald
@ 2018-01-30 10:33     ` John Garry
  2018-02-07 10:55       ` John Garry
  1 sibling, 1 reply; 13+ messages in thread
From: John Garry @ 2018-01-30 10:33 UTC


On 30/01/2018 01:24, Ming Lei wrote:
> On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>> [...]
>>> 2. When to enable SCSI_MQ by default again?
>>
>> I'm not sure there's much to discuss ... I think the basic answer is as
>> soon as Christoph wants to try it again.
>
> I guess Christoph still needs to evaluate whether there are existing
> issues or blockers before trying it again. And more input may come from
> F2F discussion, IMHO.
>
>>
>>> SCSI_MQ was first introduced in v3.17, but disabled by default. In
>>> v4.13-rc1 it was enabled by default, but the change was reverted in
>>> v4.13-rc7, so it is disabled by default again.
>>>
>>> Now both the originally reported PM issue (actually SCSI quiesce) and
>>> the sequential IO performance issue have been addressed.
>>
>> Is the blocker bug just not closed because no one thought to do it:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=178381
>>
>> (have we confirmed with the original reporter that this issue is now
>> fixed?)
>
> From a developer's view, this issue is fixed by commit 3a0a52997 ("block,
> scsi: Make SCSI quiesce and resume work reliably"), and the fix has been
> verified by the reporter on the kernel list.
>
>>
>> And did the Huawei guy (Jonathan Cameron) confirm his performance issue
>> was fixed (I don't think I saw an email saying he did)?
>
> The last time I talked with John Garry about this issue, the merged
> .get_budget-based patch had improved IO performance a lot, but there was
> still a small gap compared with the legacy path. It seems to be a
> driver-specific issue; I remember that removing a driver lock improved
> performance considerably.
>
> Garry, could you provide a further update on this issue?

Hi Ming,

From our testing with experimental changes to our driver to support
SCSI MQ, we were getting almost on-par performance with the legacy path.
But without those changes, MQ was hurting performance (and I would not
necessarily say that was a driver issue).

We can retest from today's mainline and see where we are.

BTW, have you got performance figures for many other single-queue HBAs
with and without CONFIG_SCSI_MQ_DEFAULT=Y?

Thanks,
John

>
> Thanks,
> Ming
>


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-30 10:08     ` Johannes Thumshirn
@ 2018-01-30 10:50       ` Mel Gorman
  0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2018-01-30 10:50 UTC


On Tue, Jan 30, 2018 at 11:08:28AM +0100, Johannes Thumshirn wrote:
> [+Cc Mel]
> Jens Axboe <axboe at kernel.dk> writes:
> > On 1/29/18 1:56 PM, James Bottomley wrote:
> >> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
> >> [...]
> >>> 2. When to enable SCSI_MQ by default again?
> >> 
> >> I'm not sure there's much to discuss ... I think the basic answer is as
> >> soon as Christoph wants to try it again.
> >
> > FWIW, internally I've been running various IO intensive workloads on
> > what is essentially 4.12 upstream with scsi-mq the default (with
> > mq-deadline as the scheduler), comparing against a previous 4.6 kernel
> > (without scsi-mq), and things are looking great.
> >
> > We're never going to iron out the last kinks with it off by default,
> > so I think we should attempt to flip the switch again for 4.16.
> 
> The 4.12 results sound interesting. I remember Mel ran some tests with
> 4.12 when we were considering flipping the config option for SLES, and
> they showed several roadblocks.
> 

Mostly due to slow storage and BFQ, where mq-deadline was not a universal
win as an alternative default. I don't have current data and have archived
what I had; it was based on 4.13-rc7 at the time, and BFQ has changed a
lot since, so the testing would need to be redone.

> I'm not sure whether he re-evaluated 4.13/4.14 on his grid though.
> 

No, it hasn't. Grid time for performance testing has been tight during
the last few months to say the least.

> But I'm definitely interested in this discussion and can even possibly
> share some benchmark results we did in our FC lab.
> 

If you remind me, I may be able to re-run the tests on a 4.16-rcX before
LSF/MM so you have other data to work with. Unfortunately, I won't be able
to make LSF/MM this time due to personal commitments that conflict and are
unmovable.

-- 
Mel Gorman
SUSE Labs


* [LSF/MM TOPIC] Two blk-mq related topics
  2018-01-30 10:33     ` John Garry
@ 2018-02-07 10:55       ` John Garry
  0 siblings, 0 replies; 13+ messages in thread
From: John Garry @ 2018-02-07 10:55 UTC


On 30/01/2018 10:33, John Garry wrote:
> On 30/01/2018 01:24, Ming Lei wrote:
>> On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
>>> On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
>>> [...]
>>>> 2. When to enable SCSI_MQ by default again?
>>>
>>> I'm not sure there's much to discuss ... I think the basic answer is as
>>> soon as Christoph wants to try it again.
>>
>> I guess Christoph still needs to evaluate whether there are existing
>> issues or blockers before trying it again. And more input may come from
>> F2F discussion, IMHO.
>>
>>>
>>>> SCSI_MQ was first introduced in v3.17, but disabled by default. In
>>>> v4.13-rc1 it was enabled by default, but the change was reverted in
>>>> v4.13-rc7, so it is disabled by default again.
>>>>
>>>> Now both the originally reported PM issue (actually SCSI quiesce) and
>>>> the sequential IO performance issue have been addressed.
>>>
>>> Is the blocker bug just not closed because no one thought to do it:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=178381
>>>
>>> (have we confirmed with the original reporter that this issue is now
>>> fixed?)
>>
>> From a developer's view, this issue is fixed by commit 3a0a52997 ("block,
>> scsi: Make SCSI quiesce and resume work reliably"), and the fix has been
>> verified by the reporter on the kernel list.
>>
>>>
>>> And did the Huawei guy (Jonathan Cameron) confirm his performance issue
>>> was fixed (I don't think I saw an email saying he did)?
>>
>> The last time I talked with John Garry about this issue, the merged
>> .get_budget-based patch had improved IO performance a lot, but there
>> was still a small gap compared with the legacy path. It seems to be a
>> driver-specific issue; I remember that removing a driver lock improved
>> performance considerably.
>>
>> Garry, could you provide a further update on this issue?
>
> Hi Ming,
>
> From our testing with experimental changes to our driver to support
> SCSI MQ, we were getting almost on-par performance with the legacy path.
> But without those changes, MQ was hurting performance (and I would not
> necessarily say that was a driver issue).
>
> We can retest from today's mainline and see where we are.
>
> BTW, have you got performance figures for many other single-queue HBAs
> with and without CONFIG_SCSI_MQ_DEFAULT=Y?

We finally got around to retesting this (on the hisi_sas controller). The
results are generally OK, in that we are no longer seeing such big
performance drops on our hardware when enabling SCSI MQ - in some
scenarios the performance is better. Generally, fio rw mode does better.

Anyway, for what it's worth, it's a green light from us to turn SCSI MQ
on by default.

John

>
> Thanks,
> John
>
>>
>> Thanks,
>> Ming
>>


Thread overview: 13+ messages
2018-01-29 15:46 [LSF/MM TOPIC] Two blk-mq related topics Ming Lei
2018-01-29 20:40 ` Mike Snitzer
2018-01-30  1:27   ` Ming Lei
2018-01-29 20:56 ` James Bottomley
2018-01-29 21:00   ` Jens Axboe
2018-01-29 23:46     ` James Bottomley
2018-01-30  1:47       ` Jens Axboe
2018-01-30 10:08     ` Johannes Thumshirn
2018-01-30 10:50       ` Mel Gorman
2018-01-30  1:24   ` Ming Lei
2018-01-30  8:33     ` Martin Steigerwald
2018-01-30 10:33     ` John Garry
2018-02-07 10:55       ` John Garry
