linux-nvme.lists.infradead.org archive mirror
* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
@ 2018-01-29  9:08 Hannes Reinecke
  2018-01-29 15:41 ` Elliott, Robert (Persistent Memory)
  2018-02-01 15:05 ` Ming Lei
  0 siblings, 2 replies; 11+ messages in thread
From: Hannes Reinecke @ 2018-01-29  9:08 UTC (permalink / raw)


Hi all,

here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').

When doing I/O tests on a machine with more CPUs than MSI-x vectors
provided by the HBA, we can easily set up a scenario where one CPU is
submitting I/O and another one is completing it. This results in the
completing CPU being stuck in the interrupt completion routine more or
less indefinitely, so the lockup detector kicks in.

How should these situations be handled?
Should it be made the responsibility of the drivers, ensuring that the
interrupt completion routine is terminated after a certain time?
Should it be made the responsibility of the upper layers?
Should it be the responsibility of the interrupt mapping code?
Can/should interrupt polling be used in these situations?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-01-29  9:08 [LSF/MM TOPIC] irq affinity handling for high CPU count machines Hannes Reinecke
@ 2018-01-29 15:41 ` Elliott, Robert (Persistent Memory)
  2018-01-29 16:37   ` Bart Van Assche
  2018-02-01 15:05 ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2018-01-29 15:41 UTC (permalink / raw)




> -----Original Message-----
> From: Linux-nvme [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf
> Of Hannes Reinecke
> Sent: Monday, January 29, 2018 3:09 AM
> To: lsf-pc at lists.linux-foundation.org
> Cc: linux-nvme at lists.infradead.org; linux-scsi at vger.kernel.org; Kashyap
> Desai <kashyap.desai at broadcom.com>
> Subject: [LSF/MM TOPIC] irq affinity handling for high CPU count machines
> 
> Hi all,
> 
> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> 
> When doing I/O tests on a machine with more CPUs than MSIx vectors
> provided by the HBA we can easily setup a scenario where one CPU is
> submitting I/O and the other one is completing I/O. Which will result in
> the latter CPU being stuck in the interrupt completion routine for
> basically ever, resulting in the lockup detector kicking in.
> 
> How should these situations be handled?
> Should it be made the responsibility of the drivers, ensuring that the
> interrupt completion routine is terminated after a certain time?
> Should it be made the responsibility of the upper layers?
> Should it be the responsibility of the interrupt mapping code?
> Can/should interrupt polling be used in these situations?

Back when we introduced scsi-mq with hpsa, the best approach was to
route interrupts and completion handling so each CPU core handles its
own submissions; this way, they are self-throttling.

Every other arrangement was subject to soft lockups and other problems
when the completion CPUs become overwhelmed with work.

See https://lkml.org/lkml/2014/9/9/931.
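
As a rough sketch of that arrangement (illustrative only, not hpsa's
actual code; the helper below simply spreads the hint round-robin over
online CPUs):

#include <linux/cpumask.h>
#include <linux/interrupt.h>
#include <linux/pci.h>

/*
 * Illustrative sketch: bind each MSI-x vector to a single CPU so that
 * the CPU submitting on queue i also services queue i's completion
 * interrupt. The hint is honoured e.g. by irqbalance --hintpolicy=exact.
 */
static void pin_completion_vectors(struct pci_dev *pdev, int nvec)
{
	int i, cpu = -1;

	for (i = 0; i < nvec; i++) {
		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
		irq_set_affinity_hint(pci_irq_vector(pdev, i),
				      cpumask_of(cpu));
	}
}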

---
Robert Elliott, HPE Persistent Memory


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-01-29 15:41 ` Elliott, Robert (Persistent Memory)
@ 2018-01-29 16:37   ` Bart Van Assche
  2018-01-29 16:42     ` Kashyap Desai
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2018-01-29 16:37 UTC (permalink / raw)


On 01/29/18 07:41, Elliott, Robert (Persistent Memory) wrote:
>> -----Original Message-----
>> From: Linux-nvme [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf
>> Of Hannes Reinecke
>> Sent: Monday, January 29, 2018 3:09 AM
>> To: lsf-pc at lists.linux-foundation.org
>> Cc: linux-nvme at lists.infradead.org; linux-scsi at vger.kernel.org; Kashyap
>> Desai <kashyap.desai at broadcom.com>
>> Subject: [LSF/MM TOPIC] irq affinity handling for high CPU count machines
>>
>> Hi all,
>>
>> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
>> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
>>
>> When doing I/O tests on a machine with more CPUs than MSIx vectors
>> provided by the HBA we can easily setup a scenario where one CPU is
>> submitting I/O and the other one is completing I/O. Which will result in
>> the latter CPU being stuck in the interrupt completion routine for
>> basically ever, resulting in the lockup detector kicking in.
>>
>> How should these situations be handled?
>> Should it be made the responsibility of the drivers, ensuring that the
>> interrupt completion routine is terminated after a certain time?
>> Should it be made the responsibility of the upper layers?
>> Should it be the responsibility of the interrupt mapping code?
>> Can/should interrupt polling be used in these situations?
> 
> Back when we introduced scsi-mq with hpsa, the best approach was to
> route interrupts and completion handling so each CPU core handles its
> own submissions; this way, they are self-throttling.

That approach may work for the hpsa adapter but I'm not sure whether it 
works for all adapter types. It has already been observed with the SRP 
initiator driver running inside a VM that a single core spent all its 
time processing IB interrupts.

Additionally, only initiator workloads are self-throttling. Target style 
workloads are not self-throttling.

In other words, I think this topic is worth discussing further.

Bart.


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-01-29 16:37   ` Bart Van Assche
@ 2018-01-29 16:42     ` Kashyap Desai
  0 siblings, 0 replies; 11+ messages in thread
From: Kashyap Desai @ 2018-01-29 16:42 UTC (permalink / raw)


> -----Original Message-----
> From: Bart Van Assche [mailto:bart.vanassche at wdc.com]
> Sent: Monday, January 29, 2018 10:08 PM
> To: Elliott, Robert (Persistent Memory); Hannes Reinecke;
> lsf-pc at lists.linux-
> foundation.org
> Cc: linux-scsi at vger.kernel.org; linux-nvme at lists.infradead.org; Kashyap
> Desai
> Subject: Re: [LSF/MM TOPIC] irq affinity handling for high CPU count
> machines
>
> On 01/29/18 07:41, Elliott, Robert (Persistent Memory) wrote:
> >> -----Original Message-----
> >> From: Linux-nvme [mailto:linux-nvme-bounces at lists.infradead.org] On
> >> Behalf Of Hannes Reinecke
> >> Sent: Monday, January 29, 2018 3:09 AM
> >> To: lsf-pc at lists.linux-foundation.org
> >> Cc: linux-nvme at lists.infradead.org; linux-scsi at vger.kernel.org;
> >> Kashyap Desai <kashyap.desai at broadcom.com>
> >> Subject: [LSF/MM TOPIC] irq affinity handling for high CPU count
> >> machines
> >>
> >> Hi all,
> >>
> >> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> >> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> >>
> >> When doing I/O tests on a machine with more CPUs than MSIx vectors
> >> provided by the HBA we can easily setup a scenario where one CPU is
> >> submitting I/O and the other one is completing I/O. Which will result
> >> in the latter CPU being stuck in the interrupt completion routine for
> >> basically ever, resulting in the lockup detector kicking in.
> >>
> >> How should these situations be handled?
> >> Should it be made the responsibility of the drivers, ensuring that
> >> the interrupt completion routine is terminated after a certain time?
> >> Should it be made the responsibility of the upper layers?
> >> Should it be the responsibility of the interrupt mapping code?
> >> Can/should interrupt polling be used in these situations?
> >
> > Back when we introduced scsi-mq with hpsa, the best approach was to
> > route interrupts and completion handling so each CPU core handles its
> > own submissions; this way, they are self-throttling.


The ideal scenario is to make sure the submitting CPU is the one interrupted
for its completions.  That is not something we can manage via tuning such as
rq_affinity=2 (and irqbalance's exact hint policy) if we have more CPUs than
MSI-x vectors supported by the controller. If we use the irq_poll interface
with a reasonable weight, we will no longer see CPU lockups, because the
low-level driver quits the ISR after each weighted batch of completions.
There is always a chance of back-to-back completion pressure on the same
CPU, but with the irq_poll design the watchdog task still gets to run and
its timestamp is updated. Using irq_poll we may see close to 100% CPU
consumption, but no lockup will be detected.
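
Roughly, the pattern looks like the sketch below. The hba_*() helpers are
hypothetical stand-ins for driver-specific reply-queue handling; only the
irq_poll_*() calls are the actual kernel API, and the weight of 64 is just
an example.

#include <linux/interrupt.h>
#include <linux/irq_poll.h>
#include <linux/kernel.h>

#define REPLY_QUEUE_POLL_WEIGHT	64	/* max completions per poll pass */

struct reply_queue {
	struct irq_poll iop;
	/* driver-specific reply queue state would live here */
};

static int hba_process_one_reply(struct reply_queue *rq);	/* hypothetical */
static void hba_reply_irq_disable(struct reply_queue *rq);	/* hypothetical */
static void hba_reply_irq_enable(struct reply_queue *rq);	/* hypothetical */

/* Poll callback: handle at most 'budget' completions, then yield the CPU. */
static int reply_queue_poll(struct irq_poll *iop, int budget)
{
	struct reply_queue *rq = container_of(iop, struct reply_queue, iop);
	int done = 0;

	while (done < budget && hba_process_one_reply(rq))
		done++;

	if (done < budget) {
		/* queue drained: stop polling and re-arm the interrupt */
		irq_poll_complete(iop);
		hba_reply_irq_enable(rq);
	}
	return done;
}

/* Hard IRQ handler: mask the reply queue and defer the work to irq_poll. */
static irqreturn_t reply_queue_isr(int irq, void *data)
{
	struct reply_queue *rq = data;

	hba_reply_irq_disable(rq);
	irq_poll_sched(&rq->iop);
	return IRQ_HANDLED;
}

static void reply_queue_setup(struct reply_queue *rq)
{
	irq_poll_init(&rq->iop, REPLY_QUEUE_POLL_WEIGHT, reply_queue_poll);
}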

>
> That approach may work for the hpsa adapter but I'm not sure whether it
> works for all adapter types. It has already been observed with the SRP
> initiator
> driver running inside a VM that a single core spent all its time
> processing IB
> interrupts.
>
> Additionally, only initiator workloads are self-throttling. Target style
> workloads are not self-throttling.
>
> In other words, I think it's worth to discuss this topic further.
>
> Bart.
>


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-01-29  9:08 [LSF/MM TOPIC] irq affinity handling for high CPU count machines Hannes Reinecke
  2018-01-29 15:41 ` Elliott, Robert (Persistent Memory)
@ 2018-02-01 15:05 ` Ming Lei
  2018-02-01 16:20   ` Hannes Reinecke
  1 sibling, 1 reply; 11+ messages in thread
From: Ming Lei @ 2018-02-01 15:05 UTC (permalink / raw)


Hello Hannes,

On Mon, Jan 29, 2018 at 10:08:43AM +0100, Hannes Reinecke wrote:
> Hi all,
> 
> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> 
> When doing I/O tests on a machine with more CPUs than MSIx vectors
> provided by the HBA we can easily setup a scenario where one CPU is
> submitting I/O and the other one is completing I/O. Which will result in
> the latter CPU being stuck in the interrupt completion routine for
> basically ever, resulting in the lockup detector kicking in.

Today I was looking at a megaraid_sas related issue and found that
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) is used in the driver, so it looks
like each reply queue is handled by more than one CPU when there are more
CPUs than MSI-x vectors in the system; that spreading is done by the
generic irq affinity code, see kernel/irq/affinity.c.
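
For reference, the allocation pattern in question looks roughly like this
(function and variable names are made up; pci_alloc_irq_vectors() and
pci_irq_get_affinity() are the real API):

#include <linux/pci.h>

static int hba_alloc_reply_queue_vectors(struct pci_dev *pdev, int max_queues)
{
	int nvec, i;

	/* the core spreads all (possible) CPUs across the vectors */
	nvec = pci_alloc_irq_vectors(pdev, 1, max_queues,
				     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (nvec < 0)
		return nvec;

	for (i = 0; i < nvec; i++) {
		const struct cpumask *mask = pci_irq_get_affinity(pdev, i);

		/* with nr_cpus > nvec, several CPUs land on the same vector */
		pr_info("reply queue %d serves CPUs %*pbl\n",
			i, cpumask_pr_args(mask));
	}
	return nvec;
}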

Also, IMO each reply queue could be treated as a blk-mq hw queue, and then
megaraid might benefit from blk-mq's MQ framework, but one annoying thing is
that both the legacy and the blk-mq path need to be handled inside the driver.
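
A rough sketch of that direction is below (the hba_instance fields are
hypothetical; nr_hw_queues, the map_queues hook and blk_mq_pci_map_queues()
are the existing scsi-mq/blk-mq interfaces, and the shared-tag question
discussed further down still applies):

#include <linux/blk-mq-pci.h>
#include <scsi/scsi_host.h>

struct hba_instance {			/* hypothetical driver state */
	struct pci_dev *pdev;
	int reply_queue_count;		/* == number of MSI-x vectors */
};

/* .map_queues hook in the scsi_host_template: reuse the PCI
 * managed-affinity CPU<->vector spread for the CPU<->hctx mapping. */
static int hba_map_queues(struct Scsi_Host *shost)
{
	struct hba_instance *instance = shost_priv(shost);

	return blk_mq_pci_map_queues(&shost->tag_set, instance->pdev);
}

/* in the probe path, before scsi_add_host() */
static void hba_setup_mq(struct Scsi_Host *shost, struct hba_instance *instance)
{
	/* expose one blk-mq hw queue per hardware reply queue */
	shost->nr_hw_queues = instance->reply_queue_count;
}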

> 
> How should these situations be handled?
> Should it be made the responsibility of the drivers, ensuring that the
> interrupt completion routine is terminated after a certain time?
> Should it be made the responsibility of the upper layers?
> Should it be the responsibility of the interrupt mapping code?
> Can/should interrupt polling be used in these situations?

Yeah, I guess interrupt polling may improve these situations, especially
since KPTI introduces some extra cost into interrupt handling.

Thanks,
Ming


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-02-01 15:05 ` Ming Lei
@ 2018-02-01 16:20   ` Hannes Reinecke
  2018-02-01 16:59     ` Kashyap Desai
  2018-02-02  1:55     ` Ming Lei
  0 siblings, 2 replies; 11+ messages in thread
From: Hannes Reinecke @ 2018-02-01 16:20 UTC (permalink / raw)


On 02/01/2018 04:05 PM, Ming Lei wrote:
> Hello Hannes,
> 
> On Mon, Jan 29, 2018 at 10:08:43AM +0100, Hannes Reinecke wrote:
>> Hi all,
>>
>> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
>> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
>>
>> When doing I/O tests on a machine with more CPUs than MSIx vectors
>> provided by the HBA we can easily setup a scenario where one CPU is
>> submitting I/O and the other one is completing I/O. Which will result in
>> the latter CPU being stuck in the interrupt completion routine for
>> basically ever, resulting in the lockup detector kicking in.
> 
> Today I am looking at one megaraid_sas related issue, and found
> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) is used in the driver, so looks
> each reply queue has been handled by more than one CPU if there are more
> CPUs than MSIx vectors in the system, which is done by generic irq affinity
> code, please see kernel/irq/affinity.c.
> 
> Also IMO each reply queue may be treated as blk-mq's hw queue, then
> megaraid may benefit from blk-mq's MQ framework, but one annoying thing is
> that both legacy and blk-mq path need to be handled inside driver.
> 
The megaraid driver is a really strange beast, having layered two
different interfaces (the 'legacy' MFI interface and the one from
mpt3sas) on top of each other.
I had been thinking of converting it to scsi-mq, too (as my mpt3sas
patch finally went in), but I'm not sure if we can benefit from it as
we'll still be bound by the HBA-wide tag pool.
It's on my todo list, albeit pretty far down :-)

>>
>> How should these situations be handled?
>> Should it be made the responsibility of the drivers, ensuring that the
>> interrupt completion routine is terminated after a certain time?
>> Should it be made the responsibility of the upper layers?
>> Should it be the responsibility of the interrupt mapping code?
>> Can/should interrupt polling be used in these situations?
> 
> Yeah, I guess interrupt polling may improve these situations, especially
> KPTI introduces some extra cost in interrupt handling.
> 
The question is not so much whether one should be doing irq polling, but
rather whether we can come up with some guidance or even infrastructure to
make this happen automatically.
Having to rely on individual drivers to get this right is probably not
the best option.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-02-01 16:20   ` Hannes Reinecke
@ 2018-02-01 16:59     ` Kashyap Desai
  2018-02-02  2:02       ` Ming Lei
  2018-02-02  1:55     ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Kashyap Desai @ 2018-02-01 16:59 UTC (permalink / raw)


> -----Original Message-----
> From: Hannes Reinecke [mailto:hare at suse.de]
> Sent: Thursday, February 1, 2018 9:50 PM
> To: Ming Lei
> Cc: lsf-pc at lists.linux-foundation.org; linux-scsi at vger.kernel.org; linux-
> nvme at lists.infradead.org; Kashyap Desai
> Subject: Re: [LSF/MM TOPIC] irq affinity handling for high CPU count
> machines
>
> On 02/01/2018 04:05 PM, Ming Lei wrote:
> > Hello Hannes,
> >
> > On Mon, Jan 29, 2018 at 10:08:43AM +0100, Hannes Reinecke wrote:
> >> Hi all,
> >>
> >> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> >> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> >>
> >> When doing I/O tests on a machine with more CPUs than MSIx vectors
> >> provided by the HBA we can easily setup a scenario where one CPU is
> >> submitting I/O and the other one is completing I/O. Which will result
> >> in the latter CPU being stuck in the interrupt completion routine for
> >> basically ever, resulting in the lockup detector kicking in.
> >
> > Today I am looking at one megaraid_sas related issue, and found
> > pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) is used in the driver, so
> > looks each reply queue has been handled by more than one CPU if there
> > are more CPUs than MSIx vectors in the system, which is done by
> > generic irq affinity code, please see kernel/irq/affinity.c.

Yes, that is the problematic area. If CPUs and MSI-x vectors (reply queues)
are mapped 1:1, we don't have any issue.

> >
> > Also IMO each reply queue may be treated as blk-mq's hw queue, then
> > megaraid may benefit from blk-mq's MQ framework, but one annoying
> > thing is that both legacy and blk-mq path need to be handled inside
> > driver.

Both the MR and IT drivers (due to H/W design) use the blk-mq framework, but
it is really a single h/w queue.
The IT and MR HBAs have a single submission queue and multiple reply queues.

> >
> The megaraid driver is a really strange beast, having layered two
> different
> interfaces (the 'legacy' MFI interface and the one from
> mpt3sas) on top of each other.
> I had been thinking of converting it to scsi-mq, too (as my mpt3sas patch
> finally went in), but I'm not sure if we can benefit from it as we'll
> still be
> bound by the HBA-wide tag pool.
> It's on my todo list, albeit pretty far down :-)

Hannes, this is essentially the same in both MR (megaraid_sas) and IT (mpt3sas).
Both drivers use a shared HBA-wide tag pool.
Both the MR and IT drivers use request->tag to get a command from the free pool.

>
> >>
> >> How should these situations be handled?
> >> Should it be made the responsibility of the drivers, ensuring that
> >> the interrupt completion routine is terminated after a certain time?
> >> Should it be made the responsibility of the upper layers?
> >> Should it be the responsibility of the interrupt mapping code?
> >> Can/should interrupt polling be used in these situations?
> >
> > Yeah, I guess interrupt polling may improve these situations,
> > especially KPTI introduces some extra cost in interrupt handling.
> >
> The question is not so much if one should be doing irq polling, but rather
> if we
> can come up with some guidance or even infrastructure to make this happen
> automatically.
> Having to rely on individual drivers to get this right is probably not the
> best
> option.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare at suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284
> (AG Nürnberg)


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-02-01 16:20   ` Hannes Reinecke
  2018-02-01 16:59     ` Kashyap Desai
@ 2018-02-02  1:55     ` Ming Lei
  1 sibling, 0 replies; 11+ messages in thread
From: Ming Lei @ 2018-02-02  1:55 UTC (permalink / raw)


Hello Hannes,

On Thu, Feb 01, 2018 at 05:20:26PM +0100, Hannes Reinecke wrote:
> On 02/01/2018 04:05 PM, Ming Lei wrote:
> > Hello Hannes,
> > 
> > On Mon, Jan 29, 2018 at 10:08:43AM +0100, Hannes Reinecke wrote:
> >> Hi all,
> >>
> >> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> >> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> >>
> >> When doing I/O tests on a machine with more CPUs than MSIx vectors
> >> provided by the HBA we can easily setup a scenario where one CPU is
> >> submitting I/O and the other one is completing I/O. Which will result in
> >> the latter CPU being stuck in the interrupt completion routine for
> >> basically ever, resulting in the lockup detector kicking in.
> > 
> > Today I am looking at one megaraid_sas related issue, and found
> > pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) is used in the driver, so looks
> > each reply queue has been handled by more than one CPU if there are more
> > CPUs than MSIx vectors in the system, which is done by generic irq affinity
> > code, please see kernel/irq/affinity.c.
> > 
> > Also IMO each reply queue may be treated as blk-mq's hw queue, then
> > megaraid may benefit from blk-mq's MQ framework, but one annoying thing is
> > that both legacy and blk-mq path need to be handled inside driver.
> > 
> The megaraid driver is a really strange beast, having layered two
> different interfaces (the 'legacy' MFI interface and the one from
> mpt3sas) on top of each other.
> I had been thinking of converting it to scsi-mq, too (as my mpt3sas
> patch finally went in), but I'm not sure if we can benefit from it as
> we'll still be bound by the HBA-wide tag pool.

Actually the current SCSI_MQ works in this HBA-wide tag pool mode too;
please see scsi_host_queue_ready(), which is called in scsi_queue_rq(),
and likewise scsi_mq_get_budget().

It seems a bit weird for the real MQ case: even though the tag is allocated
from per-hctx tags, the host-wide queue depth still needs to be respected,
so in the end it behaves just like an HBA-wide tag pool.

That is something which needs to be discussed too.

Also, I remember you posted a patch for sharing tags among hctxs; that
should help convert the reply queues into scsi_mq/blk_mq hctxs.
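
Conceptually, every submission path does something like the sketch below,
regardless of which hctx the tag came from (illustrative only, not the
actual scsi_host_queue_ready() code):

#include <linux/atomic.h>
#include <linux/types.h>

struct fake_host {
	atomic_t host_busy;	/* outstanding commands, host-wide */
	int can_queue;		/* HBA-wide queue depth */
};

/* called from every hw queue's queue_rq path before dispatching */
static bool fake_host_queue_ready(struct fake_host *h)
{
	if (atomic_inc_return(&h->host_busy) > h->can_queue) {
		atomic_dec(&h->host_busy);	/* over budget: back off, requeue later */
		return false;
	}
	return true;
}

/* called on command completion, from any reply queue */
static void fake_host_queue_done(struct fake_host *h)
{
	atomic_dec(&h->host_busy);
}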

> It's on my todo list, albeit pretty far down :-)
> 
> >>
> >> How should these situations be handled?
> >> Should it be made the responsibility of the drivers, ensuring that the
> >> interrupt completion routine is terminated after a certain time?
> >> Should it be made the responsibility of the upper layers?
> >> Should it be the responsibility of the interrupt mapping code?
> >> Can/should interrupt polling be used in these situations?
> > 
> > Yeah, I guess interrupt polling may improve these situations, especially
> > KPTI introduces some extra cost in interrupt handling.
> > 
> The question is not so much if one should be doing irq polling, but
> rather if we can come up with some guidance or even infrastructure to
> make this happen automatically.
> Having to rely on individual drivers to get this right is probably not
> the best option.

Agree.

Thanks,
Ming


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-02-01 16:59     ` Kashyap Desai
@ 2018-02-02  2:02       ` Ming Lei
  2018-02-02  8:49         ` Kashyap Desai
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2018-02-02  2:02 UTC (permalink / raw)


Hello Kashyap,

On Thu, Feb 01, 2018 at 10:29:22PM +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Hannes Reinecke [mailto:hare at suse.de]
> > Sent: Thursday, February 1, 2018 9:50 PM
> > To: Ming Lei
> > Cc: lsf-pc at lists.linux-foundation.org; linux-scsi at vger.kernel.org; linux-
> > nvme at lists.infradead.org; Kashyap Desai
> > Subject: Re: [LSF/MM TOPIC] irq affinity handling for high CPU count
> > machines
> >
> > On 02/01/2018 04:05 PM, Ming Lei wrote:
> > > Hello Hannes,
> > >
> > > On Mon, Jan 29, 2018 at 10:08:43AM +0100, Hannes Reinecke wrote:
> > >> Hi all,
> > >>
> > >> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> > >> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> > >>
> > >> When doing I/O tests on a machine with more CPUs than MSIx vectors
> > >> provided by the HBA we can easily setup a scenario where one CPU is
> > >> submitting I/O and the other one is completing I/O. Which will result
> > >> in the latter CPU being stuck in the interrupt completion routine for
> > >> basically ever, resulting in the lockup detector kicking in.
> > >
> > > Today I am looking at one megaraid_sas related issue, and found
> > > pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) is used in the driver, so
> > > looks each reply queue has been handled by more than one CPU if there
> > > are more CPUs than MSIx vectors in the system, which is done by
> > > generic irq affinity code, please see kernel/irq/affinity.c.
> 
> Yes. That is a problematic area. If CPU and MSI-x(reply queue) is 1:1
> mapped, we don't have any issue.

I guess the problematic area is similar to the one in the following link:

	https://marc.info/?l=linux-kernel&m=151748144730409&w=2

otherwise, could you explain a bit more about the area?

> 
> > >
> > > Also IMO each reply queue may be treated as blk-mq's hw queue, then
> > > megaraid may benefit from blk-mq's MQ framework, but one annoying
> > > thing is that both legacy and blk-mq path need to be handled inside
> > > driver.
> 
> Both MR and IT driver is (due to H/W design.) is using blk-mq frame work but
> it is really  a single h/w queue.
> IT and MR HBA has single submission queue and multiple reply queue.

It should be covered by MQ; we just need to share tags among hctxs, like
what Hannes posted a long time ago.

> 
> > >
> > The megaraid driver is a really strange beast, having layered two
> > different
> > interfaces (the 'legacy' MFI interface and the one from
> > mpt3sas) on top of each other.
> > I had been thinking of converting it to scsi-mq, too (as my mpt3sas patch
> > finally went in), but I'm not sure if we can benefit from it as we'll
> > still be
> > bound by the HBA-wide tag pool.
> > It's on my todo list, albeit pretty far down :-)
> 
> Hannes, this is typically same in both MR (megaraid_sas) and IT (mpt3sas).
> Both the driver is using shared HBA-wide tag pool.
> Both MR and IT driver use request->tag to get command from free pool.

That seems to be a generic thing; it is the same with HPSA too.


Thanks,
Ming


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-02-02  2:02       ` Ming Lei
@ 2018-02-02  8:49         ` Kashyap Desai
  2018-02-02 10:20           ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Kashyap Desai @ 2018-02-02  8:49 UTC (permalink / raw)


> > > > Today I am looking at one megaraid_sas related issue, and found
> > > > pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) is used in the driver, so
> > > > looks each reply queue has been handled by more than one CPU if
> > > > there are more CPUs than MSIx vectors in the system, which is done
> > > > by generic irq affinity code, please see kernel/irq/affinity.c.
> >
> > Yes. That is a problematic area. If CPU and MSI-x(reply queue) is 1:1
> > mapped, we don't have any issue.
>
> I guess the problematic area is similar with the following link:
>
> 	https://marc.info/?l=linux-kernel&m=151748144730409&w=2

Hi Ming,

The above-mentioned link is a different discussion and looks like a generic
issue. megaraid_sas/mpt3sas will show the same symptoms if an irq affinity
mask contains only offline CPUs.
Just for info: in such a condition we can ask users to disable the affinity
hint via the module parameter "smp_affinity_enable".

>
> otherwise could you explain a bit about the area?

Please check the post below for more details.

https://marc.info/?l=linux-scsi&m=151601833418346&w=2


* [LSF/MM TOPIC] irq affinity handling for high CPU count machines
  2018-02-02  8:49         ` Kashyap Desai
@ 2018-02-02 10:20           ` Ming Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2018-02-02 10:20 UTC (permalink / raw)


Hi Kashyap,

On Fri, Feb 02, 2018 at 02:19:01PM +0530, Kashyap Desai wrote:
> > > > > Today I am looking at one megaraid_sas related issue, and found
> > > > > pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) is used in the driver, so
> > > > > looks each reply queue has been handled by more than one CPU if
> > > > > there are more CPUs than MSIx vectors in the system, which is done
> > > > > by generic irq affinity code, please see kernel/irq/affinity.c.
> > >
> > > Yes. That is a problematic area. If CPU and MSI-x(reply queue) is 1:1
> > > mapped, we don't have any issue.
> >
> > I guess the problematic area is similar with the following link:
> >
> > 	https://marc.info/?l=linux-kernel&m=151748144730409&w=2
> 
> Hi Ming,
> 
> Above mentioned link is different discussion and looks like a generic
> issue. megaraid_sas/mpt3sas will have same symptoms if irq affinity has
> only offline CPUs.

If you convert to SCSI_MQ/MQ, it becomes a generic issue that is solved by
a generic solution; otherwise it is the driver's responsibility to make
sure it does not use a reply queue to which no online CPU is mapped.
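
i.e. something along these lines in the driver (sketch only; the helper
name is made up, pci_irq_get_affinity() and cpumask_intersects() are the
real API):

#include <linux/cpumask.h>
#include <linux/pci.h>

static bool hba_reply_queue_usable(struct pci_dev *pdev, int queue)
{
	const struct cpumask *mask = pci_irq_get_affinity(pdev, queue);

	/* if every CPU mapped to this vector is offline, completions on
	 * this reply queue would never be handled, so skip it */
	return mask && cpumask_intersects(mask, cpu_online_mask);
}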

> Just for info - "In such condition, we can ask users to disable affinity
> hit via module parameter " smp_affinity_enable".

Yeah, that is exactly what I suggested to our QE friend.

> 
> >
> > otherwise could you explain a bit about the area?
> 
> Please check below post for more details.
> 
> https://marc.info/?l=linux-scsi&m=151601833418346&w=2

It seems SCSI_MQ/MQ can solve this issue; I have replied on the above link,
and we can discuss further on that thread.


Thanks,
Ming

