[PATCH v3] genirq/matrix: Choose CPU for managed IRQs based on how many of them are allocated


* [PATCH v3] genirq/matrix: Choose CPU for managed IRQs based on how many of them are allocated
From: Long Li @ 2018-11-02  0:34 UTC
  To: Thomas Gleixner, linux-kernel; +Cc: Long Li

From: Long Li <longli@microsoft.com>

On a large system with multiple devices of the same class (e.g. NVMe disks,
using managed IRQs), the kernel tends to concentrate their IRQs on several
CPUs.

The issue is that when NVMe calls irq_matrix_alloc_managed(), the assigned
CPU tends to be one of the first several CPUs in the cpumask, because the
selection checks cpumap->available, which does not change after managed IRQs
are reserved.

A managed IRQ tends to be reserved on more than one CPU, based on the cpumask
passed to irq_matrix_reserve_managed(). But later, when a CPU is actually
allocated for this IRQ, only one CPU is chosen. Because "available" is
calculated at the time the managed IRQ is reserved, it tends to indicate that
a CPU has more IRQs than the number actually assigned to it.

To get a more even distribution of managed IRQs, we need to keep track of how
many of them are allocated on a given CPU. Introduce "managed_allocated" in
struct cpumap to track the managed IRQs that are allocated on this CPU, and
use this information when choosing a CPU for a managed IRQ.
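
The effect described above can be illustrated with a small userspace model.
This is only a sketch under stated assumptions, not kernel code: struct
cpumap_model, NR_CPUS, place_irqs() and the two pick_*() helpers are
hypothetical names, the "online" check is omitted, and "available" is
assumed to stay constant while managed IRQs are allocated.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS 4	/* hypothetical small system */

/* Simplified stand-in for struct cpumap; not the real kernel layout. */
struct cpumap_model {
	unsigned int available;		/* unchanged by managed allocation */
	unsigned int managed_allocated;	/* managed IRQs placed on this CPU */
};

/* Old policy: pick the CPU with the most "available" vectors. */
static unsigned int pick_by_available(const struct cpumap_model *maps)
{
	unsigned int cpu, best_cpu = UINT_MAX, maxavl = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (maps[cpu].available <= maxavl)
			continue;
		best_cpu = cpu;
		maxavl = maps[cpu].available;
	}
	return best_cpu;
}

/* New policy: pick the CPU with the fewest managed IRQs allocated. */
static unsigned int pick_by_managed_allocated(const struct cpumap_model *maps)
{
	unsigned int cpu, best_cpu = UINT_MAX, allocated = UINT_MAX;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		/* Same comparison as the patch: ties go to the later CPU. */
		if (maps[cpu].managed_allocated > allocated)
			continue;
		best_cpu = cpu;
		allocated = maps[cpu].managed_allocated;
	}
	return best_cpu;
}

/* Place nr_irqs managed IRQs one at a time using the given policy. */
static void place_irqs(struct cpumap_model *maps, unsigned int nr_irqs,
		       unsigned int (*pick)(const struct cpumap_model *))
{
	unsigned int i, cpu;

	for (i = 0; i < nr_irqs; i++) {
		cpu = pick(maps);
		assert(cpu < NR_CPUS);
		/* Only the managed counter changes; "available" does not. */
		maps[cpu].managed_allocated++;
	}
}
```

In this model, with four CPUs that all start with the same "available"
value, placing eight IRQs with pick_by_available() puts all of them on the
first CPU, while pick_by_managed_allocated() ends with two on each CPU.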

Signed-off-by: Long Li <longli@microsoft.com>
---
 kernel/irq/matrix.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 6e6d467f3dec..94dd173f24d6 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -14,6 +14,7 @@ struct cpumap {
 	unsigned int		available;
 	unsigned int		allocated;
 	unsigned int		managed;
+	unsigned int		managed_allocated;
 	bool			initialized;
 	bool			online;
 	unsigned long		alloc_map[IRQ_MATRIX_SIZE];
@@ -145,6 +146,27 @@ static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
 	return best_cpu;
 }

+/* Find the best CPU which has the lowest number of managed IRQs allocated */
+static unsigned int matrix_find_best_cpu_managed(struct irq_matrix *m,
+						const struct cpumask *msk)
+{
+	unsigned int cpu, best_cpu, allocated = UINT_MAX;
+	struct cpumap *cm;
+
+	best_cpu = UINT_MAX;
+
+	for_each_cpu(cpu, msk) {
+		cm = per_cpu_ptr(m->maps, cpu);
+
+		if (!cm->online || cm->managed_allocated > allocated)
+			continue;
+
+		best_cpu = cpu;
+		allocated = cm->managed_allocated;
+	}
+	return best_cpu;
+}
+
 /**
  * irq_matrix_assign_system - Assign system wide entry in the matrix
  * @m:		Matrix pointer
@@ -269,7 +291,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
 	if (cpumask_empty(msk))
 		return -EINVAL;

-	cpu = matrix_find_best_cpu(m, msk);
+	cpu = matrix_find_best_cpu_managed(m, msk);
 	if (cpu == UINT_MAX)
 		return -ENOSPC;

@@ -282,6 +304,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
 		return -ENOSPC;
 	set_bit(bit, cm->alloc_map);
 	cm->allocated++;
+	cm->managed_allocated++;
 	m->total_allocated++;
 	*mapped_cpu = cpu;
 	trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
-- 
2.14.1


* Re: [PATCH v3] genirq/matrix: Choose CPU for managed IRQs based on how many of them are allocated
From: Thomas Gleixner @ 2018-11-03 15:06 UTC
  To: Long Li; +Cc: linux-kernel

Long,

On Fri, 2 Nov 2018, Long Li wrote:
>  /**
>   * irq_matrix_assign_system - Assign system wide entry in the matrix
>   * @m:		Matrix pointer
> @@ -269,7 +291,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
>  	if (cpumask_empty(msk))
>  		return -EINVAL;
>  
> -	cpu = matrix_find_best_cpu(m, msk);
> +	cpu = matrix_find_best_cpu_managed(m, msk);
>  	if (cpu == UINT_MAX)
>  		return -ENOSPC;
>  
> @@ -282,6 +304,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
>  		return -ENOSPC;
>  	set_bit(bit, cm->alloc_map);
>  	cm->allocated++;
> +	cm->managed_allocated++;
>  	m->total_allocated++;
>  	*mapped_cpu = cpu;
>  	trace_irq_matrix_alloc_managed(bit, cpu, m, cm);

so far so good. But what exactly decrements managed_allocated ?

Thanks,

	tglx


* Re: [PATCH v3] genirq/matrix: Choose CPU for managed IRQs based on how many of them are allocated
From: Thomas Gleixner @ 2018-11-03 17:15 UTC
  To: Long Li; +Cc: linux-kernel

On Sat, 3 Nov 2018, Thomas Gleixner wrote:
> On Fri, 2 Nov 2018, Long Li wrote:
> >  /**
> >   * irq_matrix_assign_system - Assign system wide entry in the matrix
> >   * @m:		Matrix pointer
> > @@ -269,7 +291,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
> >  	if (cpumask_empty(msk))
> >  		return -EINVAL;
> >  
> > -	cpu = matrix_find_best_cpu(m, msk);
> > +	cpu = matrix_find_best_cpu_managed(m, msk);
> >  	if (cpu == UINT_MAX)
> >  		return -ENOSPC;
> >  
> > @@ -282,6 +304,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
> >  		return -ENOSPC;
> >  	set_bit(bit, cm->alloc_map);
> >  	cm->allocated++;
> > +	cm->managed_allocated++;
> >  	m->total_allocated++;
> >  	*mapped_cpu = cpu;
> >  	trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
> 
> so far so good. But what exactly decrements managed_allocated ?

Another thing. If we add that counter, then it would be good to expose it
in the debugfs files as well.

Thanks,

	tglx


* RE: [PATCH v3] genirq/matrix: Choose CPU for managed IRQs based on how many of them are allocated
From: Long Li @ 2018-11-03 23:54 UTC
  To: Thomas Gleixner; +Cc: linux-kernel@vger.kernel.org

> Subject: Re: [PATCH v3] genirq/matrix: Choose CPU for managed IRQs based
> on how many of them are allocated
> 
> On Sat, 3 Nov 2018, Thomas Gleixner wrote:
> > On Fri, 2 Nov 2018, Long Li wrote:
> > >  /**
> > >   * irq_matrix_assign_system - Assign system wide entry in the matrix
> > >   * @m:		Matrix pointer
> > > @@ -269,7 +291,7 @@ int irq_matrix_alloc_managed(struct irq_matrix
> *m, const struct cpumask *msk,
> > >  	if (cpumask_empty(msk))
> > >  		return -EINVAL;
> > >
> > > -	cpu = matrix_find_best_cpu(m, msk);
> > > +	cpu = matrix_find_best_cpu_managed(m, msk);
> > >  	if (cpu == UINT_MAX)
> > >  		return -ENOSPC;
> > >
> > > @@ -282,6 +304,7 @@ int irq_matrix_alloc_managed(struct irq_matrix
> *m, const struct cpumask *msk,
> > >  		return -ENOSPC;
> > >  	set_bit(bit, cm->alloc_map);
> > >  	cm->allocated++;
> > > +	cm->managed_allocated++;
> > >  	m->total_allocated++;
> > >  	*mapped_cpu = cpu;
> > >  	trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
> >
> > so far so good. But what exactly decrements managed_allocated ?
> 
> Another thing. If we add that counter, then it would be good to expose it in
> the debugfs files as well.

I will send an update to address those.
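
For reference, the missing decrement raised in the review might look roughly
like the following userspace model. This is a sketch only, not the follow-up
patch: struct cpumap_model, model_alloc() and model_free() are hypothetical
names, and it assumes the free path is told whether the vector being
released was a managed one.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct cpumap; only the counters matter here. */
struct cpumap_model {
	unsigned int allocated;		/* all vectors allocated on this CPU */
	unsigned int managed_allocated;	/* subset that are managed IRQs */
};

/* Allocation side: mirrors the two increments in the patch. */
static void model_alloc(struct cpumap_model *cm, bool managed)
{
	cm->allocated++;
	if (managed)
		cm->managed_allocated++;
}

/* Free side: the decrement the review asks about (assumed shape). */
static void model_free(struct cpumap_model *cm, bool managed)
{
	assert(cm->allocated > 0);
	cm->allocated--;
	if (managed) {
		assert(cm->managed_allocated > 0);
		cm->managed_allocated--;
	}
}
```

Without the decrement in model_free(), managed_allocated would only grow, so
a CPU whose managed IRQs had been freed would still look loaded to the
selection logic.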

Long

> 
> Thanks,
> 
> 	tglx
