Re: [PATCH V5 net-next] net: mana: Assigning IRQ affinity on HT cores

From: Yury Norov <yury.norov@gmail.com>
To: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, longli@microsoft.com,
	leon@kernel.org, cai.huoqing@linux.dev,
	ssengar@linux.microsoft.com, vkuznets@redhat.com,
	tglx@linutronix.de, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, schakrabarti@microsoft.com,
	paulros@microsoft.com
Subject: Re: [PATCH V5 net-next] net: mana: Assigning IRQ affinity on HT cores
Date: Fri, 8 Dec 2023 13:53:51 -0800
Message-ID: <ZXOQb+3R0YAT/rAm@yury-ThinkPad>
In-Reply-To: <ZXMiOwK3sOJNXHxd@yury-ThinkPad>

A few more nits.

On Fri, Dec 08, 2023 at 06:03:40AM -0800, Yury Norov wrote:
> On Fri, Dec 08, 2023 at 02:02:34AM -0800, Souradeep Chakrabarti wrote:
> > Existing MANA design assigns IRQ to every CPU, including sibling
> > hyper-threads. This may cause multiple IRQs to be active simultaneously
> > in the same core and may reduce the network performance with RSS.
> 
> Can you add an IRQ distribution diagram to compare before/after
> behavior, similarly to what I did in the other email?
> 
> > Improve the performance by assigning IRQ to non sibling CPUs in local
> > NUMA node. The performance improvement we are getting using ntttcp with
> > following patch is around 15 percent with existing design and approximately
> > 11 percent, when trying to assign one IRQ in each core across NUMA nodes,
> > if enough cores are present.
> 
> How did you measure it? In the other email you said you used perf; can
> you show your procedure in detail?
> 
> > Suggested-by: Yury Norov <yury.norov@gmail.com>
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> 
> [...]
> 
> >  .../net/ethernet/microsoft/mana/gdma_main.c   | 92 +++++++++++++++++--
> >  1 file changed, 83 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 6367de0c2c2e..18e8908c5d29 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -1243,15 +1243,56 @@ void mana_gd_free_res_map(struct gdma_resource *r)
> >  	r->size = 0;
> >  }
> >  
> > +static int irq_setup(int *irqs, int nvec, int start_numa_node)
> > +{
> > +	int w, cnt, cpu, err = 0, i = 0;
> > +	int next_node = start_numa_node;
> 
> What is this for?
> 
> > +	const struct cpumask *next, *prev = cpu_none_mask;
> > +	cpumask_var_t curr, cpus;
> > +
> > +	if (!zalloc_cpumask_var(&curr, GFP_KERNEL)) {

Use alloc_cpumask_var() here and below: you initialize both masks by
copying, so zeroing them first is unnecessary.

> > +		err = -ENOMEM;
> > +		return err;
> > +	}
> > +	if (!zalloc_cpumask_var(&cpus, GFP_KERNEL)) {
> 
> 		free_cpumask_var(curr);
> 
> > +		err = -ENOMEM;
> > +		return err;
> > +	}
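To illustrate the unwind I mean for the second allocation, here is a
minimal userspace sketch. It is not the driver code: mask_alloc(),
mask_free(), alloc_mask_pair() and the allocs_left/live_allocs counters
are hypothetical stand-ins for alloc_cpumask_var()/free_cpumask_var(),
added only so that the failure path can be exercised.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical test hooks: only allocs_left allocations may succeed. */
static int allocs_left = 2;
static int live_allocs;		/* allocations not yet freed */

static void *mask_alloc(size_t size)
{
	void *p;

	if (allocs_left <= 0)
		return NULL;

	p = malloc(size);
	if (p) {
		allocs_left--;
		live_allocs++;
	}
	return p;
}

static void mask_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Allocate both masks, or neither: the second failure frees the first. */
static int alloc_mask_pair(void **curr, void **cpus, size_t size)
{
	*curr = mask_alloc(size);
	if (!*curr)
		return -ENOMEM;

	*cpus = mask_alloc(size);
	if (!*cpus) {
		mask_free(*curr);	/* undo the first allocation */
		*curr = NULL;
		return -ENOMEM;
	}

	return 0;
}
```

With allocs_left set to 1, the call returns -ENOMEM and live_allocs is
back at 0, so nothing leaks on the error path.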
> > +
> > +	rcu_read_lock();
> > +	for_each_numa_hop_mask(next, next_node) {
> > +		cpumask_andnot(curr, next, prev);
> > +		for (w = cpumask_weight(curr), cnt = 0; cnt < w; ) {

OK, if you can't increment cnt in the for-loop header, I'd switch it
to a while-loop:

		w = cpumask_weight(curr);
		cnt = 0;

		while (cnt < w) {
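For reference, a rough userspace model of the loop in that shape, for a
single hop mask and with the nvec check moved to the top. This is only a
sketch under assumptions: cpumasks are modelled as 64-bit words, and
spread_irqs(), sibling_mask(), mask_weight(), NR_CPUS and the "CPUs 2k
and 2k+1 are siblings" topology are all made up for illustration.

```c
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical topology: CPUs 2k and 2k+1 are hyper-thread siblings. */
static uint64_t sibling_mask(int cpu)
{
	return 3ULL << (cpu & ~1);
}

/* Userspace stand-in for cpumask_weight(). */
static int mask_weight(uint64_t mask)
{
	int w = 0;

	for (; mask; mask &= mask - 1)
		w++;
	return w;
}

/*
 * Hand out one IRQ per physical core per pass over the hop mask 'curr',
 * then start another pass until as many IRQs as CPUs were assigned.
 * irq_cpus[i] receives the sibling mask that IRQ i would be bound to.
 * Returns the number of IRQs assigned.
 */
static int spread_irqs(uint64_t curr, uint64_t *irq_cpus, int nvec)
{
	uint64_t cpus;
	int w, cnt, cpu, i = 0;

	if (nvec <= 0)		/* nothing to assign, checked up front */
		return 0;

	w = mask_weight(curr);
	cnt = 0;

	while (cnt < w) {
		cpus = curr;

		for (cpu = 0; cpu < NR_CPUS; cpu++) {
			if (!(cpus & (1ULL << cpu)))
				continue;

			irq_cpus[i] = sibling_mask(cpu);
			if (++i == nvec)
				return i;

			/* Skip the remaining siblings of this core. */
			cpus &= ~sibling_mask(cpu);
			cnt++;
		}
	}

	return i;
}
```

With curr = 0xff (eight CPUs, four cores) and nvec = 6, IRQs 0-3 land on
cores 0-3 and IRQs 4-5 go to cores 0 and 1 again. A call with nvec == 0
returns before irq_cpus is touched.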

> > +			cpumask_copy(cpus, curr);
> > +			for_each_cpu(cpu, cpus) {
> > +				irq_set_affinity_and_hint(irqs[i], topology_sibling_cpumask(cpu));
> > +				if (++i == nvec)
> > +					goto done;
> 
> Think about what happens if this is called as irq_setup(NULL, 0, 0).
> That's why I suggested placing this check at the beginning.
> 
> 
> > +				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
> > +				++cnt;
> > +			}
> > +		}
> > +		prev = next;
> > +	}

Don't hesitate to add even more vertical spacing. It's like: "take a
breath folks, this section is done". :)

> > +done:
> > +	rcu_read_unlock();
> > +	free_cpumask_var(curr);
> > +	free_cpumask_var(cpus);
> > +	return err;
> > +}
> > +
> >  static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  {
> > -	unsigned int max_queues_per_port = num_online_cpus();
> >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> > +	unsigned int max_queues_per_port;
> >  	struct gdma_irq_context *gic;
> >  	unsigned int max_irqs, cpu;
> > -	int nvec, irq;
> > +	int start_irq_index = 1;
> > +	int nvec, *irqs, irq;
> >  	int err, i = 0, j;
> >  
> > +	cpus_read_lock();
> > +	max_queues_per_port = num_online_cpus();
> >  	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> >  		max_queues_per_port = MANA_MAX_NUM_QUEUES;
> >  
> > @@ -1261,6 +1302,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  	nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX);
> >  	if (nvec < 0)
> >  		return nvec;
> > +	if (nvec <= num_online_cpus())
> > +		start_irq_index = 0;
> > +
> > +	irqs = kmalloc_array((nvec - start_irq_index), sizeof(int), GFP_KERNEL);
> > +	if (!irqs) {
> > +		err = -ENOMEM;
> > +		goto free_irq_vector;
> > +	}
> >  
> >  	gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context),
> >  				   GFP_KERNEL);
> > @@ -1287,21 +1336,44 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  			goto free_irq;
> >  		}
> >  
> > -		err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
> > -		if (err)
> > -			goto free_irq;
> > -
> > -		cpu = cpumask_local_spread(i, gc->numa_node);
> > -		irq_set_affinity_and_hint(irq, cpumask_of(cpu));
> > +		if (!i) {
> > +			err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
> > +			if (err)
> > +				goto free_irq;
> > +
> > +			/* If number of IRQ is one extra than number of online CPUs,
> > +			 * then we need to assign IRQ0 (hwc irq) and IRQ1 to
> > +			 * same CPU.
> > +			 * Else we will use different CPUs for IRQ0 and IRQ1.
> > +			 * Also we are using cpumask_local_spread instead of
> > +			 * cpumask_first for the node, because the node can be
> > +			 * mem only.
> > +			 */
> > +			if (start_irq_index) {
> > +				cpu = cpumask_local_spread(i, gc->numa_node);
> 
> I already mentioned that: if i == 0, you don't need to spread, just
> pick the first CPU from the node.
> 
> > +				irq_set_affinity_and_hint(irq, cpumask_of(cpu));
> > +			} else {
> > +				irqs[start_irq_index] = irq;
> > +			}
> > +		} else {
> > +			irqs[i - start_irq_index] = irq;
> > +			err = request_irq(irqs[i - start_irq_index], mana_gd_intr, 0,
> > +					  gic->name, gic);
> > +			if (err)
> > +				goto free_irq;
> > +		}
> >  	}
> >  
> > +	err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node);
> > +	if (err)
> > +		goto free_irq;
> >  	err = mana_gd_alloc_res_map(nvec, &gc->msix_resource);
> >  	if (err)
> >  		goto free_irq;
> >  
> >  	gc->max_num_msix = nvec;
> >  	gc->num_msix_usable = nvec;
> > -
> > +	cpus_read_unlock();
> >  	return 0;
> >  
> >  free_irq:
> > @@ -1314,8 +1386,10 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  	}
> >  
> >  	kfree(gc->irq_contexts);
> > +	kfree(irqs);
> >  	gc->irq_contexts = NULL;
> >  free_irq_vector:
> > +	cpus_read_unlock();
> >  	pci_free_irq_vectors(pdev);
> >  	return err;
> >  }
> > -- 
> > 2.34.1
