From: "Yajun Deng" <yajun.deng@linux.dev>
To: "Shay Drory" <shayd@nvidia.com>,
	saeedm@nvidia.com, leon@kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] net/mlx5: Add affinity for each irq
Date: Mon, 06 Jun 2022 10:29:54 +0000
Message-ID: <27b8035a0f9b1ea7cb370e0a346c224d@linux.dev>
In-Reply-To: <0338001c-4a8c-bf28-b857-42e1bc775ea0@nvidia.com>

June 6, 2022 4:31 PM, "Shay Drory" <shayd@nvidia.com> wrote:

> On 6/6/2022 10:13, Yajun Deng wrote:
> 
>> mlx5 allocates at least one IRQ per CPU, so we can bind each
>> IRQ to a CPU to improve interrupt performance.
> 
> The maximum number of affinity sets is hard-coded to 4. If nvec > 4 * (num_CPUs) [1],
> we will hit the following WARN [2].
> We also hit an oops following this WARN...
> 
> [1]
> mlx5 supports up to 2K MSI-X vectors (depending on the HW). E.g., if we max out the
> mlx5 MSI-X capability, we will cross this limit on any machine, at least any that I know of.
> 

Oh, I didn't expect so many MSI-X vectors. Thank you!
> [2]
> 
> This is a machine with 10 CPUs and 350 MSI-X vectors:
> 
> [ 1.633436] ------------[ cut here ]------------
> [ 1.633437] WARNING: CPU: 2 PID: 194 at kernel/irq/affinity.c:443 irq_create_affinity_masks+0x175/0x270
> [ 1.633467] Modules linked in: mlx5_core(+)
> [ 1.633474] CPU: 2 PID: 194 Comm: systemd-modules Not tainted 5.18.0+ #1
> [ 1.633480] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [ 1.633483] RIP: 0010:irq_create_affinity_masks+0x175/0x270
> [ 1.633492] Code: 5c 41 5d 41 5e 41 5f c3 48 c7 46 20 90 6d 19 81 48 c7 c0 90 6d 19 81 8b 34 24 4c 89 ef ff d0 41 83 7d 08 04 0f 86 de fe ff ff <0f> 0b 45 31 f6 eb c5 45 8b 5d 00 8b 34 24 43 8d 04 1f 42 8d 0c 1e
> [ 1.633497] RSP: 0018:ffff88810716bac0 EFLAGS: 00010202
> [ 1.633501] RAX: 000000000000000a RBX: 0000000000000001 RCX: 0000000000000200
> [ 1.633504] RDX: ffffffff82605000 RSI: ffffffff82605000 RDI: 0000000000000000
> [ 1.633507] RBP: ffff88810716bbd0 R08: 000000000000000a R09: ffffffff82604fc0
> [ 1.633510] R10: 0000000000000008 R11: 000ffffffffff000 R12: 0000000000000000
> [ 1.633513] R13: ffff88810716bbd0 R14: 0000000000000160 R15: 0000000000000160
> [ 1.633516] FS: 00007f8d72994b80(0000) GS:ffff88852c900000(0000) knlGS:0000000000000000
> [ 1.633525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.633528] CR2: 00007f8d73ba4490 CR3: 0000000103fce001 CR4: 0000000000370ea0
> [ 1.633531] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1.633534] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1.633536] Call Trace:
> [ 1.633549] <TASK>
> [ 1.633553] __pci_enable_msix_range+0x2b9/0x4c0
> [ 1.633572] pci_alloc_irq_vectors_affinity+0xa5/0x100
> [ 1.633579] mlx5_irq_table_create.cold+0x6d/0x22f [mlx5_core]
> [ 1.634032] ? probe_one+0x1aa/0x280 [mlx5_core]
> [ 1.634193] ? pci_device_probe+0xa4/0x140
> [ 1.634201] ? really_probe+0xc9/0x350
> [ 1.634205] ? pm_runtime_barrier+0x43/0x80
> [ 1.634213] ? __driver_probe_device+0x80/0x170
> [ 1.634218] ? driver_probe_device+0x1e/0x90
> [ 1.634223] ? __driver_attach+0xcd/0x1b0
> [ 1.634226] ? __device_attach_driver+0xf0/0xf0
> [ 1.634231] ? __device_attach_driver+0xf0/0xf0
> [ 1.634235] ? bus_for_each_dev+0x77/0xc0
> [ 1.634243] ? bus_add_driver+0x184/0x1f0
> [ 1.634247] ? driver_register+0x8f/0xe0
> [ 1.634251] ? 0xffffffffa0180000
> [ 1.634256] ? init+0x62/0x1000 [mlx5_core]
> [ 1.634413] ? do_one_initcall+0x4a/0x1e0
> [ 1.634418] ? kmem_cache_alloc_trace+0x33/0x420
> [ 1.634426] ? do_init_module+0x72/0x260
> [ 1.634434] ? __do_sys_finit_module+0xbb/0x130
> [ 1.634443] ? do_syscall_64+0x3d/0x90
> [ 1.634452] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1.634461] </TASK>
> [ 1.634463] ---[ end trace 0000000000000000 ]---
> [ OK ] Finished udev Coldplug all Devices.
> [ 1.713428] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlx5_irq_table_create+0x9c/0xa0 [mlx5_core]
> [ 1.715521] CPU: 2 PID: 194 Comm: systemd-modules Tainted: G W 5.18.0+ #1
> [ 1.715524] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [ 1.715525] Call Trace:
> [ 1.715532] <TASK>
> [ 1.715533] dump_stack_lvl+0x34/0x44
> [ 1.715538] panic+0x100/0x255
> [ 1.715542] ? mlx5_irq_table_create+0x9c/0xa0 [mlx5_core]
> [ 1.715602] __stack_chk_fail+0x10/0x10
> [ 1.715607] mlx5_irq_table_create+0x9c/0xa0 [mlx5_core]
> [ 1.715662] ? probe_one+0x1aa/0x280 [mlx5_core]
> [ 1.715709] ? pci_device_probe+0xa4/0x140
> [ 1.715712] ? really_probe+0xc9/0x350
> [ 1.715715] ? pm_runtime_barrier+0x43/0x80
> [ 1.715718] ? __driver_probe_device+0x80/0x170
> [ 1.715719] ? driver_probe_device+0x1e/0x90
> [ 1.715721] ? __driver_attach+0xcd/0x1b0
> [ 1.715722] ? __device_attach_driver+0xf0/0xf0
> [ 1.715723] ? __device_attach_driver+0xf0/0xf0
> [ 1.715724] ? bus_for_each_dev+0x77/0xc0
> [ 1.715727] ? bus_add_driver+0x184/0x1f0
> [ 1.715728] ? driver_register+0x8f/0xe0
> [ 1.715730] ? 0xffffffffa0180000
> [ 1.715731] ? init+0x62/0x1000 [mlx5_core]
> [ 1.715778] ? do_one_initcall+0x4a/0x1e0
> [ 1.715781] ? kmem_cache_alloc_trace+0x33/0x420
> [ 1.715784] ? do_init_module+0x72/0x260
> [ 1.715788] ? __do_sys_finit_module+0xbb/0x130
> [ 1.715790] ? do_syscall_64+0x3d/0x90
> [ 1.715792] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1.715796] </TASK>
> [ 1.715938] Kernel Offset: disabled
> [ 1.732563] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlx5_irq_table_create+0x9c/0xa0 [mlx5_core] ]---
> 
>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>> ---
>> .../net/ethernet/mellanox/mlx5/core/pci_irq.c | 19 ++++++++++++++++++-
>> 1 file changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
>> index 662f1d55e30e..d13fc403fe78 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
>> @@ -624,11 +624,27 @@ int mlx5_irq_table_get_num_comp(struct mlx5_irq_table *table)
>> return table->pf_pool->xa_num_irqs.max - table->pf_pool->xa_num_irqs.min;
>> }
>>
>> +static void mlx5_calc_sets(struct irq_affinity *affd, unsigned int nvecs)
>> +{
>> + int i;
>> +
>> + affd->nr_sets = (nvecs - 1) / num_possible_cpus() + 1;
>> +
>> + for (i = 0; i < affd->nr_sets; i++) {
>> + affd->set_size[i] = min(nvecs, num_possible_cpus());
>> + nvecs -= num_possible_cpus();
>> + }
>> +}
>> +
>> int mlx5_irq_table_create(struct mlx5_core_dev *dev)
>> {
>> int num_eqs = MLX5_CAP_GEN(dev, max_num_eqs) ?
>> MLX5_CAP_GEN(dev, max_num_eqs) :
>> 1 << MLX5_CAP_GEN(dev, log_max_eq);
>> + struct irq_affinity affd = {
>> + .pre_vectors = 0,
>> + .calc_sets = mlx5_calc_sets,
>> + };
>> int total_vec;
>> int pf_vec;
>> int err;
>> @@ -644,7 +660,8 @@ int mlx5_irq_table_create(struct mlx5_core_dev *dev)
>> total_vec += MLX5_IRQ_CTRL_SF_MAX +
>> MLX5_COMP_EQS_PER_SF * mlx5_sf_max_functions(dev);
>>
>> - total_vec = pci_alloc_irq_vectors(dev->pdev, 1, total_vec, PCI_IRQ_MSIX);
>> + total_vec = pci_alloc_irq_vectors_affinity(dev->pdev, 1, total_vec,
>> + PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &affd);
>> if (total_vec < 0)
>> return total_vec;
>> pf_vec = min(pf_vec, total_vec);

Thread overview: 3+ messages
2022-06-06  7:13 [PATCH] net/mlx5: Add affinity for each irq Yajun Deng
2022-06-06  8:31 ` Shay Drory
2022-06-06 10:29   ` Yajun Deng [this message]
