public inbox for linux-rdma@vger.kernel.org
From: Saeed Mahameed <saeedm@nvidia.com>
To: Chuck Lever <cel@kernel.org>
Cc: elic@nvidia.com, Thomas Gleixner <tglx@linutronix.de>,
	Chuck Lever <chuck.lever@oracle.com>,
	linux-rdma@vger.kernel.org
Subject: Re: [PATCH] net/mlx5: Ensure af_desc.mask is properly initialized
Date: Wed, 31 May 2023 15:35:39 -0700
Message-ID: <ZHfLu2u5w69VV6Ts@x130>
In-Reply-To: <168556238265.1445.7577814343475230160.stgit@manet.1015granger.net>

On 31 May 15:48, Chuck Lever wrote:
>From: Chuck Lever <chuck.lever@oracle.com>
>
>[    9.837087] mlx5_core 0000:02:00.0: firmware version: 16.35.2000
>[    9.843126] mlx5_core 0000:02:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
>[   10.311515] mlx5_core 0000:02:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
>[   10.321948] mlx5_core 0000:02:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
>[   10.344324] mlx5_core 0000:02:00.0: mlx5_pcie_event:301:(pid 88): PCIe slot advertised sufficient power (27W).
>[   10.354339] BUG: unable to handle page fault for address: ffffffff8ff0ade0
>[   10.361206] #PF: supervisor read access in kernel mode
>[   10.366335] #PF: error_code(0x0000) - not-present page
>[   10.371467] PGD 81ec39067 P4D 81ec39067 PUD 81ec3a063 PMD 114b07063 PTE 800ffff7e10f5062
>[   10.379544] Oops: 0000 [#1] PREEMPT SMP PTI
>[   10.383721] CPU: 0 PID: 117 Comm: kworker/0:6 Not tainted 6.3.0-13028-g7222f123c983 #1
>[   10.391625] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b 06/12/2017
>[   10.398750] Workqueue: events work_for_cpu_fn
>[   10.403108] RIP: 0010:__bitmap_or+0x10/0x26
>[   10.407286] Code: 85 c0 0f 95 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 89 c9 31 c0 48 83 c1 3f 48 c1 e9 06 39 c>
>[   10.426024] RSP: 0000:ffffb45a0078f7b0 EFLAGS: 00010097
>[   10.431240] RAX: 0000000000000000 RBX: ffffffff8ff0adc0 RCX: 0000000000000004
>[   10.438365] RDX: ffff9156801967d0 RSI: ffffffff8ff0ade0 RDI: ffff9156801967b0
>[   10.445489] RBP: ffffb45a0078f7e8 R08: 0000000000000030 R09: 0000000000000000
>[   10.452613] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000ec
>[   10.459737] R13: ffffffff8ff0ade0 R14: 0000000000000001 R15: 0000000000000020
>[   10.466862] FS:  0000000000000000(0000) GS:ffff9165bfc00000(0000) knlGS:0000000000000000
>[   10.474936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[   10.480674] CR2: ffffffff8ff0ade0 CR3: 00000001011ae003 CR4: 00000000003706f0
>[   10.487800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[   10.494922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>[   10.502046] Call Trace:
>[   10.504493]  <TASK>
>[   10.506589]  ? matrix_alloc_area.constprop.0+0x43/0x9a
>[   10.511729]  ? prepare_namespace+0x84/0x174
>[   10.515914]  irq_matrix_reserve_managed+0x56/0x10c
>[   10.520699]  x86_vector_alloc_irqs+0x1d2/0x31e
>[   10.525146]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
>[   10.530284]  irq_domain_alloc_irqs_parent+0x1a/0x2a
>[   10.535155]  intel_irq_remapping_alloc+0x59/0x5e9
>[   10.539859]  ? kmem_cache_debug_flags+0x11/0x26
>[   10.544383]  ? __radix_tree_lookup+0x39/0xb9
>[   10.548649]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
>[   10.553779]  irq_domain_alloc_irqs_parent+0x1a/0x2a
>[   10.558650]  msi_domain_alloc+0x8c/0x120
>[   10.567697]  irq_domain_alloc_irqs_locked+0x11d/0x286
>[   10.572741]  __irq_domain_alloc_irqs+0x72/0x93
>[   10.577179]  __msi_domain_alloc_irqs+0x193/0x3f1
>[   10.581789]  ? __xa_alloc+0xcf/0xe2
>[   10.585273]  msi_domain_alloc_irq_at+0xa8/0xfe
>[   10.589711]  pci_msix_alloc_irq_at+0x47/0x5c
>
>The crash is due to matrix_alloc_area() attempting to access per-CPU
>memory for CPUs that are not present on the system. The CPU mask
>passed into reserve_managed_vector() via its @irqd parameter is
>corrupted because it contains uninitialized stack data.
>
>Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor")
>Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
>Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Applied to net-mlx5. Chuck, for faster review, please CC netdev next time
for mlx5 patches.
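
For readers following the thread, the failure mode in the quoted description
(a CPU mask on the stack that has bits set in it before it was ever cleared)
can be sketched in userspace C. This is only an illustration under stated
assumptions, not the patch itself: the struct and helper names below
(af_desc_sketch, desc_set_cpu_buggy, desc_set_cpu_fixed, mask_within_present)
are hypothetical, and a single 64-bit word stands in for the kernel's cpumask.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a CPU mask: one bit per CPU, 64 CPUs at most. */
struct mask_sketch {
	uint64_t bits;
};

/* Hypothetical stand-in for an affinity descriptor that embeds a mask. */
struct af_desc_sketch {
	struct mask_sketch mask;
	int is_managed;
};

/*
 * Buggy pattern: set one CPU bit in a descriptor that was never cleared.
 * Whatever was in that memory before stays in the mask.
 */
static void desc_set_cpu_buggy(struct af_desc_sketch *d, unsigned int cpu)
{
	d->mask.bits |= (uint64_t)1 << cpu;
}

/*
 * Fixed pattern: clear the whole descriptor first, then set the CPU bit,
 * so the mask holds only the bit the caller asked for.
 */
static void desc_set_cpu_fixed(struct af_desc_sketch *d, unsigned int cpu)
{
	memset(d, 0, sizeof(*d));
	d->mask.bits |= (uint64_t)1 << cpu;
}

/*
 * Returns 1 if the mask names only CPUs below nr_present, 0 otherwise.
 * A consumer that indexes per-CPU data by every set bit would walk off
 * into non-existent CPUs whenever this returns 0.
 */
static int mask_within_present(const struct mask_sketch *m,
			       unsigned int nr_present)
{
	if (nr_present >= 64)
		return 1;
	return (m->bits >> nr_present) == 0;
}
```

Filling a descriptor with a non-zero byte pattern before calling
desc_set_cpu_buggy() simulates stale stack contents deterministically: the
resulting mask then names CPUs beyond the present ones, while
desc_set_cpu_fixed() leaves exactly one bit set.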
