From: manjunath.b.patil@oracle.com
To: Tariq Toukan <tariqt@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Mark Bloch <mbloch@nvidia.com>, Leon Romanovsky <leon@kernel.org>,
netdev@vger.kernel.org
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Patrisious Haddad <phaddad@nvidia.com>,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH net] net/mlx5e: Use sender devcom for MPV master-up
Date: Tue, 23 Jun 2026 10:51:17 -0700
Message-ID: <f5bbe6d4-b0a8-414c-bdbc-5dd169a64c2b@oracle.com>
In-Reply-To: <293db0b4-f308-469e-99c1-ef1b57d41451@nvidia.com>
On 6/22/26 2:01 AM, Tariq Toukan wrote:
>
>
> On 10/06/2026 20:39, Manjunath Patil wrote:
>> After PCIe DPC recovery, mlx5 reloads the affected functions and
>> replays multiport affiliation events. In the reported failure, the
>> first relevant device error was:
>>
>> pcieport 0000:10:01.1: DPC: containment event
>> pcieport 0000:10:01.1: PCIe Bus Error: severity=Uncorrected (Fatal)
>> pcieport 0000:10:01.1: [ 5] SDES (First)
>>
>> mlx5 recovered the PCI functions and resumed 0000:11:00.1. During
>> that resume, RDMA multiport binding replayed
>> MLX5_DRIVER_EVENT_AFFILIATION_DONE and mlx5e sent
>> MPV_DEVCOM_MASTER_UP. The host then panicked with:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000010
>> RIP: mlx5_devcom_comp_set_ready+0x5/0x40 [mlx5_core]
>> RDI: 0000000000000000
>>
>> Call trace included:
>>
>> mlx5_devcom_comp_set_ready
>> mlx5e_devcom_event_mpv
>> mlx5_devcom_send_event
>> mlx5_ib_bind_slave_port
>> mlx5r_mp_probe
>> mlx5_pci_resume
>>
>> MPV devcom registration publishes mlx5e private data to the component
>> peer list before mlx5e_devcom_init_mpv() stores the returned component
>> device in priv->devcom. A concurrent master-up event can therefore
>> reach a peer whose private data is visible but whose priv->devcom
>> backpointer is still NULL.
>>
>> MPV_DEVCOM_MASTER_UP already carries the sender/master mlx5e private
>> data as event_data. The ready bit is stored on the shared devcom
>> component, not on an individual peer. Use the sender devcom when
>> marking the MPV component ready.
>>
>> This preserves the readiness transition while avoiding a NULL
>> dereference of the peer devcom pointer during affiliation replay after
>> PCI error recovery.
>>
>> Fixes: bf11485f8419 ("net/mlx5: Register mlx5e priv to devcom in MPV
>> mode")
>> Assisted-by: Codex:gpt-5
>> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
>> Cc: stable@vger.kernel.org # 6.7+
>> ---
>
> Thanks for your patch and sorry for the late response.
>
>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/
>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index 8f2b3abe0092..f7ff20b97e8c 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -211,11 +211,14 @@ static void mlx5e_disable_async_events(struct
>> mlx5e_priv *priv)
>> static int mlx5e_devcom_event_mpv(int event, void *my_data, void
>> *event_data)
>> {
>> - struct mlx5e_priv *slave_priv = my_data;
>> + struct mlx5e_priv *master_priv = event_data;
>
> makes sense.
>
>> switch (event) {
>> case MPV_DEVCOM_MASTER_UP:
>> - mlx5_devcom_comp_set_ready(slave_priv->devcom, true);
>> + if (!master_priv || !master_priv->devcom)
>> + return -EINVAL;
>
> is this currently possible? or just being defensive?
> if this return is unreachable I'd drop it.
Yes, the check is only defensive. For MPV_DEVCOM_MASTER_UP, event_data
is passed from mlx5e_devcom_init_mpv() after priv->devcom has been
assigned, so the early return should not be reachable on any valid path.

Please feel free to drop the check while applying. If you prefer a v2,
let me know and I will send one.

Thanks,
Manjunath
>
>> +
>> + mlx5_devcom_comp_set_ready(master_priv->devcom, true);
>> break;
>> case MPV_DEVCOM_MASTER_DOWN:
>> /* no need for comp set ready false since we unregister after
>