From: manjunath.b.patil@oracle.com
To: Tariq Toukan <tariqt@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Mark Bloch <mbloch@nvidia.com>, Leon Romanovsky <leon@kernel.org>,
	netdev@vger.kernel.org
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Patrisious Haddad <phaddad@nvidia.com>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH net] net/mlx5e: Use sender devcom for MPV master-up
Date: Tue, 23 Jun 2026 10:51:17 -0700
Message-ID: <f5bbe6d4-b0a8-414c-bdbc-5dd169a64c2b@oracle.com>
In-Reply-To: <293db0b4-f308-469e-99c1-ef1b57d41451@nvidia.com>



On 6/22/26 2:01 AM, Tariq Toukan wrote:
> 
> 
> On 10/06/2026 20:39, Manjunath Patil wrote:
>> After PCIe DPC recovery, mlx5 reloads the affected functions and
>> replays multiport affiliation events. In the reported failure, the
>> first relevant device error was:
>>
>>    pcieport 0000:10:01.1: DPC: containment event
>>    pcieport 0000:10:01.1: PCIe Bus Error: severity=Uncorrected (Fatal)
>>    pcieport 0000:10:01.1:    [ 5] SDES                   (First)
>>
>> mlx5 recovered the PCI functions and resumed 0000:11:00.1. During
>> that resume, RDMA multiport binding replayed
>> MLX5_DRIVER_EVENT_AFFILIATION_DONE and mlx5e sent
>> MPV_DEVCOM_MASTER_UP. The host then panicked with:
>>
>>    BUG: kernel NULL pointer dereference, address: 0000000000000010
>>    RIP: mlx5_devcom_comp_set_ready+0x5/0x40 [mlx5_core]
>>    RDI: 0000000000000000
>>
>> Call trace included:
>>
>>    mlx5_devcom_comp_set_ready
>>    mlx5e_devcom_event_mpv
>>    mlx5_devcom_send_event
>>    mlx5_ib_bind_slave_port
>>    mlx5r_mp_probe
>>    mlx5_pci_resume
>>
>> MPV devcom registration publishes mlx5e private data to the component
>> peer list before mlx5e_devcom_init_mpv() stores the returned component
>> device in priv->devcom. A concurrent master-up event can therefore
>> reach a peer whose private data is visible but whose priv->devcom
>> backpointer is still NULL.
>>
>> MPV_DEVCOM_MASTER_UP already carries the sender/master mlx5e private
>> data as event_data. The ready bit is stored on the shared devcom
>> component, not on an individual peer. Use the sender devcom when
>> marking the MPV component ready.
>>
>> This preserves the readiness transition while avoiding a NULL
>> dereference of the peer devcom pointer during affiliation replay after
>> PCI error recovery.
>>
>> Fixes: bf11485f8419 ("net/mlx5: Register mlx5e priv to devcom in MPV mode")
>> Assisted-by: Codex:gpt-5
>> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
>> Cc: stable@vger.kernel.org # 6.7+
>> ---
> 
> Thanks for your patch and sorry for the late response.
> 
>>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index 8f2b3abe0092..f7ff20b97e8c 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -211,11 +211,14 @@ static void mlx5e_disable_async_events(struct mlx5e_priv *priv)
>>   static int mlx5e_devcom_event_mpv(int event, void *my_data, void *event_data)
>>   {
>> -    struct mlx5e_priv *slave_priv = my_data;
>> +    struct mlx5e_priv *master_priv = event_data;
> 
> makes sense.
> 
>>       switch (event) {
>>       case MPV_DEVCOM_MASTER_UP:
>> -        mlx5_devcom_comp_set_ready(slave_priv->devcom, true);
>> +        if (!master_priv || !master_priv->devcom)
>> +            return -EINVAL;
> 
> is this currently possible? or just being defensive?
> if this return is unreachable I'd drop it.

Yes, the check is purely defensive. For MPV_DEVCOM_MASTER_UP, event_data
is passed from mlx5e_devcom_init_mpv() only after priv->devcom has been
assigned, so the -EINVAL return should be unreachable on any valid path.

Please feel free to drop the check while applying. If you prefer a v2, 
let me know and I will send one.

Thanks,
Manjunath

> 
>> +
>> +        mlx5_devcom_comp_set_ready(master_priv->devcom, true);
>>           break;
>>       case MPV_DEVCOM_MASTER_DOWN:
>>           /* no need for comp set ready false since we unregister after
> 



Thread overview: 4+ messages
2026-06-10 17:39 [PATCH net] net/mlx5e: Use sender devcom for MPV master-up Manjunath Patil
2026-06-17 16:28 ` manjunath.b.patil
2026-06-22  9:01 ` Tariq Toukan
2026-06-23 17:51   ` manjunath.b.patil [this message]
