* [PATCH v3 0/2] net/ncsi: fix oem gma command handling and state race issue
@ 2025-01-13 2:34 Potin Lai
2025-01-13 2:34 ` [PATCH v3 1/2] net/ncsi: fix locking in Get MAC Address handling Potin Lai
2025-01-13 2:34 ` [PATCH v3 2/2] net/ncsi: fix state race during channel probe completion Potin Lai
0 siblings, 2 replies; 7+ messages in thread
From: Potin Lai @ 2025-01-13 2:34 UTC (permalink / raw)
To: Samuel Mendoza-Jonas, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Ivan Mikhaylov,
Paul Fertser, Patrick Williams
Cc: netdev, linux-kernel, Cosmo Chou, Potin Lai, Potin Lai, stable,
Cosmo Chou
We are seeing kernel panic when enabling two NCSI interfaces at same
time. It looks like mutex lock is being used in softirq caused the
issue.
This patch series try to fix oem gma command handling issue by adding a
new state, also fix a potential state handling issue.
Signed-off-by: Potin Lai <potin.lai.pt@gmail.com>
---
Changes in v3:
- Fix compile error by removing non-exist variable.
- Link to v2: https://lore.kernel.org/r/20250111-fix-ncsi-mac-v2-0-838e0a1a233a@gmail.com
Changes in v2:
- Add second patch for fixing state handling issue.
- Link to v1: https://lore.kernel.org/all/20250109145054.30925-1-fercerpav@gmail.com/
---
Cosmo Chou (1):
net/ncsi: fix state race during channel probe completion
Paul Fertser (1):
net/ncsi: fix locking in Get MAC Address handling
net/ncsi/internal.h | 2 ++
net/ncsi/ncsi-manage.c | 21 ++++++++++++++++++---
net/ncsi/ncsi-rsp.c | 19 ++++++-------------
3 files changed, 26 insertions(+), 16 deletions(-)
---
base-commit: fc033cf25e612e840e545f8d5ad2edd6ba613ed5
change-id: 20250111-fix-ncsi-mac-1e4b3df431f1
Best regards,
--
Potin Lai <potin.lai.pt@gmail.com>
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH v3 1/2] net/ncsi: fix locking in Get MAC Address handling 2025-01-13 2:34 [PATCH v3 0/2] net/ncsi: fix oem gma command handling and state race issue Potin Lai @ 2025-01-13 2:34 ` Potin Lai 2025-01-13 2:34 ` [PATCH v3 2/2] net/ncsi: fix state race during channel probe completion Potin Lai 1 sibling, 0 replies; 7+ messages in thread From: Potin Lai @ 2025-01-13 2:34 UTC (permalink / raw) To: Samuel Mendoza-Jonas, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Ivan Mikhaylov, Paul Fertser, Patrick Williams Cc: netdev, linux-kernel, Cosmo Chou, Potin Lai, Potin Lai, stable From: Paul Fertser <fercerpav@gmail.com> Obtaining RTNL lock in a response handler is not allowed since it runs in an atomic softirq context. Postpone setting the MAC address by adding a dedicated step to the configuration FSM. Fixes: 790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address") Cc: stable@vger.kernel.org Cc: Potin Lai <potin.lai@quantatw.com> Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com Signed-off-by: Paul Fertser <fercerpav@gmail.com> --- net/ncsi/internal.h | 2 ++ net/ncsi/ncsi-manage.c | 16 ++++++++++++++-- net/ncsi/ncsi-rsp.c | 19 ++++++------------- 3 files changed, 22 insertions(+), 15 deletions(-) diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h index ef0f8f73826f..4e0842df5234 100644 --- a/net/ncsi/internal.h +++ b/net/ncsi/internal.h @@ -289,6 +289,7 @@ enum { ncsi_dev_state_config_sp = 0x0301, ncsi_dev_state_config_cis, ncsi_dev_state_config_oem_gma, + ncsi_dev_state_config_apply_mac, ncsi_dev_state_config_clear_vids, ncsi_dev_state_config_svf, ncsi_dev_state_config_ev, @@ -322,6 +323,7 @@ struct ncsi_dev_priv { #define NCSI_DEV_RESHUFFLE 4 #define NCSI_DEV_RESET 8 /* Reset state of NC */ unsigned int gma_flag; /* OEM GMA flag */ + struct sockaddr pending_mac; /* MAC address received from GMA */ spinlock_t lock; /* Protect the NCSI device */ unsigned int package_probe_id;/* Current ID during probe */ unsigned int package_num; /* Number of packages */ diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index 5cf55bde366d..bf276eaf9330 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -1038,7 +1038,7 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp) : ncsi_dev_state_config_clear_vids; break; case ncsi_dev_state_config_oem_gma: - nd->state = ncsi_dev_state_config_clear_vids; + nd->state = ncsi_dev_state_config_apply_mac; nca.package = np->id; nca.channel = nc->id; @@ -1050,10 +1050,22 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp) nca.type = NCSI_PKT_CMD_OEM; ret = ncsi_gma_handler(&nca, nc->version.mf_id); } - if (ret < 0) + if (ret < 0) { + nd->state = ncsi_dev_state_config_clear_vids; schedule_work(&ndp->work); + } break; + case ncsi_dev_state_config_apply_mac: + rtnl_lock(); + ret = dev_set_mac_address(dev, &ndp->pending_mac, NULL); + rtnl_unlock(); + if (ret < 0) + netdev_warn(dev, "NCSI: 'Writing MAC address to device failed\n"); + + nd->state = ncsi_dev_state_config_clear_vids; + + fallthrough; case ncsi_dev_state_config_clear_vids: case ncsi_dev_state_config_svf: case ncsi_dev_state_config_ev: diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c index e28be33bdf2c..14bd66909ca4 100644 --- a/net/ncsi/ncsi-rsp.c +++ b/net/ncsi/ncsi-rsp.c @@ -628,16 +628,14 @@ static int ncsi_rsp_handler_snfc(struct ncsi_request *nr) static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id) { struct ncsi_dev_priv *ndp = nr->ndp; + struct sockaddr *saddr = &ndp->pending_mac; struct net_device *ndev = ndp->ndev.dev; struct ncsi_rsp_oem_pkt *rsp; - struct sockaddr saddr; u32 mac_addr_off = 0; - int ret = 0; /* Get the response header */ rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp); - saddr.sa_family = ndev->type; ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE; if (mfr_id == NCSI_OEM_MFR_BCM_ID) mac_addr_off = BCM_MAC_ADDR_OFFSET; @@ -646,22 +644,17 @@ static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id) else if (mfr_id == NCSI_OEM_MFR_INTEL_ID) mac_addr_off = INTEL_MAC_ADDR_OFFSET; - memcpy(saddr.sa_data, &rsp->data[mac_addr_off], ETH_ALEN); + saddr->sa_family = ndev->type; + memcpy(saddr->sa_data, &rsp->data[mac_addr_off], ETH_ALEN); if (mfr_id == NCSI_OEM_MFR_BCM_ID || mfr_id == NCSI_OEM_MFR_INTEL_ID) - eth_addr_inc((u8 *)saddr.sa_data); - if (!is_valid_ether_addr((const u8 *)saddr.sa_data)) + eth_addr_inc((u8 *)saddr->sa_data); + if (!is_valid_ether_addr((const u8 *)saddr->sa_data)) return -ENXIO; /* Set the flag for GMA command which should only be called once */ ndp->gma_flag = 1; - rtnl_lock(); - ret = dev_set_mac_address(ndev, &saddr, NULL); - rtnl_unlock(); - if (ret < 0) - netdev_warn(ndev, "NCSI: 'Writing mac address to device failed\n"); - - return ret; + return 0; } /* Response handler for Mellanox card */ -- 2.31.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v3 2/2] net/ncsi: fix state race during channel probe completion 2025-01-13 2:34 [PATCH v3 0/2] net/ncsi: fix oem gma command handling and state race issue Potin Lai 2025-01-13 2:34 ` [PATCH v3 1/2] net/ncsi: fix locking in Get MAC Address handling Potin Lai @ 2025-01-13 2:34 ` Potin Lai 2025-01-14 12:55 ` Paul Fertser 1 sibling, 1 reply; 7+ messages in thread From: Potin Lai @ 2025-01-13 2:34 UTC (permalink / raw) To: Samuel Mendoza-Jonas, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Ivan Mikhaylov, Paul Fertser, Patrick Williams Cc: netdev, linux-kernel, Cosmo Chou, Potin Lai, Potin Lai, Cosmo Chou From: Cosmo Chou <chou.cosmo@gmail.com> During channel probing, the last NCSI_PKT_CMD_DP command can trigger an unnecessary schedule_work() via ncsi_free_request(). We observed that subsequent config states were triggered before the scheduled work completed, causing potential state handling issues. Fix this by clearing req_flags when processing the last package. Fixes: 8e13f70be05e ("net/ncsi: Probe single packages to avoid conflict") Signed-off-by: Cosmo Chou <chou.cosmo@gmail.com> --- net/ncsi/ncsi-manage.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index bf276eaf9330..99e3a5883617 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -1491,7 +1491,10 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) } break; case ncsi_dev_state_probe_dp: - ndp->pending_req_num = 1; + if (ndp->package_probe_id + 1 < 8) + ndp->pending_req_num = 1; + else + nca.req_flags = 0; /* Deselect the current package */ nca.type = NCSI_PKT_CMD_DP; -- 2.31.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 2/2] net/ncsi: fix state race during channel probe completion 2025-01-13 2:34 ` [PATCH v3 2/2] net/ncsi: fix state race during channel probe completion Potin Lai @ 2025-01-14 12:55 ` Paul Fertser 2025-01-16 0:38 ` Potin Lai 0 siblings, 1 reply; 7+ messages in thread From: Paul Fertser @ 2025-01-14 12:55 UTC (permalink / raw) To: Potin Lai Cc: Samuel Mendoza-Jonas, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Ivan Mikhaylov, Patrick Williams, netdev, linux-kernel, Cosmo Chou, Potin Lai, Cosmo Chou Hello, On Mon, Jan 13, 2025 at 10:34:48AM +0800, Potin Lai wrote: > During channel probing, the last NCSI_PKT_CMD_DP command can trigger > an unnecessary schedule_work() via ncsi_free_request(). We observed > that subsequent config states were triggered before the scheduled > work completed, causing potential state handling issues. Please let's not make this whole NC-SI story even less comprehensible than it already is. From this commit message I was unable to understand what exactly is racing with what and under which conditions. "Can trigger" would imply that it does not always trigger that wrong state transition but that would also mean there's a set of conditions that is necessary for bug to happen. > Fix this by clearing req_flags when processing the last package. After reading the code for a few hours I can probably see how lack of proper processing of the response to the last "Package Deselect" call can mix up the states. How about this diff instead (tested on Tioga Pass but there we didn't have any issues in the first place)? diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index bf276eaf9330..7891a537bddd 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -1385,6 +1385,12 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nd->state = ncsi_dev_state_probe_package; break; case ncsi_dev_state_probe_package: + if (ndp->package_probe_id >= 8) { + /* Last package probed, finishing */ + ndp->flags |= NCSI_DEV_PROBED; + break; + } + ndp->pending_req_num = 1; nca.type = NCSI_PKT_CMD_SP; @@ -1501,13 +1507,8 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) if (ret) goto error; - /* Probe next package */ + /* Probe next package after receiving response */ ndp->package_probe_id++; - if (ndp->package_probe_id >= 8) { - /* Probe finished */ - ndp->flags |= NCSI_DEV_PROBED; - break; - } nd->state = ncsi_dev_state_probe_package; ndp->active_package = NULL; break; -- Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software! mailto:fercerpav@gmail.com ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3 2/2] net/ncsi: fix state race during channel probe completion 2025-01-14 12:55 ` Paul Fertser @ 2025-01-16 0:38 ` Potin Lai 2025-01-16 15:29 ` [PATCH] net/ncsi: wait for the last response to Deselect Package before configuring channel Paul Fertser 0 siblings, 1 reply; 7+ messages in thread From: Potin Lai @ 2025-01-16 0:38 UTC (permalink / raw) To: Paul Fertser Cc: Samuel Mendoza-Jonas, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Ivan Mikhaylov, Patrick Williams, netdev, linux-kernel, Cosmo Chou, Potin Lai, Cosmo Chou On Tue, Jan 14, 2025 at 8:55 PM Paul Fertser <fercerpav@gmail.com> wrote: > > Hello, > > On Mon, Jan 13, 2025 at 10:34:48AM +0800, Potin Lai wrote: > > During channel probing, the last NCSI_PKT_CMD_DP command can trigger > > an unnecessary schedule_work() via ncsi_free_request(). We observed > > that subsequent config states were triggered before the scheduled > > work completed, causing potential state handling issues. > > Please let's not make this whole NC-SI story even less comprehensible > than it already is. From this commit message I was unable to > understand what exactly is racing with what and under which > conditions. "Can trigger" would imply that it does not always trigger > that wrong state transition but that would also mean there's a set of > conditions that is necessary for bug to happen. > > > Fix this by clearing req_flags when processing the last package. > > After reading the code for a few hours I can probably see how lack of > proper processing of the response to the last "Package Deselect" call > can mix up the states. > > How about this diff instead (tested on Tioga Pass but there we didn't > have any issues in the first place)? > > diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c > index bf276eaf9330..7891a537bddd 100644 > --- a/net/ncsi/ncsi-manage.c > +++ b/net/ncsi/ncsi-manage.c > @@ -1385,6 +1385,12 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) > nd->state = ncsi_dev_state_probe_package; > break; > case ncsi_dev_state_probe_package: > + if (ndp->package_probe_id >= 8) { > + /* Last package probed, finishing */ > + ndp->flags |= NCSI_DEV_PROBED; > + break; > + } > + > ndp->pending_req_num = 1; > > nca.type = NCSI_PKT_CMD_SP; > @@ -1501,13 +1507,8 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) > if (ret) > goto error; > > - /* Probe next package */ > + /* Probe next package after receiving response */ > ndp->package_probe_id++; > - if (ndp->package_probe_id >= 8) { > - /* Probe finished */ > - ndp->flags |= NCSI_DEV_PROBED; > - break; > - } > nd->state = ncsi_dev_state_probe_package; > ndp->active_package = NULL; > break; > > -- > Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software! > mailto:fercerpav@gmail.com Hi Paul, Thank you for your suggestion! I tested the patch, and it works perfectly. Could you please help to send it out? I appreciate your assistance. Best regards, Potin ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] net/ncsi: wait for the last response to Deselect Package before configuring channel 2025-01-16 0:38 ` Potin Lai @ 2025-01-16 15:29 ` Paul Fertser 2025-01-23 4:10 ` patchwork-bot+netdevbpf 0 siblings, 1 reply; 7+ messages in thread From: Paul Fertser @ 2025-01-16 15:29 UTC (permalink / raw) To: Potin Lai, Cosmo Chou, Eddie James, Samuel Mendoza-Jonas, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman Cc: netdev, linux-kernel, Ivan Mikhaylov, Paul Fertser, stable The NCSI state machine as it's currently implemented assumes that transition to the next logical state is performed either explicitly by calling `schedule_work(&ndp->work)` to re-queue itself or implicitly after processing the predefined (ndp->pending_req_num) number of replies. Thus to avoid the configuration FSM from advancing prematurely and getting out of sync with the process it's essential to not skip waiting for a reply. This patch makes the code wait for reception of the Deselect Package response for the last package probed before proceeding to channel configuration. Thanks go to Potin Lai and Cosmo Chou for the initial investigation and testing. Fixes: 8e13f70be05e ("net/ncsi: Probe single packages to avoid conflict") Cc: stable@vger.kernel.org Signed-off-by: Paul Fertser <fercerpav@gmail.com> --- net/ncsi/ncsi-manage.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index 5cf55bde366d..bf8e27b84a66 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -1373,6 +1373,12 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nd->state = ncsi_dev_state_probe_package; break; case ncsi_dev_state_probe_package: + if (ndp->package_probe_id >= 8) { + /* Last package probed, finishing */ + ndp->flags |= NCSI_DEV_PROBED; + break; + } + ndp->pending_req_num = 1; nca.type = NCSI_PKT_CMD_SP; @@ -1489,13 +1495,8 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) if (ret) goto error; - /* Probe next package */ + /* Probe next package after receiving response */ ndp->package_probe_id++; - if (ndp->package_probe_id >= 8) { - /* Probe finished */ - ndp->flags |= NCSI_DEV_PROBED; - break; - } nd->state = ncsi_dev_state_probe_package; ndp->active_package = NULL; break; -- 2.34.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] net/ncsi: wait for the last response to Deselect Package before configuring channel 2025-01-16 15:29 ` [PATCH] net/ncsi: wait for the last response to Deselect Package before configuring channel Paul Fertser @ 2025-01-23 4:10 ` patchwork-bot+netdevbpf 0 siblings, 0 replies; 7+ messages in thread From: patchwork-bot+netdevbpf @ 2025-01-23 4:10 UTC (permalink / raw) To: Paul Fertser Cc: potin.lai.pt, chou.cosmo, eajames, sam, davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel, fr0st61te, stable Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Thu, 16 Jan 2025 18:29:00 +0300 you wrote: > The NCSI state machine as it's currently implemented assumes that > transition to the next logical state is performed either explicitly by > calling `schedule_work(&ndp->work)` to re-queue itself or implicitly > after processing the predefined (ndp->pending_req_num) number of > replies. Thus to avoid the configuration FSM from advancing prematurely > and getting out of sync with the process it's essential to not skip > waiting for a reply. > > [...] Here is the summary with links: - net/ncsi: wait for the last response to Deselect Package before configuring channel https://git.kernel.org/netdev/net/c/6bb194d036c6 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-01-23 4:10 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-13 2:34 [PATCH v3 0/2] net/ncsi: fix oem gma command handling and state race issue Potin Lai 2025-01-13 2:34 ` [PATCH v3 1/2] net/ncsi: fix locking in Get MAC Address handling Potin Lai 2025-01-13 2:34 ` [PATCH v3 2/2] net/ncsi: fix state race during channel probe completion Potin Lai 2025-01-14 12:55 ` Paul Fertser 2025-01-16 0:38 ` Potin Lai 2025-01-16 15:29 ` [PATCH] net/ncsi: wait for the last response to Deselect Package before configuring channel Paul Fertser 2025-01-23 4:10 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox