All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 net 0/3] There are some bugfix for hibmcge ethernet driver
@ 2025-08-02 12:32 Jijie Shao
  2025-08-02 12:32 ` [PATCH V2 net 1/3] net: hibmcge: fix rtnl deadlock issue Jijie Shao
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jijie Shao @ 2025-08-02 12:32 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, jonathan.cameron,
	shameerali.kolothum.thodi, salil.mehta, netdev, linux-kernel,
	shaojijie

This patch set is intended to fix several issues for hibmcge driver:
1. Holding the rtnl_lock in pci_error_handlers->reset_prepare()
   may lead to a deadlock issue.
2. A division by zero issue caused by debugfs when the port is down.
3. A probabilistic false positive issue with np_link_fail.

---
ChangeLog:
v1 -> v2:
  - Fix a concurrency issue for patch1, suggested by Simon Horman
  v1: https://lore.kernel.org/all/20250731134749.4090041-1-shaojijie@huawei.com/
---

Jijie Shao (3):
  net: hibmcge: fix rtnl deadlock issue
  net: hibmcge: fix the division by zero issue
  net: hibmcge: fix the np_link_fail error reporting issue

 drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c  | 14 +++++---------
 drivers/net/ethernet/hisilicon/hibmcge/hbg_hw.c   | 15 +++++++++++++--
 drivers/net/ethernet/hisilicon/hibmcge/hbg_txrx.h |  3 +++
 3 files changed, 21 insertions(+), 11 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V2 net 1/3] net: hibmcge: fix rtnl deadlock issue
  2025-08-02 12:32 [PATCH V2 net 0/3] There are some bugfix for hibmcge ethernet driver Jijie Shao
@ 2025-08-02 12:32 ` Jijie Shao
  2025-08-05  9:03   ` Simon Horman
  2025-08-02 12:32 ` [PATCH V2 net 2/3] net: hibmcge: fix the division by zero issue Jijie Shao
  2025-08-02 12:32 ` [PATCH V2 net 3/3] net: hibmcge: fix the np_link_fail error reporting issue Jijie Shao
  2 siblings, 1 reply; 7+ messages in thread
From: Jijie Shao @ 2025-08-02 12:32 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, jonathan.cameron,
	shameerali.kolothum.thodi, salil.mehta, netdev, linux-kernel,
	shaojijie

Currently, the hibmcge netdev acquires the rtnl_lock in
pci_error_handlers.reset_prepare() and releases it in
pci_error_handlers.reset_done().

However, in the PCI framework:
pci_reset_bus - __pci_reset_slot - pci_slot_save_and_disable_locked -
 pci_dev_save_and_disable - err_handler->reset_prepare(dev);

In pci_slot_save_and_disable_locked():
	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
		if (!dev->slot || dev->slot!= slot)
			continue;
		pci_dev_save_and_disable(dev);
		if (dev->subordinate)
			pci_bus_save_and_disable_locked(dev->subordinate);
	}

This will iterate through all devices under the current bus and execute
err_handler->reset_prepare(), causing two devices of the hibmcge driver
to sequentially request the rtnl_lock, leading to a deadlock.

Since the driver now executes netif_device_detach()
before the reset process, it will not concurrently with
other netdev APIs, so there is no need to hold the rtnl_lock now.

Therefore, this patch removes the rtnl_lock during the reset process and
adjusts the position of HBG_NIC_STATE_RESETTING to ensure
that multiple resets are not executed concurrently.

Fixes: 3f5a61f6d504f ("net: hibmcge: Add reset supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
ChangeLog:
v1 -> v2:
  - Fix a concurrency issue, suggested by Simon Horman
  v1: https://lore.kernel.org/all/20250731134749.4090041-1-shaojijie@huawei.com/
---
 drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c b/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c
index 503cfbfb4a8a..83cf75bf7a17 100644
--- a/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c
+++ b/drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c
@@ -53,9 +53,11 @@ static int hbg_reset_prepare(struct hbg_priv *priv, enum hbg_reset_type type)
 {
 	int ret;
 
-	ASSERT_RTNL();
+	if (test_and_set_bit(HBG_NIC_STATE_RESETTING, &priv->state))
+		return -EBUSY;
 
 	if (netif_running(priv->netdev)) {
+		clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 		dev_warn(&priv->pdev->dev,
 			 "failed to reset because port is up\n");
 		return -EBUSY;
@@ -64,7 +66,6 @@ static int hbg_reset_prepare(struct hbg_priv *priv, enum hbg_reset_type type)
 	netif_device_detach(priv->netdev);
 
 	priv->reset_type = type;
-	set_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 	clear_bit(HBG_NIC_STATE_RESET_FAIL, &priv->state);
 	ret = hbg_hw_event_notify(priv, HBG_HW_EVENT_RESET);
 	if (ret) {
@@ -84,29 +85,26 @@ static int hbg_reset_done(struct hbg_priv *priv, enum hbg_reset_type type)
 	    type != priv->reset_type)
 		return 0;
 
-	ASSERT_RTNL();
-
-	clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 	ret = hbg_rebuild(priv);
 	if (ret) {
 		priv->stats.reset_fail_cnt++;
 		set_bit(HBG_NIC_STATE_RESET_FAIL, &priv->state);
+		clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 		dev_err(&priv->pdev->dev, "failed to rebuild after reset\n");
 		return ret;
 	}
 
 	netif_device_attach(priv->netdev);
+	clear_bit(HBG_NIC_STATE_RESETTING, &priv->state);
 
 	dev_info(&priv->pdev->dev, "reset done\n");
 	return ret;
 }
 
-/* must be protected by rtnl lock */
 int hbg_reset(struct hbg_priv *priv)
 {
 	int ret;
 
-	ASSERT_RTNL();
 	ret = hbg_reset_prepare(priv, HBG_RESET_TYPE_FUNCTION);
 	if (ret)
 		return ret;
@@ -171,7 +169,6 @@ static void hbg_pci_err_reset_prepare(struct pci_dev *pdev)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct hbg_priv *priv = netdev_priv(netdev);
 
-	rtnl_lock();
 	hbg_reset_prepare(priv, HBG_RESET_TYPE_FLR);
 }
 
@@ -181,7 +178,6 @@ static void hbg_pci_err_reset_done(struct pci_dev *pdev)
 	struct hbg_priv *priv = netdev_priv(netdev);
 
 	hbg_reset_done(priv, HBG_RESET_TYPE_FLR);
-	rtnl_unlock();
 }
 
 static const struct pci_error_handlers hbg_pci_err_handler = {
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V2 net 2/3] net: hibmcge: fix the division by zero issue
  2025-08-02 12:32 [PATCH V2 net 0/3] There are some bugfix for hibmcge ethernet driver Jijie Shao
  2025-08-02 12:32 ` [PATCH V2 net 1/3] net: hibmcge: fix rtnl deadlock issue Jijie Shao
@ 2025-08-02 12:32 ` Jijie Shao
  2025-08-06  1:14   ` Jakub Kicinski
  2025-08-02 12:32 ` [PATCH V2 net 3/3] net: hibmcge: fix the np_link_fail error reporting issue Jijie Shao
  2 siblings, 1 reply; 7+ messages in thread
From: Jijie Shao @ 2025-08-02 12:32 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, jonathan.cameron,
	shameerali.kolothum.thodi, salil.mehta, netdev, linux-kernel,
	shaojijie

When the network port is down, the queue is released, and ring->len is 0.
In debugfs, hbg_get_queue_used_num() will be called,
which may lead to a division by zero issue.

This patch adds a check, if ring->len is 0,
hbg_get_queue_used_num() directly returns 0.

Fixes: 40735e7543f9 ("net: hibmcge: Implement .ndo_start_xmit function")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 drivers/net/ethernet/hisilicon/hibmcge/hbg_txrx.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hibmcge/hbg_txrx.h b/drivers/net/ethernet/hisilicon/hibmcge/hbg_txrx.h
index 2883a5899ae2..2aecc73f3d49 100644
--- a/drivers/net/ethernet/hisilicon/hibmcge/hbg_txrx.h
+++ b/drivers/net/ethernet/hisilicon/hibmcge/hbg_txrx.h
@@ -29,6 +29,9 @@ static inline bool hbg_fifo_is_full(struct hbg_priv *priv, enum hbg_dir dir)
 
 static inline u32 hbg_get_queue_used_num(struct hbg_ring *ring)
 {
+	if (!ring->len)
+		return 0;
+
 	return (ring->ntu + ring->len - ring->ntc) % ring->len;
 }
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V2 net 3/3] net: hibmcge: fix the np_link_fail error reporting issue
  2025-08-02 12:32 [PATCH V2 net 0/3] There are some bugfix for hibmcge ethernet driver Jijie Shao
  2025-08-02 12:32 ` [PATCH V2 net 1/3] net: hibmcge: fix rtnl deadlock issue Jijie Shao
  2025-08-02 12:32 ` [PATCH V2 net 2/3] net: hibmcge: fix the division by zero issue Jijie Shao
@ 2025-08-02 12:32 ` Jijie Shao
  2 siblings, 0 replies; 7+ messages in thread
From: Jijie Shao @ 2025-08-02 12:32 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, jonathan.cameron,
	shameerali.kolothum.thodi, salil.mehta, netdev, linux-kernel,
	shaojijie

Currently, after modifying device port mode, the np_link_ok state
is immediately checked. At this point, the device may not yet ready,
leading to the querying of an intermediate state.

This patch will poll to check if np_link is ok after
modifying device port mode, and only report np_link_fail upon timeout.

Fixes: e0306637e85d ("net: hibmcge: Add support for mac link exception handling feature")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
---
 drivers/net/ethernet/hisilicon/hibmcge/hbg_hw.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hibmcge/hbg_hw.c b/drivers/net/ethernet/hisilicon/hibmcge/hbg_hw.c
index 8cca8316ba40..d0aa0661ecd4 100644
--- a/drivers/net/ethernet/hisilicon/hibmcge/hbg_hw.c
+++ b/drivers/net/ethernet/hisilicon/hibmcge/hbg_hw.c
@@ -12,6 +12,8 @@
 
 #define HBG_HW_EVENT_WAIT_TIMEOUT_US	(2 * 1000 * 1000)
 #define HBG_HW_EVENT_WAIT_INTERVAL_US	(10 * 1000)
+#define HBG_MAC_LINK_WAIT_TIMEOUT_US	(500 * 1000)
+#define HBG_MAC_LINK_WAIT_INTERVAL_US	(5 * 1000)
 /* little endian or big endian.
  * ctrl means packet description, data means skb packet data
  */
@@ -228,6 +230,9 @@ void hbg_hw_fill_buffer(struct hbg_priv *priv, u32 buffer_dma_addr)
 
 void hbg_hw_adjust_link(struct hbg_priv *priv, u32 speed, u32 duplex)
 {
+	u32 link_status;
+	int ret;
+
 	hbg_hw_mac_enable(priv, HBG_STATUS_DISABLE);
 
 	hbg_reg_write_field(priv, HBG_REG_PORT_MODE_ADDR,
@@ -239,8 +244,14 @@ void hbg_hw_adjust_link(struct hbg_priv *priv, u32 speed, u32 duplex)
 
 	hbg_hw_mac_enable(priv, HBG_STATUS_ENABLE);
 
-	if (!hbg_reg_read_field(priv, HBG_REG_AN_NEG_STATE_ADDR,
-				HBG_REG_AN_NEG_STATE_NP_LINK_OK_B))
+	/* wait MAC link up */
+	ret = readl_poll_timeout(priv->io_base + HBG_REG_AN_NEG_STATE_ADDR,
+				 link_status,
+				 FIELD_GET(HBG_REG_AN_NEG_STATE_NP_LINK_OK_B,
+					   link_status),
+				 HBG_MAC_LINK_WAIT_INTERVAL_US,
+				 HBG_MAC_LINK_WAIT_TIMEOUT_US);
+	if (ret)
 		hbg_np_link_fail_task_schedule(priv);
 }
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V2 net 1/3] net: hibmcge: fix rtnl deadlock issue
  2025-08-02 12:32 ` [PATCH V2 net 1/3] net: hibmcge: fix rtnl deadlock issue Jijie Shao
@ 2025-08-05  9:03   ` Simon Horman
  0 siblings, 0 replies; 7+ messages in thread
From: Simon Horman @ 2025-08-05  9:03 UTC (permalink / raw)
  To: Jijie Shao
  Cc: davem, edumazet, kuba, pabeni, andrew+netdev, shenjian15,
	liuyonglong, chenhao418, jonathan.cameron,
	shameerali.kolothum.thodi, salil.mehta, netdev, linux-kernel

On Sat, Aug 02, 2025 at 08:32:24PM +0800, Jijie Shao wrote:
> Currently, the hibmcge netdev acquires the rtnl_lock in
> pci_error_handlers.reset_prepare() and releases it in
> pci_error_handlers.reset_done().
> 
> However, in the PCI framework:
> pci_reset_bus - __pci_reset_slot - pci_slot_save_and_disable_locked -
>  pci_dev_save_and_disable - err_handler->reset_prepare(dev);
> 
> In pci_slot_save_and_disable_locked():
> 	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> 		if (!dev->slot || dev->slot!= slot)
> 			continue;
> 		pci_dev_save_and_disable(dev);
> 		if (dev->subordinate)
> 			pci_bus_save_and_disable_locked(dev->subordinate);
> 	}
> 
> This will iterate through all devices under the current bus and execute
> err_handler->reset_prepare(), causing two devices of the hibmcge driver
> to sequentially request the rtnl_lock, leading to a deadlock.
> 
> Since the driver now executes netif_device_detach()
> before the reset process, it will not concurrently with
> other netdev APIs, so there is no need to hold the rtnl_lock now.
> 
> Therefore, this patch removes the rtnl_lock during the reset process and
> adjusts the position of HBG_NIC_STATE_RESETTING to ensure
> that multiple resets are not executed concurrently.
> 
> Fixes: 3f5a61f6d504f ("net: hibmcge: Add reset supported in this module")
> Signed-off-by: Jijie Shao <shaojijie@huawei.com>
> ---
> ChangeLog:
> v1 -> v2:
>   - Fix a concurrency issue, suggested by Simon Horman
>   v1: https://lore.kernel.org/all/20250731134749.4090041-1-shaojijie@huawei.com/

Thanks for the update.

This version looks good to me.

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2 net 2/3] net: hibmcge: fix the division by zero issue
  2025-08-02 12:32 ` [PATCH V2 net 2/3] net: hibmcge: fix the division by zero issue Jijie Shao
@ 2025-08-06  1:14   ` Jakub Kicinski
  2025-08-06  4:00     ` Jijie Shao
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-08-06  1:14 UTC (permalink / raw)
  To: Jijie Shao
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, shenjian15,
	liuyonglong, chenhao418, jonathan.cameron,
	shameerali.kolothum.thodi, salil.mehta, netdev, linux-kernel

On Sat, 2 Aug 2025 20:32:25 +0800 Jijie Shao wrote:
>  static inline u32 hbg_get_queue_used_num(struct hbg_ring *ring)
>  {
> +	if (!ring->len)
> +		return 0;
> +
>  	return (ring->ntu + ring->len - ring->ntc) % ring->len;

This should probably be a READ_ONCE() to a temporary variable.
There is no locking in debugfs, AFAICT, the value may change
between the test and the division / modulo.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2 net 2/3] net: hibmcge: fix the division by zero issue
  2025-08-06  1:14   ` Jakub Kicinski
@ 2025-08-06  4:00     ` Jijie Shao
  0 siblings, 0 replies; 7+ messages in thread
From: Jijie Shao @ 2025-08-06  4:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: shaojijie, davem, edumazet, pabeni, andrew+netdev, horms,
	shenjian15, liuyonglong, chenhao418, jonathan.cameron,
	shameerali.kolothum.thodi, salil.mehta, netdev, linux-kernel


on 2025/8/6 9:14, Jakub Kicinski wrote:
> On Sat, 2 Aug 2025 20:32:25 +0800 Jijie Shao wrote:
>>   static inline u32 hbg_get_queue_used_num(struct hbg_ring *ring)
>>   {
>> +	if (!ring->len)
>> +		return 0;
>> +
>>   	return (ring->ntu + ring->len - ring->ntc) % ring->len;
> This should probably be a READ_ONCE() to a temporary variable.
> There is no locking in debugfs, AFAICT, the value may change
> between the test and the division / modulo.

Yes, there is indeed a very short time window.
I will add READ_ONCE() to ring->len and read it only once.

Thanks
Jijie Shao



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-06  4:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-02 12:32 [PATCH V2 net 0/3] There are some bugfix for hibmcge ethernet driver Jijie Shao
2025-08-02 12:32 ` [PATCH V2 net 1/3] net: hibmcge: fix rtnl deadlock issue Jijie Shao
2025-08-05  9:03   ` Simon Horman
2025-08-02 12:32 ` [PATCH V2 net 2/3] net: hibmcge: fix the division by zero issue Jijie Shao
2025-08-06  1:14   ` Jakub Kicinski
2025-08-06  4:00     ` Jijie Shao
2025-08-02 12:32 ` [PATCH V2 net 3/3] net: hibmcge: fix the np_link_fail error reporting issue Jijie Shao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.