netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration
@ 2024-02-13 18:32 Thinh Tran
  2024-02-13 18:32 ` [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool Thinh Tran
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Thinh Tran @ 2024-02-13 18:32 UTC (permalink / raw)
  To: kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee, thinhtr

Please refer to the initial cover letter
https://lore.kernel.org/all/20230818161443.708785-1-thinhtr@linux.vnet.ibm.com

In series Version 6, the patch
   [v6,1/4] bnx2x: new flag for tracking HW resource
was successfully made it to the mainline kernel
https://github.com/torvalds/linux/commit/bf23ffc8a9a777dfdeb04232e0946b803adbb6a9
but the rest of the patches did not.

The following patch has been excluded from this series:
    net/bnx2x: prevent excessive debug information during a TX timeout
based on concerns raised by some developers that it might omit valuable
debugging details as in some other scenarios may cause the TX timeout.


v9: adding "Fixes:" tag to commit messages for patch
    net/bnx2x: Prevent access to a freed page in page_pool

v8: adding stack trace to commit messages for patch
    net/bnx2x: Prevent access to a freed page in page_pool

v7: resubmitting patches.

Hereby resubmitting the two patches below:

Thinh Tran (2):
  net/bnx2x: Prevent access to a freed page in page_pool
  net/bnx2x: refactor common code to bnx2x_stop_nic()

 .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   | 28 +++++++++++--------
 .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |  7 +++--
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 25 +++--------------
 .../net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c  | 12 ++------
 4 files changed, 27 insertions(+), 45 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool
  2024-02-13 18:32 [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Thinh Tran
@ 2024-02-13 18:32 ` Thinh Tran
  2024-02-14 20:47   ` Jacob Keller
  2024-02-13 18:32 ` [PATCH v10 2/2] net/bnx2x: refactor common code to bnx2x_stop_nic() Thinh Tran
  2024-02-14 20:48 ` [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Jacob Keller
  2 siblings, 1 reply; 10+ messages in thread
From: Thinh Tran @ 2024-02-13 18:32 UTC (permalink / raw)
  To: kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee, thinhtr

Fix race condition leading to system crash during EEH error handling

During EEH error recovery, the bnx2x driver's transmit timeout logic
could cause a race condition when handling reset tasks. The
bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(),
which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload()
SGEs are freed using bnx2x_free_rx_sge_range(). However, this could
overlap with the EEH driver's attempt to reset the device using
bnx2x_io_slot_reset(), which also frees SGEs. This race condition can
result in system crashes due to accessing freed memory locations.

[  793.003930] EEH: Beginning: 'slot_reset'
[  793.003937] PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
[  793.003939] bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
[  793.004037] bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
[  793.008839] bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
[  793.122134] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
[  793.122143] BUG: Kernel NULL pointer dereference on read at 0x00000000
[  793.122147] Faulting instruction address: 0xc0080000025065fc
[  793.122152] Oops: Kernel access of bad area, sig: 11 [#1]
.....
[  793.122315] Call Trace:
[  793.122318] [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
[  793.122331] [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
[  793.122338] [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
[  793.122342] [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
[  793.122347] [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
[  793.122352] [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
[  793.122356] [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64

To solve this issue, we need to verify page pool allocations before
freeing.

Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")

Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>


---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index d8b1824c334d..0bc1367fd649 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -1002,9 +1002,6 @@ static inline void bnx2x_set_fw_mac_addr(__le16 *fw_hi, __le16 *fw_mid,
 static inline void bnx2x_free_rx_mem_pool(struct bnx2x *bp,
 					  struct bnx2x_alloc_pool *pool)
 {
-	if (!pool->page)
-		return;
-
 	put_page(pool->page);
 
 	pool->page = NULL;
@@ -1015,6 +1012,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
 {
 	int i;
 
+	if (!fp->page_pool.page)
+		return;
+
 	if (fp->mode == TPA_MODE_DISABLED)
 		return;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v10 2/2] net/bnx2x: refactor common code to bnx2x_stop_nic()
  2024-02-13 18:32 [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Thinh Tran
  2024-02-13 18:32 ` [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool Thinh Tran
@ 2024-02-13 18:32 ` Thinh Tran
  2024-02-14 20:50   ` Jacob Keller
  2024-02-14 20:48 ` [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Jacob Keller
  2 siblings, 1 reply; 10+ messages in thread
From: Thinh Tran @ 2024-02-13 18:32 UTC (permalink / raw)
  To: kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee, thinhtr

Refactor common code which disables and releases HW interrupts, deletes
NAPI objects, into a new bnx2x_stop_nic() function.

Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>

---
 .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   | 28 +++++++++++--------
 .../net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |  1 +
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 25 +++--------------
 .../net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c  | 12 ++------
 4 files changed, 24 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index e9c1e1bb5580..e34aff5fb782 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3097,17 +3097,8 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
 		if (!CHIP_IS_E1x(bp))
 			bnx2x_pf_disable(bp);
 
-		if (!bp->nic_stopped) {
-			/* Disable HW interrupts, NAPI */
-			bnx2x_netif_stop(bp, 1);
-			/* Delete all NAPI objects */
-			bnx2x_del_all_napi(bp);
-			if (CNIC_LOADED(bp))
-				bnx2x_del_all_napi_cnic(bp);
-			/* Release IRQs */
-			bnx2x_free_irq(bp);
-			bp->nic_stopped = true;
-		}
+		/* Disable HW interrupts, delete NAPIs, Release IRQs */
+		bnx2x_stop_nic(bp, 1);
 
 		/* Report UNLOAD_DONE to MCP */
 		bnx2x_send_unload_done(bp, false);
@@ -5139,3 +5130,18 @@ void bnx2x_schedule_sp_rtnl(struct bnx2x *bp, enum sp_rtnl_flag flag,
 	   flag);
 	schedule_delayed_work(&bp->sp_rtnl_task, 0);
 }
+
+void bnx2x_stop_nic(struct bnx2x *bp, int disable_hw)
+{
+	if (!bp->nic_stopped) {
+		/* Disable HW interrupts, NAPI */
+		bnx2x_netif_stop(bp, disable_hw);
+		/* Delete all NAPI objects */
+		bnx2x_del_all_napi(bp);
+		if (CNIC_LOADED(bp))
+			bnx2x_del_all_napi_cnic(bp);
+		/* Release IRQs */
+		bnx2x_free_irq(bp);
+		bp->nic_stopped = true;
+	}
+}
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 0bc1367fd649..a35a02299b33 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -553,6 +553,7 @@ void bnx2x_free_skbs(struct bnx2x *bp);
 void bnx2x_netif_stop(struct bnx2x *bp, int disable_hw);
 void bnx2x_netif_start(struct bnx2x *bp);
 int bnx2x_load_cnic(struct bnx2x *bp);
+void bnx2x_stop_nic(struct bnx2x *bp, int disable_hw);
 
 /**
  * bnx2x_enable_msix - set msix configuration.
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0d8e61c63c7c..ff75c883cffe 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9474,18 +9474,8 @@ void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode, bool keep_link)
 		}
 	}
 
-	if (!bp->nic_stopped) {
-		/* Disable HW interrupts, NAPI */
-		bnx2x_netif_stop(bp, 1);
-		/* Delete all NAPI objects */
-		bnx2x_del_all_napi(bp);
-		if (CNIC_LOADED(bp))
-			bnx2x_del_all_napi_cnic(bp);
-
-		/* Release IRQs */
-		bnx2x_free_irq(bp);
-		bp->nic_stopped = true;
-	}
+	/* Disable HW interrupts, delete NAPIs, Release IRQs */
+	bnx2x_stop_nic(bp, 1);
 
 	/* Reset the chip, unless PCI function is offline. If we reach this
 	 * point following a PCI error handling, it means device is really
@@ -14241,16 +14231,9 @@ static pci_ers_result_t bnx2x_io_slot_reset(struct pci_dev *pdev)
 		}
 		bnx2x_drain_tx_queues(bp);
 		bnx2x_send_unload_req(bp, UNLOAD_RECOVERY);
-		if (!bp->nic_stopped) {
-			bnx2x_netif_stop(bp, 1);
-			bnx2x_del_all_napi(bp);
 
-			if (CNIC_LOADED(bp))
-				bnx2x_del_all_napi_cnic(bp);
-
-			bnx2x_free_irq(bp);
-			bp->nic_stopped = true;
-		}
+		/* Disable HW interrupts, delete NAPIs, Release IRQs */
+		bnx2x_stop_nic(bp, 1);
 
 		/* Report UNLOAD_DONE to MCP */
 		bnx2x_send_unload_done(bp, true);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 8946a931e87e..e92e82423096 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -529,16 +529,8 @@ void bnx2x_vfpf_close_vf(struct bnx2x *bp)
 	bnx2x_vfpf_finalize(bp, &req->first_tlv);
 
 free_irq:
-	if (!bp->nic_stopped) {
-		/* Disable HW interrupts, NAPI */
-		bnx2x_netif_stop(bp, 0);
-		/* Delete all NAPI objects */
-		bnx2x_del_all_napi(bp);
-
-		/* Release IRQs */
-		bnx2x_free_irq(bp);
-		bp->nic_stopped = true;
-	}
+	/* Disable HW interrupts, delete NAPIs, Release IRQs */
+	bnx2x_stop_nic(bp, 0);
 }
 
 static void bnx2x_leading_vfq_init(struct bnx2x *bp, struct bnx2x_virtf *vf,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool
  2024-02-13 18:32 ` [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool Thinh Tran
@ 2024-02-14 20:47   ` Jacob Keller
  2024-03-06 21:13     ` Thinh Tran
  0 siblings, 1 reply; 10+ messages in thread
From: Jacob Keller @ 2024-02-14 20:47 UTC (permalink / raw)
  To: Thinh Tran, kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee



On 2/13/2024 10:32 AM, Thinh Tran wrote:
> Fix race condition leading to system crash during EEH error handling
> 
> During EEH error recovery, the bnx2x driver's transmit timeout logic
> could cause a race condition when handling reset tasks. The
> bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(),
> which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload()
> SGEs are freed using bnx2x_free_rx_sge_range(). However, this could
> overlap with the EEH driver's attempt to reset the device using
> bnx2x_io_slot_reset(), which also frees SGEs. This race condition can
> result in system crashes due to accessing freed memory locations.
> 
> [  793.003930] EEH: Beginning: 'slot_reset'
> [  793.003937] PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
> [  793.003939] bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
> [  793.004037] bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
> [  793.008839] bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
> [  793.122134] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
> [  793.122143] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [  793.122147] Faulting instruction address: 0xc0080000025065fc
> [  793.122152] Oops: Kernel access of bad area, sig: 11 [#1]
> .....
> [  793.122315] Call Trace:
> [  793.122318] [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
> [  793.122331] [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
> [  793.122338] [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
> [  793.122342] [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
> [  793.122347] [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
> [  793.122352] [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
> [  793.122356] [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
> 
> To solve this issue, we need to verify page pool allocations before
> freeing.
> 
> Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")
> 
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> 
> 
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> index d8b1824c334d..0bc1367fd649 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> @@ -1002,9 +1002,6 @@ static inline void bnx2x_set_fw_mac_addr(__le16 *fw_hi, __le16 *fw_mid,
>  static inline void bnx2x_free_rx_mem_pool(struct bnx2x *bp,
>  					  struct bnx2x_alloc_pool *pool)
>  {
> -	if (!pool->page)
> -		return;
> -
>  	put_page(pool->page);
>  
>  	pool->page = NULL;
> @@ -1015,6 +1012,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
>  {
>  	int i;
>  
> +	if (!fp->page_pool.page)
> +		return;
> +

Doesn't this still leave a race window where put_page was already called
but page hasn't yet been set NULL? I think you either need to assign
NULL first (and possibly WRITE_ONCE or a barrier depending on platform?)
or some other serialization mechanism to ensure only one thread runs here?

I guess the issue you're seeing is that bnx2x_free_rx_sge_range calls
bnx2x_free_rx_sge even if the page was already removed? Does that mean
you already have some other serialization ensuring that you can't have
both threads call put_page simultaneously?

>  	if (fp->mode == TPA_MODE_DISABLED)
>  		return;
>  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration
  2024-02-13 18:32 [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Thinh Tran
  2024-02-13 18:32 ` [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool Thinh Tran
  2024-02-13 18:32 ` [PATCH v10 2/2] net/bnx2x: refactor common code to bnx2x_stop_nic() Thinh Tran
@ 2024-02-14 20:48 ` Jacob Keller
  2024-02-15 16:08   ` Thinh Tran
  2 siblings, 1 reply; 10+ messages in thread
From: Jacob Keller @ 2024-02-14 20:48 UTC (permalink / raw)
  To: Thinh Tran, kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee



On 2/13/2024 10:32 AM, Thinh Tran wrote:
> Please refer to the initial cover letter
> https://lore.kernel.org/all/20230818161443.708785-1-thinhtr@linux.vnet.ibm.com
> 
> In series Version 6, the patch
>    [v6,1/4] bnx2x: new flag for tracking HW resource
> was successfully made it to the mainline kernel
> https://github.com/torvalds/linux/commit/bf23ffc8a9a777dfdeb04232e0946b803adbb6a9
> but the rest of the patches did not.
> 
> The following patch has been excluded from this series:
>     net/bnx2x: prevent excessive debug information during a TX timeout
> based on concerns raised by some developers that it might omit valuable
> debugging details as in some other scenarios may cause the TX timeout.
> 
> 
> v9: adding "Fixes:" tag to commit messages for patch
>     net/bnx2x: Prevent access to a freed page in page_pool
> 
> v8: adding stack trace to commit messages for patch
>     net/bnx2x: Prevent access to a freed page in page_pool
> 
> v7: resubmitting patches.
> 

The subject didn't clearly identify net-next or net... but the contents
of the series seem to be one bug fix which would make sense to go to net
(unless the bug itself isn't in net yet?) and one refactor that doesn't
seem reasonable to go to net..

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v10 2/2] net/bnx2x: refactor common code to bnx2x_stop_nic()
  2024-02-13 18:32 ` [PATCH v10 2/2] net/bnx2x: refactor common code to bnx2x_stop_nic() Thinh Tran
@ 2024-02-14 20:50   ` Jacob Keller
  2024-02-15 16:17     ` Thinh Tran
  0 siblings, 1 reply; 10+ messages in thread
From: Jacob Keller @ 2024-02-14 20:50 UTC (permalink / raw)
  To: Thinh Tran, kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee



On 2/13/2024 10:32 AM, Thinh Tran wrote:
> Refactor common code which disables and releases HW interrupts, deletes
> NAPI objects, into a new bnx2x_stop_nic() function.
> 
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> 

Good cleanup.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

> +void bnx2x_stop_nic(struct bnx2x *bp, int disable_hw)
> +{
> +	if (!bp->nic_stopped) {
> +		/* Disable HW interrupts, NAPI */
> +		bnx2x_netif_stop(bp, disable_hw);
> +		/* Delete all NAPI objects */
> +		bnx2x_del_all_napi(bp);
> +		if (CNIC_LOADED(bp))
> +			bnx2x_del_all_napi_cnic(bp);
> +		/* Release IRQs */
> +		bnx2x_free_irq(bp);
> +		bp->nic_stopped = true;
> +	}
> +}

IMHO this could be an early return:

if (bp->nic_stopped)
    return;

That is a bit of a nitpick and the existing code (by necessity of not
being a function) used this style.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration
  2024-02-14 20:48 ` [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Jacob Keller
@ 2024-02-15 16:08   ` Thinh Tran
  2024-02-15 17:29     ` Jacob Keller
  0 siblings, 1 reply; 10+ messages in thread
From: Thinh Tran @ 2024-02-15 16:08 UTC (permalink / raw)
  To: Jacob Keller, kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee


Thank you for the feed back
On 2/14/2024 2:48 PM, Jacob Keller wrote:
> 
> The subject didn't clearly identify net-next or net... but the contents
> of the series seem to be one bug fix which would make sense to go to net
> (unless the bug itself isn't in net yet?) and one refactor that doesn't
> seem reasonable to go to net..

I agree that the refactor patch does not need to be included in the 
'net' tree.
Should the patches be resubmitted separately, with one targeted for 
'net' and the other for 'net-next'? Your advice on the best approach 
would be greatly appreciated.

Thanks
Thinh Tran

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v10 2/2] net/bnx2x: refactor common code to bnx2x_stop_nic()
  2024-02-14 20:50   ` Jacob Keller
@ 2024-02-15 16:17     ` Thinh Tran
  0 siblings, 0 replies; 10+ messages in thread
From: Thinh Tran @ 2024-02-15 16:17 UTC (permalink / raw)
  To: Jacob Keller, kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee



On 2/14/2024 2:50 PM, Jacob Keller wrote:
> 
> Good cleanup.
> 
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
>
Thank you for the review.

> 
> IMHO this could be an early return:
> 
> if (bp->nic_stopped)
>      return;
> 
> That is a bit of a nitpick and the existing code (by necessity of not
> being a function) used this style.

and thank you for the suggestion. I will update the code to utilize an 
early return as you proposed.

Thanks,
Thinh Tran

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration
  2024-02-15 16:08   ` Thinh Tran
@ 2024-02-15 17:29     ` Jacob Keller
  0 siblings, 0 replies; 10+ messages in thread
From: Jacob Keller @ 2024-02-15 17:29 UTC (permalink / raw)
  To: Thinh Tran, kuba
  Cc: netdev, davem, manishc, pabeni, skalluru, simon.horman, edumazet,
	VENKATA.SAI.DUGGI, drc, abdhalee



On 2/15/2024 8:08 AM, Thinh Tran wrote:
> 
> Thank you for the feed back
> On 2/14/2024 2:48 PM, Jacob Keller wrote:
>>
>> The subject didn't clearly identify net-next or net... but the contents
>> of the series seem to be one bug fix which would make sense to go to net
>> (unless the bug itself isn't in net yet?) and one refactor that doesn't
>> seem reasonable to go to net..
> 
> I agree that the refactor patch does not need to be included in the 
> 'net' tree.
> Should the patches be resubmitted separately, with one targeted for 
> 'net' and the other for 'net-next'? Your advice on the best approach 
> would be greatly appreciated.
> 
> Thanks
> Thinh Tran

Yep. Targeting the fix to net means it will hit the next release and can
get ported to the stable trees.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool
  2024-02-14 20:47   ` Jacob Keller
@ 2024-03-06 21:13     ` Thinh Tran
  0 siblings, 0 replies; 10+ messages in thread
From: Thinh Tran @ 2024-03-06 21:13 UTC (permalink / raw)
  To: jacob.e.keller
  Cc: VENKATA.SAI.DUGGI, abdhalee, davem, drc, edumazet, kuba, manishc,
	netdev, pabeni, simon.horman, skalluru, thinhtr

Apologies for the delayed response. I did not receive this email and 
some others in my mailbox.


> Doesn't this still leave a race window where put_page was already called
> but page hasn't yet been set NULL? I think you either need to assign
> NULL first (and possibly WRITE_ONCE or a barrier depending on platform?)
> or some other serialization mechanism to ensure only one thread runs here?
 >
> I guess the issue you're seeing is that bnx2x_free_rx_sge_range calls
> bnx2x_free_rx_sge even if the page was already removed? Does that mean

yes

> you already have some other serialization ensuring that you can't have
> both threads call put_page simultaneously?

The callers to bnx2x_free_rx_sge_range() are under rtnl_lock(), which 
should handle the serialization.

The crash occurs in the bnx2x_free_rx_sge() function due to accessing a 
NULL pointer.

799  static inline void bnx2x_free_rx_sge(struct bnx2x *bp,
800				struct bnx2x_fastpath *fp, u16 index)
801  {
802	struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
803     struct page *page = sw_buf->page;
804	struct eth_rx_sge *sge = &fp->rx_sge_ring[index];
.....
810	/* Since many fragments can share the same page, make sure to
811	 * only unmap and free the page once.
812	 */
813	dma_unmap_page(&bp->pdev->dev, dma_unmap_addr(sw_buf, mapping),
814		       SGE_PAGE_SIZE, DMA_FROM_DEVICE);
815
816	put_page(page);
...
}

This happens because sw_buf was set to NULL after the call to 
dma_unmap_page(), called by the preceding thread.
The patch checking if that page in the pool is already freed, there is 
nothing else to do.

Thinh Tran


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-03-06 21:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-13 18:32 [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Thinh Tran
2024-02-13 18:32 ` [PATCH v10 1/2] net/bnx2x: Prevent access to a freed page in page_pool Thinh Tran
2024-02-14 20:47   ` Jacob Keller
2024-03-06 21:13     ` Thinh Tran
2024-02-13 18:32 ` [PATCH v10 2/2] net/bnx2x: refactor common code to bnx2x_stop_nic() Thinh Tran
2024-02-14 20:50   ` Jacob Keller
2024-02-15 16:17     ` Thinh Tran
2024-02-14 20:48 ` [PATCH v10 0/2] bnx2x: Fix error recovering in switch configuration Jacob Keller
2024-02-15 16:08   ` Thinh Tran
2024-02-15 17:29     ` Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).