* [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
@ 2023-10-02 18:55 Thinh Tran
2023-10-03 4:34 ` Pavan Chebbi
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Thinh Tran @ 2023-10-02 18:55 UTC (permalink / raw)
To: netdev
Cc: siva.kallam, prashant, mchan, drc, pavan.chebbi, Thinh Tran,
Venkata Sai Duggi
during the EEH error injection tests on the 4-port 1 GbE NetXtreme
BCM5719 Gigabit Ethernet PCIe adapter, a race condition was observed in
the process of resetting and setting the driver flag to
TX_RECOVERY_PENDING between tg3_reset_task_cancel() and tg3_tx_recover().
As a result, it occasionally leads to transmit timeouts and the
subsequent disabling of all the driver's interfaces.
[12046.886221] NETDEV WATCHDOG: eth16 (tg3): transmit queue 0 timed out
[12046.886238] WARNING: CPU: 7 PID: 0 at ../net/sched/sch_generic.c:478
dev_watchdog+0x42c/0x440
[12046.886247] Modules linked in: tg3 libphy nfsv3 nfs_acl .......
..........
[12046.886571] tg3 0021:01:00.0 eth16: transmit timed out, resetting
...........
[12046.966175] tg3 0021:01:00.1 eth15: transmit timed out, resetting
...........
[12046.981584] tg3 0021:01:00.2 eth14: transmit timed out, resetting
...........
[12047.056165] tg3 0021:01:00.3 eth13: transmit timed out, resetting
Fixing this issue by taking the spinlock when modifying the driver flag
Fixes: 6c4ca03bd890 ("net/tg3: resolve deadlock in tg3_reset_task() during EEH")
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
---
drivers/net/ethernet/broadcom/tg3.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 14b311196b8f..f4558762f9de 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6507,7 +6507,9 @@ static void tg3_tx_recover(struct tg3 *tp)
"Please report the problem to the driver maintainer "
"and include system chipset information.\n");
+ tg3_full_lock(tp, 0);
tg3_flag_set(tp, TX_RECOVERY_PENDING);
+ tg3_full_unlock(tp);
}
static inline u32 tg3_tx_avail(struct tg3_napi *tnapi)
@@ -7210,7 +7212,10 @@ static inline void tg3_reset_task_cancel(struct tg3 *tp)
{
if (test_and_clear_bit(TG3_FLAG_RESET_TASK_PENDING, tp->tg3_flags))
cancel_work_sync(&tp->reset_task);
+
+ tg3_full_lock(tp, 0);
tg3_flag_clear(tp, TX_RECOVERY_PENDING);
+ tg3_full_unlock(tp);
}
static int tg3_poll_msix(struct napi_struct *napi, int budget)
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
2023-10-02 18:55 [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel() Thinh Tran
@ 2023-10-03 4:34 ` Pavan Chebbi
2023-10-31 23:18 ` Thinh Tran
2023-10-03 9:37 ` Michael Chan
2023-11-02 16:12 ` [PATCH v2] net/tg3: fix race condition in tg3_reset_task() Thinh Tran
2 siblings, 1 reply; 22+ messages in thread
From: Pavan Chebbi @ 2023-10-03 4:34 UTC (permalink / raw)
To: Thinh Tran; +Cc: netdev, siva.kallam, prashant, mchan, drc, Venkata Sai Duggi
[-- Attachment #1: Type: text/plain, Size: 716 bytes --]
On Tue, Oct 3, 2023 at 12:25 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
> during the EEH error injection tests on the 4-port 1 GbE NetXtreme
> BCM5719 Gigabit Ethernet PCIe adapter, a race condition was observed in
> the process of resetting and setting the driver flag to
> TX_RECOVERY_PENDING between tg3_reset_task_cancel() and tg3_tx_recover().
> As a result, it occasionally leads to transmit timeouts and the
> subsequent disabling of all the driver's interfaces.
Can you elaborate on the race condition please? Are you saying
tg3_reset_task_cancel() cleared the flag and tg3_tx_recover() set it
again and the reset task never got a chance to run?
Is that what is leading to TX stall?
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
2023-10-02 18:55 [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel() Thinh Tran
2023-10-03 4:34 ` Pavan Chebbi
@ 2023-10-03 9:37 ` Michael Chan
2023-10-03 22:05 ` Thinh Tran
2023-11-02 16:12 ` [PATCH v2] net/tg3: fix race condition in tg3_reset_task() Thinh Tran
2 siblings, 1 reply; 22+ messages in thread
From: Michael Chan @ 2023-10-03 9:37 UTC (permalink / raw)
To: Thinh Tran
Cc: netdev, siva.kallam, prashant, mchan, drc, pavan.chebbi,
Venkata Sai Duggi
[-- Attachment #1: Type: text/plain, Size: 2718 bytes --]
On Mon, Oct 2, 2023 at 11:55 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
> during the EEH error injection tests on the 4-port 1 GbE NetXtreme
> BCM5719 Gigabit Ethernet PCIe adapter, a race condition was observed in
> the process of resetting and setting the driver flag to
> TX_RECOVERY_PENDING between tg3_reset_task_cancel() and tg3_tx_recover().
> As a result, it occasionally leads to transmit timeouts and the
> subsequent disabling of all the driver's interfaces.
>
> [12046.886221] NETDEV WATCHDOG: eth16 (tg3): transmit queue 0 timed out
> [12046.886238] WARNING: CPU: 7 PID: 0 at ../net/sched/sch_generic.c:478
> dev_watchdog+0x42c/0x440
> [12046.886247] Modules linked in: tg3 libphy nfsv3 nfs_acl .......
> ..........
> [12046.886571] tg3 0021:01:00.0 eth16: transmit timed out, resetting
> ...........
> [12046.966175] tg3 0021:01:00.1 eth15: transmit timed out, resetting
> ...........
> [12046.981584] tg3 0021:01:00.2 eth14: transmit timed out, resetting
> ...........
> [12047.056165] tg3 0021:01:00.3 eth13: transmit timed out, resetting
>
>
> Fixing this issue by taking the spinlock when modifying the driver flag
>
>
> Fixes: 6c4ca03bd890 ("net/tg3: resolve deadlock in tg3_reset_task() during EEH")
>
>
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
>
> ---
> drivers/net/ethernet/broadcom/tg3.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 14b311196b8f..f4558762f9de 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -6507,7 +6507,9 @@ static void tg3_tx_recover(struct tg3 *tp)
> "Please report the problem to the driver maintainer "
> "and include system chipset information.\n");
>
> + tg3_full_lock(tp, 0);
> tg3_flag_set(tp, TX_RECOVERY_PENDING);
> + tg3_full_unlock(tp);
tg3_flag_set() calls set_bit() which is atomic. The same is true for
tg3_flag_clear(). Maybe we just need some smp_mb__after_atomic() or
similar memory barriers.
> }
>
> static inline u32 tg3_tx_avail(struct tg3_napi *tnapi)
> @@ -7210,7 +7212,10 @@ static inline void tg3_reset_task_cancel(struct tg3 *tp)
> {
> if (test_and_clear_bit(TG3_FLAG_RESET_TASK_PENDING, tp->tg3_flags))
> cancel_work_sync(&tp->reset_task);
> +
> + tg3_full_lock(tp, 0);
> tg3_flag_clear(tp, TX_RECOVERY_PENDING);
> + tg3_full_unlock(tp);
> }
>
> static int tg3_poll_msix(struct napi_struct *napi, int budget)
> --
> 2.25.1
>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
2023-10-03 9:37 ` Michael Chan
@ 2023-10-03 22:05 ` Thinh Tran
2023-11-02 16:02 ` Thinh Tran
0 siblings, 1 reply; 22+ messages in thread
From: Thinh Tran @ 2023-10-03 22:05 UTC (permalink / raw)
To: Michael Chan
Cc: netdev, siva.kallam, prashant, mchan, drc, pavan.chebbi,
Venkata Sai Duggi
Thanks for the review.
On 10/3/2023 4:37 AM, Michael Chan wrote:
>
> tg3_flag_set() calls set_bit() which is atomic. The same is true for
> tg3_flag_clear(). Maybe we just need some smp_mb__after_atomic() or
> similar memory barriers.
>
I did not see it being used in this driver. I'll try that.
Thinh Tran
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
2023-10-03 4:34 ` Pavan Chebbi
@ 2023-10-31 23:18 ` Thinh Tran
0 siblings, 0 replies; 22+ messages in thread
From: Thinh Tran @ 2023-10-31 23:18 UTC (permalink / raw)
To: Pavan Chebbi; +Cc: netdev, siva.kallam, prashant, mchan, drc, Venkata Sai Duggi
Thanks for the review and I apologize for the delayed response. I had
some trouble accessing the system, which delayed my investigation.
On 10/2/2023 11:34 PM, Pavan Chebbi wrote:
>
> Can you elaborate on the race condition please? Are you saying
> tg3_reset_task_cancel() cleared the flag and tg3_tx_recover() set it
> again and the reset task never got a chance to run?
> Is that what is leading to TX stall?
This code path only triggered once, and after updating both the system
and adapter firmware, I haven't encountered it again. However, the race
condition issue still persists, causing the interfaces to go down.
Implementing the memory barrier, smp_mb__after_atomic(), as suggested by
Michael Chan, the intermittent problem still persists. Upon closer
investigation, I identified the root cause, details in the next version
of the patch. When I commented out the call to the tg3_dump_state()
function in the tg3_tx_timeout() function, the issue occurred quicker.
I'm working on submitting the v2 of the patch.
Regards,
Thinh Tran
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
2023-10-03 22:05 ` Thinh Tran
@ 2023-11-02 16:02 ` Thinh Tran
0 siblings, 0 replies; 22+ messages in thread
From: Thinh Tran @ 2023-11-02 16:02 UTC (permalink / raw)
To: Michael Chan
Cc: netdev, siva.kallam, prashant, mchan, drc, pavan.chebbi,
Venkata Sai Duggi
Hi Michael,
On 10/3/2023 5:05 PM, Thinh Tran wrote:
> Thanks for the review.
>
> On 10/3/2023 4:37 AM, Michael Chan wrote:
>
>>
>> tg3_flag_set() calls set_bit() which is atomic. The same is true for
>> tg3_flag_clear(). Maybe we just need some smp_mb__after_atomic() or
>> similar memory barriers.
>>
>
> I did not see it being used in this driver. I'll try that.
>
> Thinh Tran
>
I tried that but still the intermittent problem still persists.
I have a fix that I'll describe in V2 of the patch
Thinh Tran
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-10-02 18:55 [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel() Thinh Tran
2023-10-03 4:34 ` Pavan Chebbi
2023-10-03 9:37 ` Michael Chan
@ 2023-11-02 16:12 ` Thinh Tran
2023-11-02 17:27 ` Michael Chan
2023-11-16 15:18 ` [PATCH v3] " Thinh Tran
2 siblings, 2 replies; 22+ messages in thread
From: Thinh Tran @ 2023-11-02 16:12 UTC (permalink / raw)
To: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc
Cc: venkata.sai.duggi, Thinh Tran
When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:
enum {
/* I/O channel is in normal state */
pci_channel_io_normal = (__force pci_channel_state_t) 1,
/* I/O to channel is blocked */
pci_channel_io_frozen = (__force pci_channel_state_t) 2,
/* PCI card is dead */
pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
};
If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.
EEH driver gets error event
--> eeh_set_channel_state()
and set device to one of
error state above scheduler: tg3_reset_task() get
returned error from tg3_init_hw()
--> dev_close() shuts down the interface
tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device
To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_tx_timeout() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
---
drivers/net/ethernet/broadcom/tg3.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 14b311196b8f..1c72ef05ab1b 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7630,6 +7630,26 @@ static void tg3_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct tg3 *tp = netdev_priv(dev);
+ /* checking the PCI channel state for hard errors
+ * for pci_channel_io_frozen case
+ * - I/O to channel is blocked.
+ * The EEH layer and I/O error detections will
+ * handle the reset procedure
+ * for pci_channel_io_perm_failure case
+ * - the PCI card is dead.
+ * The reset will not help
+ * report the error for both cases and return.
+ */
+ if (tp->pdev->error_state == pci_channel_io_frozen) {
+ netdev_err(dev, " %s, I/O to channel is blocked\n", __func__);
+ return;
+ }
+
+ if (tp->pdev->error_state == pci_channel_io_perm_failure) {
+ netdev_err(dev, " %s, adapter has failed permanently!\n", __func__);
+ return;
+ }
+
if (netif_msg_tx_err(tp)) {
netdev_err(dev, "transmit timed out, resetting\n");
tg3_dump_state(tp);
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-11-02 16:12 ` [PATCH v2] net/tg3: fix race condition in tg3_reset_task() Thinh Tran
@ 2023-11-02 17:27 ` Michael Chan
2023-11-02 20:37 ` Thinh Tran
2023-11-16 15:18 ` [PATCH v3] " Thinh Tran
1 sibling, 1 reply; 22+ messages in thread
From: Michael Chan @ 2023-11-02 17:27 UTC (permalink / raw)
To: Thinh Tran
Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
venkata.sai.duggi
[-- Attachment #1: Type: text/plain, Size: 2016 bytes --]
On Thu, Nov 2, 2023 at 9:16 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
> When an EEH error is encountered by a PCI adapter, the EEH driver
> modifies the PCI channel's state as shown below:
>
> enum {
> /* I/O channel is in normal state */
> pci_channel_io_normal = (__force pci_channel_state_t) 1,
>
> /* I/O to channel is blocked */
> pci_channel_io_frozen = (__force pci_channel_state_t) 2,
>
> /* PCI card is dead */
> pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
> };
>
> If the same EEH error then causes the tg3 driver's transmit timeout
> logic to execute, the tg3_tx_timeout() function schedules a reset
> task via tg3_reset_task_schedule(), which may cause a race condition
> between the tg3 and EEH driver as both attempt to recover the HW via
> a reset action.
>
> EEH driver gets error event
> --> eeh_set_channel_state()
> and set device to one of
> error state above scheduler: tg3_reset_task() get
> returned error from tg3_init_hw()
> --> dev_close() shuts down the interface
>
> tg3_io_slot_reset() and
> tg3_io_resume() fail to
> reset/resume the device
>
>
> To resolve this issue, we avoid the race condition by checking the PCI
> channel state in the tg3_tx_timeout() function and skip the tg3 driver
> initiated reset when the PCI channel is not in the normal state. (The
> driver has no access to tg3 device registers at this point and cannot
> even complete the reset task successfully without external assistance.)
> We'll leave the reset procedure to be managed by the EEH driver which
> calls the tg3_io_error_detected(), tg3_io_slot_reset() and
> tg3_io_resume() functions as appropriate.
This scenario can affect other drivers too, right? Shouldn't this be
handled in a higher layer before calling ->ndo_tx_timeout() so we
don't have to add this logic to all the other drivers? Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-11-02 17:27 ` Michael Chan
@ 2023-11-02 20:37 ` Thinh Tran
2023-11-14 17:39 ` Thinh Tran
0 siblings, 1 reply; 22+ messages in thread
From: Thinh Tran @ 2023-11-02 20:37 UTC (permalink / raw)
To: Michael Chan
Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
venkata.sai.duggi
Thanks for the review.
On 11/2/2023 12:27 PM, Michael Chan wrote:
>
> This scenario can affect other drivers too, right? Shouldn't this be
> handled in a higher layer before calling ->ndo_tx_timeout() so we
> don't have to add this logic to all the other drivers? Thanks.
Yes, it does. We can add this into the dev_watchdog() function, but
further investigations are required. This is because each driver may
have a different approach to handling its own ->ndo_tx_timeout() function.
Thinh Tran
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-11-02 20:37 ` Thinh Tran
@ 2023-11-14 17:39 ` Thinh Tran
2023-11-14 21:03 ` Michael Chan
0 siblings, 1 reply; 22+ messages in thread
From: Thinh Tran @ 2023-11-14 17:39 UTC (permalink / raw)
To: Michael Chan
Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
venkata.sai.duggi
Hi Michael,
On 11/2/2023 3:37 PM, Thinh Tran wrote:
> Thanks for the review.
>
> On 11/2/2023 12:27 PM, Michael Chan wrote:
>>
>> This scenario can affect other drivers too, right? Shouldn't this be
>> handled in a higher layer before calling ->ndo_tx_timeout() so we
>> don't have to add this logic to all the other drivers? Thanks.
>
> Yes, it does. We can add this into the dev_watchdog() function, but
> further investigations are required. This is because each driver may
> have a different approach to handling its own ->ndo_tx_timeout() function.
>
> Thinh Tran
I attempted to relocate the fix into the dev_watchdog(), but we
experienced crashes in various drivers, leading to the destabilization
of other drivers.
I propose to accept the current patch, rather than introducing a new fix
that may potentially create issues for other drivers
Thanks,
Thinh Tran
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-11-14 17:39 ` Thinh Tran
@ 2023-11-14 21:03 ` Michael Chan
2023-11-15 18:23 ` Thinh Tran
0 siblings, 1 reply; 22+ messages in thread
From: Michael Chan @ 2023-11-14 21:03 UTC (permalink / raw)
To: Thinh Tran
Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
venkata.sai.duggi
[-- Attachment #1: Type: text/plain, Size: 614 bytes --]
On Tue, Nov 14, 2023 at 9:39 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
> I attempted to relocate the fix into the dev_watchdog(), but we
> experienced crashes in various drivers, leading to the destabilization
> of other drivers.
Could you provide more information about the crashes? The
dev_watchdog() code already checks for netif_device_present() and
netif_running() and netif_carrier_ok() before proceeding to check for
TX timeout. Why would adding some additional checks for PCI errors
cause problems? Of course the additional checks should only be done
on PCI devices only. Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-11-14 21:03 ` Michael Chan
@ 2023-11-15 18:23 ` Thinh Tran
2023-11-15 18:56 ` Michael Chan
0 siblings, 1 reply; 22+ messages in thread
From: Thinh Tran @ 2023-11-15 18:23 UTC (permalink / raw)
To: Michael Chan
Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
venkata.sai.duggi
On 11/14/2023 3:03 PM, Michael Chan wrote:
>
> Could you provide more information about the crashes? The
> dev_watchdog() code already checks for netif_device_present() and
> netif_running() and netif_carrier_ok() before proceeding to check for
> TX timeout. Why would adding some additional checks for PCI errors
> cause problems? Of course the additional checks should only be done
> on PCI devices only. Thanks.
The checking for PCI errors is not the problem, avoiding calling drivers
->ndo_tx_timeout() function, causing some issue.
Here is the fix in the dev_watchdog():
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -24,6 +24,7 @@
#include <linux/if_vlan.h>
#include <linux/skb_array.h>
#include <linux/if_macvlan.h>
+#include <linux/pci.h>
#include <net/sch_generic.h>
#include <net/pkt_sched.h>
#include <net/dst.h>
@@ -521,12 +522,32 @@ static void dev_watchdog(struct timer_list *t)
}
if (unlikely(some_queue_timedout)) {
+ struct pci_dev *pci_dev;
+
trace_net_dev_xmit_timeout(dev, i);
WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s
(%s): transmit queue %u timed out\n",
dev->name, netdev_drivername(dev), i);
- netif_freeze_queues(dev);
- dev->netdev_ops->ndo_tx_timeout(dev, i);
- netif_unfreeze_queues(dev);
+ pci_dev = to_pci_dev(dev->dev.parent);
+ if (pci_dev && (pci_dev->error_state !=
pci_channel_io_normal)) {
+ /* checking the PCI channel state for hard errors
+ * for pci_channel_io_frozen case
+ * - I/O to channel is blocked.
+ * The EEH layer and I/O error detections will
+ * handle the reset procedure
+ * for pci_channel_io_perm_failure case
+ * - the PCI card is dead. The reset will not help
+ * Report the error for either case, but not calling
+ * the driver's ndo_tx_timeout() function.
+ */
+ if (pci_dev->error_state ==
pci_channel_io_frozen)
+ netdev_err(dev, " %s, I/O to channel is
blocked\n",dev->name);
+ else
+ netdev_err(dev, " %s, adapter has failed
permanently!\n", dev->name );
+ } else {
+ netif_freeze_queues(dev);
+ dev->netdev_ops->ndo_tx_timeout(dev, i);
+ netif_unfreeze_queues(dev);
+ }
}
if (!mod_timer(&dev->watchdog_timer,
round_jiffies(jiffies +
--
from the crash dump on a system with 4-port BCM57800 adapter
crash> net
NET_DEVICE NAME IP ADDRESS(ES)
c000000003581000 lo 127.0.0.1, ::1
c000000008f8b000 net0 9.3.233.69
c00000000315c000 enP23p1s0f0 101.1.233.69
c000000003164000 enP23p1s0f1 102.1.233.69
c00000000316c000 enP23p1s0f2 103.1.233.69
c000000003174000 enP23p1s0f3 104.1.233.69
crash> dmesg
[ 752.115994] ------------[ cut here ]------------
[ 752.115996] NETDEV WATCHDOG: enP23p1s0f0 (bnx2x): transmit queue 2
timed out
[ 752.116018] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:528
dev_watchdog+0x3c8/0x420
[ 752.116037] Modules linked in: rpadlpar_io rpaphp xsk_diag
nft_counter nft_compat nf_tables nfnetlink rfkill bonding tls sunrpc
binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs sd_mod
t10_pi sg ibmvscsi scsi_transport_srp ibmveth bnx2x vmx_crypto mdio
pseries_wdt libcrc32c dm_mirror dm_region_hash dm_log dm_mod fuse
------ snip the watchdog's dump ---------
------ in dev_watchdog() checking the PCI error and skipping
->ndo_tx_timeout() --------
[ 752.116430] ---[ end trace 868b17f3f105be7b ]---
[ 752.116437] bnx2x 0017:01:00.0 enP23p1s0f0: enP23p1s0f0, I/O to
channel is blocked
[ 752.195975] bnx2x: [bnx2x_timer:5811(enP23p1s0f3)]MFW seems hanged:
drv_pulse (0x2ca) != mcp_pulse (0x7fff)
[ 752.195986] bnx2x:
[bnx2x_acquire_hw_lock:2023(enP23p1s0f3)]lock_status 0xffffffff
resource_bit 0x1
[ 752.196507] bnx2x 0017:01:00.3 enP23p1s0f3: MDC/MDIO access timeout
[ 752.196792] EEH: Recovering PHB#17-PE#10000
[ 752.196795] EEH: PE location: N/A, PHB location: N/A
[ 752.196797] EEH: Frozen PHB#17-PE#10000 detected
[ 752.196798] EEH: Call Trace:
[ 752.196799] EEH: [c00000000005102c] __eeh_send_failure_event+0x7c/0x160
[ 752.196808] EEH: [c000000000049cf4]
eeh_dev_check_failure.part.0+0x254/0x650
[ 752.196812] EEH: [c00800000107c7a0] bnx2x_timer+0x1e8/0x250 [bnx2x]
[ 752.196863] EEH: [c00000000023eb00] call_timer_fn+0x50/0x1c0
--------- snip here -------------
[ 752.196913] EEH: This PCI device has failed 1 times in the last hour
and will be permanently disabled after 5 failures.
[ 752.196915] EEH: Notify device drivers to shutdown
[ 752.196918] EEH: Beginning: 'error_detected(IO frozen)'
[ 752.196920] PCI 0017:01:00.0#10000: EEH: Invoking
bnx2x->error_detected(IO frozen)
[ 752.196924] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f0)]IO
error detected
[ 752.197024] bnx2x 0017:01:00.3 enP23p1s0f3: MDC/MDIO access timeout
[ 752.197039] PCI 0017:01:00.0#10000: EEH: bnx2x driver reports: 'need
reset'
[ 752.197041] PCI 0017:01:00.1#10000: EEH: Invoking
bnx2x->error_detected(IO frozen)
[ 752.197042] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f1)]IO
error detected
[ 752.197093] PCI 0017:01:00.1#10000: EEH: bnx2x driver reports: 'need
reset'
[ 752.197095] PCI 0017:01:00.2#10000: EEH: Invoking
bnx2x->error_detected(IO frozen)
[ 752.197096] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f2)]IO
error detected
[ 752.197151] PCI 0017:01:00.2#10000: EEH: bnx2x driver reports: 'need
reset'
[ 752.197153] PCI 0017:01:00.3#10000: EEH: Invoking
bnx2x->error_detected(IO frozen)
[ 752.197154] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f3)]IO
error detected
[ 752.197208] PCI 0017:01:00.3#10000: EEH: bnx2x driver reports: 'need
reset'
[ 752.197210] EEH: Finished:'error_detected(IO frozen)' with aggregate
recovery state:'need reset'
-------------- snip here --------
[ 754.407972] EEH: Beginning: 'slot_reset'
[ 754.407978] PCI 0017:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
[ 754.407981] bnx2x: [bnx2x_io_slot_reset:14225(enP23p1s0f0)]IO slot
reset initializing...
[ 754.408047] bnx2x 0017:01:00.0: enabling device (0140 -> 0142)
[ 754.412432] bnx2x: [bnx2x_io_slot_reset:14241(enP23p1s0f0)]IO slot
reset --> driver unload
--------- snip here ------------
[ 764.526802] PCI 0017:01:00.0#10000: EEH: bnx2x driver reports:
'recovered'
[ 764.526806] PCI 0017:01:00.1#10000: EEH: Invoking bnx2x->slot_reset()
[ 764.526808] bnx2x: [bnx2x_io_slot_reset:14225(enP23p1s0f1)]IO slot
reset initializing...
[ 764.526898] bnx2x 0017:01:00.1: enabling device (0140 -> 0142)
[ 764.531117] bnx2x: [bnx2x_io_slot_reset:14241(enP23p1s0f1)]IO slot
reset --> driver unload
---------- snip here --------
[ 772.770957] bnx2x: [bnx2x_io_slot_reset:14241(enP23p1s0f3)]IO slot
reset --> driver unload
[ 772.886717] PCI 0017:01:00.3#10000: EEH: bnx2x driver reports:
'recovered'
[ 772.886720] EEH: Finished:'slot_reset' with aggregate recovery
state:'recovered'
[ 772.886721] EEH: Notify device driver to resume
[ 772.886722] EEH: Beginning: 'resume'
[ 772.886723] PCI 0017:01:00.0#10000: EEH: Invoking bnx2x->resume()
[ 773.476919] bnx2x 0017:01:00.0 enP23p1s0f0: using MSI-X IRQs: sp 55
fp[0] 57 ... fp[7] 64
[ 773.706115] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[ 773.708230] PCI 0017:01:00.0#10000: EEH: bnx2x driver reports: 'none'
[ 773.708234] PCI 0017:01:00.1#10000: EEH: Invoking bnx2x->resume()
[ 774.307404] bnx2x 0017:01:00.1 enP23p1s0f1: using MSI-X IRQs: sp 65
fp[0] 67 ... fp[7] 74
[ 774.546123] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[ 774.548304] PCI 0017:01:00.1#10000: EEH: bnx2x driver reports: 'none'
[ 774.548311] PCI 0017:01:00.2#10000: EEH: Invoking bnx2x->resume()
[ 774.747483] bnx2x 0017:01:00.2 enP23p1s0f2: using MSI-X IRQs: sp 75
fp[0] 77 ... fp[7] 84
[ 774.756111] bnx2x: [bnx2x_hw_stats_update:871(enP23p1s0f0)]NIG timer
max (0)
[ 775.038454] PCI 0017:01:00.2#10000: EEH: bnx2x driver reports: 'none'
[ 775.038466] PCI 0017:01:00.3#10000: EEH: Invoking bnx2x->resume()
[ 775.228049] bnx2x 0017:01:00.3 enP23p1s0f3: using MSI-X IRQs: sp 85
fp[0] 87 ... fp[7] 94
[ 775.548237] PCI 0017:01:00.3#10000: EEH: bnx2x driver reports: 'none'
[ 775.548245] EEH: Finished:'resume'
[ 775.548247] EEH: Recovery successful.
[ 775.556120] bnx2x: [bnx2x_hw_stats_update:871(enP23p1s0f1)]NIG timer
max (0)
[ 1203.919654] bnx2x 0017:01:00.0 enP23p1s0f0: using MSI-X IRQs: sp 55
fp[0] 57 ... fp[7] 64
[ 1204.156946] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[ 1204.159011] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f0: link becomes
ready
[ 1204.386939] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Down
[ 1209.789617] bnx2x 0017:01:00.1 enP23p1s0f1: using MSI-X IRQs: sp 65
fp[0] 67 ... fp[7] 74
[ 1210.026894] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[ 1210.028955] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f1: link becomes
ready
[ 1210.357268] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[ 1214.526868] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Down
[ 1215.647561] bnx2x 0017:01:00.2 enP23p1s0f2: using MSI-X IRQs: sp 75
fp[0] 77 ... fp[7] 84
[ 1220.357087] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[ 1221.517564] bnx2x 0017:01:00.3 enP23p1s0f3: using MSI-X IRQs: sp 85
fp[0] 87 ... fp[7] 94
[ 1222.012323] systemd-rc-local-generator[16948]: /etc/rc.d/rc.local is
not marked executable, skipping.
[ 1232.476909] bnx2x 0017:01:00.2 enP23p1s0f2: NIC Link is Up, 1000 Mbps
full duplex, Flow control: ON - receive & transmit
[ 1232.476941] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f2: link becomes
ready
[ 1237.996937] bnx2x 0017:01:00.3 enP23p1s0f3: NIC Link is Up, 1000 Mbps
full duplex, Flow control: ON - receive & transmit
[ 1237.996961] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f3: link becomes
ready
---------- snip here ---------
[ 1592.978832] Kernel attempted to write user page (e) - exploit
attempt? (uid: 0)
[ 1592.978836] BUG: Kernel NULL pointer dereference on write at 0x0000000e
[ 1592.978838] Faulting instruction address: 0xc0080000010bb1e8
[ 1592.978841] Oops: Kernel access of bad area, sig: 11 [#1]
----------
crash> bt
PID: 41 TASK: c000000003d29b00 CPU: 5 COMMAND: "ksoftirqd/5"
R0: c0080000010bb1a0 R1: c000000003b7b910 R2: c008000001178000
R3: 08000001173928be R4: c00c00000045ce40 R5: 00000000000028be
R6: 0000000000000001 R7: ffffffffffffffff R8: 0000000000000000
R9: 0000000000000010 R10: 0000000000000000 R11: c0080000010fee78
R12: c000000000231cf0 R13: c000000fffff9080 R14: 0000000000000000
R15: 0000000000000000 R16: 0000000000000001 R17: 0000000000000000
R18: 08000001173928be R19: 0000000000000000 R20: 0000000000000000
R21: c0000001173928be R22: c000000003164a00 R23: c000000012570200
R24: 0000000000000000 R25: 0000000000000001 R26: c00000001c51e050
R27: 0000000000000005 R28: c000000003164000 R29: 0000000000000000
R30: c0000000d38a8ae0 R31: 0000000000000000
NIP: c0080000010bb1e8 MSR: 800000000280b033 OR3: c000000000230128
CTR: c000000000231cf0 LR: c0080000010bb1a0 XER: 0000000020040000
CCR: 0000000048008482 MQ: 0000000000000000 DAR: 000000000000000e
DSISR: 0000000042000000 Syscall Result: 0000000000000000
[NIP : bnx2x_start_xmit+496]
[LR : bnx2x_start_xmit+424]
#0 [c000000003b7b4e0] crash_kexec at c000000000279f8c
#1 [c000000003b7b510] oops_end at c0000000000291a8
#2 [c000000003b7b590] __bad_page_fault at c00000000008d1cc
#3 [c000000003b7b600] data_access_common_virt at c0000000000088dc
Data Access [300] exception frame:
R0: c0080000010bb1a0 R1: c000000003b7b910 R2: c008000001178000
R3: 08000001173928be R4: c00c00000045ce40 R5: 00000000000028be
R6: 0000000000000001 R7: ffffffffffffffff R8: 0000000000000000
R9: 0000000000000010 R10: 0000000000000000 R11: c0080000010fee78
R12: c000000000231cf0 R13: c000000fffff9080 R14: 0000000000000000
R15: 0000000000000000 R16: 0000000000000001 R17: 0000000000000000
R18: 08000001173928be R19: 0000000000000000 R20: 0000000000000000
R21: c0000001173928be R22: c000000003164a00 R23: c000000012570200
R24: 0000000000000000 R25: 0000000000000001 R26: c00000001c51e050
R27: 0000000000000005 R28: c000000003164000 R29: 0000000000000000
R30: c0000000d38a8ae0 R31: 0000000000000000
NIP: c0080000010bb1e8 MSR: 800000000280b033 OR3: c000000000230128
CTR: c000000000231cf0 LR: c0080000010bb1a0 XER: 0000000020040000
CCR: 0000000048008482 MQ: 0000000000000000 DAR: 000000000000000e
crash> bt
PID: 41 TASK: c000000003d29b00 CPU: 5 COMMAND: "ksoftirqd/5"
R0: c0080000010bb1a0 R1: c000000003b7b910 R2: c008000001178000
R3: 08000001173928be R4: c00c00000045ce40 R5: 00000000000028be
R6: 0000000000000001 R7: ffffffffffffffff R8: 0000000000000000
R9: 0000000000000010 R10: 0000000000000000 R11: c0080000010fee78
R12: c000000000231cf0 R13: c000000fffff9080 R14: 0000000000000000
R15: 0000000000000000 R16: 0000000000000001 R17: 0000000000000000
R18: 08000001173928be R19: 0000000000000000 R20: 0000000000000000
R21: c0000001173928be R22: c000000003164a00 R23: c000000012570200
R24: 0000000000000000 R25: 0000000000000001 R26: c00000001c51e050
R27: 0000000000000005 R28: c000000003164000 R29: 0000000000000000
R30: c0000000d38a8ae0 R31: 0000000000000000
NIP: c0080000010bb1e8 MSR: 800000000280b033 OR3: c000000000230128
CTR: c000000000231cf0 LR: c0080000010bb1a0 XER: 0000000020040000
CCR: 0000000048008482 MQ: 0000000000000000 DAR: 000000000000000e
DSISR: 0000000042000000 Syscall Result: 0000000000000000
[NIP : bnx2x_start_xmit+496]
[LR : bnx2x_start_xmit+424]
#0 [c000000003b7b4e0] crash_kexec at c000000000279f8c
#1 [c000000003b7b510] oops_end at c0000000000291a8
#2 [c000000003b7b590] __bad_page_fault at c00000000008d1cc
#3 [c000000003b7b600] data_access_common_virt at c0000000000088dc
Data Access [300] exception frame:
R0: c0080000010bb1a0 R1: c000000003b7b910 R2: c008000001178000
R3: 08000001173928be R4: c00c00000045ce40 R5: 00000000000028be
R6: 0000000000000001 R7: ffffffffffffffff R8: 0000000000000000
R9: 0000000000000010 R10: 0000000000000000 R11: c0080000010fee78
R12: c000000000231cf0 R13: c000000fffff9080 R14: 0000000000000000
R15: 0000000000000000 R16: 0000000000000001 R17: 0000000000000000
R18: 08000001173928be R19: 0000000000000000 R20: 0000000000000000
R21: c0000001173928be R22: c000000003164a00 R23: c000000012570200
R24: 0000000000000000 R25: 0000000000000001 R26: c00000001c51e050
R27: 0000000000000005 R28: c000000003164000 R29: 0000000000000000
R30: c0000000d38a8ae0 R31: 0000000000000000
NIP: c0080000010bb1e8 MSR: 800000000280b033 OR3: c000000000230128
CTR: c000000000231cf0 LR: c0080000010bb1a0 XER: 0000000020040000
CCR: 0000000048008482 MQ: 0000000000000000 DAR: 000000000000000e
crash> dis -s bnx2x_start_xmit+496
FILE: drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
LINE: 3858
3853 /* get a tx_buf and first BD
3854 * tx_start_bd may be changed during SPLIT,
3855 * but first_bd will always stay first
3856 */
3857 tx_buf = &txdata->tx_buf_ring[TX_BD(pkt_prod)];
* 3858 tx_start_bd = &txdata->tx_desc_ring[bd_prod].start_bd;
3859 first_bd = tx_start_bd;
3860
I have not identified the root of this crash yet.
Regards,
Thinh
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-11-15 18:23 ` Thinh Tran
@ 2023-11-15 18:56 ` Michael Chan
2023-11-16 14:41 ` Thinh Tran
0 siblings, 1 reply; 22+ messages in thread
From: Michael Chan @ 2023-11-15 18:56 UTC (permalink / raw)
To: Thinh Tran
Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
venkata.sai.duggi
[-- Attachment #1: Type: text/plain, Size: 890 bytes --]
On Wed, Nov 15, 2023 at 10:23 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
>
> On 11/14/2023 3:03 PM, Michael Chan wrote:
> >
> > Could you provide more information about the crashes? The
> > dev_watchdog() code already checks for netif_device_present() and
> > netif_running() and netif_carrier_ok() before proceeding to check for
> > TX timeout. Why would adding some additional checks for PCI errors
> > cause problems? Of course the additional checks should only be done
> > on PCI devices only. Thanks.
>
> The checking for PCI errors is not the problem, avoiding calling drivers
> ->ndo_tx_timeout() function, causing some issue.
I see. By skipping TX timeout during PCI errors, bnx2x crashes in
.ndo_start_xmit() after EEH error recovery.
I think it should be fine to fix the original EEH issue in tg3 then.
Please re-post the tg3 patch. Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
2023-11-15 18:56 ` Michael Chan
@ 2023-11-16 14:41 ` Thinh Tran
0 siblings, 0 replies; 22+ messages in thread
From: Thinh Tran @ 2023-11-16 14:41 UTC (permalink / raw)
To: Michael Chan
Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
venkata.sai.duggi
I'll re-post the V2 patch shortly.
Thanks for the review.
Thinh Tran
On 11/15/2023 12:56 PM, Michael Chan wrote:
> On Wed, Nov 15, 2023 at 10:23 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>>
>>
>> On 11/14/2023 3:03 PM, Michael Chan wrote:
>>>
>>> Could you provide more information about the crashes? The
>>> dev_watchdog() code already checks for netif_device_present() and
>>> netif_running() and netif_carrier_ok() before proceeding to check for
>>> TX timeout. Why would adding some additional checks for PCI errors
>>> cause problems? Of course the additional checks should only be done
>>> on PCI devices only. Thanks.
>>
>> The checking for PCI errors is not the problem, avoiding calling drivers
>> ->ndo_tx_timeout() function, causing some issue.
>
> I see. By skipping TX timeout during PCI errors, bnx2x crashes in
> .ndo_start_xmit() after EEH error recovery.
>
> I think it should be fine to fix the original EEH issue in tg3 then.
> Please re-post the tg3 patch. Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3] net/tg3: fix race condition in tg3_reset_task()
2023-11-02 16:12 ` [PATCH v2] net/tg3: fix race condition in tg3_reset_task() Thinh Tran
2023-11-02 17:27 ` Michael Chan
@ 2023-11-16 15:18 ` Thinh Tran
2023-11-16 21:34 ` Michael Chan
2023-12-01 0:19 ` [PATCH v4] " Thinh Tran
1 sibling, 2 replies; 22+ messages in thread
From: Thinh Tran @ 2023-11-16 15:18 UTC (permalink / raw)
To: mchan
Cc: pavan.chebbi, netdev, prashant, siva.kallam, drc,
venkata.sai.duggi, thinhtr, edumazet, kuba, pabeni, davem
When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:
enum {
/* I/O channel is in normal state */
pci_channel_io_normal = (__force pci_channel_state_t) 1,
/* I/O to channel is blocked */
pci_channel_io_frozen = (__force pci_channel_state_t) 2,
/* PCI card is dead */
pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
};
If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.
EEH driver gets error event
--> eeh_set_channel_state()
and set device to one of
error state above scheduler: tg3_reset_task() get
returned error from tg3_init_hw()
--> dev_close() shuts down the interface
tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device
To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_tx_timeout() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
v3: re-post the patch.
v2: checking PCI errors in tg3_tx_timeout()
---
drivers/net/ethernet/broadcom/tg3.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 14b311196b8f..1c72ef05ab1b 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7630,6 +7630,26 @@ static void tg3_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct tg3 *tp = netdev_priv(dev);
+ /* checking the PCI channel state for hard errors
+ * for pci_channel_io_frozen case
+ * - I/O to channel is blocked.
+ * The EEH layer and I/O error detections will
+ * handle the reset procedure
+ * for pci_channel_io_perm_failure case
+ * - the PCI card is dead.
+ * The reset will not help
+ * report the error for both cases and return.
+ */
+ if (tp->pdev->error_state == pci_channel_io_frozen) {
+ netdev_err(dev, " %s, I/O to channel is blocked\n", __func__);
+ return;
+ }
+
+ if (tp->pdev->error_state == pci_channel_io_perm_failure) {
+ netdev_err(dev, " %s, adapter has failed permanently!\n", __func__);
+ return;
+ }
+
if (netif_msg_tx_err(tp)) {
netdev_err(dev, "transmit timed out, resetting\n");
tg3_dump_state(tp);
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v3] net/tg3: fix race condition in tg3_reset_task()
2023-11-16 15:18 ` [PATCH v3] " Thinh Tran
@ 2023-11-16 21:34 ` Michael Chan
2023-11-17 16:19 ` Thinh Tran
2023-12-01 0:19 ` [PATCH v4] " Thinh Tran
1 sibling, 1 reply; 22+ messages in thread
From: Michael Chan @ 2023-11-16 21:34 UTC (permalink / raw)
To: Thinh Tran
Cc: mchan, pavan.chebbi, netdev, prashant, siva.kallam, drc,
venkata.sai.duggi, edumazet, kuba, pabeni, davem
[-- Attachment #1: Type: text/plain, Size: 1884 bytes --]
On Thu, Nov 16, 2023 at 7:19 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 14b311196b8f..1c72ef05ab1b 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -7630,6 +7630,26 @@ static void tg3_tx_timeout(struct net_device *dev, unsigned int txqueue)
> {
> struct tg3 *tp = netdev_priv(dev);
>
> + /* checking the PCI channel state for hard errors
> + * for pci_channel_io_frozen case
> + * - I/O to channel is blocked.
> + * The EEH layer and I/O error detections will
> + * handle the reset procedure
> + * for pci_channel_io_perm_failure case
> + * - the PCI card is dead.
> + * The reset will not help
> + * report the error for both cases and return.
> + */
> + if (tp->pdev->error_state == pci_channel_io_frozen) {
> + netdev_err(dev, " %s, I/O to channel is blocked\n", __func__);
> + return;
> + }
> +
> + if (tp->pdev->error_state == pci_channel_io_perm_failure) {
> + netdev_err(dev, " %s, adapter has failed permanently!\n", __func__);
> + return;
> + }
> +
I think it will be better to add these 2 checks in tg3_reset_task().
We are already doing a similar check there for tp->pcierr_recovery so
it is better to centralize all the related checks in the same place.
I don't think tg3_dump_state() below will cause any problems. We'll
probably read 0xffffffff for all registers and it will actually
confirm that the registers are not readable.
> if (netif_msg_tx_err(tp)) {
> netdev_err(dev, "transmit timed out, resetting\n");
> tg3_dump_state(tp);
> --
> 2.25.1
>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] net/tg3: fix race condition in tg3_reset_task()
2023-11-16 21:34 ` Michael Chan
@ 2023-11-17 16:19 ` Thinh Tran
2023-11-17 18:31 ` Michael Chan
0 siblings, 1 reply; 22+ messages in thread
From: Thinh Tran @ 2023-11-17 16:19 UTC (permalink / raw)
To: Michael Chan
Cc: mchan, pavan.chebbi, netdev, prashant, siva.kallam, drc,
venkata.sai.duggi, edumazet, kuba, pabeni, davem
On 11/16/2023 3:34 PM, Michael Chan wrote:
>
> I think it will be better to add these 2 checks in tg3_reset_task().
> We are already doing a similar check there for tp->pcierr_recovery so
> it is better to centralize all the related checks in the same place.
> I don't think tg3_dump_state() below will cause any problems. We'll
> probably read 0xffffffff for all registers and it will actually
> confirm that the registers are not readable.
>
I'm concerned that race conditions could still occur during the handling
of Partitionable Endpoint (PE) reset by the EEH driver. The issue lies
in the dependency on the lower-level FW reset procedure. When the driver
executes tg3_dump_state(), and then follows it with tg3_reset_task(),
the EEH driver possibility changes in the PCI devices' state, but the
MMIO and DMA reset procedures may not have completed yet. Leading to a
crash in tg3_reset_task(). This patch tries to prevent that scenario.
While tg3_dump_state() is helpful, it also poses issues as it takes a
considerable amount of time, approximately 1 or 2 seconds per device.
Given our 4-port adapter, this could extend to more than 10 seconds to
write to the dmesg buffer. With the default size, the dmesg buffer may
be over-written and not retain all useful information.
Thanks,
Thinh Tran
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] net/tg3: fix race condition in tg3_reset_task()
2023-11-17 16:19 ` Thinh Tran
@ 2023-11-17 18:31 ` Michael Chan
2023-11-30 22:29 ` Thinh Tran
0 siblings, 1 reply; 22+ messages in thread
From: Michael Chan @ 2023-11-17 18:31 UTC (permalink / raw)
To: Thinh Tran
Cc: mchan, pavan.chebbi, netdev, prashant, siva.kallam, drc,
venkata.sai.duggi, edumazet, kuba, pabeni, davem
[-- Attachment #1: Type: text/plain, Size: 2056 bytes --]
On Fri, Nov 17, 2023 at 8:19 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
>
> On 11/16/2023 3:34 PM, Michael Chan wrote:
> >
> > I think it will be better to add these 2 checks in tg3_reset_task().
> > We are already doing a similar check there for tp->pcierr_recovery so
> > it is better to centralize all the related checks in the same place.
> > I don't think tg3_dump_state() below will cause any problems. We'll
> > probably read 0xffffffff for all registers and it will actually
> > confirm that the registers are not readable.
> >
>
> I'm concerned that race conditions could still occur during the handling
> of Partitionable Endpoint (PE) reset by the EEH driver. The issue lies
> in the dependency on the lower-level FW reset procedure. When the driver
> executes tg3_dump_state(), and then follows it with tg3_reset_task(),
> the EEH driver possibility changes in the PCI devices' state, but the
> MMIO and DMA reset procedures may not have completed yet. Leading to a
> crash in tg3_reset_task(). This patch tries to prevent that scenario.
It seems fragile if you are relying on such timing. TG3_TX_TIMEOUT is
5 seconds but the actual time tg3_tx_timeout() is called varies
depending on when the TX queue is stopped. So tg3_tx_timeout() will
be called 5 seconds or more after EEH if there are uncompleted TX
packets but we don't know precisely when.
>
> While tg3_dump_state() is helpful, it also poses issues as it takes a
> considerable amount of time, approximately 1 or 2 seconds per device.
> Given our 4-port adapter, this could extend to more than 10 seconds to
> write to the dmesg buffer. With the default size, the dmesg buffer may
> be over-written and not retain all useful information.
>
If tg3_dump_state() is not useful and fills the dmesg log with useless
data, we can add the same check in tg3_dump_state() and skip it.
tg3_dump_state() is also called by tg3_process_error() so we can avoid
dumping useless data if we hit tg3_process_error() during EEH or AER.
Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] net/tg3: fix race condition in tg3_reset_task()
2023-11-17 18:31 ` Michael Chan
@ 2023-11-30 22:29 ` Thinh Tran
0 siblings, 0 replies; 22+ messages in thread
From: Thinh Tran @ 2023-11-30 22:29 UTC (permalink / raw)
To: Michael Chan
Cc: mchan, pavan.chebbi, netdev, prashant, siva.kallam, drc,
venkata.sai.duggi, edumazet, kuba, pabeni, davem
On 11/17/2023 12:31 PM, Michael Chan wrote:
> On Fri, Nov 17, 2023 at 8:19 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>>
>>
>> On 11/16/2023 3:34 PM, Michael Chan wrote:
>>>
>>> I think it will be better to add these 2 checks in tg3_reset_task().
>>> We are already doing a similar check there for tp->pcierr_recovery so
>>> it is better to centralize all the related checks in the same place.
>>> I don't think tg3_dump_state() below will cause any problems. We'll
>>> probably read 0xffffffff for all registers and it will actually
>>> confirm that the registers are not readable.
>>>
>>
>> I'm concerned that race conditions could still occur during the handling
>> of Partitionable Endpoint (PE) reset by the EEH driver. The issue lies
>> in the dependency on the lower-level FW reset procedure. When the driver
>> executes tg3_dump_state(), and then follows it with tg3_reset_task(),
>> the EEH driver possibility changes in the PCI devices' state, but the
>> MMIO and DMA reset procedures may not have completed yet. Leading to a
>> crash in tg3_reset_task(). This patch tries to prevent that scenario.
>
> It seems fragile if you are relying on such timing. TG3_TX_TIMEOUT is
> 5 seconds but the actual time tg3_tx_timeout() is called varies
> depending on when the TX queue is stopped. So tg3_tx_timeout() will
> be called 5 seconds or more after EEH if there are uncompleted TX
> packets but we don't know precisely when.
>
>>
>> While tg3_dump_state() is helpful, it also poses issues as it takes a
>> considerable amount of time, approximately 1 or 2 seconds per device.
>> Given our 4-port adapter, this could extend to more than 10 seconds to
>> write to the dmesg buffer. With the default size, the dmesg buffer may
>> be over-written and not retain all useful information.
>>
>
> If tg3_dump_state() is not useful and fills the dmesg log with useless
> data, we can add the same check in tg3_dump_state() and skip it.
> tg3_dump_state() is also called by tg3_process_error() so we can avoid
> dumping useless data if we hit tg3_process_error() during EEH or AER.
>
> Thanks.
I implemented the fix as you suggested and passed the tests.
I will send the next version of patch shortly.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v4] net/tg3: fix race condition in tg3_reset_task()
2023-11-16 15:18 ` [PATCH v3] " Thinh Tran
2023-11-16 21:34 ` Michael Chan
@ 2023-12-01 0:19 ` Thinh Tran
2023-12-01 16:50 ` Michael Chan
2023-12-02 0:40 ` patchwork-bot+netdevbpf
1 sibling, 2 replies; 22+ messages in thread
From: Thinh Tran @ 2023-12-01 0:19 UTC (permalink / raw)
To: michael.chan
Cc: davem, drc, edumazet, kuba, mchan, netdev, pabeni, pavan.chebbi,
prashant, siva.kallam, thinhtr, Venkata Sai Duggi
When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:
enum {
/* I/O channel is in normal state */
pci_channel_io_normal = (__force pci_channel_state_t) 1,
/* I/O to channel is blocked */
pci_channel_io_frozen = (__force pci_channel_state_t) 2,
/* PCI card is dead */
pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
};
If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.
EEH driver gets error event
--> eeh_set_channel_state()
and set device to one of
error state above scheduler: tg3_reset_task() get
returned error from tg3_init_hw()
--> dev_close() shuts down the interface
tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device
To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_reset_task() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.
Adding the same checking in tg3_dump_state() to avoid dumping all
device registers when the PCI channel is not in the normal state.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
v4: moving the PCI error checking to tg3_reset_task() and
tg3_dump_state()
v3: re-post the patch.
v2: checking PCI errors in tg3_tx_timeout()
---
drivers/net/ethernet/broadcom/tg3.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 48b6191efa56..49f299c868a1 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6474,6 +6474,14 @@ static void tg3_dump_state(struct tg3 *tp)
int i;
u32 *regs;
+ /* If it is a PCI error, all registers will be 0xffff,
+ * we don't dump them out, just report the error and return
+ */
+ if (tp->pdev->error_state != pci_channel_io_normal) {
+ netdev_err(tp->dev, "PCI channel ERROR!\n");
+ return;
+ }
+
regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC);
if (!regs)
return;
@@ -11259,7 +11267,8 @@ static void tg3_reset_task(struct work_struct *work)
rtnl_lock();
tg3_full_lock(tp, 0);
- if (tp->pcierr_recovery || !netif_running(tp->dev)) {
+ if (tp->pcierr_recovery || !netif_running(tp->dev) ||
+ tp->pdev->error_state != pci_channel_io_normal) {
tg3_flag_clear(tp, RESET_TASK_PENDING);
tg3_full_unlock(tp);
rtnl_unlock();
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v4] net/tg3: fix race condition in tg3_reset_task()
2023-12-01 0:19 ` [PATCH v4] " Thinh Tran
@ 2023-12-01 16:50 ` Michael Chan
2023-12-02 0:40 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 22+ messages in thread
From: Michael Chan @ 2023-12-01 16:50 UTC (permalink / raw)
To: Thinh Tran
Cc: davem, drc, edumazet, kuba, mchan, netdev, pabeni, pavan.chebbi,
prashant, siva.kallam, Venkata Sai Duggi
[-- Attachment #1: Type: text/plain, Size: 2365 bytes --]
On Thu, Nov 30, 2023 at 4:19 PM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
> When an EEH error is encountered by a PCI adapter, the EEH driver
> modifies the PCI channel's state as shown below:
>
> enum {
> /* I/O channel is in normal state */
> pci_channel_io_normal = (__force pci_channel_state_t) 1,
>
> /* I/O to channel is blocked */
> pci_channel_io_frozen = (__force pci_channel_state_t) 2,
>
> /* PCI card is dead */
> pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
> };
>
> If the same EEH error then causes the tg3 driver's transmit timeout
> logic to execute, the tg3_tx_timeout() function schedules a reset
> task via tg3_reset_task_schedule(), which may cause a race condition
> between the tg3 and EEH driver as both attempt to recover the HW via
> a reset action.
>
> EEH driver gets error event
> --> eeh_set_channel_state()
> and set device to one of
> error state above scheduler: tg3_reset_task() get
> returned error from tg3_init_hw()
> --> dev_close() shuts down the interface
> tg3_io_slot_reset() and
> tg3_io_resume() fail to
> reset/resume the device
>
> To resolve this issue, we avoid the race condition by checking the PCI
> channel state in the tg3_reset_task() function and skip the tg3 driver
> initiated reset when the PCI channel is not in the normal state. (The
> driver has no access to tg3 device registers at this point and cannot
> even complete the reset task successfully without external assistance.)
> We'll leave the reset procedure to be managed by the EEH driver which
> calls the tg3_io_error_detected(), tg3_io_slot_reset() and
> tg3_io_resume() functions as appropriate.
>
> Adding the same checking in tg3_dump_state() to avoid dumping all
> device registers when the PCI channel is not in the normal state.
>
>
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
> Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
> Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
>
> v4: moving the PCI error checking to tg3_reset_task() and
> tg3_dump_state()
> v3: re-post the patch.
> v2: checking PCI errors in tg3_tx_timeout()
Thanks.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] net/tg3: fix race condition in tg3_reset_task()
2023-12-01 0:19 ` [PATCH v4] " Thinh Tran
2023-12-01 16:50 ` Michael Chan
@ 2023-12-02 0:40 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 22+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-02 0:40 UTC (permalink / raw)
To: Thinh Tran
Cc: michael.chan, davem, drc, edumazet, kuba, mchan, netdev, pabeni,
pavan.chebbi, prashant, siva.kallam, venkata.sai.duggi
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 30 Nov 2023 18:19:11 -0600 you wrote:
> When an EEH error is encountered by a PCI adapter, the EEH driver
> modifies the PCI channel's state as shown below:
>
> enum {
> /* I/O channel is in normal state */
> pci_channel_io_normal = (__force pci_channel_state_t) 1,
>
> [...]
Here is the summary with links:
- [v4] net/tg3: fix race condition in tg3_reset_task()
https://git.kernel.org/netdev/net/c/16b55b1f2269
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2023-12-02 0:40 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-02 18:55 [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel() Thinh Tran
2023-10-03 4:34 ` Pavan Chebbi
2023-10-31 23:18 ` Thinh Tran
2023-10-03 9:37 ` Michael Chan
2023-10-03 22:05 ` Thinh Tran
2023-11-02 16:02 ` Thinh Tran
2023-11-02 16:12 ` [PATCH v2] net/tg3: fix race condition in tg3_reset_task() Thinh Tran
2023-11-02 17:27 ` Michael Chan
2023-11-02 20:37 ` Thinh Tran
2023-11-14 17:39 ` Thinh Tran
2023-11-14 21:03 ` Michael Chan
2023-11-15 18:23 ` Thinh Tran
2023-11-15 18:56 ` Michael Chan
2023-11-16 14:41 ` Thinh Tran
2023-11-16 15:18 ` [PATCH v3] " Thinh Tran
2023-11-16 21:34 ` Michael Chan
2023-11-17 16:19 ` Thinh Tran
2023-11-17 18:31 ` Michael Chan
2023-11-30 22:29 ` Thinh Tran
2023-12-01 0:19 ` [PATCH v4] " Thinh Tran
2023-12-01 16:50 ` Michael Chan
2023-12-02 0:40 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).