netdev.vger.kernel.org archive mirror
From: Michael Chan <michael.chan@broadcom.com>
To: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Cc: netdev@vger.kernel.org, siva.kallam@broadcom.com,
	prashant@broadcom.com, mchan@broadcom.com,
	pavan.chebbi@broadcom.com, drc@linux.vnet.ibm.com,
	venkata.sai.duggi@ibm.com
Subject: Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
Date: Thu, 2 Nov 2023 10:27:27 -0700
Message-ID: <CACKFLimX4Pjm89cneeTa36B519DN3mdXXo5FXfDFi6e0SBwUSA@mail.gmail.com>
In-Reply-To: <20231102161219.220-1-thinhtr@linux.vnet.ibm.com>

On Thu, Nov 2, 2023 at 9:16 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
> When an EEH error is encountered by a PCI adapter, the EEH driver
> modifies the PCI channel's state as shown below:
>
>    enum {
>       /* I/O channel is in normal state */
>       pci_channel_io_normal = (__force pci_channel_state_t) 1,
>
>       /* I/O to channel is blocked */
>       pci_channel_io_frozen = (__force pci_channel_state_t) 2,
>
>       /* PCI card is dead */
>       pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
>    };
>
> If the same EEH error then causes the tg3 driver's transmit timeout
> logic to execute, the tg3_tx_timeout() function schedules a reset
> task via tg3_reset_task_schedule(), which may cause a race condition
> between the tg3 and EEH drivers as both attempt to recover the HW via
> a reset action.
>
> EEH driver gets error event
> --> eeh_set_channel_state()
>     sets the device to one of
>     the error states above      scheduler: tg3_reset_task() gets
>                                 an error returned from tg3_init_hw()
>                              --> dev_close() shuts down the interface
>
> tg3_io_slot_reset() and
> tg3_io_resume() fail to
> reset/resume the device
>
> To resolve this issue, we avoid the race condition by checking the PCI
> channel state in the tg3_tx_timeout() function and skipping the tg3
> driver-initiated reset when the PCI channel is not in the normal state.
> (The driver has no access to tg3 device registers at this point and
> cannot even complete the reset task successfully without external
> assistance.) We leave the reset procedure to be managed by the EEH
> driver, which calls the tg3_io_error_detected(), tg3_io_slot_reset() and
> tg3_io_resume() functions as appropriate.
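
For reference, the check described in the quoted text can be modeled in
userspace roughly as follows. This is only a sketch under stated
assumptions, not the patch itself: struct model_tg3 and model_tx_timeout()
are hypothetical stand-ins, the enum is a plain C copy of the values quoted
above, and the counter merely marks where the real handler would go on to
call tg3_reset_task_schedule().

```c
#include <stdbool.h>

/* Plain C copy of the channel states quoted above (no __force annotation). */
typedef enum {
	pci_channel_io_normal = 1,       /* I/O channel is in normal state */
	pci_channel_io_frozen = 2,       /* I/O to channel is blocked */
	pci_channel_io_perm_failure = 3, /* PCI card is dead */
} pci_channel_state_t;

/* Hypothetical, trimmed-down model of the driver's private state. */
struct model_tg3 {
	pci_channel_state_t error_state; /* models the PCI device's channel state */
	int resets_scheduled;            /* counts resets the driver would schedule */
};

/*
 * Models the tx-timeout path: skip the driver-initiated reset unless the
 * channel is in the normal state. Returns true if a reset was scheduled.
 */
static bool model_tx_timeout(struct model_tg3 *tp)
{
	if (tp->error_state != pci_channel_io_normal)
		return false; /* leave recovery to the EEH callbacks */

	tp->resets_scheduled++; /* stand-in for scheduling the reset task */
	return true;
}
```

With the channel frozen or permanently failed, model_tx_timeout() returns
false and the counter stays unchanged, which is the behavior the commit
message describes.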

This scenario can affect other drivers too, right?  Shouldn't this be
handled in a higher layer before calling ->ndo_tx_timeout() so we
don't have to add this logic to all the other drivers?  Thanks.
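
To make the question concrete, here is a rough userspace model of what
such a higher-layer check could look like. Everything in it is an
assumption for illustration: the model_* names are hypothetical, the
channel state is stored directly on the modeled net device for simplicity,
and nothing here reflects how the real watchdog code is structured.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three channel states from the quoted enum. */
typedef enum {
	MODEL_IO_NORMAL = 1,
	MODEL_IO_FROZEN = 2,
	MODEL_IO_PERM_FAILURE = 3,
} model_channel_state_t;

struct model_netdev;

/* Hypothetical ops table with only the callback under discussion. */
struct model_netdev_ops {
	void (*ndo_tx_timeout)(struct model_netdev *dev, unsigned int txqueue);
};

struct model_netdev {
	const struct model_netdev_ops *ops;
	model_channel_state_t channel_state; /* assumed visible to the core */
	int timeouts_delivered;              /* bumped by the model driver */
};

/* Model driver callback: contains no channel-state logic of its own. */
static void model_driver_tx_timeout(struct model_netdev *dev,
				    unsigned int txqueue)
{
	(void)txqueue;
	dev->timeouts_delivered++;
}

/*
 * Model of a core-side watchdog: the channel-state check happens once
 * here, before any driver's ->ndo_tx_timeout() is invoked.
 * Returns true if the driver callback was called.
 */
static bool model_dev_watchdog(struct model_netdev *dev, unsigned int txqueue)
{
	if (dev->channel_state != MODEL_IO_NORMAL)
		return false; /* error recovery owns the device right now */

	if (dev->ops != NULL && dev->ops->ndo_tx_timeout != NULL) {
		dev->ops->ndo_tx_timeout(dev, txqueue);
		return true;
	}
	return false;
}
```

The point of the sketch is only that the driver callback stays free of
the check; whether the core can reach the PCI channel state cleanly for
every device type is exactly the open question.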


Thread overview: 22+ messages
2023-10-02 18:55 [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel() Thinh Tran
2023-10-03  4:34 ` Pavan Chebbi
2023-10-31 23:18   ` Thinh Tran
2023-10-03  9:37 ` Michael Chan
2023-10-03 22:05   ` Thinh Tran
2023-11-02 16:02     ` Thinh Tran
2023-11-02 16:12 ` [PATCH v2] net/tg3: fix race condition in tg3_reset_task() Thinh Tran
2023-11-02 17:27   ` Michael Chan [this message]
2023-11-02 20:37     ` Thinh Tran
2023-11-14 17:39       ` Thinh Tran
2023-11-14 21:03         ` Michael Chan
2023-11-15 18:23           ` Thinh Tran
2023-11-15 18:56             ` Michael Chan
2023-11-16 14:41               ` Thinh Tran
2023-11-16 15:18   ` [PATCH v3] " Thinh Tran
2023-11-16 21:34     ` Michael Chan
2023-11-17 16:19       ` Thinh Tran
2023-11-17 18:31         ` Michael Chan
2023-11-30 22:29           ` Thinh Tran
2023-12-01  0:19     ` [PATCH v4] " Thinh Tran
2023-12-01 16:50       ` Michael Chan
2023-12-02  0:40       ` patchwork-bot+netdevbpf
