From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57450C4167B for ; Mon, 11 Dec 2023 14:05:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343954AbjLKOFE (ORCPT ); Mon, 11 Dec 2023 09:05:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34842 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344195AbjLKOEK (ORCPT ); Mon, 11 Dec 2023 09:04:10 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9464A358A for ; Mon, 11 Dec 2023 06:02:43 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A1058C433CA; Mon, 11 Dec 2023 14:02:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1702303363; bh=DcyRBQghAEDYdsobnNJjsQmjXbVGrdpzxORTviMmKm0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=baKnYNyCC4UinxgNwfxxv92phVvyBLQFdX6a5rL5PUb7V7R2tn/nx41KQR16MV2/z 5/eodOjrkbgRm65ehcarJWLR6JYWuPTw63HcSljEN1qyHAxuS04//+LoCS9yfsvmK+ TAobMj17MJTNbhPGDxWnAqBZXHRqa8A1x2o6vcgW/DQX/NZ1VYrl9uc6OtF8FmbfWC q3rzuP5+khsFxggGYIUzSikVhLtbaweTG2/kRjq9Q/FqXuOKNiR35Cih3gE6thHtco NiM/ySxgn/syGpWbxmjsEGLwLKceSnU3OAay+niZ87uC9SLP5Rf3mp0no49c31vvpt X5EO+JvlCV3zg== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Thinh Tran , Venkata Sai Duggi , David Christensen , Michael Chan , Jakub Kicinski , Sasha Levin , pavan.chebbi@broadcom.com, mchan@broadcom.com, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, netdev@vger.kernel.org Subject: [PATCH AUTOSEL 5.4 08/12] net/tg3: fix race condition in tg3_reset_task() Date: Mon, 11 Dec 2023 09:02:01 -0500 Message-ID: <20231211140219.392379-8-sashal@kernel.org> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20231211140219.392379-1-sashal@kernel.org> References: <20231211140219.392379-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 5.4.263 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Thinh Tran [ Upstream commit 16b55b1f2269962fb6b5154b8bf43f37c9a96637 ] When an EEH error is encountered by a PCI adapter, the EEH driver modifies the PCI channel's state as shown below: enum { /* I/O channel is in normal state */ pci_channel_io_normal = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */ pci_channel_io_frozen = (__force pci_channel_state_t) 2, /* PCI card is dead */ pci_channel_io_perm_failure = (__force pci_channel_state_t) 3, }; If the same EEH error then causes the tg3 driver's transmit timeout logic to execute, the tg3_tx_timeout() function schedules a reset task via tg3_reset_task_schedule(), which may cause a race condition between the tg3 and EEH driver as both attempt to recover the HW via a reset action. EEH driver gets error event --> eeh_set_channel_state() and set device to one of error state above scheduler: tg3_reset_task() get returned error from tg3_init_hw() --> dev_close() shuts down the interface tg3_io_slot_reset() and tg3_io_resume() fail to reset/resume the device To resolve this issue, we avoid the race condition by checking the PCI channel state in the tg3_reset_task() function and skip the tg3 driver initiated reset when the PCI channel is not in the normal state. (The driver has no access to tg3 device registers at this point and cannot even complete the reset task successfully without external assistance.) We'll leave the reset procedure to be managed by the EEH driver which calls the tg3_io_error_detected(), tg3_io_slot_reset() and tg3_io_resume() functions as appropriate. Adding the same checking in tg3_dump_state() to avoid dumping all device registers when the PCI channel is not in the normal state. Signed-off-by: Thinh Tran Tested-by: Venkata Sai Duggi Reviewed-by: David Christensen Reviewed-by: Michael Chan Link: https://lore.kernel.org/r/20231201001911.656-1-thinhtr@linux.vnet.ibm.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/broadcom/tg3.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 90bfaea6d6292..74710f523f356 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -6460,6 +6460,14 @@ static void tg3_dump_state(struct tg3 *tp) int i; u32 *regs; + /* If it is a PCI error, all registers will be 0xffff, + * we don't dump them out, just report the error and return + */ + if (tp->pdev->error_state != pci_channel_io_normal) { + netdev_err(tp->dev, "PCI channel ERROR!\n"); + return; + } + regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC); if (!regs) return; @@ -11196,7 +11204,8 @@ static void tg3_reset_task(struct work_struct *work) rtnl_lock(); tg3_full_lock(tp, 0); - if (tp->pcierr_recovery || !netif_running(tp->dev)) { + if (tp->pcierr_recovery || !netif_running(tp->dev) || + tp->pdev->error_state != pci_channel_io_normal) { tg3_flag_clear(tp, RESET_TASK_PENDING); tg3_full_unlock(tp); rtnl_unlock(); -- 2.42.0