From: Ian Ray <ian.ray@gehealthcare.com>
To: Simon Horman <horms@kernel.org>
Cc: "Tony Nguyen" <anthony.l.nguyen@intel.com>,
"Przemek Kitszel" <przemyslaw.kitszel@intel.com>,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
brian.ruley@gehealthcare.com, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
ian.ray@gehealthcare.com
Subject: Re: [PATCH] igb: Fix watchdog_task race with shutdown
Date: Wed, 30 Apr 2025 09:13:34 +0300
Message-ID: <aBG_jm62ngj0Mqq-@0ec9f3ddc3bf>
In-Reply-To: <20250429152021.GP3339421@horms.kernel.org>

On Tue, Apr 29, 2025 at 04:20:21PM +0100, Simon Horman wrote:
> + Toke
>
> On Mon, Apr 28, 2025 at 02:54:49PM +0300, Ian Ray wrote:
> > A rare [1] race condition is observed between the igb_watchdog_task and
> > shutdown on a dual-core i.MX6 based system with two I210 controllers.
> >
> > printk debugging shows that igb_watchdog_task hangs in igb_read_phy_reg
> > because __igb_shutdown has already called __igb_close.
> >
> > Fix this by locking in igb_watchdog_task (in the same way as is done in
> > igb_reset_task).
> >
> > reboot                   kworker
> >
> > __igb_shutdown
> >   rtnl_lock
> >     __igb_close
> >       :                    igb_watchdog_task
> >       :                      :
> >       :                      igb_read_phy_reg (hung)
> >   rtnl_unlock
> >
> > [1] Note that this is easier to reproduce with 'initcall_debug' logging
> > and additional printk logging in igb_main.
> >
> > Signed-off-by: Ian Ray <ian.ray@gehealthcare.com>
>
> Hi Ian,
>
> Thanks for your patch.
>
> While I think that the simplicity of this approach may well be appropriate
> as a fix for the problem described I do have a concern.
>
> I am worried that taking RTNL each time the watchdog task runs will create
> unnecessary lock contention. That may manifest in weird and wonderful ways
> in future. Maybe this patch doesn't make things materially worse in that
> regard. But it would be nice to have a plan to move away from using RTNL,
> as is happening elsewhere.
>
> ...
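
Simon's concern can be made concrete with a deliberately simplified, single-threaded C model of the v1 approach: the watchdog work takes the same lock that shutdown holds and bails out once the adapter is marked down, as igb_reset_task does. Everything here is hypothetical (struct model_locked_adapter, model_watchdog_locked and model_shutdown_locked are not driver code); lock_held merely stands in for RTNL, and "would block" is modelled by returning false instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: lock_held stands in for RTNL, down for the
 * __IGB_DOWN bit, hw_open for "PHY registers still accessible". */
struct model_locked_adapter {
	bool lock_held;
	bool down;
	bool hw_open;
	bool work_queued;
	int lock_acquisitions;      /* how often the watchdog took the lock */
	int phy_reads_after_close;  /* accesses that would hang real hardware */
};

/* One attempt by the simulated kworker to run the watchdog work.
 * Returns false if it would block on the lock (the work stays queued). */
static bool model_watchdog_locked(struct model_locked_adapter *a)
{
	if (!a->work_queued)
		return true;
	if (a->lock_held)
		return false;           /* would sleep in rtnl_lock() */
	a->lock_acquisitions++;     /* every run pays for the lock */
	a->work_queued = false;
	if (a->down)
		return true;            /* bail out, like igb_reset_task */
	if (!a->hw_open)
		a->phy_reads_after_close++;
	return true;
}

/* Shutdown holds the lock across marking the adapter down and closing it. */
static void model_shutdown_locked(struct model_locked_adapter *a)
{
	a->lock_held = true;        /* rtnl_lock() */
	a->down = true;
	a->hw_open = false;         /* __igb_close() */
	/* The kworker may race in here, but it cannot get the lock. */
	(void)model_watchdog_locked(a);
	a->lock_held = false;       /* rtnl_unlock() */
	/* Now it can run; it sees "down" and returns without touching the PHY. */
	(void)model_watchdog_locked(a);
	assert(!a->work_queued);
}
```

In this model the PHY is never touched after close, but lock_acquisitions grows on every watchdog run, not only during shutdown; that per-run cost is the contention Simon is worried about.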

Hi Simon,

Many thanks for the review. I've been reflecting on the patch (and
discussing it internally) and I think it would be better to model the
behaviour on igb_remove instead of igb_reset_task. That is, the timer
should be deleted, and the work cancelled, immediately after setting
the __IGB_DOWN bit. This would mirror igb_up, and has the advantage of
not using the RTNL.

(As you can probably tell) I am not very familiar with this subsystem,
but the modified proposal below works well in my testing. I will
happily send a V2 if you think this is a better direction.

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 291348505868..d4b905469cc2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2173,10 +2173,14 @@ void igb_down(struct igb_adapter *adapter)
 	u32 tctl, rctl;
 	int i;
 
-	/* signal that we're down so the interrupt handler does not
-	 * reschedule our watchdog timer
+	/* The watchdog timer may be rescheduled, so explicitly
+	 * disable watchdog from being rescheduled.
 	 */
 	set_bit(__IGB_DOWN, &adapter->state);
+	del_timer_sync(&adapter->watchdog_timer);
+	del_timer_sync(&adapter->phy_info_timer);
+
+	cancel_work_sync(&adapter->watchdog_task);
 
 	/* disable receives in the hardware */
 	rctl = rd32(E1000_RCTL);
@@ -2207,11 +2211,6 @@ void igb_down(struct igb_adapter *adapter)
 		}
 	}
 
-	del_timer_sync(&adapter->watchdog_timer);
-	del_timer_sync(&adapter->phy_info_timer);
-
-	cancel_work_sync(&adapter->watchdog_task);
-
 	/* record the stats before reset*/
 	spin_lock(&adapter->stats64_lock);
 	igb_update_stats(adapter);
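
The effect of moving the teardown can be sketched with a similar single-threaded C model. It is an illustration under simplifying assumptions, not the driver: all names (struct model_adapter, model_down_old, model_down_new and so on) are hypothetical, the timer and workqueue are reduced to flags, and a "kworker step" is allowed to run at the point where the two orderings differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: down ~ __IGB_DOWN, timer_armed ~ watchdog_timer,
 * work_queued ~ watchdog_task, hw_open ~ PHY still accessible. */
struct model_adapter {
	bool down;
	bool timer_armed;
	bool work_queued;
	bool hw_open;
	int phy_reads_after_close;  /* accesses that would hang real hardware */
};

/* Timer callback: queue the watchdog work. */
static void model_timer_fire(struct model_adapter *a)
{
	a->timer_armed = false;
	a->work_queued = true;
}

/* Watchdog work: read the PHY, then re-arm the timer unless going down. */
static void model_work_run(struct model_adapter *a)
{
	a->work_queued = false;
	if (!a->hw_open)
		a->phy_reads_after_close++;
	if (!a->down)
		a->timer_armed = true;
}

/* Let the simulated kworker run whatever work is pending. */
static void model_kworker_step(struct model_adapter *a)
{
	if (a->work_queued)
		model_work_run(a);
}

/* Old order: the hardware is closed before the watchdog is stopped. */
static void model_down_old(struct model_adapter *a)
{
	a->down = true;
	a->hw_open = false;       /* disable Rx/Tx, IRQs, NAPI, ... */
	model_kworker_step(a);    /* pending watchdog work can still run here */
	a->timer_armed = false;   /* del_timer_sync() */
	a->work_queued = false;   /* cancel_work_sync() */
}

/* Proposed order: stop the timer and the work right after marking down. */
static void model_down_new(struct model_adapter *a)
{
	a->down = true;
	a->timer_armed = false;   /* del_timer_sync() */
	a->work_queued = false;   /* cancel_work_sync() drops pending work */
	a->hw_open = false;
	model_kworker_step(a);    /* nothing is left to run */
	assert(!a->timer_armed && !a->work_queued);
}
```

With model_down_old the queued watchdog work reads the PHY after it has been closed (phy_reads_after_close becomes 1); with model_down_new the timer and work are gone before the hardware is touched, so the count stays 0. The real cancel_work_sync() also waits for an instance that is already running; under the proposed order such an instance still sees open hardware, which this model does not show explicitly.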

Thread overview: 3+ messages
2025-04-28 11:54 [PATCH] igb: Fix watchdog_task race with shutdown Ian Ray
2025-04-29 15:20 ` Simon Horman
2025-04-30 6:13 ` Ian Ray [this message]