public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Ian Ray <ian.ray@gehealthcare.com>
To: Simon Horman <horms@kernel.org>
Cc: "Tony Nguyen" <anthony.l.nguyen@intel.com>,
	"Przemek Kitszel" <przemyslaw.kitszel@intel.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	brian.ruley@gehealthcare.com, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	ian.ray@gehealthcare.com
Subject: Re: [PATCH] igb: Fix watchdog_task race with shutdown
Date: Wed, 30 Apr 2025 09:13:34 +0300	[thread overview]
Message-ID: <aBG_jm62ngj0Mqq-@0ec9f3ddc3bf> (raw)
In-Reply-To: <20250429152021.GP3339421@horms.kernel.org>

On Tue, Apr 29, 2025 at 04:20:21PM +0100, Simon Horman wrote:
> + Toke
> 
> On Mon, Apr 28, 2025 at 02:54:49PM +0300, Ian Ray wrote:
> > A rare [1] race condition is observed between the igb_watchdog_task and
> > shutdown on a dual-core i.MX6 based system with two I210 controllers.
> >
> > Using printk, the igb_watchdog_task is hung in igb_read_phy_reg because
> > __igb_shutdown has already called __igb_close.
> >
> > Fix this by locking in igb_watchdog_task (in the same way as is done in
> > igb_reset_task).
> >
> > reboot             kworker
> >
> > __igb_shutdown
> >   rtnl_lock
> >   __igb_close
> >   :                igb_watchdog_task
> >   :                :
> >   :                igb_read_phy_reg (hung)
> >   rtnl_unlock
> >
> > [1] Note that this is easier to reproduce with 'initcall_debug' logging
> > and additional and printk logging in igb_main.
> >
> > Signed-off-by: Ian Ray <ian.ray@gehealthcare.com>
> 
> Hi Ian,
> 
> Thanks for your patch.
> 
> While I think that the simplicity of this approach may well be appropriate
> as a fix for the problem described I do have a concern.
> 
> I am worried that taking RTNL each time the watchdog tasks will create
> unnecessary lock contention. That may manifest in weird and wonderful ways
> in future.  Maybe this patch doesn't make things materially worse in that
> regard.  But it would be nice to have a plan to move away from using RTNL,
> as is happening elsewhere.
> 
> ...

Hi Simon,

Many thanks for the review.  I've been reflecting on the patch (and
discussing internally) and I think it would be better to model the
behaviour on igb_remove instead of igb_reset_task.  Meaning that the
timer should be deleted, and the work cancelled, after setting bit
IGB_DOWN.  This would mirror igb_up.  (And has the advantage of not
using the RTNL.)

(As you can probably tell) I am not very familiar with this subsystem,
but the modified proposal, below, works well in my testing.  I will
happily send a V2 if you think this is a better direction.

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 291348505868..d4b905469cc2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2173,10 +2173,14 @@ void igb_down(struct igb_adapter *adapter)
        u32 tctl, rctl;
        int i;

-       /* signal that we're down so the interrupt handler does not
-        * reschedule our watchdog timer
+       /* The watchdog timer may be rescheduled, so explicitly
+        * disable watchdog from being rescheduled.
         */
        set_bit(__IGB_DOWN, &adapter->state);
+       del_timer_sync(&adapter->watchdog_timer);
+       del_timer_sync(&adapter->phy_info_timer);
+
+       cancel_work_sync(&adapter->watchdog_task);

        /* disable receives in the hardware */
        rctl = rd32(E1000_RCTL);
@@ -2207,11 +2211,6 @@ void igb_down(struct igb_adapter *adapter)
                }
        }

-       del_timer_sync(&adapter->watchdog_timer);
-       del_timer_sync(&adapter->phy_info_timer);
-
-       cancel_work_sync(&adapter->watchdog_task);
-
        /* record the stats before reset*/
        spin_lock(&adapter->stats64_lock);
        igb_update_stats(adapter);

      reply	other threads:[~2025-04-30  6:13 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-28 11:54 [PATCH] igb: Fix watchdog_task race with shutdown Ian Ray
2025-04-29 15:20 ` Simon Horman
2025-04-30  6:13   ` Ian Ray [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBG_jm62ngj0Mqq-@0ec9f3ddc3bf \
    --to=ian.ray@gehealthcare.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=anthony.l.nguyen@intel.com \
    --cc=brian.ruley@gehealthcare.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=przemyslaw.kitszel@intel.com \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox