public inbox for linux-pci@vger.kernel.org
From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
	andrew+netdev@lunn.ch, netdev@vger.kernel.org,
	andriy.shevchenko@intel.com, ilpo.jarvinen@linux.intel.com,
	dima.ruinskiy@intel.com, mbloch@nvidia.com, leon@kernel.org,
	linux-pci@vger.kernel.org, saeedm@nvidia.com, tariqt@nvidia.com,
	lukas@wunner.de, bhelgaas@google.com, richardcochran@gmail.com,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Avigail Dahan <avigailx.dahan@intel.com>
Subject: Re: [PATCH net-next 01/15] igc: Call netif_queue_set_napi() with rtnl locked
Date: Tue, 7 Apr 2026 08:53:15 +0200
Message-ID: <20260407065315.GD3552@black.igk.intel.com>
In-Reply-To: <9f169800-12f2-4f98-ab99-e4433b2b49a9@redhat.com>

Hi,

On Thu, Apr 02, 2026 at 12:29:06PM +0200, Paolo Abeni wrote:
> On 3/31/26 7:37 PM, Bjorn Helgaas wrote:
> > On Mon, Mar 30, 2026 at 04:02:30PM -0700, Tony Nguyen wrote:
> >> From: Mika Westerberg <mika.westerberg@linux.intel.com>
> >>
> >> When runtime resuming igc, we get:
> >>
> >>   [  516.161666] RTNL: assertion failed at ./include/net/netdev_lock.h (72)
> >>
> >> This happens because commit 310ae9eb2617 ("net: designate queue -> napi
> >> linking as "ops protected"") added a check for this. For this reason, drop
> >> the special case for runtime PM from __igc_resume(). This makes it take
> >> the rtnl lock unconditionally.
> > 
> > Taking the rtnl lock unconditionally certainly makes the code nicer,
> > but the commit log only mentions the "avoid the warning" benefit, not
> > the actual reason this is safe to do.
> 
> Sashiko says it's not safe:
> 
> ---
> Can this regression cause a self-deadlock when a runtime resume is
> triggered from paths that already hold the rtnl lock?
> If the network interface is logically up but the link is disconnected,
> igc_runtime_idle() allows the device to enter runtime suspend. When a
> user queries the device using ethtool, the networking core acquires
> rtnl_lock() and then calls pm_runtime_get_sync() to ensure the hardware
> is awake.
> This synchronously executes the driver's runtime resume callback, which
> calls __igc_resume(). Because netif_running(netdev) is true, the
> modified __igc_resume() unconditionally attempts to acquire rtnl_lock().
> Since the executing thread already holds this non-recursive mutex, it
> appears the system would self-deadlock, hanging the network stack.
> ---
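The self-deadlock described above can be modelled in user space. The sketch below is an illustration under stated assumptions, not the igc code: a POSIX error-checking mutex stands in for the rtnl mutex, and the hypothetical functions model_ethtool_query() and model_igc_resume() stand in for dev_ethtool() and the patched __igc_resume(). The real rtnl mutex has no owner check, so where this model returns EDEADLK, the kernel task simply blocks forever.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the rtnl mutex. ERRORCHECK makes a relock by
 * the owning thread fail with EDEADLK instead of hanging; rtnl_lock() in
 * the kernel has no such check and just blocks. */
static pthread_mutex_t model_rtnl;

static void model_rtnl_init(void)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&model_rtnl, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);
	(void)rc;
}

/* Models the patched __igc_resume(): takes the lock unconditionally. */
static int model_igc_resume(void)
{
	int err = pthread_mutex_lock(&model_rtnl);

	if (err)
		return err;	/* EDEADLK: this thread already holds the lock */
	/* ... queue/NAPI re-linking would happen here under the lock ... */
	pthread_mutex_unlock(&model_rtnl);
	return 0;
}

/* Models the ethtool path: lock held, then a synchronous runtime resume. */
static int model_ethtool_query(void)
{
	int ret;

	pthread_mutex_lock(&model_rtnl);
	ret = model_igc_resume();	/* like pm_runtime_get_sync() -> runtime_resume */
	pthread_mutex_unlock(&model_rtnl);
	return ret;
}
```

Called from the ethtool-like path, the resume returns EDEADLK; called on its own (no lock held by the caller), it succeeds. This mirrors why unconditional locking is only safe if no caller of the runtime resume callback already holds rtnl.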

It's a good analysis. I just tried this flow:

1. Boot the system up, nothing connected to igc NIC.
2. Plug in cable to igc.
3. Configure the interface.
4. Enable runtime PM for igc.
5. Unplug the cable.
6. Verify igc is runtime suspended.
7. Run ethtool <interface>

This leads to the deadlock below.

igc maintainers, please drop this patch. Apologies, I did not consider this
flow when I wrote it.

[ 1231.655924] INFO: task ethtool:3139 blocked for more than 122 seconds.
[ 1231.662515]       Tainted: G     U              7.0.0-rc6+ #1748
[ 1231.668551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1231.676410] task:ethtool         state:D stack:0     pid:3139  tgid:3139  ppid:292    task_flags:0x480000 flags:0x00080800
[ 1231.687508] Call Trace:
[ 1231.689997]  <TASK>
[ 1231.692132]  __schedule+0x58a/0x1820
[ 1231.695747]  ? sysvec_apic_timer_interrupt+0x4c/0xa0
[ 1231.700742]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 1231.706090]  schedule+0x64/0xe0
[ 1231.709262]  schedule_preempt_disabled+0x15/0x30
[ 1231.713907]  __mutex_lock+0x377/0xa60
[ 1231.717606]  __mutex_lock_slowpath+0x13/0x20
[ 1231.721905]  mutex_lock+0x2c/0x40
[ 1231.725259]  rtnl_lock+0x15/0x20
[ 1231.728541]  __igc_resume+0x19a/0x2b0 [igc]
[ 1231.732798]  igc_runtime_resume+0xe/0x20 [igc]
[ 1231.737288]  pci_pm_runtime_resume+0xce/0x100
[ 1231.741678]  ? __pfx_pci_pm_runtime_resume+0x10/0x10
[ 1231.746681]  __rpm_callback+0xab/0x310
[ 1231.750458]  ? ktime_get_mono_fast_ns+0x3a/0x100
[ 1231.755107]  ? __pfx_pci_pm_runtime_resume+0x10/0x10
[ 1231.760096]  rpm_resume+0x4bb/0x670
[ 1231.763618]  __pm_runtime_resume+0x5c/0x80
[ 1231.767749]  dev_ethtool+0x19d/0xc90
[ 1231.771352]  dev_ioctl+0x23c/0x550
[ 1231.774791]  sock_do_ioctl+0x11f/0x1b0
[ 1231.778569]  sock_ioctl+0x27f/0x390
[ 1231.782091]  ? handle_mm_fault+0x11a5/0x1250
[ 1231.786388]  __se_sys_ioctl+0x75/0xd0
[ 1231.790077]  __x64_sys_ioctl+0x1d/0x30
[ 1231.793851]  x64_sys_call+0x14ed/0x2d30
[ 1231.797719]  do_syscall_64+0xfb/0x680
[ 1231.801404]  ? arch_exit_to_user_mode_prepare+0xd/0xb0
[ 1231.806559]  ? irqentry_exit+0x3b/0x510
[ 1231.810413]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Thread overview: 7+ messages
     [not found] <20260330230248.646900-1-anthony.l.nguyen@intel.com>
2026-03-30 23:02 ` [PATCH net-next 01/15] igc: Call netif_queue_set_napi() with rtnl locked Tony Nguyen
2026-03-31 17:37   ` Bjorn Helgaas
2026-04-02 10:29     ` Paolo Abeni
2026-04-07  6:53       ` Mika Westerberg [this message]
2026-03-30 23:02 ` [PATCH net-next 02/15] igc: Let the PCI core deal with the PM resume flow Tony Nguyen
2026-03-31 17:34   ` Bjorn Helgaas
2026-03-30 23:02 ` [PATCH net-next 03/15] igc: Don't reset the hardware on suspend path Tony Nguyen