Intel-Wired-Lan Archive on lore.kernel.org
From: James Hogan <jhogan@kernel.org>
To: Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: intel-wired-lan <intel-wired-lan@lists.osuosl.org>
Subject: Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
Date: Thu, 28 Jul 2022 18:36:31 +0100
Message-ID: <4755499.31r3eYUQgx@saruman>
In-Reply-To: <874jz2ei5m.fsf@intel.com>

Hi,

On Wednesday, 27 July 2022 15:37:09 BST Vinicius Costa Gomes wrote:
> James Hogan <jhogan@kernel.org> writes:
> > On Sunday, 17 July 2022 22:40:59 BST James Hogan wrote:
> >> On Sunday, 17 July 2022 20:59:36 BST you wrote:
> >> > Hi James,
> >> > 
> >> > James Hogan <jhogan@kernel.org> writes:
> >> > > Hi,
> >> > > 
> >> > > I'm getting regular hangs after resume from suspend with the igc
> >> > > driver, for an I225-V (rev 3) on an MSI Pro Z690-A, with version
> >> > > 5.18.11 on archlinux. A few stable versions ago it was possible to
> >> > > get the network back up by removing and reloading the igc driver,
> >> > > however now I get the following, and only a reboot works (which
> >> > > itself hangs before actually restarting the machine, and requires a
> >> > > hard reset).
> >> > 
> >> > Sorry for the delay. I was travelling.
> >> 
> >> No worries
> >> 
> >> > I remember seeing some weird behaviors with PCIe PTM and
> >> > suspend/resume, especially with onboard controllers.
> >> 
> >> It appears that the hardware got itself into a funny state such that
> >> NetworkManager hung as described more often than not on resume. However,
> >> without changing the kernel it has now settled back into the previous
> >> behaviour of usually working, but occasionally (maybe 1 in 5) the
> >> network wouldn't come back up on resume, with network related things
> >> hung until I unload and reload the igc module.
> >> 
> >> > Can you see if disabling CONFIG_PCIE_PTM in your kernel config changes
> >> > anything? (assuming it's enabled)
> >> 
> >> It is enabled yes. Okay I'll give it a go when I get the chance. I'll
> >> likely have to do a bunch of boot and suspend cycles to try and get it
> >> back into either failure condition.
> > 
> > (sorry somehow dropped others off cc the other day, now adding back)...
> > 
> > I've been running most of this week with 5.18.12-arch1-1, rebuilt with
> > CONFIG_PCIE_PTM=n, however I have now observed both cases.
> > 
> > It failed to bring up the network link a couple of times after resume from
> > suspend, and I managed to remove the igc module and reload it to get it
> > going again.
> > 
> > Another time it failed to come back up, but reloading the module didn't help.
> > 
> > I also hit the igc_tsn_reset hang, but this time it was immediately after
> > boot (possibly a warm reset), where it failed to bring up the network at
> > all. I'll paste the full backtraces of hung tasks below.
> > 
> > I'm wondering whether, since most of the tasks are stuck trying to acquire
> > a mutex, the issue is elsewhere. In some past cases though all the tasks
> > that are dumped are at a mutex_lock...
> 
> Yeah, I agree that it seems like the issue is something else. I would
> suggest starting with the "simple" things: enabling 'CONFIG_PROVE_LOCKING'
> and looking at the first splat. It could be that what you are seeing is
> caused by a deadlock somewhere else.

This is revealing, I think (re-enabled PCIE_PTM and enabled PROVE_LOCKING).

In this case it happened within minutes of boot, but a few previous attempts
with several suspend cycles with the same kernel didn't detect the same thing.

NetworkManager[857]: <info>  [1659028974.1752] device (enp6s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')

============================================
WARNING: possible recursive locking detected
5.18.12-arch1-1 #2 Not tainted
--------------------------------------------
NetworkManager/857 is trying to acquire lock:
ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: igc_resume+0xf6/0x1d0 [igc]

but task is already holding lock:
ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080

other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0
       ----
  lock(rtnl_mutex);
  lock(rtnl_mutex);

 *** DEADLOCK ***
 May be due to missing lock nesting notation
1 lock held by NetworkManager/857:
 #0: ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080

stack backtrace:
CPU: 0 PID: 857 Comm: NetworkManager Not tainted 5.18.12-arch1-1 #2 369425cead7bf2331cd4c5d2279465ad4a0fc21f
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
Call Trace:
 <TASK>
 dump_stack_lvl+0x5f/0x78
 __lock_acquire.cold+0xd4/0x2e5
 ? __lock_acquire+0x3b2/0x1fd0
 lock_acquire+0xc8/0x2d0
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 ? lock_is_held_type+0xaa/0x120
 __mutex_lock+0xb6/0x830
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 ? lockdep_hardirqs_on_prepare+0xdd/0x180
 ? _raw_spin_unlock_irqrestore+0x34/0x50
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 pci_pm_runtime_resume+0xab/0xd0
 ? pci_pm_freeze_noirq+0xe0/0xe0
 __rpm_callback+0x41/0x160
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xe0/0xe0
 rpm_resume+0x5eb/0x820
 __pm_runtime_resume+0x4b/0x80
 dev_ethtool+0x128/0x3080
 ? lock_is_held_type+0xaa/0x120
 ? find_held_lock+0x2b/0x80
 ? dev_load+0x57/0x140
 ? lock_release+0xd4/0x2d0
 dev_ioctl+0x155/0x560
 sock_do_ioctl+0xd7/0x120
 sock_ioctl+0x103/0x360
 ? __fget_files+0xd2/0x170
 __x64_sys_ioctl+0x8e/0xc0
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x6b/0x90
 ? lockdep_hardirqs_on_prepare+0xdd/0x180
 ? do_syscall_64+0x6b/0x90
 ? lockdep_hardirqs_on_prepare+0xdd/0x180
 ? do_syscall_64+0x6b/0x90
 ? asm_sysvec_apic_timer_interrupt+0xe/0x20
 ? rcu_read_lock_sched_held+0x40/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2c35d077af
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffd7319afd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd7319b2c0 RCX: 00007f2c35d077af
RDX: 00007ffd7319b0f0 RSI: 0000000000008946 RDI: 0000000000000012
RBP: 00007ffd7319b270 R08: 0000000000000000 R09: 00007ffd7319b2c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd7319b0f0 R14: 00007ffd7319b0d0 R15: 00007ffd7319b0d0
 </TASK>
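
To illustrate the pattern the splat describes (dev_ethtool already holds
rtnl_mutex, then the runtime-PM resume path reaches igc_resume, which tries
to take rtnl_mutex again), here is a minimal userspace sketch. It is not the
igc code: every fake_* name is hypothetical, and a POSIX error-checking
mutex stands in for lockdep, returning EDEADLK on the second acquisition
instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for rtnl_mutex. An error-checking mutex reports a
 * relock by its owner (EDEADLK) rather than blocking forever. */
static pthread_mutex_t fake_rtnl;

/* Initialise fake_rtnl as PTHREAD_MUTEX_ERRORCHECK; returns 0 on success. */
static int fake_rtnl_init(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&fake_rtnl, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Hypothetical model of a resume callback that takes the lock itself. */
static int fake_igc_resume(void)
{
	int err = pthread_mutex_lock(&fake_rtnl);

	if (err)
		return err;	/* EDEADLK if the caller already holds the lock */
	/* ... device bring-up would happen here ... */
	pthread_mutex_unlock(&fake_rtnl);
	return 0;
}

/* Hypothetical model of the ioctl path: hold the lock, then trigger the
 * resume callback while still holding it. */
static int fake_dev_ethtool(void)
{
	int err = pthread_mutex_lock(&fake_rtnl);

	if (err)
		return err;
	err = fake_igc_resume();	/* second acquisition by the same thread */
	pthread_mutex_unlock(&fake_rtnl);
	return err;
}
```

After fake_rtnl_init(), calling fake_dev_ethtool() hits the nested
acquisition and returns the error, while calling fake_igc_resume() on its
own succeeds. With a plain kernel mutex there is no such error return, so
the nested acquisition would simply block, which would match the hangs
described above.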

Cheers
James


