From: James Hogan <jhogan@kernel.org>
To: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
	netdev@vger.kernel.org,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	intel-wired-lan@lists.osuosl.org,
	Tony Nguyen <anthony.l.nguyen@intel.com>
Subject: Re: [Intel-wired-lan] [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
Date: Sun, 03 Sep 2023 18:57:46 +0100
Message-ID: <2158798.irdbgypaU6@saruman>
In-Reply-To: <87zg2alict.fsf@intel.com>

On Tuesday, 29 August 2023 02:58:42 BST Vinicius Costa Gomes wrote:
> James Hogan <jhogan@kernel.org> writes:
> > On Sunday, 2 October 2022 11:56:28 BST James Hogan wrote:
> >> On Monday, 29 August 2022 09:16:33 BST James Hogan wrote:
> >> > It'd be great to have this longstanding issue properly fixed rather than
> >> > having to carry a patch locally that may not be lock safe.
> >> > 
> >> > Also, any tips for diagnosing the issue of the network link not coming
> >> > back up after resume? I sometimes have to unload and reload the driver
> >> > module to get it back again.
> >> 
> >> Any thoughts on this from anybody?
> > 
> > Ping... I've been carrying this patch locally on archlinux for almost a
> > year now. Every time I update my kernel and forget to rebuild with the
> > patch it catches me out with deadlocks after resume, and even with the
> > patch I frequently have to reload the igc module after resume to get the
> > network to come up (which is preferable to deadlocks but still really
> > sucks). I'd really appreciate if it could get some attention.
> 
> I am setting up my test systems to reproduce the deadlocks, then let's
> see what ideas happen about removing the need for those locks.
> 
> About the link failures, are there any error messages in the kernel
> logs? (also, if you could share those logs, can be off-list, it would
> help) I am trying to think what could be happening, and how to further
> debug this.

Looking through the resume log, the only network/igc-related items are these:

Sep 03 18:28:17 saruman kernel: igc 0000:06:00.0 enp6s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.7180] manager: sleep: wake requested (sleeping: yes  enabled: yes)
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.7181] device (enp6s0): state change: activated -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Sep 03 18:28:17 saruman avahi-daemon[989]: Withdrawing address record for 192.168.1.239 on enp6s0.
Sep 03 18:28:17 saruman avahi-daemon[989]: Leaving mDNS multicast group on interface enp6s0.IPv4 with address 192.168.1.239.
Sep 03 18:28:17 saruman avahi-daemon[989]: Interface enp6s0.IPv4 no longer relevant for mDNS.
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.8202] manager: NetworkManager state is now CONNECTED_GLOBAL
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.8657] manager: NetworkManager state is now DISCONNECTED
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.8660] device (enp6s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Sep 03 18:28:17 saruman systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 03 18:28:17 saruman systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3075] device (enp6s0): carrier: link connected
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3076] device (enp6s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3080] policy: auto-activating connection 'Wired connection 1' (f6634f16-77ca-34f7-846a-8c41e15a8ad1)
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3082] device (enp6s0): Activation: starting connection 'Wired connection 1' (f6634f16-77ca-34f7-846a-8c41e15a8ad1)
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3082] device (enp6s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3083] manager: NetworkManager state is now CONNECTING
Sep 03 18:28:21 saruman kernel: igc 0000:06:00.0 enp6s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Sep 03 18:28:21 saruman kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp6s0: link becomes ready
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3506] device (enp6s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3512] device (enp6s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3515] policy: set 'Wired connection 1' (enp6s0) as default for IPv4 routing and DNS
Sep 03 18:28:21 saruman avahi-daemon[989]: Joining mDNS multicast group on interface enp6s0.IPv4 with address 192.168.1.239.
Sep 03 18:28:21 saruman avahi-daemon[989]: New relevant interface enp6s0.IPv4 for mDNS.
Sep 03 18:28:21 saruman avahi-daemon[989]: Registering new address record for 192.168.1.239 on enp6s0.IPv4.
Sep 03 18:28:22 saruman systemd[1]: systemd-rfkill.service: Deactivated successfully.
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3544] device (enp6s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3553] device (enp6s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3554] device (enp6s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3555] manager: NetworkManager state is now CONNECTED_SITE
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3556] device (enp6s0): Activation: successful, device activated.
Sep 03 18:28:27 saruman NetworkManager[1016]: <info>  [1693762107.3532] device (enp6s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Sep 03 18:28:27 saruman avahi-daemon[989]: Withdrawing address record for 192.168.1.239 on enp6s0.
Sep 03 18:28:27 saruman avahi-daemon[989]: Leaving mDNS multicast group on interface enp6s0.IPv4 with address 192.168.1.239.
Sep 03 18:28:27 saruman avahi-daemon[989]: Interface enp6s0.IPv4 no longer relevant for mDNS.
Sep 03 18:28:27 saruman NetworkManager[1016]: <info>  [1693762107.5266] manager: NetworkManager state is now CONNECTED_LOCAL
Sep 03 18:28:27 saruman NetworkManager[1016]: <info>  [1693762107.5267] manager: NetworkManager state is now DISCONNECTED
Sep 03 18:28:27 saruman systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
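
To pick the link drop out of a log like the one above, a small filter can help. The following is only a minimal sketch in Python, assuming the journal has been exported as plain text in the same format as above; `device_transitions` and `carrier_drops` are hypothetical helper names, not part of NetworkManager or igc.

```python
import re

# Hypothetical helper for illustration only; it just parses text in the
# journal format shown above and does not talk to NetworkManager or igc.
STATE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ NetworkManager\[\d+\]: .*"
    r"device \((?P<dev>[^)]+)\): state change: (?P<old>\S+) -> (?P<new>\S+) "
    r"\(reason '(?P<reason>[^']+)'"
)


def device_transitions(lines, dev="enp6s0"):
    """Return (timestamp, old_state, new_state, reason) for one device."""
    transitions = []
    for line in lines:
        match = STATE_RE.match(line)
        if match and match.group("dev") == dev:
            transitions.append(
                (
                    match.group("ts"),
                    match.group("old"),
                    match.group("new"),
                    match.group("reason"),
                )
            )
    return transitions


def carrier_drops(transitions):
    """Keep only transitions where an activated device lost its carrier."""
    return [
        t for t in transitions
        if t[1] == "activated" and t[3] == "carrier-changed"
    ]
```

Run over the log above, `carrier_drops(device_transitions(lines))` would single out the 18:28:27 transition from activated to unavailable, which is the point where the link goes away again after a seemingly successful activation.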

As mentioned previously, I have CONFIG_PROVE_LOCKING=y set and I'm seeing splats during boot, notably "RTNL assertion failed at net/core/dev.c (2877)" and "suspicious RCU usage".

Cheers
James


