From: James Hogan <jhogan@kernel.org>
To: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
	netdev@vger.kernel.org,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	intel-wired-lan@lists.osuosl.org
Subject: Re: [Intel-wired-lan] [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
Date: Mon, 29 Aug 2022 09:16:33 +0100
Message-ID: <3186253.aeNJFYEL58@saruman>
In-Reply-To: <2301866.ElGaqSPkdT@saruman>

On Saturday, 13 August 2022 18:18:25 BST James Hogan wrote:
> On Saturday, 13 August 2022 01:05:41 BST Vinicius Costa Gomes wrote:
> > James Hogan <jhogan@kernel.org> writes:
> > > On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
> > >> It was reported a RTNL deadlock in the igc driver that was causing
> > >> problems during suspend/resume.
> > >> 
> > >> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
> > >> caused by taking RTNL in RPM resume path").
> > >> 
> > >> Reported-by: James Hogan <jhogan@kernel.org>
> > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > >> ---
> > >> Sorry for the noise earlier, my kernel config didn't have runtime PM
> > >> enabled.
> > > 
> > > Thanks for looking into this.
> > > 
> > > This is identical to the patch I've been running for the last week. The
> > > deadlock is avoided, however I now occasionally see an assertion from
> > > netif_set_real_num_tx_queues due to the lock not being taken in some
> > > cases via the runtime_resume path, and a suspicious
> > > rcu_dereference_protected() warning (presumably due to the same issue
> > > of the lock not being taken). See here for details:
> > > https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/
> > 
> > Oh, sorry. I missed the part that the rtnl assert splat was already
> > using similar/identical code to what I got/copied from igb.
> > 
> > So what this seems to be telling us is that the "fix" from igb is only
> > hiding the issue,
> 
> I suppose the patch just changes the assumption from "lock will never be
> held on runtime resume path" (incorrect, deadlock) to "lock will always be
> held on runtime resume path" (also incorrect, probably racy).
> 
> > and we would need to remove the need for taking the
> > RTNL for the suspend/resume paths in igc and igb? (as someone else said
> > in that igb thread, iirc)
> 
> (I'll defer to others on this. I'm pretty unfamiliar with networking code
> and this particular lock.)
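
To make the two assumptions quoted above concrete, here is a minimal
user-space sketch in C. It is only a model, not the igc or igb code: the
flag model_rtnl_held and the functions model_set_queues(),
resume_taking_lock() and resume_assuming_lock_held() are hypothetical names
invented for illustration. The flag stands in for RTNL, and
model_set_queues() stands in for a helper such as
netif_set_real_num_tx_queues that asserts the lock is held.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of RTNL: true means "held". */
static bool model_rtnl_held;

enum resume_result {
    RESUME_OK,
    RESUME_WOULD_DEADLOCK, /* tried to take a lock the caller already holds */
    RESUME_ASSERT_SPLAT    /* lock-requiring helper ran without the lock */
};

/* Stand-in for a helper that asserts the lock is held.
 * Returns false where the real helper would print a warning. */
static bool model_set_queues(void)
{
    return model_rtnl_held;
}

/* Assumption 1: "lock will never be held on the runtime resume path",
 * so the resume path takes it unconditionally. */
static enum resume_result resume_taking_lock(void)
{
    if (model_rtnl_held)
        return RESUME_WOULD_DEADLOCK; /* a real mutex would block forever */

    model_rtnl_held = true;           /* models taking the lock */
    bool ok = model_set_queues();
    model_rtnl_held = false;          /* models releasing the lock */

    return ok ? RESUME_OK : RESUME_ASSERT_SPLAT;
}

/* Assumption 2: "lock will always be held on the runtime resume path",
 * so the resume path never takes it. */
static enum resume_result resume_assuming_lock_held(void)
{
    return model_set_queues() ? RESUME_OK : RESUME_ASSERT_SPLAT;
}
```

In this model, with the flag clear resume_taking_lock() succeeds and
resume_assuming_lock_held() reports the assertion case; with the flag set
it is the other way round. Neither fixed assumption covers both kinds of
caller, which matches the deadlock on one side and the assertion on the
other.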

It'd be great to have this longstanding issue properly fixed rather than
having to carry a patch locally that may not be lock safe.

Also, any tips for diagnosing the issue of the network link not coming back up 
after resume? I sometimes have to unload and reload the driver module to get 
it back again.

Cheers
James
