public inbox for netdev@vger.kernel.org
From: Przemek Kitszel <przemyslaw.kitszel@intel.com>
To: Petr Oros <poros@redhat.com>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Jakub Kicinski <kuba@kernel.org>
Cc: <ivecera@redhat.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"Eric Dumazet" <edumazet@google.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	"Tony Nguyen" <anthony.l.nguyen@intel.com>,
	<intel-wired-lan@lists.osuosl.org>,
	Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling
Date: Tue, 3 Feb 2026 11:19:33 +0100
Message-ID: <7907d42e-4805-48bc-aaf6-16cbe46eb1d2@intel.com>
In-Reply-To: <14cb0b22-ec39-43e4-a35b-22ad558b2e34@redhat.com>

On 2/3/26 09:44, Petr Oros wrote:
> 
> On 2/3/26 02:00, Jacob Keller wrote:
>>
>>
>> On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
>>> On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
>>>> +    netdev_unlock(netdev);
>>>> +    ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
>>>> +                           !iavf_is_reset_in_progress(adapter),
>>>> +                           msecs_to_jiffies(5000));
>>>> +    netdev_lock(netdev);
>>>
>>> Dropping locks taken by the core around the driver callback
>>> is obviously unacceptable. SMH.
>>
>> Right. It seems like the correct fix is to either a) have reset take
>> and hold the netdev lock (now that it's distinct from the global RTNL
>> lock) or b) refactor reset so that it can defer any of the
>> netdev-related stuff somehow.
>>
> I modeled this after the existing pattern in iavf_close() (ndo_stop), 
> which also temporarily releases the netdev instance lock taken by the 
> core to wait for an async operation to complete:

First of all, thank you for working on this. I was hit by the very same
problem (no series posted yet), and as of now my local fix is the same
as yours.

I don't see an easy fix without a substantial driver refactor.
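To make the problem concrete, here is a minimal userspace model (pthreads, not kernel code) of the situation the quoted patch hunk works around. It assumes, as the hunk implies, that the reset task itself needs the netdev instance lock. `instance_lock`, `reset_task()`, `wait_reset_done()` and `ndo_callback()` are hypothetical stand-ins for the netdev instance lock, the iavf reset task, the wait on `adapter->reset_waitqueue`, and a core-locked callback such as ndo_change_mtu; a condition variable plays the role of the waitqueue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: instance_lock models the netdev instance lock,
 * wq_lock + reset_cv + reset_in_progress model adapter->reset_waitqueue. */
static pthread_mutex_t instance_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reset_cv = PTHREAD_COND_INITIALIZER;
static bool reset_in_progress;

/* Models the reset task: it must take the instance lock to finish. */
static void *reset_task(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&instance_lock);
	/* ... reconfigure the device ... */
	pthread_mutex_unlock(&instance_lock);

	pthread_mutex_lock(&wq_lock);
	reset_in_progress = false;
	pthread_cond_broadcast(&reset_cv);	/* like wake_up() */
	pthread_mutex_unlock(&wq_lock);
	return NULL;
}

/* Waits up to timeout_ms for the reset to finish; true if it did. */
static bool wait_reset_done(int timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&wq_lock);
	/* rc becomes ETIMEDOUT once the deadline passes; 0 means woken. */
	while (reset_in_progress && rc == 0)
		rc = pthread_cond_timedwait(&reset_cv, &wq_lock, &deadline);
	done = !reset_in_progress;
	pthread_mutex_unlock(&wq_lock);
	return done;
}

/* Models a callback the core invokes with instance_lock held. With
 * drop_lock set it uses the unlock-wait-relock pattern from the patch. */
static bool ndo_callback(bool drop_lock, int timeout_ms)
{
	pthread_t worker;
	bool done;
	int err;

	pthread_mutex_lock(&instance_lock);	/* taken by "the core" */

	pthread_mutex_lock(&wq_lock);
	reset_in_progress = true;
	pthread_mutex_unlock(&wq_lock);
	err = pthread_create(&worker, NULL, reset_task, NULL);
	assert(err == 0);
	(void)err;

	if (drop_lock)
		pthread_mutex_unlock(&instance_lock);	/* the pattern under debate */
	done = wait_reset_done(timeout_ms);
	if (drop_lock)
		pthread_mutex_lock(&instance_lock);

	pthread_mutex_unlock(&instance_lock);	/* released by "the core" */
	pthread_join(worker, NULL);
	return done;
}
```

With the lock held, `ndo_callback(false, 200)` can only time out, because the reset task is blocked on the very lock the caller holds; `ndo_callback(true, 2000)` lets the reset complete. That is why the patch drops the lock, and also exactly what is being objected to above.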

> 
> static int iavf_close(struct net_device *netdev)
> {
>          netdev_assert_locked(netdev);
>          ...
>          iavf_down(adapter);
>          iavf_change_state(adapter, __IAVF_DOWN_PENDING);
>          iavf_free_traffic_irqs(adapter);
> 
>          netdev_unlock(netdev);
> 
>          status = wait_event_timeout(adapter->down_waitqueue,
>                                      adapter->state == __IAVF_DOWN,
>                                      msecs_to_jiffies(500));
>          if (!status)
>                  netdev_warn(netdev, "Device resources not yet released\n");
>          netdev_lock(netdev);
>          ...
> }
> 
> This was introduced by commit 120f28a6f314fe ("iavf: get rid of the crit
> lock"), and ndo_stop is called with the netdev instance lock held by the
> core, just like ndo_change_mtu is.

Technically, it was introduced by commit afc664987ab3 ("eth: iavf:
extend the netdev_lock usage").

> Could you clarify why the unlock-wait-lock pattern is acceptable in
> ndo_stop but not here?
> 

Perhaps closing a netdev is simply a special kind of operation.

Another thing is that the lock was added to enable further NAPI
development, and one silly driver should not block that effort.
Sadly, we have not managed to redesign the driver yet. I would like to
do so personally, but I have too much work accumulated/pending to free
up my time.
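For readers wondering why dropping a lock taken by the core is treated so differently from a driver's own locks, the following minimal userspace model (pthreads, not kernel code) illustrates one commonly cited hazard: the core may have checked some state under the lock before invoking the callback, and that state can change while the callback has the lock released. `struct fake_netdev`, `core_change_mtu()`, `change_mtu_cb()` and `concurrent_close()` are hypothetical, and this is an illustration rather than the exact rationale given in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of state the core validates before calling the driver. */
struct fake_netdev {
	pthread_mutex_t lock;	/* stands in for the netdev instance lock */
	bool up;
	int mtu;
};

static void fake_netdev_init(struct fake_netdev *dev, int mtu)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->up = true;
	dev->mtu = mtu;
}

/* Another operation (e.g. a close) that needs the same lock. */
static void *concurrent_close(void *arg)
{
	struct fake_netdev *dev = arg;

	pthread_mutex_lock(&dev->lock);
	dev->up = false;
	pthread_mutex_unlock(&dev->lock);
	return NULL;
}

/* Driver callback: entered with dev->lock held and dev->up == true. */
static void change_mtu_cb(struct fake_netdev *dev, int new_mtu, bool drop_lock)
{
	if (drop_lock) {
		pthread_t t;
		int err;

		pthread_mutex_unlock(&dev->lock);
		/* While the lock is dropped, another operation slips in.
		 * Joining here makes the interleaving deterministic. */
		err = pthread_create(&t, NULL, concurrent_close, dev);
		assert(err == 0);
		(void)err;
		pthread_join(t, NULL);
		pthread_mutex_lock(&dev->lock);
	}
	dev->mtu = new_mtu;
}

/* Core side: checks dev->up under the lock, calls the driver, and reports
 * whether that precondition still holds when the callback returns. */
static bool core_change_mtu(struct fake_netdev *dev, int new_mtu, bool drop_lock)
{
	bool still_up;

	pthread_mutex_lock(&dev->lock);
	if (!dev->up) {
		pthread_mutex_unlock(&dev->lock);
		return false;
	}
	change_mtu_cb(dev, new_mtu, drop_lock);
	still_up = dev->up;
	pthread_mutex_unlock(&dev->lock);
	return still_up;
}
```

With `drop_lock` false the core's check still holds on return. With it true, the model's "close" runs in the window and the core resumes holding the lock but with a stale assumption, so any code that drops and re-takes such a lock would need to revalidate everything it, or its caller, checked before the drop.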


Thread overview: 17+ messages
2026-02-02  8:48 [PATCH net] iavf: fix deadlock in reset handling Petr Oros
2026-02-02  9:06 ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-02-02 10:53 ` Jijie Shao
2026-02-02 13:30 ` Ivan Vecera
2026-02-02 23:58 ` Jakub Kicinski
2026-02-03  1:00   ` [Intel-wired-lan] " Jacob Keller
2026-02-03  8:44     ` Petr Oros
2026-02-03 10:19       ` Przemek Kitszel [this message]
2026-02-03 11:32         ` Petr Oros
2026-02-03 23:47           ` Jacob Keller
2026-02-04  6:12             ` Przemek Kitszel
2026-02-04 19:25               ` Jacob Keller
2026-02-05 12:24                 ` Petr Oros
2026-02-05 23:37                   ` Jacob Keller
2026-02-06 10:04                     ` Petr Oros
2026-02-07  1:00                       ` Jacob Keller
2026-02-07  3:01                       ` Jakub Kicinski
