Re: [net-next,3/9] ice: migrate to netdev ops lock

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: <anthony.l.nguyen@intel.com>, <netdev@vger.kernel.org>,
	<sdf@fomichev.me>, <andrew+netdev@lunn.ch>, <ast@kernel.org>,
	<sx.rinitha@intel.com>, <horms@kernel.org>,
	<yury.norov@gmail.com>, <john.fastabend@gmail.com>,
	<kohei@enjuk.jp>, <przemyslaw.kitszel@intel.com>,
	<richardcochran@gmail.com>, <alexander.nowlin@intel.com>,
	<daniel@iogearbox.net>, <maciej.fijalkowski@intel.com>,
	<nxne.cnse.osdt.itp.upstreaming@intel.com>, <edumazet@google.com>,
	<aleksandr.loktionov@intel.com>, <marcin.szycik@linux.intel.com>,
	<hawk@kernel.org>, <jacob.e.keller@intel.com>,
	<magnus.karlsson@intel.com>, <pmenzel@molgen.mpg.de>,
	<pabeni@redhat.com>, <bpf@vger.kernel.org>, <davem@davemloft.net>,
	<andriy.shevchenko@linux.intel.com>
Subject: Re: [net-next,3/9] ice: migrate to netdev ops lock
Date: Wed, 11 Feb 2026 14:51:56 +0100	[thread overview]
Message-ID: <2a300bf0-3975-4f79-a3e0-5cb9b8159d62@intel.com> (raw)
In-Reply-To: <20260211042458.555701-1-kuba@kernel.org>

From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 10 Feb 2026 20:24:58 -0800

Hi,

> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> 
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> ice: migrate to netdev ops lock
> 
> This commit migrates the ice driver to use the netdev ops lock by
> requesting ops locking and using _locked counterparts of NAPI
> configuration functions to avoid deadlocks during suspend/resume, queue
> rebuild and resets.
> 
>> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
>> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> 
> [ ... ]
> 
>> @@ -2308,10 +2308,14 @@ static int ice_vsi_cfg_tc_lan(struct ice_pf *pf, struct ice_vsi *vsi)
>> }
>>
>> /**
>> - * ice_vsi_cfg_def - configure default VSI based on the type
>> + * ice_vsi_cfg_def_locked - configure default VSI based on the type
>>  * @vsi: pointer to VSI
>> + *
>> + * Should be called only with the netdev lock taken.
>> + *
>> + * Return: 0 on success, -errno on failure.
>>  */
>> -static int ice_vsi_cfg_def(struct ice_vsi *vsi)
>> +static int ice_vsi_cfg_def_locked(struct ice_vsi *vsi)
> 
> [ ... ]
> 
>> +unroll_vector_base:
>> +	/* reclaim SW interrupts back to the common pool */
>> +unroll_alloc_q_vector:
>> +	ice_vsi_free_q_vectors(vsi);
> 
> Should this be ice_vsi_free_q_vectors_locked(vsi)?
> 
> ice_vsi_cfg_def_locked() is called with the netdev lock held (either by
> ice_vsi_cfg_def() wrapper or directly from callers like
> ice_vsi_rebuild_locked()). The error path here calls ice_vsi_free_q_vectors()
> which tries to acquire the netdev lock:
> 
> ice_vsi_cfg_def_locked() [netdev lock held]
>   -> ice_vsi_free_q_vectors()
>      -> netdev_lock(dev)  /* deadlock - already held */
> 
> This would deadlock when an error triggers unroll_vector_base or
> unroll_alloc_q_vector paths (e.g., memory allocation failure in
> ice_vsi_alloc_rings() or ice_vsi_alloc_ring_stats()).
> 
> [ ... ]
> 
>> @@ -2703,7 +2748,7 @@ void ice_vsi_close(struct ice_vsi *vsi)
>> {
>> 	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
>> 		ice_down(vsi);
>>
>> -	ice_vsi_clear_napi_queues(vsi);
>> +	ice_vsi_clear_napi_queues_locked(vsi);
> 
> ice_vsi_clear_napi_queues_locked() expects the netdev lock to be held.
> Are all callers of ice_vsi_close() holding the netdev lock?
> 
> Looking at ice_sf_dev_remove():
> 
> ice_sf_dev_remove()
>   -> ice_vsi_close(vsi)  /* no netdev_lock held */
>   -> ice_sf_decfg_netdev()  /* unregisters netdev later */
> 
> And ice_dis_vsi() when !netif_running(vsi->netdev):
> 
> ice_dis_vsi(vsi, false)
>   -> ice_vsi_close(vsi)  /* rtnl_lock only taken when netif_running */
> 
> Both paths call ice_vsi_close() without holding the netdev lock while
> the netdev is still registered, which could trigger
> netdev_assert_locked_or_invisible warnings.
> 
> [ ... ]
> 
>> @@ -3069,16 +3152,17 @@
>> }
>>
>> /**
>> - * ice_vsi_rebuild - Rebuild VSI after reset
>> + * ice_vsi_rebuild_locked - Rebuild VSI after reset
>>  * @vsi: VSI to be rebuild
>>  * @vsi_flags: flags used for VSI rebuild flow
>>  *
>>  * Set vsi_flags to ICE_VSI_FLAG_INIT to initialize a new VSI, or
>>  * ICE_VSI_FLAG_NO_INIT to rebuild an existing VSI in hardware.
>> + * Should be called only under the netdev lock.
>>  *
>>  * Returns 0 on success and negative value on failure
>>  */
>> -int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
>> +int ice_vsi_rebuild_locked(struct ice_vsi *vsi, u32 vsi_flags)
> 
> [ ... ]
> 
>> +free_coalesce:
>> +	kfree(coalesce);
>> +decfg:
>> +	if (ret)
>> +		ice_vsi_decfg(vsi);
> 
> Should this be ice_vsi_decfg_locked(vsi)?
> 
> ice_vsi_rebuild_locked() is called with the netdev lock already held
> (either by the ice_vsi_rebuild() wrapper or by callers like
> ice_vsi_recfg_qs()). The error path at label 'decfg:' calls ice_vsi_decfg()
> which tries to acquire the lock again:
> 
> ice_vsi_rebuild_locked() [netdev lock held]
>   -> ice_vsi_decfg()
>      -> netdev_lock(dev)  /* deadlock - already held */
> 
> This would deadlock when an error occurs after ice_vsi_cfg_def_locked()
> succeeds but a later operation fails.

Tony also fed the series to AI, two times, and each time he got a
different answer.
The series was on iwl-next for 1.5 months and only one bug was reported,
which I fixed immediately.

I can take a look into this, but wouldn't be better if we take the
series now and then have 2 months to fix bugs if any appears?

Thanks,
Olek

next prev parent reply	other threads:[~2026-02-11 13:54 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-06 17:48 [PATCH net-next 0/9][pull request] Intel Wired LAN Driver Updates 2026-02-06 (libeth, ice, i40e, ixgbe) Tony Nguyen
2026-02-06 17:48 ` [PATCH net-next 1/9] libeth: pass Rx queue index to PP when creating a fill queue Tony Nguyen
2026-02-06 17:48 ` [PATCH net-next 2/9] libeth: handle creating pools with unreadable buffers Tony Nguyen
2026-02-06 17:49 ` [PATCH net-next 3/9] ice: migrate to netdev ops lock Tony Nguyen
2026-02-11  4:24   ` [net-next,3/9] " Jakub Kicinski
2026-02-11 13:51     ` Alexander Lobakin [this message]
2026-02-11 16:55       ` Jakub Kicinski
2026-02-11 17:13         ` Alexander Lobakin
2026-02-11 18:46           ` Jacob Keller
2026-02-06 17:49 ` [PATCH net-next 4/9] ice: implement Rx queue management ops Tony Nguyen
2026-02-06 17:49 ` [PATCH net-next 5/9] ice: add support for transmitting unreadable frags Tony Nguyen
2026-02-06 17:49 ` [PATCH net-next 6/9] ice: Make name member of struct ice_cgu_pin_desc const Tony Nguyen
2026-02-06 17:49 ` [PATCH net-next 7/9] i40e: drop useless bitmap_weight() call in i40e_set_rxfh_fields() Tony Nguyen
2026-02-06 17:49 ` [PATCH net-next 8/9] i40e: Add missing header Tony Nguyen
2026-02-06 17:49 ` [PATCH net-next 9/9] ixgbe: refactor: use DECLARE_BITMAP for ring state field Tony Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2a300bf0-3975-4f79-a3e0-5cb9b8159d62@intel.com \
    --to=aleksander.lobakin@intel.com \
    --cc=aleksandr.loktionov@intel.com \
    --cc=alexander.nowlin@intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=horms@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=kohei@enjuk.jp \
    --cc=kuba@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=marcin.szycik@linux.intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=nxne.cnse.osdt.itp.upstreaming@intel.com \
    --cc=pabeni@redhat.com \
    --cc=pmenzel@molgen.mpg.de \
    --cc=przemyslaw.kitszel@intel.com \
    --cc=richardcochran@gmail.com \
    --cc=sdf@fomichev.me \
    --cc=sx.rinitha@intel.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox