[PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo

* [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset
@ 2024-08-19 10:05 Larysa Zaremba
  2024-08-19 10:05 ` [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan, Tony Nguyen
  Cc: Larysa Zaremba, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

PF reset can be triggered asynchronously, by tx_timeout or by a user. With
unfortunate timing, both ice_vsi_rebuild() and .ndo_bpf() will try to access
and modify XDP rings at the same time, causing a system crash.

The first patch factors out rtnl-locked code from VSI rebuild code to avoid
deadlock. The following changes lock rebuild and .ndo_bpf() critical sections
with an internal mutex as well and provide complementary fixes.

v2: https://lore.kernel.org/netdev/20240724164840.2536605-1-larysa.zaremba@intel.com/
v2->v3:
* deconfig VSI when coalesce allocation fails in ice_vsi_rebuild (patch 2/6)
* rebase and resolve conflicts in patch 3 and 4
* add tags from v2

v1: https://lore.kernel.org/netdev/20240610153716.31493-1-larysa.zaremba@intel.com/
v1->v2:
* use mutex for locking
* redefine critical sections
* account for short time between rebuild and VSI being open
* add netif_queue_set_napi() patch, so ICE_RTNL_WAITS_FOR_RESET strategy can be
  dropped, no more rtnl-locked code in ice_vsi_rebuild()
* change the test case from waiting for tx_timeout to happen to actively firing
  resets through sysfs, this adds more minor fixes on top

Larysa Zaremba (6):
  ice: move netif_queue_set_napi to rtnl-protected sections
  ice: protect XDP configuration with a mutex
  ice: check for XDP rings instead of bpf program when unconfiguring
  ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  ice: remove ICE_CFG_BUSY locking from AF_XDP code
  ice: do not bring the VSI up, if it was down before the XDP setup

 drivers/net/ethernet/intel/ice/ice.h      |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
 drivers/net/ethernet/intel/ice/ice_lib.c  | 179 ++++++++--------------
 drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
 drivers/net/ethernet/intel/ice/ice_main.c |  47 ++++--
 drivers/net/ethernet/intel/ice/ice_xsk.c  |  18 +--
 6 files changed, 106 insertions(+), 161 deletions(-)

-- 
2.43.0



* [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-08-19 10:05 [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
@ 2024-08-19 10:05 ` Larysa Zaremba
  2024-08-20 12:31   ` Maciej Fijalkowski
  2024-08-19 10:05 ` [PATCH iwl-net v3 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan, Tony Nguyen
  Cc: Larysa Zaremba, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

Currently, netif_queue_set_napi() is called from ice_vsi_rebuild(), which is
not rtnl-locked when called from the reset path. This creates the need to
take the rtnl_lock just for a single function and complicates the
synchronization with .ndo_bpf(). At the same time, there is no actual need to
fill napi-to-queue information at this exact point.

Fill napi-to-queue information when opening the VSI and clear it when the
VSI is being closed. Those routines are already rtnl-locked.

Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
XDP queues, as this leads to out-of-bounds writes, such as the one below.

[  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
[  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
[  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
[  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[  +0.000003] Call Trace:
[  +0.000003]  <TASK>
[  +0.000002]  dump_stack_lvl+0x60/0x80
[  +0.000007]  print_report+0xce/0x630
[  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
[  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
[  +0.000003]  kasan_report+0xe9/0x120
[  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
[  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
[  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
[  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
[  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
[  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
[  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
[  +0.000006]  pci_reset_function+0x12d/0x230
[  +0.000004]  reset_store+0xa0/0x100
[  +0.000006]  ? __pfx_reset_store+0x10/0x10
[  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
[  +0.000004]  ? __check_object_size+0x4c1/0x640
[  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
[  +0.000006]  vfs_write+0x5d6/0xdf0
[  +0.000005]  ? fd_install+0x180/0x350
[  +0.000005]  ? __pfx_vfs_write+0x10/0x10
[  +0.000004]  ? do_fcntl+0x52c/0xcd0
[  +0.000004]  ? kasan_save_track+0x13/0x60
[  +0.000003]  ? kasan_save_free_info+0x37/0x60
[  +0.000006]  ksys_write+0xfa/0x1d0
[  +0.000003]  ? __pfx_ksys_write+0x10/0x10
[  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
[  +0.000004]  ? _raw_spin_lock+0x87/0xe0
[  +0.000005]  do_syscall_64+0x80/0x170
[  +0.000007]  ? _raw_spin_lock+0x87/0xe0
[  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
[  +0.000003]  ? file_close_fd_locked+0x167/0x230
[  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
[  +0.000005]  ? do_syscall_64+0x8c/0x170
[  +0.000004]  ? do_syscall_64+0x8c/0x170
[  +0.000003]  ? do_syscall_64+0x8c/0x170
[  +0.000003]  ? fput+0x1a/0x2c0
[  +0.000004]  ? filp_close+0x19/0x30
[  +0.000004]  ? do_dup2+0x25a/0x4c0
[  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
[  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
[  +0.000004]  ? do_syscall_64+0x8c/0x170
[  +0.000003]  ? __count_memcg_events+0x113/0x380
[  +0.000005]  ? handle_mm_fault+0x136/0x820
[  +0.000005]  ? do_user_addr_fault+0x444/0xa80
[  +0.000004]  ? clear_bhb_loop+0x25/0x80
[  +0.000004]  ? clear_bhb_loop+0x25/0x80
[  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000005] RIP: 0033:0x7f2033593154
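
As a rough userspace illustration of the out-of-bounds issue (not the driver
code): if the device keeps one napi slot per regular queue, a loop bounded by
the regular queue count stays in bounds, while an index that also counts XDP
Tx rings would not. Every name below (toy_dev, toy_set_napi, TOY_NUM_TXQ and
so on) is hypothetical, and the explicit bounds check is an assumption made
for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_TXQ 4	/* regular Tx queues known to the stack */
#define TOY_NUM_XDP 4	/* extra XDP Tx rings, not known to the stack */

struct toy_napi {
	int id;
};

struct toy_dev {
	struct toy_napi *txq_napi[TOY_NUM_TXQ];	/* one slot per regular queue */
	struct toy_napi napi[TOY_NUM_TXQ];
};

/* Returns 0 on success, -1 if q_idx does not name a regular queue. */
static int toy_set_napi(struct toy_dev *dev, unsigned int q_idx,
			struct toy_napi *napi)
{
	if (q_idx >= TOY_NUM_TXQ)
		return -1;	/* an XDP ring index would land here */
	dev->txq_napi[q_idx] = napi;
	return 0;
}

/* "Open" path: walk regular queues only, never TOY_NUM_TXQ + TOY_NUM_XDP. */
static int toy_set_napi_queues(struct toy_dev *dev)
{
	unsigned int q;

	for (q = 0; q < TOY_NUM_TXQ; q++)
		if (toy_set_napi(dev, q, &dev->napi[q]))
			return -1;
	return 0;
}

/* "Close" path: drop every association again. */
static void toy_clear_napi_queues(struct toy_dev *dev)
{
	unsigned int q;

	for (q = 0; q < TOY_NUM_TXQ; q++)
		toy_set_napi(dev, q, NULL);
}
```

The point of the sketch is only the loop bound: the set and clear helpers
iterate over the regular queue count, so an XDP ring index is never used to
address the per-queue table.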

Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
 drivers/net/ethernet/intel/ice/ice_lib.c  | 129 ++++++----------------
 drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
 drivers/net/ethernet/intel/ice/ice_main.c |  17 ++-
 4 files changed, 49 insertions(+), 118 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index f448d3a84564..c158749a80e0 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -190,16 +190,11 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
 	}
 	q_vector = vsi->q_vectors[v_idx];
 
-	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
-		ice_queue_set_napi(vsi, tx_ring->q_index, NETDEV_QUEUE_TYPE_TX,
-				   NULL);
+	ice_for_each_tx_ring(tx_ring, vsi->q_vectors[v_idx]->tx)
 		tx_ring->q_vector = NULL;
-	}
-	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
-		ice_queue_set_napi(vsi, rx_ring->q_index, NETDEV_QUEUE_TYPE_RX,
-				   NULL);
+
+	ice_for_each_rx_ring(rx_ring, vsi->q_vectors[v_idx]->rx)
 		rx_ring->q_vector = NULL;
-	}
 
 	/* only VSI with an associated netdev is set up with NAPI */
 	if (vsi->netdev)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 03c4df4ed585..5f2ddcaf7031 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2286,9 +2286,6 @@ static int ice_vsi_cfg_def(struct ice_vsi *vsi)
 
 		ice_vsi_map_rings_to_vectors(vsi);
 
-		/* Associate q_vector rings to napi */
-		ice_vsi_set_napi_queues(vsi);
-
 		vsi->stat_offsets_loaded = false;
 
 		/* ICE_VSI_CTRL does not need RSS so skip RSS processing */
@@ -2621,6 +2618,7 @@ void ice_vsi_close(struct ice_vsi *vsi)
 	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
 		ice_down(vsi);
 
+	ice_vsi_clear_napi_queues(vsi);
 	ice_vsi_free_irq(vsi);
 	ice_vsi_free_tx_rings(vsi);
 	ice_vsi_free_rx_rings(vsi);
@@ -2687,120 +2685,55 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
 }
 
 /**
- * __ice_queue_set_napi - Set the napi instance for the queue
- * @dev: device to which NAPI and queue belong
- * @queue_index: Index of queue
- * @type: queue type as RX or TX
- * @napi: NAPI context
- * @locked: is the rtnl_lock already held
- *
- * Set the napi instance for the queue. Caller indicates the lock status.
- */
-static void
-__ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
-		     enum netdev_queue_type type, struct napi_struct *napi,
-		     bool locked)
-{
-	if (!locked)
-		rtnl_lock();
-	netif_queue_set_napi(dev, queue_index, type, napi);
-	if (!locked)
-		rtnl_unlock();
-}
-
-/**
- * ice_queue_set_napi - Set the napi instance for the queue
- * @vsi: VSI being configured
- * @queue_index: Index of queue
- * @type: queue type as RX or TX
- * @napi: NAPI context
+ * ice_vsi_set_napi_queues
+ * @vsi: VSI pointer
  *
- * Set the napi instance for the queue. The rtnl lock state is derived from the
- * execution path.
+ * Associate queue[s] with napi for all vectors.
+ * The caller must hold rtnl_lock.
  */
-void
-ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
-		   enum netdev_queue_type type, struct napi_struct *napi)
+void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
 {
-	struct ice_pf *pf = vsi->back;
+	struct net_device *netdev = vsi->netdev;
+	int q_idx, v_idx;
 
-	if (!vsi->netdev)
+	if (!netdev)
 		return;
 
-	if (current_work() == &pf->serv_task ||
-	    test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
-	    test_bit(ICE_DOWN, pf->state) ||
-	    test_bit(ICE_SUSPENDED, pf->state))
-		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
-				     false);
-	else
-		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
-				     true);
-}
+	ice_for_each_rxq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX,
+				     &vsi->rx_rings[q_idx]->q_vector->napi);
 
-/**
- * __ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
- * @q_vector: q_vector pointer
- * @locked: is the rtnl_lock already held
- *
- * Associate the q_vector napi with all the queue[s] on the vector.
- * Caller indicates the lock status.
- */
-void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
-{
-	struct ice_rx_ring *rx_ring;
-	struct ice_tx_ring *tx_ring;
-
-	ice_for_each_rx_ring(rx_ring, q_vector->rx)
-		__ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
-				     NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
-				     locked);
-
-	ice_for_each_tx_ring(tx_ring, q_vector->tx)
-		__ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
-				     NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
-				     locked);
+	ice_for_each_txq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX,
+				     &vsi->tx_rings[q_idx]->q_vector->napi);
 	/* Also set the interrupt number for the NAPI */
-	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
-}
-
-/**
- * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
- * @q_vector: q_vector pointer
- *
- * Associate the q_vector napi with all the queue[s] on the vector
- */
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector)
-{
-	struct ice_rx_ring *rx_ring;
-	struct ice_tx_ring *tx_ring;
-
-	ice_for_each_rx_ring(rx_ring, q_vector->rx)
-		ice_queue_set_napi(q_vector->vsi, rx_ring->q_index,
-				   NETDEV_QUEUE_TYPE_RX, &q_vector->napi);
+	ice_for_each_q_vector(vsi, v_idx) {
+		struct ice_q_vector *q_vector = vsi->q_vectors[v_idx];
 
-	ice_for_each_tx_ring(tx_ring, q_vector->tx)
-		ice_queue_set_napi(q_vector->vsi, tx_ring->q_index,
-				   NETDEV_QUEUE_TYPE_TX, &q_vector->napi);
-	/* Also set the interrupt number for the NAPI */
-	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
+		netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
+	}
 }
 
 /**
- * ice_vsi_set_napi_queues
+ * ice_vsi_clear_napi_queues
  * @vsi: VSI pointer
  *
- * Associate queue[s] with napi for all vectors
+ * Clear the association between all VSI queues queue[s] and napi.
+ * The caller must hold rtnl_lock.
  */
-void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
+void ice_vsi_clear_napi_queues(struct ice_vsi *vsi)
 {
-	int i;
+	struct net_device *netdev = vsi->netdev;
+	int q_idx;
 
-	if (!vsi->netdev)
+	if (!netdev)
 		return;
 
-	ice_for_each_q_vector(vsi, i)
-		ice_q_vector_set_napi_queues(vsi->q_vectors[i]);
+	ice_for_each_txq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX, NULL);
+
+	ice_for_each_rxq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX, NULL);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 94ce8964dda6..36d86535695d 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -44,16 +44,10 @@ void ice_vsi_cfg_netdev_tc(struct ice_vsi *vsi, u8 ena_tc);
 struct ice_vsi *
 ice_vsi_setup(struct ice_pf *pf, struct ice_vsi_cfg_params *params);
 
-void
-ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
-		   enum netdev_queue_type type, struct napi_struct *napi);
-
-void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
-
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector);
-
 void ice_vsi_set_napi_queues(struct ice_vsi *vsi);
 
+void ice_vsi_clear_napi_queues(struct ice_vsi *vsi);
+
 int ice_vsi_release(struct ice_vsi *vsi);
 
 void ice_vsi_close(struct ice_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 66820ed5e969..2d286a4609a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3537,11 +3537,9 @@ static void ice_napi_add(struct ice_vsi *vsi)
 	if (!vsi->netdev)
 		return;
 
-	ice_for_each_q_vector(vsi, v_idx) {
+	ice_for_each_q_vector(vsi, v_idx)
 		netif_napi_add(vsi->netdev, &vsi->q_vectors[v_idx]->napi,
 			       ice_napi_poll);
-		__ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
-	}
 }
 
 /**
@@ -5519,7 +5517,9 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
 		if (ret)
 			goto err_reinit;
 		ice_vsi_map_rings_to_vectors(pf->vsi[v]);
+		rtnl_lock();
 		ice_vsi_set_napi_queues(pf->vsi[v]);
+		rtnl_unlock();
 	}
 
 	ret = ice_req_irq_msix_misc(pf);
@@ -5533,8 +5533,12 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
 
 err_reinit:
 	while (v--)
-		if (pf->vsi[v])
+		if (pf->vsi[v]) {
+			rtnl_lock();
+			ice_vsi_clear_napi_queues(pf->vsi[v]);
+			rtnl_unlock();
 			ice_vsi_free_q_vectors(pf->vsi[v]);
+		}
 
 	return ret;
 }
@@ -5599,6 +5603,9 @@ static int ice_suspend(struct device *dev)
 	ice_for_each_vsi(pf, v) {
 		if (!pf->vsi[v])
 			continue;
+		rtnl_lock();
+		ice_vsi_clear_napi_queues(pf->vsi[v]);
+		rtnl_unlock();
 		ice_vsi_free_q_vectors(pf->vsi[v]);
 	}
 	ice_clear_interrupt_scheme(pf);
@@ -7434,6 +7441,8 @@ int ice_vsi_open(struct ice_vsi *vsi)
 		err = netif_set_real_num_rx_queues(vsi->netdev, vsi->num_rxq);
 		if (err)
 			goto err_set_qs;
+
+		ice_vsi_set_napi_queues(vsi);
 	}
 
 	err = ice_up_complete(vsi);
-- 
2.43.0



* [PATCH iwl-net v3 2/6] ice: protect XDP configuration with a mutex
  2024-08-19 10:05 [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
  2024-08-19 10:05 ` [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
@ 2024-08-19 10:05 ` Larysa Zaremba
  2024-08-22 11:39   ` Maciej Fijalkowski
  2024-08-19 10:05 ` [PATCH iwl-net v3 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan, Tony Nguyen
  Cc: Larysa Zaremba, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

The main threat to data consistency in ice_xdp() is a possible asynchronous
PF reset. It can be triggered by a user or by the TX timeout handler.

XDP setup and PF reset code access the same resources in the following
sections:
* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
* ice_vsi_rebuild() for the PF VSI - not protected
* ice_vsi_open() - already rtnl-locked

With unfortunate timing, such accesses can result in a crash such as the
one below:

[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
[ +0.394718] ice 0000:b1:00.0: PTP reset successful
[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ +0.000045] #PF: supervisor read access in kernel mode
[ +0.000023] #PF: error_code(0x0000) - not-present page
[ +0.000023] PGD 0 P4D 0
[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000036] Workqueue: ice ice_service_task [ice]
[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
[...]
[ +0.000013] Call Trace:
[ +0.000016] <TASK>
[ +0.000014] ? __die+0x1f/0x70
[ +0.000029] ? page_fault_oops+0x171/0x4f0
[ +0.000029] ? schedule+0x3b/0xd0
[ +0.000027] ? exc_page_fault+0x7b/0x180
[ +0.000022] ? asm_exc_page_fault+0x22/0x30
[ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
[ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
[ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
[ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
[ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[ +0.000145] ice_rebuild+0x18c/0x840 [ice]
[ +0.000145] ? delay_tsc+0x4a/0xc0
[ +0.000022] ? delay_tsc+0x92/0xc0
[ +0.000020] ice_do_reset+0x140/0x180 [ice]
[ +0.000886] ice_service_task+0x404/0x1030 [ice]
[ +0.000824] process_one_work+0x171/0x340
[ +0.000685] worker_thread+0x277/0x3a0
[ +0.000675] ? preempt_count_add+0x6a/0xa0
[ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
[ +0.000679] ? __pfx_worker_thread+0x10/0x10
[ +0.000653] kthread+0xf0/0x120
[ +0.000635] ? __pfx_kthread+0x10/0x10
[ +0.000616] ret_from_fork+0x2d/0x50
[ +0.000612] ? __pfx_kthread+0x10/0x10
[ +0.000604] ret_from_fork_asm+0x1b/0x30
[ +0.000604] </TASK>

The previous approach of returning -EBUSY is not viable, particularly when
destroying an AF_XDP socket, because the kernel proceeds with the removal
anyway.

There is plenty of code between those calls and there is no need to create
a large critical section that covers all of them, just as there is no need
to protect ice_vsi_rebuild() with rtnl_lock().

Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().

Leaving unprotected sections in between would result in two states that
have to be considered:
1. when the VSI is closed, but not yet rebuilt
2. when the VSI is already rebuilt, but not yet open

The latter case is already handled through the !netif_running() case;
we just need to adjust the flag checking a little. The former one is not as
trivial: a lot of hardware interaction happens between ice_vsi_close() and
ice_vsi_rebuild(), which can make adding or deleting rings exit with an
error. Luckily, a VSI rebuild is pending and can apply the new
configuration for us in a managed fashion.

Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
indicate that ice_xdp() can just hot-swap the program.
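
A minimal userspace sketch of this idea, using a pthread mutex in place of
xdp_state_lock and a plain bool in place of the ICE_VSI_REBUILD_PENDING bit.
All toy_* names are hypothetical and the ring handling is reduced to a
counter; unlike the driver, the sketch also sets the flag under the mutex to
keep the example free of data races.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_XDP_RINGS 4

struct toy_vsi {
	pthread_mutex_t xdp_state_lock;	/* stands in for vsi->xdp_state_lock */
	bool rebuild_pending;		/* stands in for ICE_VSI_REBUILD_PENDING */
	const void *prog;		/* attached program, NULL if none */
	int num_xdp_rings;		/* 0 when no XDP rings exist */
};

static void toy_vsi_init(struct toy_vsi *vsi)
{
	pthread_mutex_init(&vsi->xdp_state_lock, NULL);
	vsi->rebuild_pending = false;
	vsi->prog = NULL;
	vsi->num_xdp_rings = 0;
}

/* Reset preparation: from now on, rings must not be touched directly. */
static void toy_prepare_for_reset(struct toy_vsi *vsi)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->rebuild_pending = true;
	pthread_mutex_unlock(&vsi->xdp_state_lock);
}

/* .ndo_bpf()-like path: attach (prog != NULL) or detach (prog == NULL). */
static void toy_xdp_setup_prog(struct toy_vsi *vsi, const void *prog)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	if ((vsi->prog != NULL) == (prog != NULL) || vsi->rebuild_pending) {
		/* hot swap: only record the program, leave rings alone */
		vsi->prog = prog;
	} else {
		vsi->prog = prog;
		vsi->num_xdp_rings = prog ? TOY_XDP_RINGS : 0;
	}
	pthread_mutex_unlock(&vsi->xdp_state_lock);
}

/* Rebuild: applies whatever program is recorded at that point. */
static void toy_vsi_rebuild(struct toy_vsi *vsi)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->rebuild_pending = false;
	vsi->num_xdp_rings = vsi->prog ? TOY_XDP_RINGS : 0;
	pthread_mutex_unlock(&vsi->xdp_state_lock);
}
```

In this sketch, a program attached while the rebuild is pending is only
recorded; the rings appear once the rebuild runs, which mirrors the "rebuild
applies the new configuration for us" reasoning above.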

Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
consistent by deconfiguring VSI when coalesce allocation fails.
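
The resulting error-handling shape can be sketched in userspace C as a
single goto ladder with one unlock at the end. This is only an illustration
of the pattern under simplified assumptions: the toy_* names and the
fail_alloc/fail_tc knobs are hypothetical and exist purely to exercise the
error paths.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_rebuild_ctx {
	pthread_mutex_t lock;
	bool configured;	/* true while the "VSI" is configured */
	bool fail_alloc;	/* test knob: pretend the allocation failed */
	bool fail_tc;		/* test knob: pretend a later step failed */
};

static int toy_rebuild(struct toy_rebuild_ctx *ctx)
{
	int *coalesce;
	int ret = 0;

	pthread_mutex_lock(&ctx->lock);

	ctx->configured = true;	/* stands in for the configuration step */

	coalesce = ctx->fail_alloc ? NULL : calloc(4, sizeof(*coalesce));
	if (!coalesce) {
		ret = -ENOMEM;
		goto decfg;	/* undo the configuration, then unlock */
	}

	if (ctx->fail_tc) {
		ret = -EIO;
		goto free_coalesce;
	}

	/* the success path falls through the same labels */
free_coalesce:
	free(coalesce);
decfg:
	if (ret)
		ctx->configured = false;
	pthread_mutex_unlock(&ctx->lock);
	return ret;
}
```

Because every path leaves through the same labels, the lock is released
exactly once and a failed allocation no longer leaves the configured state
behind.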

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |  2 ++
 drivers/net/ethernet/intel/ice/ice_lib.c  | 34 ++++++++++++++---------
 drivers/net/ethernet/intel/ice/ice_main.c | 19 +++++++++----
 drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 +-
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index caaa10157909..ce8b5505b16d 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -318,6 +318,7 @@ enum ice_vsi_state {
 	ICE_VSI_UMAC_FLTR_CHANGED,
 	ICE_VSI_MMAC_FLTR_CHANGED,
 	ICE_VSI_PROMISC_CHANGED,
+	ICE_VSI_REBUILD_PENDING,
 	ICE_VSI_STATE_NBITS		/* must be last */
 };
 
@@ -411,6 +412,7 @@ struct ice_vsi {
 	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
 	u16 num_xdp_txq;		 /* Used XDP queues */
 	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
+	struct mutex xdp_state_lock;
 
 	struct net_device **target_netdevs;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 5f2ddcaf7031..a8721ecdf2cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -447,6 +447,7 @@ static void ice_vsi_free(struct ice_vsi *vsi)
 
 	ice_vsi_free_stats(vsi);
 	ice_vsi_free_arrays(vsi);
+	mutex_destroy(&vsi->xdp_state_lock);
 	mutex_unlock(&pf->sw_mutex);
 	devm_kfree(dev, vsi);
 }
@@ -626,6 +627,8 @@ static struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf)
 	pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
 					 pf->next_vsi);
 
+	mutex_init(&vsi->xdp_state_lock);
+
 unlock_pf:
 	mutex_unlock(&pf->sw_mutex);
 	return vsi;
@@ -2973,19 +2976,24 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
 	if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
 		return -EINVAL;
 
+	mutex_lock(&vsi->xdp_state_lock);
+	clear_bit(ICE_VSI_REBUILD_PENDING, vsi->state);
+
 	ret = ice_vsi_realloc_stat_arrays(vsi);
 	if (ret)
-		goto err_vsi_cfg;
+		goto unlock;
 
 	ice_vsi_decfg(vsi);
 	ret = ice_vsi_cfg_def(vsi);
 	if (ret)
-		goto err_vsi_cfg;
+		goto unlock;
 
 	coalesce = kcalloc(vsi->num_q_vectors,
 			   sizeof(struct ice_coalesce_stored), GFP_KERNEL);
-	if (!coalesce)
-		return -ENOMEM;
+	if (!coalesce) {
+		ret = -ENOMEM;
+		goto decfg;
+	}
 
 	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
 
@@ -2993,22 +3001,22 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
 	if (ret) {
 		if (vsi_flags & ICE_VSI_FLAG_INIT) {
 			ret = -EIO;
-			goto err_vsi_cfg_tc_lan;
+			goto free_coalesce;
 		}
 
-		kfree(coalesce);
-		return ice_schedule_reset(pf, ICE_RESET_PFR);
+		ret = ice_schedule_reset(pf, ICE_RESET_PFR);
+		goto free_coalesce;
 	}
 
 	ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors);
-	kfree(coalesce);
 
-	return 0;
-
-err_vsi_cfg_tc_lan:
-	ice_vsi_decfg(vsi);
+free_coalesce:
 	kfree(coalesce);
-err_vsi_cfg:
+decfg:
+	if (ret)
+		ice_vsi_decfg(vsi);
+unlock:
+	mutex_unlock(&vsi->xdp_state_lock);
 	return ret;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 2d286a4609a5..e92f43850671 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -595,6 +595,7 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
 	/* clear SW filtering DB */
 	ice_clear_hw_tbls(hw);
 	/* disable the VSIs and their queues that are not already DOWN */
+	set_bit(ICE_VSI_REBUILD_PENDING, ice_get_main_vsi(pf)->state);
 	ice_pf_dis_all_vsi(pf, false);
 
 	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
@@ -2995,7 +2996,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	/* hot swap progs and avoid toggling link */
-	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
+	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
+	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
 		ice_vsi_assign_bpf_prog(vsi, prog);
 		return 0;
 	}
@@ -3067,21 +3069,28 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	struct ice_netdev_priv *np = netdev_priv(dev);
 	struct ice_vsi *vsi = np->vsi;
+	int ret;
 
 	if (vsi->type != ICE_VSI_PF) {
 		NL_SET_ERR_MSG_MOD(xdp->extack, "XDP can be loaded only on PF VSI");
 		return -EINVAL;
 	}
 
+	mutex_lock(&vsi->xdp_state_lock);
+
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
-		return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
+		ret = ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
+		break;
 	case XDP_SETUP_XSK_POOL:
-		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
-					  xdp->xsk.queue_id);
+		ret = ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id);
+		break;
 	default:
-		return -EINVAL;
+		ret = -EINVAL;
 	}
+
+	mutex_unlock(&vsi->xdp_state_lock);
+	return ret;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 240a7bec242b..a659951fa987 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -390,7 +390,8 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 		goto failure;
 	}
 
-	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
+	if_running = !test_bit(ICE_VSI_DOWN, vsi->state) &&
+		     ice_is_xdp_ena_vsi(vsi);
 
 	if (if_running) {
 		struct ice_rx_ring *rx_ring = vsi->rx_rings[qid];
-- 
2.43.0



* [PATCH iwl-net v3 3/6] ice: check for XDP rings instead of bpf program when unconfiguring
  2024-08-19 10:05 [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
  2024-08-19 10:05 ` [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
  2024-08-19 10:05 ` [PATCH iwl-net v3 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
@ 2024-08-19 10:05 ` Larysa Zaremba
  2024-08-22 11:36   ` Maciej Fijalkowski
  2024-08-19 10:05 ` [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan, Tony Nguyen
  Cc: Larysa Zaremba, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
the VSI without applying the new ring configuration. When unconfiguring the
VSI, we can therefore encounter a state in which there is an XDP program but
no XDP rings to destroy, or one in which there are XDP rings that need to be
destroyed but no XDP program to indicate their presence.

When unconfiguring, rely on the presence of XDP rings rather than the XDP
program, as they better represent the current state that has to be
destroyed.
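
A small userspace sketch of why the resource pointer is the better
predicate. The names (toy_xdp_vsi, toy_prog_says_xdp, toy_vsi_decfg) are
hypothetical; the sketch only models a program pointer that may already
have been swapped and a rings array that may still be allocated.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_xdp_vsi {
	const void *xdp_prog;	/* what the user last asked for */
	int *xdp_rings;		/* what is actually allocated, or NULL */
};

/* Predicate based on intent: can disagree with the allocated state. */
static int toy_prog_says_xdp(const struct toy_xdp_vsi *vsi)
{
	return vsi->xdp_prog != NULL;
}

/* Teardown keyed on the resource itself. Returns 1 if rings were freed. */
static int toy_vsi_decfg(struct toy_xdp_vsi *vsi)
{
	if (!vsi->xdp_rings)
		return 0;	/* nothing to destroy, whatever the program says */
	free(vsi->xdp_rings);
	vsi->xdp_rings = NULL;
	return 1;
}
```

With the program already detached but the rings still present, a check on
the program alone would skip the teardown, while the check on the rings
frees them; with a program attached but no rings, the ring check correctly
does nothing.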

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c  | 4 ++--
 drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index a8721ecdf2cd..b72338974a60 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2419,7 +2419,7 @@ void ice_vsi_decfg(struct ice_vsi *vsi)
 		dev_err(ice_pf_to_dev(pf), "Failed to remove RDMA scheduler config for VSI %u, err %d\n",
 			vsi->vsi_num, err);
 
-	if (ice_is_xdp_ena_vsi(vsi))
+	if (vsi->xdp_rings)
 		/* return value check can be skipped here, it always returns
 		 * 0 if reset is in progress
 		 */
@@ -2521,7 +2521,7 @@ static void ice_vsi_release_msix(struct ice_vsi *vsi)
 		for (q = 0; q < q_vector->num_ring_tx; q++) {
 			ice_write_itr(&q_vector->tx, 0);
 			wr32(hw, QINT_TQCTL(vsi->txq_map[txq]), 0);
-			if (ice_is_xdp_ena_vsi(vsi)) {
+			if (vsi->xdp_rings) {
 				u32 xdp_txq = txq + vsi->num_xdp_txq;
 
 				wr32(hw, QINT_TQCTL(vsi->txq_map[xdp_txq]), 0);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e92f43850671..a718763d2370 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -7228,7 +7228,7 @@ int ice_down(struct ice_vsi *vsi)
 	if (tx_err)
 		netdev_err(vsi->netdev, "Failed stop Tx rings, VSI %d error %d\n",
 			   vsi->vsi_num, tx_err);
-	if (!tx_err && ice_is_xdp_ena_vsi(vsi)) {
+	if (!tx_err && vsi->xdp_rings) {
 		tx_err = ice_vsi_stop_xdp_tx_rings(vsi);
 		if (tx_err)
 			netdev_err(vsi->netdev, "Failed stop XDP rings, VSI %d error %d\n",
@@ -7245,7 +7245,7 @@ int ice_down(struct ice_vsi *vsi)
 	ice_for_each_txq(vsi, i)
 		ice_clean_tx_ring(vsi->tx_rings[i]);
 
-	if (ice_is_xdp_ena_vsi(vsi))
+	if (vsi->xdp_rings)
 		ice_for_each_xdp_txq(vsi, i)
 			ice_clean_tx_ring(vsi->xdp_rings[i]);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index a659951fa987..8693509efbe7 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -39,7 +39,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
 	       sizeof(vsi_stat->rx_ring_stats[q_idx]->rx_stats));
 	memset(&vsi_stat->tx_ring_stats[q_idx]->stats, 0,
 	       sizeof(vsi_stat->tx_ring_stats[q_idx]->stats));
-	if (ice_is_xdp_ena_vsi(vsi))
+	if (vsi->xdp_rings)
 		memset(&vsi->xdp_rings[q_idx]->ring_stats->stats, 0,
 		       sizeof(vsi->xdp_rings[q_idx]->ring_stats->stats));
 }
@@ -52,7 +52,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
 static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx)
 {
 	ice_clean_tx_ring(vsi->tx_rings[q_idx]);
-	if (ice_is_xdp_ena_vsi(vsi))
+	if (vsi->xdp_rings)
 		ice_clean_tx_ring(vsi->xdp_rings[q_idx]);
 	ice_clean_rx_ring(vsi->rx_rings[q_idx]);
 }
@@ -194,7 +194,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 	err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, 0, tx_ring, &txq_meta);
 	if (!fail)
 		fail = err;
-	if (ice_is_xdp_ena_vsi(vsi)) {
+	if (vsi->xdp_rings) {
 		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
 
 		memset(&txq_meta, 0, sizeof(txq_meta));
-- 
2.43.0



* [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-08-19 10:05 [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
                   ` (2 preceding siblings ...)
  2024-08-19 10:05 ` [PATCH iwl-net v3 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
@ 2024-08-19 10:05 ` Larysa Zaremba
  2024-08-22 11:34   ` Maciej Fijalkowski
  2024-08-19 10:05 ` [PATCH iwl-net v3 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
  2024-08-19 10:05 ` [PATCH iwl-net v3 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
  5 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan, Tony Nguyen
  Cc: Larysa Zaremba, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

Consider the following scenario:

.ndo_bpf()		| ice_prepare_for_reset()		|
________________________|_______________________________________|
rtnl_lock()		|					|
ice_down()		|					|
			| test_bit(ICE_VSI_DOWN) - true		|
			| ice_dis_vsi() returns			|
ice_up()		|					|
			| proceeds to rebuild a running VSI	|

.ndo_bpf() is not the only rtnl-locked callback that toggles the interface
to apply new configuration. Another example is .set_channels().

To avoid the race condition above, act only after reading ICE_VSI_DOWN
under rtnl_lock.
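
For illustration only (this sketch is not part of the patch): the
interleaving in the table above can be replayed in a single-threaded
userspace model. All names below (model_vsi, model_lock(),
model_reset_races()) are hypothetical, and the boolean "lock" merely
stands in for rtnl_lock. The model only shows why sampling the flag
outside the lock can yield a stale answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct model_vsi {
	bool lock_held;	/* stands in for rtnl_lock */
	bool down;	/* stands in for ICE_VSI_DOWN */
};

static void model_lock(struct model_vsi *v)
{
	assert(!v->lock_held);	/* a second taker would have to wait */
	v->lock_held = true;
}

static void model_unlock(struct model_vsi *v)
{
	assert(v->lock_held);
	v->lock_held = false;
}

/*
 * Replay the interleaving from the scenario in one thread.
 * Returns true if the reset path ends up rebuilding a running VSI.
 */
static bool model_reset_races(bool check_under_lock)
{
	struct model_vsi v = { .lock_held = false, .down = false };
	bool saw_down = false;

	/* .ndo_bpf(): take the lock, bring the interface down */
	model_lock(&v);
	v.down = true;

	/* Old behaviour: the reset path samples the flag without the lock */
	if (!check_under_lock)
		saw_down = v.down;

	/* .ndo_bpf(): bring the interface back up, drop the lock */
	v.down = false;
	model_unlock(&v);

	if (check_under_lock) {
		/* New behaviour: the lock is only obtainable at this point */
		model_lock(&v);
		saw_down = v.down;
		if (!saw_down)
			v.down = true;	/* close the VSI before rebuild */
		model_unlock(&v);
	} else if (!saw_down) {
		v.down = true;
	}

	return !v.down;
}
```

In this model, model_reset_races(false) reports the stale read (the VSI
is still up when the rebuild starts), while model_reset_races(true)
does not.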

Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index b72338974a60..94029e446b99 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2665,8 +2665,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
  */
 void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
 {
-	if (test_bit(ICE_VSI_DOWN, vsi->state))
-		return;
+	bool already_down = test_bit(ICE_VSI_DOWN, vsi->state);
 
 	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
 
@@ -2674,15 +2673,16 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
 		if (netif_running(vsi->netdev)) {
 			if (!locked)
 				rtnl_lock();
-
-			ice_vsi_close(vsi);
+			already_down = test_bit(ICE_VSI_DOWN, vsi->state);
+			if (!already_down)
+				ice_vsi_close(vsi);
 
 			if (!locked)
 				rtnl_unlock();
-		} else {
+		} else if (!already_down) {
 			ice_vsi_close(vsi);
 		}
-	} else if (vsi->type == ICE_VSI_CTRL) {
+	} else if (vsi->type == ICE_VSI_CTRL && !already_down) {
 		ice_vsi_close(vsi);
 	}
 }
-- 
2.43.0



* [PATCH iwl-net v3 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-08-19 10:05 [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
                   ` (3 preceding siblings ...)
  2024-08-19 10:05 ` [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
@ 2024-08-19 10:05 ` Larysa Zaremba
  2024-08-22 11:43   ` Maciej Fijalkowski
  2024-08-19 10:05 ` [PATCH iwl-net v3 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
  5 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan, Tony Nguyen
  Cc: Larysa Zaremba, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
because ICE_CFG_BUSY is a state flag that is supposed to be set in the PF
state, not the VSI one. Therefore, it does not protect the queue pair from
e.g. a reset.

Despite being useless, it can still deadlock the unfortunate functions that
have fallen into the same ICE_CFG_BUSY-VSI trap. This happens if
ice_qp_ena() returns an error.

Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
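
For illustration only (not part of the patch): a minimal userspace
model of how a flag used as a lock can get stuck. The names
(model_vsi_state, model_qp_dis(), model_qp_ena()) are hypothetical.
The sketch assumes that the error path of the enable step leaves the
flag set; the message above says only that the problem appears when
ice_qp_ena() returns an error. The wait loop mirrors the one removed
from ice_qp_dis(), minus the sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not taken from the driver. */
struct model_vsi_state {
	bool cfg_busy;	/* stands in for ICE_CFG_BUSY in vsi->state */
};

/* Same contract as test_and_set_bit(): set the flag, return the old value. */
static bool model_test_and_set(bool *flag)
{
	bool old = *flag;

	*flag = true;
	return old;
}

/* Mirrors the removed wait loop, without sleeping between attempts. */
static int model_qp_dis(struct model_vsi_state *s)
{
	int timeout = 50;

	while (model_test_and_set(&s->cfg_busy)) {
		timeout--;
		if (!timeout)
			return -EBUSY;
	}
	return 0;
}

/* Assumption: an error return skips the step that clears the flag. */
static int model_qp_ena(struct model_vsi_state *s, int err)
{
	assert(s->cfg_busy);	/* only called after model_qp_dis() succeeded */
	if (err)
		return err;
	s->cfg_busy = false;
	return 0;
}
```

Once model_qp_ena() has failed, every later model_qp_dis() call runs
out of retries and returns -EBUSY. A waiter without a timeout would
spin forever.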

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 8693509efbe7..5dee829bfc47 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -165,7 +165,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 	struct ice_q_vector *q_vector;
 	struct ice_tx_ring *tx_ring;
 	struct ice_rx_ring *rx_ring;
-	int timeout = 50;
 	int fail = 0;
 	int err;
 
@@ -176,13 +175,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 	rx_ring = vsi->rx_rings[q_idx];
 	q_vector = rx_ring->q_vector;
 
-	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
-		timeout--;
-		if (!timeout)
-			return -EBUSY;
-		usleep_range(1000, 2000);
-	}
-
 	synchronize_net();
 	netif_carrier_off(vsi->netdev);
 	netif_tx_stop_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
@@ -261,7 +253,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 		netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
 		netif_carrier_on(vsi->netdev);
 	}
-	clear_bit(ICE_CFG_BUSY, vsi->state);
 
 	return fail;
 }
-- 
2.43.0



* [PATCH iwl-net v3 6/6] ice: do not bring the VSI up, if it was down before the XDP setup
  2024-08-19 10:05 [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
                   ` (4 preceding siblings ...)
  2024-08-19 10:05 ` [PATCH iwl-net v3 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
@ 2024-08-19 10:05 ` Larysa Zaremba
  2024-08-22 11:35   ` Maciej Fijalkowski
  5 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan, Tony Nguyen
  Cc: Larysa Zaremba, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

After XDP configuration is completed, we bring the interface up
unconditionally, regardless of its state before the call to .ndo_bpf().

Record whether the interface had to be brought down, and bring it back up
afterwards only in that case.
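
For illustration only (not part of the patch): a small userspace model
of the "restore only what you changed" pattern. The names (model_if,
model_xdp_setup()) are hypothetical, and the two booleans stand in for
netif_running() and ICE_VSI_DOWN. The key point is that a single
variable records whether this call was the one that took the interface
down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the interface state; not driver code. */
struct model_if {
	bool running;	/* stands in for netif_running() */
	bool down;	/* stands in for ICE_VSI_DOWN */
};

/* Same contract as test_and_set_bit(): set the flag, return the old value. */
static bool model_test_and_set(bool *flag)
{
	bool old = *flag;

	*flag = true;
	return old;
}

/* Returns true if this call took the interface down and brought it back. */
static bool model_xdp_setup(struct model_if *ifc)
{
	/* && short-circuits: the flag is touched only if the netdev runs */
	bool if_running = ifc->running && !model_test_and_set(&ifc->down);

	/* While reconfiguring, a running netdev must be marked down */
	assert(!ifc->running || ifc->down);

	/* ... XDP rings would be reconfigured here ... */

	if (if_running)
		ifc->down = false;	/* undo only what this call did */

	return if_running;
}
```

In the model, an interface that was already down before the call (for
example, because something else had closed it) stays down afterwards,
while one that this call brought down comes back up.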

Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a718763d2370..d3277d5d3bd2 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2984,8 +2984,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 		   struct netlink_ext_ack *extack)
 {
 	unsigned int frame_size = vsi->netdev->mtu + ICE_ETH_PKT_HDR_PAD;
-	bool if_running = netif_running(vsi->netdev);
 	int ret = 0, xdp_ring_err = 0;
+	bool if_running;
 
 	if (prog && !prog->aux->xdp_has_frags) {
 		if (frame_size > ice_max_xdp_frame_size(vsi)) {
@@ -3002,8 +3002,11 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 		return 0;
 	}
 
+	if_running = netif_running(vsi->netdev) &&
+		     !test_and_set_bit(ICE_VSI_DOWN, vsi->state);
+
 	/* need to stop netdev while setting up the program for Rx rings */
-	if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) {
+	if (if_running) {
 		ret = ice_down(vsi);
 		if (ret) {
 			NL_SET_ERR_MSG_MOD(extack, "Preparing device for XDP attach failed");
-- 
2.43.0



* Re: [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-08-19 10:05 ` [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
@ 2024-08-20 12:31   ` Maciej Fijalkowski
  2024-08-20 12:47     ` Larysa Zaremba
  0 siblings, 1 reply; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-20 12:31 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Mon, Aug 19, 2024 at 12:05:38PM +0200, Larysa Zaremba wrote:
> Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is
> not rtnl-locked when called from the reset. This creates the need to take
> the rtnl_lock just for a single function and complicates the
> synchronization with .ndo_bpf. At the same time, there is no actual need
> to fill napi-to-queue information at this exact point.
> 
> Fill napi-to-queue information when opening the VSI and clear it when the
> VSI is being closed. Those routines are already rtnl-locked.
> 
> Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
> XDP queues, as this leads to out-of-bounds writes, such as the one below.
> 
> [  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
> [  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
> [  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> [  +0.000003] Call Trace:
> [  +0.000003]  <TASK>
> [  +0.000002]  dump_stack_lvl+0x60/0x80
> [  +0.000007]  print_report+0xce/0x630
> [  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
> [  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000003]  kasan_report+0xe9/0x120
> [  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
> [  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
> [  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
> [  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
> [  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
> [  +0.000006]  pci_reset_function+0x12d/0x230
> [  +0.000004]  reset_store+0xa0/0x100
> [  +0.000006]  ? __pfx_reset_store+0x10/0x10
> [  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
> [  +0.000004]  ? __check_object_size+0x4c1/0x640
> [  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
> [  +0.000006]  vfs_write+0x5d6/0xdf0
> [  +0.000005]  ? fd_install+0x180/0x350
> [  +0.000005]  ? __pfx_vfs_write+0x10/0xA10
> [  +0.000004]  ? do_fcntl+0x52c/0xcd0
> [  +0.000004]  ? kasan_save_track+0x13/0x60
> [  +0.000003]  ? kasan_save_free_info+0x37/0x60
> [  +0.000006]  ksys_write+0xfa/0x1d0
> [  +0.000003]  ? __pfx_ksys_write+0x10/0x10
> [  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
> [  +0.000004]  ? _raw_spin_lock+0x87/0xe0
> [  +0.000005]  do_syscall_64+0x80/0x170
> [  +0.000007]  ? _raw_spin_lock+0x87/0xe0
> [  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
> [  +0.000003]  ? file_close_fd_locked+0x167/0x230
> [  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  +0.000005]  ? do_syscall_64+0x8c/0x170
> [  +0.000004]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? fput+0x1a/0x2c0
> [  +0.000004]  ? filp_close+0x19/0x30
> [  +0.000004]  ? do_dup2+0x25a/0x4c0
> [  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
> [  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  +0.000004]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? __count_memcg_events+0x113/0x380
> [  +0.000005]  ? handle_mm_fault+0x136/0x820
> [  +0.000005]  ? do_user_addr_fault+0x444/0xa80
> [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> [  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  +0.000005] RIP: 0033:0x7f2033593154
> 
> Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")

Shouldn't you include:
Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi")

as we were iterating over XDP rings that were attached to q_vectors from
the very beginning?

> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
>  drivers/net/ethernet/intel/ice/ice_lib.c  | 129 ++++++----------------
>  drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
>  drivers/net/ethernet/intel/ice/ice_main.c |  17 ++-
>  4 files changed, 49 insertions(+), 118 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index f448d3a84564..c158749a80e0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -190,16 +190,11 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
>  	}
>  	q_vector = vsi->q_vectors[v_idx];
>  
> -	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
> -		ice_queue_set_napi(vsi, tx_ring->q_index, NETDEV_QUEUE_TYPE_TX,
> -				   NULL);
> +	ice_for_each_tx_ring(tx_ring, vsi->q_vectors[v_idx]->tx)
>  		tx_ring->q_vector = NULL;
> -	}
> -	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
> -		ice_queue_set_napi(vsi, rx_ring->q_index, NETDEV_QUEUE_TYPE_RX,
> -				   NULL);
> +
> +	ice_for_each_rx_ring(rx_ring, vsi->q_vectors[v_idx]->rx)
>  		rx_ring->q_vector = NULL;
> -	}
>  
>  	/* only VSI with an associated netdev is set up with NAPI */
>  	if (vsi->netdev)
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index 03c4df4ed585..5f2ddcaf7031 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2286,9 +2286,6 @@ static int ice_vsi_cfg_def(struct ice_vsi *vsi)
>  
>  		ice_vsi_map_rings_to_vectors(vsi);
>  
> -		/* Associate q_vector rings to napi */
> -		ice_vsi_set_napi_queues(vsi);
> -
>  		vsi->stat_offsets_loaded = false;
>  
>  		/* ICE_VSI_CTRL does not need RSS so skip RSS processing */
> @@ -2621,6 +2618,7 @@ void ice_vsi_close(struct ice_vsi *vsi)
>  	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
>  		ice_down(vsi);
>  
> +	ice_vsi_clear_napi_queues(vsi);
>  	ice_vsi_free_irq(vsi);
>  	ice_vsi_free_tx_rings(vsi);
>  	ice_vsi_free_rx_rings(vsi);
> @@ -2687,120 +2685,55 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
>  }
>  
>  /**
> - * __ice_queue_set_napi - Set the napi instance for the queue
> - * @dev: device to which NAPI and queue belong
> - * @queue_index: Index of queue
> - * @type: queue type as RX or TX
> - * @napi: NAPI context
> - * @locked: is the rtnl_lock already held
> - *
> - * Set the napi instance for the queue. Caller indicates the lock status.
> - */
> -static void
> -__ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
> -		     enum netdev_queue_type type, struct napi_struct *napi,
> -		     bool locked)
> -{
> -	if (!locked)
> -		rtnl_lock();
> -	netif_queue_set_napi(dev, queue_index, type, napi);
> -	if (!locked)
> -		rtnl_unlock();
> -}
> -
> -/**
> - * ice_queue_set_napi - Set the napi instance for the queue
> - * @vsi: VSI being configured
> - * @queue_index: Index of queue
> - * @type: queue type as RX or TX
> - * @napi: NAPI context
> + * ice_vsi_set_napi_queues
> + * @vsi: VSI pointer
>   *
> - * Set the napi instance for the queue. The rtnl lock state is derived from the
> - * execution path.
> + * Associate queue[s] with napi for all vectors.
> + * The caller must hold rtnl_lock.
>   */
> -void
> -ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
> -		   enum netdev_queue_type type, struct napi_struct *napi)
> +void ice_vsi_set_napi_queues(struct ice_vsi *vsi)

This appears to be called only in ice_main.c. It should be moved there and
made a static function instead of living in ice_lib.c.

Unless I overlooked something...

>  {
> -	struct ice_pf *pf = vsi->back;
> +	struct net_device *netdev = vsi->netdev;
> +	int q_idx, v_idx;
>  
> -	if (!vsi->netdev)
> +	if (!netdev)
>  		return;
>  
> -	if (current_work() == &pf->serv_task ||
> -	    test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
> -	    test_bit(ICE_DOWN, pf->state) ||
> -	    test_bit(ICE_SUSPENDED, pf->state))
> -		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
> -				     false);
> -	else
> -		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
> -				     true);
> -}
> +	ice_for_each_rxq(vsi, q_idx)
> +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX,
> +				     &vsi->rx_rings[q_idx]->q_vector->napi);
>  
> -/**
> - * __ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
> - * @q_vector: q_vector pointer
> - * @locked: is the rtnl_lock already held
> - *
> - * Associate the q_vector napi with all the queue[s] on the vector.
> - * Caller indicates the lock status.
> - */
> -void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
> -{
> -	struct ice_rx_ring *rx_ring;
> -	struct ice_tx_ring *tx_ring;
> -
> -	ice_for_each_rx_ring(rx_ring, q_vector->rx)
> -		__ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
> -				     NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
> -				     locked);
> -
> -	ice_for_each_tx_ring(tx_ring, q_vector->tx)
> -		__ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
> -				     NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
> -				     locked);
> +	ice_for_each_txq(vsi, q_idx)
> +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX,
> +				     &vsi->tx_rings[q_idx]->q_vector->napi);
>  	/* Also set the interrupt number for the NAPI */
> -	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
> -}
> -
> -/**
> - * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
> - * @q_vector: q_vector pointer
> - *
> - * Associate the q_vector napi with all the queue[s] on the vector
> - */
> -void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector)
> -{
> -	struct ice_rx_ring *rx_ring;
> -	struct ice_tx_ring *tx_ring;
> -
> -	ice_for_each_rx_ring(rx_ring, q_vector->rx)
> -		ice_queue_set_napi(q_vector->vsi, rx_ring->q_index,
> -				   NETDEV_QUEUE_TYPE_RX, &q_vector->napi);
> +	ice_for_each_q_vector(vsi, v_idx) {
> +		struct ice_q_vector *q_vector = vsi->q_vectors[v_idx];
>  
> -	ice_for_each_tx_ring(tx_ring, q_vector->tx)
> -		ice_queue_set_napi(q_vector->vsi, tx_ring->q_index,
> -				   NETDEV_QUEUE_TYPE_TX, &q_vector->napi);
> -	/* Also set the interrupt number for the NAPI */
> -	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
> +		netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
> +	}
>  }
>  
>  /**
> - * ice_vsi_set_napi_queues
> + * ice_vsi_clear_napi_queues
>   * @vsi: VSI pointer
>   *
> - * Associate queue[s] with napi for all vectors
> + * Clear the association between all VSI queues queue[s] and napi.
> + * The caller must hold rtnl_lock.
>   */
> -void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
> +void ice_vsi_clear_napi_queues(struct ice_vsi *vsi)
>  {
> -	int i;
> +	struct net_device *netdev = vsi->netdev;
> +	int q_idx;
>  
> -	if (!vsi->netdev)
> +	if (!netdev)
>  		return;
>  
> -	ice_for_each_q_vector(vsi, i)
> -		ice_q_vector_set_napi_queues(vsi->q_vectors[i]);
> +	ice_for_each_txq(vsi, q_idx)
> +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX, NULL);
> +
> +	ice_for_each_rxq(vsi, q_idx)
> +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX, NULL);
>  }
>  
>  /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
> index 94ce8964dda6..36d86535695d 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.h
> @@ -44,16 +44,10 @@ void ice_vsi_cfg_netdev_tc(struct ice_vsi *vsi, u8 ena_tc);
>  struct ice_vsi *
>  ice_vsi_setup(struct ice_pf *pf, struct ice_vsi_cfg_params *params);
>  
> -void
> -ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
> -		   enum netdev_queue_type type, struct napi_struct *napi);
> -
> -void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
> -
> -void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector);
> -
>  void ice_vsi_set_napi_queues(struct ice_vsi *vsi);
>  
> +void ice_vsi_clear_napi_queues(struct ice_vsi *vsi);
> +
>  int ice_vsi_release(struct ice_vsi *vsi);
>  
>  void ice_vsi_close(struct ice_vsi *vsi);
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 66820ed5e969..2d286a4609a5 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -3537,11 +3537,9 @@ static void ice_napi_add(struct ice_vsi *vsi)
>  	if (!vsi->netdev)
>  		return;
>  
> -	ice_for_each_q_vector(vsi, v_idx) {
> +	ice_for_each_q_vector(vsi, v_idx)
>  		netif_napi_add(vsi->netdev, &vsi->q_vectors[v_idx]->napi,
>  			       ice_napi_poll);
> -		__ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
> -	}
>  }
>  
>  /**
> @@ -5519,7 +5517,9 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
>  		if (ret)
>  			goto err_reinit;
>  		ice_vsi_map_rings_to_vectors(pf->vsi[v]);
> +		rtnl_lock();
>  		ice_vsi_set_napi_queues(pf->vsi[v]);
> +		rtnl_unlock();
>  	}
>  
>  	ret = ice_req_irq_msix_misc(pf);
> @@ -5533,8 +5533,12 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
>  
>  err_reinit:
>  	while (v--)
> -		if (pf->vsi[v])
> +		if (pf->vsi[v]) {
> +			rtnl_lock();
> +			ice_vsi_clear_napi_queues(pf->vsi[v]);
> +			rtnl_unlock();
>  			ice_vsi_free_q_vectors(pf->vsi[v]);
> +		}
>  
>  	return ret;
>  }
> @@ -5599,6 +5603,9 @@ static int ice_suspend(struct device *dev)
>  	ice_for_each_vsi(pf, v) {
>  		if (!pf->vsi[v])
>  			continue;
> +		rtnl_lock();
> +		ice_vsi_clear_napi_queues(pf->vsi[v]);
> +		rtnl_unlock();
>  		ice_vsi_free_q_vectors(pf->vsi[v]);
>  	}
>  	ice_clear_interrupt_scheme(pf);
> @@ -7434,6 +7441,8 @@ int ice_vsi_open(struct ice_vsi *vsi)
>  		err = netif_set_real_num_rx_queues(vsi->netdev, vsi->num_rxq);
>  		if (err)
>  			goto err_set_qs;
> +
> +		ice_vsi_set_napi_queues(vsi);
>  	}
>  
>  	err = ice_up_complete(vsi);
> -- 
> 2.43.0
> 


* Re: [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-08-20 12:31   ` Maciej Fijalkowski
@ 2024-08-20 12:47     ` Larysa Zaremba
  2024-08-20 13:26       ` Maciej Fijalkowski
  0 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-20 12:47 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Tue, Aug 20, 2024 at 02:31:51PM +0200, Maciej Fijalkowski wrote:
> On Mon, Aug 19, 2024 at 12:05:38PM +0200, Larysa Zaremba wrote:
> > Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is
> > not rtnl-locked when called from the reset. This creates the need to take
> > the rtnl_lock just for a single function and complicates the
> > synchronization with .ndo_bpf. At the same time, there is no actual need
> > to fill napi-to-queue information at this exact point.
> > 
> > Fill napi-to-queue information when opening the VSI and clear it when the
> > VSI is being closed. Those routines are already rtnl-locked.
> > 
> > Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
> > XDP queues, as this leads to out-of-bounds writes, such as the one below.
> > 
> > [  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
> > [  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
> > [  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
> > [  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> > [  +0.000003] Call Trace:
> > [  +0.000003]  <TASK>
> > [  +0.000002]  dump_stack_lvl+0x60/0x80
> > [  +0.000007]  print_report+0xce/0x630
> > [  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > [  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
> > [  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
> > [  +0.000003]  kasan_report+0xe9/0x120
> > [  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
> > [  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
> > [  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
> > [  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
> > [  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
> > [  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
> > [  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
> > [  +0.000006]  pci_reset_function+0x12d/0x230
> > [  +0.000004]  reset_store+0xa0/0x100
> > [  +0.000006]  ? __pfx_reset_store+0x10/0x10
> > [  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
> > [  +0.000004]  ? __check_object_size+0x4c1/0x640
> > [  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
> > [  +0.000006]  vfs_write+0x5d6/0xdf0
> > [  +0.000005]  ? fd_install+0x180/0x350
> > [  +0.000005]  ? __pfx_vfs_write+0x10/0xA10
> > [  +0.000004]  ? do_fcntl+0x52c/0xcd0
> > [  +0.000004]  ? kasan_save_track+0x13/0x60
> > [  +0.000003]  ? kasan_save_free_info+0x37/0x60
> > [  +0.000006]  ksys_write+0xfa/0x1d0
> > [  +0.000003]  ? __pfx_ksys_write+0x10/0x10
> > [  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
> > [  +0.000004]  ? _raw_spin_lock+0x87/0xe0
> > [  +0.000005]  do_syscall_64+0x80/0x170
> > [  +0.000007]  ? _raw_spin_lock+0x87/0xe0
> > [  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
> > [  +0.000003]  ? file_close_fd_locked+0x167/0x230
> > [  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
> > [  +0.000005]  ? do_syscall_64+0x8c/0x170
> > [  +0.000004]  ? do_syscall_64+0x8c/0x170
> > [  +0.000003]  ? do_syscall_64+0x8c/0x170
> > [  +0.000003]  ? fput+0x1a/0x2c0
> > [  +0.000004]  ? filp_close+0x19/0x30
> > [  +0.000004]  ? do_dup2+0x25a/0x4c0
> > [  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
> > [  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
> > [  +0.000004]  ? do_syscall_64+0x8c/0x170
> > [  +0.000003]  ? __count_memcg_events+0x113/0x380
> > [  +0.000005]  ? handle_mm_fault+0x136/0x820
> > [  +0.000005]  ? do_user_addr_fault+0x444/0xa80
> > [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> > [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> > [  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  +0.000005] RIP: 0033:0x7f2033593154
> > 
> > Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
> 
> Shouldn't you include:
> Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi")
> 
> as we were iterating over XDP rings that were attached to q_vectors from
> the very beginning?
>

I probably should have done this.
 
> > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
> > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
> >  drivers/net/ethernet/intel/ice/ice_lib.c  | 129 ++++++----------------
> >  drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
> >  drivers/net/ethernet/intel/ice/ice_main.c |  17 ++-
> >  4 files changed, 49 insertions(+), 118 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> > index f448d3a84564..c158749a80e0 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_base.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> > @@ -190,16 +190,11 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
> >  	}
> >  	q_vector = vsi->q_vectors[v_idx];
> >  
> > -	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
> > -		ice_queue_set_napi(vsi, tx_ring->q_index, NETDEV_QUEUE_TYPE_TX,
> > -				   NULL);
> > +	ice_for_each_tx_ring(tx_ring, vsi->q_vectors[v_idx]->tx)
> >  		tx_ring->q_vector = NULL;
> > -	}
> > -	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
> > -		ice_queue_set_napi(vsi, rx_ring->q_index, NETDEV_QUEUE_TYPE_RX,
> > -				   NULL);
> > +
> > +	ice_for_each_rx_ring(rx_ring, vsi->q_vectors[v_idx]->rx)
> >  		rx_ring->q_vector = NULL;
> > -	}
> >  
> >  	/* only VSI with an associated netdev is set up with NAPI */
> >  	if (vsi->netdev)
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> > index 03c4df4ed585..5f2ddcaf7031 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > @@ -2286,9 +2286,6 @@ static int ice_vsi_cfg_def(struct ice_vsi *vsi)
> >  
> >  		ice_vsi_map_rings_to_vectors(vsi);
> >  
> > -		/* Associate q_vector rings to napi */
> > -		ice_vsi_set_napi_queues(vsi);
> > -
> >  		vsi->stat_offsets_loaded = false;
> >  
> >  		/* ICE_VSI_CTRL does not need RSS so skip RSS processing */
> > @@ -2621,6 +2618,7 @@ void ice_vsi_close(struct ice_vsi *vsi)
> >  	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
> >  		ice_down(vsi);
> >  
> > +	ice_vsi_clear_napi_queues(vsi);
> >  	ice_vsi_free_irq(vsi);
> >  	ice_vsi_free_tx_rings(vsi);
> >  	ice_vsi_free_rx_rings(vsi);
> > @@ -2687,120 +2685,55 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> >  }
> >  
> >  /**
> > - * __ice_queue_set_napi - Set the napi instance for the queue
> > - * @dev: device to which NAPI and queue belong
> > - * @queue_index: Index of queue
> > - * @type: queue type as RX or TX
> > - * @napi: NAPI context
> > - * @locked: is the rtnl_lock already held
> > - *
> > - * Set the napi instance for the queue. Caller indicates the lock status.
> > - */
> > -static void
> > -__ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
> > -		     enum netdev_queue_type type, struct napi_struct *napi,
> > -		     bool locked)
> > -{
> > -	if (!locked)
> > -		rtnl_lock();
> > -	netif_queue_set_napi(dev, queue_index, type, napi);
> > -	if (!locked)
> > -		rtnl_unlock();
> > -}
> > -
> > -/**
> > - * ice_queue_set_napi - Set the napi instance for the queue
> > - * @vsi: VSI being configured
> > - * @queue_index: Index of queue
> > - * @type: queue type as RX or TX
> > - * @napi: NAPI context
> > + * ice_vsi_set_napi_queues
> > + * @vsi: VSI pointer
> >   *
> > - * Set the napi instance for the queue. The rtnl lock state is derived from the
> > - * execution path.
> > + * Associate queue[s] with napi for all vectors.
> > + * The caller must hold rtnl_lock.
> >   */
> > -void
> > -ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
> > -		   enum netdev_queue_type type, struct napi_struct *napi)
> > +void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
> 
> this appears to be called only in ice_main.c. It should be moved there and
> made a static function instead of having it in ice_lib.c.
> 
> Unless I overlooked something...
>

You are not missing anything, but I think there is more value in keeping the 
set-clear functions together and ice_lib.c is a good place for that.

> >  {
> > -	struct ice_pf *pf = vsi->back;
> > +	struct net_device *netdev = vsi->netdev;
> > +	int q_idx, v_idx;
> >  
> > -	if (!vsi->netdev)
> > +	if (!netdev)
> >  		return;
> >  
> > -	if (current_work() == &pf->serv_task ||
> > -	    test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
> > -	    test_bit(ICE_DOWN, pf->state) ||
> > -	    test_bit(ICE_SUSPENDED, pf->state))
> > -		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
> > -				     false);
> > -	else
> > -		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
> > -				     true);
> > -}
> > +	ice_for_each_rxq(vsi, q_idx)
> > +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX,
> > +				     &vsi->rx_rings[q_idx]->q_vector->napi);
> >  
> > -/**
> > - * __ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
> > - * @q_vector: q_vector pointer
> > - * @locked: is the rtnl_lock already held
> > - *
> > - * Associate the q_vector napi with all the queue[s] on the vector.
> > - * Caller indicates the lock status.
> > - */
> > -void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
> > -{
> > -	struct ice_rx_ring *rx_ring;
> > -	struct ice_tx_ring *tx_ring;
> > -
> > -	ice_for_each_rx_ring(rx_ring, q_vector->rx)
> > -		__ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
> > -				     NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
> > -				     locked);
> > -
> > -	ice_for_each_tx_ring(tx_ring, q_vector->tx)
> > -		__ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
> > -				     NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
> > -				     locked);
> > +	ice_for_each_txq(vsi, q_idx)
> > +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX,
> > +				     &vsi->tx_rings[q_idx]->q_vector->napi);
> >  	/* Also set the interrupt number for the NAPI */
> > -	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
> > -}
> > -
> > -/**
> > - * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
> > - * @q_vector: q_vector pointer
> > - *
> > - * Associate the q_vector napi with all the queue[s] on the vector
> > - */
> > -void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector)
> > -{
> > -	struct ice_rx_ring *rx_ring;
> > -	struct ice_tx_ring *tx_ring;
> > -
> > -	ice_for_each_rx_ring(rx_ring, q_vector->rx)
> > -		ice_queue_set_napi(q_vector->vsi, rx_ring->q_index,
> > -				   NETDEV_QUEUE_TYPE_RX, &q_vector->napi);
> > +	ice_for_each_q_vector(vsi, v_idx) {
> > +		struct ice_q_vector *q_vector = vsi->q_vectors[v_idx];
> >  
> > -	ice_for_each_tx_ring(tx_ring, q_vector->tx)
> > -		ice_queue_set_napi(q_vector->vsi, tx_ring->q_index,
> > -				   NETDEV_QUEUE_TYPE_TX, &q_vector->napi);
> > -	/* Also set the interrupt number for the NAPI */
> > -	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
> > +		netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
> > +	}
> >  }
> >  
> >  /**
> > - * ice_vsi_set_napi_queues
> > + * ice_vsi_clear_napi_queues
> >   * @vsi: VSI pointer
> >   *
> > - * Associate queue[s] with napi for all vectors
> > + * Clear the association between all VSI queues queue[s] and napi.
> > + * The caller must hold rtnl_lock.
> >   */
> > -void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
> > +void ice_vsi_clear_napi_queues(struct ice_vsi *vsi)
> >  {
> > -	int i;
> > +	struct net_device *netdev = vsi->netdev;
> > +	int q_idx;
> >  
> > -	if (!vsi->netdev)
> > +	if (!netdev)
> >  		return;
> >  
> > -	ice_for_each_q_vector(vsi, i)
> > -		ice_q_vector_set_napi_queues(vsi->q_vectors[i]);
> > +	ice_for_each_txq(vsi, q_idx)
> > +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX, NULL);
> > +
> > +	ice_for_each_rxq(vsi, q_idx)
> > +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX, NULL);
> >  }
> >  
> >  /**
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
> > index 94ce8964dda6..36d86535695d 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.h
> > @@ -44,16 +44,10 @@ void ice_vsi_cfg_netdev_tc(struct ice_vsi *vsi, u8 ena_tc);
> >  struct ice_vsi *
> >  ice_vsi_setup(struct ice_pf *pf, struct ice_vsi_cfg_params *params);
> >  
> > -void
> > -ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
> > -		   enum netdev_queue_type type, struct napi_struct *napi);
> > -
> > -void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
> > -
> > -void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector);
> > -
> >  void ice_vsi_set_napi_queues(struct ice_vsi *vsi);
> >  
> > +void ice_vsi_clear_napi_queues(struct ice_vsi *vsi);
> > +
> >  int ice_vsi_release(struct ice_vsi *vsi);
> >  
> >  void ice_vsi_close(struct ice_vsi *vsi);
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index 66820ed5e969..2d286a4609a5 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -3537,11 +3537,9 @@ static void ice_napi_add(struct ice_vsi *vsi)
> >  	if (!vsi->netdev)
> >  		return;
> >  
> > -	ice_for_each_q_vector(vsi, v_idx) {
> > +	ice_for_each_q_vector(vsi, v_idx)
> >  		netif_napi_add(vsi->netdev, &vsi->q_vectors[v_idx]->napi,
> >  			       ice_napi_poll);
> > -		__ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
> > -	}
> >  }
> >  
> >  /**
> > @@ -5519,7 +5517,9 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
> >  		if (ret)
> >  			goto err_reinit;
> >  		ice_vsi_map_rings_to_vectors(pf->vsi[v]);
> > +		rtnl_lock();
> >  		ice_vsi_set_napi_queues(pf->vsi[v]);
> > +		rtnl_unlock();
> >  	}
> >  
> >  	ret = ice_req_irq_msix_misc(pf);
> > @@ -5533,8 +5533,12 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
> >  
> >  err_reinit:
> >  	while (v--)
> > -		if (pf->vsi[v])
> > +		if (pf->vsi[v]) {
> > +			rtnl_lock();
> > +			ice_vsi_clear_napi_queues(pf->vsi[v]);
> > +			rtnl_unlock();
> >  			ice_vsi_free_q_vectors(pf->vsi[v]);
> > +		}
> >  
> >  	return ret;
> >  }
> > @@ -5599,6 +5603,9 @@ static int ice_suspend(struct device *dev)
> >  	ice_for_each_vsi(pf, v) {
> >  		if (!pf->vsi[v])
> >  			continue;
> > +		rtnl_lock();
> > +		ice_vsi_clear_napi_queues(pf->vsi[v]);
> > +		rtnl_unlock();
> >  		ice_vsi_free_q_vectors(pf->vsi[v]);
> >  	}
> >  	ice_clear_interrupt_scheme(pf);
> > @@ -7434,6 +7441,8 @@ int ice_vsi_open(struct ice_vsi *vsi)
> >  		err = netif_set_real_num_rx_queues(vsi->netdev, vsi->num_rxq);
> >  		if (err)
> >  			goto err_set_qs;
> > +
> > +		ice_vsi_set_napi_queues(vsi);
> >  	}
> >  
> >  	err = ice_up_complete(vsi);
> > -- 
> > 2.43.0
> > 


* Re: [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-08-20 12:47     ` Larysa Zaremba
@ 2024-08-20 13:26       ` Maciej Fijalkowski
  2024-08-21 21:20         ` Tony Nguyen
  0 siblings, 1 reply; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-20 13:26 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Tue, Aug 20, 2024 at 02:47:31PM +0200, Larysa Zaremba wrote:
> On Tue, Aug 20, 2024 at 02:31:51PM +0200, Maciej Fijalkowski wrote:
> > On Mon, Aug 19, 2024 at 12:05:38PM +0200, Larysa Zaremba wrote:
> > > Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is
> > > not rtnl-locked when called from the reset. This creates the need to take
> > > the rtnl_lock just for a single function and complicates the
> > > > synchronization with .ndo_bpf. At the same time, there is no actual need to
> > > fill napi-to-queue information at this exact point.
> > > 
> > > Fill napi-to-queue information when opening the VSI and clear it when the
> > > VSI is being closed. Those routines are already rtnl-locked.
> > > 
> > > Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
> > > XDP queues, as this leads to out-of-bounds writes, such as one below.
> > > 
> > > [  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
> > > [  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
> > > [  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
> > > [  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> > > [  +0.000003] Call Trace:
> > > [  +0.000003]  <TASK>
> > > [  +0.000002]  dump_stack_lvl+0x60/0x80
> > > [  +0.000007]  print_report+0xce/0x630
> > > [  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > [  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
> > > [  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
> > > [  +0.000003]  kasan_report+0xe9/0x120
> > > [  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
> > > [  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
> > > [  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
> > > [  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
> > > [  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
> > > [  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
> > > [  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
> > > [  +0.000006]  pci_reset_function+0x12d/0x230
> > > [  +0.000004]  reset_store+0xa0/0x100
> > > [  +0.000006]  ? __pfx_reset_store+0x10/0x10
> > > [  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
> > > [  +0.000004]  ? __check_object_size+0x4c1/0x640
> > > [  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
> > > [  +0.000006]  vfs_write+0x5d6/0xdf0
> > > [  +0.000005]  ? fd_install+0x180/0x350
> > > > [  +0.000004]  ? __pfx_vfs_write+0x10/0x10
> > > [  +0.000004]  ? do_fcntl+0x52c/0xcd0
> > > [  +0.000004]  ? kasan_save_track+0x13/0x60
> > > [  +0.000003]  ? kasan_save_free_info+0x37/0x60
> > > [  +0.000006]  ksys_write+0xfa/0x1d0
> > > [  +0.000003]  ? __pfx_ksys_write+0x10/0x10
> > > [  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
> > > [  +0.000004]  ? _raw_spin_lock+0x87/0xe0
> > > [  +0.000005]  do_syscall_64+0x80/0x170
> > > [  +0.000007]  ? _raw_spin_lock+0x87/0xe0
> > > [  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
> > > [  +0.000003]  ? file_close_fd_locked+0x167/0x230
> > > [  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
> > > [  +0.000005]  ? do_syscall_64+0x8c/0x170
> > > [  +0.000004]  ? do_syscall_64+0x8c/0x170
> > > [  +0.000003]  ? do_syscall_64+0x8c/0x170
> > > [  +0.000003]  ? fput+0x1a/0x2c0
> > > [  +0.000004]  ? filp_close+0x19/0x30
> > > [  +0.000004]  ? do_dup2+0x25a/0x4c0
> > > [  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
> > > [  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
> > > [  +0.000004]  ? do_syscall_64+0x8c/0x170
> > > [  +0.000003]  ? __count_memcg_events+0x113/0x380
> > > [  +0.000005]  ? handle_mm_fault+0x136/0x820
> > > [  +0.000005]  ? do_user_addr_fault+0x444/0xa80
> > > [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> > > [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> > > [  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [  +0.000005] RIP: 0033:0x7f2033593154
> > > 
> > > Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
> > 
> > Shouldn't you include:
> > Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi")
> > 
> > as we were iterating over XDP rings that were attached to q_vectors from
> > the very beginning?
> >
> 
> I probably should have done this.

Maybe Tony could add this when applying the patch or sending the pull
request. I'll go through the rest of the set today to see if I have any
comments; if nothing is outstanding, it won't make sense to send another
revision just to fix the Fixes tags.

>  
> > > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > > Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
> > > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > ---
> > >  drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
> > >  drivers/net/ethernet/intel/ice/ice_lib.c  | 129 ++++++----------------
> > >  drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
> > >  drivers/net/ethernet/intel/ice/ice_main.c |  17 ++-
> > >  4 files changed, 49 insertions(+), 118 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> > > index f448d3a84564..c158749a80e0 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_base.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> > > @@ -190,16 +190,11 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
> > >  	}
> > >  	q_vector = vsi->q_vectors[v_idx];
> > >  
> > > -	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
> > > -		ice_queue_set_napi(vsi, tx_ring->q_index, NETDEV_QUEUE_TYPE_TX,
> > > -				   NULL);
> > > +	ice_for_each_tx_ring(tx_ring, vsi->q_vectors[v_idx]->tx)
> > >  		tx_ring->q_vector = NULL;
> > > -	}
> > > -	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
> > > -		ice_queue_set_napi(vsi, rx_ring->q_index, NETDEV_QUEUE_TYPE_RX,
> > > -				   NULL);
> > > +
> > > +	ice_for_each_rx_ring(rx_ring, vsi->q_vectors[v_idx]->rx)
> > >  		rx_ring->q_vector = NULL;
> > > -	}
> > >  
> > >  	/* only VSI with an associated netdev is set up with NAPI */
> > >  	if (vsi->netdev)
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> > > index 03c4df4ed585..5f2ddcaf7031 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > > @@ -2286,9 +2286,6 @@ static int ice_vsi_cfg_def(struct ice_vsi *vsi)
> > >  
> > >  		ice_vsi_map_rings_to_vectors(vsi);
> > >  
> > > -		/* Associate q_vector rings to napi */
> > > -		ice_vsi_set_napi_queues(vsi);
> > > -
> > >  		vsi->stat_offsets_loaded = false;
> > >  
> > >  		/* ICE_VSI_CTRL does not need RSS so skip RSS processing */
> > > @@ -2621,6 +2618,7 @@ void ice_vsi_close(struct ice_vsi *vsi)
> > >  	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
> > >  		ice_down(vsi);
> > >  
> > > +	ice_vsi_clear_napi_queues(vsi);
> > >  	ice_vsi_free_irq(vsi);
> > >  	ice_vsi_free_tx_rings(vsi);
> > >  	ice_vsi_free_rx_rings(vsi);
> > > @@ -2687,120 +2685,55 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> > >  }
> > >  
> > >  /**
> > > - * __ice_queue_set_napi - Set the napi instance for the queue
> > > - * @dev: device to which NAPI and queue belong
> > > - * @queue_index: Index of queue
> > > - * @type: queue type as RX or TX
> > > - * @napi: NAPI context
> > > - * @locked: is the rtnl_lock already held
> > > - *
> > > - * Set the napi instance for the queue. Caller indicates the lock status.
> > > - */
> > > -static void
> > > -__ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
> > > -		     enum netdev_queue_type type, struct napi_struct *napi,
> > > -		     bool locked)
> > > -{
> > > -	if (!locked)
> > > -		rtnl_lock();
> > > -	netif_queue_set_napi(dev, queue_index, type, napi);
> > > -	if (!locked)
> > > -		rtnl_unlock();
> > > -}
> > > -
> > > -/**
> > > - * ice_queue_set_napi - Set the napi instance for the queue
> > > - * @vsi: VSI being configured
> > > - * @queue_index: Index of queue
> > > - * @type: queue type as RX or TX
> > > - * @napi: NAPI context
> > > + * ice_vsi_set_napi_queues
> > > + * @vsi: VSI pointer
> > >   *
> > > - * Set the napi instance for the queue. The rtnl lock state is derived from the
> > > - * execution path.
> > > + * Associate queue[s] with napi for all vectors.
> > > + * The caller must hold rtnl_lock.
> > >   */
> > > -void
> > > -ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
> > > -		   enum netdev_queue_type type, struct napi_struct *napi)
> > > +void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
> > 
> > this appears to be called only in ice_main.c. It should be moved there and
> > made a static function instead of having it in ice_lib.c.
> > 
> > Unless I overlooked something...
> >
> 
> You are not missing anything, but I think there is more value in keeping the 
> set-clear functions together and ice_lib.c is a good place for that.

Yeah I realized after sending the comment that clear func indeed has to be
exported so I'd say we can live with that.

> 
> > >  {
> > > -	struct ice_pf *pf = vsi->back;
> > > +	struct net_device *netdev = vsi->netdev;
> > > +	int q_idx, v_idx;
> > >  
> > > -	if (!vsi->netdev)
> > > +	if (!netdev)
> > >  		return;
> > >  
> > > -	if (current_work() == &pf->serv_task ||
> > > -	    test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
> > > -	    test_bit(ICE_DOWN, pf->state) ||
> > > -	    test_bit(ICE_SUSPENDED, pf->state))
> > > -		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
> > > -				     false);
> > > -	else
> > > -		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
> > > -				     true);
> > > -}
> > > +	ice_for_each_rxq(vsi, q_idx)
> > > +		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX,
> > > +				     &vsi->rx_rings[q_idx]->q_vector->napi);
> > >  
> > > -/**

(...)
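
For readers following the out-of-bounds report quoted above, here is a minimal
userspace sketch of why walking queue indices is safer than walking every ring
attached to a vector. It is not driver code: the struct names, NUM_TXQ,
NUM_XDP_TXQ and the bounds-checked setter are all hypothetical, and it assumes
that XDP rings carry queue indices beyond the range the stack knows about.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TXQ     4	/* hypothetical: Tx queues the stack knows about */
#define NUM_XDP_TXQ 4	/* hypothetical: driver-private XDP Tx queues */

struct napi_model { int id; };

struct ring_model {
	unsigned int q_index;	/* assumption: XDP rings sit at NUM_TXQ and above */
	struct napi_model *napi;
};

/* Stand-in for the netdev queue table: it has NUM_TXQ slots and no more. */
struct netdev_model {
	struct napi_model *txq_napi[NUM_TXQ];
};

/* Bounds-checked setter; returns -1 where an unchecked write would overflow. */
int queue_set_napi(struct netdev_model *dev, unsigned int idx,
		   struct napi_model *napi)
{
	if (idx >= NUM_TXQ)
		return -1;
	dev->txq_napi[idx] = napi;
	return 0;
}

/* Walk every ring hanging off the vectors, XDP rings included. */
int assign_by_ring_list(struct netdev_model *dev,
			struct ring_model *rings, size_t n)
{
	int rejected = 0;

	for (size_t i = 0; i < n; i++)
		if (queue_set_napi(dev, rings[i].q_index, rings[i].napi))
			rejected++;
	return rejected;
}

/* Walk queue indices 0..NUM_TXQ-1 only, so XDP rings are never visited. */
int assign_by_queue_index(struct netdev_model *dev, struct ring_model *rings)
{
	int rejected = 0;

	for (unsigned int q = 0; q < NUM_TXQ; q++)
		if (queue_set_napi(dev, q, rings[q].napi))
			rejected++;
	return rejected;
}

/* Build regular rings followed by XDP rings and count rejected writes. */
int demo_rejected(int by_queue_index)
{
	struct napi_model napi = { .id = 1 };
	struct ring_model rings[NUM_TXQ + NUM_XDP_TXQ];
	struct netdev_model dev = { { NULL } };
	int rejected;

	for (unsigned int i = 0; i < NUM_TXQ + NUM_XDP_TXQ; i++)
		rings[i] = (struct ring_model){ .q_index = i, .napi = &napi };

	if (by_queue_index)
		rejected = assign_by_queue_index(&dev, rings);
	else
		rejected = assign_by_ring_list(&dev, rings,
					       NUM_TXQ + NUM_XDP_TXQ);

	/* Regular queues are assigned in both variants. */
	assert(dev.txq_napi[NUM_TXQ - 1] == &napi);
	return rejected;
}
```

In this model, demo_rejected(0) counts one rejected write per XDP ring, while
demo_rejected(1) never touches an index outside the table. The real
netif_queue_set_napi() is not modelled here beyond the idea of a fixed-size
per-queue table.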


* Re: [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-08-20 13:26       ` Maciej Fijalkowski
@ 2024-08-21 21:20         ` Tony Nguyen
  0 siblings, 0 replies; 21+ messages in thread
From: Tony Nguyen @ 2024-08-21 21:20 UTC (permalink / raw)
  To: Maciej Fijalkowski, Larysa Zaremba
  Cc: intel-wired-lan, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, linux-kernel, bpf,
	magnus.karlsson, Michal Kubiak, Wojciech Drewek, Amritha Nambiar,
	Chandan Kumar Rout

On 8/20/2024 6:26 AM, Maciej Fijalkowski wrote:

...

>>>> Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
>>>
>>> Shouldn't you include:
>>> Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi")
>>>
>>> as we were iterating over XDP rings that were attached to q_vectors from
>>> the very beginning?
>>>
>>
>> I probably should have done this.
> 
> Maybe Tony could add this when applying the patch or sending the pull
> request. I'll go through the rest of the set today to see if I have any
> comments; if nothing is outstanding, it won't make sense to send another
> revision just to fix the Fixes tags.

Yes, I can add that in.

Thanks,
Tony


* Re: [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-08-19 10:05 ` [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
@ 2024-08-22 11:34   ` Maciej Fijalkowski
  2024-08-22 12:56     ` Larysa Zaremba
  0 siblings, 1 reply; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-22 11:34 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Mon, Aug 19, 2024 at 12:05:41PM +0200, Larysa Zaremba wrote:
> Consider the following scenario:
> 
> .ndo_bpf()		| ice_prepare_for_reset()		|
> ________________________|_______________________________________|
> rtnl_lock()		|					|
> ice_down()		|					|
> 			| test_bit(ICE_VSI_DOWN) - true		|
> 			| ice_dis_vsi() returns			|
> ice_up()		|					|
> 			| proceeds to rebuild a running VSI	|
> 
> .ndo_bpf() is not the only rtnl-locked callback that toggles the interface
> to apply new configuration. Another example is .set_channels().
> 
> To avoid the race condition above, act only after reading ICE_VSI_DOWN
> under rtnl_lock.
> 
> Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index b72338974a60..94029e446b99 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2665,8 +2665,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
>   */
>  void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
>  {
> -	if (test_bit(ICE_VSI_DOWN, vsi->state))
> -		return;
> +	bool already_down = test_bit(ICE_VSI_DOWN, vsi->state);
>  
>  	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
>  
> @@ -2674,15 +2673,16 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
>  		if (netif_running(vsi->netdev)) {
>  			if (!locked)
>  				rtnl_lock();
> -
> -			ice_vsi_close(vsi);
> +			already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> +			if (!already_down)
> +				ice_vsi_close(vsi);

Ehh, sorry for being a sloppy reviewer. We are still testing ICE_VSI_DOWN in
ice_vsi_close(). Wouldn't all of this be cleaner if we bailed out of the
called function when the bit was already set?

>  
>  			if (!locked)
>  				rtnl_unlock();
> -		} else {
> +		} else if (!already_down) {
>  			ice_vsi_close(vsi);
>  		}
> -	} else if (vsi->type == ICE_VSI_CTRL) {
> +	} else if (vsi->type == ICE_VSI_CTRL && !already_down) {
>  		ice_vsi_close(vsi);
>  	}
>  }
> -- 
> 2.43.0
> 
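
The race in the table above comes down to acting on a flag that was read
without the lock. The following userspace sketch replays that interleaving in
a single thread, with a pthread mutex standing in for rtnl_lock and a plain
bool standing in for ICE_VSI_DOWN. All names are hypothetical and this is not
the driver's implementation; it only illustrates why the check and the close
belong in the same locked section, wherever the check ends up living.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct vsi_model {
	pthread_mutex_t cfg_lock;	/* stand-in for rtnl_lock */
	bool down;			/* stand-in for ICE_VSI_DOWN */
	int close_calls;
};

/* A callback such as .ndo_bpf() toggles the interface under the lock. */
void reconfig_begin(struct vsi_model *vsi)
{
	pthread_mutex_lock(&vsi->cfg_lock);
	vsi->down = true;
}

void reconfig_end(struct vsi_model *vsi)
{
	vsi->down = false;
	pthread_mutex_unlock(&vsi->cfg_lock);
}

/* Unlocked read: only a snapshot, the flag may flip right afterwards. */
bool peek_down_unlocked(const struct vsi_model *vsi)
{
	return vsi->down;
}

/* Check and act under the lock; returns true if this call closed the VSI. */
bool dis_vsi_checked(struct vsi_model *vsi)
{
	bool closed = false;

	pthread_mutex_lock(&vsi->cfg_lock);
	if (!vsi->down) {
		vsi->down = true;
		vsi->close_calls++;
		closed = true;
	}
	pthread_mutex_unlock(&vsi->cfg_lock);
	return closed;
}

/* Replays the interleaving from the table; returns the number of closes. */
int demo_replay(void)
{
	struct vsi_model vsi = { .cfg_lock = PTHREAD_MUTEX_INITIALIZER };
	bool stale, up_again, first, second;

	reconfig_begin(&vsi);
	stale = peek_down_unlocked(&vsi);	/* reset path sees "down" */
	reconfig_end(&vsi);			/* interface is up again */
	up_again = !vsi.down;

	first = dis_vsi_checked(&vsi);		/* locked check closes it */
	second = dis_vsi_checked(&vsi);		/* already down: no-op */

	assert(stale && up_again);		/* the snapshot went stale */
	assert(first && !second);
	return vsi.close_calls;
}
```

In the replay, the unlocked snapshot claims the VSI is down although it is
running again by the time the reset path would act on it, whereas the locked
check closes it exactly once.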


* Re: [PATCH iwl-net v3 6/6] ice: do not bring the VSI up, if it was down before the XDP setup
  2024-08-19 10:05 ` [PATCH iwl-net v3 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
@ 2024-08-22 11:35   ` Maciej Fijalkowski
  0 siblings, 0 replies; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-22 11:35 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Mon, Aug 19, 2024 at 12:05:43PM +0200, Larysa Zaremba wrote:
> After XDP configuration is completed, we bring the interface up
> unconditionally, regardless of its state before the call to .ndo_bpf().
> 
> Preserve the information whether the interface had to be brought down and
> later bring it up only in such case.
> 
> Fixes: efc2214b6047 ("ice: Add support for XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index a718763d2370..d3277d5d3bd2 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -2984,8 +2984,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>  		   struct netlink_ext_ack *extack)
>  {
>  	unsigned int frame_size = vsi->netdev->mtu + ICE_ETH_PKT_HDR_PAD;
> -	bool if_running = netif_running(vsi->netdev);
>  	int ret = 0, xdp_ring_err = 0;
> +	bool if_running;
>  
>  	if (prog && !prog->aux->xdp_has_frags) {
>  		if (frame_size > ice_max_xdp_frame_size(vsi)) {
> @@ -3002,8 +3002,11 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>  		return 0;
>  	}
>  
> +	if_running = netif_running(vsi->netdev) &&
> +		     !test_and_set_bit(ICE_VSI_DOWN, vsi->state);
> +
>  	/* need to stop netdev while setting up the program for Rx rings */
> -	if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) {
> +	if (if_running) {
>  		ret = ice_down(vsi);
>  		if (ret) {
>  			NL_SET_ERR_MSG_MOD(extack, "Preparing device for XDP attach failed");
> -- 
> 2.43.0
> 
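
As a rough illustration of the idea in this patch, the sketch below remembers
whether the setup call itself took the interface down and restores it only in
that case. It is a userspace model using C11 atomics: VSI_DOWN, struct
vsi_model and the local test_and_set() helper are hypothetical stand-ins, with
test_and_set() mimicking the return convention of the kernel's
test_and_set_bit() (it reports the previous value of the bit).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define VSI_DOWN 0x1u	/* hypothetical bit, stand-in for ICE_VSI_DOWN */

struct vsi_model {
	bool netdev_running;
	atomic_uint state;
	int up_calls;
};

/* Set the bit and report whether it was already set. */
static bool test_and_set(atomic_uint *state, unsigned int bit)
{
	return (atomic_fetch_or(state, bit) & bit) != 0;
}

void xdp_setup(struct vsi_model *vsi)
{
	/* True only if this call is the one that takes the interface down. */
	bool if_running = vsi->netdev_running &&
			  !test_and_set(&vsi->state, VSI_DOWN);

	/* ... the program swap would happen here ... */

	if (if_running) {
		atomic_fetch_and(&vsi->state, ~VSI_DOWN);
		vsi->up_calls++;
	}
}

/* Run one setup from the given starting point; returns how often we went up. */
int demo_up_calls(bool running, unsigned int initial_state)
{
	struct vsi_model vsi = { .netdev_running = running };

	atomic_init(&vsi.state, initial_state);
	xdp_setup(&vsi);

	/* The down bit must end up exactly as it started. */
	assert((atomic_load(&vsi.state) & VSI_DOWN) ==
	       (initial_state & VSI_DOWN));
	return vsi.up_calls;
}
```

Because of the short-circuit in the if_running expression, the bit is not
touched at all when the netdev is not running, and a VSI that was already down
is left down after the setup.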


* Re: [PATCH iwl-net v3 3/6] ice: check for XDP rings instead of bpf program when unconfiguring
  2024-08-19 10:05 ` [PATCH iwl-net v3 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
@ 2024-08-22 11:36   ` Maciej Fijalkowski
  0 siblings, 0 replies; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-22 11:36 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Mon, Aug 19, 2024 at 12:05:40PM +0200, Larysa Zaremba wrote:
> If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
> VSI without applying new ring configuration. When unconfiguring the VSI, we
> can encounter the state in which there is an XDP program but no XDP rings
> to destroy or there will be XDP rings that need to be destroyed, but no XDP
> program to indicate their presence.
> 
> When unconfiguring, rely on the presence of XDP rings rather than XDP
> program, as they better represent the current state that has to be
> destroyed.
> 
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c  | 4 ++--
>  drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
>  drivers/net/ethernet/intel/ice/ice_xsk.c  | 6 +++---
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index a8721ecdf2cd..b72338974a60 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2419,7 +2419,7 @@ void ice_vsi_decfg(struct ice_vsi *vsi)
>  		dev_err(ice_pf_to_dev(pf), "Failed to remove RDMA scheduler config for VSI %u, err %d\n",
>  			vsi->vsi_num, err);
>  
> -	if (ice_is_xdp_ena_vsi(vsi))
> +	if (vsi->xdp_rings)
>  		/* return value check can be skipped here, it always returns
>  		 * 0 if reset is in progress
>  		 */
> @@ -2521,7 +2521,7 @@ static void ice_vsi_release_msix(struct ice_vsi *vsi)
>  		for (q = 0; q < q_vector->num_ring_tx; q++) {
>  			ice_write_itr(&q_vector->tx, 0);
>  			wr32(hw, QINT_TQCTL(vsi->txq_map[txq]), 0);
> -			if (ice_is_xdp_ena_vsi(vsi)) {
> +			if (vsi->xdp_rings) {
>  				u32 xdp_txq = txq + vsi->num_xdp_txq;
>  
>  				wr32(hw, QINT_TQCTL(vsi->txq_map[xdp_txq]), 0);
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index e92f43850671..a718763d2370 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -7228,7 +7228,7 @@ int ice_down(struct ice_vsi *vsi)
>  	if (tx_err)
>  		netdev_err(vsi->netdev, "Failed stop Tx rings, VSI %d error %d\n",
>  			   vsi->vsi_num, tx_err);
> -	if (!tx_err && ice_is_xdp_ena_vsi(vsi)) {
> +	if (!tx_err && vsi->xdp_rings) {
>  		tx_err = ice_vsi_stop_xdp_tx_rings(vsi);
>  		if (tx_err)
>  			netdev_err(vsi->netdev, "Failed stop XDP rings, VSI %d error %d\n",
> @@ -7245,7 +7245,7 @@ int ice_down(struct ice_vsi *vsi)
>  	ice_for_each_txq(vsi, i)
>  		ice_clean_tx_ring(vsi->tx_rings[i]);
>  
> -	if (ice_is_xdp_ena_vsi(vsi))
> +	if (vsi->xdp_rings)
>  		ice_for_each_xdp_txq(vsi, i)
>  			ice_clean_tx_ring(vsi->xdp_rings[i]);
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index a659951fa987..8693509efbe7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -39,7 +39,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
>  	       sizeof(vsi_stat->rx_ring_stats[q_idx]->rx_stats));
>  	memset(&vsi_stat->tx_ring_stats[q_idx]->stats, 0,
>  	       sizeof(vsi_stat->tx_ring_stats[q_idx]->stats));
> -	if (ice_is_xdp_ena_vsi(vsi))
> +	if (vsi->xdp_rings)
>  		memset(&vsi->xdp_rings[q_idx]->ring_stats->stats, 0,
>  		       sizeof(vsi->xdp_rings[q_idx]->ring_stats->stats));
>  }
> @@ -52,7 +52,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
>  static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx)
>  {
>  	ice_clean_tx_ring(vsi->tx_rings[q_idx]);
> -	if (ice_is_xdp_ena_vsi(vsi))
> +	if (vsi->xdp_rings)
>  		ice_clean_tx_ring(vsi->xdp_rings[q_idx]);
>  	ice_clean_rx_ring(vsi->rx_rings[q_idx]);
>  }
> @@ -194,7 +194,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, 0, tx_ring, &txq_meta);
>  	if (!fail)
>  		fail = err;
> -	if (ice_is_xdp_ena_vsi(vsi)) {
> +	if (vsi->xdp_rings) {
>  		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
>  
>  		memset(&txq_meta, 0, sizeof(txq_meta));
> -- 
> 2.43.0
> 
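
The reasoning in the commit message can be sketched in userspace as "tear down
what exists, not what the configuration says should exist". The model below is
hypothetical (struct vsi_model, its fields and the helper names are not the
driver's): xdp_prog stands for the attached program and xdp_rings for the
rings that are actually allocated, and the two are allowed to disagree, as
they can while a rebuild is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct vsi_model {
	bool xdp_prog;		/* intent: may change while a rebuild is pending */
	int *xdp_rings;		/* what is actually allocated right now */
	size_t num_xdp_rings;
};

/* Tear down based on what exists; returns the number of rings released. */
size_t destroy_xdp_rings(struct vsi_model *vsi)
{
	size_t released;

	if (!vsi->xdp_rings)
		return 0;

	released = vsi->num_xdp_rings;
	free(vsi->xdp_rings);
	vsi->xdp_rings = NULL;
	vsi->num_xdp_rings = 0;
	return released;
}

/* Program already detached, rings still allocated: they must be freed. */
size_t demo_rings_without_prog(void)
{
	struct vsi_model vsi = { .xdp_prog = false, .num_xdp_rings = 2 };
	size_t released;

	vsi.xdp_rings = calloc(vsi.num_xdp_rings, sizeof(*vsi.xdp_rings));
	if (!vsi.xdp_rings)
		return 0;

	released = destroy_xdp_rings(&vsi);
	assert(vsi.xdp_rings == NULL);
	return released;
}

/* Program attached, rings never configured: nothing to free. */
size_t demo_prog_without_rings(void)
{
	struct vsi_model vsi = { .xdp_prog = true };

	return destroy_xdp_rings(&vsi);
}
```

A teardown keyed on xdp_prog would leak the rings in the first case and touch
a NULL array in the second; keying on the ring pointer handles both.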


* Re: [PATCH iwl-net v3 2/6] ice: protect XDP configuration with a mutex
  2024-08-19 10:05 ` [PATCH iwl-net v3 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
@ 2024-08-22 11:39   ` Maciej Fijalkowski
  2024-08-22 13:05     ` Larysa Zaremba
  0 siblings, 1 reply; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-22 11:39 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Mon, Aug 19, 2024 at 12:05:39PM +0200, Larysa Zaremba wrote:
> The main threat to data consistency in ice_xdp() is a possible asynchronous
> PF reset. It can be triggered by a user or by TX timeout handler.
> 
> XDP setup and PF reset code access the same resources in the following
> sections:
> * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
> * ice_vsi_rebuild() for the PF VSI - not protected
> * ice_vsi_open() - already rtnl-locked
> 
> With an unfortunate timing, such accesses can result in a crash such as the
> one below:
> 
> [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
> [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
> [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
> [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
> [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
> [ +0.394718] ice 0000:b1:00.0: PTP reset successful
> [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
> [ +0.000045] #PF: supervisor read access in kernel mode
> [ +0.000023] #PF: error_code(0x0000) - not-present page
> [ +0.000023] PGD 0 P4D 0
> [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
> [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> [ +0.000036] Workqueue: ice ice_service_task [ice]
> [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
> [...]
> [ +0.000013] Call Trace:
> [ +0.000016] <TASK>
> [ +0.000014] ? __die+0x1f/0x70
> [ +0.000029] ? page_fault_oops+0x171/0x4f0
> [ +0.000029] ? schedule+0x3b/0xd0
> [ +0.000027] ? exc_page_fault+0x7b/0x180
> [ +0.000022] ? asm_exc_page_fault+0x22/0x30
> [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
> [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
> [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
> [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
> [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
> [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
> [ +0.000145] ice_rebuild+0x18c/0x840 [ice]
> [ +0.000145] ? delay_tsc+0x4a/0xc0
> [ +0.000022] ? delay_tsc+0x92/0xc0
> [ +0.000020] ice_do_reset+0x140/0x180 [ice]
> [ +0.000886] ice_service_task+0x404/0x1030 [ice]
> [ +0.000824] process_one_work+0x171/0x340
> [ +0.000685] worker_thread+0x277/0x3a0
> [ +0.000675] ? preempt_count_add+0x6a/0xa0
> [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
> [ +0.000679] ? __pfx_worker_thread+0x10/0x10
> [ +0.000653] kthread+0xf0/0x120
> [ +0.000635] ? __pfx_kthread+0x10/0x10
> [ +0.000616] ret_from_fork+0x2d/0x50
> [ +0.000612] ? __pfx_kthread+0x10/0x10
> [ +0.000604] ret_from_fork_asm+0x1b/0x30
> [ +0.000604] </TASK>
> 
> The previous way of handling this through returning -EBUSY is not viable,
> particularly when destroying AF_XDP socket, because the kernel proceeds
> with removal anyway.
> 
> There is plenty of code between those calls and there is no need to create
> a large critical section that covers all of them, same as there is no need
> to protect ice_vsi_rebuild() with rtnl_lock().
> 
> Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
> 
> Leaving unprotected sections in between would result in two states that
> have to be considered:
> 1. when the VSI is closed, but not yet rebuilt
> 2. when the VSI is already rebuilt, but not yet open
> 
> The latter case is actually already handled through !netif_running() case,
> we just need to adjust flag checking a little. The former one is not as
> trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
> hardware interaction happens, this can make adding/deleting rings exit
> with an error. Luckily, VSI rebuild is pending and can apply new
> configuration for us in a managed fashion.
> 
> Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
> indicate that ice_xdp() can just hot-swap the program.
> 
> Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
> consistent by deconfiguring VSI when coalesce allocation fails.
> 
> Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> Fixes: efc2214b6047 ("ice: Add support for XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h      |  2 ++
>  drivers/net/ethernet/intel/ice/ice_lib.c  | 34 ++++++++++++++---------
>  drivers/net/ethernet/intel/ice/ice_main.c | 19 +++++++++----
>  drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 +-
>  4 files changed, 39 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index caaa10157909..ce8b5505b16d 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -318,6 +318,7 @@ enum ice_vsi_state {
>  	ICE_VSI_UMAC_FLTR_CHANGED,
>  	ICE_VSI_MMAC_FLTR_CHANGED,
>  	ICE_VSI_PROMISC_CHANGED,
> +	ICE_VSI_REBUILD_PENDING,
>  	ICE_VSI_STATE_NBITS		/* must be last */
>  };
>  
> @@ -411,6 +412,7 @@ struct ice_vsi {
>  	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
>  	u16 num_xdp_txq;		 /* Used XDP queues */
>  	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
> +	struct mutex xdp_state_lock;
>  
>  	struct net_device **target_netdevs;
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index 5f2ddcaf7031..a8721ecdf2cd 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -447,6 +447,7 @@ static void ice_vsi_free(struct ice_vsi *vsi)
>  
>  	ice_vsi_free_stats(vsi);
>  	ice_vsi_free_arrays(vsi);
> +	mutex_destroy(&vsi->xdp_state_lock);
>  	mutex_unlock(&pf->sw_mutex);
>  	devm_kfree(dev, vsi);
>  }
> @@ -626,6 +627,8 @@ static struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf)
>  	pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
>  					 pf->next_vsi);
>  
> +	mutex_init(&vsi->xdp_state_lock);
> +
>  unlock_pf:
>  	mutex_unlock(&pf->sw_mutex);
>  	return vsi;
> @@ -2973,19 +2976,24 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
>  	if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
>  		return -EINVAL;
>  
> +	mutex_lock(&vsi->xdp_state_lock);
> +	clear_bit(ICE_VSI_REBUILD_PENDING, vsi->state);

I am not sure what the state of the interface would be if the rebuild did not
succeed, but it feels like clearing this bit should happen at the end of
rebuild, once we are sure it was successful?

> +
>  	ret = ice_vsi_realloc_stat_arrays(vsi);
>  	if (ret)
> -		goto err_vsi_cfg;
> +		goto unlock;
>  
>  	ice_vsi_decfg(vsi);
>  	ret = ice_vsi_cfg_def(vsi);
>  	if (ret)
> -		goto err_vsi_cfg;
> +		goto unlock;
>  
>  	coalesce = kcalloc(vsi->num_q_vectors,
>  			   sizeof(struct ice_coalesce_stored), GFP_KERNEL);
> -	if (!coalesce)
> -		return -ENOMEM;
> +	if (!coalesce) {
> +		ret = -ENOMEM;
> +		goto decfg;
> +	}
>  
>  	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
>  
> @@ -2993,22 +3001,22 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
>  	if (ret) {
>  		if (vsi_flags & ICE_VSI_FLAG_INIT) {
>  			ret = -EIO;
> -			goto err_vsi_cfg_tc_lan;
> +			goto free_coalesce;
>  		}
>  
> -		kfree(coalesce);
> -		return ice_schedule_reset(pf, ICE_RESET_PFR);
> +		ret = ice_schedule_reset(pf, ICE_RESET_PFR);
> +		goto free_coalesce;
>  	}
>  
>  	ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors);
> -	kfree(coalesce);
>  
> -	return 0;
> -
> -err_vsi_cfg_tc_lan:
> -	ice_vsi_decfg(vsi);
> +free_coalesce:
>  	kfree(coalesce);
> -err_vsi_cfg:
> +decfg:
> +	if (ret)
> +		ice_vsi_decfg(vsi);
> +unlock:
> +	mutex_unlock(&vsi->xdp_state_lock);
>  	return ret;
>  }
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 2d286a4609a5..e92f43850671 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -595,6 +595,7 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
>  	/* clear SW filtering DB */
>  	ice_clear_hw_tbls(hw);
>  	/* disable the VSIs and their queues that are not already DOWN */
> +	set_bit(ICE_VSI_REBUILD_PENDING, ice_get_main_vsi(pf)->state);
>  	ice_pf_dis_all_vsi(pf, false);
>  
>  	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
> @@ -2995,7 +2996,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>  	}
>  
>  	/* hot swap progs and avoid toggling link */
> -	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
> +	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
> +	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
>  		ice_vsi_assign_bpf_prog(vsi, prog);
>  		return 0;
>  	}
> @@ -3067,21 +3069,28 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>  {
>  	struct ice_netdev_priv *np = netdev_priv(dev);
>  	struct ice_vsi *vsi = np->vsi;
> +	int ret;
>  
>  	if (vsi->type != ICE_VSI_PF) {
>  		NL_SET_ERR_MSG_MOD(xdp->extack, "XDP can be loaded only on PF VSI");
>  		return -EINVAL;
>  	}
>  
> +	mutex_lock(&vsi->xdp_state_lock);
> +
>  	switch (xdp->command) {
>  	case XDP_SETUP_PROG:
> -		return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> +		ret = ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> +		break;
>  	case XDP_SETUP_XSK_POOL:
> -		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
> -					  xdp->xsk.queue_id);
> +		ret = ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id);
> +		break;
>  	default:
> -		return -EINVAL;
> +		ret = -EINVAL;
>  	}
> +
> +	mutex_unlock(&vsi->xdp_state_lock);
> +	return ret;
>  }
>  
>  /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index 240a7bec242b..a659951fa987 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -390,7 +390,8 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
>  		goto failure;
>  	}
>  
> -	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
> +	if_running = !test_bit(ICE_VSI_DOWN, vsi->state) &&
> +		     ice_is_xdp_ena_vsi(vsi);
>  
>  	if (if_running) {
>  		struct ice_rx_ring *rx_ring = vsi->rx_rings[qid];
> -- 
> 2.43.0
> 

* Re: [PATCH iwl-net v3 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-08-19 10:05 ` [PATCH iwl-net v3 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
@ 2024-08-22 11:43   ` Maciej Fijalkowski
  2024-08-22 13:07     ` Larysa Zaremba
  0 siblings, 1 reply; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-22 11:43 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Mon, Aug 19, 2024 at 12:05:42PM +0200, Larysa Zaremba wrote:
> Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
> because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
> state, not VSI one. Therefore it does not protect the queue pair from
> e.g. reset.
> 
> Despite being useless, it still can deadlock the unfortunate functions that
> have fallen into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena()
> returns an error.

I believe the last sentence is not valid after our recent fixes around xsk
and tx timeouts.

> 
> Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
> 
> Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> ---
>  drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index 8693509efbe7..5dee829bfc47 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -165,7 +165,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	struct ice_q_vector *q_vector;
>  	struct ice_tx_ring *tx_ring;
>  	struct ice_rx_ring *rx_ring;
> -	int timeout = 50;
>  	int fail = 0;
>  	int err;
>  
> @@ -176,13 +175,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	rx_ring = vsi->rx_rings[q_idx];
>  	q_vector = rx_ring->q_vector;
>  
> -	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
> -		timeout--;
> -		if (!timeout)
> -			return -EBUSY;
> -		usleep_range(1000, 2000);
> -	}
> -
>  	synchronize_net();
>  	netif_carrier_off(vsi->netdev);
>  	netif_tx_stop_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
> @@ -261,7 +253,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>  		netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
>  		netif_carrier_on(vsi->netdev);
>  	}
> -	clear_bit(ICE_CFG_BUSY, vsi->state);
>  
>  	return fail;
>  }
> -- 
> 2.43.0
> 

* Re: [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-08-22 11:34   ` Maciej Fijalkowski
@ 2024-08-22 12:56     ` Larysa Zaremba
  2024-08-22 14:42       ` Maciej Fijalkowski
  0 siblings, 1 reply; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-22 12:56 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Thu, Aug 22, 2024 at 01:34:33PM +0200, Maciej Fijalkowski wrote:
> On Mon, Aug 19, 2024 at 12:05:41PM +0200, Larysa Zaremba wrote:
> > Consider the following scenario:
> > 
> > .ndo_bpf()		| ice_prepare_for_reset()		|
> > ________________________|_______________________________________|
> > rtnl_lock()		|					|
> > ice_down()		|					|
> > 			| test_bit(ICE_VSI_DOWN) - true		|
> > 			| ice_dis_vsi() returns			|
> > ice_up()		|					|
> > 			| proceeds to rebuild a running VSI	|
> > 
> > .ndo_bpf() is not the only rtnl-locked callback that toggles the interface
> > to apply new configuration. Another example is .set_channels().
> > 
> > To avoid the race condition above, act only after reading ICE_VSI_DOWN
> > under rtnl_lock.
> > 
> > Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
> > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> > index b72338974a60..94029e446b99 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > @@ -2665,8 +2665,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
> >   */
> >  void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> >  {
> > -	if (test_bit(ICE_VSI_DOWN, vsi->state))
> > -		return;
> > +	bool already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> >  
> >  	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
> >  
> > @@ -2674,15 +2673,16 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> >  		if (netif_running(vsi->netdev)) {
> >  			if (!locked)
> >  				rtnl_lock();
> > -
> > -			ice_vsi_close(vsi);
> > +			already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> > +			if (!already_down)
> > +				ice_vsi_close(vsi);
> 
> Sorry for being a sloppy reviewer. We are still testing ICE_VSI_DOWN in
> ice_vsi_close(). Wouldn't all of this be cleaner if we bailed out of
> the called function when the bit was already set?
>

I am not sure I see a way to rewrite this as you suggest. We cannot bail out
in the netif_running() case, because we need to unlock after ice_vsi_close()
finishes. That leaves bailing out for the CTRL VSI and the non-running PF,
which we could do, but it would require a lengthy if condition, which is not
much better than nested code, IMO.
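
To make the control flow under discussion easier to follow, here is a minimal
userspace sketch of the "read the flag only once the lock is held" pattern.
It is not driver code: struct model_vsi, model_lock(), model_unlock(),
model_close() and model_dis_vsi() are hypothetical stand-ins, the lock is a
plain counter rather than rtnl_lock(), and the CTRL VSI branch is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the ice driver. */
struct model_vsi {
	bool down;		/* models the ICE_VSI_DOWN bit */
	bool netdev_running;	/* models netif_running() */
	int lock_depth;		/* models rtnl nesting; 0 means unlocked */
	int close_calls;	/* counts how often the close path ran */
};

static void model_lock(struct model_vsi *v)
{
	v->lock_depth++;
}

static void model_unlock(struct model_vsi *v)
{
	v->lock_depth--;
}

static void model_close(struct model_vsi *v)
{
	v->down = true;
	v->close_calls++;
}

/* Disable path: decide whether to close only after the lock is held. */
static void model_dis_vsi(struct model_vsi *v, bool locked)
{
	if (v->netdev_running) {
		if (!locked)
			model_lock(v);

		/* Checking the flag here, not before taking the lock, means a
		 * down/up toggle done under the same lock cannot slip in
		 * between the check and the close.
		 */
		if (!v->down)
			model_close(v);

		/* The unlock is why an early return is awkward in this branch. */
		if (!locked)
			model_unlock(v);
	} else if (!v->down) {
		model_close(v);
	}
}
```

Calling model_dis_vsi() twice on a running, not-yet-down model closes it once
and leaves lock_depth at 0 both times, which is the property the nested
version is meant to preserve.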

> >  
> >  			if (!locked)
> >  				rtnl_unlock();
> > -		} else {
> > +		} else if (!already_down) {
> >  			ice_vsi_close(vsi);
> >  		}
> > -	} else if (vsi->type == ICE_VSI_CTRL) {
> > +	} else if (vsi->type == ICE_VSI_CTRL && !already_down) {
> >  		ice_vsi_close(vsi);
> >  	}
> >  }
> > -- 
> > 2.43.0
> > 

* Re: [PATCH iwl-net v3 2/6] ice: protect XDP configuration with a mutex
  2024-08-22 11:39   ` Maciej Fijalkowski
@ 2024-08-22 13:05     ` Larysa Zaremba
  0 siblings, 0 replies; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-22 13:05 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Thu, Aug 22, 2024 at 01:39:15PM +0200, Maciej Fijalkowski wrote:
> On Mon, Aug 19, 2024 at 12:05:39PM +0200, Larysa Zaremba wrote:
> > The main threat to data consistency in ice_xdp() is a possible asynchronous
> > PF reset. It can be triggered by a user or by TX timeout handler.
> > 
> > XDP setup and PF reset code access the same resources in the following
> > sections:
> > * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
> > * ice_vsi_rebuild() for the PF VSI - not protected
> > * ice_vsi_open() - already rtnl-locked
> > 
> > With an unfortunate timing, such accesses can result in a crash such as the
> > one below:
> > 
> > [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
> > [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
> > [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
> > [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
> > [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
> > [ +0.394718] ice 0000:b1:00.0: PTP reset successful
> > [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
> > [ +0.000045] #PF: supervisor read access in kernel mode
> > [ +0.000023] #PF: error_code(0x0000) - not-present page
> > [ +0.000023] PGD 0 P4D 0
> > [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
> > [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> > [ +0.000036] Workqueue: ice ice_service_task [ice]
> > [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
> > [...]
> > [ +0.000013] Call Trace:
> > [ +0.000016] <TASK>
> > [ +0.000014] ? __die+0x1f/0x70
> > [ +0.000029] ? page_fault_oops+0x171/0x4f0
> > [ +0.000029] ? schedule+0x3b/0xd0
> > [ +0.000027] ? exc_page_fault+0x7b/0x180
> > [ +0.000022] ? asm_exc_page_fault+0x22/0x30
> > [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
> > [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
> > [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
> > [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
> > [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
> > [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
> > [ +0.000145] ice_rebuild+0x18c/0x840 [ice]
> > [ +0.000145] ? delay_tsc+0x4a/0xc0
> > [ +0.000022] ? delay_tsc+0x92/0xc0
> > [ +0.000020] ice_do_reset+0x140/0x180 [ice]
> > [ +0.000886] ice_service_task+0x404/0x1030 [ice]
> > [ +0.000824] process_one_work+0x171/0x340
> > [ +0.000685] worker_thread+0x277/0x3a0
> > [ +0.000675] ? preempt_count_add+0x6a/0xa0
> > [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
> > [ +0.000679] ? __pfx_worker_thread+0x10/0x10
> > [ +0.000653] kthread+0xf0/0x120
> > [ +0.000635] ? __pfx_kthread+0x10/0x10
> > [ +0.000616] ret_from_fork+0x2d/0x50
> > [ +0.000612] ? __pfx_kthread+0x10/0x10
> > [ +0.000604] ret_from_fork_asm+0x1b/0x30
> > [ +0.000604] </TASK>
> > 
> > The previous way of handling this through returning -EBUSY is not viable,
> > particularly when destroying AF_XDP socket, because the kernel proceeds
> > with removal anyway.
> > 
> > There is plenty of code between those calls and there is no need to create
> > a large critical section that covers all of them, same as there is no need
> > to protect ice_vsi_rebuild() with rtnl_lock().
> > 
> > Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
> > 
> > Leaving unprotected sections in between would result in two states that
> > have to be considered:
> > 1. when the VSI is closed, but not yet rebuilt
> > 2. when the VSI is already rebuilt, but not yet open
> > 
> > The latter case is actually already handled through !netif_running() case,
> > we just need to adjust flag checking a little. The former one is not as
> > trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
> > hardware interaction happens, this can make adding/deleting rings exit
> > with an error. Luckily, VSI rebuild is pending and can apply new
> > configuration for us in a managed fashion.
> > 
> > Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
> > indicate that ice_xdp() can just hot-swap the program.
> > 
> > Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
> > consistent by deconfiguring VSI when coalesce allocation fails.
> > 
> > Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> > Fixes: efc2214b6047 ("ice: Add support for XDP")
> > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice.h      |  2 ++
> >  drivers/net/ethernet/intel/ice/ice_lib.c  | 34 ++++++++++++++---------
> >  drivers/net/ethernet/intel/ice/ice_main.c | 19 +++++++++----
> >  drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 +-
> >  4 files changed, 39 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> > index caaa10157909..ce8b5505b16d 100644
> > --- a/drivers/net/ethernet/intel/ice/ice.h
> > +++ b/drivers/net/ethernet/intel/ice/ice.h
> > @@ -318,6 +318,7 @@ enum ice_vsi_state {
> >  	ICE_VSI_UMAC_FLTR_CHANGED,
> >  	ICE_VSI_MMAC_FLTR_CHANGED,
> >  	ICE_VSI_PROMISC_CHANGED,
> > +	ICE_VSI_REBUILD_PENDING,
> >  	ICE_VSI_STATE_NBITS		/* must be last */
> >  };
> >  
> > @@ -411,6 +412,7 @@ struct ice_vsi {
> >  	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
> >  	u16 num_xdp_txq;		 /* Used XDP queues */
> >  	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
> > +	struct mutex xdp_state_lock;
> >  
> >  	struct net_device **target_netdevs;
> >  
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> > index 5f2ddcaf7031..a8721ecdf2cd 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > @@ -447,6 +447,7 @@ static void ice_vsi_free(struct ice_vsi *vsi)
> >  
> >  	ice_vsi_free_stats(vsi);
> >  	ice_vsi_free_arrays(vsi);
> > +	mutex_destroy(&vsi->xdp_state_lock);
> >  	mutex_unlock(&pf->sw_mutex);
> >  	devm_kfree(dev, vsi);
> >  }
> > @@ -626,6 +627,8 @@ static struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf)
> >  	pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
> >  					 pf->next_vsi);
> >  
> > +	mutex_init(&vsi->xdp_state_lock);
> > +
> >  unlock_pf:
> >  	mutex_unlock(&pf->sw_mutex);
> >  	return vsi;
> > @@ -2973,19 +2976,24 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> >  	if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
> >  		return -EINVAL;
> >  
> > +	mutex_lock(&vsi->xdp_state_lock);
> > +	clear_bit(ICE_VSI_REBUILD_PENDING, vsi->state);
> 
> I am not sure what the state of the interface would be if the rebuild did not
> succeed, but it feels like clearing this bit should happen at the end of
> rebuild, once we are sure it was successful?
>

Unfortunately, this is a very valid point and I will have to send another
version. Until the rebuild is completed, we cannot be sure that the VSI is in a
state to configure XDP queues on, but we can be sure that either there will be
another rebuild soon or XDP won't be the user's main concern.
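
One possible shape of that change, sketched as a userspace model rather than
as the actual next revision: the pending flag is cleared only on the success
path, and every failure path unwinds through the same labels. The names
model_rebuild_vsi, model_fail and model_rebuild() are hypothetical, and the
lock is a counter standing in for the xdp_state_lock mutex.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; field names only mirror the discussion above. */
struct model_rebuild_vsi {
	bool rebuild_pending;	/* models ICE_VSI_REBUILD_PENDING */
	bool configured;	/* true once the VSI has been configured */
	int lock_depth;		/* counter standing in for the mutex */
};

/* Failure injection points, purely for illustration. */
enum model_fail { MODEL_OK, MODEL_FAIL_CFG, MODEL_FAIL_ALLOC };

static int model_rebuild(struct model_rebuild_vsi *v, enum model_fail inject)
{
	int ret = 0;

	v->lock_depth++;		/* stands in for mutex_lock() */

	v->configured = false;		/* deconfigure the old state */
	if (inject == MODEL_FAIL_CFG) {
		ret = -EIO;
		goto unlock;
	}
	v->configured = true;		/* default configuration applied */

	if (inject == MODEL_FAIL_ALLOC) {
		ret = -ENOMEM;
		goto decfg;
	}

	/* Only a completed rebuild makes ring changes safe again, so the
	 * flag is cleared here and stays set on every error path.
	 */
	v->rebuild_pending = false;
	goto unlock;

decfg:
	v->configured = false;
unlock:
	v->lock_depth--;		/* stands in for mutex_unlock() */
	return ret;
}
```

With this ordering, a failed rebuild leaves rebuild_pending set, so a
concurrent program update would keep taking the hot-swap path until a later
rebuild succeeds.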

> > +
> >  	ret = ice_vsi_realloc_stat_arrays(vsi);
> >  	if (ret)
> > -		goto err_vsi_cfg;
> > +		goto unlock;
> >  
> >  	ice_vsi_decfg(vsi);
> >  	ret = ice_vsi_cfg_def(vsi);
> >  	if (ret)
> > -		goto err_vsi_cfg;
> > +		goto unlock;
> >  
> >  	coalesce = kcalloc(vsi->num_q_vectors,
> >  			   sizeof(struct ice_coalesce_stored), GFP_KERNEL);
> > -	if (!coalesce)
> > -		return -ENOMEM;
> > +	if (!coalesce) {
> > +		ret = -ENOMEM;
> > +		goto decfg;
> > +	}
> >  
> >  	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
> >  
> > @@ -2993,22 +3001,22 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> >  	if (ret) {
> >  		if (vsi_flags & ICE_VSI_FLAG_INIT) {
> >  			ret = -EIO;
> > -			goto err_vsi_cfg_tc_lan;
> > +			goto free_coalesce;
> >  		}
> >  
> > -		kfree(coalesce);
> > -		return ice_schedule_reset(pf, ICE_RESET_PFR);
> > +		ret = ice_schedule_reset(pf, ICE_RESET_PFR);
> > +		goto free_coalesce;
> >  	}
> >  
> >  	ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors);
> > -	kfree(coalesce);
> >  
> > -	return 0;
> > -
> > -err_vsi_cfg_tc_lan:
> > -	ice_vsi_decfg(vsi);
> > +free_coalesce:
> >  	kfree(coalesce);
> > -err_vsi_cfg:
> > +decfg:
> > +	if (ret)
> > +		ice_vsi_decfg(vsi);
> > +unlock:
> > +	mutex_unlock(&vsi->xdp_state_lock);
> >  	return ret;
> >  }
> >  
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index 2d286a4609a5..e92f43850671 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -595,6 +595,7 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
> >  	/* clear SW filtering DB */
> >  	ice_clear_hw_tbls(hw);
> >  	/* disable the VSIs and their queues that are not already DOWN */
> > +	set_bit(ICE_VSI_REBUILD_PENDING, ice_get_main_vsi(pf)->state);
> >  	ice_pf_dis_all_vsi(pf, false);
> >  
> >  	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
> > @@ -2995,7 +2996,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
> >  	}
> >  
> >  	/* hot swap progs and avoid toggling link */
> > -	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
> > +	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
> > +	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
> >  		ice_vsi_assign_bpf_prog(vsi, prog);
> >  		return 0;
> >  	}
> > @@ -3067,21 +3069,28 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >  {
> >  	struct ice_netdev_priv *np = netdev_priv(dev);
> >  	struct ice_vsi *vsi = np->vsi;
> > +	int ret;
> >  
> >  	if (vsi->type != ICE_VSI_PF) {
> >  		NL_SET_ERR_MSG_MOD(xdp->extack, "XDP can be loaded only on PF VSI");
> >  		return -EINVAL;
> >  	}
> >  
> > +	mutex_lock(&vsi->xdp_state_lock);
> > +
> >  	switch (xdp->command) {
> >  	case XDP_SETUP_PROG:
> > -		return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> > +		ret = ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> > +		break;
> >  	case XDP_SETUP_XSK_POOL:
> > -		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
> > -					  xdp->xsk.queue_id);
> > +		ret = ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id);
> > +		break;
> >  	default:
> > -		return -EINVAL;
> > +		ret = -EINVAL;
> >  	}
> > +
> > +	mutex_unlock(&vsi->xdp_state_lock);
> > +	return ret;
> >  }
> >  
> >  /**
> > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > index 240a7bec242b..a659951fa987 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > @@ -390,7 +390,8 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
> >  		goto failure;
> >  	}
> >  
> > -	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
> > +	if_running = !test_bit(ICE_VSI_DOWN, vsi->state) &&
> > +		     ice_is_xdp_ena_vsi(vsi);
> >  
> >  	if (if_running) {
> >  		struct ice_rx_ring *rx_ring = vsi->rx_rings[qid];
> > -- 
> > 2.43.0
> > 

* Re: [PATCH iwl-net v3 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-08-22 11:43   ` Maciej Fijalkowski
@ 2024-08-22 13:07     ` Larysa Zaremba
  0 siblings, 0 replies; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-22 13:07 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Thu, Aug 22, 2024 at 01:43:50PM +0200, Maciej Fijalkowski wrote:
> On Mon, Aug 19, 2024 at 12:05:42PM +0200, Larysa Zaremba wrote:
> > Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
> > because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
> > state, not VSI one. Therefore it does not protect the queue pair from
> > e.g. reset.
> > 
> > Despite being useless, it still can deadlock the unfortunate functions that
> > have fallen into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena()
> > returns an error.
> 
> I believe the last sentence is not valid after our recent fixes around xsk
> and tx timeouts.
>

Yes, this is no longer valid, I need to remove this.
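
For context on why a bit-based try-lock like this is fragile in general, here
is a small userspace sketch. It assumes a purely hypothetical enable path that
returns early without releasing the bit, which, as noted above, is not how the
current driver behaves. model_qp, model_qp_init(), model_qp_dis() and
model_qp_ena() are made-up names, and C11 atomic_flag stands in for
test_and_set_bit()/clear_bit().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a queue pair guarded by a busy bit. */
struct model_qp {
	atomic_flag busy;	/* stands in for a BUSY bit in a state bitmap */
};

static void model_qp_init(struct model_qp *qp)
{
	atomic_flag_clear(&qp->busy);
}

/* Disable: take the bit, retrying a bounded number of times. */
static int model_qp_dis(struct model_qp *qp)
{
	int timeout = 50;

	while (atomic_flag_test_and_set(&qp->busy)) {
		timeout--;
		if (!timeout)
			return -EBUSY;
		/* The removed driver loop sleeps here; the model just retries. */
	}

	return 0;
}

/* Enable: release the bit, unless the assumed early-exit path is taken. */
static int model_qp_ena(struct model_qp *qp, bool fail_early)
{
	if (fail_early)
		return -EIO;	/* hypothetical path that skips the release */

	atomic_flag_clear(&qp->busy);
	return 0;
}
```

In this model, one failed enable is enough to make every later disable run
out of retries and return -EBUSY, because nothing ever clears the bit again.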
 
> > 
> > Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
> > 
> > Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> 
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> 
> > ---
> >  drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
> >  1 file changed, 9 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > index 8693509efbe7..5dee829bfc47 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > @@ -165,7 +165,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
> >  	struct ice_q_vector *q_vector;
> >  	struct ice_tx_ring *tx_ring;
> >  	struct ice_rx_ring *rx_ring;
> > -	int timeout = 50;
> >  	int fail = 0;
> >  	int err;
> >  
> > @@ -176,13 +175,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
> >  	rx_ring = vsi->rx_rings[q_idx];
> >  	q_vector = rx_ring->q_vector;
> >  
> > -	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
> > -		timeout--;
> > -		if (!timeout)
> > -			return -EBUSY;
> > -		usleep_range(1000, 2000);
> > -	}
> > -
> >  	synchronize_net();
> >  	netif_carrier_off(vsi->netdev);
> >  	netif_tx_stop_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
> > @@ -261,7 +253,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
> >  		netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
> >  		netif_carrier_on(vsi->netdev);
> >  	}
> > -	clear_bit(ICE_CFG_BUSY, vsi->state);
> >  
> >  	return fail;
> >  }
> > -- 
> > 2.43.0
> > 
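
The commit message's point, that a busy bit taken in the VSI bitmap does not exclude code that looks for it in the PF bitmap, can be illustrated with a minimal userspace sketch. Everything here is a hypothetical stand-in (fake_pf, fake_vsi, pf_path_try_enter, qp_try_enter and C11 atomics instead of the kernel bitops); it is not the driver's code, only a model of two try-locks that live in different words.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pf->state and vsi->state; not the driver's types. */
enum { CFG_BUSY_BIT = 0 };

struct fake_pf  { atomic_ulong state; };
struct fake_vsi { atomic_ulong state; };

/* Userspace analogue of test_and_set_bit(): returns the previous bit value. */
static bool fake_test_and_set(atomic_ulong *map, int bit)
{
	unsigned long mask;

	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * 8));
	mask = 1UL << bit;
	return (atomic_fetch_or(map, mask) & mask) != 0;
}

/* Userspace analogue of clear_bit(). */
static void fake_clear(atomic_ulong *map, int bit)
{
	atomic_fetch_and(map, ~(1UL << bit));
}

/* Assumed PF-level path: takes the busy bit in the PF bitmap. */
static bool pf_path_try_enter(struct fake_pf *pf)
{
	return !fake_test_and_set(&pf->state, CFG_BUSY_BIT);
}

/* Queue-pair path modelled on the removed loop: bit in the VSI bitmap. */
static bool qp_try_enter(struct fake_vsi *vsi)
{
	return !fake_test_and_set(&vsi->state, CFG_BUSY_BIT);
}

static void qp_leave(struct fake_vsi *vsi)
{
	fake_clear(&vsi->state, CFG_BUSY_BIT);
}
```

In this model both try-enter helpers succeed while the other side is still "inside", because each one only ever observes its own bitmap. The VSI-side bit only serializes callers that use the VSI bitmap themselves.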


* Re: [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-08-22 12:56     ` Larysa Zaremba
@ 2024-08-22 14:42       ` Maciej Fijalkowski
  2024-08-22 17:18         ` Larysa Zaremba
  0 siblings, 1 reply; 21+ messages in thread
From: Maciej Fijalkowski @ 2024-08-22 14:42 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Thu, Aug 22, 2024 at 02:56:50PM +0200, Larysa Zaremba wrote:
> On Thu, Aug 22, 2024 at 01:34:33PM +0200, Maciej Fijalkowski wrote:
> > On Mon, Aug 19, 2024 at 12:05:41PM +0200, Larysa Zaremba wrote:
> > > Consider the following scenario:
> > > 
> > > .ndo_bpf()		| ice_prepare_for_reset()		|
> > > ________________________|_______________________________________|
> > > rtnl_lock()		|					|
> > > ice_down()		|					|
> > > 			| test_bit(ICE_VSI_DOWN) - true		|
> > > 			| ice_dis_vsi() returns			|
> > > ice_up()		|					|
> > > 			| proceeds to rebuild a running VSI	|
> > > 
> > > .ndo_bpf() is not the only rtnl-locked callback that toggles the interface
> > > to apply new configuration. Another example is .set_channels().
> > > 
> > > To avoid the race condition above, act only after reading ICE_VSI_DOWN
> > > under rtnl_lock.
> > > 
> > > Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
> > > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > ---
> > >  drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> > > index b72338974a60..94029e446b99 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > > @@ -2665,8 +2665,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
> > >   */
> > >  void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> > >  {
> > > -	if (test_bit(ICE_VSI_DOWN, vsi->state))
> > > -		return;
> > > +	bool already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> > >  
> > >  	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
> > >  
> > > @@ -2674,15 +2673,16 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> > >  		if (netif_running(vsi->netdev)) {
> > >  			if (!locked)
> > >  				rtnl_lock();
> > > -
> > > -			ice_vsi_close(vsi);
> > > +			already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> > > +			if (!already_down)
> > > +				ice_vsi_close(vsi);
> > 
> > > Sorry for being a sloppy reviewer. We are still testing ICE_VSI_DOWN in
> > > ice_vsi_close(). Wouldn't all of this be cleaner if we bailed out of
> > > the called function when the bit was already set?
> >
> 
> I am not sure I see the possibility to rewrite this as you suggest, we cannot 
> bail out for the netif_running() case due to needing to unlock after 
> ice_vsi_close() finishes. This leaves bailing out in case of CTRL VSI and 
> non-running PF, which we could do, but it would require a lengthy if condition, 
> which is not that much better than nested code, IMO.

Hmm. I meant moving the bit check into ice_vsi_close() only, so you would
bail out of it if the bit has already been set.

Overall, ice_dis_vsi() is a very cumbersome way of calling ice_vsi_close()
:(

I guess we can proceed with what you have, but I'd like to brainstorm
later about some simplification around it.

I prototyped something but have not tested it, just to maybe spark a
discussion. It feels easier to read and digest in the end. Not sure if
functionality is kept :)

From 706289d5c37c41cd3021997e0d5e64d7496e5536 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Thu, 22 Aug 2024 16:33:37 +0200
Subject: [PATCH] ice: simplify ice_dis_vsi()

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 46 +++++++++++++-----------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index f559e60992fa..8ccdda69a1d4 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2625,14 +2625,34 @@ void ice_vsi_free_rx_rings(struct ice_vsi *vsi)
  */
 void ice_vsi_close(struct ice_vsi *vsi)
 {
-	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
-		ice_down(vsi);
+	if (test_bit(ICE_VSI_DOWN, vsi->state))
+		return;
+
+	set_bit(ICE_VSI_DOWN, vsi->state);
 
+	ice_down(vsi);
 	ice_vsi_free_irq(vsi);
 	ice_vsi_free_tx_rings(vsi);
 	ice_vsi_free_rx_rings(vsi);
 }
 
+/**
+ * __ice_vsi_close - variant of shutting down a VSI that takes care of
+ *                   rtnl_lock
+ * @vsi: the VSI being shut down
+ * @take_lock: to lock or not to lock
+ */
+static void __ice_vsi_close(struct ice_vsi *vsi, bool take_lock)
+{
+	if (take_lock)
+		rtnl_lock();
+
+	ice_vsi_close(vsi);
+
+	if (take_lock)
+		rtnl_unlock();
+}
+
 /**
  * ice_ena_vsi - resume a VSI
  * @vsi: the VSI being resume
@@ -2671,26 +2691,12 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
  */
 void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
 {
-	if (test_bit(ICE_VSI_DOWN, vsi->state))
-		return;
-
 	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
 
-	if (vsi->type == ICE_VSI_PF && vsi->netdev) {
-		if (netif_running(vsi->netdev)) {
-			if (!locked)
-				rtnl_lock();
-
-			ice_vsi_close(vsi);
-
-			if (!locked)
-				rtnl_unlock();
-		} else {
-			ice_vsi_close(vsi);
-		}
-	} else if (vsi->type == ICE_VSI_CTRL) {
-		ice_vsi_close(vsi);
-	}
+	if (vsi->type == ICE_VSI_PF && vsi->netdev)
+		__ice_vsi_close(vsi, !locked && netif_running(vsi->netdev));
+	else if (vsi->type == ICE_VSI_CTRL)
+		__ice_vsi_close(vsi, false);
 }
 
 /**
-- 
2.34.1



> 
> > >  
> > >  			if (!locked)
> > >  				rtnl_unlock();
> > > -		} else {
> > > +		} else if (!already_down) {
> > >  			ice_vsi_close(vsi);
> > >  		}
> > > -	} else if (vsi->type == ICE_VSI_CTRL) {
> > > +	} else if (vsi->type == ICE_VSI_CTRL && !already_down) {
> > >  		ice_vsi_close(vsi);
> > >  	}
> > >  }
> > > -- 
> > > 2.43.0
> > > 
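
The fix discussed in this message (read ICE_VSI_DOWN only once rtnl_lock is held, then decide whether to close) can be sketched in userspace as follows. This is a minimal model under stated assumptions: fake_rtnl is a pthread mutex standing in for rtnl_lock, and fake_vsi, fake_vsi_close and fake_dis_vsi are hypothetical names, not the driver's functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for rtnl_lock. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for struct ice_vsi. */
struct fake_vsi {
	atomic_bool down;		/* analogue of ICE_VSI_DOWN */
	atomic_bool needs_restart;	/* analogue of ICE_VSI_NEEDS_RESTART */
	int close_calls;		/* how often the close path ran */
};

static void fake_vsi_close(struct fake_vsi *vsi)
{
	atomic_store(&vsi->down, true);
	vsi->close_calls++;
}

/*
 * Sample the down flag only after the lock is held, so that a callback
 * toggling the interface under the same lock cannot flip it between the
 * check and the action.
 */
static void fake_dis_vsi(struct fake_vsi *vsi, bool locked)
{
	assert(vsi);
	atomic_store(&vsi->needs_restart, true);

	if (!locked)
		pthread_mutex_lock(&fake_rtnl);

	if (!atomic_load(&vsi->down))
		fake_vsi_close(vsi);

	if (!locked)
		pthread_mutex_unlock(&fake_rtnl);
}
```

The point of the ordering is that a caller holding the same lock across its down/up toggle can never be observed in the intermediate "down" state by fake_dis_vsi(). The restart flag is still set unconditionally, as in the patch under review.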


* Re: [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-08-22 14:42       ` Maciej Fijalkowski
@ 2024-08-22 17:18         ` Larysa Zaremba
  0 siblings, 0 replies; 21+ messages in thread
From: Larysa Zaremba @ 2024-08-22 17:18 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar, Chandan Kumar Rout

On Thu, Aug 22, 2024 at 04:42:44PM +0200, Maciej Fijalkowski wrote:
> On Thu, Aug 22, 2024 at 02:56:50PM +0200, Larysa Zaremba wrote:
> > On Thu, Aug 22, 2024 at 01:34:33PM +0200, Maciej Fijalkowski wrote:
> > > On Mon, Aug 19, 2024 at 12:05:41PM +0200, Larysa Zaremba wrote:
> > > > Consider the following scenario:
> > > > 
> > > > .ndo_bpf()		| ice_prepare_for_reset()		|
> > > > ________________________|_______________________________________|
> > > > rtnl_lock()		|					|
> > > > ice_down()		|					|
> > > > 			| test_bit(ICE_VSI_DOWN) - true		|
> > > > 			| ice_dis_vsi() returns			|
> > > > ice_up()		|					|
> > > > 			| proceeds to rebuild a running VSI	|
> > > > 
> > > > .ndo_bpf() is not the only rtnl-locked callback that toggles the interface
> > > > to apply new configuration. Another example is .set_channels().
> > > > 
> > > > To avoid the race condition above, act only after reading ICE_VSI_DOWN
> > > > under rtnl_lock.
> > > > 
> > > > Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
> > > > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > > > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > > > Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
> > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > > ---
> > > >  drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
> > > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> > > > index b72338974a60..94029e446b99 100644
> > > > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > > > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > > > @@ -2665,8 +2665,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
> > > >   */
> > > >  void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> > > >  {
> > > > -	if (test_bit(ICE_VSI_DOWN, vsi->state))
> > > > -		return;
> > > > +	bool already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> > > >  
> > > >  	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
> > > >  
> > > > @@ -2674,15 +2673,16 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
> > > >  		if (netif_running(vsi->netdev)) {
> > > >  			if (!locked)
> > > >  				rtnl_lock();
> > > > -
> > > > -			ice_vsi_close(vsi);
> > > > +			already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> > > > +			if (!already_down)
> > > > +				ice_vsi_close(vsi);
> > > 
> > > > Sorry for being a sloppy reviewer. We are still testing ICE_VSI_DOWN in
> > > > ice_vsi_close(). Wouldn't all of this be cleaner if we bailed out of
> > > > the called function when the bit was already set?
> > >
> > 
> > I am not sure I see the possibility to rewrite this as you suggest, we cannot 
> > bail out for the netif_running() case due to needing to unlock after 
> > ice_vsi_close() finishes. This leaves bailing out in case of CTRL VSI and 
> > non-running PF, which we could do, but it would require a lengthy if condition, 
> > which is not that much better than nested code, IMO.
> 
> Hmm. I meant moving the bit check into ice_vsi_close() only, so you would
> bail out of it if the bit has already been set.
> 
> Overall, ice_dis_vsi() is a very cumbersome way of calling ice_vsi_close()
> :(
> 
> I guess we can proceed with what you have, but I'd like to brainstorm
> later about some simplification around it.
> 
> I prototyped something but have not tested it, just to maybe spark a
> discussion. It feels easier to read and digest in the end. Not sure if
> functionality is kept :)
>

Ok, now I get it.
Yes, this is something worth considering for a -next patch. Opting out of
closing the VSI based on a down state seems not very nice though :/
I am not even sure whether such an approach is correct in ice_dis_vsi() or
whether it is just some ancient artifact.
It seems to need some analysis of VSI state changes.

> From 706289d5c37c41cd3021997e0d5e64d7496e5536 Mon Sep 17 00:00:00 2001
> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Date: Thu, 22 Aug 2024 16:33:37 +0200
> Subject: [PATCH] ice: simplify ice_dis_vsi()
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c | 46 +++++++++++++-----------
>  1 file changed, 26 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index f559e60992fa..8ccdda69a1d4 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2625,14 +2625,34 @@ void ice_vsi_free_rx_rings(struct ice_vsi *vsi)
>   */
>  void ice_vsi_close(struct ice_vsi *vsi)
>  {
> -	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
> -		ice_down(vsi);
> +	if (test_bit(ICE_VSI_DOWN, vsi->state))
> +		return;
> +
> +	set_bit(ICE_VSI_DOWN, vsi->state);
>  
> +	ice_down(vsi);
>  	ice_vsi_free_irq(vsi);
>  	ice_vsi_free_tx_rings(vsi);
>  	ice_vsi_free_rx_rings(vsi);
>  }
>  
> +/**
> + * __ice_vsi_close - variant of shutting down a VSI that takes care of
> + *                   rtnl_lock
> + * @vsi: the VSI being shut down
> + * @take_lock: to lock or not to lock
> + */
> +static void __ice_vsi_close(struct ice_vsi *vsi, bool take_lock)
> +{
> +	if (take_lock)
> +		rtnl_lock();
> +
> +	ice_vsi_close(vsi);
> +
> +	if (take_lock)
> +		rtnl_unlock();
> +}
> +
>  /**
>   * ice_ena_vsi - resume a VSI
>   * @vsi: the VSI being resume
> @@ -2671,26 +2691,12 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
>   */
>  void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
>  {
> -	if (test_bit(ICE_VSI_DOWN, vsi->state))
> -		return;
> -
>  	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
>  
> -	if (vsi->type == ICE_VSI_PF && vsi->netdev) {
> -		if (netif_running(vsi->netdev)) {
> -			if (!locked)
> -				rtnl_lock();
> -
> -			ice_vsi_close(vsi);
> -
> -			if (!locked)
> -				rtnl_unlock();
> -		} else {
> -			ice_vsi_close(vsi);
> -		}
> -	} else if (vsi->type == ICE_VSI_CTRL) {
> -		ice_vsi_close(vsi);
> -	}
> +	if (vsi->type == ICE_VSI_PF && vsi->netdev)
> +		__ice_vsi_close(vsi, !locked && netif_running(vsi->netdev));
> +	else if (vsi->type == ICE_VSI_CTRL)
> +		__ice_vsi_close(vsi, false);
>  }
>  
>  /**
> -- 
> 2.34.1
> 
> 
> 
> > 
> > > >  
> > > >  			if (!locked)
> > > >  				rtnl_unlock();
> > > > -		} else {
> > > > +		} else if (!already_down) {
> > > >  			ice_vsi_close(vsi);
> > > >  		}
> > > > -	} else if (vsi->type == ICE_VSI_CTRL) {
> > > > +	} else if (vsi->type == ICE_VSI_CTRL && !already_down) {
> > > >  		ice_vsi_close(vsi);
> > > >  	}
> > > >  }
> > > > -- 
> > > > 2.43.0
> > > > 


Thread overview: 21+ messages
2024-08-19 10:05 [PATCH iwl-net v3 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
2024-08-19 10:05 ` [PATCH iwl-net v3 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
2024-08-20 12:31   ` Maciej Fijalkowski
2024-08-20 12:47     ` Larysa Zaremba
2024-08-20 13:26       ` Maciej Fijalkowski
2024-08-21 21:20         ` Tony Nguyen
2024-08-19 10:05 ` [PATCH iwl-net v3 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
2024-08-22 11:39   ` Maciej Fijalkowski
2024-08-22 13:05     ` Larysa Zaremba
2024-08-19 10:05 ` [PATCH iwl-net v3 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
2024-08-22 11:36   ` Maciej Fijalkowski
2024-08-19 10:05 ` [PATCH iwl-net v3 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
2024-08-22 11:34   ` Maciej Fijalkowski
2024-08-22 12:56     ` Larysa Zaremba
2024-08-22 14:42       ` Maciej Fijalkowski
2024-08-22 17:18         ` Larysa Zaremba
2024-08-19 10:05 ` [PATCH iwl-net v3 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
2024-08-22 11:43   ` Maciej Fijalkowski
2024-08-22 13:07     ` Larysa Zaremba
2024-08-19 10:05 ` [PATCH iwl-net v3 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
2024-08-22 11:35   ` Maciej Fijalkowski
