netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset
@ 2024-07-24 16:48 Larysa Zaremba
  2024-07-24 16:48 ` [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-07-24 16:48 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Larysa Zaremba, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Maciej Fijalkowski, netdev, linux-kernel, bpf, magnus.karlsson,
	Michal Kubiak, Wojciech Drewek, Amritha Nambiar

PF reset can be triggered asynchronously, by tx_timeout or by a user. With some
unfortunate timings both ice_vsi_rebuild() and .ndo_bpf will try to access and
modify XDP rings at the same time, causing system crash.

The first patch factors out rtnl-locked code from VSI rebuild code to avoid
deadlock. The following changes lock rebuild and .ndo_bpf() critical sections
with an internal mutex as well and provide complementary fixes.

v1: https://lore.kernel.org/netdev/20240610153716.31493-1-larysa.zaremba@intel.com/
v1->v2:
* use mutex for locking
* redefine critical sections
* account for short time between rebuild and VSI being open
* add netif_queue_set_napi() patch, so ICE_RTNL_WAITS_FOR_RESET strategy can be
  dropped, no more rtnl-locked code in ice_vsi_rebuild()
* change the test case from waiting for tx_timeout to happen to actively firing
  resets through sysfs, this adds more minor fixes on top

Larysa Zaremba (6):
  ice: move netif_queue_set_napi to rtnl-protected sections
  ice: protect XDP configuration with a mutex
  ice: check for XDP rings instead of bpf program when unconfiguring
  ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  ice: remove ICE_CFG_BUSY locking from AF_XDP code
  ice: do not bring the VSI up, if it was down before the XDP setup

 drivers/net/ethernet/intel/ice/ice.h      |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
 drivers/net/ethernet/intel/ice/ice_lib.c  | 171 +++++++---------------
 drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
 drivers/net/ethernet/intel/ice/ice_main.c |  47 ++++--
 drivers/net/ethernet/intel/ice/ice_xsk.c  |  18 +--
 6 files changed, 102 insertions(+), 157 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-07-24 16:48 [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
@ 2024-07-24 16:48 ` Larysa Zaremba
  2024-07-24 18:21   ` Jacob Keller
                     ` (2 more replies)
  2024-07-24 16:48 ` [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-07-24 16:48 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Larysa Zaremba, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Maciej Fijalkowski, netdev, linux-kernel, bpf, magnus.karlsson,
	Michal Kubiak, Wojciech Drewek, Amritha Nambiar

Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is
not rtnl-locked when called from the reset. This creates the need to take
the rtnl_lock just for a single function and complicates the
synchronization with .ndo_bpf. At the same time, there no actual need to
fill napi-to-queue information at this exact point.

Fill napi-to-queue information when opening the VSI and clear it when the
VSI is being closed. Those routines are already rtnl-locked.

Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
XDP queues, as this leads to out-of-bounds writes, such as one below.

[  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
[  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
[  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
[  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[  +0.000003] Call Trace:
[  +0.000003]  <TASK>
[  +0.000002]  dump_stack_lvl+0x60/0x80
[  +0.000007]  print_report+0xce/0x630
[  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
[  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
[  +0.000003]  kasan_report+0xe9/0x120
[  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
[  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
[  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
[  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
[  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
[  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
[  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
[  +0.000006]  pci_reset_function+0x12d/0x230
[  +0.000004]  reset_store+0xa0/0x100
[  +0.000006]  ? __pfx_reset_store+0x10/0x10
[  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
[  +0.000004]  ? __check_object_size+0x4c1/0x640
[  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
[  +0.000006]  vfs_write+0x5d6/0xdf0
[  +0.000005]  ? fd_install+0x180/0x350
[  +0.000005]  ? __pfx_vfs_write+0x10/0xA10
[  +0.000004]  ? do_fcntl+0x52c/0xcd0
[  +0.000004]  ? kasan_save_track+0x13/0x60
[  +0.000003]  ? kasan_save_free_info+0x37/0x60
[  +0.000006]  ksys_write+0xfa/0x1d0
[  +0.000003]  ? __pfx_ksys_write+0x10/0x10
[  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
[  +0.000004]  ? _raw_spin_lock+0x87/0xe0
[  +0.000005]  do_syscall_64+0x80/0x170
[  +0.000007]  ? _raw_spin_lock+0x87/0xe0
[  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
[  +0.000003]  ? file_close_fd_locked+0x167/0x230
[  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
[  +0.000005]  ? do_syscall_64+0x8c/0x170
[  +0.000004]  ? do_syscall_64+0x8c/0x170
[  +0.000003]  ? do_syscall_64+0x8c/0x170
[  +0.000003]  ? fput+0x1a/0x2c0
[  +0.000004]  ? filp_close+0x19/0x30
[  +0.000004]  ? do_dup2+0x25a/0x4c0
[  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
[  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
[  +0.000004]  ? do_syscall_64+0x8c/0x170
[  +0.000003]  ? __count_memcg_events+0x113/0x380
[  +0.000005]  ? handle_mm_fault+0x136/0x820
[  +0.000005]  ? do_user_addr_fault+0x444/0xa80
[  +0.000004]  ? clear_bhb_loop+0x25/0x80
[  +0.000004]  ? clear_bhb_loop+0x25/0x80
[  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000005] RIP: 0033:0x7f2033593154

Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
 drivers/net/ethernet/intel/ice/ice_lib.c  | 129 ++++++----------------
 drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
 drivers/net/ethernet/intel/ice/ice_main.c |  17 ++-
 4 files changed, 49 insertions(+), 118 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 5d396c1a7731..0157a4eacd7e 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -190,16 +190,11 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
 	}
 	q_vector = vsi->q_vectors[v_idx];
 
-	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
-		ice_queue_set_napi(vsi, tx_ring->q_index, NETDEV_QUEUE_TYPE_TX,
-				   NULL);
+	ice_for_each_tx_ring(tx_ring, vsi->q_vectors[v_idx]->tx)
 		tx_ring->q_vector = NULL;
-	}
-	ice_for_each_rx_ring(rx_ring, q_vector->rx) {
-		ice_queue_set_napi(vsi, rx_ring->q_index, NETDEV_QUEUE_TYPE_RX,
-				   NULL);
+
+	ice_for_each_rx_ring(rx_ring, vsi->q_vectors[v_idx]->rx)
 		rx_ring->q_vector = NULL;
-	}
 
 	/* only VSI with an associated netdev is set up with NAPI */
 	if (vsi->netdev)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 03c4df4ed585..5f2ddcaf7031 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2286,9 +2286,6 @@ static int ice_vsi_cfg_def(struct ice_vsi *vsi)
 
 		ice_vsi_map_rings_to_vectors(vsi);
 
-		/* Associate q_vector rings to napi */
-		ice_vsi_set_napi_queues(vsi);
-
 		vsi->stat_offsets_loaded = false;
 
 		/* ICE_VSI_CTRL does not need RSS so skip RSS processing */
@@ -2621,6 +2618,7 @@ void ice_vsi_close(struct ice_vsi *vsi)
 	if (!test_and_set_bit(ICE_VSI_DOWN, vsi->state))
 		ice_down(vsi);
 
+	ice_vsi_clear_napi_queues(vsi);
 	ice_vsi_free_irq(vsi);
 	ice_vsi_free_tx_rings(vsi);
 	ice_vsi_free_rx_rings(vsi);
@@ -2687,120 +2685,55 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
 }
 
 /**
- * __ice_queue_set_napi - Set the napi instance for the queue
- * @dev: device to which NAPI and queue belong
- * @queue_index: Index of queue
- * @type: queue type as RX or TX
- * @napi: NAPI context
- * @locked: is the rtnl_lock already held
- *
- * Set the napi instance for the queue. Caller indicates the lock status.
- */
-static void
-__ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
-		     enum netdev_queue_type type, struct napi_struct *napi,
-		     bool locked)
-{
-	if (!locked)
-		rtnl_lock();
-	netif_queue_set_napi(dev, queue_index, type, napi);
-	if (!locked)
-		rtnl_unlock();
-}
-
-/**
- * ice_queue_set_napi - Set the napi instance for the queue
- * @vsi: VSI being configured
- * @queue_index: Index of queue
- * @type: queue type as RX or TX
- * @napi: NAPI context
+ * ice_vsi_set_napi_queues
+ * @vsi: VSI pointer
  *
- * Set the napi instance for the queue. The rtnl lock state is derived from the
- * execution path.
+ * Associate queue[s] with napi for all vectors.
+ * The caller must hold rtnl_lock.
  */
-void
-ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
-		   enum netdev_queue_type type, struct napi_struct *napi)
+void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
 {
-	struct ice_pf *pf = vsi->back;
+	struct net_device *netdev = vsi->netdev;
+	int q_idx, v_idx;
 
-	if (!vsi->netdev)
+	if (!netdev)
 		return;
 
-	if (current_work() == &pf->serv_task ||
-	    test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
-	    test_bit(ICE_DOWN, pf->state) ||
-	    test_bit(ICE_SUSPENDED, pf->state))
-		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
-				     false);
-	else
-		__ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
-				     true);
-}
+	ice_for_each_rxq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX,
+				     &vsi->rx_rings[q_idx]->q_vector->napi);
 
-/**
- * __ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
- * @q_vector: q_vector pointer
- * @locked: is the rtnl_lock already held
- *
- * Associate the q_vector napi with all the queue[s] on the vector.
- * Caller indicates the lock status.
- */
-void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
-{
-	struct ice_rx_ring *rx_ring;
-	struct ice_tx_ring *tx_ring;
-
-	ice_for_each_rx_ring(rx_ring, q_vector->rx)
-		__ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
-				     NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
-				     locked);
-
-	ice_for_each_tx_ring(tx_ring, q_vector->tx)
-		__ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
-				     NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
-				     locked);
+	ice_for_each_txq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX,
+				     &vsi->tx_rings[q_idx]->q_vector->napi);
 	/* Also set the interrupt number for the NAPI */
-	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
-}
-
-/**
- * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
- * @q_vector: q_vector pointer
- *
- * Associate the q_vector napi with all the queue[s] on the vector
- */
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector)
-{
-	struct ice_rx_ring *rx_ring;
-	struct ice_tx_ring *tx_ring;
-
-	ice_for_each_rx_ring(rx_ring, q_vector->rx)
-		ice_queue_set_napi(q_vector->vsi, rx_ring->q_index,
-				   NETDEV_QUEUE_TYPE_RX, &q_vector->napi);
+	ice_for_each_q_vector(vsi, v_idx) {
+		struct ice_q_vector *q_vector = vsi->q_vectors[v_idx];
 
-	ice_for_each_tx_ring(tx_ring, q_vector->tx)
-		ice_queue_set_napi(q_vector->vsi, tx_ring->q_index,
-				   NETDEV_QUEUE_TYPE_TX, &q_vector->napi);
-	/* Also set the interrupt number for the NAPI */
-	netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
+		netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
+	}
 }
 
 /**
- * ice_vsi_set_napi_queues
+ * ice_vsi_clear_napi_queues
  * @vsi: VSI pointer
  *
- * Associate queue[s] with napi for all vectors
+ * Clear the association between all VSI queues queue[s] and napi.
+ * The caller must hold rtnl_lock.
  */
-void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
+void ice_vsi_clear_napi_queues(struct ice_vsi *vsi)
 {
-	int i;
+	struct net_device *netdev = vsi->netdev;
+	int q_idx;
 
-	if (!vsi->netdev)
+	if (!netdev)
 		return;
 
-	ice_for_each_q_vector(vsi, i)
-		ice_q_vector_set_napi_queues(vsi->q_vectors[i]);
+	ice_for_each_txq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX, NULL);
+
+	ice_for_each_rxq(vsi, q_idx)
+		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX, NULL);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 94ce8964dda6..36d86535695d 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -44,16 +44,10 @@ void ice_vsi_cfg_netdev_tc(struct ice_vsi *vsi, u8 ena_tc);
 struct ice_vsi *
 ice_vsi_setup(struct ice_pf *pf, struct ice_vsi_cfg_params *params);
 
-void
-ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
-		   enum netdev_queue_type type, struct napi_struct *napi);
-
-void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
-
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector);
-
 void ice_vsi_set_napi_queues(struct ice_vsi *vsi);
 
+void ice_vsi_clear_napi_queues(struct ice_vsi *vsi);
+
 int ice_vsi_release(struct ice_vsi *vsi);
 
 void ice_vsi_close(struct ice_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index ec636be4d17d..8ed1798bb06e 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3553,11 +3553,9 @@ static void ice_napi_add(struct ice_vsi *vsi)
 	if (!vsi->netdev)
 		return;
 
-	ice_for_each_q_vector(vsi, v_idx) {
+	ice_for_each_q_vector(vsi, v_idx)
 		netif_napi_add(vsi->netdev, &vsi->q_vectors[v_idx]->napi,
 			       ice_napi_poll);
-		__ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
-	}
 }
 
 /**
@@ -5535,7 +5533,9 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
 		if (ret)
 			goto err_reinit;
 		ice_vsi_map_rings_to_vectors(pf->vsi[v]);
+		rtnl_lock();
 		ice_vsi_set_napi_queues(pf->vsi[v]);
+		rtnl_unlock();
 	}
 
 	ret = ice_req_irq_msix_misc(pf);
@@ -5549,8 +5549,12 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
 
 err_reinit:
 	while (v--)
-		if (pf->vsi[v])
+		if (pf->vsi[v]) {
+			rtnl_lock();
+			ice_vsi_clear_napi_queues(pf->vsi[v]);
+			rtnl_unlock();
 			ice_vsi_free_q_vectors(pf->vsi[v]);
+		}
 
 	return ret;
 }
@@ -5615,6 +5619,9 @@ static int ice_suspend(struct device *dev)
 	ice_for_each_vsi(pf, v) {
 		if (!pf->vsi[v])
 			continue;
+		rtnl_lock();
+		ice_vsi_clear_napi_queues(pf->vsi[v]);
+		rtnl_unlock();
 		ice_vsi_free_q_vectors(pf->vsi[v]);
 	}
 	ice_clear_interrupt_scheme(pf);
@@ -7450,6 +7457,8 @@ int ice_vsi_open(struct ice_vsi *vsi)
 		err = netif_set_real_num_rx_queues(vsi->netdev, vsi->num_rxq);
 		if (err)
 			goto err_set_qs;
+
+		ice_vsi_set_napi_queues(vsi);
 	}
 
 	err = ice_up_complete(vsi);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex
  2024-07-24 16:48 [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
  2024-07-24 16:48 ` [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
@ 2024-07-24 16:48 ` Larysa Zaremba
  2024-07-24 18:24   ` Jacob Keller
                     ` (2 more replies)
  2024-07-24 16:48 ` [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-07-24 16:48 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Larysa Zaremba, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Maciej Fijalkowski, netdev, linux-kernel, bpf, magnus.karlsson,
	Michal Kubiak, Wojciech Drewek, Amritha Nambiar

The main threat to data consistency in ice_xdp() is a possible asynchronous
PF reset. It can be triggered by a user or by TX timeout handler.

XDP setup and PF reset code access the same resources in the following
sections:
* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
* ice_vsi_rebuild() for the PF VSI - not protected
* ice_vsi_open() - already rtnl-locked

With an unfortunate timing, such accesses can result in a crash such as the
one below:

[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
[ +0.394718] ice 0000:b1:00.0: PTP reset successful
[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ +0.000045] #PF: supervisor read access in kernel mode
[ +0.000023] #PF: error_code(0x0000) - not-present page
[ +0.000023] PGD 0 P4D 0
[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000036] Workqueue: ice ice_service_task [ice]
[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
[...]
[ +0.000013] Call Trace:
[ +0.000016] <TASK>
[ +0.000014] ? __die+0x1f/0x70
[ +0.000029] ? page_fault_oops+0x171/0x4f0
[ +0.000029] ? schedule+0x3b/0xd0
[ +0.000027] ? exc_page_fault+0x7b/0x180
[ +0.000022] ? asm_exc_page_fault+0x22/0x30
[ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
[ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
[ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
[ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
[ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[ +0.000145] ice_rebuild+0x18c/0x840 [ice]
[ +0.000145] ? delay_tsc+0x4a/0xc0
[ +0.000022] ? delay_tsc+0x92/0xc0
[ +0.000020] ice_do_reset+0x140/0x180 [ice]
[ +0.000886] ice_service_task+0x404/0x1030 [ice]
[ +0.000824] process_one_work+0x171/0x340
[ +0.000685] worker_thread+0x277/0x3a0
[ +0.000675] ? preempt_count_add+0x6a/0xa0
[ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
[ +0.000679] ? __pfx_worker_thread+0x10/0x10
[ +0.000653] kthread+0xf0/0x120
[ +0.000635] ? __pfx_kthread+0x10/0x10
[ +0.000616] ret_from_fork+0x2d/0x50
[ +0.000612] ? __pfx_kthread+0x10/0x10
[ +0.000604] ret_from_fork_asm+0x1b/0x30
[ +0.000604] </TASK>

The previous way of handling this through returning -EBUSY is not viable,
particularly when destroying AF_XDP socket, because the kernel proceeds
with removal anyway.

There is plenty of code between those calls and there is no need to create
a large critical section that covers all of them, same as there is no need
to protect ice_vsi_rebuild() with rtnl_lock().

Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().

Leaving unprotected sections in between would result in two states that
have to be considered:
1. when the VSI is closed, but not yet rebuild
2. when VSI is already rebuild, but not yet open

The latter case is actually already handled through !netif_running() case,
we just need to adjust flag checking a little. The former one is not as
trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
hardware interaction happens, this can make adding/deleting rings exit
with an error. Luckily, VSI rebuild is pending and can apply new
configuration for us in a managed fashion.

Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
indicate that ice_xdp() can just hot-swap the program.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |  2 ++
 drivers/net/ethernet/intel/ice/ice_lib.c  | 26 +++++++++++++++--------
 drivers/net/ethernet/intel/ice/ice_main.c | 19 ++++++++++++-----
 drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 ++-
 4 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 99a75a59078e..3d7a0abc13ab 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -318,6 +318,7 @@ enum ice_vsi_state {
 	ICE_VSI_UMAC_FLTR_CHANGED,
 	ICE_VSI_MMAC_FLTR_CHANGED,
 	ICE_VSI_PROMISC_CHANGED,
+	ICE_VSI_REBUILD_PENDING,
 	ICE_VSI_STATE_NBITS		/* must be last */
 };
 
@@ -411,6 +412,7 @@ struct ice_vsi {
 	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
 	u16 num_xdp_txq;		 /* Used XDP queues */
 	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
+	struct mutex xdp_state_lock;
 
 	struct net_device **target_netdevs;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 5f2ddcaf7031..bbef5ec67cae 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -447,6 +447,7 @@ static void ice_vsi_free(struct ice_vsi *vsi)
 
 	ice_vsi_free_stats(vsi);
 	ice_vsi_free_arrays(vsi);
+	mutex_destroy(&vsi->xdp_state_lock);
 	mutex_unlock(&pf->sw_mutex);
 	devm_kfree(dev, vsi);
 }
@@ -626,6 +627,8 @@ static struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf)
 	pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
 					 pf->next_vsi);
 
+	mutex_init(&vsi->xdp_state_lock);
+
 unlock_pf:
 	mutex_unlock(&pf->sw_mutex);
 	return vsi;
@@ -2973,19 +2976,24 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
 	if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
 		return -EINVAL;
 
+	mutex_lock(&vsi->xdp_state_lock);
+	clear_bit(ICE_VSI_REBUILD_PENDING, vsi->state);
+
 	ret = ice_vsi_realloc_stat_arrays(vsi);
 	if (ret)
-		goto err_vsi_cfg;
+		goto unlock;
 
 	ice_vsi_decfg(vsi);
 	ret = ice_vsi_cfg_def(vsi);
 	if (ret)
-		goto err_vsi_cfg;
+		goto unlock;
 
 	coalesce = kcalloc(vsi->num_q_vectors,
 			   sizeof(struct ice_coalesce_stored), GFP_KERNEL);
-	if (!coalesce)
-		return -ENOMEM;
+	if (!coalesce) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
 
 	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
 
@@ -2996,19 +3004,19 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
 			goto err_vsi_cfg_tc_lan;
 		}
 
-		kfree(coalesce);
-		return ice_schedule_reset(pf, ICE_RESET_PFR);
+		ret = ice_schedule_reset(pf, ICE_RESET_PFR);
+		goto err_vsi_cfg_tc_lan;
 	}
 
 	ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors);
 	kfree(coalesce);
-
-	return 0;
+	goto unlock;
 
 err_vsi_cfg_tc_lan:
 	ice_vsi_decfg(vsi);
 	kfree(coalesce);
-err_vsi_cfg:
+unlock:
+	mutex_unlock(&vsi->xdp_state_lock);
 	return ret;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8ed1798bb06e..e50526b491fc 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -611,6 +611,7 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
 	/* clear SW filtering DB */
 	ice_clear_hw_tbls(hw);
 	/* disable the VSIs and their queues that are not already DOWN */
+	set_bit(ICE_VSI_REBUILD_PENDING, ice_get_main_vsi(pf)->state);
 	ice_pf_dis_all_vsi(pf, false);
 
 	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
@@ -3011,7 +3012,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	/* hot swap progs and avoid toggling link */
-	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
+	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
+	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
 		ice_vsi_assign_bpf_prog(vsi, prog);
 		return 0;
 	}
@@ -3083,21 +3085,28 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	struct ice_netdev_priv *np = netdev_priv(dev);
 	struct ice_vsi *vsi = np->vsi;
+	int ret;
 
 	if (vsi->type != ICE_VSI_PF) {
 		NL_SET_ERR_MSG_MOD(xdp->extack, "XDP can be loaded only on PF VSI");
 		return -EINVAL;
 	}
 
+	mutex_lock(&vsi->xdp_state_lock);
+
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
-		return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
+		ret = ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
+		break;
 	case XDP_SETUP_XSK_POOL:
-		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
-					  xdp->xsk.queue_id);
+		ret = ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id);
+		break;
 	default:
-		return -EINVAL;
+		ret = -EINVAL;
 	}
+
+	mutex_unlock(&vsi->xdp_state_lock);
+	return ret;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index a65955eb23c0..2c1a843ba200 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -379,7 +379,8 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 		goto failure;
 	}
 
-	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
+	if_running = !test_bit(ICE_VSI_DOWN, vsi->state) &&
+		     ice_is_xdp_ena_vsi(vsi);
 
 	if (if_running) {
 		struct ice_rx_ring *rx_ring = vsi->rx_rings[qid];
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring
  2024-07-24 16:48 [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
  2024-07-24 16:48 ` [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
  2024-07-24 16:48 ` [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
@ 2024-07-24 16:48 ` Larysa Zaremba
  2024-07-24 18:25   ` Jacob Keller
                     ` (2 more replies)
  2024-07-24 16:48 ` [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-07-24 16:48 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Larysa Zaremba, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Maciej Fijalkowski, netdev, linux-kernel, bpf, magnus.karlsson,
	Michal Kubiak, Wojciech Drewek, Amritha Nambiar

If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
VSI without applying new ring configuration. When unconfiguring the VSI, we
can encounter the state in which there is an XDP program but no XDP rings
to destroy or there will be XDP rings that need to be destroyed, but no XDP
program to indicate their presence.

When unconfiguring, rely on the presence of XDP rings rather then XDP
program, as they better represent the current state that has to be
destroyed.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c  | 4 ++--
 drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index bbef5ec67cae..f9852f1a136e 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2419,7 +2419,7 @@ void ice_vsi_decfg(struct ice_vsi *vsi)
 		dev_err(ice_pf_to_dev(pf), "Failed to remove RDMA scheduler config for VSI %u, err %d\n",
 			vsi->vsi_num, err);
 
-	if (ice_is_xdp_ena_vsi(vsi))
+	if (vsi->xdp_rings)
 		/* return value check can be skipped here, it always returns
 		 * 0 if reset is in progress
 		 */
@@ -2521,7 +2521,7 @@ static void ice_vsi_release_msix(struct ice_vsi *vsi)
 		for (q = 0; q < q_vector->num_ring_tx; q++) {
 			ice_write_itr(&q_vector->tx, 0);
 			wr32(hw, QINT_TQCTL(vsi->txq_map[txq]), 0);
-			if (ice_is_xdp_ena_vsi(vsi)) {
+			if (vsi->xdp_rings) {
 				u32 xdp_txq = txq + vsi->num_xdp_txq;
 
 				wr32(hw, QINT_TQCTL(vsi->txq_map[xdp_txq]), 0);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e50526b491fc..d7cc641643f8 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -7244,7 +7244,7 @@ int ice_down(struct ice_vsi *vsi)
 	if (tx_err)
 		netdev_err(vsi->netdev, "Failed stop Tx rings, VSI %d error %d\n",
 			   vsi->vsi_num, tx_err);
-	if (!tx_err && ice_is_xdp_ena_vsi(vsi)) {
+	if (!tx_err && vsi->xdp_rings) {
 		tx_err = ice_vsi_stop_xdp_tx_rings(vsi);
 		if (tx_err)
 			netdev_err(vsi->netdev, "Failed stop XDP rings, VSI %d error %d\n",
@@ -7261,7 +7261,7 @@ int ice_down(struct ice_vsi *vsi)
 	ice_for_each_txq(vsi, i)
 		ice_clean_tx_ring(vsi->tx_rings[i]);
 
-	if (ice_is_xdp_ena_vsi(vsi))
+	if (vsi->xdp_rings)
 		ice_for_each_xdp_txq(vsi, i)
 			ice_clean_tx_ring(vsi->xdp_rings[i]);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 2c1a843ba200..5dd50a2866cc 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -39,7 +39,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
 	       sizeof(vsi_stat->rx_ring_stats[q_idx]->rx_stats));
 	memset(&vsi_stat->tx_ring_stats[q_idx]->stats, 0,
 	       sizeof(vsi_stat->tx_ring_stats[q_idx]->stats));
-	if (ice_is_xdp_ena_vsi(vsi))
+	if (vsi->xdp_rings)
 		memset(&vsi->xdp_rings[q_idx]->ring_stats->stats, 0,
 		       sizeof(vsi->xdp_rings[q_idx]->ring_stats->stats));
 }
@@ -52,7 +52,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
 static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx)
 {
 	ice_clean_tx_ring(vsi->tx_rings[q_idx]);
-	if (ice_is_xdp_ena_vsi(vsi)) {
+	if (vsi->xdp_rings) {
 		synchronize_rcu();
 		ice_clean_tx_ring(vsi->xdp_rings[q_idx]);
 	}
@@ -189,7 +189,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 	err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, 0, tx_ring, &txq_meta);
 	if (err)
 		return err;
-	if (ice_is_xdp_ena_vsi(vsi)) {
+	if (vsi->xdp_rings) {
 		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
 
 		memset(&txq_meta, 0, sizeof(txq_meta));
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-07-24 16:48 [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
                   ` (2 preceding siblings ...)
  2024-07-24 16:48 ` [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
@ 2024-07-24 16:48 ` Larysa Zaremba
  2024-07-24 18:27   ` Jacob Keller
  2024-08-08  2:15   ` [Intel-wired-lan] " Rout, ChandanX
  2024-07-24 16:48 ` [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
  2024-07-24 16:48 ` [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
  5 siblings, 2 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-07-24 16:48 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Larysa Zaremba, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Maciej Fijalkowski, netdev, linux-kernel, bpf, magnus.karlsson,
	Michal Kubiak, Wojciech Drewek, Amritha Nambiar

Consider the following scenario:

.ndo_bpf()		| ice_prepare_for_reset()		|
________________________|_______________________________________|
rtnl_lock()		|					|
ice_down()		|					|
			| test_bit(ICE_VSI_DOWN) - true		|
			| ice_dis_vsi() returns			|
ice_up()		|					|
			| proceeds to rebuild a running VSI	|

.ndo_bpf() is not the only rtnl-locked callback that toggles the interface
to apply new configuration. Another example is .set_channels().

To avoid the race condition above, act only after reading ICE_VSI_DOWN
under rtnl_lock.

Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index f9852f1a136e..b773078ad81a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2665,8 +2665,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
  */
 void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
 {
-	if (test_bit(ICE_VSI_DOWN, vsi->state))
-		return;
+	bool already_down = test_bit(ICE_VSI_DOWN, vsi->state);
 
 	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
 
@@ -2674,15 +2673,16 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
 		if (netif_running(vsi->netdev)) {
 			if (!locked)
 				rtnl_lock();
-
-			ice_vsi_close(vsi);
+			already_down = test_bit(ICE_VSI_DOWN, vsi->state);
+			if (!already_down)
+				ice_vsi_close(vsi);
 
 			if (!locked)
 				rtnl_unlock();
-		} else {
+		} else if (!already_down) {
 			ice_vsi_close(vsi);
 		}
-	} else if (vsi->type == ICE_VSI_CTRL) {
+	} else if (vsi->type == ICE_VSI_CTRL && !already_down) {
 		ice_vsi_close(vsi);
 	}
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-07-24 16:48 [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
                   ` (3 preceding siblings ...)
  2024-07-24 16:48 ` [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
@ 2024-07-24 16:48 ` Larysa Zaremba
  2024-07-24 18:37   ` Jacob Keller
                     ` (2 more replies)
  2024-07-24 16:48 ` [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
  5 siblings, 3 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-07-24 16:48 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Larysa Zaremba, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Maciej Fijalkowski, netdev, linux-kernel, bpf, magnus.karlsson,
	Michal Kubiak, Wojciech Drewek, Amritha Nambiar

Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
state, not VSI one. Therefore it does not protect the queue pair from
e.g. reset.

Despite being useless, it still can deadlock the unfortunate functions that
have fell into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena
returns an error.

Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 5dd50a2866cc..d23fd4ea9129 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -163,7 +163,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 	struct ice_q_vector *q_vector;
 	struct ice_tx_ring *tx_ring;
 	struct ice_rx_ring *rx_ring;
-	int timeout = 50;
 	int err;
 
 	if (q_idx >= vsi->num_rxq || q_idx >= vsi->num_txq)
@@ -173,13 +172,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 	rx_ring = vsi->rx_rings[q_idx];
 	q_vector = rx_ring->q_vector;
 
-	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
-		timeout--;
-		if (!timeout)
-			return -EBUSY;
-		usleep_range(1000, 2000);
-	}
-
 	ice_qvec_dis_irq(vsi, rx_ring, q_vector);
 	ice_qvec_toggle_napi(vsi, q_vector, false);
 
@@ -250,7 +242,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 	ice_qvec_ena_irq(vsi, q_vector);
 
 	netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
-	clear_bit(ICE_CFG_BUSY, vsi->state);
 
 	return 0;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup
  2024-07-24 16:48 [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
                   ` (4 preceding siblings ...)
  2024-07-24 16:48 ` [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
@ 2024-07-24 16:48 ` Larysa Zaremba
  2024-07-24 18:40   ` Jacob Keller
  2024-08-08  2:14   ` [Intel-wired-lan] " Rout, ChandanX
  5 siblings, 2 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-07-24 16:48 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Larysa Zaremba, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Maciej Fijalkowski, netdev, linux-kernel, bpf, magnus.karlsson,
	Michal Kubiak, Wojciech Drewek, Amritha Nambiar

After XDP configuration is completed, we bring the interface up
unconditionally, regardless of its state before the call to .ndo_bpf().

Preserve the information whether the interface had to be brought down and
later bring it up only in such case.

Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index d7cc641643f8..d83cde431fa5 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3000,8 +3000,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 		   struct netlink_ext_ack *extack)
 {
 	unsigned int frame_size = vsi->netdev->mtu + ICE_ETH_PKT_HDR_PAD;
-	bool if_running = netif_running(vsi->netdev);
 	int ret = 0, xdp_ring_err = 0;
+	bool if_running;
 
 	if (prog && !prog->aux->xdp_has_frags) {
 		if (frame_size > ice_max_xdp_frame_size(vsi)) {
@@ -3018,8 +3018,11 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 		return 0;
 	}
 
+	if_running = netif_running(vsi->netdev) &&
+		     !test_and_set_bit(ICE_VSI_DOWN, vsi->state);
+
 	/* need to stop netdev while setting up the program for Rx rings */
-	if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) {
+	if (if_running) {
 		ret = ice_down(vsi);
 		if (ret) {
 			NL_SET_ERR_MSG_MOD(extack, "Preparing device for XDP attach failed");
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-07-24 16:48 ` [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
@ 2024-07-24 18:21   ` Jacob Keller
  2024-07-24 20:40   ` Nambiar, Amritha
  2024-08-08  2:19   ` [Intel-wired-lan] " Rout, ChandanX
  2 siblings, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2024-07-24 18:21 UTC (permalink / raw)
  To: Larysa Zaremba, intel-wired-lan
  Cc: Tony Nguyen, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar



On 7/24/2024 9:48 AM, Larysa Zaremba wrote:
> Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is
> not rtnl-locked when called from the reset. This creates the need to take
> the rtnl_lock just for a single function and complicates the
> synchronization with .ndo_bpf. At the same time, there no actual need to
> fill napi-to-queue information at this exact point.
> 
> Fill napi-to-queue information when opening the VSI and clear it when the
> VSI is being closed. Those routines are already rtnl-locked.
> 
> Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
> XDP queues, as this leads to out-of-bounds writes, such as one below.
> 
> [  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
> [  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
> [  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> [  +0.000003] Call Trace:
> [  +0.000003]  <TASK>
> [  +0.000002]  dump_stack_lvl+0x60/0x80
> [  +0.000007]  print_report+0xce/0x630
> [  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
> [  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000003]  kasan_report+0xe9/0x120
> [  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
> [  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
> [  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
> [  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
> [  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
> [  +0.000006]  pci_reset_function+0x12d/0x230
> [  +0.000004]  reset_store+0xa0/0x100
> [  +0.000006]  ? __pfx_reset_store+0x10/0x10
> [  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
> [  +0.000004]  ? __check_object_size+0x4c1/0x640
> [  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
> [  +0.000006]  vfs_write+0x5d6/0xdf0
> [  +0.000005]  ? fd_install+0x180/0x350
> [  +0.000005]  ? __pfx_vfs_write+0x10/0xA10
> [  +0.000004]  ? do_fcntl+0x52c/0xcd0
> [  +0.000004]  ? kasan_save_track+0x13/0x60
> [  +0.000003]  ? kasan_save_free_info+0x37/0x60
> [  +0.000006]  ksys_write+0xfa/0x1d0
> [  +0.000003]  ? __pfx_ksys_write+0x10/0x10
> [  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
> [  +0.000004]  ? _raw_spin_lock+0x87/0xe0
> [  +0.000005]  do_syscall_64+0x80/0x170
> [  +0.000007]  ? _raw_spin_lock+0x87/0xe0
> [  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
> [  +0.000003]  ? file_close_fd_locked+0x167/0x230
> [  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  +0.000005]  ? do_syscall_64+0x8c/0x170
> [  +0.000004]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? fput+0x1a/0x2c0
> [  +0.000004]  ? filp_close+0x19/0x30
> [  +0.000004]  ? do_dup2+0x25a/0x4c0
> [  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
> [  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  +0.000004]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? __count_memcg_events+0x113/0x380
> [  +0.000005]  ? handle_mm_fault+0x136/0x820
> [  +0.000005]  ? do_user_addr_fault+0x444/0xa80
> [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> [  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  +0.000005] RIP: 0033:0x7f2033593154
> 
> Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---

Much simpler method, good fix and cleanup!

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex
  2024-07-24 16:48 ` [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
@ 2024-07-24 18:24   ` Jacob Keller
  2024-08-08  2:16   ` [Intel-wired-lan] " Rout, ChandanX
  2024-08-13 11:31   ` Maciej Fijalkowski
  2 siblings, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2024-07-24 18:24 UTC (permalink / raw)
  To: Larysa Zaremba, intel-wired-lan
  Cc: Tony Nguyen, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar



On 7/24/2024 9:48 AM, Larysa Zaremba wrote:
> The main threat to data consistency in ice_xdp() is a possible asynchronous
> PF reset. It can be triggered by a user or by TX timeout handler.
> 
> XDP setup and PF reset code access the same resources in the following
> sections:
> * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
> * ice_vsi_rebuild() for the PF VSI - not protected
> * ice_vsi_open() - already rtnl-locked
> 
> With an unfortunate timing, such accesses can result in a crash such as the
> one below:
> 
> [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
> [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
> [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
> [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
> [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
> [ +0.394718] ice 0000:b1:00.0: PTP reset successful
> [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
> [ +0.000045] #PF: supervisor read access in kernel mode
> [ +0.000023] #PF: error_code(0x0000) - not-present page
> [ +0.000023] PGD 0 P4D 0
> [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
> [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> [ +0.000036] Workqueue: ice ice_service_task [ice]
> [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
> [...]
> [ +0.000013] Call Trace:
> [ +0.000016] <TASK>
> [ +0.000014] ? __die+0x1f/0x70
> [ +0.000029] ? page_fault_oops+0x171/0x4f0
> [ +0.000029] ? schedule+0x3b/0xd0
> [ +0.000027] ? exc_page_fault+0x7b/0x180
> [ +0.000022] ? asm_exc_page_fault+0x22/0x30
> [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
> [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
> [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
> [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
> [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
> [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
> [ +0.000145] ice_rebuild+0x18c/0x840 [ice]
> [ +0.000145] ? delay_tsc+0x4a/0xc0
> [ +0.000022] ? delay_tsc+0x92/0xc0
> [ +0.000020] ice_do_reset+0x140/0x180 [ice]
> [ +0.000886] ice_service_task+0x404/0x1030 [ice]
> [ +0.000824] process_one_work+0x171/0x340
> [ +0.000685] worker_thread+0x277/0x3a0
> [ +0.000675] ? preempt_count_add+0x6a/0xa0
> [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
> [ +0.000679] ? __pfx_worker_thread+0x10/0x10
> [ +0.000653] kthread+0xf0/0x120
> [ +0.000635] ? __pfx_kthread+0x10/0x10
> [ +0.000616] ret_from_fork+0x2d/0x50
> [ +0.000612] ? __pfx_kthread+0x10/0x10
> [ +0.000604] ret_from_fork_asm+0x1b/0x30
> [ +0.000604] </TASK>
> 
> The previous way of handling this through returning -EBUSY is not viable,
> particularly when destroying AF_XDP socket, because the kernel proceeds
> with removal anyway.
> 
> There is plenty of code between those calls and there is no need to create
> a large critical section that covers all of them, same as there is no need
> to protect ice_vsi_rebuild() with rtnl_lock().
> 
> Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
> 
> Leaving unprotected sections in between would result in two states that
> have to be considered:
> 1. when the VSI is closed, but not yet rebuild
> 2. when VSI is already rebuild, but not yet open
> 
> The latter case is actually already handled through !netif_running() case,
> we just need to adjust flag checking a little. The former one is not as
> trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
> hardware interaction happens, this can make adding/deleting rings exit
> with an error. Luckily, VSI rebuild is pending and can apply new
> configuration for us in a managed fashion.
> 
> Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
> indicate that ice_xdp() can just hot-swap the program.
> 
> Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> Fixes: efc2214b6047 ("ice: Add support for XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring
  2024-07-24 16:48 ` [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
@ 2024-07-24 18:25   ` Jacob Keller
  2024-08-08  2:18   ` [Intel-wired-lan] " Rout, ChandanX
  2024-08-12 12:58   ` Maciej Fijalkowski
  2 siblings, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2024-07-24 18:25 UTC (permalink / raw)
  To: Larysa Zaremba, intel-wired-lan
  Cc: Tony Nguyen, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar



On 7/24/2024 9:48 AM, Larysa Zaremba wrote:
> If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
> VSI without applying new ring configuration. When unconfiguring the VSI, we
> can encounter the state in which there is an XDP program but no XDP rings
> to destroy or there will be XDP rings that need to be destroyed, but no XDP
> program to indicate their presence.
> 
> When unconfiguring, rely on the presence of XDP rings rather then XDP
> program, as they better represent the current state that has to be
> destroyed.
> 
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---

Right. When operating on the rings, we should be checking xdp_rings. Ok.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-07-24 16:48 ` [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
@ 2024-07-24 18:27   ` Jacob Keller
  2024-08-08  2:15   ` [Intel-wired-lan] " Rout, ChandanX
  1 sibling, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2024-07-24 18:27 UTC (permalink / raw)
  To: Larysa Zaremba, intel-wired-lan
  Cc: Tony Nguyen, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar



On 7/24/2024 9:48 AM, Larysa Zaremba wrote:
> Consider the following scenario:
> 
> .ndo_bpf()		| ice_prepare_for_reset()		|
> ________________________|_______________________________________|
> rtnl_lock()		|					|
> ice_down()		|					|
> 			| test_bit(ICE_VSI_DOWN) - true		|
> 			| ice_dis_vsi() returns			|
> ice_up()		|					|
> 			| proceeds to rebuild a running VSI	|
> 
> .ndo_bpf() is not the only rtnl-locked callback that toggles the interface
> to apply new configuration. Another example is .set_channels().
> 
> To avoid the race condition above, act only after reading ICE_VSI_DOWN
> under rtnl_lock.
> 
> Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index f9852f1a136e..b773078ad81a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2665,8 +2665,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked)
>   */
>  void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
>  {
> -	if (test_bit(ICE_VSI_DOWN, vsi->state))
> -		return;
> +	bool already_down = test_bit(ICE_VSI_DOWN, vsi->state);
>  

Do we need to initialize already_down? I guess its because the other
initialization happens inside the conditional which may not be executed
in every flow. Ok.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

>  	set_bit(ICE_VSI_NEEDS_RESTART, vsi->state);
>  
> @@ -2674,15 +2673,16 @@ void ice_dis_vsi(struct ice_vsi *vsi, bool locked)
>  		if (netif_running(vsi->netdev)) {
>  			if (!locked)
>  				rtnl_lock();
> -
> -			ice_vsi_close(vsi);
> +			already_down = test_bit(ICE_VSI_DOWN, vsi->state);
> +			if (!already_down)
> +				ice_vsi_close(vsi);
>  
>  			if (!locked)
>  				rtnl_unlock();
> -		} else {
> +		} else if (!already_down) {
>  			ice_vsi_close(vsi);
>  		}
> -	} else if (vsi->type == ICE_VSI_CTRL) {
> +	} else if (vsi->type == ICE_VSI_CTRL && !already_down) {
>  		ice_vsi_close(vsi);
>  	}
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-07-24 16:48 ` [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
@ 2024-07-24 18:37   ` Jacob Keller
  2024-08-08  2:17   ` [Intel-wired-lan] " Rout, ChandanX
  2024-08-12 13:03   ` Maciej Fijalkowski
  2 siblings, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2024-07-24 18:37 UTC (permalink / raw)
  To: Larysa Zaremba, intel-wired-lan
  Cc: Tony Nguyen, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar



On 7/24/2024 9:48 AM, Larysa Zaremba wrote:
> Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
> because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
> state, not VSI one. Therefore it does not protect the queue pair from
> e.g. reset.
> 

Yea, unfortunately a lot of places accidentally use the wrong flags. I
wonder if this is something sparse could help with identifying by having
the flags tagged in some way...

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

> Despite being useless, it still can deadlock the unfortunate functions that
> have fell into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena
> returns an error.
> 

This wording makes it sound like other functions have this issue. Is it
only these two left?

Seems like there are a few other places which check this:

> ice_xsk.c
> 176:    while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
> 253:    clear_bit(ICE_CFG_BUSY, vsi->state);
> 

These two are fixed by your patch.

> ice_main.c
> 334:    while (test_and_set_bit(ICE_CFG_BUSY, vsi->state))
> 475:    clear_bit(ICE_CFG_BUSY, vsi->state);

These two appear to be ice_vsi_sync_fltr.

> 3791:   while (test_and_set_bit(ICE_CFG_BUSY, vsi->state))
> 3828:   clear_bit(ICE_CFG_BUSY, vsi->state);

These two appear to be ice_vlan_rx_add_vid.

> 3854:   while (test_and_set_bit(ICE_CFG_BUSY, vsi->state))
> 3897:   clear_bit(ICE_CFG_BUSY, vsi->state);

These two appear to be ice_vlan_rx_kill_vid.

> ice.h
> 299:    ICE_CFG_BUSY,
>

This is part of the ice_pf_state enumeration. So yes, we really
shouldn't be checking it in the vsi->state. In the strictest sense this
could be leading to a out-of-bounds read or set, but we happen to luck
into working because the DECLARE_BITMAP uses longs so there is junk data
after the end of the actual state bit size. The bit functions don't get
passed the size so can't have annotations which would catch this.
 Obviously not your fault, and don't need to be fixed in this series,
but its at least a semantic bug if not actually trigger-able by
anything. It looks like VLAN functions *are* using this flag
intentionally, if incorrectly. Its unclear what the correct fix is to me
offhand. Perhaps just creating a VSI specific flag for VLANs... or
perhaps replacing the flag with a regular synchronization primitive....

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

> Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
> 
> Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index 5dd50a2866cc..d23fd4ea9129 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -163,7 +163,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	struct ice_q_vector *q_vector;
>  	struct ice_tx_ring *tx_ring;
>  	struct ice_rx_ring *rx_ring;
> -	int timeout = 50;
>  	int err;
>  
>  	if (q_idx >= vsi->num_rxq || q_idx >= vsi->num_txq)
> @@ -173,13 +172,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	rx_ring = vsi->rx_rings[q_idx];
>  	q_vector = rx_ring->q_vector;
>  
> -	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
> -		timeout--;
> -		if (!timeout)
> -			return -EBUSY;
> -		usleep_range(1000, 2000);
> -	}
> -
>  	ice_qvec_dis_irq(vsi, rx_ring, q_vector);
>  	ice_qvec_toggle_napi(vsi, q_vector, false);
>  
> @@ -250,7 +242,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>  	ice_qvec_ena_irq(vsi, q_vector);
>  
>  	netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
> -	clear_bit(ICE_CFG_BUSY, vsi->state);
>  
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup
  2024-07-24 16:48 ` [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
@ 2024-07-24 18:40   ` Jacob Keller
  2024-08-08  2:14   ` [Intel-wired-lan] " Rout, ChandanX
  1 sibling, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2024-07-24 18:40 UTC (permalink / raw)
  To: Larysa Zaremba, intel-wired-lan
  Cc: Tony Nguyen, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar



On 7/24/2024 9:48 AM, Larysa Zaremba wrote:
> After XDP configuration is completed, we bring the interface up
> unconditionally, regardless of its state before the call to .ndo_bpf().
> 
> Preserve the information whether the interface had to be brought down and
> later bring it up only in such case.
> 
> Fixes: efc2214b6047 ("ice: Add support for XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index d7cc641643f8..d83cde431fa5 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -3000,8 +3000,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>  		   struct netlink_ext_ack *extack)
>  {
>  	unsigned int frame_size = vsi->netdev->mtu + ICE_ETH_PKT_HDR_PAD;
> -	bool if_running = netif_running(vsi->netdev);
>  	int ret = 0, xdp_ring_err = 0;
> +	bool if_running;
>  
>  	if (prog && !prog->aux->xdp_has_frags) {
>  		if (frame_size > ice_max_xdp_frame_size(vsi)) {
> @@ -3018,8 +3018,11 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>  		return 0;
>  	}
>  
> +	if_running = netif_running(vsi->netdev) &&
> +		     !test_and_set_bit(ICE_VSI_DOWN, vsi->state);
> +
>  	/* need to stop netdev while setting up the program for Rx rings */
> -	if (if_running && !test_and_set_bit(ICE_VSI_DOWN, vsi->state)) {
> +	if (if_running) {
>  		ret = ice_down(vsi);
>  		if (ret) {
>  			NL_SET_ERR_MSG_MOD(extack, "Preparing device for XDP attach failed");

This diff lacks the appropriate context that can help explain how it works:

> 
>         if (if_running)
>                 ret = ice_up(vsi);
> 
>         if (!ret && prog)
>                 ice_vsi_rx_napi_schedule(vsi);
> 
>         return (ret || xdp_ring_err) ? -ENOMEM : 0;

By changing if_running to include the VSI state flag, we ensure the call
to ice_up is correctly conditional.

Good fix. If I understand right, it depends on the other synchronization
fixes to work correctly so that the VSI state won't be changed on us.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-07-24 16:48 ` [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
  2024-07-24 18:21   ` Jacob Keller
@ 2024-07-24 20:40   ` Nambiar, Amritha
  2024-08-08  2:19   ` [Intel-wired-lan] " Rout, ChandanX
  2 siblings, 0 replies; 27+ messages in thread
From: Nambiar, Amritha @ 2024-07-24 20:40 UTC (permalink / raw)
  To: Larysa Zaremba, intel-wired-lan
  Cc: Tony Nguyen, David S. Miller, Jacob Keller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Maciej Fijalkowski,
	netdev, linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek

On 7/24/2024 9:48 AM, Larysa Zaremba wrote:
> Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is
> not rtnl-locked when called from the reset. This creates the need to take
> the rtnl_lock just for a single function and complicates the
> synchronization with .ndo_bpf. At the same time, there no actual need to
> fill napi-to-queue information at this exact point.
> 
> Fill napi-to-queue information when opening the VSI and clear it when the
> VSI is being closed. Those routines are already rtnl-locked.
> 
> Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
> XDP queues, as this leads to out-of-bounds writes, such as one below.
> 
> [  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
> [  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
> [  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> [  +0.000003] Call Trace:
> [  +0.000003]  <TASK>
> [  +0.000002]  dump_stack_lvl+0x60/0x80
> [  +0.000007]  print_report+0xce/0x630
> [  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
> [  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000003]  kasan_report+0xe9/0x120
> [  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
> [  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
> [  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
> [  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
> [  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
> [  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
> [  +0.000006]  pci_reset_function+0x12d/0x230
> [  +0.000004]  reset_store+0xa0/0x100
> [  +0.000006]  ? __pfx_reset_store+0x10/0x10
> [  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
> [  +0.000004]  ? __check_object_size+0x4c1/0x640
> [  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
> [  +0.000006]  vfs_write+0x5d6/0xdf0
> [  +0.000005]  ? fd_install+0x180/0x350
> [  +0.000005]  ? __pfx_vfs_write+0x10/0xA10
> [  +0.000004]  ? do_fcntl+0x52c/0xcd0
> [  +0.000004]  ? kasan_save_track+0x13/0x60
> [  +0.000003]  ? kasan_save_free_info+0x37/0x60
> [  +0.000006]  ksys_write+0xfa/0x1d0
> [  +0.000003]  ? __pfx_ksys_write+0x10/0x10
> [  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
> [  +0.000004]  ? _raw_spin_lock+0x87/0xe0
> [  +0.000005]  do_syscall_64+0x80/0x170
> [  +0.000007]  ? _raw_spin_lock+0x87/0xe0
> [  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
> [  +0.000003]  ? file_close_fd_locked+0x167/0x230
> [  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  +0.000005]  ? do_syscall_64+0x8c/0x170
> [  +0.000004]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? fput+0x1a/0x2c0
> [  +0.000004]  ? filp_close+0x19/0x30
> [  +0.000004]  ? do_dup2+0x25a/0x4c0
> [  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
> [  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  +0.000004]  ? do_syscall_64+0x8c/0x170
> [  +0.000003]  ? __count_memcg_events+0x113/0x380
> [  +0.000005]  ? handle_mm_fault+0x136/0x820
> [  +0.000005]  ? do_user_addr_fault+0x444/0xa80
> [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> [  +0.000004]  ? clear_bhb_loop+0x25/0x80
> [  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  +0.000005] RIP: 0033:0x7f2033593154
> 
> Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
> Reviewed-by: Wojciech Drewek<wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba<larysa.zaremba@intel.com>

Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup
  2024-07-24 16:48 ` [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
  2024-07-24 18:40   ` Jacob Keller
@ 2024-08-08  2:14   ` Rout, ChandanX
  1 sibling, 0 replies; 27+ messages in thread
From: Rout, ChandanX @ 2024-08-08  2:14 UTC (permalink / raw)
  To: Zaremba, Larysa, intel-wired-lan@lists.osuosl.org
  Cc: Drewek, Wojciech, Fijalkowski, Maciej, Jesper Dangaard Brouer,
	Daniel Borkmann, netdev@vger.kernel.org, John Fastabend,
	Alexei Starovoitov, linux-kernel@vger.kernel.org, Eric Dumazet,
	Kubiak, Michal, Nguyen, Anthony L, Nambiar, Amritha,
	Keller, Jacob E, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
	David S. Miller, Karlsson, Magnus, Kuruvinakunnel, George,
	Pandey, Atul, Nagraj, Shravan



>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
>Zaremba, Larysa
>Sent: Wednesday, July 24, 2024 10:19 PM
>To: intel-wired-lan@lists.osuosl.org
>Cc: Drewek, Wojciech <wojciech.drewek@intel.com>; Fijalkowski, Maciej
><maciej.fijalkowski@intel.com>; Jesper Dangaard Brouer <hawk@kernel.org>;
>Daniel Borkmann <daniel@iogearbox.net>; Zaremba, Larysa
><larysa.zaremba@intel.com>; netdev@vger.kernel.org; John Fastabend
><john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; linux-
>kernel@vger.kernel.org; Eric Dumazet <edumazet@google.com>; Kubiak,
>Michal <michal.kubiak@intel.com>; Nguyen, Anthony L
><anthony.l.nguyen@intel.com>; Nambiar, Amritha
><amritha.nambiar@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
>Jakub Kicinski <kuba@kernel.org>; bpf@vger.kernel.org; Paolo Abeni
><pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; Karlsson,
>Magnus <magnus.karlsson@intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if
>it was down before the XDP setup
>
>After XDP configuration is completed, we bring the interface up unconditionally,
>regardless of its state before the call to .ndo_bpf().
>
>Preserve the information whether the interface had to be brought down and
>later bring it up only in such case.
>
>Fixes: efc2214b6047 ("ice: Add support for XDP")
>Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>---
> drivers/net/ethernet/intel/ice/ice_main.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>

Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  2024-07-24 16:48 ` [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
  2024-07-24 18:27   ` Jacob Keller
@ 2024-08-08  2:15   ` Rout, ChandanX
  1 sibling, 0 replies; 27+ messages in thread
From: Rout, ChandanX @ 2024-08-08  2:15 UTC (permalink / raw)
  To: Zaremba, Larysa, intel-wired-lan@lists.osuosl.org
  Cc: Drewek, Wojciech, Fijalkowski, Maciej, Jesper Dangaard Brouer,
	Daniel Borkmann, netdev@vger.kernel.org, John Fastabend,
	Alexei Starovoitov, linux-kernel@vger.kernel.org, Eric Dumazet,
	Kubiak, Michal, Nguyen, Anthony L, Nambiar, Amritha,
	Keller, Jacob E, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
	David S. Miller, Karlsson, Magnus, Pandey, Atul,
	Kuruvinakunnel, George, Nagraj, Shravan



>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
>Zaremba, Larysa
>Sent: Wednesday, July 24, 2024 10:19 PM
>To: intel-wired-lan@lists.osuosl.org
>Cc: Drewek, Wojciech <wojciech.drewek@intel.com>; Fijalkowski, Maciej
><maciej.fijalkowski@intel.com>; Jesper Dangaard Brouer <hawk@kernel.org>;
>Daniel Borkmann <daniel@iogearbox.net>; Zaremba, Larysa
><larysa.zaremba@intel.com>; netdev@vger.kernel.org; John Fastabend
><john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; linux-
>kernel@vger.kernel.org; Eric Dumazet <edumazet@google.com>; Kubiak,
>Michal <michal.kubiak@intel.com>; Nguyen, Anthony L
><anthony.l.nguyen@intel.com>; Nambiar, Amritha
><amritha.nambiar@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
>Jakub Kicinski <kuba@kernel.org>; bpf@vger.kernel.org; Paolo Abeni
><pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; Karlsson,
>Magnus <magnus.karlsson@intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN
>under rtnl_lock when preparing for reset
>
>Consider the following scenario:
>
>.ndo_bpf()		| ice_prepare_for_reset()		|
>________________________|_______________________________________|
>rtnl_lock()		|					|
>ice_down()		|					|
>			| test_bit(ICE_VSI_DOWN) - true		|
>			| ice_dis_vsi() returns			|
>ice_up()		|					|
>			| proceeds to rebuild a running VSI	|
>
>.ndo_bpf() is not the only rtnl-locked callback that toggles the interface to apply
>new configuration. Another example is .set_channels().
>
>To avoid the race condition above, act only after reading ICE_VSI_DOWN under
>rtnl_lock.
>
>Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
>Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>---
> drivers/net/ethernet/intel/ice/ice_lib.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>

Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex
  2024-07-24 16:48 ` [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
  2024-07-24 18:24   ` Jacob Keller
@ 2024-08-08  2:16   ` Rout, ChandanX
  2024-08-13 11:31   ` Maciej Fijalkowski
  2 siblings, 0 replies; 27+ messages in thread
From: Rout, ChandanX @ 2024-08-08  2:16 UTC (permalink / raw)
  To: Zaremba, Larysa, intel-wired-lan@lists.osuosl.org
  Cc: Drewek, Wojciech, Fijalkowski, Maciej, Jesper Dangaard Brouer,
	Daniel Borkmann, netdev@vger.kernel.org, John Fastabend,
	Alexei Starovoitov, linux-kernel@vger.kernel.org, Eric Dumazet,
	Kubiak, Michal, Nguyen, Anthony L, Nambiar, Amritha,
	Keller, Jacob E, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
	David S. Miller, Karlsson, Magnus, Kuruvinakunnel, George,
	Pandey, Atul, Nagraj, Shravan



>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
>Zaremba, Larysa
>Sent: Wednesday, July 24, 2024 10:19 PM
>To: intel-wired-lan@lists.osuosl.org
>Cc: Drewek, Wojciech <wojciech.drewek@intel.com>; Fijalkowski, Maciej
><maciej.fijalkowski@intel.com>; Jesper Dangaard Brouer <hawk@kernel.org>;
>Daniel Borkmann <daniel@iogearbox.net>; Zaremba, Larysa
><larysa.zaremba@intel.com>; netdev@vger.kernel.org; John Fastabend
><john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; linux-
>kernel@vger.kernel.org; Eric Dumazet <edumazet@google.com>; Kubiak,
>Michal <michal.kubiak@intel.com>; Nguyen, Anthony L
><anthony.l.nguyen@intel.com>; Nambiar, Amritha
><amritha.nambiar@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
>Jakub Kicinski <kuba@kernel.org>; bpf@vger.kernel.org; Paolo Abeni
><pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; Karlsson,
>Magnus <magnus.karlsson@intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-net v2 2/6] ice: protect XDP configuration
>with a mutex
>
>The main threat to data consistency in ice_xdp() is a possible asynchronous PF
>reset. It can be triggered by a user or by TX timeout handler.
>
>XDP setup and PF reset code access the same resources in the following
>sections:
>* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
>* ice_vsi_rebuild() for the PF VSI - not protected
>* ice_vsi_open() - already rtnl-locked
>
>With an unfortunate timing, such accesses can result in a crash such as the one
>below:
>
>[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model
>MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0:
>Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
>[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38:
>transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0
>ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU:
>0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout
>recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset
>successful [ +0.006184] BUG: kernel NULL pointer dereference, address:
>0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [
>+0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0
>P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU:
>38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031]
>Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS
>SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036]
>Workqueue: ice ice_service_task [ice] [ +0.000183] RIP:
>0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace:
>[ +0.000016] <TASK>
>[ +0.000014] ? __die+0x1f/0x70
>[ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ?
>schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022]
>? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0
>[ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186]
>ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151]
>ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
>[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145]
>ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [
>+0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180
>[ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824]
>process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [
>+0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ?
>_raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ?
>__pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [
>+0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616]
>ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [
>+0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK>
>
>The previous way of handling this through returning -EBUSY is not viable,
>particularly when destroying AF_XDP socket, because the kernel proceeds with
>removal anyway.
>
>There is plenty of code between those calls and there is no need to create a
>large critical section that covers all of them, same as there is no need to protect
>ice_vsi_rebuild() with rtnl_lock().
>
>Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
>
>Leaving unprotected sections in between would result in two states that have
>to be considered:
>1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but
>not yet open
>
>The latter case is actually already handled through !netif_running() case, we
>just need to adjust flag checking a little. The former one is not as trivial, because
>between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction
>happens, this can make adding/deleting rings exit with an error. Luckily, VSI
>rebuild is pending and can apply new configuration for us in a managed fashion.
>
>Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
>indicate that ice_xdp() can just hot-swap the program.
>
>Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
>Fixes: efc2214b6047 ("ice: Add support for XDP")
>Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>---
> drivers/net/ethernet/intel/ice/ice.h      |  2 ++
> drivers/net/ethernet/intel/ice/ice_lib.c  | 26 +++++++++++++++--------
>drivers/net/ethernet/intel/ice/ice_main.c | 19 ++++++++++++-----
>drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 ++-
> 4 files changed, 35 insertions(+), 15 deletions(-)
>

Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-07-24 16:48 ` [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
  2024-07-24 18:37   ` Jacob Keller
@ 2024-08-08  2:17   ` Rout, ChandanX
  2024-08-12 13:03   ` Maciej Fijalkowski
  2 siblings, 0 replies; 27+ messages in thread
From: Rout, ChandanX @ 2024-08-08  2:17 UTC (permalink / raw)
  To: Zaremba, Larysa, intel-wired-lan@lists.osuosl.org
  Cc: Drewek, Wojciech, Fijalkowski, Maciej, Jesper Dangaard Brouer,
	Daniel Borkmann, netdev@vger.kernel.org, John Fastabend,
	Alexei Starovoitov, linux-kernel@vger.kernel.org, Eric Dumazet,
	Kubiak, Michal, Nguyen, Anthony L, Nambiar, Amritha,
	Keller, Jacob E, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
	David S. Miller, Karlsson, Magnus, Kuruvinakunnel, George,
	Pandey, Atul, Nagraj, Shravan



>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
>Zaremba, Larysa
>Sent: Wednesday, July 24, 2024 10:19 PM
>To: intel-wired-lan@lists.osuosl.org
>Cc: Drewek, Wojciech <wojciech.drewek@intel.com>; Fijalkowski, Maciej
><maciej.fijalkowski@intel.com>; Jesper Dangaard Brouer <hawk@kernel.org>;
>Daniel Borkmann <daniel@iogearbox.net>; Zaremba, Larysa
><larysa.zaremba@intel.com>; netdev@vger.kernel.org; John Fastabend
><john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; linux-
>kernel@vger.kernel.org; Eric Dumazet <edumazet@google.com>; Kubiak,
>Michal <michal.kubiak@intel.com>; Nguyen, Anthony L
><anthony.l.nguyen@intel.com>; Nambiar, Amritha
><amritha.nambiar@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
>Jakub Kicinski <kuba@kernel.org>; bpf@vger.kernel.org; Paolo Abeni
><pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; Karlsson,
>Magnus <magnus.karlsson@intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY
>locking from AF_XDP code
>
>Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
>because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF state,
>not VSI one. Therefore it does not protect the queue pair from e.g. reset.
>
>Despite being useless, it still can deadlock the unfortunate functions that have
>fell into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena returns
>an error.
>
>Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
>
>Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
>Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>---
> drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
> 1 file changed, 9 deletions(-)
>

Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring
  2024-07-24 16:48 ` [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
  2024-07-24 18:25   ` Jacob Keller
@ 2024-08-08  2:18   ` Rout, ChandanX
  2024-08-12 12:58   ` Maciej Fijalkowski
  2 siblings, 0 replies; 27+ messages in thread
From: Rout, ChandanX @ 2024-08-08  2:18 UTC (permalink / raw)
  To: Zaremba, Larysa, intel-wired-lan@lists.osuosl.org
  Cc: Drewek, Wojciech, Fijalkowski, Maciej, Jesper Dangaard Brouer,
	Daniel Borkmann, netdev@vger.kernel.org, John Fastabend,
	Alexei Starovoitov, linux-kernel@vger.kernel.org, Eric Dumazet,
	Kubiak, Michal, Nguyen, Anthony L, Nambiar, Amritha,
	Keller, Jacob E, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
	David S. Miller, Karlsson, Magnus, Nagraj, Shravan,
	Kuruvinakunnel, George, Pandey, Atul



>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
>Zaremba, Larysa
>Sent: Wednesday, July 24, 2024 10:19 PM
>To: intel-wired-lan@lists.osuosl.org
>Cc: Drewek, Wojciech <wojciech.drewek@intel.com>; Fijalkowski, Maciej
><maciej.fijalkowski@intel.com>; Jesper Dangaard Brouer <hawk@kernel.org>;
>Daniel Borkmann <daniel@iogearbox.net>; Zaremba, Larysa
><larysa.zaremba@intel.com>; netdev@vger.kernel.org; John Fastabend
><john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; linux-
>kernel@vger.kernel.org; Eric Dumazet <edumazet@google.com>; Kubiak,
>Michal <michal.kubiak@intel.com>; Nguyen, Anthony L
><anthony.l.nguyen@intel.com>; Nambiar, Amritha
><amritha.nambiar@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
>Jakub Kicinski <kuba@kernel.org>; bpf@vger.kernel.org; Paolo Abeni
><pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; Karlsson,
>Magnus <magnus.karlsson@intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-net v2 3/6] ice: check for XDP rings
>instead of bpf program when unconfiguring
>
>If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on VSI
>without applying new ring configuration. When unconfiguring the VSI, we can
>encounter the state in which there is an XDP program but no XDP rings to
>destroy or there will be XDP rings that need to be destroyed, but no XDP
>program to indicate their presence.
>
>When unconfiguring, rely on the presence of XDP rings rather then XDP
>program, as they better represent the current state that has to be destroyed.
>
>Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>---
> drivers/net/ethernet/intel/ice/ice_lib.c  | 4 ++--
>drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
>drivers/net/ethernet/intel/ice/ice_xsk.c  | 6 +++---
> 3 files changed, 7 insertions(+), 7 deletions(-)
>

Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections
  2024-07-24 16:48 ` [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
  2024-07-24 18:21   ` Jacob Keller
  2024-07-24 20:40   ` Nambiar, Amritha
@ 2024-08-08  2:19   ` Rout, ChandanX
  2 siblings, 0 replies; 27+ messages in thread
From: Rout, ChandanX @ 2024-08-08  2:19 UTC (permalink / raw)
  To: Zaremba, Larysa, intel-wired-lan@lists.osuosl.org
  Cc: Drewek, Wojciech, Fijalkowski, Maciej, Jesper Dangaard Brouer,
	Daniel Borkmann, netdev@vger.kernel.org, John Fastabend,
	Alexei Starovoitov, linux-kernel@vger.kernel.org, Eric Dumazet,
	Kubiak, Michal, Nguyen, Anthony L, Nambiar, Amritha,
	Keller, Jacob E, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
	David S. Miller, Karlsson, Magnus, Pandey, Atul,
	Kuruvinakunnel, George, Nagraj, Shravan



>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
>Zaremba, Larysa
>Sent: Wednesday, July 24, 2024 10:19 PM
>To: intel-wired-lan@lists.osuosl.org
>Cc: Drewek, Wojciech <wojciech.drewek@intel.com>; Fijalkowski, Maciej
><maciej.fijalkowski@intel.com>; Jesper Dangaard Brouer <hawk@kernel.org>;
>Daniel Borkmann <daniel@iogearbox.net>; Zaremba, Larysa
><larysa.zaremba@intel.com>; netdev@vger.kernel.org; John Fastabend
><john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; linux-
>kernel@vger.kernel.org; Eric Dumazet <edumazet@google.com>; Kubiak,
>Michal <michal.kubiak@intel.com>; Nguyen, Anthony L
><anthony.l.nguyen@intel.com>; Nambiar, Amritha
><amritha.nambiar@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
>Jakub Kicinski <kuba@kernel.org>; bpf@vger.kernel.org; Paolo Abeni
><pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; Karlsson,
>Magnus <magnus.karlsson@intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-net v2 1/6] ice: move
>netif_queue_set_napi to rtnl-protected sections
>
>Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is not
>rtnl-locked when called from the reset. This creates the need to take the
>rtnl_lock just for a single function and complicates the synchronization with
>.ndo_bpf. At the same time, there no actual need to fill napi-to-queue
>information at this exact point.
>
>Fill napi-to-queue information when opening the VSI and clear it when the VSI is
>being closed. Those routines are already rtnl-locked.
>
>Also, rewrite napi-to-queue assignment in a way that prevents inclusion of XDP
>queues, as this leads to out-of-bounds writes, such as one below.
>
>[  +0.000004] BUG: KASAN: slab-out-of-bounds in
>netif_queue_set_napi+0x1c2/0x1e0 [  +0.000012] Write of size 8 at addr
>ffff889881727c80 by task bash/7047 [  +0.000006] CPU: 24 PID: 7047 Comm:
>bash Not tainted 6.10.0-rc2+ #2 [  +0.000004] Hardware name: Intel
>Corporation S2600WFT/S2600WFT, BIOS
>SE5C620.86B.02.01.0014.082620210524 08/26/2021 [  +0.000003] Call
>Trace:
>[  +0.000003]  <TASK>
>[  +0.000002]  dump_stack_lvl+0x60/0x80
>[  +0.000007]  print_report+0xce/0x630
>[  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
>[  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0 [  +0.000005]  ?
>netif_queue_set_napi+0x1c2/0x1e0 [  +0.000003]  kasan_report+0xe9/0x120
>[  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0 [  +0.000004]
>netif_queue_set_napi+0x1c2/0x1e0 [  +0.000005]  ice_vsi_close+0x161/0x670
>[ice] [  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice] [  +0.000095]
>ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice] [  +0.000086]
>ice_prepare_for_reset+0x299/0x750 [ice] [  +0.000087]
>pci_dev_save_and_disable+0x82/0xd0
>[  +0.000006]  pci_reset_function+0x12d/0x230 [  +0.000004]
>reset_store+0xa0/0x100 [  +0.000006]  ? __pfx_reset_store+0x10/0x10 [
>+0.000002]  ? __pfx_mutex_lock+0x10/0x10 [  +0.000004]  ?
>__check_object_size+0x4c1/0x640 [  +0.000007]
>kernfs_fop_write_iter+0x30b/0x4a0 [  +0.000006]  vfs_write+0x5d6/0xdf0 [
>+0.000005]  ? fd_install+0x180/0x350 [  +0.000005]  ?
>__pfx_vfs_write+0x10/0xA10 [  +0.000004]  ? do_fcntl+0x52c/0xcd0 [
>+0.000004]  ? kasan_save_track+0x13/0x60 [  +0.000003]  ?
>kasan_save_free_info+0x37/0x60 [  +0.000006]  ksys_write+0xfa/0x1d0 [
>+0.000003]  ? __pfx_ksys_write+0x10/0x10 [  +0.000002]  ?
>__x64_sys_fcntl+0x121/0x180 [  +0.000004]  ? _raw_spin_lock+0x87/0xe0 [
>+0.000005]  do_syscall_64+0x80/0x170 [  +0.000007]  ?
>_raw_spin_lock+0x87/0xe0 [  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
>[  +0.000003]  ? file_close_fd_locked+0x167/0x230 [  +0.000005]  ?
>syscall_exit_to_user_mode+0x7d/0x220
>[  +0.000005]  ? do_syscall_64+0x8c/0x170 [  +0.000004]  ?
>do_syscall_64+0x8c/0x170 [  +0.000003]  ? do_syscall_64+0x8c/0x170 [
>+0.000003]  ? fput+0x1a/0x2c0 [  +0.000004]  ? filp_close+0x19/0x30 [
>+0.000004]  ? do_dup2+0x25a/0x4c0 [  +0.000004]  ?
>__x64_sys_dup2+0x6e/0x2e0 [  +0.000002]  ?
>syscall_exit_to_user_mode+0x7d/0x220
>[  +0.000004]  ? do_syscall_64+0x8c/0x170 [  +0.000003]  ?
>__count_memcg_events+0x113/0x380 [  +0.000005]  ?
>handle_mm_fault+0x136/0x820 [  +0.000005]  ?
>do_user_addr_fault+0x444/0xa80 [  +0.000004]  ? clear_bhb_loop+0x25/0x80
>[  +0.000004]  ? clear_bhb_loop+0x25/0x80 [  +0.000002]
>entry_SYSCALL_64_after_hwframe+0x76/0x7e
>[  +0.000005] RIP: 0033:0x7f2033593154
>
>Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain
>scenarios")
>Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
>---
> drivers/net/ethernet/intel/ice/ice_base.c |  11 +-
>drivers/net/ethernet/intel/ice/ice_lib.c  | 129 ++++++----------------
>drivers/net/ethernet/intel/ice/ice_lib.h  |  10 +-
>drivers/net/ethernet/intel/ice/ice_main.c |  17 ++-
> 4 files changed, 49 insertions(+), 118 deletions(-)
>

Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring
  2024-07-24 16:48 ` [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
  2024-07-24 18:25   ` Jacob Keller
  2024-08-08  2:18   ` [Intel-wired-lan] " Rout, ChandanX
@ 2024-08-12 12:58   ` Maciej Fijalkowski
  2024-08-12 15:13     ` Larysa Zaremba
  2 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2024-08-12 12:58 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

On Wed, Jul 24, 2024 at 06:48:34PM +0200, Larysa Zaremba wrote:
> If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
> VSI without applying new ring configuration. When unconfiguring the VSI, we
> can encounter the state in which there is an XDP program but no XDP rings
> to destroy or there will be XDP rings that need to be destroyed, but no XDP
> program to indicate their presence.
> 
> When unconfiguring, rely on the presence of XDP rings rather then XDP
> program, as they better represent the current state that has to be
> destroyed.
> 

No Fixes: tag?

> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c  | 4 ++--
>  drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
>  drivers/net/ethernet/intel/ice/ice_xsk.c  | 6 +++---
>  3 files changed, 7 insertions(+), 7 deletions(-)

[...]

> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index 2c1a843ba200..5dd50a2866cc 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -39,7 +39,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
>  	       sizeof(vsi_stat->rx_ring_stats[q_idx]->rx_stats));
>  	memset(&vsi_stat->tx_ring_stats[q_idx]->stats, 0,
>  	       sizeof(vsi_stat->tx_ring_stats[q_idx]->stats));
> -	if (ice_is_xdp_ena_vsi(vsi))
> +	if (vsi->xdp_rings)
>  		memset(&vsi->xdp_rings[q_idx]->ring_stats->stats, 0,
>  		       sizeof(vsi->xdp_rings[q_idx]->ring_stats->stats));
>  }
> @@ -52,7 +52,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
>  static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx)
>  {
>  	ice_clean_tx_ring(vsi->tx_rings[q_idx]);
> -	if (ice_is_xdp_ena_vsi(vsi)) {
> +	if (vsi->xdp_rings) {
>  		synchronize_rcu();
>  		ice_clean_tx_ring(vsi->xdp_rings[q_idx]);
>  	}
> @@ -189,7 +189,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, 0, tx_ring, &txq_meta);
>  	if (err)
>  		return err;
> -	if (ice_is_xdp_ena_vsi(vsi)) {
> +	if (vsi->xdp_rings) {

From XSK POV these checks are false positive and I will be sending a patch
that gets rid of it (I had this on my tree when working on timeout issues
but I pulled this out as it was not a -net candidate IMHO).

Just a heads up, this can go as-is currently.

>  		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
>  
>  		memset(&txq_meta, 0, sizeof(txq_meta));
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-07-24 16:48 ` [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
  2024-07-24 18:37   ` Jacob Keller
  2024-08-08  2:17   ` [Intel-wired-lan] " Rout, ChandanX
@ 2024-08-12 13:03   ` Maciej Fijalkowski
  2024-08-12 15:59     ` Larysa Zaremba
  2 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2024-08-12 13:03 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

On Wed, Jul 24, 2024 at 06:48:36PM +0200, Larysa Zaremba wrote:
> Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
> because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
> state, not VSI one. Therefore it does not protect the queue pair from
> e.g. reset.
> 
> Despite being useless, it still can deadlock the unfortunate functions that
> have fell into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena
> returns an error.
> 
> Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().

Why not just check the pf->state ? And address other broken callsites?

> 
> Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index 5dd50a2866cc..d23fd4ea9129 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -163,7 +163,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	struct ice_q_vector *q_vector;
>  	struct ice_tx_ring *tx_ring;
>  	struct ice_rx_ring *rx_ring;
> -	int timeout = 50;
>  	int err;
>  
>  	if (q_idx >= vsi->num_rxq || q_idx >= vsi->num_txq)
> @@ -173,13 +172,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  	rx_ring = vsi->rx_rings[q_idx];
>  	q_vector = rx_ring->q_vector;
>  
> -	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
> -		timeout--;
> -		if (!timeout)
> -			return -EBUSY;
> -		usleep_range(1000, 2000);
> -	}
> -
>  	ice_qvec_dis_irq(vsi, rx_ring, q_vector);
>  	ice_qvec_toggle_napi(vsi, q_vector, false);
>  
> @@ -250,7 +242,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>  	ice_qvec_ena_irq(vsi, q_vector);
>  
>  	netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
> -	clear_bit(ICE_CFG_BUSY, vsi->state);
>  
>  	return 0;
>  }
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring
  2024-08-12 12:58   ` Maciej Fijalkowski
@ 2024-08-12 15:13     ` Larysa Zaremba
  0 siblings, 0 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-08-12 15:13 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

On Mon, Aug 12, 2024 at 02:58:00PM +0200, Maciej Fijalkowski wrote:
> On Wed, Jul 24, 2024 at 06:48:34PM +0200, Larysa Zaremba wrote:
> > If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
> > VSI without applying new ring configuration. When unconfiguring the VSI, we
> > can encounter the state in which there is an XDP program but no XDP rings
> > to destroy or there will be XDP rings that need to be destroyed, but no XDP
> > program to indicate their presence.
> > 
> > When unconfiguring, rely on the presence of XDP rings rather then XDP
> > program, as they better represent the current state that has to be
> > destroyed.
> > 
> 
> No Fixes: tag?
>

This is basically a fixup for a previous patch. The need to replace program 
check with a ring check emerges from the existence of an in-between state that 
my synchronization introduces. So it does not fix anything from a referencable 
patch. I have separated this because it was seriously getting in the way when 
trying to read the synchronization logic.

> > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_lib.c  | 4 ++--
> >  drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
> >  drivers/net/ethernet/intel/ice/ice_xsk.c  | 6 +++---
> >  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> [...]
> 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > index 2c1a843ba200..5dd50a2866cc 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > @@ -39,7 +39,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
> >  	       sizeof(vsi_stat->rx_ring_stats[q_idx]->rx_stats));
> >  	memset(&vsi_stat->tx_ring_stats[q_idx]->stats, 0,
> >  	       sizeof(vsi_stat->tx_ring_stats[q_idx]->stats));
> > -	if (ice_is_xdp_ena_vsi(vsi))
> > +	if (vsi->xdp_rings)
> >  		memset(&vsi->xdp_rings[q_idx]->ring_stats->stats, 0,
> >  		       sizeof(vsi->xdp_rings[q_idx]->ring_stats->stats));
> >  }
> > @@ -52,7 +52,7 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx)
> >  static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx)
> >  {
> >  	ice_clean_tx_ring(vsi->tx_rings[q_idx]);
> > -	if (ice_is_xdp_ena_vsi(vsi)) {
> > +	if (vsi->xdp_rings) {
> >  		synchronize_rcu();
> >  		ice_clean_tx_ring(vsi->xdp_rings[q_idx]);
> >  	}
> > @@ -189,7 +189,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
> >  	err = ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, 0, tx_ring, &txq_meta);
> >  	if (err)
> >  		return err;
> > -	if (ice_is_xdp_ena_vsi(vsi)) {
> > +	if (vsi->xdp_rings) {
> 
> From XSK POV these checks are false positive and I will be sending a patch
> that gets rid of it (I had this on my tree when working on timeout issues
> but I pulled this out as it was not a -net candidate IMHO).
> 
> Just a heads up, this can go as-is currently.
>

Ok, thanks.

> >  		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
> >  
> >  		memset(&txq_meta, 0, sizeof(txq_meta));
> > -- 
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-08-12 13:03   ` Maciej Fijalkowski
@ 2024-08-12 15:59     ` Larysa Zaremba
  2024-08-13 10:28       ` Maciej Fijalkowski
  0 siblings, 1 reply; 27+ messages in thread
From: Larysa Zaremba @ 2024-08-12 15:59 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

On Mon, Aug 12, 2024 at 03:03:19PM +0200, Maciej Fijalkowski wrote:
> On Wed, Jul 24, 2024 at 06:48:36PM +0200, Larysa Zaremba wrote:
> > Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
> > because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
> > state, not VSI one. Therefore it does not protect the queue pair from
> > e.g. reset.
> > 
> > Despite being useless, it still can deadlock the unfortunate functions that
> > have fell into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena
> > returns an error.
> > 
> > Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
> 
> Why not just check the pf->state ?

I would just cite Jakub: "you lose lockdep and all other infra normal mutex 
would give you." [0]

[0] https://lore.kernel.org/netdev/20240612140935.54981c49@kernel.org/

> And address other broken callsites?

Because the current state of sychronization does not allow me to assume this 
would fix anything and testing all the places would be out of scope for theese 
series.

With Dawid's patch [1], a mutex for XDP and miscellaneous changes from these 
series I think we would probably come pretty close being able to get rid of 
ICE_CFG_BUSY at least when locking software resources.

[1] 
https://lore.kernel.org/netdev/20240812125009.62635-1-dawid.osuchowski@linux.intel.com/

> > 
> > Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
> >  1 file changed, 9 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > index 5dd50a2866cc..d23fd4ea9129 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > @@ -163,7 +163,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
> >  	struct ice_q_vector *q_vector;
> >  	struct ice_tx_ring *tx_ring;
> >  	struct ice_rx_ring *rx_ring;
> > -	int timeout = 50;
> >  	int err;
> >  
> >  	if (q_idx >= vsi->num_rxq || q_idx >= vsi->num_txq)
> > @@ -173,13 +172,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
> >  	rx_ring = vsi->rx_rings[q_idx];
> >  	q_vector = rx_ring->q_vector;
> >  
> > -	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
> > -		timeout--;
> > -		if (!timeout)
> > -			return -EBUSY;
> > -		usleep_range(1000, 2000);
> > -	}
> > -
> >  	ice_qvec_dis_irq(vsi, rx_ring, q_vector);
> >  	ice_qvec_toggle_napi(vsi, q_vector, false);
> >  
> > @@ -250,7 +242,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
> >  	ice_qvec_ena_irq(vsi, q_vector);
> >  
> >  	netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
> > -	clear_bit(ICE_CFG_BUSY, vsi->state);
> >  
> >  	return 0;
> >  }
> > -- 
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code
  2024-08-12 15:59     ` Larysa Zaremba
@ 2024-08-13 10:28       ` Maciej Fijalkowski
  0 siblings, 0 replies; 27+ messages in thread
From: Maciej Fijalkowski @ 2024-08-13 10:28 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

On Mon, Aug 12, 2024 at 05:59:21PM +0200, Larysa Zaremba wrote:
> On Mon, Aug 12, 2024 at 03:03:19PM +0200, Maciej Fijalkowski wrote:
> > On Wed, Jul 24, 2024 at 06:48:36PM +0200, Larysa Zaremba wrote:
> > > Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
> > > because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF
> > > state, not VSI one. Therefore it does not protect the queue pair from
> > > e.g. reset.
> > > 
> > > Despite being useless, it still can deadlock the unfortunate functions that
> > > have fell into the same ICE_CFG_BUSY-VSI trap. This happens if ice_qp_ena
> > > returns an error.
> > > 
> > > Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
> > 
> > Why not just check the pf->state ?
> 
> I would just cite Jakub: "you lose lockdep and all other infra normal mutex 
> would give you." [0]

I was not sure why you're bringing up mutex here but I missed 2nd patch
somehow :) let me start from the beginning.

> 
> [0] https://lore.kernel.org/netdev/20240612140935.54981c49@kernel.org/
> 
> > And address other broken callsites?
> 
> Because the current state of sychronization does not allow me to assume this 
> would fix anything and testing all the places would be out of scope for theese 
> series.
> 
> With Dawid's patch [1], a mutex for XDP and miscellaneous changes from these 
> series I think we would probably come pretty close being able to get rid of 
> ICE_CFG_BUSY at least when locking software resources.
> 
> [1] 
> https://lore.kernel.org/netdev/20240812125009.62635-1-dawid.osuchowski@linux.intel.com/
> 
> > > 
> > > Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> > > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > > ---
> > >  drivers/net/ethernet/intel/ice/ice_xsk.c | 9 ---------
> > >  1 file changed, 9 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > > index 5dd50a2866cc..d23fd4ea9129 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > > @@ -163,7 +163,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
> > >  	struct ice_q_vector *q_vector;
> > >  	struct ice_tx_ring *tx_ring;
> > >  	struct ice_rx_ring *rx_ring;
> > > -	int timeout = 50;
> > >  	int err;
> > >  
> > >  	if (q_idx >= vsi->num_rxq || q_idx >= vsi->num_txq)
> > > @@ -173,13 +172,6 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
> > >  	rx_ring = vsi->rx_rings[q_idx];
> > >  	q_vector = rx_ring->q_vector;
> > >  
> > > -	while (test_and_set_bit(ICE_CFG_BUSY, vsi->state)) {
> > > -		timeout--;
> > > -		if (!timeout)
> > > -			return -EBUSY;
> > > -		usleep_range(1000, 2000);
> > > -	}
> > > -
> > >  	ice_qvec_dis_irq(vsi, rx_ring, q_vector);
> > >  	ice_qvec_toggle_napi(vsi, q_vector, false);
> > >  
> > > @@ -250,7 +242,6 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
> > >  	ice_qvec_ena_irq(vsi, q_vector);
> > >  
> > >  	netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
> > > -	clear_bit(ICE_CFG_BUSY, vsi->state);
> > >  
> > >  	return 0;
> > >  }
> > > -- 
> > > 2.43.0
> > > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex
  2024-07-24 16:48 ` [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
  2024-07-24 18:24   ` Jacob Keller
  2024-08-08  2:16   ` [Intel-wired-lan] " Rout, ChandanX
@ 2024-08-13 11:31   ` Maciej Fijalkowski
  2024-08-13 13:36     ` Larysa Zaremba
  2 siblings, 1 reply; 27+ messages in thread
From: Maciej Fijalkowski @ 2024-08-13 11:31 UTC (permalink / raw)
  To: Larysa Zaremba
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

On Wed, Jul 24, 2024 at 06:48:33PM +0200, Larysa Zaremba wrote:
> The main threat to data consistency in ice_xdp() is a possible asynchronous
> PF reset. It can be triggered by a user or by TX timeout handler.
> 
> XDP setup and PF reset code access the same resources in the following
> sections:
> * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
> * ice_vsi_rebuild() for the PF VSI - not protected
> * ice_vsi_open() - already rtnl-locked
> 
> With an unfortunate timing, such accesses can result in a crash such as the
> one below:
> 
> [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
> [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
> [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
> [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
> [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
> [ +0.394718] ice 0000:b1:00.0: PTP reset successful
> [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
> [ +0.000045] #PF: supervisor read access in kernel mode
> [ +0.000023] #PF: error_code(0x0000) - not-present page
> [ +0.000023] PGD 0 P4D 0
> [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
> [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> [ +0.000036] Workqueue: ice ice_service_task [ice]
> [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
> [...]
> [ +0.000013] Call Trace:
> [ +0.000016] <TASK>
> [ +0.000014] ? __die+0x1f/0x70
> [ +0.000029] ? page_fault_oops+0x171/0x4f0
> [ +0.000029] ? schedule+0x3b/0xd0
> [ +0.000027] ? exc_page_fault+0x7b/0x180
> [ +0.000022] ? asm_exc_page_fault+0x22/0x30
> [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
> [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
> [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
> [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
> [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
> [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
> [ +0.000145] ice_rebuild+0x18c/0x840 [ice]
> [ +0.000145] ? delay_tsc+0x4a/0xc0
> [ +0.000022] ? delay_tsc+0x92/0xc0
> [ +0.000020] ice_do_reset+0x140/0x180 [ice]
> [ +0.000886] ice_service_task+0x404/0x1030 [ice]
> [ +0.000824] process_one_work+0x171/0x340
> [ +0.000685] worker_thread+0x277/0x3a0
> [ +0.000675] ? preempt_count_add+0x6a/0xa0
> [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
> [ +0.000679] ? __pfx_worker_thread+0x10/0x10
> [ +0.000653] kthread+0xf0/0x120
> [ +0.000635] ? __pfx_kthread+0x10/0x10
> [ +0.000616] ret_from_fork+0x2d/0x50
> [ +0.000612] ? __pfx_kthread+0x10/0x10
> [ +0.000604] ret_from_fork_asm+0x1b/0x30
> [ +0.000604] </TASK>
> 
> The previous way of handling this through returning -EBUSY is not viable,
> particularly when destroying AF_XDP socket, because the kernel proceeds
> with removal anyway.
> 
> There is plenty of code between those calls and there is no need to create
> a large critical section that covers all of them, same as there is no need
> to protect ice_vsi_rebuild() with rtnl_lock().
> 
> Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
> 
> Leaving unprotected sections in between would result in two states that
> have to be considered:
> 1. when the VSI is closed, but not yet rebuild
> 2. when VSI is already rebuild, but not yet open
> 
> The latter case is actually already handled through !netif_running() case,
> we just need to adjust flag checking a little. The former one is not as
> trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
> hardware interaction happens, this can make adding/deleting rings exit
> with an error. Luckily, VSI rebuild is pending and can apply new
> configuration for us in a managed fashion.
> 
> Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
> indicate that ice_xdp() can just hot-swap the program.

couldn't this be a separate patch?

> 
> Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> Fixes: efc2214b6047 ("ice: Add support for XDP")
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h      |  2 ++
>  drivers/net/ethernet/intel/ice/ice_lib.c  | 26 +++++++++++++++--------
>  drivers/net/ethernet/intel/ice/ice_main.c | 19 ++++++++++++-----
>  drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 ++-
>  4 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index 99a75a59078e..3d7a0abc13ab 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -318,6 +318,7 @@ enum ice_vsi_state {
>  	ICE_VSI_UMAC_FLTR_CHANGED,
>  	ICE_VSI_MMAC_FLTR_CHANGED,
>  	ICE_VSI_PROMISC_CHANGED,
> +	ICE_VSI_REBUILD_PENDING,
>  	ICE_VSI_STATE_NBITS		/* must be last */
>  };
>  
> @@ -411,6 +412,7 @@ struct ice_vsi {
>  	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
>  	u16 num_xdp_txq;		 /* Used XDP queues */
>  	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
> +	struct mutex xdp_state_lock;
>  
>  	struct net_device **target_netdevs;
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index 5f2ddcaf7031..bbef5ec67cae 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -447,6 +447,7 @@ static void ice_vsi_free(struct ice_vsi *vsi)
>  
>  	ice_vsi_free_stats(vsi);
>  	ice_vsi_free_arrays(vsi);
> +	mutex_destroy(&vsi->xdp_state_lock);
>  	mutex_unlock(&pf->sw_mutex);
>  	devm_kfree(dev, vsi);
>  }
> @@ -626,6 +627,8 @@ static struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf)
>  	pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
>  					 pf->next_vsi);
>  
> +	mutex_init(&vsi->xdp_state_lock);
> +
>  unlock_pf:
>  	mutex_unlock(&pf->sw_mutex);
>  	return vsi;
> @@ -2973,19 +2976,24 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
>  	if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
>  		return -EINVAL;
>  
> +	mutex_lock(&vsi->xdp_state_lock);
> +	clear_bit(ICE_VSI_REBUILD_PENDING, vsi->state);
> +
>  	ret = ice_vsi_realloc_stat_arrays(vsi);
>  	if (ret)
> -		goto err_vsi_cfg;
> +		goto unlock;
>  
>  	ice_vsi_decfg(vsi);
>  	ret = ice_vsi_cfg_def(vsi);
>  	if (ret)
> -		goto err_vsi_cfg;
> +		goto unlock;
>  
>  	coalesce = kcalloc(vsi->num_q_vectors,
>  			   sizeof(struct ice_coalesce_stored), GFP_KERNEL);
> -	if (!coalesce)
> -		return -ENOMEM;
> +	if (!coalesce) {
> +		ret = -ENOMEM;

Knee-jerk reaction would be to deconfig things that ice_vsi_cfg_def()
setup above.

So I think the order of kfree and ice_vsi_decfg should be swapped,
something like:

	if (!coalesce) {
		ret = -ENOMEM;
		goto err_mem_alloc;
	}

err_vsi_cfg_tc_lan:
	kfree(coalesce);
err_mem_alloc:
	ice_vsi_decfg(vsi);
unlock:
	mutex_unlock(&vsi->xdp_state_lock);
	return ret;


or am I missing something?

> +		goto unlock;
> +	}
>  
>  	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
>  
> @@ -2996,19 +3004,19 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
>  			goto err_vsi_cfg_tc_lan;
>  		}
>  
> -		kfree(coalesce);
> -		return ice_schedule_reset(pf, ICE_RESET_PFR);
> +		ret = ice_schedule_reset(pf, ICE_RESET_PFR);
> +		goto err_vsi_cfg_tc_lan;
>  	}
>  
>  	ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors);
>  	kfree(coalesce);
> -
> -	return 0;
> +	goto unlock;
>  
>  err_vsi_cfg_tc_lan:
>  	ice_vsi_decfg(vsi);
>  	kfree(coalesce);
> -err_vsi_cfg:
> +unlock:
> +	mutex_unlock(&vsi->xdp_state_lock);
>  	return ret;
>  }
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 8ed1798bb06e..e50526b491fc 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -611,6 +611,7 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
>  	/* clear SW filtering DB */
>  	ice_clear_hw_tbls(hw);
>  	/* disable the VSIs and their queues that are not already DOWN */
> +	set_bit(ICE_VSI_REBUILD_PENDING, ice_get_main_vsi(pf)->state);
>  	ice_pf_dis_all_vsi(pf, false);
>  
>  	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
> @@ -3011,7 +3012,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>  	}
>  
>  	/* hot swap progs and avoid toggling link */
> -	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
> +	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
> +	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
>  		ice_vsi_assign_bpf_prog(vsi, prog);
>  		return 0;
>  	}
> @@ -3083,21 +3085,28 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>  {
>  	struct ice_netdev_priv *np = netdev_priv(dev);
>  	struct ice_vsi *vsi = np->vsi;
> +	int ret;
>  
>  	if (vsi->type != ICE_VSI_PF) {
>  		NL_SET_ERR_MSG_MOD(xdp->extack, "XDP can be loaded only on PF VSI");
>  		return -EINVAL;
>  	}
>  
> +	mutex_lock(&vsi->xdp_state_lock);
> +
>  	switch (xdp->command) {
>  	case XDP_SETUP_PROG:
> -		return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> +		ret = ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> +		break;
>  	case XDP_SETUP_XSK_POOL:
> -		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
> -					  xdp->xsk.queue_id);
> +		ret = ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id);
> +		break;
>  	default:
> -		return -EINVAL;
> +		ret = -EINVAL;
>  	}
> +
> +	mutex_unlock(&vsi->xdp_state_lock);
> +	return ret;
>  }
>  
>  /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index a65955eb23c0..2c1a843ba200 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -379,7 +379,8 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
>  		goto failure;
>  	}
>  
> -	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
> +	if_running = !test_bit(ICE_VSI_DOWN, vsi->state) &&
> +		     ice_is_xdp_ena_vsi(vsi);
>  
>  	if (if_running) {
>  		struct ice_rx_ring *rx_ring = vsi->rx_rings[qid];
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex
  2024-08-13 11:31   ` Maciej Fijalkowski
@ 2024-08-13 13:36     ` Larysa Zaremba
  0 siblings, 0 replies; 27+ messages in thread
From: Larysa Zaremba @ 2024-08-13 13:36 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: intel-wired-lan, Tony Nguyen, David S. Miller, Jacob Keller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	linux-kernel, bpf, magnus.karlsson, Michal Kubiak,
	Wojciech Drewek, Amritha Nambiar

On Tue, Aug 13, 2024 at 01:31:57PM +0200, Maciej Fijalkowski wrote:
> On Wed, Jul 24, 2024 at 06:48:33PM +0200, Larysa Zaremba wrote:
> > The main threat to data consistency in ice_xdp() is a possible asynchronous
> > PF reset. It can be triggered by a user or by TX timeout handler.
> > 
> > XDP setup and PF reset code access the same resources in the following
> > sections:
> > * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
> > * ice_vsi_rebuild() for the PF VSI - not protected
> > * ice_vsi_open() - already rtnl-locked
> > 
> > With an unfortunate timing, such accesses can result in a crash such as the
> > one below:
> > 
> > [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
> > [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
> > [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
> > [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
> > [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
> > [ +0.394718] ice 0000:b1:00.0: PTP reset successful
> > [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
> > [ +0.000045] #PF: supervisor read access in kernel mode
> > [ +0.000023] #PF: error_code(0x0000) - not-present page
> > [ +0.000023] PGD 0 P4D 0
> > [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
> > [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
> > [ +0.000036] Workqueue: ice ice_service_task [ice]
> > [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
> > [...]
> > [ +0.000013] Call Trace:
> > [ +0.000016] <TASK>
> > [ +0.000014] ? __die+0x1f/0x70
> > [ +0.000029] ? page_fault_oops+0x171/0x4f0
> > [ +0.000029] ? schedule+0x3b/0xd0
> > [ +0.000027] ? exc_page_fault+0x7b/0x180
> > [ +0.000022] ? asm_exc_page_fault+0x22/0x30
> > [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
> > [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
> > [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
> > [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
> > [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
> > [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
> > [ +0.000145] ice_rebuild+0x18c/0x840 [ice]
> > [ +0.000145] ? delay_tsc+0x4a/0xc0
> > [ +0.000022] ? delay_tsc+0x92/0xc0
> > [ +0.000020] ice_do_reset+0x140/0x180 [ice]
> > [ +0.000886] ice_service_task+0x404/0x1030 [ice]
> > [ +0.000824] process_one_work+0x171/0x340
> > [ +0.000685] worker_thread+0x277/0x3a0
> > [ +0.000675] ? preempt_count_add+0x6a/0xa0
> > [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
> > [ +0.000679] ? __pfx_worker_thread+0x10/0x10
> > [ +0.000653] kthread+0xf0/0x120
> > [ +0.000635] ? __pfx_kthread+0x10/0x10
> > [ +0.000616] ret_from_fork+0x2d/0x50
> > [ +0.000612] ? __pfx_kthread+0x10/0x10
> > [ +0.000604] ret_from_fork_asm+0x1b/0x30
> > [ +0.000604] </TASK>
> > 
> > The previous way of handling this through returning -EBUSY is not viable,
> > particularly when destroying AF_XDP socket, because the kernel proceeds
> > with removal anyway.
> > 
> > There is plenty of code between those calls and there is no need to create
> > a large critical section that covers all of them, same as there is no need
> > to protect ice_vsi_rebuild() with rtnl_lock().
> > 
> > Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
> > 
> > Leaving unprotected sections in between would result in two states that
> > have to be considered:
> > 1. when the VSI is closed, but not yet rebuild
> > 2. when VSI is already rebuild, but not yet open
> > 
> > The latter case is actually already handled through !netif_running() case,
> > we just need to adjust flag checking a little. The former one is not as
> > trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
> > hardware interaction happens, this can make adding/deleting rings exit
> > with an error. Luckily, VSI rebuild is pending and can apply new
> > configuration for us in a managed fashion.
> > 
> > Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
> > indicate that ice_xdp() can just hot-swap the program.
> 
> couldn't this be a separate patch?
> 

I think, this is an integral part of the synchronization concept, without it 
locking the rebuild would not have much sense.

> > 
> > Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
> > Fixes: efc2214b6047 ("ice: Add support for XDP")
> > Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> > Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice.h      |  2 ++
> >  drivers/net/ethernet/intel/ice/ice_lib.c  | 26 +++++++++++++++--------
> >  drivers/net/ethernet/intel/ice/ice_main.c | 19 ++++++++++++-----
> >  drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 ++-
> >  4 files changed, 35 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> > index 99a75a59078e..3d7a0abc13ab 100644
> > --- a/drivers/net/ethernet/intel/ice/ice.h
> > +++ b/drivers/net/ethernet/intel/ice/ice.h
> > @@ -318,6 +318,7 @@ enum ice_vsi_state {
> >  	ICE_VSI_UMAC_FLTR_CHANGED,
> >  	ICE_VSI_MMAC_FLTR_CHANGED,
> >  	ICE_VSI_PROMISC_CHANGED,
> > +	ICE_VSI_REBUILD_PENDING,
> >  	ICE_VSI_STATE_NBITS		/* must be last */
> >  };
> >  
> > @@ -411,6 +412,7 @@ struct ice_vsi {
> >  	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
> >  	u16 num_xdp_txq;		 /* Used XDP queues */
> >  	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
> > +	struct mutex xdp_state_lock;
> >  
> >  	struct net_device **target_netdevs;
> >  
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> > index 5f2ddcaf7031..bbef5ec67cae 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > @@ -447,6 +447,7 @@ static void ice_vsi_free(struct ice_vsi *vsi)
> >  
> >  	ice_vsi_free_stats(vsi);
> >  	ice_vsi_free_arrays(vsi);
> > +	mutex_destroy(&vsi->xdp_state_lock);
> >  	mutex_unlock(&pf->sw_mutex);
> >  	devm_kfree(dev, vsi);
> >  }
> > @@ -626,6 +627,8 @@ static struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf)
> >  	pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
> >  					 pf->next_vsi);
> >  
> > +	mutex_init(&vsi->xdp_state_lock);
> > +
> >  unlock_pf:
> >  	mutex_unlock(&pf->sw_mutex);
> >  	return vsi;
> > @@ -2973,19 +2976,24 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> >  	if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
> >  		return -EINVAL;
> >  
> > +	mutex_lock(&vsi->xdp_state_lock);
> > +	clear_bit(ICE_VSI_REBUILD_PENDING, vsi->state);
> > +
> >  	ret = ice_vsi_realloc_stat_arrays(vsi);
> >  	if (ret)
> > -		goto err_vsi_cfg;
> > +		goto unlock;
> >  
> >  	ice_vsi_decfg(vsi);
> >  	ret = ice_vsi_cfg_def(vsi);
> >  	if (ret)
> > -		goto err_vsi_cfg;
> > +		goto unlock;
> >  
> >  	coalesce = kcalloc(vsi->num_q_vectors,
> >  			   sizeof(struct ice_coalesce_stored), GFP_KERNEL);
> > -	if (!coalesce)
> > -		return -ENOMEM;
> > +	if (!coalesce) {
> > +		ret = -ENOMEM;
> 
> Knee-jerk reaction would be to deconfig things that ice_vsi_cfg_def()
> setup above.
> 
> So I think the order of kfree and ice_vsi_decfg should be swapped,
> something like:
> 
> 	if (!coalesce) {
> 		ret = -ENOMEM;
> 		goto err_mem_alloc;
> 	}
> 
> err_vsi_cfg_tc_lan:
> 	kfree(coalesce);
> err_mem_alloc:
> 	ice_vsi_decfg(vsi);
> unlock:
> 	mutex_unlock(&vsi->xdp_state_lock);
> 	return ret;
> 
> 
> or am I missing something?
> 

You are correct, v3 it is :D

> > +		goto unlock;
> > +	}
> >  
> >  	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
> >  
> > @@ -2996,19 +3004,19 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags)
> >  			goto err_vsi_cfg_tc_lan;
> >  		}
> >  
> > -		kfree(coalesce);
> > -		return ice_schedule_reset(pf, ICE_RESET_PFR);
> > +		ret = ice_schedule_reset(pf, ICE_RESET_PFR);
> > +		goto err_vsi_cfg_tc_lan;
> >  	}
> >  
> >  	ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors);
> >  	kfree(coalesce);
> > -
> > -	return 0;
> > +	goto unlock;
> >  
> >  err_vsi_cfg_tc_lan:
> >  	ice_vsi_decfg(vsi);
> >  	kfree(coalesce);
> > -err_vsi_cfg:
> > +unlock:
> > +	mutex_unlock(&vsi->xdp_state_lock);
> >  	return ret;
> >  }
> >  
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index 8ed1798bb06e..e50526b491fc 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -611,6 +611,7 @@ ice_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type)
> >  	/* clear SW filtering DB */
> >  	ice_clear_hw_tbls(hw);
> >  	/* disable the VSIs and their queues that are not already DOWN */
> > +	set_bit(ICE_VSI_REBUILD_PENDING, ice_get_main_vsi(pf)->state);
> >  	ice_pf_dis_all_vsi(pf, false);
> >  
> >  	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
> > @@ -3011,7 +3012,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
> >  	}
> >  
> >  	/* hot swap progs and avoid toggling link */
> > -	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
> > +	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
> > +	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
> >  		ice_vsi_assign_bpf_prog(vsi, prog);
> >  		return 0;
> >  	}
> > @@ -3083,21 +3085,28 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >  {
> >  	struct ice_netdev_priv *np = netdev_priv(dev);
> >  	struct ice_vsi *vsi = np->vsi;
> > +	int ret;
> >  
> >  	if (vsi->type != ICE_VSI_PF) {
> >  		NL_SET_ERR_MSG_MOD(xdp->extack, "XDP can be loaded only on PF VSI");
> >  		return -EINVAL;
> >  	}
> >  
> > +	mutex_lock(&vsi->xdp_state_lock);
> > +
> >  	switch (xdp->command) {
> >  	case XDP_SETUP_PROG:
> > -		return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> > +		ret = ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
> > +		break;
> >  	case XDP_SETUP_XSK_POOL:
> > -		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
> > -					  xdp->xsk.queue_id);
> > +		ret = ice_xsk_pool_setup(vsi, xdp->xsk.pool, xdp->xsk.queue_id);
> > +		break;
> >  	default:
> > -		return -EINVAL;
> > +		ret = -EINVAL;
> >  	}
> > +
> > +	mutex_unlock(&vsi->xdp_state_lock);
> > +	return ret;
> >  }
> >  
> >  /**
> > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > index a65955eb23c0..2c1a843ba200 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > @@ -379,7 +379,8 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
> >  		goto failure;
> >  	}
> >  
> > -	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);
> > +	if_running = !test_bit(ICE_VSI_DOWN, vsi->state) &&
> > +		     ice_is_xdp_ena_vsi(vsi);
> >  
> >  	if (if_running) {
> >  		struct ice_rx_ring *rx_ring = vsi->rx_rings[qid];
> > -- 
> > 2.43.0
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-08-13 13:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-24 16:48 [PATCH iwl-net v2 0/6] ice: fix synchronization between .ndo_bpf() and reset Larysa Zaremba
2024-07-24 16:48 ` [PATCH iwl-net v2 1/6] ice: move netif_queue_set_napi to rtnl-protected sections Larysa Zaremba
2024-07-24 18:21   ` Jacob Keller
2024-07-24 20:40   ` Nambiar, Amritha
2024-08-08  2:19   ` [Intel-wired-lan] " Rout, ChandanX
2024-07-24 16:48 ` [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex Larysa Zaremba
2024-07-24 18:24   ` Jacob Keller
2024-08-08  2:16   ` [Intel-wired-lan] " Rout, ChandanX
2024-08-13 11:31   ` Maciej Fijalkowski
2024-08-13 13:36     ` Larysa Zaremba
2024-07-24 16:48 ` [PATCH iwl-net v2 3/6] ice: check for XDP rings instead of bpf program when unconfiguring Larysa Zaremba
2024-07-24 18:25   ` Jacob Keller
2024-08-08  2:18   ` [Intel-wired-lan] " Rout, ChandanX
2024-08-12 12:58   ` Maciej Fijalkowski
2024-08-12 15:13     ` Larysa Zaremba
2024-07-24 16:48 ` [PATCH iwl-net v2 4/6] ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset Larysa Zaremba
2024-07-24 18:27   ` Jacob Keller
2024-08-08  2:15   ` [Intel-wired-lan] " Rout, ChandanX
2024-07-24 16:48 ` [PATCH iwl-net v2 5/6] ice: remove ICE_CFG_BUSY locking from AF_XDP code Larysa Zaremba
2024-07-24 18:37   ` Jacob Keller
2024-08-08  2:17   ` [Intel-wired-lan] " Rout, ChandanX
2024-08-12 13:03   ` Maciej Fijalkowski
2024-08-12 15:59     ` Larysa Zaremba
2024-08-13 10:28       ` Maciej Fijalkowski
2024-07-24 16:48 ` [PATCH iwl-net v2 6/6] ice: do not bring the VSI up, if it was down before the XDP setup Larysa Zaremba
2024-07-24 18:40   ` Jacob Keller
2024-08-08  2:14   ` [Intel-wired-lan] " Rout, ChandanX

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).