public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Florian Kauer <florian.kauer@linutronix.de>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Naama Meir <naamax.meir@linux.intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.6 27/60] igc: avoid returning frame twice in XDP_REDIRECT
Date: Wed, 13 Mar 2024 12:36:34 -0400	[thread overview]
Message-ID: <20240313163707.615000-28-sashal@kernel.org> (raw)
In-Reply-To: <20240313163707.615000-1-sashal@kernel.org>

From: Florian Kauer <florian.kauer@linutronix.de>

[ Upstream commit ef27f655b438bed4c83680e4f01e1cde2739854b ]

When a frame can not be transmitted in XDP_REDIRECT
(e.g. due to a full queue), it is necessary to free
it by calling xdp_return_frame_rx_napi.

However, this is the responsibility of the caller of
the ndo_xdp_xmit (see for example bq_xmit_all in
kernel/bpf/devmap.c) and thus calling it inside
igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
driver) as well will lead to memory corruption.

In fact, bq_xmit_all expects that it can return all
frames after the last successfully transmitted one.
Therefore, break for the first not transmitted frame,
but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
This is equally implemented in other Intel drivers
such as the igb.

There are two alternatives to this that were rejected:
1. Return num_frames as all the frames would have been
   transmitted and release them inside igc_xdp_xmit.
   While it might work technically, it is not what
   the return value is meant to represent (i.e. the
   number of SUCCESSFULLY transmitted packets).
2. Rework kernel/bpf/devmap.c and all drivers to
   support non-consecutively dropped packets.
   Besides being complex, it likely has a negative
   performance impact without a significant gain
   since it is anyway unlikely that the next frame
   can be transmitted if the previous one was dropped.

The memory corruption can be reproduced with
the following script which leads to a kernel panic
after a few seconds.  It basically generates more
traffic than a i225 NIC can transmit and pushes it
via XDP_REDIRECT from a virtual interface to the
physical interface where frames get dropped.

   #!/bin/bash
   INTERFACE=enp4s0
   INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`

   sudo ip link add dev veth1 type veth peer name veth2
   sudo ip link set up $INTERFACE
   sudo ip link set up veth1
   sudo ip link set up veth2

   cat << EOF > redirect.bpf.c

   SEC("prog")
   int redirect(struct xdp_md *ctx)
   {
       return bpf_redirect($INTERFACE_IDX, 0);
   }

   char _license[] SEC("license") = "GPL";
   EOF
   clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
   sudo ip link set veth2 xdp obj redirect.bpf.o

   cat << EOF > pass.bpf.c

   SEC("prog")
   int pass(struct xdp_md *ctx)
   {
       return XDP_PASS;
   }

   char _license[] SEC("license") = "GPL";
   EOF
   clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
   sudo ip link set $INTERFACE xdp obj pass.bpf.o

   cat << EOF > trafgen.cfg

   {
     /* Ethernet Header */
     0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     const16(ETH_P_IP),

     /* IPv4 Header */
     0b01000101, 0,   # IPv4 version, IHL, TOS
     const16(1028),   # IPv4 total length (UDP length + 20 bytes (IP header))
     const16(2),      # IPv4 ident
     0b01000000, 0,   # IPv4 flags, fragmentation off
     64,              # IPv4 TTL
     17,              # Protocol UDP
     csumip(14, 33),  # IPv4 checksum

     /* UDP Header */
     10,  0, 1, 1,    # IP Src - adapt as needed
     10,  0, 1, 2,    # IP Dest - adapt as needed
     const16(6666),   # UDP Src Port
     const16(6666),   # UDP Dest Port
     const16(1008),   # UDP length (UDP header 8 bytes + payload length)
     csumudp(14, 34), # UDP checksum

     /* Payload */
     fill('W', 1000),
   }
   EOF

   sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp

Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 98de34d0ce07e..e549ffca88e39 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6489,7 +6489,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 	int cpu = smp_processor_id();
 	struct netdev_queue *nq;
 	struct igc_ring *ring;
-	int i, drops;
+	int i, nxmit;
 
 	if (unlikely(!netif_carrier_ok(dev)))
 		return -ENETDOWN;
@@ -6505,16 +6505,15 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 	/* Avoid transmit queue timeout since we share it with the slow path */
 	txq_trans_cond_update(nq);
 
-	drops = 0;
+	nxmit = 0;
 	for (i = 0; i < num_frames; i++) {
 		int err;
 		struct xdp_frame *xdpf = frames[i];
 
 		err = igc_xdp_init_tx_descriptor(ring, xdpf);
-		if (err) {
-			xdp_return_frame_rx_napi(xdpf);
-			drops++;
-		}
+		if (err)
+			break;
+		nxmit++;
 	}
 
 	if (flags & XDP_XMIT_FLUSH)
@@ -6522,7 +6521,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 
 	__netif_tx_unlock(nq);
 
-	return num_frames - drops;
+	return nxmit;
 }
 
 static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,
-- 
2.43.0


  parent reply	other threads:[~2024-03-13 16:37 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 01/60] dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 02/60] dmaengine: fsl-edma: utilize common dt-binding header file Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 03/60] dmaengine: fsl-edma: correct max_segment_size setting Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 04/60] ceph: switch to corrected encoding of max_xattr_size in mdsmap Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 05/60] mm: migrate: remove PageTransHuge check in numamigrate_isolate_page() Sasha Levin
2024-03-13 17:29   ` Hugh Dickins
2024-03-13 16:36 ` [PATCH 6.6 06/60] mm: migrate: remove THP mapcount " Sasha Levin
2024-03-13 17:31   ` Hugh Dickins
2024-03-13 16:36 ` [PATCH 6.6 07/60] mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio() Sasha Levin
2024-03-13 17:32   ` Hugh Dickins
2024-03-13 18:32     ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 08/60] mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 09/60] xfrm: Pass UDP encapsulation in TX packet offload Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 10/60] net: lan78xx: fix runtime PM count underflow on link stop Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 11/60] ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 12/60] i40e: disable NAPI right after disabling irqs when handling xsk_pool Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 13/60] ice: reorder disabling IRQ and NAPI in ice_qp_dis Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 14/60] Revert "net/mlx5: Block entering switchdev mode with ns inconsistency" Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 15/60] Revert "net/mlx5e: Check the number of elements before walk TC rhashtable" Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 16/60] net/mlx5: E-switch, Change flow rule destination checking Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 17/60] net/mlx5: Check capability for fw_reset Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 18/60] net/mlx5e: Change the warning when ignore_flow_level is not supported Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 19/60] net/mlx5e: Fix MACsec state loss upon state update in offload path Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 20/60] net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 21/60] net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 22/60] tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 23/60] geneve: make sure to pull inner header in geneve_rx() Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 24/60] net: sparx5: Fix use after free inside sparx5_del_mact_entry Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 25/60] ice: virtchnl: stop pretending to support RSS over AQ or registers Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 26/60] net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink() Sasha Levin
2024-03-13 16:36 ` Sasha Levin [this message]
2024-03-13 16:36 ` [PATCH 6.6 28/60] net/ipv6: avoid possible UAF in ip6_route_mpath_notify() Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 29/60] bpf: check bpf_func_state->callback_depth when pruning states Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 30/60] xdp, bonding: Fix feature flags when there are no slave devs anymore Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 31/60] selftests/bpf: Fix up xdp bonding test wrt feature flags Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 32/60] cpumap: Zero-initialise xdp_rxq_info struct before running XDP program Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 33/60] net: dsa: microchip: fix register write order in ksz8_ind_write8() Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 34/60] net/rds: fix WARNING in rds_conn_connect_if_down Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 35/60] netfilter: nft_ct: fix l3num expectations with inet pseudo family Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 36/60] netfilter: nf_conntrack_h323: Add protection for bmp length out of range Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 37/60] erofs: apply proper VMA alignment for memory mapped files on THP Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 38/60] netrom: Fix a data-race around sysctl_netrom_default_path_quality Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 39/60] netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 40/60] netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 41/60] netrom: Fix a data-race around sysctl_netrom_transport_timeout Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 42/60] netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 43/60] netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 44/60] netrom: Fix a data-race around sysctl_netrom_transport_busy_delay Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 45/60] netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 46/60] netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 47/60] netrom: Fix a data-race around sysctl_netrom_routing_control Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 48/60] netrom: Fix a data-race around sysctl_netrom_link_fails_count Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 49/60] netrom: Fix data-races around sysctl_net_busy_read Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 50/60] net: pds_core: Fix possible double free in error handling path Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 51/60] KVM: s390: add stat counter for shadow gmap events Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 52/60] KVM: s390: vsie: fix race during shadow creation Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 53/60] readahead: avoid multiple marked readahead pages Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 54/60] selftests: mptcp: decrease BW in simult flows Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 55/60] exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock) Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 56/60] x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 57/60] Documentation/hw-vuln: Add documentation for RFDS Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 58/60] x86/rfds: Mitigate Register File Data Sampling (RFDS) Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 59/60] KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 60/60] Linux 6.6.22-rc1 Sasha Levin
2024-03-14  8:02 ` [PATCH 6.6 00/60] 6.6.22-rc1 review Bagas Sanjaya
2024-03-14 10:08 ` Naresh Kamboju
2024-03-14 11:56 ` Takeshi Ogasawara
2024-03-14 20:55 ` Florian Fainelli
2024-03-15 15:44 ` Mark Brown
2024-03-15 16:01 ` Ron Economos
2024-03-15 17:36 ` Harshit Mogalapalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240313163707.615000-28-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=anthony.l.nguyen@intel.com \
    --cc=florian.kauer@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=naamax.meir@linux.intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox