Netdev List
* [PATCH net v2] netpoll: fix a use-after-free on shutdown path
From: Breno Leitao @ 2026-06-25 12:03 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Amerigo Wang
  Cc: netdev, linux-kernel, vlad.wing, asantostc, paulmck, kernel-team,
	stable, Pavan Chebbi, Breno Leitao

There is a use-after-free in netpoll, which KASAN detects:

      BUG: KASAN: slab-use-after-free in _raw_spin_lock_irqsave+0x3b/0x80
      Read of size 1 at addr ... by task kworker/9:1
      Workqueue: events queue_process
      Call Trace:
       skb_dequeue+0x1e/0xb0
       queue_process+0x2c/0x600
       process_scheduled_works+0x4b6/0x850
       worker_thread+0x414/0x5a0
      Allocated by task 242:
       __netpoll_setup+0x201/0x4a0
       netpoll_setup+0x249/0x550
       enabled_store+0x32f/0x380
      Freed by task 0:
       kfree+0x1b7/0x540
       rcu_core+0x3f8/0x7a0

The problem happens when there is a pending TX worker running in
parallel with the cleanup path.

This is what happens on the netpoll shutdown path:

1) __netpoll_cleanup() is called
2) set dev->npinfo to NULL
3) call_rcu() with rcu_cleanup_netpoll_info()
  3.1) rcu_cleanup_netpoll_info() tries to cancel all workers with
       cancel_delayed_work(), but doesn't wait for the worker to finish
4) and kfree(npinfo);

Because 3.1) doesn't really cancel the work (as the comment says, "we
can't call cancel_delayed_work_sync here, as we are in softirq"), the TX
worker can still run after 4).

TL;DR: queue_process() is not an RCU reader; it reaches npinfo through
the work item via container_of(), so the RCU grace period does not
protect it.

Use disable_delayed_work_sync() to ensure the worker is completely
stopped and prevent any future re-arming attempts. Once npinfo is set
to NULL, senders will bail out and not queue new work. The disable flag
ensures any in-flight re-arming attempts also fail silently.
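
For illustration only (not part of the patch): a minimal userspace model
of the ordering described above. Every name in it (model_work,
model_queue(), model_disable_sync(), model_free()) is hypothetical and
merely stands in for the kernel workqueue and RCU calls. The real
disable_delayed_work_sync() additionally waits for a running instance of
the work to finish, which this single-threaded sketch cannot show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_work {
	bool pending;	/* work is queued and will run later */
	bool disabled;	/* set by model_disable_sync(); blocks re-arming */
};

struct model_npinfo {
	struct model_work tx_work;
	bool freed;	/* stands in for kfree(npinfo) */
};

/* Models queueing the TX work: fails silently once the work is disabled. */
static bool model_queue(struct model_work *w)
{
	if (w->disabled)
		return false;
	w->pending = true;
	return true;
}

/* Models the cancel-and-disable part of disable_delayed_work_sync(). */
static void model_disable_sync(struct model_work *w)
{
	w->disabled = true;
	w->pending = false;
}

/* Models the deferred free: it is only safe when no work is pending. */
static bool model_free(struct model_npinfo *np)
{
	if (np->tx_work.pending)
		return false;	/* the real code would hit a use-after-free */
	np->freed = true;
	return true;
}
```

In this model, calling model_disable_sync() before model_free() mirrors
placing disable_delayed_work_sync() before call_rcu(): later
model_queue() calls return false, so nothing can be pending when the
structure is freed.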

In the future, we can do the cleanup inline here without needing the
npinfo->rcu rcu_head, but that is net-next material.

Cc: stable@vger.kernel.org
Fixes: 38e6bc185d95 ("netpoll: make __netpoll_cleanup non-block")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Remove the synchronize_rcu() and cancel the tx_work before
  call_rcu(). (Jakub)
- Link to v1: https://lore.kernel.org/r/20260622-netpoll_rcu_fix-v1-1-15c3285e92e6@debian.org
---
 net/core/netpoll.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 229dde818ab33..96d5945e6a30f 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -633,14 +633,6 @@ static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
 			container_of(rcu_head, struct netpoll_info, rcu);
 
 	skb_queue_purge(&npinfo->txq);
-
-	/* we can't call cancel_delayed_work_sync here, as we are in softirq */
-	cancel_delayed_work(&npinfo->tx_work);
-
-	/* clean after last, unfinished work */
-	__skb_queue_purge(&npinfo->txq);
-	/* now cancel it again */
-	cancel_delayed_work(&npinfo->tx_work);
 	kfree(npinfo);
 }
 
@@ -664,6 +656,7 @@ static void __netpoll_cleanup(struct netpoll *np)
 			ops->ndo_netpoll_cleanup(np->dev);
 
 		RCU_INIT_POINTER(np->dev->npinfo, NULL);
+		disable_delayed_work_sync(&npinfo->tx_work);
 		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
 	}
 

---
base-commit: d07d80b6a129a44538cda1549b7acf95154fb197
change-id: 20260622-netpoll_rcu_fix-def7bce1207a

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* Re: [PATCH net 1/3] i40e: keep q_vectors array in sync with channel count changes
From: Maciej Fijalkowski @ 2026-06-25 11:55 UTC
  To: Jakub Kicinski
  Cc: Tony Nguyen, davem, pabeni, edumazet, andrew+netdev, netdev,
	poros, arkadiusz.kubalewski, przemyslaw.kitszel, horms,
	aleksandr.loktionov, pmenzel, sx.rinitha
In-Reply-To: <20260605163934.547c7bdd@kernel.org>

On Fri, Jun 05, 2026 at 04:39:34PM -0700, Jakub Kicinski wrote:
> On Fri, 5 Jun 2026 11:01:19 -0700 Tony Nguyen wrote:
> > > Should the new err_lump label, and the existing err_vsi exits from the
> > > two allocation steps above, instead unwind through the err_rings block
> > > (unregister_netdev / free_netdev / i40e_devlink_destroy_port /
> > > i40e_aq_delete_element) the way i40e_vsi_setup()'s err_msix path does?
> > > 
> > > The pre-patch code had the same defective err_vsi target for the
> > > qp_pile and arrays paths, but the patch adds two new failure points
> > > (the unconditional q_vectors kzalloc and the new
> > > i40e_vsi_setup_vectors() call) that route into it during reset
> > > rebuild, where vsi->netdev is already registered.  
> >  
> > This does seem valid, but as mentioned by Sashiko the pre-patch code has 
> > the same target/issue. There's a recent submission [1], with changes 
> > requested, that should cover this. Did you want to take this now or wait 
> > and have it sent with this other one?
> 
> Hm. I convinced myself yesterday that the old code did _not_ 
> have the issue because it was passing false as the second arg to
> i40e_vsi_{alloc,free}_arrays() ? Good chance that I misread,
> it's tricky code. As much as I would love to apply this to prevent 
> the deadlock in NIPA - let's wait for the follow up. I'll pick up 
> the other two patches from this series off the list.

FWIW it was our beloved "pre-existing issue", alloc arrays could fail at
ring memory allocation and bail out without de-registering netdev.

Regardless, I'm gonna send a v4 with a preceding patch that should fix
this...

> 


* [PATCH v2 net-next] selftests/xsk: Preserve UMEM view in BIDIRECTIONAL test
From: Maciej Fijalkowski @ 2026-06-25 11:52 UTC
  To: netdev
  Cc: bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	tushar.vyavahare, kerneljasonxing, Maciej Fijalkowski

The UMEM state refactor made __send_pkts() use xsk->umem for Tx
address generation. At the same time, the shared-UMEM Tx setup copies the
Rx UMEM state into a Tx-local state object and resets base_addr and
next_buffer before configuring the Tx socket.

Passing that Tx-local object to xsk_configure() makes xsk->umem point to
the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
the roles are switched: the same socket is then used for Rx validation, but
received descriptors from the other logical UMEM half are checked against
base_addr == 0. With the new UMEM bounds check, a valid address such as
base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
window.

Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
address generation, preserving the BIDIRECTIONAL test's intent of using
the proper logical UMEM half after the direction switch.
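
For illustration only (not part of the patch): a small sketch of why a
zero-based view rejects an address from the other UMEM half. The
struct, the helper and the half size are hypothetical, and the window
is assumed to be [base_addr, base_addr + size); the selftest's actual
bounds check may differ. Only the headroom value (256, matching
XDP_PACKET_HEADROOM) is taken from the kernel UAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HEADROOM	256u		/* same value as XDP_PACKET_HEADROOM */
#define MODEL_HALF_SIZE	(64u * 4096u)	/* hypothetical size of one UMEM half */

/* Hypothetical view of one logical UMEM half. */
struct model_umem_view {
	uint64_t base_addr;	/* first valid address of this half */
	uint64_t size;		/* number of bytes in this half */
};

/* Assumed bounds check: addr must lie in [base_addr, base_addr + size). */
static bool model_addr_in_window(const struct model_umem_view *v, uint64_t addr)
{
	return addr >= v->base_addr && addr - v->base_addr < v->size;
}
```

With a view whose base_addr is MODEL_HALF_SIZE, the address
base_addr + MODEL_HEADROOM is inside the window; checked against a view
that was reset to base_addr == 0 with the same size, the same address
falls outside it.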

Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
Reviewed-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
v2:
- fix SoB line
- rebase
- add tags from Tushar
---
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 72875071d4f1..26437d4bdc8e 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -1169,8 +1169,8 @@ static int receive_pkts(struct test_spec *test)
 static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, bool timeout)
 {
 	u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len;
+	struct xsk_umem_info *umem = ifobject->xsk_arr[0].umem_real;
 	struct pkt_stream *pkt_stream = xsk->pkt_stream;
-	struct xsk_umem_info *umem = xsk->umem;
 	bool use_poll = ifobject->use_poll;
 	struct pollfd fds = { };
 	int ret;
@@ -1521,7 +1521,7 @@ static int thread_common_ops_tx(struct test_spec *test, struct ifobject *ifobjec
 	umem_tx->base_addr = 0;
 	umem_tx->next_buffer = 0;
 
-	ret = xsk_configure(test, ifobject, umem_tx, true);
+	ret = xsk_configure(test, ifobject, umem_rx, true);
 	if (ret)
 		return ret;
 	ifobject->xsk = &ifobject->xsk_arr[0];
-- 
2.43.0



* [PATCH net] mlxsw: spectrum_acl_erp: Fix const qualifier of delta_clear()
From: Evgenii Burenchev @ 2026-06-25 11:48 UTC
  To: stable, Greg Kroah-Hartman
  Cc: Evgenii Burenchev, idosch, petrm, andrew+netdev, davem, edumazet,
	kuba, pabeni, jiri, netdev, linux-kernel, lvc-project

mlxsw_sp_acl_erp_delta_clear() takes 'const char *enc_key' but modifies
the memory it points to. This is a logical error in the function
declaration.

The only caller passes a non-const buffer (aentry->ht_key.enc_key), so
the const qualifier is misleading and unnecessary.

Remove const from the enc_key parameter to match the actual usage.
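
For illustration only (not the mlxsw code): a minimal sketch of the
distinction the patch restores. Both helpers below are hypothetical. A
function that only reads the key can take a const pointer, while one
that clears bits in it writes through the pointer and should declare it
non-const.

```c
#include <assert.h>
#include <stddef.h>

/* Read-only access: const documents that the key is not modified. */
static unsigned char model_key_value(const unsigned char *key, size_t byte,
				     unsigned char mask)
{
	return key[byte] & mask;
}

/* Writes through the pointer, so the parameter must not be const. */
static void model_key_clear(unsigned char *key, size_t byte,
			    unsigned char mask)
{
	key[byte] &= (unsigned char)~mask;
}
```

Declaring the key parameter of model_key_clear() as const would make
the assignment a compile error unless the qualifier were cast away,
which hides the write from callers and from static analysis.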

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c  | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index cbb272a96359..0d0cd093b3c6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -1118,7 +1118,7 @@ u8 mlxsw_sp_acl_erp_delta_value(const struct mlxsw_sp_acl_erp_delta *delta,
 }
 
 void mlxsw_sp_acl_erp_delta_clear(const struct mlxsw_sp_acl_erp_delta *delta,
-				  const char *enc_key)
+				  char *enc_key)
 {
 	u16 start = delta->start;
 	u8 mask = delta->mask;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
index 010204f73ea4..67cc7a5737dd 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
@@ -245,7 +245,7 @@ u8 mlxsw_sp_acl_erp_delta_mask(const struct mlxsw_sp_acl_erp_delta *delta);
 u8 mlxsw_sp_acl_erp_delta_value(const struct mlxsw_sp_acl_erp_delta *delta,
 				const char *enc_key);
 void mlxsw_sp_acl_erp_delta_clear(const struct mlxsw_sp_acl_erp_delta *delta,
-				  const char *enc_key);
+				  char *enc_key);
 
 struct mlxsw_sp_acl_erp_mask;
 
-- 
2.43.0



* Re: [PATCH net v2] octeontx2-af: Block VFs from clobbering special CGX PKIND state
From: kernel test robot @ 2026-06-25 11:47 UTC
  To: Ratheesh Kannoth, davem, gakula, linux-kernel, netdev, sgoutham
  Cc: llvm, oe-kbuild-all, andrew+netdev, edumazet, kuba, pabeni,
	Hariprasad Kelam, Ratheesh Kannoth
In-Reply-To: <20260625044621.2841831-1-rkannoth@marvell.com>

Hi Ratheesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]
[also build test WARNING on linus/master v7.1 next-20260623]
[cannot apply to linux-review/Ratheesh-Kannoth/octeontx2-af-Block-VFs-from-clobbering-special-CGX-PKIND-state/20260622-133621 horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ratheesh-Kannoth/octeontx2-af-Block-VFs-from-clobbering-special-CGX-PKIND-state/20260625-124846
base:   net/main
patch link:    https://lore.kernel.org/r/20260625044621.2841831-1-rkannoth%40marvell.com
patch subject: [PATCH net v2] octeontx2-af: Block VFs from clobbering special CGX PKIND state
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20260625/202606251954.vsXupLpQ-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 6cc609bb250b21b47fc7d394b4019101e9983597)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260625/202606251954.vsXupLpQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606251954.vsXupLpQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1522:6: warning: variable 'pf' set but not used [-Wunused-but-set-variable]
    1522 |         int pf;
         |             ^
>> drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1696:24: warning: variable 'cgx' is uninitialized when used here [-Wuninitialized]
    1696 |                 cgxd = rvu_cgx_pdata(cgx, rvu);
         |                                      ^~~
   drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1521:8: note: initialize the variable 'cgx' to silence this warning
    1521 |         u8 cgx;
         |               ^
         |                = '\0'
   2 warnings generated.


vim +/pf +1522 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c

  1506	
  1507	int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
  1508					  struct nix_lf_alloc_req *req,
  1509					  struct nix_lf_alloc_rsp *rsp)
  1510	{
  1511		int nixlf, qints, hwctx_size, intf, rc = 0;
  1512		u16 bcast, mcast, promisc, ucast;
  1513		struct rvu_hwinfo *hw = rvu->hw;
  1514		u16 pcifunc = req->hdr.pcifunc;
  1515		bool rules_created = false;
  1516		struct rvu_block *block;
  1517		struct rvu_pfvf *pfvf;
  1518		u64 cfg, ctx_cfg;
  1519		struct cgx *cgxd;
  1520		int blkaddr;
  1521		u8 cgx;
> 1522		int pf;
  1523	
  1524		if (!req->rq_cnt || !req->sq_cnt || !req->cq_cnt)
  1525			return NIX_AF_ERR_PARAM;
  1526	
  1527		if (req->way_mask)
  1528			req->way_mask &= 0xFFFF;
  1529	
  1530		pfvf = rvu_get_pfvf(rvu, pcifunc);
  1531		blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
  1532		if (!pfvf->nixlf || blkaddr < 0)
  1533			return NIX_AF_ERR_AF_LF_INVALID;
  1534	
  1535		block = &hw->block[blkaddr];
  1536		nixlf = rvu_get_lf(rvu, block, pcifunc, 0);
  1537		if (nixlf < 0)
  1538			return NIX_AF_ERR_AF_LF_INVALID;
  1539	
  1540		/* Check if requested 'NIXLF <=> NPALF' mapping is valid */
  1541		if (req->npa_func) {
  1542			/* If default, use 'this' NIXLF's PFFUNC */
  1543			if (req->npa_func == RVU_DEFAULT_PF_FUNC)
  1544				req->npa_func = pcifunc;
  1545			if (!is_pffunc_map_valid(rvu, req->npa_func, BLKTYPE_NPA))
  1546				return NIX_AF_INVAL_NPA_PF_FUNC;
  1547		}
  1548	
  1549		/* Check if requested 'NIXLF <=> SSOLF' mapping is valid */
  1550		if (req->sso_func) {
  1551			/* If default, use 'this' NIXLF's PFFUNC */
  1552			if (req->sso_func == RVU_DEFAULT_PF_FUNC)
  1553				req->sso_func = pcifunc;
  1554			if (!is_pffunc_map_valid(rvu, req->sso_func, BLKTYPE_SSO))
  1555				return NIX_AF_INVAL_SSO_PF_FUNC;
  1556		}
  1557	
  1558		/* If RSS is being enabled, check if requested config is valid.
  1559		 * RSS table size should be power of two, otherwise
  1560		 * RSS_GRP::OFFSET + adder might go beyond that group or
  1561		 * won't be able to use entire table.
  1562		 */
  1563		if (req->rss_sz && (req->rss_sz > MAX_RSS_INDIR_TBL_SIZE ||
  1564				    !is_power_of_2(req->rss_sz)))
  1565			return NIX_AF_ERR_RSS_SIZE_INVALID;
  1566	
  1567		if (req->rss_sz &&
  1568		    (!req->rss_grps || req->rss_grps > MAX_RSS_GROUPS))
  1569			return NIX_AF_ERR_RSS_GRPS_INVALID;
  1570	
  1571		/* Reset this NIX LF */
  1572		rc = rvu_lf_reset(rvu, block, nixlf);
  1573		if (rc) {
  1574			dev_err(rvu->dev, "Failed to reset NIX%d LF%d\n",
  1575				block->addr - BLKADDR_NIX0, nixlf);
  1576			return NIX_AF_ERR_LF_RESET;
  1577		}
  1578	
  1579		ctx_cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST3);
  1580	
  1581		/* Alloc NIX RQ HW context memory and config the base */
  1582		hwctx_size = 1UL << ((ctx_cfg >> 4) & 0xF);
  1583		rc = qmem_alloc(rvu->dev, &pfvf->rq_ctx, req->rq_cnt, hwctx_size);
  1584		if (rc)
  1585			goto free_mem;
  1586	
  1587		pfvf->rq_bmap = kcalloc(req->rq_cnt, sizeof(long), GFP_KERNEL);
  1588		if (!pfvf->rq_bmap) {
  1589			rc = -ENOMEM;
  1590			goto free_mem;
  1591		}
  1592	
  1593		rvu_write64(rvu, blkaddr, NIX_AF_LFX_RQS_BASE(nixlf),
  1594			    (u64)pfvf->rq_ctx->iova);
  1595	
  1596		/* Set caching and queue count in HW */
  1597		cfg = BIT_ULL(36) | (req->rq_cnt - 1) | req->way_mask << 20;
  1598		rvu_write64(rvu, blkaddr, NIX_AF_LFX_RQS_CFG(nixlf), cfg);
  1599	
  1600		/* Alloc NIX SQ HW context memory and config the base */
  1601		hwctx_size = 1UL << (ctx_cfg & 0xF);
  1602		rc = qmem_alloc(rvu->dev, &pfvf->sq_ctx, req->sq_cnt, hwctx_size);
  1603		if (rc)
  1604			goto free_mem;
  1605	
  1606		pfvf->sq_bmap = kcalloc(req->sq_cnt, sizeof(long), GFP_KERNEL);
  1607		if (!pfvf->sq_bmap) {
  1608			rc = -ENOMEM;
  1609			goto free_mem;
  1610		}
  1611	
  1612		rvu_write64(rvu, blkaddr, NIX_AF_LFX_SQS_BASE(nixlf),
  1613			    (u64)pfvf->sq_ctx->iova);
  1614	
  1615		cfg = BIT_ULL(36) | (req->sq_cnt - 1) | req->way_mask << 20;
  1616		rvu_write64(rvu, blkaddr, NIX_AF_LFX_SQS_CFG(nixlf), cfg);
  1617	
  1618		/* Alloc NIX CQ HW context memory and config the base */
  1619		hwctx_size = 1UL << ((ctx_cfg >> 8) & 0xF);
  1620		rc = qmem_alloc(rvu->dev, &pfvf->cq_ctx, req->cq_cnt, hwctx_size);
  1621		if (rc)
  1622			goto free_mem;
  1623	
  1624		pfvf->cq_bmap = kcalloc(req->cq_cnt, sizeof(long), GFP_KERNEL);
  1625		if (!pfvf->cq_bmap) {
  1626			rc = -ENOMEM;
  1627			goto free_mem;
  1628		}
  1629	
  1630		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CQS_BASE(nixlf),
  1631			    (u64)pfvf->cq_ctx->iova);
  1632	
  1633		cfg = BIT_ULL(36) | (req->cq_cnt - 1) | req->way_mask << 20;
  1634		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CQS_CFG(nixlf), cfg);
  1635	
  1636		/* Initialize receive side scaling (RSS) */
  1637		hwctx_size = 1UL << ((ctx_cfg >> 12) & 0xF);
  1638		rc = nixlf_rss_ctx_init(rvu, blkaddr, pfvf, nixlf, req->rss_sz,
  1639					req->rss_grps, hwctx_size, req->way_mask,
  1640					!!(req->flags & NIX_LF_RSS_TAG_LSB_AS_ADDER));
  1641		if (rc)
  1642			goto free_mem;
  1643	
  1644		/* Alloc memory for CQINT's HW contexts */
  1645		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
  1646		qints = (cfg >> 24) & 0xFFF;
  1647		hwctx_size = 1UL << ((ctx_cfg >> 24) & 0xF);
  1648		rc = qmem_alloc(rvu->dev, &pfvf->cq_ints_ctx, qints, hwctx_size);
  1649		if (rc)
  1650			goto free_mem;
  1651	
  1652		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CINTS_BASE(nixlf),
  1653			    (u64)pfvf->cq_ints_ctx->iova);
  1654	
  1655		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CINTS_CFG(nixlf),
  1656			    BIT_ULL(36) | req->way_mask << 20);
  1657	
  1658		/* Alloc memory for QINT's HW contexts */
  1659		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
  1660		qints = (cfg >> 12) & 0xFFF;
  1661		hwctx_size = 1UL << ((ctx_cfg >> 20) & 0xF);
  1662		rc = qmem_alloc(rvu->dev, &pfvf->nix_qints_ctx, qints, hwctx_size);
  1663		if (rc)
  1664			goto free_mem;
  1665	
  1666		rvu_write64(rvu, blkaddr, NIX_AF_LFX_QINTS_BASE(nixlf),
  1667			    (u64)pfvf->nix_qints_ctx->iova);
  1668		rvu_write64(rvu, blkaddr, NIX_AF_LFX_QINTS_CFG(nixlf),
  1669			    BIT_ULL(36) | req->way_mask << 20);
  1670	
  1671		/* Setup VLANX TPID's.
  1672		 * Use VLAN1 for 802.1Q
  1673		 * and VLAN0 for 802.1AD.
  1674		 */
  1675		cfg = (0x8100ULL << 16) | 0x88A8ULL;
  1676		rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_CFG(nixlf), cfg);
  1677	
  1678		/* Enable LMTST for this NIX LF */
  1679		rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_CFG2(nixlf), BIT_ULL(0));
  1680	
  1681		/* Set CQE/WQE size, NPA_PF_FUNC for SQBs and also SSO_PF_FUNC */
  1682		if (req->npa_func)
  1683			cfg = req->npa_func;
  1684		if (req->sso_func)
  1685			cfg |= (u64)req->sso_func << 16;
  1686	
  1687		cfg |= (u64)req->xqe_sz << 33;
  1688		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CFG(nixlf), cfg);
  1689	
  1690		/* Config Rx pkt length, csum checks and apad  enable / disable */
  1691		rvu_write64(rvu, blkaddr, NIX_AF_LFX_RX_CFG(nixlf), req->rx_cfg);
  1692	
  1693		/* Configure pkind for TX parse config */
  1694		if (is_pf_cgxmapped(rvu, rvu_get_pf(rvu->pdev, pcifunc))) {
  1695			pf = rvu_get_pf(rvu->pdev, pcifunc);
> 1696			cgxd = rvu_cgx_pdata(cgx, rvu);
  1697	
  1698			mutex_lock(&cgxd->lock);
  1699			if (rvu_cgx_is_pkind_config_permitted(rvu, pcifunc)) {
  1700				cfg = NPC_TX_DEF_PKIND;
  1701				rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_PARSE_CFG(nixlf), cfg);
  1702			}
  1703			mutex_unlock(&cgxd->lock);
  1704		}
  1705	
  1706		if (is_rep_dev(rvu, pcifunc)) {
  1707			pfvf->tx_chan_base = RVU_SWITCH_LBK_CHAN;
  1708			pfvf->tx_chan_cnt = 1;
  1709			goto exit;
  1710		}
  1711	
  1712		intf = is_lbk_vf(rvu, pcifunc) ? NIX_INTF_TYPE_LBK : NIX_INTF_TYPE_CGX;
  1713		if (is_sdp_pfvf(rvu, pcifunc))
  1714			intf = NIX_INTF_TYPE_SDP;
  1715	
  1716		if (is_cn20k(rvu->pdev)) {
  1717			rc = npc_cn20k_dft_rules_idx_get(rvu, pcifunc, &bcast, &mcast,
  1718							 &promisc, &ucast);
  1719			if (rc) {
  1720				rc = npc_cn20k_dft_rules_alloc(rvu, pcifunc);
  1721				if (rc)
  1722					goto free_mem;
  1723	
  1724				rules_created = true;
  1725			}
  1726		}
  1727	
  1728		rc = nix_interface_init(rvu, pcifunc, intf, nixlf, rsp,
  1729					!!(req->flags & NIX_LF_LBK_BLK_SEL));
  1730		if (rc)
  1731			goto free_dft;
  1732	
  1733		/* Disable NPC entries as NIXLF's contexts are not initialized yet */
  1734		rvu_npc_disable_default_entries(rvu, pcifunc, nixlf);
  1735	
  1736		/* Configure RX VTAG Type 7 (strip) for vf vlan */
  1737		rvu_write64(rvu, blkaddr,
  1738			    NIX_AF_LFX_RX_VTAG_TYPEX(nixlf, NIX_AF_LFX_RX_VTAG_TYPE7),
  1739			    VTAGSIZE_T4 | VTAG_STRIP);
  1740	
  1741		goto exit;
  1742	
  1743	free_dft:
  1744		if (is_cn20k(rvu->pdev) && rules_created)
  1745			npc_cn20k_dft_rules_free(rvu, pcifunc);
  1746	
  1747	free_mem:
  1748		nix_ctx_free(rvu, pfvf);
  1749	
  1750	exit:
  1751		/* Set macaddr of this PF/VF */
  1752		ether_addr_copy(rsp->mac_addr, pfvf->mac_addr);
  1753	
  1754		/* set SQB size info */
  1755		cfg = rvu_read64(rvu, blkaddr, NIX_AF_SQ_CONST);
  1756		rsp->sqb_size = (cfg >> 34) & 0xFFFF;
  1757		rsp->rx_chan_base = pfvf->rx_chan_base;
  1758		rsp->tx_chan_base = pfvf->tx_chan_base;
  1759		rsp->rx_chan_cnt = pfvf->rx_chan_cnt;
  1760		rsp->tx_chan_cnt = pfvf->tx_chan_cnt;
  1761		rsp->lso_tsov4_idx = NIX_LSO_FORMAT_IDX_TSOV4;
  1762		rsp->lso_tsov6_idx = NIX_LSO_FORMAT_IDX_TSOV6;
  1763		/* Get HW supported stat count */
  1764		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST1);
  1765		rsp->lf_rx_stats = ((cfg >> 32) & 0xFF);
  1766		rsp->lf_tx_stats = ((cfg >> 24) & 0xFF);
  1767		/* Get count of CQ IRQs and error IRQs supported per LF */
  1768		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
  1769		rsp->qints = ((cfg >> 12) & 0xFFF);
  1770		rsp->cints = ((cfg >> 24) & 0xFFF);
  1771		rsp->cgx_links = hw->cgx_links;
  1772		rsp->lbk_links = hw->lbk_links;
  1773		rsp->sdp_links = hw->sdp_links;
  1774	
  1775		return rc;
  1776	}
  1777	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 6.12.y] net: add missing ns_capable check for peer netns
From: Greg KH @ 2026-06-25 11:37 UTC
  To: Maximilian Heyne
  Cc: stable, Marc Kleine-Budde, Vincent Mailhol, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Daniel Borkmann, Nikolay Aleksandrov, Eric W. Biederman,
	linux-can, netdev, linux-kernel, bpf
In-Reply-To: <20260617-pats-coif-316245c6@mheyne-amazon>

On Wed, Jun 17, 2026 at 08:25:31AM +0000, Maximilian Heyne wrote:
> The upstream commit 7b735ef81286 ("rtnetlink: add missing
> netlink_ns_capable() check for peer netns") doesn't apply on older
> stable kernels due to refactoring. Therefore, this patch is an attempt
> to implement the same capability check just directly in the respective
> interface types.

Why can't we take the full series of patches instead?  Otherwise this is
going to be a pain over time for any other fixes/updates in this area,
right?

And if not, then we need acks from the maintainers here...

thanks,

greg k-h


* [PATCH iproute2-next v5] ip/bond: add lacp_strict support
From: Louis Scalbert @ 2026-06-25 11:42 UTC
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, jonas.gorski, horms, stephen, Louis Scalbert

lacp_strict defines the behavior of a LACP bonding interface
when no slaves are in Collecting_Distributing state while at least
'min_links' slaves have carrier.

In the default (off) mode, the bonding master remains up and a
single slave is selected for TX/RX, while traffic received on other
slaves is dropped. This preserves the existing behavior.

In lacp_strict mode, the bonding master reports carrier down in this
situation.

Link: https://lore.kernel.org/netdev/20260603150331.1919611-1-louis.scalbert@6wind.com/
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 include/uapi/linux/if_link.h |  1 +
 ip/iplink_bond.c             | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 70aee114..d3a21fba 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1601,6 +1601,7 @@ enum {
 	IFLA_BOND_NS_IP6_TARGET,
 	IFLA_BOND_COUPLED_CONTROL,
 	IFLA_BOND_BROADCAST_NEIGH,
+	IFLA_BOND_LACP_STRICT,
 	__IFLA_BOND_MAX,
 };
 
diff --git a/ip/iplink_bond.c b/ip/iplink_bond.c
index 714fe7bd..7e2e397a 100644
--- a/ip/iplink_bond.c
+++ b/ip/iplink_bond.c
@@ -87,6 +87,12 @@ static const char *lacp_rate_tbl[] = {
 	NULL,
 };
 
+static const char *lacp_strict_tbl[] = {
+	"off",
+	"on",
+	NULL,
+};
+
 static const char *ad_select_tbl[] = {
 	"stable",
 	"bandwidth",
@@ -155,6 +161,7 @@ static void print_explain(FILE *f)
 		"                [ ad_user_port_key PORTKEY ]\n"
 		"                [ ad_actor_sys_prio SYSPRIO ]\n"
 		"                [ ad_actor_system LLADDR ]\n"
+		"                [ lacp_strict LACP_STRICT ]\n"
 		"                [ arp_missed_max MISSED_MAX ]\n"
 		"\n"
 		"BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb\n"
@@ -168,6 +175,7 @@ static void print_explain(FILE *f)
 		"AD_SELECT := stable|bandwidth|count\n"
 		"COUPLED_CONTROL := off|on\n"
 		"BROADCAST_NEIGHBOR := off|on\n"
+		"LACP_STRICT := off|on\n"
 	);
 }
 
@@ -188,6 +196,7 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u32 packets_per_slave;
 	__u8 missed_max;
 	__u8 broadcast_neighbor;
+	__u8 lacp_strict;
 	unsigned int ifindex;
 	int ret;
 
@@ -417,6 +426,13 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 				return -1;
 			addattr_l(n, 1024, IFLA_BOND_AD_ACTOR_SYSTEM,
 				  abuf, len);
+		} else if (matches(*argv, "lacp_strict") == 0) {
+			NEXT_ARG();
+			lacp_strict = parse_on_off("lacp_strict", *argv, &ret);
+			if (ret)
+				return ret;
+			lacp_strict = get_index(lacp_strict_tbl, *argv);
+			addattr8(n, 1024, IFLA_BOND_LACP_STRICT, lacp_strict);
 		} else if (matches(*argv, "tlb_dynamic_lb") == 0) {
 			NEXT_ARG();
 			if (get_u8(&tlb_dynamic_lb, *argv, 0)) {
@@ -642,6 +658,10 @@ static void bond_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 			   "all_slaves_active %u ",
 			   rta_getattr_u8(tb[IFLA_BOND_ALL_SLAVES_ACTIVE]));
 
+	if (tb[IFLA_BOND_LACP_STRICT])
+		print_on_off(PRINT_ANY, "lacp_strict", "lacp_strict %s ",
+			     rta_getattr_u8(tb[IFLA_BOND_LACP_STRICT]));
+
 	if (tb[IFLA_BOND_MIN_LINKS])
 		print_uint(PRINT_ANY,
 			   "min_links",
-- 
2.39.2



* Re: [PATCH net 3/4] vlan: defer real device state propagation to netdev_work
From: Nicolai Buchwitz @ 2026-06-25 11:37 UTC
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, jv, sdf,
	dongchenchen2, idosch, n05ec, yuantan098, kuniyu,
	aleksandr.loktionov, dtatulea, syzbot+09da62a8b78959ceb8bb,
	syzbot+cb67c392b0b8f0fd0fc1, syzbot+9bb8bd77f3966641f298
In-Reply-To: <20260624182018.2445732-4-kuba@kernel.org>

On 24.6.2026 20:20, Jakub Kicinski wrote:
> vlan_device_event() generates nested UP/DOWN, MTU and feature
> change events. It executes an event for the VLAN device directly
> from the notifier - while the locks of the lower device are held.
> 
> This causes deadlocks, for example:
> 
>   bond    (3) bond_update_speed_duplex(vlan)
>     |           ^                v
>   vlan    (2) UP(vlan)    (4) vlan_ethtool_get_link_ksettings()
>     |           ^                v
>   dummy   (1) UP(dummy)   (5) __ethtool_get_link_ksettings()
> 
> The dummy device is ops locked, vlan creates a nested event (2),
> then bond wants to ask vlan for link state (3). bond uses the
> "I'm already holding the instance lock" flavor of API. But in
> this case the lock held refers to vlan itself. We hit vlan's
> link settings trampoline (4) and call __ethtool_get_link_ksettings()
> which tries to lock dummy. Deadlock. There's no clean way for us
> to tell the vlan_ethtool_get_link_ksettings() that the caller
> is already in lower device's critical section.
> 
> Defer the propagation to the per-netdev work facility instead:
> the notifier only schedules netdev_work_sched(vlandev, VLAN_WORK_*),
> and ndo_work (vlan_dev_work) applies the change later. Hopefully
> nobody expects the VLAN state changes to be instantaneous.
> 
> If someone does expect the changes to be instantaneous we will
> have to do the same thing Stan did for rx_mode and "strategically"
> place sync calls, to make sure such delayed works are executed
> after we drop the ops lock but before we drop rtnl_lock.
> 
> Stan suggests that if we need that down the line we may
> consider reshaping the mechanism into "async notifications".
> AFAICT only vlan does this sort of netdev open chaining,
> so as a first try I think that sticking the complexity into
> the vlan code makes sense.
> 
> One corner case is that we need to cancel the event if user
> explicitly changes the state before work could run. Consider
> the following operations with vlan0 on top of dummy0:
> 
>   ip link set dev dummy0 up    # queues work to up vlan0
>   ip link set dev vlan0 down   # user explicitly downs the vlan
>   ndo_work                     # acts on the stale event
> 
> Reported-by: syzbot+09da62a8b78959ceb8bb@syzkaller.appspotmail.com
> Reported-by: syzbot+cb67c392b0b8f0fd0fc1@syzkaller.appspotmail.com
> Reported-by: syzbot+9bb8bd77f3966641f298@syzkaller.appspotmail.com
> Fixes: 9f275c2e9020 ("net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---

> [...]

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>

Thanks
Nicolai


* [PATCH net] nfc: clear active_target when the target list is replaced
From: Yinhao Hu @ 2026-06-25 11:18 UTC
  To: netdev
  Cc: david, davem, edumazet, kuba, pabeni, horms, dzm91,
	hust-os-kernel-patches, Yinhao Hu

nfc_activate_target() and nfc_dep_link_up() cache dev->active_target as a
raw pointer into the dev->targets array. When a later poll reports new
targets, nfc_targets_found() frees and replaces dev->targets but does not
clear dev->active_target, so the cached pointer is left dangling into
freed memory. Any subsequent NFC core path that dereferences
dev->active_target->idx, such as nfc_deactivate_target() or
nfc_data_exchange(), then reads freed memory.

When nfc_targets_found() is about to free the current target array, clear
dev->active_target if it points into that array, and tear down the
associated active state (stop the presence-check timer, drop the DEP link
and reset the RF mode) as nfc_deactivate_target() does.
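
The hazard and the fix can be illustrated with a minimal userspace sketch. It assumes simplified stand-in types: `struct dev`, `struct target` and `replace_targets()` are hypothetical names, not the NFC core API, and the timer and DEP link teardown are left out. The only point shown is that a pointer cached into an array has to be cleared before that array is freed and replaced.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct nfc_target / struct nfc_dev. */
struct target { int idx; };

struct dev {
	struct target *targets;       /* heap array, replaced on each poll */
	int n_targets;
	struct target *active_target; /* cached pointer into targets[] */
};

/*
 * Replace the target array. If the cached pointer refers to an element of
 * the old array, clear it before the old array is freed.
 */
static int replace_targets(struct dev *d, const struct target *found, int n)
{
	struct target *copy = NULL;
	int i;

	if (n > 0) {
		copy = malloc(sizeof(*copy) * n);
		if (!copy)
			return -1;
		memcpy(copy, found, sizeof(*copy) * n);
	}

	if (d->active_target && d->targets) {
		for (i = 0; i < d->n_targets; i++) {
			if (d->active_target == &d->targets[i]) {
				d->active_target = NULL;
				break;
			}
		}
	}

	free(d->targets);
	d->targets = copy;
	d->n_targets = n;
	return 0;
}
```

Without the loop, `active_target` would keep pointing into the freed array after the second call to `replace_targets()`.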

Fixes: 900994332675 ("NFC: Cache the core NFC active target pointer instead of its index")
Signed-off-by: Yinhao Hu <dddddd@hust.edu.cn>
---
 net/nfc/core.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/net/nfc/core.c b/net/nfc/core.c
index a92a6566e6a0..950807906645 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -786,6 +786,21 @@ int nfc_targets_found(struct nfc_dev *dev,
 
 	dev->targets_generation++;
 
+	if (dev->active_target && dev->targets) {
+		for (i = 0; i < dev->n_targets; i++) {
+			if (dev->active_target != &dev->targets[i])
+				continue;
+
+			if (dev->ops->check_presence)
+				timer_delete_sync(&dev->check_pres_timer);
+
+			dev->active_target = NULL;
+			dev->dep_link_up = false;
+			dev->rf_mode = NFC_RF_NONE;
+			break;
+		}
+	}
+
 	kfree(dev->targets);
 	dev->targets = NULL;
 
-- 
2.43.0


^ permalink raw reply related

* Re: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
From: Marcin Szycik @ 2026-06-25 11:07 UTC (permalink / raw)
  To: Florian Bezdeka, Kwapulinski, Piotr, Ding Meng, Nguyen, Anthony L,
	Kitszel, Przemyslaw, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	Kiszka, Jan
  Cc: intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, wq.wang@siemens.com
In-Reply-To: <d058b0fa9ad923514084a44f51c78ae8355c4ebb.camel@siemens.com>



On 24/06/2026 11:05, Florian Bezdeka via Intel-wired-lan wrote:
> On Tue, 2026-06-23 at 09:46 +0000, Kwapulinski, Piotr wrote:
>>> -----Original Message-----
>>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ding Meng via Intel-wired-lan
>>> Sent: Monday, June 22, 2026 6:13 AM
>>> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Kiszka, Jan <jan.kiszka@siemens.com>; Bezdeka, Florian <florian.bezdeka@siemens.com>
>>> Cc: intel-wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; meng.ding@siemens.com; wq.wang@siemens.com
>>> Subject: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
>>>
>>> When CONFIG_NET_RX_BUSY_POLL is deactivated, fetching RX HW timestamps from the NIC no longer works as expected.
>>>
>>> This occurs because disabling CONFIG_NET_RX_BUSY_POLL disables the SKB NAPI mapping in __skb_mark_napi_id(). Consequently, get_timestamp() fails to perform its driver lookup, and the igc driver's struct net_device_ops::ndo_get_tstamp is never invoked.
>>>
>>> Instead, get_timestamp() falls back to using skb_hwtstamps(skb)->hwtstamp, a field that the driver has not populated.
>>>
>>> Fix this by populating the hwtstamp field with the correct timestamp in the default timer when CONFIG_NET_RX_BUSY_POLL is disabled.
>>>
>>> Fixes: 069b142f5819 ("igc: Add support for PTP .getcyclesx64()")
>>> Co-developed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> Signed-off-by: Ding Meng <meng.ding@siemens.com>
>>> ---
>>> drivers/net/ethernet/intel/igc/igc_main.c | 38 ++++++++++++++++-------
>>> 1 file changed, 26 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>>> index 8ac16808023..1da8d7aa76d 100644
>>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>>> @@ -1992,7 +1992,26 @@ static struct sk_buff *igc_build_skb(struct igc_ring *rx_ring,
>>> 	return skb;
>>> }
>>>
>>> -static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
>>> +static void igc_construct_skb_timestamps(struct igc_adapter *adapter,
>>> +					 struct sk_buff *skb,
>>> +					 struct igc_xdp_buff *ctx)
>>> +{
>>> +	if (!ctx->rx_ts)
>>> +		return;
>>> +#ifdef CONFIG_NET_RX_BUSY_POLL
>>> +	skb_shinfo(skb)->tx_flags |= SKBTX_HW_TSTAMP_NETDEV;
>>> +	skb_hwtstamps(skb)->netdev_data = ctx->rx_ts; #else
>>> +	struct igc_inline_rx_tstamps *tstamps;
>> Please move at the top of the function and add:
> 
> That would trigger a "unused variable" warning in the
> CONFIG_NET_RX_BUSY_POLL case.

Put it under #ifndef CONFIG_NET_RX_BUSY_POLL. Variable declarations
need to be on top.
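
A rough userspace sketch of that layout, under stated assumptions: `BUSY_POLL` is a hypothetical stand-in for CONFIG_NET_RX_BUSY_POLL, and `struct rx_tstamps` and `pick_timestamp()` are made-up names, not igc code. The declaration sits at the top of the function and is only compiled in the branch that uses it, so neither configuration sees an unused variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's inline timestamp layout. */
struct rx_tstamps { uint64_t timer0; };

/* Build with -DBUSY_POLL to compile the other branch. */
static uint64_t pick_timestamp(const void *rx_ts)
{
#ifndef BUSY_POLL
	const struct rx_tstamps *tstamps; /* declared first, only where used */
#endif

	if (!rx_ts)
		return 0;
#ifdef BUSY_POLL
	/* Placeholder: hand the raw pointer on for a later lookup. */
	return (uint64_t)(uintptr_t)rx_ts;
#else
	tstamps = rx_ts;
	return tstamps->timer0;
#endif
}
```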

Thanks,
Marcin

> Btw: I was really confused that the #else statement moved to the end of
> the previous line. Might someone be using a wrongly configured mail
> client here?
> 
> Florian
> 
>> Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
>>
>>> +
>>> +	tstamps = ctx->rx_ts;
>>> +	skb_hwtstamps(skb)->hwtstamp = igc_ptp_rx_pktstamp(adapter,
>>> +							   tstamps->timer0);
>>> +#endif
>>> +}
>>> +
> 
> [snip]


^ permalink raw reply

* [PATCH bpf-next v10 5/5] selftests/bpf: add bpf_icmp_send no route test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

For normal live cgroup_skb paths, the skb should already be routed. The
exception is tests run via BPF_PROG_TEST_RUN with packets created
via bpf_prog_test_run_skb. Those lack a dst route, so icmp_send
would quietly fail by returning early.

This test exercises this and makes sure the kfunc returns -ENETUNREACH.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bb532aa0d158..ffaf0fe1880b 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -169,6 +169,29 @@ static void run_icmp_test(struct icmp_send *skel, int af, const char *ip,
 	}
 }

+static void run_icmp_no_route_test(struct icmp_send *skel)
+{
+	struct ipv4_packet pkt = pkt_v4;
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt,
+		.data_size_in = sizeof(pkt),
+	);
+	int err;
+
+	pkt.iph.version = 4;
+	pkt.iph.daddr = inet_addr("127.0.0.1");
+	pkt.tcp.dest = htons(80);
+	skel->bss->server_port = 80;
+	skel->bss->unreach_type = ICMP_DEST_UNREACH;
+	skel->bss->unreach_code = ICMP_HOST_UNREACH;
+	skel->data->kfunc_ret = KFUNC_RET_UNSET;
+
+	err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.egress), &opts);
+	if (!ASSERT_OK(err, "test_run"))
+		return;
+	ASSERT_EQ(skel->data->kfunc_ret, -ENETUNREACH, "kfunc_ret_no_route");
+}
+
 void test_icmp_send_unreach_cgroup(void)
 {
 	struct icmp_send *skel;
@@ -193,6 +216,9 @@ void test_icmp_send_unreach_cgroup(void)
 	if (test__start_subtest("ipv6"))
 		run_icmp_test(skel, AF_INET6, "::1", ICMPV6_REJECT_ROUTE);

+	if (test__start_subtest("no_route"))
+		run_icmp_no_route_test(skel);
+
 cleanup:
 	icmp_send__destroy(skel);
 	if (cgroup_fd >= 0)
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 4/5] selftests/bpf: add bpf_icmp_send recursion test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This test is similar to test_icmp_send_unreach_cgroup but checks that,
in case of recursion, meaning that the BPF program calling the kfunc was
re-triggered by the icmp_send done by the kfunc, the kfunc will stop
early and return -EBUSY.

The test attaches to the root cgroup to ensure the ICMP packet generated
by the kfunc re-triggers the BPF program.
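
The order in which the two results are recorded can be modelled in plain userspace C. This is only a sketch under assumptions: `fake_icmp_send()` and the `in_send` flag are hypothetical and stand in for the kfunc and its recursion check (the real kfunc looks at the socket owning the skb, not at a flag). The nested invocation finishes first, so it writes index 0, and the outer invocation writes index 1.

```c
#include <errno.h>

/* Hypothetical userspace model of the BPF program's globals. */
static unsigned int rec_count;
static int rec_kfunc_rets[2] = { -1, -1 };
static int in_send; /* models the kfunc's recursion guard */

static void prog(void);

/* Models bpf_icmp_send(): sending re-triggers the program once. */
static int fake_icmp_send(void)
{
	if (in_send)
		return -EBUSY;

	in_send = 1;
	prog(); /* the generated ICMP packet hits the egress hook again */
	in_send = 0;
	return 0;
}

/* Models the recursion program: record the result after the call returns. */
static void prog(void)
{
	int ret = fake_icmp_send();

	rec_kfunc_rets[rec_count & 1] = ret;
	rec_count++;
}
```

Calling `prog()` once leaves `rec_count` at 2, with -EBUSY stored at index 0 and 0 at index 1, which is the ordering the test asserts.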

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 46 ++++++++++++++++
 tools/testing/selftests/bpf/progs/icmp_send.c | 55 +++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bbb3c3d4509c..bb532aa0d158 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -1,8 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 #include <network_helpers.h>
+#include <cgroup_helpers.h>
 #include <linux/errqueue.h>
 #include <poll.h>
+#include <unistd.h>
 #include "icmp_send.skel.h"

 #define TIMEOUT_MS 1000
@@ -10,6 +12,7 @@
 #define ICMP_DEST_UNREACH 3
 #define ICMPV6_DEST_UNREACH 1

+#define ICMP_HOST_UNREACH 1
 #define ICMP_FRAG_NEEDED 4
 #define NR_ICMP_UNREACH 15
 #define ICMPV6_REJECT_ROUTE 6
@@ -195,3 +198,46 @@ void test_icmp_send_unreach_cgroup(void)
 	if (cgroup_fd >= 0)
 		close(cgroup_fd);
 }
+
+void test_icmp_send_unreach_recursion(void)
+{
+	struct icmp_send *skel;
+	int cgroup_fd = -1;
+	int err;
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		return;
+
+	skel = icmp_send__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = get_root_cgroup();
+	if (!ASSERT_OK_FD(cgroup_fd, "get_root_cgroup"))
+		goto cleanup;
+
+	skel->data->target_pid = getpid();
+	skel->links.recursion =
+		bpf_program__attach_cgroup(skel->progs.recursion, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.recursion, "prog_attach_cgroup"))
+		goto cleanup;
+
+	trigger_prog_read_icmp_errqueue(skel, ICMP_HOST_UNREACH, AF_INET,
+					"127.0.0.1");
+
+	/*
+	 * Because of the recursion, the nested (second) call returns first and
+	 * stores its result at index 0, while the outer (first) call returns
+	 * last and stores its result at index 1.
+	 */
+	ASSERT_EQ(skel->bss->rec_count, 2, "rec_count");
+	ASSERT_EQ(skel->data->rec_kfunc_rets[0], -EBUSY, "kfunc_rets[0]");
+	ASSERT_EQ(skel->data->rec_kfunc_rets[1], 0, "kfunc_rets[1]");
+
+cleanup:
+	icmp_send__destroy(skel);
+	if (cgroup_fd >= 0)
+		close(cgroup_fd);
+	cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
index 6e1ba539eeb0..c642ccdf9fd5 100644
--- a/tools/testing/selftests/bpf/progs/icmp_send.c
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -12,6 +12,10 @@ __u16 server_port = 0;
 int unreach_type = 0;
 int unreach_code = 0;
 int kfunc_ret = -1;
+int target_pid = -1;
+
+unsigned int rec_count = 0;
+int rec_kfunc_rets[] = { -1, -1 };

 SEC("cgroup_skb/egress")
 int egress(struct __sk_buff *skb)
@@ -65,4 +69,55 @@ int egress(struct __sk_buff *skb)
 	return SK_DROP;
 }

+SEC("cgroup_skb/egress")
+int recursion(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct icmphdr *icmph;
+	struct tcphdr *tcph;
+	struct iphdr *iph;
+	int ret;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return SK_PASS;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4)
+		return SK_PASS;
+
+	if (iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	if (iph->protocol == IPPROTO_TCP) {
+		tcph = (void *)iph + iph->ihl * 4;
+		if ((void *)(tcph + 1) > data_end ||
+		    tcph->dest != bpf_htons(server_port))
+			return SK_PASS;
+	} else if (iph->protocol == IPPROTO_ICMP) {
+		icmph = (void *)iph + iph->ihl * 4;
+		if ((void *)(icmph + 1) > data_end ||
+		    icmph->type != unreach_type || icmph->code != unreach_code)
+			return SK_PASS;
+	} else {
+		return SK_PASS;
+	}
+
+	/*
+	 * This call will provoke a recursion: the ICMP packet generated by the
+	 * kfunc will re-trigger this program since we are in the root cgroup, to
+	 * which the kernel ICMP socket belongs. However, when re-entering the
+	 * kfunc, it should return -EBUSY.
+	 */
+	ret = bpf_icmp_send(skb, unreach_type, unreach_code);
+	rec_kfunc_rets[rec_count & 1] = ret;
+	__sync_fetch_and_add(&rec_count, 1);
+
+	/* Let the first ICMP error message pass */
+	if (iph->protocol == IPPROTO_ICMP)
+		return SK_PASS;
+
+	return SK_DROP;
+}
+
 char LICENSE[] SEC("license") = "Dual BSD/GPL";
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 3/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb IPv6 tests
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This test extends the existing cgroup_skb tests with IPv6 support.

Note that we need to set IPV6_RECVERR on the socket for IPv6 in
connect_to_fd_nonblock, otherwise the error will be ignored even if we
are in the middle of the TCP handshake. See ipv6_icmp_error() in
net/ipv6/datagram.c for more details.
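
As a standalone sketch of that step (the helper name `enable_recverr_if_ipv6()` is hypothetical and not part of the selftest helpers), the option is only requested for AF_INET6 sockets and any setsockopt() failure is reported to the caller:

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: ask the kernel to queue ICMPv6 errors on the
 * socket's error queue. Returns 0 on success or when the family is not
 * IPv6, -1 if setsockopt() fails.
 */
static int enable_recverr_if_ipv6(int fd, int family)
{
	int on = 1;

	if (family != AF_INET6)
		return 0;

	if (setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &on, sizeof(on)) < 0)
		return -1;

	return 0;
}
```

The call has to happen before connect(), as in the patch, so that an error arriving during the handshake is not dropped.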

Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 91 +++++++++++++------
 tools/testing/selftests/bpf/progs/icmp_send.c | 48 ++++++++--
 2 files changed, 101 insertions(+), 38 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index b8a98c90053e..bbb3c3d4509c 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -8,9 +8,11 @@
 #define TIMEOUT_MS 1000

 #define ICMP_DEST_UNREACH 3
+#define ICMPV6_DEST_UNREACH 1

 #define ICMP_FRAG_NEEDED 4
 #define NR_ICMP_UNREACH 15
+#define ICMPV6_REJECT_ROUTE 6

 #define KFUNC_RET_UNSET -1

@@ -18,7 +20,7 @@ static int connect_to_fd_nonblock(int server_fd)
 {
 	struct sockaddr_storage addr;
 	socklen_t len = sizeof(addr);
-	int fd, err;
+	int fd, err, on = 1;

 	if (getsockname(server_fd, (struct sockaddr *)&addr, &len))
 		return -1;
@@ -27,6 +29,12 @@ static int connect_to_fd_nonblock(int server_fd)
 	if (fd < 0)
 		return -1;

+	if (addr.ss_family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &on, sizeof(on)) < 0) {
+		close(fd);
+		return -1;
+	}
+
 	err = connect(fd, (struct sockaddr *)&addr, len);
 	if (err < 0 && errno != EINPROGRESS) {
 		close(fd);
@@ -36,8 +44,14 @@ static int connect_to_fd_nonblock(int server_fd)
 	return fd;
 }

-static void read_icmp_errqueue(int sockfd, int expected_code)
+static void read_icmp_errqueue(int sockfd, int expected_code, int af)
 {
+	int expected_ee_type = (af == AF_INET) ? ICMP_DEST_UNREACH :
+						 ICMPV6_DEST_UNREACH;
+	int expected_origin = (af == AF_INET) ? SO_EE_ORIGIN_ICMP :
+						SO_EE_ORIGIN_ICMP6;
+	int expected_level = (af == AF_INET) ? IPPROTO_IP : IPPROTO_IPV6;
+	int expected_type = (af == AF_INET) ? IP_RECVERR : IPV6_RECVERR;
 	struct sock_extended_err *sock_err;
 	char ctrl_buf[512];
 	struct msghdr msg = {
@@ -63,38 +77,43 @@ static void read_icmp_errqueue(int sockfd, int expected_code)
 		return;

 	for (; cm; cm = CMSG_NXTHDR(&msg, cm)) {
-		if (cm->cmsg_level != IPPROTO_IP || cm->cmsg_type != IP_RECVERR)
+		if (cm->cmsg_level != expected_level ||
+		    cm->cmsg_type != expected_type)
 			continue;

 		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);

-		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
-			       "sock_err_origin_icmp"))
+		if (!ASSERT_EQ(sock_err->ee_origin, expected_origin,
+			       "sock_err_origin"))
 			return;
-		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
+		if (!ASSERT_EQ(sock_err->ee_type, expected_ee_type,
 			       "sock_err_type_dest_unreach"))
 			return;
 		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
 		return;
 	}

-	ASSERT_FAIL("no IP_RECVERR control message found");
+	ASSERT_FAIL("no IP_RECVERR/IPV6_RECVERR control message found");
 }

-static bool valid_unreach_code(int code)
+static bool valid_unreach_code(int code, int af)
 {
 	if (code < 0)
 		return false;

-	return code <= NR_ICMP_UNREACH && code != ICMP_FRAG_NEEDED;
+	if (af == AF_INET)
+		return code <= NR_ICMP_UNREACH && code != ICMP_FRAG_NEEDED;
+
+	return code <= ICMPV6_REJECT_ROUTE;
 }

-static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
+static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code,
+					    int af, const char *ip)
 {
 	int srv_fd = -1, client_fd = -1;
 	int port;

-	srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1", 0, TIMEOUT_MS);
+	srv_fd = start_server(af, SOCK_STREAM, ip, 0, TIMEOUT_MS);
 	if (!ASSERT_OK_FD(srv_fd, "start_server"))
 		return;

@@ -105,6 +124,8 @@ static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
 	}

 	skel->bss->server_port = ntohs(port);
+	skel->bss->unreach_type = (af == AF_INET) ? ICMP_DEST_UNREACH :
+						    ICMPV6_DEST_UNREACH;
 	skel->bss->unreach_code = code;
 	skel->data->kfunc_ret = KFUNC_RET_UNSET;

@@ -114,13 +135,37 @@ static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
 		return;
 	}

-	if (valid_unreach_code(code))
-		read_icmp_errqueue(client_fd, code);
+	if (valid_unreach_code(code, af))
+		read_icmp_errqueue(client_fd, code, af);

 	close(client_fd);
 	close(srv_fd);
 }

+static void run_icmp_test(struct icmp_send *skel, int af, const char *ip,
+			  int max_code)
+{
+	for (int code = 0; code <= max_code; code++) {
+		if (af == AF_INET && code == ICMP_FRAG_NEEDED)
+			continue;
+
+		trigger_prog_read_icmp_errqueue(skel, code, af, ip);
+		ASSERT_EQ(skel->data->kfunc_ret, 0, "kfunc_ret");
+	}
+
+	/* Test invalid codes */
+	trigger_prog_read_icmp_errqueue(skel, -1, af, ip);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	trigger_prog_read_icmp_errqueue(skel, max_code + 1, af, ip);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	if (af == AF_INET) {
+		trigger_prog_read_icmp_errqueue(skel, ICMP_FRAG_NEEDED, af, ip);
+		ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+	}
+}
+
 void test_icmp_send_unreach_cgroup(void)
 {
 	struct icmp_send *skel;
@@ -139,23 +184,11 @@ void test_icmp_send_unreach_cgroup(void)
 	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
 		goto cleanup;

-	for (int code = 0; code <= NR_ICMP_UNREACH; code++) {
-		if (code == ICMP_FRAG_NEEDED)
-			continue;
-
-		trigger_prog_read_icmp_errqueue(skel, code);
-		ASSERT_EQ(skel->data->kfunc_ret, 0, "kfunc_ret");
-	}
-
-	/* Test invalid codes */
-	trigger_prog_read_icmp_errqueue(skel, -1);
-	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+	if (test__start_subtest("ipv4"))
+		run_icmp_test(skel, AF_INET, "127.0.0.1", NR_ICMP_UNREACH);

-	trigger_prog_read_icmp_errqueue(skel, NR_ICMP_UNREACH + 1);
-	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
-
-	trigger_prog_read_icmp_errqueue(skel, ICMP_FRAG_NEEDED);
-	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+	if (test__start_subtest("ipv6"))
+		run_icmp_test(skel, AF_INET6, "::1", ICMPV6_REJECT_ROUTE);

 cleanup:
 	icmp_send__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
index 6d0be0a9afe1..6e1ba539eeb0 100644
--- a/tools/testing/selftests/bpf/progs/icmp_send.c
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -5,10 +5,11 @@

 /* 127.0.0.1 in host byte order */
 #define SERVER_IP 0x7F000001
-
-#define ICMP_DEST_UNREACH 3
+/* ::1 in host byte order (last 32-bit word) */
+#define SERVER_IP6_LO 0x00000001

 __u16 server_port = 0;
+int unreach_type = 0;
 int unreach_code = 0;
 int kfunc_ret = -1;

@@ -18,19 +19,48 @@ int egress(struct __sk_buff *skb)
 	void *data = (void *)(long)skb->data;
 	void *data_end = (void *)(long)skb->data_end;
 	struct iphdr *iph;
+	struct ipv6hdr *ip6h;
 	struct tcphdr *tcph;
+	__u8 version;

-	iph = data;
-	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
-	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+	if (data + 1 > data_end)
 		return SK_PASS;

-	tcph = (void *)iph + iph->ihl * 4;
-	if ((void *)(tcph + 1) > data_end ||
-	    tcph->dest != bpf_htons(server_port))
+	version = (*((__u8 *)data)) >> 4;
+
+	if (version == 4) {
+		iph = data;
+		if ((void *)(iph + 1) > data_end ||
+		    iph->protocol != IPPROTO_TCP ||
+		    iph->daddr != bpf_htonl(SERVER_IP))
+			return SK_PASS;
+
+		tcph = (void *)iph + iph->ihl * 4;
+		if ((void *)(tcph + 1) > data_end ||
+		    tcph->dest != bpf_htons(server_port))
+			return SK_PASS;
+
+	} else if (version == 6) {
+		ip6h = data;
+		if ((void *)(ip6h + 1) > data_end ||
+		    ip6h->nexthdr != IPPROTO_TCP)
+			return SK_PASS;
+
+		if (ip6h->daddr.in6_u.u6_addr32[0] != 0 ||
+		    ip6h->daddr.in6_u.u6_addr32[1] != 0 ||
+		    ip6h->daddr.in6_u.u6_addr32[2] != 0 ||
+		    ip6h->daddr.in6_u.u6_addr32[3] != bpf_htonl(SERVER_IP6_LO))
+			return SK_PASS;
+
+		tcph = (void *)(ip6h + 1);
+		if ((void *)(tcph + 1) > data_end ||
+		    tcph->dest != bpf_htons(server_port))
+			return SK_PASS;
+	} else {
 		return SK_PASS;
+	}

-	kfunc_ret = bpf_icmp_send(skb, ICMP_DEST_UNREACH, unreach_code);
+	kfunc_ret = bpf_icmp_send(skb, unreach_type, unreach_code);

 	return SK_DROP;
 }
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 2/5] selftests/bpf: add bpf_icmp_send kfunc cgroup_skb tests
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This test opens a server and client, enters a new cgroup, attaches a
cgroup_skb program on egress and calls the bpf_icmp_send function from
the client egress so that an ICMP unreach control message is sent back
to the client. It then fetches the message from the error queue to
confirm the correct ICMP unreach code has been sent.

Note that, for the client, we have to connect in non-blocking mode to
let the test execute faster. Otherwise, we need to wait for the TCP
three-way handshake to time out in the kernel before reading the errno.

Also note that we don't set IP_RECVERR on the socket in
connect_to_fd_nonblock since the error will be transferred anyway in our
test because the connection is rejected at the beginning of the TCP
handshake. See tcp_v4_err() in net/ipv4/tcp_ipv4.c for more details.

Reviewed-by: Jordan Rife <jordan@jrife.io>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 164 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/icmp_send.c |  38 ++++
 2 files changed, 202 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send.c

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
new file mode 100644
index 000000000000..b8a98c90053e
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <linux/errqueue.h>
+#include <poll.h>
+#include "icmp_send.skel.h"
+
+#define TIMEOUT_MS 1000
+
+#define ICMP_DEST_UNREACH 3
+
+#define ICMP_FRAG_NEEDED 4
+#define NR_ICMP_UNREACH 15
+
+#define KFUNC_RET_UNSET -1
+
+static int connect_to_fd_nonblock(int server_fd)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd, err;
+
+	if (getsockname(server_fd, (struct sockaddr *)&addr, &len))
+		return -1;
+
+	fd = socket(addr.ss_family, SOCK_STREAM | SOCK_NONBLOCK, 0);
+	if (fd < 0)
+		return -1;
+
+	err = connect(fd, (struct sockaddr *)&addr, len);
+	if (err < 0 && errno != EINPROGRESS) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static void read_icmp_errqueue(int sockfd, int expected_code)
+{
+	struct sock_extended_err *sock_err;
+	char ctrl_buf[512];
+	struct msghdr msg = {
+		.msg_control = ctrl_buf,
+		.msg_controllen = sizeof(ctrl_buf),
+	};
+	struct pollfd pfd = {
+		.fd = sockfd,
+		.events = POLLERR,
+	};
+	struct cmsghdr *cm;
+	ssize_t n;
+
+	if (!ASSERT_GE(poll(&pfd, 1, TIMEOUT_MS), 1, "poll_errqueue"))
+		return;
+
+	n = recvmsg(sockfd, &msg, MSG_ERRQUEUE);
+	if (!ASSERT_GE(n, 0, "recvmsg_errqueue"))
+		return;
+
+	cm = CMSG_FIRSTHDR(&msg);
+	if (!ASSERT_NEQ(cm, NULL, "cm_firsthdr_null"))
+		return;
+
+	for (; cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if (cm->cmsg_level != IPPROTO_IP || cm->cmsg_type != IP_RECVERR)
+			continue;
+
+		sock_err = (struct sock_extended_err *)CMSG_DATA(cm);
+
+		if (!ASSERT_EQ(sock_err->ee_origin, SO_EE_ORIGIN_ICMP,
+			       "sock_err_origin_icmp"))
+			return;
+		if (!ASSERT_EQ(sock_err->ee_type, ICMP_DEST_UNREACH,
+			       "sock_err_type_dest_unreach"))
+			return;
+		ASSERT_EQ(sock_err->ee_code, expected_code, "sock_err_code");
+		return;
+	}
+
+	ASSERT_FAIL("no IP_RECVERR control message found");
+}
+
+static bool valid_unreach_code(int code)
+{
+	if (code < 0)
+		return false;
+
+	return code <= NR_ICMP_UNREACH && code != ICMP_FRAG_NEEDED;
+}
+
+static void trigger_prog_read_icmp_errqueue(struct icmp_send *skel, int code)
+{
+	int srv_fd = -1, client_fd = -1;
+	int port;
+
+	srv_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1", 0, TIMEOUT_MS);
+	if (!ASSERT_OK_FD(srv_fd, "start_server"))
+		return;
+
+	port = get_socket_local_port(srv_fd);
+	if (!ASSERT_GE(port, 0, "get_socket_local_port")) {
+		close(srv_fd);
+		return;
+	}
+
+	skel->bss->server_port = ntohs(port);
+	skel->bss->unreach_code = code;
+	skel->data->kfunc_ret = KFUNC_RET_UNSET;
+
+	client_fd = connect_to_fd_nonblock(srv_fd);
+	if (!ASSERT_OK_FD(client_fd, "client_connect_nonblock")) {
+		close(srv_fd);
+		return;
+	}
+
+	if (valid_unreach_code(code))
+		read_icmp_errqueue(client_fd, code);
+
+	close(client_fd);
+	close(srv_fd);
+}
+
+void test_icmp_send_unreach_cgroup(void)
+{
+	struct icmp_send *skel;
+	int cgroup_fd = -1;
+
+	skel = icmp_send__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = test__join_cgroup("/icmp_send_unreach_cgroup");
+	if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup"))
+		goto cleanup;
+
+	skel->links.egress =
+		bpf_program__attach_cgroup(skel->progs.egress, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.egress, "prog_attach_cgroup"))
+		goto cleanup;
+
+	for (int code = 0; code <= NR_ICMP_UNREACH; code++) {
+		if (code == ICMP_FRAG_NEEDED)
+			continue;
+
+		trigger_prog_read_icmp_errqueue(skel, code);
+		ASSERT_EQ(skel->data->kfunc_ret, 0, "kfunc_ret");
+	}
+
+	/* Test invalid codes */
+	trigger_prog_read_icmp_errqueue(skel, -1);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	trigger_prog_read_icmp_errqueue(skel, NR_ICMP_UNREACH + 1);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+	trigger_prog_read_icmp_errqueue(skel, ICMP_FRAG_NEEDED);
+	ASSERT_EQ(skel->data->kfunc_ret, -EINVAL, "kfunc_ret");
+
+cleanup:
+	icmp_send__destroy(skel);
+	if (cgroup_fd >= 0)
+		close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
new file mode 100644
index 000000000000..6d0be0a9afe1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+/* 127.0.0.1 in host byte order */
+#define SERVER_IP 0x7F000001
+
+#define ICMP_DEST_UNREACH 3
+
+__u16 server_port = 0;
+int unreach_code = 0;
+int kfunc_ret = -1;
+
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct iphdr *iph;
+	struct tcphdr *tcph;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4 ||
+	    iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	tcph = (void *)iph + iph->ihl * 4;
+	if ((void *)(tcph + 1) > data_end ||
+	    tcph->dest != bpf_htons(server_port))
+		return SK_PASS;
+
+	kfunc_ret = bpf_icmp_send(skb, ICMP_DEST_UNREACH, unreach_code);
+
+	return SK_DROP;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 1/5] bpf: add bpf_icmp_send kfunc
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This is needed in the context of Tetragon to provide improved feedback
(in contrast to just dropping packets) to east-west traffic when blocked
by policies using cgroup_skb programs.

This reuses concepts from the netfilter reject target codepath with the
differences that:
* Packets are cloned since the BPF user can still let the packet pass
  (SK_PASS from the cgroup_skb progs for example) and the current skb
  needs to stay untouched (cgroup_skb hooks only allow read-only skb
  payload).
* We protect against recursion since the kfunc, by generating an ICMP
  error message, could retrigger the BPF prog that invoked it.

Only ICMP_DEST_UNREACH and ICMPV6_DEST_UNREACH are currently supported.
The interface accepts a type parameter to facilitate future extension to
other ICMP control message types.

For normal cgroup_skb paths, the skb dst route should already be set.
However, bpf_prog_test_run_skb can create synthetic IPv4 skbs without an
attached route. In that case, icmp_send returns early, and the kfunc
would otherwise report success despite no ICMP reply being sent. The
check also rejects metadata dsts, which are not valid struct rtable
instances. For IPv6, reject metadata dsts only: icmpv6_send can reach
icmp6_dev, where skb_rt6_info treats any non-NULL skb dst as a struct
rt6_info, which is not valid for metadata_dst.
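
The argument checks described above can be summarised in a small userspace sketch. Assumptions: `check_icmp_args()` is a hypothetical name, the `SKETCH_` constants mirror the values defined in the selftests of this series, and the address family is passed in directly instead of being derived from skb->protocol. The dst checks and the recursion check are not modelled.

```c
#include <errno.h>
#include <sys/socket.h>

/* Values mirrored from the selftests in this series. */
#define SKETCH_ICMP_DEST_UNREACH    3
#define SKETCH_ICMPV6_DEST_UNREACH  1
#define SKETCH_ICMP_FRAG_NEEDED     4
#define SKETCH_NR_ICMP_UNREACH      15
#define SKETCH_ICMPV6_REJECT_ROUTE  6

/* Hypothetical helper: returns 0 if (type, code) is accepted for the family. */
static int check_icmp_args(int family, int type, int code)
{
	switch (family) {
	case AF_INET:
		if (type != SKETCH_ICMP_DEST_UNREACH)
			return -EOPNOTSUPP;
		/* FRAG_NEEDED is rejected: it needs a valid next-hop MTU. */
		if (code < 0 || code > SKETCH_NR_ICMP_UNREACH ||
		    code == SKETCH_ICMP_FRAG_NEEDED)
			return -EINVAL;
		return 0;
	case AF_INET6:
		if (type != SKETCH_ICMPV6_DEST_UNREACH)
			return -EOPNOTSUPP;
		if (code < 0 || code > SKETCH_ICMPV6_REJECT_ROUTE)
			return -EINVAL;
		return 0;
	default:
		return -EPROTONOSUPPORT;
	}
}
```

The type is checked before the code, so an unsupported type yields -EOPNOTSUPP even when the code would also be out of range.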

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 net/core/filter.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 2e96b4b847ce..0a0191586b44 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -84,6 +84,9 @@
 #include <linux/un.h>
 #include <net/xdp_sock_drv.h>
 #include <net/inet_dscp.h>
+#include <linux/icmpv6.h>
+#include <net/icmp.h>
+#include <net/ip6_route.h>

 #include "dev.h"

@@ -12546,6 +12549,88 @@ __bpf_kfunc int bpf_xdp_pull_data(struct xdp_md *x, u32 len)
 	return 0;
 }

+/**
+ * bpf_icmp_send - Send an ICMP control message
+ * @skb_ctx: Packet that triggered the control message
+ * @type: ICMP type (only ICMP_DEST_UNREACH/ICMPV6_DEST_UNREACH supported)
+ * @code: ICMP code (0-15 except ICMP_FRAG_NEEDED for IPv4, 0-6 for IPv6)
+ *
+ * Sends an ICMP control message in response to the packet. The original packet
+ * is cloned before sending the ICMP message, so the BPF program can still let
+ * the packet pass if desired.
+ *
+ * Currently only ICMP_DEST_UNREACH (IPv4) and ICMPV6_DEST_UNREACH (IPv6) are
+ * supported.
+ *
+ * Return: 0 on success (send attempt), negative error code on failure:
+ *         -EBUSY: Recursion detected
+ *         -EPROTONOSUPPORT: Non-IP protocol
+ *         -EOPNOTSUPP: Unsupported ICMP type
+ *         -EINVAL: Invalid code parameter
+ *         -ENETUNREACH: No usable route/dst for the ICMP reply
+ *         -ENOMEM: Memory allocation failed
+ */
+__bpf_kfunc int bpf_icmp_send(struct __sk_buff *skb_ctx, int type, int code)
+{
+	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
+	struct sk_buff *nskb;
+	struct sock *sk;
+
+	sk = skb_to_full_sk(skb);
+	if (sk && sk->sk_kern_sock &&
+	    (sk->sk_protocol == IPPROTO_ICMP || sk->sk_protocol == IPPROTO_ICMPV6))
+		return -EBUSY;
+
+	switch (skb->protocol) {
+#if IS_ENABLED(CONFIG_INET)
+	case htons(ETH_P_IP): {
+		if (type != ICMP_DEST_UNREACH)
+			return -EOPNOTSUPP;
+		if (code < 0 || code > NR_ICMP_UNREACH ||
+		    code == ICMP_FRAG_NEEDED) /* needs a valid next-hop MTU */
+			return -EINVAL;
+
+		/* icmp_send expects skb_dst to be a real rtable. */
+		if (!skb_valid_dst(skb))
+			return -ENETUNREACH;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		memset(IPCB(nskb), 0, sizeof(*IPCB(nskb)));
+		icmp_send(nskb, type, code, 0);
+		consume_skb(nskb);
+		break;
+	}
+#endif
+#if IS_ENABLED(CONFIG_IPV6)
+	case htons(ETH_P_IPV6):
+		if (type != ICMPV6_DEST_UNREACH)
+			return -EOPNOTSUPP;
+		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
+			return -EINVAL;
+
+		/* icmpv6_send may treat skb_dst as rt6_info. */
+		if (skb_metadata_dst(skb))
+			return -ENETUNREACH;
+
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		memset(IP6CB(nskb), 0, sizeof(*IP6CB(nskb)));
+		icmpv6_send(nskb, type, code, 0);
+		consume_skb(nskb);
+		break;
+#endif
+	default:
+		return -EPROTONOSUPPORT;
+	}
+
+	return 0;
+}
+
 __bpf_kfunc_end_defs();

 int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
@@ -12588,6 +12673,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
 BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp)
 BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)

+BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send)
+BTF_ID_FLAGS(func, bpf_icmp_send)
+BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send)
+
 static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
 	.owner = THIS_MODULE,
 	.set = &bpf_kfunc_check_set_skb,
@@ -12618,6 +12707,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
 	.set = &bpf_kfunc_check_set_sock_ops,
 };

+static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send = {
+	.owner = THIS_MODULE,
+	.set = &bpf_kfunc_check_set_icmp_send,
+};
+
 static int __init bpf_kfunc_init(void)
 {
 	int ret;
@@ -12639,6 +12733,7 @@ static int __init bpf_kfunc_init(void)
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 					       &bpf_kfunc_set_sock_addr);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send);
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
 }
 late_initcall(bpf_kfunc_init);
--
2.34.1


^ permalink raw reply related

* [PATCH bpf-next v10 0/5] bpf: add icmp_send kfunc
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy

Hello,

This is v10 of adding the icmp_send kfunc, as suggested during
LSF/MM/BPF 2025[^1]. The goal is to allow cgroup_skb programs to
actively reject east-west traffic, similarly to what is possible to do
with netfilter reject target. Applications can receive early feedback
that something went wrong during the TCP handshake.

The first step to implement this is using ICMP control messages, with
the ICMP_DEST_UNREACH type and its various codes (ICMP_NET_UNREACH,
ICMP_HOST_UNREACH, ICMP_PROT_UNREACH, etc.). This is easier to implement
than a TCP RST reply and will already hint the client TCP stack to abort
the connection and not retry extensively.
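
For readers who want to see which type/code pairs the kfunc accepts, here
is a minimal userspace sketch of the argument checks only. The helper name
icmp_send_check_args() is hypothetical and does not exist in the series;
the constants are redefined locally with the values from the UAPI headers
<linux/icmp.h> and <linux/icmpv6.h>. It does not model the recursion check,
the dst check, or the actual send.

```c
#include <errno.h>
#include <stdbool.h>

/* Values as defined in <linux/icmp.h> and <linux/icmpv6.h>. */
#define ICMP_DEST_UNREACH	3
#define ICMP_FRAG_NEEDED	4
#define NR_ICMP_UNREACH		15
#define ICMPV6_DEST_UNREACH	1
#define ICMPV6_REJECT_ROUTE	6

/*
 * Hypothetical helper: mirrors only the type/code validation that
 * bpf_icmp_send() performs in patch 1/5. Returns 0 when the pair is
 * accepted, or the same negative errno the kfunc documents.
 */
static int icmp_send_check_args(bool ipv6, int type, int code)
{
	if (ipv6) {
		if (type != ICMPV6_DEST_UNREACH)
			return -EOPNOTSUPP;
		if (code < 0 || code > ICMPV6_REJECT_ROUTE)
			return -EINVAL;
		return 0;
	}

	if (type != ICMP_DEST_UNREACH)
		return -EOPNOTSUPP;
	/* ICMP_FRAG_NEEDED is refused: it needs a valid next-hop MTU. */
	if (code < 0 || code > NR_ICMP_UNREACH || code == ICMP_FRAG_NEEDED)
		return -EINVAL;
	return 0;
}
```

For instance, IPv4 with code 1 (ICMP_HOST_UNREACH) is accepted, while
code 4 (ICMP_FRAG_NEEDED) and code 16 are both refused with -EINVAL.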

Note that this is different from the sock_destroy kfunc, which calls
tcp_abort and thus sends a reset, destroying the underlying socket.

Caveats of this kfunc design are that a program can call this function N
times, thus sending N ICMP unreach control messages, and that the program
can return from the BPF filter with pass, leading to a potentially
confusing situation where the TCP connection was established while the
client received ICMP_DEST_UNREACH messages.

v2 updates:
- fix a build error from a missing function call rename;
- avoid changing return line in bpf_kfunc_init;
- return SK_DROP from the kfunc (similarly to bpf_redirect);
- check the return value in the selftest.

v3 update:
- fix an undefined reference build error.

v4 updates:
- prevent the kfunc from being called recursively and add a test (thanks
  to Martin).
- do not fetch dst route when unnecessary (thanks to Martin).
- extend the test for IPv6 (thanks to Martin).
- use SK_DROP in examples and use non blocking sockets for testing
  (thanks to Martin).
- test when the kfunc returns -EINVAL (thanks to Jordan).
- add the kfunc to bpf_kfunc_set_skb as suggested by Alexei.
- guard the IPv4 parts with IS_ENABLED(CONFIG_INET).
- fix a wrong initial value for client_fd (thanks to Yonghong).
- add documentation to the kfunc.
- to Jordan: I couldn't include <linux/icmp.h> because of redefines from
  <network_helpers.h>.

v5 updates:
- kfunc name is now icmp_send and takes the control message type as
  parameter for future potential extension (daniel)
- drop the net patches to route packet since now the kfunc is limited to
  cgroup_skb and tc progs (daniel & martin)
- linearize skb headers (sashiko)
- zero SKB control block (sashiko)
- bind to port 0 instead of fixed port (sashiko)
- poll to wait for POLLERR event (sashiko)
- do not use ASSERT_EQ in CMSG_NXTHDR loop (sashiko)
- fix comment about byte order (sashiko)
- fix endianness IP address issue (sashiko)
- add forgotten cleanup_cgroup_environment (sashiko)
- let packets pass in recursion test (sashiko)
- clarify evaluation order for recursion test (sashiko)

v6 updates (all from sashiko):
- bring back the net patches to route packet since tc ingress needs it.
- rename the ip_route_reply helpers from fetch to fill.
- call pskb_network_may_pull on the cloned pkt.
- check explicitly that we received one and only one ICMP err ctrl msg.

v7 updates:
- use consume_skb on success path (stanislav)
- replace recursion protection with CPU_ARRAY by checking the nature of
  the sk (daniel, offline)
- use reverse xmas tree in read_icmp_errqueue (jordan)
- use ASSERT_OK_FD instead of ASSERT_GE whenever possible (jordan)
- add a test for tc (jordan)
- better filtering from host cgroup test progs (sashiko)

v8 updates:
- mostly a resend as it's been sitting as "New" in the queue for almost
  one month, fixed a few nits.
- on new bpf_icmp_send kfunc cgroup_skb test (patch 4/7):
  - guard a close fd with fd >= 0 (jordan)
  - use ASSERT_OK_FD instead of ASSERT_GE (jordan)
  - fixed comment style (sashiko)
- on recursion test (patch 7/7):
  - guard a close fd with fd >= 0 (jordan)
  - fixed comments style (sashiko)
  - filter bpf prog on pid and ICMP message types (sashiko)

v9 updates:
- first, there was a v8.5 that I discussed here[^2] with Emil
  Tsalapatis. I tried once again to make tc work but the ai review found
  something fundamentally wrong. This version removes the tc support for
  now and focuses on cgroup_skb.
- use helper get_socket_local_port instead of getsockname (sashiko)
- use if_nametoindex("lo") instead of value 1 (bpf-ci)
- fix IPV6_RECVERR appearance before IPv6 patch (bpf-ci)
- clarify that 0 on success means icmp_send was called, but that it was
  only an attempt since this function does not return anything (sashiko)
- explicitly consider ICMP_FRAG_NEEDED as invalid in bpf_icmp_send as
  it would miss the next-hop MTU info. Also test it. (sashiko)
- test for max_code + 1 for invalid (sashiko)
- add review-by tags from Jordan and Emil but remove it on the main
  patch as I have significantly changed it.
- check for rec_count in recursion test (sashiko)
- re-order setup_cgroup_environment in test (sashiko)
- reset kfunc_ret on every test run (sashiko)
- check for skb route for icmp_send as the function would quietly fail
  and add a test (sashiko)

v10 updates:
- guard against skbs with metadata_dst before calling icmpv6_send
  (sashiko)
- add more review-by tags from Emil and Jordan.

[^1]: https://lwn.net/Articles/1022034/
[^2]: https://lore.kernel.org/bpf/ajvDRCw8cPqXAqQq@gmail.com/

Link to v9: https://lore.kernel.org/bpf/20260624185554.362555-1-mahe.tardy@gmail.com/

Mahe Tardy (5):
  bpf: add bpf_icmp_send kfunc
  selftests/bpf: add bpf_icmp_send kfunc cgroup_skb tests
  selftests/bpf: add bpf_icmp_send kfunc cgroup_skb IPv6 tests
  selftests/bpf: add bpf_icmp_send recursion test
  selftests/bpf: add bpf_icmp_send no route test

 net/core/filter.c                             |  95 +++++++
 .../bpf/prog_tests/icmp_send_kfunc.c          | 269 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/icmp_send.c | 123 ++++++++
 3 files changed, 487 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/icmp_send.c

--
2.34.1


^ permalink raw reply

* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Jani Nikula @ 2026-06-25 11:00 UTC (permalink / raw)
  To: Kaitao Cheng, David Laight, Christian König,
	David Hildenbrand (Arm), Alexei Starovoitov
  Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Daniel Borkmann,
	Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
	Paul E. McKenney, Shakeel Butt, David Howells, Simona Vetter,
	Randy Dunlap, Luca Ceresoli, Philipp Stanner, linux-block,
	linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel, io-uring,
	audit, bpf, netdev, dri-devel, linux-perf-users,
	linux-trace-kernel, kexec, live-patching, linux-modules,
	linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
	damon, llvm, Kaitao Cheng, Muchun Song
In-Reply-To: <0ed6b5c3-e955-46e2-9fc6-075a0dfd1c4f@linux.dev>

On Thu, 25 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> On 2026/6/24 22:23, David Laight wrote:
>> On Wed, 24 Jun 2026 15:23:47 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>> On 6/24/26 15:14, Kaitao Cheng wrote:
>>>> On 2026/6/22 16:42, David Laight wrote:
>>>>> On Mon, 22 Jun 2026 12:05:31 +0800
>>>>> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>>>  
>>>>>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>>
>>>>>> The list_for_each*_safe() helpers are used when the loop body may
>>>>>> remove the current entry.  Their API exposes the temporary cursor at
>>>>>> every call site, even though most users only need it for the iterator
>>>>>> implementation and never reference it in the loop body.
>>>>>>
>>>>>> Add *_mutable() variants for list and hlist iteration.  The new helpers
>>>>>> support both forms: callers may keep passing an explicit temporary cursor
>>>>>> when they need to inspect or reset it, or omit it and let the helper use
>>>>>> a unique internal cursor.  
>>>>>
>>>>> I'm not really sure 'mutable' means anything either.
>>>>> It is possible to make it valid for the loop body (or even other threads)
>>>>> to delete arbitrary list items - but that needs significant extra overheads.
>>>>>
>>>>> It might be worth doing something that doesn't need the extra variable,
>>>>> but there is little point doing all the churn just to rename things.
>>>>>  
>>>>>>
>>>>>> This makes call sites that only mutate the list through the current entry
>>>>>> less noisy, while keeping the existing *_safe() helpers available for
>>>>>> compatibility.
>>>>>>
>>>>>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>> ---
>>>>>>  include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>>>>>>  1 file changed, 231 insertions(+), 38 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/list.h b/include/linux/list.h
>>>>>> index 09d979976b3b..1081def7cea9 100644
>>>>>> --- a/include/linux/list.h
>>>>>> +++ b/include/linux/list.h
>>>>>> @@ -7,6 +7,7 @@
>>>>>>  #include <linux/stddef.h>
>>>>>>  #include <linux/poison.h>
>>>>>>  #include <linux/const.h>
>>>>>> +#include <linux/args.h>
>>>>>>  
>>>>>>  #include <asm/barrier.h>
>>>>>>  
>>>>>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>>>>>>  #define list_for_each_prev(pos, head) \
>>>>>>  	for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>>>>>  
>>>>>> -/**
>>>>>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>>>>>> - * @pos:	the &struct list_head to use as a loop cursor.
>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>> - * @head:	the head for your list.
>>>>>> +/*
>>>>>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>>>>>>   */
>>>>>>  #define list_for_each_safe(pos, n, head) \
>>>>>>  	for (pos = (head)->next, n = pos->next; \
>>>>>>  	     !list_is_head(pos, (head)); \
>>>>>>  	     pos = n, n = pos->next)
>>>>>>  
>>>>>> +#define __list_for_each_mutable_internal(pos, tmp, head)		\
>>>>>> +	for (typeof(pos) tmp = (pos = (head)->next)->next;		\  
>>>>>
>>>>> Use auto
>>>>>  
>>>>>> +	     !list_is_head(pos, (head));				\
>>>>>> +	     pos = tmp, tmp = pos->next)
>>>>>> +
>>>>>> +#define __list_for_each_mutable1(pos, head)				\
>>>>>> +	__list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>>>>>> +
>>>>>> +#define __list_for_each_mutable2(pos, next, head)			\
>>>>>> +	list_for_each_safe(pos, next, head)
>>>>>> +
>>>>>>  /**
>>>>>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>>>>>> + * list_for_each_mutable - iterate over a list safe against entry removal
>>>>>>   * @pos:	the &struct list_head to use as a loop cursor.
>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>> - * @head:	the head for your list.
>>>>>> + * @...:	either (head) or (next, head)
>>>>>> + *
>>>>>> + * next:	another &struct list_head to use as optional temporary storage.
>>>>>> + *		The temporary cursor is internal unless explicitly supplied by
>>>>>> + *		the caller.
>>>>>> + * head:	the head for your list.
>>>>>> + */
>>>>>> +#define list_for_each_mutable(pos, ...)					\
>>>>>> +	CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__))	\
>>>>>> +		(pos, __VA_ARGS__)  
>>>>>
>>>>> The variable argument count logic really just slows down compilation.
>>>>> Maybe there aren't enough copies of this code to make that significant.
>>>>> But just because you can do it doesn't mean it is a good idea.
>>>>> I'm also not sure it really adds anything to the readability.
>>>>>
>>>>> And, if you are going to make the middle argument optional there is
>>>>> no need to change the macro name.
>>>>
>>>> Christian König and Jani Nikula also disagree with the variadic-argument
>>>> implementation approach. If we abandon that method, it means we will
>>>> inevitably need to add some new macros. If mutable is not a good name,
>>>> suggestions for better alternatives would be welcome; coming up with a
>>>> suitable name is indeed rather tricky.  
>>>
>>> I don't think you need to add a new macro for the specific use case
>>> where people want to modify the next element of the iteration.
>>>
>>> If I remember your numbers correctly, that is really a corner case, and
>>> continuing to use the existing *_safe() macros for that sounds perfectly
>>> fine to me.
>> 
>> IIRC currently you have a choice of either:
>> 	define               Item that can't be deleted
>> 	list_for_each()	     The current item.
>> 	list_for_each_safe() The next item.
>> There is also likely to be code that updates the variables to allow
>> for other scenarios.
>> 
>> Note that if you increase a reference count and release a lock then
>> list_for_each() is likely safer than list_for_each_safe() :-)
>> 
>> list.h has 9 variants of the 'safe' loop.
>> The bloat of another 9 is getting excessive.
>> 
>> It has to be said that this is one of my least favourite type of list...
>
> Hi Christian König, David Laight, Jani Nikula, David Hildenbrand,
> Andy Shevchenko, Alexei Starovoitov
>
> For ease of discussion, I need to summarize the currently possible
> approaches and briefly describe their respective pros and cons,
> using the list_for_each_entry* interfaces as examples.
>
> 1. Add list_for_each_entry_mutable, while keeping list_for_each_entry
> and list_for_each_entry_safe unchanged. list_for_each_entry_mutable
> would be used specifically for safe deletion scenarios that do not
> need to expose the temporary cursor externally. The code can refer to
> the v1 version.
>
> Pros: Does not depend on immediate per-subsystem adaptation and can be
>       merged directly.
> Cons: Requires adding a whole set of mutable interfaces, which makes the
>       code somewhat redundant.

Seems fine, and the original _safe naming is ambiguous anyway.

> 2. Directly optimize away the temporary cursor in list_for_each_entry_safe
> and define it inside the loop instead, changing the interface from four
> arguments to three.
>
> Pros: Does not add redundant interfaces.
> Cons: (1) Users need to manually update special cases that use the
>       traversal variable of list_for_each_entry_safe, the new
>       list_for_each_entry_safe would no longer apply there and would
>       need to be open-coded.
>       (2) Because the macro arguments change, all list_for_each_entry_safe
>       callers would need to be modified and merged together, making it
>       difficult to merge such a large amount of code at once.

This won't fly because there are literally thousands of
list_for_each_entry_safe() users.

> 3. Use a variadic macro approach to optimize list_for_each_entry_safe,
> so that it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) Does not depend on immediate per-subsystem adaptation and can
>       be merged directly.
> Cons: (1) Increases compile time.
>       (2) Makes the interface harder for users to use.

Basically I'm against any variadic macro tricks where the optional
argument is not the last argument. That's just way too surprising, and
goes against common practice in just about all other languages.

> 4. Optimize list_for_each_entry by defining the temporary cursor internally,
> making it compatible with the functionality of list_for_each_entry_safe.
> The code can refer to the v2 version.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) The number of externally visible arguments of list_for_each_entry
>       remains unchanged, still three.
> Cons: (1) list_for_each_entry and list_for_each_entry_safe would be merged
>       into one, and list_for_each_entry_safe would gradually be deprecated.
>       (2) Users need to manually update special cases that use the traversal
>       variable of list_for_each_entry, the new list_for_each_entry would no
>       longer apply there and would need to be open-coded. There are 15 such
>       cases in total.

This sounds good to me, though I take it there's some code size increase
and/or performance penalty?

Maybe the 15 cases are questionable anyway?

> 5. Use a variadic macro approach to optimize list_for_each_entry, so that
> it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
>       (2) Does not depend on immediate per-subsystem adaptation and can be
>       merged directly.
> Cons: (1) Increases compile time.
>       (2) list_for_each_entry and list_for_each_entry_safe would be merged
>       into one, and list_for_each_entry_safe would gradually be deprecated.

Please don't do the macro tricks.

> 6. Make no changes, keep the current logic unchanged, and close the current
> email discussion.

I like hiding the temporary stuff when possible.
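
To make "hiding the temporary" concrete, here is a minimal userspace
sketch of a non-variadic iterator that declares its cursor inside the for
statement. Everything in it is illustrative: the cut-down list_head
helpers, struct item, and the macro name list_for_each_hidden_tmp() are
hypothetical and are not the macros from v1 or v2 of the series.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;
}

/*
 * Hypothetical macro: the "next" cursor is scoped to the loop, so the
 * body may unlink the current entry but never sees the cursor itself.
 */
#define list_for_each_hidden_tmp(pos, head)				\
	for (struct list_head *tmp__ = ((pos) = (head)->next)->next;	\
	     (pos) != (head);						\
	     (pos) = tmp__, tmp__ = (pos)->next)

struct item {
	int val;
	struct list_head node;
};

#define item_of(ptr) \
	((struct item *)((char *)(ptr) - offsetof(struct item, node)))

/* Unlink every entry with an even value; return how many were removed. */
static int drop_even(struct list_head *head)
{
	struct list_head *pos;
	int removed = 0;

	list_for_each_hidden_tmp(pos, head) {
		if (item_of(pos)->val % 2 == 0) {
			list_del(pos);
			removed++;
		}
	}
	return removed;
}

/* Count entries with a plain (non-removing) walk. */
static int list_count(struct list_head *head)
{
	struct list_head *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Callers that need to inspect or reset the cursor cannot use this form,
which matches the handful of special cases mentioned above that would
have to stay on the existing helpers or be open-coded.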


BR,
Jani.

-- 
Jani Nikula, Intel

^ permalink raw reply

* Re: [PATCH net] netpoll: fix a use-after-free on shutdown path
From: Breno Leitao @ 2026-06-25 10:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Amerigo Wang, netdev, linux-kernel, vlad.wing, asantostc,
	kernel-team, stable
In-Reply-To: <20260624192513.33023e54@kernel.org>

On Wed, Jun 24, 2026 at 07:25:13PM -0700, Jakub Kicinski wrote:
> On Mon, 22 Jun 2026 08:01:23 -0700 Breno Leitao wrote:
> > +		 * synchronize_net() does not protect the worker
> > +		 * (queue_process() is not an RCU reader). It fences the
> > +		 * senders -- the real RCU readers -- so they cannot re-arm
> > +		 * tx_work after the np->dev->npinfo was set to NULL.
> > +		 */
> > +		synchronize_net();
> > +		cancel_delayed_work_sync(&npinfo->tx_work);
> 
> Maybe we can avoid the sync_net and the comment by using
> disable_delayed_work_sync() ?

I've been thinking about it, and I think you have a good point.
queue_process() is the only place that takes npinfo without RCU
protection.

This is what is happening right now:

CPU0 {
	run tx_work (queue_process())
	npinfo = container_of()...
	while {
A:		dequeue skb from the txq
		try to send
	}
}

CPU 1 {
	call_rcu() -> rcu_cleanup_netpoll_info()
	np->dev->npinfo = NULL
B:	kfree(npinfo);
}

Then, if B happens before A, we have the UAF. That said, if we make sure
that tx_work() is done, then we are OK with rcu_cleanup_netpoll_info.

I am not totally sure if the order of pointer zero'ing and disabling
tx work is important, but, it doesn't seem so, any order would be OK
for:

	RCU_INIT_POINTER(np->dev->npinfo, NULL);
	disable_delayed_work_sync(&npinfo->tx_work);

Given that npinfo is not read inside queue_process(), the order doesn't
matter.
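
As a rough illustration of why the "disable" part matters for a worker
that can re-arm itself, here is a toy single-threaded model. The toy_*
names are hypothetical and this is not the kernel workqueue API; it only
models the "queueing fails once disabled" behaviour, not the waiting for
an instance that is running on another CPU.

```c
#include <stdbool.h>

/* Toy stand-in for a delayed work item; not struct delayed_work. */
struct toy_work {
	bool pending;	/* queued, not yet run */
	bool disabled;	/* set by toy_disable_work_sync() */
	int runs;	/* how many times the worker body ran */
};

/* Models queueing: refuses silently once the work is disabled. */
static bool toy_queue_work(struct toy_work *w)
{
	if (w->disabled || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models one worker run that tries to re-arm itself afterwards. */
static void toy_run_work(struct toy_work *w)
{
	if (!w->pending)
		return;
	w->pending = false;
	w->runs++;
	toy_queue_work(w);	/* re-arm attempt, may be refused */
}

/* Models disable + cancel: drop a pending instance, block re-queueing. */
static void toy_disable_work_sync(struct toy_work *w)
{
	w->disabled = true;
	w->pending = false;
}
```

In this model, once toy_disable_work_sync() has returned, neither a
sender nor the worker itself can make the item pending again, so freeing
the containing structure afterwards cannot race with a later run.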

Thanks for the point, I will update.
--breno

---
pw-bot: cr


^ permalink raw reply

* Re: [PATCH bpf 1/2] bpf, sockmap: Don't leak UDP socks on lookup-bind-release
From: Jakub Sitnicki @ 2026-06-25 10:48 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Michal Luczaj, Willem de Bruijn, John Fastabend, Jiayuan Chen,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <CAAVpQUAy=EVZcZRmXJPr=neJh7Q+UbYML5L4+nBynVrDPidUkw@mail.gmail.com>

On Wed, Jun 24, 2026 at 02:39 PM -07, Kuniyuki Iwashima wrote:
> On Wed, Jun 24, 2026 at 2:33 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>
>> On Wed, Jun 24, 2026 at 2:26 PM Michal Luczaj <mhal@rbox.co> wrote:
>> >
>> > On 6/24/26 22:01, Willem de Bruijn wrote:
>> > > Jakub Sitnicki wrote:
>> > >> On Tue, Jun 23, 2026 at 08:03 PM +02, Michal Luczaj wrote:
>> > >>> UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
>> > >>> sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.
>> > >>>
>> > >>> Because sockmap accepts unbound UDP sockets, a BPF program can increment a
>> > >>> socket's refcount via lookup. If the socket is subsequently bound, the
>> > >>> transition from unbound to bound causes bpf_sk_release() to skip the
>> > >>> decrement of the refcount, causing a memory leak.
>> > >>>
>> > >>> unreferenced object 0xffff88810bc2eb40 (size 1984):
>> > >>>   comm "test_progs", pid 2451, jiffies 4295320596
>> > >>>   hex dump (first 32 bytes):
>> > >>>     7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
>> > >>>     02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
>> > >>>   backtrace (crc bdee079d):
>> > >>>     kmem_cache_alloc_noprof+0x557/0x660
>> > >>>     sk_prot_alloc+0x69/0x240
>> > >>>     sk_alloc+0x30/0x460
>> > >>>     inet_create+0x2ce/0xf80
>> > >>>     __sock_create+0x25b/0x5c0
>> > >>>     __sys_socket+0x119/0x1d0
>> > >>>     __x64_sys_socket+0x72/0xd0
>> > >>>     do_syscall_64+0xa1/0x5f0
>> > >>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> > >>>
>> > >>> Maintain balanced refcounts across sk lookup/release: (re-)set
>> > >>> SOCK_RCU_FREE on proto update to treat the socket (whether bound or
>> > >>> unbound) as not requiring a refcount increment on (a RCU protected) lookup.
>> > >>>
>> > >>> Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
>> > >>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>> > >>> ---
>> > >>> Note: this issue is related to commit 67312adc96b5 ("bpf: reject unhashed
>> > >>> sockets in bpf_sk_assign").
>> > >>> ---
>> > >>>  net/ipv4/udp_bpf.c | 3 +++
>> > >>>  1 file changed, 3 insertions(+)
>> > >>>
>> > >>> diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
>> > >>> index ad57c4c9eaab..970327b59582 100644
>> > >>> --- a/net/ipv4/udp_bpf.c
>> > >>> +++ b/net/ipv4/udp_bpf.c
>> > >>> @@ -173,6 +173,9 @@ int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
>> > >>>     if (sk->sk_family == AF_INET6)
>> > >>>             udp_bpf_check_v6_needs_rebuild(psock->sk_proto);
>> > >>>
>> > >>> +   /* Treat all sockets as non-refcounted, regardless of binding state. */
>> > >>> +   sock_set_flag(sk, SOCK_RCU_FREE);
>> > >>> +
>> > >>>     sock_replace_proto(sk, &udp_bpf_prots[family]);
>> > >>>     return 0;
>> > >>>  }
>> > >>
>> > >> There is a side effect that an unhashed (unbound) UDP socket can now be
>> > >> selected in sk_lookup with bpf_sk_assign.
>> > >
>> > > The commit does mention a related fix, beneath the ---, commit
>> > > 67312adc96b5 ("bpf: reject unhashed sockets in bpf_sk_assign").
>> > > That fixes a similar issue by exactly disallowing this:
>> > >
>> > >     Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
>> > >     This matches the behaviour of __inet_lookup_skb which is ultimately
>> > >     the goal of bpf_sk_assign().
>> > >
>> > > So ..
>> > >
>> > >> Though perhaps that's for the
>> > >> better because TC bpf_sk_assign doesn't reject non-refcounted UDP
>> > >> sockets either, so we would have both socket dispatch sites behave the
>> > >> same way.
>> > >
>> > > .. there are two conflicting types of consistency here? Consistent with
>> > > __inet_lookup_skb or the TC bpf hook. Of those the first is the more
>> > > canonical.
>> > >
>> > >> Also, with this patch, if we insert & remove an unhashed UDP socket
>> > >> into/from a sockmap, we end up with an unhashed non-refcounted UDP
>> > >> socket. Not entirely sure if that is actually a problem or not.
>> > >>
>> > >> Willem, what is your take on having unhashed non-refcounted UDP sockets?
>> > >
>> > > I don't immediately see a problem, but I'm not an expert on SOCK_RCU_FREE.
>> >
>> > Perhaps it's worth mentioning that unhashed non-refcounted UDP socket is
>> > already possible: first auto-bind via connect(AF_INET) (which also sets
>> > SOCK_RCU_FREE), then unhash via connect(AF_UNSPEC).
>>
>> Setting SOCK_RCU_FREE itself should not cause a problem, but I think
>> we should take a step back.
>>
>> AFAIU, 0c48eefae712 was to allow putting AF_UNIX SOCK_DGRAM sockets
>> into sockmap, not to allow using unconnected UDP sockets in sk_lookup etc.
>>
>> Actually, v4 of the patch was implemented as such but did not get any feedback,
>> https://lore.kernel.org/bpf/20210508220835.53801-9-xiyou.wangcong@gmail.com/#t
>>
>> ... and v5 (the final commit) somehow removed the restriction for unconnected
>> UDP socket as well.
>> https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com/
>>
>> Given the initial use case, sockmap redirect, is still blocked by
>> TCP_ESTABLISHED
>> check in sock_map_redirect_allowed(), I feel there is no point in supporting
>> unconnected UDP sockets in sockmap.  It cannot get any skb from anywhere
>> (without buggy sk_lookup).
>
> s/unconnected/unhashed/g :)

Rejecting unhashed UDP sockets on insert to sockmap SGTM.
It is also in line with disable-problematic-cases strategy.
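
For anyone skimming the thread, the imbalance from the original report
can be shown with a toy userspace model. The toy_* names are hypothetical;
rcu_free stands in for SOCK_RCU_FREE, and the lookup/release helpers only
mimic the "take a reference only if the socket is refcounted" rule
described in the commit message.

```c
#include <stdbool.h>

struct toy_sock {
	int refcnt;
	bool rcu_free;	/* stands in for SOCK_RCU_FREE */
};

static bool toy_sk_is_refcounted(const struct toy_sock *sk)
{
	return !sk->rcu_free;
}

/* Lookup takes a reference only on refcounted sockets. */
static void toy_lookup(struct toy_sock *sk)
{
	if (toy_sk_is_refcounted(sk))
		sk->refcnt++;
}

/* Release drops a reference only on refcounted sockets. */
static void toy_release(struct toy_sock *sk)
{
	if (toy_sk_is_refcounted(sk))
		sk->refcnt--;
}

/* (Auto-)bind sets the flag, flipping the answer of the check above. */
static void toy_bind(struct toy_sock *sk)
{
	sk->rcu_free = true;
}

/*
 * Run lookup -> bind -> release on an initially unbound socket and
 * return the number of leaked references. When set_flag_on_insert is
 * true, the flag is set up front, as the patch under discussion does.
 */
static int toy_leak_after_bind(bool set_flag_on_insert)
{
	struct toy_sock sk = { .refcnt = 1, .rcu_free = false };

	if (set_flag_on_insert)
		sk.rcu_free = true;

	toy_lookup(&sk);
	toy_bind(&sk);
	toy_release(&sk);
	return sk.refcnt - 1;
}
```

Rejecting unhashed sockets at insert time avoids the same sequence from
the other side: an unbound socket never reaches the map, so a lookup
never sees it in the refcounted state.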

^ permalink raw reply

* Re: [PATCH net-next] Documentation: networking: Add a test plan for ethtool pause validation
From: Maxime Chevallier @ 2026-06-25 10:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, davem, Eric Dumazet, Paolo Abeni, Simon Horman,
	Russell King, Heiner Kallweit, Jonathan Corbet, Shuah Khan,
	Oleksij Rempel, Vladimir Oltean, Florian Fainelli,
	thomas.petazzoni, netdev, linux-kernel, linux-doc
In-Reply-To: <b7de216a-fd1a-42a0-8711-d822a1ad9319@lunn.ch>

Hi Andrew,

On 5/29/26 14:59, Andrew Lunn wrote:

(This discussion was a while ago, but this bit of context should be enough)

> But we also need to consider that for some APIs, we have decided that
> a configuration can be set now, which does not actually apply in our
> current conditions, but it will be stored away for when conditions
> change and it is applicable. The half duplex case could fit that. When
> the link is currently half duplex, you can configure pause, but you
> don't expect it to actually change the current behaviour. It only
> kicks in when the link renegotiates to full duplex sometime in the
> future. We have to also consider this the other way around. The link
> is full duplex and pause is configured by the user. Something happens
> with the LP and the link renegotiates to half duplex. The local end
> should not throw away the configuration, it simply cannot apply it
> given the current situation.

I'm writing the test description for HD with a better formatting, so the
HD test wouldn't be about "are we using pause stuff while in HD" as it
doesn't make sense, but rather "do we correctly store the pause settings
aside for later".

I'm realising that we don't really have an API to report the *true* in-use
pause settings. Taking HD as an example:

# ethtool -s eth2 duplex half

[588209.379363] mvpp2 f4000000.ethernet eth2: Link is Up - 100Mbps/Half - flow control off

# ethtool eth2
	[...]
	Supported pause frame use: Symmetric Receive-only
	Advertised pause frame use: Symmetric Receive-only
	Link partner advertised pause frame use: Symmetric Receive-only

# ethtool -a eth2
Autonegotiate:	on
RX:		off
TX:		off
RX negotiated: on
TX negotiated: on


Sure, pause and HD don't make sense, however what I find confusing to some
extent is that the only place we have information about the *actual* pause
settings is the "link is Up" log in dmesg.

Maybe the problem in the above situation is that whoever advertises
half-duplex only modes should also not advertise pause?

Still, I'm wondering if we should even care about all that actually, HD and
Pause are incompatible, and that's it. If you have any thought on this, let
me know.
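
To illustrate the gap between "negotiated" and "in use", here is a small
userspace sketch. resolve_pause() follows the usual autoneg resolution
from the local and link-partner Pause/Asym_Pause bits, and
effective_pause() is a hypothetical helper that masks the result when
the link is half duplex. The struct and function names are made up for
this example and are not an existing kernel or ethtool API.

```c
#include <stdbool.h>

/* Advertised pause bits for one end of the link. */
struct pause_adv {
	bool pause;
	bool asym_pause;
};

/* Pause state seen from the local end. */
struct pause_state {
	bool tx;
	bool rx;
};

/* Resolve pause from both advertisements (local end's point of view). */
static struct pause_state resolve_pause(struct pause_adv local,
					struct pause_adv partner)
{
	struct pause_state st = { false, false };

	if (local.pause && partner.pause) {
		st.tx = true;
		st.rx = true;
	} else if (local.asym_pause && partner.asym_pause) {
		st.tx = partner.pause;
		st.rx = local.pause;
	}
	return st;
}

/* Hypothetical: what is actually in use once duplex is considered. */
static struct pause_state effective_pause(struct pause_state negotiated,
					  bool full_duplex)
{
	if (!full_duplex) {
		negotiated.tx = false;
		negotiated.rx = false;
	}
	return negotiated;
}
```

With both ends advertising Pause and Asym_Pause, as in the eth2 output
above, the resolution gives tx and rx on, while the effective state on
the half-duplex link is off for both, which is what the "Link is Up" log
reports.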

Maxime

^ permalink raw reply

* Re: [PATCH net v3 1/2] iov_iter: export iov_iter_restore
From: Christian Brauner @ 2026-06-25 10:43 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: netdev, Alexander Viro, Andrew Morton, Arseniy Krasnov,
	David S. Miller, Eric Dumazet, Eugenio Pérez, Jakub Kicinski,
	Jason Wang, kvm, linux-block, linux-fsdevel, linux-kernel,
	Michael S. Tsirkin, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	Stefano Garzarella, virtualization, Xuan Zhuo, Jens Axboe
In-Reply-To: <20260622222757.2130402-2-tavip@google.com>

> Export iov_iter_restore so that it can be used by modules.
> 
> This is needed by the virtio vsock transport (which can be built as a
> module) to restore the msg_iter state when transmission fails.
> 
> Acked-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Octavian Purdila <tavip@google.com>
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 273919b16161..f5df63961fb2 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
>  		i->__iov -= state->nr_segs - i->nr_segs;
>  	i->nr_segs = state->nr_segs;
>  }
> +EXPORT_SYMBOL_GPL(iov_iter_restore);

At least only export it for the module that really needs it. For
example, see:

EXPORT_SYMBOL_FOR_MODULES(__kernel_write, "autofs4");

-- 
Christian Brauner <brauner@kernel.org>

^ permalink raw reply

* Re: Please backport bridge multicast exponential field encoding fix series to stable kernels
From: Sasha Levin @ 2026-06-25 10:42 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, David Ahern,
	Shuah Khan, Andy Roulin, Yong Wang, Petr Machata, stable, Greg KH,
	Greg Kroah-Hartman
  Cc: Sasha Levin, Ujjal Roy, bridge, Kernel, Kernel, linux-kselftest,
	Ujjal Roy
In-Reply-To: <CAE2MWknz4X_gcNo6jkR87Lg8F0zfubkOc4Ujr57CS3aBMWrjEA@mail.gmail.com>

> Please backport the 5-patch bridge multicast exponential field
> encoding series (726fa7da2d8c, 12cfb4ecc471, 95bfd196f0dc,
> e51560f4220a, 529dbe762de0) to the stable kernels.

I tried, but it doesn't apply to 7.1. Could you provide a backport please?

--
Thanks,
Sasha

* Re: [PATCH 5.15/6.1/6.6] af_unix: Reject SIOCATMARK on non-stream sockets
From: Sasha Levin @ 2026-06-25 10:41 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Sasha Levin, Alexander Martyniuk, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Kuniyuki Iwashima, Jann Horn, Lee Jones, Rao Shoaib,
	netdev, linux-kernel, stable, Yuan Tan, Yifan Wu, Juefei Pu,
	Xin Liu, Jiexun Wang, Ren Wei
In-Reply-To: <20260624151651.38894-1-alexevgmart@gmail.com>

> [PATCH 5.15/6.1/6.6] af_unix: Reject SIOCATMARK on non-stream sockets
>
> Backport fix for CVE-2026-52928. Reject SIOCATMARK in unix_ioctl()
> for non-stream sockets.

Queued for 6.6, 6.1 and 5.15, thanks!

--
Thanks,
Sasha

* Re: [PATCH 5.10] netfilter: nf_log: validate MAC header was set before dumping it
From: Sasha Levin @ 2026-06-25 10:41 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Sasha Levin, Alexander Martyniuk, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, Jakub Kicinski,
	Patrick McHardy, netfilter-devel, coreteam, netdev, linux-kernel,
	Weiming Shi, Xiang Mei
In-Reply-To: <20260624140117.19799-1-alexevgmart@gmail.com>

> [PATCH 5.10] netfilter: nf_log: validate MAC header was set before
> dumping it
>
> --- a/net/ipv4/netfilter/nf_log_ipv4.c
> +++ b/net/ipv4/netfilter/nf_log_ipv4.c

Thanks for the backport - the retarget to nf_log_ipv4.c is right for 5.10.

One gap though: upstream fixed both loggers via the consolidated
nf_log_syslog.c, but in 5.10 the IPv6 logger (net/ipv6/netfilter/
nf_log_ipv6.c) still has the identical unguarded fallback and is left
vulnerable here - which is also Pablo's "why only 5.10?" point.

--
Thanks,
Sasha

* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Maciej Fijalkowski @ 2026-06-25 10:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, pabeni, horms,
	tushar.vyavahare, kerneljasonxing
In-Reply-To: <20260624193326.295e3711@kernel.org>

On Wed, Jun 24, 2026 at 07:33:26PM -0700, Jakub Kicinski wrote:
> On Tue, 23 Jun 2026 11:10:08 +0200 Maciej Fijalkowski wrote:
> > Subject: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
> 
> Do you want it in net? Either way - we'll need a rebase

I have not checked whether this has already propagated to -net, but the rule
of thumb on the bpf side was that all selftests-related work goes to -next.
Is it different on the netdev side?

> 
> > Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com
> 
> missing <> around the email 

oof.

> 
