* Re: [PATCH bpf v2 1/2] bpf: Fix partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-16 13:33 UTC
  To: Sun Jian
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
	edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
	toke, lorenzo
In-Reply-To: <20260616093103.471444-2-sun.jian.kdev@gmail.com>

On Tue, Jun 16, 2026 at 05:31:02PM +0800, Sun Jian wrote:
> For non-linear test_run output, bpf_test_finish() derives the linear
> data copy length from copy_size - frag_size. This only matches the
> linear data length when copy_size is the full packet size.
> 
> When userspace provides a short data_out buffer, copy_size is clamped to
> that buffer size. If copy_size is smaller than frag_size, the computed
> length becomes negative and bpf_test_finish() returns -ENOSPC before
> copying the packet prefix or updating data_size_out.
> 
> Compute the linear data length from the packet layout instead, and clamp
> the linear copy length to copy_size. This preserves the expected
> partial-copy semantics: return -ENOSPC, copy the packet prefix that fits
> in data_out, and report the full packet length through data_size_out.
> 
> Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
>  net/bpf/test_run.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 2bc04feadfab..976e8fa31bc9 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -453,19 +453,16 @@ static int bpf_test_finish(const union bpf_attr *kattr,
>  	}
>  
>  	if (data_out) {
> -		int len = sinfo ? copy_size - frag_size : copy_size;
> -
> -		if (len < 0) {
> -			err = -ENOSPC;
> -			goto out;
> -		}
> +		u32 head_len = size - frag_size;
> +		u32 len = min(copy_size, head_len);
>  
>  		if (copy_to_user(data_out, data, len))
>  			goto out;
>  
>  		if (sinfo) {
> -			int i, offset = len;
> +			u32 offset = len;
>  			u32 data_len;
> +			int i;

That doesn't look needed.

>  
>  			for (i = 0; i < sinfo->nr_frags; i++) {
>  				skb_frag_t *frag = &sinfo->frags[i];
> -- 
> 2.43.0
> 
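
For readers following the length arithmetic, here is a minimal userspace
sketch of the clamp described in the commit message. It is not the kernel
code: the helper names linear_copy_len() and old_linear_copy_len() and the
plain uint32_t parameters are assumptions made for illustration, while
size, frag_size and copy_size mirror the variables in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the proposed computation: derive the linear
 * (head) length from the packet layout, then clamp it to the number of
 * bytes userspace can accept.
 */
static uint32_t linear_copy_len(uint32_t size, uint32_t frag_size,
				uint32_t copy_size)
{
	uint32_t head_len;

	/* The fragments are part of the packet, so they cannot exceed it. */
	assert(frag_size <= size);

	head_len = size - frag_size;
	return copy_size < head_len ? copy_size : head_len;
}

/*
 * Hypothetical model of the previous computation for the non-linear
 * case: it only equals the head length when copy_size is the full
 * packet size, and goes negative when copy_size < frag_size.
 */
static int old_linear_copy_len(uint32_t frag_size, uint32_t copy_size)
{
	return (int)copy_size - (int)frag_size;
}
```

With a 9000-byte packet, a 64-byte linear area and a 100-byte data_out
buffer, the old expression evaluates to 100 - 8936, which is negative,
while the clamped form yields 64 and leaves the remaining 36 bytes to be
taken from the first fragment.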


* [PATCH net v4 0/2] ipv4/ipv6: account for fraggap on paged allocation paths
From: Wongi Lee @ 2026-06-16 13:33 UTC
  To: netdev
  Cc: David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, asml.silence, dhowells,
	willemb, Jungwoo Lee

Fix fraggap accounting in the paged-allocation paths of IPv4 and IPv6.

The IPv6 patch is the v4 update of the previously posted patch. The IPv4
patch handles the same code pattern (by Ido).

v3->v4
- Remove the MSG_SPLICE_PAGES exception from the IPv6 negative copy check.
- Clarify where the fraggap bytes are copied in the commit messages.
- Add Reviewed-by tags.

v2->v3
- Add the IPv4 counterpart.
- Mention that the IPv6 corruption became triggerable after ce650a166335.
- Remove the stale comments about copy becoming -fraggap when pagedlen > 0.
- Add missing Cc entries.

v1->v2:
- Fix mail format.

v3: https://lore.kernel.org/netdev/aiq3f7UZGFp0F3MV@DESKTOP-19IMU7U.localdomain/
v2: https://lore.kernel.org/netdev/aigx83czv+UJZA0d@DESKTOP-19IMU7U.localdomain/
v1: https://lore.kernel.org/netdev/aibiIYMAwUErTw5U@DESKTOP-19IMU7U.localdomain/

Wongi Lee (2):
  ipv4: account for fraggap on the paged allocation path
  ipv6: account for fraggap on the paged allocation path

 net/ipv4/ip_output.c  | 7 ++-----
 net/ipv6/ip6_output.c | 9 +++------
 2 files changed, 5 insertions(+), 11 deletions(-)

-- 
2.34.1


* [PATCH net-next v6 1/2] dinghai: add ZTE network driver support
From: han.junyang @ 2026-06-16 13:30 UTC
  To: andrew+netdev, davem, edumazet, kuba, pabeni, horms
  Cc: linux-kernel, netdev, han.junyang, ran.ming, han.chengfei,
	zhang.yanze
In-Reply-To: <20260616212106742_trNLb7r-FL04eDlJO8tT@zte.com.cn>

From: Junyang Han <han.junyang@zte.com.cn>

Add basic framework for ZTE DingHai ethernet PF driver, including
Kconfig/Makefile build support and PCIe device probe/remove skeleton.

Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
 MAINTAINERS                               |   6 +
 drivers/net/ethernet/Kconfig              |   1 +
 drivers/net/ethernet/Makefile             |   1 +
 drivers/net/ethernet/zte/Kconfig          |  20 +++
 drivers/net/ethernet/zte/Makefile         |   6 +
 drivers/net/ethernet/zte/dinghai/Kconfig  |  34 ++++
 drivers/net/ethernet/zte/dinghai/Makefile |  10 ++
 drivers/net/ethernet/zte/dinghai/en_pf.c  | 183 ++++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.h  |  65 ++++++++
 9 files changed, 326 insertions(+)
 create mode 100644 drivers/net/ethernet/zte/Kconfig
 create mode 100644 drivers/net/ethernet/zte/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/Kconfig
 create mode 100644 drivers/net/ethernet/zte/dinghai/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.c
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..73692b09bf7b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -29440,6 +29440,12 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
 F:	sound/hda/codecs/senarytech.c

+ZTE DINGHAI ETHERNET DRIVER
+M:	Junyang Han <han.junyang@zte.com.cn>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ethernet/zte/
+
 THE REST
 M:	Linus Torvalds <torvalds@linux-foundation.org>
 L:	linux-kernel@vger.kernel.org
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index b8f70e2a1763..c2b6996b0cfe 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -188,5 +188,6 @@ source "drivers/net/ethernet/wangxun/Kconfig"
 source "drivers/net/ethernet/wiznet/Kconfig"
 source "drivers/net/ethernet/xilinx/Kconfig"
 source "drivers/net/ethernet/xircom/Kconfig"
+source "drivers/net/ethernet/zte/Kconfig"

 endif # ETHERNET
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 57344fec6ce0..a34bcbd4df4e 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -104,3 +104,4 @@ obj-$(CONFIG_NET_VENDOR_XIRCOM) += xircom/
 obj-$(CONFIG_NET_VENDOR_SYNOPSYS) += synopsys/
 obj-$(CONFIG_NET_VENDOR_PENSANDO) += pensando/
 obj-$(CONFIG_OA_TC6) += oa_tc6.o
+obj-$(CONFIG_NET_VENDOR_ZTE) += zte/
diff --git a/drivers/net/ethernet/zte/Kconfig b/drivers/net/ethernet/zte/Kconfig
new file mode 100644
index 000000000000..b95c2fc7db77
--- /dev/null
+++ b/drivers/net/ethernet/zte/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE driver configuration
+#
+
+config NET_VENDOR_ZTE
+    bool "ZTE devices"
+    default y
+    help
+      If you have a network (Ethernet) card belonging to this class, say Y.
+      Note that the answer to this question doesn't directly affect the
+      kernel: saying N will just cause the configurator to skip all
+      the questions about ZTE cards. If you say Y, you will be asked
+      for your specific card in the following questions.
+
+if NET_VENDOR_ZTE
+
+source "drivers/net/ethernet/zte/dinghai/Kconfig"
+
+endif # NET_VENDOR_ZTE
diff --git a/drivers/net/ethernet/zte/Makefile b/drivers/net/ethernet/zte/Makefile
new file mode 100644
index 000000000000..cd9929b61559
--- /dev/null
+++ b/drivers/net/ethernet/zte/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the ZTE device drivers
+#
+
+obj-$(CONFIG_DINGHAI) += dinghai/
diff --git a/drivers/net/ethernet/zte/dinghai/Kconfig b/drivers/net/ethernet/zte/dinghai/Kconfig
new file mode 100644
index 000000000000..94b5bd9b3c50
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE DingHai Ethernet driver configuration
+#
+
+config DINGHAI
+    bool "ZTE DingHai Ethernet driver"
+    depends on NET_VENDOR_ZTE && PCI
+    select NET_DEVLINK
+    help
+      This driver supports ZTE DingHai Ethernet devices.
+
+      DingHai is a high-performance Ethernet controller that supports
+      multiple features including hardware offloading, SR-IOV, and
+      advanced virtualization capabilities.
+
+      If you say Y here, you can select specific driver variants below.
+
+      If unsure, say N.
+
+if DINGHAI
+
+config DINGHAI_PF
+    tristate "ZTE DingHai PF (Physical Function) driver"
+    help
+      This driver supports ZTE DingHai PCI Express Ethernet
+      adapters (PF).
+
+      To compile this driver as a module, choose M here. The module
+      will be named dinghai10e.
+
+      If unsure, say N.
+
+endif # DINGHAI
diff --git a/drivers/net/ethernet/zte/dinghai/Makefile b/drivers/net/ethernet/zte/dinghai/Makefile
new file mode 100644
index 000000000000..f55a8de518be
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for ZTE DingHai Ethernet driver
+#
+
+ccflags-y += -I$(src)
+
+obj-$(CONFIG_DINGHAI_PF) += dinghai10e.o
+dinghai10e-y := en_pf.o
+
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
new file mode 100644
index 000000000000..99f2a8af5bf4
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ZTE DingHai Ethernet driver
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <net/devlink.h>
+#include <linux/dma-mapping.h>
+#include "en_pf.h"
+
+MODULE_AUTHOR("Junyang Han <han.junyang@zte.com.cn>");
+MODULE_DESCRIPTION("ZTE DingHai series Ethernet driver");
+MODULE_LICENSE("GPL");
+
+static const struct devlink_ops dh_pf_devlink_ops = {};
+
+static const struct pci_device_id dh_pf_pci_table[] = {
+	{ PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_PF_DEVICE_ID), 0 },
+	{ PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_VF_DEVICE_ID), 0 },
+	{ 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, dh_pf_pci_table);
+
+static int dh_pf_pci_init(struct dh_core_dev *dev)
+{
+	struct zxdh_pf_device *pf_dev = dev->priv;
+	int ret;
+
+	pci_set_drvdata(dev->pdev, dev);
+
+	ret = pci_enable_device(dev->pdev);
+	if (ret) {
+		dev_err(dev->device, "pci_enable_device failed: %d\n", ret);
+		return ret;
+	}
+
+	ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(64));
+	if (ret) {
+		ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(32));
+		if (ret) {
+			dev_err(dev->device, "dma_set_mask_and_coherent failed: %d\n", ret);
+			goto err_pci;
+		}
+	}
+
+	ret = pci_request_selected_regions(dev->pdev,
+					   pci_select_bars(dev->pdev, IORESOURCE_MEM),
+					   "dh-pf");
+	if (ret) {
+		dev_err(dev->device, "pci_request_selected_regions failed: %d\n", ret);
+		goto err_pci;
+	}
+
+	pci_set_master(dev->pdev);
+	ret = pci_save_state(dev->pdev);
+	if (ret) {
+		dev_err(dev->device, "pci_save_state failed: %d\n", ret);
+		goto err_pci_save_state;
+	}
+
+	pf_dev->pci_ioremap_addr[0] =
+		ioremap(pci_resource_start(dev->pdev, 0),
+			pci_resource_len(dev->pdev, 0));
+	if (!pf_dev->pci_ioremap_addr[0]) {
+		ret = -ENOMEM;
+		dev_err(dev->device, "dh pf pci ioremap failed\n");
+		goto err_pci_save_state;
+	}
+
+	return 0;
+
+err_pci_save_state:
+	pci_release_selected_regions(dev->pdev,
+				     pci_select_bars(dev->pdev, IORESOURCE_MEM));
+err_pci:
+	pci_disable_device(dev->pdev);
+	return ret;
+}
+
+void dh_pf_pci_close(struct dh_core_dev *dev)
+{
+	struct zxdh_pf_device *pf_dev = dev->priv;
+
+	iounmap(pf_dev->pci_ioremap_addr[0]);
+	pci_release_selected_regions(dev->pdev,
+				     pci_select_bars(dev->pdev, IORESOURCE_MEM));
+	pci_disable_device(dev->pdev);
+}
+
+static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct zxdh_pf_device *pf_dev;
+	struct dh_core_dev *dh_dev;
+	struct devlink *devlink;
+	int ret;
+
+	devlink = devlink_alloc(&dh_pf_devlink_ops, sizeof(struct dh_core_dev),
+				&pdev->dev);
+	if (!devlink) {
+		dev_err(&pdev->dev, "dh_pf devlink alloc failed\n");
+		return -ENOMEM;
+	}
+
+	dh_dev = devlink_priv(devlink);
+	dh_dev->device = &pdev->dev;
+	dh_dev->pdev = pdev;
+	dh_dev->devlink = devlink;
+
+	pf_dev = dh_core_alloc_priv(dh_dev, sizeof(*pf_dev));
+	if (!pf_dev) {
+		dev_err(&pdev->dev, "dh_pf_dev alloc failed\n");
+		ret = -ENOMEM;
+		goto err_pf_dev;
+	}
+
+	pf_dev->bar_chan_valid = false;
+	pf_dev->vepa = false;
+	mutex_init(&dh_dev->lock);
+	mutex_init(&pf_dev->irq_lock);
+
+	dh_dev->coredev_type = GET_COREDEV_TYPE(pdev);
+
+	ret = dh_pf_pci_init(dh_dev);
+	if (ret) {
+		dev_err(&pdev->dev, "dh_pf_pci_init failed: %d\n", ret);
+		goto err_cfg_init;
+	}
+
+	devlink_register(devlink);
+
+	return 0;
+
+err_cfg_init:
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	dh_core_free_priv(dh_dev);
+err_pf_dev:
+	devlink_free(devlink);
+	return ret;
+}
+
+static void dh_pf_remove(struct pci_dev *pdev)
+{
+	struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+	struct devlink *devlink = priv_to_devlink(dh_dev);
+	struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+	devlink_unregister(devlink);
+	dh_pf_pci_close(dh_dev);
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	dh_core_free_priv(dh_dev);
+	devlink_free(devlink);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static void dh_pf_shutdown(struct pci_dev *pdev)
+{
+	struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+	struct devlink *devlink = priv_to_devlink(dh_dev);
+	struct zxdh_pf_device *pf_dev = dh_dev->priv;
+
+	devlink_unregister(devlink);
+	dh_pf_pci_close(dh_dev);
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	dh_core_free_priv(dh_dev);
+	devlink_free(devlink);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static struct pci_driver dh_pf_driver = {
+	.name = "dinghai10e",
+	.id_table = dh_pf_pci_table,
+	.probe = dh_pf_probe,
+	.remove = dh_pf_remove,
+	.shutdown = dh_pf_shutdown,
+};
+
+module_pci_driver(dh_pf_driver);
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.h b/drivers/net/ethernet/zte/dinghai/en_pf.h
new file mode 100644
index 000000000000..80ff1b860b83
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - PF header
+ * Copyright (c) 2022-2026, ZTE Corporation.
+ */
+
+#ifndef __ZXDH_EN_PF_H__
+#define __ZXDH_EN_PF_H__
+
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/mutex.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+
+#define ZXDH_PF_VENDOR_ID	0x1cf2
+#define ZXDH_PF_DEVICE_ID	0x8040
+#define ZXDH_VF_DEVICE_ID	0x8041
+
+enum dh_coredev_type {
+	DH_COREDEV_PF,
+	DH_COREDEV_VF,
+	DH_COREDEV_SF,
+	DH_COREDEV_MPF
+};
+
+struct devlink;
+
+struct dh_core_dev {
+	struct device *device;
+	enum dh_coredev_type coredev_type;
+	struct pci_dev *pdev;
+	struct devlink *devlink;
+	struct mutex lock; /* Protects device configuration */
+	void *priv;
+};
+
+struct zxdh_pf_device {
+	void __iomem *pci_ioremap_addr[6];
+	bool bar_chan_valid;
+	bool vepa;
+	struct mutex irq_lock; /* Protects IRQ operations */
+};
+
+static inline void *dh_core_alloc_priv(struct dh_core_dev *dh_dev,
+				       size_t size)
+{
+	void *priv = kzalloc(size, GFP_KERNEL);
+
+	if (priv)
+		dh_dev->priv = priv;
+	return priv;
+}
+
+static inline void dh_core_free_priv(struct dh_core_dev *dh_dev)
+{
+	kfree(dh_dev->priv);
+}
+
+#define GET_COREDEV_TYPE(pdev) \
+	((pdev)->device == ZXDH_VF_DEVICE_ID ? DH_COREDEV_VF : DH_COREDEV_PF)
+
+void dh_pf_pci_close(struct dh_core_dev *dev);
+
+#endif /* __ZXDH_EN_PF_H__ */
-- 
2.27.0


* Re: [PATCH net v2] appletalk: fix TOCTOU race in atalk_sendmsg
From: Simon Horman @ 2026-06-16 13:22 UTC
  To: Yizhou Zhao
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kees Cook, Kito Xu, linux-kernel, Yuxiang Yang,
	Ao Wang, Xuewei Feng, Qi Li, Ke Xu, stable
In-Reply-To: <20260615090635.1549-1-zhaoyz24@mails.tsinghua.edu.cn>

On Mon, Jun 15, 2026 at 05:06:33PM +0800, Yizhou Zhao wrote:
> atalk_sendmsg() looks up an AppleTalk route, stores the returned
> atalk_route and net_device pointers, and then drops the socket lock
> around sock_alloc_send_skb().  The route pointer returned by
> atrtr_find() is only protected while atalk_routes_lock is held; after
> that lock is dropped, a concurrent SIOCDELRT or device-down path can
> unlink the route, drop the device reference, and free the route.
> 
> When sendmsg resumes, it can still dereference the stale route and
> device pointers while building or transmitting the packet.  A KASAN
> reproducer using AF_APPLETALK sockets and SIOCADDRT/SIOCDELRT reports
> slab-use-after-free reads in atalk_sendmsg(), with the object allocated
> by atrtr_create() and freed by atrtr_delete().
> 
> Fix this by splitting the route lookup into a helper that is called with
> atalk_routes_lock already held.  atalk_sendmsg() now performs route
> lookup, copies the route fields it needs, and takes references to the
> selected devices with netdev_hold() while still holding
> atalk_routes_lock.  After the lock is dropped and skb allocation sleeps,
> the send path uses only the copied route data and the held net_device
> references, which are released with netdev_put() before returning.
> 
> This preserves the existing route selection behaviour, including the
> separate loopback route used for broadcast loopback, while removing the
> dangling route/device window.
> 
> Fixes: 60d9f461a20b ("appletalk: remove the BKL")
> Cc: stable@vger.kernel.org
> Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
> Reported-by: Ao Wang <wangao@seu.edu.cn>
> Reported-by: Xuewei Feng <fengxw06@126.com>
> Reported-by: Qi Li <qli01@tsinghua.edu.cn>
> Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
> Assisted-by: GLM:GLM-5.1
> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
> ---
> Changes in v2:
> - Use netdev_hold()/netdev_put() instead of dev_hold()/dev_put().
> - Drop explicit NULL checks before releasing temporary device refs.
> - Link to v1: https://lore.kernel.org/netdev/20260610052315.64504-1-zhaoyz24@mails.tsinghua.edu.cn/

Reviewed-by: Simon Horman <horms@kernel.org>



* [PATCH net-next v6 0/2] Add ZTE DingHai Ethernet PF driver
From: han.junyang @ 2026-06-16 13:21 UTC
  To: andrew+netdev, davem, edumazet, kuba, pabeni, horms
  Cc: linux-kernel, netdev, han.junyang, ran.ming, han.chengfei,
	zhang.yanze

From: Junyang Han <han.junyang@zte.com.cn>

This series adds initial support for the ZTE DingHai Ethernet controller,
a high-performance PCIe Ethernet device supporting SR-IOV, hardware
offloading, and advanced virtualization features.

Changes from v5:
- Drop dev_info() log spam.
- Propagate the real error code from dh_pf_pci_init() in
  dh_pf_probe() instead of hard-coding -ENOMEM.
- Register devlink only after dh_pf_pci_init() succeeds, and
  in dh_pf_remove()/dh_pf_shutdown() unregister devlink
  before tearing down PCI/mutex/priv.
- Drop the "dh_dev->priv = NULL" assignment from
  dh_core_free_priv().

Changes from v4:
- Fix sparse warning: add __iomem annotation to priv pointer
- Fix Clang format warning
- Use "dinghai:" as patch subject prefix
- Ensure proper patch threading

Note: Sent manually due to temporary git send-email unavailability
in our environment. Will use git send-email or b4 for future
submissions. Apologies for any inconvenience.

Changes from v3:
- Merged patches 1 and 2: 
  Combined initial framework with logging infrastructure
  for better code organization and reduced patch count. This was done because
  the logging infrastructure now uses Linux's built-in dev_err(), dev_info(),
  dev_warn(), etc. macros instead of a custom logging system.
- Removed unnecessary variable initialization: 
  Fixed "don't initialise variables".
- Fixed variable declaration order: 
  Applied "Reverse Christmas tree" ordering with variables
  declared from longest to shortest line length.
- Code quality improvements:
  Fixed all checkpatch.pl issues (alignment, formatting, etc.).

Changes from v2:
- Address maintainer feedback from v2 review:
  * Remove meaningless initialization
  * Change dh_pf_pci_table to static const for better encapsulation
  * Simplify MODULE_DESCRIPTION for brevity
- Coding style improvements:
  * Ensure all lines are within 80-column limit
  * Use kernel types (u32/u8) consistently throughout
  * Improve code readability with better formatting


Changes from v1 (addressing feedback from Andrew Lunn):
- Update copyright years to 2022-2026
- Remove DRV_VERSION, MODULE_VERSION and related boilerplate
- Fix MODULE_AUTHOR to use person with email address
- Use module_pci_driver() instead of manual init/exit
- Remove empty suspend/resume callbacks
- Replace char priv[] flexible array with void *priv + kzalloc
- Switch logging from printk wrappers to dev_*() based macros
- Remove dh_helper.h and dh_log.c, simplify to dh_log.h only
- Fix variable declaration ordering (reverse Christmas tree)
- Remove unnecessary NULL check in remove and pf_dev=NULL in probe
- Fix indentation and remove unnecessary type casts
- Use kernel idiomatic "if (ret)" style

This is the initial submission and only includes the PF (Physical Function)
driver. The VF (Virtual Function) driver will be submitted separately.

Junyang Han (2):
  dinghai: add ZTE network driver support
  dinghai: add hardware register access and PCI capability scanning

 MAINTAINERS                                 |   6 +
 drivers/net/ethernet/Kconfig                |   1 +
 drivers/net/ethernet/Makefile               |   1 +
 drivers/net/ethernet/zte/Kconfig            |  20 +
 drivers/net/ethernet/zte/Makefile           |   6 +
 drivers/net/ethernet/zte/dinghai/Kconfig    |  34 ++
 drivers/net/ethernet/zte/dinghai/Makefile   |  10 +
 drivers/net/ethernet/zte/dinghai/dh_queue.h |  71 +++
 drivers/net/ethernet/zte/dinghai/en_pf.c    | 622 ++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.h    | 131 +++++
 10 files changed, 902 insertions(+)
 create mode 100644 drivers/net/ethernet/zte/Kconfig
 create mode 100644 drivers/net/ethernet/zte/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/Kconfig
 create mode 100644 drivers/net/ethernet/zte/dinghai/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_queue.h
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.c
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.h

-- 
2.27.0


* [PATCH v2] net: macb: add TX stall timeout callback to recover from lost TSTART write
From: Andrea della Porta @ 2026-06-16 13:23 UTC
  To: netdev, Theo Lebrun, Andrea della Porta, Nicolas Ferre,
	Claudiu Beznea, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, linux-arm-kernel,
	linux-rpi-kernel, Nicolai Buchwitz
  Cc: Lukasz Raczylo, Steffen Jaeckel

From: Lukasz Raczylo <lukasz@raczylo.com>

The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
the TX queue.
While the exact root cause is not yet fully understood, it is likely
related to a hardware issue where a TSTART write to the NCR register
is missed, preventing the transmission from being kicked off.

Implement a timeout callback to handle TX queue stalls, triggering the
existing restart mechanism to recover.

Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
---

CHANGES IN v2:

- dropped the rate-limited log message
- avoid incrementing tx_error as this is per packet

---
 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a12aa21244e83..fd282a1700fb9 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4522,6 +4522,13 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 	}
 }
 
+static void macb_tx_timeout(struct net_device *dev, unsigned int q)
+{
+	struct macb *bp = netdev_priv(dev);
+
+	macb_tx_restart(&bp->queues[q]);
+}
+
 static const struct net_device_ops macb_netdev_ops = {
 	.ndo_open		= macb_open,
 	.ndo_stop		= macb_close,
@@ -4540,6 +4547,7 @@ static const struct net_device_ops macb_netdev_ops = {
 	.ndo_hwtstamp_set	= macb_hwtstamp_set,
 	.ndo_hwtstamp_get	= macb_hwtstamp_get,
 	.ndo_setup_tc		= macb_setup_tc,
+	.ndo_tx_timeout		= macb_tx_timeout,
 };
 
 /* Configure peripheral capabilities according to device tree
-- 
2.35.3



* Re: [PATCH bpf v2 2/2] selftests/bpf: Cover partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-16 13:17 UTC
  To: Sun Jian
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
	edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
	toke, lorenzo
In-Reply-To: <20260616093103.471444-3-sun.jian.kdev@gmail.com>

On Tue, Jun 16, 2026 at 05:31:03PM +0800, Sun Jian wrote:
> prog_run_opts already verifies that BPF_PROG_TEST_RUN returns -ENOSPC
> for a short data_out buffer while still reporting the full output size
> through data_size_out.
> 
> Add the same coverage for non-linear test_run output. Use pass-through
> TC and XDP programs with a 9000-byte packet, a 64-byte linear data area,
> and a 100-byte data_out buffer. The expected output spans both the linear
> data and the first fragment.
> 
> Verify that test_run returns -ENOSPC, reports the full packet length
> through data_size_out, and copies the packet prefix into data_out for
> both non-linear skb and XDP frags paths.
> 
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/prog_run_opts.c  | 72 +++++++++++++++++++
>  .../selftests/bpf/progs/test_pkt_access.c     | 12 ++++
>  2 files changed, 84 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> index 01f1d1b6715a..71af1ff02023 100644
> --- a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> +++ b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> @@ -4,6 +4,10 @@
>  
>  #include "test_pkt_access.skel.h"
>  
> +#define NONLINEAR_PKT_LEN 9000
> +#define NONLINEAR_LINEAR_DATA_LEN 64
> +#define SHORT_OUT_LEN 100
> +
>  static const __u32 duration;
>  
>  static void check_run_cnt(int prog_fd, __u64 run_cnt)
> @@ -20,6 +24,71 @@ static void check_run_cnt(int prog_fd, __u64 run_cnt)
>  	      "incorrect number of repetitions, want %llu have %llu\n", run_cnt, info.run_cnt);
>  }
>  
> +static void init_pkt(__u8 *pkt, size_t len)
> +{
> +	size_t i;
> +
> +	for (i = 0; i < len; i++)
> +		pkt[i] = i & 0xff;
> +}
> +
> +static void test_skb_nonlinear_data_out_partial(struct test_pkt_access *skel)
> +{
> +	LIBBPF_OPTS(bpf_test_run_opts, topts);
> +	__u8 pkt[NONLINEAR_PKT_LEN];
> +	__u8 out[SHORT_OUT_LEN];
> +	struct __sk_buff skb = {};
> +	int prog_fd, err;
> +
> +	init_pkt(pkt, sizeof(pkt));

Can't we reuse pkt_v4 by reducing the linear area to ETH_HLEN?

> +	memset(out, 0xa5, sizeof(out));

Why is this needed?

> +
> +	skb.data_end = NONLINEAR_LINEAR_DATA_LEN;
> +
> +	topts.data_in = pkt;
> +	topts.data_size_in = sizeof(pkt);
> +	topts.data_out = out;
> +	topts.data_size_out = sizeof(out);
> +	topts.ctx_in = &skb;
> +	topts.ctx_size_in = sizeof(skb);
> +
> +	prog_fd = bpf_program__fd(skel->progs.tc_pass_prog);
> +	err = bpf_prog_test_run_opts(prog_fd, &topts);
> +
> +	ASSERT_EQ(err, -ENOSPC, "skb_nonlinear_partial_err");
> +	ASSERT_EQ(topts.data_size_out, sizeof(pkt), "skb_nonlinear_partial_data_size_out");
> +	ASSERT_OK(memcmp(out, pkt, sizeof(out)), "skb_nonlinear_partial_data_out");
> +}
> +
> +static void test_xdp_nonlinear_data_out_partial(struct test_pkt_access *skel)
> +{
> +	LIBBPF_OPTS(bpf_test_run_opts, topts);
> +	__u8 pkt[NONLINEAR_PKT_LEN];
> +	__u8 out[SHORT_OUT_LEN];
> +	struct xdp_md ctx = {};
> +	int prog_fd, err;
> +
> +	init_pkt(pkt, sizeof(pkt));
> +	memset(out, 0xa5, sizeof(out));
> +
> +	ctx.data = 0;
> +	ctx.data_end = NONLINEAR_LINEAR_DATA_LEN;
> +
> +	topts.data_in = pkt;
> +	topts.data_size_in = sizeof(pkt);
> +	topts.data_out = out;
> +	topts.data_size_out = sizeof(out);
> +	topts.ctx_in = &ctx;
> +	topts.ctx_size_in = sizeof(ctx);
> +
> +	prog_fd = bpf_program__fd(skel->progs.xdp_frags_pass_prog);
> +	err = bpf_prog_test_run_opts(prog_fd, &topts);
> +
> +	ASSERT_EQ(err, -ENOSPC, "xdp_nonlinear_partial_err");
> +	ASSERT_EQ(topts.data_size_out, sizeof(pkt), "xdp_nonlinear_partial_data_size_out");
> +	ASSERT_OK(memcmp(out, pkt, sizeof(out)), "xdp_nonlinear_partial_data_out");
> +}
> +
>  void test_prog_run_opts(void)
>  {
>  	struct test_pkt_access *skel;
> @@ -69,6 +138,9 @@ void test_prog_run_opts(void)
>  	run_cnt += topts.repeat;
>  	check_run_cnt(prog_fd, run_cnt);
>  
> +	test_skb_nonlinear_data_out_partial(skel);
> +	test_xdp_nonlinear_data_out_partial(skel);
> +
>  cleanup:
>  	if (skel)
>  		test_pkt_access__destroy(skel);
> diff --git a/tools/testing/selftests/bpf/progs/test_pkt_access.c b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> index bce7173152c6..cd284401eebd 100644
> --- a/tools/testing/selftests/bpf/progs/test_pkt_access.c
> +++ b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> @@ -150,3 +150,15 @@ int test_pkt_access(struct __sk_buff *skb)
>  
>  	return TC_ACT_UNSPEC;
>  }
> +
> +SEC("tc")
> +int tc_pass_prog(struct __sk_buff *skb)
> +{
> +	return TC_ACT_OK;
> +}

Once we're reusing pkt_v4, maybe we can also reuse the existing BPF
program?

> +
> +SEC("xdp.frags")
> +int xdp_frags_pass_prog(struct xdp_md *ctx)
> +{
> +	return XDP_PASS;
> +}
> -- 
> 2.43.0
> 
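
As a rough illustration of what the test expects in data_out, the sketch
below models a prefix copy that walks the linear area and then the
fragments, stopping once the output buffer is full, while still reporting
the full packet length. It is a userspace model only: struct frag_model
and copy_prefix() are hypothetical names, and the way the packet is split
into fragments is an assumption, not the layout the kernel would pick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one paged fragment of the packet. */
struct frag_model {
	const uint8_t *data;
	uint32_t len;
};

/*
 * Copy at most out_len bytes of the packet (linear head first, then
 * each fragment in order) into out. Returns the number of bytes copied
 * and stores the full packet length in *full_len, mirroring the
 * "copy the prefix, report the full size" semantics described above.
 */
static uint32_t copy_prefix(uint8_t *out, uint32_t out_len,
			    const uint8_t *head, uint32_t head_len,
			    const struct frag_model *frags, size_t nr_frags,
			    uint32_t *full_len)
{
	uint32_t copied = head_len < out_len ? head_len : out_len;
	uint32_t total = head_len;
	size_t i;

	assert(out && head && full_len);

	if (copied)
		memcpy(out, head, copied);

	for (i = 0; i < nr_frags; i++) {
		uint32_t room = out_len - copied;
		uint32_t n = frags[i].len < room ? frags[i].len : room;

		/* Only copy while there is still room in the output. */
		if (n)
			memcpy(out + copied, frags[i].data, n);
		copied += n;
		/* The full length counts every fragment, copied or not. */
		total += frags[i].len;
	}

	*full_len = total;
	return copied;
}
```

A caller would treat a return value smaller than *full_len as the
-ENOSPC case: the buffer holds a valid packet prefix and *full_len tells
it how large a buffer to retry with.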


* Re: [PATCH net v3 2/2] ipv6: account for fraggap on the paged allocation path
From: Wongi Lee @ 2026-06-16 13:11 UTC
  To: Jakub Kicinski
  Cc: netdev, David Ahern, Ido Schimmel, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, asml.silence, dhowells, willemb,
	Jungwoo Lee
In-Reply-To: <20260615133945.6e94c2d9@kernel.org>

On Mon, Jun 15, 2026 at 01:39:45PM -0700, Jakub Kicinski wrote:
> On Thu, 11 Jun 2026 22:34:13 +0900 Wongi Lee wrote:
> >  			copy = datalen - transhdrlen - fraggap - pagedlen;
> > -			/* [!] NOTE: copy may be negative if pagedlen>0
> > -			 * because then the equation may reduces to -fraggap.
> > -			 */
> >  			if (copy < 0 && !(flags & MSG_SPLICE_PAGES)) {
> 
> You remove the comment because copy can never be negative with
> pagedlen>0 now, can we not remove "!(flags & MSG_SPLICE_PAGES)"
> as well then?

Yes, I checked the arithmetic and I agree that the MSG_SPLICE_PAGES 
exception is no longer needed after fraggap is accounted for in pagedlen.

I will remove the exception in v4 and address Ido's commit message
comment as well.


* Re: [PATCH v2] vsock/virtio: rework MSG_ZEROCOPY flag handling
From: Stefano Garzarella @ 2026-06-16 13:09 UTC
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman,
	Xuan Zhuo, Eugenio Pérez, Simon Horman, kvm, virtualization,
	netdev, linux-kernel, oxffffaa, rulkc
In-Reply-To: <20260614174756.170631-1-avkrasnov@rulkc.org>

On Sun, Jun 14, 2026 at 08:47:56PM +0300, Arseniy Krasnov wrote:
>Logically it was based on TCP implementation, so make further support
>easier, rewrite it in the TCP way.

Hi Arseniy, and thank you so much for the patch!

I’d like to ask you to expand on the message a bit, especially to 
explain why we’re making this change.

In particular, I’d like to better understand whether this is just a 
cosmetic change or if we’re fixing any issues (and if so, which ones), 
so we can determine whether this patch should be backported to the 
stable branches.

>
>Signed-off-by: Arseniy Krasnov <avkrasnov@rulkc.org>
>---
> Changelog v1->v2:
> * Rebase on last 'net-next'. Don't need 'skb_zcopy_set()' now - it was
>   already added.

Ah, okay, so this is net-next material; please use the net-next tag
(i.e. [PATCH net-next v2]).

>
> net/vmw_vsock/virtio_transport_common.c | 48 ++++++++++++-------------
> 1 file changed, 23 insertions(+), 25 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 09475007165b..787524b8cb44 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -328,38 +328,36 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> 		return pkt_len;
>
>-	if (info->msg) {
>-		/* If zerocopy is not enabled by 'setsockopt()', we behave as
>-		 * there is no MSG_ZEROCOPY flag set.
>+	if (info->msg && (info->msg->msg_flags & MSG_ZEROCOPY)) {
>+		/* If 'info->msg' is not NULL, this is only VIRTIO_VSOCK_OP_RW.
>+		 * 'MSG_ZEROCOPY' flag handling here is based on the same flag
>+		 * handling from 'tcp_sendmsg_locked()'.
> 		 */
>-		if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
>-			info->msg->msg_flags &= ~MSG_ZEROCOPY;
>+		if (info->msg->msg_ubuf) {
>+			uarg = info->msg->msg_ubuf;
>+			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
>+		} else if (sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY)) {
>+			uarg = msg_zerocopy_realloc(sk_vsock(vsk), pkt_len,
>+						    NULL, false);
>+			if (!uarg) {
>+				virtio_transport_put_credit(vvs, pkt_len);
>+				return -ENOMEM;
>+			}
>
>-		if (info->msg->msg_flags & MSG_ZEROCOPY)
> 			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
>

nit: we can remove this extra blank line.

For the rest I can't see anything wrong, but a bit more context in the 
commit would help me in the review.

Thanks,
Stefano


^ permalink raw reply

* [Bug] incompatibility between 'e1000e' and Aruba AOS-CX switches (too small inter-packet gap)
From: Philippe Andersson @ 2026-06-16 13:07 UTC (permalink / raw)
  To: netdev; +Cc: Ludovic Calmant, Fabian Noël


[-- Attachment #1.1.1: Type: text/plain, Size: 2726 bytes --]

Hello,

Our product (a medical oncology system) uses TCP/IP for time-critical 
communications between control systems (PCs running Linux) and 
electronic units or PLCs.

We recently upgraded our standard networking equipment from the old 
HPE/Aruba "ProCurve" line to the new Aruba AOS-CX one, and since then we 
started experiencing communication issues caused by (or triggering) 
larger than usual packet retransmits specifically on hosts that use the 
'e1000e' driver (Intel I219-LM and Intel 82579LM boards). This issue 
immediately affects the time-critical communication channels I mentioned 
above, but is also seen e.g. in regular NFS communications (although in 
this case the effect is not generally perceptible to the user).

Here are the tests we did to try and narrow down the problem:

- add a single AOS-CX switch to our older network topology, connect a 
single Debian 12 workstation to it, simulate a clinical workflow: 
problem present

- in the same test setup as above, configure the Debian workstation to 
use its 2nd NIC instead, which uses the 'igb' driver: problem disappears 
(idem when plugging in an extra PCI NIC that uses the 'tg3' driver)

- in a lab environment, try to reproduce the problem with a single 
Debian 13 workstation, a single AOS-CX switch and a sole communication 
partner: could not reproduce (hypothesis: because the "background noise" 
on the network is too low)

- on a test site already equipped with AOS-CX switches, upgraded the 
Debian workstation to Debian 13 (kernel 6.12.x), simulate a clinical 
workflow: problem present

- on a test site already equipped with AOS-CX switches, force 
workstation port to 100Mbps half-duplex: problem disappears

- set option "interpacket-gap high-average" on Aruba AOS-CX switch: no 
effect

- activate flow-control on workstation and switch port: no effect

A support ticket has already been opened with Aruba, but it's unclear at 
this stage whether the problem is on their side. Based on their analysis: 
"There was a BUG noted for the same issue for the devices 
with Intel I219-LM Network Card. This issue is observed with 
some NIC cards that are not tolerant to smaller inter packet gap, 
these NICs will silently drop packets at certain inter packet gaps."

We can provide packet captures that illustrate the problem if it can 
help (but ideally through direct email, not this list).

Is this issue already known/tracked on your side?

Thanks in advance.

Best regards,

Ph. A.

-- 

*Philippe Andersson*
Unix System Administrator
IBA Particle Therapy |
Tel: +32-10-475.983
Fax: +32-10-487.707
eMail: pan@iba-group.com
<http://www.iba-worldwide.com>



[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3165 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

^ permalink raw reply

* Re: [PATCH net v2 2/2] vsock/virtio: restore msg_iter on transmission failure
From: Stefano Garzarella @ 2026-06-16 12:59 UTC (permalink / raw)
  To: Octavian Purdila, g
  Cc: netdev, Alexander Viro, Andrew Morton, Arseniy Krasnov,
	David S. Miller, Eric Dumazet, Eugenio Pérez, Jakub Kicinski,
	Jason Wang, kvm, linux-block, linux-fsdevel, linux-kernel,
	Michael S. Tsirkin, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	virtualization, Xuan Zhuo, syzbot+28e5f3d207b14bae122a
In-Reply-To: <20260613000953.467473-3-tavip@google.com>

On Sat, Jun 13, 2026 at 12:09:53AM +0000, Octavian Purdila wrote:
>When transmission fails in virtio_transport_send_pkt_info, the msg_iter
>might have been partially advanced. If we don't restore it, the next
>attempt to send data will use an incorrect iterator state, leading to
>desync and warnings like "send_pkt() returns 0, but X expected".
>
>Specifically, this can happen in the following scenario, triggered by
>the syzkaller repro:
>
>1. A write-only VMA (PROT_WRITE only) is partially populated by a
>   prior TUN write that failed with -EIO but still faulted in some
>   pages.
>2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
>   buffer from this VMA.
>3. The first packet (64KB) is sent successfully because the pages are
>   populated.
>4. The second packet allocation fails because GUP fast pins the first page
>   but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
>   permissions.
>5. The iterator is advanced by the partially successful GUP (68KB total
>   advanced: 64KB from first packet + 4KB from second), but the send loop
>   breaks and only reports 64KB sent. This creates a 4KB desync.
>6. The next retry starts with a non-zero iov_offset, disabling zerocopy
>   and falling back to copy mode.
>7. In copy mode, the transmission succeeds for the next packets but
>   exhausts the iterator early because of the desync.
>8. The final retry sees an empty iterator but zerocopy is re-enabled
>   (offset resets). It attempts to send the remaining bytes with zerocopy
>   but pins 0 pages, creating an empty packet.
>9. The transport sends the empty packet, triggering the warning because
>   the returned bytes (header only) do not match the expected payload size.
>10. The loop continues to spin, allocating ubuf_info each time, eventually
>    exhausting sysctl_optmem_max and returning -ENOMEM to userspace.
>
>Restore msg_iter to its original state before the packet allocation
>and transmission attempt if they fail.
>
>Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
>Reported-by: syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com
>Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
>Assisted-by: gemini:gemini-3.1-pro
>Signed-off-by: Octavian Purdila <tavip@google.com>
>---
> net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)

Thanks, looks much better to me now!

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
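
The desync in step 5 of the quoted scenario, and the effect of restoring
the iterator, can be modelled in userspace C. This is a sketch under
stated assumptions: struct model_iter, send_pkt() and the populated_end
parameter are hypothetical stand-ins, not the kernel's iov_iter API or
the vsock send path.

```c
#include <stddef.h>

/* Toy iterator: how far we have advanced and how much is left. */
struct model_iter {
	size_t offset;
	size_t remaining;
};

/* Try to send pkt_len bytes. Only bytes below populated_end can be
 * pinned. A partial pin still advances the iterator, as in step 5.
 * Returns the number of bytes reported as sent (0 on failure).
 * Assumes pkt_len <= it->remaining.
 */
size_t send_pkt(struct model_iter *it, size_t pkt_len,
		size_t populated_end, int restore_on_fail)
{
	struct model_iter saved = *it;	/* snapshot before the attempt */
	size_t pinnable, got;

	pinnable = populated_end > it->offset ? populated_end - it->offset : 0;
	got = pkt_len < pinnable ? pkt_len : pinnable;

	it->offset += got;
	it->remaining -= got;

	if (got < pkt_len) {
		if (restore_on_fail)
			*it = saved;	/* undo the partial advance */
		return 0;
	}
	return pkt_len;
}
```

With 68KB populated and two 64KB packets, the model without restore ends
at offset 68KB while only 64KB was reported; with restore the offset
stays at 64KB and matches the reported count.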


^ permalink raw reply

* Re: [PATCH net-next v7 2/2] net: ti: icssg-prueth: Add ethtool ops for Frame Preemption MAC Merge
From: Meghana Malladi @ 2026-06-16 12:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: elfring, haokexin, vadim.fedorenko, devnexen, horms,
	jacob.e.keller, arnd, basharath, afd, parvathi, vladimir.oltean,
	rogerq, danishanwar, pabeni, edumazet, davem, andrew+netdev,
	linux-arm-kernel, netdev, linux-kernel, srk, vigneshr
In-Reply-To: <20260615163932.50bb3df0@kernel.org>

Hi Jakub,

On 6/16/26 05:09, Jakub Kicinski wrote:
> On Mon, 15 Jun 2026 16:10:41 -0700 Jakub Kicinski wrote:
>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>> index 5ec0b38e0c67..8073deac35c3 100644
>>> --- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>> @@ -189,6 +187,11 @@ static const struct icssg_pa_stats icssg_all_pa_stats[] = {
>>>   	ICSSG_PA_STATS(FW_INF_DROP_PRIOTAGGED),
>>>   	ICSSG_PA_STATS(FW_INF_DROP_NOTAG),
>>>   	ICSSG_PA_STATS(FW_INF_DROP_NOTMEMBER),
>>> +	ICSSG_PA_STATS(FW_PREEMPT_BAD_FRAG),
>>> +	ICSSG_PA_STATS(FW_PREEMPT_ASSEMBLY_ERR),
>>> +	ICSSG_PA_STATS(FW_PREEMPT_FRAG_CNT_TX),
>>> +	ICSSG_PA_STATS(FW_PREEMPT_ASSEMBLY_OK),
>>> +	ICSSG_PA_STATS(FW_PREEMPT_FRAG_CNT_RX),
>>>   	ICSSG_PA_STATS(FW_RX_EOF_SHORT_FRMERR),
>>>   	ICSSG_PA_STATS(FW_RX_B0_DROP_EARLY_EOF),
>>>   	ICSSG_PA_STATS(FW_TX_JUMBO_FRM_CUTOFF),
>>
>> [Medium]
>> Are these five new entries duplicating values that already have a
>> standard uAPI?
>>
>> The same five firmware counters are exposed through the new
>> .get_mm_stats callback as the standardized MAC Merge stats
>> (MACMergeFrameAssOkCount, MACMergeFrameAssErrorCount, MACMergeFragCountRx,
>> MACMergeFragCountTx, MACMergeFrameSmdErrorCount in struct
>> ethtool_mm_stats), and adding them to icssg_all_pa_stats[] also
>> publishes them via emac_get_strings() / emac_get_ethtool_stats() as
>> ethtool -S strings.
>>
>> Documentation/networking/statistics.rst describes ethtool -S as the
>> private-driver-stats interface; counters that have a standard uAPI are
>> expected to flow only through that uAPI.
>>
>> Could the firmware-register lookup table used by emac_get_stat_by_name()
>> be separated from the ethtool -S string table, so the new preemption
>> counters feed get_mm_stats without also showing up under ethtool -S?
> 
> This -- not sure about the other complaints but this one looks legit.

I agree that this is legit, but right now there is no place other than
the PA stats table to hold the MAC Merge firmware counters. I believe
the hardware and firmware stats implementation needs to be restructured
to address this issue.


^ permalink raw reply

* Re: [PATCH net-next 1/9] atm: remove AAL3/4 transport support
From: David Laight @ 2026-06-16 12:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, 3chas3,
	mitch, linux-atm-general, dwmw2
In-Reply-To: <20260613201032.77274-2-kuba@kernel.org>

On Sat, 13 Jun 2026 13:10:24 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> AAL3/4 is an obsolete connection-oriented ATM adaptation layer that has
> seen no real use since the SMDS-era hardware it was designed for (90s?).
...

From what I remember they weren't really used even then.
Apart from 'uncompressed 64k telephony audio' pretty much everything actually
used AAL5 so that data could be compressed.

I do remember PCI ATM cards that could be used for TCP/ATM to the desktop.
The equipment you needed in the network rack to make it work was stunning.
Completely killed by 100M ethernet over twisted pair.

	David

^ permalink raw reply

* Re: [PATCH v2] [net] net: airoha: Clean up RX queues in airoha_dev_stop
From: Lorenzo Bianconi @ 2026-06-16 12:45 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178161160256.2165161.14322392784449633554@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

On Jun 16, Wayen Yan wrote:
> When the last port is stopped, airoha_dev_stop() clears TX queues
> but neglects to clean up RX queues. This can lead to:
> - RX ring buffer descriptors remaining valid after device close
> - Potential DMA synchronization issues on device reopen
> - Risk of use-after-free if pages are freed while DMA is still active
> 
> Add cleanup loop for RX queues to mirror the TX queue cleanup,
> ensuring symmetric resource management.
> 
> Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
> Signed-off-by: Wayen Yan <win847@gmail.com>

When you send a new revision:
- please add a note on what changed with respect to the previous one.
- please give reviewers some time to look at the previous revision.

> ---
>  drivers/net/ethernet/airoha/airoha_eth.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 31cdb11cd7..9ca5bbf64d 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1771,6 +1771,13 @@ static int airoha_dev_stop(struct net_device *dev)
>  
>  			airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]);
>  		}
> +
> +		for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
> +			if (!qdma->q_rx[i].ndesc)
> +				continue;
> +
> +			airoha_qdma_cleanup_rx_queue(&qdma->q_rx[i]);
> +		}
>  	}

I do not think this patch is needed: there is no point in removing all the
RX buffers from the hw RX queues when stopping the device; that is only
necessary when removing the module (I think we can avoid it for TX too; I
have a patch for that which I still need to post).
Moreover, with this change the RX queues will be empty when the device is
opened again.

Regards,
Lorenzo

>  
>  	return 0;
> -- 
> 2.51.0
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* [PATCH net v3] net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
From: Wayen Yan @ 2026-06-13 23:30 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

In airoha_dev_select_queue(), the expression:

  queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;

implicitly converts to unsigned arithmetic: when skb->priority is 0
(the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
and UINT_MAX % 8 = 7, routing default best-effort packets to the
highest-priority QoS queue. This causes QoS inversion where the
majority of traffic on a PON gateway starves actual high-priority
flows (VoIP, gaming, etc.).

Fix by guarding the subtraction: when priority is 0, map to queue 0
(lowest priority), otherwise apply the original (priority - 1) % 8
mapping.

Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Wayen Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 31cdb11cd7..d476ef83c3 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1933,7 +1933,7 @@ static u16 airoha_dev_select_queue(struct net_device *dev, struct sk_buff *skb,
 	 */
 	channel = netdev_uses_dsa(dev) ? skb_get_queue_mapping(skb) : port->id;
 	channel = channel % AIROHA_NUM_QOS_CHANNELS;
-	queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
+	queue = skb->priority ? (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES : 0;
 	queue = channel * AIROHA_NUM_QOS_QUEUES + queue;
 
 	return queue < dev->num_tx_queues ? queue : 0;
-- 
2.51.0
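
The wrap-around described in the commit message can be reproduced in
userspace C. This is a minimal sketch: NUM_QOS_QUEUES stands in for
AIROHA_NUM_QOS_QUEUES (8 is taken from the "% 8" in the text above), and
the function names are hypothetical rather than driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for AIROHA_NUM_QOS_QUEUES. */
#define NUM_QOS_QUEUES 8u

/* Pre-patch mapping: with priority == 0 the unsigned subtraction
 * wraps to UINT32_MAX before the modulo is applied.
 */
uint32_t select_queue_old(uint32_t priority)
{
	return (priority - 1) % NUM_QOS_QUEUES;
}

/* Patched mapping: priority 0 is pinned to queue 0. */
uint32_t select_queue_new(uint32_t priority)
{
	return priority ? (priority - 1) % NUM_QOS_QUEUES : 0;
}

/* Hypothetical self-check covering the cases discussed above. */
void check_queue_mapping(void)
{
	assert(select_queue_old(0) == 7);	/* UINT32_MAX % 8 */
	assert(select_queue_new(0) == 0);	/* default traffic */
	assert(select_queue_new(1) == 0);	/* unchanged for non-zero */
	assert(select_queue_new(8) == 7);
}
```

Non-zero priorities map identically in both versions; only the
priority 0 case changes.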



^ permalink raw reply related

* Re: [PATCH net-next 0/3] Introduce HSR/PRP HW offload support for PRU-ICSSM Ethernet driver
From: Parvathi Pudi @ 2026-06-16 12:40 UTC (permalink / raw)
  To: Simon Horman
  Cc: Parvathi Pudi, andrew+netdev, davem, edumazet, kuba, pabeni,
	danishanwar, rogerq, pmohan, afd, basharath, arnd, linux-kernel,
	netdev, linux-arm-kernel, pratheesh, j-rameshbabu,
	Vignesh Raghavendra, praneeth, srk, rogerq, m-malladi, krishna,
	mohan
In-Reply-To: <20260612200102.GN671640@horms.kernel.org>

Hi,

> On Thu, Jun 11, 2026 at 06:03:25PM +0530, Parvathi Pudi wrote:
>> Hi,
>> 
>> This series introduces HSR and PRP protocol HW offload support for
>> ICSSM-Prueth driver.  HW offload support for HSR/PRP is implemented using
>> dedicated HSR/PRP firmware running on 2 PRU cores(PRU-ICSS) as a "DAN"
>> available in AM57xx, AM437x and AM335x.
> 
> Hi Parvathi,
> 
> There is AI-generated review of this patch-set available on
> https://sashiko.dev
> 
> I would appreciate it if you could look over that with a view
> to addressing any issues that directly affect this patch-set.

Sure. We will review the sashiko feedback for this patch series and
address any findings that are directly relevant in the next version.

Thanks and Regards,
Parvathi.

^ permalink raw reply

* Re: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned folio
From: Jason Gunthorpe @ 2026-06-16 12:40 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Bobby Eshleman, Donald Hunter, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Andrew Lunn,
	Gerd Hoffmann, Sumit Semwal, Christian König, Shuah Khan,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kselftest@vger.kernel.org,
	sdf@fomichev.me, razor@blackwall.org, daniel@iogearbox.net,
	almasrymina@google.com, matttbe@kernel.org, skhawaja@google.com,
	dw@davidwei.uk, Bobby Eshleman
In-Reply-To: <IA0PR11MB71852246277F773AC41DAAA3F8E52@IA0PR11MB7185.namprd11.prod.outlook.com>

On Tue, Jun 16, 2026 at 06:04:03AM +0000, Kasireddy, Vivek wrote:

> > This is helpful for importers like net/core/devmem that expect dmabuf sg
> IMO, udmabuf needs to detect whether importers can handle segments that
> are > PAGE_SIZE and set the entries appropriately. Please look into how the
> GPU drivers and other dmabuf exporters/importers handle this situation, so
> that we can adopt best practices to address this issue.

Importers have to handle arbitrary scatterlists, devmem is just broken
if it can't handle the output of sg_alloc_table_from_pages().

Jason

^ permalink raw reply

* Re: [PATCH v2] atm: fix use-after-free in sigd_put_skb()
From: Weiming Shi @ 2026-06-16 12:39 UTC (permalink / raw)
  To: Jakub Kicinski, Weiming Shi
  Cc: Chas Williams, David S . Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, linux-atm-general, netdev, linux-kernel, Xiang Mei
In-Reply-To: <20260612161051.2d4df09b@kernel.org>

On Sat Jun 13, 2026 at 7:10 AM CST, Jakub Kicinski wrote:
> On Wed, 10 Jun 2026 00:21:08 +0800 Weiming Shi wrote:
>> sigd_put_skb() delivers a signalling message to the daemon socket named
>> by the global @sigd pointer, ending in a call to sk_data_ready(). It
>> reads @sigd with no synchronisation, so it can race with a close of the
>> daemon socket: sigd_close() clears @sigd and the socket is then torn
>> down and freed.
>
> Hm, we intend to only retain the portions of the ATM stack which are
> still used in PPPoATM and ADSL. I don't believe the signaling stuff 
> is used there. I will post a patch to delete this code.

Thanks, that makes sense to me. 


^ permalink raw reply

* Re: [PATCH net-next 3/3] net: ti: icssm-prueth: Support duplicate HW offload feature for HSR and PRP
From: Parvathi Pudi @ 2026-06-16 12:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Parvathi Pudi, andrew+netdev, davem, edumazet, pabeni,
	danishanwar, rogerq, pmohan, afd, basharath, arnd, linux-kernel,
	netdev, linux-arm-kernel, pratheesh, j-rameshbabu,
	Vignesh Raghavendra, praneeth, srk, rogerq, m-malladi, krishna,
	mohan
In-Reply-To: <20260615135600.655e4be4@kernel.org>

Hi,

> On Thu, 11 Jun 2026 18:03:28 +0530 Parvathi Pudi wrote:
>> From: Roger Quadros <rogerq@ti.com>
>> 
>> In HSR and PRP modes each outgoing frame must be sent on both PRU slave
>> ports.
>> 
>> Previously the driver was writing the frame into each port's transmit queue
>> independently after updating the tags resulting in performing two OCMC
>> buffer copy operations.
>> 
>> Frame duplicate offloading is implemented with a common shared queue
>> between the two ports. The driver writes the frame once into OCMC RAM,
>> each port reads from the shared queue and replicates the transmission to
>> both PRU ports, synchronising between PRU ports are maintained within
>> firmware with appropriate handling.
>> 
>> For HSR the driver inspects the encapsulated ethertype in the HSR tag.
>> PTP frames (ETH_P_1588) are sent on the directed port only to avoid double
>> duplication and all other HSR frames are duplicated to both ports.
>> VLAN-tagged HSR frames are handled by advancing past the 4-byte VLAN header
>> before reading the HSR tag.
>> 
>> For PRP the driver checks the 6-byte RCT trailer for the ETH_P_PRP suffix
>> to identify redundancy-tagged frames. Frames without an RCT are sent on the
>> originating port only.
> 
> Warning: drivers/net/ethernet/ti/icssm/icssm_prueth.h:113 struct member
> 'host_recv_flag' not described in 'prueth_packet_info'
> 

We will address this in the next version.

> Please note that net-next will be closed for the next two weeks.

Also noted regarding the net-next closure. We will post the updated series
once the tree reopens.

Thanks and Regards,
Parvathi.

^ permalink raw reply

* Re: [PATCH v2] [net] net: airoha: Stop TX queues on error path in airoha_dev_open
From: Lorenzo Bianconi @ 2026-06-16 12:37 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178161146875.2165143.7400860261990016053@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

> In airoha_dev_open(), if airoha_set_vip_for_gdm_port() fails after
> netif_tx_start_all_queues() has been called, the TX queues remain
> started while the device configuration is incomplete. This leaves
> the device in an inconsistent state where packets could be
> transmitted before the VIP/IFC port configuration is complete.
> 
> Add netif_tx_stop_all_queues() call on the error path to properly
> roll back the TX queue state.
> 
> Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
> Signed-off-by: Wayen Yan <win847@gmail.com>
> ---
>  drivers/net/ethernet/airoha/airoha_eth.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 31cdb11cd7..cf9c366907 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1715,8 +1715,10 @@ static int airoha_dev_open(struct net_device *dev)
>  
>  	netif_tx_start_all_queues(dev);
>  	err = airoha_set_vip_for_gdm_port(port, true);
> -	if (err)
> +	if (err) {
> +		netif_tx_stop_all_queues(dev);

I do not think this is necessary since, if the ndo_open() callback fails,
the net_device is not marked as running.

Regards,
Lorenzo

>  		return err;
> +	}
>  
>  	if (netdev_uses_dsa(dev))
>  		airoha_fe_set(qdma->eth, REG_GDM_INGRESS_CFG(port->id),
> -- 
> 2.51.0
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH net v2 1/2] iov_iter: export iov_iter_restore
From: Stefano Garzarella @ 2026-06-16 12:35 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: netdev, Alexander Viro, Andrew Morton, Arseniy Krasnov,
	David S. Miller, Eric Dumazet, Eugenio Pérez, Jakub Kicinski,
	Jason Wang, kvm, linux-block, linux-fsdevel, linux-kernel,
	Michael S. Tsirkin, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	virtualization, Xuan Zhuo
In-Reply-To: <20260613000953.467473-2-tavip@google.com>

On Sat, Jun 13, 2026 at 12:09:52AM +0000, Octavian Purdila wrote:
>Export iov_iter_restore so that it can be used by modules.
>
>This is needed by the virtio vsock transport (which can be built as a
>module) to restore the msg_iter state when transmission fails.
>
>Signed-off-by: Octavian Purdila <tavip@google.com>
>---
> lib/iov_iter.c | 1 +
> 1 file changed, 1 insertion(+)

Acked-by: Stefano Garzarella <sgarzare@redhat.com>

>
>diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>index 243662af1af73..067e745f9ef53 100644
>--- a/lib/iov_iter.c
>+++ b/lib/iov_iter.c
>@@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
> 		i->__iov -= state->nr_segs - i->nr_segs;
> 	i->nr_segs = state->nr_segs;
> }
>+EXPORT_SYMBOL(iov_iter_restore);
>
> /*
>  * Extract a list of contiguous pages from an ITER_FOLIOQ iterator.  This does
>-- 
>2.54.0.1136.gdb2ca164c4-goog
>


^ permalink raw reply

* Re: [PATCH v2] [net] net: airoha: Fix QoS counter configuration for Tx-fwd channels
From: Lorenzo Bianconi @ 2026-06-16 12:35 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178161132384.2164449.18407700117859190327@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]

> In airoha_qdma_init_qos_stats(), the Tx-fwd counter was incorrectly
> using register index (i << 1) instead of ((i << 1) + 1). This caused
> the Tx-fwd configuration to overwrite the Tx-cpu configuration for
> each QoS channel, resulting in incorrect QoS statistics.
> 
> Fix by using the correct register index ((i << 1) + 1) for Tx-fwd
> counter configuration.
> 
> Fixes: 20bf7d07c956 ("net: airoha: Add sched ETS offload support")
> Signed-off-by: Wayen Yan <win847@gmail.com>

Is this a patch you already sent? IIRC I have acked it.

Regards,
Lorenzo
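
The overwrite described in the quoted commit message can be illustrated
with a small userspace model. This is only a sketch: the cfg[] array
stands in for the REG_CNTR_CFG() register bank, and enum cntr_cfg and
init_qos_cfg() are hypothetical names, not driver code.

```c
/* Toy register contents: which counter source a slot is set up for. */
enum cntr_cfg {
	CFG_UNSET = 0,
	CFG_TX_CPU,
	CFG_TX_FWD,
};

/* Configure two counters per channel. cfg must hold 2 * nchan entries,
 * zeroed by the caller. With buggy != 0 the Tx-fwd config reuses index
 * (i << 1), as in the pre-patch code; otherwise it uses (i << 1) + 1.
 */
void init_qos_cfg(enum cntr_cfg *cfg, unsigned int nchan, int buggy)
{
	unsigned int i;

	for (i = 0; i < nchan; i++) {
		unsigned int fwd_idx = buggy ? (i << 1) : (i << 1) + 1;

		cfg[i << 1] = CFG_TX_CPU;	/* Tx-cpu counter */
		cfg[fwd_idx] = CFG_TX_FWD;	/* Tx-fwd counter */
	}
}
```

In the buggy variant every even slot ends up configured for Tx-fwd and
every odd slot is left unconfigured; with the corrected index each
channel gets one slot of each kind.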

> ---
>  drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 31cdb11cd7..329988a840 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1256,7 +1256,7 @@ static void airoha_qdma_init_qos_stats(struct airoha_qdma *qdma)
>  			       FIELD_PREP(CNTR_CHAN_MASK, i));
>  		/* Tx-fwd transferred count */
>  		airoha_qdma_wr(qdma, REG_CNTR_VAL((i << 1) + 1), 0);
> -		airoha_qdma_wr(qdma, REG_CNTR_CFG(i << 1),
> +		airoha_qdma_wr(qdma, REG_CNTR_CFG((i << 1) + 1),
>  			       CNTR_EN_MASK | CNTR_ALL_QUEUE_EN_MASK |
>  			       CNTR_ALL_DSCP_RING_EN_MASK |
>  			       FIELD_PREP(CNTR_SRC_MASK, 1) |
> -- 
> 2.51.0
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v2] [net] net: airoha: fix foe_check_time allocation size
From: Lorenzo Bianconi @ 2026-06-16 12:34 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178161119471.2163752.14373384830691569758@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1276 bytes --]

> foe_check_time is declared as u16 pointer but was allocated with
> only ppe_num_entries bytes instead of ppe_num_entries * sizeof(u16).
> 
> When airoha_ppe_foe_verify_entry() is called with hash >= ppe_num_entries/2,
> it writes beyond the allocated buffer, causing heap buffer overflow and
> potential kernel crash.
> 
> Fixes: 6d5b601d52a2 ("net: airoha: ppe: Dynamically allocate foe_check_time array in airoha_ppe struct")
> Signed-off-by: Wayen Yan <win847@gmail.com>

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
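
The size mismatch is easy to check in userspace C. A minimal sketch,
assuming only that the element type is a 16-bit integer as the quoted
message says; both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for an array of n 16-bit timestamps. */
size_t check_time_bytes(size_t n)
{
	return n * sizeof(uint16_t);
}

/* First array index that falls outside a buffer of alloc_bytes. */
size_t first_oob_index(size_t alloc_bytes)
{
	return alloc_bytes / sizeof(uint16_t);
}
```

For an example table of 1024 entries, a 1024-byte allocation only covers
indices 0..511, which matches the "hash >= ppe_num_entries/2" condition
in the quoted message; sizing by element covers all 1024.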

> ---
>  drivers/net/ethernet/airoha/airoha_ppe.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 5c9dff6bcc..8fb8ecf909 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -1578,7 +1578,8 @@ int airoha_ppe_init(struct airoha_eth *eth)
>  			return -ENOMEM;
>  	}
>  
> -	ppe->foe_check_time = devm_kzalloc(eth->dev, ppe_num_entries,
> +	ppe->foe_check_time = devm_kzalloc(eth->dev,
> +					   ppe_num_entries * sizeof(*ppe->foe_check_time),
>  					   GFP_KERNEL);
>  	if (!ppe->foe_check_time)
>  		return -ENOMEM;
> -- 
> 2.51.0
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* [PATCH v4] flow_dissector: check device type before reading ETH_ADDRS
From: Yun Zhou @ 2026-06-16 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms
  Cc: netdev, linux-kernel, yun.zhou, qingfang.deng

__skb_flow_dissect() unconditionally reads 12 bytes from eth_hdr(skb)
when FLOW_DISSECTOR_KEY_ETH_ADDRS is requested. This assumes the skb
has a valid Ethernet header at mac_header, which is not always the case.

The problem can be triggered by:
 1. Creating a TUN device in L3 mode (IFF_TUN, hard_header_len=0)
 2. Attaching a multiq qdisc with a flower filter matching on eth_src
 3. Sending a packet through AF_PACKET

Since TUN in L3 mode has no link-layer header, mac_header points to
the L3 data area. The flow dissector reads 12 bytes of uninitialized
skb memory, which then propagates through fl_set_masked_key() and is
used as a rhashtable lookup key in __fl_lookup(), as reported by KMSAN.

Rejecting the filter in the control path (at tc filter add time) is
not feasible because TC filter blocks can be shared between arbitrary
devices -- a filter installed on an Ethernet device may later classify
packets on a headerless device through a shared block. The device
association is not fixed at filter creation time.

Fix this by gating the memcpy on dev->type == ARPHRD_ETHER, which
ensures only true Ethernet-framed packets have their addresses read.
This is more precise than the previous hard_header_len >= 12 check,
which would incorrectly pass for non-Ethernet link types like IPoIB
(ARPHRD_INFINIBAND, hard_header_len=24) and FDDI (hard_header_len=21)
whose L2 headers are not in Ethernet format. Additionally check
skb_mac_header_was_set() to guard against the pathological case where
mac_header is the unset sentinel (~0U), which would cause eth_hdr() to
return a wild pointer.

For the act_mirred redirect case (Ethernet packet redirected to a
non-Ethernet device sharing a TC block), zeroing the key is the correct
behavior: the packet is now being classified on the target device, where
Ethernet address matching is not semantically meaningful.

Note: on non-Ethernet devices, the zeroed key will match a filter
configured with all-zero MAC addresses. This is an improvement over the
previous behavior where uninitialized memory could randomly match any
filter.

Reported-by: syzbot+fa2f5b1fb06147be5e16@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa2f5b1fb06147be5e16
Fixes: 67a900cc0436 ("flow_dissector: introduce support for Ethernet addresses")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v4:
 - Use dev->type == ARPHRD_ETHER instead of hard_header_len >= 12 to
   avoid false positives on non-Ethernet link types (IPoIB, FDDI)
 - Add skb_mac_header_was_set() guard against unset mac_header sentinel
 - Document act_mirred and all-zero key edge cases in commit message

v3:
 - Replace skb_tail_pointer() - skb_mac_header() length check with
   skb->dev->hard_header_len check.

v2:
 - Adjust commit message and comment.

 net/core/flow_dissector.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2a98f5fa74eb..8aa4f9b4df81 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1173,13 +1173,21 @@ bool __skb_flow_dissect(const struct net *net,
 
 	if (dissector_uses_key(flow_dissector,
 			       FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
-		struct ethhdr *eth = eth_hdr(skb);
 		struct flow_dissector_key_eth_addrs *key_eth_addrs;
 
 		key_eth_addrs = skb_flow_dissector_target(flow_dissector,
 							  FLOW_DISSECTOR_KEY_ETH_ADDRS,
 							  target_container);
-		memcpy(key_eth_addrs, eth, sizeof(*key_eth_addrs));
+		/* TC filter blocks can be shared across devices with
+		 * different link types, so we cannot validate this
+		 * when the filter is installed -- check at dissect time.
+		 */
+		if (skb && skb->dev &&
+		    skb->dev->type == ARPHRD_ETHER &&
+		    skb_mac_header_was_set(skb))
+			memcpy(key_eth_addrs, eth_hdr(skb), sizeof(*key_eth_addrs));
+		else
+			memset(key_eth_addrs, 0, sizeof(*key_eth_addrs));
 	}
 
 	if (dissector_uses_key(flow_dissector,
-- 
2.43.0
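
For readers following the thread, the dissect-time gating described in the
commit message above can be modelled in userspace roughly as follows. This
is only a sketch, not the kernel code: every mock_* name is hypothetical,
the MOCK_ARPHRD_* constants are assumed to mirror the values in
include/uapi/linux/if_arp.h, and mac_header_set stands in for
skb_mac_header_was_set().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror the kernel's ARPHRD_* values (hypothetical names). */
#define MOCK_ARPHRD_ETHER      1
#define MOCK_ARPHRD_INFINIBAND 32
#define MOCK_ARPHRD_NONE       0xFFFE	/* assumed type of TUN in L3 mode */

#define MOCK_ETH_ALEN 6

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct mock_dev {
	unsigned short type;
	unsigned short hard_header_len;
};

struct mock_skb {
	const struct mock_dev *dev;
	bool mac_header_set;		/* models skb_mac_header_was_set() */
	const uint8_t *mac_header;	/* models skb_mac_header() */
};

/* Same layout idea as the dissector's Ethernet address key: dst, then src. */
struct mock_eth_addrs {
	uint8_t dst[MOCK_ETH_ALEN];
	uint8_t src[MOCK_ETH_ALEN];
};

/* The v3 heuristic: any device with at least 12 bytes of L2 header passes. */
static bool mock_old_check(const struct mock_skb *skb)
{
	return skb && skb->dev && skb->dev->hard_header_len >= 12;
}

/* The v4 gating: copy only for Ethernet devices with a valid mac_header. */
static bool mock_new_check(const struct mock_skb *skb)
{
	return skb && skb->dev &&
	       skb->dev->type == MOCK_ARPHRD_ETHER &&
	       skb->mac_header_set;
}

/* Fill the key from the frame when the gate passes, otherwise zero it. */
static void mock_dissect_eth_addrs(const struct mock_skb *skb,
				   struct mock_eth_addrs *key)
{
	assert(key != NULL);

	if (mock_new_check(skb))
		memcpy(key, skb->mac_header, sizeof(*key));
	else
		memset(key, 0, sizeof(*key));
}
```

With a mock IPoIB device (type MOCK_ARPHRD_INFINIBAND, hard_header_len=24),
mock_old_check() passes while mock_new_check() does not, so the key is
zeroed instead of being filled from a non-Ethernet header.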



* Re: [PATCH net] tipc: fix use-after-free of discoverer in tipc_disc_rcv()
From: Weiming Shi @ 2026-06-16 12:28 UTC (permalink / raw)
  To: Tung Quang Nguyen, Weiming Shi
  Cc: Simon Horman, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, Xiang Mei, Jon Maloy,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
In-Reply-To: <GV1P189MB1988A1CFCAA9214B6F009315C6182@GV1P189MB1988.EURP189.PROD.OUTLOOK.COM>

On Fri Jun 12, 2026 at 4:53 PM CST, Tung Quang Nguyen wrote:
>>Subject: [PATCH net] tipc: fix use-after-free of discoverer in tipc_disc_rcv()
>>
>>bearer_disable() frees b->disc with tipc_disc_delete()'s plain kfree(), but
>>tipc_disc_rcv() still dereferences b->disc in RX softirq under
>>rcu_read_lock() (tipc_udp_recv -> tipc_rcv -> tipc_disc_rcv).
>>
>>L2 bearers are safe thanks to the synchronize_net() in
>>tipc_disable_l2_media(), but the UDP bearer defers that call to the
>>cleanup_bearer() workqueue, so the discoverer is freed with no grace
>>period:
>>
>> BUG: KASAN: slab-use-after-free in tipc_disc_rcv (net/tipc/discover.c:149)
>> Read of size 8 at addr ffff88802348b728 by task poc_tipc/184
>>
>>  <IRQ>
>>  tipc_disc_rcv (net/tipc/discover.c:149)
>>  tipc_rcv (net/tipc/node.c:2126)
>>  tipc_udp_recv (net/tipc/udp_media.c:391)
>>  udp_rcv (net/ipv4/udp.c:2643)
>>  ip_local_deliver_finish (net/ipv4/ip_input.c:241)
>>  </IRQ>
>>
>> Freed by task 181:
>>  kfree (mm/slub.c:6565)
>>  bearer_disable (net/tipc/bearer.c:418)
>>  tipc_nl_bearer_disable (net/tipc/bearer.c:1001)
>>
>>The bearer is freed with kfree_rcu(); free the discoverer the same way.
>>Add an rcu_head to struct tipc_discoverer and free it and its skb from an RCU
>>callback.
>>
>>Reachable from an unprivileged user namespace: the TIPCv2 genl family is
>>netnsok and its bearer commands have no GENL_ADMIN_PERM. Needs
>>CONFIG_TIPC and CONFIG_TIPC_MEDIA_UDP.
>>
>>Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
>>Reported-by: Xiang Mei <xmei5@asu.edu>
>>Assisted-by: Claude:claude-opus-4-8
>>Signed-off-by: Weiming Shi <bestswngs@gmail.com>
>>---
>> net/tipc/discover.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>>diff --git a/net/tipc/discover.c b/net/tipc/discover.c
>>index 3e54d2df5683a..34dbe5ad10e09 100644
>>--- a/net/tipc/discover.c
>>+++ b/net/tipc/discover.c
>>@@ -58,6 +58,7 @@
>>  * @skb: request message to be (repeatedly) sent
>>  * @timer: timer governing period between requests
>>  * @timer_intv: current interval between requests (in ms)
>>+ * @rcu: RCU head for deferred freeing
>>  */
>> struct tipc_discoverer {
>> 	u32 bearer_id;
>>@@ -69,6 +70,7 @@ struct tipc_discoverer {
>> 	struct sk_buff *skb;
>> 	struct timer_list timer;
>> 	unsigned long timer_intv;
>>+	struct rcu_head rcu;
>> };
>>
>> /**
>>@@ -382,6 +384,14 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
>> 	return 0;
>> }
>>
>>+static void tipc_disc_free_rcu(struct rcu_head *rp)
>>+{
>>+	struct tipc_discoverer *d = container_of(rp, struct tipc_discoverer, rcu);
>
> This line is long (over 80 columns). Please break it into 2 lines (refer to linux/Documentation/process/coding-style.rst).
>
>>+
>>+	kfree_skb(d->skb);
>>+	kfree(d);
>>+}
>>+
>> /**
>>  * tipc_disc_delete - destroy object sending periodic link setup requests
>>  * @d: ptr to link dest structure
>>@@ -389,8 +399,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
>> void tipc_disc_delete(struct tipc_discoverer *d)
>> {
>> 	timer_shutdown_sync(&d->timer);
>>-	kfree_skb(d->skb);
>>-	kfree(d);
>>+	call_rcu(&d->rcu, tipc_disc_free_rcu);
>> }
>>
>> /**
>>--
>>2.43.0
>>

Hi,

I’m sorry for taking so long to respond. v2 has already been sent.
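
For anyone reading the thread without the kernel tree at hand, the
deferred-free pattern in the quoted patch can be sketched in userspace
roughly as below. It is a single-threaded model only, not real RCU: all
mock_* names are hypothetical, mock_call_rcu() merely queues the callback,
and mock_grace_period_end() stands in for the point where no reader can
still hold the pointer. The container_of() assignment is also split from
the declaration, which is one possible way to keep that line under 80
columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of() (hypothetical name). */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal model of struct rcu_head: a singly linked list of callbacks. */
struct mock_rcu_head {
	struct mock_rcu_head *next;
	void (*func)(struct mock_rcu_head *head);
};

static struct mock_rcu_head *mock_pending;
static int mock_freed_count;

/* Queue the callback instead of freeing right away (models call_rcu()). */
static void mock_call_rcu(struct mock_rcu_head *head,
			  void (*func)(struct mock_rcu_head *head))
{
	assert(head != NULL && func != NULL);

	head->func = func;
	head->next = mock_pending;
	mock_pending = head;
}

/* Run all queued callbacks; returns how many were invoked. */
static int mock_grace_period_end(void)
{
	int n = 0;

	while (mock_pending) {
		struct mock_rcu_head *head = mock_pending;

		/* Unlink before the callback frees the enclosing object. */
		mock_pending = head->next;
		head->func(head);
		n++;
	}
	return n;
}

/* Hypothetical stand-in for struct tipc_discoverer. */
struct mock_discoverer {
	int bearer_id;
	char *skb;			/* stands in for struct sk_buff *skb */
	struct mock_rcu_head rcu;
};

/* Deferred destructor: recover the object from its embedded rcu member. */
static void mock_disc_free_rcu(struct mock_rcu_head *rp)
{
	struct mock_discoverer *d;

	d = mock_container_of(rp, struct mock_discoverer, rcu);
	free(d->skb);
	free(d);
	mock_freed_count++;
}

/* Models tipc_disc_delete() after the patch: defer instead of kfree(). */
static void mock_disc_delete(struct mock_discoverer *d)
{
	mock_call_rcu(&d->rcu, mock_disc_free_rcu);
}
```

After mock_disc_delete() the object is still valid for a reader that
already holds the pointer; it is released only once
mock_grace_period_end() runs the queued callback.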

