Netdev List
* [PATCH net v3 3/4] nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nfc_llcp_recv_snl() has four problems when handling a hostile peer:

 1. nfc_llcp_dsap()/nfc_llcp_ssap() dereference skb->data[0..1] without
    verifying skb->len; a 0- or 1-byte frame leads to an OOB read.
    Additionally tlv_len = skb->len - LLCP_HEADER_SIZE wraps when
    skb->len < 2, causing the following loop to run far past the
    buffer.

 2. The per-iteration loop guard `offset < tlv_len` only proves one
    byte is available, but the body reads tlv[0] and tlv[1].

 3. The peer-supplied `length` field is used to advance `tlv` without
    being checked against the remaining array space.

 4. The SDREQ handler previously only required length >= 1 but reads
    both tid (tlv[2]) and the first byte of service_name (tlv[3], via
    the pr_debug("%.16s") print and the service_name_len = length - 1
    string usage), so length >= 2 is required.

Fix: reject frames smaller than LLCP_HEADER_SIZE up front; add TLV
header and TLV value guards at the top of each iteration; bump the
SDREQ minimum length to 2.

Reachable from any NFC peer within ~4 cm once an LLCP link is up.

Fixes: 7a06f0ee2823 ("NFC: llcp: Service Name Lookup implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
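The three guards described in the commit message can be sketched in
userspace as follows. This is a minimal illustration under assumptions,
not the kernel code: count_valid_tlvs() and HDR_SIZE are hypothetical
names, and HDR_SIZE only stands in for LLCP_HEADER_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE 2	/* hypothetical stand-in for LLCP_HEADER_SIZE */

/*
 * Hypothetical helper: walk the TLVs that follow a 2-byte header and
 * return how many are fully contained in the buffer. Returns -1 if the
 * frame is too short to hold the header at all.
 */
int count_valid_tlvs(const uint8_t *buf, size_t len)
{
	const uint8_t *tlv;
	size_t tlv_len, offset = 0;
	int count = 0;

	if (len < HDR_SIZE)		/* guard 1: header is present */
		return -1;

	tlv = buf + HDR_SIZE;
	tlv_len = len - HDR_SIZE;	/* cannot wrap after guard 1 */

	while (offset < tlv_len) {
		uint8_t length;

		if (tlv_len - offset < 2)	/* guard 2: type and length bytes */
			break;
		length = tlv[1];
		if (tlv_len - offset - 2 < length)	/* guard 3: value bytes */
			break;

		count++;
		offset += (size_t)length + 2;
		tlv += (size_t)length + 2;
	}

	return count;
}
```

Writing the checks as subtractions from the remaining space avoids
overflow in the comparison itself, since guard 1 and guard 2 ensure each
subtraction stays non-negative.
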
 net/nfc/llcp_core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index 366d75663..efe228f96 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -1282,6 +1282,11 @@ static void nfc_llcp_recv_snl(struct nfc_llcp_local *local,
 	size_t sdres_tlvs_len;
 	HLIST_HEAD(nl_sdres_list);
 
+	if (skb->len < LLCP_HEADER_SIZE) {
+		pr_err("Malformed SNL PDU\n");
+		return;
+	}
+
 	dsap = nfc_llcp_dsap(skb);
 	ssap = nfc_llcp_ssap(skb);
 
@@ -1298,11 +1303,17 @@ static void nfc_llcp_recv_snl(struct nfc_llcp_local *local,
 	sdres_tlvs_len = 0;
 
 	while (offset < tlv_len) {
+		if (tlv_len - offset < 2)
+			break;
 		type = tlv[0];
 		length = tlv[1];
+		if (tlv_len - offset - 2 < length)
+			break;
 
 		switch (type) {
 		case LLCP_TLV_SDREQ:
+			if (length < 2)
+				break;
 			tid = tlv[2];
 			service_name = (char *) &tlv[3];
 			service_name_len = length - 1;
-- 
2.51.0



* [PATCH net v3 4/4] nfc: llcp: fix OOB read of DM reason byte in nfc_llcp_recv_dm
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nfc_llcp_recv_dm() reads skb->data[2] (the DM reason byte) without
first verifying that skb->len is at least LLCP_HEADER_SIZE + 1.  A DM
PDU carrying only the 2-byte LLCP header from a rogue peer therefore
triggers a 1-byte OOB read.

Add the minimum-length guard at function entry, matching the pattern
used by nfc_llcp_recv_snl() and nfc_llcp_recv_agf().

Reachable from any NFC peer within ~4 cm once an LLCP link is up.

Fixes: d646960f7986 ("NFC: Add LLCP sockets")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/llcp_core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index efe228f96..6baf2fc6b 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -1237,6 +1237,11 @@ static void nfc_llcp_recv_dm(struct nfc_llcp_local *local,
 	struct sock *sk;
 	u8 dsap, ssap, reason;
 
+	if (skb->len < LLCP_HEADER_SIZE + 1) {
+		pr_err("Malformed DM PDU\n");
+		return;
+	}
+
 	dsap = nfc_llcp_dsap(skb);
 	ssap = nfc_llcp_ssap(skb);
 	reason = skb->data[2];
-- 
2.51.0



* Re: [PATCH net-next v9 03/10] net: phylink: Register a phy_port for MAC-driven SFP busses
From: Andrew Lunn @ 2026-04-14 23:38 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-4-maxime.chevallier@bootlin.com>

> This phy_port represents the SFP cage itself, and not the module

> +static int phylink_create_sfp_port(struct phylink *pl)

I'm thinking about naming here. If this represents the cage, why not
call this phylink_create_sfp_cage_port()? I assume at some point there
is going to be something for the module, and it seems like the naming
is going to be confusing.

> +{
> +	struct phy_port *port;
> +	int ret = 0;
> +
> +	if (!pl->netdev || !pl->sfp_bus)
> +		return 0;
> +
> +	port = phy_port_alloc();
> +	if (!port)
> +		return -ENOMEM;
> +
> +	port->is_sfp = true;
> +	port->is_mii = true;
> +	port->active = true;

If this is a cage, not a module, does is_sfp = true make sense?

And what does an active cage mean?

	Andrew


* Re: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
From: kernel test robot @ 2026-04-14 23:40 UTC (permalink / raw)
  To: Dudu Lu, netdev; +Cc: oe-kbuild-all, stefanha, sgarzare, mst, jasowang, Dudu Lu
In-Reply-To: <20260413085243.73200-1-phx0fer@gmail.com>

Hi Dudu,

kernel test robot noticed the following build errors:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on net/main net-next/main linus/master horms-ipvs/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dudu-Lu/vsock-virtio-fix-accept-queue-count-leak-on-transport-mismatch-in-recv_listen/20260414-233232
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20260413085243.73200-1-phx0fer%40gmail.com
patch subject: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
config: sparc-randconfig-001-20260415 (https://download.01.org/0day-ci/archive/20260415/202604150747.6LyaJckM-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604150747.6LyaJckM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604150747.6LyaJckM-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/vmw_vsock/virtio_transport_common.c:1:2: warning: data definition has no type or storage class
     sk_acceptq_added(sk);
     ^~~~~~~~~~~~~~~~
   net/vmw_vsock/virtio_transport_common.c:1:2: error: type defaults to 'int' in declaration of 'sk_acceptq_added' [-Werror=implicit-int]
   net/vmw_vsock/virtio_transport_common.c:1:2: warning: parameter names (without types) in function declaration
   In file included from include/linux/virtio_vsock.h:7,
                    from net/vmw_vsock/virtio_transport_common.c:15:
>> include/net/sock.h:1080:20: error: conflicting types for 'sk_acceptq_added'
    static inline void sk_acceptq_added(struct sock *sk)
                       ^~~~~~~~~~~~~~~~
   net/vmw_vsock/virtio_transport_common.c:1:2: note: previous declaration of 'sk_acceptq_added' was here
     sk_acceptq_added(sk);
     ^~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/sk_acceptq_added +1080 include/net/sock.h

^1da177e4c3f415 Linus Torvalds 2005-04-16  1079  
^1da177e4c3f415 Linus Torvalds 2005-04-16 @1080  static inline void sk_acceptq_added(struct sock *sk)
^1da177e4c3f415 Linus Torvalds 2005-04-16  1081  {
288efe8606b62d0 Eric Dumazet   2019-11-05  1082  	WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog + 1);
^1da177e4c3f415 Linus Torvalds 2005-04-16  1083  }
^1da177e4c3f415 Linus Torvalds 2005-04-16  1084  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* [PATCH net] hv_sock: Report EOF instead of -EIO for FIN
From: Dexuan Cui @ 2026-04-14 23:43 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, sgarzare, davem, edumazet,
	kuba, pabeni, horms, niuxuewei.nxw, linux-hyperv, virtualization,
	netdev, linux-kernel
  Cc: stable, Ben Hillis, Mitchell Levy

Commit f0c5827d07cb unluckily causes a regression for the FIN packet,
and the final read syscall gets an error rather than 0.

Ideally, we would want to fix hvs_channel_readable_payload() so that it
could return 0 in the FIN scenario, but it's not good for the hv_sock
driver to use the VMBus ringbuffer's cached priv_read_index, which is
internal data in the VMBus driver.

Fix the regression in hv_sock by returning 0 rather than -EIO.

Fixes: f0c5827d07cb ("hv_sock: Return the readable bytes in hvs_stream_has_data()")
Cc: stable@vger.kernel.org
Reported-by: Ben Hillis <Ben.Hillis@microsoft.com>
Reported-by: Mitchell Levy <levymitchell0@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
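The user-visible behaviour this patch restores is the usual stream-socket
convention: once the peer has closed its sending side, the reader drains
any pending data and the final read() returns 0. A minimal userspace
sketch of that convention follows. It is an illustration only: it uses an
AF_UNIX socketpair as a stand-in and does not exercise hv_sock or VMBus,
and final_read_after_fin() is a hypothetical helper name.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `len` bytes over a socketpair, shut down the
 * writer's sending side, then drain the reader. Returns the value of the
 * final read(): 0 on a clean EOF, -1 on error. *total receives the number
 * of payload bytes read before that final read().
 */
ssize_t final_read_after_fin(const char *msg, size_t len, size_t *total)
{
	int sv[2];
	char buf[64];
	ssize_t r = -1;

	*total = 0;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Small payloads fit in the socket buffer, so this does not block. */
	if (len && write(sv[0], msg, len) != (ssize_t)len)
		goto out;

	/* Analogous to the peer sending FIN: no more data will follow. */
	if (shutdown(sv[0], SHUT_WR) < 0)
		goto out;

	/* Drain the payload; the loop ends on EOF (0) or on error (-1). */
	while ((r = read(sv[1], buf, sizeof(buf))) > 0)
		*total += (size_t)r;
out:
	close(sv[0]);
	close(sv[1]);
	return r;
}
```

An application loop written as `while ((r = read(...)) > 0)` treats 0 as
a normal end of stream, which is why an error on the final read is a
visible regression for callers.
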
 net/vmw_vsock/hyperv_transport.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 069386a74557..63d3549125be 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -703,8 +703,22 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
 	switch (hvs_channel_readable_payload(hvs->chan)) {
 	case 1:
 		need_refill = !hvs->recv_desc;
-		if (!need_refill)
-			return -EIO;
+		if (!need_refill) {
+			/* Here hvs->recv_data_len is 0, so hvs->recv_desc must
+			 * be NULL unless it points to the 0-byte-payload FIN
+			 * packet: see hvs_update_recv_data().
+			 *
+			 * Here all the payload has been dequeued, but
+			 * hvs_channel_readable_payload() still returns 1,
+			 * because the VMBus ringbuffer's read_index is not
+			 * updated for the FIN packet: hvs_stream_dequeue() ->
+			 * hv_pkt_iter_next() updates the cached priv_read_index
+			 * but has no opportunity to update the read_index in
+			 * hv_pkt_iter_close() as hvs_stream_has_data() returns
+			 * 0 for the FIN packet, so it won't get dequeued.
+			 */
+			return 0;
+		}
 
 		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
 		if (!hvs->recv_desc)
-- 
2.49.0



* Re: [PATCH net-next v9 04/10] net: phy: Create SFP phy_port before registering upstream
From: Andrew Lunn @ 2026-04-14 23:46 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-5-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:48PM +0200, Maxime Chevallier wrote:
> When dealing with PHY-driven SFP, we create a phy_port representing the
> SFP bus when we know we have such a bus.

I'm missing the big picture here.

Do we have three different things represented in the topology:

SFP bus-> SFP cage-> SFP module

	Andrew


* Re: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
From: kernel test robot @ 2026-04-15  0:04 UTC (permalink / raw)
  To: Dudu Lu, netdev; +Cc: oe-kbuild-all, stefanha, sgarzare, mst, jasowang, Dudu Lu
In-Reply-To: <20260413085243.73200-1-phx0fer@gmail.com>

Hi Dudu,

kernel test robot noticed the following build errors:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on net/main net-next/main linus/master horms-ipvs/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dudu-Lu/vsock-virtio-fix-accept-queue-count-leak-on-transport-mismatch-in-recv_listen/20260414-233232
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20260413085243.73200-1-phx0fer%40gmail.com
patch subject: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
config: arc-randconfig-001-20260415 (https://download.01.org/0day-ci/archive/20260415/202604150741.iQBI3cGE-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 13.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604150741.iQBI3cGE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604150741.iQBI3cGE-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> net/vmw_vsock/virtio_transport_common.c:1:9: warning: data definition has no type or storage class
       1 |         sk_acceptq_added(sk);
         |         ^~~~~~~~~~~~~~~~
>> net/vmw_vsock/virtio_transport_common.c:1:9: error: type defaults to 'int' in declaration of 'sk_acceptq_added' [-Werror=implicit-int]
>> net/vmw_vsock/virtio_transport_common.c:1:9: warning: parameter names (without types) in function declaration
   In file included from include/linux/virtio_vsock.h:7,
                    from net/vmw_vsock/virtio_transport_common.c:15:
>> include/net/sock.h:1080:20: error: conflicting types for 'sk_acceptq_added'; have 'void(struct sock *)'
    1080 | static inline void sk_acceptq_added(struct sock *sk)
         |                    ^~~~~~~~~~~~~~~~
   net/vmw_vsock/virtio_transport_common.c:1:9: note: previous declaration of 'sk_acceptq_added' with type 'int()'
       1 |         sk_acceptq_added(sk);
         |         ^~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +1 net/vmw_vsock/virtio_transport_common.c

   > 1		sk_acceptq_added(sk);
     2	// SPDX-License-Identifier: GPL-2.0-only
     3	/*
     4	 * common code for virtio vsock
     5	 *
     6	 * Copyright (C) 2013-2015 Red Hat, Inc.
     7	 * Author: Asias He <asias@redhat.com>
     8	 *         Stefan Hajnoczi <stefanha@redhat.com>
     9	 */
    10	#include <linux/spinlock.h>
    11	#include <linux/module.h>
    12	#include <linux/sched/signal.h>
    13	#include <linux/ctype.h>
    14	#include <linux/list.h>
    15	#include <linux/virtio_vsock.h>
    16	#include <uapi/linux/vsockmon.h>
    17	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: DMA issues with the SKGE drivers
From: Andrew Lunn @ 2026-04-15  0:12 UTC (permalink / raw)
  To: Benoît Dufour; +Cc: netdev
In-Reply-To: <9df653d6-d7f8-4b36-87de-65daf28635dd@mail.com>

On Tue, Apr 14, 2026 at 07:23:17PM -0400, Benoît Dufour wrote:
> In 2024, I reported a bug about the SKGE driver, you can see it here:
> https://bugzilla.kernel.org/show_bug.cgi?id=219270
> 
> Basically, the problem is that the Marvell 88E8001 on my ASUS A8V motherboard
> can only work with 32bit DMA, and if trying to use 64bit DMA, the NIC won't
> work at all and after some time, the operating system will become completely
> unresponsive (the on-screen tty stops refreshing, and keyboard and mouse
> input stop working too).
> 
> The fix is quite easy:
> 
> At the very end of the SKGE driver source code, the ASUS A8V motherboard (as
> well as many other boards like the ASUS A8V Deluxe) should be added to the list
> of 32bit DMA boards:
> https://github.com/torvalds/linux/blob/508fed6795411f5ab277fd1edc0d7adca4946f23
> /drivers/net/ethernet/marvell/skge.c#L4150

Hi Benoît 

Could you submit a patch adding the needed entry for your board?

	Andrew




* Re: DMA issues with the SKGE drivers
From: Benoît Dufour @ 2026-04-15  0:27 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <c3130329-a7ea-406c-9ac1-2fa5d9d3a8bc@lunn.ch>

Yes, I guess I'd be able to, but only after testing it.
I'm currently waiting for a new cooler that would support the 110 W TDP of
my Opteron 180 X2.
The previous coolers I tested were definitely not good enough for cooling
that CPU.
Just running MemTest86+ for too long made the CPU overheat.
Even the Arctic Cooler Freezer 7 CO wasn't good enough.

On 14/04/2026 at 20:12, Andrew Lunn wrote:
> On Tue, Apr 14, 2026 at 07:23:17PM -0400, Benoît Dufour wrote:
>> In 2024, I reported a bug about the SKGE driver, you can see it here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219270
>>
>> Basically, the problem is that the Marvell 88E8001 on my ASUS A8V motherboard
>> can only work with 32bit DMA, and if trying to use 64bit DMA, the NIC won't
>> work at all and after some time, the operating system will become completely
>> unresponsive (the on-screen tty stops refreshing, and keyboard and mouse
>> input stop working too).
>>
>> The fix is quite easy:
>>
>> At the very end of the SKGE driver source code, the ASUS A8V motherboard (as
>> well as many other boards like the ASUS A8V Deluxe) should be added to the list
>> of 32bit DMA boards:
>> https://github.com/torvalds/linux/blob/508fed6795411f5ab277fd1edc0d7adca4946f23
>> /drivers/net/ethernet/marvell/skge.c#L4150
> Hi Benoît
>
> Could you submit a patch adding the needed entry for your board?
>
> 	Andrew
>
>
-- 
Benoît Dufour

Unfortunately still a student in Computer Science



* [PATCH net v1] net/mlx5: Fix HCA caps leak on notifier init failure
From: Prathamesh Deshpande @ 2026-04-15  0:49 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Carolina Jubran
  Cc: Cosmin Ratiu, Tariq Toukan, Jakub Kicinski, netdev, linux-rdma,
	linux-kernel, Prathamesh Deshpande

mlx5_mdev_init() allocates HCA caps via mlx5_hca_caps_alloc() before
calling mlx5_notifiers_init(). If notifier initialization fails, the
error path jumps to err_hca_caps and skips mlx5_hca_caps_free(), leaking
allocated caps.

Add a dedicated unwind label for notifier-init failure that frees HCA
caps before continuing the existing cleanup sequence.

Fixes: b6b03097f982 ("net/mlx5: Initialize events outside devlink lock")
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
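The unwind ordering this patch relies on can be sketched in plain C. This
is an illustration under assumptions, not the mlx5 code: struct dev_state,
dev_init() and the fail_notifiers flag are hypothetical, and malloc()
stands in for the adev and HCA caps allocations. As in the driver, each
label is named after the step that failed and falls through to undo
everything set up before it.

```c
#include <errno.h>
#include <stdlib.h>

struct dev_state {		/* hypothetical stand-in for driver state */
	void *adev;
	void *caps;
};

int dev_init(struct dev_state *d, int fail_notifiers)
{
	int err;

	d->adev = malloc(16);
	if (!d->adev)
		return -ENOMEM;

	d->caps = malloc(16);
	if (!d->caps) {
		err = -ENOMEM;
		goto err_caps;		/* caps step failed: undo adev only */
	}

	/* Stand-in for the notifier init step, forced to fail on request. */
	err = fail_notifiers ? -EIO : 0;
	if (err)
		goto err_notifiers;	/* caps exist by now: free them too */

	return 0;

err_notifiers:
	free(d->caps);
	d->caps = NULL;
err_caps:
	free(d->adev);
	d->adev = NULL;
	return err;
}
```

Jumping to err_caps from the notifier failure would skip free(d->caps),
which is the shape of the leak described above.
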
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 3f73d9b1115d..fab80c79ff07 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1907,7 +1907,7 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
 
 	err = mlx5_notifiers_init(dev);
 	if (err)
-		goto err_hca_caps;
+		goto err_notifiers_init;
 
 	/* The conjunction of sw_vhca_id with sw_owner_id will be a global
 	 * unique id per function which uses mlx5_core.
@@ -1923,6 +1923,8 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
 
 	return 0;
 
+err_notifiers_init:
+	mlx5_hca_caps_free(dev);
 err_hca_caps:
 	mlx5_adev_cleanup(dev);
 err_adev_init:
-- 
2.43.0



* Re: [PATCH v4] net/mlx5: Fix OOB access and stack information leak in PTP event handling
From: Prathamesh Deshpande @ 2026-04-15  1:05 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Carolina Jubran, Saeed Mahameed, Richard Cochran, Tariq Toukan,
	Mark Bloch, netdev, linux-rdma, linux-kernel
In-Reply-To: <20260413144610.GJ21470@unreal>

On Mon, Apr 13, 2026 at 05:46:10PM +0300, Leon Romanovsky wrote:
> On Sun, Apr 12, 2026 at 01:04:10AM +0100, Prathamesh Deshpande wrote:
> > In mlx5_pps_event(), several critical issues were identified:
> > 
> > 1. The 'pin' index from the hardware event was used without bounds
> >    checking to index 'pin_config' and 'pps_info->start'. Check against
> >    MAX_PIN_NUM to prevent out-of-bounds access.
> 
> You were told more than once that this is impossible.
> 
> <...>
> 
> > +	if (WARN_ON_ONCE(pin >= MAX_PIN_NUM))
> > +		return NOTIFY_OK;
> 
> Let's not add useless checks in fast path.

Hi Leon,

Thanks for the feedback. I've addressed this in v5 by dropping the 
redundant pin bounds and pin_config checks to keep the fast path clean, 
focusing strictly on the stack leak and NULL clock guard fixes.

Thanks,
Prathamesh


* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-15  1:19 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <ad5LlXzeQ8j14Mjg@shell.armlinux.org.uk>

Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
survives iperf3 -c -R to the imx6.

Dumping the registers and comparing, and then forcing the RQS and TQS
values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
*256 = 36864 bytes) respectively seems to solve the problem. Under
net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
all four of the MTL receive operation mode registers, but only the
first MTL transmit operation mode register. However, DMA channels 1-3
aren't initialised.

net-next derives them from:

        unsigned int tqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_tx_chan_op_mode() and

        unsigned int rqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_rx_chan_op_mode().

Now, according to the DMA capabilities:

        Number of Additional RX channel: 4
        Number of Additional TX channel: 4
        Number of Additional RX queues: 4
        Number of Additional TX queues: 4
        TX Fifo Size: 65536
        RX Fifo Size: 65536

However:

# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:             4
TX:             4
Other:          0
Combined:       0
Current hardware settings:
RX:             1
TX:             1
Other:          0
Combined:       0

So, we end up allocating the entire 64K of the tx and rx FIFO to one
queue in net-next.
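
The field arithmetic used above can be checked with a small sketch. The
helper names qs_from_bytes() and bytes_from_qs() are hypothetical; the
formulas are the ones quoted in this mail (field = fifosz / 256 - 1, and
size = (field + 1) * 256 bytes).

```c
/* Field value programmed for a queue of `bytes` bytes: bytes / 256 - 1. */
unsigned int qs_from_bytes(unsigned int bytes)
{
	return bytes / 256 - 1;
}

/* Queue size in bytes described by a field value: (qs + 1) * 256. */
unsigned int bytes_from_qs(unsigned int qs)
{
	return (qs + 1) * 256;
}
```

With these, qs_from_bytes(65536) gives 0xff (the net-next value), while
bytes_from_qs(0x23) gives 9216 and bytes_from_qs(0x8f) gives 36864, the
latter being exactly four times the former.
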

Looking back at 5.10, I don't see any code that would account for these
values being programmed for TQS and RQS, it looks like the calculations
are basically the same as we have today.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* [ANN] netdev development stats for 7.1
From: Jakub Kicinski @ 2026-04-15  1:26 UTC (permalink / raw)
  To: netdev

Hi!

Intro
-----

As is tradition here are the development statistics based on mailing
list traffic on netdev@vger.

These stats are somewhat like LWN stats: https://lwn.net/Articles/1004998/
but more focused on mailing list participation. And by participation
we mean reviewing code more than producing patches.

In particular "review score" tries to capture the balance between
reviewing other people's code vs posting patches. It's roughly
number of patches reviewed minus number of patches posted. 
Those who post more than they review will have a negative score.

Previous 3 reports:
 - for 6.18: https://lore.kernel.org/20251002171032.75263b18@kernel.org 
 - for 6.19: https://lore.kernel.org/20251202175548.6b5eb80e@kernel.org
 - for 7.0:  https://lore.kernel.org/20260212124208.187e53ae@kernel.org

General stats
-------------

This has been subjectively a pretty crazy release. Last week especially.
It's bleakly reassuring to see that the numbers confirm how we feel.

Let us use 6.18 as a point of reference, since the last release of the year
is usually the biggest one. 7.1 had the same linux-next size as 6.18.
The core networking maintainers committed slightly more changes than 
in the 6.18 cycle (1531 / 24 a day / +1.6%). The number of messages 
on the list was dramatically higher (318 msg a day / +21.0%), and so
was the number of people we've interacted with (874 / +12.0%).
The number of people may be slightly under-counted, we noticed that
some authors of semi-automated fixes share an email address(!?)

The tenure histograms confirm that we are dealing with a lot of
newcomers:

  Time since poster's first commit in 6.18
  no commit |  76 | **************************************************
   0- 3mo   |  33 | *********************
   3- 6mo   |  18 | ***********
  6mo-1yr   |  30 | *******************

  Time since poster's first commit in 7.1
  no commit | 107 | **************************************************
   0- 3mo   |  61 | ****************************
   3- 6mo   |  15 | *******
  6mo-1yr   |  29 | *************

In other words, the number of authors increased by 81, and the number of
people with less than 3mo since their first commit increased by 59. This
is not surprising, but newcomers require a lot more hand-holding. And
something tells me the churn of newcomers will only go up.

The review coverage continues to drop, and is now the lowest recorded
(42.9% of changes being reviewed by someone from a different company
than the author). At the same time patches are reposted more often,
with average number of revisions going up by 10%.

AI reviews
----------

In the previous cycle we introduced a netdev AI review bot which
was using Chris Mason's review prompts. This cycle saw the introduction
of Sashiko, which _seems_ much better at spotting bugs but most of the
bugs it finds are unrelated to the submission. Our bot intentionally
tried to exclude complaining about existing problems. Sashiko also
"asks questions" about potential issues which it is unsure are in
fact a problem. This may be fine during review in development, but
upstream it means that maintainers are now spending around 50% of their
time trying to disprove AI reviews. Last but not least, because
the reviews are public immediately, we have people reacting to them,
spamming the list and often incorporating incorrect feedback.

None of this is meant as a criticism of the tools. We are lucky
to have in the community people willing to invest their time
to build such tools, and companies willing to sponsor the LLM tokens.
That said, the combination of the extra work AI tools put on maintainers
and the ease with which newcomers can produce plausible but incorrect
code is pushing us beyond our limits. Especially when the plausible-looking
code is "fixing bugs" in 20-year-old code which none of the current
maintainers have any interest in or, frankly, sense of responsibility for.

One more thing to note before I end this rant. The LLMs are expensive
and/or capacity constrained. While a lot of the issues could be
addressed by LLMs doing more research, the current prompts already
eat our entire budgets. Real engineering work is required to make
the LLMs more efficient by building tools and MCP endpoints around
the LLMs. It is hard to find time to do this work when we average
150 patches sent to the list on any working day.

I'd like us to gather up during the next bi-weekly call slot and
discuss some ideas on how we can survive the changes.

Testing
-------

Percentage of changes to selftests stubbornly remains at around 10% of
all commits. Here are the top contributors:

Contributions to selftests:
   1 [ 34] Jakub Kicinski
   2 [ 10] Ioana Ciornei
   3 [  7] Aleksei Oladko
   4 [  7] Simon Baatz
   5 [  6] Jiayuan Chen
   6 [  6] Dimitri Daskalakis
   7 [  6] Bobby Eshleman
   8 [  5] David Wei
   9 [  5] Allison Henderson
  10 [  4] Maciej Fijalkowski

Good news on the HW testing side, we now have machines with 4 NICs in
our labs (all the 25G+ NICs our supplier offered ;)) Broadcom BCM57508,
nVidia CX7, Intel X710, Intel E830. We have caught a number of issues
with them already.

Matrix of the tests vs NICs: https://netdev.bots.linux.dev/devices.html

Developer rankings
------------------

Top reviewers (cs):                  Top reviewers (msg):                
   1 (   ) [48] Jakub Kicinski          1 (   ) [112] Jakub Kicinski     
   2 (   ) [31] Simon Horman            2 ( +1) [ 50] Simon Horman       
   3 (   ) [13] Andrew Lunn             3 ( -1) [ 34] Andrew Lunn        
   4 (   ) [11] Paolo Abeni             4 (   ) [ 22] Paolo Abeni        
   5 (   ) [10] Eric Dumazet            5 ( +1) [ 21] Eric Dumazet       
   6 (+13) [ 7] Kuniyuki Iwashima       6 ( +1) [ 19] Russell King       
   7 (   ) [ 7] Russell King            7 ( -2) [ 15] Aleksandr Loktionov
   8 ( -2) [ 5] Aleksandr Loktionov     8 ( +7) [ 13] Kuniyuki Iwashima  
   9 ( -1) [ 4] Willem de Bruijn        9 ( -1) [ 11] Willem de Bruijn   
  10 ( +5) [ 3] Krzysztof Kozlowski    10 ( +4) [  9] Krzysztof Kozlowski
  11 (***) [ 3] Joe Damato             11 ( -2) [  9] Vladimir Oltean    
  12 ( +5) [ 3] Florian Westphal       12 (+33) [  8] Sabrina Dubroca    
  13 (***) [ 3] Pablo Neira Ayuso      13 ( +9) [  7] Ido Schimmel       
  14 (   ) [ 3] Paul Menzel            14 (***) [  6] Joe Damato         
  15 ( -3) [ 3] Maxime Chevallier      15 (+21) [  6] Conor Dooley       

Lots of familiar names among top reviewers. Kuniyuki returned after
a short absence, reviewing core networking, sockets, UNIX, TCP etc.
Joe reviewed various patches with no easily discernible theme (which
is perfectly fine :)). Sabrina reviews / maintains all things crypto
(ipsec, macsec, tls) which is of huge help. Ido is reliably helping
with IP / routing and bridge reviews. Pablo and Florian focus on
netfilter but there's quite a bit of cross posting. Thank you all!

Top authors (cs):                    Top authors (msg):                  
   1 (   ) [10] Eric Dumazet            1 (   ) [37] Russell King        
   2 ( +1) [ 5] Jakub Kicinski          2 ( +3) [23] Eric Dumazet        
   3 (***) [ 4] Jiayuan Chen            3 (***) [22] Jeff Layton         
   4 (***) [ 4] Aleksandr Loktionov     4 (***) [22] Kuniyuki Iwashima   
   5 ( -1) [ 4] Russell King            5 ( +2) [21] Tariq Toukan        
   6 ( +3) [ 3] Lorenzo Bianconi        6 (+21) [20] Vladimir Oltean     
   7 ( -1) [ 3] Tariq Toukan            7 ( +6) [17] Jakub Kicinski      
   8 (***) [ 2] Qingfang Deng           8 (+15) [16] Xuan Zhuo           
   9 ( +3) [ 2] Kuniyuki Iwashima       9 (+11) [15] Florian Westphal    
  10 (***) [ 2] Fernando Fernandez M.  10 ( +8) [15] Tony Nguyen         

Jiayuan Chen provided quite a few (quality) fixes across the stack.
Aleksandr cross posts Intel driver submissions quite a bit.
Russell continued to clean up stmmac, AKA the Augean stables.
Lorenzo works on airoha, Qingfang on PPP and Fernando removed
the support for IPv6=m among other things.

Jeff cross posts NFS patches, bringing Meta's reviewer score down,
much to my chagrin. Don't tell him I said this :)

Top scores (positive):               Top scores (negative):              
   1 (   ) [769] Jakub Kicinski         1 (***) [84] Jeff Layton         
   2 ( +1) [440] Simon Horman           2 (+11) [67] Tariq Toukan        
   3 ( -1) [227] Andrew Lunn            3 (+22) [57] Xuan Zhuo           
   4 (   ) [170] Paolo Abeni            4 (+42) [41] Bhargava Chenna Marreddy
   5 ( +4) [ 73] Eric Dumazet           5 (***) [38] Larysa Zaremba      
   6 (   ) [ 65] Willem de Bruijn       6 (+15) [38] Illusion Wang       
   7 ( +1) [ 58] Krzysztof Kozlowski    7 ( -6) [37] Ratheesh Kannoth    
   8 ( +7) [ 39] David Ahern            8 ( +8) [37] Tony Nguyen         
   9 (+10) [ 38] Ido Schimmel           9 (***) [36] Satish Kharat       
  10 ( -5) [ 37] Aleksandr Loktionov   10 ( -5) [36] Wei Fang            

A number of people on the "negative review score" side are there because
they are struggling to get new drivers in, given the depth of the
AI reviews.

Company rankings
----------------

Note on company rankings - because of the volume of patches I now
completely depend on a UI which ranks submissions on various
"readiness" metrics. One of them is the company review score.
This is to say that having a negative review score will now
impact review latency by up to 2 days.

Top reviewers (cs):                  Top reviewers (msg):                
   1 (   ) [55] Meta                    1 (   ) [135] Meta               
   2 (   ) [47] RedHat                  2 (   ) [104] RedHat             
   3 ( +2) [16] Google                  3 ( +1) [ 45] Google             
   4 ( -1) [16] Intel                   4 ( -1) [ 45] Intel              
   5 ( -1) [13] Andrew Lunn             5 (   ) [ 34] Andrew Lunn        
   6 (   ) [11] nVidia                  6 (   ) [ 30] nVidia             
   7 (   ) [ 9] Oracle                  7 (   ) [ 26] Oracle          

Top authors (cs):                    Top authors (msg):                  
   1 (   ) [16] Google                  1 ( +1) [102] Meta               
   2 (   ) [14] RedHat                  2 ( -1) [ 73] RedHat             
   3 (   ) [12] Meta                    3 ( +1) [ 68] Google             
   4 ( +1) [11] Intel                   4 ( +4) [ 67] Intel              
   5 ( +2) [ 7] Oracle                  5 (   ) [ 50] Oracle             
   6 ( -2) [ 6] nVidia                  6 ( -3) [ 47] nVidia             
   7 (+12) [ 5] Microsoft               7 ( -1) [ 43] NXP                 

Top scores (positive):               Top scores (negative):              
   1 (   ) [556] Meta                   1 (+16) [112] NXP                
   2 (   ) [496] RedHat                 2 (***) [ 68] Microsoft          
   3 (   ) [227] Andrew Lunn            3 (+15) [ 59] Alibaba            
   4 ( +5) [ 88] Linaro                 4 (***) [ 45] Microchip          
   5 ( +3) [ 35] Linux Foundation       5 (+46) [ 45] Shopee             
   6 ( -1) [ 32] Google                 6 (***) [ 41] Qualcomm           
   7 (   ) [ 32] Max-Planck             7 ( -6) [ 38] Huawei            
-- 
Code: https://github.com/kuba-moo/ml-stat
Raw output: https://netdev.bots.linux.dev/ml-stats/stats-7.1

^ permalink raw reply

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Jiayuan Chen @ 2026-04-15  1:47 UTC (permalink / raw)
  To: mkf, bpf
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David Ahern, netdev, linux-doc, linux-kernel
In-Reply-To: <42c1fed84a84519c2432163aa46f587f2d624fef.camel@163.com>


On 4/14/26 11:37 PM, mkf wrote:
> On Tue, 2026-04-14 at 18:57 +0800, Jiayuan Chen wrote:


[...]

> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -475,12 +475,21 @@ struct tcp_sock {
>   	u8	bpf_sock_ops_cb_flags;  /* Control calling BPF programs
>   					 * values defined in uapi/linux/tcp.h
>   					 */
> -	u8	bpf_chg_cc_inprogress:1; /* In the middle of
> +	u8	bpf_chg_cc_inprogress:1, /* In the middle of
>   					  * bpf_setsockopt(TCP_CONGESTION),
>   					  * it is to avoid the bpf_tcp_cc->init()
>   					  * to recur itself by calling
>   					  * bpf_setsockopt(TCP_CONGESTION, "itself").
>   					  */
> +		bpf_hdr_opt_len_cb_inprogress:1; /* It is set before invoking the
> +						  * callback so that a nested
> +						  * bpf_setsockopt(TCP_NODELAY) or
> +						  * bpf_setsockopt(TCP_CORK) cannot
> +						  * trigger tcp_push_pending_frames(),
> +						  * which would call tcp_current_mss()
> +						  * -> bpf_skops_hdr_opt_len(), causing
> +						  * infinite recursion.
> +						  */
>   #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
>   #else
>   #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 78b548158fb0..518699429a7a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5483,6 +5483,10 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
>   	if (sk->sk_protocol != IPPROTO_TCP)
>   		return -EINVAL;
>   
> +	if ((optname == TCP_NODELAY || optname == TCP_CORK) &&
> +	    tcp_sk(sk)->bpf_hdr_opt_len_cb_inprogress)
> +		return -EBUSY;
> +
> TCP_CORK is not supported in sol_tcp_sockopt(), which returns -EINVAL by default. Putting the
> check here could also prevent us from calling getsockopt(TCP_NODELAY) below.
>
>>   	switch (optname) {
>>   	case TCP_NODELAY:
>>   	case TCP_MAXSEG:
>> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
>> index dafb63b923d0..fb06c464ac16 100644
>> --- a/net/ipv4/tcp_minisocks.c
>> +++ b/net/ipv4/tcp_minisocks.c
>> @@ -663,6 +663,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
>>   	RCU_INIT_POINTER(newtp->fastopen_rsk, NULL);
>>   
>>   	newtp->bpf_chg_cc_inprogress = 0;
>> +	newtp->bpf_hdr_opt_len_cb_inprogress = 0;
>>   	tcp_bpf_clone(sk, newsk);
>>   
>>   	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 326b58ff1118..c9654e690e1a 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -475,6 +475,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
>>   				  unsigned int *remaining)
>>   {
>>   	struct bpf_sock_ops_kern sock_ops;
>> +	struct tcp_sock *tp = tcp_sk(sk);
>>   	int err;
>>   
>>   	if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk),
>> @@ -519,7 +520,9 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
>>   	if (skb)
>>   		bpf_skops_init_skb(&sock_ops, skb, 0);
>>   
>> +	tp->bpf_hdr_opt_len_cb_inprogress = 1;
> We check BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG before calling BPF_CGROUP_RUN_PROG_SOCK_OPS_SK.
> Could this flag be used for the same purpose, so we don't need to add an extra field?
>
> 	if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk),
> 					   BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG)) ||
> 	    !*remaining)
> 		return;


Hi Martin, I saw your patch. Your solution is better, please ignore mine :)
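
For readers following the thread, the guard proposed in the quoted patch (a one-bit "in progress" flag set around the callback, with the nested setsockopt path returning -EBUSY while it is set) can be sketched in user space roughly as follows. This is only an illustration of the re-entrancy pattern: `fake_sock`, `fake_setsockopt_nodelay()` and `run_hdr_opt_len_cb()` are hypothetical stand-ins, not kernel functions, and this is not the fix that was ultimately preferred in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket state touched by the quoted patch. */
struct fake_sock {
	bool hdr_opt_len_cb_inprogress; /* set while the callback runs */
	int depth;                      /* current nesting depth (demo only) */
	int max_depth;                  /* deepest nesting observed (demo only) */
	int last_nested_err;            /* result of the nested setsockopt call */
};

static void run_hdr_opt_len_cb(struct fake_sock *sk);

/* Stand-in for a nested bpf_setsockopt(TCP_NODELAY): without the flag
 * check it would re-enter the callback path and recurse without bound.
 */
static int fake_setsockopt_nodelay(struct fake_sock *sk)
{
	if (sk->hdr_opt_len_cb_inprogress)
		return -EBUSY;

	run_hdr_opt_len_cb(sk);
	return 0;
}

/* Stand-in for the callback invocation: mark it in progress, run the
 * "program" (which here tries the nested setsockopt), then clear the flag.
 */
static void run_hdr_opt_len_cb(struct fake_sock *sk)
{
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;

	sk->hdr_opt_len_cb_inprogress = true;
	sk->last_nested_err = fake_setsockopt_nodelay(sk);
	sk->hdr_opt_len_cb_inprogress = false;

	sk->depth--;
}
```

Calling `run_hdr_opt_len_cb()` on a zero-initialized `struct fake_sock` leaves `max_depth` at 1 and `last_nested_err` at `-EBUSY`, which is the behaviour the flag is meant to guarantee.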




^ permalink raw reply

* Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
From: fengchengwen @ 2026-04-15  1:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
In-Reply-To: <20260414151125.GF2577880@ziepe.ca>

On 4/14/2026 11:11 PM, Jason Gunthorpe wrote:
> On Tue, Apr 14, 2026 at 10:46:00PM +0800, fengchengwen wrote:
>>    We have a real platform requirement:
>>
>>      * 1. Devices in TPH Device-Specific Mode with no standard ST table
>>      * 2. Steering Tags must be obtained from ACPI _DSM (kernel-only)
>>      * 3. Devices are fully managed by userspace drivers (VFIO/UIO)
>>      * 4. Userspace must program STs into vendor-specific registers
> 
> No, this is nonsensical too.
> 
> If you want to control the steering tags for MMIO BAR memory exposed
> by VFIO then the DMABUF mechanism Keith & co has been working on is
> the correct approach.
> 
> If the VFIO user needs to control steering tags for the device it is
> directly controlling then it must do that through VFIO ioctls.
> 
> Nobody messes around with other devices under the covers of the
> operating kernel driver. Stop proposing that.

Understood.

For VFIO-passed devices that are fully under userspace control,
we will implement the TPH Steering Tag query interface
exclusively through VFIO ioctls, not sysfs.

This will allow userspace to query per-CPU Steering Tags
from platform firmware (ACPI _DSM) for the VFIO device,
which is fully under its control.

Thanks

> 
> Jason


^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH v2] dpf: fix UAF and double free in idpf_plug_vport_aux_dev() error path
From: Guangshuo Li @ 2026-04-15  1:47 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Joshua Hay,
	Tatyana Nikolova, Madhu Chittim, intel-wired-lan, netdev,
	linux-kernel, Greg Kroah-Hartman, stable
In-Reply-To: <5da15f31-e9af-4f8d-82fd-eac29a6d98f6@intel.com>

Hi Jacob,

Thanks for reviewing.

On Wed, 15 Apr 2026 at 05:03, Jacob Keller <jacob.e.keller@intel.com> wrote:
>
>
> This doesn't look right. The commit message analysis seems to match this
> fix from Greg KH:
>
> https://lore.kernel.org/intel-wired-lan/2026041432-tapestry-condition-22ff@gregkh/
>
> But the changes do not make any sense to me. It looks like a poorly done
> AI-generated "fix" which is not correct. Greg's version does look like
> it properly resolves this.
>
> > v2:
> >   - note that the issue was identified by my static analysis tool
> >   - and confirmed by manual review
> >
>
> What even is this change log?? I see that version was sent and everyone
> else was sane enough to just silently reject or ignore the v1...
>
> >  drivers/net/ethernet/intel/idpf/idpf_idc.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/idpf/idpf_idc.c b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > index 6dad0593f7f2..2a18907643fc 100644
> > --- a/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > +++ b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > @@ -59,6 +59,7 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
> >       char name[IDPF_IDC_MAX_ADEV_NAME_LEN];
> >       struct auxiliary_device *adev;
> >       int ret;
> > +     int adev_id;
> >
>
> You create a local variable here...
>
> >       iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
> >       if (!iadev)
> > @@ -74,11 +75,14 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
> >               goto err_ida_alloc;
> >       }
> >       adev->id = ret;
> > +     adev->id = adev_id;
>
> adev_id is never initialized, so you assign a random garbage
> uninitialized value. This is obviously wrong and will lead to worse
> errors than the failed cleanup.
>
> I'm rejecting this patch in favor of the clearly appropriate fix from Greg.
>
> >       adev->dev.release = idpf_vport_adev_release;
> >       adev->dev.parent = &cdev_info->pdev->dev;
> >       sprintf(name, "%04x.rdma.vdev", cdev_info->pdev->vendor);
> >       adev->name = name;
> >
> > +     /* iadev is owned by the auxiliary device */
> > +     iadev = NULL;
> >       ret = auxiliary_device_init(adev);
> >       if (ret)
> >               goto err_aux_dev_init;
> > @@ -92,7 +96,7 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
> >  err_aux_dev_add:
> >       auxiliary_device_uninit(adev);
> >  err_aux_dev_init:
> > -     ida_free(&idpf_idc_ida, adev->id);
> > +     ida_free(&idpf_idc_ida, adev_id);
> >  err_ida_alloc:
> >       vdev_info->adev = NULL;
> >       kfree(iadev);
>

You are right that the v2 patch as sent is incomplete. That was my
mistake when preparing/sending v2: it accidentally dropped the adev_id
= ret; assignment, which made that version incorrect.

For reference, the original v1 patch is here:

https://lkml.org/lkml/2026/3/21/421

In v1, adev_id was assigned from ret before use, so I believe that
particular uninitialized-variable issue was introduced in the v2
posting.

Sorry for the confusion caused by the broken v2 posting.
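
To make the uninitialized-variable point concrete, here is a minimal user-space sketch of the pattern under discussion: copy the allocated ID into a local before anything can free the object, and use that local on the error path. Every name in it (`fake_adev`, `fake_ida_alloc()`, `fake_ida_free()`, `fake_plug()`) is hypothetical, and it says nothing about which of the two competing fixes is the right one for idpf.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the auxiliary device; only the id matters. */
struct fake_adev {
	int id;
};

static int last_freed_id = -1;

/* Hypothetical ID allocator: hands out 0, 1, 2, ... */
static int fake_ida_alloc(void)
{
	static int next_id;

	return next_id++;
}

static void fake_ida_free(int id)
{
	last_freed_id = id;
}

/* init_ok == 0 simulates a failure after which the object is gone. */
static int fake_plug(int init_ok)
{
	struct fake_adev *adev;
	int adev_id;
	int ret;

	adev = malloc(sizeof(*adev));
	if (!adev)
		return -1;

	ret = fake_ida_alloc();
	adev_id = ret;      /* the assignment that the v2 posting dropped */
	adev->id = adev_id;

	if (!init_ok) {
		free(adev);             /* object no longer valid ...        */
		adev = NULL;
		fake_ida_free(adev_id); /* ... so free the ID via the local  */
		return -1;
	}

	free(adev);
	return adev_id;
}
```

Without the `adev_id = ret;` line, `adev->id` and the later `fake_ida_free()` call would both consume an indeterminate value, which is the problem Jacob pointed out.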

Thanks,
Guangshuo

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Sam Edwards @ 2026-04-15  2:12 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <ad7nsUQvksJF7JFK@shell.armlinux.org.uk>

On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> survives iperf3 -c -R to the imx6.

Hi Russell,

Aw, you beat me to it! I was about to report that 5.10.104-tegra is
unaffected. And my iperf3 server is a multi-GbE amd64 machine.

> Dumping the registers and comparing, and then forcing the RQS and TQS
> values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> *256 = 36864 bytes) respectively seems to solve the problem. Under
> net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> all four of the MTL receive operation mode registers, but only the
> first MTL transmit operation mode register. However, DMA channels 1-3
> aren't initialised.

Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
than that, so when the DMA suffers a momentary hiccup, the FIFOs are
allowed to overflow, putting the hardware in a bad state.

Though I suspect this is only half of the problem: do you still see
RBUs? Everything you've shared so far suggests the DMA failures are
_not_ because the rx ring is drying up. My gut's telling me the DMA
unit is encountering an AXI error, triggering RBU plus some kind of
recovery behavior, and the recovery takes the DMA offline long enough
for the FIFO to overflow (without triggering RPS because the RQS
threshold is unreachable).

It seems that the problem happens less frequently on my test setup
when I boot with iommu.passthrough=1 but that could be my imagination.
But if the hardware remains stable with RQS and TQS set correctly, I
don't feel an urgent need to dig deeper. :)
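
The arithmetic Russell quotes above, (field + 1) * 256 bytes, can be written out as a small helper. This is a sketch that assumes only what the quoted numbers show, namely that the RQS/TQS field counts 256-byte blocks minus one; `mtl_fifo_bytes()` and `mtl_qs_field()` are illustrative names, not stmmac functions.

```c
#include <stdint.h>

/* Bytes of FIFO selected by a queue-size field value.
 * Assumption from the quoted figures: size = (field + 1) * 256.
 */
static inline uint32_t mtl_fifo_bytes(uint8_t qs_field)
{
	return ((uint32_t)qs_field + 1) * 256;
}

/* Inverse mapping. Assumes bytes is a multiple of 256 in [256, 65536]. */
static inline uint8_t mtl_qs_field(uint32_t bytes)
{
	return (uint8_t)(bytes / 256 - 1);
}
```

With these, 0x23 maps to 9216 bytes, 0x8f to 36864 bytes and 0xff to 65536 bytes, matching the values in the quoted message.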

> Looking back at 5.10, I don't see any code that would account for these
> values being programmed for TQS and RQS, it looks like the calculations
> are basically the same as we have today.

Note that Nvidia have their own "nvethernet" driver for their vendor
kernel, which appears to pick the FIFO sizes from hardcoded tables in
its eqos_configure_mtl_queue() [1] function.

Cheers,
Sam

[1] https://github.com/proski/nvethernet/blob/main/nvethernetrm/osi/core/eqos_core.c#L263

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] selftests/bpf: verify syncookie statistics in tcp_custom_syncookie
From: Jiayuan Chen @ 2026-04-15  2:22 UTC (permalink / raw)
  To: Martin KaFai Lau, Paolo Abeni
  Cc: Kuniyuki Iwashima, Eric Dumazet, Daniel Borkmann, netdev,
	Neal Cardwell, David S. Miller, Jakub Kicinski, Simon Horman,
	David Ahern, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	linux-kernel, bpf, linux-kselftest
In-Reply-To: <2026414195145.Ur8-.martin.lau@linux.dev>


On 4/15/26 4:02 AM, Martin KaFai Lau wrote:
> On Tue, Apr 14, 2026 at 11:17:39AM +0200, Paolo Abeni wrote:
>> On 4/14/26 11:08 AM, Paolo Abeni wrote:
>>> On 4/14/26 7:50 AM, Kuniyuki Iwashima wrote:
>>>> On Fri, Apr 10, 2026 at 6:32 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>>>>> Add read_tcpext_snmp() helper to network_helpers which reads a
>>>>> TcpExt SNMP counter via nstat, and use it in the tcp_custom_syncookie
>>>>> test to verify that LINUX_MIB_SYNCOOKIESRECV is incremented and
>>>>> LINUX_MIB_SYNCOOKIESFAILED stays unchanged across a successful
>>>>> BPF custom syncookie validation.
>>>>>
>>>>> The delta is captured between start_server() and accept(), which
>>>>> covers the full SYN/ACK/cookie-check path for one connection.
>>>>>
>>>>> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
>>>>> ---
>>>>>   tools/testing/selftests/bpf/network_helpers.c | 22 +++++++++++++++++++
>>>>>   tools/testing/selftests/bpf/network_helpers.h |  1 +
>>>>>   .../bpf/prog_tests/tcp_custom_syncookie.c     | 20 +++++++++++++++++
>>>> As you touch bpf selftest helper files, please rebase on bpf-next
>>>> to avoid possible conflicts and tag bpf-next in the Subject.
>>> To hopefully minimize conflict handling I'm going to apply patch
>>> 1/2 to net-next. Please resubmit patch 2/2 to bpf-next after the
>>> relevant net core changes reach there.
>> Uhmm... the original feature went through the bpf tree, so I guess both
>> patches could/should go via bpf-next. Hopefully conflicts in the tcp code
>> should be minimal.
> I think it is best to land both patches together. It seems the 7.1 pull-request
> is out. We can take it to bpf-next/net after the merge window and then follow
> by a pull-request for the net-next tree as usual.


Thanks, Martin. I will rebase and send it, targeting bpf-next, after the 
merge window ends.
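
For context on what the quoted commit message measures: the patch reads TcpExt counters through nstat. As a swapped-in alternative for illustration only, the sketch below pulls one counter out of text laid out like /proc/net/netstat (a "TcpExt:" line of names followed by a "TcpExt:" line of values). The function name `tcpext_counter_from_text()` is hypothetical and is not the `read_tcpext_snmp()` helper from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Look up one TcpExt counter in text shaped like /proc/net/netstat:
 *   "TcpExt: Name1 Name2 ...\nTcpExt: v1 v2 ...\n"
 * Returns the value, or -1 if the counter is not present.
 */
static long tcpext_counter_from_text(const char *text, const char *name)
{
	const char *hdr = text;
	size_t name_len = strlen(name);

	/* Find the first line that starts with "TcpExt: " (the name row). */
	while (hdr && strncmp(hdr, "TcpExt: ", 8) != 0) {
		hdr = strchr(hdr, '\n');
		if (hdr)
			hdr++;
	}
	if (!hdr)
		return -1;

	/* The value row must follow immediately and carry the same prefix. */
	const char *vals = strchr(hdr, '\n');
	if (!vals)
		return -1;
	vals++;
	if (strncmp(vals, "TcpExt: ", 8) != 0)
		return -1;

	/* Walk names and values in lockstep until the name matches. */
	const char *h = hdr + 8;
	const char *v = vals + 8;
	while (*h && *h != '\n' && *v && *v != '\n') {
		size_t hlen = strcspn(h, " \n");

		if (hlen == name_len && strncmp(h, name, hlen) == 0)
			return strtol(v, NULL, 10);

		h += hlen;
		h += strspn(h, " ");
		v += strcspn(v, " \n");
		v += strspn(v, " ");
	}
	return -1;
}
```

A test would sample the counter once before and once after the connection and compare the two values, expecting SyncookiesRecv to grow by one and SyncookiesFailed to stay unchanged.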


^ permalink raw reply

* [PATCH net-next 1/3] net/ethernet: add ZTE network driver support
From: Junyang Han @ 2026-04-15  1:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, han.junyang,
	ran.ming, han.chengfei, zhang.yanze


Add basic framework for ZTE DingHai ethernet PF driver, including
Kconfig/Makefile build support and PCIe device probe/remove skeleton.

Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
 MAINTAINERS                               |   6 +
 drivers/net/ethernet/Kconfig              |   1 +
 drivers/net/ethernet/Makefile             |   1 +
 drivers/net/ethernet/zte/Kconfig          |  20 +++
 drivers/net/ethernet/zte/Makefile         |   6 +
 drivers/net/ethernet/zte/dinghai/Kconfig  |  34 ++++
 drivers/net/ethernet/zte/dinghai/Makefile |  10 ++
 drivers/net/ethernet/zte/dinghai/en_pf.c  | 191 ++++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.h  |  51 ++++++
 9 files changed, 320 insertions(+)
 create mode 100644 drivers/net/ethernet/zte/Kconfig
 create mode 100644 drivers/net/ethernet/zte/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/Kconfig
 create mode 100644 drivers/net/ethernet/zte/dinghai/Makefile
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.c
 create mode 100644 drivers/net/ethernet/zte/dinghai/en_pf.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 65902b97f5df..92ddac4bb310 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -29210,6 +29210,12 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
 F:	sound/hda/codecs/senarytech.c
 
+ZTE DINGHAI ETHERNET DRIVER
+M:	Junyang Han <han.junyang@zte.com.cn>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ethernet/zte/
+
 THE REST
 M:	Linus Torvalds <torvalds@linux-foundation.org>
 L:	linux-kernel@vger.kernel.org
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index bdc29d143160..ecc6fbb01510 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -190,5 +190,6 @@ source "drivers/net/ethernet/wangxun/Kconfig"
 source "drivers/net/ethernet/wiznet/Kconfig"
 source "drivers/net/ethernet/xilinx/Kconfig"
 source "drivers/net/ethernet/xircom/Kconfig"
+source "drivers/net/ethernet/zte/Kconfig"
 
 endif # ETHERNET
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 6bffb60ba644..7476af77d6c8 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -106,3 +106,4 @@ obj-$(CONFIG_NET_VENDOR_XIRCOM) += xircom/
 obj-$(CONFIG_NET_VENDOR_SYNOPSYS) += synopsys/
 obj-$(CONFIG_NET_VENDOR_PENSANDO) += pensando/
 obj-$(CONFIG_OA_TC6) += oa_tc6.o
+obj-$(CONFIG_NET_VENDOR_ZTE) += zte/
diff --git a/drivers/net/ethernet/zte/Kconfig b/drivers/net/ethernet/zte/Kconfig
new file mode 100644
index 000000000000..b95c2fc7db77
--- /dev/null
+++ b/drivers/net/ethernet/zte/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE driver configuration
+#
+
+config NET_VENDOR_ZTE
+    bool "ZTE devices"
+    default y
+    help
+      If you have a network (Ethernet) card belonging to this class, say Y.
+      Note that the answer to this question doesn't directly affect the
+      kernel: saying N will just cause the configurator to skip all
+      the questions about Zte cards. If you say Y, you will be asked
+      for your specific card in the following questions.
+
+if NET_VENDOR_ZTE
+
+source "drivers/net/ethernet/zte/dinghai/Kconfig"
+
+endif # NET_VENDOR_ZTE
diff --git a/drivers/net/ethernet/zte/Makefile b/drivers/net/ethernet/zte/Makefile
new file mode 100644
index 000000000000..cd9929b61559
--- /dev/null
+++ b/drivers/net/ethernet/zte/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the ZTE device drivers
+#
+
+obj-$(CONFIG_DINGHAI) += dinghai/
diff --git a/drivers/net/ethernet/zte/dinghai/Kconfig b/drivers/net/ethernet/zte/dinghai/Kconfig
new file mode 100644
index 000000000000..94b5bd9b3c50
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ZTE DingHai Ethernet driver configuration
+#
+
+config DINGHAI
+    bool "ZTE DingHai Ethernet driver"
+    depends on NET_VENDOR_ZTE && PCI
+    select NET_DEVLINK
+    help
+      This driver supports ZTE DingHai Ethernet devices.
+
+      DingHai is a high-performance Ethernet controller that supports
+      multiple features including hardware offloading, SR-IOV, and
+      advanced virtualization capabilities.
+
+      If you say Y here, you can select specific driver variants below.
+
+      If unsure, say N.
+
+if DINGHAI
+
+config DINGHAI_PF
+    tristate "ZTE DingHai PF (Physical Function) driver"
+    help
+      This driver supports ZTE DingHai PCI Express Ethernet
+      adapters (PF).
+
+      To compile this driver as a module, choose M here. The module
+      will be named dinghai10e.
+
+      If unsure, say N.
+
+endif # DINGHAI
diff --git a/drivers/net/ethernet/zte/dinghai/Makefile b/drivers/net/ethernet/zte/dinghai/Makefile
new file mode 100644
index 000000000000..f55a8de518be
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for ZTE DingHai Ethernet driver
+#
+
+ccflags-y += -I$(src)
+
+obj-$(CONFIG_DINGHAI_PF) += dinghai10e.o
+dinghai10e-y := en_pf.o
+
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
new file mode 100644
index 000000000000..d3a4298fa927
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ZTE DingHai Ethernet driver
+ * Copyright (c) 2022-2024, ZTE Corporation.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <net/devlink.h>
+#include "en_pf.h"
+
+#define DRV_VERSION "1.0-1"
+#define DRV_SUMMARY "ZTE(R) zxdh-net driver"
+
+const char zxdh_pf_driver_version[] = DRV_VERSION;
+static const char zxdh_pf_driver_string[] = DRV_SUMMARY;
+static const char zxdh_pf_copyright[] = "Copyright (c) 2022-2024, ZTE Corporation.";
+
+MODULE_AUTHOR("ZTE");
+MODULE_DESCRIPTION(DRV_SUMMARY);
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE("GPL");
+
+static const struct devlink_ops dh_pf_devlink_ops = {};
+
+const struct pci_device_id dh_pf_pci_table[] = {
+	{ PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_PF_DEVICE_ID), 0 },
+	{ PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_VF_DEVICE_ID), 0 },
+	{ 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, dh_pf_pci_table);
+
+static int dh_pf_pci_init(struct dh_core_dev *dev)
+{
+	int ret = 0;
+	struct zxdh_pf_device *pf_dev = NULL;
+
+	pci_set_drvdata(dev->pdev, dev);
+
+	ret = pci_enable_device(dev->pdev);
+	if (ret)
+		return -ENOMEM;
+
+	ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(64));
+	if (ret) {
+		ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(32));
+		if (ret)
+			goto err_pci;
+	}
+
+	ret = pci_request_selected_regions(dev->pdev,
+					   pci_select_bars(dev->pdev, IORESOURCE_MEM),
+					   "dh-pf");
+	if (ret)
+		goto err_pci;
+
+	pci_set_master(dev->pdev);
+	ret = pci_save_state(dev->pdev);
+	if (ret)
+		goto err_pci_save_state;
+
+	pf_dev = dh_core_priv(dev);
+	pf_dev->pci_ioremap_addr[0] =
+		(uint64_t)ioremap(pci_resource_start(dev->pdev, 0),
+				  pci_resource_len(dev->pdev, 0));
+	if (!pf_dev->pci_ioremap_addr[0]) {
+		ret = -ENOMEM;
+		goto err_pci_save_state;
+	}
+
+	return 0;
+
+err_pci_save_state:
+	pci_release_selected_regions(dev->pdev, pci_select_bars(dev->pdev, IORESOURCE_MEM));
+err_pci:
+	pci_disable_device(dev->pdev);
+	return ret;
+}
+
+void dh_pf_pci_close(struct dh_core_dev *dev)
+{
+	struct zxdh_pf_device *pf_dev = NULL;
+
+	pf_dev = dh_core_priv(dev);
+	iounmap((void *)pf_dev->pci_ioremap_addr[0]);
+	pci_release_selected_regions(dev->pdev, pci_select_bars(dev->pdev, IORESOURCE_MEM));
+	pci_disable_device(dev->pdev);
+}
+
+static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct dh_core_dev *dh_dev = NULL;
+	struct zxdh_pf_device *pf_dev = NULL;
+	struct devlink *devlink = NULL;
+	int ret = 0;
+
+	devlink = devlink_alloc(&dh_pf_devlink_ops, sizeof(struct zxdh_pf_device),
+				&pdev->dev);
+	if (!devlink)
+		return -ENOMEM;
+
+	dh_dev = devlink_priv(devlink);
+	dh_dev->device = &pdev->dev;
+	dh_dev->pdev = pdev;
+	dh_dev->devlink = devlink;
+
+	pf_dev = dh_core_priv(dh_dev);
+	pf_dev->bar_chan_valid = false;
+	pf_dev->vepa = false;
+	mutex_init(&dh_dev->lock);
+	mutex_init(&pf_dev->irq_lock);
+
+	dh_dev->coredev_type = GET_COREDEV_TYPE(pdev);
+
+	ret = dh_pf_pci_init(dh_dev);
+	if (ret)
+		goto err_cfg_init;
+
+	return 0;
+
+err_cfg_init:
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	devlink_free(devlink);
+	pf_dev = NULL;
+	return -EPERM;
+}
+
+static void dh_pf_remove(struct pci_dev *pdev)
+{
+	struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+	struct devlink *devlink = priv_to_devlink(dh_dev);
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	if (!dh_dev)
+		return;
+
+	dh_pf_pci_close(dh_dev);
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	devlink_free(devlink);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static void dh_pf_shutdown(struct pci_dev *pdev)
+{
+	struct dh_core_dev *dh_dev = pci_get_drvdata(pdev);
+	struct devlink *devlink = priv_to_devlink(dh_dev);
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	dh_pf_pci_close(dh_dev);
+	mutex_destroy(&pf_dev->irq_lock);
+	mutex_destroy(&dh_dev->lock);
+	devlink_free(devlink);
+
+	pci_set_drvdata(pdev, NULL);
+}
+
+static int dh_pf_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+	return 0;
+}
+
+static int dh_pf_resume(struct pci_dev *pdev)
+{
+	return 0;
+}
+
+static struct pci_driver dh_pf_driver = {
+	.name = "dinghai10e",
+	.id_table = dh_pf_pci_table,
+	.probe = dh_pf_probe,
+	.remove = dh_pf_remove,
+	.suspend = dh_pf_suspend,
+	.resume = dh_pf_resume,
+	.shutdown = dh_pf_shutdown,
+};
+
+static int __init dh_pf_pci_init_module(void)
+{
+	return pci_register_driver(&dh_pf_driver);
+}
+
+static void __exit dh_pf_pci_exit_module(void)
+{
+	pci_unregister_driver(&dh_pf_driver);
+}
+
+module_init(dh_pf_pci_init_module);
+module_exit(dh_pf_pci_exit_module);
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.h b/drivers/net/ethernet/zte/dinghai/en_pf.h
new file mode 100644
index 000000000000..0d3880b0aede
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - PF header
+ * Copyright (c) 2022-2024, ZTE Corporation.
+ */
+
+#ifndef __ZXDH_EN_PF_H__
+#define __ZXDH_EN_PF_H__
+
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/mutex.h>
+
+#define ZXDH_PF_VENDOR_ID	0x1cf2
+#define ZXDH_PF_DEVICE_ID	0x8040
+#define ZXDH_VF_DEVICE_ID	0x8041
+
+enum dh_coredev_type {
+	DH_COREDEV_PF,
+	DH_COREDEV_VF,
+	DH_COREDEV_SF,
+	DH_COREDEV_MPF
+};
+
+struct devlink;
+
+struct dh_core_dev {
+	struct device *device;
+	enum dh_coredev_type coredev_type;
+	struct pci_dev *pdev;
+	struct devlink *devlink;
+	struct mutex lock; /* Protects device configuration */
+	char priv[] __aligned(32);
+};
+
+struct zxdh_pf_device {
+	bool bar_chan_valid;
+	bool vepa;
+	struct mutex irq_lock; /* Protects IRQ operations */
+};
+
+static inline void *dh_core_priv(struct dh_core_dev *dh_coredev)
+{
+	return &dh_coredev->priv;
+}
+
+#define GET_COREDEV_TYPE(pdev) \
+	((pdev)->device == ZXDH_VF_DEVICE_ID ? DH_COREDEV_VF : DH_COREDEV_PF)
+
+#endif
+
-- 
2.43.0

^ permalink raw reply related

* [PATCH net-next 2/3] net/ethernet/zte/dinghai: add logging infrastructure
From: Junyang Han @ 2026-04-15  1:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, han.junyang,
	ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260415015334.2018453-1-han.junyang@zte.com.cn>


Introduce logging macros (DH_LOG_EMERG/ALERT/CRIT/ERR/WARN/INFO/DBG)
and helper definitions for ZTE DingHai driver debugging.

Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
 drivers/net/ethernet/zte/dinghai/Makefile    |  2 +-
 drivers/net/ethernet/zte/dinghai/dh_helper.h | 79 ++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/dh_log.c    | 10 +++
 drivers/net/ethernet/zte/dinghai/dh_log.h    | 64 ++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.c     | 49 ++++++++++--
 5 files changed, 196 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_helper.h
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_log.c
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_log.h

diff --git a/drivers/net/ethernet/zte/dinghai/Makefile b/drivers/net/ethernet/zte/dinghai/Makefile
index f55a8de518be..c2a815427c24 100644
--- a/drivers/net/ethernet/zte/dinghai/Makefile
+++ b/drivers/net/ethernet/zte/dinghai/Makefile
@@ -6,5 +6,5 @@
 ccflags-y += -I$(src)
 
 obj-$(CONFIG_DINGHAI_PF) += dinghai10e.o
-dinghai10e-y := en_pf.o
+dinghai10e-y := en_pf.o dh_log.o
 
diff --git a/drivers/net/ethernet/zte/dinghai/dh_helper.h b/drivers/net/ethernet/zte/dinghai/dh_helper.h
new file mode 100644
index 000000000000..3933e6d79460
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/dh_helper.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - device-level logging helpers
+ * Copyright (c) 2022-2024, ZTE Corporation.
+ */
+
+#ifndef __DH_HELPER_H__
+#define __DH_HELPER_H__
+
+#include <linux/dev_printk.h>
+#include <linux/types.h>
+#include "dh_log.h"
+#include "en_pf.h"
+
+extern u32 dh_debug_mask;
+
+#define dh_dbg(__dev, format, ...)					\
+	dev_dbg((__dev)->device, "%s:%d:(pid %d): " format,		\
+		 __func__, __LINE__, current->pid,			\
+		 ##__VA_ARGS__)
+
+#define dh_dbg_once(__dev, format, ...)				\
+	dev_dbg_once((__dev)->device,				\
+		     "%s:%d:(pid %d): " format,			\
+		     __func__, __LINE__, current->pid,		\
+		     ##__VA_ARGS__)
+
+#define dh_dbg_mask(__dev, mask, format, ...)			\
+do {								\
+	if ((mask) & dh_debug_mask)				\
+		dh_dbg(__dev, format, ##__VA_ARGS__);		\
+} while (0)
+
+#define dh_err(__dev, format, ...)					\
+	dev_err((__dev)->device, "%s:%d:(pid %d): " format,		\
+		__func__, __LINE__, current->pid,			\
+		##__VA_ARGS__)
+
+#define dh_err_rl(__dev, format, ...)					\
+	dev_err_ratelimited((__dev)->device,				\
+			    "%s:%d:(pid %d): " format,			\
+			    __func__, __LINE__, current->pid,		\
+			    ##__VA_ARGS__)
+
+#define dh_warn(__dev, format, ...)					\
+	dev_warn((__dev)->device, "%s:%d:(pid %d): " format,		\
+		 __func__, __LINE__, current->pid,			\
+		 ##__VA_ARGS__)
+
+#define dh_warn_once(__dev, format, ...)				\
+	dev_warn_once((__dev)->device, "%s:%d:(pid %d): " format,	\
+		      __func__, __LINE__, current->pid,		\
+		      ##__VA_ARGS__)
+
+#define dh_warn_rl(__dev, format, ...)					\
+	dev_warn_ratelimited((__dev)->device,				\
+			     "%s:%d:(pid %d): " format,		\
+			     __func__, __LINE__, current->pid,		\
+			     ##__VA_ARGS__)
+
+#define dh_info(__dev, format, ...)					\
+	dev_info((__dev)->device, format, ##__VA_ARGS__)
+
+#define dh_info_rl(__dev, format, ...)					\
+	dev_info_ratelimited((__dev)->device,				\
+			     "%s:%d:(pid %d): " format,		\
+			     __func__, __LINE__, current->pid,		\
+			     ##__VA_ARGS__)
+
+enum {
+	ZXDH_PCI_DEV_IS_VF = 1 << 0,
+};
+
+static inline bool dh_core_is_sf(const struct dh_core_dev *dev)
+{
+	return dev->coredev_type == DH_COREDEV_SF;
+}
+
+#endif /* __DH_HELPER_H__ */
diff --git a/drivers/net/ethernet/zte/dinghai/dh_log.c b/drivers/net/ethernet/zte/dinghai/dh_log.c
new file mode 100644
index 000000000000..5e6e42175e37
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/dh_log.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ZTE DingHai Ethernet driver - logging runtime control
+ * Copyright (c) 2022-2024, ZTE Corporation.
+ */
+
+#include <linux/module.h>
+
+int debug_print;
+module_param(debug_print, int, 0644);
diff --git a/drivers/net/ethernet/zte/dinghai/dh_log.h b/drivers/net/ethernet/zte/dinghai/dh_log.h
new file mode 100644
index 000000000000..295ee306fa0d
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/dh_log.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - logging infrastructure
+ * Copyright (c) 2022-2024, ZTE Corporation.
+ */
+
+#ifndef __DH_LOG_H__
+#define __DH_LOG_H__
+
+#include <linux/kernel.h>
+#include <linux/printk.h>
+
+#define MODULE_CMD		"zxdh_cmd"
+#define MODULE_NP		"zxdh_np"
+#define MODULE_PF		"zxdh_pf"
+#define MODULE_PTP		"zxdh_ptp"
+#define MODULE_TSN		"zxdh_tsn"
+#define MODULE_LAG		"zxdh_lag"
+#define MODULE_DHTOOLS		"zxdh_tool"
+#define MODULE_SEC		"zxdh_sec"
+#define MODULE_MPF		"zxdh_mpf"
+#define MODULE_FUC_HP		"zxdh_func_hp"
+#define MODULE_UACCE		"zxdh_uacce"
+#define MODULE_HEAL		"zxdh_health"
+
+extern int debug_print;
+
+#define DH_LOG_EMERG(module, fmt, arg...)				\
+	printk(KERN_EMERG "[%s][%s][%d] " fmt,			\
+	       module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_ALERT(module, fmt, arg...)				\
+	printk(KERN_ALERT "[%s][%s][%d] " fmt,			\
+	       module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_CRIT(module, fmt, arg...)				\
+	printk(KERN_CRIT "[%s][%s][%d] " fmt,			\
+	       module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_ERR(module, fmt, arg...)				\
+	printk(KERN_ERR "[%s][%s][%d] " fmt,			\
+	       module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_WARNING(module, fmt, arg...)				\
+	printk(KERN_WARNING "[%s][%s][%d] " fmt,			\
+	       module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_INFO(module, fmt, arg...)				\
+	printk(KERN_INFO "[%s][%s][%d] " fmt,			\
+	       module, __func__, __LINE__, ##arg)
+
+#define DH_LOG_DEBUG(module, fmt, arg...)				\
+do {									\
+	if (debug_print)						\
+		printk(KERN_DEBUG "[%s][%s][%d] " fmt,		\
+		       module, __func__, __LINE__, ##arg);	\
+} while (0)
+
+#define LOG_ERR(fmt, arg...)		DH_LOG_ERR(MODULE_PF, fmt, ##arg)
+#define LOG_INFO(fmt, arg...)		DH_LOG_INFO(MODULE_PF, fmt, ##arg)
+#define LOG_DEBUG(fmt, arg...)		DH_LOG_DEBUG(MODULE_PF, fmt, ##arg)
+#define LOG_WARN(fmt, arg...)		DH_LOG_WARNING(MODULE_PF, fmt, ##arg)
+
+#endif /* __DH_LOG_H__ */
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
index d3a4298fa927..2d2740223401 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.c
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -8,6 +8,7 @@
 #include <linux/pci.h>
 #include <net/devlink.h>
 #include "en_pf.h"
+#include "dh_log.h"
 
 #define DRV_VERSION "1.0-1"
 #define DRV_SUMMARY "ZTE(R) zxdh-net driver"
@@ -21,6 +22,13 @@ MODULE_DESCRIPTION(DRV_SUMMARY);
 MODULE_VERSION(DRV_VERSION);
 MODULE_LICENSE("GPL");
 
+u32 dh_debug_mask;
+module_param_named(debug_mask, dh_debug_mask, uint, 0644);
+MODULE_PARM_DESC(debug_mask, "debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0");
+static bool probe_vf = true;
+module_param(probe_vf, bool, 0644);
+MODULE_PARM_DESC(probe_vf, "probe_vf: 0 = N, 1 = Y");
+
 static const struct devlink_ops dh_pf_devlink_ops = {};
 
 const struct pci_device_id dh_pf_pci_table[] = {
@@ -39,26 +47,34 @@ static int dh_pf_pci_init(struct dh_core_dev *dev)
 	pci_set_drvdata(dev->pdev, dev);
 
 	ret = pci_enable_device(dev->pdev);
-	if (ret)
+	if (ret) {
+		LOG_ERR("pci_enable_device failed: %d\n", ret);
 		return -ENOMEM;
+	}
 
 	ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(64));
 	if (ret) {
 		ret = dma_set_mask_and_coherent(dev->device, DMA_BIT_MASK(32));
-		if (ret)
+		if (ret) {
+			LOG_ERR("dma_set_mask_and_coherent failed: %d\n", ret);
 			goto err_pci;
+		}
 	}
 
 	ret = pci_request_selected_regions(dev->pdev,
 					   pci_select_bars(dev->pdev, IORESOURCE_MEM),
 					   "dh-pf");
-	if (ret)
+	if (ret) {
+		LOG_ERR("pci_request_selected_regions failed: %d\n", ret);
 		goto err_pci;
+	}
 
 	pci_set_master(dev->pdev);
 	ret = pci_save_state(dev->pdev);
-	if (ret)
+	if (ret) {
+		LOG_ERR("pci_save_state failed: %d\n", ret);
 		goto err_pci_save_state;
+	}
 
 	pf_dev = dh_core_priv(dev);
 	pf_dev->pci_ioremap_addr[0] =
@@ -66,6 +82,9 @@ static int dh_pf_pci_init(struct dh_core_dev *dev)
 				  pci_resource_len(dev->pdev, 0));
 	if (!pf_dev->pci_ioremap_addr[0]) {
 		ret = -ENOMEM;
+		LOG_ERR("ioremap(0x%llx, 0x%llx) failed\n",
+			(unsigned long long)pci_resource_start(dev->pdev, 0),
+			(unsigned long long)pci_resource_len(dev->pdev, 0));
 		goto err_pci_save_state;
 	}
 
@@ -95,10 +114,13 @@ static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct devlink *devlink = NULL;
 	int ret = 0;
 
+	LOG_INFO("pf level start\n");
 	devlink = devlink_alloc(&dh_pf_devlink_ops, sizeof(struct zxdh_pf_device),
 				&pdev->dev);
-	if (!devlink)
+	if (!devlink) {
+		LOG_ERR("devlink alloc failed\n");
 		return -ENOMEM;
+	}
 
 	dh_dev = devlink_priv(devlink);
 	dh_dev->device = &pdev->dev;
@@ -112,10 +134,17 @@ static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&pf_dev->irq_lock);
 
 	dh_dev->coredev_type = GET_COREDEV_TYPE(pdev);
+	LOG_DEBUG("%s device: %s\n",
+		  (dh_dev->coredev_type == DH_COREDEV_PF) ? "PF" : "VF",
+		  pci_name(pdev));
 
 	ret = dh_pf_pci_init(dh_dev);
-	if (ret)
+	if (ret) {
+		LOG_ERR("dh_pf_pci_init failed: %d\n", ret);
 		goto err_cfg_init;
+	}
+
+	LOG_INFO("pf level completed\n");
 
 	return 0;
 
@@ -133,14 +162,17 @@ static void dh_pf_remove(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dh_dev);
 	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
 
-	if (!dh_dev)
+	if (!pf_dev)
 		return;
 
+	LOG_INFO("pf level start\n");
+
 	dh_pf_pci_close(dh_dev);
 	mutex_destroy(&pf_dev->irq_lock);
 	mutex_destroy(&dh_dev->lock);
 	devlink_free(devlink);
 	pci_set_drvdata(pdev, NULL);
+	LOG_INFO("pf level completed\n");
 }
 
 static void dh_pf_shutdown(struct pci_dev *pdev)
@@ -149,12 +181,15 @@ static void dh_pf_shutdown(struct pci_dev *pdev)
 	struct devlink *devlink = priv_to_devlink(dh_dev);
 	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
 
+	LOG_INFO("pf level start\n");
+
 	dh_pf_pci_close(dh_dev);
 	mutex_destroy(&pf_dev->irq_lock);
 	mutex_destroy(&dh_dev->lock);
 	devlink_free(devlink);
 
 	pci_set_drvdata(pdev, NULL);
+	LOG_INFO("pf level completed\n");
 }
 
 static int dh_pf_suspend(struct pci_dev *pdev, pm_message_t state)
-- 
2.43.0


* [PATCH net-next 3/3] net/ethernet/zte/dinghai: add hardware register access and PCI capability scanning
From: Junyang Han @ 2026-04-15  1:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, han.junyang,
	ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260415015334.2018453-1-han.junyang@zte.com.cn>



Implement PCI configuration space access, BAR mapping, capability
scanning (common/notify/device), and hardware queue register
definitions for the DingHai PF device.
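
For readers unfamiliar with the scan, here is a minimal user-space
sketch of the vendor-capability walk. It is not the driver code: it
reads a 256-byte copy of configuration space instead of calling
pci_find_capability(), and find_vendor_cap(), the CAP_* names, the
0x40..0xf0 bounds and the ttl limit are illustrative assumptions. The
BAR length/flags check done by zxdh_pf_pci_find_capability() is omitted.

```c
#include <stdint.h>

#define CAP_PTR_OFF  0x34 /* PCI_CAPABILITY_LIST: pointer to first capability */
#define CAP_ID_VNDR  0x09 /* PCI_CAP_ID_VNDR */
#define MAX_BAR_VAL  5    /* mirrors ZXDH_PF_MAX_BAR_VAL from the patch */

/* Byte offsets within a capability, following struct zxdh_pf_pci_cap. */
enum { OFF_VNDR = 0, OFF_NEXT = 1, OFF_CFG_TYPE = 3, OFF_BAR = 4 };

/*
 * Hypothetical helper: return the config-space offset of the first vendor
 * capability whose cfg_type matches, or 0 if none is found. On success the
 * capability's BAR is recorded in *bars as a bit mask.
 */
static int find_vendor_cap(const uint8_t cfg[256], uint8_t cfg_type,
			   uint32_t *bars)
{
	uint8_t pos = cfg[CAP_PTR_OFF];
	int ttl = 48; /* bound the walk so a looping list terminates */

	while (pos >= 0x40 && pos <= 0xf0 && ttl-- > 0) {
		uint8_t bar = cfg[pos + OFF_BAR];

		if (cfg[pos + OFF_VNDR] == CAP_ID_VNDR &&
		    cfg[pos + OFF_CFG_TYPE] == cfg_type &&
		    bar <= MAX_BAR_VAL) {
			*bars |= 1u << bar;
			return pos;
		}
		pos = cfg[pos + OFF_NEXT];
	}
	return 0;
}
```

With a common-config capability at 0x40 (BAR 0) chained to a notify
capability at 0x50 (BAR 4), a lookup for type 2 returns 0x50 and sets
bit 4 of the mask.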

Signed-off-by: Junyang Han <han.junyang@zte.com.cn>
---
 drivers/net/ethernet/zte/dinghai/dh_queue.h |  71 ++++
 drivers/net/ethernet/zte/dinghai/en_pf.c    | 411 ++++++++++++++++++++
 drivers/net/ethernet/zte/dinghai/en_pf.h    |  41 ++
 3 files changed, 523 insertions(+)
 create mode 100644 drivers/net/ethernet/zte/dinghai/dh_queue.h

diff --git a/drivers/net/ethernet/zte/dinghai/dh_queue.h b/drivers/net/ethernet/zte/dinghai/dh_queue.h
new file mode 100644
index 000000000000..1e7d64ecbbf3
--- /dev/null
+++ b/drivers/net/ethernet/zte/dinghai/dh_queue.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ZTE DingHai Ethernet driver - PCI capability definitions
+ * Copyright (c) 2022-2024, ZTE Corporation.
+ */
+
+#ifndef __DH_QUEUE_H__
+#define __DH_QUEUE_H__
+
+/* Vector value used to disable MSI for queue */
+#define ZXDH_MSI_NO_VECTOR      0xff
+
+/* Status byte for guest to report progress, and synchronize features */
+/* We have seen device and processed generic fields */
+#define ZXDH_CONFIG_S_ACKNOWLEDGE 1
+/* We have found a driver for the device. */
+#define ZXDH_CONFIG_S_DRIVER      2
+/* Driver has used its parts of the config, and is happy */
+#define ZXDH_CONFIG_S_DRIVER_OK   4
+/* Driver has finished configuring features */
+#define ZXDH_CONFIG_S_FEATURES_OK 8
+/* Device entered invalid state, driver must reset it */
+#define ZXDH_CONFIG_S_NEEDS_RESET 0x40
+/* We've given up on this device */
+#define ZXDH_CONFIG_S_FAILED      0x80
+
+/* This is the PCI capability header: */
+struct zxdh_pf_pci_cap {
+	__u8 cap_vndr;		/* Generic PCI field: PCI_CAP_ID_VNDR */
+	__u8 cap_next;		/* Generic PCI field: next ptr. */
+	__u8 cap_len;		/* Generic PCI field: capability length */
+	__u8 cfg_type;		/* Identifies the structure. */
+	__u8 bar;		/* Where to find it. */
+	__u8 id;		/* Multiple capabilities of the same type */
+	__u8 padding[2];		/* Pad to full dword. */
+	__le32 offset;		/* Offset within bar. */
+	__le32 length;		/* Length of the structure, in bytes. */
+};
+
+/* Fields in ZXDH_PF_PCI_CAP_COMMON_CFG: */
+struct zxdh_pf_pci_common_cfg {
+	/* About the whole device. */
+	__le32 device_feature_select; /* read-write */
+	__le32 device_feature;	/* read-only */
+	__le32 guest_feature_select; /* read-write */
+	__le32 guest_feature;		/* read-write */
+	__le16 msix_config;		/* read-write */
+	__le16 num_queues;		/* read-only */
+	__u8 device_status;		/* read-write */
+	__u8 config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	__le16 queue_select;		/* read-write */
+	__le16 queue_size;		/* read-write, power of 2. */
+	__le16 queue_msix_vector;	/* read-write */
+	__le16 queue_enable;		/* read-write */
+	__le16 queue_notify_off;	/* read-only */
+	__le32 queue_desc_lo;		/* read-write */
+	__le32 queue_desc_hi;		/* read-write */
+	__le32 queue_avail_lo;		/* read-write */
+	__le32 queue_avail_hi;		/* read-write */
+	__le32 queue_used_lo;		/* read-write */
+	__le32 queue_used_hi;		/* read-write */
+};
+
+struct zxdh_pf_pci_notify_cap {
+	struct zxdh_pf_pci_cap cap;
+	__le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
+};
+
+#endif /* __DH_QUEUE_H__ */
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.c b/drivers/net/ethernet/zte/dinghai/en_pf.c
index 2d2740223401..c29299ad629e 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.c
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.c
@@ -107,6 +107,417 @@ void dh_pf_pci_close(struct dh_core_dev *dev)
 	pci_disable_device(dev->pdev);
 }
 
+int32_t zxdh_pf_pci_find_capability(struct pci_dev *pdev, uint8_t cfg_type,
+				    uint32_t ioresource_types, int32_t *bars)
+{
+	int32_t pos = 0;
+	uint8_t type = 0;
+	uint8_t bar = 0;
+
+	for (pos = pci_find_capability(pdev, PCI_CAP_ID_VNDR); pos > 0;
+	     pos = pci_find_next_capability(pdev, pos, PCI_CAP_ID_VNDR)) {
+		pci_read_config_byte(pdev, pos + offsetof(struct zxdh_pf_pci_cap, cfg_type), &type);
+		pci_read_config_byte(pdev, pos + offsetof(struct zxdh_pf_pci_cap, bar), &bar);
+
+		/* ignore structures with reserved BAR values */
+		if (bar > ZXDH_PF_MAX_BAR_VAL)
+			continue;
+
+		if (type == cfg_type) {
+			if (pci_resource_len(pdev, bar) &&
+			    pci_resource_flags(pdev, bar) & ioresource_types) {
+				*bars |= (1 << bar);
+				return pos;
+			}
+		}
+	}
+
+	return 0;
+}
+
+void __iomem *zxdh_pf_map_capability(struct dh_core_dev *dh_dev, int32_t off,
+				     size_t minlen, uint32_t align,
+				     uint32_t start, uint32_t size,
+				     size_t *len, resource_size_t *pa,
+				     uint32_t *bar_off)
+{
+	struct pci_dev *pdev = dh_dev->pdev;
+	uint8_t bar = 0;
+	uint32_t offset = 0;
+	uint32_t length = 0;
+	void __iomem *p = NULL;
+
+	pci_read_config_byte(pdev, off + offsetof(struct zxdh_pf_pci_cap, bar), &bar);
+	pci_read_config_dword(pdev, off + offsetof(struct zxdh_pf_pci_cap, offset), &offset);
+	pci_read_config_dword(pdev, off + offsetof(struct zxdh_pf_pci_cap, length), &length);
+
+	if (bar_off)
+		*bar_off = offset;
+
+	if (length <= start) {
+		LOG_ERR("bad capability len %u (>%u expected)\n", length, start);
+		return NULL;
+	}
+
+	if (length - start < minlen) {
+		LOG_ERR("bad capability len %u (>=%zu expected)\n", length, minlen);
+		return NULL;
+	}
+
+	length -= start;
+	if (start + offset < offset) {
+		LOG_ERR("map wrap-around %u+%u\n", start, offset);
+		return NULL;
+	}
+
+	offset += start;
+	if (offset & (align - 1)) {
+		LOG_ERR("offset %u not aligned to %u\n", offset, align);
+		return NULL;
+	}
+
+	if (length > size)
+		length = size;
+
+	if (len)
+		*len = length;
+
+	if (minlen + offset < minlen || minlen + offset > pci_resource_len(pdev, bar)) {
+		LOG_ERR("map custom queue %zu@%u out of range on bar %i length %lu\n",
+			minlen, offset, bar, (unsigned long)pci_resource_len(pdev, bar));
+		return NULL;
+	}
+
+	p = pci_iomap_range(pdev, bar, offset, length);
+	if (unlikely(!p)) {
+		LOG_ERR("unable to map custom queue %u@%u on bar %i\n", length, offset, bar);
+	} else if (pa) {
+		*pa = pci_resource_start(pdev, bar) + offset;
+	}
+
+	return p;
+}
+
+int32_t zxdh_pf_common_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t common = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	/* check for a common config: if not, use legacy mode (bar 0). */
+	common = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_COMMON_CFG,
+					     IORESOURCE_IO | IORESOURCE_MEM,
+					     &pf_dev->modern_bars);
+	if (common == 0) {
+		LOG_ERR("missing capabilities %i, leaving for legacy driver\n", common);
+		return -ENODEV;
+	}
+
+	pf_dev->common = zxdh_pf_map_capability(dh_dev, common,
+						sizeof(struct zxdh_pf_pci_common_cfg),
+						ZXDH_PF_ALIGN4, 0,
+						sizeof(struct zxdh_pf_pci_common_cfg),
+						NULL, NULL, NULL);
+	if (unlikely(!pf_dev->common)) {
+		LOG_ERR("pf_dev->common is null\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int32_t zxdh_pf_notify_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t notify = 0;
+	uint32_t notify_length = 0;
+	uint32_t notify_offset = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	/* If common is there, these should be too... */
+	notify = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_NOTIFY_CFG,
+					     IORESOURCE_IO | IORESOURCE_MEM,
+					     &pf_dev->modern_bars);
+	if (notify == 0) {
+		LOG_ERR("missing capabilities %i\n", notify);
+		return -EINVAL;
+	}
+
+	pci_read_config_dword(pdev, notify + offsetof(struct zxdh_pf_pci_notify_cap,
+				notify_off_multiplier), &pf_dev->notify_offset_multiplier);
+	pci_read_config_dword(pdev, notify + offsetof(struct zxdh_pf_pci_notify_cap,
+				cap.length), &notify_length);
+	pci_read_config_dword(pdev, notify + offsetof(struct zxdh_pf_pci_notify_cap,
+				cap.offset), &notify_offset);
+
+	/* We don't know how many VQs we'll map ahead of time.
+	 * If notify length is small, map it all now. Otherwise, map each VQ individually later.
+	 */
+	if ((uint64_t)notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {
+		pf_dev->notify_base = zxdh_pf_map_capability(dh_dev, notify,
+							    ZXDH_PF_MAP_MINLEN2,
+							    ZXDH_PF_ALIGN2, 0,
+							    notify_length,
+							    &pf_dev->notify_len,
+							    &pf_dev->notify_pa, NULL);
+		if (unlikely(!pf_dev->notify_base)) {
+			LOG_ERR("pf_dev->notify_base is null\n");
+			return -EINVAL;
+		}
+	} else {
+		pf_dev->notify_map_cap = notify;
+	}
+
+	return 0;
+}
+
+int32_t zxdh_pf_device_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t device = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	/* Device capability is only mandatory for devices that have device-specific configuration. */
+	device = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_DEVICE_CFG,
+					     IORESOURCE_IO | IORESOURCE_MEM,
+					     &pf_dev->modern_bars);
+
+	/* we don't know how much we should map, but PAGE_SIZE is more than enough for all existing devices. */
+	if (device) {
+		pf_dev->device = zxdh_pf_map_capability(dh_dev, device, 0,
+						       ZXDH_PF_ALIGN4, 0, PAGE_SIZE,
+						       &pf_dev->device_len, NULL,
+						       &pf_dev->dev_cfg_bar_off);
+		if (unlikely(!pf_dev->device)) {
+			LOG_ERR("pf_dev->device is null\n");
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+void zxdh_pf_modern_cfg_uninit(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	if (pf_dev->device)
+		pci_iounmap(pdev, pf_dev->device);
+	if (pf_dev->notify_base)
+		pci_iounmap(pdev, pf_dev->notify_base);
+	pci_iounmap(pdev, pf_dev->common);
+}
+
+int32_t zxdh_pf_modern_cfg_init(struct dh_core_dev *dh_dev)
+{
+	int32_t ret = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	struct pci_dev *pdev = dh_dev->pdev;
+
+	ret = zxdh_pf_common_cfg_init(dh_dev);
+	if (ret != 0) {
+		LOG_ERR("zxdh_pf_common_cfg_init failed: %d\n", ret);
+		return -EINVAL;
+	}
+
+	ret = zxdh_pf_notify_cfg_init(dh_dev);
+	if (ret != 0) {
+		LOG_ERR("zxdh_pf_notify_cfg_init failed: %d\n", ret);
+		goto err_map_notify;
+	}
+
+	ret = zxdh_pf_device_cfg_init(dh_dev);
+	if (ret != 0) {
+		LOG_ERR("zxdh_pf_device_cfg_init failed: %d\n", ret);
+		goto err_map_device;
+	}
+
+	return 0;
+
+err_map_device:
+	if (pf_dev->notify_base)
+		pci_iounmap(pdev, pf_dev->notify_base);
+err_map_notify:
+	pci_iounmap(pdev, pf_dev->common);
+	return -EINVAL;
+}
+
+uint16_t zxdh_pf_get_queue_notify_off(struct dh_core_dev *dh_dev,
+				      uint16_t phy_index, uint16_t index)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	if (pf_dev->packed_status)
+		iowrite16(phy_index, &pf_dev->common->queue_select);
+	else
+		iowrite16(index, &pf_dev->common->queue_select);
+
+	return ioread16(&pf_dev->common->queue_notify_off);
+}
+
+void __iomem *zxdh_pf_map_vq_notify(struct dh_core_dev *dh_dev,
+				     uint16_t phy_index, uint16_t index,
+				     resource_size_t *pa)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	uint16_t off = 0;
+
+	off = zxdh_pf_get_queue_notify_off(dh_dev, phy_index, index);
+
+	if (pf_dev->notify_base) {
+		/* offset should not wrap */
+		if ((uint64_t)off * pf_dev->notify_offset_multiplier + 2 > pf_dev->notify_len) {
+			LOG_ERR("bad notification offset %u (x %u) for queue %u > %zu\n",
+				off, pf_dev->notify_offset_multiplier, phy_index,
+				pf_dev->notify_len);
+			return NULL;
+		}
+
+		if (pa)
+			*pa = pf_dev->notify_pa + off * pf_dev->notify_offset_multiplier;
+
+		return pf_dev->notify_base + off * pf_dev->notify_offset_multiplier;
+	} else {
+		return zxdh_pf_map_capability(dh_dev, pf_dev->notify_map_cap, 2, 2,
+					      off * pf_dev->notify_offset_multiplier,
+					      2, NULL, pa, NULL);
+	}
+}
+
+void zxdh_pf_unmap_vq_notify(struct dh_core_dev *dh_dev, void __iomem *priv)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	if (!pf_dev->notify_base)
+		pci_iounmap(dh_dev->pdev, priv);
+}
+
+void zxdh_pf_set_status(struct dh_core_dev *dh_dev, uint8_t status)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	iowrite8(status, &pf_dev->common->device_status);
+}
+
+uint8_t zxdh_pf_get_status(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	return ioread8(&pf_dev->common->device_status);
+}
+
+static uint8_t zxdh_pf_get_cfg_gen(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	uint8_t config_generation = 0;
+
+	config_generation = ioread8(&pf_dev->common->config_generation);
+	LOG_INFO("config_generation is %d\n", config_generation);
+
+	return config_generation;
+}
+
+void zxdh_pf_get_vf_mac(struct dh_core_dev *dh_dev, uint8_t *mac, int32_t vf_id)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	if (pf_dev->pf_sriov_cap_base) {
+		DEV_MAC_L = ioread32((void __iomem *)(pf_dev->pf_sriov_cap_base +
+				     (pf_dev->sriov_bar_size) * vf_id +
+				     pf_dev->dev_cfg_bar_off));
+		mac[0] = DEV_MAC_L & 0xff;
+		mac[1] = (DEV_MAC_L >> 8) & 0xff;
+		mac[2] = (DEV_MAC_L >> 16) & 0xff;
+		mac[3] = (DEV_MAC_L >> 24) & 0xff;
+		DEV_MAC_H = ioread16((void __iomem *)(pf_dev->pf_sriov_cap_base +
+				      (pf_dev->sriov_bar_size) * vf_id +
+				      pf_dev->dev_cfg_bar_off +
+				      ZXDH_DEV_MAC_HIGH_OFFSET));
+		mac[4] = DEV_MAC_H & 0xff;
+		mac[5] = (DEV_MAC_H >> 8) & 0xff;
+	}
+}
+
+void zxdh_pf_set_vf_mac_reg(struct zxdh_pf_device *pf_dev, uint8_t *mac, int32_t vf_id)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+
+	if (pf_dev->pf_sriov_cap_base) {
+		DEV_MAC_L = mac[0] | (mac[1] << 8) | (mac[2] << 16) | ((uint32_t)mac[3] << 24);
+		DEV_MAC_H = mac[4] | (mac[5] << 8);
+		iowrite32(DEV_MAC_L, (void __iomem *)(pf_dev->pf_sriov_cap_base +
+			  (pf_dev->sriov_bar_size) * vf_id +
+			  pf_dev->dev_cfg_bar_off));
+		iowrite16(DEV_MAC_H, (void __iomem *)(pf_dev->pf_sriov_cap_base +
+			  (pf_dev->sriov_bar_size) * vf_id +
+			  pf_dev->dev_cfg_bar_off +
+			  ZXDH_DEV_MAC_HIGH_OFFSET));
+	}
+}
+
+void zxdh_pf_set_vf_mac(struct dh_core_dev *dh_dev, uint8_t *mac, int32_t vf_id)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	zxdh_pf_set_vf_mac_reg(pf_dev, mac, vf_id);
+}
+
+void zxdh_set_mac(struct dh_core_dev *dh_dev, uint8_t *mac)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	DEV_MAC_L = mac[0] | (mac[1] << 8) | (mac[2] << 16) | ((uint32_t)mac[3] << 24);
+	DEV_MAC_H = mac[4] | (mac[5] << 8);
+	iowrite32(DEV_MAC_L, pf_dev->device);
+	iowrite16(DEV_MAC_H, (void __iomem *)((uint8_t *)pf_dev->device +
+			  ZXDH_DEV_MAC_HIGH_OFFSET));
+}
+
+void zxdh_get_mac(struct dh_core_dev *dh_dev, uint8_t *mac)
+{
+	uint32_t DEV_MAC_L = 0;
+	uint16_t DEV_MAC_H = 0;
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	DEV_MAC_L = ioread32(pf_dev->device);
+	mac[0] = DEV_MAC_L & 0xff;
+	mac[1] = (DEV_MAC_L >> 8) & 0xff;
+	mac[2] = (DEV_MAC_L >> 16) & 0xff;
+	mac[3] = (DEV_MAC_L >> 24) & 0xff;
+	DEV_MAC_H = ioread16((void __iomem *)((uint8_t *)pf_dev->device +
+			  ZXDH_DEV_MAC_HIGH_OFFSET));
+	mac[4] = DEV_MAC_H & 0xff;
+	mac[5] = (DEV_MAC_H >> 8) & 0xff;
+}
+
+uint64_t zxdh_pf_get_features(struct dh_core_dev *dh_dev)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+	uint64_t device_feature = 0;
+
+	iowrite32(0, &pf_dev->common->device_feature_select);
+	device_feature = ioread32(&pf_dev->common->device_feature);
+	iowrite32(1, &pf_dev->common->device_feature_select);
+	device_feature |= ((uint64_t)ioread32(&pf_dev->common->device_feature) << 32);
+
+	return device_feature;
+}
+
+void zxdh_pf_set_features(struct dh_core_dev *dh_dev, uint64_t features)
+{
+	struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
+
+	iowrite32(0, &pf_dev->common->guest_feature_select);
+	iowrite32((uint32_t)features, &pf_dev->common->guest_feature);
+	iowrite32(1, &pf_dev->common->guest_feature_select);
+	iowrite32(features >> 32, &pf_dev->common->guest_feature);
+}
+
 static int dh_pf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct dh_core_dev *dh_dev = NULL;
diff --git a/drivers/net/ethernet/zte/dinghai/en_pf.h b/drivers/net/ethernet/zte/dinghai/en_pf.h
index 0d3880b0aede..197b21788576 100644
--- a/drivers/net/ethernet/zte/dinghai/en_pf.h
+++ b/drivers/net/ethernet/zte/dinghai/en_pf.h
@@ -10,11 +10,31 @@
 #include <linux/types.h>
 #include <linux/pci.h>
 #include <linux/mutex.h>
+#include "dh_log.h"
+#include "dh_queue.h"
 
 #define ZXDH_PF_VENDOR_ID	0x1cf2
 #define ZXDH_PF_DEVICE_ID	0x8040
 #define ZXDH_VF_DEVICE_ID	0x8041
 
+/* Common configuration */
+#define ZXDH_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define ZXDH_PCI_CAP_NOTIFY_CFG	2
+/* ISR access */
+#define ZXDH_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define ZXDH_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define ZXDH_PCI_CAP_PCI_CFG		5
+
+#define ZXDH_PF_MAX_BAR_VAL		0x5
+#define ZXDH_PF_ALIGN4			4
+#define ZXDH_PF_ALIGN2			2
+#define ZXDH_PF_MAP_MINLEN2		2
+
+#define ZXDH_DEV_MAC_HIGH_OFFSET	4
+
 enum dh_coredev_type {
 	DH_COREDEV_PF,
 	DH_COREDEV_VF,
@@ -34,6 +54,27 @@ struct dh_core_dev {
 };
 
 struct zxdh_pf_device {
+	struct zxdh_pf_pci_common_cfg __iomem *common;
+	/* Device-specific data (non-legacy mode). */
+	void __iomem *device;
+	/* Base of vq notifications (non-legacy mode). */
+	void __iomem *notify_base;
+	void __iomem *pf_sriov_cap_base;
+	/* Physical base of vq notifications */
+	resource_size_t notify_pa;
+	/* So we can sanity-check accesses. */
+	size_t notify_len;
+	size_t device_len;
+	/* Capability for when we need to map notifications per-vq. */
+	int32_t notify_map_cap;
+	/* Multiply queue_notify_off by this value (non-legacy mode). */
+	uint32_t notify_offset_multiplier;
+	int32_t modern_bars;
+
+	uint64_t pci_ioremap_addr[6];
+	uint64_t sriov_bar_size;
+	uint32_t dev_cfg_bar_off;
+	bool packed_status;
 	bool bar_chan_valid;
 	bool vepa;
 	struct mutex irq_lock; /* Protects IRQ operations */
-- 
2.43.0


* [PATCH net v6 1/2] flow_dissector: do not dissect PPPoE PFC frames
From: Qingfang Deng @ 2026-04-15  2:24 UTC (permalink / raw)
  To: linux-ppp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Qingfang Deng, Guillaume Nault,
	Wojciech Drewek, Tony Nguyen, netdev, linux-kernel
  Cc: Paul Mackerras, Jaco Kroon, James Carlson, Marcin Szycik

RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
PFC for PPPoE sessions, and the flow dissector assumed an uncompressed
frame until the blamed commit.

During the review process of that commit [1], support for PFC was
suggested. However, having a compressed (1-byte) protocol field means
the subsequent PPP payload is shifted by one byte, causing 4-byte
misalignment for the network header and an unaligned access exception
on some architectures.
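
The misalignment can be checked with simple offset arithmetic. The
sketch below is illustrative only: l3_misalign() is a hypothetical
helper, and the 2-byte RX padding (ASSUMED_NET_IP_ALIGN) is an
assumption about how the receiving driver lays out its buffers.

```c
#define ETH_HLEN_BYTES       14 /* Ethernet header */
#define PPPOE_HDR_BYTES       6 /* ver/type, code, session id, length */
#define ASSUMED_NET_IP_ALIGN  2 /* assumption: RX buffer padding */

/*
 * Offset of the network header modulo 4 for a given PPP protocol field
 * size (2 bytes uncompressed, 1 byte with PFC). 0 means 4-byte aligned.
 */
static unsigned int l3_misalign(unsigned int ppp_proto_bytes)
{
	unsigned int off = ASSUMED_NET_IP_ALIGN + ETH_HLEN_BYTES +
			   PPPOE_HDR_BYTES + ppp_proto_bytes;

	return off % 4;
}
```

Under these assumptions l3_misalign(2) is 0 and l3_misalign(1) is 3, so
a PFC frame places the IP header on an odd address. The BadVA in the
trace below (85144817) is likewise not a multiple of 4.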

The exception can be reproduced by sending a PPPoE PFC frame to an
ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE
session is active on that interface:

$ 0   : 00000000 80c40000 00000000 85144817
$ 4   : 00000008 00000100 80a75758 81dc9bb8
$ 8   : 00000010 8087ae2c 0000003d 00000000
$12   : 000000e0 00000039 00000000 00000000
$16   : 85043240 80a75758 81dc9bb8 00006488
$20   : 0000002f 00000007 85144810 80a70000
$24   : 81d1bda0 00000000
$28   : 81dc8000 81dc9aa8 00000000 805ead08
Hi    : 00009d51
Lo    : 2163358a
epc   : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50
ra    : 805ead08 __skb_get_hash_net+0x74/0x12c
Status: 11000403        KERNEL EXL IE
Cause : 40800010 (ExcCode 04)
BadVA : 85144817
PrId  : 0001992f (MIPS 1004Kc)
Call Trace:
[<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50
[<805ead08>] __skb_get_hash_net+0x74/0x12c
[<805ef330>] get_rps_cpu+0x1b8/0x3fc
[<805fca70>] netif_receive_skb_list_internal+0x324/0x364
[<805fd120>] napi_complete_done+0x68/0x2a4
[<8058de5c>] mtk_napi_rx+0x228/0xfec
[<805fd398>] __napi_poll+0x3c/0x1c4
[<805fd754>] napi_threaded_poll_loop+0x234/0x29c
[<805fd848>] napi_threaded_poll+0x8c/0xb0
[<80053544>] kthread+0x104/0x12c
[<80002bd8>] ret_from_kernel_thread+0x14/0x1c

Code: 02d51821  1060045b  00000000 <8c640000> 3084000f  2c820005  144001a2  00042080  8e220000

To reduce the attack surface and maintain performance, do not process
PPPoE PFC frames.
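
A user-space sketch of why no explicit PFC branch is needed: the two
bytes following the PPPoE header are read as a big-endian u16, and a
compressed field sets bit 0 of the first byte, which the existing
validity mask rejects. proto_is_valid() copies the expression from
ppp_proto_is_valid(); read_proto() is a hypothetical stand-in for
ntohs(hdr->proto).

```c
#include <stdbool.h>
#include <stdint.h>

/* Same test as ppp_proto_is_valid() in include/linux/ppp_defs.h. */
static bool proto_is_valid(uint16_t proto)
{
	return (proto & 0x0101) == 0x0001;
}

/* Big-endian read of the two bytes where the protocol field is expected. */
static uint16_t read_proto(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}
```

For an uncompressed IPv4 frame the bytes are 00 21, giving 0x0021, which
is valid. With PFC the bytes are 21 45 (protocol 0x21 followed by the
first IPv4 header byte), giving 0x2145, which fails the mask.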

[1] https://lore.kernel.org/r/20220630231016.GA392@debian.home
Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors")
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
---
Changes in v6: document that PFC is rejected by ppp_proto_is_valid().
 Link to v5: https://lore.kernel.org/r/20260414021353.23471-1-qingfang.deng@linux.dev

 net/core/flow_dissector.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1b61bb25ba0e..c471559db4d9 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1374,16 +1374,13 @@ bool __skb_flow_dissect(const struct net *net,
 			break;
 		}
 
-		/* least significant bit of the most significant octet
-		 * indicates if protocol field was compressed
+		/* PFC (compressed 1-byte protocol) frames are not processed.
+		 * A compressed protocol field has the least significant bit of
+		 * the most significant octet set, which will fail the following
+		 * ppp_proto_is_valid(), returning FLOW_DISSECT_RET_OUT_BAD.
 		 */
 		ppp_proto = ntohs(hdr->proto);
-		if (ppp_proto & 0x0100) {
-			ppp_proto = ppp_proto >> 8;
-			nhoff += PPPOE_SES_HLEN - 1;
-		} else {
-			nhoff += PPPOE_SES_HLEN;
-		}
+		nhoff += PPPOE_SES_HLEN;
 
 		if (ppp_proto == PPP_IP) {
 			proto = htons(ETH_P_IP);
-- 
2.43.0



* [PATCH net v6 2/2] pppoe: drop PFC frames
From: Qingfang Deng @ 2026-04-15  2:24 UTC (permalink / raw)
  To: linux-ppp, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Qingfang Deng, Taegu Ha,
	Simon Horman, Kees Cook, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Guillaume Nault, Eric Woudstra,
	Sam Protsenko, netdev, linux-kernel
  Cc: Paul Mackerras, Jaco Kroon, James Carlson, Wojciech Drewek,
	Marcin Szycik
In-Reply-To: <20260415022456.141758-1-qingfang.deng@linux.dev>

RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
PFC for PPPoE sessions, and the current PPPoE driver assumes an
uncompressed (2-byte) protocol field. However, the generic PPP layer
function ppp_input() is not aware of the negotiation result, and still
accepts PFC frames.

If a peer with a broken implementation or an attacker sends a frame with
a compressed (1-byte) protocol field, the subsequent PPP payload is
shifted by one byte. This causes the network header to be 4-byte
misaligned, which may trigger unaligned access exceptions on some
architectures.

To reduce the attack surface, drop PPPoE PFC frames. Introduce the
ppp_skb_is_compressed_proto() helper and use it in both ppp_generic.c
and pppoe.c to avoid open-coding the check.
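
The difference between the two callers can be modelled in user space as
follows. This is only a sketch on plain byte buffers, not skbs:
is_compressed_proto(), decompress_proto() and pppoe_accepts() are
hypothetical names, and decompress_proto() copies into a caller-supplied
buffer where the kernel uses skb_push().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit 0 of the first byte set means a 1-byte (compressed) protocol. */
static bool is_compressed_proto(const uint8_t *data)
{
	return (data[0] & 0x01) != 0;
}

/*
 * Generic PPP path: expand a compressed protocol field by prepending
 * 0x00. out must have room for len + 1 bytes. Returns the new length.
 */
static size_t decompress_proto(const uint8_t *in, size_t len, uint8_t *out)
{
	size_t n = 0;

	if (is_compressed_proto(in))
		out[n++] = 0x00;
	memcpy(out + n, in, len);
	return n + len;
}

/* PPPoE path with this patch: a compressed frame is dropped instead. */
static bool pppoe_accepts(const uint8_t *data)
{
	return !is_compressed_proto(data);
}
```

The generic layer still normalises 21 45 to 00 21 45, while the PPPoE
receive path rejects the same bytes before they reach it.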

Fixes: 7fb1b8ca8fa1 ("ppp: Move PFC decompression to PPP generic layer")
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changes in v6: none
 Link to v5: https://lore.kernel.org/r/20260414021353.23471-2-qingfang.deng@linux.dev

 drivers/net/ppp/ppp_generic.c |  2 +-
 drivers/net/ppp/pppoe.c       |  8 +++++++-
 include/linux/ppp_defs.h      | 16 ++++++++++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index b0d3bc49c685..57c68efa5ff8 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2245,7 +2245,7 @@ ppp_do_recv(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
  */
 static void __ppp_decompress_proto(struct sk_buff *skb)
 {
-	if (skb->data[0] & 0x01)
+	if (ppp_skb_is_compressed_proto(skb))
 		*(u8 *)skb_push(skb, 1) = 0x00;
 }
 
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index d546a7af0d54..bdd61c504a1c 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -393,7 +393,7 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (skb_mac_header_len(skb) < ETH_HLEN)
 		goto drop;
 
-	if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr)))
+	if (!pskb_may_pull(skb, PPPOE_SES_HLEN))
 		goto drop;
 
 	ph = pppoe_hdr(skb);
@@ -403,6 +403,12 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (skb->len < len)
 		goto drop;
 
+	/* skb->data points to the PPP protocol header after skb_pull_rcsum.
+	 * Drop PFC frames.
+	 */
+	if (ppp_skb_is_compressed_proto(skb))
+		goto drop;
+
 	if (pskb_trim_rcsum(skb, len))
 		goto drop;
 
diff --git a/include/linux/ppp_defs.h b/include/linux/ppp_defs.h
index b7e57fdbd413..b1d1f46d7d3b 100644
--- a/include/linux/ppp_defs.h
+++ b/include/linux/ppp_defs.h
@@ -8,6 +8,7 @@
 #define _PPP_DEFS_H_
 
 #include <linux/crc-ccitt.h>
+#include <linux/skbuff.h>
 #include <uapi/linux/ppp_defs.h>
 
 #define PPP_FCS(fcs, c) crc_ccitt_byte(fcs, c)
@@ -25,4 +26,19 @@ static inline bool ppp_proto_is_valid(u16 proto)
 	return !!((proto & 0x0101) == 0x0001);
 }
 
+/**
+ * ppp_skb_is_compressed_proto - checks if PPP protocol in a skb is compressed
+ * @skb: skb to check
+ *
+ * Check if the PPP protocol field is compressed (the least significant
+ * bit of the most significant octet is 1). skb->data must point to the PPP
+ * protocol header.
+ *
+ * Return: Whether the PPP protocol field is compressed.
+ */
+static inline bool ppp_skb_is_compressed_proto(const struct sk_buff *skb)
+{
+	return unlikely(skb->data[0] & 0x01);
+}
+
 #endif /* _PPP_DEFS_H_ */
-- 
2.43.0


^ permalink raw reply related
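
For readers following the PFC check in the patch above, here is a minimal userspace sketch of the same test. It assumes a plain byte buffer in place of an skb; the names ppp_buf_is_compressed_proto() and ppp_proto_field_len() are hypothetical and not part of the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace analogue of the helper in the patch. The buffer must start
 * at the PPP protocol field and hold at least one byte. */
static bool ppp_buf_is_compressed_proto(const uint8_t *data, size_t len)
{
	assert(len >= 1);
	/* An uncompressed protocol number has an even first octet. An odd
	 * first octet means the leading 0x00 was elided (PFC). */
	return data[0] & 0x01;
}

/* Size in bytes of the protocol field as it appears on the wire. */
static size_t ppp_proto_field_len(const uint8_t *data, size_t len)
{
	return ppp_buf_is_compressed_proto(data, len) ? 1 : 2;
}
```

An IPv4 frame normally begins with the protocol bytes 0x00 0x21. With PFC the leading 0x00 is dropped, so the frame begins with 0x21 and the payload starts one byte earlier than a driver expecting a 2-byte field assumes.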

* Re: [PATCH net-next v2 2/2] KEYS: annotate struct user_key_payload with __counted_by
From: Jarkko Sakkinen @ 2026-04-15  2:25 UTC (permalink / raw)
  To: Thorsten Blum
  Cc: David Howells, Kees Cook, Gustavo A. R. Silva, netdev, keyrings,
	linux-kernel, linux-hardening
In-Reply-To: <20260409225703.158552-7-thorsten.blum@linux.dev>

On Fri, Apr 10, 2026 at 12:57:05AM +0200, Thorsten Blum wrote:
> Add the __counted_by() compiler attribute to the flexible array member
> 'data' to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
> CONFIG_FORTIFY_SOURCE.
> 
> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
> ---
> Changes in v2:
> - Use __aligned(8) as suggested by David
> - v1: https://lore.kernel.org/lkml/20260409073711.57020-6-thorsten.blum@linux.dev/
> 
> Cc: netdev@vger.kernel.org
> ---
>  include/keys/user-type.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/keys/user-type.h b/include/keys/user-type.h
> index 386c31432789..c29ed9f5d300 100644
> --- a/include/keys/user-type.h
> +++ b/include/keys/user-type.h
> @@ -27,7 +27,8 @@
>  struct user_key_payload {
>  	struct rcu_head	rcu;		/* RCU destructor */
>  	unsigned short	datalen;	/* length of this data */
> -	char		data[] __aligned(__alignof__(u64)); /* actual data */
> +	char		data[]		/* actual data */
> +			__aligned(8) __counted_by(datalen);
>  };
>  
>  extern struct key_type key_type_user;

The change is acceptable, but I see no point in splitting the declaration
into two lines (as long as it passes the checkpatch.pl line-length check).

BR, Jarkko

^ permalink raw reply
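
As a rough illustration of the annotation discussed above, the following userspace sketch attaches a counted_by attribute to a flexible array member. The names demo_payload, COUNTED_BY and demo_payload_alloc() are hypothetical. The macro expands to nothing on compilers that lack the attribute, so this only approximates what the kernel's __counted_by() provides.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical stand-in for the kernel's __counted_by(). */
#if __has_attribute(counted_by)
#define COUNTED_BY(member) __attribute__((counted_by(member)))
#else
#define COUNTED_BY(member)
#endif

struct demo_payload {
	unsigned short datalen;	/* number of valid bytes in data[] */
	char data[] __attribute__((aligned(8))) COUNTED_BY(datalen);
};

/* Allocate a payload and copy len bytes from src into it. */
static struct demo_payload *demo_payload_alloc(const void *src, size_t len)
{
	struct demo_payload *p;

	assert(src != NULL);
	if (len > USHRT_MAX)
		return NULL;

	p = malloc(sizeof(*p) + len);
	if (!p)
		return NULL;

	/* Set the counter before touching data[], so bounds checks that
	 * rely on the annotation see the correct length. */
	p->datalen = (unsigned short)len;
	memcpy(p->data, src, len);
	return p;
}
```

The ordering inside demo_payload_alloc() is the part that matters: with the annotation in place, a sanitizer or fortified memcpy() derives the size of data[] from datalen, so the counter has to be assigned first.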

* Re: [PATCH bpf-next v2 2/3] bpf: Use kmalloc_nolock() universally in local storage
From: Alexei Starovoitov @ 2026-04-15  2:27 UTC (permalink / raw)
  To: Slava Imameev
  Cc: ameryhung, andrii, ast, bot+bpf-ci, bpf, clm, daniel, eddyz87,
	ihor.solodrai, kernel-team, martin.lau, memxor, netdev,
	yonghong.song, linux-open-source
In-Reply-To: <20260413034829.39307-1-slava.imameev@crowdstrike.com>

On Mon, Apr 13, 2026 at 01:48:29PM +1000, Slava Imameev wrote:
> On Fri, 10 Apr 2026 21:39:00 -0700 Alexei Starovoitov wrote:
> > >
> > >
> > > This allows value sizes up to ~65KB. Before this patch, socket and
> > > inode storage used bpf_map_kzalloc() (backed by regular kmalloc)
> > > which could handle those large sizes. After this patch, any
> > > elem_size above KMALLOC_MAX_CACHE_SIZE will silently fail: the map
> > > creation succeeds via bpf_local_storage_map_alloc_check() but every
> > > element allocation returns NULL.
> > >
> > > Should BPF_LOCAL_STORAGE_MAX_VALUE_SIZE be updated to use
> > > KMALLOC_MAX_CACHE_SIZE instead of KMALLOC_MAX_SIZE now that all
> > > storage types go through kmalloc_nolock()?
> > >
> > > Slava Imameev raised the same concern for task storage in
> > > https://lore.kernel.org/bpf/20260410014341.47043-1-slava.imameev@crowdstrike.com/
> > 
> > Right. Let's update it, but I don't think it's a regression.
> > On a loaded system kmalloc_large() rarely succeeds for order 2+.
> > That's why kmalloc_nolock() doesn't attempt to bridge that gap.
> > One or two contiguous physical pages is the best one can expect.
> > In early bpf days we picked KMALLOC_MAX_SIZE assuming that
> > it's a realistic max for kmalloc().
> > It turned out to be wishful thinking.
> > kmalloc_large concept should really be removed.
> > It deceives users into thinking that it's usable.
> 
> In defense of supporting 8KB-64KB allocations for local
> storage, we can consider BPF_MAP_TYPE_HASH with BPF_F_NO_PREALLOC
> as providing similar functionality to replace the missing 8KB-64KB
> local storage allocation support. However, these map entry
> allocations can also fail with similar probability since they
> depend on the same underlying allocator.

I really hope that 64 KB task local storage is not your production code.
Servers easily have 50k threads. Sometimes more.
64k * 50k = ~3 Gbytes of memory wasted.
You need to redesign it from the ground up.

^ permalink raw reply
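
The back-of-the-envelope figure in the reply above can be reproduced with a small sketch. It assumes one value per task and ignores per-element metadata and allocator rounding; local_storage_footprint() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes consumed when every task holds one value of value_size. */
static uint64_t local_storage_footprint(uint64_t value_size, uint64_t nr_tasks)
{
	/* Guard against overflow of the 64-bit product. */
	assert(nr_tasks == 0 || value_size <= UINT64_MAX / nr_tasks);
	return value_size * nr_tasks;
}
```

With value_size = 64 * 1024 and nr_tasks = 50000 the result is 3,276,800,000 bytes, roughly 3 GiB, which matches the estimate quoted in the message.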

