Netdev List
* Re: [PATCH v1 net-next 00/10] net: fib_rules: RTNL-less RTM_NEWRULE and RTM_DELRULE.
From: Kuniyuki Iwashima @ 2026-07-01 16:53 UTC
  To: Ido Schimmel
  Cc: David Ahern, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260701133858.GA1439085@shredder>

On Wed, Jul 1, 2026 at 6:39 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Mon, Jun 29, 2026 at 06:10:52PM +0000, Kuniyuki Iwashima wrote:
> > RTM_NEWRULE and RTM_DELRULE acquire rtnl_net_lock(), but this is
> > only for fib_unmerge() in IPv4.
> >
> > Since commit d954a67a7dfa ("ipv4: fib_rule: Move fib4_rules_exit()
> > to ->exit()."), RTM_DELRULE no longer needs RTNL.
> >
> > fib_unmerge() is one-time event for each netns, so we only need
> > RTNL for the first IPv4 rule.
> >
> > This series introduces per-fib_rules_ops mutex and drops RTNL
> > from fib_rules code except for the first IPv4 RTM_NEWRULE.
>
> LGTM, thanks:
>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Thanks Ido !

>
> A few nits that can be addressed in a follow-up:
>
> 1. Patch #3:
>
> The comment at the top of netns_ipv4 suggests that we should document
> the new lock in
> Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
>
> Related: Did you consider moving this lock under
> CONFIG_IP_MULTIPLE_TABLES?

I put it just after fib_table_hash but it can be moved there indeed.
I will follow up.

>
> 2. Patch #5:
>
> Sashiko suggests a mutex_destroy() in fib_rules_unregister():
>
> https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260629181226.1929658-1-kuniyu%40google.com?part=5

I had the impression that a lot of core code does not use mutex_destroy(),
but I didn't know mutex_destroy() is a no-op without a debug config.  Probably
it's a relatively new (I mean, not from the 90s) function and that's the reason?

I will follow up on this too :)
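A rough userspace analogue of the locking scheme discussed above may help when reviewing the follow-ups. It is a sketch under stated assumptions, not the fib_rules code: pthread mutexes stand in for the kernel mutex and for RTNL, and `struct rules_ops`, `big_lock`, `rules_add()` and `rules_del()` are hypothetical names. Each ops object has its own mutex, only the first add takes the global lock to run a one-time step standing in for fib_unmerge(), deletion never touches the global lock, and unregistering pairs the earlier init with a destroy, much as the mutex_destroy() suggestion for fib_rules_unregister() would.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL: one global lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-ops state, loosely modelled on struct fib_rules_ops. */
struct rules_ops {
	pthread_mutex_t lock;		/* per-ops lock protecting the rules */
	atomic_bool unmerged;		/* set once the one-time step has run */
	unsigned int unmerge_calls;	/* how often the one-time step ran */
	unsigned int nr_rules;
};

static int rules_ops_register(struct rules_ops *ops)
{
	atomic_init(&ops->unmerged, false);
	ops->unmerge_calls = 0;
	ops->nr_rules = 0;
	return pthread_mutex_init(&ops->lock, NULL);
}

/* Pair the init above with a destroy on teardown. */
static void rules_ops_unregister(struct rules_ops *ops)
{
	pthread_mutex_destroy(&ops->lock);
}

static int rules_add(struct rules_ops *ops)
{
	/* Only the first add takes the global lock, for the one-time step. */
	if (!atomic_load_explicit(&ops->unmerged, memory_order_acquire)) {
		pthread_mutex_lock(&big_lock);
		/* Re-check: another thread may have done it meanwhile. */
		if (!atomic_load_explicit(&ops->unmerged, memory_order_relaxed)) {
			ops->unmerge_calls++;	/* stand-in for fib_unmerge() */
			atomic_store_explicit(&ops->unmerged, true,
					      memory_order_release);
		}
		pthread_mutex_unlock(&big_lock);
	}

	pthread_mutex_lock(&ops->lock);
	assert(atomic_load(&ops->unmerged));	/* rules only after unmerge */
	ops->nr_rules++;
	pthread_mutex_unlock(&ops->lock);
	return 0;
}

/* Deleting a rule never needs the global lock. */
static int rules_del(struct rules_ops *ops)
{
	int err = 0;

	pthread_mutex_lock(&ops->lock);
	if (ops->nr_rules == 0)
		err = -ENOENT;
	else
		ops->nr_rules--;
	pthread_mutex_unlock(&ops->lock);
	return err;
}
```

The re-check of `unmerged` under the global lock keeps two concurrent first adds from running the one-time step twice, while later adds skip the global lock entirely. Unlike the kernel's mutex_destroy(), which compiles to nothing without CONFIG_DEBUG_MUTEXES, pthread_mutex_destroy() may do real work, so the analogy is only about pairing init with teardown.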


* Re: [PATCH 4/4] sfc: use kmalloc() to allocate logging buffer
From: Edward Cree @ 2026-07-01 16:59 UTC
  To: Mike Rapoport (Microsoft), Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Manish Chopra, Paolo Abeni
  Cc: Przemek Kitszel, Sudarsana Kalluru, Tony Nguyen, intel-wired-lan,
	linux-kernel, linux-mm, linux-net-drivers, netdev
In-Reply-To: <20260701-b4-drivers-ethernet-v1-4-58776615db6e@kernel.org>

On 01/07/2026 14:57, Mike Rapoport (Microsoft) wrote:
> efx_mcdi_init() allocates a logging buffer for MCDI firmware
> communication diagnostics.
> 
> This buffer can be allocated with kmalloc() as there's nothing special
> about it to go directly to the page allocator.
> 
> kmalloc() provides a better API that does not require ugly casts and
> kfree() does not need to know the size of the freed object.
> 
> Performance difference between kmalloc() and __get_free_pages() is not
> measurable as both allocators take an object/page from a per-CPU list for
> fast path allocations.
> 
> For the slow path the performance is anyway determined by the amount of
> reclaim involved rather than by what allocator is used.
> 
> Replace use of __get_free_page() with kmalloc() and free_page() with
> kfree().
> 
> Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>

> ---
>  drivers/net/ethernet/sfc/mcdi.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
> index e65db9b70724..b806d3d90c42 100644
> --- a/drivers/net/ethernet/sfc/mcdi.c
> +++ b/drivers/net/ethernet/sfc/mcdi.c
> @@ -7,6 +7,7 @@
>  #include <linux/delay.h>
>  #include <linux/moduleparam.h>
>  #include <linux/atomic.h>
> +#include <linux/slab.h>
>  #include "net_driver.h"
>  #include "nic.h"
>  #include "io.h"
> @@ -71,7 +72,7 @@ int efx_mcdi_init(struct efx_nic *efx)
>  	mcdi->efx = efx;
>  #ifdef CONFIG_SFC_MCDI_LOGGING
>  	/* consuming code assumes buffer is page-sized */
> -	mcdi->logging_buffer = (char *)__get_free_page(GFP_KERNEL);
> +	mcdi->logging_buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
>  	if (!mcdi->logging_buffer)
>  		goto fail1;
>  	mcdi->logging_enabled = mcdi_logging_default;
> @@ -112,7 +113,7 @@ int efx_mcdi_init(struct efx_nic *efx)
>  	return 0;
>  fail2:
>  #ifdef CONFIG_SFC_MCDI_LOGGING
> -	free_page((unsigned long)mcdi->logging_buffer);
> +	kfree(mcdi->logging_buffer);
>  fail1:
>  #endif
>  	kfree(efx->mcdi);
> @@ -138,7 +139,7 @@ void efx_mcdi_fini(struct efx_nic *efx)
>  		return;
>  
>  #ifdef CONFIG_SFC_MCDI_LOGGING
> -	free_page((unsigned long)efx->mcdi->iface.logging_buffer);
> +	kfree(efx->mcdi->iface.logging_buffer);
>  #endif
>  
>  	kfree(efx->mcdi);
> 



* Re: [PATCH 3/4] sfc/siena: use kmalloc() to allocate logging buffer
From: Edward Cree @ 2026-07-01 17:01 UTC
  To: Mike Rapoport (Microsoft), Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Manish Chopra, Paolo Abeni
  Cc: Przemek Kitszel, Sudarsana Kalluru, Tony Nguyen, intel-wired-lan,
	linux-kernel, linux-mm, linux-net-drivers, netdev
In-Reply-To: <20260701-b4-drivers-ethernet-v1-3-58776615db6e@kernel.org>

On 01/07/2026 14:57, Mike Rapoport (Microsoft) wrote:
> efx_siena_mcdi_init() allocates a logging buffer for MCDI firmware
> communication diagnostics.
> 
> This buffer can be allocated with kmalloc() as there's nothing special
> about it to go directly to the page allocator.
> 
> kmalloc() provides a better API that does not require ugly casts and
> kfree() does not need to know the size of the freed object.
> 
> Performance difference between kmalloc() and __get_free_pages() is not
> measurable as both allocators take an object/page from a per-CPU list for
> fast path allocations.
> 
> For the slow path the performance is anyway determined by the amount of
> reclaim involved rather than by what allocator is used.
> 
> Replace use of __get_free_page() with kmalloc() and free_page() with
> kfree().
> 
> Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>

(resending since I hit 'reply' instead of 'reply all' the first time)

> ---
>  drivers/net/ethernet/sfc/siena/mcdi.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/siena/mcdi.c b/drivers/net/ethernet/sfc/siena/mcdi.c
> index 4d0d6bd5d3d1..048c1e6017c0 100644
> --- a/drivers/net/ethernet/sfc/siena/mcdi.c
> +++ b/drivers/net/ethernet/sfc/siena/mcdi.c
> @@ -7,6 +7,7 @@
>  #include <linux/delay.h>
>  #include <linux/moduleparam.h>
>  #include <linux/atomic.h>
> +#include <linux/slab.h>
>  #include "net_driver.h"
>  #include "nic.h"
>  #include "io.h"
> @@ -73,7 +74,7 @@ int efx_siena_mcdi_init(struct efx_nic *efx)
>  	mcdi->efx = efx;
>  #ifdef CONFIG_SFC_SIENA_MCDI_LOGGING
>  	/* consuming code assumes buffer is page-sized */
> -	mcdi->logging_buffer = (char *)__get_free_page(GFP_KERNEL);
> +	mcdi->logging_buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
>  	if (!mcdi->logging_buffer)
>  		goto fail1;
>  	mcdi->logging_enabled = efx_siena_mcdi_logging_default;
> @@ -116,7 +117,7 @@ int efx_siena_mcdi_init(struct efx_nic *efx)
>  	return 0;
>  fail2:
>  #ifdef CONFIG_SFC_SIENA_MCDI_LOGGING
> -	free_page((unsigned long)mcdi->logging_buffer);
> +	kfree(mcdi->logging_buffer);
>  fail1:
>  #endif
>  	kfree(efx->mcdi);
> @@ -142,7 +143,7 @@ void efx_siena_mcdi_fini(struct efx_nic *efx)
>  		return;
>  
>  #ifdef CONFIG_SFC_SIENA_MCDI_LOGGING
> -	free_page((unsigned long)efx->mcdi->iface.logging_buffer);
> +	kfree(efx->mcdi->iface.logging_buffer);
>  #endif
>  
>  	kfree(efx->mcdi);
> 


* Re: [PATCH 5/9] ax88179_178a: Add support for ethtool pause parameter configuration
From: Andrew Lunn @ 2026-07-01 17:05 UTC
  To: Birger Koblitz
  Cc: Maxime Chevallier, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-usb, netdev, linux-kernel
In-Reply-To: <d2edb164-91c8-40e4-a104-db690609cb57@birger-koblitz.de>

On Wed, Jul 01, 2026 at 06:22:31PM +0200, Birger Koblitz wrote:
> Hi Andrew,
> 
> thanks for reviewing this patch-series! I will answer to the other questions later,
> so that the answers stay together. But it is probably best if I give this answer
> immediately:
> On 7/1/26 17:08, Andrew Lunn wrote:
> > > > +static void ax88179a_get_pauseparam(struct net_device *net, struct ethtool_pauseparam *pause)
> > > > +	if (!(bmcr & BMCR_ANENABLE)) {
> > > > +		pause->autoneg = 0;
> > > > +		pause->rx_pause = 0;
> > > > +		pause->tx_pause = 0;
> > > The best way to have this correct is to use phylink, but for that you'd need to
> > > have a proper PHY driver instead of using the mii_ API here.
> > 
> > I said the same about one of the other patches.
> > 
> > Do we know what PHYs are being used? Can register 2 and 3 be read to
> > get the PHY IDs?
> > 
> > 	Andrew
> 
> I tested
>   id1 = ax88179_mdio_read(dev->net, dev->mii.phy_id, MII_PHYSID1);
>   id2 = ax88179_mdio_read(dev->net, dev->mii.phy_id, MII_PHYSID2);
> 
> and got:

Thanks for these numbers.

> Renkforce AX88179A: ID1 7c9f, ID2 7061
> Delock AX88279  ID1 03a2, ID2 a411

air_en8811h.c:#define EN8811H_PHY_ID		0x03a2a411

> UGreen AX88772D ID1 e65b, ID2 2c61
> TP-Link AX88179A ID1 e65b, ID2 2c61

The two ID registers contain part of an OUI, but it has some bits
missing. So it is not so easy to look it up.
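To illustrate the point about the missing OUI bits, the sketch below packs the two Clause 22 ID registers into one 32-bit value the way phylib does (PHYSID1 in the upper half) and splits out the fields as laid out in IEEE 802.3 clause 22.2.4.3.1: 22 of the 24 OUI bits, a 6-bit model number and a 4-bit revision. The helper names are hypothetical; in the kernel, phylib does this itself when it reads the ID and matches it against each driver's ID and mask. With the Delock values it reproduces the 0x03a2a411 that matches EN8811H_PHY_ID in air_en8811h.c.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Clause 22 PHY identifier (IEEE 802.3 22.2.4.3.1). */
struct phy_ident {
	uint32_t phy_id;	/* (PHYSID1 << 16) | PHYSID2 */
	uint32_t oui_bits;	/* OUI bits 3..24 (22 bits); bits 1..2 not stored */
	uint8_t model;		/* PHYSID2[9:4] */
	uint8_t revision;	/* PHYSID2[3:0] */
};

/* Hypothetical helper: decode the two raw ID register values. */
static struct phy_ident decode_phy_id(uint16_t id1, uint16_t id2)
{
	struct phy_ident p;

	p.phy_id = ((uint32_t)id1 << 16) | id2;
	p.oui_bits = p.phy_id >> 10;	/* top 22 bits carry the partial OUI */
	p.model = (id2 >> 4) & 0x3f;
	p.revision = id2 & 0xf;
	return p;
}

/* Hypothetical helper: compare two IDs while ignoring the revision nibble. */
static int phy_id_matches_model(uint32_t phy_id, uint32_t want)
{
	return (phy_id & 0xfffffff0u) == (want & 0xfffffff0u);
}
```

Andrew's caveat still applies: because OUI bits 1 and 2 are not stored, `oui_bits` cannot be looked up directly in the IEEE registry without reconstructing the full OUI first.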

However, anything using the MII framework basically assumes a very
simple PHY and only looks at the 802.3 defined registers. So the
genphy generic PHY driver might be sufficient for when there is not a
specific driver. A lot depends on how much extra code there is
accessing the PHY registers in the driver.

	 Andrew


* Re: [PATCH net 1/9] netfilter: nf_conntrack_expect: zero at allocation time
From: patchwork-bot+netdevbpf @ 2026-07-01 17:10 UTC
  To: Florian Westphal
  Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo
In-Reply-To: <20260630045243.2657-2-fw@strlen.de>

Hello:

This series was applied to netdev/net.git (main)
by Florian Westphal <fw@strlen.de>:

On Tue, 30 Jun 2026 06:52:35 +0200 you wrote:
> There are occasional LLM hints wrt. leaking uninitialized data to
> userspace via ctnetlink.  Just zero at allocation time,
> expectations are not frequently used these days.
> 
> Intentionally keeps _init as-is because we could theoretically
> support re-init, so add the missing exp->dir there.
> 
> [...]

Here is the summary with links:
  - [net,1/9] netfilter: nf_conntrack_expect: zero at allocation time
    https://git.kernel.org/netdev/net/c/241ccd2fed90
  - [net,2/9] netfilter: nft_set_pipapo: don't leak bad clone into future transaction
    https://git.kernel.org/netdev/net/c/47e65eff5069
  - [net,3/9] netfilter: ipset: fix race between dump and ip_set_list resize
    https://git.kernel.org/netdev/net/c/7cd9103283b2
  - [net,4/9] netfilter: nf_conntrack_sip: validate skb_dst() before accessing it
    https://git.kernel.org/netdev/net/c/e5e24a365a5e
  - [net,5/9] netfilter: nfnetlink_cthelper: cap to maximum number of expectation per master
    https://git.kernel.org/netdev/net/c/bf5355cfdede
  - [net,6/9] netfilter: nft_fib: reject fib expression on the netdev egress hook
    https://git.kernel.org/netdev/net/c/d07955dd34ec
  - [net,7/9] netfilter: nfnetlink_queue: restrict writes to network header
    https://git.kernel.org/netdev/net/c/54f34607d184
  - [net,8/9] netfilter: nftables: restrict linklayer and network header writes
    https://git.kernel.org/netdev/net/c/df07998dfd40
  - [net,9/9] netfilter: nftables: restrict checkum update offset
    https://git.kernel.org/netdev/net/c/c3716a3c4346

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* RE: Ethtool : PRBS feature
From: Das, Shubham @ 2026-07-01 17:10 UTC
  To: Andrew Lunn
  Cc: Alexander Duyck, Lee Trager, Maxime Chevallier,
	netdev@vger.kernel.org, mkubecek@suse.cz, D H, Siddaraju,
	Chintalapalle, Balaji, Lindberg, Magnus,
	niklas.damberg@ericsson.com, Wirandi, Jonas, Srinivasan, Vijay
In-Reply-To: <6e905826-779a-456d-a37d-7602a37ab6d7@lunn.ch>

> Sorry, but i could not implement that, in a sensible way, given its current
> specification.
> 
> I suppose i could simply flip the first `inject-error-count` bits, and make the rest of
> the stream perfect? I could also wait until the stop command is received, and
> then flip that many bits before i stop the stream? But none of these seem
> sensible.
> 
> Please make this specification have sufficient details, or references to 802.3, that
> you could give it to another engineer and get back a reasonable implementation,
> without having to answer any questions.

Andrew,

IEEE has clear documentation of the PRBS Receiver block, with the BER counter as its output.
Before performing the actual BER validation, it is common industry practice to introduce errors
to confirm that the checker is functional and identifies them accurately.

Similarly, in DATA mode, error injection is used to verify the FEC block
by ensuring that injected errors are detected and corrected as expected.

Updated description.

+        name: inject-error-count
+        type: u32
+        doc: |
+          Request the PHY to inject exactly this many bit errors into the
+          currently active test data stream.
+
+          This is a diagnostic tool used to validate that the far-end PRBS
+          checker or FEC decoder is functioning correctly. For example,
+          after enabling a PRBS pattern and confirming ber-lock-status is
+          locked, injecting N errors should cause ber-error-count to
+          increment by exactly N on the receiving port, confirming the
+          checker is actively detecting bit errors. Similarly, in normal
+          data mode with FEC enabled, injecting errors verifies that the
+          FEC block detects errors as expected.

> > +      name: phy-test-act
> > +      doc: |
> > +        Configure PHY test parameters. Each attribute is optional and only
> > +        specified attributes are applied. TX/RX patterns are set on the
> > +        local port. BERT and error injection operate on the receiver port.
> 
> Error injection operates on the receive port? That is not what i expected. I should
> go read 802.3, and understand how this is used.

Thanks for the correction; it is in the transmit direction. Updated description:

+      name: phy-test-act
+      doc: |
+        Configure PHY test parameters. Each attribute is optional and only
+        specified attributes are applied. TX/RX patterns are set on the
+        local port. BERT operates on the receiver port, while errors
+        are injected through the PRBS/DATA transmission port. 
+        When bert-action is stats, a reply with BERT counters is returned.
+        Typical workflow:
+          ethtool --phy-test eth1 tx-pattern prbs7  (TX side)
+          ethtool --phy-test eth2 rx-pattern prbs7  (RX side)
+          ethtool --phy-test eth2 bert start        (start BERT on RX)
+          ethtool --phy-test eth2 bert stats        (read counters)
+          ethtool --phy-test eth2 bert stop         (stop BERT)

- Shubham D


> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 29 June 2026 22:27
> To: Das, Shubham <shubham.das@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Lee Trager <lee@trager.us>;
> Maxime Chevallier <maxime.chevallier@bootlin.com>; netdev@vger.kernel.org;
> mkubecek@suse.cz; D H, Siddaraju <siddaraju.dh@intel.com>; Chintalapalle,
> Balaji <balaji.chintalapalle@intel.com>; Lindberg, Magnus
> <magnus.k.lindberg@ericsson.com>; niklas.damberg@ericsson.com
> Subject: Re: Ethtool : PRBS feature
> 
> > +        name: inject-error-count
> > +        type: u32
> > +        doc: |
> > +          Number of errors to inject. Each invocation injects the specified
> > +          number of bit errors into the data stream.
> 
> Sorry, but i could not implement that, in a sensible way, given its current
> specification.
> 
> I suppose i could simply flip the first `inject-error-count` bits, and make the rest of
> the stream perfect? I could also wait until the stop command is received, and
> then flip that many bits before i stop the stream? But none of these seem
> sensible.
> 
> Please make this specification have sufficient details, or references to 802.3, that
> you could give it to another engineer and get back a reasonable implementation,
> without having to answer any questions.
> 
> > +      name: phy-test-act
> > +      doc: |
> > +        Configure PHY test parameters. Each attribute is optional and only
> > +        specified attributes are applied. TX/RX patterns are set on the
> > +        local port. BERT and error injection operate on the receiver port.
> 
> Error injection operates on the receive port? That is not what i expected. I should
> go read 802.3, and understand how this is used.
> 
> 	Andrew


* Re: [RFC PATCH bpf-next v1 0/7] xdp: RX checksum metadata hint and checksum assertion over redirect
From: Vladimir Vdovin @ 2026-07-01 17:10 UTC
  To: lorenzo, sdf, kuba
  Cc: andrii, ast, daniel, hawk, john.fastabend, martin.lau, sdf.kernel,
	bpf, netdev, Vladimir Vdovin
In-Reply-To: <akRAUTAqVKkxmoVa@lore-desk>

Hi Lorenzo,

Sorry, I completely missed your earlier RX-checksum series before I posted
mine; thanks to Stanislav for the pointer.

To answer your question: yes, I'm happy to take on the driver selftest
Jakub asked for.

As for my own series, the read side clearly overlaps yours and you own it,
so I'll drop my bpf_xdp_metadata_rx_csum() hint and its driver bits.

What's left that is genuinely separate is the "assertion" half -- a non-dev-bound
bpf_xdp_assert_rx_csum() that preserves the HW verdict across a
cpumap/redirect: it sets a flag on the xdp_buff that rides into the
xdp_frame and becomes skb->ip_summed = CHECKSUM_UNNECESSARY in
__xdp_build_skb_from_frame().
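As a reading aid for the assertion half described above, here is a tiny userspace model of the data flow: a flag set on the buffer by the program is copied into the frame at conversion time and turned into the skb checksum state on the cpumap side. Every name below is a hypothetical stand-in, not the proposed kfunc or the real xdp_buff, xdp_frame and sk_buff definitions, and the flag bit in particular is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real kernel definitions. */
enum model_ip_summed {
	MODEL_CHECKSUM_NONE,
	MODEL_CHECKSUM_UNNECESSARY,
};

#define MODEL_XDP_FLAG_RX_CSUM_OK (1u << 0)	/* hypothetical flag bit */

struct model_xdp_buff { uint32_t flags; };
struct model_xdp_frame { uint32_t flags; };
struct model_skb { enum model_ip_summed ip_summed; };

/* Analogue of the assertion kfunc: the program vouches for the HW verdict. */
static void model_assert_rx_csum(struct model_xdp_buff *xdp)
{
	xdp->flags |= MODEL_XDP_FLAG_RX_CSUM_OK;
}

/* Analogue of converting the buffer to a frame before redirect. */
static void model_buff_to_frame(const struct model_xdp_buff *xdp,
				struct model_xdp_frame *xdpf)
{
	xdpf->flags = xdp->flags;	/* the flag rides along */
}

/* Analogue of building the skb from the frame on the cpumap side. */
static void model_build_skb(const struct model_xdp_frame *xdpf,
			    struct model_skb *skb)
{
	skb->ip_summed = (xdpf->flags & MODEL_XDP_FLAG_RX_CSUM_OK) ?
			 MODEL_CHECKSUM_UNNECESSARY : MODEL_CHECKSUM_NONE;
}
```

The model treats the flag as strictly opt-in: a frame that never had the assertion applied falls back to CHECKSUM_NONE.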

Should I resend that as a small standalone series (v2, assert-only)?
It also looks like a PoC that you and Jakub discussed on v3 [1].

A few things I'd like to confirm before writing the test:

1. API for v4. In the v3 discussion you agreed to rework the API to report
   both COMPLETE and UNNECESSARY (+ csum_level), per Jakub. Do you plan to
   send that in v4, or should the driver selftest target the current v3
   signature (enum xdp_checksum + cksum_meta)? I'd rather write the test
   against the API you intend to keep.

2. Documented behavior. The selftest is meant to "check the documented
   expectation", so which rule should it assert -- "a driver must never
   report CHECKSUM_COMPLETE while an XDP program is attached", or that the
   driver downgrades/repairs COMPLETE on the XDP_PASS path? I'll write the
   doc paragraph and the test to match whatever we settle on.

3. Drivers. Your series adds veth and ice; I don't see mlx5e -- was that
   intentional (left to the driver maintainers)? I had an mlx5e
   implementation in my v1 and I'm happy to contribute it to your series if
   it's useful.

For the test itself I was thinking of extending
tools/testing/selftests/drivers/net/hw/xdp_metadata.py, gated on the new
"checksum" xdp-rx-metadata feature, with good-csum / bad-csum / modify +
XDP_PASS cases. Does that match what you and Jakub had in mind?

[1] https://lore.kernel.org/bpf/20260217-bpf-xdp-meta-rxcksum-v3-0-30024c50ba71@kernel.org/

Thanks,
Vladimir


* RTL8159 firmware
From: Jan Hendrik Farr @ 2026-07-01 17:13 UTC
  To: mail
  Cc: andrew+netdev, davem, edumazet, hsu.chih.kai, kuba, linux-kernel,
	linux-usb, netdev, olek2, pabeni, Jan Hendrik Farr
In-Reply-To: <20260505-rtl8159_net_next-v4-3-1a648a9c4d8d@birger-koblitz.de>

Hi Birger,

it looks like the firmware file rtl_nic/rtl8159-1.fw isn't in linux-firmware yet.
Could you send it for people to potentially test?

Jan



* Re: RTL8159 firmware
From: Birger Koblitz @ 2026-07-01 17:24 UTC
  To: Jan Hendrik Farr
  Cc: andrew+netdev, davem, edumazet, hsu.chih.kai, kuba, linux-kernel,
	linux-usb, netdev, olek2, pabeni
In-Reply-To: <20260701171327.2916132-1-kernel@jfarr.cc>

Hi Jan,

On 7/1/26 19:13, Jan Hendrik Farr wrote:
> Hi Birger,
> 
> it looks like the firmware file rtl_nic/rtl8159-1.fw isn't in linux-firmware yet.
> Could you send it for people to potentially test?
> 
> Jan
> 
The code to create the binary firmware file is at:
https://gitlab.com/koblitz-rtlnic/rtlnic_fw
But I cannot submit the firmware itself to linux-firmware, as the source code from
which the binary data is extracted is published by Realtek under the GPL.
For linux-firmware, a binary distribution license is necessary, which requires
someone from Realtek to license it under their usual firmware license. I
contacted Realtek, but never heard back.

Cheers,
   Birger




* Re: [PATCH v2] selftests/net/openvswitch: add output truncation test
From: Aaron Conole @ 2026-07-01 17:26 UTC
  To: Minxi Hou; +Cc: netdev, echaudro, linux-kselftest
In-Reply-To: <20260630102208.29140-1-houminxi@gmail.com>

Minxi Hou <houminxi@gmail.com> writes:

> Add test_trunc exercising the OVS_ACTION_ATTR_TRUNC action. The test
> verifies truncation in three steps: first confirm normal forwarding
> works, then apply trunc(14) which truncates packets to the Ethernet
> header and verify ping fails, then restore normal forwarding and
> verify connectivity recovers.
>
> The trunc action sets OVS_CB(skb)->cutlen, causing pskb_trim at
> output time. With trunc(14) the IP payload is stripped, so the
> receiver drops the frame and ICMP echo reply is never generated.
>
> Signed-off-by: Minxi Hou <houminxi@gmail.com>
> ---

NOTE: I think the subject should have included net-next.

>  .../selftests/net/openvswitch/openvswitch.sh  | 66 +++++++++++++++++++
>  1 file changed, 66 insertions(+)
>
> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> index 2954245129a2..fef21eb4a129 100755
> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> @@ -32,6 +32,7 @@ tests="
>  	dec_ttl					ttl: dec_ttl decrements IP TTL
>  	flow_set				flow-set: Flow modify
>  	action_set				set: SET action rewrites fields
> +	trunc					trunc: output truncation
>  	psample					psample: Sampling packets with psample"
>  
>  info() {
> @@ -443,6 +444,71 @@ test_action_set() {
>  	return 0
>  }
>  

Most of the tests have a comment in front of them that gives some
description.  I haven't been good about enforcing it, but at least I'd
like to see one here describing which trunc limits are being tested (in
this case, just truncation to 14).

Perhaps we can also add a test for trunc(1), since that should be
rejected (or for both trunc(1) and trunc(13)).
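A simplified model of the two behaviours relevant to the extra test cases suggested above: validating the trunc length and computing the resulting output length. It rests on my understanding that OVS rejects a trunc max_len shorter than an Ethernet header (ETH_HLEN, 14 bytes) when validating actions, and otherwise trims the frame at output time by recording a cut length. The function names are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_ETH_HLEN 14	/* Ethernet header length, as ETH_HLEN */

/* Model of action validation: too-short truncation lengths are rejected. */
static int model_validate_trunc(unsigned int max_len)
{
	return max_len < MODEL_ETH_HLEN ? -EINVAL : 0;
}

/* Model of output: record how much to cut, then trim the frame by that. */
static size_t model_output_len(size_t pkt_len, unsigned int max_len)
{
	size_t cutlen = pkt_len > max_len ? pkt_len - max_len : 0;

	return pkt_len - cutlen;
}
```

Under this model trunc(1) and trunc(13) fail validation, trunc(14) is accepted, and a 98-byte ICMP echo frame (14 + 20 + 8 + 56 bytes) leaves the port as a bare 14-byte Ethernet header, which is why the ping in the test is expected to fail.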

> +test_trunc() {
> +	sbx_add "test_trunc" || return $?
> +	ovs_add_dp "test_trunc" trunctest || return 1
> +
> +	info "create namespaces"
> +	for ns in client server; do
> +		ovs_add_netns_and_veths "test_trunc" "trunctest" "$ns" \
> +			"${ns:0:1}0" "${ns:0:1}1" || return 1
> +	done
> +
> +	ip netns exec client ip addr add 10.0.0.1/24 dev c1
> +	ip netns exec client ip link set c1 up
> +	ip netns exec server ip addr add 10.0.0.2/24 dev s1
> +	ip netns exec server ip link set s1 up
> +
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
> +
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(1),eth(),eth_type(0x0800),ipv4()' '2' || return 1
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(2),eth(),eth_type(0x0800),ipv4()' '1' || return 1
> +
> +	info "verify connectivity without truncation"
> +	ovs_sbx "test_trunc" ip netns exec client ping -c 1 -W 2 \
> +		10.0.0.2 || return 1
> +
> +	ovs_del_flows "test_trunc" trunctest
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
> +
> +	info "add truncated forwarding flow"
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(1),eth(),eth_type(0x0800),ipv4()' \
> +		'trunc(14),2' || return 1
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(2),eth(),eth_type(0x0800),ipv4()' '1' || return 1
> +
> +	info "verify ping fails with truncation"
> +	ovs_sbx "test_trunc" ip netns exec client ping -c 1 -W 2 \
> +		10.0.0.2 >/dev/null 2>&1 \
> +		&& { info "FAIL: ping should fail with trunc(14)"
> +		     return 1; }
> +
> +	ovs_del_flows "test_trunc" trunctest
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(1),eth(),eth_type(0x0800),ipv4()' '2' || return 1
> +	ovs_add_flow "test_trunc" trunctest \
> +		'in_port(2),eth(),eth_type(0x0800),ipv4()' '1' || return 1
> +
> +	info "verify connectivity restored without truncation"
> +	ovs_sbx "test_trunc" ip netns exec client ping -c 1 -W 2 \
> +		10.0.0.2 || return 1
> +
> +	return 0
> +}
> +
>  # psample test
>  # - use psample to observe packets
>  test_psample() {
>
> base-commit: cef9d6804030793cf8b8796fd6936197d065dd3e



* Re: Ethtool : PRBS feature
From: Andrew Lunn @ 2026-07-01 17:32 UTC
  To: Das, Shubham
  Cc: Alexander Duyck, Lee Trager, Maxime Chevallier,
	netdev@vger.kernel.org, mkubecek@suse.cz, D H, Siddaraju,
	Chintalapalle, Balaji, Lindberg, Magnus,
	niklas.damberg@ericsson.com, Wirandi, Jonas, Srinivasan, Vijay
In-Reply-To: <SN7PR11MB81098D82A7B377E0AC439979FFF62@SN7PR11MB8109.namprd11.prod.outlook.com>

On Wed, Jul 01, 2026 at 05:10:43PM +0000, Das, Shubham wrote:
> > Sorry, but i could not implement that, in a sensible way, given its current
> > specification.
> > 
> > I suppose i could simply flip the first `inject-error-count` bits, and make the rest of
> > the stream perfect? I could also wait until the stop command is received, and
> > then flip that many bits before i stop the stream? But none of these seem
> > sensible.
> > 
> > Please make this specification have sufficient details, or references to 802.3, that
> > you could give it to another engineer and get back a reasonable implementation,
> > without having to answer any questions.
> 
> Andrew,
> 
> IEEE has clear documentation of the PRBS Receiver block and the BER counter as an output.
> Before performing the actual BER validation, it is a usual industry practice to introduce errors
> to guarantee that the checker is functional and accurately identifying them.
> 
> Similarly, in DATA mode, error injection is used to verify the FEC block
> by ensuring that injected errors are detected and corrected as expected.
> 
> Updated description.
> 
> +        name: inject-error-count
> +        type: u32
> +        doc: |
> +          Request the PHY to inject exactly this many bit errors into the
> +          currently active test data stream.
> +
> +          This is a diagnostic tool used to validate that the far-end PRBS
> +          checker or FEC decoder is functioning correctly. For example,
> +          after enabling a PRBS pattern and confirming ber-lock-status is
> +          locked, injecting N errors should cause ber-error-count to
> +          increment by exactly N on the receiving port, confirming the
> +          checker is actively detecting bit errors. Similarly, in normal
> +          data mode with FEC enabled, injecting errors verifies that the
> +          FEC block detects errors as expected.

There is no mention of how many frames to send in the stream. I don't
think that is part of the API? Because we have no idea of how many
frames will be sent, it is not possible to distribute the corrupted
frames over the duration of the stream. So that means i should flip
one bit, anywhere in the first inject-error-count frames. All frames
after that should not have bit flips. The assumption being, the stream
has a minimum of inject-error-count frames, and if the stream is
short, the counter will be too low. But it does not matter if the
stream is longer.

Your description has no mention of frames. Should it? What exactly
does the ber-error-count count? Can multiple bit flips within one frame
be counted individually? I don't see how, since the checksum just says
the frame is bad, and cannot report how bad.
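To make the counting question concrete, here is a small userspace model, not taken from 802.3 or any driver, of a PRBS7 stream (polynomial x^7 + x^6 + 1) with single-bit errors injected and then counted by two hypothetical checker designs. A self-synchronising checker predicts each bit from the bits it has just received, so one flipped bit upsets three predictions; a checker that locks once and then free-runs its own generator sees exactly one mismatch per flipped bit. Under these assumptions, "ber-error-count increments by exactly N" holds only for some checker designs, which is one more detail the specification would need to pin down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PRBS_MAX_BITS 4096

/* PRBS7, polynomial x^7 + x^6 + 1: b[n] = b[n-6] ^ b[n-7]. */
static void prbs7_generate(uint8_t *bits, size_t len)
{
	assert(len >= 7);
	for (size_t n = 0; n < 7; n++)
		bits[n] = 1;		/* any non-zero seed works */
	for (size_t n = 7; n < len; n++)
		bits[n] = bits[n - 6] ^ bits[n - 7];
}

/* Flip 'count' bits, 'spacing' bits apart, starting at position 'first'. */
static void inject_errors(uint8_t *bits, size_t len, size_t first,
			  size_t spacing, unsigned int count)
{
	for (unsigned int i = 0; i < count; i++) {
		size_t pos = first + (size_t)i * spacing;

		if (pos < len)
			bits[pos] ^= 1;
	}
}

/* Self-synchronising checker: predicts each bit from received bits. */
static unsigned int check_self_sync(const uint8_t *rx, size_t len)
{
	unsigned int errors = 0;

	for (size_t n = 7; n < len; n++)
		errors += rx[n] != (rx[n - 6] ^ rx[n - 7]);
	return errors;
}

/* Locked checker: seeds from the first 7 received bits, then free-runs. */
static unsigned int check_locked(const uint8_t *rx, size_t len)
{
	uint8_t ref[PRBS_MAX_BITS];
	unsigned int errors = 0;

	assert(len >= 7 && len <= PRBS_MAX_BITS);
	memcpy(ref, rx, 7);
	for (size_t n = 7; n < len; n++) {
		ref[n] = ref[n - 6] ^ ref[n - 7];
		errors += rx[n] != ref[n];
	}
	return errors;
}
```

With five errors spaced 50 bits apart in a 1000-bit stream, check_locked() reports 5 while check_self_sync() reports 15. Errors closer together than the polynomial span would interact and change the self-synchronising count further.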

As i said, give this description to another engineer and ask him/her
how it could be implemented.

https://www.youtube.com/watch?v=j-6N3bLgYyQ

	Andrew


* [PATCH net v2] net: airoha: fix MIB stats collection to be lossless
From: Aniket Negi @ 2026-07-01 17:39 UTC
  To: lorenzo, netdev
  Cc: Aniket Negi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-arm-kernel, linux-mediatek,
	linux-kernel
In-Reply-To: <20260630111834.233643-1-aniket.negi03@gmail.com>

The current driver resets hardware MIB counters after every read via
REG_FE_GDM_MIB_CLEAR. This creates a race window: packets arriving
between the read and the clear are silently lost from statistics.

Fix this by removing the MIB clear and switching to a delta-based
software tracking approach:

- 64-bit H+L registers (tx/rx ok pkts, ok bytes, E64..L1023):
  read the absolute hardware total directly each poll.

- 32-bit registers (drops, bc, mc, errors, runt, long, ...):
  store the previous raw register value in mib_prev and accumulate
  (u32)(curr - prev) into a 64-bit software counter. Unsigned
  subtraction handles wrap-around transparently.

- tx_len[0]/rx_len[0] ({0,64} RMON bucket) combines RUNT_CNT
  (32-bit, delta-tracked via mib_prev.tx_runt_cnt) and E64_CNT
  (64-bit, absolute). A u64 accumulator tx_runt_accum64 holds the
  running RUNT delta sum so that each poll sets:
    tx_len[0] = tx_runt_accum64 + E64_abs
  without double-counting the E64 value.
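The delta-tracking scheme above can be sketched in plain C. This is a trimmed-down model, not the driver code: `struct model_mib_stats` and the helper names are hypothetical, and only one 32-bit counter (drops), the RUNT counter and the E64 H+L pair are modelled. It shows the two key points from the commit message: the `(u32)(curr - prev)` delta stays correct across a 32-bit wrap, and the {0,64} bucket is assigned from the RUNT running sum plus the absolute E64 total rather than accumulated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of the software counters. */
struct model_mib_stats {
	uint64_t tx_drops;		/* 64-bit total for a 32-bit HW counter */
	uint64_t tx_runt_accum64;	/* running sum of RUNT deltas */
	uint64_t tx_len0;		/* {0,64} bucket: RUNT + E64 */
	struct {
		uint32_t tx_drops;	/* last raw register values */
		uint32_t tx_runt_cnt;
	} mib_prev;
};

/* Combine an H+L register pair into one 64-bit absolute total. */
static uint64_t model_read_hl(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Add the delta of a free-running 32-bit counter; unsigned subtraction
 * gives the right answer even when the hardware counter wrapped. */
static void model_accum32(uint64_t *total, uint32_t *prev, uint32_t curr)
{
	*total += (uint32_t)(curr - *prev);
	*prev = curr;
}

/* One poll, fed with raw register values. */
static void model_poll(struct model_mib_stats *s, uint32_t drop_cnt,
		       uint32_t runt_cnt, uint32_t e64_h, uint32_t e64_l)
{
	model_accum32(&s->tx_drops, &s->mib_prev.tx_drops, drop_cnt);
	model_accum32(&s->tx_runt_accum64, &s->mib_prev.tx_runt_cnt, runt_cnt);
	/* Assign, don't accumulate: E64 is already an absolute total. */
	s->tx_len0 = s->tx_runt_accum64 + model_read_hl(e64_h, e64_l);
}
```

Starting from a previous raw value of 0xfffffff0, a reading of 0x10 adds 0x20 to the 64-bit total, and repeated polls leave tx_len0 equal to the RUNT sum plus the current E64 value instead of growing by E64 every time.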

Merge airoha_dev_get_hw_stats() into airoha_update_hw_stats(),
moving the port spin_lock inside so callers do not need a separate
wrapper.

Signed-off-by: Aniket Negi <aniket.negi03@gmail.com>
---

Changes in v2:
  - Store _CNT_L register reads in val before adding to stats, improving
    readability (suggested by Lorenzo Bianconi)
  - Fix double-counting bug in the RUNT+E64 combined bucket: previously
    "+=" for E64 re-added the full absolute counter each poll; now a
    dedicated tx_runt_accum64/rx_runt_accum64 accumulator holds the
    running RUNT delta, and tx_len[0] is assigned (not accumulated) each
    poll as runt_accum64 + E64_abs
  - Replace 7-element tx_len[]/rx_len[] shadow arrays in mib_prev with
    focused tx_runt_cnt/tx_long_cnt and rx_runt_cnt/rx_long_cnt fields;
    only RUNT and LONG are 32-bit and need wrap-around tracking
  - Rename inner struct hw_prev_stats to mib_prev; rename accumulator
    fields to tx_runt_accum64/rx_runt_accum64 for clarity
  - Fix comment alignment in mib_prev struct block
  - Rename airoha_dev_get_hw_stats() to airoha_update_hw_stats() and
    move the port spin_lock inside, removing the separate wrapper

 drivers/net/ethernet/airoha/airoha_eth.c | 115 +++++++++++++----------
 drivers/net/ethernet/airoha/airoha_eth.h |  27 ++++++
 2 files changed, 92 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 59001fd4b6f7..4b7c547de165 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1686,12 +1686,14 @@ static void airoha_qdma_stop_napi(struct airoha_qdma *qdma)
 	}
 }
 
-static void airoha_dev_get_hw_stats(struct airoha_gdm_dev *dev)
+static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
 {
 	struct airoha_gdm_port *port = dev->port;
 	struct airoha_eth *eth = dev->eth;
 	u32 val, i = 0;
 
+	spin_lock(&port->stats_lock);
+
 	/* Read relevant MIB for GDM with multiple port attached */
 	if (port->id == AIROHA_GDM3_IDX || port->id == AIROHA_GDM4_IDX)
 		airoha_fe_rmw(eth, REG_FE_GDM_MIB_CFG(port->id),
@@ -1701,152 +1703,165 @@ static void airoha_dev_get_hw_stats(struct airoha_gdm_dev *dev)
 
 	u64_stats_update_begin(&dev->stats.syncp);
 
-	/* TX */
+	/* TX - 64-bit H+L registers: hw accumulates the total, read directly. */
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_OK_PKT_CNT_H(port->id));
-	dev->stats.tx_ok_pkts += ((u64)val << 32);
+	dev->stats.tx_ok_pkts = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_OK_PKT_CNT_L(port->id));
 	dev->stats.tx_ok_pkts += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_OK_BYTE_CNT_H(port->id));
-	dev->stats.tx_ok_bytes += ((u64)val << 32);
+	dev->stats.tx_ok_bytes = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_OK_BYTE_CNT_L(port->id));
 	dev->stats.tx_ok_bytes += val;
 
+	/* TX - 32-bit registers: accumulate delta to handle wrap-around. */
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_DROP_CNT(port->id));
-	dev->stats.tx_drops += val;
+	dev->stats.tx_drops += (u32)(val - dev->stats.mib_prev.tx_drops);
+	dev->stats.mib_prev.tx_drops = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_BC_CNT(port->id));
-	dev->stats.tx_broadcast += val;
+	dev->stats.tx_broadcast += (u32)(val - dev->stats.mib_prev.tx_broadcast);
+	dev->stats.mib_prev.tx_broadcast = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_MC_CNT(port->id));
-	dev->stats.tx_multicast += val;
+	dev->stats.tx_multicast += (u32)(val - dev->stats.mib_prev.tx_multicast);
+	dev->stats.mib_prev.tx_multicast = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_RUNT_CNT(port->id));
-	dev->stats.tx_len[i] += val;
+	dev->stats.mib_prev.tx_runt_accum64 +=
+		(u32)(val - dev->stats.mib_prev.tx_runt_cnt);
+	dev->stats.mib_prev.tx_runt_cnt = val;
+
+	/* tx_len[0]: RUNT (32-bit, delta) + E64 (64-bit, absolute) → {0, 64} bucket.
+	 * Accumulate RUNT delta in tx_runt_accum64, then assign tx_len[0] as
+	 * accum + E64_abs so each call gives the correct combined total.
+	 */
+
+	dev->stats.tx_len[i] = dev->stats.mib_prev.tx_runt_accum64;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_E64_CNT_H(port->id));
-	dev->stats.tx_len[i] += ((u64)val << 32);
+	dev->stats.tx_len[i] += (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_E64_CNT_L(port->id));
 	dev->stats.tx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L64_CNT_H(port->id));
-	dev->stats.tx_len[i] += ((u64)val << 32);
+	dev->stats.tx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L64_CNT_L(port->id));
 	dev->stats.tx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L127_CNT_H(port->id));
-	dev->stats.tx_len[i] += ((u64)val << 32);
+	dev->stats.tx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L127_CNT_L(port->id));
 	dev->stats.tx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L255_CNT_H(port->id));
-	dev->stats.tx_len[i] += ((u64)val << 32);
+	dev->stats.tx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L255_CNT_L(port->id));
 	dev->stats.tx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L511_CNT_H(port->id));
-	dev->stats.tx_len[i] += ((u64)val << 32);
+	dev->stats.tx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L511_CNT_L(port->id));
 	dev->stats.tx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L1023_CNT_H(port->id));
-	dev->stats.tx_len[i] += ((u64)val << 32);
+	dev->stats.tx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_L1023_CNT_L(port->id));
 	dev->stats.tx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_TX_ETH_LONG_CNT(port->id));
-	dev->stats.tx_len[i++] += val;
+	dev->stats.tx_len[i++] += (u32)(val - dev->stats.mib_prev.tx_long_cnt);
+	dev->stats.mib_prev.tx_long_cnt = val;
 
 	/* RX */
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_OK_PKT_CNT_H(port->id));
-	dev->stats.rx_ok_pkts += ((u64)val << 32);
+	dev->stats.rx_ok_pkts = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_OK_PKT_CNT_L(port->id));
 	dev->stats.rx_ok_pkts += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_OK_BYTE_CNT_H(port->id));
-	dev->stats.rx_ok_bytes += ((u64)val << 32);
+	dev->stats.rx_ok_bytes = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_OK_BYTE_CNT_L(port->id));
 	dev->stats.rx_ok_bytes += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_DROP_CNT(port->id));
-	dev->stats.rx_drops += val;
+	dev->stats.rx_drops += (u32)(val - dev->stats.mib_prev.rx_drops);
+	dev->stats.mib_prev.rx_drops = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_BC_CNT(port->id));
-	dev->stats.rx_broadcast += val;
+	dev->stats.rx_broadcast += (u32)(val - dev->stats.mib_prev.rx_broadcast);
+	dev->stats.mib_prev.rx_broadcast = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_MC_CNT(port->id));
-	dev->stats.rx_multicast += val;
+	dev->stats.rx_multicast += (u32)(val - dev->stats.mib_prev.rx_multicast);
+	dev->stats.mib_prev.rx_multicast = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ERROR_DROP_CNT(port->id));
-	dev->stats.rx_errors += val;
+	dev->stats.rx_errors += (u32)(val - dev->stats.mib_prev.rx_errors);
+	dev->stats.mib_prev.rx_errors = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_CRC_ERR_CNT(port->id));
-	dev->stats.rx_crc_error += val;
+	dev->stats.rx_crc_error += (u32)(val - dev->stats.mib_prev.rx_crc_error);
+	dev->stats.mib_prev.rx_crc_error = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_OVERFLOW_DROP_CNT(port->id));
-	dev->stats.rx_over_errors += val;
+	dev->stats.rx_over_errors += (u32)(val - dev->stats.mib_prev.rx_over_errors);
+	dev->stats.mib_prev.rx_over_errors = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_FRAG_CNT(port->id));
-	dev->stats.rx_fragment += val;
+	dev->stats.rx_fragment += (u32)(val - dev->stats.mib_prev.rx_fragment);
+	dev->stats.mib_prev.rx_fragment = val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_JABBER_CNT(port->id));
-	dev->stats.rx_jabber += val;
+	dev->stats.rx_jabber += (u32)(val - dev->stats.mib_prev.rx_jabber);
+	dev->stats.mib_prev.rx_jabber = val;
 
 	i = 0;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_RUNT_CNT(port->id));
-	dev->stats.rx_len[i] += val;
+	dev->stats.mib_prev.rx_runt_accum64 +=
+		(u32)(val - dev->stats.mib_prev.rx_runt_cnt);
+	dev->stats.mib_prev.rx_runt_cnt = val;
+
+	/* rx_len[0]: RUNT (32-bit, delta) + E64 (64-bit, absolute) → {0, 64} bucket.
+	 * Accumulate the RUNT delta, then assign rx_len[0] = accum + E64_abs.
+	 */
 
+	dev->stats.rx_len[i] = dev->stats.mib_prev.rx_runt_accum64;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_E64_CNT_H(port->id));
-	dev->stats.rx_len[i] += ((u64)val << 32);
+	dev->stats.rx_len[i] += (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_E64_CNT_L(port->id));
 	dev->stats.rx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L64_CNT_H(port->id));
-	dev->stats.rx_len[i] += ((u64)val << 32);
+	dev->stats.rx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L64_CNT_L(port->id));
 	dev->stats.rx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L127_CNT_H(port->id));
-	dev->stats.rx_len[i] += ((u64)val << 32);
+	dev->stats.rx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L127_CNT_L(port->id));
 	dev->stats.rx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L255_CNT_H(port->id));
-	dev->stats.rx_len[i] += ((u64)val << 32);
+	dev->stats.rx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L255_CNT_L(port->id));
 	dev->stats.rx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L511_CNT_H(port->id));
-	dev->stats.rx_len[i] += ((u64)val << 32);
+	dev->stats.rx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L511_CNT_L(port->id));
 	dev->stats.rx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L1023_CNT_H(port->id));
-	dev->stats.rx_len[i] += ((u64)val << 32);
+	dev->stats.rx_len[i] = (u64)val << 32;
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_L1023_CNT_L(port->id));
 	dev->stats.rx_len[i++] += val;
 
 	val = airoha_fe_rr(eth, REG_FE_GDM_RX_ETH_LONG_CNT(port->id));
-	dev->stats.rx_len[i++] += val;
+	dev->stats.rx_len[i] += (u32)(val - dev->stats.mib_prev.rx_long_cnt);
+	dev->stats.mib_prev.rx_long_cnt = val;
 
 	u64_stats_update_end(&dev->stats.syncp);
-}
-
-static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
-{
-	struct airoha_gdm_port *port = dev->port;
-	int i;
-
-	spin_lock(&port->stats_lock);
-
-	for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
-		if (port->devs[i])
-			airoha_dev_get_hw_stats(port->devs[i]);
-	}
-
-	/* Reset MIB counters */
-	airoha_fe_set(dev->eth, REG_FE_GDM_MIB_CLEAR(port->id),
-		      FE_GDM_MIB_RX_CLEAR_MASK | FE_GDM_MIB_TX_CLEAR_MASK);
 
 	spin_unlock(&port->stats_lock);
 }
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index f6d01a8e8da1..3af1c49dd62d 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -245,6 +245,33 @@ struct airoha_hw_stats {
 	u64 rx_fragment;
 	u64 rx_jabber;
 	u64 rx_len[7];
+
+	struct {
+		/* Previous HW register values for 32-bit counter delta
+		 * tracking. Storing the last seen value and accumulating
+		 * (u32)(curr - prev) into the 64-bit software counter
+		 * handles wrap-around transparently via unsigned arithmetic.
+		 * tx_runt_accum64/rx_runt_accum64 hold the running sum of
+		 * runt deltas. These fields are never reported to userspace.
+		 */
+		u32 tx_drops;
+		u32 tx_broadcast;
+		u32 tx_multicast;
+		u32 tx_runt_cnt;
+		u32 tx_long_cnt;
+		u64 tx_runt_accum64;
+		u32 rx_drops;
+		u32 rx_broadcast;
+		u32 rx_multicast;
+		u32 rx_errors;
+		u32 rx_crc_error;
+		u32 rx_over_errors;
+		u32 rx_fragment;
+		u32 rx_jabber;
+		u32 rx_runt_cnt;
+		u32 rx_long_cnt;
+		u64 rx_runt_accum64;
+	} mib_prev;
 };
 
 enum {

base-commit: a225f8c20712713406ae47024b8df42deacddd4a
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH v3] selftests/net/openvswitch: add ICMPv6 echo type match test
From: Aaron Conole @ 2026-07-01 17:40 UTC (permalink / raw)
  To: Minxi Hou; +Cc: netdev, echaudro, linux-kselftest
In-Reply-To: <20260630102225.29733-1-houminxi@gmail.com>

Minxi Hou <houminxi@gmail.com> writes:

> Register OVS_KEY_ATTR_ICMPV6 in the flow key parser so that
> icmpv6(type=...) can be used in flow specifications. Without this
> registration the parser silently drops the token and the kernel
> rejects the flow with EINVAL because the expected ICMPv6 key
> attribute is missing.
>
> While here, add convert_int() to the ovs_key_ipv6 and ovs_key_icmp
> fields_map entries so that specifying a field value produces the
> correct wildcard mask. The IPv6 flow label uses convert_int(20) to
> produce a 20-bit mask (0x000FFFFF), matching the kernel constraint in
> flow_netlink.c that rejects masks with bits 20-31 set; byte-wide
> fields use convert_int(8). The ipv4 counterpart already does this via
> convert_int(); the ipv6 and icmp classes were simply missing the fifth
> tuple element. Existing callers that pass empty parentheses are
> unaffected because convert_int("") returns (0, 0).
>
> Add test_icmpv6 exercising the ICMPv6 echo flow key. The test uses
> static neighbour entries to bypass NDP, then verifies in three steps:
> install icmpv6(type=128) and icmpv6(type=129) flows and confirm ping
> works, remove the flows and confirm ping fails, reinstall and confirm
> recovery.
>
> Signed-off-by: Minxi Hou <houminxi@gmail.com>
> ---
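The quoted commit message says convert_int(20) yields a 20-bit mask (0x000FFFFF) and that the kernel rejects flow label masks with bits 20-31 set. The ovs-dpctl.py helper itself is not shown here, so the sketch below only models that arithmetic and that check in C. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the low 'bits' bits, e.g. 20 -> 0x000FFFFF.
 * Valid for 1 <= bits <= 31 in this sketch.
 */
static uint32_t low_bits_mask(unsigned int bits)
{
	return (UINT32_C(1) << bits) - 1;
}

/* Models the constraint described above: an IPv6 flow label mask
 * must not have any of bits 20-31 set.
 */
static bool ipv6_label_mask_ok(uint32_t mask)
{
	return (mask & ~UINT32_C(0x000FFFFF)) == 0;
}
```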

Same as the other patch, I think this should have had the 'net-next'
specifier.

Also, I see that dev@openvswitch.org and Ilya weren't CC'd.  The next
version should CC them.

>  .../selftests/net/openvswitch/openvswitch.sh  | 63 +++++++++++++++++++
>  .../selftests/net/openvswitch/ovs-dpctl.py    | 26 +++++---
>  2 files changed, 82 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> index 2954245129a2..2de01137bb50 100755
> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> @@ -32,6 +32,7 @@ tests="
>  	dec_ttl					ttl: dec_ttl decrements IP TTL
>  	flow_set				flow-set: Flow modify
>  	action_set				set: SET action rewrites fields
> +	icmpv6					icmpv6: ICMPv6 echo type match
>  	psample					psample: Sampling packets with psample"
>  
>  info() {
> @@ -443,6 +444,68 @@ test_action_set() {
>  	return 0
>  }
>  

Please document this test case.

> +test_icmpv6() {
> +	sbx_add "test_icmpv6" || return $?
> +	ovs_add_dp "test_icmpv6" icmpv6 || return 1
> +
> +	info "create namespaces"
> +	for ns in client server; do
> +		ovs_add_netns_and_veths "test_icmpv6" "icmpv6" \
> +			"$ns" "${ns:0:1}0" "${ns:0:1}1" || return 1
> +	done
> +

===

> +	ip netns exec client ip addr add fd00::1/64 dev c1 nodad
> +	ip netns exec client ip link set c1 up
> +	ip netns exec server ip addr add fd00::2/64 dev s1 nodad
> +	ip netns exec server ip link set s1 up
> +
> +	local cl_mac sl_mac
> +	cl_mac=$(ip netns exec client \
> +		ip link show c1 | awk '/link\/ether/ {print $2}')
> +	[ -z "$cl_mac" ] && \
> +		{ info "failed to get c1 hwaddr"; return 1; }
> +	sl_mac=$(ip netns exec server \
> +		ip link show s1 | awk '/link\/ether/ {print $2}')
> +	[ -z "$sl_mac" ] && \
> +		{ info "failed to get s1 hwaddr"; return 1; }
> +	ip netns exec client \
> +		ip -6 neigh add fd00::2 lladdr "$sl_mac" dev c1
> +	ip netns exec server \
> +		ip -6 neigh add fd00::1 lladdr "$cl_mac" dev s1

===

Should there be some error detection / bailing here?  I think we should
do the same as in the vlan case and set the neighbour unreachability
detection (NUD) state to 'permanent' to prevent possible issues with racy
neighbour discovery.

> +
> +	ovs_add_flow "test_icmpv6" icmpv6 \
> +	  'in_port(1),eth(),eth_type(0x86dd),ipv6(proto=58),icmpv6(type=128)' \
> +	  '2' || return 1
> +	ovs_add_flow "test_icmpv6" icmpv6 \
> +	  'in_port(2),eth(),eth_type(0x86dd),ipv6(proto=58),icmpv6(type=129)' \
> +	  '1' || return 1
> +
> +	info "verify ICMPv6 echo with type-specific flows"
> +	ovs_sbx "test_icmpv6" ip netns exec client \
> +		ping -6 -c 1 -W 2 fd00::2 || return 1
> +
> +	ovs_del_flows "test_icmpv6" icmpv6
> +
> +	info "verify ping fails without echo flows"
> +	ovs_sbx "test_icmpv6" ip netns exec client \
> +		ping -6 -c 1 -W 2 fd00::2 >/dev/null 2>&1 \
> +		&& { info "FAIL: ping should fail without flows"
> +		     return 1; }
> +
> +	ovs_add_flow "test_icmpv6" icmpv6 \
> +	  'in_port(1),eth(),eth_type(0x86dd),ipv6(proto=58),icmpv6(type=128)' \
> +	  '2' || return 1
> +	ovs_add_flow "test_icmpv6" icmpv6 \
> +	  'in_port(2),eth(),eth_type(0x86dd),ipv6(proto=58),icmpv6(type=129)' \
> +	  '1' || return 1
> +
> +	info "verify connectivity restored"
> +	ovs_sbx "test_icmpv6" ip netns exec client \
> +		ping -6 -c 1 -W 2 fd00::2 || return 1
> +
> +	return 0
> +}
> +
>  # psample test
>  # - use psample to observe packets
>  test_psample() {
> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> index e1ecfad2c03e..f3edd198223f 100644
> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> @@ -1255,11 +1255,16 @@ class ovskey(nla):
>                  lambda x: ipaddress.IPv6Address(x).packed if x else 0,
>                  convert_ipv6,
>              ),
> -            ("label", "label", "%d", lambda x: int(x) if x else 0),
> -            ("proto", "proto", "%d", lambda x: int(x) if x else 0),
> -            ("tclass", "tclass", "%d", lambda x: int(x) if x else 0),
> -            ("hlimit", "hlimit", "%d", lambda x: int(x) if x else 0),
> -            ("frag", "frag", "%d", lambda x: int(x) if x else 0),
> +            ("label", "label", "%d", lambda x: int(x) if x else 0,
> +                convert_int(20)),
> +            ("proto", "proto", "%d", lambda x: int(x) if x else 0,
> +                convert_int(8)),
> +            ("tclass", "tclass", "%d", lambda x: int(x) if x else 0,
> +                convert_int(8)),
> +            ("hlimit", "hlimit", "%d", lambda x: int(x) if x else 0,
> +                convert_int(8)),
> +            ("frag", "frag", "%d", lambda x: int(x) if x else 0,
> +                convert_int(8)),
>          )
>  
>          def __init__(
> @@ -1344,8 +1349,10 @@ class ovskey(nla):
>          )
>  
>          fields_map = (
> -            ("type", "type", "%d", lambda x: int(x) if x else 0),
> -            ("code", "code", "%d", lambda x: int(x) if x else 0),
> +            ("type", "type", "%d", lambda x: int(x) if x else 0,
> +                convert_int(8)),
> +            ("code", "code", "%d", lambda x: int(x) if x else 0,
> +                convert_int(8)),
>          )
>  
>          def __init__(
> @@ -1982,6 +1989,11 @@ class ovskey(nla):
>                  "icmp",
>                  ovskey.ovs_key_icmp,
>              ),
> +            (
> +                "OVS_KEY_ATTR_ICMPV6",
> +                "icmpv6",
> +                ovskey.ovs_key_icmpv6,
> +            ),
>              (
>                  "OVS_KEY_ATTR_TCP_FLAGS",
>                  "tcp_flags",
>
> base-commit: cef9d6804030793cf8b8796fd6936197d065dd3e


^ permalink raw reply

* Re: [PATCH net-next V4 4/6] devlink: Apply eswitch mode boot defaults
From: Mark Bloch @ 2026-07-01 17:42 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	Jonathan Corbet, Shuah Khan, netdev, linux-rdma, linux-doc
In-Reply-To: <akUfXyKioGNAO_iB@FV6GYCPJ69>



On 01/07/2026 17:09, Jiri Pirko wrote:
> Wed, Jul 01, 2026 at 02:57:21PM +0200, mbloch@nvidia.com wrote:
>>
>>
>> On 01/07/2026 12:48, Jiri Pirko wrote:
>>> Mon, Jun 29, 2026 at 08:20:59PM +0200, mbloch@nvidia.com wrote:
>>>> Apply parsed devlink_eswitch_mode= defaults after devlink registration
>>>> and after successful reload.
>>>>
>>>> devl_register() may still be called before the device is ready for an
>>>
>>> How so? I would assume that driver calls devl_register only after
>>> everything is up and running and ready. If not, isn't it a bug?
>>>
>>
>> You would think so :)
>>
>> Some drivers, mlx5 included, call devl_register() while holding the
>> devlink instance lock and then finish setting up state before releasing
>> the lock.
>>
>> In v3 I tried to enforce exactly that model, move devl_register() to
>> be the last thing the driver does. Jakub pushed back on making that a
>> general rule. So in v4 I changed the approach. devl_register() only
>> schedules the work, and the actual eswitch mode change can run only
>> after the driver releases the devlink lock.
> 
> Wouldn't it make sense to use a completion instead of loop-reschedule of
> delayed work?

Just to make sure I understand the suggestion, this would mean that the
work waits until the devlink lock holder drops the lock, and devl_unlock()
would signal it, something like:

void devl_unlock(struct devlink *devlink)
{
	bool complete_apply = devlink->default_esw_mode_apply_pending;

	mutex_unlock(&devlink->lock);

	if (complete_apply)
		complete(&devlink->default_esw_mode_apply_ready);
}

That would avoid the retry loop, but it also means the queued work
sleeps until the driver drops devl_lock. It keeps one worker blocked
per pending instance and adds this default-esw-mode signalling to the
generic devl_unlock() path.

The delayed retry was meant to avoid a sleeping worker and keep the
instances independent. If one devlink instance is still locked, we just
try it again later while other instances can progress.

If you prefer the completion approach I can switch to it, but I don't see
it as simpler overall.
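The delayed-retry behaviour described above can be modeled without threads. This is a minimal sketch under stated assumptions: `struct inst` and `apply_default_work()` are hypothetical, and a plain `locked` flag stands in for a non-sleeping trylock on the devlink instance lock.

```c
#include <stdbool.h>

/* Hypothetical model of one devlink instance. */
struct inst {
	bool locked;        /* instance lock, held by the driver during setup */
	bool apply_pending; /* default eswitch mode still to be applied */
	int requeues;       /* how many times the work was rescheduled */
};

/* One run of the delayed work. It never sleeps: if the lock is busy,
 * it asks to be requeued. Returns true if the work must run again.
 */
static bool apply_default_work(struct inst *in)
{
	if (!in->apply_pending)
		return false;          /* user already set a mode, or done */
	if (in->locked) {
		in->requeues++;        /* lock busy: try again later */
		return true;
	}
	in->locked = true;             /* take the lock */
	in->apply_pending = false;     /* ... apply the default mode ... */
	in->locked = false;            /* release it */
	return false;
}
```

A busy instance only reschedules itself, and a second instance whose lock is free is handled on its own first attempt, which is the independence argument made above.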

Mark

> 
>>
>> Mark
>>
>>>
>>>> eswitch mode change, so keep a per-devlink delayed work item and pending
>>>> flag for the registration path. Registration queues the work, and the
>>>> worker tries to take the devlink instance lock.
>>>>
>>>> If the lock is busy, the worker requeues itself with a delay.
>>>>
>>>> For successful reloads that performed DRIVER_REINIT, devlink_reload()
>>>> already holds the devlink instance lock and the driver has completed
>>>> reload_up(). Clear pending work and apply the default directly from the
>>>> reload path instead of queueing work.
>>>>
>>>> If a user sets eswitch mode through netlink before the pending
>>>> registration work runs, clear the pending flag so the queued default does
>>>> not override that user request. Cancel pending default apply work when
>>>> freeing the devlink instance.
>>>
>>> These AI generated code descriptive messages are generally not very
>>> useful :(
>>>
>>


^ permalink raw reply

* Re: [PATCH 5/9] ax88179_178a: Add support for ethtool pause parameter configuration
From: Birger Koblitz @ 2026-07-01 17:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Maxime Chevallier, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-usb, netdev, linux-kernel
In-Reply-To: <46553470-9be3-4ae0-824d-ae85441c920d@lunn.ch>


> Thanks for these numbers.
> 
>> Renkforce AX88179A: ID1 7c9f, ID2 7061
>> Delock AX88279  ID1 03a2, ID2 a411
> 
> air_en8811h.c:#define EN8811H_PHY_ID		0x03a2a411
> 
>> UGreen AX88772D ID1 e65b, ID2 2c61
>> TP-Link AX88179A ID1 e65b, ID2 2c61
> 
> The two ID registers contain part of an OUI, but it has some bits
> missing. So it is not so easy to look it up.
> 
> However, anything using the MII framework basically assumes a very
> simple PHY and only looks at the 802.3 defined registers. So the
> genphy generic PHY driver might be sufficient for when there is not a
> specific driver. A lot depends on how much extra code there is
> accessing the PHY registers in the driver.

I also found

#define PHY_ID_ASIX_AX88772A         0x003b1861

which has the same ending as the AX88179A, at least. So hopefully these
IDs are stable across firmware versions.
So I will give phylink a try and send out a new patch version. I am a
bit worried, because the PHYs are real divas: getting them to survive
a suspend/resume cycle was a bit like herding fleas. Getting one of them
to survive a cycle made another one come back from resume completely
dead, needing an unplug/replug. So that took a lot of experimentation
to get right. The out-of-tree driver from ASIX is of little help,
because it implements a complete USB driver of its own instead of
using usbnet.
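For reference, the ID1/ID2 pairs above combine into the 32-bit identifier that phylib matches on: ID1 (Clause 22 register 2) forms the upper 16 bits and ID2 (register 3) forms the lower 16 bits. The low bits of ID2 carry a 6-bit model number and a 4-bit revision. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Combine the Clause 22 ID registers (reg 2 = ID1, reg 3 = ID2). */
static uint32_t phy_id_from_regs(uint16_t id1, uint16_t id2)
{
	return ((uint32_t)id1 << 16) | id2;
}

/* Bits 9:4 of ID2: manufacturer's model number. */
static unsigned int phy_model(uint32_t phy_id)
{
	return (phy_id >> 4) & 0x3f;
}

/* Bits 3:0 of ID2: revision number. */
static unsigned int phy_revision(uint32_t phy_id)
{
	return phy_id & 0xf;
}
```

With this composition, the Delock values 03a2/a411 give exactly 0x03a2a411, the EN8811H_PHY_ID quoted above. The model and revision fields of 0x7c9f7061 and 0x003b1861 are the same, while the OUI-derived upper bits differ.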

Birger

^ permalink raw reply

* RE: [EXTERNAL] Re: [PATCH net-next] net: mana: Add handler for sriov configure
From: Haiyang Zhang @ 2026-07-01 17:54 UTC (permalink / raw)
  To: Bjorn Helgaas, Leon Romanovsky
  Cc: Haiyang Zhang, Paul Rosswurm, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Wei Liu, Dexuan Cui,
	Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Bjorn Helgaas, Simon Horman,
	Shradha Gupta, Dipayaan Roy, Erni Sri Satya Vennela,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
In-Reply-To: <20260513190509.GA328362@bhelgaas>



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Wednesday, May 13, 2026 3:05 PM
> To: Leon Romanovsky <leon@kernel.org>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>; Haiyang Zhang
> <haiyangz@linux.microsoft.com>; Paul Rosswurm <paulros@microsoft.com>;
> linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Wei Liu <wei.liu@kernel.org>; Dexuan Cui
> <DECUI@microsoft.com>; Long Li <longli@microsoft.com>; Andrew Lunn
> <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; Bjorn Helgaas <bhelgaas@google.com>; Simon
> Horman <horms@kernel.org>; Shradha Gupta
> <shradhagupta@linux.microsoft.com>; Dipayaan Roy
> <dipayanroy@linux.microsoft.com>; Erni Sri Satya Vennela
> <ernis@linux.microsoft.com>; linux-kernel@vger.kernel.org;
> linux-pci@vger.kernel.org
> Subject: Re: [EXTERNAL] Re: [PATCH net-next] net: mana: Add handler for
> sriov configure
> 
> On Wed, May 13, 2026 at 09:47:49PM +0300, Leon Romanovsky wrote:
> > On Fri, May 08, 2026 at 06:10:29PM -0500, Bjorn Helgaas wrote:
> > > On Fri, May 08, 2026 at 10:47:14PM +0000, Haiyang Zhang wrote:
> > > > > On Fri, May 08, 2026 at 03:04:06PM -0700, Haiyang Zhang wrote:
> > > > > > From: Haiyang Zhang <haiyangz@microsoft.com>
> > > > > >
> > > > > > Add callback function for the pci_driver, sriov_configure.
> > > > > >
> > > > > > Also disable VF autoprobe when it runs as PF driver on bare metal,
> > > > > > since the hardware side may not have the VF ready immediately.
> > > > > >
> > > > > > Export pci_vf_drivers_autoprobe() so the driver can toggle the VF
> > > > > > autoprobe flag.
> > > > >
> > > > > Technically pci_vf_drivers_autoprobe() doesn't *toggle* the autoprobe
> > > > > flag.  That would mean setting it to the opposite of its current
> > > > > value.
> > > > >
> > > > > Here I would say "so the driver can prevent autoprobing of the VFs",
> > > > > which is the intent.
> > > > Thanks, I will change the wording.
> > > >
> > > > >
> > > > > Out of curiosity, how do the VFs eventually get probed?  I guess
> > > > > there's some other mechanism that tells you when they're ready, and
> > > > > you manually use sysfs 'sriov_drivers_autoprobe' to enable probing,
> > > > > then bind drivers to them via sysfs?
> > > > We have a user program talking to the Azure backplane to get that information.
> > > > @Paul Rosswurm, do you have more details?
> > > >
> > > >
> > > > > The prevention of autoprobing sounds like a critical part of this
> > > > > change; might be worth saying something in the subject, because "add
> > > > > sriov configure" doesn't include much information.
> > > > How about "Add handler for sriov configure with VF autoprobe off"?
> > >
> > > OK by me :)
> >
> > I believe it is the wrong decision to allow toggling a user‑visible knob
> > without the user’s awareness. In this case, they can either disable
> > autoprobe on the PF or rely on EPROBE_DEFER. In all cases, the same
> > functionality can be achieved without changing PCI autoprobe code.
> 
> OK, Haiyang, can you drop my ack please?  If Leon's solutions don't
> work for you, continue this conversation and we can explore
> alternatives.

Sure, I will submit an updated patch without changing VF autoprobe.

Thanks,
- Haiyang

^ permalink raw reply

* Re: [PATCH AUTOSEL 7.0-6.12] net: usb: cdc_ncm: add Apple Mac USB-C direct networking quirk
From: Jan Kot @ 2026-07-01 17:55 UTC (permalink / raw)
  To: sashal
  Cc: alex, andrew+netdev, davem, edumazet, horms, kuba, linux-kernel,
	linux-usb, netdev, oliver, pabeni, patches, stable
In-Reply-To: <20260520111944.3424570-19-sashal@kernel.org>

Hi,
I noticed that there may be more PIDs for Apple Silicon Macs.
My M1 MacBook Air uses PID 0x1903, not 0x1905.
Looking at libimobiledevice/usbmuxd for reference, they actually define
a range of PIDs from 0x1901 to 0x1905 for these devices:
https://github.com/libimobiledevice/usbmuxd/blob/master/src/usb.h
I guess the whole range should be added to the id_table as well.
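As a rough illustration, the PID range mentioned above can be expressed as a simple check. This is a hypothetical helper, not the cdc_ncm quirk itself. 0x05ac is Apple's USB vendor ID. A kernel usb_device_id table matches idProduct exactly, so covering 0x1901..0x1905 there would presumably need one entry per PID.

```c
#include <stdbool.h>
#include <stdint.h>

#define APPLE_VENDOR_ID 0x05ac  /* Apple's USB vendor ID */

/* Hypothetical helper: true if (vid, pid) falls in the 0x1901..0x1905
 * range that usbmuxd's usb.h defines, per the message above.
 */
static bool apple_mac_direct_net(uint16_t vid, uint16_t pid)
{
	return vid == APPLE_VENDOR_ID && pid >= 0x1901 && pid <= 0x1905;
}
```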

Best regards,
Jan Kot

^ permalink raw reply

* Re: [PATCH net] ppp: defer channel free to an RCU grace period to fix pppol2tp RX UAF
From: Norbert Szetei @ 2026-07-01 18:00 UTC (permalink / raw)
  To: Breno Leitao
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Qingfang Deng, Taegu Ha, Yue Haibing,
	Sebastian Andrzej Siewior, Kees Cook, linux-ppp, linux-kernel
In-Reply-To: <akUSFR-ih9U27fgr@gmail.com>

On Jul 1, 2026, at 15:15, Breno Leitao <leitao@debian.org> wrote:
> 
> On Wed, Jul 01, 2026 at 02:14:39PM +0200, Norbert Szetei wrote:
>> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
>> index 57c68efa5ff8..cb8fe37170d3 100644
>> --- a/drivers/net/ppp/ppp_generic.c
>> +++ b/drivers/net/ppp/ppp_generic.c
>> @@ -184,6 +184,7 @@ struct channel {
>> struct list_head clist; /* link in list of channels per unit */
>> spinlock_t upl; /* protects `ppp' and 'bridge' */
>> struct channel __rcu *bridge; /* "bridged" ppp channel */
>> + struct rcu_head rcu; /* for RCU-deferred free of the channel */
>> #ifdef CONFIG_PPP_MULTILINK
>> u8 avail; /* flag used in multilink stuff */
>> u8 had_frag; /* >= 1 fragments have been sent */
>> @@ -3583,7 +3584,7 @@ static void ppp_release_channel(struct channel *pch)
>> }
>> skb_queue_purge(&pch->file.xq);
>> skb_queue_purge(&pch->file.rq);
>> - kfree(pch);
>> + kfree_rcu(pch, rcu);
> 
> Why not use kfree_rcu_mightsleep() instead? That would eliminate the need
> for the additional `struct rcu_head rcu;` field.

You are right that kfree_rcu_mightsleep() would be simpler, but it only
frees the object: there is no callback in which to run the deferred purge,
so it can't handle the in-flight skb. I'll keep the rcu_head and use
call_rcu() in v2.

^ permalink raw reply

* [PATCH net-next v2] net: mana: Add handler for sriov configure
From: Haiyang Zhang @ 2026-07-01 18:01 UTC (permalink / raw)
  To: linux-hyperv, netdev, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Erni Sri Satya Vennela,
	Dipayaan Roy, Aditya Garg, Shradha Gupta, linux-kernel
  Cc: paulros

From: Haiyang Zhang <haiyangz@microsoft.com>

Add a callback for the pci_driver .sriov_configure hook.

It asks the NIC to provide the requested number of VFs, or disables
VFs if the requested number is zero.
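The return convention of the handler can be sketched in userspace with stubs. When a number is written to sriov_numvfs, the PCI core calls .sriov_configure and expects the number of VFs now enabled, or a negative errno. The stubs and the `max_vfs` limit below are hypothetical stand-ins for pci_enable_sriov()/pci_disable_sriov() and the device's VF limit.

```c
#include <errno.h>

static int max_vfs = 8;   /* hypothetical VF limit for the stub */
static int enabled_vfs;   /* VFs the stub considers enabled */

static int stub_enable_sriov(int numvfs)
{
	if (numvfs > max_vfs)
		return -ERANGE;
	enabled_vfs = numvfs;
	return 0;
}

static void stub_disable_sriov(void)
{
	enabled_vfs = 0;
}

/* Same shape as mana_sriov_configure(): number of VFs enabled on
 * success (0 after disabling), negative errno on failure.
 */
static int sriov_configure(int numvfs)
{
	int err = 0;

	if (numvfs > 0)
		err = stub_enable_sriov(numvfs);
	else
		stub_disable_sriov();

	return err ? err : numvfs;
}
```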

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
v2:
  No longer change VF autoprobe, as discussed with Leon Romanovsky and Bjorn Helgaas.

---
 drivers/net/ethernet/microsoft/mana/gdma_main.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index a0fdd052d7f1..0b7380fd1da8 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -2446,6 +2446,20 @@ static void mana_gd_shutdown(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 }
 
+static int mana_sriov_configure(struct pci_dev *pdev, int numvfs)
+{
+	int err = 0;
+
+	dev_info(&pdev->dev, "Requested num VFs: %d\n", numvfs);
+
+	if (numvfs > 0)
+		err = pci_enable_sriov(pdev, numvfs);
+	else
+		pci_disable_sriov(pdev);
+
+	return err ? err : numvfs;
+}
+
 static const struct pci_device_id mana_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_MICROSOFT, MANA_PF_DEVICE_ID) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_MICROSOFT, MANA_PF2_DEVICE_ID) },
@@ -2461,6 +2475,7 @@ static struct pci_driver mana_driver = {
 	.suspend	= mana_gd_suspend,
 	.resume		= mana_gd_resume,
 	.shutdown	= mana_gd_shutdown,
+	.sriov_configure = mana_sriov_configure,
 };
 
 static int __init mana_driver_init(void)
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net] ppp: defer channel free to an RCU grace period to fix pppol2tp RX UAF
From: Norbert Szetei @ 2026-07-01 18:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Qingfang Deng, Taegu Ha, Yue Haibing,
	Kees Cook, linux-ppp, linux-kernel
In-Reply-To: <20260701132552.nFP2AZrJ@linutronix.de>

On Jul 1, 2026, at 15:25, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> On 2026-07-01 14:14:39 [+0200], Norbert Szetei wrote:
>> --- a/drivers/net/ppp/ppp_generic.c
>> +++ b/drivers/net/ppp/ppp_generic.c
>> @@ -184,6 +184,7 @@ struct channel {
>> struct list_head clist; /* link in list of channels per unit */
>> spinlock_t upl; /* protects `ppp' and 'bridge' */
>> struct channel __rcu *bridge; /* "bridged" ppp channel */
>> + struct rcu_head rcu; /* for RCU-deferred free of the channel */
>> #ifdef CONFIG_PPP_MULTILINK
>> u8 avail; /* flag used in multilink stuff */
>> u8 had_frag; /* >= 1 fragments have been sent */
>> @@ -3583,7 +3584,7 @@ static void ppp_release_channel(struct channel *pch)
>> }
>> skb_queue_purge(&pch->file.xq);
>> skb_queue_purge(&pch->file.rq);
>> - kfree(pch);
>> + kfree_rcu(pch, rcu);
> 
> From looking at ppp_input(), what ensures that the in-flight skb is not
> added to the skb_queue that is purged above?

Good catch: purging before the free races with an in-flight ppp_input()
and leaks the skb, which I confirmed with kmemleak. In v2 I moved the
purge into the call_rcu() callback so it runs after the grace period.

N.

> 
>> }
> 
> Sebastian
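
The ordering problem raised above can be illustrated with a toy, single-threaded model. This is not kernel RCU: here a "grace period" simply ends when the last reader leaves, only one callback can be pending, and `rq_len` stands in for the length of `pch->file.rq`. All names are hypothetical. `run_scenario(false)` models the v1 ordering (purge, then defer only the free) and `run_scenario(true)` models the v2 ordering (purge and free both inside the callback).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct channel: only the receive queue length matters. */
struct toy_channel {
	int rq_len;   /* models skb_queue_len(&pch->file.rq) */
	bool freed;
};

typedef void (*toy_cb)(struct toy_channel *);

static int toy_readers;                 /* readers inside a read-side section */
static toy_cb pending_cb;               /* at most one deferred callback */
static struct toy_channel *pending_arg;
static int leaked_skbs;                 /* skbs still queued at free time */

static void toy_free(struct toy_channel *ch)
{
	leaked_skbs += ch->rq_len;      /* a plain kfree() would lose these */
	ch->freed = true;
}

static void toy_purge_and_free(struct toy_channel *ch)
{
	ch->rq_len = 0;                 /* purge after the grace period */
	toy_free(ch);
}

static void toy_call_rcu(struct toy_channel *ch, toy_cb cb)
{
	pending_cb = cb;
	pending_arg = ch;
	if (toy_readers == 0) {         /* simplification: real call_rcu() always defers */
		pending_cb = NULL;
		cb(ch);
	}
}

static void toy_read_lock(void) { toy_readers++; }

static void toy_read_unlock(void)
{
	if (--toy_readers == 0 && pending_cb) {
		toy_cb cb = pending_cb;

		pending_cb = NULL;
		cb(pending_arg);        /* "grace period" over: run the callback */
	}
}

/* One interleaving: a ppp_input()-like reader is in flight during release
 * and queues one skb before leaving its read-side section. */
static int run_scenario(bool purge_in_callback)
{
	struct toy_channel ch = { .rq_len = 0, .freed = false };

	leaked_skbs = 0;
	toy_read_lock();                                /* reader enters */
	if (purge_in_callback) {
		toy_call_rcu(&ch, toy_purge_and_free);  /* v2 ordering */
	} else {
		ch.rq_len = 0;                          /* v1: purge first ... */
		toy_call_rcu(&ch, toy_free);            /* ... defer only the free */
	}
	ch.rq_len++;                                    /* late skb_queue_tail() */
	toy_read_unlock();                              /* reader leaves */
	return ch.freed ? leaked_skbs : -1;
}
```

In the v1 ordering the skb queued by the in-flight reader is still on the queue when the deferred free runs, matching the leak confirmed with kmemleak; moving the purge into the callback closes that window.
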



* [PATCH net v2] ppp: defer channel free to an RCU grace period to fix pppol2tp RX UAF
From: Norbert Szetei @ 2026-07-01 18:12 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Qingfang Deng, Sebastian Andrzej Siewior,
	Breno Leitao, Taegu Ha, Kees Cook, linux-ppp, linux-kernel

pppol2tp_recv() runs in the L2TP UDP-encap softirq RX path:

 l2tp_udp_encap_recv() -> l2tp_recv_common() -> pppol2tp_recv()
   -> ppp_input(&po->chan)

It runs under rcu_read_lock() holding only an l2tp_session reference and
takes NO reference on the internal PPP channel (struct channel,
chan->ppp) that ppp_input() dereferences.

The pppox socket is SOCK_RCU_FREE, so 'po' and the embedded ppp_channel
are RCU-safe.  But the internal struct channel is a separate allocation
that ppp_release_channel() frees with a plain kfree():

 close(data socket) -> pppol2tp_release() -> pppox_unbind_sock()
   -> ppp_unregister_channel() -> ppp_release_channel() -> kfree(pch)

For a channel that is bound (PPPIOCGCHAN) but not attached to a ppp unit
(no PPPIOCCONNECT, pch->ppp == NULL) and not bridged, teardown skips
both ppp_disconnect_channel()'s synchronize_net() and
ppp_unbridge_channels()'s synchronize_rcu(), so the kfree() has no grace
period.  rcu_read_lock() in pppol2tp_recv() does not protect against a
plain kfree(), so an in-flight ppp_input() on one CPU can dereference
the channel just freed by close() on another CPU.

The bug is reachable by an unprivileged user.

Defer the channel free to an RCU callback via call_rcu() so the grace
period fences any in-flight ppp_input(). The disconnect and unbridge
teardown paths already fence with synchronize_net()/synchronize_rcu();
call_rcu() does the same here without stalling the close() path.

Fixes: ee40fb2e1eb5 ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
---
v2:
- Moved skb_queue_purge() to a dedicated RCU callback to prevent leaking
  skbs added by an in-flight ppp_input() during the grace period (Sebastian).
- Retained call_rcu() to avoid introducing synchronous multi-millisecond
  latency into the teardown path.
v1: https://lore.kernel.org/netdev/C954A7EA-AA98-4E3C-80B5-42C34B3183A3@doyensec.com/

 drivers/net/ppp/ppp_generic.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 57c68efa5ff8..2d57de77780f 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -184,6 +184,7 @@ struct channel {
 	struct list_head clist;		/* link in list of channels per unit */
 	spinlock_t	upl;		/* protects `ppp' and 'bridge' */
 	struct channel __rcu *bridge;	/* "bridged" ppp channel */
+	struct rcu_head rcu;		/* for RCU-deferred free of the channel */
 #ifdef CONFIG_PPP_MULTILINK
 	u8		avail;		/* flag used in multilink stuff */
 	u8		had_frag;	/* >= 1 fragments have been sent */
@@ -3562,6 +3563,18 @@ ppp_disconnect_channel(struct channel *pch)
 	return err;
 }
 
+/* Purge after the grace period: a late ppp_input() may still queue an
+ * skb on pch->file.rq before the last RCU reader drains.
+ */
+static void ppp_release_channel_free(struct rcu_head *rcu)
+{
+	struct channel *pch = container_of(rcu, struct channel, rcu);
+
+	skb_queue_purge(&pch->file.xq);
+	skb_queue_purge(&pch->file.rq);
+	kfree(pch);
+}
+
 /*
  * Drop a reference to a ppp channel and free its memory if the refcount reaches
  * zero.
@@ -3581,9 +3594,7 @@ static void ppp_release_channel(struct channel *pch)
 		pr_err("ppp: destroying undead channel %p !\n", pch);
 		return;
 	}
-	skb_queue_purge(&pch->file.xq);
-	skb_queue_purge(&pch->file.rq);
-	kfree(pch);
+	call_rcu(&pch->rcu, ppp_release_channel_free);
 }
 
 static void __exit ppp_cleanup(void)
-- 
2.54.0
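
The new callback only receives a pointer to the `struct rcu_head` embedded in `struct channel`, and recovers the channel with `container_of()`. The sketch below reproduces that pattern in userspace C. The `container_of` macro here is a simplified re-creation (the kernel version also type-checks the pointer), and `toy_rcu_head`, `toy_channel`, `toy_call_rcu()` and `toy_grace_period_end()` are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace container_of(): step back from the member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rcu_head. */
struct toy_rcu_head {
	void (*func)(struct toy_rcu_head *head);
};

/* Hypothetical stand-in for struct channel with the head embedded mid-struct. */
struct toy_channel {
	int file_rq_len;
	struct toy_rcu_head rcu;
	int chan_id;
};

static int last_freed_id = -1;

/* Same shape as ppp_release_channel_free(): recover the parent, purge, free. */
static void toy_channel_free(struct toy_rcu_head *head)
{
	struct toy_channel *pch = container_of(head, struct toy_channel, rcu);

	pch->file_rq_len = 0;           /* models the skb_queue_purge() calls */
	last_freed_id = pch->chan_id;
	free(pch);
}

/* Stand-in for call_rcu(): only records the callback. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *))
{
	head->func = func;
}

/* Stand-in for the end of a grace period: run the recorded callback. */
static void toy_grace_period_end(struct toy_rcu_head *head)
{
	head->func(head);               /* head is invalid after this returns */
}
```

Because the head is embedded rather than separately allocated, deferring the free needs no extra allocation on the close() path.
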


* Re: RTL8159 firmware
From: Jan Hendrik Farr @ 2026-07-01 18:15 UTC (permalink / raw)
  To: Birger Koblitz
  Cc: andrew+netdev, davem, edumazet, hsu.chih.kai, kuba, linux-kernel,
	linux-usb, netdev, olek2, pabeni
In-Reply-To: <5dc0e654-0bdb-422c-9049-94ee6d8867e4@birger-koblitz.de>

On 01 19:24:13, Birger Koblitz wrote:
> Hi Jan,
> 
> On 7/1/26 19:13, Jan Hendrik Farr wrote:
> > Hi Birger,
> > 
> > it looks like the firmware file rtl_nic/rtl8159-1.fw isn't in linux-firmware yet.
> > Could you send it for people to potentially test?
> > 
> > Jan
> > 
> The code to create the binary firmware file is at:
> https://gitlab.com/koblitz-rtlnic/rtlnic_fw

I'm getting a 404.


> But I cannot submit the firmware itself to linux-firmware, as the sourcecode from
> which the binary data is extracted is published by Realtek under the GPL.
> For linux-firmware, a binary distribution license is necessary, which requires
> someone from Realtek to license it under their usual firmware license. I
> contacted Realtek, but never heard back.

Ok, let's hope they'll get back to you...



Jan



* Re: [PATCH 5/9] ax88179_178a: Add support for ethtool pause parameter configuration
From: Andrew Lunn @ 2026-07-01 18:41 UTC (permalink / raw)
  To: Birger Koblitz
  Cc: Maxime Chevallier, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-usb, netdev, linux-kernel
In-Reply-To: <3cc16d45-7b6d-4fb0-8a7b-d6b4d53ab036@birger-koblitz.de>

> So, I will give phylink a try and send a new patch version out. I am a
> bit worried because the PHYs are real divas, getting them to survive
> a suspend/resume cycle was a bit like herding fleas:

It might be that having PHY drivers helps, since it is the PHY driver's
problem to handle suspend/resume. The code is then cleanly separated
between drivers, and one should not affect the other.

	Andrew


* Re: [PATCH] selftests: Open /dev/udmabuf O_RDONLY
From: T.J. Mercier @ 2026-07-01 18:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kraxel, vivek.kasireddy, Shuah Khan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-kselftest, linux-kernel, netdev,
	bpf
In-Reply-To: <20260626180941.45158025@kernel.org>

On Fri, Jun 26, 2026 at 6:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 25 Jun 2026 11:15:55 -0700 T.J. Mercier wrote:
> > Write permissions on the /dev/udmabuf device file are not required to
> > issue ioctls and allocate udmabufs. Applications should be opening this
> > file as O_RDONLY. The BPF dmabuf_iter selftest already does this. [1]
> >
> > Remove the write access mode from the drivers/dma-buf/udmabuf.c and
> > drivers/net/hw/ncdevmem.c selftests.
>
> You need to explain "why", too. Why change it if it clearly
> worked for everyone running this test until now.
> --
> pw-bot: cr

Principle of least privilege. Folks use or point to these selftests as
examples, and then wonder why O_RDWR doesn't work on systems where
write permissions are not available on /dev/udmabuf.
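
As an illustration of the least-privilege point, here is a minimal sketch of allocating a udmabuf with the device opened `O_RDONLY`. It assumes the uapi header `<linux/udmabuf.h>` (`struct udmabuf_create`, `UDMABUF_CREATE`, `UDMABUF_FLAGS_CLOEXEC`) and glibc's `memfd_create()`. The helper name and memfd name are hypothetical, `size` must be a multiple of the page size, and udmabuf expects the memfd to be sealed against shrinking.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns a dma-buf fd, or a negative errno. */
static int create_udmabuf(const char *dev_path, size_t size)
{
	struct udmabuf_create create = { 0 };
	int devfd, memfd, buf, err;

	/* Read-only access is enough to issue the UDMABUF_CREATE ioctl. */
	devfd = open(dev_path, O_RDONLY | O_CLOEXEC);
	if (devfd < 0)
		return -errno;

	memfd = memfd_create("udmabuf-example", MFD_ALLOW_SEALING);
	if (memfd < 0) {
		err = -errno;
		goto out_dev;
	}
	/* udmabuf requires F_SEAL_SHRINK and rejects F_SEAL_WRITE. */
	if (ftruncate(memfd, (off_t)size) < 0 ||
	    fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		err = -errno;
		goto out_mem;
	}

	create.memfd = (__u32)memfd;
	create.flags = UDMABUF_FLAGS_CLOEXEC;
	create.offset = 0;
	create.size = size;
	buf = ioctl(devfd, UDMABUF_CREATE, &create);
	err = buf < 0 ? -errno : buf;
out_mem:
	close(memfd);   /* the udmabuf keeps its own references to the pages */
out_dev:
	close(devfd);
	return err;
}
```
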


* Re: [PATCH] selftests: Open /dev/udmabuf O_RDONLY
From: Jakub Kicinski @ 2026-07-01 18:57 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: kraxel, vivek.kasireddy, Shuah Khan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, linux-kselftest, linux-kernel, netdev,
	bpf
In-Reply-To: <CABdmKX2oG9m316hiJpSXbujrT3vgE5hUpzH_WHfjNxBJ1_+BdA@mail.gmail.com>

On Wed, 1 Jul 2026 11:53:15 -0700 T.J. Mercier wrote:
> On Fri, Jun 26, 2026 at 6:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 25 Jun 2026 11:15:55 -0700 T.J. Mercier wrote:  
> > > Write permissions on the /dev/udmabuf device file are not required to
> > > issue ioctls and allocate udmabufs. Applications should be opening this
> > > file as O_RDONLY. The BPF dmabuf_iter selftest already does this. [1]
> > >
> > > Remove the write access mode from the drivers/dma-buf/udmabuf.c and
> > > drivers/net/hw/ncdevmem.c selftests.  
> >
> > You need to explain "why", too. Why change it if it clearly
> > worked for everyone running this test until now.
> > --
> > pw-bot: cr  
> 
> Principle of least privilege. Folks use or point to these selftests as
> examples, and then wonder why O_RDWR doesn't work on systems where
> write permissions are not available on /dev/udmabuf.

Alright, pop that into the commit msg and repost please.


