Netdev List

* Re: [PATCH net-next v2 2/6] r8169: create a virtual interrupt for linkchg
From: Jakub Kicinski @ 2026-06-13 22:05 UTC
  To: javen_xu
  Cc: Jakub Kicinski, hkallweit1, nic_swsd, andrew+netdev, davem,
	edumazet, pabeni, maxime.chevallier, horms, netdev, linux-kernel
In-Reply-To: <20260611094345.880-3-javen_xu@realsil.com.cn>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
r8169: create a virtual interrupt for linkchg

Creates a virtual interrupt for linkchg. To support phylink, most of the
dependencies on tp->phydev should be decoupled, so a virtual interrupt is
added for the MAC interrupt to signal link status changes.

> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index c60710f9bd21f..560f987437b63 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
[ ... ]
> @@ -4869,7 +4872,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	}
>  
>  	if (status & LinkChg)
> -		phy_mac_interrupt(tp->phydev);
> +		generic_handle_domain_irq(tp->phy_irq_domain, 0);

[Severity: High]
Does this actually work as intended without the PHY driver implementing the
interrupt callbacks?

Because the Realtek PHY drivers don't implement .config_intr and
.handle_interrupt, phy_drv_supports_irq() will return false, causing
phy_probe() to force phydev->irq to PHY_POLL.

Will this cause the virtual interrupt to be unhandled by phylib, meaning
link state detection degrades from an immediate event-driven notification to
1-second polling?

>  	rtl_irq_disable(tp);
>  	napi_schedule(&tp->napi);
[ ... ]
> @@ -5458,6 +5488,21 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
[ ... ]
> +	virq = irq_create_mapping(tp->phy_irq_domain, 0);
> +	if (!virq)
> +		ret = -EINVAL;
> +
> +	new_bus->irq[0] = virq;
> +
>  	ret = devm_mdiobus_register(&pdev->dev, new_bus);

[Severity: Medium]
If irq_create_mapping() fails, ret is set to -EINVAL, but does this
assignment get immediately overwritten by the return value of
devm_mdiobus_register()?

This would cause the IRQ mapping failure to be silently ignored, leaving
the bus with an invalid IRQ of 0 without aborting the driver initialization.

>  	if (ret)
>  		return ret;

* Re: [PATCH net] net: bcmgenet: Use weighted round-robin TX DMA arbitration
From: patchwork-bot+netdevbpf @ 2026-06-13 22:00 UTC
  To: Ovidiu Panait
  Cc: opendmb, florian.fainelli, bcm-kernel-feedback-list,
	andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel
In-Reply-To: <20260610085238.56300-1-ovidiu.panait.rb@renesas.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 10 Jun 2026 08:52:38 +0000 you wrote:
> Under heavy network traffic, we observed sporadic TX queue timeouts on the
> Raspberry Pi 4. The timeouts can be reproduced by stress testing the TX
> path with multiple concurrent iperf UDP streams:
> 
>     iperf3 -c <ip> -u -b0 -P16 -t60
>     NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2044 ms
>     NETDEV WATCHDOG: CPU: 3: transmit queue 0 timed out 2004 ms
> 
> [...]

Here is the summary with links:
  - [net] net: bcmgenet: Use weighted round-robin TX DMA arbitration
    https://git.kernel.org/netdev/net/c/fd615abd5311

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

* Re: [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues
From: Jakub Kicinski @ 2026-06-13 21:57 UTC
  To: Nicolai Buchwitz
  Cc: Doug Berger, Florian Fainelli, bcm-kernel-feedback-list,
	Andrew Lunn, David S . Miller, Eric Dumazet, Paolo Abeni,
	Justin Chen, Ovidiu Panait, netdev, linux-kernel
In-Reply-To: <20260612205915.3156127-1-nb@tipi-net.de>

On Fri, 12 Jun 2026 22:59:12 +0200 Nicolai Buchwitz wrote:
> Tested on Raspberry Pi CM4 (BCM2711):
>   - Ovidiu's reproducer (iperf3 -u -b0 -P16 -t60) no longer trips
>     NETDEV_WATCHDOG.
>   - UDP sustains 956 Mbit/s line rate over 60 s with 0 datagrams
>     lost (0/4952890).
>   - Single-stream TCP throughput unchanged at 943 Mbit/s.

Of course it has no impact on a single TCP stream test, since a TCP
stream can only use one queue. If anything, it should help.
The testing here is not very convincing. At least install a realistic
qdisc (fq/fq_codel/cake) and run multi-stream test with multiple cores?
What's the CPU idle delta in such a test?

The reason for this change is not coming thru from the submission.
Ovidiu's patch makes much more intuitive sense. I'll apply that,
please rebase.

* Re: [PATCH net-next 3/4] net: dsa: realtek: rtl8366rb: Disable STP learning on all ports in setup
From: Luiz Angelo Daros de Luca @ 2026-06-13 21:51 UTC
  To: Linus Walleij
  Cc: Alvin Šipraga, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
In-Reply-To: <CAJq09z6oPRAsF2QCD9n4HjruHvE-nqanYvP1CaFVDn7sg-ef6Q@mail.gmail.com>

> Hum... I might need to check rtl8365mb as well. There we disable only
> MSTP/FID 0.

Just for the record, I double-checked rtl8365mb.

The field rtl8365mb_vlan4k.fid, which is currently left as 0 by the
driver, is actually called fid_msti in the vendor code. We should
probably rename it to msti if we ever implement .port_mst_state_set.
However, as it stands today (always defaulting to 0), controlling the
STP states only on MSTI 0 is perfectly fine. Hardware tests confirm
that the MSTI 0 state accurately dictates the port state, though I
haven't explicitly tested if setting vlan4k->fid = 1 would actually
make the VLAN follow the MSTI 1 state.

Regards,

Luiz

* Re: [PATCH net-next] virtio-net: support xsk wake up
From: Jakub Kicinski @ 2026-06-13 21:46 UTC
  To: Eugenio Perez Martin
  Cc: Menglong Dong, xuanzhuo, mst, jasowang, andrew+netdev, davem,
	edumazet, pabeni, netdev, virtualization, linux-kernel
In-Reply-To: <CAJaqyWeYiruNosJsMTh2jQ=XCEcPg7956aqeRRpDSyynfpjNZA@mail.gmail.com>

On Wed, 10 Jun 2026 10:27:28 +0200 Eugenio Perez Martin wrote:
> And the From and Signed-off-by emails don't match, which I'm not sure is valid.

It's clearly the same person. Please focus on the code, not trivial
process issues.

Quoting documentation:

  Reviewer guidance
  -----------------

  [...]

  Reviewers are highly encouraged to do more in-depth review of submissions
  and not focus exclusively on process issues, trivial or subjective
  matters like code formatting, tags etc.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#reviewer-guidance

* Re: [PATCH net-next] virtio-net: support xsk wake up
From: Jakub Kicinski @ 2026-06-13 21:44 UTC
  To: Menglong Dong
  Cc: xuanzhuo, mst, jasowang, eperezma, andrew+netdev, davem, edumazet,
	pabeni, netdev, virtualization, linux-kernel
In-Reply-To: <20260610081648.2205711-1-dongml2@chinatelecom.cn>

On Wed, 10 Jun 2026 16:16:48 +0800 Menglong Dong wrote:
> +	/* If both rq->vq and fill ring are empty, and then the user submit
> +	 * all the chunks to the fill ring and check the wake up flag
> +	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
> +	 * we will lose the chance to wake up the rx napi, so we have to
> +	 * set the need_wakeup flag here.
> +	 */

TBH all the comments you're adding are harder to understand than the
code itself ;( Please try to phrase them better or just remove them.

* Re: [PATCH v2 bpf-next/net 0/5] bpf: Support RX/TX HW timestamp proxy.
From: Kuniyuki Iwashima @ 2026-06-13 21:43 UTC
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Stanislav Fomichev, Andrii Nakryiko, John Fastabend,
	Kumar Kartikeya Dwivedi, Eduard Zingerman, Song Liu,
	Yonghong Song, Jiri Olsa, Andrew Lunn, David S . Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Willem de Bruijn,
	Kuniyuki Iwashima, bpf, netdev
In-Reply-To: <20260613102041.55e3b50e@kernel.org>

On Sat, Jun 13, 2026 at 10:20 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 13 Jun 2026 00:59:57 +0000 Kuniyuki Iwashima wrote:
> > When standard socket applications are run on these hosts,
> > a userspace proxy is required to mediate traffic between the
> > hardware and the applications.
> >
> >             +---------+                 +----------------------+
> >             |  proxy  |                 |  socket application  |
> >             +---------+                 +----------------------+
> >               ^     ^                               ^
> >   userspace   |     |                               |
> >   -----------| |-----------------------------------------------
> >              | |    |    +---------------------+    | skb
> >              | |    `--->|  virtual interface  |<---'
> >   kernel     | |   skb   +---------------------+
> >   -----------| |-----------------------------------------------
> >               |
> >               v
> >        +------------+
> >        |  hardware  |
> >        +------------+
>
> The first patch looks kinda nonsensical but then I saw this diagram.
> Looks like you're vibe coding an integration that makes it easier to
> treat netdev as a slow path for a user networking stack.
> Please tell me if I'm missing anything otherwise add my nack if you
> repost.

Hmm, what would be a better way to tell users that HW TS is
available on tunnel devices?

Other options were 1) add an attribute to tie the tunnel to a physical
device and use its ndo_hwtstamp_set/get, or 2) add a bpf retval
hook in the path, like update_socket_protocol().

Or do you think applications should treat ENOTSUP from the ioctl
as a soft error and always set SOF_TIMESTAMPING_XXX_HARDWARE,
since the underlying physical device may support it?

Anyway, I'm happy to drop patch 1 for now and explore options.

(Believe it or not, I haven't let AI write code because I don't
 want AI to take over the most fun part.)

* Re: [PATCH 1/1] selftests: net: fix file owner for broadcast_ether_dst test
From: Jakub Kicinski @ 2026-06-13 21:32 UTC
  To: ross.porter
  Cc: Jakub Kicinski, linux-kselftest, netdev, stable, edoardo.canepa,
	davem, edumazet, pabeni, horms, shuah, oscmaes92, bacs,
	linux-kernel
In-Reply-To: <20260610062230.71573-2-ross.porter@canonical.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
selftests: net: fix file owner for broadcast_ether_dst test

This commit ensures the tcpdump output file in the broadcast_ether_dst test
is always owned by root by passing the -Z root argument.

> diff --git a/tools/testing/selftests/net/broadcast_ether_dst.sh b/tools/testing/selftests/net/broadcast_ether_dst.sh
> --- a/tools/testing/selftests/net/broadcast_ether_dst.sh
> +++ b/tools/testing/selftests/net/broadcast_ether_dst.sh
> @@ -44,7 +44,7 @@ test_broadcast_ether_dst() {
>  	# tcpdump will exit after receiving a single packet
>  	# timeout will kill tcpdump if it is still running after 2s
>  	timeout 2s ip netns exec "${CLIENT_NS}" \
> -		tcpdump -i link0 -c 1 -w "${CAPFILE}" icmp &> "${OUTPUT}" &
> +		tcpdump -i link0 -c 1 -w "${CAPFILE}" icmp -Z root &> "${OUTPUT}" &

[Severity: Medium]
Will this tcpdump command fail with a syntax error in test_broadcast_ether_dst()
on systems that use musl libc or when POSIXLY_CORRECT is set?

By placing the -Z root option after the positional filter expression icmp,
standard POSIX option parsing will stop at the first non-option argument.

While this works on glibc systems because getopt() permutes arguments by
default, on strict POSIX systems tcpdump might treat -Z root as part of the
filter expression. This would concatenate the arguments into "icmp -Z root"
and cause a fatal pcap filter compilation error, breaking the selftest
unconditionally in those environments.

Could the -Z root argument be moved before the icmp filter expression?
-- 
pw-bot: cr

* Re: [PATCH net v2] net/sched: cake: reject overhead values that underflow length
From: Jakub Kicinski @ 2026-06-13 21:26 UTC
  To: Samuel Moelius
  Cc: Toke Høiland-Jørgensen, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	moderated list:CAKE QDISC, open list:TC subsystem, open list
In-Reply-To: <20260609232935.1602659.8545fdb04fbe.cake-overhead-underflow@trailofbits.com>

On Tue,  9 Jun 2026 23:29:36 +0000 Samuel Moelius wrote:
> +static const struct netlink_range_validation_signed cake_overhead_range = {
> +	.min = -64,
> +	.max = 256,

Both Sashiko reviews complain: these values are neither safe nor sufficient.

How was the -64 chosen? It looks suspiciously close to the min Ethernet
frame length. But in that case (a) the FCS doesn't count, so 60, and
(b) even IPv4 TCP packets can be shorter (at the qdisc layer) than 64B,
leading to underflow...

I see min rate in cake is 64 but I don't see any other meaning of the
64 literal.

Toke, WDYT? Should we use a smaller constant (ETH_HLEN?) or do the
check on the datapath?

Also, small constants fit directly in nla_policy; you don't need
struct netlink_range_validation_signed.
-- 
pw-bot: cr

* Re: [RFC PATCH bpf-next 0/5] tcp: opportunistic loopback splice for BPF-paired sockets
From: Cong Wang @ 2026-06-13 21:25 UTC
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Cong Wang, Network Development, bpf,
	John Fastabend, Jakub Sitnicki, Jiayuan Chen, Hemanth Malla,
	zijianzhang
In-Reply-To: <20260613105730.0ca1ca07@kernel.org>

On Sat, Jun 13, 2026 at 10:57:30AM -0700, Jakub Kicinski wrote:
> On Fri, 12 Jun 2026 09:01:43 -0700 Alexei Starovoitov wrote:
> > Just saying that the code is free nowadays, so whether it's 1k lines
> > or 10 lines is irrelevant for the discussion.
> > 
> > As far as the idea goes, I think, it would be interesting in pre-AI era,
> > but today splice and friends are a prime target for bugs and more bugs.
> > skmsg and tcp_bpf are reeling from unfixed bugs too,
> > so my take is that we should not add any new features to skmsg
> > and instead deprecate what is already there.
> 
> 100% agreed. There are so many unfixed skmsg bugs it's hard to know
> where to start :( Kernel "intelligence" to help unoptimized applications
> is particularly unappealing right now.

You are absolutely right. :)

Thanks for offering opportunity for me to make profit out of it.

* Re: [PATCH 1/3 net-next v6] ipv4: centralize devconf sysctl handling
From: patchwork-bot+netdevbpf @ 2026-06-13 21:20 UTC
  To: Fernando Fernandez Mancera
  Cc: netdev, horms, pabeni, kuba, edumazet, dsahern, idosch, davem,
	nicolas.dichtel
In-Reply-To: <20260609204520.4670-1-fmancera@suse.de>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  9 Jun 2026 22:45:18 +0200 you wrote:
> The logic for handling IPv4 devconf sysctls is scattered. Notification
> and cache flushes are managed in devinet_conf_proc(), while a separate
> ipv4_doint_and_flush() function and DEVINET_SYSCTL_FLUSHING_ENTRY macro
> are used for properties that solely require a cache flush.
> 
> This patch refactors the sysctl handling by introducing a centralized
> helper, devinet_conf_post_set(). This new function evaluates the changed
> attribute and handles all necessary operations like triggering netlink
> notifications. It returns a boolean indicating whether a routing cache
> flush is required.
> 
> [...]

Here is the summary with links:
  - [1/3,net-next,v6] ipv4: centralize devconf sysctl handling
    https://git.kernel.org/netdev/net-next/c/3a29b55505f3
  - [2/3,net-next,v6] ipv4: handle devconf post-set actions on netlink updates
    https://git.kernel.org/netdev/net-next/c/489730ec2a73
  - [3/3,net-next,v6] selftests: net: add test for IPv4 devconf netlink notifications
    https://git.kernel.org/netdev/net-next/c/32229484e381

* Re: [PATCH net-next] tcp: refine tcp_sequence() for the FIN exception
From: patchwork-bot+netdevbpf @ 2026-06-13 21:20 UTC
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, ncardwell, kuniyu, netdev,
	eric.dumazet, gmbnomis
In-Reply-To: <20260608151452.706822-1-edumazet@google.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  8 Jun 2026 15:14:52 +0000 you wrote:
> Commit 0e24d17bd966 ("tcp: implement RFC 7323 window retraction
> receiver requirements") removed the special FIN case that
> was added in commit 1e3bb184e941 ("tcp: re-enable acceptance of
> FIN packets when RWIN is 0").
> 
> If a peer sends a segment containing data and a FIN flag before
> it learns about our window retraction and has a buggy TCP stack,
> it might place the FIN one byte beyond what it thinks is the
> right edge of the window (i.e., max_window_edge + 1).
> 
> [...]

Here is the summary with links:
  - [net-next] tcp: refine tcp_sequence() for the FIN exception
    https://git.kernel.org/netdev/net-next/c/91934d44468d

* [PATCH net-next v4] net: mana: Add Interrupt Moderation support
From: Haiyang Zhang @ 2026-06-13 20:57 UTC
  To: linux-hyperv, netdev, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Shradha Gupta, Erni Sri Satya Vennela, Dipayaan Roy, Aditya Garg,
	Breno Leitao, linux-kernel, linux-rdma
  Cc: paulros

From: Haiyang Zhang <haiyangz@microsoft.com>

Add Static and Dynamic Interrupt Moderation (DIM) support for
Rx and Tx.
Update the queue creation procedure to pass the related settings in a
new data struct.
Add functions to collect stats for DIM, and workers to update the DIM
data and settings.
Update the ethtool handlers to get/set the moderation settings from
the user.
To avoid detach/re-attach ops, ring the DIM doorbell to change settings
at run time.
By default, adaptive-rx/tx (DIM) is enabled if supported by HW.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
v4:
  Fixed tx stat, concurrency, and mb issues from Simon's review.

v3:
  Updated to avoid detach/re-attach ops as suggested by Paolo.

v2:
  Updated with comments from Jedrzej.

---
 drivers/net/ethernet/microsoft/Kconfig        |   1 +
 .../net/ethernet/microsoft/mana/gdma_main.c   |  29 +++
 drivers/net/ethernet/microsoft/mana/mana_en.c | 171 ++++++++++++++++++
 .../ethernet/microsoft/mana/mana_ethtool.c    | 167 ++++++++++++++++-
 include/net/mana/gdma.h                       |  24 ++-
 include/net/mana/mana.h                       |  54 ++++++
 6 files changed, 437 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/Kconfig b/drivers/net/ethernet/microsoft/Kconfig
index 3f36ee6a8ece..e9be18c92ca5 100644
--- a/drivers/net/ethernet/microsoft/Kconfig
+++ b/drivers/net/ethernet/microsoft/Kconfig
@@ -21,6 +21,7 @@ config MICROSOFT_MANA
 	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
 	depends on PCI_HYPERV
 	select AUXILIARY_BUS
+	select DIMLIB
 	select PAGE_POOL
 	select NET_SHAPER
 	help
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index c9ec80a1dd6f..7a012b1e5751 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /* Copyright (c) 2021, Microsoft Corporation. */
 
+#include <linux/bitfield.h>
 #include <linux/debugfs.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -464,6 +465,7 @@ static int mana_gd_disable_queue(struct gdma_queue *queue)
 #define DOORBELL_OFFSET_RQ	0x400
 #define DOORBELL_OFFSET_CQ	0x800
 #define DOORBELL_OFFSET_EQ	0xFF8
+#define DOORBELL_OFFSET_DIM	0x820
 
 static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
 				  enum gdma_queue_type q_type, u32 qid,
@@ -504,6 +506,16 @@ static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
 		addr += DOORBELL_OFFSET_SQ;
 		break;
 
+	case GDMA_DIM:
+		e.dim.id = qid;
+		e.dim.mod_usec = FIELD_GET(MANA_INTR_MODR_USEC_MAX, tail_ptr);
+		e.dim.mod_usec_vld = !!(tail_ptr & MANA_INTR_MODR_USEC_VLD);
+		e.dim.mod_comps = FIELD_GET(MANA_INTR_MODR_COMP_MASK, tail_ptr);
+		e.dim.mod_comps_vld = num_req;
+
+		addr += DOORBELL_OFFSET_DIM;
+		break;
+
 	default:
 		WARN_ON(1);
 		return;
@@ -538,6 +550,23 @@ void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit)
 }
 EXPORT_SYMBOL_NS(mana_gd_ring_cq, "NET_MANA");
 
+void mana_gd_ring_dim(struct gdma_queue *cq, u32 mod_usec, bool mod_usec_vld,
+		      u32 mod_comps, bool mod_comps_vld)
+{
+	struct gdma_context *gc = cq->gdma_dev->gdma_context;
+	u32 dim_val;
+
+	/* Convert the DIM values to doorbell parameters */
+	dim_val = FIELD_PREP(MANA_INTR_MODR_USEC_MAX, mod_usec) |
+		  FIELD_PREP(MANA_INTR_MODR_COMP_MASK, mod_comps);
+	if (mod_usec_vld)
+		dim_val |= MANA_INTR_MODR_USEC_VLD;
+
+	mana_gd_ring_doorbell(gc, cq->gdma_dev->doorbell, GDMA_DIM, cq->id,
+			      dim_val, mod_comps_vld);
+}
+EXPORT_SYMBOL_NS(mana_gd_ring_dim, "NET_MANA");
+
 #define MANA_SERVICE_PERIOD 10
 
 static void mana_serv_rescan(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 26aef21c6c2c..d36850084f2e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1579,6 +1579,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
 
 	mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
 			     sizeof(req), sizeof(resp));
+
+	req.hdr.req.msg_version = GDMA_MESSAGE_V3;
+	req.hdr.resp.msg_version = GDMA_MESSAGE_V2;
 	req.vport = vport;
 	req.wq_type = wq_type;
 	req.wq_gdma_region = wq_spec->gdma_region;
@@ -1587,6 +1590,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
 	req.cq_size = cq_spec->queue_size;
 	req.cq_moderation_ctx_id = cq_spec->modr_ctx_id;
 	req.cq_parent_qid = cq_spec->attached_eq;
+	req.req_cq_moderation = cq_spec->req_cq_moderation;
+	req.cq_moderation_comp = cq_spec->cq_moderation_comp;
+	req.cq_moderation_usec = cq_spec->cq_moderation_usec;
 
 	err = mana_send_request(apc->ac, &req, sizeof(req), &resp,
 				sizeof(resp));
@@ -1844,6 +1850,7 @@ static void mana_poll_tx_cq(struct mana_cq *cq)
 	struct gdma_posted_wqe_info *wqe_info;
 	unsigned int pkt_transmitted = 0;
 	unsigned int wqe_unit_cnt = 0;
+	unsigned int tx_bytes = 0;
 	struct mana_txq *txq = cq->txq;
 	struct mana_port_context *apc;
 	struct netdev_queue *net_txq;
@@ -1925,6 +1932,8 @@ static void mana_poll_tx_cq(struct mana_cq *cq)
 
 		mana_unmap_skb(skb, apc);
 
+		tx_bytes += skb->len;
+
 		napi_consume_skb(skb, cq->budget);
 
 		pkt_transmitted++;
@@ -1955,6 +1964,10 @@ static void mana_poll_tx_cq(struct mana_cq *cq)
 	if (atomic_sub_return(pkt_transmitted, &txq->pending_sends) < 0)
 		WARN_ON_ONCE(1);
 
+	/* Feed DIM with the completion rate observed here, in NAPI context. */
+	cq->tx_dim_pkts += pkt_transmitted;
+	cq->tx_dim_bytes += tx_bytes;
+
 	cq->work_done = pkt_transmitted;
 }
 
@@ -2306,6 +2319,119 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
 		xdp_do_flush();
 }
 
+static void mana_rx_dim_work(struct work_struct *work)
+{
+	struct dim *dim = container_of(work, struct dim, work);
+	struct dim_cq_moder cur_moder;
+	struct mana_cq *cq;
+
+	cur_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+	cq = container_of(dim, struct mana_cq, dim);
+
+	cur_moder.usec = min_t(u16, cur_moder.usec, MANA_INTR_MODR_USEC_MAX);
+	cur_moder.pkts = min_t(u16, cur_moder.pkts, MANA_INTR_MODR_COMP_MAX);
+
+	mana_gd_ring_dim(cq->gdma_cq, cur_moder.usec, true,
+			 cur_moder.pkts, true);
+
+	dim->state = DIM_START_MEASURE;
+}
+
+static void mana_tx_dim_work(struct work_struct *work)
+{
+	struct dim *dim = container_of(work, struct dim, work);
+	struct dim_cq_moder cur_moder;
+	struct mana_cq *cq;
+
+	cur_moder = net_dim_get_tx_moderation(dim->mode, dim->profile_ix);
+	cq = container_of(dim, struct mana_cq, dim);
+
+	cur_moder.usec = min_t(u16, cur_moder.usec, MANA_INTR_MODR_USEC_MAX);
+	cur_moder.pkts = min_t(u16, cur_moder.pkts, MANA_INTR_MODR_COMP_MAX);
+
+	mana_gd_ring_dim(cq->gdma_cq, cur_moder.usec, true,
+			 cur_moder.pkts, true);
+
+	dim->state = DIM_START_MEASURE;
+}
+
+/* The caller must update apc->rx/tx_dim_enabled before disabling and
+ * after enabling. And synchronize_net() before draining the DIM work,
+ * so that NAPI cannot observe a stale flag.
+ */
+int mana_dim_change(struct mana_cq *cq, bool enable)
+{
+	bool is_rx = cq->type == MANA_CQ_TYPE_RX;
+	struct mana_port_context *apc;
+	work_func_t work_func;
+	u32 usec, comp;
+
+	if (is_rx) {
+		apc = netdev_priv(cq->rxq->ndev);
+		usec = apc->intr_modr_rx_usec;
+		comp = apc->intr_modr_rx_comp;
+		work_func = mana_rx_dim_work;
+	} else {
+		apc = netdev_priv(cq->txq->ndev);
+		usec = apc->intr_modr_tx_usec;
+		comp = apc->intr_modr_tx_comp;
+		work_func = mana_tx_dim_work;
+	}
+
+	/* On enable, zero the DIM state so net_dim() starts measuring from
+	 * scratch.
+	 * On disable, drain any pending DIM work and restore the static
+	 * moderation values.
+	 */
+	if (enable) {
+		memset(&cq->dim, 0, sizeof(cq->dim));
+		cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+		INIT_WORK(&cq->dim.work, work_func);
+	} else {
+		cancel_work_sync(&cq->dim.work);
+		mana_gd_ring_dim(cq->gdma_cq, usec, true, comp, true);
+	}
+
+	return 0;
+}
+
+static void mana_update_rx_dim(struct mana_cq *cq)
+{
+	struct mana_port_context *apc = netdev_priv(cq->rxq->ndev);
+	struct dim_sample dim_sample = {};
+	struct mana_rxq *rxq = cq->rxq;
+
+	/* Pairs with smp_store_release() in mana_set_coalesce(): observing the
+	 * enable flag set guarantees the DIM (re)initialization is visible.
+	 */
+	if (!smp_load_acquire(&apc->rx_dim_enabled))
+		return;
+
+	dim_update_sample(READ_ONCE(cq->dim_event_ctr), rxq->stats.packets,
+			  rxq->stats.bytes, &dim_sample);
+	net_dim(&cq->dim, &dim_sample);
+}
+
+static void mana_update_tx_dim(struct mana_cq *cq)
+{
+	struct mana_port_context *apc = netdev_priv(cq->txq->ndev);
+	struct dim_sample dim_sample = {};
+
+	/* Pairs with smp_store_release() in mana_set_coalesce(): observing the
+	 * enable flag set guarantees the DIM (re)initialization is visible.
+	 */
+	if (!smp_load_acquire(&apc->tx_dim_enabled))
+		return;
+
+	/* cq->tx_dim_pkts/bytes are accumulated in mana_poll_tx_cq(), in the
+	 * same NAPI context as this read, so they track the hardware
+	 * completion rate and need no u64_stats_sync protection.
+	 */
+	dim_update_sample(READ_ONCE(cq->dim_event_ctr), cq->tx_dim_pkts,
+			  cq->tx_dim_bytes, &dim_sample);
+	net_dim(&cq->dim, &dim_sample);
+}
+
 static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
 {
 	struct mana_cq *cq = context;
@@ -2324,6 +2450,15 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
 	if (w < cq->budget) {
 		mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
 		cq->work_done_since_doorbell = 0;
+
+		/* Update DIM before napi_complete_done() to prevent running
+		 * net_dim() concurrently.
+		 */
+		if (cq->type == MANA_CQ_TYPE_RX)
+			mana_update_rx_dim(cq);
+		else
+			mana_update_tx_dim(cq);
+
 		napi_complete_done(&cq->napi, w);
 	} else if (cq->work_done_since_doorbell >=
 		   (cq->gdma_cq->queue_size / COMP_ENTRY_SIZE) * 4) {
@@ -2356,6 +2491,7 @@ static void mana_schedule_napi(void *context, struct gdma_queue *gdma_queue)
 {
 	struct mana_cq *cq = context;
 
+	WRITE_ONCE(cq->dim_event_ctr, cq->dim_event_ctr + 1);
 	napi_schedule_irqoff(&cq->napi);
 }
 
@@ -2398,6 +2534,7 @@ static void mana_destroy_txq(struct mana_port_context *apc)
 		if (apc->tx_qp[i]->txq.napi_initialized) {
 			napi_synchronize(napi);
 			napi_disable_locked(napi);
+			cancel_work_sync(&apc->tx_qp[i]->tx_cq.dim.work);
 			netif_napi_del_locked(napi);
 			apc->tx_qp[i]->txq.napi_initialized = false;
 		}
@@ -2529,6 +2666,11 @@ static int mana_create_txq(struct mana_port_context *apc,
 		cq_spec.modr_ctx_id = 0;
 		cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
 
+		/* DIM setting can be changed at runtime */
+		cq_spec.req_cq_moderation = true;
+		cq_spec.cq_moderation_usec = apc->intr_modr_tx_usec;
+		cq_spec.cq_moderation_comp = apc->intr_modr_tx_comp;
+
 		err = mana_create_wq_obj(apc, apc->port_handle, GDMA_SQ,
 					 &wq_spec, &cq_spec,
 					 &apc->tx_qp[i]->tx_object);
@@ -2559,6 +2701,13 @@ static int mana_create_txq(struct mana_port_context *apc,
 
 		set_bit(NAPI_STATE_NO_BUSY_POLL, &cq->napi.state);
 		netif_napi_add_locked(net, &cq->napi, mana_poll);
+
+		/* Initialize the DIM work before enabling NAPI, so that a poll
+		 * cannot reach net_dim() with an uninitialized cq->dim.work.
+		 */
+		INIT_WORK(&cq->dim.work, mana_tx_dim_work);
+		cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+
 		napi_enable_locked(&cq->napi);
 		txq->napi_initialized = true;
 
@@ -2596,6 +2745,7 @@ static void mana_destroy_rxq(struct mana_port_context *apc,
 		napi_synchronize(napi);
 
 		napi_disable_locked(napi);
+		cancel_work_sync(&rxq->rx_cq.dim.work);
 		netif_napi_del_locked(napi);
 	}
 
@@ -2834,6 +2984,11 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 	cq_spec.modr_ctx_id = 0;
 	cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
 
+	/* DIM setting can be changed at runtime */
+	cq_spec.req_cq_moderation = true;
+	cq_spec.cq_moderation_usec = apc->intr_modr_rx_usec;
+	cq_spec.cq_moderation_comp = apc->intr_modr_rx_comp;
+
 	err = mana_create_wq_obj(apc, apc->port_handle, GDMA_RQ,
 				 &wq_spec, &cq_spec, &rxq->rxobj);
 	if (err)
@@ -2866,6 +3021,12 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 	WARN_ON(xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL,
 					   rxq->page_pool));
 
+	/* Initialize the DIM work before enabling NAPI, so that a poll
+	 * cannot reach net_dim() with an uninitialized cq->dim.work.
+	 */
+	INIT_WORK(&cq->dim.work, mana_rx_dim_work);
+	cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+
 	napi_enable_locked(&cq->napi);
 
 	mana_gd_ring_cq(cq->gdma_cq, SET_ARM_BIT);
@@ -3532,6 +3693,16 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	apc->link_cfg_error = 1;
 	apc->cqe_coalescing_enable = 0;
 
+	/* Initialize interrupt moderation settings if supported by HW */
+	if (gc->pf_cap_flags1 & GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION) {
+		apc->intr_modr_rx_usec = MANA_INTR_MODR_USEC_DEF;
+		apc->intr_modr_rx_comp = MANA_INTR_MODR_COMP_DEF;
+		apc->intr_modr_tx_usec = MANA_INTR_MODR_USEC_DEF;
+		apc->intr_modr_tx_comp = MANA_INTR_MODR_COMP_DEF;
+		apc->rx_dim_enabled = MANA_ADAPTIVE_RX_DEF;
+		apc->tx_dim_enabled = MANA_ADAPTIVE_TX_DEF;
+	}
+
 	mutex_init(&apc->vport_mutex);
 	apc->vport_use_count = 0;
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 94e658d07a27..5e5fb5b18bbf 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -419,6 +419,15 @@ static int mana_get_coalesce(struct net_device *ndev,
 	    !kernel_coal->rx_cqe_nsecs)
 		kernel_coal->rx_cqe_nsecs = MANA_RX_CQE_NSEC_DEF;
 
+	ec->rx_coalesce_usecs = apc->intr_modr_rx_usec;
+	ec->rx_max_coalesced_frames = apc->intr_modr_rx_comp;
+
+	ec->tx_coalesce_usecs = apc->intr_modr_tx_usec;
+	ec->tx_max_coalesced_frames = apc->intr_modr_tx_comp;
+
+	ec->use_adaptive_rx_coalesce = apc->rx_dim_enabled;
+	ec->use_adaptive_tx_coalesce = apc->tx_dim_enabled;
+
 	return 0;
 }
 
@@ -428,9 +437,34 @@ static int mana_set_coalesce(struct net_device *ndev,
 			     struct netlink_ext_ack *extack)
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
-	u8 saved_cqe_coalescing_enable;
+	struct {
+		u16 intr_modr_rx_usec;
+		u16 intr_modr_rx_comp;
+		u16 intr_modr_tx_usec;
+		u16 intr_modr_tx_comp;
+		u8 cqe_coalescing_enable;
+		bool rx_dim_enabled;
+		bool tx_dim_enabled;
+	} saved;
+	bool modr_changed = false;
+	bool dim_changed = false;
+	struct gdma_context *gc;
 	int err;
 
+	gc = apc->ac->gdma_dev->gdma_context;
+
+	/* Both static and dynamic interrupt moderation (DIM) rely on the
+	 * same HW capability advertised by the PF.
+	 */
+	if ((ec->use_adaptive_rx_coalesce || ec->use_adaptive_tx_coalesce ||
+	     ec->rx_coalesce_usecs || ec->tx_coalesce_usecs ||
+	     ec->rx_max_coalesced_frames || ec->tx_max_coalesced_frames) &&
+	    !(gc->pf_cap_flags1 & GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION)) {
+		NL_SET_ERR_MSG(extack,
+			       "Interrupt Moderation is not supported by HW");
+		return -EOPNOTSUPP;
+	}
+
 	if (kernel_coal->rx_cqe_frames != 1 &&
 	    kernel_coal->rx_cqe_frames != MANA_RXCOMP_OOB_NUM_PPI) {
 		NL_SET_ERR_MSG_FMT(extack,
@@ -440,18 +474,129 @@ static int mana_set_coalesce(struct net_device *ndev,
 		return -EINVAL;
 	}
 
-	saved_cqe_coalescing_enable = apc->cqe_coalescing_enable;
+	if (ec->rx_coalesce_usecs > MANA_INTR_MODR_USEC_MAX ||
+	    ec->tx_coalesce_usecs > MANA_INTR_MODR_USEC_MAX) {
+		NL_SET_ERR_MSG_FMT(extack,
+				   "coalesce usecs must be <= %lu",
+				   MANA_INTR_MODR_USEC_MAX);
+		return -EINVAL;
+	}
+
+	if (ec->rx_max_coalesced_frames > MANA_INTR_MODR_COMP_MAX ||
+	    ec->tx_max_coalesced_frames > MANA_INTR_MODR_COMP_MAX) {
+		NL_SET_ERR_MSG_FMT(extack,
+				   "coalesce frames must be <= %lu",
+				   MANA_INTR_MODR_COMP_MAX);
+		return -EINVAL;
+	}
+
+	if (ec->rx_coalesce_usecs != apc->intr_modr_rx_usec ||
+	    ec->rx_max_coalesced_frames != apc->intr_modr_rx_comp ||
+	    ec->tx_coalesce_usecs != apc->intr_modr_tx_usec ||
+	    ec->tx_max_coalesced_frames != apc->intr_modr_tx_comp)
+		modr_changed = true;
+
+	saved.intr_modr_rx_usec = apc->intr_modr_rx_usec;
+	saved.intr_modr_rx_comp = apc->intr_modr_rx_comp;
+	saved.intr_modr_tx_usec = apc->intr_modr_tx_usec;
+	saved.intr_modr_tx_comp = apc->intr_modr_tx_comp;
+
+	apc->intr_modr_rx_usec = ec->rx_coalesce_usecs;
+	apc->intr_modr_rx_comp = ec->rx_max_coalesced_frames;
+	apc->intr_modr_tx_usec = ec->tx_coalesce_usecs;
+	apc->intr_modr_tx_comp = ec->tx_max_coalesced_frames;
+
+	if (!!ec->use_adaptive_rx_coalesce != apc->rx_dim_enabled ||
+	    !!ec->use_adaptive_tx_coalesce != apc->tx_dim_enabled)
+		dim_changed = true;
+
+	saved.rx_dim_enabled = apc->rx_dim_enabled;
+	saved.tx_dim_enabled = apc->tx_dim_enabled;
+
+	saved.cqe_coalescing_enable = apc->cqe_coalescing_enable;
 	apc->cqe_coalescing_enable =
 		kernel_coal->rx_cqe_frames == MANA_RXCOMP_OOB_NUM_PPI;
 
-	if (!apc->port_is_up)
+	if (!apc->port_is_up) {
+		WRITE_ONCE(apc->rx_dim_enabled, !!ec->use_adaptive_rx_coalesce);
+		WRITE_ONCE(apc->tx_dim_enabled, !!ec->use_adaptive_tx_coalesce);
 		return 0;
+	}
 
-	err = mana_config_rss(apc, TRI_STATE_TRUE, false, false);
-	if (err)
-		apc->cqe_coalescing_enable = saved_cqe_coalescing_enable;
+	if (apc->cqe_coalescing_enable != saved.cqe_coalescing_enable) {
+		/* CQE coalescing setting is applied via RSS configuration. */
+		err = mana_config_rss(apc, TRI_STATE_TRUE, false, false);
+		if (err) {
+			netdev_err(ndev, "Change CQE coalescing failed: %d\n",
+				   err);
+			apc->cqe_coalescing_enable =
+				saved.cqe_coalescing_enable;
+			apc->intr_modr_rx_usec = saved.intr_modr_rx_usec;
+			apc->intr_modr_rx_comp = saved.intr_modr_rx_comp;
+			apc->intr_modr_tx_usec = saved.intr_modr_tx_usec;
+			apc->intr_modr_tx_comp = saved.intr_modr_tx_comp;
+			return err;
+		}
+	}
 
-	return err;
+	if (modr_changed || dim_changed) {
+		bool new_rx_dim = !!ec->use_adaptive_rx_coalesce;
+		bool new_tx_dim = !!ec->use_adaptive_tx_coalesce;
+		bool disable_rx_dim = saved.rx_dim_enabled && !new_rx_dim;
+		bool disable_tx_dim = saved.tx_dim_enabled && !new_tx_dim;
+		bool enable_rx_dim = !saved.rx_dim_enabled && new_rx_dim;
+		bool enable_tx_dim = !saved.tx_dim_enabled && new_tx_dim;
+		int q;
+
+		/* On disable: clear the per-port flag first and
+		 * synchronize_net() so any in-flight NAPI poll observes
+		 * the new value and will not schedule further DIM work;
+		 * then drain pending work and restore the static
+		 * moderation values.
+		 */
+		if (disable_rx_dim)
+			WRITE_ONCE(apc->rx_dim_enabled, false);
+		if (disable_tx_dim)
+			WRITE_ONCE(apc->tx_dim_enabled, false);
+		if (disable_rx_dim || disable_tx_dim)
+			synchronize_net();
+
+		for (q = 0; q < apc->num_queues; q++) {
+			struct mana_cq *rx_cq = &apc->rxqs[q]->rx_cq;
+			struct mana_cq *tx_cq = &apc->tx_qp[q]->tx_cq;
+
+			if (disable_rx_dim)
+				mana_dim_change(rx_cq, false);
+			else if (enable_rx_dim)
+				mana_dim_change(rx_cq, true);
+			else if (!new_rx_dim && modr_changed)
+				mana_gd_ring_dim(rx_cq->gdma_cq,
+						 apc->intr_modr_rx_usec, true,
+						 apc->intr_modr_rx_comp, true);
+
+			if (disable_tx_dim)
+				mana_dim_change(tx_cq, false);
+			else if (enable_tx_dim)
+				mana_dim_change(tx_cq, true);
+			else if (!new_tx_dim && modr_changed)
+				mana_gd_ring_dim(tx_cq->gdma_cq,
+						 apc->intr_modr_tx_usec, true,
+						 apc->intr_modr_tx_comp, true);
+		}
+
+		/* Publish the enable flag with release semantics so a
+		 * concurrent NAPI poll that observes it set also sees the DIM
+		 * (re)init done by mana_dim_change() above.
+		 */
+		if (enable_rx_dim)
+			/* pairs with smp_load_acquire() in mana_update_rx_dim() */
+			smp_store_release(&apc->rx_dim_enabled, true);
+		if (enable_tx_dim)
+			/* pairs with smp_load_acquire() in mana_update_tx_dim() */
+			smp_store_release(&apc->tx_dim_enabled, true);
+	}
+
+	return 0;
 }
 
 /* mana_set_channels - change the number of queues on a port
@@ -595,7 +740,13 @@ static int mana_get_link_ksettings(struct net_device *ndev,
 }
 
 const struct ethtool_ops mana_ethtool_ops = {
-	.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
+	.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES |
+				     ETHTOOL_COALESCE_RX_USECS |
+				     ETHTOOL_COALESCE_RX_MAX_FRAMES |
+				     ETHTOOL_COALESCE_TX_USECS |
+				     ETHTOOL_COALESCE_TX_MAX_FRAMES |
+				     ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
+				     ETHTOOL_COALESCE_USE_ADAPTIVE_TX,
 	.op_needs_rtnl		= ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
 				  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
 	.get_ethtool_stats	= mana_get_ethtool_stats,
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 0c395917b214..8529cef0d7c4 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -47,6 +47,7 @@ enum gdma_queue_type {
 	GDMA_RQ,
 	GDMA_CQ,
 	GDMA_EQ,
+	GDMA_DIM,
 };
 
 enum gdma_work_request_flags {
@@ -126,6 +127,17 @@ union gdma_doorbell_entry {
 		u64 tail_ptr	: 31;
 		u64 arm		: 1;
 	} eq;
+
+	struct {
+		u64 id           : 24;
+		u64 reserved     : 8;
+		u64 mod_usec     : 10;
+		u64 reserve1     : 5;
+		u64 mod_usec_vld : 1;
+		u64 mod_comps    : 8;
+		u64 reserve2     : 7;
+		u64 mod_comps_vld: 1;
+	} dim;
 }; /* HW DATA */
 
 struct gdma_msg_hdr {
@@ -502,6 +514,9 @@ void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit);
 
 int mana_schedule_serv_work(struct gdma_context *gc, enum gdma_eqe_type type);
 
+void mana_gd_ring_dim(struct gdma_queue *cq, u32 mod_usec, bool mod_usec_vld,
+		      u32 mod_comps, bool mod_comps_vld);
+
 struct gdma_wqe {
 	u32 reserved	:24;
 	u32 last_vbytes	:8;
@@ -650,6 +665,9 @@ enum {
 /* Driver supports self recovery on Hardware Channel timeouts */
 #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY BIT(25)
 
+/* Driver supports dynamic interrupt moderation - DIM */
+#define GDMA_DRV_CAP_FLAG_1_DYN_INTERRUPT_MODERATION BIT(28)
+
 #define GDMA_DRV_CAP_FLAGS1 \
 	(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
 	 GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
@@ -665,7 +683,8 @@ enum {
 	 GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \
 	 GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \
 	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \
-	 GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT)
+	 GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT | \
+	 GDMA_DRV_CAP_FLAG_1_DYN_INTERRUPT_MODERATION)
 
 #define GDMA_DRV_CAP_FLAGS2 0
 
@@ -701,6 +720,9 @@ struct gdma_verify_ver_req {
 	u8 os_ver_str4[128];
 }; /* HW DATA */
 
+/* HW supports dynamic interrupt moderation - DIM */
+#define GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION BIT(15)
+
 struct gdma_verify_ver_resp {
 	struct gdma_resp_hdr hdr;
 	u64 gdma_protocol_ver;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 13c87baf018e..df4c4a3f68fa 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -4,6 +4,7 @@
 #ifndef _MANA_H
 #define _MANA_H
 
+#include <linux/dim.h>
 #include <net/xdp.h>
 #include <net/net_shaper.h>
 
@@ -64,6 +65,19 @@ enum TRI_STATE {
 /* Maximum number of packets per coalesced CQE */
 #define MANA_RXCOMP_OOB_NUM_PPI 4
 
+/* Default/max interrupt moderation settings */
+#define MANA_INTR_MODR_USEC_DEF 0
+#define MANA_INTR_MODR_COMP_DEF 0
+
+#define MANA_ADAPTIVE_RX_DEF true
+#define MANA_ADAPTIVE_TX_DEF true
+
+/* DIM doorbell value field layout */
+#define MANA_INTR_MODR_USEC_MAX    GENMASK(9, 0)
+#define MANA_INTR_MODR_USEC_VLD    BIT(15)
+#define MANA_INTR_MODR_COMP_MAX    GENMASK(7, 0)
+#define MANA_INTR_MODR_COMP_MASK   GENMASK(23, 16)
+
 /* Update this count whenever the respective structures are changed */
 #define MANA_STATS_RX_COUNT (6 + MANA_RXCOMP_OOB_NUM_PPI - 1)
 #define MANA_STATS_TX_COUNT 11
@@ -297,6 +311,17 @@ struct mana_cq {
 	int work_done;
 	int work_done_since_doorbell;
 	int budget;
+
+	/* DIM - Dynamic Interrupt Moderation */
+	struct dim dim;
+	u16 dim_event_ctr;
+
+	/* Cumulative TX completions fed to DIM. Updated and read only in
+	 * NAPI context (mana_poll_tx_cq() / mana_update_tx_dim()), so they
+	 * measure the hardware completion rate and need no u64_stats_sync.
+	 */
+	u64 tx_dim_pkts;
+	u64 tx_dim_bytes;
 };
 
 struct mana_recv_buf_oob {
@@ -573,6 +598,15 @@ struct mana_port_context {
 	u8 cqe_coalescing_enable;
 	u32 cqe_coalescing_timeout_ns;
 
+	/* Interrupt moderation settings */
+	u16 intr_modr_rx_usec;
+	u16 intr_modr_rx_comp;
+	u16 intr_modr_tx_usec;
+	u16 intr_modr_tx_comp;
+
+	bool rx_dim_enabled;
+	bool tx_dim_enabled;
+
 	struct mana_ethtool_stats eth_stats;
 
 	struct mana_ethtool_phy_stats phy_stats;
@@ -598,6 +632,8 @@ int mana_alloc_queues(struct net_device *ndev);
 int mana_attach(struct net_device *ndev);
 int mana_detach(struct net_device *ndev, bool from_close);
 
+int mana_dim_change(struct mana_cq *cq, bool enable);
+
 int mana_probe(struct gdma_dev *gd, bool resuming);
 void mana_remove(struct gdma_dev *gd, bool suspending);
 
@@ -633,6 +669,9 @@ struct mana_obj_spec {
 	u32 queue_size;
 	u32 attached_eq;
 	u32 modr_ctx_id;
+	u8 req_cq_moderation;
+	u16 cq_moderation_comp;
+	u16 cq_moderation_usec;
 };
 
 enum mana_command_code {
@@ -764,6 +803,15 @@ struct mana_create_wqobj_req {
 	u32 cq_size;
 	u32 cq_moderation_ctx_id;
 	u32 cq_parent_qid;
+
+	/* V2 */
+	u8 allow_rqwqe_chain;
+
+	/* V3 */
+	u8 req_cq_moderation;
+	u16 cq_moderation_comp;
+	u16 cq_moderation_usec;
+	u8 reserved2[2];
 }; /* HW DATA */
 
 struct mana_create_wqobj_resp {
@@ -771,6 +819,12 @@ struct mana_create_wqobj_resp {
 	u32 wq_id;
 	u32 cq_id;
 	mana_handle_t wq_obj;
+
+	/* V2 */
+	u16 cq_moderation_comp;
+	u16 cq_moderation_usec;
+	u8 cq_moderation_enabled;
+	u8 reserved1[3];
 }; /* HW DATA */
 
 /* Destroy WQ Object */
-- 
2.34.1

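The `dim` member added to `union gdma_doorbell_entry`, together with the `MANA_INTR_MODR_*` masks in mana.h, implies a specific packing of the 64-bit DIM doorbell value. The sketch below is a userspace illustration, not driver code, and it rests on a few assumptions:

- It assumes the GCC/x86 convention that the first bitfield member occupies the least significant bits.
- It uses explicit shifts because C bitfield layout is implementation-defined.
- `pack_dim_doorbell()`, `BIT64()` and `GENMASK64()` are hypothetical local helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's BIT()/GENMASK(). */
#define BIT64(n)        (UINT64_C(1) << (n))
#define GENMASK64(h, l) ((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

/* Field limits from the patch: 10-bit usec field, 8-bit completion field. */
#define MODR_USEC_MAX   GENMASK64(9, 0)
#define MODR_COMP_MAX   GENMASK64(7, 0)

static_assert(MODR_USEC_MAX == 1023, "mod_usec is 10 bits wide");
static_assert(MODR_COMP_MAX == 255, "mod_comps is 8 bits wide");

/*
 * Assumed bit positions, following the declaration order of the bitfield:
 * id[23:0] reserved[31:24] mod_usec[41:32] reserve1[46:42]
 * mod_usec_vld[47] mod_comps[55:48] reserve2[62:56] mod_comps_vld[63]
 */
static uint64_t pack_dim_doorbell(uint32_t id, uint32_t usec, bool usec_vld,
				  uint32_t comps, bool comps_vld)
{
	uint64_t v = 0;

	v |= (uint64_t)id & GENMASK64(23, 0);
	/* Out-of-range values are truncated to the field width here. */
	v |= ((uint64_t)usec & MODR_USEC_MAX) << 32;
	if (usec_vld)
		v |= BIT64(47);
	v |= ((uint64_t)comps & MODR_COMP_MAX) << 48;
	if (comps_vld)
		v |= BIT64(63);
	return v;
}
```

This sketch silently truncates values that are wider than a field. In the patch, mana_set_coalesce() rejects out-of-range usecs and frames with -EINVAL before anything is written to a doorbell.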

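The enable path in mana_set_coalesce() relies on a release/acquire pairing. It first (re)initialises the DIM state, then publishes `rx_dim_enabled`/`tx_dim_enabled` with smp_store_release(). Per the patch comments, that store pairs with smp_load_acquire() in the NAPI-side update functions.

Below is a userspace analogue using C11 atomics and pthreads. It is a sketch, not the driver's code:

- `struct dim_state`, `enable_dim()`, `poll_once()` and `run_demo()` are hypothetical stand-ins.
- The disable path (WRITE_ONCE() followed by synchronize_net()) is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for cq->dim and apc->rx_dim_enabled. */
struct dim_state {
	int profile_ix;
	int tune_state;
};

static struct dim_state dim;      /* plain data, written before publishing */
static atomic_bool dim_enabled;   /* publication flag, starts false */

/* Writer (ethtool path): initialise state, then publish with release. */
static void *enable_dim(void *arg)
{
	(void)arg;
	dim.profile_ix = 3;
	dim.tune_state = 1;
	atomic_store_explicit(&dim_enabled, true, memory_order_release);
	return NULL;
}

/* Reader (poll path): once the acquire load sees the flag set, every
 * write made before the matching release store is visible too. */
static int poll_once(void)
{
	if (!atomic_load_explicit(&dim_enabled, memory_order_acquire))
		return -1;        /* DIM not enabled yet: nothing to do */
	return dim.profile_ix;
}

/* Start the writer and spin until the reader observes the published state. */
static int run_demo(void)
{
	pthread_t t;
	int seen;

	if (pthread_create(&t, NULL, enable_dim, NULL) != 0)
		return -1;
	while ((seen = poll_once()) < 0)
		; /* busy-wait for publication */
	pthread_join(t, NULL);
	return seen;
}
```

Relaxing either side to `memory_order_relaxed` would drop the guarantee that a reader which sees the flag also sees the initialised state. That is the ordering the patch comment describes.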

* Re: [PATCH net-next v3 0/6] pds_core: Add PLDM firmware update and host backed memory support
From: Jakub Kicinski @ 2026-06-13 20:58 UTC
  To: Jacob Keller
  Cc: Rao, Nikhil, netdev, Brett Creeley, Andrew Lunn, David S . Miller,
	Eric Dumazet, Paolo Abeni, Eric Joyner
In-Reply-To: <1264de6d-a669-4afa-ad08-594d956e101a@intel.com>

On Thu, 11 Jun 2026 11:53:13 -0700 Jacob Keller wrote:
> On 6/11/2026 10:15 AM, Rao, Nikhil wrote:
> >> The preference is to use generic names when possible/feasible. I think
> >> at least some of your name choices could align with the generic ones.
> >>
> >> Could you explain why fw.mainfw was selected and fw.mgmt was deemed not
> >> suitable?  
> > 
> > This component handles both control and data path, so fw.mgmt didn't
> > feel right since devlink-info.rst explicitly excludes data path from
> > that definition. We went with fw.mainfw to indicate it's the primary
> > firmware component.
> 
> It might make sense to extend or add a new definition in this case.
> Technically you could also report both fw.mgmt and fw.app together, but
> I think that would be more confusing.
> 
> Perhaps Jakub has a suggestion on the name or policy here. The
> maintainer preference has generally been to use or extend standardized
> names first unless the name or component is clearly unique to the device.

If it covers both datapath and mgmt, then shouldn't it just be "fw"?
Right now the driver reports something "running"-only as "fw", which
looks off.


* Re: [PATCH v5 2/6] landlock: Add UDP send+connect access control
From: Mickaël Salaün @ 2026-06-13 20:55 UTC
  To: Matthieu Buffet
  Cc: Günther Noack, linux-security-module, Mikhail Ivanov,
	konstantin.meskhidze, Tingmao Wang, netdev
In-Reply-To: <20260611162107.49278-3-matthieu@buffet.re>

A few issues were identified by Sashiko:
https://sashiko.dev/#/patchset/20260611162107.49278-1-matthieu%40buffet.re

I squashed this patch:

diff --git a/security/landlock/net.c b/security/landlock/net.c
index 9273cdbbf844..b12568666a9e 100644
--- a/security/landlock/net.c
+++ b/security/landlock/net.c
@@ -261,10 +261,17 @@ static int current_check_access_socket(struct socket *const sock,
 
 static int current_check_autobind_udp_socket(struct socket *const sock)
 {
+	const struct access_masks bind_udp = {
+		.net = LANDLOCK_ACCESS_NET_BIND_UDP,
+	};
 	struct sockaddr_storage port0 = {};
 	unsigned short num;
 	bool slow;
 
+	/* Quick return for non-Landlocked tasks. */
+	if (!landlock_get_applicable_subject(current_cred(), bind_udp, NULL))
+		return 0;
+
 	/*
 	 * On UDP sockets, if a local port has not already been bound, calling
 	 * connect() or sending a first datagram has the side effect of
@@ -287,8 +294,7 @@ static int current_check_autobind_udp_socket(struct socket *const sock)
 	port0.ss_family = READ_ONCE(sock->sk->sk_family);
 
 	return current_check_access_socket(sock, (struct sockaddr *)&port0,
-					   sizeof(port0),
-					   LANDLOCK_ACCESS_NET_BIND_UDP, false);
+					   sizeof(port0), bind_udp.net, false);
 }
 
 static int hook_socket_bind(struct socket *const sock,
@@ -328,7 +334,9 @@ static int hook_socket_connect(struct socket *const sock,
 	 * connect()ing to an AF_UNSPEC address does not trigger an autobind and
 	 * should never be restricted.
 	 */
-	if (ret == 0 && sk_is_udp(sock->sk) && address->sa_family != AF_UNSPEC)
+	if (ret == 0 && sk_is_udp(sock->sk) &&
+	    addrlen >= offsetofend(typeof(*address), sa_family) &&
+	    address->sa_family != AF_UNSPEC)
 		ret = current_check_autobind_udp_socket(sock);
 
 	return ret;


We might want to factor out some code, but that should be good for now.
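The squashed fix adds an `addrlen >= offsetofend(typeof(*address), sa_family)` test. It ensures `address->sa_family` is only read when the caller-supplied length actually covers that field.

Below is a minimal userspace sketch of the address part of that condition. `offsetofend` is redefined locally because it is a kernel macro. `wants_autobind_check()` is a hypothetical helper that leaves out the `ret == 0` and sk_is_udp() parts of the real condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Userspace copy of the kernel's offsetofend(): offset of the first byte
 * past MEMBER inside TYPE. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

static_assert(offsetofend(struct sockaddr, sa_family) <= sizeof(struct sockaddr),
	      "sa_family lies inside struct sockaddr");

/* Hypothetical helper: should a UDP connect() to this address also be
 * checked for an implicit autobind? */
static bool wants_autobind_check(const struct sockaddr *addr, int addrlen)
{
	/* Too short to hold sa_family (or negative): never dereference it. */
	if (addrlen < 0 ||
	    (size_t)addrlen < offsetofend(struct sockaddr, sa_family))
		return false;
	/* connect(AF_UNSPEC) only dissolves the association: no autobind. */
	return addr->sa_family != AF_UNSPEC;
}
```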


On Thu, Jun 11, 2026 at 06:21:02PM +0200, Matthieu Buffet wrote:
> Add support for a second fine-grained UDP access right.
> LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP controls the ability to set the
> remote port of a socket (via connect()) and to specify an explicit
> destination when sending a datagram, to override any remote peer set on
> a UDP socket (e.g. in sendto() or sendmsg()).
> It will be useful for applications that send datagrams, and for some
> servers too (those creating per-client sockets, which want to receive
> traffic only from a specific address).
> 
> As with bind(), this access control is performed when configuring
> sockets, not in hot code paths.
> 
> Add detection of when autobind is about to be required, and deny the
> operation if the process would not be allowed to call bind(0)
> explicitly. Autobind can only be performed in udp_lib_get_port() from
> code paths already controlled by LSM hooks: when connect()ing,
> sending a first datagram, and in some splice() EOF edge case which,
> afaiu, can only happen after a remote peer has been set. This invariant
> needs to be preserved to keep bind policies actually enforced.
> 
> Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
> ---
>  include/uapi/linux/landlock.h               |  23 ++++
>  security/landlock/audit.c                   |   2 +
>  security/landlock/limits.h                  |   2 +-
>  security/landlock/net.c                     | 137 +++++++++++++++++---
>  tools/testing/selftests/landlock/net_test.c |   5 +-
>  5 files changed, 151 insertions(+), 18 deletions(-)
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 045b251ff1b4..b147223efc97 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -378,11 +378,34 @@ struct landlock_net_port_attr {
>   *
>   * - %LANDLOCK_ACCESS_NET_BIND_UDP: Bind UDP sockets to the given local
>   *   port. Support added in Landlock ABI version 10.
> + * - %LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP: Set the remote port of UDP
> + *   sockets to the given port, or send datagrams to the given remote port
> + *   ignoring any destination pre-set on a socket. Support added in
> + *   Landlock ABI version 10.
> + *
> + * .. note:: Setting a remote address or sending a first datagram
> + *   auto-binds UDP sockets to an ephemeral local source port if not
> + *   already bound. To allow this if both %LANDLOCK_ACCESS_NET_BIND_UDP
> + *   and %LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP are handled, you need to
> + *   either:
> + *
> + *   - use a socket already bound to a port before the ruleset started
> + *     being enforced;
> + *   - or grant %LANDLOCK_ACCESS_NET_BIND_UDP on port 0, meaning "any
> + *     port in the ephemeral port range";
> + *   - or grant %LANDLOCK_ACCESS_NET_BIND_UDP on a specific port, and
> + *     call :manpage:`bind(2)` on that port before trying to
> + *     :manpage:`connect(2)` or send datagrams.
> + *
> + * .. note:: Sending datagrams to an ``AF_UNSPEC`` destination address
> + *   family is not supported for IPv6 UDP sockets: you will need to use a
> + *   ``NULL`` address instead.
>   */
>  /* clang-format off */
>  #define LANDLOCK_ACCESS_NET_BIND_TCP			(1ULL << 0)
>  #define LANDLOCK_ACCESS_NET_CONNECT_TCP			(1ULL << 1)
>  #define LANDLOCK_ACCESS_NET_BIND_UDP			(1ULL << 2)
> +#define LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP		(1ULL << 3)
>  /* clang-format on */
>  
>  /**
> diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> index e676ebffeebe..851647197a01 100644
> --- a/security/landlock/audit.c
> +++ b/security/landlock/audit.c
> @@ -46,6 +46,8 @@ static const char *const net_access_strings[] = {
>  	[BIT_INDEX(LANDLOCK_ACCESS_NET_BIND_TCP)] = "net.bind_tcp",
>  	[BIT_INDEX(LANDLOCK_ACCESS_NET_CONNECT_TCP)] = "net.connect_tcp",
>  	[BIT_INDEX(LANDLOCK_ACCESS_NET_BIND_UDP)] = "net.bind_udp",
> +	[BIT_INDEX(LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP)] =
> +		"net.connect_send_udp",
>  };
>  
>  static_assert(ARRAY_SIZE(net_access_strings) == LANDLOCK_NUM_ACCESS_NET);
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index c0f30a4591b8..a4d908b240a2 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -23,7 +23,7 @@
>  #define LANDLOCK_MASK_ACCESS_FS		((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
>  #define LANDLOCK_NUM_ACCESS_FS		__const_hweight64(LANDLOCK_MASK_ACCESS_FS)
>  
> -#define LANDLOCK_LAST_ACCESS_NET	LANDLOCK_ACCESS_NET_BIND_UDP
> +#define LANDLOCK_LAST_ACCESS_NET	LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP
>  #define LANDLOCK_MASK_ACCESS_NET	((LANDLOCK_LAST_ACCESS_NET << 1) - 1)
>  #define LANDLOCK_NUM_ACCESS_NET		__const_hweight64(LANDLOCK_MASK_ACCESS_NET)
>  
> diff --git a/security/landlock/net.c b/security/landlock/net.c
> index 8da40614c452..0e697403eca9 100644
> --- a/security/landlock/net.c
> +++ b/security/landlock/net.c
> @@ -44,7 +44,8 @@ int landlock_append_net_rule(struct landlock_ruleset *const ruleset,
>  static int current_check_access_socket(struct socket *const sock,
>  				       struct sockaddr *const address,
>  				       const int addrlen,
> -				       access_mask_t access_request)
> +				       access_mask_t access_request,
> +				       bool connecting)
>  {
>  	unsigned short sock_family;
>  	__be16 port;
> @@ -75,19 +76,51 @@ static int current_check_access_socket(struct socket *const sock,
>  
>  	switch (address->sa_family) {
>  	case AF_UNSPEC:
> -		if (access_request == LANDLOCK_ACCESS_NET_CONNECT_TCP) {
> +		if (access_request == LANDLOCK_ACCESS_NET_CONNECT_TCP ||
> +		    (access_request == LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP &&
> +		     connecting)) {
>  			/*
>  			 * Connecting to an address with AF_UNSPEC dissolves
> -			 * the TCP association, which have the same effect as
> -			 * closing the connection while retaining the socket
> -			 * object (i.e., the file descriptor).  As for dropping
> -			 * privileges, closing connections is always allowed.
> -			 *
> -			 * For a TCP access control system, this request is
> -			 * legitimate. Let the network stack handle potential
> +			 * the remote association while retaining the socket
> +			 * object (i.e., the file descriptor). For TCP, it has
> +			 * the same effect as closing the connection. For UDP,
> +			 * it removes any preset remote address. As for
> +			 * dropping privileges, these actions are always
> +			 * allowed.
> +			 * Let the network stack handle potential
>  			 * inconsistencies and return -EINVAL if needed.
>  			 */
>  			return 0;
> +		} else if (access_request ==
> +			   LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP) {
> +			if (sock_family == AF_INET6) {
> +				/*
> +				 * We cannot allow sending UDP datagrams to an
> +				 * explicit AF_UNSPEC address on IPv6 sockets,
> +				 * even if AF_UNSPEC is treated as "no address"
> +				 * on such sockets (so it should always be allowed).
> +				 * That's because another thread can change the
> +				 * socket's family to IPv4 under our feet (with
> +				 * setsockopt(IPV6_ADDRFORM)), and IPv4 sockets
> +				 * treat AF_UNSPEC as AF_INET.
> +				 */
> +				audit_net.family = AF_UNSPEC;
> +				audit_net.sk = sock->sk;
> +				landlock_init_layer_masks(
> +					subject->domain, access_request,
> +					&layer_masks, LANDLOCK_KEY_NET_PORT);
> +				landlock_log_denial(
> +					subject,
> +					&(struct landlock_request){
> +						.type = LANDLOCK_REQUEST_NET_ACCESS,
> +						.audit.type =
> +							LSM_AUDIT_DATA_NET,
> +						.audit.u.net = &audit_net,
> +						.access = access_request,
> +						.layer_masks = &layer_masks,
> +					});
> +				return -EACCES;
> +			}
>  		} else if (access_request == LANDLOCK_ACCESS_NET_BIND_TCP ||
>  			   access_request == LANDLOCK_ACCESS_NET_BIND_UDP) {
>  			/*
> @@ -130,7 +163,11 @@ static int current_check_access_socket(struct socket *const sock,
>  		} else {
>  			WARN_ON_ONCE(1);
>  		}
> -		/* Only for bind(AF_UNSPEC+INADDR_ANY) on IPv4 socket. */
> +		/*
> +		 * AF_UNSPEC is treated as AF_INET only in
> +		 * bind(AF_UNSPEC+INADDR_ANY) on IPv4 sockets and
> +		 * when sending to AF_UNSPEC addresses on IPv4 sockets.
> +		 */
>  		fallthrough;
>  	case AF_INET: {
>  		const struct sockaddr_in *addr4;
> @@ -141,7 +178,8 @@ static int current_check_access_socket(struct socket *const sock,
>  		addr4 = (struct sockaddr_in *)address;
>  		port = addr4->sin_port;
>  
> -		if (access_request == LANDLOCK_ACCESS_NET_CONNECT_TCP) {
> +		if (access_request == LANDLOCK_ACCESS_NET_CONNECT_TCP ||
> +		    access_request == LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP) {
>  			audit_net.dport = port;
>  			audit_net.v4info.daddr = addr4->sin_addr.s_addr;
>  		} else if (access_request == LANDLOCK_ACCESS_NET_BIND_TCP ||
> @@ -164,7 +202,8 @@ static int current_check_access_socket(struct socket *const sock,
>  		addr6 = (struct sockaddr_in6 *)address;
>  		port = addr6->sin6_port;
>  
> -		if (access_request == LANDLOCK_ACCESS_NET_CONNECT_TCP) {
> +		if (access_request == LANDLOCK_ACCESS_NET_CONNECT_TCP ||
> +		    access_request == LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP) {
>  			audit_net.dport = port;
>  			audit_net.v6info.daddr = addr6->sin6_addr;
>  		} else if (access_request == LANDLOCK_ACCESS_NET_BIND_TCP ||
> @@ -221,6 +260,38 @@ static int current_check_access_socket(struct socket *const sock,
>  	return -EACCES;
>  }
>  
> +static int current_check_autobind_udp_socket(struct socket *const sock)
> +{
> +	struct sockaddr_storage port0 = {};
> +	unsigned short num;
> +	bool slow;
> +
> +	/*
> +	 * On UDP sockets, if a local port has not already been bound,
> +	 * calling connect() or sending a first datagram has the side
> +	 * effect of autobinding an ephemeral port: we also have to check
> +	 * that the process would have had the right to bind(0) explicitly.
> +	 * Hold the socket lock around the inet_num read to exclude
> +	 * udp_lib_get_port()'s transient inet_num = snum write that is
> +	 * reverted to 0 on a failing reuseport bind.
> +	 */
> +	slow = lock_sock_fast(sock->sk);
> +	num = inet_sk(sock->sk)->inet_num;
> +	unlock_sock_fast(sock->sk, slow);
> +	if (num != 0)
> +		return 0;
> +
> +	/*
> +	 * Construct a struct sockaddr* with port 0 to pretend the
> +	 * process tried to bind() on that address.
> +	 */
> +	port0.ss_family = READ_ONCE(sock->sk->sk_family);
> +
> +	return current_check_access_socket(sock, (struct sockaddr *)&port0,
> +					   sizeof(port0),
> +					   LANDLOCK_ACCESS_NET_BIND_UDP, false);
> +}
> +
>  static int hook_socket_bind(struct socket *const sock,
>  			    struct sockaddr *const address, const int addrlen)
>  {
> @@ -234,7 +305,7 @@ static int hook_socket_bind(struct socket *const sock,
>  		return 0;
>  
>  	return current_check_access_socket(sock, address, addrlen,
> -					   access_request);
> +					   access_request, false);
>  }
>  
>  static int hook_socket_connect(struct socket *const sock,
> @@ -242,19 +313,55 @@ static int hook_socket_connect(struct socket *const sock,
>  			       const int addrlen)
>  {
>  	access_mask_t access_request;
> +	int ret = 0;
>  
>  	if (sk_is_tcp(sock->sk))
>  		access_request = LANDLOCK_ACCESS_NET_CONNECT_TCP;
> +	else if (sk_is_udp(sock->sk))
> +		access_request = LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP;
>  	else
>  		return 0;
>  
> -	return current_check_access_socket(sock, address, addrlen,
> -					   access_request);
> +	ret = current_check_access_socket(sock, address, addrlen,
> +					  access_request, true);
> +
> +	/*
> +	 * connect()ing to an AF_UNSPEC address does not trigger an
> +	 * autobind and should never be restricted.
> +	 */
> +	if (ret == 0 && sk_is_udp(sock->sk) && address->sa_family != AF_UNSPEC)
> +		ret = current_check_autobind_udp_socket(sock);
> +
> +	return ret;
> +}
> +
> +static int hook_socket_sendmsg(struct socket *const sock,
> +			       struct msghdr *const msg, const int size)
> +{
> +	struct sockaddr *const address = msg->msg_name;
> +	const int addrlen = msg->msg_namelen;
> +	access_mask_t access_request;
> +	int ret = 0;
> +
> +	if (sk_is_udp(sock->sk))
> +		access_request = LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP;
> +	else
> +		return 0;
> +
> +	if (address != NULL)
> +		ret = current_check_access_socket(sock, address, addrlen,
> +						  access_request, false);
> +
> +	if (ret == 0)
> +		ret = current_check_autobind_udp_socket(sock);
> +
> +	return ret;
>  }
>  
>  static struct security_hook_list landlock_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
>  	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
> +	LSM_HOOK_INIT(socket_sendmsg, hook_socket_sendmsg),
>  };
>  
>  __init void landlock_add_net_hooks(void)
> diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
> index ec392d971ea3..016c7277e370 100644
> --- a/tools/testing/selftests/landlock/net_test.c
> +++ b/tools/testing/selftests/landlock/net_test.c
> @@ -1326,12 +1326,13 @@ FIXTURE_TEARDOWN(mini)
>  
>  /* clang-format off */
>  
> -#define ACCESS_LAST LANDLOCK_ACCESS_NET_BIND_UDP
> +#define ACCESS_LAST LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP
>  
>  #define ACCESS_ALL ( \
>  	LANDLOCK_ACCESS_NET_BIND_TCP | \
>  	LANDLOCK_ACCESS_NET_CONNECT_TCP | \
> -	LANDLOCK_ACCESS_NET_BIND_UDP)
> +	LANDLOCK_ACCESS_NET_BIND_UDP | \
> +	LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP)
>  
>  /* clang-format on */
>  
> -- 
> 2.47.3
> 
> 

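The commit message and the uapi note above describe three rules:

- Check the remote port on connect() and on sends with an explicit destination.
- Treat an implicit autobind like an explicit bind(0).
- Never restrict an AF_UNSPEC disconnect.

These rules can be modelled in a few lines of plain C. This is a hypothetical, simplified sketch, not Landlock code. It assumes a single layer, an IPv4 socket and host-order ports. `struct udp_policy` and the `*_allowed()` helpers are invented for illustration, and a bind rule on port 0 stands for "any port in the ephemeral range", as in the uapi documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-layer policy for one IPv4 UDP socket. */
struct udp_policy {
	bool handles_bind;          /* LANDLOCK_ACCESS_NET_BIND_UDP handled */
	bool handles_connect_send;  /* LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP handled */
	const uint16_t *bind_ports; /* allowed local ports, 0 = ephemeral */
	size_t n_bind;
	const uint16_t *send_ports; /* allowed remote ports */
	size_t n_send;
};

static bool port_in(const uint16_t *ports, size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (ports[i] == port)
			return true;
	return false;
}

/* An implicit autobind requires the same right as an explicit bind(0). */
static bool autobind_allowed(const struct udp_policy *p, uint16_t local_port)
{
	if (local_port != 0)        /* already bound: no autobind happens */
		return true;
	return !p->handles_bind || port_in(p->bind_ports, p->n_bind, 0);
}

/* connect(): an AF_UNSPEC disconnect is always allowed; otherwise check
 * the remote port, then the implicit autobind. */
static bool connect_allowed(const struct udp_policy *p, bool unspec,
			    uint16_t remote_port, uint16_t local_port)
{
	if (unspec)
		return true;
	if (p->handles_connect_send &&
	    !port_in(p->send_ports, p->n_send, remote_port))
		return false;
	return autobind_allowed(p, local_port);
}

/* sendmsg(): an explicit destination is checked like connect(); a NULL
 * destination only triggers the autobind check. */
static bool send_allowed(const struct udp_policy *p, bool has_dest,
			 uint16_t remote_port, uint16_t local_port)
{
	if (has_dest && p->handles_connect_send &&
	    !port_in(p->send_ports, p->n_send, remote_port))
		return false;
	return autobind_allowed(p, local_port);
}
```

The model deliberately leaves out the IPv6 special case from the patch. There, sending to an explicit AF_UNSPEC destination is denied because IPV6_ADDRFORM can turn the socket into an IPv4 one.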

* RE: [EXTERNAL] Re: [PATCH net-next v3] net: mana: Add Interrupt Moderation support
From: Haiyang Zhang @ 2026-06-13 20:48 UTC
  To: Simon Horman
  Cc: linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	KY Srinivasan, wei.liu@kernel.org, Dexuan Cui, Long Li,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Konstantin Taranov,
	shradhagupta@linux.microsoft.com, ernis@linux.microsoft.com,
	dipayanroy@linux.microsoft.com, gargaditya@linux.microsoft.com,
	kees@kernel.org, leitao@debian.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, Paul Rosswurm
In-Reply-To: <20260613082014.715350-1-horms@kernel.org>



> -----Original Message-----
> From: Simon Horman <horms@kernel.org>
> Sent: Saturday, June 13, 2026 4:20 AM
> To: haiyangz@linux.microsoft.com
> Cc: Simon Horman <horms@kernel.org>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <DECUI@microsoft.com>; Long Li <longli@microsoft.com>;
> andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Konstantin Taranov
> <kotaranov@microsoft.com>; shradhagupta@linux.microsoft.com;
> ernis@linux.microsoft.com; dipayanroy@linux.microsoft.com;
> gargaditya@linux.microsoft.com; kees@kernel.org; leitao@debian.org;
> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org; Paul Rosswurm
> <paulros@microsoft.com>
> Subject: [EXTERNAL] Re: [PATCH net-next v3] net: mana: Add Interrupt
> Moderation support
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at:
> https://sashiko.dev/
> ---
> Date:   Thu Jun 11 19:02:22 2026 +0000
> 
>     net: mana: Add Interrupt Moderation support
> 
>     This commit adds Static and Dynamic Interrupt Moderation (DIM) support
>     for Rx and Tx. It updates the queue creation procedure, adds functions
>     to collect stats for DIM, and provides ethtool handler support to
>     get/set the moderation settings.
> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 26aef21c6c2c8..d92e022d8533c 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -1579,6 +1579,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
> >
> >  	mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
> >  			     sizeof(req), sizeof(resp));
> > +
> > +	req.hdr.req.msg_version = GDMA_MESSAGE_V3;
> > +	req.hdr.resp.msg_version = GDMA_MESSAGE_V2;
> >  	req.vport = vport;
> >  	req.wq_type = wq_type;
> >  	req.wq_gdma_region = wq_spec->gdma_region;
> > @@ -1587,6 +1590,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
> >  	req.cq_size = cq_spec->queue_size;
> >  	req.cq_moderation_ctx_id = cq_spec->modr_ctx_id;
> >  	req.cq_parent_qid = cq_spec->attached_eq;
> > +	req.req_cq_moderation = cq_spec->req_cq_moderation;
> > +	req.cq_moderation_comp = cq_spec->cq_moderation_comp;
> > +	req.cq_moderation_usec = cq_spec->cq_moderation_usec;
> 
> [Severity: High]
> Does this break queue creation on older Azure host firmwares?
> 
> The code unconditionally bumps the WQ object creation request to use
> GDMA_MESSAGE_V3 and passes the req_cq_moderation flag. If the firmware
> does not support GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION, could the
> firmware reject the queue creation request and break device
> initialization entirely?

Our firmware is forward compatible with newer message versions, so old
firmware still handles this message properly: the new feature fields are
simply ignored, and queue creation succeeds.
And if the firmware reports the DIM capability bit as zero, the driver
keeps the DIM feature off and does not allow it to be changed.
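The capability rule described above can be sketched as a small userspace model. When the capability bit is zero, DIM stays off and cannot be enabled. This is not the MANA driver code. `struct port_model`, `set_rx_dim()` and `CAP_DYN_INTR_MODERATION` are hypothetical names. The last one stands in for `GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION` without assuming its real value, and `-EOPNOTSUPP` is an assumed choice of error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit standing in for the real
 * GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION (value not assumed). */
#define CAP_DYN_INTR_MODERATION (1ULL << 0)

struct port_model {
	uint64_t fw_caps;    /* capability flags reported by firmware */
	bool rx_dim_enabled; /* current Rx DIM state */
};

/* Model of a set_coalesce path: refuse to enable DIM when the firmware
 * did not advertise support, so the feature stays off. */
static int set_rx_dim(struct port_model *p, bool enable)
{
	if (enable && !(p->fw_caps & CAP_DYN_INTR_MODERATION))
		return -EOPNOTSUPP;
	p->rx_dim_enabled = enable;
	return 0;
}
```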

> 
> [ ... ]
> 
> > +static void mana_update_tx_dim(struct mana_cq *cq)
> > +{
> > +	struct mana_port_context *apc = netdev_priv(cq->txq->ndev);
> > +	struct dim_sample dim_sample = {};
> > +	struct mana_txq *txq = cq->txq;
> > +
> > +	if (!apc->tx_dim_enabled)
> > +		return;
> > +
> > +	dim_update_sample(READ_ONCE(cq->dim_event_ctr), txq->stats.packets,
> > +			  txq->stats.bytes, &dim_sample);
> > +	net_dim(&cq->dim, &dim_sample);
> > +}
> 
> [Severity: High]
> Does this measure the CPU enqueue rate rather than the hardware completion
> rate?
> 
> It appears that txq->stats.packets is incremented in mana_start_xmit()
> upon enqueue, not during completion in mana_poll_tx_cq(). Dynamic
> interrupt moderation relies on measuring the actual hardware completion
> rate to function correctly.
> 
> Additionally, is it safe to read txq->stats.packets here without the
> u64_stats_fetch_begin() retry loop? Reading a 64-bit statistic locklessly
> without the synchronization primitives could result in torn reads on
> 32-bit systems.

In the next patch version, I added counters to the CQ to track the HW
completion rate, and the reader runs in the same NAPI context as the
writer, which prevents torn reads.
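The fix can be modelled roughly as follows. Completions are counted on the CQ itself and sampled from the same NAPI context that updates them. This is a hypothetical userspace sketch. `struct cq_model`, `cq_account_completion()` and `cq_take_sample()` are illustrative names. `struct dim_sample_model` only mirrors the event/packet/byte inputs of the kernel's `dim_update_sample()`; it is not the kernel type.

```c
#include <stdint.h>

/* Mirrors the inputs of dim_update_sample(): an event counter plus
 * packet and byte totals. Hypothetical userspace type. */
struct dim_sample_model {
	uint16_t event_ctr;
	uint64_t pkt_ctr;
	uint64_t byte_ctr;
};

/* Per-CQ completion counters. Writer (poll loop) and reader (DIM
 * sampler) share one NAPI context, so no u64_stats retry loop is
 * modelled here. */
struct cq_model {
	uint16_t dim_event_ctr; /* events seen on this CQ */
	uint64_t comp_packets;  /* packets whose completion was reaped */
	uint64_t comp_bytes;    /* bytes whose completion was reaped */
};

/* Called for every completion the poll loop actually processes, so the
 * counters follow the hardware completion rate, not the enqueue rate. */
static void cq_account_completion(struct cq_model *cq, uint32_t bytes)
{
	cq->comp_packets++;
	cq->comp_bytes += bytes;
}

/* Snapshot the counters at the end of a poll cycle for DIM. */
static struct dim_sample_model cq_take_sample(const struct cq_model *cq)
{
	struct dim_sample_model s = {
		.event_ctr = cq->dim_event_ctr,
		.pkt_ctr = cq->comp_packets,
		.byte_ctr = cq->comp_bytes,
	};
	return s;
}
```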

> 
> [ ... ]
> 
> > @@ -2324,7 +2434,13 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> >  	if (w < cq->budget) {
> >  		mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> >  		cq->work_done_since_doorbell = 0;
> > -		napi_complete_done(&cq->napi, w);
> > +
> > +		if (napi_complete_done(&cq->napi, w)) {
> > +			if (cq->type == MANA_CQ_TYPE_RX)
> > +				mana_update_rx_dim(cq);
> > +			else
> > +				mana_update_tx_dim(cq);
> > +		}
> 
> [Severity: High]
> Is it safe to call mana_update_rx_dim() and mana_update_tx_dim() after
> napi_complete_done()?
> 
> Since napi_complete_done() ends the NAPI polling cycle and relinquishes
> mutual exclusion, and the hardware interrupt was already re-armed by
> mana_gd_ring_cq() right before this, couldn't a new packet trigger an
> interrupt immediately? This could allow NAPI to be scheduled on another
> CPU concurrently while this CPU is still running the DIM update,
> potentially corrupting the non-thread-safe net_dim() state machine.

Thanks. I moved the DIM update before napi_complete_done() so that it
cannot run concurrently with another instance of the same NAPI.
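A single-threaded model can illustrate the ordering rule behind this change. The DIM update, which is not thread-safe, must run while the poller still owns the NAPI instance, that is, before ownership is handed back. `struct napi_model`, `model_dim_update()`, `model_napi_complete()` and `cq_handler_tail()` are hypothetical stand-ins for `net_dim()` and `napi_complete_done()`. The sketch does not reproduce real concurrency; the `assert()` only documents the invariant.

```c
#include <assert.h>
#include <stdbool.h>

/* 'owned' is true while a poller holds this NAPI instance, i.e. until
 * it calls the completion step. Hypothetical model. */
struct napi_model {
	bool owned;
	int dim_updates;
};

/* Stand-in for net_dim(): its state is not thread-safe, so it may only
 * run while the caller still owns the NAPI instance. */
static void model_dim_update(struct napi_model *n)
{
	assert(n->owned);
	n->dim_updates++;
}

/* Stand-in for napi_complete_done(): afterwards another CPU may run the
 * same NAPI instance. */
static void model_napi_complete(struct napi_model *n)
{
	n->owned = false;
}

/* Reordered handler tail: update DIM first, then release ownership. */
static void cq_handler_tail(struct napi_model *n)
{
	model_dim_update(n);
	model_napi_complete(n);
}
```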

> 
> [ ... ]
> 
> > @@ -2562,6 +2685,9 @@ static int mana_create_txq(struct mana_port_context *apc,
> >  		napi_enable_locked(&cq->napi);
> >  		txq->napi_initialized = true;
> >
> > +		INIT_WORK(&cq->dim.work, mana_tx_dim_work);
> > +		cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> 
> [Severity: High]
> Could this allow NAPI to schedule an uninitialized work_struct?
> 
> napi_enable_locked() is called right before INIT_WORK(). Once NAPI is
> enabled, it can be scheduled to run immediately (for example, via
> netpoll). If NAPI executes before INIT_WORK completes,
> mana_update_tx_dim() would pass the uninitialized cq->dim to net_dim(),
> which could cause a kernel panic when it attempts to call
> schedule_work().
> 
> This pattern also occurs in mana_create_rxq().

I have updated the init order in the new version.
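The init-order rule behind this fix can be modelled as follows: initialise everything a NAPI poll may touch before enabling NAPI. `struct work_model`, `struct cq_model`, `napi_poll_once()` and `create_queue()` are hypothetical. The two assignments in `create_queue()` only stand in for `INIT_WORK()` and `napi_enable_locked()`. The `assert()` marks the point where the kernel would instead schedule uninitialised work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct work_model {
	void (*func)(struct work_model *w); /* NULL until initialised */
	int runs;
};

struct cq_model {
	struct work_model dim_work;
	bool napi_enabled;
};

static void dim_work_fn(struct work_model *w)
{
	w->runs++;
}

/* What a poll may do as soon as NAPI is enabled: run the DIM work. */
static void napi_poll_once(struct cq_model *cq)
{
	if (!cq->napi_enabled)
		return;
	/* In the kernel, scheduling uninitialised work could crash here. */
	assert(cq->dim_work.func != NULL);
	cq->dim_work.func(&cq->dim_work);
}

/* Corrected order: initialise what NAPI may touch, then enable NAPI. */
static void create_queue(struct cq_model *cq)
{
	cq->dim_work.func = dim_work_fn; /* ~ INIT_WORK() */
	cq->dim_work.runs = 0;
	cq->napi_enabled = true;         /* ~ napi_enable_locked() */
}
```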

> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > index 94e658d07a27e..67a2b282ff4da 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> 
> [ ... ]
> 
> > @@ -440,18 +474,123 @@ static int mana_set_coalesce(struct net_device *ndev,
> > +		if (enable_rx_dim)
> > +			apc->rx_dim_enabled = true;
> > +		if (enable_tx_dim)
> > +			apc->tx_dim_enabled = true;
> 
> [Severity: High]
> Could the lack of memory barriers here expose uninitialized DIM structures
> to NAPI?
> 
> The driver initializes DIM via mana_dim_change() and then sets
> apc->rx_dim_enabled to true. Without an smp_store_release() here and a
> corresponding smp_load_acquire() in mana_update_rx_dim(), weakly-ordered
> CPUs like ARM64 might reorder the stores. Concurrently, NAPI polling might
> observe the flag as true before the initialization is fully visible in
> memory, potentially invoking net_dim() on garbage memory.

I added smp_store_release() and smp_load_acquire() to fix this in the next version.
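The publish/consume pattern the reply refers to can be sketched in userspace with C11 atomics. Here `atomic_store_explicit(..., memory_order_release)` and `atomic_load_explicit(..., memory_order_acquire)` play the roles of the kernel's `smp_store_release()` and `smp_load_acquire()`. `struct dim_model`, `struct port_model`, `enable_rx_dim()` and `read_rx_dim()` are hypothetical names, not MANA code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct dim_model {
	int profile_ix; /* stands in for DIM state initialised first */
	int mode;
};

struct port_model {
	struct dim_model rx_dim;
	atomic_bool rx_dim_enabled;
};

/* Writer (ethtool path): initialise DIM state, then publish the flag
 * with release semantics so the initialisation becomes visible first. */
static void enable_rx_dim(struct port_model *p, int mode)
{
	p->rx_dim.profile_ix = 0;
	p->rx_dim.mode = mode;
	atomic_store_explicit(&p->rx_dim_enabled, true, memory_order_release);
}

/* Reader (NAPI path): acquire-load the flag; if it reads true, all
 * stores made before the matching release are visible. */
static bool read_rx_dim(struct port_model *p, struct dim_model *out)
{
	if (!atomic_load_explicit(&p->rx_dim_enabled, memory_order_acquire))
		return false;
	*out = p->rx_dim;
	return true;
}
```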

I will submit the next version soon.

Thanks,
- Haiyang


^ permalink raw reply

* Re: [PATCH 01/18] xfrm: cleanup error path in xfrm_add_policy()
From: patchwork-bot+netdevbpf @ 2026-06-13 20:40 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, kuba, herbert, netdev
In-Reply-To: <20260612074725.1760473-2-steffen.klassert@secunet.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Steffen Klassert <steffen.klassert@secunet.com>:

On Fri, 12 Jun 2026 09:46:17 +0200 you wrote:
> From: Deepanshu Kartikey <kartikey406@gmail.com>
> 
> Replace the open-coded manual cleanup in the error path of
> xfrm_add_policy() with xfrm_policy_destroy(), which already
> handles all the necessary cleanup internally. This is consistent
> with how xfrm_policy_construct() handles its own error paths.
> 
> [...]

Here is the summary with links:
  - [01/18] xfrm: cleanup error path in xfrm_add_policy()
    https://git.kernel.org/netdev/net-next/c/a77d172177f3
  - [02/18] xfrm: Reject excessive values for XFRMA_TFCPAD
    https://git.kernel.org/netdev/net-next/c/41c4d3b26f5e
  - [03/18] xfrm: remove redundant assignments
    https://git.kernel.org/netdev/net-next/c/440bf355d32e
  - [04/18] xfrm: add extack to xfrm_init_state
    https://git.kernel.org/netdev/net-next/c/231a1744dc43
  - [05/18] xfrm: allow migration from UDP encapsulated to non-encapsulated ESP
    https://git.kernel.org/netdev/net-next/c/b8addb8884f2
  - [06/18] xfrm: fix NAT-related field inheritance in SA migration
    https://git.kernel.org/netdev/net-next/c/364e165e0b63
  - [07/18] xfrm: rename reqid in xfrm_migrate
    https://git.kernel.org/netdev/net-next/c/e2e92714d081
  - [08/18] xfrm: split xfrm_state_migrate into create and install functions
    https://git.kernel.org/netdev/net-next/c/8de53883a4bf
  - [09/18] xfrm: check family before comparing addresses in migrate
    https://git.kernel.org/netdev/net-next/c/b2cb192b95e5
  - [10/18] xfrm: add state synchronization after migration
    https://git.kernel.org/netdev/net-next/c/bac7a60e2678
  - [11/18] xfrm: add error messages to state migration
    https://git.kernel.org/netdev/net-next/c/15e5d32de6bf
  - [12/18] xfrm: move encap and xuo into struct xfrm_migrate
    https://git.kernel.org/netdev/net-next/c/1d97daee3038
  - [13/18] xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper
    https://git.kernel.org/netdev/net-next/c/92550d30c69b
  - [14/18] xfrm: extract address family and selector validation helpers
    https://git.kernel.org/netdev/net-next/c/38d400e5d0fd
  - [15/18] xfrm: make xfrm_dev_state_add xuo parameter const
    https://git.kernel.org/netdev/net-next/c/8eed5ba25734
  - [16/18] xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration
    https://git.kernel.org/netdev/net-next/c/a9d155ea9b44
  - [17/18] xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE
    https://git.kernel.org/netdev/net-next/c/c4460171d78a
  - [18/18] xfrm: add documentation for XFRM_MSG_MIGRATE_STATE
    https://git.kernel.org/netdev/net-next/c/c13c0cc6f52e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v14 net-next 00/13] dpll/ice: Add generic DPLL type and full TX reference clock control for E825
From: patchwork-bot+netdevbpf @ 2026-06-13 20:40 UTC (permalink / raw)
  To: Nitka, Grzegorz
  Cc: netdev, linux-kernel, intel-wired-lan, poros, richardcochran,
	andrew+netdev, przemyslaw.kitszel, anthony.l.nguyen,
	Prathosh.Satish, ivecera, jiri, arkadiusz.kubalewski,
	vadim.fedorenko, donald.hunter, horms, pabeni, kuba, davem,
	edumazet
In-Reply-To: <20260607183045.1213735-1-grzegorz.nitka@intel.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun,  7 Jun 2026 20:30:32 +0200 you wrote:
> NOTE: This series is intentionally submitted on net-next (not
> intel-wired-lan) as early feedback of DPLL subsystem changes is
> welcomed. In the past possible approaches were discussed in [1].
> 
> This series adds TX reference clock support for E825 devices and exposes
> TX clock selection and synchronization status via the Linux DPLL
> subsystem.
> 
> [...]

Here is the summary with links:
  - [v14,net-next,01/13] dpll: add generic DPLL type
    https://git.kernel.org/netdev/net-next/c/9375487c0c78
  - [v14,net-next,02/13] dpll: allow registering FW-identified pin with a different DPLL
    https://git.kernel.org/netdev/net-next/c/c191b319f208
  - [v14,net-next,03/13] dpll: fix stale iteration in dpll_pin_on_pin_unregister()
    https://git.kernel.org/netdev/net-next/c/32239d600236
  - [v14,net-next,04/13] dpll: send delete notification before unregister in on-pin rollback
    https://git.kernel.org/netdev/net-next/c/e83b403eb142
  - [v14,net-next,05/13] dpll: emit per-dpll delete notifications in dpll_pin_on_pin_unregister()
    https://git.kernel.org/netdev/net-next/c/df0ba51ccf87
  - [v14,net-next,06/13] dpll: guard sync-pair removal on full pin unregister
    https://git.kernel.org/netdev/net-next/c/0a5c720a7d57
  - [v14,net-next,07/13] dpll: balance create/delete notifications in __dpll_pin_(un)register
    https://git.kernel.org/netdev/net-next/c/1a2292101c0d
  - [v14,net-next,08/13] dpll: extend pin notifier with notification source ID
    https://git.kernel.org/netdev/net-next/c/0bf47f722fa9
  - [v14,net-next,09/13] dpll: allow fwnode pins to attempt state change without capability bit
    https://git.kernel.org/netdev/net-next/c/521b6d5de08d
  - [v14,net-next,10/13] ice: introduce TXC DPLL device and TX ref clock pin framework for E825
    https://git.kernel.org/netdev/net-next/c/5db36ee62849
  - [v14,net-next,11/13] ice: implement CPI support for E825C
    https://git.kernel.org/netdev/net-next/c/fff4ed70ca9b
  - [v14,net-next,12/13] ice: add Tx reference clock index handling to AN restart command
    https://git.kernel.org/netdev/net-next/c/4128bda8fc1d
  - [v14,net-next,13/13] ice: implement E825 TX ref clock control and TXC hardware sync status
    https://git.kernel.org/netdev/net-next/c/e075d7768386

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v6 3/5] net: dsa: tag_ks8995: Add the KS8995 tag handling
From: Jakub Kicinski @ 2026-06-13 20:38 UTC (permalink / raw)
  To: Linus Walleij
  Cc: woojung.huh, UNGLinuxDriver, andrew, olteanv, davem, edumazet,
	pabeni, robh, krzk+dt, conor+dt, marex, horms, linux, netdev,
	devicetree, nb
In-Reply-To: <CAD++jLnuBv97nUW-EdZXiLmgsUSiVLgkB0R=gKB0zYtr8JN7xg@mail.gmail.com>

On Sat, 13 Jun 2026 18:56:03 +0200 Linus Walleij wrote:
> Which is what I do.
> 
> So yeah. skb_free() will be free:ed twice. The code in tag_8021q.c will
> also do that. But what do you expect ->xmit() to return on error if
> not NULL?
> 
> When user.c does this:
> 
>     /* Transmit function may have to reallocate the original SKB,
>      * in which case it must have freed it. Only free it here on error.
>      */
>     nskb = p->xmit(skb, dev);
>     if (!nskb) {
>         kfree_skb(skb);
>         return NETDEV_TX_OK;
>     }
> 
>     return dsa_enqueue_skb(nskb, dev);
> 
> The only way to get clean out of this branch if you run
> into an error in ->xmit() is to return NULL!

Yes, maybe DSA experts remember the background here and can guide us.
But from a fresh look, this and ->rcv have very odd semantics.

  nskb = func(skb);

should assume skb is either freed or returned. Freeing the input param
on failure of func() is a rather strange pattern.

I vote we drop these kfree_skb()s (both xmit and rcv) and fix up any
driver that depended on them (if any)?
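Under the proposed contract, the callee either returns a buffer or consumes its input, and the caller never frees the input afterwards. The model below sketches that contract, using `malloc()`/`free()` in place of skb allocation. `struct buf`, `tag_xmit()` and `xmit_one()` are hypothetical; they are not DSA APIs. This tagger always returns the same buffer, whereas a real one may reallocate and free the original.

```c
#include <stdlib.h>
#include <string.h>

struct buf {
	size_t len;
	unsigned char data[64];
};

/* Tagger following the proposed contract: on success it returns a buffer
 * for the caller to continue with; on error it consumes (frees) the
 * input and returns NULL. */
static struct buf *tag_xmit(struct buf *b, size_t tag_len)
{
	if (b->len + tag_len > sizeof(b->data)) {
		free(b); /* error: the input is consumed here */
		return NULL;
	}
	memset(b->data + b->len, 0, tag_len); /* append a zeroed tag */
	b->len += tag_len;
	return b;
}

/* Caller under that contract: no extra free on NULL, so no double free. */
static int xmit_one(struct buf *b, size_t tag_len, struct buf **out)
{
	struct buf *nb = tag_xmit(b, tag_len);

	if (!nb)
		return -1; /* already freed by tag_xmit() */
	*out = nb;
	return 0;
}
```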

^ permalink raw reply

* Re: [PATCH net-next 4/4] net: dsa: realtek: rtl8366rb: Switch to generic learning enablement
From: Luiz Angelo Daros de Luca @ 2026-06-13 20:28 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Alvin Šipraga, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
In-Reply-To: <20260612-rtl8366rb-improvements-v1-4-9232286fc20c@kernel.org>

On Fri, 12 Jun 2026 at 11:23, Linus Walleij <linusw@kernel.org> wrote:
>
> Instead of just writing the learning disablement register in setup
> and a custom handling of BR_LEARNING, implement the generic RTL83xx
> .port_set_learning() callback for setting learning on a port, and
> call this in the per-port loop in .setup().
>
> Instead of the custom rtl8366rb_port_bridge_flags() function for
> setting learning mode on each port, use the RTL83xx generic
> rtl83xx_port_bridge_flags() callback.
>
> Signed-off-by: Linus Walleij <linusw@kernel.org>
> ---
>  drivers/net/dsa/realtek/rtl8366rb.c | 43 +++++++++++++++----------------------
>  1 file changed, 17 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/dsa/realtek/rtl8366rb.c b/drivers/net/dsa/realtek/rtl8366rb.c
> index 155bf0010d5f..d2fa8ff6a5d0 100644
> --- a/drivers/net/dsa/realtek/rtl8366rb.c
> +++ b/drivers/net/dsa/realtek/rtl8366rb.c
> @@ -854,6 +854,16 @@ rtl8366rb_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
>         }
>  }
>
> +static int rtl8366rb_port_set_learning(struct realtek_priv *priv, int port,
> +                                      bool enable)
> +{
> +       /* Notice inverted semantics in this register: setting a bit disables
> +        * learning instead of enabling it.
> +        */
> +       return regmap_update_bits(priv->map, RTL8366RB_PORT_LEARNDIS_CTRL,
> +                                 BIT(port), enable ? 0 : BIT(port));
> +}
> +
>  static int rtl8366rb_setup(struct dsa_switch *ds)
>  {
>         struct realtek_priv *priv = ds->priv;
> @@ -945,6 +955,11 @@ static int rtl8366rb_setup(struct dsa_switch *ds)
>                 if (ret)
>                         return ret;
>
> +               /* Disable learning */
> +               ret = rtl8366rb_port_set_learning(priv, dp->index, false);
> +               if (ret)
> +                       return ret;
> +
>                 /* Collect CPU ports. If we support cascade switches, it should
>                  * also include the upstream DSA ports.
>                  */
> @@ -1037,12 +1052,6 @@ static int rtl8366rb_setup(struct dsa_switch *ds)
>                         rb->max_mtu[i] = ETH_DATA_LEN;
>         }
>
> -       /* Disable learning for all ports */
> -       ret = regmap_write(priv->map, RTL8366RB_PORT_LEARNDIS_CTRL,
> -                          RTL8366RB_PORT_ALL);
> -       if (ret)
> -               return ret;
> -
>         /* Enable auto ageing for all ports */
>         ret = regmap_write(priv->map, RTL8366RB_SECURITY_CTRL, 0);
>         if (ret)
> @@ -1341,25 +1350,6 @@ rtl8366rb_port_pre_bridge_flags(struct dsa_switch *ds, int port,
>         return 0;
>  }
>
> -static int
> -rtl8366rb_port_bridge_flags(struct dsa_switch *ds, int port,
> -                           struct switchdev_brport_flags flags,
> -                           struct netlink_ext_ack *extack)
> -{
> -       struct realtek_priv *priv = ds->priv;
> -       int ret;
> -
> -       if (flags.mask & BR_LEARNING) {
> -               ret = regmap_update_bits(priv->map, RTL8366RB_PORT_LEARNDIS_CTRL,
> -                                        BIT(port),
> -                                        (flags.val & BR_LEARNING) ? 0 : BIT(port));
> -               if (ret)
> -                       return ret;
> -       }
> -
> -       return 0;
> -}
> -
>  static void
>  rtl8366rb_port_fast_age(struct dsa_switch *ds, int port)
>  {
> @@ -1810,7 +1800,7 @@ static const struct dsa_switch_ops rtl8366rb_switch_ops = {
>         .port_enable = rtl8366rb_port_enable,
>         .port_disable = rtl8366rb_port_disable,
>         .port_pre_bridge_flags = rtl8366rb_port_pre_bridge_flags,
> -       .port_bridge_flags = rtl8366rb_port_bridge_flags,
> +       .port_bridge_flags = rtl83xx_port_bridge_flags,
>         .port_stp_state_set = rtl8366rb_port_stp_state_set,
>         .port_fast_age = rtl8366rb_port_fast_age,
>         .port_change_mtu = rtl8366rb_change_mtu,
> @@ -1833,6 +1823,7 @@ static const struct realtek_ops rtl8366rb_ops = {
>         .enable_vlan4k  = rtl8366rb_enable_vlan4k,
>         .port_add_isolation = rtl8366rb_port_add_isolation,
>         .port_remove_isolation = rtl8366rb_port_remove_isolation,
> +       .port_set_learning = rtl8366rb_port_set_learning,

The bridge flags code is fine, as it does not touch the CPU port, but
.port_set_learning will also be used by bridge join/leave. As I
mentioned in my earlier comment on this series, disabling learning on the
CPU port requires fdb_add/remove. Maybe we should check that both
port_set_learning and the FDB ops are available before disabling CPU
port learning.

>         .phy_read       = rtl8366rb_phy_read,
>         .phy_write      = rtl8366rb_phy_write,
>  };
>
> --
> 2.54.0
>
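The inverted semantics of `RTL8366RB_PORT_LEARNDIS_CTRL` in `rtl8366rb_port_set_learning()` above can be checked with a small model. It reproduces the masked read-modify-write that `regmap_update_bits()` performs: only the bits in the mask change, and they take the corresponding bits of the value. `update_bits()` and `set_learning()` are hypothetical userspace helpers, not the regmap API, and the register is modelled as a plain `uint32_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of regmap_update_bits(): bits outside 'mask' are preserved, bits
 * inside it take the corresponding bits of 'val'. */
static uint32_t update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}

/* LEARNDIS model: a set bit DISABLES learning on that port, so enabling
 * learning means clearing the port's bit. */
static uint32_t set_learning(uint32_t learndis, int port, bool enable)
{
	uint32_t bit = 1u << port;

	return update_bits(learndis, bit, enable ? 0 : bit);
}
```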

Regards,

Luiz

^ permalink raw reply

* Re: [PATCH net-next 1/4] net: dsa: realtek: rtl8366rb: Switch to generic port_bridge* handlers
From: Luiz Angelo Daros de Luca @ 2026-06-13 20:28 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Alvin Šipraga, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
In-Reply-To: <20260612-rtl8366rb-improvements-v1-1-9232286fc20c@kernel.org>

Hi Linus,

> The RTL8366RB is using its own sub-standard port isolation code.
>
> Implement the required isolation helpers, use these directly in
> the port setup callback, and switch over to the standard port
> isolation code.
>
> Signed-off-by: Linus Walleij <linusw@kernel.org>

> -       .port_bridge_join = rtl8366rb_port_bridge_join,
> -       .port_bridge_leave = rtl8366rb_port_bridge_leave,
> +       .port_bridge_join = rtl83xx_port_bridge_join,
> +       .port_bridge_leave = rtl83xx_port_bridge_leave,

Maybe you wrote this patch based on a previous submission I sent. The
merged code requires both port_set_learning and
port_{add,remove}_isolation. Even if the next patch implements
port_set_learning, this patch on its own will break bisectability.
You'll need to change rtl83xx.c to make port_set_learning optional in
order to keep the old behavior for rtl8366rb.

If learning remains enabled on the CPU port (as the rtl8366rb driver
works today), the host's MAC entry might eventually expire due to
aging. If that happens, traffic destined to the CPU will momentarily
flood to user ports. It will still work as it does today, but not in
the recommended DSA way.

On the other hand, if you do disable learning on the CPU port, the
driver must implement FDB callbacks so the kernel can statically teach
the switch where the host MAC is. Otherwise, the switch will treat all
host-bound traffic as unknown unicast, flooding it to all user ports
in the same bridge continuously. The worst part is that traffic will
work, but that flooding would be a regression.

My suggestion is to change rtl83xx_port_bridge_{join,leave} to check
for the operation's existence before calling it, preserving the
current state.
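Luiz's suggestion can be sketched as a guard in the CPU-port step of bridge join. CPU-port learning is disabled only when the driver implements `port_set_learning` and can also install static FDB entries for the host MAC; otherwise the current behaviour is kept. `struct ops_model`, `struct priv_model`, `cpu_port_join()` and the two `model_*` callbacks are hypothetical. They are not the real `realtek_ops` or rtl83xx code.

```c
#include <stdbool.h>
#include <stddef.h>

struct priv_model;

/* Hypothetical subset of a chip ops table; NULL means "not implemented". */
struct ops_model {
	int (*port_set_learning)(struct priv_model *priv, int port, bool enable);
	int (*port_fdb_add)(struct priv_model *priv, int port,
			    const unsigned char *addr);
};

struct priv_model {
	const struct ops_model *ops;
	bool cpu_learning; /* true: CPU port still learns (old behaviour) */
};

/* CPU-port part of bridge join: only disable learning when both ops
 * exist; otherwise keep learning on and preserve the current state. */
static int cpu_port_join(struct priv_model *priv, int cpu_port)
{
	const struct ops_model *ops = priv->ops;

	if (!ops->port_set_learning || !ops->port_fdb_add)
		return 0;
	return ops->port_set_learning(priv, cpu_port, false);
}

static int model_set_learning(struct priv_model *priv, int port, bool enable)
{
	(void)port;
	priv->cpu_learning = enable;
	return 0;
}

static int model_fdb_add(struct priv_model *priv, int port,
			 const unsigned char *addr)
{
	(void)priv;
	(void)port;
	(void)addr;
	return 0;
}
```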

>         .port_vlan_filtering = rtl8366rb_vlan_filtering,
>         .port_vlan_add = rtl8366_vlan_add,
>         .port_vlan_del = rtl8366_vlan_del,
> @@ -1830,6 +1792,8 @@ static const struct realtek_ops rtl8366rb_ops = {
>         .is_vlan_valid  = rtl8366rb_is_vlan_valid,
>         .enable_vlan    = rtl8366rb_enable_vlan,
>         .enable_vlan4k  = rtl8366rb_enable_vlan4k,
> +       .port_add_isolation = rtl8366rb_port_add_isolation,
> +       .port_remove_isolation = rtl8366rb_port_remove_isolation,

Regards,

Luiz

^ permalink raw reply

* Re: [PATCH net-next 3/4] net: dsa: realtek: rtl8366rb: Disable STP learning on all ports in setup
From: Luiz Angelo Daros de Luca @ 2026-06-13 20:27 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Alvin Šipraga, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
In-Reply-To: <20260612-rtl8366rb-improvements-v1-3-9232286fc20c@kernel.org>

> +static void
> +rtl8366rb_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
> +{
> +       struct realtek_priv *priv = ds->priv;
> +       u32 val;
> +       int i;
> +
> +       switch (state) {
> +       case BR_STATE_DISABLED:
> +               val = RTL8366RB_STP_STATE_DISABLED;
> +               break;
> +       case BR_STATE_BLOCKING:
> +       case BR_STATE_LISTENING:
> +               val = RTL8366RB_STP_STATE_BLOCKING;
> +               break;
> +       case BR_STATE_LEARNING:
> +               val = RTL8366RB_STP_STATE_LEARNING;
> +               break;
> +       case BR_STATE_FORWARDING:
> +               val = RTL8366RB_STP_STATE_FORWARDING;
> +               break;
> +       default:
> +               dev_err(priv->dev, "unknown bridge state requested\n");
> +               return;
> +       }
> +
> +       /* Set the same status for the port on all the FIDs */
> +       for (i = 0; i < RTL8366RB_NUM_FIDS; i++) {
> +               regmap_update_bits(priv->map, RTL8366RB_STP_STATE_BASE + i,
> +                                  RTL8366RB_STP_STATE_MASK(port),
> +                                  RTL8366RB_STP_STATE(port, val));
> +       }

Hum... I might need to check rtl8365mb as well. There we disable only
MSTP/FID 0.

Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>

^ permalink raw reply

* Re: [PATCH v14 net-next 03/13] dpll: fix stale iteration in dpll_pin_on_pin_unregister()
From: Jakub Kicinski @ 2026-06-13 20:27 UTC (permalink / raw)
  To: Nitka, Grzegorz
  Cc: Paolo Abeni, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org, Oros, Petr,
	richardcochran@gmail.com, andrew+netdev@lunn.ch,
	Kitszel, Przemyslaw, Nguyen, Anthony L,
	Prathosh.Satish@microchip.com, Vecera, Ivan, jiri@resnulli.us,
	Kubalewski, Arkadiusz, vadim.fedorenko@linux.dev,
	donald.hunter@gmail.com, horms@kernel.org, davem@davemloft.net,
	edumazet@google.com
In-Reply-To: <IA1PR11MB6219B836E61D44198B0FB5AF921B2@IA1PR11MB6219.namprd11.prod.outlook.com>

On Thu, 11 Jun 2026 18:36:14 +0000 Nitka, Grzegorz wrote:
> For the v14 patchset, I only got comments from Arek and Paolo.
> In my opinion, after rethinking, Arek's concerns are not valid (explained in
> the responses). Maybe I could squash some changes, but the final code would
> remain the same as for v14.
> 
> Paolo raised the 'Fixes' tags, which I added, and a critical divide-by-zero
> panic. Regarding the 'Fixes' tags, it might be my fault or a misunderstanding.
> I can remove them if we want and re-send the series.
> Regarding the divide-by-zero, see my comments on the AI concern list below.
> 
> Also, as raised by the AI review, I unintentionally changed WARN_ON to
> WARN_ON_ONCE in patch 2. I'd restore it to WARN_ON.

Alright, let me apply this.

But please investigate the situation with the notifications.
Maybe there are two create notifications because we add more info
to the second one? If not, we should drop the duplicates. YNL can
monitor the notifications during tests if you need a repro.

You can follow up with the fixes during the merge window.

^ permalink raw reply

* Re: [PATCH net-next 2/4] net: dsa: realtek: rtl8366rb: Use DSA port iterators
From: Luiz Angelo Daros de Luca @ 2026-06-13 20:22 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Alvin Šipraga, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
In-Reply-To: <20260612-rtl8366rb-improvements-v1-2-9232286fc20c@kernel.org>

On Fri, 12 Jun 2026 at 11:23, Linus Walleij <linusw@kernel.org> wrote:
>
> Instead of custom loops for initializing the ports (including the
> CPU port), use the DSA helpers dsa_switch_for_each_port() and
> dsa_switch_for_each_cpu_port(), following the pattern in RTL8365MB of
> accumulating masks for the upstream and downstream ports.
>
> This gives us similar enough code to the RTL8365MB that we
> can start using more generic rtl83xx helpers.
>
> Signed-off-by: Linus Walleij <linusw@kernel.org>

Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
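The commit message describes a mask-accumulation pattern: walk the ports once and collect bitmasks for the downstream (user) and upstream (CPU) ports. A minimal sketch follows. The kernel macros `dsa_switch_for_each_port()` and `dsa_switch_for_each_cpu_port()` are not reproduced; a plain array of port types replaces them. `enum port_type_model`, `struct port_masks` and `collect_masks()` are hypothetical names.

```c
#include <stdint.h>

enum port_type_model { PORT_UNUSED, PORT_USER, PORT_CPU };

struct port_masks {
	uint32_t user; /* downstream (user) ports */
	uint32_t cpu;  /* upstream (CPU) ports */
};

/* Single pass over the ports, setting one bit per port in the mask that
 * matches its role; unused ports are skipped. */
static struct port_masks collect_masks(const enum port_type_model *types, int n)
{
	struct port_masks m = { 0, 0 };

	for (int i = 0; i < n; i++) {
		if (types[i] == PORT_USER)
			m.user |= 1u << i;
		else if (types[i] == PORT_CPU)
			m.cpu |= 1u << i;
	}
	return m;
}
```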

^ permalink raw reply

* Re: [PATCH 0/18] pull request (net-next): ipsec-next 2026-06-12
From: Jakub Kicinski @ 2026-06-13 20:15 UTC (permalink / raw)
  To: Steffen Klassert, Antony Antony; +Cc: David Miller, Herbert Xu, netdev
In-Reply-To: <20260612074725.1760473-1-steffen.klassert@secunet.com>

On Fri, 12 Jun 2026 09:46:16 +0200 Steffen Klassert wrote:
> 3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that
>    allows migrating individual IPsec SAs independently of
>    their policies. The existing XFRM_MSG_MIGRATE is tightly coupled
>    to policy+SA migration, lacks SPI for unique SA identification,
>    and cannot express reqid changes or migrate Transport mode
>    selectors. The new interface identifies the SA via SPI and mark,
>    supports reqid changes, address family changes, encap removal,
>    and uses an atomic create+install flow under x->lock to prevent
>    SN/IV reuse during AEAD SA migration.
>    From Antony Antony.

Hi! There are some Sashiko comments here, please follow up:

https://sashiko.dev/#/patchset/20260612074725.1760473-8-steffen.klassert@secunet.com

^ permalink raw reply
