Netdev List
* [PATCH net v5 3/4] bonding: 3ad: fix mux port state on oper down
From: Louis Scalbert @ 2026-05-06 16:11 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, jonas.gorski, Louis Scalbert
In-Reply-To: <20260506161144.465485-1-louis.scalbert@6wind.com>

When the bonding interface has carrier down due to the absence of
usable slaves and a slave transitions from down to up, the bonding
interface briefly goes carrier up, then down again, and finally up
once LACP negotiates collecting and distributing on the port.

When lacp_strict mode is on, the interface should not transition to
carrier up until LACP negotiation is complete.

This happens because the actor and partner port states remain in
Collecting_Distributing when the port goes down. When the port
comes back up, it temporarily remains in this state until LACP
renegotiation occurs.

Previously this was mostly cosmetic, but since the bonding carrier
state may depend on the LACP negotiation state, it causes the
interface to flap.

Move an operationally down port to the Mux WAITING state and clear the
Synchronization, Collecting, and Distributing states, in accordance with
the 802.1AX Mux state machine diagram.

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 1247a1e048df..b531f68a24b0 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1055,6 +1055,8 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
 	aggregator = rcu_dereference(port->aggregator);
 	if (port->sm_vars & AD_PORT_BEGIN) {
 		port->sm_mux_state = AD_MUX_DETACHED;
+	} else if (!port->is_enabled && port->sm_mux_state != AD_MUX_DETACHED) {
+		port->sm_mux_state = AD_MUX_WAITING;
 	} else {
 		switch (port->sm_mux_state) {
 		case AD_MUX_DETACHED:
@@ -1202,6 +1204,11 @@ static void ad_mux_machine(struct port *port, bool *update_slave_arr)
 			break;
 		case AD_MUX_WAITING:
 			port->sm_mux_timer_counter = __ad_timer_to_ticks(AD_WAIT_WHILE_TIMER, 0);
+			port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
+			ad_disable_collecting_distributing(port,
+							   update_slave_arr);
+			port->actor_oper_port_state &= ~LACP_STATE_COLLECTING;
+			port->actor_oper_port_state &= ~LACP_STATE_DISTRIBUTING;
 			break;
 		case AD_MUX_ATTACHED:
 			if (aggregator->is_active)
-- 
2.39.2
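
The rule added by this patch can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: the enum, struct and
function names are hypothetical stand-ins, not the kernel's struct port or
ad_mux_machine(). The flag values follow the LACP actor state octet
(Synchronization 0x08, Collecting 0x10, Distributing 0x20).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Actor state bits as laid out in the LACPDU state octet. */
#define MODEL_STATE_SYNCHRONIZATION 0x08
#define MODEL_STATE_COLLECTING      0x10
#define MODEL_STATE_DISTRIBUTING    0x20

/* Hypothetical, reduced set of Mux states for illustration only. */
enum model_mux_state {
	MODEL_MUX_DETACHED,
	MODEL_MUX_WAITING,
	MODEL_MUX_ATTACHED,
	MODEL_MUX_COLLECTING_DISTRIBUTING,
};

struct model_port {
	bool is_enabled;                /* operational link state */
	enum model_mux_state mux_state;
	uint8_t actor_oper_port_state;
};

/* Apply the oper-down rule to one port.
 * Returns true when the rule applied, i.e. the port is down and not
 * DETACHED, in which case it ends up in WAITING with the three flags
 * cleared.
 */
static bool model_mux_oper_down(struct model_port *port)
{
	if (port->is_enabled || port->mux_state == MODEL_MUX_DETACHED)
		return false;

	port->mux_state = MODEL_MUX_WAITING;
	port->actor_oper_port_state &= ~(MODEL_STATE_SYNCHRONIZATION |
					 MODEL_STATE_COLLECTING |
					 MODEL_STATE_DISTRIBUTING);
	return true;
}
```

With this model, a port that was in Collecting_Distributing when its link
dropped no longer advertises those flags when the link returns, so a carrier
decision based on them cannot flap before renegotiation completes.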



* [PATCH net v5 2/4] bonding: 3ad: fix carrier when no usable slaves
From: Louis Scalbert @ 2026-05-06 16:11 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, jonas.gorski, Louis Scalbert
In-Reply-To: <20260506161144.465485-1-louis.scalbert@6wind.com>

Apply the "lacp_strict" configuration from the previous commit.

"lacp_strict" mode "on" asserts that the bonding master carrier is up
only when at least 'min_links' slaves are in the Collecting_Distributing
state.

Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 drivers/net/bonding/bond_3ad.c     | 21 ++++++++++++++++++++-
 drivers/net/bonding/bond_options.c |  1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index f0aa7d2f2171..1247a1e048df 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -745,6 +745,21 @@ static void __set_agg_ports_ready(struct aggregator *aggregator, int val)
 	}
 }
 
+static int __agg_usable_ports(struct aggregator *agg)
+{
+	struct port *port;
+	int valid = 0;
+
+	for (port = agg->lag_ports; port;
+	     port = port->next_port_in_aggregator) {
+		if (port->actor_oper_port_state & LACP_STATE_COLLECTING &&
+		    port->actor_oper_port_state & LACP_STATE_DISTRIBUTING)
+			valid++;
+	}
+
+	return valid;
+}
+
 static int __agg_active_ports(struct aggregator *agg)
 {
 	struct port *port;
@@ -2128,6 +2143,7 @@ static void ad_enable_collecting_distributing(struct port *port,
 			  port->actor_port_number,
 			  aggregator->aggregator_identifier);
 		__enable_port(port);
+		bond_3ad_set_carrier(port->slave->bond);
 		/* Slave array needs update */
 		*update_slave_arr = true;
 		/* Should notify peers if possible */
@@ -2151,6 +2167,7 @@ static void ad_disable_collecting_distributing(struct port *port,
 			  port->actor_port_number,
 			  aggregator->aggregator_identifier);
 		__disable_port(port);
+		bond_3ad_set_carrier(port->slave->bond);
 		/* Slave array needs an update */
 		*update_slave_arr = true;
 	}
@@ -2830,7 +2847,9 @@ int bond_3ad_set_carrier(struct bonding *bond)
 	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave)->aggregator));
 	if (active) {
 		/* are enough slaves available to consider link up? */
-		if (__agg_active_ports(active) < bond->params.min_links) {
+		if ((bond->params.lacp_strict ? __agg_usable_ports(active)
+					: __agg_active_ports(active)) <
+		    bond->params.min_links) {
 			if (netif_carrier_ok(bond->dev)) {
 				netif_carrier_off(bond->dev);
 				goto out;
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index d358b831df77..94b7b0851f16 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1706,6 +1706,7 @@ static int bond_option_lacp_strict_set(struct bonding *bond,
 	netdev_dbg(bond->dev, "Setting LACP fallback to %s (%llu)\n",
 		   newval->string, newval->value);
 	bond->params.lacp_strict = newval->value;
+	bond_3ad_set_carrier(bond);
 
 	return 0;
 }
-- 
2.39.2
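
The carrier decision described in the commit message can be sketched as a
user-space model. The struct and function names are hypothetical and the
link_up field only approximates what the driver treats as an active port;
this is not the driver code, just the comparison against min_links under
both settings of lacp_strict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STATE_COLLECTING   0x10
#define MODEL_STATE_DISTRIBUTING 0x20

struct model_slave {
	bool link_up;                  /* slave has carrier */
	uint8_t actor_oper_port_state; /* LACP actor state octet */
};

/* Ports counted when lacp_strict is off: anything with link. */
static int model_active_ports(const struct model_slave *s, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (s[i].link_up)
			count++;
	return count;
}

/* Ports counted when lacp_strict is on: Collecting and Distributing. */
static int model_usable_ports(const struct model_slave *s, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if ((s[i].actor_oper_port_state & MODEL_STATE_COLLECTING) &&
		    (s[i].actor_oper_port_state & MODEL_STATE_DISTRIBUTING))
			count++;
	return count;
}

/* Carrier is asserted only when enough ports qualify. */
static bool model_bond_carrier(const struct model_slave *s, size_t n,
			       int min_links, bool lacp_strict)
{
	int count = lacp_strict ? model_usable_ports(s, n)
				: model_active_ports(s, n);

	return count >= min_links;
}
```

Two slaves with link but without negotiated LACP state yield carrier up with
lacp_strict off and carrier down with lacp_strict on, which is the behavior
difference the patch introduces.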



* [PATCH net v5 0/4] bonding: 3ad: fix carrier state with no usable slaves
From: Louis Scalbert @ 2026-05-06 16:11 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, jonas.gorski, Louis Scalbert

Hi everyone,

This series addresses a blackholing issue and a subsequent link-flapping
issue in the 802.3ad bonding driver when dealing with inactive slaves
and the `min_links` parameter.

When an 802.3ad (LACP) bonding interface has no slaves in the
collecting/distributing state, the bonding master still reports
carrier as up as long as at least 'min_links' slaves have carrier.

In this situation, only one slave is effectively used for TX/RX,
while traffic received on other slaves is dropped. Upper-layer
daemons therefore consider the interface operational, even though
traffic may be blackholed if the lack of LACP negotiation means
the partner is not ready to deal with traffic.

This patchset introduces an optional behavior, widely adopted across
the industry, to address this issue. It consists of bringing the
bonding master interface down to signal to upper-layer processes
that it is not usable.

This patchset depends on the following iproute2 change:
ip/bond: add lacp_strict support

Patch 1 introduces the lacp_strict configuration knob, which is
applied in the subsequent patch. The default (off) mode preserves
the existing behavior, while the strict mode (on) is intended to force
the bonding master carrier down in this situation.

Patch 2 addresses the core issue when lacp_strict is set to on.
It ensures that carrier is asserted only when at least 'min_links'
slaves are in the Collecting/Distributing state.

Patch 3 fixes a side effect of the second patch. Tightening the carrier
logic exposes a state persistence bug: when a physical link goes down,
the LACP collecting/distributing flags remain set. When the link returns,
the interface briefly behaves as if it were ready, bounces the carrier
up, and then drops it again once LACP renegotiation starts. Fix by
resetting the Synchronization, Collecting, and Distributing states as
soon as the link goes down.

Patch 4 adds a selftest covering both lacp_strict modes.


Changelog:

v4 -> v5
  - Patch 4: replace the use of netem, which is not included in the
    bonding selftests configuration. Instead, use a dedicated netns to
    forward frames between the partner and DUT. The partner and DUT are
    bridged within that netns. Since Linux bridges do not forward LACP
    special frames by default, group_fwd_mask is configured on bridge
    interfaces to allow them.
  Link: https://lore.kernel.org/netdev/20260417140505.3860237-1-louis.scalbert@6wind.com/

v3 -> v4
  - Rename the configuration knob to lacp_strict on/off instead of
    lacp_fallback legacy/strict.
  - Patch 1: change the command documentation accordingly and wrap
    text at approximately 75 columns.
  - Use "usable" wording instead of "valid" for LACP Collecting /
    Distributing state in code and commit log.
  - Patch 2: test collecting and distributing state regardless of
    coupled_control
  - Patch 3: Reworked because removing the SELECTED flag was not
    compliant with 802.1AX. Instead, transition to the WAITING state
    when the port is disabled, except when already in the DETACHED
    state, and clear the Collecting and Distributing states in the
    WAITING state.
  - Patch 4 is removed. It was a fix for patch 3 but is no longer
    needed since patch 3 was reworked.
  Link: https://lore.kernel.org/netdev/20260408152353.276204-1-louis.scalbert@6wind.com/

v2 -> v3
  - Add an initial patch introducing the lacp_fallback configuration
    knob (no behavior change yet).
  - Patch 2 (was patch 1 in v2): apply the new behavior only when
    lacp_fallback is set to strict, and re-evaluate the bonding
    master carrier when the setting changes.
  Link: https://lore.kernel.org/netdev/20260325134439.3048615-1-louis.scalbert@6wind.com/

v1 -> v2
  - Patch 1: split a comment line that exceeded 80 characters.
  - Move the change from patch 2 in __agg_ports_are_ready() into a third
    patch, as it is actually a side effect of the fix introduced in
    patch 2.
  - Patch 2: Expand the commit message and add a code comment describing
    the change in ad_port_selection_logic().
  - Patch 3: Check the READY_N flag only on ports in the WAITING state,
    rather than on all enabled ports. This more closely matches 802.3ad.
  Link: https://lore.kernel.org/netdev/20260316131838.3257889-1-louis.scalbert@6wind.com/

Louis Scalbert (4):
  bonding: 3ad: add lacp_strict configuration knob
  bonding: 3ad: fix carrier when no usable slaves
  bonding: 3ad: fix mux port state on oper down
  selftests: bonding: add test for lacp_strict mode

 Documentation/networking/bonding.rst          |  23 ++
 drivers/net/bonding/bond_3ad.c                |  28 +-
 drivers/net/bonding/bond_main.c               |   1 +
 drivers/net/bonding/bond_netlink.c            |  16 +
 drivers/net/bonding/bond_options.c            |  27 ++
 include/net/bond_options.h                    |   1 +
 include/net/bonding.h                         |   1 +
 include/uapi/linux/if_link.h                  |   1 +
 .../selftests/drivers/net/bonding/Makefile    |   1 +
 .../drivers/net/bonding/bond_lacp_strict.sh   | 347 ++++++++++++++++++
 10 files changed, 445 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_strict.sh

-- 
2.39.2



* [PATCH ipsec-next] xfrm: Use regular error handling instead of BUG_ON() in the netlink API.
From: Steffen Klassert @ 2026-05-06 16:08 UTC (permalink / raw)
  To: netdev; +Cc: devel

The xfrm netlink API has used BUG_ON() on failures since it was
introduced. However, all these errors are non-critical and can be
handled with regular error handling. This fixes machine crashes
in situations where an emergency brake is not needed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_user.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d56450f61669..b24a0f9e91d5 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1734,7 +1734,10 @@ static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -ENOMEM;
 
 	err = build_spdinfo(r_skb, net, sportid, seq, *flags);
-	BUG_ON(err < 0);
+	if (err < 0) {
+		kfree_skb(r_skb);
+		return err;
+	}
 
 	return nlmsg_unicast(xfrm_net_nlsk(net, skb), r_skb, sportid);
 }
@@ -1794,7 +1797,11 @@ static int xfrm_get_sadinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -ENOMEM;
 
 	err = build_sadinfo(r_skb, net, sportid, seq, *flags);
-	BUG_ON(err < 0);
+	if (err < 0) {
+		kfree_skb(r_skb);
+		return err;
+	}
+
 
 	return nlmsg_unicast(xfrm_net_nlsk(net, skb), r_skb, sportid);
 }
@@ -3285,7 +3292,10 @@ static int xfrm_send_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
 
 	/* build migrate */
 	err = build_migrate(skb, m, num_migrate, k, sel, encap, dir, type);
-	BUG_ON(err < 0);
+	if (err < 0) {
+		kfree_skb(skb);
+		return err;
+	}
 
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_MIGRATE);
 }
@@ -3623,7 +3633,10 @@ static int xfrm_aevent_state_notify(struct xfrm_state *x, const struct km_event
 		return -ENOMEM;
 
 	err = build_aevent(skb, x, c);
-	BUG_ON(err < 0);
+	if (err < 0) {
+		kfree_skb(skb);
+		return err;
+	}
 
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_AEVENTS);
 }
@@ -3862,7 +3875,10 @@ static int xfrm_send_acquire(struct xfrm_state *x, struct xfrm_tmpl *xt,
 		return -ENOMEM;
 
 	err = build_acquire(skb, x, xt, xp);
-	BUG_ON(err < 0);
+	if (err < 0) {
+		kfree_skb(skb);
+		return err;
+	}
 
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_ACQUIRE);
 }
@@ -3984,7 +4000,10 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
 		return -ENOMEM;
 
 	err = build_polexpire(skb, xp, dir, c);
-	BUG_ON(err < 0);
+	if (err < 0) {
+		kfree_skb(skb);
+		return err;
+	}
 
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
 }
@@ -4151,7 +4170,8 @@ static int xfrm_send_report(struct net *net, u8 proto,
 		return -ENOMEM;
 
 	err = build_report(skb, proto, sel, addr);
-	BUG_ON(err < 0);
+	if (err < 0)
+		return err;
 
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_REPORT);
 }
@@ -4206,7 +4226,10 @@ static int xfrm_send_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
 		return -ENOMEM;
 
 	err = build_mapping(skb, x, ipaddr, sport);
-	BUG_ON(err < 0);
+	if (err < 0) {
+		kfree_skb(skb);
+		return err;
+	}
 
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_MAPPING);
 }
-- 
2.43.0
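
The pattern applied throughout this patch (allocate, build, and on a build
failure release the buffer and return the error instead of crashing) can be
sketched in user space. Everything below is hypothetical illustration:
malloc()/free() stand in for the skb allocation and kfree_skb(), and
model_build() stands in for the build_*() helpers. Note that every other
converted call site frees the skb before returning, while the
xfrm_send_report() hunk as posted returns err directly; the sketch frees on
every error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct model_msg {
	size_t cap;   /* bytes available in data */
	size_t len;   /* bytes used */
	char *data;
};

/* Counts buffers that are allocated and not yet released. */
static int model_live_buffers;

/* Stand-in for a build_*() helper: fails when the payload does not fit. */
static int model_build(struct model_msg *msg, const char *payload)
{
	size_t need = strlen(payload);

	if (need > msg->cap)
		return -EMSGSIZE;
	memcpy(msg->data, payload, need);
	msg->len = need;
	return 0;
}

/* Allocate, build, send. A build failure is reported to the caller
 * after the buffer has been released, rather than aborting.
 */
static int model_send(const char *payload, size_t cap)
{
	struct model_msg msg = { .cap = cap };
	int err;

	msg.data = malloc(cap ? cap : 1);
	if (!msg.data)
		return -ENOMEM;
	model_live_buffers++;

	err = model_build(&msg, payload);
	if (err < 0) {
		free(msg.data);
		model_live_buffers--;
		return err;
	}

	/* Stand-in for the transport, which consumes the buffer. */
	free(msg.data);
	model_live_buffers--;
	return 0;
}
```

The live-buffer counter exists only so that a caller can check that neither
the success path nor the error path leaves an allocation behind.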



* [PATCH iproute2-next] tc/class: use rtnl_dump_request_n() in tc_class_list()
From: Eric Dumazet @ 2026-05-06 16:06 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Eric Dumazet, Jamal Hadi Salim

strace tc -s class sh dev eth1 parent 2:

strace is fooled by the way rtnl_dump_request() builds the request:

sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000},
        msg_namelen=12,
	msg_iov=[{iov_base={nlmsg_len=36,
			nlmsg_type=RTM_GETTCLASS,
			nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP,
			nlmsg_seq=1778082626, nlmsg_pid=0},
			iov_len=16},
		{iov_base={nlmsg_len=0, nlmsg_type=NLMSG_NOOP, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=131072},
		        iov_len=20}],
		msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 36

With rtnl_dump_request_n() we get instead:

sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000},
	msg_namelen=12,
	msg_iov=[{iov_base=[{nlmsg_len=36,
			nlmsg_type=RTM_GETTCLASS,
			nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP,
			nlmsg_seq=1778082790, nlmsg_pid=0},
		{tcm_family=AF_UNSPEC,
		 tcm_ifindex=if_nametoindex("eth1"), tcm_handle=0, tcm_parent=131072, tcm_info=0}],
		iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36

This will also permit future attribute additions, like skip stats.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
---
 tc/tc_class.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/tc/tc_class.c b/tc/tc_class.c
index 310514ce560ab6c1b42665cb6c75314e45b84cd6..82f0f85ee39a335bc7e1c3bea2e3ea9571ea155f 100644
--- a/tc/tc_class.c
+++ b/tc/tc_class.c
@@ -386,7 +386,14 @@ int print_class(struct nlmsghdr *n, void *arg)
 
 static int tc_class_list(int argc, char **argv)
 {
-	struct tcmsg t = { .tcm_family = AF_UNSPEC };
+	struct {
+		struct nlmsghdr n;
+		struct tcmsg t;
+	} req = {
+		.n.nlmsg_type = RTM_GETTCLASS,
+		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.t.tcm_family = AF_UNSPEC,
+	};
 	char d[IFNAMSIZ] = {};
 	char buf[1024] = {0};
 
@@ -412,20 +419,20 @@ static int tc_class_list(int argc, char **argv)
 			if (get_tc_classid(&filter_classid, *argv))
 				invarg("invalid class ID", *argv);
 		} else if (strcmp(*argv, "root") == 0) {
-			if (t.tcm_parent) {
+			if (req.t.tcm_parent) {
 				fprintf(stderr, "Error: \"root\" is duplicate parent ID\n");
 				return -1;
 			}
-			t.tcm_parent = TC_H_ROOT;
+			req.t.tcm_parent = TC_H_ROOT;
 		} else if (strcmp(*argv, "parent") == 0) {
 			__u32 handle;
 
-			if (t.tcm_parent)
+			if (req.t.tcm_parent)
 				duparg("parent", *argv);
 			NEXT_ARG();
 			if (get_tc_classid(&handle, *argv))
 				invarg("invalid parent ID", *argv);
-			t.tcm_parent = handle;
+			req.t.tcm_parent = handle;
 		} else if (matches(*argv, "help") == 0) {
 			usage();
 		} else {
@@ -437,13 +444,13 @@ static int tc_class_list(int argc, char **argv)
 	}
 
 	if (d[0]) {
-		t.tcm_ifindex = ll_name_to_index(d);
-		if (!t.tcm_ifindex)
+		req.t.tcm_ifindex = ll_name_to_index(d);
+		if (!req.t.tcm_ifindex)
 			return -nodev(d);
-		filter_ifindex = t.tcm_ifindex;
+		filter_ifindex = req.t.tcm_ifindex;
 	}
 
-	if (rtnl_dump_request(&rth, RTM_GETTCLASS, &t, sizeof(t)) < 0) {
+	if (rtnl_dump_request_n(&rth, &req.n) < 0) {
 		perror("Cannot send dump request");
 		return 1;
 	}
-- 
2.54.0.545.g6539524ca2-goog
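
The request layout used by the new code can be checked with a small
stand-alone sketch that relies only on the Linux UAPI headers. The struct
and function names are hypothetical, and the flags are set by hand here
because the sketch does not call into libnetlink. It shows why the header
and the tcmsg now form one contiguous 36-byte message, and that the value
131072 which strace printed as nlmsg_pid in the first trace is tcm_parent
for class ID "2:" (0x20000), since both fields sit at offset 12 of their
structures.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Header and payload in one object, as in the patched tc_class_list(). */
struct model_class_dump_req {
	struct nlmsghdr n;
	struct tcmsg t;
};

/* Build an RTM_GETTCLASS dump request for one device and parent. */
static struct model_class_dump_req model_build_class_dump(int ifindex,
							  __u32 parent)
{
	struct model_class_dump_req req;

	memset(&req, 0, sizeof(req));
	req.n.nlmsg_type = RTM_GETTCLASS;
	req.n.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
	req.t.tcm_family = AF_UNSPEC;
	req.t.tcm_ifindex = ifindex;
	req.t.tcm_parent = parent;
	return req;
}
```

Because the payload follows the header in the same buffer, a single iovec of
nlmsg_len bytes describes the whole message, which is what lets strace
decode the tcmsg fields.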



* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Xilin Wu @ 2026-05-06 16:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Daniel Thompson, Alex Elder, andrew+netdev, davem, edumazet, kuba,
	pabeni, maxime.chevallier, rmk+kernel, andersson, konradybcio,
	robh, krzk+dt, conor+dt, linusw, brgl, arnd, gregkh, mohd.anwar,
	a0987203069, alexandre.torgue, ast, boon.khai.ng, chenchuangyu,
	chenhuacai, daniel, hawk, hkallweit1, inochiama, john.fastabend,
	julianbraha, livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <a8b5d96b-3c8f-4ec5-b205-bbacafbf47aa@lunn.ch>

On 5/6/2026 11:56 PM, Andrew Lunn wrote:
>>> Does that mean you don't get phy interrupts reported in /proc/interrupts
>>> before any suspend happens?
>>>
>>
>> No. The phy works in polling mode AFAIK.
> 
> You should be able to tell from dmesg:
> 
> Generic FE-GE Realtek PHY r8169-0-600:00: attached PHY driver (mii_bus:phy_addr=r8169-0-600:00, irq=MAC)
> 
> irq= can be MAC, POLL, or the interrupt number if interrupts are used.
> 
> If PHY WoL is used, i would expect interrupts to be used, otherwise
> how is the PHY waking the system?
> 
>       Andrew
> 

Yeah I know, it's indeed POLL. The phy is now waking the system using 
the INTN_WOL pin directly connected to the SoC GPIO, instead of the INTN 
pin connected to the QPS615.

-- 
Best regards,
Xilin Wu <sophon@radxa.com>


* Re: assert in phylink.c with lan7801 and dp83tc811 since kernel 6.18
From: Sven Schuchmann @ 2026-05-06 16:00 UTC (permalink / raw)
  To: Andrew Lunn, Maxime Chevallier; +Cc: netdev@vger.kernel.org
In-Reply-To: <eba96f21-78e6-42bf-938a-6d4037d0a305@lunn.ch>

Hi,

> > From: Andrew Lunn <andrew@lunn.ch>
> > Sent: Wednesday, 06 May 2026 15:18
> > To: Maxime Chevallier <maxime.chevallier@bootlin.com>
> > Cc: Sven Schuchmann <schuchmann@schleissheimer.de>; netdev@vger.kernel.org <netdev@vger.kernel.org>
> > Subject: Re: assert in phylink.c with lan7801 and dp83tc811 since kernel 6.18
> 
> On Wed, May 06, 2026 at 03:04:42PM +0200, Maxime Chevallier wrote:
> > On 06/05/2026 14:39, Andrew Lunn wrote:
> > >> So for me it somewhere happens in phylink_validate_mac_and_pcs()
> > > I was expecting to see more debug output, but reading the code, i was
> > > also thinking we need to look at phylink_validate_mac_and_pcs().
> > > You are going in the correct direction putting lots of printk() in the
> > > code. We need to find where it returns EINVAL:
> > [...]
> > > In the end, we are probably going to find that what the MAC says its
> > > capabilities are don't match what the PCS says it can do.
> > > Thinking about that, it says phy mode RGMII. You generally don't use a
> > > PCS with RGMII, so that is suspicious.

> > Another thing to consider is that in this case, we're attaching to a
> > dp83tc811, so a 100BaseT1 PHY. Does the PHY driver correctly populate
> > the supported features?

> Using RGMII to connect to a 100BaseT is also odd. It does happen, but
> it is a bit of a corner case, and something could be going wrong here.

What I can at least say is that it was working with kernel 6.12 on
the same hardware. But I do not know whether the supported features are
populated correctly.

I added some debug code to the end of phylink_validate_mac_and_pcs():

	/* Then validate the link parameters with the MAC */
	if (pl->mac_ops->mac_get_caps)
		capabilities = pl->mac_ops->mac_get_caps(pl->config,
							 state->interface);
	else
		capabilities = pl->config->mac_capabilities;

	phylink_validate_mask_caps(supported, state, capabilities);

	phylink_dbg(pl, "phy: --3.4 supported    0x%x\n", supported);
	phylink_dbg(pl, "phy: --3.4 capabilities 0x%x\n", capabilities);

	int retval = phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
	phylink_dbg(pl, "phy: --3.5-- %d\n", retval);
	return retval;
}

and the output I get is:
[    2.771602] lan78xx 1-1.4:1.0 (unnamed net_device) (uninitialized): phy: --3.4 supported    0x80cf3430
[    2.771607] lan78xx 1-1.4:1.0 (unnamed net_device) (uninitialized): phy: --3.4 capabilities 0xbf
[    2.771611] lan78xx 1-1.4:1.0 (unnamed net_device) (uninitialized): phy: --3.5-- -22
[    2.771616] lan78xx 1-1.4:1.0 (unnamed net_device) (uninitialized): validation of rgmii-id with support 00000000,00000000,00000000,00006280 and advertisement 00000000,00000000,00000000,00006280 failed: -EINVAL

But I do not get what phylink_is_empty_linkmode() is doing...

Regards,

   Sven
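
As a rough illustration of what phylink_is_empty_linkmode() checks, here is
a user-space sketch. It assumes the helper treats a link mode bitmap as
empty when nothing is left after ignoring the port-type, Autoneg, Pause and
Asym_Pause bits; that description and all names below are assumptions, not
the kernel code. Only the low 64 bits are modelled, with bit positions taken
from the ethtool link mode indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the ethtool link mode indices (low word only). */
#define MODEL_BIT(n)        (UINT64_C(1) << (n))
#define MODEL_100BASET_FULL MODEL_BIT(3)
#define MODEL_AUTONEG       MODEL_BIT(6)
#define MODEL_TP            MODEL_BIT(7)
#define MODEL_AUI           MODEL_BIT(8)
#define MODEL_MII           MODEL_BIT(9)
#define MODEL_FIBRE         MODEL_BIT(10)
#define MODEL_BNC           MODEL_BIT(11)
#define MODEL_PAUSE         MODEL_BIT(13)
#define MODEL_ASYM_PAUSE    MODEL_BIT(14)
#define MODEL_BACKPLANE     MODEL_BIT(16)

/* True when the mask carries no speed/duplex mode at all. */
static bool model_is_empty_linkmode(uint64_t linkmode)
{
	const uint64_t non_speed = MODEL_AUTONEG | MODEL_TP | MODEL_AUI |
				   MODEL_MII | MODEL_FIBRE | MODEL_BNC |
				   MODEL_PAUSE | MODEL_ASYM_PAUSE |
				   MODEL_BACKPLANE;

	return (linkmode & ~non_speed) == 0;
}
```

Under that assumption, the support mask 0x6280 from the failing log line
decodes to TP, MII, Pause and Asym_Pause only, so no speed remains after
validation and the result is -EINVAL. The 0x80cf3430 value printed with
0x%x is probably the address of the bitmap rather than its contents, if
supported is a pointer at that point.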


* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Andrew Lunn @ 2026-05-06 15:56 UTC (permalink / raw)
  To: Xilin Wu
  Cc: Daniel Thompson, Alex Elder, andrew+netdev, davem, edumazet, kuba,
	pabeni, maxime.chevallier, rmk+kernel, andersson, konradybcio,
	robh, krzk+dt, conor+dt, linusw, brgl, arnd, gregkh, mohd.anwar,
	a0987203069, alexandre.torgue, ast, boon.khai.ng, chenchuangyu,
	chenhuacai, daniel, hawk, hkallweit1, inochiama, john.fastabend,
	julianbraha, livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <549BE66F62F1470F+a489d4fd-10ab-484c-9b55-6aecfd05d383@radxa.com>

> > Does that mean you don't get phy interrupts reported in /proc/interrupts
> > before any suspend happens?
> > 
> 
> No. The phy works in polling mode AFAIK.

You should be able to tell from dmesg:

Generic FE-GE Realtek PHY r8169-0-600:00: attached PHY driver (mii_bus:phy_addr=r8169-0-600:00, irq=MAC)

irq= can be MAC, POLL, or the interrupt number if interrupts are used.

If PHY WoL is used, i would expect interrupts to be used, otherwise
how is the PHY waking the system?

     Andrew


* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
From: Michael S. Tsirkin @ 2026-05-06 15:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stefano Garzarella, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, Arseniy Krasnov,
	Stefan Hajnoczi, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm,
	virtualization
In-Reply-To: <CANn89iKWhJp2N-60FC4JpX2Dw0sK8-vk2WrHkDOnkgC_c8nbHA@mail.gmail.com>

On Wed, May 06, 2026 at 08:38:39AM -0700, Eric Dumazet wrote:
> On Wed, May 6, 2026 at 8:15 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> > > There is always a discrepancy between skb->len and skb->truesize.
> > > You will not be able to announce a 1MB window, and accept one million
> > > skb of 1-byte each.
> >
> > We can if we copy.
> 
> You mean, ignore VIRTIO_VSOCK_SEQ_EOM?

No, I mean saving it in a compact form.
If packet boundaries are the only concern, the overhead
needn't exceed 50% even for 1-byte messages. ASN.1 BER, for
an over-engineered example?
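
To make the 50% figure concrete, here is a minimal sketch of storing
messages back to back with a BER-style definite length prefix. It is not a
proposal for the vsock code and all names are hypothetical; it only
illustrates the arithmetic: lengths below 128 take one prefix byte, so a
1-byte message occupies 2 bytes, half of which is framing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a BER-style definite length.
 * Returns the number of bytes written, or 0 if there is no room.
 */
static size_t model_put_len(uint8_t *dst, size_t room, uint32_t len)
{
	uint8_t tmp[4];
	size_t n = 0;

	if (len < 0x80) {
		/* Short form: the length itself in one byte. */
		if (room < 1)
			return 0;
		dst[0] = (uint8_t)len;
		return 1;
	}

	/* Long form: 0x80 | count, then the length big-endian. */
	while (len) {
		tmp[n++] = (uint8_t)(len & 0xff);
		len >>= 8;
	}
	if (room < n + 1)
		return 0;
	dst[0] = (uint8_t)(0x80 | n);
	for (size_t i = 0; i < n; i++)
		dst[1 + i] = tmp[n - 1 - i];
	return n + 1;
}

/* Append one message after the 'used' bytes already in buf.
 * Returns the new used count, or SIZE_MAX if it does not fit.
 */
static size_t model_append_msg(uint8_t *buf, size_t cap, size_t used,
			       const void *msg, uint32_t len)
{
	size_t hdr;

	if (used > cap)
		return SIZE_MAX;
	hdr = model_put_len(buf + used, cap - used, len);
	if (hdr == 0 || cap - used - hdr < len)
		return SIZE_MAX;
	memcpy(buf + used + hdr, msg, len);
	return used + hdr + len;
}
```

Four 1-byte messages appended this way use 8 bytes in total, and message
boundaries can be recovered by walking the length prefixes.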



* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Xilin Wu @ 2026-05-06 15:44 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Andrew Lunn, Alex Elder, andrew+netdev, davem, edumazet, kuba,
	pabeni, maxime.chevallier, rmk+kernel, andersson, konradybcio,
	robh, krzk+dt, conor+dt, linusw, brgl, arnd, gregkh, mohd.anwar,
	a0987203069, alexandre.torgue, ast, boon.khai.ng, chenchuangyu,
	chenhuacai, daniel, hawk, hkallweit1, inochiama, john.fastabend,
	julianbraha, livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <aftgorkah-Hjrvq2@aspen.lan>

On 5/6/2026 11:39 PM, Daniel Thompson wrote:
> On Wed, May 06, 2026 at 10:35:18PM +0800, Xilin Wu wrote:
>> On 5/6/2026 10:19 PM, Andrew Lunn wrote:
>>> On Wed, May 06, 2026 at 08:59:01PM +0800, Xilin Wu wrote:
>>>> On 5/1/2026 11:54 PM, Alex Elder wrote:
>>>>> +	/* AXI Configuration */
>>>>> +	axi = &td->axi;
>>>>> +	axi->axi_lpi_en = 1;
>>>>> +	axi->axi_wr_osr_lmt = 31;
>>>>> +	axi->axi_rd_osr_lmt = 31;
>>>>> +	/* All sizes (2^2..2^8) are supported */
>>>>> +	axi->axi_blen_regval = DMA_AXI_BLEN_MASK;
>>>>> +	plat->axi = axi;
>>>>> +
>>>>> +	plat->mac_port_sel_speed = speed;
>>>>> +	plat->flags = STMMAC_FLAG_MULTI_MSI_EN | STMMAC_FLAG_TSO_EN;
>>>>
>>>> I got WoL working only after adding STMMAC_FLAG_USE_PHY_WOL here. I guess
>>>> it's required, since the driver clocks down the MAC/PMA/XPCS in its suspend
>>>> hook?
>>>
>>> Nice to see somebody testing WoL.
>>>
>>> In your testing, is it the PHY doing the WoL, or the MAC? I assume
>>> PHY.
>>>
>>> If i remember the DT correctly, the PHY interrupt is connected to a
>>> SoC GPIO, not a GPIO of this chip. So for your board, it is the SoCs
>>> GPIO controllers ability to perform the wake which is
>>> important. However, where the PHY interrupt is connected is a board
>>> design issue. Could the PHY interrupt be connected to the chip? Would
>>> the chip be able to wake the system? Should STMMAC_FLAG_USE_PHY_WOL be
>>> conditional?
>>
>> Yes, the PHY is doing the WoL. And I guess this makes sense as it allows the
>> MAC to power down during suspend to save power.
>>
>> The INTN pin of QCA8081 is connected to the ETH_0_INT_N of QPS615. And the
>> INTN_WOL pin is connected to a SoC GPIO.
> 
> Interesting. That is different to RB3gen2 where INTN is routed to both
> (although there is a do-not-fit 0ohm resistor option that could change
> that).
> 
> Does that mean you don't get phy interrupts reported in /proc/interrupts
> before any suspend happens?
> 

No. The phy works in polling mode AFAIK.

> 
>> Without this change, I can't get WoL to work. I have a working branch for
>> our board here:
>> https://github.com/strongtz/linux-radxa-qcom/commits/v7.0.2-8280-wip/
> 
> I took a quick look at the DT and I noticed you have an SGMII PHY
> attached to both eMAC0 and eMAC1 on your board. This is something we
> think should work but were unable to test. Are you able to use both
> eMACs concurrently? Would be great to see that confirmed!
> 
> 
> Daniel.
> 

Yes, both eMACs can be used concurrently. And they can reach 2.5Gbps 
under iperf3 testing.

-- 
Best regards,
Xilin Wu <sophon@radxa.com>


* Fw: [Bug 221470] New: ARP reply received at MAC level but neighbor entry not resolved (Linux 5.15, automotive BSP) for vlan interface
From: Stephen Hemminger @ 2026-05-06 15:43 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Wed, 06 May 2026 05:29:26 +0000
From: bugzilla-daemon@kernel.org
To: stephen@networkplumber.org
Subject: [Bug 221470] New: ARP reply received at MAC level but neighbor entry not resolved (Linux 5.15, automotive BSP) for vlan interface


https://bugzilla.kernel.org/show_bug.cgi?id=221470

            Bug ID: 221470
           Summary: ARP reply received at MAC level but neighbor entry not
                    resolved (Linux 5.15, automotive BSP) for vlan
                    interface
           Product: Networking
           Version: 2.5
          Hardware: ARM
                OS: Linux
            Status: NEW
          Severity: high
          Priority: P3
         Component: IPV4
          Assignee: stephen@networkplumber.org
          Reporter: rajkumar.veer@harman.com
        Regression: No

Observed behavior: -
During a system suspend–resume cycle, ARP requests are transmitted successfully
from the target. Corresponding ARP replies are continuously received at the
EMAC/MAC level, as confirmed through driver debug logs and hardware RX
statistics.
However, the ARP entry is not being resolved in the kernel neighbor table (ip
neigh), resulting in failure of communication on a specific VLAN interface.
This issue is sporadic and observed only on a particular VLAN interface.
Recovery currently requires a full reboot of the system to restore proper ARP
functionality for that particular VLAN interface.

Environment: -
Kernel version: Linux 5.15.x (vanilla, with automotive BSP integrations)
Architecture: ARM64, Exynos V920 SoC
Ethernet MAC driver: Sxgmac EMAC controller
PHY: MACsec Marvell PHY / fixed PHY switch
Network stack: mainline Linux networking stack

What we verified: -
ARP reply frame is valid and well formed.
Destination MAC matches the interface MAC.
No RX errors or CRC issues at the hardware level.
ARP reply is visible via driver-level logging but not via tcpdump.
Neighbor entry remains in INCOMPLETE state.

Questions: -
Are there any known issues in Linux 5.15 where ARP replies are dropped before
reaching the neighbor subsystem?
Could RX offloading, VLAN filtering, checksum offload, or NAPI GRO/LRO affect
ARP processing in this scenario?
Has anyone encountered similar issues on automotive BSPs or embedded Ethernet
MAC drivers?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.


* Re: [PATCH net-next v6 0/2] selftests: openvswitch: add pop_vlan test
From: Aaron Conole @ 2026-05-06 15:39 UTC (permalink / raw)
  To: Minxi Hou
  Cc: netdev, echaudro, i.maximets, davem, edumazet, kuba, pabeni,
	horms, shuah, dev, linux-kselftest, linux-kernel
In-Reply-To: <20260506131218.1880852-1-houminxi@gmail.com>

Hi Minxi,

Minxi Hou <houminxi@gmail.com> writes:

> Add test_pop_vlan() to verify OVS kernel datapath pop_vlan action
> correctly strips 802.1Q VLAN tags from frames.
>
> Patch 1 extends ovs-dpctl.py with vlan(vid=X,pcp=Y,cfi=Z) formatting
> and parsing, plus an encap_ovskey subclass for safe ENCAP NLA decoding.
> It changes OVS_KEY_ATTR_VLAN type from uint16 to be16 to match
> the kernel __be16 wire format.
> It also adds push_vlan action support (parse/format with range
> validation) and removes the unnecessary MAX_ENCAP_DEPTH limit.
> Patch 2 adds the selftest using purely ping-based verification with
> a push_vlan return flow for symmetric bidirectional testing.
>
> Tested with vng on x86_64, all OVS selftests pass (including new
> test_pop_vlan).
>
> v6:
>   - fix non-ASCII characters (em dashes) in comments and commit
>     messages

I just commented on v5, and I think those comments still apply.

In the future, if you respin a series that hasn't gotten comments yet,
it would be preferable to reply to that series saying that you will
respin, so we can make sure to always reply to the latest version.

> v5: https://lore.kernel.org/netdev/20260505124957.1239812-1-houminxi@gmail.com/
>   - add push_vlan action class, dpstr format and parse with range
>     validation (vid 0-4095, pcp 0-7, tpid 0-0xFFFF, CFI forced to 1)
>   - remove MAX_ENCAP_DEPTH constant and depth tracking (bracket-depth
>     counter in encap parser already handles nesting)
>   - remove start_capture/stop_capture helpers and tcpdump/pcap
>     verification -- use ping success/failure instead
>   - remove modprobe/netns pre-flight checks (other tests don't do this)
>   - remove ethtool VLAN offload disable (unnecessary for veth)
>   - add push_vlan return flow for symmetric bidirectional ping
>   - use ovs_sbx wrapper for ping commands (consistent with siblings)
> v4: https://lore.kernel.org/netdev/20260504123713.555461-1-houminxi@gmail.com/
>   - fix all checkpatch line-length warnings in new code
>   - fix pylint W0707: use explicit exception chaining (from exc)
> v3: https://lore.kernel.org/netdev/20260503120946.51869-1-houminxi@gmail.com/
>   - encap_ovskey: MPLS type "ovs_key_mpls" -> "array(ovs_key_mpls)"
>   - encap_ovskey: PRIORITY/IN_PORT set to "none" (metadata, not in ENCAP)
>   - _vlan_dpstr: cfi=0 falls back to tci=0x%04x for round-trip safety
>   - encap parse(): check return value for unrecognized trailing content
>   - vlan parser: boundary check + raise-from for exception chaining
>   - start_capture: || return $? to propagate ksft_skip correctly
>   - on_exit: moved after resource creation, not before
>   - ping success: changed from NOTE to FAIL + return 1
>   - VLAN interface creation: added || return 1 error propagation
>   - netns probe: distinguish EEXIST from missing CONFIG_NET_NS
>   - sbx_add: || return $ksft_skip -> || return $? (match sibling tests)
> v2: https://lore.kernel.org/netdev/20260501133924.3100680-1-houminxi@gmail.com/
>
> Minxi Hou (2):
>   selftests: openvswitch: add vlan() and encap() flow string parsing
>   selftests: openvswitch: add pop_vlan test
>
>  .../selftests/net/openvswitch/openvswitch.sh  |  77 +++++
>  .../selftests/net/openvswitch/ovs-dpctl.py    | 322 +++++++++++++++++-
>  2 files changed, 389 insertions(+), 10 deletions(-)
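
For readers following the vid/pcp/cfi discussion in the quoted cover letter,
here is a minimal Python sketch of the 802.1Q TCI bit layout it relies on
(VID in bits 0-11, CFI in bit 12, PCP in bits 13-15), with the same range
checks the changelog lists. The helper names pack_tci() and unpack_tci() are
illustrative only and are not taken from ovs-dpctl.py.

```python
import struct

VLAN_CFI_MASK = 0x1000  # bit 12, forced to 1 as the cover letter describes


def pack_tci(vid, pcp=0):
    """Build a 16-bit TCI from vid and pcp, always setting the CFI bit."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("vid=%d out of range (0-4095)" % vid)
    if not 0 <= pcp <= 7:
        raise ValueError("pcp=%d out of range (0-7)" % pcp)
    return (pcp << 13) | VLAN_CFI_MASK | vid


def unpack_tci(tci):
    """Split a 16-bit TCI into (vid, pcp, cfi)."""
    return tci & 0x0FFF, (tci >> 13) & 0x7, (tci >> 12) & 0x1


def tci_to_wire(tci):
    """Serialize the TCI in network byte order, like a __be16 field."""
    return struct.pack("!H", tci)


if __name__ == "__main__":
    tci = pack_tci(100, pcp=3)
    print("tci=0x%04x" % tci)          # 0x7064
    print(unpack_tci(tci))             # (100, 3, 1)
    print(tci_to_wire(tci).hex())      # 7064
```

The "!H" format is what makes the serialized value big-endian regardless of
host byte order, which is the point of the uint16 to be16 change mentioned
above.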


^ permalink raw reply

* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Daniel Thompson @ 2026-05-06 15:39 UTC (permalink / raw)
  To: Xilin Wu
  Cc: Andrew Lunn, Alex Elder, andrew+netdev, davem, edumazet, kuba,
	pabeni, maxime.chevallier, rmk+kernel, andersson, konradybcio,
	robh, krzk+dt, conor+dt, linusw, brgl, arnd, gregkh, mohd.anwar,
	a0987203069, alexandre.torgue, ast, boon.khai.ng, chenchuangyu,
	chenhuacai, daniel, hawk, hkallweit1, inochiama, john.fastabend,
	julianbraha, livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <4C0D95BC59F1A4ED+53f3be85-2cdd-4058-8950-57970027d481@radxa.com>

On Wed, May 06, 2026 at 10:35:18PM +0800, Xilin Wu wrote:
> On 5/6/2026 10:19 PM, Andrew Lunn wrote:
> > On Wed, May 06, 2026 at 08:59:01PM +0800, Xilin Wu wrote:
> > > On 5/1/2026 11:54 PM, Alex Elder wrote:
> > > > +	/* AXI Configuration */
> > > > +	axi = &td->axi;
> > > > +	axi->axi_lpi_en = 1;
> > > > +	axi->axi_wr_osr_lmt = 31;
> > > > +	axi->axi_rd_osr_lmt = 31;
> > > > +	/* All sizes (2^2..2^8) are supported */
> > > > +	axi->axi_blen_regval = DMA_AXI_BLEN_MASK;
> > > > +	plat->axi = axi;
> > > > +
> > > > +	plat->mac_port_sel_speed = speed;
> > > > +	plat->flags = STMMAC_FLAG_MULTI_MSI_EN | STMMAC_FLAG_TSO_EN;
> > >
> > > I got WoL working only after adding STMMAC_FLAG_USE_PHY_WOL here. I guess
> > > it's required, since the driver clocks down the MAC/PMA/XPCS in its suspend
> > > hook?
> >
> > Nice to see somebody testing WoL.
> >
> > In your testing, is it the PHY doing the WoL, or the MAC? I assume
> > PHY.
> >
> > If i remember the DT correctly, the PHY interrupt is connected to a
> > SoC GPIO, not a GPIO of this chip. So for your board, it is the SoCs
> > GPIO controllers ability to perform the wake which is
> > important. However, where the PHY interrupt is connected is a board
> > design issue. Could the PHY interrupt be connected to the chip? Would
> > the chip be able to wake the system? Should STMMAC_FLAG_USE_PHY_WOL be
> > conditional?
>
> Yes, the PHY is doing the WoL. And I guess this makes sense as it allows the
> MAC to power down during suspend to save power.
>
> The INTN pin of QCA8081 is connected to the ETH_0_INT_N of QPS615. And the
> INTN_WOL pin is connected to a SoC GPIO.

Interesting. That is different to RB3gen2 where INTN is routed to both
(although there is a do-not-fit 0ohm resistor option that could change
that).

Does that mean you don't get phy interrupts reported in /proc/interrupts
before any suspend happens?


> Without this change, I can't get WoL to work. I have a working branch for
> our board here:
> https://github.com/strongtz/linux-radxa-qcom/commits/v7.0.2-8280-wip/

I took a quick look at the DT and I noticed you have an SGMII PHY
attached to both eMAC0 and eMAC1 on your board. This is something we
think should work but were unable to test. Are you able to use both
eMACs concurrently? Would be great to see that confirmed!


Daniel.

^ permalink raw reply

* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
From: Eric Dumazet @ 2026-05-06 15:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefano Garzarella, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet, Arseniy Krasnov,
	Stefan Hajnoczi, Jason Wang, Xuan Zhuo, Eugenio Pérez, kvm,
	virtualization
In-Reply-To: <20260506111507-mutt-send-email-mst@kernel.org>

On Wed, May 6, 2026 at 8:15 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> > There is always a discrepancy between skb->len and skb->truesize.
> > You will not be able to announce a 1MB window, and accept one million
> > skb of 1-byte each.
>
> We can if we copy.

You mean, ignore VIRTIO_VSOCK_SEQ_EOM?

^ permalink raw reply

* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Xilin Wu @ 2026-05-06 15:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Alex Elder, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, rmk+kernel, andersson, konradybcio, robh,
	krzk+dt, conor+dt, linusw, brgl, arnd, gregkh, Daniel Thompson,
	mohd.anwar, a0987203069, alexandre.torgue, ast, boon.khai.ng,
	chenchuangyu, chenhuacai, daniel, hawk, hkallweit1, inochiama,
	john.fastabend, julianbraha, livelycarpet87, matthew.gerlach,
	mcoquelin.stm32, me, prabhakar.mahadev-lad.rj, richardcochran,
	rohan.g.thomas, sdf, siyanteng, weishangjuan, wens, netdev, bpf,
	linux-arm-msm, devicetree, linux-gpio, linux-stm32,
	linux-arm-kernel, linux-kernel
In-Reply-To: <2af0fee3-d3d6-4434-847f-3fd2fbb841d3@lunn.ch>

On 5/6/2026 10:45 PM, Andrew Lunn wrote:
>> Hi Andrew,
>>
>> Yes, the PHY is doing the WoL. And I guess this makes sense as it allows the
>> MAC to power down during suspend to save power.
>>
>> The INTN pin of QCA8081 is connected to the ETH_0_INT_N of QPS615. And the
>> INTN_WOL pin is connected to a SoC GPIO.
>>
>> Without this change, I can't get WoL to work. I have a working branch for
>> our board here:
>> https://github.com/strongtz/linux-radxa-qcom/commits/v7.0.2-8280-wip/
> 
> Please take a look at commit
> 
> commit 6911308d7d111a9c367293b52f2dc265819f2b60
> Author: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Date:   Thu Oct 23 10:16:50 2025 +0100
> 
>      net: stmmac: convert to phylink-managed Wake-on-Lan
> 
> In particular:
> 
>      When STMMAC_FLAG_USE_PHY_WOL is not set, we provide the MAC's WoL
>      capabilities to phylink, which then allows phylink to choose between
>      the PHY and MAC for WoL depending on their individual capabilities
>      as described in the phylink commit. This only augments the WoL
>      functionality with PHYs that declare to the driver model that they are
>      wake-up capable. Currently, very few PHY drivers support this.
>      
> Could you actually patch the PHY driver to make it list its
> capabilities. That is the direction we want to go in the long term,
> and not use STMMAC_FLAG_USE_PHY_WOL.
> 
>      Andrew
> 

Thanks for pointing this out! You are right that I should patch the PHY 
driver. I have made WoL work without the change in the tc956x driver.

-- 
Best regards,
Xilin Wu <sophon@radxa.com>


^ permalink raw reply

* Re: [PATCH net] vsock/virtio: fix potential unbounded skb queue
From: Michael S. Tsirkin @ 2026-05-06 15:37 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Eric Dumazet, Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, Arseniy Krasnov, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, kvm, virtualization
In-Reply-To: <afoF_cHfl6ygcupM@sgarzare-redhat>

On Tue, May 05, 2026 at 06:11:13PM +0200, Stefano Garzarella wrote:
> On Tue, May 05, 2026 at 07:14:36AM -0700, Eric Dumazet wrote:
> > On Tue, May 5, 2026 at 6:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > 
> > > On Thu, Apr 30, 2026 at 12:26:52PM +0000, Eric Dumazet wrote:
> > > >virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.
> > > >
> > > >virtio_transport_recv_enqueue() skips coalescing for packets
> > > >with VIRTIO_VSOCK_SEQ_EOM.
> > > >
> > > >If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
> > > >a very large number of packets can be queued
> > > >because vvs->rx_bytes stays at 0.
> > > >
> > > >Fix this by estimating the skb metadata size:
> > > >
> > > >       (Number of skbs in the queue) * SKB_TRUESIZE(0)
> > > >
> > > >Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
> > > >Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > >Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
> > > >Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > > >Cc: Stefano Garzarella <sgarzare@redhat.com>
> > > >Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > >Cc: Jason Wang <jasowang@redhat.com>
> > > >Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >Cc: "Eugenio Pérez" <eperezma@redhat.com>
> > > >Cc: kvm@vger.kernel.org
> > > >Cc: virtualization@lists.linux.dev
> > > >---
> > > > net/vmw_vsock/virtio_transport_common.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > >index 416d533f493d7b07e9c77c43f741d28cfcd0953e..9b8014516f4fb1130ae184635fbba4dfee58bd64 100644
> > > >--- a/net/vmw_vsock/virtio_transport_common.c
> > > >+++ b/net/vmw_vsock/virtio_transport_common.c
> > > >@@ -447,7 +447,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> > > >                                       u32 len)
> > > > {
> > > >-      if (vvs->buf_used + len > vvs->buf_alloc)
> > > >+      u64 skb_overhead = (skb_queue_len(&vvs->rx_queue) + 1) * SKB_TRUESIZE(0);
> > > >+
> > > >+      if (skb_overhead + vvs->buf_used + len > vvs->buf_alloc)
> > > >               return false;
> > > 
> > > I'm not sure about this fix, I mean that maybe this is incomplete.
> > > In virtio-vsock, there is a credit mechanism between the two peers:
> > > https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4850003
> > > 
> > > This takes only the payload into account, so it’s true that this problem
> > > exists; however, perhaps we should also inform the other peer of a lower
> > > credit balance, otherwise the other peer will believe it has much more
> > > credit than it actually does, send a large payload, and then the packet
> > > will be discarded and the data lost (there are no retransmissions,
> > > etc.).
> > 
> > I dunno, perhaps revert 077706165717 ("virtio/vsock: don't use skbuff
> > state to account credit")
> > and find a better fix then?
> 
> IIRC the same issue was there before the commit fixed by that one (commit
> 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")), so
> not sure about reverting it TBH.
> 
> CCing Arseniy and Bobby.
> 
> > 
> > There is always a discrepancy between skb->len and skb->truesize.
> > > You will not be able to announce a 1MB window, and accept one million
> > skb of 1-byte each.
> > 
> > This kind of contract is broken.
> > 
> 
> Yep, I agree, but before we start discarding data (and losing it), IMHO we
> should at least inform the other peer that we're out of space.
> 
> @Stefan, @Michael, do you think we can do something in the spec to avoid
> this issue and in some way take into account also the metadata in the
> credit. I mean to avoid the 1-byte packets flooding.
> 
> Thanks,
> Stefano

Why do we need the metadata? Just don't keep it around if you begin
running low on memory.

-- 
MST
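
To make the accounting under discussion concrete, here is a small Python
model of the two admission checks from the quoted patch: the payload-only
check and the one that adds an estimated per-skb overhead. It is only a
sketch. SKB_OVERHEAD is a placeholder for SKB_TRUESIZE(0), whose real value
depends on kernel configuration and architecture, and count_accepted() is a
hypothetical helper that does not model coalescing or credit updates.

```python
SKB_OVERHEAD = 512  # placeholder for SKB_TRUESIZE(0), not the real value


def old_check(queue_len, buf_used, length, buf_alloc):
    """Payload-only accounting: queue_len is ignored."""
    return buf_used + length <= buf_alloc


def new_check(queue_len, buf_used, length, buf_alloc):
    """Accounting that also charges estimated metadata per queued skb."""
    skb_overhead = (queue_len + 1) * SKB_OVERHEAD
    return skb_overhead + buf_used + length <= buf_alloc


def count_accepted(check, buf_alloc, pkt_len, limit):
    """Enqueue pkt_len-byte packets until the check rejects or limit is hit."""
    queue_len = 0
    buf_used = 0
    while queue_len < limit and check(queue_len, buf_used, pkt_len, buf_alloc):
        queue_len += 1
        buf_used += pkt_len
    return queue_len


if __name__ == "__main__":
    buf_alloc = 256 * 1024
    # Zero-length packets: the payload-only check never says no.
    print(count_accepted(old_check, buf_alloc, 0, limit=100000))
    # With the overhead estimate the queue is bounded.
    print(count_accepted(new_check, buf_alloc, 0, limit=100000))
    print(count_accepted(new_check, buf_alloc, 1, limit=100000))
```

The model also shows Stefano's concern: with 1-byte packets the second check
starts rejecting long before the advertised buf_alloc worth of payload has
arrived, so a peer relying on payload-only credit would see data dropped.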


^ permalink raw reply

* Re: [PATCH net-next v5 2/2] selftests: openvswitch: add pop_vlan test
From: Aaron Conole @ 2026-05-06 15:36 UTC (permalink / raw)
  To: Minxi Hou
  Cc: netdev, echaudro, i.maximets, davem, edumazet, kuba, pabeni,
	horms, shuah, dev, linux-kselftest, linux-kernel
In-Reply-To: <20260505124957.1239812-3-houminxi@gmail.com>

Minxi Hou <houminxi@gmail.com> writes:

> Add test_pop_vlan() to verify OVS kernel datapath pop_vlan action
> correctly strips 802.1Q VLAN tags from frames.
>
> Test structure:
> - Baseline: untagged forwarding validates basic connectivity.
> - Negative: forward without pop_vlan, tagged frame is invisible
>   to ns2 (no VLAN sub-interface), ping fails.
> - Positive: pop_vlan strips tag on forward path, push_vlan
>   restores tag on return path, ping succeeds.
>
> Use static ARP entries to avoid VLAN-tagged ARP complexity.
> Rely on ping success/failure for verification — no tcpdump or
> pcap files needed.
>
> Signed-off-by: Minxi Hou <houminxi@gmail.com>
> ---

Thanks for adding this.  I'm still a little unclear about the explicit
modprobe for 8021q.  Is it really needed?  I thought the request to add
a vlan tagged interface should auto-load that module (unless it is
blacklisted or something).  I guess this is an attempt to short-circuit
the skip, but maybe it would be better to configure an interface and if
that fails, then either fail the test or skip the test.


^ permalink raw reply

* Re: [PATCH v4 0/7] landlock: Add UDP access control support
From: Günther Noack @ 2026-05-06 15:33 UTC (permalink / raw)
  To: Matthieu Buffet
  Cc: Mickaël Salaün, linux-security-module, Mikhail Ivanov,
	konstantin.meskhidze, Tingmao Wang, netdev
In-Reply-To: <20260502124306.3975990-1-matthieu@buffet.re>

Hello!

Thanks for sending another revision!

On Sat, May 02, 2026 at 02:42:59PM +0200, Matthieu Buffet wrote:
> This is V4 of UDP access control in Landlock. Thanks to the round of
> review of v3, access rights have changed to something that seems easier
> to use and understand. It adds only two access rights, to restrict
> configuring local and remote addresses on UDP sockets. The one that
> restricts setting a remote address also controls sending datagrams to
> explicit remote addresses -ignoring any remote address preset on the
> socket-. The one that restricts binding to a local port also applies
> when the kernel auto-binds an ephemeral port.
> v1:
> Link: https://lore.kernel.org/all/20240916122230.114800-1-matthieu@buffet.re/
> v2:
> Link: https://lore.kernel.org/all/20241214184540.3835222-1-matthieu@buffet.re/
> v3:
> Link: https://lore.kernel.org/all/20251212163704.142301-1-matthieu@buffet.re/
> 
> The limitation around allowing a process to send but not receive is
> still there, and could warrant another patch if there is a real user
> need.
> I'm just not super happy about the clarity of logs generated for denied
> autobinds ("domain=xxxxxx blockers=net.bind_udp"), due to the fact that
> addresses and ports are currently only logged if they are non-0. A later
> (coordinated LSM-wide) patch could improve readability by replacing != 0
> checks with new booleans in struct lsm_network_audit. I'm also not
> exactly happy with the integration in existing TCP selftests, but
> refactoring them has already been discussed earlier.
> 
> Changes v1->v2
> ==============
> - recvmsg hook is gone and sendmsg hook doesn't apply when sending to a
>   remote address pre-set on socket, to improve performance
> - don't add a get_addr_port() helper function, which required a weird
>   "am I in IPv4 or IPv6 context"
> - reorder hook prologue for consistency: check domain, then type and
>   family
> 
> Changes v2->v3
> ==============
> - removed support for sending datagrams with explicit destination
>   address of family AF_UNSPEC, which allowed to bypass restrictions with
>   a race condition
> - rebased on linux-mic/next => add support for auditing
> - fixed mistake in selftests when using unspec_srv variables, which were
>   implicitly of type SOCK_STREAM and did not actually test UDP code
> - add tests for IPPROTO_IP
> - improved docs, split off TCP-related refactoring
> 
> Changes v3->v4
> ==============
> - merge LANDLOCK_ACCESS_NET_CONNECT_UDP and
>   LANDLOCK_ACCESS_NET_SENDTO_UDP into
>   LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP (everything that might set the
>   destination of a datagram)

I wish the name could be more in-line with
LANDLOCK_ACCESS_FS_RESOLVE_UNIX, but since this does not need
resolving any more, "resolve" in the name would be confusing.  I also
failed to come up with a better name for this access right.


> - make LANDLOCK_ACCESS_NET_BIND_UDP apply when kernel is about to
>   auto-bind an ephemeral port for the caller. Block it if policy would
>   not allow an explicit call to bind(0)
> - only deny sending AF_UNSPEC datagrams on IPv6 sockets, where there is
>   a risk of the address family changing midway
> 
> Patch is based on https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git
> 3457a5ccacd3 ("landlock: Document fallocate(2) as another truncation corner case")
> All lines added are covered with selftests, except the "default: return
> 0" in current_check_autobind_udp_socket() which is not currently
> reachable (net.c goes from 92.9%->94.6% line coverage).
> 
> Let me know what you think!
> 
> Closes: https://github.com/landlock-lsm/linux/issues/10
> 
> Matthieu Buffet (7):
>   landlock: Add UDP bind() access control
>   landlock: Add UDP connect() access control
>   landlock: Add UDP send access control

For the final revision, I think it would be good to squash the two
commits that are about LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP.  That
reduces the chances that someone backports the first but not the
second to one of the distribution kernels.


>   selftests/landlock: Add UDP bind/connect tests
>   selftests/landlock: Add tests for sendmsg()
>   samples/landlock: Add sandboxer UDP access control
>   landlock: Add documentation for UDP support
> 
>  Documentation/userspace-api/landlock.rst     |   89 +-
>  include/uapi/linux/landlock.h                |   35 +-
>  samples/landlock/sandboxer.c                 |   40 +-
>  security/landlock/audit.c                    |    3 +
>  security/landlock/limits.h                   |    2 +-
>  security/landlock/net.c                      |  161 ++-
>  security/landlock/syscalls.c                 |    2 +-
>  tools/testing/selftests/landlock/base_test.c |    4 +-
>  tools/testing/selftests/landlock/net_test.c  | 1146 ++++++++++++++++--
>  9 files changed, 1341 insertions(+), 141 deletions(-)
> 
> 
> base-commit: 3457a5ccacd34fdd5ebd3a4745e721b5a1239690
> -- 
> 2.39.5
> 

—Günther

^ permalink raw reply

* Re: [PATCH net-next v5 1/2] selftests: openvswitch: add vlan() and encap() flow string parsing
From: Aaron Conole @ 2026-05-06 15:33 UTC (permalink / raw)
  To: Minxi Hou
  Cc: netdev, echaudro, i.maximets, davem, edumazet, kuba, pabeni,
	horms, shuah, dev, linux-kselftest, linux-kernel
In-Reply-To: <20260505124957.1239812-2-houminxi@gmail.com>

Minxi Hou <houminxi@gmail.com> writes:

> Add VLAN TCI formatting and parsing support to ovs-dpctl.py:
>
> - Add _vlan_dpstr() to decompose TCI into vid/pcp/cfi fields,
>   with raw tci=0x%04x fallback when cfi=0 for round-trip safety.
> - Add _parse_vlan_from_flowstr() boundary check for missing ')'.
> - Add encap_ovskey subclass restricting nla_map to L2-L4 attributes
>   (slots 0-21) that appear inside 802.1Q ENCAP, with metadata
>   attributes set to "none".
> - Check parse() return value for unrecognized trailing content.
> - Support callable format functions in dpstr() output.
> - Add push_vlan action class with fields matching kernel struct
>   ovs_action_push_vlan (vlan_tpid, vlan_tci as network-order u16).
> - Add push_vlan dpstr format and parse with range validation
>   (vid 0-4095, pcp 0-7, tpid 0-0xFFFF) and CFI forced to 1.
> - Remove MAX_ENCAP_DEPTH constant and depth tracking — the
>   bracket-depth counter in the encap parser already handles
>   nesting; the global depth limit was unnecessary.
>
> Signed-off-by: Minxi Hou <houminxi@gmail.com>
> ---

It's worth noting that there are pylint errors introduced with this
patch.  HOWEVER, they are related to things like pascal case (where
ovs-dpctl.py prefers snake_case), and using explicit string formatting
vs f-string.  I think it is okay to ignore these errors since it keeps
the file consistent, and it may be worth cleaning some of those up
(missing docstring and f-string conversion) at another time.

>  .../selftests/net/openvswitch/ovs-dpctl.py    | 322 +++++++++++++++++-
>  1 file changed, 312 insertions(+), 10 deletions(-)
>
> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> index 848f61fdcee0..50551d4fa7c7 100644
> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> @@ -370,7 +370,7 @@ class ovsactions(nla):
>          ("OVS_ACTION_ATTR_OUTPUT", "uint32"),
>          ("OVS_ACTION_ATTR_USERSPACE", "userspace"),
>          ("OVS_ACTION_ATTR_SET", "ovskey"),
> -        ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
> +        ("OVS_ACTION_ATTR_PUSH_VLAN", "push_vlan"),
>          ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
>          ("OVS_ACTION_ATTR_SAMPLE", "sample"),
>          ("OVS_ACTION_ATTR_RECIRC", "uint32"),
> @@ -427,6 +427,9 @@ class ovsactions(nla):
>  
>              return actstr
>  
> +    class push_vlan(nla):
> +        fields = (("vlan_tpid", "!H"), ("vlan_tci", "!H"))
> +
>      class sample(nla):
>          nla_flags = NLA_F_NESTED
>  
> @@ -633,6 +636,14 @@ class ovsactions(nla):
>                  print_str += "ct_clear"
>              elif field[0] == "OVS_ACTION_ATTR_POP_VLAN":
>                  print_str += "pop_vlan"
> +            elif field[0] == "OVS_ACTION_ATTR_PUSH_VLAN":
> +                datum = self.get_attr(field[0])
> +                tpid = datum["vlan_tpid"]
> +                tci = datum["vlan_tci"]
> +                vid = tci & 0x0FFF
> +                pcp = (tci >> 13) & 0x7
> +                print_str += "push_vlan(vid=%d,pcp=%d" \
> +                    ",tpid=0x%04x)" % (vid, pcp, tpid)
>              elif field[0] == "OVS_ACTION_ATTR_POP_ETH":
>                  print_str += "pop_eth"
>              elif field[0] == "OVS_ACTION_ATTR_POP_NSH":
> @@ -726,7 +737,57 @@ class ovsactions(nla):
>                      actstr = actstr[strspn(actstr, ", ") :]
>                      parsed = True
>  
> -            if parse_starts_block(actstr, "clone(", False):
> +            if parse_starts_block(actstr, "push_vlan(", False):
> +                actstr = actstr[len("push_vlan("):]
> +                vid = 0
> +                pcp = 0
> +                tpid = 0x8100
> +                if ")" not in actstr:
> +                    raise ValueError(
> +                        "push_vlan: missing ')'")
> +                paren = actstr.index(")")
> +                if not actstr[:paren].strip():
> +                    raise ValueError("push_vlan: no fields")
> +                for kv in actstr[:paren].split(","):
> +                    if "=" not in kv:
> +                        raise ValueError(
> +                            "push_vlan: bad field '%s'"
> +                            % kv.strip())
> +                    k = kv[:kv.index("=")].strip()
> +                    v = kv[kv.index("=") + 1:].strip()
> +                    if k == "vid":
> +                        vid = int(v, 0)
> +                        if vid < 0 or vid > 0xFFF:
> +                            raise ValueError(
> +                                "push_vlan: vid=%d out of "
> +                                "range (0-4095)" % vid)
> +                    elif k == "pcp":
> +                        pcp = int(v, 0)
> +                        if pcp < 0 or pcp > 7:
> +                            raise ValueError(
> +                                "push_vlan: pcp=%d out of "
> +                                "range (0-7)" % pcp)
> +                    elif k == "tpid":
> +                        tpid = int(v, 0)
> +                        if tpid < 0 or tpid > 0xFFFF:
> +                            raise ValueError(
> +                                "push_vlan: tpid=0x%x out "
> +                                "of range (0-0xffff)" % tpid)
> +                    else:
> +                        raise ValueError(
> +                            "push_vlan: unknown key '%s'"
> +                            % k)
> +                tci = (vid & 0x0FFF) | ((pcp & 0x7) << 13) \
> +                    | 0x1000
> +                pvact = self.push_vlan()
> +                pvact["vlan_tpid"] = tpid
> +                pvact["vlan_tci"] = tci
> +                self["attrs"].append(
> +                    ["OVS_ACTION_ATTR_PUSH_VLAN", pvact])
> +                actstr = actstr[paren + 1:]
> +                parsed = True
> +
> +            elif parse_starts_block(actstr, "clone(", False):
>                  parencount += 1
>                  subacts = ovsactions()
>                  actstr = actstr[len("clone("):]
> @@ -901,11 +962,11 @@ class ovskey(nla):
>      nla_flags = NLA_F_NESTED
>      nla_map = (
>          ("OVS_KEY_ATTR_UNSPEC", "none"),
> -        ("OVS_KEY_ATTR_ENCAP", "none"),
> +        ("OVS_KEY_ATTR_ENCAP", "encap_ovskey"),
>          ("OVS_KEY_ATTR_PRIORITY", "uint32"),
>          ("OVS_KEY_ATTR_IN_PORT", "uint32"),
>          ("OVS_KEY_ATTR_ETHERNET", "ethaddr"),
> -        ("OVS_KEY_ATTR_VLAN", "uint16"),
> +        ("OVS_KEY_ATTR_VLAN", "be16"),
>          ("OVS_KEY_ATTR_ETHERTYPE", "be16"),
>          ("OVS_KEY_ATTR_IPV4", "ovs_key_ipv4"),
>          ("OVS_KEY_ATTR_IPV6", "ovs_key_ipv6"),
> @@ -1636,6 +1697,194 @@ class ovskey(nla):
>      class ovs_key_mpls(nla):
>          fields = (("lse", ">I"),)
>  
> +    # 802.1Q CFI (Canonical Format Indicator) bit, always set for Ethernet
> +    _VLAN_CFI_MASK = 0x1000
> +
> +    @staticmethod
> +    def _vlan_dpstr(tci):
> +        """Format VLAN TCI as vid=X,pcp=Y,cfi=Z or tci=0xNNNN.
> +
> +        When cfi=1 (standard Ethernet VLAN), outputs decomposed
> +        vid/pcp/cfi fields. When cfi=0 (truncated VLAN header),
> +        falls back to raw tci=0x%04x to ensure round-trip
> +        correctness: the parser auto-adds cfi=1 for vid/pcp
> +        format, so cfi=0 would be lost on re-parse."""
> +        vid = tci & 0x0FFF
> +        pcp = (tci >> 13) & 0x7
> +        cfi = (tci >> 12) & 0x1
> +        if cfi:
> +            return "vid=%d,pcp=%d,cfi=%d" % (vid, pcp, cfi)
> +        return "tci=0x%04x" % tci
> +
> +    @staticmethod
> +    def _parse_vlan_from_flowstr(flowstr):
> +        """Parse vlan(tci=X) or vlan(vid=X[,pcp=Y,cfi=Z]) from flowstr.
> +
> +        Returns (remaining_flowstr, key_tci, mask_tci).
> +        TCI values use standard bit layout (VID bits 0-11,
> +        CFI bit 12, PCP bits 13-15); byte order conversion to
> +        big-endian happens in pyroute2 be16 NLA serialization.
> +        The mask covers only the fields the caller specified:
> +        vid -> 0x0FFF, pcp -> 0xE000, cfi -> 0x1000, tci -> 0xFFFF.
> +
> +        The tci= key sets the raw TCI bitfield (no CFI validation) to allow
> +        non-Ethernet use cases.  Use cfi=1 for standard Ethernet VLAN matching.
> +        """
> +        tci = 0
> +        mask = 0
> +        has_tci = False
> +        has_vid = has_pcp = has_cfi = False
> +        _tci_mix_err = "vlan(): 'tci' cannot be mixed " \
> +                       "with 'vid'/'pcp'/'cfi'"
> +        first = True
> +        while True:
> +            flowstr = flowstr.lstrip()
> +            if not flowstr:
> +                raise ValueError("vlan(): missing ')'")
> +            if flowstr[0] == ')':
> +                break
> +            if not first:
> +                flowstr = flowstr[1:]  # skip ','
> +                if not flowstr:
> +                    raise ValueError("vlan(): missing ')' after trailing comma")
> +                flowstr = flowstr.lstrip()
> +                if flowstr and flowstr[0] == ')':
> +                    break
> +                if flowstr and flowstr[0] == ',':
> +                    raise ValueError(
> +                        "vlan(): empty or extra comma in field list")
> +            first = False
> +
> +            eq = flowstr.find('=')
> +            if eq == -1:
> +                raise ValueError(
> +                    "vlan(): expected key=value, got '%s'" % flowstr)
> +            key = flowstr[:eq].strip()
> +            flowstr = flowstr[eq + 1:]
> +
> +            end = flowstr.find(',')
> +            end2 = flowstr.find(')')
> +            if end == -1 and end2 == -1:
> +                raise ValueError("vlan(): missing ')'")
> +            if end == -1 or (end2 != -1 and end2 < end):
> +                end = end2
> +            val = flowstr[:end].strip()
> +            flowstr = flowstr[end:]
> +
> +            if not val:
> +                raise ValueError("vlan(): empty value for key '%s'" % key)
> +            try:
> +                v = int(val, 16) if val.startswith(('0x', '0X')) else int(val)
> +            except ValueError as exc:
> +                raise ValueError(
> +                    "vlan(): invalid value '%s' for key '%s'"
> +                    % (val, key)) from exc
> +
> +            if key == 'tci':
> +                if has_tci:
> +                    raise ValueError("vlan(): duplicate 'tci'")
> +                if has_vid or has_pcp or has_cfi:
> +                    raise ValueError(_tci_mix_err)
> +                if v > 0xFFFF or v < 0:
> +                    raise ValueError("vlan(): tci=0x%x out of range" % v)
> +                tci = v
> +                mask = 0xFFFF
> +                has_tci = True
> +            elif key == 'vid':
> +                if has_tci:
> +                    raise ValueError(_tci_mix_err)
> +                if has_vid:
> +                    raise ValueError("vlan(): duplicate 'vid'")
> +                if v < 0 or v > 0xFFF:
> +                    raise ValueError("vlan(): vid=%d out of range (0-4095)" % v)
> +                tci |= v
> +                mask |= 0x0FFF
> +                has_vid = True
> +            elif key == 'pcp':
> +                if has_tci:
> +                    raise ValueError(_tci_mix_err)
> +                if has_pcp:
> +                    raise ValueError("vlan(): duplicate 'pcp'")
> +                if v < 0 or v > 7:
> +                    raise ValueError("vlan(): pcp=%d out of range (0-7)" % v)
> +                tci |= (v & 0x7) << 13
> +                mask |= 0xE000
> +                has_pcp = True
> +            elif key == 'cfi':
> +                if has_tci:
> +                    raise ValueError(_tci_mix_err)
> +                if has_cfi:
> +                    raise ValueError("vlan(): duplicate 'cfi'")
> +                if v != 1:
> +                    raise ValueError("vlan(): cfi must be 1 for Ethernet")
> +                tci |= ovskey._VLAN_CFI_MASK
> +                mask |= ovskey._VLAN_CFI_MASK
> +                has_cfi = True
> +            else:
> +                raise ValueError("vlan(): unknown key '%s'" % key)
> +
> +        flowstr = flowstr[1:]  # skip ')'
> +        # Catch immediate '))' (user error).  A ')' after ',' is consumed
> +        # by parse()'s strspn(flowstr, "), ") inter-field separator stripping.
> +        if flowstr.lstrip().startswith(')'):
> +            raise ValueError("vlan(): unmatched ')'")
> +        # parse() strips trailing ',', ')', ' ' as inter-field separators,
> +        # so we do not need to call strspn here.
> +
> +        if mask == 0:
> +            raise ValueError("vlan(): no fields specified, "
> +                             "use vlan(vid=X[,pcp=Y,cfi=Z]) or vlan(tci=X)")
> +        if not has_tci:
> +            tci |= ovskey._VLAN_CFI_MASK
> +            mask |= ovskey._VLAN_CFI_MASK
> +        return flowstr, tci, mask
> +
> +    @staticmethod
> +    def _parse_encap_from_flowstr(flowstr):
> +        """Parse encap(inner_flow) from flowstr.
> +
> +        Returns (remaining_flowstr, inner_key_dict, inner_mask_dict)
> +        where each dict has an 'attrs' key for recursive NLA encoding.
> +        Parenthesis-depth tracking handles nested encap() calls but not
> +        quoted strings containing literal parentheses.
> +        """
> +        depth = 1
> +        end = -1
> +        for i, c in enumerate(flowstr):
> +            if c == '(':
> +                depth += 1
> +            elif c == ')':
> +                depth -= 1
> +                if depth < 0:
> +                    raise ValueError(
> +                        "encap(): unmatched ')' at position %d" % i)
> +                if depth == 0:
> +                    end = i
> +                    break
> +
> +        if end == -1:
> +            if depth > 1:
> +                raise ValueError("encap(): missing ')' at end")
> +            raise ValueError("encap(): missing closing ')'")
> +
> +        inner_str = flowstr[:end].strip()
> +        if not inner_str:
> +            raise ValueError("encap(): empty inner flow")
> +
> +        flowstr = flowstr[end + 1:]
> +        if flowstr.lstrip().startswith(')'):
> +            raise ValueError("encap(): unmatched ')' after encap()")
> +
> +        inner_key = encap_ovskey()
> +        inner_mask = encap_ovskey()
> +        remaining = inner_key.parse(inner_str, inner_mask)
> +        if remaining and re.search(r'[^\s,)]', remaining):
> +            raise ValueError(
> +                "encap(): unrecognized trailing "
> +                "content '%s'" % remaining.strip())
> +
> +        return flowstr, inner_key, inner_mask
> +
>      def parse(self, flowstr, mask=None):
>          for field in (
>              ("OVS_KEY_ATTR_PRIORITY", "skb_priority", intparse),
> @@ -1657,6 +1906,16 @@ class ovskey(nla):
>                  "eth_type",
>                  lambda x: intparse(x, "0xffff"),
>              ),
> +            (
> +                "OVS_KEY_ATTR_VLAN",
> +                "vlan",
> +                ovskey._parse_vlan_from_flowstr,
> +            ),
> +            (
> +                "OVS_KEY_ATTR_ENCAP",
> +                "encap",
> +                ovskey._parse_encap_from_flowstr,
> +            ),
>              (
>                  "OVS_KEY_ATTR_IPV4",
>                  "ipv4",
> @@ -1794,6 +2053,9 @@ class ovskey(nla):
>                  True,
>              ),
>              ("OVS_KEY_ATTR_ETHERNET", None, None, False, False),
> +            ("OVS_KEY_ATTR_VLAN", "vlan", ovskey._vlan_dpstr,
> +                lambda x: False, True),
> +            ("OVS_KEY_ATTR_ENCAP", None, None, False, False),
>              (
>                  "OVS_KEY_ATTR_ETHERTYPE",
>                  "eth_type",
> @@ -1821,22 +2083,61 @@ class ovskey(nla):
>              v = self.get_attr(field[0])
>              if v is not None:
>                  m = None if mask is None else mask.get_attr(field[0])
> +                fmt = field[2]  # str format or callable
>                  if field[4] is False:
>                      print_str += v.dpstr(m, more)
>                      print_str += ","
>                  else:
>                      if m is None or field[3](m):
> -                        print_str += field[1] + "("
> -                        print_str += field[2] % v
> -                        print_str += "),"
> +                        val = fmt(v) if callable(fmt) else fmt % v
> +                        print_str += field[1] + "(" + val + "),"
>                      elif more or m != 0:
> -                        print_str += field[1] + "("
> -                        print_str += (field[2] % v) + "/" + (field[2] % m)
> -                        print_str += "),"
> +                        if callable(fmt):
> +                            val = fmt(v) + "/" + fmt(m)
> +                        else:
> +                            val = (fmt % v) + "/" + (fmt % m)
> +                        print_str += field[1] + "(" + val + "),"
>  
>          return print_str
>  
>  
> +class encap_ovskey(ovskey):
> +    """Inner flow key attributes valid inside 802.1Q ENCAP.
> +
> +    Only L2-L4 key attributes (slots 0-21) appear inside ENCAP.
> +    Metadata-only attributes (SKB_MARK, DP_HASH, RECIRC_ID, etc.)
> +    are set to "none" — they never appear inside ENCAP per
> +    ovs_nla_put_vlan() in net/openvswitch/flow_netlink.c.
> +
> +    nla_map indexes must match OVS_KEY_ATTR_* enum values in
> +    include/uapi/linux/openvswitch.h.
> +    """
> +    nla_map = (

I was thinking that we might be able to use something like:

  nla_map = ovskey.nla_map

But the comment clearly describes why we have this separate structure.
That said:

> +        ("OVS_KEY_ATTR_UNSPEC", "none"),       # 0
> +        ("OVS_KEY_ATTR_ENCAP", "none"),        # 1 — placeholder, no recursion
> +        ("OVS_KEY_ATTR_PRIORITY", "none"),       # 2 — skb metadata, not in ENCAP
> +        ("OVS_KEY_ATTR_IN_PORT", "none"),       # 3 — skb metadata, not in ENCAP

I don't think the '# <num>' comments are useful.  It may be useful to
just flag the entries that differ from the ovskey base type, without
the number (it looks like excess noise).

Otherwise, looks good to me.
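
For illustration only, here is a minimal sketch of how the inner map
could be derived from a base map instead of being listed by hand. It is
not part of the patch. BASE_NLA_MAP, NOT_IN_ENCAP and derive_encap_map()
are hypothetical names, and the base type strings are placeholders
rather than the real ovskey entries. A real version would also have to
stop at the MPLS slot and override any type that differs from the base.

```python
# Hypothetical sketch: BASE_NLA_MAP stands in for the leading entries of
# ovskey.nla_map. The type strings are illustrative placeholders, not
# copied from ovs-dpctl.py.
BASE_NLA_MAP = (
    ("OVS_KEY_ATTR_UNSPEC", "none"),
    ("OVS_KEY_ATTR_ENCAP", "none"),
    ("OVS_KEY_ATTR_PRIORITY", "uint32"),
    ("OVS_KEY_ATTR_IN_PORT", "uint32"),
    ("OVS_KEY_ATTR_ETHERNET", "ethaddr"),
    ("OVS_KEY_ATTR_VLAN", "be16"),
)

# Attributes the quoted patch marks as never appearing inside ENCAP.
NOT_IN_ENCAP = frozenset({
    "OVS_KEY_ATTR_ENCAP",
    "OVS_KEY_ATTR_PRIORITY",
    "OVS_KEY_ATTR_IN_PORT",
    "OVS_KEY_ATTR_SKB_MARK",
    "OVS_KEY_ATTR_TUNNEL",
    "OVS_KEY_ATTR_DP_HASH",
    "OVS_KEY_ATTR_RECIRC_ID",
})


def derive_encap_map(base, excluded):
    """Copy base in order, forcing each excluded attribute to "none"."""
    return tuple(
        (name, "none" if name in excluded else nla_type)
        for name, nla_type in base
    )


# Order is preserved, so each index still matches its enum value.
ENCAP_NLA_MAP = derive_encap_map(BASE_NLA_MAP, NOT_IN_ENCAP)
```

Whether this is clearer than the explicit table is a judgment call; the
explicit table has the advantage that every slot is visible at a glance.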

> +        ("OVS_KEY_ATTR_ETHERNET", "ethaddr"),   # 4
> +        ("OVS_KEY_ATTR_VLAN", "be16"),          # 5
> +        ("OVS_KEY_ATTR_ETHERTYPE", "be16"),     # 6
> +        ("OVS_KEY_ATTR_IPV4", "ovs_key_ipv4"),  # 7
> +        ("OVS_KEY_ATTR_IPV6", "ovs_key_ipv6"),  # 8
> +        ("OVS_KEY_ATTR_TCP", "ovs_key_tcp"),    # 9
> +        ("OVS_KEY_ATTR_UDP", "ovs_key_udp"),    # 10
> +        ("OVS_KEY_ATTR_ICMP", "ovs_key_icmp"),  # 11
> +        ("OVS_KEY_ATTR_ICMPV6", "ovs_key_icmpv6"),  # 12
> +        ("OVS_KEY_ATTR_ARP", "ovs_key_arp"),    # 13
> +        ("OVS_KEY_ATTR_ND", "ovs_key_nd"),      # 14
> +        ("OVS_KEY_ATTR_SKB_MARK", "none"),      # 15 — metadata, not in ENCAP
> +        ("OVS_KEY_ATTR_TUNNEL", "none"),        # 16 — tunnel metadata, not in ENCAP
> +        ("OVS_KEY_ATTR_SCTP", "ovs_key_sctp"),  # 17
> +        ("OVS_KEY_ATTR_TCP_FLAGS", "be16"),     # 18
> +        ("OVS_KEY_ATTR_DP_HASH", "none"),       # 19 — metadata, not in ENCAP
> +        ("OVS_KEY_ATTR_RECIRC_ID", "none"),     # 20 — metadata, not in ENCAP
> +        ("OVS_KEY_ATTR_MPLS", "array(ovs_key_mpls)"),  # 21
> +    )
> +
> +
>  class OvsPacket(GenericNetlinkSocket):
>      OVS_PACKET_CMD_MISS = 1  # Flow table miss
>      OVS_PACKET_CMD_ACTION = 2  # USERSPACE action
> @@ -2576,6 +2877,7 @@ def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), vpl=OvsVport()):
>  
>  
>  def main(argv):
> +    nlmsg_atoms.encap_ovskey = encap_ovskey
>      nlmsg_atoms.ovskey = ovskey
>      nlmsg_atoms.ovsactions = ovsactions



* Re: [PATCH net 04/12] net: shaper: try to avoid violating RCU
From: Paolo Abeni @ 2026-05-06 15:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, edumazet, andrew+netdev, horms, shuah, linux-kselftest,
	davem
In-Reply-To: <59b26f1e-efd5-488e-8738-15e96b9c79e4@redhat.com>

On 5/6/26 5:22 PM, Paolo Abeni wrote:
> On 5/6/26 2:06 AM, Jakub Kicinski wrote:
>> net_shaper_commit() overrides nodes which may be concurrently read
>> under RCU. This is not a huge deal since the entries only contain
>> config, worst case user will see inconsistent config params. But
>> we should try to avoid this obvious RCU violation. Try to allocate
>> a new node. Since commit() can't fail fall back to overriding.
>>
>> Full fix is probably not worth the complexity, struct net_shaper
>> is around 80B, and the allocation is with GFP_KERNEL.
> 
> I'm not sure if even this variant is worthy?!? The scheduler tree dump
> could be still inconsistent, as the dump is not atomic. IMHO e.g.
> inconsistent weights in the same WRR group would be as bad as
> inconsistent values inside the single shaper.

I mean: I would simply avoid addressing this RCU violation.

/P



* Re: [PATCH net 01/12] net: shaper: drop redundant xa_lock() bracketing
From: Paolo Abeni @ 2026-05-06 15:30 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, andrew+netdev, horms, shuah, linux-kselftest
In-Reply-To: <20260506000628.1501691-2-kuba@kernel.org>

On 5/6/26 2:06 AM, Jakub Kicinski wrote:
> The shaper insertion code is written in a way that suggests that
> perhaps it was expecting readers to be fenced off by xa_lock.

FTR, the code was shaped (pun intended :) that way to avoid acquiring
the xa_lock multiple times when not needed. Sort of an evil premature
optimization.

/P



* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Daniel Thompson @ 2026-05-06 15:28 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Xilin Wu, Alex Elder, andrew+netdev, davem, edumazet, kuba,
	pabeni, maxime.chevallier, rmk+kernel, andersson, konradybcio,
	robh, krzk+dt, conor+dt, linusw, brgl, arnd, gregkh, mohd.anwar,
	a0987203069, alexandre.torgue, ast, boon.khai.ng, chenchuangyu,
	chenhuacai, daniel, hawk, hkallweit1, inochiama, john.fastabend,
	julianbraha, livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <4015f47a-af62-441d-b1b8-a8598f963970@lunn.ch>

On Wed, May 06, 2026 at 04:19:54PM +0200, Andrew Lunn wrote:
> On Wed, May 06, 2026 at 08:59:01PM +0800, Xilin Wu wrote:
> > On 5/1/2026 11:54 PM, Alex Elder wrote:
> > > +	/* AXI Configuration */
> > > +	axi = &td->axi;
> > > +	axi->axi_lpi_en = 1;
> > > +	axi->axi_wr_osr_lmt = 31;
> > > +	axi->axi_rd_osr_lmt = 31;
> > > +	/* All sizes (2^2..2^8) are supported */
> > > +	axi->axi_blen_regval = DMA_AXI_BLEN_MASK;
> > > +	plat->axi = axi;
> > > +
> > > +	plat->mac_port_sel_speed = speed;
> > > +	plat->flags = STMMAC_FLAG_MULTI_MSI_EN | STMMAC_FLAG_TSO_EN;
> >
> > I got WoL working only after adding STMMAC_FLAG_USE_PHY_WOL here. I guess
> > it's required, since the driver clocks down the MAC/PMA/XPCS in its suspend
> > hook?
>
> Nice to see somebody testing WoL.

Absolutely!

We recently stripped out the (obviously broken and partially ported)
WoL support we had in tc956x-pci.c. We planned to bring it back later.
Hadn't realized it could be so easy.


> In your testing, is it the PHY doing the WoL, or the MAC? I assume
> PHY.
>
> If i remember the DT correctly, the PHY interrupt is connected to a
> SoC GPIO, not a GPIO of this chip.

On RB3gen2 (and I think also the QPS615 reference design) the PHY
interrupt is routed twice. It is connected both to the TC9564 GPIO
block and to the host SoC.


> So for your board, it is the SoCs
> GPIO controllers ability to perform the wake which is
> important. However, where the PHY interrupt is connected is a board
> design issue. Could the PHY interrupt be connected to the chip? Would
> the chip be able to wake the system? Should STMMAC_FLAG_USE_PHY_WOL be
> conditional?

I couldn't think of any reason to use the TC9564 GPIO block for the
interrupts on RB3gen2 so I left it unused and the DT describes the SoC
routing only.

However if TC9564 were to be mounted on a real PCIe card then we'd have
to use the TC9564 GPIO instead and would have to leave enough of the
TC9564 enabled to handle the interrupt (and also to load a firmware to
catch the interrupt and drive the PCI wake up signal).

However, such systems would need extensions to the current driver. We
have done all we can to make it possible to add those extensions in the
future, but we would not be able to test them: not only would a real
PCIe card need extra driver features to play nice with ACPI, but the
card would also need an I2C EEPROM. We don't have that EEPROM on
RB3gen2 (and again, we'd need firmware to read it).

TL;DR - there are conceivable (and sane) hardware designs where the
interrupt goes only to the TC9564 GPIO, but they are too different to
RB3gen2 (and related SBC designs) for them to be supported before
they exist!


Daniel.


* Re: [PATCH bpf-next v4 05/12] bpf: Refactor object relationship tracking and fix dynptr UAF bug
From: bot+bpf-ci @ 2026-05-06 15:27 UTC (permalink / raw)
  To: ameryhung, bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team, ast, andrii,
	daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260506142709.2298255-6-ameryhung@gmail.com>


> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index f330e9cf297e..51d5f5dd6e5b 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h

[ ... ]

> @@ -1381,6 +1394,7 @@ struct bpf_dynptr_desc {
>  	enum bpf_dynptr_type type;
>  	u32 id;
>  	u32 ref_obj_id;
> +	u32 parent_id;
>  };

Eduard Zingerman raised a suggestion in v3 about adding documentation
comments to struct bpf_dynptr_desc. The structure still lacks comments
explaining when the id, ref_obj_id, and parent_id fields are populated.
Would it help readers to document when these fields are set?

>
>  struct bpf_kfunc_call_arg_meta {
> @@ -1392,6 +1406,7 @@ struct bpf_kfunc_call_arg_meta {
>  	const char *func_name;
>  	/* Out parameters */
>  	u32 ref_obj_id;
> +	u32 id;
>  	u8 release_regno;
>  	bool r0_rdonly;
>  	u32 ret_btf_id;

Eduard Zingerman suggested in v3 adding a comment for the new id field
in struct bpf_kfunc_call_arg_meta. The field remains undocumented. Would
a comment clarifying its purpose improve code readability?

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25442456702


* Re: [PATCH bpf-next v4 03/12] bpf: Assign reg->id when getting referenced kptr from ctx
From: bot+bpf-ci @ 2026-05-06 15:27 UTC (permalink / raw)
  To: ameryhung, bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team, ast, andrii,
	daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260506142709.2298255-4-ameryhung@gmail.com>


> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6057,8 +6057,6 @@ struct bpf_insn_access_aux info = {
>  			} else {
>  				mark_reg_known_zero(env, regs,
>  						    value_regno);
> -				if (type_may_be_null(info.reg_type))
> -					regs[value_regno].id = ++env->id_gen;
>  				/* A load of ctx field could have different
>  				 * actual load size with the one encoded in the
>  				 * insn. When the dst is PTR, it is for sure not
> @@ -6068,8 +6066,11 @@ struct bpf_insn_access_aux info = {
>  				if (base_type(info.reg_type) == PTR_TO_BTF_ID) {
>  					regs[value_regno].btf = info.btf;
>  					regs[value_regno].btf_id = info.btf_id;
> +					regs[value_regno].id = info.ref_obj_id;
                                        ^^^^

Does this introduce a bug when reading the same nullable context field
multiple times?

The new code sets reg->id = info.ref_obj_id, where ref_obj_id comes from
ctx_arg_info->ref_obj_id. The ctx_arg_info structure is initialized once
during program type setup in check_attach_btf_id(), making ref_obj_id a
static value per context field, not per read.

When a BPF program reads the same nullable context field into different
registers, they would share the same ID:

    r1 = ctx->nullable_kptr;  // r1->id = ref_obj_id = 5
    r2 = ctx->nullable_kptr;  // r2->id = ref_obj_id = 5 (SAME!)
    if (r1 != NULL) {
        // mark_ptr_or_null_regs(id=5) marks ALL registers with id==5
        // Both r1 AND r2 are now considered safe
        use(r2);  // BUG: r2 was never checked but verifier allows use
    }

The mark_ptr_or_null_reg() function uses reg->id to identify which
registers should be updated together after a NULL check. With shared IDs,
independent reads are incorrectly treated as aliases.

The old code generated a unique ID via ++env->id_gen for each nullable
read, ensuring proper NULL-check tracking.
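
The concern above can be illustrated with a toy model. This is only a
sketch of the id-based marking as described in this review, not the
verifier's actual data structures: Reg, load_static() and load_fresh()
are hypothetical names, and the value 5 is the same made-up id used in
the example above.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Reg:
    """Toy register state: an id plus a 'may be NULL' flag."""
    id: int
    maybe_null: bool = True


def mark_ptr_or_null_regs(regs, checked_id):
    """After a NULL check, clear maybe_null on every register with that id."""
    for reg in regs.values():
        if reg.id == checked_id:
            reg.maybe_null = False


def load_static(ref_obj_id):
    """Hypothetical model of the new code: the id comes from a static field."""
    return Reg(id=ref_obj_id)


_id_gen = count(1)


def load_fresh():
    """Hypothetical model of the old code: every load gets a new id."""
    return Reg(id=next(_id_gen))


# Two independent reads that share one static id; only r1 is checked.
shared = {"r1": load_static(5), "r2": load_static(5)}
mark_ptr_or_null_regs(shared, shared["r1"].id)

# Two independent reads with fresh ids; only r1 is checked.
unique = {"r1": load_fresh(), "r2": load_fresh()}
mark_ptr_or_null_regs(unique, unique["r1"].id)

print(shared["r2"].maybe_null)  # r2 was marked along with r1
print(unique["r2"].maybe_null)  # r2 still needs its own NULL check
```

In the shared-id case the unchecked register is treated as non-NULL,
which is the aliasing effect this review asks about. Whether the real
verifier can reach that state depends on when ref_obj_id is non-zero,
which the toy model does not capture.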

>  					regs[value_regno].ref_obj_id = info.ref_obj_id;
>  				}
> +				if (type_may_be_null(info.reg_type) && !regs[value_regno].id)
> +					regs[value_regno].id = ++env->id_gen;

Would this fallback path prevent fresh ID generation when id was already
set to ref_obj_id? The condition '!regs[value_regno].id' is now false for
nullable PTR_TO_BTF_ID pointers, so each read gets the same static ID
instead of a unique one.

Eduard Zingerman's review comment from v3 suggested that the commit message
should mention that this patch is mandatory for proper invalidation
propagation in later patches. He noted that without the line
'regs[value_regno].id = info.ref_obj_id;', the test
'bpf_qdisc_fail__invalid_dynptr_cross_frame.c:invalid_dynptr_cross_frame'
fails because the .id link is needed to propagate invalidation triggered by
'free_skb(skb);' to 'hdr'. The current commit message mentions consistency
and parent_id tracking but does not explicitly mention the invalidation
propagation aspect.




* Re: [PATCH bpf-next v4 02/12] bpf: Unify dynptr handling in the verifier
From: bot+bpf-ci @ 2026-05-06 15:27 UTC (permalink / raw)
  To: ameryhung, bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, eddyz87, memxor,
	martin.lau, mykyta.yatsenko5, ameryhung, kernel-team, ast, andrii,
	daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260506142709.2298255-3-ameryhung@gmail.com>


> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 976e2b2f40e8..f330e9cf297e 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -1376,6 +1376,13 @@ struct bpf_map_desc {
>  	int uid;
>  };
>
> +/* The last initialized dynptr; Populated by process_dynptr_func() */
> +struct bpf_dynptr_desc {
> +	enum bpf_dynptr_type type;
> +	u32 id;
> +	u32 ref_obj_id;
> +};
> +

A review comment from Mykyta Yatsenko requested that a comment be added
to the ref_obj_id field explaining what it's for:

  https://lore.kernel.org/bpf/87ikb2tkta.fsf@gmail.com/

The author mentioned in v3 that comments for id, ref_obj_id, and parent_id
would be added in a later refactor patch. However, should this struct have
inline documentation now for clarity, particularly for the ref_obj_id field?

[ ... ]


