Netdev List
* [PATCH v2] xprtrdma: Move long delayed work on system_dfl_long_wq
From: Marco Crivellari @ 2026-05-07 13:01 UTC (permalink / raw)
  To: linux-kernel, linux-nfs, netdev
  Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
	Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
	NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman

Currently the code enqueues work items with {queue|mod}_delayed_work()
on system_long_wq. This workqueue is intended for work items that are
expected to run for a long time, and it is a per-cpu workqueue.

These functions end up calling __queue_delayed_work(), which sets a global
timer that can fire on any CPU; the work is then enqueued on the CPU where
the timer fired.

Unbound work items can benefit from scheduler task placement, which helps
optimize performance and power consumption. Long-running work shouldn't
stick to a single CPU.

Recently, a new unbound workqueue specifically for long-running work has
been added:

    c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")

Since the work item doesn't rely on per-cpu variables, there is no
obvious reason that justifies the use of a per-cpu workqueue. So replace
system_long_wq with system_dfl_long_wq so that the work may benefit from
scheduler task placement.

Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
Changes in v2:
- Commit log improvements

- Rebase on v7.1-rc2

Link to v1: https://lore.kernel.org/all/20260430085412.96961-1-marco.crivellari@suse.com/

 net/sunrpc/xprtrdma/transport.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 61706df5e485..1a54993f7ffb 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -484,7 +484,8 @@ xprt_rdma_connect(struct rpc_xprt *xprt, struct rpc_task *task)
 		xprt_reconnect_backoff(xprt, RPCRDMA_INIT_REEST_TO);
 	}
 	trace_xprtrdma_op_connect(r_xprt, delay);
-	queue_delayed_work(system_long_wq, &r_xprt->rx_connect_worker, delay);
+	queue_delayed_work(system_dfl_long_wq, &r_xprt->rx_connect_worker,
+			   delay);
 }
 
 /**
-- 
2.53.0



* [PATCH net-next] bnxt_en: Drop pci_save_state() after pci_restore_state()
From: Lukas Wunner @ 2026-05-07 13:04 UTC (permalink / raw)
  To: Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Vasundhara Volam
  Cc: netdev

Commit 383d89699c50 ("treewide: Drop pci_save_state() after
pci_restore_state()") sought to purge all superfluous invocations of
pci_save_state() from the tree.

Unfortunately the commit missed one invocation in the Broadcom
NetXtreme-C/E driver.  Drop it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8c55874..10196f9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -17386,7 +17386,6 @@ static pci_ers_result_t bnxt_io_slot_reset(struct pci_dev *pdev)
 				pci_write_config_dword(bp->pdev, off, 0);
 		}
 		pci_restore_state(pdev);
-		pci_save_state(pdev);
 
 		bnxt_inv_fw_health_reg(bp);
 		bnxt_try_map_fw_health_reg(bp);
-- 
2.51.0



* [PATCH net-next v7 0/2] selftests: openvswitch: add pop_vlan test
From: Minxi Hou @ 2026-05-07 13:15 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, i.maximets, davem, edumazet, kuba, pabeni,
	horms, shuah, dev, linux-kselftest, linux-kernel, Minxi Hou

Add test_pop_vlan() to verify OVS kernel datapath pop_vlan action
correctly strips 802.1Q VLAN tags from frames.

Patch 1 extends ovs-dpctl.py with vlan(vid=X,pcp=Y,cfi=Z) formatting
and parsing, plus an encap_ovskey subclass for safe ENCAP NLA decoding.
It changes OVS_KEY_ATTR_VLAN type from uint16 to be16 to match
the kernel __be16 wire format.
It also adds push_vlan action support (parse/format with range
validation) and removes the unnecessary MAX_ENCAP_DEPTH limit.
Patch 2 adds the selftest using purely ping-based verification with
a push_vlan return flow for symmetric bidirectional testing.
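
The uint16 to be16 change can be illustrated with a minimal standalone
sketch. It uses only Python's struct module rather than the pyroute2 NLA
classes that ovs-dpctl.py relies on, and the helper names below are
hypothetical, chosen for illustration only. It shows how the same two wire
bytes decode differently in network order and in little-endian host order:

```python
import struct


def encode_be16(value):
    """Pack a 16-bit value in network (big-endian) byte order."""
    return struct.pack("!H", value)


def decode_be16(raw):
    """Decode two wire bytes as a big-endian 16-bit value."""
    return struct.unpack("!H", raw)[0]


def decode_le16(raw):
    """Decode the same bytes the way a little-endian host would
    read a plain uint16 (explicit '<' used here for illustration)."""
    return struct.unpack("<H", raw)[0]


# TCI for vid=10, pcp=0 with the CFI bit (0x1000) set.
tci = 0x1000 | 10
wire = encode_be16(tci)

print("wire bytes:     %s" % wire.hex())
print("be16 decode:    0x%04x" % decode_be16(wire))
print("le uint16 read: 0x%04x" % decode_le16(wire))
```

Only the big-endian decode recovers the original TCI; the little-endian
read swaps the two bytes, which is the mismatch the type change avoids.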

Tested with vng on x86_64, all OVS selftests pass (including new
test_pop_vlan).

v7:
  - remove slot number comments from encap_ovskey nla_map, keep
    only comments explaining differences from base ovskey class
  - remove explicit modprobe 8021q pre-flight check (ip link add
    type vlan auto-loads the module)
v6: https://lore.kernel.org/netdev/20260506131218.1880852-1-houminxi@gmail.com/
  - fix non-ASCII characters (em dashes) in comments and commit
    messages
v5: https://lore.kernel.org/netdev/20260505124957.1239812-1-houminxi@gmail.com/
  - add push_vlan action class, dpstr format and parse with range
    validation (vid 0-4095, pcp 0-7, tpid 0-0xFFFF, CFI forced to 1)
  - remove MAX_ENCAP_DEPTH constant and depth tracking (bracket-depth
    counter in encap parser already handles nesting)
  - remove start_capture/stop_capture helpers and tcpdump/pcap
    verification -- use ping success/failure instead
  - remove modprobe/netns pre-flight checks (other tests don't do this)
  - remove ethtool VLAN offload disable (unnecessary for veth)
  - add push_vlan return flow for symmetric bidirectional ping
  - use ovs_sbx wrapper for ping commands (consistent with siblings)
v4: https://lore.kernel.org/netdev/20260504123713.555461-1-houminxi@gmail.com/
  - fix all checkpatch line-length warnings in new code
  - fix pylint W0707: use explicit exception chaining (from exc)
v3: https://lore.kernel.org/netdev/20260503120946.51869-1-houminxi@gmail.com/
  - encap_ovskey: MPLS type "ovs_key_mpls" -> "array(ovs_key_mpls)"
  - encap_ovskey: PRIORITY/IN_PORT set to "none" (metadata, not in ENCAP)
  - _vlan_dpstr: cfi=0 falls back to tci=0x%04x for round-trip safety
  - encap parse(): check return value for unrecognized trailing content
  - vlan parser: boundary check + raise-from for exception chaining
  - start_capture: || return $? to propagate ksft_skip correctly
  - on_exit: moved after resource creation, not before
  - ping success: changed from NOTE to FAIL + return 1
  - VLAN interface creation: added || return 1 error propagation
  - netns probe: distinguish EEXIST from missing CONFIG_NET_NS
  - sbx_add: || return $ksft_skip -> || return $? (match sibling tests)
v2: https://lore.kernel.org/netdev/20260501133924.3100680-1-houminxi@gmail.com/

Minxi Hou (2):
  selftests: openvswitch: add vlan() and encap() flow string parsing
  selftests: openvswitch: add pop_vlan test

 .../selftests/net/openvswitch/openvswitch.sh  |  73 ++++
 .../selftests/net/openvswitch/ovs-dpctl.py    | 322 +++++++++++++++++-
 2 files changed, 385 insertions(+), 10 deletions(-)

-- 
2.53.0



* [PATCH net-next v7 1/2] selftests: openvswitch: add vlan() and encap() flow string parsing
From: Minxi Hou @ 2026-05-07 13:15 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, i.maximets, davem, edumazet, kuba, pabeni,
	horms, shuah, dev, linux-kselftest, linux-kernel, Minxi Hou
In-Reply-To: <20260507131541.2331771-1-houminxi@gmail.com>

Add VLAN TCI formatting and parsing support to ovs-dpctl.py:

- Add _vlan_dpstr() to decompose TCI into vid/pcp/cfi fields,
  with raw tci=0x%04x fallback when cfi=0 for round-trip safety.
- Add _parse_vlan_from_flowstr() boundary check for missing ')'.
- Add encap_ovskey subclass restricting nla_map to L2-L4 attributes
  (slots 0-21) that appear inside 802.1Q ENCAP, with metadata
  attributes set to "none".
- Check parse() return value for unrecognized trailing content.
- Support callable format functions in dpstr() output.
- Change OVS_KEY_ATTR_VLAN type from uint16 to be16 to match the
  kernel __be16 wire format; uint16 decodes in host byte order,
  which gives wrong values on little-endian architectures.
- Change OVS_KEY_ATTR_ENCAP type from none to encap_ovskey to
  enable recursive parsing of 802.1Q encapsulated flow keys.
- Add push_vlan action class with fields matching kernel struct
  ovs_action_push_vlan (vlan_tpid, vlan_tci as network-order u16).
- Add push_vlan dpstr format and parse with range validation
  (vid 0-4095, pcp 0-7, tpid 0-0xFFFF) and CFI forced to 1.
- Remove MAX_ENCAP_DEPTH constant and depth tracking -- the
  bracket-depth counter in the encap parser already handles
  nesting; the global depth limit was unnecessary.
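
As a rough illustration of the TCI layout the bullets above describe (VID
in bits 0-11, CFI in bit 12, PCP in bits 13-15), here is a minimal
standalone sketch. The function names make_tci, split_tci and format_tci
are hypothetical and are not the helpers added by this patch; the
formatting rule mirrors the described fallback to a raw tci= value when
cfi=0:

```python
VLAN_VID_MASK = 0x0FFF   # bits 0-11
VLAN_CFI_SHIFT = 12      # bit 12
VLAN_PCP_SHIFT = 13      # bits 13-15


def make_tci(vid, pcp=0, cfi=1):
    """Compose a 16-bit TCI from its fields, with range validation."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("vid=%d out of range (0-4095)" % vid)
    if not 0 <= pcp <= 7:
        raise ValueError("pcp=%d out of range (0-7)" % pcp)
    if cfi not in (0, 1):
        raise ValueError("cfi must be 0 or 1")
    return vid | (cfi << VLAN_CFI_SHIFT) | (pcp << VLAN_PCP_SHIFT)


def split_tci(tci):
    """Return (vid, pcp, cfi) extracted from a 16-bit TCI."""
    vid = tci & VLAN_VID_MASK
    pcp = (tci >> VLAN_PCP_SHIFT) & 0x7
    cfi = (tci >> VLAN_CFI_SHIFT) & 0x1
    return vid, pcp, cfi


def format_tci(tci):
    """Decomposed form when cfi=1, raw tci=0x.... form when cfi=0."""
    vid, pcp, cfi = split_tci(tci)
    if cfi:
        return "vid=%d,pcp=%d,cfi=%d" % (vid, pcp, cfi)
    return "tci=0x%04x" % tci


print(format_tci(make_tci(10)))
print(format_tci(make_tci(10, pcp=5)))
print(format_tci(0x000A))
```

The raw fallback matters because a parser that always adds cfi=1 for the
vid/pcp form could not reproduce a TCI whose CFI bit is clear.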

Signed-off-by: Minxi Hou <houminxi@gmail.com>
---
 .../selftests/net/openvswitch/ovs-dpctl.py    | 322 +++++++++++++++++-
 1 file changed, 312 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 848f61fdcee0..98d68277b9e7 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -370,7 +370,7 @@ class ovsactions(nla):
         ("OVS_ACTION_ATTR_OUTPUT", "uint32"),
         ("OVS_ACTION_ATTR_USERSPACE", "userspace"),
         ("OVS_ACTION_ATTR_SET", "ovskey"),
-        ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
+        ("OVS_ACTION_ATTR_PUSH_VLAN", "push_vlan"),
         ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
         ("OVS_ACTION_ATTR_SAMPLE", "sample"),
         ("OVS_ACTION_ATTR_RECIRC", "uint32"),
@@ -427,6 +427,9 @@ class ovsactions(nla):
 
             return actstr
 
+    class push_vlan(nla):
+        fields = (("vlan_tpid", "!H"), ("vlan_tci", "!H"))
+
     class sample(nla):
         nla_flags = NLA_F_NESTED
 
@@ -633,6 +636,14 @@ class ovsactions(nla):
                 print_str += "ct_clear"
             elif field[0] == "OVS_ACTION_ATTR_POP_VLAN":
                 print_str += "pop_vlan"
+            elif field[0] == "OVS_ACTION_ATTR_PUSH_VLAN":
+                datum = self.get_attr(field[0])
+                tpid = datum["vlan_tpid"]
+                tci = datum["vlan_tci"]
+                vid = tci & 0x0FFF
+                pcp = (tci >> 13) & 0x7
+                print_str += "push_vlan(vid=%d,pcp=%d" \
+                    ",tpid=0x%04x)" % (vid, pcp, tpid)
             elif field[0] == "OVS_ACTION_ATTR_POP_ETH":
                 print_str += "pop_eth"
             elif field[0] == "OVS_ACTION_ATTR_POP_NSH":
@@ -726,7 +737,57 @@ class ovsactions(nla):
                     actstr = actstr[strspn(actstr, ", ") :]
                     parsed = True
 
-            if parse_starts_block(actstr, "clone(", False):
+            if parse_starts_block(actstr, "push_vlan(", False):
+                actstr = actstr[len("push_vlan("):]
+                vid = 0
+                pcp = 0
+                tpid = 0x8100
+                if ")" not in actstr:
+                    raise ValueError(
+                        "push_vlan: missing ')'")
+                paren = actstr.index(")")
+                if not actstr[:paren].strip():
+                    raise ValueError("push_vlan: no fields")
+                for kv in actstr[:paren].split(","):
+                    if "=" not in kv:
+                        raise ValueError(
+                            "push_vlan: bad field '%s'"
+                            % kv.strip())
+                    k = kv[:kv.index("=")].strip()
+                    v = kv[kv.index("=") + 1:].strip()
+                    if k == "vid":
+                        vid = int(v, 0)
+                        if vid < 0 or vid > 0xFFF:
+                            raise ValueError(
+                                "push_vlan: vid=%d out of "
+                                "range (0-4095)" % vid)
+                    elif k == "pcp":
+                        pcp = int(v, 0)
+                        if pcp < 0 or pcp > 7:
+                            raise ValueError(
+                                "push_vlan: pcp=%d out of "
+                                "range (0-7)" % pcp)
+                    elif k == "tpid":
+                        tpid = int(v, 0)
+                        if tpid < 0 or tpid > 0xFFFF:
+                            raise ValueError(
+                                "push_vlan: tpid=0x%x out "
+                                "of range (0-0xffff)" % tpid)
+                    else:
+                        raise ValueError(
+                            "push_vlan: unknown key '%s'"
+                            % k)
+                tci = (vid & 0x0FFF) | ((pcp & 0x7) << 13) \
+                    | 0x1000
+                pvact = self.push_vlan()
+                pvact["vlan_tpid"] = tpid
+                pvact["vlan_tci"] = tci
+                self["attrs"].append(
+                    ["OVS_ACTION_ATTR_PUSH_VLAN", pvact])
+                actstr = actstr[paren + 1:]
+                parsed = True
+
+            elif parse_starts_block(actstr, "clone(", False):
                 parencount += 1
                 subacts = ovsactions()
                 actstr = actstr[len("clone("):]
@@ -901,11 +962,11 @@ class ovskey(nla):
     nla_flags = NLA_F_NESTED
     nla_map = (
         ("OVS_KEY_ATTR_UNSPEC", "none"),
-        ("OVS_KEY_ATTR_ENCAP", "none"),
+        ("OVS_KEY_ATTR_ENCAP", "encap_ovskey"),
         ("OVS_KEY_ATTR_PRIORITY", "uint32"),
         ("OVS_KEY_ATTR_IN_PORT", "uint32"),
         ("OVS_KEY_ATTR_ETHERNET", "ethaddr"),
-        ("OVS_KEY_ATTR_VLAN", "uint16"),
+        ("OVS_KEY_ATTR_VLAN", "be16"),
         ("OVS_KEY_ATTR_ETHERTYPE", "be16"),
         ("OVS_KEY_ATTR_IPV4", "ovs_key_ipv4"),
         ("OVS_KEY_ATTR_IPV6", "ovs_key_ipv6"),
@@ -1636,6 +1697,194 @@ class ovskey(nla):
     class ovs_key_mpls(nla):
         fields = (("lse", ">I"),)
 
+    # 802.1Q CFI (Canonical Format Indicator) bit, always set for Ethernet
+    _VLAN_CFI_MASK = 0x1000
+
+    @staticmethod
+    def _vlan_dpstr(tci):
+        """Format VLAN TCI as vid=X,pcp=Y,cfi=Z or tci=0xNNNN.
+
+        When cfi=1 (standard Ethernet VLAN), outputs decomposed
+        vid/pcp/cfi fields. When cfi=0 (truncated VLAN header),
+        falls back to raw tci=0x%04x to ensure round-trip
+        correctness: the parser auto-adds cfi=1 for vid/pcp
+        format, so cfi=0 would be lost on re-parse."""
+        vid = tci & 0x0FFF
+        pcp = (tci >> 13) & 0x7
+        cfi = (tci >> 12) & 0x1
+        if cfi:
+            return "vid=%d,pcp=%d,cfi=%d" % (vid, pcp, cfi)
+        return "tci=0x%04x" % tci
+
+    @staticmethod
+    def _parse_vlan_from_flowstr(flowstr):
+        """Parse vlan(tci=X) or vlan(vid=X[,pcp=Y,cfi=Z]) from flowstr.
+
+        Returns (remaining_flowstr, key_tci, mask_tci).
+        TCI values use standard bit layout (VID bits 0-11,
+        CFI bit 12, PCP bits 13-15); byte order conversion to
+        big-endian happens in pyroute2 be16 NLA serialization.
+        The mask covers only the fields the caller specified:
+        vid -> 0x0FFF, pcp -> 0xE000, cfi -> 0x1000, tci -> 0xFFFF.
+
+        The tci= key sets the raw TCI bitfield (no CFI validation) to allow
+        non-Ethernet use cases.  Use cfi=1 for standard Ethernet VLAN matching.
+        """
+        tci = 0
+        mask = 0
+        has_tci = False
+        has_vid = has_pcp = has_cfi = False
+        _tci_mix_err = "vlan(): 'tci' cannot be mixed " \
+                       "with 'vid'/'pcp'/'cfi'"
+        first = True
+        while True:
+            flowstr = flowstr.lstrip()
+            if not flowstr:
+                raise ValueError("vlan(): missing ')'")
+            if flowstr[0] == ')':
+                break
+            if not first:
+                flowstr = flowstr[1:]  # skip ','
+                if not flowstr:
+                    raise ValueError("vlan(): missing ')' after trailing comma")
+                flowstr = flowstr.lstrip()
+                if flowstr and flowstr[0] == ')':
+                    break
+                if flowstr and flowstr[0] == ',':
+                    raise ValueError(
+                        "vlan(): empty or extra comma in field list")
+            first = False
+
+            eq = flowstr.find('=')
+            if eq == -1:
+                raise ValueError(
+                    "vlan(): expected key=value, got '%s'" % flowstr)
+            key = flowstr[:eq].strip()
+            flowstr = flowstr[eq + 1:]
+
+            end = flowstr.find(',')
+            end2 = flowstr.find(')')
+            if end == -1 and end2 == -1:
+                raise ValueError("vlan(): missing ')'")
+            if end == -1 or (end2 != -1 and end2 < end):
+                end = end2
+            val = flowstr[:end].strip()
+            flowstr = flowstr[end:]
+
+            if not val:
+                raise ValueError("vlan(): empty value for key '%s'" % key)
+            try:
+                v = int(val, 16) if val.startswith(('0x', '0X')) else int(val)
+            except ValueError as exc:
+                raise ValueError(
+                    "vlan(): invalid value '%s' for key '%s'"
+                    % (val, key)) from exc
+
+            if key == 'tci':
+                if has_tci:
+                    raise ValueError("vlan(): duplicate 'tci'")
+                if has_vid or has_pcp or has_cfi:
+                    raise ValueError(_tci_mix_err)
+                if v > 0xFFFF or v < 0:
+                    raise ValueError("vlan(): tci=0x%x out of range" % v)
+                tci = v
+                mask = 0xFFFF
+                has_tci = True
+            elif key == 'vid':
+                if has_tci:
+                    raise ValueError(_tci_mix_err)
+                if has_vid:
+                    raise ValueError("vlan(): duplicate 'vid'")
+                if v < 0 or v > 0xFFF:
+                    raise ValueError("vlan(): vid=%d out of range (0-4095)" % v)
+                tci |= v
+                mask |= 0x0FFF
+                has_vid = True
+            elif key == 'pcp':
+                if has_tci:
+                    raise ValueError(_tci_mix_err)
+                if has_pcp:
+                    raise ValueError("vlan(): duplicate 'pcp'")
+                if v < 0 or v > 7:
+                    raise ValueError("vlan(): pcp=%d out of range (0-7)" % v)
+                tci |= (v & 0x7) << 13
+                mask |= 0xE000
+                has_pcp = True
+            elif key == 'cfi':
+                if has_tci:
+                    raise ValueError(_tci_mix_err)
+                if has_cfi:
+                    raise ValueError("vlan(): duplicate 'cfi'")
+                if v != 1:
+                    raise ValueError("vlan(): cfi must be 1 for Ethernet")
+                tci |= ovskey._VLAN_CFI_MASK
+                mask |= ovskey._VLAN_CFI_MASK
+                has_cfi = True
+            else:
+                raise ValueError("vlan(): unknown key '%s'" % key)
+
+        flowstr = flowstr[1:]  # skip ')'
+        # Catch immediate '))' (user error).  A ')' after ',' is consumed
+        # by parse()'s strspn(flowstr, "), ") inter-field separator stripping.
+        if flowstr.lstrip().startswith(')'):
+            raise ValueError("vlan(): unmatched ')'")
+        # parse() strips trailing ',', ')', ' ' as inter-field separators,
+        # so we do not need to call strspn here.
+
+        if mask == 0:
+            raise ValueError("vlan(): no fields specified, "
+                             "use vlan(vid=X[,pcp=Y,cfi=Z]) or vlan(tci=X)")
+        if not has_tci:
+            tci |= ovskey._VLAN_CFI_MASK
+            mask |= ovskey._VLAN_CFI_MASK
+        return flowstr, tci, mask
+
+    @staticmethod
+    def _parse_encap_from_flowstr(flowstr):
+        """Parse encap(inner_flow) from flowstr.
+
+        Returns (remaining_flowstr, inner_key_dict, inner_mask_dict)
+        where each dict has an 'attrs' key for recursive NLA encoding.
+        Parenthesis-depth tracking handles nested encap() calls but not
+        quoted strings containing literal parentheses.
+        """
+        depth = 1
+        end = -1
+        for i, c in enumerate(flowstr):
+            if c == '(':
+                depth += 1
+            elif c == ')':
+                depth -= 1
+                if depth < 0:
+                    raise ValueError(
+                        "encap(): unmatched ')' at position %d" % i)
+                if depth == 0:
+                    end = i
+                    break
+
+        if end == -1:
+            if depth > 1:
+                raise ValueError("encap(): missing ')' at end")
+            raise ValueError("encap(): missing closing ')'")
+
+        inner_str = flowstr[:end].strip()
+        if not inner_str:
+            raise ValueError("encap(): empty inner flow")
+
+        flowstr = flowstr[end + 1:]
+        if flowstr.lstrip().startswith(')'):
+            raise ValueError("encap(): unmatched ')' after encap()")
+
+        inner_key = encap_ovskey()
+        inner_mask = encap_ovskey()
+        remaining = inner_key.parse(inner_str, inner_mask)
+        if remaining and re.search(r'[^\s,)]', remaining):
+            raise ValueError(
+                "encap(): unrecognized trailing "
+                "content '%s'" % remaining.strip())
+
+        return flowstr, inner_key, inner_mask
+
     def parse(self, flowstr, mask=None):
         for field in (
             ("OVS_KEY_ATTR_PRIORITY", "skb_priority", intparse),
@@ -1657,6 +1906,16 @@ class ovskey(nla):
                 "eth_type",
                 lambda x: intparse(x, "0xffff"),
             ),
+            (
+                "OVS_KEY_ATTR_VLAN",
+                "vlan",
+                ovskey._parse_vlan_from_flowstr,
+            ),
+            (
+                "OVS_KEY_ATTR_ENCAP",
+                "encap",
+                ovskey._parse_encap_from_flowstr,
+            ),
             (
                 "OVS_KEY_ATTR_IPV4",
                 "ipv4",
@@ -1794,6 +2053,9 @@ class ovskey(nla):
                 True,
             ),
             ("OVS_KEY_ATTR_ETHERNET", None, None, False, False),
+            ("OVS_KEY_ATTR_VLAN", "vlan", ovskey._vlan_dpstr,
+                lambda x: False, True),
+            ("OVS_KEY_ATTR_ENCAP", None, None, False, False),
             (
                 "OVS_KEY_ATTR_ETHERTYPE",
                 "eth_type",
@@ -1821,22 +2083,61 @@ class ovskey(nla):
             v = self.get_attr(field[0])
             if v is not None:
                 m = None if mask is None else mask.get_attr(field[0])
+                fmt = field[2]  # str format or callable
                 if field[4] is False:
                     print_str += v.dpstr(m, more)
                     print_str += ","
                 else:
                     if m is None or field[3](m):
-                        print_str += field[1] + "("
-                        print_str += field[2] % v
-                        print_str += "),"
+                        val = fmt(v) if callable(fmt) else fmt % v
+                        print_str += field[1] + "(" + val + "),"
                     elif more or m != 0:
-                        print_str += field[1] + "("
-                        print_str += (field[2] % v) + "/" + (field[2] % m)
-                        print_str += "),"
+                        if callable(fmt):
+                            val = fmt(v) + "/" + fmt(m)
+                        else:
+                            val = (fmt % v) + "/" + (fmt % m)
+                        print_str += field[1] + "(" + val + "),"
 
         return print_str
 
 
+class encap_ovskey(ovskey):
+    """Inner flow key attributes valid inside 802.1Q ENCAP.
+
+    Only L2-L4 key attributes (slots 0-21) appear inside ENCAP.
+    Metadata-only attributes (SKB_MARK, DP_HASH, RECIRC_ID, etc.)
+    are set to "none" -- they never appear inside ENCAP per
+    ovs_nla_put_vlan() in net/openvswitch/flow_netlink.c.
+
+    nla_map indexes must match OVS_KEY_ATTR_* enum values in
+    include/uapi/linux/openvswitch.h.
+    """
+    nla_map = (
+        ("OVS_KEY_ATTR_UNSPEC", "none"),
+        ("OVS_KEY_ATTR_ENCAP", "none"),  # placeholder, parsed by ovskey
+        ("OVS_KEY_ATTR_PRIORITY", "none"),  # skb metadata, not in ENCAP
+        ("OVS_KEY_ATTR_IN_PORT", "none"),  # skb metadata, not in ENCAP
+        ("OVS_KEY_ATTR_ETHERNET", "ethaddr"),
+        ("OVS_KEY_ATTR_VLAN", "be16"),
+        ("OVS_KEY_ATTR_ETHERTYPE", "be16"),
+        ("OVS_KEY_ATTR_IPV4", "ovs_key_ipv4"),
+        ("OVS_KEY_ATTR_IPV6", "ovs_key_ipv6"),
+        ("OVS_KEY_ATTR_TCP", "ovs_key_tcp"),
+        ("OVS_KEY_ATTR_UDP", "ovs_key_udp"),
+        ("OVS_KEY_ATTR_ICMP", "ovs_key_icmp"),
+        ("OVS_KEY_ATTR_ICMPV6", "ovs_key_icmpv6"),
+        ("OVS_KEY_ATTR_ARP", "ovs_key_arp"),
+        ("OVS_KEY_ATTR_ND", "ovs_key_nd"),
+        ("OVS_KEY_ATTR_SKB_MARK", "none"),  # metadata, not in ENCAP
+        ("OVS_KEY_ATTR_TUNNEL", "none"),  # tunnel metadata, not in ENCAP
+        ("OVS_KEY_ATTR_SCTP", "ovs_key_sctp"),
+        ("OVS_KEY_ATTR_TCP_FLAGS", "be16"),
+        ("OVS_KEY_ATTR_DP_HASH", "none"),  # metadata, not in ENCAP
+        ("OVS_KEY_ATTR_RECIRC_ID", "none"),  # metadata, not in ENCAP
+        ("OVS_KEY_ATTR_MPLS", "array(ovs_key_mpls)"),
+    )
+
+
 class OvsPacket(GenericNetlinkSocket):
     OVS_PACKET_CMD_MISS = 1  # Flow table miss
     OVS_PACKET_CMD_ACTION = 2  # USERSPACE action
@@ -2576,6 +2877,7 @@ def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), vpl=OvsVport()):
 
 
 def main(argv):
+    nlmsg_atoms.encap_ovskey = encap_ovskey
     nlmsg_atoms.ovskey = ovskey
     nlmsg_atoms.ovsactions = ovsactions
 
-- 
2.53.0



* [PATCH net-next v7 2/2] selftests: openvswitch: add pop_vlan test
From: Minxi Hou @ 2026-05-07 13:15 UTC (permalink / raw)
  To: netdev
  Cc: aconole, echaudro, i.maximets, davem, edumazet, kuba, pabeni,
	horms, shuah, dev, linux-kselftest, linux-kernel, Minxi Hou
In-Reply-To: <20260507131541.2331771-1-houminxi@gmail.com>

Add test_pop_vlan() to verify OVS kernel datapath pop_vlan action
correctly strips 802.1Q VLAN tags from frames.

Test structure:
- Baseline: untagged forwarding validates basic connectivity.
- Negative: forward without pop_vlan, tagged frame is invisible
  to ns2 (no VLAN sub-interface), ping fails.
- Positive: pop_vlan strips tag on forward path, push_vlan
  restores tag on return path, ping succeeds.

Use static ARP entries to avoid VLAN-tagged ARP complexity.
Rely on ping success/failure for verification -- no tcpdump or
pcap files needed.
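
For readers unfamiliar with what the two actions do to a frame, the
following is a simplified userspace sketch on raw Ethernet bytes. It is
not the kernel datapath code: the function names are hypothetical, only
TPID 0x8100 is recognized, and leaving an untagged frame unchanged in
pop_vlan is an assumption made for the sketch:

```python
import struct

ETH_ADDRS_LEN = 12    # destination MAC + source MAC
VLAN_HLEN = 4         # TPID (2 bytes) + TCI (2 bytes)
TPID_8021Q = 0x8100


def push_vlan(frame, tci, tpid=TPID_8021Q):
    """Insert an 802.1Q tag right after the two MAC addresses."""
    tag = struct.pack("!HH", tpid, tci)
    return frame[:ETH_ADDRS_LEN] + tag + frame[ETH_ADDRS_LEN:]


def pop_vlan(frame):
    """Strip the outer 802.1Q tag; return untagged frames unchanged
    (an assumption of this sketch)."""
    (tpid,) = struct.unpack("!H", frame[ETH_ADDRS_LEN:ETH_ADDRS_LEN + 2])
    if tpid != TPID_8021Q:
        return frame
    return frame[:ETH_ADDRS_LEN] + frame[ETH_ADDRS_LEN + VLAN_HLEN:]


# Untagged IPv4 frame: two placeholder MACs, EtherType 0x0800, dummy payload.
dst = bytes.fromhex("020000000002")
src = bytes.fromhex("020000000001")
untagged = dst + src + struct.pack("!H", 0x0800) + b"payload"

# vid=10, pcp=0, CFI bit set, as in push_vlan(vid=10,pcp=0,tpid=0x8100).
tagged = push_vlan(untagged, 0x1000 | 10)

print(tagged[12:16].hex())
print(pop_vlan(tagged) == untagged)
```

In the positive test case the forward flow applies the pop step and the
return flow applies the push step, so ns1's VLAN sub-interface sees tagged
replies while ns2 only ever sees untagged frames.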

Signed-off-by: Minxi Hou <houminxi@gmail.com>
---
 .../selftests/net/openvswitch/openvswitch.sh  | 73 +++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index b327d3061ed5..6d13ee8c2baf 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -27,6 +27,7 @@ tests="
 	upcall_interfaces			ovs: test the upcall interfaces
 	tunnel_metadata				ovs: test extraction of tunnel metadata
 	drop_reason				drop: test drop reasons are emitted
+	pop_vlan				vlan: POP_VLAN action strips tag
 	psample					psample: Sampling packets with psample"
 
 info() {
@@ -830,6 +831,78 @@ test_tunnel_metadata() {
 	return 0
 }
 
+test_pop_vlan() {
+	local sbx="test_pop_vlan"
+	sbx_add "$sbx" || return $?
+	ovs_add_dp "$sbx" vlandp || return 1
+
+	ovs_add_netns_and_veths "$sbx" vlandp \
+		ns1 veth1 ns1veth 192.0.2.1/24 || return 1
+	ovs_add_netns_and_veths "$sbx" vlandp \
+		ns2 veth2 ns2veth 192.0.2.2/24 || return 1
+
+	# Baseline: untagged bidirectional forwarding
+	ovs_add_flow "$sbx" vlandp \
+		'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
+	ovs_add_flow "$sbx" vlandp \
+		'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
+	ovs_add_flow "$sbx" vlandp \
+		'in_port(1),eth(),eth_type(0x0800),ipv4()' '2' || return 1
+	ovs_add_flow "$sbx" vlandp \
+		'in_port(2),eth(),eth_type(0x0800),ipv4()' '1' || return 1
+	ovs_sbx "$sbx" ip netns exec ns1 ping -c 3 -W 2 \
+		192.0.2.2 || return 1
+
+	# VLAN topology: ns1 uses VLAN sub-interface, ns2 is plain
+	ip -n ns1 link add link ns1veth name ns1veth.10 \
+		type vlan id 10 || return 1
+	on_exit "ip -n ns1 link del ns1veth.10 2>/dev/null"
+	ip -n ns1 addr add 198.51.100.1/24 dev ns1veth.10 || return 1
+	ip -n ns1 link set ns1veth.10 up || return 1
+	ip -n ns2 addr add 198.51.100.2/24 dev ns2veth || return 1
+
+	ovs_del_flows "$sbx" vlandp
+
+	# Static ARP: avoids VLAN-tagged ARP complexity
+	local ns1veth10mac ns2mac
+	ns1veth10mac=$(ip -n ns1 link show ns1veth.10 \
+		| awk '/link\/ether/ {print $2}')
+	ns2mac=$(ip -n ns2 link show ns2veth \
+		| awk '/link\/ether/ {print $2}')
+	ip -n ns1 neigh replace 198.51.100.2 lladdr "$ns2mac" \
+		dev ns1veth.10 nud permanent || return 1
+	ip -n ns2 neigh replace 198.51.100.1 \
+		lladdr "$ns1veth10mac" \
+		dev ns2veth nud permanent || return 1
+
+	local vlan_match='in_port(1),eth(),eth_type(0x8100),'
+	vlan_match+='vlan(vid=10),'
+	vlan_match+='encap(eth_type(0x0800),'
+	vlan_match+='ipv4(src=198.51.100.1,proto=1),icmp())'
+
+	# Negative: forward without pop_vlan -- tagged frame
+	# is invisible to ns2 (no VLAN sub-interface), ping fails
+	ovs_add_flow "$sbx" vlandp "$vlan_match" '2' || return 1
+	ovs_sbx "$sbx" ip netns exec ns1 ping -I ns1veth.10 \
+		-c 3 -W 1 198.51.100.2 >/dev/null 2>&1 \
+		&& { info "FAIL: ping should fail without pop_vlan"
+		     return 1; }
+
+	ovs_del_flows "$sbx" vlandp
+
+	# Positive: pop_vlan strips tag on forward path,
+	# push_vlan restores tag on return path -- ping succeeds
+	ovs_add_flow "$sbx" vlandp \
+		"$vlan_match" 'pop_vlan,2' || return 1
+	ovs_add_flow "$sbx" vlandp \
+		'in_port(2),eth(),eth_type(0x0800),ipv4()' \
+		'push_vlan(vid=10,pcp=0,tpid=0x8100),1' || return 1
+	ovs_sbx "$sbx" ip netns exec ns1 ping -I ns1veth.10 \
+		-c 3 -W 2 198.51.100.2 || return 1
+
+	return 0
+}
+
 run_test() {
 	(
 	tname="$1"
-- 
2.53.0



* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
From: Maxime Chevallier @ 2026-05-07 13:16 UTC (permalink / raw)
  To: Jensen Huang, Thorsten Leemhuis
  Cc: Russell King, Heiner Kallweit, Andrew Lunn, regressions, netdev,
	LKML
In-Reply-To: <CAMpZ1qGEOiPj7cApnWJnojSyEpDmXfco=No5n1VfyTCoNyCyFQ@mail.gmail.com>

Hi,

On 07/05/2026 14:49, Jensen Huang wrote:
> On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
> <regressions@leemhuis.info> wrote:
>>
>> [Jumping in here, as there are no replies yet]
>>
>> BTW, Russell, just in case you missed this: looks like this regression
>> is caused by a change of yours.

I think Russell is dealing with unpleasant personal stuff, let's see if we
can figure this out while he's away.

>>
>> On 4/29/26 14:53, Jensen Huang wrote:
>>>
>>> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
>>> When a network cable is connected during boot, the DMA reset
>>> occasionally fails with the error message: "Failed to reset the dma".
>>>
>>> This appears to be a timing issue related to the EEE RX clock-stop
>>> logic. Based on my investigation with the RTL8211E PHY, I monitored
>>> the PHY register PS1R (MMD device 3, address 0x01) and observed a
>>> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
>>> clock may have already stopped.

From what I get, your current hypothesis is that it takes a while for that
clock to stabilize and therefore we're accessing the DMA registers too soon?

Can you confirm that by adding a small delay?

>>>
>>> While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
>>
>> Just wondering: have you checked whether mainline (e.g. 7.1-rc1) is still
>> affected? This is always advisable (some people would call it
>> required). In this case even more so, as mainline has for a while
>> contained a fix for the change you mentioned that wasn't backported:
>> c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
>> enabled"). But this is not my area of expertise (and it is in a different
>> area of the code), so that fix might be unrelated to your issue.
> 
> Thanks for the pointer.
> As you suggested, I have tested the mainline and confirmed that the
> issue is not present in v7.1-rc2, nor as early as v6.19-rc1. However,
> I verified that the issue persists in the latest stable v6.18.26.
> I performed a git bisect and the result pointed exactly to the commit
> you mentioned: c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
> when VLAN is enabled").

Do you mean that c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
when VLAN is enabled") introduces the bug on 6.18.26?

Do you have the possibility of bisecting to verify when exactly the issue
was solved between v6.18 and v6.19?

Maxime



^ permalink raw reply

* [PATCH net v2] net: ethtool: fix NULL pointer dereference in phy_reply_size
From: Quan Sun @ 2026-05-07 13:17 UTC (permalink / raw)
  To: netdev, maxime.chevallier, andrew; +Cc: kuba, edumazet, pabeni, Quan Sun

In phy_prepare_data(), several strings such as 'name', 'drvname',
'upstream_sfp_name', and 'downstream_sfp_name' are allocated using
kstrdup(). However, these allocations were not checked for failure.

If kstrdup() fails for 'name', it returns NULL while the function
continues. This leads to a kernel NULL pointer dereference and panic
later in phy_reply_size() when it unconditionally calls strlen() on
the NULL pointer.

While other strings like 'upstream_sfp_name' might be checked before
access in certain code paths, failing to handle these allocations
consistently can lead to incomplete data reporting or hidden bugs.

Fix this by adding proper NULL checks for all kstrdup() calls in
phy_prepare_data() and implement a centralized error handling path
using goto labels to ensure all previously allocated resources are
freed on failure.

Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump")
Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn>
---
Changes in v2:
- Add Fixes: tag.
- Expand the fix to cover all kstrdup() allocations in the function.
- Use goto labels for a cleaner and more robust error handling path.
---
 net/ethtool/phy.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/net/ethtool/phy.c b/net/ethtool/phy.c
index d4e6887055ab1..f76d94d848d6d 100644
--- a/net/ethtool/phy.c
+++ b/net/ethtool/phy.c
@@ -76,6 +76,7 @@ static int phy_prepare_data(const struct ethnl_req_info *req_info,
 	struct nlattr **tb = info->attrs;
 	struct phy_device_node *pdn;
 	struct phy_device *phydev;
+	int ret;
 
 	/* RTNL is held by the caller */
 	phydev = ethnl_req_get_phydev(req_info, tb, ETHTOOL_A_PHY_HEADER,
@@ -88,8 +89,17 @@ static int phy_prepare_data(const struct ethnl_req_info *req_info,
 		return -EOPNOTSUPP;
 
 	rep_data->phyindex = phydev->phyindex;
+
 	rep_data->name = kstrdup(dev_name(&phydev->mdio.dev), GFP_KERNEL);
+	if (!rep_data->name)
+		return -ENOMEM;
+
 	rep_data->drvname = kstrdup(phydev->drv->name, GFP_KERNEL);
+	if (!rep_data->drvname) {
+		ret = -ENOMEM;
+		goto err_free_name;
+	}
+
 	rep_data->upstream_type = pdn->upstream_type;
 
 	if (pdn->upstream_type == PHY_UPSTREAM_PHY) {
@@ -97,15 +107,33 @@ static int phy_prepare_data(const struct ethnl_req_info *req_info,
 		rep_data->upstream_index = upstream->phyindex;
 	}
 
-	if (pdn->parent_sfp_bus)
+	if (pdn->parent_sfp_bus) {
 		rep_data->upstream_sfp_name = kstrdup(sfp_get_name(pdn->parent_sfp_bus),
 						      GFP_KERNEL);
+		if (!rep_data->upstream_sfp_name) {
+			ret = -ENOMEM;
+			goto err_free_drvname;
+		}
+	}
 
-	if (phydev->sfp_bus)
+	if (phydev->sfp_bus) {
 		rep_data->downstream_sfp_name = kstrdup(sfp_get_name(phydev->sfp_bus),
 							GFP_KERNEL);
+		if (!rep_data->downstream_sfp_name) {
+			ret = -ENOMEM;
+			goto err_free_upstream_sfp;
+		}
+	}
 
 	return 0;
+
+err_free_upstream_sfp:
+	kfree(rep_data->upstream_sfp_name);
+err_free_drvname:
+	kfree(rep_data->drvname);
+err_free_name:
+	kfree(rep_data->name);
+	return ret;
 }
 
 static int phy_fill_reply(struct sk_buff *skb,
-- 
2.43.0


^ permalink raw reply related
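
The centralized error path described in the phy_prepare_data() patch above
can be illustrated with a minimal userspace sketch. This is not the kernel
code: struct reply, dup_string(), reply_prepare() and reply_cleanup() are
hypothetical names, and malloc()/free() stand in for kstrdup()/kfree(). It
only shows the general shape: check each allocation and, on failure, unwind
the earlier ones in reverse order through goto labels.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the reply data filled in by the patch. */
struct reply {
	char *name;
	char *drvname;
	char *upstream_sfp_name;	/* optional, may stay NULL */
};

/* Userspace replacement for kstrdup(): returns NULL on allocation failure. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Duplicate the strings; on failure free what was already allocated,
 * in reverse order, and return -ENOMEM.
 */
static int reply_prepare(struct reply *r, const char *name,
			 const char *drvname, const char *sfp_name)
{
	assert(r && name && drvname);
	memset(r, 0, sizeof(*r));

	r->name = dup_string(name);
	if (!r->name)
		return -ENOMEM;

	r->drvname = dup_string(drvname);
	if (!r->drvname)
		goto err_free_name;

	if (sfp_name) {
		r->upstream_sfp_name = dup_string(sfp_name);
		if (!r->upstream_sfp_name)
			goto err_free_drvname;
	}

	return 0;

err_free_drvname:
	free(r->drvname);
	r->drvname = NULL;
err_free_name:
	free(r->name);
	r->name = NULL;
	return -ENOMEM;
}

/* Release everything a successful reply_prepare() allocated. */
static void reply_cleanup(struct reply *r)
{
	free(r->upstream_sfp_name);
	free(r->drvname);
	free(r->name);
	memset(r, 0, sizeof(*r));
}
```

A caller would pair reply_prepare() with reply_cleanup() on success. After a
failed reply_prepare() no cleanup is needed, because the error labels have
already released the partial allocations.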

* Re: [PATCH net-next v3 1/4] net: eth: fbnic: Fix addr validation in pcs write
From: Mike Marciniszyn @ 2026-05-07 13:20 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jakub Kicinski, Alexander Duyck, kernel-team, Andrew Lunn,
	David S. Miller, Eric Dumazet, Heiner Kallweit, Russell King,
	Jacob Keller, Mohsin Bashir, Simon Horman, Lee Trager,
	Andrew Lunn, netdev, linux-kernel
In-Reply-To: <1cd8256c-286a-4a73-b0a0-89233cb3c2d0@redhat.com>

On Thu, May 07, 2026 at 09:20:53AM +0200, Paolo Abeni wrote:
> On 5/7/26 3:58 AM, Jakub Kicinski wrote:
> > On Mon,  4 May 2026 09:58:12 -0400 mike.marciniszyn@gmail.com wrote:
> >> From: "Mike Marciniszyn (Meta)" <mike.marciniszyn@gmail.com>
> >>
> >> The DW IP has two distinct PCS address ranges corresponding
> >> to the C45 PCS registers.
> >>
> >> The shim translates the PCS addr/regno into specific CSR writes
> >> into one of those two zero-relative.
> >>
> >> This patch fixes an off-by-one in the test that could allow an invalid
> >> CSR write if an addr == 2 was called.
> >>
> >> This patch contains a fix for addr validation in fbnic_mdio_write_pcs()
> >> to only return actual CSR reads for addr 0 and 1.
> >>
> >> There is, as of yet, no real impact from the bug, as no PCS writes are
> >> present yet.
> >
> > Hi Paolo! Was there a reason / do you recall why this was not applied?
> > (I dropped it from patchwork now. If the omission was accidental it has
> > to be reposted)
>
> Darn, limited capacity here plus re-submission glitch: v3 had a slightly
> different cover title (due to typo) WRT v2 so PW did not mark v2 as
> superseded. I process patches via PW in sequence, when I reached v2 I
> considered the sashiko comment not blocking and I applied it. I was unable
> to reach v3 until now.
>
> TL;DR: @Mike: please re-submit 1/4 and double check there are no other
> differences between v2 and v3 - otherwise more patches are needed. Also
> please ensure you keep the series title consistent across revisions, or at
> least manually remove old revisions from PW upon resubmission.
>
> Thanks,
>
> Paolo
>

I double-checked v2 -> v3; the other patches are OK.

I'm just now resending 1/4 of the series.  I reworded the commit message
to fix the AI review comment.

Mike

^ permalink raw reply

* Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits (BQL) for latency reduction
From: Paolo Abeni @ 2026-05-07 13:21 UTC (permalink / raw)
  To: Simon Schippers, hawk, netdev
  Cc: kernel-team, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <ee275aa6-af27-4dac-9afa-da88abde312b@schippers-hamm.de>

On 5/7/26 8:54 AM, Simon Schippers wrote:
>> @@ -928,9 +968,13 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
>>  			}
>>  		} else {
>>  			/* ndo_start_xmit */
>> -			struct sk_buff *skb = ptr;
>> +			bool bql_charged = veth_ptr_is_bql(ptr);
>> +			struct sk_buff *skb = veth_ptr_to_skb(ptr);
>>  
>>  			stats->xdp_bytes += skb->len;
>> +			if (peer_txq && bql_charged)
>> +				netdev_tx_completed_queue(peer_txq, 1, VETH_BQL_UNIT);
> 
> In the discussion with Jonas [1], I left a comment explaining why I think
> this doesn’t work.
> 
> I still think that first adding an option to modify the hard-coded
> VETH_RING_SIZE is the way to go.

Isn't veth_poll() (towards the end of the function) a more natural
place to issue completion events?

/P


^ permalink raw reply

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
From: Zhu Yanjun @ 2026-05-07 13:25 UTC (permalink / raw)
  To: Edward Adam Davis, syzbot+d8f76778263ab65c2b21,
	yanjun.zhu@linux.dev
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzkaller-bugs, zyjzyj2000
In-Reply-To: <tencent_611BEB4B141B1A2526BAA3BBB2335F9E9108@qq.com>


On 2026/5/7 5:50, Edward Adam Davis wrote:
> We must serialize calls to nldev_dellink() or risk a crash as syzbot
> reported:
>
> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> Call Trace:
>   udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>   rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>   rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>   
> Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>

Thanks a lot. This looks like a good solution. Since the issue is
reproducible, have you sent this commit to syzbot for verification?

Thanks,

Zhu Yanjun

> ---
>   drivers/infiniband/core/nldev.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 96c745d5bac4..3cb3cb7629fe 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1816,6 +1816,8 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	return err;
>   }
>   
> +static DEFINE_MUTEX(nldev_dellink_mutex);
> +
>   static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   			  struct netlink_ext_ack *extack)
>   {
> @@ -1846,7 +1848,9 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	 * implicitly scoped to the driver supporting dynamic link deletion like RXE.
>   	 */
>   	if (device->link_ops && device->link_ops->dellink) {
> +		mutex_lock(&nldev_dellink_mutex);
>   		err = device->link_ops->dellink(device);
> +		mutex_unlock(&nldev_dellink_mutex);
>   		if (err)
>   			return err;
>   	}

-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply

* Re: [PATCH ipsec-next v8 10/14] xfrm: move encap and xuo into struct xfrm_migrate
From: Sabrina Dubroca @ 2026-05-07 13:26 UTC (permalink / raw)
  To: Antony Antony
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	Masahide NAKAMURA, Paul Moore, Stephen Smalley, Ondrej Mosnacek,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, selinux,
	linux-doc, Chiachang Wang, Yan Yan, devel
In-Reply-To: <migrate-state-v8-10-4578fb016965@secunet.com>

2026-05-05, 06:34:04 +0200, Antony Antony wrote:
> In preparation for an upcoming patch, move the xfrm_encap_tmpl and
> xfrm_user_offload pointers from separate parameters into struct
> xfrm_migrate, reducing the parameter count of
> xfrm_state_migrate_create(), xfrm_state_migrate_install(), and
> xfrm_state_migrate().
> 
> The fields are placed after the four xfrm_address_t members where
> the struct is naturally 8-byte aligned, avoiding padding.
> 
> No functional change.
> 
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> 
> ---
> v5->v6: added this patch.
> ---
>  include/net/xfrm.h     |  7 ++-----
>  net/xfrm/xfrm_policy.c |  4 +++-
>  net/xfrm/xfrm_state.c  | 20 +++++++-------------
>  3 files changed, 12 insertions(+), 19 deletions(-)

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

-- 
Sabrina

^ permalink raw reply

* Re: assert in phylink.c with lan7801 and dp83tc811 since kernel 6.18
From: Andrew Lunn @ 2026-05-07 12:49 UTC (permalink / raw)
  To: Sven Schuchmann; +Cc: Maxime Chevallier, netdev@vger.kernel.org
In-Reply-To: <BEZP281MB2245BB0FC7DA18B285BD564CD93C2@BEZP281MB2245.DEUP281.PROD.OUTLOOK.COM>

> I think it is worth doing a patch on this also?
> I am not that experienced in sending kernel patches but I could try...

Please try, and I will help you.

A good place to start is:

https://docs.kernel.org/process/submitting-patches.html

and

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html

	Andrew

^ permalink raw reply

* [PATCH 1/2 net v3] ipv6: addrconf: fix temp address generation after prefix deprecation
From: Fernando Fernandez Mancera @ 2026-05-07 13:28 UTC (permalink / raw)
  To: netdev
  Cc: linux-kselftest, horms, pabeni, kuba, edumazet, davem, idosch,
	dsahern, Fernando Fernandez Mancera, Łukasz Stelmach

When a router temporarily deprecates an IPv6 prefix (either by sending a
Router Advertisement with Preferred Lifetime = 0 or by letting the
lifetime expire) and later restores it, the kernel permanently loses its
ability to generate temporary privacy addresses (RFC 8981) for that
prefix.

This happens because the address worker attempts to generate a
replacement temporary address when the current one nears expiration. As
the base prefix is already deprecated, the generation fails after
marking the temporary address as having already spawned a replacement
(ifp->regen_count++).

When the router eventually restores the prefix, the temporary address
becomes active again. However, once it naturally expires, the address
worker sees that this temporary address already tried to generate a
replacement and skips the regeneration.

Fix this by verifying that the base prefix has sufficient preferred
lifetime remaining before attempting to generate a new temporary
address. In addition, make ipv6_create_tempaddr() return meaningful
error codes. This way, we can catch if a 0-lft RA arrived just after we
passed the verification mentioned above. If we don't have sufficient
preferred lifetime remaining, the worker will keep the next timer as it
is.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Łukasz Stelmach <steelman@post.pl>
Closes: https://lore.kernel.org/netdev/87340td30q.fsf%25steelman@post.pl/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: adjusted commit message, adjusted the implementation to cover all
race conditions
v3: regen now if ipv6_create_tempaddr failed due to timer to avoid an
infinite loop as we restart the loop and we need to check now against
prefered_lft again.
---
 net/ipv6/addrconf.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5476b6536eb7..d54737b5610d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1379,7 +1379,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block)
 		write_unlock_bh(&idev->lock);
 		pr_info("%s: use_tempaddr is disabled\n", __func__);
 		in6_dev_put(idev);
-		ret = -1;
+		ret = -EOPNOTSUPP;
 		goto out;
 	}
 	spin_lock_bh(&ifp->lock);
@@ -1390,7 +1390,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block)
 		pr_warn("%s: regeneration time exceeded - disabled temporary address support\n",
 			__func__);
 		in6_dev_put(idev);
-		ret = -1;
+		ret = -EADDRNOTAVAIL;
 		goto out;
 	}
 	in6_ifa_hold(ifp);
@@ -1466,7 +1466,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block)
 		    cfg.preferred_lft > if_public_preferred_lft) {
 			in6_ifa_put(ifp);
 			in6_dev_put(idev);
-			ret = -1;
+			ret = -EINVAL;
 			goto out;
 		}
 	}
@@ -4655,8 +4655,17 @@ static void addrconf_verify_rtnl(struct net *net)
 				/* This is a non-regenerated temporary addr. */
 
 				unsigned long regen_advance = ipv6_get_regen_advance(ifp->idev);
+				unsigned long pub_tstamp = READ_ONCE(ifp->ifpub->tstamp);
+				unsigned long pub_age = 0;
+				bool pub_expired = false;
+
+				if (time_after(now, pub_tstamp))
+					pub_age = (now - pub_tstamp) / HZ;
 
-				if (age + regen_advance >= ifp->prefered_lft) {
+				if (pub_age + regen_advance >= READ_ONCE(ifp->ifpub->prefered_lft))
+					pub_expired = true;
+
+				if (age + regen_advance >= ifp->prefered_lft && !pub_expired) {
 					struct inet6_ifaddr *ifpub = ifp->ifpub;
 					if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next))
 						next = ifp->tstamp + ifp->prefered_lft * HZ;
@@ -4670,12 +4679,19 @@ static void addrconf_verify_rtnl(struct net *net)
 					ifpub->regen_count = 0;
 					spin_unlock(&ifpub->lock);
 					rcu_read_unlock_bh();
-					ipv6_create_tempaddr(ifpub, true);
+
+					if (ipv6_create_tempaddr(ifpub, true) == -EINVAL) {
+						spin_lock_bh(&ifp->lock);
+						ifp->regen_count = 0;
+						spin_unlock_bh(&ifp->lock);
+						now = jiffies;
+					}
 					in6_ifa_put(ifpub);
 					in6_ifa_put(ifp);
 					rcu_read_lock_bh();
 					goto restart;
-				} else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next))
+				} else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next) &&
+					   !pub_expired)
 					next = ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ;
 			}
 
-- 
2.53.0


^ permalink raw reply related
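
The lifetime check that the addrconf patch above adds can be restated as a
small userspace sketch. It is a simplification under stated assumptions: the
function and parameter names are hypothetical, all times are plain seconds,
and the kernel's jiffies arithmetic, time_after() wraparound handling and
locking are left out. It only mirrors the two conditions in the diff: the
public prefix counts as expired when its age plus the regeneration advance
reaches its preferred lifetime, and a replacement temporary address is only
generated when the temporary address is due and the prefix is not expired.

```c
#include <stdbool.h>

/* True when the public (base) address has too little preferred lifetime
 * left to be worth deriving a new temporary address from it.
 * A timestamp in the future is treated as age 0, as in the patch.
 */
static bool pub_prefix_expired(unsigned long now_s,
			       unsigned long pub_tstamp_s,
			       unsigned long regen_advance_s,
			       unsigned long pub_prefered_lft_s)
{
	unsigned long pub_age_s = 0;

	if (now_s > pub_tstamp_s)
		pub_age_s = now_s - pub_tstamp_s;

	return pub_age_s + regen_advance_s >= pub_prefered_lft_s;
}

/* True when the temporary address is close enough to the end of its
 * preferred lifetime AND the base prefix can still back a replacement.
 */
static bool should_regenerate(unsigned long temp_age_s,
			      unsigned long regen_advance_s,
			      unsigned long temp_prefered_lft_s,
			      bool pub_expired)
{
	return temp_age_s + regen_advance_s >= temp_prefered_lft_s &&
	       !pub_expired;
}
```

With a deprecated prefix (preferred lifetime 0) pub_prefix_expired() is
always true, so should_regenerate() returns false and the regeneration
attempt, along with the regen_count bump the commit message describes, is
skipped until the prefix is restored.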

* [PATCH 2/2 net v3] selftests: fib_tests: add temporary IPv6 address renewal test
From: Fernando Fernandez Mancera @ 2026-05-07 13:28 UTC (permalink / raw)
  To: netdev
  Cc: linux-kselftest, horms, pabeni, kuba, edumazet, davem, idosch,
	dsahern, Fernando Fernandez Mancera
In-Reply-To: <20260507132828.3923-1-fmancera@suse.de>

Add a test to check that a temporary IPv6 address is regenerated properly
after the base prefix is deprecated and restored.

Fib6 temporary address renewal test
    TEST: IPv6 temporary address cleanly deprecated and regenerated     [ OK ]

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: adjusted the sleep so there is enough time for the issue to trigger,
added cleanup at the end
v3: no changes
---
 tools/testing/selftests/net/fib_tests.sh | 59 +++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index af64f93bb2e1..8f10de0eb985 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -12,7 +12,7 @@ TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify \
        ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr \
        ipv6_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh fib6_gc_test \
        ipv4_mpath_list ipv6_mpath_list ipv4_mpath_balance ipv6_mpath_balance \
-       ipv4_mpath_balance_preferred fib6_ra_to_static"
+       ipv4_mpath_balance_preferred fib6_ra_to_static fib6_temp_addr_renewal"
 
 VERBOSE=0
 PAUSE_ON_FAIL=no
@@ -1611,6 +1611,62 @@ fib6_ra_to_static()
 	cleanup &> /dev/null
 }
 
+fib6_temp_addr_renewal() {
+	setup
+
+	echo
+	echo "Fib6 temporary address renewal test"
+	set -e
+
+	# ra6 is required for the test. (ipv6toolkit)
+	if [ ! -x "$(command -v ra6)" ]; then
+	    echo "SKIP: ra6 not found."
+	    set +e
+	    cleanup &> /dev/null
+	    return
+	fi
+
+	# Create a pair of veth devices to send a RA message from one
+	# device to another.
+	$IP link add veth1 type veth peer name veth2
+	$IP link set dev veth1 up
+	$IP link set dev veth2 up
+
+	# Make veth1 ready to receive RA messages.
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.accept_ra=2
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.use_tempaddr=2
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.temp_prefered_lft=15
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.max_desync_factor=0
+
+	# Send a RA message with a prefix from veth2.
+	$NS_EXEC ra6 -i veth2 -s fe80::1 -d ff02::1 -P 2001:12::/64\#LA\#3600\#3600 -e
+	sleep 3
+
+	# Deprecate it
+	$NS_EXEC ra6 -i veth2 -s fe80::1 -d ff02::1 -P 2001:12::/64\#LA\#3600\#0 -e
+	sleep 3
+
+	# Restore it
+	$NS_EXEC ra6 -i veth2 -s fe80::1 -d ff02::1 -P 2001:12::/64\#LA\#3600\#3600 -e
+
+	ret=1
+	for i in $(seq 1 25); do
+		sleep 1
+		num_dep="$($IP -6 addr | grep -c "temporary deprecated" || true)"
+		num_tot="$($IP -6 addr | grep -c "temporary" || true)"
+
+		if [ "$num_dep" -eq 1 ] && [ "$num_tot" -ge 2 ]; then
+			ret=0
+			break
+		fi
+	done
+	log_test "$ret" 0 "IPv6 temporary address cleanly deprecated and regenerated"
+
+	set +e
+
+	cleanup &> /dev/null
+}
+
 # add route for a prefix, flushing any existing routes first
 # expected to be the first step of a test
 add_route()
@@ -3002,6 +3058,7 @@ do
 	ipv6_mpath_balance)		ipv6_mpath_balance_test;;
 	ipv4_mpath_balance_preferred)	ipv4_mpath_balance_preferred_test;;
 	fib6_ra_to_static)		fib6_ra_to_static;;
+	fib6_temp_addr_renewal)		fib6_temp_addr_renewal;;
 
 	help) echo "Test names: $TESTS"; exit 0;;
 	esac
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
From: Edward Adam Davis @ 2026-05-07 13:40 UTC (permalink / raw)
  To: yanjun.zhu
  Cc: akpm, arjan, davem, dsahern, eadavis, edumazet, hdanton, horms,
	jgg, kuba, kuni1840, kuniyu, leon, linux-kernel, linux-rdma,
	netdev, pabeni, syzbot+d8f76778263ab65c2b21, syzkaller-bugs,
	zyjzyj2000
In-Reply-To: <3c4264e6-2e93-4121-a8ec-5ac20e5cc213@linux.dev>

On Thu, 7 May 2026 06:25:54 -0700, Zhu Yanjun wrote:
> > We must serialize calls to nldev_dellink() or risk a crash as syzbot
> > reported:
> >
> > KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
> > Call Trace:
> >   udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
> >   rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
> >   rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
> >   rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
> >   rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
> >
> > Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
> > Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
> > Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
> > Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> 
> Thanks a lot. This looks like a good solution. Since the issue is
> reproducible,
> 
> have you sent this commit to syzbot for verification?
The patch has been verified by syzbot.

BR,
Edward


^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-next] ice: add SBQ posted writes with non-posted support for CGU
From: Korba, Przemyslaw @ 2026-05-07 13:40 UTC (permalink / raw)
  To: Keller, Jacob E, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Loktionov, Aleksandr, Kubalewski, Arkadiusz
In-Reply-To: <85ec6953-3a52-4ea6-9c74-d798bf5ecce3@intel.com>


> -----Original Message-----
> From: Keller, Jacob E <jacob.e.keller@intel.com>
> Sent: Tuesday, May 5, 2026 1:42 AM
> To: Korba, Przemyslaw <przemyslaw.korba@intel.com>; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>;
> Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next] ice: add SBQ posted writes with non-posted support for CGU
> 
> On 4/15/2026 4:27 AM, Przemyslaw Korba wrote:
> > From: Karol Kolacinski <karol.kolacinski@intel.com>
> >
> > Sideband queue (SBQ) is a HW queue with very short completion time. All
> > SBQ writes were posted by default, which means that the driver did not
> > have to wait for completion from the neighbor device, because there was
> > none. This introduced unnecessary delays, where only those delays were
> > "ensuring" that the command is "completed" and this was a potential race
> > condition.
> >
> > Add the possibility to perform non-posted writes where it's necessary to
> > wait for completion, instead of relying on fake completion from the FW,
> > where only the delays are guarding the writes.
> >
> > Flush the SBQ by reading address 0 from the PHY 0 before issuing SYNC
> > command to ensure that writes to all PHYs were completed and skip SBQ
> > message completion if it's posted.
> >
> > To analyze if delays are gone, look for and compare time spent in
> > ice_sq_send_cmd — posted writes should return immediately after the wr32.
> > That can be done for example by adjusting phc time with phc_ctl on E830
> > device, for less than 2 seconds to use this new mechanism. Without it,
> > command below will fail.
> >
> > Reproduction steps:
> > phc_ctl eth13 adj 1
> > phc_ctl[4478170.994]: adjusted clock by 1.000000 seconds
> >
> > Check trace for timing for comparison:
> > echo ice_sbq_send_cmd > /sys/kernel/debug/tracing/set_ftrace_filter
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > cat /sys/kernel/debug/tracing/trace
> >
> > Tested on:
> >   - Intel E830 NIC (FW version 1.00)
> >   - Kernel 6.19.0+
> >
> > Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
> > Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
> > Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> > Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> > ---
> 
> This doesn't appear to apply clean to the tip of Intel Wired LAN
> dev-queue, nor to net-next/main...
> 
> >  drivers/net/ethernet/intel/ice/ice_common.c  | 18 ++++--
> >  drivers/net/ethernet/intel/ice/ice_ptp_hw.c  | 64 ++++++++++++--------
> >  drivers/net/ethernet/intel/ice/ice_sbq_cmd.h |  5 +-
> >  3 files changed, 53 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> > index f84990996530..2cd3d6d450a9 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_common.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> > @@ -1777,23 +1777,29 @@ int ice_sbq_rw_reg(struct ice_hw *hw, struct ice_sbq_msg_input *in, u16 flags)
> >  	msg.msg_addr_low = cpu_to_le16(in->msg_addr_low);
> >  	msg.msg_addr_high = cpu_to_le32(in->msg_addr_high);
> >
> > -	if (in->opcode)
> > +	switch (in->opcode) {
> > +	case ice_sbq_msg_wr_p:
> > +	case ice_sbq_msg_wr_np:
> >  		msg.data = cpu_to_le32(in->data);
> > -	else
> > +		break;
> > +	case ice_sbq_msg_rd:
> >  		/* data read comes back in completion, so shorten the struct by
> >  		 * sizeof(msg.data)
> >  		 */
> >  		msg_len -= sizeof(msg.data);
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> >
> > -	if (in->opcode == ice_sbq_msg_wr)
> > -		cd.posted = 1;
> 
> It looks like this code in the upstream version doesn't have the cd
> structure on this function.
> 
> > +	cd.posted = in->opcode == ice_sbq_msg_wr_p;
> >
> 
> It looks like this is based on top of "ice: fix posted write support for
> sideband queue operations"? That was dropped from the queue because of
> our discussion that you would re-submit a fixed version.
> 
> Since that didn't get applied, this won't apply clean either. Do you
> still want the part that fixes E830 to go to net? Or do you just want to
> implement posted writes all together in one patch?
> 
> Either way, could you please re-submit the work either as 2 patches or
> as a single combined patch?
> 
> Thanks,
> Jake

Hi, thank you for the review 😊 I think iwl-net will be a better place for this patch, since
we need it backported to older kernels as well. I will re-upload it to net as one patch.

^ permalink raw reply
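
The opcode handling in the quoted ice_sbq_rw_reg() hunk above can be
sketched in standalone C. The enum, struct and function names below are
hypothetical stand-ins rather than the driver's definitions, and the message
layout is reduced to two fields. The sketch only shows the shape of the
change: both write opcodes carry data, a read shortens the message by the
size of the data field, unknown opcodes are rejected, and only the posted
write is flagged as posted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the driver's real values are not shown here. */
enum sbq_opcode {
	SBQ_MSG_RD,
	SBQ_MSG_WR_P,	/* posted write: no completion expected */
	SBQ_MSG_WR_NP,	/* non-posted write: wait for completion */
};

/* Simplified message layout, assumed for illustration only. */
struct sbq_msg {
	uint32_t addr;
	uint32_t data;
};

/* Compute the message length and the posted flag for an opcode.
 * Returns 0 on success or -EINVAL for an unknown opcode.
 */
static int sbq_prepare(enum sbq_opcode op, size_t *msg_len, bool *posted)
{
	*msg_len = sizeof(struct sbq_msg);

	switch (op) {
	case SBQ_MSG_WR_P:
	case SBQ_MSG_WR_NP:
		/* Writes send the data field along with the address. */
		break;
	case SBQ_MSG_RD:
		/* Read data comes back in the completion, so drop the field. */
		*msg_len -= sizeof(uint32_t);
		break;
	default:
		return -EINVAL;
	}

	*posted = (op == SBQ_MSG_WR_P);
	return 0;
}
```

Compared with the old `if (in->opcode)` test, an explicit switch makes the
third opcode visible and turns an unexpected value into an error instead of
silently treating it as a write.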

* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
From: Thorsten Leemhuis @ 2026-05-07 13:13 UTC (permalink / raw)
  To: Ovidiu Panait, Jensen Huang
  Cc: Russell King, Heiner Kallweit, Andrew Lunn, regressions, netdev,
	LKML
In-Reply-To: <CAMpZ1qGEOiPj7cApnWJnojSyEpDmXfco=No5n1VfyTCoNyCyFQ@mail.gmail.com>

[+Ovidiu Panait]

On 5/7/26 14:49, Jensen Huang wrote:
> On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
> <regressions@leemhuis.info> wrote:
>> On 4/29/26 14:53, Jensen Huang wrote:
>
>>> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
>>> When a network cable is connected during boot, the DMA reset
>>> occasionally fails with the error message: "Failed to reset the dma".
>>>
>>> This appears to be a timing issue related to the EEE RX clock-stop
>>> logic. Based on my investigation with the RTL8211E PHY, I monitored
>>> the PHY register PS1R (MMD device 3, address 0x01) and observed a
>>> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
>>> clock may have already stopped.
>>>
>>> While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
>>
>> Just wondering: have you tried if mainline (e.g. 7.1-rc1) is still
>> affected? This is always advisable (some people
>> would call it required). In this case even more, as it has for a while
>> contained a fix for the change you mentioned, that wasn't backported:
>> c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
>> enabled"). But this is not my area of expertise (and in different area
>> of the code), so that fix might be unrelated to your issue.
> 
> Thanks for the pointer.
> As you suggested, I have tested the mainline and confirmed that the
> issue is not present in v7.1-rc2, nor as early as v6.19-rc1. However,
> I verified that the issue persists in the latest stable v6.18.26.
> I performed a git bisect and the result pointed exactly to the commit
> you mentioned: c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
> when VLAN is enabled").

Great! Could you please cherry-pick c171e679ee66d7 to 6.18.y and see if
that fixes things? It sounds like it should.

@Ovidiu Panait: c171e679ee66d7 is a commit of yours. If Jensen confirms
that cherry-picking fixed the problem, I'd say we ask Greg to pick it up
for 6.18.y -- unless you see any reasons why that might be a bad idea.

> Additionally, I tested the case where CONFIG_VLAN_8021Q is not set,
> and the DMA reset issue occurs again.

I'd say that is likely best discussed in a new thread you might want to
start. Also wondering if it was like that earlier. Or iow: if that is a
regression or not.

Ciao, Thorsten

>>> ensures the clock is running before the DMA reset, my tests suggest
>>> that the phylink_rx_clk_stop_block() call might not provide a
>>> sufficiently stable RX clock in time for the immediate DMA reset that
>>> follows.
>>>
>>> Since stmmac already sets mac_requires_rxc = true, I modified
>>> phylink_bringup_phy() to honor this flag. This avoids toggling the
>>> PHY's clk_stop_enable during the initialization sequence, ensuring the
>>> RX clock remains active and stable throughout.
>>> With the change below, I achieved 200/200 successful reboots with the
>>> cable connected (previously ~50% failure rate).
>>>
>>> --- a/drivers/net/phy/phylink.c
>>> +++ b/drivers/net/phy/phylink.c
>>> @@ -2171,7 +2171,7 @@ static int phylink_bringup_phy(struct phylink
>>> *pl, struct phy_device *phy,
>>>      /* Allow the MAC to stop its clock if the PHY has the capability */
>>>      pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0;
>>>
>>> -    if (pl->mac_supports_eee_ops) {
>>> +    if (pl->mac_supports_eee_ops && !pl->config->mac_requires_rxc) {
>>>          /* Explicitly configure whether the PHY is allowed to stop it's
>>>           * receive clock.
>>>           */
>>>
>>> Any feedback/testing on this would be appreciated.
>>>
>>> Best regards,
>>> Jensen Huang
>>>
>>


^ permalink raw reply

* Re: [PATCH net-next v3 1/4] net: eth: fbnic: Fix addr validation in pcs write
From: Mike Marciniszyn @ 2026-05-07 13:48 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jakub Kicinski, Alexander Duyck, kernel-team, Andrew Lunn,
	David S. Miller, Eric Dumazet, Heiner Kallweit, Russell King,
	Jacob Keller, Mohsin Bashir, Simon Horman, Lee Trager,
	Andrew Lunn, netdev, linux-kernel
In-Reply-To: <afyRrWW8XVeO55Ob@PF5YBGDS.localdomain>

On Thu, May 07, 2026 at 09:20:45AM -0400, Mike Marciniszyn wrote:
> On Thu, May 07, 2026 at 09:20:53AM +0200, Paolo Abeni wrote:
> > On 5/7/26 3:58 AM, Jakub Kicinski wrote:
> > > On Mon,  4 May 2026 09:58:12 -0400 mike.marciniszyn@gmail.com wrote:
> > >> From: "Mike Marciniszyn (Meta)" <mike.marciniszyn@gmail.com>
> > >>
> > > >> The DW IP has two distinct PCS address ranges corresponding
> > > >> to the C45 PCS registers.
> > > >>
> > > >> The shim translates the PCS addr/regno into specific CSR writes
> > > >> into one of those two zero-relative ranges.
> > > >>
> > > >> This patch fixes an off-by-one in the test that could allow an invalid
> > > >> CSR write if addr == 2 was passed.
> > > >>
> > > >> This patch contains a fix for addr validation in fbnic_mdio_write_pcs()
> > > >> to only return actual CSR reads for addr 0 and 1.
> > > >>
> > > >> As of yet, there is no real impact from the bug as no PCS writes are
> > > >> present yet.
> > >
> > > Hi Paolo! Was there a reason / do you recall why this was not applied?
> > > (I dropped it from patchwork now. If the omission was accidental it has
> > > to be reposted)
> >
> > Darn, limited capacity here plus re-submission glitch: v3 had a slightly
> > different cover title (due to typo) WRT v2 so PW did not mark v2 as
> > superseded. I process patches via PW in sequence; when I reached v2 I
> > considered the sashiko comment not blocking and I applied it. I was unable
> > to reach v3 until now.
> >
> > TL;DR: @Mike: please re-submit 1/4 and double check there are no other
> > differences between v2 and v3 - otherwise more patches are needed. Also
> > please ensure you keep the series title consistent across revisions, or at
> > least manually remove old revisions from PW upon resubmission.
> >
> > Thanks,
> >
> > Paolo
> >
>
> I double checked v2 -> v3; the other patches are ok.
>
> I'm just now resending 1/4 of the series.  I reworded the commit message
> to fix the AI review comment.
>
> Mike

The patch isn't showing up on either lore or patchwork.

I may need to bump the rev?

Mike


* [RFC PATCH net-next] net: bridge: Move long delayed work on system_dfl_long_wq
From: Marco Crivellari @ 2026-05-07 13:48 UTC (permalink / raw)
  To: linux-kernel, netdev, bridge
  Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
	Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
	Simon Horman, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Ido Schimmel, Nikolay Aleksandrov

Currently the code enqueues work items with {queue|mod}_delayed_work() on
system_long_wq. This workqueue is meant for work that is expected to run
for a long time, and it is a per-cpu workqueue.

These functions end up calling __queue_delayed_work(), which sets a global
timer that could fire on any CPU, enqueuing the work where the timer fired.

Unbound works could benefit from scheduler task placement, to optimize
performance and power consumption. Long work shouldn't stick to a single
CPU.

Recently, a new unbound workqueue specific for long running work has
been added:

    c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")

Since the work doesn't rely on per-cpu variables, there is no obvious
reason that justifies the use of a per-cpu workqueue. So replace
system_long_wq with system_dfl_long_wq so that the work may benefit from
scheduler task placement.

Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
 net/bridge/br_fdb.c    | 2 +-
 net/bridge/br_stp.c    | 2 +-
 net/bridge/br_stp_if.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index ac81e58d5f70..14c7943d59a1 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -583,7 +583,7 @@ void br_fdb_cleanup(struct work_struct *work)
 
 	/* Cleanup minimum 10 milliseconds apart */
 	work_delay = max_t(unsigned long, work_delay, msecs_to_jiffies(10));
-	mod_delayed_work(system_long_wq, &br->gc_work, work_delay);
+	mod_delayed_work(system_dfl_long_wq, &br->gc_work, work_delay);
 }
 
 static void br_fdb_delete_locals_per_vlan_port(struct net_bridge *br,
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 024210f95468..426e1c76ef6e 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -640,7 +640,7 @@ int br_set_ageing_time(struct net_bridge *br, clock_t ageing_time)
 	br->ageing_time = t;
 	spin_unlock_bh(&br->lock);
 
-	mod_delayed_work(system_long_wq, &br->gc_work, 0);
+	mod_delayed_work(system_dfl_long_wq, &br->gc_work, 0);
 
 	return 0;
 }
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 28c1d3f7e22f..7a5f4747e1a1 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -53,7 +53,7 @@ void br_stp_enable_bridge(struct net_bridge *br)
 	spin_lock_bh(&br->lock);
 	if (br->stp_enabled == BR_KERNEL_STP)
 		mod_timer(&br->hello_timer, jiffies + br->hello_time);
-	mod_delayed_work(system_long_wq, &br->gc_work, HZ / 10);
+	mod_delayed_work(system_dfl_long_wq, &br->gc_work, HZ / 10);
 
 	br_config_bpdu_generation(br);
 
-- 
2.53.0



* [PATCH iwl-net] ice: support SBQ posted writes with non-posted support for CGU
From: Przemyslaw Korba @ 2026-05-07 13:51 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, aleksandr.loktionov,
	arkadiusz.kubalewski, Przemyslaw Korba

From: Karol Kolacinski <karol.kolacinski@intel.com>

Sideband queue (SBQ) is a HW queue with very short completion time. All
SBQ writes were posted by default, which means that the driver did not
have to wait for completion from the neighbor device, because there was
none. This introduced unnecessary delays, where only those delays were
"ensuring" that the command is "completed" and this was a potential race
condition.

Add the possibility to perform non-posted writes where it's necessary to
wait for completion, instead of relying on fake completion from the FW,
where only the delays are guarding the writes.

Flush the SBQ by reading address 0 from the PHY 0 before issuing SYNC
command to ensure that writes to all PHYs were completed and skip SBQ
message completion if it's posted.
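
A minimal userspace sketch of the opcode handling described above, not the
driver code itself. The names sbq_plan_msg(), struct sbq_plan and the
SBQ_REQ_LEN value are hypothetical and only serve the illustration; the
opcode values mirror the enum added by this patch. Reads drop the data
field from the request, both write variants carry data, and only the
posted write skips waiting for a completion:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as introduced by the patch; names here are illustrative. */
enum sbq_opcode {
	SBQ_MSG_RD    = 0x00,
	SBQ_MSG_WR_P  = 0x01,	/* posted write: no completion expected */
	SBQ_MSG_WR_NP = 0x02,	/* non-posted write: wait for completion */
};

/* Hypothetical request size; the real struct layout is not modelled. */
#define SBQ_REQ_LEN	16
#define SBQ_DATA_LEN	sizeof(uint32_t)

struct sbq_plan {
	size_t msg_len;	/* bytes to send in the request */
	bool posted;	/* true if the sender must not wait for completion */
};

/* Decide request length and completion handling from the opcode. */
static int sbq_plan_msg(int opcode, struct sbq_plan *plan)
{
	plan->msg_len = SBQ_REQ_LEN;

	switch (opcode) {
	case SBQ_MSG_WR_P:
	case SBQ_MSG_WR_NP:
		/* Writes carry the data word, so keep the full length. */
		break;
	case SBQ_MSG_RD:
		/* Read data comes back in the completion. */
		plan->msg_len -= SBQ_DATA_LEN;
		break;
	default:
		return -EINVAL;
	}

	plan->posted = (opcode == SBQ_MSG_WR_P);
	return 0;
}
```

In this model a caller would skip the completion wait only when
plan.posted is set, which matches the early exit added to
ice_sq_send_cmd() in the diff below.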

To check whether the delays are gone, compare the time spent in
ice_sq_send_cmd: posted writes should return immediately after the wr32.
This can be done, for example, by adjusting the PHC time with phc_ctl on an
E830 device by less than 2 seconds, which uses this new mechanism. Without
it, the command below will fail.

Reproduction steps:
phc_ctl eth13 adj 1
phc_ctl[4478170.994]: adjusted clock by 1.000000 seconds

Check the trace to compare timings:
echo ice_sbq_send_cmd > /sys/kernel/debug/tracing/set_ftrace_filter
echo function_graph > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace

Tested on:
  - Intel E830 NIC (FW version 1.00)
  - Kernel 6.19.0+

Fixes: 8f5ee3c477a8 ("ice: add support for sideband messages")
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c   | 21 ++++--
 drivers/net/ethernet/intel/ice/ice_controlq.c |  4 ++
 drivers/net/ethernet/intel/ice/ice_controlq.h |  1 +
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c   | 64 +++++++++++--------
 drivers/net/ethernet/intel/ice/ice_sbq_cmd.h  |  5 +-
 5 files changed, 62 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 0ec65007d672..d5007f6c9d6e 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1762,6 +1762,7 @@ int ice_sbq_rw_reg(struct ice_hw *hw, struct ice_sbq_msg_input *in, u16 flags)
 {
 	struct ice_sbq_cmd_desc desc = {0};
 	struct ice_sbq_msg_req msg = {0};
+	struct ice_sq_cd cd = {};
 	u16 msg_len;
 	int status;
 
@@ -1774,19 +1775,29 @@ int ice_sbq_rw_reg(struct ice_hw *hw, struct ice_sbq_msg_input *in, u16 flags)
 	msg.msg_addr_low = cpu_to_le16(in->msg_addr_low);
 	msg.msg_addr_high = cpu_to_le32(in->msg_addr_high);
 
-	if (in->opcode)
+	switch (in->opcode) {
+	case ice_sbq_msg_wr_p:
+	case ice_sbq_msg_wr_np:
 		msg.data = cpu_to_le32(in->data);
-	else
+		break;
+	case ice_sbq_msg_rd:
 		/* data read comes back in completion, so shorten the struct by
 		 * sizeof(msg.data)
 		 */
 		msg_len -= sizeof(msg.data);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	cd.posted = in->opcode == ice_sbq_msg_wr_p;
 
 	desc.flags = cpu_to_le16(flags);
 	desc.opcode = cpu_to_le16(ice_sbq_opc_neigh_dev_req);
 	desc.param0.cmd_len = cpu_to_le16(msg_len);
-	status = ice_sbq_send_cmd(hw, &desc, &msg, msg_len, NULL);
-	if (!status && !in->opcode)
+	status = ice_sbq_send_cmd(hw, &desc, &msg, msg_len, &cd);
+
+	if (!status && in->opcode == ice_sbq_msg_rd)
 		in->data = le32_to_cpu
 			(((struct ice_sbq_msg_cmpl *)&msg)->data);
 	return status;
@@ -6557,7 +6568,7 @@ int ice_write_cgu_reg(struct ice_hw *hw, u32 addr, u32 val)
 {
 	struct ice_sbq_msg_input cgu_msg = {
 		.dest_dev = ice_get_dest_cgu(hw),
-		.opcode = ice_sbq_msg_wr,
+		.opcode = ice_sbq_msg_wr_np,
 		.msg_addr_low = addr,
 		.data = val
 	};
diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.c b/drivers/net/ethernet/intel/ice/ice_controlq.c
index dcb837cadd18..a6008dc77fa4 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.c
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.c
@@ -1086,6 +1086,10 @@ ice_sq_send_cmd(struct ice_hw *hw, struct ice_ctl_q_info *cq,
 	wr32(hw, cq->sq.tail, cq->sq.next_to_use);
 	ice_flush(hw);
 
+	/* If the message is posted, don't wait for completion. */
+	if (cd && cd->posted)
+		goto sq_send_command_error;
+
 	/* Wait for the command to complete. If it finishes within the
 	 * timeout, copy the descriptor back to temp.
 	 */
diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.h b/drivers/net/ethernet/intel/ice/ice_controlq.h
index 788040dd662e..c50d6fcbacba 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.h
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.h
@@ -77,6 +77,7 @@ struct ice_ctl_q_ring {
 /* sq transaction details */
 struct ice_sq_cd {
 	struct libie_aq_desc *wb_desc;
+	u8 posted : 1;
 };
 
 /* rq event information */
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 2c18e16fe053..5a1a1f5ea9bb 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -352,6 +352,16 @@ void ice_ptp_src_cmd(struct ice_hw *hw, enum ice_ptp_tmr_cmd cmd)
 static void ice_ptp_exec_tmr_cmd(struct ice_hw *hw)
 {
 	struct ice_pf *pf = container_of(hw, struct ice_pf, hw);
+	struct ice_sbq_msg_input msg = {
+		.dest_dev = ice_sbq_dev_phy_0,
+		.opcode = ice_sbq_msg_rd,
+	};
+	int err;
+
+	/* Flush SBQ by reading address 0 on PHY 0 */
+	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
+	if (err)
+		dev_warn(ice_hw_to_dev(hw), "Failed to flush SBQ: %d\n", err);
 
 	if (!ice_is_primary(hw))
 		hw = ice_get_primary_hw(pf);
@@ -442,7 +452,7 @@ static int ice_write_phy_eth56g(struct ice_hw *hw, u8 port, u32 addr, u32 val)
 {
 	struct ice_sbq_msg_input msg = {
 		.dest_dev = ice_ptp_get_dest_dev_e825(hw, port),
-		.opcode = ice_sbq_msg_wr,
+		.opcode = ice_sbq_msg_wr_p,
 		.msg_addr_low = lower_16_bits(addr),
 		.msg_addr_high = upper_16_bits(addr),
 		.data = val
@@ -2504,11 +2514,12 @@ static bool ice_is_40b_phy_reg_e82x(u16 low_addr, u16 *high_addr)
 static int
 ice_read_phy_reg_e82x(struct ice_hw *hw, u8 port, u16 offset, u32 *val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_rd,
+	};
 	int err;
 
 	ice_fill_phy_msg_e82x(hw, &msg, port, offset);
-	msg.opcode = ice_sbq_msg_rd;
 
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
@@ -2581,12 +2592,13 @@ ice_read_64b_phy_reg_e82x(struct ice_hw *hw, u8 port, u16 low_addr, u64 *val)
 static int
 ice_write_phy_reg_e82x(struct ice_hw *hw, u8 port, u16 offset, u32 val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_wr_p,
+		.data = val
+	};
 	int err;
 
 	ice_fill_phy_msg_e82x(hw, &msg, port, offset);
-	msg.opcode = ice_sbq_msg_wr;
-	msg.data = val;
 
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
@@ -2740,15 +2752,15 @@ static int ice_fill_quad_msg_e82x(struct ice_hw *hw,
 int
 ice_read_quad_reg_e82x(struct ice_hw *hw, u8 quad, u16 offset, u32 *val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_rd,
+	};
 	int err;
 
 	err = ice_fill_quad_msg_e82x(hw, &msg, quad, offset);
 	if (err)
 		return err;
 
-	msg.opcode = ice_sbq_msg_rd;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
@@ -2774,16 +2786,16 @@ ice_read_quad_reg_e82x(struct ice_hw *hw, u8 quad, u16 offset, u32 *val)
 int
 ice_write_quad_reg_e82x(struct ice_hw *hw, u8 quad, u16 offset, u32 val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_wr_p,
+		.data = val
+	};
 	int err;
 
 	err = ice_fill_quad_msg_e82x(hw, &msg, quad, offset);
 	if (err)
 		return err;
 
-	msg.opcode = ice_sbq_msg_wr;
-	msg.data = val;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
@@ -4450,14 +4462,14 @@ static void ice_ptp_init_phy_e82x(struct ice_ptp_hw *ptp)
  */
 static int ice_read_phy_reg_e810(struct ice_hw *hw, u32 addr, u32 *val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.dest_dev = ice_sbq_dev_phy_0,
+		.opcode = ice_sbq_msg_rd,
+		.msg_addr_low = lower_16_bits(addr),
+		.msg_addr_high = upper_16_bits(addr),
+	};
 	int err;
 
-	msg.msg_addr_low = lower_16_bits(addr);
-	msg.msg_addr_high = upper_16_bits(addr);
-	msg.opcode = ice_sbq_msg_rd;
-	msg.dest_dev = ice_sbq_dev_phy_0;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
@@ -4480,15 +4492,15 @@ static int ice_read_phy_reg_e810(struct ice_hw *hw, u32 addr, u32 *val)
  */
 static int ice_write_phy_reg_e810(struct ice_hw *hw, u32 addr, u32 val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.dest_dev = ice_sbq_dev_phy_0,
+		.opcode = ice_sbq_msg_wr_p,
+		.msg_addr_low = lower_16_bits(addr),
+		.msg_addr_high = upper_16_bits(addr),
+		.data = val
+	};
 	int err;
 
-	msg.msg_addr_low = lower_16_bits(addr);
-	msg.msg_addr_high = upper_16_bits(addr);
-	msg.opcode = ice_sbq_msg_wr;
-	msg.dest_dev = ice_sbq_dev_phy_0;
-	msg.data = val;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
diff --git a/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h b/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h
index 21bb861febbf..86a143ebf089 100644
--- a/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h
@@ -54,8 +54,9 @@ enum ice_sbq_dev_id {
 };
 
 enum ice_sbq_msg_opcode {
-	ice_sbq_msg_rd	= 0x00,
-	ice_sbq_msg_wr	= 0x01
+	ice_sbq_msg_rd		= 0x00,
+	ice_sbq_msg_wr_p	= 0x01,
+	ice_sbq_msg_wr_np	= 0x02,
 };
 
 #define ICE_SBQ_MSG_FLAGS	0x40

base-commit: 0e1f1fc37cbc0d7c4d977d9570ad9eefaccf83fc
-- 
2.43.0



* Re: [PATCH net-next v3 1/4] net: eth: fbnic: Fix addr validation in pcs write
From: Paolo Abeni @ 2026-05-07 13:50 UTC (permalink / raw)
  To: Mike Marciniszyn
  Cc: Jakub Kicinski, Alexander Duyck, kernel-team, Andrew Lunn,
	David S. Miller, Eric Dumazet, Heiner Kallweit, Russell King,
	Jacob Keller, Mohsin Bashir, Simon Horman, Lee Trager,
	Andrew Lunn, netdev, linux-kernel
In-Reply-To: <afyYJ9-qW_QG6f7_@PF5YBGDS.localdomain>

On 5/7/26 3:48 PM, Mike Marciniszyn wrote:
> On Thu, May 07, 2026 at 09:20:45AM -0400, Mike Marciniszyn wrote:
>> On Thu, May 07, 2026 at 09:20:53AM +0200, Paolo Abeni wrote:
>>> On 5/7/26 3:58 AM, Jakub Kicinski wrote:
>>>> On Mon,  4 May 2026 09:58:12 -0400 mike.marciniszyn@gmail.com wrote:
>>>>> From: "Mike Marciniszyn (Meta)" <mike.marciniszyn@gmail.com>
>>>>>
>>>>> The DW IP has two distinct PCS address ranges corresponding
>>>>> to the C45 PCS registers.
>>>>>
>>>>> The shim translates the PCS addr/regno into specific CSR writes
>>>>> into one of those two zero-relative ranges.
>>>>>
>>>>> This patch fixes an off-by-one in the test that could allow an invalid
>>>>> CSR write if addr == 2 was passed.
>>>>>
>>>>> This patch contains a fix for addr validation in fbnic_mdio_write_pcs()
>>>>> to only return actual CSR reads for addr 0 and 1.
>>>>>
>>>>> As of yet, there is no real impact from the bug as no PCS writes are
>>>>> present yet.
>>>>
>>>> Hi Paolo! Was there a reason / do you recall why this was not applied?
>>>> (I dropped it from patchwork now. If the omission was accidental it has
>>>> to be reposted)
>>>
>>> Darn, limited capacity here plus re-submission glitch: v3 had a slightly
>>> different cover title (due to typo) WRT v2 so PW did not mark v2 as
>>> superseded. I process patches via PW in sequence; when I reached v2 I
>>> considered the sashiko comment not blocking and I applied it. I was unable
>>> to reach v3 until now.
>>>
>>> TL;DR: @Mike: please re-submit 1/4 and double check there are no other
>>> differences between v2 and v3 - otherwise more patches are needed. Also
>>> please ensure you keep the series title consistent across revisions, or at
>>> least manually remove old revisions from PW upon resubmission.
>>>
>>> Thanks,
>>>
>>> Paolo
>>>
>>
>> I double checked v2 -> v3; the other patches are ok.
>>
>> I'm just now resending 1/4 of the series.  I reworded the commit message
>> to fix the AI review comment.
>>
>> Mike
> 
> The patch isn't showing up on either lore or patchwork.
> 
> I may need to bump the rev?

Wait a bit more. 1h latency can happen quite easily.

/P



* Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part
From: Amery Hung @ 2026-05-07 13:53 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller, Saeed Mahameed, Mark Bloch,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, Gal Pressman,
	Dragos Tatulea, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Stanislav Fomichev, Alexei Starovoitov
In-Reply-To: <20260507095330.318892-3-tariqt@nvidia.com>

On Thu, May 7, 2026 at 10:54 AM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Christoph Paasch <cpaasch@openai.com>
>
> mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
> bytes from the page-pool to the skb's linear part. Those 256 bytes
> include part of the payload.
>
> When attempting to do GRO in skb_gro_receive, if headlen > data_offset
> (and skb->head_frag is not set), we end up aggregating packets in the
> frag_list.
>
> This is of course not good when we are CPU-limited. Also causes a worse
> skb->len/truesize ratio,...
>
> So, let's avoid copying parts of the payload to the linear part. We use
> eth_get_headlen() to parse the headers and compute the length of the
> protocol headers, which will be used to copy the relevant bits of the
> skb's linear part.
>
> We still allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking
> stack needs to call pskb_may_pull() later on, we don't need to reallocate
> memory.
>
> This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
> LRO enabled):
>
> BEFORE:
> =======
> (netserver pinned to core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.01    32547.82
>
> (netserver pinned to adjacent core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.00    52531.67
>
> AFTER:
> ======
> (netserver pinned to core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.00    52896.06
>
> (netserver pinned to adjacent core receiving interrupts)
>  $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>  87380  16384 262144    60.00    85094.90
>
> Additional tests across a larger range of parameters w/ and w/o LRO, w/
> and w/o IPv6-encapsulation, different MTUs (1500, 4096, 9000), different
> TCP read/write-sizes as well as UDP benchmarks, all have shown equal or
> better performance with this patch.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 75ccf40a7f17..301b33419207 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1976,6 +1976,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>                                         ALIGN(headlen, sizeof(long)),
>                                         rq->buff.map_dir);
>
> +               headlen = eth_get_headlen(rq->netdev, head_addr, headlen);
> +
>                 frag_offset += headlen;
>                 byte_cnt -= headlen;
>                 linear_hr = skb_headroom(skb);
> @@ -2012,9 +2014,13 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>
>         if (prog) {
>                 u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
> +               skb_frag_t *frag = &sinfo->frags[0];
>                 u8 new_nr_frags;
>                 u32 len;
>
> +               headlen = eth_get_headlen(rq->netdev, skb_frag_address(frag),
> +                                         skb_frag_size(frag));
> +
>                 if (mlx5e_xdp_handle(rq, prog, mxbuf)) {

Hello,

Am I understanding correctly that the better performance comes with
the assumption that the XDP does not change headers?

headlen is determined before the XDP program runs. If it push/pop
headers, there could be headers in frags or data in the linear region
after __pskb_pull_tail().

>                         if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
>                                 struct mlx5e_frag_page *pfp;
> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>                                 pagep->frags++;
>                         while (++pagep < frag_page);
>
> -                       headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
> -                                       skb->data_len);
> +                       headlen = min_t(u16, headlen - len, skb->data_len);

headlen - len can underflow but will be capped by skb->data_len, so
this should be okay, right?
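
A minimal userspace sketch of the arithmetic in question, assuming headlen
is a u16 and len a u32 as the hunks above suggest. min_t_u16() and
pull_len() are hypothetical names that only model min_t(u16, ...); they
are not driver functions:

```c
#include <stdint.h>

/* Model of min_t(u16, a, b): both operands are cast to u16 before comparing. */
static uint16_t min_t_u16(uint32_t a, uint32_t b)
{
	uint16_t x = (uint16_t)a;
	uint16_t y = (uint16_t)b;

	return x < y ? x : y;
}

/* Model of: headlen = min_t(u16, headlen - len, skb->data_len); */
static uint16_t pull_len(uint16_t headlen, uint32_t len, uint32_t data_len)
{
	/*
	 * headlen is converted to the 32-bit unsigned type of len, so the
	 * subtraction wraps when len > headlen and the u16 cast then keeps
	 * only the low 16 bits of the wrapped value.
	 */
	return min_t_u16(headlen - len, data_len);
}
```

With headlen = 54 and len = 64 the difference truncates to 65526, so the
result is bounded by data_len as long as data_len itself fits in 16 bits.
Note that data_len is truncated to u16 by the same cast.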

>                         __pskb_pull_tail(skb, headlen);
>                 }
>         } else {
> --
> 2.44.0
>


* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Xilin Wu @ 2026-05-07 13:57 UTC (permalink / raw)
  To: Alex Elder, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, rmk+kernel, andersson, konradybcio, robh,
	krzk+dt, conor+dt, linusw, brgl, arnd, gregkh
  Cc: Daniel Thompson, mohd.anwar, a0987203069, alexandre.torgue, ast,
	boon.khai.ng, chenchuangyu, chenhuacai, daniel, hawk, hkallweit1,
	inochiama, john.fastabend, julianbraha, livelycarpet87,
	matthew.gerlach, mcoquelin.stm32, me, prabhakar.mahadev-lad.rj,
	richardcochran, rohan.g.thomas, sdf, siyanteng, weishangjuan,
	wens, netdev, bpf, linux-arm-msm, devicetree, linux-gpio,
	linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <7f3a0f16-5159-4bbc-8b15-9b5841603bf6@riscstar.com>

On 5/7/2026 1:44 AM, Alex Elder wrote:
> On 5/5/26 9:30 PM, Xilin Wu wrote:
>> On 5/1/2026 11:54 PM, Alex Elder wrote:
>>> From: Daniel Thompson <daniel@riscstar.com>
>>>
>>> Toshiba TC956x is an Ethernet AVB/TSN bridge and is essentially a
>>> small and highly-specialized SoC. TC956x includes an "eMAC" subsystem
>>> that can be accessed, along with several other peripherals, via two
>>> PCIe endpoint functions. There is a main driver for the endpoint that
>>> decomposes things and creates auxiliary bus devices to model the SoC.
>>>
>>> The eMAC consists of a Designware XGMAC, XPCS and PMA. Each eMAC is
>>> supported by an MSIGEN that bridges TC956x level interrupts to PCIe
>>> MSIs.
>>>
>>> Add a driver for the eMAC/MSIGEN combination.
>>>
>>> Co-developed-by: Alex Elder <elder@riscstar.com>
>>> Signed-off-by: Alex Elder <elder@riscstar.com>
>>> Signed-off-by: Daniel Thompson <daniel@riscstar.com>
>>> ---
>>>   drivers/net/ethernet/stmicro/stmmac/Kconfig   |  13 +
>>>   drivers/net/ethernet/stmicro/stmmac/Makefile  |   2 +
>>>   .../ethernet/stmicro/stmmac/dwmac-tc956x.c    | 791 ++++++++++++++++++
>>>   include/soc/toshiba/tc956x-dwmac.h            |  84 ++
>>>   4 files changed, 890 insertions(+)
>>>   create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-tc956x.c
>>>   create mode 100644 include/soc/toshiba/tc956x-dwmac.h
>>>
>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/ 
>>> net/ethernet/stmicro/stmmac/Kconfig
>>> index e3dd5adda5aca..66bcfaccbe21f 100644
>>> --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
>>> +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
>>> @@ -404,6 +404,19 @@ config DWMAC_MOTORCOMM
>>>         This enables glue driver for Motorcomm DWMAC-based PCI Ethernet
>>>         controllers. Currently only YT6801 is supported.
>>> +config DWMAC_TC956X
>>> +    tristate "Toshiba TC956X DWMAC support"
>>> +    depends on PCI
>>> +    depends on COMMON_CLK
>>> +    depends on TOSHIBA_TC956X_PCI
>>> +    default m if TOSHIBA_TC956X_PCI
>>
>> Hi Alex,
>>
>> I think GENERIC_IRQ_CHIP should be selected here.
> 
> Yes there are a number of things missing in the Kconfig definitions
> and I'm working through them this week.  And yes, since we use
> irq_generic_chip_ops we must ensure CONFIG_GENERIC_IRQ_CHIP is
> enabled here.
> 
>> Thank you for the driver.
> 
> Thank you for your feedback (this and others I see).
> 
>                      -Alex
> 
> 
> 

Hi Alex,

Do you think a shutdown callback like this is required? It looks like
the driver sometimes does an MDIO MMIO read when the PCIe link is down,
causing the board to reset due to a SoC-side PCIe NoC timeout.

After this change, the board can always shut down gracefully.


diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-tc956x.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-tc956x.c
index 4e8b4a185583..34b8e3fe1b51 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-tc956x.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-tc956x.c
@@ -767,6 +767,17 @@ static void tc956x_dwmac_remove(struct auxiliary_device *adev)
         tc956x_mac_disable(td);
  }

+static void tc956x_dwmac_shutdown(struct auxiliary_device *adev)
+{
+       struct device *dev = &adev->dev;
+       int ret;
+
+       ret = stmmac_suspend(dev);
+       if (ret)
+               dev_warn(dev, "failed to suspend MAC during shutdown: %d\n",
+                        ret);
+}
+
  static const struct auxiliary_device_id tc956x_dwmac_ids[] = {
         { .name = TC956X_PCIE_DRIVER_NAME "." TC956X_XGMAC_DEV_NAME, },
         { },
@@ -777,6 +788,7 @@ static struct auxiliary_driver tc956x_dwmac_driver = {
         .name           = DRIVER_NAME,
         .probe          = tc956x_dwmac_probe,
         .remove         = tc956x_dwmac_remove,
+       .shutdown       = tc956x_dwmac_shutdown,
         .id_table       = tc956x_dwmac_ids,
         .driver = {
                 .name   = DRIVER_NAME,

-- 
Best regards,
Xilin Wu <sophon@radxa.com>



* Re: [PATCH net 1/5] net: dsa: mt7530: fix FDB entries not aging out with short timeout
From: Paolo Abeni @ 2026-05-07 14:08 UTC (permalink / raw)
  To: Daniel Golle, Chester A. Unal, Andrew Lunn, Vladimir Oltean,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Matthias Brugger,
	AngeloGioacchino Del Regno, DENG Qingfang, Florian Fainelli,
	Arınç ÜNAL, Sean Wang, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek
In-Reply-To: <f285707e09a0febffd1b987f204ff4eb71736489.1777986341.git.daniel@makrotopia.org>

On 5/5/26 4:16 PM, Daniel Golle wrote:
> When setting a low ageing time such as 10 seconds, the algorithm in
> mt7530_set_ageing_time() finds AGE_CNT=0 and AGE_UNIT=9 as the first
> exact match (starting the search from tmp_age_count=0).
> 
> On the MT7530/MT7531 hardware, the per-entry aging counter is
> initialized to AGE_CNT when a MAC address is learned. With AGE_CNT=0,
> new entries start with a counter value of 0, which the hardware treats
> as "already aged" and never removes, effectively disabling aging.
> 
> Fix this by starting the search from tmp_age_count=1 to ensure entries
> always have a non-zero initial aging counter. For a 10-second ageing
> time this yields AGE_CNT=1 and AGE_UNIT=4 instead: the timer ticks
> every 5 seconds and entries are removed after 2 ticks.
> 
> Fixes: ea6d5c924e39 ("net: dsa: mt7530: support setting ageing time")
> Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> ---
>  drivers/net/dsa/mt7530.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index 44d670904ad8..b1903da7d500 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c
> @@ -1027,8 +1027,12 @@ mt7530_set_ageing_time(struct dsa_switch *ds, unsigned int msecs)
>  	if (secs < 1 || secs > (AGE_CNT_MAX + 1) * (AGE_UNIT_MAX + 1))
>  		return -ERANGE;
>  
> -	/* iterate through all possible age_count to find the closest pair */
> -	for (tmp_age_count = 0; tmp_age_count <= AGE_CNT_MAX; ++tmp_age_count) {
> +	/* Iterate through all possible age_count values to find the closest
> +	 * pair. Start from 1 because the per-entry aging counter is
> +	 * initialized to AGE_CNT and a value of 0 means the entry will
> +	 * never be aged out.
> +	 */
> +	for (tmp_age_count = 1; tmp_age_count <= AGE_CNT_MAX; ++tmp_age_count) {
>  		unsigned int tmp_age_unit = secs / (tmp_age_count + 1) - 1;
>  
>  		if (tmp_age_unit <= AGE_UNIT_MAX) {

Sashiko noted that the above will have a problem with secs == 1:

What happens here if secs is 1?
Since the bounds check at the start of the function allows secs == 1,
tmp_age_unit would be calculated as 1 / (1 + 1) - 1, which evaluates to
0 - 1, resulting in an unsigned underflow to UINT_MAX.
>  		if (tmp_age_unit <= AGE_UNIT_MAX) {
Because UINT_MAX is greater than AGE_UNIT_MAX, this condition will fail for
all iterations of the loop.
[ ... ]
>  	mt7530_write(priv, MT7530_AAC, AGE_CNT(age_count) | AGE_UNIT(age_unit));
If the loop exits without ever finding a match and entering the if block,
age_count and age_unit will remain uninitialized. Could this result in
uninitialized stack variables being written to the MT7530_AAC hardware
register?
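
The concern can be reproduced with a small userspace model of the search
loop. This is only a sketch: find_age_pair() is a hypothetical helper, the
error bookkeeping is my assumption about the part of the loop not shown in
the hunk, and the AGE_CNT_MAX / AGE_UNIT_MAX values are assumed and should
be checked against mt7530.h:

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed register field limits; verify against mt7530.h. */
#define AGE_CNT_MAX	0xffu
#define AGE_UNIT_MAX	0xfffu

/*
 * Model of the search in mt7530_set_ageing_time(). 'start' is the first
 * age_count tried (0 before the patch, 1 after). Returns true and fills
 * *cnt and *unit if at least one candidate pair passes the range check.
 */
static bool find_age_pair(unsigned int secs, unsigned int start,
			  unsigned int *cnt, unsigned int *unit)
{
	unsigned int error = UINT_MAX;
	bool found = false;
	unsigned int c;

	for (c = start; c <= AGE_CNT_MAX; ++c) {
		/* When secs < c + 1 the division is 0 and "- 1" wraps. */
		unsigned int u = secs / (c + 1) - 1;

		if (u <= AGE_UNIT_MAX) {
			unsigned int e = secs - (c + 1) * (u + 1);

			/* Keep the closest pair seen so far. */
			if (error > e) {
				error = e;
				*cnt = c;
				*unit = u;
				found = true;
			}
			/* Exact match, stop searching. */
			if (!error)
				break;
		}
	}

	return found;
}
```

In this model, secs = 10 gives (0, 9) when starting from 0 and (1, 4) when
starting from 1, matching the commit message, while secs = 1 with a start
of 1 never passes the range check and leaves the outputs untouched.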

/P



* Re: [PATCH] RDMA/nldev: add mutual exclusion in nldev_dellink()
From: Zhu Yanjun @ 2026-05-07 14:11 UTC (permalink / raw)
  To: Edward Adam Davis, yanjun.zhu@linux.dev
  Cc: akpm, arjan, davem, dsahern, edumazet, hdanton, horms, jgg, kuba,
	kuni1840, kuniyu, leon, linux-kernel, linux-rdma, netdev, pabeni,
	syzbot+d8f76778263ab65c2b21, syzkaller-bugs, zyjzyj2000
In-Reply-To: <tencent_D175A964A3A32452D77DB76B66C2B3730305@qq.com>


On 2026/5/7 6:40, Edward Adam Davis wrote:
> On Thu, 7 May 2026 06:25:54 -0700, Zhu Yanjun wrote:
>>> We must serialize calls to nldev_dellink() or risk a crash as syzbot
>>> reported:
>>>
>>> KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
>>> Call Trace:
>>>    udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
>>>    rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
>>>    rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
>>>    rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
>>>    rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
>>>
>>> Fixes: a60e3f3d6fba ("RDMA/nldev: Add dellink function pointer")
>>> Reported-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
>>> Tested-by: syzbot+d8f76778263ab65c2b21@syzkaller.appspotmail.com
>>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>> Thanks a lot. This looks like a good solution. Since the issue is
>> reproducible,
>>
>> have you sent this commit to syzbot for verification?
> The patch has been verified by syzbot.

Thanks a lot.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

>
> BR,
> Edward
>
-- 
Best Regards,
Yanjun.Zhu



