Netdev List
* [GIT PULL] Networking for v7.1-rc3
From: Jakub Kicinski @ 2026-05-07 17:21 UTC
  To: torvalds; +Cc: kuba, davem, netdev, linux-kernel, pabeni

Hi Linus!

The following changes since commit 08d0d3466664000ba0670e0ef0d447f23459e0d4:

  Merge tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2026-04-30 08:45:43 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-7.1-rc3

for you to fetch changes up to 41ae14071cd7f6a7770e2fe1f8a0859d4c2c6ba4:

  net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init() (2026-05-07 09:08:47 -0700)

----------------------------------------------------------------
Including fixes from Netfilter, IPsec, Bluetooth and WiFi.

Current release - fix to a fix:

 - ipmr: add __rcu to netns_ipv4.mrt, make sure we hold the RCU lock
   in all relevant places

Current release - new code bugs:

 - fixes for the recently added resizable hash tables

 - ipv6: make sure we default IPv6 tunnel drivers to =m now that
   IPv6 itself is built in

 - drv: octeontx2-af: parser/CAM fixes

Previous releases - regressions:

 - phy: micrel: fix LAN8814 QSGMII soft reset

 - wifi: cw1200: revert "Fix locking in error paths"

 - wifi: ath12k: fix crash on WCN7850, due to adding the same queue
   buffer to a list multiple times

Previous releases - always broken:

 - a number of info leak fixes

 - ipv6: implement limits on extension header parsing

 - wifi: a number of fixes for missing bounds checks in the drivers

 - Bluetooth: fixes for races and locking issues

 - af_unix: fix an issue between garbage collection and PEEK

 - af_unix: fix yet another issue with OOB data

 - xfrm: esp: avoid in-place decrypt on shared skb frags

 - netfilter: replace skb_try_make_writable() by skb_ensure_writable()

 - openvswitch: vport: fix race between tunnel creation and linking
   leading to invalid memory accesses (type confusion)

 - drv: amd-xgbe: fix PTP addend overflow causing frozen clock

Misc:

 - sched/isolation: make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
   (for relevant IPVS change)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

----------------------------------------------------------------
Aaradhana Sahu (1):
      wifi: ath12k: fix OF node refcount imbalance in WSI graph traversal

Aleksander Jan Bajkowski (1):
      net: usb: r8152: add TRENDnet TUC-ET2G v2.0

Alex Cheema (1):
      net: usb: cdc_ncm: add Apple Mac USB-C direct networking quirk

Alyssa Ross (1):
      ipv6: default IPV6_SIT to m

Amir Mohammad Jahangirzad (1):
      wifi: libertas: fix integer underflow in process_cmdrequest()

Andreas Haarmann-Thiemann (1):
      net: ethernet: cortina: Drop half-assembled SKB

Aurelien DESBRIERES (1):
      Bluetooth: hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized

Baochen Qiang (2):
      wifi: ath12k: prepare REO update element only for primary link
      wifi: ath12k: fix peer_id usage in normal RX path

Bart Van Assche (1):
      wifi: cw1200: Revert "Fix locking in error paths"

Benjamin Berg (1):
      wifi: mac80211: use safe list iteration in radar detect work

Bobby Eshleman (1):
      eth: fbnic: fix double-free of PCS on phylink creation failure

Breno Leitao (1):
      netpoll: pass buffer size to egress_dev() to avoid MAC truncation

Catherine (1):
      wifi: mac80211: drop stray 'static' from fast-RX rx_result

Cosmin Ratiu (6):
      tools/selftests: Use a sensible timeout value for iperf3 client
      tools/selftests: Add a VXLAN+IPsec traffic test
      xfrm: Don't clobber inner headers when already set
      net/mlx5e: psp: Fix invalid access on PSP dev registration fail
      net/mlx5e: psp: Expose only a fully initialized priv->psp
      net/mlx5e: psp: Hook PSP dev reg/unreg to profile enable/disable

D. Wythe (1):
      net/smc: fix missing sk_err when TCP handshake fails

Daniel Borkmann (1):
      ipv6: Implement limits on extension header parsing

Daniel Golle (1):
      net: dsa: mt7530: fix .get_stats64 sleeping in atomic context

Daniel Machon (2):
      net: sparx5: fix wrong chip ids for TSN SKUs
      net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()

Daniel Zahka (3):
      netdevsim: psp: only call nsim_psp_uninit() on PFs
      netdevsim: psp: serialize calls to nsim_psp_uninit()
      netdevsim: psp: rcu protect psp_dev reference

David Carlier (2):
      psp: strip variable-length PSP header in psp_dev_rcv()
      Bluetooth: hci_conn: fix potential UAF in create_big_sync

Dipayaan Roy (4):
      net: mana: check xdp_rxq registration before unreg in mana_destroy_rxq()
      net: mana: Skip WQ object destruction for uninitialized RXQ
      net: mana: remove double CQ cleanup in mana_create_rxq error path
      net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR

Dmitry Baryshkov (1):
      wifi: ath10k: snoc: select POWER_SEQUENCING

Dudu Lu (2):
      Bluetooth: bnep: fix incorrect length parsing in bnep_rx_frame() extension handling
      Bluetooth: l2cap: fix MPS check in l2cap_ecred_reconf_req

Eric Dumazet (12):
      ipmr: prevent info-leak in ipmr_cache_report()
      ipv4: igmp: annotate data-races in igmp_heard_query()
      net/sched: sch_pie: annotate more data-races in pie_dump_stats()
      net/sched: sch_cake: annotate data-races in cake_dump_class_stats (I)
      net/sched: sch_cake: annotate data-races in cake_dump_class_stats (II)
      vsock/virtio: fix potential unbounded skb queue
      net: prevent possible UAF in rtnl_prop_list_size()
      net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()
      ipv6: fix potential UAF caused by ip6_forward_proxy_check()
      inetpeer: add a missing read_seqretry() in inet_getpeer()
      net/sched: sch_sfq: annotate data-races from sfq_dump_class_stats()
      tcp: tcp_child_process() related UAF

Fernando Fernandez Mancera (3):
      netfilter: nf_socket: skip socket lookup for non-first fragments
      netfilter: nf_tables: skip L4 header parsing for non-first fragments
      netfilter: xtables: fix L4 header parsing for non-first fragments

Florian Westphal (2):
      netfilter: xt_CT: fix usersize for v1 and v2 revision
      netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables

Gregory Fuchedgi (1):
      amd-xgbe: fix PTP addend overflow causing frozen clock

Holger Brunck (2):
      net: wan: fsl_ucc_hdlc: fix uhdlc_memclean
      net: wan: fsl_ucc_hdlc: fix ucc_hdlc_remove

Ilya Maximets (3):
      openvswitch: vport: fix race between tunnel creation and linking
      openvswitch: vport: fix self-deadlock on release of tunnel ports
      selftests: openvswitch: add tests for tunnel vport refcounting

Jakov Novak (1):
      wifi: libertas: notify firmware load wait on disconnect

Jakub Kicinski (21):
      Merge branch 'net-mctp-test-minor-kunit-test-fixes'
      Merge branch 'octeontx2-af-npc-cn20k-mcam-fixes'
      Merge tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
      Merge branch 'ipv6-fix-ecmp-route-failover-on-carrier-loss'
      Merge branch 'replace-direct-dequeue-call-with-qdisc_dequeue_peeked'
      Merge branch 'net-sched-sch_cake-annotate-data-races-in-cake_dump_class_stats-series'
      net: tls: fix silent data drop under pipe back-pressure
      selftests: tls: add test for data loss on small pipe
      Merge branch 'mptcp-misc-fixes-for-v7-1-rc3'
      Merge branch 'bnxt_en-bug-fixes'
      Merge tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
      Merge branch 'net-mlx5e-psp-fixes'
      Merge branch 'net-mlx5-fixes-for-socket-direct'
      Merge branch 'xsk-fix-bugs-around-xsk-skb-allocation'
      Merge tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
      Merge tag 'for-net-2026-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
      Merge tag 'ovpn-net-20260504' of https://github.com/OpenVPN/ovpn-net-next
      Merge tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
      selftests: drv-net: fix sort order of makefile and config
      Merge branch 'netdevsim-psp-fix-init-and-uninit-bugs'
      Merge branch 'mptcp-pm-misc-fixes-for-v7-1-rc3'

Jamal Hadi Salim (1):
      net/sched: sch_red: Replace direct dequeue call with peek and qdisc_dequeue_peeked

Jann Horn (1):
      Bluetooth: hci_event: fix memset typo

Jason Xing (8):
      xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
      xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
      xsk: handle NULL dereference of the skb without frags issue
      xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
      xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
      xsk: avoid skb leak in XDP_TX_METADATA case
      xsk: fix xsk_addrs slab leak on multi-buffer error path
      xsk: fix u64 descriptor address truncation on 32-bit architectures

Jeongjun Park (1):
      wifi: rsi: fix kthread lifetime race between self-exit and external-stop

Jeremy Kerr (2):
      net: mctp: test: use a zeroed struct sockaddr_mctp
      net: mctp: test: Use dev_direct_xmit for TX to our test device

Jesper Dangaard Brouer (1):
      veth: fix OOB txq access in veth_poll() with asymmetric queue counts

Jiawen Wu (2):
      net: libwx: fix VF illegal register access
      net: libwx: use request_irq for VF misc interrupt

Jiexun Wang (1):
      af_unix: Reject SIOCATMARK on non-stream sockets

Jiri Slaby (SUSE) (1):
      wifi: ath5k: do not access array OOB

Joey Lu (1):
      net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()

Johannes Berg (5):
      Merge tag 'ath-current-20260427' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath
      wifi: mac80211: tests: mark HT check strict
      Merge tag 'ath-current-20260505' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath
      wifi: mac80211: remove station if connection prep fails
      wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage

Julian Anastasov (6):
      ipvs: fixes for the new ip_vs_status info
      ipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars
      ipvs: fix the spin_lock usage for RT build
      ipvs: do not leak dest after get from dest trash
      ipvs: fix races around est_mutex and est_cpulist
      ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size

Justin Chen (1):
      net: phy: broadcom: Save PHY counters during suspend

Kai Zen (1):
      net: rtnetlink: zero ifla_vf_broadcast to avoid stack infoleak in rtnl_fill_vfinfo

Kalesh AP (1):
      bnxt_en: Check return value of bnxt_hwrm_vnic_cfg

Kuan-Ting Chen (1):
      xfrm: esp: avoid in-place decrypt on shared skb frags

Kuniyuki Iwashima (6):
      selftest: net: Add test for TCP flow failover with ECMP routes.
      af_unix: Set gc_in_progress to true in unix_gc().
      ipmr: Add __rcu to netns_ipv4.mrt.
      ipv6: Fix null-ptr-deref in fib6_mtu().
      ipmr: Call ipmr_fib_lookup() under RCU.
      tcp: Fix dst leak in tcp_v6_connect().

Lorenzo Bianconi (1):
      net: airoha: Move entries to queue head in case of DMA mapping failure in airoha_dev_xmit()

Luiz Augusto von Dentz (1):
      Bluetooth: hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt

Maciej W. Rozycki (1):
      MAINTAINERS: Add self for the DEC LANCE network driver

Maoyi Xie (3):
      ip6_gre: Use cached t->net in ip6erspan_changelink().
      wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS
      wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation

Marek Szyprowski (1):
      wifi: brcmfmac: Fix potential use-after-free issue when stopping watchdog task

Markus Baier (1):
      net: usb: asix: ax88772: re-add usbnet_link_change() in phylink callbacks

Matthieu Baerts (NGI0) (12):
      mptcp: sockopt: increase seq in mptcp_setsockopt_all_sf
      mptcp: pm: kernel: correctly retransmit ADD_ADDR ID 0
      mptcp: pm: ADD_ADDR rtx: allow ID 0
      mptcp: pm: ADD_ADDR rtx: fix potential data-race
      mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
      mptcp: pm: ADD_ADDR rtx: free sk if last
      mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
      mptcp: pm: ADD_ADDR rtx: skip inactive subflows
      mptcp: pm: ADD_ADDR rtx: return early if no retrans
      mptcp: pm: prio: skip closed subflows
      selftests: mptcp: check output: catch cmd errors
      selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl

Michael Bommarito (6):
      xfrm: ah: account for ESN high bits in async callbacks
      wifi: nl80211: require admin perm on SET_PMK / DEL_PMK
      wifi: mac80211: check ieee80211_rx_data_set_link return in pubsta MLO path
      Bluetooth: virtio_bt: clamp rx length before skb_put
      Bluetooth: virtio_bt: validate rx pkt_type header length
      Bluetooth: HIDP: serialise l2cap_unregister_user via hidp_session_sem

Michael Chan (2):
      bnxt_en: Delay for 5 seconds after AER DPC for all chips
      bnxt_en: Set bp->max_tpa according to what the FW supports

Michal Kosiorek (1):
      xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete

Mikhail Gavrilov (1):
      Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion

Nan Li (1):
      net/rds: handle zerocopy send cleanup before the message is queued

Nicolas Escande (1):
      wifi: ath12k: fix leak in some ath12k_wmi_xxx() functions

Pablo Neira Ayuso (8):
      netfilter: replace skb_try_make_writable() by skb_ensure_writable()
      netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
      netfilter: x_tables: add .check_hooks to matches and targets
      netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
      netfilter: flowtable: ensure sufficient headroom in xmit path
      netfilter: flowtable: fix inline vlan encapsulation in xmit path
      netfilter: flowtable: fix inline pppoe encapsulation in xmit path
      netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header

Paolo Abeni (3):
      mptcp: fix rx timestamp corruption on fastopen
      Merge branch 'net-mana-fix-mana_destroy_rxq-cleanup-for-partial-rxq-init'
      Merge branch 'openvswitch-fix-self-deadlock-on-release-of-tunnel-vports'

Pauli Virtanen (2):
      Bluetooth: SCO: fix sleeping under spinlock in sco_conn_ready
      Bluetooth: SCO: hold sk properly in sco_conn_ready

Pavan Chebbi (1):
      bnxt_en: Use absolute target ns from ptp_clock_request

Pavitra Jha (1):
      net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler

Pengpeng Hou (1):
      Bluetooth: RFCOMM: pull credit byte with skb_pull_data()

Qingfang Deng (1):
      ovpn: reset MAC header before passing skb up

Ralf Lici (2):
      ovpn: ensure packet delivery happens with BH disabled
      selftests: ovpn: reduce ping count in test.sh

Rameshkumar Sundaram (1):
      wifi: ath12k: initialize RSSI dBm conversion event state

Ratheesh Kannoth (10):
      octeontx2-af: npc: cn20k: Propagate MCAM key-type errors on cn20k
      octeontx2-af: npc: cn20k: Drop debugfs_create_file() error checks in init
      octeontx2-af: npc: cn20k: Propagate errors in defrag MCAM alloc rollback
      octeontx2-af: npc: cn20k: Fix target map and rule
      octeontx2-af: npc: cn20k: Clear MCAM entries by index and key width
      octeontx2-af: npc: cn20k: Fix bank value
      octeontx2-af: npc: cn20k: Fix MCAM actions read
      octeontx2-af: npc: cn20k: Initialize default-rule index outputs up front
      octeontx2-af: npc: cn20k: Tear down default MCAM rules explicitly on free
      octeontx2-af: npc: cn20k: Reject missing default-rule MCAM indices

Rio Liu (1):
      wifi: mac80211: skip ieee80211_verify_sta_ht_mcs_support check in non-strict mode

Robert Marko (1):
      net: phy: micrel: fix LAN8814 QSGMII soft reset

Ruijie Li (1):
      xfrm: provide message size for XFRM_MSG_MAPPING

Sagarika Sharma (1):
      ipv6: update route serial number on NETDEV_CHANGE

Sai Teja Aluvala (1):
      Bluetooth: btintel_pcie: treat boot stage bit 12 as warning

SeungJu Cheon (2):
      Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
      Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths

Shardul Bankar (2):
      mptcp: use MPJoinSynAckHMacFailure for SynAck HMAC failure
      mptcp: use MPTCP_RST_EMPTCP for ACK HMAC validation failure

Shay Drory (4):
      net/mlx5: SD: Serialize init/cleanup
      net/mlx5: SD, Keep multi-pf debugfs entries on primary
      net/mlx5e: SD, Fix missing cleanup on probe error
      net/mlx5e: SD, Fix race condition in secondary device probe/remove

Shitalkumar Gandhi (1):
      net: rtsn: fix mdio_node leak in rtsn_mdio_alloc()

Siwei Zhang (3):
      Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
      Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
      Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()

Tristan Madani (3):
      wifi: b43: enforce bounds check on firmware key index in b43_rx()
      wifi: b43legacy: enforce bounds check on firmware key index in RX path
      Bluetooth: btmtk: validate WMT event SKB length before struct access

Victor Nogueira (1):
      selftests/tc-testing: Add tests that force red and sfb to dequeue from child's gso_skb

Victor Nogueria (1):
      net/sched: sch_sfb: Replace direct dequeue call with peek and qdisc_dequeue_peeked

Waiman Long (2):
      ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU
      sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN

Wei Fang (1):
      net: enetc: fix VSI mailbox timeout handling and DMA lifecycle

Weiming Shi (1):
      netfilter: nft_fwd_netdev: use recursion counter in neigh egress path

Yilin Zhu (1):
      ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()

Yu-Hsiang Tseng (1):
      wifi: ath12k: use lockdep_assert_in_rcu_read_lock() for RCU assertions

 MAINTAINERS                                        |   6 +
 drivers/bluetooth/btintel_pcie.c                   |  13 +-
 drivers/bluetooth/btintel_pcie.h                   |   2 +-
 drivers/bluetooth/btmtk.c                          |  15 +-
 drivers/bluetooth/hci_ath.c                        |   3 +
 drivers/bluetooth/hci_bcsp.c                       |   3 +
 drivers/bluetooth/hci_h4.c                         |   3 +
 drivers/bluetooth/hci_h5.c                         |   3 +
 drivers/bluetooth/virtio_bt.c                      |  39 ++-
 drivers/net/dsa/mt7530.c                           |  75 +++-
 drivers/net/dsa/mt7530.h                           |   8 +
 drivers/net/ethernet/airoha/airoha_eth.c           |   6 +-
 drivers/net/ethernet/amd/xgbe/xgbe.h               |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  16 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c      |  29 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c      |  10 +-
 drivers/net/ethernet/cortina/gemini.c              |   5 +
 drivers/net/ethernet/freescale/enetc/enetc.h       |   1 +
 drivers/net/ethernet/freescale/enetc/enetc_vf.c    |  42 ++-
 .../ethernet/marvell/octeontx2/af/cn20k/debugfs.c  |  33 +-
 .../net/ethernet/marvell/octeontx2/af/cn20k/npc.c  | 382 ++++++++++++++-------
 .../net/ethernet/marvell/octeontx2/af/cn20k/npc.h  |  24 +-
 .../net/ethernet/marvell/octeontx2/af/rvu_nix.c    |   3 +
 .../net/ethernet/marvell/octeontx2/af/rvu_npc.c    | 231 +++++++++++--
 .../net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c |  30 +-
 .../net/ethernet/mellanox/mlx5/core/en_accel/psp.c |  36 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  30 +-
 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c   | 114 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/lib/sd.h   |   2 +
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c     |   3 +-
 .../net/ethernet/microchip/sparx5/sparx5_main.h    |  10 +-
 .../net/ethernet/microchip/sparx5/sparx5_port.c    |   3 +-
 drivers/net/ethernet/microsoft/mana/gdma_main.c    |  40 ++-
 drivers/net/ethernet/microsoft/mana/mana_en.c      |  10 +-
 drivers/net/ethernet/microsoft/mana/shm_channel.c  |   5 -
 drivers/net/ethernet/renesas/rtsn.c                |   6 +-
 .../net/ethernet/stmicro/stmmac/dwmac-nuvoton.c    |   2 +
 drivers/net/ethernet/wangxun/libwx/wx_hw.c         |   7 +-
 drivers/net/ethernet/wangxun/libwx/wx_vf_common.c  |   4 +-
 drivers/net/netdevsim/netdev.c                     |   3 +-
 drivers/net/netdevsim/netdevsim.h                  |   4 +-
 drivers/net/netdevsim/psp.c                        |  65 +++-
 drivers/net/ovpn/io.c                              |   7 +
 drivers/net/phy/bcm-phy-lib.c                      |   9 +
 drivers/net/phy/bcm-phy-lib.h                      |   1 +
 drivers/net/phy/bcm7xxx.c                          |  14 +
 drivers/net/phy/broadcom.c                         |   5 +
 drivers/net/phy/micrel.c                           |  15 +-
 drivers/net/usb/asix_devices.c                     |   2 +
 drivers/net/usb/cdc_ncm.c                          |   8 +
 drivers/net/usb/r8152.c                            |   1 +
 drivers/net/veth.c                                 |   3 +-
 drivers/net/wan/fsl_ucc_hdlc.c                     |   9 +-
 drivers/net/wireless/ath/ath10k/Kconfig            |   1 +
 drivers/net/wireless/ath/ath12k/core.c             |  77 +++--
 drivers/net/wireless/ath/ath12k/dp_rx.c            |   5 +-
 drivers/net/wireless/ath/ath12k/mac.c              |   2 +-
 drivers/net/wireless/ath/ath12k/p2p.c              |   2 +-
 drivers/net/wireless/ath/ath12k/wmi.c              | 105 +++++-
 drivers/net/wireless/ath/ath5k/base.c              |   3 +-
 drivers/net/wireless/broadcom/b43/xmit.c           |   3 +-
 drivers/net/wireless/broadcom/b43legacy/xmit.c     |   3 +-
 .../wireless/broadcom/brcm80211/brcmfmac/sdio.c    |   6 +-
 drivers/net/wireless/marvell/libertas/if_usb.c     |   6 +-
 drivers/net/wireless/rsi/rsi_common.h              |   5 +-
 drivers/net/wireless/st/cw1200/pm.c                |   2 -
 drivers/net/wwan/t7xx/t7xx_modem_ops.c             |  20 +-
 drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c         |  18 +-
 drivers/net/wwan/t7xx/t7xx_port_proxy.h            |   2 +-
 include/linux/netfilter/x_tables.h                 |   8 +
 include/linux/sched/isolation.h                    |   6 +-
 include/net/bluetooth/hci_core.h                   |   2 +-
 include/net/dropreason-core.h                      |   6 +
 include/net/ip_vs.h                                |  31 +-
 include/net/ipv6.h                                 |   3 +
 include/net/mana/shm_channel.h                     |   6 +
 include/net/netfilter/nf_dup_netdev.h              |  13 +
 include/net/netfilter/nf_flow_table.h              |   4 +-
 include/net/netns/ipv4.h                           |   2 +-
 net/bluetooth/bnep/core.c                          |  13 +-
 net/bluetooth/hci_conn.c                           | 124 +++++--
 net/bluetooth/hci_event.c                          |  31 +-
 net/bluetooth/hidp/core.c                          |  27 +-
 net/bluetooth/iso.c                                |  56 +--
 net/bluetooth/l2cap_core.c                         |  14 +-
 net/bluetooth/l2cap_sock.c                         |   9 +
 net/bluetooth/rfcomm/core.c                        |   7 +-
 net/bluetooth/sco.c                                |  62 ++--
 net/core/dev.c                                     |   2 +-
 net/core/netpoll.c                                 |  23 +-
 net/core/rtnetlink.c                               |   1 +
 net/ipv4/ah4.c                                     |  14 +-
 net/ipv4/esp4.c                                    |   3 +-
 net/ipv4/igmp.c                                    |  58 ++--
 net/ipv4/inetpeer.c                                |   3 +-
 net/ipv4/ip_output.c                               |   2 +
 net/ipv4/ipmr.c                                    |  10 +-
 net/ipv4/netfilter/nf_socket_ipv4.c                |   3 +
 net/ipv4/tcp_ipv4.c                                |  14 +-
 net/ipv4/tcp_minisocks.c                           |   2 +-
 net/ipv6/Kconfig                                   |   4 +-
 net/ipv6/ah6.c                                     |  14 +-
 net/ipv6/esp6.c                                    |   3 +-
 net/ipv6/exthdrs_core.c                            |   7 +
 net/ipv6/ip6_gre.c                                 |   5 +-
 net/ipv6/ip6_input.c                               |   5 +
 net/ipv6/ip6_output.c                              |   5 +
 net/ipv6/ip6_tunnel.c                              |   4 +
 net/ipv6/netfilter/nf_socket_ipv6.c                |   5 +-
 net/ipv6/route.c                                   |   5 +
 net/ipv6/tcp_ipv6.c                                |  17 +-
 net/ipv6/xfrm6_protocol.c                          |   4 +-
 net/mac80211/mlme.c                                |  18 +-
 net/mac80211/rx.c                                  |   6 +-
 net/mac80211/tests/chan-mode.c                     |   1 +
 net/mac80211/util.c                                |   4 +-
 net/mctp/test/route-test.c                         |   2 +-
 net/mctp/test/utils.c                              |   2 +-
 net/mptcp/fastopen.c                               |   4 +-
 net/mptcp/pm.c                                     |  62 ++--
 net/mptcp/pm_kernel.c                              |  13 +-
 net/mptcp/sockopt.c                                |   4 +
 net/mptcp/subflow.c                                |   4 +-
 net/netfilter/ipvs/ip_vs_conn.c                    |  74 ++--
 net/netfilter/ipvs/ip_vs_core.c                    |   2 +-
 net/netfilter/ipvs/ip_vs_ctl.c                     | 164 ++++++---
 net/netfilter/ipvs/ip_vs_est.c                     |  83 +++--
 net/netfilter/nf_dup_netdev.c                      |  16 -
 net/netfilter/nf_flow_table_core.c                 |   1 +
 net/netfilter/nf_flow_table_ip.c                   | 151 ++++++--
 net/netfilter/nf_flow_table_path.c                 |   7 +-
 net/netfilter/nf_tables_api.c                      |  35 +-
 net/netfilter/nf_tables_core.c                     |   2 +-
 net/netfilter/nft_compat.c                         |  45 ++-
 net/netfilter/nft_exthdr.c                         |   2 +-
 net/netfilter/nft_fwd_netdev.c                     |  29 +-
 net/netfilter/nft_osf.c                            |   2 +-
 net/netfilter/nft_tproxy.c                         |   8 +-
 net/netfilter/x_tables.c                           |  79 ++++-
 net/netfilter/xt_CT.c                              |   8 +-
 net/netfilter/xt_TCPMSS.c                          |  33 +-
 net/netfilter/xt_TPROXY.c                          |  11 +-
 net/netfilter/xt_addrtype.c                        |  25 +-
 net/netfilter/xt_devgroup.c                        |  18 +-
 net/netfilter/xt_ecn.c                             |   4 +
 net/netfilter/xt_hashlimit.c                       |   4 +-
 net/netfilter/xt_osf.c                             |   3 +
 net/netfilter/xt_physdev.c                         |  24 +-
 net/netfilter/xt_policy.c                          |  24 +-
 net/netfilter/xt_set.c                             |  39 ++-
 net/netfilter/xt_tcpmss.c                          |   4 +
 net/openvswitch/vport-geneve.c                     |   5 +-
 net/openvswitch/vport-gre.c                        |   5 +-
 net/openvswitch/vport-netdev.c                     |  64 ++--
 net/openvswitch/vport-netdev.h                     |   2 +-
 net/openvswitch/vport-vxlan.c                      |   5 +-
 net/psp/psp_main.c                                 |  42 ++-
 net/rds/message.c                                  |  20 +-
 net/sched/sch_cake.c                               | 153 +++++----
 net/sched/sch_fq_codel.c                           |  39 ++-
 net/sched/sch_pie.c                                |  14 +-
 net/sched/sch_red.c                                |   2 +-
 net/sched/sch_sfb.c                                |   2 +-
 net/sched/sch_sfq.c                                |  48 +--
 net/smc/af_smc.c                                   |   8 +-
 net/tls/tls_sw.c                                   |   6 +-
 net/unix/af_unix.c                                 |   3 +
 net/unix/garbage.c                                 |   6 +-
 net/vmw_vsock/virtio_transport_common.c            |   4 +-
 net/wireless/nl80211.c                             |  27 ++
 net/wireless/pmsr.c                                |   2 +-
 net/xdp/xsk.c                                      | 115 ++++---
 net/xdp/xsk_buff_pool.c                            |   3 +
 net/xfrm/xfrm_output.c                             |  20 +-
 net/xfrm/xfrm_state.c                              |  12 +-
 net/xfrm/xfrm_user.c                               |   1 +
 tools/testing/selftests/drivers/net/hw/Makefile    |   1 +
 tools/testing/selftests/drivers/net/hw/config      |   5 +
 .../selftests/drivers/net/hw/ipsec_vxlan.py        | 204 +++++++++++
 tools/testing/selftests/drivers/net/lib/py/load.py |   5 +-
 tools/testing/selftests/net/Makefile               |   1 +
 tools/testing/selftests/net/mptcp/mptcp_lib.sh     |  16 +-
 tools/testing/selftests/net/mptcp/pm_netlink.sh    |  20 +-
 .../selftests/net/openvswitch/openvswitch.sh       |  37 ++
 .../testing/selftests/net/openvswitch/ovs-dpctl.py |  19 +-
 tools/testing/selftests/net/ovpn/test.sh           |   4 +-
 tools/testing/selftests/net/tcp_ecmp_failover.sh   | 216 ++++++++++++
 tools/testing/selftests/net/tls.c                  |  43 +++
 .../tc-testing/tc-tests/infra/qdiscs.json          | 148 ++++++++
 189 files changed, 3485 insertions(+), 1160 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/hw/ipsec_vxlan.py
 create mode 100755 tools/testing/selftests/net/tcp_ecmp_failover.sh

* Re: [PATCH net v1 1/2] dt-bindings: ethernet: eswin: refine delay model and HSP register description
From: Conor Dooley @ 2026-05-07 17:24 UTC
  To: lizhi2
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, robh, krzk+dt,
	conor+dt, netdev, devicetree, linux-kernel, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, maxime.chevallier, linux-stm32,
	linux-arm-kernel, ningyu, linmin, pinkesh.vaghela, pritesh.patel,
	weishangjuan
In-Reply-To: <20260507083136.175-1-lizhi2@eswincomputing.com>

On Thu, May 07, 2026 at 04:31:36PM +0800, lizhi2@eswincomputing.com wrote:
> From: Zhi Li <lizhi2@eswincomputing.com>
> 
> Refine the EIC7700 Ethernet dt-binding based on observed hardware behavior
> and clarify the original delay model for eth0.
> 
> The previous binding used an enum-based definition for
> rx-internal-delay-ps and tx-internal-delay-ps. Replace it with a
> range-based model using:
> 
>   - minimum: 0
>   - maximum: 2540
>   - multipleOf: 20
> 
> This better reflects the actual hardware implementation, which
> supports 20ps granularity delay steps in the MAC RGMII interface.
> 
> The tx/rx internal delay values are clarified as MAC-side programmable
> delay components applied on the RGMII clock/data path, representing
> the effective delay seen at the MAC interface.
> 
> This does not change the intended hardware semantics, but aligns the
> binding with the actual hardware implementation.
> 
> These properties are optional and only required when MAC-side fine
> tuning is needed; otherwise delay alignment is provided by PHY or
> board design.
> 
> Depending on the selected RGMII timing mode, delay alignment may be
> provided by the PHY (e.g. rgmii-id) or by board/MAC-side configuration.
> When PHY or board design already provides the required delay, these
> MAC-side properties may be omitted. When MAC-side fine tuning is
> required, they should be provided to describe the internal RGMII
> timing adjustment.
> 
> Additionally, extend the description of the HSP subsystem register
> layout used by the MAC glue logic. This includes explicit TXD and RXD
> delay control registers to ensure deterministic initialization and
> to override any residual configuration potentially left by bootloaders.
> 
> Add reference to the EIC7700X SoC Technical Reference Manual,
> Chapter 10 ("High-Speed Interface"), Part 4 for background of the
> HSP CSR block:
> https://github.com/eswincomputing/EIC7700X-SoC-Technical-Reference-Manual/releases
> 
> There are no in-tree users of this binding, so no ABI impact is
> expected.
> 
> Fixes: 888bd0eca93c ("dt-bindings: ethernet: eswin: Document for EIC7700 SoC")
> Signed-off-by: Zhi Li <lizhi2@eswincomputing.com>
> ---

While this is v1, it's really v8 and there should therefore be a
changelog that explains where my ack and the new compatible went.

Cheers,
Conor.

>  .../bindings/net/eswin,eic7700-eth.yaml       | 50 +++++++++++++------
>  1 file changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml b/Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml
> index 91e8cd1db67b..fab95603bd82 100644
> --- a/Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml
> +++ b/Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml
> @@ -63,16 +63,39 @@ properties:
>        - const: stmmaceth
>  
>    rx-internal-delay-ps:
> -    enum: [0, 200, 600, 1200, 1600, 1800, 2000, 2200, 2400]
> +    minimum: 0
> +    maximum: 2540
> +    multipleOf: 20
> +    description:
> +      RX internal delay in picoseconds applied on the RGMII clock at the MAC
> +      side. The hardware supports 20 ps steps.
> +      This property is optional and only needed when MAC-side delay tuning
> +      is required.
>  
>    tx-internal-delay-ps:
> -    enum: [0, 200, 600, 1200, 1600, 1800, 2000, 2200, 2400]
> +    minimum: 0
> +    maximum: 2540
> +    multipleOf: 20
> +    description:
> +      TX internal delay in picoseconds applied on the RGMII clock at the MAC
> +      side. The hardware supports 20 ps steps.
> +      This property is optional and only needed when MAC-side delay tuning
> +      is required.
>  
>    eswin,hsp-sp-csr:
>      description:
>        HSP CSR is to control and get status of different high-speed peripherals
>        (such as Ethernet, USB, SATA, etc.) via register, which can tune
>        board-level's parameters of PHY, etc.
> +
> +      Additional background information about the High-Speed Subsystem
> +      and the HSP CSR block is available in Chapter 10 ("High-Speed Interface")
> +      of the EIC7700X SoC Technical Reference Manual, Part 4
> +      (EIC7700X_SoC_Technical_Reference_Manual_Part4.pdf). The manual is
> +      publicly available at
> +      https://github.com/eswincomputing/EIC7700X-SoC-Technical-Reference-Manual/releases
> +
> +      This reference is provided for background information only.
>      $ref: /schemas/types.yaml#/definitions/phandle-array
>      items:
>        - items:
> @@ -82,6 +105,8 @@ properties:
>            - description: Offset of AXI clock controller Low-Power request
>                           register
>            - description: Offset of register controlling TX/RX clock delay
> +          - description: Offset of register controlling TXD delay
> +          - description: Offset of register controlling RXD delay
>  
>  required:
>    - compatible
> @@ -93,8 +118,6 @@ required:
>    - phy-mode
>    - resets
>    - reset-names
> -  - rx-internal-delay-ps
> -  - tx-internal-delay-ps
>    - eswin,hsp-sp-csr
>  
>  unevaluatedProperties: false
> @@ -104,24 +127,23 @@ examples:
>      ethernet@50400000 {
>          compatible = "eswin,eic7700-qos-eth", "snps,dwmac-5.20";
>          reg = <0x50400000 0x10000>;
> -        clocks = <&d0_clock 186>, <&d0_clock 171>, <&d0_clock 40>,
> -                <&d0_clock 193>;
> -        clock-names = "axi", "cfg", "stmmaceth", "tx";
>          interrupt-parent = <&plic>;
>          interrupts = <61>;
>          interrupt-names = "macirq";
> -        phy-mode = "rgmii-id";
> -        phy-handle = <&phy0>;
> +        clocks = <&d0_clock 186>, <&d0_clock 171>, <&d0_clock 40>,
> +                <&d0_clock 193>;
> +        clock-names = "axi", "cfg", "stmmaceth", "tx";
>          resets = <&reset 95>;
>          reset-names = "stmmaceth";
> -        rx-internal-delay-ps = <200>;
> -        tx-internal-delay-ps = <200>;
> -        eswin,hsp-sp-csr = <&hsp_sp_csr 0x100 0x108 0x118>;
> -        snps,axi-config = <&stmmac_axi_setup>;
> +        eswin,hsp-sp-csr = <&hsp_sp_csr 0x100 0x108 0x118 0x114 0x11c>;
> +        phy-handle = <&phy0>;
> +        phy-mode = "rgmii-id";
>          snps,aal;
>          snps,fixed-burst;
>          snps,tso;
> -        stmmac_axi_setup: stmmac-axi-config {
> +        snps,axi-config = <&stmmac_axi_setup_gmac0>;
> +
> +        stmmac_axi_setup_gmac0: stmmac-axi-config {
>              snps,blen = <0 0 0 0 16 8 4>;
>              snps,rd_osr_lmt = <2>;
>              snps,wr_osr_lmt = <2>;
> -- 
> 2.25.1
> 

^ permalink raw reply
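
The range-based delay model in the quoted binding (minimum 0, maximum 2540, multipleOf 20) can be illustrated with a small userspace sketch. This is only a sketch: the helper names are hypothetical, and the idea that a driver would program the delay as a step count (value / 20) is an assumption, not something the binding or the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the binding text quoted above. */
#define DELAY_MAX_PS   2540u
#define DELAY_STEP_PS  20u

/* Hypothetical helper: mirrors the schema's minimum/maximum/multipleOf. */
static bool delay_ps_valid(uint32_t ps)
{
	return ps <= DELAY_MAX_PS && (ps % DELAY_STEP_PS) == 0;
}

/*
 * Hypothetical helper: convert picoseconds to a 20 ps step count.
 * Returns -1 for values the schema would reject. The step-count
 * encoding is an assumption made for illustration only.
 */
static int delay_ps_to_steps(uint32_t ps)
{
	if (!delay_ps_valid(ps))
		return -1;
	return (int)(ps / DELAY_STEP_PS);
}
```

Under these assumptions, the old enum value 200 maps to 10 steps and the new maximum of 2540 maps to 127 steps, while a value such as 210 is rejected.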

* Re: [syzbot] [kernel?] WARNING: ODEBUG bug in smpboot_thread_fn
From: Ido Schimmel @ 2026-05-07 17:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, bridge,
	Nikolay Aleksandrov, netdev
In-Reply-To: <87bjerwqan.ffs@tglx>

On Thu, May 07, 2026 at 10:57:04AM +0200, Thomas Gleixner wrote:
> On Wed, May 06 2026 at 18:29, Thomas Gleixner wrote:
> > On Mon, May 04 2026 at 05:23, syzbot wrote:
> >>
> >> ------------[ cut here ]------------
> >> ODEBUG: free active (active state 0) object: ffff888033a47278 object type: timer_list hint: br_ip6_multicast_port_query_expired+0x0/0x380 net/bridge/br_multicast.c:-1
> >
> >                                                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > An object which contains an active timer is RCU freed....
> 
> Unlike the other timer in the same object, the own_query timer is not
> shut down in br_multicast_port_ctx_deinit()
> 
> Something like the below.
> 
> Thanks,
> 
>         tglx
> ---
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -2030,8 +2030,10 @@ void br_multicast_port_ctx_deinit(struct
>  
>  #if IS_ENABLED(CONFIG_IPV6)
>  	timer_delete_sync(&pmctx->ip6_mc_router_timer);
> +	timer_delete_sync(&pmctx->ip6_own_query_timer);
>  #endif
>  	timer_delete_sync(&pmctx->ip4_mc_router_timer);
> +	timer_delete_sync(&pmctx->ip4_own_query_timer);
>  
>  	spin_lock_bh(&br->multicast_lock);
>  	del |= br_ip6_multicast_rport_del(pmctx);

Thanks for the report and the fix. It looks correct, but it's unclear to
me which commit to blame.

AFAICT, the trace tells us that the timer is pending (not executing)
when the object that contains it is RCU freed. However, it shouldn't be
possible for the timer to be pending at this stage since it is
deactivated when the port multicast context is disabled and it is only
reactivated if the context is not disabled.

So, I see two options:

1. We did not disable port multicast context.

2. We did disable the port multicast context, but the timer somehow got
reactivated.

I will look into it...

^ permalink raw reply
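
The invariant under discussion (every timer embedded in an object must be shut down before the object is freed) can be modelled with a toy userspace sketch. All names here are hypothetical and nothing below is bridge code; it only mimics the kind of check that debugobjects performs, and it does not settle which of the two options above is the actual cause.

```c
#include <stdbool.h>

/* Toy stand-in for a timer: only tracks whether it is armed. */
struct toy_timer {
	bool pending;
};

/* Toy stand-in for a per-port context holding two timers. */
struct toy_port_ctx {
	struct toy_timer mc_router_timer;
	struct toy_timer own_query_timer;
};

static void toy_timer_shutdown(struct toy_timer *t)
{
	t->pending = false;
}

/* shut_down_own_query selects between the old and the proposed behavior. */
static void toy_ctx_deinit(struct toy_port_ctx *ctx, bool shut_down_own_query)
{
	toy_timer_shutdown(&ctx->mc_router_timer);
	if (shut_down_own_query)
		toy_timer_shutdown(&ctx->own_query_timer);
}

/* Freeing an object that still holds an armed timer is the reported bug. */
static bool toy_ctx_safe_to_free(const struct toy_port_ctx *ctx)
{
	return !ctx->mc_router_timer.pending && !ctx->own_query_timer.pending;
}
```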

* [PATCH net-next 0/3] net/mlx5: Steering misc enhancements
From: Tariq Toukan @ 2026-05-07 17:34 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Yevgeny Kliteynik, Vlad Dogaru, Simon Horman, Kees Cook,
	Alex Vesker, Erez Shitrit, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

Hi,

This small series by Yevgeny contains a few steering enhancements /
cleanups.

Regards,
Tariq

Yevgeny Kliteynik (3):
  net/mlx5: HWS, Check if device is down while polling for completion
  net/mlx5: HWS, Handle destroying table that has a miss table
  net/mlx5: DR, Remove unused field of struct mlx5dr_matcher_rx_tx

 .../ethernet/mellanox/mlx5/core/steering/hws/bwc.c   | 12 ++++++++++++
 .../ethernet/mellanox/mlx5/core/steering/hws/table.c |  3 +++
 .../mellanox/mlx5/core/steering/sws/dr_types.h       |  1 -
 3 files changed, 15 insertions(+), 1 deletion(-)


base-commit: dacf281771a9aed1a723b196120a0de8637910b9
-- 
2.44.0


^ permalink raw reply

* [PATCH net-next 1/3] net/mlx5: HWS, Check if device is down while polling for completion
From: Tariq Toukan @ 2026-05-07 17:34 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Yevgeny Kliteynik, Vlad Dogaru, Simon Horman, Kees Cook,
	Alex Vesker, Erez Shitrit, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea, Shay Drori
In-Reply-To: <20260507173443.320465-1-tariqt@nvidia.com>

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

In case the device is down for any reason (e.g. FLR),
the HW will no longer generate completions - no point
polling and waiting for timeout.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/steering/hws/bwc.c   | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 6dcd9c2a78aa..eae02bc74221 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -422,6 +422,18 @@ int mlx5hws_bwc_queue_poll(struct mlx5hws_context *ctx,
 	if (!got_comp && !drain)
 		return 0;
 
+	if (unlikely(ctx->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)) {
+		/* If the device is down for any reason (e.g. FLR), the HW will
+		 * no longer generate completions.
+		 * Note that ETIMEDOUT is returned here because the BWC layer
+		 * already has a special handling for timeouts - it breaks the
+		 * rehash / resize / shrink loops to avoid chain of timeouts.
+		 */
+		mlx5_core_warn_once(ctx->mdev,
+				    "BWC poll: device is down, polling for completion aborted\n");
+		return -ETIMEDOUT;
+	}
+
 	queue_full = mlx5hws_send_engine_full(&ctx->send_queue[queue_id]);
 	while (queue_full || ((got_comp || drain) && *pending_rules)) {
 		ret = mlx5hws_send_queue_poll(ctx, queue_id, comp, burst_th);
-- 
2.44.0


^ permalink raw reply related
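
The early-exit idea in this patch can be sketched in userspace as follows. This is a toy model under stated assumptions: the types, the names and the max_polls bound are hypothetical and are not the mlx5 API. It only shows why checking the device state before the loop avoids spinning until a timeout.

```c
#include <errno.h>

enum toy_dev_state { TOY_DEV_UP, TOY_DEV_INTERNAL_ERROR };

struct toy_ctx {
	enum toy_dev_state state;
	unsigned int completions_ready;	/* completions the "HW" will deliver */
};

/*
 * Poll until all pending rules are completed.
 * Returns 0 on success or -ETIMEDOUT if the device is down or the
 * (hypothetical) poll budget runs out.
 */
static int toy_queue_poll(struct toy_ctx *ctx, unsigned int *pending,
			  unsigned int max_polls)
{
	/* A device in error state never generates completions: bail out. */
	if (ctx->state == TOY_DEV_INTERNAL_ERROR)
		return -ETIMEDOUT;

	while (*pending) {
		if (max_polls-- == 0)
			return -ETIMEDOUT;
		if (ctx->completions_ready) {
			ctx->completions_ready--;
			(*pending)--;
		}
	}
	return 0;
}
```

Returning the same error code as a real timeout lets a caller that already handles timeouts treat both cases alike, which is the reasoning given in the patch comment.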

* [PATCH net-next 2/3] net/mlx5: HWS, Handle destroying table that has a miss table
From: Tariq Toukan @ 2026-05-07 17:34 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Yevgeny Kliteynik, Vlad Dogaru, Simon Horman, Kees Cook,
	Alex Vesker, Erez Shitrit, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea, Moshe Shemesh
In-Reply-To: <20260507173443.320465-1-tariqt@nvidia.com>

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

If a table has a miss table that was created by the
'mlx5hws_table_set_default_miss' API function, its miss_tbl
keeps the table that points to it in a list.
If such a table is deleted, we also need to remove it from the
miss_tbl list, otherwise the node in the miss_tbl list will contain
garbage.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c
index bd292485a25b..dd7927983ab2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/table.c
@@ -282,6 +282,9 @@ int mlx5hws_table_destroy(struct mlx5hws_table *tbl)
 		goto unlock_err;
 	}
 
+	if (tbl->default_miss.miss_tbl)
+		list_del_init(&tbl->default_miss.next);
+
 	list_del_init(&tbl->tbl_list_node);
 	mutex_unlock(&ctx->ctrl_lock);
 
-- 
2.44.0


^ permalink raw reply related
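
The list bookkeeping described in this commit message can be sketched with a minimal userspace doubly linked list. The structure and function names below are hypothetical and only loosely mirror the driver; the point is that a table must unlink itself from its miss table's list before it goes away.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, in the spirit of struct list_head. */
struct toy_list {
	struct toy_list *prev, *next;
};

static void toy_list_init(struct toy_list *n)
{
	n->prev = n->next = n;
}

static void toy_list_add(struct toy_list *n, struct toy_list *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void toy_list_del_init(struct toy_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

static int toy_list_empty(const struct toy_list *head)
{
	return head->next == head;
}

struct toy_table {
	struct toy_table *miss_tbl;	/* table to jump to on a miss */
	struct toy_list miss_users;	/* head: tables that miss to this one */
	struct toy_list miss_node;	/* node in miss_tbl->miss_users */
};

static void toy_table_init(struct toy_table *t)
{
	t->miss_tbl = NULL;
	toy_list_init(&t->miss_users);
	toy_list_init(&t->miss_node);
}

static void toy_set_default_miss(struct toy_table *t, struct toy_table *miss)
{
	t->miss_tbl = miss;
	toy_list_add(&t->miss_node, &miss->miss_users);
}

/* Unlink from the miss table first, so its list never references t. */
static void toy_table_destroy(struct toy_table *t)
{
	if (t->miss_tbl)
		toy_list_del_init(&t->miss_node);
	t->miss_tbl = NULL;
}
```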

* [PATCH net-next 3/3] net/mlx5: DR, Remove unused field of struct mlx5dr_matcher_rx_tx
From: Tariq Toukan @ 2026-05-07 17:34 UTC (permalink / raw)
  To: Christoph Paasch, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Yevgeny Kliteynik, Vlad Dogaru, Simon Horman, Kees Cook,
	Alex Vesker, Erez Shitrit, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea
In-Reply-To: <20260507173443.320465-1-tariqt@nvidia.com>

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Remove a field that was never used.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_types.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_types.h
index cc328292bf84..e0344707f522 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_types.h
@@ -986,7 +986,6 @@ struct mlx5dr_matcher_rx_tx {
 					       [DR_RULE_MAX_STES];
 	u8 num_of_builders;
 	u8 num_of_builders_arr[DR_RULE_IPV_MAX][DR_RULE_IPV_MAX];
-	u64 default_icm_addr;
 	struct mlx5dr_table_rx_tx *nic_tbl;
 	u32 prio;
 	struct list_head list_node;
-- 
2.44.0


^ permalink raw reply related

* Re: [net-next v3 1/5] dt-bindings: net: starfive,jh7110-dwmac: Remove jh8100
From: Conor Dooley @ 2026-05-07 17:36 UTC (permalink / raw)
  To: Minda Chen
  Cc: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski, netdev,
	linux-kernel, linux-stm32, devicetree
In-Reply-To: <20260507094115.8355-2-minda.chen@starfivetech.com>

On Thu, May 07, 2026 at 05:41:11PM +0800, Minda Chen wrote:
> Remove the jh8100 dt-bindings because it is no longer supported.
> StarFive has stopped jh8100 development and will not release
> it externally.
> 
> Signed-off-by: Minda Chen <minda.chen@starfivetech.com>

Acked-by: Conor Dooley <conor.dooley@microchip.com>

^ permalink raw reply

* Re: [net-next v3 3/5] dt-bindings: net: starfive,jh7110-dwmac: Add jhb100 sgmii rx clk
From: Conor Dooley @ 2026-05-07 17:42 UTC (permalink / raw)
  To: Minda Chen
  Cc: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski, netdev,
	linux-kernel, linux-stm32, devicetree
In-Reply-To: <20260507094115.8355-4-minda.chen@starfivetech.com>

On Thu, May 07, 2026 at 05:41:13PM +0800, Minda Chen wrote:
> The jhb100 SGMII interface splits the tx/rx MAC clocks and requires
> setting the clock rate for 10M/100M/1000M speeds, so a new rx clock
> needs to be added to the code, the dts and the dt-binding doc.
> The jhb100 SGMII interface therefore has 6 clocks, while the
> RMII/RGMII interface still has 5 clocks.

Why is this not being done in the commit adding the jhb100 in the first
place?

> 
> Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
> ---
>  .../bindings/net/starfive,jh7110-dwmac.yaml   | 42 ++++++++++++++++---
>  1 file changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
> index 06aeaa0f6f00..af160a8dedb8 100644
> --- a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
> +++ b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
> @@ -39,20 +39,18 @@ properties:
>      maxItems: 1
>  
>    clocks:
> +    minItems: 5
>      items:
>        - description: GMAC main clock
>        - description: GMAC AHB clock
>        - description: PTP clock
>        - description: TX clock
>        - description: GTX clock
> +      - description: SGMII RX clock
>  
>    clock-names:
> -    items:
> -      - const: stmmaceth
> -      - const: pclk
> -      - const: ptp_ref
> -      - const: tx
> -      - const: gtx
> +    minItems: 5
> +    maxItems: 6
>  
>    starfive,tx-use-rgmii-clk:
>      description:
> @@ -99,6 +97,18 @@ allOf:
>            minItems: 2
>            maxItems: 2
>  
> +        clocks:
> +          minItems: 5
> +          maxItems: 5

This can just be "maxItems: 5", since minItems is set outside the
conditional to 5.

> +
> +        clock-names:
> +          items:
> +            - const: stmmaceth
> +            - const: pclk
> +            - const: ptp_ref
> +            - const: tx
> +            - const: gtx
> +
>          resets:
>            maxItems: 1
>  
> @@ -111,6 +121,26 @@ allOf:
>            contains:
>              const: starfive,jh7110-dwmac
>      then:
> +      properties:
> +        clocks:
> +          minItems: 5
> +          maxItems: 6

Remove these constraints, since they don't do anything more than the
outside ones do.

> +
> +        clock-names:
> +          oneOf:
> +            - items:
> +                - const: stmmaceth
> +                - const: pclk
> +                - const: ptp_ref
> +                - const: tx
> +                - const: gtx
> +            - items:
> +                - const: stmmaceth
> +                - const: pclk
> +                - const: ptp_ref
> +                - const: tx
> +                - const: gtx
> +                - const: sgmii_rx

Can't you just leave this list outside the conditional section, and add
the extra item to the end? The only difference appears to be the
sgmii_rx clock, and it's at the end.

I'm also not really convinced that this flexibility is required, unless
there are some controllers on the platform that do not support sgmii.

pw-bot: changes-requested

Cheers,
Conor.

>        if:
>          properties:
>            compatible:
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [GIT PULL] Networking for v7.1-rc3
From: pr-tracker-bot @ 2026-05-07 17:42 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: torvalds, kuba, davem, netdev, linux-kernel, pabeni
In-Reply-To: <20260507172147.3509230-1-kuba@kernel.org>

The pull request you sent on Thu,  7 May 2026 10:21:47 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git tags/net-7.1-rc3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fcee7d82f27d6a8b1ddc5bbefda59b4e441e9bc0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply

* Re: [net-next v3 4/5] net: stmmac: starfive: Add jhb100 SGMII interface
From: Conor Dooley @ 2026-05-07 17:44 UTC (permalink / raw)
  To: Minda Chen
  Cc: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski, netdev,
	linux-kernel, linux-stm32, devicetree
In-Reply-To: <20260507094115.8355-5-minda.chen@starfivetech.com>

On Thu, May 07, 2026 at 05:41:14PM +0800, Minda Chen wrote:
> Add the jhb100 compatible and SGMII support. The jhb100 SoC contains
> 2 SGMII interfaces integrated with a serdes PHY. SGMII uses split
> TX/RX MAC clocks, whose rate must be set to 2.5M/25M/125M in
> 10M/100M/1000M speed mode.
> 
> Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
> Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
> @@ -130,6 +160,7 @@ static const struct starfive_dwmac_data jh7100_data = {
>  static const struct of_device_id starfive_dwmac_match[] = {
>  	{ .compatible = "starfive,jh7100-dwmac", .data = &jh7100_data },
>  	{ .compatible = "starfive,jh7110-dwmac" },
> +	{ .compatible = "starfive,jhb100-dwmac" },

You've declared compatibility with the jh7110, why do you also need to
add the new compatible?

>  	{ /* sentinel */ }
>  };
>  MODULE_DEVICE_TABLE(of, starfive_dwmac_match);
> -- 
> 2.17.1
> 

^ permalink raw reply

* [PATCH net] rtnetlink: add RTEXT_FILTER_TERSE_DUMP support
From: Eric Dumazet @ 2026-05-07 17:45 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, David Ahern, Kuniyuki Iwashima, netdev,
	eric.dumazet, Eric Dumazet

iproute2 can spend a considerable amount of time in ll_init_map()
or ll_link_get() to dump verbose netdev attributes, contributing
to RTNL pressure.

Add a new RTEXT_FILTER_TERSE_DUMP flag so that rtnl_fill_ifinfo()
limits its output to:

- struct nlmsghdr
- IFLA_IFNAME
- IFLA_PROP_LIST

We can later avoid using RTNL when RTEXT_FILTER_TERSE_DUMP
is requested, as none of these attributes need RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/rtnetlink.h |  1 +
 net/core/rtnetlink.c           | 31 ++++++++++++++++++++++---------
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index dab9493c791b8465c6476990f42c4ee5ae82da2d..4b1dbd554e5c72c90d2416f7ade37956ad5472b7 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -840,6 +840,7 @@ enum {
 #define RTEXT_FILTER_CFM_CONFIG	(1 << 5)
 #define RTEXT_FILTER_CFM_STATUS	(1 << 6)
 #define RTEXT_FILTER_MST	(1 << 7)
+#define RTEXT_FILTER_TERSE_DUMP	(1 << 8)
 
 /* End of information exported to user level */
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b613bb6e07df6586aa06e7a59c8384dcedffeeef..7a9a769d142b6826d7fb01b599a4d9e0ae09a97d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1295,7 +1295,12 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 
 	size = NLMSG_ALIGN(sizeof(struct ifinfomsg))
 	       + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
-	       + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
+	       + rtnl_prop_list_size(dev);
+
+	if (ext_filter_mask & RTEXT_FILTER_TERSE_DUMP)
+		return size;
+
+	size += nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
 	       + nla_total_size(IFNAMSIZ) /* IFLA_QDISC */
 	       + nla_total_size_64bit(sizeof(struct rtnl_link_ifmap))
 	       + nla_total_size(MAX_ADDR_LEN) /* IFLA_ADDRESS */
@@ -1342,7 +1347,6 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4)  /* IFLA_CARRIER_DOWN_COUNT */
 	       + nla_total_size(4)  /* IFLA_MIN_MTU */
 	       + nla_total_size(4)  /* IFLA_MAX_MTU */
-	       + rtnl_prop_list_size(dev)
 	       + nla_total_size(MAX_ADDR_LEN) /* IFLA_PERM_ADDRESS */
 	       + rtnl_devlink_port_size(dev)
 	       + rtnl_dpll_pin_size(dev)
@@ -1940,15 +1944,18 @@ static int rtnl_fill_alt_ifnames(struct sk_buff *skb,
 	struct netdev_name_node *name_node;
 	int count = 0;
 
+	rcu_read_lock();
 	list_for_each_entry_rcu(name_node, &dev->name_node->list, list) {
-		if (nla_put_string(skb, IFLA_ALT_IFNAME, name_node->name))
+		if (nla_put_string(skb, IFLA_ALT_IFNAME, name_node->name)) {
+			rcu_read_unlock();
 			return -EMSGSIZE;
+		}
 		count++;
 	}
+	rcu_read_unlock();
 	return count;
 }
 
-/* RCU protected. */
 static int rtnl_fill_prop_list(struct sk_buff *skb,
 			       const struct net_device *dev)
 {
@@ -2071,13 +2078,20 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	ifm->ifi_flags = netif_get_flags(dev);
 	ifm->ifi_change = change;
 
-	if (tgt_netnsid >= 0 && nla_put_s32(skb, IFLA_TARGET_NETNSID, tgt_netnsid))
-		goto nla_put_failure;
-
 	netdev_copy_name(dev, devname);
 	if (nla_put_string(skb, IFLA_IFNAME, devname))
 		goto nla_put_failure;
 
+	if (rtnl_fill_prop_list(skb, dev))
+		goto nla_put_failure;
+
+	if (ext_filter_mask & RTEXT_FILTER_TERSE_DUMP)
+		goto end;
+
+	if (tgt_netnsid >= 0 &&
+	    nla_put_s32(skb, IFLA_TARGET_NETNSID, tgt_netnsid))
+		goto nla_put_failure;
+
 	if (nla_put_u32(skb, IFLA_TXQLEN, READ_ONCE(dev->tx_queue_len)) ||
 	    nla_put_u8(skb, IFLA_OPERSTATE,
 		       netif_running(dev) ? READ_ONCE(dev->operstate) :
@@ -2190,8 +2204,6 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 		goto nla_put_failure_rcu;
 	if (rtnl_fill_link_ifmap(skb, dev))
 		goto nla_put_failure_rcu;
-	if (rtnl_fill_prop_list(skb, dev))
-		goto nla_put_failure_rcu;
 	rcu_read_unlock();
 
 	if (dev->dev.parent &&
@@ -2210,6 +2222,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	if (rtnl_fill_dpll_pin(skb, dev))
 		goto nla_put_failure;
 
+end:
 	nlmsg_end(skb, nlh);
 	return 0;
 
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related
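
For illustration, a userspace caller could request the terse dump by setting the new bit in IFLA_EXT_MASK on an RTM_GETLINK dump request. The sketch below only builds the request buffer and does not send it. The struct and function names are hypothetical, and the flag value (1 << 8) is copied from the patch above because released UAPI headers may not define it; whether iproute2 would build its request this way is an assumption.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

#ifndef RTEXT_FILTER_TERSE_DUMP
#define RTEXT_FILTER_TERSE_DUMP	(1 << 8)	/* value taken from the patch */
#endif

/* Hypothetical request layout: header, ifinfomsg, one u32 attribute. */
struct terse_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
	char attrbuf[RTA_SPACE(sizeof(uint32_t))];
};

/* Fill *req with an RTM_GETLINK dump request; returns the message length. */
static size_t build_terse_dump_req(struct terse_req *req, uint32_t seq)
{
	uint32_t mask = RTEXT_FILTER_TERSE_DUMP;
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_type = RTM_GETLINK;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifi_family = AF_UNSPEC;

	/* IFLA_EXT_MASK carries the RTEXT_FILTER_* bits as a u32. */
	rta = (struct rtattr *)req->attrbuf;
	rta->rta_type = IFLA_EXT_MASK;
	rta->rta_len = RTA_LENGTH(sizeof(mask));
	memcpy(RTA_DATA(rta), &mask, sizeof(mask));

	req->nlh.nlmsg_len = NLMSG_ALIGN(NLMSG_LENGTH(sizeof(req->ifm))) +
			     rta->rta_len;
	return req->nlh.nlmsg_len;
}
```

The resulting buffer could then be written to a NETLINK_ROUTE socket in the usual way.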

* Re: [PATCH net] rtnetlink: add RTEXT_FILTER_TERSE_DUMP support
From: Eric Dumazet @ 2026-05-07 17:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, David Ahern, Kuniyuki Iwashima, netdev,
	eric.dumazet
In-Reply-To: <20260507174547.4125412-1-edumazet@google.com>

On Thu, May 7, 2026 at 10:45 AM Eric Dumazet <edumazet@google.com> wrote:
>
> iproute2 can spend a considerable amount of time in ll_init_map()
> or ll_link_get() to dump verbose netdev attributes, contributing
> to RTNL pressure.
>
> Add a new RTEXT_FILTER_TERSE_DUMP flag so that rtnl_fill_ifinfo()
> limits its output to:
>
> - struct nlmsghdr
> - IFLA_IFNAME
> - IFLA_PROP_LIST
>
> We can later avoid using RTNL when RTEXT_FILTER_TERSE_DUMP
> is requested, as none of these attributes need RTNL.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Wrong patch title, this targets net-next tree (obviously).

Thanks.

^ permalink raw reply

* RE: [PATCH net] net: ena: PHC: Check return code before setting timestamp output
From: Kiyanovski, Arthur @ 2026-05-07 18:09 UTC (permalink / raw)
  To: Vadim Fedorenko, David Miller, Jakub Kicinski,
	netdev@vger.kernel.org
  Cc: Richard Cochran, Eric Dumazet, Paolo Abeni, David Woodhouse,
	Thomas Gleixner, Miroslav Lichvar, Andrew Lunn, Wen Gu, Xuan Zhuo,
	Woodhouse, David, Sarna, Yuval, Machulsky, Zorik,
	Matushevsky, Alexander, Bshara, Saeed, Wilson, Matt,
	Liguori, Anthony, Bshara, Nafea, Schmeilin, Evgeny,
	Belgazal, Netanel, Saidi, Ali, Herrenschmidt, Benjamin,
	Dagan, Noam, Arinzon, David, Ostrovsky, Evgeny, Tabachnik, Ofir,
	Bernstein, Amit, stable@vger.kernel.org
In-Reply-To: <6511ab18-250b-436a-a11c-f50e78334666@linux.dev>


> -----Original Message-----
> From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> Sent: Thursday, May 7, 2026 3:38 AM
> Subject: RE: [EXTERNAL] [PATCH net] net: ena: PHC: Check return code before
> setting timestamp output
> ...
> Just an observation while reviewing - the idea of taking 2 spinlocks while
> reading timestamp doesn't look great and can potentially be CPU-expensive.
> Please, consider refactoring into RCU-style...

Noted, thanks for the review. We'll evaluate whether an RCU-based approach is appropriate here.

Arthur

^ permalink raw reply

* [PATCH net] ice: fix packet corruption due to extraneous page flip
From: John Ousterhout @ 2026-05-07 18:12 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: intel-wired-lan, przemyslaw.kitszel, netdev, John Ousterhout

Consider the following sequence of events:
* The bottom half of a buffer page is filled with data from
  packet A. The page has a net reference count (reference count
  - bias) of 1. The page is returned to the NIC, flipped to
  use the top half.
* Before the reference on the page is released, the NIC returns
  the page with no data in it ('size' is zero in ice_clean_rx_irq).
  In this case the bias does not get decremented. The page still
  has a net reference count of 1, so it gets returned to the NIC.
  However, ice_put_rx_mbuf flipped the page so that the bottom
  half is active.
* If the NIC stores another packet in the page before packet A
  has released its reference, the data in packet A will be
  overwritten with data from the new packet.
The fix is for ice_put_rx_mbuf not to flip pages that have a
size of 0.

Note: major revisions to the ice driver make this patch irrelevant
for recent versions. It applies to longterm stable versions
6.18.27 and 6.12.86; it also seems relevant for 6.6.137, but would
need modifications for that version. I have not examined earlier
versions.

Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 51c459a3e722..371e6db3c272 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1215,6 +1215,9 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
 
 	while (idx != ntc) {
+		union ice_32b_rx_flex_desc *rx_desc;
+		unsigned int size;
+
 		buf = &rx_ring->rx_buf[idx];
 		if (++idx == cnt)
 			idx = 0;
@@ -1224,10 +1227,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		 * To do this, only adjust pagecnt_bias for fragments up to
 		 * the total remaining after the XDP program has run.
 		 */
-		if (verdict != ICE_XDP_CONSUMED)
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-		else if (i++ <= xdp_frags)
+		if (verdict != ICE_XDP_CONSUMED) {
+			/* Don't "flip" the page if size is 0: in this case
+			 * the data in the current half will not be used so
+			 * it's OK to reuse that half. And, since the bias
+			 * didn't get decremented for this half, the page can
+			 * be returned to the NIC even if the other half is
+			 * still in use, so flipping the page could cause
+			 * live packet data to be overwritten.
+			 */
+			if (size != 0)
+				ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+		} else if (i++ <= xdp_frags) {
 			buf->pagecnt_bias++;
+		}
 
 		ice_put_rx_buf(rx_ring, buf);
 	}
-- 
2.43.0


^ permalink raw reply related
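
The sequence of events in the commit message can be replayed with a toy model. This is a deliberately simplified sketch: the names are hypothetical, the reference count and bias accounting are collapsed into a per-half "held" flag, and nothing here is ice driver code. It only shows how flipping on a zero-size completion points the NIC back at a half that is still in use.

```c
#include <stdbool.h>

struct rx_page {
	int active_half;	/* half the NIC writes next: 0 bottom, 1 top */
	bool held[2];		/* true while the stack still references a half */
};

/*
 * Model one completion of the given size.
 * Returns true if the write clobbered data still held by the stack.
 */
static bool nic_complete(struct rx_page *pg, unsigned int size,
			 bool skip_flip_on_empty)
{
	bool clobbered = false;

	if (size) {
		clobbered = pg->held[pg->active_half];
		pg->held[pg->active_half] = true;
	}
	/* Old behavior always flips; the fix skips the flip when size is 0. */
	if (size || !skip_flip_on_empty)
		pg->active_half ^= 1;
	return clobbered;
}

/* Replay: packet A, an empty completion, then another packet. */
static bool replay(bool skip_flip_on_empty)
{
	struct rx_page pg = { 0 };
	bool clobbered = false;

	clobbered |= nic_complete(&pg, 1500, skip_flip_on_empty);
	clobbered |= nic_complete(&pg, 0, skip_flip_on_empty);
	clobbered |= nic_complete(&pg, 1500, skip_flip_on_empty);
	return clobbered;
}
```

In this model, replay(false) reports a clobbered half while replay(true) does not, matching the scenario described above.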

* Re: [PATCH net-next 08/12] dt-bindings: net: toshiba,tc965x-dwmac: add TC956x Ethernet bridge
From: Alex Elder @ 2026-05-07 18:37 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, maxime.chevallier,
	rmk+kernel, konradybcio, robh, krzk+dt, conor+dt, linusw, brgl,
	arnd, gregkh, Daniel Thompson, mohd.anwar, a0987203069,
	alexandre.torgue, ast, boon.khai.ng, chenchuangyu, chenhuacai,
	daniel, hawk, hkallweit1, inochiama, john.fastabend, julianbraha,
	livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <afycOwz5TpkegkZd@baldur>

On 5/7/26 9:12 AM, Bjorn Andersson wrote:
> On Fri, May 01, 2026 at 10:54:16AM -0500, Alex Elder wrote:
>> diff --git a/Documentation/devicetree/bindings/net/toshiba,tc956x-dwmac.yaml b/Documentation/devicetree/bindings/net/toshiba,tc956x-dwmac.yaml
> [..]
>> +
>> +  gpio-controller: true
> 
> I don't have any concern with the use of a proper gpio driver to model
> the implementation, but if I understand correctly this relationship
> between gpio controller and gpio consumer is strictly internal to "the
> PCI device".

(I think you're already cool with this but I still wanted to respond.)

That is not correct.  These GPIO lines are used two ways for the
RB3gen2:
- drivers/pci/pwrctrl/pci-pwrctrl-tc9563.c uses GPIOs 2 and 3 to
   assert/deassert the reset lines associated with the two exposed
   downstream PCIe ports on the PCIe switch within the TC956x.

- Each of the Ethernet PHYs has a reset GPIO.  On the RB3gen2, the
   GPIOs used for the purpose come from the GPIO controller embedded
   in the TC9564 (00 and 01).

These are therefore "exposed" (they are *not* strictly internal).

> Is this connection variable or is the link merely expressed in
> DeviceTree to mitigate the fact that you choose to implement the
> responsibilities of the two parts split into two device drivers?

It is variable.  These resets might be implemented by other GPIO
controllers on other platforms.

> Are there other consumers of these TC956x gpios which would result in a
> board designer (and hence dts author) to ever reference this
> gpio-controller in a different way?

They could.  Nine of these GPIOs are exposed by the TC956x pins
(GPIO00-06, GPIO12, GPIO35 and GPIO36).  The RB3gen2 uses 00-03
(and possibly 04 but that's for a PHY we haven't tested yet).

					-Alex

> Regards,
> Bjorn


^ permalink raw reply

* Re: [PATCH net-next 09/12] gpio: tc956x: add TC956x/QPS615 support
From: Alex Elder @ 2026-05-07 18:39 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, maxime.chevallier,
	rmk+kernel, andersson, konradybcio, robh, krzk+dt, conor+dt,
	linusw, brgl, arnd, gregkh, daniel, mohd.anwar, a0987203069,
	alexandre.torgue, ast, boon.khai.ng, chenchuangyu, chenhuacai,
	daniel, hawk, hkallweit1, inochiama, john.fastabend, julianbraha,
	livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <3e5a42cc-53b8-4065-a32a-d754be40b4c7@lunn.ch>

On 5/2/26 9:48 PM, Andrew Lunn wrote:
>> It's possible gpio-regmap.c *could* be used.  We started with
>> vendor code and this code got separated at some point along
>> the way.  It was working, and I don't think I pursued other
>> options at that point.  I'll look at this possibility before we
>> send out the next version.
> 
> The GPIO subsystem has made a big effort to provide generic code,
> since GPIOs are pretty simple things with a lot in common. So if the
> generic code works, or can be made to work with minor changes, you
> should use it.

Yes, I can confirm this is what I will use.  As mentioned
elsewhere, LinusW provided a patch that supports the one
unusual thing this does (input-only GPIO lines).

>> What do you mean instantiate it twice?
> 
> I _think_ you need one instance for the first 32 GPIOs, and a second
> one for the remaining GPIOs. But maybe config->reg_stride might allow
> it to work with a single instance?

Oh I see.  Looking at it now, I presume the reg_stride will work here.

Thanks.

					-Alex


>     Andrew


^ permalink raw reply

* [PATCH net v2] ice: fix packet corruption due to extraneous page flip
From: John Ousterhout @ 2026-05-07 18:38 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: intel-wired-lan, przemyslaw.kitszel, netdev, John Ousterhout

Consider the following sequence of events:
* The bottom half of a buffer page is filled with data from
  packet A. The page has a net reference count (reference count
  - bias) of 1. The page is returned to the NIC, flipped to
  use the top half.
* Before the reference on the page is released, the NIC returns
  the page with no data in it ('size' is zero in ice_clean_rx_irq).
  In this case the bias does not get decremented. The page still
  has a net reference count of 1, so it gets returned to the NIC.
  However, ice_put_rx_mbuf flipped the page so that the bottom
  half is active.
* If the NIC stores another packet in the page before packet A
  has released its reference, the data in packet A will be
  overwritten with data from the new packet.
The fix is for ice_put_rx_mbuf not to flip pages that have a
size of 0.

Note: major revisions to the ice driver make this patch irrelevant
for recent versions. It applies to longterm stable versions
6.18.27 and 6.12.86; it also seems relevant for 6.6.137, but would
need modifications for that version. I have not examined earlier
versions.

Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 51c459a3e722..081c7a7392b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1215,6 +1215,13 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		xdp_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
 
 	while (idx != ntc) {
+		union ice_32b_rx_flex_desc *rx_desc;
+		unsigned int size;
+
+		rx_desc = ICE_RX_DESC(rx_ring, idx);
+		size = le16_to_cpu(rx_desc->wb.pkt_len) &
+		       ICE_RX_FLX_DESC_PKT_LEN_M;
+
 		buf = &rx_ring->rx_buf[idx];
 		if (++idx == cnt)
 			idx = 0;
@@ -1224,10 +1231,20 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		 * To do this, only adjust pagecnt_bias for fragments up to
 		 * the total remaining after the XDP program has run.
 		 */
-		if (verdict != ICE_XDP_CONSUMED)
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-		else if (i++ <= xdp_frags)
+		if (verdict != ICE_XDP_CONSUMED) {
+			/* Don't "flip" the page if size is 0: in this case
+			 * the data in the current half will not be used so
+			 * it's OK to reuse that half. And, since the bias
+			 * didn't get decremented for this half, the page can
+			 * be returned to the NIC even if the other half is
+			 * still in use, so flipping the page could cause
+			 * live packet data to be overwritten.
+			 */
+			if (size != 0)
+				ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+		} else if (i++ <= xdp_frags) {
 			buf->pagecnt_bias++;
+		}
 
 		ice_put_rx_buf(rx_ring, buf);
 	}
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next 10/12] net: stmmac: tc956x: add TC956x/QPS615 support
From: Alex Elder @ 2026-05-07 18:44 UTC (permalink / raw)
  To: Xilin Wu, andrew+netdev, davem, edumazet, kuba, pabeni,
	maxime.chevallier, rmk+kernel, andersson, konradybcio, robh,
	krzk+dt, conor+dt, linusw, brgl, arnd, gregkh
  Cc: Daniel Thompson, mohd.anwar, a0987203069, alexandre.torgue, ast,
	boon.khai.ng, chenchuangyu, chenhuacai, daniel, hawk, hkallweit1,
	inochiama, john.fastabend, julianbraha, livelycarpet87,
	matthew.gerlach, mcoquelin.stm32, me, prabhakar.mahadev-lad.rj,
	richardcochran, rohan.g.thomas, sdf, siyanteng, weishangjuan,
	wens, netdev, bpf, linux-arm-msm, devicetree, linux-gpio,
	linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <224E233C593EF171+8c8a43dd-5061-40f8-9eb7-f360eabf2ecc@radxa.com>

On 5/6/26 7:59 AM, Xilin Wu wrote:
> On 5/1/2026 11:54 PM, Alex Elder wrote:
>> +    /* AXI Configuration */
>> +    axi = &td->axi;
>> +    axi->axi_lpi_en = 1;
>> +    axi->axi_wr_osr_lmt = 31;
>> +    axi->axi_rd_osr_lmt = 31;
>> +    /* All sizes (2^2..2^8) are supported */
>> +    axi->axi_blen_regval = DMA_AXI_BLEN_MASK;
>> +    plat->axi = axi;
>> +
>> +    plat->mac_port_sel_speed = speed;
>> +    plat->flags = STMMAC_FLAG_MULTI_MSI_EN | STMMAC_FLAG_TSO_EN;
> 
> I got WoL working only after adding STMMAC_FLAG_USE_PHY_WOL here. I 
> guess it's required, since the driver clocks down the MAC/PMA/XPCS in 
> its suspend hook?

I just want to respond to this with a summary of our plans.

We will *not* be implementing wake-on-LAN (WoL) initially.  We
will work to get support for the eMACs upstream for TC956x, and
then as a separate step, we will enable WoL.

It's great to know you have it working, and our plan is to
implement it via the PHYs and not involve the MAC.  It seems
it will be relatively easy, but we have no plans to add it to
the current series.

					-Alex

^ permalink raw reply

* [PATCH iproute2-next] rdma: Align FRMR pool UAPI names with merged kernel UAPI
From: Chiara Meiohas @ 2026-05-07 18:46 UTC (permalink / raw)
  To: leon, dsahern, stephen
  Cc: michaelgur, jgg, linux-rdma, netdev, Chiara Meiohas

From: Michael Guralnik <michaelgur@nvidia.com>

The FRMR pools UAPI merged in kernel v7.0-rc1 commit dbd0472fd7a5
("RDMA/nldev: Expose kernel-internal FRMR pools in netlink")
uses different identifier names than what the iproute2 FRMR pools
series was developed against.

Update the vendored copy of RDMA UAPI and all references in the rdma
tool to match the names that actually shipped in the kernel.

Fixes: 93368ee34528 ("rdma: Update headers")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
---
 rdma/include/uapi/rdma/rdma_netlink.h | 30 ++++----
 rdma/res-frmr-pools.c                 | 98 +++++++++++++--------------
 rdma/res.h                            |  2 +-
 3 files changed, 65 insertions(+), 65 deletions(-)

diff --git a/rdma/include/uapi/rdma/rdma_netlink.h b/rdma/include/uapi/rdma/rdma_netlink.h
index 8709e558b..4356ec4a1 100644
--- a/rdma/include/uapi/rdma/rdma_netlink.h
+++ b/rdma/include/uapi/rdma/rdma_netlink.h
@@ -308,9 +308,9 @@ enum rdma_nldev_command {
 
 	RDMA_NLDEV_CMD_MONITOR,
 
-	RDMA_NLDEV_CMD_RES_FRMR_POOLS_GET, /* can dump */
+	RDMA_NLDEV_CMD_FRMR_POOLS_GET, /* can dump */
 
-	RDMA_NLDEV_CMD_RES_FRMR_POOLS_SET,
+	RDMA_NLDEV_CMD_FRMR_POOLS_SET,
 
 	RDMA_NLDEV_NUM_OPS
 };
@@ -590,19 +590,19 @@ enum rdma_nldev_attr {
 	/*
 	 * FRMR Pools attributes
 	 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOLS,			/* nested table */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_ENTRY,		/* nested table */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY,		/* nested table */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS,		/* u8 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS,	/* u32 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY,	/* u64 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS, /* u64 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES,	/* u32 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE,	/* u64 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE,		/* u64 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_AGING_PERIOD,	/* u32 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED,		/* u32 */
-	RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY, /* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOLS,		/* nested table */
+	RDMA_NLDEV_ATTR_FRMR_POOL_ENTRY,	/* nested table */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY,		/* nested table */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS,	/* u8 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE,	/* u64 */
+	RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES,	/* u32 */
+	RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY,	/* u64 */
 
 	/*
 	 * Always the end
diff --git a/rdma/res-frmr-pools.c b/rdma/res-frmr-pools.c
index abcd21884..d5faa5c14 100644
--- a/rdma/res-frmr-pools.c
+++ b/rdma/res-frmr-pools.c
@@ -80,83 +80,83 @@ static int res_frmr_pools_line(struct rd *rd, const char *name, int idx,
 	char key_str[FRMR_POOL_KEY_MAX_LEN];
 	struct frmr_pool_key key = { 0 };
 
-	if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY]) {
+	if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_KEY]) {
 		if (mnl_attr_parse_nested(
-			    nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY],
+			    nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_KEY],
 			    rd_attr_cb, key_tb) != MNL_CB_OK)
 			return MNL_CB_ERROR;
 
-		if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS])
+		if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS])
 			key.ats = mnl_attr_get_u8(
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS]);
-		if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS])
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]);
+		if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS])
 			key.access_flags = mnl_attr_get_u32(
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS]);
-		if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY])
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]);
+		if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY])
 			key.vendor_key = mnl_attr_get_u64(
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY]);
-		if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS])
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]);
+		if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS])
 			key.num_dma_blocks = mnl_attr_get_u64(
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
-		if (key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY])
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
+		if (key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY])
 			kernel_vendor_key = mnl_attr_get_u64(
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
 
 		if (rd_is_filtered_attr(
 			    rd, "ats", key.ats,
-			    key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS]))
+			    key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]))
 			goto out;
 
 		if (rd_is_filtered_attr(
 			    rd, "access_flags", key.access_flags,
-			    key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS]))
+			    key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]))
 			goto out;
 
 		if (rd_is_filtered_attr(
 			    rd, "vendor_key", key.vendor_key,
-			    key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY]))
+			    key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]))
 			goto out;
 
 		if (rd_is_filtered_attr(
 			    rd, "num_dma_blocks", key.num_dma_blocks,
-			    key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS]))
+			    key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]))
 			goto out;
 	}
 
-	if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES])
+	if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES])
 		queue_handles = mnl_attr_get_u32(
-			nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES]);
+			nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES]);
 	if (rd_is_filtered_attr(
 		    rd, "queue", queue_handles,
-		    nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES]))
+		    nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES]))
 		goto out;
 
-	if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE])
+	if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE])
 		in_use = mnl_attr_get_u64(
-			nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE]);
+			nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE]);
 	if (rd_is_filtered_attr(rd, "in_use", in_use,
-				nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE]))
+				nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE]))
 		goto out;
 
-	if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE])
+	if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE])
 		max_in_use = mnl_attr_get_u64(
-			nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE]);
+			nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE]);
 	if (rd_is_filtered_attr(
 		    rd, "max_in_use", max_in_use,
-		    nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE]))
+		    nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE]))
 		goto out;
 
-	if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED])
+	if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES])
 		pinned_handles = mnl_attr_get_u32(
-			nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED]);
+			nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]);
 	if (rd_is_filtered_attr(rd, "pinned", pinned_handles,
-				nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED]))
+				nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]))
 		goto out;
 
 	open_json_object(NULL);
 	print_dev(idx, name);
 
-	if (nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY]) {
+	if (nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_KEY]) {
 		snprintf(key_str, sizeof(key_str),
 			 "%" PRIx64 ":%" PRIx64 ":%x:%s",
 			 key.vendor_key, key.num_dma_blocks,
@@ -166,30 +166,30 @@ static int res_frmr_pools_line(struct rd *rd, const char *name, int idx,
 		if (rd->show_details) {
 			res_print_u32(
 				"ats", key.ats,
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS]);
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS]);
 			res_print_u32(
 				"access_flags", key.access_flags,
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS]);
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS]);
 			res_print_u64(
 				"vendor_key", key.vendor_key,
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY]);
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY]);
 			res_print_u64(
 				"num_dma_blocks", key.num_dma_blocks,
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS]);
 			res_print_u64(
 				"kernel_vendor_key", kernel_vendor_key,
-				key_tb[RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
+				key_tb[RDMA_NLDEV_ATTR_FRMR_POOL_KEY_KERNEL_VENDOR_KEY]);
 		}
 	}
 
 	res_print_u32("queue", queue_handles,
-		      nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_QUEUE_HANDLES]);
+		      nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_QUEUE_HANDLES]);
 	res_print_u64("in_use", in_use,
-		      nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_IN_USE]);
+		      nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_IN_USE]);
 	res_print_u64("max_in_use", max_in_use,
-		      nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_MAX_IN_USE]);
+		      nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_MAX_IN_USE]);
 	res_print_u32("pinned", pinned_handles,
-		      nla_line[RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED]);
+		      nla_line[RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES]);
 
 	print_driver_table(rd, nla_line[RDMA_NLDEV_ATTR_DRIVER]);
 	close_json_object();
@@ -215,12 +215,12 @@ int res_frmr_pools_parse_cb(const struct nlmsghdr *nlh, void *data)
 
 	mnl_attr_parse(nlh, 0, rd_attr_cb, tb);
 	if (!tb[RDMA_NLDEV_ATTR_DEV_INDEX] || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
-	    !tb[RDMA_NLDEV_ATTR_RES_FRMR_POOLS])
+	    !tb[RDMA_NLDEV_ATTR_FRMR_POOLS])
 		return MNL_CB_ERROR;
 
 	name = mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_DEV_NAME]);
 	idx = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
-	nla_table = tb[RDMA_NLDEV_ATTR_RES_FRMR_POOLS];
+	nla_table = tb[RDMA_NLDEV_ATTR_FRMR_POOLS];
 
 	mnl_attr_for_each_nested(nla_entry, nla_table) {
 		struct nlattr *nla_line[RDMA_NLDEV_ATTR_MAX] = {};
@@ -256,10 +256,10 @@ static int res_frmr_pools_one_set_aging(struct rd *rd)
 		return -EINVAL;
 	}
 
-	rd_prepare_msg(rd, RDMA_NLDEV_CMD_RES_FRMR_POOLS_SET, &seq,
+	rd_prepare_msg(rd, RDMA_NLDEV_CMD_FRMR_POOLS_SET, &seq,
 		       (NLM_F_REQUEST | NLM_F_ACK));
 	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
-	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_AGING_PERIOD,
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOLS_AGING_PERIOD,
 			 aging_period);
 
 	return rd_sendrecv_msg(rd, seq);
@@ -294,24 +294,24 @@ static int res_frmr_pools_one_set_pinned(struct rd *rd)
 		return -EINVAL;
 	}
 
-	rd_prepare_msg(rd, RDMA_NLDEV_CMD_RES_FRMR_POOLS_SET, &seq,
+	rd_prepare_msg(rd, RDMA_NLDEV_CMD_FRMR_POOLS_SET, &seq,
 		       (NLM_F_REQUEST | NLM_F_ACK));
 	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
 
-	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_PINNED,
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_PINNED_HANDLES,
 			 pinned_value);
 
 	key_attr =
-		mnl_attr_nest_start(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY);
-	mnl_attr_put_u8(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ATS,
+		mnl_attr_nest_start(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_KEY);
+	mnl_attr_put_u8(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ATS,
 			pool_key.ats);
 	mnl_attr_put_u32(rd->nlh,
-			 RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_ACCESS_FLAGS,
+			 RDMA_NLDEV_ATTR_FRMR_POOL_KEY_ACCESS_FLAGS,
 			 pool_key.access_flags);
-	mnl_attr_put_u64(rd->nlh, RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_VENDOR_KEY,
+	mnl_attr_put_u64(rd->nlh, RDMA_NLDEV_ATTR_FRMR_POOL_KEY_VENDOR_KEY,
 			 pool_key.vendor_key);
 	mnl_attr_put_u64(rd->nlh,
-			 RDMA_NLDEV_ATTR_RES_FRMR_POOL_KEY_NUM_DMA_BLOCKS,
+			 RDMA_NLDEV_ATTR_FRMR_POOL_KEY_NUM_DMA_BLOCKS,
 			 pool_key.num_dma_blocks);
 	mnl_attr_nest_end(rd->nlh, key_attr);
 
diff --git a/rdma/res.h b/rdma/res.h
index 8d7b4a0bf..1f71115b9 100644
--- a/rdma/res.h
+++ b/rdma/res.h
@@ -200,7 +200,7 @@ struct filters frmr_pools_valid_filters[MAX_NUMBER_OF_FILTERS] = {
 	{ .name = "pinned", .is_number = true },
 };
 
-RES_FUNC(res_frmr_pools, RDMA_NLDEV_CMD_RES_FRMR_POOLS_GET,
+RES_FUNC(res_frmr_pools, RDMA_NLDEV_CMD_FRMR_POOLS_GET,
 	 frmr_pools_valid_filters, true, 0);
 
 int res_frmr_pools_set(struct rd *rd);
-- 
2.38.1


^ permalink raw reply related

* Re: [PATCH net v2 1/4] net: sparx5: defer VCAP debugfs creation until after netdev registration
From: Daniel Machon @ 2026-05-07 18:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Steen Hegelund, UNGLinuxDriver, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Bjarni Jonasson, Lars Povlsen,
	Philipp Zabel, kees, linux-kernel, netdev, linux-arm-kernel,
	linux-rt-devel
In-Reply-To: <20260507090810.53e66ef6@kernel.org>

> On Wed, 6 May 2026 09:25:36 +0200 Daniel Machon wrote:
> > Move the debugfs setup into a new sparx5_debugfs() helper in
> > sparx5_debugfs.c, invoked after sparx5_register_notifier_blocks()
> > succeeds so the netdev names are finalized. sparx5_vcap_init() now
> > only deals with VCAP state. The sparx5/ debugfs root is created in
> > the new helper as well.
> 
> netdev names are never final :( User can change them at any time.
> The best practice is to name the debugfs file by some stable hw-related
> property, bus, port number etc.

Right, but they are finalized in the sense that we have a name we can use for the
debugfs files (which we don't pre-patch).

Hmm. I think this patch fixes an actual issue, where you cannot query the
debugfs files, because a previous patch broke the ordering. I agree that the
names chosen (netdev_name()) for the files were poor, but is that really a fix
for this series? Should that not be addressed in a future patch for net-next (it
involves changing a VCAP API function that is not only used by Sparx5/lan969x,
but also lan966x)?

/Daniel

^ permalink raw reply

* Re: [PATCH v2 iproute2-next 1/4] rdma: Update headers
From: Chiara Meiohas @ 2026-05-07 19:03 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger
  Cc: leon, michaelgur, jgg, linux-rdma, netdev, Patrisious Haddad
In-Reply-To: <3cf0dcca-9a3f-4cee-83d7-f058f33bcc04@gmail.com>

On 07/05/2026 19:20, David Ahern wrote:

> On 4/28/26 4:05 AM, Chiara Meiohas wrote:
>> We will prepare a sync patch to align the names with the kernel and send
>> it shortly.
> what happened to this request? I see that Stephen had to post a patch
> (not yet applied) to address this problem:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/20260505181045.748088-1-stephen@networkplumber.org/
>
> We allow rdma to have separate uapi headers for convenience. Responses
> to mistakes need to be timely.

Hi David,

My apologies for the delay; I will make sure these mistakes are handled
more promptly in the future.

I have sent our version to the mailing list, as I was not sure how you
would prefer to proceed given the overlap.

https://lore.kernel.org/linux-rdma/20260507184609.3439875-1-cmeiohas@nvidia.com/

Thanks,

Chiara


^ permalink raw reply

* Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits (BQL) for latency reduction
From: Jesper Dangaard Brouer @ 2026-05-07 19:09 UTC (permalink / raw)
  To: Simon Schippers, Paolo Abeni, netdev
  Cc: kernel-team, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Stanislav Fomichev, linux-kernel, bpf
In-Reply-To: <e3a91545-13cd-4f87-8375-d707865bdbca@schippers-hamm.de>

[-- Attachment #1: Type: text/plain, Size: 3900 bytes --]



On 07/05/2026 16.46, Simon Schippers wrote:
> 
> 
> On 5/7/26 16:34, Paolo Abeni wrote:
>> On 5/7/26 8:54 AM, Simon Schippers wrote:
>>> On 5/5/26 15:21, hawk@kernel.org wrote:
>>>> @@ -928,9 +968,13 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
>>>>   			}
>>>>   		} else {
>>>>   			/* ndo_start_xmit */
>>>> -			struct sk_buff *skb = ptr;
>>>> +			bool bql_charged = veth_ptr_is_bql(ptr);
>>>> +			struct sk_buff *skb = veth_ptr_to_skb(ptr);
>>>>   
>>>>   			stats->xdp_bytes += skb->len;
>>>> +			if (peer_txq && bql_charged)
>>>> +				netdev_tx_completed_queue(peer_txq, 1, VETH_BQL_UNIT);
>>>
>>> In the discussion with Jonas [1], I left a comment explaining why I think
>>> this doesn’t work.
>>>

I've experimented with doing the "completion" at NAPI-end in
veth_poll(), but that resulted in BQL limit being 128 packets, which
leads to bad latency results (not acceptable).
(See detailed report later)


>>> I still think first that adding an option to modify the hard-coded
>>> VETH_RING_SIZE is the way to go.
>>>

Not against being able to modify VETH_RING_SIZE, but I don't think it is
the solution here.

The simple solution is to configure the BQL limit_min:
  `/sys/class/net/<dev>/queues/tx-N/byte_queue_limits/limit_min`

My experiments (below) find that limit_min=8 gives good performance.
We can simply set the default to 8, as this still allows userspace to
change it later if lower latency is preferred.

>>> Thanks!
>>>
>>> [1] Link: https://lore.kernel.org/netdev/e8cdba04-aa9a-45c6-9807-8274b62920df@tu-dortmund.de/
>>
>> In the above discussion a 20% regression is reported, which IMHO can't
>> be ignored. Still the tput figures in the data are extremely low,
>> something is possibly off?!? I would expect a few Mpps with pktgen on
>> top of veth, while the reported data is ~20-30Kpps.
>>
>> /P
>>
> 
> The ~20-30Kpps occur when thousands of iptables rules are applied and
> an UDP userspace application is sending.
> 
> And there is a 20% pktgen regression (no iptables rules applied).
> 

The pktgen test is a little dubious/weird, and Jonas had to modify pktgen
to test this.  John Fastabend added a config to pktgen that allows us
to benchmark the egress qdisc path; it might be better to use that.
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh is a demo of its usage.

If redoing the tests, can you adjust limit_min to see the effect?
  /sys/class/net/<dev>/queues/tx-N/byte_queue_limits/limit_min

A 20% throughput regression is of course too much, but I will remind
us that adding a qdisc will "cost" some overhead; that is a
configuration choice.  Our purpose here is to reduce bufferbloat and
latency, not to optimize for throughput.


> I am pretty sure the reason is because the BQL limit is stuck at 2
> packets (because the completed queue is always called with 1 packet
> and not in a interrupt/timer with multiple packets...).
> 

I've run a lot of experiments, which I had an AI write up as a report;
see the attachment.  The TL;DR is that the best performance vs latency
tradeoff is defaulting BQL/DQL limit_min to 8 packets.

I fear this patchset will stall forever if we keep searching for a
perfect solution without any overhead.  The qdisc layer will be a
baseline overhead.  The limit=2 packets is actually the optimal
dark-buffer queue size, but I acknowledge that this causes too many qdisc
requeue events (leading to overhead).  I suggest that I add another
patch in V6 that defaults limit_min to 8 (a separate patch, to make it
easier to revert/adjust later).

I've talked with Jonas, and we want to experiment with different 
solutions to make BQL/DQL work better with virtual devices.

This patchset helps our (production) use-case reduce mice-flow latency
under load from approx 22ms to 1.3ms.  Because the consumer namespace
is the bottleneck, the requeue overhead is negligible in comparison.

-Jesper

[-- Attachment #2: PERF-2651-bql-completion-experiment.md --]
[-- Type: text/markdown, Size: 13601 bytes --]

# PERF-2651: BQL Completion Batching Experiment (2026-05-05)

## Background

Simon Schippers and Jonas Koeppeler raised concerns that DQL settles at
limit=2 with veth BQL, citing the netdevice.h comment:

> "Must be called at most once per TX completion round (and not per
>  individual packet), so that BQL can adjust its limits appropriately."

And Tom Herbert's original BQL cover letter:

> "BQL accounting is in the transmit path for every packet, and the
>  function to recompute the byte limit is run once per transmit completion."

Thread: https://lore.kernel.org/all/e8cdba04-aa9a-45c6-9807-8274b62920df@tu-dortmund.de/

## Experiment: Batch BQL completion at end of veth_poll

Created stg patch `experiment-batch-bql-completion` that moves
`netdev_tx_completed_queue()` from per-SKB inside `veth_xdp_rcv()` to a
single batched call at the end of `veth_poll()`.

### Code change (drivers/net/veth.c)

In `veth_xdp_rcv()`: replace per-SKB completion with counter accumulation:
```c
// Before (V5, per-packet):
if (peer_txq && bql_charged)
    netdev_tx_completed_queue(peer_txq, 1, VETH_BQL_UNIT);

// After (experiment, accumulate):
if (peer_txq && bql_charged)
    stats->bql_completed += VETH_BQL_UNIT;
```

In `veth_poll()`: single batched call after veth_xdp_rcv() returns:
```c
if (peer_txq && stats.bql_completed)
    netdev_tx_completed_queue(peer_txq, stats.bql_completed,
                              stats.bql_completed);
```

Note: cannot use `done` (return value of veth_xdp_rcv) because it counts
all consumed ring entries including XDP frames that were never BQL-charged.
Using `done` would over-complete and hit BUG_ON in dql_completed().
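
The distinction in the note above can be illustrated with a small userspace model. This is only a sketch: `struct ring_entry`, `bql_units_to_complete()` and `completion_is_valid()` are hypothetical names introduced here, not veth or DQL code, and the validity check is an assumed restatement of the condition that `dql_completed()` enforces with BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one consumed ring entry (not the veth layout). */
struct ring_entry {
	bool is_skb;      /* false: XDP frame, which is never BQL-charged */
	bool bql_charged; /* true: the producer charged BQL for this entry */
};

/* Count only the entries that were charged; this is the safe completion count. */
unsigned int bql_units_to_complete(const struct ring_entry *e, size_t n)
{
	unsigned int units = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].is_skb && e[i].bql_charged)
			units++;
	return units;
}

/* Assumed invariant: a completion may not exceed what is still in flight. */
bool completion_is_valid(unsigned int queued, unsigned int completed,
			 unsigned int count)
{
	assert(completed <= queued);
	return count <= queued - completed;
}
```

With four consumed entries of which two were charged, completing 2 units is valid, while completing all 4 (what `done` would report) exceeds the 2 units in flight.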

## Why DQL settles at limit=2 with per-packet completion

The DQL slack calculation in `dql_completed()` uses:
```c
slack = POSDIFF(limit + prev_ovlimit, 2 * (completed - num_completed));
```

`completed - num_completed` equals the `count` parameter (bytes completed
this call). Per-packet: count=1, so slack = limit + prev_ovlimit - 2.
With limit=2, slack=0, so the algorithm holds steady at 2.

With batched completion: count=~64, slack calculation sees the real batch
size, and DQL converges to limit=128 (~2x NAPI budget).
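
The slack expression quoted above can be evaluated in isolation. The sketch below is a userspace model of that one expression only, not the full DQL algorithm; `dql_slack()` is a hypothetical helper, and the `POSDIFF` macro is written on the assumption that it matches the kernel's definition.

```c
#include <assert.h>

/* Positive difference, assumed to match the kernel's POSDIFF helper. */
#define POSDIFF(A, B) ((int)((A) - (B)) > 0 ? (A) - (B) : 0)

/*
 * Slack as in the expression quoted above, where `count` stands for
 * (completed - num_completed), i.e. the units completed in this call.
 */
unsigned int dql_slack(unsigned int limit, unsigned int prev_ovlimit,
		       unsigned int count)
{
	assert(count > 0); /* the model only covers non-empty completions */
	return POSDIFF(limit + prev_ovlimit, 2 * count);
}
```

With `prev_ovlimit` at 0, `dql_slack(2, 0, 1)` and `dql_slack(128, 0, 64)` are both 0, whereas `dql_slack(128, 0, 1)` is 126. In this model, limit = 2 * count is the largest limit that leaves no slack, which is consistent with the limits of 2 and 128 reported here.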

## Results: nrules=3500 (sfq + tiny-flood)

| Metric        | No BQL    | Per-pkt (limit=2) | limit=4 | limit=8 | limit=16 | Batched (limit=128) |
|---------------|-----------|-------------------|---------|---------|----------|---------------------|
| BQL limit     | unlimited | 2                 | 4       | 8       | 16       | 128                 |
| BQL inflight  | 254       | 3                 | 5       | 9-17    | 25       | 133                 |
| Ping RTT avg  | 9.3ms     | 0.94ms            | 1.07ms  | 1.24ms  | 1.69ms   | 4.0ms               |
| requeues      | 52K       | 454K              | 426K    | 399K    | 356K     | 112K                |
| NAPI avg_work | 63        | 5                 | 15      | 63      | 63       | 63                  |
| NAPI polls    | ~2.2K     | ~27K              | ~10.5K  | ~2.5K   | ~2.3K    | ~2.5K               |
| Consumer pps  | ~26K      | ~30K              | ~30K    | ~30K    | ~29K     | ~30K                |

## Results: nrules=15000 (sfq + tiny-flood, slower consumer)

| Metric        | No BQL     | Per-pkt (limit=2) | limit=4   | limit=8   | limit=16  | Batched (limit=128) |
|---------------|------------|-------------------|-----------|-----------|-----------|---------------------|
| BQL limit     | unlimited  | 2                 | 4         | 8         | 16        | 128                 |
| BQL inflight  | 211-227    | 3                 | 5-12      | 9         | 17        | 132-136             |
| Ping RTT avg  | **37.8ms** | **4.5ms**         | **5.0ms** | **6.0ms** | **6.4ms** | **20.0ms**          |
| Ping RTT min  | 27.7ms     | 1.4ms             | 1.7ms     | 2.4ms     | 3.0ms     | 10.4ms              |
| requeues      | 12.9K      | 93K               | 87K       | 80K       | 86K       | 22.8K               |
| NAPI avg_work | 61         | 6                 | 17        | 60        | 61        | 61                  |
| NAPI polls    | ~540       | ~4.9K             | ~1.9K     | ~540      | ~550      | ~540                |
| Consumer pps  | ~6.7K      | ~6.7K             | ~6.9K     | ~6.8K     | ~7.0K     | ~6.7K               |

## Analysis

### Batched completion is clearly worse for latency

At nrules=15000, batched completion gives 20ms ping RTT -- only 2x better
than no-BQL (37.8ms). Per-packet gives 4.5ms -- an 8x improvement.

The math confirms this: 128 packets / 6.7K pps = 19ms of uncontrolled
queuing delay. This matches the measured 20ms almost exactly.

### Per-packet completion (limit=2) is correct for veth

Simon's concern that limit=2 is a DQL defect is wrong. limit=2 is the
ideal behavior for dark-buffer elimination:
- Only 2-3 packets in the ptr_ring at any time
- Qdisc gets immediate control over all buffering
- 8x latency reduction vs no-BQL

The DQL comment "once per TX completion round" was written for HW NICs
where interrupt coalescing batches completions naturally. For veth, each
per-SKB completion within a NAPI poll technically violates the letter of
the comment, but the resulting limit=2 is correct for the use case.

The concern with limit=2 is the overhead it introduces:

### Trade-off: NAPI polling overhead

Per-packet (limit=2) causes many more NAPI polls:
- nrules=3500: 27K polls (avg_work=5) vs 2.5K polls (avg_work=63)
- nrules=15000: 4.9K polls (avg_work=6) vs 540 polls (avg_work=61)

This is because with only 2-3 items in the ring, each NAPI poll drains
the ring quickly -> napi_complete_done -> reschedule. More scheduling
overhead, but no throughput impact when consumer is the bottleneck.
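
A rough model of why the poll count scales this way, assuming every poll handles exactly `avg_work` packets. The helper name and the packet count used in the usage note are illustrative, not measured values:

```c
#include <assert.h>

/* Polls needed to drain `packets` when each poll handles `avg_work`
 * packets on average (rounded up). Illustrative model only. */
unsigned long napi_polls_needed(unsigned long packets, unsigned long avg_work)
{
    assert(avg_work > 0);
    return (packets + avg_work - 1) / avg_work;
}
```

With a hypothetical 32940 packets, avg_work=61 needs 540 polls while avg_work=6 needs 5490, the same order of magnitude as the measured 540 vs 4.9K at nrules=15000.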

### limit_min tuning via sysfs

DQL limit_min can be set via:
`/sys/class/net/<dev>/queues/tx-0/byte_queue_limits/limit_min`

The selftest `--bql-min-limit N` flag writes to this sysfs.
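
A minimal userspace sketch of such a write, assuming a device named `veth0` (hypothetical) and root privileges. The function names are made up for illustration, and whether the selftest does it exactly this way is an assumption:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the limit_min sysfs path for one TX queue of a device.
 * Returns 0 on success, -1 if the buffer is too small. */
int bql_limit_min_path(char *buf, size_t len, const char *dev,
                       unsigned int txq)
{
    int n;

    assert(buf != NULL && dev != NULL);
    n = snprintf(buf, len,
                 "/sys/class/net/%s/queues/tx-%u/byte_queue_limits/limit_min",
                 dev, txq);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a new floor to the given path. Returns 0 on success, -1 on error
 * (for example when the device does not exist or the caller is not root). */
int bql_write_limit_min(const char *path, unsigned int limit_min)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%u\n", limit_min) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling `bql_limit_min_path(buf, sizeof(buf), "veth0", 0)` and then `bql_write_limit_min(buf, 8)` would set the floor to 8 for tx-0 only; other queues need their own write.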

- **limit_min=4**: half a cache-line (32 bytes of ptr_ring pointers).
  avg_work=17, 1.9K polls. Ping 5.0ms -- close to limit=2 (4.5ms).
- **limit_min=8**: one cache-line (64 bytes of ptr_ring pointers).
  avg_work=60, 540 polls. Ping 6.0ms -- efficient full-budget polls.

### Dark buffer formula

At consumer rate R (pps) and BQL limit L (packets):
- Dark buffer latency = L / R
- limit=2: 2/6700 = 0.3ms (negligible)
- limit=8: 8/6700 = 1.2ms
- limit=128: 128/6700 = 19ms (matches measured 20ms)
- unlimited (254): 254/6700 = 38ms (matches measured 37.8ms)
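
The same arithmetic as a small helper, in case someone wants to plug in their own consumer rate. The function name is illustrative. It returns microseconds so the result stays in integers:

```c
#include <assert.h>

/* Dark-buffer latency in microseconds for a BQL limit of `limit_pkts`
 * packets drained at `rate_pps` packets per second (L / R), rounded to
 * the nearest microsecond. */
unsigned long dark_buffer_latency_us(unsigned long limit_pkts,
                                     unsigned long rate_pps)
{
    assert(rate_pps > 0);
    return (limit_pkts * 1000000UL + rate_pps / 2) / rate_pps;
}
```

At R=6700 pps this gives 299us, 1194us, 19104us and 37910us for L=2, 8, 128 and 254, in line with the rounded figures above.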

## Results: nrules=0 (no consumer overhead, max throughput)

This tests the raw throughput overhead of BQL stop/start oscillation.
All values are averages of 4 runs (VM noise is ~15-20% per-run variance).

| Metric          | No BQL | limit=2 | limit=4 | limit=8 | limit=16 |
|-----------------|--------|---------|---------|---------|----------|
| Sink pps (large)| 841K   | 759K    | 692K    | 762K    | 736K     |
| Sink pps (small)| 950K   | 874K    | 807K    | 874K    | 844K     |
| qdisc pkts      | 48.6M  | 44.8M   | 40.1M   | 45.0M   | 44.8M    |
| requeues        | 311K   | 6.1M    | 13.4M   | 5.8M    | 5.2M     |
| NAPI avg_work   | 22     | 27      | 12      | 19      | 21       |
| Ping RTT avg    | 0.17ms | 0.11ms  | 0.10ms  | 0.085ms | 0.095ms  |
| Runs            | 4      | 4       | 4       | 4       | 4        |

Observations:
- **limit=2 is NOT the worst** -- limit=4 has higher requeues (13.4M) and
  lower throughput (692K sink) due to more stop/start cycles at a less
  efficient NAPI batch size (avg_work=12)
- **limit=8 and limit=16 are close to No-BQL throughput** (762K and 736K vs 841K
  sink pps for large pkts, ~9-12% lower), inside the ~15-20% per-run variance
- **Requeue overhead**: 311K (No BQL) -> 5.2-5.8M (limit=8/16) -> 13.4M (limit=4)
- Latency sub-0.2ms for all settings at this speed -- not a differentiator

## Comparison: limit=8 vs limit=16

Multi-run (4 iterations each, nrules=0) to cut through VM noise:

### limit=8 (4 runs)

| Run | Sink pps (large/small) | qdisc pkts | requeues | avg_work | Ping avg |
|-----|------------------------|-----------|----------|----------|----------|
| 1   | 796K / 911K            | 46.2M     | 5.6M     | 20       | 0.062ms  |
| 2   | 796K / 883K            | 45.5M     | 4.7M     | 16       | 0.081ms  |
| 3   | 654K / 836K            | 43.5M     | 8.3M     | 22       | 0.100ms  |
| 4   | 803K / 865K            | 44.8M     | 4.4M     | 16       | 0.095ms  |
| **avg** | **762K / 874K**    | **45.0M** | **5.8M** | **19**   | **0.085ms** |

### limit=16 (4 runs)

| Run | Sink pps (large/small) | qdisc pkts | requeues | avg_work | Ping avg |
|-----|------------------------|-----------|----------|----------|----------|
| 1   | 844K / 940K            | 48.1M     | 3.3M     | 20       | 0.081ms  |
| 2   | 768K / 873K            | 45.6M     | 4.1M     | 15       | 0.097ms  |
| 3   | 733K / 804K            | 44.8M     | 6.5M     | 26       | 0.085ms  |
| 4   | 597K / 757K            | 40.7M     | 6.9M     | 23       | 0.115ms  |
| **avg** | **736K / 844K**    | **44.8M** | **5.2M** | **21**   | **0.095ms** |

### Averaged comparison (nrules=0, 4 runs)

| Metric              | limit=8   | limit=16  |
|---------------------|-----------|-----------|
| Sink pps (large)    | 762K      | 736K      |
| Sink pps (small)    | 874K      | 844K      |
| qdisc pkts          | 45.0M     | 44.8M     |
| requeues            | 5.8M      | 5.2M      |
| avg_work            | 19        | 21        |
| Ping RTT avg        | 0.085ms   | 0.095ms   |

At max throughput, limit=8 and limit=16 are within VM noise (~3-4%).

### Cross-load comparison (all averages of 4 runs)

| Metric        | limit=8 | limit=16 | Winner        |
|---------------|---------|----------|---------------|
| nrules=15000: |         |          |               |
|   Ping RTT    | 6.73ms  | 8.00ms   | 8 (+1.3ms)    |
|   requeues    | 71K     | 73K      | ~same         |
|   avg_work    | 59      | 59       | ~same         |
| nrules=3500:  |         |          |               |
|   Ping RTT    | 1.77ms  | 2.11ms   | 8 (+0.34ms)   |
|   requeues    | 279K    | 282K     | ~same         |
|   avg_work    | 62      | 62       | ~same         |
| nrules=0:     |         |          |               |
|   Sink pps    | 762K    | 736K     | ~same (noise) |
|   requeues    | 5.8M    | 5.2M     | ~same (noise) |

**Verdict: limit=8 is the better default.**
- Consistent latency advantage under load: 1.3ms lower at nrules=15000,
  0.34ms lower at nrules=3500 (reproducible across 4 runs each)
- Throughput indistinguishable from limit=16 after averaging
- One cache-line (64 bytes) is a clean hardware alignment
- More conservative -- smaller dark buffer

## Proposed patch: dql_set_min_limit() + veth default min_limit=8

Two-part solution in stg patch `veth-set-bql-min-limit-8`:

### 1. New DQL API helper (include/linux/dynamic_queue_limits.h)

```c
static inline void dql_set_min_limit(struct dql *dql, unsigned int min_limit)
{
    dql->min_limit = min_limit;
}
```

Gives drivers a clean API to set a default floor.  Currently no driver
sets min_limit -- all rely on the dql_init() default of 0 or user sysfs.

### 2. Veth sets min_limit=8 at device creation (drivers/net/veth.c)

In `veth_init_queues()`, after TX queue setup:
```c
#ifdef CONFIG_BQL
    for (i = 0; i < dev->num_tx_queues; i++)
        dql_set_min_limit(&netdev_get_tx_queue(dev, i)->dql,
                          VETH_BQL_UNIT * 8);
#endif
```

Called for both `dev` and `peer` in `veth_newlink()`.  Uses
`num_tx_queues` (all pre-allocated queues), not `real_num_tx_queues`,
so channel changes via `ethtool -L` are covered -- no new queues are
ever created at runtime.
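
A userspace sketch of what the floor is meant to do, assuming DQL clamps each newly computed limit into [min_limit, max_limit]. `struct dql_sketch` and both functions are stand-ins, not the kernel's `struct dql`, and the plain value 8 stands in for `VETH_BQL_UNIT * 8` because the unit's value is not shown here:

```c
#include <assert.h>

/* Stand-in for the two struct dql fields this sketch needs. */
struct dql_sketch {
    unsigned int min_limit;
    unsigned int max_limit;
};

/* Same shape as the proposed dql_set_min_limit() helper. */
void dql_sketch_set_min_limit(struct dql_sketch *dql, unsigned int min_limit)
{
    dql->min_limit = min_limit;
}

/* Clamp a freshly computed limit into [min_limit, max_limit]
 * (assumed behaviour of the real algorithm). */
unsigned int dql_sketch_clamp(const struct dql_sketch *dql, unsigned int limit)
{
    assert(dql->min_limit <= dql->max_limit);
    if (limit < dql->min_limit)
        return dql->min_limit;
    if (limit > dql->max_limit)
        return dql->max_limit;
    return limit;
}
```

With the floor at 0 a computed limit of 2 passes through unchanged; with the floor at 8 it is raised to 8, while larger limits such as 128 are left alone.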

### Why min_limit=8

- One cache-line of ptr_ring pointers (8 x 8 = 64 bytes)
- Requeues at max throughput stay in the lowest group (5.8M vs 6.1M at
  limit=2 and 13.4M at limit=4, 4-run averages)
- Keeps full-budget NAPI polls (avg_work=63) -- no scheduling overhead
- Latency only 0.3ms worse than limit=2 at moderate load (1.24ms vs 0.94ms)
- Still 6x better latency than no-BQL at heavy load (6ms vs 37.8ms)
- User can lower to 0 or raise via sysfs limit_min at any time

### Verified: driver default works (nrules=15000, --hist)

Tested with `veth-set-bql-min-limit-8` patch applied, no `--bql-min-limit`
sysfs override.  BQL limit=8 held stable, ping RTT ~6.5ms (matches sysfs
override results).

BQL inflight histogram (bpftrace, 169K samples):
```
[1]              15 |                                  |
[2, 4)        21193 |@@@@@@@@@@@@@                     |
[4, 8)        63615 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)       80116 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)       4709 |@@@                               |
```

- Inflight avg=7, max=17 -- ring stays shallow
- Peak at [8,16): inflight near the limit=8 floor most of the time
- [4,8) second: ring draining between NAPI polls
- [16,32) rare: brief producer bursts
- stack_xoff ~15K/5s, drv_xoff=0 -- BQL stops queue well before ring fills
- NAPI avg_work=61, almost all full-budget polls

## Conclusion

Per-packet BQL completion in V5 is the right design. It gives DQL the
information it needs to keep the dark buffer minimal, which is exactly
what we want for latency reduction.

Simon's suggestion to call netdev_tx_completed_queue() once per NAPI poll
would regress ping latency from 4.5ms to 20ms at production-like iptables
rule counts.

The default min_limit=8 (via dql_set_min_limit) is the proposed follow-up
to address the requeue overhead that per-packet completion causes.  It
keeps latency close to optimal while reducing the ~10% throughput loss
and 20x requeue increase (6.1M vs 311K) that limit=2 causes at max speed.
Users wanting tighter latency can set limit_min=0 via sysfs to get the
original limit=2 behavior.


* [PATCH net-next v7 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management
From: Long Li @ 2026-05-07 19:12 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui, shradhagupta
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel

This series adds per-vPort Event Queue (EQ) allocation and MSI-X interrupt
management for the MANA driver. Previously, all vPorts shared a single set
of EQs. This change enables dedicated EQs per vPort with support for both
dedicated and shared MSI-X vector allocation modes.

Patch 1 moves EQ ownership from mana_context to per-vPort mana_port_context
and exports create/destroy functions for the RDMA driver. Also adds EQ
create/destroy calls to mana_ib_cfg_vport/uncfg_vport so RDMA vPorts get
their own EQs.

Patch 2 adds device capability queries to determine whether MSI-X vectors
should be dedicated per-vPort or shared. When the number of available MSI-X
vectors is insufficient for dedicated allocation, the driver enables sharing
mode with bitmap-based vector assignment.

Patch 3 introduces the GIC (GDMA IRQ Context) abstraction with reference
counting, allowing multiple EQs to safely share a single MSI-X vector.
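
The refcounting idea can be sketched in userspace as follows. This is a simplified illustration with hypothetical names and no locking, not the driver's mana_gd_get_gic()/mana_gd_put_gic() code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-vector interrupt context. */
struct gic_sketch {
    unsigned int refs;   /* number of EQs attached to this vector */
    bool irq_requested;  /* stands in for the real IRQ registration */
};

/* Attach one EQ; the first user sets up the vector. */
void gic_sketch_get(struct gic_sketch *gic)
{
    if (gic->refs++ == 0)
        gic->irq_requested = true;
}

/* Detach one EQ; the last user tears the vector down. */
void gic_sketch_put(struct gic_sketch *gic)
{
    assert(gic->refs > 0);
    if (--gic->refs == 0)
        gic->irq_requested = false;
}
```

Two EQs sharing a vector each take a reference; the vector stays registered until the second put drops the count to zero.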

Patch 4 converts the global EQ allocation in probe/resume to use the new
GIC functions.

Patch 5 adds per-vPort GIC lifecycle management, calling get/put on each
EQ creation and destruction during vPort open/close.

Patch 6 extends the same GIC lifecycle management to the RDMA driver's EQ
allocation path.

Changes in v7:
- Rebased on net-next/main
- Patch 1: Guard ibdev_dbg() in mana_ib_cfg_vport() with error check so
  the vport handle is not logged on the failure path
- Patch 1: Fix checkpatch line length warning in debugfs_create_dir() call
- Patch 2: Use rounddown_pow_of_two() instead of roundup_pow_of_two() when
  computing per-vPort queue count to avoid unnecessarily forcing shared
  MSI-X mode in borderline configurations
- Patch 2: Call mana_gd_setup_remaining_irqs() unconditionally to ensure
  irq_contexts are populated in both dedicated and shared MSI-X modes,
  fixing bisectability between patches 2 and 5
- Patch 2: Fix checkpatch line length warning in debugfs_create_u16() call
- Patch 3: Use cached gic->irq instead of pci_irq_vector() lookup in
  mana_gd_put_gic() for consistency with the allocation path
- Patch 3: Fix checkpatch line length warning in mana_gd_get_gic()
  declaration
- Patch 5: Fix unsigned int* to int* pointer type mismatch when calling
  mana_gd_get_gic() by using a local int variable for the MSI index
- Patch 6: Fix same unsigned int* to int* pointer type mismatch in RDMA
  EQ creation path
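
To illustrate the rounding change in Patch 2, here is a userspace sketch of the two directions. The function names are hypothetical stand-ins for the kernel's rounddown_pow_of_two() and roundup_pow_of_two(), and the value 12 in the note below is only an example:

```c
#include <assert.h>

/* Largest power of two <= n (n must be >= 1). */
unsigned int rounddown_pow2_sketch(unsigned int n)
{
    unsigned int p = 1;

    assert(n >= 1);
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Smallest power of two >= n (n must be in 1..2^31). */
unsigned int roundup_pow2_sketch(unsigned int n)
{
    unsigned int p;

    assert(n >= 1 && n <= 0x80000000u);
    p = rounddown_pow2_sketch(n);
    return p == n ? p : p * 2;
}
```

For a non-power-of-two count such as 12, rounding up yields 16 and rounding down yields 8. The larger value asks for more vectors per vPort, which is what could force shared MSI-X mode in borderline configurations.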

Changes in v6:
- Rebased on net-next/main (v7.1-rc1)

Changes in v5:
- Rebased on net-next/main

Changes in v4:
- Rebased on net-next/main 7.0-rc4
- Patch 2: Use MANA_DEF_NUM_QUEUES instead of hardcoded 16 for
  max_num_queues clamping
- Patch 3: Track dyn_msix in GIC context instead of re-checking
  pci_msix_can_alloc_dyn() on each call; improved remove_irqs iteration
  to skip unallocated entries

Changes in v3:
- Rebased on net-next/main
- Patch 1: Added NULL check for mpc->eqs in mana_ib_create_qp_rss() to
  prevent NULL pointer dereference when RSS QP is created before a raw QP
  has configured the vport and allocated EQs

Changes in v2:
- Rebased on net-next/main (adapted to kzalloc_objs/kzalloc_obj macros,
  new GDMA_DRV_CAP_FLAG definitions)
- Patch 2: Fixed misleading comment for max_num_queues vs
  max_num_queues_vport in gdma.h
- Patch 3: Fixed spelling typo in gdma_main.c ("difference" -> "different")

Long Li (6):
  net: mana: Create separate EQs for each vPort
  net: mana: Query device capabilities and configure MSI-X sharing for
    EQs
  net: mana: Introduce GIC context with refcounting for interrupt
    management
  net: mana: Use GIC functions to allocate global EQs
  net: mana: Allocate interrupt context for each EQ when creating vPort
  RDMA/mana_ib: Allocate interrupt contexts on EQs

 drivers/infiniband/hw/mana/main.c             |  60 +++-
 drivers/infiniband/hw/mana/qp.c               |  16 +-
 .../net/ethernet/microsoft/mana/gdma_main.c   | 297 +++++++++++++-----
 drivers/net/ethernet/microsoft/mana/mana_en.c | 168 ++++++----
 include/net/mana/gdma.h                       |  33 +-
 include/net/mana/mana.h                       |   7 +-
 6 files changed, 425 insertions(+), 156 deletions(-)

-- 
2.43.0


* [PATCH net-next v7 1/6] net: mana: Create separate EQs for each vPort
From: Long Li @ 2026-05-07 19:12 UTC (permalink / raw)
  To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
	Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
	Dexuan Cui, shradhagupta
  Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260507191237.438671-1-longli@microsoft.com>

To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ
sharing among the vPorts and create dedicated EQs for each vPort.

Move the EQ definition from struct mana_context to struct mana_port_context
and update related support functions. Export mana_create_eq() and
mana_destroy_eq() for use by the MANA RDMA driver.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/infiniband/hw/mana/main.c             |  19 ++-
 drivers/infiniband/hw/mana/qp.c               |  16 ++-
 drivers/net/ethernet/microsoft/mana/mana_en.c | 111 ++++++++++--------
 include/net/mana/mana.h                       |   7 +-
 4 files changed, 98 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index ac5e75dd3494..8000ab6e8beb 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
 	pd->vport_use_count--;
 	WARN_ON(pd->vport_use_count < 0);
 
-	if (!pd->vport_use_count)
+	if (!pd->vport_use_count) {
+		mana_destroy_eq(mpc);
 		mana_uncfg_vport(mpc);
+	}
 
 	mutex_unlock(&pd->vport_mutex);
 }
@@ -55,15 +57,22 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
 		return err;
 	}
 
-	mutex_unlock(&pd->vport_mutex);
 
 	pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
 	pd->tx_vp_offset = mpc->tx_vp_offset;
+	err = mana_create_eq(mpc);
+	if (err) {
+		mana_uncfg_vport(mpc);
+		pd->vport_use_count--;
+	}
 
-	ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
-		  mpc->port_handle, pd->pdn, doorbell_id);
+	mutex_unlock(&pd->vport_mutex);
 
-	return 0;
+	if (!err)
+		ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
+			  mpc->port_handle, pd->pdn, doorbell_id);
+
+	return err;
 }
 
 int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 645581359cee..6f1043383e8c 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -168,7 +168,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
 		cq_spec.gdma_region = cq->queue.gdma_region;
 		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
 		cq_spec.modr_ctx_id = 0;
-		eq = &mpc->ac->eqs[cq->comp_vector];
+		/* EQs are created when a raw QP configures the vport.
+		 * A raw QP must be created before creating rwq_ind_tbl.
+		 */
+		if (!mpc->eqs) {
+			ret = -EINVAL;
+			i--;
+			goto fail;
+		}
+		eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
 		cq_spec.attached_eq = eq->eq->id;
 
 		ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
@@ -317,7 +325,11 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
 	cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
 	cq_spec.modr_ctx_id = 0;
 	eq_vec = send_cq->comp_vector;
-	eq = &mpc->ac->eqs[eq_vec];
+	if (!mpc->eqs) {
+		err = -EINVAL;
+		goto err_destroy_queue;
+	}
+	eq = &mpc->eqs[eq_vec % mpc->num_queues];
 	cq_spec.attached_eq = eq->eq->id;
 
 	err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 462a457e7d53..a13204b3ee79 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1615,78 +1615,83 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
 }
 EXPORT_SYMBOL_NS(mana_destroy_wq_obj, "NET_MANA");
 
-static void mana_destroy_eq(struct mana_context *ac)
+void mana_destroy_eq(struct mana_port_context *apc)
 {
+	struct mana_context *ac = apc->ac;
 	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct gdma_queue *eq;
 	int i;
 
-	if (!ac->eqs)
+	if (!apc->eqs)
 		return;
 
-	debugfs_remove_recursive(ac->mana_eqs_debugfs);
-	ac->mana_eqs_debugfs = NULL;
+	debugfs_remove_recursive(apc->mana_eqs_debugfs);
+	apc->mana_eqs_debugfs = NULL;
 
-	for (i = 0; i < gc->max_num_queues; i++) {
-		eq = ac->eqs[i].eq;
+	for (i = 0; i < apc->num_queues; i++) {
+		eq = apc->eqs[i].eq;
 		if (!eq)
 			continue;
 
 		mana_gd_destroy_queue(gc, eq);
 	}
 
-	kfree(ac->eqs);
-	ac->eqs = NULL;
+	kfree(apc->eqs);
+	apc->eqs = NULL;
 }
+EXPORT_SYMBOL_NS(mana_destroy_eq, "NET_MANA");
 
-static void mana_create_eq_debugfs(struct mana_context *ac, int i)
+static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
 {
-	struct mana_eq eq = ac->eqs[i];
+	struct mana_eq eq = apc->eqs[i];
 	char eqnum[32];
 
 	sprintf(eqnum, "eq%d", i);
-	eq.mana_eq_debugfs = debugfs_create_dir(eqnum, ac->mana_eqs_debugfs);
+	eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
 	debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
 	debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
 	debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
 }
 
-static int mana_create_eq(struct mana_context *ac)
+int mana_create_eq(struct mana_port_context *apc)
 {
-	struct gdma_dev *gd = ac->gdma_dev;
+	struct gdma_dev *gd = apc->ac->gdma_dev;
 	struct gdma_context *gc = gd->gdma_context;
 	struct gdma_queue_spec spec = {};
 	int err;
 	int i;
 
-	ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
-	if (!ac->eqs)
+	WARN_ON(apc->eqs);
+	apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
+	if (!apc->eqs)
 		return -ENOMEM;
 
 	spec.type = GDMA_EQ;
 	spec.monitor_avl_buf = false;
 	spec.queue_size = EQ_SIZE;
 	spec.eq.callback = NULL;
-	spec.eq.context = ac->eqs;
+	spec.eq.context = apc->eqs;
 	spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
 
-	ac->mana_eqs_debugfs = debugfs_create_dir("EQs", gc->mana_pci_debugfs);
+	apc->mana_eqs_debugfs = debugfs_create_dir("EQs",
+						    apc->mana_port_debugfs);
 
-	for (i = 0; i < gc->max_num_queues; i++) {
+	for (i = 0; i < apc->num_queues; i++) {
 		spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
-		err = mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq);
+		err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
 		if (err) {
 			dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
 			goto out;
 		}
-		mana_create_eq_debugfs(ac, i);
+		mana_create_eq_debugfs(apc, i);
 	}
 
 	return 0;
 out:
-	mana_destroy_eq(ac);
+	mana_destroy_eq(apc);
 	return err;
 }
+EXPORT_SYMBOL_NS(mana_create_eq, "NET_MANA");
 
 static int mana_fence_rq(struct mana_port_context *apc, struct mana_rxq *rxq)
 {
@@ -2451,7 +2456,7 @@ static int mana_create_txq(struct mana_port_context *apc,
 		spec.monitor_avl_buf = false;
 		spec.queue_size = cq_size;
 		spec.cq.callback = mana_schedule_napi;
-		spec.cq.parent_eq = ac->eqs[i].eq;
+		spec.cq.parent_eq = apc->eqs[i].eq;
 		spec.cq.context = cq;
 		err = mana_gd_create_mana_wq_cq(gd, &spec, &cq->gdma_cq);
 		if (err)
@@ -2844,13 +2849,12 @@ static void mana_create_rxq_debugfs(struct mana_port_context *apc, int idx)
 static int mana_add_rx_queues(struct mana_port_context *apc,
 			      struct net_device *ndev)
 {
-	struct mana_context *ac = apc->ac;
 	struct mana_rxq *rxq;
 	int err = 0;
 	int i;
 
 	for (i = 0; i < apc->num_queues; i++) {
-		rxq = mana_create_rxq(apc, i, &ac->eqs[i], ndev);
+		rxq = mana_create_rxq(apc, i, &apc->eqs[i], ndev);
 		if (!rxq) {
 			err = -ENOMEM;
 			netdev_err(ndev, "Failed to create rxq %d : %d\n", i, err);
@@ -2869,9 +2873,8 @@ static int mana_add_rx_queues(struct mana_port_context *apc,
 	return err;
 }
 
-static void mana_destroy_vport(struct mana_port_context *apc)
+static void mana_destroy_rxqs(struct mana_port_context *apc)
 {
-	struct gdma_dev *gd = apc->ac->gdma_dev;
 	struct mana_rxq *rxq;
 	u32 rxq_idx;
 
@@ -2883,8 +2886,12 @@ static void mana_destroy_vport(struct mana_port_context *apc)
 		mana_destroy_rxq(apc, rxq, true);
 		apc->rxqs[rxq_idx] = NULL;
 	}
+}
+
+static void mana_destroy_vport(struct mana_port_context *apc)
+{
+	struct gdma_dev *gd = apc->ac->gdma_dev;
 
-	mana_destroy_txq(apc);
 	mana_uncfg_vport(apc);
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
@@ -2905,11 +2912,7 @@ static int mana_create_vport(struct mana_port_context *apc,
 			return err;
 	}
 
-	err = mana_cfg_vport(apc, gd->pdid, gd->doorbell);
-	if (err)
-		return err;
-
-	return mana_create_txq(apc, net);
+	return mana_cfg_vport(apc, gd->pdid, gd->doorbell);
 }
 
 static int mana_rss_table_alloc(struct mana_port_context *apc)
@@ -3195,21 +3198,36 @@ int mana_alloc_queues(struct net_device *ndev)
 
 	err = mana_create_vport(apc, ndev);
 	if (err) {
-		netdev_err(ndev, "Failed to create vPort %u : %d\n", apc->port_idx, err);
+		netdev_err(ndev, "Failed to create vPort %u : %d\n",
+			   apc->port_idx, err);
 		return err;
 	}
 
+	err = mana_create_eq(apc);
+	if (err) {
+		netdev_err(ndev, "Failed to create EQ on vPort %u: %d\n",
+			   apc->port_idx, err);
+		goto destroy_vport;
+	}
+
+	err = mana_create_txq(apc, ndev);
+	if (err) {
+		netdev_err(ndev, "Failed to create TXQ on vPort %u: %d\n",
+			   apc->port_idx, err);
+		goto destroy_eq;
+	}
+
 	err = netif_set_real_num_tx_queues(ndev, apc->num_queues);
 	if (err) {
 		netdev_err(ndev,
 			   "netif_set_real_num_tx_queues () failed for ndev with num_queues %u : %d\n",
 			   apc->num_queues, err);
-		goto destroy_vport;
+		goto destroy_txq;
 	}
 
 	err = mana_add_rx_queues(apc, ndev);
 	if (err)
-		goto destroy_vport;
+		goto destroy_rxq;
 
 	apc->rss_state = apc->num_queues > 1 ? TRI_STATE_TRUE : TRI_STATE_FALSE;
 
@@ -3218,7 +3236,7 @@ int mana_alloc_queues(struct net_device *ndev)
 		netdev_err(ndev,
 			   "netif_set_real_num_rx_queues () failed for ndev with num_queues %u : %d\n",
 			   apc->num_queues, err);
-		goto destroy_vport;
+		goto destroy_rxq;
 	}
 
 	mana_rss_table_init(apc);
@@ -3226,19 +3244,25 @@ int mana_alloc_queues(struct net_device *ndev)
 	err = mana_config_rss(apc, TRI_STATE_TRUE, true, true);
 	if (err) {
 		netdev_err(ndev, "Failed to configure RSS table: %d\n", err);
-		goto destroy_vport;
+		goto destroy_rxq;
 	}
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode) {
 		err = mana_pf_register_filter(apc);
 		if (err)
-			goto destroy_vport;
+			goto destroy_rxq;
 	}
 
 	mana_chn_setxdp(apc, mana_xdp_get(apc));
 
 	return 0;
 
+destroy_rxq:
+	mana_destroy_rxqs(apc);
+destroy_txq:
+	mana_destroy_txq(apc);
+destroy_eq:
+	mana_destroy_eq(apc);
 destroy_vport:
 	mana_destroy_vport(apc);
 	return err;
@@ -3343,6 +3367,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	mana_fence_rqs(apc);
 
 	/* Even in err case, still need to cleanup the vPort */
+	mana_destroy_rxqs(apc);
+	mana_destroy_txq(apc);
+	mana_destroy_eq(apc);
 	mana_destroy_vport(apc);
 
 	return 0;
@@ -3663,12 +3690,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 
 	INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler);
 
-	err = mana_create_eq(ac);
-	if (err) {
-		dev_err(dev, "Failed to create EQs: %d\n", err);
-		goto out;
-	}
-
 	err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
 				    MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
 	if (err)
@@ -3808,8 +3829,6 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 		free_netdev(ndev);
 	}
 
-	mana_destroy_eq(ac);
-
 	if (ac->per_port_queue_reset_wq) {
 		destroy_workqueue(ac->per_port_queue_reset_wq);
 		ac->per_port_queue_reset_wq = NULL;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index aa90a858c8e3..c8e7d16f6685 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -480,8 +480,6 @@ struct mana_context {
 	u8 bm_hostmode;
 
 	struct mana_ethtool_hc_stats hc_stats;
-	struct mana_eq *eqs;
-	struct dentry *mana_eqs_debugfs;
 	struct workqueue_struct *per_port_queue_reset_wq;
 	/* Workqueue for querying hardware stats */
 	struct delayed_work gf_stats_work;
@@ -501,6 +499,9 @@ struct mana_port_context {
 
 	u8 mac_addr[ETH_ALEN];
 
+	struct mana_eq *eqs;
+	struct dentry *mana_eqs_debugfs;
+
 	enum TRI_STATE rss_state;
 
 	mana_handle_t default_rxobj;
@@ -1034,6 +1035,8 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
 int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
 		   u32 doorbell_pg_id);
 void mana_uncfg_vport(struct mana_port_context *apc);
+int mana_create_eq(struct mana_port_context *apc);
+void mana_destroy_eq(struct mana_port_context *apc);
 
 struct net_device *mana_get_primary_netdev(struct mana_context *ac,
 					   u32 port_index,
-- 
2.43.0

