* Re: [PATCH] rocker: Fix memory leak in ofdpa_port_fdb()
From: patchwork-bot+netdevbpf @ 2026-06-23  0:10 UTC
  To: Ziran Zhang
  Cc: jiri, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel
In-Reply-To: <20260616013245.7098-1-zhangcoder@yeah.net>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 16 Jun 2026 09:32:45 +0800 you wrote:
> In ofdpa_port_fdb(), hash_del() only unlinks the node from the
> hash table; it does not free it.
> 
> Fix this by adding kfree(found) after the !found == removing check,
> where the pointer is no longer needed.
> 
> Found by Coccinelle kfree script.
> 
> [...]
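The leak follows from hash_del() being a pure unlink (it wraps hlist_del_init()): the table forgets the node, but the caller still owns its memory. Below is a minimal userspace sketch of that ownership rule. The struct fdb_entry type and the entry_unlink()/entry_remove() helpers are hypothetical stand-ins, not the actual rocker ofdpa code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hashed FDB entry (not rocker's real struct). */
struct fdb_entry {
	struct fdb_entry *next;
	unsigned char addr[6];
};

/* Like hash_del(): unlink only. The caller still owns the memory. */
static void entry_unlink(struct fdb_entry **head, struct fdb_entry *e)
{
	struct fdb_entry **pp;

	assert(head && e);
	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return;
		}
	}
}

/* Removal path: unlink, then free; the free is the step the patch adds. */
static void entry_remove(struct fdb_entry **head, struct fdb_entry *e)
{
	entry_unlink(head, e);
	free(e);	/* without this, every removed entry leaks */
}
```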

Here is the summary with links:
  - rocker: Fix memory leak in ofdpa_port_fdb()
    https://git.kernel.org/netdev/net/c/53442aad1d57

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [syzbot] [wpan?] KASAN: slab-use-after-free Read in hwsim_set_promiscuous_mode
From: syzbot @ 2026-06-23  0:31 UTC
  To: alex.aring, andrew+netdev, davem, edumazet, kuba, linux-kernel,
	linux-wpan, miquel.raynal, netdev, pabeni, stefan, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    b85966adbf5d Merge tag 'net-next-7.2' of git://git.kernel...
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16d3e3a1580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9a9f723a32776544
dashboard link: https://syzkaller.appspot.com/bug?extid=4707bb8a43a42fca2b97
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d65306d96573/disk-b85966ad.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/ef43139aab0e/vmlinux-b85966ad.xz
kernel image: https://storage.googleapis.com/syzbot-assets/26d4d1ab67c3/bzImage-b85966ad.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4707bb8a43a42fca2b97@syzkaller.appspotmail.com

netdevsim netdevsim0 netdevsim0: left allmulticast mode
8021q: adding VLAN 0 to HW filter on device netdevsim0
8021q: adding VLAN 0 to HW filter on device netdevsim1
8021q: adding VLAN 0 to HW filter on device netdevsim2
8021q: adding VLAN 0 to HW filter on device netdevsim3
==================================================================
BUG: KASAN: slab-use-after-free in hwsim_set_promiscuous_mode+0x2b4/0x2e0 drivers/net/ieee802154/mac802154_hwsim.c:323
Read of size 1 at addr ffff88802adb1800 by task syz.0.303/7013

CPU: 1 UID: 0 PID: 7013 Comm: syz.0.303 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 hwsim_set_promiscuous_mode+0x2b4/0x2e0 drivers/net/ieee802154/mac802154_hwsim.c:323
 drv_set_promiscuous_mode+0x159/0x2e0 net/mac802154/driver-ops.h:127
 drv_start net/mac802154/driver-ops.h:195 [inline]
 mac802154_slave_open net/mac802154/iface.c:196 [inline]
 mac802154_wpan_open+0x196e/0x29f0 net/mac802154/iface.c:295
 __dev_open+0x46f/0x850 net/core/dev.c:1702
 __dev_change_flags+0x329/0x820 net/core/dev.c:9754
 netif_change_flags+0x7c/0x1b0 net/core/dev.c:9817
 do_setlink+0xdd6/0x4670 net/core/rtnetlink.c:3207
 rtnl_group_changelink net/core/rtnetlink.c:3855 [inline]
 __rtnl_newlink net/core/rtnetlink.c:4023 [inline]
 rtnl_newlink+0x148e/0x1bd0 net/core/rtnetlink.c:4151
 rtnetlink_rcv_msg+0x802/0xc00 net/core/rtnetlink.c:7068
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec net/socket.c:775 [inline]
 __sock_sendmsg net/socket.c:790 [inline]
 ____sys_sendmsg+0x9b9/0xa20 net/socket.c:2684
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2738
 __sys_sendmsg net/socket.c:2770 [inline]
 __do_sys_sendmsg net/socket.c:2775 [inline]
 __se_sys_sendmsg net/socket.c:2773 [inline]
 __x64_sys_sendmsg+0x1b1/0x290 net/socket.c:2773
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff94f19ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff9500a3028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ff94f416090 RCX: 00007ff94f19ce59
RDX: 0000000000000000 RSI: 0000200000000180 RDI: 0000000000000008
RBP: 00007ff94f232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff94f416128 R14: 00007ff94f416090 R15: 00007ffcb835f4f8
 </TASK>

Allocated by task 5605:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x318/0x660 mm/slub.c:5451
 _kmalloc_noprof include/linux/slab.h:969 [inline]
 _kzalloc_noprof include/linux/slab.h:1286 [inline]
 hwsim_update_pib+0x88/0x450 drivers/net/ieee802154/mac802154_hwsim.c:101
 hwsim_set_promiscuous_mode+0x196/0x2e0 drivers/net/ieee802154/mac802154_hwsim.c:323
 drv_set_promiscuous_mode+0x159/0x2e0 net/mac802154/driver-ops.h:127
 drv_start net/mac802154/driver-ops.h:195 [inline]
 mac802154_slave_open net/mac802154/iface.c:196 [inline]
 mac802154_wpan_open+0x196e/0x29f0 net/mac802154/iface.c:295
 __dev_open+0x46f/0x850 net/core/dev.c:1702
 __dev_change_flags+0x329/0x820 net/core/dev.c:9754
 netif_change_flags+0x7c/0x1b0 net/core/dev.c:9817
 do_setlink+0xdd6/0x4670 net/core/rtnetlink.c:3207
 rtnl_changelink net/core/rtnetlink.c:3841 [inline]
 __rtnl_newlink net/core/rtnetlink.c:4014 [inline]
 rtnl_newlink+0x15c2/0x1bd0 net/core/rtnetlink.c:4151
 rtnetlink_rcv_msg+0x802/0xc00 net/core/rtnetlink.c:7068
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec net/socket.c:775 [inline]
 __sock_sendmsg net/socket.c:790 [inline]
 __sys_sendto+0x626/0x6c0 net/socket.c:2252
 __do_sys_sendto net/socket.c:2259 [inline]
 __se_sys_sendto net/socket.c:2255 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2255
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 23:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2700 [inline]
 __rcu_free_sheaf_prepare+0x12d/0x2a0 mm/slub.c:2951
 rcu_free_sheaf+0x31/0x200 mm/slub.c:5909
 rcu_do_batch kernel/rcu/tree.c:2645 [inline]
 rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2897
 handle_softirqs+0x225/0x840 kernel/softirq.c:622
 run_ksoftirqd+0x36/0x60 kernel/softirq.c:1076
 smpboot_thread_fn+0x57c/0xa80 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff88802adb1800
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 freed 64-byte region [ffff88802adb1800, ffff88802adb1840)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2adb1
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000000 ffff88813fe1b8c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 12, tgid 12 (kworker/u8:0), ts 8480159325, free_ts 7414718607
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3289 [inline]
 allocate_slab+0x74/0x5d0 mm/slub.c:3404
 new_slab mm/slub.c:3447 [inline]
 refill_objects+0x328/0x3c0 mm/slub.c:7241
 refill_sheaf mm/slub.c:2827 [inline]
 __pcs_replace_empty_main+0x2e0/0x6b0 mm/slub.c:4692
 alloc_from_pcs mm/slub.c:4790 [inline]
 slab_alloc_node mm/slub.c:4924 [inline]
 __do_kmalloc_node mm/slub.c:5333 [inline]
 __kmalloc_noprof+0x464/0x750 mm/slub.c:5347
 _kmalloc_noprof include/linux/slab.h:973 [inline]
 _kzalloc_noprof include/linux/slab.h:1286 [inline]
 lsm_blob_alloc security/security.c:218 [inline]
 lsm_task_alloc security/security.c:270 [inline]
 security_task_alloc+0x4d/0x330 security/security.c:2785
 copy_process+0x1c57/0x42e0 kernel/fork.c:2269
 kernel_clone+0x2d7/0x940 kernel/fork.c:2745
 user_mode_thread+0x110/0x180 kernel/fork.c:2821
 call_usermodehelper_exec_work+0x5c/0x230 kernel/umh.c:171
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
page last free pid 10 tgid 10 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
 vfree+0x1fd/0x330 mm/vmalloc.c:3472
 delayed_vfree_work+0x55/0x80 mm/vmalloc.c:3392
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff88802adb1700: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88802adb1780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff88802adb1800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff88802adb1880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88802adb1900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [PATCH net v2] net: ethernet: ti: icssg: guard PA stat lookups
From: patchwork-bot+netdevbpf @ 2026-06-23  0:40 UTC
  To: Philippe Schenker
  Cc: netdev, philippe.schenker, horms, danishanwar, rogerq,
	linux-arm-kernel, stable, andrew+netdev, devnexen, davem,
	edumazet, jacob.e.keller, kuba, haokexin, m-malladi, pabeni,
	vadim.fedorenko, linux-kernel
In-Reply-To: <20260618093037.3448858-1-dev@pschenker.ch>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 18 Jun 2026 11:30:24 +0200 you wrote:
> From: Philippe Schenker <philippe.schenker@impulsing.ch>
> 
> icssg_ndo_get_stats64() unconditionally calls emac_get_stat_by_name()
> with FW PA stat names regardless of whether the PA stats block is
> present on the hardware.  emac_get_stat_by_name() already guards the
> PA stats lookup with `if (emac->prueth->pa_stats)`; when that pointer
> is NULL the lookup falls through to netdev_err() and returns -EINVAL.
> Because ndo_get_stats64 is polled regularly by the networking stack
> this produces thousands of log entries of the form:
> 
> [...]
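The log flood comes from calling a lookup that is already guarded internally on every poll, so the error path fires each time. Below is a simplified model of moving the presence check into the caller. It assumes a single hypothetical "pa_stat" name and uses a counter in place of netdev_err(). The struct prueth_model type and both helpers are illustrative only, and the applied commit may structure its guard differently.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical, heavily simplified model, not the icssg driver's structures. */
struct prueth_model {
	const int *pa_stats;	/* NULL when the FW PA stats block is absent */
};

static int err_logs;	/* counts messages that netdev_err() would print */

/* Mirrors the described behaviour: guarded lookup, error log on fall-through. */
static int get_stat_by_name(const struct prueth_model *p, const char *name)
{
	if (p->pa_stats && strcmp(name, "pa_stat") == 0)
		return p->pa_stats[0];
	err_logs++;
	return -EINVAL;
}

/* Polled periodically, like ndo_get_stats64(): check presence first. */
static long get_stats64(const struct prueth_model *p)
{
	long total = 0;

	if (p->pa_stats)
		total += get_stat_by_name(p, "pa_stat");
	return total;
}
```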

Here is the summary with links:
  - [net,v2] net: ethernet: ti: icssg: guard PA stat lookups
    https://git.kernel.org/netdev/net/c/27b9daba5060

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net 0/2] tcp: make TCP-AO lookups more predictable
From: Dmitry Safonov @ 2026-06-23  1:12 UTC
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, Kuniyuki Iwashima, netdev, eric.dumazet
In-Reply-To: <20260622185248.1717846-1-edumazet@google.com>

Hi Eric,

On Mon, 22 Jun 2026 at 19:52, Eric Dumazet <edumazet@google.com> wrote:
>
> This series fixes a TCP-AO key lookup precedence bug.
>
> TCP-AO stores MKTs in an unsorted list and returns the first match. This
> allows newer, less-specific keys (wildcard VRF or shorter prefixes) to
> shadow older, more-specific keys if inserted later.

Yeah, at the moment TCP-AO doesn't allow keys to intersect at all:
if two keys have matching VRFs and matching keyids for matching
peer/masks, then when userspace tries to add the second one,
setsockopt() returns -EKEYREJECTED/-EEXIST. This is quite different
from TCP-MD5, where the kernel uses the best-matching key.

This simplification (not allowing any key intersections) comes mostly
from the very permissive RFC5925, where MKT matches can be:
ip-addr/mask; ip address ranges; wildcards of addresses; tcp ports.
So this part was intentionally kept simple until there is a user who
requires one of these things; based on their requirements, a better
data structure than a simple list could then be used. Basically, the
longest prefix match is like adding power-of-two ip ranges. That's
also another reason why I wanted an extendable setsockopt(), where
new flags/fields can be added to the uAPI without breaking existing
users.

Anyway, if you need intersecting keys where the longer prefix wins
(imitating TCP-MD5 behaviour), we can do that, but I think it needs a
new TCP_AO_KEYF_PREFIX_MATCH flag (or something of the kind). Then
keys that match in everything except the prefix could be added to
the socket, and the longest prefix match would be used.

I think one API decision should be documented straight away (besides
the key flag itself): how this flag works with multiple keys.
Say there are 4 keys on a socket, all matching the peer being connected:
keyA: ip 10.0.0.0 /8 (keyid = 100)
keyB: ip 10.0.0.0 /16 (keyid = 100)
keyC: ip 10.0.0.0 /8 (keyid = 101)
keyD: ip 10.0.0.0 /16 (keyid = 102)

So keyA and keyB obviously have to use the new
TCP_AO_KEYF_PREFIX_MATCH flag. Should keyC and keyD be copied to the
established connection's socket?
I'd think the presence of the TCP_AO_KEYF_PREFIX_MATCH flag on keyC
and keyD should also decide whether they are copied. If keyC and keyD
don't have the flag, they should be copied to the established socket
(together with keyB, preserving the previous behaviour).

Otherwise, if they have the flag, what should happen?
1. keyB + keyC + keyD
2. keyB + keyD
If we go with (2) and a user wants keyC on the socket, they could
either remove TCP_AO_KEYF_PREFIX_MATCH from keyC or add keyC1 with
mask /16 and the same password as keyC: slightly inconvenient, but
quite flexible.
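One reading of the two options, as a userspace sketch. Under (1), a flagged key is dropped only when a flagged key with the same keyid has a longer prefix. Under (2), only flagged keys with the overall longest prefix survive. Unflagged keys are always copied. TCP_AO_KEYF_PREFIX_MATCH does not exist yet, so struct ao_key and select_keys() are hypothetical. All keys are assumed to already match the peer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal; no such flag exists in the kernel today. */
struct ao_key {
	const char *name;
	int prefixlen;
	int keyid;
	bool prefix_match;	/* stands in for TCP_AO_KEYF_PREFIX_MATCH */
};

/* Fills out[] with the keys copied to the established socket; returns count. */
static size_t select_keys(const struct ao_key *keys, size_t n, int option,
			  const struct ao_key **out)
{
	int longest = -1;
	size_t i, j, kept = 0;

	/* Longest prefix among flagged keys (used by option 2). */
	for (i = 0; i < n; i++)
		if (keys[i].prefix_match && keys[i].prefixlen > longest)
			longest = keys[i].prefixlen;

	for (i = 0; i < n; i++) {
		bool keep = true;

		if (keys[i].prefix_match) {
			if (option == 2) {
				keep = keys[i].prefixlen == longest;
			} else {
				/* Option 1: longest prefix per keyid. */
				for (j = 0; j < n; j++)
					if (keys[j].prefix_match &&
					    keys[j].keyid == keys[i].keyid &&
					    keys[j].prefixlen > keys[i].prefixlen)
						keep = false;
			}
		}
		if (keep)
			out[kept++] = &keys[i];
	}
	return kept;
}
```

With all four keys from the example flagged, option 1 keeps keyB, keyC and keyD, and option 2 keeps keyB and keyD. If the flag is cleared on keyC and keyD, option 2 keeps keyB, keyC and keyD.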

What do you think?

> Fix this by implementing sorted insertion in tcp_ao_link_mkt() based on
> key specificity (VRF binding, then prefix length). This keeps the RX
> lookup path fast while ensuring correctness.
>
> The second patch adds a selftest to verify this behavior.
>
> Eric Dumazet (2):
>   tcp: fix TCP-AO key lookup precedence (shadowing)
>   selftests/net: Add TCP-AO key shadowing test
>
>  net/ipv4/tcp_ao.c                             | 27 +++++-
>  tools/testing/selftests/net/tcp_ao/Makefile   |  1 +
>  .../testing/selftests/net/tcp_ao/shadowing.c  | 93 +++++++++++++++++++
>  3 files changed, 120 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/net/tcp_ao/shadowing.c
>
> --
> 2.55.0.rc0.799.gd6f94ed593-goog
>

-- 
             Dmitry
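Below is a userspace sketch of the ordering described in the cover letter. VRF-bound keys sort before wildcard-VRF keys, and longer prefixes sort before shorter ones, so a first-match scan returns the most specific key. The IPv4-only struct mkt and the mkt_* helpers are hypothetical simplifications, not the code in net/ipv4/tcp_ao.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified MKT (IPv4 only); not the kernel's tcp_ao_key. */
struct mkt {
	struct mkt *next;
	uint32_t addr;		/* prefix, host byte order */
	uint8_t prefixlen;	/* 0..32 */
	int l3index;		/* 0 = any VRF (wildcard) */
	int id;
};

/* > 0 if a is more specific than b: VRF binding first, then prefix length. */
static int mkt_cmp(const struct mkt *a, const struct mkt *b)
{
	if ((a->l3index != 0) != (b->l3index != 0))
		return a->l3index != 0 ? 1 : -1;
	return (int)a->prefixlen - (int)b->prefixlen;
}

/* Sorted insertion: skip entries at least as specific, then link k. */
static void mkt_link_sorted(struct mkt **head, struct mkt *k)
{
	struct mkt **pp = head;

	while (*pp && mkt_cmp(*pp, k) >= 0)
		pp = &(*pp)->next;
	k->next = *pp;
	*pp = k;
}

static bool mkt_matches(const struct mkt *k, uint32_t peer, int l3index)
{
	uint32_t mask = k->prefixlen ? ~0u << (32 - k->prefixlen) : 0;

	if (k->l3index && k->l3index != l3index)
		return false;
	return (peer & mask) == (k->addr & mask);
}

/* RX path stays a plain walk: the first match is the most specific key. */
static const struct mkt *mkt_lookup(const struct mkt *head, uint32_t peer,
				    int l3index)
{
	for (; head; head = head->next)
		if (mkt_matches(head, peer, l3index))
			return head;
	return NULL;
}
```

Ties keep insertion order because mkt_link_sorted() walks past equally specific entries, so the cost moves to insertion while lookup remains unchanged.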


* [PATCH v2] iproute2: return correct status from help commands across all utilities
From: Rose Wright @ 2026-06-23  1:14 UTC
  To: Stephen Hemminger; +Cc: netdev, Rose Wright
In-Reply-To: <20260622081637.172a6bb8@phoenix.local>

Currently, the help commands ("help", "-h", "-help") in these
utilities print their usage to stderr and can exit with an error
status.

This is a bug that breaks piping (e.g. into grep) and scripts that
rely on standard exit codes. The fix parameterizes the usage/help
functions so that they write to stdout and exit with status 0 when
help is explicitly requested, while usage errors still go to stderr
with a failure status.
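Below is a self-contained sketch of the convention the patch adopts. It uses a hypothetical tool whose usage(int status) mirrors the one in bridge.c and genl.c. It also shows why exit(-1) on the error path is reported as 255: only the low eight bits of the status reach the parent. The observed_exit_status() helper is illustrative only and not part of iproute2.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical tool's usage(): explicit help -> stdout, errors -> stderr. */
static void usage(int status) __attribute__((noreturn));

static void usage(int status)
{
	FILE *out = status == 0 ? stdout : stderr;

	fprintf(out, "Usage: tool [ OPTIONS ] OBJECT { COMMAND | help }\n");
	exit(status);
}

/* Run usage(status) in a child; return the status a shell would see in $?. */
static int observed_exit_status(int status)
{
	int wstatus;
	pid_t pid;

	fflush(NULL);	/* don't let the child flush inherited buffers twice */
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		usage(status);
	if (waitpid(pid, &wstatus, 0) < 0 || !WIFEXITED(wstatus))
		return -1;
	return WEXITSTATUS(wstatus);
}
```

With this shape, `tool help | grep OBJECT` receives the text on its stdin and the tool exits 0. A usage error still prints to stderr and exits 255.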

This patch standardizes the behavior of help commands and usage errors
across all utilities: ip, bridge, tc, devlink, dcb, rdma, vdpa, dpll,
genl, tipc, and netshaper.

Netlink socket initialization is skipped when a help command is used,
so that help does not fail in isolated test or container environments.

Test added to confirm these changes: testsuite/tests/ip/help.t
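The patch open-codes the same strcmp() chain in each tool's main() to decide whether netlink is needed. Below is a sketch of that check factored into one place. The is_help_arg() and need_netlink() helpers are hypothetical and not part of iproute2. They are applied to the arguments left after option parsing, as in dcb, devlink and dpll.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is this argument a request for help? */
static bool is_help_arg(const char *arg)
{
	return strcmp(arg, "help") == 0 || strcmp(arg, "-h") == 0 ||
	       strcmp(arg, "--help") == 0;
}

/* argv is what remains after getopt: OBJECT [COMMAND ...]. Netlink is
 * needed unless the object itself or its first argument asks for help. */
static bool need_netlink(int argc, char **argv)
{
	if (argc > 0 && is_help_arg(argv[0]))
		return false;
	if (argc > 1 && is_help_arg(argv[1]))
		return false;
	return true;
}
```

Like the patch, this inspects only the first two positions. A help keyword further along, e.g. `dev show help`, still initializes netlink and is left to the command's own parser.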

Signed-off-by: Rose Wright <rosesophiewright@gmail.com>
---
 bridge/bridge.c           |  22 +++---
 dcb/dcb.c                 |  38 +++++++---
 devlink/devlink.c         |  42 ++++++++---
 dpll/dpll.c               |  31 +++++---
 genl/genl.c               |  14 ++--
 netshaper/netshaper.c     |  52 ++++++++++---
 rdma/rdma.c               |  39 +++++++---
 tc/tc.c                   |  14 ++--
 testsuite/tests/ip/help.t | 151 ++++++++++++++++++++++++++++++++++++++
 tipc/bearer.c             | 142 +++++++++++++++++++----------------
 tipc/bearer.h             |   8 +-
 tipc/cmdl.c               |   9 ++-
 tipc/cmdl.h               |   6 +-
 tipc/link.c               | 108 ++++++++++++++-------------
 tipc/link.h               |   4 +-
 tipc/media.c              |  24 +++---
 tipc/media.h              |   4 +-
 tipc/nametable.c          |   8 +-
 tipc/nametable.h          |   4 +-
 tipc/node.c               |  54 ++++++++------
 tipc/node.h               |   4 +-
 tipc/peer.c               |  26 +++----
 tipc/peer.h               |   4 +-
 tipc/socket.c             |   8 +-
 tipc/socket.h             |   4 +-
 tipc/tipc.c               |  26 ++++---
 vdpa/vdpa.c               |  40 +++++++---
 27 files changed, 598 insertions(+), 288 deletions(-)
 create mode 100755 testsuite/tests/ip/help.t

diff --git a/bridge/bridge.c b/bridge/bridge.c
index d993ba19..7bc08b82 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -29,23 +29,25 @@ int timestamp;
 static const char *batch_file;
 int force;
 
-static void usage(void) __attribute__((noreturn));
+static void usage(int status) __attribute__((noreturn));
 
-static void usage(void)
+static void usage(int status)
 {
-	fprintf(stderr,
+	FILE *out = status == 0 ? stdout : stderr;
+
+	fprintf(out,
 "Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
 "       bridge [ -force ] -batch filename\n"
 "where  OBJECT := { link | fdb | mdb | mst | vlan | vni | monitor }\n"
 "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
 "                    -o[neline] | -t[imestamp] | -n[etns] name |\n"
 "                    -com[pressvlans] -c[olor] -p[retty] -j[son] }\n");
-	exit(-1);
+	exit(status);
 }
 
 static int do_help(int argc, char **argv)
 {
-	usage();
+	usage(0);
 }
 
 
@@ -118,7 +120,7 @@ main(int argc, char **argv)
 			opt++;
 
 		if (matches(opt, "-help") == 0) {
-			usage();
+			usage(0);
 		} else if (matches(opt, "-Version") == 0) {
 			printf("bridge utility, %s\n", version);
 			exit(0);
@@ -135,13 +137,13 @@ main(int argc, char **argv)
 			argc--;
 			argv++;
 			if (argc <= 1)
-				usage();
+				usage(-1);
 			if (strcmp(argv[1], "inet") == 0)
 				preferred_family = AF_INET;
 			else if (strcmp(argv[1], "inet6") == 0)
 				preferred_family = AF_INET6;
 			else if (strcmp(argv[1], "help") == 0)
-				usage();
+				usage(0);
 			else
 				invarg("invalid protocol family", argv[1]);
 		} else if (strcmp(opt, "-4") == 0) {
@@ -165,7 +167,7 @@ main(int argc, char **argv)
 			argc--;
 			argv++;
 			if (argc <= 1)
-				usage();
+				usage(-1);
 			batch_file = argv[1];
 		} else {
 			fprintf(stderr,
@@ -192,5 +194,5 @@ main(int argc, char **argv)
 		return do_cmd(argv[1], argc-1, argv+1);
 
 	rtnl_close(&rth);
-	usage();
+	usage(-1);
 }
diff --git a/dcb/dcb.c b/dcb/dcb.c
index fe0a0f04..131005fe 100644
--- a/dcb/dcb.c
+++ b/dcb/dcb.c
@@ -465,9 +465,9 @@ int dcb_cmd_parse_dev(struct dcb *dcb, int argc, char **argv,
 	}
 }
 
-static void dcb_help(void)
+static void dcb_help(FILE *out)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: dcb [ OPTIONS ] OBJECT { COMMAND | help }\n"
 		"       dcb [ -f | --force ] { -b | --batch } filename [ -n | --netns ] netnsname\n"
 		"where  OBJECT := { app | apptrust | buffer | dcbx | ets | maxrate | pfc | rewr }\n"
@@ -478,8 +478,11 @@ static void dcb_help(void)
 
 static int dcb_cmd(struct dcb *dcb, int argc, char **argv)
 {
-	if (!argc || matches(*argv, "help") == 0) {
-		dcb_help();
+	if (!argc) {
+		dcb_help(stderr);
+		return -EINVAL;
+	} else if (matches(*argv, "help") == 0) {
+		dcb_help(stdout);
 		return 0;
 	} else if (matches(*argv, "app") == 0) {
 		return dcb_cmd_app(dcb, argc - 1, argv + 1);
@@ -533,6 +536,7 @@ int main(int argc, char **argv)
 	const char *batch_file = NULL;
 	bool force = false;
 	struct dcb *dcb;
+	bool need_nl = true;
 	int opt;
 	int err;
 	int ret;
@@ -579,12 +583,12 @@ int main(int argc, char **argv)
 			dcb->use_iec = true;
 			break;
 		case 'h':
-			dcb_help();
+			dcb_help(stdout);
 			ret = EXIT_SUCCESS;
 			goto dcb_free;
 		default:
 			fprintf(stderr, "Unknown option.\n");
-			dcb_help();
+			dcb_help(stderr);
 			ret = EXIT_FAILURE;
 			goto dcb_free;
 		}
@@ -593,10 +597,21 @@ int main(int argc, char **argv)
 	argc -= optind;
 	argv += optind;
 
-	err = dcb_init(dcb);
-	if (err) {
-		ret = EXIT_FAILURE;
-		goto dcb_free;
+	if (argc > 0 && (strcmp(argv[0], "help") == 0 ||
+			 strcmp(argv[0], "-h") == 0 ||
+			 strcmp(argv[0], "--help") == 0))
+		need_nl = false;
+	if (argc > 1 && (strcmp(argv[1], "help") == 0 ||
+			 strcmp(argv[1], "-h") == 0 ||
+			 strcmp(argv[1], "--help") == 0))
+		need_nl = false;
+
+	if (need_nl) {
+		err = dcb_init(dcb);
+		if (err) {
+			ret = EXIT_FAILURE;
+			goto dcb_free;
+		}
 	}
 
 	if (batch_file)
@@ -612,7 +627,8 @@ int main(int argc, char **argv)
 	ret = EXIT_SUCCESS;
 
 dcb_fini:
-	dcb_fini(dcb);
+	if (need_nl)
+		dcb_fini(dcb);
 dcb_free:
 	dcb_free(dcb);
 
diff --git a/devlink/devlink.c b/devlink/devlink.c
index b4deba30..ad64ba1e 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -10332,9 +10332,9 @@ static int cmd_trap(struct dl *dl)
 	return -ENOENT;
 }
 
-static void help(void)
+static void help(FILE *out)
 {
-	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
+	fprintf(out, "Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename -N[etns] netnsname\n"
 	       "where  OBJECT := { dev | port | lc | sb | monitor | dpipe | resource | region | health | trap }\n"
 	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] -s[tatistics] -[he]x }\n");
@@ -10345,9 +10345,12 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
 	dl->argc = argc;
 	dl->argv = argv;
 
-	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
-		help();
+	if (dl_argv_match(dl, "help")) {
+		help(stdout);
 		return 0;
+	} else if (dl_no_arg(dl)) {
+		help(stderr);
+		return -EINVAL;
 	} else if (dl_argv_match(dl, "dev")) {
 		dl_arg_inc(dl);
 		return cmd_dev(dl);
@@ -10454,6 +10457,7 @@ int main(int argc, char **argv)
 	const char *batch_file = NULL;
 	bool force = false;
 	struct dl *dl;
+	bool need_nl = true;
 	int opt;
 	int err;
 	int ret;
@@ -10464,7 +10468,7 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	while ((opt = getopt_long(argc, argv, "Vfb:njpvsN:ix",
+	while ((opt = getopt_long(argc, argv, "Vfb:njpvsN:ixh",
 				  long_options, NULL)) >= 0) {
 
 		switch (opt) {
@@ -10505,9 +10509,13 @@ int main(int argc, char **argv)
 		case 'x':
 			dl->hex = true;
 			break;
+		case 'h':
+			help(stdout);
+			ret = EXIT_SUCCESS;
+			goto dl_free;
 		default:
 			pr_err("Unknown option.\n");
-			help();
+			help(stderr);
 			ret = EXIT_FAILURE;
 			goto dl_free;
 		}
@@ -10516,10 +10524,21 @@ int main(int argc, char **argv)
 	argc -= optind;
 	argv += optind;
 
-	err = dl_init(dl);
-	if (err) {
-		ret = EXIT_FAILURE;
-		goto dl_free;
+	if (argc > 0 && (strcmp(argv[0], "help") == 0 ||
+			 strcmp(argv[0], "-h") == 0 ||
+			 strcmp(argv[0], "--help") == 0))
+		need_nl = false;
+	if (argc > 1 && (strcmp(argv[1], "help") == 0 ||
+			 strcmp(argv[1], "-h") == 0 ||
+			 strcmp(argv[1], "--help") == 0))
+		need_nl = false;
+
+	if (need_nl) {
+		err = dl_init(dl);
+		if (err) {
+			ret = EXIT_FAILURE;
+			goto dl_free;
+		}
 	}
 
 	if (batch_file)
@@ -10535,7 +10554,8 @@ int main(int argc, char **argv)
 	ret = EXIT_SUCCESS;
 
 dl_fini:
-	dl_fini(dl);
+	if (need_nl)
+		dl_fini(dl);
 dl_free:
 	dl_free(dl);
 
diff --git a/dpll/dpll.c b/dpll/dpll.c
index 60404e13..e0afb4fe 100644
--- a/dpll/dpll.c
+++ b/dpll/dpll.c
@@ -575,9 +575,9 @@ static void dpll_pr_freq_range(__u64 freq_min, __u64 freq_max)
 	close_json_object();
 }
 
-static void help(void)
+static void help(FILE *out)
 {
-	pr_err("Usage: dpll [ OPTIONS ] OBJECT { COMMAND | help }\n"
+	fprintf(out, "Usage: dpll [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "where  OBJECT := { device | pin | monitor }\n"
 	       "       OPTIONS := { -V | --Version | -j | --json | -p | --pretty |\n"
 	       "                    -t | --timestamp | --tshort }\n");
@@ -592,9 +592,12 @@ static int dpll_cmd(struct dpll *dpll, int argc, char **argv)
 	dpll->argc = argc;
 	dpll->argv = argv;
 
-	if (dpll_argv_match(dpll, "help") || dpll_no_arg(dpll)) {
-		help();
+	if (dpll_argv_match(dpll, "help")) {
+		help(stdout);
 		return 0;
+	} else if (dpll_no_arg(dpll)) {
+		help(stderr);
+		return -EINVAL;
 	} else if (dpll_argv_match_inc(dpll, "device")) {
 		return cmd_device(dpll);
 	} else if (dpll_argv_match_inc(dpll, "pin")) {
@@ -646,10 +649,12 @@ int main(int argc, char **argv)
 		{ "pretty", no_argument, NULL, 'p' },
 		{ "timestamp", no_argument, NULL, 't' },
 		{ "tshort", no_argument, NULL, OPT_TSHORT },
+		{ "help", no_argument, NULL, 'h' },
 		{ NULL, 0, NULL, 0 }
 	};
-	const char *opt_short = "Vjpt";
+	const char *opt_short = "Vjpth";
 	struct dpll *dpll;
+	bool need_nl = true;
 	int err, opt, ret;
 
 	dpll = dpll_alloc();
@@ -678,9 +683,13 @@ int main(int argc, char **argv)
 			timestamp = 1;
 			timestamp_short = 1;
 			break;
+		case 'h':
+			help(stdout);
+			ret = EXIT_SUCCESS;
+			goto dpll_free;
 		default:
 			pr_err("Unknown option.\n");
-			help();
+			help(stderr);
 			ret = EXIT_FAILURE;
 			goto dpll_free;
 		}
@@ -700,11 +709,13 @@ int main(int argc, char **argv)
 	}
 
 	/* Skip netlink init for help commands */
-	bool need_nl = true;
-
-	if (argc > 0 && strcmp(argv[0], "help") == 0)
+	if (argc > 0 && (strcmp(argv[0], "help") == 0 ||
+			 strcmp(argv[0], "-h") == 0 ||
+			 strcmp(argv[0], "--help") == 0))
 		need_nl = false;
-	if (argc > 1 && strcmp(argv[1], "help") == 0)
+	if (argc > 1 && (strcmp(argv[1], "help") == 0 ||
+			 strcmp(argv[1], "-h") == 0 ||
+			 strcmp(argv[1], "--help") == 0))
 		need_nl = false;
 
 	if (need_nl) {
diff --git a/genl/genl.c b/genl/genl.c
index b497a3ad..2c304a57 100644
--- a/genl/genl.c
+++ b/genl/genl.c
@@ -90,16 +90,18 @@ noexist:
 	return f;
 }
 
-static void usage(void) __attribute__((noreturn));
+static void usage(int status) __attribute__((noreturn));
 
-static void usage(void)
+static void usage(int status)
 {
-	fprintf(stderr,
+	FILE *out = status == 0 ? stdout : stderr;
+
+	fprintf(out,
 		"Usage: genl [ OPTIONS ] OBJECT [help] }\n"
 		"where  OBJECT := { ctrl etc }\n"
 		"       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[aw] |\n"
 		"                    -j[son] | -p[retty] }\n");
-	exit(-1);
+	exit(status);
 }
 
 int main(int argc, char **argv)
@@ -122,7 +124,7 @@ int main(int argc, char **argv)
 		} else if (matches(argv[1], "-pretty") == 0) {
 			++pretty;
 		} else if (matches(argv[1], "-help") == 0) {
-			usage();
+			usage(0);
 		} else {
 			fprintf(stderr,
 				"Option \"%s\" is unknown, try \"genl -help\".\n",
@@ -146,5 +148,5 @@ int main(int argc, char **argv)
 		return ret;
 	}
 
-	usage();
+	usage(-1);
 }
diff --git a/netshaper/netshaper.c b/netshaper/netshaper.c
index 3b47d43d..f9aeac9b 100644
--- a/netshaper/netshaper.c
+++ b/netshaper/netshaper.c
@@ -26,9 +26,11 @@
 static struct rtnl_handle gen_rth = { .fd = -1 };
 static int genl_family = -1;
 
-static void usage(void)
+static void usage(int status)
 {
-	fprintf(stderr,
+	FILE *out = status == 0 ? stdout : stderr;
+
+	fprintf(out,
 		"Usage: netshaper [ OPTIONS ] { COMMAND | help }\n"
 		"OPTIONS := { -V[ersion] | -c[olor] | -help }\n"
 		"COMMAND := { set | get | delete | group } dev DEVNAME\n"
@@ -216,6 +218,13 @@ static void print_netshaper_attrs(struct nlmsghdr *answer)
 
 static int do_cmd(int argc, char **argv, int cmd)
 {
+	if (argc > 0 && (strcmp(*argv, "help") == 0 ||
+			 strcmp(*argv, "-h") == 0 ||
+			 strcmp(*argv, "--help") == 0)) {
+		usage(0);
+		exit(0);
+	}
+
 	GENL_REQUEST(req, 1024, genl_family, 0, NET_SHAPER_FAMILY_VERSION, cmd,
 		     NLM_F_REQUEST | NLM_F_ACK);
 
@@ -243,7 +252,7 @@ static int do_cmd(int argc, char **argv, int cmd)
 
 			if (strcmp(*argv, "scope") != 0) {
 				fprintf(stderr, "What is \"%s\"\n", *argv);
-				usage();
+				usage(-1);
 				return -1;
 			}
 			NEXT_ARG();
@@ -270,7 +279,7 @@ static int do_cmd(int argc, char **argv, int cmd)
 				NEXT_ARG();
 				if (strcmp(*argv, "id") != 0) {
 					fprintf(stderr, "What is \"%s\"\n", *argv);
-					usage();
+					usage(-1);
 					return -1;
 				}
 				NEXT_ARG();
@@ -282,7 +291,7 @@ static int do_cmd(int argc, char **argv, int cmd)
 			}
 		} else {
 			fprintf(stderr, "What is \"%s\"\n", *argv);
-			usage();
+			usage(-1);
 			return -1;
 		}
 		argc--;
@@ -363,7 +372,7 @@ static int parse_scope_id(const char *what, int *argcp, char ***argvp, int *scop
 		NEXT_ARG();
 		if (strcmp(*argv, "id") != 0) {
 			fprintf(stderr, "What is \"%s\"\n", *argv);
-			usage();
+			usage(-1);
 			return -1;
 		}
 		NEXT_ARG();
@@ -476,6 +485,13 @@ static int parse_leaves(int *argcp, char ***argvp,
 
 static int do_group(int argc, char **argv)
 {
+	if (argc > 0 && (strcmp(*argv, "help") == 0 ||
+			 strcmp(*argv, "-h") == 0 ||
+			 strcmp(*argv, "--help") == 0)) {
+		usage(0);
+		exit(0);
+	}
+
 	GENL_REQUEST(req, 4096, genl_family, 0, NET_SHAPER_FAMILY_VERSION,
 		     NET_SHAPER_CMD_GROUP, NLM_F_REQUEST | NLM_F_ACK);
 
@@ -517,7 +533,7 @@ static int do_group(int argc, char **argv)
 			continue;
 		} else {
 			fprintf(stderr, "What is \"%s\"\n", *argv);
-			usage();
+			usage(-1);
 			goto free_leaves;
 		}
 		argc--;
@@ -599,6 +615,7 @@ free_leaves:
 int main(int argc, char **argv)
 {
 	int color = default_color_opt();
+	bool need_nl = true;
 
 	while (argc > 1) {
 		const char *opt = argv[1];
@@ -609,7 +626,7 @@ int main(int argc, char **argv)
 			opt++;
 
 		if (strcmp(opt, "-help") == 0) {
-			usage();
+			usage(0);
 			exit(0);
 		} else if (strcmp(opt, "-Version") == 0 ||
 			   strcmp(opt, "-V") == 0) {
@@ -627,8 +644,19 @@ int main(int argc, char **argv)
 
 	check_enable_color(color, 0);
 
-	if (genl_init_handle(&gen_rth, NET_SHAPER_FAMILY_NAME, &genl_family))
-		exit(1);
+	if (argc > 1 && (strcmp(argv[1], "help") == 0 ||
+			 strcmp(argv[1], "-h") == 0 ||
+			 strcmp(argv[1], "--help") == 0))
+		need_nl = false;
+	if (argc > 2 && (strcmp(argv[2], "help") == 0 ||
+			 strcmp(argv[2], "-h") == 0 ||
+			 strcmp(argv[2], "--help") == 0))
+		need_nl = false;
+
+	if (need_nl) {
+		if (genl_init_handle(&gen_rth, NET_SHAPER_FAMILY_NAME, &genl_family))
+			exit(1);
+	}
 
 	if (argc > 1) {
 		argc--;
@@ -643,7 +671,7 @@ int main(int argc, char **argv)
 		if (strcmp(*argv, "group") == 0)
 			return do_group(argc - 1, argv + 1);
 		if (strcmp(*argv, "help") == 0) {
-			usage();
+			usage(0);
 			return 0;
 		}
 		fprintf(stderr,
@@ -651,6 +679,6 @@ int main(int argc, char **argv)
 			*argv);
 		exit(-1);
 	}
-	usage();
+	usage(-1);
 	exit(-1);
 }
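The `help` / `-h` / `--help` comparison above is written out three times in netshaper (`do_cmd()`, `do_group()` and `main()`), and the same chain appears again in rdma. The sketch below shows how it could be factored into one predicate, together with the stream choice made in `usage()`. The helpers `is_help_arg()`, `usage_stream()` and `needs_netlink()` are hypothetical and not part of iproute2. `needs_netlink()` only mirrors the two `need_nl` checks in `main()`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in iproute2): accepted spellings of "help". */
static bool is_help_arg(const char *arg)
{
	return arg != NULL && (strcmp(arg, "help") == 0 ||
			       strcmp(arg, "-h") == 0 ||
			       strcmp(arg, "--help") == 0);
}

/* Hypothetical helper: same choice as usage(status) in the patch. */
static FILE *usage_stream(int status)
{
	return status == 0 ? stdout : stderr;
}

/*
 * Hypothetical helper mirroring the need_nl checks in main(): argv[0] is
 * the program name, argv[1] the command, argv[2] its first argument.
 * Netlink setup is only needed when neither of them asks for help.
 */
static bool needs_netlink(int argc, char **argv)
{
	if (argc > 1 && is_help_arg(argv[1]))
		return false;
	if (argc > 2 && is_help_arg(argv[2]))
		return false;
	return true;
}
```

With such a helper, each of the three `if` blocks would reduce to `if (argc > 0 && is_help_arg(*argv))`. Adding a new help spelling would then touch one place instead of several.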
diff --git a/rdma/rdma.c b/rdma/rdma.c
index 253ac58b..3bdb76f0 100644
--- a/rdma/rdma.c
+++ b/rdma/rdma.c
@@ -11,9 +11,9 @@
 /* Global utils flags */
 int json;
 
-static void help(char *name)
+static void help(FILE *out, char *name)
 {
-	pr_out("Usage: %s [ OPTIONS ] OBJECT { COMMAND | help }\n"
+	fprintf(out, "Usage: %s [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       %s [ -f[orce] ] -b[atch] filename\n"
 	       "where  OBJECT := { dev | link | resource | monitor | system | statistic | help }\n"
 	       "       OPTIONS := { -V[ersion] | -d[etails] | -j[son] | -p[retty] | -r[aw]}\n", name, name);
@@ -21,14 +21,20 @@ static void help(char *name)
 
 static int cmd_help(struct rd *rd)
 {
-	help(rd->filename);
+	help(stdout, rd->filename);
 	return 0;
 }
 
+static int cmd_no_arg(struct rd *rd)
+{
+	help(stderr, rd->filename);
+	return -EINVAL;
+}
+
 static int rd_cmd(struct rd *rd, int argc, char **argv)
 {
 	const struct rd_cmd cmds[] = {
-		{ NULL,		cmd_help },
+		{ NULL,		cmd_no_arg },
 		{ "help",	cmd_help },
 		{ "dev",	cmd_dev },
 		{ "link",	cmd_link },
@@ -104,6 +110,7 @@ int main(int argc, char **argv)
 	bool show_raw = false;
 	bool force = false;
 	bool oneline = false;
+	bool need_nl = true;
 	struct rd rd = {};
 	char *filename;
 	int opt;
@@ -142,14 +149,14 @@ int main(int argc, char **argv)
 			batch_file = optarg;
 			break;
 		case 'h':
-			help(filename);
+			help(stdout, filename);
 			return EXIT_SUCCESS;
 		case ':':
 			pr_err("-%c option requires an argument\n", optopt);
 			return EXIT_FAILURE;
 		default:
 			pr_err("Unknown option.\n");
-			help(filename);
+			help(stderr, filename);
 			return EXIT_FAILURE;
 		}
 	}
@@ -163,9 +170,20 @@ int main(int argc, char **argv)
 	rd.show_driver_details = show_driver_details;
 	rd.show_raw = show_raw;
 
-	err = rd_init(&rd, filename);
-	if (err)
-		goto out;
+	if (argc > 0 && (strcmp(argv[0], "help") == 0 ||
+			 strcmp(argv[0], "-h") == 0 ||
+			 strcmp(argv[0], "--help") == 0))
+		need_nl = false;
+	if (argc > 1 && (strcmp(argv[1], "help") == 0 ||
+			 strcmp(argv[1], "-h") == 0 ||
+			 strcmp(argv[1], "--help") == 0))
+		need_nl = false;
+
+	if (need_nl) {
+		err = rd_init(&rd, filename);
+		if (err)
+			goto out;
+	}
 
 	if (batch_file)
 		err = rd_batch(&rd, batch_file, force);
@@ -173,6 +191,7 @@ int main(int argc, char **argv)
 		err = rd_cmd(&rd, argc, argv);
 out:
 	/* Always cleanup */
-	rd_cleanup(&rd);
+	if (need_nl)
+		rd_cleanup(&rd);
 	return err ? EXIT_FAILURE : EXIT_SUCCESS;
 }
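In `rd_cmd()`, the table's `NULL` entry is the handler used when no object is given. Pointing it at `cmd_no_arg()` instead of `cmd_help()` therefore turns a bare `rdma` into a usage error, while an explicit `rdma help` still succeeds. The following is a self-contained sketch of that dispatch convention. `struct demo_cmd`, `demo_dispatch()` and the `-ENOENT` returned for an unknown object are assumptions made for illustration; this is not rdma's own dispatcher.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for struct rd_cmd; not the rdma definition. */
struct demo_cmd {
	const char *name;	/* NULL only in slot 0: the "no argument" handler */
	int (*func)(void);
};

static void demo_usage(FILE *out)
{
	fprintf(out, "Usage: demo OBJECT { COMMAND | help }\n");
}

/* Explicit "help": usage is the requested output, so stdout and success. */
static int demo_help(void)
{
	demo_usage(stdout);
	return 0;
}

/* No object at all: usage is a diagnostic, so stderr and an error code. */
static int demo_no_arg(void)
{
	demo_usage(stderr);
	return -EINVAL;
}

static const struct demo_cmd demo_cmds[] = {
	{ NULL,   demo_no_arg },
	{ "help", demo_help },
};

static int demo_dispatch(int argc, char **argv)
{
	size_t i;

	if (argc < 1)
		return demo_cmds[0].func();
	for (i = 1; i < sizeof(demo_cmds) / sizeof(demo_cmds[0]); i++)
		if (strcmp(argv[0], demo_cmds[i].name) == 0)
			return demo_cmds[i].func();
	fprintf(stderr, "Unknown object \"%s\"\n", argv[0]);
	return -ENOENT;	/* error code assumed for this sketch */
}
```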
diff --git a/tc/tc.c b/tc/tc.c
index 7d69e4d5..057541d2 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -186,9 +186,11 @@ noexist:
 	return q;
 }
 
-static void usage(void)
+static void usage(int status)
 {
-	fprintf(stderr,
+	FILE *out = status == 0 ? stdout : stderr;
+
+	fprintf(out,
 		"Usage:	tc [ OPTIONS ] OBJECT { COMMAND | help }\n"
 		"	tc [-force] -batch filename\n"
 		"where  OBJECT := { qdisc | class | filter | chain |\n"
@@ -217,7 +219,7 @@ static int do_cmd(int argc, char **argv)
 	if (matches(*argv, "exec") == 0)
 		return do_exec(argc-1, argv+1);
 	if (matches(*argv, "help") == 0) {
-		usage();
+		usage(0);
 		return 0;
 	}
 
@@ -282,7 +284,7 @@ int main(int argc, char **argv)
 		} else if (matches(argv[1], "-iec") == 0) {
 			++use_iec;
 		} else if (matches(argv[1], "-help") == 0) {
-			usage();
+			usage(0);
 			return 0;
 		} else if (matches(argv[1], "-force") == 0) {
 			++force;
@@ -335,8 +337,8 @@ int main(int argc, char **argv)
 		return batch(batch_file);
 
 	if (argc <= 1) {
-		usage();
-		return 0;
+		usage(-1);
+		return -1;
 	}
 
 	tc_core_init();
diff --git a/testsuite/tests/ip/help.t b/testsuite/tests/ip/help.t
new file mode 100755
index 00000000..141abdd6
--- /dev/null
+++ b/testsuite/tests/ip/help.t
@@ -0,0 +1,151 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+. lib/generic.sh
+
+ts_log "[Testing help exit codes and output streams]"
+
+if [ -z "$SNAME" ]; then
+	SNAME="iproute2/iproute2-this"
+fi
+
+# Define binary paths using env vars if available, else fall back to SNAME subpaths
+IP="${IP:-$SNAME/ip/ip}"
+BRIDGE="${BRIDGE:-$SNAME/bridge/bridge}"
+TC="${TC:-$SNAME/tc/tc}"
+DEVLINK="${DEVLINK:-$SNAME/devlink/devlink}"
+DCB="${DCB:-$SNAME/dcb/dcb}"
+RDMA="${RDMA:-$SNAME/rdma/rdma}"
+VDPA="${VDPA:-$SNAME/vdpa/vdpa}"
+DPLL="${DPLL:-$SNAME/dpll/dpll}"
+GENL="${GENL:-$SNAME/genl/genl}"
+TIPC="${TIPC:-$SNAME/tipc/tipc}"
+NETSHAPER="${NETSHAPER:-$SNAME/netshaper/netshaper}"
+
+test_help_cmd() {
+	binary_path="$1"
+	arg="$2"
+	desc="$3"
+
+	$binary_path $arg >$STD_OUT 2>$STD_ERR
+	status=$?
+
+	if [ $status -ne 0 ]; then
+		ts_err "$0: $desc exited with $status (expected 0)"
+		return 1
+	fi
+
+	if [ -s $STD_ERR ]; then
+		ts_err "$0: $desc wrote to stderr (expected stdout only)"
+		ts_err "$0: stderr content:"
+		ts_err_cat $STD_ERR
+		return 1
+	fi
+
+	if [ ! -s $STD_OUT ]; then
+		ts_err "$0: $desc stdout was empty"
+		return 1
+	fi
+
+	echo "$0: $desc passed"
+	return 0
+}
+
+test_usage_error() {
+	binary_path="$1"
+	desc="$2"
+
+	$binary_path >$STD_OUT 2>$STD_ERR
+	status=$?
+
+	if [ $status -eq 0 ]; then
+		ts_err "$0: $desc (no args) exited with 0 (expected non-zero)"
+		return 1
+	fi
+
+	if [ ! -s $STD_ERR ]; then
+		ts_err "$0: $desc (no args) stderr was empty (expected usage info)"
+		return 1
+	fi
+
+	echo "$0: $desc (no args) failed as expected"
+	return 0
+}
+
+test_help_netlink_bypass() {
+	binary_path="$1"
+	arg="$2"
+	desc="$3"
+
+	$binary_path $arg >$STD_OUT 2>$STD_ERR
+	status=$?
+
+	if [ $status -ne 0 ]; then
+		ts_err "$0: $desc exited with $status (expected 0)"
+		return 1
+	fi
+
+	echo "$0: $desc passed"
+	return 0
+}
+
+# Run tests for each utility
+# ip
+test_help_cmd "$IP" "help" "ip help"
+test_help_cmd "$IP" "-help" "ip -help"
+test_usage_error "$IP" "ip"
+
+# bridge
+test_help_cmd "$BRIDGE" "help" "bridge help"
+test_help_cmd "$BRIDGE" "-h" "bridge -h"
+test_usage_error "$BRIDGE" "bridge"
+
+# tc
+test_help_cmd "$TC" "help" "tc help"
+test_help_cmd "$TC" "-h" "tc -h"
+test_usage_error "$TC" "tc"
+
+# devlink
+test_help_cmd "$DEVLINK" "help" "devlink help"
+test_help_cmd "$DEVLINK" "-h" "devlink -h"
+test_help_netlink_bypass "$DEVLINK" "dev help" "devlink dev help"
+test_usage_error "$DEVLINK" "devlink"
+
+# dcb
+test_help_cmd "$DCB" "help" "dcb help"
+test_help_cmd "$DCB" "-h" "dcb -h"
+test_help_netlink_bypass "$DCB" "app help" "dcb app help"
+test_usage_error "$DCB" "dcb"
+
+# rdma
+test_help_cmd "$RDMA" "help" "rdma help"
+test_help_cmd "$RDMA" "-h" "rdma -h"
+test_help_netlink_bypass "$RDMA" "dev help" "rdma dev help"
+test_usage_error "$RDMA" "rdma"
+
+# vdpa
+test_help_cmd "$VDPA" "help" "vdpa help"
+test_help_cmd "$VDPA" "-h" "vdpa -h"
+test_help_netlink_bypass "$VDPA" "dev help" "vdpa dev help"
+test_usage_error "$VDPA" "vdpa"
+
+# dpll
+test_help_cmd "$DPLL" "help" "dpll help"
+test_help_cmd "$DPLL" "-h" "dpll -h"
+test_help_netlink_bypass "$DPLL" "device help" "dpll device help"
+test_usage_error "$DPLL" "dpll"
+
+# genl
+test_help_cmd "$GENL" "-help" "genl -help"
+test_usage_error "$GENL" "genl"
+
+# tipc
+test_help_cmd "$TIPC" "-h" "tipc -h"
+test_help_cmd "$TIPC" "--help" "tipc --help"
+test_usage_error "$TIPC" "tipc"
+
+# netshaper
+test_help_cmd "$NETSHAPER" "help" "netshaper help"
+test_help_cmd "$NETSHAPER" "-help" "netshaper -help"
+test_help_netlink_bypass "$NETSHAPER" "group help" "netshaper group help"
+test_usage_error "$NETSHAPER" "netshaper"
diff --git a/tipc/bearer.c b/tipc/bearer.c
index bb434f5f..2804631b 100644
--- a/tipc/bearer.c
+++ b/tipc/bearer.c
@@ -33,9 +33,9 @@ struct cb_data {
 	struct nlmsghdr *nlh;
 };
 
-static void _print_bearer_opts(void)
+static void _print_bearer_opts(FILE *out)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"OPTIONS\n"
 		" priority		- Bearer link priority\n"
 		" tolerance		- Bearer link tolerance\n"
@@ -43,18 +43,18 @@ static void _print_bearer_opts(void)
 		" mtu			- Bearer link mtu\n");
 }
 
-void print_bearer_media(void)
+void print_bearer_media(FILE *out)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"\nMEDIA\n"
 		" udp			- User Datagram Protocol\n"
 		" ib			- Infiniband\n"
 		" eth			- Ethernet\n");
 }
 
-static void cmd_bearer_enable_l2_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_enable_l2_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s bearer enable media %s device DEVICE [OPTIONS]\n"
 		"\nOPTIONS\n"
 		" domain DOMAIN		- Discovery domain\n"
@@ -62,9 +62,9 @@ static void cmd_bearer_enable_l2_help(struct cmdl *cmdl, char *media)
 		cmdl->argv[0], media);
 }
 
-static void cmd_bearer_enable_udp_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_enable_udp_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s bearer enable [OPTIONS] media %s name NAME [localip IP|device DEVICE] [UDP OPTIONS]\n\n"
 		"OPTIONS\n"
 		" domain DOMAIN		- Discovery domain\n"
@@ -230,7 +230,7 @@ static int nl_add_udp_enable_opts(struct nlmsghdr *nlh, struct opt *opts,
 		opt = get_opt(opts, "localip");
 		if (!opt) {
 			fprintf(stderr, "error, udp bearer localip/device missing\n");
-			cmd_bearer_enable_udp_help(cmdl, "udp");
+			cmd_bearer_enable_udp_help(stderr, cmdl, "udp");
 			return -EINVAL;
 		}
 		locip = opt->val;
@@ -292,7 +292,7 @@ static char *cmd_get_media_type(const struct cmd *cmd, struct cmdl *cmdl,
 
 	if (!opt) {
 		if (help_flag)
-			(cmd->help)(cmdl);
+			(cmd->help)(stdout, cmdl);
 		else
 			fprintf(stderr, "error, missing bearer media\n");
 		return NULL;
@@ -307,14 +307,14 @@ static int nl_add_bearer_name(struct nlmsghdr *nlh, const struct cmd *cmd,
 	char bname[TIPC_MAX_BEARER_NAME];
 	int err;
 
-	if ((err = cmd_get_unique_bearer_name(cmd, cmdl, opts, bname, sup_media)))
+	if ((err = cmd_get_unique_bearer_name(help_flag ? stdout : stderr, cmd, cmdl, opts, bname, sup_media)))
 		return err;
 
 	mnl_attr_put_strz(nlh, TIPC_NLA_BEARER_NAME, bname);
 	return 0;
 }
 
-int cmd_get_unique_bearer_name(const struct cmd *cmd, struct cmdl *cmdl,
+int cmd_get_unique_bearer_name(FILE *out, const struct cmd *cmd, struct cmdl *cmdl,
 			       struct opt *opts, char *bname,
 			       const struct tipc_sup_media *sup_media)
 {
@@ -332,7 +332,7 @@ int cmd_get_unique_bearer_name(const struct cmd *cmd, struct cmdl *cmdl,
 
 		if (!(opt = get_opt(opts, entry->identifier))) {
 			if (help_flag)
-				(entry->help)(cmdl, media);
+				(entry->help)(out, cmdl, media);
 			else
 				fprintf(stderr, "error, missing bearer %s\n",
 					entry->identifier);
@@ -350,15 +350,15 @@ int cmd_get_unique_bearer_name(const struct cmd *cmd, struct cmdl *cmdl,
 	return -EINVAL;
 }
 
-static void cmd_bearer_add_udp_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_add_udp_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr, "Usage: %s bearer add media %s name NAME remoteip REMOTEIP\n\n",
+	fprintf(out, "Usage: %s bearer add media %s name NAME remoteip REMOTEIP\n\n",
 		cmdl->argv[0], media);
 }
 
-static void cmd_bearer_add_help(struct cmdl *cmdl)
+static void cmd_bearer_add_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s bearer add media udp name NAME remoteip REMOTEIP\n",
+	fprintf(out, "Usage: %s bearer add media udp name NAME remoteip REMOTEIP\n",
 		cmdl->argv[0]);
 }
 
@@ -421,6 +421,11 @@ static int cmd_bearer_add_media(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ NULL, },
 	};
 
+	if (help_flag) {
+		(cmd->help)(stdout, cmdl);
+		return 0;
+	}
+
 	/* Rewind optind to include media in the option list */
 	cmdl->optind--;
 	if (parse_opts(opts, cmdl) < 0)
@@ -473,15 +478,15 @@ static int cmd_bearer_add(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_bearer_enable_help(struct cmdl *cmdl)
+static void cmd_bearer_enable_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s bearer enable [OPTIONS] media MEDIA ARGS...\n\n"
 		"OPTIONS\n"
 		" domain DOMAIN         - Discovery domain\n"
 		" priority PRIORITY     - Bearer priority\n",
 		cmdl->argv[0]);
-	print_bearer_media();
+	print_bearer_media(out);
 }
 
 static int cmd_bearer_enable(struct nlmsghdr *nlh, const struct cmd *cmd,
@@ -509,12 +514,14 @@ static int cmd_bearer_enable(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ NULL, },
 	};
 
-	if (parse_opts(opts, cmdl) < 0) {
-		if (help_flag)
-			(cmd->help)(cmdl);
-		return -EINVAL;
+	if (help_flag) {
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
+	if (parse_opts(opts, cmdl) < 0)
+		return -EINVAL;
+
 	nlh = msg_init(TIPC_NL_BEARER_ENABLE);
 	if (!nlh) {
 		fprintf(stderr, "error: message initialisation failed\n");
@@ -548,23 +555,23 @@ static int cmd_bearer_enable(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, NULL, NULL);
 }
 
-static void cmd_bearer_disable_l2_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_disable_l2_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr, "Usage: %s bearer disable media %s device DEVICE\n",
+	fprintf(out, "Usage: %s bearer disable media %s device DEVICE\n",
 		cmdl->argv[0], media);
 }
 
-static void cmd_bearer_disable_udp_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_disable_udp_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr, "Usage: %s bearer disable media %s name NAME\n",
+	fprintf(out, "Usage: %s bearer disable media %s name NAME\n",
 		cmdl->argv[0], media);
 }
 
-static void cmd_bearer_disable_help(struct cmdl *cmdl)
+static void cmd_bearer_disable_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s bearer disable media MEDIA ARGS...\n",
+	fprintf(out, "Usage: %s bearer disable media MEDIA ARGS...\n",
 		cmdl->argv[0]);
-	print_bearer_media();
+	print_bearer_media(out);
 }
 
 static int cmd_bearer_disable(struct nlmsghdr *nlh, const struct cmd *cmd,
@@ -585,12 +592,14 @@ static int cmd_bearer_disable(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ NULL, },
 	};
 
-	if (parse_opts(opts, cmdl) < 0) {
-		if (help_flag)
-			(cmd->help)(cmdl);
-		return -EINVAL;
+	if (help_flag) {
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
+	if (parse_opts(opts, cmdl) < 0)
+		return -EINVAL;
+
 	nlh = msg_init(TIPC_NL_BEARER_DISABLE);
 	if (!nlh) {
 		fprintf(stderr, "error, message initialisation failed\n");
@@ -607,27 +616,27 @@ static int cmd_bearer_disable(struct nlmsghdr *nlh, const struct cmd *cmd,
 
 }
 
-static void cmd_bearer_set_help(struct cmdl *cmdl)
+static void cmd_bearer_set_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s bearer set OPTION media MEDIA ARGS...\n",
+	fprintf(out, "Usage: %s bearer set OPTION media MEDIA ARGS...\n",
 		cmdl->argv[0]);
-	_print_bearer_opts();
-	print_bearer_media();
+	_print_bearer_opts(out);
+	print_bearer_media(out);
 }
 
-static void cmd_bearer_set_udp_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_set_udp_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr, "Usage: %s bearer set OPTION media %s name NAME\n\n",
+	fprintf(out, "Usage: %s bearer set OPTION media %s name NAME\n\n",
 		cmdl->argv[0], media);
-	_print_bearer_opts();
+	_print_bearer_opts(out);
 }
 
-static void cmd_bearer_set_l2_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_set_l2_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s bearer set [OPTION]... media %s device DEVICE\n",
 		cmdl->argv[0], media);
-	_print_bearer_opts();
+	_print_bearer_opts(out);
 }
 
 static int cmd_bearer_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
@@ -662,6 +671,11 @@ static int cmd_bearer_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 	else
 		return -EINVAL;
 
+	if (help_flag) {
+		(cmd->help)(stdout, cmdl);
+		return 0;
+	}
+
 	if (cmdl->optind >= cmdl->argc) {
 		fprintf(stderr, "error, missing value\n");
 		return -EINVAL;
@@ -716,33 +730,33 @@ static int cmd_bearer_set(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_bearer_get_help(struct cmdl *cmdl)
+static void cmd_bearer_get_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s bearer get [OPTION] media MEDIA ARGS...\n",
+	fprintf(out, "Usage: %s bearer get [OPTION] media MEDIA ARGS...\n",
 		cmdl->argv[0]);
-	_print_bearer_opts();
-	print_bearer_media();
+	_print_bearer_opts(out);
+	print_bearer_media(out);
 }
 
-static void cmd_bearer_get_udp_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_get_udp_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr, "Usage: %s bearer get [OPTION] media %s name NAME [UDP OPTIONS]\n\n",
+	fprintf(out, "Usage: %s bearer get [OPTION] media %s name NAME [UDP OPTIONS]\n\n",
 		cmdl->argv[0], media);
-	fprintf(stderr,
+	fprintf(out,
 		"UDP OPTIONS\n"
 		" remoteip              - Remote ip address\n"
 		" remoteport            - Remote port\n"
 		" localip               - Local ip address\n"
 		" localport             - Local port\n\n");
-	_print_bearer_opts();
+	_print_bearer_opts(out);
 }
 
-static void cmd_bearer_get_l2_help(struct cmdl *cmdl, char *media)
+static void cmd_bearer_get_l2_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s bearer get OPTION media %s device DEVICE\n",
 		cmdl->argv[0], media);
-	_print_bearer_opts();
+	_print_bearer_opts(out);
 }
 
 
@@ -938,8 +952,8 @@ static int cmd_bearer_get_media(struct nlmsghdr *nlh, const struct cmd *cmd,
 	media = opt->val;
 
 	if (help_flag) {
-		cmd_bearer_get_udp_help(cmdl, media);
-		return -EINVAL;
+		cmd_bearer_get_udp_help(stdout, cmdl, media);
+		return 0;
 	}
 	if (strcmp(media, "udp") != 0) {
 		fprintf(stderr, "error, no \"%s\" media specific options\n", media);
@@ -1004,8 +1018,8 @@ static int cmd_bearer_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 	};
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	if (strcmp(cmd->cmd, "priority") == 0)
@@ -1090,8 +1104,8 @@ static int cmd_bearer_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 			   struct cmdl *cmdl, void *data)
 {
 	if (help_flag) {
-		fprintf(stderr, "Usage: %s bearer list\n", cmdl->argv[0]);
-		return -EINVAL;
+		fprintf(stdout, "Usage: %s bearer list\n", cmdl->argv[0]);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_BEARER_GET);
@@ -1103,9 +1117,9 @@ static int cmd_bearer_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_dumpit(nlh, bearer_list_cb, NULL);
 }
 
-void cmd_bearer_help(struct cmdl *cmdl)
+void cmd_bearer_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s bearer COMMAND [ARGS] ...\n"
 		"\n"
 		"COMMANDS\n"
diff --git a/tipc/bearer.h b/tipc/bearer.h
index a9344659..2389f811 100644
--- a/tipc/bearer.h
+++ b/tipc/bearer.h
@@ -12,11 +12,13 @@
 
 extern int help_flag;
 
+#include <stdio.h>
+
 int cmd_bearer(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl, void *data);
-void cmd_bearer_help(struct cmdl *cmdl);
+void cmd_bearer_help(FILE *out, struct cmdl *cmdl);
 
-void print_bearer_media(void);
-int cmd_get_unique_bearer_name(const struct cmd *cmd, struct cmdl *cmdl,
+void print_bearer_media(FILE *out);
+int cmd_get_unique_bearer_name(FILE *out, const struct cmd *cmd, struct cmdl *cmdl,
 			       struct opt *opts, char *bname,
 			       const struct tipc_sup_media *sup_media);
 #endif
diff --git a/tipc/cmdl.c b/tipc/cmdl.c
index 152ddb51..ee9e822c 100644
--- a/tipc/cmdl.c
+++ b/tipc/cmdl.c
@@ -108,7 +108,7 @@ int run_cmd(struct nlmsghdr *nlh, const struct cmd *caller,
 
 	if ((cmdl->optind) >= cmdl->argc) {
 		if (caller->help)
-			(caller->help)(cmdl);
+			(caller->help)(help_flag ? stdout : stderr, cmdl);
-		return -EINVAL;
+		return help_flag ? 0 : -EINVAL;
 	}
 	name = cmdl->argv[cmdl->optind];
@@ -118,7 +118,7 @@ int run_cmd(struct nlmsghdr *nlh, const struct cmd *caller,
 	if (!cmd) {
 		/* Show help about last command if we don't find this one */
 		if (help_flag && caller->help) {
-			(caller->help)(cmdl);
+			(caller->help)(stdout, cmdl);
 		} else {
 			fprintf(stderr, "error, invalid command \"%s\"\n", name);
 			fprintf(stderr, "use --help for command help\n");
@@ -126,5 +126,10 @@ int run_cmd(struct nlmsghdr *nlh, const struct cmd *caller,
 		return -EINVAL;
 	}
 
+	if (help_flag && cmdl->optind >= cmdl->argc && cmd->help) {
+		(cmd->help)(stdout, cmdl);
+		return 0;
+	}
+
 	return (cmd->func)(nlh, cmd, cmdl, data);
 }
diff --git a/tipc/cmdl.h b/tipc/cmdl.h
index 18fe51bf..fd8f02d1 100644
--- a/tipc/cmdl.h
+++ b/tipc/cmdl.h
@@ -12,6 +12,8 @@
 
 extern int help_flag;
 
+#include <stdio.h>
+
 enum {
 	OPT_KEY			= (1 << 0),
 	OPT_KEYVAL		= (1 << 1),
@@ -26,14 +28,14 @@ struct cmdl {
 struct tipc_sup_media {
 	char *media;
 	char *identifier;
-	void (*help)(struct cmdl *cmdl, char *media);
+	void (*help)(FILE *out, struct cmdl *cmdl, char *media);
 };
 
 struct cmd {
 	const char *cmd;
 	int (*func)(struct nlmsghdr *nlh, const struct cmd *cmd,
 		    struct cmdl *cmdl, void *data);
-	void (*help)(struct cmdl *cmdl);
+	void (*help)(FILE *out, struct cmdl *cmdl);
 };
 
 struct opt {
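The `help` callbacks in `struct cmd` and `struct tipc_sup_media` now take a `FILE *`, so each call site chooses the stream. Help the user asked for goes to `stdout`, and help printed because of a usage error belongs on `stderr`. Below is a small sketch of that convention, plus a name lookup in the style of `run_cmd()`. `struct mini_cmd`, `mini_find()`, `mini_list_help()`, `help_stream()` and `mini_show_help()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for tipc's struct cmd. */
struct mini_cmd {
	const char *cmd;
	void (*help)(FILE *out);
};

/* The help callback writes only to the stream it is handed. */
static void mini_list_help(FILE *out)
{
	fprintf(out, "Usage: tipc demo list\n");
}

/* Look a command up by name in a { NULL }-terminated table. */
static const struct mini_cmd *mini_find(const struct mini_cmd *cmds,
					const char *name)
{
	for (; cmds->cmd; cmds++)
		if (strcmp(cmds->cmd, name) == 0)
			return cmds;
	return NULL;
}

/*
 * Stream selection used at the call sites: requested help is normal
 * output, help shown because of a usage error is a diagnostic.
 */
static FILE *help_stream(bool help_requested)
{
	return help_requested ? stdout : stderr;
}

/* Print a command's help if it has one; returns false otherwise. */
static bool mini_show_help(const struct mini_cmd *c, bool help_requested)
{
	if (!c->help)
		return false;
	c->help(help_stream(help_requested));
	return true;
}
```

Taking the stream as a parameter also makes the help text testable on its own. A callback can be handed a `tmpfile()` and its output read back, without redirecting the process's real stdout or stderr.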
diff --git a/tipc/link.c b/tipc/link.c
index f91c3000..48b759d3 100644
--- a/tipc/link.c
+++ b/tipc/link.c
@@ -139,8 +139,8 @@ static int cmd_link_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		return -EINVAL;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	if (parse_opts(opts, cmdl) < 0)
@@ -164,9 +164,9 @@ static int cmd_link_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, link_get_cb, &prop);
 }
 
-static void cmd_link_get_help(struct cmdl *cmdl)
+static void cmd_link_get_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s link get PPROPERTY link LINK\n\n"
+	fprintf(out, "Usage: %s link get PROPERTY link LINK\n\n"
 		"PROPERTIES\n"
 		" tolerance             - Get link tolerance\n"
 		" priority              - Get link priority\n"
@@ -224,9 +224,9 @@ static int cmd_link_get_bcast_cb(const struct nlmsghdr *nlh, void *data)
 	return MNL_CB_OK;
 }
 
-static void cmd_link_get_bcast_help(struct cmdl *cmdl)
+static void cmd_link_get_bcast_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s link get PPROPERTY\n\n"
+	fprintf(out, "Usage: %s link get PROPERTY\n\n"
 		"PROPERTIES\n"
 		" broadcast             - Get link broadcast\n",
 		cmdl->argv[0]);
@@ -239,8 +239,8 @@ static int cmd_link_get_bcast(struct nlmsghdr *nlh, const struct cmd *cmd,
 	struct nlattr *attrs;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_LINK_GET);
@@ -269,9 +269,9 @@ static int cmd_link_get(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_link_stat_reset_help(struct cmdl *cmdl)
+static void cmd_link_stat_reset_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s link stat reset link LINK\n\n", cmdl->argv[0]);
+	fprintf(out, "Usage: %s link stat reset link LINK\n\n", cmdl->argv[0]);
 }
 
 static int cmd_link_stat_reset(struct nlmsghdr *nlh, const struct cmd *cmd,
@@ -286,12 +286,12 @@ static int cmd_link_stat_reset(struct nlmsghdr *nlh, const struct cmd *cmd,
 	};
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	if (parse_opts(opts, cmdl) != 1) {
-		(cmd->help)(cmdl);
+		(cmd->help)(stderr, cmdl);
 		return -EINVAL;
 	}
 
@@ -533,9 +533,9 @@ static int link_stat_show_cb(const struct nlmsghdr *nlh, void *data)
 	return _show_link_stat(name, attrs, prop, stats);
 }
 
-static void cmd_link_stat_show_help(struct cmdl *cmdl)
+static void cmd_link_stat_show_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s link stat show [ link { LINK | SUBSTRING | all } ]\n",
+	fprintf(out, "Usage: %s link stat show [ link { LINK | SUBSTRING | all } ]\n",
 		cmdl->argv[0]);
 }
 
@@ -552,8 +552,8 @@ static int cmd_link_stat_show(struct nlmsghdr *nlh, const struct cmd *cmd,
 	int err = 0;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_LINK_GET);
@@ -581,9 +581,9 @@ static int cmd_link_stat_show(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return err;
 }
 
-static void cmd_link_stat_help(struct cmdl *cmdl)
+static void cmd_link_stat_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s link stat COMMAND [ARGS]\n\n"
+	fprintf(out, "Usage: %s link stat COMMAND [ARGS]\n\n"
 		"COMMANDS:\n"
 		" reset                 - Reset link statistics for link\n"
 		" show                  - Get link priority\n",
@@ -602,9 +602,9 @@ static int cmd_link_stat(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_link_set_help(struct cmdl *cmdl)
+static void cmd_link_set_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s link set PPROPERTY link LINK\n\n"
+	fprintf(out, "Usage: %s link set PROPERTY link LINK\n\n"
 		"PROPERTIES\n"
 		" tolerance TOLERANCE   - Set link tolerance\n"
 		" priority PRIORITY     - Set link priority\n"
@@ -636,8 +636,8 @@ static int cmd_link_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		return -EINVAL;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	if (cmdl->optind >= cmdl->argc) {
@@ -672,9 +672,9 @@ static int cmd_link_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, link_get_cb, &prop);
 }
 
-static void cmd_link_set_bcast_help(struct cmdl *cmdl)
+static void cmd_link_set_bcast_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s link set broadcast PROPERTY\n\n"
+	fprintf(out, "Usage: %s link set broadcast PROPERTY\n\n"
 		"PROPERTIES\n"
 		" BROADCAST         - Forces all multicast traffic to be\n"
 		"                     transmitted via broadcast only,\n"
@@ -708,8 +708,8 @@ static int cmd_link_set_bcast(struct nlmsghdr *nlh, const struct cmd *cmd,
 	int method = 0;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	if (parse_opts(opts, cmdl) < 0)
@@ -720,7 +720,7 @@ static int cmd_link_set_bcast(struct nlmsghdr *nlh, const struct cmd *cmd,
 			break;
 
 	if (!opt || !opt->key) {
-		(cmd->help)(cmdl);
+		(cmd->help)(stderr, cmdl);
 		return -EINVAL;
 	}
 
@@ -744,7 +744,7 @@ static int cmd_link_set_bcast(struct nlmsghdr *nlh, const struct cmd *cmd,
 
 	opt = get_opt(opts, "ratio");
 	if (!method && !opt) {
-		(cmd->help)(cmdl);
+		(cmd->help)(stderr, cmdl);
 		return -EINVAL;
 	}
 
@@ -833,8 +833,8 @@ static int cmd_link_mon_summary(struct nlmsghdr *nlh, const struct cmd *cmd,
 	int err = 0;
 
 	if (help_flag) {
-		fprintf(stderr,	"Usage: %s monitor summary\n", cmdl->argv[0]);
-		return -EINVAL;
+		fprintf(stdout, "Usage: %s monitor summary\n", cmdl->argv[0]);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_MON_GET);
@@ -1052,23 +1052,23 @@ static int link_mon_list_cb(const struct nlmsghdr *nlh, void *data)
 	return MNL_CB_OK;
 }
 
-static void cmd_link_mon_list_help(struct cmdl *cmdl)
+static void cmd_link_mon_list_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s monitor list [ media MEDIA ARGS...]\n\n",
+	fprintf(out, "Usage: %s monitor list [ media MEDIA ARGS...]\n\n",
 		cmdl->argv[0]);
-	print_bearer_media();
+	print_bearer_media(out);
 }
 
-static void cmd_link_mon_list_l2_help(struct cmdl *cmdl, char *media)
+static void cmd_link_mon_list_l2_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s monitor list media %s device DEVICE [OPTIONS]\n",
 		cmdl->argv[0], media);
 }
 
-static void cmd_link_mon_list_udp_help(struct cmdl *cmdl, char *media)
+static void cmd_link_mon_list_udp_help(FILE *out, struct cmdl *cmdl, char *media)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s monitor list media udp name NAME\n\n",
 		cmdl->argv[0]);
 }
@@ -1096,15 +1096,15 @@ static int cmd_link_mon_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 		return -EINVAL;
 
 	if (get_opt(opts, "media")) {
-		err = cmd_get_unique_bearer_name(cmd, cmdl, opts, bname,
+		err = cmd_get_unique_bearer_name(help_flag ? stdout : stderr, cmd, cmdl, opts, bname,
 						 sup_media);
 		if (err)
 			return err;
 	}
 
 	if (help_flag) {
-		cmd->help(cmdl);
-		return -EINVAL;
+		cmd->help(stdout, cmdl);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_MON_GET);
@@ -1119,9 +1119,9 @@ static int cmd_link_mon_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return err;
 }
 
-static void cmd_link_mon_set_help(struct cmdl *cmdl)
+static void cmd_link_mon_set_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s monitor set PPROPERTY\n\n"
+	fprintf(out, "Usage: %s monitor set PROPERTY\n\n"
 		"PROPERTIES\n"
 		" threshold SIZE	- Set monitor activation threshold\n",
 		cmdl->argv[0]);
@@ -1131,16 +1131,16 @@ static int cmd_link_mon_set(struct nlmsghdr *nlh, const struct cmd *cmd,
 			    struct cmdl *cmdl, void *data)
 {
 	const struct cmd cmds[] = {
-		{ "threshold",	cmd_link_mon_set_prop,	NULL },
+		{ "threshold",	cmd_link_mon_set_prop,	cmd_link_mon_set_help },
 		{ NULL }
 	};
 
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_link_mon_get_help(struct cmdl *cmdl)
+static void cmd_link_mon_get_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s monitor get PPROPERTY\n\n"
+	fprintf(out, "Usage: %s monitor get PROPERTY\n\n"
 		"PROPERTIES\n"
 		" threshold	- Get monitor activation threshold\n",
 		cmdl->argv[0]);
@@ -1171,6 +1171,10 @@ static int link_mon_get_cb(const struct nlmsghdr *nlh, void *data)
 static int cmd_link_mon_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 				 struct cmdl *cmdl, void *data)
 {
+	if (help_flag) {
+		(cmd->help)(stdout, cmdl);
+		return 0;
+	}
 
 	nlh = msg_init(TIPC_NL_MON_GET);
 	if (!nlh) {
@@ -1185,17 +1189,17 @@ static int cmd_link_mon_get(struct nlmsghdr *nlh, const struct cmd *cmd,
 			    struct cmdl *cmdl, void *data)
 {
 	const struct cmd cmds[] = {
-		{ "threshold",	cmd_link_mon_get_prop,	NULL},
+		{ "threshold",	cmd_link_mon_get_prop,	cmd_link_mon_get_help},
 		{ NULL }
 	};
 
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_link_mon_help(struct cmdl *cmdl)
+static void cmd_link_mon_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
-		"Usage: %s montior COMMAND [ARGS] ...\n\n"
+	fprintf(out,
+		"Usage: %s monitor COMMAND [ARGS] ...\n\n"
 		"COMMANDS\n"
 		" set			- Set monitor properties\n"
 		" get			- Get monitor properties\n"
@@ -1218,9 +1222,9 @@ static int cmd_link_mon(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-void cmd_link_help(struct cmdl *cmdl)
+void cmd_link_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s link COMMAND [ARGS] ...\n"
 		"\n"
 		"COMMANDS\n"
diff --git a/tipc/link.h b/tipc/link.h
index a0d46035..09e47b89 100644
--- a/tipc/link.h
+++ b/tipc/link.h
@@ -8,10 +8,10 @@
 #ifndef _TIPC_LINK_H
 #define _TIPC_LINK_H
 
-extern int help_flag;
+#include <stdio.h>
 
 int cmd_link(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl,
 	     void *data);
-void cmd_link_help(struct cmdl *cmdl);
+void cmd_link_help(FILE *out, struct cmdl *cmdl);
 
 #endif
diff --git a/tipc/media.c b/tipc/media.c
index 5ff0c8c4..82ce5145 100644
--- a/tipc/media.c
+++ b/tipc/media.c
@@ -40,8 +40,8 @@ static int cmd_media_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 			 struct cmdl *cmdl, void *data)
 {
 	if (help_flag) {
-		fprintf(stderr, "Usage: %s media list\n", cmdl->argv[0]);
-		return -EINVAL;
+		fprintf(stdout, "Usage: %s media list\n", cmdl->argv[0]);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_MEDIA_GET);
@@ -101,8 +101,8 @@ static int cmd_media_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		return -EINVAL;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	if (parse_opts(opts, cmdl) < 0)
@@ -131,9 +131,9 @@ static int cmd_media_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, media_get_cb, &prop);
 }
 
-static void cmd_media_get_help(struct cmdl *cmdl)
+static void cmd_media_get_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s media get PPROPERTY media MEDIA\n\n"
+	fprintf(out, "Usage: %s media get PROPERTY media MEDIA\n\n"
 		"PROPERTIES\n"
 		" tolerance             - Get media tolerance\n"
 		" priority              - Get media priority\n"
@@ -156,9 +156,9 @@ static int cmd_media_get(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_media_set_help(struct cmdl *cmdl)
+static void cmd_media_set_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s media set PPROPERTY media MEDIA\n\n"
+	fprintf(out, "Usage: %s media set PROPERTY media MEDIA\n\n"
 		"PROPERTIES\n"
 		" tolerance TOLERANCE   - Set media tolerance\n"
 		" priority PRIORITY     - Set media priority\n"
@@ -192,8 +192,8 @@ static int cmd_media_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		return -EINVAL;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	if (cmdl->optind >= cmdl->argc) {
@@ -247,9 +247,9 @@ static int cmd_media_set(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-void cmd_media_help(struct cmdl *cmdl)
+void cmd_media_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s media COMMAND [ARGS] ...\n"
 		"\n"
 		"Commands:\n"
diff --git a/tipc/media.h b/tipc/media.h
index f1b4b540..eb4374dd 100644
--- a/tipc/media.h
+++ b/tipc/media.h
@@ -8,10 +8,10 @@
 #ifndef _TIPC_MEDIA_H
 #define _TIPC_MEDIA_H
 
-extern int help_flag;
+#include <stdio.h>
 
 int cmd_media(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl,
 	     void *data);
-void cmd_media_help(struct cmdl *cmdl);
+void cmd_media_help(FILE *out, struct cmdl *cmdl);
 
 #endif
diff --git a/tipc/nametable.c b/tipc/nametable.c
index 5162f7fc..1ede1024 100644
--- a/tipc/nametable.c
+++ b/tipc/nametable.c
@@ -80,8 +80,8 @@ static int cmd_nametable_show(struct nlmsghdr *nlh, const struct cmd *cmd,
 	int rc = 0;
 
 	if (help_flag) {
-		fprintf(stderr, "Usage: %s nametable show\n", cmdl->argv[0]);
-		return -EINVAL;
+		fprintf(stdout, "Usage: %s nametable show\n", cmdl->argv[0]);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_NAME_TABLE_GET);
@@ -97,9 +97,9 @@ static int cmd_nametable_show(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return rc;
 }
 
-void cmd_nametable_help(struct cmdl *cmdl)
+void cmd_nametable_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s nametable COMMAND\n\n"
 		"COMMANDS\n"
 		" show                  - Show nametable\n",
diff --git a/tipc/nametable.h b/tipc/nametable.h
index c4df8d9d..e84227ed 100644
--- a/tipc/nametable.h
+++ b/tipc/nametable.h
@@ -8,9 +8,9 @@
 #ifndef _TIPC_NAMETABLE_H
 #define _TIPC_NAMETABLE_H
 
-extern int help_flag;
+#include <stdio.h>
 
-void cmd_nametable_help(struct cmdl *cmdl);
+void cmd_nametable_help(FILE *out, struct cmdl *cmdl);
 int cmd_nametable(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl,
 		  void *data);
 
diff --git a/tipc/node.c b/tipc/node.c
index b84a3fa1..77abc185 100644
--- a/tipc/node.c
+++ b/tipc/node.c
@@ -48,8 +48,8 @@ static int cmd_node_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 			 struct cmdl *cmdl, void *data)
 {
 	if (help_flag) {
-		fprintf(stderr, "Usage: %s node list\n", cmdl->argv[0]);
-		return -EINVAL;
+		fprintf(stdout, "Usage: %s node list\n", cmdl->argv[0]);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_NODE_GET);
@@ -68,6 +68,11 @@ static int cmd_node_set_addr(struct nlmsghdr *nlh, const struct cmd *cmd,
 	uint32_t addr;
 	struct nlattr *nest;
 
+	if (help_flag) {
+		fprintf(stdout, "Usage: %s node set address ADDRESS\n", cmdl->argv[0]);
+		return 0;
+	}
+
 	if (cmdl->argc != cmdl->optind + 1) {
 		fprintf(stderr, "Usage: %s node set address ADDRESS\n",
 			cmdl->argv[0]);
@@ -126,6 +131,11 @@ static int cmd_node_set_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	struct nlattr *nest;
 	char *str;
 
+	if (help_flag) {
+		fprintf(stdout, "Usage: %s node set nodeid NODE_ID\n", cmdl->argv[0]);
+		return 0;
+	}
+
 	if (cmdl->argc != cmdl->optind + 1) {
 		fprintf(stderr, "Usage: %s node set nodeid NODE_ID\n",
 			cmdl->argv[0]);
@@ -150,9 +160,9 @@ static int cmd_node_set_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, NULL, NULL);
 }
 
-static void cmd_node_set_key_help(struct cmdl *cmdl)
+static void cmd_node_set_key_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s node set key KEY [algname ALGNAME] [PROPERTIES]\n"
 		"       %s node set key rekeying REKEYING\n\n"
 		"KEY\n"
@@ -201,8 +211,8 @@ static int cmd_node_set_key(struct nlmsghdr *nlh, const struct cmd *cmd,
 	char *str;
 
 	if (help_flag || cmdl->optind >= cmdl->argc) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(help_flag ? stdout : stderr, cmdl);
+		return help_flag ? 0 : -EINVAL;
 	}
 
 	/* Check if command starts with opts i.e. "rekeying" opt without key */
@@ -285,8 +295,8 @@ static int cmd_node_flush_key(struct nlmsghdr *nlh, const struct cmd *cmd,
 			      struct cmdl *cmdl, void *data)
 {
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	/* Init & do the command */
@@ -328,8 +338,8 @@ static int cmd_node_get_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 			       struct cmdl *cmdl, void *data)
 {
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_NET_GET);
@@ -364,8 +374,8 @@ static int cmd_node_get_netid(struct nlmsghdr *nlh, const struct cmd *cmd,
 			      struct cmdl *cmdl, void *data)
 {
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_NET_GET);
@@ -384,8 +394,8 @@ static int cmd_node_set_netid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	struct nlattr *nest;
 
 	if (help_flag) {
-		(cmd->help)(cmdl);
-		return -EINVAL;
+		(cmd->help)(stdout, cmdl);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_NET_SET);
@@ -408,9 +418,9 @@ static int cmd_node_set_netid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, NULL, NULL);
 }
 
-static void cmd_node_flush_help(struct cmdl *cmdl)
+static void cmd_node_flush_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s node flush PROPERTY\n\n"
 		"PROPERTIES\n"
 		" key                   - Flush all symmetric-keys\n",
@@ -428,9 +438,9 @@ static int cmd_node_flush(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_node_set_help(struct cmdl *cmdl)
+static void cmd_node_set_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s node set PROPERTY\n\n"
 		"PROPERTIES\n"
 		" identity NODEID       - Set node identity\n"
@@ -454,9 +464,9 @@ static int cmd_node_set(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-static void cmd_node_get_help(struct cmdl *cmdl)
+static void cmd_node_get_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s node get PROPERTY\n\n"
 		"PROPERTIES\n"
 		" identity              - Get node identity\n"
@@ -478,9 +488,9 @@ static int cmd_node_get(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-void cmd_node_help(struct cmdl *cmdl)
+void cmd_node_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s node COMMAND [ARGS] ...\n\n"
 		"COMMANDS\n"
 		" list                  - List remote nodes\n"
diff --git a/tipc/node.h b/tipc/node.h
index 4a986d07..7d310897 100644
--- a/tipc/node.h
+++ b/tipc/node.h
@@ -8,10 +8,10 @@
 #ifndef _TIPC_NODE_H
 #define _TIPC_NODE_H
 
-extern int help_flag;
+#include <stdio.h>
 
 int cmd_node(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl,
 	     void *data);
-void cmd_node_help(struct cmdl *cmdl);
+void cmd_node_help(FILE *out, struct cmdl *cmdl);
 
 #endif
diff --git a/tipc/peer.c b/tipc/peer.c
index 5a583fb9..6d12fb19 100644
--- a/tipc/peer.c
+++ b/tipc/peer.c
@@ -27,9 +27,9 @@ static int cmd_peer_rm_addr(struct nlmsghdr *nlh, const struct cmd *cmd,
 	struct nlattr *nest;
 
 	if ((cmdl->argc != cmdl->optind + 1) || help_flag) {
-		fprintf(stderr, "Usage: %s peer remove address ADDRESS\n",
+		fprintf(help_flag ? stdout : stderr, "Usage: %s peer remove address ADDRESS\n",
 			cmdl->argv[0]);
-		return -EINVAL;
+		return help_flag ? 0 : -EINVAL;
 	}
 
 	str = shift_cmdl(cmdl);
@@ -63,10 +63,10 @@ static int cmd_peer_rm_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	struct nlattr *nest;
 	char *str;
 
-	if (cmdl->argc != cmdl->optind + 1) {
-		fprintf(stderr, "Usage: %s peer remove identity NODEID\n",
+	if ((cmdl->argc != cmdl->optind + 1) || help_flag) {
+		fprintf(help_flag ? stdout : stderr, "Usage: %s peer remove identity NODEID\n",
 			cmdl->argv[0]);
-		return -EINVAL;
+		return help_flag ? 0 : -EINVAL;
 	}
 
 	str = shift_cmdl(cmdl);
@@ -89,23 +89,23 @@ static int cmd_peer_rm_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, NULL, NULL);
 }
 
-static void cmd_peer_rm_help(struct cmdl *cmdl)
+static void cmd_peer_rm_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s peer remove PROPERTY\n\n"
+	fprintf(out, "Usage: %s peer remove PROPERTY\n\n"
 		"PROPERTIES\n"
 		" identity NODEID         - Remove peer node identity\n",
 		cmdl->argv[0]);
 }
 
-static void cmd_peer_rm_addr_help(struct cmdl *cmdl)
+static void cmd_peer_rm_addr_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s peer remove address ADDRESS\n",
+	fprintf(out, "Usage: %s peer remove address ADDRESS\n",
 		cmdl->argv[0]);
 }
 
-static void cmd_peer_rm_nodeid_help(struct cmdl *cmdl)
+static void cmd_peer_rm_nodeid_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr, "Usage: %s peer remove identity NODEID\n",
+	fprintf(out, "Usage: %s peer remove identity NODEID\n",
 		cmdl->argv[0]);
 }
 
@@ -121,9 +121,9 @@ static int cmd_peer_rm(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
 }
 
-void cmd_peer_help(struct cmdl *cmdl)
+void cmd_peer_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s peer COMMAND [ARGS] ...\n\n"
 		"COMMANDS\n"
 		" remove                - Remove an offline peer node\n",
diff --git a/tipc/peer.h b/tipc/peer.h
index 2bd0a2a3..8a2374cf 100644
--- a/tipc/peer.h
+++ b/tipc/peer.h
@@ -8,10 +8,10 @@
 #ifndef _TIPC_PEER_H
 #define _TIPC_PEER_H
 
-extern int help_flag;
+#include <stdio.h>
 
 int cmd_peer(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl,
 	     void *data);
-void cmd_peer_help(struct cmdl *cmdl);
+void cmd_peer_help(FILE *out, struct cmdl *cmdl);
 
 #endif
diff --git a/tipc/socket.c b/tipc/socket.c
index 4d376e07..bf2147cb 100644
--- a/tipc/socket.c
+++ b/tipc/socket.c
@@ -112,8 +112,8 @@ static int cmd_socket_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 			   struct cmdl *cmdl, void *data)
 {
 	if (help_flag) {
-		fprintf(stderr, "Usage: %s socket list\n", cmdl->argv[0]);
-		return -EINVAL;
+		fprintf(stdout, "Usage: %s socket list\n", cmdl->argv[0]);
+		return 0;
 	}
 
 	nlh = msg_init(TIPC_NL_SOCK_GET);
@@ -125,9 +125,9 @@ static int cmd_socket_list(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_dumpit(nlh, sock_list_cb, NULL);
 }
 
-void cmd_socket_help(struct cmdl *cmdl)
+void cmd_socket_help(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: %s socket COMMAND\n\n"
 		"Commands:\n"
 		" list                  - List sockets (ports)\n",
diff --git a/tipc/socket.h b/tipc/socket.h
index c4341bb2..207b8666 100644
--- a/tipc/socket.h
+++ b/tipc/socket.h
@@ -8,9 +8,9 @@
 #ifndef _TIPC_SOCKET_H
 #define _TIPC_SOCKET_H
 
-extern int help_flag;
+#include <stdio.h>
 
-void cmd_socket_help(struct cmdl *cmdl);
+void cmd_socket_help(FILE *out, struct cmdl *cmdl);
 int cmd_socket(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl,
 		  void *data);
 
diff --git a/tipc/tipc.c b/tipc/tipc.c
index 56af052c..733acb50 100644
--- a/tipc/tipc.c
+++ b/tipc/tipc.c
@@ -28,9 +28,9 @@ int help_flag;
 int json;
 struct mnlu_gen_socket tipc_nlg;
 
-static void about(struct cmdl *cmdl)
+static void about(FILE *out, struct cmdl *cmdl)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Transparent Inter-Process Communication Protocol\n"
 		"Usage: %s [OPTIONS] COMMAND [ARGS] ...\n"
 		"\n"
@@ -111,20 +111,24 @@ int main(int argc, char *argv[])
 	cmdl.argc = argc;
 	cmdl.argv = argv;
 
-	res = mnlu_gen_socket_open(&tipc_nlg, TIPC_GENL_V2_NAME,
-				   TIPC_GENL_V2_VERSION);
-	if (res) {
-		fprintf(stderr,
-			"Unable to get TIPC nl family id (module loaded?)\n");
-		return -1;
+	if (!help_flag) {
+		res = mnlu_gen_socket_open(&tipc_nlg, TIPC_GENL_V2_NAME,
+					   TIPC_GENL_V2_VERSION);
+		if (res) {
+			fprintf(stderr,
+				"Unable to get TIPC nl family id (module loaded?)\n");
+			return -1;
+		}
 	}
 
 	res = run_cmd(NULL, &cmd, cmds, &cmdl, &tipc_nlg);
 	if (res != 0) {
-		mnlu_gen_socket_close(&tipc_nlg);
-		return -1;
+		if (!help_flag)
+			mnlu_gen_socket_close(&tipc_nlg);
+		return help_flag ? 0 : -1;
 	}
 
-	mnlu_gen_socket_close(&tipc_nlg);
+	if (!help_flag)
+		mnlu_gen_socket_close(&tipc_nlg);
 	return 0;
 }
diff --git a/vdpa/vdpa.c b/vdpa/vdpa.c
index e2b0a5b1..550443e6 100644
--- a/vdpa/vdpa.c
+++ b/vdpa/vdpa.c
@@ -1051,9 +1051,9 @@ static int cmd_dev(struct vdpa *vdpa, int argc, char **argv)
 	return -ENOENT;
 }
 
-static void help(void)
+static void help(FILE *out)
 {
-	fprintf(stderr,
+	fprintf(out,
 		"Usage: vdpa [ OPTIONS ] OBJECT { COMMAND | help }\n"
 		"where  OBJECT := { mgmtdev | dev }\n"
 		"       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] }\n");
@@ -1061,8 +1061,11 @@ static void help(void)
 
 static int vdpa_cmd(struct vdpa *vdpa, int argc, char **argv)
 {
-	if (!argc || matches(*argv, "help") == 0) {
-		help();
+	if (!argc) {
+		help(stderr);
+		return -EINVAL;
+	} else if (matches(*argv, "help") == 0) {
+		help(stdout);
 		return 0;
 	} else if (matches(*argv, "mgmtdev") == 0) {
 		return cmd_mgmtdev(vdpa, argc - 1, argv + 1);
@@ -1127,6 +1130,7 @@ int main(int argc, char **argv)
 		{ NULL, 0, NULL, 0 }
 	};
 	struct vdpa *vdpa;
+	bool need_nl = true;
 	int opt;
 	int err;
 	int ret;
@@ -1150,12 +1154,12 @@ int main(int argc, char **argv)
 			pretty = true;
 			break;
 		case 'h':
-			help();
+			help(stdout);
 			ret = EXIT_SUCCESS;
 			goto vdpa_free;
 		default:
 			fprintf(stderr, "Unknown option.\n");
-			help();
+			help(stderr);
 			ret = EXIT_FAILURE;
 			goto vdpa_free;
 		}
@@ -1164,10 +1168,23 @@ int main(int argc, char **argv)
 	argc -= optind;
 	argv += optind;
 
-	err = vdpa_init(vdpa);
-	if (err) {
-		ret = EXIT_FAILURE;
-		goto vdpa_free;
+	if (argc > 0 && (strcmp(argv[0], "help") == 0 ||
+			 strcmp(argv[0], "-h") == 0 ||
+			 strcmp(argv[0], "--help") == 0))
+		need_nl = false;
+
+	if (argc > 1 && (strcmp(argv[1], "help") == 0 ||
+			 strcmp(argv[1], "-h") == 0 ||
+			 strcmp(argv[1], "--help") == 0))
+		need_nl = false;
+
+
+	if (need_nl) {
+		err = vdpa_init(vdpa);
+		if (err) {
+			ret = EXIT_FAILURE;
+			goto vdpa_free;
+		}
 	}
 
 	err = vdpa_cmd(vdpa, argc, argv);
@@ -1179,7 +1196,8 @@ int main(int argc, char **argv)
 	ret = EXIT_SUCCESS;
 
 vdpa_fini:
-	vdpa_fini(vdpa);
+	if (need_nl)
+		vdpa_fini(vdpa);
 vdpa_free:
 	vdpa_free(vdpa);
 	return ret;
-- 
2.54.0
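
The convention this patch applies across tipc and vdpa can be sketched in isolation. Help printers take a `FILE *out`. Explicitly requested help goes to stdout with a zero status, while help shown because of a usage error goes to stderr with a failure status. This is a minimal sketch that assumes a stand-alone program. `usage()`, `is_help_token()`, `needs_netlink()` and `dispatch()` are hypothetical names. `needs_netlink()` only mirrors the `need_nl` argv check added to vdpa's `main()`, not either tool's real dispatch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical help printer: the caller picks the stream. */
static void usage(FILE *out, const char *prog)
{
	fprintf(out, "Usage: %s [ OPTIONS ] OBJECT { COMMAND | help }\n", prog);
}

/* The tokens the vdpa patch treats as a help request. */
static bool is_help_token(const char *s)
{
	return strcmp(s, "help") == 0 || strcmp(s, "-h") == 0 ||
	       strcmp(s, "--help") == 0;
}

/* Mirrors the need_nl check: skip netlink setup when the object or
 * the subcommand position is a help token. */
static bool needs_netlink(int argc, char **argv)
{
	if (argc > 0 && is_help_token(argv[0]))
		return false;
	if (argc > 1 && is_help_token(argv[1]))
		return false;
	return true;
}

/* Requested help: stdout, status 0. Missing command: stderr, -EINVAL.
 * Returns 1 for "not a help case, continue with the real dispatch". */
static int dispatch(const char *prog, int argc, char **argv)
{
	if (argc == 0) {
		usage(stderr, prog);
		return -EINVAL;
	}
	if (is_help_token(argv[0])) {
		usage(stdout, prog);
		return 0;
	}
	return 1;
}
```

Sending requested help to stdout lets `vdpa help | less` behave as users expect. The zero exit status also lets scripts read help text without tripping `set -e`.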


^ permalink raw reply related

* Re: [PATCH v2 0/7] seg6: add SRv6 Mobile User Plane (RFC 9433) behaviors
From: Yuya Kusakabe @ 2026-06-23  1:18 UTC (permalink / raw)
  To: andrea
  Cc: Yuya Kusakabe, andrea.mayer, davem, edumazet, dsahern, kuba,
	pabeni, horms, justin.iurman, shuah, corbet, skhan, linux-kernel,
	netdev, linux-kselftest, linux-doc, stefano.salsano, ahabdels
In-Reply-To: <20260608023951.ccd278890d7c489dbfe21113@common-net.org>

Hi Andrea,

Thank you for the answers.

> On the placement, the new lwtunnel encap type you propose could be a way to
> implement the seg6_mobile.c separation. Since this touches UAPI in
> include/uapi/linux/lwtunnel.h beyond the SRv6 subsystem and cannot be
> undone once merged, it needs careful design.
[...]
> As far as I can see, RFC 9433 has only one Headend behavior, and no L2 or
> reduced variants. So a single LWTUNNEL_ENCAP_SEG6_MOBILE handling both
> End.M.* and H.M.GTP4.D could be viable if accepting both input families
> (ETH_P_IPV6 for End.M.*, ETH_P_IP for H.M.GTP4.D) is treated as a design
> choice of the new encap type, not a stretching of the seg6_local endpoint
> processing model.
>
> These trade-offs are worth weighing in the final design. [...] I think the
> lwtunnel direction will need feedback and comments from its community and
> maintainers.

Agreed. The first per-behavior RFC series (End.MAP) will introduce the
LWTUNNEL_ENCAP_SEG6_MOBILE encap type and the SEG6_MOBILE_* attribute
namespace, and explain in its cover letter that this is the shared
container for the RFC 9433 Section 6 behaviors, so the lwtunnel and
routing folks can weigh in early. The dual input family (ETH_P_IPV6
for End.M.*, ETH_P_IP for H.M.GTP4.D) is specific to H.M.GTP4.D, so I
will lay that out in the H.M.GTP4.D cover letter; keeping it last in
the posting order gives that discussion time to converge.

> If LWTUNNEL_ENCAP_SEG6_MOBILE is added, using SEG6_MOBILE_* attributes
> instead of SEG6_LOCAL_* removes the NH6/SRH/OIF overload raised in v2.
> After solving the above, additional issues remain in the patchset,
> for example src is overloaded across MUP behaviors, and v4_mask_len
> needs revision. These are independent of the lwtunnel decision.

Both will be addressed in the rework; the details are in my replies to
your patch 2 and patch 3 reviews. In short: v4_mask_len and the src
template will be removed from End.M.GTP4.E entirely (full 32-bit IPv4
DA/SA recovery only), src will mean the verbatim outer IPv6 SA for the
IPv6-emitting behaviors, and the H.M.GTP4.D "Source UPF Prefix"
template can get its own attribute name in that series if you prefer.
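
End.M.GTP4.E has to recover IPv4 addresses that are carried inside IPv6 addresses. Recovering a "full 32-bit" address, as proposed above, amounts to reading 32 consecutive bits starting at some bit offset, and that offset need not be byte-aligned. This is a minimal sketch that assumes the IPv4 address starts at bit offset `off`, for example right after the SID's locator and function bits. The exact field layout is defined in RFC 9433 and is not restated here. `get_u32_at_bit()` and `put_u32_at_bit()` are hypothetical helpers, not the seg6 code.

```c
#include <assert.h>
#include <stdint.h>

/* Read 32 bits starting at bit 'off' (bit 0 = MSB of addr[0]) of a
 * 128-bit address. Hypothetical helper; requires off <= 96. */
static uint32_t get_u32_at_bit(const uint8_t addr[16], unsigned int off)
{
	uint32_t v = 0;

	assert(off <= 96);
	for (unsigned int i = 0; i < 32; i++) {
		unsigned int bit = off + i;

		v = (v << 1) | ((addr[bit / 8] >> (7 - bit % 8)) & 1u);
	}
	return v;
}

/* Inverse of get_u32_at_bit(): write v MSB-first at bit 'off'. */
static void put_u32_at_bit(uint8_t addr[16], unsigned int off, uint32_t v)
{
	assert(off <= 96);
	for (unsigned int i = 0; i < 32; i++) {
		unsigned int bit = off + i;
		uint8_t mask = (uint8_t)(1u << (7 - bit % 8));

		if (v & (1u << (31 - i)))
			addr[bit / 8] |= mask;
		else
			addr[bit / 8] &= (uint8_t)~mask;
	}
}
```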

> I can lead it. I have been evaluating the SRv6 drop reasons with my
> research group, alongside other pending SRv6 patches.
>
> We can sync offline on which SRv6 reasons fit your MUP behaviors, which
> v2 MUP-specific reasons would fit better as SRv6 or generic, and what
> stays MUP-specific.

Thanks for taking the lead; happy to sync offline. Until the prep
series lands, the per-behavior series will carry no MUP-specific drop
reasons.

> Thanks. Maybe also worth covering bad packets, like fragmented input or
> malformed GTP-U extensions.

Will do; the C-helper selftests will cover malformed and truncated
GTP-U extension chains, a duplicated PDU Session Container, and
fragmented outer input (which the behaviors will reject explicitly).

> Works for me. What matters is that the upcoming patches are well structured
> so NF_HOOK can be wired in cleanly in the follow-up.
>
> I am already working on the fix.

Understood. Each behavior will keep a single strip / transform / push
flow in its input handler, so the hook can later slot between strip
and push without reintroducing the skb->cb context pattern.
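
The single strip / transform / push flow described above can be pictured as follows. This is a minimal user-space sketch that assumes a toy `struct pkt` in place of an `sk_buff`. `behavior_input()`, the `mid_hook_t` hook, `drop_all()` and the "reuse the first 4 bytes of the old header" transform are all hypothetical. The sketch shows that everything the push step needs lives in locals of the one handler. A hook placed between strip and push therefore has nothing to stash in `skb->cb`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct pkt {                  /* hypothetical stand-in for an skb */
	unsigned char buf[256];
	size_t head;          /* offset of the first valid byte */
	size_t len;
};

enum verdict { PKT_ACCEPT, PKT_DROP };

/* Optional hook run after strip and before push; NULL means none. */
typedef enum verdict (*mid_hook_t)(struct pkt *p);

static enum verdict drop_all(struct pkt *p)
{
	(void)p;
	return PKT_DROP;
}

static bool strip(struct pkt *p, size_t hdr_len)
{
	if (p->len < hdr_len)
		return false;
	p->head += hdr_len;
	p->len -= hdr_len;
	return true;
}

static bool push(struct pkt *p, const unsigned char *hdr, size_t hdr_len)
{
	if (p->head < hdr_len)
		return false;
	p->head -= hdr_len;
	p->len += hdr_len;
	memcpy(p->buf + p->head, hdr, hdr_len);
	return true;
}

/* strip -> transform -> [hook] -> push, all state kept in locals. */
static bool behavior_input(struct pkt *p, size_t outer_len, mid_hook_t hook)
{
	unsigned char outer[16];
	unsigned char new_hdr[4];

	if (outer_len > sizeof(outer) || outer_len < sizeof(new_hdr) ||
	    p->len < outer_len)
		return false;
	memcpy(outer, p->buf + p->head, outer_len); /* keep what transform needs */
	if (!strip(p, outer_len))
		return false;
	memcpy(new_hdr, outer, sizeof(new_hdr));    /* hypothetical transform */
	if (hook && hook(p) == PKT_DROP)
		return false;
	return push(p, new_hdr, sizeof(new_hdr));
}
```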

Thanks,
Yuya

^ permalink raw reply

* Re: [PATCH v4 net] net: mana: Optimize irq affinity for low vcpu configs
From: Jakub Kicinski @ 2026-06-23  1:22 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Erni Sri Satya Vennela, Dipayaan Roy, Shiraz Saleem,
	Michael Kelley, Long Li, Yury Norov, linux-hyperv, linux-kernel,
	netdev, Paul Rosswurm, Shradha Gupta, Saurabh Singh Sengar,
	stable
In-Reply-To: <20260619073338.481035-1-shradhagupta@linux.microsoft.com>

On Fri, 19 Jun 2026 00:33:35 -0700 Shradha Gupta wrote:
> Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
> Cc: stable@vger.kernel.org

If you want this to be a fix -- could you please rewrite the commit
message? What matters most is the comparison before the bad commit,
the bad commit, and then with this fix applied. Perhaps the three
cases you list are that, but it's not immediately obvious.
-- 
pw-bot: cr

^ permalink raw reply

* Re: [PATCH net v2] eth: bnxt: improve the timing of stats
From: patchwork-bot+netdevbpf @ 2026-06-23  1:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	michael.chan, pavan.chebbi
In-Reply-To: <20260619191538.104165-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 19 Jun 2026 12:15:38 -0700 you wrote:
> Kernel selftests wait 1.25x of the promised stats refresh time
> (as read from ethtool -c). bnxt reports 1sec by default, but
> the stats update process has two steps. First device DMAs the
> new values, then the service task performs update in full-width
> SW counters. So the worst case delay is actually 2x.
> 
> Note that the behavior is different for ring stats and port stats.
> Port stats are fetched synchronously by the service worker, so
> there's no risk of doubling up the delay there.
> 
> [...]
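
The 2x figure follows from chaining two unsynchronized periodic steps. This is a minimal sketch that assumes the device DMA and the service task run with the same period. It also assumes, as a simplification, that an update is only picked up by a tick strictly after it. `next_tick()`, `worst_ring_delay()` and `worst_port_delay()` are hypothetical names, and times are in milliseconds.

```c
#include <assert.h>

/* First tick of a periodic event (phase + k * period) strictly after t.
 * Assumes t >= 0 and 0 <= phase < period. */
static long next_tick(long t, long period, long phase)
{
	if (t < phase)
		return phase;
	return phase + ((t - phase) / period + 1) * period;
}

/* Ring stats model: a change becomes visible once the device DMA
 * (phase 0) has copied it AND a later service-task run (svc_phase)
 * has folded it into the SW counter. Worst case over all phases. */
static long worst_ring_delay(long period)
{
	long worst = 0;

	for (long svc_phase = 0; svc_phase < period; svc_phase++) {
		for (long t = 0; t < 3 * period; t++) {
			long dma = next_tick(t, period, 0);
			long seen = next_tick(dma, period, svc_phase);

			if (seen - t > worst)
				worst = seen - t;
		}
	}
	return worst;
}

/* Port stats model: the service task fetches them itself, one step. */
static long worst_port_delay(long period)
{
	long worst = 0;

	for (long t = 0; t < 3 * period; t++) {
		long seen = next_tick(t, period, 0);

		if (seen - t > worst)
			worst = seen - t;
	}
	return worst;
}
```

With the default 1 s period, the ring-stats model gives a worst case of 2000 ms, which exceeds the selftests' 1250 ms budget (1.25x). The single-step port-stats model stays at 1000 ms.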

Here is the summary with links:
  - [net,v2] eth: bnxt: improve the timing of stats
    https://git.kernel.org/netdev/net/c/40529e58629b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net/wan/hdlc_ppp: sync per-proto timers before freeing hdlc state
From: patchwork-bot+netdevbpf @ 2026-06-23  1:30 UTC (permalink / raw)
  To: Fan Wu
  Cc: netdev, khc, kuba, davem, edumazet, pabeni, andrew+netdev,
	linux-kernel, stable
In-Reply-To: <20260617020518.116319-1-fanwu01@zju.edu.cn>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 17 Jun 2026 02:05:18 +0000 you wrote:
> Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp
> registers a timer via timer_setup(). That struct ppp is the
> hdlc->state allocation, which detach_hdlc_protocol() frees with kfree()
> in both teardown paths: unregister_hdlc_device() and the re-attach inside
> attach_hdlc_protocol().
> 
> The ppp proto never registered a .detach callback, so
> detach_hdlc_protocol() performs no timer synchronization before the
> kfree(). The only cancel, timer_delete(&proto->timer) in ppp_cp_event(),
> is partial (it does not wait for a running callback) and only runs on the
> ->CLOSED transition; ppp_stop()/ppp_close() do not sync either. A
> ppp_timer callback already executing (blocked on ppp->lock) survives the
> kfree and then dereferences proto->state / ppp->lock in freed memory,
> leading to a use-after-free.
> 
> [...]
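
This ordering problem appears in any teardown that stops a callback and then frees its state. A non-synchronous cancel does not wait for a callback that is already running. The sketch below is a user-space analogue, not kernel code. A pthread plays the role of the periodic timer callback, and `pthread_join()` stands in for `timer_delete_sync()`. `struct proto_state`, `state_start()` and `state_teardown()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hdlc->state allocation (struct ppp). */
struct proto_state {
	pthread_mutex_t lock;
	bool stop;
	int fires;
};

/* Stand-in for the timer callback: takes the lock and touches the
 * state on every "fire" until asked to stop. */
static void *timer_thread(void *arg)
{
	struct proto_state *st = arg;

	for (;;) {
		pthread_mutex_lock(&st->lock);
		if (st->stop) {
			pthread_mutex_unlock(&st->lock);
			return NULL;
		}
		st->fires++;
		pthread_mutex_unlock(&st->lock);
		sched_yield();
	}
}

static struct proto_state *state_start(pthread_t *thr)
{
	struct proto_state *st = calloc(1, sizeof(*st));

	if (!st)
		return NULL;
	pthread_mutex_init(&st->lock, NULL);
	if (pthread_create(thr, NULL, timer_thread, st) != 0) {
		pthread_mutex_destroy(&st->lock);
		free(st);
		return NULL;
	}
	return st;
}

/* Analogue of a .detach callback: request the stop, then WAIT for the
 * callback to finish before freeing. Freeing right after only setting
 * 'stop' would let a running callback touch freed memory. */
static int state_teardown(struct proto_state *st, pthread_t thr)
{
	int fires;

	pthread_mutex_lock(&st->lock);
	st->stop = true;
	pthread_mutex_unlock(&st->lock);
	pthread_join(thr, NULL);      /* the "sync" step */
	fires = st->fires;            /* safe: the callback can no longer run */
	pthread_mutex_destroy(&st->lock);
	free(st);
	return fires;
}
```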

Here is the summary with links:
  - [net] net/wan/hdlc_ppp: sync per-proto timers before freeing hdlc state
    https://git.kernel.org/netdev/net/c/c78a4e41ab5e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] ipv6: Fix null-ptr-deref in fib6_nh_mtu_change().
From: patchwork-bot+netdevbpf @ 2026-06-23  1:30 UTC (permalink / raw)
  To: Xiang Mei
  Cc: dsahern, idosch, netdev, davem, edumazet, pabeni, kuba, horms,
	bestswngs
In-Reply-To: <20260619045334.2427073-1-xmei5@asu.edu>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 18 Jun 2026 21:53:34 -0700 you wrote:
> From: Xiang Mei <xmei5@asu.edu>
> 
> fib6_nh_mtu_change() re-fetches idev via __in6_dev_get(arg->dev) and
> dereferences idev->cnf.mtu6 without a NULL check. addrconf_ifdown()
> clears dev->ip6_ptr with RCU_INIT_POINTER() after rt6_disable_ip() has
> released tb6_lock, so the RA-driven MTU walk can observe a NULL idev and
> oops. The caller rt6_mtu_change_route() guards its own __in6_dev_get(),
> but this re-fetch is unguarded; nexthop-backed routes survive
> addrconf_ifdown()'s flush, so the walk still reaches it after ip6_ptr is
> nulled.
> 
> [...]

Here is the summary with links:
  - [net] ipv6: Fix null-ptr-deref in fib6_nh_mtu_change().
    https://git.kernel.org/netdev/net/c/46c3b8191aad

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH bpf v4] selftests/bpf: Cover partial copy of non-linear test_run output
From: Sun Jian @ 2026-06-23  1:40 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, paul.chaignon, Sun Jian

prog_run_opts already verifies that BPF_PROG_TEST_RUN returns -ENOSPC
for a short data_out buffer while still reporting the full output size
through data_size_out.

Add the same coverage for non-linear test_run output. Use pass-through
TC and XDP programs with a 9000-byte packet, a 64-byte linear data area,
and a 100-byte data_out buffer. The expected output spans both the linear
data and the first fragment.

Verify that test_run returns -ENOSPC, reports the full packet length
through data_size_out, and copies the packet prefix into data_out for
both non-linear skb and XDP frags paths.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
---

v4:
- Send only the selftest patch; the fix patch has been applied to bpf/master.
- Initialize data_out buffers to avoid reading uninitialized stack memory if
  bpf_prog_test_run_opts() fails unexpectedly.

 .../selftests/bpf/prog_tests/prog_run_opts.c  | 70 +++++++++++++++++++
 .../selftests/bpf/progs/test_pkt_access.c     | 12 ++++
 2 files changed, 82 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
index 01f1d1b6715a..beb6fa78fd94 100644
--- a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
+++ b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
@@ -4,6 +4,10 @@
 
 #include "test_pkt_access.skel.h"
 
+#define NONLINEAR_PKT_LEN 9000
+#define NONLINEAR_LINEAR_DATA_LEN 64
+#define SHORT_OUT_LEN 100
+
 static const __u32 duration;
 
 static void check_run_cnt(int prog_fd, __u64 run_cnt)
@@ -20,6 +24,69 @@ static void check_run_cnt(int prog_fd, __u64 run_cnt)
 	      "incorrect number of repetitions, want %llu have %llu\n", run_cnt, info.run_cnt);
 }
 
+static void init_pkt(__u8 *pkt, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++)
+		pkt[i] = i & 0xff;
+}
+
+static void test_skb_nonlinear_data_out_partial(struct test_pkt_access *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	__u8 pkt[NONLINEAR_PKT_LEN];
+	__u8 out[SHORT_OUT_LEN] = {};
+	struct __sk_buff skb = {};
+	int prog_fd, err;
+
+	init_pkt(pkt, sizeof(pkt));
+
+	skb.data_end = NONLINEAR_LINEAR_DATA_LEN;
+
+	topts.data_in = pkt;
+	topts.data_size_in = sizeof(pkt);
+	topts.data_out = out;
+	topts.data_size_out = sizeof(out);
+	topts.ctx_in = &skb;
+	topts.ctx_size_in = sizeof(skb);
+
+	prog_fd = bpf_program__fd(skel->progs.tc_pass_prog);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+
+	ASSERT_EQ(err, -ENOSPC, "skb_partial_err");
+	ASSERT_EQ(topts.data_size_out, sizeof(pkt), "skb_partial_size");
+	ASSERT_OK(memcmp(out, pkt, sizeof(out)), "skb_partial_data");
+}
+
+static void test_xdp_nonlinear_data_out_partial(struct test_pkt_access *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	__u8 pkt[NONLINEAR_PKT_LEN];
+	__u8 out[SHORT_OUT_LEN] = {};
+	struct xdp_md ctx = {};
+	int prog_fd, err;
+
+	init_pkt(pkt, sizeof(pkt));
+
+	ctx.data = 0;
+	ctx.data_end = NONLINEAR_LINEAR_DATA_LEN;
+
+	topts.data_in = pkt;
+	topts.data_size_in = sizeof(pkt);
+	topts.data_out = out;
+	topts.data_size_out = sizeof(out);
+	topts.ctx_in = &ctx;
+	topts.ctx_size_in = sizeof(ctx);
+
+	prog_fd = bpf_program__fd(skel->progs.xdp_frags_pass_prog);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+
+	ASSERT_EQ(err, -ENOSPC, "xdp_partial_err");
+	ASSERT_EQ(topts.data_size_out, sizeof(pkt), "xdp_partial_size");
+	ASSERT_OK(memcmp(out, pkt, sizeof(out)), "xdp_partial_data");
+}
+
 void test_prog_run_opts(void)
 {
 	struct test_pkt_access *skel;
@@ -69,6 +136,9 @@ void test_prog_run_opts(void)
 	run_cnt += topts.repeat;
 	check_run_cnt(prog_fd, run_cnt);
 
+	test_skb_nonlinear_data_out_partial(skel);
+	test_xdp_nonlinear_data_out_partial(skel);
+
 cleanup:
 	if (skel)
 		test_pkt_access__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/test_pkt_access.c b/tools/testing/selftests/bpf/progs/test_pkt_access.c
index bce7173152c6..cd284401eebd 100644
--- a/tools/testing/selftests/bpf/progs/test_pkt_access.c
+++ b/tools/testing/selftests/bpf/progs/test_pkt_access.c
@@ -150,3 +150,15 @@ int test_pkt_access(struct __sk_buff *skb)
 
 	return TC_ACT_UNSPEC;
 }
+
+SEC("tc")
+int tc_pass_prog(struct __sk_buff *skb)
+{
+	return TC_ACT_OK;
+}
+
+SEC("xdp.frags")
+int xdp_frags_pass_prog(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
-- 
2.43.0
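
The selftest checks a specific contract. The kernel copies as much of the packet as fits, linear part first and then the fragments. It always reports the full length and returns `-ENOSPC` when the buffer is short. That contract can be modelled in a few lines. This is a minimal sketch of the contract only, not the kernel's implementation. `struct frag` and `copy_out()` are hypothetical. The sizes in the test mirror the selftest's 9000-byte packet, 64-byte linear area and 100-byte buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct frag {                 /* hypothetical fragment descriptor */
	const unsigned char *data;
	size_t len;
};

/* Copy the packet prefix that fits into 'out' (capacity *size_out):
 * linear data first, then frags in order. Always report the full
 * packet length through *size_out; return -ENOSPC if truncated. */
static int copy_out(const unsigned char *lin, size_t lin_len,
		    const struct frag *frags, size_t nfrags,
		    unsigned char *out, size_t *size_out)
{
	size_t cap = *size_out, total = lin_len, done, n;

	for (size_t i = 0; i < nfrags; i++)
		total += frags[i].len;

	n = lin_len < cap ? lin_len : cap;
	memcpy(out, lin, n);
	done = n;
	for (size_t i = 0; i < nfrags && done < cap; i++) {
		n = frags[i].len < cap - done ? frags[i].len : cap - done;
		memcpy(out + done, frags[i].data, n);
		done += n;
	}
	*size_out = total;
	return total > cap ? -ENOSPC : 0;
}
```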


^ permalink raw reply related

* Re: [PATCH net] bnx2x: fix potential memory leak in bnx2x_alloc_mem_bp()
From: Jakub Kicinski @ 2026-06-23  1:41 UTC (permalink / raw)
  To: Simon Horman
  Cc: Abdun Nihaal, skalluru, manishc, andrew+netdev, davem, edumazet,
	pabeni, netdev, linux-kernel, barak, stable
In-Reply-To: <20260622130515.GE827683@horms.kernel.org>

On Mon, 22 Jun 2026 14:05:15 +0100 Simon Horman wrote:
> FTR, there is an AI-generated review of this patch available on sashiko.dev.
> While I don't think that should affect the progress of this patch, you may
> want to consider it in the context of follow-up.

TBH it seems like an adjacent enough issue to me, but okay...

^ permalink raw reply

* [PATCH 0/3] SM8450 IPA support
From: Esteban Urrutia via B4 Relay @ 2026-06-23  1:44 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alex Elder
  Cc: linux-arm-msm, devicetree, linux-kernel, netdev, Esteban Urrutia

This series adds support for the IPA subsystem found in the SM8450 SoC.
While IPA v5.0 is very similar to IPA v5.1 (heck, the v5.0 data even
managed to get the modem properly up and running), it wasn't a perfect
fit: the modem would sometimes hang when rebooting or powering off the AP.
After a thorough investigation, I managed to create the proper data file
required for IPA v5.1.

Regards,
Esteban

Signed-off-by: Esteban Urrutia <esteuwu@proton.me>
---
Esteban Urrutia (3):
      arm64: dts: qcom: sm8450: Add IPA support
      dt-bindings: net: qcom,ipa: Add SM8450 compatible string
      net: ipa: Add IPA v5.1 data

 .../devicetree/bindings/net/qcom,ipa.yaml          |   1 +
 arch/arm64/boot/dts/qcom/sm8450.dtsi               |  55 ++-
 drivers/net/ipa/Makefile                           |   2 +-
 drivers/net/ipa/data/ipa_data-v5.1.c               | 477 +++++++++++++++++++++
 drivers/net/ipa/gsi_reg.c                          |   1 +
 drivers/net/ipa/ipa_data.h                         |   1 +
 drivers/net/ipa/ipa_main.c                         |   4 +
 drivers/net/ipa/ipa_reg.c                          |   1 +
 8 files changed, 536 insertions(+), 6 deletions(-)
---
base-commit: 948efecf22e49aa4bf55bb73ec79a0ddcfd38571
change-id: 20260622-sm8450-ipa-5da81f67eb65

Best regards,
--  
Esteban Urrutia <esteuwu@proton.me>




* [PATCH 2/3] dt-bindings: net: qcom,ipa: Add SM8450 compatible string
From: Esteban Urrutia via B4 Relay @ 2026-06-23  1:44 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alex Elder
  Cc: linux-arm-msm, devicetree, linux-kernel, netdev, Esteban Urrutia
In-Reply-To: <20260622-sm8450-ipa-v1-0-532f0299f96e@proton.me>

From: Esteban Urrutia <esteuwu@proton.me>

Declare the compatible string in the IPA binding for SM8450,
which uses IPA v5.1.

Signed-off-by: Esteban Urrutia <esteuwu@proton.me>
---
 Documentation/devicetree/bindings/net/qcom,ipa.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
index 68ec76fe4473..db91bfaaf833 100644
--- a/Documentation/devicetree/bindings/net/qcom,ipa.yaml
+++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml
@@ -53,6 +53,7 @@ properties:
           - qcom,sdx65-ipa
           - qcom,sm6350-ipa
           - qcom,sm8350-ipa
+          - qcom,sm8450-ipa
           - qcom,sm8550-ipa
       - items:
           - enum:

-- 
2.54.0




* [PATCH 1/3] arm64: dts: qcom: sm8450: Add IPA support
From: Esteban Urrutia via B4 Relay @ 2026-06-23  1:44 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alex Elder
  Cc: linux-arm-msm, devicetree, linux-kernel, netdev, Esteban Urrutia
In-Reply-To: <20260622-sm8450-ipa-v1-0-532f0299f96e@proton.me>

From: Esteban Urrutia <esteuwu@proton.me>

Add support for IPA in DT while expanding the IMEM region just enough to
accommodate the modem tables used by IPA.
For reference, SM8450 uses IPA v5.1.

Signed-off-by: Esteban Urrutia <esteuwu@proton.me>
---
 arch/arm64/boot/dts/qcom/sm8450.dtsi | 55 ++++++++++++++++++++++++++++++++----
 1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sm8450.dtsi b/arch/arm64/boot/dts/qcom/sm8450.dtsi
index 56cb6e959e4e..c904720008fa 100644
--- a/arch/arm64/boot/dts/qcom/sm8450.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8450.dtsi
@@ -2639,6 +2639,47 @@ adreno_smmu: iommu@3da0000 {
 			dma-coherent;
 		};
 
+		ipa: ipa@3f40000 {
+			compatible = "qcom,sm8450-ipa";
+
+			iommus = <&apps_smmu 0x5c0 0x0>,
+				 <&apps_smmu 0x5c2 0x0>;
+			reg = <0 0x3f40000 0 0x10000>,
+			      <0 0x3f50000 0 0x5000>,
+			      <0 0x3e04000 0 0xfc000>;
+			reg-names = "ipa-reg",
+				    "ipa-shared",
+				    "gsi";
+
+			interrupts-extended = <&intc GIC_SPI 654 IRQ_TYPE_EDGE_RISING>,
+					      <&intc GIC_SPI 432 IRQ_TYPE_LEVEL_HIGH>,
+					      <&ipa_smp2p_in 0 IRQ_TYPE_EDGE_RISING>,
+					      <&ipa_smp2p_in 1 IRQ_TYPE_EDGE_RISING>;
+			interrupt-names = "ipa",
+					  "gsi",
+					  "ipa-clock-query",
+					  "ipa-setup-ready";
+
+			clocks = <&rpmhcc RPMH_IPA_CLK>;
+			clock-names = "core";
+
+			interconnects = <&aggre2_noc MASTER_IPA 0 &mc_virt SLAVE_EBI1 0>,
+					<&gem_noc MASTER_APPSS_PROC 0 &config_noc SLAVE_IPA_CFG 0>;
+			interconnect-names = "memory",
+					     "config";
+
+			qcom,qmp = <&aoss_qmp>;
+
+			qcom,smem-states = <&ipa_smp2p_out 0>,
+					   <&ipa_smp2p_out 1>;
+			qcom,smem-state-names = "ipa-clock-enabled-valid",
+						"ipa-clock-enabled";
+
+			sram = <&ipa_modem_tables>;
+
+			status = "disabled";
+		};
+
 		usb_1_hsphy: phy@88e3000 {
 			compatible = "qcom,sm8450-usb-hs-phy",
 				     "qcom,usb-snps-hs-7nm-phy";
@@ -4970,17 +5011,21 @@ cti@13900000 {
 			clock-names = "apb_pclk";
 		};
 
-		sram@146aa000 {
+		sram@146a8000 {
 			compatible = "qcom,sm8450-imem", "syscon", "simple-mfd";
-			reg = <0 0x146aa000 0 0x1000>;
-			ranges = <0 0 0x146aa000 0x1000>;
+			reg = <0 0x146a8000 0 0x3000>;
+			ranges = <0 0 0x146a8000 0x3000>;
 
 			#address-cells = <1>;
 			#size-cells = <1>;
 
-			pil-reloc@94c {
+			ipa_modem_tables: modem-tables@0 {
+				reg = <0 0x2000>;
+			};
+
+			pil-reloc@294c {
 				compatible = "qcom,pil-reloc-info";
-				reg = <0x94c 0xc8>;
+				reg = <0x294c 0xc8>;
 			};
 		};
 

-- 
2.54.0




* [PATCH 3/3] net: ipa: Add IPA v5.1 data
From: Esteban Urrutia via B4 Relay @ 2026-06-23  1:44 UTC (permalink / raw)
  To: Bjorn Andersson, Konrad Dybcio, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alex Elder
  Cc: linux-arm-msm, devicetree, linux-kernel, netdev, Esteban Urrutia
In-Reply-To: <20260622-sm8450-ipa-v1-0-532f0299f96e@proton.me>

From: Esteban Urrutia <esteuwu@proton.me>

Add the required ipa_data-v5.1.c file for IPA v5.1 along with changes
that declare IPA v5.1 support.
This version of IPA is used in both SM8450 and SM8475 SoCs.

Signed-off-by: Esteban Urrutia <esteuwu@proton.me>
---
 drivers/net/ipa/Makefile             |   2 +-
 drivers/net/ipa/data/ipa_data-v5.1.c | 477 +++++++++++++++++++++++++++++++++++
 drivers/net/ipa/gsi_reg.c            |   1 +
 drivers/net/ipa/ipa_data.h           |   1 +
 drivers/net/ipa/ipa_main.c           |   4 +
 drivers/net/ipa/ipa_reg.c            |   1 +
 6 files changed, 485 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ipa/Makefile b/drivers/net/ipa/Makefile
index e148ec3c1a10..d4995c2e8ca0 100644
--- a/drivers/net/ipa/Makefile
+++ b/drivers/net/ipa/Makefile
@@ -7,7 +7,7 @@ IPA_REG_VERSIONS	:=	3.1 3.5.1 4.2 4.5 4.7 4.9 4.11 5.0 5.5
 # Some IPA versions can reuse another set of GSI register definitions.
 GSI_REG_VERSIONS	:=	3.1 3.5.1 4.0 4.5 4.9 4.11 5.0
 
-IPA_DATA_VERSIONS	:=	3.1 3.5.1 4.2 4.5 4.7 4.9 4.11 5.0 5.2 5.5
+IPA_DATA_VERSIONS	:=	3.1 3.5.1 4.2 4.5 4.7 4.9 4.11 5.0 5.1 5.2 5.5
 
 obj-$(CONFIG_QCOM_IPA)	+=	ipa.o
 
diff --git a/drivers/net/ipa/data/ipa_data-v5.1.c b/drivers/net/ipa/data/ipa_data-v5.1.c
new file mode 100644
index 000000000000..85b21efa1224
--- /dev/null
+++ b/drivers/net/ipa/data/ipa_data-v5.1.c
@@ -0,0 +1,477 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (C) 2023-2024 Linaro Ltd. */
+/* Copyright (C) 2026 Esteban Urrutia <esteuwu@proton.me> */
+
+#include <linux/array_size.h>
+#include <linux/log2.h>
+
+#include "../ipa_data.h"
+#include "../ipa_endpoint.h"
+#include "../ipa_mem.h"
+#include "../ipa_version.h"
+
+/** enum ipa_resource_type - IPA resource types for an SoC having IPA v5.1 */
+enum ipa_resource_type {
+	/* Source resource types; first must have value 0 */
+	IPA_RESOURCE_TYPE_SRC_PKT_CONTEXTS		= 0,
+	IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_LISTS,
+	IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_BUFF,
+	IPA_RESOURCE_TYPE_SRC_HPS_DMARS,
+	IPA_RESOURCE_TYPE_SRC_ACK_ENTRIES,
+
+	/* Destination resource types; first must have value 0 */
+	IPA_RESOURCE_TYPE_DST_DATA_SECTORS		= 0,
+	IPA_RESOURCE_TYPE_DST_DPS_DMARS,
+	IPA_RESOURCE_TYPE_DST_ULSO_SEGMENTS,
+};
+
+/* Resource groups used for an SoC having IPA v5.1 */
+enum ipa_rsrc_group_id {
+	/* Source resource group identifiers */
+	IPA_RSRC_GROUP_SRC_UL				= 0,
+	IPA_RSRC_GROUP_SRC_DL,
+	IPA_RSRC_GROUP_SRC_UNUSED_2,
+	IPA_RSRC_GROUP_SRC_UNUSED_3,
+	IPA_RSRC_GROUP_SRC_URLLC,
+	IPA_RSRC_GROUP_SRC_U_RX_QC,
+	IPA_RSRC_GROUP_SRC_COUNT,	/* Last in set; not a source group */
+
+	/* Destination resource group identifiers */
+	IPA_RSRC_GROUP_DST_UL				= 0,
+	IPA_RSRC_GROUP_DST_DL,
+	IPA_RSRC_GROUP_DST_UNUSED_2,
+	IPA_RSRC_GROUP_DST_UNUSED_3,
+	IPA_RSRC_GROUP_DST_UNUSED_4,
+	IPA_RSRC_GROUP_DST_UC,
+	IPA_RSRC_GROUP_DST_DRB_IP,
+	IPA_RSRC_GROUP_DST_COUNT,	/* Last; not a destination group */
+};
+
+/* QSB configuration data for an SoC having IPA v5.1 */
+static const struct ipa_qsb_data ipa_qsb_data[] = {
+	[IPA_QSB_MASTER_DDR] = {
+		.max_writes		= 0,
+		.max_reads		= 0,	/* no limit (hardware max) */
+		.max_reads_beats	= 0,
+	},
+	[IPA_QSB_MASTER_PCIE] = {
+		.max_writes		= 0,
+		.max_reads		= 0,	/* no limit (hardware max) */
+		.max_reads_beats	= 0,
+	},
+};
+
+/* Endpoint configuration data for an SoC having IPA v5.1 */
+static const struct ipa_gsi_endpoint_data ipa_gsi_endpoint_data[] = {
+	[IPA_ENDPOINT_AP_COMMAND_TX] = {
+		.ee_id		= GSI_EE_AP,
+		.channel_id	= 12,
+		.endpoint_id	= 14,
+		.toward_ipa	= true,
+		.channel = {
+			.tre_count	= 256,
+			.event_count	= 256,
+			.tlv_count	= 20,
+		},
+		.endpoint = {
+			.config = {
+				.resource_group	= IPA_RSRC_GROUP_SRC_UL,
+				.dma_mode	= true,
+				.dma_endpoint	= IPA_ENDPOINT_AP_LAN_RX,
+				.tx = {
+					.seq_type = IPA_SEQ_DMA,
+				},
+			},
+		},
+	},
+	[IPA_ENDPOINT_AP_LAN_RX] = {
+		.ee_id		= GSI_EE_AP,
+		.channel_id	= 13,
+		.endpoint_id	= 16,
+		.toward_ipa	= false,
+		.channel = {
+			.tre_count	= 256,
+			.event_count	= 256,
+			.tlv_count	= 9,
+		},
+		.endpoint = {
+			.config = {
+				.resource_group	= IPA_RSRC_GROUP_DST_UL,
+				.aggregation	= true,
+				.status_enable	= true,
+				.rx = {
+					.buffer_size	= 8192,
+					.pad_align	= ilog2(sizeof(u32)),
+					.aggr_time_limit = 500,
+				},
+			},
+		},
+	},
+	[IPA_ENDPOINT_AP_MODEM_TX] = {
+		.ee_id		= GSI_EE_AP,
+		.channel_id	= 11,
+		.endpoint_id	= 2,
+		.toward_ipa	= true,
+		.channel = {
+			.tre_count	= 512,
+			.event_count	= 512,
+			.tlv_count	= 25,
+		},
+		.endpoint = {
+			.filter_support	= true,
+			.config = {
+				.resource_group	= IPA_RSRC_GROUP_SRC_UL,
+				.checksum       = true,
+				.qmap		= true,
+				.status_enable	= true,
+				.tx = {
+					.seq_type = IPA_SEQ_2_PASS_SKIP_LAST_UC,
+					.status_endpoint =
+						IPA_ENDPOINT_MODEM_AP_RX,
+				},
+			},
+		},
+	},
+	[IPA_ENDPOINT_AP_MODEM_RX] = {
+		.ee_id		= GSI_EE_AP,
+		.channel_id	= 1,
+		.endpoint_id	= 23,
+		.toward_ipa	= false,
+		.channel = {
+			.tre_count	= 256,
+			.event_count	= 256,
+			.tlv_count	= 9,
+		},
+		.endpoint = {
+			.config = {
+				.resource_group	= IPA_RSRC_GROUP_DST_UL,
+				.checksum       = true,
+				.qmap		= true,
+				.aggregation	= true,
+				.rx = {
+					.buffer_size	= 8192,
+					.aggr_time_limit = 500,
+					.aggr_close_eof	= true,
+				},
+			},
+		},
+	},
+	[IPA_ENDPOINT_MODEM_AP_TX] = {
+		.ee_id		= GSI_EE_MODEM,
+		.channel_id	= 0,
+		.endpoint_id	= 12,
+		.toward_ipa	= true,
+		.endpoint = {
+			.filter_support	= true,
+		},
+	},
+	[IPA_ENDPOINT_MODEM_AP_RX] = {
+		.ee_id		= GSI_EE_MODEM,
+		.channel_id	= 7,
+		.endpoint_id	= 21,
+		.toward_ipa	= false,
+	},
+	[IPA_ENDPOINT_MODEM_DL_NLO_TX] = {
+		.ee_id		= GSI_EE_MODEM,
+		.channel_id	= 2,
+		.endpoint_id	= 15,
+		.toward_ipa	= true,
+		.endpoint = {
+			.filter_support	= true,
+		},
+	},
+};
+
+/* Source resource configuration data for an SoC having IPA v5.1 */
+static const struct ipa_resource ipa_resource_src[] = {
+	[IPA_RESOURCE_TYPE_SRC_PKT_CONTEXTS] = {
+		.limits[IPA_RSRC_GROUP_SRC_UL] = {
+			.min = 7,	.max = 12,
+		},
+		.limits[IPA_RSRC_GROUP_SRC_URLLC] = {
+			.min = 1,	.max = 63,
+		},
+		.limits[IPA_RSRC_GROUP_SRC_U_RX_QC] = {
+			.min = 0,	.max = 63,
+		},
+	},
+	[IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_LISTS] = {
+		.limits[IPA_RSRC_GROUP_SRC_UL] = {
+			.min = 21,	.max = 21,
+		},
+		.limits[IPA_RSRC_GROUP_SRC_URLLC] = {
+			.min = 10,	.max = 10,
+		},
+	},
+	[IPA_RESOURCE_TYPE_SRC_DESCRIPTOR_BUFF] = {
+		.limits[IPA_RSRC_GROUP_SRC_UL] = {
+			.min = 33,	.max = 33,
+		},
+		.limits[IPA_RSRC_GROUP_SRC_URLLC] = {
+			.min = 20,	.max = 20,
+		},
+	},
+	[IPA_RESOURCE_TYPE_SRC_HPS_DMARS] = {
+		.limits[IPA_RSRC_GROUP_SRC_UL] = {
+			.min = 0,	.max = 63,
+		},
+		.limits[IPA_RSRC_GROUP_SRC_URLLC] = {
+			.min = 1,	.max = 63,
+		},
+		.limits[IPA_RSRC_GROUP_SRC_U_RX_QC] = {
+			.min = 0,	.max = 63,
+		},
+	},
+	[IPA_RESOURCE_TYPE_SRC_ACK_ENTRIES] = {
+		.limits[IPA_RSRC_GROUP_SRC_UL] = {
+			.min = 38,	.max = 38,
+		},
+		.limits[IPA_RSRC_GROUP_SRC_URLLC] = {
+			.min = 16,	.max = 16,
+		},
+	},
+};
+
+/* Destination resource configuration data for an SoC having IPA v5.1 */
+static const struct ipa_resource ipa_resource_dst[] = {
+	[IPA_RESOURCE_TYPE_DST_DATA_SECTORS] = {
+		.limits[IPA_RSRC_GROUP_DST_UL] = {
+			.min = 6,	.max = 6,
+		},
+		.limits[IPA_RSRC_GROUP_DST_DL] = {
+			.min = 5,	.max = 5,
+		},
+		.limits[IPA_RSRC_GROUP_DST_DRB_IP] = {
+			.min = 39,	.max = 39,
+		},
+	},
+	[IPA_RESOURCE_TYPE_DST_DPS_DMARS] = {
+		.limits[IPA_RSRC_GROUP_DST_UL] = {
+			.min = 0,	.max = 3,
+		},
+		.limits[IPA_RSRC_GROUP_DST_DL] = {
+			.min = 0,	.max = 3,
+		},
+	},
+	[IPA_RESOURCE_TYPE_DST_ULSO_SEGMENTS] = {
+		.limits[IPA_RSRC_GROUP_DST_UL] = {
+			.min = 0,	.max = 63,
+		},
+		.limits[IPA_RSRC_GROUP_DST_DL] = {
+			.min = 0,	.max = 63,
+		},
+	},
+};
+
+/* Resource configuration data for an SoC having IPA v5.1 */
+static const struct ipa_resource_data ipa_resource_data = {
+	.rsrc_group_dst_count	= IPA_RSRC_GROUP_DST_COUNT,
+	.rsrc_group_src_count	= IPA_RSRC_GROUP_SRC_COUNT,
+	.resource_src_count	= ARRAY_SIZE(ipa_resource_src),
+	.resource_src		= ipa_resource_src,
+	.resource_dst_count	= ARRAY_SIZE(ipa_resource_dst),
+	.resource_dst		= ipa_resource_dst,
+};
+
+/* IPA-resident memory region data for an SoC having IPA v5.1 */
+static const struct ipa_mem ipa_mem_local_data[] = {
+	{
+		.id		= IPA_MEM_UC_EVENT_RING,
+		.offset		= 0x0000,
+		.size		= 0x1000,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_UC_SHARED,
+		.offset		= 0x1000,
+		.size		= 0x0080,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_UC_INFO,
+		.offset		= 0x1080,
+		.size		= 0x0200,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_V4_FILTER_HASHED,
+		.offset		= 0x1288,
+		.size		= 0x0078,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_V4_FILTER,
+		.offset		= 0x1308,
+		.size		= 0x0078,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_V6_FILTER_HASHED,
+		.offset		= 0x1388,
+		.size		= 0x0078,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_V6_FILTER,
+		.offset		= 0x1408,
+		.size		= 0x0078,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_V4_ROUTE_HASHED,
+		.offset		= 0x1488,
+		.size		= 0x0098,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_V4_ROUTE,
+		.offset		= 0x1528,
+		.size		= 0x0098,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_V6_ROUTE_HASHED,
+		.offset		= 0x15c8,
+		.size		= 0x0098,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_V6_ROUTE,
+		.offset		= 0x1668,
+		.size		= 0x0098,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_MODEM_HEADER,
+		.offset		= 0x1708,
+		.size		= 0x0240,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_AP_HEADER,
+		.offset		= 0x1948,
+		.size		= 0x01e0,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_MODEM_PROC_CTX,
+		.offset		= 0x1b40,
+		.size		= 0x0b20,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_AP_PROC_CTX,
+		.offset		= 0x2660,
+		.size		= 0x0200,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_STATS_QUOTA_MODEM,
+		.offset		= 0x2868,
+		.size		= 0x0060,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_STATS_QUOTA_AP,
+		.offset		= 0x28c8,
+		.size		= 0x0048,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_STATS_TETHERING,
+		.offset		= 0x2910,
+		.size		= 0x03c0,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_AP_V4_FILTER,
+		.offset		= 0x29b8,
+		.size		= 0x0188,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_AP_V6_FILTER,
+		.offset		= 0x2b40,
+		.size		= 0x0228,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_STATS_FILTER_ROUTE,
+		.offset		= 0x2cd0,
+		.size		= 0x0ba0,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_STATS_DROP,
+		.offset		= 0x3870,
+		.size		= 0x0020,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_MODEM,
+		.offset		= 0x3898,
+		.size		= 0x0d48,
+		.canary_count	= 2,
+	},
+	{
+		.id		= IPA_MEM_NAT_TABLE,
+		.offset		= 0x45e0,
+		.size		= 0x0900,
+		.canary_count	= 0,
+	},
+	{
+		.id		= IPA_MEM_PDN_CONFIG,
+		.offset		= 0x4ee8,
+		.size		= 0x0100,
+		.canary_count	= 2,
+	},
+};
+
+/* Memory configuration data for an SoC having IPA v5.1 */
+static const struct ipa_mem_data ipa_mem_data = {
+	.local_count	= ARRAY_SIZE(ipa_mem_local_data),
+	.local		= ipa_mem_local_data,
+	.imem_addr	= 0x146a8000,
+	.imem_size	= 0x00002000,
+	/*
+	 * While this value is 0xb000 on SM8450 and 0x9000 on SM8475,
+	 * it has been left set to 0x9000 for compatibility with SM8475
+	 */
+	.smem_size	= 0x00009000,
+};
+
+/* Interconnect rates are in 1000 byte/second units */
+static const struct ipa_interconnect_data ipa_interconnect_data[] = {
+	{
+		.name			= "memory",
+		.peak_bandwidth		= 1900000,	/* 1.9 GBps */
+		.average_bandwidth	= 590000,	/* 590 MBps */
+	},
+	/* Average rate is unused for the next interconnect */
+	{
+		.name			= "config",
+		.peak_bandwidth		= 76800,	/* 76.8 MBps */
+		.average_bandwidth	= 0,		/* unused */
+	},
+};
+
+/* Clock and interconnect configuration data for an SoC having IPA v5.1 */
+static const struct ipa_power_data ipa_power_data = {
+	.core_clock_rate	= 120 * 1000 * 1000,	/* Hz */
+	.interconnect_count	= ARRAY_SIZE(ipa_interconnect_data),
+	.interconnect_data	= ipa_interconnect_data,
+};
+
+/* Configuration data for an SoC having IPA v5.1. */
+const struct ipa_data ipa_data_v5_1 = {
+	.version		= IPA_VERSION_5_1,
+	.qsb_count		= ARRAY_SIZE(ipa_qsb_data),
+	.qsb_data		= ipa_qsb_data,
+	.modem_route_count	= 11,
+	.endpoint_count		= ARRAY_SIZE(ipa_gsi_endpoint_data),
+	.endpoint_data		= ipa_gsi_endpoint_data,
+	.resource_data		= &ipa_resource_data,
+	.mem_data		= &ipa_mem_data,
+	.power_data		= &ipa_power_data,
+};
diff --git a/drivers/net/ipa/gsi_reg.c b/drivers/net/ipa/gsi_reg.c
index e13cf835a013..a57072ba4bef 100644
--- a/drivers/net/ipa/gsi_reg.c
+++ b/drivers/net/ipa/gsi_reg.c
@@ -110,6 +110,7 @@ static const struct regs *gsi_regs(struct gsi *gsi)
 		return &gsi_regs_v4_11;
 
 	case IPA_VERSION_5_0:
+	case IPA_VERSION_5_1:
 	case IPA_VERSION_5_2:
 	case IPA_VERSION_5_5:
 		return &gsi_regs_v5_0;
diff --git a/drivers/net/ipa/ipa_data.h b/drivers/net/ipa/ipa_data.h
index 3eb9dc2ce339..fe6f7d5bfe88 100644
--- a/drivers/net/ipa/ipa_data.h
+++ b/drivers/net/ipa/ipa_data.h
@@ -253,6 +253,7 @@ extern const struct ipa_data ipa_data_v4_7;
 extern const struct ipa_data ipa_data_v4_9;
 extern const struct ipa_data ipa_data_v4_11;
 extern const struct ipa_data ipa_data_v5_0;
+extern const struct ipa_data ipa_data_v5_1;
 extern const struct ipa_data ipa_data_v5_2;
 extern const struct ipa_data ipa_data_v5_5;
 
diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index 788dd99af2a4..6c449032ae45 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -669,6 +669,10 @@ static const struct of_device_id ipa_match[] = {
 		.compatible	= "qcom,sdx65-ipa",
 		.data		= &ipa_data_v5_0,
 	},
+	{
+		.compatible	= "qcom,sm8450-ipa",
+		.data		= &ipa_data_v5_1,
+	},
 	{
 		.compatible	= "qcom,milos-ipa",
 		.data		= &ipa_data_v5_2,
diff --git a/drivers/net/ipa/ipa_reg.c b/drivers/net/ipa/ipa_reg.c
index 30bd69f4c147..5f22ca6295b1 100644
--- a/drivers/net/ipa/ipa_reg.c
+++ b/drivers/net/ipa/ipa_reg.c
@@ -125,6 +125,7 @@ static const struct regs *ipa_regs(enum ipa_version version)
 	case IPA_VERSION_4_11:
 		return &ipa_regs_v4_11;
 	case IPA_VERSION_5_0:
+	case IPA_VERSION_5_1:
 	case IPA_VERSION_5_2:
 		return &ipa_regs_v5_0;
 	case IPA_VERSION_5_5:

-- 
2.54.0




* Re: [PATCH net] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Xin Long @ 2026-06-23  1:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, syzbot+e14bc5d4942756023b77, Jon Maloy
In-Reply-To: <20260622171048.1626022-1-edumazet@google.com>

On Mon, Jun 22, 2026 at 1:10 PM Eric Dumazet <edumazet@google.com> wrote:
>
> TIPC UDP media bearer teardown calls dst_cache_destroy() on its
> replicast caches before calling synchronize_net() to wait for
> concurrent RCU readers (transmitters) to finish:
>
> static void cleanup_bearer(struct work_struct *work)
> {
> ...
>         list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
>                 dst_cache_destroy(&rcast->dst_cache);
>                 list_del_rcu(&rcast->list);
>                 kfree_rcu(rcast, rcu);
>         }
> ...
>         dst_cache_destroy(&ub->rcast.dst_cache);
>         udp_tunnel_sock_release(ub->sk);
>         synchronize_net();
> ...
> }
>
> This is highly buggy because dst_cache_destroy() immediately frees the
> per-CPU cache memory (free_percpu()) and releases the cached dst
> entries without any synchronization.
>
> If a concurrent transmitter (e.g., tipc_udp_xmit()) is running on another
> CPU under RCU protection, it can call dst_cache_get() concurrently,
> leading to:
> 1. Use-After-Free on the per-CPU cache pointer itself (crash).
> 2. "rcuref - imbalanced put()" warning if it attempts to release a
>    dst that was concurrently released by dst_cache_destroy().
>
> Furthermore, calling kfree(ub) immediately after synchronize_net() without
> closing the socket first (or waiting after closing it) leaves a window
> where a concurrent receiver (tipc_udp_recv()) could start after
> synchronize_net(), access ub, and suffer a UAF when kfree(ub) runs.
>
> To fix this, we must defer dst_cache_destroy() and kfree(ub) until after
> we have ensured that no more readers can see the bearer/socket and all
> existing readers have finished:
>
> 1. Move the rcast entries from the public list to a private list
>    and delete them using list_del_rcu() (stops new transmit readers).
> 2. Release the bearer socket using udp_tunnel_sock_release() (stops
>    new receive readers).
> 3. Call synchronize_net() to wait for all outstanding RCU readers
>    (both transmit and receive) to finish.
> 4. Now that it is safe, call dst_cache_destroy() on all isolated
>    rcast entries and the main bearer cache, and free the memory.
>
> Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media")
> Reported-by: syzbot+e14bc5d4942756023b77@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/netdev/6a396a66.52ae72c2.136ac7.0003.GAE@google.com/T/#u
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Xin Long <lucien.xin@gmail.com>
> Cc: Jon Maloy <jon.maloy@ericsson.com>
> ---
>  net/tipc/udp_media.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index 988b8a7f953ad6da860e6190f1f244650f121dce..befaf7137caf642462b7203a2429a60386e64db8 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -808,21 +808,26 @@ static void cleanup_bearer(struct work_struct *work)
>  {
>         struct udp_bearer *ub = container_of(work, struct udp_bearer, work);
>         struct udp_replicast *rcast, *tmp;
> +       LIST_HEAD(private_list);
>         struct tipc_net *tn;
>
>         list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
> -               dst_cache_destroy(&rcast->dst_cache);
>                 list_del_rcu(&rcast->list);
> -               kfree_rcu(rcast, rcu);
> +               list_add(&rcast->list, &private_list);
>         }

Could this corrupt the list for concurrent RCU readers?
When list_del_rcu() is called, it intentionally leaves the next pointer
intact so concurrent readers can continue their traversal. However, the
immediate call to list_add() overwrites both the next and prev pointers
to link the entry into private_list.
If a concurrent reader is currently positioned at rcast, won't it follow
the newly clobbered next pointer and jump from the original RCU list
directly into private_list?
Because private_list is allocated on the local stack, the reader might
interpret stack memory as a struct udp_replicast. Furthermore, the reader
would miss its loop termination condition because it expects to reach the
original list head, potentially resulting in an infinite loop or a crash.
[ ... ]

This looks legit.

Thanks.


* Re: [PATCH net-next v3] virtio-net: xsk: support tx wake up
From: Menglong Dong @ 2026-06-23  1:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Menglong Dong, xuanzhuo, eperezma, jasowang, andrew+netdev, davem,
	edumazet, kuba, pabeni, netdev, virtualization, linux-kernel
In-Reply-To: <20260622085825-mutt-send-email-mst@kernel.org>

On 2026/6/22 21:24 Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jun 22, 2026 at 08:27:12PM +0800, Menglong Dong wrote:
> > On 2026/6/22 06:31 Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Tue, Jun 16, 2026 at 07:59:12PM +0800, Menglong Dong wrote:
> > [...]
[...]
> > 
> > And the logic is like this:
> > 
> > Kernel: tx NAPI is woken up from skb_xmit_done() ->
> > Kernel: sq->vq and xsk->tx_ring are both empty ->
> > Kernel: call virtnet_xsk_xmit_batch()
> > 
> >     User: submit an entry to the xsk->tx_ring
> >     User: wakeup flag is not set, skip send()
> > 
> > Kernel: call xsk_set_tx_need_wakeup(), because sq->vq is empty
> > 
> > Unless we send more data, the entry in the xsk->tx_ring will
> > never be sent.
> 
> I'm not 100% sure I understand, but when someone fixes cross-CPU races
> with no synchronization or CPU memory barriers just with extra checks,
> this always gives me pause.
> 
> AI helped write this for me, for example:
>   1. Kernel: xsk_set_tx_need_wakeup stores NEED_WAKEUP (sits in store buffer)
>   2. Kernel: xsk_tx_peek_release_desc_batch - load, sees empty (reordered before the store is globally visible)
>   3. Kernel: peek finds nothing, returns 0
>   4. Userspace: stores entry + producer
>   5. Userspace: loads flags - doesn't see NEED_WAKEUP yet (still in kernel's store buffer)
>   6. Userspace: skips send()
>   7. Kernel: NEED_WAKEUP store finally becomes visible - too late
> 
> Seems legit?

Ah, that seems right. The race is more complex than I thought,
and it seems to be a common problem with XSK wakeup that likely
exists in all the drivers.

So I think we can remove the check here, and I'll see whether I
can solve the problem completely in a follow-up. WDYT?

> 
> 
> 
> > > 
> > > >  	sent = virtnet_xsk_xmit_batch(sq, pool, budget, &kicks);
> > > >  
> > > > +	if (need_wakeup) {
> > > > +		if (vring_size == sq->vq->num_free)
> > > > +			/* we can't wake up by ourself, and it should be done
> > > > +			 * by the user.
> > > > +			 */
> > > > +			xsk_set_tx_need_wakeup(pool);
> > > > +		else
> > > > +			/* we can wake up from skb_xmit_done() */
> > > > +			xsk_clear_tx_need_wakeup(pool);
> > > 
> > > But what if we don't have get tx napi so no wakeup in skb_xmit_done?
> > 
> > Sorry, I'm not sure what "get tx napi" means here ;(
> > 
> > There are entries in sq->vq, so skb_xmit_done() will be called after
> > the entries in the ring are consumed by the host, right?
> > Then the corresponding sq->napi will be scheduled, since we ensure
> > that tx napi is always enabled (i.e. napi->weight is non-zero)
> > in this commit:
> > 1df5116a41a8 ("virtio_net: xsk: prevent disable tx napi")
> 
> Oh I forgot we did that. But can xsk bind when tx napi has already
> been disabled previously?

From what I've observed, it can. I think that is a separate issue,
and I was going to fix it later in a separate patch.

It is a problem, right?

There are two ways to fix it:
1. don't allow binding if tx napi is not enabled, or
2. set tx_napi->weight to 1 when binding, and
    restore it to 0 when unbinding.

Should I fix it in this series?

Thanks!
Menglong Dong

> 
> 
> > Right?
> > 
> > Thanks!
> > Menglong Dong
> > 
> > > 
> > > 
> > > > +	}
> > > > +
> > > >  	if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
> > > >  		check_sq_full_and_disable(vi, vi->dev, sq);
> > > >  
> > > > @@ -1470,9 +1488,6 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> > > >  	u64_stats_add(&sq->stats.xdp_tx,  sent);
> > > >  	u64_stats_update_end(&sq->stats.syncp);
> > > >  
> > > > -	if (xsk_uses_need_wakeup(pool))
> > > > -		xsk_set_tx_need_wakeup(pool);
> > > > -
> > > >  	return sent;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.54.0


* Re: [PATCH net] bnx2x: fix potential memory leak in bnx2x_alloc_mem_bp()
From: patchwork-bot+netdevbpf @ 2026-06-23  1:50 UTC (permalink / raw)
  To: Abdun Nihaal
  Cc: skalluru, manishc, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, linux-kernel, barak, stable
In-Reply-To: <20260620062402.89549-1-nihaal@cse.iitm.ac.in>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 20 Jun 2026 11:53:50 +0530 you wrote:
> If the allocation of fp[i].tpa_info fails, the error path will not free
> the struct bnx2x_fastpath allocated earlier, as it is not linked to the
> bp structure yet. Fix that by linking it immediately after allocation.
> 
> Cc: stable@vger.kernel.org
> Fixes: 15192a8cf8a8 ("bnx2x: Split the FP structure")
> Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
> 
> [...]

Here is the summary with links:
  - [net] bnx2x: fix potential memory leak in bnx2x_alloc_mem_bp()
    https://git.kernel.org/netdev/net/c/a986fde914d8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH v1 net] ipv4: fib: Don't ignore error route in local/main tables.
From: patchwork-bot+netdevbpf @ 2026-06-23  1:50 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: dsahern, idosch, davem, edumazet, kuba, pabeni, horms, kuni1840,
	netdev
In-Reply-To: <20260619212753.3367244-1-kuniyu@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 19 Jun 2026 21:27:20 +0000 you wrote:
> When CONFIG_IP_MULTIPLE_TABLES is enabled but no rule is added,
> fib_lookup() performs route lookup directly on two tables.
> 
> Since the first lookup does not properly bail out, the result
> of an error route in the merged local/main table could be
> overwritten by another route in the default table:
> 
> [...]

Here is the summary with links:
  - [v1,net] ipv4: fib: Don't ignore error route in local/main tables.
    https://git.kernel.org/netdev/net/c/b72f0db64205

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH] net/tcp-ao: fix use-after-free of key in del_async path
From: HanQuan @ 2026-06-23  1:52 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, ncardwell, HanQuan

In tcp_ao_delete_key(), the del_async path skips the current_key
and rnext_key validity checks present in the synchronous path,
assuming these pointers are always NULL on LISTEN sockets.  However,
if a key was added with set_current=1/set_rnext=1 while the socket
was in CLOSE state, current_key and rnext_key will be non-NULL
after listen() transitions the socket to LISTEN.

When such a key is deleted with del_async=1, hlist_del_rcu() unlinks
it and call_rcu() frees it, but current_key and rnext_key are never
cleared and are left dangling. After the RCU grace period,
getsockopt(TCP_AO_INFO) dereferences current_key->sndid and
rnext_key->rcvid from freed slab memory.

Clear current_key and rnext_key in the del_async path when they
reference the key being deleted.
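Here is a simplified, single-threaded userspace sketch of the invariant the fix restores: every cached pointer to a key is cleared before the key is freed. struct model_key, struct model_ao_info and the model_* functions are hypothetical stand-ins, not the tcp_ao.c types. The kernel also defers the free with call_rcu() and uses WRITE_ONCE() for the stores, which this model omits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-ins for tcp_ao_key / tcp_ao_info. */
struct model_key {
	int sndid;
	int rcvid;
};

struct model_ao_info {
	struct model_key *current_key;
	struct model_key *rnext_key;
};

/*
 * Delete a key that has already been unlinked from the key list.
 * Cached pointers to it are cleared first, so later readers see NULL
 * instead of freed memory. This model frees immediately; the kernel
 * defers the free past an RCU grace period.
 */
static void model_delete_key(struct model_ao_info *info,
			     struct model_key *key)
{
	assert(info && key);
	if (info->current_key == key)
		info->current_key = NULL;
	if (info->rnext_key == key)
		info->rnext_key = NULL;
	free(key);
}

/* Reader in the style of getsockopt(TCP_AO_INFO): -1 when unset. */
static int model_current_sndid(const struct model_ao_info *info)
{
	return info->current_key ? info->current_key->sndid : -1;
}
```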

Fixes: d6732b95b6fb ("net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)")
Signed-off-by: HanQuan <eilaimemedsnaimel@gmail.com>
---
 net/ipv4/tcp_ao.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index 2f69bcecae78..a56bb79e15e0 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -1747,6 +1747,10 @@ static int tcp_ao_delete_key(struct sock *sk, struct tcp_ao_info *ao_info,
 	 * them and we can just free all resources in RCU fashion.
 	 */
 	if (del_async) {
+		if (ao_info->current_key == key)
+			WRITE_ONCE(ao_info->current_key, NULL);
+		if (ao_info->rnext_key == key)
+			WRITE_ONCE(ao_info->rnext_key, NULL);
 		atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
 		call_rcu(&key->rcu, tcp_ao_key_free_rcu);
 		return 0;
-- 
2.43.0



* [PATCH bpf-next v4 0/3] bpf: bidirectional VLAN support for bpf_fib_lookup()
From: Avinash Duduskar @ 2026-06-23  2:51 UTC
  To: ast, daniel, andrii
  Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
	hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
	rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
	toke, dsahern

This series adds VLAN awareness to bpf_fib_lookup() in both directions.
BPF_FIB_LOOKUP_VLAN resolves a VLAN egress to its underlying real device
plus the VLAN tag (XDP programs need this because VLAN devices have no XDP
xmit), and BPF_FIB_LOOKUP_VLAN_INPUT runs the lookup as if a tagged frame
had arrived on the matching VLAN subinterface, for iif policy routing and
VRF table selection.

The independent l3mdev/VRF flow-init fix, patch 1 in v1 and v2, was split
out and merged to bpf separately.

v4 changes what bpf_fib_lookup() returns in one case. v3 left a VLAN
egress that cannot be reduced to a physical device plus one tag (a QinQ
egress, or a parent in another namespace) as best-effort SUCCESS with
the VLAN device's ifindex. In his v3 review Toke asked for a distinct
error code there instead, so an XDP program cannot mistake an unresolved
VLAN egress for a physical one, and suggested the name
BPF_FIB_LKUP_RET_VLAN_FAILURE; v4 implements that. The
code is appended after BPF_FIB_LKUP_RET_NO_SRC_ADDR (nothing renumbered,
tools/ mirror updated) and is returned only when BPF_FIB_LOOKUP_VLAN is
set, so no existing caller can observe it. On that failure params->ifindex
is left at the input, like the input-side failures; a tc or XDP program
that wants the VLAN device's own ifindex re-issues the lookup without the
flag, the recovery path he described.
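Here is a hypothetical userspace model of that recovery path. It tries the reducing lookup first and, on VLAN_FAILURE, re-issues the lookup without the flag. Only the flag value (1U << 6) comes from the patch. model_fib_lookup() stands in for bpf_fib_lookup() and models VLAN egresses only. The other names are assumptions made for illustration.

```c
#include <assert.h>

/* Flag value taken from the series; everything else is hypothetical. */
#define MODEL_FIB_LOOKUP_VLAN (1U << 6)

enum model_ret { MODEL_RET_SUCCESS, MODEL_RET_VLAN_FAILURE };

/* What the FIB resolved to for one destination (VLAN egress only). */
struct model_route {
	int vlan_ifindex;    /* FIB-result VLAN device */
	int parent_ifindex;  /* its immediate parent */
	int reducible;       /* parent is a real device in the same netns */
};

/* Stand-in for the helper: with the flag, reduce or fail and leave
 * *ifindex untouched; without it, report the VLAN device itself. */
static enum model_ret model_fib_lookup(const struct model_route *rt,
				       unsigned int flags, int *ifindex)
{
	if (flags & MODEL_FIB_LOOKUP_VLAN) {
		if (!rt->reducible)
			return MODEL_RET_VLAN_FAILURE;
		*ifindex = rt->parent_ifindex;
		return MODEL_RET_SUCCESS;
	}
	*ifindex = rt->vlan_ifindex;
	return MODEL_RET_SUCCESS;
}

/*
 * Returns 1 when *ifindex is a physical egress an XDP program could
 * redirect to, 0 when it is the VLAN device's own ifindex (pass the
 * frame to the stack, or handle it from tc).
 */
static int model_resolve_egress(const struct model_route *rt,
				int in_ifindex, int *ifindex)
{
	*ifindex = in_ifindex;
	if (model_fib_lookup(rt, MODEL_FIB_LOOKUP_VLAN, ifindex) ==
	    MODEL_RET_SUCCESS)
		return 1;

	assert(*ifindex == in_ifindex);  /* failure left the input intact */
	model_fib_lookup(rt, 0, ifindex);
	return 0;
}
```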

The reason for a distinct code rather than best-effort SUCCESS is that
SUCCESS on an unreducible egress silently blackholed XDP traffic: a
redirect to the VLAN device is dropped at xdp_do_flush() with no in-band
signal to the program. VLAN_FAILURE makes the unreducible case explicit,
and a live-frames selftest exercises both the redirected and the passed
paths. Only the immediate parent is resolved, so QinQ egresses and
parents in a foreign netns are the unreducible cases. Bond, team, TBID
and VRF egresses resolve, as the selftest table pins down.

Changes v3 -> v4:

- Patch 1: return BPF_FIB_LKUP_RET_VLAN_FAILURE for an unreducible VLAN
  egress, leaving params->ifindex at the input.

- Patch 3: the QinQ-egress and cross-namespace-egress arms expect
  VLAN_FAILURE; an escape-hatch arm re-issues without the flag for the
  inner VLAN device's ifindex; and test_fib_lookup_vlan_redirect drives
  live frames (BPF_F_TEST_XDP_LIVE_FRAMES) through the native redirect
  path, asserting a reducible egress is delivered and a QinQ egress is
  passed to the stack. The selftest's VLAN_FAILURE arms are IPv4 only,
  since bpf_ipv6_fib_lookup() restores params->ifindex with the same code
  as the IPv4 path that the arms exercise.

Toke marked the other three v3 questions as fine as-is, and v4 keeps
that behaviour. A tag on input that is unmatched, or that resolves to a
down or cross-namespace device, returns NOT_FWDED. OUTPUT | VLAN_INPUT
is rejected with -EINVAL. The BPF_FIB_LOOKUP_VLAN_INPUT name is kept.

Taking the tag as lookup input follows the approach David Ahern suggested
in the 2021 fwmark discussion:
https://lore.kernel.org/bpf/6248c547-ad64-04d6-fcec-374893cc1ef2@gmail.com/

v3: https://lore.kernel.org/all/20260617224729.1428662-1-avinash.duduskar@gmail.com/
v2: https://lore.kernel.org/all/20260616223426.3568080-1-avinash.duduskar@gmail.com/
v1: https://lore.kernel.org/all/20260609172052.81613-1-avinash.duduskar@gmail.com/

Avinash Duduskar (3):
  bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
  bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT flag to bpf_fib_lookup() helper
  selftests/bpf: Add bpf_fib_lookup() VLAN flag tests

 include/uapi/linux/bpf.h                      |  47 +-
 net/core/filter.c                             | 133 +++-
 tools/include/uapi/linux/bpf.h                |  47 +-
 .../selftests/bpf/prog_tests/fib_lookup.c     | 696 +++++++++++++++++-
 .../testing/selftests/bpf/progs/fib_lookup.c  |  36 +
 5 files changed, 930 insertions(+), 29 deletions(-)


base-commit: a975094bf98ca97be9146f9d3b5681a6f9cf5ce3
-- 
2.54.0



* [PATCH bpf-next v4 1/3] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
From: Avinash Duduskar @ 2026-06-23  2:51 UTC
  To: ast, daniel, andrii
  Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
	hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
	rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
	toke, dsahern
In-Reply-To: <20260623025147.1001664-1-avinash.duduskar@gmail.com>

bpf_fib_lookup() returns the FIB-resolved egress ifindex straight
from the fib result. When the egress is a VLAN device, the returned
ifindex is the VLAN netdev's, which has no XDP xmit handler; XDP
programs that want to forward the frame (e.g. xdp-forward) must
instead target the underlying physical device and push the VLAN tag
themselves. Today the program has no way to learn either the
underlying ifindex or the VLAN tag without maintaining its own
VLAN-to-ifindex map in userspace and refreshing it on netlink
events.

Add BPF_FIB_LOOKUP_VLAN. When the caller sets this flag and the fib
result is a VLAN device whose immediate parent is a real (non-VLAN)
device in the same network namespace, populate the existing output
fields params->h_vlan_proto and params->h_vlan_TCI from the VLAN
device and replace params->ifindex with the parent's ifindex.
params->h_vlan_TCI carries the VID only, with PCP and DEI bits zero; a
consumer wanting to set egress priority writes PCP itself.
params->smac is the VLAN device's own address, which can differ from
the parent's.
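Because h_vlan_TCI comes back with PCP and DEI zero, a consumer that wants an egress priority ORs PCP into the top three bits before pushing the tag. The following is a userspace sketch that assumes the standard 802.1Q TCI layout (PCP:3, DEI:1, VID:12). tci_with_pcp() is a hypothetical helper. In a BPF program, bpf_ntohs()/bpf_htons() would replace ntohs()/htons().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* 802.1Q TCI layout: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
#define TCI_VID_MASK  0x0fffu
#define TCI_PCP_SHIFT 13

/*
 * Take h_vlan_TCI as returned with BPF_FIB_LOOKUP_VLAN (network byte
 * order, VID only) and return the TCI to push, in network byte order,
 * carrying the caller-chosen priority.
 */
static uint16_t tci_with_pcp(uint16_t h_vlan_TCI_be, unsigned int pcp)
{
	uint16_t tci = ntohs(h_vlan_TCI_be);

	assert(pcp <= 7);
	assert((tci & ~TCI_VID_MASK) == 0);  /* helper left PCP/DEI zero */
	return htons((uint16_t)(tci | (pcp << TCI_PCP_SHIFT)));
}
```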

Only the immediate parent is resolved, via vlan_dev_priv(dev)->real_dev
and not vlan_dev_real_dev(), which walks to the bottom of a stack. When
the immediate parent is not a real device in the same namespace, the
lookup returns BPF_FIB_LKUP_RET_VLAN_FAILURE and leaves params->ifindex
at the input. This covers a stacked VLAN (QinQ), where the immediate
parent is itself a VLAN device and one h_vlan_proto/h_vlan_TCI pair
cannot describe two tags, and a parent in another network namespace (a
VLAN device can be moved while its parent stays), whose ifindex would
be meaningless in the caller's namespace. Returning a distinct code
keeps the unreducible case separate from a physical egress. A program
that wants the VLAN device's own ifindex re-issues the lookup without
BPF_FIB_LOOKUP_VLAN. The distinction matters for XDP because a program
cannot xmit on a VLAN device. A success carrying the VLAN ifindex would
lead it to redirect to a device with no ndo_xdp_xmit, and the frame
would be dropped at xdp_do_flush(). The swap and the vlan fields are
written only on the reduce path. Other output fields keep their
existing behaviour, so a frag-needed result still reports the route mtu
in params->mtu_result.
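The reduce step described above can be modelled in userspace roughly as follows. This sketches the decision logic only. It uses hypothetical struct model_dev/struct model_params types rather than net_device and struct bpf_fib_lookup. The checks follow the text: only the immediate parent is examined, it must be a non-VLAN device in the same namespace, and ifindex is restored to the input on failure.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down device model; not the kernel's net_device. */
struct model_dev {
	int ifindex;
	int netns_id;
	int is_vlan;
	uint16_t vlan_proto;               /* host order, e.g. 0x8100 */
	uint16_t vid;
	const struct model_dev *real_dev;  /* immediate parent (VLAN only) */
};

struct model_params {
	int ifindex;
	uint16_t h_vlan_proto;  /* network byte order */
	uint16_t h_vlan_TCI;    /* network byte order */
};

enum { MODEL_OK, MODEL_VLAN_FAILURE };

static int model_reduce(const struct model_dev *dev,
			struct model_params *p, int in_ifindex,
			int want_vlan)
{
	p->h_vlan_proto = 0;
	p->h_vlan_TCI = 0;
	p->ifindex = dev->ifindex;  /* FIB result */

	if (want_vlan && dev->is_vlan) {
		const struct model_dev *real = dev->real_dev;

		assert(real != NULL);
		/* QinQ parent or parent in another netns: unreducible. */
		if (real->is_vlan || real->netns_id != dev->netns_id) {
			p->ifindex = in_ifindex;
			return MODEL_VLAN_FAILURE;
		}
		p->h_vlan_proto = htons(dev->vlan_proto);
		p->h_vlan_TCI = htons(dev->vid);  /* VID only, PCP/DEI zero */
		p->ifindex = real->ifindex;
	}
	return MODEL_OK;
}
```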

On the skb path without tot_len the deferred mtu check is done against
the resolved egress device. To keep that the VLAN device rather than
the parent after the swap, bpf_ipv4_fib_lookup()/bpf_ipv6_fib_lookup()
hand the FIB-result device back to the caller; the XDP path always
runs the route-mtu check and passes NULL. When the flag is not set,
behaviour is unchanged: h_vlan_proto and h_vlan_TCI are zeroed and
ifindex is left at the FIB result.

The new block is compiled only under CONFIG_VLAN_8021Q since
vlan_dev_priv() is not defined otherwise; without that config
is_vlan_dev() is constant false and the flag is accepted but never
acts. That is safe because no VLAN device can exist there, so every
egress is already physical.

This lets an XDP redirect target the physical device and learn the
tag to push in a single lookup, which xdp-forward's optional VLAN
mode (xdp-project/xdp-tools#504) wants from the kernel side.

The helper's input semantics are unchanged; the reverse direction
(supplying a tag as lookup input) is added in the following patch.

Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
---
 include/uapi/linux/bpf.h       | 28 +++++++++++++-
 net/core/filter.c              | 69 ++++++++++++++++++++++++----------
 tools/include/uapi/linux/bpf.h | 28 +++++++++++++-
 3 files changed, 104 insertions(+), 21 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 89b36de5fdbb..8d0058d88eb2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3532,6 +3532,26 @@ union bpf_attr {
  *			Use the mark present in *params*->mark for the fib lookup.
  *			This option should not be used with BPF_FIB_LOOKUP_DIRECT,
  *			as it only has meaning for full lookups.
+ *		**BPF_FIB_LOOKUP_VLAN**
+ *			If the fib lookup resolves to a VLAN device whose
+ *			parent is a real (non-VLAN) device, set
+ *			*params*->h_vlan_proto and *params*->h_vlan_TCI from
+ *			the VLAN device and replace *params*->ifindex with the
+ *			parent's ifindex. *params*->h_vlan_TCI carries the VID
+ *			only, with PCP and DEI bits zero; a consumer wanting to
+ *			set egress priority writes PCP itself. *params*->smac is
+ *			the VLAN device's own address, which can differ from the
+ *			parent's. Only the immediate parent is resolved; if it
+ *			is itself a VLAN device (QinQ) or in another namespace,
+ *			the egress cannot be reduced to a physical device plus
+ *			one tag and the lookup returns
+ *			**BPF_FIB_LKUP_RET_VLAN_FAILURE** with *params*->ifindex
+ *			left at the input. Re-issue without
+ *			**BPF_FIB_LOOKUP_VLAN** to obtain the VLAN device's own
+ *			ifindex. The swap and the vlan fields
+ *			are written only on success; other output fields keep
+ *			the helper's existing behaviour, so a frag-needed result
+ *			still reports the route mtu in *params*->mtu_result.
  *
  *		*ctx* is either **struct xdp_md** for XDP programs or
  *		**struct sk_buff** tc cls_act programs.
@@ -7327,6 +7347,7 @@ enum {
 	BPF_FIB_LOOKUP_TBID    = (1U << 3),
 	BPF_FIB_LOOKUP_SRC     = (1U << 4),
 	BPF_FIB_LOOKUP_MARK    = (1U << 5),
+	BPF_FIB_LOOKUP_VLAN    = (1U << 6),
 };
 
 enum {
@@ -7340,6 +7361,7 @@ enum {
 	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
 	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
 	BPF_FIB_LKUP_RET_NO_SRC_ADDR,  /* failed to derive IP src addr */
+	BPF_FIB_LKUP_RET_VLAN_FAILURE, /* VLAN egress, parent unresolvable */
 };
 
 struct bpf_fib_lookup {
@@ -7393,7 +7415,11 @@ struct bpf_fib_lookup {
 
 	union {
 		struct {
-			/* output */
+			/*
+			 * output with BPF_FIB_LOOKUP_VLAN: set from the
+			 * resolved egress VLAN device (see the flag); zeroed
+			 * on other successful lookups.
+			 */
 			__be16	h_vlan_proto;
 			__be16	h_vlan_TCI;
 		};
diff --git a/net/core/filter.c b/net/core/filter.c
index 2e96b4b847ce..8345295d84de 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6201,10 +6201,28 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
 #endif
 
 #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
-static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu)
+static int bpf_fib_set_fwd_params(struct net_device *dev,
+				  struct bpf_fib_lookup *params,
+				  u32 flags, u32 mtu)
 {
 	params->h_vlan_TCI = 0;
 	params->h_vlan_proto = 0;
+
+#if IS_ENABLED(CONFIG_VLAN_8021Q)
+	if ((flags & BPF_FIB_LOOKUP_VLAN) && is_vlan_dev(dev)) {
+		struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+
+		if (!is_vlan_dev(real_dev) &&
+		    net_eq(dev_net(real_dev), dev_net(dev))) {
+			params->h_vlan_proto = vlan_dev_vlan_proto(dev);
+			params->h_vlan_TCI = htons(vlan_dev_vlan_id(dev));
+			params->ifindex = real_dev->ifindex;
+		} else {
+			return BPF_FIB_LKUP_RET_VLAN_FAILURE;
+		}
+	}
+#endif
+
 	if (mtu)
 		params->mtu_result = mtu; /* union with tot_len */
 
@@ -6214,8 +6232,10 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu)
 
 #if IS_ENABLED(CONFIG_INET)
 static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
-			       u32 flags, bool check_mtu)
+			       u32 flags, bool check_mtu,
+			       struct net_device **fwd_dev)
 {
+	u32 in_ifindex = params->ifindex;
 	struct neighbour *neigh = NULL;
 	struct fib_nh_common *nhc;
 	struct in_device *in_dev;
@@ -6347,16 +6367,23 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
 
 set_fwd_params:
-	return bpf_fib_set_fwd_params(params, mtu);
+	if (fwd_dev)
+		*fwd_dev = dev;
+	err = bpf_fib_set_fwd_params(dev, params, flags, mtu);
+	if (err == BPF_FIB_LKUP_RET_VLAN_FAILURE)
+		params->ifindex = in_ifindex;
+	return err;
 }
 #endif
 
 #if IS_ENABLED(CONFIG_IPV6)
 static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
-			       u32 flags, bool check_mtu)
+			       u32 flags, bool check_mtu,
+			       struct net_device **fwd_dev)
 {
 	struct in6_addr *src = (struct in6_addr *) params->ipv6_src;
 	struct in6_addr *dst = (struct in6_addr *) params->ipv6_dst;
+	u32 in_ifindex = params->ifindex;
 	struct fib6_result res = {};
 	struct neighbour *neigh;
 	struct net_device *dev;
@@ -6486,13 +6513,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
 
 set_fwd_params:
-	return bpf_fib_set_fwd_params(params, mtu);
+	if (fwd_dev)
+		*fwd_dev = dev;
+	err = bpf_fib_set_fwd_params(dev, params, flags, mtu);
+	if (err == BPF_FIB_LKUP_RET_VLAN_FAILURE)
+		params->ifindex = in_ifindex;
+	return err;
 }
 #endif
 
 #define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
 			     BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
-			     BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK)
+			     BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK | \
+			     BPF_FIB_LOOKUP_VLAN)
 
 BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
 	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
@@ -6507,12 +6540,12 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
 #if IS_ENABLED(CONFIG_INET)
 	case AF_INET:
 		return bpf_ipv4_fib_lookup(dev_net(ctx->rxq->dev), params,
-					   flags, true);
+					   flags, true, NULL);
 #endif
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
 		return bpf_ipv6_fib_lookup(dev_net(ctx->rxq->dev), params,
-					   flags, true);
+					   flags, true, NULL);
 #endif
 	}
 	return -EAFNOSUPPORT;
@@ -6532,6 +6565,7 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
 {
 	struct net *net = dev_net(skb->dev);
+	struct net_device *fwd_dev = NULL;
 	int rc = -EAFNOSUPPORT;
 	bool check_mtu = false;
 
@@ -6547,29 +6581,26 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 	switch (params->family) {
 #if IS_ENABLED(CONFIG_INET)
 	case AF_INET:
-		rc = bpf_ipv4_fib_lookup(net, params, flags, check_mtu);
+		rc = bpf_ipv4_fib_lookup(net, params, flags, check_mtu,
+					 &fwd_dev);
 		break;
 #endif
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
-		rc = bpf_ipv6_fib_lookup(net, params, flags, check_mtu);
+		rc = bpf_ipv6_fib_lookup(net, params, flags, check_mtu,
+					 &fwd_dev);
 		break;
 #endif
 	}
 
 	if (rc == BPF_FIB_LKUP_RET_SUCCESS && !check_mtu) {
-		struct net_device *dev;
-
-		/* When tot_len isn't provided by user, check skb
-		 * against MTU of FIB lookup resulting net_device
+		/* without tot_len, check the skb against the FIB-result
+		 * device's MTU
 		 */
-		dev = dev_get_by_index_rcu(net, params->ifindex);
-		if (unlikely(!dev))
-			return -ENODEV;
-		if (!is_skb_forwardable(dev, skb))
+		if (!is_skb_forwardable(fwd_dev, skb))
 			rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
 
-		params->mtu_result = dev->mtu; /* union with tot_len */
+		params->mtu_result = fwd_dev->mtu; /* union with tot_len */
 	}
 
 	return rc;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 89b36de5fdbb..8d0058d88eb2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3532,6 +3532,26 @@ union bpf_attr {
  *			Use the mark present in *params*->mark for the fib lookup.
  *			This option should not be used with BPF_FIB_LOOKUP_DIRECT,
  *			as it only has meaning for full lookups.
+ *		**BPF_FIB_LOOKUP_VLAN**
+ *			If the fib lookup resolves to a VLAN device whose
+ *			parent is a real (non-VLAN) device, set
+ *			*params*->h_vlan_proto and *params*->h_vlan_TCI from
+ *			the VLAN device and replace *params*->ifindex with the
+ *			parent's ifindex. *params*->h_vlan_TCI carries the VID
+ *			only, with PCP and DEI bits zero; a consumer wanting to
+ *			set egress priority writes PCP itself. *params*->smac is
+ *			the VLAN device's own address, which can differ from the
+ *			parent's. Only the immediate parent is resolved; if it
+ *			is itself a VLAN device (QinQ) or in another namespace,
+ *			the egress cannot be reduced to a physical device plus
+ *			one tag and the lookup returns
+ *			**BPF_FIB_LKUP_RET_VLAN_FAILURE** with *params*->ifindex
+ *			left at the input. Re-issue without
+ *			**BPF_FIB_LOOKUP_VLAN** to obtain the VLAN device's own
+ *			ifindex. The swap and the vlan fields
+ *			are written only on success; other output fields keep
+ *			the helper's existing behaviour, so a frag-needed result
+ *			still reports the route mtu in *params*->mtu_result.
  *
  *		*ctx* is either **struct xdp_md** for XDP programs or
  *		**struct sk_buff** tc cls_act programs.
@@ -7327,6 +7347,7 @@ enum {
 	BPF_FIB_LOOKUP_TBID    = (1U << 3),
 	BPF_FIB_LOOKUP_SRC     = (1U << 4),
 	BPF_FIB_LOOKUP_MARK    = (1U << 5),
+	BPF_FIB_LOOKUP_VLAN    = (1U << 6),
 };
 
 enum {
@@ -7340,6 +7361,7 @@ enum {
 	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
 	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
 	BPF_FIB_LKUP_RET_NO_SRC_ADDR,  /* failed to derive IP src addr */
+	BPF_FIB_LKUP_RET_VLAN_FAILURE, /* VLAN egress, parent unresolvable */
 };
 
 struct bpf_fib_lookup {
@@ -7393,7 +7415,11 @@ struct bpf_fib_lookup {
 
 	union {
 		struct {
-			/* output */
+			/*
+			 * output with BPF_FIB_LOOKUP_VLAN: set from the
+			 * resolved egress VLAN device (see the flag); zeroed
+			 * on other successful lookups.
+			 */
 			__be16	h_vlan_proto;
 			__be16	h_vlan_TCI;
 		};
-- 
2.54.0



* [PATCH bpf-next v4 2/3] bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT flag to bpf_fib_lookup() helper
From: Avinash Duduskar @ 2026-06-23  2:51 UTC
  To: ast, daniel, andrii
  Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
	hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
	rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
	toke, dsahern
In-Reply-To: <20260623025147.1001664-1-avinash.duduskar@gmail.com>

BPF_FIB_LOOKUP_VLAN resolves a VLAN egress. The reverse is also
useful: an XDP program receiving a VLAN-tagged frame on a physical
device wants the lookup to behave as if the packet had arrived on the
corresponding VLAN subinterface, so iif-based policy routing and VRF
table selection use the right ingress.

Add BPF_FIB_LOOKUP_VLAN_INPUT. When set, params->h_vlan_proto and
params->h_vlan_TCI are read as an input VLAN tag and the matching VLAN
device of params->ifindex is resolved with __vlan_find_dev_deep_rcu().
The device must be up and in the same network namespace as
params->ifindex (a VLAN device can be moved to another netns while
registered on its parent; receive would deliver into that other
namespace, which a lookup here cannot represent). If params->ifindex
is itself a VLAN device, its inner (QinQ) subinterface is matched.
For a bond or team, a tag on a port matches no device and returns
NOT_FWDED; pass the master's ifindex.
The lookup then runs with the resolved device as the ingress;
params->ifindex itself is not modified on the input side. When the
resolved device is enslaved to a VRF, both the full lookup (via the
l3mdev rule) and BPF_FIB_LOOKUP_DIRECT (via l3mdev_fib_table_rcu())
select the VRF's table from the resolved ingress. That follows from
feeding the resolved device to the flow as the ingress
(fl4.flowi4_iif = dev->ifindex), which is what makes l3mdev resolve
the VRF master from the subinterface rather than from
params->ifindex.

The two failure classes get different treatment on purpose. An
h_vlan_proto other than 802.1Q/802.1ad is API misuse and returns
-EINVAL, since it would otherwise reach the WARN in vlan_proto_idx()
with a program-controlled value. An unmatched VID, a device that is
down, or one in another namespace is a data outcome and returns
BPF_FIB_LKUP_RET_NOT_FWDED, matching the DIRECT path when
fib_get_table() finds no table and mirroring real ingress, where the
receive path drops such frames. A VID of 0 (a priority tag) is looked
up literally and normally fails the same way; receive instead
processes such frames untagged, so callers should not set the flag for
priority tags. Proceeding on the physical device for any of these
would be fail-open for the policy-routing cases above.
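Here is a userspace sketch of the resolution and its two failure classes. It assumes a flat table of VLAN subinterfaces in place of __vlan_find_dev_deep_rcu(). struct model_vlan and model_vlan_input() are hypothetical. Because the table is keyed by parent ifindex, passing a VLAN device's ifindex as the parent finds its inner (QinQ) subinterface the same way.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_P_8021Q  0x8100
#define MODEL_ETH_P_8021AD 0x88A8
#define MODEL_VID_MASK     0x0fff

/* Hypothetical VLAN subinterface registered on a parent device. */
struct model_vlan {
	int parent_ifindex;
	uint16_t proto;    /* host order */
	uint16_t vid;
	int up;
	int netns_id;
	int ifindex;
};

enum { MODEL_IN_OK = 0, MODEL_IN_NOT_FWDED = 1 };

/*
 * -EINVAL for a protocol other than 802.1Q/802.1ad (API misuse);
 * MODEL_IN_NOT_FWDED when no up, same-namespace device matches (a
 * data outcome); otherwise MODEL_IN_OK with *ingress set.
 */
static int model_vlan_input(const struct model_vlan *tbl, size_t n,
			    int parent_ifindex, int parent_netns,
			    uint16_t h_vlan_proto, uint16_t h_vlan_TCI,
			    int *ingress)
{
	uint16_t proto = ntohs(h_vlan_proto);
	uint16_t vid = ntohs(h_vlan_TCI) & MODEL_VID_MASK; /* drop PCP/DEI */
	size_t i;

	if (proto != MODEL_ETH_P_8021Q && proto != MODEL_ETH_P_8021AD)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		const struct model_vlan *v = &tbl[i];

		if (v->parent_ifindex != parent_ifindex ||
		    v->proto != proto || v->vid != vid)
			continue;
		if (!v->up || v->netns_id != parent_netns)
			return MODEL_IN_NOT_FWDED;
		*ingress = v->ifindex;
		return MODEL_IN_OK;
	}
	/* Also reached for a VID-0 priority tag, looked up literally. */
	return MODEL_IN_NOT_FWDED;
}
```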

The h_vlan fields share a union with tbid, so the flag cannot be
combined with BPF_FIB_LOOKUP_TBID. It describes ingress, so it also
cannot be combined with BPF_FIB_LOOKUP_OUTPUT. Both combinations
return -EINVAL; restricting now keeps a later relaxation backward
compatible. Combining with BPF_FIB_LOOKUP_VLAN is allowed: the tag is
consumed on the ingress side and the egress tag is written on
success.

Under !CONFIG_VLAN_8021Q the __vlan_find_dev_deep_rcu() stub returns
NULL, so every lookup with the flag returns NOT_FWDED, which is
correct since no VLAN device can exist.

Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
---
 include/uapi/linux/bpf.h       | 21 ++++++++++-
 net/core/filter.c              | 66 +++++++++++++++++++++++++++++++---
 tools/include/uapi/linux/bpf.h | 21 ++++++++++-
 3 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8d0058d88eb2..46a1443534bd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3552,6 +3552,22 @@ union bpf_attr {
  *			are written only on success; other output fields keep
  *			the helper's existing behaviour, so a frag-needed result
  *			still reports the route mtu in *params*->mtu_result.
+ *		**BPF_FIB_LOOKUP_VLAN_INPUT**
+ *			Treat *params*->h_vlan_proto and *params*->h_vlan_TCI
+ *			as an input VLAN tag and run the lookup as if ingress
+ *			had happened on the VLAN subinterface carrying that tag
+ *			on *params*->ifindex. The VID is the low 12 bits of
+ *			*params*->h_vlan_TCI; *params*->h_vlan_proto must be
+ *			ETH_P_8021Q or ETH_P_8021AD in network byte order, else
+ *			**-EINVAL**. If *params*->ifindex is itself a VLAN
+ *			device, its inner (QinQ) subinterface is matched; for a
+ *			bond or team, pass the master's ifindex. An unmatched
+ *			tag, a down device, or one in another namespace returns
+ *			**BPF_FIB_LKUP_RET_NOT_FWDED**, mirroring real ingress.
+ *			A VID of 0 is looked up literally, so do not set this
+ *			flag for priority-tagged frames. Cannot be combined with
+ *			**BPF_FIB_LOOKUP_TBID** or **BPF_FIB_LOOKUP_OUTPUT**
+ *			(returns **-EINVAL**).
  *
  *		*ctx* is either **struct xdp_md** for XDP programs or
  *		**struct sk_buff** tc cls_act programs.
@@ -7348,6 +7364,7 @@ enum {
 	BPF_FIB_LOOKUP_SRC     = (1U << 4),
 	BPF_FIB_LOOKUP_MARK    = (1U << 5),
 	BPF_FIB_LOOKUP_VLAN    = (1U << 6),
+	BPF_FIB_LOOKUP_VLAN_INPUT = (1U << 7),
 };
 
 enum {
@@ -7418,7 +7435,9 @@ struct bpf_fib_lookup {
 			/*
 			 * output with BPF_FIB_LOOKUP_VLAN: set from the
 			 * resolved egress VLAN device (see the flag); zeroed
-			 * on other successful lookups.
+			 * on other successful lookups. input with
+			 * BPF_FIB_LOOKUP_VLAN_INPUT: the VLAN tag to scope
+			 * the lookup by.
 			 */
 			__be16	h_vlan_proto;
 			__be16	h_vlan_TCI;
diff --git a/net/core/filter.c b/net/core/filter.c
index 8345295d84de..fc603cc36ce9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6228,6 +6228,25 @@ static int bpf_fib_set_fwd_params(struct net_device *dev,
 
 	return 0;
 }
+
+static struct net_device *bpf_fib_vlan_input_dev(struct net_device *dev,
+						 const struct bpf_fib_lookup *params)
+{
+	__be16 proto = params->h_vlan_proto;
+	struct net_device *vlan_dev;
+	u16 vid;
+
+	if (proto != htons(ETH_P_8021Q) && proto != htons(ETH_P_8021AD))
+		return ERR_PTR(-EINVAL);
+
+	vid = ntohs(params->h_vlan_TCI) & VLAN_VID_MASK;
+	vlan_dev = __vlan_find_dev_deep_rcu(dev, proto, vid);
+	if (!vlan_dev || !(vlan_dev->flags & IFF_UP) ||
+	    !net_eq(dev_net(vlan_dev), dev_net(dev)))
+		return NULL;
+
+	return vlan_dev;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_INET)
@@ -6249,6 +6268,14 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (unlikely(!dev))
 		return -ENODEV;
 
+	if (flags & BPF_FIB_LOOKUP_VLAN_INPUT) {
+		dev = bpf_fib_vlan_input_dev(dev, params);
+		if (IS_ERR(dev))
+			return PTR_ERR(dev);
+		if (!dev)
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
+	}
+
 	/* verify forwarding is enabled on this interface */
 	in_dev = __in_dev_get_rcu(dev);
 	if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
@@ -6258,7 +6285,11 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		fl4.flowi4_iif = 1;
 		fl4.flowi4_oif = params->ifindex;
 	} else {
-		fl4.flowi4_iif = params->ifindex;
+		/*
+		 * dev->ifindex, not params->ifindex: VLAN_INPUT may have
+		 * resolved dev to a subinterface above.
+		 */
+		fl4.flowi4_iif = dev->ifindex;
 		fl4.flowi4_oif = 0;
 	}
 	fl4.flowi4_dscp = inet_dsfield_to_dscp(params->tos);
@@ -6401,6 +6432,14 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	if (unlikely(!dev))
 		return -ENODEV;
 
+	if (flags & BPF_FIB_LOOKUP_VLAN_INPUT) {
+		dev = bpf_fib_vlan_input_dev(dev, params);
+		if (IS_ERR(dev))
+			return PTR_ERR(dev);
+		if (!dev)
+			return BPF_FIB_LKUP_RET_NOT_FWDED;
+	}
+
 	idev = __in6_dev_get_safely(dev);
 	if (unlikely(!idev || !READ_ONCE(idev->cnf.forwarding)))
 		return BPF_FIB_LKUP_RET_FWD_DISABLED;
@@ -6409,7 +6448,12 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		fl6.flowi6_iif = 1;
 		oif = fl6.flowi6_oif = params->ifindex;
 	} else {
-		oif = fl6.flowi6_iif = params->ifindex;
+		/*
+		 * dev->ifindex, not params->ifindex: VLAN_INPUT may have
+		 * resolved dev to a subinterface above.
+		 */
+		oif = dev->ifindex;
+		fl6.flowi6_iif = oif;
 		fl6.flowi6_oif = 0;
 		strict = RT6_LOOKUP_F_HAS_SADDR;
 	}
@@ -6525,7 +6569,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 #define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
 			     BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
 			     BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK | \
-			     BPF_FIB_LOOKUP_VLAN)
+			     BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_VLAN_INPUT)
+
+static bool bpf_fib_lookup_flags_ok(u32 flags)
+{
+	if (flags & ~BPF_FIB_LOOKUP_MASK)
+		return false;
+
+	if ((flags & BPF_FIB_LOOKUP_VLAN_INPUT) &&
+	    (flags & (BPF_FIB_LOOKUP_TBID | BPF_FIB_LOOKUP_OUTPUT)))
+		return false;
+
+	return true;
+}
 
 BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
 	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
@@ -6533,7 +6589,7 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
 	if (plen < sizeof(*params))
 		return -EINVAL;
 
-	if (flags & ~BPF_FIB_LOOKUP_MASK)
+	if (!bpf_fib_lookup_flags_ok(flags))
 		return -EINVAL;
 
 	switch (params->family) {
@@ -6572,7 +6628,7 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 	if (plen < sizeof(*params))
 		return -EINVAL;
 
-	if (flags & ~BPF_FIB_LOOKUP_MASK)
+	if (!bpf_fib_lookup_flags_ok(flags))
 		return -EINVAL;
 
 	if (params->tot_len)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 8d0058d88eb2..46a1443534bd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3552,6 +3552,22 @@ union bpf_attr {
  *			are written only on success; other output fields keep
  *			the helper's existing behaviour, so a frag-needed result
  *			still reports the route mtu in *params*->mtu_result.
+ *		**BPF_FIB_LOOKUP_VLAN_INPUT**
+ *			Treat *params*->h_vlan_proto and *params*->h_vlan_TCI
+ *			as an input VLAN tag and run the lookup as if ingress
+ *			had happened on the VLAN subinterface carrying that tag
+ *			on *params*->ifindex. The VID is the low 12 bits of
+ *			*params*->h_vlan_TCI; *params*->h_vlan_proto must be
+ *			ETH_P_8021Q or ETH_P_8021AD in network byte order, else
+ *			**-EINVAL**. If *params*->ifindex is itself a VLAN
+ *			device, its inner (QinQ) subinterface is matched; for a
+ *			bond or team, pass the master's ifindex. An unmatched
+ *			tag, a down device, or one in another namespace returns
+ *			**BPF_FIB_LKUP_RET_NOT_FWDED**, mirroring real ingress.
+ *			A VID of 0 is looked up literally, so do not set this
+ *			flag for priority-tagged frames. Cannot be combined with
+ *			**BPF_FIB_LOOKUP_TBID** or **BPF_FIB_LOOKUP_OUTPUT**
+ *			(returns **-EINVAL**).
  *
  *		*ctx* is either **struct xdp_md** for XDP programs or
  *		**struct sk_buff** tc cls_act programs.
@@ -7348,6 +7364,7 @@ enum {
 	BPF_FIB_LOOKUP_SRC     = (1U << 4),
 	BPF_FIB_LOOKUP_MARK    = (1U << 5),
 	BPF_FIB_LOOKUP_VLAN    = (1U << 6),
+	BPF_FIB_LOOKUP_VLAN_INPUT = (1U << 7),
 };
 
 enum {
@@ -7418,7 +7435,9 @@ struct bpf_fib_lookup {
 			/*
 			 * output with BPF_FIB_LOOKUP_VLAN: set from the
 			 * resolved egress VLAN device (see the flag); zeroed
-			 * on other successful lookups.
+			 * on other successful lookups. input with
+			 * BPF_FIB_LOOKUP_VLAN_INPUT: the VLAN tag to scope
+			 * the lookup by.
 			 */
 			__be16	h_vlan_proto;
 			__be16	h_vlan_TCI;
-- 
2.54.0



* [PATCH bpf-next v4 3/3] selftests/bpf: Add bpf_fib_lookup() VLAN flag tests
From: Avinash Duduskar @ 2026-06-23  2:51 UTC
  To: ast, daniel, andrii
  Cc: eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, shuah,
	hawk, yatsenko, leon.hwang, kpsingh, a.s.protopopov, ameryhung,
	rongtao, eyal.birger, bpf, netdev, linux-kernel, linux-kselftest,
	toke, dsahern
In-Reply-To: <20260623025147.1001664-1-avinash.duduskar@gmail.com>

Cover both directions of the new VLAN flags in the fib_lookup test,
38 table cases plus dedicated cross-netns and XDP-redirect subtests.

For BPF_FIB_LOOKUP_VLAN, the egress cases assert that:

- without the flag, the lookup returns the VLAN netdev's ifindex and
  zeroed vlan fields;
- with the flag, it returns the parent's ifindex plus the tag, including
  via a neighbour resolved on the VLAN device, in OUTPUT mode, over a
  bond (the parent is the bond master) and through a DIRECT|TBID table;
- with the flag on a non-VLAN egress, nothing changes;
- for a stacked VLAN (QinQ), it returns BPF_FIB_LKUP_RET_VLAN_FAILURE
  with params->ifindex left at the input, while the same lookup without
  the flag returns the inner VLAN device's ifindex;
- on the skb path without tot_len, mtu_result is the VLAN device's mtu
  with or without the swap, not the parent's;
- a FRAG_NEEDED return reports the route mtu in mtu_result and leaves
  the swap unwritten.
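
As a rough illustration of the egress rules above, the reduction can be
modelled in userspace C. This is a hypothetical sketch, not the kernel
code: struct model_dev, reduce_egress() and the MODEL_RET_* values are
stand-ins invented here, and a network namespace is reduced to an
integer id.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model codes; the kernel uses BPF_FIB_LKUP_RET_*. */
enum model_ret { MODEL_RET_SUCCESS, MODEL_RET_VLAN_FAILURE };

/* Hypothetical device record; a VLAN device points at its parent. */
struct model_dev {
	int ifindex;
	int netns;                      /* namespace id, for the model only */
	bool is_vlan;
	uint16_t vlan_proto;            /* host order, e.g. 0x8100 */
	uint16_t vlan_id;
	const struct model_dev *parent; /* NULL for a physical device */
};

struct model_result {
	int ifindex;
	uint16_t vlan_proto;
	uint16_t vlan_id;
};

/*
 * Models the BPF_FIB_LOOKUP_VLAN swap on the FIB result device: only a
 * single VLAN whose parent is a non-VLAN device in the same namespace
 * is reduced; anything else fails and rewinds ifindex to the input.
 */
static enum model_ret reduce_egress(const struct model_dev *fib_dev,
				    int input_ifindex,
				    struct model_result *res)
{
	const struct model_dev *parent = fib_dev->parent;

	res->vlan_proto = 0;
	res->vlan_id = 0;

	if (!fib_dev->is_vlan) {
		/* non-VLAN egress: the flag changes nothing */
		res->ifindex = fib_dev->ifindex;
		return MODEL_RET_SUCCESS;
	}

	if (!parent || parent->is_vlan || parent->netns != fib_dev->netns) {
		/* QinQ or foreign parent: never publish that ifindex */
		res->ifindex = input_ifindex;
		return MODEL_RET_VLAN_FAILURE;
	}

	/* reducible: physical parent plus the tag */
	res->ifindex = parent->ifindex;
	res->vlan_proto = fib_dev->vlan_proto;
	res->vlan_id = fib_dev->vlan_id;
	return MODEL_RET_SUCCESS;
}
```

For example, a VLAN 100 device on veth1 reduces to veth1's ifindex with
VID 100, while the inner device of a QinQ stack, or a VLAN device whose
parent sits in another namespace, yields MODEL_RET_VLAN_FAILURE with
the result ifindex rewound to the input.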

The VLAN_FAILURE arms are IPv4. bpf_ipv6_fib_lookup() restores
params->ifindex with the same save/restore the IPv4 arms exercise, so an
IPv6 VLAN_FAILURE arm would only re-test shared code.

For BPF_FIB_LOOKUP_VLAN_INPUT, an iif rule on the subinterface routes
the same destination to a different gateway, so the asserted gateway
shows which device the lookup used as ingress: without the flag the
main table answers, and with a matching tag the subinterface's table
does, with or without SKIP_NEIGH. BPF_FIB_LOOKUP_SRC selects the
subinterface's address. A VRF-enslaved subinterface selects the VRF
table through the l3mdev rule and, with DIRECT, through
l3mdev_fib_table_rcu(). One case sets BPF_FIB_LOOKUP_VLAN as well and
asserts that both directions work in a single lookup. Resolution
semantics are pinned: an 802.1ad tag resolves its device, PCP and DEI
bits in h_vlan_TCI are ignored, a VLAN ifindex resolves the inner QinQ
device, and a tag on a bond master resolves while the same tag on the
bond port does not. Because the resolver runs before the forwarding
check, BPF_FIB_LKUP_RET_FWD_DISABLED (rather than NOT_FWDED) on a
device with forwarding off shows that the tag resolved to it.
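
The input-side resolution can be sketched the same way. Again this is a
hypothetical userspace model, not the kernel resolver: struct
model_vlan stands in for a parent's VLAN group, resolve_input() and
MODEL_NOT_FWDED are invented names, and the netns is an integer id. It
rejects an unknown h_vlan_proto, masks h_vlan_TCI to the 12-bit VID so
PCP and DEI are ignored, and fails closed unless an up, same-namespace
device matches. A bond is modelled by giving the master entries and its
port none.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

#define MODEL_ETH_P_8021Q   0x8100
#define MODEL_ETH_P_8021AD  0x88A8
#define MODEL_VLAN_VID_MASK 0x0fff
#define MODEL_NOT_FWDED     (-1000) /* hypothetical "not forwarded" code */

/* Hypothetical entry in a parent device's VLAN group. */
struct model_vlan {
	int parent_ifindex;
	uint16_t proto; /* host order */
	uint16_t vid;
	int ifindex;
	bool up;
	int netns;
};

/*
 * Returns the ingress ifindex to use, -EINVAL for a bad proto, or
 * MODEL_NOT_FWDED when no usable device matches (fail closed).
 * proto_be and tci_be are in network byte order, as in bpf_fib_lookup.
 */
static int resolve_input(const struct model_vlan *tab, size_t n,
			 int ifindex, int netns,
			 uint16_t proto_be, uint16_t tci_be)
{
	uint16_t proto = ntohs(proto_be);
	uint16_t vid = ntohs(tci_be) & MODEL_VLAN_VID_MASK; /* drop PCP/DEI */
	size_t i;

	if (proto != MODEL_ETH_P_8021Q && proto != MODEL_ETH_P_8021AD)
		return -EINVAL;

	/* VID 0 is looked up literally, like any other VID */
	for (i = 0; i < n; i++) {
		const struct model_vlan *v = &tab[i];

		if (v->parent_ifindex != ifindex || v->proto != proto ||
		    v->vid != vid)
			continue;
		/* a down or foreign device drops, as on real ingress */
		if (!v->up || v->netns != netns)
			return MODEL_NOT_FWDED;
		return v->ifindex;
	}
	return MODEL_NOT_FWDED;
}
```

With VID 100 configured on ifindex 2, a TCI of 0xf064 (PCP 7, DEI 1,
VID 100) still resolves to the VID 100 device, while VID 0 or an
unconfigured VID returns MODEL_NOT_FWDED.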

The error cases assert -EINVAL for an invalid h_vlan_proto on both
address families, for the TBID and OUTPUT flag combinations and for
an unknown flag bit. They assert BPF_FIB_LKUP_RET_NOT_FWDED for a VID
with no configured device on both families, for a VID-0 priority tag
and for a device that exists but is down. The invalid-proto and
NOT_FWDED cases also assert that params is left untouched. By
contrast, a no-neighbour case whose input and egress devices differ
asserts that NO_NEIGH reports the egress ifindex, not the input: only
VLAN_FAILURE rewinds params->ifindex to the input.
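
The flag rules these error arms pin amount to a small check. The
sketch below is hypothetical userspace C, not the kernel's code;
check_flags() is an invented name, and the MODEL_F_* macros mirror the
uapi BPF_FIB_LOOKUP_* bit values (DIRECT through MARK as upstream, VLAN
and VLAN_INPUT as added by this series).

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors of the BPF_FIB_LOOKUP_* bits, for this model only. */
#define MODEL_F_DIRECT     (1U << 0)
#define MODEL_F_OUTPUT     (1U << 1)
#define MODEL_F_SKIP_NEIGH (1U << 2)
#define MODEL_F_TBID       (1U << 3)
#define MODEL_F_SRC        (1U << 4)
#define MODEL_F_MARK       (1U << 5)
#define MODEL_F_VLAN       (1U << 6)
#define MODEL_F_VLAN_INPUT (1U << 7)

#define MODEL_F_KNOWN (MODEL_F_DIRECT | MODEL_F_OUTPUT | \
		       MODEL_F_SKIP_NEIGH | MODEL_F_TBID | MODEL_F_SRC | \
		       MODEL_F_MARK | MODEL_F_VLAN | MODEL_F_VLAN_INPUT)

static int check_flags(uint32_t flags)
{
	/* any bit outside the known set is rejected */
	if (flags & ~MODEL_F_KNOWN)
		return -EINVAL;
	/* the input tag shares storage with tbid and implies ingress */
	if ((flags & MODEL_F_VLAN_INPUT) &&
	    (flags & (MODEL_F_TBID | MODEL_F_OUTPUT)))
		return -EINVAL;
	return 0;
}
```

The egress flag stays compatible with both OUTPUT and DIRECT|TBID, as
the OUTPUT-mode and TBID-table arms above expect.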

A separate subtest moves a VLAN device into a second netns while it
stays registered on its parent, and checks that both directions refuse
to cross the boundary: the input flag fails closed with the tag and
ifindex untouched, and the egress flag returns
BPF_FIB_LKUP_RET_VLAN_FAILURE without publishing the foreign parent's
ifindex.

The tbid read-back check is skipped for DIRECT cases that set
BPF_FIB_LOOKUP_VLAN, since a successful swap packs the vlan fields
into the union the check reads.
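
Why the read-back check cannot run with BPF_FIB_LOOKUP_VLAN is easiest
to see on the union itself. The sketch below uses a hypothetical,
cut-down mirror of that part of struct bpf_fib_lookup, in which the two
__be16 vlan fields overlay the __u32 tbid; union model_tail and
tbid_after_swap() are invented names.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical mirror: the vlan tag shares four bytes with tbid. */
union model_tail {
	struct {
		uint16_t h_vlan_proto; /* network byte order */
		uint16_t h_vlan_TCI;   /* network byte order */
	};
	uint32_t tbid;
};

_Static_assert(sizeof(union model_tail) == sizeof(uint32_t),
	       "the vlan fields overlay tbid exactly");

/* Returns what a "tbid == 0" read-back would see after a swap. */
static uint32_t tbid_after_swap(uint16_t proto, uint16_t vid)
{
	union model_tail t = { .tbid = 0 };

	t.h_vlan_proto = htons(proto);
	t.h_vlan_TCI = htons(vid);
	return t.tbid; /* nonzero whenever the tag is nonzero */
}
```

tbid_after_swap(0x8100, 100) is nonzero on either byte order, so a
DIRECT arm that sets the flag would fail the zero check even though the
lookup succeeded.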

Re-run the cases through bpf_xdp_fib_lookup() as well: the egress flag
exists because VLAN devices have no ndo_xdp_xmit, so XDP is its primary
consumer. bpf_prog_test_run uses the netns' loopback as the xdp
context's device, so the lookup runs against the test netns' FIB, and
the path-independent results (return code, swapped ifindex, vlan tag,
gateway) are asserted to match the skb path. The skb-specific
mtu_result without tot_len is not rechecked.

A live-frames subtest (test_fib_lookup_vlan_redirect) uses
BPF_F_TEST_XDP_LIVE_FRAMES to drive real frames through the native
xdp_do_redirect() plus xdp_do_flush() path. A reducible VLAN egress is
redirected to the physical parent and delivered to its peer; a QinQ
egress returns VLAN_FAILURE and is passed to the stack, since
redirecting to the VLAN device would drop the frame at xdp_do_flush()
(no ndo_xdp_xmit). The redirect program only distinguishes SUCCESS
from any other return; the table and netns arms pin the exact
VLAN_FAILURE value.

Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
---
 .../selftests/bpf/prog_tests/fib_lookup.c     | 696 +++++++++++++++++-
 .../testing/selftests/bpf/progs/fib_lookup.c  |  36 +
 2 files changed, 728 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/fib_lookup.c b/tools/testing/selftests/bpf/prog_tests/fib_lookup.c
index bd7658958004..d51bc3332e56 100644
--- a/tools/testing/selftests/bpf/prog_tests/fib_lookup.c
+++ b/tools/testing/selftests/bpf/prog_tests/fib_lookup.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
 
 #include <linux/rtnetlink.h>
+#include <linux/if_ether.h>
 #include <sys/types.h>
 #include <net/if.h>
 
@@ -23,6 +24,7 @@
 #define IPV4_TBID_ADDR		"172.0.0.254"
 #define IPV4_TBID_NET		"172.0.0.0"
 #define IPV4_TBID_DST		"172.0.0.2"
+#define IPV4_TBID_NONEIGH_DST	"172.0.0.5"
 #define IPV6_TBID_ADDR		"fd00::FFFF"
 #define IPV6_TBID_NET		"fd00::"
 #define IPV6_TBID_DST		"fd00::2"
@@ -37,6 +39,41 @@
 #define IPV6_LOCAL		"fd01::3"
 #define IPV6_GW1		"fd01::1"
 #define IPV6_GW2		"fd01::2"
+#define VLAN_ID			100
+#define VLAN_IFACE		"veth1.100"
+#define VLAN_ID_DOWN		102
+#define VLAN_IFACE_DOWN		"veth1.102"
+#define QINQ_OUTER_IFACE	"veth1.200"
+#define QINQ_INNER_IFACE	"veth1.200.300"
+#define VLAN_TABLE		"300"
+#define IPV4_VLAN_IFACE_ADDR	"10.5.0.254"
+#define IPV4_VLAN_EGRESS_DST	"10.5.0.2"
+#define IPV4_QINQ_DST		"10.7.0.2"
+#define IPV4_VLAN_DST		"10.6.0.2"
+#define IPV4_VLAN_GW		"10.5.0.1"
+#define IPV6_VLAN_IFACE_ADDR	"fd02::254"
+#define IPV6_VLAN_EGRESS_DST	"fd02::2"
+#define IPV6_VLAN_DST		"fd03::2"
+#define IPV6_VLAN_GW		"fd02::1"
+#define VLAN_VID_UNUSED		999
+#define VRF_IFACE		"vrf-blue"
+#define VRF_TABLE		"1000"
+#define VRF_VLAN_ID		101
+#define VRF_VLAN_IFACE		"veth1.101"
+#define IPV4_VRF_IFACE_ADDR	"10.8.0.254"
+#define IPV4_VRF_GW		"10.8.0.1"
+#define IPV4_VRF_DST		"10.9.0.2"
+#define TBID_VLAN_ID		50
+#define TBID_VLAN_IFACE		"veth2.50"
+#define IPV4_TBID_VLAN_DST	"172.2.0.2"
+#define IPV4_BOND_VLAN_DST	"10.11.0.2"
+#define IPV4_VLAN_MTU_DST	"10.5.9.2"
+#define QINQ_AD_VLAN_ID		200
+#define QINQ_INNER_VLAN_ID	300
+#define BOND_IFACE		"bond99"
+#define BOND_PORT		"veth3"
+#define BOND_PORT_PEER		"veth4"
+#define BOND_VLAN_ID		500
 #define DMAC			"11:11:11:11:11:11"
 #define DMAC_INIT { 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, }
 #define DMAC2			"01:01:01:01:01:01"
@@ -52,6 +89,17 @@ struct fib_lookup_test {
 	__u32 tbid;
 	__u8 dmac[6];
 	__u32 mark;
+	/*
+	 * input tag with BPF_FIB_LOOKUP_VLAN_INPUT; expected output tag
+	 * with BPF_FIB_LOOKUP_VLAN (checked when check_vlan is set)
+	 */
+	__u16 vlan_proto;
+	__u16 vlan_id;
+	bool check_vlan;
+	const char *expected_dev; /* expected params->ifindex after lookup */
+	const char *iif;	  /* override the default veth1 input device */
+	__u16 tot_len;		  /* triggers the in-lookup mtu check when set */
+	__u16 expected_mtu;	  /* expected mtu_result (union with tot_len) */
 };
 
 static const struct fib_lookup_test tests[] = {
@@ -79,6 +127,17 @@ static const struct fib_lookup_test tests[] = {
 	  .daddr = IPV4_TBID_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
 	  .lookup_flags = BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID, .tbid = 100,
 	  .dmac = DMAC_INIT2, },
+	/*
+	 * An error that returns after the egress device is resolved must
+	 * report the egress ifindex, not the input. This routes from input
+	 * veth1 via veth2 (table 100) to a dst with no neighbour, so
+	 * input != egress, pinning NO_NEIGH to the egress device.
+	 */
+	{ .desc = "IPv4 NO_NEIGH reports the egress ifindex, not the input",
+	  .daddr = IPV4_TBID_NONEIGH_DST,
+	  .expected_ret = BPF_FIB_LKUP_RET_NO_NEIGH,
+	  .lookup_flags = BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID, .tbid = 100,
+	  .expected_dev = "veth2", },
 	{ .desc = "IPv6 TBID lookup failure",
 	  .daddr = IPV6_TBID_DST, .expected_ret = BPF_FIB_LKUP_RET_NOT_FWDED,
 	  .lookup_flags = BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID,
@@ -142,6 +201,223 @@ static const struct fib_lookup_test tests[] = {
 	  .expected_dst = IPV6_GW1,
 	  .lookup_flags = BPF_FIB_LOOKUP_SKIP_NEIGH,
 	  .mark = MARK, },
+	/* vlan egress resolution */
+	/*
+	 * Invariant the VLAN-egress arms jointly enforce: a
+	 * BPF_FIB_LOOKUP_VLAN SUCCESS always carries a physical,
+	 * xmit-capable ifindex -- no SUCCESS ever returns a VLAN-device
+	 * ifindex. Reducible arms pin ifindex == the physical parent; the
+	 * QinQ and foreign-netns arms pin VLAN_FAILURE with params->ifindex
+	 * left at the input, so a regression to best-effort (SUCCESS + the
+	 * VLAN ifindex) fails one.
+	 */
+	{ .desc = "IPv4 VLAN egress, no flag",
+	  .daddr = IPV4_VLAN_EGRESS_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = VLAN_IFACE, .check_vlan = true, },
+	{ .desc = "IPv4 VLAN egress, single VLAN",
+	  .daddr = IPV4_VLAN_EGRESS_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	/*
+	 * skb path without tot_len: mtu_result is the FIB result (VLAN)
+	 * device's mtu (1400) with or without the swap, not the parent's (1500)
+	 */
+	{ .desc = "IPv4 VLAN egress, skb-path mtu is the VLAN device's without the flag",
+	  .daddr = IPV4_VLAN_EGRESS_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = VLAN_IFACE, .check_vlan = true, .expected_mtu = 1400, },
+	{ .desc = "IPv4 VLAN egress, skb-path mtu stays the VLAN device's after the swap",
+	  .daddr = IPV4_VLAN_EGRESS_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, .expected_mtu = 1400, },
+	{ .desc = "IPv4 VLAN egress, flag set but egress is not a VLAN",
+	  .daddr = IPV4_NUD_FAILED_ADDR, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true, },
+	{ .desc = "IPv4 VLAN egress, QinQ not reducible (VLAN_FAILURE)",
+	  .daddr = IPV4_QINQ_DST,
+	  .expected_ret = BPF_FIB_LKUP_RET_VLAN_FAILURE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true, },
+	{ .desc = "IPv4 QinQ egress without the flag (escape hatch)",
+	  .daddr = IPV4_QINQ_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = QINQ_INNER_IFACE, },
+	{ .desc = "IPv6 VLAN egress, single VLAN",
+	  .daddr = IPV6_VLAN_EGRESS_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN egress, neighbour on the VLAN device",
+	  .daddr = IPV4_VLAN_EGRESS_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, .dmac = DMAC_INIT, },
+	{ .desc = "IPv4 VLAN egress in OUTPUT mode",
+	  .daddr = IPV4_VLAN_EGRESS_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .iif = VLAN_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_OUTPUT | BPF_FIB_LOOKUP_VLAN |
+			  BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN egress over a bond",
+	  .daddr = IPV4_BOND_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = BOND_IFACE, .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = BOND_VLAN_ID, },
+	{ .desc = "IPv4 VLAN egress via TBID table",
+	  .daddr = IPV4_TBID_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .lookup_flags = BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID |
+			  BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .tbid = 100,
+	  .expected_dev = "veth2", .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = TBID_VLAN_ID, },
+	{ .desc = "IPv4 VLAN egress, success writes mtu_result with the swap",
+	  .daddr = IPV4_VLAN_MTU_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .tot_len = 500, .expected_mtu = 1000,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN egress, FRAG_NEEDED reports mtu, swap unwritten",
+	  .daddr = IPV4_VLAN_MTU_DST, .expected_ret = BPF_FIB_LKUP_RET_FRAG_NEEDED,
+	  .tot_len = 1400, .expected_mtu = 1000,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .expected_dev = "veth1", .check_vlan = true, },
+	/* vlan tag as lookup input */
+	{ .desc = "IPv4 VLAN input, no flag",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_GW1,
+	  .lookup_flags = BPF_FIB_LOOKUP_SKIP_NEIGH, },
+	{ .desc = "IPv4 VLAN input, tag selects subinterface route",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_VLAN_GW, .expected_dev = VLAN_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv6 VLAN input, tag selects subinterface route",
+	  .daddr = IPV6_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV6_VLAN_GW, .expected_dev = VLAN_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN input and egress combined",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_VLAN_GW, .expected_dev = "veth1",
+	  .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_VLAN |
+			  BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN input, neighbour resolved on the route",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_VLAN_GW, .expected_dev = VLAN_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, .dmac = DMAC_INIT2, },
+	{ .desc = "IPv4 VLAN input, source address from the subinterface",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_src = IPV4_VLAN_IFACE_ADDR,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SRC |
+			  BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	/*
+	 * VRF: the resolved subinterface is enslaved, so the l3mdev rule
+	 * (full lookup) and l3mdev_fib_table_rcu() (DIRECT) must select
+	 * the VRF table from the resolved ingress
+	 */
+	{ .desc = "IPv4 VLAN input, VRF subinterface, no flag",
+	  .daddr = IPV4_VRF_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_GW1,
+	  .lookup_flags = BPF_FIB_LOOKUP_SKIP_NEIGH, },
+	{ .desc = "IPv4 VLAN input, tag selects VRF table",
+	  .daddr = IPV4_VRF_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_VRF_GW, .expected_dev = VRF_VLAN_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VRF_VLAN_ID, },
+	{ .desc = "IPv4 VLAN input, DIRECT uses VRF table from resolved ingress",
+	  .daddr = IPV4_VRF_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_VRF_GW, .expected_dev = VRF_VLAN_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_DIRECT |
+			  BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VRF_VLAN_ID, },
+	/*
+	 * failure arms also assert params is left untouched: ifindex still
+	 * names the physical device and the input tag bytes survive
+	 */
+	{ .desc = "IPv4 VLAN input, invalid proto",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = -EINVAL,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = 0x1234, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN input, unmatched VID",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_NOT_FWDED,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_VID_UNUSED, },
+	{ .desc = "IPv4 VLAN input, subinterface down",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_NOT_FWDED,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID_DOWN, },
+	/*
+	 * the resolver runs before the forwarding check, so on devices
+	 * with forwarding off FWD_DISABLED (not NOT_FWDED) proves the tag
+	 * resolved to that device and the lookup used it as ingress
+	 */
+	{ .desc = "IPv4 VLAN input, 802.1ad tag",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_FWD_DISABLED,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021AD, .vlan_id = QINQ_AD_VLAN_ID, },
+	{ .desc = "IPv4 VLAN input, PCP and DEI bits ignored in TCI",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_SUCCESS,
+	  .expected_dst = IPV4_VLAN_GW,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = 0xf000 | VLAN_ID, },
+	{ .desc = "IPv4 VLAN input, inner QinQ device from VLAN ifindex",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_FWD_DISABLED,
+	  .iif = QINQ_OUTER_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = QINQ_INNER_VLAN_ID, },
+	/*
+	 * bonding: the VLANs live on the master, as on receive, where the
+	 * frame is steered to the master before VLAN processing; a port
+	 * ifindex does not match (ports carry vid state but no VLAN devs)
+	 */
+	{ .desc = "IPv4 VLAN input, tag on bond master resolves",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_FWD_DISABLED,
+	  .iif = BOND_IFACE,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = BOND_VLAN_ID, },
+	{ .desc = "IPv4 VLAN input, tag on bond port does not match",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_NOT_FWDED,
+	  .iif = BOND_PORT, .expected_dev = BOND_PORT, .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = BOND_VLAN_ID, },
+	{ .desc = "IPv6 VLAN input, invalid proto",
+	  .daddr = IPV6_VLAN_DST, .expected_ret = -EINVAL,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = 0x1234, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN input, VID 0 priority tag fails closed",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_NOT_FWDED,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = 0, },
+	{ .desc = "IPv6 VLAN input, unmatched VID",
+	  .daddr = IPV6_VLAN_DST, .expected_ret = BPF_FIB_LKUP_RET_NOT_FWDED,
+	  .expected_dev = "veth1", .check_vlan = true,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_SKIP_NEIGH,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_VID_UNUSED, },
+	{ .desc = "unknown flag bit rejected",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = -EINVAL,
+	  .lookup_flags = (1 << 14) | BPF_FIB_LOOKUP_SKIP_NEIGH, },
+	{ .desc = "IPv4 VLAN input rejected with TBID",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = -EINVAL,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_TBID,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
+	{ .desc = "IPv4 VLAN input rejected with OUTPUT",
+	  .daddr = IPV4_VLAN_DST, .expected_ret = -EINVAL,
+	  .lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT | BPF_FIB_LOOKUP_OUTPUT,
+	  .vlan_proto = ETH_P_8021Q, .vlan_id = VLAN_ID, },
 };
 
 static int setup_netns(void)
@@ -204,6 +480,110 @@ static int setup_netns(void)
 	SYS(fail, "ip rule add prio 2 fwmark %d lookup %s", MARK, MARK_TABLE);
 	SYS(fail, "ip -6 rule add prio 2 fwmark %d lookup %s", MARK, MARK_TABLE);
 
+	/*
+	 * Setup for vlan tests: a subinterface for egress resolution and
+	 * tag-as-input, a QinQ stack, and an iif rule so the input tests
+	 * observe which device the lookup used as ingress.
+	 */
+	SYS(fail, "ip link add link veth1 name %s type vlan id %d",
+	    VLAN_IFACE, VLAN_ID);
+	SYS(fail, "ip link set dev %s up", VLAN_IFACE);
+	/*
+	 * lower than the veth1 parent (1500): the skb-path mtu check uses the
+	 * FIB result (VLAN) device, so mtu_result is this value with or
+	 * without the egress swap, which two arms below pin
+	 */
+	SYS(fail, "ip link set dev %s mtu 1400", VLAN_IFACE);
+	SYS(fail, "ip addr add %s/24 dev %s", IPV4_VLAN_IFACE_ADDR, VLAN_IFACE);
+	SYS(fail, "ip addr add %s/64 dev %s nodad", IPV6_VLAN_IFACE_ADDR, VLAN_IFACE);
+
+	/*
+	 * stays down: the input flag must treat its tag the way real
+	 * ingress treats a frame arriving on a down VLAN device (drop)
+	 */
+	SYS(fail, "ip link add link veth1 name %s type vlan id %d",
+	    VLAN_IFACE_DOWN, VLAN_ID_DOWN);
+
+	err = write_sysctl("/proc/sys/net/ipv4/conf/" VLAN_IFACE "/forwarding", "1");
+	if (!ASSERT_OK(err, "write_sysctl(net.ipv4.conf." VLAN_IFACE ".forwarding)"))
+		goto fail;
+
+	err = write_sysctl("/proc/sys/net/ipv6/conf/" VLAN_IFACE "/forwarding", "1");
+	if (!ASSERT_OK(err, "write_sysctl(net.ipv6.conf." VLAN_IFACE ".forwarding)"))
+		goto fail;
+
+	SYS(fail, "ip link add link veth1 name %s type vlan proto 802.1ad id 200",
+	    QINQ_OUTER_IFACE);
+	SYS(fail, "ip link add link %s name %s type vlan id 300",
+	    QINQ_OUTER_IFACE, QINQ_INNER_IFACE);
+	SYS(fail, "ip link set dev %s up", QINQ_OUTER_IFACE);
+	SYS(fail, "ip link set dev %s up", QINQ_INNER_IFACE);
+	SYS(fail, "ip route add %s/32 dev %s", IPV4_QINQ_DST, QINQ_INNER_IFACE);
+
+	SYS(fail, "ip route add %s/32 via %s", IPV4_VLAN_DST, IPV4_GW1);
+	SYS(fail, "ip route add table %s %s/32 via %s",
+	    VLAN_TABLE, IPV4_VLAN_DST, IPV4_VLAN_GW);
+	SYS(fail, "ip rule add prio 3 iif %s lookup %s", VLAN_IFACE, VLAN_TABLE);
+	SYS(fail, "ip -6 route add %s/128 via %s", IPV6_VLAN_DST, IPV6_GW1);
+	SYS(fail, "ip -6 route add table %s %s/128 via %s",
+	    VLAN_TABLE, IPV6_VLAN_DST, IPV6_VLAN_GW);
+	SYS(fail, "ip -6 rule add prio 3 iif %s lookup %s", VLAN_IFACE, VLAN_TABLE);
+
+	/*
+	 * a bond with one port and a VLAN on the bond: VLANs on a bond
+	 * live on the master, so resolution succeeds for the master's
+	 * ifindex and fails closed for a port's, matching receive, which
+	 * steers the frame to the master before VLAN processing
+	 */
+	SYS(fail, "ip link add %s type bond", BOND_IFACE);
+	SYS(fail, "ip link add %s type veth peer name %s", BOND_PORT, BOND_PORT_PEER);
+	SYS(fail, "ip link set %s master %s", BOND_PORT, BOND_IFACE);
+	SYS(fail, "ip link set dev %s up", BOND_IFACE);
+	SYS(fail, "ip link set dev %s up", BOND_PORT);
+	SYS(fail, "ip link add link %s name %s.%d type vlan id %d",
+	    BOND_IFACE, BOND_IFACE, BOND_VLAN_ID, BOND_VLAN_ID);
+	SYS(fail, "ip link set dev %s.%d up", BOND_IFACE, BOND_VLAN_ID);
+	SYS(fail, "ip route add %s/32 dev %s.%d",
+	    IPV4_BOND_VLAN_DST, BOND_IFACE, BOND_VLAN_ID);
+
+	/*
+	 * a VRF with its own dedicated subinterface (the iif rules above
+	 * must not see it), for the table-selection-by-ingress cases
+	 */
+	SYS(fail, "ip link add %s type vrf table %s", VRF_IFACE, VRF_TABLE);
+	SYS(fail, "ip link set dev %s up", VRF_IFACE);
+	SYS(fail, "ip link add link veth1 name %s type vlan id %d",
+	    VRF_VLAN_IFACE, VRF_VLAN_ID);
+	SYS(fail, "ip link set %s master %s", VRF_VLAN_IFACE, VRF_IFACE);
+	SYS(fail, "ip link set dev %s up", VRF_VLAN_IFACE);
+	SYS(fail, "ip addr add %s/24 dev %s", IPV4_VRF_IFACE_ADDR, VRF_VLAN_IFACE);
+	err = write_sysctl("/proc/sys/net/ipv4/conf/" VRF_VLAN_IFACE "/forwarding", "1");
+	if (!ASSERT_OK(err, "write_sysctl(net.ipv4.conf." VRF_VLAN_IFACE ".forwarding)"))
+		goto fail;
+	SYS(fail, "ip route add %s/32 via %s", IPV4_VRF_DST, IPV4_GW1);
+	SYS(fail, "ip route add table %s %s/32 via %s",
+	    VRF_TABLE, IPV4_VRF_DST, IPV4_VRF_GW);
+
+	/* neighbours on the VLAN subinterface for the non-SKIP_NEIGH cases */
+	err = write_sysctl("/proc/sys/net/ipv4/neigh/" VLAN_IFACE "/gc_stale_time", "900");
+	if (!ASSERT_OK(err, "write_sysctl(net.ipv4.neigh." VLAN_IFACE ".gc_stale_time)"))
+		goto fail;
+	SYS(fail, "ip neigh add %s dev %s lladdr %s nud stale",
+	    IPV4_VLAN_EGRESS_DST, VLAN_IFACE, DMAC);
+	SYS(fail, "ip neigh add %s dev %s lladdr %s nud stale",
+	    IPV4_VLAN_GW, VLAN_IFACE, DMAC2);
+
+	/* a VLAN on veth2 with a route in the tbid test table */
+	SYS(fail, "ip link add link veth2 name %s type vlan id %d",
+	    TBID_VLAN_IFACE, TBID_VLAN_ID);
+	SYS(fail, "ip link set dev %s up", TBID_VLAN_IFACE);
+	SYS(fail, "ip route add table 100 %s/32 dev %s",
+	    IPV4_TBID_VLAN_DST, TBID_VLAN_IFACE);
+
+	/* a locked-mtu route via the subinterface for the FRAG_NEEDED case */
+	SYS(fail, "ip route add %s/32 dev %s mtu lock 1000",
+	    IPV4_VLAN_MTU_DST, VLAN_IFACE);
+
 	return 0;
 fail:
 	return -1;
@@ -218,9 +598,16 @@ static int set_lookup_params(struct bpf_fib_lookup *params,
 	memset(params, 0, sizeof(*params));
 
 	params->l4_protocol = IPPROTO_TCP;
-	params->ifindex = ifindex;
+	params->ifindex = test->iif ? if_nametoindex(test->iif) : ifindex;
 	params->tbid = test->tbid;
 	params->mark = test->mark;
+	params->tot_len = test->tot_len;
+
+	/* h_vlan_proto/h_vlan_TCI union with tbid */
+	if (test->lookup_flags & BPF_FIB_LOOKUP_VLAN_INPUT) {
+		params->h_vlan_proto = htons(test->vlan_proto);
+		params->h_vlan_TCI = htons(test->vlan_id);
+	}
 
 	if (inet_pton(AF_INET6, test->daddr, params->ipv6_dst) == 1) {
 		params->family = AF_INET6;
@@ -298,7 +685,7 @@ void test_fib_lookup(void)
 	struct nstoken *nstoken = NULL;
 	struct __sk_buff skb = { };
 	struct fib_lookup *skel;
-	int prog_fd, err, ret, i;
+	int prog_fd, xdp_fd, err, ret, i;
 
 	/* The test does not use the skb->data, so
 	 * use pkt_v6 for both v6 and v4 test.
@@ -309,11 +696,16 @@ void test_fib_lookup(void)
 		    .ctx_in = &skb,
 		    .ctx_size_in = sizeof(skb),
 	);
+	LIBBPF_OPTS(bpf_test_run_opts, xdp_opts,
+		    .data_in = &pkt_v6,
+		    .data_size_in = sizeof(pkt_v6),
+	);
 
 	skel = fib_lookup__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "skel open_and_load"))
 		return;
 	prog_fd = bpf_program__fd(skel->progs.fib_lookup);
+	xdp_fd = bpf_program__fd(skel->progs.fib_lookup_xdp);
 
 	SYS(fail, "ip netns add %s", NS_TEST);
 
@@ -352,6 +744,21 @@ void test_fib_lookup(void)
 		if (tests[i].expected_dst)
 			assert_dst_ip(fib_params, tests[i].expected_dst);
 
+		if (tests[i].expected_dev)
+			ASSERT_EQ(fib_params->ifindex,
+				  if_nametoindex(tests[i].expected_dev), "ifindex");
+
+		if (tests[i].expected_mtu)
+			ASSERT_EQ(fib_params->mtu_result, tests[i].expected_mtu,
+				  "mtu_result");
+
+		if (tests[i].check_vlan) {
+			ASSERT_EQ(fib_params->h_vlan_proto,
+				  htons(tests[i].vlan_proto), "h_vlan_proto");
+			ASSERT_EQ(fib_params->h_vlan_TCI,
+				  htons(tests[i].vlan_id), "h_vlan_TCI");
+		}
+
 		ret = memcmp(tests[i].dmac, fib_params->dmac, sizeof(tests[i].dmac));
 		if (!ASSERT_EQ(ret, 0, "dmac not match")) {
 			char expected[18], actual[18];
@@ -361,15 +768,296 @@ void test_fib_lookup(void)
 			printf("dmac expected %s actual %s ", expected, actual);
 		}
 
-		// ensure tbid is zero'd out after fib lookup.
-		if (tests[i].lookup_flags & BPF_FIB_LOOKUP_DIRECT) {
+		/*
+		 * ensure tbid is zeroed out after fib lookup. With
+		 * BPF_FIB_LOOKUP_VLAN the union holds the packed vlan
+		 * fields instead, so skip the check for those.
+		 */
+		if ((tests[i].lookup_flags & BPF_FIB_LOOKUP_DIRECT) &&
+		    !(tests[i].lookup_flags & BPF_FIB_LOOKUP_VLAN)) {
 			if (!ASSERT_EQ(skel->bss->fib_params.tbid, 0,
 					"expected fib_params.tbid to be zero"))
 				goto fail;
 		}
 	}
 
+	/*
+	 * Re-run the cases through bpf_xdp_fib_lookup(). test_run uses the
+	 * current netns' loopback for ctx->rxq->dev, so dev_net() is NS_TEST
+	 * and the lookup runs against its FIB. The path-independent results
+	 * (return code, swapped ifindex, vlan tag, gateway) must match the skb
+	 * path; the no-tot_len mtu_result is skb-specific and not rechecked.
+	 */
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		if (set_lookup_params(fib_params, &tests[i], skb.ifindex))
+			continue;
+
+		skel->bss->fib_lookup_ret = -1;
+		skel->bss->lookup_flags = tests[i].lookup_flags;
+
+		err = bpf_prog_test_run_opts(xdp_fd, &xdp_opts);
+		if (!ASSERT_OK(err, "xdp test_run"))
+			continue;
+
+		if (!ASSERT_EQ(skel->bss->fib_lookup_ret, tests[i].expected_ret,
+			       "xdp fib_lookup_ret"))
+			printf("(xdp) %s\n", tests[i].desc);
+
+		if (tests[i].expected_dev)
+			ASSERT_EQ(fib_params->ifindex,
+				  if_nametoindex(tests[i].expected_dev),
+				  "xdp ifindex");
+
+		if (tests[i].expected_dst)
+			assert_dst_ip(fib_params, tests[i].expected_dst);
+
+		if (tests[i].check_vlan) {
+			ASSERT_EQ(fib_params->h_vlan_proto,
+				  htons(tests[i].vlan_proto), "xdp h_vlan_proto");
+			ASSERT_EQ(fib_params->h_vlan_TCI,
+				  htons(tests[i].vlan_id), "xdp h_vlan_TCI");
+		}
+	}
+
+fail:
+	if (nstoken)
+		close_netns(nstoken);
+	SYS_NOFAIL("ip netns del " NS_TEST);
+	fib_lookup__destroy(skel);
+}
+
+#define NS_VLAN_A	"fib_lookup_vlan_ns_a"
+#define NS_VLAN_B	"fib_lookup_vlan_ns_b"
+
+/*
+ * A VLAN device can be moved to another netns while staying registered
+ * on its parent. Neither direction may then cross the boundary: the
+ * egress flag must not publish the foreign parent's ifindex, and the
+ * input flag must fail closed rather than use a foreign ingress.
+ */
+void test_fib_lookup_vlan_netns(void)
+{
+	struct bpf_fib_lookup *fib_params;
+	struct nstoken *nstoken = NULL;
+	struct __sk_buff skb = { };
+	struct fib_lookup *skel = NULL;
+	int prog_fd, err, parent_idx, vlan_idx;
+
+	LIBBPF_OPTS(bpf_test_run_opts, run_opts,
+		    .data_in = &pkt_v6,
+		    .data_size_in = sizeof(pkt_v6),
+		    .ctx_in = &skb,
+		    .ctx_size_in = sizeof(skb),
+	);
+
+	skel = fib_lookup__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel open_and_load"))
+		return;
+	prog_fd = bpf_program__fd(skel->progs.fib_lookup);
+	fib_params = &skel->bss->fib_params;
+
+	SYS(fail, "ip netns add %s", NS_VLAN_A);
+	SYS(fail, "ip netns add %s", NS_VLAN_B);
+
+	nstoken = open_netns(NS_VLAN_A);
+	if (!ASSERT_OK_PTR(nstoken, "open_netns(a)"))
+		goto fail;
+
+	SYS(fail, "ip link add veth7 type veth peer name veth8");
+	SYS(fail, "ip link set dev veth7 up");
+	SYS(fail, "ip link add link veth7 name veth7.66 type vlan id 66");
+	SYS(fail, "ip link set veth7.66 netns %s", NS_VLAN_B);
+
+	parent_idx = if_nametoindex("veth7");
+	if (!ASSERT_NEQ(parent_idx, 0, "if_nametoindex(veth7)"))
+		goto fail;
+
+	/*
+	 * input: the moved device is still in veth7's VLAN group, but it
+	 * lives in another netns, so the lookup must fail closed
+	 */
+	skb.ifindex = parent_idx;
+	memset(fib_params, 0, sizeof(*fib_params));
+	fib_params->family = AF_INET;
+	fib_params->l4_protocol = IPPROTO_TCP;
+	fib_params->ifindex = parent_idx;
+	fib_params->h_vlan_proto = htons(ETH_P_8021Q);
+	fib_params->h_vlan_TCI = htons(66);
+	if (!ASSERT_EQ(inet_pton(AF_INET, "10.66.0.2", &fib_params->ipv4_dst),
+		       1, "inet_pton(dst)"))
+		goto fail;
+
+	skel->bss->fib_lookup_ret = -1;
+	skel->bss->lookup_flags = BPF_FIB_LOOKUP_VLAN_INPUT |
+				  BPF_FIB_LOOKUP_SKIP_NEIGH;
+	err = bpf_prog_test_run_opts(prog_fd, &run_opts);
+	if (!ASSERT_OK(err, "test_run(input)"))
+		goto fail;
+	ASSERT_EQ(skel->bss->fib_lookup_ret, BPF_FIB_LKUP_RET_NOT_FWDED,
+		  "input across netns fails closed");
+	ASSERT_EQ(fib_params->ifindex, parent_idx, "ifindex untouched");
+	ASSERT_EQ(fib_params->h_vlan_TCI, htons(66), "tag untouched");
+
+	close_netns(nstoken);
+	nstoken = open_netns(NS_VLAN_B);
+	if (!ASSERT_OK_PTR(nstoken, "open_netns(b)"))
+		goto fail;
+
+	/*
+	 * egress: the fib result is the VLAN device here, but its parent
+	 * is in the other netns, so the swap must not happen
+	 */
+	SYS(fail, "ip link set dev veth7.66 up");
+	SYS(fail, "ip addr add 10.66.0.1/24 dev veth7.66");
+	err = write_sysctl("/proc/sys/net/ipv4/conf/veth7.66/forwarding", "1");
+	if (!ASSERT_OK(err, "write_sysctl(forwarding)"))
+		goto fail;
+
+	vlan_idx = if_nametoindex("veth7.66");
+	if (!ASSERT_NEQ(vlan_idx, 0, "if_nametoindex(veth7.66)"))
+		goto fail;
+
+	skb.ifindex = vlan_idx;
+	memset(fib_params, 0, sizeof(*fib_params));
+	fib_params->family = AF_INET;
+	fib_params->l4_protocol = IPPROTO_TCP;
+	fib_params->ifindex = vlan_idx;
+	if (!ASSERT_EQ(inet_pton(AF_INET, "10.66.0.2", &fib_params->ipv4_dst),
+		       1, "inet_pton(dst)") ||
+	    !ASSERT_EQ(inet_pton(AF_INET, "10.66.0.1", &fib_params->ipv4_src),
+		       1, "inet_pton(src)"))
+		goto fail;
+
+	skel->bss->fib_lookup_ret = -1;
+	skel->bss->lookup_flags = BPF_FIB_LOOKUP_VLAN |
+				  BPF_FIB_LOOKUP_SKIP_NEIGH;
+	err = bpf_prog_test_run_opts(prog_fd, &run_opts);
+	if (!ASSERT_OK(err, "test_run(egress)"))
+		goto fail;
+	ASSERT_EQ(skel->bss->fib_lookup_ret, BPF_FIB_LKUP_RET_VLAN_FAILURE,
+		  "egress returns VLAN_FAILURE");
+	ASSERT_EQ(fib_params->ifindex, vlan_idx,
+		  "foreign parent not published");
+	ASSERT_EQ(fib_params->h_vlan_TCI, 0, "vlan fields zero");
+
+fail:
+	if (nstoken)
+		close_netns(nstoken);
+	SYS_NOFAIL("ip netns del " NS_VLAN_A);
+	SYS_NOFAIL("ip netns del " NS_VLAN_B);
+	fib_lookup__destroy(skel);
+}
+
+#define REDIRECT_NPKTS 1000
+
+/*
+ * The egress flag exists so an XDP program can redirect to the physical
+ * parent. A redirect that lands on a VLAN device is dropped at
+ * xdp_do_flush(), because a VLAN device has no ndo_xdp_xmit. Drive real
+ * frames with BPF_F_TEST_XDP_LIVE_FRAMES, which runs the native
+ * xdp_do_redirect() + xdp_do_flush() path: a reducible VLAN egress
+ * resolves to veth1 and is delivered to its peer veth2, while a QinQ
+ * egress returns VLAN_FAILURE and is passed to the stack instead of
+ * being redirected to a device that would silently drop it.
+ */
+void test_fib_lookup_vlan_redirect(void)
+{
+	int redirect_fd, err, veth1_idx, veth2_idx = -1;
+	struct bpf_fib_lookup *fib_params;
+	struct nstoken *nstoken = NULL;
+	struct fib_lookup *skel = NULL;
+	bool xdp_attached = false;
+
+	LIBBPF_OPTS(bpf_test_run_opts, lf_opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+		    .repeat = REDIRECT_NPKTS,
+	);
+
+	skel = fib_lookup__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel open_and_load"))
+		return;
+	redirect_fd = bpf_program__fd(skel->progs.fib_lookup_redirect);
+	fib_params = &skel->bss->fib_params;
+
+	SYS(fail, "ip netns add %s", NS_TEST);
+	nstoken = open_netns(NS_TEST);
+	if (!ASSERT_OK_PTR(nstoken, "open_netns"))
+		goto fail;
+	if (setup_netns())
+		goto fail;
+
+	veth1_idx = if_nametoindex("veth1");
+	veth2_idx = if_nametoindex("veth2");
+	if (!ASSERT_NEQ(veth1_idx, 0, "if_nametoindex(veth1)") ||
+	    !ASSERT_NEQ(veth2_idx, 0, "if_nametoindex(veth2)"))
+		goto fail;
+
+	/*
+	 * A redirect to veth1 is delivered to its peer veth2. veth_xdp_xmit()
+	 * only accepts the frame if veth2's NAPI is up, which on veth means
+	 * veth2 carries an XDP program; xdp_count tallies what arrives.
+	 */
+	err = bpf_xdp_attach(veth2_idx, bpf_program__fd(skel->progs.xdp_count),
+			     XDP_FLAGS_DRV_MODE, NULL);
+	if (!ASSERT_OK(err, "attach xdp_count on veth2"))
+		goto fail;
+	xdp_attached = true;
+
+	/* reducible VLAN egress: resolves to the physical parent veth1 */
+	memset(fib_params, 0, sizeof(*fib_params));
+	fib_params->family = AF_INET;
+	fib_params->l4_protocol = IPPROTO_TCP;
+	fib_params->ifindex = veth1_idx;
+	if (!ASSERT_EQ(inet_pton(AF_INET, IPV4_IFACE_ADDR, &fib_params->ipv4_src),
+		       1, "inet_pton(src)") ||
+	    !ASSERT_EQ(inet_pton(AF_INET, IPV4_VLAN_EGRESS_DST, &fib_params->ipv4_dst),
+		       1, "inet_pton(reducible dst)"))
+		goto fail;
+	skel->bss->lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH;
+	skel->bss->redirected = 0;
+	skel->bss->passed = 0;
+	skel->bss->delivered = 0;
+
+	err = bpf_prog_test_run_opts(redirect_fd, &lf_opts);
+	if (!ASSERT_OK(err, "test_run(reducible egress)"))
+		goto fail;
+	ASSERT_EQ(skel->bss->redirected, REDIRECT_NPKTS, "reducible egress redirected");
+	ASSERT_EQ(skel->bss->passed, 0, "reducible egress not passed");
+	ASSERT_GT(skel->bss->delivered, 0, "reducible egress delivered to veth2");
+
+	/*
+	 * QinQ egress: not reducible, so the lookup returns VLAN_FAILURE and
+	 * the program passes the frame instead of redirecting to the inner
+	 * VLAN device. redirected == 0 is the assertion that matters: the
+	 * program did not redirect to a device that would drop the frame at
+	 * xdp_do_flush(). veth2's delivered count is not checked here, since
+	 * a passed frame can still reach veth2 through the stack's forwarding
+	 * path, which is unrelated to the redirect under test.
+	 */
+	memset(fib_params, 0, sizeof(*fib_params));
+	fib_params->family = AF_INET;
+	fib_params->l4_protocol = IPPROTO_TCP;
+	fib_params->ifindex = veth1_idx;
+	if (!ASSERT_EQ(inet_pton(AF_INET, IPV4_IFACE_ADDR, &fib_params->ipv4_src),
+		       1, "inet_pton(src)") ||
+	    !ASSERT_EQ(inet_pton(AF_INET, IPV4_QINQ_DST, &fib_params->ipv4_dst),
+		       1, "inet_pton(qinq dst)"))
+		goto fail;
+	skel->bss->lookup_flags = BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_SKIP_NEIGH;
+	skel->bss->redirected = 0;
+	skel->bss->passed = 0;
+
+	err = bpf_prog_test_run_opts(redirect_fd, &lf_opts);
+	if (!ASSERT_OK(err, "test_run(qinq egress)"))
+		goto fail;
+	ASSERT_EQ(skel->bss->passed, REDIRECT_NPKTS, "qinq egress passed");
+	ASSERT_EQ(skel->bss->redirected, 0, "qinq egress not redirected");
+
 fail:
+	if (xdp_attached)
+		bpf_xdp_detach(veth2_idx, XDP_FLAGS_DRV_MODE, NULL);
 	if (nstoken)
 		close_netns(nstoken);
 	SYS_NOFAIL("ip netns del " NS_TEST);
diff --git a/tools/testing/selftests/bpf/progs/fib_lookup.c b/tools/testing/selftests/bpf/progs/fib_lookup.c
index 7b5dd2214ff4..862a1e9457b4 100644
--- a/tools/testing/selftests/bpf/progs/fib_lookup.c
+++ b/tools/testing/selftests/bpf/progs/fib_lookup.c
@@ -19,4 +19,40 @@ int fib_lookup(struct __sk_buff *skb)
 	return TC_ACT_SHOT;
 }
 
+SEC("xdp")
+int fib_lookup_xdp(struct xdp_md *ctx)
+{
+	fib_lookup_ret = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params),
+					lookup_flags);
+
+	return XDP_DROP;
+}
+
+int redirected = 0;
+int passed = 0;
+int delivered = 0;
+
+SEC("xdp")
+int fib_lookup_redirect(struct xdp_md *ctx)
+{
+	struct bpf_fib_lookup params = fib_params;
+	long ret;
+
+	ret = bpf_fib_lookup(ctx, &params, sizeof(params), lookup_flags);
+	if (ret == BPF_FIB_LKUP_RET_SUCCESS) {
+		redirected++;
+		return bpf_redirect(params.ifindex, 0);
+	}
+
+	passed++;
+	return XDP_PASS;
+}
+
+SEC("xdp")
+int xdp_count(struct xdp_md *ctx)
+{
+	delivered++;
+	return XDP_DROP;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.54.0
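
The egress outcomes asserted in the two tests above can be modeled in a few lines of userspace C. Three cases are covered: a reducible VLAN egress resolves to its parent (veth1). A QinQ egress returns VLAN_FAILURE. A VLAN device whose parent sits in another netns also returns VLAN_FAILURE, with ifindex left on the VLAN device and the VLAN fields zero. This is a hypothetical sketch inferred only from the selftest's assertions, not the kernel code from this series. struct model_dev, struct model_out, model_vlan_egress() and the MODEL_RET_* codes are invented for illustration, and publishing the VLAN ID as the tag on success is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes named after the selftest's expectations;
 * the values are NOT the UAPI BPF_FIB_LKUP_RET_* values. */
enum model_ret {
	MODEL_RET_SUCCESS,
	MODEL_RET_VLAN_FAILURE,
};

/* Hypothetical, much-simplified stand-in for a net_device. */
struct model_dev {
	int ifindex;
	int netns_id;                        /* namespace the device lives in */
	const struct model_dev *vlan_parent; /* NULL for a non-VLAN device */
	unsigned short vlan_id;              /* meaningful only for VLAN devices */
};

/* What the model publishes back, like fib_params->ifindex / h_vlan_TCI. */
struct model_out {
	int ifindex;
	unsigned short vlan_tci;             /* 0 means "no tag published" */
};

/*
 * Hypothetical egress model: the FIB picked fib_dev. If it is a
 * single-tagged VLAN device whose parent shares its netns, publish the
 * parent plus the tag (assumed). QinQ and a foreign-netns parent both
 * fail, leaving the VLAN device's own ifindex and zero VLAN fields.
 */
static enum model_ret model_vlan_egress(const struct model_dev *fib_dev,
					struct model_out *out)
{
	const struct model_dev *parent;

	assert(fib_dev != NULL && out != NULL);
	out->ifindex = fib_dev->ifindex;
	out->vlan_tci = 0;

	parent = fib_dev->vlan_parent;
	if (parent == NULL)
		return MODEL_RET_SUCCESS;      /* not a VLAN device: nothing to reduce */
	if (parent->vlan_parent != NULL)
		return MODEL_RET_VLAN_FAILURE; /* QinQ: would need two tags */
	if (parent->netns_id != fib_dev->netns_id)
		return MODEL_RET_VLAN_FAILURE; /* never publish a foreign parent */

	out->ifindex = parent->ifindex;
	out->vlan_tci = fib_dev->vlan_id;
	return MODEL_RET_SUCCESS;
}
```

Checking QinQ before the netns comparison is an arbitrary ordering in this model; both failure paths leave the same published state, which is what the "foreign parent not published" and "vlan fields zero" assertions look at.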

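The input-side rule from the first test reduces to a single namespace comparison during the VLAN group lookup. Under that rule, a VLAN device that is still in veth7's VLAN group but lives in another netns must fail closed. This is again a hypothetical userspace sketch: struct model_parent, struct model_vlan and model_vlan_input() are invented names, and treating a foreign-netns match exactly like no match is an assumption drawn from the NOT_FWDED assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VLAN device stacked on a parent. */
struct model_vlan {
	int ifindex;
	int netns_id;            /* namespace the VLAN device lives in */
	unsigned short vlan_id;
};

/* Hypothetical stand-in for the ingress (parent) device. */
struct model_parent {
	int ifindex;
	int netns_id;
	const struct model_vlan *group; /* VLAN group; may span namespaces */
	size_t ngroup;
};

/*
 * Hypothetical input model: find the VLAN device for (parent, vlan_id).
 * A device still in the parent's VLAN group but living in another netns
 * is treated like no match (fail closed), so NULL is returned.
 */
static const struct model_vlan *model_vlan_input(const struct model_parent *parent,
						 unsigned short vlan_id)
{
	assert(parent != NULL);
	for (size_t i = 0; i < parent->ngroup; i++) {
		const struct model_vlan *v = &parent->group[i];

		if (v->vlan_id != vlan_id)
			continue;
		/* The VLAN group spans namespaces; the lookup must not. */
		return v->netns_id == parent->netns_id ? v : NULL;
	}
	return NULL;
}
```

In the selftest's terms, a NULL return for the foreign-netns match corresponds to the case asserted as BPF_FIB_LKUP_RET_NOT_FWDED, with fib_params->ifindex and h_vlan_TCI left as the caller set them.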
