public inbox for netdev@vger.kernel.org

* [PATCH v1 bpf 2/2] bpf: Reject access to unix_sk(sk)->listener.
From: Kuniyuki Iwashima @ 2026-02-07 23:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau
  Cc: John Fastabend, Eduard Zingerman, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Michal Luczaj,
	Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev
In-Reply-To: <20260207230720.2542943-1-kuniyu@google.com>

With the previous patch, bpf prog cannot access unix_sk(sk)->peer.

struct unix_sock has two pointers to struct sock, and another
pointer unix_sk(sk)->listener also has the same problem mentioned
in the previous patch.

unix_sk(sk)->listener is set by unix_stream_connect() and
cleared by unix_update_edges() during accept(), and both are
done under unix_state_lock().

There are some functions where unix_sk(sk)->peer is passed and
bpf prog can access unix_sk(unix_sk(sk)->peer)->listener locklessly,
which is unsafe.  (e.g. unix_maybe_add_creds())

Let's reject bpf access to unix_sk(sk)->listener too.

Fixes: aed6ecef55d7 ("af_unix: Save listener for embryo socket.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 kernel/bpf/verifier.c                         |  1 +
 .../selftests/bpf/progs/verifier_sock.c       | 24 +++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b328a1640c82..2ffc6eff5584 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7157,6 +7157,7 @@ BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct) {
 
 BTF_TYPE_SAFE_UNTRUSTED(struct unix_sock) {
 	struct sock *peer;
+	struct sock *listener;
 };
 
 static bool type_is_rcu(struct bpf_verifier_env *env,
diff --git a/tools/testing/selftests/bpf/progs/verifier_sock.c b/tools/testing/selftests/bpf/progs/verifier_sock.c
index 8de4d3ed98d4..730850e93d6d 100644
--- a/tools/testing/selftests/bpf/progs/verifier_sock.c
+++ b/tools/testing/selftests/bpf/progs/verifier_sock.c
@@ -1191,4 +1191,28 @@ int BPF_PROG(trace_unix_dgram_sendmsg, struct socket *sock, struct msghdr *msg,
 	return 0;
 }
 
+SEC("fentry/unix_maybe_add_creds")
+__failure __msg("R1 type=untrusted_ptr_ expected=sock_common, sock, tcp_sock, xdp_sock, ptr_, trusted_ptr_")
+int BPF_PROG(trace_unix_maybe_add_creds, struct sk_buff *skb,
+	     const struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u_other, *u_listener;
+
+	if (!other)
+		return 0;
+
+	u_other = bpf_skc_to_unix_sock(other);
+	if (!u_other)
+		return 0;
+
+	/* unix_accept() could clear u_other->listener
+	 * and the listener could be close()d.
+	 */
+	u_listener = bpf_skc_to_unix_sock(u_other->listener);
+	if (!u_listener)
+		return 0;
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.53.0.rc2.204.g2597b5adb4-goog


^ permalink raw reply related

* [PATCH v1 bpf 1/2] bpf: Reject access to unix_sk(sk)->peer.
From: Kuniyuki Iwashima @ 2026-02-07 23:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau
  Cc: John Fastabend, Eduard Zingerman, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Michal Luczaj,
	Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev,
	Martin KaFai Lau
In-Reply-To: <20260207230720.2542943-1-kuniyu@google.com>

Michal Luczaj reported use-after-free of unix_sk(sk)->peer by
bpf_skc_to_unix_sock(). [0]

Accessing unix_sk(sk)->peer is safe only under unix_state_lock(),
but there are many functions where bpf prog can access the field
locklessly via fentry/fexit.

unix_dgram_connect() could clear unix_sk(sk)->peer and release
the last refcnt of the peer sk while a bpf prog is accessing it,
resulting in use-after-free.

Another problematic scenario is that unix_sk(sk)->peer could
go away while being passed to bpf_setsockopt() in bpf iter.

To avoid such issues, let's reject access to unix_sk(sk)->peer
by marking the pointer with PTR_UNTRUSTED.

If needed, we could add a new helper later that uses unix_peer_get()
and requires bpf_sk_release().

[0]:
BUG: KASAN: slab-use-after-free in bpf_skc_to_unix_sock+0xa4/0xb0
Read of size 2 at addr ffff888147d38890 by task test_progs/2495
Call Trace:
 dump_stack_lvl+0x5d/0x80
 print_report+0x170/0x4f3
 kasan_report+0xe1/0x180
 bpf_skc_to_unix_sock+0xa4/0xb0
 bpf_prog_564a1c39c35d86a2_unix_shutdown_entry+0x8a/0x8e
 bpf_trampoline_6442564662+0x47/0xab
 unix_shutdown+0x9/0x880
 __sys_shutdown+0xe1/0x160
 __x64_sys_shutdown+0x52/0x90
 do_syscall_64+0x6b/0x3a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 9eeb3aa33ae0 ("bpf: Add bpf_skc_to_unix_sock() helper")
Reported-by: Michal Luczaj <mhal@rbox.co>
Closes: https://lore.kernel.org/all/408569e7-2b82-4eff-b767-79ce6ef6cae0@rbox.co/
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 kernel/bpf/verifier.c                         | 18 +++++++++++++
 .../selftests/bpf/progs/verifier_sock.c       | 25 +++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3135643d5695..b328a1640c82 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7076,6 +7076,7 @@ static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val,
 #define BTF_TYPE_SAFE_RCU_OR_NULL(__type)  __PASTE(__type, __safe_rcu_or_null)
 #define BTF_TYPE_SAFE_TRUSTED(__type)  __PASTE(__type, __safe_trusted)
 #define BTF_TYPE_SAFE_TRUSTED_OR_NULL(__type)  __PASTE(__type, __safe_trusted_or_null)
+#define BTF_TYPE_SAFE_UNTRUSTED(__type)  __PASTE(__type, __safe_untrusted)
 
 /*
  * Allow list few fields as RCU trusted or full trusted.
@@ -7154,6 +7155,10 @@ BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct) {
 	struct file *vm_file;
 };
 
+BTF_TYPE_SAFE_UNTRUSTED(struct unix_sock) {
+	struct sock *peer;
+};
+
 static bool type_is_rcu(struct bpf_verifier_env *env,
 			struct bpf_reg_state *reg,
 			const char *field_name, u32 btf_id)
@@ -7201,6 +7206,16 @@ static bool type_is_trusted_or_null(struct bpf_verifier_env *env,
 					  "__safe_trusted_or_null");
 }
 
+static bool type_is_untrusted(struct bpf_verifier_env *env,
+			      struct bpf_reg_state *reg,
+			      const char *field_name, u32 btf_id)
+{
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_UNTRUSTED(struct unix_sock));
+
+	return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id,
+					  "__safe_untrusted");
+}
+
 static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 				   struct bpf_reg_state *regs,
 				   int regno, int off, int size,
@@ -7343,6 +7358,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	} else {
 		/* Old compat. Deprecated */
 		clear_trusted_flags(&flag);
+
+		if (type_is_untrusted(env, reg, field_name, btf_id))
+			flag |= PTR_UNTRUSTED;
 	}
 
 	if (atype == BPF_READ && value_regno >= 0) {
diff --git a/tools/testing/selftests/bpf/progs/verifier_sock.c b/tools/testing/selftests/bpf/progs/verifier_sock.c
index a2132c72d3b8..8de4d3ed98d4 100644
--- a/tools/testing/selftests/bpf/progs/verifier_sock.c
+++ b/tools/testing/selftests/bpf/progs/verifier_sock.c
@@ -3,6 +3,7 @@
 
 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
 #include "bpf_misc.h"
 
 struct {
@@ -1166,4 +1167,28 @@ int invalidate_pkt_pointers_by_tail_call(struct __sk_buff *sk)
 	return TCX_PASS;
 }
 
+SEC("fentry/unix_dgram_sendmsg")
+__failure __msg("R1 type=untrusted_ptr_ expected=sock_common, sock, tcp_sock, xdp_sock, ptr_, trusted_ptr_")
+int BPF_PROG(trace_unix_dgram_sendmsg, struct socket *sock, struct msghdr *msg,
+	     size_t len)
+{
+	struct unix_sock *u, *u_other;
+
+	if (!sock)
+		return 0;
+
+	u = bpf_skc_to_unix_sock(sock->sk);
+	if (!u)
+		return 0;
+
+	/* unix_dgram_connect() could clear u->peer
+	 * and the peer could be freed.
+	 */
+	u_other = bpf_skc_to_unix_sock(u->peer);
+	if (!u_other)
+		return 0;
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.53.0.rc2.204.g2597b5adb4-goog


^ permalink raw reply related

* [PATCH v1 bpf 0/2] bpf: Reject access to unix_sk(sk)->{peer,listener}.
From: Kuniyuki Iwashima @ 2026-02-07 23:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau
  Cc: John Fastabend, Eduard Zingerman, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Michal Luczaj,
	Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev

Accessing unix_sk(sk)->{peer,listener} is only safe under
unix_state_lock().

There are many functions where bpf prog can access the fields
locklessly via fentry/fexit or bpf iter.

unix_sk(sk)->{peer,listener} could go away during such lockless
access by bpf.

This series marks the fields with PTR_UNTRUSTED to prevent
such use-after-free.
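
For readers unfamiliar with the alternative mentioned in patch 1/2 (a
helper built on unix_peer_get() that must be paired with
bpf_sk_release()), the underlying "take a reference while holding the
lock" pattern can be sketched in plain user-space C. This is only a toy
model under stated assumptions: every toy_* name below is hypothetical,
the spin flag merely stands in for unix_state_lock(), and none of this
is kernel or BPF code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a socket; all names here are hypothetical. */
struct toy_sock {
	atomic_flag state_lock;	/* stands in for unix_state_lock() */
	atomic_int refcnt;	/* stands in for the sock refcount */
	struct toy_sock *peer;	/* only stable while state_lock is held */
};

void toy_sock_init(struct toy_sock *sk)
{
	atomic_flag_clear(&sk->state_lock);
	atomic_init(&sk->refcnt, 1);	/* the creator's reference */
	sk->peer = NULL;
}

void toy_state_lock(struct toy_sock *sk)
{
	while (atomic_flag_test_and_set(&sk->state_lock))
		;	/* spin until the flag is released */
}

void toy_state_unlock(struct toy_sock *sk)
{
	atomic_flag_clear(&sk->state_lock);
}

/* Drop one reference; returns true if the caller dropped the last one. */
bool toy_sock_put(struct toy_sock *sk)
{
	return atomic_fetch_sub(&sk->refcnt, 1) == 1;
}

/* sk starts pointing at peer and owns one reference to it. */
void toy_connect(struct toy_sock *sk, struct toy_sock *peer)
{
	atomic_fetch_add(&peer->refcnt, 1);
	toy_state_lock(sk);
	sk->peer = peer;
	toy_state_unlock(sk);
}

/* Clear sk->peer; the caller must toy_sock_put() the returned pointer. */
struct toy_sock *toy_disconnect(struct toy_sock *sk)
{
	struct toy_sock *old;

	toy_state_lock(sk);
	old = sk->peer;
	sk->peer = NULL;
	toy_state_unlock(sk);
	return old;
}

/* Safe accessor: the reference is taken before the lock is dropped, so
 * a concurrent toy_disconnect() cannot release the last reference
 * while the caller is still using the peer.
 */
struct toy_sock *toy_peer_get(struct toy_sock *sk)
{
	struct toy_sock *peer;

	toy_state_lock(sk);
	peer = sk->peer;
	if (peer)
		atomic_fetch_add(&peer->refcnt, 1);
	toy_state_unlock(sk);
	return peer;
}
```

A caller pairs each non-NULL toy_peer_get() result with exactly one
toy_sock_put(). Reading sk->peer directly, without the lock or the
extra reference, is the unsafe pattern this series rejects.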

Kuniyuki Iwashima (2):
  bpf: Reject access to unix_sk(sk)->peer.
  bpf: Reject access to unix_sk(sk)->listener.

 kernel/bpf/verifier.c                         | 19 +++++++
 .../selftests/bpf/progs/verifier_sock.c       | 49 +++++++++++++++++++
 2 files changed, 68 insertions(+)

-- 
2.53.0.rc2.204.g2597b5adb4-goog

^ permalink raw reply

* Re: [PATCH net-next v13 4/4] net: dsa: add basic initial driver for MxL862xx switches
From: Vladimir Oltean @ 2026-02-07 22:14 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Jakub Kicinski, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiner Kallweit, Russell King, Simon Horman, netdev, devicetree,
	linux-kernel, Frank Wunderlich, Chad Monroe, Cezary Wilmanski,
	Liang Xu, John Crispin
In-Reply-To: <aYYaJ4Yp_MAQ0eqw@makrotopia.org>

On Fri, Feb 06, 2026 at 04:43:19PM +0000, Daniel Golle wrote:
> I've spent an hour studying the pack_fields() API and its (well
> written) documentation. The only example of its use in the current
> kernel I could find is the Intel E800 (ICE) driver. And there it does
> make sense as it is handling conversion between CPU and hardware formats
> in the hotpath for DMA descriptors, a total of 3 different structs, each
> with their individual accessor functions.
> 
> Using this approach for this switch driver would require writing a lot
> of boilerplate code, accessor functions for each and every struct,
> and a struct definition once unpacked for the host platform and then
> again using the PACKED_FIELD(...) notation for the hardware format.
> Surely, most of that could be auto-generated using the existing
> vendor drivers API definition. Yet (at least to me) it feels like
> over-engineering and also it would require rewriting most of the driver
> which has been discussed for almost 2 months now.
> 
> Also note that the driver doesn't need the naturally aligned version of
> all these structs in native CPU endian -- they are not used for further
> processing anything, you can see that because they aren't ever used as a
> function parameters, but only ever as exchange formats when
> communicating with the firmware.

OK. If the fields were packed more densely maybe the tradeoff would have
looked differently then. But you're talking to an MCU and not to hardware.
And you don't need to keep a local direct representation of the data
passed through those packed buffers. Your arguments are valid.

> Maybe I'm missing something obvious here and there is a more simple way
> to use this API, some generic macros using compiler introspection to
> magically handle everything without needing to write packed and unpacked
> struct definitions and individual pack/unpack boiler-plate functions for
> each struct. If so, please provide me with an example or explain how you
> imagine the pack_fields() API to be used in the context of this driver
> and its total of more than 30 different structs which will be used
> for all the different firmware functions I will need to use in order to
> implement phylink_pcs as well as the various offloading and VLAN-related
> functionality the driver should have in the end (ie. the structs you
> currently see in the mxl862xx-api.h file are just a fraction of what I
> hope to add there by follow-up series)

No, the API usage example is how you imagine it. There's a structure
where you only need to pull in the fields you care about (and in
whatever order), rather than every other unrelated tidbit you don't
currently need and possibly never will (like ingress_marking_mode, etc etc).
And another array of PACKED_FIELD() where you say where each field goes.
You probably don't need a pack_fields() and an unpack_fields() call for
the same data structure in most cases, just one or the other.
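
The shape described above (one CPU-side struct holding only the fields
of interest, plus a table saying where each field lands in the packed
buffer) can be illustrated with a small user-space sketch. This is
deliberately NOT the kernel's pack_fields()/PACKED_FIELD() API: the
toy_* names, the field layout and the LSB-first bit numbering are all
assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-side view: only the fields the driver cares about. */
struct toy_port_cfg {
	uint8_t  port_id;
	uint8_t  enable;
	uint16_t vlan_id;
};

/* Hypothetical descriptor: where one struct member lands in the buffer. */
struct toy_field {
	size_t   src_off;	/* offsetof() the member */
	size_t   src_size;	/* sizeof the member, 1 or 2 bytes here */
	unsigned bit_off;	/* first bit in the buffer, LSB-first */
	unsigned bit_width;	/* number of bits occupied */
};

#define TOY_FIELD(member, off, width) \
	{ offsetof(struct toy_port_cfg, member), \
	  sizeof(((struct toy_port_cfg *)0)->member), (off), (width) }

/* The layout table: fields may be listed in any order. */
const struct toy_field toy_port_cfg_fields[] = {
	TOY_FIELD(port_id, 0, 4),
	TOY_FIELD(enable,  4, 1),
	TOY_FIELD(vlan_id, 8, 12),
};

/* Pack cfg into buf according to the table; unused bits stay zero. */
void toy_pack(uint8_t *buf, size_t buf_len, const struct toy_port_cfg *cfg)
{
	size_t n = sizeof(toy_port_cfg_fields) / sizeof(toy_port_cfg_fields[0]);

	memset(buf, 0, buf_len);
	for (size_t i = 0; i < n; i++) {
		const struct toy_field *f = &toy_port_cfg_fields[i];
		const uint8_t *src = (const uint8_t *)cfg + f->src_off;
		uint16_t val = 0;

		/* Load the member in host byte order. */
		if (f->src_size == 1)
			val = *src;
		else
			memcpy(&val, src, sizeof(val));

		/* Copy it bit by bit so the result is endian-independent. */
		for (unsigned b = 0; b < f->bit_width; b++) {
			unsigned pos = f->bit_off + b;

			if (pos / 8 >= buf_len)
				break;
			if (val & (1u << b))
				buf[pos / 8] |= (uint8_t)(1u << (pos % 8));
		}
	}
}
```

An unpack direction would walk the same table the other way. As noted
above, most structures would likely need only one of the two.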

^ permalink raw reply

* Re: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
From: Kuniyuki Iwashima @ 2026-02-07 22:00 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
	Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
	Andrii Nakryiko, Eduard Zingerman, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, netdev, bpf, linux-kernel, linux-kselftest
In-Reply-To: <20260207-unix-proto-update-null-ptr-deref-v2-3-9f091330e7cd@rbox.co>

On Sat, Feb 7, 2026 at 6:35 AM Michal Luczaj <mhal@rbox.co> wrote:
>
> unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state,
> TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
> sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
> socket is properly set up, which would include having a defined peer. IOW,
> there's a window when unix_stream_bpf_update_proto() can be called on
> socket which still has unix_peer(sk) == NULL.
>
>           T0 bpf                            T1 connect
>           ------                            ----------
>
>                                 WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
> sock_map_sk_state_allowed(sk)
> ...
> sk_pair = unix_peer(sk)
> sock_hold(sk_pair)
>                                 sock_hold(newsk)
>                                 smp_mb__after_atomic()
>                                 unix_peer(sk) = newsk
>
> BUG: kernel NULL pointer dereference, address: 0000000000000080
> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
> Call Trace:
>   sock_map_link+0x564/0x8b0
>   sock_map_update_common+0x6e/0x340
>   sock_map_update_elem_sys+0x17d/0x240
>   __sys_bpf+0x26db/0x3250
>   __x64_sys_bpf+0x21/0x30
>   do_syscall_64+0x6b/0x3a0
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Initial idea was to move peer assignment _before_ the sk_state update[1],
> but that involved an additional memory barrier, and changing the hot path
> was rejected. Then a check during proto update was considered[2], but a
> follow-up discussion[3] concluded the root cause is sockmap taking a wrong
> lock.
>
> Thus, teach sockmap about the af_unix-specific locking: instead of the
> usual lock_sock() involving sock::sk_lock, af_unix protects critical
> sections under unix_state_lock() operating on unix_sock::lock.
>
> [1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
> [2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
> [3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/
>
> This patch also happens to fix a deadlock that may occur when
> bpf_iter_unix_seq_show()'s lock_sock_fast() takes the fast path and the
> iter prog attempts to update a sockmap. Which ends up spinning at
> sock_map_update_elem()'s bh_lock_sock():

Hmm.. this seems to be a more general problem for
bpf iter vs sockmap.  bpf_iter_{tcp,udp}_seq_show() also
hold lock_sock(),  where this patch's solution does not help.
We need to resolve this regardless of socket family.

Also, I feel lock_sock() should be outside of unix_state_lock()
since the former is usually sleepable.  If we used such locking
order in the future, it would trigger ABBA deadlock and we would
have to revisit this problem.
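
To make the ordering concern concrete, here is a toy user-space model
of a lock-order check. It is an illustrative assumption, not lockdep
and not kernel code: the rank names and toy_* functions are
hypothetical, with rank 0 standing in for the sleepable lock_sock()
and rank 1 for unix_state_lock().

```c
#include <stdbool.h>

/* Lower rank means "outer" lock: it must be taken first. */
enum toy_lock_rank {
	TOY_RANK_SOCK_LOCK = 0,		/* stands in for lock_sock() */
	TOY_RANK_UNIX_STATE_LOCK = 1,	/* stands in for unix_state_lock() */
	TOY_RANK_COUNT
};

/* Per-context record of which ranks are currently held. */
struct toy_lock_ctx {
	bool held[TOY_RANK_COUNT];
};

/* Taking `rank` is fine only if no lock of equal or inner rank is held. */
bool toy_lock_order_ok(const struct toy_lock_ctx *ctx, enum toy_lock_rank rank)
{
	for (int r = rank; r < TOY_RANK_COUNT; r++)
		if (ctx->held[r])
			return false;
	return true;
}

/* Returns false, without acquiring, on an ordering violation. */
bool toy_acquire(struct toy_lock_ctx *ctx, enum toy_lock_rank rank)
{
	if (!toy_lock_order_ok(ctx, rank))
		return false;
	ctx->held[rank] = true;
	return true;
}

void toy_release(struct toy_lock_ctx *ctx, enum toy_lock_rank rank)
{
	ctx->held[rank] = false;
}
```

If one path takes rank 0 then rank 1 while another takes rank 1 then
rank 0, two CPUs can each hold one lock and wait forever for the other.
The check above simply rejects the second order.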



>
> WARNING: possible recursive locking detected
> --------------------------------------------
> test_progs/1393 is trying to acquire lock:
> ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: sock_map_update_elem+0xdb/0x1f0
>
> but task is already holding lock:
> ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(slock-AF_UNIX);
>   lock(slock-AF_UNIX);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 4 locks held by test_progs/1393:
>  #0: ffff88814b59c790 (&p->lock){+.+.}-{4:4}, at: bpf_seq_read+0x59/0x10d0
>  #1: ffff88811ec25fd8 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: bpf_seq_read+0x42c/0x10d0
>  #2: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
>  #3: ffffffff85a6a7c0 (rcu_read_lock){....}-{1:3}, at: bpf_iter_run_prog+0x51d/0xb00
>
> Call Trace:
>  dump_stack_lvl+0x5d/0x80
>  print_deadlock_bug.cold+0xc0/0xce
>  __lock_acquire+0x130f/0x2590
>  lock_acquire+0x14e/0x2b0
>  _raw_spin_lock+0x30/0x40
>  sock_map_update_elem+0xdb/0x1f0
>  bpf_prog_2d0075e5d9b721cd_dump_unix+0x55/0x4f4
>  bpf_iter_run_prog+0x5b9/0xb00
>  bpf_iter_unix_seq_show+0x1f7/0x2e0
>  bpf_seq_read+0x42c/0x10d0
>  vfs_read+0x171/0xb20
>  ksys_read+0xff/0x200
>  do_syscall_64+0x6b/0x3a0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
> Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
> Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> Keeping sparse annotations in sock_map_sk_{acquire,release}() required some
> hackery I'm not proud of. Is there a better way?
> ---
>  net/core/sock_map.c | 47 +++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index b6586d9590b7..0c638b1f363a 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -12,6 +12,7 @@
>  #include <linux/list.h>
>  #include <linux/jhash.h>
>  #include <linux/sock_diag.h>
> +#include <net/af_unix.h>
>  #include <net/udp.h>
>
>  struct bpf_stab {
> @@ -115,17 +116,49 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
>  }
>
>  static void sock_map_sk_acquire(struct sock *sk)
> -       __acquires(&sk->sk_lock.slock)
> +       __acquires(sock_or_unix_lock)
>  {
> -       lock_sock(sk);
> +       if (sk_is_unix(sk)) {
> +               unix_state_lock(sk);
> +               __release(sk); /* Silence sparse. */
> +       } else {
> +               lock_sock(sk);
> +       }
> +
>         rcu_read_lock();
>  }
>
>  static void sock_map_sk_release(struct sock *sk)
> -       __releases(&sk->sk_lock.slock)
> +       __releases(sock_or_unix_lock)
>  {
>         rcu_read_unlock();
> -       release_sock(sk);
> +
> +       if (sk_is_unix(sk)) {
> +               unix_state_unlock(sk);
> +               __acquire(sk); /* Silence sparse. */
> +       } else {
> +               release_sock(sk);
> +       }
> +}
> +
> +static inline void sock_map_sk_acquire_fast(struct sock *sk)
> +{
> +       local_bh_disable();
> +
> +       if (sk_is_unix(sk))
> +               unix_state_lock(sk);
> +       else
> +               bh_lock_sock(sk);
> +}
> +
> +static inline void sock_map_sk_release_fast(struct sock *sk)
> +{
> +       if (sk_is_unix(sk))
> +               unix_state_unlock(sk);
> +       else
> +               bh_unlock_sock(sk);
> +
> +       local_bh_enable();
>  }
>
>  static void sock_map_add_link(struct sk_psock *psock,
> @@ -604,16 +637,14 @@ static long sock_map_update_elem(struct bpf_map *map, void *key,
>         if (!sock_map_sk_is_suitable(sk))
>                 return -EOPNOTSUPP;
>
> -       local_bh_disable();
> -       bh_lock_sock(sk);
> +       sock_map_sk_acquire_fast(sk);
>         if (!sock_map_sk_state_allowed(sk))
>                 ret = -EOPNOTSUPP;
>         else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
>                 ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
>         else
>                 ret = sock_hash_update_common(map, key, sk, flags);
> -       bh_unlock_sock(sk);
> -       local_bh_enable();
> +       sock_map_sk_release_fast(sk);
>         return ret;
>  }
>
>
> --
> 2.52.0
>

^ permalink raw reply

* Re: [PATCH net-next v14 4/4] net: dsa: add basic initial driver for MxL862xx switches
From: Vladimir Oltean @ 2026-02-07 21:59 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiner Kallweit, Russell King, Simon Horman, netdev, devicetree,
	linux-kernel, Frank Wunderlich, Chad Monroe, Cezary Wilmanski,
	Liang Xu, John Crispin
In-Reply-To: <ccde07e8cf33d8ae243000013b57cfaa2695e0a9.1770433307.git.daniel@makrotopia.org>

On Sat, Feb 07, 2026 at 03:07:27AM +0000, Daniel Golle wrote:
> +/* PHY access via firmware relay */
> +static int mxl862xx_phy_read_mmd(struct mxl862xx_priv *priv, int port,
> +				 int devadd, int reg)
> +{
> +	struct mdio_relay_data param = {
> +		.phy = port,
> +		.mmd = devadd,
> +		.reg = cpu_to_le16(reg),
> +	};
> +	int ret;
> +
> +	ret = MXL862XX_API_READ(priv, INT_GPHY_READ, param);
> +	if (ret)
> +		return ret;
> +
> +	return le16_to_cpu(param.data);
> +}
> +
> +static int mxl862xx_phy_write_mmd(struct mxl862xx_priv *priv, int port,
> +				  int devadd, int reg, u16 data)
> +{
> +	struct mdio_relay_data param = {
> +		.phy = port,
> +		.mmd = devadd,
> +		.reg = cpu_to_le16(reg),
> +		.data = cpu_to_le16(data),
> +	};
> +
> +	return MXL862XX_API_WRITE(priv, INT_GPHY_WRITE, param);
> +}
> +
> +static int mxl862xx_phy_read_mii_bus(struct mii_bus *bus, int port, int regnum)
> +{
> +	return mxl862xx_phy_read_mmd(bus->priv, port, 0, regnum);
> +}
> +
> +static int mxl862xx_phy_write_mii_bus(struct mii_bus *bus, int port,
> +				      int regnum, u16 val)
> +{
> +	return mxl862xx_phy_write_mmd(bus->priv, port, 0, regnum, val);
> +}
> +
> +static int mxl862xx_phy_read_c45_mii_bus(struct mii_bus *bus, int port,
> +					 int devadd, int regnum)
> +{
> +	return mxl862xx_phy_read_mmd(bus->priv, port, devadd, regnum);
> +}
> +
> +static int mxl862xx_phy_write_c45_mii_bus(struct mii_bus *bus, int port,
> +					  int devadd, int regnum, u16 val)
> +{
> +	return mxl862xx_phy_write_mmd(bus->priv, port, devadd, regnum, val);
> +}

You took inspiration from the wrong place with the mii_bus ops prototypes,
specifically with the "int port" argument.

The second argument does not hold the port, it holds the PHY address.
I.e. in this case:
                port@6 {
                    reg = <6>;
                    phy-handle = <&phy5>;
                    phy-mode = "internal";
                };
                phy5: ethernet-phy@5 {
                    reg = <5>;
                };

"int port" is 5, not 6.

Your source of inspiration are the prototypes of an mii_bus used as
ds->user_mii_bus. We have a different set of requirements there, because
ds->user_mii_bus exists for the case where the PHY is not described in
the device tree, so the port index is given as argument and the
user_mii_bus is responsible for internally translating the port index to
a PHY address.

So while the use of "int port" as argument name for these operations is
justifiable in some cases, it is not applicable to this driver, and will
be a pitfall for anyone who has to modify or debug this code.
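
The difference between the two conventions can be shown with a small
user-space sketch. Everything here is hypothetical (the toy_* names,
the one-entry table taken from the device tree fragment above, and the
fake register value); it only illustrates who is responsible for the
port-to-PHY-address translation.

```c
#include <stddef.h>

/* Hypothetical table mirroring the fragment above:
 * port@6 has phy-handle = <&phy5>, and phy5 has reg = <5>.
 */
struct toy_port_phy_map {
	int port;
	int phy_addr;
};

const struct toy_port_phy_map toy_map[] = {
	{ 6, 5 },
};

/* Translate a port index to a PHY address; -1 if the port has no PHY. */
int toy_port_to_phy_addr(int port)
{
	for (size_t i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++)
		if (toy_map[i].port == port)
			return toy_map[i].phy_addr;
	return -1;
}

/* Regular mii_bus-style read: the first argument already is the PHY
 * address. The return value is a fake that encodes which address and
 * register were used.
 */
int toy_mdio_read(int addr, int regnum)
{
	return (addr << 8) | regnum;
}

/* user_mii_bus-style read: the caller passes a port index and the bus
 * translates it to a PHY address internally.
 */
int toy_user_mdio_read(int port, int regnum)
{
	int addr = toy_port_to_phy_addr(port);

	if (addr < 0)
		return -1;
	return toy_mdio_read(addr, regnum);
}
```

With this table, toy_user_mdio_read(6, reg) and toy_mdio_read(5, reg)
reach the same PHY, which is why naming the argument of the second
kind of accessor "port" is misleading.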

^ permalink raw reply

* Re: [PATCH net-next 7/7] tcp: inet6_csk_xmit() optimization
From: Kuniyuki Iwashima @ 2026-02-07 21:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260206173426.1638518-8-edumazet@google.com>

On Fri, Feb 6, 2026 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> After prior patches, inet6_csk_xmit() can reuse inet->cork.fl.u.ip6
> if __sk_dst_check() returns a valid dst.
>
> Otherwise call inet6_csk_route_socket() to refresh inet->cork.fl.u.ip6
> content and get a new dst.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* Re: [PATCH net-next 6/7] tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
From: Kuniyuki Iwashima @ 2026-02-07 21:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260206173426.1638518-7-edumazet@google.com>

On Fri, Feb 6, 2026 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> As explained in commit 85d05e281712 ("ipv6: change inet6_sk_rebuild_header()
> to use inet->cork.fl.u.ip6"):
>
> TCP v6 spends a good amount of time rebuilding a fresh fl6 at each
> transmit in inet6_csk_xmit()/inet6_csk_route_socket().
>
> TCP v4 caches the information in inet->cork.fl.u.ip4 instead.
>
> After this patch, passive TCP ipv6 flows have correctly initialized
> inet->cork.fl.u.ip6 structure.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* Re: [PATCH net-next v14 1/4] dt-bindings: net: dsa: add MaxLinear MxL862xx
From: Vladimir Oltean @ 2026-02-07 21:48 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiner Kallweit, Russell King, Simon Horman, netdev, devicetree,
	linux-kernel, Frank Wunderlich, Chad Monroe, Cezary Wilmanski,
	Liang Xu, John Crispin
In-Reply-To: <22a6a3c8c15b932ff4b7d0cd8863939f06a0c2b4.1770433307.git.daniel@makrotopia.org>

On Sat, Feb 07, 2026 at 03:07:04AM +0000, Daniel Golle wrote:
> RFC v4:
>  * remove labels from example
>  * remove 'bindings for' from commit title
...
> +examples:
> +  - |
> +    mdio {
> +        #address-cells = <1>;
> +        #size-cells = <0>;
> +
> +        switch@0 {
> +            compatible = "maxlinear,mxl86282";
> +            reg = <0>;
> +
> +            ethernet-ports {
> +                #address-cells = <1>;
> +                #size-cells = <0>;
> +
> +                port@9 {
> +                    reg = <9>;
> +                    label = "cpu";

Sorry, it's my fault really for not checking since v4 that you properly
applied my feedback to "Please remove port labels from the example."
https://lore.kernel.org/netdev/20251216224317.maxhcdsuqqxnywmu@skbuf/

There was an effort a few years ago to remove label = "cpu" at least
from dt-binding examples, if not from device trees as well, because the
"label" property is defined and parsed only for user ports, which the
CPU port is not. [ and even for user ports, it is discouraged except for
distributions with a sub-par udev implementation (OpenWrt) ].

This line should go away.

> +                    ethernet = <&gmac0>;
> +                    phy-mode = "usxgmii";
> +
> +                    fixed-link {
> +                        speed = <10000>;
> +                        full-duplex;
> +                    };
> +                };
> +            };
> +        };
> +    };

^ permalink raw reply

* Re: [PATCH net-next v14 0/4] net: dsa: initial support for MaxLinear MxL862xx switches
From: Vladimir Oltean @ 2026-02-07 21:47 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiner Kallweit, Russell King, Simon Horman, netdev, devicetree,
	linux-kernel, Frank Wunderlich, Chad Monroe, Cezary Wilmanski,
	Liang Xu, John Crispin
In-Reply-To: <cover.1770433307.git.daniel@makrotopia.org>

On Sat, Feb 07, 2026 at 03:06:48AM +0000, Daniel Golle wrote:
> This series adds very basic DSA support for the MaxLinear MxL86252
> (5x 2500Base-T PHYs) and MxL86282 (8x 2500Base-T PHYs) switches.
> In addition to the 2.5G TP ports both switches also come with two
> SerDes interfaces which can be used either to connect external PHYs
> or SFP cages, or as CPU port when using the switch with this DSA driver.

For the entire set:

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>

with some minor comments I'll leave in individual patches, which I'd
like you to address.

I don't want to put anybody in a bad spot, but given what time it is,
this set should get at least _some_ time in net-next before the upcoming
net-next PR, to allow for some reaction time in case of some unexpected
reports like from static analysis or similar. So it would be good,
because of that, for the fixups as a result of my comments to be
separate patches rather than a new version.

^ permalink raw reply

* Re: [PATCH net-next 5/7] tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
From: Kuniyuki Iwashima @ 2026-02-07 21:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260206173426.1638518-6-edumazet@google.com>

On Fri, Feb 6, 2026 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Instead of using private @fl6 and @final variables
> use respectively inet->cork.fl.u.ip6 and np->final.
>
> As explained in commit 85d05e281712 ("ipv6: change inet6_sk_rebuild_header()
> to use inet->cork.fl.u.ip6"):
>
> TCP v6 spends a good amount of time rebuilding a fresh fl6 at each
> transmit in inet6_csk_xmit()/inet6_csk_route_socket().
>
> TCP v4 caches the information in inet->cork.fl.u.ip4 instead.
>
> After this patch, active TCP ipv6 flows have correctly initialized
> inet->cork.fl.u.ip6 structure.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* Re: [PATCH net-next 4/7] ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
From: Kuniyuki Iwashima @ 2026-02-07 21:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260206173426.1638518-5-edumazet@google.com>

On Fri, Feb 6, 2026 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Convert inet6_csk_route_socket() to use np->final instead of an
> automatic variable to get rid of a stack canary.
>
> Convert inet6_csk_xmit() and inet6_csk_update_pmtu() to use
> inet->cork.fl.u.ip6 instead of @fl6 automatic variable.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* Re: [PATCH net-next 3/7] ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
From: Kuniyuki Iwashima @ 2026-02-07 21:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260206173426.1638518-4-edumazet@google.com>

On Fri, Feb 6, 2026 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Get rid of @fl6 and &final variables in ip6_datagram_dst_update().
>
> Use instead inet->cork.fl.u.ip6 and np->final so that a stack canary
> is no longer needed.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* Re: [PATCH net-next 2/7] ipv6: use np->final in inet6_sk_rebuild_header()
From: Kuniyuki Iwashima @ 2026-02-07 21:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260206173426.1638518-3-edumazet@google.com>

On Fri, Feb 6, 2026 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Instead of using an automatic variable, use np->final
> to get rid of the stack canary in inet6_sk_rebuild_header().
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* Re: [PATCH net-next 1/7] ipv6: add daddr/final storage in struct ipv6_pinfo
From: Kuniyuki Iwashima @ 2026-02-07 21:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, David Ahern, netdev, eric.dumazet
In-Reply-To: <20260206173426.1638518-2-edumazet@google.com>

On Fri, Feb 6, 2026 at 9:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> After commit b409a7f7176b ("ipv6: colocate inet6_cork in
> inet_cork_full") we have room in ipv6_pinfo to hold daddr/final
> in case they need to be populated in fl6_update_dst() calls.
>
> This will allow stack canary removal in IPv6 tx fast paths.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* Re: [PATCH net-next v20 00/12] virtio_net: Add ethtool flow rules support
From: Michael S. Tsirkin @ 2026-02-07 21:14 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: Jakub Kicinski, netdev, jasowang, pabeni, virtualization, parav,
	shshitrit, yohadt, xuanzhuo, eperezma, jgg, kevin.tian,
	andrew+netdev, edumazet
In-Reply-To: <c96d6b1c-5bad-4e06-905c-10f722c9b9d1@nvidia.com>

On Sat, Feb 07, 2026 at 07:14:25AM -0600, Dan Jurgens wrote:
> On 2/7/26 4:01 AM, Michael S. Tsirkin wrote:
> > On Thu, Feb 05, 2026 at 06:43:28PM -0800, Jakub Kicinski wrote:
> >> On Thu, 5 Feb 2026 16:46:55 -0600 Daniel Jurgens wrote:
> >>> This series implements ethtool flow rules support for virtio_net using the
> >>> virtio flow filter (FF) specification. The implementation allows users to
> >>> configure packet filtering rules through ethtool commands, directing
> >>> packets to specific receive queues, or dropping them based on various
> >>> header fields.
> >>
> >> This is a 4th version of this you posted in as many days and it doesn't
> >> even build. Please slow down. Please wait with v21 until after the merge
> >> window. We have enough patches to sift thru still for v7.0.
> > 
> > v20 and no end in sight.
> > Just looking at the amount of pain all this parsing is inflicting
> > makes me worry. And wait until we need to begin worrying about
> > maintaining UAPI stability.
> > 
> > It would be much nicer if drivers were out of the business of parsing
> > fiddly structures.  Isn't there a way for more code in net core
> > to deal with all this?
> 
> MST, you reviewed the spec that defined these data structures. If you
> didn't want the driver to have to parse data structures then suggesting
> using the same format as the ethtool flow specs would have been a great
> idea at that point. Or short of that padded and fixed size data
> structures would also have made things much cleaner.

Oh virtio is actually reasonably nice IMHO, and of course
virtio net has to parse them. But a bunch of issues
I reported are around parsing uapi/linux/ethtool.h structures.
There is a ton of code like this:


+static void parse_ip4(struct iphdr *mask, struct iphdr *key,
+                     const struct ethtool_rx_flow_spec *fs)
+{
+       const struct ethtool_usrip4_spec *l3_mask = &fs->m_u.usr_ip4_spec;
+       const struct ethtool_usrip4_spec *l3_val  = &fs->h_u.usr_ip4_spec;
+
+       if (l3_mask->ip4src) {
+               memcpy(&mask->saddr, &l3_mask->ip4src, sizeof(mask->saddr));
+               memcpy(&key->saddr, &l3_val->ip4src, sizeof(key->saddr));
+       }
+
+       if (l3_mask->ip4dst) {
+               memcpy(&mask->daddr, &l3_mask->ip4dst, sizeof(mask->daddr));
+               memcpy(&key->daddr, &l3_val->ip4dst, sizeof(key->daddr));
+       }
+
+       if (l3_mask->tos) {
+               mask->tos = l3_mask->tos;
+               key->tos = l3_val->tos;
+       }
+}
+

and I just ask, given there's apparently nothing at all here
that is driver specific, whether it is generally useful
enough to live in net core?


> I thought this series was close to done, so I was trying to address the
> very non-deterministic AI review comments. It's been generating new
> comments on things that had been there for many revisions, and running
> it locally with the same model never reproduces the comments from the
> online review.

Uncritically posting, or trying to address "ai review" comments is not
at all a good idea.


A bigger problem is that we are still not done finding bugs in this.
My thought therefore was to see if we can move more code to
net core where more *humans* will read it and help find and fix bugs.

-- 
MST


^ permalink raw reply

* [PATCH net-next v4 4/4] selftests/net: add no NDP b/mcast,null poison test
From: Marc Suñé @ 2026-02-07 20:40 UTC (permalink / raw)
  To: kuba, willemdebruijn.kernel, pabeni
  Cc: netdev, dborkman, vadim.fedorenko, Marc Suñé
In-Reply-To: <cover.1770399836.git.marcdevel@gmail.com>

Add a selftest to test that NDP bcast/mcast/null poisoning checks are
never bypassed.

Signed-off-by: Marc Suñé <marcdevel@gmail.com>
---
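Note for readers: ndisc_send.c in this patch has to compute the ICMPv6
checksum itself, over an IPv6 pseudo-header followed by the ICMPv6
message. Below is a minimal standalone sketch of the Internet checksum
(RFC 1071 style), assuming big-endian word assembly and folding carries
until the sum fits in 16 bits. The names csum_accumulate() and
csum_finish() are illustrative only and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to a 32-bit running sum.
 * An odd trailing byte is treated as if padded with a zero byte.
 */
static uint32_t csum_accumulate(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;

	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}

	if (len)
		sum += (uint32_t)p[0] << 8;

	return sum;
}

/* Fold the end-around carries back in until the sum fits in 16 bits,
 * then return the one's complement.
 */
static uint16_t csum_finish(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A caller would chain csum_accumulate() over the pseudo-header and then
the ICMPv6 message (with the checksum field zeroed), and store the
csum_finish() result high byte first.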
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/Makefile          |   2 +
 .../net/arp_ndisc_no_invalid_sha_poison.sh    | 368 ++++++++++++++++++
 .../net/arp_no_invalid_sha_poision.sh         | 173 --------
 tools/testing/selftests/net/ndisc_send.c      | 198 ++++++++++
 5 files changed, 569 insertions(+), 173 deletions(-)
 create mode 100755 tools/testing/selftests/net/arp_ndisc_no_invalid_sha_poison.sh
 delete mode 100755 tools/testing/selftests/net/arp_no_invalid_sha_poision.sh
 create mode 100644 tools/testing/selftests/net/ndisc_send.c

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 97ad4d551d44..3e703b31a8e1 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -18,6 +18,7 @@ ipv6_flowlabel_mgr
 ipv6_fragmentation
 log.txt
 msg_zerocopy
+ndisc_send
 netlink-dumps
 nettest
 proc_net_pktgen
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index c6d571bf72be..a8661191a78b 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -10,6 +10,7 @@ TEST_PROGS := \
 	altnames.sh \
 	amt.sh \
 	arp_ndisc_evict_nocarrier.sh \
+	arp_ndisc_no_invalid_sha_poison.sh \
 	arp_ndisc_untracked_subnets.sh \
 	arp_no_invalid_sha_poision.sh \
 	bareudp.sh \
@@ -171,6 +172,7 @@ TEST_GEN_PROGS := \
 	epoll_busy_poll \
 	icmp_rfc4884 \
 	ipv6_fragmentation \
+	ndisc_send \
 	proc_net_pktgen \
 	reuseaddr_conflict \
 	reuseport_bpf \
diff --git a/tools/testing/selftests/net/arp_ndisc_no_invalid_sha_poison.sh b/tools/testing/selftests/net/arp_ndisc_no_invalid_sha_poison.sh
new file mode 100755
index 000000000000..ad91eac18c89
--- /dev/null
+++ b/tools/testing/selftests/net/arp_ndisc_no_invalid_sha_poison.sh
@@ -0,0 +1,368 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Tests that ARP and NDP announcements with Broadcast, Multicast or NULL mac
+# are never accepted
+#
+
+source lib.sh
+
+readonly V4_ADDR0="10.0.10.1"
+readonly V6_ADDR0="fd00:1::1"
+readonly V4_ADDR1="10.0.10.2"
+readonly V6_ADDR1="fd00:1::2"
+readonly V6_ALL_NODES="ff02::1"
+readonly V6_SOL_NODE1="ff02::1:ff00:0002"
+readonly BCAST_MAC="ff:ff:ff:ff:ff:ff"
+readonly MCAST_MAC="01:00:5e:00:00:00"
+readonly NULL_MAC="00:00:00:00:00:00"
+readonly VALID_MAC="02:01:02:03:04:05"
+readonly V6_ALL_NODE_MAC="33:33:FF:00:00:01"
+readonly V6_SOL_NODE_MAC1="33:33:FF:00:00:02"
+readonly NS=135
+readonly NA=136
+readonly ARP_REQ=request
+readonly ARP_REPLY=reply
+ret=0
+veth0_ifindex=0
+veth1_mac=
+
+setup() {
+	setup_ns PEER_NS
+
+	ip link add name veth0 type veth peer name veth1
+	ip link set dev veth0 up
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip netns exec "${PEER_NS}" ip link set dev veth1 up
+	ip addr add "${V4_ADDR0}"/24 dev veth0
+	ip addr add "${V6_ADDR0}"/64 dev veth0
+	ip netns exec "${PEER_NS}" ip addr add "${V4_ADDR1}"/24 dev veth1
+	ip netns exec "${PEER_NS}" ip route add default via "${V4_ADDR0}" dev veth1
+
+	ip netns exec "${PEER_NS}" ip addr add "${V6_ADDR1}"/64 dev veth1
+	ip netns exec "${PEER_NS}" ip route add default via "${V6_ADDR0}" dev veth1
+
+	# Raise ARP timers to avoid flakes due to refreshes
+	sysctl -w net.ipv4.neigh.veth0.base_reachable_time=3600 \
+		>/dev/null 2>&1
+	ip netns exec "${PEER_NS}" \
+		sysctl -w net.ipv4.neigh.veth1.gc_stale_time=3600 \
+		>/dev/null 2>&1
+	ip netns exec "${PEER_NS}" \
+		sysctl -w net.ipv4.neigh.veth1.base_reachable_time=3600 \
+		>/dev/null 2>&1
+
+	veth0_ifindex=$(ip -j link show veth0 | jq -r '.[0].ifindex')
+	veth1_mac="$(ip netns exec "${PEER_NS}" ip -j link show veth1 | \
+		jq -r '.[0].address' )"
+}
+
+cleanup() {
+	ip neigh flush dev veth0
+	ip link del veth0
+	cleanup_ns "${PEER_NS}"
+}
+
+# Make sure ARP announcement with invalid MAC is never learnt
+run_no_arp_poisoning() {
+	local l2_dmac="${1}"
+	local tmac="${2}"
+	local op="${3}"
+
+	ret=0
+
+	ip netns exec "${PEER_NS}" ip neigh flush dev veth1 >/dev/null 2>&1
+	ip netns exec "${PEER_NS}" ping -c 1 "${V4_ADDR0}" >/dev/null 2>&1
+
+	# Poison with a valid MAC to ensure injection is working
+	mausezahn "veth0" -q -a "${VALID_MAC}" -b "${BCAST_MAC}" -t arp \
+		  "${op}, sip=${V4_ADDR0}, tip=${V4_ADDR0}, smac=${VALID_MAC}, tmac=${VALID_MAC}"
+
+	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}" | \
+		grep "${VALID_MAC}")
+	if [ "${neigh}" == "" ]; then
+		echo "ERROR: unable to ARP poison with a valid MAC ${VALID_MAC}"
+		ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}"
+		ret=1
+		return
+	fi
+
+	# Poison with tmac
+	mausezahn "veth0" -q -a "${VALID_MAC}" -b "${l2_dmac}" -t arp \
+		  "${op}, sip=${V4_ADDR0}, tip=${V4_ADDR0}, smac=${tmac}, tmac=${tmac}"
+
+
+	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}" | \
+		grep "${tmac}")
+	if [ "${neigh}" != "" ]; then
+		echo "ERROR: ARP entry learnt for ${tmac} announcement."
+		ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}"
+		ret=1
+		return
+	fi
+}
+
+# Make sure NDP announcement with invalid MAC is never learnt
+run_no_ndp_poisoning() {
+	local l2_dmac="${1}"
+	local dst_ip="${2}"
+	local op="${3}"
+	local tip="${V6_ADDR0}"
+	local tmac="${4}"
+
+	if [ "${op}" == "${NS}" ]; then
+		tip="${V6_ADDR1}"
+	fi
+
+	ret=0
+
+	ip netns exec "${PEER_NS}" ip -6 neigh flush dev veth1 >/dev/null 2>&1
+	ip netns exec "${PEER_NS}" ping -c 1 "${V6_ADDR0}" >/dev/null 2>&1
+
+	# Poison with a valid MAC to ensure injection is working
+	./ndisc_send "${veth0_ifindex}" "${l2_dmac}" "${VALID_MAC}" "${dst_ip}" \
+		"${V6_ADDR0}" "${tip}" "${op}" "${VALID_MAC}"
+	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V6_ADDR0}" | \
+		grep "${VALID_MAC}")
+	if [ "${neigh}" == "" ]; then
+		echo "ERROR: unable to NDP poison with a valid MAC ${VALID_MAC}"
+		ip netns exec "${PEER_NS}" ip neigh show "${V6_ADDR0}"
+		ret=1
+		return
+	fi
+
+	# Poison with tmac
+	./ndisc_send "${veth0_ifindex}" "${l2_dmac}" "${VALID_MAC}" "${dst_ip}" \
+		"${V6_ADDR0}" "${tip}" "${op}" "${tmac}"
+	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V6_ADDR0}" | \
+		grep "${tmac}")
+	if [ "${neigh}" != "" ]; then
+		echo "ERROR: NDP entry learnt for ${tmac} announcement."
+		ip netns exec "${PEER_NS}" ip neigh show "${V6_ADDR0}"
+		ret=1
+		return
+	fi
+}
+
+print_test_result() {
+	local msg="${1}"
+	local rc="${2}"
+
+	if [ "${rc}" == 0 ]; then
+		printf "TEST: %-60s  [ OK ]" "${msg}"
+	else
+		printf "TEST: %-60s  [ FAIL ]" "${msg}"
+	fi
+}
+
+run_all_tests() {
+	local results
+
+	setup
+
+	## ARP
+	# Broadcast gARPs
+	msg="1.1  ARP no poisoning dmac=bcast reply sha=bcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${BCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.2  ARP no poisoning dmac=bcast reply sha=null"
+	run_no_arp_poisoning "${BCAST_MAC}" "${NULL_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.3  ARP no poisoning dmac=bcast req   sha=bcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${BCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.4  ARP no poisoning dmac=bcast req   sha=null"
+	run_no_arp_poisoning "${BCAST_MAC}" "${NULL_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.5  ARP no poisoning dmac=bcast req   sha=mcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${MCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.6  ARP no poisoning dmac=bcast reply sha=mcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${MCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Targeted gARPs
+	msg="1.7  ARP no poisoning dmac=veth1 reply sha=bcast"
+	run_no_arp_poisoning "${veth1_mac}" "${BCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.8  ARP no poisoning dmac=veth1 reply sha=null"
+	run_no_arp_poisoning "${veth1_mac}" "${NULL_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.9  ARP no poisoning dmac=veth1 req   sha=bcast"
+	run_no_arp_poisoning "${veth1_mac}" "${BCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.10 ARP no poisoning dmac=veth1 req   sha=null"
+	run_no_arp_poisoning "${veth1_mac}" "${NULL_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.11 ARP no poisoning dmac=veth1 req   sha=mcast"
+	run_no_arp_poisoning "${veth1_mac}" "${MCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.12 ARP no poisoning dmac=veth1 reply sha=mcast"
+	run_no_arp_poisoning "${veth1_mac}" "${MCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	### NDP
+	## NA
+	# Broadcast / All node MAC, all-node IP announcements
+	msg="2.1  NDP no poisoning dmac=bcast   all_nodes na lladdr=bcast"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ALL_NODES}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.2  NDP no poisoning dmac=bcast   all_nodes na lladdr=null"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ALL_NODES}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.3  NDP no poisoning dmac=allnode all_nodes na lladdr=bcast"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ALL_NODES}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.4  NDP no poisoning dmac=allnode all_nodes na lladdr=null"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ALL_NODES}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.5  NDP no poisoning dmac=bcast   all_nodes na lladdr=bcast"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ALL_NODES}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.6  NDP no poisoning dmac=bcast   all_nodes na lladdr=null"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ALL_NODES}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.7  NDP no poisoning dmac=allnode all_nodes na lladdr=bcast"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ALL_NODES}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.8  NDP no poisoning dmac=allnode all_nodes na lladdr=null"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ALL_NODES}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Broadcast / All node MAC, Targeted IP announce
+	msg="2.9  NDP no poisoning dmac=bcast   targeted  na lladdr=bcast"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ADDR1}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.10 NDP no poisoning dmac=bcast   targeted  na lladdr=null"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ADDR1}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.11 NDP no poisoning dmac=allnode targeted  na lladdr=bcast"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ADDR1}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.12 NDP no poisoning dmac=allnode targeted  na lladdr=null"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ADDR1}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.13 NDP no poisoning dmac=bcast   targeted  na lladdr=bcast"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ADDR1}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.14 NDP no poisoning dmac=bcast   targeted  na lladdr=null"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ADDR1}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.15 NDP no poisoning dmac=allnode targeted  na lladdr=bcast"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ADDR1}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.16 NDP no poisoning dmac=allnode targeted  na lladdr=null"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ADDR1}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Targeted MAC, Targeted IP announce
+	msg="2.17 NDP no poisoning dmac=veth1   targeted  na lladdr=bcast"
+	run_no_ndp_poisoning "${veth1_mac}" "${V6_ADDR1}" "${NA}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.18 NDP no poisoning dmac=veth1   targeted  na lladdr=null"
+	run_no_ndp_poisoning "${veth1_mac}" "${V6_ADDR1}" "${NA}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Poison with MCAST mac
+	msg="2.19 NDP no poisoning dmac=allnode all_nodes na lladdr=mcast"
+	run_no_ndp_poisoning "${V6_ALL_NODE_MAC}" "${V6_ALL_NODES}" "${NA}" "${MCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+	msg="2.20 NDP no poisoning dmac=veth1   targeted  na lladdr=mcast"
+	run_no_ndp_poisoning "${veth1_mac}" "${V6_ADDR1}" "${NA}" "${MCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	## NS
+	# Broadcast / SolNode node MAC, SolNode IP solic
+	msg="2.21 NDP no poisoning dmac=bcast   solnode   ns lladdr=bcast"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_SOL_NODE1}" "${NS}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.22 NDP no poisoning dmac=bcast   solnode   ns lladdr=null"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_SOL_NODE1}" "${NS}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.23 NDP no poisoning dmac=solnode solnode   ns lladdr=bcast"
+	run_no_ndp_poisoning "${V6_SOL_NODE_MAC1}" "${V6_SOL_NODE1}" "${NS}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.24 NDP no poisoning dmac=solnode solnode   ns lladdr=null"
+	run_no_ndp_poisoning "${V6_SOL_NODE_MAC1}" "${V6_SOL_NODE1}" "${NS}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Broadcast / SolNode node MAC, target IP solic
+	msg="2.25 NDP no poisoning dmac=bcast   target    ns lladdr=bcast"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ADDR1}" "${NS}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.26 NDP no poisoning dmac=bcast   target    ns lladdr=null"
+	run_no_ndp_poisoning "${BCAST_MAC}" "${V6_ADDR1}" "${NS}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.27 NDP no poisoning dmac=solnode target    ns lladdr=bcast"
+	run_no_ndp_poisoning "${V6_SOL_NODE_MAC1}" "${V6_ADDR1}" "${NS}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.28 NDP no poisoning dmac=solnode target    ns lladdr=null"
+	run_no_ndp_poisoning "${V6_SOL_NODE_MAC1}" "${V6_ADDR1}" "${NS}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Targeted MAC, Targeted IP solic
+	msg="2.29 NDP no poisoning dmac=veth1   target    ns lladdr=bcast"
+	run_no_ndp_poisoning "${veth1_mac}" "${V6_ADDR1}" "${NS}" "${BCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.30 NDP no poisoning dmac=veth1   target    ns lladdr=null"
+	run_no_ndp_poisoning "${veth1_mac}" "${V6_ADDR1}" "${NS}" "${NULL_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Poison with MCAST mac
+	msg="2.31 NDP no poisoning dmac=solnode solnode   ns lladdr=mcast"
+	run_no_ndp_poisoning "${V6_SOL_NODE_MAC1}" "${V6_SOL_NODE1}" "${NS}" "${MCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="2.32 NDP no poisoning dmac=veth1   target    ns lladdr=mcast"
+	run_no_ndp_poisoning "${veth1_mac}" "${V6_ADDR1}" "${NS}" "${MCAST_MAC}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	cleanup
+
+	printf '%b' "${results}"
+}
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit "${ksft_skip}"
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+	echo "SKIP: Could not run test without ip tool"
+	exit "${ksft_skip}"
+fi
+
+run_all_tests
+exit "${ret}"
diff --git a/tools/testing/selftests/net/arp_no_invalid_sha_poision.sh b/tools/testing/selftests/net/arp_no_invalid_sha_poision.sh
deleted file mode 100755
index 755dd31212c8..000000000000
--- a/tools/testing/selftests/net/arp_no_invalid_sha_poision.sh
+++ /dev/null
@@ -1,173 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-#
-# Tests that ARP announcements with Broadcast, Multicast or NULL mac are never
-# accepted
-#
-
-source lib.sh
-
-readonly V4_ADDR0="10.0.10.1"
-readonly V4_ADDR1="10.0.10.2"
-readonly BCAST_MAC="ff:ff:ff:ff:ff:ff"
-readonly MCAST_MAC="01:00:5e:00:00:00"
-readonly NULL_MAC="00:00:00:00:00:00"
-readonly VALID_MAC="02:01:02:03:04:05"
-readonly ARP_REQ="request"
-readonly ARP_REPLY="reply"
-ret=0
-veth1_mac=
-
-setup() {
-	setup_ns PEER_NS
-
-	ip link add name veth0 type veth peer name veth1
-	ip link set dev veth0 up
-	ip link set dev veth1 netns "${PEER_NS}"
-	ip netns exec "${PEER_NS}" ip link set dev veth1 up
-	ip addr add "${V4_ADDR0}"/24 dev veth0
-	ip netns exec "${PEER_NS}" ip addr add "${V4_ADDR1}"/24 dev veth1
-	ip netns exec "${PEER_NS}" ip route add default via "${V4_ADDR0}" dev veth1
-
-	# Raise ARP timers to avoid flakes due to refreshes
-	sysctl -w net.ipv4.neigh.veth0.base_reachable_time=3600 \
-		>/dev/null 2>&1
-	ip netns exec "${PEER_NS}" \
-		sysctl -w net.ipv4.neigh.veth1.gc_stale_time=3600 \
-		>/dev/null 2>&1
-	ip netns exec "${PEER_NS}" \
-		sysctl -w net.ipv4.neigh.veth1.base_reachable_time=3600 \
-		>/dev/null 2>&1
-
-	veth1_mac="$(ip netns exec "${PEER_NS}" ip -j link show veth1 | \
-		jq -r '.[0].address' )"
-}
-
-cleanup() {
-	ip neigh flush dev veth0
-	ip link del veth0
-	cleanup_ns "${PEER_NS}"
-}
-
-# Make sure ARP announcement with invalid MAC is never learnt
-run_no_arp_poisoning() {
-	local l2_dmac="${1}"
-	local tmac="${2}"
-	local op="${3}"
-
-	ret=0
-
-	ip netns exec "${PEER_NS}" ip neigh flush dev veth1 >/dev/null 2>&1
-	ip netns exec "${PEER_NS}" ping -c 1 "${V4_ADDR0}" >/dev/null 2>&1
-
-	# Poison with a valid MAC to ensure injection is working
-	mausezahn "veth0" -q -a "${VALID_MAC}" -b "${BCAST_MAC}" -t arp \
-		  "${op}, sip=${V4_ADDR0}, tip=${V4_ADDR0}, smac=${VALID_MAC}, tmac=${VALID_MAC}"
-
-	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}" | \
-		grep "${VALID_MAC}")
-	if [ "${neigh}" == "" ]; then
-		echo "ERROR: unable to ARP poision with a valid MAC ${VALID_MAC}"
-		ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}"
-		ret=1
-		return
-	fi
-
-	# Poison with tmac
-	mausezahn "veth0" -q -a "${VALID_MAC}" -b "${l2_dmac}" -t arp \
-		  "${op}, sip=${V4_ADDR0}, tip=${V4_ADDR0}, smac=${tmac}, tmac=${tmac}"
-
-	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}" | \
-		grep "${tmac}")
-	if [ "${neigh}" != "" ]; then
-		echo "ERROR: ARP entry learnt for ${tmac} announcement."
-		ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}"
-		ret=1
-		return
-	fi
-}
-
-print_test_result() {
-	local msg="${1}"
-	local rc="${2}"
-
-	if [ "${rc}" == 0 ]; then
-		printf "TEST: %-60s  [ OK ]" "${msg}"
-	else
-		printf "TEST: %-60s  [ FAIL ]" "${msg}"
-	fi
-}
-
-run_all_tests() {
-	local results
-
-	setup
-
-	## ARP
-	# Broadcast gARPs
-	msg="1.1  ARP no poisoning dmac=bcast reply sha=bcast"
-	run_no_arp_poisoning "${BCAST_MAC}" "${BCAST_MAC}" "${ARP_REPLY}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.2  ARP no poisoning dmac=bcast reply sha=null"
-	run_no_arp_poisoning "${BCAST_MAC}" "${NULL_MAC}" "${ARP_REPLY}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.3  ARP no poisoning dmac=bcast req   sha=bcast"
-	run_no_arp_poisoning "${BCAST_MAC}" "${BCAST_MAC}" "${ARP_REQ}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.4  ARP no poisoning dmac=bcast req   sha=null"
-	run_no_arp_poisoning "${BCAST_MAC}" "${NULL_MAC}" "${ARP_REQ}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.5  ARP no poisoning dmac=bcast req   sha=mcast"
-	run_no_arp_poisoning "${BCAST_MAC}" "${MCAST_MAC}" "${ARP_REQ}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.6  ARP no poisoning dmac=bcast reply sha=mcast"
-	run_no_arp_poisoning "${BCAST_MAC}" "${MCAST_MAC}" "${ARP_REPLY}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	# Targeted gARPs
-	msg="1.7  ARP no poisoning dmac=veth0 reply sha=bcast"
-	run_no_arp_poisoning "${veth1_mac}" "${BCAST_MAC}" "${ARP_REPLY}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.8  ARP no poisoning dmac=veth0 reply sha=null"
-	run_no_arp_poisoning "${veth1_mac}" "${NULL_MAC}" "${ARP_REPLY}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.9  ARP no poisoning dmac=veth0 req   sha=bcast"
-	run_no_arp_poisoning "${veth1_mac}" "${BCAST_MAC}" "${ARP_REQ}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.10 ARP no poisoning dmac=veth0 req   sha=null"
-	run_no_arp_poisoning "${veth1_mac}" "${NULL_MAC}" "${ARP_REQ}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.11 ARP no poisoning dmac=veth0 req   sha=mcast"
-	run_no_arp_poisoning "${veth1_mac}" "${MCAST_MAC}" "${ARP_REQ}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	msg="1.12 ARP no poisoning dmac=veth0 reply sha=mcast"
-	run_no_arp_poisoning "${veth1_mac}" "${MCAST_MAC}" "${ARP_REPLY}"
-	results+="$(print_test_result "${msg}" "${ret}")\n"
-
-	cleanup
-
-	printf '%b' "${results}"
-}
-
-if [ "$(id -u)" -ne 0 ];then
-	echo "SKIP: Need root privileges"
-	exit "${ksft_skip}"
-fi
-
-if [ ! -x "$(command -v ip)" ]; then
-	echo "SKIP: Could not run test without ip tool"
-	exit "${ksft_skip}"
-fi
-
-run_all_tests
-exit "${ret}"
diff --git a/tools/testing/selftests/net/ndisc_send.c b/tools/testing/selftests/net/ndisc_send.c
new file mode 100644
index 000000000000..4f226221d079
--- /dev/null
+++ b/tools/testing/selftests/net/ndisc_send.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <inttypes.h>
+#include <netinet/ether.h>
+#include <arpa/inet.h>
+
+#include <linux/if_packet.h>
+#include <linux/if_ether.h>
+#include <linux/ipv6.h>
+#include <linux/icmpv6.h>
+
+#define ICMPV6_ND_NS 135
+#define ICMPV6_ND_NA 136
+#define ICMPV6_ND_SLLADR 1
+#define ICMPV6_ND_TLLADR 2
+
+#ifndef __noinline
+#define __noinline __attribute__((noinline))
+#endif
+#ifndef __packed
+#define __packed __attribute__((packed))
+#endif
+
+struct icmp6_pseudohdr {
+	struct in6_addr saddr;
+	struct in6_addr daddr;
+	uint32_t plen;
+	uint8_t zero[3];
+	uint8_t next;
+};
+
+struct ndisc_pkt {
+	struct ethhdr eth;
+	struct ipv6hdr ip6;
+	struct ndp_hdrs {
+		struct icmp6hdr hdr;
+		struct in6_addr target;
+
+		uint8_t opt_type;
+		uint8_t opt_len;
+		uint8_t opt_mac[ETH_ALEN];
+	} __packed ndp;
+} __packed;
+
+__noinline uint32_t csum_add(void *buf, int len, uint32_t sum)
+{
+	uint16_t *p = (uint16_t *)buf;
+
+	while (len > 1) {
+		sum += *p++;
+		len -= 2;
+	}
+
+	if (len)
+		sum += *(uint8_t *)p;
+
+	return sum;
+}
+
+static uint16_t csum_fold(uint32_t sum)
+{
+	return (uint16_t)~(sum % 0xffff); /* folding all carries == mod 0xffff */
+}
+
+int parse_opts(int argc, char **argv, int *ifindex, struct ndisc_pkt *pkt)
+{
+	struct ether_addr *mac;
+	uint16_t op;
+	struct icmp6_pseudohdr ph = {0};
+	uint32_t sum = 0;
+
+	if (argc != 9) {
+		fprintf(stderr, "Usage: %s <ifindex> <mac_dst> <mac_src> <dst_ip> <src_ip> <target_ip> <op> <lladdr>\n",
+			argv[0]);
+		return -1;
+	}
+
+	*ifindex = atoi(argv[1]);
+	mac = ether_aton(argv[2]);
+	if (!mac) {
+		fprintf(stderr, "Unable to parse mac_dst from '%s'\n", argv[2]);
+		return -1;
+	}
+
+	/* Ethernet */
+	memcpy(pkt->eth.h_dest, mac, ETH_ALEN);
+	mac = ether_aton(argv[3]);
+	if (!mac) {
+		fprintf(stderr, "Unable to parse mac_src from '%s'\n", argv[3]);
+		return -1;
+	}
+	memcpy(pkt->eth.h_source, mac, ETH_ALEN);
+	pkt->eth.h_proto = htons(ETH_P_IPV6);
+
+	/* IPv6 */
+	pkt->ip6.version = 6;
+	pkt->ip6.nexthdr = IPPROTO_ICMPV6;
+	pkt->ip6.hop_limit = 255;
+
+	if (inet_pton(AF_INET6, argv[4], &pkt->ip6.daddr) != 1) {
+		fprintf(stderr, "Unable to parse dst_ip from '%s'\n", argv[4]);
+		return -1;
+	}
+	if (inet_pton(AF_INET6, argv[5], &pkt->ip6.saddr) != 1) {
+		fprintf(stderr, "Unable to parse src_ip from '%s'\n", argv[5]);
+		return -1;
+	}
+
+	/* ICMPv6 */
+	op = atoi(argv[7]);
+	if (op != ICMPV6_ND_NS && op != ICMPV6_ND_NA) {
+		fprintf(stderr, "Invalid ICMPv6 op %d\n", op);
+		return -1;
+	}
+
+	pkt->ndp.hdr.icmp6_type = op;
+	pkt->ndp.hdr.icmp6_code = 0;
+
+	if (inet_pton(AF_INET6, argv[6], &pkt->ndp.target) != 1) {
+		fprintf(stderr, "Unable to parse target_ip from '%s'\n",
+			argv[6]);
+		return -1;
+	}
+
+	/* Target/Source Link-Layer Address */
+	if (op == ICMPV6_ND_NS) {
+		pkt->ndp.opt_type = ICMPV6_ND_SLLADR;
+	} else {
+		pkt->ndp.opt_type = ICMPV6_ND_TLLADR;
+		pkt->ndp.hdr.icmp6_override = 1;
+	}
+	pkt->ndp.opt_len = 1;
+
+	mac = ether_aton(argv[8]);
+	if (!mac) {
+		fprintf(stderr, "Invalid lladdr %s\n", argv[8]);
+		return -1;
+	}
+
+	memcpy(pkt->ndp.opt_mac, mac, ETH_ALEN);
+
+	pkt->ip6.payload_len = htons(sizeof(pkt->ndp));
+
+	/* Pseudoheader */
+	ph.saddr = pkt->ip6.saddr;
+	ph.daddr = pkt->ip6.daddr;
+	ph.plen = htonl(sizeof(pkt->ndp));
+	ph.next = IPPROTO_ICMPV6;
+
+	sum = csum_add(&ph, sizeof(ph), 0);
+	sum = csum_add(&pkt->ndp, sizeof(pkt->ndp), sum);
+
+	pkt->ndp.hdr.icmp6_cksum = csum_fold(sum);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int rc, fd;
+	struct sockaddr_ll bind_addr = {0};
+	int ifindex;
+	struct ndisc_pkt pkt = {0};
+
+	if (parse_opts(argc, argv, &ifindex, &pkt) < 0)
+		return -1;
+
+	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+	if (fd < 0) {
+		fprintf(stderr, "Unable to open raw socket(%d). Need root privileges?\n",
+			fd);
+		return 1;
+	}
+
+	bind_addr.sll_family   = AF_PACKET;
+	bind_addr.sll_protocol = htons(ETH_P_ALL);
+	bind_addr.sll_ifindex  = ifindex;
+
+	rc = bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));
+	if (rc < 0) {
+		fprintf(stderr, "Unable to bind raw socket(%d). Invalid iface '%d'?\n",
+			rc, ifindex);
+		return 1;
+	}
+
+	rc = send(fd, &pkt, sizeof(pkt), 0);
+	if (rc < 0) {
+		fprintf(stderr, "Unable to send packet: %d\n", rc);
+		return 1;
+	}
+
+	return 0;
+}
-- 
2.47.3


^ permalink raw reply related

* [PATCH net-next v4 3/4] neigh: discard invalid lladdr (b/mcast poison)
From: Marc Suñé @ 2026-02-07 20:40 UTC (permalink / raw)
  To: kuba, willemdebruijn.kernel, pabeni
  Cc: netdev, dborkman, vadim.fedorenko, Marc Suñé
In-Reply-To: <cover.1770399836.git.marcdevel@gmail.com>

Prior to this commit, the NDP implementation accepted NDP NS/NA with
the broadcast, multicast and null addresses as src/dst lladdr, and
updated the neighbour cache for that host.

Broadcast, multicast and null MAC addresses shall never be associated
with a unicast or a multicast IPv6 address (see RFC1812, section 3.3.2).

NDP poisoning with a broadcast MAC and certain multicast MAC addresses,
especially when poisoning a Gateway IP, has some undesired implications
compared to NDP poisoning with a regular MAC (see ARP bcast poison
commit for more details).

Since these MACs should never be announced, discard/drop NDP with
lladdr={b/mcast, null}, which prevents the broadcast/multicast NDP
poisoning vector.

Signed-off-by: Marc Suñé <marcdevel@gmail.com>
---
 net/ipv6/ndisc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f6a5d8c73af9..8fe215505869 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -830,6 +830,14 @@ static enum skb_drop_reason ndisc_recv_ns(struct sk_buff *skb)
 			return reason;
 		}

+		/*
+		 * Broadcast/Multicast and zero MAC addresses should
+		 * never be announced and accepted as llsrc address (prevent
+		 * NDP B/MCAST MAC poisoning attack).
+		 */
+		if (dev->type == ARPHRD_ETHER && !is_valid_ether_addr(lladdr))
+			return reason;
+
 		/* RFC2461 7.1.1:
 		 *	If the IP source address is the unspecified address,
 		 *	there MUST NOT be source link-layer address option
@@ -1033,6 +1041,14 @@ static enum skb_drop_reason ndisc_recv_na(struct sk_buff *skb)
 			net_dbg_ratelimited("NA: invalid link-layer address length\n");
 			return reason;
 		}
+
+		/*
+		 * Broadcast/Multicast and zero MAC addresses should
+		 * never be announced and accepted as lltgt address (prevent
+		 * NDP B/MCAST MAC poisoning attack).
+		 */
+		if (dev->type == ARPHRD_ETHER && !is_valid_ether_addr(lladdr))
+			return reason;
 	}
 	ifp = ipv6_get_ifaddr(dev_net(dev), &msg->target, dev, 1);
 	if (ifp) {
-- 
2.47.3

^ permalink raw reply related

* [PATCH net-next v4 2/4] selftests/net: add no ARP b/mcast,null poison test
From: Marc Suñé @ 2026-02-07 20:40 UTC (permalink / raw)
  To: kuba, willemdebruijn.kernel, pabeni
  Cc: netdev, dborkman, vadim.fedorenko, Marc Suñé
In-Reply-To: <cover.1770399836.git.marcdevel@gmail.com>

Add a selftest to test that ARP bcast/mcast/null poisoning checks
are never bypassed.

Signed-off-by: Marc Suñé <marcdevel@gmail.com>
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../net/arp_no_invalid_sha_poision.sh         | 173 ++++++++++++++++++
 2 files changed, 174 insertions(+)
 create mode 100755 tools/testing/selftests/net/arp_no_invalid_sha_poision.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index afdea6d95bde..c6d571bf72be 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -11,6 +11,7 @@ TEST_PROGS := \
 	amt.sh \
 	arp_ndisc_evict_nocarrier.sh \
 	arp_ndisc_untracked_subnets.sh \
+	arp_no_invalid_sha_poision.sh \
 	bareudp.sh \
 	big_tcp.sh \
 	bind_bhash.sh \
diff --git a/tools/testing/selftests/net/arp_no_invalid_sha_poision.sh b/tools/testing/selftests/net/arp_no_invalid_sha_poision.sh
new file mode 100755
index 000000000000..755dd31212c8
--- /dev/null
+++ b/tools/testing/selftests/net/arp_no_invalid_sha_poision.sh
@@ -0,0 +1,173 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Tests that ARP announcements with Broadcast, Multicast or NULL mac are never
+# accepted
+#
+
+source lib.sh
+
+readonly V4_ADDR0="10.0.10.1"
+readonly V4_ADDR1="10.0.10.2"
+readonly BCAST_MAC="ff:ff:ff:ff:ff:ff"
+readonly MCAST_MAC="01:00:5e:00:00:00"
+readonly NULL_MAC="00:00:00:00:00:00"
+readonly VALID_MAC="02:01:02:03:04:05"
+readonly ARP_REQ="request"
+readonly ARP_REPLY="reply"
+ret=0
+veth1_mac=
+
+setup() {
+	setup_ns PEER_NS
+
+	ip link add name veth0 type veth peer name veth1
+	ip link set dev veth0 up
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip netns exec "${PEER_NS}" ip link set dev veth1 up
+	ip addr add "${V4_ADDR0}"/24 dev veth0
+	ip netns exec "${PEER_NS}" ip addr add "${V4_ADDR1}"/24 dev veth1
+	ip netns exec "${PEER_NS}" ip route add default via "${V4_ADDR0}" dev veth1
+
+	# Raise ARP timers to avoid flakes due to refreshes
+	sysctl -w net.ipv4.neigh.veth0.base_reachable_time=3600 \
+		>/dev/null 2>&1
+	ip netns exec "${PEER_NS}" \
+		sysctl -w net.ipv4.neigh.veth1.gc_stale_time=3600 \
+		>/dev/null 2>&1
+	ip netns exec "${PEER_NS}" \
+		sysctl -w net.ipv4.neigh.veth1.base_reachable_time=3600 \
+		>/dev/null 2>&1
+
+	veth1_mac="$(ip netns exec "${PEER_NS}" ip -j link show veth1 | \
+		jq -r '.[0].address' )"
+}
+
+cleanup() {
+	ip neigh flush dev veth0
+	ip link del veth0
+	cleanup_ns "${PEER_NS}"
+}
+
+# Make sure ARP announcement with invalid MAC is never learnt
+run_no_arp_poisoning() {
+	local l2_dmac="${1}"
+	local tmac="${2}"
+	local op="${3}"
+
+	ret=0
+
+	ip netns exec "${PEER_NS}" ip neigh flush dev veth1 >/dev/null 2>&1
+	ip netns exec "${PEER_NS}" ping -c 1 "${V4_ADDR0}" >/dev/null 2>&1
+
+	# Poison with a valid MAC to ensure injection is working
+	mausezahn "veth0" -q -a "${VALID_MAC}" -b "${BCAST_MAC}" -t arp \
+		  "${op}, sip=${V4_ADDR0}, tip=${V4_ADDR0}, smac=${VALID_MAC}, tmac=${VALID_MAC}"
+
+	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}" | \
+		grep "${VALID_MAC}")
+	if [ "${neigh}" == "" ]; then
+		echo "ERROR: unable to ARP poison with a valid MAC ${VALID_MAC}"
+		ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}"
+		ret=1
+		return
+	fi
+
+	# Poison with tmac
+	mausezahn "veth0" -q -a "${VALID_MAC}" -b "${l2_dmac}" -t arp \
+		  "${op}, sip=${V4_ADDR0}, tip=${V4_ADDR0}, smac=${tmac}, tmac=${tmac}"
+
+	neigh=$(ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}" | \
+		grep "${tmac}")
+	if [ "${neigh}" != "" ]; then
+		echo "ERROR: ARP entry learnt for ${tmac} announcement."
+		ip netns exec "${PEER_NS}" ip neigh show "${V4_ADDR0}"
+		ret=1
+		return
+	fi
+}
+
+print_test_result() {
+	local msg="${1}"
+	local rc="${2}"
+
+	if [ "${rc}" == 0 ]; then
+		printf "TEST: %-60s  [ OK ]" "${msg}"
+	else
+		printf "TEST: %-60s  [ FAIL ]" "${msg}"
+	fi
+}
+
+run_all_tests() {
+	local results
+
+	setup
+
+	## ARP
+	# Broadcast gARPs
+	msg="1.1  ARP no poisoning dmac=bcast reply sha=bcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${BCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.2  ARP no poisoning dmac=bcast reply sha=null"
+	run_no_arp_poisoning "${BCAST_MAC}" "${NULL_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.3  ARP no poisoning dmac=bcast req   sha=bcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${BCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.4  ARP no poisoning dmac=bcast req   sha=null"
+	run_no_arp_poisoning "${BCAST_MAC}" "${NULL_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.5  ARP no poisoning dmac=bcast req   sha=mcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${MCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.6  ARP no poisoning dmac=bcast reply sha=mcast"
+	run_no_arp_poisoning "${BCAST_MAC}" "${MCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	# Targeted gARPs
+	msg="1.7  ARP no poisoning dmac=veth0 reply sha=bcast"
+	run_no_arp_poisoning "${veth1_mac}" "${BCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.8  ARP no poisoning dmac=veth0 reply sha=null"
+	run_no_arp_poisoning "${veth1_mac}" "${NULL_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.9  ARP no poisoning dmac=veth0 req   sha=bcast"
+	run_no_arp_poisoning "${veth1_mac}" "${BCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.10 ARP no poisoning dmac=veth0 req   sha=null"
+	run_no_arp_poisoning "${veth1_mac}" "${NULL_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.11 ARP no poisoning dmac=veth0 req   sha=mcast"
+	run_no_arp_poisoning "${veth1_mac}" "${MCAST_MAC}" "${ARP_REQ}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	msg="1.12 ARP no poisoning dmac=veth0 reply sha=mcast"
+	run_no_arp_poisoning "${veth1_mac}" "${MCAST_MAC}" "${ARP_REPLY}"
+	results+="$(print_test_result "${msg}" "${ret}")\n"
+
+	cleanup
+
+	printf '%b' "${results}"
+}
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit "${ksft_skip}"
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+	echo "SKIP: Could not run test without ip tool"
+	exit "${ksft_skip}"
+fi
+
+run_all_tests
+exit "${ret}"
-- 
2.47.3


^ permalink raw reply related

* [PATCH net-next v4 1/4] arp: discard invalid sha addr (b/mcast ARP poison)
From: Marc Suñé @ 2026-02-07 20:40 UTC (permalink / raw)
  To: kuba, willemdebruijn.kernel, pabeni
  Cc: netdev, dborkman, vadim.fedorenko, Marc Suñé
In-Reply-To: <cover.1770399836.git.marcdevel@gmail.com>

Prior to this commit, the ARP implementation accepted ARP req/replies
with multicast (including broadcast) and null MAC addresses as Sender
HW Address (SHA), and updated the ARP cache for that neighbour.
Broadcast, multicast and null MAC addresses shall never be associated
with a unicast or a multicast IPv4 address (see RFC1812, section 3.3.2).

ARP poisoning with a broadcast MAC address and certain multicast
addresses, especially when poisoning a Gateway IP, has some undesired
implications compared to ARP poisoning with a regular MAC (see
Note1).

It is worth mentioning that if an attacker is able to ARP poison in
an L2 segment, that in itself is probably a bigger security threat
(man-in-the-middle etc., see Note2).

However, since these MACs should never be announced as SHA,
discard/drop ARPs with SHA={b/mcast, null}, which prevents the
broadcast/multicast ARP poisoning vector.

Note1:

After a successful broadcast/multicast ARP poisoning attack:

1. Unicast packets and refresh ("targeted") ARPs sent to or via
   the poisoned IP (e.g. the default GW) are flooded by
   bridges/switches. That is in the absence of other security controls.

   Hardware switches generally have rate-limits to prevent/mitigate
   broadcast storms, since ARPs are usually answered by the CPU.
   Legit unicast packets could be dropped (perf. degradation).

   Most modern NICs implement some form of L2 MAC filtering to early
   discard irrelevant packets. In contrast to an ARP poisoning
   attack with any other MAC, both unicast and ARP ("targeted")
   refresh packets are passed up to the Kernel networking stack
   (for all hosts in the L2 segment).

2. A single forged ARP packet (e.g. for the Gateway IP) can produce
   up to N "targeted" (to broadcast) ARPs, where N is the number of
   hosts in the L2 segment that have an ARP entry for that IP
   (e.g. GW), and some more traffic, since the real host will answer
   to targeted refresh ARPs with their (real) reply.

   This is a relatively low amount of traffic compared to 1).

3. An attacker could use this form of ARP poisoning to discover
   all hosts in a L2 segment in a very short period of time with
   one or few packets.

   By poisoning e.g. the default GW (likely multiple times, to
   avoid races with real gARPs from the GW), all hosts will eventually
   issue refresh "targeted" ARPs for the GW IP with the broadcast MAC
   address as destination. These packets will be flooded in the L2
   segment, revealing the presence of hosts to the attacker.

   For comparison:
     * Passive ARP monitoring: also stealthy, but can take a long
       time or not be possible at all in switches, as most refresh
       ARPs are targeted.
     * ARP req flooding: requires sweeping the entire subnet. Noisy
       and easy to detect.
     * ICMP/L4 port scans: similar to the above.

4. In the unlikely case that hosts were to run with
   `/proc/sys/net/ipv4/conf/*/arp_accept=1` (unsafe, and disabled
   by default), poisoning with the broadcast MAC could be used to
   create significantly more broadcast traffic (low-volume
   amplification attack).

   An attacker could send M fake gARP with a number of IP addresses,
   where M is `/proc/sys/net/ipv4/neigh/*/gc_thresh3` (1024 by
   default). This would result in M x R ARPs, where R is the number
   of hosts in L2 segment with `arp_accept=1`, and likely other
   (real) ARP replies coming from the attacked host. This starts to
   get really relevant when R > 512, which is possible in large LANs
   but not very common.
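
   As a worked instance of the M x R estimate in point 4, here is a
   minimal sketch in C. The function name is hypothetical; the inputs
   are the figures quoted above (M = 1024, the gc_thresh3 default, and
   R = 512 hosts with arp_accept=1).

```c
/*
 * Hypothetical helper: upper bound on the broadcast ARPs triggered by
 * m forged gARPs when r hosts in the L2 segment accept them.
 */
static unsigned long long sketch_arp_amplification(unsigned long long m,
						   unsigned long long r)
{
	return m * r;
}
```

   With M = 1024 and R = 512 this gives 524288 broadcast ARPs, before
   counting any real ARP replies from the attacked host.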

Note2:

However, broadcast ARP poisoning might be subtle and difficult to
spot. These ARP packets appear on the surface as regular broadcast
ARP requests (unless ARP hdr is inspected), traffic continues to
flow uninterrupted (unless broadcast rate limits in switches kick in),
and the next refresh ARP reply (from the GW) or any (valid) gARP
from the GW will restore the original MAC in the ARP table, making
the traffic flow normally again.

Signed-off-by: Marc Suñé <marcdevel@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
---
 net/ipv4/arp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index c8c3e1713c0e..7dbb3fd5cc8a 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -800,6 +800,14 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 		goto out_free_skb;

 /*
+ *	For Ethernet devices, Multicast/Broadcast and zero MAC addresses should
+ *	never be announced and accepted as sender HW address (RFC1812, 3.3.2).
+ *	Prevents Broadcast/Mcast ARP poisoning attack.
+ */
+	if (dev->type == ARPHRD_ETHER && !is_valid_ether_addr(sha))
+		goto out_free_skb;
+
+/*
  *     Special case: We must set Frame Relay source Q.922 address
  */
 	if (dev_type == ARPHRD_DLCI)
-- 
2.47.3

^ permalink raw reply related

* [PATCH net-next v4 0/4] discard ARP/NDP b/mcast/null announce (poison)
From: Marc Suñé @ 2026-02-07 20:40 UTC (permalink / raw)
  To: kuba, willemdebruijn.kernel, pabeni
  Cc: netdev, dborkman, vadim.fedorenko, Marc Suñé

The current ARP and NDP implementations accept announcements with
multicast (broadcast incl.) and null MAC addresses as Sender HW Address
(SHA) in ARP or src/target lladdr in NDP, and update the cache
for that neighbour.

Multicast (incl. broadcast) and null MAC addresses shall never be
associated with a unicast or a multicast IPv4/6 address (see RFC1812,
section 3.3.2).

ARP/NDP poisoning with a broadcast and certain multicast MAC addresses,
especially when poisoning a Gateway IP, has some undesired implications
compared to ARP/NDP poisoning with a regular MAC (see commit message
in patch 1 for more information).

It is worth mentioning that if an attacker is able to ARP/NDP poison in
an L2 segment, that in itself is probably a bigger security threat
(man-in-the-middle etc., see Note2 in patch 1).

Since these MACs should never be announced, this patch series discards/drops
these packets, which prevents broadcast and multicast ARP/NDP poisoning
vectors.

This patchset only modifies the behaviour of the neighbouring subsystem
when processing network packets. Static entries can still be added with
mcast/bcast/null MACs.
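
For illustration, the validity test the series relies on can be sketched in
userspace C. This mirrors the documented semantics of the kernel's
is_valid_ether_addr() (neither multicast nor all-zero; broadcast is a special
case of multicast because the group bit is set), but the sketch_* names below
are hypothetical and this is not the kernel code itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_ALEN 6

/* Group (multicast) bit: least significant bit of the first octet. */
static bool sketch_is_multicast_ether_addr(const uint8_t *addr)
{
	return addr[0] & 0x01;
}

/* True when all six octets are zero. */
static bool sketch_is_zero_ether_addr(const uint8_t *addr)
{
	size_t i;

	for (i = 0; i < SKETCH_ETH_ALEN; i++)
		if (addr[i])
			return false;
	return true;
}

/* Valid means a non-zero unicast address. ff:ff:ff:ff:ff:ff fails too. */
static bool sketch_is_valid_ether_addr(const uint8_t *addr)
{
	return !sketch_is_multicast_ether_addr(addr) &&
	       !sketch_is_zero_ether_addr(addr);
}
```

With the addresses used in the selftests, ff:ff:ff:ff:ff:ff,
01:00:5e:00:00:00 and 00:00:00:00:00:00 are rejected, while
02:01:02:03:04:05 is accepted.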

v3: https://lore.kernel.org/netdev/cover.1770241104.git.marcdevel@gmail.com/
v2: https://lore.kernel.org/netdev/cover.1769464405.git.marcdevel@gmail.com/
v1: https://lore.kernel.org/netdev/cover.1766349632.git.marcdevel@gmail.com/

Changes since v3
================
  - Respin: rebase on top of net-next

Changes since v2
================
  - Target net-next instead of net
  - Use mausezahn for patch2 and remove arp_send.c
  - Kept ndisc_send.c for patch 4, as ndisc6 and mausezahn are not valid
    options (see comment)
  - Fixed comment llsrc->lltgt (AI review)
  - Misc fixes: shellcheck, alphabetical order in Makefile

Changes since RFC v1
====================
  - Discard announcements with multicast MAC addresses
  - Check for dev->type == ARPHRD_ETHER instead of HW addrlen in ARP
  - Use !is_valid_ether_addr()
  - Added multicast test coverage and renamed tests accordingly
  - Dropped patch 5 (scapy utils)

Comments
========

On `ndisc_send.c` alternatives, ndisc6 and extending mausezahn were not viable
options. Submitted with `ndisc_send.c`, as preferred by maintainers.

Having said that, I still think Scapy is the best tool for this sort of packet
generation, and can simplify - perhaps at a cost of some test execution time -
selftest creation and maintenance. As also mentioned in
../bpf/generate_udp_fragments.py, it's sort of industry standard, and is widely
used for dataplane functional testing. I've personally used it in several
organizations to functionally test ASICs, NPUs and SW datapaths.

If maintainers are willing to reconsider that, I'd be happy to work on
transitioning the existing selftests (using ndisc6, mausezahn...) towards Scapy.

Marc Suñé (4):
  arp: discard invalid sha addr (b/mcast ARP poison)
  selftests/net: add no ARP b/mcast,null poison test
  neigh: discard invalid lladdr (b/mcast poison)
  selftests/net: add no NDP b/mcast,null poison test

 net/ipv4/arp.c                                |   8 +
 net/ipv6/ndisc.c                              |  16 +
 tools/testing/selftests/net/.gitignore        |   1 +
 tools/testing/selftests/net/Makefile          |   3 +
 .../net/arp_ndisc_no_invalid_sha_poison.sh    | 368 ++++++++++++++++++
 tools/testing/selftests/net/ndisc_send.c      | 198 ++++++++++
 6 files changed, 594 insertions(+)
 create mode 100755 tools/testing/selftests/net/arp_ndisc_no_invalid_sha_poison.sh
 create mode 100644 tools/testing/selftests/net/ndisc_send.c

-- 
2.47.3

^ permalink raw reply

* Re: [BUG] vsock: poll() not waking on data arrival, causing multi-second SSH delays
From: agpn1b92 @ 2026-02-07 18:56 UTC (permalink / raw)
  To: sgarzare; +Cc: virtualization, netdev
In-Reply-To: <aYYTder4-zvgOfV7@sgarzare-redhat>

Hi Stefano and all,

Thank you Stefano for your response and skepticism about whether this was
a kernel issue - you were absolutely right to question it!

After extensive debugging with strace on both guest and host, I've
determined this was NOT a kernel bug at all, but rather an OpenSSH issue
specific to vsock connections.

Root Cause:
-----------
The 10-20 second delay was caused by OpenSSH's sshd attempting DNS lookups
on the literal string "UNKNOWN" (the placeholder hostname used for vsock
connections where no IP address exists). This triggered two 5-second DNS
timeouts during login recording and audit subsystem operations, totaling
~10 seconds of delay.

The strace showed:
   17:11:14.465 sendmmsg(13, DNS query for "UNKNOWN")
   17:11:14.465 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005s>
   17:11:19.472 sendmmsg(13, DNS query for "UNKNOWN") [RETRY]
   17:11:19.472 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005s>

Why I Initially Thought It Was a Kernel Issue:
----------------------------------------------
- bpftrace showed ppoll() timeouts while data appeared to be queued
- The pattern looked like a classic lost wakeup race condition

However, the vsock kernel modules were working perfectly. The delay
happened in userspace during sshd's session setup, specifically when
mm_record_login() tried to resolve the peer hostname for logging.

The Fix:
--------
OpenSSH 10.1 and 10.2 include fixes to prevent passing "UNKNOWN" to
subsystems that would attempt DNS resolution:

- 10.1: Skip audit logging for UNKNOWN hostnames
- 10.2: Don't set PAM_RHOST when remote host is "UNKNOWN"
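
To illustrate the idea behind those fixes (this is a hypothetical sketch, not
OpenSSH's actual code, and the function name is made up): a caller can decline
to hand the placeholder hostname to anything that might trigger a DNS lookup.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical guard: only hostnames that are present, non-empty and
 * not the "UNKNOWN" placeholder are worth passing to a resolver.
 */
static bool sketch_should_resolve(const char *host)
{
	return host != NULL && host[0] != '\0' &&
	       strcmp(host, "UNKNOWN") != 0;
}
```

A vsock peer reported as "UNKNOWN" would then skip resolution entirely instead
of waiting for the two 5-second DNS timeouts shown in the strace above.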

References:
- https://github.com/openssh/openssh-portable/pull/388
- https://gitlab.archlinux.org/archlinux/packaging/packages/openssh/-/issues/16
- https://www.openssh.org/releasenotes.html

Workaround for older OpenSSH versions:
Add to /etc/hosts: 127.0.0.1 UNKNOWN

Apologies for the noise on netdev - the vsock kernel implementation is
working correctly. The misleading symptoms (PTY-specific, ppoll timeouts,
state between connections) made it appear kernel-related when it was
actually sshd's login recording code hitting DNS timeouts.

Thanks again for your help and for maintaining the vsock subsystem!

Best regards,
[Your name - don't forget to update it this time or you'll look even 
more stupid]

^ permalink raw reply

* [net-next] net: ethernet: ravb: Disable interrupts when closing device
From: Niklas Söderlund @ 2026-02-07 18:43 UTC (permalink / raw)
  To: Yoshihiro Shimoda, Paul Barker, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	linux-renesas-soc
  Cc: Niklas Söderlund

From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Disable interrupts when closing the device.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
[Niklas: Rebase from BSP and reword commit message]
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/net/ethernet/renesas/ravb_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 57b0db314fb5..d56b71003585 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2368,6 +2368,7 @@ static int ravb_close(struct net_device *ndev)
 	ravb_write(ndev, 0, RIC0);
 	ravb_write(ndev, 0, RIC2);
 	ravb_write(ndev, 0, TIC);
+	ravb_write(ndev, 0, ECSIPR);
 
 	/* PHY disconnect */
 	if (ndev->phydev) {
-- 
2.52.0


^ permalink raw reply related

* RFC: stmmac RSS support
From: Russell King (Oracle) @ 2026-02-07 17:36 UTC (permalink / raw)
  To: netdev, Andrew Lunn, Jose Abreu
  Cc: Maxime Chevallier, Thierry Reding, Paritosh Dixit

Hi,

While looking at the possibilities of minimising the memory that
struct plat_stmmacenet_data consumes (880 bytes presently on
aarch64), I came across the RSS feature in stmmac.

In commit 76067459c686 ("net: stmmac: Implement RSS and enable it in
XGMAC core"), support was added for RSS to the core stmmac driver for
the dwxgmac2 core. I can only find socfpga and tegra as the two
platform glues that use the dwxgmac2 core.

RSS support is only enabled when both the core supports it, and the
platform glue sets priv->plat->rss_en.

However, the stmmac-related results of grepping for this member do not
show any platform glues which set this flag:

$ git grep '\<rss_en\>'
Documentation/networking/device_drivers/ethernet/stmicro/stmmac.rst:        int rss_en;
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:      if (!priv->dma_cap.rssen || !priv->plat->rss_en) {
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:      if (priv->dma_cap.rssen && priv->plat->rss_en)

So, as no one has decided to enable this feature during the intervening
six years, is there any benefit to having this code in the mainline
kernel, or should this feature be dropped?

If a user appears, the code will remain in git history and could be
restored.

Thoughts?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: Re: [PATCH v4 net-next 02/11] net/nebula-matrix: add our driver architecture
From: Andrew Lunn @ 2026-02-07 17:19 UTC (permalink / raw)
  To: Illusion Wang
  Cc: Dimon, Alvin, Sam, netdev, andrew+netdev, corbet, kuba, linux-doc,
	lorenzo, pabeni, horms, vadim.fedorenko, lukas.bulwahn, edumazet,
	open list
In-Reply-To: <8641f978-76d5-464f-a312-414bd913c918.illusion.wang@nebula-matrix.com>

On Fri, Feb 06, 2026 at 05:26:35PM +0800, Illusion Wang wrote:
> Last time sam had a question
> "
> Thank you for your feedback. You might have misunderstood me.
> Our difficulties lie in the following:
> 1. Assuming only the mainline version changes the name (Assume name "nbl"),
>    and our regularly released driver doesn't change its name, then when
>    customers upgrade to a new kernel (containing the "nbl" driver),
>    and then want to update our regularly released driver (named "nbl_core"),
>    the module (ko) conflict will occur.
> 2. If both our mainline and regularly released drivers change their names,
>    then customers who are already using the "nbl_core" driver will also
>    encounter conflict issues when updating to the new driver "nbl".
> 
> Is it possible to do this: our net driver is also modified to be a driver based
> on the auxiliary bus, while the PCIe driver only handles PCIe-related processing,
> and these two drivers share a single kernel module (ko), namely "nbl_core"?"
> 
> There's no conclusion to this issue yet, so I haven't modified the 'core' parts for now
> (as mentioned in patch0)

This is all open source, you can do whatever you want with a fork of
Linux and out of tree drivers. Mainline has no influence over what
you do in your out of tree driver. So for Mainline, your out of tree
vendor driver does not really exist, any problems with it are yours to
solve.

However, Mainline cares about Mainline. We expect drivers which get
merged to follow Mainline design principles, look like other mainline
drivers, and use naming consistent with other Mainline drivers.

You should also think about how this driver is going to be merged. It
is going to be in small pieces. It is very unlikely the first merged
patchset is actually useful for customers. You probably need quite a
few patchsets merged before the driver is useful. If you have customers
who use Linus releases, they are going to have to deal with this WIP
driver. Such customers will be building the kernel themselves, so can
leave the in tree module out of the build. However, do most of your
customers use a distribution? A distribution is not going to update
its kernel until the next LTS kernel is released, sometime in
December. By then, you might have something usable in Mainline, and
the vendor driver is not needed. Or you might still be in the process
of rewriting the driver to Mainline standards and it is not
usable. Your customers then need to handle removing the mainline
driver and use the vendor driver. Again, that is not Mainline's
problem.

So, if your "core" driver is purely core, you can call it core, and
give it an empty tristate. The other drivers which are layered on top
of it can then select it.

If your "core" driver is actually an Ethernet driver, please drop the
name core.

     Andrew

^ permalink raw reply
