public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] net: neighbour: Notify changes atomically
@ 2026-01-14  9:54 Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held Petr Machata
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

Andy Roulin and Francesco Ruggeri have apparently independently both hit an
issue with the current neighbor notification scheme. Francesco reported the
issue in [1]. In a response[2] to that report, Andy said:

    neigh_update sends a rtnl notification if an update, e.g.,
    nud_state change, was done but there is no guarantee of
    ordering of the rtnl notifications. Consider the following
    scenario:

    userspace thread                   kernel thread
    ================                   =============
    neigh_update
       write_lock_bh(n->lock)
       n->nud_state = STALE
       write_unlock_bh(n->lock)
       neigh_notify
         neigh_fill_info
           read_lock_bh(n->lock)
           ndm->nud_state = STALE
           read_unlock_bh(n->lock)
         -------------------------->
                                      neigh:update
                                      write_lock_bh(n->lock)
                                      n->nud_state = REACHABLE
                                      write_unlock_bh(n->lock)
                                      neigh_notify
                                        neigh_fill_info
                                           read_lock_bh(n->lock)
                                           ndm->nud_state = REACHABLE
                                           read_unlock_bh(n->lock)
                                        rtnl_nofify
                                      RTNL REACHABLE sent
                            <--------
        rtnl_notify
        RTNL STALE sent

    In this scenario, the kernel neigh is updated first to STALE and
    then REACHABLE but the netlink notifications are sent out of order,
    first REACHABLE and then STALE.

The solution presented in [2] was to extend the critical region to include
both the call to neigh_fill_info(), as well as rtnl_notify(). Then we have
a guarantee that whatever state was captured by neigh_fill_info(), will be
sent right away. The above scenario can thus not happen.

This is how this patchset begins: patches #1 and #2 add helper duals to
neigh_fill_info() and __neigh_notify() such that the __-prefixed function
assumes the neighbor lock is held, and the unprefixed one is a thin wrapper
that manages locking. This extends locking further than Andy's patch, but
makes for a clear code and supports the following part.

At that point, the original race is gone. But what can happen is the
following race, where the notification does not reflect the change that was
made:

    userspace thread		       kernel thread
    ================		       =============
    neigh_update
       write_lock_bh(n->lock)
       n->nud_state = STALE
       write_unlock_bh(n->lock)
	 -------------------------->
				      neigh:update
				      write_lock_bh(n->lock)
				      n->nud_state = REACHABLE
				      write_unlock_bh(n->lock)
				      neigh_notify
					read_lock_bh(n->lock)
					__neigh_fill_info
					   ndm->nud_state = REACHABLE
					rtnl_notify
					read_unlock_bh(n->lock)
				      RTNL REACHABLE sent
			    <--------
       neigh_notify
	 read_lock_bh(n->lock)
	 __neigh_fill_info
	   ndm->nud_state = REACHABLE
	 rtnl_notify
	 read_unlock_bh(n->lock)
       RTNL REACHABLE sent again

Here, even though neigh_update() made a change to STALE, it later sends a
notification with a NUD of REACHABLE. The obvious solution to fix this race
is to move the notifier to the same critical section that actually makes
the change.

Sending a notification in fact involves two things: invoking the internal
notifier chain, and sending the netlink notification. The overall approach
in this patchset is to move the netlink notification to the critical
section of the change, while keeping the internal notifier intact. Since
the motion is not obviously correct, the patchset presents the change in
series of incremental steps with discussion in commit messages. Please see
details in the patches themselves.


Reproducer
==========

To consistently reproduce, I injected an mdelay before the rtnl_notify()
call. Since only one thread should delay, a bit of instrumentation was
needed to see where the call originates. The mdelay was then only issued on
the call stack rooted in the RTNL request.

Then the general idea is to issue an "ip neigh replace" to mark a neighbor
entry as failed. In parallel to that, inject an ARP burst that validates
the entry. This is all observed with an "ip monitor neigh", where one can
see either a REACHABLE->FAILED transition, or FAILED->REACHABLE, while the
actual state at the end of the sequence is always REACHABLE.

With the patchset, only FAILED->REACHABLE is ever observed in the monitor.


Alternatives
============

Another approach to solving the issue would be to have a per-neighbor queue
of notification digests, each with a set of fields necessary for formatting
a notification. In pseudocode, a neighbor update would look something like
this:

  neighbor_update:
    - lock
    -   do update
    -   allocate notification digest, fill partially, mark not-committed
    - unlock
    - critical-section-breaking stuff (probes, ARP Q, etc.)
    - lock
    -   fill in missing details to the digest (notably neigh->probes)
    -   mark the digest as committed
    -   while (front of the digest queue is committed)
    -     pop it, convert to notifier, send the notification
    - unlock

This adds more complexity and would imply more changes to the code, which
is why I think the approach presented in this patchset is better. But it
would allow us to retain the overall structure of the code while giving us
accurate notifications.


A third approach would be to consider the second race not very serious and
be OK with seeing a notification that does not reflect the change that
prompted it. Then a two-patch prefix of this patchset would be all that is
needed.

[1]: https://lore.kernel.org/netdev/20220606230107.D70B55EC0B30@us226.sjc.aristanetworks.com/
[2]: https://lore.kernel.org/netdev/ed6768c1-80b8-aee2-e545-b51661d49336@nvidia.com/


Petr Machata (8):
  net: core: neighbour: Add a neigh_fill_info() helper for when lock not
    held
  net: core: neighbour: Call __neigh_notify() under a lock
  net: core: neighbour: Extract ARP queue processing to a helper
    function
  net: core: neighbour: Process ARP queue later
  net: core: neighbour: Inline neigh_update_notify() calls
  net: core: neighbour: Reorder netlink & internal notification
  net: core: neighbour: Make one netlink notification atomically
  net: core: neighbour: Make another netlink notification atomically

 net/core/neighbour.c | 147 ++++++++++++++++++++++++++-----------------
 1 file changed, 91 insertions(+), 56 deletions(-)

-- 
2.51.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  2026-01-16  7:22   ` Kuniyuki Iwashima
  2026-01-14  9:54 ` [PATCH net-next 2/8] net: core: neighbour: Call __neigh_notify() under a lock Petr Machata
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

The netlink message needs to be formatted and sent inside the critical
section where the neighbor is changed, so that it reflects the
notified-upon neighbor state. Because it will happen inside an already
existing critical section, it has to assume that the neighbor lock is held.
Add a helper __neigh_fill_info(), which is like neigh_fill_info(), but
makes this assumption. Convert neigh_fill_info() to a wrapper around this
new helper.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 96a3b1a93252..6cdd93dfa3ea 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2622,8 +2622,8 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
-			   u32 pid, u32 seq, int type, unsigned int flags)
+static int __neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
+			     u32 pid, u32 seq, int type, unsigned int flags)
 {
 	u32 neigh_flags, neigh_flags_ext;
 	unsigned long now = jiffies;
@@ -2649,23 +2649,19 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
 	if (nla_put(skb, NDA_DST, neigh->tbl->key_len, neigh->primary_key))
 		goto nla_put_failure;
 
-	read_lock_bh(&neigh->lock);
 	ndm->ndm_state	 = neigh->nud_state;
 	if (neigh->nud_state & NUD_VALID) {
 		char haddr[MAX_ADDR_LEN];
 
 		neigh_ha_snapshot(haddr, neigh, neigh->dev);
-		if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0) {
-			read_unlock_bh(&neigh->lock);
+		if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0)
 			goto nla_put_failure;
-		}
 	}
 
 	ci.ndm_used	 = jiffies_to_clock_t(now - neigh->used);
 	ci.ndm_confirmed = jiffies_to_clock_t(now - neigh->confirmed);
 	ci.ndm_updated	 = jiffies_to_clock_t(now - neigh->updated);
 	ci.ndm_refcnt	 = refcount_read(&neigh->refcnt) - 1;
-	read_unlock_bh(&neigh->lock);
 
 	if (nla_put_u32(skb, NDA_PROBES, atomic_read(&neigh->probes)) ||
 	    nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
@@ -2684,6 +2680,20 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
 	return -EMSGSIZE;
 }
 
+static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
+			   u32 pid, u32 seq, int type, unsigned int flags)
+	__releases(neigh->lock)
+	__acquires(neigh->lock)
+{
+	int err;
+
+	read_lock_bh(&neigh->lock);
+	err = __neigh_fill_info(skb, neigh, pid, seq, type, flags);
+	read_unlock_bh(&neigh->lock);
+
+	return err;
+}
+
 static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
 			    u32 pid, u32 seq, int type, unsigned int flags,
 			    struct neigh_table *tbl)
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 2/8] net: core: neighbour: Call __neigh_notify() under a lock
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 3/8] net: core: neighbour: Extract ARP queue processing to a helper function Petr Machata
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

Andy Roulin has described an issue with the current neighbor notification
scheme as follows. This was also presented publicly at the link below.

    neigh_update sends a rtnl notification if an update, e.g.,
    nud_state change, was done but there is no guarantee of
    ordering of the rtnl notifications. Consider the following
    scenario:

    userspace thread                   kernel thread
    ================                   =============
    neigh_update
       write_lock_bh(n->lock)
       n->nud_state = STALE
       write_unlock_bh(n->lock)
       neigh_notify
         neigh_fill_info
           read_lock_bh(n->lock)
           ndm->nud_state = STALE
           read_unlock_bh(n->lock)
         -------------------------->
                                      neigh:update
                                      write_lock_bh(n->lock)
                                      n->nud_state = REACHABLE
                                      write_unlock_bh(n->lock)
                                      neigh_notify
                                        neigh_fill_info
                                           read_lock_bh(n->lock)
                                           ndm->nud_state = REACHABLE
                                           read_unlock_bh(n->lock)
                                        rtnl_nofify
                                      RTNL REACHABLE sent
                            <--------
        rtnl_notify
        RTNL STALE sent

    In this scenario, the kernel neigh is updated first to STALE and
    then REACHABLE but the netlink notifications are sent out of order,
    first REACHABLE and then STALE.

The solution is to send the netlink message inside the same critical
section that formats the message. That way both the contents and ordering
of the message reflect the same state, and we cannot see the abovementioned
out-of-order delivery.

Even with this patch, an issue remains that the contents of the message may
not reflect the changes made to the neighbor. A kernel thread might still
interrupt a userspace thread after the change is done, but before
formatting and sending the message. Then what we would see is two messages
with the same contents. The following patches will attempt to address that
issue.

To support those future patches, convert __neigh_notify() to a helper that
assumes that the neighbor lock is already taken by having it call
__neigh_fill_info() instead of neigh_fill_info(). Add a new helper,
neigh_notify(), which takes the lock before calling __neigh_notify().
Migrate all callers to use the latter.

Link: https://lore.kernel.org/netdev/ed6768c1-80b8-aee2-e545-b51661d49336@nvidia.com/
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6cdd93dfa3ea..5a56b787b5ec 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -51,8 +51,7 @@ do {						\
 #define PNEIGH_HASHMASK		0xF
 
 static void neigh_timer_handler(struct timer_list *t);
-static void __neigh_notify(struct neighbour *n, int type, int flags,
-			   u32 pid);
+static void neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
 static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid);
 static void pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
 			  bool skip_perm);
@@ -117,7 +116,7 @@ static int neigh_blackhole(struct neighbour *neigh, struct sk_buff *skb)
 static void neigh_cleanup_and_release(struct neighbour *neigh)
 {
 	trace_neigh_cleanup_and_release(neigh, 0);
-	__neigh_notify(neigh, RTM_DELNEIGH, 0, 0);
+	neigh_notify(neigh, RTM_DELNEIGH, 0, 0);
 	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 	neigh_release(neigh);
 }
@@ -2740,7 +2739,7 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
 static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid)
 {
 	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
-	__neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+	neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
 }
 
 static bool neigh_master_filtered(struct net_device *dev, int master_idx)
@@ -3555,7 +3554,7 @@ static void __neigh_notify(struct neighbour *n, int type, int flags,
 	if (skb == NULL)
 		goto errout;
 
-	err = neigh_fill_info(skb, n, pid, 0, type, flags);
+	err = __neigh_fill_info(skb, n, pid, 0, type, flags);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in neigh_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
@@ -3570,9 +3569,18 @@ static void __neigh_notify(struct neighbour *n, int type, int flags,
 	rcu_read_unlock();
 }
 
+static void neigh_notify(struct neighbour *neigh, int type, int flags, u32 pid)
+	__releases(neigh->lock)
+	__acquires(neigh->lock)
+{
+	read_lock_bh(&neigh->lock);
+	__neigh_notify(neigh, type, flags, pid);
+	read_unlock_bh(&neigh->lock);
+}
+
 void neigh_app_ns(struct neighbour *n)
 {
-	__neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST, 0);
+	neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST, 0);
 }
 EXPORT_SYMBOL(neigh_app_ns);
 
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 3/8] net: core: neighbour: Extract ARP queue processing to a helper function
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 2/8] net: core: neighbour: Call __neigh_notify() under a lock Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 4/8] net: core: neighbour: Process ARP queue later Petr Machata
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

In order to make manipulation with this bit of code clearer, extract it
to a helper function, neigh_update_process_arp_queue().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 78 ++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 5a56b787b5ec..efbfaaba0e0b 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1302,6 +1302,47 @@ static void neigh_update_hhs(struct neighbour *neigh)
 	}
 }
 
+static void neigh_update_process_arp_queue(struct neighbour *neigh)
+	__releases(neigh->lock)
+	__acquires(neigh->lock)
+{
+	struct sk_buff *skb;
+
+	/* Again: avoid deadlock if something went wrong. */
+	while (neigh->nud_state & NUD_VALID &&
+	       (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) {
+		struct dst_entry *dst = skb_dst(skb);
+		struct neighbour *n2, *n1 = neigh;
+
+		write_unlock_bh(&neigh->lock);
+
+		rcu_read_lock();
+
+		/* Why not just use 'neigh' as-is?  The problem is that
+		 * things such as shaper, eql, and sch_teql can end up
+		 * using alternative, different, neigh objects to output
+		 * the packet in the output path.  So what we need to do
+		 * here is re-lookup the top-level neigh in the path so
+		 * we can reinject the packet there.
+		 */
+		n2 = NULL;
+		if (dst &&
+		    READ_ONCE(dst->obsolete) != DST_OBSOLETE_DEAD) {
+			n2 = dst_neigh_lookup_skb(dst, skb);
+			if (n2)
+				n1 = n2;
+		}
+		READ_ONCE(n1->output)(n1, skb);
+		if (n2)
+			neigh_release(n2);
+		rcu_read_unlock();
+
+		write_lock_bh(&neigh->lock);
+	}
+	__skb_queue_purge(&neigh->arp_queue);
+	neigh->arp_queue_len_bytes = 0;
+}
+
 /* Generic update routine.
    -- lladdr is new lladdr or NULL, if it is not supplied.
    -- new    is new state.
@@ -1461,43 +1502,10 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
 		neigh_connect(neigh);
 	else
 		neigh_suspect(neigh);
-	if (!(old & NUD_VALID)) {
-		struct sk_buff *skb;
 
-		/* Again: avoid dead loop if something went wrong */
+	if (!(old & NUD_VALID))
+		neigh_update_process_arp_queue(neigh);
 
-		while (neigh->nud_state & NUD_VALID &&
-		       (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) {
-			struct dst_entry *dst = skb_dst(skb);
-			struct neighbour *n2, *n1 = neigh;
-			write_unlock_bh(&neigh->lock);
-
-			rcu_read_lock();
-
-			/* Why not just use 'neigh' as-is?  The problem is that
-			 * things such as shaper, eql, and sch_teql can end up
-			 * using alternative, different, neigh objects to output
-			 * the packet in the output path.  So what we need to do
-			 * here is re-lookup the top-level neigh in the path so
-			 * we can reinject the packet there.
-			 */
-			n2 = NULL;
-			if (dst &&
-			    READ_ONCE(dst->obsolete) != DST_OBSOLETE_DEAD) {
-				n2 = dst_neigh_lookup_skb(dst, skb);
-				if (n2)
-					n1 = n2;
-			}
-			READ_ONCE(n1->output)(n1, skb);
-			if (n2)
-				neigh_release(n2);
-			rcu_read_unlock();
-
-			write_lock_bh(&neigh->lock);
-		}
-		__skb_queue_purge(&neigh->arp_queue);
-		neigh->arp_queue_len_bytes = 0;
-	}
 out:
 	if (update_isrouter)
 		neigh_update_is_router(neigh, flags, &notify);
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 4/8] net: core: neighbour: Process ARP queue later
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
                   ` (2 preceding siblings ...)
  2026-01-14  9:54 ` [PATCH net-next 3/8] net: core: neighbour: Extract ARP queue processing to a helper function Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 5/8] net: core: neighbour: Inline neigh_update_notify() calls Petr Machata
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

ARP queue processing unlocks the neighbor lock, which can allow another
thread to asynchronously perform a neighbor update and send an out of order
notification. Therefore this needs to be done after the notification is
sent.

Move it just before the end of the critical section. Since
neigh_update_process_arp_queue() unlocks, it does not form a part of the
critical section anymore but it can benefit from the lock being taken. The
intention is to eventually do the RTNL notification before this call.

This motion crosses a call to neigh_update_is_router(), which should not
influence processing of the ARP queue.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index efbfaaba0e0b..f3290385db68 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1369,6 +1369,7 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
 			  struct netlink_ext_ack *extack)
 {
 	bool gc_update = false, managed_update = false;
+	bool process_arp_queue = false;
 	int update_isrouter = 0;
 	struct net_device *dev;
 	int err, notify = 0;
@@ -1504,12 +1505,17 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
 		neigh_suspect(neigh);
 
 	if (!(old & NUD_VALID))
-		neigh_update_process_arp_queue(neigh);
+		process_arp_queue = true;
 
 out:
 	if (update_isrouter)
 		neigh_update_is_router(neigh, flags, &notify);
+
+	if (process_arp_queue)
+		neigh_update_process_arp_queue(neigh);
+
 	write_unlock_bh(&neigh->lock);
+
 	if (((new ^ old) & NUD_PERMANENT) || gc_update)
 		neigh_update_gc_list(neigh);
 	if (managed_update)
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 5/8] net: core: neighbour: Inline neigh_update_notify() calls
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
                   ` (3 preceding siblings ...)
  2026-01-14  9:54 ` [PATCH net-next 4/8] net: core: neighbour: Process ARP queue later Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  2026-01-18  2:26   ` [net-next,5/8] " Jakub Kicinski
  2026-01-14  9:54 ` [PATCH net-next 6/8] net: core: neighbour: Reorder netlink & internal notification Petr Machata
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

The obvious idea behind the helper is to keep together the two bits that
should be done either both or neither: the internal notifier chain message,
and the netlink notification.

To make sure that the notification sent reflects the change being made, the
netlink message needs to be send inside the critical section where the
neighbor is changed. But for the notifier chain, there is no such need: the
listeners do not assume lock, and often in fact just schedule a delayed
work to act on the neighbor later. At least one in fact also takes the
neighbor lock. Therefore these two items have each different locking needs.

Now we could unlock inside the helper, but I find that error prone, and the
fact that the notification is conditional in the first place does not help
to make the call site obvious.

So in this patch, the helper is instead removed and the body, which is just
these two calls, inlined. That way we can use each notifier independently.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f3290385db68..e37db91c5339 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -52,7 +52,6 @@ do {						\
 
 static void neigh_timer_handler(struct timer_list *t);
 static void neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
-static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid);
 static void pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
 			  bool skip_perm);
 
@@ -1187,8 +1186,10 @@ static void neigh_timer_handler(struct timer_list *t)
 		write_unlock(&neigh->lock);
 	}
 
-	if (notify)
-		neigh_update_notify(neigh, 0);
+	if (notify) {
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+		neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+	}
 
 	trace_neigh_timer_handler(neigh, 0);
 
@@ -1520,8 +1521,12 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
 		neigh_update_gc_list(neigh);
 	if (managed_update)
 		neigh_update_managed_list(neigh);
-	if (notify)
-		neigh_update_notify(neigh, nlmsg_pid);
+
+	if (notify) {
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+		neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+	}
+
 	trace_neigh_update_done(neigh, err);
 	return err;
 }
@@ -2750,12 +2755,6 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
 	return -EMSGSIZE;
 }
 
-static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid)
-{
-	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
-	neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
-}
-
 static bool neigh_master_filtered(struct net_device *dev, int master_idx)
 {
 	struct net_device *master;
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 6/8] net: core: neighbour: Reorder netlink & internal notification
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
                   ` (4 preceding siblings ...)
  2026-01-14  9:54 ` [PATCH net-next 5/8] net: core: neighbour: Inline neigh_update_notify() calls Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 7/8] net: core: neighbour: Make one netlink notification atomically Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 8/8] net: core: neighbour: Make another " Petr Machata
  7 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

The netlink message needs to be send inside the critical section where the
neighbor is changed, so that it reflects the notified-upon neighbor state.
On the other hand, there is no such need in case of notifier chain: the
listeners do not assume lock, and often in fact just schedule a delayed
work to act on the neighbor later. At least one in fact also takes the
neighbor lock.

This requires that the netlink notification be done before the internal
notifier chain message is sent. That is safe to do, because the current
listeners, as well as __neigh_notify(), only read the updated neighbor
fields, and never modify them. (Apart from locking.)

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index e37db91c5339..ada691690948 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1187,8 +1187,8 @@ static void neigh_timer_handler(struct timer_list *t)
 	}
 
 	if (notify) {
-		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 		neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 	}
 
 	trace_neigh_timer_handler(neigh, 0);
@@ -1523,8 +1523,8 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
 		neigh_update_managed_list(neigh);
 
 	if (notify) {
-		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 		neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 	}
 
 	trace_neigh_update_done(neigh, err);
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 7/8] net: core: neighbour: Make one netlink notification atomically
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
                   ` (5 preceding siblings ...)
  2026-01-14  9:54 ` [PATCH net-next 6/8] net: core: neighbour: Reorder netlink & internal notification Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  2026-01-14  9:54 ` [PATCH net-next 8/8] net: core: neighbour: Make another " Petr Machata
  7 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

As noted in a previous patch, one race remains in the current code. A
kernel thread might interrupt a userspace thread after the change is done,
but before formatting and sending the message. Then what we would see is
two messages with the same contents:

    userspace thread		       kernel thread
    ================		       =============
    neigh_update
       write_lock_bh(n->lock)
       n->nud_state = STALE
       write_unlock_bh(n->lock)
	 -------------------------->
				      neigh:update
				      write_lock_bh(n->lock)
				      n->nud_state = REACHABLE
				      write_unlock_bh(n->lock)
				      neigh_notify
					read_lock_bh(n->lock)
					__neigh_fill_info
					   ndm->nud_state = REACHABLE
					rtnl_notify
					read_unlock_bh(n->lock)
				      RTNL REACHABLE sent
			    <--------
       neigh_notify
	 read_lock_bh(n->lock)
	 __neigh_fill_info
	   ndm->nud_state = REACHABLE
	 rtnl_notify
	 read_unlock_bh(n->lock)
       RTNL REACHABLE sent again

The solution is to send the netlink message inside the critical section
where the neighbor is changed, so that it reflects the notified-upon
neighbor state.

To that end, in __neigh_update(), move the current neigh_notify() call up
to said critical section, and convert it to __neigh_notify(), because the
lock is held. This motion crosses calls to neigh_update_managed_list(),
neigh_update_gc_list() and neigh_update_process_arp_queue(), all of which
potentially unlock and give an opportunity for the above race.

This also crosses a call to neigh_update_process_arp_queue() which calls
neigh->output(), which might be neigh_resolve_output() calls
neigh_event_send() calls neigh_event_send_probe() calls
__neigh_event_send() calls neigh_probe(), which touches neigh->probes,
an update which will now not be visible in the notification.

However, there is indication that there is no promise that these changes
will be accurately projected to notifications: fib6_table_lookup()
indirectly calls route.c's find_match() calls rt6_probe(), which looks up a
neighbor and call __neigh_set_probe_once(), which sets neigh->probes to 0,
but neither this nor the caller seems to send a notification.

Additionally, the neighbor object that the neigh_probe() mentioned above is
called on, might be the alternative neighbor looked up for the ARP queue
packet destination. If that is the case, the changed value of n1->probes is
not notified anywhere.

So at least in some circumstances, the reported number of probes needs to
be assumed to change without notification.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ada691690948..635d71c6420f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -52,6 +52,7 @@ do {						\
 
 static void neigh_timer_handler(struct timer_list *t);
 static void neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
+static void __neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
 static void pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
 			  bool skip_perm);
 
@@ -1512,6 +1513,9 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
 	if (update_isrouter)
 		neigh_update_is_router(neigh, flags, &notify);
 
+	if (notify)
+		__neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+
 	if (process_arp_queue)
 		neigh_update_process_arp_queue(neigh);
 
@@ -1522,10 +1526,8 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
 	if (managed_update)
 		neigh_update_managed_list(neigh);
 
-	if (notify) {
-		neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+	if (notify)
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
-	}
 
 	trace_neigh_update_done(neigh, err);
 	return err;
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 8/8] net: core: neighbour: Make another netlink notification atomically
  2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
                   ` (6 preceding siblings ...)
  2026-01-14  9:54 ` [PATCH net-next 7/8] net: core: neighbour: Make one netlink notification atomically Petr Machata
@ 2026-01-14  9:54 ` Petr Machata
  7 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-14  9:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev
  Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw

Similarly to the issue from the previous patch, neigh_timer_handler() also
updates the neighbor separately from formatting and sending the netlink
notification message. We have not seen reports to the effect of this
causing trouble, but in theory, the same sort of issues could have come up:
neigh_timer_handler() would make changes as necessary, but before
formatting and sending a notification, is interrupted before sending by
another thread, which makes a parallel change and sends its own message.
The message send that is prompted by an earlier change thus contains
information that does not reflect the change having been made.

To solve this, the netlink notification needs to be in the same critical
section that updates the neighbor. The critical section is ended by the
neigh_probe() call which drops the lock before calling solicit. Stretching
the critical section over the solicit call is problematic, because that can
then involved all sorts of forwarding callbacks. Therefore, like in the
previous patch, split the netlink notification away from the internal one
and move it ahead of the probe call.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/core/neighbour.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 635d71c6420f..5512dd7035b1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1180,6 +1180,10 @@ static void neigh_timer_handler(struct timer_list *t)
 		if (!mod_timer(&neigh->timer, next))
 			neigh_hold(neigh);
 	}
+
+	if (notify)
+		__neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+
 	if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) {
 		neigh_probe(neigh);
 	} else {
@@ -1187,10 +1191,8 @@ static void neigh_timer_handler(struct timer_list *t)
 		write_unlock(&neigh->lock);
 	}
 
-	if (notify) {
-		neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+	if (notify)
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
-	}
 
 	trace_neigh_timer_handler(neigh, 0);
 
-- 
2.51.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held
  2026-01-14  9:54 ` [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held Petr Machata
@ 2026-01-16  7:22   ` Kuniyuki Iwashima
  2026-01-16 12:01     ` Petr Machata
  0 siblings, 1 reply; 14+ messages in thread
From: Kuniyuki Iwashima @ 2026-01-16  7:22 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, Ido Schimmel, Breno Leitao, Andy Roulin,
	Francesco Ruggeri, Stephen Hemminger, mlxsw

On Wed, Jan 14, 2026 at 1:56 AM Petr Machata <petrm@nvidia.com> wrote:
>
> The netlink message needs to be formatted and sent inside the critical
> section where the neighbor is changed, so that it reflects the
> notified-upon neighbor state. Because it will happen inside an already
> existing critical section, it has to assume that the neighbor lock is held.
> Add a helper __neigh_fill_info(), which is like neigh_fill_info(), but
> makes this assumption. Convert neigh_fill_info() to a wrapper around this
> new helper.
>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/core/neighbour.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 96a3b1a93252..6cdd93dfa3ea 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -2622,8 +2622,8 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
>         return skb->len;
>  }
>
> -static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
> -                          u32 pid, u32 seq, int type, unsigned int flags)
> +static int __neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
> +                            u32 pid, u32 seq, int type, unsigned int flags)
>  {
>         u32 neigh_flags, neigh_flags_ext;
>         unsigned long now = jiffies;
> @@ -2649,23 +2649,19 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
>         if (nla_put(skb, NDA_DST, neigh->tbl->key_len, neigh->primary_key))
>                 goto nla_put_failure;
>
> -       read_lock_bh(&neigh->lock);
>         ndm->ndm_state   = neigh->nud_state;
>         if (neigh->nud_state & NUD_VALID) {
>                 char haddr[MAX_ADDR_LEN];
>
>                 neigh_ha_snapshot(haddr, neigh, neigh->dev);
> -               if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0) {
> -                       read_unlock_bh(&neigh->lock);
> +               if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0)
>                         goto nla_put_failure;
> -               }
>         }
>
>         ci.ndm_used      = jiffies_to_clock_t(now - neigh->used);
>         ci.ndm_confirmed = jiffies_to_clock_t(now - neigh->confirmed);
>         ci.ndm_updated   = jiffies_to_clock_t(now - neigh->updated);
>         ci.ndm_refcnt    = refcount_read(&neigh->refcnt) - 1;
> -       read_unlock_bh(&neigh->lock);
>
>         if (nla_put_u32(skb, NDA_PROBES, atomic_read(&neigh->probes)) ||
>             nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
> @@ -2684,6 +2680,20 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
>         return -EMSGSIZE;
>  }
>
> +static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
> +                          u32 pid, u32 seq, int type, unsigned int flags)
> +       __releases(neigh->lock)
> +       __acquires(neigh->lock)

nit: Does Sparse complain without these annotations even
a function has a paired lock/unlock ops ?


> +{
> +       int err;
> +
> +       read_lock_bh(&neigh->lock);
> +       err = __neigh_fill_info(skb, neigh, pid, seq, type, flags);
> +       read_unlock_bh(&neigh->lock);
> +
> +       return err;
> +}
> +
>  static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
>                             u32 pid, u32 seq, int type, unsigned int flags,
>                             struct neigh_table *tbl)
> --
> 2.51.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held
  2026-01-16  7:22   ` Kuniyuki Iwashima
@ 2026-01-16 12:01     ` Petr Machata
  2026-01-19 12:25       ` Petr Machata
  0 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2026-01-16 12:01 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netdev, Ido Schimmel, Breno Leitao,
	Andy Roulin, Francesco Ruggeri, Stephen Hemminger, mlxsw


Kuniyuki Iwashima <kuniyu@google.com> writes:

> On Wed, Jan 14, 2026 at 1:56 AM Petr Machata <petrm@nvidia.com> wrote:
>
>> @@ -2684,6 +2680,20 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
>>         return -EMSGSIZE;
>>  }
>>
>> +static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
>> +                          u32 pid, u32 seq, int type, unsigned int flags)
>> +       __releases(neigh->lock)
>> +       __acquires(neigh->lock)
>
> nit: Does Sparse complain without these annotations even
> a function has a paired lock/unlock ops ?

Actually it doesn't. Should I respin for this?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [net-next,5/8] net: core: neighbour: Inline neigh_update_notify() calls
  2026-01-14  9:54 ` [PATCH net-next 5/8] net: core: neighbour: Inline neigh_update_notify() calls Petr Machata
@ 2026-01-18  2:26   ` Jakub Kicinski
  2026-01-19  8:51     ` Petr Machata
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2026-01-18  2:26 UTC (permalink / raw)
  To: petrm; +Cc: Jakub Kicinski, netdev, edumazet, horms, davem, pabeni

I'm a bit behind so I haven't validated this myself TBH,
but in the interest of time perhaps you can TAL?

This is an AI-generated review of your patch.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: core: neighbour: Make another netlink notification atomically

This commit moves the netlink notification call in neigh_timer_handler()
earlier, into the critical section before neigh_probe() releases the lock,
to fix a race where notifications could reflect stale state.

> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 635d71c6420f..5512dd7035b1 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c

[ ... ]

Earlier in the function, when max probes are exhausted, there is this code:

        if ((neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) &&
            atomic_read(&neigh->probes) >= neigh_max_probes(neigh)) {
                ...
                notify = 1;
                goto out;
        }

Then the new code places __neigh_notify() here:

> @@ -1180,6 +1180,10 @@ static void neigh_timer_handler(struct timer_list *t)
>  		if (!mod_timer(&neigh->timer, next))
>  			neigh_hold(neigh);
>  	}
> +
> +	if (notify)
> +		__neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
> +
>  	if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) {
>  		neigh_probe(neigh);
>  	} else {
> @@ -1187,10 +1191,8 @@ static void neigh_timer_handler(struct timer_list *t)
> out:
>  		write_unlock(&neigh->lock);
>  	}
> -
> -	if (notify) {
> -		neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
> +	if (notify)
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
> -	}

When the neighbor exhausts max probes and transitions to NUD_FAILED or
NUD_STALE, the code sets notify = 1 and then does goto out. This jumps
directly to the out: label, skipping the new __neigh_notify() call.

The old code called neigh_notify() after the out: label, so the netlink
notification was always sent when notify was set. With this change, the
goto out path no longer sends the netlink notification to userspace,
though it still calls call_netevent_notifiers() for kernel-internal
listeners.

Is this an oversight? Should the __neigh_notify() call be placed before the
goto out, or should the goto out path have its own __neigh_notify() call?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [net-next,5/8] net: core: neighbour: Inline neigh_update_notify() calls
  2026-01-18  2:26   ` [net-next,5/8] " Jakub Kicinski
@ 2026-01-19  8:51     ` Petr Machata
  0 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-19  8:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: petrm, netdev, edumazet, horms, davem, pabeni


Jakub Kicinski <kuba@kernel.org> writes:

> I'm a bit behind so I haven't validated this myself TBH,
> but in the interest of time perhaps you can TAL?

It looks legit. I'll send a v2.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held
  2026-01-16 12:01     ` Petr Machata
@ 2026-01-19 12:25       ` Petr Machata
  0 siblings, 0 replies; 14+ messages in thread
From: Petr Machata @ 2026-01-19 12:25 UTC (permalink / raw)
  To: Petr Machata
  Cc: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netdev, Ido Schimmel, Breno Leitao,
	Andy Roulin, Francesco Ruggeri, Stephen Hemminger, mlxsw


Petr Machata <petrm@nvidia.com> writes:

> Kuniyuki Iwashima <kuniyu@google.com> writes:
>
>> On Wed, Jan 14, 2026 at 1:56 AM Petr Machata <petrm@nvidia.com> wrote:
>>
>>> @@ -2684,6 +2680,20 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
>>>         return -EMSGSIZE;
>>>  }
>>>
>>> +static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
>>> +                          u32 pid, u32 seq, int type, unsigned int flags)
>>> +       __releases(neigh->lock)
>>> +       __acquires(neigh->lock)
>>
>> nit: Does Sparse complain without these annotations even
>> a function has a paired lock/unlock ops ?
>
> Actually it doesn't. Should I respin for this?

I'll be respinning anyway, so I'll include this as well.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-01-19 12:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-14  9:54 [PATCH net-next 0/8] net: neighbour: Notify changes atomically Petr Machata
2026-01-14  9:54 ` [PATCH net-next 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held Petr Machata
2026-01-16  7:22   ` Kuniyuki Iwashima
2026-01-16 12:01     ` Petr Machata
2026-01-19 12:25       ` Petr Machata
2026-01-14  9:54 ` [PATCH net-next 2/8] net: core: neighbour: Call __neigh_notify() under a lock Petr Machata
2026-01-14  9:54 ` [PATCH net-next 3/8] net: core: neighbour: Extract ARP queue processing to a helper function Petr Machata
2026-01-14  9:54 ` [PATCH net-next 4/8] net: core: neighbour: Process ARP queue later Petr Machata
2026-01-14  9:54 ` [PATCH net-next 5/8] net: core: neighbour: Inline neigh_update_notify() calls Petr Machata
2026-01-18  2:26   ` [net-next,5/8] " Jakub Kicinski
2026-01-19  8:51     ` Petr Machata
2026-01-14  9:54 ` [PATCH net-next 6/8] net: core: neighbour: Reorder netlink & internal notification Petr Machata
2026-01-14  9:54 ` [PATCH net-next 7/8] net: core: neighbour: Make one netlink notification atomically Petr Machata
2026-01-14  9:54 ` [PATCH net-next 8/8] net: core: neighbour: Make another " Petr Machata

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox