* [PATCH net-next v2 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 2/8] net: core: neighbour: Call __neigh_notify() under a lock Petr Machata
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
The netlink message needs to be formatted and sent inside the critical
section where the neighbor is changed, so that it reflects the
notified-upon neighbor state. Because it will happen inside an already
existing critical section, it has to assume that the neighbor lock is held.
Add a helper __neigh_fill_info(), which is like neigh_fill_info(), but
makes this assumption. Convert neigh_fill_info() to a wrapper around this
new helper.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/core/neighbour.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 96a3b1a93252..6cdd93dfa3ea 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2622,8 +2622,8 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
return skb->len;
}
-static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
- u32 pid, u32 seq, int type, unsigned int flags)
+static int __neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
+ u32 pid, u32 seq, int type, unsigned int flags)
{
u32 neigh_flags, neigh_flags_ext;
unsigned long now = jiffies;
@@ -2649,23 +2649,19 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
if (nla_put(skb, NDA_DST, neigh->tbl->key_len, neigh->primary_key))
goto nla_put_failure;
- read_lock_bh(&neigh->lock);
ndm->ndm_state = neigh->nud_state;
if (neigh->nud_state & NUD_VALID) {
char haddr[MAX_ADDR_LEN];
neigh_ha_snapshot(haddr, neigh, neigh->dev);
- if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0) {
- read_unlock_bh(&neigh->lock);
+ if (nla_put(skb, NDA_LLADDR, neigh->dev->addr_len, haddr) < 0)
goto nla_put_failure;
- }
}
ci.ndm_used = jiffies_to_clock_t(now - neigh->used);
ci.ndm_confirmed = jiffies_to_clock_t(now - neigh->confirmed);
ci.ndm_updated = jiffies_to_clock_t(now - neigh->updated);
ci.ndm_refcnt = refcount_read(&neigh->refcnt) - 1;
- read_unlock_bh(&neigh->lock);
if (nla_put_u32(skb, NDA_PROBES, atomic_read(&neigh->probes)) ||
nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
@@ -2684,6 +2680,20 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
return -EMSGSIZE;
}
+static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
+ u32 pid, u32 seq, int type, unsigned int flags)
+ __releases(neigh->lock)
+ __acquires(neigh->lock)
+{
+ int err;
+
+ read_lock_bh(&neigh->lock);
+ err = __neigh_fill_info(skb, neigh, pid, seq, type, flags);
+ read_unlock_bh(&neigh->lock);
+
+ return err;
+}
+
static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
u32 pid, u32 seq, int type, unsigned int flags,
struct neigh_table *tbl)
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next v2 2/8] net: core: neighbour: Call __neigh_notify() under a lock
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 3/8] net: core: neighbour: Extract ARP queue processing to a helper function Petr Machata
` (7 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
Andy Roulin has described an issue with the current neighbor notification
scheme as follows. This was also presented publicly at the link below.
neigh_update sends a rtnl notification if an update, e.g.,
nud_state change, was done but there is no guarantee of
ordering of the rtnl notifications. Consider the following
scenario:
userspace thread kernel thread
================ =============
neigh_update
write_lock_bh(n->lock)
n->nud_state = STALE
write_unlock_bh(n->lock)
neigh_notify
neigh_fill_info
read_lock_bh(n->lock)
ndm->nud_state = STALE
read_unlock_bh(n->lock)
-------------------------->
neigh:update
write_lock_bh(n->lock)
n->nud_state = REACHABLE
write_unlock_bh(n->lock)
neigh_notify
neigh_fill_info
read_lock_bh(n->lock)
ndm->nud_state = REACHABLE
read_unlock_bh(n->lock)
rtnl_nofify
RTNL REACHABLE sent
<--------
rtnl_notify
RTNL STALE sent
In this scenario, the kernel neigh is updated first to STALE and
then REACHABLE but the netlink notifications are sent out of order,
first REACHABLE and then STALE.
The solution is to send the netlink message inside the same critical
section that formats the message. That way both the contents and ordering
of the message reflect the same state, and we cannot see the abovementioned
out-of-order delivery.
Even with this patch, a remaining issue that the contents of the message
may not reflect the changes made to the neighbor. A kernel thread might
still interrupt a userspace thread after the change is done, but before
formatting and sending the message. Then what we would see is two messages
with the same contents. The following patches will attempt to address that
issue.
To support those future patches, convert __neigh_notify() to a helper that
assumes that the neighbor lock is already taken by having it call
__neigh_fill_info() instead of neigh_fill_info(). Add a new helper,
neigh_notify(), which takes the lock before calling __neigh_notify().
Migrate all callers to use the latter.
Link: https://lore.kernel.org/netdev/ed6768c1-80b8-aee2-e545-b51661d49336@nvidia.com/
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
Notes:
v2:
- Drop the __acquires / __releases annotations at neigh_notify().
They are not necessary with a symmetrically locking function.
net/core/neighbour.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6cdd93dfa3ea..66f4a6525106 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -51,8 +51,7 @@ do { \
#define PNEIGH_HASHMASK 0xF
static void neigh_timer_handler(struct timer_list *t);
-static void __neigh_notify(struct neighbour *n, int type, int flags,
- u32 pid);
+static void neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid);
static void pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
bool skip_perm);
@@ -117,7 +116,7 @@ static int neigh_blackhole(struct neighbour *neigh, struct sk_buff *skb)
static void neigh_cleanup_and_release(struct neighbour *neigh)
{
trace_neigh_cleanup_and_release(neigh, 0);
- __neigh_notify(neigh, RTM_DELNEIGH, 0, 0);
+ neigh_notify(neigh, RTM_DELNEIGH, 0, 0);
call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
neigh_release(neigh);
}
@@ -2740,7 +2739,7 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid)
{
call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
- __neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+ neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
}
static bool neigh_master_filtered(struct net_device *dev, int master_idx)
@@ -3555,7 +3554,7 @@ static void __neigh_notify(struct neighbour *n, int type, int flags,
if (skb == NULL)
goto errout;
- err = neigh_fill_info(skb, n, pid, 0, type, flags);
+ err = __neigh_fill_info(skb, n, pid, 0, type, flags);
if (err < 0) {
/* -EMSGSIZE implies BUG in neigh_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
@@ -3570,9 +3569,16 @@ static void __neigh_notify(struct neighbour *n, int type, int flags,
rcu_read_unlock();
}
+static void neigh_notify(struct neighbour *neigh, int type, int flags, u32 pid)
+{
+ read_lock_bh(&neigh->lock);
+ __neigh_notify(neigh, type, flags, pid);
+ read_unlock_bh(&neigh->lock);
+}
+
void neigh_app_ns(struct neighbour *n)
{
- __neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST, 0);
+ neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST, 0);
}
EXPORT_SYMBOL(neigh_app_ns);
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next v2 3/8] net: core: neighbour: Extract ARP queue processing to a helper function
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 2/8] net: core: neighbour: Call __neigh_notify() under a lock Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 4/8] net: core: neighbour: Process ARP queue later Petr Machata
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
In order to make manipulation with this bit of code clearer, extract it
to a helper function, neigh_update_process_arp_queue().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/core/neighbour.c | 78 ++++++++++++++++++++++++--------------------
1 file changed, 43 insertions(+), 35 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 66f4a6525106..b6cb127a5b8e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1302,6 +1302,47 @@ static void neigh_update_hhs(struct neighbour *neigh)
}
}
+static void neigh_update_process_arp_queue(struct neighbour *neigh)
+ __releases(neigh->lock)
+ __acquires(neigh->lock)
+{
+ struct sk_buff *skb;
+
+ /* Again: avoid deadlock if something went wrong. */
+ while (neigh->nud_state & NUD_VALID &&
+ (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) {
+ struct dst_entry *dst = skb_dst(skb);
+ struct neighbour *n2, *n1 = neigh;
+
+ write_unlock_bh(&neigh->lock);
+
+ rcu_read_lock();
+
+ /* Why not just use 'neigh' as-is? The problem is that
+ * things such as shaper, eql, and sch_teql can end up
+ * using alternative, different, neigh objects to output
+ * the packet in the output path. So what we need to do
+ * here is re-lookup the top-level neigh in the path so
+ * we can reinject the packet there.
+ */
+ n2 = NULL;
+ if (dst &&
+ READ_ONCE(dst->obsolete) != DST_OBSOLETE_DEAD) {
+ n2 = dst_neigh_lookup_skb(dst, skb);
+ if (n2)
+ n1 = n2;
+ }
+ READ_ONCE(n1->output)(n1, skb);
+ if (n2)
+ neigh_release(n2);
+ rcu_read_unlock();
+
+ write_lock_bh(&neigh->lock);
+ }
+ __skb_queue_purge(&neigh->arp_queue);
+ neigh->arp_queue_len_bytes = 0;
+}
+
/* Generic update routine.
-- lladdr is new lladdr or NULL, if it is not supplied.
-- new is new state.
@@ -1461,43 +1502,10 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
neigh_connect(neigh);
else
neigh_suspect(neigh);
- if (!(old & NUD_VALID)) {
- struct sk_buff *skb;
- /* Again: avoid dead loop if something went wrong */
+ if (!(old & NUD_VALID))
+ neigh_update_process_arp_queue(neigh);
- while (neigh->nud_state & NUD_VALID &&
- (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) {
- struct dst_entry *dst = skb_dst(skb);
- struct neighbour *n2, *n1 = neigh;
- write_unlock_bh(&neigh->lock);
-
- rcu_read_lock();
-
- /* Why not just use 'neigh' as-is? The problem is that
- * things such as shaper, eql, and sch_teql can end up
- * using alternative, different, neigh objects to output
- * the packet in the output path. So what we need to do
- * here is re-lookup the top-level neigh in the path so
- * we can reinject the packet there.
- */
- n2 = NULL;
- if (dst &&
- READ_ONCE(dst->obsolete) != DST_OBSOLETE_DEAD) {
- n2 = dst_neigh_lookup_skb(dst, skb);
- if (n2)
- n1 = n2;
- }
- READ_ONCE(n1->output)(n1, skb);
- if (n2)
- neigh_release(n2);
- rcu_read_unlock();
-
- write_lock_bh(&neigh->lock);
- }
- __skb_queue_purge(&neigh->arp_queue);
- neigh->arp_queue_len_bytes = 0;
- }
out:
if (update_isrouter)
neigh_update_is_router(neigh, flags, ¬ify);
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next v2 4/8] net: core: neighbour: Process ARP queue later
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
` (2 preceding siblings ...)
2026-01-21 16:43 ` [PATCH net-next v2 3/8] net: core: neighbour: Extract ARP queue processing to a helper function Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 5/8] net: core: neighbour: Inline neigh_update_notify() calls Petr Machata
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
ARP queue processing unlocks the neighbor lock, which can allow another
thread to asynchronously perform a neighbor update and send an out of order
notification. Therefore this needs to be done after the notification is
sent.
Move it just before the end of the critical section. Since
neigh_update_process_arp_queue() unlocks, it does not form a part of the
critical section anymore but it can benefit from the lock being taken. The
intention is to eventually do the RTNL notification before this call.
This motion crosses a call to neigh_update_is_router(), which should not
influence processing of the ARP queue.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/core/neighbour.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index b6cb127a5b8e..3d969f0190a1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1369,6 +1369,7 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
struct netlink_ext_ack *extack)
{
bool gc_update = false, managed_update = false;
+ bool process_arp_queue = false;
int update_isrouter = 0;
struct net_device *dev;
int err, notify = 0;
@@ -1504,12 +1505,17 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
neigh_suspect(neigh);
if (!(old & NUD_VALID))
- neigh_update_process_arp_queue(neigh);
+ process_arp_queue = true;
out:
if (update_isrouter)
neigh_update_is_router(neigh, flags, ¬ify);
+
+ if (process_arp_queue)
+ neigh_update_process_arp_queue(neigh);
+
write_unlock_bh(&neigh->lock);
+
if (((new ^ old) & NUD_PERMANENT) || gc_update)
neigh_update_gc_list(neigh);
if (managed_update)
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next v2 5/8] net: core: neighbour: Inline neigh_update_notify() calls
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
` (3 preceding siblings ...)
2026-01-21 16:43 ` [PATCH net-next v2 4/8] net: core: neighbour: Process ARP queue later Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 6/8] net: core: neighbour: Reorder netlink & internal notification Petr Machata
` (4 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
The obvious idea behind the helper is to keep together the two bits that
should be done either both or neither: the internal notifier chain message,
and the netlink notification.
To make sure that the notification sent reflects the change being made, the
netlink message needs to be send inside the critical section where the
neighbor is changed. But for the notifier chain, there is no such need: the
listeners do not assume lock, and often in fact just schedule a delayed
work to act on the neighbor later. At least one in fact also takes the
neighbor lock. Therefore these two items have each different locking needs.
Now we could unlock inside the helper, but I find that error prone, and the
fact that the notification is conditional in the first place does not help
to make the call site obvious.
So in this patch, the helper is instead removed and the body, which is just
these two calls, inlined. That way we can use each notifier independently.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/core/neighbour.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 3d969f0190a1..0fbdaae7df99 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -52,7 +52,6 @@ do { \
static void neigh_timer_handler(struct timer_list *t);
static void neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
-static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid);
static void pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
bool skip_perm);
@@ -1187,8 +1186,10 @@ static void neigh_timer_handler(struct timer_list *t)
write_unlock(&neigh->lock);
}
- if (notify)
- neigh_update_notify(neigh, 0);
+ if (notify) {
+ call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+ neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+ }
trace_neigh_timer_handler(neigh, 0);
@@ -1520,8 +1521,12 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
neigh_update_gc_list(neigh);
if (managed_update)
neigh_update_managed_list(neigh);
- if (notify)
- neigh_update_notify(neigh, nlmsg_pid);
+
+ if (notify) {
+ call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+ neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+ }
+
trace_neigh_update_done(neigh, err);
return err;
}
@@ -2750,12 +2755,6 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
return -EMSGSIZE;
}
-static void neigh_update_notify(struct neighbour *neigh, u32 nlmsg_pid)
-{
- call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
- neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
-}
-
static bool neigh_master_filtered(struct net_device *dev, int master_idx)
{
struct net_device *master;
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next v2 6/8] net: core: neighbour: Reorder netlink & internal notification
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
` (4 preceding siblings ...)
2026-01-21 16:43 ` [PATCH net-next v2 5/8] net: core: neighbour: Inline neigh_update_notify() calls Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-25 22:59 ` Jakub Kicinski
2026-01-21 16:43 ` [PATCH net-next v2 7/8] net: core: neighbour: Make one netlink notification atomically Petr Machata
` (3 subsequent siblings)
9 siblings, 1 reply; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
The netlink message needs to be send inside the critical section where the
neighbor is changed, so that it reflects the notified-upon neighbor state.
On the other hand, there is no such need in case of notifier chain: the
listeners do not assume lock, and often in fact just schedule a delayed
work to act on the neighbor later. At least one in fact also takes the
neighbor lock.
This requires that the netlink notification be done before the internal
notifier chain message is sent. That is safe to do, because the current
listeners, as well as __neigh_notify(), only read the updated neighbor
fields, and never modify them. (Apart from locking.)
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/core/neighbour.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 0fbdaae7df99..187615bcc1c3 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1187,8 +1187,8 @@ static void neigh_timer_handler(struct timer_list *t)
}
if (notify) {
- call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+ call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
}
trace_neigh_timer_handler(neigh, 0);
@@ -1523,8 +1523,8 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
neigh_update_managed_list(neigh);
if (notify) {
- call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+ call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
}
trace_neigh_update_done(neigh, err);
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH net-next v2 6/8] net: core: neighbour: Reorder netlink & internal notification
2026-01-21 16:43 ` [PATCH net-next v2 6/8] net: core: neighbour: Reorder netlink & internal notification Petr Machata
@ 2026-01-25 22:59 ` Jakub Kicinski
0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2026-01-25 22:59 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman, netdev,
Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, mlxsw
On Wed, 21 Jan 2026 17:43:40 +0100 Petr Machata wrote:
> The netlink message needs to be send inside the critical section where the
> neighbor is changed, so that it reflects the notified-upon neighbor state.
> On the other hand, there is no such need in case of notifier chain: the
> listeners do not assume lock, and often in fact just schedule a delayed
> work to act on the neighbor later. At least one in fact also takes the
> neighbor lock.
>
> This requires that the netlink notification be done before the internal
> notifier chain message is sent. That is safe to do, because the current
> listeners, as well as __neigh_notify(), only read the updated neighbor
> fields, and never modify them. (Apart from locking.)
Hopefully we're not setting a trap here that some driver will later
fall into.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next v2 7/8] net: core: neighbour: Make one netlink notification atomically
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
` (5 preceding siblings ...)
2026-01-21 16:43 ` [PATCH net-next v2 6/8] net: core: neighbour: Reorder netlink & internal notification Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-21 16:43 ` [PATCH net-next v2 8/8] net: core: neighbour: Make another " Petr Machata
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
As noted in a previous patch, one race remains in the current code. A
kernel thread might interrupt a userspace thread after the change is done,
but before formatting and sending the message. Then what we would see is
two messages with the same contents:
userspace thread kernel thread
================ =============
neigh_update
write_lock_bh(n->lock)
n->nud_state = STALE
write_unlock_bh(n->lock)
-------------------------->
neigh:update
write_lock_bh(n->lock)
n->nud_state = REACHABLE
write_unlock_bh(n->lock)
neigh_notify
read_lock_bh(n->lock)
__neigh_fill_info
ndm->nud_state = REACHABLE
rtnl_notify
read_unlock_bh(n->lock)
RTNL REACHABLE sent
<--------
neigh_notify
read_lock_bh(n->lock)
__neigh_fill_info
ndm->nud_state = REACHABLE
rtnl_notify
read_unlock_bh(n->lock)
RTNL REACHABLE sent again
The solution is to send the netlink message inside the critical section
where the neighbor is changed, so that it reflects the notified-upon
neighbor state.
To that end, in __neigh_update(), move the current neigh_notify() call up
to said critical section, and convert it to __neigh_notify(), because the
lock is held. This motion crosses calls to neigh_update_managed_list(),
neigh_update_gc_list() and neigh_update_process_arp_queue(), all of which
potentially unlock and give an opportunity for the above race.
This also crosses a call to neigh_update_process_arp_queue() which calls
neigh->output(), which might be neigh_resolve_output() calls
neigh_event_send() calls neigh_event_send_probe() calls
__neigh_event_send() calls neigh_probe(), which touches neigh->probes,
an update which will now not be visible in the notification.
However, there is indication that there is no promise that these changes
will be accurately projected to notifications: fib6_table_lookup()
indirectly calls route.c's find_match() calls rt6_probe(), which looks up a
neighbor and call __neigh_set_probe_once(), which sets neigh->probes to 0,
but neither this nor the caller seems to send a notification.
Additionally, the neighbor object that the neigh_probe() mentioned above is
called on, might be the alternative neighbor looked up for the ARP queue
packet destination. If that is the case, the changed value of n1->probes is
not notified anywhere.
So at least in some circumstances, the reported number of probes needs to
be assumed to change without notification.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
net/core/neighbour.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 187615bcc1c3..1d7489f50b21 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -52,6 +52,7 @@ do { \
static void neigh_timer_handler(struct timer_list *t);
static void neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
+static void __neigh_notify(struct neighbour *n, int type, int flags, u32 pid);
static void pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
bool skip_perm);
@@ -1512,6 +1513,9 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
if (update_isrouter)
neigh_update_is_router(neigh, flags, ¬ify);
+ if (notify)
+ __neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+
if (process_arp_queue)
neigh_update_process_arp_queue(neigh);
@@ -1522,10 +1526,8 @@ static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
if (managed_update)
neigh_update_managed_list(neigh);
- if (notify) {
- neigh_notify(neigh, RTM_NEWNEIGH, 0, nlmsg_pid);
+ if (notify)
call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
- }
trace_neigh_update_done(neigh, err);
return err;
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net-next v2 8/8] net: core: neighbour: Make another netlink notification atomically
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
` (6 preceding siblings ...)
2026-01-21 16:43 ` [PATCH net-next v2 7/8] net: core: neighbour: Make one netlink notification atomically Petr Machata
@ 2026-01-21 16:43 ` Petr Machata
2026-01-23 15:28 ` [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Simon Horman
2026-01-25 23:00 ` patchwork-bot+netdevbpf
9 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2026-01-21 16:43 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netdev
Cc: Ido Schimmel, Kuniyuki Iwashima, Breno Leitao, Andy Roulin,
Francesco Ruggeri, Stephen Hemminger, Petr Machata, mlxsw
Similarly to the issue from the previous patch, neigh_timer_handler() also
updates the neighbor separately from formatting and sending the netlink
notification message. We have not seen reports to the effect of this
causing trouble, but in theory, the same sort of issues could have come up:
neigh_timer_handler() would make changes as necessary, but before
formatting and sending a notification, is interrupted before sending by
another thread, which makes a parallel change and sends its own message.
The message send that is prompted by an earlier change thus contains
information that does not reflect the change having been made.
To solve this, the netlink notification needs to be in the same critical
section that updates the neighbor. The critical section is ended by the
neigh_probe() call which drops the lock before calling solicit. Stretching
the critical section over the solicit call is problematic, because that can
then involved all sorts of forwarding callbacks. Therefore, like in the
previous patch, split the netlink notification away from the internal one
and move it ahead of the probe call.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
Notes:
v2:
- Do not skip the notification from inside the
atomic_read(&neigh->probes) >= neigh_max_probes(neigh)
conditional. Instead set a flag, and goto out after the
notification if the flag is set.
- Move the __neigh_notify() call another block up above the
NUD_IN_TIMER check. That belongs logically together with
the (NUD_INCOMPLETE | NUD_PROBE) check afterwards, no sense
to split the two conditionals with the notifier.
net/core/neighbour.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 1d7489f50b21..e0897eb41c8d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1104,6 +1104,7 @@ static void neigh_timer_handler(struct timer_list *t)
{
unsigned long now, next;
struct neighbour *neigh = timer_container_of(neigh, t, timer);
+ bool skip_probe = false;
unsigned int state;
int notify = 0;
@@ -1171,9 +1172,15 @@ static void neigh_timer_handler(struct timer_list *t)
neigh_invalidate(neigh);
}
notify = 1;
- goto out;
+ skip_probe = true;
}
+ if (notify)
+ __neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+
+ if (skip_probe)
+ goto out;
+
if (neigh->nud_state & NUD_IN_TIMER) {
if (time_before(next, jiffies + HZ/100))
next = jiffies + HZ/100;
@@ -1187,10 +1194,8 @@ static void neigh_timer_handler(struct timer_list *t)
write_unlock(&neigh->lock);
}
- if (notify) {
- neigh_notify(neigh, RTM_NEWNEIGH, 0, 0);
+ if (notify)
call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
- }
trace_neigh_timer_handler(neigh, 0);
--
2.51.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
` (7 preceding siblings ...)
2026-01-21 16:43 ` [PATCH net-next v2 8/8] net: core: neighbour: Make another " Petr Machata
@ 2026-01-23 15:28 ` Simon Horman
2026-01-25 23:00 ` patchwork-bot+netdevbpf
9 siblings, 0 replies; 12+ messages in thread
From: Simon Horman @ 2026-01-23 15:28 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, Kuniyuki Iwashima, Breno Leitao,
Andy Roulin, Francesco Ruggeri, Stephen Hemminger, mlxsw
On Wed, Jan 21, 2026 at 05:43:34PM +0100, Petr Machata wrote:
...
> v2:
> - Patch #2:
> - Drop the __acquires / __releases annotations at neigh_notify().
> They are not necessary with a symmetrically locking function.
> - Retain the R-b tag for this change.
> - Patch #8:
> - Do not skip the notification from inside the
> atomic_read(&neigh->probes) >= neigh_max_probes(neigh)
> conditional. Instead set a flag, and goto out after the
> notification if the flag is set.
> - Move the __neigh_notify() call another block up above the
> NUD_IN_TIMER check. That belongs logically together with
> the (NUD_INCOMPLETE | NUD_PROBE) check afterwards, no sense
> to split the two conditionals with the notifier.
I'm ambivalent on the last point. And at any rate, I don't think my next
point warrants a re-spin. But I do wonder if the same applies to
neigh_update_process_arp_queue and neigh_fill_info.
That notwithstanding, this series looks good to me.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically
2026-01-21 16:43 [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Petr Machata
` (8 preceding siblings ...)
2026-01-23 15:28 ` [PATCH net-next v2 0/8] net: neighbour: Notify changes atomically Simon Horman
@ 2026-01-25 23:00 ` patchwork-bot+netdevbpf
9 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-01-25 23:00 UTC (permalink / raw)
To: Petr Machata
Cc: davem, edumazet, kuba, pabeni, horms, netdev, idosch, kuniyu,
leitao, aroulin, fruggeri, stephen, mlxsw
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 21 Jan 2026 17:43:34 +0100 you wrote:
> Andy Roulin and Francesco Ruggeri have apparently independently both hit an
> issue with the current neighbor notification scheme. Francesco reported the
> issue in [1]. In a response[2] to that report, Andy said:
>
> neigh_update sends a rtnl notification if an update, e.g.,
> nud_state change, was done but there is no guarantee of
> ordering of the rtnl notifications. Consider the following
> scenario:
>
> [...]
Here is the summary with links:
- [net-next,v2,1/8] net: core: neighbour: Add a neigh_fill_info() helper for when lock not held
https://git.kernel.org/netdev/net-next/c/77fa50dcb987
- [net-next,v2,2/8] net: core: neighbour: Call __neigh_notify() under a lock
https://git.kernel.org/netdev/net-next/c/0ecdccb8a455
- [net-next,v2,3/8] net: core: neighbour: Extract ARP queue processing to a helper function
https://git.kernel.org/netdev/net-next/c/705ef89ac986
- [net-next,v2,4/8] net: core: neighbour: Process ARP queue later
https://git.kernel.org/netdev/net-next/c/89246ef82d79
- [net-next,v2,5/8] net: core: neighbour: Inline neigh_update_notify() calls
https://git.kernel.org/netdev/net-next/c/a228a7e681b8
- [net-next,v2,6/8] net: core: neighbour: Reorder netlink & internal notification
https://git.kernel.org/netdev/net-next/c/795258891c94
- [net-next,v2,7/8] net: core: neighbour: Make one netlink notification atomically
https://git.kernel.org/netdev/net-next/c/d0887dc8b2d0
- [net-next,v2,8/8] net: core: neighbour: Make another netlink notification atomically
https://git.kernel.org/netdev/net-next/c/a00266969c8e
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 12+ messages in thread