* Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses"
2026-05-29 11:41 ` Stefano Brivio
@ 2026-05-29 11:45 ` Fernando Fernandez Mancera
2026-05-29 12:06 ` Chris Adams
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Fernando Fernandez Mancera @ 2026-05-29 11:45 UTC (permalink / raw)
To: Stefano Brivio
Cc: netdev, yuhuang, justin.iurman, horms, pabeni, kuba, edumazet,
davem, idosch, dsahern, Chris Adams, David Gibson,
Beniamino Galvani, Thorsten Leemhuis, Andrew Lunn, ihuguet,
regressions
On 5/29/26 1:41 PM, Stefano Brivio wrote:
> On Fri, 29 May 2026 13:23:57 +0200
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>
>> Chris Adams reported that preserving insertion order for same-scope
>> addresses is causing SSH connections to be dropped after stopping a VM
>> while running NetworkManager.
>>
>> NetworkManager caches the IPv6 address configuration, when a RA arrives,
>> it determines the list of addresses to configure and checks if the
>> addresses are already in the right order in the kernel. If they aren't,
>> NetworkManager removes and re-adds them to achieve the desired order.
>>
>> As the order changes, NetworkManager is confused and reconfigures the
>> addresses on every update. In addition, this would also affect cloud
>> tooling that relies on IPv6 address order to identify primary and
>> secondary addresses.
>
> By the way, I'm still looking into this part, trying to find
> "problematic" examples.
>
> And I couldn't find any, yet, because it looks like there's always a
> _single_ IPv6 address being used as a secondary for a primary IPv4
> address.
>
IIRC, Azure cloud should be one of them. I do not have an account there
to test it, but some years ago I did some work supporting configuring
IPv6 primary and secondary addresses via IMDSv2.
If someone with an account could test it I would appreciate it.
>>
>> This reverts commit cb3de96eea66f5e4a580086c6a1be46e765f97f4.
>>
>> Fixes: cb3de96eea66 ("ipv6: preserve insertion order for same-scope addresses")
>> Reported-by: Chris Adams <linux@cmadams.net>
>> Closes: https://lore.kernel.org/netdev/20260521135310.GC977@cmadams.net/
>> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
>> ---
>> v2: updated commit description to make it more accurate
>> ---
>> net/ipv6/addrconf.c | 2 +-
>> tools/testing/selftests/net/ioam6.sh | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 5476b6536eb7..bb84a78b80f6 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -1013,7 +1013,7 @@ ipv6_link_dev_addr(struct inet6_dev *idev, struct inet6_ifaddr *ifp)
>> list_for_each(p, &idev->addr_list) {
>> struct inet6_ifaddr *ifa
>> = list_entry(p, struct inet6_ifaddr, if_list);
>> - if (ifp_scope > ipv6_addr_src_scope(&ifa->addr))
>> + if (ifp_scope >= ipv6_addr_src_scope(&ifa->addr))
>> break;
>> }
>>
>> diff --git a/tools/testing/selftests/net/ioam6.sh b/tools/testing/selftests/net/ioam6.sh
>> index b2b99889942f..845c26dd01a9 100755
>> --- a/tools/testing/selftests/net/ioam6.sh
>> +++ b/tools/testing/selftests/net/ioam6.sh
>> @@ -273,8 +273,8 @@ setup()
>> ip -netns $ioam_node_beta link set ioam-veth-betaR name veth1 &>/dev/null
>> ip -netns $ioam_node_gamma link set ioam-veth-gamma name veth0 &>/dev/null
>>
>> - ip -netns $ioam_node_alpha addr add 2001:db8:1::2/64 dev veth0 &>/dev/null
>> ip -netns $ioam_node_alpha addr add 2001:db8:1::50/64 dev veth0 &>/dev/null
>> + ip -netns $ioam_node_alpha addr add 2001:db8:1::2/64 dev veth0 &>/dev/null
>> ip -netns $ioam_node_alpha link set veth0 up &>/dev/null
>> ip -netns $ioam_node_alpha link set lo up &>/dev/null
>> ip -netns $ioam_node_alpha route add 2001:db8:2::/64 \
>
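
For readers following the addrconf.c hunk above, here is a minimal sketch of how the two comparisons order same-scope addresses. It is a toy model in Python, not kernel code: the function names and the `preserve_order` parameter are made up for illustration, and the scope constants only need to satisfy "global is larger than link-local".

```python
# Toy model of the sorted insert in ipv6_link_dev_addr(); not kernel code.
# Scope values are illustrative: a larger value means a wider scope.
GLOBAL, LINK_LOCAL = 14, 2


def link_addr(addr_list, addr, scope, preserve_order):
    """Insert (addr, scope), keeping the list sorted by descending scope.

    preserve_order=True models the '>' comparison (commit cb3de96eea66).
    preserve_order=False models the '>=' comparison restored by the revert.
    """
    pos = len(addr_list)
    for i, (_, existing_scope) in enumerate(addr_list):
        if preserve_order:
            stop = scope > existing_scope
        else:
            stop = scope >= existing_scope
        if stop:
            pos = i
            break
    # Insert before the entry where the walk stopped (or at the end).
    addr_list.insert(pos, (addr, scope))
    return addr_list


def build(preserve_order):
    """Add one link-local and two global addresses, return the final order."""
    addrs = []
    for addr, scope in [("fe80::1", LINK_LOCAL),
                        ("2001:db8:1::2", GLOBAL),
                        ("2001:db8:1::50", GLOBAL)]:
        link_addr(addrs, addr, scope, preserve_order)
    return [a for a, _ in addrs]


if __name__ == "__main__":
    print("'>=' (reverted behaviour):", build(False))
    print("'>'  (insertion order):   ", build(True))
```

In this model, '>=' places the newest same-scope address first and '>' keeps insertion order. That is presumably why the ioam6.sh hunk swaps the two `addr add` lines.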
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses"
2026-05-29 11:41 ` Stefano Brivio
2026-05-29 11:45 ` Fernando Fernandez Mancera
@ 2026-05-29 12:06 ` Chris Adams
2026-06-01 2:03 ` Matthieu Baerts
2026-06-02 6:44 ` IPv6 address insertion order (was Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses") David Gibson
3 siblings, 0 replies; 12+ messages in thread
From: Chris Adams @ 2026-05-29 12:06 UTC (permalink / raw)
To: Stefano Brivio
Cc: Fernando Fernandez Mancera, netdev, yuhuang, justin.iurman, horms,
pabeni, kuba, edumazet, davem, idosch, dsahern, David Gibson,
Beniamino Galvani, Thorsten Leemhuis, Andrew Lunn, ihuguet,
regressions
Once upon a time, Stefano Brivio <sbrivio@redhat.com> said:
> On Fri, 29 May 2026 13:23:57 +0200
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>
> > Chris Adams reported that preserving insertion order for same-scope
> > addresses is causing SSH connections to be dropped after stopping a VM
> > while running NetworkManager.
> >
> > NetworkManager caches the IPv6 address configuration, when a RA arrives,
> > it determines the list of addresses to configure and checks if the
> > addresses are already in the right order in the kernel. If they aren't,
> > NetworkManager removes and re-adds them to achieve the desired order.
> >
> > As the order changes, NetworkManager is confused and reconfigures the
> > addresses on every update. In addition, this would also affect cloud
> > tooling that relies on IPv6 address order to identify primary and
> > secondary addresses.
>
> By the way, I'm still looking into this part, trying to find
> "problematic" examples.
>
> And I couldn't find any, yet, because it looks like there's always a
> _single_ IPv6 address being used as a secondary for a primary IPv4
> address.
I hit two cases that triggered the behavior:
- after an overnight suspend/resume cycle (not sure if there's a minimum
time suspended)
- with a bridge NIC, after stopping a VM with a NIC on the bridge (which
removed an fe80:: link-local address); this is the case I used to
bisect since it was easy to script
In both cases, the next IPv6 router-advertisement received was what
triggered NetworkManager to replace the current privacy addresses. This
could be 2-3 minutes after the trigger.
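
A toy model of the reconfiguration loop described in the commit message may help here. It is a sketch under stated assumptions, not NetworkManager code: the function names are hypothetical, the manager is assumed to re-add addresses in reverse because it expects the kernel to put the newest same-scope address first, and the trigger events listed above are not modelled.

```python
def kernel_add(addrs, addr, newest_first):
    """Toy kernel: place a new same-scope address first or last."""
    if newest_first:
        addrs.insert(0, addr)
    else:
        addrs.append(addr)


def on_router_advertisement(kernel_addrs, desired, newest_first):
    """Hypothetical manager step; returns how many addresses were re-added."""
    if kernel_addrs == desired:
        return 0
    kernel_addrs.clear()            # removal is what would drop connections
    for addr in reversed(desired):  # assumes the kernel puts the newest first
        kernel_add(kernel_addrs, addr, newest_first)
    return len(desired)


def churn(newest_first, rounds=3):
    """Count re-added addresses for a few consecutive RAs."""
    desired = ["2001:db8::a", "2001:db8::b"]
    kernel_addrs = []
    return [on_router_advertisement(kernel_addrs, desired, newest_first)
            for _ in range(rounds)]


if __name__ == "__main__":
    print("newest-first kernel:", churn(True))
    print("insertion-order kernel:", churn(False))
```

With the newest-first kernel the model settles after one pass. With the insertion-order kernel it removes and re-adds on every round, which matches the "reconfigures the addresses on every update" symptom.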
> > This reverts commit cb3de96eea66f5e4a580086c6a1be46e765f97f4.
> >
> > Fixes: cb3de96eea66 ("ipv6: preserve insertion order for same-scope addresses")
> > Reported-by: Chris Adams <linux@cmadams.net>
> > Closes: https://lore.kernel.org/netdev/20260521135310.GC977@cmadams.net/
> > Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
> > ---
> > v2: updated commit description to make it more accurate
> > ---
> > net/ipv6/addrconf.c | 2 +-
> > tools/testing/selftests/net/ioam6.sh | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> > index 5476b6536eb7..bb84a78b80f6 100644
> > --- a/net/ipv6/addrconf.c
> > +++ b/net/ipv6/addrconf.c
> > @@ -1013,7 +1013,7 @@ ipv6_link_dev_addr(struct inet6_dev *idev, struct inet6_ifaddr *ifp)
> > list_for_each(p, &idev->addr_list) {
> > struct inet6_ifaddr *ifa
> > = list_entry(p, struct inet6_ifaddr, if_list);
> > - if (ifp_scope > ipv6_addr_src_scope(&ifa->addr))
> > + if (ifp_scope >= ipv6_addr_src_scope(&ifa->addr))
> > break;
> > }
> >
> > diff --git a/tools/testing/selftests/net/ioam6.sh b/tools/testing/selftests/net/ioam6.sh
> > index b2b99889942f..845c26dd01a9 100755
> > --- a/tools/testing/selftests/net/ioam6.sh
> > +++ b/tools/testing/selftests/net/ioam6.sh
> > @@ -273,8 +273,8 @@ setup()
> > ip -netns $ioam_node_beta link set ioam-veth-betaR name veth1 &>/dev/null
> > ip -netns $ioam_node_gamma link set ioam-veth-gamma name veth0 &>/dev/null
> >
> > - ip -netns $ioam_node_alpha addr add 2001:db8:1::2/64 dev veth0 &>/dev/null
> > ip -netns $ioam_node_alpha addr add 2001:db8:1::50/64 dev veth0 &>/dev/null
> > + ip -netns $ioam_node_alpha addr add 2001:db8:1::2/64 dev veth0 &>/dev/null
> > ip -netns $ioam_node_alpha link set veth0 up &>/dev/null
> > ip -netns $ioam_node_alpha link set lo up &>/dev/null
> > ip -netns $ioam_node_alpha route add 2001:db8:2::/64 \
>
> --
> Stefano
--
Chris Adams <linux@cmadams.net>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses"
2026-05-29 11:41 ` Stefano Brivio
2026-05-29 11:45 ` Fernando Fernandez Mancera
2026-05-29 12:06 ` Chris Adams
@ 2026-06-01 2:03 ` Matthieu Baerts
2026-06-01 13:35 ` Stefano Brivio
2026-06-02 6:44 ` IPv6 address insertion order (was Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses") David Gibson
3 siblings, 1 reply; 12+ messages in thread
From: Matthieu Baerts @ 2026-06-01 2:03 UTC (permalink / raw)
To: Stefano Brivio, Fernando Fernandez Mancera
Cc: netdev, yuhuang, justin.iurman, horms, pabeni, kuba, edumazet,
davem, idosch, dsahern, Chris Adams, David Gibson,
Beniamino Galvani, Thorsten Leemhuis, Andrew Lunn, ihuguet,
regressions
Hi Stefano,
On 29/05/2026 21:41, Stefano Brivio wrote:
> On Fri, 29 May 2026 13:23:57 +0200
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>
>> Chris Adams reported that preserving insertion order for same-scope
>> addresses is causing SSH connections to be dropped after stopping a VM
>> while running NetworkManager.
>>
>> NetworkManager caches the IPv6 address configuration, when a RA arrives,
>> it determines the list of addresses to configure and checks if the
>> addresses are already in the right order in the kernel. If they aren't,
>> NetworkManager removes and re-adds them to achieve the desired order.
>>
>> As the order changes, NetworkManager is confused and reconfigures the
>> addresses on every update. In addition, this would also affect cloud
>> tooling that relies on IPv6 address order to identify primary and
>> secondary addresses.
>
> By the way, I'm still looking into this part, trying to find
> "problematic" examples.
>
> And I couldn't find any, yet, because it looks like there's always a
> _single_ IPv6 address being used as a secondary for a primary IPv4
> address.
FYI, the order change also affected some specific scripts, e.g. here
with MPTCP and packetdrill:
https://github.com/multipath-tcp/packetdrill/commit/1b7cd4482ce8
Because the order was not "natural" before, and different from IPv4, a
workaround was needed to keep the same order. I was happy to remove it,
but now it looks like I need to re-apply it :)
It would be nice to get the "natural" order back without breaking the
userspace (or with a way to choose the order).
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses"
2026-06-01 2:03 ` Matthieu Baerts
@ 2026-06-01 13:35 ` Stefano Brivio
2026-06-01 14:01 ` Íñigo Huguet
0 siblings, 1 reply; 12+ messages in thread
From: Stefano Brivio @ 2026-06-01 13:35 UTC (permalink / raw)
To: Matthieu Baerts
Cc: Fernando Fernandez Mancera, netdev, yuhuang, justin.iurman, horms,
pabeni, kuba, edumazet, davem, idosch, dsahern, Chris Adams,
David Gibson, Beniamino Galvani, Thorsten Leemhuis, Andrew Lunn,
ihuguet, regressions
On Mon, 1 Jun 2026 12:03:59 +1000
Matthieu Baerts <matttbe@kernel.org> wrote:
> Hi Stefano,
>
> On 29/05/2026 21:41, Stefano Brivio wrote:
> > On Fri, 29 May 2026 13:23:57 +0200
> > Fernando Fernandez Mancera <fmancera@suse.de> wrote:
> >
> >> Chris Adams reported that preserving insertion order for same-scope
> >> addresses is causing SSH connections to be dropped after stopping a VM
> >> while running NetworkManager.
> >>
> >> NetworkManager caches the IPv6 address configuration, when a RA arrives,
> >> it determines the list of addresses to configure and checks if the
> >> addresses are already in the right order in the kernel. If they aren't,
> >> NetworkManager removes and re-adds them to achieve the desired order.
> >>
> >> As the order changes, NetworkManager is confused and reconfigures the
> >> addresses on every update. In addition, this would also affect cloud
> >> tooling that relies on IPv6 address order to identify primary and
> >> secondary addresses.
> >
> > By the way, I'm still looking into this part, trying to find
> > "problematic" examples.
> >
> > And I couldn't find any, yet, because it looks like there's always a
> > _single_ IPv6 address being used as a secondary for a primary IPv4
> > address.
>
> FYI, the order change also affected some specific scripts, e.g. here
> with MPTCP and packetdrill:
>
> https://github.com/multipath-tcp/packetdrill/commit/1b7cd4482ce8
>
> Because the order was not "natural" before, and different from IPv4, a
> workaround was needed to keep the same order. I was happy to remove it,
> but now it looks like I need to re-apply it :)
Ouch. :) We were also pondering some kind of workaround like
that (https://bugs.passt.top/show_bug.cgi?id=175#c9) but fixing the
kernel looked simpler and the right thing to do... until last week.
> It would be nice to get the "natural" order back without breaking the
> userspace (or with a way to choose the order).
I was thinking that if we implement a label like NLM_F_INSERT_LAST
(David's proposal for the name), we could patch iproute2 to set it on
'ip address restore' at least, other than using it in pasta(1).
But that wouldn't be enough for your case. At the same time always
adding it for RTM_NEWADDR requests in iproute2 could break somebody
else's scripts. I guess a reasonable solution could be to add an
additional parameter for ip-address... 'insert_last'? 'last'?
--
Stefano
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses"
2026-06-01 13:35 ` Stefano Brivio
@ 2026-06-01 14:01 ` Íñigo Huguet
2026-06-01 14:22 ` Thorsten Leemhuis
0 siblings, 1 reply; 12+ messages in thread
From: Íñigo Huguet @ 2026-06-01 14:01 UTC (permalink / raw)
To: Stefano Brivio
Cc: Matthieu Baerts, Fernando Fernandez Mancera, netdev, yuhuang,
justin.iurman, horms, pabeni, kuba, edumazet, davem, idosch,
dsahern, Chris Adams, David Gibson, Beniamino Galvani,
Thorsten Leemhuis, Andrew Lunn, regressions
On Mon, Jun 1, 2026 at 3:35 PM Stefano Brivio <sbrivio@redhat.com> wrote:
> I was thinking that if we implement a label like NLM_F_INSERT_LAST
> (David's proposal for the name), we could patch iproute2 to set it on
> 'ip address restore' at least, other than using it in pasta(1).
I'm not sure whether what I'm going to say is a good idea, or just a
plain stupid proposal:
By reverting to the old odd behaviour, UAPI is restored, yes. On the
other hand, userspace programs won't have any incentive to start using
NLM_F_INSERT_LAST, which is actually the most desirable behaviour. The
old odd behaviour is very error prone, as we can see with the `ip addr
restore` bug. That's why programs should adopt the "new" behaviour and
this will save them from introducing bugs like that one in the future.
And new programs should get the normal insertion by default, without
needing to read documentation or kernel headers to find
NLM_F_INSERT_LAST.
What if we reapply the patch and add a NLM_F_INSERT_FIRST /
NLM_F_PREPEND option instead? This will break UAPI, which is bad, but
probably not too bad per the conversations in this and the other
thread. Userspace programs affected by the change, like
NetworkManager, will need to be fixed, either by switching to natural
order insertion, or by passing the NLM_F_INSERT_FIRST flag to restore
the old behaviour.
Additionally, a NLM_F_INSERT_FIRST can be implemented for IPv4 too, at
least for consistency (I don't know whether it may be actually useful
for something, likely yes?).
As I said, this solution would break a few programs like
NetworkManager, but the fix is as easy as `if (ipv6) flags |=
NLM_F_INSERT_FIRST`. Would old kernels ignore this unknown flag?
Because, in that case, we don't even need a detection mechanism from
NetworkManager.
>
> But that wouldn't be enough for your case. At the same time always
> adding it for RTM_NEWADDR requests in iproute2 could break somebody
> else's scripts. I guess a reasonable solution could be to add an
> additional parameter for ip-address... 'insert_last'? 'last'?
>
> --
> Stefano
>
--
Íñigo Huguet
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses"
2026-06-01 14:01 ` Íñigo Huguet
@ 2026-06-01 14:22 ` Thorsten Leemhuis
0 siblings, 0 replies; 12+ messages in thread
From: Thorsten Leemhuis @ 2026-06-01 14:22 UTC (permalink / raw)
To: Íñigo Huguet, Stefano Brivio
Cc: Matthieu Baerts, Fernando Fernandez Mancera, netdev, yuhuang,
justin.iurman, horms, pabeni, kuba, edumazet, davem, idosch,
dsahern, Chris Adams, David Gibson, Beniamino Galvani,
Andrew Lunn, regressions
On 6/1/26 16:01, Íñigo Huguet wrote:
> As I said, this solution would break a few programs like
> NetworkManager
FWIW, that is not allowed in the Linux kernel; see
https://www.kernel.org/doc/html/latest/process/handling-regressions.html#on-situations-where-updating-something-in-userspace-can-resolve-regressions
and other sections on that page.
As mentioned earlier, there are nevertheless solutions for this kind of
problem, but they are all somewhat ugly -- like some bit in /proc/ a
fixed NetworkManager could flip at boot time, as then everything will
continue to work for those that update the kernel without updating
NetworkManager.
Ciao, Thorsten
^ permalink raw reply [flat|nested] 12+ messages in thread
* IPv6 address insertion order (was Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses")
2026-05-29 11:41 ` Stefano Brivio
` (2 preceding siblings ...)
2026-06-01 2:03 ` Matthieu Baerts
@ 2026-06-02 6:44 ` David Gibson
2026-06-02 12:46 ` Andrew Lunn
2026-06-02 13:21 ` Ido Schimmel
3 siblings, 2 replies; 12+ messages in thread
From: David Gibson @ 2026-06-02 6:44 UTC (permalink / raw)
To: Stefano Brivio
Cc: Fernando Fernandez Mancera, netdev, yuhuang, justin.iurman, horms,
pabeni, kuba, edumazet, davem, idosch, dsahern, Chris Adams,
Beniamino Galvani, Thorsten Leemhuis, Andrew Lunn, ihuguet,
regressions
I get the impression there's a rough consensus that the best we can do
now is revert this change (already done), and make a new patch which
changes the insertion order to the "correct" one conditional on a new
flag.
Stefano has enough other fires to fight, so I'm taking a look at
implementing that. Some initial thoughts, that I'm soliciting
feedback on:
1) I'm assuming the idea here is to add the new flag to nlmsg_flags in
nlmsghdr
ifa_flags in ifaddrmsg would be the other candidate, but it looks like
it's encoding properties of the address itself, not about the action
of inserting it. Plus all its bits are allocated, anyway.
2) Could we re-use NLM_F_APPEND?
The short description of this existing flag in linux/uapi/netlink.h is
"Add to end of list" which sounds like the right thing. Looking
closer, however, it seems like what it's used for so far is things
where the entity added with the NEW<whatever> operation is itself a
list, and NLM_F_APPEND causes it to be added to rather than replaced.
It's not used for addresses at present, AFAICT the list of addresses
is a semantic level above the address entity itself.
So maybe re-using it for the thing I tentatively called
NLM_F_INSERT_LAST would be confusing?
On the other hand, it's not used for addresses at the moment, so
AFAICT there's nothing actually preventing us reusing it for this
purpose. That would save a bit - we only have 2 general and 4 NEW
specific bits left, by the looks of it.
3) What other things might need changing to take advantage of the
kernel change
My initial testing suggests that unknown nlmsg_flags bits are
currently ignored by the kernel. That means that tools for which the
new behaviour is desirable, but not essential, may be able to avoid
probing - they can set the bit and hope. I think that includes
passt/pasta and also iproute's save/restore functionality.
That said, things which might want updating:
- iproute2 to use this for save/restore
- netlink(7) man page to detail the new flag / new flag semantics
- passt/pasta
- network-manager, here we require the exact behaviour be maintained,
so it would have to attempt the new flag, then apply the existing
workaround (reversing the order itself) if that fails. Maybe more
trouble than it's worth?
Any others people can think of?
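
To make point 1) concrete, here is a rough sketch of where such a bit would sit in an RTM_NEWADDR request. It only packs the headers and sends nothing. NLM_F_INSERT_LAST and its value 0x1000 are placeholders invented for this example; the other constants are the existing values from the uapi headers.

```python
import struct

RTM_NEWADDR = 20

# Existing nlmsg_flags bits.
NLM_F_REQUEST = 0x001
NLM_F_ACK = 0x004
NLM_F_EXCL = 0x200
NLM_F_CREATE = 0x400
NLM_F_APPEND = 0x800

# HYPOTHETICAL: no such flag exists; 0x1000 is an arbitrary placeholder.
NLM_F_INSERT_LAST = 0x1000

NLMSGHDR = struct.Struct("=IHHII")   # nlmsg_len, type, flags, seq, pid
IFADDRMSG = struct.Struct("=BBBBI")  # family, prefixlen, flags, scope, index


def newaddr_request(flags, ifindex, prefixlen=64, seq=1):
    """Pack nlmsghdr + ifaddrmsg for an IPv6 RTM_NEWADDR (no attributes)."""
    af_inet6 = 10  # AF_INET6 on Linux
    body = IFADDRMSG.pack(af_inet6, prefixlen, 0, 0, ifindex)
    header = NLMSGHDR.pack(NLMSGHDR.size + len(body), RTM_NEWADDR,
                           flags, seq, 0)
    return header + body


flags = (NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL
         | NLM_F_INSERT_LAST)
msg = newaddr_request(flags, ifindex=1)

if __name__ == "__main__":
    print(f"nlmsg_flags = {flags:#06x}, message length = {len(msg)}")
```

The ordering request travels in the 16-bit nlmsg_flags field of the netlink header, while the 8-bit ifa_flags field in ifaddrmsg is left untouched.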
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: IPv6 address insertion order (was Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses")
2026-06-02 6:44 ` IPv6 address insertion order (was Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses") David Gibson
@ 2026-06-02 12:46 ` Andrew Lunn
2026-06-02 13:21 ` Ido Schimmel
1 sibling, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2026-06-02 12:46 UTC (permalink / raw)
To: David Gibson
Cc: Stefano Brivio, Fernando Fernandez Mancera, netdev, yuhuang,
justin.iurman, horms, pabeni, kuba, edumazet, davem, idosch,
dsahern, Chris Adams, Beniamino Galvani, Thorsten Leemhuis,
ihuguet, regressions
On Tue, Jun 02, 2026 at 04:44:19PM +1000, David Gibson wrote:
> I get the impression there's a rough consensus that the best we can do
> now is revert this change (already done), and make a new patch which
> changes the insertion order to the "correct" one conditional on a new
> flag.
>
> Stefano has enough other fires to fight, so I'm taking a look at
> implementing that. Some initial thoughts, that I'm soliciting
> feedback on:
I've only been partially reading along...
Are we talking about RTM_NEWADDR?
I've never worked on the code dealing with addresses. But in general,
if you want to add new functionality to a netlink message, you add a
new attribute to the message.
https://elixir.bootlin.com/linux/v7.0.10/source/include/uapi/linux/if_addr.h#L26
Andrew
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: IPv6 address insertion order (was Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses")
2026-06-02 6:44 ` IPv6 address insertion order (was Re: [PATCH net v2] Revert "ipv6: preserve insertion order for same-scope addresses") David Gibson
2026-06-02 12:46 ` Andrew Lunn
@ 2026-06-02 13:21 ` Ido Schimmel
1 sibling, 0 replies; 12+ messages in thread
From: Ido Schimmel @ 2026-06-02 13:21 UTC (permalink / raw)
To: David Gibson
Cc: Stefano Brivio, Fernando Fernandez Mancera, netdev, yuhuang,
justin.iurman, horms, pabeni, kuba, edumazet, davem, dsahern,
Chris Adams, Beniamino Galvani, Thorsten Leemhuis, Andrew Lunn,
ihuguet, regressions
On Tue, Jun 02, 2026 at 04:44:19PM +1000, David Gibson wrote:
> I get the impression there's a rough consensus that the best we can do
> now is revert this change (already done), and make a new patch which
> changes the insertion order to the "correct" one conditional on a new
> flag.
>
> Stefano has enough other fires to fight, so I'm taking a look at
> implementing that. Some initial thoughts, that I'm soliciting
> feedback on:
>
> 1) I'm assuming the idea here is to add the new flag to nlmsg_flags in
> nlmsghdr
>
> ifa_flags in ifaddrmsg would be the other candidate, but it looks like
> it's encoding properties of the address itself, not about the action
> of inserting it. Plus all its bits are allocated, anyway.
>
> 2) Could we re-use NLM_F_APPEND?
>
> The short description of this existing flag in linux/uapi/netlink.h is
> "Add to end of list" which sounds like the right thing. Looking
> closer, however, it seems like what it's used for so far is things
> where the entity added with the NEW<whatever> operation is itself a
> list, and NLM_F_APPEND causes it to be added to rather than replaced.
> It's not used for addresses at present, AFAICT the list of addresses
> is a semantic level above the address entity itself.
>
> So maybe re-using it for the thing I tentatively called
> NLM_F_INSERT_LAST would be confusing?
>
> On the other hand, it's not used for addresses at the moment, so
> AFAICT there's nothing actually preventing us reusing it for this
> purpose. That would save a bit - we only have 2 general and 4 NEW
> specific bits left, by the looks of it.
This is not really viable. If the kernel is not using NLM_F_APPEND for
RTM_NEWADDR, but is not rejecting its presence either, then giving it a
meaning can change behavior for any user space that is currently setting
it (intentionally or not).
Example:
https://lore.kernel.org/netdev/27c249d80c346a258cfbf32f1d131ad4fe64e77c.camel@debian.org/
>
> 3) What other things might need changing to take advantage of the
> kernel change
>
> My initial testing suggests that unknown nlmsg_flags bits are
> currently ignored by the kernel.
Which is a problem... From the looks of it, the same is true for
ifa_flags.
> That means that tools for which the new behaviour is desirable, but not
> essential may be able to avoid probing - they can set the bit and
> hope. I think that includes passt/pasta and also iproute's
> save/restore functionality.
>
> That said, things which might want updating:
> - iproute2 to use this for save/restore
> - netlink(7) man page to detail the new flag / new flag semantics
> - passt/pasta
> - network-manager, here we require the exact behaviour be maintained,
> so it would have to attempt the new flag, then apply the existing
> workaround (reversing the order itself) if that fails. Maybe more
>   trouble than it's worth?
>
> Any others people can think of?
The safest route is probably a new attribute and a corresponding keyword
in iproute2. Note that new iproute2 versions need to be able to work
with old kernels, so iproute2 cannot use the new attribute by default
(it will be rejected by old kernels).
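
As a rough illustration of the attribute route, the sketch below encodes the attribute payload of an RTM_NEWADDR request with an opt-in, zero-length flag attribute. IFA_INSERT_LAST and its number 12 are placeholders invented for this example, as is the `insert_last` parameter, which stands in for a possible iproute2 keyword. IFA_ADDRESS and IFA_LOCAL are the existing values from if_addr.h.

```python
import socket
import struct

IFA_ADDRESS = 1
IFA_LOCAL = 2

# HYPOTHETICAL: no such attribute exists; 12 is an arbitrary placeholder.
IFA_INSERT_LAST = 12


def nlattr(attr_type, payload=b""):
    """Encode one netlink attribute: 4-byte header, payload, pad to 4 bytes."""
    length = 4 + len(payload)
    padding = (-length) % 4
    return struct.pack("=HH", length, attr_type) + payload + b"\0" * padding


def newaddr_attrs(address, insert_last=False):
    """Attributes for an IPv6 RTM_NEWADDR; the new one is opt-in only."""
    raw = socket.inet_pton(socket.AF_INET6, address)
    attrs = nlattr(IFA_LOCAL, raw) + nlattr(IFA_ADDRESS, raw)
    if insert_last:
        # Off by default, since older kernels would reject it.
        attrs += nlattr(IFA_INSERT_LAST)
    return attrs


if __name__ == "__main__":
    print(len(newaddr_attrs("2001:db8:1::2")))
    print(len(newaddr_attrs("2001:db8:1::2", insert_last=True)))
```

Keeping the attribute off unless the caller asks for it mirrors the constraint above that a new iproute2 must keep working against old kernels.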
^ permalink raw reply [flat|nested] 12+ messages in thread