public inbox for linux-kernel@vger.kernel.org
* Wrong source address selection in arp_solicit for forwarded packets
@ 2025-10-17 14:47 Gabriel Goller
  2025-10-20 14:06 ` Ido Schimmel
  0 siblings, 1 reply; 6+ messages in thread
From: Gabriel Goller @ 2025-10-17 14:47 UTC
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netdev, linux-kernel

Hi,
I have a question about the arp solicit behavior:

I have the following simple infrastructure with linux hosts where the ip
addresses are configured on dummy interfaces and all other interfaces are
unnumbered:

   ┌────────┐     ┌────────┐     ┌────────┐  
   │ node1  ├─────┤ node2  ├─────┤ node3  │  
   │10.0.1.1│     │10.0.1.2│     │10.0.1.3│  
   └────────┘     └────────┘     └────────┘  

All nodes have routes configured and can ping each other. ipv4 forwarding is
enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
encountering an issue where node2 does not send correct arp solicitation
packets when forwarding icmp packets from node1 to node3.

For example, when pinging from node1 to node3, node2 sends out the
following arp packet:

13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28

Here, 172.16.0.102 is an ip address configured on a different interface on
node2. This request will never receive a response because `rp_filter=2`.

node2 has the following (correct) routes installed:

10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink

Since arp_announce is set to 0 (the default), arp_solicit selects the first
interface with an ip address (inet_select_addr), which results in
selecting the wrong source address (172.16.0.102) for the arp request.
Because rp_filter is set to 2, we won't receive an answer to this arp
packet, and the ping will fail unless we explicitly ping from node2 to
node3.
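
The mismatch described above can be checked mechanically. The following is a
minimal sketch and not part of the original report: the helper names
arp_tell_addr and route_src are hypothetical, and the two sample lines are
copied from the capture and the route shown above. It extracts the sender
address from a tcpdump ARP request and the `src` hint from an `ip route`
line so that the two can be compared.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they only parse text on stdin.

# Print the sender IP of a tcpdump ARP request ("... who-has X tell Y, ...").
arp_tell_addr() {
	sed -n 's/.* tell \([0-9.]*\),.*/\1/p'
}

# Print the preferred source address ("src A.B.C.D") of an `ip route` line.
route_src() {
	sed -n 's/.* src \([0-9.]*\).*/\1/p'
}

# Sample input copied from the report above.
capture='IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28'
route='10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink'

tell=$(printf '%s\n' "$capture" | arp_tell_addr)
src=$(printf '%s\n' "$route" | route_src)

if [ "$tell" != "$src" ]; then
	echo "mismatch: ARP sender is $tell, route prefers $src"
fi
```

On a live system the same functions could presumably be fed from tcpdump and
`ip route show` output instead of the hard-coded sample lines.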

I'm wondering if it would be possible (and correct) to modify arp_solicit to
perform a fib lookup to check if there's a route with an explicit source
address (e.g., the route above using src 10.0.1.2) and use that address as the
source address for the arp packet. Of course, this wouldn't be backward
compatible, as some users might rely on the current interface ordering behavior
(or the loopback interface being selected first), so it would need to be
controlled via a sysctl configuration flag. Perhaps I'm missing something
obvious here though.

Any insights would be appreciated!

Gabriel



* Re: Wrong source address selection in arp_solicit for forwarded packets
  2025-10-17 14:47 Wrong source address selection in arp_solicit for forwarded packets Gabriel Goller
@ 2025-10-20 14:06 ` Ido Schimmel
  2025-10-20 14:27   ` Maciej W. Rozycki
  2025-10-21 12:31   ` Gabriel Goller
  0 siblings, 2 replies; 6+ messages in thread
From: Ido Schimmel @ 2025-10-20 14:06 UTC
  To: g.goller
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel

On Fri, Oct 17, 2025 at 04:47:27PM +0200, Gabriel Goller wrote:
> Hi,
> I have a question about the arp solicit behavior:
> 
> I have the following simple infrastructure with linux hosts where the ip
> addresses are configured on dummy interfaces and all other interfaces are
> unnumbered:
> 
>   ┌────────┐     ┌────────┐     ┌────────┐    │ node1  ├─────┤ node2
> ├─────┤ node3  │    │10.0.1.1│     │10.0.1.2│     │10.0.1.3│    └────────┘
> └────────┘     └────────┘

The diagram looks mangled. At least I don't understand it.

> 
> All nodes have routes configured and can ping each other. ipv4 forwarding is
> enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
> encountering an issue where node2 does not send correct arp solicitation
> packets when forwarding icmp packets from node1 to node3.

I believe ICMP is irrelevant here.

> 
> For example, when pinging from node1 to node3, node2 sends out the
> following arp packet:
> 
> 13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
> length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
> IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28
> 
> Here, 172.16.0.102 is an ip address configured on a different interface on
> node2. This request will never receive a response because `rp_filter=2`.
> 
> node2 has the following (correct) routes installed:
> 
> 10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink
> 
> Since arp_announce is set to 0 (the default), arp_solicit selects the first
> interface with an ip address (inet_select_addr), which results in
> selecting the wrong source address (172.16.0.102) for the arp request.
> Because rp_filter is set to 2, we won't receive an answer to this arp
> packet, and the ping will fail unless we explicitly ping from node2 to
> node3.
> 
> I'm wondering if it would be possible (and correct) to modify arp_solicit to
> perform a fib lookup to check if there's a route with an explicit source
> address (e.g., the route above using src 10.0.1.2) and use that address as the
> source address for the arp packet. Of course, this wouldn't be backward
> compatible, as some users might rely on the current interface ordering behavior
> (or the loopback interface being selected first), so it would need to be
> controlled via a sysctl configuration flag. Perhaps I'm missing something
> obvious here though.

This would probably entail adding a new arp_announce level, but nobody
has added a new level in at least 20 years, so you will need to explain
why your setup is special and why the same functionality cannot be
achieved in a different way that does not require kernel changes.
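
For reference, the kernel's ip-sysctl documentation describes three existing
arp_announce levels and states that the maximum of the conf/all and
conf/{interface} values is used. The sketch below illustrates that rule; the
helper names effective_arp_announce and describe_arp_announce are
hypothetical, and the interface name ens22 in the comments is taken from the
route quoted above.

```shell
#!/bin/bash
# Hypothetical helper: the effective level is max(conf/all, conf/<iface>).
effective_arp_announce() {
	local all=$1 iface=$2
	if [ "$all" -gt "$iface" ]; then echo "$all"; else echo "$iface"; fi
}

# Short paraphrase of the documented meaning of each level.
describe_arp_announce() {
	case "$1" in
	0) echo "any local address on any interface" ;;
	1) echo "avoid addresses outside the target's subnet on this interface" ;;
	2) echo "always use the best local address for this target" ;;
	*) echo "unknown level" ;;
	esac
}

# On a live system the two inputs would come from:
#   sysctl -n net.ipv4.conf.all.arp_announce
#   sysctl -n net.ipv4.conf.ens22.arp_announce
level=$(effective_arp_announce 0 0)
echo "level $level: $(describe_arp_announce "$level")"
```

A route-based source selection, as proposed in the original message, is not
among these levels, which is why a new one would be needed.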

A few things you can consider:

1. You wrote that the router interfaces are unnumbered. Modern
unnumbered networks usually assign IPv6 link-local addresses to these
interfaces. These addresses are only used for neighbour resolution and
can be used as the nexthop address for IPv4 routes. For example:

ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev dummy1

Or using nexthop objects:

ip nexthop add id 1 via fe80::1 dev dummy1
ip route add 192.0.2.1/32 nhid 1

2. If you have interfaces whose addresses should not be considered as
source addresses when generating IP/ARP packets out of other interfaces,
then you can try placing them in a different VRF if it's viable.

3. Requires some work and I didn't look too much into it, but I believe
it should be possible to derive the preferred source address and rewrite
it in ARP packets using tc-bpf on egress. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dab4e1f06cabb6834de14264394ccab197007302
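
Option 2 above (moving the interface with the unwanted address into a VRF)
could look roughly like the following sketch. The VRF name, table id, and
interface name are assumptions made for illustration; the thread does not
name the interface that carries 172.16.0.102. By default the script only
prints the commands; setting RUN=1 as root would execute them.

```shell
#!/bin/bash
# Sketch only: all three values below are hypothetical.
VRF=vrf-other      # VRF device name (assumption)
TABLE=100          # routing table id for the VRF (assumption)
IFACE=ens19        # interface assumed to carry 172.16.0.102

# Print the command by default; execute it only when RUN=1 is set.
run() {
	if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

run ip link add "$VRF" type vrf table "$TABLE"
run ip link set dev "$VRF" up
run ip link set dev "$IFACE" master "$VRF"
```

Addresses on an interface enslaved to the VRF should then no longer be
candidates when a source address is chosen for ARP requests sent out of
interfaces in the default VRF.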


* Re: Wrong source address selection in arp_solicit for forwarded packets
  2025-10-20 14:06 ` Ido Schimmel
@ 2025-10-20 14:27   ` Maciej W. Rozycki
  2025-10-21 12:31   ` Gabriel Goller
  1 sibling, 0 replies; 6+ messages in thread
From: Maciej W. Rozycki @ 2025-10-20 14:27 UTC
  To: Ido Schimmel
  Cc: g.goller, David S. Miller, dsahern, edumazet, kuba, pabeni, horms,
	netdev, linux-kernel

On Mon, 20 Oct 2025, Ido Schimmel wrote:

> > I have the following simple infrastructure with linux hosts where the ip
> > addresses are configured on dummy interfaces and all other interfaces are
> > unnumbered:
> > 
> >   ┌────────┐     ┌────────┐     ┌────────┐    │ node1  ├─────┤ node2
> > ├─────┤ node3  │    │10.0.1.1│     │10.0.1.2│     │10.0.1.3│    └────────┘
> > └────────┘     └────────┘
> 
> The diagram looks mangled. At least I don't understand it.

 It's been broken by:

Content-Type: text/plain; charset=utf-8; format=flowed

Cf. Documentation/process/email-clients.rst.  Raw message contents look 
good.

 HTH,

  Maciej


* Re: Wrong source address selection in arp_solicit for forwarded packets
  2025-10-20 14:06 ` Ido Schimmel
  2025-10-20 14:27   ` Maciej W. Rozycki
@ 2025-10-21 12:31   ` Gabriel Goller
  2025-10-21 15:56     ` Ido Schimmel
  1 sibling, 1 reply; 6+ messages in thread
From: Gabriel Goller @ 2025-10-21 12:31 UTC
  To: Ido Schimmel
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel

On 20.10.2025 17:06, Ido Schimmel wrote:
> On Fri, Oct 17, 2025 at 04:47:27PM +0200, Gabriel Goller wrote:
> > Hi,
> > I have a question about the arp solicit behavior:
> > 
> > I have the following simple infrastructure with linux hosts where the ip
> > addresses are configured on dummy interfaces and all other interfaces are
> > unnumbered:
> > 
> >   ┌────────┐     ┌────────┐     ┌────────┐    │ node1  ├─────┤ node2
> > ├─────┤ node3  │    │10.0.1.1│     │10.0.1.2│     │10.0.1.3│    └────────┘
> > └────────┘     └────────┘
> 
> The diagram looks mangled. At least I don't understand it.

Ah sorry about that, looks like I had format=flowed configured on my
client.

Diagram should be correct now:

   ┌────────┐     ┌────────┐     ┌────────┐
   │ node1  ├─────┤ node2  ├─────┤ node3  │
   │10.0.1.1│     │10.0.1.2│     │10.0.1.3│
   └────────┘     └────────┘     └────────┘

If it's still not right, it is rendered correctly on lore:
https://lore.kernel.org/netdev/eykjh3y2bse2tmhn5rn2uvztoepkbnxpb7n2pvwq62pjetdu7o@r46lgxf4azz7/

> > All nodes have routes configured and can ping each other. ipv4 forwarding is
> > enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
> > encountering an issue where node2 does not send correct arp solicitation
> > packets when forwarding icmp packets from node1 to node3.
> 
> I believe ICMP is irrelevant here.

Yep, ICMP is just an example.

> > For example, when pinging from node1 to node3, node2 sends out the
> > following arp packet:
> > 
> > 13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
> > length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
> > IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28
> > 
> > Here, 172.16.0.102 is an ip address configured on a different interface on
> > node2. This request will never receive a response because `rp_filter=2`.
> > 
> > node2 has the following (correct) routes installed:
> > 
> > 10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink
> > 
> > Since arp_announce is set to 0 (the default), arp_solicit selects the first
> > interface with an ip address (inet_select_addr), which results in
> > selecting the wrong source address (172.16.0.102) for the arp request.
> > Because rp_filter is set to 2, we won't receive an answer to this arp
> > packet, and the ping will fail unless we explicitly ping from node2 to
> > node3.
> > 
> > I'm wondering if it would be possible (and correct) to modify arp_solicit to
> > perform a fib lookup to check if there's a route with an explicit source
> > address (e.g., the route above using src 10.0.1.2) and use that address as the
> > source address for the arp packet. Of course, this wouldn't be backward
> > compatible, as some users might rely on the current interface ordering behavior
> > (or the loopback interface being selected first), so it would need to be
> > controlled via a sysctl configuration flag. Perhaps I'm missing something
> > obvious here though.
> 
> This would probably entail adding a new arp_announce level, but nobody
> added a new level in at least 20 years, so you will need to explain why
> your setup is special and why the same functionality cannot be achieved
> in a different way that does not require kernel changes.

To add a bit more context, I'm using FRR on all nodes and the dummy
interface ips are distributed using OpenFabric. But this shouldn't
matter because the routes are inserted correctly and work fine.

> A few things you can consider:
> 
> 1. You wrote that the router interfaces are unnumbered. Modern
> unnumbered networks usually assign IPv6 link-local addresses to these
> interfaces. These addresses are only used for neighbour resolution and
> can be used as the nexthop address for IPv4 routes. For example:
> 
> ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev dummy1
> 
> Or using nexthop objects:
> 
> ip nexthop add id 1 via fe80::1 dev dummy1
> ip route add 192.0.2.1/32 nhid 1

Hmm, I don't see how this would help. There is a link-local address set
on the interface, but we would then have to put an IPv6 source address
into the ARP packet, which wouldn't be right?

The route already exists (see `dev ens22` and `onlink`).

> 2. If you have interfaces whose addresses should not be considered as
> source addresses when generating IP/ARP packets out of other interfaces,
> then you can try placing them in a different VRF if it's viable.

Yep, this is definitely a solution, as the "loopback" of a VRF is its
master device. Still, what if the master device or the loopback device
has multiple IPs?

> 3. Requires some work and I didn't look too much into it, but I believe
> it should be possible to derive the preferred source address and rewrite
> it in ARP packets using tc-bpf on egress. See:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dab4e1f06cabb6834de14264394ccab197007302

Yeah ebpf is definitely also a solution, but IMO this is a bit of a
weird behavior and should be fixed in the kernel.

We have all the information we need (from the routes) and just need to
use it to select the correct source address, instead of giving up and
selecting randomly.


Thanks for the answer!
Gabriel



* Re: Wrong source address selection in arp_solicit for forwarded packets
  2025-10-21 12:31   ` Gabriel Goller
@ 2025-10-21 15:56     ` Ido Schimmel
  2025-10-22  8:57       ` Gabriel Goller
  0 siblings, 1 reply; 6+ messages in thread
From: Ido Schimmel @ 2025-10-21 15:56 UTC
  To: g.goller
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel

On Tue, Oct 21, 2025 at 02:31:51PM +0200, Gabriel Goller wrote:
> Hmm I don't know how this would help? There is a link-local address set
> on the interface, but we would have to add a ipv6 source address to the
> arp packet which wouldn't be right?

There are no ARP packets. Neighbour resolution is performed via IPv6
NS/NA messages. The script below [1] replicates your setup as I
understand it, but it uses IPv6 link-local addresses for the nexthops.

[1]
#!/bin/bash
# Reproduces the three-node topology in network namespaces, using IPv6
# link-local nexthops for the IPv4 routes.

# Delete the namespaces (errors are ignored if they do not exist).
cleanup() {
	for i in {1..3}; do
		ip netns del node${i} &> /dev/null
	done
}

trap cleanup EXIT

cleanup

# One namespace per node: forwarding on, loose rp_filter, /32 on a dummy device.
for i in {1..3}; do
	ip netns add node${i}
	ip netns exec node${i} sysctl -wq net.ipv4.conf.all.forwarding=1
	ip netns exec node${i} sysctl -wq net.ipv4.conf.all.rp_filter=2
	ip -n node${i} link set dev lo up
	ip -n node${i} link add name dummy up type dummy
	ip -n node${i} address add 10.0.1.${i}/32 dev dummy
done

# Links: node1 <-> node2 and node2 <-> node3.
ip -n node1 link add name veth1 type veth peer name veth2 netns node2
ip -n node2 link add name veth3 type veth peer name veth4 netns node3

ip -n node1 link set dev veth1 up
ip -n node2 link set dev veth2 up
ip -n node2 link set dev veth3 up
ip -n node3 link set dev veth4 up

# Manually assigned IPv6 link-local addresses on the veth interfaces; no IPv4 addresses.
ip -n node1 address add fe80::1/64 dev veth1 nodad
ip -n node2 address add fe80::2/64 dev veth2 nodad
ip -n node2 address add fe80::3/64 dev veth3 nodad
ip -n node3 address add fe80::4/64 dev veth4 nodad

# IPv4 host routes with a preferred source address and an IPv6 nexthop.
ip -n node1 route add 10.0.1.2/32 src 10.0.1.1 nexthop via inet6 fe80::2 dev veth1
ip -n node1 route add 10.0.1.3/32 src 10.0.1.1 nexthop via inet6 fe80::2 dev veth1
ip -n node2 route add 10.0.1.1/32 src 10.0.1.2 nexthop via inet6 fe80::1 dev veth2
ip -n node2 route add 10.0.1.3/32 src 10.0.1.2 nexthop via inet6 fe80::4 dev veth3
ip -n node3 route add 10.0.1.1/32 src 10.0.1.3 nexthop via inet6 fe80::3 dev veth4
ip -n node3 route add 10.0.1.2/32 src 10.0.1.3 nexthop via inet6 fe80::3 dev veth4

# Check the forwarded path (via node2) first, then the directly connected node.
ip netns exec node1 ping 10.0.1.3 -c 5
ip netns exec node1 ping 10.0.1.2 -c 5


* Re: Wrong source address selection in arp_solicit for forwarded packets
  2025-10-21 15:56     ` Ido Schimmel
@ 2025-10-22  8:57       ` Gabriel Goller
  0 siblings, 0 replies; 6+ messages in thread
From: Gabriel Goller @ 2025-10-22  8:57 UTC
  To: Ido Schimmel
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel

On 21.10.2025 18:56, Ido Schimmel wrote:
> On Tue, Oct 21, 2025 at 02:31:51PM +0200, Gabriel Goller wrote:
> > Hmm I don't know how this would help? There is a link-local address set
> > on the interface, but we would have to add a ipv6 source address to the
> > arp packet which wouldn't be right?
> 
> There are no ARP packets. Neighbour resolution is performed via IPv6
> NA/NS messages. The script below [1] replicates your setup as I
> understand, but it uses IPv6 link-local addresses for the nexthops.

Ah yes, I understand what you mean now. However, the problem remains in
an IPv4-only environment, e.g. with OSPFv2, which does not support IPv6
and where we would have to add route-maps [0] for every interface to
rewrite the routes.

[0]: https://docs.frrouting.org/en/latest/routemap.html#clicmd-set-ip-next-hop-peer-address

> [snip]

Thanks,
Gabriel


