* Wrong source address selection in arp_solicit for forwarded packets

From: Gabriel Goller @ 2025-10-17 14:47 UTC
To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, linux-kernel

Hi,
I have a question about the arp solicit behavior:

I have the following simple infrastructure with linux hosts where the ip
addresses are configured on dummy interfaces and all other interfaces are
unnumbered:

┌────────┐     ┌────────┐     ┌────────┐
│ node1  ├─────┤ node2  ├─────┤ node3  │
│10.0.1.1│     │10.0.1.2│     │10.0.1.3│
└────────┘     └────────┘     └────────┘

All nodes have routes configured and can ping each other. ipv4 forwarding is
enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
encountering an issue where node2 does not send correct arp solicitation
packets when forwarding icmp packets from node1 to node3.

For example, when pinging from node1 to node3, node2 sends out the
following arp packet:

13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28

Here, 172.16.0.102 is an ip address configured on a different interface on
node2. This request will never receive a response because `rp_filter=2`.

node2 has the following (correct) routes installed:

10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink

Since arp_announce is set to 0 (the default), arp_solicit selects the first
interface with an ip address (inet_select_addr), which results in
selecting the wrong source address (172.16.0.102) for the arp request.
Because rp_filter is set to 2, we won't receive an answer to this arp
packet, and the ping will fail unless we explicitly ping from node2 to
node3.
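As a rough illustration of the behaviour described above, the following bash sketch models (it is not kernel code) how the ARP sender address ends up as 172.16.0.102 with arp_announce=0 when the egress interface is unnumbered, together with a simplified loose reverse-path check on the responder. The function names, the `dev address scope` input format, the interface names `ens18` and `dummy0`, and node3's route list are all hypothetical; secondary addresses, VRFs, matching the target's subnet on the egress device, and the receiving-interface details of the real rp_filter logic are deliberately left out.

```bash
#!/usr/bin/env bash
# Toy model, NOT kernel code: ARP sender-address choice with arp_announce=0
# plus a simplified loose (rp_filter=2) reverse-path check on the responder.

# ip2int A.B.C.D -> 32-bit integer
ip2int() {
    local IFS=.
    local -a o=($1)
    echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# model_arp_saddr PKT_SRC EGRESS_DEV
# stdin: one "dev address scope" line per primary IPv4 address, in the
# order the devices were registered (lo first).
model_arp_saddr() {
    local pkt_src=$1 egress=$2 dev addr scope i
    local -a devs=() addrs=() scopes=()
    while read -r dev addr scope; do
        [[ -n $dev ]] || continue
        devs+=("$dev"); addrs+=("$addr"); scopes+=("$scope")
    done
    # arp_announce=0: reuse the queued packet's source only if it is local.
    for i in "${!addrs[@]}"; do
        if [[ ${addrs[i]} == "$pkt_src" ]]; then echo "$pkt_src"; return; fi
    done
    # Fallback 1: a non-host-scope address on the egress device itself.
    for i in "${!addrs[@]}"; do
        if [[ ${devs[i]} == "$egress" && ${scopes[i]} != host ]]; then
            echo "${addrs[i]}"; return
        fi
    done
    # Fallback 2 (unnumbered egress): first global address on any device.
    for i in "${!addrs[@]}"; do
        if [[ ${scopes[i]} == global ]]; then echo "${addrs[i]}"; return; fi
    done
    echo 0.0.0.0
}

# loose_rp_ok SENDER PREFIX... : succeed if any route prefix covers SENDER
# (simplification of loose mode: ignores the receiving interface).
loose_rp_ok() {
    local s p net len mask
    s=$(ip2int "$1"); shift
    for p in "$@"; do
        net=$(ip2int "${p%/*}"); len=${p#*/}
        mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
        if (( (s & mask) == (net & mask) )); then return 0; fi
    done
    return 1
}

# Hypothetical node2 addresses; ens22 (towards node3) is unnumbered.
node2_addrs='lo 127.0.0.1 host
ens18 172.16.0.102 global
dummy0 10.0.1.2 global'

model_arp_saddr 10.0.1.1 ens22 <<< "$node2_addrs"   # forwarded: 172.16.0.102
model_arp_saddr 10.0.1.2 ens22 <<< "$node2_addrs"   # local ping: 10.0.1.2

# Hypothetical node3 routes: no route (and no default) covers 172.16.0.102.
loose_rp_ok 172.16.0.102 10.0.1.1/32 10.0.1.2/32 || echo "node3 drops the request"
```

The two `model_arp_saddr` calls print `172.16.0.102` and `10.0.1.2`, which is consistent with the report that resolution only works once node2 itself pings node3. Note that the rp_filter explanation assumes node3 has no route, including no default route, covering 172.16.0.102; with `0.0.0.0/0` in the list the loose check in this model would pass.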
I'm wondering if it would be possible (and correct) to modify arp_solicit to
perform a fib lookup to check if there's a route with an explicit source
address (e.g., the route above using src 10.0.1.2) and use that address as the
source address for the arp packet. Of course, this wouldn't be backward
compatible, as some users might rely on the current interface ordering behavior
(or the loopback interface being selected first), so it would need to be
controlled via a sysctl configuration flag. Perhaps I'm missing something
obvious here though.

Any insights would be appreciated!

Gabriel
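The proposed rule (prefer the route's explicit `src`, otherwise keep today's fallback) can be prototyped in user space before touching the kernel. A minimal bash sketch, assuming only that `ip route get` reports the selected source address after the word `src`; the helpers `route_prefsrc` and `proposed_arp_saddr` and the sample output are hypothetical, and no existing arp_announce level behaves this way.

```bash
#!/usr/bin/env bash
# Sketch of the PROPOSED (hypothetical) rule: use the route's preferred source
# for the ARP target if it is a local address, else keep the existing fallback.

# route_prefsrc WORD... : print the word following "src", e.g.
#   route_prefsrc $(ip route get 10.0.1.3)
route_prefsrc() {
    while (( $# > 1 )); do
        if [[ $1 == src ]]; then
            echo "$2"
            return 0
        fi
        shift
    done
    return 1
}

# proposed_arp_saddr FALLBACK "LOCAL_ADDRS" ROUTE_WORDS...
proposed_arp_saddr() {
    local fallback=$1 locals=" $2 " src
    shift 2
    if src=$(route_prefsrc "$@") && [[ $locals == *" $src "* ]]; then
        echo "$src"       # route carries a usable local "src": prefer it
    else
        echo "$fallback"  # no (local) "src": behave as today
    fi
}

# Illustrative `ip route get 10.0.1.3` output on node2 (not captured from
# the reporter's system), already split into words.
route_words=(10.0.1.3 via 10.0.1.3 dev ens22 src 10.0.1.2 uid 0 cache)

proposed_arp_saddr 172.16.0.102 "10.0.1.2 172.16.0.102" "${route_words[@]}"  # prints 10.0.1.2
```

Falling back to the old choice whenever the route has no local `src` is what would let such a mode sit behind a sysctl without changing behaviour for setups that do not set a preferred source.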
* Re: Wrong source address selection in arp_solicit for forwarded packets

From: Ido Schimmel @ 2025-10-20 14:06 UTC
To: g.goller
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev, linux-kernel

On Fri, Oct 17, 2025 at 04:47:27PM +0200, Gabriel Goller wrote:
> Hi,
> I have a question about the arp solicit behavior:
>
> I have the following simple infrastructure with linux hosts where the ip
> addresses are configured on dummy interfaces and all other interfaces are
> unnumbered:
>
> ┌────────┐ ┌────────┐ ┌────────┐ │ node1 ├─────┤ node2
> ├─────┤ node3 │ │10.0.1.1│ │10.0.1.2│ │10.0.1.3│ └────────┘
> └────────┘ └────────┘

The diagram looks mangled. At least I don't understand it.

>
> All nodes have routes configured and can ping each other. ipv4 forwarding is
> enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
> encountering an issue where node2 does not send correct arp solicitation
> packets when forwarding icmp packets from node1 to node3.

I believe ICMP is irrelevant here.

>
> For example, when pinging from node1 to node3, node2 sends out the
> following arp packet:
>
> 13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
> length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
> IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28
>
> Here, 172.16.0.102 is an ip address configured on a different interface on
> node2. This request will never receive a response because `rp_filter=2`.
>
> node2 has the following (correct) routes installed:
>
> 10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink
>
> Since arp_announce is set to 0 (the default), arp_solicit selects the first
> interface with an ip address (inet_select_addr), which results in
> selecting the wrong source address (172.16.0.102) for the arp request.
> Because rp_filter is set to 2, we won't receive an answer to this arp
> packet, and the ping will fail unless we explicitly ping from node2 to
> node3.
>
> I'm wondering if it would be possible (and correct) to modify arp_solicit to
> perform a fib lookup to check if there's a route with an explicit source
> address (e.g., the route above using src 10.0.1.2) and use that address as the
> source address for the arp packet. Of course, this wouldn't be backward
> compatible, as some users might rely on the current interface ordering behavior
> (or the loopback interface being selected first), so it would need to be
> controlled via a sysctl configuration flag. Perhaps I'm missing something
> obvious here though.

This would probably entail adding a new arp_announce level, but nobody
added a new level in at least 20 years, so you will need to explain why
your setup is special and why the same functionality cannot be achieved
in a different way that does not require kernel changes.

A few things you can consider:

1. You wrote that the router interfaces are unnumbered. Modern
unnumbered networks usually assign IPv6 link-local addresses to these
interfaces. These addresses are only used for neighbour resolution and
can be used as the nexthop address for IPv4 routes. For example:

ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev dummy1

Or using nexthop objects:

ip nexthop add id 1 via fe80::1 dev dummy1
ip route add 192.0.2.1/32 nhid 1

2. If you have interfaces whose addresses should not be considered as
source addresses when generating IP/ARP packets out of other interfaces,
then you can try placing them in a different VRF if it's viable.

3. Requires some work and I didn't look too much into it, but I believe
it should be possible to derive the preferred source address and rewrite
it in ARP packets using tc-bpf on egress. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dab4e1f06cabb6834de14264394ccab197007302
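The VRF suggestion (item 2) could look roughly like the following on node2. This is an untested sketch: the name `ens18` for the interface carrying 172.16.0.102, the VRF name `mgmt` and routing table `100` are assumptions. Enslaving an interface to a VRF also moves its connected routes into the VRF table, so anything that relies on 172.16.0.102 (SSH, monitoring) would have to become VRF-aware, for example via `ip vrf exec mgmt <command>`.

```bash
#!/usr/bin/env bash
# Run as root on node2. ens18, mgmt and table 100 are HYPOTHETICAL names.
set -euo pipefail

ip link add name mgmt type vrf table 100   # new VRF device bound to table 100
ip link set dev mgmt up
ip link set dev ens18 master mgmt          # ens18 and its routes join the VRF

ip -br link show master mgmt               # ens18 should now be listed here
```

As far as the inet_select_addr() code reads, when the egress device has no usable address it only walks devices in the same L3 domain as the egress device, so after this change 172.16.0.102 would no longer be a candidate; the next global address in the default VRF (ideally 10.0.1.2 on the dummy interface) would be picked instead, provided no other global address precedes it in device order.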
* Re: Wrong source address selection in arp_solicit for forwarded packets

From: Maciej W. Rozycki @ 2025-10-20 14:27 UTC
To: Ido Schimmel
Cc: g.goller, David S. Miller, dsahern, edumazet, kuba, pabeni, horms, netdev, linux-kernel

On Mon, 20 Oct 2025, Ido Schimmel wrote:

> > I have the following simple infrastructure with linux hosts where the ip
> > addresses are configured on dummy interfaces and all other interfaces are
> > unnumbered:
> >
> > ┌────────┐ ┌────────┐ ┌────────┐ │ node1 ├─────┤ node2
> > ├─────┤ node3 │ │10.0.1.1│ │10.0.1.2│ │10.0.1.3│ └────────┘
> > └────────┘ └────────┘
>
> The diagram looks mangled. At least I don't understand it.

It's been broken by:

Content-Type: text/plain; charset=utf-8; format=flowed

Cf. Documentation/process/email-clients.rst. Raw message contents look good.

HTH,

Maciej
* Re: Wrong source address selection in arp_solicit for forwarded packets

From: Gabriel Goller @ 2025-10-21 12:31 UTC
To: Ido Schimmel
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev, linux-kernel

On 20.10.2025 17:06, Ido Schimmel wrote:
> On Fri, Oct 17, 2025 at 04:47:27PM +0200, Gabriel Goller wrote:
> > Hi,
> > I have a question about the arp solicit behavior:
> >
> > I have the following simple infrastructure with linux hosts where the ip
> > addresses are configured on dummy interfaces and all other interfaces are
> > unnumbered:
> >
> > ┌────────┐ ┌────────┐ ┌────────┐ │ node1 ├─────┤ node2
> > ├─────┤ node3 │ │10.0.1.1│ │10.0.1.2│ │10.0.1.3│ └────────┘
> > └────────┘ └────────┘
>
> The diagram looks mangled. At least I don't understand it.

Ah sorry about that, looks like I had format=flowed configured on my client.
Diagram should be correct now:

┌────────┐     ┌────────┐     ┌────────┐
│ node1  ├─────┤ node2  ├─────┤ node3  │
│10.0.1.1│     │10.0.1.2│     │10.0.1.3│
└────────┘     └────────┘     └────────┘

If it's still not right, it's correctly rendered on lore:
https://lore.kernel.org/netdev/eykjh3y2bse2tmhn5rn2uvztoepkbnxpb7n2pvwq62pjetdu7o@r46lgxf4azz7/

> > All nodes have routes configured and can ping each other. ipv4 forwarding is
> > enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
> > encountering an issue where node2 does not send correct arp solicitation
> > packets when forwarding icmp packets from node1 to node3.
>
> I believe ICMP is irrelevant here.

Yep, ICMP is just an example.
> >
> > For example, when pinging from node1 to node3, node2 sends out the
> > following arp packet:
> >
> > 13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
> > length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
> > IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28
> >
> > Here, 172.16.0.102 is an ip address configured on a different interface on
> > node2. This request will never receive a response because `rp_filter=2`.
> >
> > node2 has the following (correct) routes installed:
> >
> > 10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink
> >
> > Since arp_announce is set to 0 (the default), arp_solicit selects the first
> > interface with an ip address (inet_select_addr), which results in
> > selecting the wrong source address (172.16.0.102) for the arp request.
> > Because rp_filter is set to 2, we won't receive an answer to this arp
> > packet, and the ping will fail unless we explicitly ping from node2 to
> > node3.
> >
> > I'm wondering if it would be possible (and correct) to modify arp_solicit to
> > perform a fib lookup to check if there's a route with an explicit source
> > address (e.g., the route above using src 10.0.1.2) and use that address as the
> > source address for the arp packet. Of course, this wouldn't be backward
> > compatible, as some users might rely on the current interface ordering behavior
> > (or the loopback interface being selected first), so it would need to be
> > controlled via a sysctl configuration flag. Perhaps I'm missing something
> > obvious here though.
>
> This would probably entail adding a new arp_announce level, but nobody
> added a new level in at least 20 years, so you will need to explain why
> your setup is special and why the same functionality cannot be achieved
> in a different way that does not require kernel changes.
To add a bit more context, I'm using FRR on all nodes and the dummy
interface ips are distributed using OpenFabric. But this shouldn't matter
because the routes are inserted correctly and work fine.

> A few things you can consider:
>
> 1. You wrote that the router interfaces are unnumbered. Modern
> unnumbered networks usually assign IPv6 link-local addresses to these
> interfaces. These addresses are only used for neighbour resolution and
> can be used as the nexthop address for IPv4 routes. For example:
>
> ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev dummy1
>
> Or using nexthop objects:
>
> ip nexthop add id 1 via fe80::1 dev dummy1
> ip route add 192.0.2.1/32 nhid 1

Hmm I don't know how this would help? There is a link-local address set
on the interface, but we would have to add an ipv6 source address to the
arp packet which wouldn't be right?
The route already exists (see `dev ens22` and `onlink`).

> 2. If you have interfaces whose addresses should not be considered as
> source addresses when generating IP/ARP packets out of other interfaces,
> then you can try placing them in a different VRF if it's viable.

Yep, this is definitely a solution as the "loopback" address of the VRF
is its master device. Still, what if the master device or the loopback
device has multiple ips?

> 3. Requires some work and I didn't look too much into it, but I believe
> it should be possible to derive the preferred source address and rewrite
> it in ARP packets using tc-bpf on egress. See:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dab4e1f06cabb6834de14264394ccab197007302

Yeah ebpf is definitely also a solution, but IMO this is a bit of a weird
behavior and should be fixed in the kernel. We have all the information
we need (from the routes) and just need to use them to select the correct
source address, and not just give up and select randomly.

Thanks for the answer!

Gabriel
* Re: Wrong source address selection in arp_solicit for forwarded packets

From: Ido Schimmel @ 2025-10-21 15:56 UTC
To: g.goller
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev, linux-kernel

On Tue, Oct 21, 2025 at 02:31:51PM +0200, Gabriel Goller wrote:
> Hmm I don't know how this would help? There is a link-local address set
> on the interface, but we would have to add an ipv6 source address to the
> arp packet which wouldn't be right?

There are no ARP packets. Neighbour resolution is performed via IPv6
NA/NS messages. The script below [1] replicates your setup as I
understand, but it uses IPv6 link-local addresses for the nexthops.

[1]
#!/bin/bash

cleanup()
{
	for i in {1..3}; do
		ip netns del node${i} &> /dev/null
	done
}

trap cleanup EXIT

cleanup

for i in {1..3}; do
	ip netns add node${i}
	ip netns exec node${i} sysctl -wq net.ipv4.conf.all.forwarding=1
	ip netns exec node${i} sysctl -wq net.ipv4.conf.all.rp_filter=2
	ip -n node${i} link set dev lo up
	ip -n node${i} link add name dummy up type dummy
	ip -n node${i} address add 10.0.1.${i}/32 dev dummy
done

ip -n node1 link add name veth1 type veth peer name veth2 netns node2
ip -n node2 link add name veth3 type veth peer name veth4 netns node3

ip -n node1 link set dev veth1 up
ip -n node2 link set dev veth2 up
ip -n node2 link set dev veth3 up
ip -n node3 link set dev veth4 up

ip -n node1 address add fe80::1/64 dev veth1 nodad
ip -n node2 address add fe80::2/64 dev veth2 nodad
ip -n node2 address add fe80::3/64 dev veth3 nodad
ip -n node3 address add fe80::4/64 dev veth4 nodad

ip -n node1 route add 10.0.1.2/32 src 10.0.1.1 nexthop via inet6 fe80::2 dev veth1
ip -n node1 route add 10.0.1.3/32 src 10.0.1.1 nexthop via inet6 fe80::2 dev veth1

ip -n node2 route add 10.0.1.1/32 src 10.0.1.2 nexthop via inet6 fe80::1 dev veth2
ip -n node2 route add 10.0.1.3/32 src 10.0.1.2 nexthop via inet6 fe80::4 dev veth3

ip -n node3 route add 10.0.1.1/32 src 10.0.1.3 nexthop via inet6 fe80::3 dev veth4
ip -n node3 route add 10.0.1.2/32 src 10.0.1.3 nexthop via inet6 fe80::3 dev veth4

ip netns exec node1 ping 10.0.1.3 -c 5
ip netns exec node1 ping 10.0.1.2 -c 5
* Re: Wrong source address selection in arp_solicit for forwarded packets

From: Gabriel Goller @ 2025-10-22 8:57 UTC
To: Ido Schimmel
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, netdev, linux-kernel

On 21.10.2025 18:56, Ido Schimmel wrote:
> On Tue, Oct 21, 2025 at 02:31:51PM +0200, Gabriel Goller wrote:
> > Hmm I don't know how this would help? There is a link-local address set
> > on the interface, but we would have to add an ipv6 source address to the
> > arp packet which wouldn't be right?
>
> There are no ARP packets. Neighbour resolution is performed via IPv6
> NA/NS messages. The script below [1] replicates your setup as I
> understand, but it uses IPv6 link-local addresses for the nexthops.

Ah yes, I understand what you mean now.
Still, in an IPv4-only environment (e.g. OSPFv2, which does not support IPv6,
where we would have to add route-maps [0] for every interface to rewrite
the routes) you still have that problem.

[0]: https://docs.frrouting.org/en/latest/routemap.html#clicmd-set-ip-next-hop-peer-address

> [snip]

Thanks,
Gabriel