From: Ido Schimmel <idosch@idosch.org>
To: g.goller@proxmox.com
Cc: davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Wrong source address selection in arp_solicit for forwarded packets
Date: Mon, 20 Oct 2025 17:06:23 +0300
Message-ID: <aPZB33C-C1t1z7Dk@shredder>
In-Reply-To: <eykjh3y2bse2tmhn5rn2uvztoepkbnxpb7n2pvwq62pjetdu7o@r46lgxf4azz7>

On Fri, Oct 17, 2025 at 04:47:27PM +0200, Gabriel Goller wrote:
> Hi,
> I have a question about the arp solicit behavior:
> 
> I have the following simple infrastructure with linux hosts where the ip
> addresses are configured on dummy interfaces and all other interfaces are
> unnumbered:
> 
>   ┌────────┐     ┌────────┐     ┌────────┐    │ node1  ├─────┤ node2
> ├─────┤ node3  │    │10.0.1.1│     │10.0.1.2│     │10.0.1.3│    └────────┘
> └────────┘     └────────┘

The diagram looks mangled. At least I don't understand it.

> 
> All nodes have routes configured and can ping each other. ipv4 forwarding is
> enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
> encountering an issue where node2 does not send correct arp solicitation
> packets when forwarding icmp packets from node1 to node3.

I believe ICMP is irrelevant here.

> 
> For example, when pinging from node1 to node3, node2 sends out the
> following arp packet:
> 
> 13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
> length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
> IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28
> 
> Here, 172.16.0.102 is an ip address configured on a different interface on
> node2. This request will never receive a response because `rp_filter=2`.
> 
> node2 has the following (correct) routes installed:
> 
> 10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink
> 
> Since arp_announce is set to 0 (the default), arp_solicit selects the first
> interface with an ip address (inet_select_addr), which results in
> selecting the wrong source address (172.16.0.102) for the arp request.
> Because rp_filter is set to 2, we won't receive an answer to this arp
> packet, and the ping will fail unless we explicitly ping from node2 to
> node3.
> 
> I'm wondering if it would be possible (and correct) to modify arp_solicit to
> perform a fib lookup to check if there's a route with an explicit source
> address (e.g., the route above using src 10.0.1.2) and use that address as the
> source address for the arp packet. Of course, this wouldn't be backward
> compatible, as some users might rely on the current interface ordering behavior
> (or the loopback interface being selected first), so it would need to be
> controlled via a sysctl configuration flag. Perhaps I'm missing something
> obvious here though.

This would probably entail adding a new arp_announce level, but nobody
has added a new level in at least 20 years, so you will need to explain
why your setup is special and why the same functionality cannot be
achieved in a different way that does not require kernel changes.
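
For reference, here is a rough userspace model in C of the selection you
describe for arp_announce=0, as I understand it: the source address of
the packet that triggered resolution is used if it is a local address,
otherwise the first address on the outgoing device, otherwise the first
address found on any other device. This is a sketch, not the kernel
code. It ignores address scope, secondary addresses, subnet matching
against the target and VRF boundaries, and the struct, macro and function
names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build an IPv4 address in host byte order (illustration only). */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified view of one configured IPv4 address. */
struct model_addr {
    const char *dev;  /* interface name */
    uint32_t addr;    /* address, host byte order */
};

/* First address configured on `dev`, or 0 if the device is unnumbered. */
static uint32_t first_addr_on(const struct model_addr *a, size_t n,
                              const char *dev)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(a[i].dev, dev) == 0)
            return a[i].addr;
    return 0;
}

/* Non-zero if `addr` is configured on any interface of this host. */
static int is_local(const struct model_addr *a, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (a[i].addr == addr)
            return 1;
    return 0;
}

/*
 * Simplified model of the ARP sender address chosen with arp_announce=0.
 * `a` lists the host's addresses in interface order, `out_dev` is the
 * device the request leaves on, `pkt_saddr` is the source address of the
 * packet that triggered resolution (0 if there is none).
 */
uint32_t model_arp_saddr(const struct model_addr *a, size_t n,
                         const char *out_dev, uint32_t pkt_saddr)
{
    uint32_t saddr;

    /* Locally generated packet: announce its own source address. */
    if (pkt_saddr && is_local(a, n, pkt_saddr))
        return pkt_saddr;

    /* Forwarded packet: prefer an address on the outgoing device. */
    saddr = first_addr_on(a, n, out_dev);
    if (saddr)
        return saddr;

    /* Unnumbered device: fall back to the first address on the host. */
    return n ? a[0].addr : 0;
}
```

With 172.16.0.102 on a device listed before the dummy device holding
10.0.1.2 (say ens19 and dummy0, both names being assumptions) and nothing
on ens22, a packet forwarded from 10.0.1.1 yields 172.16.0.102, while a
packet sourced locally from 10.0.1.2 yields 10.0.1.2. That matches what
you reported.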

A few things you can consider:

1. You wrote that the router interfaces are unnumbered. Modern
unnumbered networks usually assign IPv6 link-local addresses to these
interfaces. These addresses are only used for neighbour resolution and
can be used as the nexthop address for IPv4 routes. For example:

ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev dummy1

Or using nexthop objects:

ip nexthop add id 1 via fe80::1 dev dummy1
ip route add 192.0.2.1/32 nhid 1

2. If you have interfaces whose addresses should not be considered as
source addresses when generating IP/ARP packets out of other interfaces,
then you can try placing them in a different VRF if it's viable.

3. This requires some work and I have not looked into it closely, but I
believe it should be possible to derive the preferred source address and
rewrite it in ARP packets using tc-bpf on egress. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dab4e1f06cabb6834de14264394ccab197007302
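
To give an idea of the rewrite part of option 3, here is a plain
userspace C sketch of the byte-level operation, not a BPF program. It
checks that a frame is an Ethernet/IPv4 ARP request, optionally behind a
single 802.1Q tag carried in the packet data, and overwrites the sender
protocol address. ARP carries no checksum, so nothing else needs fixing
up. The function and constant names are made up. How the preferred
source address is derived is left out, and whether the VLAN tag is
present in the data at the hook point is an assumption. In an actual
tc-bpf program the store would presumably go through a helper such as
bpf_skb_store_bytes().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_HLEN       14      /* dst MAC + src MAC + ethertype */
#define SK_VLAN_HLEN      4       /* TCI + encapsulated ethertype */
#define SK_ETHERTYPE_ARP  0x0806
#define SK_ETHERTYPE_VLAN 0x8100
#define SK_ARP_IPV4_LEN   28      /* ARP payload size for Ethernet/IPv4 */
#define SK_ARP_SPA_OFF    14      /* sender protocol address within ARP */

/* Read a big-endian 16-bit value. */
static uint16_t sk_rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Overwrite the sender IPv4 address of an ARP request in `frame`.
 * Returns 1 if the frame was rewritten, 0 if it was left untouched
 * because it is too short or is not an Ethernet/IPv4 ARP request.
 */
int arp_rewrite_sender_ip(uint8_t *frame, size_t len,
                          const uint8_t new_spa[4])
{
    size_t off = SK_ETH_HLEN;
    const uint8_t *arp;
    uint16_t proto;

    if (len < off)
        return 0;

    proto = sk_rd16(frame + 12);
    if (proto == SK_ETHERTYPE_VLAN) {
        /* Skip one 802.1Q tag if it is part of the packet data. */
        if (len < off + SK_VLAN_HLEN)
            return 0;
        proto = sk_rd16(frame + 16);
        off += SK_VLAN_HLEN;
    }

    if (proto != SK_ETHERTYPE_ARP || len < off + SK_ARP_IPV4_LEN)
        return 0;

    arp = frame + off;
    /* htype 1 (Ethernet), ptype 0x0800 (IPv4), hlen 6, plen 4, op 1. */
    if (sk_rd16(arp) != 1 || sk_rd16(arp + 2) != 0x0800 ||
        arp[4] != 6 || arp[5] != 4 || sk_rd16(arp + 6) != 1)
        return 0;

    memcpy(frame + off + SK_ARP_SPA_OFF, new_spa, 4);
    return 1;
}
```

Applied to the request in your capture, this would replace the
"tell 172.16.0.102" address with whatever address is passed in, for
example 10.0.1.2, and leave the target address 10.0.1.3 alone.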
