* Routing problem with udp, and a multihomed host in 2.4.20
From: Neil Brown @ 2003-02-12 23:18 UTC (permalink / raw)
To: linux-kernel
I have three subnets. A, B and C.
I have a Linux 2.4.20 server, named bartok, with three interfaces, one
on each subnet (so it can listen to broadcast requests everywhere).
This server has a default route pointing at A.
I have a router that routes between all these subnets, and others.
I have a client that sits on subnet B.
When my client tries to talk to bartok, it might try any of 3 IP
addresses (due to round-robin in the DNS).
All IP addresses work fine when establishing a TCP connection
e.g. telnet or ssh.
They don't for UDP.
e.g. rpcinfo -u bartok mountd
If I use the address on B, it works fine (as you would expect - same
subnet).
If I use the address on A it works fine (that is the default route
interface).
But if I use the address on C, it doesn't.
What happens is:
- request goes from client to router
- request goes from router to bartok interface on C
- bartok issues an ARP for client on C interface which is WRONG
- nobody replies to the ARP because client is on B, not C.
If I turn on proxy-arp on the router I can get the reply back, but I
would rather not do that.
So why does a reply to a UDP request arriving on subnet C from some
other subnet try to ARP out on subnet C instead of being routed
normally, while replies to TCP requests get routed properly and work
fine?
Is this a bug, or is there some configuration I can change?
I have double checked the subnet masks and broadcast addresses and
they work fine.
We have rp_filter set to 0 on all interfaces.
forwarding is set to 0, but setting it to 1 makes no difference.
Thanks,
NeilBrown
* Re: Routing problem with udp, and a multihomed host in 2.4.20
From: David S. Miller @ 2003-02-13 7:11 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel
On Wed, 2003-02-12 at 15:18, Neil Brown wrote:
> Is this a bug, or is there some configuration I can change?
Specify the correct 'src' parameter in your 'ip' route
command invocations.
* Re: Routing problem with udp, and a multihomed host in 2.4.20
From: David S. Miller @ 2003-02-13 9:19 UTC (permalink / raw)
To: neilb; +Cc: linux-kernel
   From: Neil Brown <neilb@cse.unsw.edu.au>
   Date: Thu, 13 Feb 2003 20:28:34 +1100
   On February 12, davem@redhat.com wrote:
   > On Wed, 2003-02-12 at 15:18, Neil Brown wrote:
   > > Is this a bug, or is there some configuration I can change?
   >
   > Specify the correct 'src' parameter in your 'ip' route
   > command invocations.
   Thanks... but I think I need a bit more help.
Sorry, I forgot to add that you need to enable the
arp_filter sysctl as well to make this work properly.
It should work once you do this.
* Re: Routing problem with udp, and a multihomed host in 2.4.20
From: Neil Brown @ 2003-02-13 9:28 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel
On February 12, davem@redhat.com wrote:
> On Wed, 2003-02-12 at 15:18, Neil Brown wrote:
> > Is this a bug, or is there some configuration I can change?
>
> Specify the correct 'src' parameter in your 'ip' route
> command invocations.
Thanks... but I think I need a bit more help.
bartok # ./ip route show
129.94.232.0/24 via 129.94.172.66 dev eth1
129.94.242.0/24 dev eth0 proto kernel scope link src 129.94.242.45
129.94.241.0/24 via 129.94.174.2 dev eth1
129.94.172.0/22 dev eth1 proto kernel scope link src 129.94.172.12
129.94.208.0/22 dev eth2 proto kernel scope link src 129.94.208.2
default via 129.94.242.1 dev eth0
bartok # ip addr show
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:10:4b:1c:a3:a4 brd ff:ff:ff:ff:ff:ff
    inet 129.94.242.45/24 brd 129.94.242.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:a0:c9:8f:7f:3c brd ff:ff:ff:ff:ff:ff
    inet 129.94.172.12/22 brd 129.94.242.255 scope global eth1
4: eth2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:90:27:37:bb:d5 brd ff:ff:ff:ff:ff:ff
    inet 129.94.208.2/22 brd 129.94.242.255 scope global eth2
The client in question is 129.94.211.194
It sends a UDP request to 129.94.172.12
The reply, addressed to 129.94.211.194 causes ARP requests on bartok's
eth1
Surely the route that applies is the 4th one, which seems to have a
correct 'src' parameter.
BTW, the routes were all created with 'route', not 'ip', in case it
matters.
NeilBrown
* Re: Routing problem with udp, and a multihomed host in 2.4.20
From: Neil Brown @ 2003-02-14 0:20 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel
On Thursday February 13, davem@redhat.com wrote:
> From: Neil Brown <neilb@cse.unsw.edu.au>
> Date: Thu, 13 Feb 2003 20:28:34 +1100
>
> On February 12, davem@redhat.com wrote:
> > On Wed, 2003-02-12 at 15:18, Neil Brown wrote:
> > > Is this a bug, or is there some configuration I can change?
> >
> > Specify the correct 'src' parameter in your 'ip' route
> > command invocations.
>
> Thanks... but I think I need a bit more help.
>
> Sorry, I forgot to add that you need to enable the
> arp_filter sysctl as well to make this work properly.
>
> It should work once you do this.
Nope...
Maybe I'm not explaining myself well enough.
So I experimented a bit more and did some "strace"ing, and read some
man pages....
It turns out that the problem occurs when sendmsg is used to send a
UDP packet, and the control information contains
    struct in_pktinfo {
        unsigned int   ipi_ifindex;  /* Interface index */
        struct in_addr ipi_spec_dst; /* Local address */
        struct in_addr ipi_addr;     /* Header Destination address */
    };
specifying the address and interface of the message that we are
replying to.
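For readers who have not used this interface, here is a minimal sketch of how such control information is typically attached to a message before calling sendmsg(). The helper name attach_pktinfo() and its buffer handling are assumptions made for illustration; this is not the code of the server discussed here. It only builds the control message and does not send anything.

```c
#define _GNU_SOURCE /* makes struct in_pktinfo visible in <netinet/in.h> on glibc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Attach one IP_PKTINFO control message to msg (hypothetical helper).
 * cbuf must hold at least CMSG_SPACE(sizeof(struct in_pktinfo)) bytes
 * and be suitably aligned for a struct cmsghdr.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int attach_pktinfo(struct msghdr *msg, void *cbuf, size_t cbuflen,
                          int ifindex, struct in_addr local)
{
    struct in_pktinfo pi;
    struct cmsghdr *cm;

    if (cbuflen < CMSG_SPACE(sizeof(pi)))
        return -1;

    memset(cbuf, 0, cbuflen);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(pi));

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_PKTINFO;
    cm->cmsg_len = CMSG_LEN(sizeof(pi));

    memset(&pi, 0, sizeof(pi));
    pi.ipi_ifindex = ifindex; /* interface the request arrived on */
    pi.ipi_spec_dst = local;  /* local address the request was sent to */
    memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
    return 0;
}
```

A caller would fill in msg_name and msg_iov as usual and then pass the msghdr to sendmsg(). A non-zero ipi_ifindex is what ends up as the output interface for the route lookup.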
I'll include all the numbers below for completeness, but the brief
description goes:
Three subnets, A,B,C all connected by a router.
Client X on subnet B - default route to router.
Server Y: three interfaces:
eth0 on A - default route to router on A
eth1 on B ( and so directly connected to client X)
eth2 on C
Packet from X to Y:C (i.e. address of eth2 on Y) goes through router
to Y.
Y responds with sendmsg specifying that the incoming packet was on
eth2 and was addressed to Y:C.
What *should* (IMO) happen is the response should have Y:C as the
source address, and that packet should be routed with a preference
to eth2. As eth2 is not on B, and there are no known routes to B via
eth2, the reply should be routed normally: i.e. directly to eth1.
What *does* happen is that the reply is sent on eth2 as though the
client X were local to eth2. i.e. an ARP request is sent to find the
MAC address, and then the packet is sent to this MAC address.
It might be reasonable that my *should* case would require
ip_forwarding being turned on, but I have ip_forwarding turned on and
it doesn't help.
In any case the *does* case is wrong because it sends a packet on an
interface to a neighbour that is known not to be directly attached to
that interface.
Does that make my situation clearer?
Thanks,
NeilBrown
-------------------------
The numbers:
On a multi homed host with the following interfaces:
bartok # ./ip address show
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:10:4b:1c:a3:a4 brd ff:ff:ff:ff:ff:ff
    inet 129.94.242.45/24 brd 129.94.242.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:a0:c9:8f:7f:3c brd ff:ff:ff:ff:ff:ff
    inet 129.94.172.12/22 brd 129.94.242.255 scope global eth1
4: eth2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:90:27:37:bb:d5 brd ff:ff:ff:ff:ff:ff
    inet 129.94.208.2/22 brd 129.94.242.255 scope global eth2
and the following routes:
bartok # ./ip route show
129.94.232.0/24 via 129.94.172.66 dev eth1
129.94.242.0/24 dev eth0 proto kernel scope link src 129.94.242.45
129.94.241.0/24 via 129.94.174.2 dev eth1
129.94.172.0/22 dev eth1 proto kernel scope link src 129.94.172.12
129.94.208.0/22 dev eth2 proto kernel scope link src 129.94.208.2
default via 129.94.242.1 dev eth0
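As a cross-check on the table above, the following sketch runs a plain longest-prefix match over those six routes. It is a deliberate simplification: it ignores scopes, metrics and any output-interface restriction, so it only shows which route wins when the routing table alone decides. The names struct route and best_dev() are made up for this illustration.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

struct route {
    const char *prefix; /* network address, dotted quad */
    int plen;           /* prefix length, 0 = default route */
    const char *dev;
};

/* Copied from the "ip route show" output above. */
static const struct route routes[] = {
    { "129.94.232.0", 24, "eth1" },
    { "129.94.242.0", 24, "eth0" },
    { "129.94.241.0", 24, "eth1" },
    { "129.94.172.0", 22, "eth1" },
    { "129.94.208.0", 22, "eth2" },
    { "0.0.0.0", 0, "eth0" },
};

static uint32_t mask_of(int plen)
{
    /* Shifting a 32-bit value by 32 is undefined, so handle /0 apart. */
    return plen == 0 ? 0 : 0xffffffffu << (32 - plen);
}

/* Device of the longest matching prefix, or NULL if dst is not parseable. */
static const char *best_dev(const char *dst)
{
    struct in_addr d, n;
    const char *dev = NULL;
    int best = -1;
    size_t i;

    if (inet_pton(AF_INET, dst, &d) != 1)
        return NULL;
    for (i = 0; i < sizeof(routes) / sizeof(routes[0]); i++) {
        uint32_t m = mask_of(routes[i].plen);

        if (inet_pton(AF_INET, routes[i].prefix, &n) != 1)
            continue;
        if ((ntohl(d.s_addr) & m) == ntohl(n.s_addr) &&
            routes[i].plen > best) {
            best = routes[i].plen;
            dev = routes[i].dev;
        }
    }
    return dev;
}
```

For the client 129.94.211.194 the third octet 211 masked with /22 gives 208, so the 129.94.208.0/22 route on eth2 is the match, which agrees with cases 1 to 3 below.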
1/ A TCP SYN/ACK with
source 129.94.172.12 dest 129.94.211.194
that is in response to a TCP SYN with
source 129.94.211.194 dest 129.94.172.12
that arrived on eth1 will be sent directly to
129.94.211.194 on eth2
This is what you would expect.
2/ A UDP packet with
source 129.94.172.12 dest 129.94.211.194
that is sent (sendto) on a newly created and bound
SOCK_DGRAM socket will be sent directly to
129.94.211.194 on eth2
This is also what you would expect.
3/ A UDP packet sent on a newly created unbound
socket (bound to 0.0.0.0) to 129.94.211.194
will have
source 129.94.208.2 dest 129.94.211.194
and will be sent directly on eth2
Again as you would expect.
However:
4/ A UDP packet sent on an unbound socket (bound to a port but not an
IP address) to 129.94.211.194, via a sendmsg request with
in_pktinfo specifying that the incoming packet was received on eth1
and had
source 129.94.211.194 dest 129.94.172.12
will have
source 129.94.172.12 dest 129.94.211.194
and will be sent directly to 129.94.211.194 ON ETH1
By 'sent directly' I mean if the arp table has an entry for
129.94.211.194 on eth1, it will be sent to that MAC address, and if
it doesn't an ARP request will be broadcast on eth1 to find an
appropriate MAC address.
This is *wrong*.
I am happy that the source address is 129.94.172.12 in this case
while in case 3 it is 129.94.208.2. I am not happy that it
directly sends to eth1.
* Re: Routing problem with udp, and a multihomed host in 2.4.20
From: Herbert Xu @ 2003-02-14 23:49 UTC (permalink / raw)
To: Neil Brown, linux-kernel
Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> It turns out that the problem occurs when sendmsg is used to send a
> UDP packet, and the control information contains
>     struct in_pktinfo {
>         unsigned int   ipi_ifindex;  /* Interface index */
>         struct in_addr ipi_spec_dst; /* Local address */
>         struct in_addr ipi_addr;     /* Header Destination address */
>     };
> specifying the address and interface of the message that we are
> replying to.
So your application is forcing the packet to go out on a specific
interface bypassing the routing table...
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Routing problem with udp, and a multihomed host in 2.4.20
From: Neil Brown @ 2003-02-16 22:43 UTC (permalink / raw)
To: Neil Brown; +Cc: David S. Miller, linux-kernel
On Friday February 14, neilb@cse.unsw.edu.au wrote:
>
> It turns out that the problem occurs when sendmsg is used to send a
> UDP packet, and the control information contains
>     struct in_pktinfo {
>         unsigned int   ipi_ifindex;  /* Interface index */
>         struct in_addr ipi_spec_dst; /* Local address */
>         struct in_addr ipi_addr;     /* Header Destination address */
>     };
> specifying the address and interface of the message that we are
> replying to.
Well, I took the plunge and hunted in and around the networking
code....
in net/ipv4/route.c, in ip_route_output_slow there is an if clause:
if (fib_lookup(&key, &res)) {
In my situation, fib_lookup will fail because key contains a
non-zero "oif" (output interface), but that interface does not have a
valid route to key.dst.
What should be done in that case? Well in my situation, the oif is
just a hint, not a requirement (it is, after all, the interface that
the request arrived on. It is not necessarily the interface that the
reply has to go out on). So possibly it should clear oif and try
fib_lookup again. But it doesn't.
This is a branch for "if oif is non-zero", and it is largely a
comment:
    if (oldkey->oif) {
        /* Apparently, routing tables are wrong. Assume,
           that the destination is on link.
           WHY? DW.
           Because we are allowed to send to iface
           even if it has NO routes and NO assigned
           addresses. When oif is specified, routing
           tables are looked up with only one purpose:
           to catch if destination is gatewayed, rather than
           direct. Moreover, if MSG_DONTROUTE is set,
           we send packet, ignoring both routing tables
           and ifaddr state. --ANK
           We could make it even if oif is unknown,
           likely IPv6, but we do not.
         */
        if (key.src == 0)
            key.src = inet_select_addr(dev_out, 0,
                                       RT_SCOPE_LINK);
        res.type = RTN_UNICAST;
        goto make_route;
This comment seems to be considering a case where oif has been set as
an explicit request for the message to go on that interface, which is
not the case for "sendmsg" with an IP_PKTINFO attachment.
Maybe when ip_cmsg_send interprets the IP_PKTINFO and sets ipc->oif,
it should set some flag to say "hint". And then fib_lookup, or
possibly ip_route_output_slow, could test for that hint and re-try
with no oif if the first try fails..
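To make the "hint" idea concrete, here is a small user-space model of it. This is not kernel code and not a proposed patch: toy_lookup() and toy_lookup_hint() are invented names, the table is just the six routes from my earlier mail, and the lookup is a bare longest-prefix match that optionally restricts itself to one interface.

```c
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

struct toy_route {
    uint32_t net; /* network, host byte order */
    int plen;     /* prefix length, 0 = default route */
    int ifindex;  /* 2 = eth0, 3 = eth1, 4 = eth2 */
};

static const struct toy_route table[] = {
    { IP4(129, 94, 232, 0), 24, 3 },
    { IP4(129, 94, 242, 0), 24, 2 },
    { IP4(129, 94, 241, 0), 24, 3 },
    { IP4(129, 94, 172, 0), 22, 3 },
    { IP4(129, 94, 208, 0), 22, 4 },
    { IP4(0, 0, 0, 0), 0, 2 },
};

/*
 * Longest-prefix match. A non-zero oif restricts the search to routes
 * on that interface. Returns the chosen ifindex, or -1 on no match.
 */
static int toy_lookup(uint32_t dst, int oif)
{
    int best = -1, out = -1;
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        uint32_t mask = table[i].plen
                            ? 0xffffffffu << (32 - table[i].plen) : 0;

        if (oif != 0 && table[i].ifindex != oif)
            continue;
        if ((dst & mask) == table[i].net && table[i].plen > best) {
            best = table[i].plen;
            out = table[i].ifindex;
        }
    }
    return out;
}

/* The "hint" variant: if the restricted lookup fails, retry without oif. */
static int toy_lookup_hint(uint32_t dst, int oif)
{
    int r = toy_lookup(dst, oif);

    if (r < 0 && oif != 0)
        r = toy_lookup(dst, 0);
    return r;
}
```

With dst 129.94.211.194 and oif 3 (eth1), toy_lookup() finds nothing, which corresponds to the failing fib_lookup above, while toy_lookup_hint() falls back to the connected route on eth2.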
But even that is not a complete solution. We would also need to
modify ip_route_output_key which scans a table of recently computed
routes to avoid having to do fib_lookup for every packet. This table
would need to know about oif's that are hints, and ones that are
requirements.
At about this point it all got a bit too complicated, so I haven't
bothered to try to patch the kernel to make it work right.
Rather, I noticed that if the interface specified does have a route to
the destination, that route will be taken rather than attempting to
directly send to the dest.
So I have added default routes to all of my interfaces, not just the
preferred one, and my symptoms have gone away (and I don't need
proxy-arp on the router any more).
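The effect of the extra default routes can be sketched with the same kind of user-space model. Again this is only an illustration under assumptions: classify() and the two tables are invented for the example, the gateways of the added default routes are not modelled beyond a has_gw flag, and the real lookup does considerably more.

```c
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

enum verdict { NO_ROUTE = -1, ON_LINK = 0, VIA_GATEWAY = 1 };

struct if_route {
    uint32_t net; /* network, host byte order */
    int plen;     /* prefix length, 0 = default route */
    int ifindex;  /* 2 = eth0, 3 = eth1, 4 = eth2 */
    int has_gw;   /* 1 if the route has a "via" next hop */
};

/* Routing table before the workaround (one default route, on eth0). */
static const struct if_route before[] = {
    { IP4(129, 94, 232, 0), 24, 3, 1 },
    { IP4(129, 94, 242, 0), 24, 2, 0 },
    { IP4(129, 94, 241, 0), 24, 3, 1 },
    { IP4(129, 94, 172, 0), 22, 3, 0 },
    { IP4(129, 94, 208, 0), 22, 4, 0 },
    { IP4(0, 0, 0, 0), 0, 2, 1 },
};

/* Same table with a default route added on eth1 and eth2 as well. */
static const struct if_route after[] = {
    { IP4(129, 94, 232, 0), 24, 3, 1 },
    { IP4(129, 94, 242, 0), 24, 2, 0 },
    { IP4(129, 94, 241, 0), 24, 3, 1 },
    { IP4(129, 94, 172, 0), 22, 3, 0 },
    { IP4(129, 94, 208, 0), 22, 4, 0 },
    { IP4(0, 0, 0, 0), 0, 2, 1 },
    { IP4(0, 0, 0, 0), 0, 3, 1 },
    { IP4(0, 0, 0, 0), 0, 4, 1 },
};

/* Best (longest-prefix) route for dst among the routes on interface oif. */
static enum verdict classify(const struct if_route *rt, size_t n,
                             uint32_t dst, int oif)
{
    enum verdict v = NO_ROUTE;
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t mask = rt[i].plen ? 0xffffffffu << (32 - rt[i].plen) : 0;

        if (rt[i].ifindex != oif || (dst & mask) != rt[i].net)
            continue;
        if (rt[i].plen > best) {
            best = rt[i].plen;
            v = rt[i].has_gw ? VIA_GATEWAY : ON_LINK;
        }
    }
    return v;
}
```

For the client 129.94.211.194 with oif 3 (eth1), the first table yields NO_ROUTE, which is the case where the kernel assumes the destination is on link. The second table yields VIA_GATEWAY, so the reply is handed to the router instead of being ARPed for on eth1.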
This will satisfy my needs for now, but I feel that something should
be done to fix this problem the "right" way. I'm just not sure what.
NeilBrown