* routing bug report for 2.4
@ 2003-06-27 23:00 Ben Greear
  2003-06-28 9:02 ` Julian Anastasov
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Greear @ 2003-06-27 23:00 UTC (permalink / raw)
To: 'netdev@oss.sgi.com'

(This has been discussed with Alexey, but sending to the list for
general consumption).

Here is how to reproduce this:

ifconfig eth1 192.1.1.2 netmask 255.255.255.0
ifconfig eth2 192.1.2.2 netmask 255.255.255.0

Set up policy based routing with the 'ip' tool to make packets with
source-address of each interface to use the gateway for that interface.
  Set gateway for eth1 to be 192.1.1.1
  Set gateway for eth2 to be 192.1.2.1

Now, use ping to try to send pkts from one interface to the other:

ping -I 192.1.1.2 192.1.2.2

You will see arps on eth1 for 192.1.2.2, whereas you should see packets
being sent to the default gateway for eth1.

If you modify the ping source to BINDTODEVICE eth1, then it will send
correctly. I am under the impression that you should not have to specifically
BINDTODEVICE in this case since the policy based routing should take care of
routing things correctly. Or, maybe, the real bug is in ping in that it did
not BINDTODEVICE?

Also, ping -I eth1 192.1.2.2 will fail to route externally. That may
just be a feature of ping: I'm unsure what the subtle difference is *supposed*
to be between using -I eth1 and -I 1.2.3.4

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

^ permalink raw reply [flat|nested] 15+ messages in thread
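
For readers trying to reproduce the report, the policy-routing step described in the message above might look roughly like the sketch below. The addresses and gateways come from the message; the table ids 101 and 102 and the `run` and `setup_source_route` helpers are assumptions made for illustration, not something the report specifies. The script only prints the commands unless APPLY=1 is set, in which case it needs root.

```shell
#!/bin/sh
# Sketch only: per-source policy routing for the two interfaces in the report.
# Table ids 101/102 are arbitrary (hypothetical); any unused ids would do.

# Hypothetical dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# setup_source_route DEV ADDR GW TABLE
# Traffic sourced from ADDR is looked up in TABLE, whose only route is a
# default route via GW on DEV with ADDR as the preferred source.
setup_source_route() {
    dev=$1
    addr=$2
    gw=$3
    table=$4
    run ip route add default via "$gw" dev "$dev" src "$addr" table "$table"
    run ip rule add from "$addr" table "$table"
}

setup_source_route eth1 192.1.1.2 192.1.1.1 101
setup_source_route eth2 192.1.2.2 192.1.2.1 102
```

The `src` keyword sets the preferred source on each route, which is the point raised in the first reply below. Where the rules should sit relative to table main is left out of this sketch.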
* Re: routing bug report for 2.4
From: Julian Anastasov @ 2003-06-28 9:02 UTC
To: Ben Greear; +Cc: 'netdev@oss.sgi.com'

Hello,

I'll try to reply to some of your posts...

On Fri, 27 Jun 2003, Ben Greear wrote:

> (This has been discussed with Alexey, but sending to the list for
> general consumption).

I remember your previous posts. I assume they were skipped
because you do not use the routing system properly: you have to
use preferred sources in your routes. Now I'm not sure if the case
is the same.

> Here is how to reproduce this:
> ifconfig eth1 192.1.1.2 netmask 255.255.255.0
> ifconfig eth2 192.1.2.2 netmask 255.255.255.0
>
> Set up policy based routing with the 'ip' tool to make packets with
> source-address of each interface to use the gateway for that interface.
>   Set gateway for eth1 to be 192.1.1.1
>   Set gateway for eth2 to be 192.1.2.1

But not all packets; maybe you have to place the source-based
rules after table main. This is the recommended way.

> Now, use ping to try to send pkts from one interface to the other:
>
> ping -I 192.1.1.2 192.1.2.2

Your report is damn wrong, why do you ping local IP?
Or maybe that is your test? Trying ping from ip-utils... sorry,
not reproducible here (I hope it is the expected result).

> You will see arps on eth1 for 192.1.2.2, whereas you should see packets
> being sent to the default gateway for eth1.

Why? 192.1.2.2 is local IP and the local table is first
priority. We should not see any ARP packets for local targets, right?

> If you modify the ping source to BINDTODEVICE eth1, then it will send
> correctly. I am under the impression that you should not have to specifically
> BINDTODEVICE in this case since the policy based routing should take care of
> routing things correctly. Or, maybe, the real bug is in ping in that it did
> not BINDTODEVICE?

Do you really ping local IP?

> Also, ping -I eth1 192.1.2.2 will fail to route externally. That may
> just be a feature of ping: I'm unsure what the subtle difference is *supposed*
> to be between using -I eth1 and -I 1.2.3.4

I think the root of your problems is that you specify
'ping -I device' and the routing is forced to construct result from
unknown route by using source address autoselection.

From previous post:

> # The other interface on the router machine (same machine as I just pinged above)
> [root@localhost root]# ping -I eth1 10.3.2.1
> PING 10.3.2.1 (10.3.2.1) from 10.3.1.4 eth1: 56(84) bytes of data.
> From 10.3.1.4 icmp_seq=1 Destination Host Unreachable
> From 10.3.1.4 icmp_seq=3 Destination Host Unreachable
>
> # It is NOT using the default gateway for this traffic, but is instead
> # just trying to ARP.
> [root@localhost root]# tcpdump -n -i eth1
> tcpdump: listening on eth1
> 11:56:19.788336 arp who-has 10.3.2.1 tell 10.3.1.4
> 11:56:20.788134 arp who-has 10.3.2.1 tell 10.3.1.4
> 11:56:21.788149 arp who-has 10.3.2.1 tell 10.3.1.4
> 11:56:22.788379 arp who-has 10.3.2.1 tell 10.3.1.4

'-I eth1 10.3.2.1' requests route
"from 0.0.0.0 to 10.3.2.1 oif eth1". You do not have such routes.
I assume the result is (quoting route.c):
"Apparently, routing tables are wrong."
"Assume, that the destination is on link."

For your setup I would say "The request is wrong". You see that
the kernel does not even check whether eth1 is UP. You are lucky.

Then the kernel autoselects 10.3.1.4 as src for the forced eth1 device.
Thus, you see this ARP probe. Later, it seems 10.3.2.1 does not
want to reply to 10.3.1.4, I assume this is a known problem?

As for ping from iputils: you can specify device or saddr,
not both, so the only valid test for source based routing can
be '-I IP'. Do you really need '-I eth1'?

Regards

--
Julian Anastasov <ja@ssi.bg>
* Re: routing bug report for 2.4
From: Ben Greear @ 2003-06-28 18:38 UTC
To: Julian Anastasov, netdev

Julian Anastasov wrote:
> Hello,
>
> I'll try to reply to some of your posts...
>
> On Fri, 27 Jun 2003, Ben Greear wrote:
>
>>(This has been discussed with Alexey, but sending to the list for
>>general consumption).
>
> I remember your previous posts. I assume they were skipped
> because you do not use the routing system properly: you have to
> use preferred sources in your routes. Now I'm not sure if the case
> is the same.
>
>>Here is how to reproduce this:
>>ifconfig eth1 192.1.1.2 netmask 255.255.255.0
>>ifconfig eth2 192.1.2.2 netmask 255.255.255.0
>>
>>Set up policy based routing with the 'ip' tool to make packets with
>>source-address of each interface to use the gateway for that interface.
>>  Set gateway for eth1 to be 192.1.1.1
>>  Set gateway for eth2 to be 192.1.2.1
>
> But not all packets; maybe you have to place the source-based
> rules after table main. This is the recommended way.

My test works if I ping the 192.1.2.1 router from the eth1 interface, the issue
is that the localness of eth2 over-rides the policy based routing.

Also, note that it does work when I BINDTODEVICE on eth1. I had assumed that
because I was setting the source IP, and had a specific routing table for
that case, then it would use that routing table. In the error case, it is
at least partially ignoring that routing table, though not entirely: It is
trying to communicate on eth1, but it is arping instead of routing.

>>Now, use ping to try to send pkts from one interface to the other:
>>
>>ping -I 192.1.1.2 192.1.2.2
>
> Your report is damn wrong, why do you ping local IP?
> Or maybe that is your test? Trying ping from ip-utils... sorry,
> not reproducible here (I hope it is the expected result).

What results do you get? And did you set up policy based routing?

I tried ping with RH8, RH9, and downloaded the latest ip-utils I could
find. Only when I hacked the ping source to bind to the local IP AND bind
specifically to the device did it work.

I am trying to ping a local IP but over the external network. It is not something
most people try to do now, I am aware. As well as my twisted reasons, it would
be good for determining path failures in an HA setup, so it's not completely
useless :)

>>You will see arps on eth1 for 192.1.2.2, whereas you should see packets
>>being sent to the default gateway for eth1.
>
> Why? 192.1.2.2 is local IP and the local table is first
> priority. We should not see any ARP packets for local targets, right?

Local table is not used in my case because I specifically bind to the sending IP
and have a table specifically for that case.

>>If you modify the ping source to BINDTODEVICE eth1, then it will send
>>correctly. I am under the impression that you should not have to specifically
>>BINDTODEVICE in this case since the policy based routing should take care of
>>routing things correctly. Or, maybe, the real bug is in ping in that it did
>>not BINDTODEVICE?
>
> Do you really ping local IP?

Yes.

>>Also, ping -I eth1 192.1.2.2 will fail to route externally. That may
>>just be a feature of ping: I'm unsure what the subtle difference is *supposed*
>>to be between using -I eth1 and -I 1.2.3.4
>
> I think the root of your problems is that you specify
> 'ping -I device' and the routing is forced to construct result from
> unknown route by using source address autoselection.

I am open to suggestions as to other ways to make this work: I want to ping from eth1
to eth2, and have at least the echo-request go out over eth1 and be routed back to eth2.

> From previous post:
>
>># The other interface on the router machine (same machine as I just pinged above)
>>[root@localhost root]# ping -I eth1 10.3.2.1
>>PING 10.3.2.1 (10.3.2.1) from 10.3.1.4 eth1: 56(84) bytes of data.
>> From 10.3.1.4 icmp_seq=1 Destination Host Unreachable
>> From 10.3.1.4 icmp_seq=3 Destination Host Unreachable
>>
>># It is NOT using the default gateway for this traffic, but is instead
>># just trying to ARP.
>>[root@localhost root]# tcpdump -n -i eth1
>>tcpdump: listening on eth1
>>11:56:19.788336 arp who-has 10.3.2.1 tell 10.3.1.4
>>11:56:20.788134 arp who-has 10.3.2.1 tell 10.3.1.4
>>11:56:21.788149 arp who-has 10.3.2.1 tell 10.3.1.4
>>11:56:22.788379 arp who-has 10.3.2.1 tell 10.3.1.4
>
> '-I eth1 10.3.2.1' requests route
> "from 0.0.0.0 to 10.3.2.1 oif eth1". You do not have such routes.
> I assume the result is (quoting route.c):
> "Apparently, routing tables are wrong."
> "Assume, that the destination is on link."
>
> For your setup I would say "The request is wrong". You see that
> the kernel does not even check whether eth1 is UP. You are lucky.
>
> Then the kernel autoselects 10.3.1.4 as src for the forced eth1 device.
> Thus, you see this ARP probe. Later, it seems 10.3.2.1 does not
> want to reply to 10.3.1.4, I assume this is a known problem?
>
> As for ping from iputils: you can specify device or saddr,
> not both, so the only valid test for source based routing can
> be '-I IP'. Do you really need '-I eth1'?

Actually, from the code I looked at, you can use two -I flags, but what appears
to be a bug actually keeps it from working completely (I could find no combo of arguments
to make it make the BINDTODEVICE call.)

During some of my earlier testing, I had various things wrong. For instance, I
noticed that if I had policy-based routing on my router, it would not work correctly.
I have not debugged that issue in depth, as it does not really hinder the functionality
that I require. If it still doesn't work in 2.6 I'll open a bug ;)

One final note, I am running a kernel with a patch that allows external comm over
two interfaces on the same machine on the same subnet (with policy based routing).
The normal ping works in this case, btw. So, it may be that even if
you change ping, it may still not work for you (my patch mostly deals with getting
local ARPs to answer correctly, so I am not sure it comes into play in the routed case.)

Ben

> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

--
Ben Greear <greearb@candelatech.com> <Ben_Greear@excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear
* Re: routing bug report for 2.4
From: Julian Anastasov @ 2003-06-28 20:12 UTC
To: Ben Greear; +Cc: netdev

Hello,

On Sat, 28 Jun 2003, Ben Greear wrote:

> My test works if I ping the 192.1.2.1 router from the eth1 interface, the issue
> is that the localness of eth2 over-rides the policy based routing.

ok but be ready for problems if rp_filter is used

> Also, note that it does work when I BINDTODEVICE on eth1. I had assumed that
> because I was setting the source IP, and had a specific routing table for
> that case, then it would use that routing table. In the error case, it is
> at least partially ignoring that routing table, though not entirely: It is
> trying to communicate on eth1, but it is arping instead of routing.

It is arping because '-I device' does not hit your
'from local_IP => 0/0 via remote_GW' route, the kernel can not find
route "from 0 to remote_IP oif dev". If you specify '-I local_IP'
then it will hit the 'from local_IP' rule that points to your table.
See, "Assume, that the destination is on link", it is not gatewayed
as you expect. Thus, the ARP probe is resolving target, not the GW.

BINDTODEVICE translated to routing request is "oif XXX".
As ping can do -I device (and can not specify saddr at the same
time) the result is that the device is used (unless target is local),
saddr is autoselected (there is no provided saddr) starting from
the -I device, there is no GW (the target becomes gw, route is
forced onlink), the packet reaches the neighbouring code where
ARP sends probe to target (not to GW).

> >>Now, use ping to try to send pkts from one interface to the other:
> >>
> >>ping -I 192.1.1.2 192.1.2.2
> >
> > Your report is damn wrong, why do you ping local IP?
> > Or maybe that is your test? Trying ping from ip-utils... sorry,
> > not reproducible here (I hope it is the expected result).
>
> What results do you get? And did you set up policy based routing?

Yes, I have tried to simulate your rules and routes but
not exactly. In any case, I can not generate ARP traffic when
pinging local IP no matter what device I use. The kernel
normally overrides the -I option if you talk to local IP, lo is
used. It is expected with the plain kernel.

> I tried ping with RH8, RH9, and downloaded the latest ip-utils I could
> find. Only when I hacked the ping source to bind to the local IP AND bind
> specifically to the device did it work.

Yes, that will hit the ip rule and will avoid the "lo"
cancellation for your patched kernel.

> I am trying to ping a local IP but over the external network. It is not something
> most people try to do now, I am aware. As well as my twisted reasons, it would
> be good for determining path failures in an HA setup, so it's not completely
> useless :)

I now see that you have patched kernel and this is the reason
I can not fully understand your previous postings. The normal kernel
can not generate such strange results (I mean the ARP requests when
resolving local IP). All your problems do not show kernel bug yet,
it seems the problem is hidden in your strategy to support
remote local IPs. Or maybe you do not have problems with your
tests but the plain kernel is suspect for ping insanity?

> > Why? 192.1.2.2 is local IP and the local table is first
> > priority. We should not see any ARP packets for local targets, right?
>
> Local table is not used in my case because I specifically bind to the sending IP
> and have a table specifically for that case.

Not true with the normal kernel, maybe your patches
avoid selecting dev lo for traffic to local IPs if oif is specified?

> > I think the root of your problems is that you specify
> > 'ping -I device' and the routing is forced to construct result from
> > unknown route by using source address autoselection.
>
> I am open to suggestions as to other ways to make this work: I want to ping from eth1
> to eth2, and have at least the echo-request go out over eth1 and be routed back to eth2.

I see, this is another problem because you do not
mention in your posts that you have patched kernel.

> > As for ping from iputils: you can specify device or saddr,
> > not both, so the only valid test for source based routing can
> > be '-I IP'. Do you really need '-I eth1'?
>
> Actually, from the code I looked at, you can use two -I flags, but what appears
> to be a bug actually keeps it from working completely (I could find no combo of arguments
> to make it make the BINDTODEVICE call.)

I do not see such -I behaviour in ping. I understand that
the only way to really avoid the "lo" cancellation and to send
traffic with daddr=local_IP is to patch the routing to keep the
original device and always to BINDTODEVICE for this reason (-I dev).

> During some of my earlier testing, I had various things wrong. For instance, I
> noticed that if I had policy-based routing on my router, it would not work correctly.

missing preferred sources in routes?

> I have not debugged that issue in depth, as it does not really hinder the functionality
> that I require. If it still doesn't work in 2.6 I'll open a bug ;)
>
> One final note, I am running a kernel with a patch that allows external comm over
> two interfaces on the same machine on the same subnet (with policy based routing).
> The normal ping works in this case, btw. So, it may be that even if
> you change ping, it may still not work for you (my patch mostly deals with getting
> local ARPs to answer correctly, so I am not sure it comes into play in the routed case.)

If you still suspect the kernel, maybe you can show me
a fresh link for this patch because I'm not sure it is valid or at least
does not break the things. But adding '-I local_IP' together with
"-I device" should avoid the wrong ARP probe "where is TARGET",
it should be changed to "where is GW".

So, IMO, you need to make sure in your tests that:

- you have patched ping to support -I device and -I local_IP together
- you have preferred source in all your routes

Do you still suspect the kernel?

> Ben

Regards

--
Julian Anastasov <ja@ssi.bg>
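
The two lookups contrasted in this message (one keyed on a source address, one keyed on an output device) can be inspected with `ip route get`, which asks the kernel to resolve a route without sending a packet. A minimal sketch, assuming the addresses from the original report; the `run`, `lookup_by_source` and `lookup_by_device` helper names are made up for illustration, and no output is shown because it depends on the kernel and the routing setup.

```shell
#!/bin/sh
# Sketch only: compare the route the kernel picks for the two ping variants.

# Hypothetical dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Equivalent of 'ping -I local_IP target': the lookup carries a source
# address, so an 'ip rule ... from local_IP' entry can match.
lookup_by_source() {
    run ip route get "$2" from "$1"
}

# Equivalent of 'ping -I device target': the lookup carries only an output
# interface ("oif") and no source address.
lookup_by_device() {
    run ip route get "$2" oif "$1"
}

lookup_by_source 192.1.1.2 192.1.2.2
lookup_by_device eth1 192.1.2.2
```

If the explanation above holds, only the first form should show a gateway in its result when the policy rules are in place.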
* Re: routing bug report for 2.4
From: Julian Anastasov @ 2003-06-28 20:38 UTC
To: Ben Greear; +Cc: netdev

Hello,

On Sat, 28 Jun 2003, Ben Greear wrote:

> What results do you get? And did you set up policy based routing?

I now see, the kernel sends "who-has local_IP" when you
use 'ping -I device local_IP'. If this is considered bad we can extend
the checks when fib_lookup fails:

- check for UP state (is it needed? return ENETDOWN?)
- check if target IP is local and select "lo" instead of oif

Regards

--
Julian Anastasov <ja@ssi.bg>
* Re: routing bug report for 2.4
From: Ben Greear @ 2003-06-28 22:13 UTC
To: Julian Anastasov; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]

Julian Anastasov wrote:
> Hello,
>
> On Sat, 28 Jun 2003, Ben Greear wrote:
>
>>What results do you get? And did you set up policy based routing?
>
> I now see, the kernel sends "who-has local_IP" when you
> use 'ping -I device local_IP'. If this is considered bad we can extend
> the checks when fib_lookup fails:
>
> - check for UP state (is it needed? return ENETDOWN?)
> - check if target IP is local and select "lo" instead of oif

Well, why should it try to route locally in this case (I'm assuming that
by using 'lo' it will not try to send on the external link)

Why not instead make it send to the router for that source-ip, if it is
configured. If it is not configured, then I think arping is the best that
can be expected, as the behaviour becomes quite undefined and we really
have 'no route to host'.

My send-to-self patch that I have been using is attached. I also have some other
patches for mac-vlans and packet-gen applied, but I don't believe these will have any
impact on the behaviour we have been discussing.

There is example code on how to use it (and an original, more crufty patch) here:
http://lwn.net/Articles/9897/

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com> <Ben_Greear@excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

[-- Attachment #2: sts.diff --]
[-- Type: text/plain, Size: 4999 bytes --]

--- linux-2.4.20/include/linux/sockios.h 2001-11-07 14:39:36.000000000 -0800
+++ linux-2.4.20.c3/include/linux/sockios.h 2003-03-18 14:32:53.000000000 -0800
@@ -65,6 +65,8 @@
 #define SIOCDIFADDR 0x8936 /* delete PA address */
 #define SIOCSIFHWBROADCAST 0x8937 /* set hardware broadcast addr */
 #define SIOCGIFCOUNT 0x8938 /* get number of devices */
+#define SIOCGIFWEIGHT 0x8939 /* get weight of device, in stones */
+#define SIOCSIFWEIGHT 0x893a /* set weight of device, in stones */
 
 #define SIOCGIFBR 0x8940 /* Bridging support */
 #define SIOCSIFBR 0x8941 /* Set bridging options */
@@ -92,6 +94,10 @@
 #define SIOCGRARP 0x8961 /* get RARP table entry */
 #define SIOCSRARP 0x8962 /* set RARP table entry */
 
+/* MAC address based VLAN control calls */
+#define SIOCGIFMACVLAN 0x8965 /* Mac address multiplex/demultiplex support */
+#define SIOCSIFMACVLAN 0x8966 /* Set macvlan options */
+
 /* Driver configuration calls */
 
 #define SIOCGIFMAP 0x8970 /* Get device parameters */
@@ -114,6 +120,16 @@
 #define SIOCBONDINFOQUERY 0x8994 /* rtn info about bond state */
 #define SIOCBONDCHANGEACTIVE 0x8995 /* update to a new active slave */
 
+
+/* Ben's little hack land */
+#define SIOCSACCEPTLOCALADDRS 0x89a0 /* Allow interfaces to accept pkts from
+                                      * local interfaces...use with SO_BINDTODEVICE
+                                      */
+#define SIOCGACCEPTLOCALADDRS 0x89a1 /* Allow interfaces to accept pkts from
+                                      * local interfaces...use with SO_BINDTODEVICE
+                                      */
+
+
 /* Device private ioctl calls */
 
 /*
--- linux-2.4.20/net/Config.in 2002-08-02 17:39:46.000000000 -0700
+++ linux-2.4.20.c3/net/Config.in 2003-03-18 14:32:53.000000000 -0800
@@ -48,6 +48,7 @@
          bool ' Per-VC IP filter kludge' CONFIG_ATM_BR2684_IPFILTER
       fi
    fi
+   tristate 'MAC address based VLANs (EXPERIMENTAL)' CONFIG_MACVLAN
 fi
 
 tristate '802.1Q VLAN Support' CONFIG_VLAN_8021Q
--- linux-2.4.20/net/ipv4/arp.c 2002-11-28 15:53:15.000000000 -0800
+++ linux-2.4.20.c3/net/ipv4/arp.c 2003-03-18 14:32:53.000000000 -0800
@@ -1,4 +1,4 @@
-/* linux/net/inet/arp.c
+/* linux/net/inet/arp.c -*-linux-c-*-
  *
  * Version: $Id: arp.c,v 1.99 2001/08/30 22:55:42 davem Exp $
  *
@@ -351,12 +351,22 @@
         int flag = 0;
         /*unsigned long now; */
 
-        if (ip_route_output(&rt, sip, tip, 0, 0) < 0)
+        if (ip_route_output(&rt, sip, tip, 0, 0) < 0)
                 return 1;
-        if (rt->u.dst.dev != dev) {
-                NET_INC_STATS_BH(ArpFilter);
-                flag = 1;
-        }
+
+        if (rt->u.dst.dev != dev) {
+                if ((dev->priv_flags & IFF_ACCEPT_LOCAL_ADDRS) &&
+                    (rt->u.dst.dev == &loopback_dev)) {
+                        /* OK, we'll let this special case slide, so that we can arp from one
+                         * local interface to another. This seems to work, but could use some
+                         * review. --Ben
+                         */
+                }
+                else {
+                        NET_INC_STATS_BH(ArpFilter);
+                        flag = 1;
+                }
+        }
         ip_rt_put(rt);
         return flag;
 }
--- linux-2.4.20/net/ipv4/fib_frontend.c 2002-08-02 17:39:46.000000000 -0700
+++ linux-2.4.20.c3/net/ipv4/fib_frontend.c 2003-03-18 14:32:53.000000000 -0800
@@ -233,8 +233,17 @@
 
         if (fib_lookup(&key, &res))
                 goto last_resort;
-        if (res.type != RTN_UNICAST)
-                goto e_inval_res;
+
+        if (res.type != RTN_UNICAST) {
+                if ((res.type == RTN_LOCAL) &&
+                    (dev->priv_flags & IFF_ACCEPT_LOCAL_ADDRS)) {
+                        /* All is OK */
+                }
+                else {
+                        goto e_inval_res;
+                }
+        }
+
         *spec_dst = FIB_RES_PREFSRC(res);
         fib_combine_itag(itag, &res);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
--- linux-2.4.20/net/ipv4/tcp_ipv4.c 2002-11-28 15:53:15.000000000 -0800
+++ linux-2.4.20.c3/net/ipv4/tcp_ipv4.c 2003-03-18 14:32:53.000000000 -0800
@@ -1394,7 +1394,7 @@
 #define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
 #endif
 
-        /* Never answer to SYNs send to broadcast or multicast */
+        /* Never answer to SYNs sent to broadcast or multicast */
         if (((struct rtable *)skb->dst)->rt_flags &
             (RTCF_BROADCAST|RTCF_MULTICAST))
                 goto drop;
* Re: routing bug report for 2.4
From: Julian Anastasov @ 2003-06-29 7:28 UTC
To: Ben Greear; +Cc: netdev

Hello,

On Sat, 28 Jun 2003, Ben Greear wrote:

> > - check for UP state (is it needed? return ENETDOWN?)
> > - check if target IP is local and select "lo" instead of oif

First, here is what I mean (not compiled):

- ignore matching of the oif key for local destinations
- return ENETDOWN when the specified out_dev is down

Dave, Alexey, can you judge on these issues because
they are not fatal corner cases and can be ignored.

--- v2.4.21/linux/net/ipv4/fib_semantics.c.orig Sat Jun 14 08:42:55 2003
+++ v2.4.21/linux/net/ipv4/fib_semantics.c Sun Jun 29 09:28:10 2003
@@ -603,7 +603,9 @@
                 for_nexthops(fi) {
                         if (nh->nh_flags&RTNH_F_DEAD)
                                 continue;
-                        if (!key->oif || key->oif == nh->nh_oif)
+                        if (!key->oif ||
+                            key->oif == nh->nh_oif ||
+                            nh->nh_scope == RT_SCOPE_NOWHERE)
                                 break;
                 }
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
--- v2.4.21/linux/net/ipv4/route.c.orig Sat Jun 14 08:42:55 2003
+++ v2.4.21/linux/net/ipv4/route.c Sun Jun 29 09:16:03 2003
@@ -1793,6 +1793,9 @@
                         dev_put(dev_out);
                         goto out; /* Wrong error code */
                 }
+                err = -ENETDOWN;
+                if (!(dev_out->flags&IFF_UP))
+                        goto out;
 
                 if (LOCAL_MCAST(oldkey->dst) || oldkey->dst == 0xFFFFFFFF) {
                         if (!key.src)

> Well, why should it try to route locally in this case (I'm assuming that
> by using 'lo' it will not try to send on the external link)

No, it does not use "lo", "lo" replaces "dev" only if
we get RTN_LOCAL result. But "to local_IP dev different_device"
can escape from our host because we can not find route and thus
we can not override out_dev with lo.

> Why not instead make it send to the router for that source-ip, if it is
> configured. If it is not configured, then I think arping is the best that

What we have is that app uses BINDTODEVICE to send packet
with saddr=some_IP daddr=any_valid_local_IP. This is confusing but
I do not see any harm. But I think route request "to local_IP"
deserves "lo" result no matter the oif key.

> can be expected, as the behaviour becomes quite undefined and we really
> have 'no route to host'.

The only reason can be to avoid confusions and to make
it symmetric with the source validation check. And yes, this
patch breaks your tests.

> My send-to-self patch that I have been using is attached. I also have some other
> patches for mac-vlans and packet-gen applied, but I don't believe these will have any
> impact on the behaviour we have been discussing.

I don't see anything in your patch that can disturb these
tests. The kernel is helpful enough to send your ARP probe for local_IP
on the LAN :) When I tested the first time, you claimed
-I local_IP1 local_IP2 causes the problem but as we see, it is
caused from -I dev

> Thanks,
> Ben

Regards

--
Julian Anastasov <ja@ssi.bg>
* send-to-self (was Re: routing bug report for 2.4)
From: Julian Anastasov @ 2003-06-29 9:43 UTC
To: Ben Greear; +Cc: netdev

Hello,

On Sat, 28 Jun 2003, Ben Greear wrote:

> My send-to-self patch that I have been using is attached. I also have some other
> patches for mac-vlans and packet-gen applied, but I don't believe these will have any
> impact on the behaviour we have been discussing.

Ben, let's define new behaviour for your feature:

1. we mark ethX with /proc/sys/net/ipv4/conf/ethX/loop=1. That means
this is a loop device (my site contains a lot of device flags, you can see
what creating a sysctl var costs):

http://www.ssi.bg/~ja/

just hit some of the links, recommended example:

http://www.ssi.bg/~ja/forward_shared-2.4.19-2.diff

there are 2 variants:

- loop can be 0(no loop) / 1(loop inout)

or

- 0(no loop), 1(loop in only), 2(loop out only), 3(loop inout)

where "loop in only" means "accept only" and "loop out only" is "send only"
interface but as all traffic is inout I think "loop inout" will
always be used

2. arp_filter accepts traffic on ethX (as in your patch) if "loop in"
is allowed for indev and "loop out" for the out_dev in routing result

3. rp_filter (source validation) accepts traffic on ethX (as in your patch)
if "loop in" is allowed

4. get unicast output route for local IPs ethY->ethX if "loop in" is
allowed for ethX and "loop out" is allowed for ethY. ARP will add
cache entries for local IPs.

Goal 1. Can we just skip the BINDTODEVICE thing and replace it
with bind to src IP? We can avoid binding to src IP for our tests if
we replace the preferred source IP in the desired local routes but
this is a hack. Using BINDTODEVICE will not add any benefits but will be
supported (it is ignored). Then to define it in this way:

If ethX has "/proc/sys/net/ipv4/conf/ethX/loop" set to !0 then all
output routes "from local_ip_on_ethY to local_ip_on_ethX" will not receive
"lo" result but "ethY" with RTN_UNICAST type if local_ip_on_ethY is
configured on ethY (ethY has loop enabled too), no matter the key->oif
value. Sort of:

fib_lookup for "from IP1 to IP2 oif XXX"
if (RTN_LOCAL) {
        if dev_out is loop_in and key->src != 0 {
                src = key->src? : FIB_RES_PREFSRC(res);
                dev_in = ip_dev_find(src);
                if (dev_in is loop_out) {
                        use dev_in as dev_out
                        goto make_route;
                }
        }
        // else use "lo"
}

- this code is slow but it is guarded from loop check for out_dev
so I do not see performance impact (the output routing to localhost is not
used often). The result is cached (you can set long routing cache
expiration value during the tests).

- we assume my patch from previous posting is applied and we match
any local IP no matter the key oif.

Goal 2. Can we skip all TCP/UDP changes?

- we rely on the fact the routing results allow traffic in both
directions (incoming is accepted with RTN_LOCAL, output gets
RTN_UNICAST). As for IPv6 I can not comment, we define
ipv4/conf/XXX/loop flag, though. But I prefer that we keep the changes
only at routing level. For TCP and UDP these talks should look
as if "lo" is used.

- what I'm not sure is whether any socket hash problems exist and
this is the only thing that can prevent this patch from looking nice and
fast. But I wonder whether there are such issues, as the talks on "lo" should
work, but we have to check that.

The usage:

- mark eth0 as loop_out and eth1 as loop_in device and start the
test in eth0->eth1 direction or use loop inout for both directions.

If you think that we can change only the routing then I can
prepare patch for testing, I'm not sure I have a test setup for this
feature right now.

Regards

--
Julian Anastasov <ja@ssi.bg>
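
The proposed 0/1/2/3 values and the device choice in the pseudo-code above can be tried out as a small model. This is only a sketch of the proposal as described in this message, not kernel code: it assumes that value 1 sets a "loop in" bit and value 2 a "loop out" bit, that a source IP was given (key->src != 0), and the function names `loop_in`, `loop_out` and `pick_out_dev` are invented for illustration.

```shell
#!/bin/sh
# Model of the proposed per-device 'loop' value (an assumption based on the
# message, not an existing kernel interface):
#   0 = no loop, 1 = loop in only, 2 = loop out only, 3 = loop inout
loop_in()  { [ $(( $1 & 1 )) -ne 0 ]; }
loop_out() { [ $(( $1 & 2 )) -ne 0 ]; }

# pick_out_dev SRC_DEV SRC_LOOP DST_LOOP
# For a lookup "from local IP on SRC_DEV to local IP on another device":
# print SRC_DEV when the destination device accepts looped traffic and the
# source device may send it, otherwise print "lo" as the plain kernel would.
pick_out_dev() {
    src_dev=$1
    src_loop=$2
    dst_loop=$3
    if loop_in "$dst_loop" && loop_out "$src_loop"; then
        echo "$src_dev"
    else
        echo lo
    fi
}

pick_out_dev eth0 3 3   # both devices inout
pick_out_dev eth0 2 1   # eth0 may send, destination may accept
pick_out_dev eth0 1 3   # eth0 is "loop in only", so it may not send
```

Under these assumptions the first two calls print eth0 and the third prints lo, which matches the "mark eth0 as loop_out and eth1 as loop_in" usage given in the message.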
* Re: send-to-self (was Re: routing bug report for 2.4)
From: Julian Anastasov @ 2003-06-29 20:18 UTC
To: Ben Greear; +Cc: netdev

Hello,

Ben, I have something for comments and testing (compiled
only):

http://www.ssi.bg/~ja/send-to-self-2.4.21-1.diff

The usage should be:

eth0/loop=1
eth1/loop=1
bind to src IP from eth0 and connect to local IP on eth1

Be ready, there can be something totally wrong.

I'm avoiding the arp_filter changes. The setup uses
asymmetric routing so better use arp_filter=0 or other
ARP filtering tools that can restrict our ARP replies
only via the desired device.

Regards

--
Julian Anastasov <ja@ssi.bg>
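
The usage described in this message could be scripted roughly as follows. This is a sketch under stated assumptions: the `loop` sysctl exists only on a kernel carrying the send-to-self patch linked above, the `run` and `enable_loop` helpers are hypothetical, and the two addresses are placeholders borrowed from the first report rather than anything this message specifies. Commands are only printed unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: enable the proposed 'loop' flag and run a source-bound ping.
# Requires a kernel with the send-to-self patch; 'loop' is not a stock sysctl.

# Hypothetical dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# enable_loop DEV...
# Sets loop=1 and arp_filter=0 on each device, as the message suggests.
enable_loop() {
    for dev in "$@"; do
        run sysctl -w "net.ipv4.conf.$dev.loop=1"
        run sysctl -w "net.ipv4.conf.$dev.arp_filter=0"
    done
}

SRC_IP=192.1.1.2   # placeholder: a local IP configured on eth0
DST_IP=192.1.2.2   # placeholder: a local IP configured on eth1

enable_loop eth0 eth1
# Bind to the source IP (not the device) and talk to the other local IP.
run ping -I "$SRC_IP" "$DST_IP"
```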
* Re: send-to-self (was Re: routing bug report for 2.4)
  2003-06-29 20:18 ` Julian Anastasov
@ 2003-06-30 7:59 ` Ben Greear
  2003-06-30 10:43 ` Julian Anastasov
  2003-07-01 21:57 ` Julian Anastasov
  0 siblings, 2 replies; 15+ messages in thread
From: Ben Greear @ 2003-06-30 7:59 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev

Julian Anastasov wrote:
> Hello,
>
> Ben, I have something for comments and testing (compiled
> only):
>
> http://www.ssi.bg/~ja/send-to-self-2.4.21-1.diff

Just moved to my new home..will be a few days before I can take a
detailed look at this..and your long description confused my tired
mind for tonight... I'll look in detail soon.

>
> The usage should be:
> eth0/loop=1
> eth1/loop=1
> bind to src IP from eth0 and connect to local IP on eth1
>
> Be ready, there can be something totally wrong.
>
> I'm avoiding the arp_filter changes. The setup uses
> asymmetric routing so better use arp_filter=0 or other

arp_filter=1, right?

> ARP filtering tools that can restrict our ARP replies
> only via the desired device.

I want to avoid strange(r) routing configurations, as I'm already
using lots of routing tricks, and don't want to confuse matters
more. I also turn on arp filtering to ensure the arps go out the
right interface currently.

You should be able to easily test most of the changes in your code
if you have a machine with two ethernet interfaces and a loopback
cable...

My requirements are:

1) Both ethernet ports communicate over the external link, UDP & IP traffic.
   Third-party programs if possible, thus I set the flag on the interface in
   my patch, not on an individual socket, though I do have to BINDTODEVICE and
   policy-based route to get things working right...

1b) Allow both same-subnet comm (eth1 & eth2 are on same subnet), and also
   routed traffic (eth1 & eth2 have their own default router, similar to the
   previously discussed routing setup)

2) Allow normal non-looped communication on the ports, including policy-based routing
   based on source addr.

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: send-to-self (was Re: routing bug report for 2.4)
  2003-06-30 7:59 ` Ben Greear
@ 2003-06-30 10:43 ` Julian Anastasov
  2003-07-01 21:57 ` Julian Anastasov
  1 sibling, 0 replies; 15+ messages in thread
From: Julian Anastasov @ 2003-06-30 10:43 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

Hello,

On Mon, 30 Jun 2003, Ben Greear wrote:

> > I'm avoiding the arp_filter changes. The setup uses
> > asymmetric routing so better use arp_filter=0 or other
>
> arp_filter=1, right?

Right, my mistake, the routing is symmetric, arp_filter=1
is even recommended. Only rp_filter=1 can not be used as ARP filter
but you can use rp_filter=1 for IP filtering.

> I want to avoid strange(r) routing configurations, as I'm already
> using lots of routing tricks, and don't want to confuse matters
> more. I also turn on arp filtering to ensure the arps go out the
> right interface currently.

Right, you need just to bind to src IP (may be you can
avoid even that if you replace the prefsrc in your local routes).

> You should be able to easily test most of the changes your code
> if you have a machine with two ethernet interfaces and a loopback
> cable...

That is the problem, no 2.4 host with 2 NICs. Not soon.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply [flat|nested] 15+ messages in thread
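[Editorial sketch] The per-device settings discussed above (arp_filter=1, rp_filter=1, and the proposed loop flag) are files under /proc/sys/net/ipv4/conf/<dev>/. A small helper for setting and reading them might look like the following; the function names and the `root` parameter are assumptions made for illustration. `arp_filter` and `rp_filter` are existing sysctls, while `loop` exists only on a kernel carrying the patch from this thread.

```python
from pathlib import Path

DEFAULT_ROOT = "/proc/sys/net/ipv4/conf"


def set_ipv4_conf(dev: str, key: str, value: int,
                  root: str = DEFAULT_ROOT) -> Path:
    """Write an integer value to <root>/<dev>/<key> and return the path."""
    path = Path(root) / dev / key
    path.write_text(f"{value}\n")
    return path


def read_ipv4_conf(dev: str, key: str, root: str = DEFAULT_ROOT) -> int:
    """Read an integer value back from <root>/<dev>/<key>."""
    return int((Path(root) / dev / key).read_text().strip())
```

For example, `set_ipv4_conf("eth0", "arp_filter", 1)` corresponds to the arp_filter=1 setting recommended above. Writing under /proc/sys normally requires root; the `root` argument exists so the helper can be pointed at a scratch directory when trying it out.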
* Re: send-to-self (was Re: routing bug report for 2.4)
  2003-06-30 7:59 ` Ben Greear
  2003-06-30 10:43 ` Julian Anastasov
@ 2003-07-01 21:57 ` Julian Anastasov
  2003-07-01 22:07 ` Ben Greear
  1 sibling, 1 reply; 15+ messages in thread
From: Julian Anastasov @ 2003-07-01 21:57 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

Hello,

On Mon, 30 Jun 2003, Ben Greear wrote:

> You should be able to easily test most of the changes in your code
> if you have a machine with two ethernet interfaces and a loopback
> cable...

ok, tested the 2.5 version, the patch files are updated:

http://www.ssi.bg/~ja/#loop

- added missing dev_put on ENETDOWN
- removed the checks that ignore oif for local routes as Alexey suggests

I have tried simple tests: ICMP, telnet. What I see
is that the 2.5 rt_set_nexthop() does not set sysctl_ip_default_ttl if
res->fi is NULL and that causes the icmp echo packets to use
ttl=0. May be there are still some noisy places like arp_set_predefined,
it will need further investigation. I'm stopping here, for now.

> My requirements are:
>
> 1) Both ethernet ports communicate over the external link, UDP & IP traffic.

Done

> Third-party programs if possible, thus I set the flag on the interface in
> my patch, not on an individual socket, though I do have to BINDTODEVICE and
> policy-based route to get things working right...

Now you have 2 options:

- bind to src IP: the app needs to be aware of that

- ip route replace local IP2 dev DEV2 ... src IP1 table local: the app
does not need to be aware to use this feature

Now using BINDTODEVICE can cause problems with this feature,
because we do not ignore oif for local destinations, you risk to
miss the local route and arp_filter to break the things or worse (not
tested)

> 1b) Allow both same-subnet comm (eth1 & eth2 are on same subnet), and also
> routed traffic (eth1 & eth2 have their own default router, similar to the
> previously discussed routing setup)

all other routes remain unchanged, I hope

> 2) Allow normal non-looped communication on the ports, including policy-based routing
> based on source addr.

hm, you better know what you mean. As expected, this feature
has its drawbacks. The safe way is to teach some apps to bind to
IP1 and the apps that are unaware of these loops to use the prefsrc
and thus to use lo. There is not much space for improvement here but
I'm open for suggestions.

> Thanks,
> Ben

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: send-to-self (was Re: routing bug report for 2.4)
  2003-07-01 21:57 ` Julian Anastasov
@ 2003-07-01 22:07 ` Ben Greear
  2003-07-01 22:23 ` Julian Anastasov
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Greear @ 2003-07-01 22:07 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev

Julian Anastasov wrote:
> Hello,
>
> On Mon, 30 Jun 2003, Ben Greear wrote:
>
>
>>You should be able to easily test most of the changes in your code
>>if you have a machine with two ethernet interfaces and a loopback
>>cable...
>
>
> ok, tested the 2.5 version, the patch files are updated:
>
> http://www.ssi.bg/~ja/#loop
>
> - added missing dev_put on ENETDOWN
> - removed the checks that ignore oif for local routes as Alexey suggests
>
> I have tried simple tests: ICMP, telnet. What I see
> is that the 2.5 rt_set_nexthop() does not set sysctl_ip_default_ttl if
> res->fi is NULL and that causes the icmp echo packets to use
> ttl=0. May be there are still some noisy places like arp_set_predefined,
> it will need further investigation. I'm stopping here, for now.

How did you get telnet to bind to a particular local interface? Also, what
ping syntax did you use? Did you have to modify either of these applications
to get them to work?

I looked at the patch...but don't have a good enough grasp of the routing
code to provide a useful critique. I believe my patch _is_ smaller though ;)

Thanks,
Ben

>
>
>>My requirements are:
>>
>>1) Both ethernet ports communicate over the external link, UDP & IP traffic.
>
>
> Done
>
>
>> Third-party programs if possible, thus I set the flag on the interface in
>> my patch, not on an individual socket, though I do have to BINDTODEVICE and
>> policy-based route to get things working right...
>
>
> Now you have 2 options:
>
> - bind to src IP: the app needs to be aware of that
>
> - ip route replace local IP2 dev DEV2 ... src IP1 table local: the app
> does not need to be aware to use this feature
>
> Now using BINDTODEVICE can cause problems with this feature,
> because we do not ignore oif for local destinations, you risk to
> miss the local route and arp_filter to break the things or worse (not
> tested)
>
>
>>1b) Allow both same-subnet comm (eth1 & eth2 are on same subnet), and also
>> routed traffic (eth1 & eth2 have their own default router, similar to the
>> previously discussed routing setup)
>
>
> all other routes remain unchanged, I hope
>
>
>>2) Allow normal non-looped communication on the ports, including policy-based routing
>> based on source addr.
>
>
> hm, you better know what you mean. As expected, this feature
> has its drawbacks. The safe way is to teach some apps to bind to
> IP1 and the apps that are unaware of these loops to use the prefsrc
> and thus to use lo. There is not much space for improvement here but
> I'm open for suggestions.
>
>
>>Thanks,
>>Ben
>
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>

--
Ben Greear <greearb@candelatech.com> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: send-to-self (was Re: routing bug report for 2.4)
  2003-07-01 22:07 ` Ben Greear
@ 2003-07-01 22:23 ` Julian Anastasov
  0 siblings, 0 replies; 15+ messages in thread
From: Julian Anastasov @ 2003-07-01 22:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

Hello,

On Tue, 1 Jul 2003, Ben Greear wrote:

> How did you get telnet to bind to a particular local interface? Also, what

I tested telnet with replacing the prefsrc, as result,
'ip route get telnet_server_ip_on_eth1' returns local_ip_from_eth0 as
src. telnetd listens as usually to 0.0.0.0, incoming connection
comes (IP1->IP2), so the server always gets two different IPs...

> ping syntax did you use? Did you have to modify either of these applications
> to get them to work?

Nooo :) 'ping -I IP1 IP2' or if you set IP2's prefsrc to
IP1 then even 'ping IP2' works

> I looked at the patch...but don't have a good enough grasp of the routing
> code to provide a useful critique. I believe my patch _is_ smaller though ;)

At least, we have two alternatives :) I'm still not sure
whether the "loop" feature will need some tuning in other netsource
places.

> Thanks,
> Ben

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: send-to-self (was Re: routing bug report for 2.4)
  2003-06-29 9:43 ` send-to-self (was Re: routing bug report for 2.4) Julian Anastasov
  2003-06-29 20:18 ` Julian Anastasov
@ 2003-06-30 20:22 ` James R. Leu
  1 sibling, 0 replies; 15+ messages in thread
From: James R. Leu @ 2003-06-30 20:22 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Ben Greear, netdev

I have done some work on a related subject, 'virtual routing and
forwarding' for linux. One of the applications of this is 'self-to-self'
routing. I have mentioned my work before on this list, and have been flamed
(but no one provided me with ideas on how to do it better). If you would
like to take a look at what I have done, head over to:

http://linux-vrf.sf.net/

I'm open for suggestions of how to implement this better.

On Sun, Jun 29, 2003 at 12:43:26PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Sat, 28 Jun 2003, Ben Greear wrote:
>
> > My send-to-self patch that I have been using is attached. I also have some other
> > patches for mac-vlans and packet-gen applied, but I don't believe these will have any
> > impact on the behaviour we have been discussing.
>
> Ben, lets define new behaviour for your feature:
>
> 1. we mark ethX with /proc/sys/net/ipv4/conf/ethX/loop=1. That means
> this is a loop device (my site contains lot of device flags, you
> can see what costs creating a sysctl var):
> http://www.ssi.bg/~ja/
> just hit some of the links, recommended example:
> http://www.ssi.bg/~ja/forward_shared-2.4.19-2.diff
>
> there are 2 variants:
>
> - loop can be 0(no loop) / 1(loop inout) or
>
> - 0(no loop), 1(loop in only), 2(loop out only), 3(loop inout)
>
> where "loop in only" means "accept only" and "loop out only"
> is "send only" interface
>
> but as all traffics are inout I think "loop inout" will
> be always used
>
> 2. arp_filter accepts traffic on ethX (as in your patch)
> if "loop in" is allowed for indev and "loop out" for the
> out_dev in routing result
>
> 3. rp_filter (source validation) accepts traffic on ethX (as in your
> patch) if "loop in" is allowed
>
> 4. get unicast output route for local IPs ethY->ethX if "loop in" is
> allowed for ethX and "loop out" is allowed for ethY. ARP
> will add cache entries for local IPs.
>
>
> Goal 1. Can we just skip the BINDTODEVICE thing and to replace it
> with bind to src IP. We can avoid binding to src IP for our
> tests if we replace the preferred source IP in the desired local
> routes but this is a hack. Using BINDTODEVICE will not add
> any benefits but will be supported (it is ignored).
>
> Then to define it in this way:
>
> If ethX has "/proc/sys/net/ipv4/conf/ethX/loop" set to !0 then
> all output routes "from local_ip_on_ethY to local_ip_on_ethX" will
> not receive "lo" result but "ethY" with RTN_UNICAST type
> if local_ip_on_ethY is configured on ethY (ethY has loop enabled too),
> no matter the key->oif value. Sort of:
>
> fib_lookup for "from IP1 to IP2 oif XXX"
> if (RTN_LOCAL)
> {
>     if dev_out is loop_in and key->src != 0
>     {
>         src = key->src? : FIB_RES_PREFSRC(res);
>         dev_in = ip_dev_find(src);
>         if (dev_in is loop_out)
>         {
>             use dev_in as dev_out
>             goto make_route;
>         }
>     }
>     // else
>     use "lo"
> }
>
> - this code is slow but it is guarded from loop check for out_dev
> so I do not see performance impact (the output routing to localhost
> is not used often). The result is cached (you can set long
> routing cache expiration value during the tests).
>
> - we assume my patch from previous posting is applied
> and we match any local IP no matter the key oif.
>
> Goal 2. Can we skip all TCP/UDP changes?
>
> - we rely on the fact the routing results allow traffic in
> both directions (incoming is accepted with RTN_LOCAL, output
> gets RTN_UNICAST). As for IPv6 I can not comment, we define
> ipv4/conf/XXX/loop flag, though. But I prefer we to keep the
> changes only at routing level. For TCP and UDP these talks
> should look as if "lo" is used.
>
> - what I'm not sure is whether any socket hash problems exists
> and this is the only thing that can prevent this patch to look
> nice and fast. But I'm wondering there are such issues as
> the talks on "lo" should work but we have to check that.
>
> The usage:
>
> - mark eth0 as loop_out and eth1 as loop_in device and start the test
> in eth0->eth1 direction or use loop inout for both directions.
>
> If you think that we can change only the routing then
> I can prepare patch for testing, I'm not sure I have a test setup
> for this feature right now.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>

--
James R. Leu

^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2003-07-01 22:23 UTC | newest]

Thread overview: 15+ messages
2003-06-27 23:00 routing bug report for 2.4 Ben Greear
2003-06-28 9:02 ` Julian Anastasov
2003-06-28 18:38 ` Ben Greear
2003-06-28 20:12 ` Julian Anastasov
2003-06-28 20:38 ` Julian Anastasov
2003-06-28 22:13 ` Ben Greear
2003-06-29 7:28 ` Julian Anastasov
2003-06-29 9:43 ` send-to-self (was Re: routing bug report for 2.4) Julian Anastasov
2003-06-29 20:18 ` Julian Anastasov
2003-06-30 7:59 ` Ben Greear
2003-06-30 10:43 ` Julian Anastasov
2003-07-01 21:57 ` Julian Anastasov
2003-07-01 22:07 ` Ben Greear
2003-07-01 22:23 ` Julian Anastasov
2003-06-30 20:22 ` James R. Leu