* what's causing "ip_rt_bug"?
From: Tomasz Chmielewski @ 2011-06-17 20:00 UTC
To: netdev

I have a system pushing around 800 Mbit/s, ~130 kpps.

It uses the 2.6.35.12 kernel.

Several times a minute I can see entries like these (a.b.c.d is the IP of this system):

Jun 18 02:39:19 KOR-SV22 kernel: [37187.665951] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
Jun 18 02:39:19 KOR-SV22 kernel: [37187.685419] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
Jun 18 02:40:31 KOR-SV22 kernel: [37259.199315] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
Jun 18 02:40:36 KOR-SV22 kernel: [37263.828000] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
Jun 18 02:44:16 KOR-SV22 kernel: [37484.120689] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
Jun 18 02:44:19 KOR-SV22 kernel: [37487.114357] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?

What may be causing this? Is it "dangerous"?

--
Tomasz Chmielewski
http://wpkg.org
* Re: what's causing "ip_rt_bug"?
From: Eric Dumazet @ 2011-06-17 20:36 UTC
To: Tomasz Chmielewski; +Cc: netdev

On Friday, 17 June 2011 at 22:00 +0200, Tomasz Chmielewski wrote:
> I have a system pushing around 800 Mbit/s, ~130 kpps.
>
> It uses the 2.6.35.12 kernel.
>
> Several times a minute I can see entries like these (a.b.c.d is the IP of this system):
>
> [... ip_rt_bug log lines snipped ...]

Hi

What does your routing table look like? (ip ro)

You could also backport this patch so that we can catch where/why this happens:

commit c378a9c019cf5e017d1ed24954b54fae7bebd2bc
Author: Dave Jones <davej@redhat.com>
Date:   Sat May 21 07:16:42 2011 +0000

    ipv4: Give backtrace in ip_rt_bug().

    Add a stack backtrace to the ip_rt_bug path for debugging

    Signed-off-by: Dave Jones <davej@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b24d58e..52b0b95 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1665,6 +1665,7 @@ static int ip_rt_bug(struct sk_buff *skb)
 		&ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr,
 		skb->dev ? skb->dev->name : "?");
 	kfree_skb(skb);
+	WARN_ON(1);
 	return 0;
 }
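The printk arguments visible in the context lines of this patch show how each log line is built: the IP header's saddr, its daddr, and the device name, or "?" when skb->dev is NULL. As a hedged helper for sifting through such logs, a minimal userspace C sketch that extracts those three fields might look like this. The struct and function names are hypothetical, and the line format is inferred from the log lines in this thread rather than copied from route.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one "ip_rt_bug: SRC -> DST, DEV" kernel log line. */
struct rt_bug_entry {
	char saddr[16]; /* dotted-quad source (or an obfuscated form like "110.x.x.x") */
	char daddr[16]; /* dotted-quad destination */
	char dev[32];   /* skb->dev->name, or "?" when skb->dev was NULL */
};

/* Returns 1 and fills *e if `line` contains an ip_rt_bug message, else 0.
 * Anything before "ip_rt_bug: " (syslog prefix, timestamp) is ignored. */
static int parse_rt_bug(const char *line, struct rt_bug_entry *e)
{
	static const char tag[] = "ip_rt_bug: ";
	const char *p;

	assert(line != NULL && e != NULL);
	p = strstr(line, tag);
	if (p == NULL)
		return 0;
	p += sizeof(tag) - 1;
	if (sscanf(p, "%15s -> %15[^,], %31s", e->saddr, e->daddr, e->dev) != 3)
		return 0;
	return 1;
}

/* "?" is what the printk emits when the skb carried no device. */
static int dev_is_null(const struct rt_bug_entry *e)
{
	return strcmp(e->dev, "?") == 0;
}
```

Feeding it each kernel log line and counting entries for which dev_is_null() is true is one way to measure how often device-less packets reach this path.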
* Re: what's causing "ip_rt_bug"?
From: Tomasz Chmielewski @ 2011-06-17 21:37 UTC
To: Eric Dumazet; +Cc: netdev

On 17.06.2011 22:36, Eric Dumazet wrote:
> On Friday, 17 June 2011 at 22:00 +0200, Tomasz Chmielewski wrote:
>> [...]
>
> Hi
>
> What does your routing table look like? (ip ro)

It's just a proxy, no special routing is set up:

# ip ro
58.185.117.18 via 119.46.110.193 dev eth0
119.46.240.13 via 119.46.110.193 dev eth0
58.185.117.29 via 119.46.110.193 dev eth0
119.46.241.13 via 119.46.110.193 dev eth0
58.185.117.28 via 119.46.110.193 dev eth0
119.46.110.192/26 dev eth0 proto kernel scope link src 119.46.110.197
169.254.0.0/16 dev eth0 scope link
default via 119.46.110.195 dev eth0

The box is also crashing every few days, and I really had no clue why (I just connected a serial console to catch any new oops/panic).

The last time it crashed, I had this entry in syslog:

Jun 17 16:16:17 TRUE-SC02 kernel: [172488.602629] ip_rt_bug: 124.121.155.197 -> 119.46.110.197, ?
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239041] BUG: unable to handle kernel NULL pointer dereference at (null)
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239409] IP: [<ffffffff81361cae>] dev_queue_xmit+0x1e3/0x441
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239760] PGD 43c30b067 PUD 439e63067 PMD 0
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.240103] Oops: 0000 [#1] SMP
Jun 17 16:19:58 TRUE-SC02 syslogd 1.4.1: restart.

Right now it uses the newest igb driver, and I started seeing "Out of socket memory" quite a bit (I didn't have it with the original igb driver from 2.6.35.12). So I doubled this value to:

net.ipv4.tcp_max_orphans = 256000

and "Out of socket memory" stopped showing up. Instead, "ip_rt_bug" shows up.

--
Tomasz Chmielewski
http://wpkg.org
* Re: what's causing "ip_rt_bug"?
From: Julian Anastasov @ 2011-06-17 23:56 UTC
To: Tomasz Chmielewski; +Cc: Eric Dumazet, netdev

	Hello,

On Fri, 17 Jun 2011, Tomasz Chmielewski wrote:

> > What does your routing table look like? (ip ro)
>
> It's just a proxy, no special routing is set up:

	Is transparent proxy used?

> # ip ro
> 58.185.117.18 via 119.46.110.193 dev eth0
> 119.46.240.13 via 119.46.110.193 dev eth0
> 58.185.117.29 via 119.46.110.193 dev eth0
> 119.46.241.13 via 119.46.110.193 dev eth0

	Same route for 58.185.117.28 2nd time? Is that possible?:

> 58.185.117.28 via 119.46.110.193 dev eth0
> 119.46.110.192/26 dev eth0 proto kernel scope link src 119.46.110.197
> 169.254.0.0/16 dev eth0 scope link
> default via 119.46.110.195 dev eth0
>
> The box is also crashing every few days, and I really had no clue why (I just connected a serial console to catch any new oops/panic).
>
> The last time it crashed, I had this entry in syslog:
>
> Jun 17 16:16:17 TRUE-SC02 kernel: [172488.602629] ip_rt_bug: 124.121.155.197 -> 119.46.110.197, ?

	The ip_rt_bug messages show that skb->dev is
NULL (OUTPUT hook) and daddr in the IP header is a local
address, so this may be some original received packet. If
such a packet is provided to ip_route_me_harder(skb,
RTN_UNSPEC), an ip_route_input call can happen. Calling
dst_output later should lead to this warning. The question
is what can cause a received packet to appear in the OUTPUT
hook, where a change in mark or TOS can trigger such an
ip_route_input call. What kind of netfilter modules are
used? nf_queue, -j REJECT, NAT? Is 124.121.155.197 a local
address?

> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239041] BUG: unable to handle kernel NULL pointer dereference at (null)
> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239409] IP: [<ffffffff81361cae>] dev_queue_xmit+0x1e3/0x441
> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239760] PGD 43c30b067 PUD 439e63067 PMD 0
> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.240103] Oops: 0000 [#1] SMP
> Jun 17 16:19:58 TRUE-SC02 syslogd 1.4.1: restart.

Regards

--
Julian Anastasov <ja@ssi.bg>
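Julian's reasoning can be condensed into a small userspace model of the route-kind decision that ip_route_me_harder() made before the fix, transcribed from the removed lines of the netfilter.c hunk in the patch later in this thread. This is a sketch, not kernel code: the enum and function names are hypothetical stand-ins for the RTN_* types, and the actual route lookups are omitted. It shows why a received packet with a foreign unicast saddr and no socket ends up on the ip_route_input() path.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's RTN_* route types (values are arbitrary). */
enum addr_type { ADDR_UNSPEC, ADDR_UNICAST, ADDR_LOCAL, ADDR_BROADCAST, ADDR_MULTICAST };

/* Which kind of route the function ends up attaching to the skb. */
enum route_kind { ROUTE_OUTPUT, ROUTE_INPUT };

/*
 * Userspace model of the pre-patch decision in ip_route_me_harder():
 *   saddr_type     - what inet_addr_type() reports for iph->saddr
 *   has_sk         - whether skb->sk is set
 *   sk_transparent - inet_sk(skb->sk)->transparent (needs a socket)
 *   requested      - the addr_type argument passed by the caller
 */
static enum route_kind old_route_kind(enum addr_type saddr_type, bool has_sk,
				      bool sk_transparent, enum addr_type requested)
{
	enum addr_type type = saddr_type;

	assert(has_sk || !sk_transparent);

	if (has_sk && sk_transparent)
		type = ADDR_LOCAL;
	if (requested == ADDR_UNSPEC)
		requested = type;

	/* Local source: ip_route_output_key() gives a normal output route. */
	if (requested == ADDR_LOCAL)
		return ROUTE_OUTPUT;

	/* Otherwise ip_route_input() is called. With a local daddr this yields
	 * a local-delivery input route, whose output handler is ip_rt_bug(). */
	return ROUTE_INPUT;
}
```

For the logged packet (foreign unicast saddr, no socket, RTN_UNSPEC) the model returns ROUTE_INPUT, which matches the path Julian describes.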
* Re: what's causing "ip_rt_bug"?
From: Tomasz Chmielewski @ 2011-06-18 8:31 UTC
To: Julian Anastasov; +Cc: Eric Dumazet, netdev

On 18.06.2011 01:56, Julian Anastasov wrote:
>
> 	Hello,
>
> On Fri, 17 Jun 2011, Tomasz Chmielewski wrote:
>
>>> What does your routing table look like? (ip ro)
>>
>> It's just a proxy, no special routing is set up:
>
> 	Is transparent proxy used?

Yes, it is.

>> # ip ro
>> 58.185.117.18 via 119.46.110.193 dev eth0
>> 119.46.240.13 via 119.46.110.193 dev eth0
>> 58.185.117.29 via 119.46.110.193 dev eth0
>> 119.46.241.13 via 119.46.110.193 dev eth0
>
> 	Same route for 58.185.117.28 2nd time? Is that possible?:

Not a second time; the addresses are similar but different: 58.185.117.18, 58.185.117.29, 58.185.117.28. Unless there's something I don't see! ;)

>> 58.185.117.28 via 119.46.110.193 dev eth0
>> [...]
>
> 	The ip_rt_bug messages show that skb->dev is
> [...]
> used? nf_queue, -j REJECT, NAT? Is 124.121.155.197 a local
> address?

No, it's not local.
With "ip_rt_bug: 124.121.184.77 -> 119.46.110.197, ?" lines, only the address on the right side is local.

# iptables -L -t nat -n
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
REDIRECT   tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 redir ports 8080

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

# iptables -L -t mangle -n
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DIVERT     tcp  --  0.0.0.0/0            0.0.0.0/0           socket

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Chain DIVERT (1 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0           MARK xset 0x1/0xffffffff
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

# lsmod
Module                  Size  Used by
xt_mark                 1171  1
xt_socket               1922  1
nf_tproxy_core          1752  1 xt_socket,[permanent]
ipt_REDIRECT            1093  1
xt_tcpudp               2331  1
ebt_redirect            1234  1
ebt_ip                  1562  1
ebtable_broute          1395  1
bridge                 64647  1 ebtable_broute
stp                     1931  1 bridge
llc                     5071  2 bridge,stp
ebtables               20458  1 ebtable_broute
iptable_mangle          1351  1
iptable_nat             3644  1
nf_nat                 16977  2 ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4      11077  3 iptable_nat,nf_nat
nf_defrag_ipv4          1337  2 xt_socket,nf_conntrack_ipv4
i2c_dev                 4561  0
i2c_core               21774  1 i2c_dev
nf_conntrack_netbios_ns  1486  0
nf_conntrack           65085  5 xt_socket,iptable_nat,nf_nat,nf_conntrack_ipv4,nf_conntrack_netbios_ns
iptable_filter          1402  0
ip_tables              14931  3 iptable_mangle,iptable_nat,iptable_filter
x_tables               20316  11 xt_mark,xt_socket,ipt_REDIRECT,xt_tcpudp,ebt_redirect,ebt_ip,ebtables,iptable_mangle,iptable_nat,iptable_filter,ip_tables
dm_mirror              11724  0
dm_multipath           14772  0
scsi_dh                 5994  1 dm_multipath
video                  21310  0
output                  2103  1 video
sbs                    11378  0
sbshc                   4115  1 sbs
battery                10902  0
acpi_memhotplug         4135  0
ac                      3274  0
parport_pc             21355  0
lp                      9491  0
parport                33290  2 parport_pc,lp
option                 16045  0
usb_wwan               10222  1 option
usbserial              34477  2 option,usb_wwan
serio_raw               4064  0
tpm_tis                 9203  0
tpm                    14317  1 tpm_tis
tpm_bios                5252  1 tpm
rtc_cmos                8731  0
rtc_core               14080  1 rtc_cmos
rtc_lib                 2497  1 rtc_core
button                  5662  0
igb                   131680  0
shpchp                 29302  0
pcspkr                  1822  0
dm_region_hash          9574  1 dm_mirror
dm_log                  8359  2 dm_mirror,dm_region_hash
usb_storage            45133  0
ata_piix               22147  0
libata                169650  1 ata_piix
cciss                  88474  24
sd_mod                 28117  0
scsi_mod              156163  5 scsi_dh,usb_storage,libata,cciss,sd_mod
ext3                  114308  12
jbd                    43368  1 ext3
uhci_hcd               18941  0
ohci_hcd               20027  0
ehci_hcd               33605  0

# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 18:A9:05:41:CC:CE
          inet addr:119.46.110.197  Bcast:119.46.110.255  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4872707550 errors:0 dropped:1177767 overruns:1177767 frame:0
          TX packets:5066061004 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3719046973104 (3.3 TiB)  TX bytes:4237588228875 (3.8 TiB)

eth0:1    Link encap:Ethernet  HWaddr 18:A9:05:41:CC:CE
          inet addr:119.46.110.249  Bcast:119.46.110.255  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

--
Tomasz Chmielewski
http://wpkg.org
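Tomasz's answer amounts to classifying each address the way inet_addr_type() would on this box. As a rough illustration only, the toy classifier below hard-codes the two local addresses and the subnet broadcast address from the ifconfig output above. It is an assumption-laden approximation, not the kernel's implementation (which consults the local routing table), and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

enum addr_type { ADDR_INVALID, ADDR_UNICAST, ADDR_LOCAL, ADDR_BROADCAST, ADDR_MULTICAST };

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Addresses taken from the ifconfig output (eth0 and eth0:1). */
static const uint32_t local_addrs[] = { IPV4(119, 46, 110, 197), IPV4(119, 46, 110, 249) };
static const uint32_t subnet_bcast = IPV4(119, 46, 110, 255);

/* Rough, hard-coded approximation of inet_addr_type() for this one box. */
static enum addr_type toy_addr_type(const char *dotted)
{
	struct in_addr in;
	uint32_t a;
	size_t i;

	assert(dotted != NULL);
	if (inet_pton(AF_INET, dotted, &in) != 1)
		return ADDR_INVALID;
	a = ntohl(in.s_addr);

	if (a == 0xffffffffu || a == subnet_bcast)
		return ADDR_BROADCAST;
	if ((a & 0xf0000000u) == 0xe0000000u) /* 224.0.0.0/4 */
		return ADDR_MULTICAST;
	for (i = 0; i < sizeof(local_addrs) / sizeof(local_addrs[0]); i++)
		if (a == local_addrs[i])
			return ADDR_LOCAL;
	return ADDR_UNICAST;
}
```

Under these assumptions, 124.121.155.197 classifies as plain unicast while 119.46.110.197 is local, matching the two sides of the logged ip_rt_bug lines.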
* Re: what's causing "ip_rt_bug"?
From: Julian Anastasov @ 2011-06-18 17:53 UTC
To: Tomasz Chmielewski; Cc: Eric Dumazet, netdev, Balazs Scheidler, KOVACS Krisztian

	Hello,

	CC-ing tproxy developers for more ideas...

On Sat, 18 Jun 2011, Tomasz Chmielewski wrote:

> > 	Is transparent proxy used?
>
> Yes, it is.
>
> > 	Same route for 58.185.117.28 2nd time? Is that possible?:
>
> Not a second time; the addresses are similar but different: 58.185.117.18, 58.185.117.29, 58.185.117.28. Unless there's something I don't see! ;)

	Oops, my fault :)

> > [...] What kind of netfilter modules are
> > used? nf_queue, -j REJECT, NAT? Is 124.121.155.197 a local
> > address?
>
> No, it's not local.
> With "ip_rt_bug: 124.121.184.77 -> 119.46.110.197, ?" lines, only the address on the right side is local.

	Hm, if it happens "sometimes", can it be some
problem with tproxy and TIME_WAIT sockets? I see that
tproxy_sk_is_transparent has special treatment for TW
sockets while ip_route_me_harder is different. As a result,
maybe an input route is assigned for TW packets.

	Maybe inet_sk_flowi_flags() needs fixing, not
sure. But the following patch is a first step to fix this
problem. I don't have a setup to test this patch.

===========================================================

	Avoid creating input routes with ip_route_me_harder.
It does not work for locally generated packets. Instead,
restrict sockets to provide valid saddr for output route (or
unicast saddr for transparent proxy). For other traffic
allow saddr to be unicast or local but if callers forget
to check saddr type use 0 for the output route.

	The resulting handling should be:

- REJECT TCP:
	- in INPUT we can provide addr_type = RTN_LOCAL but
	better allow rejecting traffic delivered with
	local route (no IP address => use RTN_UNSPEC to
	allow also RTN_UNICAST).
	- FORWARD: RTN_UNSPEC => allow RTN_LOCAL/RTN_UNICAST
	saddr, add fix to ignore RTN_BROADCAST and RTN_MULTICAST
	- OUTPUT: RTN_UNSPEC

- NAT, mangle, ip_queue, nf_ip_reroute: RTN_UNSPEC in LOCAL_OUT

- IPVS:
	- use RTN_LOCAL in LOCAL_OUT and FORWARD after SNAT
	to restrict saddr to be local

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---

diff -urp v2.6.39/linux/net/ipv4/netfilter/ipt_REJECT.c linux/net/ipv4/netfilter/ipt_REJECT.c
--- v2.6.39/linux/net/ipv4/netfilter/ipt_REJECT.c	2011-03-20 10:55:56.000000000 +0200
+++ linux/net/ipv4/netfilter/ipt_REJECT.c	2011-06-18 18:22:40.713189957 +0300
@@ -40,7 +40,6 @@ static void send_reset(struct sk_buff *o
 	struct iphdr *niph;
 	const struct tcphdr *oth;
 	struct tcphdr _otcph, *tcph;
-	unsigned int addr_type;
 
 	/* IP header checks: fragment. */
 	if (ip_hdr(oldskb)->frag_off & htons(IP_OFFSET))
@@ -55,6 +54,9 @@ static void send_reset(struct sk_buff *o
 	if (oth->rst)
 		return;
 
+	if (skb_rtable(oldskb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
+		return;
+
 	/* Check checksum */
 	if (nf_ip_checksum(oldskb, hook, ip_hdrlen(oldskb), IPPROTO_TCP))
 		return;
@@ -101,19 +103,11 @@ static void send_reset(struct sk_buff *o
 	nskb->csum_start = (unsigned char *)tcph - nskb->head;
 	nskb->csum_offset = offsetof(struct tcphdr, check);
 
-	addr_type = RTN_UNSPEC;
-	if (hook != NF_INET_FORWARD
-#ifdef CONFIG_BRIDGE_NETFILTER
-	    || (nskb->nf_bridge && nskb->nf_bridge->mask & BRNF_BRIDGED)
-#endif
-	   )
-		addr_type = RTN_LOCAL;
-
 	/* ip_route_me_harder expects skb->dst to be set */
 	skb_dst_set_noref(nskb, skb_dst(oldskb));
 
 	nskb->protocol = htons(ETH_P_IP);
-	if (ip_route_me_harder(nskb, addr_type))
+	if (ip_route_me_harder(nskb, RTN_UNSPEC))
 		goto free_nskb;
 
 	niph->ttl = ip4_dst_hoplimit(skb_dst(nskb));
diff -urp v2.6.39/linux/net/ipv4/netfilter.c linux/net/ipv4/netfilter.c
--- v2.6.39/linux/net/ipv4/netfilter.c	2011-05-20 10:38:08.000000000 +0300
+++ linux/net/ipv4/netfilter.c	2011-06-18 19:13:39.299189310 +0300
@@ -17,51 +17,35 @@ int ip_route_me_harder(struct sk_buff *s
 	const struct iphdr *iph = ip_hdr(skb);
 	struct rtable *rt;
 	struct flowi4 fl4 = {};
-	unsigned long orefdst;
+	__be32 saddr = iph->saddr;
+	__u8 flags = 0;
 	unsigned int hh_len;
-	unsigned int type;
 
-	type = inet_addr_type(net, iph->saddr);
-	if (skb->sk && inet_sk(skb->sk)->transparent)
-		type = RTN_LOCAL;
-	if (addr_type == RTN_UNSPEC)
-		addr_type = type;
+	if (!skb->sk && addr_type != RTN_LOCAL) {
+		if (addr_type == RTN_UNSPEC)
+			addr_type = inet_addr_type(net, saddr);
+		if (addr_type == RTN_LOCAL || addr_type == RTN_UNICAST)
+			flags |= FLOWI_FLAG_ANYSRC;
+		else
+			saddr = 0;
+	}
 
 	/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
 	 * packets with foreign saddr to appear on the NF_INET_LOCAL_OUT hook.
 	 */
-	if (addr_type == RTN_LOCAL) {
-		fl4.daddr = iph->daddr;
-		if (type == RTN_LOCAL)
-			fl4.saddr = iph->saddr;
-		fl4.flowi4_tos = RT_TOS(iph->tos);
-		fl4.flowi4_oif = skb->sk ? skb->sk->sk_bound_dev_if : 0;
-		fl4.flowi4_mark = skb->mark;
-		fl4.flowi4_flags = skb->sk ? inet_sk_flowi_flags(skb->sk) : 0;
-		rt = ip_route_output_key(net, &fl4);
-		if (IS_ERR(rt))
-			return -1;
-
-		/* Drop old route. */
-		skb_dst_drop(skb);
-		skb_dst_set(skb, &rt->dst);
-	} else {
-		/* non-local src, find valid iif to satisfy
-		 * rp-filter when calling ip_route_input. */
-		fl4.daddr = iph->saddr;
-		rt = ip_route_output_key(net, &fl4);
-		if (IS_ERR(rt))
-			return -1;
+	fl4.daddr = iph->daddr;
+	fl4.saddr = saddr;
+	fl4.flowi4_tos = RT_TOS(iph->tos);
+	fl4.flowi4_oif = skb->sk ? skb->sk->sk_bound_dev_if : 0;
+	fl4.flowi4_mark = skb->mark;
+	fl4.flowi4_flags = skb->sk ? inet_sk_flowi_flags(skb->sk) : flags;
+	rt = ip_route_output_key(net, &fl4);
+	if (IS_ERR(rt))
+		return -1;
 
-		orefdst = skb->_skb_refdst;
-		if (ip_route_input(skb, iph->daddr, iph->saddr,
-				   RT_TOS(iph->tos), rt->dst.dev) != 0) {
-			dst_release(&rt->dst);
-			return -1;
-		}
-		dst_release(&rt->dst);
-		refdst_drop(orefdst);
-	}
+	/* Drop old route. */
+	skb_dst_drop(skb);
+	skb_dst_set(skb, &rt->dst);
 
 	if (skb_dst(skb)->error)
 		return -1;

=================================================================

> [... quoted iptables, lsmod and ifconfig output snipped ...]

Regards

--
Julian Anastasov <ja@ssi.bg>
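The new key setup in the netfilter.c hunk of this patch can be captured in a small userspace model. In this hypothetical sketch, MODEL_FLAG_ANYSRC stands in for FLOWI_FLAG_ANYSRC and the enum for the RTN_* types; nothing here is kernel API. The point it illustrates is that every path now ends in an output-route lookup, so a socket-less packet with a foreign unicast saddr keeps that saddr and is routed with the any-source flag instead of receiving an input route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's RTN_* route types (values are arbitrary). */
enum addr_type { ADDR_UNSPEC, ADDR_UNICAST, ADDR_LOCAL, ADDR_BROADCAST, ADDR_MULTICAST };

/* Hypothetical stand-in for FLOWI_FLAG_ANYSRC. */
#define MODEL_FLAG_ANYSRC 0x1u

/* The parts of the output-route key that the patch changes. */
struct route_key {
	uint32_t saddr; /* host-order source address put into fl4.saddr */
	unsigned flags; /* value put into fl4.flowi4_flags */
};

/*
 * Userspace model of the post-patch key setup in ip_route_me_harder():
 *   saddr      - iph->saddr
 *   saddr_type - what inet_addr_type() would report for saddr
 *   has_sk     - whether skb->sk is set
 *   sk_flags   - what inet_sk_flowi_flags(skb->sk) would return
 *   addr_type  - the caller's argument
 * Every path ends in an output-route lookup; ip_route_input() is gone.
 */
static struct route_key new_route_key(uint32_t saddr, enum addr_type saddr_type,
				      bool has_sk, unsigned sk_flags,
				      enum addr_type addr_type)
{
	struct route_key key = { saddr, 0 };
	unsigned flags = 0;

	assert(has_sk || sk_flags == 0);

	if (!has_sk && addr_type != ADDR_LOCAL) {
		if (addr_type == ADDR_UNSPEC)
			addr_type = saddr_type;
		if (addr_type == ADDR_LOCAL || addr_type == ADDR_UNICAST)
			flags |= MODEL_FLAG_ANYSRC; /* keep a foreign unicast saddr */
		else
			key.saddr = 0; /* broadcast/multicast saddr: let routing choose */
	}
	key.flags = has_sk ? sk_flags : flags;
	return key;
}
```

For the packet in the crash log (saddr 124.121.155.197, no socket, RTN_UNSPEC), the model keeps the saddr and sets the any-source flag, so no input route, and therefore no ip_rt_bug() output handler, is involved.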
* Re: what's causing "ip_rt_bug"?
From: David Miller @ 2011-06-28 3:55 UTC
To: ja; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden

From: Julian Anastasov <ja@ssi.bg>
Date: Sat, 18 Jun 2011 20:53:59 +0300 (EEST)

> 	Hm, if it happens "sometimes", can it be some
> problem with tproxy and TIME_WAIT sockets? I see that
> tproxy_sk_is_transparent has special treatment for TW
> sockets while ip_route_me_harder is different. As a result,
> maybe an input route is assigned for TW packets.
>
> 	Maybe inet_sk_flowi_flags() needs fixing, not
> sure. But the following patch is a first step to fix this
> problem. I don't have a setup to test this patch.

TPROXY has special code to make sure that time-wait sockets
are not assigned to skb->sk, as explained in commit
d503b30bd648b3cb4e5f50b65d27e389960cc6d9, that would cause
all kinds of crashes in nfnetlink_log etc.

Therefore we would see skb->sk==NULL at ip_route_me_harder()
in that case.

> ===========================================================
>
> 	Avoid creating input routes with ip_route_me_harder.
> [... rest of the patch description snipped ...]
>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Unless someone gives some negative feedback soon I'm going to
apply this.
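David's point reduces to one rule, sketched below as a hypothetical userspace model. The struct and function names are invented, reference counting and skb_orphan() are omitted, and the model makes no claim about where in the TPROXY code the check lives. A socket found in TIME_WAIT is not attached to skb->sk, so later code, including ip_route_me_harder(), sees skb->sk == NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified socket and skb models. */
enum sk_state_model { SK_ESTABLISHED, SK_TIME_WAIT };

struct sock_model {
	enum sk_state_model state;
};

struct skb_model {
	struct sock_model *sk; /* NULL when no socket is attached */
};

/* Attach a looked-up socket to the skb unless it is in TIME_WAIT.
 * Reference counting and skb_orphan() are deliberately left out. */
static void assign_sock_model(struct skb_model *skb, struct sock_model *sk)
{
	assert(skb != NULL && sk != NULL);
	if (sk->state == SK_TIME_WAIT)
		return; /* skb->sk stays as it was (NULL for a received packet) */
	skb->sk = sk;
}
```

In this model a TIME_WAIT match leaves skb->sk NULL, which is exactly the socket-less case that Julian's patch now routes with an output lookup.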
* Re: what's causing "ip_rt_bug"?
From: Julian Anastasov @ 2011-06-28 8:13 UTC
To: David Miller; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden

	Hello,

On Mon, 27 Jun 2011, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Sat, 18 Jun 2011 20:53:59 +0300 (EEST)
>
> > 	Hm, if it happens "sometimes", can it be some
> > problem with tproxy and TIME_WAIT sockets?
> > [...]
>
> TPROXY has special code to make sure that time-wait sockets
> are not assigned to skb->sk, as explained in commit
> d503b30bd648b3cb4e5f50b65d27e389960cc6d9, that would cause
> all kinds of crashes in nfnetlink_log etc.
>
> Therefore we would see skb->sk==NULL at ip_route_me_harder()
> in that case.

	Aha, after this clarification other changes should not
be needed. If saddr is translated, we will now use
FLOWI_FLAG_ANYSRC. As a result, if SNAT happens one day in
LOCAL_OUT, the new saddr can be unicast because RTN_UNSPEC
is provided for addr_type. If saddr is not changed, it should
already have been validated when the first route lookup for
the skb was performed, so TPROXY should work.

> > 	Avoid creating input routes with ip_route_me_harder.
> > [... rest of the patch description snipped ...]
> >
> > Signed-off-by: Julian Anastasov <ja@ssi.bg>
>
> Unless someone gives some negative feedback soon I'm going to
> apply this.

Regards

--
Julian Anastasov <ja@ssi.bg>
* Re: what's causing "ip_rt_bug"?
From: David Miller @ 2011-06-28 8:41 UTC
To: ja; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden

From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 28 Jun 2011 11:13:25 +0300 (EEST)

> On Mon, 27 Jun 2011, David Miller wrote:
>
>> TPROXY has special code to make sure that time-wait sockets
>> are not assigned to skb->sk, as explained in commit
>> d503b30bd648b3cb4e5f50b65d27e389960cc6d9, that would cause
>> all kinds of crashes in nfnetlink_log etc.
>>
>> Therefore we would see skb->sk==NULL at ip_route_me_harder()
>> in that case.
>
> 	Aha, after this clarification other changes should not
> be needed.

By this do you mean that you think your patch in this thread
is completely sufficient?
* Re: what's causing "ip_rt_bug"?
From: Julian Anastasov @ 2011-06-28 9:05 UTC
To: David Miller; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden

	Hello,

On Tue, 28 Jun 2011, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Tue, 28 Jun 2011 11:13:25 +0300 (EEST)
>
> > On Mon, 27 Jun 2011, David Miller wrote:
> > [...]
> >
> > 	Aha, after this clarification other changes should not
> > be needed.
>
> By this do you mean that you think your patch in this thread
> is completely sufficient?

	Yes. My worry was that the skb->sk != NULL case is not
handled by inet_sk_flowi_flags() for TW sockets. But it seems
that is not needed, so the patch in this form should be OK.

Regards

--
Julian Anastasov <ja@ssi.bg>
* Re: what's causing "ip_rt_bug"?
From: Tomasz Chmielewski @ 2011-06-28 8:30 UTC
To: David Miller; +Cc: ja, eric.dumazet, netdev, bazsi, hidden

On 28.06.2011 05:55, David Miller wrote:
>> The resulting handling should be:
>> [...]
>> Signed-off-by: Julian Anastasov <ja@ssi.bg>
>
> Unless someone gives some negative feedback soon I'm going to
> apply this.

Can you tell me where it will be pushed?

I.e. 3.x kernels only, or does it have a chance to go into 2.6.39.x?

--
Tomasz Chmielewski
http://wpkg.org
* Re: what's causing "ip_rt_bug"?
From: David Miller @ 2011-06-28 8:40 UTC
To: mangoo; +Cc: ja, eric.dumazet, netdev, bazsi, hidden

From: Tomasz Chmielewski <mangoo@wpkg.org>
Date: Tue, 28 Jun 2011 10:30:11 +0200

> Can you tell me where it will be pushed?
>
> I.e. 3.x kernels only, or does it have a chance to go into 2.6.39.x?

I'll apply it for 3.0.0 and also queue it up for -stable.