* what's causing "ip_rt_bug"?
@ 2011-06-17 20:00 Tomasz Chmielewski
2011-06-17 20:36 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Tomasz Chmielewski @ 2011-06-17 20:00 UTC (permalink / raw)
To: netdev
I have a system pushing around 800 Mbit/s, ~130 kpps.
It uses 2.6.35.12 kernel.
Several times a minute, I can see entries like (a.b.c.d - IP of this system):
Jun 18 02:39:19 KOR-SV22 kernel: [37187.665951] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
Jun 18 02:39:19 KOR-SV22 kernel: [37187.685419] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
Jun 18 02:40:31 KOR-SV22 kernel: [37259.199315] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
Jun 18 02:40:36 KOR-SV22 kernel: [37263.828000] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
Jun 18 02:44:16 KOR-SV22 kernel: [37484.120689] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
Jun 18 02:44:19 KOR-SV22 kernel: [37487.114357] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
What may be causing this? Is it "dangerous"?
--
Tomasz Chmielewski
http://wpkg.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-17 20:00 what's causing "ip_rt_bug"? Tomasz Chmielewski
@ 2011-06-17 20:36 ` Eric Dumazet
2011-06-17 21:37 ` Tomasz Chmielewski
0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2011-06-17 20:36 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: netdev
Le vendredi 17 juin 2011 à 22:00 +0200, Tomasz Chmielewski a écrit :
> I have a system pushing around 800 Mbit/s, ~130 kpps.
>
> It uses 2.6.35.12 kernel.
>
>
> Several times a minute, I can see entries like (a.b.c.d - IP of this system):
>
> Jun 18 02:39:19 KOR-SV22 kernel: [37187.665951] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
> Jun 18 02:39:19 KOR-SV22 kernel: [37187.685419] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
> Jun 18 02:40:31 KOR-SV22 kernel: [37259.199315] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
> Jun 18 02:40:36 KOR-SV22 kernel: [37263.828000] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
> Jun 18 02:44:16 KOR-SV22 kernel: [37484.120689] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
> Jun 18 02:44:19 KOR-SV22 kernel: [37487.114357] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
>
Hi
What your routing table looks like ? (ip ro)
You also could backport this patch so that we can catch where/why this
happens
commit c378a9c019cf5e017d1ed24954b54fae7bebd2bc
Author: Dave Jones <davej@redhat.com>
Date: Sat May 21 07:16:42 2011 +0000
ipv4: Give backtrace in ip_rt_bug().
Add a stack backtrace to the ip_rt_bug path for debugging
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b24d58e..52b0b95 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1665,6 +1665,7 @@ static int ip_rt_bug(struct sk_buff *skb)
&ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr,
skb->dev ? skb->dev->name : "?");
kfree_skb(skb);
+ WARN_ON(1);
return 0;
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-17 20:36 ` Eric Dumazet
@ 2011-06-17 21:37 ` Tomasz Chmielewski
2011-06-17 23:56 ` Julian Anastasov
0 siblings, 1 reply; 12+ messages in thread
From: Tomasz Chmielewski @ 2011-06-17 21:37 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On 17.06.2011 22:36, Eric Dumazet wrote:
> Le vendredi 17 juin 2011 à 22:00 +0200, Tomasz Chmielewski a écrit :
>> I have a system pushing around 800 Mbit/s, ~130 kpps.
>>
>> It uses 2.6.35.12 kernel.
>>
>>
>> Several times a minute, I can see entries like (a.b.c.d - IP of this system):
>>
>> Jun 18 02:39:19 KOR-SV22 kernel: [37187.665951] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
>> Jun 18 02:39:19 KOR-SV22 kernel: [37187.685419] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
>> Jun 18 02:40:31 KOR-SV22 kernel: [37259.199315] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
>> Jun 18 02:40:36 KOR-SV22 kernel: [37263.828000] ip_rt_bug: 124.x.x.x -> a.b.c.d, ?
>> Jun 18 02:44:16 KOR-SV22 kernel: [37484.120689] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
>> Jun 18 02:44:19 KOR-SV22 kernel: [37487.114357] ip_rt_bug: 110.x.x.x -> a.b.c.d, ?
>>
>
> Hi
>
> What your routing table looks like ? (ip ro)
It's just a proxy, no special routing set:
# ip ro
58.185.117.18 via 119.46.110.193 dev eth0
119.46.240.13 via 119.46.110.193 dev eth0
58.185.117.29 via 119.46.110.193 dev eth0
119.46.241.13 via 119.46.110.193 dev eth0
58.185.117.28 via 119.46.110.193 dev eth0
119.46.110.192/26 dev eth0 proto kernel scope link src 119.46.110.197
169.254.0.0/16 dev eth0 scope link
default via 119.46.110.195 dev eth0
The box is also crashing every few days; and I really had no clue why (just connected a serial console to catch any new oops/panic).
The last time it crashed, I have this entry in syslog:
Jun 17 16:16:17 TRUE-SC02 kernel: [172488.602629] ip_rt_bug: 124.121.155.197 -> 119.46.110.197, ?
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239041] BUG: unable to handle kernel NULL pointer dereference at (null)
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239409] IP: [<ffffffff81361cae>] dev_queue_xmit+0x1e3/0x441
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239760] PGD 43c30b067 PUD 439e63067 PMD 0
Jun 17 16:17:00 TRUE-SC02 kernel: [172531.240103] Oops: 0000 [#1] SMP
Jun 17 16:19:58 TRUE-SC02 syslogd 1.4.1: restart.
Right now, it uses the newest igb driver, and I started seeing "Out of socket memory" quite a bit (didn't have it with the original igb driver from 2.6.35.12).
So I doubled this value to be:
net.ipv4.tcp_max_orphans = 256000
and "Out of socket memory" stopped showing up.
Instead, "ip_rt_bug" shows up.
--
Tomasz Chmielewski
http://wpkg.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-17 21:37 ` Tomasz Chmielewski
@ 2011-06-17 23:56 ` Julian Anastasov
2011-06-18 8:31 ` Tomasz Chmielewski
0 siblings, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2011-06-17 23:56 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: Eric Dumazet, netdev
Hello,
On Fri, 17 Jun 2011, Tomasz Chmielewski wrote:
> > What your routing table looks like ? (ip ro)
>
> It's just a proxy, no special routing set:
Is transparent proxy used?
> # ip ro
> 58.185.117.18 via 119.46.110.193 dev eth0
> 119.46.240.13 via 119.46.110.193 dev eth0
> 58.185.117.29 via 119.46.110.193 dev eth0
> 119.46.241.13 via 119.46.110.193 dev eth0
Same route for 58.185.117.28 2nd time? Is that possible?:
> 58.185.117.28 via 119.46.110.193 dev eth0
> 119.46.110.192/26 dev eth0 proto kernel scope link src 119.46.110.197
> 169.254.0.0/16 dev eth0 scope link
> default via 119.46.110.195 dev eth0
>
>
> The box is also crashing every few days; and I really had no clue why (just connected a serial console to catch any new oops/panic).
>
>
> The last time it crashed, I have this entry in syslog:
>
> Jun 17 16:16:17 TRUE-SC02 kernel: [172488.602629] ip_rt_bug: 124.121.155.197 -> 119.46.110.197, ?
The ip_rt_bug messages show that skb->dev is
NULL (OUTPUT hook), daddr in IP header is local address,
may be some original received packet. If such packet is
provided to ip_route_me_harder(skb, RTN_UNSPEC) an
ip_route_input call can happen. Calling later dst_output
should lead to this warning. The question is what can
cause received packet to appear in OUTPUT hook where
a change in mark or TOS can can trigger such ip_route_input
call. What kind of netfilter modules are used? nf_queue,
-j REJECT, NAT? Is 124.121.155.197 a local address?
> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239041] BUG: unable to handle kernel NULL pointer dereference at (null)
> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239409] IP: [<ffffffff81361cae>] dev_queue_xmit+0x1e3/0x441
> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.239760] PGD 43c30b067 PUD 439e63067 PMD 0
> Jun 17 16:17:00 TRUE-SC02 kernel: [172531.240103] Oops: 0000 [#1] SMP
> Jun 17 16:19:58 TRUE-SC02 syslogd 1.4.1: restart.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-17 23:56 ` Julian Anastasov
@ 2011-06-18 8:31 ` Tomasz Chmielewski
2011-06-18 17:53 ` Julian Anastasov
0 siblings, 1 reply; 12+ messages in thread
From: Tomasz Chmielewski @ 2011-06-18 8:31 UTC (permalink / raw)
To: Julian Anastasov; +Cc: Eric Dumazet, netdev
On 18.06.2011 01:56, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 17 Jun 2011, Tomasz Chmielewski wrote:
>
>>> What your routing table looks like ? (ip ro)
>>
>> It's just a proxy, no special routing set:
>
> Is transparent proxy used?
Yes, it is.
>> # ip ro
>> 58.185.117.18 via 119.46.110.193 dev eth0
>> 119.46.240.13 via 119.46.110.193 dev eth0
>> 58.185.117.29 via 119.46.110.193 dev eth0
>> 119.46.241.13 via 119.46.110.193 dev eth0
>
> Same route for 58.185.117.28 2nd time? Is that possible?:
Not second time, the addresses are similar, but different: 58.185.117.18, 58.185.117.29, 58.185.117.28. Unless there's something I don't see! ;)
>> 58.185.117.28 via 119.46.110.193 dev eth0
>> 119.46.110.192/26 dev eth0 proto kernel scope link src 119.46.110.197
>> 169.254.0.0/16 dev eth0 scope link
>> default via 119.46.110.195 dev eth0
>>
>>
>> The box is also crashing every few days; and I really had no clue why (just connected a serial console to catch any new oops/panic).
>>
>>
>> The last time it crashed, I have this entry in syslog:
>>
>> Jun 17 16:16:17 TRUE-SC02 kernel: [172488.602629] ip_rt_bug: 124.121.155.197 -> 119.46.110.197, ?
>
> The ip_rt_bug messages show that skb->dev is
> NULL (OUTPUT hook), daddr in IP header is local address,
> may be some original received packet. If such packet is
> provided to ip_route_me_harder(skb, RTN_UNSPEC) an
> ip_route_input call can happen. Calling later dst_output
> should lead to this warning. The question is what can
> cause received packet to appear in OUTPUT hook where
> a change in mark or TOS can can trigger such ip_route_input
> call. What kind of netfilter modules are used? nf_queue,
> -j REJECT, NAT? Is 124.121.155.197 a local address?
No, it's not local.
With "ip_rt_bug: 124.121.184.77 -> 119.46.110.197, ?" lines, only the address on the right side is local.
# iptables -L -t nat -n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
REDIRECT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 redir ports 8080
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
# iptables -L -t mangle -n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DIVERT tcp -- 0.0.0.0/0 0.0.0.0/0 socket
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
Chain DIVERT (1 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xset 0x1/0xffffffff
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
# lsmod
Module Size Used by
xt_mark 1171 1
xt_socket 1922 1
nf_tproxy_core 1752 1 xt_socket,[permanent]
ipt_REDIRECT 1093 1
xt_tcpudp 2331 1
ebt_redirect 1234 1
ebt_ip 1562 1
ebtable_broute 1395 1
bridge 64647 1 ebtable_broute
stp 1931 1 bridge
llc 5071 2 bridge,stp
ebtables 20458 1 ebtable_broute
iptable_mangle 1351 1
iptable_nat 3644 1
nf_nat 16977 2 ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4 11077 3 iptable_nat,nf_nat
nf_defrag_ipv4 1337 2 xt_socket,nf_conntrack_ipv4
i2c_dev 4561 0
i2c_core 21774 1 i2c_dev
nf_conntrack_netbios_ns 1486 0
nf_conntrack 65085 5 xt_socket,iptable_nat,nf_nat,nf_conntrack_ipv4,nf_conntrack_netbios_ns
iptable_filter 1402 0
ip_tables 14931 3 iptable_mangle,iptable_nat,iptable_filter
x_tables 20316 11 xt_mark,xt_socket,ipt_REDIRECT,xt_tcpudp,ebt_redirect,ebt_ip,ebtables,iptable_mangle,iptable_nat,iptable_filter,ip_tables
dm_mirror 11724 0
dm_multipath 14772 0
scsi_dh 5994 1 dm_multipath
video 21310 0
output 2103 1 video
sbs 11378 0
sbshc 4115 1 sbs
battery 10902 0
acpi_memhotplug 4135 0
ac 3274 0
parport_pc 21355 0
lp 9491 0
parport 33290 2 parport_pc,lp
option 16045 0
usb_wwan 10222 1 option
usbserial 34477 2 option,usb_wwan
serio_raw 4064 0
tpm_tis 9203 0
tpm 14317 1 tpm_tis
tpm_bios 5252 1 tpm
rtc_cmos 8731 0
rtc_core 14080 1 rtc_cmos
rtc_lib 2497 1 rtc_core
button 5662 0
igb 131680 0
shpchp 29302 0
pcspkr 1822 0
dm_region_hash 9574 1 dm_mirror
dm_log 8359 2 dm_mirror,dm_region_hash
usb_storage 45133 0
ata_piix 22147 0
libata 169650 1 ata_piix
cciss 88474 24
sd_mod 28117 0
scsi_mod 156163 5 scsi_dh,usb_storage,libata,cciss,sd_mod
ext3 114308 12
jbd 43368 1 ext3
uhci_hcd 18941 0
ohci_hcd 20027 0
ehci_hcd 33605 0
# ifconfig -a
eth0 Link encap:Ethernet HWaddr 18:A9:05:41:CC:CE
inet addr:119.46.110.197 Bcast:119.46.110.255 Mask:255.255.255.192
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4872707550 errors:0 dropped:1177767 overruns:1177767 frame:0
TX packets:5066061004 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3719046973104 (3.3 TiB) TX bytes:4237588228875 (3.8 TiB)
eth0:1 Link encap:Ethernet HWaddr 18:A9:05:41:CC:CE
inet addr:119.46.110.249 Bcast:119.46.110.255 Mask:255.255.255.192
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
--
Tomasz Chmielewski
http://wpkg.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-18 8:31 ` Tomasz Chmielewski
@ 2011-06-18 17:53 ` Julian Anastasov
2011-06-28 3:55 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2011-06-18 17:53 UTC (permalink / raw)
To: Tomasz Chmielewski
Cc: Eric Dumazet, netdev, Balazs Scheidler, KOVACS Krisztian
Hello,
CC-ing tproxy developers for more ideas...
On Sat, 18 Jun 2011, Tomasz Chmielewski wrote:
> >> It's just a proxy, no special routing set:
> >
> > Is transparent proxy used?
>
> Yes, it is.
>
>
> >> # ip ro
> >> 58.185.117.18 via 119.46.110.193 dev eth0
> >> 119.46.240.13 via 119.46.110.193 dev eth0
> >> 58.185.117.29 via 119.46.110.193 dev eth0
> >> 119.46.241.13 via 119.46.110.193 dev eth0
> >
> > Same route for 58.185.117.28 2nd time? Is that possible?:
>
> Not second time, the addresses are similar, but different: 58.185.117.18, 58.185.117.29, 58.185.117.28. Unless there's something I don't see! ;)
Ops, my fault :)
> >> 58.185.117.28 via 119.46.110.193 dev eth0
> >> 119.46.110.192/26 dev eth0 proto kernel scope link src 119.46.110.197
> >> 169.254.0.0/16 dev eth0 scope link
> >> default via 119.46.110.195 dev eth0
> >>
> >>
> >> The box is also crashing every few days; and I really had no clue why (just connected a serial console to catch any new oops/panic).
> >>
> >>
> >> The last time it crashed, I have this entry in syslog:
> >>
> >> Jun 17 16:16:17 TRUE-SC02 kernel: [172488.602629] ip_rt_bug: 124.121.155.197 -> 119.46.110.197, ?
> >
> > The ip_rt_bug messages show that skb->dev is
> > NULL (OUTPUT hook), daddr in IP header is local address,
> > may be some original received packet. If such packet is
> > provided to ip_route_me_harder(skb, RTN_UNSPEC) an
> > ip_route_input call can happen. Calling later dst_output
> > should lead to this warning. The question is what can
> > cause received packet to appear in OUTPUT hook where
> > a change in mark or TOS can can trigger such ip_route_input
> > call. What kind of netfilter modules are used? nf_queue,
> > -j REJECT, NAT? Is 124.121.155.197 a local address?
>
> No, it's not local.
> With "ip_rt_bug: 124.121.184.77 -> 119.46.110.197, ?" lines, only the address on the right side is local.
Hm, if it happens "sometimes", can it be some
problem with tproxy and TIME_WAIT sockets? I see that
tproxy_sk_is_transparent has special treatment for TW
sockets while ip_route_me_harder is different. As result,
may be input route is assigned for TW packets.
May be inet_sk_flowi_flags() needs fixing, not
sure. But following patch is first step to fix this
problem. I don't have setup to test this patch.
===========================================================
Avoid creating input routes with ip_route_me_harder.
It does not work for locally generated packets. Instead,
restrict sockets to provide valid saddr for output route (or
unicast saddr for transparent proxy). For other traffic
allow saddr to be unicast or local but if callers forget
to check saddr type use 0 for the output route.
The resulting handling should be:
- REJECT TCP:
- in INPUT we can provide addr_type = RTN_LOCAL but
better allow rejecting traffic delivered with
local route (no IP address => use RTN_UNSPEC to
allow also RTN_UNICAST).
- FORWARD: RTN_UNSPEC => allow RTN_LOCAL/RTN_UNICAST
saddr, add fix to ignore RTN_BROADCAST and RTN_MULTICAST
- OUTPUT: RTN_UNSPEC
- NAT, mangle, ip_queue, nf_ip_reroute: RTN_UNSPEC in LOCAL_OUT
- IPVS:
- use RTN_LOCAL in LOCAL_OUT and FORWARD after SNAT
to restrict saddr to be local
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
diff -urp v2.6.39/linux/net/ipv4/netfilter/ipt_REJECT.c linux/net/ipv4/netfilter/ipt_REJECT.c
--- v2.6.39/linux/net/ipv4/netfilter/ipt_REJECT.c 2011-03-20 10:55:56.000000000 +0200
+++ linux/net/ipv4/netfilter/ipt_REJECT.c 2011-06-18 18:22:40.713189957 +0300
@@ -40,7 +40,6 @@ static void send_reset(struct sk_buff *o
struct iphdr *niph;
const struct tcphdr *oth;
struct tcphdr _otcph, *tcph;
- unsigned int addr_type;
/* IP header checks: fragment. */
if (ip_hdr(oldskb)->frag_off & htons(IP_OFFSET))
@@ -55,6 +54,9 @@ static void send_reset(struct sk_buff *o
if (oth->rst)
return;
+ if (skb_rtable(oldskb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
+ return;
+
/* Check checksum */
if (nf_ip_checksum(oldskb, hook, ip_hdrlen(oldskb), IPPROTO_TCP))
return;
@@ -101,19 +103,11 @@ static void send_reset(struct sk_buff *o
nskb->csum_start = (unsigned char *)tcph - nskb->head;
nskb->csum_offset = offsetof(struct tcphdr, check);
- addr_type = RTN_UNSPEC;
- if (hook != NF_INET_FORWARD
-#ifdef CONFIG_BRIDGE_NETFILTER
- || (nskb->nf_bridge && nskb->nf_bridge->mask & BRNF_BRIDGED)
-#endif
- )
- addr_type = RTN_LOCAL;
-
/* ip_route_me_harder expects skb->dst to be set */
skb_dst_set_noref(nskb, skb_dst(oldskb));
nskb->protocol = htons(ETH_P_IP);
- if (ip_route_me_harder(nskb, addr_type))
+ if (ip_route_me_harder(nskb, RTN_UNSPEC))
goto free_nskb;
niph->ttl = ip4_dst_hoplimit(skb_dst(nskb));
diff -urp v2.6.39/linux/net/ipv4/netfilter.c linux/net/ipv4/netfilter.c
--- v2.6.39/linux/net/ipv4/netfilter.c 2011-05-20 10:38:08.000000000 +0300
+++ linux/net/ipv4/netfilter.c 2011-06-18 19:13:39.299189310 +0300
@@ -17,51 +17,35 @@ int ip_route_me_harder(struct sk_buff *s
const struct iphdr *iph = ip_hdr(skb);
struct rtable *rt;
struct flowi4 fl4 = {};
- unsigned long orefdst;
+ __be32 saddr = iph->saddr;
+ __u8 flags = 0;
unsigned int hh_len;
- unsigned int type;
- type = inet_addr_type(net, iph->saddr);
- if (skb->sk && inet_sk(skb->sk)->transparent)
- type = RTN_LOCAL;
- if (addr_type == RTN_UNSPEC)
- addr_type = type;
+ if (!skb->sk && addr_type != RTN_LOCAL) {
+ if (addr_type == RTN_UNSPEC)
+ addr_type = inet_addr_type(net, saddr);
+ if (addr_type == RTN_LOCAL || addr_type == RTN_UNICAST)
+ flags |= FLOWI_FLAG_ANYSRC;
+ else
+ saddr = 0;
+ }
/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
* packets with foreign saddr to appear on the NF_INET_LOCAL_OUT hook.
*/
- if (addr_type == RTN_LOCAL) {
- fl4.daddr = iph->daddr;
- if (type == RTN_LOCAL)
- fl4.saddr = iph->saddr;
- fl4.flowi4_tos = RT_TOS(iph->tos);
- fl4.flowi4_oif = skb->sk ? skb->sk->sk_bound_dev_if : 0;
- fl4.flowi4_mark = skb->mark;
- fl4.flowi4_flags = skb->sk ? inet_sk_flowi_flags(skb->sk) : 0;
- rt = ip_route_output_key(net, &fl4);
- if (IS_ERR(rt))
- return -1;
-
- /* Drop old route. */
- skb_dst_drop(skb);
- skb_dst_set(skb, &rt->dst);
- } else {
- /* non-local src, find valid iif to satisfy
- * rp-filter when calling ip_route_input. */
- fl4.daddr = iph->saddr;
- rt = ip_route_output_key(net, &fl4);
- if (IS_ERR(rt))
- return -1;
+ fl4.daddr = iph->daddr;
+ fl4.saddr = saddr;
+ fl4.flowi4_tos = RT_TOS(iph->tos);
+ fl4.flowi4_oif = skb->sk ? skb->sk->sk_bound_dev_if : 0;
+ fl4.flowi4_mark = skb->mark;
+ fl4.flowi4_flags = skb->sk ? inet_sk_flowi_flags(skb->sk) : flags;
+ rt = ip_route_output_key(net, &fl4);
+ if (IS_ERR(rt))
+ return -1;
- orefdst = skb->_skb_refdst;
- if (ip_route_input(skb, iph->daddr, iph->saddr,
- RT_TOS(iph->tos), rt->dst.dev) != 0) {
- dst_release(&rt->dst);
- return -1;
- }
- dst_release(&rt->dst);
- refdst_drop(orefdst);
- }
+ /* Drop old route. */
+ skb_dst_drop(skb);
+ skb_dst_set(skb, &rt->dst);
if (skb_dst(skb)->error)
return -1;
=================================================================
> # iptables -L -t nat -n
> Chain PREROUTING (policy ACCEPT)
> target prot opt source destination
> REDIRECT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 redir ports 8080
>
> Chain OUTPUT (policy ACCEPT)
> target prot opt source destination
>
> Chain POSTROUTING (policy ACCEPT)
> target prot opt source destination
>
>
> # iptables -L -t mangle -n
> Chain PREROUTING (policy ACCEPT)
> target prot opt source destination
> DIVERT tcp -- 0.0.0.0/0 0.0.0.0/0 socket
>
> Chain INPUT (policy ACCEPT)
> target prot opt source destination
>
> Chain FORWARD (policy ACCEPT)
> target prot opt source destination
>
> Chain OUTPUT (policy ACCEPT)
> target prot opt source destination
>
> Chain POSTROUTING (policy ACCEPT)
> target prot opt source destination
>
> Chain DIVERT (1 references)
> target prot opt source destination
> MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xset 0x1/0xffffffff
> ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
>
>
> # lsmod
> Module Size Used by
> xt_mark 1171 1
> xt_socket 1922 1
> nf_tproxy_core 1752 1 xt_socket,[permanent]
> ipt_REDIRECT 1093 1
> xt_tcpudp 2331 1
> ebt_redirect 1234 1
> ebt_ip 1562 1
> ebtable_broute 1395 1
> bridge 64647 1 ebtable_broute
> stp 1931 1 bridge
> llc 5071 2 bridge,stp
> ebtables 20458 1 ebtable_broute
> iptable_mangle 1351 1
> iptable_nat 3644 1
> nf_nat 16977 2 ipt_REDIRECT,iptable_nat
> nf_conntrack_ipv4 11077 3 iptable_nat,nf_nat
> nf_defrag_ipv4 1337 2 xt_socket,nf_conntrack_ipv4
> i2c_dev 4561 0
> i2c_core 21774 1 i2c_dev
> nf_conntrack_netbios_ns 1486 0
> nf_conntrack 65085 5 xt_socket,iptable_nat,nf_nat,nf_conntrack_ipv4,nf_conntrack_netbios_ns
> iptable_filter 1402 0
> ip_tables 14931 3 iptable_mangle,iptable_nat,iptable_filter
> x_tables 20316 11 xt_mark,xt_socket,ipt_REDIRECT,xt_tcpudp,ebt_redirect,ebt_ip,ebtables,iptable_mangle,iptable_nat,iptable_filter,ip_tables
> dm_mirror 11724 0
> dm_multipath 14772 0
> scsi_dh 5994 1 dm_multipath
> video 21310 0
> output 2103 1 video
> sbs 11378 0
> sbshc 4115 1 sbs
> battery 10902 0
> acpi_memhotplug 4135 0
> ac 3274 0
> parport_pc 21355 0
> lp 9491 0
> parport 33290 2 parport_pc,lp
> option 16045 0
> usb_wwan 10222 1 option
> usbserial 34477 2 option,usb_wwan
> serio_raw 4064 0
> tpm_tis 9203 0
> tpm 14317 1 tpm_tis
> tpm_bios 5252 1 tpm
> rtc_cmos 8731 0
> rtc_core 14080 1 rtc_cmos
> rtc_lib 2497 1 rtc_core
> button 5662 0
> igb 131680 0
> shpchp 29302 0
> pcspkr 1822 0
> dm_region_hash 9574 1 dm_mirror
> dm_log 8359 2 dm_mirror,dm_region_hash
> usb_storage 45133 0
> ata_piix 22147 0
> libata 169650 1 ata_piix
> cciss 88474 24
> sd_mod 28117 0
> scsi_mod 156163 5 scsi_dh,usb_storage,libata,cciss,sd_mod
> ext3 114308 12
> jbd 43368 1 ext3
> uhci_hcd 18941 0
> ohci_hcd 20027 0
> ehci_hcd 33605 0
>
>
> # ifconfig -a
> eth0 Link encap:Ethernet HWaddr 18:A9:05:41:CC:CE
> inet addr:119.46.110.197 Bcast:119.46.110.255 Mask:255.255.255.192
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:4872707550 errors:0 dropped:1177767 overruns:1177767 frame:0
> TX packets:5066061004 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:3719046973104 (3.3 TiB) TX bytes:4237588228875 (3.8 TiB)
>
> eth0:1 Link encap:Ethernet HWaddr 18:A9:05:41:CC:CE
> inet addr:119.46.110.249 Bcast:119.46.110.255 Mask:255.255.255.192
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>
>
>
> --
> Tomasz Chmielewski
> http://wpkg.org
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-18 17:53 ` Julian Anastasov
@ 2011-06-28 3:55 ` David Miller
2011-06-28 8:13 ` Julian Anastasov
2011-06-28 8:30 ` Tomasz Chmielewski
0 siblings, 2 replies; 12+ messages in thread
From: David Miller @ 2011-06-28 3:55 UTC (permalink / raw)
To: ja; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden
From: Julian Anastasov <ja@ssi.bg>
Date: Sat, 18 Jun 2011 20:53:59 +0300 (EEST)
> Hm, if it happens "sometimes", can it be some
> problem with tproxy and TIME_WAIT sockets? I see that
> tproxy_sk_is_transparent has special treatment for TW
> sockets while ip_route_me_harder is different. As result,
> may be input route is assigned for TW packets.
>
> May be inet_sk_flowi_flags() needs fixing, not
> sure. But following patch is first step to fix this
> problem. I don't have setup to test this patch.
TPROXY has special code to make sure that time-wait sockets
are not assigned to skb->sk, as explained in commit
d503b30bd648b3cb4e5f50b65d27e389960cc6d9, that would cause
all kinds of crashes in nfnetlink_log etc.
Therefore we would see skb->sk==NULL at ip_route_me_harder()
in that case.
> ===========================================================
>
> Avoid creating input routes with ip_route_me_harder.
> It does not work for locally generated packets. Instead,
> restrict sockets to provide valid saddr for output route (or
> unicast saddr for transparent proxy). For other traffic
> allow saddr to be unicast or local but if callers forget
> to check saddr type use 0 for the output route.
>
> The resulting handling should be:
>
> - REJECT TCP:
> - in INPUT we can provide addr_type = RTN_LOCAL but
> better allow rejecting traffic delivered with
> local route (no IP address => use RTN_UNSPEC to
> allow also RTN_UNICAST).
> - FORWARD: RTN_UNSPEC => allow RTN_LOCAL/RTN_UNICAST
> saddr, add fix to ignore RTN_BROADCAST and RTN_MULTICAST
> - OUTPUT: RTN_UNSPEC
>
> - NAT, mangle, ip_queue, nf_ip_reroute: RTN_UNSPEC in LOCAL_OUT
>
> - IPVS:
> - use RTN_LOCAL in LOCAL_OUT and FORWARD after SNAT
> to restrict saddr to be local
>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
Unless someone gives some negative feedback soon I'm going to
apply this.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-28 3:55 ` David Miller
@ 2011-06-28 8:13 ` Julian Anastasov
2011-06-28 8:41 ` David Miller
2011-06-28 8:30 ` Tomasz Chmielewski
1 sibling, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2011-06-28 8:13 UTC (permalink / raw)
To: David Miller; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden
Hello,
On Mon, 27 Jun 2011, David Miller wrote:
> From: Julian Anastasov <ja@ssi.bg>
> Date: Sat, 18 Jun 2011 20:53:59 +0300 (EEST)
>
> > Hm, if it happens "sometimes", can it be some
> > problem with tproxy and TIME_WAIT sockets? I see that
> > tproxy_sk_is_transparent has special treatment for TW
> > sockets while ip_route_me_harder is different. As result,
> > may be input route is assigned for TW packets.
> >
> > May be inet_sk_flowi_flags() needs fixing, not
> > sure. But following patch is first step to fix this
> > problem. I don't have setup to test this patch.
>
> TPROXY has special code to make sure that time-wait sockets
> are not assigned to skb->sk, as explained in commit
> d503b30bd648b3cb4e5f50b65d27e389960cc6d9, that would cause
> all kinds of crashes in nfnetlink_log etc.
>
> Therefore we would see skb->sk==NULL at ip_route_me_harder()
> in that case.
Aha, after this clarification other changes should not
be needed. If saddr is translated, now we will use
FLOWI_FLAG_ANYSRC. As result, if SNAT happens one day in
LOCAL_OUT, the new saddr can be unicast because RTN_UNSPEC
is provided for addr_type. If saddr is not changed, it
should be already validated when the first route for skb is
performed, so TPROXY should work.
> > ===========================================================
> >
> > Avoid creating input routes with ip_route_me_harder.
> > It does not work for locally generated packets. Instead,
> > restrict sockets to provide valid saddr for output route (or
> > unicast saddr for transparent proxy). For other traffic
> > allow saddr to be unicast or local but if callers forget
> > to check saddr type use 0 for the output route.
> >
> > The resulting handling should be:
> >
> > - REJECT TCP:
> > - in INPUT we can provide addr_type = RTN_LOCAL but
> > better allow rejecting traffic delivered with
> > local route (no IP address => use RTN_UNSPEC to
> > allow also RTN_UNICAST).
> > - FORWARD: RTN_UNSPEC => allow RTN_LOCAL/RTN_UNICAST
> > saddr, add fix to ignore RTN_BROADCAST and RTN_MULTICAST
> > - OUTPUT: RTN_UNSPEC
> >
> > - NAT, mangle, ip_queue, nf_ip_reroute: RTN_UNSPEC in LOCAL_OUT
> >
> > - IPVS:
> > - use RTN_LOCAL in LOCAL_OUT and FORWARD after SNAT
> > to restrict saddr to be local
> >
> > Signed-off-by: Julian Anastasov <ja@ssi.bg>
>
> Unless someone gives some negative feedback soon I'm going to
> apply this.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-28 3:55 ` David Miller
2011-06-28 8:13 ` Julian Anastasov
@ 2011-06-28 8:30 ` Tomasz Chmielewski
2011-06-28 8:40 ` David Miller
1 sibling, 1 reply; 12+ messages in thread
From: Tomasz Chmielewski @ 2011-06-28 8:30 UTC (permalink / raw)
To: David Miller; +Cc: ja, eric.dumazet, netdev, bazsi, hidden
On 28.06.2011 05:55, David Miller wrote:
>> The resulting handling should be:
>>
>> - REJECT TCP:
>> - in INPUT we can provide addr_type = RTN_LOCAL but
>> better allow rejecting traffic delivered with
>> local route (no IP address => use RTN_UNSPEC to
>> allow also RTN_UNICAST).
>> - FORWARD: RTN_UNSPEC => allow RTN_LOCAL/RTN_UNICAST
>> saddr, add fix to ignore RTN_BROADCAST and RTN_MULTICAST
>> - OUTPUT: RTN_UNSPEC
>>
>> - NAT, mangle, ip_queue, nf_ip_reroute: RTN_UNSPEC in LOCAL_OUT
>>
>> - IPVS:
>> - use RTN_LOCAL in LOCAL_OUT and FORWARD after SNAT
>> to restrict saddr to be local
>>
>> Signed-off-by: Julian Anastasov<ja@ssi.bg>
>
> Unless someone gives some negative feedback soon I'm going to
> apply this.
Can you tell me where it will be pushed?
I.e. 3.x kernels only, or does it have a chance to go into 2.6.39.x?
--
Tomasz Chmielewski
http://wpkg.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-28 8:30 ` Tomasz Chmielewski
@ 2011-06-28 8:40 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2011-06-28 8:40 UTC (permalink / raw)
To: mangoo; +Cc: ja, eric.dumazet, netdev, bazsi, hidden
From: Tomasz Chmielewski <mangoo@wpkg.org>
Date: Tue, 28 Jun 2011 10:30:11 +0200
> Can you tell me where it will be pushed?
>
> I.e. 3.x kernels only, or does it have a chance to go into 2.6.39.x?
I'll apply it for 3.0.0 and also queue it up for -stable.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-28 8:13 ` Julian Anastasov
@ 2011-06-28 8:41 ` David Miller
2011-06-28 9:05 ` Julian Anastasov
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2011-06-28 8:41 UTC (permalink / raw)
To: ja; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 28 Jun 2011 11:13:25 +0300 (EEST)
> On Mon, 27 Jun 2011, David Miller wrote:
>
>> TPROXY has special code to make sure that time-wait sockets
>> are not assigned to skb->sk, as explained in commit
>> d503b30bd648b3cb4e5f50b65d27e389960cc6d9, that would cause
>> all kinds of crashes in nfnetlink_log etc.
>>
>> Therefore we would see skb->sk==NULL at ip_route_me_harder()
>> in that case.
>
> Aha, after this clarification other changes should not
> be needed.
By this do you mean that you think your patch in this thread
is completely sufficient?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: what's causing "ip_rt_bug"?
2011-06-28 8:41 ` David Miller
@ 2011-06-28 9:05 ` Julian Anastasov
0 siblings, 0 replies; 12+ messages in thread
From: Julian Anastasov @ 2011-06-28 9:05 UTC (permalink / raw)
To: David Miller; +Cc: mangoo, eric.dumazet, netdev, bazsi, hidden
Hello,
On Tue, 28 Jun 2011, David Miller wrote:
> From: Julian Anastasov <ja@ssi.bg>
> Date: Tue, 28 Jun 2011 11:13:25 +0300 (EEST)
>
> > On Mon, 27 Jun 2011, David Miller wrote:
> >
> >> TPROXY has special code to make sure that time-wait sockets
> >> are not assigned to skb->sk, as explained in commit
> >> d503b30bd648b3cb4e5f50b65d27e389960cc6d9, that would cause
> >> all kinds of crashes in nfnetlink_log etc.
> >>
> >> Therefore we would see skb->sk==NULL at ip_route_me_harder()
> >> in that case.
> >
> > Aha, after this clarification other changes should not
> > be needed.
>
> By this do you mean that you think your patch in this thread
> is completely sufficient?
Yes. My worry was for the skb->sk != NULL not being
handled by inet_sk_flowi_flags for TW sockets. But it seems
it is not needed, so the patch in this form should be ok.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-06-28 9:02 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-17 20:00 what's causing "ip_rt_bug"? Tomasz Chmielewski
2011-06-17 20:36 ` Eric Dumazet
2011-06-17 21:37 ` Tomasz Chmielewski
2011-06-17 23:56 ` Julian Anastasov
2011-06-18 8:31 ` Tomasz Chmielewski
2011-06-18 17:53 ` Julian Anastasov
2011-06-28 3:55 ` David Miller
2011-06-28 8:13 ` Julian Anastasov
2011-06-28 8:41 ` David Miller
2011-06-28 9:05 ` Julian Anastasov
2011-06-28 8:30 ` Tomasz Chmielewski
2011-06-28 8:40 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).