netfilter-devel.vger.kernel.org archive mirror
* bad nat connection tracking performance with ip_gre
@ 2009-08-18 10:14 Timo Teräs
  2009-08-18 10:38 ` Patrick McHardy
  0 siblings, 1 reply; 9+ messages in thread
From: Timo Teräs @ 2009-08-18 10:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

Hi,

I noticed (in relation to my nbma gre multicast testing) that
the nat connection tracking code does not cache flows for
locally originating traffic that is routed to a gre tunnel
(forwarded traffic is ok).

I basically have a router box with an nbma gre tunnel. It
receives 10/8 traffic, which is routed to the internet
interface, and an ipsec xfrm is applied.

Now, if the router box is forwarding traffic from some
physical interface, everything works as expected.

However, if a local process on the router box is sending
packets that go to the gre tunnel, each packet causes a new
lookup in the nat table OUTPUT chain. This is easily verified
by flood pinging a private IP from the router box: the
counters on the nat table OUTPUT chain default policy start
incrementing wildly.

I profiled this with oprofile, and it says most of the time
is spent in ipt_do_table(). I would suppose the netfilter
hook is invoked from ip_gre.c:ipgre_tunnel_xmit() when the
IPTUNNEL_XMIT() macro calls ip_local_out().

Monitoring the connection tracking stats, it looks like all
packets reuse the proper connection tracking cache entry.
But somehow the nat target still gets called for the locally
originating packets to gre.

Any ideas how to fix this?

Thanks,
 Timo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 10:14 bad nat connection tracking performance with ip_gre Timo Teräs
@ 2009-08-18 10:38 ` Patrick McHardy
  2009-08-18 12:45   ` Timo Teräs
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick McHardy @ 2009-08-18 10:38 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netfilter-devel, netdev

Timo Teräs wrote:
> Hi,
> 
> I noticed (in relation to my nbma gre multicast testing) that
> the nat connection tracking code does not cache flows for
> locally originating traffic that is routed to gre tunnel
> (forwarded traffic is ok).
> 
> I basically have a router box with nbma gre tunnel. It gets
> 10/8 traffic. And is routed to internet interface. An ipsec
> xfrm is applied.
> 
> Now, if the router box is forwarding traffic from some
> physical interface, everything works as expected.
> 
> However, if a local process on the router box is sending
> packets that go to gre tunnel, each packet causes a new
> lookup on nat table OUTPUT chain. This is easily verified
> by doing flood ping on router box on private IP and the
> counters on nat table OUTPUT chain default policy start
> to get incremented wildly.
> 
> I tried to oprofile this and it says most of the time is
> spent in ipt_do_table(). I would suppose that the place
> where netfilter hook is called is
> ip_gre.c:ipgre_tunnel_xmit() when it invokes macro
> IPTUNNEL_XMIT() calling ip_local_out().
> 
> Monitoring the connection tracking stats, it looks like
> all packets are reusing the proper connection tracking
> cache entry. But somehow the nat target still gets
> called for the locally originating packets to gre.
> 
> Any ideas how to fix this?

Please use the TRACE target in raw/OUTPUT to trace the flow of
packets through the netfilter hooks:

modprobe ipt_LOG
iptables -t raw -A OUTPUT -j TRACE
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 10:38 ` Patrick McHardy
@ 2009-08-18 12:45   ` Timo Teräs
  2009-08-18 13:01     ` Patrick McHardy
  0 siblings, 1 reply; 9+ messages in thread
From: Timo Teräs @ 2009-08-18 12:45 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

Patrick McHardy wrote:
> Timo Teräs wrote:
>> However, if a local process on the router box is sending
>> packets that go to gre tunnel, each packet causes a new
>> lookup on nat table OUTPUT chain. This is easily verified
>> by doing flood ping on router box on private IP and the
>> counters on nat table OUTPUT chain default policy start
>> to get incremented wildly.

So the ping test is not good, as the connection tracking
entry is apparently removed once the ICMP reply is received.

One way to reliably reproduce this is sending packets with
sendto() from user land to the nbma gre tunnel, specifying
the nbma ip address.

>> Monitoring the connection tracking stats, it looks like
>> all packets are reusing the proper connection tracking
>> cache entry. But somehow the nat target still gets
>> called for the locally originating packets to gre.
>>
>> Any ideas how to fix this?
> 
> Please use the TRACE target in raw/OUTPUT to trace the flow of
> packets through the netfilter hooks:
> 
> modprobe ipt_LOG
> iptables -t raw -A OUTPUT -j TRACE

FORWARDED PACKET, does not hog CPU
----------------------------------

IN=eth1 OUT= MAC=x:x:x:x:x SRC=10.252.5.10 DST=239.255.12.42
LEN=1428 TOS=0x00 PREC=0x00 TTL=8 ID=31320 DF PROTO=UDP
SPT=34757 DPT=50002 LEN=1408
	1. mangle:INPUT
	2. filter:INPUT
	3. raw:PREROUTING
	4. mangle:PREROUTING

Next it turns into a GRE encapsulated packet:

IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip LEN=0
TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47
	1. mangle:FORWARD
	2. filter:FORWARD

It gets the proper SRC and LEN at this point, then hits:
	1. raw:OUTPUT
	2. mangle:OUTPUT
	3. nat:OUTPUT
	4. filter:OUTPUT

LOCALLY GENERATED PACKET, hogs CPU
----------------------------------

IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344
TOS=0x00 PREC=0x00 TTL=8 ID=41664 DF PROTO=UDP SPT=47920
DPT=1234 LEN=1324 UID=1007 GID=1007
	1. raw:OUTPUT
	2. mangle:OUTPUT
	3. filter:OUTPUT
	4. mangle:POSTROUTING

Picked up by multicast routing.

IN=eth1 OUT= MAC= SRC=10.252.5.1 DST=239.255.12.42 LEN=1344
TOS=0x00 PREC=0x00 TTL=8 ID=41664 DF PROTO=UDP SPT=47920
DPT=1234 LEN=1324
	1. raw:PREROUTING
	2. mangle:PREROUTING

Forwarded to GRE tunnel.

IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip LEN=0 TOS=0x00
PREC=0x00 TTL=64 ID=0 DF PROTO=47
	1. mangle:FORWARD
	2. filter:FORWARD

Apparently the GRE xmit code fixes it up to:

IN= OUT=eth0 SRC=my.pub.lic.ip DST=re.mo.te.ip LEN=1372 TOS=0x00
PREC=0x00 TTL=64 ID=0 DF PROTO=47
	1. raw:OUTPUT
	2. mangle:OUTPUT

---

It's starting to smell like an ip_gre problem. ipgre_header()
seems to set only the destination IP address, and that probably
confuses the connection tracking code for locally originating
packets.

I suppose we should construct an almost full IP header in
ipgre_header().

- Timo



* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 12:45   ` Timo Teräs
@ 2009-08-18 13:01     ` Patrick McHardy
  2009-08-18 13:53       ` Timo Teräs
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick McHardy @ 2009-08-18 13:01 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netfilter-devel, netdev

Timo Teräs wrote:
> LOCALLY GENERATED PACKET, hogs CPU
> ----------------------------------
> 
> IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344
> TOS=0x00 PREC=0x00 TTL=8 ID=41664 DF PROTO=UDP SPT=47920
> DPT=1234 LEN=1324 UID=1007 GID=1007
>     1. raw:OUTPUT
>     2. mangle:OUTPUT
>     3. filter:OUTPUT
>     4. mangle:POSTROUTING
> 

Please include the complete output; I need to see the devices logged
at each hook.


* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 13:01     ` Patrick McHardy
@ 2009-08-18 13:53       ` Timo Teräs
  2009-08-18 14:58         ` Patrick McHardy
  0 siblings, 1 reply; 9+ messages in thread
From: Timo Teräs @ 2009-08-18 13:53 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

Patrick McHardy wrote:
> Timo Teräs wrote:
>> LOCALLY GENERATED PACKET, hogs CPU
>> ----------------------------------
>>
>> IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344
>> TOS=0x00 PREC=0x00 TTL=8 ID=41664 DF PROTO=UDP SPT=47920
>> DPT=1234 LEN=1324 UID=1007 GID=1007
>>     1. raw:OUTPUT
>>     2. mangle:OUTPUT
>>     3. filter:OUTPUT
>>     4. mangle:POSTROUTING
>>
> 
> Please include the complete output, I need to see the devices logged
> at each hook.

The devices are identical for the hooks grouped under the same line.

Here are the interesting lines from one packet:

Generation:

raw:OUTPUT:policy:2 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 UID=1007 GID=1007 
mangle:OUTPUT:policy:1 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 UID=1007 GID=1007 

(the nat hook is called for the initial packet only):
nat:OUTPUT:policy:1 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36593 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 UID=1007 GID=1007 

filter:OUTPUT:policy:1 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 UID=1007 GID=1007 
mangle:POSTROUTING:policy:1 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 
mangle:POSTROUTING:policy:1 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 UID=1007 GID=1007 

Looped back by multicast routing:

raw:PREROUTING:policy:1 IN=eth1 OUT= MAC= SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 
mangle:PREROUTING:policy:1 IN=eth1 OUT= MAC= SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 

The cpu hogging happens somewhere below this, since the more
multicast destinations I have, the more CPU it takes.

Multicast forwarded (I hacked this into the code, but a similar
dump happens on local sendto()):

Actually, now that I think about it, we should have the inner IP
contents here, not the incomplete outer header yet. So apparently
ipgre_header() messes up the network_header position.

mangle:FORWARD:policy:1 IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip LEN=0 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 
filter:FORWARD:rule:2 IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip LEN=0 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 

ip_gre xmit sends out:

raw:OUTPUT:rule:1 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 
raw:OUTPUT:policy:2 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 
mangle:OUTPUT:policy:1 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 

(nat hook for initial packets)
nat:OUTPUT:policy:1 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 

filter:OUTPUT:policy:1 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 

- Timo


* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 13:53       ` Timo Teräs
@ 2009-08-18 14:58         ` Patrick McHardy
  2009-08-18 17:39           ` Timo Teräs
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick McHardy @ 2009-08-18 14:58 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netfilter-devel, netdev

Timo Teräs wrote:
> Patrick McHardy wrote:
>> Timo Teräs wrote:
>>> LOCALLY GENERATED PACKET, hogs CPU
>>> ----------------------------------
>>>
>>> IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344
>>> TOS=0x00 PREC=0x00 TTL=8 ID=41664 DF PROTO=UDP SPT=47920
>>> DPT=1234 LEN=1324 UID=1007 GID=1007
>>>     1. raw:OUTPUT
>>>     2. mangle:OUTPUT
>>>     3. filter:OUTPUT
>>>     4. mangle:POSTROUTING
>>>
>>
>> Please include the complete output, I need to see the devices logged
>> at each hook.
> 
> The devices are identical for each hook grouped under same line.
> 
> Here are the interesting lines from one packet:
> 
> Generation:
> 
> raw:OUTPUT:policy:2 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42
> LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977
> DPT=1234 LEN=1324 UID=1007 GID=1007 mangle:OUTPUT:policy:1 IN= OUT=eth1
> SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8
> ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324 UID=1007 GID=1007
> (the nat hook is called for initial packet only):
> nat:OUTPUT:policy:1 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42
> LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36593 DF PROTO=UDP SPT=33977
> DPT=1234 LEN=1324 UID=1007 GID=1007
> filter:OUTPUT:policy:1 IN= OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42
> LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF PROTO=UDP SPT=33977
> DPT=1234 LEN=1324 UID=1007 GID=1007 mangle:POSTROUTING:policy:1 IN=
> OUT=eth1 SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00
> TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324
> mangle:POSTROUTING:policy:1 IN= OUT=eth1 SRC=10.252.5.1
> DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF
> PROTO=UDP SPT=33977 DPT=1234 LEN=1324 UID=1007 GID=1007
> Looped back by multicast routing:
> 
> raw:PREROUTING:policy:1 IN=eth1 OUT= MAC= SRC=10.252.5.1
> DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF
> PROTO=UDP SPT=33977 DPT=1234 LEN=1324 mangle:PREROUTING:policy:1 IN=eth1
> OUT= MAC= SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00
> TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324
> The cpu hogging happens somewhere below this, since the more
> multicast destinations I have the more CPU it takes.

So you're sending to multiple destinations? That obviously increases
the time spent in netfilter and the remaining networking stack.

> Multicast forwarded (I hacked this into the code; but similar
> dump happens on local sendto()):
> 
> Actually, now that I think, here we should have the inner IP
> contents, and not the incomplete outer yet. So apparently
> the ipgre_header() messes the network_header position.

It shouldn't even have been called at this point. Please retry this
without your changes.

> mangle:FORWARD:policy:1 IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip
> LEN=0 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 filter:FORWARD:rule:2
> IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip LEN=0 TOS=0x00 PREC=0x00
> TTL=64 ID=0 DF PROTO=47

This looks really broken. Why is the protocol already 47 before it even
reaches the gre tunnel?

> ip_gre xmit sends out:

There should be a POSTROUTING hook here.

> raw:OUTPUT:rule:1 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372
> TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 raw:OUTPUT:policy:2 IN=
> OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372 TOS=0x00 PREC=0x00
> TTL=64 ID=0 DF PROTO=47 mangle:OUTPUT:policy:1 IN= OUT=eth0
> SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372 TOS=0x00 PREC=0x00 TTL=64 ID=0
> DF PROTO=47
> (nat hook for initial packets)
> nat:OUTPUT:policy:1 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip LEN=1372
> TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47
> filter:OUTPUT:policy:1 IN= OUT=eth0 SRC=lo.ca.l.ip DST=re.mo.te.ip
> LEN=1372 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47
> - Timo
> 



* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 14:58         ` Patrick McHardy
@ 2009-08-18 17:39           ` Timo Teräs
  2009-08-18 19:36             ` Timo Teräs
  0 siblings, 1 reply; 9+ messages in thread
From: Timo Teräs @ 2009-08-18 17:39 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

Patrick McHardy wrote:
> Timo Teräs wrote:
>> Looped back by multicast routing:
>>
>> raw:PREROUTING:policy:1 IN=eth1 OUT= MAC= SRC=10.252.5.1
>> DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF
>> PROTO=UDP SPT=33977 DPT=1234 LEN=1324 mangle:PREROUTING:policy:1 IN=eth1
>> OUT= MAC= SRC=10.252.5.1 DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00
>> TTL=8 ID=36594 DF PROTO=UDP SPT=33977 DPT=1234 LEN=1324
>> The cpu hogging happens somewhere below this, since the more
>> multicast destinations I have the more CPU it takes.
> 
> So you're sending to multiple destinations? That obviously increases
> the time spent in netfilter and the remaining networking stack.

Yes. But my observation was that for the same number of packets,
the CPU usage is significantly higher when they are sent locally
than when they are forwarded from a physical interface. That's
what made me curious.

If I had remembered that icmp conntrack entries get pruned as
soon as the icmp reply comes back, I probably would not have
bothered to bug you. But that made me think it was a generic
problem rather than something in my patch.

>> Multicast forwarded (I hacked this into the code; but similar
>> dump happens on local sendto()):
>>
>> Actually, now that I think, here we should have the inner IP
>> contents, and not the incomplete outer yet. So apparently
>> the ipgre_header() messes the network_header position.
> 
> It shouldn't even have been called at this point. Please retry this
> without your changes.

I patched ipmr.c to explicitly call dev_hard_header to set up
the ipgre nbma receiver. Sadly, the call was on the wrong side
of the nf_hook. Adjusting that makes the forward hooks look ok.

I thought the hook was using network_header to figure out where
the IP header is, but it looks like that isn't the case.

>> mangle:FORWARD:policy:1 IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip
>> LEN=0 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47 filter:FORWARD:rule:2
>> IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip LEN=0 TOS=0x00 PREC=0x00
>> TTL=64 ID=0 DF PROTO=47
> 
> This looks really broken. Why is the protocol already 47 before it even
> reaches the gre tunnel?

Broken by me as explained.

>> ip_gre xmit sends out:
> 
> There should be a POSTROUTING hook here.

Hmm... Looking at the code, I probably broke this too. Could
missing this hook cause a performance penalty for future packets
of the same flow?

Ok, I'll go back to the drawing board. I should have done the
multicast handling for nbma destinations on the ip_gre side, as
I was wondering earlier. I'll also double-check with oprofile
where the local sendto() approach dies.

- Timo


* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 17:39           ` Timo Teräs
@ 2009-08-18 19:36             ` Timo Teräs
  2009-08-19  8:40               ` Timo Teräs
  0 siblings, 1 reply; 9+ messages in thread
From: Timo Teräs @ 2009-08-18 19:36 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

Timo Teräs wrote:
> Yes. But my observation was that for the same amount of packets
> sent locally the CPU usage is significantly higher than if they
> are forwarded from physical interface. That's what made me
> curious.
> 
> If I had remember that icmp conn track entries get pruned right
> when they get icmp reply back, I would not have probably bothered
> to bug you. But that made me think it was more of generic problem
> than my patch.
> 
> I'll also double check with oprofile the local sendto() approach 
> where it dies.

Ok, I finally figured out the difference. Depending on the path
(sendto() / local route / forward route / my patched mrt), the
skb that gets passed to ipgre_tunnel_xmit() has nfctinfo either
0 or 2. This value is never modified; nf_reset() is called just
before ip_local_out(), but it only clears nfct to NULL and does
not touch nfctinfo.

So when the LOCAL_OUT hook for the GRE packet is hit, the skb
has nfct=NULL and, depending on where the packet came from,
nfctinfo=ESTABLISHED or NEW. This also seems to affect whether
that specific skb gets the nat/OUTPUT hook called.

Is this behaviour of nf_reset() intentional?

- Timo


* Re: bad nat connection tracking performance with ip_gre
  2009-08-18 19:36             ` Timo Teräs
@ 2009-08-19  8:40               ` Timo Teräs
  0 siblings, 0 replies; 9+ messages in thread
From: Timo Teräs @ 2009-08-19  8:40 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

Timo Teräs wrote:
> Ok, finally figured out the difference. Looks like depending
> on the sendto() / local route / forward route / my patched mrt
> the skb that gets passed to ipgre_tunnel_xmit() seems to have
> nfctinfo either 0 or 2. This value is not modified; nf_reset()
> is called just before ip_local_out(). Looks like nf_reset()
> clears nfct to NULL, but does not touch nfctinfo.
> 
> So when LOCAL_OUT hook for the GRE packet is hit, depending
> where the packet came: it has nfct=NULL and nfctinfo=ESTABLISHED
> or NEW. This also seems to affect if that specific skb gets
> the nat/OUTPUT hook called.
> 
> Is this behaviour for nf_reset() intentional?

Apparently it does not matter.

The real problem seems to be a bug that did not account irq
times properly, which biased the CPU usage measurements. It was
fixed by commit f5f293a4e3d0a0c52cec31de6762c95050156516
("sched: account system time properly").

Testing with a kernel that has that fix, I'm getting the same
CPU readings in both cases. And oprofile in timer mode didn't
help me notice this.

Sorry for the noise.

- Timo

