* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-09-29 21:45 UTC
To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev

On Wednesday 29 September 2010 at 23:18 +0400, Alexey Vlasov wrote:
> Hi.
>
> I'm not sure whether I should write here; maybe I should ask on the
> netfilter mailing list instead, but if this is the wrong place please
> correct me.

CC netdev

> I run a rather large Linux shared-hosting service, and on my new servers
> I noticed something strange: this simple rule
>
> # iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags \
>     FIN,SYN,RST,ACK SYN -j LOG --log-prefix "ipsec:SYN-OUTPUT " --log-uid
>
> introduces significant delays even for pings from an adjacent server on
> the local network. I don't know exactly what's wrong, whether it's poor
> kernel support for the new hardware or something in the newer kernel in
> general, but it has reached the point where even a simple DDoS attack on
> a client site becomes hard to deal with, and in general everything works
> worse.
>
> It seems to me that the more CPU cores there are, the worse it gets, and
> apparently iptables only uses one processor, whose resources are for
> some reason not enough.

That's not true: iptables can run on all CPUs in parallel.

> newbox # iptables -F
> otherbox # ping -c 100 newbox
> ...
> 100 packets transmitted, 100 received, 0% packet loss, time 100044ms
> rtt min/avg/max/mdev = 0.133/2.637/17.172/3.736 ms
>
> OK.
>
> newbox # iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN \
>     -j LOG --log-prefix "ipsec:SYN-OUTPUT " --log-uid
> otherbox # ping -c 100 newbox
> ...
> 64 bytes from (newbox): icmp_seq=3 ttl=64 time=1.58 ms
> 64 bytes from (newbox): icmp_seq=4 ttl=64 time=98.7 ms
> 64 bytes from (newbox): icmp_seq=5 ttl=64 time=18.2 ms
> 64 bytes from (newbox): icmp_seq=6 ttl=64 time=6.13 ms
> 64 bytes from (newbox): icmp_seq=7 ttl=64 time=108 ms
> ...
> 64 bytes from (newbox): icmp_seq=55 ttl=64 time=2.30 ms
> 64 bytes from (newbox): icmp_seq=56 ttl=64 time=59.9 ms
> 64 bytes from (newbox): icmp_seq=57 ttl=64 time=0.155 ms
> ...
> 64 bytes from (newbox): icmp_seq=61 ttl=64 time=13.4 ms
> 64 bytes from (newbox): icmp_seq=62 ttl=64 time=55.0 ms
> 64 bytes from (newbox): icmp_seq=63 ttl=64 time=0.233 ms
> ...
> 100 packets transmitted, 100 received, 0% packet loss, time 99957ms
> rtt min/avg/max/mdev = 0.111/7.519/108.061/18.478 ms
>
> newbox # iptables -L -v -n
> Chain INPUT (policy ACCEPT 346K packets, 213M bytes)
>  pkts bytes target prot opt in out source     destination
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>  pkts bytes target prot opt in out source     destination
>
> Chain OUTPUT (policy ACCEPT 296K packets, 290M bytes)
>  pkts bytes target prot opt in out source     destination
>   234 14040 LOG    tcp  --  *   *  0.0.0.0/0  0.0.0.0/0   tcp dpt:80 flags:0x17/0x02 LOG flags 8 level 4 prefix `ipsec:SYN-OUTPUT- '
>
> My old server: Intel SR1500, Xeon 5430, kernel 2.6.24 - 2.6.28
> Newbox: SR1620UR, 5650, kernel 2.6.32
>
> Thanks in advance.

That does seem strange, since the LOG rule you added should not slow down
ICMP traffic that much.

But if you are sending SYN packets (which get logged) at the same time,
that might delay the reception of, and replies to, the ICMP frames. The
LOG target can be quite expensive...

Do other rules give the same problem? For example:

iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
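A side note, not something verified on this particular setup: the LOG target
emits its messages through printk, so a slow console (a serial console, for
instance) or a syslog daemon doing synchronous writes can make each logged
packet expensive. One cheap check is to keep the level-4 LOG messages off the
console and see whether the latency changes:

# show current console / default / minimum / boot-time log levels
cat /proc/sys/kernel/printk
# only let messages more urgent than level 4 reach the console;
# the LOG entries still go to the kernel ring buffer and syslog
dmesg -n 4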
* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-09-30 6:24 UTC
To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev

Here I found some dude with the same problem:
http://lkml.org/lkml/2010/7/9/340

--
BRGDS. Alexey Vlasov.
* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-09-30 6:33 UTC
To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev

On Thursday 30 September 2010 at 10:24 +0400, Alexey Vlasov wrote:
> Here I found some dude with the same problem:
> http://lkml.org/lkml/2010/7/9/340

In your opinion it's the same problem, but the description you gave is
completely different: you see the time skew only when activating a
particular iptables rule, no?
* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-09-30 12:23 UTC
To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev

On Thu, Sep 30, 2010 at 08:33:52AM +0200, Eric Dumazet wrote:
> In your opinion it's the same problem, but the description you gave is
> completely different: you see the time skew only when activating a
> particular iptables rule, no?

Well, I pinned the NIC interrupts (the tx/rx queues) to different
processors, and now I get normal pings with the LOG rule added.

I also found that the overruns counter is constantly growing; I don't know
whether that is related:

RX packets:2831439546 errors:0 dropped:134726 overruns:947671733 frame:0
TX packets:2880849825 errors:0 dropped:0 overruns:0 carrier:0

It's rather strange that only one processor was involved; even in top it
was clear that ksoftirqd was eating up to 100% of the first processor.

Here is the typical distribution of interrupts on the new servers:

           CPU0        CPU1  CPU2  CPU3  ...  CPU23
752:         11           0     0     0  ...      0  PCI-MSI-edge  eth0
753: 2799366721           0     0     0  ...      0  PCI-MSI-edge  eth0-rx3
754: 2821840553           0     0     0  ...      0  PCI-MSI-edge  eth0-rx2
755: 2786117044           0     0     0  ...      0  PCI-MSI-edge  eth0-rx1
756: 2896099336           0     0     0  ...      0  PCI-MSI-edge  eth0-rx0
757: 1808404680           0     0     0  ...      0  PCI-MSI-edge  eth0-tx3
758: 1797855130           0     0     0  ...      0  PCI-MSI-edge  eth0-tx2
759: 1807222032           0     0     0  ...      0  PCI-MSI-edge  eth0-tx1
760: 1820309360           0     0     0  ...      0  PCI-MSI-edge  eth0-tx0

On the old ones:

           CPU0        CPU1        CPU2  ...       CPU8
502:  522320256   522384039   522327386  ...  522380267  PCI-MSI-edge  eth0

--
BRGDS. Alexey Vlasov.
* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-09-30 12:44 UTC
To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev

On Thursday 30 September 2010 at 16:23 +0400, Alexey Vlasov wrote:
> It's rather strange that only one processor was involved; even in top it
> was clear that ksoftirqd was eating up to 100% of the first processor.

OK, that is because only CPU0 gets the interrupts of all queues.

> Here is the typical distribution of interrupts on the new servers:
>
>            CPU0        CPU1  CPU2  CPU3  ...  CPU23
> 752:         11           0     0     0  ...      0  PCI-MSI-edge  eth0
> 753: 2799366721           0     0     0  ...      0  PCI-MSI-edge  eth0-rx3
> 754: 2821840553           0     0     0  ...      0  PCI-MSI-edge  eth0-rx2
> 755: 2786117044           0     0     0  ...      0  PCI-MSI-edge  eth0-rx1
> 756: 2896099336           0     0     0  ...      0  PCI-MSI-edge  eth0-rx0
> 757: 1808404680           0     0     0  ...      0  PCI-MSI-edge  eth0-tx3
> 758: 1797855130           0     0     0  ...      0  PCI-MSI-edge  eth0-tx2
> 759: 1807222032           0     0     0  ...      0  PCI-MSI-edge  eth0-tx1
> 760: 1820309360           0     0     0  ...      0  PCI-MSI-edge  eth0-tx0

echo 01 >/proc/irq/*/eth0-rx0/../smp_affinity
echo 02 >/proc/irq/*/eth0-rx1/../smp_affinity
echo 04 >/proc/irq/*/eth0-rx2/../smp_affinity
echo 08 >/proc/irq/*/eth0-rx3/../smp_affinity

cat /proc/irq/*/eth0-rx0/../smp_affinity
cat /proc/irq/*/eth0-rx1/../smp_affinity
cat /proc/irq/*/eth0-rx2/../smp_affinity
cat /proc/irq/*/eth0-rx3/../smp_affinity

> On the old ones:
>
>            CPU0        CPU1        CPU2  ...       CPU8
> 502:  522320256   522384039   522327386  ...  522380267  PCI-MSI-edge  eth0

What network driver is it on the newbox, and what was it on the old box?

If you switch to 2.6.35 you can use RPS to dispatch packets to several
CPUs, in case the interrupt affinity cannot be changed (all interrupts
still handled by CPU0).
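For reference, a minimal sketch of the RPS knob mentioned above (it assumes a
2.6.35+ kernel built with RPS support; the CPU mask is only an example): RPS
is configured per receive queue through a sysfs bitmask.

# let CPUs 0-3 process packets received on eth0 queue rx-0 (mask 0xf)
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
# read the mask back to confirm
cat /sys/class/net/eth0/queues/rx-0/rps_cpus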
* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-09-30 17:37 UTC
To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev

On Thu, Sep 30, 2010 at 02:44:29PM +0200, Eric Dumazet wrote:
> OK, that is because only CPU0 gets the interrupts of all queues.

It was too early to be happy. With the one rule the situation got better,
but there are still some delays. And after adding one more rule:

-A INPUT -p all -m state --state INVALID -j LOG --log-prefix "ipsec:IN-INVALID "

it got totally wrecked:

...
64 bytes from (10.0.2.17): icmp_seq=24 ttl=64 time=0.342 ms
64 bytes from (10.0.2.17): icmp_seq=25 ttl=64 time=1868 ms
64 bytes from (10.0.2.17): icmp_seq=26 ttl=64 time=1448 ms
64 bytes from (10.0.2.17): icmp_seq=27 ttl=64 time=447 ms
64 bytes from (10.0.2.17): icmp_seq=28 ttl=64 time=0.196 ms
...
100 packets transmitted, 100 received, 0% packet loss, time 99990ms
rtt min/avg/max/mdev = 0.108/39.068/1868.663/237.507 ms, pipe 2

# iptables -L -v -n
Chain INPUT (policy ACCEPT 601K packets, 475M bytes)
 pkts bytes target prot opt in out source     destination
  275 11096 LOG    all  --  *   *  0.0.0.0/0  0.0.0.0/0   state INVALID LOG flags 0 level 4 prefix `ipsec:IN-INVALID '

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source     destination

Chain OUTPUT (policy ACCEPT 529K packets, 561M bytes)
 pkts bytes target prot opt in out source     destination
13979  839K LOG    tcp  --  *   *  0.0.0.0/0  0.0.0.0/0   tcp dpt:80 flags:0x17/0x02 LOG flags 8 level 4 prefix `ipsec:SYN-OUTPUT-DROP '

> echo 01 >/proc/irq/*/eth0-rx0/../smp_affinity
> echo 02 >/proc/irq/*/eth0-rx1/../smp_affinity
> echo 04 >/proc/irq/*/eth0-rx2/../smp_affinity
> echo 08 >/proc/irq/*/eth0-rx3/../smp_affinity
>
> cat /proc/irq/*/eth0-rx0/../smp_affinity
> cat /proc/irq/*/eth0-rx1/../smp_affinity
> cat /proc/irq/*/eth0-rx2/../smp_affinity
> cat /proc/irq/*/eth0-rx3/../smp_affinity

The last tests were already made with this kind of rx queue binding:

# cat /proc/irq/60/smp_affinity
001000
# cat /proc/irq/61/smp_affinity
010000
# cat /proc/irq/62/smp_affinity
080000
# cat /proc/irq/63/smp_affinity
800000

Now ksoftirqd eats not just one processor but all of the ones where I
assigned the IRQs.

> What network driver is it on the newbox, and what was it on the old box?

newbox:
01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
driver: igb
version: 1.3.16-k2
firmware-version: 2.1-0
bus-info: 0000:01:00.0

oldbox:
05:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
driver: e1000e
version: 0.3.3.3-k6
firmware-version: 1.0-0
bus-info: 0000:05:00.0

--
BRGDS. Alexey Vlasov.
* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-09-30 18:03 UTC
To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev

On Thursday 30 September 2010 at 21:37 +0400, Alexey Vlasov wrote:
> The last tests were already made with this kind of rx queue binding:
>
> # cat /proc/irq/60/smp_affinity
> 001000
> # cat /proc/irq/61/smp_affinity
> 010000
> # cat /proc/irq/62/smp_affinity
> 080000
> # cat /proc/irq/63/smp_affinity
> 800000

Why 60, 61, 62 and 63? This should be 753, 754, 755 and 756.
* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-09-30 18:15 UTC
To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev

On Thu, Sep 30, 2010 at 08:03:02PM +0200, Eric Dumazet wrote:
> > The last tests were already made with this kind of rx queue binding:
> > # cat /proc/irq/60/smp_affinity
> > 001000
> > ...
>
> Why 60, 61, 62 and 63? This should be 753, 754, 755 and 756.

I've got several similar servers; interrupts 60-63 are on the one that I
can test right now, so this isn't a mistake.

--
BRGDS. Alexey Vlasov.
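As an aside (a sketch only; the exact queue names depend on the driver), the
per-queue vector numbers on any given box can be read straight from
/proc/interrupts, so the affinity masks can be set without hard-coding IRQ
numbers:

# print "<irq> <queue name>" for each eth0 rx/tx vector
awk '/eth0-(rx|tx)/ { irq = $1; sub(":", "", irq); print irq, $NF }' /proc/interrupts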
* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-09-30 18:52 UTC
To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev

On Thursday 30 September 2010 at 22:15 +0400, Alexey Vlasov wrote:
> I've got several similar servers; interrupts 60-63 are on the one that I
> can test right now, so this isn't a mistake.

If you get a burst of LOG matches, it can really slow the whole thing
down. You should add a limiter (e.g. no more than 5 messages per second):

http://netfilter.org/documentation/HOWTO/packet-filtering-HOWTO-7.html

This module is most useful after a limit match, so you don't flood your
logs.
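For illustration, a rate-limited variant of the original rule might look like
this (a sketch; the limit and burst values here are arbitrary examples):

iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN \
    -m limit --limit 5/second --limit-burst 10 \
    -j LOG --log-prefix "ipsec:SYN-OUTPUT " --log-uid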
* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-10-01 10:16 UTC
To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev

On Thu, Sep 30, 2010 at 08:52:45PM +0200, Eric Dumazet wrote:
> If you get a burst of LOG matches, it can really slow the whole thing
> down. You should add a limiter (e.g. no more than 5 messages per second):
>
> http://netfilter.org/documentation/HOWTO/packet-filtering-HOWTO-7.html

Yes, but sometimes there aren't even 5 matching packets per second and it
still slows down, so I don't think that is the reason.

I have also found that:

1. rx overruns keep increasing.
2. rx_queue_drop_packet_count keeps increasing.

# ethtool -S eth0 | grep drop
tx_dropped: 0
rx_queue_drop_packet_count: 1260743751
dropped_smbus: 0
rx_queue_0_drops: 0
rx_queue_1_drops: 0
rx_queue_2_drops: 0
rx_queue_3_drops: 0

3. When sending SYN packets with hping, no RST packet is sent back, but I
don't know, maybe that is just expected behaviour in 2.6.32.

newbox # hping -c 1 -S -p 80 111.111.111.111
HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
len=46 ip=111.111.111.111 ttl=58 DF id=11471 sport=80 flags=SA seq=0 win=65535 rtt=99.0 ms

--- 111.111.111.111 hping statistic ---
1 packets tramitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 99.0/99.0/99.0 ms

13:59:07.439528 IP newbox.2777 > 111.111.111.111.80: S 345595033:345595033(0) win 512
13:59:07.439626 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
13:59:10.439368 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
13:59:16.439313 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>
13:59:28.439206 IP 111.111.111.111.80 > newbox.2777: S 1178827395:1178827395(0) ack 345595034 win 65535 <mss 1460>

As a result I get duplicates:

DUP! len=46 ip=111.111.111.111 ttl=58 DF id=27454 sport=80 flags=SA seq=0 win=65535 rtt=3137.8 ms

An example of the same kind of TCP session from a 2.6.28 kernel:

oldbox # hping -c 1 -S -p 80 111.111.111.111
HPING 111.111.111.111 (eth0 111.111.111.111): S set, 40 headers + 0 data bytes
len=46 ip=111.111.111.111 ttl=58 DF id=53180 sport=80 flags=SA seq=0 win=65535 rtt=2.9 ms

--- 111.111.111.111 hping statistic ---
1 packets tramitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.9/2.9/2.9 ms

14:01:45.225136 IP oldbox.2776 > 111.111.111.111.80: S 1983626200:1983626200(0) win 512
14:01:45.225288 IP 111.111.111.111.80 > oldbox.2776: S 3796385036:3796385036(0) ack 1983626201 win 65535 <mss 1460>
14:01:45.227990 IP oldbox.2776 > 111.111.111.111.80: R 1983626201:1983626201(0) win 0

--
BRGDS. Alexey Vlasov.
* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-10-01 12:59 UTC
To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev

On Friday 1 October 2010 at 14:16 +0400, Alexey Vlasov wrote:
> I have also found that:
> 1. rx overruns keep increasing.
> 2. rx_queue_drop_packet_count keeps increasing.

So you are flooding the machine with packets; it's not an idle one? I
thought you were doing these experiments with light traffic.

> # ethtool -S eth0 | grep drop
> tx_dropped: 0
> rx_queue_drop_packet_count: 1260743751
> dropped_smbus: 0
> rx_queue_0_drops: 0
> rx_queue_1_drops: 0
> rx_queue_2_drops: 0
> rx_queue_3_drops: 0

Please post the full output of "ethtool -S eth0", not small parts of it.

> As a result I get duplicates:
>
> DUP! len=46 ip=111.111.111.111 ttl=58 DF id=27454 sport=80 flags=SA seq=0 win=65535 rtt=3137.8 ms

Are you playing with traffic shaping?

tc -s -d qdisc
* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-10-01 14:18 UTC
To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev

On Fri, Oct 01, 2010 at 02:59:26PM +0200, Eric Dumazet wrote:
> So you are flooding the machine with packets; it's not an idle one? I
> thought you were doing these experiments with light traffic.

No, it's an ordinary working shared-hosting server. There are about 1000
client websites on it, and I'm not flooding it deliberately. Frankly
speaking, I don't see any suspicious network activity.

> Please post the full output of "ethtool -S eth0", not small parts of it.

NIC statistics:
     rx_packets: 2973717440
     tx_packets: 3032670910
     rx_bytes: 1892633650741
     tx_bytes: 2536130682695
     rx_broadcast: 118773199
     tx_broadcast: 68013
     rx_multicast: 95257
     tx_multicast: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 95257
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 7939
     rx_queue_drop_packet_count: 1324025520
     rx_missed_errors: 146631
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 50715
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 344724062
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 1892633650741
     rx_csum_offload_good: 2973697420
     rx_csum_offload_errors: 6235
     tx_dma_out_of_sync: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 9327
     rx_smbus: 118531661
     dropped_smbus: 0
     tx_queue_0_packets: 797617475
     tx_queue_0_bytes: 630191908685
     tx_queue_1_packets: 719681297
     tx_queue_1_bytes: 625907304846
     tx_queue_2_packets: 718841556
     tx_queue_2_bytes: 620522418855
     tx_queue_3_packets: 796521255
     tx_queue_3_bytes: 646196024585
     rx_queue_0_packets: 788885797
     rx_queue_0_bytes: 458936338699
     rx_queue_0_drops: 0
     rx_queue_1_packets: 701354604
     rx_queue_1_bytes: 457490536453
     rx_queue_1_drops: 0
     rx_queue_2_packets: 791887663
     rx_queue_2_bytes: 534425333616
     rx_queue_2_drops: 0
     rx_queue_3_packets: 691579028
     rx_queue_3_bytes: 429887244557
     rx_queue_3_drops: 0

> Are you playing with traffic shaping?
>
> tc -s -d qdisc

No, nothing like that, no shapers.

# tc -s -d qdisc
bash: tc: command not found

--
BRGDS. Alexey Vlasov.
* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-10-01 15:27 UTC
To: Alexey Vlasov
Cc: Linux Kernel Mailing List, netdev, Jeff Kirsher, Emil Tantilov

On Friday 1 October 2010 at 18:18 +0400, Alexey Vlasov wrote:
> NIC statistics:
>      rx_packets: 2973717440
>      ...
>      rx_queue_drop_packet_count: 1324025520
>      ...
>      rx_queue_0_drops: 0
>      rx_queue_1_drops: 0
>      rx_queue_2_drops: 0
>      rx_queue_3_drops: 0

OK, the igb stats are wrong, at least for the rx_queue_drop_packet_count
field.

Here is a patch against 2.6.32.23, to give the idea... Don't trust it
unless you patch your kernel ;)

Thanks

Note: the current linux-2.6 tree doesn't have this bug.

[PATCH] igb: rx_fifo_errors counter fix

Alexey Vlasov reported insane rx_queue_drop_packet_count
(rx_fifo_errors) values.

The igb driver is doing an accumulation for 82575, instead of using a
zero value for rqdpc_total.

Reported-by: Alexey Vlasov <renton@renton.name>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 linux-2.6.32.23/net/igb/igb_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- linux-2.6.32.23/drivers/net/igb/igb_main.c.orig
+++ linux-2.6.32.23/drivers/net/igb/igb_main.c
@@ -3552,6 +3552,7 @@
 	struct e1000_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
 	u16 phy_tmp;
+	unsigned long rqdpc_total = 0;
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
@@ -3645,7 +3646,6 @@
 
 	if (hw->mac.type != e1000_82575) {
 		u32 rqdpc_tmp;
-		u64 rqdpc_total = 0;
 		int i;
 		/* Read out drops stats per RX queue. Notice RQDPC (Receive
 		 * Queue Drop Packet Count) stats only gets incremented, if
@@ -3660,7 +3660,6 @@
 			adapter->rx_ring[i].rx_stats.drops += rqdpc_tmp;
 			rqdpc_total += adapter->rx_ring[i].rx_stats.drops;
 		}
-		adapter->net_stats.rx_fifo_errors = rqdpc_total;
 	}
 
 	/* Note RNBC (Receive No Buffers Count) is an not an exact
@@ -3668,7 +3667,7 @@
 	 * one of the reason for saving it in rx_fifo_errors, as its
 	 * potentially not a true drop.
 	 */
-	adapter->net_stats.rx_fifo_errors += adapter->stats.rnbc;
+	adapter->net_stats.rx_fifo_errors = rqdpc_total + adapter->stats.rnbc;
 
 	/* RLEC on some newer hardware can be incorrect so build
 	 * our own version based on RUC and ROC */
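As a quick way to re-check the counters after rebuilding the driver with such
a fix (just a sketch using the counter names from the output above):

ethtool -S eth0 | egrep 'rx_queue_drop_packet_count|rx_queue_[0-9]_drops|rx_no_buffer_count|rx_missed_errors'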
* Re: Packet time delays on multi-core systems
From: Jeff Kirsher @ 2010-10-01 18:54 UTC
To: Eric Dumazet
Cc: Alexey Vlasov, Linux Kernel Mailing List, netdev, Emil Tantilov

On Fri, Oct 1, 2010 at 08:27, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [PATCH] igb: rx_fifo_errors counter fix
>
> Alexey Vlasov reported insane rx_queue_drop_packet_count
> (rx_fifo_errors) values.
>
> The igb driver is doing an accumulation for 82575, instead of using a
> zero value for rqdpc_total.
>
> Reported-by: Alexey Vlasov <renton@renton.name>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks Eric! I have added the patch to my queue.

--
Cheers,
Jeff
* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-09-30 12:30 UTC
To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev

On Wed, Sep 29, 2010 at 11:45:21PM +0200, Eric Dumazet wrote:
> But if you are sending SYN packets (which get logged) at the same time,
> that might delay the reception of, and replies to, the ICMP frames. The
> LOG target can be quite expensive...

Yes, it's clear that some slowdown can appear, but 100 ms is too much, and
in my example it happened with only about 200 SYN packets in 2 minutes. On
the old servers, whose NICs have no separate tx/rx queues, I don't see
anything like this even at more than 1000 SYN packets per second.

> Do other rules give the same problem? For example:
>
> iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN

No, only LOG behaves this way.

--
BRGDS. Alexey Vlasov.
* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-09-30 12:46 UTC
To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev

On Thursday 30 September 2010 at 16:30 +0400, Alexey Vlasov wrote:
> Yes, it's clear that some slowdown can appear, but 100 ms is too much, and
> in my example it happened with only about 200 SYN packets in 2 minutes. On
> the old servers, whose NICs have no separate tx/rx queues, I don't see
> anything like this even at more than 1000 SYN packets per second.

Because on those servers all CPUs were servicing interrupts, which was
good for your needs. Things apparently changed with 2.6.32: you have a
multiqueue NIC, but a single CPU handling the whole workload.

> > Do other rules give the same problem?
>
> No, only LOG behaves this way.
Thread overview: 16+ messages
[not found] <20100929191851.GC86786@beaver.vrungel.ru>
2010-09-29 21:45 ` Packet time delays on multi-core systems Eric Dumazet
2010-09-30 6:24 ` Alexey Vlasov
2010-09-30 6:33 ` Eric Dumazet
2010-09-30 12:23 ` Alexey Vlasov
2010-09-30 12:44 ` Eric Dumazet
2010-09-30 17:37 ` Alexey Vlasov
2010-09-30 18:03 ` Eric Dumazet
2010-09-30 18:15 ` Alexey Vlasov
2010-09-30 18:52 ` Eric Dumazet
2010-10-01 10:16 ` Alexey Vlasov
2010-10-01 12:59 ` Eric Dumazet
2010-10-01 14:18 ` Alexey Vlasov
2010-10-01 15:27 ` Eric Dumazet
2010-10-01 18:54 ` Jeff Kirsher
2010-09-30 12:30 ` Alexey Vlasov
2010-09-30 12:46 ` Eric Dumazet