From: Vladimir Ivashchenko <hazard@francoudi.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: netdev@vger.kernel.org
Subject: Re: bond + tc regression ?
Date: Tue, 5 May 2009 20:41:35 +0300
Message-ID: <20090505174135.GA29716@francoudi.com>
In-Reply-To: <4A0069F3.5030607@cosmosbay.com>

> > On both kernels, the system is running with at least 70% idle CPU.
> > The network interrupts are distributed across the cores.
> 
> You should not distribute interrupts, but bind a NIC to one CPU

Kernels 2.6.28 and 2.6.29 do this by default, so I thought it was correct.
Are the defaults wrong?

I have tried with IRQs bound to one CPU per NIC. Same result.
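For reference, binding an IRQ to a single CPU is normally done by writing a hexadecimal CPU bitmask to /proc/irq/<n>/smp_affinity. The following is a minimal Python sketch of that step, not the exact procedure used here. It assumes the IRQ numbers 57-60 that /proc/interrupts reports for eth0-eth3 in this message; the NIC-to-CPU mapping and the helper names are hypothetical. By default it only reports the writes it would perform; actually applying them needs root.

```python
# Hypothetical helper: bind each NIC IRQ to exactly one CPU.
# IRQ numbers come from the /proc/interrupts output in this message.
NIC_IRQS = {"eth0": 57, "eth1": 58, "eth2": 59, "eth3": 60}
# Assumption for illustration only: one distinct CPU per NIC.
NIC_CPU = {"eth0": 0, "eth1": 1, "eth2": 2, "eth3": 3}


def affinity_mask(cpu):
    """Return the hex bitmask with only bit `cpu` set."""
    return format(1 << cpu, "x")


def pin_irqs(nic_irqs, nic_cpu, dry_run=True):
    """Return the (path, mask) writes; perform them unless dry_run is set."""
    planned = []
    for nic, irq in sorted(nic_irqs.items()):
        path = f"/proc/irq/{irq}/smp_affinity"
        mask = affinity_mask(nic_cpu[nic])
        planned.append((path, mask))
        if not dry_run:
            with open(path, "w") as fh:
                fh.write(mask)
    return planned


if __name__ == "__main__":
    for path, mask in pin_irqs(NIC_IRQS, NIC_CPU):
        print(f"echo {mask} > {path}")
```

Note that a running irqbalance daemon, if present, may rewrite these masks again.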

> > I thought it was a e1000e driver issue, but tweaking e1000e ring buffers
> > didn't help. I tried using e1000 on 2.6.28 by adding necessary PCI IDs,
> > I tried running on a different server with bnx cards, I tried disabling
> > NO_HZ and HRTICK, but still I have the same problem.
> > 
> > However, if I don't utilize bond, but just apply rules on normal ethX
> > interfaces, there is no packet loss with 2.6.28/29. 
> > 
> > So, the problem appears only when I use 2.6.28/29 + bond + classful tc
> > combination. 
> > 
> > Any ideas ?
> > 
> 
> Yes, we need much more information :)
> Is it a forwarding setup only ?

Yes, the server is doing nothing else but forwarding, no iptables.

> cat /proc/interrupts

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:        130          0          0          0          0          0          0          0   IO-APIC-edge      timer
  1:          2          0          0          0          0          0          0          0   IO-APIC-edge      i8042
  3:          0          0          0          1          0          1          0          0   IO-APIC-edge
  4:          0          0          1          0          0          0          1          0   IO-APIC-edge
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          4          0          0          0          0          0          0          0   IO-APIC-edge      i8042
 14:          0          0          0          0          0          0          0          0   IO-APIC-edge      ata_piix
 15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ata_piix
 17:      30901      31910      31446      30655      31618      30550      31543      30958   IO-APIC-fasteoi   aacraid
 20:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 21:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb5, ahci
 22:     298387     297642     295508     294368     295533     295430     295275     296036   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 23:      10868      10926      10980      10738      10939      10615      10761      10909   IO-APIC-fasteoi   uhci_hcd:usb3
 57: 1486251823 1486835830 1486677250 1487105983 1488000303 1485941815 1487728317 1486624997   PCI-MSI-edge      eth0
 58: 1510676329 1509708161 1510347202 1509969755 1508599471 1511220118 1509094578 1509727616   PCI-MSI-edge      eth1
 59: 1482578890 1483618556 1482963700 1483164528 1484561615 1482130645 1484116749 1483557717   PCI-MSI-edge      eth2
 60: 1507341647 1506685822 1506862759 1506612818 1505689367 1507559672 1505911622 1506940613   PCI-MSI-edge      eth3
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC: 1020533656 1020535165 1020533613 1020534967 1020535173 1020534409 1020534985 1020534220   Local timer interrupts
RES:      18605      21215      15957      18637      22429      19493      16649      15589   Rescheduling interrupts
CAL:        160        214        186        185        199        205        190        180   Function call interrupts
TLB:     259515     264126     309016     312222     263163     265601     306189     305430   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0
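To make the "distributed across the cores" point concrete, here is a small Python sketch that turns one /proc/interrupts row into per-CPU shares. The function name is hypothetical, and the eth0 row is copied from the output above.

```python
def cpu_shares(line, ncpus=8):
    """Return each CPU's fraction of the interrupts counted on one row."""
    fields = line.split()
    # fields[0] is the IRQ label ("57:"); the next ncpus fields are counts.
    counts = [int(x) for x in fields[1:1 + ncpus]]
    total = sum(counts)
    return [c / total for c in counts]


ETH0_ROW = (
    " 57: 1486251823 1486835830 1486677250 1487105983 "
    "1488000303 1485941815 1487728317 1486624997   PCI-MSI-edge      eth0"
)

if __name__ == "__main__":
    for cpu, share in enumerate(cpu_shares(ETH0_ROW)):
        print(f"CPU{cpu}: {share:.4%}")
```

Every CPU ends up with very close to one eighth of the eth0 interrupts, i.e. the IRQ is spread evenly rather than bound to one core.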

> tc -s -d qdisc

For testing, I just used "tc qdisc add dev $IFACE root handle 1: prio" and no filters at all.
I get the same result with HTB ("tc qdisc add dev $IFACE root handle 1: htb default 99") and no subclasses.
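For anyone trying to reproduce this, the test configurations can be generated per interface. A minimal Python sketch, assuming the interfaces are bond0 and bond1 as in the output below; the function name is hypothetical, the command strings are the ones quoted in this message (including the "tc qdisc del" command mentioned further down), and nothing is executed.

```python
def root_qdisc_cmds(iface):
    """Return the tc command lines used in the tests, for one interface."""
    return {
        # Classful prio root qdisc, no filters.
        "prio": f"tc qdisc add dev {iface} root handle 1: prio",
        # HTB root qdisc with a default class id and no subclasses.
        "htb": f"tc qdisc add dev {iface} root handle 1: htb default 99",
        # Remove the root qdisc again.
        "del": f"tc qdisc del dev {iface} root",
    }


if __name__ == "__main__":
    for iface in ("bond0", "bond1"):
        for name, cmd in root_qdisc_cmds(iface).items():
            print(f"{name:4s} {cmd}")
```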

qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 13287736273644 bytes 1263672018 pkt (dropped 0, overlimits 0 requeues 2928480094)
 rate 0bit 0pps backlog 0b 0p requeues 2928480094
qdisc pfifo_fast 0: dev eth1 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 40064376195000 bytes 1747026586 pkt (dropped 0, overlimits 0 requeues 463621814)
 rate 0bit 0pps backlog 0b 0p requeues 463621814
qdisc pfifo_fast 0: dev eth2 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 13350145517965 bytes 1350897201 pkt (dropped 0, overlimits 0 requeues 2930879507)
 rate 0bit 0pps backlog 0b 0p requeues 2930879507
qdisc pfifo_fast 0: dev eth3 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 40193456126884 bytes 1950653764 pkt (dropped 0, overlimits 0 requeues 465511120)
 rate 0bit 0pps backlog 0b 0p requeues 465511120
qdisc prio 1: dev bond0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 985164834 bytes 2720991 pkt (dropped 241834, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc prio 1: dev bond1 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 2347118738 bytes 3089171 pkt (dropped 304601, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

** Drops on bond0/bond1 are increasing by approximately 5000 per second:

qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 13287874353796 bytes 1264050808 pkt (dropped 0, overlimits 0 requeues 2928520779)
 rate 0bit 0pps backlog 0b 0p requeues 2928520779
qdisc pfifo_fast 0: dev eth1 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 40064706826018 bytes 1747459793 pkt (dropped 0, overlimits 0 requeues 463669610)
 rate 0bit 0pps backlog 0b 0p requeues 463669610
qdisc pfifo_fast 0: dev eth2 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 13350283202695 bytes 1351277761 pkt (dropped 0, overlimits 0 requeues 2930918488)
 rate 0bit 0pps backlog 0b 0p requeues 2930918488
qdisc pfifo_fast 0: dev eth3 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 40193784868074 bytes 1951084029 pkt (dropped 0, overlimits 0 requeues 465558015)
 rate 0bit 0pps backlog 0b 0p requeues 465558015
qdisc prio 1: dev bond0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1260929539 bytes 3480340 pkt (dropped 311145, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc prio 1: dev bond1 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 3006490946 bytes 3952643 pkt (dropped 396850, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

With the same setup on 2.6.23, drops increase by only about 50/sec.

As soon as I do "tc qdisc del dev $IFACE root", packet loss stops.
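The two "tc -s -d qdisc" snapshots above can be compared mechanically. A minimal Python sketch that parses the "Sent ... (dropped ...)" lines and reports the change in the drop counter per device; the function names are hypothetical and the embedded text is copied from the bond0/bond1 entries above. The interval between the snapshots is not recorded in this message, so only deltas are computed, not rates.

```python
import re

# Matches a qdisc header line followed by its "Sent ..." statistics line.
STAT_RE = re.compile(
    r"qdisc \S+ \S+ dev (?P<dev>\S+) [^\n]*\n"
    r"\s*Sent (?P<bytes>\d+) bytes (?P<pkts>\d+) pkt "
    r"\(dropped (?P<dropped>\d+), overlimits \d+ requeues (?P<requeues>\d+)\)"
)


def parse_tc_stats(text):
    """Map device name -> counters found in `tc -s qdisc` output."""
    keys = ("bytes", "pkts", "dropped", "requeues")
    return {
        m.group("dev"): {k: int(m.group(k)) for k in keys}
        for m in STAT_RE.finditer(text)
    }


def drop_deltas(before, after):
    """Return the increase in `dropped` per device between two snapshots."""
    a, b = parse_tc_stats(before), parse_tc_stats(after)
    return {dev: b[dev]["dropped"] - a[dev]["dropped"] for dev in a if dev in b}


FIRST = """\
qdisc prio 1: dev bond0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 985164834 bytes 2720991 pkt (dropped 241834, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc prio 1: dev bond1 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 2347118738 bytes 3089171 pkt (dropped 304601, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
"""

SECOND = """\
qdisc prio 1: dev bond0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1260929539 bytes 3480340 pkt (dropped 311145, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc prio 1: dev bond1 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 3006490946 bytes 3952643 pkt (dropped 396850, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
"""

if __name__ == "__main__":
    print(drop_deltas(FIRST, SECOND))
```

Dividing each delta by the number of seconds between two timed snapshots would give a drops-per-second figure.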

> cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 17
        Partner Key: 4
        Partner Mac Address: 00:19:e7:b2:07:80

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:24:bd:e9:cc
Aggregator ID: 1

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:24:bd:e9:ce
Aggregator ID: 1

> cat /proc/net/bonding/bond1

Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 2
        Actor Key: 17
        Partner Key: 5
        Partner Mac Address: 00:19:e7:b2:07:80

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:24:bd:e9:cd
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 2
Permanent HW addr: 00:1b:24:bd:e9:cf
Aggregator ID: 2
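
Both bonds use the layer3+4 transmit hash policy. As far as I know, the kernel bonding documentation of this period describes it, for unfragmented TCP/UDP over IPv4, as ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo slave count. The Python sketch below implements only that documented formula; the function name is hypothetical, the addresses are example values, and the real driver code may differ between kernel versions.

```python
import ipaddress


def layer34_slave(src_ip, dst_ip, src_port, dst_port, slave_count):
    """Pick a slave index using the documented layer3+4 formula (IPv4 only)."""
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % slave_count


if __name__ == "__main__":
    # Example flow; both directions of a flow hash to the same slave index.
    print(layer34_slave("10.0.0.1", "10.0.0.2", 1000, 2000, 2))
    print(layer34_slave("10.0.0.2", "10.0.0.1", 2000, 1000, 2))
```

With two slaves per bond, as here, the result is simply the low bit of the XORed ports and addresses, so all packets of one flow leave through the same slave.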


> mpstat -P ALL 10

08:04:36 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
08:04:46 PM  all    0.00    0.00    0.01    0.00    0.00    1.05    0.00   98.94  70525.73
08:04:46 PM    0    0.00    0.00    0.00    0.00    0.00    0.70    0.00   99.30   7814.41
08:04:46 PM    1    0.00    0.00    0.00    0.00    0.00    2.10    0.00   97.90   7814.41
08:04:46 PM    2    0.00    0.00    0.00    0.00    0.00    0.20    0.00   99.80   7814.41
08:04:46 PM    3    0.00    0.00    0.10    0.00    0.00    1.30    0.00   98.60   7814.51
08:04:46 PM    4    0.00    0.00    0.00    0.00    0.00    0.50    0.00   99.50   7814.41
08:04:46 PM    5    0.00    0.00    0.00    0.00    0.00    1.90    0.00   98.10   7814.41
08:04:46 PM    6    0.00    0.00    0.00    0.00    0.00    0.60    0.00   99.40   7814.41
08:04:46 PM    7    0.00    0.00    0.10    0.00    0.00    0.90    0.00   99.00   7814.51
08:04:46 PM    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

08:04:46 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
08:04:56 PM  all    0.00    0.00    0.01    0.00    0.00    1.49    0.00   98.50  66429.30
08:04:56 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   7303.50
08:04:56 PM    1    0.00    0.00    0.00    0.00    0.00    1.60    0.00   98.40   7303.50
08:04:56 PM    2    0.00    0.00    0.00    0.00    0.00    1.20    0.00   98.80   7303.50
08:04:56 PM    3    0.00    0.00    0.00    0.00    0.00    3.20    0.00   96.80   7303.40
08:04:56 PM    4    0.00    0.00    0.00    0.00    0.00    1.90    0.00   98.10   7303.60
08:04:56 PM    5    0.00    0.00    0.00    0.00    0.00    1.20    0.00   98.80   7303.50
08:04:56 PM    6    0.00    0.00    0.10    0.00    0.00    1.80    0.00   98.10   7303.50
08:04:56 PM    7    0.00    0.00    0.00    0.00    0.00    1.20    0.00   98.80   7303.50
08:04:56 PM    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

> ifconfig -a

bond0     Link encap:Ethernet  HWaddr 00:1B:24:BD:E9:CC
          inet addr:xxx.xxx.135.44  Bcast:xxx.xxx.135.47  Mask:255.255.255.248
          inet6 addr: fe80::21b:24ff:febd:e9cc/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:436076190 errors:0 dropped:391250 overruns:0 frame:0
          TX packets:2620156321 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4210046233 (3.9 GiB)  TX bytes:2520272242 (2.3 GiB)

bond1     Link encap:Ethernet  HWaddr 00:1B:24:BD:E9:CD
          inet addr:xxx.xxx.70.156  Bcast:xxx.xxx.70.159  Mask:255.255.255.248
          inet6 addr: fe80::21b:24ff:febd:e9cd/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:239471641 errors:0 dropped:344 overruns:0 frame:0
          TX packets:3704083902 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2488754745 (2.3 GiB)  TX bytes:2685275089 (2.5 GiB)

eth0      Link encap:Ethernet  HWaddr 00:1B:24:BD:E9:CC
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2235085582 errors:0 dropped:353786 overruns:0 frame:0
          TX packets:1266449269 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3768096439 (3.5 GiB)  TX bytes:113363829 (108.1 MiB)
          Memory:fc6e0000-fc700000

eth1      Link encap:Ethernet  HWaddr 00:1B:24:BD:E9:CD
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:4228974804 errors:0 dropped:344 overruns:0 frame:0
          TX packets:1750216649 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3350270261 (3.1 GiB)  TX bytes:3358220645 (3.1 GiB)
          Memory:fc6c0000-fc6e0000

eth2      Link encap:Ethernet  HWaddr 00:1B:24:BD:E9:CC
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2495958020 errors:0 dropped:37464 overruns:0 frame:0
          TX packets:1353707165 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:442055526 (421.5 MiB)  TX bytes:2406943933 (2.2 GiB)
          Memory:fcde0000-fce00000

eth3      Link encap:Ethernet  HWaddr 00:1B:24:BD:E9:CD
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:305464222 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1953867360 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3433479245 (3.1 GiB)  TX bytes:3622113909 (3.3 GiB)
          Memory:fcd80000-fcda0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:53537 errors:0 dropped:0 overruns:0 frame:0
          TX packets:53537 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:431006433 (411.0 MiB)  TX bytes:431006433 (411.0 MiB)


NOTE: ifconfig drops on bond0/bond1 are *NOT* increasing. These drops are there from before.
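
One way to check that an interface counter is not moving is to sample it twice from sysfs. A minimal Python sketch, assuming the usual /sys/class/net/<iface>/statistics/ layout; the function names and the 10-second default interval are illustrative assumptions.

```python
import os
import time


def read_counter(iface, name="rx_dropped", root="/sys/class/net"):
    """Read one interface statistics counter from sysfs."""
    with open(os.path.join(root, iface, "statistics", name)) as fh:
        return int(fh.read().strip())


def counter_delta(iface, interval=10.0, name="rx_dropped", root="/sys/class/net"):
    """Return how much the counter grew over `interval` seconds."""
    first = read_counter(iface, name, root)
    time.sleep(interval)
    return read_counter(iface, name, root) - first


if __name__ == "__main__":
    for iface in ("bond0", "bond1"):
        try:
            print(iface, counter_delta(iface, interval=10.0))
        except OSError as exc:
            print(iface, "not readable:", exc)
```

A delta of 0 on bond0/bond1 while the tc "dropped" counter keeps climbing would confirm that the loss is happening in the qdisc and not on receive.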

-- 
Best Regards
Vladimir Ivashchenko
Chief Technology Officer
PrimeTel, Cyprus - www.prime-tel.com
