netdev.vger.kernel.org archive mirror
From: Jarek Poplawski <jarkao2@gmail.com>
To: Calin Velea <vcalinus@gemenii.ro>
Cc: Radu Rendec <radu.rendec@ines.ro>,
	Jesper Dangaard Brouer <hawk@diku.dk>,
	Denys Fedoryschenko <denys@visp.net.lb>,
	netdev@vger.kernel.org
Subject: Re: htb parallelism on multi-core platforms
Date: Fri, 24 Apr 2009 11:19:40 +0000
Message-ID: <20090424111940.GA6450@ff.dom.local>
In-Reply-To: <1039493214.20090424135024@gemenii.ro>

On Fri, Apr 24, 2009 at 01:50:24PM +0300, Calin Velea wrote:
> Hi,
Hi,

Very interesting message, but try to use plain format next time.
I guess your mime/html original wasn't accepted by netdev@.

Jarek P.

> 
>   Maybe some actual results I got some time ago could help you and others who had the same problems:
> 
> Hardware: quad-core Xeon X3210 (2.13 GHz, 8 MB L2 cache), 2 Intel PCI Express Gigabit NICs
> Kernel: 2.6.20
> 
>   I did some UDP flood tests in the following configurations. The machine was configured as a
> traffic-shaping bridge with about 10k HTB rules loaded, using hashing (see below):
> 
> A) NAPI on, IRQs for each card statically bound to 2 CPU cores
> 
> When flooding, the same CPU always went to 100% softirq (which seems
> logical, since it is statically bound to the IRQ).
> 
> B) NAPI on, CONFIG_IRQBALANCE=y
> 
> When flooding, a random CPU always went to 100% softirq. At high
> interrupt rates NAPI kicks in and switches from interrupts to polling,
> so no further balancing takes place because there are no more
> interrupts. I checked this in /proc/interrupts: at high packet rates
> the IRQ counters for the network cards stalled.
> 
> C) NAPI off, CONFIG_IRQBALANCE=y
> 
> This is the setup I used in the end, since all CPU cores were used. All
> of them went to 100%, and the pps rate I could pass through was higher
> than in case A or B.
> 
> 
>   Also, your worst-case hashing setup could be improved. I suggest you take a look at
> http://vcalinus.gemenii.ro/?p=9 (see the generated filters example). The hashing method
> described there takes constant CPU time (4 checks) per packet, regardless of how many
> filter rules you have (provided you only filter by IP address). It builds a tree of hash tables
> that matches each of the four bytes of the IP address in succession.
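To illustrate why the lookup cost stays constant, here is a minimal Python model of such a byte-indexed tree. It assumes only /32 host rules and counts one "check" per hash-table level; it is not the tc u32 implementation, does not reproduce the generated filters from the linked page, and the class ids and the 10.0.0.0/18 address range are hypothetical.

```python
def ip_bytes(addr: str) -> list:
    """Split a dotted-quad IPv4 address into its four byte values."""
    parts = [int(p) for p in addr.split(".")]
    if len(parts) != 4 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {addr!r}")
    return parts


def insert(root: dict, addr: str, classid: str) -> None:
    """Add a /32 host rule: one nested table per address byte."""
    node = root
    *upper, last = ip_bytes(addr)
    for b in upper:
        node = node.setdefault(b, {})  # child table selected by this byte
    node[last] = classid               # leaf: the (hypothetical) HTB class


def lookup(root: dict, addr: str) -> tuple:
    """Return (classid or None, number of table lookups performed)."""
    node = root
    checks = 0
    for b in ip_bytes(addr):
        checks += 1                    # one hashed lookup per byte
        if b not in node:
            return None, checks        # miss: stop at this level
        node = node[b]
    return node, checks                # a hit always costs 4 lookups


# Hypothetical example: 10,000 hosts in 10.0.0.0/18, one class each.
ROOT: dict = {}
for i in range(10000):
    insert(ROOT, f"10.0.{i // 256}.{i % 256}", f"1:{i}")
```

Adding more hosts only widens the tables at each level; it never adds levels, so a hit costs 4 lookups whether there are 10 rules or 10,000.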
> 
>   Using this hashing method on the hardware above (2.6.20, NAPI off, IRQ balancing on), I got
> throughputs of 1.3 Gbps / 250,000 pps aggregated in+out in normal usage. Average CPU utilization
> varied between 25% and 50% on every core, so there was still room to grow.
>   I expect much higher pps rates with better hardware (higher-frequency, larger-cache Xeons).
> 
> 
> 
> Thursday, April 23, 2009, 3:31:47 PM, you wrote:
> 
> > On Wed, 2009-04-22 at 23:29 +0200, Jesper Dangaard Brouer wrote:
> >> It's runtime adjustable, so it's easy to try out.
> 
> >>   via /sys/module/sch_htb/parameters/htb_hysteresis
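A small sketch of reading and toggling that parameter from a script, using the sysfs path quoted above. It assumes sch_htb is loaded and the script runs as root. It also assumes that the kernel treats htb_hysteresis as an on/off flag, so only 0 or 1 is written. The helper names are hypothetical.

```python
from pathlib import Path

# Path quoted in the message above; it exists only while sch_htb is loaded.
HTB_HYSTERESIS = Path("/sys/module/sch_htb/parameters/htb_hysteresis")


def read_hysteresis(path: Path = HTB_HYSTERESIS) -> int:
    """Return the current htb_hysteresis setting as an integer."""
    return int(path.read_text().strip())


def write_hysteresis(enabled: bool, path: Path = HTB_HYSTERESIS) -> None:
    """Switch hysteresis on or off at runtime (needs root on a real system).

    Assumption: sch_htb uses the value as a flag, so only 0/1 are written.
    """
    path.write_text("1\n" if enabled else "0\n")
```

On a production box one could call `write_hysteresis(False)`, watch CPU load and shaping accuracy for a while, and switch back if needed.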
> 
> > Thanks for the tip! This means I can play around with various values
> > while the machine is in production and see how it reacts.
> 
> >> The HTB classify hash has a scalability issue in kernels below 2.6.26. 
> >> Patrick McHardy fixes that up in 2.6.26.  What kernel version are you 
> >> using?
> 
> > I'm using 2.6.26, so I guess the fix is already there :(
> 
> >> Could you explain how you do classification? And perhaps outline where your
> >> possible scalability issue is located?
> 
> >> If you are interested how I do scalable classification, see my 
> >> presentation from Netfilter Workshop 2008:
> 
> >>   http://nfws.inl.fr/en/?p=115
> >>   http://www.netoptimizer.dk/presentations/nfsw2008/Jesper-Brouer_Large-iptables-rulesets.pdf
> 
> > I had a look at your presentation and it seems to focus on dividing
> > a single iptables rule chain into multiple chains, so that rule lookup
> > complexity decreases from linear to logarithmic.
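The linear-versus-logarithmic difference can be seen by counting rule tests. This is a generic Python illustration, not the chain-generation method from the presentation. It assumes the rules are sorted single addresses, and that each level of sub-chains is one range-match "jump" rule that halves the remaining set.

```python
def linear_checks(rules: list, addr: int) -> int:
    """Rules tested when scanning one flat chain from the top."""
    for i, r in enumerate(rules, start=1):
        if r == addr:
            return i
    return len(rules)


def tree_checks(rules: list, addr: int) -> int:
    """Rules tested when the sorted chain is split in half at each level."""
    lo, hi = 0, len(rules)
    checks = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        checks += 1          # one jump rule decides which half to enter
        if addr < rules[mid]:
            hi = mid
        else:
            lo = mid
    return checks + 1        # final exact-match rule in the leaf chain
```

With 1024 rules, the last address costs 1024 tests in the flat chain but only 10 jumps plus 1 match (11 tests) in the split layout.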
> 
> > Since I only need to do shaping, I don't use iptables at all. Address
> > matching is all done on the egress side, using u32. The rule schema is
> > as follows:
> 
> > 1. We have two /19 networks that differ in the first octet:
> > 80.x.y.z and 83.a.b.c; customer address spaces range from /22 nets to
> > individual /32 addresses.
> 
> > 2. The default ip hash (0x800) is size 1 (only one bucket) and has two
> > rules that select between two subsequent hash tables (say 0x100 and
> > 0x101) based on the most significant bits in the address.
> 
> > 3. Level 2 hash tables (0x100 and 0x101) are size 256 (256 buckets);
> > bucket selection is done by bits b10 - b17 (with b0 being the least
> > significant).
> 
> > 4. Each bucket contains complete CIDR match rules (corresponding to real
> > customer addresses). Since bits b10 - b31 are already checked in upper
> > levels, this results in a maximum of 2 ^ 10 = 1024 rules, which is the
> > worst case, if all customer addresses that "fall" into that bucket
> > are /32 (fortunately this is not the real case).
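The bucket selection and the worst-case count can be checked with a few lines of Python. This sketch assumes the level-2 hash key is exactly bits b10 to b17 of the destination address (b0 = least significant bit), as described in item 3. It also assumes, as the 2 ^ 10 figure implies, that bits b10 to b31 are fixed once a bucket is reached. The names are hypothetical, and this models the arithmetic only, not u32's internals.

```python
import ipaddress

HASH_SHIFT = 10                                    # lowest key bit (b10)
HASH_BITS = 8                                      # b10..b17 -> 256 buckets
HASH_MASK = ((1 << HASH_BITS) - 1) << HASH_SHIFT   # 0x0003fc00


def bucket(addr: str) -> int:
    """Level-2 bucket index: bits b10..b17 of the IPv4 address."""
    value = int(ipaddress.IPv4Address(addr))
    return (value & HASH_MASK) >> HASH_SHIFT


def worst_case_rules(lowest_fixed_bit: int) -> int:
    """Max /32 rules sharing a bucket when bits lowest_fixed_bit..b31 are fixed."""
    return 2 ** lowest_fixed_bit


# Two selector rules in hash table 0x800 plus one full bucket of /32 rules.
WORST_CASE_TOTAL = 2 + worst_case_rules(HASH_SHIFT)
```

For example, `bucket("80.1.4.0")` is 65 because bit b16 (from the second octet) and bit b10 (from the third octet) are set, and `WORST_CASE_TOTAL` evaluates to 1026.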
> 
> > In conclusion, each packet would be matched against at most 1026 rules
> > (worst case). The real case is actually much better: only one bucket
> > has 400 rules, all others have fewer than 70 rules, and most of them
> > fewer than 10.
> 
> >> > I guess htb_hysteresis only affects the actual shaping (which takes 
> >> > place after the packet is classified).
> 
> >> Yes, htb_hysteresis basically is a hack to allow extra bursts... we 
> >> actually considered removing it completely...
> 
> > It's definitely worth a try at least. Thanks for the tips!
> 
> > Radu Rendec
> 
> 
> 
> 
> -- 
> Best regards,
>  Calin                            mailto:calin.velea@gemenii.ro

