From: Jarek Poplawski <jarkao2@gmail.com>
To: Calin Velea <vcalinus@gemenii.ro>
Cc: Radu Rendec <radu.rendec@ines.ro>,
	Jesper Dangaard Brouer <hawk@diku.dk>,
	Denys Fedoryschenko <denys@visp.net.lb>,
	netdev@vger.kernel.org
Subject: Re: htb parallelism on multi-core platforms
Date: Fri, 24 Apr 2009 11:19:40 +0000
Message-ID: <20090424111940.GA6450@ff.dom.local>
In-Reply-To: <1039493214.20090424135024@gemenii.ro>

On Fri, Apr 24, 2009 at 01:50:24PM +0300, Calin Velea wrote:
> Hi,
Hi,

Very interesting message, but try to use plain format next time.
I guess your mime/html original wasn't accepted by netdev@.

Jarek P.

> 
>   Maybe some actual results I got some time ago could help you and others who had the same problems:
> 
> Hardware: quad-core Xeon X3210 (2.13GHz, 8M  L2 cache), 2 Intel PCI Express Gigabit NICs
> Kernel: 2.6.20
> 
>   I did some udp flood tests in the following configurations - the machine was configured as a
> traffic shaping bridge, about 10k htb rules loaded, using hashing (see below):
> 
> A) napi on,  irqs for each card statically allocated to 2 CPU cores
> 
> when flooding, the same CPU went 100% softirq always (seems logical,
> since it is statically bound to the irq)
> 
> B) napi on, CONFIG_IRQBALANCE=y
> 
> when flooding, a random CPU went 100% softirq always. (here,
> at high interrupt rates, NAPI kicks in and starts using polling
> rather than irqs, so no more balancing takes place since there are 
> no more interrupts - checked this with /proc/interrupts - at high packet 
> rates the irq counters for the network cards stalled)
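
The /proc/interrupts check described in case B can be scripted. Below is a minimal sketch, assuming the usual layout of that file (a header row of CPU names, then one row per IRQ with per-CPU counts followed by a description). The function names and the device name "eth0" are illustrative assumptions, not taken from the original setup.

```python
def irq_total(interrupts_text, device):
    """Sum the per-CPU interrupt counts of every IRQ line naming `device`."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label ("16:"), then one count per CPU,
        # then the controller name and the device name(s).
        counts, description = fields[1:1 + ncpus], fields[1 + ncpus:]
        if device in (word.rstrip(",") for word in description):
            total += sum(int(c) for c in counts)
    return total


def counters_stalled(before_text, after_text, device):
    """True if the device's interrupt count did not move between two samples."""
    return irq_total(after_text, device) == irq_total(before_text, device)
```

To use it, read /proc/interrupts twice a second or so apart during the flood and pass both snapshots to counters_stalled(). A True result matches the stall described above, where NAPI has switched to polling.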
> 
> C) napi off, CONFIG_IRQBALANCE=y
> 
> this is the setup I used in the end since all CPU cores were used. All of them
> went to 100%, and the pps rate I could pass through was higher than in 
> case A or B.
> 
> 
>   Also, your worst case hashing setup could be improved - I suggest you take a look at 
> http://vcalinus.gemenii.ro/?p=9 (see the generated filters example). The hashing method 
> described there will take a constant CPU time (4 checks) for each packet, regardless of how many 
> filter rules you have (provided you only filter by IP address). A tree of hashtables
> is constructed which matches each of the four bytes from the IP address in succession.
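
To illustrate why the cost stays constant, here is a minimal sketch of the lookup idea in Python rather than in tc/u32 syntax. It models the tree of hashtables as nested dicts keyed by one address byte per level. The function names and the class IDs are hypothetical, and this is not the filter generator from the linked page.

```python
import ipaddress


def build_tree(rules):
    """Build a 4-level tree of hash tables from (ip_string, class_id) pairs."""
    root = {}
    for ip, class_id in rules:
        octets = ipaddress.IPv4Address(ip).packed  # the four address bytes
        node = root
        for byte in octets[:3]:
            node = node.setdefault(byte, {})
        node[octets[3]] = class_id
    return root


def classify(root, ip):
    """Return (class_id or None, number of table checks performed)."""
    node = root
    checks = 0
    for byte in ipaddress.IPv4Address(ip).packed:
        checks += 1
        node = node.get(byte)
        if node is None:
            return None, checks
    return node, checks
```

A matching address always costs exactly four checks, one per byte, no matter how many rules were loaded. A miss can return earlier.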
> 
>   Using this hashing method, the hardware above, 2.6.20 with napi off and irq balancing on, I got 
> throughput of 1.3Gbps / 250.000 pps  aggregated in+out in normal usage. CPU utilization 
> averages varied between 25 - 50 % for every core, so there was still room to grow. 
>   I expect much higher pps rates with better hardware (higher freq/larger cache Xeons).
> 
> 
> 
> Thursday, April 23, 2009, 3:31:47 PM, you wrote:
> 
> > On Wed, 2009-04-22 at 23:29 +0200, Jesper Dangaard Brouer wrote:
> >> It's runtime adjustable, so it's easy to try out.
> 
> >>   via /sys/module/sch_htb/parameters/htb_hysteresis
> 
> > Thanks for the tip! This means I can play around with various values
> > while the machine is in production and see how it reacts.
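
A minimal sketch of reading and changing the parameter from a script, assuming the sysfs path quoted above exists on the running kernel and the script runs as root. The helper names are illustrative.

```python
from pathlib import Path

# Path quoted in the thread; present only when sch_htb exposes the parameter.
HTB_HYSTERESIS = Path("/sys/module/sch_htb/parameters/htb_hysteresis")


def read_param(path=HTB_HYSTERESIS):
    """Return the current integer value of a module parameter file."""
    return int(Path(path).read_text().strip())


def write_param(value, path=HTB_HYSTERESIS):
    """Write an integer value to a module parameter file (needs root)."""
    Path(path).write_text(f"{int(value)}\n")
```

Calling write_param(0) and then read_param() would show whether the change took effect without reloading the module.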
> 
> >> The HTB classify hash has a scalability issue in kernels below 2.6.26. 
> >> Patrick McHardy fixed that up in 2.6.26.  What kernel version are you 
> >> using?
> 
> > I'm using 2.6.26, so I guess the fix is already there :(
> 
> >> Could you explain how you do classification? And perhaps outline where your 
> >> possible scalability issue is located?
> 
> >> If you are interested how I do scalable classification, see my 
> >> presentation from Netfilter Workshop 2008:
> 
> >>   http://nfws.inl.fr/en/?p=115
> >>   http://www.netoptimizer.dk/presentations/nfsw2008/Jesper-Brouer_Large-iptables-rulesets.pdf
> 
> > I had a look at your presentation and it seems to be focused on dividing
> > a single iptables rule chain into multiple chains, so that rule lookup
> > complexity decreases from linear to logarithmic.
> 
> > Since I only need to do shaping, I don't use iptables at all. Address
> > matching is all done on the egress side, using u32. Rule schema is
> > this:
> 
> > 1. We have two /19 networks that differ pretty much in the first bits:
> > 80.x.y.z and 83.a.b.c; customer address spaces range from /22 nets to
> > individual /32 addresses.
> 
> > 2. The default ip hash (0x800) is size 1 (only one bucket) and has two
> > rules that select between two subsequent hash tables (say 0x100 and
> > 0x101) based on the most significant bits in the address.
> 
> > 3. Level 2 hash tables (0x100 and 0x101) are size 256 (256 buckets);
> > bucket selection is done by bits b10 - b17 (with b0 being the least
> > significant).
> 
> > 4. Each bucket contains complete cidr match rules (corresponding to real
> > customer addresses). Since bits b10 - b31 are already checked in upper
> > levels, this results in a maximum of 2 ^ 10 = 1024 rules, which is the
> > worst case, if all customer addresses that "fall" into that bucket
> > are /32 (fortunately this is not the real case).
> 
> > In conclusion, each packet would be matched against at most 1026 rules
> > (worst case: 2 top-level rules plus 1024 in one bucket). The real case is
> > actually much better: only one bucket has 400 rules, all others have fewer
> > than 70 rules, and most of them fewer than 10.
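
The arithmetic in Radu's scheme can be checked with a short sketch. It assumes the bucket index is taken from bits b10 - b17 of the address, as in step 3, so bits b0 - b9 are the ones left to match inside a bucket. The function and constant names are illustrative.

```python
import ipaddress


def bucket_index(ip):
    """Select one of 256 buckets from bits b10..b17 (b0 = least significant)."""
    addr = int(ipaddress.IPv4Address(ip))
    return (addr >> 10) & 0xFF


TOP_LEVEL_RULES = 2   # the two rules in the default 0x800 table
UNCHECKED_BITS = 10   # b0..b9 remain to be matched inside a bucket

# Worst case: every address in one bucket is a distinct /32.
worst_case = TOP_LEVEL_RULES + 2 ** UNCHECKED_BITS
```

With these assumptions worst_case comes out to 1026, the figure quoted above, and every address in the same /22 maps to the same bucket.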
> 
> >> > I guess htb_hysteresis only affects the actual shaping (which takes 
> >> > place after the packet is classified).
> 
> >> Yes, htb_hysteresis basically is a hack to allow extra bursts... we 
> >> actually considered removing it completely...
> 
> > It's definitely worth a try at least. Thanks for the tips!
> 
> > Radu Rendec
> 
> 
> -- 
> Best regards,
>  Calin                            mailto:calin.velea@gemenii.ro
