netdev.vger.kernel.org archive mirror
From: Mark Bergsma <mark@wikimedia.org>
To: netdev@vger.kernel.org
Subject: Re: [PATCH] IPVS: Allow boot time change of hash size.
Date: Mon, 28 Dec 2009 19:49:38 +0100
Message-ID: <4B38FDC2.9000507@wikimedia.org>
In-Reply-To: <20081202.163729.158874214.davem@davemloft.net>

On 03-12-08 01:37, David Miller wrote:
> From: "Catalin(ux) M. BOIE" <catab@embedromix.ro>
> Date: Tue, 2 Dec 2008 16:16:04 -0700 (MST)
>> I was looking for anything that could get me past of 88.000 request per
>> seconds.
>> The help text told me to raise that value if I have big number of
>> connections. I just needed an easy way to test.
> 
> You're just repeating what I said, you "think" it should be
> changed and as a result you are wasting everyones time.
> 
> You don't actually "know", you're just guessing using random
> snippets from documentation rather than good hard evidence of
> a need.

Hello,

I just found this year-old thread about a patch allowing the IPVS
connection hash table size to be set at load time by a module parameter.
Apparently the conclusion reached was that allowing this configuration
setting to be changed would be useless, and that the poster's
performance problems would likely lie elsewhere, since he had no
evidence it was caused by the hash table size.

We do, however, run into the same problem with the default setting
(2^12 = 4096 buckets), as most of our LVS balancers handle around a
million connections/SLAB entries at any point in time (around 100-150
kpps load). With only 4096 hash buckets, each bucket holds a linked
list of roughly 256 connections *on average*.
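
The arithmetic behind that figure can be sketched as follows. This is
only an illustration in Python, assuming the hash spreads connections
evenly over the buckets; the function name is made up for this example
and is not part of IPVS. "Around a million" is taken both as 10^6 and
as 2^20, the latter giving exactly the 256 quoted above.

```python
# Mean chain length of a chained hash table: entries / buckets.
# Assumption: the hash distributes connections uniformly over buckets.

def avg_chain_length(connections: int, tab_bits: int) -> float:
    """Average number of connections per bucket for 2**tab_bits buckets."""
    return connections / (1 << tab_bits)

for connections in (1_000_000, 1 << 20):
    for tab_bits in (12, 18):  # default size vs. the recompiled kernel
        chain = avg_chain_length(connections, tab_bits)
        print(f"{connections:>8} connections, 2^{tab_bits} buckets: "
              f"{chain:.2f} per bucket")
```

Going from 12 to 18 bits divides the average chain length by 2^6 = 64,
so a lookup walks a handful of entries instead of a few hundred.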

To provide some statistics, I did an oprofile run on a 2.6.31 kernel,
with both the default 4096 table size, and the same kernel recompiled
with IP_VS_CONN_TAB_BITS set to 18 (2^18 = 262144 entries). I built a
quick test setup with a part of Wikimedia/Wikipedia's live traffic
mirrored by the switch to the test host.

With the default setting, at ~ 120 kpps packet load we saw a typical %si
CPU usage of around 30-35%, and oprofile reported a hot spot in
ip_vs_conn_in_get:

samples  %        image name  app name  symbol name
1719761  42.3741  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
302577    7.4554  bnx2        bnx2      /bnx2
181984    4.4840  vmlinux     vmlinux   __ticket_spin_lock
128636    3.1695  vmlinux     vmlinux   ip_route_input
74345     1.8318  ip_vs.ko    ip_vs.ko  ip_vs_conn_out_get
68482     1.6874  vmlinux     vmlinux   mwait_idle

After loading the recompiled kernel with 2^18 entries, %si CPU usage
roughly halved to around 12-18%, and oprofile looks much healthier,
with only about 7.7% of samples in ip_vs_conn_in_get:

samples  %        image name  app name  symbol name
265641   14.4616  bnx2        bnx2      /bnx2
143251    7.7986  vmlinux     vmlinux   __ticket_spin_lock
140661    7.6576  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
94364     5.1372  vmlinux     vmlinux   mwait_idle
86267     4.6964  vmlinux     vmlinux   ip_route_input
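
For a quick comparison of the two reports, the ip_vs_conn_in_get rows
can be put side by side. The sketch below (Python, with variable and
function names invented for the example) only divides the percentages
quoted above. Each percentage is relative to the total samples of its
own run, so this is a rough indication of the change in profile share,
not a direct measurement of CPU time saved.

```python
# ip_vs_conn_in_get rows copied from the two oprofile reports above.
before = {"tab_bits": 12, "samples": 1719761, "percent": 42.3741}
after = {"tab_bits": 18, "samples": 140661, "percent": 7.6576}

def share_reduction(before_pct: float, after_pct: float) -> float:
    """Factor by which the function's share of all samples shrank."""
    return before_pct / after_pct

factor = share_reduction(before["percent"], after["percent"])
print(f"ip_vs_conn_in_get share: {before['percent']:.1f}% -> "
      f"{after['percent']:.1f}% ({factor:.1f}x smaller)")
```

The share of samples in the lookup function falls by a factor of about
5.5 between the two runs.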

So yes, having the table size as an ip_vs module parameter would be
*very* welcome. Perhaps not as convenient as a dynamically resizing
table, but it would be a lot less work and much more maintainable in
production than recompiling the kernel for every security update...

-- 
Mark Bergsma <mark@wikimedia.org>
Operations Engineer, Wikimedia Foundation

Thread overview: 23+ messages
2008-11-26 13:36 [PATCH] IPVS: Allow boot time change of hash size Catalin(ux) M. BOIE
2008-11-26 14:40 ` Joseph Mack NA3T
2008-11-26 23:27   ` David Miller
2008-11-27  7:05     ` Catalin(ux) M. BOIE
2008-11-27  7:37       ` David Miller
2008-11-27  6:58   ` Catalin(ux) M. BOIE
2008-11-27 15:58     ` Joseph Mack NA3T
2008-11-28  8:49       ` Catalin(ux) M. BOIE
2008-11-28 14:55         ` Joseph Mack NA3T
2008-12-02 15:34           ` Catalin(ux) M. BOIE
2008-12-02 22:51             ` David Miller
2008-12-02 23:16               ` Catalin(ux) M. BOIE
2008-12-03  0:37                 ` David Miller
2009-12-28 18:49                   ` Mark Bergsma [this message]
2009-12-29  1:34                     ` Simon Horman
2010-01-04 13:57                       ` Patrick McHardy
2010-01-04 23:24                         ` Simon Horman
2010-01-05 11:02                           ` Mark Bergsma
2010-01-06 15:25                             ` Catalin(ux) M. BOIE
2010-01-05  0:20                     ` Simon Horman
2010-01-05  4:56                       ` Patrick McHardy
2008-12-03 21:11                 ` Graeme Fowler
2008-12-04  7:47                   ` Catalin(ux) M. BOIE
