All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Josefsson <gandalf@wlug.westbo.se>
To: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Netfilter Development Mailinglist <netfilter-devel@lists.netfilter.org>
Subject: Hashtrie testing2 (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2)
Date: Sat, 04 Mar 2006 21:11:51 +0100	[thread overview]
Message-ID: <1141503111.3881.61.camel@localhost.localdomain> (raw)
In-Reply-To: <Pine.LNX.4.58.0602170951070.29301@blackhole.kfki.hu>

[-- Attachment #1: Type: text/plain, Size: 4717 bytes --]

On Fri, 2006-02-17 at 10:30 +0100, Jozsef Kadlecsik wrote:

> Here are the tests I'd like to see against hashtrie:
> 
> - larger hashentry, i.e. when HASHSIFT is equal to 6, 7 or 8

I have seen good results when sizeof(struct hashentry) = 64 on my
laptop. But on the other hand my laptop likes the padding of struct
hashentry at the beginning with gives unaligned pointers. It's an
Pentium M processor in my laptop.

> - other HASHALIGN, hashbit_t values: (32, u8), (64, u16) and (64, u32).

Yes this needs more testing as well.

>   The current (32, u8) doesn't look optimal on 64bit CPUs, (64, u16) seems
>   to be the best, but without testing it's hard to choose.

Currently I don't have any 64bit cpus to test things on.

> - DoS against hashtrie by non-random tuples: single fixed destination IP,
>   port and successive source IP, port numbers. (I don't think the current
>   max 7 levels (childs) can survive such an attack.)

I just ran some tests on this with this code:

dstip = 192 << 24 | 168 << 16 | 1 << 8 | 1;
srcip = 200 << 24 | 100 << 16 | 50 << 8 | 0;

conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].src.ip = srcip + rand() % 1024;
conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].src.u.tcp.port = (u16)rand();
conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].dst.ip = dstip;
conntrack[i]->tuple[IP_CT_DIR_ORIGINAL].dst.u.tcp.port = 80;

And this is the result I got:

sizeof struct hashentry: 32
sizeof struct ip_conntrack: 48
number of conntrack entries: 1228800
number of slots per hashtrie bucket: 5
number of pad bytes: 3
number of bytes per child: 2048
insert: time: 11299 cyc, 18854 ns (53040/s)
Number of failed inserts: 22232
Number of entries in hashtrie: 2435368
Number of children in hashtrie: 238756
Maxdepth of hashtrie: 3 (0 == root)
Maximum memory usage: 488972288

2.4 million entries in the hashtrie with fixed dstip, port and 1024
srcip's with random ports and a maxdepth of 3. I'd say the jenkins hash
is doing its job quite nicely, wasn't expecting such good results.

In the results above you see my main worry regarding hashtrie, the
memory usage, 200 bytes per entry in hashtrie (400 bytes per conntrack)
in this scenario, which means that the child-nodes aren't very populated
in the leaf nodes.

2435368 entries / 238756 child-nodes = 10.2 entries/child-node

And there's 2048 / 32 = 64 struct hashentry per child-node.
That's 64 * 5 = 320 entries/child-node

10.2 / 320 = 3.2% usage of child-nodes in average which is simply
horrible.

I'm almost starting to think there's a major bug in there somewhere. 
I need to write some code to walk the tree and calculate the usage of
each level to see that it's actually 100% for all buckets that has a
child.

> Somehow I don't really like the eviction algorithm. What about some lazy
> auto-eviction instead: say, if there are more than 90% of the max
> elements, then drop a (any) unassured connection which can be found on the
> path when inserting a new one. Thus the current fixed stack could be
> eliminated and there were no builtin limit in the depth.

Couldn't this lead to the situation where we evict an entry early on the
path, and then that slot gets reused for another entry that's also
unassured, and it repeats...
The problem with eviction in the hashtrie is that the depth of the entry
has no correlation to the age of the entry like in the current hashtable
(that's only true if you have long linked lists which leads to poor
performance, so no real-world installation has properly working
evication anyway today so maybe it doesn't matter too much)

> > I have an old untested patch against nf_conntrack as well but it needs
> > some rewriting of the conntrack locking in order to avoid an SMP deadlock.
> 
> Rusty's lock-ordering is a sure solution, but penalizes the process.
> I have been thinking on wether we could use simply, separated, unordered
> locking:
> 
> 1. lock bucket according to tuple1
> 	add element
>    unlock bucket
> 2. lock bucket according to tuple2,
> 	add element
>    unlock bucket
>    if (add element failed)
> 	undo 1.

Yes this is the way I've been thinking about. If we get into the case
where we have conntrack A and B that have the exact same tuples but in
reverse it doesn't really matter if one gets dropped or both, this is
strictly best effort, no need to bend over backwards in order to
minimize racewindows if it isn't anything serious.
The only case I can think about where it might matter is the case of
simultaneous open from both sides with the same source/destination
ports, dns (some clients issue requests from port 53), games and ipsec
isakmp come to mind. 

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

  parent reply	other threads:[~2006-03-04 20:11 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-13  2:41 [PATCH 4/4] first conntrack ID must be 1 not 2 Pablo Neira Ayuso
2006-02-13 11:20 ` Harald Welte
2006-02-16  8:33   ` Patrick McHardy
2006-02-16  8:47     ` Jozsef Kadlecsik
2006-02-16  9:02       ` Patrick McHardy
2006-02-16  9:11         ` Jozsef Kadlecsik
2006-02-16  9:14           ` Patrick McHardy
2006-02-16  9:36             ` Jozsef Kadlecsik
2006-02-16 20:09               ` Patrick McHardy
2006-02-17  8:18                 ` Jozsef Kadlecsik
2006-02-17  8:45                   ` Martin Josefsson
2006-02-17  9:30                     ` Jozsef Kadlecsik
2006-02-17 18:41                       ` Jozsef Kadlecsik
2006-03-04 16:23                         ` Hashtrie testing (was: Re: [PATCH 4/4] first conntrack ID must be 1 not 2) Martin Josefsson
2006-03-05  9:49                           ` Jozsef Kadlecsik
2006-03-05 13:24                             ` Martin Josefsson
2006-03-04 20:11                       ` Martin Josefsson [this message]
2006-03-05 11:24                         ` Hashtrie testing2 " Jozsef Kadlecsik
2006-03-05 17:48                           ` Martin Josefsson
2006-03-06 13:15                             ` Jozsef Kadlecsik
2006-03-07 18:33                               ` Martin Josefsson
2006-03-08  6:34                                 ` Patrick Schaaf
2006-03-12 18:49                                 ` Martin Josefsson
2006-03-14 11:35                                   ` Jozsef Kadlecsik
2006-03-23 11:27                                   ` Jozsef Kadlecsik
2006-03-23 21:07                                     ` Martin Josefsson
2006-03-25  8:39                                       ` Jozsef Kadlecsik
2006-03-28 12:26                                         ` Jozsef Kadlecsik
2006-03-30  8:28                                 ` Hashtrie testing2, dancing trees Amin Azez
2006-03-31 18:43                                   ` Jozsef Kadlecsik
2006-02-17  8:50                   ` [PATCH 4/4] first conntrack ID must be 1 not 2 Patrick McHardy
2006-03-30  8:31                 ` Amin Azez
2006-03-31  1:11                   ` Patrick McHardy
2006-03-31 18:35                     ` Jozsef Kadlecsik
2006-03-31 18:44                       ` Patrick McHardy
2006-04-01 19:31                         ` Harald Welte
2006-04-06 11:02                           ` Patrick McHardy
2006-04-11 16:09                             ` Amin Azez
2006-04-11 16:17                               ` Patrick McHardy
     [not found]                                 ` <443CA579.3030908@ufomechanic.net>
2006-04-12 18:30                                   ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1141503111.3881.61.camel@localhost.localdomain \
    --to=gandalf@wlug.westbo.se \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=netfilter-devel@lists.netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.