netdev.vger.kernel.org archive mirror
From: Patrick Schaaf <bof@bof.de>
To: Martin Josefsson <gandalf@wlug.westbo.se>
Cc: Andi Kleen <ak@suse.de>, jamal <hadi@cyberus.ca>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Netfilter-devel <netfilter-devel@lists.netfilter.org>,
	netdev@oss.sgi.com, netfilter-core@lists.netfilter.org,
	Patrick Schaaf <bof@bof.de>
Subject: Re: TODO list before feature freeze
Date: Mon, 29 Jul 2002 18:15:01 +0200
Message-ID: <20020729181501.C570@oknodo.bof.de>
In-Reply-To: <1027957218.12610.71.camel@tux>; from gandalf@wlug.westbo.se on Mon, Jul 29, 2002 at 05:40:18PM +0200

(warning: crystal ball engaged to parse from the quoted mail snippets.
Maybe missing context. I'm just reading netfilter-devel)

> On Mon, 2002-07-29 at 13:56, Andi Kleen wrote:
> 
> > here is a patch for 2.4 that just makes it use get_free_pages to test the 
> > TLB theory.

I presume this is about the vmalloc()ed hash bucket table? If yes, it's
certainly an interesting experiment to try making it allocated from an
area without TLB issues. We can expect a TLB miss on every packet with
the current setup, so allocating the bucket table from large-TLB memory
would be a clear win of one memory roundtrip.

The netfilter hook statistics patch I mentioned in the other mail
should be able to show the difference. If my guess is right, you
could see a 5-10% improvement on the ip_conntrack hook functions.

> > Another obvious improvement would be to not use list_heads 
> > for the hash table buckets - a single pointer would likely suffice and 
> > it would cut the hash table in half, saving cache, TLB and memory.
> 
> Martin Josefsson wrote:
> I think the list_heads are used for only one thing currently, for the
> early eviction in case of overload,

Don't forget the nonscanning list_del(), called whenever a conntrack
is unhashed at its death. However, with a suitable bucket number,
i.e. low chain lengths, the scan on conntrack removal would be OK.

The early_drop() scanning, if it wants to work backward, may as well
work forward, keeping a "last unreplied found" pointer, and returning
that when falling off the single list end.

Thus, I also think that the list could be simple.
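
To make the two points above concrete, here is a minimal user-space
sketch, not kernel code. All names (ct_entry, sl_head, dl_head and the
chain_* helpers) are made up for illustration, and it assumes new
entries are added at the head of a chain. It shows the scanning unlink
and a forward walk that remembers the last unreplied entry seen:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hashed conntrack entry; not the kernel layout. */
struct ct_entry {
    struct ct_entry *next;  /* single forward link */
    int replied;            /* nonzero once a reply has been seen */
    int id;
};

/* Two-pointer bucket head (list_head style), kept only for size comparison. */
struct dl_head { struct dl_head *next, *prev; };

/* Single-pointer bucket head: half the size of dl_head. */
struct sl_head { struct ct_entry *first; };

/* Add at the front of the chain, so older entries sit nearer the end. */
static void chain_add(struct sl_head *b, struct ct_entry *e)
{
    assert(b != NULL && e != NULL);
    e->next = b->first;
    b->first = e;
}

/* Unlink by scanning the chain; returns 1 if the entry was found. */
static int chain_del(struct sl_head *b, struct ct_entry *e)
{
    struct ct_entry **pp;

    for (pp = &b->first; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Walk forward, remember the last unreplied entry, return it at list end. */
static struct ct_entry *chain_last_unreplied(const struct sl_head *b)
{
    struct ct_entry *e, *last = NULL;

    for (e = b->first; e != NULL; e = e->next)
        if (!e->replied)
            last = e;
    return last;
}
```

With head insertion, the entry returned is the one a backward scan from
the tail would have hit first, and the cost of chain_del() is bounded
by the chain length.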

Off the top of my head, here are other fields that we could get rid of:

- the ctrack backpointer in each tuple.
- the protocol field in each tuple.
- the 20 byte infos[] array in ip_conntrack.
- we could out-of-line ip_nat_info.

With the current layout, when lists must be walked on a 32-byte-cacheline
box, we are sure to always read two cache lines for the skipped-over
tuples.

> I know I've had plans on rewriting the locking in conntrack which is
> quite frankly horrible, one giant rwlock used for almost everything
> (including the hashtable).

I'd like to see lockmeter statistics before this change. When you split
the one lock into a sectored lock: each conntrack is hashed twice, so
you need to be careful with lock order when adding or removing.
(well, there is another possibility, but I won't go into that now)
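
A minimal user-space sketch of what I mean by lock order, under loud
assumptions: the sector count, the C11 atomic_flag spinlock, and every
name here (sector_locks, order_pair, lock_pair, ...) are invented for
illustration and are not the kernel's locking primitives. The idea is
only that both sectors of a conntrack are always taken lowest index
first, and a shared sector is taken once:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSECTORS 16u  /* hypothetical number of lock sectors */

struct sector_lock { atomic_flag held; };

static struct sector_lock sector_locks[NSECTORS];

/* Put every sector lock into the released state. */
static void sectors_init(void)
{
    for (size_t i = 0; i < NSECTORS; i++)
        atomic_flag_clear(&sector_locks[i].held);
}

static unsigned sector_of(unsigned hash) { return hash % NSECTORS; }

static void sector_acquire(unsigned s)
{
    assert(s < NSECTORS);
    while (atomic_flag_test_and_set_explicit(&sector_locks[s].held,
                                             memory_order_acquire))
        ;  /* spin until the holder releases */
}

static void sector_release(unsigned s)
{
    atomic_flag_clear_explicit(&sector_locks[s].held, memory_order_release);
}

/* Sort the two sectors ascending; returns how many distinct locks are needed. */
static int order_pair(unsigned hash_orig, unsigned hash_reply, unsigned out[2])
{
    unsigned a = sector_of(hash_orig), b = sector_of(hash_reply);

    if (a > b) { unsigned t = a; a = b; b = t; }
    out[0] = a;
    out[1] = b;
    return a == b ? 1 : 2;
}

/* Take both sectors in ascending order, so two CPUs can never deadlock. */
static void lock_pair(unsigned hash_orig, unsigned hash_reply)
{
    unsigned s[2];
    int n = order_pair(hash_orig, hash_reply, s);

    sector_acquire(s[0]);
    if (n == 2)
        sector_acquire(s[1]);
}

/* Release in reverse order of acquisition. */
static void unlock_pair(unsigned hash_orig, unsigned hash_reply)
{
    unsigned s[2];
    int n = order_pair(hash_orig, hash_reply, s);

    if (n == 2)
        sector_release(s[1]);
    sector_release(s[0]);
}
```

Any add or remove path that touches both hash chains would go through
lock_pair()/unlock_pair() with the two hash values of the conntrack.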

> One idea that has come to mind is using RCU

I don't see RCU solving hash linked list update problems. Care to explain
how that would work?

> And this eviction which occurs at overload needs to be redone, we can't
> go around dropping one unreplied connection at a time, we need
> gang-eviction of unreplied connections.

I propose to put them all on a separate LRU list, and reap the oldest.
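
Roughly like this user-space sketch. It is an illustration only: the
names (lru_node, unreplied_lru, lru_reap_oldest, the evict callback)
are hypothetical, locking is left out, and it assumes an entry is
taken off the list as soon as its connection sees a reply or dies:

```c
#include <assert.h>
#include <stddef.h>

/* Link embedded in each unreplied connection (hypothetical). */
struct lru_node { struct lru_node *prev, *next; };

/* Circular list with a sentinel: head.next is oldest, head.prev is newest. */
struct unreplied_lru {
    struct lru_node head;
    size_t count;
};

static void lru_init(struct unreplied_lru *l)
{
    l->head.prev = l->head.next = &l->head;
    l->count = 0;
}

/* A new unreplied connection goes to the newest end. */
static void lru_add_newest(struct unreplied_lru *l, struct lru_node *n)
{
    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
    l->count++;
}

/* O(1) removal, e.g. when the connection sees a reply or dies. */
static void lru_remove(struct unreplied_lru *l, struct lru_node *n)
{
    assert(l->count > 0);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    l->count--;
}

/* Gang eviction: reap up to max of the oldest entries; returns how many. */
static size_t lru_reap_oldest(struct unreplied_lru *l, size_t max,
                              void (*evict)(struct lru_node *))
{
    size_t reaped = 0;

    while (reaped < max && l->head.next != &l->head) {
        struct lru_node *victim = l->head.next;

        lru_remove(l, victim);
        if (evict != NULL)
            evict(victim);  /* caller-supplied teardown, may be NULL */
        reaped++;
    }
    return reaped;
}
```

On overload, one call with a suitable max frees a whole batch instead
of one connection at a time.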

best regards
  Patrick
