Re: ctnetlink questions - Patrick McHardy

From: Patrick McHardy <kaber@trash.net>
To: Henrik Nordstrom <hno@marasystems.com>
Cc: Harald Welte <laforge@netfilter.org>,
	Netfilter Development Mailinglist
	<netfilter-devel@lists.netfilter.org>
Subject: Re: ctnetlink questions
Date: Mon, 20 Oct 2003 05:01:17 +0200
Message-ID: <3F934FFD.7090700@trash.net>
In-Reply-To: <Pine.LNX.4.44.0310200244580.17141-100000@filer.marasystems.com>

Henrik Nordstrom wrote:

>Agreed, partially.
>
>My opinions:
>
>It is important that userspace does not miss entries which were in the 
>kernel when dumping started and still exist in the kernel when the dump 
>finished.
>
>It is also important userspace can have some kind of semi-static 
>reference to a conntrack to be able to manipulate that conntrack without 
>risking hitting another conntrack.
>
>It is OK for me if it is unspecified what happens with entries which 
>were either created or destroyed while the dump was in progress.
>

I totally agree.

>With these criteria in mind I propose a hybrid of your approaches:
>
>a) Assign a globally unique ID to each conntrack, in such a manner that IDs 
>are not reused for a significant amount of time. This provides a stable 
>point of reference to a connection with low risk of false collisions if 
>the original connection was destroyed while userspace still thought it was 
>there. 
>
>b) When dumping the conntrack entries, dump one bucket at a time. 
>If the bucket is too large to fit in the current response packet 
>then sort the bucket entries on ID and keep track of which bucket+ID 
>was last dumped. On the next netlink packet restart at the same bucket and 
>skip the entries with an ID lower than or equal to the last one already 
>dumped for that bucket.
>
>This requires a read lock per hash bucket while dumping that bucket, and
>a (usually) small amount of memory to keep the temporary sorted index
>of bucket entries, unless the bucket is permanently resorted, in which case
>it may be possible to do this with no memory allocation (but that requires 
>the bucket to be write locked while resorting, which is probably worse).
>
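
For illustration, the bucket+ID restart scheme quoted above could look
roughly like the userspace sketch below. All names (ct_entry, dump_cursor,
dump_some) are hypothetical and not taken from the ctnetlink code, out[]
merely stands in for one netlink response packet, and the per-bucket read
lock is left out. It uses the temporary sorted index variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a conntrack entry in a hash chain. */
struct ct_entry {
    uint64_t id;
    struct ct_entry *next;
};

/* Dump position kept between two response packets. */
struct dump_cursor {
    size_t bucket;     /* bucket to resume at */
    uint64_t last_id;  /* highest ID already dumped from that bucket */
    int have_last;     /* 0 if nothing was dumped from that bucket yet */
};

static int cmp_id(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Write up to cap IDs into out[], resuming at *cur. Returns the count. */
static size_t dump_some(struct ct_entry *const *table, size_t nbuckets,
                        struct dump_cursor *cur, uint64_t *out, size_t cap)
{
    size_t n = 0;

    assert(cur != NULL && out != NULL);
    while (cur->bucket < nbuckets && n < cap) {
        const struct ct_entry *e;
        size_t cnt = 0;

        /* Count entries of this bucket that were not dumped yet. */
        for (e = table[cur->bucket]; e != NULL; e = e->next)
            if (!cur->have_last || e->id > cur->last_id)
                cnt++;

        if (cnt > 0) {
            size_t i = 0, take;
            uint64_t *ids = malloc(cnt * sizeof(*ids));

            if (ids == NULL)
                break;                  /* caller may retry later */
            for (e = table[cur->bucket]; e != NULL; e = e->next)
                if (!cur->have_last || e->id > cur->last_id)
                    ids[i++] = e->id;
            qsort(ids, cnt, sizeof(*ids), cmp_id);

            take = cnt < cap - n ? cnt : cap - n;
            for (i = 0; i < take; i++)
                out[n++] = ids[i];
            free(ids);                  /* released before returning */

            if (take < cnt) {           /* packet full in mid-bucket */
                cur->last_id = out[n - 1];
                cur->have_last = 1;
                break;
            }
        }
        cur->bucket++;                  /* bucket finished */
        cur->have_last = 0;
    }
    return n;
}
```

Entries created or destroyed between two calls may or may not show up,
which matches the "unspecified" behaviour accepted above.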

Sounds like a nice solution. I favour the permanent resorting for these
reasons:
- All temporary memory allocations should be released before
  ctnetlink_dump is left, not in ctnetlink_done, since we don't know if
  and when the read will continue. With a temporary index, this means
  sorting multiple times is required.

- We can use a sorting algorithm which benefits from pre-sorted input.
  This would give better average performance. IIRC new conntracks are
  added at the head of the chains, so if we sort and walk backwards
  through the chains we only have to resort after an ID counter wrap.
  Sorting is also pretty easy in that case: move all entries at the head
  of the list whose ID is smaller than the last one's to the tail while
  preserving order, and stop at the first one that's bigger. This also
  means we only need the write lock in a very, very rare case.
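
A minimal sketch of that resort step, assuming a plain doubly linked
chain with new entries added at the head. The types and function names
(node, chain, chain_add_head, chain_fix_after_wrap) are made up for
illustration and locking is omitted; real code would use the kernel's
list primitives and take the write lock around the splice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chain entry and chain; not the kernel's list_head. */
struct node {
    uint64_t id;
    struct node *prev, *next;
};

struct chain {
    struct node *head, *tail;
};

/* New entries go to the head of the chain. */
static void chain_add_head(struct chain *c, struct node *n)
{
    n->prev = NULL;
    n->next = c->head;
    if (c->head != NULL)
        c->head->prev = n;
    else
        c->tail = n;
    c->head = n;
}

/*
 * After an ID counter wrap: move the leading run of entries whose ID is
 * smaller than the tail's ID to the tail, keeping their relative order.
 * Returns the number of entries moved (0 in the common, no-wrap case).
 */
static size_t chain_fix_after_wrap(struct chain *c)
{
    struct node *first, *last, *stop, *old_tail;
    uint64_t tail_id;
    size_t moved = 0;

    assert(c != NULL);
    if (c->head == NULL || c->head == c->tail)
        return 0;

    old_tail = c->tail;
    tail_id = old_tail->id;
    first = c->head;

    /* Find the first entry that stays in place. */
    stop = first;
    while (stop != old_tail && stop->id < tail_id) {
        stop = stop->next;
        moved++;
    }
    if (moved == 0)
        return 0;

    last = stop->prev;          /* end of the run being moved */

    /* Detach [first, last] from the front ... */
    c->head = stop;
    stop->prev = NULL;

    /* ... and splice it in behind the old tail. */
    old_tail->next = first;
    first->prev = old_tail;
    last->next = NULL;
    c->tail = last;
    return moved;
}
```

With IDs 100, 200, 300 added before the wrap and 1, 2 after it, the chain
reads 2, 1, 300, 200, 100 from head to tail. After the call it reads
300, 200, 100, 2, 1, so walking backwards from the tail gives ascending
IDs again.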

>Regarding the conntrack ID: for me it is acceptable if as many as 64 bits
>are reserved for the conntrack ID. This gives sufficient namespace to:
>
>a) Provide truly unique IDs suitable for long-term reference without any 
>risk of collisions.
>

I agree, we should use 64 bits.

>b) Allow for the namespace to be built in such a manner that there never
>will be any risk of congestion in finding the next available ID. For 
>example by using CPU#+counter.
>
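
One possible reading of "CPU#+counter", sketched in userspace C: the top
bits of the 64-bit ID carry the CPU number and the remaining bits a
per-CPU sequence counter, so no CPU ever competes with another for the
next ID. The 8/56 bit split and all names here (ID_CPU_BITS,
ct_id_alloc, ...) are assumptions for illustration only; nothing in this
thread fixes a layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 8 bits CPU number, 56 bits per-CPU sequence. */
#define ID_CPU_BITS 8u
#define ID_SEQ_BITS (64u - ID_CPU_BITS)
#define ID_SEQ_MASK ((UINT64_C(1) << ID_SEQ_BITS) - 1)
#define MAX_CPUS    (1u << ID_CPU_BITS)

/* One counter per CPU; each slot is only touched by its own CPU. */
static uint64_t next_seq[MAX_CPUS];

/* Allocate the next ID on the given CPU without any shared state. */
static uint64_t ct_id_alloc(unsigned int cpu)
{
    uint64_t seq;

    assert(cpu < MAX_CPUS);
    seq = next_seq[cpu]++ & ID_SEQ_MASK;
    return ((uint64_t)cpu << ID_SEQ_BITS) | seq;
}

/* Helpers to take an ID apart again. */
static unsigned int ct_id_cpu(uint64_t id)
{
    return (unsigned int)(id >> ID_SEQ_BITS);
}

static uint64_t ct_id_seq(uint64_t id)
{
    return id & ID_SEQ_MASK;
}
```

One thing to keep in mind with such a layout: IDs handed out on different
CPUs are no longer ordered by creation time, so a chain that mixes them
is not automatically pre-sorted by ID.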

Also a good idea. Thanks Henrik for your valuable input. Harald, what do
you think of this approach?

Best regards,
Patrick (hoping mozilla will have mercy with his formatting this time)
