[RFC] [PATCH] ctnetlink updates

* [RFC] [PATCH] ctnetlink updates
@ 2005-03-27 23:55 Pablo Neira
  2005-04-01  6:59 ` Harald Welte
  2005-04-03 18:01 ` Patrick McHardy
  0 siblings, 2 replies; 48+ messages in thread
From: Pablo Neira @ 2005-03-27 23:55 UTC
  To: Netfilter Development Mailinglist; +Cc: Harald Welte, Patrick McHardy

Hi,

I've ported nfnetlink-ctnetlink to the 2.6 ip_conntrack to make the 
transition easier. I intend to port it to nfconntrack once that gets 
pushed forward. My work is done on top of the ct-event-API.

There are some issues I'd like to discuss:

o Declaring the ID as unsigned int. I think that's just fine.

	- A conntrack must be identified by one of its tuples (original or 
reply) plus its ID. That way it can be uniquely identified.

	- Using u_int64_t merely makes wrapping around less likely; the 
potential problem is still there.

o dump_table() has problems once the ID wraps around.

	- The list of conntracks is no longer ordered by ID once the ID 
wraps around. New conntracks with low IDs are inserted at the end. 
While dumping the table, the check ct->id <= cb->args[0] is true for 
them, so those new conntracks aren't dumped.
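
The skip-on-resume behaviour described above can be reproduced with a
small userspace model. This is a sketch under stated assumptions, not
ctnetlink code: struct bucket, bucket_append() and dump_resume() are
hypothetical names, and last_dumped stands in for the value kept in
cb->args[0].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of one hash bucket: conntrack IDs in list
 * order. New conntracks are appended at the tail, as described above. */
struct bucket {
	uint32_t ids[16];
	size_t n;
};

static void bucket_append(struct bucket *b, uint32_t id)
{
	if (b->n < sizeof(b->ids) / sizeof(b->ids[0]))
		b->ids[b->n++] = id;
}

/* Resume a dump the way the text describes: every entry whose ID is
 * <= the last ID already sent (the role cb->args[0] plays) is assumed
 * to have been dumped before and is skipped. Returns the number of
 * entries emitted on this pass. */
static size_t dump_resume(const struct bucket *b, uint32_t last_dumped)
{
	size_t emitted = 0;

	for (size_t i = 0; i < b->n; i++) {
		if (b->ids[i] <= last_dumped)
			continue;
		emitted++;
	}
	return emitted;
}
```

With IDs 0xFFFFFFFE and 0xFFFFFFFF in the bucket and 0 and 1 appended
after the wrap, resuming after 0xFFFFFFFE emits only one of the three
unsent entries (0xFFFFFFFF); the two new conntracks are skipped.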

I've introduced a function that inserts conntracks into the buckets 
ordered by ID.

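static inline void
list_insert_ordered(struct list_head *head,
                     struct ip_conntrack *ct,
                     enum ip_conntrack_dir dir)
{
         struct list_head *i;
         struct ip_conntrack_tuple_hash *h;
         struct ip_conntrack *cur;

         /* caller holds ip_conntrack_lock for writing */
         ASSERT_WRITE_LOCK(head);
         list_for_each(i, head) {
                 /* bucket entries are tuple hashes, not conntracks, so
                  * the list_head cannot simply be cast; this assumes the
                  * tuplehash_to_ctrack() helper is available */
                 h = list_entry(i, struct ip_conntrack_tuple_hash, list);
                 cur = tuplehash_to_ctrack(h);
                 if (ct->id <= cur->id) {
                         /* insert in front of the first entry with a
                          * greater or equal id */
                         list_add_tail(&ct->tuplehash[dir].list, i);
                         return;
                 }
         }
         /* id is larger than every id in this bucket: append */
         list_add_tail(&ct->tuplehash[dir].list, head);
}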

--
Pablo

* Re: [RFC] [PATCH] ctnetlink updates
  2005-03-27 23:55 [RFC] [PATCH] ctnetlink updates Pablo Neira
@ 2005-04-01  6:59 ` Harald Welte
  2005-04-03 18:01 ` Patrick McHardy
  1 sibling, 0 replies; 48+ messages in thread
From: Harald Welte @ 2005-04-01  6:59 UTC
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy

On Mon, Mar 28, 2005 at 01:55:15AM +0200, Pablo Neira wrote:
> Hi,
> 
> I've ported nfnetlink-ctnetlink to the 2.6 ip_conntrack to make the transition
> easier. I intend to port it to nfconntrack once that gets pushed
> forward. My work is done on top of the ct-event-API.

We have a habit of working simultaneously in the same area :(
Unfortunately I've reorganized the tree and shuffled the files, so I'll
have a hard time merging...

> There are some issues I'd like to discuss:
> 
> o Declaring the ID as unsigned int. I think that's just fine.
> 
> 	- A conntrack must be identified by one of its tuples
> 	(original or reply) plus its ID. That way it can be uniquely
> 	identified.
> 
> 	- Using u_int64_t merely makes wrapping around less likely;
> 	the potential problem is still there.

Well, those of you who know the discussion know my point of view: I
don't want an ID and/or an ordered list.  If the user tells us to delete
a connection with a given tuple, we simply delete it.

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

* Re: [RFC] [PATCH] ctnetlink updates
  2005-03-27 23:55 [RFC] [PATCH] ctnetlink updates Pablo Neira
  2005-04-01  6:59 ` Harald Welte
@ 2005-04-03 18:01 ` Patrick McHardy
  2005-04-06 18:08   ` Pablo Neira
  1 sibling, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-04-03 18:01 UTC
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> I've ported nfnetlink-ctnetlink to the 2.6 ip_conntrack to make the 
> transition easier. I intend to port it to nfconntrack once that gets 
> pushed forward. My work is done on top of the ct-event-API.
> 
> There are some issues I'd like to discuss:
> 
> o Declaring the ID as unsigned int. I think that's just fine.
> 
>     - A conntrack must be identified by one of its tuples (original or 
> reply) plus its ID. That way it can be uniquely identified.

Good idea, although I'm not completely convinced.

>     - Using u_int64_t merely makes wrapping around less likely; the 
> potential problem is still there.

With a 64-bit ID, the time until a wrap is many, many years even if you
assume a very high connection rate and many CPUs, so it's not a
practical problem. The difference from your solution is that you can
tell for sure that clashes won't occur before a certain date under
known conditions. I dislike the idea of an unreliable API that has no
possibility of even noticing and/or handling the error, so I'm not
sure about this.
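
A back-of-the-envelope check of the wrap time. The rates are
assumptions picked for illustration (100,000 new connections per second
for a 32-bit ID, 10 million per second for a 64-bit one), and the helper
names are hypothetical. Under those assumptions a 32-bit counter wraps
in roughly 12 hours, while a 64-bit counter lasts on the order of
58,000 years.

```c
/* Number of distinct IDs in a counter of the given width, computed in
 * floating point so that 64 bits does not overflow an integer shift. */
static double id_space(unsigned bits)
{
	double n = 1.0;

	while (bits--)
		n *= 2.0;
	return n;
}

/* Seconds until the counter wraps, assuming exactly one ID is consumed
 * per new connection at a constant rate. */
static double seconds_until_wrap(unsigned bits, double conns_per_sec)
{
	return id_space(bits) / conns_per_sec;
}

/* Same, expressed in (Julian) years of 365.25 days. */
static double years_until_wrap(unsigned bits, double conns_per_sec)
{
	return seconds_until_wrap(bits, conns_per_sec) / (365.25 * 24 * 3600);
}
```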

> o dump_table() has problems once the ID wraps around.
> 
>     - The list of conntracks is no longer ordered by ID once the ID 
> wraps around. New conntracks with low IDs are inserted at the end. 
> While dumping the table, the check ct->id <= cb->args[0] is true for 
> them, so those new conntracks aren't dumped.
> 
> I've introduced a function that inserts conntracks into the buckets 
> ordered by ID.

I don't like this, but let's talk about the other problem first,
maybe it will just go away :)

Regards
Patrick

* Re: [RFC] [PATCH] ctnetlink updates
  2005-04-03 18:01 ` Patrick McHardy
@ 2005-04-06 18:08   ` Pablo Neira
  2005-04-17 15:07     ` Patrick McHardy
  0 siblings, 1 reply; 48+ messages in thread
From: Pablo Neira @ 2005-04-06 18:08 UTC
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Pablo Neira wrote:
>> There are some issues I'd like to discuss:
>>
>> o Declaring the ID as unsigned int. I think that's just fine.
>>
>>     - A conntrack must be identified by one of its tuples (original 
>> or reply) plus its ID. That way it can be uniquely identified.
> 
> Good idea, although I'm not completely convinced.

Now I've changed my mind :).

I think we can identify a connection by both its original and reply 
tuples. Since a connection is represented by a conntrack, a user who 
kills a conntrack via ctnetlink wants to kill the connection that the 
conntrack represents, not that particular conntrack itself.

No two conntracks exist at the same time with the same original and 
reply tuples, so I no longer see why we need such an ID.

>>     - Using u_int64_t merely makes wrapping around less likely; the 
>> potential problem is still there.
> 
> With a 64-bit ID, the time until a wrap is many, many years even if you
> assume a very high connection rate and many CPUs, so it's not a
> practical problem. The difference from your solution is that you can
> tell for sure that clashes won't occur before a certain date under
> known conditions. I dislike the idea of an unreliable API that has no
> possibility of even noticing and/or handling the error, so I'm not
> sure about this.

Yes, you are right, this is not a practical problem. Well, if we keep 
using the ID, we could detect a wrap and renumber all existing IDs. Of 
course this is an expensive operation, but it would happen once in a 
blue moon :).
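
A minimal sketch of that idea, assuming live entries are kept
oldest-first: when the counter comes back to 0 while entries are still
live, all live IDs are compacted to 0..n-1 in their existing order and
allocation continues from n. struct id_table, renumber() and alloc_id()
are hypothetical names; the sketch does not model removal or locking.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LIVE 8

/* Hypothetical model: live conntracks kept oldest-first, each with an
 * ID, plus the counter used for the next allocation. */
struct id_table {
	uint32_t ids[MAX_LIVE];
	size_t n;
	uint32_t next_id;
};

/* The expensive step: compact the IDs of all live entries to 0..n-1,
 * preserving their order, and continue allocating from n. */
static void renumber(struct id_table *t)
{
	for (size_t i = 0; i < t->n; i++)
		t->ids[i] = (uint32_t)i;
	t->next_id = (uint32_t)t->n;
}

/* Allocate an ID for a new entry. A counter that is back at 0 while
 * entries are still live means it wrapped, so renumber first.
 * Returns 0 and stores the ID in *out, or -1 if the table is full. */
static int alloc_id(struct id_table *t, uint32_t *out)
{
	if (t->n == MAX_LIVE)
		return -1;
	if (t->next_id == 0 && t->n > 0)
		renumber(t);
	t->ids[t->n++] = t->next_id;
	*out = t->next_id++;
	return 0;
}
```

Note that in this model renumbering changes IDs that userspace may
already hold, so it restores ordering but not identity over time.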

--
Pablo

* Re: [RFC] [PATCH] ctnetlink updates
  2005-04-06 18:08   ` Pablo Neira
@ 2005-04-17 15:07     ` Patrick McHardy
  2005-04-29  7:14       ` Jozsef Kadlecsik
  0 siblings, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-04-17 15:07 UTC
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> Now I've changed my mind :).
> 
> I think we can identify a connection by both its original and reply 
> tuples. Since a connection is represented by a conntrack, a user who 
> kills a conntrack via ctnetlink wants to kill the connection that the 
> conntrack represents, not that particular conntrack itself.

It depends on the criteria by which the user selects the conntrack. I
might choose to kill/remark/... every connection that has transferred
more than X bytes, in which case I don't want to touch a new connection
with the same tuples that has transferred less than X bytes. How can we
handle this and similar cases without an identifier that is unique over
time?
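
One way to make the "more than X bytes" case safe is to have the caller
echo back the ID it read along with the tuple, and to act only if both
still match. The sketch below is hypothetical: the simplified struct
tuple omits the protocol and per-protocol fields a real conntrack tuple
carries, and it assumes IDs are never reused (e.g. a 64-bit counter).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tuple. */
struct tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

struct entry {
	struct tuple t;
	uint64_t id;		/* assumed never reused */
	uint64_t bytes;		/* the user's selection criterion */
	bool live;
};

static bool tuple_eq(const struct tuple *a, const struct tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Delete the entry matching tuple AND id. If the selected connection has
 * died and a new one reuses the tuple, the id differs and nothing is
 * deleted. Returns true if an entry was removed. */
static bool delete_guarded(struct entry *tab, size_t n,
			   const struct tuple *t, uint64_t id)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].live && tuple_eq(&tab[i].t, t) && tab[i].id == id) {
			tab[i].live = false;
			return true;
		}
	}
	return false;
}
```

A real implementation would also have to do the lookup and comparison
under the conntrack lock; that is not modelled here.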

Regards
Patrick

* Re: [RFC] [PATCH] ctnetlink updates
  2005-04-17 15:07     ` Patrick McHardy
@ 2005-04-29  7:14       ` Jozsef Kadlecsik
  2005-04-29  8:02         ` Harald Welte
  2005-05-01 23:49         ` [RFC] [PATCH] ctnetlink updates Pablo Neira
  0 siblings, 2 replies; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-04-29  7:14 UTC
  To: Patrick McHardy
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira

On Sun, 17 Apr 2005, Patrick McHardy wrote:

> Pablo Neira wrote:
> > Now I've changed my mind :).
> >
> > I think we can identify a connection by both its original and reply
> > tuples. Since a connection is represented by a conntrack, a user who
> > kills a conntrack via ctnetlink wants to kill the connection that the
> > conntrack represents, not that particular conntrack itself.
>
> It depends on the criteria by which the user selects the conntrack. I
> might choose to kill/remark/... every connection that has transferred
> more than X bytes, in which case I don't want to touch a new connection
> with the same tuples that has transferred less than X bytes. How can we
> handle this and similar cases without an identifier that is unique over
> time?

That is independent of id/tuple, because the condition is formulated in
terms of transferred bytes.

I don't like the ID either. A conntrack can be uniquely identified by

- src/dst tuples, globally, even in a cluster
- the pointer of the conntrack entry, locally

Why should we need another unique id?

Looking at the latest changes, I think it'd be much better to port
ip_queue to nfnetlink than to reserve another netlink ID: the hooks in
nfnetlink are already there. I know that'd create backward compatibility
issues for the existing queue applications, though... :-(

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [RFC] [PATCH] ctnetlink updates
  2005-04-29  7:14       ` Jozsef Kadlecsik
@ 2005-04-29  8:02         ` Harald Welte
  2005-05-04  9:18           ` [RFC] alternative to conntrack ID Amin Azez
  2005-05-01 23:49         ` [RFC] [PATCH] ctnetlink updates Pablo Neira
  1 sibling, 1 reply; 48+ messages in thread
From: Harald Welte @ 2005-04-29  8:02 UTC
  To: Jozsef Kadlecsik
  Cc: Netfilter Development Mailinglist, Pablo Neira, Patrick McHardy

On Fri, Apr 29, 2005 at 09:14:16AM +0200, Jozsef Kadlecsik wrote:
> I don't like the ID either. A conntrack can be uniquely identified by
> 
> - src/dst tuples, globally, even in a cluster
> - the pointer of the conntrack entry, locally

Yes, but not over time, i.e. if your cycle of reading the table and
issuing a 'delete' is long enough, then you could remove a connection
that was using the same tuple but was established meanwhile (after the
old one died).  However, looking at the current timeouts, that would
require more than one or two minutes of delay between read and delete.

My point of view is that we don't need the ID.  If there is too much
delay, well then the user takes a certain risk.  If we called it
'deleting a flow' then we'd be safe, since a flow has no beginning and
end, and multiple successive connections can make up one flow ;)

> Looking at the latest changes, I think it'd be much better to port
> ip_queue to nfnetlink than to reserve another netlink ID: the hooks in
> nfnetlink are already there. I know that'd create backward compatibility
> issues for the existing queue applications, though... :-(

We discussed that with David Miller at netconf'04.  The result was
that we can get another NETLINK family, as there are a number of
obsolete/outdated ones in the kernel at the moment.  Also, if we keep
ULOG and ip_queue for now, and later migrate them into nfnetlink, there
will be more free numbers again.

ip_queue needs to be renamed to pkt_queue or nf_queue and made layer-3
independent.  The same goes for ULOG.  Also, ULOG should no longer have
a fixed header containing interface names, ... but rather carry that
information in TLVs that are added according to the rule specified by
the admin.
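
To illustrate the TLV idea: each piece of per-packet information becomes
its own type-length-value record, and only the records the rule asks for
are appended, so there is no fixed header. This is a hypothetical
sketch; the attribute numbers, struct tlv_hdr and tlv_put() are made up
for illustration and are not the ULOG or nfnetlink wire format, although
the 4-byte alignment mirrors how netlink attributes are padded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types for illustration only. */
enum { LOG_ATTR_INDEV = 1, LOG_ATTR_OUTDEV = 2, LOG_ATTR_PREFIX = 3 };

struct tlv_hdr {
	uint16_t len;	/* header + payload, excluding padding */
	uint16_t type;
};

/* Round up to a 4-byte boundary. */
#define TLV_ALIGN(x) (((x) + 3u) & ~(size_t)3u)

/* Append one TLV at *off in buf (capacity cap), zero-padding the payload
 * to 4 bytes. Returns 0 on success, -1 if it does not fit. */
static int tlv_put(uint8_t *buf, size_t cap, size_t *off,
		   uint16_t type, const void *data, uint16_t dlen)
{
	struct tlv_hdr h;
	size_t total;

	if (dlen > UINT16_MAX - sizeof(h))
		return -1;
	total = TLV_ALIGN(sizeof(h) + dlen);
	if (*off > cap || cap - *off < total)
		return -1;
	h.len = (uint16_t)(sizeof(h) + dlen);
	h.type = type;
	memcpy(buf + *off, &h, sizeof(h));
	memcpy(buf + *off + sizeof(h), data, dlen);
	memset(buf + *off + sizeof(h) + dlen, 0, total - sizeof(h) - dlen);
	*off += total;
	return 0;
}
```

For example, a rule that only asks for the input device and a log
prefix produces two records (8 and 12 bytes after padding) and nothing
else.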

I've also been thinking of experimenting with an mmap'ed ring buffer for
ulog... at least it would be worth investigating at some point.

> Best regards,
> Jozsef

* [RFC] alternative to conntrack ID
  2005-04-29  8:02         ` Harald Welte
@ 2005-05-04  9:18           ` Amin Azez
  2005-05-04  9:32             ` Patrick Schaaf
  2005-05-04 11:30             ` Patrick McHardy
  0 siblings, 2 replies; 48+ messages in thread
From: Amin Azez @ 2005-05-04  9:18 UTC
  To: Harald Welte
  Cc: Netfilter Development Mailinglist, Pablo Neira, Patrick McHardy

Harald Welte wrote:
> On Fri, Apr 29, 2005 at 09:14:16AM +0200, Jozsef Kadlecsik wrote:
> 
>>I don't like the ID either. A conntrack can be uniquely identified by
>>
>>- src/dst tuples, globally, even in a cluster
>>- the pointer of the conntrack entry, locally
> 
> Yes, but not over time, i.e. if your cycle of reading the table and
> issuing a 'delete' is long enough, then you could remove a connection
> that was using the same tuple but was established meanwhile (after the
> old one died).  However, looking at the current timeouts, that would
> require more than one or two minutes of delay between read and delete.
> 
> My point of view is that we don't need the ID.  If there is too much
> delay, well then the user takes a certain risk.  If we called it
> 'deleting a flow' then we'd be safe, since a flow has no beginning and
> end, and multiple successive connections can make up one flow ;)

I hope I am bringing a new angle to this and not the same old stuff.

With Pablo's new conntrack(-tool) there is an increased risk of this 
race condition. No longer does a userspace application just read the 
table and "issue a delete"; instead it receives events via the netlink 
socket.

Any userspace tool tracking connections based on conntrack events will 
receive the destroy event some time after the conntrack is destroyed, 
possibly after it has already acted on a new conntrack with the same 
tuples.

Here is an ASCII-art timeline showing one of the failure cases:

time+----+----+----+----+----+----+----+----+----+----+
                   destroyed     created again????
conntrack             *==*????????????????????????????????
netlink create event   *
user prog create event    *
netlink destroy event      *
user prog create action      * action may happen on new conntrack
user prog destroy event        *
user prog destroy action        * now we know we may have raced and lost

It is entirely possible that a new conntrack with the same tuples is 
created before the user program becomes aware that the old one has been 
destroyed.

Defining multiple successive connections as "one flow" is convenient, 
but as userspace clients are notified of "interruptions and 
restorations" to this "one flow", it would also be convenient if they 
could safely take advantage of such notifications.

If an ID is not desirable as part of the tuple (and I can see that it is 
not), perhaps a "created time-stamp" per conntrack would suffice as an 
extra "guard" which MAY be provided to conntrack manipulation routines, 
and if so provided MUST also be satisfied for the operation to take place.

That is my suggestion. It does not introduce an alternative ID, but it 
does avoid the race condition.
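
The MAY/MUST semantics of the proposed guard can be sketched as an
optional argument: a caller that passes no timestamp keeps tuple-only
behaviour, while one that passes a timestamp only gets the operation if
it matches. struct ct_stamp, guard_ok() and the microsecond unit are
assumptions for illustration, not an existing interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-conntrack creation time, in microseconds. */
struct ct_stamp {
	uint64_t created_us;
};

/* Returns true if an operation on ct may proceed. expected_us == NULL
 * means the caller did not supply the guard (MAY); if it is supplied,
 * it MUST match the conntrack's creation time. */
static bool guard_ok(const struct ct_stamp *ct, const uint64_t *expected_us)
{
	if (expected_us == NULL)
		return true;
	return ct->created_us == *expected_us;
}
```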

Comments?

Amin

* Re: [RFC] alternative to conntrack ID
  2005-05-04  9:18           ` [RFC] alternative to conntrack ID Amin Azez
@ 2005-05-04  9:32             ` Patrick Schaaf
  2005-05-04 11:30             ` Patrick McHardy
  1 sibling, 0 replies; 48+ messages in thread
From: Patrick Schaaf @ 2005-05-04  9:32 UTC
  To: Amin Azez; +Cc: netfilter-devel, Pablo Neira, Patrick McHardy

> perhaps a "created time-stamp" per conntrack

I like this idea. One could then also have a match expressing
"conntrack has been live for at most / at least X seconds".
Which would be a useful new feature.
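
Such a match could reduce to a simple range check on the conntrack's
age. The function below is a hypothetical sketch, not an existing
iptables match; it assumes creation and current times in seconds taken
from the same clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical age match: true if the conntrack's age (now - created)
 * lies within [min_secs, max_secs]. "At most X seconds" is min 0, max X;
 * "at least X seconds" is min X, max UINT64_MAX. */
static bool ct_age_match(uint64_t created, uint64_t now,
			 uint64_t min_secs, uint64_t max_secs)
{
	uint64_t age;

	if (now < created)
		return false;	/* clock went backwards: do not match */
	age = now - created;
	return age >= min_secs && age <= max_secs;
}
```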

best regards
  Patrick

* Re: [RFC] alternative to conntrack ID
  2005-05-04  9:18           ` [RFC] alternative to conntrack ID Amin Azez
  2005-05-04  9:32             ` Patrick Schaaf
@ 2005-05-04 11:30             ` Patrick McHardy
  2005-05-04 12:01               ` Amin Azez
  1 sibling, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-05-04 11:30 UTC
  To: Amin Azez; +Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira

Amin Azez wrote:

> It is entirely possible that a new conntrack with the same tuples is
> created before the user program becomes aware that the old one has
> been destroyed.
> 
> Defining multiple successive connections as "one flow" is convenient,
> but as userspace clients are notified of "interruptions and
> restorations" to this "one flow", it would also be convenient if they
> could safely take advantage of such notifications.

Agreed. Besides, this is an interface to conntrack, not flowtrack :)

> If an ID is not desirable as part of the tuple (and I can see that it is
> not), perhaps a "created time-stamp" per conntrack would suffice as an
> extra "guard" which MAY be provided to conntrack manipulation routines,
> and if so provided MUST also be satisfied for the operation to take place.
> 
> That is my suggestion. It does not introduce an alternative ID, but it
> does avoid the race condition.
> 
> Comments?

Why is that better than a unique ID? It needs space as well, but can't
be used to identify the conntrack without further information.

Regards
Patrick

* Re: [RFC] alternative to conntrack ID
  2005-05-04 11:30             ` Patrick McHardy
@ 2005-05-04 12:01               ` Amin Azez
  2005-05-06 15:16                 ` Patrick McHardy
  0 siblings, 1 reply; 48+ messages in thread
From: Amin Azez @ 2005-05-04 12:01 UTC
  To: Patrick McHardy
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira

Patrick McHardy wrote:

>Amin Azez wrote:
>
>>It is entirely possible that a new conntrack with the same tuples is
>>created before the user program becomes aware that the old one has
>>been destroyed.
>>
>>Defining multiple successive connections as "one flow" is convenient,
>>but as userspace clients are notified of "interruptions and
>>restorations" to this "one flow", it would also be convenient if they
>>could safely take advantage of such notifications.
>
>Agreed. Besides, this is an interface to conntrack, not flowtrack :)
>
>>If an ID is not desirable as part of the tuple (and I can see that it is
>>not), perhaps a "created time-stamp" per conntrack would suffice as an
>>extra "guard" which MAY be provided to conntrack manipulation routines,
>>and if so provided MUST also be satisfied for the operation to take place.
>>
>>That is my suggestion. It does not introduce an alternative ID, but it
>>does avoid the race condition.
>>
>>Comments?
>
>Why is that better than a unique ID? It needs space as well, but can't
>be used to identify the conntrack without further information.

There isn't the problem of having to generate a unique ID, or the worry 
of it eventually wrapping every few years, as we don't pretend it is 
unique. However, combined with either tuple it forms a unique ID that 
wraps only when the calendar does.
Further, as Patrick Schaaf pointed out, the start time has the potential 
to be more useful than a unique ID for filtering.

Amin

* Re: [RFC] alternative to conntrack ID
  2005-05-04 12:01               ` Amin Azez
@ 2005-05-06 15:16                 ` Patrick McHardy
  2005-05-07 20:36                   ` Marcus Sundberg
  0 siblings, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-05-06 15:16 UTC
  To: Amin Azez; +Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira

Amin Azez wrote:
> Patrick McHardy wrote:
>
>> Why is that better than a unique ID? It needs space as well, but can't
>> be used to identify the conntrack without further information.
>
> There isn't the problem of having to generate a unique ID, or the worry
> of it eventually wrapping every few years, as we don't pretend it is
> unique. However, combined with either tuple it forms a unique ID that
> wraps only when the calendar does.

Wrapping is not a problem with a 64-bit ID. One thing that worries me
about using a timestamp is that, with a very fast CPU and network, it
might not have high enough precision to uniquely identify each
connection.

> Further, as Patrick Schaaf pointed out, the start time has the potential
> to be more useful than a unique ID for filtering.

Agreed, but it is secondary to solving the problem.

Regards
Patrick

* Re: [RFC] alternative to conntrack ID
  2005-05-06 15:16                 ` Patrick McHardy
@ 2005-05-07 20:36                   ` Marcus Sundberg
  2005-05-07 22:18                     ` Patrick McHardy
  0 siblings, 1 reply; 48+ messages in thread
From: Marcus Sundberg @ 2005-05-07 20:36 UTC
  To: Patrick McHardy
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	Amin Azez

Patrick McHardy wrote:
> Amin Azez wrote:
> 
>>There isn't the problem of having to generate a unique ID, or the worry
>>of it eventually wrapping every few years, as we don't pretend it is
>>unique. However, combined with either tuple it forms a unique ID that
>>wraps only when the calendar does.
> 
> Wrapping is not a problem with a 64-bit ID. One thing that worries me
> about using a timestamp is that, with a very fast CPU and network, it
> might not have high enough precision to uniquely identify each
> connection.

You don't even need fast CPUs or networks to risk precision problems
- think multiple NICs and SMP.

//Marcus
-- 
---------------------------------------+--------------------------
  Marcus Sundberg <marcus@ingate.com>  | Firewalls with SIP & NAT
 Software Developer, Ingate Systems AB |  http://www.ingate.com/

* Re: [RFC] alternative to conntrack ID
  2005-05-07 20:36                   ` Marcus Sundberg
@ 2005-05-07 22:18                     ` Patrick McHardy
  2005-05-07 22:32                       ` Marcus Sundberg
  0 siblings, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-05-07 22:18 UTC
  To: Marcus Sundberg
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	Amin Azez

Marcus Sundberg wrote:
> You don't even need fast CPUs or networks to risk precision problems
> - think multiple NICs and SMP.

SMP or multiple NICs don't matter because at any point in time only
one instance of a connection can exist. The challenge is to have a
unique identifier over time.

Regards
Patrick

* Re: [RFC] alternative to conntrack ID
  2005-05-07 22:18                     ` Patrick McHardy
@ 2005-05-07 22:32                       ` Marcus Sundberg
  2005-05-09 14:17                         ` KOVACS Krisztian
  2005-05-11  8:43                         ` Amin Azez
  0 siblings, 2 replies; 48+ messages in thread
From: Marcus Sundberg @ 2005-05-07 22:32 UTC
  To: Patrick McHardy
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	Amin Azez

Patrick McHardy wrote:
> Marcus Sundberg wrote:
> 
>>You don't even need fast CPUs or networks to risk precision problems
>>- think multiple NICs and SMP.
> 
> SMP or multiple NICs don't matter because at any point in time only
> one instance of a connection can exist. The challenge is to have a
> unique identifier over time.

Yes, having a unique identifier over time was what was being discussed,
and I was merely pointing out that with SMP you can get two conntracks
with identical timestamps even if you have infinite precision, since
two new conntracks can be timestamped simultaneously by different CPUs.

//Marcus

* Re: [RFC] alternative to conntrack ID
  2005-05-07 22:32                       ` Marcus Sundberg
@ 2005-05-09 14:17                         ` KOVACS Krisztian
  2005-05-09 15:08                           ` Amin Azez
  2005-05-17 16:12                           ` Amin Azez
  2005-05-11  8:43                         ` Amin Azez
  1 sibling, 2 replies; 48+ messages in thread
From: KOVACS Krisztian @ 2005-05-09 14:17 UTC
  To: Marcus Sundberg
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	Patrick McHardy, Amin Azez

  Hi,

On Sun, 2005-05-08 at 00:32, Marcus Sundberg wrote:
> >>You don't even need fast CPUs or networks to risk precision problems
> >>- think multiple NICs and SMP.
> > 
> > SMP or multiple NICs don't matter because at any point in time only
> > one instance of a connection can exist. The challenge is to have a
> > unique identifier over time.
> 
> Yes, having a unique identifier over time was what was being discussed,
> and I was merely pointing out that with SMP you can get two conntracks
> with identical timestamps even if you have infinite precision, since
> two new conntracks can be timestamped simultaneously by different CPUs.

  OK, but it's not a problem. If the two conntracks are identical (their
tuples are the same) then equal timestamps are not a problem (and of
course one of them will be dropped anyway, since hash insertion is
serialized). If they are not the same, then it does not matter because
the timestamp is just additional information -- the tuple identifies the
conntrack by itself.

  Probably Patrick was referring to a possible problem where the
following happens: a new connection is established and destroyed in a
very short time. If a new connection with the same tuple is created
before the timestamp increases (which is perfectly possible IMHO if you
have some slow embedded HW with no high precision timer available) then
you won't be able to tell the difference in the userspace app, so that
the race described by Amin is still possible.
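
The effect of timer resolution can be sketched by reducing a
microsecond time to ticks of a coarser clock. The helpers are
hypothetical, and HZ=100 (10 ms ticks) is only an example of a
low-resolution timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a time in microseconds to ticks of a clock running at hz
 * ticks per second (e.g. hz = 100 for 10 ms ticks). */
static uint64_t to_ticks(uint64_t t_us, uint64_t hz)
{
	return t_us * hz / 1000000u;
}

/* Same-tuple conntracks created at t1_us and t2_us are
 * indistinguishable to userspace if their creation stamps fall into
 * the same tick. */
static bool stamps_collide(uint64_t t1_us, uint64_t t2_us, uint64_t hz)
{
	return to_ticks(t1_us, hz) == to_ticks(t2_us, hz);
}
```

At HZ=100, creation times of 1 ms and 4 ms both map to tick 0 and
collide; with microsecond resolution they do not.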

-- 
 Regards,
  Krisztian Kovacs

* Re: [RFC] alternative to conntrack ID
  2005-05-09 14:17                         ` KOVACS Krisztian
@ 2005-05-09 15:08                           ` Amin Azez
  2005-05-10  6:49                             ` Harald Welte
  2005-05-17 16:12                           ` Amin Azez
  1 sibling, 1 reply; 48+ messages in thread
From: Amin Azez @ 2005-05-09 15:08 UTC
  To: KOVACS Krisztian
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	Marcus Sundberg, Patrick McHardy

KOVACS Krisztian wrote:

>  OK, but it's not a problem. If the two conntracks are identical (their
>tuples are the same) then equal timestamps are not a problem (and of
>course one of them will be dropped anyway, since hash insertion is
>serialized). If they are not the same, then it does not matter because
>the timestamp is just additional information -- the tuple identifies the
>conntrack by itself.
>
>  Probably Patrick was referring to a possible problem where the
>following happens: a new connection is established and destroyed in a
>very short time. If a new connection with the same tuple is created
>before the timestamp increases (which is perfectly possible IMHO if you
>have some slow embedded HW with no high precision timer available) then
>you won't be able to tell the difference in the userspace app, so that
>the race described by Amin is still possible.

The timestamp in an skb has seconds and microseconds.
Is there a sequence of packets that conntrack could see that would cause 
a conntrack to be created, destroyed and re-created within the same 
microsecond?

I can't imagine any slow embedded hardware being called upon to process 
a sequence of packets that occur so quickly; are installations actually 
called upon to process packets at a rate beyond their ability to time? 
(maybe so, just curious)

Perhaps, if a conntrack would be destroyed in the same timer tick in 
which it was created, it could instead be kept and destroyed later by a 
timer callback. If the conntrack is then "re-created" before it has 
actually been destroyed, a small counter in the conntrack struct, 
starting at 0, is incremented to indicate re-use. The counter would 
have to be large enough for the number of times a conntrack could be 
destroyed and resurrected within the same timer tick.
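
A sketch of the resulting stamp: the coarse creation tick plus a small
reuse counter that is bumped when a conntrack with the same tuple is
re-created within the same tick. It assumes the deferred-destruction
step keeps the previous conntrack around so its stamp is still
available; struct ct_stamp, next_stamp() and the 8-bit counter width
are hypothetical choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stamp: coarse creation tick plus a reuse counter that
 * distinguishes same-tuple conntracks created within the same tick. */
struct ct_stamp {
	uint64_t tick;
	uint8_t  reuse;
};

/* Compute the stamp for a new conntrack whose tuple was last used by a
 * conntrack stamped *prev (NULL if none is still held). If both fall in
 * the same tick, bump the counter instead of producing a duplicate.
 * Returns 0 on success, -1 if the counter would overflow. */
static int next_stamp(const struct ct_stamp *prev, uint64_t now_tick,
		      struct ct_stamp *out)
{
	out->tick = now_tick;
	out->reuse = 0;
	if (prev != NULL && prev->tick == now_tick) {
		if (prev->reuse == UINT8_MAX)
			return -1;	/* counter too small for this rate */
		out->reuse = (uint8_t)(prev->reuse + 1);
	}
	return 0;
}
```

If the counter saturates, the sketch reports an error rather than
reusing a stamp, which corresponds to the counter being too small for
the observed re-creation rate.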

It is a pity that the extra field, and deferring the destruction of 
conntracks destroyed in the same microsecond they were created in, make 
the solution less simple, but they would make it robust.

Amin

* Re: [RFC] alternative to conntrack ID
  2005-05-09 15:08                           ` Amin Azez
@ 2005-05-10  6:49                             ` Harald Welte
  0 siblings, 0 replies; 48+ messages in thread
From: Harald Welte @ 2005-05-10  6:49 UTC
  To: Amin Azez
  Cc: Patrick McHardy, Netfilter Development Mailinglist, Pablo Neira,
	Marcus Sundberg, KOVACS Krisztian

On Mon, May 09, 2005 at 04:08:49PM +0100, Amin Azez wrote:

> The timestamp in an skb has seconds and microseconds.

If you're referring to the skb receive timestamp:  That doesn't exist
for locally generated packets, and on 'real' incoming packets from the
network it isn't set by default unless some application (such as
tcpdump) requests it.

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

* Re: [RFC] alternative to conntrack ID
  2005-05-09 14:17                         ` KOVACS Krisztian
  2005-05-09 15:08                           ` Amin Azez
@ 2005-05-17 16:12                           ` Amin Azez
  2005-05-17 20:17                             ` Patrick McHardy
  2005-05-18  6:45                             ` Jozsef Kadlecsik
  1 sibling, 2 replies; 48+ messages in thread
From: Amin Azez @ 2005-05-17 16:12 UTC (permalink / raw)
  To: KOVACS Krisztian
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	Patrick McHardy

KOVACS Krisztian wrote:
>   Probably Patrick was referring to a possible problem where the
> following happens: a new connection is established and destroyed in a
> very short time. If a new connection with the same tuple is created
> before the timestamp increases (which is perfectly possible IMHO if you
> have some slow embedded HW with no high precision timer available) 

After further reading I think this scenario is highly unlikely.

I don't mean improbable; I mean: is there any such hardware?
If a socket is not to be reused until the TCP TIME_WAIT period, which is
recommended to be in the region of 4 minutes, is there really any
hardware that can't time to that resolution? Are there really any
devices that will re-use a TCP socket in the same timer tick in which
they closed it?

Any devices that re-use sockets too quickly are going to have problems
anyway (of course that doesn't mean we can ignore them), but surely
they are so buggy that they won't remain in use?

Amin

> then
> you won't be able to tell the difference in the userspace app, so that
> the race described by Amin is still possible.
> 

* Re: [RFC] alternative to conntrack ID
  2005-05-17 16:12                           ` Amin Azez
@ 2005-05-17 20:17                             ` Patrick McHardy
  2005-05-18  7:24                               ` Amin Azez
  2005-05-18  9:30                               ` Jozsef Kadlecsik
  2005-05-18  6:45                             ` Jozsef Kadlecsik
  1 sibling, 2 replies; 48+ messages in thread
From: Patrick McHardy @ 2005-05-17 20:17 UTC (permalink / raw)
  To: Amin Azez
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	KOVACS Krisztian

Amin Azez wrote:
> KOVACS Krisztian wrote:
> 
>>   Probably Patrick was referring to a possible problem where the
>> following happens: a new connection is established and destroyed in a
>> very short time. If a new connection with the same tuple is created
>> before the timestamp increases (which is perfectly possible IMHO if you
>> have some slow embedded HW with no high precision timer available) 

Exactly.

> After further reading I think this scenario is highly unlikely.

Unlikely is still enough reason to handle it properly in an API.
Otherwise anything you build on top of it has to take this into
account for any guarantees it would like to give. And so far, I
haven't even seen a suggestion how to notice it - which would
also be fine with me.

Regards
Patrick

* Re: [RFC] alternative to conntrack ID
  2005-05-17 20:17                             ` Patrick McHardy
@ 2005-05-18  7:24                               ` Amin Azez
  2005-05-18  9:30                               ` Jozsef Kadlecsik
  1 sibling, 0 replies; 48+ messages in thread
From: Amin Azez @ 2005-05-18  7:24 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	KOVACS Krisztian

Patrick McHardy wrote:

>Amin Azez wrote:
>
>>KOVACS Krisztian wrote:
>>
>>>  Probably Patrick was referring to a possible problem where the
>>>following happens: a new connection is established and destroyed in a
>>>very short time. If a new connection with the same tuple is created
>>>before the timestamp increases (which is perfectly possible IMHO if you
>>>have some slow embedded HW with no high precision timer available)
>
>Exactly.
>
>>After further reading I think this scenario is highly unlikely.
>
>Unlikely is still enough reason to handle it properly in an API.

By unlikely I didn't mean it would rarely happen; I meant that hardware
on which it could ever happen is surely unlikely to exist (a different
order of unlikeliness). However, I guess your comment below still holds.

>Otherwise anything you build on top of it has to take this into
>account for any guarantees it would like to give. And so far, I
>haven't even seen a suggestion how to notice it - which would
>also be fine with me.

One such suggestion is: IFF the conntrack is to be destroyed in the same
clock tick as it was created, instead destroy it one clock tick later
through death-by-timeout. Then the new conntrack would have to be
created (although in the same clock tick) with a different internal
conntrack id.
The costs of this would only be borne when such unusual hardware was in
use, and only when the problem case came up, but the internal conntrack
id could then be used in conjunction with the timestamp to form a unique
qualifier that (takes deep breath) could be used with the tuple to
recognize a specific conntrack instance. It would require no extra
storage but would increase the amount of data sent through the netlink
socket.
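A sketch of how a request carrying this qualifier could be matched, assuming a simplified IPv4 tuple; `struct tuple`, `struct ct_instance_key` and `key_matches()` are hypothetical names for illustration, not an existing ctnetlink interface. Fields are compared one by one rather than with memcmp(), so struct padding cannot cause false mismatches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified tuple (IPv4 only, no protocol specifics). */
struct tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

/* What this scheme would export per conntrack over netlink. */
struct ct_instance_key {
	struct tuple tuple;
	uint32_t created_tick;  /* creation timestamp in clock ticks */
	uint8_t internal_id;    /* differs only when a tick was "reused" */
};

static bool tuple_equal(const struct tuple *a, const struct tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* A request names one conntrack instance, not "whatever has this tuple". */
static bool key_matches(const struct ct_instance_key *req,
			const struct ct_instance_key *ct)
{
	return tuple_equal(&req->tuple, &ct->tuple) &&
	       req->created_tick == ct->created_tick &&
	       req->internal_id == ct->internal_id;
}
```

A reborn entry with the same tuple then fails the match unless both its creation tick and its internal id coincide with the ones the user saw.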

This would still offer some slight benefit over a public conntrack 
serial number in that it would also allow conntrack creation time 
matching in iptables rules.

I do point out, and wonder about, the possibility of a denial of service
through queueing lots of conntracks to be destroyed by timeout 1 tick
later, but I think this is hardly any worse in practice than without a
timeout.

Another hacky "policy" fix would be to drop the SYN packet that would
re-create the conntrack in the same tick as its original creation and
let it be sent again. It's barely normal behaviour to do such a thing,
so such packets deserve to be dropped (for the sins of their parents?
Hmm). Would such packets get re-sent via a loopback interface? But then
again, devices that abuse themselves in such a way, beyond the
resolution of their own timers, are surely on drugs?

Amin

* Re: [RFC] alternative to conntrack ID
  2005-05-17 20:17                             ` Patrick McHardy
  2005-05-18  7:24                               ` Amin Azez
@ 2005-05-18  9:30                               ` Jozsef Kadlecsik
  2005-06-04 23:52                                 ` Pablo Neira
  1 sibling, 1 reply; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-05-18  9:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

On Tue, 17 May 2005, Patrick McHardy wrote:

> Amin Azez wrote:
> > KOVACS Krisztian wrote:
> >
> >>   Probably Patrick was referring to a possible problem where the
> >> following happens: a new connection is established and destroyed in a
> >> very short time. If a new connection with the same tuple is created
> >> before the timestamp increases (which is perfectly possible IMHO if you
> >> have some slow embedded HW with no high precision timer available)
>
> Exactly.
>
> > After further reading I think this scenario is highly unlikely.
>
> Unlikely is still enough reason to handle it properly in an API.
> Otherwise anything you build on top of it has to take this into
> account for any guarantees it would like to give. And so far, I
> haven't even seen a suggestion how to notice it - which would
> also be fine with me.

I think we should not state any guarantee here. Conntrack entries are
uniquely identified by tuples, that's all we should say.

There *is* a certain ambiguity when, during the kernel-userspace
communication, a conntrack entry is deleted and a new one with the same
tuples is created, but that can be documented clearly.

In order to create unique identification of conntrack entries, there were
a couple of clever suggestions, all of them burdened by something:

- timer based solutions may be not fine-grained enough
- pointer of conntrack is not unique as it can be reused
- id creates a new possible bottleneck

What can go wrong if a reborn conntrack entry is deleted instead of
the original one?

If the conntrack entry is to be dropped due to a change in policy, then
what we did is just fine! If there was a "stuck" conntrack entry and the
admin was going to delete it manually but it went away and he deleted the
new conntrack entry, that's an unfortunate event - but the user was in
trouble anyway.

So - do we really need such accuracy?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [RFC] alternative to conntrack ID
  2005-05-18  9:30                               ` Jozsef Kadlecsik
@ 2005-06-04 23:52                                 ` Pablo Neira
  2005-06-05  1:02                                   ` Pablo Neira
  2005-06-06  8:17                                   ` Jozsef Kadlecsik
  0 siblings, 2 replies; 48+ messages in thread
From: Pablo Neira @ 2005-06-04 23:52 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Jozsef Kadlecsik wrote:
> On Tue, 17 May 2005, Patrick McHardy wrote:
>>Amin Azez wrote:
>>
>>>KOVACS Krisztian wrote:
>>>
>>>>  Probably Patrick was referring to a possible problem where the
>>>>following happens: a new connection is established and destroyed in a
>>>>very short time. If a new connection with the same tuple is created
>>>>before the timestamp increases (which is perfectly possible IMHO if you
>>>>have some slow embedded HW with no high precision timer available)
>>
>>Exactly.
>>
>>>After further reading I think this scenario is highly unlikely.
>>
>>Unlikely is still enough reason to handle it properly in an API.
>>Otherwise anything you build on top of it has to take this into
>>account for any guarantees it would like to give. And so far, I
>>haven't even seen a suggestion how to notice it - which would
>>also be fine with me.
> 
> I think we should not state any guarantee here. Conntrack entries are
> uniquely identified by tuples, that's all we should say.
> 
> There *is* a certain ambiguity, when, during the kernel-userspace
> communication, a conntrack entry is deleted and a new one with the same
> tuples is created, but that can be documented clearly.
> 
> In order to create unique identification of conntrack entries, there were
> a couple of clever suggestions, all of them burdened by something:
> 
> - timer based solutions may be not fine-grained enough
> - pointer of conntrack is not unique as it can be reused
> - id creates a new possible bottleneck
> 
> What wrong can happen, if a reborn conntrack entry is deleted instead of
> the original one?
> 
> If the conntrack entry is to be dropped due to a change in policy, then
> what we did is just fine! If there was a "stuck" conntrack entry and the
> admin was going to delete it manually but it went away and he deleted the
> new conntrack entry, that's an unfortunate event - but the user was in
> trouble anyway.
> 
> So - do we really need such accuracy?

I want to give this issue another spin. A small digest about this ID thing:

+ The unique ID eats 8 extra bytes, since it will be a __u64. On my
laptop (1787 buckets, 14296 max), that makes 114368 extra bytes (worst
case).

+ "Slow" devices. As Krisztian and Patrick pointed out, a conntrack
could be destroyed while the user is trying to kill it, and then
another conntrack is created with the same tuples. Result: the user
kills a connection that he didn't mean to.

+ If we've got an ID, the user can decide whether or not he wants such
accuracy when killing connections. If not, we would need to document
this issue.

I'd definitely like to have such accuracy, but I still consider this
case unlikely. I think a TCP stack must be broken if it starts a brand
new connection using the same source/destination ports that it has
recently used.

--
Pablo

* Re: [RFC] alternative to conntrack ID
  2005-06-04 23:52                                 ` Pablo Neira
@ 2005-06-05  1:02                                   ` Pablo Neira
  2005-06-06  8:48                                     ` Jozsef Kadlecsik
  2005-06-06  8:17                                   ` Jozsef Kadlecsik
  1 sibling, 1 reply; 48+ messages in thread
From: Pablo Neira @ 2005-06-05  1:02 UTC (permalink / raw)
  To: Pablo Neira
  Cc: Netfilter Development Mailinglist, Patrick McHardy,
	Jozsef Kadlecsik

Pablo Neira wrote:
> I'd definitely like to have such accuracy, but I still see this incident 
> unlikely. I think that such TCP stack must be broken if it starts a 
> brand new connection using the same source/destination ports that it's 
> recently used.

Forget this: it can happen in an attempt to reopen a closed
connection, and such a case is likely. We need such an ID in order to
achieve accuracy. I think it must be the user who chooses whether he
wants accuracy or not; in that case we have to provide the corresponding
methods to achieve it. A user could kill a conntrack by means of:

a) the tuples, if he doesn't want accuracy
b) the tuples + the id, if he does.
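Options a) and b) could be served by one lookup with an optional id, as in this hedged sketch; the flat array, the integer tuple and `ct_kill()` are hypothetical simplifications of the hash table. The test replays the "slow device" race: the user saw id 5, but by the time the request arrives that entry has been replaced by id 9 with the same tuple.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry: the tuple is reduced to one integer. */
struct ct {
	uint32_t tuple;
	uint64_t id;
	bool alive;
};

/*
 * Delete request. With have_id == false only the tuple is compared
 * (option a); with have_id == true the id must match too (option b).
 * Returns the number of entries removed.
 */
static int ct_kill(struct ct *table, size_t n, uint32_t tuple,
		   bool have_id, uint64_t id)
{
	int killed = 0;

	for (size_t i = 0; i < n; i++) {
		struct ct *ct = &table[i];

		if (!ct->alive || ct->tuple != tuple)
			continue;
		if (have_id && ct->id != id)
			continue;  /* same tuple, but a newer instance */
		ct->alive = false;
		killed++;
	}
	return killed;
}
```

With the id the stale request is a no-op; without it, the newer connection is killed, which is exactly the inaccuracy described above.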

--
Pablo

* Re: [RFC] alternative to conntrack ID
  2005-06-05  1:02                                   ` Pablo Neira
@ 2005-06-06  8:48                                     ` Jozsef Kadlecsik
  2005-06-09 12:52                                       ` Pablo Neira
  0 siblings, 1 reply; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-06-06  8:48 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Hi Pablo,

On Sun, 5 Jun 2005, Pablo Neira wrote:

> Pablo Neira wrote:
> > I'd definitely like to have such accuracy, but I still see this incident
> > unlikely. I think that such TCP stack must be broken if it starts a
> > brand new connection using the same source/destination ports that it's
> > recently used.
>
> Forget this, this can happen in an attempt to reopen a closed
> connection, and such case is likely. We need such ID in order to achieve
> accuracy. I think that it must be the user who has to choose if he wants
> accuracy or not, in such case we have to provide the corresponding
> methods to achieve it. A user could kill a conntrack by means of:
>
> a) the tuples, if he doesn't want accuracy
> b) the tuples + the id, if he does.

I share your feelings about giving users completely accurate access to
conntrack entries. Still, I'm not completely convinced about the
practical usefulness of such accuracy. Let's therefore look at it again:

a. Policy changed and admin wants to enforce the new policy on the living
   conntrack entries as well: here the id does not buy anything, tuples
   are just sufficient.
b. Admin wants to kill a "stuck" conntrack entry, in order to make it
   possible to build up a new connection. In my opinion that's just not
   the proper way to deal with the problem, conntrack should be able to
   handle such cases automatically. And I believe we worked very hard
   and that part is highly polished in conntrack in the recent 2.6
   tree, so that it's just a theoretical example ;-)
c. conntrack table is full and admin wants to get rid of a bunch of
   entries manually. Somehow I don't think id would be very useful here
   either.

Other possibilities?

I do not think users should poke at conntrack on a whim, without a very
good reason.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [RFC] alternative to conntrack ID
  2005-06-06  8:48                                     ` Jozsef Kadlecsik
@ 2005-06-09 12:52                                       ` Pablo Neira
  2005-06-09 13:00                                         ` Pablo Neira
  0 siblings, 1 reply; 48+ messages in thread
From: Pablo Neira @ 2005-06-09 12:52 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Hi Jozsef,

Jozsef Kadlecsik wrote:
>>Pablo Neira wrote:
>>
>>>I'd definitely like to have such accuracy, but I still see this incident
>>>unlikely. I think that such TCP stack must be broken if it starts a
>>>brand new connection using the same source/destination ports that it's
>>>recently used.
>>
>>Forget this, this can happen in an attempt to reopen a closed
>>connection, and such case is likely. We need such ID in order to achieve
>>accuracy. I think that it must be the user who has to choose if he wants
>>accuracy or not, in such case we have to provide the corresponding
>>methods to achieve it. A user could kill a conntrack by means of:
>>
>>a) the tuples, if he doesn't want accuracy
>>b) the tuples + the id, if he does.
> 
> 
> I share your feelings about giving complete accurate access to the users
> over conntrack entries. Still, I'm not completely convinced about the
> practical usefulness of such accuracy. Let's therefore look at it again:
> 
> a. Policy changed and admin wants to enforce the new policy on the living
>    conntrack entries as well: here the id does not buy anything, tuples
>    are just sufficient.
> b. Admin wants to kill a "stuck" conntrack entry, in order to make
>    possible to build up a new connection. In my opinion that's just not
>    the proper way to deal with the problem, conntrack should be able to
>    handle such cases automatically. And I believe we worked very hard
>    and that part is highly polished in conntrack in the recent 2.6
>    tree, so that it's just a theoretical example ;-)
> c. conntrack table is full and admin wants to get rid of a bunch of
>    entries manually. Somehow I don't think id would be very useful here
>    either.
> 
> Other possibilities?
> 
> I do not think users should poke conntrack without very good reason, at
> their whim.

Agreed, those scenarios look pretty realistic.

But if the ID goes out, I'll have another concern.
ctnetlink_dump_table[_w] currently uses the ID to know where it stopped
dumping the conntrack table, since netlink dumping is not atomic. I
could increase the conntrack refcount and hold a pointer to it, but if
the timeout expires while data is being returned to user space, the
conntrack won't be in the hashes anymore, so the dump couldn't continue
its walk through the conntrack table.
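Why a value cursor survives where a held pointer does not can be shown with a small user-space model of a non-atomic dump. This is a sketch, assuming entries in a bucket are kept in increasing id order and ids start at 1. `dump_pass()` is hypothetical, and `*cursor` only plays the role that cb->args[] plays in a real netlink dump callback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bucket, entries kept in increasing id order. */
struct ct {
	uint64_t id;
	int alive;   /* 0 once the entry has expired and left the hash */
};

/*
 * One pass of a non-atomic dump: copy at most 'room' ids into out[],
 * remember the last one in *cursor and return how many were copied
 * (0 means the dump is finished). The cursor is a value, not a
 * pointer, so it stays usable even if the entry it names expires
 * before the next pass.
 */
static size_t dump_pass(const struct ct *list, size_t n, uint64_t *cursor,
			uint64_t *out, size_t room)
{
	size_t copied = 0;

	for (size_t i = 0; i < n && copied < room; i++) {
		if (!list[i].alive || list[i].id <= *cursor)
			continue;   /* gone, or already dumped */
		out[copied++] = list[i].id;
		*cursor = list[i].id;
	}
	return copied;
}
```

The `id <= *cursor` test is only correct while ids increase monotonically in list order, which is why id wraparound matters for this approach.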

I thought about freezing the conntrack timer and reactivating it once I
continue traversing the list. That could result in problems: since
netlink dumping is not atomic, someone could interrupt the dump and
that conntrack would be stuck there forever. Moreover, I don't like it.

Everything I've thought of so far is burdened by something. The ID
looks like the cleanest way to do this. Any other ideas?

P.S.: Thanks to Krisztian Kovacs for the feedback.

--
Pablo

* Re: [RFC] alternative to conntrack ID
  2005-06-09 12:52                                       ` Pablo Neira
@ 2005-06-09 13:00                                         ` Pablo Neira
  2005-06-09 13:34                                           ` Jozsef Kadlecsik
  0 siblings, 1 reply; 48+ messages in thread
From: Pablo Neira @ 2005-06-09 13:00 UTC (permalink / raw)
  To: Pablo Neira
  Cc: Netfilter Development Mailinglist, Patrick McHardy,
	Jozsef Kadlecsik

Pablo Neira wrote:
> All the things I've thought so far are burned by something. The cleanest
> way to do this looks the ID. Any other ideas?

Hm, this idea just popped into my head. We could use an unsigned 8-bit
per-bucket id that, together with the tuple, could uniquely identify a
conntrack (and make Patrick sleep with both eyes closed), reduce memory
consumption (and make Jozsef happier) and fix my problem with the
conntrack table dumping (and let Pablo drink beer calmly).

Anything else?

--
Pablo

* Re: [RFC] alternative to conntrack ID
  2005-06-09 13:00                                         ` Pablo Neira
@ 2005-06-09 13:34                                           ` Jozsef Kadlecsik
  2005-06-10 10:21                                             ` Pablo Neira
  0 siblings, 1 reply; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-06-09 13:34 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Hi Pablo,

On Thu, 9 Jun 2005, Pablo Neira wrote:

> Pablo Neira wrote:
> > All the things I've thought so far are burned by something. The cleanest
> > way to do this looks the ID. Any other ideas?
>
> Hm, this idea just came to my head. We could use a unsigned 8 bit
> per-bucket-id that, together with the tuple, could uniquely identify a
> conntrack (and make Patrick sleep with both eyes closed), reduce memory
> consumption (and make Jozsef happier) and fix my problem of the
> conntrack table dumping (let Pablo drinks beer calmly).

Let the id be at least an unsigned 16-bit value. No hash function can
guarantee that a given bucket won't happen to grow above 256 entries.

How are you going to handle id collision due to wraparound?

I like the idea! A per-bucket id doesn't destroy what one can gain by
per-bucket locking ;-). (But the latter would require something more
scalable than the unconfirmed list as well...)

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [RFC] alternative to conntrack ID
  2005-06-09 13:34                                           ` Jozsef Kadlecsik
@ 2005-06-10 10:21                                             ` Pablo Neira
  2005-06-13  7:41                                               ` Jozsef Kadlecsik
  0 siblings, 1 reply; 48+ messages in thread
From: Pablo Neira @ 2005-06-10 10:21 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Hi Jozsef,

Jozsef Kadlecsik wrote:
>>Pablo Neira wrote:
>>
>>>All the things I've thought so far are burned by something. The cleanest
>>>way to do this looks the ID. Any other ideas?
>>
>>Hm, this idea just came to my head. We could use a unsigned 8 bit
>>per-bucket-id [blah... blah... blah]
> 
> Let the id be at least unsigned 16 bit. No hash function can guarantee
> that a given bucket won't happen to grow above 256 entries.
> 
> How are you going to handle id collision due to wraparound?

Yes, 8 bits is too short. About the wraparound problem, I'm planning to
re-use ids. The id of a new conntrack will be set to that of the latest
one inserted in the bucket, plus one.

However, this wouldn't uniquely identify a conntrack. Say a connection is
established and the latest conntrack in the bucket uses id A, so its id
will be set to A+1. After quite some time the connection is closed.
Then, within a very short period of time, another connection with the
same tuples is established while the latest conntrack id is still A; in
that case the id of the new conntrack will be set to A+1 again.

To avoid that, I could keep an array with the latest id released per
bucket and set the id based on:

	if (id_of_latest_ct_inserted > latest_id_released[bucket])
		new_id = id_of_latest_ct_inserted + 1;
	else
		new_id = latest_id_released[bucket] + 1;

About memory consumption. On my laptop, with ip_conntrack version 2.1
(1787 buckets, 14296 max):

This approach:
1787 * 2 = 3574 extra bytes to store the latest id used
14296 * 2 (extra bytes per conntrack) = 28592 extra bytes (worst case)

Compared with the current u64 id approach:
14296 * 8 (extra bytes per conntrack) = 114368 extra bytes (worst case)
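The figures above can be reproduced with two helper functions. They are a back-of-the-envelope sketch (the names are hypothetical) and ignore struct padding and alignment, which may make the real cost higher.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-bucket 16-bit scheme: one u16 "latest id" per bucket plus one
 * u16 id in every conntrack (worst case: table completely full). */
static size_t per_bucket_id_overhead(size_t buckets, size_t max_conntracks)
{
	return buckets * sizeof(uint16_t) + max_conntracks * sizeof(uint16_t);
}

/* Global id: one u64 in every conntrack (worst case: table full). */
static size_t u64_id_overhead(size_t max_conntracks)
{
	return max_conntracks * sizeof(uint64_t);
}
```

For 1787 buckets and 14296 conntracks this gives 32166 bytes (3574 + 28592) for the per-bucket scheme versus 114368 bytes for the u64 id.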

--
Pablo

* Re: [RFC] alternative to conntrack ID
  2005-06-10 10:21                                             ` Pablo Neira
@ 2005-06-13  7:41                                               ` Jozsef Kadlecsik
  2005-06-14  2:30                                                 ` Pablo Neira
  0 siblings, 1 reply; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-06-13  7:41 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Hi Pablo,

On Fri, 10 Jun 2005, Pablo Neira wrote:

> > How are you going to handle id collision due to wraparound?
>
> yes, 8 bits is too short. About the wraparound problem, I'm planning to
> re-use id's. The id of a new conntrack will be set to the latest
> inserted in the bucket plus one.
>
> However this wouldn't uniquely identify a conntrack: Say a connection is
> established, latest conntrack in the bucket uses id A, so its id will
> be set to A+1. After quite some time the connection is closed. Then, in
> a very short period of time, another connection with the same tuples is
> established and latest conntrack id is still A, in that case the id of
> the new conntrack will be set to A+1 again.

Wouldn't it be more straightforward to store the last assigned id value in
the bucket and simply increment that whenever the next value is used up?
(Id collision is actually not a big problem, because the entries are
identified by the tuples in the first place.)
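A minimal sketch of that suggestion, assuming a 16-bit counter stored in each bucket (`struct bucket` and `bucket_next_id()` are hypothetical). Because the counter never goes back, a tuple that is closed and re-opened in the same bucket gets a new id, and unsigned wraparound from 65535 to 0 is well defined in C.

```c
#include <stdint.h>

/* Hypothetical per-bucket state: the last id handed out in this bucket. */
struct bucket {
	uint16_t last_id;
};

/*
 * Id for a conntrack inserted into bucket b: just bump the counter.
 * It wraps from 65535 to 0; an occasional collision is tolerable
 * because entries are compared by (tuple, id), not by id alone.
 */
static uint16_t bucket_next_id(struct bucket *b)
{
	return ++b->last_id;
}
```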

At dumping we could use the flip-bit solution: entries which have already
been dumped are marked with the next value of the bit. Of course, user
requests for dumping must be serialized, but conntrack replication could
benefit from such a scheme, because new entries could be added to the
conntrack table and replicated during full conntrack table replication
as well.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [RFC] alternative to conntrack ID
  2005-06-13  7:41                                               ` Jozsef Kadlecsik
@ 2005-06-14  2:30                                                 ` Pablo Neira
  2005-06-14  2:42                                                   ` Patrick McHardy
  0 siblings, 1 reply; 48+ messages in thread
From: Pablo Neira @ 2005-06-14  2:30 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Hello Jozsef,

Jozsef Kadlecsik wrote:
> On Fri, 10 Jun 2005, Pablo Neira wrote:
> 
> 
>>>How are you going to handle id collision due to wraparound?
>>
>>yes, 8 bits is too short. About the wraparound problem, I'm planning to
>>re-use id's. The id of a new conntrack will be set to the latest
>>inserted in the bucket plus one.
>>
>>However this wouldn't uniquely identify a conntrack: Say a connection is
>>established, latest conntrack in the bucket uses id A, so its id will
>>be set to A+1. After quite some time the connection is closed. Then, in
>>a very short period of time, another connection with the same tuples is
>>established and latest conntrack id is still A, in that case the id of
>>the new conntrack will be set to A+1 again.
> 
> 
> Wouldn't be more straightforward to store the last assigned id value in
> the bucket and simply increment that whenever the next value is used up?
> (Id collision is actually not a big problem, because the entries are
> identified by the tuples in the first place.)

Right, but then I'll have to face another problem: once the wraparound
happens, the conntracks inserted in the bucket aren't ordered by id
anymore. Currently, if the skbuff that is going to be sent to user space
via netlink gets full (it is one page in size), I need to know which was
the latest processed conntrack, including possible race conditions, i.e.
the conntrack expires while netlink is returning the packet to user
space. This is controlled by the following branch while iterating over
the list:

                         if (ct->id <= cb->args[1])
                                 continue;

That's why I came up with the idea of re-using ids: I want to avoid a
wraparound. BTW, inserting conntracks in order isn't a solution either,
since this would break LRU early drop.

> At dumping we could use the flip-bit solution: entries which are already
> dumped were marked with the next value of the bit. Of course user requests
> for dumping must be serialized, but conntrack replication could benefit
> from such schema, because new entries could be added to the conntrack
> table and replicated during full conntrack table replication as well.

Could you elaborate on this idea about the flip-bit solution, please? It
looks interesting.

I'm still looking for a solution based on simpler logic, we'll see ;).

--
Pablo

* Re: [RFC] alternative to conntrack ID
  2005-06-14  2:30                                                 ` Pablo Neira
@ 2005-06-14  2:42                                                   ` Patrick McHardy
  2005-06-15  2:41                                                     ` Pablo Neira
  2005-06-20 16:04                                                     ` Amin Azez
  0 siblings, 2 replies; 48+ messages in thread
From: Patrick McHardy @ 2005-06-14  2:42 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Jozsef Kadlecsik

On Tue, 14 Jun 2005, Pablo Neira wrote:
>> At dumping we could use the flip-bit solution: entries which are already
>> dumped were marked with the next value of the bit. Of course user requests
>> for dumping must be serialized, but conntrack replication could benefit
>> from such schema, because new entries could be added to the conntrack
>> table and replicated during full conntrack table replication as well.
>
> Could you elaborate this idea about the flip-bit solution, please? looks 
> interesting.

You only allow one process to dump the table at a time. In each
conntrack entry you flip a bit when dumping it. When continuing,
you continue with the next entry that still has the bit unflipped.
This way you don't need an ID at all. You need a timeout to make
sure a hung process isn't blocking dumps forever. A maliciously
acting process could probably still block others for a long time,
but dumping the conntrack table should only be possible with
root privileges anyway. When a dump is interrupted, the state of
the bits is inconsistent; in this case you need to reset all of
them to a known state.
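The description above can be sketched as a single-threaded model. It is a hedged illustration, not netfilter code: `struct dump_state`, `dump_begin()`, `dump_pass()` and `dump_abort()` are hypothetical, the table is a flat array, and the timeout for a hung dumper is not modelled. Between dumps every entry's bit equals `current`; a dump flips bits to `!current`, and when a pass finds nothing left the flipped value becomes the new `current`, so a completed dump needs no reset. In this sketch, entries created during a dump would be given `current` so the ongoing dump still picks them up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: only the flip bit matters here. */
struct ct {
	int id;
	bool dumped_bit;
};

/* Hypothetical global dump state. */
struct dump_state {
	bool current;      /* bit value every entry has between dumps */
	bool in_progress;
};

/* Serialize dumpers: only one dump may run at a time. */
static bool dump_begin(struct dump_state *s)
{
	if (s->in_progress)
		return false;
	s->in_progress = true;
	return true;
}

/*
 * One non-atomic pass: emit up to 'room' entries whose bit is not yet
 * flipped, flipping it. No id or cursor is needed to resume.
 * Returns the number emitted; 0 means the dump is complete.
 */
static size_t dump_pass(struct dump_state *s, struct ct *t, size_t n,
			int *out, size_t room)
{
	bool target = !s->current;
	size_t copied = 0;

	for (size_t i = 0; i < n && copied < room; i++) {
		if (t[i].dumped_bit == target)
			continue;          /* already dumped in this run */
		t[i].dumped_bit = target;
		out[copied++] = t[i].id;
	}
	if (copied == 0) {             /* done: flipped value is now current */
		s->current = target;
		s->in_progress = false;
	}
	return copied;
}

/* Interrupted dump: bits are mixed, so reset all of them to 'current'. */
static void dump_abort(struct dump_state *s, struct ct *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		t[i].dumped_bit = s->current;
	s->in_progress = false;
}
```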

Regards
Patrick

* Re: [RFC] alternative to conntrack ID
  2005-06-14  2:42                                                   ` Patrick McHardy
@ 2005-06-15  2:41                                                     ` Pablo Neira
  2005-06-20 16:04                                                     ` Amin Azez
  1 sibling, 0 replies; 48+ messages in thread
From: Pablo Neira @ 2005-06-15  2:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Jozsef Kadlecsik

Patrick McHardy wrote:
> On Tue, 14 Jun 2005, Pablo Neira wrote:
> 
>>> At dumping we could use the flip-bit solution: entries which are already
>>> dumped are marked with the next value of the bit. Of course user requests
>>> for dumping must be serialized, but conntrack replication could benefit
>>> from such a scheme, because new entries could be added to the conntrack
>>> table and replicated during full conntrack table replication as well.
>>
>>
>> Could you elaborate on this idea about the flip-bit solution, please?
>> It looks interesting.
> 
> 
> You only allow one process to dump the table at a time. In each
> conntrack entry you flip a bit when dumping it. When continuing,
> you continue with the next entry that has the bit unflipped.
> This way you don't need an ID at all. You need a timeout to make
> sure a hung process isn't blocking dumps forever. A maliciously
> acting process could probably still block others for a long time,
> but dumping the conntrack table should only be possible with
> root privileges anyway. When a dump is interrupted, the state of
> the bits is inconsistent; in this case you need to reset all of
> them to a known state.

This would complicate the logic. Moreover, a top-like process dumping
the conntrack table every x seconds could be such an "evil" process. I'm
going to stick with the idea of using a u64 id. It's the simpler solution
at the moment.

--
Pablo

* Re: [RFC] alternative to conntrack ID
  2005-06-14  2:42                                                   ` Patrick McHardy
  2005-06-15  2:41                                                     ` Pablo Neira
@ 2005-06-20 16:04                                                     ` Amin Azez
  2005-06-20 16:12                                                       ` Patrick McHardy
  1 sibling, 1 reply; 48+ messages in thread
From: Amin Azez @ 2005-06-20 16:04 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Jozsef Kadlecsik

Patrick McHardy wrote:
> On Tue, 14 Jun 2005, Pablo Neira wrote:
> 
>>> At dumping we could use the flip-bit solution: entries which are already
>>> dumped are marked with the next value of the bit. Of course user requests
>>> for dumping must be serialized, but conntrack replication could benefit
>>> from such a scheme, because new entries could be added to the conntrack
>>> table and replicated during full conntrack table replication as well.
>>
>>
>> Could you elaborate on this idea about the flip-bit solution, please?
>> It looks interesting.
> 
> 
> You only allow one process to dump the table at a time. In each
> conntrack entry you flip a bit when dumping it. When continuing,
> you continue with the next entry that has the bit unflipped.
> This way you don't need an ID at all. You need a timeout to make
> sure a hung process isn't blocking dumps forever. A maliciously
> acting process could probably still block others for a long time,
> but dumping the conntrack table should only be possible with
> root privileges anyway. When a dump is interrupted, the state of
> the bits is inconsistent; in this case you need to reset all of
> them to a known state.

My uses for conntrack include statistics and analysis, and reducing
race conditions when taking actions on a particular conntrack.

I need some kind of conntrack ID that will be consistent in the medium
term across different conntrack manipulations.

Amin

* Re: [RFC] alternative to conntrack ID
  2005-06-20 16:04                                                     ` Amin Azez
@ 2005-06-20 16:12                                                       ` Patrick McHardy
  2005-06-22  9:09                                                         ` Amin Azez
  0 siblings, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-06-20 16:12 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist, Jozsef Kadlecsik

Amin Azez wrote:
> My uses for conntrack include statistics and analysis, and reducing
> race conditions when taking actions on a particular conntrack.
> 
> I need some kind of conntrack ID that will be consistent in the medium
> term across different conntrack manipulations.

That is why I've always argued in favour of the ID. Since it's needed for
other reasons too, I suggest we just keep it and move on.

Regards
Patrick

* Re: [RFC] alternative to conntrack ID
  2005-06-20 16:12                                                       ` Patrick McHardy
@ 2005-06-22  9:09                                                         ` Amin Azez
  2005-06-22  9:30                                                           ` Oscar Mechanic
  2005-06-22 17:23                                                           ` Patrick McHardy
  0 siblings, 2 replies; 48+ messages in thread
From: Amin Azez @ 2005-06-22  9:09 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Jozsef Kadlecsik

Patrick McHardy wrote:

>Amin Azez wrote:
>  
>
>>My uses for conntrack include statistics and analysis, and reducing
>>race conditions when taking actions on a particular conntrack.
>>
>>I need some kind of conntrack ID that will be consistent in the medium
>>term across different conntrack manipulations.
>>    
>>
>
>That is why I've always argued in favour of the ID. Since it's needed for
>other reasons too, I suggest we just keep it and move on.
>  
>
Err... the current problem is that the conntrack id _may_ be re-used 
within milli-seconds?
I was trying to find a safe conntrack id.

Amin

* Re: [RFC] alternative to conntrack ID
  2005-06-22  9:09                                                         ` Amin Azez
@ 2005-06-22  9:30                                                           ` Oscar Mechanic
  2005-06-22 17:23                                                           ` Patrick McHardy
  1 sibling, 0 replies; 48+ messages in thread
From: Oscar Mechanic @ 2005-06-22  9:30 UTC (permalink / raw)
  To: Amin Azez
  Cc: Netfilter Development Mailinglist, Patrick McHardy,
	Jozsef Kadlecsik

I was thinking about something like using a random number, or a multiplier
or divider on the connection parameters. One thought: from ip_conntrack_max
and the number of buckets you have an approximate number of connections
that is feasible to pass, e.g. 32k. So the conntrack ID goes from 0 to 32k.
These could be treated like slots: if the ID goes over 32k, start from the
bottom again and find an empty slot.

Quite a simple suggestion, and it probably eloquently displays that I
don't know what I am talking about.

This is not going to be unique for accounting, and I don't think anything
you choose can assure that, as we are dealing with a state machine.


On Wed, 2005-06-22 at 10:09 +0100, Amin Azez wrote:
> Patrick McHardy wrote:
> 
> >Amin Azez wrote:
> >  
> >
> >>My uses for conntrack include statistics and analysis, and reducing
> >>race conditions when taking actions on a particular conntrack.
> >>
> >>I need some kind of conntrack ID that will be consistent in the medium
> >>term across different conntrack manipulations.
> >>    
> >>
> >
> >That is why I've always argued in favour of the ID. Since it's needed for
> >other reasons too, I suggest we just keep it and move on.
> >  
> >
> Err... the current problem is that the conntrack id _may_ be re-used 
> within milli-seconds?
> I was trying to find a safe conntrack id.
> 
> Amin
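Oscar's slot idea amounts to a next-fit allocator over a fixed ID range. A minimal sketch, assuming 8 slots in place of the 32k he mentions and hypothetical names `slot_alloc()` and `slot_free()`: IDs wrap at the table size and the first free slot after the last allocation is taken. It also shows the caveat he raises himself: a freed ID is handed out again as soon as the search reaches it, so the ID alone does not identify a connection over time.

```c
#include <stdbool.h>
#include <stddef.h>

#define CT_SLOTS 8	/* stands in for ip_conntrack_max, e.g. 32k */

struct slot_ids {
	bool used[CT_SLOTS];
	size_t next;	/* where the next search starts */
};

/* Next-fit: scan forward from `next`, wrapping at CT_SLOTS, and take the
 * first free slot. Returns the slot number, or -1 if all are in use. */
long slot_alloc(struct slot_ids *s)
{
	size_t n;

	for (n = 0; n < CT_SLOTS; n++) {
		size_t i = (s->next + n) % CT_SLOTS;

		if (!s->used[i]) {
			s->used[i] = true;
			s->next = (i + 1) % CT_SLOTS;
			return (long)i;
		}
	}
	return -1;
}

/* Release an ID when its conntrack is destroyed. */
void slot_free(struct slot_ids *s, long id)
{
	if (id >= 0 && id < CT_SLOTS)
		s->used[id] = false;
}
```

Starting from a full table, `slot_free(&s, 3)` followed by `slot_alloc(&s)` returns 3 again, which is exactly the reuse that makes such IDs unsuitable for accounting.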

* Re: [RFC] alternative to conntrack ID
  2005-06-22  9:09                                                         ` Amin Azez
  2005-06-22  9:30                                                           ` Oscar Mechanic
@ 2005-06-22 17:23                                                           ` Patrick McHardy
  2005-07-11  5:41                                                             ` Harald Welte
  1 sibling, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-06-22 17:23 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist, Jozsef Kadlecsik

Amin Azez wrote:
> Err... the current problem is that the conntrack id _may_ be re-used
> within milli-seconds?
> I was trying to find a safe conntrack id.

No, it is 64 bits wide and does not wrap for a long time.

Regards
Patrick
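To put Patrick's answer in numbers: a counter b bits wide that is incremented once per new conntrack wraps after 2^b allocations. The sketch below assumes, purely for illustration, a sustained rate of one million new conntracks per second (a figure not taken from the thread); the helper names are hypothetical. At that rate a 32-bit ID wraps after about 4295 seconds (roughly 72 minutes), while a 64-bit ID lasts for more than half a million years.

```c
/* Number of distinct values of an unsigned counter `bits` wide. Powers of
 * two are exact in a double, so this also works for bits == 64, where an
 * integer shift (1ULL << 64) would be undefined. */
double id_space(unsigned int bits)
{
	double v = 1.0;

	while (bits--)
		v *= 2.0;
	return v;
}

/* Seconds until a monotonically increasing ID wraps, assuming a constant
 * rate of new conntracks per second. */
double seconds_to_wrap(unsigned int bits, double new_per_sec)
{
	return id_space(bits) / new_per_sec;
}

/* Same figure in Julian years (365.25 days). */
double years_to_wrap(unsigned int bits, double new_per_sec)
{
	return seconds_to_wrap(bits, new_per_sec) / (365.25 * 24.0 * 3600.0);
}
```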

* Re: [RFC] alternative to conntrack ID
  2005-06-22 17:23                                                           ` Patrick McHardy
@ 2005-07-11  5:41                                                             ` Harald Welte
  2005-07-11  7:47                                                               ` Patrick McHardy
  0 siblings, 1 reply; 48+ messages in thread
From: Harald Welte @ 2005-07-11  5:41 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Netfilter Development Mailinglist, Amin Azez, Jozsef Kadlecsik

On Wed, Jun 22, 2005 at 07:23:20PM +0200, Patrick McHardy wrote:
> Amin Azez wrote:
> > Err... the current problem is that the conntrack id _may_ be re-used
> > within milli-seconds?
> > I was trying to find a safe conntrack id.
> 
> No, it is 64 bits wide and does not wrap for a long time.

I'm still not convinced that the ID is a good idea (or that it is needed
in most cases).

However, flow-based accounting is basically finished; all that it lacks
is nfnetlink/ctnetlink.  So I want to submit them pretty soon for
mainline inclusion.

If you have decided on which form of ID to use, please try to merge those
patches (if any) soon and tell me when I can finalize ctnetlink/nfnetlink
for submission.
 
Thanks!

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

* Re: [RFC] alternative to conntrack ID
  2005-07-11  5:41                                                             ` Harald Welte
@ 2005-07-11  7:47                                                               ` Patrick McHardy
  2005-07-11  9:50                                                                 ` Pablo Neira
  0 siblings, 1 reply; 48+ messages in thread
From: Patrick McHardy @ 2005-07-11  7:47 UTC (permalink / raw)
  To: Harald Welte
  Cc: Netfilter Development Mailinglist, Pablo Neira, Amin Azez,
	Jozsef Kadlecsik

Harald Welte wrote:
> I'm still not convinced that the ID is a good idea (or that it is needed
> in most cases).
> 
> However, flow-based accounting is basically finished; all that it lacks
> is nfnetlink/ctnetlink.  So I want to submit them pretty soon for
> mainline inclusion.
> 
> If you have decided on which form of ID to use, please try to merge those
> patches (if any) soon and tell me when I can finalize ctnetlink/nfnetlink
> for submission.

Pablo decided to keep the 64-bit ID, mainly because there is no better
alternative for dumping. I don't know the current state, but AFAIK
he is currently reworking the ctnetlink message format to use
nested attributes instead of kernel structures. Unicast communication
also needs to be fixed; right now everything is only broadcast and
userspace needs to filter. It should behave like all other netlink
families. That's all I know of that needs to be done; Pablo probably
has more.

Regards
Patrick
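For readers unfamiliar with the change Patrick mentions: instead of copying a kernel structure into the message, each field travels as a type-length-value attribute, and a nested attribute is one whose payload is itself a sequence of attributes. A minimal user-space sketch follows. The header layout (16-bit length including the header, 16-bit type, payload padded to 4 bytes) mirrors netlink attributes, but `put_attr()`, `nest_start()`, `nest_end()` and the attribute type numbers used below are hypothetical, not the ctnetlink API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as a netlink attribute header. */
struct attr_hdr {
	uint16_t len;	/* header + payload, excluding trailing padding */
	uint16_t type;
};

#define ATTR_ALIGN(n)	(((n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN	ATTR_ALIGN(sizeof(struct attr_hdr))

struct msgbuf {
	unsigned char data[256];
	size_t used;
};

/* Append one attribute; returns its offset, or -1 if it does not fit. */
long put_attr(struct msgbuf *b, uint16_t type, const void *payload,
	      size_t plen)
{
	size_t off = b->used;
	size_t total = ATTR_ALIGN(ATTR_HDRLEN + plen);
	struct attr_hdr h;

	if (total > sizeof(b->data) - off)
		return -1;
	h.len = (uint16_t)(ATTR_HDRLEN + plen);
	h.type = type;
	memset(b->data + off, 0, total);	/* zero the padding */
	memcpy(b->data + off, &h, sizeof(h));
	if (plen)
		memcpy(b->data + off + ATTR_HDRLEN, payload, plen);
	b->used += total;
	return (long)off;
}

/* Open a nested attribute: write a header with an empty payload now... */
long nest_start(struct msgbuf *b, uint16_t type)
{
	return put_attr(b, type, NULL, 0);
}

/* ...and, once the inner attributes are written, patch its length to
 * cover everything appended since. */
void nest_end(struct msgbuf *b, long start)
{
	struct attr_hdr h;

	memcpy(&h, b->data + start, sizeof(h));
	h.len = (uint16_t)(b->used - (size_t)start);
	memcpy(b->data + start, &h, sizeof(h));
}
```

With hypothetical types 1 (tuple), 2 (source address) and 3 (destination address), `nest_start(&b, 1)` followed by two 4-byte address attributes and `nest_end()` yields a 20-byte nest: a 4-byte header plus two 8-byte inner attributes. One commonly cited advantage of this layout is that a parser can skip an attribute type it does not know by its length, whereas a raw kernel structure ties userspace to the kernel's memory layout.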

* Re: [RFC] alternative to conntrack ID
  2005-07-11  7:47                                                               ` Patrick McHardy
@ 2005-07-11  9:50                                                                 ` Pablo Neira
  0 siblings, 0 replies; 48+ messages in thread
From: Pablo Neira @ 2005-07-11  9:50 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Harald Welte, Netfilter Development Mailinglist, Amin Azez,
	Jozsef Kadlecsik

Hi!

Patrick McHardy wrote:
> Harald Welte wrote:
> 
>>I'm still not convinced that the ID is a good idea (or that it is needed
>>in most cases).
> 
> Pablo decided to keep the 64-bit ID, mainly because there is no better
> alternative for dumping.

Yes, we don't know of any reliable way to tell where the dump stopped
once the skbuff gets full.

> he is currently reworking the ctnetlink message format to use
> nested attributes instead of kernel structures.

Indeed. The new message format has required tons of changes but it's the 
way to go.

> Unicast communication
> also needs to be fixed; right now everything is only broadcast and
> userspace needs to filter. It should behave like all other netlink
> families. That's all I know of that needs to be done; Pablo probably
> has more.

I expect to send the patches tomorrow, so we can discuss the code.

--
Pablo
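A minimal sketch of how a non-wrapping ID answers Pablo's point about resuming a dump: keep the bucket and the last ID sent between calls (state a netlink dump can carry in its callback's `cb->args[]` slots), and skip everything at or below that ID when continuing. It assumes, hypothetically, that IDs start at 1 and that new entries are appended at the tail of their bucket, so each bucket stays sorted by ID without an ordered insert as long as the counter never wraps. `struct table`, `struct cursor` and `dump_some()` are illustrative names, and fixed-size arrays stand in for the hash chains.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS	3
#define BUCKET_CAP	4

/* Hypothetical table: each bucket's IDs are in ascending order because
 * new entries are appended and the 64-bit counter never wraps. */
struct table {
	uint64_t ids[NBUCKETS][BUCKET_CAP];
	size_t len[NBUCKETS];
};

/* State carried between dump calls. */
struct cursor {
	size_t bucket;
	uint64_t last_id;	/* 0: nothing sent yet from this bucket */
};

/* Emit up to room (> 0) IDs, resuming after (bucket, last_id). Returns
 * how many were emitted; 0 means the dump is complete. */
size_t dump_some(const struct table *t, struct cursor *c,
		 uint64_t *out, size_t room)
{
	size_t n = 0;

	for (; c->bucket < NBUCKETS; c->bucket++, c->last_id = 0) {
		for (size_t i = 0; i < t->len[c->bucket]; i++) {
			uint64_t id = t->ids[c->bucket][i];

			if (id <= c->last_id)
				continue;	/* sent in an earlier call */
			if (n == room)
				return n;	/* buffer full: resume here */
			out[n++] = id;
			c->last_id = id;
		}
	}
	return n;
}
```

Entries deleted between calls are simply not seen, and an entry added to the current or a later bucket is still picked up because its ID is larger than anything already sent; one added to a bucket that was already finished is missed until the next dump. A wrapped counter breaks the `id <= last_id` test, which is the problem the 32-bit ID ran into at the start of this thread.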

* Re: [RFC] alternative to conntrack ID
  2005-06-04 23:52                                 ` Pablo Neira
  2005-06-05  1:02                                   ` Pablo Neira
@ 2005-06-06  8:17                                   ` Jozsef Kadlecsik
  1 sibling, 0 replies; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-06-06  8:17 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy

On Sun, 5 Jun 2005, Pablo Neira wrote:

> + The unique ID eats 8 extra bytes, since it will be an __u64. On my
> laptop (1787 buckets, 14296 max), that makes 114368 extra bytes (worst
> case).

And if nf_conntrack is submitted in its present form, every IP address
in conntrack will require an extra 12 bytes, which makes an extra 48 bytes
per entry.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
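The two figures in this exchange can be checked directly. Pablo's number is 8 bytes times the 14296-entry maximum, i.e. 114368 bytes. Jozsef's 12 bytes per address is presumably the difference between a 16-byte field large enough for an IPv6 address and a 4-byte IPv4 address; with four addresses per entry (source and destination for each direction) that gives his 48 bytes, or 686208 bytes at the same maximum. The helper names below are hypothetical.

```c
#include <stddef.h>

/* Worst case: every one of max_entries conntracks is allocated. */
size_t worst_case_extra(size_t extra_per_entry, size_t max_entries)
{
	return extra_per_entry * max_entries;
}

/* Growth per entry if each stored address goes from an IPv4-sized
 * (4-byte) field to an IPv6-sized (16-byte) one. */
size_t addr_growth_per_entry(size_t addrs_per_entry)
{
	return addrs_per_entry * (16 - 4);
}
```

For example, `worst_case_extra(addr_growth_per_entry(4), 14296)` reproduces the 686208-byte figure, six times the overhead of the 64-bit ID.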

* Re: [RFC] alternative to conntrack ID
  2005-05-17 16:12                           ` Amin Azez
  2005-05-17 20:17                             ` Patrick McHardy
@ 2005-05-18  6:45                             ` Jozsef Kadlecsik
  2005-05-18  7:08                               ` Amin Azez
  1 sibling, 1 reply; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-05-18  6:45 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist

On Tue, 17 May 2005, Amin Azez wrote:

> KOVACS Krisztian wrote:
> >   Probably Patrick was referring to a possible problem where the
> > following happens: a new connection is established and destroyed in a
> > very short time. If a new connection with the same tuple is created
> > before the timestamp increases (which is perfectly possible IMHO if you
> > have some slow embedded HW with no high precision timer available)
>
> After further reading I think this scenario is highly unlikely.
>
> I don't mean improbable, I mean, is there any such hardware?
> If a socket is not to be reused until TCP_TIME_WAIT, which is recommended
> to be in the region of 4 minutes, is there really any hardware that
> can't time to that resolution? Are there really any devices that will
> re-use a TCP socket in the same timer tick as they closed it?

We have to deal with other protocols as well, not just TCP. For example
using UDP one could fairly easily trigger the described situation.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [RFC] alternative to conntrack ID
  2005-05-18  6:45                             ` Jozsef Kadlecsik
@ 2005-05-18  7:08                               ` Amin Azez
  2005-05-18  7:17                                 ` Jozsef Kadlecsik
  0 siblings, 1 reply; 48+ messages in thread
From: Amin Azez @ 2005-05-18  7:08 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Netfilter Development Mailinglist

Jozsef Kadlecsik wrote:

>On Tue, 17 May 2005, Amin Azez wrote:
>
>  
>
>>KOVACS Krisztian wrote:
>>    
>>
>>>  Probably Patrick was referring to a possible problem where the
>>>following happens: a new connection is established and destroyed in a
>>>very short time. If a new connection with the same tuple is created
>>>before the timestamp increases (which is perfectly possible IMHO if you
>>>have some slow embedded HW with no high precision timer available)
>>>      
>>>
>>After further reading I think this scenario is highly unlikely.
>>
>>I don't mean improbable, I mean, is there any such hardware?
>>If a socket is not to be reused until TCP_TIME_WAIT, which is recommended
>>to be in the region of 4 minutes, is there really any hardware that
>>can't time to that resolution? Are there really any devices that will
>>re-use a TCP socket in the same timer tick as they closed it?
>>    
>>
>
>We have to deal with other protocols as well, not just TCP. For example
>using UDP one could fairly easily trigger the described situation.
>
>  
>

I think this situation could not be triggered by UDP, as there are no
explicit close sequences for UDP that conntrack recognizes, so the
conntrack would only be destroyed after a conntrack timer expires (which
must be larger than the minimum resolution of the timer). Therefore it
becomes impossible to bring up two conntracks with the same tuple in
the same clock tick.

Amin

* Re: [RFC] alternative to conntrack ID
  2005-05-18  7:08                               ` Amin Azez
@ 2005-05-18  7:17                                 ` Jozsef Kadlecsik
  0 siblings, 0 replies; 48+ messages in thread
From: Jozsef Kadlecsik @ 2005-05-18  7:17 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist

On Wed, 18 May 2005, Amin Azez wrote:

> >>>  Probably Patrick was referring to a possible problem where the
> >>>following happens: a new connection is established and destroyed in a
> >>>very short time. If a new connection with the same tuple is created
> >>>before the timestamp increases (which is perfectly possible IMHO if you
> >>>have some slow embedded HW with no high precision timer available)
> >>>
> >>>
> >>After further reading I think this scenario is highly unlikely.
> >>
> >>I don't mean improbable, I mean, is there any such hardware?
> >>If a socket is not to be reused until TCP_TIME_WAIT, which is recommended
> >>to be in the region of 4 minutes, is there really any hardware that
> >>can't time to that resolution? Are there really any devices that will
> >>re-use a TCP socket in the same timer tick as they closed it?
> >
> >We have to deal with other protocols as well, not just TCP. For example
> >using UDP one could fairly easily trigger the described situation.
>
> I think this situation could not be triggered by UDP, as there are no
> explicit close sequences for UDP that conntrack recognizes, so the
> conntrack would only be destroyed after a conntrack timer expires (which
> must be larger than the minimum resolution of the timer). Therefore it
> becomes impossible to bring up two conntracks with the same tuple in
> the same clock tick.

You're right. That was a broken example.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
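The scenario Krisztian describes (quoted above) can be made concrete with a minimal sketch, assuming the identity under discussion is the pair (tuple, creation tick). The reduced `struct tuple`, `struct ct_ident` and `same_conntrack()` are hypothetical. Two different flows that reuse a tuple within one tick compare equal; as concluded above, UDP cannot produce this because its conntracks expire on a timer, so the remaining cases involve fast TCP teardown and re-creation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced tuple: enough to show the comparison. */
struct tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

struct ct_ident {
	struct tuple t;
	unsigned long created;	/* creation time in timer ticks */
};

/* Field-by-field comparison; memcmp() could trip over struct padding. */
static bool tuple_eq(const struct tuple *a, const struct tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* Under a (tuple, timestamp) scheme two conntracks are "the same" when
 * both parts match, which is what fails when a tuple is torn down and
 * reused within a single tick. */
bool same_conntrack(const struct ct_ident *a, const struct ct_ident *b)
{
	return tuple_eq(&a->t, &b->t) && a->created == b->created;
}
```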

* Re: [RFC] alternative to conntrack ID
  2005-05-07 22:32                       ` Marcus Sundberg
  2005-05-09 14:17                         ` KOVACS Krisztian
@ 2005-05-11  8:43                         ` Amin Azez
  1 sibling, 0 replies; 48+ messages in thread
From: Amin Azez @ 2005-05-11  8:43 UTC (permalink / raw)
  To: Marcus Sundberg
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira,
	Patrick McHardy

I re-propose the possibility of using a serial number for conntracks as
an "additional qualifier" (although also unique) to be used by user-space
applications.
This way we keep the efficiency of using the tuple as a hash key to
retrieve the conntrack, but use the serial number to guard retrieval of
the right one.

I think it is clear that although a timestamp may sometimes be useful
in a conntrack, it does not universally solve the problem of identifying
a particular connection over short periods of time, because of the claim
that on some platforms it may be possible to create, destroy and re-create
a conntrack in the same tick.

UDP conntracks cannot be recreated in the same tick because their
destruction is timer based, relating to a period of inactivity.

I think some cases where a TCP conntrack can be re-created in the same
tick are where
1) SO_DONTLINGER/SO_REUSEADDR & friends are used on participating
originating machines
2) Embedded and other weird network devices re-connect rapidly
3) A different MAC address takes over an IP address

Hacks to overcome this unlikely situation render the whole solution less
attractive than a conntrack serial number.

User-space applications monitoring and manipulating conntracks do need a 
more permanent reference to a conntrack that is likely to remain unique 
over a timescale of at least a few minutes, so I re-propose a serial 
number to this end.

Amin
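A minimal sketch of Amin's proposal, assuming a table in which at most one conntrack matches a given tuple: the tuple still locates the entry (a linear scan stands in for the hash lookup here), and a serial number taken from a global counter at creation confirms that it is the same conntrack userspace saw earlier. `struct conntrack`, `ct_init()` and `ct_find()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced tuple. */
struct tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

struct conntrack {
	struct tuple t;
	uint64_t serial;	/* not reused while the counter does not wrap */
};

static uint64_t next_serial = 1;

void ct_init(struct conntrack *ct, const struct tuple *t)
{
	ct->t = *t;
	ct->serial = next_serial++;
}

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* The tuple finds the candidate; the serial guards against acting on a
 * different conntrack that has since taken over the same tuple. */
struct conntrack *ct_find(struct conntrack *tab, size_t n,
			  const struct tuple *t, uint64_t serial)
{
	for (size_t i = 0; i < n; i++)
		if (tuple_eq(&tab[i].t, t))
			return tab[i].serial == serial ? &tab[i] : NULL;
	return NULL;
}
```

If the conntrack is destroyed and re-created with the same tuple, a handle holding the old serial gets `NULL` instead of silently operating on the new connection.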

* Re: [RFC] [PATCH] ctnetlink updates
  2005-04-29  7:14       ` Jozsef Kadlecsik
  2005-04-29  8:02         ` Harald Welte
@ 2005-05-01 23:49         ` Pablo Neira
  2005-05-02 10:47           ` Harald Welte
  1 sibling, 1 reply; 48+ messages in thread
From: Pablo Neira @ 2005-05-01 23:49 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Harald Welte, Netfilter Development Mailinglist, Patrick McHardy

Jozsef Kadlecsik wrote:
> Looking at the last changes, I think it'd be much more better to port
> ip_queue to nfnetlink than to reserve another netlink ID: the hooks in
> nfnetlink are already there. I know that'd create backward compatibility
> issues at the existing queue applications, though... :-(

I was playing around with an experimental port of ip_queue to
nf_queue/nfnetlink during the xmas holidays; it is crap, so I will
probably start it from scratch.

In this specific case, where we could break third-party applications, I
think that we can keep both ip_queue and nf_queue in the kernel tree for
quite some time to ensure backward compatibility.

--
Pablo

* Re: [RFC] [PATCH] ctnetlink updates
  2005-05-01 23:49         ` [RFC] [PATCH] ctnetlink updates Pablo Neira
@ 2005-05-02 10:47           ` Harald Welte
  0 siblings, 0 replies; 48+ messages in thread
From: Harald Welte @ 2005-05-02 10:47 UTC (permalink / raw)
  To: Pablo Neira
  Cc: Netfilter Development Mailinglist, Patrick McHardy,
	Jozsef Kadlecsik

On Mon, May 02, 2005 at 01:49:38AM +0200, Pablo Neira wrote:
> In this specific case, where we could break third-party applications, I
> think that we can keep both ip_queue and nf_queue in the kernel tree for
> quite some time to ensure backward compatibility.

Yes. Also, as long as the libipq API can be offered to applications, I
don't see that much of an issue.

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

end of thread

Thread overview: 48+ messages
2005-03-27 23:55 [RFC] [PATCH] ctnetlink updates Pablo Neira
2005-04-01  6:59 ` Harald Welte
2005-04-03 18:01 ` Patrick McHardy
2005-04-06 18:08   ` Pablo Neira
2005-04-17 15:07     ` Patrick McHardy
2005-04-29  7:14       ` Jozsef Kadlecsik
2005-04-29  8:02         ` Harald Welte
2005-05-04  9:18           ` [RFC] alternative to conntrack ID Amin Azez
2005-05-04  9:32             ` Patrick Schaaf
2005-05-04 11:30             ` Patrick McHardy
2005-05-04 12:01               ` Amin Azez
2005-05-06 15:16                 ` Patrick McHardy
2005-05-07 20:36                   ` Marcus Sundberg
2005-05-07 22:18                     ` Patrick McHardy
2005-05-07 22:32                       ` Marcus Sundberg
2005-05-09 14:17                         ` KOVACS Krisztian
2005-05-09 15:08                           ` Amin Azez
2005-05-10  6:49                             ` Harald Welte
2005-05-17 16:12                           ` Amin Azez
2005-05-17 20:17                             ` Patrick McHardy
2005-05-18  7:24                               ` Amin Azez
2005-05-18  9:30                               ` Jozsef Kadlecsik
2005-06-04 23:52                                 ` Pablo Neira
2005-06-05  1:02                                   ` Pablo Neira
2005-06-06  8:48                                     ` Jozsef Kadlecsik
2005-06-09 12:52                                       ` Pablo Neira
2005-06-09 13:00                                         ` Pablo Neira
2005-06-09 13:34                                           ` Jozsef Kadlecsik
2005-06-10 10:21                                             ` Pablo Neira
2005-06-13  7:41                                               ` Jozsef Kadlecsik
2005-06-14  2:30                                                 ` Pablo Neira
2005-06-14  2:42                                                   ` Patrick McHardy
2005-06-15  2:41                                                     ` Pablo Neira
2005-06-20 16:04                                                     ` Amin Azez
2005-06-20 16:12                                                       ` Patrick McHardy
2005-06-22  9:09                                                         ` Amin Azez
2005-06-22  9:30                                                           ` Oscar Mechanic
2005-06-22 17:23                                                           ` Patrick McHardy
2005-07-11  5:41                                                             ` Harald Welte
2005-07-11  7:47                                                               ` Patrick McHardy
2005-07-11  9:50                                                                 ` Pablo Neira
2005-06-06  8:17                                   ` Jozsef Kadlecsik
2005-05-18  6:45                             ` Jozsef Kadlecsik
2005-05-18  7:08                               ` Amin Azez
2005-05-18  7:17                                 ` Jozsef Kadlecsik
2005-05-11  8:43                         ` Amin Azez
2005-05-01 23:49         ` [RFC] [PATCH] ctnetlink updates Pablo Neira
2005-05-02 10:47           ` Harald Welte
