[RFC] NAT tuple reservation

* [RFC] NAT tuple reservation
@ 2003-11-17 14:47 KOVACS Krisztian
  2003-11-17 22:29 ` Harald Welte
  2003-11-28 15:06 ` KOVACS Krisztian
  0 siblings, 2 replies; 6+ messages in thread
From: KOVACS Krisztian @ 2003-11-17 14:47 UTC
  To: netfilter-devel

   Hi,

   Netfilter's NAT subsystem currently does not support reservation of 
tuples. This means there is no facility you could use to "preallocate" 
exact IP/port pairs that the registrant could later use _for_sure_. This 
causes problems when NAT-ing certain protocols: those that use secondary 
channels, where the NAT helper has to modify the addresses/ports during 
the negotiation. (This is the case, for example, with SIP. See Harald's 
mail at 
http://lists.netfilter.org/pipermail/netfilter-devel/2003-November/012999.html 
for more info.)

   So, some kind of tuple reservation is really needed to solve these 
problems. The reservation facility would provide a way to reserve a 
"half-tuple" (manip), which the NAT facility would skip when searching 
for a unique tuple. When the registrant of the reserved tuple wants to 
use it, it would call a special function, which calls ip_nat_setup_info() 
with a special flag and, once the mapping has been applied successfully, 
removes the reservation. The reservation would be per-expectation: 
ip_conntrack_expect would store a reference to the reservation, and when 
the expectation is deleted, all related reservations are deleted, too.
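To make "skip reserved manips during the unique-tuple search" concrete, here is a minimal userspace sketch. It is not netfilter code: find_unique_port(), NAT_ALLOW_RESERVED and the flat port arrays are hypothetical stand-ins for the NAT core's search, the proposed special flag, and the reservation table (reduced to ports on a single NAT address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NAT_ALLOW_RESERVED 0x1   /* hypothetical "special flag" */
#define MAX_RESERVED 16

static uint16_t reserved_ports[MAX_RESERVED];  /* toy: one NAT address */
static unsigned int n_reserved;

static bool contains(const uint16_t *set, unsigned int n, uint16_t port)
{
    for (unsigned int i = 0; i < n; i++)
        if (set[i] == port)
            return true;
    return false;
}

/* Unique-port search: skip ports already used by a connection, and skip
 * reserved ports unless the caller passes NAT_ALLOW_RESERVED. */
static int find_unique_port(uint16_t min, uint16_t max, unsigned int flags,
                            const uint16_t *used, unsigned int n_used)
{
    assert(n_reserved <= MAX_RESERVED);
    for (uint32_t p = min; p <= max; p++) {   /* uint32_t: no wrap at 65535 */
        if (contains(used, n_used, (uint16_t)p))
            continue;
        if (!(flags & NAT_ALLOW_RESERVED) &&
            contains(reserved_ports, n_reserved, (uint16_t)p))
            continue;
        return (int)p;
    }
    return -1;   /* no free port in the range */
}
```

Without the flag, a reserved port is treated exactly like a port in use; only the owner's setup call, which passes the flag, can land on it.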

   The allocation routines in ip_nat_core.c should be extended to check if 
a tuple is reserved. ip_nat_used_tuple() would be extended to check if the 
given tuple uses an already reserved manip. The reserved manips would be 
stored in a hash table, hashed based on the manip structure. The stored 
structures would look something like this:

struct ip_nat_reserved {
   /* hash chain */
   struct list_head prev, next;

   /* reference count */
   atomic_t use;

   /* reserved manip */
   struct ip_conntrack_manip m;
};
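A simplified userspace model of the proposed storage and of the extra check that ip_nat_used_tuple() would perform. Assumptions: the manip is reduced to address, port and protocol; chains are singly linked instead of using struct list_head; RESERVED_HTABLE_SIZE, reserve_manip() and manip_reserved() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define RESERVED_HTABLE_SIZE 64   /* assumed size */

struct manip {                    /* simplified: ip + port + protocol */
    uint32_t ip;
    uint16_t port;
    uint8_t proto;
};

struct nat_reserved {
    struct nat_reserved *next;    /* hash chain */
    struct manip m;               /* reserved manip */
};

static struct nat_reserved *reserved_htable[RESERVED_HTABLE_SIZE];

static unsigned int hash_manip(const struct manip *m)
{
    return (m->ip + m->port + m->proto) % RESERVED_HTABLE_SIZE;
}

static int reserve_manip(const struct manip *m)
{
    assert(m != NULL);
    struct nat_reserved *r = malloc(sizeof(*r));
    if (!r)
        return -1;
    unsigned int h = hash_manip(m);
    r->m = *m;
    r->next = reserved_htable[h];   /* push onto the bucket's chain */
    reserved_htable[h] = r;
    return 0;
}

/* The additional question an extended ip_nat_used_tuple() would ask. */
static bool manip_reserved(const struct manip *m)
{
    for (const struct nat_reserved *r = reserved_htable[hash_manip(m)];
         r; r = r->next)
        if (r->m.ip == m->ip && r->m.port == m->port &&
            r->m.proto == m->proto)
            return true;
    return false;
}
```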

The functions used by the NAT helpers would look something like this:

ip_nat_reserved_register(struct ip_conntrack_expect *expect,
			 struct ip_conntrack_manip *manip)

ip_nat_reserved_unregister(struct ip_conntrack_expect *expect,
			   struct ip_conntrack_manip *manip)

ip_nat_setup_info_reserved(...): This is basically the same as 
ip_nat_setup_info, except that it calls ip_nat_setup_info() with the 
appropriate flag, and deletes the reservation in case of successful 
allocation.
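The intended semantics of ip_nat_setup_info_reserved() (bypass the reservation via a flag, and drop it only after success) could look roughly like the sketch below. setup_info() is a toy replacement for ip_nat_setup_info(), and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NAT_ALLOW_RESERVED 0x1    /* hypothetical "special flag" */

struct manip { uint32_t ip; uint16_t port; };

struct reservation {
    struct manip m;
    bool active;
};

struct expect {                   /* toy ip_conntrack_expect */
    struct reservation *reserved; /* reservation owned by this expectation */
};

static bool mapping_clashes;      /* toy: tuple already used by a conntrack? */

static bool is_reserved(const struct reservation *r, const struct manip *m)
{
    return r && r->active && r->m.ip == m->ip && r->m.port == m->port;
}

/* Toy ip_nat_setup_info(): refuses a reserved manip unless flagged. */
static int setup_info(const struct manip *m, const struct reservation *r,
                      unsigned int flags)
{
    if (mapping_clashes)
        return -1;
    if (is_reserved(r, m) && !(flags & NAT_ALLOW_RESERVED))
        return -1;
    return 0;                     /* mapping applied */
}

/* Sketch of ip_nat_setup_info_reserved(): drop the reservation only
 * after the mapping was applied; on failure it stays registered. */
static int setup_info_reserved(struct expect *exp)
{
    struct reservation *r = exp->reserved;
    assert(r != NULL && r->active);
    int ret = setup_info(&r->m, r, NAT_ALLOW_RESERVED);
    if (ret == 0) {
        r->active = false;        /* unregister */
        exp->reserved = NULL;     /* clear the owner's pointer */
    }
    return ret;
}
```

Keeping the reservation on failure means a retry by the same expectation can still find the tuple protected.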

   Do you see any shortcomings/problems with this?

-- 
   Regards,
     Krisztian KOVACS

* Re: [RFC] NAT tuple reservation
  2003-11-17 14:47 [RFC] NAT tuple reservation KOVACS Krisztian
@ 2003-11-17 22:29 ` Harald Welte
  2003-11-18  7:47   ` KOVACS Krisztian
  2003-11-28 15:06 ` KOVACS Krisztian
  1 sibling, 1 reply; 6+ messages in thread
From: Harald Welte @ 2003-11-17 22:29 UTC
  To: KOVACS Krisztian; +Cc: netfilter-devel

On Mon, Nov 17, 2003 at 03:47:44PM +0100, KOVACS Krisztian wrote:
>   So, some kind of tuple reservation is really needed to solve these 
> problems. The reservation facility would provide a way to reserve a 
> "half-tuple" (manip), which the NAT facility would skip when searching 
> for a unique tuple. When the registrant of the reserved tuple wants to 
> use it, it would call a special function, which calls ip_nat_setup_info() 
> with a special flag and, once the mapping has been applied successfully, 
> removes the reservation. The reservation would be per-expectation: 
> ip_conntrack_expect would store a reference to the reservation, and when 
> the expectation is deleted, all related reservations are deleted, too.

Yes, we all agree that we need some kind of mechanism for that.

>   The allocation routines in ip_nat_core.c should be extended to check if 
> a tuple is reserved. ip_nat_used_tuple() would be extended to check if the 
> given tuple uses an already reserved manip. The reserved manips would be 
> stored in a hash table, hashed based on the manip structure. The stored 
> structures would look something like this:

However, I fear this would add considerable overhead.  Yet another hash
table... On the other hand, most of the time this table would be empty,
or almost empty.  So in that case let's optimize for the speed of the
hash function rather than for the hash distribution.

> struct ip_nat_reserved {
>   /* hash chain */
>   list_head prev, next;

you only need one list head for the hash table ;)

>   /* reference count */
>   atomic_t use;

do we really need to do sophisticated reference counting on those
objects?  I think it is enough to protect the whole hash table under the
already existing nat lock.  Entries are inserted/deleted under that lock, and
hash table lookups also only happen under that lock.   The only
reference would be a pointer from a single ip_conntrack_expect - thus
the refcount would always be one.

> ip_nat_setup_info_reserved(...): This is basically the same as 
> ip_nat_setup_info, except that it calls ip_nat_setup_info() with the 
> appropriate flag, and deletes the reservation in case of successful 
> allocation.
> 
>   Do you see any shortcomings/problems with this?

It doesn't cover allocation/reservation of contiguous port numbers,
which RTP/RTSP in particular require. 

>   Regards,
>     Krisztian KOVACS

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

* Re: [RFC] NAT tuple reservation
  2003-11-17 22:29 ` Harald Welte
@ 2003-11-18  7:47   ` KOVACS Krisztian
  2003-11-18 16:57     ` Harald Welte
  0 siblings, 1 reply; 6+ messages in thread
From: KOVACS Krisztian @ 2003-11-18  7:47 UTC
  To: Harald Welte; +Cc: netfilter-devel


   Hi,

Harald Welte wrote:
>>  The allocation routines in ip_nat_core.c should be extended to check if 
>>a tuple is reserved. ip_nat_used_tuple() would be extended to check if the 
>>given tuple uses an already reserved manip. The reserved manips would be 
>>stored in a hash table, hashed based on the manip structure. The stored 
>>structures would look something like this:
> 
> 
> However, I fear this would add considerable overhead.  Yet another hash
> table... On the other hand, most of the time this table would be empty,
> or almost empty.  So in that case let's optimize for the speed of the
> hash function rather than for the hash distribution.

   The empty case could be made fast using a reservation-counter: while 
it is zero, no hash lookup is needed at all.

>>struct ip_nat_reserved {
>>  /* hash chain */
>>  list_head prev, next;
> 
> you only need one list head for the hash table ;)

   Of course... :)

>>  /* reference count */
>>  atomic_t use;
> 
> do we really need to do sophisticated reference counting on those
> objects?  I think it is enough to protect the whole hash table under the
> already existing nat lock.  Entries are inserted/deleted under that lock, and
> hash table lookups also only happen under that lock.   The only
> reference would be a pointer from a single ip_conntrack_expect - thus
> the refcount would always be one.
> 
> 
>>ip_nat_setup_info_reserved(...): This is basically the same as 
>>ip_nat_setup_info, except that it calls ip_nat_setup_info() with the 
>>appropriate flag, and deletes the reservation in case of successful 
>>allocation.
>>
>>  Do you see any shortcomings/problems with this?
> 
> 
> It doesn't cover allocation/reservation of contiguous port numbers,
> which RTP/RTSP in particular require. 

   It could be extended, but it would make checking/allocating even 
slower... If you could register ranges, for example, how would you define 
the hash function? Or, for example, you would have to be able to delete 
parts of that range from the reserved list, so splitting the original 
range could be necessary... Couldn't problems like these be solved by 
allowing more than one reservation per expectation to be registered?

-- 
   Regards,
     Krisztian KOVACS

* Re: [RFC] NAT tuple reservation
  2003-11-18  7:47   ` KOVACS Krisztian
@ 2003-11-18 16:57     ` Harald Welte
  2003-11-19 10:21       ` KOVACS Krisztian
  0 siblings, 1 reply; 6+ messages in thread
From: Harald Welte @ 2003-11-18 16:57 UTC
  To: KOVACS Krisztian; +Cc: netfilter-devel

On Tue, Nov 18, 2003 at 08:47:26AM +0100, KOVACS Krisztian wrote:
> >However, I fear this would add considerable overhead.  Yet another hash
> >table... On the other hand, most of the time this table would be empty,
> >or almost empty.  So in that case let's optimize for the speed of the
> >hash function rather than for the hash distribution.
> 
>   The empty case could be made fast using a reservation-counter.

that sounds like a good plan.

> >> /* reference count */
> >> atomic_t use;
> >
> >do we really need to do sophisticated reference counting on those
> >objects?  I think it is enough to protect the whole hash table under the
> >already existing nat lock.  Entries are inserted/deleted under that lock, 
> >and
> >hash table lookups also only happen under that lock.   The only
> >reference would be a pointer from a single ip_conntrack_expect - thus
> >the refcount would always be one.

you didn't comment on that.  What do you think?

> >> Do you see any shortcomings/problems with this?
> >
> >It doesn't cover allocation/reservation of contiguous port numbers,
> >which RTP/RTSP in particular require. 
> 
>   It could be extended, but it would make checking/allocating even 
> slower... 

I don't care about allocation speed, as it is a rare thing to happen
(compared to the total number of packets we see).  However, checking
should be fast.

> If you could register ranges, for example, how would you define 
> the hash function? Or, for example, you would have to be able to delete 
> parts of that range from the reserved list, so splitting the original 
> range could be necessary... 

I know, it's not a trivial issue.  It's ugly.  And I don't think that
you could ever put ranges in the hashtable.  However, you could provide
a higher-level function that 

- locks the htable
- tries to allocate as many reservations as there are in the range,
  contiguously
- unlocks the htable

In normal cases, we would only have 2 or 4 (maybe 8) consecutive port
numbers to allocate.  So it is feasible to put each of them into the
hash.  This would not slow down checking/lookup.
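The "lock, reserve N consecutive ports, unlock" helper described above might look like this in simplified form. The per-port boolean array stands in for the reservation hash (each port still gets its own entry, so lookups stay single-key), a pthread mutex stands in for the table lock, and reserve_contiguous() is a hypothetical name.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static bool reserved[65536];    /* toy stand-in: one entry per port */
static pthread_mutex_t resv_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reserve `count` consecutive ports somewhere in [min, max].
 * Returns the first port of the run, or -1 if no free run exists. */
static int reserve_contiguous(uint16_t min, uint16_t max, unsigned int count)
{
    int base = -1;

    assert(count > 0);
    pthread_mutex_lock(&resv_lock);            /* lock the table */
    for (uint32_t p = min; p + count - 1 <= max; p++) {
        unsigned int i;
        for (i = 0; i < count && !reserved[p + i]; i++)
            ;                                  /* length of the free run at p */
        if (i == count) {                      /* whole run is free */
            for (i = 0; i < count; i++)
                reserved[p + i] = true;        /* one reservation per port */
            base = (int)p;
            break;
        }
    }
    pthread_mutex_unlock(&resv_lock);          /* unlock the table */
    return base;
}
```

Because the scan and the inserts happen in one critical section, no other caller can grab a port in the middle of the run between the check and the reservation.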

> Couldn't problems like these be solved by allowing more than one 
> reservation per expectation to be registered?

The problem is that then there is no atomicity.  They really need to be
consecutive.  

So from the implementation point of view: yes, they are multiple
separate reservations.  But from an API point of view, it should be a
single call.

> -- 
>   Regards,
>     Krisztian KOVACS

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/

* Re: [RFC] NAT tuple reservation
  2003-11-18 16:57     ` Harald Welte
@ 2003-11-19 10:21       ` KOVACS Krisztian
  0 siblings, 0 replies; 6+ messages in thread
From: KOVACS Krisztian @ 2003-11-19 10:21 UTC
  To: Harald Welte; +Cc: netfilter-devel

   Hi,

Harald Welte wrote:
>>>>/* reference count */
>>>>atomic_t use;
>>>
>>>do we really need to do sophisticated reference counting on those
>>>objects?  I think it is enough to protect the whole hash table under the
>>>already existing nat lock.  Entries are inserted/deleted under that lock, 
>>>and
>>>hash table lookups also only happen under that lock.   The only
>>>reference would be a pointer from a single ip_conntrack_expect - thus
>>>the refcount would always be one.
> 
> you didn't comment on that.  What do you think?

   I think we should try it this way. Currently I don't see anything that 
would require reference counting. However, we have to make sure that a 
referenced reservation never gets deleted without the expectation's 
pointer to it being set to NULL. As the proposed interface allocates 
reservations only for expectations, the pointer can easily be handled 
there. So the only remaining issue is that proper locking and checking 
have to be implemented in the register/unregister functions. I don't know 
whether we would need an additional lock for that; I suspect ip_nat_lock 
will be enough.

> I don't care about allocation speed, as it is a rare thing to happen
> (compared to the total number of packets we see).  However, checking
> should be fast.

   The proposed implementation would require a hash table lookup only if 
there is at least one reservation. For the hash function, hash_by_src() 
could be used, which is rather simple:

   return (manip->ip + manip->u.all + proto) % ip_nat_htable_size;
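Both points (skip the table entirely while the reservation counter is zero, and reuse hash_by_src()'s arithmetic) can be sketched in userspace C. Only the hash expression comes from the line above; the fixed-depth table, NAT_HTABLE_SIZE, BUCKET_DEPTH, the bucket_lookups instrumentation and the host-order example values are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NAT_HTABLE_SIZE 256   /* assumed table size */
#define BUCKET_DEPTH 4        /* toy fixed-size buckets */

struct manip { uint32_t ip; uint16_t all; };  /* simplified manip */

struct entry { struct manip m; uint16_t proto; bool used; };

static struct entry table[NAT_HTABLE_SIZE][BUCKET_DEPTH];
static unsigned int reserved_count;   /* number of live reservations */
static unsigned int bucket_lookups;   /* instrumentation for the example */

/* Same arithmetic as the hash_by_src() line quoted above. */
static unsigned int hash_by_src(const struct manip *m, uint16_t proto)
{
    return (m->ip + m->all + proto) % NAT_HTABLE_SIZE;
}

static bool reserve(const struct manip *m, uint16_t proto)
{
    assert(m);
    struct entry *b = table[hash_by_src(m, proto)];
    for (int i = 0; i < BUCKET_DEPTH; i++)
        if (!b[i].used) {
            b[i] = (struct entry){ *m, proto, true };
            reserved_count++;
            return true;
        }
    return false;   /* bucket full: a limitation of this toy table */
}

static bool is_reserved(const struct manip *m, uint16_t proto)
{
    if (reserved_count == 0)   /* common case: no hashing, no lookup */
        return false;
    bucket_lookups++;
    const struct entry *b = table[hash_by_src(m, proto)];
    for (int i = 0; i < BUCKET_DEPTH; i++)
        if (b[i].used && b[i].m.ip == m->ip && b[i].m.all == m->all &&
            b[i].proto == proto)
            return true;
    return false;
}
```

For example, with ip = 0x0a000001, port 5060 and proto 17 (host-order toy values), the bucket is (1 + 196 + 17) % 256 = 214, since 0x0a000001 % 256 = 1 and 5060 % 256 = 196.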

>>If you could register ranges, for example, how would you define 
>>the hash function? Or, for example, you would have to be able to delete 
>>parts of that range from the reserved list, so splitting the original 
>>range could be necessary... 
> 
> I know, it's not a trivial issue.  It's ugly.  And I don't think that
> you could ever put ranges in the hashtable.  However, you could provide
> a higher-level function that 
> 
> - locks the htable
> - tries to allocate as many reservations as there are in the range,
>   contiguously
> - unlocks the htable
> 
> In normal cases, we would only have 2 or 4 (maybe 8) consecutive port
> numbers to allocate.  So it is feasible to put each of them into the
> hash.  This would not slow down checking/lookup.
>
>>Couldn't problems like these be solved by allowing more than one 
>>reservation per expectation to be registered?
> 
> The problem is that then there is no atomicity.  They really need to be
> consecutive.  
> 
> So from the implementation point of view: yes, they are multiple
> separate reservations.  But from an API point of view, it should be a
> single call.

   So, you propose something like this:

   ip_nat_reserved_register_range(struct ip_conntrack_expect *expect,
				 struct ip_nat_range *range);

   ip_nat_reserved_unregister_range(struct ip_conntrack_expect *expect,
				   struct ip_nat_range *range);

   It should grab the lock protecting the reserved hash, check whether 
anything in that range is already reserved and, if nothing is, register 
the manips one by one.
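One possible shape for the register/unregister-range pair, reduced to ports on a single NAT address: check the whole range first under the lock, and register nothing if any port is already taken. struct port_range, the fixed-size per-expectation array, the boolean table and both function names are simplified assumptions, not the proposed kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct ip_nat_range: ports on one NAT address. */
struct port_range { uint16_t min, max; };

#define MAX_PER_EXPECT 8
struct expect {                        /* toy ip_conntrack_expect */
    uint16_t ports[MAX_PER_EXPECT];    /* reservations owned by it */
    unsigned int n;
};

static bool reserved[65536];           /* stand-in for the reserved hash */
static pthread_mutex_t reserved_lock = PTHREAD_MUTEX_INITIALIZER;

/* All-or-nothing: either every port in *r is registered, or none is. */
static int register_range(struct expect *exp, const struct port_range *r)
{
    bool clash = false;

    assert(exp && r);
    if (r->min > r->max || exp->n + (r->max - r->min + 1u) > MAX_PER_EXPECT)
        return -1;
    pthread_mutex_lock(&reserved_lock);
    /* 1: is anything in the range already reserved? */
    for (uint32_t p = r->min; p <= r->max && !clash; p++)
        clash = reserved[p];
    /* 2: if not, register the ports one by one, still under the lock */
    if (!clash)
        for (uint32_t p = r->min; p <= r->max; p++) {
            reserved[p] = true;
            exp->ports[exp->n++] = (uint16_t)p;
        }
    pthread_mutex_unlock(&reserved_lock);
    return clash ? -1 : 0;
}

static void unregister_range(struct expect *exp, const struct port_range *r)
{
    pthread_mutex_lock(&reserved_lock);
    for (unsigned int i = 0; i < exp->n; ) {
        uint16_t p = exp->ports[i];
        if (p >= r->min && p <= r->max) {
            reserved[p] = false;
            exp->ports[i] = exp->ports[--exp->n];  /* swap-remove */
        } else {
            i++;
        }
    }
    pthread_mutex_unlock(&reserved_lock);
}
```

Internally these are still separate per-port reservations, but the caller sees a single call that either succeeds for the whole range or leaves the table untouched.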

   Oh, and one possible problem: although NAT won't allocate a reserved 
tuple as unique, a connection which is not NAT-ted can still clash with 
the reservation... How the heck could this be checked?

-- 
   Regards,
     Krisztian KOVACS

* Re: [RFC] NAT tuple reservation
  2003-11-17 14:47 [RFC] NAT tuple reservation KOVACS Krisztian
  2003-11-17 22:29 ` Harald Welte
@ 2003-11-28 15:06 ` KOVACS Krisztian
  1 sibling, 0 replies; 6+ messages in thread
From: KOVACS Krisztian @ 2003-11-28 15:06 UTC
  To: netfilter-devel

   Hi,

KOVACS Krisztian wrote:
> ip_nat_setup_info_reserved(...): This is basically the same as 
> ip_nat_setup_info, except that it calls ip_nat_setup_info() with the 
> appropriate flag, and deletes the reservation in case of successful 
> allocation.

   Oh, one more question, suggested by Nikolai Dahlem: to be able to 
decide whether ip_nat_setup_info() may allocate a reserved tuple, we need 
some kind of additional argument (flags). However, ip_nat_setup_info() is 
called from a number of places, for example from NAT helpers, so changing 
its signature would mean updating all of those callers.

   Nikolai suggested we could provide an ip_nat_setup_info_extended() 
function and make ip_nat_setup_info() a simple wrapper that calls it with 
the default flags. However, I feel that this flags argument could be used 
more universally: for example, our transparent proxying patch already 
adds it and actively uses it to bypass calling helpers.

   So, the question is: what about extending ip_nat_setup_info() with an 
additional flags argument in the NAT core? Or should we stay with a 
"workaround" for now?
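For comparison, the wrapper approach Nikolai suggested could look like the sketch below: existing callers keep the old signature, and only new code passes flags. The flag names, struct conn, and the helper bypass modelled as a field are hypothetical illustrations, not the netfilter or tproxy API.

```c
#include <assert.h>

/* Hypothetical flag names, for illustration only. */
#define NAT_SETUP_DEFAULT        0x0
#define NAT_SETUP_ALLOW_RESERVED 0x1  /* may use a reserved tuple */
#define NAT_SETUP_NO_HELPER      0x2  /* skip helpers, as the tproxy patch does */

struct conn {                     /* toy conntrack: records what happened */
    int mapped;
    int helper_ran;
    int used_reserved;
};

/* New entry point: every behaviour switch is carried in `flags`. */
static int nat_setup_info_extended(struct conn *ct, unsigned int flags)
{
    assert(ct);
    ct->mapped = 1;
    ct->used_reserved = (flags & NAT_SETUP_ALLOW_RESERVED) != 0;
    if (!(flags & NAT_SETUP_NO_HELPER))
        ct->helper_ran = 1;
    return 0;
}

/* Existing callers (e.g. NAT helpers) keep the old, flag-less signature. */
static int nat_setup_info(struct conn *ct)
{
    return nat_setup_info_extended(ct, NAT_SETUP_DEFAULT);
}
```

The trade-off is the one raised above: the wrapper avoids touching every caller now, while adding the argument to ip_nat_setup_info() itself would make the flags available to all callers at the cost of updating them.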

-- 
   Regards,
     Krisztian KOVACS
