Bridge netfilter defered hooks

* Bridge netfilter defered hooks
@ 2006-06-02 14:57 Patrick McHardy
  2006-06-02 17:00 ` Bart De Schuymer
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2006-06-02 14:57 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: Netfilter Development Mailinglist

Bart, I would like to get another discussion going about what
to do about the physdev match and the hook deferral done by
bridge netfilter. The latest addition to the list of things
it breaks is qdisc classification on the bridge device using
MARK or CLASSIFY.

The main question is whether the feature that causes all this trouble
(output port matching within iptables) really is useful at all.
It is not needed for filtering based on the output port of a
bridge; that can be done using ebtables, and iptables+mark if
necessary. The only thing I can see that can't be done using
ebtables is NAT based on the output port. I somehow doubt that
this is really worth all the trouble: Google shows about 20 hits
for "-t nat" "-m physdev" "--physdev-out", half of which appear
to be examples in some magazines. So my preferred solution would
be to deprecate it and remove it in a couple of months.

For a short-term solution we should also think about whether
the hook deferral really needs to be done by default or if the
few users that appear to be using this can't just enable it
manually.


* Re: Bridge netfilter defered hooks
  2006-06-02 14:57 Bridge netfilter defered hooks Patrick McHardy
@ 2006-06-02 17:00 ` Bart De Schuymer
  2006-06-02 17:18   ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: Bart De Schuymer @ 2006-06-02 17:00 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

On Fri, 02-06-2006 at 16:57 +0200, Patrick McHardy wrote:
> Bart, I would like to get another discussion going about what
> to do about the physdev match and the hook deferral done by
> bridge netfilter. The latest addition to the list of things
> it breaks is qdisc classification on the bridge device using
> MARK or CLASSIFY.
> 
> The main question is if the feature that causes all this trouble
> (output port matching within iptables) really is useful at all.
> It is not needed for filtering based on the output port of a
> bridge, this can be done using ebtables and iptables+mark if
> necessary. The only thing I can see that can't be done using
> ebtables is NAT based on the output port. I somehow doubt that
> this is really worth all the trouble, Google shows about 20 hits
> for "-t nat" "-m physdev" "--physdev-out", half of which appear
> to be examples in some magazines. So my preferred solution would
> be to deprecate it and remove it in a couple of months.

Sounds reasonable. You of course missed the combination of any of the
iptables specific matches/targets with the physdev match.
I'm against removing --physdev-out completely, since it's perfectly
usable without causing problems (AFAIK) for filtering purely bridged
packets.
I don't object to disabling the functionality for packets that get
routed to a bridge device. Most of those situations can probably be
fixed by routing the packets to the bridge ports instead.

> For a short-term solution we should also think about whether
> the hook deferral really needs to be done by default or if the
> few users that appear to be using this can't just enable it
> manually.

I have no objections to this either. The physdev module can be modified
so that it sends a warning to the syslog if --physdev-out is used in
non-bridging mode (meaning the rule is probably erroneous).

cheers,
Bart


* Re: Bridge netfilter defered hooks
  2006-06-02 17:00 ` Bart De Schuymer
@ 2006-06-02 17:18   ` Patrick McHardy
  2006-06-02 20:10     ` Carl-Daniel Hailfinger
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2006-06-02 17:18 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: Netfilter Development Mailinglist

Bart De Schuymer wrote:
> On Fri, 02-06-2006 at 16:57 +0200, Patrick McHardy wrote:
> 
>>Bart, I would like to get another discussion going about what
>>to do about the physdev match and the hook deferral done by
>>bridge netfilter. The latest addition to the list of things
>>it breaks is qdisc classification on the bridge device using
>>MARK or CLASSIFY.
>>
>>The main question is if the feature that causes all this trouble
>>(output port matching within iptables) really is useful at all.
>>It is not needed for filtering based on the output port of a
>>bridge, this can be done using ebtables and iptables+mark if
>>necessary. The only thing I can see that can't be done using
>>ebtables is NAT based on the output port. I somehow doubt that
>>this is really worth all the trouble, Google shows about 20 hits
>>for "-t nat" "-m physdev" "--physdev-out", half of which appear
>>to be examples in some magazines. So my preferred solution would
>>be to deprecate it and remove it in a couple of months.
> 
> 
> Sounds reasonable. You of course missed the combination of any of the
> iptables specific matches/targets with the physdev match.

That's what I meant by "iptables+mark". You can combine
iptables-specific matches by marking matching packets, then match on
the mark with ebtables (or the other way around for incoming packets).
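
A minimal sketch of that combination for the outgoing direction. The
port name eth1, the mark value 0x1 and the TCP port 25 match are made
up for illustration. The chain choice assumes that, for bridged
traffic, the ebtables nat POSTROUTING chain is traversed after the
iptables mangle FORWARD chain; please check that against your kernel
before relying on it. The function only prints the rules, so nothing
changes until its output is piped to sh as root.

```shell
#!/bin/sh
# emit_rules PORT MARK: print (do not apply) one iptables and one ebtables rule.
# PORT and MARK are illustrative placeholders, not values from this thread.
emit_rules() {
    port=$1
    mark=$2
    # iptables does the IP-level match and tags the packet.
    echo "iptables -t mangle -A FORWARD -p tcp --dport 25 -j MARK --set-mark $mark"
    # ebtables knows the bridge output port and acts on the tag.
    echo "ebtables -t nat -A POSTROUTING -o $port --mark $mark -j DROP"
}

emit_rules eth1 0x1
```

For incoming packets the roles would be swapped: ebtables sets the mark
based on the input port and iptables matches on it.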

> I'm against removing --physdev-out completely, since it's perfectly
> usable without causing problems (AFAIK) for filtering purely bridged
> packets.

I'm fine with that as long as it doesn't stand in the way of getting
rid of the deferred hooks, but I think you're right and it won't.

> I don't object to disabling the functionality for packets that get
> routed to a bridge device. Most of those situations can probably be
> fixed by routing the packets to the bridge ports instead.
>
>>For a short-term solution we should also think about whether
>>the hook deferral really needs to be done by default or if the
>>few users that appear to be using this can't just enable it
>>manually.
> 
> 
> I have no objections to this either. The physdev module can be modified
> so that it sends a warning to the syslog if --physdev-out is used in
> non-bridging mode (meaning the rule is probably erroneous).

Great, I'll look into this. Thanks.


* Re: Bridge netfilter defered hooks
  2006-06-02 17:18   ` Patrick McHardy
@ 2006-06-02 20:10     ` Carl-Daniel Hailfinger
  2006-06-08  7:15       ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: Carl-Daniel Hailfinger @ 2006-06-02 20:10 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Patrick McHardy wrote:
> Bart De Schuymer wrote:
>> On Fri, 02-06-2006 at 16:57 +0200, Patrick McHardy wrote:
>>
>>> The main question is if the feature that causes all this trouble
>>> (output port matching within iptables) really is useful at all.
>>> It is not needed for filtering based on the output port of a
>>> bridge, this can be done using ebtables and iptables+mark if
>>> necessary.[...]
>>
>> Sounds reasonable. You of course missed the combination of any of the
>> iptables specific matches/targets with the physdev match.
> 
> That's what I meant by "iptables+mark". You can combine iptables
> specific matches by marking matching packets, then match on the
> mark with ebtables (or the other way around for incoming packets).

IIRC the mark has only 32 bits. Not so long ago, I was using 30 bits
of that in my firewalling rules on a bridge-router. I might have
squeezed the physdev match in the remaining 2 bits, but I'm not
sure. I do admit the setup was fairly uncommon (bridging and
double nat with only one machine).

Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


* Re: Bridge netfilter defered hooks
  2006-06-02 20:10     ` Carl-Daniel Hailfinger
@ 2006-06-08  7:15       ` Patrick McHardy
  2006-06-08 20:47         ` Martijn Lievaart
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2006-06-08  7:15 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: Netfilter Development Mailinglist, Bart De Schuymer

Carl-Daniel Hailfinger wrote:
> Patrick McHardy wrote:
> 
>>That's what I meant by "iptables+mark". You can combine iptables
>>specific matches by marking matching packets, then match on the
>>mark with ebtables (or the other way around for incoming packets).
> 
> 
> IIRC the mark has only 32 bits. Not so long ago, I was using 30 bits
> of that in my firewalling rules on a bridge-router. I might have
> squeezed the physdev match in the remaining 2 bits, but I'm not
> sure. I do admit the setup was fairly uncommon (bridging and
> double nat with only one machine).

Yes, it's getting a bit tight in there, but so far in all setups I've
seen it was possible to get along with the 32 bits using masks or
reusing bits after they are no longer needed. I guess we'll have to
wait and see.
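
To make "using masks" concrete, here is a small sketch that computes
the value/mask pair for a bit field inside the 32-bit mark. The helper
name mark_field and the field layout are assumptions for illustration.
The output is in the value/mask form that the iptables mark match
accepts (-m mark --mark value/mask).

```shell
#!/bin/sh
# mark_field OFFSET WIDTH VALUE: print "value/mask" for a bit field in the mark.
# Hypothetical helper; offsets count from the least significant bit.
mark_field() {
    offset=$1
    width=$2
    value=$3
    mask=$(( ((1 << $width) - 1) << $offset ))
    # Shift the value into place and drop anything that overflows the field.
    printf '0x%x/0x%x\n' $(( ($value << $offset) & $mask )) "$mask"
}

# The two spare bits from the example above: bits 30-31, holding the value 2.
mark_field 30 2 2
```

With only two bits left, such a field can distinguish three ports plus
"no port set".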


* Re: Bridge netfilter defered hooks
  2006-06-08  7:15       ` Patrick McHardy
@ 2006-06-08 20:47         ` Martijn Lievaart
  2006-06-08 21:40           ` Simon Lodal
  2006-06-19 15:59           ` Patrick McHardy
  0 siblings, 2 replies; 13+ messages in thread
From: Martijn Lievaart @ 2006-06-08 20:47 UTC (permalink / raw)
  Cc: Netfilter Development Mailinglist

Patrick McHardy wrote:

>Carl-Daniel Hailfinger wrote:
>  
>
>>IIRC the mark has only 32 bits. Not so long ago, I was using 30 bits
>>of that in my firewalling rules on a bridge-router. I might have
>>squeezed the physdev match in the remaining 2 bits, but I'm not
>>sure. I do admit the setup was fairly uncommon (bridging and
>>double nat with only one machine).
>>    
>>
>
>Yes, it's getting a bit tight in there, but so far in all setups I've
>seen it was possible to get along with the 32 bits using masks or
>reusing bits after they are no longer needed. I guess we'll have to
>wait and see ..
>  
>

Something I've been thinking about. Currently it is impossible to write 
any kind of generic tool that uses the mark and plays nice with other 
generic tools. Maybe we need some kind of API that allocates bits in the 
mark. Something like "give me two bits", that returns some handle to the 
bits. That handle could then be used for identifying the bits in the mark.

As an added benefit, this encapsulates the operations on the mark, 
making it trivial to switch to say 64 bits for the mark.

What do others think?

M4


* Re: Bridge netfilter defered hooks
  2006-06-08 20:47         ` Martijn Lievaart
@ 2006-06-08 21:40           ` Simon Lodal
  2006-06-08 22:17             ` Martijn Lievaart
  2006-06-19 15:59           ` Patrick McHardy
  1 sibling, 1 reply; 13+ messages in thread
From: Simon Lodal @ 2006-06-08 21:40 UTC (permalink / raw)
  To: netfilter-devel, Martijn Lievaart

On Thursday 08 June 2006 22:47, Martijn Lievaart wrote:
> Patrick McHardy wrote:
> >Carl-Daniel Hailfinger wrote:
> >>IIRC the mark has only 32 bits. Not so long ago, I was using 30 bits
> >>of that in my firewalling rules on a bridge-router. I might have
> >>squeezed the physdev match in the remaining 2 bits, but I'm not
> >>sure. I do admit the setup was fairly uncommon (bridging and
> >>double nat with only one machine).
> >
> >Yes, it's getting a bit tight in there, but so far in all setups I've
> >seen it was possible to get along with the 32 bits using masks or
> >reusing bits after they are no longer needed. I guess we'll have to
> >wait and see ..
>
> Something I've been thinking about. Currently it is impossible to write
> any kind of generic tool that uses the mark and plays nice with other
> generic tools. Maybe we need some kind of API that allocates bits in the
> mark. Something like "give me two bits", that returns some handle to the
> bits. That handle could then be used for identifying the bits in the mark.
>
> As an added benefit, this encapsulates the operations on the mark,
> making it trivial to switch to say 64 bits for the mark.
>
> What do others think?
>
> M4

Sounds nice. If the handles are names it becomes a sort of runtime-defined bit 
struct. Would be nice to have across the tools, instead of doing the black 
bit magic directly (and in parallel) in each place.

Why not define the mark size at compile time, in case you need more bits?


Greetings

Simon


* Re: Bridge netfilter defered hooks
  2006-06-08 21:40           ` Simon Lodal
@ 2006-06-08 22:17             ` Martijn Lievaart
  2006-06-08 23:43               ` Philip Craig
  0 siblings, 1 reply; 13+ messages in thread
From: Martijn Lievaart @ 2006-06-08 22:17 UTC (permalink / raw)
  To: Simon Lodal; +Cc: netfilter-devel

Simon Lodal wrote:

>Why not define the mark size at compile time, in case you need more bits?
>  
>

Because that would probably cause the skb to grow beyond a cache line,
though I'm nowhere near an expert on these things. Opinions?

M4


* Re: Bridge netfilter defered hooks
  2006-06-08 22:17             ` Martijn Lievaart
@ 2006-06-08 23:43               ` Philip Craig
  2006-06-19 16:01                 ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: Philip Craig @ 2006-06-08 23:43 UTC (permalink / raw)
  To: Martijn Lievaart; +Cc: netfilter-devel

On 06/09/2006 08:17 AM, Martijn Lievaart wrote:
> Simon Lodal wrote:
> 
>> Why not define the mark size at compile time, in case you need more bits?
>>  
>>
> 
> Because that probably would cause the skb to grow beyond a cache line, 
> though I'm nowhere near an expert on these things. Opinions?

It may be enough to extend the conntrack mark, but not the packet mark,
so that we aren't growing the skb.  Would need fancier operations for
transferring only parts of the conntrack mark to the packet mark.
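
A rough sketch of what such a partial transfer would compute, assuming
a hypothetical 64-bit conntrack mark and the existing 32-bit packet
mark. The function name and its shift/mask parameters are invented for
illustration and do not correspond to an existing target option.

```shell
#!/bin/sh
# ct_to_nfmark CTMARK SHIFT MASK NFMARK
# Copy the bits of (CTMARK >> SHIFT) selected by MASK into NFMARK and
# leave the remaining NFMARK bits untouched. Purely illustrative arithmetic.
ct_to_nfmark() {
    ctmark=$1
    shift_by=$2
    mask=$3
    nfmark=$4
    printf '0x%x\n' \
        $(( ($nfmark & ~$mask & 0xffffffff) | (($ctmark >> $shift_by) & $mask) ))
}

# Move bits 8-15 of a wide conntrack mark into the low byte of the packet mark.
ct_to_nfmark 0x1234567800 8 0xff 0xab00
```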


* Re: Bridge netfilter defered hooks
  2006-06-08 20:47         ` Martijn Lievaart
  2006-06-08 21:40           ` Simon Lodal
@ 2006-06-19 15:59           ` Patrick McHardy
  2006-06-20 21:26             ` Martijn Lievaart
  1 sibling, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2006-06-19 15:59 UTC (permalink / raw)
  To: Martijn Lievaart; +Cc: Netfilter Development Mailinglist

Martijn Lievaart wrote:
> Patrick McHardy wrote:
> 
>> Yes, it's getting a bit tight in there, but so far in all setups I've
>> seen it was possible to get along with the 32 bits using masks or
>> reusing bits after they are no longer needed. I guess we'll have to
>> wait and see ..
>>  
>>
> 
> Something I've been thinking about. Currently it is impossible to write
> any kind of generic tool that uses the mark and plays nice with other
> generic tools. Maybe we need some kind of API that allocates bits in the
> mark. Something like "give me two bits", that returns some handle to the
> bits. That handle could then be used for identifying the bits in the mark.

I can see that it would be useful for complex setups, but I can't think
of an efficient implementation of this. You would have to carry a table
of handle identifiers -> mark ranges with every packet, wouldn't you?


* Re: Bridge netfilter defered hooks
  2006-06-08 23:43               ` Philip Craig
@ 2006-06-19 16:01                 ` Patrick McHardy
  0 siblings, 0 replies; 13+ messages in thread
From: Patrick McHardy @ 2006-06-19 16:01 UTC (permalink / raw)
  To: Philip Craig; +Cc: netfilter-devel, Martijn Lievaart

Philip Craig wrote:
> It may be enough to extend the conntrack mark, but not the packet mark,
> so that we aren't growing the skb.  Would need fancier operations for
> transferring only parts of the conntrack mark to the packet mark.

In my experience it's really the nfmark bits that are getting hard to
use in complex setups. I never had problems with the conntrack mark.

I have an almost finished patch for nfmark mask support for routing
rules, which should provide a bit of relief for people using it for
routing.


* Re: Bridge netfilter defered hooks
  2006-06-19 15:59           ` Patrick McHardy
@ 2006-06-20 21:26             ` Martijn Lievaart
  2006-06-20 21:44               ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: Martijn Lievaart @ 2006-06-20 21:26 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

Patrick McHardy wrote:

>Martijn Lievaart wrote:
>  
>
>>Patrick McHardy wrote:
>>
>>    
>>
>>Something I've been thinking about. Currently it is impossible to write
>>any kind of generic tool that uses the mark and plays nice with other
>>generic tools. Maybe we need some kind of API that allocates bits in the
>>mark. Something like "give me two bits", that returns some handle to the
>>bits. That handle could then be used for identifying the bits in the mark.
>>    
>>
>
>I can see that it would be useful for complex setups, but I can't think
>of an efficient implementation of this. You would have to carry a table
>of handle identifiers -> mark ranges with every packet, wouldn't you?
>  
>

No, no! Just an API (which doesn't need to be coupled to iptables kernel 
part at all) where one can "reserve" some bits in the nfmark. That 
handle would refer to the same bit(s) everywhere, but you don't need to 
know which bits in the mark you are exactly using. So this implies 
ANDing as well. Something along these lines (error handling simplified):

# reserve 1 bit
MYMARK=`iptables-mark --reserve mybits:1`
# This would return some 'handle' (most probably the offset in the mark,
# prefixed with something)

# Use MYMARK to set just one bit in the mark
iptables -A ..... -j MARK --set $MYMARK
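
iptables-mark above is a proposal, not an existing tool. As a rough
sketch of the idea, the shell function below keeps a static registry
file with one "name offset width" line per reservation and hands out
the next free bits. The function name mark_reserve, the registry path
and the file format are all assumptions for illustration, and there is
no locking against concurrent callers.

```shell
#!/bin/sh
# Hypothetical registry file: one "name offset width" line per reservation.
MARK_REGISTRY=${MARK_REGISTRY:-./mark_bits.registry}

# mark_reserve NAME WIDTH: print the mask reserved for NAME, allocating
# WIDTH bits at the next free offset if NAME is not registered yet.
mark_reserve() {
    name=$1
    width=$2
    [ -f "$MARK_REGISTRY" ] || : > "$MARK_REGISTRY"
    next=0
    while read -r n off w; do
        if [ "$n" = "$name" ]; then
            # Already reserved: return the same bits every time.
            printf '0x%x\n' $(( ((1 << $w) - 1) << $off ))
            return 0
        fi
        end=$(( $off + $w ))
        [ "$end" -gt "$next" ] && next=$end
    done < "$MARK_REGISTRY"
    if [ $(( $next + $width )) -gt 32 ]; then
        echo "mark_reserve: no room left for $width bits" >&2
        return 1
    fi
    printf '%s %s %s\n' "$name" "$next" "$width" >> "$MARK_REGISTRY"
    printf '0x%x\n' $(( ((1 << $width) - 1) << $next ))
}
```

A tool would then run MYMARK=$(mark_reserve mybits 1) and use
"$MYMARK/$MYMARK" wherever a value/mask pair is accepted, without
knowing which bit it was given.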

M4


* Re: Bridge netfilter defered hooks
  2006-06-20 21:26             ` Martijn Lievaart
@ 2006-06-20 21:44               ` Patrick McHardy
  0 siblings, 0 replies; 13+ messages in thread
From: Patrick McHardy @ 2006-06-20 21:44 UTC (permalink / raw)
  To: Martijn Lievaart; +Cc: Netfilter Development Mailinglist

Martijn Lievaart wrote:
> Patrick McHardy wrote:
> 
>> I can see that it would be useful for complex setups, but I can't think
>> of an efficient implementation of this. You would have to carry a table
>> of handle identifiers -> mark ranges with every packet, wouldn't you?
>>  
>>
> 
> No, no! Just an API (which doesn't need to be coupled to iptables kernel
> part at all) where one can "reserve" some bits in the nfmark. That
> handle would refer to the same bit(s) everywhere, but you don't need to
> know which bits in the mark you are exactly using. So this implies
> ANDing as well. Something along these lines (error handling simplified):
> 
> # reserve 1 bit
> MYMARK=`iptables-mark --reserve mybits:1`
> # This would return some 'handle' (most probably the offset in the mark,
> # prefixed with something)
> 
> # Use MYMARK to set just one bit in the mark
> iptables -A ..... -j MARK --set $MYMARK

It seems I misunderstood you; I thought you were talking about dynamic
reservations :) Sure, something like /etc/iproute2/rt_realms would make
life easier for users. But it doesn't really solve the problem that it's
sometimes really hard to get along with 32 bits.
