netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
@ 2012-01-04  8:07 Hans Schillstrom
  2012-01-04  8:28 ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04  8:07 UTC (permalink / raw)
  To: kaber, pablo, jengelh, netfilter-devel, netdev; +Cc: hans, Hans Schillstrom

In some cases it not desirable to have auto defrag.
Ex. in a cluster where packets can arrive on different blades.
In that case it is possible to use containers (LXC) and send
all fragments to one place where defrag is enabled.

This patch makes it possible to turn off the defrag per network name space,
by setting net.netfilter.nf_conntrack_nodefrag to 1.
Both IPv4 and IPv6 is effected by this sysctl.
Default is 0 which is defrag.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/netns/conntrack.h             |    1 +
 net/ipv4/netfilter/nf_defrag_ipv4.c       |    8 ++++++++
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c |    6 ++++++
 net/netfilter/nf_conntrack_standalone.c   |    8 ++++++++
 4 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 7a911ec..059f7b5 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -26,6 +26,7 @@ struct netns_ct {
 	int			sysctl_tstamp;
 	int			sysctl_checksum;
 	unsigned int		sysctl_log_invalid; /* Log invalid packets */
+	int			sysctl_nodefrag;
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*sysctl_header;
 	struct ctl_table_header	*acct_sysctl_header;
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 9bb1b8a..f4908b3 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -74,6 +74,14 @@ static unsigned int ipv4_conntrack_defrag(unsigned int hooknum,
 		return NF_ACCEPT;
 
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+	/* Check for no defrag options */
+	{
+		const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
+						out : in);
+
+		if (dev_net(dev)->ct.sysctl_nodefrag)
+			return NF_ACCEPT;
+	}
 #if !defined(CONFIG_NF_NAT) && !defined(CONFIG_NF_NAT_MODULE)
 	/* Previously seen (loopback)?  Ignore.  Do this before
 	   fragment check. */
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index cdd6d04..4b0a05b 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -61,6 +61,12 @@ static unsigned int ipv6_defrag(unsigned int hooknum,
 	struct sk_buff *reasm;
 
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+	/* Check for no defrag options */
+	const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
+					out : in);
+
+	if (dev_net(dev)->ct.sysctl_nodefrag)
+		return NF_ACCEPT;
 	/* Previously seen (loopback)?	*/
 	if (skb->nfct && !nf_ct_is_template((struct nf_conn *)skb->nfct))
 		return NF_ACCEPT;
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 885f5ab..95c489f 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -446,6 +446,13 @@ static ctl_table nf_ct_sysctl_table[] = {
 		.extra2		= &log_invalid_proto_max,
 	},
 	{
+		.procname	= "nf_conntrack_nodefrag",
+		.data		= &init_net.ct.sysctl_nodefrag,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
 		.procname	= "nf_conntrack_expect_max",
 		.data		= &nf_ct_expect_max,
 		.maxlen		= sizeof(int),
@@ -493,6 +500,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 	table[2].data = &net->ct.htable_size;
 	table[3].data = &net->ct.sysctl_checksum;
 	table[4].data = &net->ct.sysctl_log_invalid;
+	table[5].data = &net->ct.sysctl_nodefrag;
 
 	net->ct.sysctl_header = register_net_sysctl_table(net,
 					nf_net_netfilter_sysctl_path, table);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  8:07 [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns Hans Schillstrom
@ 2012-01-04  8:28 ` Jozsef Kadlecsik
  2012-01-04  8:49   ` Hans Schillstrom
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2012-01-04  8:28 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Patrick McHardy, Pablo Neira Ayuso, jengelh, netfilter-devel,
	netdev, hans

Hi Hans,

On Wed, 4 Jan 2012, Hans Schillstrom wrote:

> In some cases it not desirable to have auto defrag.
> Ex. in a cluster where packets can arrive on different blades.
> In that case it is possible to use containers (LXC) and send
> all fragments to one place where defrag is enabled.
> 
> This patch makes it possible to turn off the defrag per network name space,
> by setting net.netfilter.nf_conntrack_nodefrag to 1.
> Both IPv4 and IPv6 is effected by this sysctl.
> Default is 0 which is defrag.

Conntrack assumes that the packets are defragmented and will drop any 
unfragmented one. So your patch results packet drops.

Also, if you want to disable defragmentation then why don't you simply 
"mark" the packets with the NOTRACK target?

Best regards,
Jozsef
 
> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
> ---
>  include/net/netns/conntrack.h             |    1 +
>  net/ipv4/netfilter/nf_defrag_ipv4.c       |    8 ++++++++
>  net/ipv6/netfilter/nf_defrag_ipv6_hooks.c |    6 ++++++
>  net/netfilter/nf_conntrack_standalone.c   |    8 ++++++++
>  4 files changed, 23 insertions(+), 0 deletions(-)
> 
> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
> index 7a911ec..059f7b5 100644
> --- a/include/net/netns/conntrack.h
> +++ b/include/net/netns/conntrack.h
> @@ -26,6 +26,7 @@ struct netns_ct {
>  	int			sysctl_tstamp;
>  	int			sysctl_checksum;
>  	unsigned int		sysctl_log_invalid; /* Log invalid packets */
> +	int			sysctl_nodefrag;
>  #ifdef CONFIG_SYSCTL
>  	struct ctl_table_header	*sysctl_header;
>  	struct ctl_table_header	*acct_sysctl_header;
> diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
> index 9bb1b8a..f4908b3 100644
> --- a/net/ipv4/netfilter/nf_defrag_ipv4.c
> +++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
> @@ -74,6 +74,14 @@ static unsigned int ipv4_conntrack_defrag(unsigned int hooknum,
>  		return NF_ACCEPT;
>  
>  #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
> +	/* Check for no defrag options */
> +	{
> +		const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
> +						out : in);
> +
> +		if (dev_net(dev)->ct.sysctl_nodefrag)
> +			return NF_ACCEPT;
> +	}
>  #if !defined(CONFIG_NF_NAT) && !defined(CONFIG_NF_NAT_MODULE)
>  	/* Previously seen (loopback)?  Ignore.  Do this before
>  	   fragment check. */
> diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
> index cdd6d04..4b0a05b 100644
> --- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
> +++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
> @@ -61,6 +61,12 @@ static unsigned int ipv6_defrag(unsigned int hooknum,
>  	struct sk_buff *reasm;
>  
>  #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
> +	/* Check for no defrag options */
> +	const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
> +					out : in);
> +
> +	if (dev_net(dev)->ct.sysctl_nodefrag)
> +		return NF_ACCEPT;
>  	/* Previously seen (loopback)?	*/
>  	if (skb->nfct && !nf_ct_is_template((struct nf_conn *)skb->nfct))
>  		return NF_ACCEPT;
> diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
> index 885f5ab..95c489f 100644
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -446,6 +446,13 @@ static ctl_table nf_ct_sysctl_table[] = {
>  		.extra2		= &log_invalid_proto_max,
>  	},
>  	{
> +		.procname	= "nf_conntrack_nodefrag",
> +		.data		= &init_net.ct.sysctl_nodefrag,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
>  		.procname	= "nf_conntrack_expect_max",
>  		.data		= &nf_ct_expect_max,
>  		.maxlen		= sizeof(int),
> @@ -493,6 +500,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>  	table[2].data = &net->ct.htable_size;
>  	table[3].data = &net->ct.sysctl_checksum;
>  	table[4].data = &net->ct.sysctl_log_invalid;
> +	table[5].data = &net->ct.sysctl_nodefrag;
>  
>  	net->ct.sysctl_header = register_net_sysctl_table(net,
>  					nf_net_netfilter_sysctl_path, table);
> -- 
> 1.7.2.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  8:28 ` Jozsef Kadlecsik
@ 2012-01-04  8:49   ` Hans Schillstrom
  2012-01-04  9:03     ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04  8:49 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Patrick McHardy, Pablo Neira Ayuso, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

Hello Jozsef

On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
> Hi Hans,
> 
> On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> 
> > In some cases it not desirable to have auto defrag.
> > Ex. in a cluster where packets can arrive on different blades.
> > In that case it is possible to use containers (LXC) and send
> > all fragments to one place where defrag is enabled.
> > 
> > This patch makes it possible to turn off the defrag per network name space,
> > by setting net.netfilter.nf_conntrack_nodefrag to 1.
> > Both IPv4 and IPv6 is effected by this sysctl.
> > Default is 0 which is defrag.
> 
> Conntrack assumes that the packets are defragmented and will drop any 
> unfragmented one. So your patch results packet drops.

Hmmm, more work...
> 
> Also, if you want to disable defragmentation then why don't you simply 
> "mark" the packets with the NOTRACK target?

I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400

> 
> Best regards,
> Jozsef
>  

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  8:49   ` Hans Schillstrom
@ 2012-01-04  9:03     ` Jozsef Kadlecsik
  2012-01-04  9:32       ` Jan Engelhardt
  2012-01-04 10:18       ` Hans Schillstrom
  0 siblings, 2 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2012-01-04  9:03 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Patrick McHardy, Pablo Neira Ayuso, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wed, 4 Jan 2012, Hans Schillstrom wrote:

> On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
> > 
> > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> > 
> > > In some cases it not desirable to have auto defrag.
> > > Ex. in a cluster where packets can arrive on different blades.
> > > In that case it is possible to use containers (LXC) and send
> > > all fragments to one place where defrag is enabled.
> > > 
> > > This patch makes it possible to turn off the defrag per network name space,
> > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
> > > Both IPv4 and IPv6 is effected by this sysctl.
> > > Default is 0 which is defrag.
> > 
> > Conntrack assumes that the packets are defragmented and will drop any 
> > unfragmented one. So your patch results packet drops.
> 
> Hmmm, more work...
> > 
> > Also, if you want to disable defragmentation then why don't you simply 
> > "mark" the packets with the NOTRACK target?
> 
> I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400

Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
The raw table should be made possible to completely override conntack and 
defrag is implicit part of the latter.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  9:03     ` Jozsef Kadlecsik
@ 2012-01-04  9:32       ` Jan Engelhardt
  2012-01-04  9:47         ` Hans Schillstrom
  2012-01-04  9:49         ` Jozsef Kadlecsik
  2012-01-04 10:18       ` Hans Schillstrom
  1 sibling, 2 replies; 22+ messages in thread
From: Jan Engelhardt @ 2012-01-04  9:32 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Hans Schillstrom, Patrick McHardy, Pablo Neira Ayuso,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wednesday 2012-01-04 10:03, Jozsef Kadlecsik wrote:

>On Wed, 4 Jan 2012, Hans Schillstrom wrote:
>
>> On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
>> > 
>> > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
>> > 
>> > > In some cases it not desirable to have auto defrag.
>> > > Ex. in a cluster where packets can arrive on different blades.
>> > > In that case it is possible to use containers (LXC) and send
>> > > all fragments to one place where defrag is enabled.
>> > > 
>> > > This patch makes it possible to turn off the defrag per network name space,
>> > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
>> > > Both IPv4 and IPv6 is effected by this sysctl.
>> > > Default is 0 which is defrag.
>> > 
>> > Conntrack assumes that the packets are defragmented and will drop any 
>> > unfragmented one. So your patch results packet drops.
>> 
>> Hmmm, more work...
>> > 
>> > Also, if you want to disable defragmentation then why don't you simply 
>> > "mark" the packets with the NOTRACK target?
>> 
>> I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400
>
>Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
>The raw table should be made possible to completely override conntack and 
>defrag is implicit part of the latter.

We've been there (me in the thread even) - defrag is running before raw,
because otherwise you could not select packets based upon L4 
parameters for non-defrag in the first place:

	-t raw ... -p udp --dport 53 -j CT --notrack

Not that I overly care about whether defrag is before/after raw..

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  9:32       ` Jan Engelhardt
@ 2012-01-04  9:47         ` Hans Schillstrom
  2012-01-04 17:23           ` Pablo Neira Ayuso
  2012-01-04  9:49         ` Jozsef Kadlecsik
  1 sibling, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04  9:47 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Jozsef Kadlecsik, Patrick McHardy, Pablo Neira Ayuso,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wednesday 04 January 2012 10:32:14 Jan Engelhardt wrote:
> On Wednesday 2012-01-04 10:03, Jozsef Kadlecsik wrote:
> 
> >On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> >
> >> On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
> >> > 
> >> > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> >> > 
> >> > > In some cases it not desirable to have auto defrag.
> >> > > Ex. in a cluster where packets can arrive on different blades.
> >> > > In that case it is possible to use containers (LXC) and send
> >> > > all fragments to one place where defrag is enabled.
> >> > > 
> >> > > This patch makes it possible to turn off the defrag per network name space,
> >> > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
> >> > > Both IPv4 and IPv6 is effected by this sysctl.
> >> > > Default is 0 which is defrag.
> >> > 
> >> > Conntrack assumes that the packets are defragmented and will drop any 
> >> > unfragmented one. So your patch results packet drops.
> >> 
> >> Hmmm, more work...
> >> > 
> >> > Also, if you want to disable defragmentation then why don't you simply 
> >> > "mark" the packets with the NOTRACK target?
> >> 
> >> I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400
> >
> >Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
> >The raw table should be made possible to completely override conntack and 
> >defrag is implicit part of the latter.
> 
> We've been there (me in the thread even) - defrag is running before raw,
> because otherwise you could not select packets based upon L4 
> parameters for non-defrag in the first place:
> 
> 	-t raw ... -p udp --dport 53 -j CT --notrack
> 
> Not that I overly care about whether defrag is before/after raw..
> 
What about a mod param for ip{6}table_raw so it could be changed ?

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  9:32       ` Jan Engelhardt
  2012-01-04  9:47         ` Hans Schillstrom
@ 2012-01-04  9:49         ` Jozsef Kadlecsik
  1 sibling, 0 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2012-01-04  9:49 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Hans Schillstrom, Patrick McHardy, Pablo Neira Ayuso,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wed, 4 Jan 2012, Jan Engelhardt wrote:

> On Wednesday 2012-01-04 10:03, Jozsef Kadlecsik wrote:
> 
> >On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> >
> >> On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
> >> > 
> >> > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> >> > 
> >> > > In some cases it not desirable to have auto defrag.
> >> > > Ex. in a cluster where packets can arrive on different blades.
> >> > > In that case it is possible to use containers (LXC) and send
> >> > > all fragments to one place where defrag is enabled.
> >> > > 
> >> > > This patch makes it possible to turn off the defrag per network name space,
> >> > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
> >> > > Both IPv4 and IPv6 is effected by this sysctl.
> >> > > Default is 0 which is defrag.
> >> > 
> >> > Conntrack assumes that the packets are defragmented and will drop any 
> >> > unfragmented one. So your patch results packet drops.
> >> 
> >> Hmmm, more work...
> >> > 
> >> > Also, if you want to disable defragmentation then why don't you simply 
> >> > "mark" the packets with the NOTRACK target?
> >> 
> >> I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400
> >
> >Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
> >The raw table should be made possible to completely override conntack and 
> >defrag is implicit part of the latter.
> 
> We've been there (me in the thread even) - defrag is running before raw,
> because otherwise you could not select packets based upon L4 
> parameters for non-defrag in the first place:
> 
> 	-t raw ... -p udp --dport 53 -j CT --notrack
> 
> Not that I overly care about whether defrag is before/after raw..

You mean "for non-conntrack", but you are right. Nice cicle :-).

So we can sum up that either the system has got conntrack enabled which
requires defrag, or there's no defrag but then there's no conntrack at 
all.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  9:03     ` Jozsef Kadlecsik
  2012-01-04  9:32       ` Jan Engelhardt
@ 2012-01-04 10:18       ` Hans Schillstrom
  2012-01-04 11:17         ` Jan Engelhardt
  1 sibling, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04 10:18 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Patrick McHardy, Pablo Neira Ayuso, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wednesday 04 January 2012 10:03:49 Jozsef Kadlecsik wrote:
> On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> 
> > On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
> > > 
> > > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> > > 
> > > > In some cases it not desirable to have auto defrag.
> > > > Ex. in a cluster where packets can arrive on different blades.
> > > > In that case it is possible to use containers (LXC) and send
> > > > all fragments to one place where defrag is enabled.
> > > > 
> > > > This patch makes it possible to turn off the defrag per network name space,
> > > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
> > > > Both IPv4 and IPv6 is effected by this sysctl.
> > > > Default is 0 which is defrag.
> > > 
> > > Conntrack assumes that the packets are defragmented and will drop any 
> > > unfragmented one. So your patch results packet drops.
> > 
> > Hmmm, more work...
> > > 
> > > Also, if you want to disable defragmentation then why don't you simply 
> > > "mark" the packets with the NOTRACK target?
> > 
> > I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400
> 
> Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
> The raw table should be made possible to completely override conntack and 
> defrag is implicit part of the latter.
> 

An other idea, turn off both conntrack and defrag
i.e. do like NOTRAC and rename the flag  ?

Quick example for IPv4:
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -74,6 +74,14 @@ static unsigned int ipv4_conntrack_defrag(unsigned int hooknum,
...
+	const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
+					out : in);
+
+	/* No defrag and not Previously seen (loopback)? */
+	if (dev_net(dev)->ct.sysctl_notrac_defrag && skb->nfct) {
+		/* Attach fake conntrack entry. as in NOTRACK */
+		skb->nfct = &nf_ct_untracked_get()->ct_general;
+		skb->nfctinfo = IP_CT_NEW;
+		nf_conntrack_get(skb->nfct);
+		return NF_ACCEPT;
+	}
...

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 10:18       ` Hans Schillstrom
@ 2012-01-04 11:17         ` Jan Engelhardt
  2012-01-04 11:48           ` Hans Schillstrom
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2012-01-04 11:17 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Jozsef Kadlecsik, Patrick McHardy, Pablo Neira Ayuso,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wednesday 2012-01-04 11:18, Hans Schillstrom wrote:

>On Wednesday 04 January 2012 10:03:49 Jozsef Kadlecsik wrote:
>> On Wed, 4 Jan 2012, Hans Schillstrom wrote:
>> 
>> > On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
>> > > 
>> > > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
>> > > 
>> > > > In some cases it not desirable to have auto defrag.
>> > > > Ex. in a cluster where packets can arrive on different blades.
>> > > > In that case it is possible to use containers (LXC) and send
>> > > > all fragments to one place where defrag is enabled.
>> > > > 
>> > > > This patch makes it possible to turn off the defrag per network name space,
>> > > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
>> > > > Both IPv4 and IPv6 is effected by this sysctl.
>> > > > Default is 0 which is defrag.
>> > > 
>> > > Conntrack assumes that the packets are defragmented and will drop any 
>> > > unfragmented one. So your patch results packet drops.
>> > 
>> > Hmmm, more work...
>> > > 
>> > > Also, if you want to disable defragmentation then why don't you simply 
>> > > "mark" the packets with the NOTRACK target?
>> > 
>> > I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400
>> 
>> Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
>> The raw table should be made possible to completely override conntack and 
>> defrag is implicit part of the latter.
>> 
>
>An other idea, turn off both conntrack and defrag
>i.e. do like NOTRAC and rename the flag  ?

Or just add a new table - that one we can remove/stash
when I get my xt2 patches out.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 11:17         ` Jan Engelhardt
@ 2012-01-04 11:48           ` Hans Schillstrom
  2012-01-04 17:40             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04 11:48 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Jozsef Kadlecsik, Patrick McHardy, Pablo Neira Ayuso,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wednesday 04 January 2012 12:17:18 Jan Engelhardt wrote:
> On Wednesday 2012-01-04 11:18, Hans Schillstrom wrote:
> 
> >On Wednesday 04 January 2012 10:03:49 Jozsef Kadlecsik wrote:
> >> On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> >> 
> >> > On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
> >> > > 
> >> > > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> >> > > 
> >> > > > In some cases it not desirable to have auto defrag.
> >> > > > Ex. in a cluster where packets can arrive on different blades.
> >> > > > In that case it is possible to use containers (LXC) and send
> >> > > > all fragments to one place where defrag is enabled.
> >> > > > 
> >> > > > This patch makes it possible to turn off the defrag per network name space,
> >> > > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
> >> > > > Both IPv4 and IPv6 is effected by this sysctl.
> >> > > > Default is 0 which is defrag.
> >> > > 
> >> > > Conntrack assumes that the packets are defragmented and will drop any 
> >> > > unfragmented one. So your patch results packet drops.
> >> > 
> >> > Hmmm, more work...
> >> > > 
> >> > > Also, if you want to disable defragmentation then why don't you simply 
> >> > > "mark" the packets with the NOTRACK target?
> >> > 
> >> > I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400
> >> 
> >> Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
> >> The raw table should be made possible to completely override conntack and 
> >> defrag is implicit part of the latter.
> >> 
> >
> >An other idea, turn off both conntrack and defrag
> >i.e. do like NOTRAC and rename the flag  ?
> 
> Or just add a new table - that one we can remove/stash
> when I get my xt2 patches out.
> 
I like that idea, an "early" table at prio -500 with PREROUTING.
There is also a need for a new flag "--allfrags"
i.e. all fragments needs to be sorted out and sent to same dest for defrag.

ex.
iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK


-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04  9:47         ` Hans Schillstrom
@ 2012-01-04 17:23           ` Pablo Neira Ayuso
  0 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2012-01-04 17:23 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Jan Engelhardt, Jozsef Kadlecsik, Patrick McHardy,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wed, Jan 04, 2012 at 10:47:53AM +0100, Hans Schillstrom wrote:
> On Wednesday 04 January 2012 10:32:14 Jan Engelhardt wrote:
> > On Wednesday 2012-01-04 10:03, Jozsef Kadlecsik wrote:
> > 
> > >On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> > >
> > >> On Wednesday 04 January 2012 09:28:05 Jozsef Kadlecsik wrote:
> > >> > 
> > >> > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> > >> > 
> > >> > > In some cases it not desirable to have auto defrag.
> > >> > > Ex. in a cluster where packets can arrive on different blades.
> > >> > > In that case it is possible to use containers (LXC) and send
> > >> > > all fragments to one place where defrag is enabled.
> > >> > > 
> > >> > > This patch makes it possible to turn off the defrag per network name space,
> > >> > > by setting net.netfilter.nf_conntrack_nodefrag to 1.
> > >> > > Both IPv4 and IPv6 is effected by this sysctl.
> > >> > > Default is 0 which is defrag.
> > >> > 
> > >> > Conntrack assumes that the packets are defragmented and will drop any 
> > >> > unfragmented one. So your patch results packet drops.
> > >> 
> > >> Hmmm, more work...
> > >> > 
> > >> > Also, if you want to disable defragmentation then why don't you simply 
> > >> > "mark" the packets with the NOTRACK target?
> > >> 
> > >> I don't think that will work since NF_IP_PRI_CONNTRACK_DEFRAG is -400
> > >
> > >Then change NF_IP_PRI_RAW so that it precedes NF_IP_PRI_CONNTRACK_DEFRAG. 
> > >The raw table should be made possible to completely override conntack and 
> > >defrag is implicit part of the latter.
> > 
> > We've been there (me in the thread even) - defrag is running before raw,
> > because otherwise you could not select packets based upon L4 
> > parameters for non-defrag in the first place:
> > 
> > 	-t raw ... -p udp --dport 53 -j CT --notrack
> > 
> > Not that I overly care about whether defrag is before/after raw..
> > 
> What about a mod param for ip{6}table_raw so it could be changed ?

No obscure modparam tweaks, please.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 11:48           ` Hans Schillstrom
@ 2012-01-04 17:40             ` Pablo Neira Ayuso
  2012-01-04 18:05               ` Jozsef Kadlecsik
                                 ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2012-01-04 17:40 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Jan Engelhardt, Jozsef Kadlecsik, Patrick McHardy,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> I like that idea, an "early" table at prio -500 with PREROUTING.
> There is also a need for a new flag "--allfrags"
> i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> 
> ex.
> iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK

New tables add too much overhead. We have discussed this before with
Patrick.

Since this still remains specific to your needs, I think you can
remove nf_conntrack module in your setup.

I don't come with one sane setup that may want selectively defragment
some traffic yes and other not.

Am I missing anything else?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 17:40             ` Pablo Neira Ayuso
@ 2012-01-04 18:05               ` Jozsef Kadlecsik
  2012-01-04 20:56                 ` Hans Schillstrom
  2012-01-04 20:45               ` Hans Schillstrom
  2012-01-04 21:15               ` Hans Schillstrom
  2 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2012-01-04 18:05 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Hans Schillstrom, Jan Engelhardt, Patrick McHardy,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com

On Wed, 4 Jan 2012, Pablo Neira Ayuso wrote:

> On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> > I like that idea, an "early" table at prio -500 with PREROUTING.
> > There is also a need for a new flag "--allfrags"
> > i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> > 
> > ex.
> > iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK
> 
> New tables add too much overhead. We have discussed this before with
> Patrick.
> 
> Since this still remains specific to your needs, I think you can
> remove nf_conntrack module in your setup.
> 
> I don't come with one sane setup that may want selectively defragment
> some traffic yes and other not.
> 
> Am I missing anything else?

I agree. If you don't want defragmentation at all, then make sure you 
don't load the nf_conntrack module directly/indirectly. Conntrack doesn't 
work without defragmentation anyway.

The only thing what such a really-early table could buy at the moment is 
to specify which flows not to defragment at layer 3 level.

If we had dynamic hooks registration and hook priorities at table level, 
that'd come handy now.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 17:40             ` Pablo Neira Ayuso
  2012-01-04 18:05               ` Jozsef Kadlecsik
@ 2012-01-04 20:45               ` Hans Schillstrom
  2012-01-04 21:15               ` Hans Schillstrom
  2 siblings, 0 replies; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04 20:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Hans Schillstrom, Jan Engelhardt, Jozsef Kadlecsik,
	Patrick McHardy, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org


On Wednesday, January 04, 2012 18:40:35 Pablo Neira Ayuso wrote:
> On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> > I like that idea, an "early" table at prio -500 with PREROUTING.
> > There is also a need for a new flag "--allfrags"
> > i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> > 
> > ex.
> > iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK
> 
> New tables add too much overhead. We have discussed this before with
> Patrick.
> 
> Since this still remains specific to your needs, I think you can
> remove nf_conntrack module in your setup.
> 
> I don't come with one sane setup that may want selectively defragment
> some traffic yes and other not.
> 
> Am I missing anything else?
>

I might have been a little bit unclear, so I'll try the opposite :-)

Network namesapce i.e. Linux Containers (LXC) creates new possibilities,
Linux moves to new domains - Large Clusters controllers.

When you have two or more interfaces (on different machines) that receives data
from the Internet you will sooner or later end up with fragments on different
interfaces.

If you deal with Virtual IP:s in the cluster (which is very common)
there must be some place where packet defrag occurs, before sending
it to a load balancer.

Hardware is cheap but space and power consumption is not, so
no one wants extra hardware. If possible extra hops should also be avoided.

With existing functionality an extra level of physical machines must be
added between the (FW/GW) and the Load-Balancers to do the defrag,
which is not very efficient.

With a solution where it's possible to sort out fragments early
(based on ex source address) and send them to the same Container for defragmentation
no extra hardware is needed and only fragmented packet have an extra hop.


A Simplified Example:
(ASCII grapichs have some limitaions)

            Blade 1
         +------------+
         |   +-----+  | Defrag/LB
Inet A   |   | FW. |  |  Trafic                 VIP 11.1.1.1
---------+-> | LXC |--|-->+                     Blade a
         |   +-----+  |   |                    +-------+
         |      |<----|---+                    | Appl. |
         |   +-----+  |   |       +-------- >  | Serv. |
         |   | LB. |__|___|_______|            +-------+
         |   | IPVS|  |   |       |
         |   +-----+  |   |       |
         +------------+   |       |
                          |       |
            Blade 2       |       |
         +------------+   |       |             VIP 11.1.1.1
         |   +-----+  |   |       |             Blade b
Inet B   |   | FW. |  |   |       |            +-------+
---------+-> | LXC |--|-->|       |            | Appl. |
         |   +-----+  |   |       +----------> | Serv. |
         |      | <---|---+       |            +-------+
         |   +-----+  |   |       |
         |   | LB. |__|___|_______|
         |   | IPVS|  |   |       |             VIP 11.1.1.1
         |   +-----+  |   |       |             Blade c
         +------------+   |       |            +-------+
                          |       |            | Appl. |
            Blade n       |       +--------->  | Serv. |
         +------------+   |       |            +-------+
         |   +-----+  |   |       |
Inet N   |   | FW. |  |   |       |             VIP 11.1.1.1
---------+-> | LXC |--|-->|       |             Blade x
         |   +-----+  |   |       |            +-------+
         |      |<----|---+       |            | Appl. |
         |   +-----+  |           +--------->  | Serv. |
         |   | LB. |__|___________|            +-------+
         |   | IPVS|  |
         |   +-----+  |
         +------------+

You might even co-locate the Appl on the FW/GW Blades.
The ideal solution would be where you can sort out fragments based on interface
and have defrag on others. (In this case even the first fragment)

Regards
Hans Schillstrom

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 18:05               ` Jozsef Kadlecsik
@ 2012-01-04 20:56                 ` Hans Schillstrom
  2012-01-04 21:40                   ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04 20:56 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Pablo Neira Ayuso, Hans Schillstrom, Jan Engelhardt,
	Patrick McHardy, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org


On Wednesday, January 04, 2012 19:05:10 Jozsef Kadlecsik wrote:
> On Wed, 4 Jan 2012, Pablo Neira Ayuso wrote:
> 
> > On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> > > I like that idea, an "early" table at prio -500 with PREROUTING.
> > > There is also a need for a new flag "--allfrags"
> > > i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> > > 
> > > ex.
> > > iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK
> > 
> > New tables add too much overhead. We have discussed this before with
> > Patrick.
> > 
> > Since this still remains specific to your needs, I think you can
> > remove nf_conntrack module in your setup.
> > 
> > I don't come with one sane setup that may want selectively defragment
> > some traffic yes and other not.
> > 
> > Am I missing anything else?
> 
> I agree. If you don't want defragmentation at all, then make sure you 
> don't load the nf_conntrack module directly/indirectly. Conntrack doesn't 
> work without defragmentation anyway.

We are using LXC and it's only in the container that holds the external 
interface that can't have defragmentation.
The problem is if it's loaded you have it in all namespaces :-(

> 
> The only thing what such a really-early table could buy at the moment is 
> to specify which flows not to defragment at layer 3 level.
> 
> If we had dynamic hooks registration and hook priorities at table level, 
> that'd come handy now.

I do agree.

> 

Regards
Hans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 17:40             ` Pablo Neira Ayuso
  2012-01-04 18:05               ` Jozsef Kadlecsik
  2012-01-04 20:45               ` Hans Schillstrom
@ 2012-01-04 21:15               ` Hans Schillstrom
  2 siblings, 0 replies; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-04 21:15 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Hans Schillstrom, Jan Engelhardt, Jozsef Kadlecsik,
	Patrick McHardy, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org

[-- Attachment #1: Type: Text/Plain, Size: 985 bytes --]

Hello Again

On Wednesday, January 04, 2012 18:40:35 Pablo Neira Ayuso wrote:
> On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> > I like that idea, an "early" table at prio -500 with PREROUTING.
> > There is also a need for a new flag "--allfrags"
> > i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> > 
> > ex.
> > iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK
> 
> New tables add too much overhead. We have discussed this before with
> Patrick.
> 
Only if loaded .. 
It would have been the perfect solution.
Is the discussion about the overhead on the list (I can't find it)?

I made a quick test with an "early" table
and --allfrags fix (for IPv4) and it works really good.

iptables -t early -A PREROUTING -i eth0 -a -j NOTRACK
iptables -t mangle -A PREROUTING -i eth0 -a -j HMARK --mod 3 --offs 100

So your opinion is no more tables,
even if it's rare that it is loaded?

Regards
Hans

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 20:56                 ` Hans Schillstrom
@ 2012-01-04 21:40                   ` Jozsef Kadlecsik
  2012-01-05  7:19                     ` Hans Schillstrom
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2012-01-04 21:40 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Pablo Neira Ayuso, Hans Schillstrom, Jan Engelhardt,
	Patrick McHardy, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org

On Wed, 4 Jan 2012, Hans Schillstrom wrote:

> On Wednesday, January 04, 2012 19:05:10 Jozsef Kadlecsik wrote:
> > On Wed, 4 Jan 2012, Pablo Neira Ayuso wrote:
> > 
> > > On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> > > > I like that idea, an "early" table at prio -500 with PREROUTING.
> > > > There is also a need for a new flag "--allfrags"
> > > > i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> > > > 
> > > > ex.
> > > > iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK
> > > 
> > > New tables add too much overhead. We have discussed this before with
> > > Patrick.
> > > 
> > > Since this still remains specific to your needs, I think you can
> > > remove nf_conntrack module in your setup.
> > > 
> > > I don't come with one sane setup that may want selectively defragment
> > > some traffic yes and other not.
> > > 
> > > Am I missing anything else?
> > 
> > I agree. If you don't want defragmentation at all, then make sure you 
> > don't load the nf_conntrack module directly/indirectly. Conntrack doesn't 
> > work without defragmentation anyway.
> 
> We are using LXC and it's only in the container that holds the external 
> interface that can't have defragmentation.
> The problem is if it's loaded you have it in all namespaces :-(

Conntrack is per net namespaces. You may have one container with conntrack 
enabled and another one without conntrack.

Moreover, if you may receive fragments of the same packet at different 
interfaces in different blades, then you may receive different whole 
packets of the same flow at different interfaces/blades. But stateful 
firewalling relies on the assumption that all packets goes through of the 
firewall. Because it's not assured, conntrack may not run in the 
containers you denoted as FW LXC.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-04 21:40                   ` Jozsef Kadlecsik
@ 2012-01-05  7:19                     ` Hans Schillstrom
  2012-01-05  9:11                       ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-05  7:19 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Hans Schillstrom, Pablo Neira Ayuso, Jan Engelhardt,
	Patrick McHardy, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org

On Wednesday 04 January 2012 22:40:09 Jozsef Kadlecsik wrote:
> On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> 
> > On Wednesday, January 04, 2012 19:05:10 Jozsef Kadlecsik wrote:
> > > On Wed, 4 Jan 2012, Pablo Neira Ayuso wrote:
> > > 
> > > > On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> > > > > I like that idea, an "early" table at prio -500 with PREROUTING.
> > > > > There is also a need for a new flag "--allfrags"
> > > > > i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> > > > > 
> > > > > ex.
> > > > > iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK
> > > > 
> > > > New tables add too much overhead. We have discussed this before with
> > > > Patrick.
> > > > 
> > > > Since this still remains specific to your needs, I think you can
> > > > remove nf_conntrack module in your setup.
> > > > 
> > > > I don't come with one sane setup that may want selectively defragment
> > > > some traffic yes and other not.
> > > > 
> > > > Am I missing anything else?
> > > 
> > > I agree. If you don't want defragmentation at all, then make sure you 
> > > don't load the nf_conntrack module directly/indirectly. Conntrack doesn't 
> > > work without defragmentation anyway.
> > 
> > We are using LXC and it's only in the container that holds the external 
> > interface that can't have defragmentation.
> > The problem is if it's loaded you have it in all namespaces :-(
> 
> Conntrack is per net namespaces. You may have one container with conntrack 
> enabled and another one without conntrack.

How do you disable conntrack per netns ?
I can't see how to do it except for NOTRACK
Then the nf_defrag issue is still there...

> 
> Moreover, if you may receive fragments of the same packet at different 
> interfaces in different blades, then you may receive different whole 
> packets of the same flow at different interfaces/blades. But stateful 
> firewalling relies on the assumption that all packets goes through of the 
> firewall. 
True you can't have stateful fw in that stage because of fragments.
> Because it's not assured, conntrack may not run in the 
> containers you denoted as FW LXC.
Thats why I want to disable defrag and conntrack in them

A single flow, with Containers in any Blade.

    +---------------------------+    /              +--------------------+
--> | FW (no CT)frag  HMARK sel |--->---            | Conntrack and IPVS |---->
    +---------------------------+    \              +--------------------+
            \ (fragments)                                  ..
             v 
         +---------------------------+   /                 ..
         |     de-frag  HMARK sel    |----->  
         +---------------------------+   \          +--------------------+
                                                    | Conntrack and IPVS |---->
                                                    +--------------------+

Note that HMARK makes a preselection of which IPVS to use, and directs the flow
to the same IPVS independent of which Blade/interface it arrives on.
i.e. the defrag:ed packed will reach the same IPVS as the others.
                          
-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-05  7:19                     ` Hans Schillstrom
@ 2012-01-05  9:11                       ` Jozsef Kadlecsik
  2012-01-05 14:18                         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2012-01-05  9:11 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Pablo Neira Ayuso, Jan Engelhardt, Patrick McHardy,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org

On Thu, 5 Jan 2012, Hans Schillstrom wrote:

> On Wednesday 04 January 2012 22:40:09 Jozsef Kadlecsik wrote:
> > On Wed, 4 Jan 2012, Hans Schillstrom wrote:
> > 
> > > On Wednesday, January 04, 2012 19:05:10 Jozsef Kadlecsik wrote:
> > > > On Wed, 4 Jan 2012, Pablo Neira Ayuso wrote:
> > > > 
> > > > > On Wed, Jan 04, 2012 at 12:48:35PM +0100, Hans Schillstrom wrote:
> > > > > > I like that idea, an "early" table at prio -500 with PREROUTING.
> > > > > > There is also a need for a new flag "--allfrags"
> > > > > > i.e. all fragments needs to be sorted out and sent to same dest for defrag.
> > > > > > 
> > > > > > ex.
> > > > > > iptables -t early -A PREROUTING -i eth0 --allfrags -j NOTRACK
> > > > > 
> > > > > New tables add too much overhead. We have discussed this before with
> > > > > Patrick.
> > > > > 
> > > > > Since this still remains specific to your needs, I think you can
> > > > > remove nf_conntrack module in your setup.
> > > > > 
> > > > > I don't come with one sane setup that may want selectively defragment
> > > > > some traffic yes and other not.
> > > > > 
> > > > > Am I missing anything else?
> > > > 
> > > > I agree. If you don't want defragmentation at all, then make sure you 
> > > > don't load the nf_conntrack module directly/indirectly. Conntrack doesn't 
> > > > work without defragmentation anyway.
> > > 
> > > We are using LXC and it's only in the container that holds the external 
> > > interface that can't have defragmentation.
> > > The problem is if it's loaded you have it in all namespaces :-(
> > 
> > Conntrack is per net namespaces. You may have one container with conntrack 
> > enabled and another one without conntrack.
> 
> How do you disable conntrack per netns ?
> I can't see how to do it except for NOTRACK
> Then the nf_defrag issue is still there...

OK, I see. Conntrack is per net namespace but it's enabled globally.
 
So at the moment I think the best solution is something like your patch 
variant (but the condition is wrong, it should be "&& !skb->nfct"):

--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -74,6 +74,14 @@ static unsigned int ipv4_conntrack_defrag(unsigned int
hooknum,
...
+       const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
+                                       out : in);
+
+       /* No defrag and not Previously seen (loopback)? */
+       if (dev_net(dev)->ct.sysctl_notrac_defrag && skb->nfct) {
+               /* Attach fake conntrack entry. as in NOTRACK */
+               skb->nfct = &nf_ct_untracked_get()->ct_general;
+               skb->nfctinfo = IP_CT_NEW;
+               nf_conntrack_get(skb->nfct);
+               return NF_ACCEPT;
+       }
...

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-05  9:11                       ` Jozsef Kadlecsik
@ 2012-01-05 14:18                         ` Pablo Neira Ayuso
  2012-01-09  8:58                           ` Hans Schillstrom
  0 siblings, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2012-01-05 14:18 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Hans Schillstrom, Jan Engelhardt, Patrick McHardy,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org

On Thu, Jan 05, 2012 at 10:11:28AM +0100, Jozsef Kadlecsik wrote:
> OK, I see. Conntrack is per net namespace but it's enabled globally.
>  
> So at the moment I think the best solution is something like your patch 
> variant (but the condition is wrong, it should be "&& !skb->nfct"):
> 
> --- a/net/ipv4/netfilter/nf_defrag_ipv4.c
> +++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
> @@ -74,6 +74,14 @@ static unsigned int ipv4_conntrack_defrag(unsigned int
> hooknum,
> ...
> +       const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
> +                                       out : in);
> +
> +       /* No defrag and not Previously seen (loopback)? */
> +       if (dev_net(dev)->ct.sysctl_notrac_defrag && skb->nfct) {
> +               /* Attach fake conntrack entry. as in NOTRACK */
> +               skb->nfct = &nf_ct_untracked_get()->ct_general;
> +               skb->nfctinfo = IP_CT_NEW;
> +               nf_conntrack_get(skb->nfct);
> +               return NF_ACCEPT;
> +       }
> ...

I prefer the sysctl option as well, the new table is too much and it
remains too specific for this.

I wonder if we can conditionally register the sysctl only if we are
inside one lxc container.

I'm telling this because this sysctl does not seem to make any sense
to me outside of it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-05 14:18                         ` Pablo Neira Ayuso
@ 2012-01-09  8:58                           ` Hans Schillstrom
  2012-01-10  3:17                             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Schillstrom @ 2012-01-09  8:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jozsef Kadlecsik, Jan Engelhardt, Patrick McHardy,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org

On Thursday 05 January 2012 15:18:59 Pablo Neira Ayuso wrote:
> On Thu, Jan 05, 2012 at 10:11:28AM +0100, Jozsef Kadlecsik wrote:
> > OK, I see. Conntrack is per net namespace but it's enabled globally.
> >  
> > So at the moment I think the best solution is something like your patch 
> > variant (but the condition is wrong, it should be "&& !skb->nfct"):
> > 

Oops, I'll fix that :-)

> > --- a/net/ipv4/netfilter/nf_defrag_ipv4.c
> > +++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
> > @@ -74,6 +74,14 @@ static unsigned int ipv4_conntrack_defrag(unsigned int
> > hooknum,
> > ...
> > +       const struct net_device *dev = (hooknum == NF_INET_LOCAL_OUT ?
> > +                                       out : in);
> > +
> > +       /* No defrag and not Previously seen (loopback)? */
> > +       if (dev_net(dev)->ct.sysctl_notrac_defrag && skb->nfct) {
> > +               /* Attach fake conntrack entry. as in NOTRACK */
> > +               skb->nfct = &nf_ct_untracked_get()->ct_general;
> > +               skb->nfctinfo = IP_CT_NEW;
> > +               nf_conntrack_get(skb->nfct);
> > +               return NF_ACCEPT;
> > +       }
> > ...
> 
> I prefer the sysctl option as well, the new table is too much and it
> remains too specific for this.
> 
> I wonder if we can conditionally register the sysctl only if we are
> inside one lxc container.
> 
Sure no problem, but the code will not be so nice ... 

> I'm telling this because this sysctl does not seem to make any sense
> to me outside of it.

I'm not so sure that we should make it asymetric,
but it's not a big deal.

Anyway here is a sample of the sysctl in a namespace.
It is the  "if (!net_eq(net, &init_net)) {..." that does the magic

diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 885f5ab..2a0d530 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -454,6 +454,21 @@ static ctl_table nf_ct_sysctl_table[] = {
        },
        { }
 };
+#define NFCT_SYSCTL_LAST \
+       ((sizeof(nf_ct_sysctl_table) / sizeof(struct ctl_table)) - 1)
+/*
+ * Not Visible in root name space (init_net)
+ */
+static ctl_table nf_ct_sysctl_ns_table[] = {
+               {
+                       .procname       = "nf_conntrack_nodefrag",
+                       .data           = &init_net.ct.sysctl_nodefrag,
+                       .maxlen         = sizeof(int),
+                       .mode           = 0644,
+                       .proc_handler   = proc_dointvec,
+               },
+               { }
+};

 #define NET_NF_CONNTRACK_MAX 2089

@@ -483,9 +498,10 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
                if (!nf_ct_netfilter_header)
                        goto out;
        }
+       table = kzalloc(sizeof(nf_ct_sysctl_table) +
+                       sizeof(nf_ct_sysctl_ns_table), GFP_KERNEL);
+       memcpy(table, nf_ct_sysctl_table, sizeof(nf_ct_sysctl_table));

-       table = kmemdup(nf_ct_sysctl_table, sizeof(nf_ct_sysctl_table),
-                       GFP_KERNEL);
        if (!table)
                goto out_kmemdup;

@@ -494,6 +510,12 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
        table[3].data = &net->ct.sysctl_checksum;
        table[4].data = &net->ct.sysctl_log_invalid;

+       if (!net_eq(net, &init_net)) {
+               memcpy(&table[NFCT_SYSCTL_LAST], nf_ct_sysctl_ns_table,
+                      sizeof(nf_ct_sysctl_ns_table));
+               table[NFCT_SYSCTL_LAST].data = &net->ct.sysctl_nodefrag;
+       }
+
        net->ct.sysctl_header = register_net_sysctl_table(net,
                                        nf_net_netfilter_sysctl_path, table);
        if (!net->ct.sysctl_header)
--
1.7.2.3

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns
  2012-01-09  8:58                           ` Hans Schillstrom
@ 2012-01-10  3:17                             ` Pablo Neira Ayuso
  0 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2012-01-10  3:17 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Jozsef Kadlecsik, Jan Engelhardt, Patrick McHardy,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org

Hi Hans,

On Mon, Jan 09, 2012 at 09:58:42AM +0100, Hans Schillstrom wrote:
> > I wonder if we can conditionally register the sysctl only if we are
> > inside one lxc container.
> > 
> Sure no problem, but the code will not be so nice ... 

Indeed, ugly indeed.

> > I'm telling this because this sysctl does not seem to make any sense
> > to me outside of it.
> 
> I'm not so sure that we should make it asymetric,
> but it's not a big deal.
> 
> Anyway here is a sample of the sysctl in a namespace.
> It is the  "if (!net_eq(net, &init_net)) {..." that does the magic

Hm, after having a look at it, I think I prefer to provide some
inconditional sysctl.

Better call it nf_conntrack_enable and set it to 1 by default. AFAICS,
this will be a synonymous of:

iptables -I PREROUTING -t raw -j NOTRACK

This option is disabling conntracking after all. I don't think we
would ever support conntrack with fragments.

Please, send a patch including in the description that we need this
for lxc, I'll enqueue it for net-next unless someone raise the hand
with a better solution.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-01-10  3:17 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-01-04  8:07 [PATCH 1/1] netfilter: Add possibility to turn off netfilters defrag per netns Hans Schillstrom
2012-01-04  8:28 ` Jozsef Kadlecsik
2012-01-04  8:49   ` Hans Schillstrom
2012-01-04  9:03     ` Jozsef Kadlecsik
2012-01-04  9:32       ` Jan Engelhardt
2012-01-04  9:47         ` Hans Schillstrom
2012-01-04 17:23           ` Pablo Neira Ayuso
2012-01-04  9:49         ` Jozsef Kadlecsik
2012-01-04 10:18       ` Hans Schillstrom
2012-01-04 11:17         ` Jan Engelhardt
2012-01-04 11:48           ` Hans Schillstrom
2012-01-04 17:40             ` Pablo Neira Ayuso
2012-01-04 18:05               ` Jozsef Kadlecsik
2012-01-04 20:56                 ` Hans Schillstrom
2012-01-04 21:40                   ` Jozsef Kadlecsik
2012-01-05  7:19                     ` Hans Schillstrom
2012-01-05  9:11                       ` Jozsef Kadlecsik
2012-01-05 14:18                         ` Pablo Neira Ayuso
2012-01-09  8:58                           ` Hans Schillstrom
2012-01-10  3:17                             ` Pablo Neira Ayuso
2012-01-04 20:45               ` Hans Schillstrom
2012-01-04 21:15               ` Hans Schillstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).