Netdev List

* [PATCH] MAINTAINERS
From: jamal @ 2012-05-24 12:45 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

commit 2c2996304c01a7af350c431c0445ae7956c5ff30
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Thu May 24 08:21:02 2012 -0400

    After about two decades, I am giving up on cyberus.
    Nabwaga Manyanga.
    
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

diff --git a/MAINTAINERS b/MAINTAINERS
index d4abe75..a004446 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6605,7 +6605,7 @@ F:	include/linux/taskstats*
 F:	kernel/taskstats.c
 
 TC CLASSIFIER
-M:	Jamal Hadi Salim <hadi@cyberus.ca>
+M:	Jamal Hadi Salim <jhs@mojatatu.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	include/linux/pkt_cls.h

^ permalink raw reply related

* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Gao feng @ 2012-05-24 10:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <20120524095859.GC13091@1984>

On 2012-05-24 17:58, Pablo Neira Ayuso wrote:
> On Thu, May 24, 2012 at 09:35:50AM +0800, Gao feng wrote:
>> Hi pablo:
>>
>> On 2012-05-23 18:12, Pablo Neira Ayuso wrote:
>>> On Mon, May 14, 2012 at 04:52:11PM +0800, Gao feng wrote:
>>>> From: Gao feng <gaofeng@cn.fujitus.com>
>>>>
>>>> the struct nf_proto_net stores proto's ctl_table_header and ctl_table,
>>>> nf_ct_l4proto_(un)register_sysctl use it to register sysctl.
>>>>
>>>> there are some changes for struct nf_conntrack_l4proto:
>>>> - add field compat to identify if this proto should do compat.
>>>> - the net_id field is used to store the pernet_operations id
>>>>   that belongs to l4proto.
>>>> - init_net will be used to initialize the proto's pernet data
>>>>
>>>> and add init_net for struct nf_conntrack_l3proto too.
>>>
>>> This patchset looks better but there are still things that we have to
>>> resolve.
>>>
>>> The first one (regarding this patch 1/17) changes in:
>>> * include/net/netfilter/nf_conntrack_l4proto.h
>>> * include/net/netns/conntrack.h
>>>
>>> should be included in:
>>> [PATCH] netfilter: add namespace support for l4proto
>>>
>>> And changes in:
>>> * include/net/netfilter/nf_conntrack_l3proto.h
>>>
>>> should be included in:
>>> [PATCH] netfilter: add namespace support for l3proto
>>>
>>> I already told you. A patch that adds a structure without using it
>>> is not good. The structure has to go together with the code that uses it.
>>>
>>
>> It seems this patch should be merged into "netfilter: add namespace support for l4proto",
>> since struct nf_proto_net is first used there.
>>
>>> More comments below.
>>>
>>>> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
>>>> Signed-off-by: Gao feng <gaofeng@cn.fujitus.com>
>>>> ---
>>>>  include/net/netfilter/nf_conntrack_l3proto.h |    3 +++
>>>>  include/net/netfilter/nf_conntrack_l4proto.h |    6 ++++++
>>>>  include/net/netns/conntrack.h                |   12 ++++++++++++
>>>>  3 files changed, 21 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/net/netfilter/nf_conntrack_l3proto.h b/include/net/netfilter/nf_conntrack_l3proto.h
>>>> index 9699c02..9766005 100644
>>>> --- a/include/net/netfilter/nf_conntrack_l3proto.h
>>>> +++ b/include/net/netfilter/nf_conntrack_l3proto.h
>>>> @@ -69,6 +69,9 @@ struct nf_conntrack_l3proto {
>>>>  	struct ctl_table	*ctl_table;
>>>>  #endif /* CONFIG_SYSCTL */
>>>>  
>>>> +	/* Init l3proto pernet data */
>>>> +	int (*init_net)(struct net *net);
>>>> +
>>>>  	/* Module (if any) which this is connected to. */
>>>>  	struct module *me;
>>>>  };
>>>> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
>>>> index 3b572bb..a90eab5 100644
>>>> --- a/include/net/netfilter/nf_conntrack_l4proto.h
>>>> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
>>>> @@ -22,6 +22,8 @@ struct nf_conntrack_l4proto {
>>>>  	/* L4 Protocol number. */
>>>>  	u_int8_t l4proto;
>>>>  
>>>> +	u_int8_t compat;
>>>
>>> I don't see why we need this new field.
>>>
>>> It seems to be set to 1 in each structure that has set:
>>>
>>> .ctl_compat_table
>>>
>>> to non-NULL. So, it's redundant.
>>>
>>> Moreover, you already know from the protocol tracker itself if you
>>> have to allocate the compat ctl table or not.
>>>
>>> In other words: You set compat to 1 for nf_conntrack_l4proto_generic.
>>> Then, you pass that compat value to generic_init_net via ->init_net
>>> again, but this information (that determines if the compat has to be
>>> done or not) is already in the scope of the protocol tracker.
>>>
>>
>> Because some protocols, such as l4proto_tcp6 and l4proto_tcp, use the same init_net
>> function. l4proto_tcp6 doesn't need the compat sysctl, so we should use this new
>> field to decide whether to kmemdup the compat_sysctl_table.
> 
> Then, could you use two init_net functions? one for TCP for IPv4 and another
> for TCP for IPv6?

Of course, if you prefer to implement it this way.
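
The alternative Pablo suggests can be sketched in user-space C as below. This is only an illustration under stated assumptions: the `*_sketch` types, the `dup_table()` helper (standing in for kmemdup()) and the three function names are hypothetical and are not taken from the patch series. It shows two per-family init functions sharing a common helper, so that no `compat` field is needed in the protocol structure.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ctl_table_sketch {
	const char *procname;
};

struct proto_net_sketch {
	struct ctl_table_sketch *ctl_table;        /* per-net sysctl table */
	struct ctl_table_sketch *ctl_compat_table; /* per-net compat table, NULL if unused */
};

/* Illustrative templates; each table ends with an empty entry. */
static const struct ctl_table_sketch tcp_table[] = {
	{ "nf_conntrack_tcp_timeout_established" }, { NULL }
};
static const struct ctl_table_sketch tcp_compat_table[] = {
	{ "ip_conntrack_tcp_timeout_established" }, { NULL }
};

/* User-space analogue of kmemdup(): copy a template table. */
static struct ctl_table_sketch *dup_table(const struct ctl_table_sketch *src,
					  size_t len)
{
	struct ctl_table_sketch *copy = malloc(len);

	if (copy)
		memcpy(copy, src, len);
	return copy;
}

/* Work shared by both address families. */
static int tcp_init_net_common(struct proto_net_sketch *pn)
{
	pn->ctl_compat_table = NULL;
	pn->ctl_table = dup_table(tcp_table, sizeof(tcp_table));
	return pn->ctl_table ? 0 : -ENOMEM;
}

/* TCP over IPv4: also needs the compat sysctl table. */
int tcpv4_init_net(struct proto_net_sketch *pn)
{
	int err = tcp_init_net_common(pn);

	if (err)
		return err;
	pn->ctl_compat_table = dup_table(tcp_compat_table,
					 sizeof(tcp_compat_table));
	if (!pn->ctl_compat_table) {
		free(pn->ctl_table);
		pn->ctl_table = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* TCP over IPv6: no compat table, so the common part is enough. */
int tcpv6_init_net(struct proto_net_sketch *pn)
{
	return tcp_init_net_common(pn);
}
```

With this split, the decision about the compat table lives in the function that is wired to each protocol tracker, rather than in a flag that has to be passed back in.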

^ permalink raw reply

* Re: [PATCH] xen/netback: calculate correctly the SKB slots.
From: Ian Campbell @ 2012-05-24 11:12 UTC (permalink / raw)
  To: Adnan Misherfi
  Cc: Konrad Rzeszutek Wilk, Ben Hutchings,
	xen-devel@lists.xensource.com, netdev@vger.kernel.org,
	davem@davemloft.net, linux-kernel@vger.kernel.org
In-Reply-To: <4FBBE7D2.9040105@oracle.com>

On Tue, 2012-05-22 at 20:24 +0100, Adnan Misherfi wrote:
> 
> Konrad Rzeszutek Wilk wrote:
> >>>> wrong, which caused the RX ring to be erroneously declared full,
> >>>> and the receive queue to be stopped. The problem shows up when two
> >>>> guests running on the same server try to communicate using large
> >>>>         
> > .. snip..
> >   
> >>> The function name is xen_netbk_count_skb_slots() in net-next.  This
> >>> appears to depend on the series in
> >>> <http://lists.xen.org/archives/html/xen-devel/2012-01/msg00982.html>.
> >>>       
> >> Yes, I don't think that patchset was intended for prime time just yet.
> >> Can this issue be reproduced without it?
> >>     
> >
> > It was based on 3.4, but the bug and work to fix this was  done on top of
> > a 3.4 version of netback backported in a 3.0 kernel. Let me double check
> > whether there were some missing patches.
> >
> >   
> >>>>  	int i, copy_off;
> >>>>  
> >>>>  	count = DIV_ROUND_UP(
> >>>> -			offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE);
> >>>> +			offset_in_page(skb->data + skb_headlen(skb)), PAGE_SIZE);
> >>>>         
> >>> The new version would be equivalent to:
> >>> 	count = offset_in_page(skb->data + skb_headlen(skb)) != 0;
> >>> which is not right, as netbk_gop_skb() will use one slot per page.
> >>>       
> >> Just outside the context of this patch we separately count the frag
> >> pages.
> >>
> >> However I think you are right if skb->data covers > 1 page, since the
> >> new version can only ever return 0 or 1. I expect this patch papers over
> >> the underlying issue by not stopping often enough, rather than actually
> >> fixing the underlying issue.
> >>     
> >
> > Ah, any thoughts? Have you guys seen this behavior as well?
> >   
> >>> The real problem is likely that you're not using the same condition to
> >>> stop and wake the queue.
> >>>       
> >> Agreed, it would be useful to see the argument for this patch presented
> >> in that light. In particular the relationship between
> >> xenvif_rx_schedulable() (used to wake queue) and
> >> xen_netbk_must_stop_queue() (used to stop queue).
> >>     
> >
> > Do you have any debug patches to ... do open-heart surgery on the
> > rings of netback as it's hitting the issues Adnan has found?
> >
> >   
> >> As it stands the description describes a setup which can repro the
> >> problem but doesn't really analyse what actually happens, nor justify
> >> the correctness of the fix.
> >>     
> >
> > Hm, Adnan - you dug in to this and you got tons of notes. Could you
> > describe what you saw that caused this?
> >   
> The problem is that xen_netbk_count_skb_slots() returns two different
> counts for packets of the same type and size (ICMP, 3991 bytes). At the
> start of the test the count is one; later on it changes to two. Soon after
> the count becomes two, the ring-full condition becomes true, the queue is
> stopped, and it never gets started again. There are a few points to make
> here:
> 1- It takes fewer than 128 ping packets to reproduce this.
> 2- Interestingly, it works correctly for many packet sizes, including
> 1500, 400, 500, 9000 (3990, but not 3991).
> 3- The count is inconsistent for the same packet size and type.
> 4- I do not believe the ring was actually full when it was declared
> full; I think the consumer pointer was wrong (vif->rx_req_cons_peek in
> function xenvif_start_xmit()).
> 5- After changing the code, the count returned from
> xen_netbk_count_skb_slots() was always consistent and everything worked
> fine. I let it run for at least 12 hours.

That doesn't really explain why you think your fix is correct though,
which is what I was asking for.

In any case, does Simon's patch also fix things for you? As far as I can
tell that is the right fix.

Ian.
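
The arithmetic behind the objection to the patch can be checked with a small user-space sketch. It assumes a 4096-byte page and reimplements `offset_in_page()` and `DIV_ROUND_UP()` locally; the two function names are hypothetical and only mirror the two expressions from the quoted hunk.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Offset of an address within its page (assumes PAGE_SIZE is a power of two). */
static unsigned long offset_in_page(unsigned long addr)
{
	return addr & (PAGE_SIZE - 1);
}

/* Original expression: number of pages spanned by [data, data + headlen). */
unsigned long slots_original(unsigned long data, unsigned long headlen)
{
	return DIV_ROUND_UP(offset_in_page(data) + headlen, PAGE_SIZE);
}

/* Patched expression: only looks at where the linear area ends,
 * so it can never return more than 1. */
unsigned long slots_patched(unsigned long data, unsigned long headlen)
{
	return DIV_ROUND_UP(offset_in_page(data + headlen), PAGE_SIZE);
}
```

For a 3991-byte linear area, `slots_original()` gives 1 when the data starts on a page boundary and 2 when it starts 256 bytes into a page, because the buffer then crosses into a second page. That is one plausible reading of the "inconsistent" counts reported above. `slots_patched()` gives 1 in both cases, gives 1 for a 9000-byte area that really spans three pages, and gives 0 when the area ends exactly on a page boundary.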

^ permalink raw reply

* Re: [PATCH 04/17] netfilter: add namespace support for l4proto_generic
From: Gao feng @ 2012-05-24 11:07 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <20120524095222.GA13091@1984>

On 2012-05-24 17:52, Pablo Neira Ayuso wrote:
> On Thu, May 24, 2012 at 09:13:36AM +0800, Gao feng wrote:
>> On 2012-05-23 18:32, Pablo Neira Ayuso wrote:
>>> On Mon, May 14, 2012 at 04:52:14PM +0800, Gao feng wrote:
>>>> implement and export nf_conntrack_proto_generic_[init,fini],
>>>> nf_conntrack_[init,cleanup]_net call them to register or unregister
>>>> the sysctl of generic proto.
>>>>
>>>> implement generic_net_init, which is used to initialize the pernet
>>>> data for generic proto.
>>>>
>>>> and use nf_generic_net.timeout to replace nf_ct_generic_timeout in
>>>> get_timeouts function.
>>>>
>>>> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
>>>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>>>> ---
>>>>  include/net/netfilter/nf_conntrack_l4proto.h |    2 +
>>>>  include/net/netns/conntrack.h                |    6 +++
>>>>  net/netfilter/nf_conntrack_core.c            |    8 +++-
>>>>  net/netfilter/nf_conntrack_proto.c           |   21 +++++-----
>>>>  net/netfilter/nf_conntrack_proto_generic.c   |   55 ++++++++++++++++++++++++-
>>>>  5 files changed, 76 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
>>>> index a93dcd5..0d329b9 100644
>>>> --- a/include/net/netfilter/nf_conntrack_l4proto.h
>>>> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
>>>> @@ -118,6 +118,8 @@ struct nf_conntrack_l4proto {
>>>>  
>>>>  /* Existing built-in generic protocol */
>>>>  extern struct nf_conntrack_l4proto nf_conntrack_l4proto_generic;
>>>> +extern int nf_conntrack_proto_generic_init(struct net *net);
>>>> +extern void nf_conntrack_proto_generic_fini(struct net *net);
>>>>  
>>>>  #define MAX_NF_CT_PROTO 256
>>>>  
>>>> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
>>>> index 94992e9..3381b80 100644
>>>> --- a/include/net/netns/conntrack.h
>>>> +++ b/include/net/netns/conntrack.h
>>>> @@ -20,7 +20,13 @@ struct nf_proto_net {
>>>>  	unsigned int		users;
>>>>  };
>>>>  
>>>> +struct nf_generic_net {
>>>> +	struct nf_proto_net pn;
>>>> +	unsigned int timeout;
>>>> +};
>>>> +
>>>>  struct nf_ip_net {
>>>> +	struct nf_generic_net   generic;
>>>>  #if defined(CONFIG_SYSCTL) && defined(CONFIG_NF_CONNTRACK_PROC_COMPAT)
>>>>  	struct ctl_table_header *ctl_table_header;
>>>>  	struct ctl_table	*ctl_table;
>>>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>>>> index 32c5909..fd33e91 100644
>>>> --- a/net/netfilter/nf_conntrack_core.c
>>>> +++ b/net/netfilter/nf_conntrack_core.c
>>>> @@ -1353,6 +1353,7 @@ static void nf_conntrack_cleanup_net(struct net *net)
>>>>  	}
>>>>  
>>>>  	nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
>>>> +	nf_conntrack_proto_generic_fini(net);
>>>>  	nf_conntrack_helper_fini(net);
>>>>  	nf_conntrack_timeout_fini(net);
>>>>  	nf_conntrack_ecache_fini(net);
>>>> @@ -1586,9 +1587,12 @@ static int nf_conntrack_init_net(struct net *net)
>>>>  	ret = nf_conntrack_helper_init(net);
>>>>  	if (ret < 0)
>>>>  		goto err_helper;
>>>> -
>>>> +	ret = nf_conntrack_proto_generic_init(net);
>>>> +	if (ret < 0)
>>>> +		goto err_generic;
>>>>  	return 0;
>>>> -
>>>> +err_generic:
>>>> +	nf_conntrack_helper_fini(net);
>>>>  err_helper:
>>>>  	nf_conntrack_timeout_fini(net);
>>>>  err_timeout:
>>>> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
>>>> index 7ee6653..9b4bf6d 100644
>>>> --- a/net/netfilter/nf_conntrack_proto.c
>>>> +++ b/net/netfilter/nf_conntrack_proto.c
>>>> @@ -287,10 +287,16 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l3proto_unregister);
>>>>  static struct nf_proto_net *nf_ct_l4proto_net(struct net *net,
>>>>  					      struct nf_conntrack_l4proto *l4proto)
>>>>  {
>>>> -	if (l4proto->net_id)
>>>> -		return net_generic(net, *l4proto->net_id);
>>>> -	else
>>>> -		return NULL;
>>>> +	switch (l4proto->l4proto) {
>>>> +	case 255: /* l4proto_generic */
>>>> +		return (struct nf_proto_net *)&net->ct.proto.generic;
>>>> +	default:
>>>> +		if (l4proto->net_id)
>>>> +			return net_generic(net, *l4proto->net_id);
>>>> +		else
>>>> +			return NULL;
>>>> +	}
>>>> +	return NULL;
>>>>  }
>>>>  
>>>>  int nf_ct_l4proto_register_sysctl(struct net *net,
>>>> @@ -457,11 +463,6 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_unregister);
>>>>  int nf_conntrack_proto_init(void)
>>>>  {
>>>>  	unsigned int i;
>>>> -	int err;
>>>> -
>>>> -	err = nf_ct_l4proto_register_sysctl(&init_net, &nf_conntrack_l4proto_generic);
>>>> -	if (err < 0)
>>>> -		return err;
>>>
>>> I like that all protocols sysctl are registered by
>>> nf_conntrack_proto_init. Can you keep using that?
>>
>> you mean per-net's generic_proto sysctl are registered by
>> nf_conntrack_proto_init?
>>
>> such as
>>
>> int nf_conntrack_proto_init(struct net *net)
>> {
>> 	...
>> 	err = nf_ct_l4proto_register_sysctl(net, &nf_conntrack_l4proto_generic);
> 
> Yes, all protocol trackers included in nf_conntrack_proto_init:
> 
>         err = nf_conntrack_proto_generic_init(net);
>         ...
>         err = nf_conntrack_proto_tcp_init(net);
>         ...
> 
> and so on.

Sounds good, but the l4protos other than l4proto_generic are enabled by
loading modules (such as nf_conntrack_ipv4 or nf_conntrack_proto_udplite).

So I think it makes no sense to init all protocols here, unless we decide
to put those protos into the nf_conntrack module.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 03/17] netfilter: add namespace support for l3proto
From: Gao feng @ 2012-05-24 10:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <20120524100412.GE13091@1984>

On 2012-05-24 18:04, Pablo Neira Ayuso wrote:
> On Thu, May 24, 2012 at 09:58:02AM +0800, Gao feng wrote:
>> On 2012-05-23 18:29, Pablo Neira Ayuso wrote:
>>> On Mon, May 14, 2012 at 04:52:13PM +0800, Gao feng wrote:
> [...]
>>>> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
>>>> index 6d68727..7ee6653 100644
>>>> --- a/net/netfilter/nf_conntrack_proto.c
>>>> +++ b/net/netfilter/nf_conntrack_proto.c
>>>> @@ -170,85 +170,116 @@ static int kill_l4proto(struct nf_conn *i, void *data)
>>>>  	       nf_ct_l3num(i) == l4proto->l3proto;
>>>>  }
>>>>  
>>>> -static int nf_ct_l3proto_register_sysctl(struct nf_conntrack_l3proto *l3proto)
>>>> +static struct nf_ip_net *nf_ct_l3proto_net(struct net *net,
>>>> +					   struct nf_conntrack_l3proto *l3proto)
>>>> +{
>>>> +	if (l3proto->l3proto == PF_INET)
>>>> +		return &net->ct.proto;
>>>> +	else
>>>> +		return NULL;
>>>> +}
>>>> +
>>>> +static int nf_ct_l3proto_register_sysctl(struct net *net,
>>>> +					 struct nf_conntrack_l3proto *l3proto)
>>>>  {
>>>>  	int err = 0;
>>>> +	struct nf_ip_net *in = nf_ct_l3proto_net(net, l3proto);
>>>>  
>>>> -#ifdef CONFIG_SYSCTL
>>>> -	if (l3proto->ctl_table != NULL) {
>>>> -		err = nf_ct_register_sysctl(&init_net,
>>>> -					    &l3proto->ctl_table_header,
>>>> +	if (in == NULL)
>>>> +		return 0;
>>>
>>> Under what circumstances can 'in' be NULL?
>>
>> Because l3proto_ipv6 doesn't need sysctl, l3proto_ipv6's nf_ip_net is NULL;
>> please see function nf_ct_l3proto_net above.
> 
> Then, please add a comment there to explain that some per-net protocol
> information may be missing since no sysctl is supported.

Yes, I will add a comment to make it clearer ;)

^ permalink raw reply

* Re: [v4 PATCH 1/1] netfilter: Add fail-open support
From: Krishna Kumar2 @ 2012-05-24 10:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: davem, Eric Dumazet, fw, kaber, netdev, netfilter-devel, sri,
	Sulakshan Vajipayajula, vivk
In-Reply-To: <20120524104156.GA13785@1984>

Pablo Neira Ayuso <pablo@netfilter.org> wrote on 05/24/2012 04:11:56 PM:

> On Thu, May 24, 2012 at 12:17:55PM +0200, Pablo Neira Ayuso wrote:
> > My main objection with this patch is that it adds more code out of the
> > scope of the nf_queue handling to nf_hook_slow. And this is done for
> > a very specific purpose.
> >
> > @David, @Eric: Krishna aims to provide a mechanism that can be enabled
> > to accept packets if the nfqueue becomes full, ie. it changes the
> > default behaviour under congestion from drop to accept. It seems some
> > users prefer not to block traffic under nfqueue congestion.
>
> Florian Westphal just proposed a possibly interesting solution for
> this.

Yes, and I have just finished testing this and it works fine. With
this, all the changes are localized to nfnetlink_queue.c. I am doing
some more tests before resubmitting this.

thanks,
- KK

^ permalink raw reply

* Re: [v4 PATCH 1/1] netfilter: Add fail-open support
From: Pablo Neira Ayuso @ 2012-05-24 10:41 UTC (permalink / raw)
  To: Krishna Kumar
  Cc: kaber, vivk, svajipay, fw, netfilter-devel, sri, Eric Dumazet,
	davem, netdev
In-Reply-To: <20120524101755.GF13091@1984>

On Thu, May 24, 2012 at 12:17:55PM +0200, Pablo Neira Ayuso wrote:
> My main objection with this patch is that it adds more code out of the
> scope of the nf_queue handling to nf_hook_slow. And this is done for
> a very specific purpose.
> 
> @David, @Eric: Krishna aims to provide a mechanism that can be enabled
> to accept packets if the nfqueue becomes full, ie. it changes the
> default behaviour under congestion from drop to accept. It seems some
> users prefer not to block traffic under nfqueue congestion.

Florian Westphal just proposed a possibly interesting solution for
this.

^ permalink raw reply

* Re: [v4 PATCH 1/1] netfilter: Add fail-open support
From: Pablo Neira Ayuso @ 2012-05-24 10:17 UTC (permalink / raw)
  To: Krishna Kumar
  Cc: kaber, vivk, svajipay, fw, netfilter-devel, sri, Eric Dumazet,
	davem, netdev
In-Reply-To: <20120524082531.13146.347.sendpatchset@localhost.localdomain>

My main objection with this patch is that it adds more code out of the
scope of the nf_queue handling to nf_hook_slow. And this is done for
a very specific purpose.

@David, @Eric: Krishna aims to provide a mechanism that can be enabled
to accept packets if the nfqueue becomes full, ie. it changes the
default behaviour under congestion from drop to accept. It seems some
users prefer not to block traffic under nfqueue congestion.

The problem is the GSO handling: If we start enqueueing segments and
the queue gets full, we've got a list with the remaining segments that
need to be accepted. The current approach to handle this situation
does not look very nice. Do you have any suggestion for this?

Thanks!

Patch is below, in case you want to have a look at it.

On Thu, May 24, 2012 at 01:55:31PM +0530, Krishna Kumar wrote:
> Implement a new "fail-open" mode where packets are not dropped
> upon queue-full condition. This mode can be enabled/disabled per
> queue using netlink NFQA_CFG_FLAGS & NFQA_CFG_MASK attributes.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> Signed-off-by: Vivek Kashyap <vivk@us.ibm.com>
> Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com>
> ---
>  include/linux/netfilter/nfnetlink_queue.h |    5 ++
>  net/netfilter/core.c                      |   37 +++++++++++++++++++-
>  net/netfilter/nf_queue.c                  |   15 ++++++--
>  net/netfilter/nfnetlink_queue.c           |   36 +++++++++++++++++--
>  4 files changed, 86 insertions(+), 7 deletions(-)
> 
> diff -ruNp org/include/linux/netfilter/nfnetlink_queue.h new/include/linux/netfilter/nfnetlink_queue.h
> --- org/include/linux/netfilter/nfnetlink_queue.h	2012-05-23 09:52:54.738660685 +0530
> +++ new/include/linux/netfilter/nfnetlink_queue.h	2012-05-24 10:25:33.500073415 +0530
> @@ -84,8 +84,13 @@ enum nfqnl_attr_config {
>  	NFQA_CFG_CMD,			/* nfqnl_msg_config_cmd */
>  	NFQA_CFG_PARAMS,		/* nfqnl_msg_config_params */
>  	NFQA_CFG_QUEUE_MAXLEN,		/* __u32 */
> +	NFQA_CFG_MASK,			/* identify which flags to change */
> +	NFQA_CFG_FLAGS,			/* value of these flags (__be32) */
>  	__NFQA_CFG_MAX
>  };
>  #define NFQA_CFG_MAX (__NFQA_CFG_MAX-1)
>  
> +/* Flags for NFQA_CFG_FLAGS */
> +#define NFQA_CFG_F_FAIL_OPEN			(1 << 0)
> +
>  #endif /* _NFNETLINK_QUEUE_H */
> diff -ruNp org/net/netfilter/core.c new/net/netfilter/core.c
> --- org/net/netfilter/core.c	2012-05-23 09:52:54.740660556 +0530
> +++ new/net/netfilter/core.c	2012-05-24 11:35:55.958845493 +0530
> @@ -163,6 +163,31 @@ repeat:
>  	return NF_ACCEPT;
>  }
>  
> +/*
> + * Handler was not able to enqueue the packet, and returned ENOSPC
> + * as "fail-open" was enabled. We temporarily accept the skb; or
> + * each segment for a GSO skb and free the header.
> + */
> +static void handle_fail_open(struct sk_buff *skb,
> +			     int (*okfn)(struct sk_buff *))
> +{
> +	struct sk_buff *segs;
> +	bool gso;
> +
> +	segs = skb->next ? : skb;
> +	gso = skb->next != NULL;
> +
> +	do {
> +		struct sk_buff *nskb = segs->next;
> +
> +		segs->next = NULL;
> +		okfn(segs);
> +		segs = nskb;
> +	} while (segs);
> +
> +	if (gso)
> +		kfree_skb(skb);
> +}
>  
>  /* Returns 1 if okfn() needs to be executed by the caller,
>   * -EPERM for NF_DROP, 0 otherwise. */
> @@ -174,6 +199,7 @@ int nf_hook_slow(u_int8_t pf, unsigned i
>  {
>  	struct list_head *elem;
>  	unsigned int verdict;
> +	int failopen = 0;
>  	int ret = 0;
>  
>  	/* We may already have this, but read-locks nest anyway */
> @@ -184,7 +210,8 @@ next_hook:
>  	verdict = nf_iterate(&nf_hooks[pf][hook], skb, hook, indev,
>  			     outdev, &elem, okfn, hook_thresh);
>  	if (verdict == NF_ACCEPT || verdict == NF_STOP) {
> -		ret = 1;
> +		if (!failopen) /* don't use the default verdict if 'failopen' */
> +			ret = 1;
>  	} else if ((verdict & NF_VERDICT_MASK) == NF_DROP) {
>  		kfree_skb(skb);
>  		ret = NF_DROP_GETERR(verdict);
> @@ -199,10 +226,18 @@ next_hook:
>  			if (err == -ESRCH &&
>  			   (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
>  				goto next_hook;
> +			if (err == -ENOSPC) {
> +				failopen = 1;
> +				goto next_hook;
> +			}
>  			kfree_skb(skb);
>  		}
>  	}
>  	rcu_read_unlock();
> +
> +	if (!ret && failopen)
> +		handle_fail_open(skb, okfn);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL(nf_hook_slow);
> diff -ruNp org/net/netfilter/nfnetlink_queue.c new/net/netfilter/nfnetlink_queue.c
> --- org/net/netfilter/nfnetlink_queue.c	2012-05-23 09:52:54.742661899 +0530
> +++ new/net/netfilter/nfnetlink_queue.c	2012-05-24 13:42:24.155860334 +0530
> @@ -52,6 +52,7 @@ struct nfqnl_instance {
>  
>  	u_int16_t queue_num;			/* number of this queue */
>  	u_int8_t copy_mode;
> +	u_int32_t flags;			/* Set using NFQA_CFG_FLAGS */
>  /*
>   * Following fields are dirtied for each queued packet,
>   * keep them in same cache line if possible.
> @@ -431,9 +432,13 @@ nfqnl_enqueue_packet(struct nf_queue_ent
>  		goto err_out_free_nskb;
>  	}
>  	if (queue->queue_total >= queue->queue_maxlen) {
> -		queue->queue_dropped++;
> -		net_warn_ratelimited("nf_queue: full at %d entries, dropping packets(s)\n",
> -				     queue->queue_total);
> +		if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
> +			err = -ENOSPC;
> +		} else {
> +			queue->queue_dropped++;
> +			net_warn_ratelimited("nf_queue: full at %d entries, dropping packets(s)\n",
> +					     queue->queue_total);
> +		}
>  		goto err_out_free_nskb;
>  	}
>  	entry->id = ++queue->id_sequence;
> @@ -858,6 +863,31 @@ nfqnl_recv_config(struct sock *ctnl, str
>  		spin_unlock_bh(&queue->lock);
>  	}
>  
> +	if (nfqa[NFQA_CFG_FLAGS]) {
> +		__be32 flags, mask;
> +
> +		if (!queue) {
> +			ret = -ENODEV;
> +			goto err_out_unlock;
> +		}
> +
> +		if (!nfqa[NFQA_CFG_MASK]) {
> +			/* A mask is needed to specify which flags are being
> +			 * changed.
> +			 */
> +			ret = -EINVAL;
> +			goto err_out_unlock;
> +		}
> +
> +		flags = ntohl(nla_get_be32(nfqa[NFQA_CFG_FLAGS]));
> +		mask = ntohl(nla_get_be32(nfqa[NFQA_CFG_MASK]));
> +
> +		spin_lock_bh(&queue->lock);
> +		queue->flags &= ~mask;
> +		queue->flags |= flags & mask;
> +		spin_unlock_bh(&queue->lock);
> +	}
> +
>  err_out_unlock:
>  	rcu_read_unlock();
>  	return ret;
> diff -ruNp org/net/netfilter/nf_queue.c new/net/netfilter/nf_queue.c
> --- org/net/netfilter/nf_queue.c	2012-05-23 09:52:54.739533744 +0530
> +++ new/net/netfilter/nf_queue.c	2012-05-24 11:34:46.302003629 +0530
> @@ -268,14 +268,23 @@ int nf_queue(struct sk_buff *skb,
>  			err = __nf_queue(segs, elem, pf, hook, indev,
>  					   outdev, okfn, queuenum);
>  		}
> -		if (err == 0)
> +
> +		if (err == 0) {
>  			queued++;
> -		else
> +		} else if (err == -ENOSPC) {
> +			/* Enqueue failed due to queue-full and handler is
> +			 * in "fail-open" mode.
> +			 */
> +			segs->next = nskb;
> +			skb->next = segs;
> +			break;
> +		} else {
>  			kfree_skb(segs);
> +		}
>  		segs = nskb;
>  	} while (segs);
>  
> -	if (queued) {
> +	if (queued && err != -ENOSPC) {
>  		kfree_skb(skb);
>  		return 0;
>  	}
> 
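
The flags/mask update at the end of the nfqnl_recv_config() hunk above can be illustrated in isolation. This is a minimal user-space sketch, not the kernel code: `apply_cfg_flags()` is a hypothetical helper, `SKETCH_F_FAIL_OPEN` mirrors the bit value of NFQA_CFG_F_FAIL_OPEN from the patch, and `SKETCH_F_OTHER` is an invented second flag used only to show that bits outside the mask are left alone.

```c
#include <stdint.h>

#define SKETCH_F_FAIL_OPEN (1u << 0) /* same bit as NFQA_CFG_F_FAIL_OPEN in the patch */
#define SKETCH_F_OTHER     (1u << 1) /* hypothetical flag, for illustration only */

/*
 * Change only the bits selected by 'mask': clear them in the current
 * value, then set them from 'flags'. All values are in host byte order.
 */
uint32_t apply_cfg_flags(uint32_t current, uint32_t flags, uint32_t mask)
{
	current &= ~mask;
	current |= flags & mask;
	return current;
}
```

For example, `apply_cfg_flags(SKETCH_F_FAIL_OPEN | SKETCH_F_OTHER, 0, SKETCH_F_FAIL_OPEN)` clears fail-open while leaving the other flag set, which is why the patch rejects a flags attribute that arrives without a mask.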

^ permalink raw reply

* Re: tc filter u32 match
From: Jamal Hadi Salim @ 2012-05-24 10:04 UTC (permalink / raw)
  To: adam.niescierowicz; +Cc: netdev
In-Reply-To: <32a6182e71dd565206cf39d4cad3f984@justnet.pl>

On Tue, 2012-05-22 at 15:42 +0200, Nieścierowicz Adam wrote:
> Hello,
> 
> I'm in the process of building a new shaper, when adding support for 
> 802.1q
> vlan noticed that u32 can catch network traffic without giving 4 bytes
> offset. How is this possible?
> 

Because we look at where the network header starts?
Why do you expect 4 bytes to be counted? 

> My environment:
> 
> eth2 - network card
> eth2.200 - vlan
> 
> /sbin/tc filter add dev eth2 parent 1:0 prio 5 handle 35: protocol ip 
> u32 divisor 256
> /sbin/tc filter add dev eth2 protocol ip parent 1:0 prio 5 u32 ht 800:: 
> match ip dst 31.41.208.32/27 hashkey mask 0x000000ff at 16 link 35:
> /sbin/tc filter add dev eth2 protocol ip parent 1: prio 1 u32 ht 35:24: 
> match ip dst 31.41.208.36 flowid 1:2e5
> 
> Here you can see the hits in the rule
> filter parent 1: protocol ip pref 5 u32 fh 35:24:800 order 2048 key ht 
> 35 bkt 24 flowid 1:2e5  (rule hit 44037 success 44037)
>    match 1f29d024/ffffffff at 16 (success 44037 )

I don't see an issue. This looks correct.

> 
> I found a similar question here 
> http://serverfault.com/questions/370795/tc-u32-how-to-match-l2-protocols-in-recent-kernels
> 

There may have been bugs in the past that someone missed or didn't
report here (likely around the time there were a lot of changes
happening with vlan offloading). Try the latest kernel and
if it behaves badly, send a report and a reproducible test case.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH 03/17] netfilter: add namespace support for l3proto
From: Pablo Neira Ayuso @ 2012-05-24 10:04 UTC (permalink / raw)
  To: Gao feng; +Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <4FBD95AA.8070301@cn.fujitsu.com>

On Thu, May 24, 2012 at 09:58:02AM +0800, Gao feng wrote:
> On 2012-05-23 18:29, Pablo Neira Ayuso wrote:
> > On Mon, May 14, 2012 at 04:52:13PM +0800, Gao feng wrote:
[...]
> >> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
> >> index 6d68727..7ee6653 100644
> >> --- a/net/netfilter/nf_conntrack_proto.c
> >> +++ b/net/netfilter/nf_conntrack_proto.c
> >> @@ -170,85 +170,116 @@ static int kill_l4proto(struct nf_conn *i, void *data)
> >>  	       nf_ct_l3num(i) == l4proto->l3proto;
> >>  }
> >>  
> >> -static int nf_ct_l3proto_register_sysctl(struct nf_conntrack_l3proto *l3proto)
> >> +static struct nf_ip_net *nf_ct_l3proto_net(struct net *net,
> >> +					   struct nf_conntrack_l3proto *l3proto)
> >> +{
> >> +	if (l3proto->l3proto == PF_INET)
> >> +		return &net->ct.proto;
> >> +	else
> >> +		return NULL;
> >> +}
> >> +
> >> +static int nf_ct_l3proto_register_sysctl(struct net *net,
> >> +					 struct nf_conntrack_l3proto *l3proto)
> >>  {
> >>  	int err = 0;
> >> +	struct nf_ip_net *in = nf_ct_l3proto_net(net, l3proto);
> >>  
> >> -#ifdef CONFIG_SYSCTL
> >> -	if (l3proto->ctl_table != NULL) {
> >> -		err = nf_ct_register_sysctl(&init_net,
> >> -					    &l3proto->ctl_table_header,
> >> +	if (in == NULL)
> >> +		return 0;
> > 
> > Under what circumstances can 'in' be NULL?
> 
> Because l3proto_ipv6 doesn't need sysctl, l3proto_ipv6's nf_ip_net is NULL;
> please see function nf_ct_l3proto_net above.

Then, please add a comment there to explain that some per-net protocol
information may be missing since no sysctl is supported.
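
A sketch of what such a comment might look like, using simplified user-space stand-ins. All names ending in `_sketch`, the `SKETCH_PF_*` constants and the `sysctl_registered` field are hypothetical; only the shape of the NULL check follows the quoted hunk.

```c
#include <stddef.h>

/* Hypothetical constants; the values match Linux PF_INET / PF_INET6. */
enum { SKETCH_PF_INET = 2, SKETCH_PF_INET6 = 10 };

struct ip_net_sketch {
	int sysctl_registered;
};

struct net_sketch {
	struct ip_net_sketch proto; /* stands in for net->ct.proto */
};

/* Only IPv4 carries per-net l3proto sysctl data in this sketch. */
struct ip_net_sketch *l3proto_net_sketch(struct net_sketch *net, int l3proto)
{
	if (l3proto == SKETCH_PF_INET)
		return &net->proto;
	return NULL;
}

int l3proto_register_sysctl_sketch(struct net_sketch *net, int l3proto)
{
	struct ip_net_sketch *in = l3proto_net_sketch(net, l3proto);

	/*
	 * Some l3 protocols (IPv6 here) have no per-net protocol data
	 * because they support no sysctl; there is nothing to register,
	 * and that is not an error.
	 */
	if (in == NULL)
		return 0;

	in->sysctl_registered = 1;
	return 0;
}
```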

^ permalink raw reply

* Re: [PATCH 02/17] netfilter: add namespace support for l4proto
From: Pablo Neira Ayuso @ 2012-05-24 10:00 UTC (permalink / raw)
  To: Gao feng
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <4FBD9473.5050304@cn.fujitsu.com>

On Thu, May 24, 2012 at 09:52:51AM +0800, Gao feng wrote:
> On 2012-05-23 18:25, Pablo Neira Ayuso wrote:
> > On Mon, May 14, 2012 at 04:52:12PM +0800, Gao feng wrote:
> >> From: Gao feng <gaofeng@cn.fujitus.com>
[...]
> >> @@ -243,137 +253,172 @@ void nf_conntrack_l3proto_unregister(struct nf_conntrack_l3proto *proto)
> >>  }
> >>  EXPORT_SYMBOL_GPL(nf_conntrack_l3proto_unregister);
> >>  
> >> -static int nf_ct_l4proto_register_sysctl(struct nf_conntrack_l4proto *l4proto)
> >> +static struct nf_proto_net *nf_ct_l4proto_net(struct net *net,
> >> +					      struct nf_conntrack_l4proto *l4proto)
> >>  {
> >> -	int err = 0;
> >> +	if (l4proto->net_id)
> >> +		return net_generic(net, *l4proto->net_id);
> >> +	else
> >> +		return NULL;
> >> +}
> >>  
> >> +int nf_ct_l4proto_register_sysctl(struct net *net,
> >> +				  struct nf_conntrack_l4proto *l4proto)
> >> +{
> >> +	int err = 0;
> >> +	struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
> >> +	if (pn == NULL)
> >> +		return 0;
> >>  #ifdef CONFIG_SYSCTL
> >> -	if (l4proto->ctl_table != NULL) {
> >> -		err = nf_ct_register_sysctl(l4proto->ctl_table_header,
> >> +	if (pn->ctl_table != NULL) {
> >> +		err = nf_ct_register_sysctl(net,
> >> +					    &pn->ctl_table_header,
> >>  					    "net/netfilter",
> >> -					    l4proto->ctl_table,
> >> -					    l4proto->ctl_table_users);
> >> -		if (err < 0)
> >> +					    pn->ctl_table,
> >> +					    &pn->users);
> >> +		if (err < 0) {
> >> +			kfree(pn->ctl_table);
> >> +			pn->ctl_table = NULL;
> >                                ^^^^^^^^^^^
> > Do you really need to set this above to NULL? Is there any existing
> > bug trap? If not, it's superfluous, please, remove it.
> > 
> Yes. The ctl_table of l4proto_tcp (and udp, icmp) is stored in netns_ct.proto,
> so when registering l4proto_tcp's sysctl fails, ctl_table will still
> point to the kfreed memory. This will cause a panic the next
> time we register l4proto_tcp's sysctl.

I see, thanks for the clarification.
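
The hazard described above can be shown with a small userspace sketch. It
assumes the table is allocated on first registration and kept in long-lived
per-net storage; proto_net, register_sysctl, unregister_sysctl and the fail
parameter are hypothetical and only model the shape of the kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-net state that outlives one registration attempt. */
struct proto_net {
	int *ctl_table;
};

static const int table_template[2] = { 1, 2 };

/* fail != 0 simulates the underlying sysctl registration failing. */
static int register_sysctl(struct proto_net *pn, int fail)
{
	assert(pn != NULL);

	/* Allocate the table only if no earlier call left one behind. */
	if (pn->ctl_table == NULL) {
		pn->ctl_table = malloc(sizeof(table_template));
		if (pn->ctl_table == NULL)
			return -1;
		memcpy(pn->ctl_table, table_template, sizeof(table_template));
	}

	if (fail) {
		free(pn->ctl_table);
		/*
		 * Without this reset the next call would see a non-NULL
		 * pointer and use memory that has already been freed.
		 */
		pn->ctl_table = NULL;
		return -1;
	}
	return 0;
}

static void unregister_sysctl(struct proto_net *pn)
{
	free(pn->ctl_table);
	pn->ctl_table = NULL;
}
```

After a failed attempt the pointer is NULL again, so a retry allocates a
fresh table instead of touching the freed one.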

^ permalink raw reply

* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Pablo Neira Ayuso @ 2012-05-24  9:58 UTC (permalink / raw)
  To: Gao feng
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <4FBD9076.6060309@cn.fujitsu.com>

On Thu, May 24, 2012 at 09:35:50AM +0800, Gao feng wrote:
> Hi pablo:
> 
> On 2012-05-23 18:12, Pablo Neira Ayuso wrote:
> > On Mon, May 14, 2012 at 04:52:11PM +0800, Gao feng wrote:
> >> From: Gao feng <gaofeng@cn.fujitus.com>
> >>
> >> struct nf_proto_net stores the proto's ctl_table_header and ctl_table;
> >> nf_ct_l4proto_(un)register_sysctl uses it to register the sysctl.
> >>
> >> there are some changes to struct nf_conntrack_l4proto:
> >> - add a compat field to identify whether this proto should do compat.
> >> - the net_id field is used to store the pernet_operations id
> >>   that belongs to the l4proto.
> >> - init_net will be used to initialize the proto's pernet data
> >>
> >> and add init_net to struct nf_conntrack_l3proto too.
> > 
> > This patchset looks better but there are still things that we have to
> > resolve.
> > 
> > The first one (regarding this patch 1/17) changes in:
> > * include/net/netfilter/nf_conntrack_l4proto.h
> > * include/net/netns/conntrack.h
> > 
> > should be included in:
> > [PATCH] netfilter: add namespace support for l4proto
> > 
> > And changes in:
> > * include/net/netfilter/nf_conntrack_l3proto.h
> > 
> > should be included in:
> > [PATCH] netfilter: add namespace support for l3proto
> > 
> > I already told you: a patch that adds a structure without using it
> > is not good. The structure has to go together with the code that uses it.
> > 
> 
> It seems this patch should be merged into "netfilter: add namespace support for l4proto",
> since struct nf_proto_net is first used there.
> 
> > More comments below.
> > 
> >> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
> >> Signed-off-by: Gao feng <gaofeng@cn.fujitus.com>
> >> ---
> >>  include/net/netfilter/nf_conntrack_l3proto.h |    3 +++
> >>  include/net/netfilter/nf_conntrack_l4proto.h |    6 ++++++
> >>  include/net/netns/conntrack.h                |   12 ++++++++++++
> >>  3 files changed, 21 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/include/net/netfilter/nf_conntrack_l3proto.h b/include/net/netfilter/nf_conntrack_l3proto.h
> >> index 9699c02..9766005 100644
> >> --- a/include/net/netfilter/nf_conntrack_l3proto.h
> >> +++ b/include/net/netfilter/nf_conntrack_l3proto.h
> >> @@ -69,6 +69,9 @@ struct nf_conntrack_l3proto {
> >>  	struct ctl_table	*ctl_table;
> >>  #endif /* CONFIG_SYSCTL */
> >>  
> >> +	/* Init l3proto pernet data */
> >> +	int (*init_net)(struct net *net);
> >> +
> >>  	/* Module (if any) which this is connected to. */
> >>  	struct module *me;
> >>  };
> >> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
> >> index 3b572bb..a90eab5 100644
> >> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> >> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> >> @@ -22,6 +22,8 @@ struct nf_conntrack_l4proto {
> >>  	/* L4 Protocol number. */
> >>  	u_int8_t l4proto;
> >>  
> >> +	u_int8_t compat;
> > 
> > I don't see why we need this new field.
> > 
> > It seems to be set to 1 in each structure that has set:
> > 
> > .ctl_compat_table
> > 
> > to non-NULL. So, it's redundant.
> > 
> > Moreover, you already know from the protocol tracker itself if you
> > have to allocate the compat ctl table or not.
> > 
> > In other words: you set compat to 1 for nf_conntrack_l4proto_generic.
> > Then, you pass that compat value to generic_init_net via ->init_net
> > again, but this information (that determines whether the compat has to be
> > done or not) is already in the scope of the protocol tracker.
> > 
> 
> Because some protocols, such as l4proto_tcp6 and l4proto_tcp, use the same init_net
> function. l4proto_tcp6 doesn't need the compat sysctl, so we should use this new
> field to identify whether we should kmemdup compat_sysctl_table.

Then, could you use two init_net functions? one for TCP for IPv4 and another
for TCP for IPv6?
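
As a rough userspace sketch of that suggestion: each tracker gets its own
init_net and shares a common helper, so no compat flag has to be passed
around. All names below (proto_net, tcp_init_common, tcpv4_init_net,
tcpv6_init_net, l4proto) are hypothetical and only model the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-net state: which tables were set up. */
struct proto_net {
	int has_table;
	int has_compat_table;
};

/* Setup shared by the IPv4 and IPv6 TCP trackers. */
static int tcp_init_common(struct proto_net *pn)
{
	assert(pn != NULL);
	pn->has_table = 1;
	return 0;
}

/* TCP over IPv4: common setup plus the compat table. */
static int tcpv4_init_net(struct proto_net *pn)
{
	int err = tcp_init_common(pn);

	if (err < 0)
		return err;
	pn->has_compat_table = 1;
	return 0;
}

/* TCP over IPv6: common setup only, no compat table. */
static int tcpv6_init_net(struct proto_net *pn)
{
	return tcp_init_common(pn);
}

/* Hypothetical tracker descriptor: the callback itself encodes the choice. */
struct l4proto {
	const char *name;
	int (*init_net)(struct proto_net *pn);
};

static const struct l4proto l4proto_tcp4 = { "tcp4", tcpv4_init_net };
static const struct l4proto l4proto_tcp6 = { "tcp6", tcpv6_init_net };
```

The trade-off is one extra small function per tracker in exchange for a
simpler init_net signature.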

> Also, because protocols will have per-net ctl_compat_table and ctl_table, the .ctl_compat_table
> field will be deleted in patch 15/17, so we need the new compat field.
> 
> Actually, we don't need to pass the compat value to generic_init_net, because
> we know l4proto_generic needs compat. But considering l4proto_tcp(6), and in order to keep
> the code readable, I prefer to add the compat field and pass it to init_net.
> 
> > You have to fix this.
> > 
> >> +
> >>  	/* Try to fill in the third arg: dataoff is offset past network protocol
> >>             hdr.  Return true if possible. */
> >>  	bool (*pkt_to_tuple)(const struct sk_buff *skb, unsigned int dataoff,
> >> @@ -103,6 +105,10 @@ struct nf_conntrack_l4proto {
> >>  	struct ctl_table	*ctl_compat_table;
> >>  #endif
> >>  #endif
> >> +	int	*net_id;
> >> +	/* Init l4proto pernet data */
> >> +	int (*init_net)(struct net *net, u_int8_t compat);
> >> +
> >>  	/* Protocol name */
> >>  	const char *name;
> >>  
> >> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
> >> index a053a19..1f53038 100644
> >> --- a/include/net/netns/conntrack.h
> >> +++ b/include/net/netns/conntrack.h
> >> @@ -8,6 +8,18 @@
> >>  struct ctl_table_header;
> >>  struct nf_conntrack_ecache;
> >>  
> >> +struct nf_proto_net {
> >> +#ifdef CONFIG_SYSCTL
> >> +	struct ctl_table_header *ctl_table_header;
> >> +	struct ctl_table        *ctl_table;
> >> +#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> >> +	struct ctl_table_header *ctl_compat_header;
> >> +	struct ctl_table        *ctl_compat_table;
> >> +#endif
> >> +#endif
> >> +	unsigned int		users;
> >> +};
> >> +
> >>  struct netns_ct {
> >>  	atomic_t		count;
> >>  	unsigned int		expect_count;
> > --
> > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply

* Re: [PATCH 15/17] netfilter: cleanup sysctl for l4proto and l3proto
From: Pablo Neira Ayuso @ 2012-05-24  9:56 UTC (permalink / raw)
  To: Gao feng; +Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <4FBD87E6.6000402@cn.fujitsu.com>

On Thu, May 24, 2012 at 08:59:18AM +0800, Gao feng wrote:
> Hi pablo:
> 
> On 2012-05-23 18:38, Pablo Neira Ayuso wrote:
> > On Mon, May 14, 2012 at 04:52:25PM +0800, Gao feng wrote:
> >> delete the now-unused sysctl data for l4proto and l3proto.
> >>
> >> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
> >> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> >> ---
> >>  include/net/netfilter/nf_conntrack_l3proto.h   |    2 --
> >>  include/net/netfilter/nf_conntrack_l4proto.h   |   10 ----------
> >>  net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    1 -
> >>  net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |    8 --------
> >>  net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |    5 -----
> >>  net/netfilter/nf_conntrack_proto_generic.c     |    8 --------
> >>  net/netfilter/nf_conntrack_proto_sctp.c        |   15 ---------------
> >>  net/netfilter/nf_conntrack_proto_tcp.c         |   15 ---------------
> >>  net/netfilter/nf_conntrack_proto_udp.c         |   15 ---------------
> >>  net/netfilter/nf_conntrack_proto_udplite.c     |   12 ------------
> >>  10 files changed, 0 insertions(+), 91 deletions(-)
> >>
> >> diff --git a/include/net/netfilter/nf_conntrack_l3proto.h b/include/net/netfilter/nf_conntrack_l3proto.h
> >> index d6df8c7..6f7c13f 100644
> >> --- a/include/net/netfilter/nf_conntrack_l3proto.h
> >> +++ b/include/net/netfilter/nf_conntrack_l3proto.h
> >> @@ -64,9 +64,7 @@ struct nf_conntrack_l3proto {
> >>  	size_t nla_size;
> >>  
> >>  #ifdef CONFIG_SYSCTL
> >> -	struct ctl_table_header	*ctl_table_header;
> >>  	const char		*ctl_table_path;
> >> -	struct ctl_table	*ctl_table;
> >>  #endif /* CONFIG_SYSCTL */
> >>  
> >>  	/* Init l3proto pernet data */
> >> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
> >> index 0d329b9..4881df34 100644
> >> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> >> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> >> @@ -95,16 +95,6 @@ struct nf_conntrack_l4proto {
> >>  		const struct nla_policy *nla_policy;
> >>  	} ctnl_timeout;
> >>  #endif
> >> -
> >> -#ifdef CONFIG_SYSCTL
> >> -	struct ctl_table_header	**ctl_table_header;
> >> -	struct ctl_table	*ctl_table;
> >> -	unsigned int		*ctl_table_users;
> >> -#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> >> -	struct ctl_table_header	*ctl_compat_table_header;
> >> -	struct ctl_table	*ctl_compat_table;
> >> -#endif
> >> -#endif
> > 
> > Interesting. This structure is added in patch 1/17, then it's removed
> > in patch 15/17.
> > 
> > Probably I'm missing something, but why are you doing it like that?
> 
> By "this structure", do you mean ctl_table_header, ctl_table and so on?
> 
> I added these fields to struct nf_proto_net in patch 1/17, so the ones in
> struct nf_conntrack_l4proto are no longer used; this patch is just cleanup.
> 
> The same applies to nf_conntrack_l3proto.

I see, then it's OK. Please elaborate a bit more in the patch
description to explain that this structure is no longer required.

^ permalink raw reply

* Re: [PATCH 04/17] netfilter: add namespace support for l4proto_generic
From: Pablo Neira Ayuso @ 2012-05-24  9:52 UTC (permalink / raw)
  To: Gao feng; +Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <4FBD8B40.4020303@cn.fujitsu.com>

On Thu, May 24, 2012 at 09:13:36AM +0800, Gao feng wrote:
> On 2012-05-23 18:32, Pablo Neira Ayuso wrote:
> > On Mon, May 14, 2012 at 04:52:14PM +0800, Gao feng wrote:
> >> implement and export nf_conntrack_proto_generic_[init,fini];
> >> nf_conntrack_[init,cleanup]_net call them to register or unregister
> >> the sysctl of the generic proto.
> >>
> >> implement generic_net_init, which is used to initialize the pernet
> >> data for the generic proto.
> >>
> >> and use nf_generic_net.timeout to replace nf_ct_generic_timeout in
> >> the get_timeouts function.
> >>
> >> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
> >> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> >> ---
> >>  include/net/netfilter/nf_conntrack_l4proto.h |    2 +
> >>  include/net/netns/conntrack.h                |    6 +++
> >>  net/netfilter/nf_conntrack_core.c            |    8 +++-
> >>  net/netfilter/nf_conntrack_proto.c           |   21 +++++-----
> >>  net/netfilter/nf_conntrack_proto_generic.c   |   55 ++++++++++++++++++++++++-
> >>  5 files changed, 76 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
> >> index a93dcd5..0d329b9 100644
> >> --- a/include/net/netfilter/nf_conntrack_l4proto.h
> >> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
> >> @@ -118,6 +118,8 @@ struct nf_conntrack_l4proto {
> >>  
> >>  /* Existing built-in generic protocol */
> >>  extern struct nf_conntrack_l4proto nf_conntrack_l4proto_generic;
> >> +extern int nf_conntrack_proto_generic_init(struct net *net);
> >> +extern void nf_conntrack_proto_generic_fini(struct net *net);
> >>  
> >>  #define MAX_NF_CT_PROTO 256
> >>  
> >> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
> >> index 94992e9..3381b80 100644
> >> --- a/include/net/netns/conntrack.h
> >> +++ b/include/net/netns/conntrack.h
> >> @@ -20,7 +20,13 @@ struct nf_proto_net {
> >>  	unsigned int		users;
> >>  };
> >>  
> >> +struct nf_generic_net {
> >> +	struct nf_proto_net pn;
> >> +	unsigned int timeout;
> >> +};
> >> +
> >>  struct nf_ip_net {
> >> +	struct nf_generic_net   generic;
> >>  #if defined(CONFIG_SYSCTL) && defined(CONFIG_NF_CONNTRACK_PROC_COMPAT)
> >>  	struct ctl_table_header *ctl_table_header;
> >>  	struct ctl_table	*ctl_table;
> >> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> >> index 32c5909..fd33e91 100644
> >> --- a/net/netfilter/nf_conntrack_core.c
> >> +++ b/net/netfilter/nf_conntrack_core.c
> >> @@ -1353,6 +1353,7 @@ static void nf_conntrack_cleanup_net(struct net *net)
> >>  	}
> >>  
> >>  	nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
> >> +	nf_conntrack_proto_generic_fini(net);
> >>  	nf_conntrack_helper_fini(net);
> >>  	nf_conntrack_timeout_fini(net);
> >>  	nf_conntrack_ecache_fini(net);
> >> @@ -1586,9 +1587,12 @@ static int nf_conntrack_init_net(struct net *net)
> >>  	ret = nf_conntrack_helper_init(net);
> >>  	if (ret < 0)
> >>  		goto err_helper;
> >> -
> >> +	ret = nf_conntrack_proto_generic_init(net);
> >> +	if (ret < 0)
> >> +		goto err_generic;
> >>  	return 0;
> >> -
> >> +err_generic:
> >> +	nf_conntrack_helper_fini(net);
> >>  err_helper:
> >>  	nf_conntrack_timeout_fini(net);
> >>  err_timeout:
> >> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
> >> index 7ee6653..9b4bf6d 100644
> >> --- a/net/netfilter/nf_conntrack_proto.c
> >> +++ b/net/netfilter/nf_conntrack_proto.c
> >> @@ -287,10 +287,16 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l3proto_unregister);
> >>  static struct nf_proto_net *nf_ct_l4proto_net(struct net *net,
> >>  					      struct nf_conntrack_l4proto *l4proto)
> >>  {
> >> -	if (l4proto->net_id)
> >> -		return net_generic(net, *l4proto->net_id);
> >> -	else
> >> -		return NULL;
> >> +	switch (l4proto->l4proto) {
> >> +	case 255: /* l4proto_generic */
> >> +		return (struct nf_proto_net *)&net->ct.proto.generic;
> >> +	default:
> >> +		if (l4proto->net_id)
> >> +			return net_generic(net, *l4proto->net_id);
> >> +		else
> >> +			return NULL;
> >> +	}
> >> +	return NULL;
> >>  }
> >>  
> >>  int nf_ct_l4proto_register_sysctl(struct net *net,
> >> @@ -457,11 +463,6 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_unregister);
> >>  int nf_conntrack_proto_init(void)
> >>  {
> >>  	unsigned int i;
> >> -	int err;
> >> -
> >> -	err = nf_ct_l4proto_register_sysctl(&init_net, &nf_conntrack_l4proto_generic);
> >> -	if (err < 0)
> >> -		return err;
> > 
> > I like that all protocol sysctls are registered by
> > nf_conntrack_proto_init. Can you keep using that?
> 
> You mean the per-net generic_proto sysctl would be registered by
> nf_conntrack_proto_init?
> 
> Such as:
> 
> int nf_conntrack_proto_init(struct net *net)
> {
> 	...
> 	err = nf_ct_l4proto_register_sysctl(net, &nf_conntrack_l4proto_generic);

Yes, with all protocol trackers included in nf_conntrack_proto_init:

        err = nf_conntrack_proto_generic_init(net);
        ...
        err = nf_conntrack_proto_tcp_init(net);
        ...

and so on.

> 	...
> }
> 
> If my understanding is right, my answer is yes, we can ;)
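
A minimal userspace sketch of the layout discussed above: one per-net entry
point calls each tracker's init in turn and unwinds in reverse order on
failure. The struct net here and the proto_*_init/fini helpers are
hypothetical models, and fail_tcp exists only to simulate an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-net state; not the kernel's struct net. */
struct net {
	int generic_ready;
	int tcp_ready;
	int fail_tcp;	/* test knob: make the TCP init fail */
};

static int proto_generic_init(struct net *net)
{
	net->generic_ready = 1;
	return 0;
}

static void proto_generic_fini(struct net *net)
{
	net->generic_ready = 0;
}

static int proto_tcp_init(struct net *net)
{
	if (net->fail_tcp)
		return -1;
	net->tcp_ready = 1;
	return 0;
}

/* Single per-net entry point that initializes every tracker. */
static int conntrack_proto_init(struct net *net)
{
	int err;

	assert(net != NULL);

	err = proto_generic_init(net);
	if (err < 0)
		return err;

	err = proto_tcp_init(net);
	if (err < 0)
		goto err_tcp;

	return 0;

err_tcp:
	/* Undo the steps that already succeeded, in reverse order. */
	proto_generic_fini(net);
	return err;
}
```

If the TCP step fails, the generic tracker is torn down again, so the caller
never sees a half-initialized namespace.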
> 

^ permalink raw reply

* [PATCH] net: qmi_wwan: Add Sierra Wireless device IDs
From: Bjørn Mork @ 2012-05-24  9:19 UTC (permalink / raw)
  To: netdev; +Cc: linux-usb, Bjørn Mork

Some additional Gobi3K IDs found in the BSD/GPL licensed
out-of-tree GobiNet driver from Sierra Wireless.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
 drivers/net/usb/qmi_wwan.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 380dbea..3b20678 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -547,6 +547,8 @@ static const struct usb_device_id products[] = {
 	{QMI_GOBI_DEVICE(0x16d8, 0x8002)},	/* CMDTech Gobi 2000 Modem device (VU922) */
 	{QMI_GOBI_DEVICE(0x05c6, 0x9205)},	/* Gobi 2000 Modem device */
 	{QMI_GOBI_DEVICE(0x1199, 0x9013)},	/* Sierra Wireless Gobi 3000 Modem device (MC8355) */
+	{QMI_GOBI_DEVICE(0x1199, 0x9015)},	/* Sierra Wireless Gobi 3000 Modem device */
+	{QMI_GOBI_DEVICE(0x1199, 0x9019)},	/* Sierra Wireless Gobi 3000 Modem device */
 	{ }					/* END */
 };
 MODULE_DEVICE_TABLE(usb, products);
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH 16/21] datapath: remove tunnel cache
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

As tunndevs no longer have a daddr, the cache can no longer be built in this way.
Furthermore, it's not clear to me what the value of keeping the cache is in
the context of moving towards allowing use of in-tree tunnelling.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 384 +++---------------------------------------------------
 datapath/tunnel.h |  52 --------
 2 files changed, 20 insertions(+), 416 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index cdcb0a7..b997cb8 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -52,43 +52,9 @@
 #include "vport-generic.h"
 #include "vport-internal_dev.h"
 
-#ifdef NEED_CACHE_TIMEOUT
-/*
- * On kernels where we can't quickly detect changes in the rest of the system
- * we use an expiration time to invalidate the cache.  A shorter expiration
- * reduces the length of time that we may potentially blackhole packets while
- * a longer time increases performance by reducing the frequency that the
- * cache needs to be rebuilt.  A variety of factors may cause the cache to be
- * invalidated before the expiration time but this is the maximum.  The time
- * is expressed in jiffies.
- */
-#define MAX_CACHE_EXP HZ
-#endif
-
-/*
- * Interval to check for and remove caches that are no longer valid.  Caches
- * are checked for validity before they are used for packet encapsulation and
- * old caches are removed at that time.  However, if no packets are sent through
- * the tunnel then the cache will never be destroyed.  Since it holds
- * references to a number of system objects, the cache will continue to use
- * system resources by not allowing those objects to be destroyed.  The cache
- * cleaner is periodically run to free invalid caches.  It does not
- * significantly affect system performance.  A lower interval will release
- * resources faster but will itself consume resources by requiring more frequent
- * checks.  A longer interval may result in messages being printed to the kernel
- * message buffer about unreleased resources.  The interval is expressed in
- * jiffies.
- */
-#define CACHE_CLEANER_INTERVAL (5 * HZ)
-
-#define CACHE_DATA_ALIGN 16
 #define PORT_TABLE_SIZE  1024
 
 static struct hlist_head *port_table __read_mostly;
-static int port_table_count;
-
-static void cache_cleaner(struct work_struct *work);
-static DECLARE_DELAYED_WORK(cache_cleaner_wq, cache_cleaner);
 
 /*
  * These are just used as an optimization: they don't require any kind of
@@ -108,60 +74,17 @@ static unsigned int multicast_ports __read_mostly;
 #define rt_dst(rt) (rt->u.dst)
 #endif
 
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,1,0)
-static struct hh_cache *rt_hh(struct rtable *rt)
-{
-	struct neighbour *neigh = dst_get_neighbour_noref(&rt->dst);
-	if (!neigh || !(neigh->nud_state & NUD_CONNECTED) ||
-			!neigh->hh.hh_len)
-		return NULL;
-	return &neigh->hh;
-}
-#else
-#define rt_hh(rt) (rt_dst(rt).hh)
-#endif
-
 static struct vport *tnl_vport_to_vport(const struct tnl_vport *tnl_vport)
 {
 	return vport_from_priv(tnl_vport);
 }
 
-/* This is analogous to rtnl_dereference for the tunnel cache.  It checks that
- * cache_lock is held, so it is only for update side code.
- */
-static struct tnl_cache *cache_dereference(struct tnl_vport *tnl_vport)
-{
-	return rcu_dereference_protected(tnl_vport->cache,
-				 lockdep_is_held(&tnl_vport->cache_lock));
-}
-
-static void schedule_cache_cleaner(void)
-{
-	schedule_delayed_work(&cache_cleaner_wq, CACHE_CLEANER_INTERVAL);
-}
-
-static void free_cache(struct tnl_cache *cache)
-{
-	if (!cache)
-		return;
-
-	ovs_flow_put(cache->flow);
-	ip_rt_put(cache->rt);
-	kfree(cache);
-}
-
 static void free_config_rcu(struct rcu_head *rcu)
 {
 	struct tnl_mutable_config *c = container_of(rcu, struct tnl_mutable_config, rcu);
 	kfree(c);
 }
 
-static void free_cache_rcu(struct rcu_head *rcu)
-{
-	struct tnl_cache *c = container_of(rcu, struct tnl_cache, rcu);
-	free_cache(c);
-}
-
 static void assign_config_rcu(struct vport *vport,
 			      struct tnl_mutable_config *new_config)
 {
@@ -174,18 +97,6 @@ static void assign_config_rcu(struct vport *vport,
 	call_rcu(&old_config->rcu, free_config_rcu);
 }
 
-static void assign_cache_rcu(struct vport *vport, struct tnl_cache *new_cache)
-{
-	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct tnl_cache *old_cache;
-
-	old_cache = cache_dereference(tnl_vport);
-	rcu_assign_pointer(tnl_vport->cache, new_cache);
-
-	if (old_cache)
-		call_rcu(&old_cache->rcu, free_cache_rcu);
-}
-
 static unsigned int *find_port_pool(const struct tnl_mutable_config *mutable)
 {
 	bool is_multicast = ipv4_is_multicast(mutable->key.daddr);
@@ -223,13 +134,9 @@ static void port_table_add_port(struct vport *vport)
 	const struct tnl_mutable_config *mutable;
 	u32 hash;
 
-	if (port_table_count == 0)
-		schedule_cache_cleaner();
-
 	mutable = rtnl_dereference(tnl_vport->mutable);
 	hash = port_hash(&mutable->key);
 	hlist_add_head_rcu(&tnl_vport->hash_node, find_bucket(hash));
-	port_table_count++;
 
 	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))++;
 }
@@ -240,10 +147,6 @@ static void port_table_remove_port(struct vport *vport)
 
 	hlist_del_init_rcu(&tnl_vport->hash_node);
 
-	port_table_count--;
-	if (port_table_count == 0)
-		cancel_delayed_work_sync(&cache_cleaner_wq);
-
 	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))--;
 }
 
@@ -780,11 +683,6 @@ static void create_tunnel_header(const struct vport *vport,
 	tnl_vport->tnl_ops->build_header(vport, mutable, iph + 1);
 }
 
-static void *get_cached_header(const struct tnl_cache *cache)
-{
-	return (void *)cache + ALIGN(sizeof(struct tnl_cache), CACHE_DATA_ALIGN);
-}
-
 #ifdef HAVE_RT_GENID
 static inline int rt_genid(struct net *net)
 {
@@ -792,184 +690,6 @@ static inline int rt_genid(struct net *net)
 }
 #endif
 
-static bool check_cache_valid(const struct tnl_cache *cache,
-			      const struct tnl_mutable_config *mutable)
-{
-	struct hh_cache *hh;
-
-	if (!cache)
-		return false;
-
-	hh = rt_hh(cache->rt);
-	return hh &&
-#ifdef NEED_CACHE_TIMEOUT
-		time_before(jiffies, cache->expiration) &&
-#endif
-#ifdef HAVE_RT_GENID
-		rt_genid(dev_net(rt_dst(cache->rt).dev)) == cache->rt->rt_genid &&
-#endif
-#ifdef HAVE_HH_SEQ
-		hh->hh_lock.sequence == cache->hh_seq &&
-#endif
-		mutable->seq == cache->mutable_seq &&
-		(!ovs_is_internal_dev(rt_dst(cache->rt).dev) ||
-		(cache->flow && !cache->flow->dead));
-}
-
-static void __cache_cleaner(struct tnl_vport *tnl_vport)
-{
-	const struct tnl_mutable_config *mutable =
-			rcu_dereference(tnl_vport->mutable);
-	const struct tnl_cache *cache = rcu_dereference(tnl_vport->cache);
-
-	if (cache && !check_cache_valid(cache, mutable) &&
-	    spin_trylock_bh(&tnl_vport->cache_lock)) {
-		assign_cache_rcu(tnl_vport_to_vport(tnl_vport), NULL);
-		spin_unlock_bh(&tnl_vport->cache_lock);
-	}
-}
-
-static void cache_cleaner(struct work_struct *work)
-{
-	int i;
-
-	schedule_cache_cleaner();
-
-	rcu_read_lock();
-	for (i = 0; i < PORT_TABLE_SIZE; i++) {
-		struct hlist_node *n;
-		struct hlist_head *bucket;
-		struct tnl_vport *tnl_vport;
-
-		bucket = &port_table[i];
-		hlist_for_each_entry_rcu(tnl_vport, n, bucket, hash_node)
-			__cache_cleaner(tnl_vport);
-	}
-	rcu_read_unlock();
-}
-
-static void create_eth_hdr(struct tnl_cache *cache, struct hh_cache *hh)
-{
-	void *cache_data = get_cached_header(cache);
-	int hh_off;
-
-#ifdef HAVE_HH_SEQ
-	unsigned hh_seq;
-
-	do {
-		hh_seq = read_seqbegin(&hh->hh_lock);
-		hh_off = HH_DATA_ALIGN(hh->hh_len) - hh->hh_len;
-		memcpy(cache_data, (void *)hh->hh_data + hh_off, hh->hh_len);
-		cache->hh_len = hh->hh_len;
-	} while (read_seqretry(&hh->hh_lock, hh_seq));
-
-	cache->hh_seq = hh_seq;
-#else
-	read_lock(&hh->hh_lock);
-	hh_off = HH_DATA_ALIGN(hh->hh_len) - hh->hh_len;
-	memcpy(cache_data, (void *)hh->hh_data + hh_off, hh->hh_len);
-	cache->hh_len = hh->hh_len;
-	read_unlock(&hh->hh_lock);
-#endif
-}
-
-static struct tnl_cache *build_cache(struct vport *vport,
-				     const struct tnl_mutable_config *mutable,
-				     struct rtable *rt)
-{
-	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct tnl_cache *cache;
-	void *cache_data;
-	int cache_len;
-	struct hh_cache *hh;
-
-	if (!(mutable->flags & TNL_F_HDR_CACHE))
-		return NULL;
-
-	/*
-	 * If there is no entry in the ARP cache or if this device does not
-	 * support hard header caching just fall back to the IP stack.
-	 */
-
-	hh = rt_hh(rt);
-	if (!hh)
-		return NULL;
-
-	/*
-	 * If lock is contended fall back to directly building the header.
-	 * We're not going to help performance by sitting here spinning.
-	 */
-	if (!spin_trylock(&tnl_vport->cache_lock))
-		return NULL;
-
-	cache = cache_dereference(tnl_vport);
-	if (check_cache_valid(cache, mutable))
-		goto unlock;
-	else
-		cache = NULL;
-
-	cache_len = LL_RESERVED_SPACE(rt_dst(rt).dev) + mutable->tunnel_hlen;
-
-	cache = kzalloc(ALIGN(sizeof(struct tnl_cache), CACHE_DATA_ALIGN) +
-			cache_len, GFP_ATOMIC);
-	if (!cache)
-		goto unlock;
-
-	create_eth_hdr(cache, hh);
-	cache_data = get_cached_header(cache) + cache->hh_len;
-	cache->len = cache->hh_len + mutable->tunnel_hlen;
-
-	create_tunnel_header(vport, mutable, rt, cache_data);
-
-	cache->mutable_seq = mutable->seq;
-	cache->rt = rt;
-#ifdef NEED_CACHE_TIMEOUT
-	cache->expiration = jiffies + tnl_vport->cache_exp_interval;
-#endif
-
-	if (ovs_is_internal_dev(rt_dst(rt).dev)) {
-		struct sw_flow_key flow_key;
-		struct vport *dst_vport;
-		struct sk_buff *skb;
-		int err;
-		int flow_key_len;
-		struct sw_flow *flow;
-
-		dst_vport = ovs_internal_dev_get_vport(rt_dst(rt).dev);
-		if (!dst_vport)
-			goto done;
-
-		skb = alloc_skb(cache->len, GFP_ATOMIC);
-		if (!skb)
-			goto done;
-
-		__skb_put(skb, cache->len);
-		memcpy(skb->data, get_cached_header(cache), cache->len);
-
-		err = ovs_flow_extract(skb, dst_vport->port_no, &flow_key,
-				       &flow_key_len);
-
-		consume_skb(skb);
-		if (err)
-			goto done;
-
-		flow = ovs_flow_tbl_lookup(rcu_dereference(dst_vport->dp->table),
-					   &flow_key, flow_key_len);
-		if (flow) {
-			cache->flow = flow;
-			ovs_flow_hold(flow);
-		}
-	}
-
-done:
-	assign_cache_rcu(vport, cache);
-
-unlock:
-	spin_unlock(&tnl_vport->cache_lock);
-
-	return cache;
-}
-
 static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 				   u8 ipproto, __be32 daddr, __be32 saddr,
 				   u8 tos)
@@ -1001,33 +721,19 @@ static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 
 static struct rtable *find_route(struct vport *vport,
 				 const struct tnl_mutable_config *mutable,
-				 u8 tos, __be32 daddr, __be32 saddr,
-				 struct tnl_cache **cache)
+				 u8 tos, __be32 daddr, __be32 saddr)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct tnl_cache *cur_cache = rcu_dereference(tnl_vport->cache);
+	struct rtable *rt;
 
-	*cache = NULL;
 	tos = RT_TOS(tos);
 
-	if (daddr == mutable->key.daddr && saddr == mutable->key.saddr &&
-	    tos == RT_TOS(mutable->tos) &&
-	    check_cache_valid(cur_cache, mutable)) {
-		*cache = cur_cache;
-		return cur_cache->rt;
-	} else {
-		struct rtable *rt;
-
-		rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
-				  daddr, saddr, tos);
-		if (IS_ERR(rt))
-			return NULL;
-
-		if (likely(tos == RT_TOS(mutable->tos)))
-			*cache = build_cache(vport, mutable, rt);
+	rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
+			  daddr, saddr, tos);
+	if (IS_ERR(rt))
+		return NULL;
 
-		return rt;
-	}
+	return rt;
 }
 
 static bool need_linearize(const struct sk_buff *skb)
@@ -1152,7 +858,6 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	enum vport_err_type err = VPORT_E_TX_ERROR;
 	struct rtable *rt;
 	struct dst_entry *unattached_dst = NULL;
-	struct tnl_cache *cache;
 	int sent_len = 0;
 	__be16 frag_off = 0;
 	__be32 daddr;
@@ -1210,11 +915,10 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	}
 
 	/* Route lookup */
-	rt = find_route(vport, mutable, tos, daddr, saddr, &cache);
+	rt = find_route(vport, mutable, tos, daddr, saddr);
 	if (unlikely(!rt))
 		goto error_free;
-	if (unlikely(!cache))
-		unattached_dst = &rt_dst(rt);
+	unattached_dst = &rt_dst(rt);
 
 	tos = INET_ECN_encapsulate(tos, inner_tos);
 
@@ -1239,11 +943,9 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	 * If we are over the MTU, allow the IP stack to handle fragmentation.
 	 * Fragmentation is a slow path anyways.
 	 */
-	if (unlikely(skb->len + mutable->tunnel_hlen > dst_mtu(&rt_dst(rt)) &&
-		     cache)) {
+	if (unlikely(skb->len + mutable->tunnel_hlen > dst_mtu(&rt_dst(rt)))) {
 		unattached_dst = &rt_dst(rt);
 		dst_hold(unattached_dst);
-		cache = NULL;
 	}
 
 	/* TTL */
@@ -1270,23 +972,15 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		if (unlikely(vlan_deaccel_tag(skb)))
 			goto next;
 
-		if (likely(cache)) {
-			skb_push(skb, cache->len);
-			memcpy(skb->data, get_cached_header(cache), cache->len);
-			skb_reset_mac_header(skb);
-			skb_set_network_header(skb, cache->hh_len);
-
-		} else {
-			skb_push(skb, mutable->tunnel_hlen);
-			create_tunnel_header(vport, mutable, rt, skb->data);
-			skb_reset_network_header(skb);
-
-			if (next_skb)
-				skb_dst_set(skb, dst_clone(unattached_dst));
-			else {
-				skb_dst_set(skb, unattached_dst);
-				unattached_dst = NULL;
-			}
+		skb_push(skb, mutable->tunnel_hlen);
+		create_tunnel_header(vport, mutable, rt, skb->data);
+		skb_reset_network_header(skb);
+
+		if (next_skb)
+			skb_dst_set(skb, dst_clone(unattached_dst));
+		else {
+			skb_dst_set(skb, unattached_dst);
+			unattached_dst = NULL;
 		}
 		skb_set_transport_header(skb, skb_network_offset(skb) + sizeof(struct iphdr));
 
@@ -1301,37 +995,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		if (unlikely(!skb))
 			goto next;
 
-		if (likely(cache)) {
-			int orig_len = skb->len - cache->len;
-			struct vport *cache_vport;
-
-			cache_vport = ovs_internal_dev_get_vport(rt_dst(rt).dev);
-			skb->protocol = htons(ETH_P_IP);
-			iph = ip_hdr(skb);
-			iph->tot_len = htons(skb->len - skb_network_offset(skb));
-			ip_send_check(iph);
-
-			if (cache_vport) {
-				if (unlikely(compute_ip_summed(skb, true))) {
-					kfree_skb(skb);
-					goto next;
-				}
-
-				OVS_CB(skb)->flow = cache->flow;
-				ovs_vport_receive(cache_vport, skb);
-				sent_len += orig_len;
-			} else {
-				int xmit_err;
-
-				skb->dev = rt_dst(rt).dev;
-				xmit_err = dev_queue_xmit(skb);
-
-				if (likely(net_xmit_eval(xmit_err) == 0))
-					sent_len += orig_len;
-			}
-		} else
-			sent_len += send_frags(skb, mutable);
-
+		sent_len += send_frags(skb, mutable);
 next:
 		skb = next_skb;
 	}
@@ -1414,13 +1078,6 @@ struct vport *ovs_tnl_create(const struct vport_parms *parms,
 	if (err)
 		goto error_free_mutable;
 
-	spin_lock_init(&tnl_vport->cache_lock);
-
-#ifdef NEED_CACHE_TIMEOUT
-	tnl_vport->cache_exp_interval = MAX_CACHE_EXP -
-				       (net_random() % (MAX_CACHE_EXP / 2));
-#endif
-
 	rcu_assign_pointer(tnl_vport->mutable, mutable);
 
 	port_table_add_port(vport);
@@ -1439,7 +1096,6 @@ static void free_port_rcu(struct rcu_head *rcu)
 	struct tnl_vport *tnl_vport = container_of(rcu,
 						   struct tnl_vport, rcu);
 
-	free_cache((struct tnl_cache __force *)tnl_vport->cache);
 	kfree((struct tnl_mutable __force *)tnl_vport->mutable);
 	ovs_vport_free(tnl_vport_to_vport(tnl_vport));
 }
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index 0af27ac..ed3b4ec 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -172,58 +172,6 @@ struct tnl_ops {
 /* If we can't detect all system changes directly we need to use a timeout. */
 #define NEED_CACHE_TIMEOUT
 #endif
-struct tnl_cache {
-	struct rcu_head rcu;
-
-	int len;		/* Length of data to be memcpy'd from cache. */
-	int hh_len;		/* Hardware hdr length, cached from hh_cache. */
-
-	/* Sequence number of mutable->seq from which this cache was
-	 * generated. */
-	unsigned mutable_seq;
-
-#ifdef HAVE_HH_SEQ
-	/*
-	 * The sequence number from the seqlock protecting the hardware header
-	 * cache (in the ARP cache).  Since every write increments the counter
-	 * this gives us an easy way to tell if it has changed.
-	 */
-	unsigned hh_seq;
-#endif
-
-#ifdef NEED_CACHE_TIMEOUT
-	/*
-	 * If we don't have direct mechanisms to detect all important changes in
-	 * the system fall back to an expiration time.  This expiration time
-	 * can be relatively short since at high rates there will be millions of
-	 * packets per second, so we'll still get plenty of benefit from the
-	 * cache.  Note that if something changes we may blackhole packets
-	 * until the expiration time (depending on what changed and the kernel
-	 * version we may be able to detect the change sooner).  Expiration is
-	 * expressed as a time in jiffies.
-	 */
-	unsigned long expiration;
-#endif
-
-	/*
-	 * The routing table entry that is the result of looking up the tunnel
-	 * endpoints.  It also contains a sequence number (called a generation
-	 * ID) that can be compared to a global sequence to tell if the routing
-	 * table has changed (and therefore there is a potential that this
-	 * cached route has been invalidated).
-	 */
-	struct rtable *rt;
-
-	/*
-	 * If the output device for tunnel traffic is an OVS internal device,
-	 * the flow of that datapath.  Since all tunnel traffic will have the
-	 * same headers this allows us to cache the flow lookup.  NULL if the
-	 * output device is not OVS or if there is no flow installed.
-	 */
-	struct sw_flow *flow;
-
-	/* The cached header follows after padding for alignment. */
-};
 
 struct tnl_vport {
 	struct rcu_head rcu;
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related

* [PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

The use of these flags in the tnl_mutable_config structure
is no longer correct, as a tunnel device may be used to
transmit packets for many different tunnels.

This change restores the checksum and out key behavior of
tunneling.
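
As a rough illustration of the per-packet length calculation, the
following userspace sketch derives the GRE header length from a packet's
tunnel key rather than from per-port configuration. All names and flag
values here (sketch_tun_key, SKETCH_F_*) are hypothetical stand-ins and
not the datapath's definitions; only the shape of the logic follows
gre_hdr_len() and tunnel_hlen() in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real TNL_F_* values live in datapath/tunnel.h. */
#define SKETCH_F_CSUM           (1u << 0)
#define SKETCH_F_OUT_KEY_ACTION (1u << 1)

#define SKETCH_GRE_SECTION 4	/* bytes per GRE header section */
#define SKETCH_IPV4_HLEN   20	/* IPv4 header without options */

/* Stand-in for the per-packet tunnel key carried with each skb. */
struct sketch_tun_key {
	uint64_t tun_id;
	uint32_t tun_flags;
};

/* GRE header length for one packet, excluding the IP header. */
int sketch_gre_hdr_len(const struct sketch_tun_key *key)
{
	int len = SKETCH_GRE_SECTION;		/* base header */

	if (key->tun_flags & SKETCH_F_CSUM)
		len += SKETCH_GRE_SECTION;	/* checksum section */

	if (key->tun_id || (key->tun_flags & SKETCH_F_OUT_KEY_ACTION))
		len += SKETCH_GRE_SECTION;	/* key section */

	return len;
}

/* Total encapsulation overhead; negative error codes pass through. */
int sketch_tunnel_hlen(const struct sketch_tun_key *key)
{
	int len = sketch_gre_hdr_len(key);

	if (len < 0)
		return len;
	return len + SKETCH_IPV4_HLEN;
}
```

Because the length now depends on the packet, it has to be computed once
per send and passed down to the helpers instead of being cached in the
port configuration.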

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c       | 58 ++++++++++++++++++++++++-------------------------
 datapath/tunnel.h       | 12 +++-------
 datapath/vport-capwap.c | 28 ++++++++++++------------
 datapath/vport-gre.c    | 33 ++++++++++++++--------------
 4 files changed, 63 insertions(+), 68 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index a303d8d..982de25 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -500,7 +500,7 @@ bool ovs_tnl_frag_needed(struct vport *vport,
 
 static bool check_mtu(struct sk_buff *skb,
 		      struct vport *vport,
-		      const struct tnl_mutable_config *mutable,
+		      const struct tnl_mutable_config *mutable, int tun_hlen,
 		      const struct rtable *rt, __be16 *frag_offp)
 {
 	bool df_inherit = mutable->flags & TNL_F_DF_INHERIT;
@@ -524,10 +524,7 @@ static bool check_mtu(struct sk_buff *skb,
 		    eth_hdr(skb)->h_proto == htons(ETH_P_8021Q))
 			vlan_header = VLAN_HLEN;
 
-		mtu = dst_mtu(&rt_dst(rt))
-			- ETH_HLEN
-			- mutable->tunnel_hlen
-			- vlan_header;
+		mtu = dst_mtu(&rt_dst(rt)) - ETH_HLEN - tun_hlen - vlan_header;
 	}
 
 	if (skb->protocol == htons(ETH_P_IP)) {
@@ -569,11 +566,10 @@ static bool check_mtu(struct sk_buff *skb,
 }
 
 static void create_tunnel_header(const struct vport *vport,
-				 const struct tnl_mutable_config *mutable,
-				 const struct rtable *rt, void *header)
+				 const struct rtable *rt, struct sk_buff *skb)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct iphdr *iph = header;
+	struct iphdr *iph = (struct iphdr *)skb->data;
 
 	iph->version	= 4;
 	iph->ihl	= sizeof(struct iphdr) >> 2;
@@ -584,7 +580,7 @@ static void create_tunnel_header(const struct vport *vport,
 	if (!iph->ttl)
 		iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
 
-	tnl_vport->tnl_ops->build_header(vport, mutable, iph + 1);
+	tnl_vport->tnl_ops->build_header(vport, skb);
 }
 
 #ifdef HAVE_RT_GENID
@@ -657,16 +653,14 @@ static bool need_linearize(const struct sk_buff *skb)
 	return false;
 }
 
-static struct sk_buff *handle_offloads(struct sk_buff *skb,
-				       const struct tnl_mutable_config *mutable,
+static struct sk_buff *handle_offloads(struct sk_buff *skb, int tun_hlen,
 				       const struct rtable *rt)
 {
 	int min_headroom;
 	int err;
 
 	min_headroom = LL_RESERVED_SPACE(rt_dst(rt).dev) + rt_dst(rt).header_len
-			+ mutable->tunnel_hlen
-			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
+			+ tun_hlen + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
 	if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
 		int head_delta = SKB_DATA_ALIGN(min_headroom -
@@ -719,15 +713,14 @@ error:
 	return ERR_PTR(err);
 }
 
-static int send_frags(struct sk_buff *skb,
-		      const struct tnl_mutable_config *mutable)
+static int send_frags(struct sk_buff *skb, int tun_hlen)
 {
 	int sent_len;
 
 	sent_len = 0;
 	while (skb) {
 		struct sk_buff *next = skb->next;
-		int frag_len = skb->len - mutable->tunnel_hlen;
+		int frag_len = skb->len - tun_hlen;
 		int err;
 
 		skb->next = NULL;
@@ -752,6 +745,14 @@ free_frags:
 	return sent_len;
 }
 
+static int tunnel_hlen(struct tnl_vport *tnl_vport, struct sk_buff *skb)
+{
+	int tun_hlen = tnl_vport->tnl_ops->hdr_len(skb);
+	if (tun_hlen < 0)
+		return tun_hlen;
+	return tun_hlen + sizeof(struct iphdr);
+}
+
 int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
@@ -765,6 +766,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	u8 ttl;
 	u8 inner_tos;
 	u8 tos;
+	int tun_hlen;
 
 	if (!OVS_CB(skb)->tun_key)
 		goto error_free;
@@ -822,13 +824,17 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	skb_dst_drop(skb);
 	skb_clear_rxhash(skb);
 
+	tun_hlen = tunnel_hlen(tnl_vport, skb);
+	if (unlikely(tun_hlen < 0))
+		goto error;
+
 	/* Offloading */
-	skb = handle_offloads(skb, mutable, rt);
+	skb = handle_offloads(skb, tun_hlen, rt);
 	if (IS_ERR(skb))
 		goto error;
 
 	/* MTU */
-	if (unlikely(!check_mtu(skb, vport, mutable, rt, &frag_off))) {
+	if (unlikely(!check_mtu(skb, vport, mutable, tun_hlen, rt, &frag_off))) {
 		err = VPORT_E_TX_DROPPED;
 		goto error_free;
 	}
@@ -837,7 +843,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	 * If we are over the MTU, allow the IP stack to handle fragmentation.
 	 * Fragmentation is a slow path anyways.
 	 */
-	if (unlikely(skb->len + mutable->tunnel_hlen > dst_mtu(&rt_dst(rt)))) {
+	if (unlikely(skb->len + tun_hlen > dst_mtu(&rt_dst(rt)))) {
 		unattached_dst = &rt_dst(rt);
 		dst_hold(unattached_dst);
 	}
@@ -862,8 +868,8 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		if (unlikely(vlan_deaccel_tag(skb)))
 			goto next;
 
-		skb_push(skb, mutable->tunnel_hlen);
-		create_tunnel_header(vport, mutable, rt, skb->data);
+		skb_push(skb, tun_hlen);
+		create_tunnel_header(vport, rt, skb);
 		skb_reset_network_header(skb);
 
 		if (next_skb)
@@ -880,12 +886,12 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		iph->frag_off = frag_off;
 		ip_select_ident(iph, &rt_dst(rt), NULL);
 
-		skb = tnl_vport->tnl_ops->update_header(vport, mutable,
+		skb = tnl_vport->tnl_ops->update_header(vport, tun_hlen,
 							&rt_dst(rt), skb);
 		if (unlikely(!skb))
 			goto next;
 
-		sent_len += send_frags(skb, mutable);
+		sent_len += send_frags(skb, tun_hlen);
 next:
 		skb = next_skb;
 	}
@@ -917,12 +923,6 @@ static int tnl_set_config(struct net *net,
 	port_key_set_net(&mutable->key, net);
 	mutable->key.tunnel_type = tnl_ops->tunnel_type;
 
-	mutable->tunnel_hlen = tnl_ops->hdr_len(mutable);
-	if (mutable->tunnel_hlen < 0)
-		return mutable->tunnel_hlen;
-
-	mutable->tunnel_hlen += sizeof(struct iphdr);
-
 	old_vport = port_table_lookup(&mutable->key);
 	if (old_vport && old_vport != cur_vport)
 		return -EEXIST;
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index cddb88e..a32241f 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -84,10 +84,8 @@ static inline void port_key_set_net(struct port_lookup_key *key, struct net *net
  * attributes.
  * @rcu: RCU callback head for deferred destruction.
  * @seq: Sequence number for distinguishing configuration versions.
- * @tunnel_hlen: Tunnel header length.
  * @eth_addr: Source address for packets generated by tunnel itself
  * (e.g. ICMP fragmentation needed messages).
- * @out_key: Key to use on output, 0 if this tunnel has no fixed output key.
  * @flags: TNL_F_* flags.
  */
 struct tnl_mutable_config {
@@ -96,12 +94,9 @@ struct tnl_mutable_config {
 
 	unsigned seq;
 
-	unsigned tunnel_hlen;
-
 	unsigned char eth_addr[ETH_ALEN];
 
 	/* Configured via OVS_TUNNEL_ATTR_* attributes. */
-	__be64	out_key;
 	u32	flags;
 };
 
@@ -114,7 +109,7 @@ struct tnl_ops {
 	 * build_header() (i.e. excludes the IP header).  Returns a negative
 	 * error code if the configuration is invalid.
 	 */
-	int (*hdr_len)(const struct tnl_mutable_config *);
+	int (*hdr_len)(struct sk_buff *skb);
 
 	/*
 	 * Builds the static portion of the tunnel header, which is stored in
@@ -124,8 +119,7 @@ struct tnl_ops {
 	 * in some circumstances caching is disabled and this function will be
 	 * called for every packet, so try not to make it too slow.
 	 */
-	void (*build_header)(const struct vport *,
-			     const struct tnl_mutable_config *, void *header);
+	void (*build_header)(const struct vport *, struct sk_buff *);
 
 	/*
 	 * Updates the cached header of a packet to match the actual packet
@@ -136,7 +130,7 @@ struct tnl_ops {
 	 * of fragmentation).
 	 */
 	struct sk_buff *(*update_header)(const struct vport *,
-					 const struct tnl_mutable_config *,
+					 int tun_hlen,
 					 struct dst_entry *, struct sk_buff *);
 };
 
diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
index a180b87..102a207 100644
--- a/datapath/vport-capwap.c
+++ b/datapath/vport-capwap.c
@@ -155,16 +155,17 @@ static struct inet_frags frag_state = {
 	.secret_interval = CAPWAP_FRAG_SECRET_INTERVAL,
 };
 
-static int capwap_hdr_len(const struct tnl_mutable_config *mutable)
+static int capwap_hdr_len(struct sk_buff *skb)
 {
 	int size = CAPWAP_MIN_HLEN;
 
 	/* CAPWAP has no checksums. */
-	if (mutable->flags & TNL_F_CSUM)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM)
 		return -EINVAL;
 
 	/* if keys are specified, then add WSI field */
-	if (mutable->out_key || (mutable->flags & TNL_F_OUT_KEY_ACTION)) {
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION) {
 		size += sizeof(struct capwaphdr_wsi) +
 			sizeof(struct capwaphdr_wsi_key);
 	}
@@ -172,11 +173,10 @@ static int capwap_hdr_len(const struct tnl_mutable_config *mutable)
 	return size;
 }
 
-static void capwap_build_header(const struct vport *vport,
-				const struct tnl_mutable_config *mutable,
-				void *header)
+static void capwap_build_header(const struct vport *vport, struct sk_buff *skb)
 {
-	struct udphdr *udph = header;
+	struct iphdr *iph = (struct iphdr *)skb->data;
+	struct udphdr *udph = (struct udphdr *)(iph + 1);
 	struct capwaphdr *cwh = (struct capwaphdr *)(udph + 1);
 
 	udph->source = htons(CAPWAP_SRC_PORT);
@@ -186,7 +186,8 @@ static void capwap_build_header(const struct vport *vport,
 	cwh->frag_id = 0;
 	cwh->frag_off = 0;
 
-	if (mutable->out_key || (mutable->flags & TNL_F_OUT_KEY_ACTION)) {
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION) {
 		struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
 
 		cwh->begin = CAPWAP_KEYED;
@@ -197,9 +198,9 @@ static void capwap_build_header(const struct vport *vport,
 		wsi->flags = CAPWAP_WSI_F_KEY64;
 		wsi->reserved_padding = 0;
 
-		if (mutable->out_key) {
+		if (OVS_CB(skb)->tun_key->tun_id) {
 			struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key *)(wsi + 1);
-			opt->key = mutable->out_key;
+			opt->key = OVS_CB(skb)->tun_key->tun_id;
 		}
 	} else {
 		/* make packet readable by old capwap code */
@@ -208,13 +209,12 @@ static void capwap_build_header(const struct vport *vport,
 }
 
 static struct sk_buff *capwap_update_header(const struct vport *vport,
-					    const struct tnl_mutable_config *mutable,
-					    struct dst_entry *dst,
+					    int tun_hlen, struct dst_entry *dst,
 					    struct sk_buff *skb)
 {
 	struct udphdr *udph = udp_hdr(skb);
 
-	if (mutable->flags & TNL_F_OUT_KEY_ACTION) {
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION) {
 		/* first field in WSI is key */
 		struct capwaphdr *cwh = (struct capwaphdr *)(udph + 1);
 		struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
@@ -226,7 +226,7 @@ static struct sk_buff *capwap_update_header(const struct vport *vport,
 	udph->len = htons(skb->len - skb_transport_offset(skb));
 
 	if (unlikely(skb->len - skb_network_offset(skb) > dst_mtu(dst))) {
-		unsigned int hlen = skb_transport_offset(skb) + capwap_hdr_len(mutable);
+		unsigned int hlen = skb_transport_offset(skb) + capwap_hdr_len(skb);
 		skb = fragment(skb, vport, dst, hlen);
 	}
 
diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
index 8fab193..b6a4308 100644
--- a/datapath/vport-gre.c
+++ b/datapath/vport-gre.c
@@ -45,16 +45,17 @@ struct gre_base_hdr {
 	__be16 protocol;
 };
 
-static int gre_hdr_len(const struct tnl_mutable_config *mutable)
+static int gre_hdr_len(struct sk_buff *skb)
 {
 	int len;
 
 	len = GRE_HEADER_SECTION;
 
-	if (mutable->flags & TNL_F_CSUM)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM)
 		len += GRE_HEADER_SECTION;
 
-	if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION)
 		len += GRE_HEADER_SECTION;
 
 	return len;
@@ -70,41 +71,41 @@ static __be32 be64_get_low32(__be64 x)
 #endif
 }
 
-static void gre_build_header(const struct vport *vport,
-			     const struct tnl_mutable_config *mutable,
-			     void *header)
+static void gre_build_header(const struct vport *vport, struct sk_buff *skb)
 {
-	struct gre_base_hdr *greh = header;
+	struct iphdr *iph = (struct iphdr *)skb->data;
+	struct gre_base_hdr *greh = (struct gre_base_hdr *)(iph + 1);
 	__be32 *options = (__be32 *)(greh + 1);
 
 	greh->protocol = htons(ETH_P_TEB);
 	greh->flags = 0;
 
-	if (mutable->flags & TNL_F_CSUM) {
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM) {
 		greh->flags |= GRE_CSUM;
 		*options = 0;
 		options++;
 	}
 
-	if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION)
 		greh->flags |= GRE_KEY;
 
-	if (mutable->out_key)
-		*options = be64_get_low32(mutable->out_key);
+	if (OVS_CB(skb)->tun_key->tun_id)
+		*options = be64_get_low32(OVS_CB(skb)->tun_key->tun_id);
 }
 
 static struct sk_buff *gre_update_header(const struct vport *vport,
-					 const struct tnl_mutable_config *mutable,
-					 struct dst_entry *dst,
+					 int tun_hlen, struct dst_entry *dst,
 					 struct sk_buff *skb)
 {
-	__be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen
+	__be32 *options = (__be32 *)(skb_network_header(skb) + tun_hlen
 					       - GRE_HEADER_SECTION);
 
-	if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION)
 		options--;
 
-	if (mutable->flags & TNL_F_CSUM)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM)
 		*(__sum16 *)options = csum_fold(skb_checksum(skb,
 						skb_transport_offset(skb),
 						skb->len - skb_transport_offset(skb),
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related

* [PATCH 21/21] datapath: Always use tun_key flags
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

These flags should always be valid, which allows the flags
element of tnl_mutable_config to be removed.

The flags in mutable were actually not being set due to a previous patch in
this series, so all flag-related features, except outgoing key and csum
which were restored in a previous patch, were disabled.
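
For illustration, the symmetric-key test in ovs_tnl_frag_needed() can be
sketched on a plain flags word as below. The function name is
hypothetical and the bit positions are illustrative only; the point is
the mask-and-compare idiom, which is true only when both bits are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, not taken from the datapath headers. */
#define SKETCH_F_IN_KEY_MATCH   (1u << 16)
#define SKETCH_F_OUT_KEY_ACTION (1u << 17)

/* True only when the per-packet flags request both key behaviours. */
bool sketch_key_is_symmetric(uint32_t tun_flags)
{
	const uint32_t both = SKETCH_F_IN_KEY_MATCH | SKETCH_F_OUT_KEY_ACTION;

	return (tun_flags & both) == both;
}
```

Testing with a single `&` would also accept packets that carry only one
of the two flags, which is why the result is compared against the full
mask.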

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 13 ++++++-------
 datapath/tunnel.h |  4 ----
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index 982de25..a91e319 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -482,7 +482,7 @@ bool ovs_tnl_frag_needed(struct vport *vport,
 	 * not symmetric then PMTUD needs to be disabled since we won't have
 	 * any way of synthesizing packets.
 	 */
-	if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
+	if ((OVS_CB(skb)->tun_key->tun_flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
 	    (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
 		ntun_key = *tun_key;
 		OVS_CB(nskb)->tun_key = &ntun_key;
@@ -503,9 +503,9 @@ static bool check_mtu(struct sk_buff *skb,
 		      const struct tnl_mutable_config *mutable, int tun_hlen,
 		      const struct rtable *rt, __be16 *frag_offp)
 {
-	bool df_inherit = mutable->flags & TNL_F_DF_INHERIT;
-	bool pmtud = mutable->flags & TNL_F_PMTUD;
-	__be16 frag_off = mutable->flags & TNL_F_DF_DEFAULT ? htons(IP_DF) : 0;
+	bool df_inherit = OVS_CB(skb)->tun_key->tun_flags & TNL_F_DF_INHERIT;
+	bool pmtud = OVS_CB(skb)->tun_key->tun_flags & TNL_F_PMTUD;
+	__be16 frag_off = OVS_CB(skb)->tun_key->tun_flags & TNL_F_DF_DEFAULT ? htons(IP_DF) : 0;
 	int mtu = 0;
 	unsigned int packet_length = skb->len - ETH_HLEN;
 
@@ -804,7 +804,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	else
 		inner_tos = 0;
 
-	if (mutable->flags & TNL_F_TOS_INHERIT)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_TOS_INHERIT)
 		tos = inner_tos;
 	else
 		tos = OVS_CB(skb)->tun_key->ipv4_tos;
@@ -851,7 +851,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
 	if (!ttl)
 		ttl = ip4_dst_hoplimit(&rt_dst(rt));
-	if (mutable->flags & TNL_F_TTL_INHERIT) {
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_TTL_INHERIT) {
 		if (skb->protocol == htons(ETH_P_IP))
 			ttl = ip_hdr(skb)->ttl;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
@@ -919,7 +919,6 @@ static int tnl_set_config(struct net *net,
 {
 	const struct vport *old_vport;
 
-	mutable->flags = 0;
 	port_key_set_net(&mutable->key, net);
 	mutable->key.tunnel_type = tnl_ops->tunnel_type;
 
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index a32241f..4893903 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -86,7 +86,6 @@ static inline void port_key_set_net(struct port_lookup_key *key, struct net *net
  * @seq: Sequence number for distinguishing configuration versions.
  * @eth_addr: Source address for packets generated by tunnel itself
  * (e.g. ICMP fragmentation needed messages).
- * @flags: TNL_F_* flags.
  */
 struct tnl_mutable_config {
 	struct port_lookup_key key;
@@ -95,9 +94,6 @@ struct tnl_mutable_config {
 	unsigned seq;
 
 	unsigned char eth_addr[ETH_ALEN];
-
-	/* Configured via OVS_TUNNEL_ATTR_* attributes. */
-	u32	flags;
 };
 
 struct tnl_ops {
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related

* [PATCH 19/21] datapath: Simplify vport lookup
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

The lookup is now only based on the net and tunnel type.
It should be possible to either get rid of the lookup altogether
or push it into the GRE and CAPWAP implementations, but this
change is simpler for now.
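
A minimal userspace sketch of a lookup keyed only on the network
namespace and tunnel type is shown below. The types and names
(sketch_key, sketch_port, sketch_find_port) are hypothetical, and a
linear scan stands in for the hash table. The patch compares keys with
memcmp(); this sketch compares field by field so it does not depend on
the contents of structure padding.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced lookup key: namespace plus tunnel type, nothing else. */
struct sketch_key {
	const void *net;		/* stand-in for struct net * */
	unsigned int tunnel_type;
};

struct sketch_port {
	struct sketch_key key;
	const char *name;
};

/* Return the first port whose key matches (net, tunnel_type), or NULL. */
const struct sketch_port *sketch_find_port(const struct sketch_port *ports,
					   size_t n, const void *net,
					   unsigned int tunnel_type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ports[i].key.net == net &&
		    ports[i].key.tunnel_type == tunnel_type)
			return &ports[i];
	}

	return NULL;
}
```

With this key there can be at most one port per tunnel type in each
namespace, which matches the -EEXIST check in tnl_set_config().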

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c       | 110 +++---------------------------------------------
 datapath/tunnel.h       |  18 ++------
 datapath/vport-capwap.c |   7 +--
 datapath/vport-gre.c    |  10 ++---
 4 files changed, 16 insertions(+), 129 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index 39aa2af..a303d8d 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -56,18 +56,6 @@
 
 static struct hlist_head *port_table __read_mostly;
 
-/*
- * These are just used as an optimization: they don't require any kind of
- * synchronization because we could have just as easily read the value before
- * the port change happened.
- */
-static unsigned int key_local_remote_ports __read_mostly;
-static unsigned int key_remote_ports __read_mostly;
-static unsigned int key_multicast_ports __read_mostly;
-static unsigned int local_remote_ports __read_mostly;
-static unsigned int remote_ports __read_mostly;
-static unsigned int multicast_ports __read_mostly;
-
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,36)
 #define rt_dst(rt) (rt->dst)
 #else
@@ -97,27 +85,6 @@ static void assign_config_rcu(struct vport *vport,
 	call_rcu(&old_config->rcu, free_config_rcu);
 }
 
-static unsigned int *find_port_pool(const struct tnl_mutable_config *mutable)
-{
-	bool is_multicast = ipv4_is_multicast(mutable->key.daddr);
-
-	if (mutable->flags & TNL_F_IN_KEY_MATCH) {
-		if (mutable->key.saddr)
-			return &local_remote_ports;
-		else if (is_multicast)
-			return &multicast_ports;
-		else
-			return &remote_ports;
-	} else {
-		if (mutable->key.saddr)
-			return &key_local_remote_ports;
-		else if (is_multicast)
-			return &key_multicast_ports;
-		else
-			return &key_remote_ports;
-	}
-}
-
 static u32 port_hash(const struct port_lookup_key *key)
 {
 	return jhash2((u32 *)key, (PORT_KEY_LEN / sizeof(u32)), 0);
@@ -137,8 +104,6 @@ static void port_table_add_port(struct vport *vport)
 	mutable = rtnl_dereference(tnl_vport->mutable);
 	hash = port_hash(&mutable->key);
 	hlist_add_head_rcu(&tnl_vport->hash_node, find_bucket(hash));
-
-	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))++;
 }
 
 static void port_table_remove_port(struct vport *vport)
@@ -146,12 +111,9 @@ static void port_table_remove_port(struct vport *vport)
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
 
 	hlist_del_init_rcu(&tnl_vport->hash_node);
-
-	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))--;
 }
 
-static struct vport *port_table_lookup(struct port_lookup_key *key,
-				       const struct tnl_mutable_config **pmutable)
+static struct vport *port_table_lookup(struct port_lookup_key *key)
 {
 	struct hlist_node *n;
 	struct hlist_head *bucket;
@@ -164,79 +126,21 @@ static struct vport *port_table_lookup(struct port_lookup_key *key,
 		struct tnl_mutable_config *mutable;
 
 		mutable = rcu_dereference_rtnl(tnl_vport->mutable);
-		if (!memcmp(&mutable->key, key, PORT_KEY_LEN)) {
-			*pmutable = mutable;
+		if (!memcmp(&mutable->key, key, PORT_KEY_LEN))
 			return tnl_vport_to_vport(tnl_vport);
-		}
 	}
 
 	return NULL;
 }
 
-struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
-				__be64 key, int tunnel_type,
-				const struct tnl_mutable_config **mutable)
+struct vport *ovs_tnl_find_port(struct net *net, u32 tunnel_type)
 {
 	struct port_lookup_key lookup;
-	struct vport *vport;
-	bool is_multicast = ipv4_is_multicast(saddr);
 
 	port_key_set_net(&lookup, net);
-	lookup.saddr = saddr;
-	lookup.daddr = daddr;
-
-	/* First try for exact match on in_key. */
-	lookup.in_key = key;
-	lookup.tunnel_type = tunnel_type | TNL_T_KEY_EXACT;
-	if (!is_multicast && key_local_remote_ports) {
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-	}
-	if (key_remote_ports) {
-		lookup.saddr = 0;
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-
-		lookup.saddr = saddr;
-	}
-
-	/* Then try matches that wildcard in_key. */
-	lookup.in_key = 0;
-	lookup.tunnel_type = tunnel_type | TNL_T_KEY_MATCH;
-	if (!is_multicast && local_remote_ports) {
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-	}
-	if (remote_ports) {
-		lookup.saddr = 0;
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-	}
+	lookup.tunnel_type = tunnel_type;
 
-	if (is_multicast) {
-		lookup.saddr = 0;
-		lookup.daddr = saddr;
-		if (key_multicast_ports) {
-			lookup.tunnel_type = tunnel_type | TNL_T_KEY_EXACT;
-			lookup.in_key = key;
-			vport = port_table_lookup(&lookup, mutable);
-			if (vport)
-				return vport;
-		}
-		if (multicast_ports) {
-			lookup.tunnel_type = tunnel_type | TNL_T_KEY_MATCH;
-			lookup.in_key = 0;
-			vport = port_table_lookup(&lookup, mutable);
-			if (vport)
-				return vport;
-		}
-	}
-
-	return NULL;
+	return port_table_lookup(&lookup);
 }
 
 static void ecn_decapsulate(struct sk_buff *skb)
@@ -1008,11 +912,9 @@ static int tnl_set_config(struct net *net,
 			  struct tnl_mutable_config *mutable)
 {
 	const struct vport *old_vport;
-	const struct tnl_mutable_config *old_mutable;
 
 	mutable->flags = 0;
 	port_key_set_net(&mutable->key, net);
-	mutable->key.daddr = htonl(0);
 	mutable->key.tunnel_type = tnl_ops->tunnel_type;
 
 	mutable->tunnel_hlen = tnl_ops->hdr_len(mutable);
@@ -1021,7 +923,7 @@ static int tnl_set_config(struct net *net,
 
 	mutable->tunnel_hlen += sizeof(struct iphdr);
 
-	old_vport = port_table_lookup(&mutable->key, &old_mutable);
+	old_vport = port_table_lookup(&mutable->key);
 	if (old_vport && old_vport != cur_vport)
 		return -EEXIST;
 
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index 330df27..cddb88e 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -35,16 +35,9 @@
 
 /*
  * One of these goes in struct tnl_ops and in tnl_find_port().
- * These values are in the same namespace as other TNL_T_* values, so
- * only the least significant 10 bits are available to define protocol
- * identifiers.
  */
-#define TNL_T_PROTO_GRE		0
-#define TNL_T_PROTO_CAPWAP	1
-
-/* These flags are only needed when calling tnl_find_port(). */
-#define TNL_T_KEY_EXACT		(1 << 10)
-#define TNL_T_KEY_MATCH		(1 << 11)
+#define TNL_T_PROTO_GRE			0
+#define TNL_T_PROTO_CAPWAP		1
 
 /* Private flags not exposed to userspace in this form. */
 #define TNL_F_IN_KEY_MATCH	(1 << 16) /* Store the key in tun_id to
@@ -66,12 +59,9 @@
  * @tunnel_type: Set of TNL_T_* flags that define lookup.
  */
 struct port_lookup_key {
-	__be64 in_key;
 #ifdef CONFIG_NET_NS
 	struct net *net;
 #endif
-	__be32 saddr;
-	__be32 daddr;
 	u32    tunnel_type;
 };
 
@@ -212,9 +202,7 @@ const unsigned char *ovs_tnl_get_addr(const struct vport *vport);
 int ovs_tnl_send(struct vport *vport, struct sk_buff *skb);
 void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb);
 
-struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
-				__be64 key, int tunnel_type,
-				const struct tnl_mutable_config **mutable);
+struct vport *ovs_tnl_find_port(struct net *net, u32 tunnel_type);
 bool ovs_tnl_frag_needed(struct vport *vport,
 			 const struct tnl_mutable_config *mutable,
 			 struct sk_buff *skb, unsigned int mtu,
diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
index f26a7d2..a180b87 100644
--- a/datapath/vport-capwap.c
+++ b/datapath/vport-capwap.c
@@ -314,7 +314,6 @@ error:
 static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
 {
 	struct vport *vport;
-	const struct tnl_mutable_config *mutable;
 	struct iphdr *iph;
 	struct ovs_key_ipv4_tunnel tun_key;
 	__be64 key = 0;
@@ -327,15 +326,13 @@ static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
 		goto out;
 
 	iph = ip_hdr(skb);
-	vport = ovs_tnl_find_port(sock_net(sk), iph->daddr, iph->saddr, key,
-				  TNL_T_PROTO_CAPWAP, &mutable);
+	vport = ovs_tnl_find_port(dev_net(skb->dev), TNL_T_PROTO_CAPWAP);
 	if (unlikely(!vport)) {
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 		goto error;
 	}
 
-	tun_key_init(&tun_key, iph,
-		     mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+	tun_key_init(&tun_key, iph, key);
 	OVS_CB(skb)->tun_key = &tun_key;
 
 	ovs_tnl_rcv(vport, skb);
diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
index f610097..8fab193 100644
--- a/datapath/vport-gre.c
+++ b/datapath/vport-gre.c
@@ -170,6 +170,8 @@ static int parse_header(struct iphdr *iph, __be16 *flags, __be64 *key)
 /* Called with rcu_read_lock and BH disabled. */
 static void gre_err(struct sk_buff *skb, u32 info)
 {
+#warning fix gre_err
+#if 0
 	struct vport *vport;
 	const struct tnl_mutable_config *mutable;
 	const int type = icmp_hdr(skb)->type;
@@ -292,6 +294,7 @@ out:
 	skb_set_mac_header(skb, orig_mac_header);
 	skb_set_network_header(skb, orig_nw_header);
 	skb->protocol = htons(ETH_P_IP);
+#endif
 }
 
 static bool check_checksum(struct sk_buff *skb)
@@ -324,7 +327,6 @@ static bool check_checksum(struct sk_buff *skb)
 static int gre_rcv(struct sk_buff *skb)
 {
 	struct vport *vport;
-	const struct tnl_mutable_config *mutable;
 	int hdr_len;
 	struct iphdr *iph;
 	struct ovs_key_ipv4_tunnel tun_key;
@@ -345,16 +347,14 @@ static int gre_rcv(struct sk_buff *skb)
 		goto error;
 
 	iph = ip_hdr(skb);
-	vport = ovs_tnl_find_port(dev_net(skb->dev), iph->daddr, iph->saddr, key,
-				  TNL_T_PROTO_GRE, &mutable);
+	vport = ovs_tnl_find_port(dev_net(skb->dev), TNL_T_PROTO_GRE);
 	if (unlikely(!vport)) {
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 		goto error;
 	}
 
 
-	tun_key_init(&tun_key, iph,
-		     mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+	tun_key_init(&tun_key, iph, key);
 	OVS_CB(skb)->tun_key = &tun_key;
 
 	__skb_pull(skb, hdr_len);
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related

* [PATCH 17/21] datapath: Always use tun_key addresses for route lookup
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

The tun_key should always be present and correct.
Mutable no longer stores correct address information
and the saddr and daddr fields will be removed.
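
As a hedged sketch of the idea, the route lookup parameters can be
filled from the per-packet key as shown below, rejecting packets that
carry no key. The structures and the function name are hypothetical
simplifications; the 0x1E mask is assumed to mirror what RT_TOS()
applies in Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TOS_MASK 0x1E	/* assumed equivalent of RT_TOS() */

/* Simplified per-packet tunnel key: addresses only. */
struct sketch_tun_key {
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
};

/* Simplified route lookup parameters. */
struct sketch_flow {
	uint32_t daddr;
	uint32_t saddr;
	uint8_t tos;
	uint8_t proto;
};

/* Fill lookup parameters from the key; returns -1 if no key is attached. */
int sketch_fill_flow(struct sketch_flow *fl, const struct sketch_tun_key *key,
		     uint8_t tos, uint8_t proto)
{
	if (!key)
		return -1;

	fl->daddr = key->ipv4_dst;
	fl->saddr = key->ipv4_src;
	fl->tos = tos & SKETCH_TOS_MASK;
	fl->proto = proto;
	return 0;
}
```

Checking for a missing key before any address is read corresponds to
the early tun_key test added to ovs_tnl_send() in this patch.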

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index b997cb8..ba18055 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -690,46 +690,44 @@ static inline int rt_genid(struct net *net)
 }
 #endif
 
-static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
-				   u8 ipproto, __be32 daddr, __be32 saddr,
-				   u8 tos)
+static struct rtable *__find_route(struct net *net, u8 ipproto,
+				   struct ovs_key_ipv4_tunnel *tun_key, u8 tos)
 {
 	/* Tunnel configuration keeps DSCP part of TOS bits, But Linux
 	 * router expect RT_TOS bits only. */
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,39)
 	struct flowi fl = { .nl_u = { .ip4_u = {
-					.daddr = daddr,
-					.saddr = saddr,
+					.daddr = tun_key->ipv4_dst,
+					.saddr = tun_key->ipv4_src,
 					.tos   = RT_TOS(tos) } },
 					.proto = ipproto };
 	struct rtable *rt;
 
-	if (unlikely(ip_route_output_key(port_key_get_net(&mutable->key), &rt, &fl)))
+	if (unlikely(ip_route_output_key(net, &rt, &fl)))
 		return ERR_PTR(-EADDRNOTAVAIL);
 
 	return rt;
 #else
-	struct flowi4 fl = { .daddr = daddr,
-			     .saddr = saddr,
+	struct flowi4 fl = { .daddr = tun_key->ipv4_dst,
+			     .saddr = tun_key->ipv4_src,
 			     .flowi4_tos = RT_TOS(tos),
 			     .flowi4_proto = ipproto };
 
-	return ip_route_output_key(port_key_get_net(&mutable->key), &fl);
+	return ip_route_output_key(net, &fl);
 #endif
 }
 
-static struct rtable *find_route(struct vport *vport,
-				 const struct tnl_mutable_config *mutable,
-				 u8 tos, __be32 daddr, __be32 saddr)
+static struct rtable *find_route(struct vport *vport, struct net *net,
+				 struct ovs_key_ipv4_tunnel *tun_key, u8 tos)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
 	struct rtable *rt;
 
 	tos = RT_TOS(tos);
 
-	rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
-			  daddr, saddr, tos);
+	rt = __find_route(net, tnl_vport->tnl_ops->ipproto,
+			  tun_key, tos);
 	if (IS_ERR(rt))
 		return NULL;
 
@@ -860,12 +858,13 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct dst_entry *unattached_dst = NULL;
 	int sent_len = 0;
 	__be16 frag_off = 0;
-	__be32 daddr;
-	__be32 saddr;
 	u8 ttl;
 	u8 inner_tos;
 	u8 tos;
 
+	if (!OVS_CB(skb)->tun_key)
+		goto error_free;
+
 	/* Validate the protocol headers before we try to use them. */
 	if (skb->protocol == htons(ETH_P_8021Q) &&
 	    !vlan_tx_tag_present(skb)) {
@@ -906,16 +905,9 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	else
 		tos = mutable->tos;
 
-	if (OVS_CB(skb)->tun_key) {
-		daddr = OVS_CB(skb)->tun_key->ipv4_dst;
-		saddr = OVS_CB(skb)->tun_key->ipv4_src;
-	} else {
-		daddr = mutable->key.daddr;
-		saddr = mutable->key.saddr;
-	}
-
 	/* Route lookup */
-	rt = find_route(vport, mutable, tos, daddr, saddr);
+	rt = find_route(vport, port_key_get_net(&mutable->key),
+			OVS_CB(skb)->tun_key, tos);
 	if (unlikely(!rt))
 		goto error_free;
 	unattached_dst = &rt_dst(rt);
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related

* [PATCH 08/21] ofproto: Add realdev_to_txdev()
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

This is used to map tunnel or VLAN realdevs to
tundevs and vlandevs respectively. This is used
on transmit to map from the interface used
in user-space to the interface used in the datapath.

In the case where an interface is not a tunnel
and does not have VLAN splinters configured
an identity map is made.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 ofproto/ofproto-dpif.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 642b508..c7ea391 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -539,8 +539,6 @@ struct vlan_splinter {
     int vid;
 };
 
-static uint32_t vsp_realdev_to_vlandev(const struct ofproto_dpif *,
-                                       uint32_t realdev, ovs_be16 vlan_tci);
 static bool vsp_adjust_flow(const struct ofproto_dpif *, struct flow *);
 static void vsp_remove(struct ofport_dpif *);
 static void vsp_add(struct ofport_dpif *, uint16_t realdev_ofp_port, int vid);
@@ -555,6 +553,10 @@ static unsigned multicast_ports;
 static int set_tunnelling(struct ofport *ofport_, uint16_t realdev_ofp_port,
                           const struct tunnel_settings *s);
 
+static uint32_t
+realdev_to_txdev(const struct ofproto_dpif *ofproto,
+                 const struct ofport_dpif *ofport, ovs_be16 vlan_tci);
+
 static struct ofport_dpif *
 ofport_dpif_cast(const struct ofport *ofport)
 {
@@ -4700,9 +4702,8 @@ send_packet(const struct ofport_dpif *ofport, struct ofpbuf *packet)
     int error;
 
     flow_extract((struct ofpbuf *) packet, 0, 0, 0, &flow);
-    odp_port = vsp_realdev_to_vlandev(ofproto, ofport->odp_port,
-                                      flow.vlan_tci);
-    if (odp_port != ofport->odp_port) {
+    odp_port = realdev_to_txdev(ofproto, ofport, flow.vlan_tci);
+    if (odp_port != ofport->odp_port && !ofport->tun) {
         eth_pop_vlan(packet);
         flow.vlan_tci = htons(0);
     }
@@ -4909,9 +4910,8 @@ compose_output_action__(struct action_xlate_ctx *ctx, uint16_t ofp_port,
          * later and we're pre-populating the flow table.  */
     }
 
-    out_port = vsp_realdev_to_vlandev(ctx->ofproto, odp_port,
-                                      ctx->flow.vlan_tci);
-    if (out_port != odp_port) {
+    out_port = realdev_to_txdev(ctx->ofproto, ofport, ctx->flow.vlan_tci);
+    if (out_port != odp_port && !ofport->tun) {
         ctx->flow.vlan_tci = htons(0);
     }
     commit_odp_actions(&ctx->flow, &ctx->base_flow, ctx->odp_actions);
@@ -7211,6 +7211,21 @@ set_tunnelling(struct ofport *ofport_, uint16_t tundev_ofp_port,
 
     return 0;
 }
+
+/* Maps a port to the port that it should be transmitted on.
+ * If tunneling is enabled then the associated tunnel port is returned.
+ * If VLAN splintering is enabled then the ofp_port of the vlandev is
+ * returned.
+ * Otherwise no mapping is in effect and ofport->odp_port is returned. */
+static uint32_t
+realdev_to_txdev(const struct ofproto_dpif *ofproto,
+                 const struct ofport_dpif *ofport, ovs_be16 vlan_tci)
+{
+    if (ofport->tun) {
+        return ofp_port_to_odp_port(ofport->tun->tundev_ofp_port);
+    }
+    return vsp_realdev_to_vlandev(ofproto, ofport->odp_port, vlan_tci);
+}
 \f
 const struct ofproto_class ofproto_dpif_class = {
     enumerate_types,
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related

* [PATCH 02/21] datapath: Use tun_key on transmit
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

Use the tun_key, which is the basis of flow-based tunnelling, on transmit.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index 010e513..61add96 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -1002,15 +1002,16 @@ unlock:
 }
 
 static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
-				   u8 ipproto, u8 tos)
+				   u8 ipproto, __be32 daddr, __be32 saddr,
+				   u8 tos)
 {
 	/* Tunnel configuration keeps DSCP part of TOS bits, But Linux
 	 * router expect RT_TOS bits only. */
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,39)
 	struct flowi fl = { .nl_u = { .ip4_u = {
-					.daddr = mutable->key.daddr,
-					.saddr = mutable->key.saddr,
+					.daddr = daddr,
+					.saddr = saddr,
 					.tos   = RT_TOS(tos) } },
 					.proto = ipproto };
 	struct rtable *rt;
@@ -1020,8 +1021,8 @@ static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 
 	return rt;
 #else
-	struct flowi4 fl = { .daddr = mutable->key.daddr,
-			     .saddr = mutable->key.saddr,
+	struct flowi4 fl = { .daddr = daddr,
+			     .saddr = saddr,
 			     .flowi4_tos = RT_TOS(tos),
 			     .flowi4_proto = ipproto };
 
@@ -1031,7 +1032,8 @@ static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 
 static struct rtable *find_route(struct vport *vport,
 				 const struct tnl_mutable_config *mutable,
-				 u8 tos, struct tnl_cache **cache)
+				 u8 tos, __be32 daddr, __be32 saddr,
+				 struct tnl_cache **cache)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
 	struct tnl_cache *cur_cache = rcu_dereference(tnl_vport->cache);
@@ -1039,14 +1041,16 @@ static struct rtable *find_route(struct vport *vport,
 	*cache = NULL;
 	tos = RT_TOS(tos);
 
-	if (likely(tos == RT_TOS(mutable->tos) &&
-	    check_cache_valid(cur_cache, mutable))) {
+	if (daddr == mutable->key.daddr && saddr == mutable->key.saddr &&
+	    tos == RT_TOS(mutable->tos) &&
+	    check_cache_valid(cur_cache, mutable)) {
 		*cache = cur_cache;
 		return cur_cache->rt;
 	} else {
 		struct rtable *rt;
 
-		rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto, tos);
+		rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
+				  daddr, saddr, tos);
 		if (IS_ERR(rt))
 			return NULL;
 
@@ -1182,6 +1186,8 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct tnl_cache *cache;
 	int sent_len = 0;
 	__be16 frag_off = 0;
+	__be32 daddr;
+	__be32 saddr;
 	u8 ttl;
 	u8 inner_tos;
 	u8 tos;
@@ -1221,11 +1227,21 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	if (mutable->flags & TNL_F_TOS_INHERIT)
 		tos = inner_tos;
+	else if (OVS_CB(skb)->tun_key)
+		tos = OVS_CB(skb)->tun_key->ipv4_tos;
 	else
 		tos = mutable->tos;
 
+	if (OVS_CB(skb)->tun_key) {
+		daddr = OVS_CB(skb)->tun_key->ipv4_dst;
+		saddr = OVS_CB(skb)->tun_key->ipv4_src;
+	} else {
+		daddr = mutable->key.daddr;
+		saddr = mutable->key.saddr;
+	}
+
 	/* Route lookup */
-	rt = find_route(vport, mutable, tos, &cache);
+	rt = find_route(vport, mutable, tos, daddr, saddr, &cache);
 	if (unlikely(!rt))
 		goto error_free;
 	if (unlikely(!cache))
@@ -1262,10 +1278,12 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	}
 
 	/* TTL */
-	ttl = mutable->ttl;
+	if (OVS_CB(skb)->tun_key)
+		ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
+	else
+		ttl = mutable->ttl;
 	if (!ttl)
 		ttl = ip4_dst_hoplimit(&rt_dst(rt));
-
 	if (mutable->flags & TNL_F_TTL_INHERIT) {
 		if (skb->protocol == htons(ETH_P_IP))
 			ttl = ip_hdr(skb)->ttl;
@@ -1444,7 +1462,8 @@ static int tnl_set_config(struct net *net, struct nlattr *options,
 		struct net_device *dev;
 		struct rtable *rt;
 
-		rt = __find_route(mutable, tnl_ops->ipproto, mutable->tos);
+		rt = __find_route(mutable, tnl_ops->ipproto, mutable->key.daddr,
+				  mutable->key.saddr, mutable->tos);
 		if (IS_ERR(rt))
 			return -EADDRNOTAVAIL;
 		dev = rt_dst(rt).dev;
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related

* [RFC v4 00/21] Flow Based Tunneling for Open vSwitch
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery

Hi,

This series comprises a fresh batch of proposed changes to introduce
flow-based tunnelling.

At the heart of these changes is the following structure, which
is attached as a pointer to skb->cb.

struct ovs_key_ipv4_tunnel {
        __be64 tun_id;
        __u32  tun_flags;
        __be32 ipv4_src;
        __be32 ipv4_dst;
        __u8   ipv4_tos;
        __u8   ipv4_ttl;
        __u8   pad[2];
};

This series does not introduce use of in-tree kernel tunneling code
by Open vSwitch. However, it is intended as preliminary work
for that goal and I believe attaching a structure similar
to the one above to skb->cb could be a mechanism to achieve that.

I have CCed netdev for any comment on that.
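
As a rough illustration of the idea above, here is a minimal user-space
sketch of hanging a pointer to the tunnel key off a packet's control
block. It is not code from this series: sk_buff_sketch, ovs_cb_sketch,
set_tun_key() and get_tun_key() are hypothetical stand-ins, and the
kernel's __be64/__be32/__u8 types are replaced with <stdint.h> types.
The only kernel detail assumed is that skb->cb is a 48-byte scratch area.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space copy of the structure from the cover letter. */
struct ovs_key_ipv4_tunnel {
	uint64_t tun_id;	/* __be64 in the kernel */
	uint32_t tun_flags;
	uint32_t ipv4_src;	/* __be32 in the kernel */
	uint32_t ipv4_dst;	/* __be32 in the kernel */
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical, heavily reduced sk_buff: only the control block. */
struct sk_buff_sketch {
	char cb[48];
};

/* Hypothetical per-packet private data kept in the control block. */
struct ovs_cb_sketch {
	const struct ovs_key_ipv4_tunnel *tun_key;
};

/* Store a pointer to the key in the control block.  memcpy() is used
 * instead of a pointer cast so the sketch stays valid standard C. */
static void set_tun_key(struct sk_buff_sketch *skb,
			const struct ovs_key_ipv4_tunnel *key)
{
	struct ovs_cb_sketch cb;

	memcpy(&cb, skb->cb, sizeof cb);
	cb.tun_key = key;
	memcpy(skb->cb, &cb, sizeof cb);
}

/* Fetch the pointer again; NULL means no key is attached. */
static const struct ovs_key_ipv4_tunnel *
get_tun_key(const struct sk_buff_sketch *skb)
{
	struct ovs_cb_sketch cb;

	memcpy(&cb, skb->cb, sizeof cb);
	return cb.tun_key;
}
```

Only the pointer lives in the control block, so the key itself must
outlive every use of the packet. In the gre_rcv() hunk earlier in this
series the key is a local variable, which means the pointer is only
meaningful while that function is still on the stack.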

Some details of the implementation follow; they are not
particularly related to the use of in-tree kernel tunneling code.

Overview:

In general the approach that I have taken in user-space is to split
tunneling into realdevs and tundevs.  Tunnel realdevs are devices that look
to users like the existing port-based tunnelling implementation. Tunnel
tundevs exist in the datapath and are where tx and rx occur.  Tunnel
tundevs have very little configuration and are unable to operate without
flow information that describes at least the remote IP.

Changes:

* Do not attempt to configure a tundev realport; it will fail, which
  results in ovs-vswitchd failing to start. I had not noticed this as
  ovs-vswitchd will start if there are no tundevs present in the database
  when it starts, and I usually test on a fresh install.

* Add a flags field to ovs_key_ipv4_tunnel (above) and use it
  to reinstate the functionality of various flags e.g. tunnel checksum,
  tunnel out key. Previously these flags were set on the 'mutable' of
  a tunnel device in the kernel, however this is no longer appropriate
  as a tunnel device may now handle multiple tunnels.

* Cleaned up output and parsing of tunnel flows.
  Test Suite enhancements to come.

* Do not use Linux kernel headers in lib/odp-util.c.
  This is achieved by defining a new structure flow_tun_key
  and using it instead of ovs_key_ipv4_tunnel. The structure
  is currently the same internally as ovs_key_ipv4_tunnel.

Limitations:

* In this series, realdevs exist in the kernel although I believe
  it should not be necessary for them to do so. The reason that they are
  there is to limit the changes that are needed to the user-space netdev
  code and to allow review of the series before making those changes.

* PMTU discovery is broken and I'm unsure if it has been fixed.
  Jesse Gross suggested that a user-space implementation of MSS clamping
  would be a good solution to this. I have made a start on that and sent a
  separate email about it.

* The header cache has been removed, but some remnants of the
  API remain. In particular the tunnel header is still created and updated,
  even though both occur for each transmit. It may make sense to
  recombine those calls into a single call if the header cache is
  to be permanently removed.

* Multicast could be implemented in user-space but currently isn't.
  This means that multicast remote IP for tunneling is broken.

* I have not implemented matches for tun_keys. This means
  that the current implementation only provides port-based tunneling
  implemented on top of flow-based tunneling. It is not yet possible for a
  controller to match on or set the tun_key of flows.

  I expect this to be a small body of work to complete.

* The way that I have split the patches is still somewhat arbitrary.
  I wanted to avoid one very large patch to aid review.  But a lot of the
  changes are inter-related, so a bisectable split seems rather difficult.
  Nonetheless, the split could be significantly improved.
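
For reference, the arithmetic behind the MSS clamping mentioned in the
PMTU item above can be sketched roughly as follows. This is not the
user-space implementation referred to there: clamp_mss() and its
parameters are hypothetical, and the sketch assumes an IPv4 inner packet
with no IP options and a TCP header with no options.

```c
#include <stdint.h>

#define INNER_IPV4_HDR_LEN 20	/* assumption: no IP options */
#define INNER_TCP_HDR_LEN  20	/* assumption: no TCP options */

/* Return the MSS to advertise so that a full-sized segment, once
 * encapsulated, still fits in path_mtu.  tnl_overhead is the number of
 * bytes the tunnel adds in front of the inner IP header (outer IP
 * header plus tunnel header). */
static uint16_t clamp_mss(uint16_t mss, uint16_t path_mtu,
			  uint16_t tnl_overhead)
{
	uint32_t hdrs = (uint32_t)tnl_overhead +
			INNER_IPV4_HDR_LEN + INNER_TCP_HDR_LEN;
	uint16_t max_mss;

	/* MTU too small to carry any payload: leave the MSS untouched
	 * rather than guess at a value. */
	if (path_mtu <= hdrs)
		return mss;

	max_mss = (uint16_t)(path_mtu - hdrs);
	return mss < max_mss ? mss : max_mss;
}
```

With a 1500-byte path MTU and 24 bytes of tunnel overhead (for example
a 20-byte outer IPv4 header plus a 4-byte GRE header without key or
checksum), an advertised MSS of 1460 would be reduced to 1436, while an
MSS of 1400 would be left alone.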

----------------------------------------------------------------
Simon Horman (21):
      datapath: tunnelling: Replace tun_id with tun_key
      datapath: Use tun_key on transmit
      odp-util: Add tun_key to parse_odp_key_attr()
      vswitchd: Add iface_parse_tunnel
      vswitchd: Add add_tunnel_ports()
      ofproto: Add set_tunnelling()
      vswitchd: Configure tunnel interfaces.
      ofproto: Add realdev_to_txdev()
      ofproto: Add tundev_to_realdev()
      classifier: Convert struct flow flow_metadata to use tun_key
      datapath, vport: Provide tunnel realdev and tundev classes and vports
      lib: Replace commit_set_tun_id_action() with commit_set_tunnel_action()
      global: Remove OVS_KEY_ATTR_TUN_ID
      ofproto: Set flow tun_key in compose_output_action()
      datapath: Remove mlink element from tnl_mutable_config
      datapath: remove tunnel cache
      datapath: Always use tun_key addresses for route lookup
      dataptah: remove ttl and tos from tnl_mutable_config
      datapath: Simplify vport lookup
      datapath: Use tun_key flags for id and csum settings on transmit
      datapath: Always use tun_key flags

 datapath/Modules.mk             |   3 +-
 datapath/actions.c              |   6 +-
 datapath/datapath.c             |  11 +-
 datapath/datapath.h             |   5 +-
 datapath/flow.c                 |  35 +-
 datapath/flow.h                 |  27 +-
 datapath/tunnel.c               | 782 +++++-----------------------------------
 datapath/tunnel.h               |  98 +----
 datapath/vport-capwap.c         |  45 +--
 datapath/vport-gre.c            |  62 ++--
 datapath/vport-tunnel-realdev.c | 260 +++++++++++++
 datapath/vport.c                |   3 +-
 datapath/vport.h                |   1 +
 include/linux/openvswitch.h     |  24 +-
 include/openvswitch/tunnel.h    |   4 +
 lib/classifier.c                |   8 +-
 lib/dpif-linux.c                |   2 +-
 lib/dpif-netdev.c               |   2 +-
 lib/flow.c                      |  31 +-
 lib/flow.h                      |  21 +-
 lib/meta-flow.c                 |   4 +-
 lib/netdev-vport.c              | 333 ++++-------------
 lib/nx-match.c                  |   2 +-
 lib/odp-util.c                  |  72 +++-
 lib/odp-util.h                  |   5 +-
 lib/ofp-print.c                 |  12 +-
 lib/ofp-util.c                  |   4 +-
 ofproto/ofproto-dpif.c          | 347 ++++++++++++++++--
 ofproto/ofproto-provider.h      |  12 +
 ofproto/ofproto.c               |  28 ++
 ofproto/ofproto.h               |  46 +++
 tests/test-classifier.c         |   7 +-
 vswitchd/bridge.c               | 350 ++++++++++++++++++
 33 files changed, 1451 insertions(+), 1201 deletions(-)
 create mode 100644 datapath/vport-tunnel-realdev.c

^ permalink raw reply

* [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr()
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman
In-Reply-To: <1337850554-10339-1-git-send-email-horms@verge.net.au>

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

---

v4
Correct parsing of tunnel key in parse_odp_key_attr()
so that it matches the output of format_odp_key_attr()

TODO: fix test suite

v3
* Initial post
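
For readers who want to try the text format outside the tree, the
following stand-alone sketch parses the same "ipv4_tunnel(...)" syntax
that the sscanf() call in the hunk below accepts. It is an
approximation, not the patch's code: tun_key_sketch and
parse_ipv4_tunnel() are hypothetical names, the dotted quads are
spelled out by hand rather than via IP_SCAN_FMT/IP_SCAN_ARGS, tun_id is
kept in host byte order, and nothing is written to a netlink buffer.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for the parsed tunnel key. */
struct tun_key_sketch {
    uint64_t tun_id;        /* host byte order in this sketch */
    uint32_t tun_flags;
    uint8_t  ipv4_src[4];
    uint8_t  ipv4_dst[4];
    uint8_t  ipv4_tos;
    uint8_t  ipv4_ttl;
};

/* Returns the number of characters consumed, or -1 if 's' does not
 * start with a complete, in-range ipv4_tunnel(...) key. */
static int
parse_ipv4_tunnel(const char *s, struct tun_key_sketch *key)
{
    char tun_id_s[32];
    unsigned long long tun_flags;
    unsigned int src[4], dst[4];
    int tos, ttl;
    int n = -1;
    int i;

    /* 12 conversions are expected; %n is only stored if the closing
     * parenthesis was matched as well, so n stays -1 otherwise. */
    if (sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
               ",flags=%llx,src=%u.%u.%u.%u,dst=%u.%u.%u.%u"
               ",tos=%i,ttl=%i)%n",
               tun_id_s, &tun_flags,
               &src[0], &src[1], &src[2], &src[3],
               &dst[0], &dst[1], &dst[2], &dst[3],
               &tos, &ttl, &n) != 12 || n <= 0) {
        return -1;
    }

    /* Reject values that would not fit the destination fields. */
    if (tun_flags > UINT32_MAX || tos < 0 || tos > 255
        || ttl < 0 || ttl > 255) {
        return -1;
    }
    for (i = 0; i < 4; i++) {
        if (src[i] > 255 || dst[i] > 255) {
            return -1;
        }
    }

    /* Zero the whole key first so no field or padding is left
     * uninitialized. */
    memset(key, 0, sizeof *key);
    key->tun_id = strtoull(tun_id_s, NULL, 0);
    key->tun_flags = (uint32_t) tun_flags;
    for (i = 0; i < 4; i++) {
        key->ipv4_src[i] = (uint8_t) src[i];
        key->ipv4_dst[i] = (uint8_t) dst[i];
    }
    key->ipv4_tos = (uint8_t) tos;
    key->ipv4_ttl = (uint8_t) ttl;
    return n;
}
```

Comparing the sscanf() return value against the exact number of
conversions, and clearing the key before filling it in, are choices
made for this sketch; the patch itself tests "> 0" and fills a stack
structure field by field.
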
---
 lib/odp-util.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 23d1efe..7cff00c 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -925,6 +925,35 @@ parse_odp_key_attr(const char *s, const struct simap *port_names,
     }
 
     {
+        ovs_be32 ipv4_src;
+        ovs_be32 ipv4_dst;
+        unsigned long long tun_flags;
+        int ipv4_tos;
+        int ipv4_ttl;
+        int n = -1;
+
+        if (sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
+                   ",flags=%llx,src="IP_SCAN_FMT",dst="IP_SCAN_FMT
+                   ",tos=%i,ttl=%i)%n",
+                   tun_id_s, &tun_flags,
+                   IP_SCAN_ARGS(&ipv4_src), IP_SCAN_ARGS(&ipv4_dst),
+                   &ipv4_tos, &ipv4_ttl, &n) > 0
+            && n > 0) {
+            struct ovs_key_ipv4_tunnel tun_key;
+
+            tun_key.tun_id = htonll(strtoull(tun_id_s, NULL, 0));
+            tun_key.tun_flags = tun_flags;
+            tun_key.ipv4_src = ipv4_src;
+            tun_key.ipv4_dst = ipv4_dst;
+            tun_key.ipv4_tos = ipv4_tos;
+            tun_key.ipv4_ttl = ipv4_ttl;
+            nl_msg_put_unspec(key, OVS_KEY_ATTR_IPV4_TUNNEL,
+                              &tun_key, sizeof tun_key);
+            return n;
+        }
+    }
+
+    {
         unsigned long long int in_port;
         int n = -1;
 
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related
