Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: linux-next: Tree for Oct 16 (net/sched/em_ipset.c)
From: Stephen Hemminger @ 2013-10-16 22:39 UTC (permalink / raw)
  To: Randy Dunlap, Vitaly Lavrov
  Cc: Thierry Reding, linux-next, linux-kernel, Mark Brown,
	netdev@vger.kernel.org
In-Reply-To: <525F09B0.4080802@infradead.org>

On Wed, 16 Oct 2013 14:48:32 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> On 10/16/13 11:51, Thierry Reding wrote:
> > Hi all,
> > 
> > I've uploaded today's linux-next tree to the master branch of the
> > repository below:
> > 
> >         git://gitorious.org/thierryreding/linux-next.git
> > 
> > A next-20131016 tag is also provided for convenience.
> > 
> > Gained two new conflicts, but nothing too exciting. x86 and ARM default
> > configurations as well as the x86 allmodconfig mostly build fine on the
> > final tree. There was a failure for the ARM at91x40_defconfig, but the
> > proper fix wasn't immediately obvious to me, so I've left it broken for
> > now.
> 
> on i386, when CONFIG_NET_NS is not enabled:
> 
> net/sched/em_ipset.c: In function 'em_ipset_change':
> net/sched/em_ipset.c:27:36: error: 'struct net_device' has no member named 'nd_net'
> net/sched/em_ipset.c: In function 'em_ipset_destroy':
> net/sched/em_ipset.c:49:34: error: 'struct net_device' has no member named 'nd_net'
> 
> 

I think this should fix.



--- a/net/sched/em_ipset.c	2013-10-06 14:48:25.030449222 -0700
+++ b/net/sched/em_ipset.c	2013-10-16 15:38:05.030278287 -0700
@@ -24,7 +24,7 @@ static int em_ipset_change(struct tcf_pr
 {
 	struct xt_set_info *set = data;
 	ip_set_id_t index;
-	struct net *net = qdisc_dev(tp->q)->nd_net;
+	struct net *net = dev_net(qdisc_dev(tp->q));
 
 	if (data_len != sizeof(*set))
 		return -EINVAL;
@@ -46,7 +46,7 @@ static void em_ipset_destroy(struct tcf_
 {
 	const struct xt_set_info *set = (const void *) em->data;
 	if (set) {
-		ip_set_nfnl_put(qdisc_dev(p->q)->nd_net, set->index);
+		ip_set_nfnl_put(dev_net(qdisc_dev(p->q)), set->index);
 		kfree((void *) em->data);
 	}
 }

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Julian Anastasov @ 2013-10-16 22:50 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Simon Horman,
	YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Mark Brooks
In-Reply-To: <20131016215953.GD18135@order.stressinduktion.org>


	Hello,

On Wed, 16 Oct 2013, Hannes Frederic Sowa wrote:

> To have ip6_dst_check working, there must to be a valid link from
> the rt6_info to the fib6_node. Otherwise we cannot check the serial
> number. As I currently see we also need a link from the fib6_node down
> to the dst entry for resource management. Thus we would have to insert
> the special dst-entry with RTF_GATEWAY and non-null rt6i_gateway back
> into the fib and have it globally visible. This could have unforseen
> side effects. We still cache all dst entries in the fib. One think I
> foresee as a possible problem is the automatic aggregation of ECMP routes,
> too.

	I'm still not sure what is needed. Looking at
ip6_pol_route(), I see that everything should just work: for
routes without RTF_GATEWAY flag we return cloned route
from rt6_alloc_cow() with valid rt6i_gateway. I still
didn't tested the problem myself, so I'll stop with the
comments before that.

> We should provide something similar to what IPv4 does with the
> KNOWN_NH flag. I guess my idea with exchanging rt6i_dst as nexthop would
> solve this without too much hassle but this would have to be checked by
> implementing it.
> 
> I don't think that storing a changed nexthop in the ipv6 cb is that nice and
> maintainable.

	Agreed. I'm not going to test that change. I'll test
latest net-next without any changes to see where exactly is
the failure because the IPv6 routing seems capable for what
we need. I have problems with my setup, so I'll need some
days. As Simon is also testing the problem, he can find the
reason before me.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH net V2 1/2] virtio-net: don't respond to cpu hotplug notifier if we're not ready
From: Rusty Russell @ 2013-10-16 23:27 UTC (permalink / raw)
  To: Jason Wang, mst, virtualization, netdev, linux-kernel; +Cc: Greg Kroah-Hartman
In-Reply-To: <1381807139-3450-1-git-send-email-jasowang@redhat.com>

Jason Wang <jasowang@redhat.com> writes:
> We're trying to re-configure the affinity unconditionally in cpu hotplug
> callback. This may lead the issue during resuming from s3/s4 since
>
> - virt queues haven't been allocated at that time.
> - it's unnecessary since thaw method will re-configure the affinity.
>
> Fix this issue by checking the config_enable and do nothing is we're not ready.
>
> The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
> (virtio-net: reset virtqueue affinity when doing cpu hotplug).
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> The patch is need for 3.8 and above.

Please put 'CC: stable@kernel.org # 3.8+' in the commit.

(The specification of the stable line is poor, but that seems to be one
common method).

Cheers,
Rusty.

^ permalink raw reply

* Re: transmit lockup using smsc95xx ethernet on usb3
From: Sarah Sharp @ 2013-10-17  0:07 UTC (permalink / raw)
  To: David Laight
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Xenia Ragiadakou
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B738B-CgBM+Bx2aUAnGFn1LkZF6NBPR1lH4CV8@public.gmane.org>

On Tue, Oct 15, 2013 at 10:59:18AM +0100, David Laight wrote:
> We are seeing complete lockups of the transmit side when using
> the smsc95xx driver connected to a USB3 port on an i7 (Ivybridge) cpu.
> These errors are very intermittent - less than once a day, and
> it isn't actually clear that they are related to traffic load.
> 
> Most of the systems are running the 3.2 kernel from Ubuntu 12.04
> but I've seen the same problem when running a 3.4 kernel.

Have you tried the latest stable kernel or the latest -rc kernel?

> Looking at the changelog for xhci-ring.c I can see that some
> 'nasty' bugs were fixed between 3.2 and 3.4 (and possibly since)
> but the usbmon trace I've now got doesn't seem to match any
> of the changelog entries.
> 
> We are also seeing similar problems if we connect to a USB2
> header.

Do you mean a USB 2.0 port under the xHCI host controller?  What does
`lsusb -t` show as the parent host controller in that case?

> Since we can't reproduce the problem quickly it is difficult to
> do any analysis. Any suggestions for increasing the error rate
> would be welcome.
> 
> Below is an annotated extract from a usbmon trace while running
> a netperf test that was sending 8192 byte TCP packets (nagle off).
> I've deleted the Bi entries (packets are received throughout)
> and numbered all the others (modulo 10000) so it is easier to
> see when the requests complete, I've also calculated the elapsed
> time and the number of Setup entries between the S and C traces.
> 
> The USB ring seems to have 60 outstanding transmits in it,
> each time one completes another is sent. There are a few 10000
> traces of that then:
> 
>   start:9870         ffff88020ea16000 293811125 S Bo:3:003:2 -115 1514 =
>                         e2350000 e2450000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 05d0ff11
>    done:9811:6969:60 ffff88020c7c8000 293811236 C Bo:3:003:2 0 1090 >
>   start:9871         ffff88020ea16a80 293811242 S Bo:3:003:2 -115 1090 =
>                         3a340000 3a440000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 0428ff12
> ...
>   start:9929         ffff88020ea16780 293817964 S Bo:3:003:2 -115 1514 =
>                         e2350000 e2450000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 05d0ff4c
> Last successful completion.
>    done:9870:6968:60 ffff88020ea16000 293818093 C Bo:3:003:2 0 1514 >
>   start:9930         ffff88020ea16000 293818099 S Bo:3:003:2 -115 1514 =
>                         e2350000 e2450000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 05d0ff4d
> 
> At this point something (untraced) seems to have gone horribly
> wrong in the transmit ring. dmesg shows nothing.
> Two Bo 'fail -71', 6 succeed, one fails -32 the rest fail -104.
>    done:9871:6913:60 ffff88020ea16a80 293818155 C Bo:3:003:2 -71 512 >
>    done:9872:6927:59 ffff88020ea16f00 293818235 C Bo:3:003:2 -71 0
>    done:9873:6875:58 ffff88020ea16480 293818313 C Bo:3:003:2 0 1514 >
>    done:9874:6786:57 ffff88020c7c83c0 293818353 C Bo:3:003:2 0 1514 >
>    done:9875:6794:56 ffff88020c7c80c0 293818470 C Bo:3:003:2 0 1514 >
>    done:9876:6789:55 ffff88020c7c8e40 293818589 C Bo:3:003:2 0 1514 >
>    done:9877:6775:54 ffff88020c7c8240 293818702 C Bo:3:003:2 0 1090 >
>    done:9878:6751:53 ffff88020c7c8180 293818803 C Bo:3:003:2 0 1514 >
>    done:9879:6735:52 ffff88020c7c89c0 293818885 C Bo:3:003:2 -32 0
>    done:9880:6671:51 ffff88020c7c8900 293818925 C Bo:3:003:2 -104 0
> ...
>     done:9927:1292:4 ffff88020cf0c480 293819015 C Bo:3:003:2 -104 0
>     done:9928:1170:3 ffff88020ea160c0 293819016 C Bo:3:003:2 -104 0
> Something is known to be wrong...
>   start:9931         ffff88020ea160c0 293819037 S Co:3:003:0
>                          s 02 01 0000 0002 0000 0
>     done:9929:1080:3 ffff88020ea16780 293819044 C Bo:3:003:2 -104 0
>      done:9930:945:2 ffff88020ea16000 293819044 C Bo:3:003:2 -104 0
>       done:9931:48:1 ffff88020ea160c0 293819085 C Co:3:003:0 0 0
> 
> These 10 transmits never finish:
>   start:9932         ffff88020ea160c0 293819098 S Bo:3:003:2 -115 1090 =
>                         3a340000 3a440000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 0428ff4e
> ... 9933 to 9940 deleted
>   start:9941         ffff88020ea16b40 293819111 S Bo:3:003:2 -115 1514 =
>                         e2350000 e2450000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 05d0ff57
> 
> All further transmits fail immediately E -12 and generate the
>     'xhci_hcd 0000:00:14.0: ERROR no room on ep ring' message.
> (There are 1070 'E' traces and 1070 'no room' messages.)
> Receives are still working.
>   start:9942         ffff88020ea16240 293819113 S Bo:3:003:2 -115 1514 =
>                         e2350000 e2450000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 05d0ff58
>     done:9942:1550:1 ffff88020ea16240 293820663 E Bo:3:003:2 -12 0
>   start:9943         ffff88020ea16240 293820675 S Bo:3:003:2 -115 1514 =
>                         e2350000 e2450000 22003200 00224d98
>                         d8460002 1f0057d7 08004500 05d0ff59
>     done:9943:1507:1 ffff88020ea16240 293822182 E Bo:3:003:2 -12 0
> 
> Eventually something causes a device remove and insert - everything re-initialises.
> This is over 12 hours later.
>       done:unknown   ffff88020c8570c0 3637139297 C Ii:3:001:1 0:2048 1 = 02
>   start:1015         ffff88020c8570c0 3637139302 S Ii:3:001:1 -115:2048 4 <
>   start:1016         ffff88020cbeb300 3637139323 S Ci:3:001:0
>                          s a3 00 0000 0001 0004 4 <
>                      ffff88020ea16240 3637139331 C Bi:3:003:1 -71 0
>        done:1016:9:1 ffff88020cbeb300 3637139332 C Ci:3:001:0 0 4 = 00010100
>   start:1017         ffff88020cbeb300 3637139334 S Co:3:001:0
>                          s 23 01 0010 0001 0000 0
>        done:1017:4:1 ffff88020cbeb300 3637139338 C Co:3:001:0 0 0
>       done:unknown   ffff88020ca9ae40 3637139423 C Ii:3:003:3 -71:1 0
>                      ffff88020c9db540 3637139428 C Bi:3:003:1 -108 0
>                      ffff88020c9db780 3637139430 C Bi:3:003:1 -108 0
>                      ffff88020d8bb540 3637139431 C Bi:3:003:1 -108 0
> 
> The last 10 transmits then terminate with error -108:
>  done:9932:xxxx      ffff88020ea160c0 3637139462 C Bo:3:003:2 -108 0
> ... 9933 to 9940 deleted
>  done:9941:xxxx      ffff88020ea16b40 3637139482 C Bo:3:003:2 -108 0
>    done:1015:21090:3 ffff88020c8570c0 3637160392 C Ii:3:001:1 0:2048 1 = 02
>   start:1018         ffff88020c8570c0 3637160396 S Ii:3:001:1 -115:2048 4 <
>       done:unknown   ffff88020cbf26c0 3637176790 C Ii:3:005:1 -108:8 0
>       done:unknown   ffff88020c68aa80 3637622497 C Ii:3:002:1 -108:2048 0
> 
> Followed by lots of Ci/Co and eventually it all starts working again.
> 
> I've not yet tried to look up the control transfers.
> 
> These aren't the only errors we are seeing, we also see (separately):
> [21549.917529] hub 3-2:1.0: port 1 disabled by hub (EMI?), re-enabling...
> [ 5822.629579] NETDEV WATCHDOG: eth0 (smsc95xx): transmit queue 0 timed out
> [ 7263.834404] hid-generic 0003:413C:2005.0002: can't reset device, 0000:00:1a.0-1.4.3/input0, status -71 (connected to a USB2 header).
> These all cause a USB bus reset and everything recovers within a couple of seconds.

I would suggest you try with the latest stable kernel, with
CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING enabled.  If you try
the latest 3.12-rc, you will only need the CONFIG_USB_DEBUG.  Or, if
that output is too much (it will spew on short packets, which may be an
issue with your ethernet adapter), with the 3.12-rc kernel, you can turn
off CONFIG_USB_DEBUG and capture command trace events through the xHCI
trace infrastructure.  Xenia can help you with that if necessary (I'm
going to be at a conference starting Friday).

Without that dmesg, I really can't tell what's happening at an xHCI
level.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: linux-next: Tree for Oct 16 (net/sched/em_ipset.c)
From: Randy Dunlap @ 2013-10-17  0:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vitaly Lavrov, Thierry Reding, linux-next, linux-kernel,
	Mark Brown, netdev@vger.kernel.org
In-Reply-To: <20131016153949.5f2257b4@nehalam.linuxnetplumber.net>

On 10/16/13 15:39, Stephen Hemminger wrote:
> On Wed, 16 Oct 2013 14:48:32 -0700
> Randy Dunlap <rdunlap@infradead.org> wrote:
> 
>> On 10/16/13 11:51, Thierry Reding wrote:
>>> Hi all,
>>>
>>> I've uploaded today's linux-next tree to the master branch of the
>>> repository below:
>>>
>>>         git://gitorious.org/thierryreding/linux-next.git
>>>
>>> A next-20131016 tag is also provided for convenience.
>>>
>>> Gained two new conflicts, but nothing too exciting. x86 and ARM default
>>> configurations as well as the x86 allmodconfig mostly build fine on the
>>> final tree. There was a failure for the ARM at91x40_defconfig, but the
>>> proper fix wasn't immediately obvious to me, so I've left it broken for
>>> now.
>>
>> on i386, when CONFIG_NET_NS is not enabled:
>>
>> net/sched/em_ipset.c: In function 'em_ipset_change':
>> net/sched/em_ipset.c:27:36: error: 'struct net_device' has no member named 'nd_net'
>> net/sched/em_ipset.c: In function 'em_ipset_destroy':
>> net/sched/em_ipset.c:49:34: error: 'struct net_device' has no member named 'nd_net'
>>
>>
> 
> I think this should fix.
> 

That works.  Thanks.

Acked-by: Randy Dunlap <rdunlap@infradead.org>


> 
> 
> --- a/net/sched/em_ipset.c	2013-10-06 14:48:25.030449222 -0700
> +++ b/net/sched/em_ipset.c	2013-10-16 15:38:05.030278287 -0700
> @@ -24,7 +24,7 @@ static int em_ipset_change(struct tcf_pr
>  {
>  	struct xt_set_info *set = data;
>  	ip_set_id_t index;
> -	struct net *net = qdisc_dev(tp->q)->nd_net;
> +	struct net *net = dev_net(qdisc_dev(tp->q));
>  
>  	if (data_len != sizeof(*set))
>  		return -EINVAL;
> @@ -46,7 +46,7 @@ static void em_ipset_destroy(struct tcf_
>  {
>  	const struct xt_set_info *set = (const void *) em->data;
>  	if (set) {
> -		ip_set_nfnl_put(qdisc_dev(p->q)->nd_net, set->index);
> +		ip_set_nfnl_put(dev_net(qdisc_dev(p->q)), set->index);
>  		kfree((void *) em->data);
>  	}
>  }
> --


-- 
~Randy

^ permalink raw reply

* [PATCH net-next] em_ipset: use dev_net() accessor
From: Stephen Hemminger @ 2013-10-17  0:29 UTC (permalink / raw)
  To: Randy Dunlap, David Miller
  Cc: Vitaly Lavrov, Thierry Reding, linux-next, linux-kernel,
	Mark Brown, netdev@vger.kernel.org, Jamal Hadi Salim
In-Reply-To: <525F2D7D.7010708@infradead.org>

Randy found that if network namespace not enabled then
nd_net does not exist and would cause compilation failure.

This is handled correctly by using the dev_net() macro.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>


--- a/net/sched/em_ipset.c	2013-10-06 14:48:25.030449222 -0700
+++ b/net/sched/em_ipset.c	2013-10-16 15:38:05.030278287 -0700
@@ -24,7 +24,7 @@ static int em_ipset_change(struct tcf_pr
 {
 	struct xt_set_info *set = data;
 	ip_set_id_t index;
-	struct net *net = qdisc_dev(tp->q)->nd_net;
+	struct net *net = dev_net(qdisc_dev(tp->q));
 
 	if (data_len != sizeof(*set))
 		return -EINVAL;
@@ -46,7 +46,7 @@ static void em_ipset_destroy(struct tcf_
 {
 	const struct xt_set_info *set = (const void *) em->data;
 	if (set) {
-		ip_set_nfnl_put(qdisc_dev(p->q)->nd_net, set->index);
+		ip_set_nfnl_put(dev_net(qdisc_dev(p->q)), set->index);
 		kfree((void *) em->data);
 	}
 }

^ permalink raw reply

* [PATCH] drivers: net: wireless: b43: Fix possible NULL ptr dereference
From: Felipe Pena @ 2013-10-17  0:40 UTC (permalink / raw)
  To: Stefano Brivio, John W. Linville
  Cc: linux-wireless, b43-dev, netdev, linux-kernel, Felipe Pena

On the ternary expression the 'e' variable could be NULL dereferenced,
when b43_nphy_get_rf_ctl_over_rev7 function returns NULL.

Signed-off-by: Felipe Pena <felipensp@gmail.com>
---
 drivers/net/wireless/b43/phy_n.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/b43/phy_n.c b/drivers/net/wireless/b43/phy_n.c
index 7c970d3..05ee7f1 100644
--- a/drivers/net/wireless/b43/phy_n.c
+++ b/drivers/net/wireless/b43/phy_n.c
@@ -164,7 +164,8 @@ static void b43_nphy_rf_ctl_override_rev7(struct b43_wldev *dev, u16 field,
 		}
 		en_addr = en_addrs[override][i];

-		val_addr = (i == 0) ? e->val_addr_core0 : e->val_addr_core1;
+		if (e)
+			val_addr = (i == 0) ? e->val_addr_core0 : e->val_addr_core1;

 		if (off) {
 			b43_phy_mask(dev, en_addr, ~en_mask);
--
1.7.10.4

^ permalink raw reply related

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Simon Horman @ 2013-10-17  0:49 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Pablo Neira Ayuso, lvs-devel, netdev, netfilter-devel,
	Wensong Zhang
In-Reply-To: <alpine.LFD.2.03.1310162231380.1917@ssi.bg>

On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> 
> > I can enqueue this fix to nf if you like. No need to resend, I can
> > manually apply.
> > 
> > Let me know.
> 
> 	It is not critical. I waited weeks the net tree to be
> copied into net-next because it collides with the recent
> "ipvs: make the service replacement more robust" change in
> net tree :) But if a rcu_barrier in the netns cleanup looks
> scary enough you can push it to nf. IMHO, it just adds
> unneeded delay there.

If it is not critical I would prefer for it to travel through
nf-next. Though I do not feel strongly about this.

^ permalink raw reply

* Re: [PATCH v2.43 0/5] MPLS actions and matches
From: Simon Horman @ 2013-10-17  0:56 UTC (permalink / raw)
  To: Ben Pfaff
  Cc: Jesse Gross, dev@openvswitch.org, netdev, Pravin B Shelar, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <20131016204743.GB4526@nicira.com>

On Wed, Oct 16, 2013 at 01:47:43PM -0700, Ben Pfaff wrote:
> On Wed, Oct 16, 2013 at 01:21:22PM -0700, Jesse Gross wrote:
> > On Tue, Oct 15, 2013 at 8:55 PM, Ben Pfaff <blp@nicira.com> wrote:
> > > On Thu, Oct 10, 2013 at 09:31:41AM +0900, Simon Horman wrote:
> > >> I believe this series addresses the feedback that each of you
> > >> gave with regards to recent previous postings of this patch-set.
> > >> I'm wondering if you could find some time to review it.
> > >
> > > I'm waiting for an ack from Jesse, then I'm going to do a final pass and
> > > I hope to commit this series at this point.  I see some ways that we can
> > > improve MPLS support afterward but I don't see any blockers.
> > 
> > I guess that we had a deadlock as I was waiting for you. I though that
> > maybe you would commit the rest of the userspace patches and then I
> > would just pick up the last two.
> 
> Ah, OK.
> 
> I will do this, then, after a final pass.

I have noticed a minor problem in the first patch of v2.43
and I have also noticed that it no longer applies on top of master.
I have prepared v2.44 and will send it shortly.

^ permalink raw reply

* [PATCH v2.44 3/5] lib: Support pushing of MPLS LSE before or after VLAN tag
From: Simon Horman @ 2013-10-17  1:15 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Jesse Gross, Ben Pfaff
  Cc: Isaku Yamahata, Ravi K
In-Reply-To: <1381972511-27221-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

From: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>

This patch modifies the push_mpls behaviour to allow
pushing of an MPLS LSE either before any VLAN tag that may be present.

Pushing the MPLS LSE before any VLAN tag that is present is the
behaviour specified in OpenFlow 1.3.

Pushing the MPLS LSE after the any VLAN tag that is present is the
behaviour specified in OpenFlow 1.1 and 1.2. This is the only behaviour
that was supported prior to this patch.

When an push_mpls action has been inserted using OpenFlow 1.2 or earlier
the behaviour of pushing the MPLS LSE before any VLAN tag that may be
present is implemented by by inserting VLAN actions around the MPLS push
action during odp translation; Pop VLAN tags before committing MPLS
actions, and push the expected VLAN tag afterwards.

The trigger condition for the two different behaviours is the value of the
mpls_before_vlan field of struct ofpact_push_mpls.  This field is set when
parsing OpenFlow actions.

Signed-off-by: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

---

v2.44
* No change

v2.43 [Simon Horman]
* Trivial rebase as the result of replacing 'mpls_before_vlan' field
  of struct ofpact_push_mpls with a 'position' field.

v2.42
* No change

v2.41 [Simon Horman]
* Use 'mpls_before_vlan' field of struct ofpact_push_mpls.
* Reword changelog to describe the difference in behaviour between
  different Open Flow versions.

v2.40 [Simon Horman]
* Trivial rebase for removal of set_ethertype()

v2.36 - v2.39
* No change

v2.35 [Joe Stringer]
* First post
---
 lib/flow.c                   |   2 +-
 lib/packets.c                |  10 +-
 lib/packets.h                |   2 +-
 ofproto/ofproto-dpif-xlate.c |  27 ++---
 tests/ofproto-dpif.at        | 237 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 259 insertions(+), 19 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index 0678c6f..6242ef9 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1061,7 +1061,7 @@ flow_compose(struct ofpbuf *b, const struct flow *flow)
     }
 
     if (eth_type_mpls(flow->dl_type)) {
-        b->l2_5 = b->l3;
+        b->l2_5 = (char*)b->l2 + ETH_HEADER_LEN;
         push_mpls(b, flow->dl_type, flow->mpls_lse);
     }
 }
diff --git a/lib/packets.c b/lib/packets.c
index 922c5db..f8a58b6 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -220,11 +220,11 @@ eth_pop_vlan(struct ofpbuf *packet)
 
 /* Set ethertype of the packet. */
 void
-set_ethertype(struct ofpbuf *packet, ovs_be16 eth_type)
+set_ethertype(struct ofpbuf *packet, ovs_be16 eth_type, bool inner)
 {
     struct eth_header *eh = packet->data;
 
-    if (eh->eth_type == htons(ETH_TYPE_VLAN)) {
+    if (inner && eh->eth_type == htons(ETH_TYPE_VLAN)) {
         ovs_be16 *p;
         p = ALIGNED_CAST(ovs_be16 *,
                 (char *)(packet->l2_5 ? packet->l2_5 : packet->l3) - 2);
@@ -332,8 +332,8 @@ push_mpls(struct ofpbuf *packet, ovs_be16 ethtype, ovs_be32 lse)
 
     if (!is_mpls(packet)) {
         /* Set ethtype and MPLS label stack entry. */
-        set_ethertype(packet, ethtype);
-        packet->l2_5 = packet->l3;
+        set_ethertype(packet, ethtype, false);
+        packet->l2_5 = (char*)packet->l2 + ETH_HEADER_LEN;
     }
 
     /* Push new MPLS shim header onto packet. */
@@ -354,7 +354,7 @@ pop_mpls(struct ofpbuf *packet, ovs_be16 ethtype)
         size_t len;
         mh = packet->l2_5;
         len = (char*)packet->l2_5 - (char*)packet->l2;
-        set_ethertype(packet, ethtype);
+        set_ethertype(packet, ethtype, true);
         if (mh->mpls_lse & htonl(MPLS_BOS_MASK)) {
             packet->l2_5 = NULL;
         } else {
diff --git a/lib/packets.h b/lib/packets.h
index 7388152..38fec70 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -143,7 +143,7 @@ void compose_rarp(struct ofpbuf *, const uint8_t eth_src[ETH_ADDR_LEN]);
 void eth_push_vlan(struct ofpbuf *, ovs_be16 tci);
 void eth_pop_vlan(struct ofpbuf *);
 
-void set_ethertype(struct ofpbuf *packet, ovs_be16 eth_type);
+void set_ethertype(struct ofpbuf *packet, ovs_be16 eth_type, bool inner);
 
 const char *eth_from_hex(const char *hex, struct ofpbuf **packetp);
 void eth_format_masked(const uint8_t eth[ETH_ADDR_LEN],
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 3a71ba8..14fd037 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -2494,24 +2494,27 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
             break;
         }
 
-        case OFPACT_PUSH_MPLS:
-            if (compose_mpls_push_action(ctx,
-                                         ofpact_get_PUSH_MPLS(a)->ethertype)) {
+        case OFPACT_PUSH_MPLS: {
+            struct ofpact_push_mpls *oam = ofpact_get_PUSH_MPLS(a);
+
+            if (compose_mpls_push_action(ctx, oam->ethertype)) {
                 return;
             }
 
-            /* Save and pop any existing VLAN tags. They will be pushed
-             * back onto the packet after pushing the MPLS LSE. The overall
-             * effect is to push the MPLS LSE after any VLAN tags that may
-             * be present. This is the behaviour described for OpenFlow 1.1
-             * and 1.2. This code needs to be enhanced to make this
-             * conditional to also and support pushing the MPLS LSE before
-             * any VLAN tags that may be present, the behaviour described
-             * for OpenFlow 1.3. */
+            /* Save and pop any existing VLAN tags if the MPLS LSE should
+             * be pushed after VLAN tags.  The overall effect is to push
+             * the MPLS LSE after any VLAN tags that may be present.  This
+             * is the behaviour described for OpenFlow 1.1 and 1.2.
+             * Do not save and therefore pop the VLAN tags if the MPLS LSE
+             * should be pushed before any VLAN tags that are present.
+             * This is the behaviour described for OpenFlow 1.3. */
             ctx->final_vlan_tci = *ctx->next_vlan_tci;
-            flow->vlan_tci = htons(0);
+            if (oam->position == OFPACT_MPLS_AFTER_VLAN) {
+                flow->vlan_tci = htons(0);
+            }
             ctx->next_vlan_tci = &ctx->final_vlan_tci;
             break;
+        }
 
         case OFPACT_POP_MPLS:
             if (compose_mpls_pop_action(ctx,
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index bc9bcf4..6b2072b 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -1174,6 +1174,243 @@ NXST_FLOW reply:
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([ofproto-dpif - OF1.3+ VLAN+MPLS handling])
+OVS_VSWITCHD_START([dnl
+   add-port br0 p1 -- set Interface p1 type=dummy
+])
+ON_EXIT([kill `cat ovs-ofctl.pid`])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_DATA([flows.txt], [dnl
+cookie=0xa dl_src=40:44:44:44:55:44 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],controller
+cookie=0xa dl_src=40:44:44:44:55:45 actions=push_vlan:0x8100,mod_vlan_vid:99,push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],controller
+cookie=0xa dl_src=40:44:44:44:55:46 actions=push_vlan:0x8100,mod_vlan_vid:99,push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],controller
+cookie=0xa dl_src=40:44:44:44:55:47 actions=push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],controller
+cookie=0xa dl_src=40:44:44:44:55:48 actions=push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],controller
+cookie=0xa dl_src=40:44:44:44:55:49 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],push_vlan:0x8100,mod_vlan_vid:99,controller
+cookie=0xa dl_src=40:44:44:44:55:50 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],push_vlan:0x8100,mod_vlan_vid:99,controller
+cookie=0xa dl_src=40:44:44:44:55:51 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,controller
+cookie=0xa dl_src=40:44:44:44:55:52 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],load:3->OXM_OF_MPLS_TC[[]],push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,controller
+])
+AT_CHECK([ovs-ofctl --protocols=OpenFlow13 add-flows br0 flows.txt])
+
+dnl Modified MPLS controller action.
+dnl The input packet has a VLAN tag, but because we push an MPLS tag in
+dnl OF1.3 mode, we can no longer see it.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:44,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:44,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:44,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:44,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push a VLAN tag, then an MPLS tag in OF1.3 mode, so we
+dnl can only see the MPLS tag in the result.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:45,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:45,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:45,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:45,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet is vlan-tagged; we update this tag then
+dnl push an MPLS tag in OF1.3 mode. As such, we can only see the MPLS tag in
+dnl the result.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:46,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:46,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:46,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:46,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push a VLAN tag, then an MPLS tag in OF1.3 mode, so we
+dnl can only see the MPLS tag in the result.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:47,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:47,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:47,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:47,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet is vlan-tagged; we update this tag then
+dnl push an MPLS tag in OF1.3 mode. As such, we can only see the MPLS tag in
+dnl the result.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:48,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:48,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:48,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,vlan_tci=0x0000,dl_src=40:44:44:44:55:48,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push the MPLS tag before pushing a VLAN tag, so we see
+dnl both of these in the final flow.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:49,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=0,dl_src=40:44:44:44:55:49,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=0,dl_src=40:44:44:44:55:49,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=0,dl_src=40:44:44:44:55:49,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet in vlan-tagged, which should be stripped
+dnl before we push the MPLS and VLAN tags.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:50,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=0,dl_src=40:44:44:44:55:50,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=0,dl_src=40:44:44:44:55:50,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=0,dl_src=40:44:44:44:55:50,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push the MPLS tag before pushing a VLAN tag, so we see
+dnl both of these in the final flow.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:51,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:55:51,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:55:51,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:55:51,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet in vlan-tagged, which should be stripped
+dnl before we push the MPLS and VLAN tags.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:55:52,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:55:52,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:55:52,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:55:52,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=3,mpls_ttl=64,mpls_bos=1
+])
+
+AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
+AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort], [0], [dnl
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:44 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:45 actions=mod_vlan_vid:99,push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:46 actions=mod_vlan_vid:99,push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:47 actions=load:0x63->OXM_OF_VLAN_VID[[]],push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:48 actions=load:0x63->OXM_OF_VLAN_VID[[]],push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:49 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],mod_vlan_vid:99,CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:50 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],mod_vlan_vid:99,CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:51 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],load:0x63->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:55:52 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x3->OXM_OF_MPLS_TC[[]],load:0x63->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,CONTROLLER:65535
+NXST_FLOW reply:
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - fragment handling])
 OVS_VSWITCHD_START
 ADD_OF_PORTS([br0], [1], [2], [3], [4], [5], [6], [90])
-- 
1.8.4

^ permalink raw reply related

* [PATCH v2.44 4/5] datapath: Break out deacceleration portion of vlan_push
From: Simon Horman @ 2013-10-17  1:15 UTC (permalink / raw)
  To: dev, netdev, Jesse Gross, Ben Pfaff
  Cc: Pravin B Shelar, Ravi K, Isaku Yamahata, Joe Stringer
In-Reply-To: <1381972511-27221-1-git-send-email-horms@verge.net.au>

Break out deacceleration portion of vlan_push into vlan_put
so that it may be re-used by mpls_push.

For both vlan_push and mpls_push if there is an accelerated VLAN tag
present then it should be deaccelerated, adding it to the data of
the skb, before the new tag is added.

Signed-off-by: Simon Horman <horms@verge.net.au>

---

v2.41 - v2.44
* No change

v2.40
* As suggested by Jesse Gross
  + Simplify vlan_push by returning an error code
    rather than an error code encoded as a struct xkb_buff *

v2.39
* First post
---
 datapath/actions.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index 30ea1d2..d961e5d 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -105,22 +105,31 @@ static int pop_vlan(struct sk_buff *skb)
 	return 0;
 }
 
-static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vlan)
+/* push down current VLAN tag */
+static int put_vlan(struct sk_buff *skb)
 {
-	if (unlikely(vlan_tx_tag_present(skb))) {
-		u16 current_tag;
+	u16 current_tag = vlan_tx_tag_get(skb);
 
-		/* push down current VLAN tag */
-		current_tag = vlan_tx_tag_get(skb);
+	if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
+		return -ENOMEM;
 
-		if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
-			return -ENOMEM;
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_add(skb->csum, csum_partial(skb->data
+				+ (2 * ETH_ALEN), VLAN_HLEN, 0));
 
-		if (skb->ip_summed == CHECKSUM_COMPLETE)
-			skb->csum = csum_add(skb->csum, csum_partial(skb->data
-					+ (2 * ETH_ALEN), VLAN_HLEN, 0));
+	return 0;
+}
 
+static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vlan)
+{
+	if (unlikely(vlan_tx_tag_present(skb))) {
+		int err;
+
+		err = put_vlan(skb);
+		if (unlikely(err))
+			return err;
 	}
+
 	__vlan_hwaccel_put_tag(skb, vlan->vlan_tpid, ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT);
 	return 0;
 }
-- 
1.8.4

^ permalink raw reply related

* [PATCH v2.44 0/5] MPLS actions and matches
From: Simon Horman @ 2013-10-17  1:15 UTC (permalink / raw)
  To: dev, netdev, Jesse Gross, Ben Pfaff
  Cc: Pravin B Shelar, Ravi K, Isaku Yamahata, Joe Stringer

Hi,

This series implements MPLS actions and matches based on work by
Ravi K, Leo Alterman, Yamahata-san and Joe Stringer.

This series provides two changes

* Patches 1 - 3

  Provide user-space support for the VLAN/MPLS tag insertion order
  up to and including OpenFlow 1.2, and the different ordering
  specified from OpenFlow 1.3. In a nutshell the datapath always
  uses the OpenFlow 1.3 ordering, which is to always insert tags
  immediately after the L2 header, regardless of the presence of other
  tags. And ovs-vswtichd provides compatibility for the behaviour up
  to OpenFlow 1.2, which is that MPLS tags should follow VLAN tags
  if present.

  Patch 1 has been updated since v3.43.
  Patches 2 and 3 were last updated in v2.42.

  Ben, these are for you to review.

* Patches 4 and 5

  Adding basic MPLS action and match support to the kernel datapath

  These patches were last updated in v2.41.

  Jesse, these are for you to review.


Differences between v2.44 and v2.43:

* Rebase for the following changes:
  f47ea02 ("Set datapath mask bits when setting a flow field.")
  7fdb60a ("Add support for write-actions")
  7358063 ("odp-util: Elaborate the comment for odp_flow_format() function.")
* Correct final_vlan_tci and next_vlan_tci initialisation in xlate_actions__()


Differences between v2.43 and v2.42:

* As suggested by Ben Pfaff
  Move vlan state into struct xlate_ctx
  1. Add final_vlan_tci member to struct xlate_ctx instead of vlan_tci member
     struct xlate_xin.  This seems to be a better palace for it as it does
     not need to be accessible from the caller.
  2. Move local vlan_tci variable of do_xlate_actions() to be the
     next_vlan_tci member of strict xlate_ctx.  This allows for it to work
     across resubmit actions and goto table instructions.
* Code contributed by Ben Pfaff
  + Use enum for to control order of MPLS LSE insertion
    This makes the logic somewhat clearer
* Add a helper push_mpls_from_openflow() to consolidate
  the same logic that appears in three locations.


Differences between v2.42 and v2.41:

* Rebase for:
  + 0585f7a ("datapath: Simplify mega-flow APIs.")
  + a097c0b ("datapath: Restructure datapath.c and flow.c")
* As suggested by Jesse Gross
  + Take into account that push_mpls() will have freed the skb on error
  + Remove dubious !eth_p_mpls(skb->protocol) condition from push_mpls
    The !eth_p_mpls(skb->protocol) condition on setting inner_protocol
    has no effect. Its motivation was to ensure that inner_protocol was
    only set the first time that mpls_push occured. However this is already
    ensured by the !ovs_skb_get_inner_protocol(skb) condition.
  + Return -EINVAL instead of -ENOMEM from pop_mpls() if the skb is too short
  + Do not add @inner_protocol to kernel doc for struct ovs_skb_cb.
    The patch no longer adds an inner_protocol member to struct ovs_skb_cb
  + Do not add and set otherwise unsued inner_protocol variable in
    rpl_dev_queue_xmit()
* As suggested by Pravin Shelar
  + Implement compatibility code in existing rpl_skb_gso_segment
    rather than introducing to use rpl___skb_gso_segment


Differences between v2.41 and v2.40:

* As suggested by Ben Pfaff
  + Expand struct ofpact_reg_load to include a mpls_before_vlan field
    rather than using the compat field of the ofpact field of
    struct ofpact_reg_load.
  + Pass version to  ofpacts_pull_openflow11_actions and
    ofpacts_pull_openflow11_instructions.  This removes the invalid
    assumption that that these functions are passed a full message and are
    thus able to deduce the OpenFlow version.


Differences between v2.40 and v2.39:

* Rebase for:
  + New dev_queue_xmit compat code
  + Updated put_vlan()
  + Removal of mpls_depth field from struct flow
* As suggested by Jesse Gross
  + Remove bogus mac_len update from push_mpls()
  + Slightly simplify push_mpls() by using eth_hdr()
  + Remove dubious condition !eth_p_mpls(inner_protocol) on
    an skb being considered to be MPLS in netdev_send()
  + Only use compatibility code for MPLS GSO segmentation on kernels
    older than 3.11
  + Revamp setting of inner_protocol
    1. Do not unconditionally set inner_protocol to the value of
       skb->protocol in ovs_execute_actions().
    2. Initialise inner_protocol it to zero only if compatibility code is in
       use. In the case where compatibility code is not in use it will either
       be zero due since the allocation of the skb or some other value set
       by some other user.
    3. Conditionally set the inner_protocol in push_mpls() to the value of
       skb->protocol when entering push_mpls(). The condition is that
       inner_protocol is zero and the value of skb->protocol is not an MPLS
       ethernet type.
    - This new scheme:
      + Pushes logic to set inner_protocol closer to the case where it is
	needed.
      + Avoids over-writing values set by other users.
* As suggested by Pravin Shelar
  + Only set and restore skb->protocol in rpl___skb_gso_segment() in the
    case of MPLS
  + Add inner_protocol field to struct ovs_gso_cb instead of ovs_skb_cb.
    This moves compatibility code closer to where it is used
    and creates fewer differences with mainline.
* Update comment on mac_len updates in datapath/actions.c
* Remove HAVE_INNER_PROCOTOL and instead just check
  against kernel version 3.11 directly.
  HAVE_INNER_PROCOTOL is a hang-over from work done prior
  to the merge of inner_protocol into the kernel.
* Remove dubious condition !eth_p_mpls(inner_protocol) on
  using inner_protocol as the type in rpl_skb_network_protocol()
* Do not update type of features in rpl_dev_queue_xmit.
  Though arguably correct this is not an inherent part of
  the changes made by this patch.
* Use skb_cow_head() in push_mpls()
  + Call skb_cow_head(skb, MPLS_HLEN) instead of
    make_writable(skb, skb->mac_len) to ensure that there is enough head
    room to push an MPLS LSE regardless of whether the skb is cloned or not.
  + This is consistent with the behaviour of rpl__vlan_put_tag().
  + This is a fix for crashes reported when performing mpls_push
    with headroom less than 4. This problem was introduced in v3.36.
* Skip popping in mpls_pop if the skb is too short to contain an MPLS LSE


Differences between v2.39 and v2.38:

* Rebase for removal of vlan, checksum and skb->mark compat code
  - This includes adding adding a new patch,
    "[PATCH v2.39 6/7] datapath: Break out deacceleration portion of
    vlan_push" to allow re-use of some existing code.


Differences between v2.38 and v2.37:

* Rebase for SCTP support
* Refactor validate_tp_port() to iterate over eth_types rather
  than open-coding the loop. With the addition of SCTP this logic
  is now used three times.


Differences between v2.37 and v2.36:

* Rebase


Differences between v2.36 and v2.35:

* Rebase

* Do not add set_ethertype() to datapath/actions.c.
  As this patch has evolved this function had devolved into
  to sets of functionality wrapped into a single function with
  only one line of common code. Refactor things to simply
  open-code setting the ether type in the two locations where
  set_ethertype() was previously used. The aim here is to improve
  readability.

* Update setting skb->ethertype after mpls push and pop.
  - In the case of push_mpls it should be set unconditionally
    as in v2.35 the behaviour of this function to always push
    an MPLS LSE before any VLAN tags.
  - In the case of mpls_pop eth_p_mpls(skb->protocol) is a better
    test than skb->protocol != htons(ETH_P_8021Q) as it will give the
    correct behaviour in the presence of other VLAN ethernet types,
    for example 0x88a8 which is used by 802.1ad. Moreover, it seems
    correct to update the ethernet type if it was previously set
    according to the top-most MPLS LSE.

* Deaccelerate VLANs when pushing MPLS tags the
  - Since v2.35 MPLS push will insert an MPLS LSE before any VLAN tags.
    This means that if an accelerated tag is present it should be
    deaccelerated to ensure it ends up in the correct position.

* Update skb->mac_len in push_mpls() so that it will be correct
  when used by a subsequent call to pop_mpls().

  As things stand I do not believe this is strictly necessary as
  ovs-vswitchd will not send a pop MPLS action after a push MPLS action.
  However, I have added this in order to code more defensively as I believe
  that if such a sequence did occur it would be rather unobvious why
  it didn't work.

* Do not add skb_cow_head() call in push_mpls().
  It is unnecessary as there is a make_writable() call.
  This change was also made in v2.30 but some how the
  code regressed between then and v2.35.


Differences between v2.35 and v2.34:

* Add support for the tag ordering specified up until OpenFlow 1.2 and
  the ordering specified from OpenFlow 1.3.

* Correct error in datapath patch's handling of GSO in the presence
  of MPLS and absence of VLANs.


To aid review this series is available in git at:

git://github.com/horms/openvswitch.git devel/mpls-v2.44


Patch list and overall diffstat:

Joe Stringer (3):
  odp: Allow VLAN actions after MPLS actions
  ofp-actions: Add separate OpenFlow 1.3 action parser
  lib: Support pushing of MPLS LSE before or after VLAN tag

Simon Horman (2):
  datapath: Break out deacceleration portion of vlan_push
  datapath: Add basic MPLS support to kernel

 datapath/Modules.mk                             |   1 +
 datapath/actions.c                              | 158 ++++++++-
 datapath/datapath.c                             |   4 +-
 datapath/flow.c                                 |  29 ++
 datapath/flow.h                                 |  17 +-
 datapath/flow_netlink.c                         | 286 +++++++++++++--
 datapath/flow_netlink.h                         |   2 +-
 datapath/linux/compat/gso.c                     |  70 +++-
 datapath/linux/compat/gso.h                     |  41 +++
 datapath/linux/compat/include/linux/netdevice.h |   6 +-
 datapath/linux/compat/netdevice.c               |  10 +-
 datapath/mpls.h                                 |  15 +
 include/linux/openvswitch.h                     |   7 +-
 lib/flow.c                                      |   2 +-
 lib/odp-util.c                                  |   9 +-
 lib/odp-util.h                                  |   3 +-
 lib/ofp-actions.c                               |  91 ++++-
 lib/ofp-actions.h                               |  16 +
 lib/ofp-print.c                                 |   2 +-
 lib/ofp-util.c                                  |  24 +-
 lib/ofp-util.h                                  |   2 +-
 lib/packets.c                                   |  10 +-
 lib/packets.h                                   |   2 +-
 ofproto/ofproto-dpif-xlate.c                    | 134 +++++--
 tests/ofproto-dpif.at                           | 446 ++++++++++++++++++++++++
 utilities/ovs-ofctl.c                           |   8 +-
 26 files changed, 1275 insertions(+), 120 deletions(-)
 create mode 100644 datapath/mpls.h

-- 
1.8.4

^ permalink raw reply

* [PATCH v2.44 2/5] ofp-actions: Add separate OpenFlow 1.3 action parser
From: Simon Horman @ 2013-10-17  1:15 UTC (permalink / raw)
  To: dev, netdev, Jesse Gross, Ben Pfaff
  Cc: Pravin B Shelar, Ravi K, Isaku Yamahata, Joe Stringer
In-Reply-To: <1381972511-27221-1-git-send-email-horms@verge.net.au>

From: Joe Stringer <joe@wand.net.nz>

This patch adds new ofpact_from_openflow13() and
ofpacts_from_openflow13() functions parallel to the existing ofpact
handling code. In the OpenFlow 1.3 version, push_mpls is handled
differently, but all other actions are handled by the existing code.

In the case of push_mpls for OpenFlow 1.3 the new mpls_before_vlan field of
struct ofpact_push_mpls is set to true.  This will be used by a subsequent
patch to allow allow the correct VLAN+MPLS datapath behaviour to be
determined at odp translation time.

enum ofpact_mpls_position contributed by Ben Pfaff.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>

---

Ben, please feel free to add yourself as a co-author
as you wrote the enum ofpact_mpls_position portion.

v2.44
* No change

v2.43 [Ben Pfaff and Simon Horman]
* Code contributed by Ben Pfaff
  + Use enum for to control order of MPLS LSE insertion
    This makes the logic somewhat clearer
* Add a helper push_mpls_from_openflow() to consolidate
  the same logic that appears in three locations.

v2.42
* No change

v2.41 [Simon Horman]
* As suggested by Ben Pfaff
  + Expand struct ofpact_reg_load to include a mpls_before_vlan field
    rather than using the compat field of the ofpact field of
    struct ofpact_reg_load.
  + Pass version to  ofpacts_pull_openflow11_actions and
    ofpacts_pull_openflow11_instructions.  This removes the invalid
    assumption that that these functions are passed a full message and are
    thus able to deduce the OpenFlow version.

v2.36 - v2.40
* No change

v2.35 [Joe Stringer]
* First post
---
 lib/ofp-actions.c     | 91 ++++++++++++++++++++++++++++++++++++++++++++-------
 lib/ofp-actions.h     | 16 +++++++++
 lib/ofp-print.c       |  2 +-
 lib/ofp-util.c        | 24 ++++++++------
 lib/ofp-util.h        |  2 +-
 utilities/ovs-ofctl.c |  8 ++---
 6 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 06f9f6b..430d969 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -238,6 +238,22 @@ sample_from_openflow(const struct nx_action_sample *nas,
 }
 
 static enum ofperr
+push_mpls_from_openflow(ovs_be16 ethertype, enum ofpact_mpls_position position,
+                        struct ofpbuf *out)
+{
+    struct ofpact_push_mpls *oam;
+
+    if (!eth_type_mpls(ethertype)) {
+        return OFPERR_OFPBAC_BAD_ARGUMENT;
+    }
+    oam = ofpact_put_PUSH_MPLS(out);
+    oam->ethertype = ethertype;
+    oam->position = position;
+
+    return 0;
+}
+
+static enum ofperr
 decode_nxast_action(const union ofp_action *a, enum ofputil_action_code *code)
 {
     const struct nx_action_header *nah = (const struct nx_action_header *) a;
@@ -430,10 +446,8 @@ ofpact_from_nxast(const union ofp_action *a, enum ofputil_action_code code,
 
     case OFPUTIL_NXAST_PUSH_MPLS: {
         struct nx_action_push_mpls *nxapm = (struct nx_action_push_mpls *)a;
-        if (!eth_type_mpls(nxapm->ethertype)) {
-            return OFPERR_OFPBAC_BAD_ARGUMENT;
-        }
-        ofpact_put_PUSH_MPLS(out)->ethertype = nxapm->ethertype;
+        error = push_mpls_from_openflow(nxapm->ethertype,
+                                        OFPACT_MPLS_AFTER_VLAN, out);
         break;
     }
 
@@ -844,10 +858,8 @@ ofpact_from_openflow11(const union ofp_action *a, struct ofpbuf *out)
 
     case OFPUTIL_OFPAT11_PUSH_MPLS: {
         struct ofp11_action_push *oap = (struct ofp11_action_push *)a;
-        if (!eth_type_mpls(oap->ethertype)) {
-            return OFPERR_OFPBAC_BAD_ARGUMENT;
-        }
-        ofpact_put_PUSH_MPLS(out)->ethertype = oap->ethertype;
+        error = push_mpls_from_openflow(oap->ethertype,
+                                        OFPACT_MPLS_AFTER_VLAN, out);
         break;
     }
 
@@ -1108,6 +1120,35 @@ ofpacts_from_openflow11_for_action_set(const union ofp_action *in,
 }
 
 \f
+static enum ofperr
+ofpact_from_openflow13(const union ofp_action *a, struct ofpbuf *out)
+{
+    enum ofputil_action_code code;
+    enum ofperr error;
+
+    error = decode_openflow11_action(a, &code);
+    if (error) {
+        return error;
+    }
+
+    if (code == OFPUTIL_OFPAT11_PUSH_MPLS) {
+        struct ofp11_action_push *oap = (struct ofp11_action_push *)a;
+        error = push_mpls_from_openflow(oap->ethertype,
+                                        OFPACT_MPLS_BEFORE_VLAN, out);
+    } else {
+        error = ofpact_from_openflow11(a, out);
+    }
+
+    return error;
+}
+
+static enum ofperr
+ofpacts_from_openflow13(const union ofp_action *in, size_t n_in,
+                        struct ofpbuf *out)
+{
+    return ofpacts_from_openflow(in, n_in, out, ofpact_from_openflow13);
+}
+\f
 /* OpenFlow 1.1 instructions. */
 
 #define DEFINE_INST(ENUM, STRUCT, EXTENSIBLE, NAME)             \
@@ -1314,11 +1355,14 @@ get_actions_from_instruction(const struct ofp11_instruction *inst,
     *n_actions = (ntohs(inst->len) - sizeof *inst) / OFP11_INSTRUCTION_ALIGN;
 }
 
-/* Attempts to convert 'actions_len' bytes of OpenFlow 1.1 actions from the
+/* Attempts to convert 'actions_len' bytes of OpenFlow actions from the
  * front of 'openflow' into ofpacts.  On success, replaces any existing content
  * in 'ofpacts' by the converted ofpacts; on failure, clears 'ofpacts'.
  * Returns 0 if successful, otherwise an OpenFlow error.
  *
+ * Actions are processed according to their OpenFlow version which
+ * is provided in the 'version' parameter.
+ *
  * In most places in OpenFlow 1.1 and 1.2, actions appear encapsulated in
  * instructions, so you should call ofpacts_pull_openflow11_instructions()
  * instead of this function.
@@ -1330,15 +1374,27 @@ get_actions_from_instruction(const struct ofp11_instruction *inst,
  * valid in a specific context. */
 enum ofperr
 ofpacts_pull_openflow11_actions(struct ofpbuf *openflow,
+                                enum ofp_version version,
                                 unsigned int actions_len,
                                 struct ofpbuf *ofpacts)
 {
-    return ofpacts_pull_actions(openflow, actions_len, ofpacts,
-                                ofpacts_from_openflow11);
+    switch (version) {
+    case OFP10_VERSION:
+    case OFP11_VERSION:
+    case OFP12_VERSION:
+        return ofpacts_pull_actions(openflow, actions_len, ofpacts,
+                                    ofpacts_from_openflow11);
+    case OFP13_VERSION:
+        return ofpacts_pull_actions(openflow, actions_len, ofpacts,
+                                    ofpacts_from_openflow13);
+    default:
+        NOT_REACHED();
+    }
 }
 
 enum ofperr
 ofpacts_pull_openflow11_instructions(struct ofpbuf *openflow,
+                                     enum ofp_version version,
                                      unsigned int instructions_len,
                                      struct ofpbuf *ofpacts)
 {
@@ -1389,7 +1445,18 @@ ofpacts_pull_openflow11_instructions(struct ofpbuf *openflow,
 
         get_actions_from_instruction(insts[OVSINST_OFPIT11_APPLY_ACTIONS],
                                      &actions, &n_actions);
-        error = ofpacts_from_openflow11(actions, n_actions, ofpacts);
+        switch (version) {
+        case OFP10_VERSION:
+        case OFP11_VERSION:
+        case OFP12_VERSION:
+            error = ofpacts_from_openflow11(actions, n_actions, ofpacts);
+            break;
+        case OFP13_VERSION:
+            error = ofpacts_from_openflow13(actions, n_actions, ofpacts);
+            break;
+        default:
+            NOT_REACHED();
+        }
         if (error) {
             goto exit;
         }
diff --git a/lib/ofp-actions.h b/lib/ofp-actions.h
index 5996651..d6aabf9 100644
--- a/lib/ofp-actions.h
+++ b/lib/ofp-actions.h
@@ -327,12 +327,26 @@ struct ofpact_reg_load {
     union mf_subvalue subvalue; /* Least-significant bits are used. */
 };
 
+/* The position in the packet at which to insert an MPLS header.
+ *
+ * Used NXAST_PUSH_MPLS, OFPAT11_PUSH_MPLS. */
+enum ofpact_mpls_position {
+    /* Add the MPLS LSE after the Ethernet header but before any VLAN tags.
+     * OpenFlow 1.3+ requires this behavior. */
+   OFPACT_MPLS_BEFORE_VLAN,
+
+   /* Add the MPLS LSE after the Ethernet header and any VLAN tags.
+    * OpenFlow 1.1 and 1.2 require this behavior. */
+   OFPACT_MPLS_AFTER_VLAN
+};
+
 /* OFPACT_PUSH_VLAN/MPLS/PBB
  *
  * Used for NXAST_PUSH_MPLS, OFPAT11_PUSH_MPLS. */
 struct ofpact_push_mpls {
     struct ofpact ofpact;
     ovs_be16 ethertype;
+    enum ofpact_mpls_position position;
 };
 
 /* OFPACT_POP_MPLS
@@ -524,9 +538,11 @@ enum ofperr ofpacts_pull_openflow10(struct ofpbuf *openflow,
                                     unsigned int actions_len,
                                     struct ofpbuf *ofpacts);
 enum ofperr ofpacts_pull_openflow11_actions(struct ofpbuf *openflow,
+                                            enum ofp_version version,
                                             unsigned int actions_len,
                                             struct ofpbuf *ofpacts);
 enum ofperr ofpacts_pull_openflow11_instructions(struct ofpbuf *openflow,
+                                                 enum ofp_version version,
                                                  unsigned int instructions_len,
                                                  struct ofpbuf *ofpacts);
 enum ofperr ofpacts_check(const struct ofpact[], size_t ofpacts_len,
diff --git a/lib/ofp-print.c b/lib/ofp-print.c
index a31de9d..9a84645 100644
--- a/lib/ofp-print.c
+++ b/lib/ofp-print.c
@@ -2228,7 +2228,7 @@ ofp_print_group_desc(struct ds *s, const struct ofp_header *oh)
         struct ofputil_group_desc gd;
         int retval;
 
-        retval = ofputil_decode_group_desc_reply(&gd, &b);
+        retval = ofputil_decode_group_desc_reply(&gd, &b, oh->version);
         if (retval) {
             if (retval != EOF) {
                 ds_put_cstr(s, " ***parse error***");
diff --git a/lib/ofp-util.c b/lib/ofp-util.c
index 173b534..570447a 100644
--- a/lib/ofp-util.c
+++ b/lib/ofp-util.c
@@ -1504,7 +1504,8 @@ ofputil_decode_flow_mod(struct ofputil_flow_mod *fm,
             return error;
         }
 
-        error = ofpacts_pull_openflow11_instructions(&b, b.size, ofpacts);
+        error = ofpacts_pull_openflow11_instructions(&b, oh->version,
+                                                     b.size, ofpacts);
         if (error) {
             return error;
         }
@@ -2360,7 +2361,8 @@ ofputil_decode_flow_stats_reply(struct ofputil_flow_stats *fs,
             return EINVAL;
         }
 
-        if (ofpacts_pull_openflow11_instructions(msg, length - sizeof *ofs -
+        if (ofpacts_pull_openflow11_instructions(msg, oh->version,
+                                                 length - sizeof *ofs -
                                                  padded_match_len, ofpacts)) {
             VLOG_WARN_RL(&bad_ofmsg_rl, "OFPST_FLOW reply bad instructions");
             return EINVAL;
@@ -3092,7 +3094,8 @@ ofputil_decode_packet_out(struct ofputil_packet_out *po,
             return error;
         }
 
-        error = ofpacts_pull_openflow11_actions(&b, ntohs(opo->actions_len),
+        error = ofpacts_pull_openflow11_actions(&b, oh->version,
+                                                ntohs(opo->actions_len),
                                                 ofpacts);
         if (error) {
             return error;
@@ -5674,8 +5677,8 @@ ofputil_append_group_desc_reply(const struct ofputil_group_desc *gds,
 }
 
 static enum ofperr
-ofputil_pull_buckets(struct ofpbuf *msg, size_t buckets_length,
-                     struct list *buckets)
+ofputil_pull_buckets(struct ofpbuf *msg, enum ofp_version version,
+                     size_t buckets_length, struct list *buckets)
 {
     struct ofp11_bucket *ob;
 
@@ -5708,8 +5711,8 @@ ofputil_pull_buckets(struct ofpbuf *msg, size_t buckets_length,
         buckets_length -= ob_len;
 
         ofpbuf_init(&ofpacts, 0);
-        error = ofpacts_pull_openflow11_actions(msg, ob_len - sizeof *ob,
-                                                &ofpacts);
+        error = ofpacts_pull_openflow11_actions(msg, version,
+                                                ob_len - sizeof *ob, &ofpacts);
         if (error) {
             ofpbuf_uninit(&ofpacts);
             ofputil_bucket_list_destroy(buckets);
@@ -5745,7 +5748,7 @@ ofputil_pull_buckets(struct ofpbuf *msg, size_t buckets_length,
  * otherwise a positive errno value. */
 int
 ofputil_decode_group_desc_reply(struct ofputil_group_desc *gd,
-                                struct ofpbuf *msg)
+                                struct ofpbuf *msg, enum ofp_version version)
 {
     struct ofp11_group_desc_stats *ogds;
     size_t length;
@@ -5774,7 +5777,8 @@ ofputil_decode_group_desc_reply(struct ofputil_group_desc *gd,
         return OFPERR_OFPBRC_BAD_LEN;
     }
 
-    return ofputil_pull_buckets(msg, length - sizeof *ogds, &gd->buckets);
+    return ofputil_pull_buckets(msg, version, length - sizeof *ogds,
+                                &gd->buckets);
 }
 
 /* Converts abstract group mod 'gm' into a message for OpenFlow version
@@ -5857,7 +5861,7 @@ ofputil_decode_group_mod(const struct ofp_header *oh,
     gm->type = ogm->type;
     gm->group_id = ntohl(ogm->group_id);
 
-    return ofputil_pull_buckets(&msg, msg.size, &gm->buckets);
+    return ofputil_pull_buckets(&msg, oh->version, msg.size, &gm->buckets);
 }
 
 /* Parse a queue status request message into 'oqsr'.
diff --git a/lib/ofp-util.h b/lib/ofp-util.h
index d5f34d7..5fa8fee 100644
--- a/lib/ofp-util.h
+++ b/lib/ofp-util.h
@@ -973,7 +973,7 @@ int ofputil_decode_group_stats_reply(struct ofpbuf *,
                                      struct ofputil_group_stats *);
 
 int ofputil_decode_group_desc_reply(struct ofputil_group_desc *,
-                                    struct ofpbuf *);
+                                    struct ofpbuf *, enum ofp_version);
 
 void ofputil_append_group_desc_reply(const struct ofputil_group_desc *,
                                      struct list *buckets,
diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
index 2b89755..630591c 100644
--- a/utilities/ovs-ofctl.c
+++ b/utilities/ovs-ofctl.c
@@ -2970,8 +2970,8 @@ ofctl_parse_ofp11_actions(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
         /* Convert to ofpacts. */
         ofpbuf_init(&ofpacts, 0);
         size = of11_in.size;
-        error = ofpacts_pull_openflow11_actions(&of11_in, of11_in.size,
-                                                &ofpacts);
+        error = ofpacts_pull_openflow11_actions(&of11_in, OFP11_VERSION,
+                                                of11_in.size, &ofpacts);
         if (error) {
             printf("bad OF1.1 actions: %s\n\n", ofperr_get_name(error));
             ofpbuf_uninit(&ofpacts);
@@ -3039,8 +3039,8 @@ ofctl_parse_ofp11_instructions(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
         /* Convert to ofpacts. */
         ofpbuf_init(&ofpacts, 0);
         size = of11_in.size;
-        error = ofpacts_pull_openflow11_instructions(&of11_in, of11_in.size,
-                                                     &ofpacts);
+        error = ofpacts_pull_openflow11_instructions(&of11_in, OFP11_VERSION,
+                                                     of11_in.size, &ofpacts);
         if (!error) {
             /* Verify actions. */
             struct flow flow;
-- 
1.8.4

^ permalink raw reply related

* [PATCH v2.44 1/5] odp: Allow VLAN actions after MPLS actions
From: Simon Horman @ 2013-10-17  1:15 UTC (permalink / raw)
  To: dev, netdev, Jesse Gross, Ben Pfaff
  Cc: Pravin B Shelar, Ravi K, Isaku Yamahata, Joe Stringer
In-Reply-To: <1381972511-27221-1-git-send-email-horms@verge.net.au>

From: Joe Stringer <joe@wand.net.nz>

OpenFlow 1.1 and 1.2, and 1.3 differ in their handling of MPLS actions in the
presence of VLAN tags. To allow correct behaviour to be committed in
each situation, this patch adds a second round of VLAN tag action
handling to commit_odp_actions(), which occurs after MPLS actions. This
is implemented with a new field in 'struct xlate_in' called 'vlan_tci'.

When an push_mpls action is composed, the flow's current VLAN state is
stored into xin->vlan_tci, and flow->vlan_tci is set to 0 (pop_vlan). If
a VLAN tag is present, it is stripped; if not, then there is no change.
Any later modifications to the VLAN state is written to xin->vlan_tci.
When committing the actions, flow->vlan_tci is used before MPLS actions,
and xin->vlan_tci is used afterwards. This retains the current datapath
behaviour, but allows VLAN actions to be applied in a more flexible
manner.

Both before and after this patch MPLS LSEs are pushed onto a packet after
any VLAN tags that may be present. This is the behaviour described in
OpenFlow 1.1 and 1.2. OpenFlow 1.3 specifies that MPLS LSEs should be
pushed onto a packet before any VLAN tags that are present. Support
for this will be added by a subsequent patch that makes use of
the infrastructure added by this patch.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>

---

v2.44 [Simon Horman]
* Rebase for the following changes:
  f47ea02 ("Set datapath mask bits when setting a flow field.")
  7fdb60a ("Add support for write-actions")
  7358063 ("odp-util: Elaborate the comment for odp_flow_format() function.")
* Correct final_vlan_tci and next_vlan_tci initialisation in xlate_actions__()

v2.43 [Simon Horman]
* As suggested by Ben Pfaff
  Move vlan state into struct xlate_ctx

  1. Add final_vlan_tci member to struct xlate_ctx instead of vlan_tci member
     struct xlate_xin.  This seems to be a better palace for it as it does
     not need to be accessible from the caller.

  2. Move local vlan_tci variable of do_xlate_actions() to be the
     next_vlan_tci member of strict xlate_ctx.  This allows for it to work
     across resubmit actions and goto table instructions.

v2.42
* No change

v2.41 [Joe Stringer via Simon Horman]
* Rework comments to make things a little clearer

v2.40 [Simon Horman]
* Rebase for removal of mpls_depth from struct flow

v2.38 - v2.39
* No change

v2.37
* Rebase

v2.36
* No change

v2.5 [Joe Stringer]
* First post
---
 lib/odp-util.c               |   9 +-
 lib/odp-util.h               |   3 +-
 ofproto/ofproto-dpif-xlate.c | 125 +++++++++++++++++++++-----
 tests/ofproto-dpif.at        | 209 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 323 insertions(+), 23 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 6875e01..19a50a8 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -3639,11 +3639,15 @@ commit_set_pkt_mark_action(const struct flow *flow, struct flow *base,
  * used as part of the action.
  *
  * Returns a reason to force processing the flow's packets into the userspace
- * slow path, if there is one, otherwise 0. */
+ * slow path, if there is one, otherwise 0.
+ *
+ * VLAN actions may be committed twice; If vlan_tci in 'flow' differs from the
+ * one in 'base', then it is committed before MPLS actions. If 'final_vlan_tci'
+ * differs from 'flow->vlan_tci', it is committed afterwards. */
 enum slow_path_reason
 commit_odp_actions(const struct flow *flow, struct flow *base,
                    struct ofpbuf *odp_actions, struct flow_wildcards *wc,
-                   int *mpls_depth_delta)
+                   int *mpls_depth_delta, ovs_be16 final_vlan_tci)
 {
     enum slow_path_reason slow;
 
@@ -3656,6 +3660,7 @@ commit_odp_actions(const struct flow *flow, struct flow *base,
      * that it is no longer IP and thus nw and port actions are no longer valid.
      */
     commit_mpls_action(flow, base, odp_actions, wc, mpls_depth_delta);
+    commit_vlan_action(final_vlan_tci, base, odp_actions, wc);
     commit_set_priority_action(flow, base, odp_actions, wc);
     commit_set_pkt_mark_action(flow, base, odp_actions, wc);
 
diff --git a/lib/odp-util.h b/lib/odp-util.h
index 821b2c4..3fff988 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -175,7 +175,8 @@ enum slow_path_reason commit_odp_actions(const struct flow *,
                                          struct flow *base,
                                          struct ofpbuf *odp_actions,
                                          struct flow_wildcards *wc,
-                                         int *mpls_depth_delta);
+                                         int *mpls_depth_delta,
+                                         ovs_be16 final_vlan_tci);
 \f
 /* ofproto-dpif interface.
  *
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7371750..3a71ba8 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -180,6 +180,36 @@ struct xlate_ctx {
     uint16_t user_cookie_offset;/* Used for user_action_cookie fixup. */
     bool exit;                  /* No further actions should be processed. */
 
+    /* The final vlan_tci state.
+     *
+     * The value of the vlan TCI prior to the committing of ODP MPLS
+     * actions should be stored in 'xin->flow->vlan_tci'.
+     *
+     * The final value of the VLAN TCI should be stored in 'vlan_tci'. And
+     * is if the value of 'vlan_tci' and 'xin->flow->vlan_tci' differ then
+     * VLAN ODP actions will be committed after any MPLS actions regardless
+     * of whether VLAN actions were also committed before the MPLS actions or
+     * not.
+     *
+     * This mechanism allows a VLAN tag to be popped before pushing
+     * an MPLS LSE and then the same VLAN tag pushed after pushing
+     * the MPLS LSE. In this way it is possible to push an MPLS LSE
+     * after an existing VLAN tag. Moreover this mechanism allows
+     * the order in which VLAN tags and MPLS LSEs are pushed. */
+    ovs_be16 final_vlan_tci;
+
+    /* The next vlan_tci state.
+     *
+     * This value this variable points to updated each time an
+     * action updates the VLAN tci.
+     *
+     * This variable initially points to 'xin->flow->vlan_tci' so that ODP
+     * VLAN actions are committed before any MPLS actions. When an MPLS
+     * action is composed 'next_vlan_tci' is updated to point to
+     * 'final_vlan_tci'. This causes subsequent VLAN actions to be
+     * committed after MPLS actions. */
+    ovs_be16 *next_vlan_tci;
+
     /* OpenFlow 1.1+ action set.
      *
      * 'action_set' accumulates "struct ofpact"s added by OFPACT_WRITE_ACTIONS.
@@ -1000,10 +1030,11 @@ static void
 output_normal(struct xlate_ctx *ctx, const struct xbundle *out_xbundle,
               uint16_t vlan)
 {
-    ovs_be16 *flow_tci = &ctx->xin->flow.vlan_tci;
+    ovs_be16 *flow_tci = &ctx->final_vlan_tci;
     uint16_t vid;
     ovs_be16 tci, old_tci;
     struct xport *xport;
+    bool flow_tci_equal_to_xin = (*flow_tci == ctx->xin->flow.vlan_tci);
 
     vid = output_vlan_to_vid(out_xbundle, vlan);
     if (list_is_empty(&out_xbundle->xports)) {
@@ -1034,9 +1065,15 @@ output_normal(struct xlate_ctx *ctx, const struct xbundle *out_xbundle,
         }
     }
     *flow_tci = tci;
+    if (flow_tci_equal_to_xin) {
+        ctx->xin->flow.vlan_tci = tci;
+    }
 
     compose_output_action(ctx, xport->ofp_port);
     *flow_tci = old_tci;
+    if (flow_tci_equal_to_xin) {
+        ctx->xin->flow.vlan_tci = old_tci;
+    }
 }
 
 /* A VM broadcasts a gratuitous ARP to indicate that it has resumed after
@@ -1269,7 +1306,7 @@ xlate_normal(struct xlate_ctx *ctx)
 
     /* Drop malformed frames. */
     if (flow->dl_type == htons(ETH_TYPE_VLAN) &&
-        !(flow->vlan_tci & htons(VLAN_CFI))) {
+        !(ctx->final_vlan_tci & htons(VLAN_CFI))) {
         if (ctx->xin->packet != NULL) {
             static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
             VLOG_WARN_RL(&rl, "bridge %s: dropping packet with partial "
@@ -1293,7 +1330,7 @@ xlate_normal(struct xlate_ctx *ctx)
     }
 
     /* Check VLAN. */
-    vid = vlan_tci_to_vid(flow->vlan_tci);
+    vid = vlan_tci_to_vid(ctx->final_vlan_tci);
     if (!input_vid_is_valid(vid, in_xbundle, ctx->xin->packet != NULL)) {
         xlate_report(ctx, "disallowed VLAN VID for this input port, dropping");
         return;
@@ -1551,7 +1588,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
     const struct xport *xport = get_ofp_port(ctx->xbridge, ofp_port);
     struct flow_wildcards *wc = &ctx->xout->wc;
     struct flow *flow = &ctx->xin->flow;
-    ovs_be16 flow_vlan_tci;
+    ovs_be16 flow_vlan_tci, vlan_tci;
     uint32_t flow_pkt_mark;
     uint8_t flow_nw_tos;
     odp_port_t out_port, odp_port;
@@ -1620,6 +1657,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
     }
 
     flow_vlan_tci = flow->vlan_tci;
+    vlan_tci = ctx->final_vlan_tci;
     flow_pkt_mark = flow->pkt_mark;
     flow_nw_tos = flow->nw_tos;
 
@@ -1659,12 +1697,13 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
             wc->masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
         }
         vlandev_port = vsp_realdev_to_vlandev(ctx->xbridge->ofproto, ofp_port,
-                                              flow->vlan_tci);
+                                              ctx->final_vlan_tci);
         if (vlandev_port == ofp_port) {
             out_port = odp_port;
         } else {
             out_port = ofp_port_to_odp_port(ctx->xbridge, vlandev_port);
             flow->vlan_tci = htons(0);
+            ctx->final_vlan_tci = htons(0);
         }
     }
 
@@ -1672,7 +1711,8 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
         ctx->xout->slow |= commit_odp_actions(flow, &ctx->base_flow,
                                               &ctx->xout->odp_actions,
                                               &ctx->xout->wc,
-                                              &ctx->mpls_depth_delta);
+                                              &ctx->mpls_depth_delta,
+                                              ctx->final_vlan_tci);
         nl_msg_put_odp_port(&ctx->xout->odp_actions, OVS_ACTION_ATTR_OUTPUT,
                             out_port);
 
@@ -1684,6 +1724,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
  out:
     /* Restore flow */
     flow->vlan_tci = flow_vlan_tci;
+    ctx->final_vlan_tci = vlan_tci;
     flow->pkt_mark = flow_pkt_mark;
     flow->nw_tos = flow_nw_tos;
 }
@@ -1838,7 +1879,8 @@ execute_controller_action(struct xlate_ctx *ctx, int len,
     ctx->xout->slow |= commit_odp_actions(&ctx->xin->flow, &ctx->base_flow,
                                           &ctx->xout->odp_actions,
                                           &ctx->xout->wc,
-                                          &ctx->mpls_depth_delta);
+                                          &ctx->mpls_depth_delta,
+                                          ctx->final_vlan_tci);
 
     odp_execute_actions(NULL, packet, &key, ctx->xout->odp_actions.data,
                         ctx->xout->odp_actions.size, NULL, NULL);
@@ -2227,7 +2269,8 @@ xlate_sample_action(struct xlate_ctx *ctx,
   ctx->xout->slow |= commit_odp_actions(&ctx->xin->flow, &ctx->base_flow,
                                         &ctx->xout->odp_actions,
                                         &ctx->xout->wc,
-                                        &ctx->mpls_depth_delta);
+                                        &ctx->mpls_depth_delta,
+                                        ctx->final_vlan_tci);
 
   compose_flow_sample_cookie(os->probability, os->collector_set_id,
                              os->obs_domain_id, os->obs_point_id, &cookie);
@@ -2276,6 +2319,18 @@ xlate_action_set(struct xlate_ctx *ctx)
 }
 
 static void
+vlan_tci_restore(struct xlate_ctx *ctx, ovs_be16 orig_tci)
+{
+    /* If MPLS actions were executed after vlan actions then
+     * copy the final vlan_tci out and restore the intermediate VLAN state. */
+    if (ctx->xin->flow.vlan_tci != orig_tci &&
+        ctx->next_vlan_tci == &ctx->final_vlan_tci) {
+        ctx->final_vlan_tci = ctx->xin->flow.vlan_tci;
+        ctx->xin->flow.vlan_tci = orig_tci;
+    }
+}
+
+static void
 do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
                  struct xlate_ctx *ctx)
 {
@@ -2293,6 +2348,9 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
             break;
         }
 
+        /* Update the final vlan state to match the current state. */
+        ctx->final_vlan_tci = *ctx->next_vlan_tci;
+
         switch (a->type) {
         case OFPACT_OUTPUT:
             xlate_output_action(ctx, ofpact_get_OUTPUT(a)->port,
@@ -2316,28 +2374,28 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
 
         case OFPACT_SET_VLAN_VID:
             wc->masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
-            flow->vlan_tci &= ~htons(VLAN_VID_MASK);
-            flow->vlan_tci |= (htons(ofpact_get_SET_VLAN_VID(a)->vlan_vid)
-                               | htons(VLAN_CFI));
+            *ctx->next_vlan_tci &= ~htons(VLAN_VID_MASK);
+            *ctx->next_vlan_tci |= (htons(ofpact_get_SET_VLAN_VID(a)->vlan_vid)
+                          | htons(VLAN_CFI));
             break;
 
         case OFPACT_SET_VLAN_PCP:
-            wc->masks.vlan_tci |= htons(VLAN_PCP_MASK | VLAN_CFI);
-            flow->vlan_tci &= ~htons(VLAN_PCP_MASK);
-            flow->vlan_tci |=
+            wc->masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
+            *ctx->next_vlan_tci &= ~htons(VLAN_PCP_MASK);
+            *ctx->next_vlan_tci |=
                 htons((ofpact_get_SET_VLAN_PCP(a)->vlan_pcp << VLAN_PCP_SHIFT)
                       | VLAN_CFI);
             break;
 
         case OFPACT_STRIP_VLAN:
             memset(&wc->masks.vlan_tci, 0xff, sizeof wc->masks.vlan_tci);
-            flow->vlan_tci = htons(0);
+            *ctx->next_vlan_tci = htons(0);
             break;
 
         case OFPACT_PUSH_VLAN:
             /* XXX 802.1AD(QinQ) */
             memset(&wc->masks.vlan_tci, 0xff, sizeof wc->masks.vlan_tci);
-            flow->vlan_tci = htons(VLAN_CFI);
+            *ctx->next_vlan_tci = htons(VLAN_CFI);
             break;
 
         case OFPACT_SET_ETH_SRC:
@@ -2405,29 +2463,54 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
             flow->skb_priority = ctx->orig_skb_priority;
             break;
 
-        case OFPACT_REG_MOVE:
+        case OFPACT_REG_MOVE: {
+            ovs_be16 orig_tci = flow->vlan_tci;
             nxm_execute_reg_move(ofpact_get_REG_MOVE(a), flow, wc);
+            vlan_tci_restore(ctx, orig_tci);
             break;
+        }
 
-        case OFPACT_REG_LOAD:
+        case OFPACT_REG_LOAD: {
+            ovs_be16 orig_tci = flow->vlan_tci;
             nxm_execute_reg_load(ofpact_get_REG_LOAD(a), flow, wc);
+            vlan_tci_restore(ctx, orig_tci);
             break;
+        }
 
-        case OFPACT_STACK_PUSH:
+        case OFPACT_STACK_PUSH: {
+            ovs_be16 orig_tci = flow->vlan_tci;
+            flow->vlan_tci = *ctx->next_vlan_tci;
             nxm_execute_stack_push(ofpact_get_STACK_PUSH(a), flow, wc,
                                    &ctx->stack);
+            flow->vlan_tci = orig_tci;
             break;
+        }
 
-        case OFPACT_STACK_POP:
+        case OFPACT_STACK_POP: {
+            ovs_be16 orig_tci = flow->vlan_tci;
             nxm_execute_stack_pop(ofpact_get_STACK_POP(a), flow, wc,
                                   &ctx->stack);
+            vlan_tci_restore(ctx, orig_tci);
             break;
+        }
 
         case OFPACT_PUSH_MPLS:
             if (compose_mpls_push_action(ctx,
                                          ofpact_get_PUSH_MPLS(a)->ethertype)) {
                 return;
             }
+
+            /* Save and pop any existing VLAN tags. They will be pushed
+             * back onto the packet after pushing the MPLS LSE. The overall
+             * effect is to push the MPLS LSE after any VLAN tags that may
+             * be present. This is the behaviour described for OpenFlow 1.1
+             * and 1.2. This code needs to be enhanced to make this
+             * conditional to also and support pushing the MPLS LSE before
+             * any VLAN tags that may be present, the behaviour described
+             * for OpenFlow 1.3. */
+            ctx->final_vlan_tci = *ctx->next_vlan_tci;
+            flow->vlan_tci = htons(0);
+            ctx->next_vlan_tci = &ctx->final_vlan_tci;
             break;
 
         case OFPACT_POP_MPLS:
@@ -2768,6 +2851,8 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out *xout)
     ctx.table_id = 0;
     ctx.exit = false;
     ctx.mpls_depth_delta = 0;
+    ctx.final_vlan_tci = ctx.xin->flow.vlan_tci;
+    ctx.next_vlan_tci = &ctx.xin->flow.vlan_tci;
 
     if (!xin->ofpacts && !ctx.rule) {
         rule_dpif_lookup(ctx.xbridge->ofproto, flow, wc, &rule);
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 3630cda..bc9bcf4 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -965,6 +965,215 @@ done
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([ofproto-dpif - OF1.2 VLAN+MPLS handling])
+OVS_VSWITCHD_START([dnl
+   add-port br0 p1 -- set Interface p1 type=dummy
+])
+ON_EXIT([kill `cat ovs-ofctl.pid`])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_DATA([flows.txt], [dnl
+cookie=0xa dl_src=40:44:44:44:54:50 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],push_vlan:0x8100,mod_vlan_vid:99,mod_vlan_pcp:1,controller
+cookie=0xa dl_src=40:44:44:44:54:51 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],push_vlan:0x8100,mod_vlan_vid:99,mod_vlan_pcp:1,controller
+cookie=0xa dl_src=40:44:44:44:54:52 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,controller
+cookie=0xa dl_src=40:44:44:44:54:53 actions=push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,controller
+cookie=0xa dl_src=40:44:44:44:54:54 actions=push_vlan:0x8100,mod_vlan_vid:99,mod_vlan_pcp:1,push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],controller
+cookie=0xa dl_src=40:44:44:44:54:55 actions=push_vlan:0x8100,mod_vlan_vid:99,mod_vlan_pcp:1,push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],controller
+cookie=0xa dl_src=40:44:44:44:54:56 actions=push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],controller
+cookie=0xa dl_src=40:44:44:44:54:57 actions=push_vlan:0x8100,load:99->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,push_mpls:0x8847,load:10->OXM_OF_MPLS_LABEL[[]],controller
+])
+AT_CHECK([ovs-ofctl --protocols=OpenFlow12 add-flows br0 flows.txt])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push the MPLS tag before pushing a VLAN tag, so we see
+dnl both of these in the final flow
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:50,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:50,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:50,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:50,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet in vlan-tagged, which should be stripped
+dnl before we push the MPLS and VLAN tags.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:51,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:51,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:51,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:51,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push the MPLS tag before pushing a VLAN tag, so we see
+dnl both of these in the final flow
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:52,dst=52:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:52,dl_dst=52:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:52,dl_dst=52:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:52,dl_dst=52:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet in vlan-tagged, which should be stripped
+dnl before we push the MPLS and VLAN tags.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:53,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:53,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:53,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:53,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push the VLAN tag before pushing a MPLS tag, but these
+dnl actions are reordered, so we see both of these in the final flow.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:54,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:54,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:54,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:54,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet in vlan-tagged, which should be stripped
+dnl before we push the MPLS and VLAN tags.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:55,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:55,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:55,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:55,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, we push the VLAN tag before pushing a MPLS tag, but these
+dnl actions are reordered, so we see both of these in the final flow.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:56,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no)'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:56,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:56,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=68 in_port=1 (via action) data_len=68 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:56,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+dnl Modified MPLS controller action.
+dnl In this test, the input packet in vlan-tagged, which should be stripped
+dnl before we push the MPLS and VLAN tags.
+AT_CHECK([ovs-ofctl monitor br0 65534 -P nxm --detach --pidfile 2> ofctl_monitor.log])
+
+for i in 1 2 3; do
+    ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=40:44:44:44:54:57,dst=50:54:00:00:00:07),eth_type(0x8100),vlan(vid=88,pcp=7),encap(eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no))'
+done
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 6])
+ovs-appctl -t ovs-ofctl exit
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:57,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:57,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+dnl
+NXT_PACKET_IN (xid=0x0): cookie=0xa total_len=64 in_port=1 (via action) data_len=64 (unbuffered)
+mpls,metadata=0,in_port=0,dl_vlan=99,dl_vlan_pcp=1,dl_src=40:44:44:44:54:57,dl_dst=50:54:00:00:00:07,mpls_label=10,mpls_tc=0,mpls_ttl=64,mpls_bos=1
+])
+
+AT_CHECK([ovs-appctl time/warp 5000], [0], [ignore])
+AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip | sort], [0], [dnl
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:50 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],mod_vlan_vid:99,mod_vlan_pcp:1,CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:51 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],mod_vlan_vid:99,mod_vlan_pcp:1,CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:52 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x63->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:53 actions=push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],load:0x63->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:54 actions=mod_vlan_vid:99,mod_vlan_pcp:1,push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:55 actions=mod_vlan_vid:99,mod_vlan_pcp:1,push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:56 actions=load:0x63->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],CONTROLLER:65535
+ cookie=0xa, n_packets=3, n_bytes=180, dl_src=40:44:44:44:54:57 actions=load:0x63->OXM_OF_VLAN_VID[[]],mod_vlan_pcp:1,push_mpls:0x8847,load:0xa->OXM_OF_MPLS_LABEL[[]],CONTROLLER:65535
+NXST_FLOW reply:
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - fragment handling])
 OVS_VSWITCHD_START
 ADD_OF_PORTS([br0], [1], [2], [3], [4], [5], [6], [90])
-- 
1.8.4

^ permalink raw reply related

* [PATCH v2.44 5/5] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2013-10-17  1:15 UTC (permalink / raw)
  To: dev, netdev, Jesse Gross, Ben Pfaff
  Cc: Pravin B Shelar, Ravi K, Isaku Yamahata, Joe Stringer
In-Reply-To: <1381972511-27221-1-git-send-email-horms@verge.net.au>

Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>

---

v2.43 - v2.44
* No change

v2.42
* Rebase for:
  + 0585f7a ("datapath: Simplify mega-flow APIs.")
  + a097c0b ("datapath: Restructure datapath.c and flow.c")
* As suggested by Jesse Gross
  + Take into account that push_mpls() will have freed the skb on error
  + Remove dubious !eth_p_mpls(skb->protocol) condition from push_mpls
    The !eth_p_mpls(skb->protocol) condition on setting inner_protocol
    has no effect. Its motivation was to ensure that inner_protocol was
    only set the first time that mpls_push occured. However this is already
    ensured by the !ovs_skb_get_inner_protocol(skb) condition.
  + Return -EINVAL instead of -ENOMEM from pop_mpls() if the skb is too short
  + Do not add @inner_protocol to kernel doc for struct ovs_skb_cb.
    The patch no longer adds an inner_protocol member to struct ovs_skb_cb
  + Do not add and set otherwise unsued inner_protocol variable in
    rpl_dev_queue_xmit()
* As suggested by Pravin Shelar
  + Implement compatibility code in existing rpl_skb_gso_segment
    rather than introducing to use rpl___skb_gso_segment

v2.41
* No change

v2.40
* Rebase for:
  + New dev_queue_xmit compat code
  + Updated put_vlan()
* As suggested by Jesse Gross
  + Remove bogus mac_len update from push_mpls()
  + Slightly simplify push_mpls() by using eth_hdr()
  + Remove dubious condition !eth_p_mpls(inner_protocol) on
    an skb being considered to be MPLS in netdev_send()
  + Only use compatibility code for MPLS GSO segmentation on kernels
    older than 3.11
  + Revamp setting of inner_protocol
    1. Do not unconditionally set inner_protocol to the value of
       skb->protocol in ovs_execute_actions().
    2. Initialise inner_protocol it to zero only if compatibility code is in
       use. In the case where compatibility code is not in use it will either
       be zero due since the allocation of the skb or some other value set
       by some other user.
    3. Conditionally set the inner_protocol in push_mpls() to the value of
       skb->protocol when entering push_mpls(). The condition is that
       inner_protocol is zero and the value of skb->protocol is not an MPLS
       ethernet type.
    - This new scheme:
      + Pushes logic to set inner_protocol closer to the case where it is
	needed.
      + Avoids over-writing values set by other users.
* As suggested by Pravin Shelar
  + Only set and restore skb->protocol in rpl___skb_gso_segment() in the
    case of MPLS
  + Add inner_protocol field to struct ovs_gso_cb instead of ovs_skb_cb.
    This moves compatibility code closer to where it is used
    and creates fewer differences with mainline.
* Update comment on mac_len updates in datapath/actions.c
* Remove HAVE_INNER_PROCOTOL and instead just check
  against kernel version 3.11 directly.
  HAVE_INNER_PROCOTOL is a hang-over from work done prior
  to the merge of inner_protocol into the kernel.
* Remove dubious condition !eth_p_mpls(inner_protocol) on
  using inner_protocol as the type in rpl_skb_network_protocol()
* Do not update type of features in rpl_dev_queue_xmit.
  Though arguably correct this is not an inherent part of
  the changes made by this patch.
* Use skb_cow_head() in push_mpls()
  + Call skb_cow_head(skb, MPLS_HLEN) instead of
    make_writable(skb, skb->mac_len) to ensure that there is enough head
    room to push an MPLS LSE regardless of whether the skb is cloned or not.
  + This is consistent with the behaviour of rpl__vlan_put_tag().
  + This is a fix for crashes reported when performing mpls_push
    with headroom less than 4. This problem was introduced in v3.36.
* Skip popping in mpls_pop if the skb is too short to contain an MPLS LSE

v2.39
* Rebase for removal of vlan, checksum and skb->mark compat code

v2.38
* Rebase for SCTP support
* Refactor validate_tp_port() to iterate over eth_types rather
  than open-coding the loop. With the addition of SCTP this logic
  is now used three times.

v2.36 - v2.37
* Rebase

* Do not add set_ethertype() to datapath/actions.c.
  As this patch has evolved this function had devolved into
  to sets of functionality wrapped into a single function with
  only one line of common code. Refactor things to simply
  open-code setting the ether type in the two locations where
  set_ethertype() was previously used. The aim here is to improve
  readability.

* Update setting skb->ethertype after mpls push and pop.
  - In the case of push_mpls it should be set unconditionally
    as in v2.35 the behaviour of this function to always push
    an MPLS LSE before any VLAN tags.
  - In the case of mpls_pop eth_p_mpls(skb->protocol) is a better
    test than skb->protocol != htons(ETH_P_8021Q) as it will give the
    correct behaviour in the presence of other VLAN ethernet types,
    for example 0x88a8 which is used by 802.1ad. Moreover, it seems
    correct to update the ethernet type if it was previously set
    according to the top-most MPLS LSE.

* Deaccelerate VLANs when pushing MPLS tags the
  - Since v2.35 MPLS push will insert an MPLS LSE before any VLAN tags.
    This means that if an accelerated tag is present it should be
    deaccelerated to ensure it ends up in the correct position.

* Update skb->mac_len in push_mpls() so that it will be correct
  when used by a subsequent call to pop_mpls().

  As things stand I do not believe this is strictly necessary as
  ovs-vswitchd will not send a pop MPLS action after a push MPLS action.
  However, I have added this in order to code more defensively as I believe
  that if such a sequence did occur it would be rather unobvious why
  it didn't work.

* Do not add skb_cow_head() call in push_mpls().
  It is unnecessary as there is a make_writable() call.
  This change was also made in v2.30 but some how the
  code regressed between then and v2.35.

v2.35
* Rebase
* Move MPLS constants to mpls.h
* Push MPLS tags after ethernet, before VLAN tags
  - This is consistent with the OpenFlow 1.3 specification
  - Compatibility with OpenFlow 1.2 and earlier versions
    may be provided by ovs-vswitchd.
* Correct GSO behaviour in the presence of MPLS but absence of VLANs

v2.34
* Rebase for megaflow changes

v2.33
* Ensure that inner_protocol is always set to to the current
  skb->protocol value in ovs_execute_actions(). This ensures
  it is set to the correct value in the absence of a push_mpls action.
  Also remove setting of inner_protocol in push_mpls() as
  it duplicates the code now in ovs_execute_actions().
* Call __skb_gso_segment() instead of skb_gso_segment() from
  rpl___skb_gso_segment() in the case that HAVE___SKB_GSO_SEGMENT is set.
  This was a typo.

v2.32
* As suggested by Jesse Gross
  - Use int instead of size_t in validate_and_copy_actions__().
  - Fix crazy edit mess in pop_mpls() action comment
  - Move eth_p_mpls() into mpls.h
  - Refactor skb_gso_segment MPLS handling into rpl_skb_gso_segment
    Address Jesse's comments regarding this code:
    "Can we push this completely into the skb_gso_segment() compatibility
     code? It's both nicer and may make the interactions with the vlan code
     less confusing."
  - Move GSO compatibility code into linux/compat/gso.*
  - Set skb->protocol on mpls_push and mpls_pop in the presence
    of an offloaded VLAN.

v2.31
* As suggested by Jesse Gross
  - There is no need to make mac_header_end inline as it is not in a header file
  - Remove dubious if (*skb_ethertype == ethertype) optimisation from
    set_ethertype
  - Only set skb->protocol in push_mpls() or pop_mpls() for non-VLAN packets
  - Use MAX_ETH_TYPES instead of SAMPLE_ACTION_DEPTH for array size
    of types in struct eth_types. This corrects a typo/thinko.
  - Correct eth type tracking logic such that start isn't advanced
    when entering a sample action, ensuring that all possibly types
    are checked when verifying nested actions.
* Define HAVE_INNER_PROTOCOL based on kernel version.
  inner_protocol has been merged into net-next and should appear in
  v3.11 so there is no longer a need for a acinclude.m4 test to check for it.
* Add MPLS GSO compatibility code.
  This is for use on kernels that do not have MPLS GSO support.
  Thanks to Joe Stringer for his work on this.

v2.30
* As suggested by Jesse Gross
  - Use skb_cow_head in push_mpls to ensure there is sufficient headroom for
    skb_push
  - Call make_writable with skb->mac_len instead of skb->mac_len + MPLS_HLEN
    in push_mpls as only the first skb->mac_len bytes of existing packet data
    are modified.
  - Rename skb_mac_header_end as mac_header_end, this seems
    to be a more appropriate name for a local function.
  - Remove OVS_CSUM_COMPLETE code from set_ethertype().
    Inside OVS the ethernet header is not covered by OVS_CSUM_COMPLETE.
  - Use __skb_pull() instead of skb_pull() in pop_mpls()
  - Decrement and decrement skb->mac_len when poping and pushing VLAN tags.
    Previously mac_len was reset, but this would result in forgetting
    the MPLS label stack.
  - Remove spurious comment from before do_execute_actions().
  - Move OVS_KEY_ATTR_MPLS attribute to its final, upstreamable, location.
  - Correct ethertype check for OVS_ACTION_ATTR_POP_MPLS case in
    validate_and_copy_actions() to check for MPLS ethertypes rather than
    ETH_P_IP.
  - Rewrite tracking of eth types used to verify actions in the presence
    of sample actions. There is a large comment above struct eth_types
    describing the new implementation.

v2.29
* Break include/ and lib/ portions of the patch out into a
  separate patch "datapath: Add basic MPLS support to kernel"
* Update for new MPLS GSO scheme
  - skb->protocol is set to the new ethertype of the packet
    on MPLS push and pop
  - When pushing the first MPLS LSE onto a previously non-MPLS
    packet set skb->inner_protocol to the original ethertype.
  - skb->inner_protocol may be used by the network stack
    for GSO of the inner-packet.
* Drop const from ethertype parameter of set_ethertype.
  This appears to be a legacy of this parameter being a pointer.
* Pass the ethertype patrameter of pop_mpls as a value rather
  than a pointer.

v2.28
* Kernel Datapath changes as suggested by Jarno Rajahalme
  + Correct the logic introduced in v2.27 to set the network_header
    to after the MPLS label stack in the case of an MPLS packet.
    - Increment stack_len offset so that label stacks of depth greater
      than two do not cause an infinite loop.
    - Correct offset passed to check_header to include skb->mac len

v2.27
* Kernel Datapath changes as suggested by Jarno Rajahalme and Jesse Gross:
  + Previously the mac_len and network_header of an skb corresponded
    to the end of the L2 header.  To support GSO, just before transmission,
    do_output, with the results as follows:

    Input: non-MPLS skb: Output: network header and mac_len correspond
                         to the beginning of the L3 headers
    Input: MPLS:         Output: network header and mac_len correspond to the
                         end of the L2 headers.

    This is somewhat confusing.

  + The new scheme is as follows:
    - The mac_len always corresponds to the end of the L2 header.
    - The network header always corresponds to the beginning of the
      L3 header.

  + Note that in the case of MPLS output the end of the L2 headers and the
  beginning of the L3 headers will differ.

* Remove unused declaration of skb_cb_mpls_stack()

v2.26
* Rebase on master
* Kernel Datapath changes as suggested by Jarno Rajahalme
  - Use skb_network_header() instead of skb_mac_header() to locate
    the ethertype to set in set_ethertype() as the latter will
    be wrong in the presence of VLAN tags. This resolves
    a regression introduced in v2.24.
  - Enhance comment in do_output()
  - do_execute_actions(): Do not alter mpls_stack_depth if
    a MPLS push or pop action fail. This is achieved by altering
    mpls_stack_depth at the end of push_mpls() and pop_mpls().

v2.25
* Rebase on master
* Pass big-endian value as the last argument of eth_types_set() in
  validate_and_copy_actions__()
* Use revised GSO support as provided by the patch series
  "[PATCH 0/2] Small Modifications to GSO to allow segmentation of MPLS"
  - Set skb->mac_len to the length of the l2 header + MPLS stack length
  - Update skb->network_header accordingly
  - Set skb->encapsulated_features

v2.24
* Use skb_mac_header() in set_ethertype()
* Set skb->encapsulation in set_ethertype() to support MPLS GSO.
  Also add a note about the other requirements for MPLS GSO.
  MPLS GSO support will be posted as a patch net-next (Linux mainline)
  "MPLS: Add limited GSO support"
* Do not add ETH_TYPE_MIN, it is no longer used

v2.23
* As suggested by Jesse Gross:
  - Verify the current ethernet type when validating sample actions
    both for the taken and not-taken path if the sample action.
  - Document that the OVS_KEY_ATTR_MPLS attribute accepts a list of
    struct ovs_key_mpls but that an implementation may restrict
    the length it accepts.
  - Restrict the array length of the OVS_KEY_ATTR_MPLS to one.
    + Don't add ovs_flow_verify_key_len as it was added to
      handle attributes whose values are arrays but there are
      no attributes with values that are arrays (of length greater than one).

v2.22
* As suggested by Jesse Gross:
  - Fix sparse warning in validate_and_copy_actions()
    I have no idea why sparse doesn't show this up this on my system.
  - Remove call to skb_cow_head() from push_mpls() as it
    is already covered by a call to make_writable()
  - Check (key_type > OVS_KEY_ATTR_MAX) in ovs_flow_verify_key_len()
  - Disallow set actions on l2.5+ data and MPLS push and pop actions
    after an MPLS pop action as there is no verification that the packet
    is actually of the new ethernet type. This may later be supported
    using recirculation or by other means.
  - Do not add spurious debuging message to ovs_flow_cmd_new_or_set()

v2.21
* As suggested by Jesse Gross:
  - Verify that l3 and l4 actions always always occur prior to
    a push_mpls action and use the network header pointer of an skb
    to track the top of the MPLS stack. This avoids adding an l2_size
    element to the skb callback.

v2.20
* As suggested by Jesse Gross:
  - Do not add ovs_dp_ioctl_hook
    + This appears to be garbage from a rebase
  - Do not add skb_cb_set_l2_size. Instead set OVS_CB(skb)->l2_size
    in ovs_flow_extract().
  - Do not free skb on error in push_mpls(), it is freed in the caller
  - Call skb_reset_mac_len() in pop_mpls() and push_mpls()
  - Update checksums in pop_mpls(), push_mpls() and set_mpls().
  - Rename skb_cb_mpls_bos() as skb_cb_mpls_stack().
    It returns the top not the bottom of the stack.
  - Track the current eth_type in validate_and_copy_actions
    which is initially the eth_type of the flow and may be modified
    by push_mpls and pop_mpls actions. Use this to correctly validate
    mpls_set actions. This is to allow mpls_set actions to be applied
    to a non-MPLS frame after an mpls_push action (although ovs-vswitchd
    doesn't currently do that).
    Also:
    + Remove the check of the eth_type in set_mpls() as the new validation
      scheme should ensure it cannot be incorrect.
    + Use the current eth_type to validate mpls_pop actions and remove
      the eth_type check from pop_mpls().
  - Move OVS_KEY_ATTR_MPLS to non-upstream group in ovs_key_lens
  - Remove unnecessary memset of mpls_key in ovs_flow_to_nlattrs()
  - Make a union of the mpls and ip elements of struct sw_flow_key.
    Currently the code stops parsing after an MPLS header so it is
    not possible for the ip and mpls elements to be used simultaneously
    and some space can be saved by using a union.
  - Allow an array of MPLS key attributes
    + Currently all but the first element is ignored
    + User-space needs to be updated to accept more than one element,
      currently it will treat their presence as an error
  - Do not update network header in ovs_flow_extract() for after parsing
    the MPLS stack as it is never used because no l3+ processing
    occurs on MPLS frames.
  - Allow multiple MPLS entries in a match by allowing the OVS_KEY_ATTR_MPLS
    to be an array of struct ovs_key_mpls with at least one entry.
    Currently only one entry is used which is byte-for-byte compatible with
    the previous scheme of having OVS_KEY_ATTR_MPLS as a struct
    ovs_key_mpls.
* Make skb writable in pop_mpls(), push_mpls() and set_mpls().

v2.18 - v2.19
* No change

v2.17
* As suggested by Ben Pfaff
  - Use consistent terminology for MPLS.
    + Consistently refer to the MPLS component of a packet as the
      MPLS label stack and entries in the stack as MPLS label stack entries
      (LSE).  An MPLS label is a component of an MPLS label stack entry.
      The other components are the traffic class (TC), time to live (TTL)
      and bottom of stack (BoS) bit.
  - Rename compose_.*mpls_ functions as execute_.*mpls_

v2.16
* No change

v2.15
* As suggested by Ben Pfaff
  - Use OVS_ACTION_SET to set OVS_KEY_ATTR_MPLS instead of
    OVS_ACTION_ATTR_SET_MPLS

v2.14
* Remove include/linux/openvswitch.h portion which added add
  new key and action attributes. This
  now present in "User-Space MPLS actions and matches"
  which is now a dependency of this patch

v2.13
* As suggested by Jarno Rajahalme
  - Rename mpls_bos element of ovs_skb_cb as l2_size as it is set and used
    regardless of if an MPLS stack is present or not. Update the name of
    helper functions and documentation accordingly.
  - Ensure that skb_cb_mpls_bos() never returns NULL
* Correct endieness in eth_p_mpls()

v2.12
* Update skb and network header on MPLS extraction in ovs_flow_extract()
* Use NULL in skb_cb_mpls_bos()
* Add eth_p_mpls helper

v2.10 - v2.11
* No change

v2.9
* datapath: Always update the mpls bos if  vlan_pop is successful

  Regardless of the details of how a successful
  vlan_pop is achieved, the mpls bos needs to be updated.

  Without this fix it has been observed that the following
  results in malformed packets

v2.8
* No change

v2.7
* Rebase

v2.6
* As suggested by Yamahata-san
  - Do not guard against label == 0 for
    OVS_ACTION_ATTR_SET_MPLS in validate_actions().
    A label of 0 is valid
  - Remove comment stupulating that if
    the top_label element of struct sw_flow_key is 0 then
    there is no MPLS label. An MPLS label of 0 is valid
    and the correct check if ethertype is
    ntohs(ETH_TYPE_MPLS) or ntohs(ETH_TYPE_MPLS_MCAST)

v2.4 - v2.5
* No change

v2.3
* s/mpls_stack/mpls_bos/
  This is in keeping with the naming used in the OpenFlow 1.3 specification

v2.2
* Call skb_reset_mac_header() in skb_cb_set_mpls_stack()
  eth_hdr(skb) is non-NULL when called in skb_cb_set_mpls_stack().
* Add a call to skb_cb_set_mpls_stack() in ovs_packet_cmd_execute().
  I apologise that I have mislaid my notes on this but
  it avoids a kernel panic. I can investigate again if necessary.
* Use struct ovs_action_push_mpls instead of
  __be16 to decode OVS_ACTION_ATTR_PUSH_MPLS in validate_actions(). This is
  consistent with the data format for the attribute.
* Indentation fix in skb_cb_mpls_stack(). [cosmetic]

v2.1
* Manual rebase
---
 datapath/Modules.mk                             |   1 +
 datapath/actions.c                              | 131 ++++++++++-
 datapath/datapath.c                             |   4 +-
 datapath/flow.c                                 |  29 +++
 datapath/flow.h                                 |  17 +-
 datapath/flow_netlink.c                         | 286 ++++++++++++++++++++++--
 datapath/flow_netlink.h                         |   2 +-
 datapath/linux/compat/gso.c                     |  70 +++++-
 datapath/linux/compat/gso.h                     |  41 ++++
 datapath/linux/compat/include/linux/netdevice.h |   6 +-
 datapath/linux/compat/netdevice.c               |  10 +-
 datapath/mpls.h                                 |  15 ++
 include/linux/openvswitch.h                     |   7 +-
 13 files changed, 569 insertions(+), 50 deletions(-)
 create mode 100644 datapath/mpls.h

diff --git a/datapath/Modules.mk b/datapath/Modules.mk
index b652411..6aa80e5 100644
--- a/datapath/Modules.mk
+++ b/datapath/Modules.mk
@@ -26,6 +26,7 @@ openvswitch_headers = \
 	flow.h \
 	flow_netlink.h \
 	flow_table.h \
+	mpls.h \
 	vlan.h \
 	vport.h \
 	vport-internal_dev.h \
diff --git a/datapath/actions.c b/datapath/actions.c
index d961e5d..8babfc4 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -35,6 +35,8 @@
 #include <net/sctp/checksum.h>
 
 #include "datapath.h"
+#include "gso.h"
+#include "mpls.h"
 #include "vlan.h"
 #include "vport.h"
 
@@ -71,7 +73,8 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 
 	vlan_set_encap_proto(skb, vhdr);
 	skb->mac_header += VLAN_HLEN;
-	skb_reset_mac_len(skb);
+	/* Update mac_len for subsequent MPLS actions */
+	skb->mac_len -= VLAN_HLEN;
 
 	return 0;
 }
@@ -113,6 +116,9 @@ static int put_vlan(struct sk_buff *skb)
 	if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
 		return -ENOMEM;
 
+	/* update mac_len for subsequent MPLS actions */
+	skb->mac_len += VLAN_HLEN;
+
 	if (skb->ip_summed == CHECKSUM_COMPLETE)
 		skb->csum = csum_add(skb->csum, csum_partial(skb->data
 				+ (2 * ETH_ALEN), VLAN_HLEN, 0));
@@ -134,6 +140,114 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
 	return 0;
 }
 
+/* The end of the mac header.
+ *
+ * For non-MPLS skbs this will correspond to the network header.
+ * For MPLS skbs it will be before the network_header as the MPLS
+ * label stack lies between the end of the mac header and the network
+ * header. That is, for MPLS skbs the end of the mac header
+ * is the top of the MPLS label stack.
+ */
+static unsigned char *mac_header_end(const struct sk_buff *skb)
+{
+	return skb_mac_header(skb) + skb->mac_len;
+}
+
+/* Push MPLS after the ethernet header. */
+static int push_mpls(struct sk_buff *skb,
+		     const struct ovs_action_push_mpls *mpls)
+{
+	__be32 *new_mpls_lse;
+	struct ethhdr *hdr;
+
+	if (unlikely(vlan_tx_tag_present(skb))) {
+		int err;
+
+		err = put_vlan(skb);
+		if (unlikely(err))
+			return err;
+
+	        vlan_set_tci(skb, 0);
+	}
+
+	if (skb_cow_head(skb, MPLS_HLEN) < 0) {
+		kfree_skb(skb);
+		return -ENOMEM;
+	}
+	skb_push(skb, MPLS_HLEN);
+
+	memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
+		ETH_HLEN);
+	skb_reset_mac_header(skb);
+
+	new_mpls_lse = (__be32 *)(skb_mac_header(skb) + ETH_HLEN);
+	*new_mpls_lse = mpls->mpls_lse;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
+							     MPLS_HLEN, 0));
+
+	hdr = eth_hdr(skb);
+	hdr->h_proto = mpls->mpls_ethertype;
+	if (!ovs_skb_get_inner_protocol(skb))
+		ovs_skb_set_inner_protocol(skb, skb->protocol);
+	skb->protocol = mpls->mpls_ethertype;
+	return 0;
+}
+
+static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
+{
+	struct ethhdr *hdr;
+	int err;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	if (unlikely(skb->len < skb->mac_len + MPLS_HLEN))
+		return -EINVAL;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_sub(skb->csum,
+				     csum_partial(mac_header_end(skb),
+						  MPLS_HLEN, 0));
+
+	memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+		skb->mac_len);
+
+	__skb_pull(skb, MPLS_HLEN);
+	skb_reset_mac_header(skb);
+
+	/* mac_header_end() is used to locate the ethertype
+	 * field correctly in the presence of VLAN tags.
+	 */
+	hdr = (struct ethhdr *)(mac_header_end(skb) - ETH_HLEN);
+	hdr->h_proto = ethertype;
+	if (eth_p_mpls(skb->protocol))
+		skb->protocol = ethertype;
+	return 0;
+}
+
+static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
+{
+	__be32 *stack = (__be32 *)mac_header_end(skb);
+	int err;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE) {
+		__be32 diff[] = { ~(*stack), *mpls_lse };
+		skb->csum = ~csum_partial((char *)diff, sizeof(diff),
+					  ~skb->csum);
+	}
+
+	*stack = *mpls_lse;
+
+	return 0;
+}
+
 static int set_eth_addr(struct sk_buff *skb,
 			const struct ovs_key_ethernet *eth_key)
 {
@@ -509,6 +623,9 @@ static int execute_set_action(struct sk_buff *skb,
 
 	case OVS_KEY_ATTR_SCTP:
 		err = set_sctp(skb, nla_data(nested_attr));
+
+	case OVS_KEY_ATTR_MPLS:
+		err = set_mpls(skb, nla_data(nested_attr));
 		break;
 	}
 
@@ -545,6 +662,16 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			output_userspace(dp, skb, a);
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS:
+			err = push_mpls(skb, nla_data(a));
+			if (unlikely(err)) /* skb already freed. */
+				return err;
+			break;
+
+		case OVS_ACTION_ATTR_POP_MPLS:
+			err = pop_mpls(skb, nla_get_be16(a));
+			break;
+
 		case OVS_ACTION_ATTR_PUSH_VLAN:
 			err = push_vlan(skb, nla_data(a));
 			if (unlikely(err)) /* skb already freed. */
@@ -618,6 +745,8 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
 		goto out_loop;
 	}
 
+	ovs_skb_init_inner_protocol(skb);
+
 	OVS_CB(skb)->tun_key = NULL;
 	error = do_execute_actions(dp, skb, acts->actions,
 					 acts->actions_len, false);
diff --git a/datapath/datapath.c b/datapath/datapath.c
index 9e6df12..4a2afad 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -518,7 +518,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_flow_free;
 
 	err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
-				   &flow->key, 0, &acts);
+				   &flow->key, &acts);
 	rcu_assign_pointer(flow->sf_acts, acts);
 	if (err)
 		goto err_flow_free;
@@ -779,7 +779,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 
 		ovs_flow_mask_key(&masked_key, &key, &mask);
 		error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
-					     &masked_key, 0, &acts);
+					     &masked_key, &acts);
 		if (error) {
 			OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
 			goto err_kfree;
diff --git a/datapath/flow.c b/datapath/flow.c
index faa4e15..91f428d 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -44,6 +44,7 @@
 #include <net/ipv6.h>
 #include <net/ndisc.h>
 
+#include "mpls.h"
 #include "vlan.h"
 
 u64 ovs_flow_used_time(unsigned long flow_jiffies)
@@ -391,6 +392,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
 		return -ENOMEM;
 
 	skb_reset_network_header(skb);
+	skb_reset_mac_len(skb);
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 
 	/* Network layer. */
@@ -473,6 +475,33 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
 			memcpy(key->ipv4.arp.sha, arp->ar_sha, ETH_ALEN);
 			memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN);
 		}
+	} else if (eth_p_mpls(key->eth.type)) {
+		size_t stack_len = MPLS_HLEN;
+
+		/* In the presence of an MPLS label stack the end of the L2
+		 * header and the beginning of the L3 header differ.
+		 *
+		 * Advance network_header to the beginning of the L3
+		 * header. mac_len corresponds to the end of the L2 header.
+		 */
+		while (1) {
+			__be32 lse;
+
+			error = check_header(skb, skb->mac_len + stack_len);
+			if (unlikely(error))
+				return 0;
+
+			memcpy(&lse, skb_network_header(skb), MPLS_HLEN);
+
+			if (stack_len == MPLS_HLEN)
+				memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
+
+			skb_set_network_header(skb, skb->mac_len + stack_len);
+			if (lse & htonl(MPLS_BOS_MASK))
+				break;
+
+			stack_len += MPLS_HLEN;
+		}
 	} else if (key->eth.type == htons(ETH_P_IPV6)) {
 		int nh_len;             /* IPv6 Header + Extensions */
 
diff --git a/datapath/flow.h b/datapath/flow.h
index 91a3022..eff0023 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -79,12 +79,17 @@ struct sw_flow_key {
 		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
 		__be16 type;		/* Ethernet frame type. */
 	} eth;
-	struct {
-		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
-		u8     tos;		/* IP ToS. */
-		u8     ttl;		/* IP TTL/hop limit. */
-		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
-	} ip;
+	union {
+		struct {
+			__be32 top_lse;		/* top label stack entry */
+		} mpls;
+		struct {
+			u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
+			u8     tos;		/* IP ToS. */
+			u8     ttl;		/* IP TTL/hop limit. */
+			u8     frag;		/* One of OVS_FRAG_TYPE_*. */
+		} ip;
+	};
 	union {
 		struct {
 			struct {
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 515a9f6..eaf41dc 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -18,6 +18,7 @@
 
 #include "flow.h"
 #include "datapath.h"
+#include "mpls.h"
 #include <linux/uaccess.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
@@ -119,7 +120,8 @@ static bool match_validate(const struct sw_flow_match *match,
 			| (1ULL << OVS_KEY_ATTR_ICMP)
 			| (1ULL << OVS_KEY_ATTR_ICMPV6)
 			| (1ULL << OVS_KEY_ATTR_ARP)
-			| (1ULL << OVS_KEY_ATTR_ND));
+			| (1ULL << OVS_KEY_ATTR_ND)
+			| (1ULL << OVS_KEY_ATTR_MPLS));
 
 	/* Always allowed mask fields. */
 	mask_allowed |= ((1ULL << OVS_KEY_ATTR_TUNNEL)
@@ -134,6 +136,13 @@ static bool match_validate(const struct sw_flow_match *match,
 			mask_allowed |= 1ULL << OVS_KEY_ATTR_ARP;
 	}
 
+
+	if (eth_p_mpls(match->key->eth.type)) {
+		key_expected |= 1ULL << OVS_KEY_ATTR_MPLS;
+		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
+			mask_allowed |= 1ULL << OVS_KEY_ATTR_MPLS;
+	}
+
 	if (match->key->eth.type == htons(ETH_P_IP)) {
 		key_expected |= 1ULL << OVS_KEY_ATTR_IPV4;
 		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
@@ -242,6 +251,7 @@ static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
 	[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
 	[OVS_KEY_ATTR_TUNNEL] = -1,
+	[OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
 };
 
 static bool is_all_zero(const u8 *fp, size_t size)
@@ -616,6 +626,16 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  u64 attrs,
 		attrs &= ~(1ULL << OVS_KEY_ATTR_ARP);
 	}
 
+	if (attrs & (1ULL << OVS_KEY_ATTR_MPLS)) {
+		const struct ovs_key_mpls *mpls_key;
+
+		mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]);
+		SW_FLOW_KEY_PUT(match, mpls.top_lse,
+				mpls_key->mpls_lse, is_mask);
+
+		attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS);
+	 }
+
 	if (attrs & (1ULL << OVS_KEY_ATTR_TCP)) {
 		const struct ovs_key_tcp *tcp_key;
 
@@ -988,6 +1008,14 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
 		arp_key->arp_op = htons(output->ip.proto);
 		memcpy(arp_key->arp_sha, output->ipv4.arp.sha, ETH_ALEN);
 		memcpy(arp_key->arp_tha, output->ipv4.arp.tha, ETH_ALEN);
+	} else if (eth_p_mpls(swkey->eth.type)) {
+		struct ovs_key_mpls *mpls_key;
+
+		nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
+		if (!nla)
+			goto nla_put_failure;
+		mpls_key = nla_data(nla);
+		mpls_key->mpls_lse = output->mpls.top_lse;
 	}
 
 	if ((swkey->eth.type == htons(ETH_P_IP) ||
@@ -1190,15 +1218,133 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 
 	a->nla_len = sfa->actions_len - st_offset;
 }
+#define MAX_ETH_TYPES 16 /* Arbitrary Limit */
+
+/* struct eth_types - possible eth types
+ * @types: provides storage for the possible eth types.
+ * @start: is the index of the first entry of types which is possible.
+ * @end: is the index of the last entry of types which is possible.
+ * @cursor: is the index of the entry which should be updated if an action
+ * changes the eth type.
+ *
+ * Due to the sample action there may be multiple possible eth types.
+ * In order to correctly validate actions all possible types are tracked
+ * and verified. This is done using struct eth_types.
+ *
+ * Initially start, end and cursor should be 0, and the first element of
+ * types should be set to the eth type of the flow.
+ *
+ * When an action changes the eth type then the values of start and end are
+ * updated to the value of cursor. The new type is stored at types[cursor].
+ *
+ * When entering a sample action the start and cursor values are saved. The
+ * value of cursor is set to the value of end plus one.
+ *
+ * When leaving a sample action the start and cursor values are restored to
+ * their saved values.
+ *
+ * An example follows.
+ *
+ * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D)
+ *
+ * 0. Initial state:
+ *	types = { original_eth_type }
+ * 	start = end = cursor = 0;
+ *
+ * 1. pop_mpls(A)
+ *    a. Check types from start (0) to end (0) inclusive
+ *       i.e. Check against original_eth_type
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = A
+ *    New state:
+ *	types = { A }
+ *	start = end = cursor = 0;
+ *
+ * 2. Enter first sample()
+ *    a. Save start and cursor
+ *    b. Set cursor = end + 1
+ *    New state:
+ *	types = { A }
+ *	start = end = 0;
+ *	cursor = 1;
+ *
+ * 3. pop_mpls(B)
+ *    a. Check types from start (0) to end (0)
+ *       i.e: Check against A
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = B
+ *    New state:
+ *	types = { A, B }
+ *	start = end = cursor = 1;
+ *
+ * 4. Leave first sample()
+ *    a. Restore start and cursor to the values when entering 2.
+ *    New state:
+ *	types = { A, B }
+ *	start = cursor = 0;
+ *	end = 1;
+ *
+ * 5. Enter second sample()
+ *    a. Save start and cursor
+ *    b. Set cursor = end + 1
+ *    New state:
+ *	types = { A, B }
+ *	start = 0;
+ *	end = 1;
+ *	cursor = 2;
+ *
+ * 6. pop_mpls(C)
+ *    a. Check types from start (0) to end (1) inclusive
+ *       i.e: Check against A and B
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = C
+ *    New state:
+ *	types = { A, B, C }
+ *	start = end = cursor = 2;
+ *
+ * 7. Leave second sample()
+ *    a. Restore start and cursor to the values when entering 5.
+ *    New state:
+ *	types = { A, B, C }
+ *	start = cursor = 0;
+ *	end = 2;
+ *
+ * 8. pop_mpls(D)
+ *    a. Check types from start (0) to end (2) inclusive
+ *       i.e: Check against A, B and C
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = D
+ *    New state:
+ *	types = { D } // Trailing entries of type are no longer used end = 0
+ *	start = end = cursor = 0;
+ */
+struct eth_types {
+	int start, end, cursor;
+	__be16 types[MAX_ETH_TYPES];
+};
+
+static void eth_types_set(struct eth_types *types, __be16 type)
+{
+	types->start = types->end = types->cursor;
+	types->types[types->cursor] = type;
+}
+
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth,
+				  struct sw_flow_actions **sfa,
+				  struct eth_types *eth_types);
 
 static int validate_and_copy_sample(const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
-				    struct sw_flow_actions **sfa)
+				    struct sw_flow_actions **sfa,
+				    struct eth_types *eth_types)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
 	const struct nlattr *a;
 	int rem, start, err, st_acts;
+	int saved_eth_types_start, saved_eth_types_cursor;
 
 	memset(attrs, 0, sizeof(attrs));
 	nla_for_each_nested(a, attr, rem) {
@@ -1230,22 +1376,38 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 	if (st_acts < 0)
 		return st_acts;
 
-	err = ovs_nla_copy_actions(actions, key, depth + 1, sfa);
+	/* Save and update eth_types cursor and start.  Please see the
+	 * comment for struct eth_types for a discussion of this.
+	 */
+	saved_eth_types_start = eth_types->start;
+	saved_eth_types_cursor = eth_types->cursor;
+	eth_types->cursor = eth_types->end + 1;
+	if (eth_types->cursor == MAX_ETH_TYPES)
+		return -EINVAL;
+
+	err = ovs_nla_copy_actions__(actions, key, depth + 1, sfa, eth_types);
 	if (err)
 		return err;
 
+	/* Restore eth_types cursor and start.  Please see the
+	 * comment for struct eth_types for a discussion of this.
+	 */
+	eth_types->cursor = saved_eth_types_cursor;
+	eth_types->start = saved_eth_types_start;
+
 	add_nested_action_end(*sfa, st_acts);
 	add_nested_action_end(*sfa, start);
 
 	return 0;
 }
 
-static int validate_tp_port(const struct sw_flow_key *flow_key)
+static int validate_tp_port__(const struct sw_flow_key *flow_key,
+			      __be16 eth_type)
 {
-	if (flow_key->eth.type == htons(ETH_P_IP)) {
+	if (eth_type == htons(ETH_P_IP)) {
 		if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst)
 			return 0;
-	} else if (flow_key->eth.type == htons(ETH_P_IPV6)) {
+	} else  if (eth_type == htons(ETH_P_IPV6)) {
 		if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst)
 			return 0;
 	}
@@ -1253,6 +1415,21 @@ static int validate_tp_port(const struct sw_flow_key *flow_key)
 	return -EINVAL;
 }
 
+static int validate_tp_port(const struct sw_flow_key *flow_key,
+                           const struct eth_types *eth_types)
+{
+	int i;
+
+	for (i = eth_types->start; i < eth_types->end; i++) {
+		int ret = validate_tp_port__(flow_key, eth_types->types[i]);
+
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 void ovs_match_init(struct sw_flow_match *match,
 		    struct sw_flow_key *key,
 		    struct sw_flow_mask *mask)
@@ -1295,7 +1472,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
 			struct sw_flow_actions **sfa,
-			bool *set_tun)
+			bool *set_tun, struct eth_types *eth_types)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -1326,9 +1503,12 @@ static int validate_set(const struct nlattr *a,
 			return err;
 		break;
 
-	case OVS_KEY_ATTR_IPV4:
-		if (flow_key->eth.type != htons(ETH_P_IP))
-			return -EINVAL;
+	case OVS_KEY_ATTR_IPV4: {
+		int i;
+
+		for (i = eth_types->start; i <= eth_types->end; i++)
+			if (eth_types->types[i] != htons(ETH_P_IP))
+				return -EINVAL;
 
 		if (!flow_key->ip.proto)
 			return -EINVAL;
@@ -1341,10 +1521,14 @@ static int validate_set(const struct nlattr *a,
 			return -EINVAL;
 
 		break;
+	}
 
-	case OVS_KEY_ATTR_IPV6:
-		if (flow_key->eth.type != htons(ETH_P_IPV6))
-			return -EINVAL;
+	case OVS_KEY_ATTR_IPV6: {
+		int i;
+
+		for (i = eth_types->start; i <= eth_types->end; i++)
+			if (eth_types->types[i] != htons(ETH_P_IPV6))
+				return -EINVAL;
 
 		if (!flow_key->ip.proto)
 			return -EINVAL;
@@ -1360,24 +1544,35 @@ static int validate_set(const struct nlattr *a,
 			return -EINVAL;
 
 		break;
+	}
+
 
 	case OVS_KEY_ATTR_TCP:
 		if (flow_key->ip.proto != IPPROTO_TCP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_types);
 
 	case OVS_KEY_ATTR_UDP:
 		if (flow_key->ip.proto != IPPROTO_UDP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_types);
+
+	case OVS_KEY_ATTR_MPLS: {
+		int i;
+
+		for (i = eth_types->start; i < eth_types->end; i++)
+			if (!eth_p_mpls(eth_types->types[i]))
+				return -EINVAL;
+		break;
+	}
 
 	case OVS_KEY_ATTR_SCTP:
 		if (flow_key->ip.proto != IPPROTO_SCTP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_types);
 
 	default:
 		return -EINVAL;
@@ -1421,10 +1616,11 @@ static int copy_action(const struct nlattr *from,
 	return 0;
 }
 
-int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key,
-			 int depth,
-			 struct sw_flow_actions **sfa)
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth,
+				  struct sw_flow_actions **sfa,
+				  struct eth_types *eth_types)
 {
 	const struct nlattr *a;
 	int rem, err;
@@ -1437,6 +1633,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 		static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
 			[OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
 			[OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
+			[OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct ovs_action_push_mpls),
+			[OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16),
 			[OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
 			[OVS_ACTION_ATTR_POP_VLAN] = 0,
 			[OVS_ACTION_ATTR_SET] = (u32)-1,
@@ -1479,14 +1677,44 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 				return -EINVAL;
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS: {
+			const struct ovs_action_push_mpls *mpls = nla_data(a);
+			if (!eth_p_mpls(mpls->mpls_ethertype))
+				return -EINVAL;
+			eth_types_set(eth_types, mpls->mpls_ethertype);
+			break;
+		}
+
+		case OVS_ACTION_ATTR_POP_MPLS: {
+			int i;
+
+			for (i = eth_types->start; i <= eth_types->end; i++)
+				if (!eth_p_mpls(eth_types->types[i]))
+					return -EINVAL;
+
+			/* Disallow subsequent L2.5+ set and mpls_pop actions
+			 * as there is no check here to ensure that the new
+			 * eth_type is valid and thus set actions could
+			 * write off the end of the packet or otherwise
+			 * corrupt it.
+			 *
+			 * Support for these actions is planned using packet
+			 * recirculation.
+			 */
+			eth_types_set(eth_types, htons(0));
+			break;
+		}
+
 		case OVS_ACTION_ATTR_SET:
-			err = validate_set(a, key, sfa, &skip_copy);
+			err = validate_set(a, key, sfa, &skip_copy,
+					   eth_types);
 			if (err)
 				return err;
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-			err = validate_and_copy_sample(a, key, depth, sfa);
+			err = validate_and_copy_sample(a, key, depth, sfa,
+						       eth_types);
 			if (err)
 				return err;
 			skip_copy = true;
@@ -1508,6 +1736,20 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 	return 0;
 }
 
+int ovs_nla_copy_actions(const struct nlattr *attr,
+			 const struct sw_flow_key *key,
+			 struct sw_flow_actions **sfa)
+{
+	struct eth_types eth_type = {
+		.start = 0,
+		.end = 0,
+		.cursor = 0,
+		.types = { key->eth.type, },
+	};
+
+	return ovs_nla_copy_actions__(attr, key, 0, sfa, &eth_type);
+}
+
 static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
 {
 	const struct nlattr *a;
diff --git a/datapath/flow_netlink.h b/datapath/flow_netlink.h
index 4401510..b471ece 100644
--- a/datapath/flow_netlink.h
+++ b/datapath/flow_netlink.h
@@ -49,7 +49,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 		      const struct nlattr *);
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key, int depth,
+			 const struct sw_flow_key *key,
 			 struct sw_flow_actions **sfa);
 int ovs_nla_put_actions(const struct nlattr *attr,
 			int len, struct sk_buff *skb);
diff --git a/datapath/linux/compat/gso.c b/datapath/linux/compat/gso.c
index 32f906c..43f0d16 100644
--- a/datapath/linux/compat/gso.c
+++ b/datapath/linux/compat/gso.c
@@ -19,6 +19,7 @@
 #include <linux/module.h>
 #include <linux/if.h>
 #include <linux/if_tunnel.h>
+#include <linux/if_vlan.h>
 #include <linux/icmp.h>
 #include <linux/in.h>
 #include <linux/ip.h>
@@ -35,6 +36,8 @@
 #include <net/xfrm.h>
 
 #include "gso.h"
+#include "mpls.h"
+#include "vlan.h"
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37) && \
 	!defined(HAVE_VLAN_BUG_WORKAROUND)
@@ -47,10 +50,12 @@ MODULE_PARM_DESC(vlan_tso, "Enable TSO for VLAN packets");
 #define vlan_tso true
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 static bool dev_supports_vlan_tx(struct net_device *dev)
 {
-#if defined(HAVE_VLAN_BUG_WORKAROUND)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+	return true;
+#elif defined(HAVE_VLAN_BUG_WORKAROUND)
 	return dev->features & NETIF_F_HW_VLAN_TX;
 #else
 	/* Assume that the driver is buggy. */
@@ -58,24 +63,64 @@ static bool dev_supports_vlan_tx(struct net_device *dev)
 #endif
 }
 
+/* Strictly this is not needed and will be optimised out
+ * as this code is guarded by if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0).
+ * It is here to make things explicit should the compatibility
+ * code be extended in some way prior extending its life-span
+ * beyond v3.11.
+ */
+static bool supports_mpls_gso(void)
+{
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0)
+	return true;
+#else
+	return false;
+#endif
+}
+
 int rpl_dev_queue_xmit(struct sk_buff *skb)
 {
 #undef dev_queue_xmit
 	int err = -ENOMEM;
+	bool vlan, mpls;
+
+	vlan = mpls = false;
+
+	if (eth_p_mpls(skb->protocol) && !supports_mpls_gso())
+		mpls = true;
+
+	if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev))
+		vlan = true;
 
-	if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) {
+	if (vlan || mpls) {
 		int features;
 
 		features = netif_skb_features(skb);
 
-		if (!vlan_tso)
-			features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
-				      NETIF_F_UFO | NETIF_F_FSO);
+		if (vlan) {
+			if (!vlan_tso)
+				features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
+					      NETIF_F_UFO | NETIF_F_FSO);
 
-		skb = __vlan_put_tag(skb, skb->vlan_proto, vlan_tx_tag_get(skb));
-		if (unlikely(!skb))
-			return err;
-		vlan_set_tci(skb, 0);
+			skb = __vlan_put_tag(skb, skb->vlan_proto,
+					     vlan_tx_tag_get(skb));
+			if (unlikely(!skb))
+				return err;
+			vlan_set_tci(skb, 0);
+		}
+
+		/* As of v3.11 the kernel provides an mpls_features field in
+		 * struct net_device which allows devices to advertise which
+		 * features its supports for MPLS. This value defaults to
+		 * NETIF_F_SG and as of v3.11.
+		 *
+		 * This compatibility code is intended for kernels older
+		 * than v3.11 that do not support MPLS GSO and thus do not
+		 * provide mpls_features. Thus this code uses NETIF_F_SG
+		 * directly in place of mpls_features.
+		 */
+		if (mpls)
+			features &= NETIF_F_SG;
 
 		if (netif_needs_gso(skb, features)) {
 			struct sk_buff *nskb;
@@ -114,13 +159,15 @@ drop:
 	kfree_skb(skb);
 	return err;
 }
-#endif /* kernel version < 2.6.37 */
 
 static __be16 __skb_network_protocol(struct sk_buff *skb)
 {
 	__be16 type = skb->protocol;
 	int vlan_depth = ETH_HLEN;
 
+	if (eth_p_mpls(skb->protocol))
+		type = ovs_skb_get_inner_protocol(skb);
+
 	while (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) {
 		struct vlan_hdr *vh;
 
@@ -134,6 +181,7 @@ static __be16 __skb_network_protocol(struct sk_buff *skb)
 
 	return type;
 }
+#endif /* kernel version < 3.11.0 */
 
 static struct sk_buff *tnl_skb_gso_segment(struct sk_buff *skb,
 					   netdev_features_t features,
diff --git a/datapath/linux/compat/gso.h b/datapath/linux/compat/gso.h
index 44fd213..d7a9cea 100644
--- a/datapath/linux/compat/gso.h
+++ b/datapath/linux/compat/gso.h
@@ -1,6 +1,7 @@
 #ifndef __LINUX_GSO_WRAPPER_H
 #define __LINUX_GSO_WRAPPER_H
 
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <net/protocol.h>
 
@@ -11,6 +12,9 @@ struct ovs_gso_cb {
 	sk_buff_data_t	inner_network_header;
 	sk_buff_data_t	inner_mac_header;
 	void (*fix_segment)(struct sk_buff *);
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+	__be16			inner_protocol;
+#endif
 };
 #define OVS_GSO_CB(skb) ((struct ovs_gso_cb *)(skb)->cb)
 
@@ -69,4 +73,41 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
 
 #define ip_local_out rpl_ip_local_out
 int ip_local_out(struct sk_buff *skb);
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+static inline void ovs_skb_init_inner_protocol(struct sk_buff *skb) {
+	OVS_GSO_CB(skb)->inner_protocol = htons(0);
+}
+
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+					      __be16 ethertype) {
+	OVS_GSO_CB(skb)->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+	return OVS_GSO_CB(skb)->inner_protocol;
+}
+
+#else
+
+static inline void ovs_skb_init_inner_protocol(struct sk_buff *skb) {
+	/* Nothing to do. The inner_protocol is either zero or
+	 * has been set to a value by another user.
+	 * Either way it may be considered initialised.
+	 */
+}
+
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+					      __be16 ethertype)
+{
+	skb->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+	return skb->inner_protocol;
+}
+#endif
+
 #endif
diff --git a/datapath/linux/compat/include/linux/netdevice.h b/datapath/linux/compat/include/linux/netdevice.h
index d282201..fc20905 100644
--- a/datapath/linux/compat/include/linux/netdevice.h
+++ b/datapath/linux/compat/include/linux/netdevice.h
@@ -73,10 +73,12 @@ static inline struct net_device *dev_get_by_index_rcu(struct net *net, int ifind
 #define NETIF_F_FSO 0
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 #define skb_gso_segment rpl_skb_gso_segment
 struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features);
+#endif
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
 #define netif_skb_features rpl_netif_skb_features
 u32 rpl_netif_skb_features(struct sk_buff *skb);
 
@@ -125,7 +127,7 @@ static inline struct net_device *netdev_master_upper_dev_get(struct net_device *
 }
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 #define dev_queue_xmit rpl_dev_queue_xmit
 int dev_queue_xmit(struct sk_buff *skb);
 #endif
diff --git a/datapath/linux/compat/netdevice.c b/datapath/linux/compat/netdevice.c
index 248066d..7c1f045 100644
--- a/datapath/linux/compat/netdevice.c
+++ b/datapath/linux/compat/netdevice.c
@@ -1,6 +1,9 @@
 #include <linux/netdevice.h>
 #include <linux/if_vlan.h>
 
+#include "mpls.h"
+#include "gso.h"
+
 #ifdef HAVE_RHEL_OVS_HOOK
 int nr_bridges = 0;
 #endif
@@ -71,7 +74,9 @@ u32 rpl_netif_skb_features(struct sk_buff *skb)
 		return harmonize_features(skb, protocol, features);
 	}
 }
+#endif	/* kernel version < 2.6.38 */
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features)
 {
 	int vlan_depth = ETH_HLEN;
@@ -79,6 +84,9 @@ struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features)
 	__be16 skb_proto;
 	struct sk_buff *skb_gso;
 
+	if (eth_p_mpls(skb->protocol))
+		type = ovs_skb_get_inner_protocol(skb);
+
 	while (type == htons(ETH_P_8021Q)) {
 		struct vlan_hdr *vh;
 
@@ -99,4 +107,4 @@ struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb, u32 features)
 	skb->protocol = skb_proto;
 	return skb_gso;
 }
-#endif	/* kernel version < 2.6.38 */
+#endif	/* kernel version < 3.11.0 */
diff --git a/datapath/mpls.h b/datapath/mpls.h
new file mode 100644
index 0000000..7eab104
--- /dev/null
+++ b/datapath/mpls.h
@@ -0,0 +1,15 @@
+#ifndef MPLS_H
+#define MPLS_H 1
+
+#include <linux/if_ether.h>
+
+#define MPLS_BOS_MASK	0x00000100
+#define MPLS_HLEN 4
+
+static inline bool eth_p_mpls(__be16 eth_type)
+{
+	return eth_type == htons(ETH_P_MPLS_UC) ||
+		eth_type == htons(ETH_P_MPLS_MC);
+}
+
+#endif
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index 09c26b5..1ef98a8 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -283,14 +283,13 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_SKB_MARK,  /* u32 skb mark */
 	OVS_KEY_ATTR_TUNNEL,	/* Nested set of ovs_tunnel attributes */
 	OVS_KEY_ATTR_SCTP,      /* struct ovs_key_sctp */
+	OVS_KEY_ATTR_MPLS,      /* array of struct ovs_key_mpls.
+				 * The implementation may restrict
+				 * the accepted length of the array. */
 
 #ifdef __KERNEL__
 	OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
 #endif
-
-	OVS_KEY_ATTR_MPLS = 62, /* array of struct ovs_key_mpls.
-				 * The implementation may restrict
-				 * the accepted length of the array. */
 	__OVS_KEY_ATTR_MAX
 };
 
-- 
1.8.4

^ permalink raw reply related

* Re: [PATCH net-next] {selinux, af_key} Rework pfkey_sadb2xfrm_user_sec_ctx
From: Fan Du @ 2013-10-17  1:34 UTC (permalink / raw)
  To: Paul Moore; +Cc: steffen.klassert, davem, netdev
In-Reply-To: <1541456.gnvYckRYYL@sifl>



On 2013年10月16日 23:15, Paul Moore wrote:
> On Wednesday, October 16, 2013 02:15:14 PM Fan Du wrote:
>> Taking advantages of sadb_x_sec_ctx and xfrm_user_sec_ctx share the same
>> structure arrangement, rework pfkey_sadb2xfrm_user_sec_ctx by casting
>> sadb_x_sec_ctx into xfrm_user_sec_ctx with minor len fix.
>>
>> Then we can:
>>   -Avoid kmalloc/free memory for xfrm_user_sec_ctx, sadb_x_sec_ctx would be
>>    fine.
>>   -Fix missing return value check bug for pfkey_compile_policy when
>>    kmalloc fails
>>
>> Signed-off-by: Fan Du<fan.du@windriver.com>
>> ---
>>   net/key/af_key.c |   33 +--------------------------------
>>   1 file changed, 1 insertion(+), 32 deletions(-)
>>
>> diff --git a/net/key/af_key.c b/net/key/af_key.c
>> index 9d58537..c7d304d 100644
>> --- a/net/key/af_key.c
>> +++ b/net/key/af_key.c
>> @@ -435,22 +435,9 @@ static inline int verify_sec_ctx_len(const void *p)
>>
>>   static inline struct xfrm_user_sec_ctx *pfkey_sadb2xfrm_user_sec_ctx(const
>> struct sadb_x_sec_ctx *sec_ctx) {
>> -	struct xfrm_user_sec_ctx *uctx = NULL;
>> -	int ctx_size = sec_ctx->sadb_x_ctx_len;
>> -
>> -	uctx = kmalloc((sizeof(*uctx)+ctx_size), GFP_KERNEL);
>> -
>> -	if (!uctx)
>> -		return NULL;
>> +	struct xfrm_user_sec_ctx *uctx = (struct xfrm_user_sec_ctx *)sec_ctx;
>>
>>   	uctx->len = pfkey_sec_ctx_len(sec_ctx);
>> -	uctx->exttype = sec_ctx->sadb_x_sec_exttype;
>> -	uctx->ctx_doi = sec_ctx->sadb_x_ctx_doi;
>> -	uctx->ctx_alg = sec_ctx->sadb_x_ctx_alg;
>> -	uctx->ctx_len = sec_ctx->sadb_x_ctx_len;
>> -	memcpy(uctx + 1, sec_ctx + 1,
>> -	       uctx->ctx_len);
>> -
>>   	return uctx;
>>   }
>
> The fact that you are now changing sadb_x_sec_ctx->sadb_x_sec_len whenever
> pfkey_sadb2xfrm_user_sec_ctx() is called raises an eyebrow.  Can you elaborate
> on why this is not a problem?
>
Thanks for your attention, Paul.

sadb_x_sec_ctx is extra headers passed down from user space, the usage of
of this data structure falls down to one of pfkey_funcs function only for
one time, more specifically speaking, it's only used by SELINUX for security
checking for each operation. In other words, sadb_x_sec_ctx involves with a
one shot business here. So the original codes seems do a lots of extra job
which could easily be avoid using casting operation.

-- 
浮沉随浪只记今朝笑

--fan

^ permalink raw reply

* patch "ssb: convert bus code to use dev_groups" added to driver-core tree
From: gregkh @ 2013-10-17  1:39 UTC (permalink / raw)
  To: gregkh, m, netdev


This is a note to let you know that I've just added the patch titled

    ssb: convert bus code to use dev_groups

to my driver-core git tree which can be found at
    git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-next branch.

The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)

The patch will also be merged in the next major kernel release
during the merge window.

If you have any questions about this process, please let me know.


>From 4e9072d61695abcee201591ba9515a28bcb3bd33 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Sun, 6 Oct 2013 23:55:48 -0700
Subject: ssb: convert bus code to use dev_groups

The dev_attrs field of struct bus_type is going away soon, dev_groups
should be used instead.  This converts the ssb bus code to use the
correct field.

Cc: Michael Buesch <m@bues.ch>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/ssb/main.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index e55ddf7c..32a811d1 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -374,7 +374,8 @@ static ssize_t \
 attrib##_show(struct device *dev, struct device_attribute *attr, char *buf) \
 { \
 	return sprintf(buf, format_string, dev_to_ssb_dev(dev)->field); \
-}
+} \
+static DEVICE_ATTR_RO(attrib);
 
 ssb_config_attr(core_num, core_index, "%u\n")
 ssb_config_attr(coreid, id.coreid, "0x%04x\n")
@@ -387,16 +388,18 @@ name_show(struct device *dev, struct device_attribute *attr, char *buf)
 	return sprintf(buf, "%s\n",
 		       ssb_core_name(dev_to_ssb_dev(dev)->id.coreid));
 }
-
-static struct device_attribute ssb_device_attrs[] = {
-	__ATTR_RO(name),
-	__ATTR_RO(core_num),
-	__ATTR_RO(coreid),
-	__ATTR_RO(vendor),
-	__ATTR_RO(revision),
-	__ATTR_RO(irq),
-	__ATTR_NULL,
+static DEVICE_ATTR_RO(name);
+
+static struct attribute *ssb_device_attrs[] = {
+	&dev_attr_name.attr,
+	&dev_attr_core_num.attr,
+	&dev_attr_coreid.attr,
+	&dev_attr_vendor.attr,
+	&dev_attr_revision.attr,
+	&dev_attr_irq.attr,
+	NULL,
 };
+ATTRIBUTE_GROUPS(ssb_device);
 
 static struct bus_type ssb_bustype = {
 	.name		= "ssb",
@@ -407,7 +410,7 @@ static struct bus_type ssb_bustype = {
 	.suspend	= ssb_device_suspend,
 	.resume		= ssb_device_resume,
 	.uevent		= ssb_device_uevent,
-	.dev_attrs	= ssb_device_attrs,
+	.dev_groups	= ssb_device_groups,
 };
 
 static void ssb_buses_lock(void)
-- 
1.8.4.3.gca3854a

^ permalink raw reply related

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
From: Hannes Frederic Sowa @ 2013-10-17  4:45 UTC (permalink / raw)
  To: Jon Mason, Jiri Pirko, netdev, yoshfuji, David Miller, kuznet,
	jmorris, kaber, herbert, Eric Dumazet
In-Reply-To: <20131008145331.GA4767@order.stressinduktion.org>

Hi Jon and Jiri!

Just wanted to remind you if you could have a look at this?

If you don't have time to test this may I know your assessment of the
situation? I could send a compile-time tested patch to disable UFO or if you
say so we could leave this as is.

Jiri, I would suggest you resend your patches then.

Thanks,

  Hannes

[top-posted by intention]

On Tue, Oct 08, 2013 at 04:53:31PM +0200, Hannes Frederic Sowa wrote:
> On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
> > On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:
> > > Hi!
> > >
> > > I have a question regarding UFO and the neterion driver, which as the only one
> > > advertises hardware UFO support:
> > >
> > > The patch discusses in this thread
> > > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > > some semantics how packets are constructed before submitted to the driver.
> > >
> > > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > > payload is attached in the skb's frags. With the changes discussed in this
> > > thread it is possible that we also append to skb->data some amount of data
> > > which is not targeted for the header. From reading the driver sources it seems
> > > the hardware interprets the skb->data to skb_headlen as the header, so we
> > > could include some data in the fragments more than once.
> > 
> > From my reading of the HW Spec and a quick look at the driver, it
> > appears that the driver is using one entry in the TX ring for the
> > header and another for the body of the packet to be fragmented (which
> > is what the hardware wants).  I don't understand what you are saying,
> > but if you are asking if simply appending a new header & data to the
> > end of skb->data will get it out on the wire correct, I don't believe
> > it will.
> 
> No this is not what I tried to say. I'll try to be more clear this
> time. ;)
> 
> We start with an UDP socket which is corked. As soon as we write the
> first few bytes (smaller than the mtu) onto this socket we put the
> header in place and the rest of the data is just appended behind the
> header directly in skb->data via plain ip_append_data.
> 
> Now a second write with a length > mtu happens: The ip(6)_append_data
> will branch to ufo_append. This will fetch the first skb and append
> to skb->frags.  gso_type and gso_size will be updated on this skb (this
> currently does not happen but will with the patches discussed in this
> thread).
> 
> If this packet is transmitted down to the device driver we have the udp
> header in skb->data *and* also the payload from the first write. The
> payload from the second write is appended as a frag and gso_type and
> gso_size are set. This header+payload seem to be mapped just after the
> ufo_in_band_v descriptor as the header in the first tx descriptor:
> 
>    4174         txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
>    4175                                               frg_len, PCI_DMA_TODEVICE);
> 
> frg_len is set to skb_headlen(skb). This happens right after setting up
> the descriptor for the in-band ufo data.
> 
> My guess is that this data isn't split currently by the neterion driver
> (at least I could not find it in the driver as Eric showed it for bnx2x)
> so it might reappear in the packets when the hardware fragments the
> packet and places the first tx ring in front of every packet.
> 
> Before these changes we never updated the gso_type and gso_size even when
> we did append via UFO. So we never had payload in an UFO marked skb->data,
> only the headers. Now we could also end up with a some payload in the
> first TX ring, which you said is only for the header.
> 
> > I do have hardware that I can try the patch on, if you can walk me
> > through the use case (unless it is as easy as setup an IPv6 connection
> > and ping).
> 
> Ok, testing this should not be that complicated:
> 
> We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
> with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/
> 
> --- >8 ---
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
> #include <arpa/inet.h>
> #include <linux/udp.h>
> #include <stdio.h>
> 
> int test(int mtu)
> {
>         int fd;
>         const int one = 1;
>         const int off = 0;
>         struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
>         unsigned char buffer[3701];
> 
>         inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
> 
>         fd = socket(AF_INET, SOCK_DGRAM, 0);
>         connect(fd, (struct sockaddr *) &addr, sizeof(addr));
> 
>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));
> 
>         write(fd, "    ", 4);
>         write(fd, buffer, sizeof(buffer));
>         write(fd, " ", 1);
> 
>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));
> 
>         close(fd);
> }
> 
> int main() {
>         test(1280);
> }
> --- >8 ---
> 
> I left out error handling so it is better observed with strace if
> something went wrong.
> 
> You should change the port number and ip address to something reasonable
> for your network. My guess would be that the spaces (0x20) of the first
> write is now placed between UDP header and payload of every packet
> fragmented by the hardware. Would be nice to hear that I am wrong. ;)
> 
> Be aware that the above program can cause memory corruption in the kernel
> if you did not apply Jiri's patch.
> 
> Thanks for helping!
> 
>   Hannes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
gruss,

  Hannes

^ permalink raw reply

* [PATCH iproute2 v2] vxlan: add ipv6 support
From: Cong Wang @ 2013-10-17  5:03 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Stephen Hemminger, Cong Wang

The kernel already supports it, so add the support
to iproute2 as well.
    
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

---
v2: use a helper

diff --git a/include/utils.h b/include/utils.h
index 24ff19f..a3e310e 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -151,6 +151,7 @@ int print_timestamp(FILE *fp);
 extern int cmdlineno;
 extern ssize_t getcmdline(char **line, size_t *len, FILE *in);
 extern int makeargs(char *line, char *argv[], int maxargs);
+extern int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6);
 
 struct iplink_req;
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 4304b0d..aa551d8 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -43,6 +43,9 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u32 saddr = 0;
 	__u32 gaddr = 0;
 	__u32 daddr = 0;
+	struct in6_addr saddr6 = IN6ADDR_ANY_INIT;
+	struct in6_addr gaddr6 = IN6ADDR_ANY_INIT;
+	struct in6_addr daddr6 = IN6ADDR_ANY_INIT;
 	unsigned link = 0;
 	__u8 tos = 0;
 	__u8 ttl = 0;
@@ -66,21 +69,30 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			vni_set = 1;
 		} else if (!matches(*argv, "group")) {
 			NEXT_ARG();
-			gaddr = get_addr32(*argv);
-
-			if (!IN_MULTICAST(ntohl(gaddr)))
+			if (!inet_get_addr(*argv, &gaddr, &gaddr6)) {
+				fprintf(stderr, "Invalid address \"%s\"\n", *argv);
+				return -1;
+			}
+			if (!IN6_IS_ADDR_MULTICAST(&gaddr6) && !IN_MULTICAST(ntohl(gaddr)))
 				invarg("invalid group address", *argv);
 		} else if (!matches(*argv, "remote")) {
 			NEXT_ARG();
-			daddr = get_addr32(*argv);
-
-			if (IN_MULTICAST(ntohl(daddr)))
+			if (!inet_get_addr(*argv, &daddr, &daddr6)) {
+				fprintf(stderr, "Invalid address \"%s\"\n", *argv);
+				return -1;
+			}
+			if (IN6_IS_ADDR_MULTICAST(&daddr6) || IN_MULTICAST(ntohl(daddr)))
 				invarg("invalid remote address", *argv);
 		} else if (!matches(*argv, "local")) {
 			NEXT_ARG();
-			if (strcmp(*argv, "any"))
-				saddr = get_addr32(*argv);
-			if (IN_MULTICAST(ntohl(saddr)))
+			if (strcmp(*argv, "any")) {
+				if (!inet_get_addr(*argv, &saddr, &saddr6)) {
+					fprintf(stderr, "Invalid address \"%s\"\n", *argv);
+					return -1;
+				}
+			}
+
+			if (IN_MULTICAST(ntohl(saddr)) || IN6_IS_ADDR_MULTICAST(&saddr6))
 				invarg("invalid local address", *argv);
 		} else if (!matches(*argv, "dev")) {
 			NEXT_ARG();
@@ -167,7 +179,9 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 		fprintf(stderr, "vxlan: missing virtual network identifier\n");
 		return -1;
 	}
-	if (gaddr && daddr) {
+	if ((gaddr && daddr) ||
+		(memcmp(&gaddr6, &in6addr_any, sizeof(gaddr6)) &&
+		 memcmp(&daddr6, &in6addr_any, sizeof(daddr6)))) {
 		fprintf(stderr, "vxlan: both group and remote cannot be specified\n");
 		return -1;
 	}
@@ -176,8 +190,16 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 		addattr_l(n, 1024, IFLA_VXLAN_GROUP, &gaddr, 4);
 	else if (daddr)
 		addattr_l(n, 1024, IFLA_VXLAN_GROUP, &daddr, 4);
+	if (memcmp(&gaddr6, &in6addr_any, sizeof(gaddr6)) != 0)
+		addattr_l(n, 1024, IFLA_VXLAN_GROUP6, &gaddr6, sizeof(struct in6_addr));
+	else if (memcmp(&daddr6, &in6addr_any, sizeof(daddr6)) != 0)
+		addattr_l(n, 1024, IFLA_VXLAN_GROUP6, &daddr6, sizeof(struct in6_addr));
+
 	if (saddr)
 		addattr_l(n, 1024, IFLA_VXLAN_LOCAL, &saddr, 4);
+	else if (memcmp(&saddr6, &in6addr_any, sizeof(saddr6)) != 0)
+		addattr_l(n, 1024, IFLA_VXLAN_LOCAL6, &saddr6, sizeof(struct in6_addr));
+
 	if (link)
 		addattr32(n, 1024, IFLA_VXLAN_LINK, link);
 	addattr8(n, 1024, IFLA_VXLAN_TTL, ttl);
@@ -229,6 +251,17 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 				fprintf(f, "remote %s ",
 					format_host(AF_INET, 4, &addr, s1, sizeof(s1)));
 		}
+	} else if (tb[IFLA_VXLAN_GROUP6]) {
+		struct in6_addr addr;
+		memcpy(&addr, RTA_DATA(tb[IFLA_VXLAN_GROUP6]), sizeof(struct in6_addr));
+		if (memcmp(&addr, &in6addr_any, sizeof(addr)) != 0) {
+			if (IN6_IS_ADDR_MULTICAST(&addr))
+				fprintf(f, "group %s ",
+					format_host(AF_INET6, sizeof(struct in6_addr), &addr, s1, sizeof(s1)));
+			else
+				fprintf(f, "remote %s ",
+					format_host(AF_INET6, sizeof(struct in6_addr), &addr, s1, sizeof(s1)));
+		}
 	}
 
 	if (tb[IFLA_VXLAN_LOCAL]) {
@@ -236,6 +269,12 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		if (addr)
 			fprintf(f, "local %s ",
 				format_host(AF_INET, 4, &addr, s1, sizeof(s1)));
+	} else if (tb[IFLA_VXLAN_LOCAL6]) {
+		struct in6_addr addr;
+		memcpy(&addr, RTA_DATA(tb[IFLA_VXLAN_LOCAL6]), sizeof(struct in6_addr));
+		if (memcmp(&addr, &in6addr_any, sizeof(addr)) != 0)
+			fprintf(f, "local %s ",
+				format_host(AF_INET6, sizeof(struct in6_addr), &addr, s1, sizeof(s1)));
 	}
 
 	if (tb[IFLA_VXLAN_LINK] &&
diff --git a/lib/utils.c b/lib/utils.c
index dae1b51..4e9c719 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -868,3 +868,11 @@ int makeargs(char *line, char *argv[], int maxargs)
 
 	return argc;
 }
+
+int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6)
+{
+	if (strchr(src, ':'))
+		return inet_pton(AF_INET6, src, dst6);
+	else
+		return inet_pton(AF_INET, src, dst);
+}

^ permalink raw reply related

* Re: [PATCH net V2 1/2] virtio-net: don't respond to cpu hotplug notifier if we're not ready
From: Michael S. Tsirkin @ 2013-10-17  5:07 UTC (permalink / raw)
  To: Rusty Russell; +Cc: netdev, linux-kernel, virtualization, Greg Kroah-Hartman
In-Reply-To: <8738o0aj9u.fsf@rustcorp.com.au>

On Thu, Oct 17, 2013 at 09:57:41AM +1030, Rusty Russell wrote:
> Jason Wang <jasowang@redhat.com> writes:
> > We're trying to re-configure the affinity unconditionally in cpu hotplug
> > callback. This may lead the issue during resuming from s3/s4 since
> >
> > - virt queues haven't been allocated at that time.
> > - it's unnecessary since thaw method will re-configure the affinity.
> >
> > Fix this issue by checking the config_enable and do nothing is we're not ready.
> >
> > The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
> > (virtio-net: reset virtqueue affinity when doing cpu hotplug).
> >
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > The patch is need for 3.8 and above.
> 
> Please put 'CC: stable@kernel.org # 3.8+' in the commit.

Not if this is going in through the net tree.

> 
> (The specification of the stable line is poor, but that seems to be one
> common method).
> 
> Cheers,
> Rusty.

^ permalink raw reply

* [PATCH net-next v3 0/9] Introduce support to lazy initialize mostly static keys
From: Hannes Frederic Sowa @ 2013-10-17  5:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Hi!

This series implements support for delaying the initialization of secret
keys, e.g. used for hashing, for as long as possible. This functionality
is implemented by a new macro, net_get_random_bytes.

I already used it to protect the socket hashes, the syncookie secret
(most important) and the tcp_fastopen secrets.

Changelog:
v2) Use static_keys in net_get_random_once to have as minimal impact to
    the fast-path as possible.
v3) added patch "static_key: WARN on usage before jump_label_init was called":
    Patch "x86/jump_label: expect default_nop if static_key gets enabled
    on boot-up" relaxes the checks for using static_key primitives before
    jump_label_init. So tighten them first.

Included patches:
 ipv4: split inet_ehashfn to hash functions per compilation unit
 ipv6: split inet6_ehashfn to hash functions per compilation unit
 static_key: WARN on usage before jump_label_init was called
 x86/jump_label: expect default_nop if static_key gets enabled on boot-up
 net: introduce new macro net_get_random_once
 inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_once
 inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
 tcp: switch tcp_fastopen key generation to net_get_random_once
 net: switch net_secret key generation to net_get_random_once

Diffstat:
 arch/x86/kernel/jump_label.c         | 25 ++++++++++++++++++-------
 include/linux/jump_label.h           | 10 ++++++++++
 include/linux/jump_label_ratelimit.h |  2 ++
 include/linux/net.h                  | 25 +++++++++++++++++++++++++
 include/net/inet6_hashtables.h       | 28 +++++++---------------------
 include/net/inet_sock.h              | 26 ++++++--------------------
 include/net/ipv6.h                   |  4 ++--
 include/net/tcp.h                    |  3 +--
 init/main.c                          |  7 +++++++
 kernel/jump_label.c                  |  5 +++++
 net/core/secure_seq.c                | 14 ++------------
 net/core/utils.c                     | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/af_inet.c                   | 27 ---------------------------
 net/ipv4/inet_hashtables.c           | 25 +++++++++++++++++++++++++
 net/ipv4/syncookies.c                | 15 +++++----------
 net/ipv4/sysctl_net_ipv4.c           |  5 +++++
 net/ipv4/tcp_fastopen.c              | 27 ++++++++++++++++-----------
 net/ipv4/udp.c                       | 20 ++++++++++++++++----
 net/ipv6/af_inet6.c                  |  5 -----
 net/ipv6/inet6_hashtables.c          | 33 +++++++++++++++++++++++++++++++++
 net/ipv6/syncookies.c                | 12 +++++++++---
 net/ipv6/udp.c                       | 31 +++++++++++++++++++++++++++----
 net/rds/connection.c                 | 12 +++++++++---
 23 files changed, 278 insertions(+), 131 deletions(-)

Greetings,

  Hannes

^ permalink raw reply

* [PATCH net-next v3 1/9] ipv4: split inet_ehashfn to hash functions per compilation unit
From: Hannes Frederic Sowa @ 2013-10-17  5:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Eric Dumazet, David S. Miller
In-Reply-To: <1381987923-1524-1-git-send-email-hannes@stressinduktion.org>

This duplicates a bit of code but let's us easily introduce
separate secret keys later. The separate compilation units are
ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet_sock.h    | 22 ++++++----------------
 net/ipv4/inet_hashtables.c | 21 +++++++++++++++++++++
 net/ipv4/udp.c             | 16 ++++++++++++----
 net/rds/connection.c       |  6 +++---
 4 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 06da91e..7a6c7f8 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -208,26 +208,16 @@ extern u32 inet_ehash_secret;
 extern u32 ipv6_hash_secret;
 void build_ehash_secret(void);
 
-static inline unsigned int inet_ehashfn(struct net *net,
-					const __be32 laddr, const __u16 lport,
-					const __be32 faddr, const __be16 fport)
+static inline unsigned int __inet_ehashfn(const __be32 laddr,
+					  const __u16 lport,
+					  const __be32 faddr,
+					  const __be16 fport,
+					  u32 initval)
 {
 	return jhash_3words((__force __u32) laddr,
 			    (__force __u32) faddr,
 			    ((__u32) lport) << 16 | (__force __u32)fport,
-			    inet_ehash_secret + net_hash_mix(net));
-}
-
-static inline int inet_sk_ehashfn(const struct sock *sk)
-{
-	const struct inet_sock *inet = inet_sk(sk);
-	const __be32 laddr = inet->inet_rcv_saddr;
-	const __u16 lport = inet->inet_num;
-	const __be32 faddr = inet->inet_daddr;
-	const __be16 fport = inet->inet_dport;
-	struct net *net = sock_net(sk);
-
-	return inet_ehashfn(net, laddr, lport, faddr, fport);
+			    initval);
 }
 
 static inline struct request_sock *inet_reqsk_alloc(struct request_sock_ops *ops)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index a4b66bb..18aa668 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -24,6 +24,27 @@
 #include <net/secure_seq.h>
 #include <net/ip.h>
 
+static unsigned int inet_ehashfn(struct net *net, const __be32 laddr,
+				 const __u16 lport, const __be32 faddr,
+				 const __be16 fport)
+{
+	return __inet_ehashfn(laddr, lport, faddr, fport,
+			      inet_ehash_secret + net_hash_mix(net));
+}
+
+
+static unsigned int inet_sk_ehashfn(const struct sock *sk)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+	const __be32 laddr = inet->inet_rcv_saddr;
+	const __u16 lport = inet->inet_num;
+	const __be32 faddr = inet->inet_daddr;
+	const __be16 fport = inet->inet_dport;
+	struct net *net = sock_net(sk);
+
+	return inet_ehashfn(net, laddr, lport, faddr, fport);
+}
+
 /*
  * Allocate and initialize a new local port bind bucket.
  * The bindhash mutex for snum's hash chain must be held here.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9f27bb8..b4437c7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -407,6 +407,14 @@ static inline int compute_score2(struct sock *sk, struct net *net,
 	return score;
 }
 
+static unsigned int udp_ehashfn(struct net *net, const __be32 laddr,
+				 const __u16 lport, const __be32 faddr,
+				 const __be16 fport)
+{
+	return __inet_ehashfn(laddr, lport, faddr, fport,
+			      inet_ehash_secret + net_hash_mix(net));
+}
+
 
 /* called with read_rcu_lock() */
 static struct sock *udp4_lib_lookup2(struct net *net,
@@ -430,8 +438,8 @@ begin:
 			badness = score;
 			reuseport = sk->sk_reuseport;
 			if (reuseport) {
-				hash = inet_ehashfn(net, daddr, hnum,
-						    saddr, sport);
+				hash = udp_ehashfn(net, daddr, hnum,
+						   saddr, sport);
 				matches = 1;
 			}
 		} else if (score == badness && reuseport) {
@@ -511,8 +519,8 @@ begin:
 			badness = score;
 			reuseport = sk->sk_reuseport;
 			if (reuseport) {
-				hash = inet_ehashfn(net, daddr, hnum,
-						    saddr, sport);
+				hash = udp_ehashfn(net, daddr, hnum,
+						   saddr, sport);
 				matches = 1;
 			}
 		} else if (score == badness && reuseport) {
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 642ad42..45e2366 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -52,9 +52,9 @@ static struct kmem_cache *rds_conn_slab;
 static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
 {
 	/* Pass NULL, don't need struct net for hash */
-	unsigned long hash = inet_ehashfn(NULL,
-					  be32_to_cpu(laddr), 0,
-					  be32_to_cpu(faddr), 0);
+	unsigned long hash = __inet_ehashfn(be32_to_cpu(laddr), 0,
+					    be32_to_cpu(faddr), 0,
+					    inet_ehash_secret);
 	return &rds_conn_hash[hash & RDS_CONNECTION_HASH_MASK];
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 2/9] ipv6: split inet6_ehashfn to hash functions per compilation unit
From: Hannes Frederic Sowa @ 2013-10-17  5:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Eric Dumazet, David S. Miller
In-Reply-To: <1381987923-1524-1-git-send-email-hannes@stressinduktion.org>

This patch splits the inet6_ehashfn into separate ones in
ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of
seperate secrets keys later.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet6_hashtables.h | 28 +++++++---------------------
 include/net/ipv6.h             |  4 ++--
 net/ipv6/inet6_hashtables.c    | 24 ++++++++++++++++++++++++
 net/ipv6/udp.c                 | 20 ++++++++++++++++----
 4 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index a105d1a..ae06135 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -28,28 +28,14 @@
 
 struct inet_hashinfo;
 
-static inline unsigned int inet6_ehashfn(struct net *net,
-				const struct in6_addr *laddr, const u16 lport,
-				const struct in6_addr *faddr, const __be16 fport)
+static inline unsigned int __inet6_ehashfn(const u32 lhash,
+				    const u16 lport,
+				    const u32 fhash,
+				    const __be16 fport,
+				    const u32 initval)
 {
-	u32 ports = (((u32)lport) << 16) | (__force u32)fport;
-
-	return jhash_3words((__force u32)laddr->s6_addr32[3],
-			    ipv6_addr_jhash(faddr),
-			    ports,
-			    inet_ehash_secret + net_hash_mix(net));
-}
-
-static inline int inet6_sk_ehashfn(const struct sock *sk)
-{
-	const struct inet_sock *inet = inet_sk(sk);
-	const struct in6_addr *laddr = &sk->sk_v6_rcv_saddr;
-	const struct in6_addr *faddr = &sk->sk_v6_daddr;
-	const __u16 lport = inet->inet_num;
-	const __be16 fport = inet->inet_dport;
-	struct net *net = sock_net(sk);
-
-	return inet6_ehashfn(net, laddr, lport, faddr, fport);
+	const u32 ports = (((u32)lport) << 16) | (__force u32)fport;
+	return jhash_3words(lhash, fhash, ports, initval);
 }
 
 int __inet6_hash(struct sock *sk, struct inet_timewait_sock *twp);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index fe1c7f6..a35055f 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -539,14 +539,14 @@ static inline u32 ipv6_addr_hash(const struct in6_addr *a)
 }
 
 /* more secured version of ipv6_addr_hash() */
-static inline u32 ipv6_addr_jhash(const struct in6_addr *a)
+static inline u32 __ipv6_addr_jhash(const struct in6_addr *a, const u32 initval)
 {
 	u32 v = (__force u32)a->s6_addr32[0] ^ (__force u32)a->s6_addr32[1];
 
 	return jhash_3words(v,
 			    (__force u32)a->s6_addr32[2],
 			    (__force u32)a->s6_addr32[3],
-			    ipv6_hash_secret);
+			    initval);
 }
 
 static inline bool ipv6_addr_loopback(const struct in6_addr *a)
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 842d833..fa7dd38 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -23,6 +23,30 @@
 #include <net/secure_seq.h>
 #include <net/ip.h>
 
+static unsigned int inet6_ehashfn(struct net *net,
+				  const struct in6_addr *laddr,
+				  const u16 lport,
+				  const struct in6_addr *faddr,
+				  const __be16 fport)
+{
+	const u32 lhash = (__force u32)laddr->s6_addr32[3];
+	const u32 fhash = __ipv6_addr_jhash(faddr, ipv6_hash_secret);
+	return __inet6_ehashfn(lhash, lport, fhash, fport,
+			       inet_ehash_secret + net_hash_mix(net));
+}
+
+static int inet6_sk_ehashfn(const struct sock *sk)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+	const struct in6_addr *laddr = &sk->sk_v6_rcv_saddr;
+	const struct in6_addr *faddr = &sk->sk_v6_daddr;
+	const __u16 lport = inet->inet_num;
+	const __be16 fport = inet->inet_dport;
+	struct net *net = sock_net(sk);
+
+	return inet6_ehashfn(net, laddr, lport, faddr, fport);
+}
+
 int __inet6_hash(struct sock *sk, struct inet_timewait_sock *tw)
 {
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b496de1..324bd36 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -53,6 +53,18 @@
 #include <trace/events/skb.h>
 #include "udp_impl.h"
 
+static unsigned int udp6_ehashfn(struct net *net,
+				  const struct in6_addr *laddr,
+				  const u16 lport,
+				  const struct in6_addr *faddr,
+				  const __be16 fport)
+{
+	const u32 lhash = (__force u32)laddr->s6_addr32[3];
+	const u32 fhash = __ipv6_addr_jhash(faddr, ipv6_hash_secret);
+	return __inet6_ehashfn(lhash, lport, fhash, fport,
+			       inet_ehash_secret + net_hash_mix(net));
+}
+
 int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2)
 {
 	const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2);
@@ -214,8 +226,8 @@ begin:
 			badness = score;
 			reuseport = sk->sk_reuseport;
 			if (reuseport) {
-				hash = inet6_ehashfn(net, daddr, hnum,
-						     saddr, sport);
+				hash = udp6_ehashfn(net, daddr, hnum,
+						    saddr, sport);
 				matches = 1;
 			} else if (score == SCORE2_MAX)
 				goto exact_match;
@@ -295,8 +307,8 @@ begin:
 			badness = score;
 			reuseport = sk->sk_reuseport;
 			if (reuseport) {
-				hash = inet6_ehashfn(net, daddr, hnum,
-						     saddr, sport);
+				hash = udp6_ehashfn(net, daddr, hnum,
+						    saddr, sport);
 				matches = 1;
 			}
 		} else if (score == badness && reuseport) {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 3/9] static_key: WARN on usage before jump_label_init was called
From: Hannes Frederic Sowa @ 2013-10-17  5:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Steven Rostedt, Peter Zijlstra, Andi Kleen
In-Reply-To: <1381987923-1524-1-git-send-email-hannes@stressinduktion.org>

Based on a patch from Andi Kleen.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/jump_label.h           | 10 ++++++++++
 include/linux/jump_label_ratelimit.h |  2 ++
 init/main.c                          |  7 +++++++
 kernel/jump_label.c                  |  5 +++++
 4 files changed, 24 insertions(+)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index a507907..e96be72 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -48,6 +48,13 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/bug.h>
+
+extern bool static_key_initialized;
+
+#define STATIC_KEY_CHECK_USE() WARN(!static_key_initialized,		      \
+				    "%s used before call to jump_label_init", \
+				    __func__)
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 
@@ -128,6 +135,7 @@ struct static_key {
 
 static __always_inline void jump_label_init(void)
 {
+	static_key_initialized = true;
 }
 
 static __always_inline bool static_key_false(struct static_key *key)
@@ -146,11 +154,13 @@ static __always_inline bool static_key_true(struct static_key *key)
 
 static inline void static_key_slow_inc(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	atomic_inc(&key->enabled);
 }
 
 static inline void static_key_slow_dec(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	atomic_dec(&key->enabled);
 }
 
diff --git a/include/linux/jump_label_ratelimit.h b/include/linux/jump_label_ratelimit.h
index 1137883..089f70f 100644
--- a/include/linux/jump_label_ratelimit.h
+++ b/include/linux/jump_label_ratelimit.h
@@ -23,12 +23,14 @@ struct static_key_deferred {
 };
 static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
 {
+	STATIC_KEY_CHECK_USE();
 	static_key_slow_dec(&key->key);
 }
 static inline void
 jump_label_rate_limit(struct static_key_deferred *key,
 		unsigned long rl)
 {
+	STATIC_KEY_CHECK_USE();
 }
 #endif	/* HAVE_JUMP_LABEL */
 #endif	/* _LINUX_JUMP_LABEL_RATELIMIT_H */
diff --git a/init/main.c b/init/main.c
index af310af..27bbec1a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -136,6 +136,13 @@ static char *execute_command;
 static char *ramdisk_execute_command;
 
 /*
+ * Used to generate warnings if static_key manipulation functions are used
+ * before jump_label_init is called.
+ */
+bool static_key_initialized __read_mostly = false;
+EXPORT_SYMBOL_GPL(static_key_initialized);
+
+/*
  * If set, this is an indication to the drivers that reset the underlying
  * device before going ahead with the initialization otherwise driver might
  * rely on the BIOS and skip the reset operation.
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 297a924..9019f15 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -58,6 +58,7 @@ static void jump_label_update(struct static_key *key, int enable);
 
 void static_key_slow_inc(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	if (atomic_inc_not_zero(&key->enabled))
 		return;
 
@@ -103,12 +104,14 @@ static void jump_label_update_timeout(struct work_struct *work)
 
 void static_key_slow_dec(struct static_key *key)
 {
+	STATIC_KEY_CHECK_USE();
 	__static_key_slow_dec(key, 0, NULL);
 }
 EXPORT_SYMBOL_GPL(static_key_slow_dec);
 
 void static_key_slow_dec_deferred(struct static_key_deferred *key)
 {
+	STATIC_KEY_CHECK_USE();
 	__static_key_slow_dec(&key->key, key->timeout, &key->work);
 }
 EXPORT_SYMBOL_GPL(static_key_slow_dec_deferred);
@@ -116,6 +119,7 @@ EXPORT_SYMBOL_GPL(static_key_slow_dec_deferred);
 void jump_label_rate_limit(struct static_key_deferred *key,
 		unsigned long rl)
 {
+	STATIC_KEY_CHECK_USE();
 	key->timeout = rl;
 	INIT_DELAYED_WORK(&key->work, jump_label_update_timeout);
 }
@@ -212,6 +216,7 @@ void __init jump_label_init(void)
 		key->next = NULL;
 #endif
 	}
+	static_key_initialized = true;
 	jump_label_unlock();
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 4/9] x86/jump_label: expect default_nop if static_key gets enabled on boot-up
From: Hannes Frederic Sowa @ 2013-10-17  5:31 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Steven Rostedt, Jason Baron, Peter Zijlstra, Eric Dumazet,
	David S. Miller, x86
In-Reply-To: <1381987923-1524-1-git-send-email-hannes@stressinduktion.org>

net_get_random_once(intrduced in the next patch) uses static_keys in
a way that they get enabled on boot-up instead of replaced with an
ideal_nop. So check for default_nop on initial enabling.

Other architectures don't check for this.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 arch/x86/kernel/jump_label.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index ee11b7d..26d5a55 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -42,15 +42,27 @@ static void __jump_label_transform(struct jump_entry *entry,
 				   int init)
 {
 	union jump_code_union code;
+	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
 	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 
 	if (type == JUMP_LABEL_ENABLE) {
-		/*
-		 * We are enabling this jump label. If it is not a nop
-		 * then something must have gone wrong.
-		 */
-		if (unlikely(memcmp((void *)entry->code, ideal_nop, 5) != 0))
-			bug_at((void *)entry->code, __LINE__);
+		if (init) {
+			/*
+			 * Jump label is enabled for the first time.
+			 * So we expect a default_nop...
+			 */
+			if (unlikely(memcmp((void *)entry->code, default_nop, 5)
+				     != 0))
+				bug_at((void *)entry->code, __LINE__);
+		} else {
+			/*
+			 * ...otherwise expect an ideal_nop. Otherwise
+			 * something went horribly wrong.
+			 */
+			if (unlikely(memcmp((void *)entry->code, ideal_nop, 5)
+				     != 0))
+				bug_at((void *)entry->code, __LINE__);
+		}
 
 		code.jump = 0xe9;
 		code.offset = entry->target -
@@ -63,7 +75,6 @@ static void __jump_label_transform(struct jump_entry *entry,
 		 * are converting the default nop to the ideal nop.
 		 */
 		if (init) {
-			const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
 			if (unlikely(memcmp((void *)entry->code, default_nop, 5) != 0))
 				bug_at((void *)entry->code, __LINE__);
 		} else {
-- 
1.8.3.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox