Netdev List

Netdev List
 help / color / mirror / Atom feed

* [net 7/7] net/mlx5: Properly deal with flow counters when deleting rules
From: Saeed Mahameed @ 2018-04-26 19:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Chris Mi, Jianbo Liu, Saeed Mahameed
In-Reply-To: <20180426195842.29665-1-saeedm@mellanox.com>

From: Chris Mi <chrism@mellanox.com>

When deleting a flow counter, the modify mask should be the action and
the flow counter. Otherwise the flow counter is not deleted and we'll
get a firmware warning when deleting the remaining destinations on the
same FTE.

It only happens in the presence of flow counter and multiple vport
destinations. If there is only one vport destination, there is no
need to update the FTE when deleting the only vport destination,
we just delete the FTE.

Fixes: ae05831424ed ("net/mlx5: Add option to add fwd rule with counter")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 2595c67ea39e..c39c1692e674 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -482,7 +482,8 @@ static void del_sw_hw_rule(struct fs_node *node)

 	if (rule->dest_attr.type == MLX5_FLOW_DESTINATION_TYPE_COUNTER  &&
 	    --fte->dests_size) {
-		modify_mask = BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_ACTION);
+		modify_mask = BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_ACTION) |
+			      BIT(MLX5_SET_FTE_MODIFY_ENABLE_MASK_FLOW_COUNTERS);
 		fte->action.action &= ~MLX5_FLOW_CONTEXT_ACTION_COUNT;
 		update_fte = true;
 		goto out;
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next v2 1/7] dt-bindings: net: add DT bindings for Microsemi MIIM
From: Alexandre Belloni @ 2018-04-26 19:59 UTC (permalink / raw)
  To: David S . Miller
  Cc: Allan Nielsen, razvan.stefanescu, po.liu, Thomas Petazzoni,
	Andrew Lunn, Florian Fainelli, netdev, devicetree, linux-kernel,
	linux-mips, Alexandre Belloni, Rob Herring
In-Reply-To: <20180426195931.5393-1-alexandre.belloni@bootlin.com>

DT bindings for the Microsemi MII Management Controller found on Microsemi
SoCs

Cc: Rob Herring <robh+dt@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
---
 .../devicetree/bindings/net/mscc-miim.txt     | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/mscc-miim.txt

diff --git a/Documentation/devicetree/bindings/net/mscc-miim.txt b/Documentation/devicetree/bindings/net/mscc-miim.txt
new file mode 100644
index 000000000000..7104679cf59d
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mscc-miim.txt
@@ -0,0 +1,26 @@
+Microsemi MII Management Controller (MIIM) / MDIO
+=================================================
+
+Properties:
+- compatible: must be "mscc,ocelot-miim"
+- reg: The base address of the MDIO bus controller register bank. Optionally, a
+  second register bank can be defined if there is an associated reset register
+  for internal PHYs
+- #address-cells: Must be <1>.
+- #size-cells: Must be <0>.  MDIO addresses have no size component.
+- interrupts: interrupt specifier (refer to the interrupt binding)
+
+Typically an MDIO bus might have several children.
+
+Example:
+	mdio@107009c {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "mscc,ocelot-miim";
+		reg = <0x107009c 0x36>, <0x10700f0 0x8>;
+		interrupts = <14>;
+
+		phy0: ethernet-phy@0 {
+			reg = <0>;
+		};
+	};
-- 
2.17.0

^ permalink raw reply related

* Re: [B.A.T.M.A.N.] [PATCH] batman-adv: fix batadv_interface_tx()'s return type
From: Luc Van Oostenryck @ 2018-04-26 20:05 UTC (permalink / raw)
  To: Sven Eckelmann
  Cc: b.a.t.m.a.n, linux-kernel, Marek Lindner, netdev,
	Antonio Quartulli
In-Reply-To: <2141097.l2ETxyu3Mo@sven-edge>

On Wed, Apr 25, 2018 at 07:35:00PM +0200, Sven Eckelmann wrote:
> On Dienstag, 24. April 2018 15:18:46 CEST Luc Van Oostenryck wrote:
> > The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
> > which is a typedef for an enum type, but the implementation in this
> > driver returns an 'int'.
> > 
> > Fix this by returning 'netdev_tx_t' in this driver too.
> > 
> > Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> > ---
> >  net/batman-adv/soft-interface.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Applied as 6bf9e4d39a58 [1] but the alignment was fixed

Thanks for this,
-- Luc 

^ permalink raw reply

* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Mikulas Patocka @ 2018-04-26 20:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: James Bottomley, Michal Hocko, David Rientjes, dm-devel,
	eric.dumazet, netdev, jasowang, Randy Dunlap, linux-kernel,
	Matthew Wilcox, linux-mm, edumazet, Andrew Morton, virtualization,
	David Miller, Vlastimil Babka
In-Reply-To: <20180426223925-mutt-send-email-mst@kernel.org>



On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 03:36:14PM -0400, Mikulas Patocka wrote:
> > People on this list argue "this should be a kernel parameter".
> 
> How about making it a writeable attribute, so it's easy to turn on/off
> after boot. Then you can keep it deterministic, userspace can play with
> the attribute at random if it wants to.
> 
> -- 
> MST

It is already controllable by an attribute in debugfs.

Will you email all the testers about this attribute? How many of them will 
remember to set it? How many of them will remember to set it a year after? 
Will you write a userspace program that manages it and introduce it into 
the distributon?

This is a little feature.

Mikulas

^ permalink raw reply

* Re: [PATCH] net: aquantia: fix aq_ndev_start_xmit()'s return type
From: Luc Van Oostenryck @ 2018-04-26 20:37 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, igor.russkikh, netdev
In-Reply-To: <20180424.104250.1411072442966778574.davem@davemloft.net>

On Tue, Apr 24, 2018 at 10:42:50AM -0400, David Miller wrote:
> 
> Luc please don't submit such a huge number of patches all at one time.
> 
> ...
> 
> Finally, make this a true patch series.  It is so much easier for
> maintainers to work with a set of changes all doing the same thing if
> you make them a proper patch series with an appropriate "[PATCH 0/N] ..."
> header posting.
> 
> Thank you.

I suppose these sort of patches are as much a PITA for the sender
than for the receivers.

I hesitated between a single patch, a series or separated patches.
In a sense, the single patch would have been the easier for both sides
but I guessed it would not have been very well welcomed. Since for a
series, you're supposed to CC the whole series to everyone involved,
it would have been, or at least at thought so, maximaly noisy for no
good reasons. Finally, as all of these patches are totally independent,
I thought it would be the best to send them as separated patches, 
each drivers maintainers being then free to accept, reject or ignore
the patch(es) concerning him/her. It seems it was a bad guess, and
yes, I see the point of having a series for this.

I'll remember all this for the next time (if next time there is,
of course, I was already quite hesitant to spend time to prepare
and send patches for these issues with enum/integer mix-up).

Sorry for the annoyance,
-- Luc

^ permalink raw reply

* [PATCH] net/mlx5: report persistent netdev stats across ifdown/ifup commands
From: Qing Huang @ 2018-04-26 20:37 UTC (permalink / raw)
  To: linux-kernel, linux-rdma, netdev; +Cc: leon, matanb, saeedm, Qing Huang

Current stats collecting scheme in mlx5 driver is to periodically fetch
aggregated stats from all the active mlx5 software channels associated
with the device. However when a mlx5 interface is brought down(ifdown),
all the channels will be deactivated and closed. A new set of channels
will be created when next ifup command or a similar command is called.
Unfortunately the new channels will have all stats reset to 0. So you
lose the accumulated stats information. This behavior is different from
other netdev drivers including the mlx4 driver. In order to fix it, we
now save prior mlx5 software stats into netdev stats fields, so all the
accumulated stats will survive multiple runs of ifdown/ifup commands and
be shown correctly.

Orabug: 27548610

Signed-off-by: Qing Huang <qing.huang@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 30 +++++++++++++++++++----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index f1fe490..5d50e69 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2621,6 +2621,23 @@ static void mlx5e_netdev_set_tcs(struct net_device *netdev)
 		netdev_set_tc_queue(netdev, tc, nch, 0);
 }
 
+static void mlx5e_netdev_save_stats(struct mlx5e_priv *priv)
+{
+	struct net_device *netdev = priv->netdev;
+
+	netdev->stats.rx_packets += priv->stats.sw.rx_packets;
+	netdev->stats.rx_bytes   += priv->stats.sw.rx_bytes;
+	netdev->stats.tx_packets += priv->stats.sw.tx_packets;
+	netdev->stats.tx_bytes   += priv->stats.sw.tx_bytes;
+	netdev->stats.tx_dropped += priv->stats.sw.tx_queue_dropped;
+
+	priv->stats.sw.rx_packets	= 0;
+	priv->stats.sw.rx_bytes		= 0;
+	priv->stats.sw.tx_packets	= 0;
+	priv->stats.sw.tx_bytes		= 0;
+	priv->stats.sw.tx_queue_dropped = 0;
+}
+
 static void mlx5e_build_channels_tx_maps(struct mlx5e_priv *priv)
 {
 	struct mlx5e_channel *c;
@@ -2691,6 +2708,7 @@ void mlx5e_switch_priv_channels(struct mlx5e_priv *priv,
 		netif_set_real_num_tx_queues(netdev, new_num_txqs);
 
 	mlx5e_deactivate_priv_channels(priv);
+	mlx5e_netdev_save_stats(priv);
 	mlx5e_close_channels(&priv->channels);
 
 	priv->channels = *new_chs;
@@ -2770,6 +2788,7 @@ int mlx5e_close_locked(struct net_device *netdev)
 
 	netif_carrier_off(priv->netdev);
 	mlx5e_deactivate_priv_channels(priv);
+	mlx5e_netdev_save_stats(priv);
 	mlx5e_close_channels(&priv->channels);
 
 	return 0;
@@ -3215,11 +3234,12 @@ static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 		stats->tx_packets = PPORT_802_3_GET(pstats, a_frames_transmitted_ok);
 		stats->tx_bytes   = PPORT_802_3_GET(pstats, a_octets_transmitted_ok);
 	} else {
-		stats->rx_packets = sstats->rx_packets;
-		stats->rx_bytes   = sstats->rx_bytes;
-		stats->tx_packets = sstats->tx_packets;
-		stats->tx_bytes   = sstats->tx_bytes;
-		stats->tx_dropped = sstats->tx_queue_dropped;
+		stats->rx_packets = sstats->rx_packets + dev->stats.rx_packets;
+		stats->rx_bytes   = sstats->rx_bytes + dev->stats.rx_bytes;
+		stats->tx_packets = sstats->tx_packets + dev->stats.tx_packets;
+		stats->tx_bytes   = sstats->tx_bytes + dev->stats.tx_bytes;
+		stats->tx_dropped = sstats->tx_queue_dropped +
+				    dev->stats.tx_dropped;
 	}
 
 	stats->rx_dropped = priv->stats.qcnt.rx_out_of_buffer;
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net-next v2 2/7] net: mscc: Add MDIO driver
From: Andrew Lunn @ 2018-04-26 20:46 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: David S . Miller, Allan Nielsen, razvan.stefanescu, po.liu,
	Thomas Petazzoni, Florian Fainelli, netdev, devicetree,
	linux-kernel, linux-mips
In-Reply-To: <20180426195931.5393-3-alexandre.belloni@bootlin.com>

On Thu, Apr 26, 2018 at 09:59:26PM +0200, Alexandre Belloni wrote:
> Add a driver for the Microsemi MII Management controller (MIIM) found on
> Microsemi SoCs.
> On Ocelot, there are two controllers, one is connected to the internal
> PHYs, the other one can communicate with external PHYs.
> 
> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCHv3 3/3] tools bpftool: Display license GPL compatible in prog show/list
From: Daniel Borkmann @ 2018-04-26 20:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jakub Kicinski, Jiri Olsa, Alexei Starovoitov, lkml, netdev,
	Quentin Monnet
In-Reply-To: <20180426081801.GK3396@krava>

On 04/26/2018 10:18 AM, Jiri Olsa wrote:
[...]
> v3 of the last patch attached, the branch is also updated
> 
> thanks,
> jirka
> 
> 
> ---
> Display the license "gpl" string in bpftool prog command, like:
> 
>   # bpftool prog list
>   5: tracepoint  name func  tag 57cd311f2e27366b  gpl
>           loaded_at Apr 26/09:37  uid 0
>           xlated 16B  not jited  memlock 4096B
> 
>   # bpftool --json --pretty prog show
>   [{
>           "id": 5,
>           "type": "tracepoint",
>           "name": "func",
>           "tag": "57cd311f2e27366b",
>           "gpl_compatible": true,
>           "loaded_at": "Apr 26/09:37",
>           "uid": 0,
>           "bytes_xlated": 16,
>           "jited": false,
>           "bytes_memlock": 4096
>       }
>   ]
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Ok, v2 from prior two patches and v3 of this one applied to bpf-next. Please
next time always submit a fresh new series at once, thanks Jiri.

^ permalink raw reply

* Proposal
From: MS Zeliha Omer Faruk @ 2018-04-26 20:39 UTC (permalink / raw)





Hello

   Greetings to you today i asked before but i did't get a response please
i know this might come to you as a surprise because you do not know me
personally i have a business proposal for you please reply for more
info.



Best Regards,

Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
 Sisli - Istanbul, Turkey

^ permalink raw reply

* Re: [PATCH net-next v2 5/7] MIPS: mscc: Add switch to ocelot
From: Andrew Lunn @ 2018-04-26 20:51 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: David S . Miller, Allan Nielsen, razvan.stefanescu, po.liu,
	Thomas Petazzoni, Florian Fainelli, netdev, devicetree,
	linux-kernel, linux-mips, James Hogan
In-Reply-To: <20180426195931.5393-6-alexandre.belloni@bootlin.com>

On Thu, Apr 26, 2018 at 09:59:29PM +0200, Alexandre Belloni wrote:
> Ocelot has an integrated switch, add support for it.
> 
> Cc: James Hogan <jhogan@kernel.org>
> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
> ---
>  arch/mips/boot/dts/mscc/ocelot.dtsi | 88 +++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
> 
> diff --git a/arch/mips/boot/dts/mscc/ocelot.dtsi b/arch/mips/boot/dts/mscc/ocelot.dtsi
> index dd239cab2f9d..4f33dbc67348 100644
> --- a/arch/mips/boot/dts/mscc/ocelot.dtsi
> +++ b/arch/mips/boot/dts/mscc/ocelot.dtsi
> @@ -91,6 +91,72 @@
>  			status = "disabled";
>  		};
>  
> +		switch@1010000 {
> +			compatible = "mscc,vsc7514-switch";
> +			reg = <0x1010000 0x10000>,
> +			      <0x1030000 0x10000>,
> +			      <0x1080000 0x100>,
> +			      <0x10d0000 0x10000>,
> +			      <0x11e0000 0x100>,
> +			      <0x11f0000 0x100>,
> +			      <0x1200000 0x100>,
> +			      <0x1210000 0x100>,
> +			      <0x1220000 0x100>,
> +			      <0x1230000 0x100>,
> +			      <0x1240000 0x100>,
> +			      <0x1250000 0x100>,
> +			      <0x1260000 0x100>,
> +			      <0x1270000 0x100>,
> +			      <0x1280000 0x100>,
> +			      <0x1800000 0x80000>,
> +			      <0x1880000 0x10000>;
> +			reg-names = "sys", "rew", "qs", "hsio", "port0",
> +				    "port1", "port2", "port3", "port4", "port5",
> +				    "port6", "port7", "port8", "port9", "port10",
> +				    "qsys", "ana";
> +			interrupts = <21 22>;
> +			interrupt-names = "xtr", "inj";
> +
> +			ethernet-ports {
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +
> +				port0: port@0 {
> +					reg = <0>;
> +				};
> +				port1: port@1 {
> +					reg = <1>;
> +				};
> +				port2: port@2 {
> +					reg = <2>;
> +				};
> +				port3: port@3 {
> +					reg = <3>;
> +				};
> +				port4: port@4 {
> +					reg = <4>;
> +				};
> +				port5: port@5 {
> +					reg = <5>;
> +				};
> +				port6: port@6 {
> +					reg = <6>;
> +				};
> +				port7: port@7 {
> +					reg = <7>;
> +				};
> +				port8: port@8 {
> +					reg = <8>;
> +				};
> +				port9: port@9 {
> +					reg = <9>;
> +				};
> +				port10: port@10 {
> +					reg = <10>;
> +				};
> +			};
> +		};
> +
>  		reset@1070008 {
>  			compatible = "mscc,ocelot-chip-reset";
>  			reg = <0x1070008 0x4>;
> @@ -113,5 +179,27 @@
>  				function = "uart2";
>  			};
>  		};
> +
> +		mdio0: mdio@107009c {
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +			compatible = "mscc,ocelot-miim";
> +			reg = <0x107009c 0x36>, <0x10700f0 0x8>;
> +			interrupts = <14>;
> +			status = "disabled";
> +
> +			phy0: ethernet-phy@0 {
> +				reg = <0>;
> +			};
> +			phy1: ethernet-phy@1 {
> +				reg = <1>;
> +			};
> +			phy2: ethernet-phy@2 {
> +				reg = <2>;
> +			};
> +			phy3: ethernet-phy@3 {
> +				reg = <3>;
> +			};

Hi Alexandre

These are internal PHYs? Is there an option to use external PHYs for
the ports which have internal PHYs?

I'm just wondering if they should be linked together by default. Or a
comment added to the commit message about why they are not linked
together here.

	 Andrew

^ permalink raw reply

* Re: [bpf PATCH] bpf: fix uninitialized variable in bpf tools
From: Daniel Borkmann @ 2018-04-26 20:55 UTC (permalink / raw)
  To: John Fastabend, ast, jbenc; +Cc: netdev
In-Reply-To: <20180425220852.10403.79675.stgit@john-Precision-Tower-5810>

On 04/26/2018 12:08 AM, John Fastabend wrote:
> Here the variable cont is used as the saved_pointer for a call to
> strtok_r(). It is safe to use the value uninitialized in this
> context however and the later reference is only ever used if
> the strtok_r is successful. But, 'gcc-5' at least doesn't have all
> this knowledge so initialize cont to NULL. Additionally, do the
> natural NULL check before accessing just for completness.
> 
> The warning is the following:
> 
> ./bpf/tools/bpf/bpf_dbg.c: In function ‘cmd_load’:
> ./bpf/tools/bpf/bpf_dbg.c:1077:13: warning: ‘cont’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   } else if (matches(subcmd, "pcap") == 0) {
> 
> Fixes: fd981e3c321a "filter: bpf_dbg: add minimal bpf debugger"
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Applied to bpf tree, thanks John!

^ permalink raw reply

* [PATCHv2 bpf-next 0/2] BPF tunnel testsuite
From: William Tu @ 2018-04-26 21:01 UTC (permalink / raw)
  To: netdev

The patch series provide end-to-end eBPF tunnel testsute.  A common topology
is created below for all types of tunnels:

Topology:                                                                     
---------                                                                     
     root namespace   |     at_ns0 namespace                                   
                      |                                                        
      -----------     |     -----------                                        
      | tnl dev |     |     | tnl dev |  (overlay network)                     
      -----------     |     -----------                                        
      metadata-mode   |     native-mode                                        
       with bpf       |                                                        
                      |                                                        
      ----------      |     ----------                                         
      |  veth1  | --------- |  veth0  |  (underlay network)                    
      ----------    peer    ----------                                         
	                                                                              
                                                                               
Device Configuration                                                          
--------------------                                                          
 Root namespace with metadata-mode tunnel + BPF                                
 Device names and addresses:                                                   
       veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)                         
       tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)              
                                                                               
 Namespace at_ns0 with native tunnel                                           
 Device names and addresses:                                                   
       veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)                       
       tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)              
                                                                               
                                                                               
End-to-end ping packet flow                                                   
---------------------------                                                   
 Most of the tests start by namespace creation, device configuration,          
 then ping the underlay and overlay network.  When doing 'ping 10.1.1.100'     
 from root namespace, the following operations happen:                         
 1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.       
 2) Tnl device's egress BPF program is triggered and set the tunnel metadata,  
    with remote_ip=172.16.1.200 and others.                                    
 3) Outer tunnel header is prepended and route the packet to veth1's egress    
 4) veth0's ingress queue receive the tunneled packet at namespace at_ns0      
 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet                   
 6) Forward the packet to the overlay tnl dev                                  

Test Cases
-----------------------------
 Tunnel Type |  BPF Programs
-----------------------------
 GRE:          gre_set_tunnel, gre_get_tunnel
 IP6GRE:       ip6gretap_set_tunnel, ip6gretap_get_tunnel
 ERSPAN:       erspan_set_tunnel, erspan_get_tunnel
 IP6ERSPAN:    ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
 VXLAN:        vxlan_set_tunnel, vxlan_get_tunnel
 IP6VXLAN:     ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
 GENEVE:       geneve_set_tunnel, geneve_get_tunnel
 IP6GENEVE:    ip6geneve_set_tunnel, ip6geneve_get_tunnel
 IPIP:         ipip_set_tunnel, ipip_get_tunnel
 IP6IP:        ipip6_set_tunnel, ipip6_get_tunnel,
               ip6ip6_set_tunnel, ip6ip6_get_tunnel
 XFRM:         xfrm_get_state

William Tu (2):
  selftests/bpf: bpf tunnel test.
  samples/bpf: remove the bpf tunnel testsuite.

 samples/bpf/Makefile                           |   1 -
 samples/bpf/tcbpf2_kern.c                      | 612 ---------------------
 samples/bpf/test_tunnel_bpf.sh                 | 390 -------------
 tools/testing/selftests/bpf/Makefile           |   5 +-
 tools/testing/selftests/bpf/test_tunnel.sh     | 729 +++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_tunnel_kern.c | 713 ++++++++++++++++++++++++
 6 files changed, 1445 insertions(+), 1005 deletions(-)
 delete mode 100644 samples/bpf/tcbpf2_kern.c
 delete mode 100755 samples/bpf/test_tunnel_bpf.sh
 create mode 100755 tools/testing/selftests/bpf/test_tunnel.sh
 create mode 100644 tools/testing/selftests/bpf/test_tunnel_kern.c

-- 
2.7.4

^ permalink raw reply

* [PATCHv2 bpf-next 1/2] selftests/bpf: bpf tunnel test.
From: William Tu @ 2018-04-26 21:01 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1524776500-27030-1-git-send-email-u9012063@gmail.com>

The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
and samples/bpf/test_tunnel_bpf.sh to selftests.  There are a couple
changes from the original:
    1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
    2) simplify the original ipip tests (remove iperf tests)
    3) improve documentation
    4) use bpf_ntoh* and bpf_hton* api

In summary, 'test_tunnel_kern.o' contains the following bpf program:
  GRE: gre_set_tunnel, gre_get_tunnel
  IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
  ERSPAN: erspan_set_tunnel, erspan_get_tunnel
  IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
  VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
  IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
  GENEVE: geneve_set_tunnel, geneve_get_tunnel
  IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
  IPIP: ipip_set_tunnel, ipip_get_tunnel
  IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
         ip6ip6_set_tunnel, ip6ip6_get_tunnel
  XFRM: xfrm_get_state

Signed-off-by: William Tu <u9012063@gmail.com>
---
 tools/testing/selftests/bpf/Makefile           |   5 +-
 tools/testing/selftests/bpf/test_tunnel.sh     | 729 +++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_tunnel_kern.c | 713 ++++++++++++++++++++++++
 3 files changed, 1445 insertions(+), 2 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_tunnel.sh
 create mode 100644 tools/testing/selftests/bpf/test_tunnel_kern.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 0c19d5e08f08..b64a7a39cbc8 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -32,7 +32,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
 	sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
-	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o
+	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -40,7 +40,8 @@ TEST_PROGS := test_kmod.sh \
 	test_xdp_redirect.sh \
 	test_xdp_meta.sh \
 	test_offload.py \
-	test_sock_addr.sh
+	test_sock_addr.sh \
+	test_tunnel.sh
 
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr
diff --git a/tools/testing/selftests/bpf/test_tunnel.sh b/tools/testing/selftests/bpf/test_tunnel.sh
new file mode 100755
index 000000000000..aeb2901f21f4
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tunnel.sh
@@ -0,0 +1,729 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# End-to-end eBPF tunnel test suite
+#   The script tests BPF network tunnel implementation.
+#
+# Topology:
+# ---------
+#     root namespace   |     at_ns0 namespace
+#                      |
+#      -----------     |     -----------
+#      | tnl dev |     |     | tnl dev |  (overlay network)
+#      -----------     |     -----------
+#      metadata-mode   |     native-mode
+#       with bpf       |
+#                      |
+#      ----------      |     ----------
+#      |  veth1  | --------- |  veth0  |  (underlay network)
+#      ----------    peer    ----------
+#
+#
+# Device Configuration
+# --------------------
+# Root namespace with metadata-mode tunnel + BPF
+# Device names and addresses:
+# 	veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)
+# 	tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)
+#
+# Namespace at_ns0 with native tunnel
+# Device names and addresses:
+# 	veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
+# 	tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)
+#
+#
+# End-to-end ping packet flow
+# ---------------------------
+# Most of the tests start by namespace creation, device configuration,
+# then ping the underlay and overlay network.  When doing 'ping 10.1.1.100'
+# from root namespace, the following operations happen:
+# 1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.
+# 2) Tnl device's egress BPF program is triggered and set the tunnel metadata,
+#    with remote_ip=172.16.1.200 and others.
+# 3) Outer tunnel header is prepended and route the packet to veth1's egress
+# 4) veth0's ingress queue receive the tunneled packet at namespace at_ns0
+# 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet
+# 6) Forward the packet to the overlay tnl dev
+
+PING_ARG="-c 3 -w 10 -q"
+ret=0
+GREEN='\033[0;92m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+config_device()
+{
+	ip netns add at_ns0
+	ip link add veth0 type veth peer name veth1
+	ip link set veth0 netns at_ns0
+	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip link set dev veth1 up mtu 1500
+	ip addr add dev veth1 172.16.1.200/24
+}
+
+add_gre_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+        ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE key 2 external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6gretap_tunnel()
+{
+
+	# assign ipv6 address
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
+		local ::11 remote ::22
+
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip addr add dev $DEV fc80::200/24
+	ip link set dev $DEV up
+}
+
+add_erspan_tunnel()
+{
+	# at_ns0 namespace
+	if [ "$1" == "v1" ]; then
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200 \
+		erspan_ver 1 erspan 123
+	else
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200 \
+		erspan_ver 2 erspan_dir egress erspan_hwid 3
+	fi
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6erspan_tunnel()
+{
+
+	# assign ipv6 address
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	if [ "$1" == "v1" ]; then
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local ::11 remote ::22 \
+		erspan_ver 1 erspan 123
+	else
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local ::11 remote ::22 \
+		erspan_ver 2 erspan_dir egress erspan_hwid 7
+	fi
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_vxlan_tunnel()
+{
+	# Set static ARP entry here because iptables set-mark works
+	# on L3 packet, as a result not applying to ARP packets,
+	# causing errors at get_tunnel_{key/opt}.
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		id 2 dstport 4789 gbp remote 172.16.1.200
+	ip netns exec at_ns0 \
+		ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
+	ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external gbp dstport 4789
+	ip link set dev $DEV address 52:54:00:d9:02:00 up
+	ip addr add dev $DEV 10.1.1.200/24
+	arp -s 10.1.1.100 52:54:00:d9:01:00
+}
+
+add_ip6vxlan_tunnel()
+{
+	#ip netns exec at_ns0 ip -4 addr del 172.16.1.100 dev veth0
+	ip netns exec at_ns0 ip -6 addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	#ip -4 addr del 172.16.1.200 dev veth1
+	ip -6 addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 22 dstport 4789 \
+		local ::11 remote ::22
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external dstport 4789
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_geneve_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		id 2 dstport 6081 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE dstport 6081 external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6geneve_tunnel()
+{
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 22 \
+		remote ::22     # geneve has no local option
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_ipip_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		local 172.16.1.100 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ipip6tnl_tunnel()
+{
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		local ::11 remote ::22
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+test_gre()
+{
+	TYPE=gretap
+	DEV_NS=gretap00
+	DEV=gretap11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_gre_tunnel
+	attach_bpf $DEV gre_set_tunnel gre_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+        if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gre()
+{
+	TYPE=ip6gre
+	DEV_NS=ip6gre00
+	DEV=ip6gre11
+	ret=0
+
+	check $TYPE
+	config_device
+	# reuse the ip6gretap function
+	add_ip6gretap_tunnel
+	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# overlay: ipv4 over ipv6
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	# overlay: ipv6 over ipv6
+	ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+	check_err $?
+	cleanup
+
+        if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gretap()
+{
+	TYPE=ip6gretap
+	DEV_NS=ip6gretap00
+	DEV=ip6gretap11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6gretap_tunnel
+	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# overlay: ipv4 over ipv6
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	# overlay: ipv6 over ipv6
+	ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_erspan()
+{
+	TYPE=erspan
+	DEV_NS=erspan00
+	DEV=erspan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_erspan_tunnel $1
+	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6erspan()
+{
+	TYPE=ip6erspan
+	DEV_NS=ip6erspan00
+	DEV=ip6erspan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6erspan_tunnel $1
+	attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
+	ping6 $PING_ARG ::11
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_vxlan()
+{
+	TYPE=vxlan
+	DEV_NS=vxlan00
+	DEV=vxlan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_vxlan_tunnel
+	attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6vxlan()
+{
+	TYPE=vxlan
+	DEV_NS=ip6vxlan00
+	DEV=ip6vxlan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6vxlan_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ip6vxlan_set_tunnel ip6vxlan_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# ip4 over ip6
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_geneve()
+{
+	TYPE=geneve
+	DEV_NS=geneve00
+	DEV=geneve11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_geneve_tunnel
+	attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6geneve()
+{
+	TYPE=geneve
+	DEV_NS=ip6geneve00
+	DEV=ip6geneve11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6geneve_tunnel
+	attach_bpf $DEV ip6geneve_set_tunnel ip6geneve_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_ipip()
+{
+	TYPE=ipip
+	DEV_NS=ipip00
+	DEV=ipip11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ipip_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ipip6()
+{
+	TYPE=ip6tnl
+	DEV_NS=ipip6tnl00
+	DEV=ipip6tnl11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ipip6tnl_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ipip6_set_tunnel ipip6_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# ip4 over ip6
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+setup_xfrm_tunnel()
+{
+	auth=0x$(printf '1%.0s' {1..40})
+	enc=0x$(printf '2%.0s' {1..32})
+	spi_in_to_out=0x1
+	spi_out_to_in=0x2
+	# at_ns0 namespace
+	# at_ns0 -> root
+	ip netns exec at_ns0 \
+		ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
+			spi $spi_in_to_out reqid 1 mode tunnel \
+			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
+	ip netns exec at_ns0 \
+		ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir out \
+		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
+		mode tunnel
+	# root -> at_ns0
+	ip netns exec at_ns0 \
+		ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
+			spi $spi_out_to_in reqid 2 mode tunnel \
+			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
+	ip netns exec at_ns0 \
+		ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir in \
+		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
+		mode tunnel
+	# address & route
+	ip netns exec at_ns0 \
+		ip addr add dev veth0 10.1.1.100/32
+	ip netns exec at_ns0 \
+		ip route add 10.1.1.200 dev veth0 via 172.16.1.200 \
+			src 10.1.1.100
+
+	# root namespace
+	# at_ns0 -> root
+	ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
+		spi $spi_in_to_out reqid 1 mode tunnel \
+		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
+	ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir in \
+		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
+		mode tunnel
+	# root -> at_ns0
+	ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
+		spi $spi_out_to_in reqid 2 mode tunnel \
+		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
+	ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir out \
+		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
+		mode tunnel
+	# address & route
+	ip addr add dev veth1 10.1.1.200/32
+	ip route add 10.1.1.100 dev veth1 via 172.16.1.100 src 10.1.1.200
+}
+
+test_xfrm_tunnel()
+{
+	config_device
+        #tcpdump -nei veth1 ip &
+	output=$(mktemp)
+	cat /sys/kernel/debug/tracing/trace_pipe | tee $output &
+        setup_xfrm_tunnel
+	tc qdisc add dev veth1 clsact
+	tc filter add dev veth1 proto ip ingress bpf da obj test_tunnel_kern.o \
+		sec xfrm_get_state
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	sleep 1
+	grep "reqid 1" $output
+	check_err $?
+	grep "spi 0x1" $output
+	check_err $?
+	grep "remote ip 0xac100164" $output
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: xfrm tunnel"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: xfrm tunnel"${NC}
+}
+
+attach_bpf()
+{
+	DEV=$1
+	SET=$2
+	GET=$3
+	tc qdisc add dev $DEV clsact
+	tc filter add dev $DEV egress bpf da obj test_tunnel_kern.o sec $SET
+	tc filter add dev $DEV ingress bpf da obj test_tunnel_kern.o sec $GET
+}
+
+cleanup()
+{
+	ip netns delete at_ns0 2> /dev/null
+	ip link del veth1 2> /dev/null
+	ip link del ipip11 2> /dev/null
+	ip link del ipip6tnl11 2> /dev/null
+	ip link del gretap11 2> /dev/null
+	ip link del ip6gre11 2> /dev/null
+	ip link del ip6gretap11 2> /dev/null
+	ip link del vxlan11 2> /dev/null
+	ip link del ip6vxlan11 2> /dev/null
+	ip link del geneve11 2> /dev/null
+	ip link del ip6geneve11 2> /dev/null
+	ip link del erspan11 2> /dev/null
+	ip link del ip6erspan11 2> /dev/null
+}
+
+cleanup_exit()
+{
+	echo "CATCH SIGKILL or SIGINT, cleanup and exit"
+	cleanup
+	exit 0
+}
+
+check()
+{
+	ip link help $1 2>&1 | grep -q "^Usage:"
+	if [ $? -ne 0 ];then
+		echo "SKIP $1: iproute2 not support"
+	cleanup
+	return 1
+	fi
+}
+
+enable_debug()
+{
+	echo 'file ip_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+	echo 'file ip6_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+	echo 'file vxlan.c +p' > /sys/kernel/debug/dynamic_debug/control
+	echo 'file geneve.c +p' > /sys/kernel/debug/dynamic_debug/control
+	echo 'file ipip.c +p' > /sys/kernel/debug/dynamic_debug/control
+}
+
+check_err()
+{
+	if [ $ret -eq 0 ]; then
+		ret=$1
+	fi
+}
+
+bpf_tunnel_test()
+{
+	echo "Testing GRE tunnel..."
+	test_gre
+	echo "Testing IP6GRE tunnel..."
+	test_ip6gre
+	echo "Testing IP6GRETAP tunnel..."
+	test_ip6gretap
+	echo "Testing ERSPAN tunnel..."
+	test_erspan v2
+	echo "Testing IP6ERSPAN tunnel..."
+	test_ip6erspan v2
+	echo "Testing VXLAN tunnel..."
+	test_vxlan
+	echo "Testing IP6VXLAN tunnel..."
+	test_ip6vxlan
+	echo "Testing GENEVE tunnel..."
+	test_geneve
+	echo "Testing IP6GENEVE tunnel..."
+	test_ip6geneve
+	echo "Testing IPIP tunnel..."
+	test_ipip
+	echo "Testing IPIP6 tunnel..."
+	test_ipip6
+	echo "Testing IPSec tunnel..."
+	test_xfrm_tunnel
+}
+
+trap cleanup 0 3 6
+trap cleanup_exit 2 9
+
+cleanup
+bpf_tunnel_test
+
+exit 0
diff --git a/tools/testing/selftests/bpf/test_tunnel_kern.c b/tools/testing/selftests/bpf/test_tunnel_kern.c
new file mode 100644
index 000000000000..504df69c83df
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tunnel_kern.c
@@ -0,0 +1,713 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2016 VMware
+ * Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <stddef.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/types.h>
+#include <linux/tcp.h>
+#include <linux/socket.h>
+#include <linux/pkt_cls.h>
+#include <linux/erspan.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define ERROR(ret) do {\
+		char fmt[] = "ERROR line:%d ret:%d\n";\
+		bpf_trace_printk(fmt, sizeof(fmt), __LINE__, ret); \
+	} while (0)
+
+int _version SEC("version") = 1;
+
+struct geneve_opt {
+	__be16	opt_class;
+	__u8	type;
+	__u8	length:5;
+	__u8	r3:1;
+	__u8	r2:1;
+	__u8	r1:1;
+	__u8	opt_data[8]; /* hard-coded to 8 byte */
+};
+
+struct vxlan_metadata {
+	__u32     gbp;
+};
+
+SEC("gre_set_tunnel")
+int _gre_set_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX | BPF_F_SEQ_NUMBER);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("gre_get_tunnel")
+int _gre_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "key %d remote ip 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), key.tunnel_id, key.remote_ipv4);
+	return TC_ACT_OK;
+}
+
+SEC("ip6gretap_set_tunnel")
+int _ip6gretap_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+	key.tunnel_label = 0xabcde;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX |
+				     BPF_F_SEQ_NUMBER);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6gretap_get_tunnel")
+int _ip6gretap_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip6 ::%x label %x\n";
+	struct bpf_tunnel_key key;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_set_tunnel")
+int _erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&md, 0, sizeof(md));
+#ifdef ERSPAN_V1
+	md.version = 1;
+	md.u.index = bpf_htonl(123);
+#else
+	__u8 direction = 1;
+	__u8 hwid = 7;
+
+	md.version = 2;
+	md.u.md2.dir = direction;
+	md.u.md2.hwid = hwid & 0xf;
+	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+#endif
+
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_get_tunnel")
+int _erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x erspan version %d\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	__u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.version);
+
+#ifdef ERSPAN_V1
+	char fmt2[] = "\tindex %x\n";
+
+	index = bpf_ntohl(md.u.index);
+	bpf_trace_printk(fmt2, sizeof(fmt2), index);
+#else
+	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
+
+	bpf_trace_printk(fmt2, sizeof(fmt2),
+			 md.u.md2.dir,
+			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+			 bpf_ntohl(md.u.md2.timestamp));
+#endif
+
+	return TC_ACT_OK;
+}
+
+SEC("ip4ip6erspan_set_tunnel")
+int _ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11);
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&md, 0, sizeof(md));
+
+#ifdef ERSPAN_V1
+	md.u.index = bpf_htonl(123);
+	md.version = 1;
+#else
+	__u8 direction = 0;
+	__u8 hwid = 17;
+
+	md.version = 2;
+	md.u.md2.dir = direction;
+	md.u.md2.hwid = hwid & 0xf;
+	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+#endif
+
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip4ip6erspan_get_tunnel")
+int _ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "ip6erspan get key %d remote ip6 ::%x erspan version %d\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	__u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.version);
+
+#ifdef ERSPAN_V1
+	char fmt2[] = "\tindex %x\n";
+
+	index = bpf_ntohl(md.u.index);
+	bpf_trace_printk(fmt2, sizeof(fmt2), index);
+#else
+	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
+
+	bpf_trace_printk(fmt2, sizeof(fmt2),
+			 md.u.md2.dir,
+			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+			 bpf_ntohl(md.u.md2.timestamp));
+#endif
+
+	return TC_ACT_OK;
+}
+
+SEC("vxlan_set_tunnel")
+int _vxlan_set_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct vxlan_metadata md;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	md.gbp = 0x800FF; /* Set VXLAN Group Policy extension */
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("vxlan_get_tunnel")
+int _vxlan_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct vxlan_metadata md;
+	char fmt[] = "key %d remote ip 0x%x vxlan gbp 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.gbp);
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6vxlan_set_tunnel")
+int _ip6vxlan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 22;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6vxlan_get_tunnel")
+int _ip6vxlan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip6 ::%x label %x\n";
+	struct bpf_tunnel_key key;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
+
+	return TC_ACT_OK;
+}
+
+SEC("geneve_set_tunnel")
+int _geneve_set_tunnel(struct __sk_buff *skb)
+{
+	int ret, ret2;
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	__builtin_memset(&gopt, 0x0, sizeof(gopt));
+	gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
+	gopt.type = 0x08;
+	gopt.r1 = 0;
+	gopt.r2 = 0;
+	gopt.r3 = 0;
+	gopt.length = 2; /* 4-byte multiple */
+	*(int *) &gopt.opt_data = bpf_htonl(0xdeadbeef);
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("geneve_get_tunnel")
+int _geneve_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
+	return TC_ACT_OK;
+}
+
+SEC("ip6geneve_set_tunnel")
+int _ip6geneve_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 22;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&gopt, 0x0, sizeof(gopt));
+	gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
+	gopt.type = 0x08;
+	gopt.r1 = 0;
+	gopt.r2 = 0;
+	gopt.r3 = 0;
+	gopt.length = 2; /* 4-byte multiple */
+	*(int *) &gopt.opt_data = bpf_htonl(0xfeedbeef);
+
+	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6geneve_get_tunnel")
+int _ip6geneve_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip_set_tunnel")
+int _ipip_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.tunnel_ttl = 64;
+	if (iph->protocol == IPPROTO_ICMP) {
+		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	} else {
+		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5)
+			return TC_ACT_SHOT;
+
+		if (tcp->dest == bpf_htons(5200))
+			key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+		else if (tcp->dest == bpf_htons(5201))
+			key.remote_ipv4 = 0xac100165; /* 172.16.1.101 */
+		else
+			return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip_get_tunnel")
+int _ipip_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), key.remote_ipv4);
+	return TC_ACT_OK;
+}
+
+SEC("ipip6_set_tunnel")
+int _ipip6_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip6_get_tunnel")
+int _ipip6_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip6 %x::%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+			 bpf_htonl(key.remote_ipv6[3]));
+	return TC_ACT_OK;
+}
+
+SEC("ip6ip6_set_tunnel")
+int _ip6ip6_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct ipv6hdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.remote_ipv6[0] = bpf_htonl(0x2401db00);
+	key.tunnel_ttl = 64;
+
+	if (iph->nexthdr == 58 /* NEXTHDR_ICMP */) {
+		key.remote_ipv6[3] = bpf_htonl(1);
+	} else {
+		if (iph->nexthdr != 6 /* NEXTHDR_TCP */) {
+			ERROR(iph->nexthdr);
+			return TC_ACT_SHOT;
+		}
+
+		if (tcp->dest == bpf_htons(5200)) {
+			key.remote_ipv6[3] = bpf_htonl(1);
+		} else if (tcp->dest == bpf_htons(5201)) {
+			key.remote_ipv6[3] = bpf_htonl(2);
+		} else {
+			ERROR(tcp->dest);
+			return TC_ACT_SHOT;
+		}
+	}
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6ip6_get_tunnel")
+int _ip6ip6_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip6 %x::%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+			 bpf_htonl(key.remote_ipv6[3]));
+	return TC_ACT_OK;
+}
+
+SEC("xfrm_get_state")
+int _xfrm_get_state(struct __sk_buff *skb)
+{
+	struct bpf_xfrm_state x;
+	char fmt[] = "reqid %d spi 0x%x remote ip 0x%x\n";
+	int ret;
+
+	ret = bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
+	if (ret < 0)
+		return TC_ACT_OK;
+
+	bpf_trace_printk(fmt, sizeof(fmt), x.reqid, bpf_ntohl(x.spi),
+			 bpf_ntohl(x.remote_ipv4));
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.7.4

^ permalink raw reply related

* [PATCHv2 bpf-next 2/2] samples/bpf: remove the bpf tunnel testsuite.
From: William Tu @ 2018-04-26 21:01 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1524776500-27030-1-git-send-email-u9012063@gmail.com>

Move the testsuite to
selftests/bpf/{test_tunnel_kern.c, test_tunnel.sh}

Signed-off-by: William Tu <u9012063@gmail.com>
---
 samples/bpf/Makefile           |   1 -
 samples/bpf/tcbpf2_kern.c      | 612 -----------------------------------------
 samples/bpf/test_tunnel_bpf.sh | 390 --------------------------
 3 files changed, 1003 deletions(-)
 delete mode 100644 samples/bpf/tcbpf2_kern.c
 delete mode 100755 samples/bpf/test_tunnel_bpf.sh

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index aa8c392e2e52..b853581592fd 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -114,7 +114,6 @@ always += sock_flags_kern.o
 always += test_probe_write_user_kern.o
 always += trace_output_kern.o
 always += tcbpf1_kern.o
-always += tcbpf2_kern.o
 always += tc_l2_redirect_kern.o
 always += lathist_kern.o
 always += offwaketime_kern.o
diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c
deleted file mode 100644
index fa260c750fb1..000000000000
--- a/samples/bpf/tcbpf2_kern.c
+++ /dev/null
@@ -1,612 +0,0 @@
-/* Copyright (c) 2016 VMware
- * Copyright (c) 2016 Facebook
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-#define KBUILD_MODNAME "foo"
-#include <uapi/linux/bpf.h>
-#include <uapi/linux/if_ether.h>
-#include <uapi/linux/if_packet.h>
-#include <uapi/linux/ip.h>
-#include <uapi/linux/ipv6.h>
-#include <uapi/linux/in.h>
-#include <uapi/linux/tcp.h>
-#include <uapi/linux/filter.h>
-#include <uapi/linux/pkt_cls.h>
-#include <uapi/linux/erspan.h>
-#include <net/ipv6.h>
-#include "bpf_helpers.h"
-#include "bpf_endian.h"
-
-#define _htonl __builtin_bswap32
-#define ERROR(ret) do {\
-		char fmt[] = "ERROR line:%d ret:%d\n";\
-		bpf_trace_printk(fmt, sizeof(fmt), __LINE__, ret); \
-	} while(0)
-
-struct geneve_opt {
-	__be16	opt_class;
-	u8	type;
-	u8	length:5;
-	u8	r3:1;
-	u8	r2:1;
-	u8	r1:1;
-	u8	opt_data[8]; /* hard-coded to 8 byte */
-};
-
-struct vxlan_metadata {
-	u32     gbp;
-};
-
-SEC("gre_set_tunnel")
-int _gre_set_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_ZERO_CSUM_TX | BPF_F_SEQ_NUMBER);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("gre_get_tunnel")
-int _gre_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "key %d remote ip 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), key.tunnel_id, key.remote_ipv4);
-	return TC_ACT_OK;
-}
-
-SEC("ip6gretap_set_tunnel")
-int _ip6gretap_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv6[3] = _htonl(0x11); /* ::11 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-	key.tunnel_label = 0xabcde;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX |
-				     BPF_F_SEQ_NUMBER);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip6gretap_get_tunnel")
-int _ip6gretap_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "key %d remote ip6 ::%x label %x\n";
-	struct bpf_tunnel_key key;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
-
-	return TC_ACT_OK;
-}
-
-SEC("erspan_set_tunnel")
-int _erspan_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	__builtin_memset(&md, 0, sizeof(md));
-#ifdef ERSPAN_V1
-	md.version = 1;
-	md.u.index = bpf_htonl(123);
-#else
-	u8 direction = 1;
-	u8 hwid = 7;
-
-	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
-#endif
-
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("erspan_get_tunnel")
-int _erspan_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "key %d remote ip 0x%x erspan version %d\n";
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	u32 index;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.version);
-
-#ifdef ERSPAN_V1
-	char fmt2[] = "\tindex %x\n";
-
-	index = bpf_ntohl(md.u.index);
-	bpf_trace_printk(fmt2, sizeof(fmt2), index);
-#else
-	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
-
-	bpf_trace_printk(fmt2, sizeof(fmt2),
-			 md.u.md2.dir,
-			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
-			 bpf_ntohl(md.u.md2.timestamp));
-#endif
-
-	return TC_ACT_OK;
-}
-
-SEC("ip4ip6erspan_set_tunnel")
-int _ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv6[3] = _htonl(0x11);
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	__builtin_memset(&md, 0, sizeof(md));
-
-#ifdef ERSPAN_V1
-	md.u.index = htonl(123);
-	md.version = 1;
-#else
-	u8 direction = 0;
-	u8 hwid = 17;
-
-	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
-#endif
-
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip4ip6erspan_get_tunnel")
-int _ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "ip6erspan get key %d remote ip6 ::%x erspan version %d\n";
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	u32 index;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.version);
-
-#ifdef ERSPAN_V1
-	char fmt2[] = "\tindex %x\n";
-
-	index = bpf_ntohl(md.u.index);
-	bpf_trace_printk(fmt2, sizeof(fmt2), index);
-#else
-	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
-
-	bpf_trace_printk(fmt2, sizeof(fmt2),
-			 md.u.md2.dir,
-			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
-			 bpf_ntohl(md.u.md2.timestamp));
-#endif
-
-	return TC_ACT_OK;
-}
-
-SEC("vxlan_set_tunnel")
-int _vxlan_set_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct vxlan_metadata md;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	md.gbp = 0x800FF; /* Set VXLAN Group Policy extension */
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("vxlan_get_tunnel")
-int _vxlan_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct vxlan_metadata md;
-	char fmt[] = "key %d remote ip 0x%x vxlan gbp 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.gbp);
-
-	return TC_ACT_OK;
-}
-
-SEC("geneve_set_tunnel")
-int _geneve_set_tunnel(struct __sk_buff *skb)
-{
-	int ret, ret2;
-	struct bpf_tunnel_key key;
-	struct geneve_opt gopt;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	__builtin_memset(&gopt, 0x0, sizeof(gopt));
-	gopt.opt_class = 0x102; /* Open Virtual Networking (OVN) */
-	gopt.type = 0x08;
-	gopt.r1 = 0;
-	gopt.r2 = 0;
-	gopt.r3 = 0;
-	gopt.length = 2; /* 4-byte multiple */
-	*(int *) &gopt.opt_data = 0xdeadbeef;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("geneve_get_tunnel")
-int _geneve_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct geneve_opt gopt;
-	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
-	return TC_ACT_OK;
-}
-
-SEC("ipip_set_tunnel")
-int _ipip_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct iphdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.tunnel_ttl = 64;
-	if (iph->protocol == IPPROTO_ICMP) {
-		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	} else {
-		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5)
-			return TC_ACT_SHOT;
-
-		if (tcp->dest == htons(5200))
-			key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-		else if (tcp->dest == htons(5201))
-			key.remote_ipv4 = 0xac100165; /* 172.16.1.101 */
-		else
-			return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ipip_get_tunnel")
-int _ipip_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), key.remote_ipv4);
-	return TC_ACT_OK;
-}
-
-SEC("ipip6_set_tunnel")
-int _ipip6_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct iphdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.remote_ipv6[0] = _htonl(0x2401db00);
-	key.tunnel_ttl = 64;
-
-	if (iph->protocol == IPPROTO_ICMP) {
-		key.remote_ipv6[3] = _htonl(1);
-	} else {
-		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5) {
-			ERROR(iph->protocol);
-			return TC_ACT_SHOT;
-		}
-
-		if (tcp->dest == htons(5200)) {
-			key.remote_ipv6[3] = _htonl(1);
-		} else if (tcp->dest == htons(5201)) {
-			key.remote_ipv6[3] = _htonl(2);
-		} else {
-			ERROR(tcp->dest);
-			return TC_ACT_SHOT;
-		}
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ipip6_get_tunnel")
-int _ipip6_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip6 %x::%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
-			 _htonl(key.remote_ipv6[3]));
-	return TC_ACT_OK;
-}
-
-SEC("ip6ip6_set_tunnel")
-int _ip6ip6_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct ipv6hdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.remote_ipv6[0] = _htonl(0x2401db00);
-	key.tunnel_ttl = 64;
-
-	if (iph->nexthdr == NEXTHDR_ICMP) {
-		key.remote_ipv6[3] = _htonl(1);
-	} else {
-		if (iph->nexthdr != NEXTHDR_TCP) {
-			ERROR(iph->nexthdr);
-			return TC_ACT_SHOT;
-		}
-
-		if (tcp->dest == htons(5200)) {
-			key.remote_ipv6[3] = _htonl(1);
-		} else if (tcp->dest == htons(5201)) {
-			key.remote_ipv6[3] = _htonl(2);
-		} else {
-			ERROR(tcp->dest);
-			return TC_ACT_SHOT;
-		}
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip6ip6_get_tunnel")
-int _ip6ip6_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip6 %x::%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
-			 _htonl(key.remote_ipv6[3]));
-	return TC_ACT_OK;
-}
-
-SEC("xfrm_get_state")
-int _xfrm_get_state(struct __sk_buff *skb)
-{
-	struct bpf_xfrm_state x;
-	char fmt[] = "reqid %d spi 0x%x remote ip 0x%x\n";
-	int ret;
-
-	ret = bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
-	if (ret < 0)
-		return TC_ACT_OK;
-
-	bpf_trace_printk(fmt, sizeof(fmt), x.reqid, bpf_ntohl(x.spi),
-			 bpf_ntohl(x.remote_ipv4));
-	return TC_ACT_OK;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh
deleted file mode 100755
index 9c534dc07b36..000000000000
--- a/samples/bpf/test_tunnel_bpf.sh
+++ /dev/null
@@ -1,390 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-# In Namespace 0 (at_ns0) using native tunnel
-# Overlay IP: 10.1.1.100
-# local 192.16.1.100 remote 192.16.1.200
-# veth0 IP: 172.16.1.100, tunnel dev <type>00
-
-# Out of Namespace using BPF set/get on lwtunnel
-# Overlay IP: 10.1.1.200
-# local 172.16.1.200 remote 172.16.1.100
-# veth1 IP: 172.16.1.200, tunnel dev <type>11
-
-function config_device {
-	ip netns add at_ns0
-	ip link add veth0 type veth peer name veth1
-	ip link set veth0 netns at_ns0
-	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip link set dev veth1 up mtu 1500
-	ip addr add dev veth1 172.16.1.200/24
-}
-
-function add_gre_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-        ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE key 2 external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6gretap_tunnel {
-
-	# assign ipv6 address
-	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip addr add dev veth1 ::22/96
-	ip link set dev veth1 up
-
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
-		local ::11 remote ::22
-
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip addr add dev $DEV 10.1.1.200/24
-	ip addr add dev $DEV fc80::200/24
-	ip link set dev $DEV up
-}
-
-function add_erspan_tunnel {
-	# in namespace
-	if [ "$1" == "v1" ]; then
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200 \
-		erspan_ver 1 erspan 123
-	else
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200 \
-		erspan_ver 2 erspan_dir egress erspan_hwid 3
-	fi
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6erspan_tunnel {
-
-	# assign ipv6 address
-	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip addr add dev veth1 ::22/96
-	ip link set dev veth1 up
-
-	# in namespace
-	if [ "$1" == "v1" ]; then
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local ::11 remote ::22 \
-		erspan_ver 1 erspan 123
-	else
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local ::11 remote ::22 \
-		erspan_ver 2 erspan_dir egress erspan_hwid 7
-	fi
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip addr add dev $DEV 10.1.1.200/24
-	ip link set dev $DEV up
-}
-
-function add_vxlan_tunnel {
-	# Set static ARP entry here because iptables set-mark works
-	# on L3 packet, as a result not applying to ARP packets,
-	# causing errors at get_tunnel_{key/opt}.
-
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE id 2 dstport 4789 gbp remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
-	ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external gbp dstport 4789
-	ip link set dev $DEV address 52:54:00:d9:02:00 up
-	ip addr add dev $DEV 10.1.1.200/24
-	arp -s 10.1.1.100 52:54:00:d9:01:00
-}
-
-function add_geneve_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE id 2 dstport 6081 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE dstport 6081 external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ipip_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE local 172.16.1.100 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function setup_xfrm_tunnel {
-	auth=0x$(printf '1%.0s' {1..40})
-	enc=0x$(printf '2%.0s' {1..32})
-	spi_in_to_out=0x1
-	spi_out_to_in=0x2
-	# in namespace
-	# in -> out
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-			spi $spi_in_to_out reqid 1 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir out \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# out -> in
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-			spi $spi_out_to_in reqid 2 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir in \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip netns exec at_ns0 \
-		ip addr add dev veth0 10.1.1.100/32
-	ip netns exec at_ns0 \
-		ip route add 10.1.1.200 dev veth0 via 172.16.1.200 \
-			src 10.1.1.100
-
-	# out of namespace
-	# in -> out
-	ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-		spi $spi_in_to_out reqid 1 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir in \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# out -> in
-	ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-		spi $spi_out_to_in reqid 2 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir out \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip addr add dev veth1 10.1.1.200/32
-	ip route add 10.1.1.100 dev veth1 via 172.16.1.100 src 10.1.1.200
-}
-
-function attach_bpf {
-	DEV=$1
-	SET_TUNNEL=$2
-	GET_TUNNEL=$3
-	tc qdisc add dev $DEV clsact
-	tc filter add dev $DEV egress bpf da obj tcbpf2_kern.o sec $SET_TUNNEL
-	tc filter add dev $DEV ingress bpf da obj tcbpf2_kern.o sec $GET_TUNNEL
-}
-
-function test_gre {
-	TYPE=gretap
-	DEV_NS=gretap00
-	DEV=gretap11
-	config_device
-	add_gre_tunnel
-	attach_bpf $DEV gre_set_tunnel gre_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ip6gre {
-	TYPE=ip6gre
-	DEV_NS=ip6gre00
-	DEV=ip6gre11
-	config_device
-	# reuse the ip6gretap function
-	add_ip6gretap_tunnel
-	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
-	# underlay
-	ping6 -c 4 ::11
-	# overlay: ipv4 over ipv6
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	ping -c 1 10.1.1.100
-	# overlay: ipv6 over ipv6
-	ip netns exec at_ns0 ping6 -c 1 fc80::200
-	cleanup
-}
-
-function test_ip6gretap {
-	TYPE=ip6gretap
-	DEV_NS=ip6gretap00
-	DEV=ip6gretap11
-	config_device
-	add_ip6gretap_tunnel
-	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
-	# underlay
-	ping6 -c 4 ::11
-	# overlay: ipv4 over ipv6
-	ip netns exec at_ns0 ping -i .2 -c 1 10.1.1.200
-	ping -c 1 10.1.1.100
-	# overlay: ipv6 over ipv6
-	ip netns exec at_ns0 ping6 -c 1 fc80::200
-	cleanup
-}
-
-function test_erspan {
-	TYPE=erspan
-	DEV_NS=erspan00
-	DEV=erspan11
-	config_device
-	add_erspan_tunnel $1
-	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ip6erspan {
-	TYPE=ip6erspan
-	DEV_NS=ip6erspan00
-	DEV=ip6erspan11
-	config_device
-	add_ip6erspan_tunnel $1
-	attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
-	ping6 -c 3 ::11
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_vxlan {
-	TYPE=vxlan
-	DEV_NS=vxlan00
-	DEV=vxlan11
-	config_device
-	add_vxlan_tunnel
-	attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_geneve {
-	TYPE=geneve
-	DEV_NS=geneve00
-	DEV=geneve11
-	config_device
-	add_geneve_tunnel
-	attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ipip {
-	TYPE=ipip
-	DEV_NS=ipip00
-	DEV=ipip11
-	config_device
-	tcpdump -nei veth1 &
-	cat /sys/kernel/debug/tracing/trace_pipe &
-	add_ipip_tunnel
-	ethtool -K veth1 gso off gro off rx off tx off
-	ip link set dev veth1 mtu 1500
-	attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	ip netns exec at_ns0 iperf -sD -p 5200 > /dev/null
-	sleep 0.2
-	iperf -c 10.1.1.100 -n 5k -p 5200
-	cleanup
-}
-
-function test_xfrm_tunnel {
-	config_device
-        tcpdump -nei veth1 ip &
-	output=$(mktemp)
-	cat /sys/kernel/debug/tracing/trace_pipe | tee $output &
-        setup_xfrm_tunnel
-	tc qdisc add dev veth1 clsact
-	tc filter add dev veth1 proto ip ingress bpf da obj tcbpf2_kern.o \
-		sec xfrm_get_state
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	grep "reqid 1" $output
-	grep "spi 0x1" $output
-	grep "remote ip 0xac100164" $output
-	cleanup
-}
-
-function cleanup {
-	set +ex
-	pkill iperf
-	ip netns delete at_ns0
-	ip link del veth1
-	ip link del ipip11
-	ip link del gretap11
-	ip link del ip6gre11
-	ip link del ip6gretap11
-	ip link del vxlan11
-	ip link del geneve11
-	ip link del erspan11
-	ip link del ip6erspan11
-	ip x s flush
-	ip x p flush
-	pkill tcpdump
-	pkill cat
-	set -ex
-}
-
-trap cleanup 0 2 3 6 9
-cleanup
-echo "Testing GRE tunnel..."
-test_gre
-echo "Testing IP6GRE tunnel..."
-test_ip6gre
-echo "Testing IP6GRETAP tunnel..."
-test_ip6gretap
-echo "Testing ERSPAN tunnel..."
-test_erspan v1
-test_erspan v2
-echo "Testing IP6ERSPAN tunnel..."
-test_ip6erspan v1
-test_ip6erspan v2
-echo "Testing VXLAN tunnel..."
-test_vxlan
-echo "Testing GENEVE tunnel..."
-test_geneve
-echo "Testing IPIP tunnel..."
-test_ipip
-echo "Testing IPSec tunnel..."
-test_xfrm_tunnel
-echo "*** PASS ***"
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next v2 4/7] net: mscc: Add initial Ocelot switch support
From: Andrew Lunn @ 2018-04-26 21:09 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: David S . Miller, Allan Nielsen, razvan.stefanescu, po.liu,
	Thomas Petazzoni, Florian Fainelli, netdev, devicetree,
	linux-kernel, linux-mips
In-Reply-To: <20180426195931.5393-5-alexandre.belloni@bootlin.com>

> +/* Checks if the net_device instance given to us originate from our driver. */
> +static bool ocelot_netdevice_dev_check(const struct net_device *dev)
> +{
> +	return dev->netdev_ops == &ocelot_port_netdev_ops;
> +}

This is probably O.K. now, but when you add support for controlling
the switch over PCIe, i think it breaks. A board could have two
switches...

It might be possible to do something with dev->parent. All ports of a
switch should have the same parent.

       Andrew

^ permalink raw reply

* [PATCH 0/3] can: fix ndo_start_xmit()'s return type
From: Luc Van Oostenryck @ 2018-04-26 21:13 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: moderated list:ARM/Allwinner sunXi SoC support, open list,
	Maxime Ripard, open list:NETWORKING DRIVERS, Michal Simek,
	open list:CAN NETWORK DRIVERS, Chen-Yu Tsai, Luc Van Oostenryck,
	Wolfgang Grandegger

ndo_start_xmit() is defined as returing an 'netdev_tx_t'.
However, several can drivers use 'int' as the return type
of their start_xmit() method.
This series contains the fix for all three of them.

Luc Van Oostenryck (3):
  can: janz-ican3: fix ican3_xmit()'s return type
  can: sun4i: fix sun4ican_start_xmit()'s return type
  can: xilinx: fix xcan_start_xmit()'s return type

 drivers/net/can/janz-ican3.c | 2 +-
 drivers/net/can/sun4i_can.c  | 2 +-
 drivers/net/can/xilinx_can.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [PATCH 1/3] can: janz-ican3: fix ican3_xmit()'s return type
From: Luc Van Oostenryck @ 2018-04-26 21:13 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Luc Van Oostenryck, Wolfgang Grandegger, Maxime Ripard,
	Chen-Yu Tsai, Michal Simek, open list:CAN NETWORK DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	moderated list:ARM/Allwinner sunXi SoC support
In-Reply-To: <20180426211339.30821-1-luc.vanoostenryck@gmail.com>

The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 drivers/net/can/janz-ican3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/janz-ican3.c b/drivers/net/can/janz-ican3.c
index adfdb66a4..02042cb09 100644
--- a/drivers/net/can/janz-ican3.c
+++ b/drivers/net/can/janz-ican3.c
@@ -1684,7 +1684,7 @@ static int ican3_stop(struct net_device *ndev)
 	return 0;
 }
 
-static int ican3_xmit(struct sk_buff *skb, struct net_device *ndev)
+static netdev_tx_t ican3_xmit(struct sk_buff *skb, struct net_device *ndev)
 {
 	struct ican3_dev *mod = netdev_priv(ndev);
 	struct can_frame *cf = (struct can_frame *)skb->data;
-- 
2.17.0

^ permalink raw reply related

* [PATCH 2/3] can: sun4i: fix sun4ican_start_xmit()'s return type
From: Luc Van Oostenryck @ 2018-04-26 21:13 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Luc Van Oostenryck, Wolfgang Grandegger, Maxime Ripard,
	Chen-Yu Tsai, Michal Simek, open list:CAN NETWORK DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	moderated list:ARM/Allwinner sunXi SoC support
In-Reply-To: <20180426211339.30821-1-luc.vanoostenryck@gmail.com>

The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 drivers/net/can/sun4i_can.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c
index 1ac2090a1..093fc9a52 100644
--- a/drivers/net/can/sun4i_can.c
+++ b/drivers/net/can/sun4i_can.c
@@ -409,7 +409,7 @@ static int sun4ican_set_mode(struct net_device *dev, enum can_mode mode)
  * xx xx xx xx         ff         ll 00 11 22 33 44 55 66 77
  * [ can_id ] [flags] [len] [can data (up to 8 bytes]
  */
-static int sun4ican_start_xmit(struct sk_buff *skb, struct net_device *dev)
+static netdev_tx_t sun4ican_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct sun4ican_priv *priv = netdev_priv(dev);
 	struct can_frame *cf = (struct can_frame *)skb->data;
-- 
2.17.0

^ permalink raw reply related

* [PATCH 3/3] can: xilinx: fix xcan_start_xmit()'s return type
From: Luc Van Oostenryck @ 2018-04-26 21:13 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Luc Van Oostenryck, Wolfgang Grandegger, Maxime Ripard,
	Chen-Yu Tsai, Michal Simek, open list:CAN NETWORK DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	moderated list:ARM/Allwinner sunXi SoC support
In-Reply-To: <20180426211339.30821-1-luc.vanoostenryck@gmail.com>

The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.

Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 drivers/net/can/xilinx_can.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
index 89aec07c2..a19648606 100644
--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -386,7 +386,7 @@ static int xcan_do_set_mode(struct net_device *ndev, enum can_mode mode)
  *
  * Return: 0 on success and failure value on error
  */
-static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+static netdev_tx_t xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 {
 	struct xcan_priv *priv = netdev_priv(ndev);
 	struct net_device_stats *stats = &ndev->stats;
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive
From: Andy Lutomirski @ 2018-04-26 21:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Soheil Hassas Yeganeh, Eric Dumazet, David S. Miller,
	Network Development, Andrew Lutomirski, LKML, Linux-MM
In-Reply-To: <a2c405e1-0ebc-dd33-fb0d-575bf06a1ff6@gmail.com>

At the risk of further muddying the waters, there's another minor tweak
that could improve performance on certain workloads.  Currently you mmap()
a range for a given socket and then getsockopt() to receive.  If you made
it so you could mmap() something once for any number of sockets (by
mmapping /dev/misc/tcp_zero_receive or whatever), then the performance of
the getsockopt() bit would be identical, but you could release the mapping
for many sockets at once with only a single flush.  For some use cases,
this could be a big win.

You could also add this later easily enough, too.

^ permalink raw reply

* Re: [PATCH net-next 1/2 v2] netns: restrict uevents
From: Christian Brauner @ 2018-04-26 21:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <878t99opvd.fsf@xmission.com>

On Thu, Apr 26, 2018 at 12:10:30PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
> 
> > On Thu, Apr 26, 2018 at 11:47:19AM -0500, Eric W. Biederman wrote:
> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> 
> >> > On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> >> 
> >> >> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >> >
> >> >> >  Bah. This code is obviously correct and probably wrong.
> >> >> >
> >> >> >  How do we deliver uevents for network devices that are outside of the
> >> >> >  initial user namespace? The kernel still needs to deliver those.
> >> >> >
> >> >> >  The logic to figure out which network namespace a device needs to be
> >> >> >  delivered to is is present in kobj_bcast_filter. That logic will almost
> >> >> >  certainly need to be turned inside out. Sign not as easy as I would
> >> >> >  have hoped.
> >> >> >
> >> >> > My first patch that we discussed put additional filtering logic into kobj_bcast_filter for that very reason. But I can move that logic
> >> >> > out and come up with a new patch.
> >> >> 
> >> >> I may have mis-understood.
> >> >> 
> >> >> I heard and am still hearing additional filtering to reduce the places
> >> >> the packet is delievered.
> >> >> 
> >> >> I am saying something needs to change to increase the number of places
> >> >> the packet is delivered.
> >> >> 
> >> >> For the special class of devices that kobj_bcast_filter would apply to
> >> >> those need to be delivered to netowrk namespaces  that are no longer on
> >> >> uevent_sock_list.
> >> >> 
> >> >> So the code fundamentally needs to split into two paths.  Ordinary
> >> >> devices that use uevent_sock_list.  Network devices that are just
> >> >> delivered in their own network namespace.
> >> >> 
> >> >> netlink_broadcast_filtered gets to go away completely.
> >> >
> >> > The split *might* make sense but I think you're wrong about removing the
> >> > kobj_bcast_filter. The current filter doesn't operate on the uevent
> >> > socket in uevent_sock_list itself it rather operates on the sockets in
> >> > mc_list. And if socket in mc_list can have a different network namespace
> >> > then the uevent_socket itself then your way won't work. That's why my
> >> > original patch added additional filtering in there. The way I see it we
> >> > need something like:
> >> 
> >> We already filter the sockets in the mc_list by network namespace.
> >
> > Oh really? That's good to know. I haven't found where in the code this
> > actually happens. I thought that when netlink_bind() is called anyone
> > could register themselves in mc_list.
> 
> The code in af_netlink.c does:
> > static void do_one_broadcast(struct sock *sk,
> > 				    struct netlink_broadcast_data *p)
> > {
> > 	struct netlink_sock *nlk = nlk_sk(sk);
> > 	int val;
> > 
> > 	if (p->exclude_sk == sk)
> > 		return;
> > 
> > 	if (nlk->portid == p->portid || p->group - 1 >= nlk->ngroups ||
> > 	    !test_bit(p->group - 1, nlk->groups))
> > 		return;
> > 
> > 	if (!net_eq(sock_net(sk), p->net)) {
>             ^^^^^^^^^^^^ Here
> > 		if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> > 			return;
>                 ^^^^^^^^^^^ Here
> > 
> > 		if (!peernet_has_id(sock_net(sk), p->net))
> > 			return;
> > 
> > 		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> > 				     CAP_NET_BROADCAST))
> > 			return;
> > 	}
> 
> Which if you are not a magic NETLINK_F_LISTEN_ALL_NSID socket filters
> you out if you are the wrong network namespace.
> 
> 
> >> When a packet is transmitted with netlink_broadcast it is only
> >> transmitted within a single network namespace.
> >> 
> >> Even in the case of a NETLINK_F_LISTEN_ALL_NSID socket the skb is tagged
> >> with it's source network namespace so no confusion will result, and the
> >> permission checks have been done to make it safe. So you can safely
> >> ignore that case.  Please ignore that case.  It only needs to be
> >> considered if refactoring af_netlink.c
> >> 
> >> When I added netlink_broadcast_filtered I imagined that we would need
> >> code that worked across network namespaces that worked for different
> >> namespaces.   So it looked like we would need the level of granularity
> >> that you can get with netlink_broadcast_filtered.  It turns out we don't
> >> and that it was a case of over design.  As the only split we care about
> >> is per network namespace there is no need for
> >> netlink_broadcast_filtered.
> >> 
> >> > init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
> >> > user_ns_broadcast_filtered(uevent_sock_list,kobj_bcast_filter);
> >> >
> >> > The question that remains is whether we can rely on the network
> >> > namespace information we can gather from the kobject_ns_type_operations
> >> > to decide where we want to broadcast that event to. So something
> >> > *like*:
> >> 
> >> We can.  We already do.  That is what kobj_bcast_filter implements.
> >> 
> >> > 	ops = kobj_ns_ops(kobj);
> >> > 	if (!ops && kobj->kset) {
> >> > 		struct kobject *ksobj = &kobj->kset->kobj;
> >> > 		if (ksobj->parent != NULL)
> >> > 			ops = kobj_ns_ops(ksobj->parent);
> >> > 	}
> >> >
> >> > 	if (ops && ops->netlink_ns && kobj->ktype->namespace)
> >> > 		if (ops->type == KOBJ_NS_TYPE_NET)
> >> > 			net = kobj->ktype->namespace(kobj);
> >> 
> >> Please note the only entry in the enumeration in the kobj_ns_type
> >> enumeration other than KOBJ_NS_TYPE_NONE is KOBJ_NS_TYPE_NET.  So the
> >> check for ops->type in this case is redundant.
> >
> > Yes, I know the reason for doing it explicitly is to block the case
> > where kobjects get tagged with other namespaces. So we'd need to be
> > vigilant should that ever happen but fine.
> 
> It is fine to keep the check.
> 
> I was intending to point out that it is much more likely that we remove
> the enumeration and remove some of the extra abstraction, than another
> namespace is implemented there.
> 
> >> That is something else that could be simplifed.  At the time it was the
> >> necessary to get the sysfs changes merged.
> >> 
> >> > 	if (!net || net->user_ns == &init_user_ns)
> >> > 		ret = init_user_ns_broadcast(env, action_string, devpath);
> >> > 	else
> >> > 		ret = user_ns_broadcast(net->uevent_sock->sk, env,
> >> > 					action_string, devpath);
> >> 
> >> Almost.
> >> 
> >> 	if (!net)
> >>         	kobject_uevent_net_broadcast(kobj, env, action_string,
> >>         					dev_path);
> >> 	else
> >>         	netlink_broadcast(net->uevent_sock->sk, skb, 0, 1, GFP_KERNEL);
> >> 
> >> 
> >> I am handwaving to get the skb in the netlink_broadcast case but that
> >> should be enough for you to see what I am thinking.
> >
> > I have added a helper alloc_uevent_skb() that can be used in both cases.
> >
> > static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
> > 					const char *action_string,
> > 					const char *devpath)
> > {
> > 	struct sk_buff *skb = NULL;
> > 	char *scratch;
> > 	size_t len;
> >
> > 	/* allocate message with maximum possible size */
> > 	len = strlen(action_string) + strlen(devpath) + 2;
> > 	skb = alloc_skb(len + env->buflen, GFP_KERNEL);
> > 	if (!skb)
> > 		return NULL;
> >
> > 	/* add header */
> > 	scratch = skb_put(skb, len);
> > 	sprintf(scratch, "%s@%s", action_string, devpath);
> >
> > 	skb_put_data(skb, env->buf, env->buflen);
> >
> > 	NETLINK_CB(skb).dst_group = 1;
> >
> > 	return skb;
> > }
> >
> >> 
> >> My only concern with the above is that we almost certainly need to fix
> >> the credentials on the skb so that userspace does not drop the packet

I guess we simply want:
	if (user_ns != &init_user_ns) {
		NETLINK_CB(skb).creds.uid = (kuid_t)0;
		NETLINK_CB(skb).creds.gid = kgid_t)0;
	}

instead of the more complicated and - imho wrong:

	if (user_ns != &init_user_ns) {
		/* fix credentials for udev running in user namespace */
		kuid_t uid = NETLINK_CB(skb).creds.uid;
		kgid_t gid = NETLINK_CB(skb).creds.gid;
		NETLINK_CB(skb).creds.uid = from_kuid_munged(user_ns, uid);
		NETLINK_CB(skb).creds.gid = from_kgid_munged(user_ns, gid);
	}

Christian

> >> sent to a network namespace because it has the credentials that will
> >> cause userspace to drop the packet today.
> >> 
> >> But it should be straight forward to look at net->user_ns, to fix the
> >> credentials.
> >
> > Yes, afaict, the only thing that needs to be updated is the uid.
> 
> I suspect there may also be a gid.
> 
> Eric

^ permalink raw reply

* [PATCH net-next] hv_netvsc: simplify receive side calling arguments
From: Stephen Hemminger @ 2018-04-26 21:34 UTC (permalink / raw)
  To: haiyangz; +Cc: netdev, Stephen Hemminger

The calls up from the napi poll reading the receive ring had many
places where an argument was being recreated. I.e the caller already
had the value and wasn't passing it, then the callee would use
known relationship to determine the same value. Simpler and faster
to just pass arguments needed.

Also, add const in a couple places where message is being only read.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc.c | 58 +++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 32 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index e7308958b7a9..d2ee66c259a7 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -652,16 +652,14 @@ static inline void netvsc_free_send_slot(struct netvsc_device *net_device,
 	sync_change_bit(index, net_device->send_section_map);
 }
 
-static void netvsc_send_tx_complete(struct netvsc_device *net_device,
-				    struct vmbus_channel *incoming_channel,
-				    struct hv_device *device,
+static void netvsc_send_tx_complete(struct net_device *ndev,
+				    struct netvsc_device *net_device,
+				    struct vmbus_channel *channel,
 				    const struct vmpacket_descriptor *desc,
 				    int budget)
 {
 	struct sk_buff *skb = (struct sk_buff *)(unsigned long)desc->trans_id;
-	struct net_device *ndev = hv_get_drvdata(device);
 	struct net_device_context *ndev_ctx = netdev_priv(ndev);
-	struct vmbus_channel *channel = device->channel;
 	u16 q_idx = 0;
 	int queue_sends;
 
@@ -675,7 +673,6 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
 		if (send_index != NETVSC_INVALID_INDEX)
 			netvsc_free_send_slot(net_device, send_index);
 		q_idx = packet->q_idx;
-		channel = incoming_channel;
 
 		tx_stats = &net_device->chan_table[q_idx].tx_stats;
 
@@ -705,14 +702,13 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
 	}
 }
 
-static void netvsc_send_completion(struct netvsc_device *net_device,
+static void netvsc_send_completion(struct net_device *ndev,
+				   struct netvsc_device *net_device,
 				   struct vmbus_channel *incoming_channel,
-				   struct hv_device *device,
 				   const struct vmpacket_descriptor *desc,
 				   int budget)
 {
-	struct nvsp_message *nvsp_packet = hv_pkt_data(desc);
-	struct net_device *ndev = hv_get_drvdata(device);
+	const struct nvsp_message *nvsp_packet = hv_pkt_data(desc);
 
 	switch (nvsp_packet->hdr.msg_type) {
 	case NVSP_MSG_TYPE_INIT_COMPLETE:
@@ -726,8 +722,8 @@ static void netvsc_send_completion(struct netvsc_device *net_device,
 		break;
 
 	case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE:
-		netvsc_send_tx_complete(net_device, incoming_channel,
-					device, desc, budget);
+		netvsc_send_tx_complete(ndev, net_device, incoming_channel,
+					desc, budget);
 		break;
 
 	default:
@@ -1092,12 +1088,11 @@ static void enq_receive_complete(struct net_device *ndev,
 
 static int netvsc_receive(struct net_device *ndev,
 			  struct netvsc_device *net_device,
-			  struct net_device_context *net_device_ctx,
-			  struct hv_device *device,
 			  struct vmbus_channel *channel,
 			  const struct vmpacket_descriptor *desc,
-			  struct nvsp_message *nvsp)
+			  const struct nvsp_message *nvsp)
 {
+	struct net_device_context *net_device_ctx = netdev_priv(ndev);
 	const struct vmtransfer_page_packet_header *vmxferpage_packet
 		= container_of(desc, const struct vmtransfer_page_packet_header, d);
 	u16 q_idx = channel->offermsg.offer.sub_channel_index;
@@ -1158,13 +1153,12 @@ static int netvsc_receive(struct net_device *ndev,
 	return count;
 }
 
-static void netvsc_send_table(struct hv_device *hdev,
-			      struct nvsp_message *nvmsg)
+static void netvsc_send_table(struct net_device *ndev,
+			      const struct nvsp_message *nvmsg)
 {
-	struct net_device *ndev = hv_get_drvdata(hdev);
 	struct net_device_context *net_device_ctx = netdev_priv(ndev);
-	int i;
 	u32 count, *tab;
+	int i;
 
 	count = nvmsg->msg.v5_msg.send_table.count;
 	if (count != VRSS_SEND_TAB_SIZE) {
@@ -1179,24 +1173,25 @@ static void netvsc_send_table(struct hv_device *hdev,
 		net_device_ctx->tx_table[i] = tab[i];
 }
 
-static void netvsc_send_vf(struct net_device_context *net_device_ctx,
-			   struct nvsp_message *nvmsg)
+static void netvsc_send_vf(struct net_device *ndev,
+			   const struct nvsp_message *nvmsg)
 {
+	struct net_device_context *net_device_ctx = netdev_priv(ndev);
+
 	net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
 	net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
 }
 
-static inline void netvsc_receive_inband(struct hv_device *hdev,
-				 struct net_device_context *net_device_ctx,
-				 struct nvsp_message *nvmsg)
+static  void netvsc_receive_inband(struct net_device *ndev,
+				   const struct nvsp_message *nvmsg)
 {
 	switch (nvmsg->hdr.msg_type) {
 	case NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE:
-		netvsc_send_table(hdev, nvmsg);
+		netvsc_send_table(ndev, nvmsg);
 		break;
 
 	case NVSP_MSG4_TYPE_SEND_VF_ASSOCIATION:
-		netvsc_send_vf(net_device_ctx, nvmsg);
+		netvsc_send_vf(ndev, nvmsg);
 		break;
 	}
 }
@@ -1208,24 +1203,23 @@ static int netvsc_process_raw_pkt(struct hv_device *device,
 				  const struct vmpacket_descriptor *desc,
 				  int budget)
 {
-	struct net_device_context *net_device_ctx = netdev_priv(ndev);
-	struct nvsp_message *nvmsg = hv_pkt_data(desc);
+	const struct nvsp_message *nvmsg = hv_pkt_data(desc);
 
 	trace_nvsp_recv(ndev, channel, nvmsg);
 
 	switch (desc->type) {
 	case VM_PKT_COMP:
-		netvsc_send_completion(net_device, channel, device,
+		netvsc_send_completion(ndev, net_device, channel,
 				       desc, budget);
 		break;
 
 	case VM_PKT_DATA_USING_XFER_PAGES:
-		return netvsc_receive(ndev, net_device, net_device_ctx,
-				      device, channel, desc, nvmsg);
+		return netvsc_receive(ndev, net_device, channel,
+				      desc, nvmsg);
 		break;
 
 	case VM_PKT_DATA_INBAND:
-		netvsc_receive_inband(device, net_device_ctx, nvmsg);
+		netvsc_receive_inband(ndev, nvmsg);
 		break;
 
 	default:
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive
From: Eric Dumazet @ 2018-04-26 21:40 UTC (permalink / raw)
  To: Andy Lutomirski, Eric Dumazet
  Cc: Soheil Hassas Yeganeh, Eric Dumazet, David S. Miller,
	Network Development, LKML, Linux-MM
In-Reply-To: <CALCETrVBQD1tPUzc_t7HmoPfApTdFW+x-0DqL8+XHjrmEpYMXQ@mail.gmail.com>



On 04/26/2018 02:16 PM, Andy Lutomirski wrote:
> At the risk of further muddying the waters, there's another minor tweak
> that could improve performance on certain workloads.  Currently you mmap()
> a range for a given socket and then getsockopt() to receive.  If you made
> it so you could mmap() something once for any number of sockets (by
> mmapping /dev/misc/tcp_zero_receive or whatever), then the performance of
> the getsockopt() bit would be identical, but you could release the mapping
> for many sockets at once with only a single flush.  For some use cases,
> this could be a big win.
> 
> You could also add this later easily enough, too.
> 

I believe I implemented what you just described.

The getsockopt() call checks that the VMA was created by a mmap() to one TCP socket.

It does not check that the vma was created by mmap() on the same socket,
because we do not need this extra check really.

So you presumably could use mmap() to grab 1GB of virtual space, then split it
as you wish for different sockets.

Thanks.

^ permalink raw reply

* [PATCH net-next v2 00/14] bnxt_en: Net-next updates.
From: Michael Chan @ 2018-04-26 21:44 UTC (permalink / raw)
  To: davem; +Cc: netdev

This series has 3 main features.  The first is to add mqprio TC to
hardware queue mapping to avoid reprogramming hardware CoS queue
watermarks during run-time.  The second is DIM improvements from
Andy Gospo.  The third is some improvements to VF resource allocations
when supporting large numbers of VFs with more limited resources.

There are some additional minor improvements and a new function level
discard counter.

v2: Fixed EEPROM typo noted by Andrew Lunn.

Andy Gospodarek (3):
  bnxt_en: Increase RING_IDLE minimum threshold to 50
  bnxt_en: reduce timeout on initial HWRM calls
  bnxt_en: add debugfs support for DIM

Michael Chan (10):
  bnxt_en: Add TC to hardware QoS queue mapping logic.
  bnxt_en: Remap TC to hardware queues when configuring PFC.
  bnxt_en: Check the lengths of encapsulated firmware responses.
  bnxt_en: Do not set firmware time from VF driver on older firmware.
  bnxt_en: Simplify ring alloc/free error messages.
  bnxt_en: Do not allow VF to read EEPEOM.
  bnxt_en: Reserve rings in bnxt_set_channels() if device is down.
  bnxt_en: Don't reserve rings on VF when min rings were not provisioned
    by PF.
  bnxt_en: Reserve RSS and L2 contexts for VF.
  bnxt_en: Reserve rings at driver open if none was reserved at probe
    time.

Vasundhara Volam (1):
  bnxt_en: Display function level rx/tx_discard_pkts via ethtool

 drivers/net/ethernet/broadcom/bnxt/Makefile       |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 147 +++++++++++++------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |   9 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     | 166 +++++++++++++---------
 drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c | 124 ++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.h |  23 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  40 ++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   |  19 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h   |  17 +++
 9 files changed, 433 insertions(+), 113 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.h

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net-next v2 01/14] bnxt_en: Add TC to hardware QoS queue mapping logic.
From: Michael Chan @ 2018-04-26 21:44 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1524779084-4016-1-git-send-email-michael.chan@broadcom.com>

The current driver maps MQPRIO traffic classes directly 1:1 to the
internal hardware queues (TC0 maps to hardware queue 0, etc).  This
direct mapping requires the internal hardware queues to be reconfigured
from lossless to lossy and vice versa when necessary.  This
involves reconfiguring internal buffer thresholds which is
disruptive and not always reliable.

Implement a new scheme to map TCs to internal hardware queues by
matching up their PFC requirements.  This will eliminate the need
to reconfigure a hardware queue internal buffers at run time.  After
remapping, the NIC is closed and opened for the new TC to hardware
queues to take effect.

This patch only adds the basic mapping logic.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  5 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 65 +++++++++++++++++----------
 3 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f83769d..bda618d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2383,6 +2383,7 @@ static int bnxt_alloc_tx_rings(struct bnxt *bp)
 	for (i = 0, j = 0; i < bp->tx_nr_rings; i++) {
 		struct bnxt_tx_ring_info *txr = &bp->tx_ring[i];
 		struct bnxt_ring_struct *ring;
+		u8 qidx;
 
 		ring = &txr->tx_ring_struct;
 
@@ -2411,7 +2412,8 @@ static int bnxt_alloc_tx_rings(struct bnxt *bp)
 
 			memset(txr->tx_push, 0, sizeof(struct tx_push_bd));
 		}
-		ring->queue_id = bp->q_info[j].queue_id;
+		qidx = bp->tc_to_qidx[j];
+		ring->queue_id = bp->q_info[qidx].queue_id;
 		if (i < bp->tx_nr_rings_xdp)
 			continue;
 		if (i % bp->tx_nr_rings_per_tc == (bp->tx_nr_rings_per_tc - 1))
@@ -5309,6 +5311,7 @@ static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
 	for (i = 0; i < bp->max_tc; i++) {
 		bp->q_info[i].queue_id = *qptr++;
 		bp->q_info[i].queue_profile = *qptr++;
+		bp->tc_to_qidx[i] = i;
 	}
 
 qportcfg_exit:
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3d55d3b..057f8a2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1242,6 +1242,7 @@ struct bnxt {
 	u8			max_tc;
 	u8			max_lltc;	/* lossless TCs */
 	struct bnxt_queue_info	q_info[BNXT_MAX_QUEUE];
+	u8			tc_to_qidx[BNXT_MAX_QUEUE];
 
 	unsigned int		current_interval;
 #define BNXT_TIMER_INTERVAL	HZ
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index 3c746f2..1b72f8a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -21,6 +21,21 @@
 #include "bnxt_dcb.h"
 
 #ifdef CONFIG_BNXT_DCB
+static int bnxt_queue_to_tc(struct bnxt *bp, u8 queue_id)
+{
+	int i, j;
+
+	for (i = 0; i < bp->max_tc; i++) {
+		if (bp->q_info[i].queue_id == queue_id) {
+			for (j = 0; j < bp->max_tc; j++) {
+				if (bp->tc_to_qidx[j] == i)
+					return j;
+			}
+		}
+	}
+	return -EINVAL;
+}
+
 static int bnxt_hwrm_queue_pri2cos_cfg(struct bnxt *bp, struct ieee_ets *ets)
 {
 	struct hwrm_queue_pri2cos_cfg_input req = {0};
@@ -33,10 +48,13 @@ static int bnxt_hwrm_queue_pri2cos_cfg(struct bnxt *bp, struct ieee_ets *ets)
 
 	pri2cos = &req.pri0_cos_queue_id;
 	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+		u8 qidx;
+
 		req.enables |= cpu_to_le32(
 			QUEUE_PRI2COS_CFG_REQ_ENABLES_PRI0_COS_QUEUE_ID << i);
 
-		pri2cos[i] = bp->q_info[ets->prio_tc[i]].queue_id;
+		qidx = bp->tc_to_qidx[ets->prio_tc[i]];
+		pri2cos[i] = bp->q_info[qidx].queue_id;
 	}
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 	return rc;
@@ -55,17 +73,15 @@ static int bnxt_hwrm_queue_pri2cos_qcfg(struct bnxt *bp, struct ieee_ets *ets)
 	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 	if (!rc) {
 		u8 *pri2cos = &resp->pri0_cos_queue_id;
-		int i, j;
+		int i;
 
 		for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 			u8 queue_id = pri2cos[i];
+			int tc;
 
-			for (j = 0; j < bp->max_tc; j++) {
-				if (bp->q_info[j].queue_id == queue_id) {
-					ets->prio_tc[i] = j;
-					break;
-				}
-			}
+			tc = bnxt_queue_to_tc(bp, queue_id);
+			if (tc >= 0)
+				ets->prio_tc[i] = tc;
 		}
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
@@ -81,13 +97,15 @@ static int bnxt_hwrm_queue_cos2bw_cfg(struct bnxt *bp, struct ieee_ets *ets,
 	void *data;
 
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_COS2BW_CFG, -1, -1);
-	data = &req.unused_0;
-	for (i = 0; i < max_tc; i++, data += sizeof(cos2bw) - 4) {
+	for (i = 0; i < max_tc; i++) {
+		u8 qidx;
+
 		req.enables |= cpu_to_le32(
 			QUEUE_COS2BW_CFG_REQ_ENABLES_COS_QUEUE_ID0_VALID << i);
 
 		memset(&cos2bw, 0, sizeof(cos2bw));
-		cos2bw.queue_id = bp->q_info[i].queue_id;
+		qidx = bp->tc_to_qidx[i];
+		cos2bw.queue_id = bp->q_info[qidx].queue_id;
 		if (ets->tc_tsa[i] == IEEE_8021QAZ_TSA_STRICT) {
 			cos2bw.tsa =
 				QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP;
@@ -103,8 +121,9 @@ static int bnxt_hwrm_queue_cos2bw_cfg(struct bnxt *bp, struct ieee_ets *ets,
 				cpu_to_le32((ets->tc_tx_bw[i] * 100) |
 					    BW_VALUE_UNIT_PERCENT1_100);
 		}
+		data = &req.unused_0 + qidx * (sizeof(cos2bw) - 4);
 		memcpy(data, &cos2bw.queue_id, sizeof(cos2bw) - 4);
-		if (i == 0) {
+		if (qidx == 0) {
 			req.queue_id0 = cos2bw.queue_id;
 			req.unused_0 = 0;
 		}
@@ -132,22 +151,22 @@ static int bnxt_hwrm_queue_cos2bw_qcfg(struct bnxt *bp, struct ieee_ets *ets)
 
 	data = &resp->queue_id0 + offsetof(struct bnxt_cos2bw_cfg, queue_id);
 	for (i = 0; i < bp->max_tc; i++, data += sizeof(cos2bw) - 4) {
-		int j;
+		int tc;
 
 		memcpy(&cos2bw.queue_id, data, sizeof(cos2bw) - 4);
 		if (i == 0)
 			cos2bw.queue_id = resp->queue_id0;
 
-		for (j = 0; j < bp->max_tc; j++) {
-			if (bp->q_info[j].queue_id != cos2bw.queue_id)
-				continue;
-			if (cos2bw.tsa ==
-			    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP) {
-				ets->tc_tsa[j] = IEEE_8021QAZ_TSA_STRICT;
-			} else {
-				ets->tc_tsa[j] = IEEE_8021QAZ_TSA_ETS;
-				ets->tc_tx_bw[j] = cos2bw.bw_weight;
-			}
+		tc = bnxt_queue_to_tc(bp, cos2bw.queue_id);
+		if (tc < 0)
+			continue;
+
+		if (cos2bw.tsa ==
+		    QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP) {
+			ets->tc_tsa[tc] = IEEE_8021QAZ_TSA_STRICT;
+		} else {
+			ets->tc_tsa[tc] = IEEE_8021QAZ_TSA_ETS;
+			ets->tc_tx_bw[tc] = cos2bw.bw_weight;
 		}
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
-- 
1.8.3.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox