Netdev List
 help / color / mirror / Atom feed
* [net-next 01/10] ixgbe: Drop support for macvlan specific unicast lists
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Drop the code for handling macvlan specific unicast lists. It isn't needed
since we don't take any efforts to maintain it when we bring the interface
up and it takes the slow path anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 31 ---------------------------
 1 file changed, 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 51e7d82a5860..c8f59430a5a1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4900,36 +4900,6 @@ int ixgbe_del_mac_filter(struct ixgbe_adapter *adapter,
 	return -ENOMEM;
 }
 
-/**
- * ixgbe_write_uc_addr_list - write unicast addresses to RAR table
- * @netdev: network interface device structure
- * @vfn: pool to associate with unicast addresses
- *
- * Writes unicast address list to the RAR table.
- * Returns: -ENOMEM on failure/insufficient address space
- *                0 on no addresses written
- *                X on writing X addresses to the RAR table
- **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev, int vfn)
-{
-	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int count = 0;
-
-	/* return ENOMEM indicating insufficient memory for addresses */
-	if (netdev_uc_count(netdev) > ixgbe_available_rars(adapter, vfn))
-		return -ENOMEM;
-
-	if (!netdev_uc_empty(netdev)) {
-		struct netdev_hw_addr *ha;
-		netdev_for_each_uc_addr(ha, netdev) {
-			ixgbe_del_mac_filter(adapter, ha->addr, vfn);
-			ixgbe_add_mac_filter(adapter, ha->addr, vfn);
-			count++;
-		}
-	}
-	return count;
-}
-
 static int ixgbe_uc_sync(struct net_device *netdev, const unsigned char *addr)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
@@ -5328,7 +5298,6 @@ static void ixgbe_macvlan_set_rx_mode(struct net_device *dev, unsigned int pool,
 		vmolr |= IXGBE_VMOLR_ROMPE;
 		hw->mac.ops.update_mc_addr_list(hw, dev);
 	}
-	ixgbe_write_uc_addr_list(adapter->netdev, pool);
 	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr);
 }
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 06/10] macvlan: Add function to test for destination filtering support
From: Jeff Kirsher @ 2018-04-25 15:33 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch adds a function indicating if a given macvlan can fully supports
destination filtering, especially as it relates to unicast traffic. For
those macvlan interfaces that do not support destination filtering such
passthru or source mode filtering we should not be enabling offload
support.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/if_macvlan.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 80089d6c5a0a..0221390668a0 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -88,4 +88,13 @@ static inline void *macvlan_accel_priv(struct net_device *dev)
 
 	return macvlan->accel_priv;
 }
+
+static inline bool macvlan_supports_dest_filter(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	return macvlan->mode == MACVLAN_MODE_PRIVATE ||
+	       macvlan->mode == MACVLAN_MODE_VEPA ||
+	       macvlan->mode == MACVLAN_MODE_BRIDGE;
+}
 #endif /* _LINUX_IF_MACVLAN_H */
-- 
2.14.3

^ permalink raw reply related

* [net-next 02/10] macvlan: Rename fwd_priv to accel_priv and add accessor function
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180425153305.25878-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change renames the fwd_priv member to accel_priv as this more
accurately reflects the actual purpose of this value. In addition I am
adding an accessor which will allow us to further abstract this in the
future if needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 ++++-----
 drivers/net/macvlan.c                         | 31 ++++++++++++++++-----------
 include/linux/if_macvlan.h                    |  8 ++++++-
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c8f59430a5a1..0fbeb2f48711 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5398,10 +5398,9 @@ static int ixgbe_fwd_ring_up(struct net_device *vdev,
 static int ixgbe_upper_dev_walk(struct net_device *upper, void *data)
 {
 	if (netif_is_macvlan(upper)) {
-		struct macvlan_dev *dfwd = netdev_priv(upper);
-		struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+		struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
 
-		if (dfwd->fwd_priv)
+		if (vadapter)
 			ixgbe_fwd_ring_up(upper, vadapter);
 	}
 
@@ -8983,13 +8982,12 @@ struct upper_walk_data {
 static int get_macvlan_queue(struct net_device *upper, void *_data)
 {
 	if (netif_is_macvlan(upper)) {
-		struct macvlan_dev *dfwd = netdev_priv(upper);
-		struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+		struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
 		struct upper_walk_data *data = _data;
 		struct ixgbe_adapter *adapter = data->adapter;
 		int ifindex = data->ifindex;
 
-		if (vadapter && vadapter->netdev->ifindex == ifindex) {
+		if (vadapter && upper->ifindex == ifindex) {
 			data->queue = adapter->rx_ring[vadapter->rx_base_queue]->reg_idx;
 			data->action = data->queue;
 			return 1;
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 725f4b4afc6d..7ddc94ff4109 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -559,9 +559,9 @@ static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	if (unlikely(netpoll_tx_running(dev)))
 		return macvlan_netpoll_send_skb(vlan, skb);
 
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		skb->dev = vlan->lowerdev;
-		ret = dev_queue_xmit_accel(skb, vlan->fwd_priv);
+		ret = dev_queue_xmit_accel(skb, vlan->accel_priv);
 	} else {
 		ret = macvlan_queue_xmit(skb, dev);
 	}
@@ -613,16 +613,23 @@ static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	err = -EBUSY;
+	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
+		goto out;
+
+	/* Attempt to populate accel_priv which is used to offload the L2
+	 * forwarding requests for unicast packets.
+	 */
 	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
-		vlan->fwd_priv =
+		vlan->accel_priv =
 		      lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
 
 		/* If we get a NULL pointer back, or if we get an error
 		 * then we should just fall through to the non accelerated path
 		 */
-		if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
-			vlan->fwd_priv = NULL;
-		} else
+		if (IS_ERR_OR_NULL(vlan->accel_priv))
+			vlan->accel_priv = NULL;
+		else
 			return 0;
 	}
 
@@ -655,10 +662,10 @@ static int macvlan_open(struct net_device *dev)
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
-							   vlan->fwd_priv);
-		vlan->fwd_priv = NULL;
+							   vlan->accel_priv);
+		vlan->accel_priv = NULL;
 	}
 	return err;
 }
@@ -668,10 +675,10 @@ static int macvlan_stop(struct net_device *dev)
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
 
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
-							   vlan->fwd_priv);
-		vlan->fwd_priv = NULL;
+							   vlan->accel_priv);
+		vlan->accel_priv = NULL;
 		return 0;
 	}
 
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 4cb7aeeafce0..c5106ce8273a 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -21,7 +21,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
-	void			*fwd_priv;
+	void			*accel_priv;
 	struct vlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
@@ -86,4 +86,10 @@ macvlan_dev_real_dev(const struct net_device *dev)
 }
 #endif
 
+static inline void *macvlan_accel_priv(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	return macvlan->accel_priv;
+}
 #endif /* _LINUX_IF_MACVLAN_H */
-- 
2.14.3

^ permalink raw reply related

* [net-next 00/10][pull request] 10GbE Intel Wired LAN Driver Updates 2018-04-25
From: Jeff Kirsher @ 2018-04-25 15:32 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series represents yet another phase of the macvlan cleanup Alex has
been working on.

The main goal of these changes is to make it so that we only support
offloading what we can actually offload and we don't break any existing
functionality. So for example we were claiming to advertise source mode
macvlan and we were doing nothing of the sort, so support for that has been
dropped.

The biggest change with this set is that broadcast/multicast replication is
no longer being supported in software. Alex dropped it as it leads to
scaling issues when a broadcast frame has to be replicated up to 64 times.

Beyond that this set goes through and optimized the time needed to bring up
and tear down the macvlan interfaces on ixgbe and provides a clean way for
us to disable the macvlan offload when needed.

The following are changes since commit c749fa181bd5848be78691d23168ec61ce691b95:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Alexander Duyck (10):
  ixgbe: Drop support for macvlan specific unicast lists
  macvlan: Rename fwd_priv to accel_priv and add accessor function
  macvlan: Use software path for offloaded local, broadcast, and
    multicast traffic
  ixgbe/fm10k: Drop tracking stats for macvlan broadcast/multicast
  macvlan: macvlan_count_rx shouldn't be static inline AND extern
  macvlan: Add function to test for destination filtering support
  macvlan: Provide function for interfaces to release HW offload
  ixgbe/fm10k: Only support macvlan offload for types that support
    destination filtering
  ixgbe: Drop real_adapter from l2 fwd acceleration structure
  ixgbe: Avoid performing unnecessary resets for macvlan offload

 drivers/net/ethernet/intel/fm10k/fm10k_main.c   |   7 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |  12 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |   1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 306 +++++++++++++-----------
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c  |   5 +-
 drivers/net/macvlan.c                           |  68 +++---
 include/linux/if_macvlan.h                      |  29 ++-
 7 files changed, 243 insertions(+), 185 deletions(-)

-- 
2.14.3

^ permalink raw reply

* [PATCH iproute2 v3] json_print: Fix hidden 64-bit type promotion
From: Toke Høiland-Jørgensen @ 2018-04-25 15:28 UTC (permalink / raw)
  To: netdev; +Cc: Toke Høiland-Jørgensen
In-Reply-To: <20180425143022.6986-1-toke@toke.dk>

print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").

Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.

Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites. For symmetry,
also add a print_luint() method (with no users).

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
v3:
  - Fix a few more call sites.
  - Also add a print_luint() function for symmetry with the existing
    ones.
  
v2 (v1 was submitted by Kevin Darbyshire-Bryant):
  - Add print_u64() and convert call sites that were actually passing
    64-bit values to print_uint().

 include/json_print.h  |  5 +++-
 include/json_writer.h | 15 ++++++++---
 ip/ipaddress.c        | 62 +++++++++++++++++++++----------------------
 ip/ipmacsec.c         |  8 +++---
 ip/ipmroute.c         |  6 ++---
 ip/ipntable.c         | 38 +++++++++++++-------------
 ip/iproute_lwtunnel.c |  6 ++---
 ip/iptuntap.c         |  4 +--
 ip/tcp_metrics.c      |  6 ++---
 lib/json_print.c      |  5 +++-
 lib/json_writer.c     | 43 +++++++++++++++++++++++++++---
 11 files changed, 123 insertions(+), 75 deletions(-)

diff --git a/include/json_print.h b/include/json_print.h
index 2ca7830a..218fedc5 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -56,13 +56,16 @@ void close_json_array(enum output_type type, const char *delim);
 		print_color_##type_name(t, COLOR_NONE, key, fmt, value);	\
 	}
 _PRINT_FUNC(int, int);
+_PRINT_FUNC(s64, int64_t);
 _PRINT_FUNC(bool, bool);
 _PRINT_FUNC(null, const char*);
 _PRINT_FUNC(string, const char*);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(uint, unsigned int);
+_PRINT_FUNC(u64, uint64_t);
 _PRINT_FUNC(hu, unsigned short);
 _PRINT_FUNC(hex, unsigned int);
 _PRINT_FUNC(0xhex, unsigned int);
+_PRINT_FUNC(luint, unsigned long int);
 _PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
 #undef _PRINT_FUNC
diff --git a/include/json_writer.h b/include/json_writer.h
index 4b4dec28..9ab88e1d 100644
--- a/include/json_writer.h
+++ b/include/json_writer.h
@@ -34,22 +34,29 @@ void jsonw_string(json_writer_t *self, const char *value);
 void jsonw_bool(json_writer_t *self, bool value);
 void jsonw_float(json_writer_t *self, double number);
 void jsonw_float_fmt(json_writer_t *self, const char *fmt, double num);
-void jsonw_uint(json_writer_t *self, uint64_t number);
+void jsonw_uint(json_writer_t *self, unsigned int number);
+void jsonw_u64(json_writer_t *self, uint64_t number);
 void jsonw_xint(json_writer_t *self, uint64_t number);
 void jsonw_hu(json_writer_t *self, unsigned short number);
-void jsonw_int(json_writer_t *self, int64_t number);
+void jsonw_int(json_writer_t *self, int number);
+void jsonw_s64(json_writer_t *self, int64_t number);
 void jsonw_null(json_writer_t *self);
+void jsonw_luint(json_writer_t *self, unsigned long int num);
 void jsonw_lluint(json_writer_t *self, unsigned long long int num);
 
 /* Useful Combinations of name and value */
 void jsonw_string_field(json_writer_t *self, const char *prop, const char *val);
 void jsonw_bool_field(json_writer_t *self, const char *prop, bool value);
 void jsonw_float_field(json_writer_t *self, const char *prop, double num);
-void jsonw_uint_field(json_writer_t *self, const char *prop, uint64_t num);
+void jsonw_uint_field(json_writer_t *self, const char *prop, unsigned int num);
+void jsonw_u64_field(json_writer_t *self, const char *prop, uint64_t num);
 void jsonw_xint_field(json_writer_t *self, const char *prop, uint64_t num);
 void jsonw_hu_field(json_writer_t *self, const char *prop, unsigned short num);
-void jsonw_int_field(json_writer_t *self, const char *prop, int64_t num);
+void jsonw_int_field(json_writer_t *self, const char *prop, int num);
+void jsonw_s64_field(json_writer_t *self, const char *prop, int64_t num);
 void jsonw_null_field(json_writer_t *self, const char *prop);
+void jsonw_luint_field(json_writer_t *self, const char *prop,
+			unsigned long int num);
 void jsonw_lluint_field(json_writer_t *self, const char *prop,
 			unsigned long long int num);
 void jsonw_float_field_fmt(json_writer_t *self, const char *prop,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index aecc9a1d..75539e05 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -554,21 +554,21 @@ static void print_vf_stats64(FILE *fp, struct rtattr *vfstats)
 
 		/* RX stats */
 		open_json_object("rx");
-		print_uint(PRINT_JSON, "bytes", NULL,
+		print_u64(PRINT_JSON, "bytes", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
-		print_uint(PRINT_JSON, "packets", NULL,
+		print_u64(PRINT_JSON, "packets", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
-		print_uint(PRINT_JSON, "multicast", NULL,
+		print_u64(PRINT_JSON, "multicast", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
-		print_uint(PRINT_JSON, "broadcast", NULL,
+		print_u64(PRINT_JSON, "broadcast", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_BROADCAST]));
 		close_json_object();
 
 		/* TX stats */
 		open_json_object("tx");
-		print_uint(PRINT_JSON, "tx_bytes", NULL,
+		print_u64(PRINT_JSON, "tx_bytes", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
-		print_uint(PRINT_JSON, "tx_packets", NULL,
+		print_u64(PRINT_JSON, "tx_packets", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
 		close_json_object();
 		close_json_object();
@@ -608,69 +608,69 @@ static void __print_link_stats(FILE *fp, struct rtattr *tb[])
 
 		/* RX stats */
 		open_json_object("rx");
-		print_uint(PRINT_JSON, "bytes", NULL, s->rx_bytes);
-		print_uint(PRINT_JSON, "packets", NULL, s->rx_packets);
-		print_uint(PRINT_JSON, "errors", NULL, s->rx_errors);
-		print_uint(PRINT_JSON, "dropped", NULL, s->rx_dropped);
-		print_uint(PRINT_JSON, "over_errors", NULL, s->rx_over_errors);
-		print_uint(PRINT_JSON, "multicast", NULL, s->multicast);
+		print_u64(PRINT_JSON, "bytes", NULL, s->rx_bytes);
+		print_u64(PRINT_JSON, "packets", NULL, s->rx_packets);
+		print_u64(PRINT_JSON, "errors", NULL, s->rx_errors);
+		print_u64(PRINT_JSON, "dropped", NULL, s->rx_dropped);
+		print_u64(PRINT_JSON, "over_errors", NULL, s->rx_over_errors);
+		print_u64(PRINT_JSON, "multicast", NULL, s->multicast);
 		if (s->rx_compressed)
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "compressed", NULL, s->rx_compressed);
 
 		/* RX error stats */
 		if (show_stats > 1) {
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "length_errors",
 				   NULL, s->rx_length_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "crc_errors",
 				   NULL, s->rx_crc_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "frame_errors",
 				   NULL, s->rx_frame_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "fifo_errors",
 				   NULL, s->rx_fifo_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "missed_errors",
 				   NULL, s->rx_missed_errors);
 			if (s->rx_nohandler)
-				print_uint(PRINT_JSON,
+				print_u64(PRINT_JSON,
 					   "nohandler", NULL, s->rx_nohandler);
 		}
 		close_json_object();
 
 		/* TX stats */
 		open_json_object("tx");
-		print_uint(PRINT_JSON, "bytes", NULL, s->tx_bytes);
-		print_uint(PRINT_JSON, "packets", NULL, s->tx_packets);
-		print_uint(PRINT_JSON, "errors", NULL, s->tx_errors);
-		print_uint(PRINT_JSON, "dropped", NULL, s->tx_dropped);
-		print_uint(PRINT_JSON,
+		print_u64(PRINT_JSON, "bytes", NULL, s->tx_bytes);
+		print_u64(PRINT_JSON, "packets", NULL, s->tx_packets);
+		print_u64(PRINT_JSON, "errors", NULL, s->tx_errors);
+		print_u64(PRINT_JSON, "dropped", NULL, s->tx_dropped);
+		print_u64(PRINT_JSON,
 			   "carrier_errors",
 			   NULL, s->tx_carrier_errors);
-		print_uint(PRINT_JSON, "collisions", NULL, s->collisions);
+		print_u64(PRINT_JSON, "collisions", NULL, s->collisions);
 		if (s->tx_compressed)
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "compressed", NULL, s->tx_compressed);
 
 		/* TX error stats */
 		if (show_stats > 1) {
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "aborted_errors",
 				   NULL, s->tx_aborted_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "fifo_errors",
 				   NULL, s->tx_fifo_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "window_errors",
 				   NULL, s->tx_window_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "heartbeat_errors",
 				   NULL, s->tx_heartbeat_errors);
 			if (carrier_changes)
-				print_uint(PRINT_JSON, "carrier_changes", NULL,
+				print_u64(PRINT_JSON, "carrier_changes", NULL,
 					   rta_getattr_u32(carrier_changes));
 		}
 
diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index 38ec7136..4e4e158e 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -640,7 +640,7 @@ static void print_attrs(struct rtattr *attrs[])
 	}
 }
 
-static __u64 getattr_uint(struct rtattr *stat)
+static __u64 getattr_u64(struct rtattr *stat)
 {
 	switch (RTA_PAYLOAD(stat)) {
 	case sizeof(__u64):
@@ -681,7 +681,7 @@ static void print_fp_stats(const char *prefix,
 
 		pad = strlen(names[i]) + 1;
 		if (stats[i])
-			printf("%*llu", pad, getattr_uint(stats[i]));
+			printf("%*llu", pad, getattr_u64(stats[i]));
 		else
 			printf("%*c", pad, '-');
 	}
@@ -697,8 +697,8 @@ static void print_json_stats(const char *names[], unsigned int num,
 		if (!names[i] || !stats[i])
 			continue;
 
-		print_uint(PRINT_JSON, names[i],
-			   NULL, getattr_uint(stats[i]));
+		print_u64(PRINT_JSON, names[i],
+			   NULL, getattr_u64(stats[i]));
 	}
 }
 
diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index 59c5b771..cdb4d898 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -182,12 +182,12 @@ int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		struct rta_mfc_stats *mfcs = RTA_DATA(tb[RTA_MFC_STATS]);
 
 		print_string(PRINT_FP, NULL, "%s", _SL_);
-		print_uint(PRINT_ANY, "packets", "  %"PRIu64" packets,",
+		print_u64(PRINT_ANY, "packets", "  %"PRIu64" packets,",
 			   mfcs->mfcs_packets);
-		print_uint(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
+		print_u64(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
 
 		if (mfcs->mfcs_wrong_if)
-			print_uint(PRINT_ANY, "wrong_if",
+			print_u64(PRINT_ANY, "wrong_if",
 				   ", %"PRIu64" arrived on wrong iif.",
 				   mfcs->mfcs_wrong_if);
 	}
diff --git a/ip/ipntable.c b/ip/ipntable.c
index 82f40f87..4fae181d 100644
--- a/ip/ipntable.c
+++ b/ip/ipntable.c
@@ -392,7 +392,7 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_REACHABLE_TIME]) {
 		__u64 reachable = rta_getattr_u64(tpb[NDTPA_REACHABLE_TIME]);
 
-		print_uint(PRINT_ANY, "reachable",
+		print_u64(PRINT_ANY, "reachable",
 			   "reachable %llu ", reachable);
 	}
 
@@ -400,14 +400,14 @@ static void print_ndtparams(struct rtattr *tpb[])
 		__u64 breachable
 			= rta_getattr_u64(tpb[NDTPA_BASE_REACHABLE_TIME]);
 
-		print_uint(PRINT_ANY, "base_reachable",
+		print_u64(PRINT_ANY, "base_reachable",
 			   "base_reachable %llu ", breachable);
 	}
 
 	if (tpb[NDTPA_RETRANS_TIME]) {
 		__u64 retrans = rta_getattr_u64(tpb[NDTPA_RETRANS_TIME]);
 
-		print_uint(PRINT_ANY, "retrans", "retrans %llu ", retrans);
+		print_u64(PRINT_ANY, "retrans", "retrans %llu ", retrans);
 	}
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
@@ -415,14 +415,14 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_GC_STALETIME]) {
 		__u64 gc_stale = rta_getattr_u64(tpb[NDTPA_GC_STALETIME]);
 
-		print_uint(PRINT_ANY, "gc_stale", "gc_stale %llu ", gc_stale);
+		print_u64(PRINT_ANY, "gc_stale", "gc_stale %llu ", gc_stale);
 	}
 
 	if (tpb[NDTPA_DELAY_PROBE_TIME]) {
 		__u64 delay_probe
 			= rta_getattr_u64(tpb[NDTPA_DELAY_PROBE_TIME]);
 
-		print_uint(PRINT_ANY, "delay_probe",
+		print_u64(PRINT_ANY, "delay_probe",
 			   "delay_probe %llu ", delay_probe);
 	}
 
@@ -459,14 +459,14 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_ANYCAST_DELAY]) {
 		__u64 anycast_delay = rta_getattr_u64(tpb[NDTPA_ANYCAST_DELAY]);
 
-		print_uint(PRINT_ANY, "anycast_delay",
+		print_u64(PRINT_ANY, "anycast_delay",
 			   "anycast_delay %llu ", anycast_delay);
 	}
 
 	if (tpb[NDTPA_PROXY_DELAY]) {
 		__u64 proxy_delay = rta_getattr_u64(tpb[NDTPA_PROXY_DELAY]);
 
-		print_uint(PRINT_ANY, "proxy_delay",
+		print_u64(PRINT_ANY, "proxy_delay",
 			   "proxy_delay %llu ", proxy_delay);
 	}
 
@@ -479,7 +479,7 @@ static void print_ndtparams(struct rtattr *tpb[])
 	if (tpb[NDTPA_LOCKTIME]) {
 		__u64 locktime = rta_getattr_u64(tpb[NDTPA_LOCKTIME]);
 
-		print_uint(PRINT_ANY, "locktime", "locktime %llu ", locktime);
+		print_u64(PRINT_ANY, "locktime", "locktime %llu ", locktime);
 	}
 
 	print_string(PRINT_FP, NULL, "%s", _SL_);
@@ -490,31 +490,31 @@ static void print_ndtstats(const struct ndt_stats *ndts)
 
 	print_string(PRINT_FP, NULL, "    stats ", NULL);
 
-	print_uint(PRINT_ANY, "allocs", "allocs %llu ", ndts->ndts_allocs);
-	print_uint(PRINT_ANY, "destroys", "destroys %llu ",
+	print_u64(PRINT_ANY, "allocs", "allocs %llu ", ndts->ndts_allocs);
+	print_u64(PRINT_ANY, "destroys", "destroys %llu ",
 		   ndts->ndts_destroys);
-	print_uint(PRINT_ANY, "hash_grows", "hash_grows %llu ",
+	print_u64(PRINT_ANY, "hash_grows", "hash_grows %llu ",
 		   ndts->ndts_hash_grows);
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
 
-	print_uint(PRINT_ANY, "res_failed", "res_failed %llu ",
+	print_u64(PRINT_ANY, "res_failed", "res_failed %llu ",
 		   ndts->ndts_res_failed);
-	print_uint(PRINT_ANY, "lookups", "lookups %llu ", ndts->ndts_lookups);
-	print_uint(PRINT_ANY, "hits", "hits %llu ", ndts->ndts_hits);
+	print_u64(PRINT_ANY, "lookups", "lookups %llu ", ndts->ndts_lookups);
+	print_u64(PRINT_ANY, "hits", "hits %llu ", ndts->ndts_hits);
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
 
-	print_uint(PRINT_ANY, "rcv_probes_mcast", "rcv_probes_mcast %llu ",
+	print_u64(PRINT_ANY, "rcv_probes_mcast", "rcv_probes_mcast %llu ",
 		   ndts->ndts_rcv_probes_mcast);
-	print_uint(PRINT_ANY, "rcv_probes_ucast", "rcv_probes_ucast %llu ",
+	print_u64(PRINT_ANY, "rcv_probes_ucast", "rcv_probes_ucast %llu ",
 		   ndts->ndts_rcv_probes_ucast);
 
 	print_string(PRINT_FP, NULL, "%s    ", _SL_);
 
-	print_uint(PRINT_ANY, "periodic_gc_runs", "periodic_gc_runs %llu ",
+	print_u64(PRINT_ANY, "periodic_gc_runs", "periodic_gc_runs %llu ",
 		   ndts->ndts_periodic_gc_runs);
-	print_uint(PRINT_ANY, "forced_gc_runs", "forced_gc_runs %llu ",
+	print_u64(PRINT_ANY, "forced_gc_runs", "forced_gc_runs %llu ",
 		   ndts->ndts_forced_gc_runs);
 
 	print_string(PRINT_FP, NULL, "%s", _SL_);
@@ -607,7 +607,7 @@ static int print_ntable(const struct sockaddr_nl *who,
 	if (tb[NDTA_GC_INTERVAL]) {
 		__u64 gc_int = rta_getattr_u64(tb[NDTA_GC_INTERVAL]);
 
-		print_uint(PRINT_ANY, "gc_interval", "gc_int %llu ", gc_int);
+		print_u64(PRINT_ANY, "gc_interval", "gc_int %llu ", gc_int);
 	}
 
 	if (ret)
diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index cde9b3d2..46a212c8 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -273,7 +273,7 @@ static void print_encap_ip(FILE *fp, struct rtattr *encap)
 	parse_rtattr_nested(tb, LWTUNNEL_IP_MAX, encap);
 
 	if (tb[LWTUNNEL_IP_ID])
-		print_uint(PRINT_ANY, "id", "id %llu ",
+		print_u64(PRINT_ANY, "id", "id %llu ",
 			   ntohll(rta_getattr_u64(tb[LWTUNNEL_IP_ID])));
 
 	if (tb[LWTUNNEL_IP_SRC])
@@ -333,7 +333,7 @@ static void print_encap_ip6(FILE *fp, struct rtattr *encap)
 	parse_rtattr_nested(tb, LWTUNNEL_IP6_MAX, encap);
 
 	if (tb[LWTUNNEL_IP6_ID])
-		print_uint(PRINT_ANY, "id", "id %llu ",
+		print_u64(PRINT_ANY, "id", "id %llu ",
 			    ntohll(rta_getattr_u64(tb[LWTUNNEL_IP6_ID])));
 
 	if (tb[LWTUNNEL_IP6_SRC])
@@ -347,7 +347,7 @@ static void print_encap_ip6(FILE *fp, struct rtattr *encap)
 				   rt_addr_n2a_rta(AF_INET6, tb[LWTUNNEL_IP6_DST]));
 
 	if (tb[LWTUNNEL_IP6_HOPLIMIT])
-		print_uint(PRINT_ANY, "hoplimit",
+		print_u64(PRINT_ANY, "hoplimit",
 			   "hoplimit %u ",
 			   rta_getattr_u8(tb[LWTUNNEL_IP6_HOPLIMIT]));
 
diff --git a/ip/iptuntap.c b/ip/iptuntap.c
index 6c5a7259..58996e6c 100644
--- a/ip/iptuntap.c
+++ b/ip/iptuntap.c
@@ -440,10 +440,10 @@ static int print_tuntap(const struct sockaddr_nl *who,
 			   "ifname", "%s:", name);
 	print_flags(flags);
 	if (owner != -1)
-		print_uint(PRINT_ANY, "user",
+		print_u64(PRINT_ANY, "user",
 			   " user %ld", owner);
 	if (group != -1)
-		print_uint(PRINT_ANY, "group",
+		print_u64(PRINT_ANY, "group",
 			   " group %ld", group);
 
 	if (show_details) {
diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
index 72dc980c..ad3d6f36 100644
--- a/ip/tcp_metrics.c
+++ b/ip/tcp_metrics.c
@@ -139,19 +139,19 @@ static void print_tcp_metrics(struct rtattr *a)
 
 		print_uint(PRINT_JSON, name, NULL, val);
 		print_string(PRINT_FP, NULL, " %s ", name);
-		print_uint(PRINT_FP, NULL, "%lu", val);
+		print_uint(PRINT_FP, NULL, "%u", val);
 	}
 
 	if (rtt) {
 		print_float(PRINT_JSON, "rtt", NULL,
 			    (double)rtt / usec_per_sec);
-		print_uint(PRINT_FP, NULL,
+		print_u64(PRINT_FP, NULL,
 			   " rtt %luus", rtt);
 	}
 	if (rttvar) {
 		print_float(PRINT_JSON, "rttvar", NULL,
 			    (double) rttvar / usec_per_sec);
-		print_uint(PRINT_FP, NULL,
+		print_u64(PRINT_FP, NULL,
 			   " rttvar %luus", rttvar);
 	}
 }
diff --git a/lib/json_print.c b/lib/json_print.c
index bda72933..5dc41bfa 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -116,8 +116,11 @@ void close_json_array(enum output_type type, const char *str)
 		}							\
 	}
 _PRINT_FUNC(int, int);
+_PRINT_FUNC(s64, int64_t);
 _PRINT_FUNC(hu, unsigned short);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(uint, unsigned int);
+_PRINT_FUNC(u64, uint64_t);
+_PRINT_FUNC(luint, unsigned long int);
 _PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
 #undef _PRINT_FUNC
diff --git a/lib/json_writer.c b/lib/json_writer.c
index 0ad04218..aa9ce1c6 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -220,7 +220,12 @@ void jsonw_hu(json_writer_t *self, unsigned short num)
 	jsonw_printf(self, "%hu", num);
 }
 
-void jsonw_uint(json_writer_t *self, uint64_t num)
+void jsonw_uint(json_writer_t *self, unsigned int num)
+{
+	jsonw_printf(self, "%u", num);
+}
+
+void jsonw_u64(json_writer_t *self, uint64_t num)
 {
 	jsonw_printf(self, "%"PRIu64, num);
 }
@@ -230,12 +235,22 @@ void jsonw_xint(json_writer_t *self, uint64_t num)
 	jsonw_printf(self, "%"PRIx64, num);
 }
 
+void jsonw_luint(json_writer_t *self, unsigned long int num)
+{
+	jsonw_printf(self, "%lu", num);
+}
+
 void jsonw_lluint(json_writer_t *self, unsigned long long int num)
 {
 	jsonw_printf(self, "%llu", num);
 }
 
-void jsonw_int(json_writer_t *self, int64_t num)
+void jsonw_int(json_writer_t *self, int num)
+{
+	jsonw_printf(self, "%d", num);
+}
+
+void jsonw_s64(json_writer_t *self, int64_t num)
 {
 	jsonw_printf(self, "%"PRId64, num);
 }
@@ -268,12 +283,18 @@ void jsonw_float_field_fmt(json_writer_t *self,
 	jsonw_float_fmt(self, fmt, val);
 }
 
-void jsonw_uint_field(json_writer_t *self, const char *prop, uint64_t num)
+void jsonw_uint_field(json_writer_t *self, const char *prop, unsigned int num)
 {
 	jsonw_name(self, prop);
 	jsonw_uint(self, num);
 }
 
+void jsonw_u64_field(json_writer_t *self, const char *prop, uint64_t num)
+{
+	jsonw_name(self, prop);
+	jsonw_u64(self, num);
+}
+
 void jsonw_xint_field(json_writer_t *self, const char *prop, uint64_t num)
 {
 	jsonw_name(self, prop);
@@ -286,6 +307,14 @@ void jsonw_hu_field(json_writer_t *self, const char *prop, unsigned short num)
 	jsonw_hu(self, num);
 }
 
+void jsonw_luint_field(json_writer_t *self,
+			const char *prop,
+			unsigned long int num)
+{
+	jsonw_name(self, prop);
+	jsonw_luint(self, num);
+}
+
 void jsonw_lluint_field(json_writer_t *self,
 			const char *prop,
 			unsigned long long int num)
@@ -294,12 +323,18 @@ void jsonw_lluint_field(json_writer_t *self,
 	jsonw_lluint(self, num);
 }
 
-void jsonw_int_field(json_writer_t *self, const char *prop, int64_t num)
+void jsonw_int_field(json_writer_t *self, const char *prop, int num)
 {
 	jsonw_name(self, prop);
 	jsonw_int(self, num);
 }
 
+void jsonw_s64_field(json_writer_t *self, const char *prop, int64_t num)
+{
+	jsonw_name(self, prop);
+	jsonw_s64(self, num);
+}
+
 void jsonw_null_field(json_writer_t *self, const char *prop)
 {
 	jsonw_name(self, prop);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
From: Christoph Hellwig @ 2018-04-25 15:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Eric Dumazet, David S . Miller, netdev,
	Andy Lutomirski, linux-kernel, linux-mm, Soheil Hassas Yeganeh
In-Reply-To: <5cd31eba-63b5-9160-0a2e-f441340df0d3@gmail.com>

On Wed, Apr 25, 2018 at 06:01:02AM -0700, Eric Dumazet wrote:
> Thanks Christoph
> 
> Note the high cost of zap_page_range(), needed to avoid -EBUSY being returned
> from vm_insert_page() the second time TCP_ZEROCOPY_RECEIVE is used on one VMA.
> 
> Ideally a vm_replace_page() would avoid this cost ?

No my area of expertise, I'll defer to the MM folks..

^ permalink raw reply

* Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-04-25 15:22 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: cake, Dave Taht
In-Reply-To: <e27a6618-3a68-fa60-53a0-109b4df70482@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 04/25/2018 06:42 AM, Toke Høiland-Jørgensen wrote:
>> sch_cake targets the home router use case and is intended to squeeze the
>> most bandwidth and latency out of even the slowest ISP links and routers,
>> while presenting an API simple enough that even an ISP can configure it.
>> 
>
> * Support for ack filtering.
>
> Oh my god. Cake became a monster.

Haha, you either die a hero or live long enough to see yourself become
the villain, I guess? ;)

We do realise there are lots of good reasons not to do ACK filtering;
but it does make a difference on highly asymmetrical links (see
http://blog.cerowrt.org/post/ack_filtering/), which sadly some people
are still behind.

> syzkaller will be very happy to trigger all kind of bugs in it.

Heh, right.

> Lack of any pskb_may_pull() is really concerning.

By this you mean "check that the packet is long enough to contain the
header we are looking for before trying to do ACK filtering", right?

> How ack filter deals with reorders ?

I think this will do the right thing?

		/* new ack sequence must be greater
		 */
		if (thisconn &&
		    (ntohl(tcph_check->ack_seq) > ntohl(tcph->ack_seq)))
			continue;

> Also the forced GSO segmentation looks wrong to me.

Wrong as in it's done wrong, or wrong as in you object to the whole
concept?

> This kills xmit_more gain we have when GSO is performed after
> qdisc dequeue before hitting device.
>
> This should be driven by a parameter really, some threshold on the skb size.
>
> What performance number do you get on a 10Gbit NIC for example ?

Single-flow throughput through 2 hops on a 40Gbit connection (with CAKE
in unlimited mode vs pfifo_fast on the router):

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 () port 0 AF_INET : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    18840.40   

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to testbed-40g-2 () port 0 AF_INET : demo
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    24804.77   


Note, however, that CAKE is explicitly targeted at home gateways, which
generally run at speed a few orders of magnitude slower than this, and
we have heavily optimised for lowest possible latency. So this is a
conscious design choice, I guess you could say.

> Also, how ack filter can suppress packets after skb_gso_segment() ?

Hmm, because pure ACKs are not generally aggregated (sorry, I'm not
quite clear on when exactly GSO will kick in)?

-Toke

^ permalink raw reply

* Re: [PATCH iproute2 v2] json_print: Fix hidden 64-bit type promotion
From: Stephen Hemminger @ 2018-04-25 15:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, Kevin Darbyshire-Bryant
In-Reply-To: <87bme75o5r.fsf@toke.dk>

On Wed, 25 Apr 2018 16:57:52 +0200
Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Stephen Hemminger <stephen@networkplumber.org> writes:
> 
> > On Wed, 25 Apr 2018 16:30:22 +0200
> > Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> >  
> >> print_uint() will silently promote its variable type to uint64_t, but there
> >> is nothing that ensures that the format string specifier passed along with
> >> it fits (and the function name suggest to pass "%u").
> >> 
> >> Fix this by changing print_uint() to use a native 'unsigned int' type, and
> >> introduce a separate print_u64() function for printing 64-bit values. All
> >> call sites that were actually printing 64-bit values using print_uint() are
> >> converted to use print_u64() instead.
> >> 
> >> Since print_int() was already using native int types, just add a
> >> print_s64() to match, but don't convert any call sites.
> >> 
> >> Cc: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>  
> >
> > Yes, this makes sense. Maybe there should be a print_luint for
> > consistency.  
> 
> I just realised I missed a few call sites, so I'll resend. Should I
> call the new function print_luint() instead of print_u64()?

Ideally, there would be both functions, and use based on what is being printed.

> > Also, I tried (in vain) to make a version that allows GCC to check the
> > format string.  But it was a struggle and just gave up.  
> 
> Yeah, no idea how to do this either...

Maybe some magic smatch or multi-line regex it would be possible to
find all instances of print_uint, then look at format string of each and see if there is
a single %u.  Some added complexity since some places only print json and don't care
and pass NULL for format.

^ permalink raw reply

* Re: [PATCH 3/8] ARM: dts: stm32: add ethernet pins to stm32mp157c
From: Rob Herring @ 2018-04-25 15:09 UTC (permalink / raw)
  To: Christophe Roullier
  Cc: Mark Rutland, Maxime Coquelin, Alexandre Torgue,
	Giuseppe CAVALLARO, devicetree,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE, netdev
In-Reply-To: <1524582120-4451-4-git-send-email-christophe.roullier@st.com>

On Tue, Apr 24, 2018 at 10:01 AM, Christophe Roullier
<christophe.roullier@st.com> wrote:
> Add ethernet pins on stm32mp157c.
>
> Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
> ---
>  arch/arm/boot/dts/stm32mp157-pinctrl.dtsi | 46 +++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi b/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
> index 6f044100..86720a5 100644
> --- a/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
> +++ b/arch/arm/boot/dts/stm32mp157-pinctrl.dtsi
> @@ -158,6 +158,52 @@
>                                         bias-disable;
>                                 };
>                         };
> +
> +                       ethernet0_rgmii_pins_a: rgmii@0 {

A unit-address without 'reg' property is not valid, so drop the '@0'.

Please build your dtb with W=1 or W=12 which will tell you this and
other errors.

Rob

^ permalink raw reply

* [PATCH bpf-next] selftests/bpf: bpf tunnel test.
From: William Tu @ 2018-04-25 15:01 UTC (permalink / raw)
  To: netdev

The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
and samples/bpf/test_tunnel_bpf.sh to selftests.  There are a couple
changes from the original:
    1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
    2) simplify the original ipip tests (remove iperf tests)
    3) improve documentation
    4) use bpf_ntoh* and bpf_hton* api

In summary, 'test_tunnel_kern.o' contains the following bpf program:
  GRE: gre_set_tunnel, gre_get_tunnel
  IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
  ERSPAN: erspan_set_tunnel, erspan_get_tunnel
  IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
  VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
  IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
  GENEVE: geneve_set_tunnel, geneve_get_tunnel
  IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
  IPIP: ipip_set_tunnel, ipip_get_tunnel
  IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
         ip6ip6_set_tunnel, ip6ip6_get_tunnel

Signed-off-by: William Tu <u9012063@gmail.com>
---
 samples/bpf/Makefile                           |   1 -
 samples/bpf/tcbpf2_kern.c                      | 612 ----------------------
 samples/bpf/test_tunnel_bpf.sh                 | 390 --------------
 tools/testing/selftests/bpf/Makefile           |   5 +-
 tools/testing/selftests/bpf/test_tunnel.sh     | 651 +++++++++++++++++++++++
 tools/testing/selftests/bpf/test_tunnel_kern.c | 691 +++++++++++++++++++++++++
 6 files changed, 1345 insertions(+), 1005 deletions(-)
 delete mode 100644 samples/bpf/tcbpf2_kern.c
 delete mode 100755 samples/bpf/test_tunnel_bpf.sh
 create mode 100755 tools/testing/selftests/bpf/test_tunnel.sh
 create mode 100644 tools/testing/selftests/bpf/test_tunnel_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index aa8c392e2e52..b853581592fd 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -114,7 +114,6 @@ always += sock_flags_kern.o
 always += test_probe_write_user_kern.o
 always += trace_output_kern.o
 always += tcbpf1_kern.o
-always += tcbpf2_kern.o
 always += tc_l2_redirect_kern.o
 always += lathist_kern.o
 always += offwaketime_kern.o
diff --git a/samples/bpf/tcbpf2_kern.c b/samples/bpf/tcbpf2_kern.c
deleted file mode 100644
index fa260c750fb1..000000000000
--- a/samples/bpf/tcbpf2_kern.c
+++ /dev/null
@@ -1,612 +0,0 @@
-/* Copyright (c) 2016 VMware
- * Copyright (c) 2016 Facebook
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-#define KBUILD_MODNAME "foo"
-#include <uapi/linux/bpf.h>
-#include <uapi/linux/if_ether.h>
-#include <uapi/linux/if_packet.h>
-#include <uapi/linux/ip.h>
-#include <uapi/linux/ipv6.h>
-#include <uapi/linux/in.h>
-#include <uapi/linux/tcp.h>
-#include <uapi/linux/filter.h>
-#include <uapi/linux/pkt_cls.h>
-#include <uapi/linux/erspan.h>
-#include <net/ipv6.h>
-#include "bpf_helpers.h"
-#include "bpf_endian.h"
-
-#define _htonl __builtin_bswap32
-#define ERROR(ret) do {\
-		char fmt[] = "ERROR line:%d ret:%d\n";\
-		bpf_trace_printk(fmt, sizeof(fmt), __LINE__, ret); \
-	} while(0)
-
-struct geneve_opt {
-	__be16	opt_class;
-	u8	type;
-	u8	length:5;
-	u8	r3:1;
-	u8	r2:1;
-	u8	r1:1;
-	u8	opt_data[8]; /* hard-coded to 8 byte */
-};
-
-struct vxlan_metadata {
-	u32     gbp;
-};
-
-SEC("gre_set_tunnel")
-int _gre_set_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_ZERO_CSUM_TX | BPF_F_SEQ_NUMBER);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("gre_get_tunnel")
-int _gre_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "key %d remote ip 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), key.tunnel_id, key.remote_ipv4);
-	return TC_ACT_OK;
-}
-
-SEC("ip6gretap_set_tunnel")
-int _ip6gretap_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv6[3] = _htonl(0x11); /* ::11 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-	key.tunnel_label = 0xabcde;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX |
-				     BPF_F_SEQ_NUMBER);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip6gretap_get_tunnel")
-int _ip6gretap_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "key %d remote ip6 ::%x label %x\n";
-	struct bpf_tunnel_key key;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
-
-	return TC_ACT_OK;
-}
-
-SEC("erspan_set_tunnel")
-int _erspan_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	__builtin_memset(&md, 0, sizeof(md));
-#ifdef ERSPAN_V1
-	md.version = 1;
-	md.u.index = bpf_htonl(123);
-#else
-	u8 direction = 1;
-	u8 hwid = 7;
-
-	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
-#endif
-
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("erspan_get_tunnel")
-int _erspan_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "key %d remote ip 0x%x erspan version %d\n";
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	u32 index;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.version);
-
-#ifdef ERSPAN_V1
-	char fmt2[] = "\tindex %x\n";
-
-	index = bpf_ntohl(md.u.index);
-	bpf_trace_printk(fmt2, sizeof(fmt2), index);
-#else
-	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
-
-	bpf_trace_printk(fmt2, sizeof(fmt2),
-			 md.u.md2.dir,
-			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
-			 bpf_ntohl(md.u.md2.timestamp));
-#endif
-
-	return TC_ACT_OK;
-}
-
-SEC("ip4ip6erspan_set_tunnel")
-int _ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	int ret;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv6[3] = _htonl(0x11);
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
-				     BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	__builtin_memset(&md, 0, sizeof(md));
-
-#ifdef ERSPAN_V1
-	md.u.index = htonl(123);
-	md.version = 1;
-#else
-	u8 direction = 0;
-	u8 hwid = 17;
-
-	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
-#endif
-
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip4ip6erspan_get_tunnel")
-int _ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
-{
-	char fmt[] = "ip6erspan get key %d remote ip6 ::%x erspan version %d\n";
-	struct bpf_tunnel_key key;
-	struct erspan_metadata md;
-	u32 index;
-	int ret;
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.version);
-
-#ifdef ERSPAN_V1
-	char fmt2[] = "\tindex %x\n";
-
-	index = bpf_ntohl(md.u.index);
-	bpf_trace_printk(fmt2, sizeof(fmt2), index);
-#else
-	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
-
-	bpf_trace_printk(fmt2, sizeof(fmt2),
-			 md.u.md2.dir,
-			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
-			 bpf_ntohl(md.u.md2.timestamp));
-#endif
-
-	return TC_ACT_OK;
-}
-
-SEC("vxlan_set_tunnel")
-int _vxlan_set_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct vxlan_metadata md;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	md.gbp = 0x800FF; /* Set VXLAN Group Policy extension */
-	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("vxlan_get_tunnel")
-int _vxlan_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct vxlan_metadata md;
-	char fmt[] = "key %d remote ip 0x%x vxlan gbp 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, md.gbp);
-
-	return TC_ACT_OK;
-}
-
-SEC("geneve_set_tunnel")
-int _geneve_set_tunnel(struct __sk_buff *skb)
-{
-	int ret, ret2;
-	struct bpf_tunnel_key key;
-	struct geneve_opt gopt;
-
-	__builtin_memset(&key, 0x0, sizeof(key));
-	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	key.tunnel_id = 2;
-	key.tunnel_tos = 0;
-	key.tunnel_ttl = 64;
-
-	__builtin_memset(&gopt, 0x0, sizeof(gopt));
-	gopt.opt_class = 0x102; /* Open Virtual Networking (OVN) */
-	gopt.type = 0x08;
-	gopt.r1 = 0;
-	gopt.r2 = 0;
-	gopt.r3 = 0;
-	gopt.length = 2; /* 4-byte multiple */
-	*(int *) &gopt.opt_data = 0xdeadbeef;
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("geneve_get_tunnel")
-int _geneve_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	struct geneve_opt gopt;
-	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt),
-			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
-	return TC_ACT_OK;
-}
-
-SEC("ipip_set_tunnel")
-int _ipip_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct iphdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.tunnel_ttl = 64;
-	if (iph->protocol == IPPROTO_ICMP) {
-		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-	} else {
-		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5)
-			return TC_ACT_SHOT;
-
-		if (tcp->dest == htons(5200))
-			key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
-		else if (tcp->dest == htons(5201))
-			key.remote_ipv4 = 0xac100165; /* 172.16.1.101 */
-		else
-			return TC_ACT_SHOT;
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ipip_get_tunnel")
-int _ipip_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip 0x%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), key.remote_ipv4);
-	return TC_ACT_OK;
-}
-
-SEC("ipip6_set_tunnel")
-int _ipip6_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct iphdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.remote_ipv6[0] = _htonl(0x2401db00);
-	key.tunnel_ttl = 64;
-
-	if (iph->protocol == IPPROTO_ICMP) {
-		key.remote_ipv6[3] = _htonl(1);
-	} else {
-		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5) {
-			ERROR(iph->protocol);
-			return TC_ACT_SHOT;
-		}
-
-		if (tcp->dest == htons(5200)) {
-			key.remote_ipv6[3] = _htonl(1);
-		} else if (tcp->dest == htons(5201)) {
-			key.remote_ipv6[3] = _htonl(2);
-		} else {
-			ERROR(tcp->dest);
-			return TC_ACT_SHOT;
-		}
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ipip6_get_tunnel")
-int _ipip6_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip6 %x::%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
-			 _htonl(key.remote_ipv6[3]));
-	return TC_ACT_OK;
-}
-
-SEC("ip6ip6_set_tunnel")
-int _ip6ip6_set_tunnel(struct __sk_buff *skb)
-{
-	struct bpf_tunnel_key key = {};
-	void *data = (void *)(long)skb->data;
-	struct ipv6hdr *iph = data;
-	struct tcphdr *tcp = data + sizeof(*iph);
-	void *data_end = (void *)(long)skb->data_end;
-	int ret;
-
-	/* single length check */
-	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
-		ERROR(1);
-		return TC_ACT_SHOT;
-	}
-
-	key.remote_ipv6[0] = _htonl(0x2401db00);
-	key.tunnel_ttl = 64;
-
-	if (iph->nexthdr == NEXTHDR_ICMP) {
-		key.remote_ipv6[3] = _htonl(1);
-	} else {
-		if (iph->nexthdr != NEXTHDR_TCP) {
-			ERROR(iph->nexthdr);
-			return TC_ACT_SHOT;
-		}
-
-		if (tcp->dest == htons(5200)) {
-			key.remote_ipv6[3] = _htonl(1);
-		} else if (tcp->dest == htons(5201)) {
-			key.remote_ipv6[3] = _htonl(2);
-		} else {
-			ERROR(tcp->dest);
-			return TC_ACT_SHOT;
-		}
-	}
-
-	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	return TC_ACT_OK;
-}
-
-SEC("ip6ip6_get_tunnel")
-int _ip6ip6_get_tunnel(struct __sk_buff *skb)
-{
-	int ret;
-	struct bpf_tunnel_key key;
-	char fmt[] = "remote ip6 %x::%x\n";
-
-	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
-	if (ret < 0) {
-		ERROR(ret);
-		return TC_ACT_SHOT;
-	}
-
-	bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
-			 _htonl(key.remote_ipv6[3]));
-	return TC_ACT_OK;
-}
-
-SEC("xfrm_get_state")
-int _xfrm_get_state(struct __sk_buff *skb)
-{
-	struct bpf_xfrm_state x;
-	char fmt[] = "reqid %d spi 0x%x remote ip 0x%x\n";
-	int ret;
-
-	ret = bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
-	if (ret < 0)
-		return TC_ACT_OK;
-
-	bpf_trace_printk(fmt, sizeof(fmt), x.reqid, bpf_ntohl(x.spi),
-			 bpf_ntohl(x.remote_ipv4));
-	return TC_ACT_OK;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh
deleted file mode 100755
index 9c534dc07b36..000000000000
--- a/samples/bpf/test_tunnel_bpf.sh
+++ /dev/null
@@ -1,390 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-# In Namespace 0 (at_ns0) using native tunnel
-# Overlay IP: 10.1.1.100
-# local 192.16.1.100 remote 192.16.1.200
-# veth0 IP: 172.16.1.100, tunnel dev <type>00
-
-# Out of Namespace using BPF set/get on lwtunnel
-# Overlay IP: 10.1.1.200
-# local 172.16.1.200 remote 172.16.1.100
-# veth1 IP: 172.16.1.200, tunnel dev <type>11
-
-function config_device {
-	ip netns add at_ns0
-	ip link add veth0 type veth peer name veth1
-	ip link set veth0 netns at_ns0
-	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip link set dev veth1 up mtu 1500
-	ip addr add dev veth1 172.16.1.200/24
-}
-
-function add_gre_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-        ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE key 2 external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6gretap_tunnel {
-
-	# assign ipv6 address
-	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip addr add dev veth1 ::22/96
-	ip link set dev veth1 up
-
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
-		local ::11 remote ::22
-
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip addr add dev $DEV 10.1.1.200/24
-	ip addr add dev $DEV fc80::200/24
-	ip link set dev $DEV up
-}
-
-function add_erspan_tunnel {
-	# in namespace
-	if [ "$1" == "v1" ]; then
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200 \
-		erspan_ver 1 erspan 123
-	else
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local 172.16.1.100 remote 172.16.1.200 \
-		erspan_ver 2 erspan_dir egress erspan_hwid 3
-	fi
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6erspan_tunnel {
-
-	# assign ipv6 address
-	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
-	ip netns exec at_ns0 ip link set dev veth0 up
-	ip addr add dev veth1 ::22/96
-	ip link set dev veth1 up
-
-	# in namespace
-	if [ "$1" == "v1" ]; then
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local ::11 remote ::22 \
-		erspan_ver 1 erspan 123
-	else
-		ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE seq key 2 \
-		local ::11 remote ::22 \
-		erspan_ver 2 erspan_dir egress erspan_hwid 7
-	fi
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip addr add dev $DEV 10.1.1.200/24
-	ip link set dev $DEV up
-}
-
-function add_vxlan_tunnel {
-	# Set static ARP entry here because iptables set-mark works
-	# on L3 packet, as a result not applying to ARP packets,
-	# causing errors at get_tunnel_{key/opt}.
-
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE id 2 dstport 4789 gbp remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-	ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
-	ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external gbp dstport 4789
-	ip link set dev $DEV address 52:54:00:d9:02:00 up
-	ip addr add dev $DEV 10.1.1.200/24
-	arp -s 10.1.1.100 52:54:00:d9:01:00
-}
-
-function add_geneve_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE id 2 dstport 6081 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE dstport 6081 external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ipip_tunnel {
-	# in namespace
-	ip netns exec at_ns0 \
-		ip link add dev $DEV_NS type $TYPE local 172.16.1.100 remote 172.16.1.200
-	ip netns exec at_ns0 ip link set dev $DEV_NS up
-	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
-	# out of namespace
-	ip link add dev $DEV type $TYPE external
-	ip link set dev $DEV up
-	ip addr add dev $DEV 10.1.1.200/24
-}
-
-function setup_xfrm_tunnel {
-	auth=0x$(printf '1%.0s' {1..40})
-	enc=0x$(printf '2%.0s' {1..32})
-	spi_in_to_out=0x1
-	spi_out_to_in=0x2
-	# in namespace
-	# in -> out
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-			spi $spi_in_to_out reqid 1 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir out \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# out -> in
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-			spi $spi_out_to_in reqid 2 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir in \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip netns exec at_ns0 \
-		ip addr add dev veth0 10.1.1.100/32
-	ip netns exec at_ns0 \
-		ip route add 10.1.1.200 dev veth0 via 172.16.1.200 \
-			src 10.1.1.100
-
-	# out of namespace
-	# in -> out
-	ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-		spi $spi_in_to_out reqid 1 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir in \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# out -> in
-	ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-		spi $spi_out_to_in reqid 2 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir out \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip addr add dev veth1 10.1.1.200/32
-	ip route add 10.1.1.100 dev veth1 via 172.16.1.100 src 10.1.1.200
-}
-
-function attach_bpf {
-	DEV=$1
-	SET_TUNNEL=$2
-	GET_TUNNEL=$3
-	tc qdisc add dev $DEV clsact
-	tc filter add dev $DEV egress bpf da obj tcbpf2_kern.o sec $SET_TUNNEL
-	tc filter add dev $DEV ingress bpf da obj tcbpf2_kern.o sec $GET_TUNNEL
-}
-
-function test_gre {
-	TYPE=gretap
-	DEV_NS=gretap00
-	DEV=gretap11
-	config_device
-	add_gre_tunnel
-	attach_bpf $DEV gre_set_tunnel gre_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ip6gre {
-	TYPE=ip6gre
-	DEV_NS=ip6gre00
-	DEV=ip6gre11
-	config_device
-	# reuse the ip6gretap function
-	add_ip6gretap_tunnel
-	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
-	# underlay
-	ping6 -c 4 ::11
-	# overlay: ipv4 over ipv6
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	ping -c 1 10.1.1.100
-	# overlay: ipv6 over ipv6
-	ip netns exec at_ns0 ping6 -c 1 fc80::200
-	cleanup
-}
-
-function test_ip6gretap {
-	TYPE=ip6gretap
-	DEV_NS=ip6gretap00
-	DEV=ip6gretap11
-	config_device
-	add_ip6gretap_tunnel
-	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
-	# underlay
-	ping6 -c 4 ::11
-	# overlay: ipv4 over ipv6
-	ip netns exec at_ns0 ping -i .2 -c 1 10.1.1.200
-	ping -c 1 10.1.1.100
-	# overlay: ipv6 over ipv6
-	ip netns exec at_ns0 ping6 -c 1 fc80::200
-	cleanup
-}
-
-function test_erspan {
-	TYPE=erspan
-	DEV_NS=erspan00
-	DEV=erspan11
-	config_device
-	add_erspan_tunnel $1
-	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ip6erspan {
-	TYPE=ip6erspan
-	DEV_NS=ip6erspan00
-	DEV=ip6erspan11
-	config_device
-	add_ip6erspan_tunnel $1
-	attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
-	ping6 -c 3 ::11
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_vxlan {
-	TYPE=vxlan
-	DEV_NS=vxlan00
-	DEV=vxlan11
-	config_device
-	add_vxlan_tunnel
-	attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_geneve {
-	TYPE=geneve
-	DEV_NS=geneve00
-	DEV=geneve11
-	config_device
-	add_geneve_tunnel
-	attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	cleanup
-}
-
-function test_ipip {
-	TYPE=ipip
-	DEV_NS=ipip00
-	DEV=ipip11
-	config_device
-	tcpdump -nei veth1 &
-	cat /sys/kernel/debug/tracing/trace_pipe &
-	add_ipip_tunnel
-	ethtool -K veth1 gso off gro off rx off tx off
-	ip link set dev veth1 mtu 1500
-	attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
-	ping -c 1 10.1.1.100
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	ip netns exec at_ns0 iperf -sD -p 5200 > /dev/null
-	sleep 0.2
-	iperf -c 10.1.1.100 -n 5k -p 5200
-	cleanup
-}
-
-function test_xfrm_tunnel {
-	config_device
-        tcpdump -nei veth1 ip &
-	output=$(mktemp)
-	cat /sys/kernel/debug/tracing/trace_pipe | tee $output &
-        setup_xfrm_tunnel
-	tc qdisc add dev veth1 clsact
-	tc filter add dev veth1 proto ip ingress bpf da obj tcbpf2_kern.o \
-		sec xfrm_get_state
-	ip netns exec at_ns0 ping -c 1 10.1.1.200
-	grep "reqid 1" $output
-	grep "spi 0x1" $output
-	grep "remote ip 0xac100164" $output
-	cleanup
-}
-
-function cleanup {
-	set +ex
-	pkill iperf
-	ip netns delete at_ns0
-	ip link del veth1
-	ip link del ipip11
-	ip link del gretap11
-	ip link del ip6gre11
-	ip link del ip6gretap11
-	ip link del vxlan11
-	ip link del geneve11
-	ip link del erspan11
-	ip link del ip6erspan11
-	ip x s flush
-	ip x p flush
-	pkill tcpdump
-	pkill cat
-	set -ex
-}
-
-trap cleanup 0 2 3 6 9
-cleanup
-echo "Testing GRE tunnel..."
-test_gre
-echo "Testing IP6GRE tunnel..."
-test_ip6gre
-echo "Testing IP6GRETAP tunnel..."
-test_ip6gretap
-echo "Testing ERSPAN tunnel..."
-test_erspan v1
-test_erspan v2
-echo "Testing IP6ERSPAN tunnel..."
-test_ip6erspan v1
-test_ip6erspan v2
-echo "Testing VXLAN tunnel..."
-test_vxlan
-echo "Testing GENEVE tunnel..."
-test_geneve
-echo "Testing IPIP tunnel..."
-test_ipip
-echo "Testing IPSec tunnel..."
-test_xfrm_tunnel
-echo "*** PASS ***"
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 0c19d5e08f08..b64a7a39cbc8 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -32,7 +32,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
 	sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
-	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o
+	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -40,7 +40,8 @@ TEST_PROGS := test_kmod.sh \
 	test_xdp_redirect.sh \
 	test_xdp_meta.sh \
 	test_offload.py \
-	test_sock_addr.sh
+	test_sock_addr.sh \
+	test_tunnel.sh
 
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr
diff --git a/tools/testing/selftests/bpf/test_tunnel.sh b/tools/testing/selftests/bpf/test_tunnel.sh
new file mode 100755
index 000000000000..7fce054eaebd
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tunnel.sh
@@ -0,0 +1,651 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# End-to-end eBPF tunnel test suite
+#   The script tests BPF network tunnel implementation.
+#
+# Topology:
+# ---------
+#     root namespace   |     at_ns0 namespace
+#                      |
+#      -----------     |     -----------
+#      | tnl dev |     |     | tnl dev |  (overlay network)
+#      -----------     |     -----------
+#      metadata-mode   |     native-mode
+#       with bpf       |
+#                      |
+#      ----------      |     ----------
+#      |  veth1  | --------- |  veth0  |   (underlay network)
+#      ----------    peer    ----------
+#
+#
+# Device Configuration
+# --------------------
+# Root namespace with metadata-mode tunnel + BPF
+# Device names and addresses:
+# 	veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)
+# 	tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)
+#
+# Namespace at_ns0 with native tunnel
+# Device names and addresses:
+# 	veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
+# 	tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)
+#
+#
+# End-to-end ping packet flow
+# ---------------------------
+# Most of the tests start by namespace creation, device configuration,
+# then ping the underlay and overlay network.  When doing 'ping 10.1.1.100'
+# from root namespace, the following operations happen:
+# 1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.
+# 2) Tnl device's egress BPF program is triggered and set the tunnel metadata,
+#    with remote_ip=172.16.1.200 and others.
+# 3) Outer tunnel header is prepended and route the packet to veth1's egress
+# 4) veth0's ingress queue receive the tunneled packet at namespace at_ns0
+# 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet
+# 6) Forward the packet to the overlay tnl dev
+
+PING_ARG="-c 3 -w 10 -q"
+ret=0
+GREEN='\033[0;92m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+config_device()
+{
+	ip netns add at_ns0
+	ip link add veth0 type veth peer name veth1
+	ip link set veth0 netns at_ns0
+	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip link set dev veth1 up mtu 1500
+	ip addr add dev veth1 172.16.1.200/24
+}
+
+add_gre_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+        ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE key 2 external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6gretap_tunnel()
+{
+
+	# assign ipv6 address
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
+		local ::11 remote ::22
+
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip addr add dev $DEV fc80::200/24
+	ip link set dev $DEV up
+}
+
+add_erspan_tunnel()
+{
+	# at_ns0 namespace
+	if [ "$1" == "v1" ]; then
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200 \
+		erspan_ver 1 erspan 123
+	else
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local 172.16.1.100 remote 172.16.1.200 \
+		erspan_ver 2 erspan_dir egress erspan_hwid 3
+	fi
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6erspan_tunnel()
+{
+
+	# assign ipv6 address
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	if [ "$1" == "v1" ]; then
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local ::11 remote ::22 \
+		erspan_ver 1 erspan 123
+	else
+		ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE seq key 2 \
+		local ::11 remote ::22 \
+		erspan_ver 2 erspan_dir egress erspan_hwid 7
+	fi
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_vxlan_tunnel()
+{
+	# Set static ARP entry here because iptables set-mark works
+	# on L3 packet, as a result not applying to ARP packets,
+	# causing errors at get_tunnel_{key/opt}.
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 2 dstport 4789 gbp remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
+	ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external gbp dstport 4789
+	ip link set dev $DEV address 52:54:00:d9:02:00 up
+	ip addr add dev $DEV 10.1.1.200/24
+	arp -s 10.1.1.100 52:54:00:d9:01:00
+}
+
+add_ip6vxlan_tunnel()
+{
+	#ip netns exec at_ns0 ip -4 addr del 172.16.1.100 dev veth0
+	ip netns exec at_ns0 ip -6 addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	#ip -4 addr del 172.16.1.200 dev veth1
+	ip -6 addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 22 dstport 4789 \
+		local ::11 remote ::22
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	echo "set dev done" > /dev/kmsg
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external dstport 4789
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_geneve_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		id 2 dstport 6081 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE dstport 6081 external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6geneve_tunnel()
+{
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE id 22 \
+		remote ::22     # geneve has no local option
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+add_ipip_tunnel()
+{
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		local 172.16.1.100 remote 172.16.1.200
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip link set dev $DEV up
+	ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ipip6tnl_tunnel()
+{
+	ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip addr add dev veth1 ::22/96
+	ip link set dev veth1 up
+
+	# at_ns0 namespace
+	ip netns exec at_ns0 \
+		ip link add dev $DEV_NS type $TYPE \
+		local ::11 remote ::22
+	ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+	ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+	# root namespace
+	ip link add dev $DEV type $TYPE external
+	ip addr add dev $DEV 10.1.1.200/24
+	ip link set dev $DEV up
+}
+
+test_gre()
+{
+	TYPE=gretap
+	DEV_NS=gretap00
+	DEV=gretap11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_gre_tunnel
+	attach_bpf $DEV gre_set_tunnel gre_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+        if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gre()
+{
+	TYPE=ip6gre
+	DEV_NS=ip6gre00
+	DEV=ip6gre11
+	ret=0
+
+	check $TYPE
+	config_device
+	# reuse the ip6gretap function
+	add_ip6gretap_tunnel
+	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# overlay: ipv4 over ipv6
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	# overlay: ipv6 over ipv6
+	ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+	check_err $?
+	cleanup
+
+        if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gretap()
+{
+	TYPE=ip6gretap
+	DEV_NS=ip6gretap00
+	DEV=ip6gretap11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6gretap_tunnel
+	attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# overlay: ipv4 over ipv6
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	# overlay: ipv6 over ipv6
+	ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_erspan()
+{
+	TYPE=erspan
+	DEV_NS=erspan00
+	DEV=erspan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_erspan_tunnel $1
+	attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6erspan()
+{
+	TYPE=ip6erspan
+	DEV_NS=ip6erspan00
+	DEV=ip6erspan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6erspan_tunnel $1
+	attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
+	ping6 $PING_ARG ::11
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_vxlan()
+{
+	TYPE=vxlan
+	DEV_NS=vxlan00
+	DEV=vxlan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_vxlan_tunnel
+	attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6vxlan()
+{
+	TYPE=vxlan
+	DEV_NS=ip6vxlan00
+	DEV=ip6vxlan11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6vxlan_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ip6vxlan_set_tunnel ip6vxlan_get_tunnel
+	echo "set bpf done" > /dev/kmsg
+	# underlay
+	ping6 $PING_ARG ::11
+	echo "ping underlay" > /dev/kmsg
+	# ip4 over ip6
+	ping $PING_ARG 10.1.1.100
+	echo "ping overlay" > /dev/kmsg
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_geneve()
+{
+	TYPE=geneve
+	DEV_NS=geneve00
+	DEV=geneve11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_geneve_tunnel
+	attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6geneve()
+{
+	TYPE=geneve
+	DEV_NS=ip6geneve00
+	DEV=ip6geneve11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ip6geneve_tunnel
+	attach_bpf $DEV ip6geneve_set_tunnel ip6geneve_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_ipip()
+{
+	TYPE=ipip
+	DEV_NS=ipip00
+	DEV=ipip11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ipip_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ipip6()
+{
+	TYPE=ip6tnl
+	DEV_NS=ipip6tnl00
+	DEV=ipip6tnl11
+	ret=0
+
+	check $TYPE
+	config_device
+	add_ipip6tnl_tunnel
+	ip link set dev veth1 mtu 1500
+	attach_bpf $DEV ipip6_set_tunnel ipip6_get_tunnel
+	# underlay
+	ping6 $PING_ARG ::11
+	# ip4 over ip6
+	ping $PING_ARG 10.1.1.100
+	check_err $?
+	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+	check_err $?
+	cleanup
+
+	if [ $ret -ne 0 ]; then
+                echo -e ${RED}"FAIL: $TYPE"${NC}
+                return 1
+        fi
+        echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+attach_bpf()
+{
+	DEV=$1
+	SET=$2
+	GET=$3
+	tc qdisc add dev $DEV clsact
+	tc filter add dev $DEV egress bpf da obj test_tunnel_kern.o sec $SET
+	tc filter add dev $DEV ingress bpf da obj test_tunnel_kern.o sec $GET
+}
+
+cleanup()
+{
+	ip netns delete at_ns0 2> /dev/null
+	ip link del veth1 2> /dev/null
+	ip link del ipip11 2> /dev/null
+	ip link del ipip6tnl11 2> /dev/null
+	ip link del gretap11 2> /dev/null
+	ip link del ip6gre11 2> /dev/null
+	ip link del ip6gretap11 2> /dev/null
+	ip link del vxlan11 2> /dev/null
+	ip link del ip6vxlan11 2> /dev/null
+	ip link del geneve11 2> /dev/null
+	ip link del ip6geneve11 2> /dev/null
+	ip link del erspan11 2> /dev/null
+	ip link del ip6erspan11 2> /dev/null
+}
+
+cleanup_exit()
+{
+	echo "CATCH SIGKILL or SIGINT, cleanup and exit"
+	cleanup
+	exit 0
+}
+
+check()
+{
+   ip link help $1 2>&1 | grep -q "^Usage:"
+    if [ $? -ne 0 ];then
+        echo "SKIP $1: iproute2 not support"
+        cleanup
+        return 1
+    fi
+}
+
+enable_debug()
+{
+    echo 'file ip_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file ip6_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file vxlan.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file geneve.c +p' > /sys/kernel/debug/dynamic_debug/control
+    echo 'file ipip.c +p' > /sys/kernel/debug/dynamic_debug/control
+}
+
+check_err()
+{
+    if [ $ret -eq 0 ]; then
+        ret=$1
+    fi
+}
+
+bpf_tunnel_test()
+{
+	echo "Testing GRE tunnel..."
+	test_gre
+	echo "Testing IP6GRE tunnel..."
+	test_ip6gre
+	echo "Testing IP6GRETAP tunnel..."
+	test_ip6gretap
+	echo "Testing ERSPAN tunnel..."
+	test_erspan v2
+	echo "Testing IP6ERSPAN tunnel..."
+	test_ip6erspan v2
+	echo "Testing VXLAN tunnel..."
+	test_vxlan
+	echo "Testing IP6VXLAN tunnel..."
+	test_ip6vxlan
+	echo "Testing GENEVE tunnel..."
+	test_geneve
+	echo "Testing IP6GENEVE tunnel..."
+	test_ip6geneve
+	echo "Testing IPIP tunnel..."
+	test_ipip
+	echo "Testing IPIP6 tunnel..."
+	test_ipip6
+}
+
+trap cleanup 0 3 6
+trap cleanup_exit 2 9
+
+cleanup
+bpf_tunnel_test
+
+exit 0
diff --git a/tools/testing/selftests/bpf/test_tunnel_kern.c b/tools/testing/selftests/bpf/test_tunnel_kern.c
new file mode 100644
index 000000000000..0bf642c530fd
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tunnel_kern.c
@@ -0,0 +1,691 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2016 VMware
+ * Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <stddef.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/types.h>
+#include <linux/tcp.h>
+#include <linux/socket.h>
+#include <linux/pkt_cls.h>
+#include <linux/erspan.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define ERROR(ret) do {\
+		char fmt[] = "ERROR line:%d ret:%d\n";\
+		bpf_trace_printk(fmt, sizeof(fmt), __LINE__, ret); \
+	} while(0)
+
+int _version SEC("version") = 1;
+
+struct geneve_opt {
+	__be16	opt_class;
+	__u8	type;
+	__u8	length:5;
+	__u8	r3:1;
+	__u8	r2:1;
+	__u8	r1:1;
+	__u8	opt_data[8]; /* hard-coded to 8 byte */
+};
+
+struct vxlan_metadata {
+	__u32     gbp;
+};
+
+SEC("gre_set_tunnel")
+int _gre_set_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX | BPF_F_SEQ_NUMBER);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("gre_get_tunnel")
+int _gre_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "key %d remote ip 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), key.tunnel_id, key.remote_ipv4);
+	return TC_ACT_OK;
+}
+
+SEC("ip6gretap_set_tunnel")
+int _ip6gretap_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+	key.tunnel_label = 0xabcde;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX |
+				     BPF_F_SEQ_NUMBER);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6gretap_get_tunnel")
+int _ip6gretap_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip6 ::%x label %x\n";
+	struct bpf_tunnel_key key;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_set_tunnel")
+int _erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&md, 0, sizeof(md));
+#ifdef ERSPAN_V1
+	md.version = 1;
+	md.u.index = bpf_htonl(123);
+#else
+	__u8 direction = 1;
+	__u8 hwid = 7;
+
+	md.version = 2;
+	md.u.md2.dir = direction;
+	md.u.md2.hwid = hwid & 0xf;
+	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+#endif
+
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("erspan_get_tunnel")
+int _erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x erspan version %d\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	__u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.version);
+
+#ifdef ERSPAN_V1
+	char fmt2[] = "\tindex %x\n";
+
+	index = bpf_ntohl(md.u.index);
+	bpf_trace_printk(fmt2, sizeof(fmt2), index);
+#else
+	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
+
+	bpf_trace_printk(fmt2, sizeof(fmt2),
+			 md.u.md2.dir,
+			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+			 bpf_ntohl(md.u.md2.timestamp));
+#endif
+
+	return TC_ACT_OK;
+}
+
+SEC("ip4ip6erspan_set_tunnel")
+int _ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11);
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&md, 0, sizeof(md));
+
+#ifdef ERSPAN_V1
+	md.u.index = bpf_htonl(123);
+	md.version = 1;
+#else
+	__u8 direction = 0;
+	__u8 hwid = 17;
+
+	md.version = 2;
+	md.u.md2.dir = direction;
+	md.u.md2.hwid = hwid & 0xf;
+	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+#endif
+
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip4ip6erspan_get_tunnel")
+int _ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "ip6erspan get key %d remote ip6 ::%x erspan version %d\n";
+	struct bpf_tunnel_key key;
+	struct erspan_metadata md;
+	__u32 index;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.version);
+
+#ifdef ERSPAN_V1
+	char fmt2[] = "\tindex %x\n";
+
+	index = bpf_ntohl(md.u.index);
+	bpf_trace_printk(fmt2, sizeof(fmt2), index);
+#else
+	char fmt2[] = "\tdirection %d hwid %x timestamp %u\n";
+
+	bpf_trace_printk(fmt2, sizeof(fmt2),
+			 md.u.md2.dir,
+			 (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+			 bpf_ntohl(md.u.md2.timestamp));
+#endif
+
+	return TC_ACT_OK;
+}
+
+SEC("vxlan_set_tunnel")
+int _vxlan_set_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct vxlan_metadata md;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	md.gbp = 0x800FF; /* Set VXLAN Group Policy extension */
+	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("vxlan_get_tunnel")
+int _vxlan_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct vxlan_metadata md;
+	char fmt[] = "key %d remote ip 0x%x vxlan gbp 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &md, sizeof(md));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, md.gbp);
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6vxlan_set_tunnel")
+int _ip6vxlan_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 22;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6vxlan_get_tunnel")
+int _ip6vxlan_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip6 ::%x label %x\n";
+	struct bpf_tunnel_key key;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			 key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
+
+	return TC_ACT_OK;
+}
+
+SEC("geneve_set_tunnel")
+int _geneve_set_tunnel(struct __sk_buff *skb)
+{
+	int ret, ret2;
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	__builtin_memset(&gopt, 0x0, sizeof(gopt));
+	gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
+	gopt.type = 0x08;
+	gopt.r1 = 0;
+	gopt.r2 = 0;
+	gopt.r3 = 0;
+	gopt.length = 2; /* 4-byte multiple */
+	*(int *) &gopt.opt_data = bpf_htonl(0xdeadbeef);
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("geneve_get_tunnel")
+int _geneve_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
+	return TC_ACT_OK;
+}
+
+SEC("ip6geneve_set_tunnel")
+int _ip6geneve_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	int ret;
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_id = 22;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&gopt, 0x0, sizeof(gopt));
+	gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
+	gopt.type = 0x08;
+	gopt.r1 = 0;
+	gopt.r2 = 0;
+	gopt.r3 = 0;
+	gopt.length = 2; /* 4-byte multiple */
+	*(int *) &gopt.opt_data = bpf_htonl(0xfeedbeef);
+
+	ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6geneve_get_tunnel")
+int _ip6geneve_get_tunnel(struct __sk_buff *skb)
+{
+	char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
+	struct bpf_tunnel_key key;
+	struct geneve_opt gopt;
+	int ret;
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt),
+			key.tunnel_id, key.remote_ipv4, gopt.opt_class);
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip_set_tunnel")
+int _ipip_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.tunnel_ttl = 64;
+	if (iph->protocol == IPPROTO_ICMP) {
+		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	} else {
+		if (iph->protocol != IPPROTO_TCP || iph->ihl != 5)
+			return TC_ACT_SHOT;
+
+		if (tcp->dest == bpf_htons(5200))
+			key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+		else if (tcp->dest == bpf_htons(5201))
+			key.remote_ipv4 = 0xac100165; /* 172.16.1.101 */
+		else
+			return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip_get_tunnel")
+int _ipip_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip 0x%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), key.remote_ipv4);
+	return TC_ACT_OK;
+}
+
+SEC("ipip6_set_tunnel")
+int _ipip6_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ipip6_get_tunnel")
+int _ipip6_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip6 %x::%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+			 bpf_htonl(key.remote_ipv6[3]));
+	return TC_ACT_OK;
+}
+
+SEC("ip6ip6_set_tunnel")
+int _ip6ip6_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	void *data = (void *)(long)skb->data;
+	struct ipv6hdr *iph = data;
+	struct tcphdr *tcp = data + sizeof(*iph);
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	/* single length check */
+	if (data + sizeof(*iph) + sizeof(*tcp) > data_end) {
+		ERROR(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.remote_ipv6[0] = bpf_htonl(0x2401db00);
+	key.tunnel_ttl = 64;
+
+	if (iph->nexthdr == 58 /* NEXTHDR_ICMP */) {
+		key.remote_ipv6[3] = bpf_htonl(1);
+	} else {
+		if (iph->nexthdr != 6 /* NEXTHDR_TCP */) {
+			ERROR(iph->nexthdr);
+			return TC_ACT_SHOT;
+		}
+
+		if (tcp->dest == bpf_htons(5200)) {
+			key.remote_ipv6[3] = bpf_htonl(1);
+		} else if (tcp->dest == bpf_htons(5201)) {
+			key.remote_ipv6[3] = bpf_htonl(2);
+		} else {
+			ERROR(tcp->dest);
+			return TC_ACT_SHOT;
+		}
+	}
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("ip6ip6_get_tunnel")
+int _ip6ip6_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	char fmt[] = "remote ip6 %x::%x\n";
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+	if (ret < 0) {
+		ERROR(ret);
+		return TC_ACT_SHOT;
+	}
+
+	bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+			 bpf_htonl(key.remote_ipv6[3]));
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] net: net_cls: remove a NULL check for css_cls_state
From: David Miller @ 2018-04-25 15:00 UTC (permalink / raw)
  To: roy.qing.li; +Cc: lirongqing, netdev
In-Reply-To: <CAJFZqHxzNFzg8hBMWc3DS3Nn1hdVxQ9cBhPSy4uL2b6ckCmbtg@mail.gmail.com>

From: Li RongQing <roy.qing.li@gmail.com>
Date: Wed, 25 Apr 2018 16:35:02 +0800

> On 4/20/18, David Miller <davem@davemloft.net> wrote:
>> From: Li RongQing <lirongqing@baidu.com>
>> Date: Thu, 19 Apr 2018 12:59:21 +0800
>>
>>> The input of css_cls_state() is impossible to NULL except
>>> cgrp_css_online, so simplify it
>>>
>>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
>>
>> I don't view this as an improvement.  Just let the helper always check
>> NULL and that way there are less situations to audit.
>>
> css_cls_state maybe return NULL, but nearly no places check the return
> value with NULL, this seems unreadable.

I saw this email of your's the first few times, you don't need to post
it again.

I still disagree with your analysis, and I do not see your change as
an overall improvement.

Thank you.

^ permalink raw reply

* Re: [PATCH v4 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()
From: Stefano Brivio @ 2018-04-25 14:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andreas Christoforou, kernel-hardening, Steffen Klassert,
	Herbert Xu, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	netdev, linux-kernel
In-Reply-To: <20180425144639.GA38350@beast>

On Wed, 25 Apr 2018 07:46:39 -0700
Kees Cook <keescook@chromium.org> wrote:

> In the quest to remove all stack VLA usage removed from the kernel[1],
> just use XFRM_MAX_DEPTH as already done for the "class" array. In one
> case, it'll do this loop up to 5, the other caller up to 6.
> 
> [1] https://lkml.org/lkml/2018/3/7/621
> 
> Co-developed-by: Andreas Christoforou <andreaschristofo@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v4:
> - actually remove memset(). :)
> v3:
> - adjust Subject and commit log (Steffen)
> - use "= { }" instead of memset() (Stefano)
> v2:
> - use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
> ---

Acked-by: Stefano Brivio <sbrivio@redhat.com>

^ permalink raw reply

* Re: [PATCH iproute2 v2] json_print: Fix hidden 64-bit type promotion
From: Toke Høiland-Jørgensen @ 2018-04-25 14:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Kevin Darbyshire-Bryant
In-Reply-To: <20180425075527.680f34fc@xeon-e3>

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Wed, 25 Apr 2018 16:30:22 +0200
> Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
>> print_uint() will silently promote its variable type to uint64_t, but there
>> is nothing that ensures that the format string specifier passed along with
>> it fits (and the function name suggest to pass "%u").
>> 
>> Fix this by changing print_uint() to use a native 'unsigned int' type, and
>> introduce a separate print_u64() function for printing 64-bit values. All
>> call sites that were actually printing 64-bit values using print_uint() are
>> converted to use print_u64() instead.
>> 
>> Since print_int() was already using native int types, just add a
>> print_s64() to match, but don't convert any call sites.
>> 
>> Cc: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> Yes, this makes sense. Maybe there should be a print_luint for
> consistency.

I just realised I missed a few call sites, so I'll resend. Should I
call the new function print_luint() instead of print_u64()?

> Also, I tried (in vain) to make a version that allows GCC to check the
> format string.  But it was a struggle and just gave up.

Yeah, no idea how to do this either...

-Toke

^ permalink raw reply

* [README] Reiteration of statement regarding NetDev conference and NetDev Society
From: David Miller @ 2018-04-25 14:57 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel


As the Linux networking maintainer, I feel the need to reiterate my
statement of a month ago, and to provide some supporting facts on my
end so that there is no confusion in the community on these issues.

I am no longer associated with either the Netdev Society or their
event, the NetDev conference.

Neither is endorsed nor supported by me.

I am writing to provide clarity on this situation to the subscribers
of the 'netdev' mailing list and the community at large.

Specifically, I am writing to make sure there isn't any confusion
through the use of the name "NetDev" that I either continue to plan,
support or participate in this event.  I do not.

I have in the past participated in the NetDev events as a member of
technical committee and as the committee's chair.  I took on this role
because I felt that I could improve the technical content.

I spent many hours helping to plan and make preparations for Netdev
2.2 in Seoul.  In connection with this work, the Executive Director at
the time (Soyoung Park) and I were reimbursed for our airfare
($7321.44 USD) and lodging ($717.55 USD) as we prepared for the
conference during the weeks leading up to the event.

I was also a member of the NetDev Society's board but have formally
left the board and the technical committee, am no longer associated
with planning the NetDev conference and do not intend to participate
in future NetDev conference events.

Having said all of this, I look forward to continued great work on the
netdev project and sharing cool advances in networking at the
networking track of Linux Plumbers Conference.

Thank you.

^ permalink raw reply

* Re: [PATCH iproute2 v2] json_print: Fix hidden 64-bit type promotion
From: Stephen Hemminger @ 2018-04-25 14:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, Kevin Darbyshire-Bryant
In-Reply-To: <20180425143022.6986-1-toke@toke.dk>

On Wed, 25 Apr 2018 16:30:22 +0200
Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> print_uint() will silently promote its variable type to uint64_t, but there
> is nothing that ensures that the format string specifier passed along with
> it fits (and the function name suggest to pass "%u").
> 
> Fix this by changing print_uint() to use a native 'unsigned int' type, and
> introduce a separate print_u64() function for printing 64-bit values. All
> call sites that were actually printing 64-bit values using print_uint() are
> converted to use print_u64() instead.
> 
> Since print_int() was already using native int types, just add a
> print_s64() to match, but don't convert any call sites.
> 
> Cc: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>

Yes, this makes sense. Maybe there should be a print_luint for consistency.
Also, I tried (in vain) to make a version that allows GCC to check the
format string.  But it was a struggle and just gave up.

^ permalink raw reply

* Re: [PATCH 1/2] net: can: sja1000: Replace mdelay with usleep_range in peak_pci_probe
From: Marc Kleine-Budde @ 2018-04-25 14:53 UTC (permalink / raw)
  To: Jia-Ju Bai, wg; +Cc: linux-can, netdev, linux-kernel
In-Reply-To: <1523410978-1834-1-git-send-email-baijiaju1990@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 856 bytes --]

On 04/11/2018 03:42 AM, Jia-Ju Bai wrote:
> peak_pci_probe() is never called in atomic context.
> 
> peak_pci_probe() is set as ".probe" in struct pci_driver.
> 
> Despite never getting called from atomic context, peak_pci_probe()
> calls mdelay() to busily wait.
> This is not necessary and can be replaced with usleep_range() to
> avoid busy waiting.
> 
> This is found by a static analysis tool named DCNS written by myself.
> And I also manually check it.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>

Applied both to linux-can-next.

tnx,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
From: Eric Dumazet @ 2018-04-25 14:52 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, netdev; +Cc: cake, Dave Taht
In-Reply-To: <20180425134249.21300-1-toke@toke.dk>



On 04/25/2018 06:42 AM, Toke Høiland-Jørgensen wrote:
> sch_cake targets the home router use case and is intended to squeeze the
> most bandwidth and latency out of even the slowest ISP links and routers,
> while presenting an API simple enough that even an ISP can configure it.
> 

* Support for ack filtering.

Oh my god. Cake became a monster.

syzkaller will be very happy to trigger all kind of bugs in it.

Lack of any pskb_may_pull() is really concerning.

How ack filter deals with reorders ?

Also the forced GSO segmentation looks wrong to me.

This kills xmit_more gain we have when GSO is performed after
qdisc dequeue before hitting device.

This should be driven by a parameter really, some threshold on the skb size.

What performance number do you get on a 10Gbit NIC for example ?

Also, how ack filter can suppress packets after skb_gso_segment() ?

+		segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+		if (IS_ERR_OR_NULL(segs))
+			return qdisc_drop(skb, sch, to_free);
+
+		while (segs) {
+			nskb = segs->next;
+			segs->next = NULL;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			cobalt_set_enqueue_time(segs, now);
+			get_cobalt_cb(segs)->adjusted_len = cake_overhead(q,
+									  segs);
+			flow_queue_add(flow, segs);
+
+			if (q->ack_filter)
+				ack = cake_ack_filter(q, flow);
+

All the following must be dead code, right ???

+			if (ack) {
+				b->ack_drops++;
+				sch->qstats.drops++;
+				b->bytes += ack->len;
+				slen += segs->len - ack->len;
+				q->buffer_used += segs->truesize -
+					ack->truesize;
+				if (q->rate_flags & CAKE_FLAG_INGRESS)
+					cake_advance_shaper(q, b, ack,
+							    now, true);
+
+				qdisc_tree_reduce_backlog(sch, 1,
+							  qdisc_pkt_len(ack));
+				consume_skb(ack);
+			} else {

^ permalink raw reply

* [PATCH v4 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()
From: Kees Cook @ 2018-04-25 14:46 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Andreas Christoforou, kernel-hardening, Steffen Klassert,
	Herbert Xu, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	netdev, linux-kernel

In the quest to remove all stack VLA usage removed from the kernel[1],
just use XFRM_MAX_DEPTH as already done for the "class" array. In one
case, it'll do this loop up to 5, the other caller up to 6.

[1] https://lkml.org/lkml/2018/3/7/621

Co-developed-by: Andreas Christoforou <andreaschristofo@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v4:
- actually remove memset(). :)
v3:
- adjust Subject and commit log (Steffen)
- use "= { }" instead of memset() (Stefano)
v2:
- use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
---
 net/ipv6/xfrm6_state.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index 16f434791763..5bdca3d5d6b7 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -60,11 +60,9 @@ xfrm6_init_temprop(struct xfrm_state *x, const struct xfrm_tmpl *tmpl,
 static int
 __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void *p), int maxclass)
 {
-	int i;
+	int count[XFRM_MAX_DEPTH] = { };
 	int class[XFRM_MAX_DEPTH];
-	int count[maxclass];
-
-	memset(count, 0, sizeof(count));
+	int i;
 
 	for (i = 0; i < n; i++) {
 		int c;
-- 
2.7.4


-- 
Kees Cook
Pixel Security

^ permalink raw reply related

* Re: Repeating "unregister_netdevice: waiting for lo to become free" caused by upstream 76da0704507bb ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")
From: Rafał Miłecki @ 2018-04-25 14:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov, WANG Cong, David S. Miller,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, Network Development, jeffy,
	David Ahern
  Cc: Greg Kroah-Hartman, Stable, Dan Streetman, Dan Streetman,
	Mathias Tillman
In-Reply-To: <5e44211e-7056-96ee-0d12-4f6dc8b22734@yandex-team.ru>

On 25.04.2018 16:30, Konstantin Khlebnikov wrote:
> On 25.04.2018 17:16, Rafał Miłecki wrote:
>> On 23.04.2018 15:08, Rafał Miłecki wrote:
>>> I've just updated my kernel 4.4.x and noticed a regression. Bisecting
>>> pointed me to the commit 2417da3f4d6bc ("ipv6: only call
>>> ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0] which is
>>> backport of upstream 76da0704507bb. That backported commit has
>>> appeared in a 4.4.103.
>>>
>>> I use OpenWrt/LEDE [1] distribution and LXC [2] 1.1.5. After stopping
>>> a container I start getting these messages:
>>> [  229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>> [  239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>> [  249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>> (...)
>>>
>>> Trying to start LXC nevertheless results in lxc-start command hang
>>> around network configuration. Trying to query LXC state afterwards
>>> results in a lxc-info command hang too.
>>>
>>> I tried Googling for this issue and found similar reports:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729637
>>> https://github.com/fnproject/fn/issues/686
>>> https://lime-technology.com/forums/topic/66863-kernelunregister_netdevice-waiting-for-lo-to-become-free-usage-count-1/
>>> all of them related to the Docker, which is probably a similar use
>>> case to the LXC.
>>>
>>> I couldn't find any reference to commit 76da0704507bb that could
>>> suggest fixing the problem I'm seeing.
>>>
>>> Does anyone have an idea what is the issue I'm seeing about? Or even
>>> better, how to fix it? Can I provide any additional info that would
>>> help?
>>>
>>>
>>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=2417da3f4d6bc4fc6c77f613f0e2264090892aa5
>>> [1] https://openwrt.org/
>>> [2] https://linuxcontainers.org/
>>
>> Today I tried 4.14.34 to see if that helps. Unfortunately it doesn't. I
>> still experience the same problem.
>>
>>  From reading various reports regarding that "unregister_netdevice:
>> waiting for lo to become free" message it appears the problem is caused
>> by a leaking dst refcnt somewhere in the kernel code.
>>
>> I found links to few commit fixing leaks at various places:
>> 4a31a6b19f9dd ("sctp: fix dst refcnt leak in sctp_v4_get_dst")
>> 957d761cf91cd ("sctp: fix dst refcnt leak in sctp_v6_get_dst()")
>> 4ee806d51176b ("net: tcp: close sock if net namespace is exiting")
>> d747a7a51b009 ("tcp: reset sk_rx_dst in tcp_disconnect()")
>> 751eb6b6042a5 ("ipv6: addrconf: fix dev refcont leak when DAD failed")
>>
>> All above patches are present in the linux-v4.4.y and are part of kernel
>> 4.4.124 I use. So it seems I'm facing yet another dst refcnt leak.
>>
>> Could commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once
>> for NETDEV_UNREGISTER") introduce a new dst refcnt leak? Or does it only
>> expost existing one?
> 
> Mathias Tillman reported this as "4.4.103 linux kernel regression".
> Last message in that thread (which I couldn't find in mailing list archives) had:
> | As it turns out, it's due to a patch in the Turris Omnia/OpenWRT code that adds a in6_dev_get call without calling in6_dev_put.

Wow, this is very helpful, thank you!

Somehow I didn't even think about OpenWrt downstream patches. Too bad
this wasn't reported to the OpenWrt community, I spent 2 days on this.
There is indeed:
target/linux/generic/patches-4.4/670-ipv6-allow-rejecting-with-source-address-failed-policy.patch
[PATCH 1/2] ipv6: allow rejecting with "source address failed policy"

I'll move this issue discussion to the OpenWrt/LEDE now, I hope we can
sort it out.

^ permalink raw reply

* Re: [PATCH v3] ath9k: dfs: Remove VLA usage
From: Kees Cook @ 2018-04-25 14:42 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Andreas Christoforou, Rosen Penev, Eric Dumazet, Joe Perches,
	linux-wireless, Network Development, QCA ath9k Development,
	Kernel Hardening, LKML
In-Reply-To: <87po2nagrk.fsf@kamboji.qca.qualcomm.com>

On Wed, Apr 25, 2018 at 12:26 AM, Kalle Valo <kvalo@codeaurora.org> wrote:
> I have already applied an almost identical patch:
>
> ath9k: dfs: remove accidental use of stack VLA
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath-next&id=9c27489a34548913baaaf3b2776e05d4a9389e3e

Ah! Cool, no worries. I didn't see that in linux-next yet. :)

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply

* Re: Repeating "unregister_netdevice: waiting for lo to become free" caused by upstream 76da0704507bb ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")
From: Konstantin Khlebnikov @ 2018-04-25 14:30 UTC (permalink / raw)
  To: Rafał Miłecki, WANG Cong, David S. Miller,
	Alexey Kuznetsov, Hideaki YOSHIFUJI, Network Development, jeffy,
	David Ahern
  Cc: Greg Kroah-Hartman, Stable, Dan Streetman, Dan Streetman,
	Mathias Tillman
In-Reply-To: <228a9486-06af-2cbb-d4b9-677642f2d754@gmail.com>


On 25.04.2018 17:16, Rafał Miłecki wrote:
> On 23.04.2018 15:08, Rafał Miłecki wrote:
>> I've just updated my kernel 4.4.x and noticed a regression. Bisecting
>> pointed me to the commit 2417da3f4d6bc ("ipv6: only call
>> ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0] which is
>> backport of upstream 76da0704507bb. That backported commit has
>> appeared in a 4.4.103.
>>
>> I use OpenWrt/LEDE [1] distribution and LXC [2] 1.1.5. After stopping
>> a container I start getting these messages:
>> [  229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
>> [  239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
>> [  249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
>> (...)
>>
>> Trying to start LXC nevertheless results in lxc-start command hang
>> around network configuration. Trying to query LXC state afterwards
>> results in a lxc-info command hang too.
>>
>> I tried Googling for this issue and found similar reports:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729637
>> https://github.com/fnproject/fn/issues/686
>> https://lime-technology.com/forums/topic/66863-kernelunregister_netdevice-waiting-for-lo-to-become-free-usage-count-1/
>> all of them related to the Docker, which is probably a similar use
>> case to the LXC.
>>
>> I couldn't find any reference to commit 76da0704507bb that could
>> suggest fixing the problem I'm seeing.
>>
>> Does anyone have an idea what is the issue I'm seeing about? Or even
>> better, how to fix it? Can I provide any additional info that would
>> help?
>>
>>
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=2417da3f4d6bc4fc6c77f613f0e2264090892aa5
>> [1] https://openwrt.org/
>> [2] https://linuxcontainers.org/
> 
> Today I tried 4.14.34 to see if that helps. Unfortunately it doesn't. I
> still experience the same problem.
> 
>  From reading various reports regarding that "unregister_netdevice:
> waiting for lo to become free" message it appears the problem is caused
> by a leaking dst refcnt somewhere in the kernel code.
> 
> I found links to few commit fixing leaks at various places:
> 4a31a6b19f9dd ("sctp: fix dst refcnt leak in sctp_v4_get_dst")
> 957d761cf91cd ("sctp: fix dst refcnt leak in sctp_v6_get_dst()")
> 4ee806d51176b ("net: tcp: close sock if net namespace is exiting")
> d747a7a51b009 ("tcp: reset sk_rx_dst in tcp_disconnect()")
> 751eb6b6042a5 ("ipv6: addrconf: fix dev refcont leak when DAD failed")
> 
> All above patches are present in the linux-v4.4.y and are part of kernel
> 4.4.124 I use. So it seems I'm facing yet another dst refcnt leak.
> 
> Could commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once
> for NETDEV_UNREGISTER") introduce a new dst refcnt leak? Or does it only
> expost existing one?

Mathias Tillman reported this as "4.4.103 linux kernel regression".
Last message in that thread (which I couldn't find in mailing list archives) had:
| As it turns out, it's due to a patch in the Turris Omnia/OpenWRT code that adds a in6_dev_get call without calling in6_dev_put.

^ permalink raw reply

* [PATCH iproute2 v2] json_print: Fix hidden 64-bit type promotion
From: Toke Høiland-Jørgensen @ 2018-04-25 14:30 UTC (permalink / raw)
  To: netdev; +Cc: Toke Høiland-Jørgensen, Kevin Darbyshire-Bryant
In-Reply-To: <20180329192220.30101-1-ldir@darbyshire-bryant.me.uk>

print_uint() will silently promote its variable type to uint64_t, but there
is nothing that ensures that the format string specifier passed along with
it fits (and the function name suggest to pass "%u").

Fix this by changing print_uint() to use a native 'unsigned int' type, and
introduce a separate print_u64() function for printing 64-bit values. All
call sites that were actually printing 64-bit values using print_uint() are
converted to use print_u64() instead.

Since print_int() was already using native int types, just add a
print_s64() to match, but don't convert any call sites.

Cc: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
Changes since v1 (which was sent by Kevin Darbyshire-Bryant):
  - Add print_u64() and convert call sites that were actually passing
    64-bit values to print_uint().
  
 include/json_print.h  |  4 ++-
 include/json_writer.h | 12 ++++++---
 ip/ipaddress.c        | 62 +++++++++++++++++++++----------------------
 ip/ipmacsec.c         |  8 +++---
 ip/ipmroute.c         |  6 ++---
 lib/json_print.c      |  4 ++-
 lib/json_writer.c     | 30 ++++++++++++++++++---
 7 files changed, 78 insertions(+), 48 deletions(-)

diff --git a/include/json_print.h b/include/json_print.h
index 2ca7830a..4677618e 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -56,10 +56,12 @@ void close_json_array(enum output_type type, const char *delim);
 		print_color_##type_name(t, COLOR_NONE, key, fmt, value);	\
 	}
 _PRINT_FUNC(int, int);
+_PRINT_FUNC(s64, int64_t);
 _PRINT_FUNC(bool, bool);
 _PRINT_FUNC(null, const char*);
 _PRINT_FUNC(string, const char*);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(uint, unsigned int);
+_PRINT_FUNC(u64, uint64_t);
 _PRINT_FUNC(hu, unsigned short);
 _PRINT_FUNC(hex, unsigned int);
 _PRINT_FUNC(0xhex, unsigned int);
diff --git a/include/json_writer.h b/include/json_writer.h
index 4b4dec28..9d3f37f8 100644
--- a/include/json_writer.h
+++ b/include/json_writer.h
@@ -34,10 +34,12 @@ void jsonw_string(json_writer_t *self, const char *value);
 void jsonw_bool(json_writer_t *self, bool value);
 void jsonw_float(json_writer_t *self, double number);
 void jsonw_float_fmt(json_writer_t *self, const char *fmt, double num);
-void jsonw_uint(json_writer_t *self, uint64_t number);
+void jsonw_uint(json_writer_t *self, unsigned int number);
+void jsonw_u64(json_writer_t *self, uint64_t number);
 void jsonw_xint(json_writer_t *self, uint64_t number);
 void jsonw_hu(json_writer_t *self, unsigned short number);
-void jsonw_int(json_writer_t *self, int64_t number);
+void jsonw_int(json_writer_t *self, int number);
+void jsonw_s64(json_writer_t *self, int64_t number);
 void jsonw_null(json_writer_t *self);
 void jsonw_lluint(json_writer_t *self, unsigned long long int num);
 
@@ -45,10 +47,12 @@ void jsonw_lluint(json_writer_t *self, unsigned long long int num);
 void jsonw_string_field(json_writer_t *self, const char *prop, const char *val);
 void jsonw_bool_field(json_writer_t *self, const char *prop, bool value);
 void jsonw_float_field(json_writer_t *self, const char *prop, double num);
-void jsonw_uint_field(json_writer_t *self, const char *prop, uint64_t num);
+void jsonw_uint_field(json_writer_t *self, const char *prop, unsigned int num);
+void jsonw_u64_field(json_writer_t *self, const char *prop, uint64_t num);
 void jsonw_xint_field(json_writer_t *self, const char *prop, uint64_t num);
 void jsonw_hu_field(json_writer_t *self, const char *prop, unsigned short num);
-void jsonw_int_field(json_writer_t *self, const char *prop, int64_t num);
+void jsonw_int_field(json_writer_t *self, const char *prop, int num);
+void jsonw_s64_field(json_writer_t *self, const char *prop, int64_t num);
 void jsonw_null_field(json_writer_t *self, const char *prop);
 void jsonw_lluint_field(json_writer_t *self, const char *prop,
 			unsigned long long int num);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index aecc9a1d..75539e05 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -554,21 +554,21 @@ static void print_vf_stats64(FILE *fp, struct rtattr *vfstats)
 
 		/* RX stats */
 		open_json_object("rx");
-		print_uint(PRINT_JSON, "bytes", NULL,
+		print_u64(PRINT_JSON, "bytes", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
-		print_uint(PRINT_JSON, "packets", NULL,
+		print_u64(PRINT_JSON, "packets", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
-		print_uint(PRINT_JSON, "multicast", NULL,
+		print_u64(PRINT_JSON, "multicast", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
-		print_uint(PRINT_JSON, "broadcast", NULL,
+		print_u64(PRINT_JSON, "broadcast", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_BROADCAST]));
 		close_json_object();
 
 		/* TX stats */
 		open_json_object("tx");
-		print_uint(PRINT_JSON, "tx_bytes", NULL,
+		print_u64(PRINT_JSON, "tx_bytes", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
-		print_uint(PRINT_JSON, "tx_packets", NULL,
+		print_u64(PRINT_JSON, "tx_packets", NULL,
 			   rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
 		close_json_object();
 		close_json_object();
@@ -608,69 +608,69 @@ static void __print_link_stats(FILE *fp, struct rtattr *tb[])
 
 		/* RX stats */
 		open_json_object("rx");
-		print_uint(PRINT_JSON, "bytes", NULL, s->rx_bytes);
-		print_uint(PRINT_JSON, "packets", NULL, s->rx_packets);
-		print_uint(PRINT_JSON, "errors", NULL, s->rx_errors);
-		print_uint(PRINT_JSON, "dropped", NULL, s->rx_dropped);
-		print_uint(PRINT_JSON, "over_errors", NULL, s->rx_over_errors);
-		print_uint(PRINT_JSON, "multicast", NULL, s->multicast);
+		print_u64(PRINT_JSON, "bytes", NULL, s->rx_bytes);
+		print_u64(PRINT_JSON, "packets", NULL, s->rx_packets);
+		print_u64(PRINT_JSON, "errors", NULL, s->rx_errors);
+		print_u64(PRINT_JSON, "dropped", NULL, s->rx_dropped);
+		print_u64(PRINT_JSON, "over_errors", NULL, s->rx_over_errors);
+		print_u64(PRINT_JSON, "multicast", NULL, s->multicast);
 		if (s->rx_compressed)
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "compressed", NULL, s->rx_compressed);
 
 		/* RX error stats */
 		if (show_stats > 1) {
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "length_errors",
 				   NULL, s->rx_length_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "crc_errors",
 				   NULL, s->rx_crc_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "frame_errors",
 				   NULL, s->rx_frame_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "fifo_errors",
 				   NULL, s->rx_fifo_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "missed_errors",
 				   NULL, s->rx_missed_errors);
 			if (s->rx_nohandler)
-				print_uint(PRINT_JSON,
+				print_u64(PRINT_JSON,
 					   "nohandler", NULL, s->rx_nohandler);
 		}
 		close_json_object();
 
 		/* TX stats */
 		open_json_object("tx");
-		print_uint(PRINT_JSON, "bytes", NULL, s->tx_bytes);
-		print_uint(PRINT_JSON, "packets", NULL, s->tx_packets);
-		print_uint(PRINT_JSON, "errors", NULL, s->tx_errors);
-		print_uint(PRINT_JSON, "dropped", NULL, s->tx_dropped);
-		print_uint(PRINT_JSON,
+		print_u64(PRINT_JSON, "bytes", NULL, s->tx_bytes);
+		print_u64(PRINT_JSON, "packets", NULL, s->tx_packets);
+		print_u64(PRINT_JSON, "errors", NULL, s->tx_errors);
+		print_u64(PRINT_JSON, "dropped", NULL, s->tx_dropped);
+		print_u64(PRINT_JSON,
 			   "carrier_errors",
 			   NULL, s->tx_carrier_errors);
-		print_uint(PRINT_JSON, "collisions", NULL, s->collisions);
+		print_u64(PRINT_JSON, "collisions", NULL, s->collisions);
 		if (s->tx_compressed)
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "compressed", NULL, s->tx_compressed);
 
 		/* TX error stats */
 		if (show_stats > 1) {
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "aborted_errors",
 				   NULL, s->tx_aborted_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "fifo_errors",
 				   NULL, s->tx_fifo_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "window_errors",
 				   NULL, s->tx_window_errors);
-			print_uint(PRINT_JSON,
+			print_u64(PRINT_JSON,
 				   "heartbeat_errors",
 				   NULL, s->tx_heartbeat_errors);
 			if (carrier_changes)
-				print_uint(PRINT_JSON, "carrier_changes", NULL,
+				print_u64(PRINT_JSON, "carrier_changes", NULL,
 					   rta_getattr_u32(carrier_changes));
 		}
 
diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index 38ec7136..4e4e158e 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -640,7 +640,7 @@ static void print_attrs(struct rtattr *attrs[])
 	}
 }
 
-static __u64 getattr_uint(struct rtattr *stat)
+static __u64 getattr_u64(struct rtattr *stat)
 {
 	switch (RTA_PAYLOAD(stat)) {
 	case sizeof(__u64):
@@ -681,7 +681,7 @@ static void print_fp_stats(const char *prefix,
 
 		pad = strlen(names[i]) + 1;
 		if (stats[i])
-			printf("%*llu", pad, getattr_uint(stats[i]));
+			printf("%*llu", pad, getattr_u64(stats[i]));
 		else
 			printf("%*c", pad, '-');
 	}
@@ -697,8 +697,8 @@ static void print_json_stats(const char *names[], unsigned int num,
 		if (!names[i] || !stats[i])
 			continue;
 
-		print_uint(PRINT_JSON, names[i],
-			   NULL, getattr_uint(stats[i]));
+		print_u64(PRINT_JSON, names[i],
+			   NULL, getattr_u64(stats[i]));
 	}
 }
 
diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index 59c5b771..cdb4d898 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -182,12 +182,12 @@ int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		struct rta_mfc_stats *mfcs = RTA_DATA(tb[RTA_MFC_STATS]);
 
 		print_string(PRINT_FP, NULL, "%s", _SL_);
-		print_uint(PRINT_ANY, "packets", "  %"PRIu64" packets,",
+		print_u64(PRINT_ANY, "packets", "  %"PRIu64" packets,",
 			   mfcs->mfcs_packets);
-		print_uint(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
+		print_u64(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
 
 		if (mfcs->mfcs_wrong_if)
-			print_uint(PRINT_ANY, "wrong_if",
+			print_u64(PRINT_ANY, "wrong_if",
 				   ", %"PRIu64" arrived on wrong iif.",
 				   mfcs->mfcs_wrong_if);
 	}
diff --git a/lib/json_print.c b/lib/json_print.c
index bda72933..7a1cfa57 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -116,8 +116,10 @@ void close_json_array(enum output_type type, const char *str)
 		}							\
 	}
 _PRINT_FUNC(int, int);
+_PRINT_FUNC(s64, int64_t);
 _PRINT_FUNC(hu, unsigned short);
-_PRINT_FUNC(uint, uint64_t);
+_PRINT_FUNC(uint, unsigned int);
+_PRINT_FUNC(u64, uint64_t);
 _PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
 #undef _PRINT_FUNC
diff --git a/lib/json_writer.c b/lib/json_writer.c
index 0ad04218..dc2fdd49 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -220,7 +220,12 @@ void jsonw_hu(json_writer_t *self, unsigned short num)
 	jsonw_printf(self, "%hu", num);
 }
 
-void jsonw_uint(json_writer_t *self, uint64_t num)
+void jsonw_uint(json_writer_t *self, unsigned int num)
+{
+	jsonw_printf(self, "%u", num);
+}
+
+void jsonw_u64(json_writer_t *self, uint64_t num)
 {
 	jsonw_printf(self, "%"PRIu64, num);
 }
@@ -235,7 +240,12 @@ void jsonw_lluint(json_writer_t *self, unsigned long long int num)
 	jsonw_printf(self, "%llu", num);
 }
 
-void jsonw_int(json_writer_t *self, int64_t num)
+void jsonw_int(json_writer_t *self, int num)
+{
+	jsonw_printf(self, "%d", num);
+}
+
+void jsonw_s64(json_writer_t *self, int64_t num)
 {
 	jsonw_printf(self, "%"PRId64, num);
 }
@@ -268,12 +278,18 @@ void jsonw_float_field_fmt(json_writer_t *self,
 	jsonw_float_fmt(self, fmt, val);
 }
 
-void jsonw_uint_field(json_writer_t *self, const char *prop, uint64_t num)
+void jsonw_uint_field(json_writer_t *self, const char *prop, unsigned int num)
 {
 	jsonw_name(self, prop);
 	jsonw_uint(self, num);
 }
 
+void jsonw_u64_field(json_writer_t *self, const char *prop, uint64_t num)
+{
+	jsonw_name(self, prop);
+	jsonw_u64(self, num);
+}
+
 void jsonw_xint_field(json_writer_t *self, const char *prop, uint64_t num)
 {
 	jsonw_name(self, prop);
@@ -294,12 +310,18 @@ void jsonw_lluint_field(json_writer_t *self,
 	jsonw_lluint(self, num);
 }
 
-void jsonw_int_field(json_writer_t *self, const char *prop, int64_t num)
+void jsonw_int_field(json_writer_t *self, const char *prop, int num)
 {
 	jsonw_name(self, prop);
 	jsonw_int(self, num);
 }
 
+void jsonw_s64_field(json_writer_t *self, const char *prop, int64_t num)
+{
+	jsonw_name(self, prop);
+	jsonw_s64(self, num);
+}
+
 void jsonw_null_field(json_writer_t *self, const char *prop)
 {
 	jsonw_name(self, prop);
-- 
2.17.0

^ permalink raw reply related

* [PATCH] net: support compat 64-bit time in {s,g}etsockopt
From: Lance Richardson @ 2018-04-25 14:21 UTC (permalink / raw)
  To: netdev; +Cc: gopalsr83, vapier, hjl.tools

For the x32 ABI, struct timeval has two 64-bit fields. However
the kernel currently interprets the user-space values used for
the SO_RCVTIMEO and SO_SNDTIMEO socket options as having a pair
of 32-bit fields.

When the seconds portion of the requested timeout is less than 2**32,
the seconds portion of the effective timeout is correct but the
microseconds portion is zero.  When the seconds portion of the
requested timeout is zero and the microseconds portion is non-zero,
the kernel interprets the timeout as zero (never timeout).

Fix by using 64-bit time for SO_RCVTIMEO/SO_SNDTIMEO as required
for the ABI.

The code included below demonstrates the problem.

Results before patch:
    $ gcc -m64 -Wall -O2 -o socktmo socktmo.c && ./socktmo
    recv time: 2.008181 seconds
    send time: 2.015985 seconds

    $ gcc -m32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
    recv time: 2.016763 seconds
    send time: 2.016062 seconds

    $ gcc -mx32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
    recv time: 1.007239 seconds
    send time: 1.023890 seconds

Results after patch:
    $ gcc -m64 -O2 -Wall -o socktmo socktmo.c && ./socktmo
    recv time: 2.010062 seconds
    send time: 2.015836 seconds

    $ gcc -m32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
    recv time: 2.013974 seconds
    send time: 2.015981 seconds

    $ gcc -mx32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
    recv time: 2.030257 seconds
    send time: 2.013383 seconds

 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/socket.h>
 #include <sys/types.h>
 #include <sys/time.h>

 void checkrc(char *str, int rc)
 {
         if (rc >= 0)
                 return;

         perror(str);
         exit(1);
 }

 static char buf[1024];
 int main(int argc, char **argv)
 {
         int rc;
         int socks[2];
         struct timeval tv;
         struct timeval start, end, delta;

         rc = socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
         checkrc("socketpair", rc);

         /* set timeout to 1.999999 seconds */
         tv.tv_sec = 1;
         tv.tv_usec = 999999;
         rc = setsockopt(socks[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
         rc = setsockopt(socks[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
         checkrc("setsockopt", rc);

         /* measure actual receive timeout */
         gettimeofday(&start, NULL);
         rc = recv(socks[0], buf, sizeof buf, 0);
         gettimeofday(&end, NULL);
         timersub(&end, &start, &delta);

         printf("recv time: %ld.%06ld seconds\n",
                (long)delta.tv_sec, (long)delta.tv_usec);

         /* fill send buffer */
         do {
                 rc = send(socks[0], buf, sizeof buf, 0);
         } while (rc > 0);

         /* measure actual send timeout */
         gettimeofday(&start, NULL);
         rc = send(socks[0], buf, sizeof buf, 0);
         gettimeofday(&end, NULL);
         timersub(&end, &start, &delta);

         printf("send time: %ld.%06ld seconds\n",
                (long)delta.tv_sec, (long)delta.tv_usec);
         exit(0);
 }

Fixes: 515c7af85ed9 ("x32: Use compat shims for {g,s}etsockopt")
Reported-by: Gopal RajagopalSai <gopalsr83@gmail.com>
Signed-off-by: Lance Richardson <lance.richardson.net@gmail.com>
---
 net/compat.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 5ae7437d3853..7242cce5631b 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -377,7 +377,8 @@ static int compat_sock_setsockopt(struct socket *sock, int level, int optname,
 	    optname == SO_ATTACH_REUSEPORT_CBPF)
 		return do_set_attach_filter(sock, level, optname,
 					    optval, optlen);
-	if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)
+	if (!COMPAT_USE_64BIT_TIME &&
+	    (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
 		return do_set_sock_timeout(sock, level, optname, optval, optlen);
 
 	return sock_setsockopt(sock, level, optname, optval, optlen);
@@ -448,7 +449,8 @@ static int do_get_sock_timeout(struct socket *sock, int level, int optname,
 static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
 				char __user *optval, int __user *optlen)
 {
-	if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)
+	if (!COMPAT_USE_64BIT_TIME &&
+	    (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
 		return do_get_sock_timeout(sock, level, optname, optval, optlen);
 	return sock_getsockopt(sock, level, optname, optval, optlen);
 }
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net-next] microchipT1phy: Add driver for Microchip LAN87XX T1 PHYs
From: Florian Fainelli @ 2018-04-25 14:21 UTC (permalink / raw)
  To: Nisar Sayed, davem; +Cc: UNGLinuxDriver, netdev
In-Reply-To: <20180425184944.24939-1-Nisar.Sayed@microchip.com>



On 04/25/2018 11:49 AM, Nisar Sayed wrote:
> Add driver for Microchip LAN87XX T1 PHYs
> 
> This patch support driver for Microchp T1 PHYs.
> There will be followup patches to this driver to support T1 PHY
> features such as cable diagnostics, signal quality indicator(SQI),
> sleep and wakeup (TC10) support.
> 
> Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
> ---
>  drivers/net/phy/Kconfig          |   5 ++
>  drivers/net/phy/Makefile         |   1 +
>  drivers/net/phy/microchipT1phy.c | 116 +++++++++++++++++++++++++++++++++++++++
>  include/linux/microchipT1phy.h   |  28 ++++++++++
>  4 files changed, 150 insertions(+)
>  create mode 100644 drivers/net/phy/microchipT1phy.c
>  create mode 100644 include/linux/microchipT1phy.h
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index bdfbabb..7b0b351 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -354,6 +354,11 @@ config MICROCHIP_PHY
>  	help
>  	  Supports the LAN88XX PHYs.
>  
> +config MICROCHIP_T1_PHY
> +	tristate "Microchip T1 PHYs"
> +	---help---
> +	  Supports the LAN87XX PHYs.
> +
>  config MICROSEMI_PHY
>  	tristate "Microsemi PHYs"
>  	---help---
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 01acbcb..5706613 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -70,6 +70,7 @@ obj-$(CONFIG_MESON_GXL_PHY)	+= meson-gxl.o
>  obj-$(CONFIG_MICREL_KS8995MA)	+= spi_ks8995.o
>  obj-$(CONFIG_MICREL_PHY)	+= micrel.o
>  obj-$(CONFIG_MICROCHIP_PHY)	+= microchip.o
> +obj-$(CONFIG_MICROCHIP_T1_PHY)	+= microchipT1phy.o
>  obj-$(CONFIG_MICROSEMI_PHY)	+= mscc.o
>  obj-$(CONFIG_NATIONAL_PHY)	+= national.o
>  obj-$(CONFIG_QSEMI_PHY)		+= qsemi.o
> diff --git a/drivers/net/phy/microchipT1phy.c b/drivers/net/phy/microchipT1phy.c
> new file mode 100644
> index 0000000..15fbb04
> --- /dev/null
> +++ b/drivers/net/phy/microchipT1phy.c

Can we avoid CamelCase file names? Any reasons why this is not part of
microchip.c?

> @@ -0,0 +1,116 @@
> +/*
> + * Copyright (C) 2018 Microchip Technology

Please use a SPDX license tag.

> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/mii.h>
> +#include <linux/phy.h>
> +#include <linux/microchipT1phy.h>
> +
> +#define DRIVER_AUTHOR	"Nisar Sayed <nisar.sayed@microchip.com>"
> +#define DRIVER_DESC	"Microchip LAN87XX T1 PHY driver"
> +
> +struct lan87xx_priv {
> +	int	chip_id;
> +	int	chip_rev;
> +};
> +
> +static int lan87xx_phy_config_intr(struct phy_device *phydev)
> +{
> +	int rc = 0;
> +
> +	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
> +		/* unmask all source and clear them before enable */
> +		rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, 0x7FFF);
> +		rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
> +		rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK,
> +			       LAN87XX_MASK_LINK_UP | LAN87XX_MASK_LINK_DOWN);
> +	} else {
> +		rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, 0);
> +	}

Since the last thing you do in both branches is write to
LAN87XX_INTERRUPT_MASK, just move that write outside of the branches and
make sure the value you will set is correct in both cases.

> +
> +	return rc < 0 ? rc : 0;
> +}
> +
> +static int lan87xx_phy_ack_interrupt(struct phy_device *phydev)
> +{
> +	int rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
> +
> +	return rc < 0 ? rc : 0;
> +}
> +
> +static int lan87xx_probe(struct phy_device *phydev)
> +{
> +	struct device *dev = &phydev->mdio.dev;
> +	struct lan87xx_priv *priv;
> +
> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	/* these values can be used to identify internal PHY */
> +	priv->chip_id = phy_read(phydev, MII_PHYSID1);
> +	priv->chip_rev = phy_read(phydev, MII_PHYSID2);

This is absolutely not necessary, phydev->phy_id stores that information
and you can use that at any point in your driver's lifetime. Besides,
the revision is not the full MII_PHYSID2, it's only the lower 4 bits of it.

Your probe() and remove() function then become unnecessary.

> +
> +	phydev->priv = priv;
> +
> +	return 0;
> +}
> +
> +static void lan87xx_remove(struct phy_device *phydev)
> +{
> +	struct device *dev = &phydev->mdio.dev;
> +	struct lan87xx_priv *priv = phydev->priv;
> +
> +	if (priv)
> +		devm_kfree(dev, priv);
> +}
> +
> +static struct phy_driver microchip_T1_phy_driver[] = {
> +	{
> +		.phy_id         = 0x0007c150,
> +		.phy_id_mask    = 0xfffffff0,
> +		.name           = "Microchip LAN87xx",
> +
> +		.features       = SUPPORTED_100baseT_Full,

PHY_BASIC_FEATURES maybe? You cannot possibly be supporting just
100BaseT full duplex, 100BaseT half duplex, 10BaseT full and half duplex
are also supported, right?

> +		.flags          = PHY_HAS_INTERRUPT,
> +
> +		.probe          = lan87xx_probe,
> +		.remove         = lan87xx_remove,
> +
> +		.config_init    = genphy_config_init,
> +		.config_aneg    = genphy_config_aneg,
> +
> +		.ack_interrupt  = lan87xx_phy_ack_interrupt,
> +		.config_intr    = lan87xx_phy_config_intr,
> +
> +		.suspend        = genphy_suspend,
> +		.resume         = genphy_resume,
> +	}
> +};
> +
> +module_phy_driver(microchip_T1_phy_driver);
> +
> +static struct mdio_device_id __maybe_unused microchip_T1_tbl[] = {
> +	{ 0x0007c150, 0xfffffff0 },
> +	{ }
> +};
> +
> +MODULE_DEVICE_TABLE(mdio, microchip_T1_tbl);
> +
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/microchipT1phy.h b/include/linux/microchipT1phy.h
> new file mode 100644
> index 0000000..7353e7c
> --- /dev/null
> +++ b/include/linux/microchipT1phy.h
> @@ -0,0 +1,28 @@
> +/*
> + * Copyright (C) 2018 Microchip Technology

Same comment as before, please use SPDX license identifiers.

> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef _MICROCHIPT1PHY_H_
> +#define _MICROCHIPT1PHY_H_
> +
> +/* Interrupt Source Register */
> +#define LAN87XX_INTERRUPT_SOURCE                (0x18)
> +
> +/* Interrupt Mask Register */
> +#define LAN87XX_INTERRUPT_MASK                  (0x19)
> +#define LAN87XX_MASK_LINK_UP                    (0x0004)
> +#define LAN87XX_MASK_LINK_DOWN                  (0x0002)

What's the point of that header file if all definitions are consumed by
the same driver?

> +
> +#endif /* _MICROCHIPT1PHY_H_ */
> 

-- 
Florian

^ permalink raw reply

* Re: Repeating "unregister_netdevice: waiting for lo to become free" caused by upstream 76da0704507bb ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")
From: Rafał Miłecki @ 2018-04-25 14:16 UTC (permalink / raw)
  To: WANG Cong, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Network Development, jeffy, David Ahern, Khlebnikov
  Cc: Greg Kroah-Hartman, Stable, Dan Streetman, Dan Streetman
In-Reply-To: <CACna6rwXW1LZdjpd2yc2_Gt=8M1n3aTF7_ZPGi01KVfwqxpWCQ@mail.gmail.com>

On 23.04.2018 15:08, Rafał Miłecki wrote:
> I've just updated my kernel 4.4.x and noticed a regression. Bisecting
> pointed me to the commit 2417da3f4d6bc ("ipv6: only call
> ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0] which is
> backport of upstream 76da0704507bb. That backported commit has
> appeared in a 4.4.103.
> 
> I use OpenWrt/LEDE [1] distribution and LXC [2] 1.1.5. After stopping
> a container I start getting these messages:
> [  229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
> [  239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
> [  249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
> (...)
> 
> Trying to start LXC nevertheless results in lxc-start command hang
> around network configuration. Trying to query LXC state afterwards
> results in a lxc-info command hang too.
> 
> I tried Googling for this issue and found similar reports:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729637
> https://github.com/fnproject/fn/issues/686
> https://lime-technology.com/forums/topic/66863-kernelunregister_netdevice-waiting-for-lo-to-become-free-usage-count-1/
> all of them related to the Docker, which is probably a similar use
> case to the LXC.
> 
> I couldn't find any reference to commit 76da0704507bb that could
> suggest fixing the problem I'm seeing.
> 
> Does anyone have an idea what is the issue I'm seeing about? Or even
> better, how to fix it? Can I provide any additional info that would
> help?
> 
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=2417da3f4d6bc4fc6c77f613f0e2264090892aa5
> [1] https://openwrt.org/
> [2] https://linuxcontainers.org/

Today I tried 4.14.34 to see if that helps. Unfortunately it doesn't. I
still experience the same problem.

 From reading various reports regarding that "unregister_netdevice:
waiting for lo to become free" message it appears the problem is caused
by a leaking dst refcnt somewhere in the kernel code.

I found links to few commit fixing leaks at various places:
4a31a6b19f9dd ("sctp: fix dst refcnt leak in sctp_v4_get_dst")
957d761cf91cd ("sctp: fix dst refcnt leak in sctp_v6_get_dst()")
4ee806d51176b ("net: tcp: close sock if net namespace is exiting")
d747a7a51b009 ("tcp: reset sk_rx_dst in tcp_disconnect()")
751eb6b6042a5 ("ipv6: addrconf: fix dev refcont leak when DAD failed")

All above patches are present in the linux-v4.4.y and are part of kernel
4.4.124 I use. So it seems I'm facing yet another dst refcnt leak.

Could commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once
for NETDEV_UNREGISTER") introduce a new dst refcnt leak? Or does it only
expost existing one?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox