Netdev List
* Re: [PATCH v4 00/16] Add Paravirtual RDMA Driver
From: Jason Gunthorpe @ 2016-09-12 18:02 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford, linux-rdma, pv-drivers, netdev, linux-pci, jhansen,
	asarwade, georgezhang, bryantan
In-Reply-To: <1473655766-31628-1-git-send-email-aditr@vmware.com>

On Sun, Sep 11, 2016 at 09:49:10PM -0700, Adit Ranadive wrote:
> [2] Libpvrdma User-level library - 
> http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary

You will probably find that rdma-plumbing will be the best way to get
your userspace component into the distributors.

  http://www.spinics.net/lists/linux-rdma/msg39026.html
  http://www.spinics.net/lists/linux-rdma/msg39328.html
  http://www.spinics.net/lists/linux-rdma/msg40014.html

Jason

* [PATCHv2 next 3/3] ipvlan: Introduce l3s mode
From: Mahesh Bandewar @ 2016-09-12 18:01 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, David Miller, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

In a typical IPvlan L3 setup, the master is in the default-ns and
each slave is in a different (slave) ns. In this setup, egress
packet processing for traffic originating from a slave-ns hits
all NF_HOOKs in the slave-ns as well as the default-ns. However,
the same is not true for ingress processing: the NF_HOOKs are
hit only in the slave-ns, skipping those in the default-ns.
IPvlan in L3 mode is restrictive, and if admins want to deploy
iptables rules in the default-ns, this asymmetric data path makes
it impossible to do so.

This patch makes use of the l3_rcv() (added as part of l3mdev
enhancements) to perform input route lookup on RX packets without
changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN
to change the skb->dev just before handing over skb to L4.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 Documentation/networking/ipvlan.txt |  7 ++-
 drivers/net/Kconfig                 |  1 +
 drivers/net/ipvlan/ipvlan.h         |  7 +++
 drivers/net/ipvlan/ipvlan_core.c    | 94 +++++++++++++++++++++++++++++++++++++
 drivers/net/ipvlan/ipvlan_main.c    | 60 ++++++++++++++++++++---
 include/uapi/linux/if_link.h        |  1 +
 6 files changed, 162 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
index 14422f8fcdc4..24196cef7c91 100644
--- a/Documentation/networking/ipvlan.txt
+++ b/Documentation/networking/ipvlan.txt
@@ -22,7 +22,7 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
 	There are no module parameters for this driver and it can be configured
 using IProute2/ip utility.
 
-	ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 }
+	ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | l3 | l3s }
 
 	e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
 
@@ -48,6 +48,11 @@ master device for the L2 processing and routing from that instance will be
 used before packets are queued on the outbound device. In this mode the slaves
 will not receive nor can send multicast / broadcast traffic.
 
+4.3 L3S mode:
+	This is very similar to the L3 mode except that iptables (conn-tracking)
+works in this mode and hence it is L3-symmetric (L3s). This will have slightly less
+performance but that shouldn't matter since you are choosing this mode over plain-L3
+mode to make conn-tracking work.
 
 5. What to choose (macvlan vs. ipvlan)?
 	These two devices are very similar in many regards and the specific use
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0c5415b05ea9..8768a625350d 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -149,6 +149,7 @@ config IPVLAN
     tristate "IP-VLAN support"
     depends on INET
     depends on IPV6
+    depends on NET_L3_MASTER_DEV
     ---help---
       This allows one to create virtual devices off of a main interface
       and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 695a5dc9ace3..68b270b59ba9 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -23,11 +23,13 @@
 #include <linux/if_vlan.h>
 #include <linux/ip.h>
 #include <linux/inetdevice.h>
+#include <linux/netfilter.h>
 #include <net/ip.h>
 #include <net/ip6_route.h>
 #include <net/rtnetlink.h>
 #include <net/route.h>
 #include <net/addrconf.h>
+#include <net/l3mdev.h>
 
 #define IPVLAN_DRV	"ipvlan"
 #define IPV_DRV_VER	"0.1"
@@ -96,6 +98,7 @@ struct ipvl_port {
 	struct work_struct	wq;
 	struct sk_buff_head	backlog;
 	int			count;
+	bool			ipt_hook_added;
 	struct rcu_head		rcu;
 };
 
@@ -124,4 +127,8 @@ struct ipvl_addr *ipvlan_find_addr(const struct ipvl_dev *ipvlan,
 				   const void *iaddr, bool is_v6);
 bool ipvlan_addr_busy(struct ipvl_port *port, void *iaddr, bool is_v6);
 void ipvlan_ht_addr_del(struct ipvl_addr *addr);
+struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
+			      u16 proto);
+unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
+			     const struct nf_hook_state *state);
 #endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index b5f9511d819e..b4e990743e1d 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -560,6 +560,7 @@ int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 	case IPVLAN_MODE_L2:
 		return ipvlan_xmit_mode_l2(skb, dev);
 	case IPVLAN_MODE_L3:
+	case IPVLAN_MODE_L3S:
 		return ipvlan_xmit_mode_l3(skb, dev);
 	}
 
@@ -664,6 +665,8 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
 		return ipvlan_handle_mode_l2(pskb, port);
 	case IPVLAN_MODE_L3:
 		return ipvlan_handle_mode_l3(pskb, port);
+	case IPVLAN_MODE_L3S:
+		return RX_HANDLER_PASS;
 	}
 
 	/* Should not reach here */
@@ -672,3 +675,94 @@ rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
 	kfree_skb(skb);
 	return RX_HANDLER_CONSUMED;
 }
+
+static struct ipvl_addr *ipvlan_skb_to_addr(struct sk_buff *skb,
+					    struct net_device *dev)
+{
+	struct ipvl_addr *addr = NULL;
+	struct ipvl_port *port;
+	void *lyr3h;
+	int addr_type;
+
+	if (!dev || !netif_is_ipvlan_port(dev))
+		goto out;
+
+	port = ipvlan_port_get_rcu(dev);
+	if (!port || port->mode != IPVLAN_MODE_L3S)
+		goto out;
+
+	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
+	if (!lyr3h)
+		goto out;
+
+	addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true);
+out:
+	return addr;
+}
+
+struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
+			      u16 proto)
+{
+	struct ipvl_addr *addr;
+	struct net_device *sdev;
+
+	addr = ipvlan_skb_to_addr(skb, dev);
+	if (!addr)
+		goto out;
+
+	sdev = addr->master->dev;
+	switch (proto) {
+	case AF_INET:
+	{
+		int err;
+		struct iphdr *ip4h = ip_hdr(skb);
+
+		err = ip_route_input_noref(skb, ip4h->daddr, ip4h->saddr,
+					   ip4h->tos, sdev);
+		if (unlikely(err))
+			goto out;
+		break;
+	}
+	case AF_INET6:
+	{
+		struct dst_entry *dst;
+		struct ipv6hdr *ip6h = ipv6_hdr(skb);
+		int flags = RT6_LOOKUP_F_HAS_SADDR;
+		struct flowi6 fl6 = {
+			.flowi6_iif   = sdev->ifindex,
+			.daddr        = ip6h->daddr,
+			.saddr        = ip6h->saddr,
+			.flowlabel    = ip6_flowinfo(ip6h),
+			.flowi6_mark  = skb->mark,
+			.flowi6_proto = ip6h->nexthdr,
+		};
+
+		skb_dst_drop(skb);
+		dst = ip6_route_input_lookup(dev_net(sdev), sdev, &fl6, flags);
+		skb_dst_set(skb, dst);
+		break;
+	}
+	default:
+		break;
+	}
+
+out:
+	return skb;
+}
+
+unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
+			     const struct nf_hook_state *state)
+{
+	struct ipvl_addr *addr;
+	unsigned int len;
+
+	addr = ipvlan_skb_to_addr(skb, skb->dev);
+	if (!addr)
+		goto out;
+
+	skb->dev = addr->master->dev;
+	len = skb->len + ETH_HLEN;
+	ipvlan_count_rx(addr->master, len, true, false);
+out:
+	return NF_ACCEPT;
+}
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 18b4e8c7f68a..d02be277e1db 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -9,24 +9,65 @@
 
 #include "ipvlan.h"
 
+static struct nf_hook_ops ipvl_nfops[] __read_mostly = {
+	{
+		.hook     = ipvlan_nf_input,
+		.pf       = NFPROTO_IPV4,
+		.hooknum  = NF_INET_LOCAL_IN,
+		.priority = INT_MAX,
+	},
+	{
+		.hook     = ipvlan_nf_input,
+		.pf       = NFPROTO_IPV6,
+		.hooknum  = NF_INET_LOCAL_IN,
+		.priority = INT_MAX,
+	},
+};
+
+static struct l3mdev_ops ipvl_l3mdev_ops __read_mostly = {
+	.l3mdev_l3_rcv = ipvlan_l3_rcv,
+};
+
 static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
 {
 	ipvlan->dev->mtu = dev->mtu - ipvlan->mtu_adj;
 }
 
-static void ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
+static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
 {
 	struct ipvl_dev *ipvlan;
+	int err = 0;
 
+	ASSERT_RTNL();
 	if (port->mode != nval) {
+		if (nval == IPVLAN_MODE_L3S) {
+			port->dev->l3mdev_ops = &ipvl_l3mdev_ops;
+			port->dev->priv_flags |= IFF_L3MDEV_MASTER;
+			if (!port->ipt_hook_added) {
+				err = _nf_register_hooks(ipvl_nfops,
+							ARRAY_SIZE(ipvl_nfops));
+				if (!err)
+					port->ipt_hook_added = true;
+				else
+					return err;
+			}
+		} else {
+			port->dev->priv_flags &= ~IFF_L3MDEV_MASTER;
+			port->dev->l3mdev_ops = NULL;
+			if (port->ipt_hook_added)
+				_nf_unregister_hooks(ipvl_nfops,
+						     ARRAY_SIZE(ipvl_nfops));
+			port->ipt_hook_added = false;
+		}
 		list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
-			if (nval == IPVLAN_MODE_L3)
+			if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S)
 				ipvlan->dev->flags |= IFF_NOARP;
 			else
 				ipvlan->dev->flags &= ~IFF_NOARP;
 		}
 		port->mode = nval;
 	}
+	return err;
 }
 
 static int ipvlan_port_create(struct net_device *dev)
@@ -132,7 +173,8 @@ static int ipvlan_open(struct net_device *dev)
 	struct net_device *phy_dev = ipvlan->phy_dev;
 	struct ipvl_addr *addr;
 
-	if (ipvlan->port->mode == IPVLAN_MODE_L3)
+	if (ipvlan->port->mode == IPVLAN_MODE_L3 ||
+	    ipvlan->port->mode == IPVLAN_MODE_L3S)
 		dev->flags |= IFF_NOARP;
 	else
 		dev->flags &= ~IFF_NOARP;
@@ -372,13 +414,14 @@ static int ipvlan_nl_changelink(struct net_device *dev,
 {
 	struct ipvl_dev *ipvlan = netdev_priv(dev);
 	struct ipvl_port *port = ipvlan_port_get_rtnl(ipvlan->phy_dev);
+	int err = 0;
 
 	if (data && data[IFLA_IPVLAN_MODE]) {
 		u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
 
-		ipvlan_set_port_mode(port, nmode);
+		err = ipvlan_set_port_mode(port, nmode);
 	}
-	return 0;
+	return err;
 }
 
 static size_t ipvlan_nl_getsize(const struct net_device *dev)
@@ -473,10 +516,13 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
 		unregister_netdevice(dev);
 		return err;
 	}
+	err = ipvlan_set_port_mode(port, mode);
+	if (err) {
+		unregister_netdevice(dev);
+		return err;
+	}
 
 	list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
-	ipvlan_set_port_mode(port, mode);
-
 	netif_stacked_transfer_operstate(phy_dev, dev);
 	return 0;
 }
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 9bf3aecfe05b..a615583bab09 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -464,6 +464,7 @@ enum {
 enum ipvlan_mode {
 	IPVLAN_MODE_L2 = 0,
 	IPVLAN_MODE_L3,
+	IPVLAN_MODE_L3S,
 	IPVLAN_MODE_MAX
 };
 
-- 
2.8.0.rc3.226.g39d4020
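
The receive path described in this patch can be pictured as two
stages. What follows is a userspace sketch only, not kernel code:
struct dev, struct pkt, the toy address table, stage_l3_rcv() and
stage_local_in() are all hypothetical stand-ins for net_device,
sk_buff, the IPvlan address lookup, the l3mdev receive handler and
the LOCAL_IN netfilter hook.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for net_device and sk_buff. */
struct dev { const char *name; };

struct pkt {
	uint32_t daddr;           /* destination IPv4 address (toy value) */
	const struct dev *dev;    /* device the packet is attributed to */
	const struct dev *rt_dev; /* device used for the input route lookup */
};

/* Toy address table: which slave device owns which address. */
struct addr_entry { uint32_t addr; const struct dev *slave; };

static const struct dev *lookup_slave(const struct addr_entry *tbl, size_t n,
				      uint32_t daddr)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr == daddr)
			return tbl[i].slave;
	return NULL;
}

/* Stage 1: do the route lookup against the slave, leave pkt->dev alone. */
static void stage_l3_rcv(struct pkt *p, const struct addr_entry *tbl, size_t n)
{
	const struct dev *slave = lookup_slave(tbl, n, p->daddr);

	if (slave)
		p->rt_dev = slave;
}

/*
 * Stage 2: runs last at LOCAL_IN, after earlier hooks have seen the
 * packet on the master; only now is the device switched to the slave.
 */
static void stage_local_in(struct pkt *p, const struct addr_entry *tbl, size_t n)
{
	const struct dev *slave = lookup_slave(tbl, n, p->daddr);

	if (slave)
		p->dev = slave;
}
```

Between the two stages the packet still appears to sit on the master
device, which is what lets rules in the default-ns see it. A packet
whose address matches no slave passes through both stages unchanged.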

* [PATCHv2 next 2/3] net: Add _nf_(un)register_hooks symbols
From: Mahesh Bandewar @ 2016-09-12 18:01 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, David Miller, Mahesh Bandewar, Pablo Neira Ayuso

From: Mahesh Bandewar <maheshb@google.com>

Add _nf_register_hooks() and _nf_unregister_hooks() calls, which can be
used by a caller that already holds the RTNL mutex.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter.h |  2 ++
 net/netfilter/core.c      | 51 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 9230f9aee896..e82b76781bf6 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -133,6 +133,8 @@ int nf_register_hook(struct nf_hook_ops *reg);
 void nf_unregister_hook(struct nf_hook_ops *reg);
 int nf_register_hooks(struct nf_hook_ops *reg, unsigned int n);
 void nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n);
+int _nf_register_hooks(struct nf_hook_ops *reg, unsigned int n);
+void _nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n);
 
 /* Functions to register get/setsockopt ranges (non-inclusive).  You
    need to check permissions yourself! */
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index f39276d1c2d7..2c5327e43a88 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -188,19 +188,17 @@ EXPORT_SYMBOL(nf_unregister_net_hooks);
 
 static LIST_HEAD(nf_hook_list);
 
-int nf_register_hook(struct nf_hook_ops *reg)
+static int _nf_register_hook(struct nf_hook_ops *reg)
 {
 	struct net *net, *last;
 	int ret;
 
-	rtnl_lock();
 	for_each_net(net) {
 		ret = nf_register_net_hook(net, reg);
 		if (ret && ret != -ENOENT)
 			goto rollback;
 	}
 	list_add_tail(&reg->list, &nf_hook_list);
-	rtnl_unlock();
 
 	return 0;
 rollback:
@@ -210,19 +208,34 @@ rollback:
 			break;
 		nf_unregister_net_hook(net, reg);
 	}
+	return ret;
+}
+
+int nf_register_hook(struct nf_hook_ops *reg)
+{
+	int ret;
+
+	rtnl_lock();
+	ret = _nf_register_hook(reg);
 	rtnl_unlock();
+
 	return ret;
 }
 EXPORT_SYMBOL(nf_register_hook);
 
-void nf_unregister_hook(struct nf_hook_ops *reg)
+static void _nf_unregister_hook(struct nf_hook_ops *reg)
 {
 	struct net *net;
 
-	rtnl_lock();
 	list_del(&reg->list);
 	for_each_net(net)
 		nf_unregister_net_hook(net, reg);
+}
+
+void nf_unregister_hook(struct nf_hook_ops *reg)
+{
+	rtnl_lock();
+	_nf_unregister_hook(reg);
 	rtnl_unlock();
 }
 EXPORT_SYMBOL(nf_unregister_hook);
@@ -246,6 +259,26 @@ err:
 }
 EXPORT_SYMBOL(nf_register_hooks);
 
+/* Caller MUST take rtnl_lock() */
+int _nf_register_hooks(struct nf_hook_ops *reg, unsigned int n)
+{
+	unsigned int i;
+	int err = 0;
+
+	for (i = 0; i < n; i++) {
+		err = _nf_register_hook(&reg[i]);
+		if (err)
+			goto err;
+	}
+	return err;
+
+err:
+	if (i > 0)
+		_nf_unregister_hooks(reg, i);
+	return err;
+}
+EXPORT_SYMBOL(_nf_register_hooks);
+
 void nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n)
 {
 	while (n-- > 0)
@@ -253,6 +286,14 @@ void nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n)
 }
 EXPORT_SYMBOL(nf_unregister_hooks);
 
+/* Caller MUST take rtnl_lock */
+void _nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n)
+{
+	while (n-- > 0)
+		_nf_unregister_hook(&reg[n]);
+}
+EXPORT_SYMBOL(_nf_unregister_hooks);
+
 unsigned int nf_iterate(struct list_head *head,
 			struct sk_buff *skb,
 			struct nf_hook_state *state,
-- 
2.8.0.rc3.226.g39d4020
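
The split made in this patch (a helper that assumes the lock is held,
plus a wrapper that takes it) can be sketched in userspace as below.
This is an illustration under assumptions, not the kernel code: the
toy_* names are hypothetical, and a boolean flag stands in for the
non-recursive RTNL mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock standing in for the RTNL mutex. */
static bool toy_rtnl_held;

static void toy_rtnl_lock(void)
{
	assert(!toy_rtnl_held);	/* taking it twice would deadlock */
	toy_rtnl_held = true;
}

static void toy_rtnl_unlock(void)
{
	assert(toy_rtnl_held);
	toy_rtnl_held = false;
}

static int registered;

/* Caller MUST already hold the lock. */
static int toy_register_locked(void)
{
	assert(toy_rtnl_held);
	registered++;
	return 0;
}

/* Takes the lock itself; must NOT be called with the lock held. */
static int toy_register(void)
{
	int ret;

	toy_rtnl_lock();
	ret = toy_register_locked();
	toy_rtnl_unlock();
	return ret;
}

/* A path that is entered with the lock already taken by its caller. */
static int toy_changelink(void)
{
	int ret;

	toy_rtnl_lock();		/* done by the surrounding core code */
	ret = toy_register_locked();	/* calling toy_register() here would trip the assert */
	toy_rtnl_unlock();
	return ret;
}
```

The locked variant exists for callers like toy_changelink(), which
cannot use the self-locking wrapper without taking the lock twice.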

* [PATCHv2 next 1/3] ipv6: Export ip6_route_input_lookup symbol
From: Mahesh Bandewar @ 2016-09-12 18:01 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, David Miller, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

Make ip6_route_input_lookup available outside of the ipv6 module,
similar to ip_route_input_noref in the IPv4 world.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 include/net/ip6_route.h | 3 +++
 net/ipv6/route.c        | 7 ++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index d97305d0e71f..e0cd318d5103 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -64,6 +64,9 @@ static inline bool rt6_need_strict(const struct in6_addr *daddr)
 }
 
 void ip6_route_input(struct sk_buff *skb);
+struct dst_entry *ip6_route_input_lookup(struct net *net,
+					 struct net_device *dev,
+					 struct flowi6 *fl6, int flags);
 
 struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk,
 					 struct flowi6 *fl6, int flags);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 09d43ff11a8d..9563eedd4f97 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1147,15 +1147,16 @@ static struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *
 	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags);
 }
 
-static struct dst_entry *ip6_route_input_lookup(struct net *net,
-						struct net_device *dev,
-						struct flowi6 *fl6, int flags)
+struct dst_entry *ip6_route_input_lookup(struct net *net,
+					 struct net_device *dev,
+					 struct flowi6 *fl6, int flags)
 {
 	if (rt6_need_strict(&fl6->daddr) && dev->type != ARPHRD_PIMREG)
 		flags |= RT6_LOOKUP_F_IFACE;
 
 	return fib6_rule_lookup(net, fl6, flags, ip6_pol_route_input);
 }
+EXPORT_SYMBOL_GPL(ip6_route_input_lookup);
 
 void ip6_route_input(struct sk_buff *skb)
 {
-- 
2.8.0.rc3.226.g39d4020

* [PATCHv2 next 0/3] ipvlan introduce l3s mode
From: Mahesh Bandewar @ 2016-09-12 18:01 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, David Miller, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

Same old problem with a new approach, based largely on suggestions
from the earlier patch series.

First, this is introduced as a new mode rather than as a modification
of the old (L3) mode. The behavior of the existing modes is therefore
preserved as it is, and the new L3s mode obeys iptables so that the
intended conn-tracking can work.

To do this, the code uses the newly added l3mdev_rcv() handler and an
iptables hook: l3mdev_rcv() performs an inbound route lookup with the
correct (IPvlan slave) interface, and the iptables hook at LOCAL_IN
then changes the input device from the master to the slave to complete
the formality.

The supporting stack changes are trivial: one exports for IPv6 the
equivalent of code already exported for IPv4, and the other lets the
netfilter hook registration code be called with RTNL already held.
Please see the individual patches for details.

Mahesh Bandewar (3):
  ipv6: Export ip6_route_input_lookup symbol
  net: Add _nf_(un)register_hooks symbols
  ipvlan: Introduce l3s mode

 Documentation/networking/ipvlan.txt |  7 ++-
 drivers/net/Kconfig                 |  1 +
 drivers/net/ipvlan/ipvlan.h         |  7 +++
 drivers/net/ipvlan/ipvlan_core.c    | 94 +++++++++++++++++++++++++++++++++++++
 drivers/net/ipvlan/ipvlan_main.c    | 60 ++++++++++++++++++++---
 include/linux/netfilter.h           |  2 +
 include/net/ip6_route.h             |  3 ++
 include/uapi/linux/if_link.h        |  1 +
 net/ipv6/route.c                    |  7 +--
 net/netfilter/core.c                | 51 ++++++++++++++++++--
 10 files changed, 217 insertions(+), 16 deletions(-)


v1: Initial post
v2: Text correction and config changed from "select" to "depends on"
-- 
2.8.0.rc3.226.g39d4020

* Re: README: [PATCH RFC 11/11] net/mlx5e: XDP TX xmit more
From: Alexei Starovoitov via iovisor-dev @ 2016-09-12 17:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Tom Herbert, iovisor-dev, Jamal Hadi Salim, Saeed Mahameed,
	Eric Dumazet, netdev-u79uwXL29TY76Z2rM5mHXA, Edward Cree
In-Reply-To: <20160912105655.0cb5607e-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, Sep 12, 2016 at 10:56:55AM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 8 Sep 2016 23:30:50 -0700
> Alexei Starovoitov <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> 
> > On Fri, Sep 09, 2016 at 07:36:52AM +0200, Jesper Dangaard Brouer wrote:
> > > > >  Let's do bundling/bulking from the start!
> > > > 
> > > > mlx4 already does bulking and this proposed mlx5 set of patches
> > > > does bulking as well.
> > > > See nothing wrong about it. RX side processes the packets and
> > > > when it's done it tells TX to xmit whatever it collected.  
> > > 
> > > This is doing "hidden" bulking and not really taking advantage of using
> > > the icache more efficiently.
> > >
> > > Let me explain the problem I see a little more clearly then, so you
> > > hopefully see where I'm going.
> > > 
> > > Imagine you have packets intermixed towards the stack and XDP_TX. 
> > > Every time you call the stack code, then you flush your icache.  When
> > > returning to the driver code, you will have to reload all the icache
> > > associated with the XDP_TX, this is a costly operation.  
> > 
> > correct. And why is that a problem?
> 
> It is good that you can see and acknowledge the I-cache problem.
> 
> XDP is all about performance.  What I hear is that you are arguing
> against a model that will yield better performance; that does not make
> sense to me.  Let me explain this again, in another way.

I'm arguing against your proposal that I think will be more complex and
lower performance than what Saeed and the team already implemented.
Therefore I don't think it's fair to block the patch and ask them to
reimplement it just to test an idea that may or may not improve performance.

Getting maximum performance is tricky. Good is better than perfect.
It's important to argue about user space visible bits upfront, but
on the kernel performance side we should build/test incrementally.
This particular patch 11/11 is simple, easy to review and provides
good performance. What's not to like?

* Re: [PATCH v4 16/16] MAINTAINERS: Update for PVRDMA driver
From: Jason Gunthorpe @ 2016-09-12 17:52 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford, linux-rdma, pv-drivers, netdev, linux-pci, jhansen,
	asarwade, georgezhang, bryantan
In-Reply-To: <1473655766-31628-17-git-send-email-aditr@vmware.com>

On Sun, Sep 11, 2016 at 09:49:26PM -0700, Adit Ranadive wrote:
> Add maintainer info for the PVRDMA driver.

You can probably squash the last three patches.

.. and fix the __u32 stuff throughout the entire driver please.

Jason

* Re: [PATCH v4 03/16] IB/pvrdma: Add virtual device RDMA structures
From: Jason Gunthorpe @ 2016-09-12 17:50 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford, linux-rdma, pv-drivers, netdev, linux-pci, jhansen,
	asarwade, georgezhang, bryantan
In-Reply-To: <1473655766-31628-4-git-send-email-aditr@vmware.com>

On Sun, Sep 11, 2016 at 09:49:13PM -0700, Adit Ranadive wrote:
> +	__u8	raw[16];
> +	struct {
> +		__be64	subnet_prefix;
> +		__be64	interface_id;
> +	} global;

If this is not a userspace header, do not use the __ variants.

Jason

* Re: [PATCH v4 02/16] IB/pvrdma: Add user-level shared functions
From: Jason Gunthorpe @ 2016-09-12 17:49 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford, linux-rdma, pv-drivers, netdev, linux-pci, jhansen,
	asarwade, georgezhang, bryantan
In-Reply-To: <1473655766-31628-3-git-send-email-aditr@vmware.com>

On Sun, Sep 11, 2016 at 09:49:12PM -0700, Adit Ranadive wrote:
> We share some common structures with the user-level driver. This patch
> adds those structures and shared functions to traverse the QP/CQ rings.

>  create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_uapi.h
>  create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_user.h

The files that are intended to be shared with userspace must be under
include/uapi/, please coordinate with Leon on the path.

Same for all the new drivers.

> +static inline __s32 pvrdma_idx(atomic_t *var, __u32 max_elems)
> +{
> +	const unsigned int idx = atomic_read(var);

Eh? Does this even compile in userspace?

If this is not a userspace header then why does it use __u32 and
related ??

> +#define PVRDMA_UVERBS_ABI_VERSION	3
> +#define PVRDMA_BOARD_ID			1
> +#define PVRDMA_REV_ID			1
> +
> +struct pvrdma_alloc_ucontext_resp {
> +	u32 qp_tab_size;
> +	u32 reserved;
> +};

This certainly looks like a userspace header, shouldn't it use __u32?

NAK

Jason
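
For context on the type question raised in this review: headers that
are exported to userspace spell fixed-width types as __u32, __s32 and
so on, which <linux/types.h> provides to userspace builds, while the
plain u32 spellings are kernel-internal. A minimal sketch, assuming a
Linux system with the exported kernel headers installed; the struct
name is hypothetical and only mirrors the layout quoted above.

```c
#include <linux/types.h>	/* exported kernel header: provides __u32 */
#include <stddef.h>

/* Hypothetical uapi-style response struct; builds in userspace. */
struct example_ucontext_resp {
	__u32 qp_tab_size;
	__u32 reserved;
};

/* Size seen by both sides of the kernel/user ABI. */
static size_t example_resp_size(void)
{
	return sizeof(struct example_ucontext_resp);
}
```

Because both fields are explicitly 32 bits wide, the layout is the
same regardless of the word size of the userspace program.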

* Re: [PATCH -next] net: macb: fix missing unlock on error in macb_start_xmit()
From: Helmut Buchsbaum @ 2016-09-12 17:40 UTC (permalink / raw)
  To: Wei Yongjun, Nicolas Ferre; +Cc: netdev
In-Reply-To: <1473506277-31304-1-git-send-email-weiyj.lk@gmail.com>

On 09/10/2016 01:17 PM, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
>
> Fix missing unlock before return from function macb_start_xmit()
> in the error handling case.
>
> Fixes: 007e4ba3ee13 ("net: macb: initialize checksum when using
> checksum offloading")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/net/ethernet/cadence/macb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 0294b6a..63144bb 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -1398,7 +1398,7 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
>
>  	if (macb_clear_csum(skb)) {
>  		dev_kfree_skb_any(skb);
> -		return NETDEV_TX_OK;
> +		goto unlock;
>  	}
>
>  	/* Map socket buffer for DMA transfer */
>
You are definitely right. Sorry for missing that obvious point and for
any inconvenience caused.

BTW, I see there are obviously quite a few users of MACB
implementations. I'm just curious whether anybody else has ever
encountered the checksum problem or if this is a matter of the Zynq
implementation only.

Regards,
Helmut
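
The fix quoted above follows the common single-exit locking pattern:
every path out of the function goes through one label that drops the
lock. A userspace sketch of that pattern, with hypothetical toy_*
names and a boolean flag standing in for the driver's lock:

```c
#include <stdbool.h>

static bool toy_lock_held;

enum { TOY_TX_OK = 0 };

/*
 * Always returns TOY_TX_OK, as a dropped packet is still "consumed".
 * The lock must be released on every path, including the drop path.
 */
static int toy_start_xmit(bool csum_fails, int *sent)
{
	toy_lock_held = true;		/* stands in for taking the lock */

	if (csum_fails) {
		/* Drop: jump to the common exit instead of returning here. */
		goto unlock;
	}

	(*sent)++;			/* stands in for queuing the frame */

unlock:
	toy_lock_held = false;		/* stands in for releasing the lock */
	return TOY_TX_OK;
}
```

A plain return inside the if block would leave toy_lock_held set,
which is the userspace analogue of the bug the patch fixes.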

* Re: [RFC 00/11] QLogic RDMA Driver (qedr) RFC
From: Yuval Mintz @ 2016-09-12 17:39 UTC (permalink / raw)
  To: Parav Pandit, Leon Romanovsky
  Cc: Ram Amrani, Doug Ledford, David Miller, Ariel Elior,
	Michal Kalderon, Rajesh Borundia, linux-rdma@vger.kernel.org,
	netdev
In-Reply-To: <CAG53R5UqF21z34jB2HJCTXU8w3Kbj_Jon=wBAoO5tYSqbD1f6g@mail.gmail.com>

>>>  include/linux/qed/common_hsi.h                 |    1 +
>>>  include/linux/qed/qed_if.h                     |    9 +-
>>>  include/linux/qed/qed_ll2_if.h                 |  140 +
>>>  include/linux/qed/qed_roce_if.h                |  604 ++++
>>>  include/linux/qed/qede_roce.h                  |   88 +
>> > include/linux/qed/rdma_common.h                |    1 +
>>
>> Something not directly related to your patches, but they brought my
>> attention to the fact that all these new (and old) rdma<->net devices
>> are polluting include/linux
>>
> ocrdma driver includes be_roce.h located in net/ethernet/emulex/benet
> location instead of include/linux/.
> This file helps to bind rdma to net device or underlying hw device.

> Maybe a similar change can be done for the rest of the drivers for
> rdma<-->net devices?

By adding explicit inclusion paths in the Makefile, a la
ccflags-y := -Idrivers/net/ethernet/emulex/benet   ?

While this might work, I personally dislike it as I find it
counter-intuitive when going over the code -
I don't expect a driver to locally modify the inclusion path.
Besides, we're [eventually] going to have a whole suite of drivers based
on the qed module, some of which would reside under drivers/scsi;
I'm not sure it's best to have 3 or 4 different drivers privately
include the same directory under a different subsystem.

>> Filtered output:
>> ➜  linux-rdma git:(topic/fixes-for-4.8-2) ls -dl include/linux/*/
>> drwxrwxr-x  2 leonro leonro  4096 Aug 30 16:27 include/linux/hsi/
>> drwxrwxr-x  2 leonro leonro  4096 Sep 12 19:08 include/linux/mlx4/
>> drwxrwxr-x  2 leonro leonro  4096 Sep  7 15:31 include/linux/mlx5/
>> drwxrwxr-x  2 leonro leonro  4096 Sep  8 17:46 include/linux/qed/
>>
>> Is this the right place for them?
>
> Thanks
     

* Re: icmpv6: issue with routing table entries from link local addresses
From: Hannes Frederic Sowa @ 2016-09-12 17:26 UTC (permalink / raw)
  To: Andreas Hübner, netdev, d. caratti
In-Reply-To: <20160912142732.GI26782@targo.k4n.de>

Hello,

On 12.09.2016 16:27, Andreas Hübner wrote:
> Hi,
> 
> I'm currently debugging a potential issue with the icmpv6 stack and
> hopefully this is the correct place to ask. (Was actually looking for a
> more specific list, but didn't find anything. Please point me to a more
> appropriate list if this is out of place here.)
> 
> I have the following setup:
>   - 2 directly connected hosts (A+B), both have only link local addresses
>     configured (interface on both hosts is eth0)
>   - host B is also connected to another host C (via interface eth1)
>   - main routing table (relevant part) on host B looks like this:
> 
>       fe80::/64 dev eth1  proto kernel  metric 256
>       fe80::/64 dev eth0  proto kernel  metric 256
> 
>   - host A is trying to ICMPv6 ping the link local address of host B
> 
> The issue I currently have is, that the echo reply that host B should
> generate is never sent back to host A. If I change the order of the
> routing table entries on host B, everything works fine.
> (host A is connected on eth0)
> 
> I'm wondering, if this is how it is supposed to work. Do we need to do a
> routing table lookup when generating an ICMPv6 echo reply for link local
> addresses?  (From my understanding, this is not done in the neighbour
> discovery stack, so why here?)

For global addresses this is necessary, as asymmetric routing could be
involved and we don't want to treat ping echoes in any way special.

> Actually, I'm convinced I must be doing something wrong here. The setup
> for the issue is quite trivial, someone would have tripped over it
> already. The only condition is that one host has multiple interfaces
> with ipv6 enabled.
> 
> Any help in shedding some light onto this issue would be appreciated.

This shouldn't be the case. We certainly carry over the ifindex of the
received packet into the routing lookup of the outgoing packet, thus the
appropriate rule, with the outgoing ifindex, should be selected.

I also couldn't reproduce your problem here with my system. Can you
verify with tcpdump that the packet is leaving on another interface?

Thanks,
Hannes
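
The point about carrying the ifindex into the lookup has a userspace
analogue: fe80::/64 exists on every IPv6-enabled interface, so a
link-local destination is only unambiguous together with an interface,
which the sockets API expresses through sin6_scope_id. A minimal
sketch; make_linklocal_dst() is a hypothetical helper name, and the
interface index is assumed to be known to the caller.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Fill *dst for a link-local destination reachable via interface
 * ifindex. Returns 0 on success, -1 if text is not a valid
 * link-local IPv6 address.
 */
static int make_linklocal_dst(const char *text, unsigned int ifindex,
			      struct sockaddr_in6 *dst)
{
	memset(dst, 0, sizeof(*dst));
	dst->sin6_family = AF_INET6;

	if (inet_pton(AF_INET6, text, &dst->sin6_addr) != 1)
		return -1;
	if (!IN6_IS_ADDR_LINKLOCAL(&dst->sin6_addr))
		return -1;

	/* The scope id selects which link the fe80:: address lives on. */
	dst->sin6_scope_id = ifindex;
	return 0;
}
```

With two interfaces that both carry an fe80::/64 route, as in the
report above, the scope id is what distinguishes eth0 from eth1.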

* Re: [RFC V3 PATCH 00/26] Kernel NET policy
From: Cong Wang @ 2016-09-12 17:21 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Liang, Kan, David Miller, LKML, Linux Kernel Network Developers,
	Jeff Kirsher, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	Kees Cook, Al Viro, Cyrill Gorcunov, John Stultz, Alexander Duyck,
	ben, David Decotigny, Alexander Duyck <alexander.
In-Reply-To: <20160912153821.GA11685@breakpoint.cc>

On Mon, Sep 12, 2016 at 8:38 AM, Florian Westphal <fw@strlen.de> wrote:
> kan.liang@intel.com <kan.liang@intel.com> wrote:
>> From: Kan Liang <kan.liang@intel.com>
>>
>> It is a big challenge to get good network performance. First, the network
>> performance is not good with default system settings. Second, it is too
>
> [..]
>
> I ask to be dropped from the CC list of further submissions of this
> series. I've said all I have to say about this ('do it in userspace')
> and it's very unlikely I will change my opinion.

+1
Same for me.

* Re: [PATCH v4 01/16] vmxnet3: Move PCI Id to pci_ids.h
From: Bjorn Helgaas @ 2016-09-12 17:21 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	pv-drivers-pghWNbHTmq7QT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, jhansen-pghWNbHTmq7QT0dZR+AlfA,
	asarwade-pghWNbHTmq7QT0dZR+AlfA,
	georgezhang-pghWNbHTmq7QT0dZR+AlfA,
	bryantan-pghWNbHTmq7QT0dZR+AlfA
In-Reply-To: <1473655766-31628-2-git-send-email-aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>

On Sun, Sep 11, 2016 at 09:49:11PM -0700, Adit Ranadive wrote:
> The VMXNet3 PCI Id will be shared with our paravirtual RDMA driver.
> Moved it to the shared location in pci_ids.h.
> 
> Suggested-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Signed-off-by: Adit Ranadive <aditr-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>

Acked-by: Bjorn Helgaas <bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

> ---
> ---
>  drivers/net/vmxnet3/vmxnet3_int.h | 3 +--
>  include/linux/pci_ids.h           | 1 +
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
> index 74fc030..2bd6bf8 100644
> --- a/drivers/net/vmxnet3/vmxnet3_int.h
> +++ b/drivers/net/vmxnet3/vmxnet3_int.h
> @@ -119,9 +119,8 @@ enum {
>  };
>  
>  /*
> - * PCI vendor and device IDs.
> + * Maximum devices supported.
>   */
> -#define PCI_DEVICE_ID_VMWARE_VMXNET3    0x07B0
>  #define MAX_ETHERNET_CARDS		10
>  #define MAX_PCI_PASSTHRU_DEVICE		6
>  
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index c58752f..98bb455 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2251,6 +2251,7 @@
>  #define PCI_DEVICE_ID_RASTEL_2PORT	0x2000
>  
>  #define PCI_VENDOR_ID_VMWARE		0x15ad
> +#define PCI_DEVICE_ID_VMWARE_VMXNET3	0x07b0
>  
>  #define PCI_VENDOR_ID_ZOLTRIX		0x15b0
>  #define PCI_DEVICE_ID_ZOLTRIX_2BD0	0x2bd0
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 net 1/1] net sched actions: fix GETing actions
From: Cong Wang @ 2016-09-12 17:18 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David Miller, Linux Kernel Network Developers
In-Reply-To: <1473675692-8490-1-git-send-email-jhs@emojatatu.com>

On Mon, Sep 12, 2016 at 3:21 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> From: Jamal Hadi Salim <jhs@mojatatu.com>
>
> With the batch changes that translated transient actions into
> a temporary list we lost in the translation the fact that
> tcf_action_destroy() will eventually delete the action from
> the permanent location if the refcount is zero.
>
> Example of what broke:
> ...add a gact action to drop
> sudo $TC actions add action drop index 10
> ...now retrieve it, looks good
> sudo $TC actions get action gact index 10
> ...retrieve it again and find it is gone!
> sudo $TC actions get action gact index 10
>
> Fixes:
> 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array"),
> 824a7e8863b3 ("net_sched: remove an unnecessary list_del()")
> f07fed82ad79 ("net_sched: remove the leftover cleanup_a()")
>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
>  net/sched/act_api.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index d09d068..63b8167 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -592,6 +592,16 @@ err_out:
>         return ERR_PTR(err);
>  }
>
> +static void cleanup_a(struct list_head *actions, int ovr)
> +{
> +       struct tc_action *a, *tmp;
> +
> +       list_for_each_entry_safe(a, tmp, actions, list) {


No need for the safe version.

> +               if (ovr)
> +                       a->tcfa_refcnt-=1;

How about tcfa_bindcnt?

I hate to point out a coding style issue, but since you need to update
the patch anyway, please add a space on each side of '-='.

I think checkpatch.pl should be able to catch this.

> +       }
> +}
> +
>  int tcf_action_init(struct net *net, struct nlattr *nla,
>                                   struct nlattr *est, char *name, int ovr,
>                                   int bind, struct list_head *actions)
> @@ -612,8 +622,15 @@ int tcf_action_init(struct net *net, struct nlattr *nla,
>                         goto err;
>                 }
>                 act->order = i;
> +               if (ovr)

Do we need to check this boolean? It looks like we need this for the !ovr case too.


> +                       act->tcfa_refcnt+=1;


Ditto for coding style.

>                 list_add_tail(&act->list, actions);
>         }
> +
> +       /* Remove the temp refcnt which was necessary to protect against
> +        * destroying an existing action which was being replaced
> +        */
> +       cleanup_a(actions, ovr);
>         return 0;
>
>  err:
> @@ -883,6 +900,8 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
>                         goto err;
>                 }
>                 act->order = i;
> +               if (event == RTM_GETACTION)
> +                       act->tcfa_refcnt+=1;

Ditto.


>                 list_add_tail(&act->list, &actions);
>         }
>


Thanks.
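
For readers following the refcount discussion, the failure described in
the commit message can be modelled in a few lines. This is only a toy
sketch under stated assumptions: struct demo_action and the demo_*
functions are hypothetical stand-ins, not the kernel's struct tc_action
or tcf_action_destroy(), and the single counter ignores tcfa_bindcnt.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an action stored in the permanent table. */
struct demo_action {
	int refcnt;
	bool present;	/* still in the permanent table? */
};

/* Toy cleanup: drop one reference, remove the action at zero. */
void demo_action_destroy(struct demo_action *a)
{
	a->refcnt -= 1;
	if (a->refcnt <= 0)
		a->present = false;
}

/* GET without a temporary reference: cleanup drops the only ref. */
bool demo_get_without_temp_ref(struct demo_action *a)
{
	demo_action_destroy(a);
	return a->present;
}

/* GET that takes a temporary reference before the cleanup runs. */
bool demo_get_with_temp_ref(struct demo_action *a)
{
	a->refcnt += 1;
	demo_action_destroy(a);
	return a->present;
}
```

In this model an action with one reference is gone after the first
variant and survives the second with its count unchanged, which mirrors
the "retrieve it again and find it is gone" symptom.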

^ permalink raw reply

* Re: [PATCH 12/15] sis900: use IS_ENABLED() instead of checking for built-in or module
From: Daniele Venzano @ 2016-09-12 17:10 UTC (permalink / raw)
  To: Javier Martinez Canillas, linux-kernel; +Cc: netdev
In-Reply-To: <1473689026-6983-13-git-send-email-javier@osg.samsung.com>



On 12 September 2016 16:03:43 CEST, Javier Martinez Canillas <javier@osg.samsung.com> wrote:
>The IS_ENABLED() macro checks if a Kconfig symbol has been enabled
>either
>built-in or as a module, use that macro instead of open coding the
>same.
>
>Using the macro makes the code more readable by helping abstract away
>some
>of the Kconfig built-in and module enable details.
>
>Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>

Acked-by: Daniele Venzano <venza@brownhat.org>

^ permalink raw reply

* [PATCH net-next 2/2] errqueue: include linux/time.h
From: Willem de Bruijn @ 2016-09-12 17:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, linux-kernel, john.stultz, bmoses, Willem de Bruijn
In-Reply-To: <1473699930-58865-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

struct scm_timestamping has fields of type struct timespec. Now that
it is safe to include linux/time.h and time.h at the same time,
include linux/time.h directly in linux/errqueue.h

Without this patch, when compiling the following program after
make headers_install:

gcc -Wall -Werror -Iusr/include -c -xc - <<EOF
  #include <linux/errqueue.h>
  static struct scm_timestamping tss;
  int main(void) { tss.ts[0].tv_sec = 1; return 0; }
EOF

gcc gives this error:

  In file included from <stdin>:1:0:
  usr/include/linux/errqueue.h:33:18: error: array type has incomplete element type
    struct timespec ts[3];

Reported-by: Brooks Moses <bmoses@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/errqueue.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index 07bdce1..abafec8 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -1,6 +1,7 @@
 #ifndef _UAPI_LINUX_ERRQUEUE_H
 #define _UAPI_LINUX_ERRQUEUE_H
 
+#include <linux/time.h>
 #include <linux/types.h>
 
 struct sock_extended_err {
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next 1/2] uapi glibc compat: make linux/time.h compile with user time.h files
From: Willem de Bruijn @ 2016-09-12 17:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, linux-kernel, john.stultz, bmoses, Willem de Bruijn
In-Reply-To: <1473699930-58865-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Add libc-compat workaround for definitions in linux/time.h that
duplicate those in libc time.h, sys/time.h and bits/time.h.

With this change, userspace builds succeed when linux/time.h is
included after libc time.h and when it is included after sys/time.h.

The inverse requires additional changes to those userspace headers.

Without this patch, when compiling the following program after
make headers_install:

  echo -e "#include <time.h>\n#include <linux/time.h>" | \
  	gcc -Wall -Werror -Iusr/include -c -xc -

gcc gives these errors:

  #include <time.h>
  #include <linux/time.h>

    In file included from ../test_time.c:3:0:
    /usr/include/time.h:120:8: error: redefinition of ‘struct timespec’
     struct timespec
    	^
    In file included from ../test_time.c:2:0:
    ./usr/include/linux/time.h:9:8: note: originally defined here
     struct timespec {
    	^
    In file included from ../test_time.c:3:0:
    /usr/include/time.h:161:8: error: redefinition of ‘struct itimerspec’
     struct itimerspec
    	^
    In file included from ../test_time.c:2:0:
    ./usr/include/linux/time.h:34:8: note: originally defined here
     struct itimerspec {

and this warning by indirect inclusion of bits/time.h:

    In file included from ../test_time.c:4:0:
    ./usr/include/linux/time.h:67:0: error: "TIMER_ABSTIME" redefined [-Werror]
     #define TIMER_ABSTIME   0x01
     ^
    In file included from /usr/include/time.h:41:0,
    		 from ../test_time.c:3:
    /usr/include/x86_64-linux-gnu/bits/time.h:82:0: note: this is the location of the previous definition
     #   define TIMER_ABSTIME  1
     ^

The _SYS_TIME_H variant resolves similar errors for timeval, timezone,
itimerval and warnings for ITIMER_REAL, ITIMER_VIRTUAL, ITIMER_PROF.

Ran the same program for sys/time.h and bits/time.h.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/libc-compat.h | 50 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/time.h        | 15 ++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/include/uapi/linux/libc-compat.h b/include/uapi/linux/libc-compat.h
index 44b8a6b..b08b0f5 100644
--- a/include/uapi/linux/libc-compat.h
+++ b/include/uapi/linux/libc-compat.h
@@ -165,6 +165,43 @@
 #define __UAPI_DEF_XATTR		1
 #endif
 
+/* Definitions for time.h */
+#if defined(__timespec_defined)
+#define __UAPI_DEF_TIMESPEC		0
+#else
+#define __UAPI_DEF_TIMESPEC		1
+#endif
+
+#if defined(_TIME_H) && defined(__USE_POSIX199309)
+#define __UAPI_DEF_ITIMERSPEC		0
+#else
+#define __UAPI_DEF_ITIMERSPEC		1
+#endif
+
+/* Definitions for sys/time.h */
+#if defined(_SYS_TIME_H)
+#define __UAPI_DEF_TIMEVAL		0
+#define __UAPI_DEF_ITIMERVAL		0
+#define __UAPI_DEF_ITIMER_WHICH		0
+#else
+#define __UAPI_DEF_TIMEVAL		1
+#define __UAPI_DEF_ITIMERVAL		1
+#define __UAPI_DEF_ITIMER_WHICH		1
+#endif
+
+/* Definitions for bits/time.h */
+#if defined(_BITS_TIME_H)
+#define __UAPI_DEF_ABSTIME		0
+#else
+#define __UAPI_DEF_ABSTIME		1
+#endif
+
+#if defined(_SYS_TIME_H) && defined(__USE_BSD)
+#define __UAPI_DEF_TIMEZONE		0
+#else
+#define __UAPI_DEF_TIMEZONE		1
+#endif
+
 /* If we did not see any headers from any supported C libraries,
  * or we are being included in the kernel, then define everything
  * that we need. */
@@ -208,6 +245,19 @@
 /* Definitions for xattr.h */
 #define __UAPI_DEF_XATTR		1
 
+/* Definitions for time.h */
+#define __UAPI_DEF_TIMESPEC		1
+#define __UAPI_DEF_ITIMERSPEC		1
+
+/* Definitions for sys/time.h */
+#define __UAPI_DEF_TIMEVAL		1
+#define __UAPI_DEF_ITIMERVAL		1
+#define __UAPI_DEF_ITIMER_WHICH		1
+#define __UAPI_DEF_TIMEZONE		1
+
+/* Definitions for bits/time.h */
+#define __UAPI_DEF_ABSTIME		1
+
 #endif /* __GLIBC__ */
 
 #endif /* _UAPI_LIBC_COMPAT_H */
diff --git a/include/uapi/linux/time.h b/include/uapi/linux/time.h
index e75e1b6..4e7333c 100644
--- a/include/uapi/linux/time.h
+++ b/include/uapi/linux/time.h
@@ -1,9 +1,11 @@
 #ifndef _UAPI_LINUX_TIME_H
 #define _UAPI_LINUX_TIME_H
 
+#include <linux/libc-compat.h>
 #include <linux/types.h>
 
 
+#if __UAPI_DEF_TIMESPEC
 #ifndef _STRUCT_TIMESPEC
 #define _STRUCT_TIMESPEC
 struct timespec {
@@ -11,35 +13,46 @@ struct timespec {
 	long		tv_nsec;		/* nanoseconds */
 };
 #endif
+#endif
 
+#if __UAPI_DEF_TIMEVAL
 struct timeval {
 	__kernel_time_t		tv_sec;		/* seconds */
 	__kernel_suseconds_t	tv_usec;	/* microseconds */
 };
+#endif
 
+#if __UAPI_DEF_TIMEZONE
 struct timezone {
 	int	tz_minuteswest;	/* minutes west of Greenwich */
 	int	tz_dsttime;	/* type of dst correction */
 };
+#endif
 
 
 /*
  * Names of the interval timers, and structure
  * defining a timer setting:
  */
+#if __UAPI_DEF_ITIMER_WHICH
 #define	ITIMER_REAL		0
 #define	ITIMER_VIRTUAL		1
 #define	ITIMER_PROF		2
+#endif
 
+#if __UAPI_DEF_ITIMERSPEC
 struct itimerspec {
 	struct timespec it_interval;	/* timer period */
 	struct timespec it_value;	/* timer expiration */
 };
+#endif
 
+#if __UAPI_DEF_ITIMERVAL
 struct itimerval {
 	struct timeval it_interval;	/* timer interval */
 	struct timeval it_value;	/* current value */
 };
+#endif
 
 /*
  * The IDs of the various system clocks (for POSIX.1b interval timers):
@@ -64,6 +77,8 @@ struct itimerval {
 /*
  * The various flags for setting POSIX.1b interval timers:
  */
+#if __UAPI_DEF_ABSTIME
 #define TIMER_ABSTIME			0x01
+#endif
 
 #endif /* _UAPI_LINUX_TIME_H */
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next 0/2] uapi: include time.h from errqueue.h
From: Willem de Bruijn @ 2016-09-12 17:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, linux-kernel, john.stultz, bmoses, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

It was reported that linux/errqueue.h requires linux/time.h, but that
adding the include directly may cause userspace conflicts between
linux/time.h and glibc time.h:

  https://lkml.org/lkml/2016/7/10/10

Address the conflicts using the standard libc-compat approach, then
add the #include to errqueue.h
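
As a rough illustration of the libc-compat approach, the guard pattern
can be reduced to one translation unit. This is a sketch under stated
assumptions: every DEMO_* and demo_* name is hypothetical and merely
stands in for the libc include guard, the __UAPI_DEF_* switch and the
guarded uapi definition.

```c
/* --- stands in for the libc header (e.g. sys/time.h) --- */
#define DEMO_LIBC_SYS_TIME_H
struct demo_timeval {
	long tv_sec;
	long tv_usec;
};

/* --- stands in for linux/libc-compat.h --- */
#if defined(DEMO_LIBC_SYS_TIME_H)
#define DEMO_UAPI_DEF_TIMEVAL 0	/* libc already provided it */
#else
#define DEMO_UAPI_DEF_TIMEVAL 1	/* the uapi header must provide it */
#endif

/* --- stands in for linux/time.h --- */
#if DEMO_UAPI_DEF_TIMEVAL
struct demo_timeval {
	long tv_sec;
	long tv_usec;
};
#endif

/* Reports whether the uapi side emitted its own definition. */
int demo_uapi_defines_timeval(void)
{
	return DEMO_UAPI_DEF_TIMEVAL;
}

/* Uses the struct to show that exactly one definition is visible. */
long demo_seconds(void)
{
	struct demo_timeval tv = { 5, 0 };
	return tv.tv_sec;
}
```

Because the "libc" section comes first, the switch evaluates to 0 and the
second definition is skipped, so there is no redefinition error. The
reverse include order is not covered by this pattern, as the patch
description notes.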

The first patch is a resubmit. It was previously submitted to
tip/timers/core, but given the commit history, the maintainer
suggested this tree, instead.

  https://lkml.org/lkml/2016/8/10/748

This also allows sending the follow-up as part of the patchset.

Willem de Bruijn (2):
  uapi glibc compat: make linux/time.h compile with user time.h files
  errqueue: include linux/time.h

 include/uapi/linux/errqueue.h    |  1 +
 include/uapi/linux/libc-compat.h | 50 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/time.h        | 15 ++++++++++++
 3 files changed, 66 insertions(+)

-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply

* Re: [RFC V3 PATCH 22/26] net/netpolicy: set per task policy by proc
From: Sergei Shtylyov @ 2016-09-12 17:01 UTC (permalink / raw)
  To: kan.liang, davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi
In-Reply-To: <1473692159-4017-23-git-send-email-kan.liang@intel.com>

On 09/12/2016 05:55 PM, kan.liang@intel.com wrote:

> From: Kan Liang <kan.liang@intel.com>
>
> Users may not want to change the source code to add per task net polic

    Policy?

> support. Or they may want to change a running task's net policy. prctl
> does not work for both cases.
>
> This patch adds an interface in /proc, which can be used to set and
> retrieve policy of already running tasks. User can write the policy name
> into /proc/$PID/net_policy to set per task net policy.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>

[...]

MBR, Sergei

^ permalink raw reply

* Re: [RFC 00/11] QLogic RDMA Driver (qedr) RFC
From: Parav Pandit @ 2016-09-12 16:49 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ram Amrani, Doug Ledford, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	Yuval.Mintz-h88ZbnxC6KDQT0dZR+AlfA,
	Ariel.Elior-h88ZbnxC6KDQT0dZR+AlfA,
	Michal.Kalderon-h88ZbnxC6KDQT0dZR+AlfA,
	rajesh.borundia-h88ZbnxC6KDQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160912163928.GK8812-2ukJVAZIZ/Y@public.gmane.org>

On Mon, Sep 12, 2016 at 10:09 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Sep 12, 2016 at 07:07:34PM +0300, Ram Amrani wrote:
>
>>  include/linux/qed/common_hsi.h                 |    1 +
>>  include/linux/qed/qed_if.h                     |    9 +-
>>  include/linux/qed/qed_ll2_if.h                 |  140 +
>>  include/linux/qed/qed_roce_if.h                |  604 ++++
>>  include/linux/qed/qede_roce.h                  |   88 +
>>  include/linux/qed/rdma_common.h                |    1 +
>
> Something not directly related to your patches, but they brought my
> attention to the fact that all these new (and old) rdma<->net devices
> are polluting include/linux
>
The ocrdma driver includes be_roce.h, which is located in
net/ethernet/emulex/benet instead of include/linux/.
This file helps to bind the rdma device to the net device or underlying hw device.

Maybe a similar change can be made for the rest of the drivers for
rdma<-->net devices?


> Filtered output:
> ➜  linux-rdma git:(topic/fixes-for-4.8-2) ls -dl include/linux/*/
> drwxrwxr-x  2 leonro leonro  4096 Aug 30 16:27 include/linux/hsi/
> drwxrwxr-x  2 leonro leonro  4096 Sep 12 19:08 include/linux/mlx4/
> drwxrwxr-x  2 leonro leonro  4096 Sep  7 15:31 include/linux/mlx5/
> drwxrwxr-x  2 leonro leonro  4096 Sep  8 17:46 include/linux/qed/
>
> Is this the right place for them?
>
> Thanks

^ permalink raw reply

* Re: [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information
From: Sergei Shtylyov @ 2016-09-12 16:48 UTC (permalink / raw)
  To: kan.liang, davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi
In-Reply-To: <1473692159-4017-4-git-send-email-kan.liang@intel.com>

Hello.

On 09/12/2016 05:55 PM, kan.liang@intel.com wrote:

> From: Kan Liang <kan.liang@intel.com>
>
> Net policy needs to know device information. Currently, it's enough to
> only get irq information of rx and tx queues.
>
> This patch introduces ndo ops to do so, not ethtool ops.
> Because there are already several ways to get irq information in
> userspace. It's not necessory to extend the ethtool.

    Necessary.

> Signed-off-by: Kan Liang <kan.liang@intel.com>

[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH 13/15] stmmac: use IS_ENABLED() instead of checking for built-in or module
From: Alexandre Torgue @ 2016-09-12 16:47 UTC (permalink / raw)
  To: Javier Martinez Canillas, linux-kernel; +Cc: netdev, Giuseppe Cavallaro
In-Reply-To: <1473689026-6983-14-git-send-email-javier@osg.samsung.com>

Hi Javier,

On 09/12/2016 04:03 PM, Javier Martinez Canillas wrote:
> The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
> built-in or as a module, use that macro instead of open coding the same.
>
> Using the macro makes the code more readable by helping abstract away some
> of the Kconfig built-in and module enable details.
>
> Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
> ---
>
>  drivers/net/ethernet/stmicro/stmmac/common.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
> index 2533b91f1421..d3292c4a6eda 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/common.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
> @@ -30,7 +30,7 @@
>  #include <linux/stmmac.h>
>  #include <linux/phy.h>
>  #include <linux/module.h>
> -#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
> +#if IS_ENABLED(CONFIG_VLAN_8021Q)
>  #define STMMAC_VLAN_TAG_USED
>  #include <linux/if_vlan.h>
>  #endif
>

Reviewed-by: Alexandre TORGUE <alexandre.torgue@st.com>

Thanks,

Alex
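
For anyone curious how one macro can cover both the built-in and the
module case, here is a simplified userspace sketch of the idea. It is an
assumption-laden illustration, not a copy of the kernel's kconfig.h: all
DEMO_* and demo_* names and the CONFIG_DEMO_* symbols are made up for
this example.

```c
/* A symbol defined to 1 turns into "0," and shifts the argument list. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined(x) demo_is_defined_(x)
#define demo_is_defined_(val) demo_is_defined__(DEMO_ARG_PLACEHOLDER_##val)
#define demo_is_defined__(arg1_or_junk) \
	demo_take_second_arg(arg1_or_junk 1, 0, 0)

#define DEMO_IS_BUILTIN(option) demo_is_defined(option)
#define DEMO_IS_MODULE(option)  demo_is_defined(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

#define CONFIG_DEMO_BUILTIN 1		/* as if set to =y */
#define CONFIG_DEMO_MOD_MODULE 1	/* as if set to =m */
/* CONFIG_DEMO_OFF is deliberately not defined */

/* One test replaces "defined(X) || defined(X_MODULE)". */
#if DEMO_IS_ENABLED(CONFIG_DEMO_MOD)
#define DEMO_TAG_USED 1
#else
#define DEMO_TAG_USED 0
#endif

int demo_builtin_enabled(void) { return DEMO_IS_ENABLED(CONFIG_DEMO_BUILTIN); }
int demo_module_enabled(void)  { return DEMO_IS_ENABLED(CONFIG_DEMO_MOD); }
int demo_off_enabled(void)     { return DEMO_IS_ENABLED(CONFIG_DEMO_OFF); }
int demo_tag_used(void)        { return DEMO_TAG_USED; }
```

The macro expands to a constant expression, so it works both in #if
directives and in ordinary C conditions.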

^ permalink raw reply

* Re: [RFC 00/11] QLogic RDMA Driver (qedr) RFC
From: Leon Romanovsky @ 2016-09-12 16:39 UTC (permalink / raw)
  To: Ram Amrani
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	Yuval.Mintz-h88ZbnxC6KDQT0dZR+AlfA,
	Ariel.Elior-h88ZbnxC6KDQT0dZR+AlfA,
	Michal.Kalderon-h88ZbnxC6KDQT0dZR+AlfA,
	rajesh.borundia-h88ZbnxC6KDQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1473696465-27986-1-git-send-email-Ram.Amrani-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]

On Mon, Sep 12, 2016 at 07:07:34PM +0300, Ram Amrani wrote:

>  include/linux/qed/common_hsi.h                 |    1 +
>  include/linux/qed/qed_if.h                     |    9 +-
>  include/linux/qed/qed_ll2_if.h                 |  140 +
>  include/linux/qed/qed_roce_if.h                |  604 ++++
>  include/linux/qed/qede_roce.h                  |   88 +
>  include/linux/qed/rdma_common.h                |    1 +

Something not directly related to your patches, but they brought my
attention to the fact that all these new (and old) rdma<->net devices
are polluting include/linux

Filtered output:
➜  linux-rdma git:(topic/fixes-for-4.8-2) ls -dl include/linux/*/
drwxrwxr-x  2 leonro leonro  4096 Aug 30 16:27 include/linux/hsi/
drwxrwxr-x  2 leonro leonro  4096 Sep 12 19:08 include/linux/mlx4/
drwxrwxr-x  2 leonro leonro  4096 Sep  7 15:31 include/linux/mlx5/
drwxrwxr-x  2 leonro leonro  4096 Sep  8 17:46 include/linux/qed/

Is this the right place for them?

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: stmmac/RTL8211F/Meson GXBB: TX throughput problems
From: Alexandre Torgue @ 2016-09-12 16:37 UTC (permalink / raw)
  To: Martin Blumenstingl, netdev, linux-amlogic
  Cc: Giuseppe Cavallaro, Johnson Leung
In-Reply-To: <CAFBinCALSL_-izJ2tEsAjevAw90kGNTrQ3na3D1YV8f1dS1=Xg@mail.gmail.com>

Hi Martin,


On 09/11/2016 10:39 PM, Martin Blumenstingl wrote:
> Hello,
>
> I have a device with a Meson GXBB SoC with an stmmac IP block.

Which Synopsys IP version do you use?

> Gbit ethernet on my device is provided by a Realtek RTL8211F RGMII PHY.
> Similar issues were reported in #linux-amlogic by a user with an
> Odroid C2 board (= similar hardware).
>
> The symptoms are:
> Receiving data is plenty fast (I can max out my internet connection
> easily, and with iperf3 I get ~900Mbit/s).
> Transmitting data from the device is unfortunately very slow, traffic
> sometimes even stalls completely.
>
> I have attached the iperf results and the output of
> /sys/kernel/debug/stmmaceth/eth0/descriptors_status.
> Below you can find the ifconfig, netstat and stmmac dma_cap info
> (*after* I ran all tests).
>
> The "involved parties" are:
> - Meson GXBB specific network configuration registers (I have
> double-checked them with the reference drivers: everything seems fine
> here)
> - stmmac: it seems that nobody else has reported this kind of issue
> so far, however I'd still like to hear where I should enable some
> debugging bits to rule out any stmmac bug

On my side, I just tested on the same "kind" of system:
-SYNOPSYS GMAC 3.7
-RTL8211EG as PHY

With iperf, I reach:
	-RX: 932 Mbps
	-TX: 820Mbps

Can you check ethtool -S eth0 (more precisely, the "MMC" counters and errors)?
Which kernel version do you use?

Regards

Alex




> - RTL8211F PHY driver: unfortunately there are no public datasheets
> available so this is hard to debug. but I'm guessing that TX delay
> could cause similar issues, so this may be the cause as well.
>
>
> Thanks for any input in advance!
> Regards,
> Martin
>
>
> [root@alarm ~]# ifconfig eth0
> eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 192.168.1.235  netmask 255.255.255.0  broadcast 192.168.1.255
>         ether e2:aa:53:fc:f5:c5  txqueuelen 1000  (Ethernet)
>         RX packets 1967602  bytes 2968750265 (2.7 GiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 101875  bytes 8548285 (8.1 MiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 18
>
> [root@alarm ~]# netstat -i
> Kernel Interface table
> Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
> eth0      1500  1967801      0      0 0        101934      0      0      0 BMRU
>
> [root@alarm ~]# cat /sys/kernel/debug/stmmaceth/eth0/dma_cap
> ==============================
>         DMA HW features
> ==============================
>         10/100 Mbps Y
>         1000 Mbps Y
>         Half duple Y
>         Hash Filter: Y
>         Multiple MAC address registers: Y
>         PCS (TBI/SGMII/RTBI PHY interfatces): N
>         SMA (MDIO) Interface: Y
>         PMT Remote wake up: Y
>         PMT Magic Frame: Y
>         RMON module: Y
>         IEEE 1588-2002 Time Stamp: N
>         IEEE 1588-2008 Advanced Time Stamp:N
>         802.3az - Energy-Efficient Ethernet (EEE) Y
>         AV features: N
>         Checksum Offload in TX: Y
>         IP Checksum Offload (type1) in RX: N
>         IP Checksum Offload (type2) in RX: Y
>         RXFIFO > 2048bytes: Y
>         Number of Additional RX channel: 0
>         Number of Additional TX channel: 0
>         Enhanced descriptors: N
>

^ permalink raw reply

