Netdev List
 help / color / mirror / Atom feed
* Proposal
From: Teresa Au @ 2016-11-18  4:34 UTC (permalink / raw)




Business Partnership Proposal For You,contact me via my personal E-mail for further 
detail's: ms_teresa_au17@outlook.com

^ permalink raw reply

* Re: [PATCH net-next v3 4/7] vxlan: improve vxlan route lookup checks.
From: Pravin Shelar @ 2016-11-18  5:30 UTC (permalink / raw)
  To: David Laight; +Cc: Jiri Benc, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0222355@AcuExch.aculab.com>

On Thu, Nov 17, 2016 at 2:17 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Of Jiri Benc
>> Sent: 15 November 2016 14:40
>> On Sun, 13 Nov 2016 20:43:55 -0800, Pravin B Shelar wrote:
>> > @@ -1929,8 +1951,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>> >     union vxlan_addr *src;
>> >     struct vxlan_metadata _md;
>> >     struct vxlan_metadata *md = &_md;
>> > -   struct dst_entry *ndst = NULL;
>> >     __be16 src_port = 0, dst_port;
>> > +   struct dst_entry *ndst = NULL;
>> >     __be32 vni, label;
>> >     __be16 df = 0;
>> >     __u8 tos, ttl;
>>
>> This looks kind of arbitrary. You might want to remove this hunk or
>> merge it to patch 3.
>
> Worse than arbitrary, it adds 4 bytes of pad on 64bit systems.
>
OK. I will send out a patch.
But this is not real issue in vxlan module today ;)

^ permalink raw reply

* RE: [PATCH] net: fec: Detect and recover receive queue hangs
From: Andy Duan @ 2016-11-18  6:44 UTC (permalink / raw)
  To: Chris Lesiak
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jaccon Bastiaansen
In-Reply-To: <1479417282-15540-1-git-send-email-chris.lesiak@licor.com>

From: Chris Lesiak <chris.lesiak@licor.com> Sent: Friday, November 18, 2016 5:15 AM
 >To: Andy Duan <fugang.duan@nxp.com>
 >Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jaccon
 >Bastiaansen <jaccon.bastiaansen@gmail.com>; chris.lesiak@licor.com
 >Subject: [PATCH] net: fec: Detect and recover receive queue hangs
 >
 >This corrects a problem that appears to be similar to ERR006358.  But while
 >ERR006358 is a race when the tx queue transitions from empty to not empty,
 >this problem is a race when the rx queue transitions from full to not full.
 >
 >The symptom is a receive queue that is stuck.  The ENET_RDAR register will
 >read 0, indicating that there are no empty receive descriptors in the receive
 >ring.  Since no additional frames can be queued, no RXF interrupts occur.
 >
 >This problem can be triggered with a 1 Gb link and about 400 Mbps of traffic.
 >
 >This patch detects this condition, sets the work_rx bit, and reschedules the
 >poll method.
 >
 >Signed-off-by: Chris Lesiak <chris.lesiak@licor.com>
 >---
 > drivers/net/ethernet/freescale/fec_main.c | 31
 >+++++++++++++++++++++++++++++++
 > 1 file changed, 31 insertions(+)
 >
Firstly, how to reproduce the issue, pls list the reproduce steps. Thanks.
Secondly, pls check below comments.

 >diff --git a/drivers/net/ethernet/freescale/fec_main.c
 >b/drivers/net/ethernet/freescale/fec_main.c
 >index fea0f33..8a87037 100644
 >--- a/drivers/net/ethernet/freescale/fec_main.c
 >+++ b/drivers/net/ethernet/freescale/fec_main.c
 >@@ -1588,6 +1588,34 @@ fec_enet_interrupt(int irq, void *dev_id)
 > 	return ret;
 > }
 >
 >+static inline bool
 >+fec_enet_recover_rxq(struct fec_enet_private *fep, u16 queue_id) {
 >+	int work_bit = (queue_id == 0) ? 2 : ((queue_id == 1) ? 0 : 1);
 >+
 >+	if (readl(fep->rx_queue[queue_id]->bd.reg_desc_active))
If rx ring is really empty in slight throughput cases,  rdar is always cleared, then there always do napi reschedule.

 >+		return false;
 >+
 >+	dev_notice_once(&fep->pdev->dev, "Recovered rx queue\n");
 >+
 >+	fep->work_rx |= 1 << work_bit;
 >+
 >+	return true;
 >+}
 >+
 >+static inline bool fec_enet_recover_rxqs(struct fec_enet_private *fep)
 >+{
 >+	unsigned int q;
 >+	bool ret = false;
 >+
 >+	for (q = 0; q < fep->num_rx_queues; q++) {
 >+		if (fec_enet_recover_rxq(fep, q))
 >+			ret = true;
 >+	}
 >+
 >+	return ret;
 >+}
 >+
 > static int fec_enet_rx_napi(struct napi_struct *napi, int budget)  {
 > 	struct net_device *ndev = napi->dev;
 >@@ -1601,6 +1629,9 @@ static int fec_enet_rx_napi(struct napi_struct *napi,
 >int budget)
 > 	if (pkts < budget) {
 > 		napi_complete(napi);
 > 		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
 >+
 >+		if (fec_enet_recover_rxqs(fep) && napi_reschedule(napi))
 >+			writel(FEC_NAPI_IMASK, fep->hwp + FEC_IMASK);
 > 	}
 > 	return pkts;
 > }
 >--
 >2.5.5

^ permalink raw reply

* [PATCH net-next 0/4] geneve: Use LWT more effectively.
From: Pravin B Shelar @ 2016-11-18  7:10 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar

Following patch series make use of geneve LWT code path for
geneve netdev type of device.
This allows us to simplify geneve module.

Pravin B Shelar (4):
  geneve: Unify LWT and netdev handling.
  geneve: Merge ipv4 and ipv6 geneve_build_skb()
  geneve: Remove redundant socket checks.
  geneve: Optimize geneve device lookup.

 drivers/net/geneve.c | 678 +++++++++++++++++++++------------------------------
 1 file changed, 273 insertions(+), 405 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net-next 2/4] geneve: Merge ipv4 and ipv6 geneve_build_skb()
From: Pravin B Shelar @ 2016-11-18  7:10 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar
In-Reply-To: <1479453029-29619-1-git-send-email-pshelar@ovn.org>

There are minimal difference in building Geneve header
between ipv4 and ipv6 geneve tunnels. Following patch
refactors code to unify it.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
---
 drivers/net/geneve.c | 100 ++++++++++++++-------------------------------------
 1 file changed, 26 insertions(+), 74 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 622cf3b..9a4351c 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -630,67 +630,34 @@ static int geneve_stop(struct net_device *dev)
 }
 
 static void geneve_build_header(struct genevehdr *geneveh,
-				__be16 tun_flags, u8 vni[3],
-				u8 options_len, u8 *options)
+				const struct ip_tunnel_info *info)
 {
 	geneveh->ver = GENEVE_VER;
-	geneveh->opt_len = options_len / 4;
-	geneveh->oam = !!(tun_flags & TUNNEL_OAM);
-	geneveh->critical = !!(tun_flags & TUNNEL_CRIT_OPT);
+	geneveh->opt_len = info->options_len / 4;
+	geneveh->oam = !!(info->key.tun_flags & TUNNEL_OAM);
+	geneveh->critical = !!(info->key.tun_flags & TUNNEL_CRIT_OPT);
 	geneveh->rsvd1 = 0;
-	memcpy(geneveh->vni, vni, 3);
+	tunnel_id_to_vni(info->key.tun_id, geneveh->vni);
 	geneveh->proto_type = htons(ETH_P_TEB);
 	geneveh->rsvd2 = 0;
 
-	memcpy(geneveh->options, options, options_len);
+	ip_tunnel_info_opts_get(geneveh->options, info);
 }
 
-static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
-			    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-			    bool xnet)
-{
-	bool udp_sum = !!(tun_flags & TUNNEL_CSUM);
-	struct genevehdr *gnvh;
-	int min_headroom;
-	int err;
-
-	skb_scrub_packet(skb, xnet);
-
-	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
-			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr);
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err))
-		goto free_rt;
-
-	err = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (err)
-		goto free_rt;
-
-	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
-	geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
-	return 0;
-
-free_rt:
-	ip_rt_put(rt);
-	return err;
-}
-
-#if IS_ENABLED(CONFIG_IPV6)
-static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
-			     __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-			     bool xnet)
+static int geneve_build_skb(struct dst_entry *dst, struct sk_buff *skb,
+			    const struct ip_tunnel_info *info,
+			    bool xnet, int ip_hdr_len)
 {
-	bool udp_sum = !!(tun_flags & TUNNEL_CSUM);
+	bool udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM);
 	struct genevehdr *gnvh;
 	int min_headroom;
 	int err;
 
+	skb_reset_mac_header(skb);
 	skb_scrub_packet(skb, xnet);
 
-	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
-			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct ipv6hdr);
+	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len +
+		       GENEVE_BASE_HLEN + info->options_len + ip_hdr_len;
 	err = skb_cow_head(skb, min_headroom);
 	if (unlikely(err))
 		goto free_dst;
@@ -699,9 +666,9 @@ static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 	if (err)
 		goto free_dst;
 
-	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
-	geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
-
+	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) +
+						   info->options_len);
+	geneve_build_header(gnvh, info);
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 	return 0;
 
@@ -709,12 +676,11 @@ static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 	dst_release(dst);
 	return err;
 }
-#endif
 
 static struct rtable *geneve_get_v4_rt(struct sk_buff *skb,
 				       struct net_device *dev,
 				       struct flowi4 *fl4,
-				       struct ip_tunnel_info *info)
+				       const struct ip_tunnel_info *info)
 {
 	bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
 	struct geneve_dev *geneve = netdev_priv(dev);
@@ -738,7 +704,7 @@ static struct rtable *geneve_get_v4_rt(struct sk_buff *skb,
 	}
 	fl4->flowi4_tos = RT_TOS(tos);
 
-	dst_cache = &info->dst_cache;
+	dst_cache = (struct dst_cache *)&info->dst_cache;
 	if (use_cache) {
 		rt = dst_cache_get_ip4(dst_cache, &fl4->saddr);
 		if (rt)
@@ -763,7 +729,7 @@ static struct rtable *geneve_get_v4_rt(struct sk_buff *skb,
 static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb,
 					   struct net_device *dev,
 					   struct flowi6 *fl6,
-					   struct ip_tunnel_info *info)
+					   const struct ip_tunnel_info *info)
 {
 	bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
 	struct geneve_dev *geneve = netdev_priv(dev);
@@ -789,7 +755,7 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb,
 
 	fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
 					   info->key.label);
-	dst_cache = &info->dst_cache;
+	dst_cache = (struct dst_cache *)&info->dst_cache;
 	if (use_cache) {
 		dst = dst_cache_get_ip6(dst_cache, &fl6->saddr);
 		if (dst)
@@ -812,7 +778,8 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb,
 #endif
 
 static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
-			   struct geneve_dev *geneve, struct ip_tunnel_info *info)
+			   struct geneve_dev *geneve,
+			   const struct ip_tunnel_info *info)
 {
 	bool xnet = !net_eq(geneve->net, dev_net(geneve->dev));
 	struct geneve_sock *gs4 = rcu_dereference(geneve->sock4);
@@ -820,11 +787,9 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	struct rtable *rt;
 	int err = -EINVAL;
 	struct flowi4 fl4;
-	u8 *opts = NULL;
 	__u8 tos, ttl;
 	__be16 sport;
 	__be16 df;
-	u8 vni[3];
 
 	if (!gs4)
 		return err;
@@ -843,13 +808,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	}
 	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
 
-	tunnel_id_to_vni(key->tun_id, vni);
-	if (info->options_len)
-		opts = ip_tunnel_info_opts(info);
-
-	skb_reset_mac_header(skb);
-	err = geneve_build_skb(rt, skb, key->tun_flags, vni,
-			       info->options_len, opts, xnet);
+	err = geneve_build_skb(&rt->dst, skb, info, xnet, sizeof(struct iphdr));
 	if (unlikely(err))
 		return err;
 
@@ -862,7 +821,8 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 
 #if IS_ENABLED(CONFIG_IPV6)
 static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
-			    struct geneve_dev *geneve, struct ip_tunnel_info *info)
+			    struct geneve_dev *geneve,
+			    const struct ip_tunnel_info *info)
 {
 	bool xnet = !net_eq(geneve->net, dev_net(geneve->dev));
 	struct geneve_sock *gs6 = rcu_dereference(geneve->sock6);
@@ -870,10 +830,8 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	struct dst_entry *dst = NULL;
 	int err = -EINVAL;
 	struct flowi6 fl6;
-	u8 *opts = NULL;
 	__u8 prio, ttl;
 	__be16 sport;
-	u8 vni[3];
 
 	if (!gs6)
 		return err;
@@ -891,13 +849,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 					   ip_hdr(skb), skb);
 		ttl = key->ttl ? : ip6_dst_hoplimit(dst);
 	}
-	tunnel_id_to_vni(key->tun_id, vni);
-	if (info->options_len)
-		opts = ip_tunnel_info_opts(info);
-
-	skb_reset_mac_header(skb);
-	err = geneve6_build_skb(dst, skb, key->tun_flags, vni,
-				info->options_len, opts, xnet);
+	err = geneve_build_skb(dst, skb, info, xnet, sizeof(struct iphdr));
 	if (unlikely(err))
 		return err;
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 1/4] geneve: Unify LWT and netdev handling.
From: Pravin B Shelar @ 2016-11-18  7:10 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar
In-Reply-To: <1479453029-29619-1-git-send-email-pshelar@ovn.org>

Current geneve implementation has two separate cases to handle.
1. netdev xmit
2. LWT xmit.

In case of netdev, geneve configuration is stored in various
struct geneve_dev members. For example geneve_addr, ttl, tos,
label, flags, dst_cache, etc. For LWT ip_tunnel_info is passed
to the device in ip_tunnel_info.

Following patch uses ip_tunnel_info struct to store almost all
of configuration of a geneve netdevice. This allows us to unify
most of geneve driver code around ip_tunnel_info struct.
This dramatically simplify geneve code, since it does not
need to handle two different configuration cases. Removes
duplicate code, single code path can handle either type
of geneve devices.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
---
 drivers/net/geneve.c | 611 ++++++++++++++++++++++-----------------------------
 1 file changed, 262 insertions(+), 349 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 85a423a..622cf3b 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -45,41 +45,22 @@ struct geneve_net {
 
 static int geneve_net_id;
 
-union geneve_addr {
-	struct sockaddr_in sin;
-	struct sockaddr_in6 sin6;
-	struct sockaddr sa;
-};
-
-static union geneve_addr geneve_remote_unspec = { .sa.sa_family = AF_UNSPEC, };
-
 /* Pseudo network device */
 struct geneve_dev {
 	struct hlist_node  hlist;	/* vni hash table */
 	struct net	   *net;	/* netns for packet i/o */
 	struct net_device  *dev;	/* netdev for geneve tunnel */
+	struct ip_tunnel_info info;
 	struct geneve_sock __rcu *sock4;	/* IPv4 socket used for geneve tunnel */
 #if IS_ENABLED(CONFIG_IPV6)
 	struct geneve_sock __rcu *sock6;	/* IPv6 socket used for geneve tunnel */
 #endif
-	u8                 vni[3];	/* virtual network ID for tunnel */
-	u8                 ttl;		/* TTL override */
-	u8                 tos;		/* TOS override */
-	union geneve_addr  remote;	/* IP address for link partner */
 	struct list_head   next;	/* geneve's per namespace list */
-	__be32		   label;	/* IPv6 flowlabel override */
-	__be16		   dst_port;
-	bool		   collect_md;
 	struct gro_cells   gro_cells;
-	u32		   flags;
-	struct dst_cache   dst_cache;
+	bool		   collect_md;
+	bool		   use_udp6_rx_checksums;
 };
 
-/* Geneve device flags */
-#define GENEVE_F_UDP_ZERO_CSUM_TX	BIT(0)
-#define GENEVE_F_UDP_ZERO_CSUM6_TX	BIT(1)
-#define GENEVE_F_UDP_ZERO_CSUM6_RX	BIT(2)
-
 struct geneve_sock {
 	bool			collect_md;
 	struct list_head	list;
@@ -87,7 +68,6 @@ struct geneve_sock {
 	struct rcu_head		rcu;
 	int			refcnt;
 	struct hlist_head	vni_list[VNI_HASH_SIZE];
-	u32			flags;
 };
 
 static inline __u32 geneve_net_vni_hash(u8 vni[3])
@@ -109,6 +89,20 @@ static __be64 vni_to_tunnel_id(const __u8 *vni)
 #endif
 }
 
+/* Convert 64 bit tunnel ID to 24 bit VNI. */
+static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
+{
+#ifdef __BIG_ENDIAN
+	vni[0] = (__force __u8)(tun_id >> 16);
+	vni[1] = (__force __u8)(tun_id >> 8);
+	vni[2] = (__force __u8)tun_id;
+#else
+	vni[0] = (__force __u8)((__force u64)tun_id >> 40);
+	vni[1] = (__force __u8)((__force u64)tun_id >> 48);
+	vni[2] = (__force __u8)((__force u64)tun_id >> 56);
+#endif
+}
+
 static sa_family_t geneve_get_sk_family(struct geneve_sock *gs)
 {
 	return gs->sock->sk->sk_family;
@@ -117,6 +111,7 @@ static sa_family_t geneve_get_sk_family(struct geneve_sock *gs)
 static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 					__be32 addr, u8 vni[])
 {
+	__be64 id = vni_to_tunnel_id(vni);
 	struct hlist_head *vni_list_head;
 	struct geneve_dev *geneve;
 	__u32 hash;
@@ -125,8 +120,8 @@ static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 	hash = geneve_net_vni_hash(vni);
 	vni_list_head = &gs->vni_list[hash];
 	hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
-		if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
-		    addr == geneve->remote.sin.sin_addr.s_addr)
+		if (!memcmp(&id, &geneve->info.key.tun_id, sizeof(id)) &&
+		    addr == geneve->info.key.u.ipv4.dst)
 			return geneve;
 	}
 	return NULL;
@@ -136,6 +131,7 @@ static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 static struct geneve_dev *geneve6_lookup(struct geneve_sock *gs,
 					 struct in6_addr addr6, u8 vni[])
 {
+	__be64 id = vni_to_tunnel_id(vni);
 	struct hlist_head *vni_list_head;
 	struct geneve_dev *geneve;
 	__u32 hash;
@@ -144,8 +140,8 @@ static struct geneve_dev *geneve6_lookup(struct geneve_sock *gs,
 	hash = geneve_net_vni_hash(vni);
 	vni_list_head = &gs->vni_list[hash];
 	hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
-		if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
-		    ipv6_addr_equal(&addr6, &geneve->remote.sin6.sin6_addr))
+		if (!memcmp(&id, &geneve->info.key.tun_id, sizeof(id)) &&
+		    ipv6_addr_equal(&addr6, &geneve->info.key.u.ipv6.dst))
 			return geneve;
 	}
 	return NULL;
@@ -160,15 +156,12 @@ static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
 static struct geneve_dev *geneve_lookup_skb(struct geneve_sock *gs,
 					    struct sk_buff *skb)
 {
-	u8 *vni;
-	__be32 addr;
 	static u8 zero_vni[3];
-#if IS_ENABLED(CONFIG_IPV6)
-	static struct in6_addr zero_addr6;
-#endif
+	u8 *vni;
 
 	if (geneve_get_sk_family(gs) == AF_INET) {
 		struct iphdr *iph;
+		__be32 addr;
 
 		iph = ip_hdr(skb); /* outer IP header... */
 
@@ -183,6 +176,7 @@ static struct geneve_dev *geneve_lookup_skb(struct geneve_sock *gs,
 		return geneve_lookup(gs, addr, vni);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else if (geneve_get_sk_family(gs) == AF_INET6) {
+		static struct in6_addr zero_addr6;
 		struct ipv6hdr *ip6h;
 		struct in6_addr addr6;
 
@@ -305,13 +299,12 @@ static int geneve_init(struct net_device *dev)
 		return err;
 	}
 
-	err = dst_cache_init(&geneve->dst_cache, GFP_KERNEL);
+	err = dst_cache_init(&geneve->info.dst_cache, GFP_KERNEL);
 	if (err) {
 		free_percpu(dev->tstats);
 		gro_cells_destroy(&geneve->gro_cells);
 		return err;
 	}
-
 	return 0;
 }
 
@@ -319,7 +312,7 @@ static void geneve_uninit(struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 
-	dst_cache_destroy(&geneve->dst_cache);
+	dst_cache_destroy(&geneve->info.dst_cache);
 	gro_cells_destroy(&geneve->gro_cells);
 	free_percpu(dev->tstats);
 }
@@ -368,7 +361,7 @@ static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 }
 
 static struct socket *geneve_create_sock(struct net *net, bool ipv6,
-					 __be16 port, u32 flags)
+					 __be16 port, bool ipv6_rx_csum)
 {
 	struct socket *sock;
 	struct udp_port_cfg udp_conf;
@@ -379,8 +372,7 @@ static struct socket *geneve_create_sock(struct net *net, bool ipv6,
 	if (ipv6) {
 		udp_conf.family = AF_INET6;
 		udp_conf.ipv6_v6only = 1;
-		udp_conf.use_udp6_rx_checksums =
-		    !(flags & GENEVE_F_UDP_ZERO_CSUM6_RX);
+		udp_conf.use_udp6_rx_checksums = ipv6_rx_csum;
 	} else {
 		udp_conf.family = AF_INET;
 		udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
@@ -491,7 +483,7 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
 
 /* Create new listen socket if needed */
 static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
-						bool ipv6, u32 flags)
+						bool ipv6, bool ipv6_rx_csum)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_sock *gs;
@@ -503,7 +495,7 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
 	if (!gs)
 		return ERR_PTR(-ENOMEM);
 
-	sock = geneve_create_sock(net, ipv6, port, flags);
+	sock = geneve_create_sock(net, ipv6, port, ipv6_rx_csum);
 	if (IS_ERR(sock)) {
 		kfree(gs);
 		return ERR_CAST(sock);
@@ -579,21 +571,22 @@ static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
 	struct net *net = geneve->net;
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_sock *gs;
+	__u8 vni[3];
 	__u32 hash;
 
-	gs = geneve_find_sock(gn, ipv6 ? AF_INET6 : AF_INET, geneve->dst_port);
+	gs = geneve_find_sock(gn, ipv6 ? AF_INET6 : AF_INET, geneve->info.key.tp_dst);
 	if (gs) {
 		gs->refcnt++;
 		goto out;
 	}
 
-	gs = geneve_socket_create(net, geneve->dst_port, ipv6, geneve->flags);
+	gs = geneve_socket_create(net, geneve->info.key.tp_dst, ipv6,
+				  geneve->use_udp6_rx_checksums);
 	if (IS_ERR(gs))
 		return PTR_ERR(gs);
 
 out:
 	gs->collect_md = geneve->collect_md;
-	gs->flags = geneve->flags;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (ipv6)
 		rcu_assign_pointer(geneve->sock6, gs);
@@ -601,7 +594,8 @@ static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
 #endif
 		rcu_assign_pointer(geneve->sock4, gs);
 
-	hash = geneve_net_vni_hash(geneve->vni);
+	tunnel_id_to_vni(geneve->info.key.tun_id, vni);
+	hash = geneve_net_vni_hash(vni);
 	hlist_add_head_rcu(&geneve->hlist, &gs->vni_list[hash]);
 	return 0;
 }
@@ -609,7 +603,7 @@ static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
 static int geneve_open(struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
-	bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
+	bool ipv6 = !!(geneve->info.mode & IP_TUNNEL_INFO_IPV6);
 	bool metadata = geneve->collect_md;
 	int ret = 0;
 
@@ -653,12 +647,12 @@ static void geneve_build_header(struct genevehdr *geneveh,
 
 static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
 			    __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-			    u32 flags, bool xnet)
+			    bool xnet)
 {
+	bool udp_sum = !!(tun_flags & TUNNEL_CSUM);
 	struct genevehdr *gnvh;
 	int min_headroom;
 	int err;
-	bool udp_sum = !(flags & GENEVE_F_UDP_ZERO_CSUM_TX);
 
 	skb_scrub_packet(skb, xnet);
 
@@ -686,12 +680,12 @@ static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
 #if IS_ENABLED(CONFIG_IPV6)
 static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 			     __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-			     u32 flags, bool xnet)
+			     bool xnet)
 {
+	bool udp_sum = !!(tun_flags & TUNNEL_CSUM);
 	struct genevehdr *gnvh;
 	int min_headroom;
 	int err;
-	bool udp_sum = !(flags & GENEVE_F_UDP_ZERO_CSUM6_TX);
 
 	skb_scrub_packet(skb, xnet);
 
@@ -734,32 +728,22 @@ static struct rtable *geneve_get_v4_rt(struct sk_buff *skb,
 	memset(fl4, 0, sizeof(*fl4));
 	fl4->flowi4_mark = skb->mark;
 	fl4->flowi4_proto = IPPROTO_UDP;
+	fl4->daddr = info->key.u.ipv4.dst;
+	fl4->saddr = info->key.u.ipv4.src;
 
-	if (info) {
-		fl4->daddr = info->key.u.ipv4.dst;
-		fl4->saddr = info->key.u.ipv4.src;
-		fl4->flowi4_tos = RT_TOS(info->key.tos);
-		dst_cache = &info->dst_cache;
-	} else {
-		tos = geneve->tos;
-		if (tos == 1) {
-			const struct iphdr *iip = ip_hdr(skb);
-
-			tos = ip_tunnel_get_dsfield(iip, skb);
-			use_cache = false;
-		}
-
-		fl4->flowi4_tos = RT_TOS(tos);
-		fl4->daddr = geneve->remote.sin.sin_addr.s_addr;
-		dst_cache = &geneve->dst_cache;
+	tos = info->key.tos;
+	if (!geneve->collect_md && (tos == 1)) {
+		tos = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
+		use_cache = false;
 	}
+	fl4->flowi4_tos = RT_TOS(tos);
 
+	dst_cache = &info->dst_cache;
 	if (use_cache) {
 		rt = dst_cache_get_ip4(dst_cache, &fl4->saddr);
 		if (rt)
 			return rt;
 	}
-
 	rt = ip_route_output_key(geneve->net, fl4);
 	if (IS_ERR(rt)) {
 		netdev_dbg(dev, "no route to %pI4\n", &fl4->daddr);
@@ -795,34 +779,22 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb,
 	memset(fl6, 0, sizeof(*fl6));
 	fl6->flowi6_mark = skb->mark;
 	fl6->flowi6_proto = IPPROTO_UDP;
-
-	if (info) {
-		fl6->daddr = info->key.u.ipv6.dst;
-		fl6->saddr = info->key.u.ipv6.src;
-		fl6->flowlabel = ip6_make_flowinfo(RT_TOS(info->key.tos),
-						   info->key.label);
-		dst_cache = &info->dst_cache;
-	} else {
-		prio = geneve->tos;
-		if (prio == 1) {
-			const struct iphdr *iip = ip_hdr(skb);
-
-			prio = ip_tunnel_get_dsfield(iip, skb);
-			use_cache = false;
-		}
-
-		fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
-						   geneve->label);
-		fl6->daddr = geneve->remote.sin6.sin6_addr;
-		dst_cache = &geneve->dst_cache;
+	fl6->daddr = info->key.u.ipv6.dst;
+	fl6->saddr = info->key.u.ipv6.src;
+	prio = info->key.tos;
+	if (!geneve->collect_md && (prio == 1)) {
+		prio = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
+		use_cache = false;
 	}
 
+	fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
+					   info->key.label);
+	dst_cache = &info->dst_cache;
 	if (use_cache) {
 		dst = dst_cache_get_ip6(dst_cache, &fl6->saddr);
 		if (dst)
 			return dst;
 	}
-
 	if (ipv6_stub->ipv6_dst_lookup(geneve->net, gs6->sock->sk, &dst, fl6)) {
 		netdev_dbg(dev, "no route to %pI6\n", &fl6->daddr);
 		return ERR_PTR(-ENETUNREACH);
@@ -839,195 +811,129 @@ static struct dst_entry *geneve_get_v6_dst(struct sk_buff *skb,
 }
 #endif
 
-/* Convert 64 bit tunnel ID to 24 bit VNI. */
-static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
-{
-#ifdef __BIG_ENDIAN
-	vni[0] = (__force __u8)(tun_id >> 16);
-	vni[1] = (__force __u8)(tun_id >> 8);
-	vni[2] = (__force __u8)tun_id;
-#else
-	vni[0] = (__force __u8)((__force u64)tun_id >> 40);
-	vni[1] = (__force __u8)((__force u64)tun_id >> 48);
-	vni[2] = (__force __u8)((__force u64)tun_id >> 56);
-#endif
-}
-
-static netdev_tx_t geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
-				   struct ip_tunnel_info *info)
+static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
+			   struct geneve_dev *geneve, struct ip_tunnel_info *info)
 {
-	struct geneve_dev *geneve = netdev_priv(dev);
-	struct geneve_sock *gs4;
-	struct rtable *rt = NULL;
-	const struct iphdr *iip; /* interior IP header */
+	bool xnet = !net_eq(geneve->net, dev_net(geneve->dev));
+	struct geneve_sock *gs4 = rcu_dereference(geneve->sock4);
+	const struct ip_tunnel_key *key = &info->key;
+	struct rtable *rt;
 	int err = -EINVAL;
 	struct flowi4 fl4;
+	u8 *opts = NULL;
 	__u8 tos, ttl;
 	__be16 sport;
 	__be16 df;
-	bool xnet = !net_eq(geneve->net, dev_net(geneve->dev));
-	u32 flags = geneve->flags;
+	u8 vni[3];
 
-	gs4 = rcu_dereference(geneve->sock4);
 	if (!gs4)
-		goto tx_error;
-
-	if (geneve->collect_md) {
-		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
-			netdev_dbg(dev, "no tunnel metadata\n");
-			goto tx_error;
-		}
-		if (info && ip_tunnel_info_af(info) != AF_INET)
-			goto tx_error;
-	}
+		return err;
 
 	rt = geneve_get_v4_rt(skb, dev, &fl4, info);
-	if (IS_ERR(rt)) {
-		err = PTR_ERR(rt);
-		goto tx_error;
-	}
+	if (IS_ERR(rt))
+		return PTR_ERR(rt);
 
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
-	skb_reset_mac_header(skb);
-
-	iip = ip_hdr(skb);
-
-	if (info) {
-		const struct ip_tunnel_key *key = &info->key;
-		u8 *opts = NULL;
-		u8 vni[3];
-
-		tunnel_id_to_vni(key->tun_id, vni);
-		if (info->options_len)
-			opts = ip_tunnel_info_opts(info);
-
-		if (key->tun_flags & TUNNEL_CSUM)
-			flags &= ~GENEVE_F_UDP_ZERO_CSUM_TX;
-		else
-			flags |= GENEVE_F_UDP_ZERO_CSUM_TX;
-
-		err = geneve_build_skb(rt, skb, key->tun_flags, vni,
-				       info->options_len, opts, flags, xnet);
-		if (unlikely(err))
-			goto tx_error;
-
-		tos = ip_tunnel_ecn_encap(key->tos, iip, skb);
+	if (geneve->collect_md) {
+		tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
 		ttl = key->ttl;
-		df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
 	} else {
-		err = geneve_build_skb(rt, skb, 0, geneve->vni,
-				       0, NULL, flags, xnet);
-		if (unlikely(err))
-			goto tx_error;
-
-		tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, iip, skb);
-		ttl = geneve->ttl;
-		if (!ttl && IN_MULTICAST(ntohl(fl4.daddr)))
-			ttl = 1;
-		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
-		df = 0;
+		tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, ip_hdr(skb), skb);
+		ttl = key->ttl ? : ip4_dst_hoplimit(&rt->dst);
 	}
-	udp_tunnel_xmit_skb(rt, gs4->sock->sk, skb, fl4.saddr, fl4.daddr,
-			    tos, ttl, df, sport, geneve->dst_port,
-			    !net_eq(geneve->net, dev_net(geneve->dev)),
-			    !!(flags & GENEVE_F_UDP_ZERO_CSUM_TX));
+	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
 
-	return NETDEV_TX_OK;
-
-tx_error:
-	dev_kfree_skb(skb);
+	tunnel_id_to_vni(key->tun_id, vni);
+	if (info->options_len)
+		opts = ip_tunnel_info_opts(info);
 
-	if (err == -ELOOP)
-		dev->stats.collisions++;
-	else if (err == -ENETUNREACH)
-		dev->stats.tx_carrier_errors++;
+	skb_reset_mac_header(skb);
+	err = geneve_build_skb(rt, skb, key->tun_flags, vni,
+			       info->options_len, opts, xnet);
+	if (unlikely(err))
+		return err;
 
-	dev->stats.tx_errors++;
-	return NETDEV_TX_OK;
+	udp_tunnel_xmit_skb(rt, gs4->sock->sk, skb, fl4.saddr, fl4.daddr,
+			    tos, ttl, df, sport, geneve->info.key.tp_dst,
+			    !net_eq(geneve->net, dev_net(geneve->dev)),
+			    !(info->key.tun_flags & TUNNEL_CSUM));
+	return 0;
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
-				    struct ip_tunnel_info *info)
+static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
+			    struct geneve_dev *geneve, struct ip_tunnel_info *info)
 {
-	struct geneve_dev *geneve = netdev_priv(dev);
+	bool xnet = !net_eq(geneve->net, dev_net(geneve->dev));
+	struct geneve_sock *gs6 = rcu_dereference(geneve->sock6);
+	const struct ip_tunnel_key *key = &info->key;
 	struct dst_entry *dst = NULL;
-	const struct iphdr *iip; /* interior IP header */
-	struct geneve_sock *gs6;
 	int err = -EINVAL;
 	struct flowi6 fl6;
+	u8 *opts = NULL;
 	__u8 prio, ttl;
 	__be16 sport;
-	__be32 label;
-	bool xnet = !net_eq(geneve->net, dev_net(geneve->dev));
-	u32 flags = geneve->flags;
+	u8 vni[3];
 
-	gs6 = rcu_dereference(geneve->sock6);
 	if (!gs6)
-		goto tx_error;
-
-	if (geneve->collect_md) {
-		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
-			netdev_dbg(dev, "no tunnel metadata\n");
-			goto tx_error;
-		}
-	}
+		return err;
 
 	dst = geneve_get_v6_dst(skb, dev, &fl6, info);
-	if (IS_ERR(dst)) {
-		err = PTR_ERR(dst);
-		goto tx_error;
-	}
+	if (IS_ERR(dst))
+		return PTR_ERR(dst);
 
 	sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
-	skb_reset_mac_header(skb);
-
-	iip = ip_hdr(skb);
+	if (geneve->collect_md) {
+		prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
+		ttl = key->ttl;
+	} else {
+		prio = ip_tunnel_ecn_encap(ip6_tclass(fl6.flowlabel),
+					   ip_hdr(skb), skb);
+		ttl = key->ttl ? : ip6_dst_hoplimit(dst);
+	}
+	tunnel_id_to_vni(key->tun_id, vni);
+	if (info->options_len)
+		opts = ip_tunnel_info_opts(info);
 
-	if (info) {
-		const struct ip_tunnel_key *key = &info->key;
-		u8 *opts = NULL;
-		u8 vni[3];
+	skb_reset_mac_header(skb);
+	err = geneve6_build_skb(dst, skb, key->tun_flags, vni,
+				info->options_len, opts, xnet);
+	if (unlikely(err))
+		return err;
 
-		tunnel_id_to_vni(key->tun_id, vni);
-		if (info->options_len)
-			opts = ip_tunnel_info_opts(info);
+	udp_tunnel6_xmit_skb(dst, gs6->sock->sk, skb, dev,
+			     &fl6.saddr, &fl6.daddr, prio, ttl,
+			     info->key.label, sport, geneve->info.key.tp_dst,
+			     !(info->key.tun_flags & TUNNEL_CSUM));
+	return 0;
+}
+#endif
 
-		if (key->tun_flags & TUNNEL_CSUM)
-			flags &= ~GENEVE_F_UDP_ZERO_CSUM6_TX;
-		else
-			flags |= GENEVE_F_UDP_ZERO_CSUM6_TX;
+static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct geneve_dev *geneve = netdev_priv(dev);
+	struct ip_tunnel_info *info = NULL;
+	int err;
 
-		err = geneve6_build_skb(dst, skb, key->tun_flags, vni,
-					info->options_len, opts,
-					flags, xnet);
-		if (unlikely(err))
+	if (geneve->collect_md) {
+		info = skb_tunnel_info(skb);
+		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
+			netdev_dbg(dev, "no tunnel metadata\n");
 			goto tx_error;
-
-		prio = ip_tunnel_ecn_encap(key->tos, iip, skb);
-		ttl = key->ttl;
-		label = info->key.label;
+		}
 	} else {
-		err = geneve6_build_skb(dst, skb, 0, geneve->vni,
-					0, NULL, flags, xnet);
-		if (unlikely(err))
-			goto tx_error;
-
-		prio = ip_tunnel_ecn_encap(ip6_tclass(fl6.flowlabel),
-					   iip, skb);
-		ttl = geneve->ttl;
-		if (!ttl && ipv6_addr_is_multicast(&fl6.daddr))
-			ttl = 1;
-		ttl = ttl ? : ip6_dst_hoplimit(dst);
-		label = geneve->label;
+		info = &geneve->info;
 	}
 
-	udp_tunnel6_xmit_skb(dst, gs6->sock->sk, skb, dev,
-			     &fl6.saddr, &fl6.daddr, prio, ttl, label,
-			     sport, geneve->dst_port,
-			     !!(flags & GENEVE_F_UDP_ZERO_CSUM6_TX));
-	return NETDEV_TX_OK;
+#if IS_ENABLED(CONFIG_IPV6)
+	if (info->mode & IP_TUNNEL_INFO_IPV6)
+		err = geneve6_xmit_skb(skb, dev, geneve, info);
+	else
+#endif
+		err = geneve_xmit_skb(skb, dev, geneve, info);
 
+	if (likely(!err))
+		return NETDEV_TX_OK;
 tx_error:
 	dev_kfree_skb(skb);
 
@@ -1039,23 +945,6 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	dev->stats.tx_errors++;
 	return NETDEV_TX_OK;
 }
-#endif
-
-static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
-{
-	struct geneve_dev *geneve = netdev_priv(dev);
-	struct ip_tunnel_info *info = NULL;
-
-	if (geneve->collect_md)
-		info = skb_tunnel_info(skb);
-
-#if IS_ENABLED(CONFIG_IPV6)
-	if ((info && ip_tunnel_info_af(info) == AF_INET6) ||
-	    (!info && geneve->remote.sa.sa_family == AF_INET6))
-		return geneve6_xmit_skb(skb, dev, info);
-#endif
-	return geneve_xmit_skb(skb, dev, info);
-}
 
 static int geneve_change_mtu(struct net_device *dev, int new_mtu)
 {
@@ -1073,14 +962,11 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
 	struct geneve_dev *geneve = netdev_priv(dev);
-	struct rtable *rt;
-	struct flowi4 fl4;
-#if IS_ENABLED(CONFIG_IPV6)
-	struct dst_entry *dst;
-	struct flowi6 fl6;
-#endif
 
 	if (ip_tunnel_info_af(info) == AF_INET) {
+		struct rtable *rt;
+		struct flowi4 fl4;
+
 		rt = geneve_get_v4_rt(skb, dev, &fl4, info);
 		if (IS_ERR(rt))
 			return PTR_ERR(rt);
@@ -1089,6 +975,9 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 		info->key.u.ipv4.src = fl4.saddr;
 #if IS_ENABLED(CONFIG_IPV6)
 	} else if (ip_tunnel_info_af(info) == AF_INET6) {
+		struct dst_entry *dst;
+		struct flowi6 fl6;
+
 		dst = geneve_get_v6_dst(skb, dev, &fl6, info);
 		if (IS_ERR(dst))
 			return PTR_ERR(dst);
@@ -1102,7 +991,7 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 
 	info->key.tp_src = udp_flow_src_port(geneve->net, skb,
 					     1, USHRT_MAX, true);
-	info->key.tp_dst = geneve->dst_port;
+	info->key.tp_dst = geneve->info.key.tp_dst;
 	return 0;
 }
 
@@ -1224,78 +1113,69 @@ static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
 }
 
 static struct geneve_dev *geneve_find_dev(struct geneve_net *gn,
-					  __be16 dst_port,
-					  union geneve_addr *remote,
-					  u8 vni[],
+					  const struct ip_tunnel_info *info,
 					  bool *tun_on_same_port,
 					  bool *tun_collect_md)
 {
-	struct geneve_dev *geneve, *t;
+	struct geneve_dev *geneve, *t = NULL;
 
 	*tun_on_same_port = false;
 	*tun_collect_md = false;
-	t = NULL;
 	list_for_each_entry(geneve, &gn->geneve_list, next) {
-		if (geneve->dst_port == dst_port) {
+		if (info->key.tp_dst == geneve->info.key.tp_dst) {
 			*tun_collect_md = geneve->collect_md;
 			*tun_on_same_port = true;
 		}
-		if (!memcmp(vni, geneve->vni, sizeof(geneve->vni)) &&
-		    !memcmp(remote, &geneve->remote, sizeof(geneve->remote)) &&
-		    dst_port == geneve->dst_port)
+		if (info->key.tun_id == geneve->info.key.tun_id &&
+		    info->key.tp_dst == geneve->info.key.tp_dst &&
+		    !memcmp(&info->key.u, &geneve->info.key.u, sizeof(info->key.u)))
 			t = geneve;
 	}
 	return t;
 }
 
+static bool is_all_zero(const u8 *fp, size_t size)
+{
+	int i;
+
+	for (i = 0; i < size; i++)
+		if (fp[i])
+			return false;
+	return true;
+}
+
+static bool is_tnl_info_zero(const struct ip_tunnel_info *info)
+{
+	if (info->key.tun_id || info->key.tun_flags || info->key.tos ||
+	    info->key.ttl || info->key.label || info->key.tp_src ||
+	    !is_all_zero((const u8 *)&info->key.u, sizeof(info->key.u)))
+		return false;
+	else
+		return true;
+}
+
 static int geneve_configure(struct net *net, struct net_device *dev,
-			    union geneve_addr *remote,
-			    __u32 vni, __u8 ttl, __u8 tos, __be32 label,
-			    __be16 dst_port, bool metadata, u32 flags)
+			    const struct ip_tunnel_info *info,
+			    bool metadata, bool ipv6_rx_csum)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct geneve_dev *t, *geneve = netdev_priv(dev);
 	bool tun_collect_md, tun_on_same_port;
 	int err, encap_len;
 
-	if (!remote)
-		return -EINVAL;
-	if (metadata &&
-	    (remote->sa.sa_family != AF_UNSPEC || vni || tos || ttl || label))
+	if (metadata && !is_tnl_info_zero(info))
 		return -EINVAL;
 
 	geneve->net = net;
 	geneve->dev = dev;
 
-	geneve->vni[0] = (vni & 0x00ff0000) >> 16;
-	geneve->vni[1] = (vni & 0x0000ff00) >> 8;
-	geneve->vni[2] =  vni & 0x000000ff;
-
-	if ((remote->sa.sa_family == AF_INET &&
-	     IN_MULTICAST(ntohl(remote->sin.sin_addr.s_addr))) ||
-	    (remote->sa.sa_family == AF_INET6 &&
-	     ipv6_addr_is_multicast(&remote->sin6.sin6_addr)))
-		return -EINVAL;
-	if (label && remote->sa.sa_family != AF_INET6)
-		return -EINVAL;
-
-	geneve->remote = *remote;
-
-	geneve->ttl = ttl;
-	geneve->tos = tos;
-	geneve->label = label;
-	geneve->dst_port = dst_port;
-	geneve->collect_md = metadata;
-	geneve->flags = flags;
-
-	t = geneve_find_dev(gn, dst_port, remote, geneve->vni,
-			    &tun_on_same_port, &tun_collect_md);
+	t = geneve_find_dev(gn, info, &tun_on_same_port, &tun_collect_md);
 	if (t)
 		return -EBUSY;
 
 	/* make enough headroom for basic scenario */
 	encap_len = GENEVE_BASE_HLEN + ETH_HLEN;
-	if (remote->sa.sa_family == AF_INET) {
+	if (ip_tunnel_info_af(info) == AF_INET) {
 		encap_len += sizeof(struct iphdr);
 		dev->max_mtu -= sizeof(struct iphdr);
 	} else {
@@ -1312,7 +1192,10 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 			return -EPERM;
 	}
 
-	dst_cache_reset(&geneve->dst_cache);
+	dst_cache_reset(&geneve->info.dst_cache);
+	geneve->info = *info;
+	geneve->collect_md = metadata;
+	geneve->use_udp6_rx_checksums = ipv6_rx_csum;
 
 	err = register_netdevice(dev);
 	if (err)
@@ -1322,74 +1205,99 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 	return 0;
 }
 
+static void init_tnl_info(struct ip_tunnel_info *info, __u16 dst_port)
+{
+	memset(info, 0, sizeof(*info));
+	info->key.tp_dst = htons(dst_port);
+}
+
 static int geneve_newlink(struct net *net, struct net_device *dev,
 			  struct nlattr *tb[], struct nlattr *data[])
 {
-	__be16 dst_port = htons(GENEVE_UDP_PORT);
-	__u8 ttl = 0, tos = 0;
+	bool use_udp6_rx_checksums = false;
+	struct ip_tunnel_info info;
 	bool metadata = false;
-	union geneve_addr remote = geneve_remote_unspec;
-	__be32 label = 0;
-	__u32 vni = 0;
-	u32 flags = 0;
+
+	init_tnl_info(&info, GENEVE_UDP_PORT);
 
 	if (data[IFLA_GENEVE_REMOTE] && data[IFLA_GENEVE_REMOTE6])
 		return -EINVAL;
 
 	if (data[IFLA_GENEVE_REMOTE]) {
-		remote.sa.sa_family = AF_INET;
-		remote.sin.sin_addr.s_addr =
+		info.key.u.ipv4.dst =
 			nla_get_in_addr(data[IFLA_GENEVE_REMOTE]);
+
+		if (IN_MULTICAST(ntohl(info.key.u.ipv4.dst))) {
+			netdev_dbg(dev, "multicast remote is unsupported\n");
+			return -EINVAL;
+		}
 	}
 
 	if (data[IFLA_GENEVE_REMOTE6]) {
-		if (!IS_ENABLED(CONFIG_IPV6))
-			return -EPFNOSUPPORT;
-
-		remote.sa.sa_family = AF_INET6;
-		remote.sin6.sin6_addr =
+ #if IS_ENABLED(CONFIG_IPV6)
+		info.mode = IP_TUNNEL_INFO_IPV6;
+		info.key.u.ipv6.dst =
 			nla_get_in6_addr(data[IFLA_GENEVE_REMOTE6]);
 
-		if (ipv6_addr_type(&remote.sin6.sin6_addr) &
+		if (ipv6_addr_type(&info.key.u.ipv6.dst) &
 		    IPV6_ADDR_LINKLOCAL) {
 			netdev_dbg(dev, "link-local remote is unsupported\n");
 			return -EINVAL;
 		}
+		if (ipv6_addr_is_multicast(&info.key.u.ipv6.dst)) {
+			netdev_dbg(dev, "multicast remote is unsupported\n");
+			return -EINVAL;
+		}
+		info.key.tun_flags |= TUNNEL_CSUM;
+		use_udp6_rx_checksums = true;
+#else
+		return -EPFNOSUPPORT;
+#endif
 	}
 
-	if (data[IFLA_GENEVE_ID])
+	if (data[IFLA_GENEVE_ID]) {
+		__u32 vni;
+		__u8 tvni[3];
+
 		vni = nla_get_u32(data[IFLA_GENEVE_ID]);
+		tvni[0] = (vni & 0x00ff0000) >> 16;
+		tvni[1] = (vni & 0x0000ff00) >> 8;
+		tvni[2] =  vni & 0x000000ff;
 
+		info.key.tun_id = vni_to_tunnel_id(tvni);
+	}
 	if (data[IFLA_GENEVE_TTL])
-		ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
+		info.key.ttl = nla_get_u8(data[IFLA_GENEVE_TTL]);
 
 	if (data[IFLA_GENEVE_TOS])
-		tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
+		info.key.tos = nla_get_u8(data[IFLA_GENEVE_TOS]);
 
-	if (data[IFLA_GENEVE_LABEL])
-		label = nla_get_be32(data[IFLA_GENEVE_LABEL]) &
-			IPV6_FLOWLABEL_MASK;
+	if (data[IFLA_GENEVE_LABEL]) {
+		info.key.label = nla_get_be32(data[IFLA_GENEVE_LABEL]) &
+				  IPV6_FLOWLABEL_MASK;
+		if (info.key.label && (!(info.mode & IP_TUNNEL_INFO_IPV6)))
+			return -EINVAL;
+	}
 
 	if (data[IFLA_GENEVE_PORT])
-		dst_port = nla_get_be16(data[IFLA_GENEVE_PORT]);
+		info.key.tp_dst = nla_get_be16(data[IFLA_GENEVE_PORT]);
 
 	if (data[IFLA_GENEVE_COLLECT_METADATA])
 		metadata = true;
 
 	if (data[IFLA_GENEVE_UDP_CSUM] &&
 	    !nla_get_u8(data[IFLA_GENEVE_UDP_CSUM]))
-		flags |= GENEVE_F_UDP_ZERO_CSUM_TX;
+		info.key.tun_flags |= TUNNEL_CSUM;
 
 	if (data[IFLA_GENEVE_UDP_ZERO_CSUM6_TX] &&
 	    nla_get_u8(data[IFLA_GENEVE_UDP_ZERO_CSUM6_TX]))
-		flags |= GENEVE_F_UDP_ZERO_CSUM6_TX;
+		info.key.tun_flags &= ~TUNNEL_CSUM;
 
 	if (data[IFLA_GENEVE_UDP_ZERO_CSUM6_RX] &&
 	    nla_get_u8(data[IFLA_GENEVE_UDP_ZERO_CSUM6_RX]))
-		flags |= GENEVE_F_UDP_ZERO_CSUM6_RX;
+		use_udp6_rx_checksums = false;
 
-	return geneve_configure(net, dev, &remote, vni, ttl, tos, label,
-				dst_port, metadata, flags);
+	return geneve_configure(net, dev, &info, metadata, use_udp6_rx_checksums);
 }
 
 static void geneve_dellink(struct net_device *dev, struct list_head *head)
@@ -1418,45 +1326,52 @@ static size_t geneve_get_size(const struct net_device *dev)
 static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
+	struct ip_tunnel_info *info = &geneve->info;
+	__u8 tmp_vni[3];
 	__u32 vni;
 
-	vni = (geneve->vni[0] << 16) | (geneve->vni[1] << 8) | geneve->vni[2];
+	tunnel_id_to_vni(info->key.tun_id, tmp_vni);
+	vni = (tmp_vni[0] << 16) | (tmp_vni[1] << 8) | tmp_vni[2];
 	if (nla_put_u32(skb, IFLA_GENEVE_ID, vni))
 		goto nla_put_failure;
 
-	if (geneve->remote.sa.sa_family == AF_INET) {
+	if (ip_tunnel_info_af(info) == AF_INET) {
 		if (nla_put_in_addr(skb, IFLA_GENEVE_REMOTE,
-				    geneve->remote.sin.sin_addr.s_addr))
+				    info->key.u.ipv4.dst))
+			goto nla_put_failure;
+
+		if (nla_put_u8(skb, IFLA_GENEVE_UDP_CSUM,
+			       !!(info->key.tun_flags & TUNNEL_CSUM)))
 			goto nla_put_failure;
+
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		if (nla_put_in6_addr(skb, IFLA_GENEVE_REMOTE6,
-				     &geneve->remote.sin6.sin6_addr))
+				     &info->key.u.ipv6.dst))
+			goto nla_put_failure;
+
+		if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_TX,
+			       !(info->key.tun_flags & TUNNEL_CSUM)))
+			goto nla_put_failure;
+
+		if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
+			       !geneve->use_udp6_rx_checksums))
 			goto nla_put_failure;
 #endif
 	}
 
-	if (nla_put_u8(skb, IFLA_GENEVE_TTL, geneve->ttl) ||
-	    nla_put_u8(skb, IFLA_GENEVE_TOS, geneve->tos) ||
-	    nla_put_be32(skb, IFLA_GENEVE_LABEL, geneve->label))
+	if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) ||
+	    nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) ||
+	    nla_put_be32(skb, IFLA_GENEVE_LABEL, info->key.label))
 		goto nla_put_failure;
 
-	if (nla_put_be16(skb, IFLA_GENEVE_PORT, geneve->dst_port))
+	if (nla_put_be16(skb, IFLA_GENEVE_PORT, info->key.tp_dst))
 		goto nla_put_failure;
 
 	if (geneve->collect_md) {
 		if (nla_put_flag(skb, IFLA_GENEVE_COLLECT_METADATA))
 			goto nla_put_failure;
 	}
-
-	if (nla_put_u8(skb, IFLA_GENEVE_UDP_CSUM,
-		       !(geneve->flags & GENEVE_F_UDP_ZERO_CSUM_TX)) ||
-	    nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_TX,
-		       !!(geneve->flags & GENEVE_F_UDP_ZERO_CSUM6_TX)) ||
-	    nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
-		       !!(geneve->flags & GENEVE_F_UDP_ZERO_CSUM6_RX)))
-		goto nla_put_failure;
-
 	return 0;
 
 nla_put_failure:
@@ -1480,6 +1395,7 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
 					u8 name_assign_type, u16 dst_port)
 {
 	struct nlattr *tb[IFLA_MAX + 1];
+	struct ip_tunnel_info info;
 	struct net_device *dev;
 	LIST_HEAD(list_kill);
 	int err;
@@ -1490,9 +1406,8 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
 	if (IS_ERR(dev))
 		return dev;
 
-	err = geneve_configure(net, dev, &geneve_remote_unspec,
-			       0, 0, 0, 0, htons(dst_port), true,
-			       GENEVE_F_UDP_ZERO_CSUM6_RX);
+	init_tnl_info(&info, dst_port);
+	err = geneve_configure(net, dev, &info, true, true);
 	if (err) {
 		free_netdev(dev);
 		return ERR_PTR(err);
@@ -1510,8 +1425,7 @@ struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
 		goto err;
 
 	return dev;
-
- err:
+err:
 	geneve_dellink(dev, &list_kill);
 	unregister_netdevice_many(&list_kill);
 	return ERR_PTR(err);
@@ -1594,7 +1508,6 @@ static int __init geneve_init_module(void)
 		goto out3;
 
 	return 0;
-
 out3:
 	unregister_netdevice_notifier(&geneve_notifier_block);
 out2:
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 3/4] geneve: Remove redundant socket checks.
From: Pravin B Shelar @ 2016-11-18  7:10 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar
In-Reply-To: <1479453029-29619-1-git-send-email-pshelar@ovn.org>

Geneve already has check for device socket in route
lookup function. So no need to check it in xmit
function.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
---
 drivers/net/geneve.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 9a4351c..8e6c659 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -785,14 +785,11 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	struct geneve_sock *gs4 = rcu_dereference(geneve->sock4);
 	const struct ip_tunnel_key *key = &info->key;
 	struct rtable *rt;
-	int err = -EINVAL;
 	struct flowi4 fl4;
 	__u8 tos, ttl;
 	__be16 sport;
 	__be16 df;
-
-	if (!gs4)
-		return err;
+	int err;
 
 	rt = geneve_get_v4_rt(skb, dev, &fl4, info);
 	if (IS_ERR(rt))
@@ -828,13 +825,10 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	struct geneve_sock *gs6 = rcu_dereference(geneve->sock6);
 	const struct ip_tunnel_key *key = &info->key;
 	struct dst_entry *dst = NULL;
-	int err = -EINVAL;
 	struct flowi6 fl6;
 	__u8 prio, ttl;
 	__be16 sport;
-
-	if (!gs6)
-		return err;
+	int err;
 
 	dst = geneve_get_v6_dst(skb, dev, &fl6, info);
 	if (IS_ERR(dst))
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 4/4] geneve: Optimize geneve device lookup.
From: Pravin B Shelar @ 2016-11-18  7:10 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar
In-Reply-To: <1479453029-29619-1-git-send-email-pshelar@ovn.org>

Rather than comparing 64-bit tunnel-id, compare tunnel vni
which is 24-bit id. This also save conversion from vni
to tunnel id on each tunnel packet receive.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
---
 drivers/net/geneve.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 8e6c659..21f3270 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -103,6 +103,17 @@ static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
 #endif
 }
 
+static bool cmp_tunnel_id_and_vni(u8 *tun_id, u8 *vni)
+{
+#ifdef __BIG_ENDIAN
+	return (vni[0] == tun_id[2]) &&
+	       (vni[1] == tun_id[1]) &&
+	       (vni[2] == tun_id[0]);
+#else
+	return !memcmp(vni, &tun_id[5], 3);
+#endif
+}
+
 static sa_family_t geneve_get_sk_family(struct geneve_sock *gs)
 {
 	return gs->sock->sk->sk_family;
@@ -111,7 +122,6 @@ static sa_family_t geneve_get_sk_family(struct geneve_sock *gs)
 static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 					__be32 addr, u8 vni[])
 {
-	__be64 id = vni_to_tunnel_id(vni);
 	struct hlist_head *vni_list_head;
 	struct geneve_dev *geneve;
 	__u32 hash;
@@ -120,7 +130,7 @@ static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 	hash = geneve_net_vni_hash(vni);
 	vni_list_head = &gs->vni_list[hash];
 	hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
-		if (!memcmp(&id, &geneve->info.key.tun_id, sizeof(id)) &&
+		if (cmp_tunnel_id_and_vni((u8 *)&geneve->info.key.tun_id, vni) &&
 		    addr == geneve->info.key.u.ipv4.dst)
 			return geneve;
 	}
@@ -131,7 +141,6 @@ static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 static struct geneve_dev *geneve6_lookup(struct geneve_sock *gs,
 					 struct in6_addr addr6, u8 vni[])
 {
-	__be64 id = vni_to_tunnel_id(vni);
 	struct hlist_head *vni_list_head;
 	struct geneve_dev *geneve;
 	__u32 hash;
@@ -140,7 +149,7 @@ static struct geneve_dev *geneve6_lookup(struct geneve_sock *gs,
 	hash = geneve_net_vni_hash(vni);
 	vni_list_head = &gs->vni_list[hash];
 	hlist_for_each_entry_rcu(geneve, vni_list_head, hlist) {
-		if (!memcmp(&id, &geneve->info.key.tun_id, sizeof(id)) &&
+		if (cmp_tunnel_id_and_vni((u8 *)&geneve->info.key.tun_id, vni) &&
 		    ipv6_addr_equal(&addr6, &geneve->info.key.u.ipv6.dst))
 			return geneve;
 	}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next] bridge: add igmpv3 and mldv2 query support
From: Hangbin Liu @ 2016-11-18  7:32 UTC (permalink / raw)
  To: netdev
  Cc: Hannes Frederic Sowa, Nikolay Aleksandrov, linus.luessing,
	Hangbin Liu

Add bridge IGMPv3 and MLDv2 query support. But before we think it is stable
enough, only enable it when declare in force_igmp/mld_version.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/bridge/br_multicast.c | 203 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 194 insertions(+), 9 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2136e45..9fb47f3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -35,6 +35,10 @@
 
 #include "br_private.h"
 
+#define IGMP_V3_SEEN(in_dev) \
+	(IPV4_DEVCONF_ALL(dev_net(in_dev->dev), FORCE_IGMP_VERSION) == 3 || \
+	 IN_DEV_CONF_GET((in_dev), FORCE_IGMP_VERSION) == 3)
+
 static void br_multicast_start_querier(struct net_bridge *br,
 				       struct bridge_mcast_own_query *query);
 static void br_multicast_add_router(struct net_bridge *br,
@@ -360,9 +364,8 @@ static int br_mdb_rehash(struct net_bridge_mdb_htable __rcu **mdbp, int max,
 	return 0;
 }
 
-static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
-						    __be32 group,
-						    u8 *igmp_type)
+static struct sk_buff *br_ip4_alloc_query_v2(struct net_bridge *br,
+					     __be32 group, u8 *igmp_type)
 {
 	struct sk_buff *skb;
 	struct igmphdr *ih;
@@ -428,10 +431,82 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br,
 	return skb;
 }
 
+static struct sk_buff *br_ip4_alloc_query_v3(struct net_bridge *br,
+					     __be32 group, u8 *igmp_type)
+{
+	struct sk_buff *skb;
+	struct igmpv3_query *ih3;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+
+	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*iph) +
+						 sizeof(*ih3) + 4);
+	if (!skb)
+		goto out;
+
+	skb->protocol = htons(ETH_P_IP);
+
+	skb_reset_mac_header(skb);
+	eth = eth_hdr(skb);
+
+	ether_addr_copy(eth->h_source, br->dev->dev_addr);
+	eth->h_dest[0] = 1;
+	eth->h_dest[1] = 0;
+	eth->h_dest[2] = 0x5e;
+	eth->h_dest[3] = 0;
+	eth->h_dest[4] = 0;
+	eth->h_dest[5] = 1;
+	eth->h_proto = htons(ETH_P_IP);
+	skb_put(skb, sizeof(*eth));
+
+	skb_set_network_header(skb, skb->len);
+	iph = ip_hdr(skb);
+
+	iph->version = 4;
+	iph->ihl = 6;
+	iph->tos = 0xc0;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*ih3) + 4);
+	iph->id = 0;
+	iph->frag_off = htons(IP_DF);
+	iph->ttl = 1;
+	iph->protocol = IPPROTO_IGMP;
+	iph->saddr = br->multicast_query_use_ifaddr ?
+		     inet_select_addr(br->dev, 0, RT_SCOPE_LINK) : 0;
+	iph->daddr = htonl(INADDR_ALLHOSTS_GROUP);
+	((u8 *)&iph[1])[0] = IPOPT_RA;
+	((u8 *)&iph[1])[1] = 4;
+	((u8 *)&iph[1])[2] = 0;
+	((u8 *)&iph[1])[3] = 0;
+	ip_send_check(iph);
+	skb_put(skb, 24);
+
+	skb_set_transport_header(skb, skb->len);
+	ih3 = igmpv3_query_hdr(skb);
+
+	*igmp_type = IGMP_HOST_MEMBERSHIP_QUERY;
+	ih3->type = IGMP_HOST_MEMBERSHIP_QUERY;
+	ih3->code = (group ? br->multicast_last_member_interval :
+			    br->multicast_query_response_interval) /
+		   (HZ / IGMP_TIMER_SCALE);
+	ih3->csum = 0;
+	ih3->group = group;
+	ih3->resv = 0;
+	ih3->suppress = 0;
+	ih3->qrv= 2;
+	ih3->qqic = br->multicast_query_interval / HZ;
+	ih3->nsrcs = 0;
+	ih3->csum = ip_compute_csum((void *)ih3, sizeof(struct igmpv3_query ));
+	skb_put(skb, sizeof(*ih3));
+
+	__skb_pull(skb, sizeof(*eth));
+
+out:
+	return skb;
+}
 #if IS_ENABLED(CONFIG_IPV6)
-static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
-						    const struct in6_addr *grp,
-						    u8 *igmp_type)
+static struct sk_buff *br_ip6_alloc_query_v1(struct net_bridge *br,
+					     const struct in6_addr *grp,
+					     u8 *igmp_type)
 {
 	struct sk_buff *skb;
 	struct ipv6hdr *ip6h;
@@ -514,19 +589,129 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
 out:
 	return skb;
 }
+
+static struct sk_buff *br_ip6_alloc_query_v2(struct net_bridge *br,
+					     const struct in6_addr *grp,
+					     u8 *igmp_type)
+{
+	struct sk_buff *skb;
+	struct ipv6hdr *ip6h;
+	struct mld2_query *mld2q;
+	struct ethhdr *eth;
+	u8 *hopopt;
+	unsigned long interval;
+
+	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*ip6h) +
+						 8 + sizeof(*mld2q));
+	if (!skb)
+		goto out;
+
+	skb->protocol = htons(ETH_P_IPV6);
+
+	/* Ethernet header */
+	skb_reset_mac_header(skb);
+	eth = eth_hdr(skb);
+
+	ether_addr_copy(eth->h_source, br->dev->dev_addr);
+	eth->h_proto = htons(ETH_P_IPV6);
+	skb_put(skb, sizeof(*eth));
+
+	/* IPv6 header + HbH option */
+	skb_set_network_header(skb, skb->len);
+	ip6h = ipv6_hdr(skb);
+
+	*(__force __be32 *)ip6h = htonl(0x60000000);
+	ip6h->payload_len = htons(8 + sizeof(*mld2q));
+	ip6h->nexthdr = IPPROTO_HOPOPTS;
+	ip6h->hop_limit = 1;
+	ipv6_addr_set(&ip6h->daddr, htonl(0xff020000), 0, 0, htonl(1));
+	if (ipv6_dev_get_saddr(dev_net(br->dev), br->dev, &ip6h->daddr, 0,
+			       &ip6h->saddr)) {
+		kfree_skb(skb);
+		br->has_ipv6_addr = 0;
+		return NULL;
+	}
+
+	br->has_ipv6_addr = 1;
+	ipv6_eth_mc_map(&ip6h->daddr, eth->h_dest);
+
+	hopopt = (u8 *)(ip6h + 1);
+	hopopt[0] = IPPROTO_ICMPV6;		/* next hdr */
+	hopopt[1] = 0;				/* length of HbH */
+	hopopt[2] = IPV6_TLV_ROUTERALERT;	/* Router Alert */
+	hopopt[3] = 2;				/* Length of RA Option */
+	hopopt[4] = 0;				/* Type = 0x0000 (MLD) */
+	hopopt[5] = 0;
+	hopopt[6] = IPV6_TLV_PAD1;		/* Pad1 */
+	hopopt[7] = IPV6_TLV_PAD1;		/* Pad1 */
+
+	skb_put(skb, sizeof(*ip6h) + 8);
+
+	/* ICMPv6 */
+	skb_set_transport_header(skb, skb->len);
+	mld2q = (struct mld2_query *) icmp6_hdr(skb);
+
+	interval = ipv6_addr_any(grp) ?
+			br->multicast_query_response_interval :
+			br->multicast_last_member_interval;
+
+	*igmp_type = ICMPV6_MGM_QUERY;
+	mld2q->mld2q_type = ICMPV6_MGM_QUERY;
+	mld2q->mld2q_code = 0;
+	mld2q->mld2q_cksum = 0;
+	mld2q->mld2q_mrc = htons((u16)jiffies_to_msecs(interval));
+	mld2q->mld2q_resv1 = 0;
+	mld2q->mld2q_mca = *grp;
+	mld2q->mld2q_resv2 = 0;
+	mld2q->mld2q_suppress = 0;
+	mld2q->mld2q_qrv = 2;
+	mld2q->mld2q_qqic = br->multicast_query_interval / HZ;
+	mld2q->mld2q_nsrcs = 0;
+
+	/* checksum */
+	mld2q->mld2q_cksum = csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
+					  sizeof(*mld2q), IPPROTO_ICMPV6,
+					  csum_partial(mld2q,
+						       sizeof(*mld2q), 0));
+	skb_put(skb, sizeof(*mld2q));
+
+	__skb_pull(skb, sizeof(*eth));
+
+out:
+	return skb;
+}
 #endif
 
+static int mld_force_mld_version(const struct inet6_dev *idev)
+{
+	if (dev_net(idev->dev)->ipv6.devconf_all->force_mld_version != 0)
+		return dev_net(idev->dev)->ipv6.devconf_all->force_mld_version;
+	else
+		return idev->cnf.force_mld_version;
+}
+
 static struct sk_buff *br_multicast_alloc_query(struct net_bridge *br,
 						struct br_ip *addr,
 						u8 *igmp_type)
 {
+	struct in_device *in_dev = __in_dev_get_rcu(br->dev);
+	struct inet6_dev *idev = __in6_dev_get(br->dev);
 	switch (addr->proto) {
 	case htons(ETH_P_IP):
-		return br_ip4_multicast_alloc_query(br, addr->u.ip4, igmp_type);
+		if (IGMP_V3_SEEN(in_dev))
+			return br_ip4_alloc_query_v3(br, addr->u.ip4,
+						     igmp_type);
+		else
+			return br_ip4_alloc_query_v2(br, addr->u.ip4,
+						     igmp_type);
 #if IS_ENABLED(CONFIG_IPV6)
 	case htons(ETH_P_IPV6):
-		return br_ip6_multicast_alloc_query(br, &addr->u.ip6,
-						    igmp_type);
+		if (mld_force_mld_version(idev) == 2)
+			return br_ip6_alloc_query_v2(br, &addr->u.ip6,
+						     igmp_type);
+		else
+			return br_ip6_alloc_query_v1(br, &addr->u.ip6,
+						     igmp_type);
 #endif
 	}
 	return NULL;
-- 
2.5.5

^ permalink raw reply related

* pull-request: mac80211 2016-11-18
From: Johannes Berg @ 2016-11-18  7:52 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA

Hi Dave,

Due to travel/vacation, this is a bit late, but there aren't
that many fixes either. Most interesting/important are the
fixes from Felix and perhaps the scan entry limit.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit 269ebce4531b8edc4224259a02143181a1c1d77c:

  xen-netfront: cast grant table reference first to type int (2016-11-02 15:33:36 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-davem-2016-11-18

for you to fetch changes up to 9853a55ef1bb66d7411136046060bbfb69c714fa:

  cfg80211: limit scan results cache size (2016-11-18 08:44:44 +0100)

----------------------------------------------------------------
A few more bugfixes:
 * limit # of scan results stored in memory - this is a long-standing bug
   Jouni and I only noticed while discussing other things in Santa Fe
 * revert AP_LINK_PS patch that was causing issues (Felix)
 * various A-MSDU/A-MPDU fixes for TXQ code (Felix)
 * interoperability workaround for peers with broken VHT capabilities
   (Filip Matusiak)
 * add bitrate definition for a VHT MCS that's supposed to be invalid
   but gets used by some hardware anyway (Thomas Pedersen)
 * beacon timer fix in hwsim (Benjamin Beichler)

----------------------------------------------------------------
Benjamin Beichler (1):
      mac80211_hwsim: fix beacon delta calculation

Felix Fietkau (4):
      Revert "mac80211: allow using AP_LINK_PS with mac80211-generated TIM IE"
      mac80211: update A-MPDU flag on tx dequeue
      mac80211: remove bogus skb vif assignment
      mac80211: fix A-MSDU aggregation with fast-xmit + txq

Filip Matusiak (1):
      mac80211: Ignore VHT IE from peer with wrong rx_mcs_map

Johannes Berg (1):
      cfg80211: limit scan results cache size

Pedersen, Thomas (1):
      cfg80211: add bitrate for 20MHz MCS 9

 drivers/net/wireless/mac80211_hwsim.c |  2 +-
 net/mac80211/sta_info.c               |  2 +-
 net/mac80211/tx.c                     | 14 +++++--
 net/mac80211/vht.c                    | 16 ++++++++
 net/wireless/core.h                   |  1 +
 net/wireless/scan.c                   | 69 +++++++++++++++++++++++++++++++++++
 net/wireless/util.c                   |  3 +-
 7 files changed, 100 insertions(+), 7 deletions(-)

^ permalink raw reply

* RE: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Hayes Wang @ 2016-11-18  7:57 UTC (permalink / raw)
  To: Mark Lord, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: nic_swsd, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <d683c019-4e0f-6fe6-368c-c4fc86c72fe6-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>

Mark Lord [mailto:mlord-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org]
> Sent: Thursday, November 17, 2016 9:42 PM
[...]
> What the above sample shows, is the URB transfer buffer ran out of space in the
> middle
> of a packet, and the hardware then tried to just continue that same packet in the
> next URB,
> without an rx_desc header inserted.  The r8152.c driver always assumes the URB
> buffer begins
> with an rx_desc, so of course this behaviour produces really weird effects, and
> system crashes, etc..

The USB device wouldn't know the address and size of buffer. Only
the USB host controller knows. Therefore, the device sends the
data to host, and the host fills the memory. According to your
description, it seems the host splits the data from the device
into two different buffers (or URB transfers). I wonder if it would
occur. As far as I know, the host wouldn't allow the buffer size
less than the data length.

Our hw engineers need the log from the USB analyzer to confirm
what the device sends to the host. However, I don't think you
have USB analyzer to do this. I would try to reproduce the issue.
But, I am busy, so I don't think I would response quickly.

Besides, the maximum data length which the RTL8152 would send to
the host is 16KB. That is, if the agg_buf_sz is 16KB, the host
wouldn't split it. However, you still see problems for it.

[...]
> It is not clear to me how the chip decides when to forward an rx URB to the host.
> If you could describe how that part works for us, then it would help in further
> understanding why fast systems (eg. a PC) don't generally notice the issue,
> while much slower embedded systems do see the issue regularly.

The driver expects the rx buffer would be

	rx_desc + a packet + padding to 8 alignment + 
	rx_desc + a packet + padding to 8 alignment + ...
	
Therefore, when a urb transfer is completed, the driver parsers
the buffer by this way. After the buffer is handled, it would
be submitted to the host, until the transfer is completed again.
If the submitting fail, the driver would try again later. The
urb->actual_length means how much data the host fills. The drive
uses it to check the end of the data. The urb->status mean if
the transfer is successful. The driver submits the urb to the
host directly if the status is not successful.

Best Regards,
Hayes

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 1/2] vhost: remove unused feature bit
From: Jason Wang @ 2016-11-18  7:58 UTC (permalink / raw)
  To: mst, jasowang; +Cc: netdev, linux-kernel, kvm, virtualization

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/uapi/linux/vhost.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 56b7ab5..60180c0 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -172,8 +172,6 @@ struct vhost_memory {
 #define VHOST_F_LOG_ALL 26
 /* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
 #define VHOST_NET_F_VIRTIO_NET_HDR 27
-/* Vhost have device IOTLB */
-#define VHOST_F_DEVICE_IOTLB 63
 
 /* VHOST_SCSI specific definitions */
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/2] vhost: forbid IOTLB invalidation when not enabled
From: Jason Wang @ 2016-11-18  7:58 UTC (permalink / raw)
  To: mst, jasowang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1479455920-3285-1-git-send-email-jasowang@redhat.com>

When IOTLB is not enabled, we should forbid IOTLB invalidation to
avoid a NULL pointer dereference.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c6f2d89..7d338d5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -959,6 +959,10 @@ int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		vhost_iotlb_notify_vq(dev, msg);
 		break;
 	case VHOST_IOTLB_INVALIDATE:
+		if (!dev->iotlb) {
+			ret = -EFAULT;
+			break;
+		}
 		vhost_del_umem_range(dev->iotlb, msg->iova,
 				     msg->iova + msg->size - 1);
 		break;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next v4 3/5] bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info
From: Gregory CLEMENT @ 2016-11-18  8:32 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem, mw, arnd, Shaohui.Xie, andrew
In-Reply-To: <20161117191914.11077-4-f.fainelli@gmail.com>

Hi Florian,
 
 On jeu., nov. 17 2016, Florian Fainelli <f.fainelli@gmail.com> wrote:

> In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST,
> provide an inline stub for mvebu_mbus_get_dram_win_info().

Actually the set of SoCs supporting mbus is more reduce than MVEBU. You
can have a look on 434cec62a6d7 ("bus: mvebu-mbus: Provide stub function
for mvebu_mbus_get_io_win_info()"), PLAT_ORION seems the good option.


Thanks,

Gregory

>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  include/linux/mbus.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/include/linux/mbus.h b/include/linux/mbus.h
> index 2931aa43dab1..0d3f14fd2621 100644
> --- a/include/linux/mbus.h
> +++ b/include/linux/mbus.h
> @@ -82,6 +82,7 @@ static inline int mvebu_mbus_get_io_win_info(phys_addr_t phyaddr, u32 *size,
>  }
>  #endif
>  
> +#ifdef CONFIG_MVEBU_MBUS
>  int mvebu_mbus_save_cpu_target(u32 __iomem *store_addr);
>  void mvebu_mbus_get_pcie_mem_aperture(struct resource *res);
>  void mvebu_mbus_get_pcie_io_aperture(struct resource *res);
> @@ -97,5 +98,12 @@ int mvebu_mbus_init(const char *soc, phys_addr_t mbus_phys_base,
>  		    size_t mbus_size, phys_addr_t sdram_phys_base,
>  		    size_t sdram_size);
>  int mvebu_mbus_dt_init(bool is_coherent);
> +#else
> +static inline int mvebu_mbus_get_dram_win_info(phys_addr_t phyaddr, u8 *target,
> +					       u8 *attr)
> +{
> +	return -EINVAL;
> +}
> +#endif /* CONFIG_MVEBU_MBUS */
>  
>  #endif /* __LINUX_MBUS_H */
> -- 
> 2.9.3
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH net-next 1/4] geneve: Unify LWT and netdev handling.
From: kbuild test robot @ 2016-11-18  8:40 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: kbuild-all, netdev, Pravin B Shelar
In-Reply-To: <1479453029-29619-2-git-send-email-pshelar@ovn.org>

[-- Attachment #1: Type: text/plain, Size: 2771 bytes --]

Hi Pravin,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Pravin-B-Shelar/geneve-Use-LWT-more-effectively/20161118-151426
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   drivers/net/geneve.c: In function 'geneve_xmit':
>> drivers/net/geneve.c:942:10: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized]
     else if (err == -ENETUNREACH)
             ^

vim +/err +942 drivers/net/geneve.c

8ed66f0e John W. Linville 2015-10-26  926  	}
8eb3b995 Daniel Borkmann  2016-03-09  927  
e6efd144 Pravin B Shelar  2016-11-17  928  #if IS_ENABLED(CONFIG_IPV6)
e6efd144 Pravin B Shelar  2016-11-17  929  	if (info->mode & IP_TUNNEL_INFO_IPV6)
e6efd144 Pravin B Shelar  2016-11-17  930  		err = geneve6_xmit_skb(skb, dev, geneve, info);
e6efd144 Pravin B Shelar  2016-11-17  931  	else
e6efd144 Pravin B Shelar  2016-11-17  932  #endif
e6efd144 Pravin B Shelar  2016-11-17  933  		err = geneve_xmit_skb(skb, dev, geneve, info);
8ed66f0e John W. Linville 2015-10-26  934  
e6efd144 Pravin B Shelar  2016-11-17  935  	if (likely(!err))
e6efd144 Pravin B Shelar  2016-11-17  936  		return NETDEV_TX_OK;
8ed66f0e John W. Linville 2015-10-26  937  tx_error:
8ed66f0e John W. Linville 2015-10-26  938  	dev_kfree_skb(skb);
aed069df Alexander Duyck  2016-04-14  939  
8ed66f0e John W. Linville 2015-10-26  940  	if (err == -ELOOP)
8ed66f0e John W. Linville 2015-10-26  941  		dev->stats.collisions++;
8ed66f0e John W. Linville 2015-10-26 @942  	else if (err == -ENETUNREACH)
8ed66f0e John W. Linville 2015-10-26  943  		dev->stats.tx_carrier_errors++;
efeb2267 Haishuang Yan    2016-06-21  944  
8ed66f0e John W. Linville 2015-10-26  945  	dev->stats.tx_errors++;
8ed66f0e John W. Linville 2015-10-26  946  	return NETDEV_TX_OK;
8ed66f0e John W. Linville 2015-10-26  947  }
8ed66f0e John W. Linville 2015-10-26  948  
91572088 Jarod Wilson     2016-10-20  949  static int geneve_change_mtu(struct net_device *dev, int new_mtu)
55e5bfb5 David Wragg      2016-02-10  950  {

:::::: The code at line 942 was first introduced by commit
:::::: 8ed66f0e8235118a31720acdab3bbbe9debd0f6a geneve: implement support for IPv6-based tunnels

:::::: TO: John W. Linville <linville@tuxdriver.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56934 bytes --]

^ permalink raw reply

* RE: [PATCH v9 0/8] thunderbolt: Introducing Thunderbolt(TM) Networking
From: Levy, Amir (Jer) @ 2016-11-18  8:48 UTC (permalink / raw)
  To: Simon Guinot
  Cc: gregkh@linuxfoundation.org, andreas.noever@gmail.com,
	bhelgaas@google.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, mario_limonciello@dell.com,
	thunderbolt-linux, Westerberg, Mika, Winkler, Tomas,
	Zhang, Xiong Y, Jamet, Michael, remi.rerolle@seagate.com
In-Reply-To: <20161115105920.GH6167@kw.sim.vm.gnt>

On Tue, Nov 15 2016, 12:59 PM, Simon Guinot wrote:
> On Wed, Nov 09, 2016 at 03:42:53PM +0000, Levy, Amir (Jer) wrote:
> > On Wed, Nov 9 2016, 04:36 PM, Simon Guinot wrote:
> > > Hi Amir,
> > >
> > > I have an ASUS "All Series/Z87-DELUXE/QUAD" motherboard with a 
> > > Thunderbolt 2 "Falcon Ridge" chipset (device ID 156d).
> > >
> > > Is the thunderbolt-icm driver supposed to work with this chipset ?
> > >
> >
> > Yes, the thunderbolt-icm supports Falcon Ridge, device ID 156c.
> > 156d is the bridge -
> > http://lxr.free-electrons.com/source/include/linux/pci_ids.h#L2619
> >
> > > I have installed both a 4.8.6 Linux kernel (patched with your v9
> > > series) and the thunderbolt-software-daemon (27 october release) 
> > > inside a Debian system (Jessie).
> > >
> > > If I connect the ASUS motherboard with a MacBook Pro (Thunderbolt 
> > > 2, device ID 156c), I can see that the thunderbolt-icm driver is 
> > > loaded and that the thunderbolt-software-daemon is well started. 
> > > But the Ethernet interface is not created.
> > >
> > > I have attached to this email the syslog file. There is the logs 
> > > from both the kernel and the daemon inside. Note that the daemon 
> > > logs are everything but clear about what could be the issue. Maybe 
> > > I missed some kind of configuration ? But I failed to find any 
> > > valuable information about configuring the driver and/or the 
> > > daemon in
> the various documentation files.
> > >
> > > Please, can you provide some guidance ? I'd really like to test 
> > > your patch series.
> >
> > First, thank you very much for willing to test it.
> > Thunderbolt Networking support was added during Falcon Ridge, in the
> latest FR images.
> > Do you know which Thunderbolt image version you have on your system?
> > Currently I submitted only Thunderbolt Networking feature in Linux, 
> > and we plan to add more features like reading the image version and
> updating the image.
> > If you don't know the image version, the only thing I can suggest is 
> > to load windows, install thunderbolt SW and check in the Thunderbolt
> application the image version.
> > To know if image update is needed, you can check - 
> > https://thunderbolttechnology.net/updates
> 
> Hi Amir,
> 
> From the Windows Thunderbolt software, I can read 13.00 for the 
> firmware version. And from https://thunderbolttechnology.net/updates, 
> I can see that there is no update available for my ASUS motherboard.
> 
> Am I good to go ?
> 

Thunderbolt Networking is supported on both Thunderbolt(tm) 2 and Thunderbolt(tm) 3 systems.  
Thunderbolt 2 systems must have updated NVM (version 25 or later) in order for the functionality to work properly.  
If the system does not have the update, please contact the OEM directly for an updated NVM.  
For best functionality and support, Intel recommends using Thunderbolt 3 systems for all validation and testing.

> BTW, it is quite a shame that the Thunderbolt firmware version can't 
> be read from Linux.
> 

This is WIP, once this patch will be upstream, we will be able to focus more
on aligning Linux with the Thunderbolt features that we have for windows.

Regards,
Amir

^ permalink raw reply

* Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
From: Nicolas Ferre @ 2016-11-18  9:10 UTC (permalink / raw)
  To: Rafal Ozieblo, Harini Katakam, Andrei Pistirica
  Cc: harini.katakam@xilinx.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <BN3PR07MB25166D8301C60C4A35402AB1C9B00@BN3PR07MB2516.namprd07.prod.outlook.com>

Le 18/11/2016 à 09:59, Rafal Ozieblo a écrit :
> Hello,
> 
>> From: Harini Katakam [mailto:harinikatakamlinux@gmail.com]
>> Sent: 18 listopada 2016 05:30
>> To: Rafal Ozieblo
>> Cc: Nicolas Ferre; harini.katakam@xilinx.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
>>
>> Hi Rafal,
>>
>> On Thu, Nov 17, 2016 at 7:05 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:
>>> -----Original Message-----
>>> From: Nicolas Ferre [mailto:nicolas.ferre@atmel.com]
>>> Sent: 17 listopada 2016 14:29
>>> To: Harini Katakam; Rafal Ozieblo
>>> Cc: harini.katakam@xilinx.com; netdev@vger.kernel.org;
>>> linux-kernel@vger.kernel.org
>>> Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support
>>> for GEM
>>>
>>>> Le 17/11/2016 à 13:21, Harini Katakam a écrit :
>>>>> Hi Rafal,
>>>>>
>>>>> On Thu, Nov 17, 2016 at 5:20 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:
>>>>>> Hello,
>>>>>> I think, there could a bug in your patch.
>>>>>>
>>>>>>> +
>>>>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>>>>> +             dmacfg |= GEM_BIT(ADDR64); #endif
>>>>>>
>>>>>> You enable 64 bit addressing (64b dma bus width) always when appropriate architecture config option is enabled.
>>>>>> But there are some legacy controllers which do not support that feature. According Cadence hardware team:
>>>>>> "64 bit addressing was added in July 2013. Earlier version do not have it.
>>>>>> This feature was enhanced in release August 2014 to have separate upper address values for transmit and receive."
>>>>>>
>>>>>>> /* Bitfields in NSR */
>>>>>>> @@ -474,6 +479,10 @@
>>>>>>>  struct macb_dma_desc {
>>>>>>  >      u32     addr;
>>>>>>>       u32     ctrl;
>>>>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>>>>> +     u32     addrh;
>>>>>>> +     u32     resvd;
>>>>>>> +#endif
>>>>>>>  };
>>>>>>
>>>>>> It will not work for legacy hardware. Old descriptor is 2 words wide, the new one is 4 words wide.
>>>>>> If you enable CONFIG_ARCH_DMA_ADDR_T_64BIT but hardware doesn't
>>>>>> support it at all, you will miss every second descriptor.
>>>>>>
>>>>>
>>>>> True, this feature is not available in all of Cadence IP versions.
>>>>> In fact, the IP version Zynq does not support this. But the one in ZynqMP does.
>>>>> So, we enable kernel config for 64 bit DMA addressing for this SoC
>>>>> and hence the driver picks it up. My assumption was that if the
>>>>> legacy IP does not support
>>>>> 64 bit addressing, then this DMA option wouldn't be enabled.
>>>>>
>>>>> There is a design config register in Cadence IP which is being read
>>>>> to check for 64 bit address support - DMA mask is set based on that.
>>>>> But the addition of two descriptor words cannot be based on this runtime check.
>>>>> For this reason, all the static changes were placed under this check.
>>>>
>>>> We have quite a bunch of options in this driver to determinate what is the real capacity of the underlying hardware.
>>>> If HW configuration registers are not appropriate, and it seems they are not, I would advice to simply use the DT compatibility string.
>>>>
>>>> Best regards,
>>>> --
>>>> Nicolas Ferre
>>>
>>> HW configuration registers are appropriate. The issue is that this code doesn’t use the capability bit to switch between different dma descriptors (2 words vs. 4 words).
>>> DMA descriptor size is chosen based on kernel configuration, not based on hardware capabilities.
>>
>> HW configuration register does give appropriate information.
>> But addition of two address words in the macb descriptor structure is a static change.
>>
>> +static inline void macb_set_addr(struct macb_dma_desc *desc, dma_addr_t
>> +addr) {
>> +       desc->addr = (u32)addr;
>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>> +       desc->addrh = (u32)(addr >> 32); #endif
>> +
>>
>> Even if the #ifdef condition here is changed to HW config check, addr and addrh are different.
>> And "addrh" entry has to be present for 64 bit desc case to be handled separately.
>> Can you please tell me how you propose change in DMA descriptor structure from
>> 4 to 2 or 2 to 4 words *after* reading the DCFG register?
> 
> It is very complex problem. I wrote to you because I faced the same issue. I'm working on PTP implementation for macb. When PTP is enabled there are additional two words in buffer descriptor.

Moreover, we can use PTP without the 64bits descriptor support (early
GEM revisions).

BTW, note that there is an initiative ongoing with Andrei to support
PTP/1588 on these devices with patches already send: please synchronize
with him.
https://marc.info/?l=linux-netdev&m=147282092309112&w=3
or
https://patchwork.kernel.org/patch/9310989/
and
https://patchwork.kernel.org/patch/9310991/

Regards,

> But hardware might not be compatible with PTP. Therefore I have to change DMA descriptor between 2 and 4 words after reading DCFG register (the same as you between 32b and 64b).
> When we consider both PTP and 64 bits, we end up with 4 different descriptors!
> 
> 1. 32b and without PTP support:
> 
> struct macb_dma_desc {
> 	u32	addr;
> 	u32	ctrl;
> }
> 
> 2. 64b and without PTP support:
> 
> struct macb_dma_desc {
> 	u32	addr;
> 	u32	ctrl;
> 	u32 	addrh;
> 	u32  	resvd;
> }
> 
> 3. 32 and PTP support:
> 
> struct macb_dma_desc {
> 	u32	addr;
> 	u32	ctrl;
> 	u32	dma_desc_ts_1;
> 	u32	dma_desc_ts_2;
> };
> 
> 4. 64b and PTP support:
> 
> struct macb_dma_desc {
> 	u32 	addr;
> 	u32 	ctrl;
> 	u32 	addrh;
> 	u32  	resvd;
> 	u32 	dma_desc_ts_1;
> 	u32	dma_desc_ts_2;
> };
> 
>> Can you please tell me how you propose change in DMA descriptor structure from
>> 4 to 2 or 2 to 4 words *after* reading the DCFG register?
> 
> I have some ideas but I don’t know whether it is possible to upstream it later on.
> IMHO, we need to use the same mechanism in both our cases.
> 
> My very first idea to start discussion:
> (Below is kind of pseudo code only to show my idea. All defines like 64B or PTP are omitted for simplicity)
> 
> 1. Prepare appropriate structures:
> 
> struct macb_dma_desc {
> 	u32	addr;
> 	u32	ctrl;
> }
> 
> struct macb_dma_desc_64 {
> 	u32 addrh;
> 	u32 resvd;
> }
> 
> struct macb_dma_desc_ptp {
> 	u32 dma_desc_ts_1;
> 	u32	dma_desc_ts_2;
> }
> 
> 2. Add hardware support information to macb:
> 
> enum macb_hw_cap {
> 	HW_CAP_NONE,
> 	HW_CAP_64B,
> 	HW_CAP_PTP,
> 	HW_CAP_64B_PTP,
> };
> 
> struct macb {
> // (...)
> 	macb_hw_cap hw_cap;
> 
> }
> 
> 3. Set bp->hw_cap on macb_probe()
> 
> 4. Additional function might be helpful:
> 
> static unsigned char macb_dma_desc_get_mul(struct macb *bp)
> {
> 	switch (bp->hw_cap) {
> 		case HW_CAP_NONE:
> 			return 1;
> 		case HW_CAP_64B:
> 		case HW_CAP_PTP:
> 			return 2;
> 		case HW_CAP_64B_PTP:
> 			return 3;
> 	}
> }
> 
> 5. Change sizeof struct to function:
> (sizeof(struct macb_dma_desc) change to macb_dma_desc_get_size(bp). It will return sizeof(struct macb_dma_desc) * macb_dma_desc_get_mul(bp).
> There is a hidden assumption that all three structures have the same size.
> 
> 6. macb_rx_desc() and macb_tx_desc() will still return struct macb_dma_desc * but descriptor index will be counted in different way, ex.
> 
> static struct macb_dma_desc *macb_rx_desc(struct macb *bp, unsigned int index)
> {
> 	index *= macb_dma_desc_get_mul(bp);
> 	return &bp->rx_ring[macb_rx_ring_wrap(bp, index)];
> }
> 
> 7. Two additional functions to dereference PTP and 64b information:
> 
> static struct macb_dma_desc_64 *macb_64b_desc(struct macb *bp, struct macb_dma_desc *desc)
> {
> 	switch (bp->hw_cap) {
> 		case HW_CAP_64B:
> 		case HW_CAP_64B_PTP:
> 			return (struct macb_dma_desc_64 *)(desc + 1);  // ugly casting
> 		default:
> 			return NULL;
> 	}
> }
> 
> static struct macb_dma_desc_ptp *macb_ptp_desc(struct macb *bp, struct macb_dma_desc *desc)
> {
> 	switch (bp->hw_cap) {
> 		case HW_CAP_PTP:
> 			return (struct macb_dma_desc_ptp *)(desc + 1);
> 		case HW_CAP_64B_PTP:
> 			return (struct macb_dma_desc_ptp *)(desc + 2);
> 		default:
> 			return NULL;
> 	}
> }
> 
> Whenever you want to reach fields in appropriate descriptor, above function should be used.
> 
> This is only my very first idea. Of course, we can leave it as it is and say, that old hardware is not support either when PTP enabled or 64b enabled.
> 


-- 
Nicolas Ferre

^ permalink raw reply

* RE: [PATCH iproute2 1/1] tc: updated man page to reflect GET command to retrieve a single filter.
From: Sathya Perla @ 2016-11-18  9:11 UTC (permalink / raw)
  To: Roman Mashak, davem; +Cc: netdev, jhs, stephen
In-Reply-To: <1478485658-14379-1-git-send-email-mrv@mojatatu.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
On Behalf Of Roman Mashak
>
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
>  man/man8/tc.8 | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/man/man8/tc.8 b/man/man8/tc.8 index 4e99dca..3f4f8b5 100644
> --- a/man/man8/tc.8
> +++ b/man/man8/tc.8
> @@ -28,7 +28,7 @@ class-id ] qdisc
>
>  .B tc
>  .RI "[ " OPTIONS " ]"
> -.B filter [ add | change | replace | delete ] dev
> +.B filter [ add | change | replace | delete | get ] dev
>  DEV
>  .B [ parent
>  qdisc-id

The handle parameter seems to be missing from the TC filter syntax
synopsis ; as in:
tc [ OPTIONS ] filter [ add | change | replace | delete | get ]  dev  DEV
[
       parent  qdisc-id  | root ] protocol protocol prio priority handle
handle-ID

^^^^^^^^^^^^^

I guess the user would have to specify the handle-ID to retrieve a single
filter.

thanks,
-Sathya

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: unbreak xfrm_sk_policy_lookup
From: Steffen Klassert @ 2016-11-18  9:17 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <1479385306-24731-1-git-send-email-fw@strlen.de>

On Thu, Nov 17, 2016 at 01:21:46PM +0100, Florian Westphal wrote:
> if we succeed grabbing the refcount, then
>   if (err && !xfrm_pol_hold_rcu)
> 
> will evaluate to false so this hits last else branch which then
> sets policy to ERR_PTR(0).
> 
> Fixes: ae33786f73a7ce ("xfrm: policy: only use rcu in xfrm_sk_policy_lookup")
> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied to the ipsec tree, thanks everyone!

^ permalink raw reply

* RE: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
From: Rafal Ozieblo @ 2016-11-18  9:30 UTC (permalink / raw)
  To: Nicolas Ferre, Harini Katakam, Andrei Pistirica
  Cc: harini.katakam@xilinx.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <a315b554-8b44-ac82-a104-e382940750bb@atmel.com>

>-----Original Message-----         
>From: Nicolas Ferre [mailto:nicolas.ferre@atmel.com] 
>Sent: 18 listopada 2016 10:10                        
>To: Rafal Ozieblo; Harini Katakam; Andrei Pistirica  
>Cc: harini.katakam@xilinx.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM      
>                                                                                   
>Le 18/11/2016 à 09:59, Rafal Ozieblo a écrit :                                     
>> Hello,                                                                           
>>                                                                                  
>>> From: Harini Katakam [mailto:harinikatakamlinux@gmail.com]                      
>>> Sent: 18 listopada 2016 05:30                                                   
>>> To: Rafal Ozieblo                                                               
>>> Cc: Nicolas Ferre; harini.katakam@xilinx.com; netdev@vger.kernel.org;           
>>> linux-kernel@vger.kernel.org                                                    
>>> Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support           
>>> for GEM                                                                         
>>>                                                                                 
>>> Hi Rafal,                                                                       
>>>                                                                                 
>>> On Thu, Nov 17, 2016 at 7:05 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:      
>>>> -----Original Message-----                                                     
>>>> From: Nicolas Ferre [mailto:nicolas.ferre@atmel.com]                           
>>>> Sent: 17 listopada 2016 14:29                                                  
>>>> To: Harini Katakam; Rafal Ozieblo                                              
>>>> Cc: harini.katakam@xilinx.com; netdev@vger.kernel.org;                         
>>>> linux-kernel@vger.kernel.org                                                   
>>>> Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing                  
>>>> support for GEM                                                                
>>>>                                                                                
>>>>> Le 17/11/2016 à 13:21, Harini Katakam a écrit :                               
>>>>>> Hi Rafal,                                                                    
>>>>>>                                                                              
>>>>>> On Thu, Nov 17, 2016 at 5:20 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:   
>>>>>>> Hello,                                                                      
>>>>>>> I think, there could a bug in your patch.                                   
>>>>>>>                                                                             
>>>>>>>> +                                                                          
>>>>>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT                                       
>>>>>>>> +             dmacfg |= GEM_BIT(ADDR64); #endif                            
>>>>>>>                                                                             
>>>>>>> You enable 64 bit addressing (64b dma bus width) always when appropriate architecture config option is enabled.                                                                                           
>>>>>>> But there are some legacy controllers which do not support that feature. According Cadence hardware team:                                                                                                 
>>>>>>> "64 bit addressing was added in July 2013. Earlier version do not have it.                       
>>>>>>> This feature was enhanced in release August 2014 to have separate upper address values for transmit and receive."                                                                                         
>>>>>>>                                                                                                  
>>>>>>>> /* Bitfields in NSR */                                                                          
>>>>>>>> @@ -474,6 +479,10 @@                                                                            
>>>>>>>>  struct macb_dma_desc {                                                                         
>>>>>>>  >      u32     addr;                                                                            
>>>>>>>>       u32     ctrl;                                                                             
>>>>>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT                                                            
>>>>>>>> +     u32     addrh;                                                                            
>>>>>>>> +     u32     resvd;                                                                            
>>>>>>>> +#endif                                                                                         
>>>>>>>>  };                                                                                             
>>>>>>>                                                                                                  
>>>>>>> It will not work for legacy hardware. Old descriptor is 2 words wide, the new one is 4 words wide.                                                                                                        
>>>>>>> If you enable CONFIG_ARCH_DMA_ADDR_T_64BIT but hardware doesn't                                  
>>>>>>> support it at all, you will miss every second descriptor.                                        
>>>>>>>                                                                                                  
>>>>>>                                                                                                   
>>>>>> True, this feature is not available in all of Cadence IP versions.                                
>>>>>> In fact, the IP version Zynq does not support this. But the one in ZynqMP does.                   
>>>>>> So, we enable kernel config for 64 bit DMA addressing for this SoC                                
>>>>>> and hence the driver picks it up. My assumption was that if the                                   
>>>>>> legacy IP does not support                                                                        
>>>>>> 64 bit addressing, then this DMA option wouldn't be enabled.                                      
>>>>>>                                                                                                   
>>>>>> There is a design config register in Cadence IP which is being                                    
>>>>>> read to check for 64 bit address support - DMA mask is set based on that.                         
>>>>>> But the addition of two descriptor words cannot be based on this runtime check.                   
>>>>>> For this reason, all the static changes were placed under this check.                             
>>>>>                                                                                                    
>>>>> We have quite a bunch of options in this driver to determinate what is the real capacity of the underlying hardware.                                                                                        
>>>>> If HW configuration registers are not appropriate, and it seems they are not, I would advice to simply use the DT compatibility string.                                                                     
>>>>>                                                                                                    
>>>>> Best regards,                                                                                      
>>>>> --                                                                                                 
>>>>> Nicolas Ferre                                                                                      
>>>>                                                                                                     
>>>> HW configuration registers are appropriate. The issue is that this code doesn’t use the capability bit to switch between different dma descriptors (2 words vs. 4 words).                                    
>>>> DMA descriptor size is chosen based on kernel configuration, not based on hardware capabilities.    
>>>                                                                                                      
>>> HW configuration register does give appropriate information.                                         
>>> But addition of two address words in the macb descriptor structure is a static change.               
>>>                                                                                                      
>>> +static inline void macb_set_addr(struct macb_dma_desc *desc,                                        
>>> +dma_addr_t                                                                                          
>>> +addr) {                                                                                             
>>> +       desc->addr = (u32)addr;                                                                      
>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT                                                                 
>>> +       desc->addrh = (u32)(addr >> 32); #endif                                                      
>>> +                                                                                                    
>>>                                                                                                      
>>> Even if the #ifdef condition here is changed to HW config check, addr and addrh are different.       
>>> And "addrh" entry has to be present for 64 bit desc case to be handled separately.                   
>>> Can you please tell me how you propose change in DMA descriptor                                      
>>> structure from                                                                                       
>>> 4 to 2 or 2 to 4 words *after* reading the DCFG register?                                            
>>                                                                                                       
>> It is very complex problem. I wrote to you because I faced the same issue. I'm working on PTP implementation for macb. When PTP is enabled there are additional two words in buffer descriptor.                
>                                                                                                        
>Moreover, we can use PTP without the 64bits descriptor support (early GEM revisions).                   
>                                                                                                        
>BTW, note that there is an initiative ongoing with Andrei to support                                    
>PTP/1588 on these devices with patches already send: please synchronize with him.                       
>https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dnetdev-26m-3D147282092309112-26w-3D3&d=DgIDaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=MWSZN_jP4BHjuU9UFsrndALGeJqIXzD0WtGCtCg4L5E&m=jUmZzqwn1uRrvcqF3BHCHFC6ZsKoZkU3vT8gJ-7fnsE&s=kr_km0MKQBzpkaOt0fkM-FIajN1-pOylzzTjsXi-vak&e=    
>or                                                                                                      
>https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_9310989_&d=DgIDaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=MWSZN_jP4BHjuU9UFsrndALGeJqIXzD0WtGCtCg4L5E&m=jUmZzqwn1uRrvcqF3BHCHFC6ZsKoZkU3vT8gJ-7fnsE&s=ZbXVD5Lb5zGn7wUKIPYHxagIEKp_vJvrnkRa4qJfmTY&e=                              
>and                                                                                                     
>https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_9310991_&d=DgIDaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=MWSZN_jP4BHjuU9UFsrndALGeJqIXzD0WtGCtCg4L5E&m=jUmZzqwn1uRrvcqF3BHCHFC6ZsKoZkU3vT8gJ-7fnsE&s=Z0kRqUqh5Is4TmuaY7A4hnpizfdeq3HrgPhyDAMLuA8&e=                              
>                                                                                                        
>Regards,                                                                                                
>--
>Nicolas Ferre

Above patch doesn’t use hardware timestamps in descriptors at all. It doesn't use appropriate accuracy. 
We have our PTP patch ready, which use timestamps from descriptor. But we have not sent it yet because of compatibility issue.
Of course, we can do it like Harini did:

struct macb_dma_desc {
	u32	addr;
	u32	ctrl;
#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
	u32     addrh;
	u32     resvd;
#endif
#if IS_ENABLED(CONFIG_PTP_1588_CLOCK)
	u32	dma_desc_ts_1;
	u32	dma_desc_ts_2;
#endif
};

But in that approach we lose backward compatibility.

What are your suggestion? Can we send patch like it is or wait for some common solution with backward compatibility?

Best regards,

Rafal Ozieblo   |   Firmware System Engineer, 
phone nbr.: +48 32 5085469    
www.cadence.com

^ permalink raw reply

* RE: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
From: Rafal Ozieblo @ 2016-11-18  8:59 UTC (permalink / raw)
  To: Harini Katakam
  Cc: Nicolas Ferre, harini.katakam@xilinx.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAFcVEC+RqUMQF39oe2EuCoisX6dCwRQ3G81zFMqoOHU3HpfPHQ@mail.gmail.com>

Hello,

> From: Harini Katakam [mailto:harinikatakamlinux@gmail.com]
> Sent: 18 listopada 2016 05:30
> To: Rafal Ozieblo
> Cc: Nicolas Ferre; harini.katakam@xilinx.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
>
> Hi Rafal,
>
> On Thu, Nov 17, 2016 at 7:05 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:
> > -----Original Message-----
> > From: Nicolas Ferre [mailto:nicolas.ferre@atmel.com]
> > Sent: 17 listopada 2016 14:29
> > To: Harini Katakam; Rafal Ozieblo
> > Cc: harini.katakam@xilinx.com; netdev@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support
> > for GEM
> >
> >> Le 17/11/2016 à 13:21, Harini Katakam a écrit :
> >> > Hi Rafal,
> >> >
> >> > On Thu, Nov 17, 2016 at 5:20 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:
> >> >> Hello,
> >> >> I think, there could a bug in your patch.
> >> >>
> >> >>> +
> >> >>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> >> >>> +             dmacfg |= GEM_BIT(ADDR64); #endif
> >> >>
> >> >> You enable 64 bit addressing (64b dma bus width) always when appropriate architecture config option is enabled.
> >> >> But there are some legacy controllers which do not support that feature. According Cadence hardware team:
> >> >> "64 bit addressing was added in July 2013. Earlier version do not have it.
> >> >> This feature was enhanced in release August 2014 to have separate upper address values for transmit and receive."
> >> >>
> >> >>> /* Bitfields in NSR */
> >> >>> @@ -474,6 +479,10 @@
> >> >>>  struct macb_dma_desc {
> >> >>  >      u32     addr;
> >> >>>       u32     ctrl;
> >> >>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> >> >>> +     u32     addrh;
> >> >>> +     u32     resvd;
> >> >>> +#endif
> >> >>>  };
> >> >>
> >> >> It will not work for legacy hardware. Old descriptor is 2 words wide, the new one is 4 words wide.
> >> >> If you enable CONFIG_ARCH_DMA_ADDR_T_64BIT but hardware doesn't
> >> >> support it at all, you will miss every second descriptor.
> >> >>
> >> >
> >> > True, this feature is not available in all of Cadence IP versions.
> >> > In fact, the IP version Zynq does not support this. But the one in ZynqMP does.
> >> > So, we enable kernel config for 64 bit DMA addressing for this SoC
> >> > and hence the driver picks it up. My assumption was that if the
> >> > legacy IP does not support
> >> > 64 bit addressing, then this DMA option wouldn't be enabled.
> >> >
> >> > There is a design config register in Cadence IP which is being read
> >> > to check for 64 bit address support - DMA mask is set based on that.
> >> > But the addition of two descriptor words cannot be based on this runtime check.
> >> > For this reason, all the static changes were placed under this check.
> >>
> >> We have quite a bunch of options in this driver to determinate what is the real capacity of the underlying hardware.
> >> If HW configuration registers are not appropriate, and it seems they are not, I would advice to simply use the DT compatibility string.
> >>
> >> Best regards,
> >> --
> >> Nicolas Ferre
> >
> > HW configuration registers are appropriate. The issue is that this code doesn’t use the capability bit to switch between different dma descriptors (2 words vs. 4 words).
> > DMA descriptor size is chosen based on kernel configuration, not based on hardware capabilities.
>
> HW configuration register does give appropriate information.
> But addition of two address words in the macb descriptor structure is a static change.
>
> +static inline void macb_set_addr(struct macb_dma_desc *desc, dma_addr_t
> +addr) {
> +       desc->addr = (u32)addr;
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> +       desc->addrh = (u32)(addr >> 32); #endif
> +
>
> Even if the #ifdef condition here is changed to HW config check, addr and addrh are different.
> And "addrh" entry has to be present for 64 bit desc case to be handled separately.
> Can you please tell me how you propose change in DMA descriptor structure from
> 4 to 2 or 2 to 4 words *after* reading the DCFG register?

It is very complex problem. I wrote to you because I faced the same issue. I'm working on PTP implementation for macb. When PTP is enabled there are additional two words in buffer descriptor.
But hardware might not be compatible with PTP. Therefore I have to change DMA descriptor between 2 and 4 words after reading DCFG register (the same as you between 32b and 64b).
When we consider both PTP and 64 bits, we end up with 4 different descriptors!

1. 32b and without PTP support:

struct macb_dma_desc {
	u32	addr;
	u32	ctrl;
}

2. 64b and without PTP support:

struct macb_dma_desc {
	u32	addr;
	u32	ctrl;
	u32 	addrh;
	u32  	resvd;
}

3. 32 and PTP support:

struct macb_dma_desc {
	u32	addr;
	u32	ctrl;
	u32	dma_desc_ts_1;
	u32	dma_desc_ts_2;
};

4. 64b and PTP support:

struct macb_dma_desc {
	u32 	addr;
	u32 	ctrl;
	u32 	addrh;
	u32  	resvd;
	u32 	dma_desc_ts_1;
	u32	dma_desc_ts_2;
};

> Can you please tell me how you propose change in DMA descriptor structure from
> 4 to 2 or 2 to 4 words *after* reading the DCFG register?

I have some ideas but I don’t know whether it is possible to upstream it later on.
IMHO, we need to use the same mechanism in both our cases.

My very first idea to start discussion:
(Below is kind of pseudo code only to show my idea. All defines like 64B or PTP are omitted for simplicity)

1. Prepare appropriate structures:

struct macb_dma_desc {
	u32	addr;
	u32	ctrl;
}

struct macb_dma_desc_64 {
	u32 addrh;
	u32 resvd;
}

struct macb_dma_desc_ptp {
	u32 dma_desc_ts_1;
	u32	dma_desc_ts_2;
}

2. Add hardware support information to macb:

enum macb_hw_cap {
	HW_CAP_NONE,
	HW_CAP_64B,
	HW_CAP_PTP,
	HW_CAP_64B_PTP,
};

struct macb {
// (...)
	macb_hw_cap hw_cap;

}

3. Set bp->hw_cap on macb_probe()

4. Additional function might be helpful:

static unsigned char macb_dma_desc_get_mul(struct macb *bp)
{
	switch (bp->hw_cap) {
		case HW_CAP_NONE:
			return 1;
		case HW_CAP_64B:
		case HW_CAP_PTP:
			return 2;
		case HW_CAP_64B_PTP:
			return 3;
	}
}

5. Change sizeof struct to function:
(sizeof(struct macb_dma_desc) change to macb_dma_desc_get_size(bp). It will return sizeof(struct macb_dma_desc) * macb_dma_desc_get_mul(bp).
There is a hidden assumption that all three structures have the same size.

6. macb_rx_desc() and macb_tx_desc() will still return struct macb_dma_desc * but descriptor index will be counted in different way, ex.

static struct macb_dma_desc *macb_rx_desc(struct macb *bp, unsigned int index)
{
	index *= macb_dma_desc_get_mul(bp);
	return &bp->rx_ring[macb_rx_ring_wrap(bp, index)];
}

7. Two additional functions to dereference PTP and 64b information:

static struct macb_dma_desc_64 *macb_64b_desc(struct macb *bp, struct macb_dma_desc *desc)
{
	switch (bp->hw_cap) {
		case HW_CAP_64B:
		case HW_CAP_64B_PTP:
			return (struct macb_dma_desc_64 *)(desc + 1);  // ugly casting
		default:
			return NULL;
	}
}

static struct macb_dma_desc_ptp *macb_ptp_desc(struct macb *bp, struct macb_dma_desc *desc)
{
	switch (bp->hw_cap) {
		case HW_CAP_PTP:
			return (struct macb_dma_desc_ptp *)(desc + 1);
		case HW_CAP_64B_PTP:
			return (struct macb_dma_desc_ptp *)(desc + 2);
		default:
			return NULL;
	}
}

Whenever you want to reach fields in appropriate descriptor, above function should be used.

This is only my very first idea. Of course, we can leave it as it is and say, that old hardware is not support either when PTP enabled or 64b enabled.

^ permalink raw reply

* Re: [PATCH net v2 7/7] net: ethernet: ti: cpsw: fix fixed-link phy probe deferral
From: Johan Hovold @ 2016-11-18  9:33 UTC (permalink / raw)
  To: David Miller
  Cc: johan, mugunthanvnm, grygorii.strashko, linux-omap, netdev,
	linux-kernel
In-Reply-To: <20161117171920.GD10490@localhost>

On Thu, Nov 17, 2016 at 06:19:20PM +0100, Johan Hovold wrote:
> On Thu, Nov 17, 2016 at 12:04:16PM -0500, David Miller wrote:
> > From: Johan Hovold <johan@kernel.org>
> > Date: Thu, 17 Nov 2016 17:40:04 +0100
> > 
> > > Make sure to propagate errors from of_phy_register_fixed_link() which
> > > can fail with -EPROBE_DEFER.
> > > 
> > > Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link
> > > PHY")
> > > Signed-off-by: Johan Hovold <johan@kernel.org>
> > 
> > Johan, when you update a patch within a series you must post the
> > entire series freshly to the lists, cover posting and all.
> 
> I'm quite sure that is exactly what I did. Did you only get this last
> patch out of the seven?

The full series has made it to the lists:

	https://marc.info/?l=linux-netdev&m=147940433115984&w=2

Perhaps some messages just got held up.

Let me know if you still want me to resend.

Thanks,
Johan

^ permalink raw reply

* [PATCH] netns: fix get_net_ns_by_fd(int pid) typo
From: Stefan Hajnoczi @ 2016-11-18  9:41 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Stefan Hajnoczi

The argument to get_net_ns_by_fd() is a /proc/$PID/ns/net file
descriptor not a pid.  Fix the typo.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/net/net_namespace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index fc4f757..0940598 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -170,7 +170,7 @@ static inline struct net *copy_net_ns(unsigned long flags,
 extern struct list_head net_namespace_list;
 
 struct net *get_net_ns_by_pid(pid_t pid);
-struct net *get_net_ns_by_fd(int pid);
+struct net *get_net_ns_by_fd(int fd);
 
 #ifdef CONFIG_SYSCTL
 void ipx_register_sysctl(void);
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
From: Harini Katakam @ 2016-11-18  9:43 UTC (permalink / raw)
  To: Rafal Ozieblo
  Cc: Nicolas Ferre, Andrei Pistirica, harini.katakam@xilinx.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <BN3PR07MB251655CAFDB38CCF0F240BB5C9B00@BN3PR07MB2516.namprd07.prod.outlook.com>

Hi Rafal,

On Fri, Nov 18, 2016 at 3:00 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:
>>-----Original Message-----
>>From: Nicolas Ferre [mailto:nicolas.ferre@atmel.com]
>>Sent: 18 listopada 2016 10:10
>>To: Rafal Ozieblo; Harini Katakam; Andrei Pistirica
>>Cc: harini.katakam@xilinx.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>>Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
>>
>>Le 18/11/2016 à 09:59, Rafal Ozieblo a écrit :
>>> Hello,
>>>
>>>> From: Harini Katakam [mailto:harinikatakamlinux@gmail.com]
>>>> Sent: 18 listopada 2016 05:30
>>>> To: Rafal Ozieblo
>>>> Cc: Nicolas Ferre; harini.katakam@xilinx.com; netdev@vger.kernel.org;
>>>> linux-kernel@vger.kernel.org
>>>> Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support
>>>> for GEM
>>>>
>>>> Hi Rafal,
>>>>
>>>> On Thu, Nov 17, 2016 at 7:05 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:
>>>>> -----Original Message-----
>>>>> From: Nicolas Ferre [mailto:nicolas.ferre@atmel.com]
>>>>> Sent: 17 listopada 2016 14:29
>>>>> To: Harini Katakam; Rafal Ozieblo
>>>>> Cc: harini.katakam@xilinx.com; netdev@vger.kernel.org;
>>>>> linux-kernel@vger.kernel.org
>>>>> Subject: Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing
>>>>> support for GEM
>>>>>
>>>>>> Le 17/11/2016 à 13:21, Harini Katakam a écrit :
>>>>>>> Hi Rafal,
>>>>>>>
>>>>>>> On Thu, Nov 17, 2016 at 5:20 PM, Rafal Ozieblo <rafalo@cadence.com> wrote:
>>>>>>>> Hello,
>>>>>>>> I think, there could a bug in your patch.
>>>>>>>>
>>>>>>>>> +
>>>>>>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>>>>>>> +             dmacfg |= GEM_BIT(ADDR64); #endif
>>>>>>>>
>>>>>>>> You enable 64 bit addressing (64b dma bus width) always when appropriate architecture config option is enabled.
>>>>>>>> But there are some legacy controllers which do not support that feature. According Cadence hardware team:
>>>>>>>> "64 bit addressing was added in July 2013. Earlier version do not have it.
>>>>>>>> This feature was enhanced in release August 2014 to have separate upper address values for transmit and receive."
>>>>>>>>
>>>>>>>>> /* Bitfields in NSR */
>>>>>>>>> @@ -474,6 +479,10 @@
>>>>>>>>>  struct macb_dma_desc {
>>>>>>>>  >      u32     addr;
>>>>>>>>>       u32     ctrl;
>>>>>>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>>>>>>> +     u32     addrh;
>>>>>>>>> +     u32     resvd;
>>>>>>>>> +#endif
>>>>>>>>>  };
>>>>>>>>
>>>>>>>> It will not work for legacy hardware. Old descriptor is 2 words wide, the new one is 4 words wide.
>>>>>>>> If you enable CONFIG_ARCH_DMA_ADDR_T_64BIT but hardware doesn't
>>>>>>>> support it at all, you will miss every second descriptor.
>>>>>>>>
>>>>>>>
>>>>>>> True, this feature is not available in all of Cadence IP versions.
>>>>>>> In fact, the IP version Zynq does not support this. But the one in ZynqMP does.
>>>>>>> So, we enable kernel config for 64 bit DMA addressing for this SoC
>>>>>>> and hence the driver picks it up. My assumption was that if the
>>>>>>> legacy IP does not support
>>>>>>> 64 bit addressing, then this DMA option wouldn't be enabled.
>>>>>>>
>>>>>>> There is a design config register in Cadence IP which is being
>>>>>>> read to check for 64 bit address support - DMA mask is set based on that.
>>>>>>> But the addition of two descriptor words cannot be based on this runtime check.
>>>>>>> For this reason, all the static changes were placed under this check.
>>>>>>
>>>>>> We have quite a bunch of options in this driver to determinate what is the real capacity of the underlying hardware.
>>>>>> If HW configuration registers are not appropriate, and it seems they are not, I would advice to simply use the DT compatibility string.
>>>>>>
>>>>>> Best regards,
>>>>>> --
>>>>>> Nicolas Ferre
>>>>>
>>>>> HW configuration registers are appropriate. The issue is that this code doesn’t use the capability bit to switch between different dma descriptors (2 words vs. 4 words).
>>>>> DMA descriptor size is chosen based on kernel configuration, not based on hardware capabilities.
>>>>
>>>> HW configuration register does give appropriate information.
>>>> But addition of two address words in the macb descriptor structure is a static change.
>>>>
>>>> +static inline void macb_set_addr(struct macb_dma_desc *desc,
>>>> +dma_addr_t
>>>> +addr) {
>>>> +       desc->addr = (u32)addr;
>>>> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>>>> +       desc->addrh = (u32)(addr >> 32); #endif
>>>> +
>>>>
>>>> Even if the #ifdef condition here is changed to HW config check, addr and addrh are different.
>>>> And "addrh" entry has to be present for 64 bit desc case to be handled separately.
>>>> Can you please tell me how you propose change in DMA descriptor
>>>> structure from
>>>> 4 to 2 or 2 to 4 words *after* reading the DCFG register?
>>>
>>> It is very complex problem. I wrote to you because I faced the same issue. I'm working on PTP implementation for macb. When PTP is enabled there are additional two words in buffer descriptor.
>>
>>Moreover, we can use PTP without the 64bits descriptor support (early GEM revisions).
>>
>>BTW, note that there is an initiative ongoing with Andrei to support
>>PTP/1588 on these devices with patches already send: please synchronize with him.
>>https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dnetdev-26m-3D147282092309112-26w-3D3&d=DgIDaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=MWSZN_jP4BHjuU9UFsrndALGeJqIXzD0WtGCtCg4L5E&m=jUmZzqwn1uRrvcqF3BHCHFC6ZsKoZkU3vT8gJ-7fnsE&s=kr_km0MKQBzpkaOt0fkM-FIajN1-pOylzzTjsXi-vak&e=
>>or
>>https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_9310989_&d=DgIDaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=MWSZN_jP4BHjuU9UFsrndALGeJqIXzD0WtGCtCg4L5E&m=jUmZzqwn1uRrvcqF3BHCHFC6ZsKoZkU3vT8gJ-7fnsE&s=ZbXVD5Lb5zGn7wUKIPYHxagIEKp_vJvrnkRa4qJfmTY&e=
>>and
>>https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_9310991_&d=DgIDaQ&c=aUq983L2pue2FqKFoP6PGHMJQyoJ7kl3s3GZ-_haXqY&r=MWSZN_jP4BHjuU9UFsrndALGeJqIXzD0WtGCtCg4L5E&m=jUmZzqwn1uRrvcqF3BHCHFC6ZsKoZkU3vT8gJ-7fnsE&s=Z0kRqUqh5Is4TmuaY7A4hnpizfdeq3HrgPhyDAMLuA8&e=
>>
>>Regards,
>>--
>>Nicolas Ferre
>
> Above patch doesn’t use hardware timestamps in descriptors at all. It doesn't use appropriate accuracy.
> We have our PTP patch ready, which use timestamps from descriptor. But we have not sent it yet because of compatibility issue.
> Of course, we can do it like Harini did:
>
> struct macb_dma_desc {
>         u32     addr;
>         u32     ctrl;
> #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>         u32     addrh;
>         u32     resvd;
> #endif
> #if IS_ENABLED(CONFIG_PTP_1588_CLOCK)
>         u32     dma_desc_ts_1;
>         u32     dma_desc_ts_2;
> #endif
> };
>
> But in that approach we lose backward compatibility.
>
> What are your suggestion? Can we send patch like it is or wait for some common solution with backward compatibility?
>

Yes, Andre's version of Cadence does not ability to report have RX timestamp.
The version I worked with did. This is the old series I sent:
https://lkml.org/lkml/2015/9/11/92
But right now, i'm working on building on top of Andre's changes.

Regards,
Harini

^ permalink raw reply

* Re: [RFC PATCH 2/2] net: macb: Add 64 bit addressing support for GEM
From: Harini Katakam @ 2016-11-18 10:03 UTC (permalink / raw)
  To: Rafal Ozieblo
  Cc: Nicolas Ferre, Andrei Pistirica, harini.katakam@xilinx.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <BN3PR07MB25162D82CEF09B8C7586B799C9B00@BN3PR07MB2516.namprd07.prod.outlook.com>

Hi Rafal,


> I can’t see a place where you enable extended descriptor for PTP. Did you add support for extended PTP descriptor?
Sorry, that was separate patch in the series:
http://lkml.iu.edu/hypermail/linux/kernel/1509.1/02239.html

> "DMA Configuration Register" 0x010:
> 29      tx_bd_extended_mode_en  "Enable TX extended BD mode. See TX BD control register definition for description of feature."
> 28      rx_bd_extended_mode_en  "Enable RX extended BD mode. See RX BD control register definition for description of feature."
>
> Can I send you our patch for comparison ?
Sure, please mail. Also, please mention the Cadence IP version number you use.

Regards,
Harini

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox