Netdev List
* Re: [PATCH v2 net-next] net: ipv6: put host and anycast routes on device with address
From: Hannes Frederic Sowa @ 2017-08-19  0:28 UTC
  To: David Ahern; +Cc: netdev, yoshfuji
In-Reply-To: <0bcba874-7fc3-508c-bf78-ae5832312845@gmail.com>

David Ahern <dsahern@gmail.com> writes:

> On 8/18/17 6:05 PM, David Ahern wrote:
>> On 8/18/17 5:15 PM, Hannes Frederic Sowa wrote:
>>> Hello David,
>>>
>>> David Ahern <dsahern@gmail.com> writes:
>>>
>>>> @@ -2688,15 +2716,9 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
>>>>  {
>>>>  	u32 tb_id;
>>>>  	struct net *net = dev_net(idev->dev);
>>>> -	struct net_device *dev = net->loopback_dev;
>>>> +	struct net_device *dev = idev->dev;
>>>>  	struct rt6_info *rt;
>>>>  
>>>> -	/* use L3 Master device as loopback for host routes if device
>>>> -	 * is enslaved and address is not link local or multicast
>>>> -	 */
>>>> -	if (!rt6_need_strict(addr))
>>>> -		dev = l3mdev_master_dev_rcu(idev->dev) ? : dev;
>>>> -
>>>>  	rt = ip6_dst_alloc(net, dev, DST_NOCOUNT);
>>>>  	if (!rt)
>>>>  		return ERR_PTR(-ENOMEM);
>>>
>>> I am afraid this change might break Java:
>>>
>>> <http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/65464a307408/src/java.base/unix/native/libnet/net_util_md.c#l574>
>>>
>>> I am all for this change, but it might be necessary to mask
>>> RTF_LOCAL routes with "lo" somehow.
>> 
>> That's asinine. The if_inet6 processing is just getting the 'lo'
>> interface index. Why scan the file looking for that? The ipv6_route
>> processing is assembling routes against the loopback device regardless
>> of what the route is. Do you know why - what the route list is used for?
>
>
> If I read it correctly, it seems to be a 2.4 workaround:
> - only user of the route list is needsLoopbackRoute()
> - only caller of needsLoopbackRoute is here:
>
> http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/65464a307408/src/java.base/unix/native/libnet/net_util_md.c#l828

I agree that it looks like dead code now. But I know for sure that this
code has been exercised at least at some point in time and caused
problems for JVMs on Linux with IPv6.

At the top of this file I found this comment:

-- >8 --
/* following code creates a list of addresses from the kernel
 * routing table that are routed via the loopback address.
 * We check all destination addresses against this table
 * and override the scope_id field to use the relevant value for "lo"
 * in order to work-around the Linux bug that prevents packets destined
 * for certain local addresses from being sent via a physical interface.
 */
-- 8< --

I don't know if it makes sense to dive into Java history (I also
found, e.g., net-snmp scanning /proc/net/ipv6_route). The same problem
might also be visible via RTM_GETROUTE dumps, if applications implement
their own source address selection. :/

Bye, Hannes
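
Hannes's concern can be made concrete with a small userspace sketch. It parses one line in the /proc/net/ipv6_route layout (field order as the kernel prints it: dest, dest prefix length, src, src prefix length, next hop, metric, refcnt, use, flags, device) and contrasts a check on the device name, which is what a scanner looking for routes "via lo" relies on, with a check on the RTF_LOCAL flag. The struct and function names are hypothetical, and this is not the JDK's code. Once host routes sit on the device that owns the address, the device-name check stops matching while the flag check still does.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Value of RTF_LOCAL from include/uapi/linux/ipv6_route.h. */
#define SKETCH_RTF_LOCAL 0x80000000u

/* The fields of one /proc/net/ipv6_route line that matter here.
 * Addresses are 32 hex digits; numeric fields are printed in hex. */
struct ipv6_route_line {
	char dest[33];
	unsigned int dest_plen;
	unsigned int flags;
	char dev[16];
};

/* Parse one line; returns true only if all ten fields were read. */
static bool parse_ipv6_route_line(const char *line, struct ipv6_route_line *r)
{
	char src[33], nexthop[33];
	unsigned int src_plen, metric, refcnt, use;
	int n = sscanf(line, "%32s %x %32s %x %32s %x %x %x %x %15s",
		       r->dest, &r->dest_plen, src, &src_plen, nexthop,
		       &metric, &refcnt, &use, &r->flags, r->dev);

	return n == 10;
}

/* What a scanner keyed on the device name checks. */
static bool route_is_on_loopback(const struct ipv6_route_line *r)
{
	return strcmp(r->dev, "lo") == 0;
}

/* What stays true for host routes even when they move off "lo". */
static bool route_is_local(const struct ipv6_route_line *r)
{
	return (r->flags & SKETCH_RTF_LOCAL) != 0;
}
```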


* [PATCH net] ipv6: add rcu grace period before freeing fib6_node
From: Wei Wang @ 2017-08-19  0:36 UTC
  To: David Miller, netdev; +Cc: Eric Dumazet, Martin KaFai Lau, Wei Wang

From: Wei Wang <weiwan@google.com>

We currently keep rt->rt6i_node pointing to the fib6_node for the route.
Some functions use this pointer to dereference the fib6_node from the
rt structure, e.g. rt6_check(). However, as neither a refcount nor
rcu_read_lock() is taken when dereferencing rt->rt6i_node, this could
potentially cause crashes, as rt->rt6i_node could be set to NULL by
other CPUs during a route deletion.
This patch introduces an rcu grace period before freeing fib6_node and
makes sure the functions that dereference it take rcu_read_lock().

Note: there is no "Fixes" tag because this bug has existed since a very
early stage.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip6_fib.h | 31 ++++++++++++++++++++++++++++++-
 net/ipv6/ip6_fib.c    | 20 ++++++++++++++++----
 net/ipv6/route.c      | 14 +++++++++++---
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 71c1646298ae..5691faf6b495 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -72,6 +72,7 @@ struct fib6_node {
 	__u16			fn_flags;
 	int			fn_sernum;
 	struct rt6_info		*rr_ptr;
+	struct rcu_head		rcu;
 };
 
 #ifndef CONFIG_IPV6_SUBTREES
@@ -171,13 +172,41 @@ static inline void rt6_update_expires(struct rt6_info *rt0, int timeout)
 	rt0->rt6i_flags |= RTF_EXPIRES;
 }
 
+/* Function to safely get fn->sernum for passed in rt
+ * and store result in passed in cookie.
+ * Return true if we can get cookie safely
+ * Return false if not
+ */
+static inline bool rt6_get_cookie_safe(const struct rt6_info *rt,
+				       u32 *cookie)
+{
+	struct fib6_node *fn;
+	bool status = false;
+
+	rcu_read_lock();
+	fn = rcu_dereference(rt->rt6i_node);
+
+	if (fn) {
+		*cookie = fn->fn_sernum;
+		status = true;
+	}
+
+	rcu_read_unlock();
+	return status;
+
+}
+
 static inline u32 rt6_get_cookie(const struct rt6_info *rt)
 {
+	u32 cookie = 0;
+
 	if (rt->rt6i_flags & RTF_PCPU ||
 	    (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->dst.from))
 		rt = (struct rt6_info *)(rt->dst.from);
 
-	return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
+	rt6_get_cookie_safe(rt, &cookie);
+
+	return cookie;
 }
 
 static inline void ip6_rt_put(struct rt6_info *rt)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 549aacc3cb2c..a9821c230e4e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -149,11 +149,23 @@ static struct fib6_node *node_alloc(void)
 	return fn;
 }
 
-static void node_free(struct fib6_node *fn)
+static void node_free_immediate(struct fib6_node *fn)
+{
+	kmem_cache_free(fib6_node_kmem, fn);
+}
+
+static void node_free_rcu(struct rcu_head *head)
 {
+	struct fib6_node *fn = container_of(head, struct fib6_node, rcu);
+
 	kmem_cache_free(fib6_node_kmem, fn);
 }
 
+static void node_free(struct fib6_node *fn)
+{
+	call_rcu(&fn->rcu, node_free_rcu);
+}
+
 void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 {
 	int cpu;
@@ -697,9 +709,9 @@ static struct fib6_node *fib6_add_1(struct fib6_node *root,
 
 		if (!in || !ln) {
 			if (in)
-				node_free(in);
+				node_free_immediate(in);
 			if (ln)
-				node_free(ln);
+				node_free_immediate(ln);
 			return ERR_PTR(-ENOMEM);
 		}
 
@@ -1138,7 +1150,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt,
 				   root, and then (in failure) stale node
 				   in main tree.
 				 */
-				node_free(sfn);
+				node_free_immediate(sfn);
 				err = PTR_ERR(sn);
 				goto failure;
 			}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bec12ae3e6b7..4de2d793c4b8 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1289,7 +1289,9 @@ static void rt6_dst_from_metrics_check(struct rt6_info *rt)
 
 static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
 {
-	if (!rt->rt6i_node || (rt->rt6i_node->fn_sernum != cookie))
+	u32 rt_cookie;
+
+	if (!rt6_get_cookie_safe(rt, &rt_cookie) || rt_cookie != cookie)
 		return NULL;
 
 	if (rt6_check_expired(rt))
@@ -1357,8 +1359,14 @@ static void ip6_link_failure(struct sk_buff *skb)
 		if (rt->rt6i_flags & RTF_CACHE) {
 			if (dst_hold_safe(&rt->dst))
 				ip6_del_rt(rt);
-		} else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) {
-			rt->rt6i_node->fn_sernum = -1;
+		} else {
+			struct fib6_node *fn;
+
+			rcu_read_lock();
+			fn = rcu_dereference(rt->rt6i_node);
+			if (fn && (rt->rt6i_flags & RTF_DEFAULT))
+				fn->fn_sernum = -1;
+			rcu_read_unlock();
 		}
 	}
 }
-- 
2.14.1.480.gb18f417b89-goog
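
The deferred-free pattern adopted above (an rcu_head embedded in the node, call_rcu() queueing a callback, and container_of() recovering the node inside that callback) can be sketched in userspace. This is a single-threaded stand-in, not real RCU: struct cb_head, defer_call() and grace_period_end() are hypothetical substitutes for struct rcu_head, call_rcu() and the end of a grace period. The sketch also shows why an immediate variant is kept: nodes freed on the fib6_add_1()/fib6_add() error paths were never published to readers, so nothing needs to wait for them.

```c
#include <stddef.h>
#include <stdlib.h>

/* container_of as commonly defined (simplified, no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct rcu_head: a callback plus a link for the queue. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Stand-in for struct fib6_node with the embedded head the patch adds. */
struct node {
	int fn_sernum;
	struct cb_head rcu;
};

static struct cb_head *pending;	/* callbacks waiting for a "grace period" */
static int nodes_freed;

/* Like call_rcu(): queue the callback instead of running it now. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Simulated end of a grace period: pre-existing readers are done. */
static void grace_period_end(void)
{
	while (pending) {
		struct cb_head *h = pending;

		pending = h->next;	/* read the link before h is freed */
		h->func(h);
	}
}

/* Like node_free_rcu(): recover the enclosing node, then free it. */
static void node_free_cb(struct cb_head *head)
{
	struct node *fn = container_of(head, struct node, rcu);

	free(fn);
	nodes_freed++;
}

/* Deferred path, mirroring the patch's node_free(). */
static void node_free(struct node *fn)
{
	defer_call(&fn->rcu, node_free_cb);
}

/* Immediate path for nodes never visible to readers. */
static void node_free_immediate(struct node *fn)
{
	free(fn);
	nodes_freed++;
}
```

Until grace_period_end() runs, a reader that already fetched the node pointer can still read fn_sernum safely; that window is exactly what the RCU grace period protects in the kernel.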


* Re: [PATCH net-next 00/10] sysfs related cleanups
From: David Miller @ 2017-08-19  1:01 UTC
  To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20170818204628.17147-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Fri, 18 Aug 2017 13:46:18 -0700

> Network sysfs infrastructure changes, mostly related to using
> ro_after_init to make function tables immutable.

Series applied, thanks Stephen.


* Re: [PATCH net-next 00/12] nfp: add basic ethtool callbacks to representors
From: David Miller @ 2017-08-19  1:04 UTC
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20170818224822.8409-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Fri, 18 Aug 2017 15:48:10 -0700

> This set extends the basic ethtool functionality to representor
> netdevs.  I start with providing link state via ethtool and then
> move on to functions such as driver information, statistics and
> FW log dump.  The series also contains a number of cleanups to the
> ethtool stats code; some of the logic is simplified by making
> better use of the nfp_port abstraction.  The only stats we expose on
> representors are the PCIe and MAC port statistics that firmware
> maintains for us.

Looks good, series applied, thanks.


* [PATCH net-next v2 1/2] bpf: make htab inlining more robust wrt assumptions
From: Daniel Borkmann @ 2017-08-19  1:12 UTC
  To: davem; +Cc: ast, netdev, Daniel Borkmann
In-Reply-To: <cover.1503104831.git.daniel@iogearbox.net>

Commit 9015d2f59535 ("bpf: inline htab_map_lookup_elem()") made
the assumption that emitting a direct call to the function
__htab_map_lookup_elem() will always work out for JITs.

This is currently true since all JITs we have are for 64 bit archs,
but with 32 bit JITs such as the upcoming arm32 one, we get a NULL
pointer dereference when executing the call to __htab_map_lookup_elem(),
since the arguments it expects differ in size (due to pointer args)
from what we pass out of BPF. Guard against this and, for now, limit
the inlining to the current 64 bit JITs only.

Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/verifier.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4f6e7eb..e42c096 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4160,7 +4160,11 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
 			continue;
 		}
 
-		if (ebpf_jit_enabled() && insn->imm == BPF_FUNC_map_lookup_elem) {
+		/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
+		 * handlers are currently limited to 64 bit only.
+		 */
+		if (ebpf_jit_enabled() && BITS_PER_LONG == 64 &&
+		    insn->imm == BPF_FUNC_map_lookup_elem) {
 			map_ptr = env->insn_aux_data[i + delta].map_ptr;
 			if (map_ptr == BPF_MAP_PTR_POISON ||
 			    !map_ptr->ops->map_gen_lookup)
-- 
1.9.3
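
The guard itself reduces to one predicate. The sketch below restates the condition added to fixup_bpf_calls() as a plain function; SKETCH_BITS_PER_LONG, bpf_reg_t and both function names are hypothetical userspace stand-ins. It only illustrates why the native word size matters: BPF registers are always 64 bits wide, so handing them straight to a native helper lines up only when native pointers are 64 bits as well.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))

/* BPF registers (R1-R5 carry helper arguments) are always 64 bits wide. */
typedef uint64_t bpf_reg_t;

/* The patch's condition: take the map_gen_lookup() inlining path only
 * when a JIT is enabled, native longs are 64 bit, and the call is
 * BPF_FUNC_map_lookup_elem. */
static bool may_inline_map_lookup(bool jit_enabled, int bits_per_long,
				  bool is_map_lookup_call)
{
	return jit_enabled && bits_per_long == 64 && is_map_lookup_call;
}

/* On such hosts a native pointer argument is as wide as a BPF register;
 * on a 32 bit host it is not, which is the mismatch behind the crash. */
static bool native_ptr_matches_bpf_reg(void)
{
	return sizeof(void *) == sizeof(bpf_reg_t);
}
```

On a 64 bit Linux host, may_inline_map_lookup(true, SKETCH_BITS_PER_LONG, true) holds; on a 32 bit host such as arm32 it does not, so the verifier falls back to the regular helper call.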


* [PATCH net-next v2 0/2] BPF inline improvements
From: Daniel Borkmann @ 2017-08-19  1:12 UTC
  To: davem; +Cc: ast, netdev, Daniel Borkmann

The first patch makes htab inlining more robust wrt future JITs, and
the second one inlines map-in-map lookups through the map_gen_lookup()
callback.

Thanks!

v1 -> v2:
  - BITS_PER_LONG guard in patch 1
  - BPF_EMIT_CALL is on __htab_map_lookup_elem

Daniel Borkmann (2):
  bpf: make htab inlining more robust wrt assumptions
  bpf: inline map in map lookup functions for array and htab

 kernel/bpf/arraymap.c | 26 ++++++++++++++++++++++++++
 kernel/bpf/hashtab.c  | 17 +++++++++++++++++
 kernel/bpf/verifier.c |  6 +++++-
 3 files changed, 48 insertions(+), 1 deletion(-)

-- 
1.9.3


* [PATCH net-next v2 2/2] bpf: inline map in map lookup functions for array and htab
From: Daniel Borkmann @ 2017-08-19  1:12 UTC
  To: davem; +Cc: ast, netdev, Daniel Borkmann
In-Reply-To: <cover.1503104831.git.daniel@iogearbox.net>

Avoid two successive function calls for the map-in-map lookup: the
first is the bpf_map_lookup_elem() helper call, and the second is the
callback via map->ops->map_lookup_elem() to get to the map-in-map
implementation. This patch inlines both the array and htab flavors
of map-in-map lookups.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/arraymap.c | 26 ++++++++++++++++++++++++++
 kernel/bpf/hashtab.c  | 17 +++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index d771a38..b25d6ce 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -603,6 +603,31 @@ static void *array_of_map_lookup_elem(struct bpf_map *map, void *key)
 	return READ_ONCE(*inner_map);
 }
 
+static u32 array_of_map_gen_lookup(struct bpf_map *map,
+				   struct bpf_insn *insn_buf)
+{
+	u32 elem_size = round_up(map->value_size, 8);
+	struct bpf_insn *insn = insn_buf;
+	const int ret = BPF_REG_0;
+	const int map_ptr = BPF_REG_1;
+	const int index = BPF_REG_2;
+
+	*insn++ = BPF_ALU64_IMM(BPF_ADD, map_ptr, offsetof(struct bpf_array, value));
+	*insn++ = BPF_LDX_MEM(BPF_W, ret, index, 0);
+	*insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 5);
+	if (is_power_of_2(elem_size))
+		*insn++ = BPF_ALU64_IMM(BPF_LSH, ret, ilog2(elem_size));
+	else
+		*insn++ = BPF_ALU64_IMM(BPF_MUL, ret, elem_size);
+	*insn++ = BPF_ALU64_REG(BPF_ADD, ret, map_ptr);
+	*insn++ = BPF_LDX_MEM(BPF_DW, ret, ret, 0);
+	*insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1);
+	*insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 1);
+	*insn++ = BPF_MOV64_IMM(ret, 0);
+
+	return insn - insn_buf;
+}
+
 const struct bpf_map_ops array_of_maps_map_ops = {
 	.map_alloc = array_of_map_alloc,
 	.map_free = array_of_map_free,
@@ -612,4 +637,5 @@ static void *array_of_map_lookup_elem(struct bpf_map *map, void *key)
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_gen_lookup = array_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4fb4631..3f9a5a2 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1311,6 +1311,22 @@ static void *htab_of_map_lookup_elem(struct bpf_map *map, void *key)
 	return READ_ONCE(*inner_map);
 }
 
+static u32 htab_of_map_gen_lookup(struct bpf_map *map,
+				  struct bpf_insn *insn_buf)
+{
+	struct bpf_insn *insn = insn_buf;
+	const int ret = BPF_REG_0;
+
+	*insn++ = BPF_EMIT_CALL((u64 (*)(u64, u64, u64, u64, u64))__htab_map_lookup_elem);
+	*insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 2);
+	*insn++ = BPF_ALU64_IMM(BPF_ADD, ret,
+				offsetof(struct htab_elem, key) +
+				round_up(map->key_size, 8));
+	*insn++ = BPF_LDX_MEM(BPF_DW, ret, ret, 0);
+
+	return insn - insn_buf;
+}
+
 static void htab_of_map_free(struct bpf_map *map)
 {
 	bpf_map_meta_free(map->inner_map_meta);
@@ -1326,4 +1342,5 @@ static void htab_of_map_free(struct bpf_map *map)
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_gen_lookup = htab_of_map_gen_lookup,
 };
-- 
1.9.3
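
For readers following the emitted instructions, here is a plain-C rendering of what the two inlined sequences compute. It is a sketch rather than kernel code: struct array_sketch and the element layout passed to htab_of_map_value_sketch() are hypothetical, and it assumes a 64 bit host, matching the BITS_PER_LONG == 64 guard from patch 1/2. In the array sequence, the BPF_JGE offset of 5 skips the shift/multiply, add, load, BPF_JEQ and BPF_JA instructions and lands on the final "ret = 0".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The inlined lookups load a 64 bit (BPF_DW) pointer. */
_Static_assert(sizeof(void *) == 8, "sketch assumes 64 bit pointers");

/* Hypothetical stand-in for struct bpf_array: max_entries elements of
 * round_up(value_size, 8) bytes, each holding an inner map pointer. */
struct array_sketch {
	uint32_t max_entries;
	uint32_t value_size;
	const unsigned char *value;
};

static uint32_t round_up8(uint32_t x)
{
	return (x + 7u) & ~7u;
}

/* What array_of_map_gen_lookup()'s instructions compute. */
static void *array_of_map_lookup_sketch(const struct array_sketch *arr,
					const uint32_t *key)
{
	uint64_t elem_size = round_up8(arr->value_size);
	uint64_t index = *key;		/* BPF_LDX_MEM(BPF_W, ret, index, 0) */
	void *inner;

	if (index >= arr->max_entries)	/* BPF_JGE: skip to ret = 0 */
		return NULL;
	/* BPF_LSH (or BPF_MUL) by elem_size, add the base, BPF_DW load */
	memcpy(&inner, arr->value + index * elem_size, sizeof(inner));
	return inner;			/* an empty slot yields NULL */
}

/* What htab_of_map_gen_lookup() does with the element returned by
 * __htab_map_lookup_elem(): skip the key (padded to 8 bytes) and load
 * the inner map pointer stored as the value. key_offset plays the role
 * of offsetof(struct htab_elem, key). */
static void *htab_of_map_value_sketch(const unsigned char *elem,
				      size_t key_offset, uint32_t key_size)
{
	void *inner;

	if (!elem)			/* BPF_JEQ ret, 0: return NULL */
		return NULL;
	memcpy(&inner, elem + key_offset + round_up8(key_size), sizeof(inner));
	return inner;
}
```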


* Re: [PATCH 0/3] MIPS,bpf: Improvements for MIPS eBPF JIT
From: Daniel Borkmann @ 2017-08-19  1:18 UTC
  To: David Daney, Alexei Starovoitov, David S. Miller, netdev,
	linux-kernel
  Cc: linux-mips, ralf
In-Reply-To: <20170818234033.5990-1-david.daney@cavium.com>

On 08/19/2017 01:40 AM, David Daney wrote:
> Here are several improvements and bug fixes for the MIPS eBPF JIT.
>
> The main change is the addition of support for JLT, JLE, JSLT and JSLE
> ops, which were recently added.
>
> Also fix WARN output when used with a preemptible kernel, and make a small
> cleanup/optimization in the use of BPF_OP(insn->code).
>
> I suggest that the whole thing go via the BPF/net-next path as there
> are dependencies on code that is not yet merged to Linus' tree.

Yes, this would be via net-next.

> Still pending are changes to reduce stack usage when the verifier can
> determine the maximum stack size.

Awesome, thanks a lot!


* Re: [PATCH net-next v2 1/2] bpf: make htab inlining more robust wrt assumptions
From: Alexei Starovoitov @ 2017-08-19  1:20 UTC
  To: Daniel Borkmann, davem; +Cc: netdev
In-Reply-To: <03f4e86a029058d0f674fd9bf288e55a5ec07df3.1503104831.git.daniel@iogearbox.net>

On 8/18/17 6:12 PM, Daniel Borkmann wrote:
> Commit 9015d2f59535 ("bpf: inline htab_map_lookup_elem()") made
> the assumption that emitting a direct call to the function
> __htab_map_lookup_elem() will always work out for JITs.
>
> This is currently true since all JITs we have are for 64 bit archs,
> but with 32 bit JITs such as the upcoming arm32 one, we get a NULL
> pointer dereference when executing the call to __htab_map_lookup_elem(),
> since the arguments it expects differ in size (due to pointer args)
> from what we pass out of BPF. Guard against this and, for now, limit
> the inlining to the current 64 bit JITs only.
>
> Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Alexei Starovoitov <ast@kernel.org>
Thanks. That's a good robustness fix.


* [PATCH net-next] liquidio: fix use of pf in pass-through mode in a virtual machine
From: Felix Manlunas @ 2017-08-19  1:21 UTC
  To: davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	ricardo.farrington

From: Rick Farrington <ricardo.farrington@cavium.com>

Fix a problem when the PF is used in pass-through mode in a VM (with
embedded firmware).

If the host fails to read the PF number from the CN23XX_PCIE_SRIOV_FDL
register, try to retrieve it from SLI_PKT(0)_INPUT_CONTROL (initialized
by firmware).

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c    | 47 +++++++++++++++++++---
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |  2 +
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 4b0ca9f..fbc0d4e 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -1150,14 +1150,50 @@ static void cn23xx_get_pcie_qlmport(struct octeon_device *oct)
 		oct->pcie_port);
 }
 
-static void cn23xx_get_pf_num(struct octeon_device *oct)
+static int cn23xx_get_pf_num(struct octeon_device *oct)
 {
 	u32 fdl_bit = 0;
+	u64 pkt0_in_ctl, d64;
+	int pfnum, mac, trs, ret;
+
+	ret = 0;
 
 	/** Read Function Dependency Link reg to get the function number */
-	pci_read_config_dword(oct->pci_dev, CN23XX_PCIE_SRIOV_FDL, &fdl_bit);
-	oct->pf_num = ((fdl_bit >> CN23XX_PCIE_SRIOV_FDL_BIT_POS) &
-		       CN23XX_PCIE_SRIOV_FDL_MASK);
+	if (pci_read_config_dword(oct->pci_dev, CN23XX_PCIE_SRIOV_FDL,
+				  &fdl_bit) == 0) {
+		oct->pf_num = ((fdl_bit >> CN23XX_PCIE_SRIOV_FDL_BIT_POS) &
+			       CN23XX_PCIE_SRIOV_FDL_MASK);
+	} else {
+		ret = EINVAL;
+
+		/* Under some virtual environments, extended PCI regs are
+		 * inaccessible, in which case the above read will have failed.
+		 * In this case, read the PF number from the
+		 * SLI_PKT0_INPUT_CONTROL reg (written by f/w)
+		 */
+		pkt0_in_ctl = octeon_read_csr64(oct,
+						CN23XX_SLI_IQ_PKT_CONTROL64(0));
+		pfnum = (pkt0_in_ctl >> CN23XX_PKT_INPUT_CTL_PF_NUM_POS) &
+			CN23XX_PKT_INPUT_CTL_PF_NUM_MASK;
+		mac = (octeon_read_csr(oct, CN23XX_SLI_MAC_NUMBER)) & 0xff;
+
+		/* validate PF num by reading RINFO; f/w writes RINFO.trs == 1*/
+		d64 = octeon_read_csr64(oct,
+					CN23XX_SLI_PKT_MAC_RINFO64(mac, pfnum));
+		trs = (int)(d64 >> CN23XX_PKT_MAC_CTL_RINFO_TRS_BIT_POS) & 0xff;
+		if (trs == 1) {
+			dev_err(&oct->pci_dev->dev,
+				"OCTEON: error reading PCI cfg space pfnum, re-read %u\n",
+				pfnum);
+			oct->pf_num = pfnum;
+			ret = 0;
+		} else {
+			dev_err(&oct->pci_dev->dev,
+				"OCTEON: error reading PCI cfg space pfnum; could not ascertain PF number\n");
+		}
+	}
+
+	return ret;
 }
 
 static void cn23xx_setup_reg_address(struct octeon_device *oct)
@@ -1279,7 +1315,8 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
 		return 1;
 	}
 
-	cn23xx_get_pf_num(oct);
+	if (cn23xx_get_pf_num(oct) != 0)
+		return 1;
 
 	if (cn23xx_sriov_config(oct)) {
 		octeon_unmap_pci_barx(oct, 0);
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index cbd6287..1a4fc17 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -1848,6 +1848,8 @@ static int octeon_chip_specific_setup(struct octeon_device *oct)
 	case OCTEON_CN23XX_PCIID_PF:
 		oct->chip_id = OCTEON_CN23XX_PF_VID;
 		ret = setup_cn23xx_octeon_pf_device(oct);
+		if (ret)
+			break;
 #ifdef CONFIG_PCI_IOV
 		if (!ret)
 			pci_sriov_set_totalvfs(oct->pci_dev,
-- 
1.8.3.1
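
The fallback in cn23xx_get_pf_num() can be sketched with the register reads replaced by plain fields. Everything named SKETCH_* below is a hypothetical placeholder, not the real CN23XX_* bit positions or masks, and struct fake_regs stands in for the config-space and CSR reads; the MAC number is left out by assuming a single MAC, so RINFO is simply indexed by the candidate PF number. The sketch returns -EINVAL, following the usual kernel convention of negative errno values; the patch assigns a positive EINVAL, which still works because its caller only tests for non-zero.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical field layout; placeholders, not the real CN23XX_* values. */
#define SKETCH_FDL_BIT_POS	24
#define SKETCH_FDL_MASK		0x3u
#define SKETCH_PF_NUM_POS	48
#define SKETCH_PF_NUM_MASK	0x3u
#define SKETCH_RINFO_TRS_POS	16

/* Stand-in for the registers the driver reads. */
struct fake_regs {
	int fdl_read_err;	/* non-zero: config-space read failed */
	uint32_t fdl;		/* Function Dependency Link register */
	uint64_t pkt0_in_ctl;	/* SLI_PKT(0)_INPUT_CONTROL, set by f/w */
	uint64_t rinfo_for_pf[SKETCH_PF_NUM_MASK + 1]; /* RINFO per PF */
};

/* Returns 0 and sets *pf_num on success, or -EINVAL if no PF number
 * can be ascertained (*pf_num is then left untouched). */
static int get_pf_num_sketch(const struct fake_regs *regs, int *pf_num)
{
	unsigned int pfnum, trs;

	if (!regs->fdl_read_err) {
		*pf_num = (int)((regs->fdl >> SKETCH_FDL_BIT_POS) &
				SKETCH_FDL_MASK);
		return 0;
	}

	/* Config space inaccessible (e.g. PF passed through to a VM):
	 * use the PF number firmware wrote into packet input control. */
	pfnum = (unsigned int)((regs->pkt0_in_ctl >> SKETCH_PF_NUM_POS) &
			       SKETCH_PF_NUM_MASK);

	/* Trust it only if the matching RINFO has trs == 1. */
	trs = (unsigned int)((regs->rinfo_for_pf[pfnum] >>
			      SKETCH_RINFO_TRS_POS) & 0xff);
	if (trs != 1)
		return -EINVAL;

	*pf_num = (int)pfnum;
	return 0;
}
```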


* Re: [PATCH net] ipv6: add rcu grace period before freeing fib6_node
From: Martin KaFai Lau @ 2017-08-19  2:20 UTC
  To: Wei Wang; +Cc: David Miller, netdev, Eric Dumazet
In-Reply-To: <20170819003655.64903-1-tracywwnj@gmail.com>

On Fri, Aug 18, 2017 at 05:36:55PM -0700, Wei Wang wrote:
> From: Wei Wang <weiwan@google.com>
>
> We currently keep rt->rt6i_node pointing to the fib6_node for the route.
> Some functions use this pointer to dereference the fib6_node from the
> rt structure, e.g. rt6_check(). However, as neither a refcount nor
> rcu_read_lock() is taken when dereferencing rt->rt6i_node, this could
> potentially cause crashes, as rt->rt6i_node could be set to NULL by
> other CPUs during a route deletion.
> This patch introduces an rcu grace period before freeing fib6_node and
> makes sure the functions that dereference it take rcu_read_lock().
>
> Note: there is no "Fixes" tag because this bug has existed since a very
> early stage.
>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
Looks good. Thanks for fixing it.
I only have some nit comments.

> ---
>  include/net/ip6_fib.h | 31 ++++++++++++++++++++++++++++++-
>  net/ipv6/ip6_fib.c    | 20 ++++++++++++++++----
>  net/ipv6/route.c      | 14 +++++++++++---
>  3 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> index 71c1646298ae..5691faf6b495 100644
> --- a/include/net/ip6_fib.h
> +++ b/include/net/ip6_fib.h
> @@ -72,6 +72,7 @@ struct fib6_node {
>  	__u16			fn_flags;
>  	int			fn_sernum;
>  	struct rt6_info		*rr_ptr;
> +	struct rcu_head		rcu;
>  };
>
>  #ifndef CONFIG_IPV6_SUBTREES
> @@ -171,13 +172,41 @@ static inline void rt6_update_expires(struct rt6_info *rt0, int timeout)
>  	rt0->rt6i_flags |= RTF_EXPIRES;
>  }
>
> +/* Function to safely get fn->sernum for passed in rt
> + * and store result in passed in cookie.
> + * Return true if we can get cookie safely
> + * Return false if not
> + */
> +static inline bool rt6_get_cookie_safe(const struct rt6_info *rt,
> +				       u32 *cookie)
Looking at fib6_new_sernum(), fn_sernum should be >0.

Would it further simplify the later changes if we did this instead?
static inline u32 rt6_get_cookie_safe(const struct rt6_info *rt)

> +{
> +	struct fib6_node *fn;
> +	bool status = false;
> +
> +	rcu_read_lock();
> +	fn = rcu_dereference(rt->rt6i_node);
> +
> +	if (fn) {
> +		*cookie = fn->fn_sernum;
> +		status = true;
> +	}
> +
> +	rcu_read_unlock();
> +	return status;
> +
extra newline.

> +}
> +
>  static inline u32 rt6_get_cookie(const struct rt6_info *rt)
>  {
> +	u32 cookie = 0;
> +
>  	if (rt->rt6i_flags & RTF_PCPU ||
>  	    (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->dst.from))
>  		rt = (struct rt6_info *)(rt->dst.from);
>
> -	return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
> +	rt6_get_cookie_safe(rt, &cookie);
> +
> +	return cookie;
>  }
>
>  static inline void ip6_rt_put(struct rt6_info *rt)
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 549aacc3cb2c..a9821c230e4e 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -149,11 +149,23 @@ static struct fib6_node *node_alloc(void)
>  	return fn;
>  }
>
> -static void node_free(struct fib6_node *fn)
> +static void node_free_immediate(struct fib6_node *fn)
> +{
> +	kmem_cache_free(fib6_node_kmem, fn);
> +}
> +
> +static void node_free_rcu(struct rcu_head *head)
>  {
> +	struct fib6_node *fn = container_of(head, struct fib6_node, rcu);
> +
>  	kmem_cache_free(fib6_node_kmem, fn);
>  }
>
> +static void node_free(struct fib6_node *fn)
> +{
> +	call_rcu(&fn->rcu, node_free_rcu);
> +}
> +
>  void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
>  {
>  	int cpu;
> @@ -697,9 +709,9 @@ static struct fib6_node *fib6_add_1(struct fib6_node *root,
>
>  		if (!in || !ln) {
>  			if (in)
> -				node_free(in);
> +				node_free_immediate(in);
>  			if (ln)
> -				node_free(ln);
> +				node_free_immediate(ln);
>  			return ERR_PTR(-ENOMEM);
>  		}
>
> @@ -1138,7 +1150,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt,
>  				   root, and then (in failure) stale node
>  				   in main tree.
>  				 */
> -				node_free(sfn);
> +				node_free_immediate(sfn);
>  				err = PTR_ERR(sn);
>  				goto failure;
>  			}
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index bec12ae3e6b7..4de2d793c4b8 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1289,7 +1289,9 @@ static void rt6_dst_from_metrics_check(struct rt6_info *rt)
>
>  static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
>  {
> -	if (!rt->rt6i_node || (rt->rt6i_node->fn_sernum != cookie))
> +	u32 rt_cookie;
> +
> +	if (!rt6_get_cookie_safe(rt, &rt_cookie) || rt_cookie != cookie)
>  		return NULL;
>
>  	if (rt6_check_expired(rt))
> @@ -1357,8 +1359,14 @@ static void ip6_link_failure(struct sk_buff *skb)
>  		if (rt->rt6i_flags & RTF_CACHE) {
>  			if (dst_hold_safe(&rt->dst))
>  				ip6_del_rt(rt);
> -		} else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) {
> -			rt->rt6i_node->fn_sernum = -1;
> +		} else {
> +			struct fib6_node *fn;
> +
> +			rcu_read_lock();
> +			fn = rcu_dereference(rt->rt6i_node);
> +			if (fn && (rt->rt6i_flags & RTF_DEFAULT))
> +				fn->fn_sernum = -1;
> +			rcu_read_unlock();
>  		}
>  	}
>  }
> --
> 2.14.1.480.gb18f417b89-goog
>
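
Martin's suggested shape can be sketched as follows: return the cookie itself and let 0 mean "no node", which is unambiguous only under his observation that fib6_new_sernum() never hands out 0 (the -1 written by ip6_link_failure() becomes 0xffffffff as a u32, so it stays non-zero). The *_sketch types are hypothetical stand-ins, and the single-threaded sketch omits the rcu_read_lock()/rcu_dereference()/rcu_read_unlock() that the kernel version would keep around the pointer load. The `rt_cookie &&` test preserves rt6_check()'s rule that a route with no node never validates, even against a zero cookie.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-ins for the kernel structures. */
struct fib6_node_sketch {
	int fn_sernum;
};

struct rt6_info_sketch {
	struct fib6_node_sketch *rt6i_node;
};

/* Return the cookie directly; 0 means the route has no node. In the
 * kernel this load would be rcu_dereference() under rcu_read_lock(). */
static u32 rt6_get_cookie_safe_sketch(const struct rt6_info_sketch *rt)
{
	const struct fib6_node_sketch *fn = rt->rt6i_node;

	return fn ? (u32)fn->fn_sernum : 0;
}

/* rt6_check()'s validity test then collapses to one expression. */
static bool rt6_cookie_valid_sketch(const struct rt6_info_sketch *rt,
				    u32 cookie)
{
	u32 rt_cookie = rt6_get_cookie_safe_sketch(rt);

	return rt_cookie && rt_cookie == cookie;
}
```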


* Re: [net-next PATCH 06/10] bpf: sockmap with sk redirect support
From: John Fastabend @ 2017-08-19  3:30 UTC
  To: Alexei Starovoitov, davem, daniel; +Cc: tgraf, netdev, tom
In-Reply-To: <653a9c34-c187-636d-40f4-eb74a819c3d5@fb.com>

[...] (trimmed email leaving proposal - 1 summary)

>>
>>  syscall:
>>
>>   bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
>>   bpf_prog_attach(verdict_prog, map_fd, BPF_SMAP_STREAM_VERDICT, 0);
>>   bpf_prog_attach(parse_prog, map_fd, BPF_SMAP_STREAM_PARSER, 0);
>>   bpf_map_update_elem(map_fd, key, sock_fd, BPF_ANY)
>>   bpf_map_delete_elem(map_fd, key)
>>
>>  helpers:
>>   to insert sock from sock ops program
>>       bpf_sock_map_update(skops, map, key, flags);
>>   to redirect skb to a sock in a sockmap
>>       bpf_sk_redirect_map(map, key, flags)
>>
>>  future work:
>>   bpf_prog_attach(verdict_prog, map_fd, BPF_SOCK_STREAM_VERDICT, 0)
>>   bpf_prog_attach(parse_prog, map_fd, BPF_SOCK_STREAM_PARSER, 0)
>>
>> How does this look? I think it will be both extensible and very usable
>> now.
> 
> Above sounds much better than the present situation.
> Can we take it even further and split psock from sockmap?

The psock data structure itself is almost entirely split at this
point anyway. The remaining two pieces are back pointers to the
map and a key, so we can remove the entry when needed; we need to
remove the map entry when the socket is closed.

We use the sock insertion into the map as the trigger to create
the psock.

> My understanding is that psock->key is there only because you tied
> psock to the map and are using the map as storage for the rx socket

Almost. To clarify, the psock only ever uses the key/map entries
to remove itself on a TCP state change that would close the sock.
> imo separating rx and tx sockets will make it cleaner.
> E.g., we can have a new syscall cmd that creates a psock that holds
> strparser, verdict and potentially other programs.
> Later sock ops program will use a helper:
> bpf_psock_update(skops, psock_obj_handle, flags);
> to assign single skops socket into this psock object.
> The programs (strparser, verdict) will be applied to this skops socket,
> so your inheritance requirement is satisfied.
> And use sockmap only for TX sockets. Either user space via syscall
> will store them in there or sockops program will store them into the map
> via bpf_sock_map_update(skops, sockmap, key, flags); helper.
> Later the verdict program will use
> bpf_sk_redirect_map(sockmap, key, flags);
> and for the program author no need to worry about 'type' of socket
> in the sockmap. All sockets in there are TX sockets to redirect to.
> And the same verdict program can use multiple sockmaps.
> Similarly user space can create multiple psock objects with
> same strparser+verdict programs or different and sockops prog
> can pick and choose which psock to use to assign RX socket into.
> 

I prefer the alternative below; it seems a bit cleaner to me. I think
a very similar programming flow to the above can be achieved using
the primitives below.

> Another alternative:
> Instead of new psock object to store single socket (like current
> implementation does), we can do two types of sockmap.
> One for a set of RX sockets. All of them will have the same
> strparser+verdict progs and psock with skbuff queue will be part
> of this sockmap type.
> And another sockmap type for TX sockets that don't have skbuff queues

Clarification: the skbuff queue in the psock data structure,

	struct sk_buff_head rxqueue;

is actually attached to the TX socket and run through the workqueue.
The naming is perhaps unfortunate; it made sense to me because the
sending sock is receiving skbs from multiple socks.

> at all and can only be used to redirect the RX socket into.
> So bpf_rx_sock_map_update() helper will be used only on RX_SOCKMAP map
> and bpf_tx_sock_map_update() helper will be used only on TX_SOCKMAP,
> while bpf_sk_redirect_map() can only be used on TX_SOCKMAP.
> 

So this is really close to what I proposed above. For a TX_SOCKMAP
simply do not attach any programs,

   bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
   [...]

For an RX_SOCKMAP,

   bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
   bpf_prog_attach(verdict_prog, map_fd, BPF_SMAP_STREAM_VERDICT, 0);
   bpf_prog_attach(parse_prog, map_fd, BPF_SMAP_STREAM_PARSER, 0);   

With the new attach type (compared to the fd2 thing before) we can easily
extend maps to contain other program types as needed. So in the future
we might have TX_SOCKMAP, RX_SOCKMAP, FOO_SOCKMAP, ...

I don't see the need to have the API enforce the map type via update
specifiers bpf_{rx|tx}_sock_map_update. The programmer should "know"
the type by virtue of the programs attached. This is more flexible
as well because it allows a map to be TX only, RX only or TX/RX.

With this proposal we can relax the restriction where a sock can only
be in a single map and even allow a sock to be in the same map multiple
times. The limitation we do have to enforce is on allowing a sock into
a map with different BPF_SMAP_STREAM_* programs. But I think this
should be clear to the programmer (with good tracing functions and
error codes).

Slight aside: by creating a map of size 1 we have an object that
contains programs, and later we can attach a sock to it. It looks like
the following,

      bpf_create_map(BPF_MAP_TYPE_SOCKMAP, ...)
      bpf_prog_attach(...)
      [...]
      bpf_map_update_elem(map_fd, key, sock_fd, BPF_ANY)

I think this is very close to your first approach where you suggested
a program container object.

> Or you have cases when two RX sockets need to redirect into each
> other and in both cases strparser+verdict need to run?

If we don't impose rx/tx restrictions and use my suggestion here, we
don't have this limitation. Or, because we now allow socks in multiple
maps, the user can simply put the sockets in different maps.

> In such a case we need to allow bpf_sk_redirect_map() to be used on
> RX_SOCKMAP map as well,
> but looking at current implementation you only allow one psock per map,
> so two sockets forwarding to each other cannot work due to only one queue.
> Am I missing anything from what you want to achieve?

I don't think so. But let's get rid of the one psock per map; I took a shot
at relaxing that today and was able to get it with a refcount on the psock
which seems to work OK.
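A minimal sketch of the refcount idea, assuming a single-threaded model (the kernel would need atomic refcounting and RCU, which this omits): every map slot that references a socket holds one reference on a shared psock, and the psock is freed only when the last slot drops it. `struct psock_model`, `psock_alloc()`, `psock_get()` and `psock_put()` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical per-socket state shared by every map slot holding the sock. */
struct psock_model {
	int refcnt;
	int sock_id;
};

static struct psock_model *psock_alloc(int sock_id)
{
	struct psock_model *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->refcnt = 1;     /* reference owned by the first map slot */
	p->sock_id = sock_id;
	return p;
}

/* Another slot (in the same or a different map) starts referencing it. */
static struct psock_model *psock_get(struct psock_model *p)
{
	p->refcnt++;
	return p;
}

/* Drops one slot's reference; returns 1 if this freed the psock. */
static int psock_put(struct psock_model *p)
{
	if (--p->refcnt == 0) {
		free(p);
		return 1;
	}
	return 0;
}
```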

Also reorganizing the psock structure into clear sections tx_psock, rx_psock,
general_psock will probably help readers.

> Thoughts?
> 

What do you think of my counter-proposal? I started coding it up and,
other than pushing code snippets around, it seems to work out nicely
with the existing code base. I think it is really a nice improvement.

Thanks,
John

^ permalink raw reply

* Re: [RFC] about net: Fix inconsistent teardown and release of private netdev state.
From: Eric Dumazet @ 2017-08-19  3:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20170818.155835.2053259542629638150.davem@davemloft.net>

On Fri, 2017-08-18 at 15:58 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 18 Aug 2017 06:13:49 -0700
> 
> > On Thu, 2017-08-17 at 22:21 -0700, David Miller wrote:
> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> Date: Thu, 17 Aug 2017 15:30:40 -0700
> >> 
> >> > So we do not really know if we need to clean up or not.
> >> 
> >> We always know: whenever register_netdev() fails, we never need to
> >> perform any of the cleanup that priv_destructor does.
> >> 
> >> > Any idea how to fix the issue ?
> >> 
> >> Your patch is exactly how we should fix this, but without the comment.
> >> The logic is straightforward.
> >> 
> >> If register_netdevice() fails, any resources handled by priv_destructor
> >> are guaranteed to be cleaned up.
> > 
> > Not in current code.
> > 
> > There are some failures which do a "goto out;" 
> > 
> > out:
> > 	return ret;
> > 
> > 
> > In these cases, priv_destructor is not called.
> > 
> > So we need multiple fixes I think :/
> 
> I don't think so.
> 
> The cases that "goto out;" in register_netdevice() are those that
> execute before ->ndo_init() succeeds.
> 
> Only if ->ndo_init() runs successfully should ->priv_destructor()
> need to execute.
> 
> So everything is fine as far as I can see.

Let's look at tun->pcpu_stats, for example.

It is allocated at line 1831, before the call to register_netdevice().

drivers/net/tun.c does not provide ndo_init().
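To make the ownership question concrete, here is a userspace model (all names hypothetical, not the netdev API). `model_register()` stands in for register_netdevice(): an "early" failure returns before ndo_init() runs and never calls priv_destructor, while a "late" failure happens after ndo_init() and does call it. A driver that allocates its stats before registering and has no ndo_init(), as described above for tun, still owns the allocation after an early failure. Moving the allocation into ndo_init() is shown as one way to make the rule unambiguous; that is an assumption of this sketch, not necessarily the fix that was adopted.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a net_device with the two hooks under discussion. */
struct model_dev {
	void *stats;                           /* stands in for tun->pcpu_stats */
	int (*ndo_init)(struct model_dev *dev);
	void (*priv_destructor)(struct model_dev *dev);
};

/* Stand-in for register_netdevice(). fail_early mirrors the "goto out;"
 * paths taken before ->ndo_init(); fail_late happens after ->ndo_init()
 * has succeeded and therefore runs ->priv_destructor(). */
static int model_register(struct model_dev *dev, bool fail_early, bool fail_late)
{
	if (fail_early)
		return -1;
	if (dev->ndo_init && dev->ndo_init(dev))
		return -1;  /* ndo_init cleans up its own partial failure */
	if (fail_late) {
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;
	}
	return 0;
}

static int stats_init(struct model_dev *dev)
{
	dev->stats = malloc(64);
	return dev->stats ? 0 : -1;
}

static void stats_free(struct model_dev *dev)
{
	free(dev->stats);
	dev->stats = NULL;
}
```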

^ permalink raw reply

* Re: [net-next PATCH 06/10] bpf: sockmap with sk redirect support
From: Alexei Starovoitov @ 2017-08-19  4:50 UTC (permalink / raw)
  To: John Fastabend, davem, daniel; +Cc: tgraf, netdev, tom
In-Reply-To: <5997B0C5.703@gmail.com>

On 8/18/17 8:30 PM, John Fastabend wrote:
> So this is really close to what I proposed above. For a TX_SOCKMAP
> simply do not attach any programs,
>
>    bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
>    [...]
>
> For an RX_SOCKMAP,
>
>    bpf_create_map(BPF_MAP_TYPE_SOCKMAP, .... )
>    bpf_prog_attach(verdict_prog, map_fd, BPF_SMAP_STREAM_VERDICT, 0);
>    bpf_prog_attach(parse_prog, map_fd, BPF_SMAP_STREAM_PARSER, 0);
>
> With the new attach type (compared to the fd2 thing before) we can easily
> extend maps to contain other program types as needed. So in the future
> we might have TX_SOCKMAP, RX_SOCKMAP, FOO_SOCKMAP, ...

agree. that sounds like a good generalization.

> I don't see the need to have the API enforce the map type via update
> specifiers bpf_{rx|tx}_sock_map_update. The programmer should "know"
> the type by virtue of the programs attached. This is more flexible
> as well because it allows a map to be TX only, RX only or TX/RX.

makes sense. good point.

> With this proposal we can relax the restriction where a sock can only
> be in a single map and even allow a sock to be in the same map multiple
> times. The limitation we do have to enforce is not allowing a sock into
> maps with different BPF_SMAP_STREAM_* programs. But I think this
> should be clear to the programmer (with good tracing functions and
> error codes).
>
> Slight aside: by creating a map of size 1 we get an object that
> contains programs, to which we can later attach a sock. It looks like
> the following:
>
>       create_map(BPF_MAP_TYPE_SOCKMAP,...)
>       bpf_prog_attach(...)
>       [...]
>       bpf_update_map_elem(fd, map, key, flags)
>
> I think this is very close to your first approach where you suggested
> a program container object.

yep.

>> Or you have cases when two RX sockets need to redirect into each
>> other and in both cases strparser+verdict need to run?
> If we drop the RX/TX restrictions and use my suggestion here, we
> don't have this limitation. Alternatively, because we now allow socks in
> multiple maps, the user can simply put the sockets in different maps.

agree. good point as well.

>> In such a case we need to allow bpf_sk_redirect_map() to be used on
>> RX_SOCKMAP map as well,
>> but looking at current implementation you only allow one psock per map,
>> so two sockets forwarding to each other cannot work due to only one queue.
>> Am I missing anything from what you want to achieve?
> I don't think so. But let's get rid of the one psock per map; I took a shot
> at relaxing that today and was able to get it with a refcount on the psock
> which seems to work OK.

+1

> Also reorganizing the psock structure into clear sections tx_psock, rx_psock,
> general_psock will probably help readers.

nice. thanks!

>> Thoughts?
>>
> What do you think of my counter-proposal? I started coding it up and,
> other than pushing code snippets around, it seems to work out nicely
> with the existing code base. I think it is really a nice improvement.

ok. I think we're mostly on the same page and patches will
either bring us to full agreement or show where we disagree :)
To clarify, I think the current code base is pretty good.
I'm only arguing to fix up the rough spots of the uapi to make
sure we don't corner ourselves with future extensions that I feel
inevitably will follow.
The feature itself is quite important and I feel a bit sad that it
landed without enough due diligence. The RFC patches didn't get
much attention and I didn't have time until now to look into them
in depth.

^ permalink raw reply

* [PATCH net-next 0/3 v6] Add support for rmnet driver
From: Subash Abhinov Kasiviswanathan @ 2017-08-19  5:35 UTC (permalink / raw)
  To: netdev, davem, fengguang.wu, dcbw, jiri, stephen, David.Laight,
	marcel
  Cc: Subash Abhinov Kasiviswanathan

This series adds support for the rmnet driver, which is required to
support recent chipsets with Qualcomm Technologies, Inc. modems. The data
from the hardware follows the multiplexing and aggregation protocol (MAP).
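As a sketch of what decoding a MAP frame involves, the function below parses the 4-byte MAP header using the layout documented in Documentation/networking/rmnet.txt (patch 3/3): a command/data bit, a reserved bit, a 6-bit pad length, an 8-bit multiplexer ID and a 16-bit payload length. It assumes that bit 0 in that table is the most significant bit of the first byte and that the length is big-endian (the document says the header follows the IP packet's endianness). `struct map_hdr` and `map_parse_hdr()` are hypothetical names, not the driver's own.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_HDR_LEN 4

/* Hypothetical decoded view of a MAP header. */
struct map_hdr {
	uint8_t cd_bit;    /* 1 = command, 0 = data */
	uint8_t pad_len;   /* padding bytes at the end of the payload */
	uint8_t mux_id;    /* selects the PDN / logical rmnet device */
	uint16_t pkt_len;  /* payload length incl. padding, excl. header */
};

/* Returns 0 on success, -1 if the buffer is too short for a header. */
static int map_parse_hdr(const uint8_t *buf, size_t len, struct map_hdr *h)
{
	if (len < MAP_HDR_LEN)
		return -1;
	h->cd_bit  = (buf[0] >> 7) & 0x1;  /* bit 0: command / data */
	/* bit 1 (0x40) is reserved and ignored by the receiver */
	h->pad_len = buf[0] & 0x3f;        /* bits 2-7 */
	h->mux_id  = buf[1];               /* bits 8-15 */
	h->pkt_len = (uint16_t)((buf[2] << 8) | buf[3]); /* bits 16-31 */
	return 0;
}
```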

This driver can be used to register onto any physical network device in
IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.

The rmnet driver decodes these packets and queues them to the network
stack; in the other direction it encodes packets and transmits them to the
physical device.
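De-aggregation, as described in Documentation/networking/rmnet.txt (patch 3/3), amounts to walking a buffer of back-to-back [MAP header | payload | optional padding] records, where the length field already includes the padding. The sketch below assumes the same header layout as that document (bit 0 as the MSB of the first byte, big-endian length) and hands each payload, with padding stripped, to a callback. `map_deaggregate()` and `map_count_bytes()` are hypothetical helpers; the real driver works on skbs rather than flat buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback receives each payload with trailing padding removed. */
typedef void (*map_deliver_fn)(uint8_t mux_id, int is_cmd,
			       const uint8_t *payload, size_t len, void *ctx);

/* Walks an aggregated buffer. Returns the number of frames delivered,
 * or -1 if a header is truncated or a length runs past the buffer. */
static int map_deaggregate(const uint8_t *buf, size_t len,
			   map_deliver_fn deliver, void *ctx)
{
	size_t off = 0;
	int frames = 0;

	while (off < len) {
		if (len - off < 4)
			return -1;                     /* truncated header */
		const uint8_t *h = buf + off;
		int is_cmd = (h[0] >> 7) & 0x1;
		size_t pad = h[0] & 0x3f;
		uint8_t mux_id = h[1];
		size_t pkt_len = ((size_t)h[2] << 8) | h[3]; /* incl. padding */

		if (pkt_len < pad || len - off - 4 < pkt_len)
			return -1;                     /* malformed length */
		deliver(mux_id, is_cmd, h + 4, pkt_len - pad, ctx);
		off += 4 + pkt_len;
		frames++;
	}
	return frames;
}

/* Example callback: adds each delivered payload length to *(size_t *)ctx. */
static void map_count_bytes(uint8_t mux_id, int is_cmd,
			    const uint8_t *payload, size_t len, void *ctx)
{
	(void)mux_id;
	(void)is_cmd;
	(void)payload;
	*(size_t *)ctx += len;
}
```

In this example a zero-length record still advances the offset by the header size, so the walk always terminates.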

--
v1: Same as the RFC patch with some minor fixes for issues reported by
kbuild test robot.

v1->v2: Change datatypes and remove config IOCTL as mentioned by David.
Also fix checkpatch issues and remove some unused code.

v2->v3: Move location to drivers/net and rename to rmnet. Change the
userspace - netlink communication from custom netlink to rtnl_link_ops.
Refactor some code. Use a fixed config for ingress and egress.

v3->v4: Move location to drivers/net/ethernet/qualcomm/.
Fix comments from Stephen and Jiri -
Split the ether and arp type changes into separate patches.
Remove debug and custom logging and switch to standard netdevice log.
Remove module parameters. Refactor some code and fix code style issues.

v4->v5: Rename some structs and variables. Move the initializer
before the for loop start. Put the arp type in correct sequence.

v5->v6: Fix comments from Dan -
Use the upper link API. As a result, remove all the refcounting logic.
Device refcount is explicitly held on real_dev on rx_handler
registration only. Modify the flow control struct. Remove the unused
ethernet mode handling.

Subash Abhinov Kasiviswanathan (3):
  net: ether: Add support for multiplexing and aggregation type
  net: arp: Add support for raw IP device
  drivers: net: ethernet: qualcomm: rmnet: Initial implementation

 Documentation/networking/rmnet.txt                 |  82 ++++
 drivers/net/ethernet/qualcomm/Kconfig              |   2 +
 drivers/net/ethernet/qualcomm/Makefile             |   2 +
 drivers/net/ethernet/qualcomm/rmnet/Kconfig        |  12 +
 drivers/net/ethernet/qualcomm/rmnet/Makefile       |  14 +
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 412 +++++++++++++++++++++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  57 +++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 276 ++++++++++++++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h   |  26 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_main.c   |  37 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h    |  88 +++++
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c    | 116 ++++++
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   | 105 ++++++
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h    |  45 +++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c    | 268 ++++++++++++++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h    |  33 ++
 include/uapi/linux/if_arp.h                        |   1 +
 include/uapi/linux/if_ether.h                      |   4 +-
 18 files changed, 1579 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_main.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h

-- 
1.9.1

^ permalink raw reply

* [PATCH net-next 1/3 v6] net: ether: Add support for multiplexing and aggregation type
From: Subash Abhinov Kasiviswanathan @ 2017-08-19  5:35 UTC (permalink / raw)
  To: netdev, davem, fengguang.wu, dcbw, jiri, stephen, David.Laight,
	marcel
  Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1503120931-30092-1-git-send-email-subashab@codeaurora.org>

Define the multiplexing and aggregation (MAP) ether type 0xDA1A. This
is needed for receiving MAP protocol data in drivers such as rmnet. This
is not an officially registered ID.
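For illustration, a receiver on an Ethernet-framed link could recognize MAP frames by the EtherType at byte offset 12, read in network byte order. The value 0xDA1A is the one this patch defines; `frame_is_map()` is a hypothetical helper and, as a simplifying assumption, ignores VLAN-tagged frames.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_MAP 0xDA1A  /* value added to include/uapi/linux/if_ether.h */
#define ETH_HLEN  14      /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Returns 1 if an untagged Ethernet frame carries the MAP EtherType. */
static int frame_is_map(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	if (len < ETH_HLEN)
		return 0;
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	return ethertype == ETH_P_MAP;
}
```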

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 include/uapi/linux/if_ether.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 5bc9bfd..e80b03f 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -104,7 +104,9 @@
 #define ETH_P_QINQ3	0x9300		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
 #define ETH_P_EDSA	0xDADA		/* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
 #define ETH_P_AF_IUCV   0xFBFB		/* IBM af_iucv [ NOT AN OFFICIALLY REGISTERED ID ] */
-
+#define ETH_P_MAP       0xDA1A          /* Multiplexing and Aggregation Protocol
+					 * [ NOT AN OFFICIALLY REGISTERED ID ]
+					 */
 #define ETH_P_802_3_MIN	0x0600		/* If the value in the ethernet type is less than this value
 					 * then the frame is Ethernet II. Else it is 802.3 */
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 2/3 v6] net: arp: Add support for raw IP device
From: Subash Abhinov Kasiviswanathan @ 2017-08-19  5:35 UTC (permalink / raw)
  To: netdev, davem, fengguang.wu, dcbw, jiri, stephen, David.Laight,
	marcel
  Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1503120931-30092-1-git-send-email-subashab@codeaurora.org>

Define the raw IP type. This is needed for raw IP net devices
like rmnet.
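Because an ARPHRD_RAWIP device has no link-layer header, the receiver has to infer the network protocol from the packet itself. A common approach, sketched here as an assumption rather than as rmnet's exact code, is to inspect the IP version nibble of the first byte. `rawip_proto()` is a hypothetical helper that returns the host-order EtherType, or 0 for anything that is not IPv4 or IPv6.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD

/* Returns the EtherType for a raw IP packet, or 0 if it is not IPv4/IPv6. */
static uint16_t rawip_proto(const uint8_t *pkt, size_t len)
{
	if (len < 1)
		return 0;
	switch (pkt[0] & 0xF0) {  /* high nibble = IP version */
	case 0x40:
		return ETH_P_IP;
	case 0x60:
		return ETH_P_IPV6;
	default:
		return 0;
	}
}
```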

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 include/uapi/linux/if_arp.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index cf73510..a2a6356 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -59,6 +59,7 @@
 #define ARPHRD_LAPB	516		/* LAPB				*/
 #define ARPHRD_DDCMP    517		/* Digital's DDCMP protocol     */
 #define ARPHRD_RAWHDLC	518		/* Raw HDLC			*/
+#define ARPHRD_RAWIP    519		/* Raw IP                       */
 
 #define ARPHRD_TUNNEL	768		/* IPIP tunnel			*/
 #define ARPHRD_TUNNEL6	769		/* IP6IP6 tunnel       		*/
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 3/3 v6] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
From: Subash Abhinov Kasiviswanathan @ 2017-08-19  5:35 UTC (permalink / raw)
  To: netdev, davem, fengguang.wu, dcbw, jiri, stephen, David.Laight,
	marcel
  Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1503120931-30092-1-git-send-email-subashab@codeaurora.org>

The rmnet driver provides transport-agnostic MAP (multiplexing and
aggregation protocol) support in an embedded module. The module provides
virtual network devices that can be attached to any IP-mode physical
device. This will be used to provide all MAP functionality on future
hardware in a single, consistent location.
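A minimal model of the demultiplexing step, assuming a per-real-device table indexed by multiplexer ID in the spirit of the muxed_ep[RMNET_MAX_LOGICAL_EP] array this patch adds to struct rmnet_real_dev_info: each entry names the virtual device bound to that mux ID, and packets for unbound IDs are dropped. `struct demux_table`, `demux_bind()` and `demux_lookup()` are hypothetical, and plain ifindex integers stand in for net_device pointers.

```c
#include <stdint.h>

#define MAX_LOGICAL_EP 255  /* mirrors RMNET_MAX_LOGICAL_EP in the patch */

/* Hypothetical per-real-device table; 0 means "no virtual device bound". */
struct demux_table {
	int vnd_ifindex[MAX_LOGICAL_EP];
};

/* Binds a virtual device to a mux ID, accepting the same 1..254 range
 * as rmnet_rtnl_validate(). Returns 0 on success, -1 on bad input. */
static int demux_bind(struct demux_table *t, unsigned int mux_id, int ifindex)
{
	if (mux_id == 0 || mux_id >= MAX_LOGICAL_EP || ifindex <= 0)
		return -1;
	t->vnd_ifindex[mux_id] = ifindex;
	return 0;
}

/* Returns the virtual device for a received mux ID, or 0 to drop. */
static int demux_lookup(const struct demux_table *t, uint8_t mux_id)
{
	if (mux_id >= MAX_LOGICAL_EP)
		return 0;
	return t->vnd_ifindex[mux_id];
}
```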

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 Documentation/networking/rmnet.txt                 |  82 ++++
 drivers/net/ethernet/qualcomm/Kconfig              |   2 +
 drivers/net/ethernet/qualcomm/Makefile             |   2 +
 drivers/net/ethernet/qualcomm/rmnet/Kconfig        |  12 +
 drivers/net/ethernet/qualcomm/rmnet/Makefile       |  14 +
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 412 +++++++++++++++++++++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  57 +++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 276 ++++++++++++++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h   |  26 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_main.c   |  37 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h    |  88 +++++
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c    | 116 ++++++
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   | 105 ++++++
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h    |  45 +++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c    | 268 ++++++++++++++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h    |  33 ++
 16 files changed, 1575 insertions(+)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_main.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h

diff --git a/Documentation/networking/rmnet.txt b/Documentation/networking/rmnet.txt
new file mode 100644
index 0000000..6b341ea
--- /dev/null
+++ b/Documentation/networking/rmnet.txt
@@ -0,0 +1,82 @@
+1. Introduction
+
+The rmnet driver is used to support the Multiplexing and Aggregation
+Protocol (MAP). This protocol is used by all recent chipsets with Qualcomm
+Technologies, Inc. modems.
+
+This driver can be used to register onto any physical network device in
+IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.
+
+Multiplexing allows for creation of logical netdevices (rmnet devices) to
+handle multiple private data networks (PDN) like a default internet, tethering,
+multimedia messaging service (MMS) or IP multimedia subsystem (IMS). Hardware
+sends packets with MAP headers to rmnet. Based on the multiplexer ID, rmnet
+routes each packet to the appropriate PDN after removing the MAP header.
+
+Aggregation is required to achieve high data rates. This involves hardware
+sending an aggregated bunch of MAP frames. The rmnet driver de-aggregates
+these MAP frames and sends them to the appropriate PDNs.
+
+2. Packet format
+
+a. MAP packet (data / control)
+
+The MAP header has the same endianness as the IP packet.
+
+Packet format -
+
+Bit             0             1           2-7      8 - 15           16 - 31
+Function   Command / Data   Reserved     Pad   Multiplexer ID    Payload length
+Bit            32 - x
+Function     Raw  Bytes
+
+The Command (1) / Data (0) bit indicates whether the packet is a MAP command
+or a data packet. Command packets are used for transport-level flow control.
+Data packets are standard IP packets.
+
+Reserved bits are usually zeroed out and are to be ignored by the receiver.
+
+Padding is the number of bytes added for 4-byte alignment, if required by
+hardware.
+
+The multiplexer ID indicates the PDN on which the data has to be sent.
+
+Payload length includes the padding length but does not include MAP header
+length.
+
+b. MAP packet (command specific)
+
+Bit             0             1           2-7      8 - 15           16 - 31
+Function   Command         Reserved     Pad   Multiplexer ID    Payload length
+Bit          32 - 39        40 - 45    46 - 47       48 - 63
+Function   Command name    Reserved   Command Type   Reserved
+Bit          64 - 95
+Function   Transaction ID
+Bit          96 - 127
+Function   Command data
+
+Command name 1 indicates disabling flow, while 2 indicates enabling flow.
+
+Command types -
+0 for MAP command request
+1 is to acknowledge the receipt of a command
+2 is for unsupported commands
+3 is for error during processing of commands
+
+c. Aggregation
+
+Aggregation is multiple MAP packets (can be data or command) delivered to
+rmnet in a single linear skb. rmnet will process the individual
+packets and either ACK the MAP command or deliver the IP packet to the
+network stack as needed.
+
+MAP header|IP Packet|Optional padding|MAP header|IP Packet|Optional padding....
+MAP header|IP Packet|Optional padding|MAP header|Command Packet|Optional pad...
+
+3. Userspace configuration
+
+rmnet userspace configuration is done through the netlink library librmnetctl
+and the command line utility rmnetcli. The utility is hosted in the Code Aurora
+Forum git. The driver uses rtnl_link_ops for communication.
+
+https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/dataservices/tree/rmnetctl
diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig
index 877675a..f520071 100644
--- a/drivers/net/ethernet/qualcomm/Kconfig
+++ b/drivers/net/ethernet/qualcomm/Kconfig
@@ -59,4 +59,6 @@ config QCOM_EMAC
 	  low power, Receive-Side Scaling (RSS), and IEEE 1588-2008
 	  Precision Clock Synchronization Protocol.
 
+source "drivers/net/ethernet/qualcomm/rmnet/Kconfig"
+
 endif # NET_VENDOR_QUALCOMM
diff --git a/drivers/net/ethernet/qualcomm/Makefile b/drivers/net/ethernet/qualcomm/Makefile
index 92fa7c4..c4f38bd 100644
--- a/drivers/net/ethernet/qualcomm/Makefile
+++ b/drivers/net/ethernet/qualcomm/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_QCA7000_UART) += qcauart.o
 qcauart-objs := qca_uart.o
 
 obj-y += emac/
+
+obj-$(CONFIG_RMNET) += rmnet/
\ No newline at end of file
diff --git a/drivers/net/ethernet/qualcomm/rmnet/Kconfig b/drivers/net/ethernet/qualcomm/rmnet/Kconfig
new file mode 100644
index 0000000..4948f14
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/Kconfig
@@ -0,0 +1,12 @@
+#
+# RMNET MAP driver
+#
+
+menuconfig RMNET
+	depends on NETDEVICES
+	bool "RmNet MAP driver"
+	default n
+	---help---
+	  If you say Y here, then the rmnet module will be statically
+	  compiled into the kernel. The rmnet module provides MAP
+	  functionality for embedded and bridged traffic.
diff --git a/drivers/net/ethernet/qualcomm/rmnet/Makefile b/drivers/net/ethernet/qualcomm/rmnet/Makefile
new file mode 100644
index 0000000..2b6c9cf
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for the RMNET module
+#
+
+rmnet-y		 := rmnet_main.o
+rmnet-y		 += rmnet_config.o
+rmnet-y		 += rmnet_vnd.o
+rmnet-y		 += rmnet_handlers.o
+rmnet-y		 += rmnet_map_data.o
+rmnet-y		 += rmnet_map_command.o
+rmnet-y		 += rmnet_stats.o
+obj-$(CONFIG_RMNET) += rmnet.o
+
+CFLAGS_rmnet_main.o := -I$(src)
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
new file mode 100644
index 0000000..5338bab
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -0,0 +1,412 @@
+/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * RMNET configuration engine
+ *
+ */
+
+#include <net/sock.h>
+#include <linux/netlink.h>
+#include <linux/netdevice.h>
+#include "rmnet_config.h"
+#include "rmnet_handlers.h"
+#include "rmnet_vnd.h"
+#include "rmnet_private.h"
+
+/* Local Definitions and Declarations */
+#define RMNET_LOCAL_LOGICAL_ENDPOINT -1
+
+struct rmnet_free_work {
+	struct work_struct work;
+	struct net_device *rmnet_dev;
+};
+
+static inline int
+rmnet_is_real_dev_registered(const struct net_device *real_dev)
+{
+	rx_handler_func_t *rx_handler;
+
+	rx_handler = rcu_dereference(real_dev->rx_handler);
+	return (rx_handler == rmnet_rx_handler);
+}
+
+static inline struct rmnet_real_dev_info*
+__rmnet_get_real_dev_info(const struct net_device *real_dev)
+{
+	if (rmnet_is_real_dev_registered(real_dev))
+		return (struct rmnet_real_dev_info *)
+			rcu_dereference(real_dev->rx_handler_data);
+	else
+		return 0;
+}
+
+static struct rmnet_endpoint*
+rmnet_get_endpoint(struct net_device *dev, int config_id)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	struct rmnet_endpoint *ep;
+
+	if (!rmnet_is_real_dev_registered(dev)) {
+		ep = rmnet_vnd_get_endpoint(dev);
+	} else {
+		rdinfo = __rmnet_get_real_dev_info(dev);
+
+		if (!rdinfo)
+			return NULL;
+
+		if (config_id == RMNET_LOCAL_LOGICAL_ENDPOINT)
+			ep = &rdinfo->local_ep;
+		else
+			ep = &rdinfo->muxed_ep[config_id];
+	}
+
+	return ep;
+}
+
+static int rmnet_unregister_real_device(struct net_device *real_dev)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	struct list_head *iter;
+
+	ASSERT_RTNL();
+
+	if (!rmnet_is_real_dev_registered(real_dev) ||
+	    netdev_lower_get_next(real_dev, &iter))
+		return -EINVAL;
+
+	rdinfo = __rmnet_get_real_dev_info(real_dev);
+	kfree(rdinfo);
+
+	netdev_rx_handler_unregister(real_dev);
+
+	/* release reference on real_dev */
+	dev_put(real_dev);
+
+	netdev_info(real_dev, "Removed from rmnet\n");
+	return 0;
+}
+
+static int rmnet_register_real_device(struct net_device *real_dev)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	int rc;
+
+	ASSERT_RTNL();
+
+	if (rmnet_is_real_dev_registered(real_dev))
+		return -EINVAL;
+
+	rdinfo = kzalloc(sizeof(*rdinfo), GFP_ATOMIC);
+	if (!rdinfo)
+		return -ENOMEM;
+
+	rdinfo->dev = real_dev;
+	rc = netdev_rx_handler_register(real_dev, rmnet_rx_handler, rdinfo);
+
+	if (rc) {
+		kfree(rdinfo);
+		return -EBUSY;
+	}
+
+	/* hold on to real dev for MAP data */
+	dev_hold(real_dev);
+
+	netdev_info(real_dev, "registered with rmnet\n");
+	return 0;
+}
+
+static int rmnet_set_ingress_data_format(struct net_device *dev, u32 idf)
+{
+	struct rmnet_real_dev_info *rdinfo;
+
+	ASSERT_RTNL();
+
+	netdev_info(dev, "Ingress format 0x%08X\n", idf);
+
+	rdinfo = __rmnet_get_real_dev_info(dev);
+	if (!rdinfo)
+		return -EINVAL;
+
+	rdinfo->ingress_data_format = idf;
+
+	return 0;
+}
+
+static int rmnet_set_egress_data_format(struct net_device *dev, u32 edf,
+					u16 agg_size, u16 agg_count)
+{
+	struct rmnet_real_dev_info *rdinfo;
+
+	ASSERT_RTNL();
+
+	netdev_info(dev, "Egress format 0x%08X agg size %d cnt %d\n",
+		    edf, agg_size, agg_count);
+
+	rdinfo = __rmnet_get_real_dev_info(dev);
+	if (!rdinfo)
+		return -EINVAL;
+
+	rdinfo->egress_data_format = edf;
+
+	return 0;
+}
+
+static int __rmnet_set_endpoint_config(struct net_device *dev, int config_id,
+				       struct rmnet_endpoint *ep)
+{
+	struct rmnet_endpoint *dev_ep;
+
+	ASSERT_RTNL();
+
+	dev_ep = rmnet_get_endpoint(dev, config_id);
+
+	if (!dev_ep)
+		return -EINVAL;
+
+	memcpy(dev_ep, ep, sizeof(struct rmnet_endpoint));
+	if (config_id == RMNET_LOCAL_LOGICAL_ENDPOINT)
+		dev_ep->mux_id = 0;
+	else
+		dev_ep->mux_id = config_id;
+
+	return 0;
+}
+
+static int __rmnet_unset_endpoint_config(struct net_device *dev, int config_id)
+{
+	struct rmnet_endpoint *ep = 0;
+
+	ASSERT_RTNL();
+
+	ep = rmnet_get_endpoint(dev, config_id);
+	if (!ep)
+		return -EINVAL;
+
+	memset(ep, 0, sizeof(struct rmnet_endpoint));
+
+	return 0;
+}
+
+static int rmnet_set_endpoint_config(struct net_device *dev,
+				     int config_id, u8 rmnet_mode,
+				     struct net_device *egress_dev)
+{
+	struct rmnet_endpoint ep;
+
+	netdev_info(dev, "id %d mode %d dev %s\n",
+		    config_id, rmnet_mode, egress_dev->name);
+
+	if (config_id < RMNET_LOCAL_LOGICAL_ENDPOINT ||
+	    config_id >= RMNET_MAX_LOGICAL_EP)
+		return -EINVAL;
+
+	memset(&ep, 0, sizeof(struct rmnet_endpoint));
+	ep.rmnet_mode = rmnet_mode;
+	ep.egress_dev = egress_dev;
+
+	return __rmnet_set_endpoint_config(dev, config_id, &ep);
+}
+
+static int rmnet_unset_endpoint_config(struct net_device *dev, int config_id)
+{
+	netdev_info(dev, "id %d\n", config_id);
+
+	if (config_id < RMNET_LOCAL_LOGICAL_ENDPOINT ||
+	    config_id >= RMNET_MAX_LOGICAL_EP)
+		return -EINVAL;
+
+	return __rmnet_unset_endpoint_config(dev, config_id);
+}
+
+static int rmnet_newlink(struct net *src_net, struct net_device *dev,
+			 struct nlattr *tb[], struct nlattr *data[],
+			 struct netlink_ext_ack *extack)
+{
+	int ingress_format = RMNET_INGRESS_FORMAT_DEMUXING |
+			     RMNET_INGRESS_FORMAT_DEAGGREGATION |
+			     RMNET_INGRESS_FORMAT_MAP;
+	int egress_format = RMNET_EGRESS_FORMAT_MUXING |
+			    RMNET_EGRESS_FORMAT_MAP;
+	struct net_device *real_dev;
+	int mode = RMNET_EPMODE_VND;
+	u16 mux_id;
+
+	real_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+	if (!real_dev || !dev)
+		return -ENODEV;
+
+	if (!data[IFLA_VLAN_ID])
+		return -EINVAL;
+
+	mux_id = nla_get_u16(data[IFLA_VLAN_ID]);
+
+	rmnet_register_real_device(real_dev);
+
+	if (rmnet_vnd_newlink(real_dev, mux_id, dev))
+		return -EINVAL;
+
+	rmnet_set_egress_data_format(real_dev, egress_format, 0, 0);
+	rmnet_set_ingress_data_format(real_dev, ingress_format);
+	rmnet_set_endpoint_config(real_dev, mux_id, mode, dev);
+	rmnet_set_endpoint_config(dev, mux_id, mode, real_dev);
+	netdev_master_upper_dev_link(dev, real_dev, NULL, NULL);
+	return 0;
+}
+
+static void rmnet_delink(struct net_device *dev, struct list_head *head)
+{
+	struct net_device *real_dev;
+	int mux_id;
+
+	real_dev = netdev_master_upper_dev_get_rcu(dev);
+	if (real_dev) {
+		mux_id = rmnet_vnd_get_mux(real_dev, dev);
+
+		/* rmnet_vnd_get_mux() gives mux_id + 1,
+		 * so subtract 1 to get the correct mux_id
+		 */
+		mux_id--;
+		rmnet_unset_endpoint_config(real_dev, mux_id);
+		rmnet_unset_endpoint_config(dev, mux_id);
+		rmnet_vnd_remove_ref_dev(real_dev, mux_id);
+		netdev_upper_dev_unlink(dev, real_dev);
+		rmnet_unregister_real_device(real_dev);
+	}
+
+	unregister_netdevice_queue(dev, head);
+}
+
+static void rmnet_free_later(struct work_struct *work)
+{
+	struct rmnet_free_work *fwork;
+
+	fwork = container_of(work, struct rmnet_free_work, work);
+
+	rtnl_lock();
+	rmnet_delink(fwork->rmnet_dev, NULL);
+	rtnl_unlock();
+
+	kfree(fwork);
+}
+
+static int rmnet_dev_walk(struct net_device *lower_dev, void *data)
+{
+	struct net_device *real_dev = data;
+	struct rmnet_free_work *vnd_work;
+	int rc = 0;
+
+	netdev_upper_dev_unlink(lower_dev, real_dev);
+
+	vnd_work = kzalloc(sizeof(*vnd_work), GFP_KERNEL);
+	if (!vnd_work)
+		return -ENOMEM;
+
+	INIT_WORK(&vnd_work->work, rmnet_free_later);
+	vnd_work->rmnet_dev = lower_dev;
+	schedule_work(&vnd_work->work);
+
+	return rc;
+}
+
+static void rmnet_force_unassociate_device(struct net_device *dev)
+{
+	struct net_device *real_dev = dev;
+
+	if (!rmnet_is_real_dev_registered(real_dev))
+		return;
+
+	netdev_walk_all_lower_dev(real_dev, rmnet_dev_walk, real_dev);
+	rmnet_unregister_real_device(real_dev);
+}
+
+static int rmnet_config_notify_cb(struct notifier_block *nb,
+				  unsigned long event, void *data)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(data);
+
+	if (!dev)
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_UNREGISTER_FINAL:
+	case NETDEV_UNREGISTER:
+		netdev_info(dev, "Kernel unregister\n");
+		rmnet_force_unassociate_device(dev);
+		break;
+
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block rmnet_dev_notifier __read_mostly = {
+	.notifier_call = rmnet_config_notify_cb,
+};
+
+static int rmnet_rtnl_validate(struct nlattr *tb[], struct nlattr *data[],
+			       struct netlink_ext_ack *extack)
+{
+	u16 mux_id;
+
+	if (!data || !data[IFLA_VLAN_ID])
+		return -EINVAL;
+
+	mux_id = nla_get_u16(data[IFLA_VLAN_ID]);
+	if (!mux_id || mux_id > (RMNET_MAX_LOGICAL_EP - 1))
+		return -ERANGE;
+
+	return 0;
+}
+
+static size_t rmnet_get_size(const struct net_device *dev)
+{
+	return nla_total_size(2); /* IFLA_VLAN_ID */
+}
+
+struct rtnl_link_ops rmnet_link_ops __read_mostly = {
+	.kind		= "rmnet",
+	.maxtype	= __IFLA_VLAN_MAX,
+	.priv_size	= sizeof(struct rmnet_priv),
+	.setup		= rmnet_vnd_setup,
+	.validate	= rmnet_rtnl_validate,
+	.newlink	= rmnet_newlink,
+	.dellink	= rmnet_delink,
+	.get_size	= rmnet_get_size,
+};
+
+struct rmnet_real_dev_info*
+rmnet_get_real_dev_info(struct net_device *real_dev)
+{
+	return __rmnet_get_real_dev_info(real_dev);
+}
+
+int rmnet_config_init(void)
+{
+	int rc;
+
+	rc = register_netdevice_notifier(&rmnet_dev_notifier);
+	if (rc != 0)
+		return rc;
+
+	rc = rtnl_link_register(&rmnet_link_ops);
+	if (rc != 0) {
+		unregister_netdevice_notifier(&rmnet_dev_notifier);
+		return rc;
+	}
+	return rc;
+}
+
+void rmnet_config_exit(void)
+{
+	unregister_netdevice_notifier(&rmnet_dev_notifier);
+	rtnl_link_unregister(&rmnet_link_ops);
+}
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
new file mode 100644
index 0000000..4b29eca
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -0,0 +1,57 @@
+/* Copyright (c) 2013-2014, 2016-2017 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * RMNET Data configuration engine
+ *
+ */
+
+#include <linux/skbuff.h>
+
+#ifndef _RMNET_CONFIG_H_
+#define _RMNET_CONFIG_H_
+
+#define RMNET_MAX_LOGICAL_EP 255
+#define RMNET_MAX_VND        32
+
+/* Information about the next device to deliver the packet to.
+ * Exact usage of this parameter depends on the rmnet_mode.
+ */
+struct rmnet_endpoint {
+	u8 rmnet_mode;
+	u8 mux_id;
+	struct net_device *egress_dev;
+};
+
+/* One instance of this structure is instantiated for each real_dev associated
+ * with rmnet.
+ */
+struct rmnet_real_dev_info {
+	struct net_device *dev;
+	struct rmnet_endpoint local_ep;
+	struct rmnet_endpoint muxed_ep[RMNET_MAX_LOGICAL_EP];
+	u32 ingress_data_format;
+	u32 egress_data_format;
+	struct net_device *rmnet_devices[RMNET_MAX_VND];
+};
+
+extern struct rtnl_link_ops rmnet_link_ops;
+
+struct rmnet_priv {
+	struct rmnet_endpoint local_ep;
+};
+
+struct rmnet_real_dev_info*
+rmnet_get_real_dev_info(struct net_device *real_dev);
+
+int rmnet_config_init(void);
+void rmnet_config_exit(void);
+
+#endif /* _RMNET_CONFIG_H_ */
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
new file mode 100644
index 0000000..f34fe9e
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -0,0 +1,276 @@
+/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * RMNET Data ingress/egress handler
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <linux/netdev_features.h>
+#include "rmnet_private.h"
+#include "rmnet_config.h"
+#include "rmnet_vnd.h"
+#include "rmnet_map.h"
+#include "rmnet_handlers.h"
+
+#define RMNET_IP_VERSION_4 0x40
+#define RMNET_IP_VERSION_6 0x60
+
+/* Helper Functions */
+
+static inline void rmnet_set_skb_proto(struct sk_buff *skb)
+{
+	switch (skb->data[0] & 0xF0) {
+	case RMNET_IP_VERSION_4:
+		skb->protocol = htons(ETH_P_IP);
+		break;
+	case RMNET_IP_VERSION_6:
+		skb->protocol = htons(ETH_P_IPV6);
+		break;
+	default:
+		skb->protocol = htons(ETH_P_MAP);
+		break;
+	}
+}
+
+/* Generic handler */
+
+static rx_handler_result_t
+rmnet_bridge_handler(struct sk_buff *skb, struct rmnet_endpoint *ep)
+{
+	if (!ep->egress_dev)
+		kfree_skb(skb);
+	else
+		rmnet_egress_handler(skb, ep);
+
+	return RX_HANDLER_CONSUMED;
+}
+
+static rx_handler_result_t
+rmnet_deliver_skb(struct sk_buff *skb, struct rmnet_endpoint *ep)
+{
+	switch (ep->rmnet_mode) {
+	case RMNET_EPMODE_NONE:
+		return RX_HANDLER_PASS;
+
+	case RMNET_EPMODE_BRIDGE:
+		return rmnet_bridge_handler(skb, ep);
+
+	case RMNET_EPMODE_VND:
+		skb_reset_transport_header(skb);
+		skb_reset_network_header(skb);
+		switch (rmnet_vnd_rx_fixup(skb, skb->dev)) {
+		case RX_HANDLER_CONSUMED:
+			return RX_HANDLER_CONSUMED;
+
+		case RX_HANDLER_PASS:
+			skb->pkt_type = PACKET_HOST;
+			skb_set_mac_header(skb, 0);
+			netif_receive_skb(skb);
+			return RX_HANDLER_CONSUMED;
+		}
+		return RX_HANDLER_PASS;
+
+	default:
+		kfree_skb(skb);
+		return RX_HANDLER_CONSUMED;
+	}
+}
+
+static rx_handler_result_t
+rmnet_ingress_deliver_packet(struct sk_buff *skb,
+			     struct rmnet_real_dev_info *rdinfo)
+{
+	if (!rdinfo) {
+		kfree_skb(skb);
+		return RX_HANDLER_CONSUMED;
+	}
+
+	skb->dev = rdinfo->local_ep.egress_dev;
+
+	return rmnet_deliver_skb(skb, &rdinfo->local_ep);
+}
+
+/* MAP handler */
+
+static rx_handler_result_t
+__rmnet_map_ingress_handler(struct sk_buff *skb,
+			    struct rmnet_real_dev_info *rdinfo)
+{
+	struct rmnet_endpoint *ep;
+	u8 mux_id;
+	u16 len;
+
+	if (RMNET_MAP_GET_CD_BIT(skb)) {
+		if (rdinfo->ingress_data_format
+		    & RMNET_INGRESS_FORMAT_MAP_COMMANDS)
+			return rmnet_map_command(skb, rdinfo);
+
+		kfree_skb(skb);
+		return RX_HANDLER_CONSUMED;
+	}
+
+	mux_id = RMNET_MAP_GET_MUX_ID(skb);
+	len = RMNET_MAP_GET_LENGTH(skb) - RMNET_MAP_GET_PAD(skb);
+
+	if (mux_id >= RMNET_MAX_LOGICAL_EP) {
+		kfree_skb(skb);
+		return RX_HANDLER_CONSUMED;
+	}
+
+	ep = &rdinfo->muxed_ep[mux_id];
+
+	if (rdinfo->ingress_data_format & RMNET_INGRESS_FORMAT_DEMUXING)
+		skb->dev = ep->egress_dev;
+
+	/* Subtract MAP header */
+	skb_pull(skb, sizeof(struct rmnet_map_header));
+	skb_trim(skb, len);
+	rmnet_set_skb_proto(skb);
+	return rmnet_deliver_skb(skb, ep);
+}
+
+static rx_handler_result_t
+rmnet_map_ingress_handler(struct sk_buff *skb,
+			  struct rmnet_real_dev_info *rdinfo)
+{
+	struct sk_buff *skbn;
+	int rc;
+
+	if (rdinfo->ingress_data_format & RMNET_INGRESS_FORMAT_DEAGGREGATION) {
+		while ((skbn = rmnet_map_deaggregate(skb, rdinfo)) != NULL)
+			__rmnet_map_ingress_handler(skbn, rdinfo);
+
+		consume_skb(skb);
+		rc = RX_HANDLER_CONSUMED;
+	} else {
+		rc = __rmnet_map_ingress_handler(skb, rdinfo);
+	}
+
+	return rc;
+}
+
+static int rmnet_map_egress_handler(struct sk_buff *skb,
+				    struct rmnet_real_dev_info *rdinfo,
+				    struct rmnet_endpoint *ep,
+				    struct net_device *orig_dev)
+{
+	int required_headroom, additional_header_len;
+	struct rmnet_map_header *map_header;
+
+	additional_header_len = 0;
+	required_headroom = sizeof(struct rmnet_map_header);
+
+	if (skb_headroom(skb) < required_headroom) {
+		if (pskb_expand_head(skb, required_headroom, 0, GFP_ATOMIC))
+			return RMNET_MAP_GENERAL_FAILURE;
+	}
+
+	map_header = rmnet_map_add_map_header(skb, additional_header_len, 0);
+	if (!map_header)
+		return RMNET_MAP_GENERAL_FAILURE;
+
+	if (rdinfo->egress_data_format & RMNET_EGRESS_FORMAT_MUXING) {
+		if (ep->mux_id == 0xff)
+			map_header->mux_id = 0;
+		else
+			map_header->mux_id = ep->mux_id;
+	}
+
+	skb->protocol = htons(ETH_P_MAP);
+
+	return RMNET_MAP_SUCCESS;
+}
+
+/* Ingress / Egress Entry Points */
+
+/* Processes packet as per ingress data format for receiving device. Logical
+ * endpoint is determined from packet inspection. Packet is then sent to the
+ * egress device listed in the logical endpoint configuration.
+ */
+rx_handler_result_t rmnet_rx_handler(struct sk_buff **pskb)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	struct sk_buff *skb = *pskb;
+	struct net_device *dev;
+	int rc;
+
+	if (!skb)
+		return RX_HANDLER_CONSUMED;
+
+	dev = skb->dev;
+	rdinfo = rmnet_get_real_dev_info(dev);
+
+	if (rdinfo->ingress_data_format & RMNET_INGRESS_FORMAT_MAP) {
+		rc = rmnet_map_ingress_handler(skb, rdinfo);
+	} else {
+		switch (ntohs(skb->protocol)) {
+		case ETH_P_MAP:
+			if (rdinfo->local_ep.rmnet_mode ==
+				RMNET_EPMODE_BRIDGE) {
+				rc = rmnet_ingress_deliver_packet(skb, rdinfo);
+			} else {
+				kfree_skb(skb);
+				rc = RX_HANDLER_CONSUMED;
+			}
+			break;
+
+		case ETH_P_IP:
+		case ETH_P_IPV6:
+			rc = rmnet_ingress_deliver_packet(skb, rdinfo);
+			break;
+
+		default:
+			rc = RX_HANDLER_PASS;
+		}
+	}
+
+	return rc;
+}
+
+/* Modifies packet as per logical endpoint configuration and egress data format
+ * for egress device configured in logical endpoint. Packet is then transmitted
+ * on the egress device.
+ */
+void rmnet_egress_handler(struct sk_buff *skb,
+			  struct rmnet_endpoint *ep)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	struct net_device *orig_dev;
+
+	orig_dev = skb->dev;
+	skb->dev = ep->egress_dev;
+
+	rdinfo = rmnet_get_real_dev_info(skb->dev);
+	if (!rdinfo) {
+		kfree_skb(skb);
+		return;
+	}
+
+	if (rdinfo->egress_data_format & RMNET_EGRESS_FORMAT_MAP) {
+		switch (rmnet_map_egress_handler(skb, rdinfo, ep, orig_dev)) {
+		case RMNET_MAP_CONSUMED:
+			return;
+
+		case RMNET_MAP_SUCCESS:
+			break;
+
+		default:
+			kfree_skb(skb);
+			return;
+		}
+	}
+
+	if (ep->rmnet_mode == RMNET_EPMODE_VND)
+		rmnet_vnd_tx_fixup(skb, orig_dev);
+
+	dev_queue_xmit(skb);
+}
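rmnet_set_skb_proto() above classifies a de-muxed payload by the version nibble in its first byte. The following is a minimal userspace sketch of the same rule. The sketch_ names are hypothetical and not part of the driver. The EtherType values are assumed to match <linux/if_ether.h>. The length guard is an addition that the driver function does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* EtherType values, assumed to match <linux/if_ether.h>. */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD
#define SKETCH_ETH_P_MAP  0x00F9

/* Hypothetical mirror of rmnet_set_skb_proto(): the upper nibble of the
 * first byte of a raw IP packet is the IP version (0x4_ or 0x6_). Anything
 * else, including an empty buffer, is treated as MAP.
 */
static uint16_t sketch_proto_from_payload(const uint8_t *data, size_t len)
{
	if (len == 0)
		return SKETCH_ETH_P_MAP;

	switch (data[0] & 0xF0) {
	case 0x40:
		return SKETCH_ETH_P_IP;
	case 0x60:
		return SKETCH_ETH_P_IPV6;
	default:
		return SKETCH_ETH_P_MAP;
	}
}
```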
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
new file mode 100644
index 0000000..f2638cf
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
@@ -0,0 +1,26 @@
+/* Copyright (c) 2013, 2016-2017 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * RMNET Data ingress/egress handler
+ *
+ */
+
+#ifndef _RMNET_HANDLERS_H_
+#define _RMNET_HANDLERS_H_
+
+#include "rmnet_config.h"
+
+void rmnet_egress_handler(struct sk_buff *skb,
+			  struct rmnet_endpoint *ep);
+
+rx_handler_result_t rmnet_rx_handler(struct sk_buff **pskb);
+
+#endif /* _RMNET_HANDLERS_H_ */
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_main.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_main.c
new file mode 100644
index 0000000..80c3920
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_main.c
@@ -0,0 +1,37 @@
+/* Copyright (c) 2013-2014, 2016-2017 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ *
+ * RMNET Data generic framework
+ *
+ */
+
+#include <linux/module.h>
+#include "rmnet_private.h"
+#include "rmnet_config.h"
+#include "rmnet_vnd.h"
+
+/* Startup/Shutdown */
+
+static int __init rmnet_init(void)
+{
+	/* Propagate notifier and rtnl_link registration failures */
+	return rmnet_config_init();
+}
+
+static void __exit rmnet_exit(void)
+{
+	rmnet_config_exit();
+}
+
+module_init(rmnet_init)
+module_exit(rmnet_exit)
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
new file mode 100644
index 0000000..2aabad2
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -0,0 +1,88 @@
+/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _RMNET_MAP_H_
+#define _RMNET_MAP_H_
+
+struct rmnet_map_control_command {
+	u8  command_name;
+	u8  cmd_type:2;
+	u8  reserved:6;
+	u16 reserved2;
+	u32 transaction_id;
+	union {
+		struct {
+			u16 ip_family:2;
+			u16 reserved:14;
+			u16 flow_control_seq_num;
+			u32 qos_id;
+		} flow_control;
+		u8 data[0];
+	};
+}  __aligned(1);
+
+enum rmnet_map_results {
+	RMNET_MAP_SUCCESS,
+	RMNET_MAP_CONSUMED,
+	RMNET_MAP_GENERAL_FAILURE,
+	RMNET_MAP_NOT_ENABLED,
+	RMNET_MAP_FAILED_AGGREGATION,
+	RMNET_MAP_FAILED_MUX
+};
+
+enum rmnet_map_commands {
+	RMNET_MAP_COMMAND_NONE,
+	RMNET_MAP_COMMAND_FLOW_DISABLE,
+	RMNET_MAP_COMMAND_FLOW_ENABLE,
+	/* These should always be the last 2 elements */
+	RMNET_MAP_COMMAND_UNKNOWN,
+	RMNET_MAP_COMMAND_ENUM_LENGTH
+};
+
+struct rmnet_map_header {
+	u8  pad_len:6;
+	u8  reserved_bit:1;
+	u8  cd_bit:1;
+	u8  mux_id;
+	u16 pkt_len;
+}  __aligned(1);
+
+#define RMNET_MAP_GET_MUX_ID(Y) (((struct rmnet_map_header *) \
+				 (Y)->data)->mux_id)
+#define RMNET_MAP_GET_CD_BIT(Y) (((struct rmnet_map_header *) \
+				(Y)->data)->cd_bit)
+#define RMNET_MAP_GET_PAD(Y) (((struct rmnet_map_header *) \
+				(Y)->data)->pad_len)
+#define RMNET_MAP_GET_CMD_START(Y) ((struct rmnet_map_control_command *) \
+				    ((Y)->data + \
+				      sizeof(struct rmnet_map_header)))
+#define RMNET_MAP_GET_LENGTH(Y) (ntohs(((struct rmnet_map_header *) \
+					(Y)->data)->pkt_len))
+
+#define RMNET_MAP_COMMAND_REQUEST     0
+#define RMNET_MAP_COMMAND_ACK         1
+#define RMNET_MAP_COMMAND_UNSUPPORTED 2
+#define RMNET_MAP_COMMAND_INVALID     3
+
+#define RMNET_MAP_NO_PAD_BYTES        0
+#define RMNET_MAP_ADD_PAD_BYTES       1
+
+u8 rmnet_map_demultiplex(struct sk_buff *skb);
+struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
+				      struct rmnet_real_dev_info *rdinfo);
+
+struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
+						  int hdrlen, int pad);
+rx_handler_result_t rmnet_map_command(struct sk_buff *skb,
+				      struct rmnet_real_dev_info *rdinfo);
+
+#endif /* _RMNET_MAP_H_ */
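The RMNET_MAP_GET_* macros above read the 4-byte MAP header through C bitfields. The following userspace sketch decodes the same header from raw bytes with explicit shifts. It assumes the layout those bitfields produce on a little-endian build: the CD bit is the top bit of byte 0, and the pad length is in its low 6 bits. pkt_len is big-endian, matching the ntohs() in RMNET_MAP_GET_LENGTH(). All sketch_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of struct rmnet_map_header. */
struct sketch_map_hdr {
	bool cd;          /* command/data bit: set for control commands */
	uint8_t pad_len;  /* trailing pad bytes counted in pkt_len */
	uint8_t mux_id;   /* logical endpoint number */
	uint16_t pkt_len; /* payload plus padding, host byte order */
};

/* Decode a MAP header. Returns false if buf is shorter than 4 bytes. */
static bool sketch_map_decode(const uint8_t *buf, size_t len,
			      struct sketch_map_hdr *out)
{
	if (len < 4)
		return false;

	out->cd = (buf[0] >> 7) & 1;
	out->pad_len = buf[0] & 0x3F;
	out->mux_id = buf[1];
	out->pkt_len = (uint16_t)((buf[2] << 8) | buf[3]);
	return true;
}

/* Payload length without padding, as computed on ingress from
 * RMNET_MAP_GET_LENGTH() - RMNET_MAP_GET_PAD(). Returns 0 when the pad
 * length exceeds pkt_len, a case the driver does not check.
 */
static uint16_t sketch_map_payload_len(const struct sketch_map_hdr *h)
{
	if (h->pkt_len < h->pad_len)
		return 0;
	return (uint16_t)(h->pkt_len - h->pad_len);
}
```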
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
new file mode 100644
index 0000000..2de93e5
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -0,0 +1,116 @@
+/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/netdevice.h>
+#include "rmnet_config.h"
+#include "rmnet_map.h"
+#include "rmnet_private.h"
+#include "rmnet_vnd.h"
+
+static u8 rmnet_map_do_flow_control(struct sk_buff *skb,
+				    struct rmnet_real_dev_info *rdinfo,
+				    int enable)
+{
+	struct rmnet_map_control_command *cmd;
+	struct rmnet_endpoint *ep;
+	struct net_device *vnd;
+	u16 ip_family;
+	u16 fc_seq;
+	u32 qos_id;
+	u8 mux_id;
+	int r;
+
+	if (unlikely(!skb || !rdinfo))
+		return RX_HANDLER_CONSUMED;
+
+	mux_id = RMNET_MAP_GET_MUX_ID(skb);
+	cmd = RMNET_MAP_GET_CMD_START(skb);
+
+	if (mux_id >= RMNET_MAX_LOGICAL_EP) {
+		kfree_skb(skb);
+		return RX_HANDLER_CONSUMED;
+	}
+
+	ep = &rdinfo->muxed_ep[mux_id];
+	vnd = ep->egress_dev;
+
+	ip_family = cmd->flow_control.ip_family;
+	fc_seq = ntohs(cmd->flow_control.flow_control_seq_num);
+	qos_id = ntohl(cmd->flow_control.qos_id);
+
+	/* The IP family and sequence number are not used: flow control is
+	 * applied to the whole VND for both IPv4 and IPv6, because user space
+	 * does not support creating dedicated flows for the two protocols.
+	 */
+	r = rmnet_vnd_do_flow_control(rdinfo, vnd, enable);
+	if (r) {
+		kfree_skb(skb);
+		return RMNET_MAP_COMMAND_UNSUPPORTED;
+	} else {
+		return RMNET_MAP_COMMAND_ACK;
+	}
+}
+
+static void rmnet_map_send_ack(struct sk_buff *skb,
+			       unsigned char type,
+			       struct rmnet_real_dev_info *rdinfo)
+{
+	struct rmnet_map_control_command *cmd;
+	int xmit_status;
+
+	if (unlikely(!skb))
+		return;
+
+	skb->protocol = htons(ETH_P_MAP);
+
+	cmd = RMNET_MAP_GET_CMD_START(skb);
+	cmd->cmd_type = type & 0x03;
+
+	netif_tx_lock(skb->dev);
+	xmit_status = skb->dev->netdev_ops->ndo_start_xmit(skb, skb->dev);
+	netif_tx_unlock(skb->dev);
+}
+
+/* Process MAP command frame and send N/ACK message as appropriate. Message cmd
+ * name is decoded here and appropriate handler is called.
+ */
+rx_handler_result_t rmnet_map_command(struct sk_buff *skb,
+				      struct rmnet_real_dev_info *rdinfo)
+{
+	struct rmnet_map_control_command *cmd;
+	unsigned char command_name;
+	unsigned char rc = 0;
+
+	if (unlikely(!skb))
+		return RX_HANDLER_CONSUMED;
+
+	cmd = RMNET_MAP_GET_CMD_START(skb);
+	command_name = cmd->command_name;
+
+	switch (command_name) {
+	case RMNET_MAP_COMMAND_FLOW_ENABLE:
+		rc = rmnet_map_do_flow_control(skb, rdinfo, 1);
+		break;
+
+	case RMNET_MAP_COMMAND_FLOW_DISABLE:
+		rc = rmnet_map_do_flow_control(skb, rdinfo, 0);
+		break;
+
+	default:
+		rc = RMNET_MAP_COMMAND_UNSUPPORTED;
+		kfree_skb(skb);
+		break;
+	}
+	if (rc == RMNET_MAP_COMMAND_ACK)
+		rmnet_map_send_ack(skb, rc, rdinfo);
+	return RX_HANDLER_CONSUMED;
+}
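rmnet_map_command() dispatches on command_name. For a flow enable or disable command, it turns the same buffer into an ACK by rewriting cmd_type. The following userspace sketch models that round trip on a command body, meaning the bytes after the 4-byte MAP header. It assumes the little-endian bitfield layout of struct rmnet_map_control_command, in which cmd_type occupies the low two bits of byte 1. The sketch_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from enum rmnet_map_commands and RMNET_MAP_COMMAND_ACK. */
#define SKETCH_CMD_FLOW_DISABLE 1
#define SKETCH_CMD_FLOW_ENABLE  2
#define SKETCH_CMD_TYPE_ACK     1
#define SKETCH_CMD_TYPE_MASK    0x03

/* Returns the requested TX queue state (1 = wake, 0 = stop) and rewrites
 * cmd_type in place to ACK, as rmnet_map_send_ack() does. Returns -1 and
 * leaves the buffer untouched for unsupported or truncated commands.
 */
static int sketch_map_flow_command(uint8_t *body, size_t len)
{
	int enable;

	if (len < 2)
		return -1;

	switch (body[0]) {
	case SKETCH_CMD_FLOW_ENABLE:
		enable = 1;
		break;
	case SKETCH_CMD_FLOW_DISABLE:
		enable = 0;
		break;
	default:
		return -1;
	}

	body[1] = (uint8_t)((body[1] & ~SKETCH_CMD_TYPE_MASK) |
			    SKETCH_CMD_TYPE_ACK);
	return enable;
}
```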
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
new file mode 100644
index 0000000..6d16c6ac
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -0,0 +1,105 @@
+/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * RMNET Data MAP protocol
+ *
+ */
+
+#include <linux/netdevice.h>
+#include "rmnet_config.h"
+#include "rmnet_map.h"
+#include "rmnet_private.h"
+
+#define RMNET_MAP_DEAGGR_SPACING  64
+#define RMNET_MAP_DEAGGR_HEADROOM (RMNET_MAP_DEAGGR_SPACING / 2)
+
+/* Adds MAP header to front of skb->data
+ * Padding is calculated and set appropriately in MAP header. Mux ID is
+ * initialized to 0.
+ */
+struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
+						  int hdrlen, int pad)
+{
+	struct rmnet_map_header *map_header;
+	u32 padding, map_datalen;
+	u8 *padbytes;
+
+	if (skb_headroom(skb) < sizeof(struct rmnet_map_header))
+		return NULL;
+
+	map_datalen = skb->len - hdrlen;
+	map_header = (struct rmnet_map_header *)
+			skb_push(skb, sizeof(struct rmnet_map_header));
+	memset(map_header, 0, sizeof(struct rmnet_map_header));
+
+	if (pad == RMNET_MAP_NO_PAD_BYTES) {
+		map_header->pkt_len = htons(map_datalen);
+		return map_header;
+	}
+
+	padding = ALIGN(map_datalen, 4) - map_datalen;
+
+	if (padding == 0)
+		goto done;
+
+	if (skb_tailroom(skb) < padding)
+		return NULL;
+
+	padbytes = (u8 *)skb_put(skb, padding);
+	memset(padbytes, 0, padding);
+
+done:
+	map_header->pkt_len = htons(map_datalen + padding);
+	map_header->pad_len = padding & 0x3F;
+
+	return map_header;
+}
+
+/* Deaggregates a single packet
+ * A whole new buffer is allocated for each portion of an aggregated frame.
+ * Caller should keep calling deaggregate() on the source skb until 0 is
+ * returned, indicating that there are no more packets to deaggregate. Caller
+ * is responsible for freeing the original skb.
+ */
+struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
+				      struct rmnet_real_dev_info *rdinfo)
+{
+	struct rmnet_map_header *maph;
+	struct sk_buff *skbn;
+	u32 packet_len;
+
+	if (skb->len < sizeof(struct rmnet_map_header))
+		return NULL;
+
+	maph = (struct rmnet_map_header *)skb->data;
+	packet_len = ntohs(maph->pkt_len) + sizeof(struct rmnet_map_header);
+
+	if (skb->len < packet_len)
+		return NULL;
+
+	skbn = alloc_skb(packet_len + RMNET_MAP_DEAGGR_SPACING, GFP_ATOMIC);
+	if (!skbn)
+		return NULL;
+
+	skbn->dev = skb->dev;
+	skb_reserve(skbn, RMNET_MAP_DEAGGR_HEADROOM);
+	skb_put(skbn, packet_len);
+	memcpy(skbn->data, skb->data, packet_len);
+	skb_pull(skb, packet_len);
+
+	/* Empty frames from some hardware: drop the copy, caller owns skb */
+	if (ntohs(maph->pkt_len) == 0) {
+		kfree_skb(skbn);
+		return NULL;
+	}
+
+	return skbn;
+}
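The following userspace sketch covers the two computations in rmnet_map_data.c. The first is the pad length that rmnet_map_add_map_header() derives from ALIGN(map_datalen, 4) - map_datalen. The second is the frame walk that repeated rmnet_map_deaggregate() calls perform over an aggregated buffer. The sketch assumes back-to-back frames, each a 4-byte header followed by pkt_len bytes, with pkt_len big-endian in bytes 2-3. Unlike the driver, it records offsets instead of copying each frame into a new skb. The sketch_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAP_HDR_LEN 4

/* Pad bytes that bring datalen up to a 4-byte boundary (0..3). */
static uint32_t sketch_map_pad_len(uint32_t datalen)
{
	return (4 - (datalen & 3)) & 3;
}

/* Record the offset of each MAP frame header in offsets[] and return the
 * number of frames found. Stops at a truncated frame, at an empty
 * (pkt_len == 0) frame, or after max_frames entries.
 */
static size_t sketch_map_deaggregate(const uint8_t *buf, size_t len,
				     size_t *offsets, size_t max_frames)
{
	size_t off = 0, count = 0;

	while (count < max_frames && len - off >= SKETCH_MAP_HDR_LEN) {
		const uint8_t *hdr = buf + off;
		size_t pkt_len = ((size_t)hdr[2] << 8) | hdr[3];
		size_t frame_len = SKETCH_MAP_HDR_LEN + pkt_len;

		/* off <= len always holds, so len - off cannot underflow */
		if (pkt_len == 0 || frame_len > len - off)
			break;

		offsets[count++] = off;
		off += frame_len;
	}
	return count;
}
```

Stopping at the first empty frame matches the driver's intent. Whatever follows an empty frame is discarded together with the source buffer.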
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
new file mode 100644
index 0000000..ed820b5
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
@@ -0,0 +1,45 @@
+/* Copyright (c) 2013-2014, 2016-2017 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _RMNET_PRIVATE_H_
+#define _RMNET_PRIVATE_H_
+
+#define RMNET_MAX_VND              32
+#define RMNET_MAX_PACKET_SIZE      16384
+#define RMNET_DFLT_PACKET_SIZE     1500
+#define RMNET_NEEDED_HEADROOM      16
+#define RMNET_TX_QUEUE_LEN         1000
+
+/* Constants */
+#define RMNET_EGRESS_FORMAT__RESERVED__         BIT(0)
+#define RMNET_EGRESS_FORMAT_MAP                 BIT(1)
+#define RMNET_EGRESS_FORMAT_AGGREGATION         BIT(2)
+#define RMNET_EGRESS_FORMAT_MUXING              BIT(3)
+#define RMNET_EGRESS_FORMAT_MAP_CKSUMV3         BIT(4)
+#define RMNET_EGRESS_FORMAT_MAP_CKSUMV4         BIT(5)
+
+#define RMNET_INGRESS_FIX_ETHERNET              BIT(0)
+#define RMNET_INGRESS_FORMAT_MAP                BIT(1)
+#define RMNET_INGRESS_FORMAT_DEAGGREGATION      BIT(2)
+#define RMNET_INGRESS_FORMAT_DEMUXING           BIT(3)
+#define RMNET_INGRESS_FORMAT_MAP_COMMANDS       BIT(4)
+#define RMNET_INGRESS_FORMAT_MAP_CKSUMV3        BIT(5)
+#define RMNET_INGRESS_FORMAT_MAP_CKSUMV4        BIT(6)
+
+/* Pass the frame up the stack with no modifications to skb->dev */
+#define RMNET_EPMODE_NONE (0)
+/* Replace skb->dev to a virtual rmnet device and pass up the stack */
+#define RMNET_EPMODE_VND (1)
+/* Pass the frame directly to another device with dev_queue_xmit() */
+#define RMNET_EPMODE_BRIDGE (2)
+
+#endif /* _RMNET_PRIVATE_H_ */
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
new file mode 100644
index 0000000..240f41d
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -0,0 +1,268 @@
+/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ *
+ * RMNET Data virtual network driver
+ *
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/if_arp.h>
+#include <net/pkt_sched.h>
+#include "rmnet_config.h"
+#include "rmnet_handlers.h"
+#include "rmnet_private.h"
+#include "rmnet_map.h"
+#include "rmnet_vnd.h"
+
+/* RX/TX Fixup */
+
+int rmnet_vnd_rx_fixup(struct sk_buff *skb, struct net_device *dev)
+{
+	if (unlikely(!dev || !skb))
+		return RX_HANDLER_CONSUMED;
+
+	dev->stats.rx_packets++;
+	dev->stats.rx_bytes += skb->len;
+
+	return RX_HANDLER_PASS;
+}
+
+int rmnet_vnd_tx_fixup(struct sk_buff *skb, struct net_device *dev)
+{
+	struct rmnet_priv *priv;
+
+	priv = netdev_priv(dev);
+
+	if (unlikely(!dev || !skb))
+		return RX_HANDLER_CONSUMED;
+
+	dev->stats.tx_packets++;
+	dev->stats.tx_bytes += skb->len;
+
+	return RX_HANDLER_PASS;
+}
+
+/* Network Device Operations */
+
+static netdev_tx_t rmnet_vnd_start_xmit(struct sk_buff *skb,
+					struct net_device *dev)
+{
+	struct rmnet_priv *priv;
+
+	priv = netdev_priv(dev);
+	if (priv->local_ep.egress_dev) {
+		rmnet_egress_handler(skb, &priv->local_ep);
+	} else {
+		dev->stats.tx_dropped++;
+		kfree_skb(skb);
+	}
+	return NETDEV_TX_OK;
+}
+
+static int rmnet_vnd_change_mtu(struct net_device *rmnet_dev, int new_mtu)
+{
+	if (new_mtu < 0 || new_mtu > RMNET_MAX_PACKET_SIZE)
+		return -EINVAL;
+
+	rmnet_dev->mtu = new_mtu;
+	return 0;
+}
+
+static const struct net_device_ops rmnet_vnd_ops = {
+	.ndo_start_xmit = rmnet_vnd_start_xmit,
+	.ndo_change_mtu = rmnet_vnd_change_mtu,
+};
+
+/* Called by kernel whenever a new rmnet<n> device is created. Sets MTU,
+ * flags, ARP type, needed headroom, etc...
+ */
+void rmnet_vnd_setup(struct net_device *rmnet_dev)
+{
+	struct rmnet_priv *priv;
+
+	/* Clear out private data */
+	priv = netdev_priv(rmnet_dev);
+	memset(priv, 0, sizeof(struct rmnet_priv));
+
+	netdev_info(rmnet_dev, "Setting up device %s\n", rmnet_dev->name);
+
+	rmnet_dev->netdev_ops = &rmnet_vnd_ops;
+	rmnet_dev->mtu = RMNET_DFLT_PACKET_SIZE;
+	rmnet_dev->needed_headroom = RMNET_NEEDED_HEADROOM;
+	eth_random_addr(rmnet_dev->dev_addr);
+	rmnet_dev->tx_queue_len = RMNET_TX_QUEUE_LEN;
+
+	/* Raw IP mode */
+	rmnet_dev->header_ops = NULL;  /* No header */
+	rmnet_dev->type = ARPHRD_RAWIP;
+	rmnet_dev->hard_header_len = 0;
+	rmnet_dev->flags &= ~(IFF_BROADCAST | IFF_MULTICAST);
+
+	rmnet_dev->needs_free_netdev = true;
+}
+
+int rmnet_vnd_is_rmnet(struct net_device *dev)
+{
+	return dev->netdev_ops == &rmnet_vnd_ops;
+}
+
+/* Exposed API */
+
+int rmnet_vnd_newlink(struct net_device *real_dev, int id,
+		      struct net_device *rmnet_dev)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	int rc;
+
+	rdinfo = rmnet_get_real_dev_info(real_dev);
+
+	if (id < 0 || id >= RMNET_MAX_VND || rdinfo->rmnet_devices[id])
+		return -EINVAL;
+
+	rc = register_netdevice(rmnet_dev);
+	if (!rc) {
+		rdinfo->rmnet_devices[id] = rmnet_dev;
+		rmnet_dev->rtnl_link_ops = &rmnet_link_ops;
+	}
+	return rc;
+}
+
+/* Unregisters the virtual network device node, which is then freed.
+ * unregister_netdev() takes the rtnl mutex, so the caller must not hold it.
+ * The device is queued on the netdev todo list, which is only processed once
+ * the rtnl mutex is released. Because rmnet_vnd_setup() sets
+ * needs_free_netdev, that processing also calls free_netdev(); freeing the
+ * device again here would be a double free.
+ */
+int rmnet_vnd_free_dev(struct net_device *real_dev, int id)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	struct net_device *rmnet_dev;
+	struct rmnet_endpoint *ep;
+
+	rdinfo = rmnet_get_real_dev_info(real_dev);
+
+	rtnl_lock();
+	if (id < 0 || id >= RMNET_MAX_VND || !rdinfo->rmnet_devices[id]) {
+		rtnl_unlock();
+		return -EINVAL;
+	}
+
+	ep = rmnet_vnd_get_endpoint(rdinfo->rmnet_devices[id]);
+	if (ep) {
+		rtnl_unlock();
+		return -EINVAL;
+	}
+
+	rmnet_dev = rdinfo->rmnet_devices[id];
+	rdinfo->rmnet_devices[id] = NULL;
+	rtnl_unlock();
+
+	if (rmnet_dev) {
+		unregister_netdev(rmnet_dev);
+		/* freed by the netdev core because needs_free_netdev is set */
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int rmnet_vnd_remove_ref_dev(struct net_device *real_dev, int id)
+{
+	struct rmnet_real_dev_info *rdinfo;
+	struct rmnet_endpoint *ep;
+
+	rdinfo = rmnet_get_real_dev_info(real_dev);
+	if (id < 0 || id >= RMNET_MAX_VND || !rdinfo->rmnet_devices[id])
+		return -EINVAL;
+
+	ep = rmnet_vnd_get_endpoint(rdinfo->rmnet_devices[id]);
+	rdinfo->rmnet_devices[id] = NULL;
+	return 0;
+}
+
+/* Searches through list of known RmNet virtual devices. This function is O(n)
+ * and should not be used in the data path.
+ *
+ * To get the real id, subtract 1 from the result.
+ */
+int rmnet_vnd_get_mux(struct net_device *real_dev,
+		      struct net_device *rmnet_dev)
+{
+	/* This is not an efficient search, but, this will only be called in
+	 * a configuration context, and the list is small.
+	 */
+	struct rmnet_real_dev_info *rdinfo;
+	int i;
+
+	rdinfo = rmnet_get_real_dev_info(real_dev);
+
+	if (!rmnet_dev)
+		return 0;
+
+	for (i = 0; i < RMNET_MAX_VND; i++)
+		if (rmnet_dev == rdinfo->rmnet_devices[i])
+			return i + 1;
+
+	return 0;
+}
+
+/* Gets the logical endpoint configuration for a RmNet virtual network device
+ * node. Caller should confirm that the device is a RmNet VND before calling.
+ */
+struct rmnet_endpoint *rmnet_vnd_get_endpoint(struct net_device *rmnet_dev)
+{
+	struct rmnet_priv *priv;
+
+	if (!rmnet_dev)
+		return NULL;
+
+	priv = netdev_priv(rmnet_dev);
+	if (!priv)
+		return NULL;
+
+	return &priv->local_ep;
+}
+
+int rmnet_vnd_do_flow_control(struct rmnet_real_dev_info *rdinfo,
+			      struct net_device *rmnet_dev, int enable)
+{
+	/* netdev_priv() never returns NULL; check the device itself, since
+	 * the mux endpoint may not have a VND attached yet.
+	 */
+	if (unlikely(!rmnet_dev))
+		return -EINVAL;
+
+	netdev_info(rmnet_dev, "Setting VND TX queue state to %d\n", enable);
+	/* Although we expect a similar number of enable and disable
+	 * commands, optimize for disable, which is more latency
+	 * sensitive than enable.
+	 */
+	if (unlikely(enable))
+		netif_wake_queue(rmnet_dev);
+	else
+		netif_stop_queue(rmnet_dev);
+
+	return 0;
+}
+
+struct net_device *rmnet_vnd_get_by_id(struct net_device *real_dev, int id)
+{
+	struct rmnet_real_dev_info *rdinfo;
+
+	rdinfo = rmnet_get_real_dev_info(real_dev);
+
+	if (id < 0 || id >= RMNET_MAX_VND)
+		return NULL;
+
+	return rdinfo->rmnet_devices[id];
+}
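rmnet_vnd_get_mux() returns a 1-based position so that 0 can mean "not found", which is why callers subtract 1 to get the table index. The following userspace sketch shows that convention over a 32-slot table like rmnet_devices[]. The sketch_ names are hypothetical, and opaque pointers stand in for struct net_device.

```c
#include <stddef.h>

#define SKETCH_MAX_VND 32 /* mirrors RMNET_MAX_VND in the patch */

/* Linear scan of the per-real-device VND table. Returns index + 1 on a
 * match, or 0 if dev is NULL or not present.
 */
static int sketch_vnd_get_mux(void *const table[SKETCH_MAX_VND],
			      const void *dev)
{
	int i;

	if (!dev)
		return 0;

	for (i = 0; i < SKETCH_MAX_VND; i++)
		if (table[i] == dev)
			return i + 1;

	return 0;
}
```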
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h
new file mode 100644
index 0000000..7de5839
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h
@@ -0,0 +1,33 @@
+/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * RMNET Data Virtual Network Device APIs
+ *
+ */
+
+#ifndef _RMNET_VND_H_
+#define _RMNET_VND_H_
+
+int rmnet_vnd_do_flow_control(struct rmnet_real_dev_info *rdinfo,
+			      struct net_device *dev, int enable);
+struct rmnet_endpoint *rmnet_vnd_get_endpoint(struct net_device *dev);
+int rmnet_vnd_free_dev(struct net_device *real_dev, int id);
+int rmnet_vnd_remove_ref_dev(struct net_device *real_dev, int id);
+int rmnet_vnd_rx_fixup(struct sk_buff *skb, struct net_device *dev);
+int rmnet_vnd_tx_fixup(struct sk_buff *skb, struct net_device *dev);
+int rmnet_vnd_get_mux(struct net_device *real_dev,
+		      struct net_device *rmnet_dev);
+struct net_device *rmnet_vnd_get_by_id(struct net_device *real_dev, int id);
+void rmnet_vnd_setup(struct net_device *dev);
+int rmnet_vnd_newlink(struct net_device *real_dev, int id,
+		      struct net_device *new_device);
+int rmnet_vnd_is_rmnet(struct net_device *dev);
+#endif /* _RMNET_VND_H_ */
-- 
1.9.1


* Re: [PATCH RESEND 0/2] enable hires timer to timeout datagram socket
From: Richard Cochran @ 2017-08-19  6:21 UTC
  To: Vallish Vaidyeshwara
  Cc: davem, shuah, netdev, linux-kernel, eduval, anchalag, tglx
In-Reply-To: <20170818222756.GB28737@amazon.com>

On Fri, Aug 18, 2017 at 10:27:56PM +0000, Vallish Vaidyeshwara wrote:
> We have a on-demand application that uses long timeouts and needs to react to
> events within milliseconds.

Huh?  The test program you posted does not react to any event.

Thanks,
Richard


* [PATCH net-next] virtio-net: make napi_tx param easier to grasp
From: Koichiro Den @ 2017-08-19  6:37 UTC
  To: mst; +Cc: netdev, virtualization

The module param napi_tx does not need to be writable for now, since we
have no means of activating or deactivating TX NAPI at runtime, and
adding one seems to be a low priority. Also make the napi_tx parameter
read back as false when TX NAPI has been disabled behind the scenes.

Signed-off-by: Koichiro Den <den@klaipeden.com>
---
 drivers/net/virtio_net.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4302f313d9a7..ea4e7ddcd377 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -37,7 +37,7 @@ module_param(napi_weight, int, 0444);
 static bool csum = true, gso = true, napi_tx;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
-module_param(napi_tx, bool, 0644);
+module_param(napi_tx, bool, 0444);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -1026,20 +1026,13 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 	local_bh_enable();
 }
 
-static void virtnet_napi_tx_enable(struct virtnet_info *vi,
-				   struct virtqueue *vq,
-				   struct napi_struct *napi)
+static void virtnet_napi_tx_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
-	if (!napi->weight)
-		return;
-
-	/* Tx napi touches cachelines on the cpu handling tx interrupts. Only
-	 * enable the feature if this is likely affine with the transmit path.
-	 */
-	if (!vi->affinity_hint_set) {
+	if (!napi_tx)
 		napi->weight = 0;
+
+	if (!napi->weight)
 		return;
-	}
 
 	return virtnet_napi_enable(vq, napi);
 }
@@ -1179,13 +1172,19 @@ static int virtnet_open(struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int i;
 
+	/* Tx napi touches cachelines on the cpu handling tx interrupts. Only
+	 * enable the feature if this is likely affine with the transmit path.
+	 */
+	if (!vi->affinity_hint_set)
+		napi_tx = false;
+
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		if (i < vi->curr_queue_pairs)
 			/* Make sure we have some buffers: if oom use wq. */
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
-		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
+		virtnet_napi_tx_enable(vi->sq[i].vq, &vi->sq[i].napi);
 	}
 
 	return 0;
@@ -1890,6 +1889,12 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 
 	virtio_device_ready(vdev);
 
+	/* Tx napi touches cachelines on the cpu handling tx interrupts. Only
+	 * enable the feature if this is likely affine with the transmit path.
+	 */
+	if (!vi->affinity_hint_set)
+		napi_tx = false;
+
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->curr_queue_pairs; i++)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
@@ -1897,8 +1902,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
-			virtnet_napi_tx_enable(vi, vi->sq[i].vq,
-					       &vi->sq[i].napi);
+			virtnet_napi_tx_enable(vi->sq[i].vq, &vi->sq[i].napi);
 		}
 	}
 
-- 
2.9.4

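A minimal user-space model of the decision logic in the diff above may help when reading it. Every sketch_ name is a hypothetical stand-in, not the driver's real struct napi_struct or struct virtnet_info. The model assumes only what the diff shows: a module-wide napi_tx flag that is cleared when no affinity hint is set, and a per-queue weight that is zeroed whenever napi_tx is off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-queue tx napi state;
 * this is NOT the kernel's struct napi_struct. */
struct sketch_napi {
	int weight;     /* 0 means "tx napi disabled for this queue" */
	bool enabled;
};

/* Models the read-only module parameter napi_tx. */
static bool napi_tx = true;

/* Mirrors the patched virtnet_napi_tx_enable(): zero the weight when the
 * global parameter is off, and enable napi only with a non-zero weight. */
static void sketch_napi_tx_enable(struct sketch_napi *napi)
{
	if (!napi_tx)
		napi->weight = 0;

	if (!napi->weight)
		return;

	napi->enabled = true;
}

/* Mirrors the check moved into virtnet_open()/virtnet_restore_up():
 * without an affinity hint, tx napi is switched off globally, so the
 * parameter value itself shows that it is disabled. */
static void sketch_open(bool affinity_hint_set, struct sketch_napi *sq,
			int nqueues)
{
	int i;

	assert(nqueues >= 0);
	if (!affinity_hint_set)
		napi_tx = false;

	for (i = 0; i < nqueues; i++)
		sketch_napi_tx_enable(&sq[i]);
}
```

In this model, as in the diff, napi_tx is a single flag shared by the whole module, so once one device clears it, queues opened afterwards on any device also run without tx napi.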

* [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Koichiro Den @ 2017-08-19  6:38 UTC (permalink / raw)
  To: mst; +Cc: netdev, virtualization

Without tx napi, transmitted skbs are freed only on the xmit path, which
may be delayed without bound. Given that, it is better to invoke and
clear the upper layer's zerocopy callback beforehand, so that the upper
layer does not wait in vain for an unbounded duration. For instance, this
removes a possible deadlock when the upper layer is a zerocopy-enabled
vhost-net. This does not apply when napi_tx is enabled, since the
callback is then invoked in reasonable time.

Signed-off-by: Koichiro Den <den@klaipeden.com>
---
 drivers/net/virtio_net.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4302f313d9a7..f7deaa5b7b50 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1290,6 +1290,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* Don't wait up for transmitted skbs to be freed. */
 	if (!use_napi) {
+		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
+			struct ubuf_info *uarg;
+			uarg = skb_shinfo(skb)->destructor_arg;
+			if (uarg->callback)
+				uarg->callback(uarg, true);
+			skb_shinfo(skb)->destructor_arg = NULL;
+			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
+		}
 		skb_orphan(skb);
 		nf_reset(skb);
 	}
-- 
2.9.4

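The ordering this patch relies on can be modelled in user space. In the sketch below, sketch_ubuf and sketch_skb are hypothetical stand-ins for ubuf_info and the skb's shared info, and SKETCH_TX_DEV_ZEROCOPY plays the role of SKBTX_DEV_ZEROCOPY. It assumes, as the patch implies, that a later free notifies the upper layer only while the zerocopy flag is still set, so notifying early and clearing the flag yields exactly one completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names loosely follow ubuf_info and
 * SKBTX_DEV_ZEROCOPY, but this is not kernel code. */
#define SKETCH_TX_DEV_ZEROCOPY 0x1u

struct sketch_ubuf {
	void (*callback)(struct sketch_ubuf *uarg, bool zerocopy_success);
	int completions;   /* how often the upper layer was notified */
};

struct sketch_skb {
	unsigned int tx_flags;
	struct sketch_ubuf *destructor_arg;
};

/* Upper layer (e.g. vhost-net in the real case) counting completions. */
static void upper_layer_done(struct sketch_ubuf *uarg, bool success)
{
	(void)success;
	uarg->completions++;
}

/* Notify the upper layer at transmit time and detach it from the skb,
 * as the patch does when tx napi is off. */
static void sketch_xmit_no_napi(struct sketch_skb *skb)
{
	if (skb->tx_flags & SKETCH_TX_DEV_ZEROCOPY) {
		struct sketch_ubuf *uarg = skb->destructor_arg;

		assert(uarg != NULL);
		if (uarg->callback)
			uarg->callback(uarg, true);
		skb->destructor_arg = NULL;
		skb->tx_flags &= ~SKETCH_TX_DEV_ZEROCOPY;
	}
}

/* Later free: calls back only if the skb still carries the zerocopy
 * flag, so clearing it above prevents a second notification. */
static void sketch_free_skb(struct sketch_skb *skb)
{
	if ((skb->tx_flags & SKETCH_TX_DEV_ZEROCOPY) && skb->destructor_arg &&
	    skb->destructor_arg->callback)
		skb->destructor_arg->callback(skb->destructor_arg, true);
}
```

For example, after sketch_xmit_no_napi() on a flagged skb, the upper layer has seen one completion, and a later sketch_free_skb() on the same skb adds none.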

* [PATCH 1/2] vhost: remove the possible fruitless search on iotlb prefetch
From: Koichiro Den @ 2017-08-19  6:41 UTC (permalink / raw)
  To: mst, jasowang; +Cc: netdev, kvm, virtualization

Signed-off-by: Koichiro Den <den@klaipeden.com>
---
 drivers/vhost/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e4613a3c362d..93e909afc1c3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1184,7 +1184,7 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
 	while (len > s) {
 		node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
 							   addr,
-							   addr + len - 1);
+							   addr + len - s - 1);
 		if (node == NULL || node->start > addr) {
 			vhost_iotlb_miss(vq, addr, access);
 			return false;
-- 
2.9.4

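The bound changed by this patch is easier to see in a user-space walk of the same loop. Below, sketch_region and first_overlap() are hypothetical: a linear scan stands in for vhost_umem_interval_tree_iter_first(), and regions are assumed not to overlap. After s bytes have been verified, addr has advanced by s, so the unchecked remainder is [addr, addr + len - s - 1]; querying up to addr + len - 1 searches past the end of the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a umem node: a mapping of [start, last]. */
struct sketch_region {
	uint64_t start;
	uint64_t last;   /* inclusive end */
};

/* Return the lowest-start region overlapping [lo, hi], or NULL. A linear
 * scan standing in for the interval-tree lookup. */
static const struct sketch_region *
first_overlap(const struct sketch_region *r, size_t n, uint64_t lo, uint64_t hi)
{
	const struct sketch_region *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (r[i].start <= hi && r[i].last >= lo &&
		    (!best || r[i].start < best->start))
			best = &r[i];
	return best;
}

/* Walk [addr, addr + len - 1] region by region, like iotlb_access_ok().
 * After consuming s bytes, addr has advanced by s, so the part still to
 * be checked ends at addr + len - s - 1 (the patched bound). */
static bool sketch_covered(const struct sketch_region *r, size_t n,
			   uint64_t addr, uint64_t len)
{
	uint64_t s = 0;

	assert(len > 0);
	while (len > s) {
		const struct sketch_region *node =
			first_overlap(r, n, addr, addr + len - s - 1);
		uint64_t size;

		if (!node || node->start > addr)
			return false;            /* would be an iotlb miss */
		size = node->last - addr + 1;    /* bytes usable from addr */
		s += size;
		addr += size;
	}
	return true;
}
```

With regions [0x1000, 0x1fff] and [0x2000, 0x2fff], a 0x1000-byte access at 0x1800 is covered in two steps, while the same access at 0x2800 runs off the end of the mapping and reports a miss.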

* [PATCH 2/2] vhost-net: revert vhost_exceeds_maxpend logic to its original
From: Koichiro Den @ 2017-08-19  6:41 UTC (permalink / raw)
  To: mst, jasowang; +Cc: kvm, virtualization, netdev

Depending on vq.num together with VHOST_MAX_PEND is not succinct and,
in some cases, behaves unexpectedly, so revert the logic part only.

Signed-off-by: Koichiro Den <den@klaipeden.com>
---
 drivers/vhost/net.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 06d044862e58..99cf99b308a7 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -433,11 +433,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 
 static bool vhost_exceeds_maxpend(struct vhost_net *net)
 {
+	int num_pends;
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
 
-	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
-		== nvq->done_idx;
+	num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
+		(nvq->upend_idx - nvq->done_idx) :
+		(nvq->upend_idx + UIO_MAXIOV - nvq->done_idx);
+
+	return num_pends > VHOST_MAX_PEND;
 }
 
 /* Expects to be always run from workqueue - which acts as
-- 
2.9.4

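The restored check counts in-flight zerocopy buffers as the distance from done_idx to upend_idx on a ring of UIO_MAXIOV slots. A user-space sketch of that arithmetic follows; the constants 1024 and 128 are the values UIO_MAXIOV and VHOST_MAX_PEND are assumed to have, and the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration: UIO_MAXIOV (1024) and VHOST_MAX_PEND
 * (128); treat both as assumptions in this sketch. */
#define SKETCH_UIO_MAXIOV 1024
#define SKETCH_MAX_PEND   128

/* Number of zerocopy buffers submitted (upend_idx) but not yet completed
 * (done_idx), where both indices wrap around modulo SKETCH_UIO_MAXIOV. */
static int sketch_num_pends(int upend_idx, int done_idx)
{
	assert(upend_idx >= 0 && upend_idx < SKETCH_UIO_MAXIOV);
	assert(done_idx >= 0 && done_idx < SKETCH_UIO_MAXIOV);

	return upend_idx >= done_idx ?
		upend_idx - done_idx :
		upend_idx + SKETCH_UIO_MAXIOV - done_idx;
}

/* Same shape as the patched vhost_exceeds_maxpend(). */
static bool sketch_exceeds_maxpend(int upend_idx, int done_idx)
{
	return sketch_num_pends(upend_idx, done_idx) > SKETCH_MAX_PEND;
}
```

For instance, upend_idx = 5 after wrapping and done_idx = 1000 gives 5 + 1024 - 1000 = 29 pending buffers, well under the limit.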

* [PATCH 0/5] constify net eisa_device_id
From: Arvind Yadav @ 2017-08-19  6:51 UTC (permalink / raw)
  To: davem, tremyfr; +Cc: linux-kernel, netdev

eisa_device_id structures are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.

Arvind Yadav (5):
  [PATCH 1/5] net: 3c509: constify eisa_device_id
  [PATCH 2/5] net: 3c59x: constify eisa_device_id
  [PATCH 3/5] net: de4x5: constify eisa_device_id
  [PATCH 4/5] net: hp100: constify eisa_device_id
  [PATCH 5/5] net: defxx: constify eisa_device_id

 drivers/net/ethernet/3com/3c509.c      | 2 +-
 drivers/net/ethernet/3com/3c59x.c      | 2 +-
 drivers/net/ethernet/dec/tulip/de4x5.c | 2 +-
 drivers/net/ethernet/hp/hp100.c        | 2 +-
 drivers/net/fddi/defxx.c               | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.7.4

* [PATCH 1/5] net: 3c509: constify eisa_device_id
From: Arvind Yadav @ 2017-08-19  6:51 UTC (permalink / raw)
  To: davem, tremyfr; +Cc: linux-kernel, netdev
In-Reply-To: <1503125503-15075-1-git-send-email-arvind.yadav.cs@gmail.com>

eisa_device_id structures are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 drivers/net/ethernet/3com/3c509.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/3com/3c509.c b/drivers/net/ethernet/3com/3c509.c
index 077d01d..b223769 100644
--- a/drivers/net/ethernet/3com/3c509.c
+++ b/drivers/net/ethernet/3com/3c509.c
@@ -474,7 +474,7 @@ static int pnp_registered;
 #endif /* CONFIG_PNP */
 
 #ifdef CONFIG_EISA
-static struct eisa_device_id el3_eisa_ids[] = {
+static const struct eisa_device_id el3_eisa_ids[] = {
 		{ "TCM5090" },
 		{ "TCM5091" },
 		{ "TCM5092" },
-- 
2.7.4

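What constifying an ID table buys can be shown with a user-space mirror of the pattern. sketch_eisa_device_id is a hypothetical stand-in for struct eisa_device_id (the kernel structure also carries driver_data), and sketch_match() is an illustrative reader, not the EISA bus code. Because the matcher only reads through a pointer to const, it accepts the const table unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space mirror of struct eisa_device_id. */
#define SKETCH_EISA_SIG_LEN 8

struct sketch_eisa_device_id {
	char sig[SKETCH_EISA_SIG_LEN];
};

/* Declared const, the table can be placed in read-only data, and any
 * accidental write to it is rejected by the compiler. */
static const struct sketch_eisa_device_id sketch_ids[] = {
	{ "TCM5090" },
	{ "TCM5091" },
	{ "TCM5092" },
	{ "" }          /* empty signature terminates the table */
};

/* A consumer that only reads the table takes a pointer to const, so it
 * works with both const and non-const tables. */
static const struct sketch_eisa_device_id *
sketch_match(const struct sketch_eisa_device_id *ids, const char *sig)
{
	assert(ids != NULL && sig != NULL);
	for (; ids->sig[0] != '\0'; ids++)
		if (strncmp(ids->sig, sig, SKETCH_EISA_SIG_LEN) == 0)
			return ids;
	return NULL;
}
```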